git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* Bug: write-tree corrupts intent-to-add index state
@ 2012-11-06  8:53 Jonathon Mah
  2012-11-06 12:37 ` Nguyen Thai Ngoc Duy
  2012-11-09 11:04 ` [PATCH] cache-tree: invalidate i-t-a paths after writing trees Nguyễn Thái Ngọc Duy
  0 siblings, 2 replies; 16+ messages in thread
From: Jonathon Mah @ 2012-11-06  8:53 UTC (permalink / raw
  To: git@vger.kernel.org

Hi revisionaries,

I came across what appears to be a bug while working today. I had several changes, both staged and unstaged, and two files that I had marked with intent-to-add (git add -N).

$ git status -sb
## Library-3.x...origin/Library-3.x [ahead 4]
M  Library/LIRootSource.m
 M Library/Library.xcodeproj/project.pbxproj
[...]
 M Library/SceneKit/LISceneKitBookshelfScene.m
AM Library/SceneKit/LISceneKitDisplayedMediumProxy.h
AM Library/SceneKit/LISceneKitDisplayedMediumProxy.m

(The last two files were added with -N.) I then attempted to stash my work, but that failed:

$ git stash -k
error: Entry 'Library/SceneKit/LISceneKitDisplayedMediumProxy.h' not uptodate. Cannot merge.
Cannot save the current worktree state

Re-checking the status showed that the intent-to-add files were now simply modified, yet they did not appear in git ls-tree -r HEAD.

$ git status -sb
## Library-3.x...origin/Library-3.x [ahead 4]
M  Library/LIRootSource.m
 M Library/Library.xcodeproj/project.pbxproj
[...]
 M Library/SceneKit/LISceneKitBookshelfScene.m
 M Library/SceneKit/LISceneKitDisplayedMediumProxy.h
 M Library/SceneKit/LISceneKitDisplayedMediumProxy.m

It appears the index forgot that those files were new. So not only has the intent-to-add status been lost, but git status shows a file existing in neither HEAD nor the index as not-new-but-modified.

Tracing through it, I narrowed it down to git write-tree affecting the index state. As far as I can tell, it's incorrect for that command to change the index. I'm now out of my (shallow) depth in the git codebase, so I'm throwing it out there for someone to tackle (hopefully). I've attached a failing test case.

Additional notes for bug hunters:

- I've only seen files in already-existing subdirectories lose intent-to-add status, whereas marking a file in the top-level as i-t-a is preserved during write-tree.

- The 'git ls-files -s --debug' output doesn't change across the write-tree:

100644 e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 0	dir/baz
  ctime: 0:0
  mtime: 0:0
  dev: 0	ino: 0
  uid: 0	gid: 0
  size: 0	flags: 20004000


---
 t/t2203-add-intent.sh | 22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)

diff --git a/t/t2203-add-intent.sh b/t/t2203-add-intent.sh
index ec35409..fcc67c0 100755
--- a/t/t2203-add-intent.sh
+++ b/t/t2203-add-intent.sh
@@ -62,5 +62,27 @@ test_expect_success 'can "commit -a" with an i-t-a entry' '
 	git commit -a -m all
 '
 
+test_expect_success 'i-t-a status unaffected by write-tree' '
+	git reset --hard &&
+	mkdir dir &&
+	echo frotz >dir/bar &&
+	git add dir &&
+	git commit -m "establish dir" &&
+	echo fizfaz >foo &&
+	echo fragz >dir/baz &&
+	git add rezrov &&
+	git add -N foo &&
+	git add -N dir/baz &&
+	cat >expect <<-\EOF &&
+	AM dir/baz
+	AM foo
+	EOF
+	git status --untracked-files=no --porcelain >actual &&
+	test_cmp actual expect &&
+	git write-tree &&
+	git status --untracked-files=no --porcelain >actual &&
+	test_cmp actual expect
+'
+
 test_done
 
-- 
1.8.0



Jonathon Mah
me@JonathonMah.com

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: Bug: write-tree corrupts intent-to-add index state
  2012-11-06  8:53 Bug: write-tree corrupts intent-to-add index state Jonathon Mah
@ 2012-11-06 12:37 ` Nguyen Thai Ngoc Duy
  2012-11-09 11:04 ` [PATCH] cache-tree: invalidate i-t-a paths after writing trees Nguyễn Thái Ngọc Duy
  1 sibling, 0 replies; 16+ messages in thread
From: Nguyen Thai Ngoc Duy @ 2012-11-06 12:37 UTC (permalink / raw
  To: Jonathon Mah; +Cc: git@vger.kernel.org

On Tue, Nov 06, 2012 at 12:53:27AM -0800, Jonathon Mah wrote:
> It appears the index forgot that those files were new. So not only
> has the intent-to-add status been lost, but git status shows a file
>  existing in neither HEAD nor the index as not-new-but-modified.
>
> Tracing through it, I narrowed it down to git write-tree affecting
> the index state. As far as I can tell, it's incorrect for that
> command to change the index. I'm now out of my (shallow) depth in
> the git codebase, so I'm throwing it out there for someone to tackle
> (hopefully). I've attached a failing test case.

I played with your test a bit and found that if we skip the commit
step, the bug disappears. I checked and believe that i-t-a flag in
index is preserved, which leaves cache-tree as the culprit. If
cache-tree is (incorrectly) valid, diff operations won't bother
checking index.

We used to not allow writing tree with i-t-a entries present. Then we
allowed it, but forgot that we need to invalidate any paths that
contain i-t-a entries, otherwise cache-tree won't truly reflect
index.

The below patch seems to do the trick, but I'd rather do the
invalidation inside update_one() so we don't need to traverse over the
index again. I haven't succesfully put cache-tree.c back in my head
again to make it happen. Anybody is welcome to step up, or I'll finish
it this weekend.

-- 8< --
diff --git a/cache-tree.c b/cache-tree.c
index 28ed657..30a8018 100644
--- a/cache-tree.c
+++ b/cache-tree.c
@@ -381,6 +381,9 @@ int cache_tree_update(struct cache_tree *it,
 	i = update_one(it, cache, entries, "", 0, flags);
 	if (i < 0)
 		return i;
+	for (i = 0; i < entries; i++)
+		if (cache[i]->ce_flags & CE_INTENT_TO_ADD)
+			cache_tree_invalidate_path(it, cache[i]->name);
 	return 0;
 }
 
diff --git a/t/t2203-add-intent.sh b/t/t2203-add-intent.sh
index ec35409..3ddbd89 100755
--- a/t/t2203-add-intent.sh
+++ b/t/t2203-add-intent.sh
@@ -62,5 +62,25 @@ test_expect_success 'can "commit -a" with an i-t-a entry' '
 	git commit -a -m all
 '
 
+test_expect_success 'cache-tree invalidates i-t-a paths' '
+	git reset --hard &&
+	mkdir dir &&
+	: >dir/foo &&
+	git add dir/foo &&
+	git commit -m foo &&
+
+	: >dir/bar &&
+	git add -N dir/bar &&
+	git diff --cached --name-only >actual &&
+	echo dir/bar >expect &&
+	test_cmp expect actual &&
+
+	git write-tree >/dev/null &&
+
+	git diff --cached --name-only >actual &&
+	echo dir/bar >expect &&
+	test_cmp expect actual
+'
+
 test_done
-- 8< -- 

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH] cache-tree: invalidate i-t-a paths after writing trees
  2012-11-06  8:53 Bug: write-tree corrupts intent-to-add index state Jonathon Mah
  2012-11-06 12:37 ` Nguyen Thai Ngoc Duy
@ 2012-11-09 11:04 ` Nguyễn Thái Ngọc Duy
  2012-11-09 11:57   ` Junio C Hamano
  2012-12-08  4:10   ` [PATCH v2] cache-tree: invalidate i-t-a paths after generating trees Nguyễn Thái Ngọc Duy
  1 sibling, 2 replies; 16+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2012-11-09 11:04 UTC (permalink / raw
  To: git
  Cc: Junio C Hamano, Jeff King, Jonathon Mah,
	Nguyễn Thái Ngọc Duy

Intent-to-add entries used to forbid writing trees so it was not a
problem. After commit 3f6d56d (commit: ignore intent-to-add entries
instead of refusing - 2012-02-07), an index with i-t-a entries can
write trees. However, the commit forgets to invalidate all paths
leading to i-t-a entries. With fully valid cache-tree (e.g. after
commit or write-tree), diff operations may prefer cache-tree to index
and not see i-t-a entries in the index, because cache-tree does not
have them.

The invalidation is done after calling update_one() in
cache_tree_update() for simplicity because it's probably not worth the
complexity of changing a recursive function and the performance
bottleneck won't likely fall to this new loop, rather in
write_sha1_file or hash_sha1_file. But this means that if update_one()
has new call sites, they must re-do the same what this commit does to
avoid the same fault.

Reported-by: Jonathon Mah <me@JonathonMah.com>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 cache-tree.c          |  3 +++
 t/t2203-add-intent.sh | 20 ++++++++++++++++++++
 2 files changed, 23 insertions(+)

diff --git a/cache-tree.c b/cache-tree.c
index 28ed657..30a8018 100644
--- a/cache-tree.c
+++ b/cache-tree.c
@@ -381,6 +381,9 @@ int cache_tree_update(struct cache_tree *it,
 	i = update_one(it, cache, entries, "", 0, flags);
 	if (i < 0)
 		return i;
+	for (i = 0; i < entries; i++)
+		if (cache[i]->ce_flags & CE_INTENT_TO_ADD)
+			cache_tree_invalidate_path(it, cache[i]->name);
 	return 0;
 }
 
diff --git a/t/t2203-add-intent.sh b/t/t2203-add-intent.sh
index ec35409..2a4a749 100755
--- a/t/t2203-add-intent.sh
+++ b/t/t2203-add-intent.sh
@@ -62,5 +62,25 @@ test_expect_success 'can "commit -a" with an i-t-a entry' '
 	git commit -a -m all
 '
 
+test_expect_success 'cache-tree invalidates i-t-a paths' '
+	git reset --hard &&
+	mkdir dir &&
+	: >dir/foo &&
+	git add dir/foo &&
+	git commit -m foo &&
+
+	: >dir/bar &&
+	git add -N dir/bar &&
+	git diff --cached --name-only >actual &&
+	echo dir/bar >expect &&
+	test_cmp expect actual &&
+
+	git write-tree >/dev/null &&
+
+	git diff --cached --name-only >actual &&
+	echo dir/bar >expect &&
+	test_cmp expect actual
+'
+
 test_done
 
-- 
1.8.0.rc2.23.g1fb49df

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH] cache-tree: invalidate i-t-a paths after writing trees
  2012-11-09 11:04 ` [PATCH] cache-tree: invalidate i-t-a paths after writing trees Nguyễn Thái Ngọc Duy
@ 2012-11-09 11:57   ` Junio C Hamano
  2012-11-10 11:04     ` Nguyen Thai Ngoc Duy
  2012-12-08  4:10   ` [PATCH v2] cache-tree: invalidate i-t-a paths after generating trees Nguyễn Thái Ngọc Duy
  1 sibling, 1 reply; 16+ messages in thread
From: Junio C Hamano @ 2012-11-09 11:57 UTC (permalink / raw
  To: Nguyễn Thái Ngọc Duy; +Cc: git, Jeff King, Jonathon Mah

Nguyễn Thái Ngọc Duy  <pclouds@gmail.com> writes:

> diff --git a/cache-tree.c b/cache-tree.c
> index 28ed657..30a8018 100644
> --- a/cache-tree.c
> +++ b/cache-tree.c
> @@ -381,6 +381,9 @@ int cache_tree_update(struct cache_tree *it,
>  	i = update_one(it, cache, entries, "", 0, flags);
>  	if (i < 0)
>  		return i;
> +	for (i = 0; i < entries; i++)
> +		if (cache[i]->ce_flags & CE_INTENT_TO_ADD)
> +			cache_tree_invalidate_path(it, cache[i]->name);
>  	return 0;
>  }

I notice there is another special case for CE_REMOVE but there is
nothing that adjusts the cache-tree for these entries in the current
codebase.

I suspect the original code before we (perhaps incorrectly) updated
the code not to error out upon I-T-A entries was fine only because
we do not attempt to fully populate the cache-tree during a merge in
the unpack-trees codepath, which will mark the index entries that
are to be removed with CE_REMOVE in the resulting index.

The solution implemented with this patch will break if we start
updating the cache tree after a successful merge in unpack-trees, I
suspect.

An alternative might be to add a "phoney" bit next to "used" in the
cache_tree structure, mark the cache tree as phoney when we skip an
entry marked as CE_REMOVE or CE_ITA, and make the postprocessing
loop this patch adds aware of that bit, instead of iterating over
the index entries; instead, it would recurse the resulting cache
tree and invalidate parts of the tree that have subtrees with the
"phoney" bit set, or something.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] cache-tree: invalidate i-t-a paths after writing trees
  2012-11-09 11:57   ` Junio C Hamano
@ 2012-11-10 11:04     ` Nguyen Thai Ngoc Duy
  2012-11-30  0:06       ` Junio C Hamano
  0 siblings, 1 reply; 16+ messages in thread
From: Nguyen Thai Ngoc Duy @ 2012-11-10 11:04 UTC (permalink / raw
  To: Junio C Hamano; +Cc: git, Jeff King, Jonathon Mah

On Fri, Nov 9, 2012 at 6:57 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Nguyễn Thái Ngọc Duy  <pclouds@gmail.com> writes:
>
>> diff --git a/cache-tree.c b/cache-tree.c
>> index 28ed657..30a8018 100644
>> --- a/cache-tree.c
>> +++ b/cache-tree.c
>> @@ -381,6 +381,9 @@ int cache_tree_update(struct cache_tree *it,
>>       i = update_one(it, cache, entries, "", 0, flags);
>>       if (i < 0)
>>               return i;
>> +     for (i = 0; i < entries; i++)
>> +             if (cache[i]->ce_flags & CE_INTENT_TO_ADD)
>> +                     cache_tree_invalidate_path(it, cache[i]->name);
>>       return 0;
>>  }
>
> I notice there is another special case for CE_REMOVE but there is
> nothing that adjusts the cache-tree for these entries in the current
> codebase.
>
> I suspect the original code before we (perhaps incorrectly) updated
> the code not to error out upon I-T-A entries was fine only because
> we do not attempt to fully populate the cache-tree during a merge in
> the unpack-trees codepath, which will mark the index entries that
> are to be removed with CE_REMOVE in the resulting index.
>
> The solution implemented with this patch will break if we start
> updating the cache tree after a successful merge in unpack-trees, I
> suspect.

I don't understand. I thought we handled CE_REMOVE correctly (i.e. no
CE_REMOVE entries in cache tree even after a successful merge). Or
should we keep CE_REMOVE in cache tree after a successful merge?

> An alternative might be to add a "phoney" bit next to "used" in the
> cache_tree structure, mark the cache tree as phoney when we skip an
> entry marked as CE_REMOVE or CE_ITA, and make the postprocessing
> loop this patch adds aware of that bit, instead of iterating over
> the index entries; instead, it would recurse the resulting cache
> tree and invalidate parts of the tree that have subtrees with the
> "phoney" bit set, or something.

Yeah, that sounds better.
-- 
Duy

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] cache-tree: invalidate i-t-a paths after writing trees
  2012-11-10 11:04     ` Nguyen Thai Ngoc Duy
@ 2012-11-30  0:06       ` Junio C Hamano
  2012-11-30  1:26         ` Nguyen Thai Ngoc Duy
  0 siblings, 1 reply; 16+ messages in thread
From: Junio C Hamano @ 2012-11-30  0:06 UTC (permalink / raw
  To: Nguyen Thai Ngoc Duy; +Cc: git, Jeff King, Jonathon Mah

Nguyen Thai Ngoc Duy <pclouds@gmail.com> writes:

>> An alternative might be to add a "phoney" bit next to "used" in the
>> cache_tree structure, mark the cache tree as phoney when we skip an
>> entry marked as CE_REMOVE or CE_ITA, and make the postprocessing
>> loop this patch adds aware of that bit, instead of iterating over
>> the index entries; instead, it would recurse the resulting cache
>> tree and invalidate parts of the tree that have subtrees with the
>> "phoney" bit set, or something.
>
> Yeah, that sounds better.

Did anything happen to this topic after this?

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] cache-tree: invalidate i-t-a paths after writing trees
  2012-11-30  0:06       ` Junio C Hamano
@ 2012-11-30  1:26         ` Nguyen Thai Ngoc Duy
  0 siblings, 0 replies; 16+ messages in thread
From: Nguyen Thai Ngoc Duy @ 2012-11-30  1:26 UTC (permalink / raw
  To: Junio C Hamano; +Cc: git, Jeff King, Jonathon Mah

On Fri, Nov 30, 2012 at 7:06 AM, Junio C Hamano <gitster@pobox.com> wrote:
> Nguyen Thai Ngoc Duy <pclouds@gmail.com> writes:
>
>>> An alternative might be to add a "phoney" bit next to "used" in the
>>> cache_tree structure, mark the cache tree as phoney when we skip an
>>> entry marked as CE_REMOVE or CE_ITA, and make the postprocessing
>>> loop this patch adds aware of that bit, instead of iterating over
>>> the index entries; instead, it would recurse the resulting cache
>>> tree and invalidate parts of the tree that have subtrees with the
>>> "phoney" bit set, or something.
>>
>> Yeah, that sounds better.
>
> Did anything happen to this topic after this?

Not from my side because I forgot to mark this thread as a todo item
and unsurprisingly forgot about it.
-- 
Duy

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH v2] cache-tree: invalidate i-t-a paths after generating trees
  2012-11-09 11:04 ` [PATCH] cache-tree: invalidate i-t-a paths after writing trees Nguyễn Thái Ngọc Duy
  2012-11-09 11:57   ` Junio C Hamano
@ 2012-12-08  4:10   ` Nguyễn Thái Ngọc Duy
  2012-12-10  6:50     ` Junio C Hamano
  1 sibling, 1 reply; 16+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2012-12-08  4:10 UTC (permalink / raw
  To: git
  Cc: Jeff King, Junio C Hamano, Jonathon Mah,
	Nguyễn Thái Ngọc Duy

Intent-to-add entries used to forbid writing trees so it was not a
problem. After commit 3f6d56d (commit: ignore intent-to-add entries
instead of refusing - 2012-02-07), we can generate trees from an index
with i-t-a entries.

However, the commit forgets to invalidate all paths leading to i-t-a
entries. With fully valid cache-tree (e.g. after commit or
write-tree), diff operations may prefer cache-tree to index and not
see i-t-a entries in the index, because cache-tree does not have them.

Reported-by: Jonathon Mah <me@JonathonMah.com>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 Sorry for the late update, I have little time for git these days.
 It turns out that we don't need phony bit and another round for
 invalidation. We can do it in update_one.

 cache-tree.c          | 30 ++++++++++++++++++++++++++----
 t/t2203-add-intent.sh | 20 ++++++++++++++++++++
 2 files changed, 46 insertions(+), 4 deletions(-)

diff --git a/cache-tree.c b/cache-tree.c
index 28ed657..989a7ff 100644
--- a/cache-tree.c
+++ b/cache-tree.c
@@ -248,6 +248,7 @@ static int update_one(struct cache_tree *it,
 	int missing_ok = flags & WRITE_TREE_MISSING_OK;
 	int dryrun = flags & WRITE_TREE_DRY_RUN;
 	int i;
+	int to_invalidate = 0;
 
 	if (0 <= it->entry_count && has_sha1_file(it->sha1))
 		return it->entry_count;
@@ -324,7 +325,13 @@ static int update_one(struct cache_tree *it,
 			if (!sub)
 				die("cache-tree.c: '%.*s' in '%s' not found",
 				    entlen, path + baselen, path);
-			i += sub->cache_tree->entry_count - 1;
+			i--; /* this entry is already counted in "sub" */
+			if (sub->cache_tree->entry_count < 0) {
+				i -= sub->cache_tree->entry_count;
+				to_invalidate = 1;
+			}
+			else
+				i += sub->cache_tree->entry_count;
 			sha1 = sub->cache_tree->sha1;
 			mode = S_IFDIR;
 		}
@@ -339,8 +346,23 @@ static int update_one(struct cache_tree *it,
 				mode, sha1_to_hex(sha1), entlen+baselen, path);
 		}
 
-		if (ce->ce_flags & (CE_REMOVE | CE_INTENT_TO_ADD))
-			continue; /* entry being removed or placeholder */
+		/*
+		 * CE_REMOVE entries are removed before the index is
+		 * written to disk. Skip them to remain consistent
+		 * with the future on-disk index.
+		 */
+		if (ce->ce_flags & CE_REMOVE)
+			continue;
+
+		/*
+		 * CE_INTENT_TO_ADD entries exist on on-disk index but
+		 * they are not part of generated trees. Invalidate up
+		 * to root to force cache-tree users to read elsewhere.
+		 */
+		if (ce->ce_flags & CE_INTENT_TO_ADD) {
+			to_invalidate = 1;
+			continue;
+		}
 
 		strbuf_grow(&buffer, entlen + 100);
 		strbuf_addf(&buffer, "%o %.*s%c", mode, entlen, path + baselen, '\0');
@@ -360,7 +382,7 @@ static int update_one(struct cache_tree *it,
 	}
 
 	strbuf_release(&buffer);
-	it->entry_count = i;
+	it->entry_count = to_invalidate ? -i : i;
 #if DEBUG
 	fprintf(stderr, "cache-tree update-one (%d ent, %d subtree) %s\n",
 		it->entry_count, it->subtree_nr,
diff --git a/t/t2203-add-intent.sh b/t/t2203-add-intent.sh
index ec35409..2a4a749 100755
--- a/t/t2203-add-intent.sh
+++ b/t/t2203-add-intent.sh
@@ -62,5 +62,25 @@ test_expect_success 'can "commit -a" with an i-t-a entry' '
 	git commit -a -m all
 '
 
+test_expect_success 'cache-tree invalidates i-t-a paths' '
+	git reset --hard &&
+	mkdir dir &&
+	: >dir/foo &&
+	git add dir/foo &&
+	git commit -m foo &&
+
+	: >dir/bar &&
+	git add -N dir/bar &&
+	git diff --cached --name-only >actual &&
+	echo dir/bar >expect &&
+	test_cmp expect actual &&
+
+	git write-tree >/dev/null &&
+
+	git diff --cached --name-only >actual &&
+	echo dir/bar >expect &&
+	test_cmp expect actual
+'
+
 test_done
 
-- 
1.8.0.rc2.23.g1fb49df

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH v2] cache-tree: invalidate i-t-a paths after generating trees
  2012-12-08  4:10   ` [PATCH v2] cache-tree: invalidate i-t-a paths after generating trees Nguyễn Thái Ngọc Duy
@ 2012-12-10  6:50     ` Junio C Hamano
  2012-12-10 11:53       ` Nguyen Thai Ngoc Duy
  0 siblings, 1 reply; 16+ messages in thread
From: Junio C Hamano @ 2012-12-10  6:50 UTC (permalink / raw
  To: Nguyễn Thái Ngọc Duy; +Cc: git, Jeff King, Jonathon Mah

Nguyễn Thái Ngọc Duy  <pclouds@gmail.com> writes:

> diff --git a/cache-tree.c b/cache-tree.c
> index 28ed657..989a7ff 100644
> --- a/cache-tree.c
> +++ b/cache-tree.c
> @@ -248,6 +248,7 @@ static int update_one(struct cache_tree *it,
>  	int missing_ok = flags & WRITE_TREE_MISSING_OK;
>  	int dryrun = flags & WRITE_TREE_DRY_RUN;
>  	int i;
> +	int to_invalidate = 0;
>  
>  	if (0 <= it->entry_count && has_sha1_file(it->sha1))
>  		return it->entry_count;
> @@ -324,7 +325,13 @@ static int update_one(struct cache_tree *it,
>  			if (!sub)
>  				die("cache-tree.c: '%.*s' in '%s' not found",
>  				    entlen, path + baselen, path);
> -			i += sub->cache_tree->entry_count - 1;
> +			i--; /* this entry is already counted in "sub" */
> +			if (sub->cache_tree->entry_count < 0) {
> +				i -= sub->cache_tree->entry_count;
> +				to_invalidate = 1;
> +			}
> +			else
> +				i += sub->cache_tree->entry_count;

Hrm.  update_one() is prepared to see a cache-tree whose entry count
is zero (see the context lines in the previous hunk) and the
invariant for the rest of the code is "if 0 <= entry_count, the
cached tree is valid; invalid cache-tree has -1 in entry_count.
More importantly, entry_count negated does not in general express
how many entries there are in the subtree and does not tell us how
many index entries to skip.

> @@ -339,8 +346,23 @@ static int update_one(struct cache_tree *it,
>  				mode, sha1_to_hex(sha1), entlen+baselen, path);
>  		}
>  
> -		if (ce->ce_flags & (CE_REMOVE | CE_INTENT_TO_ADD))
> -			continue; /* entry being removed or placeholder */
> +		/*
> +		 * CE_REMOVE entries are removed before the index is
> +		 * written to disk. Skip them to remain consistent
> +		 * with the future on-disk index.
> +		 */
> +		if (ce->ce_flags & CE_REMOVE)
> +			continue;
> +
> +		/*
> +		 * CE_INTENT_TO_ADD entries exist on on-disk index but
> +		 * they are not part of generated trees. Invalidate up
> +		 * to root to force cache-tree users to read elsewhere.
> +		 */
> +		if (ce->ce_flags & CE_INTENT_TO_ADD) {
> +			to_invalidate = 1;
> +			continue;
> +		}

Thanks for documenting these.

> @@ -360,7 +382,7 @@ static int update_one(struct cache_tree *it,
>  	}
>  
>  	strbuf_release(&buffer);
> -	it->entry_count = i;
> +	it->entry_count = to_invalidate ? -i : i;

See above.  I am not fundamentally opposed to a change to redefine
entry_count so that it always maintains how many index entries the
subtree covers, even for invalidated subtree, but I do not think
this patch alone is sufficient to maintain such invariant.

>  #if DEBUG
>  	fprintf(stderr, "cache-tree update-one (%d ent, %d subtree) %s\n",
>  		it->entry_count, it->subtree_nr,
> diff --git a/t/t2203-add-intent.sh b/t/t2203-add-intent.sh
> index ec35409..2a4a749 100755
> --- a/t/t2203-add-intent.sh
> +++ b/t/t2203-add-intent.sh
> @@ -62,5 +62,25 @@ test_expect_success 'can "commit -a" with an i-t-a entry' '
>  	git commit -a -m all
>  '
>  
> +test_expect_success 'cache-tree invalidates i-t-a paths' '
> +	git reset --hard &&
> +	mkdir dir &&
> +	: >dir/foo &&
> +	git add dir/foo &&
> +	git commit -m foo &&
> +
> +	: >dir/bar &&
> +	git add -N dir/bar &&
> +	git diff --cached --name-only >actual &&
> +	echo dir/bar >expect &&
> +	test_cmp expect actual &&
> +
> +	git write-tree >/dev/null &&
> +
> +	git diff --cached --name-only >actual &&
> +	echo dir/bar >expect &&
> +	test_cmp expect actual
> +'
> +
>  test_done

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2] cache-tree: invalidate i-t-a paths after generating trees
  2012-12-10  6:50     ` Junio C Hamano
@ 2012-12-10 11:53       ` Nguyen Thai Ngoc Duy
  2012-12-10 17:22         ` Junio C Hamano
  0 siblings, 1 reply; 16+ messages in thread
From: Nguyen Thai Ngoc Duy @ 2012-12-10 11:53 UTC (permalink / raw
  To: Junio C Hamano; +Cc: git, Jeff King, Jonathon Mah

On Mon, Dec 10, 2012 at 1:50 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Nguyễn Thái Ngọc Duy  <pclouds@gmail.com> writes:
>
>> diff --git a/cache-tree.c b/cache-tree.c
>> index 28ed657..989a7ff 100644
>> --- a/cache-tree.c
>> +++ b/cache-tree.c
>> @@ -248,6 +248,7 @@ static int update_one(struct cache_tree *it,
>>       int missing_ok = flags & WRITE_TREE_MISSING_OK;
>>       int dryrun = flags & WRITE_TREE_DRY_RUN;
>>       int i;
>> +     int to_invalidate = 0;
>>
>>       if (0 <= it->entry_count && has_sha1_file(it->sha1))
>>               return it->entry_count;
>> @@ -324,7 +325,13 @@ static int update_one(struct cache_tree *it,
>>                       if (!sub)
>>                               die("cache-tree.c: '%.*s' in '%s' not found",
>>                                   entlen, path + baselen, path);
>> -                     i += sub->cache_tree->entry_count - 1;
>> +                     i--; /* this entry is already counted in "sub" */
>> +                     if (sub->cache_tree->entry_count < 0) {
>> +                             i -= sub->cache_tree->entry_count;
>> +                             to_invalidate = 1;
>> +                     }
>> +                     else
>> +                             i += sub->cache_tree->entry_count;
>
> Hrm.  update_one() is prepared to see a cache-tree whose entry count
> is zero (see the context lines in the previous hunk) and the
> invariant for the rest of the code is "if 0 <= entry_count, the
> cached tree is valid; invalid cache-tree has -1 in entry_count.
> More importantly, entry_count negated does not in general express
> how many entries there are in the subtree and does not tell us how
> many index entries to skip.

Yeah I use entry_count for two different things here. In the previous
(unsent) version of the patch I had "entry_count = -1" right after "i
-= entry_count"

>> +                     if (sub->cache_tree->entry_count < 0) {
>> +                             i -= sub->cache_tree->entry_count;
>> +                             sub->cache_tree->entry_count = -1;
>> +                             to_invalidate = 1;
>> +                     }


which makes it clearer that the use of negative entry count is only
valid within update_one. Then I changed my mind because it says
'negative means "invalid"' in cache-tree.h. So, put "entry_count = -1"
back or just add another field  to struct cache_tree for this?
-- 
Duy

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2] cache-tree: invalidate i-t-a paths after generating trees
  2012-12-10 11:53       ` Nguyen Thai Ngoc Duy
@ 2012-12-10 17:22         ` Junio C Hamano
  2012-12-13  1:39           ` [PATCH v3 1/2] " Nguyễn Thái Ngọc Duy
  0 siblings, 1 reply; 16+ messages in thread
From: Junio C Hamano @ 2012-12-10 17:22 UTC (permalink / raw
  To: Nguyen Thai Ngoc Duy; +Cc: git, Jeff King, Jonathon Mah

Nguyen Thai Ngoc Duy <pclouds@gmail.com> writes:

> Yeah I use entry_count for two different things here. In the previous
> (unsent) version of the patch I had "entry_count = -1" right after "i
> -= entry_count"
>
>>> +                     if (sub->cache_tree->entry_count < 0) {
>>> +                             i -= sub->cache_tree->entry_count;
>>> +                             sub->cache_tree->entry_count = -1;
>>> +                             to_invalidate = 1;
>>> +                     }
>
>
> which makes it clearer that the use of negative entry count is only
> valid within update_one. Then I changed my mind because it says
> 'negative means "invalid"' in cache-tree.h.

But you make it to mean not just 'negative means invalid' but
'negate it and you can learn how many index entries to skip to reach
the entry outside the subtree'.  That is what I was wondering about.

I do not think that new invariant does not hold in general; it may
hold true while inside this function, but immediately after somebody
else calls cache_tree_invalidate_path() it won't be true anymore.
Leaking the information to outside callers that can easily be
mistaken as if it may mean something without any comment looks like
we are asking for trouble.

> So, put "entry_count = -1"
> back or just add another field  to struct cache_tree for this?

As long as it is made clear with in-code comment that says "negative
entry count, when negated, will give us how many index entries are
covered by this invalid cache tree, only inside this function", and
such a negative entry_count is reset to -1 to avoid leaking out the
value that will soon go stale to the outside world in the "write
this tree out" loop, I think the v2 patch is OK, if not ideal.

If we were to commit to keep the new invariant true throughout the
system, on the other hand, I suspect that a boolean flag "valid" may
help people who keep i-t-a entries around across write-tree calls.
The first if () statement in the update_one() function can check the
validity, and you can say the cache tree is valid even if the
section in the index that is covered by the cache-tree contains
i-t-a entries _after_ you wrote it out as a tree, no?  That may make
the semantics a lot cleaner, I suspect.

The new semantics might be:

 * update_one() returns negative only when the section of the index
   given to it cannot be written out as a tree (i.e. not merged, or
   corrupt index);

 * update_one() returns the number of entries in the index covered
   by the tree, including i-t-a entries (but not REMOVED, as these
   entries will not be in the index after the cache-tree has been
   written out);

 * cache_tree->valid tells if the cache_tree->sha1 is valid; the
   section of the index the tree covers may or may not have i-t-a
   entries in it;

 * cache_tree->entry_count holds the number of index entries the
   tree covers, couting i-t-a entries. This field is only meaningful
   when cache_tree->valid is true.

or something like that.

I noticed that the recent "ignore i-t-a only while writing trees
instead of erroring out" broke 331fcb5 (git add --intent-to-add: do
not let an empty blob be committed by accident, 2008-11-28) in
another way, by the way.  In verify_cache(), there is an unreachable
else clause to warn about "not added yet" entries that should have
been removed but is left behind.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH v3 1/2] cache-tree: invalidate i-t-a paths after generating trees
  2012-12-10 17:22         ` Junio C Hamano
@ 2012-12-13  1:39           ` Nguyễn Thái Ngọc Duy
  2012-12-13  1:39             ` [PATCH v3 2/2] cache-tree: remove dead i-t-a code in verify_cache() Nguyễn Thái Ngọc Duy
  2012-12-13  2:04             ` [PATCH v3 1/2] cache-tree: invalidate i-t-a paths after generating trees Junio C Hamano
  0 siblings, 2 replies; 16+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2012-12-13  1:39 UTC (permalink / raw
  To: git
  Cc: Jeff King, Junio C Hamano, Jonathon Mah,
	Nguyễn Thái Ngọc Duy

Intent-to-add entries used to forbid writing trees so it was not a
problem. After commit 3f6d56d (commit: ignore intent-to-add entries
instead of refusing - 2012-02-07), we can generate trees from an index
with i-t-a entries.

However, the commit forgets to invalidate all paths leading to i-t-a
entries. With fully valid cache-tree (e.g. after commit or
write-tree), diff operations may prefer cache-tree to index and not
see i-t-a entries in the index, because cache-tree does not have them.

Reported-by: Jonathon Mah <me@JonathonMah.com>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 This version ensures that entry_count can only be >= -1 after
 update_one returns. Not ideal but good enough.

 cache-tree.c          | 40 ++++++++++++++++++++++++++++++++++++----
 t/t2203-add-intent.sh | 20 ++++++++++++++++++++
 2 files changed, 56 insertions(+), 4 deletions(-)

diff --git a/cache-tree.c b/cache-tree.c
index 28ed657..1fbc81a 100644
--- a/cache-tree.c
+++ b/cache-tree.c
@@ -248,6 +248,7 @@ static int update_one(struct cache_tree *it,
 	int missing_ok = flags & WRITE_TREE_MISSING_OK;
 	int dryrun = flags & WRITE_TREE_DRY_RUN;
 	int i;
+	int to_invalidate = 0;
 
 	if (0 <= it->entry_count && has_sha1_file(it->sha1))
 		return it->entry_count;
@@ -324,7 +325,14 @@ static int update_one(struct cache_tree *it,
 			if (!sub)
 				die("cache-tree.c: '%.*s' in '%s' not found",
 				    entlen, path + baselen, path);
-			i += sub->cache_tree->entry_count - 1;
+			i--; /* this entry is already counted in "sub" */
+			if (sub->cache_tree->entry_count < 0) {
+				i -= sub->cache_tree->entry_count;
+				sub->cache_tree->entry_count = -1;
+				to_invalidate = 1;
+			}
+			else
+				i += sub->cache_tree->entry_count;
 			sha1 = sub->cache_tree->sha1;
 			mode = S_IFDIR;
 		}
@@ -339,8 +347,23 @@ static int update_one(struct cache_tree *it,
 				mode, sha1_to_hex(sha1), entlen+baselen, path);
 		}
 
-		if (ce->ce_flags & (CE_REMOVE | CE_INTENT_TO_ADD))
-			continue; /* entry being removed or placeholder */
+		/*
+		 * CE_REMOVE entries are removed before the index is
+		 * written to disk. Skip them to remain consistent
+		 * with the future on-disk index.
+		 */
+		if (ce->ce_flags & CE_REMOVE)
+			continue;
+
+		/*
+		 * CE_INTENT_TO_ADD entries exist on on-disk index but
+		 * they are not part of generated trees. Invalidate up
+		 * to root to force cache-tree users to read elsewhere.
+		 */
+		if (ce->ce_flags & CE_INTENT_TO_ADD) {
+			to_invalidate = 1;
+			continue;
+		}
 
 		strbuf_grow(&buffer, entlen + 100);
 		strbuf_addf(&buffer, "%o %.*s%c", mode, entlen, path + baselen, '\0');
@@ -360,7 +383,7 @@ static int update_one(struct cache_tree *it,
 	}
 
 	strbuf_release(&buffer);
-	it->entry_count = i;
+	it->entry_count = to_invalidate ? -i : i;
 #if DEBUG
 	fprintf(stderr, "cache-tree update-one (%d ent, %d subtree) %s\n",
 		it->entry_count, it->subtree_nr,
@@ -381,6 +404,15 @@ int cache_tree_update(struct cache_tree *it,
 	i = update_one(it, cache, entries, "", 0, flags);
 	if (i < 0)
 		return i;
+	/*
+	 * update_one() uses negative entry_count as a way to mark an
+	 * entry invalid _and_ pass the number of entries back to
+	 * itself at the parent level. This is for internal use and
+	 * should not be leaked out after the top-level update_one
+	 * exits.
+	 */
+	if (it->entry_count < 0)
+		it->entry_count = -1;
 	return 0;
 }
 
diff --git a/t/t2203-add-intent.sh b/t/t2203-add-intent.sh
index ec35409..2a4a749 100755
--- a/t/t2203-add-intent.sh
+++ b/t/t2203-add-intent.sh
@@ -62,5 +62,25 @@ test_expect_success 'can "commit -a" with an i-t-a entry' '
 	git commit -a -m all
 '
 
+test_expect_success 'cache-tree invalidates i-t-a paths' '
+	git reset --hard &&
+	mkdir dir &&
+	: >dir/foo &&
+	git add dir/foo &&
+	git commit -m foo &&
+
+	: >dir/bar &&
+	git add -N dir/bar &&
+	git diff --cached --name-only >actual &&
+	echo dir/bar >expect &&
+	test_cmp expect actual &&
+
+	git write-tree >/dev/null &&
+
+	git diff --cached --name-only >actual &&
+	echo dir/bar >expect &&
+	test_cmp expect actual
+'
+
 test_done
 
-- 
1.8.0.rc2.23.g1fb49df

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH v3 2/2] cache-tree: remove dead i-t-a code in verify_cache()
  2012-12-13  1:39           ` [PATCH v3 1/2] " Nguyễn Thái Ngọc Duy
@ 2012-12-13  1:39             ` Nguyễn Thái Ngọc Duy
  2012-12-13 18:34               ` Junio C Hamano
  2012-12-13  2:04             ` [PATCH v3 1/2] cache-tree: invalidate i-t-a paths after generating trees Junio C Hamano
  1 sibling, 1 reply; 16+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2012-12-13  1:39 UTC (permalink / raw
  To: git
  Cc: Jeff King, Junio C Hamano, Jonathon Mah,
	Nguyễn Thái Ngọc Duy

This code is added in 331fcb5 (git add --intent-to-add: do not let an
empty blob be committed by accident - 2008-11-28) to forbid committing
when i-t-a entries are present. When we allow that, we forgot to
remove this unreachable code.

Noticed-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 cache-tree.c | 8 ++------
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/cache-tree.c b/cache-tree.c
index 1fbc81a..3e79e27 100644
--- a/cache-tree.c
+++ b/cache-tree.c
@@ -166,12 +166,8 @@ static int verify_cache(struct cache_entry **cache,
 				fprintf(stderr, "...\n");
 				break;
 			}
-			if (ce_stage(ce))
-				fprintf(stderr, "%s: unmerged (%s)\n",
-					ce->name, sha1_to_hex(ce->sha1));
-			else
-				fprintf(stderr, "%s: not added yet\n",
-					ce->name);
+			fprintf(stderr, "%s: unmerged (%s)\n",
+				ce->name, sha1_to_hex(ce->sha1));
 		}
 	}
 	if (funny)
-- 
1.8.0.rc2.23.g1fb49df

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH v3 1/2] cache-tree: invalidate i-t-a paths after generating trees
  2012-12-13  1:39           ` [PATCH v3 1/2] " Nguyễn Thái Ngọc Duy
  2012-12-13  1:39             ` [PATCH v3 2/2] cache-tree: remove dead i-t-a code in verify_cache() Nguyễn Thái Ngọc Duy
@ 2012-12-13  2:04             ` Junio C Hamano
  2012-12-15  2:52               ` Nguyen Thai Ngoc Duy
  1 sibling, 1 reply; 16+ messages in thread
From: Junio C Hamano @ 2012-12-13  2:04 UTC (permalink / raw
  To: Nguyễn Thái Ngọc Duy; +Cc: git, Jeff King, Jonathon Mah

Nguyễn Thái Ngọc Duy  <pclouds@gmail.com> writes:

> Intent-to-add entries used to forbid writing trees so it was not a
> problem. After commit 3f6d56d (commit: ignore intent-to-add entries
> instead of refusing - 2012-02-07), we can generate trees from an index
> with i-t-a entries.
>
> However, the commit forgets to invalidate all paths leading to i-t-a
> entries. With fully valid cache-tree (e.g. after commit or
> write-tree), diff operations may prefer cache-tree to index and not
> see i-t-a entries in the index, because cache-tree does not have them.
>
> Reported-by: Jonathon Mah <me@JonathonMah.com>
> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
> ---
>  This version ensures that entry_count can only be >= -1 after
>  update_one returns. Not ideal but good enough.
>
>  cache-tree.c          | 40 ++++++++++++++++++++++++++++++++++++----
>  t/t2203-add-intent.sh | 20 ++++++++++++++++++++
>  2 files changed, 56 insertions(+), 4 deletions(-)
>
> diff --git a/cache-tree.c b/cache-tree.c
> index 28ed657..1fbc81a 100644
> --- a/cache-tree.c
> +++ b/cache-tree.c
> @@ -248,6 +248,7 @@ static int update_one(struct cache_tree *it,
>  	int missing_ok = flags & WRITE_TREE_MISSING_OK;
>  	int dryrun = flags & WRITE_TREE_DRY_RUN;
>  	int i;
> +	int to_invalidate = 0;
>  
>  	if (0 <= it->entry_count && has_sha1_file(it->sha1))
>  		return it->entry_count;
> @@ -324,7 +325,14 @@ static int update_one(struct cache_tree *it,
>  			if (!sub)
>  				die("cache-tree.c: '%.*s' in '%s' not found",
>  				    entlen, path + baselen, path);
> -			i += sub->cache_tree->entry_count - 1;
> +			i--; /* this entry is already counted in "sub" */

Huh?

The "-1" in the original is the bog-standard compensation for the
for(;;i++) loop.

> +			if (sub->cache_tree->entry_count < 0) {
> +				i -= sub->cache_tree->entry_count;
> +				sub->cache_tree->entry_count = -1;
> +				to_invalidate = 1;
> +			}
> +			else
> +				i += sub->cache_tree->entry_count;

While the rewritten version is not *wrong* per-se, I have a feeling
that it may be much easier to read if written like this:

	if (sub->cache_tree_entry_count < 0) {
		to_invalidate = 1;
                to_skip = 0 - sub->cache_tree_entry_count;
		sub->cache_tree_entry_count = -1;
	} else {
		to_skip = sub->cache_tree_entry_count;
	}
        i += to_skip - 1;

> @@ -360,7 +383,7 @@ static int update_one(struct cache_tree *it,
>  	}
>  
>  	strbuf_release(&buffer);
> -	it->entry_count = i;
> +	it->entry_count = to_invalidate ? -i : i;
>  #if DEBUG
>  	fprintf(stderr, "cache-tree update-one (%d ent, %d subtree) %s\n",
>  		it->entry_count, it->subtree_nr,
> @@ -381,6 +404,15 @@ int cache_tree_update(struct cache_tree *it,
>  	i = update_one(it, cache, entries, "", 0, flags);
>  	if (i < 0)
>  		return i;
> +	/*
> +	 * update_one() uses negative entry_count as a way to mark an
> +	 * entry invalid _and_ pass the number of entries back to
> +	 * itself at the parent level. This is for internal use and
> +	 * should not be leaked out after the top-level update_one
> +	 * exits.
> +	 */
> +	if (it->entry_count < 0)
> +		it->entry_count = -1;

Nice.  I think what entry_count means immediately after update_one()
returned should be commented near the beginning of that function,
too, though.

Thanks.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v3 2/2] cache-tree: remove dead i-t-a code in verify_cache()
  2012-12-13  1:39             ` [PATCH v3 2/2] cache-tree: remove dead i-t-a code in verify_cache() Nguyễn Thái Ngọc Duy
@ 2012-12-13 18:34               ` Junio C Hamano
  0 siblings, 0 replies; 16+ messages in thread
From: Junio C Hamano @ 2012-12-13 18:34 UTC (permalink / raw
  To: Nguyễn Thái Ngọc Duy; +Cc: git, Jeff King, Jonathon Mah

Will replace the one in 'pu' with these two.  Thanks.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v3 1/2] cache-tree: invalidate i-t-a paths after generating trees
  2012-12-13  2:04             ` [PATCH v3 1/2] cache-tree: invalidate i-t-a paths after generating trees Junio C Hamano
@ 2012-12-15  2:52               ` Nguyen Thai Ngoc Duy
  0 siblings, 0 replies; 16+ messages in thread
From: Nguyen Thai Ngoc Duy @ 2012-12-15  2:52 UTC (permalink / raw
  To: Junio C Hamano; +Cc: git, Jeff King, Jonathon Mah

On Thu, Dec 13, 2012 at 9:04 AM, Junio C Hamano <gitster@pobox.com> wrote:
>> @@ -324,7 +325,14 @@ static int update_one(struct cache_tree *it,
>>                       if (!sub)
>>                               die("cache-tree.c: '%.*s' in '%s' not found",
>>                                   entlen, path + baselen, path);
>> -                     i += sub->cache_tree->entry_count - 1;
>> +                     i--; /* this entry is already counted in "sub" */
>
> Huh?
>
> The "-1" in the original is the bog-standard compensation for the
> for(;;i++) loop.

Exactly. It took me a while to figure out what " - 1" was for and I
wanted to avoid that for other developers. Only I worded it badly.
I'll replace the for loop with a while loop to make it clearer...

>
>> +                     if (sub->cache_tree->entry_count < 0) {
>> +                             i -= sub->cache_tree->entry_count;
>> +                             sub->cache_tree->entry_count = -1;
>> +                             to_invalidate = 1;
>> +                     }
>> +                     else
>> +                             i += sub->cache_tree->entry_count;
>
> While the rewritten version is not *wrong* per-se, I have a feeling
> that it may be much easier to read if written like this:
>
>         if (sub->cache_tree_entry_count < 0) {
>                 to_invalidate = 1;
>                 to_skip = 0 - sub->cache_tree_entry_count;
>                 sub->cache_tree_entry_count = -1;
>         } else {
>                 to_skip = sub->cache_tree_entry_count;
>         }
>         i += to_skip - 1;
>

..or this would be fine too. Which way to go?

A while we're still at the cache tree

> -               if (ce->ce_flags & (CE_REMOVE | CE_INTENT_TO_ADD))
> -                       continue; /* entry being removed or placeholder */
> +               /*
> +                * CE_REMOVE entries are removed before the index is
> +                * written to disk. Skip them to remain consistent
> +                * with the future on-disk index.
> +                */
> +               if (ce->ce_flags & CE_REMOVE)
> +                       continue;
> +

A CE_REMOVE entry is removed later from the index, however it is still
counted in entry_count. entry_count serves two purposes: to skip the
number of processed entries in the in-memory index, and to record the
number of entries in the on-disk index. These numbers do not match
when CE_REMOVE is present. We have correct in-memory entry_count,
which means incorrect on-disk entry_count in this case.

I tested with an index that has a/b and a/c. The latter has CE_REMOVE.
After writing cache tree I get:

$ git ls-files --stage
100644 e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 0       a/b
$ ../test-dump-cache-tree
878e27c626266ac04087a203e4bdd396dcf74763  (2 entries, 1 subtrees)
878e27c626266ac04087a203e4bdd396dcf74763 #(ref)  (1 entries, 1 subtrees)
4277b6e69d25e5efa77c455340557b384a4c018a a/ (2 entries, 0 subtrees)
4277b6e69d25e5efa77c455340557b384a4c018a #(ref) a/ (1 entries, 0 subtrees)

If I throw out that index, create a new one with a/b alone and write-tree, I get

$ ../test-dump-cache-tree
878e27c626266ac04087a203e4bdd396dcf74763  (1 entries, 1 subtrees)
4277b6e69d25e5efa77c455340557b384a4c018a a/ (1 entries, 0 subtrees)

Shall we fix this too? I'm thinking of adding "skip" argument to update_one and

   i += sub->cache_tree->entry_count - 1;

would become

   i += sub->cache_tree->entry_count + skip - 1;

and entry_count would always reflect on-disk value. This "skip" can be
reused for this i-t-a patch as well.
-- 
Duy

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2012-12-15  2:53 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2012-11-06  8:53 Bug: write-tree corrupts intent-to-add index state Jonathon Mah
2012-11-06 12:37 ` Nguyen Thai Ngoc Duy
2012-11-09 11:04 ` [PATCH] cache-tree: invalidate i-t-a paths after writing trees Nguyễn Thái Ngọc Duy
2012-11-09 11:57   ` Junio C Hamano
2012-11-10 11:04     ` Nguyen Thai Ngoc Duy
2012-11-30  0:06       ` Junio C Hamano
2012-11-30  1:26         ` Nguyen Thai Ngoc Duy
2012-12-08  4:10   ` [PATCH v2] cache-tree: invalidate i-t-a paths after generating trees Nguyễn Thái Ngọc Duy
2012-12-10  6:50     ` Junio C Hamano
2012-12-10 11:53       ` Nguyen Thai Ngoc Duy
2012-12-10 17:22         ` Junio C Hamano
2012-12-13  1:39           ` [PATCH v3 1/2] " Nguyễn Thái Ngọc Duy
2012-12-13  1:39             ` [PATCH v3 2/2] cache-tree: remove dead i-t-a code in verify_cache() Nguyễn Thái Ngọc Duy
2012-12-13 18:34               ` Junio C Hamano
2012-12-13  2:04             ` [PATCH v3 1/2] cache-tree: invalidate i-t-a paths after generating trees Junio C Hamano
2012-12-15  2:52               ` Nguyen Thai Ngoc Duy

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).