git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH] dir.c: avoid stat() in valid_cached_dir()
@ 2017-12-28  0:28 Nguyễn Thái Ngọc Duy
  2017-12-28 19:05 ` Junio C Hamano
  2017-12-28 19:50 ` [PATCH] dir.c: avoid stat() in valid_cached_dir() Ævar Arnfjörð Bjarmason
  0 siblings, 2 replies; 13+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2017-12-28  0:28 UTC (permalink / raw)
  To: git; +Cc: Nguyễn Thái Ngọc Duy

stat() may follow a symlink and return stat data of the link's target
instead of the link itself. We are concerned about the link itself.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 I noticed this while looking at the untracked cache bug [1] but
 because it's not directly related to that bug, I'm submitting it
 separately here.

 [1] https://public-inbox.org/git/CACsJy8AmbKSp0mDLRaDCWn45veeNc03B-Gq8r8PQPrDt9bM_EA@mail.gmail.com/

 dir.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/dir.c b/dir.c
index 7c4b45e30e..edcb7bb462 100644
--- a/dir.c
+++ b/dir.c
@@ -1809,7 +1809,7 @@ static int valid_cached_dir(struct dir_struct *dir,
 	 */
 	refresh_fsmonitor(istate);
 	if (!(dir->untracked->use_fsmonitor && untracked->valid)) {
-		if (stat(path->len ? path->buf : ".", &st)) {
+		if (lstat(path->len ? path->buf : ".", &st)) {
 			invalidate_directory(dir->untracked, untracked);
 			memset(&untracked->stat_data, 0, sizeof(untracked->stat_data));
 			return 0;
-- 
2.15.0.320.g0453912d77


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH] dir.c: avoid stat() in valid_cached_dir()
  2017-12-28  0:28 [PATCH] dir.c: avoid stat() in valid_cached_dir() Nguyễn Thái Ngọc Duy
@ 2017-12-28 19:05 ` Junio C Hamano
  2017-12-28 19:10   ` Junio C Hamano
  2017-12-28 19:50 ` [PATCH] dir.c: avoid stat() in valid_cached_dir() Ævar Arnfjörð Bjarmason
  1 sibling, 1 reply; 13+ messages in thread
From: Junio C Hamano @ 2017-12-28 19:05 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy; +Cc: git

Nguyễn Thái Ngọc Duy  <pclouds@gmail.com> writes:

> stat() may follow a symlink and return stat data of the link's target
> instead of the link itself. We are concerned about the link itself.
>
> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
> ---
>  I noticed this while looking at the untracked cache bug [1] but
>  because it's not directly related to that bug, I'm submitting it
>  separately here.

Thanks; it totally makes sense.

>
>  [1] https://public-inbox.org/git/CACsJy8AmbKSp0mDLRaDCWn45veeNc03B-Gq8r8PQPrDt9bM_EA@mail.gmail.com/
>
>  dir.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/dir.c b/dir.c
> index 7c4b45e30e..edcb7bb462 100644
> --- a/dir.c
> +++ b/dir.c
> @@ -1809,7 +1809,7 @@ static int valid_cached_dir(struct dir_struct *dir,
>  	 */
>  	refresh_fsmonitor(istate);
>  	if (!(dir->untracked->use_fsmonitor && untracked->valid)) {
> -		if (stat(path->len ? path->buf : ".", &st)) {
> +		if (lstat(path->len ? path->buf : ".", &st)) {
>  			invalidate_directory(dir->untracked, untracked);
>  			memset(&untracked->stat_data, 0, sizeof(untracked->stat_data));
>  			return 0;

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] dir.c: avoid stat() in valid_cached_dir()
  2017-12-28 19:05 ` Junio C Hamano
@ 2017-12-28 19:10   ` Junio C Hamano
  2018-01-01 23:57     ` Duy Nguyen
  0 siblings, 1 reply; 13+ messages in thread
From: Junio C Hamano @ 2017-12-28 19:10 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy; +Cc: git

Junio C Hamano <gitster@pobox.com> writes:

>>  [1] https://public-inbox.org/git/CACsJy8AmbKSp0mDLRaDCWn45veeNc03B-Gq8r8PQPrDt9bM_EA@mail.gmail.com/
>>
>>  dir.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/dir.c b/dir.c
>> index 7c4b45e30e..edcb7bb462 100644
>> --- a/dir.c
>> +++ b/dir.c
>> @@ -1809,7 +1809,7 @@ static int valid_cached_dir(struct dir_struct *dir,
>>  	 */
>>  	refresh_fsmonitor(istate);
>>  	if (!(dir->untracked->use_fsmonitor && untracked->valid)) {
>> -		if (stat(path->len ? path->buf : ".", &st)) {
>> +		if (lstat(path->len ? path->buf : ".", &st)) {

Hmph, I have to wonder if this is sufficient, though.  When you got
"a/b/c" in path->buf, is it sufficient to avoid mistaking 'c' that
is (eh, rather, "has unexpectedly turned into") a symbolic link to a
directory as a true directory?  Wouldn't we have to be equally careful
about "a" and "a/b" as well?

>>  			invalidate_directory(dir->untracked, untracked);
>>  			memset(&untracked->stat_data, 0, sizeof(untracked->stat_data));
>>  			return 0;

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] dir.c: avoid stat() in valid_cached_dir()
  2017-12-28  0:28 [PATCH] dir.c: avoid stat() in valid_cached_dir() Nguyễn Thái Ngọc Duy
  2017-12-28 19:05 ` Junio C Hamano
@ 2017-12-28 19:50 ` Ævar Arnfjörð Bjarmason
  2018-01-02  0:02   ` Duy Nguyen
  1 sibling, 1 reply; 13+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2017-12-28 19:50 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy; +Cc: git


On Thu, Dec 28 2017, Nguyễn Thái Ngọc Duy jotted:

> stat() may follow a symlink and return stat data of the link's target
> instead of the link itself. We are concerned about the link itself.
>
> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
> ---
>  I noticed this while looking at the untracked cache bug [1] but
>  because it's not directly related to that bug, I'm submitting it
>  separately here.
>
>  [1] https://public-inbox.org/git/CACsJy8AmbKSp0mDLRaDCWn45veeNc03B-Gq8r8PQPrDt9bM_EA@mail.gmail.com/

I'm slowly trying to piece together a re-submission of this whole
series, if this is a bug and not just an optimziation shouldn't there be
some test case that demonstrates this bug?

>  dir.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/dir.c b/dir.c
> index 7c4b45e30e..edcb7bb462 100644
> --- a/dir.c
> +++ b/dir.c
> @@ -1809,7 +1809,7 @@ static int valid_cached_dir(struct dir_struct *dir,
>  	 */
>  	refresh_fsmonitor(istate);
>  	if (!(dir->untracked->use_fsmonitor && untracked->valid)) {
> -		if (stat(path->len ? path->buf : ".", &st)) {
> +		if (lstat(path->len ? path->buf : ".", &st)) {
>  			invalidate_directory(dir->untracked, untracked);
>  			memset(&untracked->stat_data, 0, sizeof(untracked->stat_data));
>  			return 0;

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] dir.c: avoid stat() in valid_cached_dir()
  2017-12-28 19:10   ` Junio C Hamano
@ 2018-01-01 23:57     ` Duy Nguyen
  2018-01-03 20:49       ` [PATCH v2 0/5] untracked cache bug fixes Ævar Arnfjörð Bjarmason
                         ` (5 more replies)
  0 siblings, 6 replies; 13+ messages in thread
From: Duy Nguyen @ 2018-01-01 23:57 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Git Mailing List

On Fri, Dec 29, 2017 at 2:10 AM, Junio C Hamano <gitster@pobox.com> wrote:
> Junio C Hamano <gitster@pobox.com> writes:
>
>>>  [1] https://public-inbox.org/git/CACsJy8AmbKSp0mDLRaDCWn45veeNc03B-Gq8r8PQPrDt9bM_EA@mail.gmail.com/
>>>
>>>  dir.c | 2 +-
>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/dir.c b/dir.c
>>> index 7c4b45e30e..edcb7bb462 100644
>>> --- a/dir.c
>>> +++ b/dir.c
>>> @@ -1809,7 +1809,7 @@ static int valid_cached_dir(struct dir_struct *dir,
>>>       */
>>>      refresh_fsmonitor(istate);
>>>      if (!(dir->untracked->use_fsmonitor && untracked->valid)) {
>>> -            if (stat(path->len ? path->buf : ".", &st)) {
>>> +            if (lstat(path->len ? path->buf : ".", &st)) {
>
> Hmph, I have to wonder if this is sufficient, though.  When you got
> "a/b/c" in path->buf, is it sufficient to avoid mistaking 'c' that
> is (eh, rather, "has unexpectedly turned into") a symbolic link to a
> directory as a true directory?  Wouldn't we have to be equally careful
> about "a" and "a/b" as well?

We do. But the way directory traversal works, I believe a and a/b must
have been checked (by this function too) before we call
valid_cached_dir("a/b/c").
-- 
Duy

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] dir.c: avoid stat() in valid_cached_dir()
  2017-12-28 19:50 ` [PATCH] dir.c: avoid stat() in valid_cached_dir() Ævar Arnfjörð Bjarmason
@ 2018-01-02  0:02   ` Duy Nguyen
  0 siblings, 0 replies; 13+ messages in thread
From: Duy Nguyen @ 2018-01-02  0:02 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: Git Mailing List

On Fri, Dec 29, 2017 at 2:50 AM, Ævar Arnfjörð Bjarmason
<avarab@gmail.com> wrote:
>
> On Thu, Dec 28 2017, Nguyễn Thái Ngọc Duy jotted:
>
>> stat() may follow a symlink and return stat data of the link's target
>> instead of the link itself. We are concerned about the link itself.
>>
>> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
>> ---
>>  I noticed this while looking at the untracked cache bug [1] but
>>  because it's not directly related to that bug, I'm submitting it
>>  separately here.
>>
>>  [1] https://public-inbox.org/git/CACsJy8AmbKSp0mDLRaDCWn45veeNc03B-Gq8r8PQPrDt9bM_EA@mail.gmail.com/
>
> I'm slowly trying to piece together a re-submission of this whole
> series, if this is a bug and not just an optimziation shouldn't there be
> some test case that demonstrates this bug?

It's kind of hard to demonstrate the bug. I think when path->buf is a
symlink, we most likely find that its target's stat data does not
match our cached one, which means we ignore the cache and fall back to
slow path. This is performance issue, not correctness (though we could
still catch it by verying test-dump-untracked-cache, I guess; I could
try writing a test case for this if you want). The less unlikely case
is, link target stat data matches the cached version and we
incorrectly go fast path, ignoring real data on disk. A test for this
may involve manipulating stat data, which may be not portable.
-- 
Duy

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH v2 0/5] untracked cache bug fixes
  2018-01-01 23:57     ` Duy Nguyen
@ 2018-01-03 20:49       ` Ævar Arnfjörð Bjarmason
  2018-01-03 20:49       ` [PATCH v2 1/5] status: add a failing test showing a core.untrackedCache bug Ævar Arnfjörð Bjarmason
                         ` (4 subsequent siblings)
  5 siblings, 0 replies; 13+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-01-03 20:49 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Nguyễn Thái Ngọc Duy, Jeff King,
	Christian Couder, Ben Peart, Eric Sunshine,
	Ævar Arnfjörð Bjarmason

Took me a while to get around to this. This is a replacement for the
patches Duy and I have floating around on the list so far related to
the untracked cache bugs raised recently.

Part of this has been me incorporating Duy's work and writing commit
messages etc. for him.

Nguyễn Thái Ngọc Duy (3):
  dir.c: avoid stat() in valid_cached_dir()
  dir.c: fix missing dir invalidation in untracked code
  dir.c: stop ignoring opendir() error in open_cached_dir()

Ævar Arnfjörð Bjarmason (2):
  status: add a failing test showing a core.untrackedCache bug
  update-index doc: note a fixed bug in the untracked cache

 Documentation/git-update-index.txt | 16 +++++++
 dir.c                              | 33 ++++++++++-----
 t/t7063-status-untracked-cache.sh  | 87 ++++++++++++++++++++++++++++++++++++++
 3 files changed, 125 insertions(+), 11 deletions(-)

-- 
2.15.1.424.g9478a66081


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH v2 1/5] status: add a failing test showing a core.untrackedCache bug
  2018-01-01 23:57     ` Duy Nguyen
  2018-01-03 20:49       ` [PATCH v2 0/5] untracked cache bug fixes Ævar Arnfjörð Bjarmason
@ 2018-01-03 20:49       ` Ævar Arnfjörð Bjarmason
  2018-01-03 20:49       ` [PATCH v2 2/5] dir.c: avoid stat() in valid_cached_dir() Ævar Arnfjörð Bjarmason
                         ` (3 subsequent siblings)
  5 siblings, 0 replies; 13+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-01-03 20:49 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Nguyễn Thái Ngọc Duy, Jeff King,
	Christian Couder, Ben Peart, Eric Sunshine,
	Ævar Arnfjörð Bjarmason

The untracked cache gets confused when a directory is swapped out for
a file. It is easiest to reproduce this by swapping out a directory
with a symlink to another directory, and as the tests show the symlink
case is the only case we've found where "git status" will subsequently
report incorrect information, even though it's possible to otherwise
get the untracked cache into a state where its internal data
structures don't reflect reality.

In the symlink case, whatever files are inside the target of the
symlink will be incorrectly shown as untracked. This issue does not
happen if the symlink links to another file, only if it links to
another directory.

A stand-alone testcase for copying into a terminal:

    (
        rm -rf /tmp/testrepo &&
        git init /tmp/testrepo &&
        cd /tmp/testrepo &&
        mkdir x y &&
        touch x/a y/b &&
        git add x/a y/b &&
        git commit -msnap &&
        git rm -rf y &&
        ln -s x y &&
        git add y &&
        git commit -msnap2 &&
        git checkout HEAD~ &&
        git status &&
        git checkout master &&
        sleep 1 &&
        git status &&
        git status
    )

This will incorrectly show y/a as an untracked file. Both the "git
status" call right before "git checkout master" and the "sleep 1"
after the "checkout master" are needed to reproduce this, presumably
due to the untracked cache tracking on the basis of cached whole
seconds from stat(2).

When git gets into this state, a workaround to fix it is to issue a
one-off:

    git -c core.untrackedCache=false status

For the non-symlink case, the bug is that the output of
test-dump-untracked-cache should not include:

   /one/ 0000000000000000000000000000000000000000 recurse valid

It being in the output implies that cached traversal of root includes
the directory "one" which does not exist on disk anymore.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 t/t7063-status-untracked-cache.sh | 87 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 87 insertions(+)

diff --git a/t/t7063-status-untracked-cache.sh b/t/t7063-status-untracked-cache.sh
index e5fb892f95..dba7f50bbb 100755
--- a/t/t7063-status-untracked-cache.sh
+++ b/t/t7063-status-untracked-cache.sh
@@ -22,6 +22,12 @@ avoid_racy() {
 	sleep 1
 }
 
+status_is_clean() {
+	>../status.expect &&
+	git status --porcelain >../status.actual &&
+	test_cmp ../status.expect ../status.actual
+}
+
 test_lazy_prereq UNTRACKED_CACHE '
 	{ git update-index --test-untracked-cache; ret=$?; } &&
 	test $ret -ne 1
@@ -683,4 +689,85 @@ test_expect_success 'untracked cache survives a commit' '
 	test_cmp ../before ../after
 '
 
+test_expect_success 'teardown worktree' '
+	cd ..
+'
+
+test_expect_success SYMLINKS 'setup worktree for symlink test' '
+	git init worktree-symlink &&
+	cd worktree-symlink &&
+	git config core.untrackedCache true &&
+	mkdir one two &&
+	touch one/file two/file &&
+	git add one/file two/file &&
+	git commit -m"first commit" &&
+	git rm -rf one &&
+	ln -s two one &&
+	git add one &&
+	git commit -m"second commit"
+'
+
+test_expect_failure SYMLINKS '"status" after symlink replacement should be clean with UC=true' '
+	git checkout HEAD~ &&
+	status_is_clean &&
+	status_is_clean &&
+	git checkout master &&
+	avoid_racy &&
+	status_is_clean &&
+	status_is_clean
+'
+
+test_expect_success SYMLINKS '"status" after symlink replacement should be clean with UC=false' '
+	git config core.untrackedCache false &&
+	git checkout HEAD~ &&
+	status_is_clean &&
+	status_is_clean &&
+	git checkout master &&
+	avoid_racy &&
+	status_is_clean &&
+	status_is_clean
+'
+
+test_expect_success 'setup worktree for non-symlink test' '
+	git init worktree-non-symlink &&
+	cd worktree-non-symlink &&
+	git config core.untrackedCache true &&
+	mkdir one two &&
+	touch one/file two/file &&
+	git add one/file two/file &&
+	git commit -m"first commit" &&
+	git rm -rf one &&
+	cp two/file one &&
+	git add one &&
+	git commit -m"second commit"
+'
+
+test_expect_failure '"status" after file replacement should be clean with UC=true' '
+	git checkout HEAD~ &&
+	status_is_clean &&
+	status_is_clean &&
+	git checkout master &&
+	avoid_racy &&
+	status_is_clean &&
+	test-dump-untracked-cache >../actual &&
+	grep -F "recurse valid" ../actual >../actual.grep &&
+	cat >../expect.grep <<EOF &&
+/ 0000000000000000000000000000000000000000 recurse valid
+/two/ 0000000000000000000000000000000000000000 recurse valid
+EOF
+	status_is_clean &&
+	test_cmp ../expect.grep ../actual.grep
+'
+
+test_expect_success '"status" after file replacement should be clean with UC=false' '
+	git config core.untrackedCache false &&
+	git checkout HEAD~ &&
+	status_is_clean &&
+	status_is_clean &&
+	git checkout master &&
+	avoid_racy &&
+	status_is_clean &&
+	status_is_clean
+'
+
 test_done
-- 
2.15.1.424.g9478a66081


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v2 2/5] dir.c: avoid stat() in valid_cached_dir()
  2018-01-01 23:57     ` Duy Nguyen
  2018-01-03 20:49       ` [PATCH v2 0/5] untracked cache bug fixes Ævar Arnfjörð Bjarmason
  2018-01-03 20:49       ` [PATCH v2 1/5] status: add a failing test showing a core.untrackedCache bug Ævar Arnfjörð Bjarmason
@ 2018-01-03 20:49       ` Ævar Arnfjörð Bjarmason
  2018-01-03 20:49       ` [PATCH v2 3/5] dir.c: fix missing dir invalidation in untracked code Ævar Arnfjörð Bjarmason
                         ` (2 subsequent siblings)
  5 siblings, 0 replies; 13+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-01-03 20:49 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Nguyễn Thái Ngọc Duy, Jeff King,
	Christian Couder, Ben Peart, Eric Sunshine

From: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>

stat() may follow a symlink and return stat data of the link's target
instead of the link itself. We are concerned about the link itself.

It's kind of hard to demonstrate the bug. I think when path->buf is a
symlink, we most likely find that its target's stat data does not
match our cached one, which means we ignore the cache and fall back to
slow path.

This is performance issue, not correctness (though we could still
catch it by verifying test-dump-untracked-cache. The less unlikely
case is, link target stat data matches the cached version and we
incorrectly go fast path, ignoring real data on disk. A test for this
may involve manipulating stat data, which may be not portable.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 dir.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/dir.c b/dir.c
index 7c4b45e30e..edcb7bb462 100644
--- a/dir.c
+++ b/dir.c
@@ -1809,7 +1809,7 @@ static int valid_cached_dir(struct dir_struct *dir,
 	 */
 	refresh_fsmonitor(istate);
 	if (!(dir->untracked->use_fsmonitor && untracked->valid)) {
-		if (stat(path->len ? path->buf : ".", &st)) {
+		if (lstat(path->len ? path->buf : ".", &st)) {
 			invalidate_directory(dir->untracked, untracked);
 			memset(&untracked->stat_data, 0, sizeof(untracked->stat_data));
 			return 0;
-- 
2.15.1.424.g9478a66081


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v2 3/5] dir.c: fix missing dir invalidation in untracked code
  2018-01-01 23:57     ` Duy Nguyen
                         ` (2 preceding siblings ...)
  2018-01-03 20:49       ` [PATCH v2 2/5] dir.c: avoid stat() in valid_cached_dir() Ævar Arnfjörð Bjarmason
@ 2018-01-03 20:49       ` Ævar Arnfjörð Bjarmason
  2018-01-03 20:49       ` [PATCH v2 4/5] update-index doc: note a fixed bug in the untracked cache Ævar Arnfjörð Bjarmason
  2018-01-03 20:49       ` [PATCH v2 5/5] dir.c: stop ignoring opendir() error in open_cached_dir() Ævar Arnfjörð Bjarmason
  5 siblings, 0 replies; 13+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-01-03 20:49 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Nguyễn Thái Ngọc Duy, Jeff King,
	Christian Couder, Ben Peart, Eric Sunshine,
	Ævar Arnfjörð Bjarmason

From: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>

Let's start with how create a new directory cache after the last one
becomes invalid (e.g. because its dir mtime has changed...). In
open_cached_dir():

1. We start out with valid_cached_dir() returning false, which should
   call invalidate_directory() to put a directory state back to
   initial state, no untracked entries (untracked_nr zero), no sub
   directory traversal (dirs[].recurse zero).

2. Since the cache cannot be used, we go the slow path opendir() and
   go through items one by one via readdir(). All the directories on
   disk will be added back to the cache (if not already exist in
   dirs[]) and its flag "recurse" gets changed to one to note that
   it's part of the cached dir travesal next time.

3. By the time we reach close_cached_dir() we should have a good
   subdir list in dirs[]. Those with "recurse" flag set are the ones
   present in the on-disk directory. The directory is now marked
   "valid".

Next time read_directory() is called, since the directory is marked
valid, it will skip readdir(), go fast path and traverse through
dirs[] array instead.

Steps one and two need some tight cooperation. If a subdir is removed,
readdir() will not find it and of course we cannot examine/invalidate
it. To make sure removed directories on disk are gone from the cache,
step one must make sure recurse flag of all subdirs are zero.

But that's not true. If "valid" flag is already false, there is a
chance we go straight to the end of valid_cached_dir() without calling
invalidate_directory(). Or we fail to meet the "if (untracked-valid)"
condition and skip over the invalidate_directory().

After step 3, we mark the cache valid. Any stale subdir with incorrect
recurse flag becomes a real subdir next time we traverse the directory
using dirs[] array.

We could avoid this by making sure invalidate_directory() is always
called (therefore dirs[].recurse cleared) at the beginning of
open_cached_dir(). Which is what this patch does.

As to how we get into this situation, the key in the test is this
command

    git checkout master

where "one/file" is replaced with "one" in the index. This index
update triggers untracked_cache_invalidate_path(), which clears valid
flag of the root directory while keeping "recurse" flag on the subdir
"one" on. On the next git-status, we go through steps 1-3 above and
save an incorrect cache on disk. The second git-status blindly follows
the bad cache data and shows the problem.

This is arguably because of a bad design where "recurse" flag plays
double roles: whether a directory should be saved on disk, and whether
it is part of a directory traversal.

We need to keep recurse flag set at "checkout master" because of the
first role: we need to keep subdir caches (dir "two" for example has
not been touched at all, no reason to throw its cache away).

As long as we make sure to ignore/reset "recurse" flag at the
beginning of a directory traversal, we're good. But maybe eventually
we should separate these two roles.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 dir.c                             | 22 ++++++++++++++--------
 t/t7063-status-untracked-cache.sh |  4 ++--
 2 files changed, 16 insertions(+), 10 deletions(-)

diff --git a/dir.c b/dir.c
index edcb7bb462..163ca69df0 100644
--- a/dir.c
+++ b/dir.c
@@ -774,7 +774,16 @@ static void invalidate_directory(struct untracked_cache *uc,
 				 struct untracked_cache_dir *dir)
 {
 	int i;
-	uc->dir_invalidated++;
+
+	/*
+	 * Invalidation increment here is just roughly correct. If
+	 * untracked_nr or any of dirs[].recurse is non-zero, we
+	 * should increment dir_invalidated too. But that's more
+	 * expensive to do.
+	 */
+	if (dir->valid)
+		uc->dir_invalidated++;
+
 	dir->valid = 0;
 	dir->untracked_nr = 0;
 	for (i = 0; i < dir->dirs_nr; i++)
@@ -1810,23 +1819,18 @@ static int valid_cached_dir(struct dir_struct *dir,
 	refresh_fsmonitor(istate);
 	if (!(dir->untracked->use_fsmonitor && untracked->valid)) {
 		if (lstat(path->len ? path->buf : ".", &st)) {
-			invalidate_directory(dir->untracked, untracked);
 			memset(&untracked->stat_data, 0, sizeof(untracked->stat_data));
 			return 0;
 		}
 		if (!untracked->valid ||
 			match_stat_data_racy(istate, &untracked->stat_data, &st)) {
-			if (untracked->valid)
-				invalidate_directory(dir->untracked, untracked);
 			fill_stat_data(&untracked->stat_data, &st);
 			return 0;
 		}
 	}
 
-	if (untracked->check_only != !!check_only) {
-		invalidate_directory(dir->untracked, untracked);
+	if (untracked->check_only != !!check_only)
 		return 0;
-	}
 
 	/*
 	 * prep_exclude will be called eventually on this directory,
@@ -1858,8 +1862,10 @@ static int open_cached_dir(struct cached_dir *cdir,
 	if (valid_cached_dir(dir, untracked, istate, path, check_only))
 		return 0;
 	cdir->fdir = opendir(path->len ? path->buf : ".");
-	if (dir->untracked)
+	if (dir->untracked) {
+		invalidate_directory(dir->untracked, untracked);
 		dir->untracked->dir_opened++;
+	}
 	if (!cdir->fdir)
 		return -1;
 	return 0;
diff --git a/t/t7063-status-untracked-cache.sh b/t/t7063-status-untracked-cache.sh
index dba7f50bbb..46b947824f 100755
--- a/t/t7063-status-untracked-cache.sh
+++ b/t/t7063-status-untracked-cache.sh
@@ -707,7 +707,7 @@ test_expect_success SYMLINKS 'setup worktree for symlink test' '
 	git commit -m"second commit"
 '
 
-test_expect_failure SYMLINKS '"status" after symlink replacement should be clean with UC=true' '
+test_expect_success SYMLINKS '"status" after symlink replacement should be clean with UC=true' '
 	git checkout HEAD~ &&
 	status_is_clean &&
 	status_is_clean &&
@@ -742,7 +742,7 @@ test_expect_success 'setup worktree for non-symlink test' '
 	git commit -m"second commit"
 '
 
-test_expect_failure '"status" after file replacement should be clean with UC=true' '
+test_expect_success '"status" after file replacement should be clean with UC=true' '
 	git checkout HEAD~ &&
 	status_is_clean &&
 	status_is_clean &&
-- 
2.15.1.424.g9478a66081


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v2 4/5] update-index doc: note a fixed bug in the untracked cache
  2018-01-01 23:57     ` Duy Nguyen
                         ` (3 preceding siblings ...)
  2018-01-03 20:49       ` [PATCH v2 3/5] dir.c: fix missing dir invalidation in untracked code Ævar Arnfjörð Bjarmason
@ 2018-01-03 20:49       ` Ævar Arnfjörð Bjarmason
  2018-01-03 20:49       ` [PATCH v2 5/5] dir.c: stop ignoring opendir() error in open_cached_dir() Ævar Arnfjörð Bjarmason
  5 siblings, 0 replies; 13+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-01-03 20:49 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Nguyễn Thái Ngọc Duy, Jeff King,
	Christian Couder, Ben Peart, Eric Sunshine,
	Ævar Arnfjörð Bjarmason

Document the bug tested for in my "status: add a failing test showing
a core.untrackedCache bug" and fixed in Duy's "dir.c: fix missing dir
invalidation in untracked code".

Since this is very likely something others will encounter in the
future on older versions, and it's not obvious how to fix it let's
document both that it exists, and how to "fix" it with a one-off
command.

As noted in that commit, even though this bug gets the untracked cache
into a bad state, we have not yet found a case where this is user
visible, and thus it makes sense for these docs to focus on the
symlink case only.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 Documentation/git-update-index.txt | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/Documentation/git-update-index.txt b/Documentation/git-update-index.txt
index bdb0342593..128e0c671f 100644
--- a/Documentation/git-update-index.txt
+++ b/Documentation/git-update-index.txt
@@ -464,6 +464,22 @@ command reads the index; while when `--[no-|force-]untracked-cache`
 are used, the untracked cache is immediately added to or removed from
 the index.
 
+Before 2.16, the untracked cache had a bug where replacing a directory
+with a symlink to another directory could cause it to incorrectly show
+files tracked by git as untracked. See the "status: add a failing test
+showing a core.untrackedCache bug" commit to git.git. A workaround for
+that was (and this might work for other undiscoverd bugs in the
+future):
+
+----------------
+$ git -c core.untrackedCache=false status
+----------------
+
+This bug has also been shown to affect non-symlink cases of replacing
+a directory with a file when it comes to the internal structures of
+the untracked cache, but no case has been found where this resulted in
+wrong "git status" output.
+
 File System Monitor
 -------------------
 
-- 
2.15.1.424.g9478a66081


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v2 5/5] dir.c: stop ignoring opendir() error in open_cached_dir()
  2018-01-01 23:57     ` Duy Nguyen
                         ` (4 preceding siblings ...)
  2018-01-03 20:49       ` [PATCH v2 4/5] update-index doc: note a fixed bug in the untracked cache Ævar Arnfjörð Bjarmason
@ 2018-01-03 20:49       ` Ævar Arnfjörð Bjarmason
  2018-01-07 12:44         ` Duy Nguyen
  5 siblings, 1 reply; 13+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-01-03 20:49 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Nguyễn Thái Ngọc Duy, Jeff King,
	Christian Couder, Ben Peart, Eric Sunshine,
	Ævar Arnfjörð Bjarmason

From: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>

A follow-up to the recently fixed bugs in the untracked
invalidation. If opendir() fails it should show a warning, perhaps
this should die, but if this ever happens the error is probably
recoverable for the user, and dying would just make things worse.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 dir.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/dir.c b/dir.c
index 163ca69df0..a605e01692 100644
--- a/dir.c
+++ b/dir.c
@@ -1857,17 +1857,22 @@ static int open_cached_dir(struct cached_dir *cdir,
 			   struct strbuf *path,
 			   int check_only)
 {
+	const char *c_path;
+
 	memset(cdir, 0, sizeof(*cdir));
 	cdir->untracked = untracked;
 	if (valid_cached_dir(dir, untracked, istate, path, check_only))
 		return 0;
-	cdir->fdir = opendir(path->len ? path->buf : ".");
+	c_path = path->len ? path->buf : ".";
+	cdir->fdir = opendir(c_path);
 	if (dir->untracked) {
 		invalidate_directory(dir->untracked, untracked);
 		dir->untracked->dir_opened++;
 	}
-	if (!cdir->fdir)
+	if (!cdir->fdir) {
+		warning_errno(_("could not open directory '%s'"), c_path);
 		return -1;
+	}
 	return 0;
 }
 
-- 
2.15.1.424.g9478a66081


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH v2 5/5] dir.c: stop ignoring opendir() error in open_cached_dir()
  2018-01-03 20:49       ` [PATCH v2 5/5] dir.c: stop ignoring opendir() error in open_cached_dir() Ævar Arnfjörð Bjarmason
@ 2018-01-07 12:44         ` Duy Nguyen
  0 siblings, 0 replies; 13+ messages in thread
From: Duy Nguyen @ 2018-01-07 12:44 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Git Mailing List, Junio C Hamano, Jeff King, Christian Couder,
	Ben Peart, Eric Sunshine

On Thu, Jan 4, 2018 at 3:49 AM, Ævar Arnfjörð Bjarmason
<avarab@gmail.com> wrote:
> From: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
>
> A follow-up to the recently fixed bugs in the untracked
> invalidation. If opendir() fails it should show a warning, perhaps
> this should die, but if this ever happens the error is probably
> recoverable for the user, and dying would just make things worse.
>
> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
> ---
>  dir.c | 9 +++++++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
>
> diff --git a/dir.c b/dir.c
> index 163ca69df0..a605e01692 100644
> --- a/dir.c
> +++ b/dir.c
> @@ -1857,17 +1857,22 @@ static int open_cached_dir(struct cached_dir *cdir,
>                            struct strbuf *path,
>                            int check_only)
>  {
> +       const char *c_path;
> +
>         memset(cdir, 0, sizeof(*cdir));
>         cdir->untracked = untracked;
>         if (valid_cached_dir(dir, untracked, istate, path, check_only))
>                 return 0;
> -       cdir->fdir = opendir(path->len ? path->buf : ".");
> +       c_path = path->len ? path->buf : ".";
> +       cdir->fdir = opendir(c_path);
>         if (dir->untracked) {
>                 invalidate_directory(dir->untracked, untracked);
>                 dir->untracked->dir_opened++;
>         }
> -       if (!cdir->fdir)
> +       if (!cdir->fdir) {
> +               warning_errno(_("could not open directory '%s'"), c_path);

This should be closer to opendir(). The code in between,
invalidate_directory(), could have modified errno and we would print
incorrect message here.

But you can't simply move the whole "if (!cdir->fdir) { .. }" block up
either because we want to invalidate before returning -1, so perhaps

       cdir->fdir = opendir(c_path);
       if (!cdir->fdir)
               warning_errno(_("could not open directory '%s'"), c_path);
       if (dir->untracked) { ... }
       if (!cdir->fdir)
               return -1;

>                 return -1;
> +       }
>         return 0;
>  }
>
> --
> 2.15.1.424.g9478a66081
>



-- 
Duy

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2018-01-07 12:45 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-12-28  0:28 [PATCH] dir.c: avoid stat() in valid_cached_dir() Nguyễn Thái Ngọc Duy
2017-12-28 19:05 ` Junio C Hamano
2017-12-28 19:10   ` Junio C Hamano
2018-01-01 23:57     ` Duy Nguyen
2018-01-03 20:49       ` [PATCH v2 0/5] untracked cache bug fixes Ævar Arnfjörð Bjarmason
2018-01-03 20:49       ` [PATCH v2 1/5] status: add a failing test showing a core.untrackedCache bug Ævar Arnfjörð Bjarmason
2018-01-03 20:49       ` [PATCH v2 2/5] dir.c: avoid stat() in valid_cached_dir() Ævar Arnfjörð Bjarmason
2018-01-03 20:49       ` [PATCH v2 3/5] dir.c: fix missing dir invalidation in untracked code Ævar Arnfjörð Bjarmason
2018-01-03 20:49       ` [PATCH v2 4/5] update-index doc: note a fixed bug in the untracked cache Ævar Arnfjörð Bjarmason
2018-01-03 20:49       ` [PATCH v2 5/5] dir.c: stop ignoring opendir() error in open_cached_dir() Ævar Arnfjörð Bjarmason
2018-01-07 12:44         ` Duy Nguyen
2017-12-28 19:50 ` [PATCH] dir.c: avoid stat() in valid_cached_dir() Ævar Arnfjörð Bjarmason
2018-01-02  0:02   ` Duy Nguyen

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).