git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH 0/1] Preserve the untracked cache across checkout, reset --hard, etc
@ 2017-05-08  9:41 Johannes Schindelin
  2017-05-08  9:41 ` [PATCH 1/1] unpack-trees: preserve index extensions Johannes Schindelin
  2017-05-08 10:12 ` [PATCH 0/1] Preserve the untracked cache across checkout, reset --hard, etc Christian Couder
  0 siblings, 2 replies; 8+ messages in thread
From: Johannes Schindelin @ 2017-05-08  9:41 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Nguyễn Thái Ngọc Duy, Ben Peart

I recently sent out a request for assistance, after noticing that the
untracked cache is simply thrown away after operations such as
`git checkout` or `git reset --hard`:

http://public-inbox.org/git/alpine.DEB.2.20.1705031202470.3480@virtualbox/

Duy responded with some high-level reasoning that it should be possible
to simply reuse the untracked cache data structure in the new index, as
he had a gut feeling that "we do invalidation right".

I did not have time to back that up by a thorough analysis of the code,
but it turns out that it is unnecessary: Ben Peart pointed me to a patch
of Dave Turner's that was submitted as part of the watchman series,
addressing the very issue about which I was concerned.

And I trust Dave to have validated the idea that the untracked cache
invalidation "is done right" even when we simply move the pointer to a
different index_state struct than originally.

Seeing as the untracked cache being dropped unceremoniously when it
should not be dropped, in a surprising number of operations, I think it
is a sensible change, and important, too, and independent enough from
the watchman patches to merit being separated out and applied pretty
soon.

So what I did was simply to drop the two lines from this patch that
referred to index_state fields added by Dave's watchman patch series.

Please do not mistake this for a sign that I am disinterested in
watchman support, far from it... stay tuned ;-)

Oh, and I adjusted Dave's email address. Dave, is that okay?

As we are in a feature freeze phase, I was debating whether to send out
this patch now or later.

Having thought about it for quite a bit, I am now convinced that this
patch fixes a bug in the untracked cache feature that is so critical as
to render it useless: if you

- have to switch between branches frequently, or
- rebase frequently (which calls `git reset --hard`), or
- stash frequently (which calls `git reset --hard`),

it is as if you had not enabled the untracked cache at all. Even worse,
Git will do a ton of work to recreate the untracked cache and to store
it as an index extension, *just* to throw the untracked away in the end.


David Turner (1):
  unpack-trees: preserve index extensions

 cache.h                           |  1 +
 read-cache.c                      |  6 ++++++
 t/t7063-status-untracked-cache.sh | 22 ++++++++++++++++++++++
 unpack-trees.c                    |  1 +
 4 files changed, 30 insertions(+)


base-commit: 4fa66c85f11bc5a541462ca5ae3246aa0ce02e74
Published-As: https://github.com/dscho/git/releases/tag/preserve-untracked-cache-v1
Fetch-It-Via: git fetch https://github.com/dscho/git preserve-untracked-cache-v1

-- 
2.12.2.windows.2.800.gede8f145e06


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH 1/1] unpack-trees: preserve index extensions
  2017-05-08  9:41 [PATCH 0/1] Preserve the untracked cache across checkout, reset --hard, etc Johannes Schindelin
@ 2017-05-08  9:41 ` Johannes Schindelin
  2017-05-08 10:12 ` [PATCH 0/1] Preserve the untracked cache across checkout, reset --hard, etc Christian Couder
  1 sibling, 0 replies; 8+ messages in thread
From: Johannes Schindelin @ 2017-05-08  9:41 UTC (permalink / raw)
  To: git
  Cc: David Turner, Junio C Hamano,
	Nguyễn Thái Ngọc Duy, Ben Peart

From: David Turner <dturner@twosigma.com>

Make git checkout (and other unpack_tree operations) preserve the
untracked cache. This is valuable for two reasons:

1. Often, an unpack_tree operation will not touch large parts of the
working tree, and thus most of the untracked cache will continue to be
valid.

2. Even if the untracked cache were entirely invalidated by such an
operation, the user has signaled their intention to have such a cache,
and we don't want to throw it away.

[jes: backed out the watchman-specific parts]

Signed-off-by: David Turner <dturner@twosigma.com>
Signed-off-by: Ben Peart <benpeart@microsoft.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 cache.h                           |  1 +
 read-cache.c                      |  6 ++++++
 t/t7063-status-untracked-cache.sh | 22 ++++++++++++++++++++++
 unpack-trees.c                    |  1 +
 4 files changed, 30 insertions(+)

diff --git a/cache.h b/cache.h
index e1f0e182ad0..d41336dfd5d 100644
--- a/cache.h
+++ b/cache.h
@@ -597,6 +597,7 @@ extern int read_index_unmerged(struct index_state *);
 #define CLOSE_LOCK		(1 << 1)
 extern int write_locked_index(struct index_state *, struct lock_file *lock, unsigned flags);
 extern int discard_index(struct index_state *);
+extern void move_index_extensions(struct index_state *dst, struct index_state *src);
 extern int unmerged_index(const struct index_state *);
 extern int verify_path(const char *path);
 extern int strcmp_offset(const char *s1, const char *s2, size_t *first_change);
diff --git a/read-cache.c b/read-cache.c
index 0d0081a11b8..79827a4d710 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -2628,3 +2628,9 @@ void stat_validity_update(struct stat_validity *sv, int fd)
 		fill_stat_data(sv->sd, &st);
 	}
 }
+
+void move_index_extensions(struct index_state *dst, struct index_state *src)
+{
+	dst->untracked = src->untracked;
+	src->untracked = NULL;
+}
diff --git a/t/t7063-status-untracked-cache.sh b/t/t7063-status-untracked-cache.sh
index 0667bd9dd3e..e5fb892f957 100755
--- a/t/t7063-status-untracked-cache.sh
+++ b/t/t7063-status-untracked-cache.sh
@@ -661,4 +661,26 @@ test_expect_success 'test ident field is working' '
 	test_i18ncmp ../expect ../err
 '
 
+test_expect_success 'untracked cache survives a checkout' '
+	git commit --allow-empty -m empty &&
+	test-dump-untracked-cache >../before &&
+	test_when_finished  "git checkout master" &&
+	git checkout -b other_branch &&
+	test-dump-untracked-cache >../after &&
+	test_cmp ../before ../after &&
+	test_commit test &&
+	test-dump-untracked-cache >../before &&
+	git checkout master &&
+	test-dump-untracked-cache >../after &&
+	test_cmp ../before ../after
+'
+
+test_expect_success 'untracked cache survives a commit' '
+	test-dump-untracked-cache >../before &&
+	git add done/two &&
+	git commit -m commit &&
+	test-dump-untracked-cache >../after &&
+	test_cmp ../before ../after
+'
+
 test_done
diff --git a/unpack-trees.c b/unpack-trees.c
index aa15111fefc..17117bd0fda 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -1391,6 +1391,7 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options
 						  WRITE_TREE_SILENT |
 						  WRITE_TREE_REPAIR);
 		}
+		move_index_extensions(&o->result, o->dst_index);
 		discard_index(o->dst_index);
 		*o->dst_index = o->result;
 	} else {
-- 
2.12.2.windows.2.800.gede8f145e06

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH 0/1] Preserve the untracked cache across checkout, reset --hard, etc
  2017-05-08  9:41 [PATCH 0/1] Preserve the untracked cache across checkout, reset --hard, etc Johannes Schindelin
  2017-05-08  9:41 ` [PATCH 1/1] unpack-trees: preserve index extensions Johannes Schindelin
@ 2017-05-08 10:12 ` Christian Couder
  2017-05-08 15:58   ` David Turner
  1 sibling, 1 reply; 8+ messages in thread
From: Christian Couder @ 2017-05-08 10:12 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: git, Junio C Hamano, Nguyễn Thái Ngọc Duy,
	Ben Peart, David Turner

(Adding Dave in Cc as it looks like he is involved.)

On Mon, May 8, 2017 at 11:41 AM, Johannes Schindelin
<johannes.schindelin@gmx.de> wrote:
> I recently sent out a request for assistance, after noticing that the
> untracked cache is simply thrown away after operations such as
> `git checkout` or `git reset --hard`:
>
> http://public-inbox.org/git/alpine.DEB.2.20.1705031202470.3480@virtualbox/
>
> Duy responded with some high-level reasoning that it should be possible
> to simply reuse the untracked cache data structure in the new index, as
> he had a gut feeling that "we do invalidation right".
>
> I did not have time to back that up by a thorough analysis of the code,
> but it turns out that it is unnecessary: Ben Peart pointed me to a patch
> of Dave Turner's that was submitted as part of the watchman series,
> addressing the very issue about which I was concerned.
>
> And I trust Dave to have validated the idea that the untracked cache
> invalidation "is done right" even when we simply move the pointer to a
> different index_state struct than originally.
>
> Seeing as the untracked cache being dropped unceremoniously when it
> should not be dropped, in a surprising number of operations, I think it
> is a sensible change, and important, too, and independent enough from
> the watchman patches to merit being separated out and applied pretty
> soon.
>
> So what I did was simply to drop the two lines from this patch that
> referred to index_state fields added by Dave's watchman patch series.
>
> Please do not mistake this for a sign that I am disinterested in
> watchman support, far from it... stay tuned ;-)
>
> Oh, and I adjusted Dave's email address. Dave, is that okay?
>
> As we are in a feature freeze phase, I was debating whether to send out
> this patch now or later.
>
> Having thought about it for quite a bit, I am now convinced that this
> patch fixes a bug in the untracked cache feature that is so critical as
> to render it useless: if you
>
> - have to switch between branches frequently, or
> - rebase frequently (which calls `git reset --hard`), or
> - stash frequently (which calls `git reset --hard`),
>
> it is as if you had not enabled the untracked cache at all. Even worse,
> Git will do a ton of work to recreate the untracked cache and to store
> it as an index extension, *just* to throw the untracked away in the end.
>
>
> David Turner (1):
>   unpack-trees: preserve index extensions
>
>  cache.h                           |  1 +
>  read-cache.c                      |  6 ++++++
>  t/t7063-status-untracked-cache.sh | 22 ++++++++++++++++++++++
>  unpack-trees.c                    |  1 +
>  4 files changed, 30 insertions(+)
>
>
> base-commit: 4fa66c85f11bc5a541462ca5ae3246aa0ce02e74
> Published-As: https://github.com/dscho/git/releases/tag/preserve-untracked-cache-v1
> Fetch-It-Via: git fetch https://github.com/dscho/git preserve-untracked-cache-v1
>
> --
> 2.12.2.windows.2.800.gede8f145e06
>

^ permalink raw reply	[flat|nested] 8+ messages in thread

* RE: [PATCH 0/1] Preserve the untracked cache across checkout, reset --hard, etc
  2017-05-08 10:12 ` [PATCH 0/1] Preserve the untracked cache across checkout, reset --hard, etc Christian Couder
@ 2017-05-08 15:58   ` David Turner
  2017-05-09  5:02     ` Junio C Hamano
  0 siblings, 1 reply; 8+ messages in thread
From: David Turner @ 2017-05-08 15:58 UTC (permalink / raw)
  To: 'Christian Couder', Johannes Schindelin
  Cc: git, Junio C Hamano, Nguyễn Thái Ngọc Duy,
	Ben Peart

Can you actually keep the email address as my Twopensource one?  I want to make sure that Twitter, my employer at the time, gets credit for this work (just as I want to make sure that my current employer, Two Sigma, gets credit for my current work).

Please feel free to add Signed-off-by: David Turner <dturner@twosigma.com> in case that makes tracking easier.

Thanks.

WRT the actual patch, I want to note that past me did not do a great job here.  The tests do not correctly check that the post-checkout untracked cache is still valid after a checkout.  For example, let's say that previously, the directory foo was entirely untracked (but it contained a file bar), but after the checkout, there is a file foo/baz.  Does the untracked cache need to get updated?  

Unfortunately, the untracked cache is very unlikely to make it to the top of my priority list any time soon, so I won't be able to correct this test (and, if necessary, correct the code).    But I would strongly suggest that the test be improved before this code is merged.

Thanks for CCing me.

> -----Original Message-----
> From: Christian Couder [mailto:christian.couder@gmail.com]
> Sent: Monday, May 8, 2017 6:12 AM
> To: Johannes Schindelin <johannes.schindelin@gmx.de>
> Cc: git <git@vger.kernel.org>; Junio C Hamano <gitster@pobox.com>; Nguyễn
> Thái Ngọc Duy <pclouds@gmail.com>; Ben Peart <benpeart@microsoft.com>;
> David Turner <David.Turner@twosigma.com>
> Subject: Re: [PATCH 0/1] Preserve the untracked cache across checkout, reset --
> hard, etc
> 
> (Adding Dave in Cc as it looks like he is involved.)
> 
> On Mon, May 8, 2017 at 11:41 AM, Johannes Schindelin
> <johannes.schindelin@gmx.de> wrote:
> > I recently sent out a request for assistance, after noticing that the
> > untracked cache is simply thrown away after operations such as `git
> > checkout` or `git reset --hard`:
> >
> > http://public-inbox.org/git/alpine.DEB.2.20.1705031202470.3480@virtual
> > box/
> >
> > Duy responded with some high-level reasoning that it should be
> > possible to simply reuse the untracked cache data structure in the new
> > index, as he had a gut feeling that "we do invalidation right".
> >
> > I did not have time to back that up by a thorough analysis of the
> > code, but it turns out that it is unnecessary: Ben Peart pointed me to
> > a patch of Dave Turner's that was submitted as part of the watchman
> > series, addressing the very issue about which I was concerned.
> >
> > And I trust Dave to have validated the idea that the untracked cache
> > invalidation "is done right" even when we simply move the pointer to a
> > different index_state struct than originally.
> >
> > Seeing as the untracked cache being dropped unceremoniously when it
> > should not be dropped, in a surprising number of operations, I think
> > it is a sensible change, and important, too, and independent enough
> > from the watchman patches to merit being separated out and applied
> > pretty soon.
> >
> > So what I did was simply to drop the two lines from this patch that
> > referred to index_state fields added by Dave's watchman patch series.
> >
> > Please do not mistake this for a sign that I am disinterested in
> > watchman support, far from it... stay tuned ;-)
> >
> > Oh, and I adjusted Dave's email address. Dave, is that okay?
> >
> > As we are in a feature freeze phase, I was debating whether to send
> > out this patch now or later.
> >
> > Having thought about it for quite a bit, I am now convinced that this
> > patch fixes a bug in the untracked cache feature that is so critical
> > as to render it useless: if you
> >
> > - have to switch between branches frequently, or
> > - rebase frequently (which calls `git reset --hard`), or
> > - stash frequently (which calls `git reset --hard`),
> >
> > it is as if you had not enabled the untracked cache at all. Even
> > worse, Git will do a ton of work to recreate the untracked cache and
> > to store it as an index extension, *just* to throw the untracked away in the
> end.
> >
> >
> > David Turner (1):
> >   unpack-trees: preserve index extensions
> >
> >  cache.h                           |  1 +
> >  read-cache.c                      |  6 ++++++
> >  t/t7063-status-untracked-cache.sh | 22 ++++++++++++++++++++++
> >  unpack-trees.c                    |  1 +
> >  4 files changed, 30 insertions(+)
> >
> >
> > base-commit: 4fa66c85f11bc5a541462ca5ae3246aa0ce02e74
> > Published-As:
> > https://github.com/dscho/git/releases/tag/preserve-untracked-cache-v1
> > Fetch-It-Via: git fetch https://github.com/dscho/git
> > preserve-untracked-cache-v1
> >
> > --
> > 2.12.2.windows.2.800.gede8f145e06
> >

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 0/1] Preserve the untracked cache across checkout, reset --hard, etc
  2017-05-08 15:58   ` David Turner
@ 2017-05-09  5:02     ` Junio C Hamano
  2017-05-09 12:51       ` Ben Peart
  0 siblings, 1 reply; 8+ messages in thread
From: Junio C Hamano @ 2017-05-09  5:02 UTC (permalink / raw)
  To: David Turner
  Cc: 'Christian Couder', Johannes Schindelin, git,
	Nguyễn Thái Ngọc Duy, Ben Peart

David Turner <David.Turner@twosigma.com> writes:

> Can you actually keep the email address as my Twopensource one?  I want to make sure that Twitter, my employer at the time, gets credit for this work (just as I want to make sure that my current employer, Two Sigma, gets credit for my current work).
>
> Please feel free to add Signed-off-by: David Turner <dturner@twosigma.com> in case that makes tracking easier.
>
> Thanks.
>
> WRT the actual patch, I want to note that past me did not do a
> great job here.  The tests do not correctly check that the
> post-checkout untracked cache is still valid after a checkout.
> For example, let's say that previously, the directory foo was
> entirely untracked (but it contained a file bar), but after the
> checkout, there is a file foo/baz.  Does the untracked cache need
> to get updated?
>
> Unfortunately, the untracked cache is very unlikely to make it to
> the top of my priority list any time soon, so I won't be able to
> correct this test (and, if necessary, correct the code).  But I
> would strongly suggest that the test be improved before this code
> is merged.
>
> Thanks for CCing me.

I will try to find time to tweak what was sent to the list here to
reflect your affiliations better, but marked with DONTMERGE waiting
for the necessary updates you mentioned above, so that this change
is not forgotten.  It may turn out to be that copying from src to
dst like the patch does is all that is needed, or the cache may need
further invalidation when the copying happens, and I haven't got a
good feeling that anybody who are familiar with the codepath vetted
the correctness from seeing the discussion from sidelines (yet).

Thanks.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 0/1] Preserve the untracked cache across checkout, reset --hard, etc
  2017-05-09  5:02     ` Junio C Hamano
@ 2017-05-09 12:51       ` Ben Peart
  2017-05-18 16:58         ` Ben Peart
  0 siblings, 1 reply; 8+ messages in thread
From: Ben Peart @ 2017-05-09 12:51 UTC (permalink / raw)
  To: Junio C Hamano, David Turner
  Cc: 'Christian Couder', Johannes Schindelin, git,
	Nguyễn Thái Ngọc Duy, Ben Peart


On 5/9/2017 1:02 AM, Junio C Hamano wrote:
> David Turner <David.Turner@twosigma.com> writes:
>
>> Can you actually keep the email address as my Twopensource one?  I want to make sure that Twitter, my employer at the time, gets credit for this work (just as I want to make sure that my current employer, Two Sigma, gets credit for my current work).
>>
>> Please feel free to add Signed-off-by: David Turner <dturner@twosigma.com> in case that makes tracking easier.
>>
>> Thanks.
>>
>> WRT the actual patch, I want to note that past me did not do a
>> great job here.  The tests do not correctly check that the
>> post-checkout untracked cache is still valid after a checkout.
>> For example, let's say that previously, the directory foo was
>> entirely untracked (but it contained a file bar), but after the
>> checkout, there is a file foo/baz.  Does the untracked cache need
>> to get updated?
>>
>> Unfortunately, the untracked cache is very unlikely to make it to
>> the top of my priority list any time soon, so I won't be able to
>> correct this test (and, if necessary, correct the code).  But I
>> would strongly suggest that the test be improved before this code
>> is merged.
>>
>> Thanks for CCing me.
> I will try to find time to tweak what was sent to the list here to
> reflect your affiliations better, but marked with DONTMERGE waiting
> for the necessary updates you mentioned above, so that this change
> is not forgotten.  It may turn out to be that copying from src to
> dst like the patch does is all that is needed, or the cache may need
> further invalidation when the copying happens, and I haven't got a
> good feeling that anybody who are familiar with the codepath vetted
> the correctness from seeing the discussion from sidelines (yet).
>
> Thanks.

I've been looking into similar issues with the cache associated with 
using a file system monitor (aka Watchman) 
(https://github.com/git-for-windows/git/compare/master...benpeart:fsmonitor) 
to speed updating the index to reflect changes in the working directory.

I can take a look and see if this patch results in valid results and 
reply to the thread or resubmit as needed.

Ben

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 0/1] Preserve the untracked cache across checkout, reset --hard, etc
  2017-05-09 12:51       ` Ben Peart
@ 2017-05-18 16:58         ` Ben Peart
  2017-05-18 17:30           ` David Turner
  0 siblings, 1 reply; 8+ messages in thread
From: Ben Peart @ 2017-05-18 16:58 UTC (permalink / raw)
  To: Junio C Hamano, David Turner
  Cc: 'Christian Couder', Johannes Schindelin, git,
	Nguyễn Thái Ngọc Duy, Ben Peart



On 5/9/2017 8:51 AM, Ben Peart wrote:
>
> On 5/9/2017 1:02 AM, Junio C Hamano wrote:
>> David Turner <David.Turner@twosigma.com> writes:
>>
>>> Can you actually keep the email address as my Twopensource one?  I
>>> want to make sure that Twitter, my employer at the time, gets credit
>>> for this work (just as I want to make sure that my current employer,
>>> Two Sigma, gets credit for my current work).
>>>
>>> Please feel free to add Signed-off-by: David Turner
>>> <dturner@twosigma.com> in case that makes tracking easier.
>>>
>>> Thanks.
>>>
>>> WRT the actual patch, I want to note that past me did not do a
>>> great job here.  The tests do not correctly check that the
>>> post-checkout untracked cache is still valid after a checkout.
>>> For example, let's say that previously, the directory foo was
>>> entirely untracked (but it contained a file bar), but after the
>>> checkout, there is a file foo/baz.  Does the untracked cache need
>>> to get updated?
>>>
>>> Unfortunately, the untracked cache is very unlikely to make it to
>>> the top of my priority list any time soon, so I won't be able to
>>> correct this test (and, if necessary, correct the code).  But I
>>> would strongly suggest that the test be improved before this code
>>> is merged.
>>>
>>> Thanks for CCing me.
>> I will try to find time to tweak what was sent to the list here to
>> reflect your affiliations better, but marked with DONTMERGE waiting
>> for the necessary updates you mentioned above, so that this change
>> is not forgotten.  It may turn out to be that copying from src to
>> dst like the patch does is all that is needed, or the cache may need
>> further invalidation when the copying happens, and I haven't got a
>> good feeling that anybody who are familiar with the codepath vetted
>> the correctness from seeing the discussion from sidelines (yet).
>>
>> Thanks.
>
> I've been looking into similar issues with the cache associated with
> using a file system monitor (aka Watchman)
> (https://github.com/git-for-windows/git/compare/master...benpeart:fsmonitor)
> to speed updating the index to reflect changes in the working directory.
>
> I can take a look and see if this patch results in valid results and
> reply to the thread or resubmit as needed.
>
> Ben

TLDR: the patch looks good from my perspective but I'd like the experts 
to weigh in as well.

After digging into the untracked cache code and thinking about whether 
it is reasonable to copy the cache from the old index to the new index 
in unpack_trees() I believe the answer is "yes."  I'm not the expert in 
this code so I'll outline my reasoning here and hopefully the real 
experts can review it and see if I've missed something.

The interesting part of the untracked cache for this discussion is the 
list of untracked_cache_dir structures.  Because each directory cache 
entry contains stat_data (esp ctime and mtime) for that directory - the 
existing logic will detect if that directory has had any changes made in 
it since the cache entry was saved.  It doesn't really care when, why, 
or how the change was made, just if one has happened.

I then tried to think of ways that this logic could be broken (like 
David's example above) but was unsuccessful in coming up with any.  This 
makes sense because the untracked cache obviously has to correctly 
detect _any_ change so really doesn't care whether it's cached state was 
initially saved before or after a call to unpack_trees().

Even scenarios of creating files in sub-directories of sub-directories 
works because eventually, either is a directory or file is created in a 
cached directory entry which will change the mtime of that directory and 
invalidate that part of the cache.

Ultimately, it is this behavior of saving the mtime of each cached 
directory that makes this all work as each entry can be 
validated/invalidated separately from all the rest and independently 
from the index from which they came.


Once I did the code examination and thinking exercise, I wanted to test 
it out and see if the theory held up.  I started out with some manual 
testing (esp of the scenario David mentioned) and then wrote a couple of 
additional tests all of which passed.

I then ran all existing git tests with the patch applied and they all 
passed.  This only really tells us that it didn't break anything because 
untracked cache is turned off by default but it does show us that it 
still passes the untracked cache specific test cases (as they obviously 
turn it on).

I then modified the test_create_repo() function in test-lib-functions.sh 
to turn on the untracked cache feature after creating the test repo and 
ran all the tests again twice - the first time without the patch and 
again with the patch).  This run is more interesting because it is 
testing that having the untracked cache turned (with and without the 
patch) on doesn't break anything.

There were two test scripts that had failures:

t7063-status-untracked-cache.sh failed the test "not ok 1 - 
core.untrackedCache is unset"  This is actually a positive result 
because it is showing that I successfully turned on the untracked cache 
feature.

t1700-split-index.sh failed several tests in both runs (with and without 
patch) and upon examining the tests and their failures they are to be 
expected and do not indicate any bug.  Specifically, the failures were 
caused because the tests check the sha of the index against a hard coded 
value in the test script.  Because the untracked cache is turned on, the 
sha of the index does not match that hard coded value.  I edited several 
of the tests to update the sha they are checking against to match the 
sha actually generated and the tests pass.

In the end, both my code examination and all the testing I was able to 
do give me some confidence that the patch will produce valid results. 
However, I'm not the expert in this area so I'd like the experts to 
weigh in on any potential issues this patch may cause that I've missed.

Thanks,

Ben

^ permalink raw reply	[flat|nested] 8+ messages in thread

* RE: [PATCH 0/1] Preserve the untracked cache across checkout, reset --hard, etc
  2017-05-18 16:58         ` Ben Peart
@ 2017-05-18 17:30           ` David Turner
  0 siblings, 0 replies; 8+ messages in thread
From: David Turner @ 2017-05-18 17:30 UTC (permalink / raw)
  To: 'Ben Peart', Junio C Hamano
  Cc: 'Christian Couder', Johannes Schindelin, git,
	Nguyễn Thái Ngọc Duy, Ben Peart

> -----Original Message-----
> From: Ben Peart [mailto:peartben@gmail.com]
> Sent: Thursday, May 18, 2017 12:58 PM
> To: Junio C Hamano <gitster@pobox.com>; David Turner
> <David.Turner@twosigma.com>
> Cc: 'Christian Couder' <christian.couder@gmail.com>; Johannes Schindelin
> <johannes.schindelin@gmx.de>; git <git@vger.kernel.org>; Nguyễn Thái Ngọc
> Duy <pclouds@gmail.com>; Ben Peart <benpeart@microsoft.com>
> Subject: Re: [PATCH 0/1] Preserve the untracked cache across checkout, reset --
> hard, etc
> 
> 
> 
> On 5/9/2017 8:51 AM, Ben Peart wrote:
> >
> > On 5/9/2017 1:02 AM, Junio C Hamano wrote:
> >> David Turner <David.Turner@twosigma.com> writes:
> >>
> >>> Can you actually keep the email address as my Twopensource one?  I
> >>> want to make sure that Twitter, my employer at the time, gets credit
> >>> for this work (just as I want to make sure that my current employer,
> >>> Two Sigma, gets credit for my current work).
> >>>
> >>> Please feel free to add Signed-off-by: David Turner
> >>> <dturner@twosigma.com> in case that makes tracking easier.
> >>>
> >>> Thanks.
> >>>
> >>> WRT the actual patch, I want to note that past me did not do a great
> >>> job here.  The tests do not correctly check that the post-checkout
> >>> untracked cache is still valid after a checkout.
> >>> For example, let's say that previously, the directory foo was
> >>> entirely untracked (but it contained a file bar), but after the
> >>> checkout, there is a file foo/baz.  Does the untracked cache need to
> >>> get updated?
> >>>
> >>> Unfortunately, the untracked cache is very unlikely to make it to
> >>> the top of my priority list any time soon, so I won't be able to
> >>> correct this test (and, if necessary, correct the code).  But I
> >>> would strongly suggest that the test be improved before this code is
> >>> merged.
> >>>
> >>> Thanks for CCing me.
> >> I will try to find time to tweak what was sent to the list here to
> >> reflect your affiliations better, but marked with DONTMERGE waiting
> >> for the necessary updates you mentioned above, so that this change is
> >> not forgotten.  It may turn out to be that copying from src to dst
> >> like the patch does is all that is needed, or the cache may need
> >> further invalidation when the copying happens, and I haven't got a
> >> good feeling that anybody who are familiar with the codepath vetted
> >> the correctness from seeing the discussion from sidelines (yet).
> >>
> >> Thanks.
> >
> > I've been looking into similar issues with the cache associated with
> > using a file system monitor (aka Watchman)
> > (https://github.com/git-for-windows/git/compare/master...benpeart:fsmo
> > nitor) to speed updating the index to reflect changes in the working
> > directory.
> >
> > I can take a look and see if this patch results in valid results and
> > reply to the thread or resubmit as needed.
> >
> > Ben
> 
> TLDR: the patch looks good from my perspective but I'd like the experts to weigh
> in as well.

Thanks for looking into this.  I'm glad to learn that I got it right the first time, 
although I still wish I had been more assiduous about testing back then. 

> After digging into the untracked cache code and thinking about whether it is
> reasonable to copy the cache from the old index to the new index in
> unpack_trees() I believe the answer is "yes."  I'm not the expert in this code so I'll
> outline my reasoning here and hopefully the real experts can review it and see if
> I've missed something.
> 
> The interesting part of the untracked cache for this discussion is the list of
> untracked_cache_dir structures.  Because each directory cache entry contains
> stat_data (esp ctime and mtime) for that directory - the existing logic will detect
> if that directory has had any changes made in it since the cache entry was saved.
> It doesn't really care when, why, or how the change was made, just if one has
> happened.
> 
> I then tried to think of ways that this logic could be broken (like David's example
> above) but was unsuccessful in coming up with any.  This makes sense because
> the untracked cache obviously has to correctly detect _any_ change so really
> doesn't care whether it's cached state was initially saved before or after a call to
> unpack_trees().

It looks like unpack_trees calls (somewhere -- I didn't investigate the full call chain) 
untracked_cache_invalidate_entry.  It looks like my patch adds the move *after* 
any invalidation, though, so I think this is OK.

> Even scenarios of creating files in sub-directories of sub-directories works
> because eventually, either is a directory or file is created in a cached directory
> entry which will change the mtime of that directory and invalidate that part of
> the cache.
> 
> Ultimately, it is this behavior of saving the mtime of each cached directory that
> makes this all work as each entry can be validated/invalidated separately from
> all the rest and independently from the index from which they came.
> 
> 
> Once I did the code examination and thinking exercise, I wanted to test it out
> and see if the theory held up.  I started out with some manual testing (esp of the
> scenario David mentioned) and then wrote a couple of additional tests all of
> which passed.
> 
> I then ran all existing git tests with the patch applied and they all passed.  This
> only really tells us that it didn't break anything because untracked cache is
> turned off by default but it does show us that it still passes the untracked cache
> specific test cases (as they obviously turn it on).
> 
> I then modified the test_create_repo() function in test-lib-functions.sh to turn
> on the untracked cache feature after creating the test repo and ran all the tests
> again twice - the first time without the patch and again with the patch).  This run
> is more interesting because it is testing that having the untracked cache turned
> (with and without the
> patch) on doesn't break anything.
> 
> There were two test scripts that had failures:
> 
> t7063-status-untracked-cache.sh failed the test "not ok 1 - core.untrackedCache
> is unset"  This is actually a positive result because it is showing that I successfully
> turned on the untracked cache feature.
> 
> t1700-split-index.sh failed several tests in both runs (with and without
> patch) and upon examining the tests and their failures they are to be expected
> and do not indicate any bug.  Specifically, the failures were caused because the
> tests check the sha of the index against a hard coded value in the test script.
> Because the untracked cache is turned on, the sha of the index does not match
> that hard coded value.  I edited several of the tests to update the sha they are
> checking against to match the sha actually generated and the tests pass.
> 
> In the end, both my code examination and all the testing I was able to do give
> me some confidence that the patch will produce valid results.
> However, I'm not the expert in this area so I'd like the experts to weigh in on any
> potential issues this patch may cause that I've missed.

This testing seems sufficient to me, assuming that the new automated tests make it 
into the patch.  Thanks for finishing this up -- it had slipped my mind entirely.


^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2017-05-18 17:31 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-05-08  9:41 [PATCH 0/1] Preserve the untracked cache across checkout, reset --hard, etc Johannes Schindelin
2017-05-08  9:41 ` [PATCH 1/1] unpack-trees: preserve index extensions Johannes Schindelin
2017-05-08 10:12 ` [PATCH 0/1] Preserve the untracked cache across checkout, reset --hard, etc Christian Couder
2017-05-08 15:58   ` David Turner
2017-05-09  5:02     ` Junio C Hamano
2017-05-09 12:51       ` Ben Peart
2017-05-18 16:58         ` Ben Peart
2017-05-18 17:30           ` David Turner

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).