Confused about sparse vs untracked-cache

git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed

* Confused about sparse vs untracked-cache
@ 2015-07-30  2:32 David Turner
  2015-07-30 14:09 ` Duy Nguyen
  0 siblings, 1 reply; 5+ messages in thread
From: David Turner @ 2015-07-30  2:32 UTC (permalink / raw)
  To: git mailing list, Duy Nguyen

I'm looking at dir.c, and there's a bit I'm confused about:

prep_exclude() says:
		     /*
		      * .. and .gitignore does not exist before
		      * (i.e. null exclude_sha1 and skip_worktree is
		      * not set). Then we can skip loading .gitignore,
		      * which would result in ENOENT anyway.
		      * skip_worktree is taken care in read_directory()
		      */
		     !is_null_sha1(untracked->exclude_sha1))) {

That "skip_worktree is taken care in read_directory()" appears to be
referring to this bit of validate_untracked_cache():
	/*
	 * An optimization in prep_exclude() does not play well with
	 * CE_SKIP_WORKTREE. It's a rare case anyway, if a single
	 * entry has that bit set, disable the whole untracked cache.
	 */
	for (i = 0; i < active_nr; i++)
		if (ce_skip_worktree(active_cache[i]))
			return NULL;
------------
I'm confused about why skip_worktree needs to be unset.  When I comment
out the second snippet, all the tests still pass.  What was the reason
behind that condition?  Is it really necessary?

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Confused about sparse vs untracked-cache
  2015-07-30  2:32 Confused about sparse vs untracked-cache David Turner
@ 2015-07-30 14:09 ` Duy Nguyen
  2015-07-30 23:30   ` David Turner
  0 siblings, 1 reply; 5+ messages in thread
From: Duy Nguyen @ 2015-07-30 14:09 UTC (permalink / raw)
  To: David Turner; +Cc: git mailing list

On Thu, Jul 30, 2015 at 9:32 AM, David Turner <dturner@twopensource.com> wrote:
> I'm looking at dir.c, and there's a bit I'm confused about:
>
> prep_exclude() says:
>                      /*
>                       * .. and .gitignore does not exist before
>                       * (i.e. null exclude_sha1 and skip_worktree is
>                       * not set). Then we can skip loading .gitignore,
>                       * which would result in ENOENT anyway.
>                       * skip_worktree is taken care in read_directory()
>                       */
>                      !is_null_sha1(untracked->exclude_sha1))) {
>
> That "skip_worktree is taken care in read_directory()" appears to be
> referring to this bit of validate_untracked_cache():
>         /*
>          * An optimization in prep_exclude() does not play well with
>          * CE_SKIP_WORKTREE. It's a rare case anyway, if a single
>          * entry has that bit set, disable the whole untracked cache.
>          */
>         for (i = 0; i < active_nr; i++)
>                 if (ce_skip_worktree(active_cache[i]))
>                         return NULL;
> ------------
> I'm confused about why skip_worktree needs to be unset.  When I comment
> out the second snippet, all the tests still pass.  What was the reason
> behind that condition?  Is it really necessary?

This code is added in 27b099a (untracked cache: don't open
non-existent .gitignore - 2015-03-08) so it's about non-existent
.gitignore files. We have two cases: .gitignore does not exist, which
we want to avoid opening it, and .gitignore does not exist, but it's
in the index and is skip-worktree'd. We would want to call
add_excludes() anyway in the second case.

I think I followed that train of thought when I wrote this and to
avoid trouble, I just left skip-worktree case of out. But that
ce_skip_worktree() check in read_directory() is probably too strong.
All we need is disable the cache only when there's an .gitignore file
being skip-worktree'd. If we do that and make sure all .gitignore
files are not skip-worktree'd, the two can work toghether.

Back to the problem. The question is, is
is_null_sha1(untracked->exclude_sha1) enough to satisfy both cases? If
so, we don't have to disable the cache in the presence of
skip-worktree. I haven't stared at this code again long enough to be
confident, but I think we may be alright. exclude_sha1 should reflect
the true, effective .gitignore content, wherever it's from. So in
skip-worktree case, is_null_sha1(exclude_sha1) should only be true
when the entry does not exist in both worktree and index. There will
be an unnecessary open() in this case before the index version is
used, but that's probably ok.

No I don't think the tests cover skip-worktree + untracked cache
combination, so yeah it would pass.
-- 
Duy

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Confused about sparse vs untracked-cache
  2015-07-30 14:09 ` Duy Nguyen
@ 2015-07-30 23:30   ` David Turner
  2015-07-31  5:13     ` David Turner
  0 siblings, 1 reply; 5+ messages in thread
From: David Turner @ 2015-07-30 23:30 UTC (permalink / raw)
  To: Duy Nguyen; +Cc: git mailing list

On Thu, 2015-07-30 at 21:09 +0700, Duy Nguyen wrote:
> On Thu, Jul 30, 2015 at 9:32 AM, David Turner <dturner@twopensource.com> wrote:
> > I'm looking at dir.c, and there's a bit I'm confused about:
> >
> > prep_exclude() says:
> >                      /*
> >                       * .. and .gitignore does not exist before
> >                       * (i.e. null exclude_sha1 and skip_worktree is
> >                       * not set). Then we can skip loading .gitignore,
> >                       * which would result in ENOENT anyway.
> >                       * skip_worktree is taken care in read_directory()
> >                       */
> >                      !is_null_sha1(untracked->exclude_sha1))) {
> >
> > That "skip_worktree is taken care in read_directory()" appears to be
> > referring to this bit of validate_untracked_cache():
> >         /*
> >          * An optimization in prep_exclude() does not play well with
> >          * CE_SKIP_WORKTREE. It's a rare case anyway, if a single
> >          * entry has that bit set, disable the whole untracked cache.
> >          */
> >         for (i = 0; i < active_nr; i++)
> >                 if (ce_skip_worktree(active_cache[i]))
> >                         return NULL;
> > ------------
> > I'm confused about why skip_worktree needs to be unset.  When I comment
> > out the second snippet, all the tests still pass.  What was the reason
> > behind that condition?  Is it really necessary?
> 
> This code is added in 27b099a (untracked cache: don't open
> non-existent .gitignore - 2015-03-08) so it's about non-existent
> .gitignore files. We have two cases: .gitignore does not exist, which
> we want to avoid opening it, and .gitignore does not exist, but it's
> in the index and is skip-worktree'd. We would want to call
> add_excludes() anyway in the second case.
> 
> I think I followed that train of thought when I wrote this and to
> avoid trouble, I just left skip-worktree case of out. But that
> ce_skip_worktree() check in read_directory() is probably too strong.
> All we need is disable the cache only when there's an .gitignore file
> being skip-worktree'd. If we do that and make sure all .gitignore
> files are not skip-worktree'd, the two can work toghether.
> 
> Back to the problem. The question is, is
> is_null_sha1(untracked->exclude_sha1) enough to satisfy both cases? If
> so, we don't have to disable the cache in the presence of
> skip-worktree. I haven't stared at this code again long enough to be
> confident, but I think we may be alright. exclude_sha1 should reflect
> the true, effective .gitignore content, wherever it's from. So in
> skip-worktree case, is_null_sha1(exclude_sha1) should only be true
> when the entry does not exist in both worktree and index. There will
> be an unnecessary open() in this case before the index version is
> used, but that's probably ok.
> 
> No I don't think the tests cover skip-worktree + untracked cache
> combination, so yeah it would pass.

I went to add some tests for this, but I ran into what appears to be an
independent bug in the untracked cache code. 

The call to lookup_untracked in treat_directory() passes dirname, but
dirname is the full name of the directory.  lookup_untracked is looking
for the base name -- that is, the code needs to subtract out the length
of not just the parent dir, but all ancestor directories.  So it ends up
creating an entry named e.g. foo/bar inside foo (instead of just finding
an existing 'bar', or adding one called 'bar')

I wasn't really sure what the right way to fix this is, as I just
started looking into this code, but I thought I should report it.

My test case for this was having two levels of subdirectory e.g.
foo/bar/some-file.  

Let me know if you need more details.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Confused about sparse vs untracked-cache
  2015-07-30 23:30   ` David Turner
@ 2015-07-31  5:13     ` David Turner
  2015-07-31 11:28       ` Duy Nguyen
  0 siblings, 1 reply; 5+ messages in thread
From: David Turner @ 2015-07-31  5:13 UTC (permalink / raw)
  To: Duy Nguyen; +Cc: git mailing list

On Thu, 2015-07-30 at 19:30 -0400, David Turner wrote:
> On Thu, 2015-07-30 at 21:09 +0700, Duy Nguyen wrote:
> > On Thu, Jul 30, 2015 at 9:32 AM, David Turner <dturner@twopensource.com> wrote:
> > > I'm looking at dir.c, and there's a bit I'm confused about:
> > >
> > > prep_exclude() says:
> > >                      /*
> > >                       * .. and .gitignore does not exist before
> > >                       * (i.e. null exclude_sha1 and skip_worktree is
> > >                       * not set). Then we can skip loading .gitignore,
> > >                       * which would result in ENOENT anyway.
> > >                       * skip_worktree is taken care in read_directory()
> > >                       */
> > >                      !is_null_sha1(untracked->exclude_sha1))) {
> > >
> > > That "skip_worktree is taken care in read_directory()" appears to be
> > > referring to this bit of validate_untracked_cache():
> > >         /*
> > >          * An optimization in prep_exclude() does not play well with
> > >          * CE_SKIP_WORKTREE. It's a rare case anyway, if a single
> > >          * entry has that bit set, disable the whole untracked cache.
> > >          */
> > >         for (i = 0; i < active_nr; i++)
> > >                 if (ce_skip_worktree(active_cache[i]))
> > >                         return NULL;
> > > ------------
> > > I'm confused about why skip_worktree needs to be unset.  When I comment
> > > out the second snippet, all the tests still pass.  What was the reason
> > > behind that condition?  Is it really necessary?
> > 
> > This code is added in 27b099a (untracked cache: don't open
> > non-existent .gitignore - 2015-03-08) so it's about non-existent
> > .gitignore files. We have two cases: .gitignore does not exist, which
> > we want to avoid opening it, and .gitignore does not exist, but it's
> > in the index and is skip-worktree'd. We would want to call
> > add_excludes() anyway in the second case.
> > 
> > I think I followed that train of thought when I wrote this and to
> > avoid trouble, I just left skip-worktree case of out. But that
> > ce_skip_worktree() check in read_directory() is probably too strong.
> > All we need is disable the cache only when there's an .gitignore file
> > being skip-worktree'd. If we do that and make sure all .gitignore
> > files are not skip-worktree'd, the two can work toghether.
> > 
> > Back to the problem. The question is, is
> > is_null_sha1(untracked->exclude_sha1) enough to satisfy both cases? If
> > so, we don't have to disable the cache in the presence of
> > skip-worktree. I haven't stared at this code again long enough to be
> > confident, but I think we may be alright. exclude_sha1 should reflect
> > the true, effective .gitignore content, wherever it's from. So in
> > skip-worktree case, is_null_sha1(exclude_sha1) should only be true
> > when the entry does not exist in both worktree and index. There will
> > be an unnecessary open() in this case before the index version is
> > used, but that's probably ok.
> > 
> > No I don't think the tests cover skip-worktree + untracked cache
> > combination, so yeah it would pass.
> 
> I went to add some tests for this, but I ran into what appears to be an
> independent bug in the untracked cache code. 
> 
> The call to lookup_untracked in treat_directory() passes dirname, but
> dirname is the full name of the directory.  lookup_untracked is looking
> for the base name -- that is, the code needs to subtract out the length
> of not just the parent dir, but all ancestor directories.  So it ends up
> creating an entry named e.g. foo/bar inside foo (instead of just finding
> an existing 'bar', or adding one called 'bar')
> 
> I wasn't really sure what the right way to fix this is, as I just
> started looking into this code, but I thought I should report it.
> 
> My test case for this was having two levels of subdirectory e.g.
> foo/bar/some-file.  
> 
> Let me know if you need more details.

I should mention that other than that, skip-worktree + untracked cache
seems to work fine.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Confused about sparse vs untracked-cache
  2015-07-31  5:13     ` David Turner
@ 2015-07-31 11:28       ` Duy Nguyen
  0 siblings, 0 replies; 5+ messages in thread
From: Duy Nguyen @ 2015-07-31 11:28 UTC (permalink / raw)
  To: David Turner; +Cc: git mailing list

On Fri, Jul 31, 2015 at 12:13 PM, David Turner <dturner@twopensource.com> wrote:
> I should mention that other than that, skip-worktree + untracked cache
> seems to work fine.

Please go ahead and make a patch to allow it. I'll spend some more
time looking at this code. But I think it'll be fine.
-- 
Duy

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2015-07-31 11:29 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-07-30  2:32 Confused about sparse vs untracked-cache David Turner
2015-07-30 14:09 ` Duy Nguyen
2015-07-30 23:30   ` David Turner
2015-07-31  5:13     ` David Turner
2015-07-31 11:28       ` Duy Nguyen

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).