git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Jeff King <peff@peff.net>
To: Elijah Newren <newren@gmail.com>
Cc: Git Mailing List <git@vger.kernel.org>, Samuel Lijin <sxlijin@gmail.com>
Subject: Re: [RFC PATCH 4/7] dir: Directories should be checked for matching pathspecs too
Date: Thu, 5 Apr 2018 15:31:24 -0400	[thread overview]
Message-ID: <20180405193124.GA24643@sigill.intra.peff.net> (raw)
In-Reply-To: <CABPp-BEnFiEnao0NqU3GerYkpxO9fJadQLHo6_PZ-hXLZfbbdg@mail.gmail.com>

On Thu, Apr 05, 2018 at 12:15:59PM -0700, Elijah Newren wrote:

> On Thu, Apr 5, 2018 at 11:58 AM, Jeff King <peff@peff.net> wrote:
> 
> > It sounds like correct_untracked_entries() is doing the wrong thing, and
> > it should be aware of the pathspec-matching when culling entries. In
> > other words, my understanding was that read_directory() does not
> > necessarily promise to cull fully (which is what led to cf424f5fd in the
> > first place), and callers are forced to apply their own pathspecs.
> >
> > The distinction is academic for this particular bug, but it makes me
> > wonder if there are other cases where "clean" needs to be more careful
> > with what comes out of dir.c.
> 
> Interesting, I read things very differently.  Looking back at commit
> 6b1db43109ab ("clean: teach clean -d to preserve ignored paths",
> 2017-05-23), which introduced correct_untracked_entries(), I thought
> that correct_untracked_entries() wasn't there to correct pathspec
> issues with fill_directory(), but instead to special case the handling
> of files which are both untracked and ignored.  Did I mis-read or were
> there other commits that changed the semantics?
> 
> Also, it would just seem odd to me that fill_directory() requires
> pathspecs, and it uses those pathspecs, but it doesn't guarantee that
> the files it returns matches them.  That seems like an API ripe for
> mis-use, especially since I don't see any comment in the code about
> such an assumption.  Is that really the expectation?

To be honest, I don't know. Most of dir.c predates me, and I've tried to
avoid looking at it too hard. But I had a vague recollection of it being
"best effort", and this bit from cf424f5fd89b reinforces that:

  However, read_directory does not actually check against our pathspec.
  It uses a simplified version that may turn up false positives. As a
  result, we need to check that any hits match our pathspec.

So I don't know that correct_untracked_entries() is there to fix the
pathspec handling. But I think that anybody who looks at the output of
fill_directory() does need to be aware that they may get more entries
than they expected, and has to apply the pathspecs themselves. And
that's what that extra dir_path_match() call in cmd_clean() is
there for (it used to be match_pathspec before some renaming).

I agree it's an error-prone interface. I don't know all the conditions
under which dir.c might return extra entries, but it seems like it might
be sane for it to do a final pathspec-matching pass so that callers
don't have to. That would mean that correct_untracked_entries() sees the
correctly culled list, and the extra check in cmd_clean() could be
dropped.

Ideally, of course, we'd fix those individual cases, since that would be
more efficient. And your patch may be the right first step in that
direction. But since we don't know what all of them are, it seems ripe
for regressions.

-Peff

  reply	other threads:[~2018-04-05 19:31 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-04-05 17:34 [RFC PATCH 0/7] Fix `git clean` with pathspecs Elijah Newren
2018-04-05 17:34 ` [RFC PATCH 1/7] dir.c: Fix typo in comment Elijah Newren
2018-04-05 17:34 ` [RFC PATCH 2/7] dir.c: fix off-by-one error in match_pathspec_item Elijah Newren
2018-04-05 17:49   ` Jeff King
2018-04-05 18:36     ` Elijah Newren
2018-04-05 19:04       ` Jeff King
2018-04-05 20:06         ` Elijah Newren
2018-04-06 17:53           ` Jeff King
2018-04-05 17:34 ` [RFC PATCH 3/7] t7300: Add some testcases showing failure to clean specified pathspecs Elijah Newren
2018-04-05 17:34 ` [RFC PATCH 4/7] dir: Directories should be checked for matching pathspecs too Elijah Newren
2018-04-05 18:58   ` Jeff King
2018-04-05 19:15     ` Elijah Newren
2018-04-05 19:31       ` Jeff King [this message]
2018-04-09  2:07         ` Junio C Hamano
2018-04-05 17:34 ` [RFC PATCH 5/7] dir: Make the DO_MATCH_SUBMODULE code reusable for a non-submodule case Elijah Newren
2018-04-05 17:34 ` [RFC PATCH 6/7] dir: If our pathspec might match files under a dir, recurse into it Elijah Newren
2018-04-05 17:34 ` [RFC PATCH 7/7] If we do not want globs to recurse into subdirs without -d Elijah Newren

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180405193124.GA24643@sigill.intra.peff.net \
    --to=peff@peff.net \
    --cc=git@vger.kernel.org \
    --cc=newren@gmail.com \
    --cc=sxlijin@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).