git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Brandon Williams <bmwill@google.com>
To: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Cc: "Nguyễn Thái Ngọc Duy" <pclouds@gmail.com>, git@vger.kernel.org
Subject: Re: Bug in pathspec handling (in conjunction with submodules)
Date: Tue, 28 Nov 2017 15:06:24 -0800	[thread overview]
Message-ID: <20171128230624.GA194092@google.com> (raw)
In-Reply-To: <alpine.DEB.2.21.1.1711260210060.6482@virtualbox>

On 11/26, Johannes Schindelin wrote:
> Hi Duy & Brandon,
> 
> in 74ed43711fd (grep: enable recurse-submodules to work on <tree> objects,
> 2016-12-16), the do_match() function in tree-walk.c was changed so that it
> can recurse across submodule boundaries.
> 
> However, there is a bug, and I *think* there may be two bugs actually. Or
> even three.
> 
> First of all, here is an MCVE that I distilled from
> https://github.com/git-for-windows/git/issues/1371:
> 
> 	git init repo
> 	cd repo
> 
> 	git init submodule
> 	git -C submodule commit -m initial --allow-empty
> 
> 	touch "[bracket]"
> 	git add "[bracket]"
> 	git commit -m bracket
> 	git add submodule
> 	git commit -m submodule
> 
> 	git rev-list HEAD -- "[bracket]"
> 
> Nothing fancy, just adding a file with brackets in the name, then a
> submodule, then showing the commit history filtered by the funny file
> name.
> 
> However, the log prints *both* commits. Clearly the submodule commit
> should *not* be shown.
> 
> Now, how does this all happen?
> 
> Since the pathspec contains brackets, parse_pathspec() marks it as
> containing wildcards and sets nowildcard_len to 0.
> 
> Now, note that [bracket] *is* a wildcard expression: it should only match
> a single character that is one of  a, b, c, e, k, r or t.
> 
> I think this is the first bug: `git rev-list` should not even match the
> commit that adds the file [bracket] because its file name does not match
> that expression. From where I sit, it would appear that f1a2ddbbc2d
> (tree_entry_interesting(): optimize wildcard matching when base is
> matched, 2010-12-15) simply added the fnmatch() code without disabling the
> literal match_entry() code when the pathspec contains a pattern.

I can see both sides to this, wanting to try matching literally first
and then trying the wildcards, so I don't really have an opinion on
how/if we should fix that.

> 
> But it does not stop there: there is *another* bug which causes the
> pattern to somehow match the submodule. I *guess* the idea of
> https://github.com/git/git/commit/74ed43711#diff-7a08243175f2cae66aedf53f7dce3bdfR1015
> was to allow a pattern like *.c to match files in a submodule, but the
> pattern [bracket] should not match any file in submodule/. I think that
> that code needs to be a little bit more careful to try to match the
> submodule's name against the pattern (it seems to interpret nowildcard_len
> == 0 to mean that the wildcard is `*`).

This is a much bigger issue and I'm surprised it took this long to find
this bug.  And of course its due to one of my earlier contributions to
the project :)

> 
> However, the commit introducing that code wanted to teach *grep* (not
> *rev-list*) a new trick, and it relies on the `recursive` flag of the
> pathspec to be set.

This is the root cause of the bug.  The added code to match against
submodules was intended to allow for matching past submodule boundaries
for those commands (like grep) which are recursing submodules.  So
really there should be an additional flag which is passed in to trigger
this logic instead of relying on the recursive flag of the pathspec.  Or
we can add a recurse_submodules flag to the pathspec struct and respect
that flag instead of the 'recursive' flag.

I have a quick patch to do just that which I'll send shortly.

> 
> And now it gets really interesting. Or confusing, depending on your mental
> condition. This recursive flag of the pathspec is set in
> ll_diff_tree_paths() (yep, changing the flag in the passed-in opt
> structure... which I found a bit... unexpected, given the function name, I
> would have been less surprised if that function only diff'ed the trees and
> used the options without changing the options). That flag-change was
> introduced in
> https://github.com/git/git/commit/bc96cc87dbb2#diff-15203e8cd8ee9191113894de9d97a8a6R149
> which is another patch that changed the tree diff machinery to accommodate
> `git grep` (but maybe not really paying a lot of attention to the fact
> that the same machinery is called repeatedly by the revision machinery,
> too).
> 
> I am really confused by this code mainly due to the fact that the term
> "recursive" is pretty ambiguous in that context: does it refer to
> directories/tree objects, or to submodules? I guess it is used for both
> when there should be two flags so that rev-list can recurse over tree
> objects but not submodules (unless told to do so).
> 
> The problem, of course, is that `git rev-list HEAD -- '[bracket]'` never
> recurses into the submodule. And therefore, the promised "more accurate
> matching [...] in the submodule" is never performed. And the commit adding
> the submodule is never pruned.
> 
> Since I am not really familiar with all that tree diff code (and as a
> general rule to protect my mental health, I try my best to stay away from
> submodules, too), but you two are, may I ask you gentle people to have a
> closer look to fix those bugs?
> 
> Thanks,
> Dscho

-- 
Brandon Williams

  reply	other threads:[~2017-11-28 23:06 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-11-26  2:15 Bug in pathspec handling (in conjunction with submodules) Johannes Schindelin
2017-11-28 23:06 ` Brandon Williams [this message]
2017-11-28 23:22 ` [PATCH] pathspec: only match across submodule boundaries when requested Brandon Williams
2017-11-29 21:29   ` Johannes Schindelin
2017-12-05  0:09     ` Brandon Williams
2017-12-05  0:07   ` [PATCH v2] " Brandon Williams
2017-12-05 19:19     ` Junio C Hamano
2017-12-06 21:20     ` Johannes Schindelin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20171128230624.GA194092@google.com \
    --to=bmwill@google.com \
    --cc=Johannes.Schindelin@gmx.de \
    --cc=git@vger.kernel.org \
    --cc=pclouds@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).