git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Junio C Hamano <gitster@pobox.com>
To: Brandon Williams <bmwill@google.com>
Cc: git@vger.kernel.org, "Heiko Voigt" <hvoigt@hvoigt.net>,
	"Nguyễn Thái Ngọc Duy" <pclouds@gmail.com>,
	"Stefan Beller" <sbeller@google.com>
Subject: Re: [PATCH] ls-files: add pathspec matching for submodules
Date: Mon, 19 Sep 2016 10:00:06 -0700	[thread overview]
Message-ID: <xmqqvaxrg6zt.fsf@gitster.mtv.corp.google.com> (raw)
In-Reply-To: <CAKoko1r6cfv-2HVCJPgGbXyCVe-wdUBS+2nXtaTHO3jshVg8MA@mail.gmail.com> (Brandon Williams's message of "Sun, 18 Sep 2016 11:40:22 -0700")

Brandon Williams <bmwill@google.com> writes:

>> OK, so as discussed previously with Heiko and Stefan, the idea is to
>>
>>  - pass the original pathspec as-is,
>>
>>  - when --submodule-prefix is given, a path discovered in a
>>    submodule repository is first prefixed with that string before
>>    getting checked to see if it matches the original pathspec.
>>
>> And this loop is about relaying the original pathspec.
>
> Exactly.  Perhaps I should have made this more clear either with a
> detailed comment or
> more information in the commit msg.

I think you were clear enough.

Don't read everything other people say in their reviews as pointing
out issues.  Often trying to rephrase what they read in the code in
their own words is a good way to make sure the reviewers and the
original author are on the same page.  The above was one of these
cases.

>>> +     if (prefix > 0) {
>>> +             if (ps_strncmp(item, pattern, string, prefix))
>>> +                     return WM_NOMATCH;
>>
>> This says: when we have a set prefix that must literally match, and
>> that part does not match what we have, it cannot possibly match.
>>
>> Is that correct?  What do we have in "name" and "item" at this
>> point?  We disable the common-prefix optimization, so we do not have
>> to worry about a pathspec with two elements "sub/dir1/*" and "sub/dir2/*"
>> giving you "sub/dir" as the common prefix, when you are wondering if
>> it is worth descending into "sub/" without knowing what it contains.
>> Is that what guarantees why this part is correct?
>
> I adopted this structure from another part of the code.  The caller
> uses a field in
> the pathspec item which indicates the location of the first wildcard character.
> So the prefix (everything prior to the wildcard char) must match
> literally before
> we drop into a more expensive wildmatch function.

"Another part of the code" is about tree walking, right?  Weren't
you saying that part of the code may be buggy or something earlier
(e.g. pathspec "su?/" vs entry "sub")?

Again, what do we have in "name" and "item" at this point?  If we
have a submodule at "sub/" and we are checking a pathspec element
"sub/dir1/*", what is the non-wildcard part of the pathspec and what
is the "string"?  Aren't then "sub/dir1/" and "sub/" respectively,
which would not pass ps_strncmp() and produce a (false) negative?

I am starting to have a feeling that the best we can do in this
function safely is to see if prefix (i.e. the constant part of the
pathspec before the first wildcard) is long enough to cover the
"name" and if "name" part does not match to the directory boundary,
e.g. for this combination

	pathspec = "a/b/sib/c/*"
        name = "a/b/sub/"

we can say with confidence that it is not worth descending into.

When prefix is long enough and "name" and leading part of the prefix
matches to the directory boundary, e.g.

	pathspec = "a/b/sub/c/*"
        name = "a/b/sub/"

we can say it is worth descending into.

If these two checks cannot decide, we may have to be pessimistic and
say "it may match; we don't know until we descend into it".  When
prefix is shorter than name, I am not sure if we can devise a set of
simple rules, e.g.

	pathspec = "a/**/c/*"
        name = "a/b/sub/"

may match with its ** "b/sub" part and worth descending into, so is

	pathspec = "a/b/*/c/*"
        name = "a/b/sub/"

but not this one:

	pathspec = "a/b/su[c-z]/c/*"
        name = "a/b/sub/"

but this is OK:

	pathspec = "a/b/su[a-z]/c/*"
        name = "a/b/sub/"

So I would think we'd be in the business of counting slashes in the
name (called "string" in this function) and the pathspec, while
noticing '*' and '**' in the latter, and we may be able to be more
precise, but I am not sure how complex the end result would become.


  reply	other threads:[~2016-09-19 17:00 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-09-14 23:57 [RFC] extending pathspec support to submodules Brandon Williams
2016-09-15 11:57 ` Heiko Voigt
2016-09-15 15:26   ` Brandon Williams
2016-09-15 22:08     ` Junio C Hamano
2016-09-15 22:28       ` Stefan Beller
2016-09-16  9:34         ` Heiko Voigt
2016-09-16 18:40           ` Brandon Williams
2016-09-17  0:59             ` [PATCH] ls-files: add pathspec matching for submodules Brandon Williams
2016-09-17  3:46               ` Junio C Hamano
2016-09-18 18:40                 ` Brandon Williams
2016-09-19 17:00                   ` Junio C Hamano [this message]
2016-09-19 17:26                     ` Brandon Williams
2016-09-19 18:04                       ` Junio C Hamano
2016-09-19 18:20                         ` Brandon Williams
2016-09-19 18:22                           ` Junio C Hamano
2016-09-19 18:30                             ` Brandon Williams
2016-09-19 18:34                               ` Junio C Hamano
2016-09-19 18:35                                 ` Brandon Williams
2016-09-19 18:52                                   ` [PATCH v2] " Brandon Williams
2016-09-19 23:21                                     ` Junio C Hamano
2016-09-20 16:30                                       ` Brandon Williams
2016-09-20 21:03                                         ` Brandon Williams
2016-09-21 17:12                                           ` Junio C Hamano
2016-09-21 17:49                                             ` Junio C Hamano
2016-09-19 18:18               ` [PATCH] " Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=xmqqvaxrg6zt.fsf@gitster.mtv.corp.google.com \
    --to=gitster@pobox.com \
    --cc=bmwill@google.com \
    --cc=git@vger.kernel.org \
    --cc=hvoigt@hvoigt.net \
    --cc=pclouds@gmail.com \
    --cc=sbeller@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).