git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Junio C Hamano <gitster@pobox.com>
To: Brandon Williams <bmwill@google.com>
Cc: git@vger.kernel.org, "Nguyễn Thái Ngọc Duy" <pclouds@gmail.com>
Subject: Re: [PATCH v2] ls-files: add pathspec matching for submodules
Date: Mon, 19 Sep 2016 16:21:53 -0700	[thread overview]
Message-ID: <xmqqh99bcw6m.fsf@gitster.mtv.corp.google.com> (raw)
In-Reply-To: <1474311151-117883-1-git-send-email-bmwill@google.com> (Brandon Williams's message of "Mon, 19 Sep 2016 11:52:31 -0700")

Brandon Williams <bmwill@google.com> writes:

> diff --git a/Documentation/git-ls-files.txt b/Documentation/git-ls-files.txt
> index a623ebf..09e4449 100644
> --- a/Documentation/git-ls-files.txt
> +++ b/Documentation/git-ls-files.txt
> @@ -19,7 +19,7 @@ SYNOPSIS
> -		[--output-path-prefix=<path>]
> +		[--submodule-prefix=<path>]
> ...
> ---output-path-prefix=<path>::
> +--submodule-prefix=<path>::
> ...
> -static const char *output_path_prefix;
> +static const char *submodule_prefix;
> ...
> -	if (output_path_prefix && *output_path_prefix) {
> +	if (submodule_prefix && *submodule_prefix) {
>  		strbuf_reset(&full_name);
> -		strbuf_addstr(&full_name, output_path_prefix);
> +		strbuf_addstr(&full_name, submodule_prefix);
> ...
> -	argv_array_pushf(&cp.args, "--output-path-prefix=%s%s/",
> -			 output_path_prefix ? output_path_prefix : "",
> +	argv_array_pushf(&cp.args, "--submodule-prefix=%s%s/",
> +			 submodule_prefix ? submodule_prefix : "",
>  			 ce->name);

As the previous one that used a wrong (sorry) argument is not even
in 'next' yet, let's pretend that it never happened.  It is OK to
still keep it and this patch as two separate steps, i.e. a topic
with two patches in it.

That means that this patch will become 2/2 of a series, and 1/2 is
rerolled to use submodule_prefix from the get-go, without ever
introducing output_path_prefix variable, so that many of the above
lines we won't have to review in 2/2.

> +	/*
> +	 * Pass in the original pathspec args.  The submodule will be
> +	 * responsible for prepending the 'submodule_prefix' prior to comparing
> +	 * against the pathspec for matches.
> +	 */

Good.

> +	argv_array_push(&cp.args, "--");
> +	for (i = 0; i < pathspec.nr; ++i)
> +		argv_array_push(&cp.args, pathspec.items[i].original);
> +

Please prefer post-increment i++ over pre-increment ++i when writing
a for(;;) loop, unless there is a strong reason not to (familiarlity
in C++ is not a good reason).

> diff --git a/dir.c b/dir.c
> index 0ea235f..1afc3ff 100644
> --- a/dir.c
> +++ b/dir.c
> @@ -63,6 +63,35 @@ int fspathncmp(const char *a, const char *b, size_t count)
>  	return ignore_case ? strncasecmp(a, b, count) : strncmp(a, b, count);
>  }
>  
> +/**
> + * Used to perform prefix matching against a pathspec_item for determining if we
> + * should decend into a submodule.  This function can result in false positives
> + * since we are only trying to match the 'string' to a prefix of the 'pattern'
> + */

Perhaps s/can/is allowed to/.  Implicit but equally important is
that it is not OK to produce false negative.

> +static int prefix_fnmatch(const struct pathspec_item *item,
> +		   const char *pattern, const char *string,
> +		   int prefix)
> +{
> +	if (prefix > 0) {
> +		if (ps_strncmp(item, pattern, string, prefix))
> +			return WM_NOMATCH;
> +		pattern += prefix;
> +		string += prefix;
> +	}
> +
> +	if (item->flags & PATHSPEC_ONESTAR) {
> +		return WM_MATCH;
> +	} else if (item->magic & PATHSPEC_GLOB) {
> +		return wildmatch(pattern, string,
> +				 WM_PATHNAME |
> +				 (item->magic & PATHSPEC_ICASE ?
> +				  WM_CASEFOLD : 0),
> +				 NULL);

Isn't this last one overly tight?  I am wondering about a scenario
where you have a submodule at "sub/" in the superproject, and "sub/"
has a "file" at the top of its working tree.  And you do:

	git ls-files --recurse-submodules ':(glob)??b/fi?e'

at the top of the superproject.  The "pattern" would be '??b/fi?e"
while string would be 'sub', and wildmatch() would not like it, but
there is no way for this caller to append anything to 'sub' before
making this call, as it hasn't looked into what paths appear in the
submodule repository (and it should not want to).  And I think we
would want it to recurse to find sub/file.  IOW, this looks like a
false negative we must avoid in this function.  As we cannot afford
to check if anything that matches 'fi?e' is in the index file of the
submodule repository, we shouldn't try to match 'fi?e' portion of
the given pathspec pattern.

Extending the scenario, if I also create a directory "sib/" in the
superproject next to "sub/" and add a "file" in it, the same
pathspec:

	git ls-files [--recurse-submodules] ':(glob)??b/fi?e'

does find "sib/file" (with or without the recursion option).

    Duy cc'ed because I am not quite sure why it is a good thing
    that I need to add an exlicit ':(glob)' in these examples to
    illustrate the breakage.  This comes from v1.8.3.2-849-gbd30c2e
    ("pathspec: support :(glob) syntax", 2013-07-14) and I do not
    think we want to change its behaviour. I just want to be
    reminded why we did this.  I am guessing that it was because of
    an ancient design decision that we would use fnmatch without
    FNM_PATHNAME by default, i.e. it would be too disruptive if we
    made :(glob), i.e. honor directory boundaries, the default.

With :(glob) that follows FNM_PATHNAME behaviour, ":(glob)s*" would
be rejected as a match with "sub/file" or "sib/file", and that is
OK, but I think our ':(glob)??b/fi?e' example should find both of
these paths as matches to the pattern.

> +
> +	return WM_NOMATCH;
> +}
> +

And of course, if it is not PATHSPEC_GLOB, returning NOMATCH does
not sound like erring on the side to prevent false negatives.  For
example, what does

	git ls-files --recurse-submodule '??b/fi?e'

do in the above scenario with this patch?  Doesn't the if/else
cascade fall through and hit this return that says "Nah, 'sub'
cannot possibly match, do not bother descending into it"?

One thing I was originally confused about was that there are at
least three kinds we need to worry about.

 - PATHSPEC_LITERAL, where we no-wildcard-len should cover
   everything in the pattern string, e.g. "sub/file" may be the
   pattern that would want to produce a match when trying to see if
   we want to descend into "sub".

 - PATHSPEC_GLOB, as discussed above.

 - Not having either LITERAL or GLOB.  's*' would want to say that
   it is worth descending into "sub".  If there were another
   submodule at "sob/dir", that would match the pattern 's*' as
   well, because an asterisk can match 'ob/dir' part unlike in
   PATHSPEC_GLOB case.

Thanks.


  reply	other threads:[~2016-09-19 23:22 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-09-14 23:57 [RFC] extending pathspec support to submodules Brandon Williams
2016-09-15 11:57 ` Heiko Voigt
2016-09-15 15:26   ` Brandon Williams
2016-09-15 22:08     ` Junio C Hamano
2016-09-15 22:28       ` Stefan Beller
2016-09-16  9:34         ` Heiko Voigt
2016-09-16 18:40           ` Brandon Williams
2016-09-17  0:59             ` [PATCH] ls-files: add pathspec matching for submodules Brandon Williams
2016-09-17  3:46               ` Junio C Hamano
2016-09-18 18:40                 ` Brandon Williams
2016-09-19 17:00                   ` Junio C Hamano
2016-09-19 17:26                     ` Brandon Williams
2016-09-19 18:04                       ` Junio C Hamano
2016-09-19 18:20                         ` Brandon Williams
2016-09-19 18:22                           ` Junio C Hamano
2016-09-19 18:30                             ` Brandon Williams
2016-09-19 18:34                               ` Junio C Hamano
2016-09-19 18:35                                 ` Brandon Williams
2016-09-19 18:52                                   ` [PATCH v2] " Brandon Williams
2016-09-19 23:21                                     ` Junio C Hamano [this message]
2016-09-20 16:30                                       ` Brandon Williams
2016-09-20 21:03                                         ` Brandon Williams
2016-09-21 17:12                                           ` Junio C Hamano
2016-09-21 17:49                                             ` Junio C Hamano
2016-09-19 18:18               ` [PATCH] " Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=xmqqh99bcw6m.fsf@gitster.mtv.corp.google.com \
    --to=gitster@pobox.com \
    --cc=bmwill@google.com \
    --cc=git@vger.kernel.org \
    --cc=pclouds@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).