git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Junio C Hamano <gitster@pobox.com>
To: Brandon Williams <bmwill@google.com>
Cc: git@vger.kernel.org, "Heiko Voigt" <hvoigt@hvoigt.net>,
	"Nguyễn Thái Ngọc Duy" <pclouds@gmail.com>,
	"Stefan Beller" <sbeller@google.com>
Subject: Re: [PATCH] ls-files: add pathspec matching for submodules
Date: Fri, 16 Sep 2016 20:46:35 -0700	[thread overview]
Message-ID: <xmqqtwdfgpd0.fsf@gitster.mtv.corp.google.com> (raw)
In-Reply-To: <1474073981-96620-1-git-send-email-bmwill@google.com> (Brandon Williams's message of "Fri, 16 Sep 2016 17:59:41 -0700")

Brandon Williams <bmwill@google.com> writes:

> ...
>  		[--full-name] [--recurse-submodules]
> -		[--output-path-prefix=<path>]
> +		[--submodule-prefix=<path>]
>  		[--abbrev] [--] [<file>...]
>  
> ---output-path-prefix=<path>::
> +--submodule-prefix=<path>::
>  	Prepend the provided path to the output of each file
> ...
>  static int show_eol;
> -static const char *output_path_prefix;
> +static const char *submodule_prefix;
>  static int recurse_submodules;
> ...
>  	static struct strbuf full_name = STRBUF_INIT;
> -	if (output_path_prefix && *output_path_prefix) {
> +	if (submodule_prefix && *submodule_prefix) {
>  		strbuf_reset(&full_name);
> -		strbuf_addstr(&full_name, output_path_prefix);
> +		strbuf_addstr(&full_name, submodule_prefix);
>  		strbuf_addstr(&full_name, name);

As the previous one that used a wrong (sorry) argument is not even
in 'next' yet, let's pretend that it never happened.  It is OK to
still keep it and this patch as two separate steps, i.e. a topic
with two patches in it.

> +	/* Add pathspec args */
> +	argv_array_push(&cp.args, "--");
> +	for (i = 0; i < pathspec.nr; ++i)
> +		argv_array_push(&cp.args, pathspec.items[i].original);

OK, so as discussed previously with Heiko and Stefan, the idea is to

 - pass the original pathspec as-is,

 - when --submodule-prefix is given, a path discovered in a
   submodule repository is first prefixed with that string before
   getting checked to see if it matches the original pathspec.

And this loop is about relaying the original pathspec.

> @@ -192,57 +210,63 @@ static void show_gitlink(const struct cache_entry *ce)
>  
>  static void show_ce_entry(const char *tag, const struct cache_entry *ce)
>  {
> +	struct strbuf name = STRBUF_INIT;
>  	int len = max_prefix_len;
> +	if (submodule_prefix)
> +		strbuf_addstr(&name, submodule_prefix);
> +	strbuf_addstr(&name, ce->name);
>  
>  	if (len >= ce_namelen(ce))
> -		die("git ls-files: internal error - cache entry not superset of prefix");
> +		die("git ls-files: internal error - cache entry not "
> +		    "superset of prefix");

This is not such a great thing to do.  Upon a bug report, we can no
longer do

	git grep 'cache entry not superset'

to see where the error message is coming from.

> -	if (!match_pathspec(&pathspec, ce->name, ce_namelen(ce),
> -			    len, ps_matched,
> -			    S_ISDIR(ce->ce_mode) || S_ISGITLINK(ce->ce_mode)))
> -		return;
> -	if (recurse_submodules && S_ISGITLINK(ce->ce_mode)) {
> +	if (recurse_submodules && S_ISGITLINK(ce->ce_mode) &&
> +	    submodule_path_match(&pathspec, name.buf, ps_matched)) {
>  		show_gitlink(ce);
> -		return;
> -	}
> +	} else if (match_pathspec(&pathspec, name.buf, name.len,
> +				  len, ps_matched,
> +				  S_ISDIR(ce->ce_mode) ||
> +				  S_ISGITLINK(ce->ce_mode))) {
> +		if (tag && *tag && show_valid_bit &&
> + ...

Argh.  If we had a preparatory clean-up step, would it have helped
to avoid this big re-indentation that makes the patch harder to read
than necessary, I wonder?

Another way would have been to "goto" from the end of this block

> +	if (recurse_submodules && S_ISGITLINK(ce->ce_mode) &&
> +	    submodule_path_match(&pathspec, name.buf, ps_matched)) {

where we used to "return" out to the central clean-up location, i.e.
here.

> +	strbuf_release(&name);
>  }


>  	parse_pathspec(&pathspec, 0,
>  		       PATHSPEC_PREFER_CWD |
>  		       PATHSPEC_STRIP_SUBMODULE_SLASH_CHEAP,
>  		       prefix, argv);
>  
> -	/* Find common prefix for all pathspec's */
> -	max_prefix = common_prefix(&pathspec);
> +	/*
> +	 * Find common prefix for all pathspec's
> +	 * This is used as a performance optimization which violates correctness
> +	 * in the recurse_submodules mode
> +	 */

The two new lines phrase it overly negatively and also misleading.
I thought you were saying "We do this as optimization anyway; damn
the correctness in the submodule case!" in my first reading before
reading the statements the comment talks about.  "This optimization
unfortunately cannot be done when recursing into submodules" would
have been better.

> +	if (recurse_submodules)
> +		max_prefix = NULL;
> +	else
> +		max_prefix = common_prefix(&pathspec);
>  	max_prefix_len = max_prefix ? strlen(max_prefix) : 0;

> diff --git a/dir.c b/dir.c
> index 0ea235f..630dc7a 100644
> --- a/dir.c
> +++ b/dir.c
> @@ -63,6 +63,30 @@ int fspathncmp(const char *a, const char *b, size_t count)
>  	return ignore_case ? strncasecmp(a, b, count) : strncmp(a, b, count);
>  }
>  
> +static int prefix_fnmatch(const struct pathspec_item *item,
> +		   const char *pattern, const char *string,
> +		   int prefix)
> +{

Is this meant to be free of false positives, free of false
negatives, or exact?  I think you use it to decide, without knowing
what kind of paths the submodule contains, if it is worth descending
into it, so as long as you definitively say "The pathspec can never
match anything in the submodule" with WM_NOMATCH, it is OK if you
returned WM_MATCH when it actually couldn't match anything.  I.e. it
is OK to give false positive but it is a bug to give false negative.

The answer to the above question should be a good explanation to
prepend as /* comment */ before the function.

> +	if (prefix > 0) {
> +		if (ps_strncmp(item, pattern, string, prefix))
> +			return WM_NOMATCH;

This says: when we have a set prefix that must literally match, and
that part does not match what we have, it cannot possibly match.

Is that correct?  What do we have in "name" and "item" at this
point?  We disable the common-prefix optimization, so we do not have
to worry about a pathspec with two elements "sub/dir1/*" and "sub/dir2/*"
giving you "sub/dir" as the common prefix, when you are wondering if
it is worth descending into "sub/" without knowing what it contains.
Is that what guarantees why this part is correct?

> +		pattern += prefix;
> +		string += prefix;
> +	}
> +
> +	if (item->flags & PATHSPEC_ONESTAR) {
> +		return WM_MATCH;

We have a pathspec that has a segment without wildcard letters,
followed by a '*', and there is no wildcard letters after that
asterisk.  We punt and assume it might match, which is OK for the
purpose of not giving a false negative.

> +	} else if (item->magic & PATHSPEC_GLOB) {
> +		return wildmatch(pattern, string,
> +				 WM_PATHNAME |
> +				 (item->magic & PATHSPEC_ICASE ?
> +				  WM_CASEFOLD : 0),
> +				 NULL);

What does this say?  If we are using the :(glob) semantics, which is
the default, we'll ask wildmatch() to see the remainder of the
pattern (after stripping the fixed prefix part if necessary) matches
the string (which also may have lost the prefix that we already know
matches).

Is that correct?  I think it depends on what "string" is being fed,
but I am assuing that you are working in the top-level project here
to decide if it is worth descending into a submodule.  If the item
is sub/dir?/*.c and we are considering "sub/" submodule, wildmatch
would not say "It could match" if "string" is "sub/".  Perhaps I am
reading the patch incorrectly.  Let me read on to see what the caller
does later.

  reply	other threads:[~2016-09-17  3:46 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-09-14 23:57 [RFC] extending pathspec support to submodules Brandon Williams
2016-09-15 11:57 ` Heiko Voigt
2016-09-15 15:26   ` Brandon Williams
2016-09-15 22:08     ` Junio C Hamano
2016-09-15 22:28       ` Stefan Beller
2016-09-16  9:34         ` Heiko Voigt
2016-09-16 18:40           ` Brandon Williams
2016-09-17  0:59             ` [PATCH] ls-files: add pathspec matching for submodules Brandon Williams
2016-09-17  3:46               ` Junio C Hamano [this message]
2016-09-18 18:40                 ` Brandon Williams
2016-09-19 17:00                   ` Junio C Hamano
2016-09-19 17:26                     ` Brandon Williams
2016-09-19 18:04                       ` Junio C Hamano
2016-09-19 18:20                         ` Brandon Williams
2016-09-19 18:22                           ` Junio C Hamano
2016-09-19 18:30                             ` Brandon Williams
2016-09-19 18:34                               ` Junio C Hamano
2016-09-19 18:35                                 ` Brandon Williams
2016-09-19 18:52                                   ` [PATCH v2] " Brandon Williams
2016-09-19 23:21                                     ` Junio C Hamano
2016-09-20 16:30                                       ` Brandon Williams
2016-09-20 21:03                                         ` Brandon Williams
2016-09-21 17:12                                           ` Junio C Hamano
2016-09-21 17:49                                             ` Junio C Hamano
2016-09-19 18:18               ` [PATCH] " Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=xmqqtwdfgpd0.fsf@gitster.mtv.corp.google.com \
    --to=gitster@pobox.com \
    --cc=bmwill@google.com \
    --cc=git@vger.kernel.org \
    --cc=hvoigt@hvoigt.net \
    --cc=pclouds@gmail.com \
    --cc=sbeller@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).