git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: "Nguyễn Thái Ngọc Duy" <pclouds@gmail.com>
Cc: git@vger.kernel.org, Junio C Hamano <gitster@pobox.com>, dana@dana.is
Subject: Re: [PATCH] wildmatch: change behavior of "foo**bar" in WM_PATHNAME mode
Date: Mon, 29 Oct 2018 14:24:00 +0100	[thread overview]
Message-ID: <87h8h4lwcv.fsf@evledraar.gmail.com> (raw)
In-Reply-To: <20181027084823.23382-1-pclouds@gmail.com>


On Sat, Oct 27 2018, Nguyễn Thái Ngọc Duy wrote:

> In WM_PATHNAME mode (or FNM_PATHNAME), '*' does not match '/' and '**'
> can but only in three patterns:
>
> - '**/' matches zero or more leading directories
> - '/**/' matches zero or more directories in between
> - '/**' matches zero or more trailing directories/files
>
> When '**' is present but not in one of these patterns, the current
> behavior is consider the pattern invalid and stop matching. In other
> words, 'foo**bar' never matches anything, whatever you throw at it.
>
> This behavior is arguably a bit confusing partly because we can't
> really tell the user their pattern is invalid so that they can fix
> it. So instead, tolerate it and make '**' act like two regular '*'s
> (which is essentially the same as a single asterisk). This behavior
> seems more predictable.
>
> Noticed-by: dana <dana@dana.is>
> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
> ---
>  Documentation/gitignore.txt | 3 ++-
>  t/t3070-wildmatch.sh        | 4 ++--
>  wildmatch.c                 | 4 ++--
>  wildmatch.h                 | 1 -
>  4 files changed, 6 insertions(+), 6 deletions(-)
>
> diff --git a/Documentation/gitignore.txt b/Documentation/gitignore.txt
> index d107daaffd..1c94f08ff4 100644
> --- a/Documentation/gitignore.txt
> +++ b/Documentation/gitignore.txt
> @@ -129,7 +129,8 @@ full pathname may have special meaning:
>     matches zero or more directories. For example, "`a/**/b`"
>     matches "`a/b`", "`a/x/b`", "`a/x/y/b`" and so on.
>
> - - Other consecutive asterisks are considered invalid.
> + - Other consecutive asterisks are considered regular asterisks and
> +   will match according to the previous rules.
>
>  NOTES
>  -----
> diff --git a/t/t3070-wildmatch.sh b/t/t3070-wildmatch.sh
> index 46aca0af10..891d4d7cb9 100755
> --- a/t/t3070-wildmatch.sh
> +++ b/t/t3070-wildmatch.sh
> @@ -237,7 +237,7 @@ match 0 0 0 0 foobar 'foo\*bar'
>  match 1 1 1 1 'f\oo' 'f\\oo'
>  match 1 1 1 1 ball '*[al]?'
>  match 0 0 0 0 ten '[ten]'
> -match 0 0 1 1 ten '**[!te]'
> +match 1 1 1 1 ten '**[!te]'
>  match 0 0 0 0 ten '**[!ten]'
>  match 1 1 1 1 ten 't[a-g]n'
>  match 0 0 0 0 ten 't[!a-g]n'
> @@ -253,7 +253,7 @@ match 1 1 1 1 ']' ']'
>  # Extended slash-matching features
>  match 0 0 1 1 'foo/baz/bar' 'foo*bar'
>  match 0 0 1 1 'foo/baz/bar' 'foo**bar'
> -match 0 0 1 1 'foobazbar' 'foo**bar'
> +match 1 1 1 1 'foobazbar' 'foo**bar'
>  match 1 1 1 1 'foo/baz/bar' 'foo/**/bar'
>  match 1 1 0 0 'foo/baz/bar' 'foo/**/**/bar'
>  match 1 1 1 1 'foo/b/a/z/bar' 'foo/**/bar'
> diff --git a/wildmatch.c b/wildmatch.c
> index d074c1be10..9e9e2a2f95 100644
> --- a/wildmatch.c
> +++ b/wildmatch.c
> @@ -104,8 +104,8 @@ static int dowild(const uchar *p, const uchar *text, unsigned int flags)
>  					    dowild(p + 1, text, flags) == WM_MATCH)
>  						return WM_MATCH;
>  					match_slash = 1;
> -				} else
> -					return WM_ABORT_MALFORMED;
> +				} else /* WM_PATHNAME is set */
> +					match_slash = 0;
>  			} else
>  				/* without WM_PATHNAME, '*' == '**' */
>  				match_slash = flags & WM_PATHNAME ? 0 : 1;
> diff --git a/wildmatch.h b/wildmatch.h
> index b8c826aa68..5993696298 100644
> --- a/wildmatch.h
> +++ b/wildmatch.h
> @@ -4,7 +4,6 @@
>  #define WM_CASEFOLD 1
>  #define WM_PATHNAME 2
>
> -#define WM_ABORT_MALFORMED 2
>  #define WM_NOMATCH 1
>  #define WM_MATCH 0
>  #define WM_ABORT_ALL -1

This patch looks good to me, but I think it's a bad state of affairs to
keep changing these semantics and not having something like a
"gitwildmatch" doc were we document this matching syntax.

Also I still need to dig up the work for using PCRE as an alternate
matching engine, the PCRE devs produced a bug-for-bug compatible version
of our wildmatch function (all the more reason to document it), so I
think they'll need to change it now that this is in, but I haven't
rebased those ancient patches yet.

Do you have any thoughts on how to proceed with getting this documented
/ into some stable state where we can specify it? Even if we don't end
up using PCRE as a matching engine (sometimes it was faster, sometimes
slower) I think it would be very useful if we can spew out "here's your
pattern as a regex" for self-documentation purposes.

Then that can be piped into e.g. "perl -Mre=debug" to see a step-by-step
guide for how the pattern compiles, and why it does or doesn't match a
given thing.

  parent reply	other threads:[~2018-10-29 13:24 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-10-11 10:19 [BUG] gitignore documentation inconsistent with actual behaviour dana
2018-10-11 10:37 ` dana
2018-10-11 11:08 ` Ævar Arnfjörð Bjarmason
2018-10-14  2:14   ` dana
2018-10-14 12:15   ` Duy Nguyen
2018-10-14 22:56     ` Junio C Hamano
2018-10-15 15:27       ` Duy Nguyen
2018-10-20  5:26 ` Duy Nguyen
2018-10-20  5:53   ` dana
2018-10-20  6:03     ` Duy Nguyen
2018-10-20  6:26       ` dana
2018-10-27  8:48 ` [PATCH] wildmatch: change behavior of "foo**bar" in WM_PATHNAME mode Nguyễn Thái Ngọc Duy
2018-10-28  6:25   ` Torsten Bögershausen
2018-10-28  6:35     ` Duy Nguyen
2018-10-29  2:28       ` Junio C Hamano
2018-10-29 13:24   ` Ævar Arnfjörð Bjarmason [this message]
2018-10-29 15:53     ` Duy Nguyen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87h8h4lwcv.fsf@evledraar.gmail.com \
    --to=avarab@gmail.com \
    --cc=dana@dana.is \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=pclouds@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).