From: Jeff King <peff@peff.net>
To: Junio C Hamano <gitster@pobox.com>
Cc: "David Burström" <davidburstrom@spotify.com>, git@vger.kernel.org
Subject: Re: Bug or unexpected behaviour in git show <rev>:a\b
Date: Fri, 24 Jan 2020 19:00:51 -0500
Message-ID: <20200125000051.GA566074@coredump.intra.peff.net>
In-Reply-To: <xmqqk15gzmc8.fsf@gitster-ct.c.googlers.com>

On Fri, Jan 24, 2020 at 11:27:35AM -0800, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > So everything is working as designed, or at least explainable. But I
> > think there is some room for improvement. A backslash that isn't
> > followed by a glob meta-character _is_ still a meta character (your
> > "a\b" would be globbing for "ab"). But it's useless enough that I think
> > it shouldn't be enough to trigger the "oh, you probably meant this as a
> > pathspec" DWIM rule.
> 
> This sounds sensible.

OK, the patch I came up with is below.

> > We _could_ also say "even though this could technically be a pathspec
> > because of its metacharacter, it looks vaguely enough like a
> > path-in-tree revision that we shouldn't guess". That I'm less
> > comfortable with, just because it makes the heuristics even more
> > magical.
> 
> Not just it becomes more magical, I am afraid that the code to
> implement such a heuristics would be fragile and become a source of
> unnecessary bugs.  Let's not go there.

OK. It does mean that:

  git show HEAD:a*

will still quietly produce no output instead of saying "hey, there is no
a* in HEAD". But given the lack of bug reports over the years, I think
both this case and the backslash one I'm fixing are probably relatively
rare. Of the two, the backslash one seems a lot more likely, just
because Windows folks may treat it like a path separator (I'm not sure
if that even works, considering its meaning in a glob, but certainly I
can imagine somebody doing so as an experiment and getting confused by
the result).
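
To make the distinction concrete, here is a rough standalone sketch of
the tightened check: a backslash quotes the next character, and only an
unquoted wildcard counts as "this is probably a pathspec". The names
glob_special() and has_expanding_wildcard() are made up for this
illustration, and glob_special() is only a stand-in that assumes Git's
is_glob_special() covers '*', '?', '[' and backslash.

```c
/*
 * Hypothetical stand-in for Git's is_glob_special(); assumed to cover
 * '*', '?', '[' and backslash.
 */
static int glob_special(char c)
{
	return c == '*' || c == '?' || c == '[' || c == '\\';
}

/*
 * Return 1 if arg contains a wildcard that can actually widen a match,
 * i.e. one that is not quoted by a preceding backslash. Return 0 if
 * every glob special is either a backslash or quoted by one.
 */
static int has_expanding_wildcard(const char *arg)
{
	const char *p;
	int escaped = 0;

	for (p = arg; *p; p++) {
		if (escaped)
			escaped = 0;	/* quoted by the previous backslash */
		else if (*p == '\\')
			escaped = 1;	/* quotes the next character */
		else if (glob_special(*p))
			return 1;	/* a live '*', '?' or '[' */
	}
	return 0;
}
```

With this sketch, "*.c" and "a*" count as wildcards, while "a\b" and
"a\*b" do not, so only the first two would be guessed to be pathspecs
without a "--".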

> I should learn to use "working as designed or at least explainable"
> more often in my responses, by the way.  That's quite a useful and
> good phrase ;-)

Perhaps that can be Git's motto. ;)

Anyway, here's the patch. Even though this is rare, I think it's worth
doing. The code is simple and I don't anticipate anybody complaining
about the tightening.

-- >8 --
Subject: verify_filename(): handle backslashes in "wildcards are pathspecs" rule

Commit 28fcc0b71a (pathspec: avoid the need of "--" when wildcard is
used, 2015-05-02) allowed:

  git rev-parse '*.c'

without the double-dash. But the rule it uses to check for wildcards
actually looks for any glob special. This is overly liberal, as it means
that a pattern that doesn't actually do any wildcard matching, like
"a\b", will be considered a pathspec.

If you do have such a file on disk, that's presumably what you wanted.
But if you don't, the results are confusing: rather than say "there's no
such path a\b", we'll quietly accept it as a pathspec which very likely
matches nothing (or at least not what you intended). Likewise, looking
for path "a\*b" doesn't expand the search at all; it would only find a
single entry, "a*b".

This commit switches the rule to trigger only when glob metacharacters
would expand the search, meaning both of those cases will now report an
error (you can still disambiguate using "--", of course; we're just
tightening the DWIM heuristic).

Note that we didn't test the original feature in 28fcc0b71a at all. So
this patch not only tests for these corner cases, but also adds a
regression test for the existing behavior.

Reported-by: David Burström <davidburstrom@spotify.com>
Signed-off-by: Jeff King <peff@peff.net>
---
 setup.c                        | 23 ++++++++++++++++++++---
 t/t1506-rev-parse-diagnosis.sh | 14 ++++++++++++++
 2 files changed, 34 insertions(+), 3 deletions(-)

diff --git a/setup.c b/setup.c
index e2a479a64f..12228c0d9c 100644
--- a/setup.c
+++ b/setup.c
@@ -197,9 +197,26 @@ static void NORETURN die_verify_filename(struct repository *r,
  */
 static int looks_like_pathspec(const char *arg)
 {
-	/* anything with a wildcard character */
-	if (!no_wildcard(arg))
-		return 1;
+	const char *p;
+	int escaped = 0;
+
+	/*
+	 * Wildcard characters imply the user is looking to match pathspecs
+	 * that aren't in the filesystem. Note that this doesn't include
+	 * backslash even though it's a glob special; by itself it doesn't
+	 * cause any increase in the match. Likewise ignore backslash-escaped
+	 * wildcard characters.
+	 */
+	for (p = arg; *p; p++) {
+		if (escaped) {
+			escaped = 0;
+		} else if (is_glob_special(*p)) {
+			if (*p == '\\')
+				escaped = 1;
+			else
+				return 1;
+		}
+	}
 
 	/* long-form pathspec magic */
 	if (starts_with(arg, ":("))
diff --git a/t/t1506-rev-parse-diagnosis.sh b/t/t1506-rev-parse-diagnosis.sh
index 6d951ca015..8a75f37a11 100755
--- a/t/t1506-rev-parse-diagnosis.sh
+++ b/t/t1506-rev-parse-diagnosis.sh
@@ -222,4 +222,18 @@ test_expect_success 'reject Nth ancestor if N is too high' '
 	test_must_fail git rev-parse HEAD~100000000000000000000000000000000
 '
 
+test_expect_success 'pathspecs with wildcards are not ambiguous' '
+	echo "*.c" >expect &&
+	git rev-parse "*.c" >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'backslash does not trigger wildcard rule' '
+	test_must_fail git rev-parse "foo\\bar"
+'
+
+test_expect_success 'escaped char does not trigger wildcard rule' '
+	test_must_fail git rev-parse "foo\\*bar"
+'
+
 test_done
-- 
2.25.0.421.gb74d19af79


