git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH] remote: prefer exact matches when using refspecs
@ 2018-07-31 21:18 Jonathan Tan
  2018-07-31 21:31 ` Jonathan Nieder
  2018-07-31 21:53 ` Junio C Hamano
  0 siblings, 2 replies; 6+ messages in thread
From: Jonathan Tan @ 2018-07-31 21:18 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan

When matching a non-wildcard LHS of a refspec against a list of refs,
find_ref_by_name_abbrev() returns the first ref that matches using the
DWIM rules used by refname_match() in refs.c, even if an exact match
occurs later in the list of refs.

This causes unexpected behavior when (for example) fetching using the
refspec "refs/heads/s:<something>" from a remote with both
"refs/heads/refs/heads/s" and "refs/heads/s". (Even if the former was
inadvertently created, one would still expect the latter to be fetched.)

This problem has only been observed when the desired ref comes after the
undesired ref in alphabetical order. However, for completeness, the test
in this patch also checks what happens when the desired ref comes first
alphabetically.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
 remote.c         |  7 +++++--
 t/t5510-fetch.sh | 28 ++++++++++++++++++++++++++++
 2 files changed, 33 insertions(+), 2 deletions(-)

diff --git a/remote.c b/remote.c
index 3fd43453f..eeffe3488 100644
--- a/remote.c
+++ b/remote.c
@@ -1687,12 +1687,15 @@ static struct ref *get_expanded_map(const struct ref *remote_refs,
 
 static const struct ref *find_ref_by_name_abbrev(const struct ref *refs, const char *name)
 {
+	const struct ref *best_match = NULL;
 	const struct ref *ref;
 	for (ref = refs; ref; ref = ref->next) {
-		if (refname_match(name, ref->name))
+		if (!strcmp(name, ref->name))
 			return ref;
+		if (refname_match(name, ref->name))
+			best_match = ref;
 	}
-	return NULL;
+	return best_match;
 }
 
 struct ref *get_remote_ref(const struct ref *remote_refs, const char *name)
diff --git a/t/t5510-fetch.sh b/t/t5510-fetch.sh
index e402aee6a..da88f35f0 100755
--- a/t/t5510-fetch.sh
+++ b/t/t5510-fetch.sh
@@ -535,6 +535,34 @@ test_expect_success "should be able to fetch with duplicate refspecs" '
 	)
 '
 
+test_expect_success 'LHS of refspec prefers exact matches' '
+	mkdir lhs-exact &&
+	(
+		cd lhs-exact &&
+		git init server &&
+		test_commit -C server unwanted &&
+		test_commit -C server wanted &&
+
+		git init client &&
+
+		# Check a name coming after "refs" alphabetically ...
+		git -C server update-ref refs/heads/s wanted &&
+		git -C server update-ref refs/heads/refs/heads/s unwanted &&
+		git -C client fetch ../server refs/heads/s:refs/heads/checkthis &&
+		git -C server rev-parse wanted >expect &&
+		git -C client rev-parse checkthis >actual &&
+		test_cmp expect actual &&
+
+		# ... and one before.
+		git -C server update-ref refs/heads/q wanted &&
+		git -C server update-ref refs/heads/refs/heads/q unwanted &&
+		git -C client fetch ../server refs/heads/q:refs/heads/checkthis &&
+		git -C server rev-parse wanted >expect &&
+		git -C client rev-parse checkthis >actual &&
+		test_cmp expect actual
+	)
+'
+
 # configured prune tests
 
 set_config_tristate () {
-- 
2.18.0.345.g5c9ce644c3-goog


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH] remote: prefer exact matches when using refspecs
  2018-07-31 21:18 [PATCH] remote: prefer exact matches when using refspecs Jonathan Tan
@ 2018-07-31 21:31 ` Jonathan Nieder
  2018-07-31 21:53 ` Junio C Hamano
  1 sibling, 0 replies; 6+ messages in thread
From: Jonathan Nieder @ 2018-07-31 21:31 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git, Dana Dahlstrom

Hi,

Jonathan Tan wrote:

> When matching a non-wildcard LHS of a refspec against a list of refs,
> find_ref_by_name_abbrev() returns the first ref that matches using the
> DWIM rules used by refname_match() in refs.c, even if an exact match
> occurs later in the list of refs.
>
> This causes unexpected behavior when (for example) fetching using the
> refspec "refs/heads/s:<something>" from a remote with both
> "refs/heads/refs/heads/s" and "refs/heads/s". (Even if the former was
> inadvertently created, one would still expect the latter to be fetched.)
>
> This problem has only been observed when the desired ref comes after the
> undesired ref in alphabetical order. However, for completeness, the test
> in this patch also checks what happens when the desired ref comes first
> alphabetically.
>
> Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
> ---
>  remote.c         |  7 +++++--
>  t/t5510-fetch.sh | 28 ++++++++++++++++++++++++++++
>  2 files changed, 33 insertions(+), 2 deletions(-)

Very clear analysis --- thank you.

> --- a/remote.c
> +++ b/remote.c
> @@ -1687,12 +1687,15 @@ static struct ref *get_expanded_map(const struct ref *remote_refs,
>  
>  static const struct ref *find_ref_by_name_abbrev(const struct ref *refs, const char *name)
>  {
> +	const struct ref *best_match = NULL;
>  	const struct ref *ref;
>  	for (ref = refs; ref; ref = ref->next) {
> -		if (refname_match(name, ref->name))
> +		if (!strcmp(name, ref->name))
>  			return ref;
> +		if (refname_match(name, ref->name))

Should this check be

		if (!best_match && refname_match(name, ref->name))

?  Otherwise, this would make us prefer the last ref instead of the
first (which probably doesn't matter but would be an unintended
behavior change).

> +			best_match = ref;
>  	}
> -	return NULL;
> +	return best_match;
>  }
>  
>  struct ref *get_remote_ref(const struct ref *remote_refs, const char *name)
> diff --git a/t/t5510-fetch.sh b/t/t5510-fetch.sh
> index e402aee6a..da88f35f0 100755
> --- a/t/t5510-fetch.sh
> +++ b/t/t5510-fetch.sh
> @@ -535,6 +535,34 @@ test_expect_success "should be able to fetch with duplicate refspecs" '
>  	)
>  '
>  
> +test_expect_success 'LHS of refspec prefers exact matches' '

Nice.

With or without the suggested tweak,

Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>

Thanks for a pleasant read.

> +	mkdir lhs-exact &&
> +	(
> +		cd lhs-exact &&
> +		git init server &&
> +		test_commit -C server unwanted &&
> +		test_commit -C server wanted &&
> +
> +		git init client &&
> +
> +		# Check a name coming after "refs" alphabetically ...
> +		git -C server update-ref refs/heads/s wanted &&
> +		git -C server update-ref refs/heads/refs/heads/s unwanted &&
> +		git -C client fetch ../server refs/heads/s:refs/heads/checkthis &&
> +		git -C server rev-parse wanted >expect &&
> +		git -C client rev-parse checkthis >actual &&
> +		test_cmp expect actual &&
> +
> +		# ... and one before.
> +		git -C server update-ref refs/heads/q wanted &&
> +		git -C server update-ref refs/heads/refs/heads/q unwanted &&
> +		git -C client fetch ../server refs/heads/q:refs/heads/checkthis &&
> +		git -C server rev-parse wanted >expect &&
> +		git -C client rev-parse checkthis >actual &&
> +		test_cmp expect actual
> +	)
> +'
> +
>  # configured prune tests
>  
>  set_config_tristate () {

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] remote: prefer exact matches when using refspecs
  2018-07-31 21:18 [PATCH] remote: prefer exact matches when using refspecs Jonathan Tan
  2018-07-31 21:31 ` Jonathan Nieder
@ 2018-07-31 21:53 ` Junio C Hamano
  2018-07-31 22:28   ` Junio C Hamano
  1 sibling, 1 reply; 6+ messages in thread
From: Junio C Hamano @ 2018-07-31 21:53 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git

Jonathan Tan <jonathantanmy@google.com> writes:

> When matching a non-wildcard LHS of a refspec against a list of refs,
> find_ref_by_name_abbrev() returns the first ref that matches using the
> DWIM rules used by refname_match() in refs.c, even if an exact match
> occurs later in the list of refs.

When you have refs/heads/refs/heads/s and refs/heads/s, and if you
ask for refs/heads/s, you want that exact match (i.e. the latter) to
take precedence over DWIMmed refs/heads/refs/heads/s.

What is unfortunate is that ref_rev_parse_rules[] array already
expresses that preference by listing the "fullname" choice "%.*s"
before other DWIM choices like "refs/heads/%.*s".

Now we iterate over refs we say from ls-remote output, and with the
updated one, the logic _manually_ gives the precedence to the first
entry in ref_rev_parse_rules[], so in that "I have a branch s and
also another branch refs/heads/s" case, that may happen to work, but
would the updated code do the right thing when you have entries that
can match, say, both second and third entry in the rules[] and
tiebreak correctly the same way?  Say you ask for "tags/T" when
there are "refs/tags/T" and "refs/heads/tags/T" at the same time in
the refs linked list.  None of the ref on refs list trigger
!strcmp() as there is no exact mqatch, and refname_match() would say
"Yeah I see a match" when checking "refs/heads/tags/T" and say it is
the best match.  Then it finds "refs/tags/T" also on the refs list
and finds it also matches user-supplied "tags/T".

In order to resolve this correctly with the precedence rules, I
think you need to make refname_match() return the precedence number
(e.g. give 1 to "%.*s", 2 to "refs/%.*s", etc., using the index in
ref_rev_parse_rules[] array), and make this loop keep track of the
"best" match paying attention to the returned precedence.

> This causes unexpected behavior when (for example) fetching using the
> refspec "refs/heads/s:<something>" from a remote with both
> "refs/heads/refs/heads/s" and "refs/heads/s". (Even if the former was
> inadvertently created, one would still expect the latter to be fetched.)
>
> This problem has only been observed when the desired ref comes after the
> undesired ref in alphabetical order. However, for completeness, the test
> in this patch also checks what happens when the desired ref comes first
> alphabetically.
>
> Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
> ---
>  remote.c         |  7 +++++--
>  t/t5510-fetch.sh | 28 ++++++++++++++++++++++++++++
>  2 files changed, 33 insertions(+), 2 deletions(-)
>
> diff --git a/remote.c b/remote.c
> index 3fd43453f..eeffe3488 100644
> --- a/remote.c
> +++ b/remote.c
> @@ -1687,12 +1687,15 @@ static struct ref *get_expanded_map(const struct ref *remote_refs,
>  
>  static const struct ref *find_ref_by_name_abbrev(const struct ref *refs, const char *name)
>  {
> +	const struct ref *best_match = NULL;
>  	const struct ref *ref;
>  	for (ref = refs; ref; ref = ref->next) {
> -		if (refname_match(name, ref->name))
> +		if (!strcmp(name, ref->name))
>  			return ref;
> +		if (refname_match(name, ref->name))
> +			best_match = ref;
>  	}
> -	return NULL;
> +	return best_match;
>  }
>  
>  struct ref *get_remote_ref(const struct ref *remote_refs, const char *name)
> diff --git a/t/t5510-fetch.sh b/t/t5510-fetch.sh
> index e402aee6a..da88f35f0 100755
> --- a/t/t5510-fetch.sh
> +++ b/t/t5510-fetch.sh
> @@ -535,6 +535,34 @@ test_expect_success "should be able to fetch with duplicate refspecs" '
>  	)
>  '
>  
> +test_expect_success 'LHS of refspec prefers exact matches' '
> +	mkdir lhs-exact &&
> +	(
> +		cd lhs-exact &&
> +		git init server &&
> +		test_commit -C server unwanted &&
> +		test_commit -C server wanted &&
> +
> +		git init client &&
> +
> +		# Check a name coming after "refs" alphabetically ...
> +		git -C server update-ref refs/heads/s wanted &&
> +		git -C server update-ref refs/heads/refs/heads/s unwanted &&
> +		git -C client fetch ../server refs/heads/s:refs/heads/checkthis &&
> +		git -C server rev-parse wanted >expect &&
> +		git -C client rev-parse checkthis >actual &&
> +		test_cmp expect actual &&
> +
> +		# ... and one before.
> +		git -C server update-ref refs/heads/q wanted &&
> +		git -C server update-ref refs/heads/refs/heads/q unwanted &&
> +		git -C client fetch ../server refs/heads/q:refs/heads/checkthis &&
> +		git -C server rev-parse wanted >expect &&
> +		git -C client rev-parse checkthis >actual &&
> +		test_cmp expect actual
> +	)
> +'
> +
>  # configured prune tests
>  
>  set_config_tristate () {

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] remote: prefer exact matches when using refspecs
  2018-07-31 21:53 ` Junio C Hamano
@ 2018-07-31 22:28   ` Junio C Hamano
  2018-07-31 23:33     ` Jonathan Tan
  0 siblings, 1 reply; 6+ messages in thread
From: Junio C Hamano @ 2018-07-31 22:28 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git

Junio C Hamano <gitster@pobox.com> writes:

> In order to resolve this correctly with the precedence rules, I
> think you need to make refname_match() return the precedence number
> (e.g. give 1 to "%.*s", 2 to "refs/%.*s", etc., using the index in
> ref_rev_parse_rules[] array), and make this loop keep track of the
> "best" match paying attention to the returned precedence.

That is, something like this, perhaps.  The resulting behaviour
should match how "git rev-parse X" would give precedence to tag X
over branch X by going this route.  What do you think?

 refs.c   |  8 +++-----
 remote.c | 13 ++++++++++---
 2 files changed, 13 insertions(+), 8 deletions(-)

diff --git a/refs.c b/refs.c
index 457fb78057..fd1a7f7478 100644
--- a/refs.c
+++ b/refs.c
@@ -495,11 +495,9 @@ int refname_match(const char *abbrev_name, const char *full_name)
 	const char **p;
 	const int abbrev_name_len = strlen(abbrev_name);
 
-	for (p = ref_rev_parse_rules; *p; p++) {
-		if (!strcmp(full_name, mkpath(*p, abbrev_name_len, abbrev_name))) {
-			return 1;
-		}
-	}
+	for (p = ref_rev_parse_rules; *p; p++)
+		if (!strcmp(full_name, mkpath(*p, abbrev_name_len, abbrev_name)))
+			return (p - ref_rev_parse_rules) + 1;
 
 	return 0;
 }
diff --git a/remote.c b/remote.c
index 86e6098774..ed2f80e45c 100644
--- a/remote.c
+++ b/remote.c
@@ -1689,11 +1689,18 @@ static struct ref *get_expanded_map(const struct ref *remote_refs,
 static const struct ref *find_ref_by_name_abbrev(const struct ref *refs, const char *name)
 {
 	const struct ref *ref;
+	const struct ref *best_match = NULL;
+	int best_score = -1;
+
 	for (ref = refs; ref; ref = ref->next) {
-		if (refname_match(name, ref->name))
-			return ref;
+		int score = refname_match(name, ref->name);
+
+		if ((score && (best_score < 0 || score < best_score))) {
+			best_match = ref;
+			best_score = score;
+		}
 	}
-	return NULL;
+	return best_match;
 }
 
 struct ref *get_remote_ref(const struct ref *remote_refs, const char *name)

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH] remote: prefer exact matches when using refspecs
  2018-07-31 22:28   ` Junio C Hamano
@ 2018-07-31 23:33     ` Jonathan Tan
  2018-08-01  0:32       ` Junio C Hamano
  0 siblings, 1 reply; 6+ messages in thread
From: Jonathan Tan @ 2018-07-31 23:33 UTC (permalink / raw)
  To: gitster; +Cc: jonathantanmy, git

> That is, something like this, perhaps.  The resulting behaviour
> should match how "git rev-parse X" would give precedence to tag X
> over branch X by going this route.  What do you think?

[snip]

>  static const struct ref *find_ref_by_name_abbrev(const struct ref *refs, const char *name)
>  {
>  	const struct ref *ref;
> +	const struct ref *best_match = NULL;
> +	int best_score = -1;
> +
>  	for (ref = refs; ref; ref = ref->next) {
> -		if (refname_match(name, ref->name))
> -			return ref;
> +		int score = refname_match(name, ref->name);
> +
> +		if ((score && (best_score < 0 || score < best_score))) {
> +			best_match = ref;
> +			best_score = score;
> +		}
>  	}
> -	return NULL;
> +	return best_match;
>  }

This looks good to me. I've checked that refname_match (and
branch_merge_matches(), which returns the result of refname_match()
directly) is only used in "if" contexts, so making it return a value
other than 1 is fine.

I would initialize best_score to INT_MAX to avoid needing the
"best_score < 0" comparison, but don't feel strongly about it.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] remote: prefer exact matches when using refspecs
  2018-07-31 23:33     ` Jonathan Tan
@ 2018-08-01  0:32       ` Junio C Hamano
  0 siblings, 0 replies; 6+ messages in thread
From: Junio C Hamano @ 2018-08-01  0:32 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git

Jonathan Tan <jonathantanmy@google.com> writes:

> This looks good to me. I've checked that refname_match (and
> branch_merge_matches(), which returns the result of refname_match()
> directly) is only used in "if" contexts, so making it return a value
> other than 1 is fine.

Yes, the log message should say that existing callers only care if
the returned value is 0 or not (i.e. if we have any match).

> I would initialize best_score to INT_MAX to avoid needing the
> "best_score < 0" comparison, but don't feel strongly about it.

If we want to lose that "have we seen any possible result?" check, I
think defining (ARRAY_SIZE(ref_rev_parse_rules) - p) as the score,
so that the "full path" gets score 6 (or whatever) and "some remote
tracking name" (like "origin->refs/remotes/origin/HEAD") gets score
of 1 (smallest but true) may make more sense.  Then, start the best
score at 0 and every time we get a score strictly better than the
best so far, we overwrite the best.  That way, we can even lose the
"did we get any positive score?" check, too, and making the
condition in the inner loop quite simple, i.e.

	int best_score = 0;
	...
	for (ref = refs; ref; ref = ref->next) {
		int score = refname_match(name, ref->name);

		if (best_score < score) {
			best_match = ref;
			best_score = score;
		}
	}

We need a commit log message (hopefully we can lift most parts from
your patch in this thread) and a test update to ensure that the same
precedence order used as ref resolution (i.e. tags get higher prio
over branches, etc.) in addition to the test you had in your patch.

Thanks.

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2018-08-01  0:32 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-07-31 21:18 [PATCH] remote: prefer exact matches when using refspecs Jonathan Tan
2018-07-31 21:31 ` Jonathan Nieder
2018-07-31 21:53 ` Junio C Hamano
2018-07-31 22:28   ` Junio C Hamano
2018-07-31 23:33     ` Jonathan Tan
2018-08-01  0:32       ` Junio C Hamano

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).