git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Junio C Hamano <gitster@pobox.com>
To: Derrick Stolee <dstolee@microsoft.com>
Cc: git@vger.kernel.org, stolee@gmail.com, git@jeffhostetler.com,
	sbeller@google.com, Jeff King <peff@peff.net>
Subject: Re: [PATCH v3 3/5] sha1_name: Unroll len loop in find_unique_abbrev_r
Date: Wed, 04 Oct 2017 15:07:25 +0900	[thread overview]
Message-ID: <xmqqtvzfcuoy.fsf@gitster.mtv.corp.google.com> (raw)
In-Reply-To: <20171002145651.204984-4-dstolee@microsoft.com> (Derrick Stolee's message of "Mon, 2 Oct 2017 10:56:49 -0400")

Derrick Stolee <dstolee@microsoft.com> writes:

> -int find_unique_abbrev_r(char *hex, const unsigned char *sha1, int len)
> +struct min_abbrev_data {
> +	unsigned int init_len;
> +	unsigned int cur_len;
> +	char *hex;
> +};
> +
> +static int extend_abbrev_len(const struct object_id *oid, void *cb_data)
>  {
> -	int status, exists;
> +	struct min_abbrev_data *mad = cb_data;
> +
> +	char *hex = oid_to_hex(oid);
> +	unsigned int i = mad->init_len;
> +	while (mad->hex[i] && mad->hex[i] == hex[i])
> +		i++;
> +
> +	if (i < GIT_MAX_RAWSZ && i >= mad->cur_len)
> +		mad->cur_len = i + 1;
> +
> +	return 0;
> +}

The idea is to keep the target to be abbreviated in mad->hex[], and
as the two functions find_short_{object_filename,packed_object}
discover objects whose first 'len' letters of hexname are the same
as the target, they'd call into the "always_call_fn" callback, which
is this function, and it keeps track of the maximum of the common
prefix it has seen.  Which makes sense.  Well, sort of.

> -	exists = has_sha1_file(sha1);
> -	while (len < GIT_SHA1_HEXSZ) {
> -		struct object_id oid_ret;
> -		status = get_short_oid(hex, len, &oid_ret, GET_OID_QUIETLY);
> -		if (exists
> -		    ? !status
> -		    : status == SHORT_NAME_NOT_FOUND) {
> -			hex[len] = 0;
> -			return len;
> -		}
> -		len++;
> -	}
> -	return len;

The "always_call_fn" thing is a big sledgehammer that overrides
everything else in update_candidates().  It bypasses the careful
machinery set up to avoid having to open ambiguous object to learn
their types as much as possible.  One narrow exception when it is OK
to use is if we never limit our candidates with type.

And it might appear that the conversion is safe (if only because we
do not see any type limitation in the get_short_oid() call above),
but I think there is one case where this patch changes the
behaviour: what happens if core.disambiguate was set to anything
other than "none"?  The new code does not know anything about type
based filtering, so it can end up reporting longer abbreviation than
it was asked to produce.  It may not be a problem in practice, though.

I am not sure if setting core.disambiguate is generally a good idea
in the first place, and if it is OK to break find_unique_abbrev()
with respect to the configuration variable like this patch does.

I'd feel safe if we get extra input from Peff, who introduced the
feature in 5b33cb1f ("get_short_sha1: make default disambiguation
configurable", 2016-09-27).

> +
> +	if (init_object_disambiguation(hex, len, &ds) < 0)
> +		return -1;
> +
> +	mad.init_len = len;
> +	mad.cur_len = len;
> +	mad.hex = hex;
> +
> +	ds.fn = extend_abbrev_len;
> +	ds.always_call_fn = 1;
> +	ds.cb_data = (void*)&mad;
> +
> +	find_short_object_filename(&ds);
> +	find_short_packed_object(&ds);
> +	(void)finish_object_disambiguation(&ds, &oid_ret);
> +
> +	hex[mad.cur_len] = 0;
> +	return mad.cur_len;
>  }
>  
>  const char *find_unique_abbrev(const unsigned char *sha1, int len)

  parent reply	other threads:[~2017-10-04  6:07 UTC|newest]

Thread overview: 47+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-09-25  9:54 [PATCH v2 0/5] Improve abbreviation disambiguation Derrick Stolee
2017-09-25  9:54 ` [PATCH v2 1/5] test-list-objects: List a subset of object ids Derrick Stolee
2017-09-26  9:24   ` Junio C Hamano
2017-10-05  8:42   ` Jeff King
2017-10-05  9:48     ` Junio C Hamano
2017-10-05 10:00       ` Jeff King
2017-10-05 10:16         ` Junio C Hamano
2017-10-05 12:39         ` Derrick Stolee
2017-10-06 14:11           ` Jeff King
2017-10-07 19:12             ` Derrick Stolee
2017-10-07 19:33               ` Jeff King
2017-10-08  1:46                 ` Junio C Hamano
2017-09-25  9:54 ` [PATCH v2 2/5] p0008-abbrev.sh: Test find_unique_abbrev() perf Derrick Stolee
2017-09-26  9:27   ` Junio C Hamano
2017-10-05  8:55   ` Jeff King
2017-10-05  8:57     ` Jeff King
2017-09-25  9:54 ` [PATCH v2 3/5] sha1_name: Unroll len loop in find_unique_abbrev_r Derrick Stolee
2017-09-25  9:54 ` [PATCH v2 4/5] sha1_name: Parse less while finding common prefix Derrick Stolee
2017-09-25 23:42   ` Stefan Beller
2017-10-02 14:52     ` Derrick Stolee
2017-09-25  9:54 ` [PATCH v2 5/5] sha1_name: Minimize OID comparisons during disambiguation Derrick Stolee
2017-10-02 14:56 ` [PATCH v3 0/5] Improve abbreviation disambituation Derrick Stolee
2017-10-05  9:49   ` Jeff King
2017-10-02 14:56 ` [PATCH v3 1/5] test-list-objects: List a subset of object ids Derrick Stolee
2017-10-03  4:16   ` Junio C Hamano
2017-10-02 14:56 ` [PATCH v3 2/5] p0008-abbrev.sh: Test find_unique_abbrev() perf Derrick Stolee
2017-10-02 14:56 ` [PATCH v3 3/5] sha1_name: Unroll len loop in find_unique_abbrev_r Derrick Stolee
2017-10-03 10:49   ` Junio C Hamano
2017-10-03 11:26     ` Derrick Stolee
2017-10-04  6:10       ` Junio C Hamano
2017-10-04 13:06         ` Derrick Stolee
2017-10-04  6:07   ` Junio C Hamano [this message]
2017-10-04 13:19     ` Derrick Stolee
2017-10-05  1:26       ` Junio C Hamano
2017-10-05  9:13     ` Jeff King
2017-10-05  9:50       ` Junio C Hamano
2017-10-02 14:56 ` [PATCH v3 4/5] sha1_name: Parse less while finding common prefix Derrick Stolee
2017-10-04  6:14   ` Junio C Hamano
2017-10-02 14:56 ` [PATCH v3 5/5] sha1_name: Minimize OID comparisons during disambiguation Derrick Stolee
2017-10-03 15:55   ` Stefan Beller
2017-10-03 17:05     ` Derrick Stolee
2017-10-05  9:44   ` Jeff King
2017-10-06 13:52     ` [PATCH] cleanup: fix possible overflow errors in binary search Derrick Stolee
2017-10-06 14:18       ` Jeff King
2017-10-06 14:41         ` Derrick Stolee
2017-10-08 18:29           ` [PATCH v2] " Derrick Stolee
2017-10-09 13:33             ` Jeff King

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=xmqqtvzfcuoy.fsf@gitster.mtv.corp.google.com \
    --to=gitster@pobox.com \
    --cc=dstolee@microsoft.com \
    --cc=git@jeffhostetler.com \
    --cc=git@vger.kernel.org \
    --cc=peff@peff.net \
    --cc=sbeller@google.com \
    --cc=stolee@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).