git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Junio C Hamano <gitster@pobox.com>
To: Derrick Stolee <stolee@gmail.com>
Cc: Derrick Stolee <dstolee@microsoft.com>,
	git@vger.kernel.org, johannes.schindelin@gmx.de,
	git@jeffhostetler.com, kewillf@microsoft.com
Subject: Re: [PATCH 1/3] sha1_name: Create perf test for find_unique_abbrev()
Date: Tue, 19 Sep 2017 09:51:25 +0900	[thread overview]
Message-ID: <xmqqr2v3se7m.fsf@gitster.mtv.corp.google.com> (raw)
In-Reply-To: <960c73e2-6909-6894-a9ab-a191426aeda9@gmail.com> (Derrick Stolee's message of "Mon, 18 Sep 2017 04:36:09 -0700")

Derrick Stolee <stolee@gmail.com> writes:

>> But I do not think we want this "clever" optimization that involves
>> 'n' in the first place.
>>>> +	while (count++ < 100000) {
>>> +		for (i = 0; i < n; i++)
>>> +			((unsigned int*)oid.hash)[i] = hash_base;
>>
>> Does it make sense to assume that uint is always 4-byte (so this
>> code won't work if it is 8-byte on your platform) and doing this is
>> faster than using platform-optimized memcpy()?
>
> I'm not sure what you mean by using memcpy to improve this, because
> it would require calling memcpy in the inner loop, such as
>
> 	for (i = 0; i < n; i++)
> 		memcpy(oid.hash + i * sizeof(unsigned), &hash_base,
> 		       sizeof(unsigned));

Sorry, I left it without saying as I thought it was obvious, but
what I meant was to use a whole "struct oid", not just a single
unsigned (repeated), as the hash that is tested.  If you have an
array of object names you use in the test, then

	for (count = 0; count < limit; count++) {
		hashcpy(&oid.hash, &samples[count]);

		... do the probing ...
	}

> First, this doesn't just measure the time it takes to determine non-
> existence,

Sorry, my phrasing was indeed misleading.  I know the time we spend
to see if we have or do not have the object is the largest cycle
spender in these codepaths (and even if it were, I do not think that
is what you are trying to optimize in these patches anyway).  

But if I recall correctly, the way we come up with the unique
abbreviation for an object that exists and an object that does not
exist are different?  And because most of the time (think: "git log
-p" output) we would be finding abbreviation for objects that we do
have, benchmarking and tweaking the code that comes up with an
object that does not exist is not optimizing for the right case.

Back when I wrote that initial response, I didn't check how
different the code was between the two cases, but now I did.  It
seems that in both cases we start from the shortest-allowed length
and then extend the same way, and the only difference between these
two cases is that we return immediately when our candidate name is
long enough not to match any existing object when the full name
refers to an object we do not have, while we return only when
disambiguity is resolved.  I _think_ these amount to the same
computation (i.e. when an object with the full name we have exists,
the amount of computation we need to come up with its unique
abbreviation is the same as the computation we need to find the
unique abbreviation for the same name in another repository that has
identical set of objects, minus that single object), so from that
point of view, throwing random hashes, most of which would not name
any existing object, and measuring how much time it takes to run
get_short_oid() to compute the minimum length of the unique prefix
should be sufficient.

Thanks.


  reply	other threads:[~2017-09-19  0:51 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-09-15 16:57 [PATCH 0/3] Improve abbreviation disambiguation Derrick Stolee
2017-09-15 16:57 ` [PATCH 1/3] sha1_name: Create perf test for find_unique_abbrev() Derrick Stolee
2017-09-18  0:51   ` Junio C Hamano
2017-09-18 11:36     ` Derrick Stolee
2017-09-19  0:51       ` Junio C Hamano [this message]
2017-09-15 16:57 ` [PATCH 2/3] sha1_name: Unroll len loop in find_unique_abbrev_r Derrick Stolee
2017-09-15 16:57 ` [PATCH 3/3] sha1_name: Parse less while finding common prefix Derrick Stolee
2017-09-15 17:08 ` [PATCH 0/3] Improve abbreviation disambiguation Jonathan Nieder

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=xmqqr2v3se7m.fsf@gitster.mtv.corp.google.com \
    --to=gitster@pobox.com \
    --cc=dstolee@microsoft.com \
    --cc=git@jeffhostetler.com \
    --cc=git@vger.kernel.org \
    --cc=johannes.schindelin@gmx.de \
    --cc=kewillf@microsoft.com \
    --cc=stolee@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).