git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Junio C Hamano <gitster@pobox.com>
To: Derrick Stolee <dstolee@microsoft.com>
Cc: git@vger.kernel.org, johannes.schindelin@gmx.de,
	git@jeffhostetler.com, kewillf@microsoft.com
Subject: Re: [PATCH 1/3] sha1_name: Create perf test for find_unique_abbrev()
Date: Mon, 18 Sep 2017 09:51:38 +0900	[thread overview]
Message-ID: <xmqqa81su8v9.fsf@gitster.mtv.corp.google.com> (raw)
In-Reply-To: <20170915165750.198201-2-dstolee@microsoft.com> (Derrick Stolee's message of "Fri, 15 Sep 2017 12:57:48 -0400")

Derrick Stolee <dstolee@microsoft.com> writes:

> +int cmd_main(int ac, const char **av)
> +{
> +	setup_git_directory();

As far as I recall, we do not (yet) allow declaration after
statement in our codebase.  Move this down to make it after all
decls.

> +
> +	unsigned int hash_delt = 0x13579BDF;
> +	unsigned int hash_base = 0x01020304;
> +	struct object_id oid;
> +
> +	int i, count = 0;
> +	int n = sizeof(struct object_id) / sizeof(int);

It probably is technically OK to assume sizeof(int) always equals to
sizeof(unsigned), but because you use 'n' _only_ to work with uint
and never with int, it would make more sense to match.  

But I do not think we want this "clever" optimization that involves
'n' in the first place.

> +	while (count++ < 100000) {
> +		for (i = 0; i < n; i++)
> +			((unsigned int*)oid.hash)[i] = hash_base;

Does it make sense to assume that uint is always 4-byte (so this
code won't work if it is 8-byte on your platform) and doing this is
faster than using platform-optimized memcpy()?

> +		find_unique_abbrev(oid.hash, MINIMUM_ABBREV);
> +
> +		hash_base += hash_delt;
> +	}

I also wonder if this is measuring the right thing.  I am guessing
that by making many queries for a unique abbreviation of "random"
(well, it is deterministic, but my point is these queries are not
based on the names of objects that exist in the repository) hashes,
this test measures how much time it takes for us to decide that such
a random hash does not exist.  In the real life use, we make the
opposite query far more frequently: we have an object that we _know_
exists in the repository and we try to find a sufficient length to
disambiguate it from others, and I suspect that these two use
different logic.  Don't you need to be measuring the time it takes
to compute the shortest abbreviation of an object that exists
instead?

> +	exit(0);
> +}
> diff --git a/t/perf/p0008-abbrev.sh b/t/perf/p0008-abbrev.sh
> new file mode 100755
> index 000000000..7c3fad807
> --- /dev/null
> +++ b/t/perf/p0008-abbrev.sh
> @@ -0,0 +1,12 @@
> +#!/bin/sh
> +
> +test_description='Test object disambiguation through abbreviations'
> +. ./perf-lib.sh
> +
> +test_perf_large_repo
> +
> +test_perf 'find_unique_abbrev()' '
> +	test-abbrev
> +'
> +
> +test_done

  reply	other threads:[~2017-09-18  0:51 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-09-15 16:57 [PATCH 0/3] Improve abbreviation disambiguation Derrick Stolee
2017-09-15 16:57 ` [PATCH 1/3] sha1_name: Create perf test for find_unique_abbrev() Derrick Stolee
2017-09-18  0:51   ` Junio C Hamano [this message]
2017-09-18 11:36     ` Derrick Stolee
2017-09-19  0:51       ` Junio C Hamano
2017-09-15 16:57 ` [PATCH 2/3] sha1_name: Unroll len loop in find_unique_abbrev_r Derrick Stolee
2017-09-15 16:57 ` [PATCH 3/3] sha1_name: Parse less while finding common prefix Derrick Stolee
2017-09-15 17:08 ` [PATCH 0/3] Improve abbreviation disambiguation Jonathan Nieder

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=xmqqa81su8v9.fsf@gitster.mtv.corp.google.com \
    --to=gitster@pobox.com \
    --cc=dstolee@microsoft.com \
    --cc=git@jeffhostetler.com \
    --cc=git@vger.kernel.org \
    --cc=johannes.schindelin@gmx.de \
    --cc=kewillf@microsoft.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).