git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Jeff King <peff@peff.net>
To: "brian m. carlson" <sandals@crustytoothpaste.net>,
	Derrick Stolee <stolee@gmail.com>,
	Junio C Hamano <gitster@pobox.com>,
	git@vger.kernel.org
Subject: Re: [ANNOUNCE] Git v2.19.0-rc0
Date: Wed, 22 Aug 2018 02:07:35 -0400	[thread overview]
Message-ID: <20180822060735.GA13195@sigill.intra.peff.net> (raw)
In-Reply-To: <20180822053626.GB535143@genre.crustytoothpaste.net>

On Wed, Aug 22, 2018 at 05:36:26AM +0000, brian m. carlson wrote:

> On Tue, Aug 21, 2018 at 11:03:44PM -0400, Jeff King wrote:
> > So I wonder if there's some other way to tell the compiler that we'll
> > only have a few values. An enum comes to mind, though I don't think the
> > enum rules are strict enough to make this guarantee (after all, it's OK
> > to bitwise-OR enums, so they clearly don't specify all possible values).
> 
> I was thinking about this:
> 
> diff --git a/cache.h b/cache.h
> index 1398b2a4e4..1f5c6e9319 100644
> --- a/cache.h
> +++ b/cache.h
> @@ -1033,7 +1033,14 @@ extern const struct object_id null_oid;
>  
>  static inline int hashcmp(const unsigned char *sha1, const unsigned char *sha2)
>  {
> -	return memcmp(sha1, sha2, the_hash_algo->rawsz);
> +	switch (the_hash_algo->rawsz) {
> +		case 20:
> +			return memcmp(sha1, sha2, 20);
> +		case 32:
> +			return memcmp(sha1, sha2, 32);
> +		default:
> +			assert(0);
> +	}
>  }

Unfortunately this version doesn't seem to be any faster than the status
quo. And looking at the generated asm, it still looks to be calling
memcpy(). Removing the "case 32" branch switches it back to fast
assembly (this is all using gcc 8.2.0, btw). So I think we're deep into
guessing what the optimizer is going to do, and there's a good chance
that other versions are going to optimize it differently.

We might be better off just writing it out manually. Unfortunately, it's
a bit hard because the neg/0/pos return is more expensive to compute
than pure equality. And only the compiler knows at each inlined site
whether we actually want equality. So now we're back to switching every
caller to use hasheq() if that's what they want.

But _if_ we're OK with that, and _if_ we don't mind some ifdefs for
portability, then this seems as fast as the original (memcmp+constant)
code on my machine:

diff --git a/cache.h b/cache.h
index b1fd3d58ab..c406105f3c 100644
--- a/cache.h
+++ b/cache.h
@@ -1023,7 +1023,16 @@ extern const struct object_id null_oid;
 
 static inline int hashcmp(const unsigned char *sha1, const unsigned char *sha2)
 {
-	return memcmp(sha1, sha2, the_hash_algo->rawsz);
+	switch (the_hash_algo->rawsz) {
+	case 20:
+		if (*(uint32_t *)sha1 == *(uint32_t *)sha2 &&
+		    *(unsigned __int128 *)(sha1+4) == *(unsigned __int128 *)(sha2+4))
+			return 0;
+	case 32:
+		return memcmp(sha1, sha2, 32);
+	default:
+		assert(0);
+	}
 }
 
 static inline int oidcmp(const struct object_id *oid1, const struct object_id *oid2)

Which is really no surprise, because the generated asm looks about the
same. There are obviously alignment questions there. It's possible it
could even be written portably as a simple loop. Or maybe not. We used
to do that, but modern compilers were able to optimize the memcmp
better. Maybe that's changed. Or maybe they were simply unwilling to
unroll a 20-length loop to find out that it could be turned into a few
quad-word compares.

> That would make it obvious that there are at most two options.
> Unfortunately, gcc for me determines that the buffer in walker.c is 20
> bytes in size and steadfastly refuses to compile because it doesn't know
> that the value will never be 32 in our codebase currently.  I'd need to
> send in more patches before it would compile.

Yeah, I see that warning all over the place (everywhere that calls
is_null_oid(), which is passing in a 20-byte buffer).

> I don't know if something like this is an improvement or now, but this
> seems to at least compile:
> 
> diff --git a/cache.h b/cache.h
> index 1398b2a4e4..3207f74771 100644
> --- a/cache.h
> +++ b/cache.h
> @@ -1033,7 +1033,13 @@ extern const struct object_id null_oid;
>  
>  static inline int hashcmp(const unsigned char *sha1, const unsigned char *sha2)
>  {
> -	return memcmp(sha1, sha2, the_hash_algo->rawsz);
> +	switch (the_hash_algo->rawsz) {
> +		case 20:
> +		case 32:
> +			return memcmp(sha1, sha2, the_hash_algo->rawsz);
> +		default:
> +			assert(0);
> +	}

I think that would end up with the same slow code, as gcc would rather
call memcmp than expand out the two sets of asm.

> I won't have time to sit down and test this out until tomorrow afternoon
> at the earliest.  If you want to send in something in the mean time,
> even if that limits things to just 20 for now, that's fine.

I don't have a good option. The assert() thing works until I add in the
"32" branch, but that's just punting the issue off until you add support
for the new hash.

Hand-rolling our own asm or C is a portability headache, and we need to
change all of the callsites to use a new hasheq().

Hiding it behind a per-hash function is conceptually cleanest, but not
quite as fast. And it also requires hasheq().

So all of the solutions seem non-trivial.  Again, I'm starting to wonder
if it's worth chasing this few percent.

-Peff

  reply	other threads:[~2018-08-22  6:07 UTC|newest]

Thread overview: 58+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-08-20 22:13 [ANNOUNCE] Git v2.19.0-rc0 Junio C Hamano
2018-08-20 22:41 ` Stefan Beller
2018-08-20 23:39   ` Jonathan Nieder
2018-08-21  0:27     ` Jonathan Nieder
2018-08-21  0:46       ` Stefan Beller
2018-08-21 20:41 ` Derrick Stolee
2018-08-21 21:29   ` Jeff King
2018-08-22  0:48     ` brian m. carlson
2018-08-22  3:03       ` Jeff King
2018-08-22  3:36         ` Jeff King
2018-08-22 11:11           ` Derrick Stolee
2018-08-22  5:36         ` brian m. carlson
2018-08-22  6:07           ` Jeff King [this message]
2018-08-22  7:39             ` Ævar Arnfjörð Bjarmason
2018-08-22 11:14               ` Derrick Stolee
2018-08-22 15:17                 ` Jeff King
2018-08-22 16:08                   ` Duy Nguyen
2018-08-22 16:14                     ` Duy Nguyen
2018-08-22 16:26                       ` Jeff King
2018-08-22 16:49                         ` Derrick Stolee
2018-08-22 16:58                           ` Duy Nguyen
2018-08-22 17:04                             ` Derrick Stolee
2018-08-22 16:59                           ` Jeff King
2018-08-22 17:02                             ` Junio C Hamano
2018-08-22 15:14               ` Jeff King
2018-08-22 14:28           ` Derrick Stolee
2018-08-22 15:24             ` Jeff King
2018-08-22 12:42         ` Paul Smith
2018-08-22 15:23           ` Jeff King
2018-08-23  1:23             ` Jonathan Nieder
2018-08-23  2:16               ` Jeff King
2018-08-23  2:27                 ` Jonathan Nieder
2018-08-23  5:02                   ` Jeff King
2018-08-23  5:09                     ` brian m. carlson
2018-08-23  5:10                     ` Jonathan Nieder
2018-08-23 13:20                     ` Junio C Hamano
2018-08-23 16:31                       ` wide t/perf output, was " Jeff King
2018-08-23  3:47                 ` brian m. carlson
2018-08-23  5:04                   ` Jeff King
2018-08-23 10:26                     ` Derrick Stolee
2018-08-23 13:16                       ` Junio C Hamano
2018-08-23 16:14                       ` Jeff King
2018-08-23 23:30                         ` Jacob Keller
2018-08-23 23:40                           ` Jeff King
2018-08-24  0:06                             ` Jeff King
2018-08-24  0:16                               ` Jeff King
2018-08-24  2:48                                 ` Jacob Keller
2018-08-24  2:59                                   ` Jeff King
2018-08-24  6:45                                     ` Jeff King
2018-08-24 11:04                                       ` Derrick Stolee
2018-08-27 19:36                                     ` Junio C Hamano
2018-08-23 18:53                       ` Jeff King
2018-08-23 20:59                         ` Derrick Stolee
2018-08-24  6:56                           ` Jeff King
2018-08-24  7:57                             ` Ævar Arnfjörð Bjarmason
2018-08-24 16:45                           ` Derrick Stolee
2018-08-25  8:26                             ` Jeff King
2018-09-02 18:53                       ` Kaartic Sivaraam

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180822060735.GA13195@sigill.intra.peff.net \
    --to=peff@peff.net \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=sandals@crustytoothpaste.net \
    --cc=stolee@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).