From: Jeff King
To: "brian m. carlson", Derrick Stolee, Junio C Hamano, git@vger.kernel.org
Subject: Re: [ANNOUNCE] Git v2.19.0-rc0
Date: Tue, 21 Aug 2018 23:36:54 -0400
Message-ID: <20180822033654.GA30196@sigill.intra.peff.net>
In-Reply-To: <20180822030344.GA14684@sigill.intra.peff.net>

On Tue, Aug 21, 2018 at 11:03:44PM -0400, Jeff King wrote:

> with the obvious "oideq()" implementation added, that seems to get me
> to 2-3%. Not _quite_ as good as the original branching version I
> showed. And we had to touch all the callsites (although arguably that
> kind of "eq" function is a better interface anyway, since it obviously
> allows for more optimization).
>
> So maybe the branching thing is actually not so insane. It makes new
> hash_algo's Just Work; they just won't be optimized. And the change is
> very localized.

Hmph. So I went back to double-check my measurements on that branching
version, and I couldn't replicate it! It turns out what I showed (and
measured) before has a bug. Can you see it?

diff --git a/cache.h b/cache.h
index b1fd3d58ab..9c004a26c9 100644
--- a/cache.h
+++ b/cache.h
@@ -1023,7 +1023,10 @@ extern const struct object_id null_oid;
 
 static inline int hashcmp(const unsigned char *sha1, const unsigned char *sha2)
 {
-	return memcmp(sha1, sha2, the_hash_algo->rawsz);
+	if (the_hash_algo->rawsz == 20)
+		return memcmp(sha1, sha2, 20);
+	else
+		return memcmp(sha1, sha1, the_hash_algo->rawsz);
 }
 
 static inline int oidcmp(const struct object_id *oid1, const struct object_id *oid2)

The problem is the fallback code compares "sha1" to "sha1". The
compiler realizes that's a noop and is able to treat it like a
constant, essentially leaving only the first branch, which it then
expands into a few instructions.
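In other words, once the self-compare is folded away, the whole thing
effectively collapses to something like this (a hand-written sketch of
what the optimizer is left with, not actual compiler output):

static inline int hashcmp(const unsigned char *sha1, const unsigned char *sha2)
{
	if (the_hash_algo->rawsz == 20)
		return memcmp(sha1, sha2, 20); /* expanded inline for the constant size */
	return 0; /* memcmp(sha1, sha1, n) is always 0, so this side costs nothing */
}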
If we fix that bug, then we really do memcmp on either side of the
conditional. And the compiler is smart enough to realize that hey,
that's the same as just calling memcmp with the_hash_algo->rawsz on
either side, and we end up with roughly the same code we started with.
So the assert() version really is the fastest.

I didn't test it, but I suspect we could "trick" the compiler by having
the fallback call an opaque wrapper around memcmp() (rough sketch
below). That would prevent it from combining the two paths, and
presumably it would still optimize the constant-20 side. Or maybe it
would eventually decide our inline function is getting too big and
scrap it.

That probably crosses a line of craziness (if I didn't already cross
it two emails ago).

-Peff
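P.S. A rough, untested sketch of that opaque-wrapper idea; hashcmp_slow
is a made-up name, not anything that exists in git.git:

/* cache.h: keep the constant-size path visible to the compiler,
 * but hide the generic one behind a real function call. */
int hashcmp_slow(const unsigned char *sha1, const unsigned char *sha2);

static inline int hashcmp(const unsigned char *sha1, const unsigned char *sha2)
{
	if (the_hash_algo->rawsz == 20)
		return memcmp(sha1, sha2, 20);
	return hashcmp_slow(sha1, sha2);
}

/* some .c file, out of line, so the two memcmp calls cannot be merged: */
int hashcmp_slow(const unsigned char *sha1, const unsigned char *sha2)
{
	return memcmp(sha1, sha2, the_hash_algo->rawsz);
}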