From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on dcvr.yhbt.net X-Spam-Level: X-Spam-ASN: AS31976 209.132.180.0/23 X-Spam-Status: No, score=-3.6 required=3.0 tests=AWL,BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,RCVD_IN_DNSWL_HI,T_RP_MATCHES_RCVD shortcircuit=no autolearn=ham autolearn_force=no version=3.4.0 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by dcvr.yhbt.net (Postfix) with ESMTP id 969DC202F2 for ; Mon, 20 Nov 2017 20:58:49 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752923AbdKTU6r (ORCPT ); Mon, 20 Nov 2017 15:58:47 -0500 Received: from cloud.peff.net ([104.130.231.41]:35102 "HELO cloud.peff.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with SMTP id S1752749AbdKTU6p (ORCPT ); Mon, 20 Nov 2017 15:58:45 -0500 Received: (qmail 5097 invoked by uid 109); 20 Nov 2017 20:58:45 -0000 Received: from Unknown (HELO peff.net) (10.0.1.2) by cloud.peff.net (qpsmtpd/0.94) with SMTP; Mon, 20 Nov 2017 20:58:45 +0000 Authentication-Results: cloud.peff.net; auth=none Received: (qmail 16240 invoked by uid 111); 20 Nov 2017 20:59:00 -0000 Received: from sigill.intra.peff.net (HELO sigill.intra.peff.net) (10.0.0.7) by peff.net (qpsmtpd/0.94) with ESMTPA; Mon, 20 Nov 2017 15:59:00 -0500 Authentication-Results: peff.net; auth=pass (cram-md5) smtp.auth=relayok Received: by sigill.intra.peff.net (sSMTP sendmail emulation); Mon, 20 Nov 2017 15:58:43 -0500 Date: Mon, 20 Nov 2017 15:58:43 -0500 From: Jeff King To: Stefan Beller Cc: git Subject: Re: [PATCH 5/5] sha1_file: don't re-scan pack directory for null sha1 Message-ID: <20171120205843.chcpp5qryhta4ctt@sigill.intra.peff.net> References: <20171120202607.tf2pvegqe35mhxjs@sigill.intra.peff.net> <20171120203523.c3pt5qi43e24ttqq@sigill.intra.peff.net> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline In-Reply-To: Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org On Mon, Nov 20, 2017 at 12:47:53PM -0800, Stefan Beller wrote: > > This is the minimal fix that addresses the performance issues. > > I'd actually have no problem at all declaring that looking up a null > > sha1 is insane, and having the object-lookup routines simply return "no > > such object" without even doing the loose/pack lookup first. > > > > I'm also fine with going down the BUG() route and fixing the > > higher-level callers, but it's a big enough task (with little enough > > real-world impact) that I think it would be worth applying this in the > > meantime. > > It would have a lot of impact in the future, when new developers > are hindered mis-using the API. The (unwritten) series with introducing > BUG() would help a lot in 'holding it right' and I would expect fewer > performance > regressions over time. Yeah, I agree the BUG() route is the nicest. I started trying to hunt down every code path causing a test failure, though, and soon got overwhelmed. I actually think it's not the worst thing in the world to say "it's OK to look up the null sha1; the result is well-defined, and it does not exist". It's perhaps philosophically a little ugly, but I think it's quite practical. > The patch is impressively small for such a performance gain. > Personally I think (1) (which essentially means "making null sha1 > work like a regular sha1") is quite an endeavor at this point in time > for this code base. Yeah, it would be a crazy amount of effort to try to support that, for very little gain. If we did want to just declare that the null sha1 does not exist (even if you happen to have objects/00/00... in your repository), the patch is similarly easy: diff --git a/sha1_file.c b/sha1_file.c index 8a7c6b7eba..754fe7f03e 100644 --- a/sha1_file.c +++ b/sha1_file.c @@ -1152,6 +1152,9 @@ int sha1_object_info_extended(const unsigned char *sha1, struct object_info *oi, lookup_replace_object(sha1) : sha1; + if (is_null_sha1(sha1)) + return -1; + if (!oi) oi = &blank_oi; > As a tangent, a similar but solved problem in the diff code is how > NUL in user data is treated in xdiff for example, as there we kept > being careful since the beginning (though I think we don't have tests > for it, so it might be broken) There I think it's nice if we can be tolerant of NULs, because they're a thing that might actually happen in user input. A real null sha1 has a 2^-160 chance of happening, and the design of sha1 is such that it's hard for somebody to do it intentionally. You might be able to get up to some mischief with replace objects, though. -Peff