Date: Wed, 18 Jan 2023 21:31:36 -0500
From: Jeff King
To: Taylor Blau
Cc: git@vger.kernel.org, René Scharfe, Ævar Arnfjörð Bjarmason
Subject: Re: [PATCH 6/6] hash-object: use fsck for object checks
X-Mailing-List: git@vger.kernel.org

On Wed, Jan 18, 2023 at 04:34:02PM -0500, Taylor Blau wrote:

> That being said, let me play devil's advocate for a second. Do the new
> fsck checks slow anything in hash-object down significantly? If so, then
> it's plausible to imagine a hash-object caller who (a) doesn't use
> `--literally`, but (b) does care about throughput if they're writing a
> large number of objects at once.
>
> I don't know if such a situation exists, or if these new fsck checks
> even slow hash-object down enough to care. But I didn't catch a
> discussion of this case in your series, so I figured I'd bring it up
> here just in case.

That's a really good point to bring up.

Prior to timing anything, here were my guesses:

  - it won't make a big difference either way because the time is
    dominated by computing sha1 anyway

  - we might actually be a little faster for commits and tags in the new
    code, because they aren't allocating structs for the pointed-to
    objects (trees, parents, etc). Nor stuffing them into obj_hash, so
    our total memory usage would be lower.

  - trees may be a little slower, because we're doing more analysis on
    the filenames (sort order, various filesystem-specific checks for
    .git, etc)

And here's what I timed, using linux.git. First I pulled out the raw
object data like so:

  mkdir -p commit tag tree
  git cat-file --batch-all-objects --unordered --batch-check='%(objecttype) %(objectname)' |
  perl -alne 'print $F[1] unless $F[0] eq "blob"' |
  git cat-file --batch |
  perl -ne '
    /(\S+) (\S+) (\d+)/ or die "confusing: $_";
    my $dir = "$2/" . substr($1, 0, 2);
    my $fn = "$dir/" .
      substr($1, 2);
    mkdir($dir);
    open(my $fh, ">", $fn) or die "open($fn): $!";
    read(STDIN, my $buf, $3) or die "read($3): $!";
    print $fh $buf;
    read(STDIN, $buf, 1); # trailing newline
  '

And then I timed it like this:

  find commit -type f | sort >input
  hyperfine -L v old,new './git.{v} hash-object --stdin-paths -t commit <input'

  find tree -type f | sort >input
  hyperfine -L v old,new './git.{v} hash-object --stdin-paths -t tree