Date: Fri, 9 Apr 2021 13:51:19 -0400
From: Jeff King
To: Ævar Arnfjörð Bjarmason
Cc: git@vger.kernel.org, Junio C Hamano, Johannes Schindelin, Taylor Blau, Elijah Newren
Subject: Re: [PATCH 1/2] blob.c: remove buffer & size arguments to parse_blob_buffer()

On Fri, Apr 09, 2021 at 10:07:27AM +0200, Ævar Arnfjörð Bjarmason wrote:

> As noted in the comment
> introduced in 837d395a5c0 (Replace
> parse_blob() with an explanatory comment, 2010-01-18) the old
> parse_blob() function and the current parse_blob_buffer() exist merely
> to provide consistency in the API.
>
> We're not going to parse blobs like we "parse" commits, trees or
> tags. So let's not have the parse_blob_buffer() take arguments that
> pretends that we do. Its only use is to set the "parsed" flag.
>
> See bd2c39f58f9 ([PATCH] don't load and decompress objects twice with
> parse_object(), 2005-05-06) for the introduction of parse_blob_buffer().

OK. Calling it parse_blob_buffer() is a little silly since it doesn't
even take a buffer anymore. But I guess parse_blob() might imply that it
actually loads the contents from disk to check them (which the other
parse_foo() functions do), so that's not a good name. So this might be
the least bad thing.

Given that there are only two callers, just setting blob->object.parsed
might not be unreasonable, either. But I don't think it's worth spending
too much time on.

> @@ -266,7 +266,7 @@ struct object *parse_object(struct repository *r, const struct object_id *oid)
>  			error(_("hash mismatch %s"), oid_to_hex(oid));
>  			return NULL;
>  		}
> -		parse_blob_buffer(lookup_blob(r, oid), NULL, 0);
> +		parse_blob_buffer(lookup_blob(r, oid));
>  		return lookup_object(r, oid);

Not new in your patch, but I wondered if this could cause a segfault
when lookup_blob() returns NULL. I _think_ the answer is "no". We'd hit
this code path when either:

  - lookup_object() returns an object with type OBJ_BLOB, in which case
    lookup_blob() would return that same object

  - lookup_object() returned NULL, in which case lookup_blob() will call
    it again, get NULL again, and then auto-create the blob and return
    it

So I think it is OK. But there are a bunch of duplicate hash lookups in
this code.
It would be clearer and more efficient as:

diff --git a/object.c b/object.c
index 2c32691dc4..2dfa038f13 100644
--- a/object.c
+++ b/object.c
@@ -262,12 +262,14 @@ struct object *parse_object(struct repository *r, const struct object_id *oid)
 	if ((obj && obj->type == OBJ_BLOB && repo_has_object_file(r, oid)) ||
 	    (!obj && repo_has_object_file(r, oid) &&
 	     oid_object_info(r, oid, NULL) == OBJ_BLOB)) {
+		if (!obj)
+			obj = create_object(r, oid, alloc_blob_node(r));
 		if (check_object_signature(r, repl, NULL, 0, NULL) < 0) {
 			error(_("hash mismatch %s"), oid_to_hex(oid));
 			return NULL;
 		}
-		parse_blob_buffer(lookup_blob(r, oid), NULL, 0);
-		return lookup_object(r, oid);
+		parse_blob_buffer(obj, NULL, 0);
+		return obj;
 	}
 
 	buffer = repo_read_object_file(r, oid, &type, &size);

but I doubt the efficiency matters much in practice. Those hash lookups
will be lost in the noise of computing the hash of the blob contents.

-Peff
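
For illustration only, the lookup_blob() reasoning above can be modeled
with a small standalone C sketch. Everything named toy_* is hypothetical
and only approximates git's object table; it assumes lookup_blob()
returns NULL solely when the object is already known under a different
type.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-ins for git's object types; not the real definitions. */
enum toy_type { TOY_BLOB, TOY_TREE };

struct toy_object {
	char name[8];		/* hypothetical short id standing in for an oid */
	enum toy_type type;
	int parsed;
};

/* A tiny fixed-size table in place of git's object hash. */
#define TOY_MAX 16
static struct toy_object toy_table[TOY_MAX];
static size_t toy_nr;

/* Modeled on lookup_object(): return the known object, or NULL. */
static struct toy_object *toy_lookup_object(const char *name)
{
	for (size_t i = 0; i < toy_nr; i++)
		if (!strcmp(toy_table[i].name, name))
			return &toy_table[i];
	return NULL;
}

/* Modeled on create_object(): register a new object of the given type. */
static struct toy_object *toy_create_object(const char *name, enum toy_type type)
{
	struct toy_object *obj;

	assert(toy_nr < TOY_MAX);
	obj = &toy_table[toy_nr++];
	strncpy(obj->name, name, sizeof(obj->name) - 1);
	obj->name[sizeof(obj->name) - 1] = '\0';
	obj->type = type;
	obj->parsed = 0;
	return obj;
}

/*
 * Modeled on lookup_blob(): auto-create when unknown, return the
 * existing object when it is already a blob, and return NULL only on
 * a type conflict (an assumption of this sketch).
 */
static struct toy_object *toy_lookup_blob(const char *name)
{
	struct toy_object *obj = toy_lookup_object(name);

	if (!obj)
		return toy_create_object(name, TOY_BLOB);
	return obj->type == TOY_BLOB ? obj : NULL;
}

/* Returns 1 if both cases from the discussion yield a non-NULL blob. */
static int toy_both_cases_non_null(void)
{
	struct toy_object *known = toy_create_object("aaaa", TOY_BLOB);

	/* Case 1: already known as a blob, so the same object comes back. */
	int case1 = toy_lookup_blob("aaaa") == known;

	/* Case 2: not known yet, so the blob is auto-created. */
	int case2 = !toy_lookup_object("bbbb") && toy_lookup_blob("bbbb") != NULL;

	return case1 && case2;
}
```

In this model the only way to get NULL back is an object already
registered under another type, and the condition guarding the blob path
in parse_object() rules that case out.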