From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on dcvr.yhbt.net X-Spam-Level: X-Spam-ASN: AS53758 23.128.96.0/24 X-Spam-Status: No, score=-11.7 required=3.0 tests=AWL,BAYES_00,DKIMWL_WL_MED, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, MAILING_LIST_MULTI,RCVD_IN_DNSWL_LOW,SPF_HELO_PASS,SPF_PASS, USER_IN_DEF_DKIM_WL shortcircuit=no autolearn=ham autolearn_force=no version=3.4.2 Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by dcvr.yhbt.net (Postfix) with ESMTP id B47251F5AE for ; Thu, 27 May 2021 17:50:48 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235452AbhE0RwT (ORCPT ); Thu, 27 May 2021 13:52:19 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:38430 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229791AbhE0RwR (ORCPT ); Thu, 27 May 2021 13:52:17 -0400 Received: from mail-qk1-x749.google.com (mail-qk1-x749.google.com [IPv6:2607:f8b0:4864:20::749]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id CC409C061574 for ; Thu, 27 May 2021 10:50:43 -0700 (PDT) Received: by mail-qk1-x749.google.com with SMTP id z2-20020a3765020000b02903a5f51b1c74so1054037qkb.7 for ; Thu, 27 May 2021 10:50:43 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=date:in-reply-to:message-id:mime-version:references:subject:from:to :cc; bh=PvqzhPZTSkattjrVKnNz3j6VGVFL3YcrWbevHOUvVDk=; b=SmEatSt+CYoVczeBu51cV/iS+Qg3/gxIPvDKRq/v07awoW2MwIMNI2QB7hDfOktnFY 8qI4TE4XQjKYm94a8lzokOZzIxfwfMuWlIZe/ACjdvIv27R9yk0R2av8MmvJRMFmrY1m ycyPd7Z+yeSr2PJ+2ltJPeSBXK/KxDvcPY2AEAgCIYju+JVBGxxZQV7U8rjZq0qiwO6q K195h0uKDGqVYtnw6dnbouvyam4kWFnD5AuJ2HWAAasnGV56Xj9EwdKt1EEY46PmJiDX XHEdGqTDj/FXF7l+2NTP7gNIfy5ss+6pU0wHv8pWttURq0+h4UvPc+QGCLsxmddRVSi2 3TmA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; bh=PvqzhPZTSkattjrVKnNz3j6VGVFL3YcrWbevHOUvVDk=; b=WnoiW+eLYawmnmSNPWfN14M9lAH7gk8KTQNcda3M13xwRpvasjein8Ry3+Nd3S5e9a 90i/Cc+JOp9UkfbRbWXe4wk4ET8gLrQn7ayF+YoDTJjvRUTODR6F46nODd94qu7u0fBH GR0I+yOx0AzHc2YHsStIT7U9OFj/+MKqXDevIt0bNjAM0VE8tLyEXxHPcDG/rbJzixcu dqBn71pUoPuQg3rKb9xvNeEhnenNnlXyNDnCefUGQlKD1nc/RKkE8EebmusD/9z4sHK/ w3WQyoL3g02kB6i2LkX8KeZjmuuxQssLYvpp5EMFjCYkVbXZW4kOZoDdv2FigP5VWDgW cQjg== X-Gm-Message-State: AOAM531ofSilkeG2ZBeifjcjC/yrEDqa96SRb5RW9GuK72/l+NMSamr4 VkYHoj0qQxsJZR9IQaI6b4O2x/3YintS+omAjqB1 X-Google-Smtp-Source: ABdhPJysUIecJae/wldI3SI3uvNsKEWb5vHmvb8Q4Hvut+EFHW6myKL3xYqu97BfdRpx9n9ncu/GrJ/r2cfI7nAvPncy X-Received: from twelve4.c.googlers.com ([fda3:e722:ac3:10:24:72f4:c0a8:437a]) (user=jonathantanmy job=sendgmr) by 2002:a05:6214:c8b:: with SMTP id r11mr3008432qvr.52.1622137842948; Thu, 27 May 2021 10:50:42 -0700 (PDT) Date: Thu, 27 May 2021 10:50:39 -0700 In-Reply-To: Message-Id: <20210527175039.2672716-1-jonathantanmy@google.com> Mime-Version: 1.0 References: X-Mailer: git-send-email 2.32.0.rc0.204.g9fa02ecfa5-goog Subject: Re: [PATCH v3 11/17] object-file.c: stop dying in parse_loose_header() From: Jonathan Tan To: avarab@gmail.com Cc: git@vger.kernel.org, gitster@pobox.com, peff@peff.net, j6t@kdbg.org, Jonathan Tan Content-Type: text/plain; charset="UTF-8" Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org > Start the libification of parse_loose_header() by making it return > error codes and data instead of invoking die() by itself. For now > we'll move the relevant die() call to loose_object_info() and > read_loose_object() to keep this change smaller, but in subsequent > commits we'll also libify those. > > The reason this makes sense is that with the refactoring of > parse_loose_header_extended() in an earlier commit the public > interface for parse_loose_header() no longer just accepts a "unsigned > long *sizep". Rather it accepts a "struct object_info *", that > structure will be populated with information about the object. > > It thus makes sense to further libify the interface so that it stops > calling die() when it encounters OBJ_BAD, and instead rely on its > callers to check the populated "oi->typep". > > This also allows us to simplify away the > unpack_loose_header_to_strbuf() function added in > 46f034483eb (sha1_file: support reading from a loose object of unknown > type, 2015-05-03). Its code was mostly copy/pasted between it and both > of unpack_loose_header() and unpack_loose_short_header(). We now have > a single unpack_loose_header() function which accepts an optional > "struct strbuf *" instead. > > I think the remaining unpack_loose_header() function could be further > simplified, we're carrying some complexity just to be able to emit a > garbage type longer than MAX_HEADER_LEN, we could alternatively just > say "we found a garbage type ..." instead, but let's > leave this in place for now. Looking at the patch itself, this patch: 1. Combines unpack_loose_header(), unpack_loose_short_header(), and unpack_loose_header_to_strbuf() into one function. It does different things depending on whether the struct strbuf * is provided. 2. Updates parse_loose_header() to: a. never die upon invalid object type b. not accept a flags argument (which was used solely to control whether it died or not upon invalid object type) 3. Updates the callers of these functions. I think 2b should have been mentioned in the commit message. (And overall I think that the commit message could be shorter, but what information to include and exclude is a subjective matter, so I won't concern myself so much about that.) Also, I think that 1 should be split into its own commit. As it is, I'm not even sure if it's a good idea - for me, it is confusing for a function to consume or not consume more of the stream depending on whether an argument is NULL or not. > /* > * Check if entire header is unpacked in the first iteration. > */ > if (memchr(buffer, '\0', stream->next_out - (unsigned char *)buffer)) > return 0; > > + /* > + * We have a header longer than MAX_HEADER_LEN. We abort early > + * unless under we're running as e.g. "cat-file > + * --allow-unknown-type". > + */ > + if (!header) > + return -1; What do you mean by "unless under we're running as"? And how would we know at this point that we're running as "cat-file --allow-unknown-type" just by checking "header"? > - if ((flags & OBJECT_INFO_ALLOW_UNKNOWN_TYPE)) { > - if (unpack_loose_header_to_strbuf(&stream, map, mapsize, hdr, sizeof(hdr), &hdrbuf) < 0) > - status = error(_("unable to unpack %s header with --allow-unknown-type"), > - oid_to_hex(oid)); > - } else if (unpack_loose_header(&stream, map, mapsize, hdr, sizeof(hdr)) < 0) { > + > + hdr_ret = unpack_loose_header(&stream, map, mapsize, hdr, sizeof(hdr), > + allow_unknown ? &hdrbuf : NULL); > + if (hdr_ret < 0) { > status = error(_("unable to unpack %s header"), > oid_to_hex(oid)); > } This hunk would go into the split commit (for the unpack_loose_header() refactoring) I suggested above. > - > - if (status < 0) { > - /* Do nothing */ > - } else if (hdrbuf.len) { > - if ((status = parse_loose_header(hdrbuf.buf, oi, flags)) < 0) > - status = error(_("unable to parse %s header with --allow-unknown-type"), > - oid_to_hex(oid)); > - } else if ((status = parse_loose_header(hdr, oi, flags)) < 0) { > + if (!status && parse_loose_header(hdrbuf.len ? hdrbuf.buf : hdr, oi) < 0) { > status = error(_("unable to parse %s header"), oid_to_hex(oid)); > } > + if (!allow_unknown && *oi->typep < 0) > + die(_("invalid object type")); I think this change doesn't need to be so big? The oi->typep check could just go in the "else if" branch that happens if --allow-unknown-type is not set. > diff --git a/object-store.h b/object-store.h > index d443964447..740edcac30 100644 > --- a/object-store.h > +++ b/object-store.h > @@ -477,11 +477,30 @@ int for_each_object_in_pack(struct packed_git *p, > int for_each_packed_object(each_packed_object_fn, void *, > enum for_each_object_flags flags); > > +/** > + * unpack_loose_header() initializes the data stream needed to unpack > + * a loose object header. > + * > + * Returns 0 on success. Returns negative values on error. > + * > + * It will only parse up to MAX_HEADER_LEN bytes unless an optional > + * "hdrbuf" argument is non-NULL. This is intended for use with > + * OBJECT_INFO_ALLOW_UNKNOWN_TYPE to extract the bad type for (error) > + * reporting. The full header will be extracted to "hdrbuf" for use > + * with parse_loose_header(). > + */ > int unpack_loose_header(git_zstream *stream, unsigned char *map, > unsigned long mapsize, void *buffer, > + unsigned long bufsiz, struct strbuf *hdrbuf); Parsing up to MAX_HEADER_LEN only occasionally is confusing. Could we always make it parse up to MAX_HEADER_LEN, and then put a truncated header in hdrbuf if it is too big (and therefore invalid)? > +/** > + * parse_loose_header() parses the starting " \0" of an > + * object. If it doesn't follow that format -1 is returned. To check > + * the validity of the populate the "typep" in the "struct > + * object_info". It will be OBJ_BAD if the object type is unknown. > + */ > +int parse_loose_header(const char *hdr, struct object_info *oi); Also mention what happens if the size is invalid.