From: Junio C Hamano
To: Derrick Stolee
Cc: Patrick Steinhardt, Derrick Stolee via GitGitGadget,
    git@vger.kernel.org, me@ttaylorr.com, abhishekkumar8222@gmail.com
Subject: Re: [PATCH 3/7] commit-graph: start parsing generation v2 (again)
References: <1b9912f7-87be-2520-bb53-9e23529ad233@github.com>
Date: Wed, 02 Mar 2022 10:15:10 -0800
In-Reply-To: (Derrick Stolee's message of "Wed, 2 Mar 2022 09:57:17 -0500")
X-Mailing-List: git@vger.kernel.org

Derrick Stolee writes:

> Since our repro relies on private information, but is consistent, I
> wonder if we should take the patch below, which starts to ignore the
> older generation number v2 data and only writes freshly-computed
> numbers.

;-)

> Clearly, there is something else going on. The situation is not
> completely understood, but the errors do not reproduce if the
> commit-graphs are all generated by a Git version including these recent
> fixes.

Do you mean "we know doing X and then Y and then Z on this
particular private data with older version of Git without those two
fixes will lead to a broken timestamp, but doing exactly the same
with the two fixes, the breakage does not reproduce"?

If so, that is quite encouraging news.

Thanks for working well together.

> If we cannot trust the existing data in the GDAT and GDOV chunks, then
> we can alter the format to change the chunk IDs for these chunks. This
> causes the new version of Git to silently ignore the older chunks (and
> disabling generation number v2 in the process) while writing new
> commit-graph files with correct data in the GDA2 and GDO2 chunks.
>
> Update commit-graph-format.txt including a historical note about these
> deprecated chunks.

Sensible.

> @@ -156,3 +156,11 @@ CHUNK DATA:
>  TRAILER:
>
>    H-byte HASH-checksum of all of the above.
> +
> +== Historical Notes:
> +
> +The Generation Data (GDA2) and Generation Data Overflow (GDO2) chunks have
> +the number '2' in their chunk IDs because a previous version of Git wrote
> +possibly erroneous data in these chunks with the IDs "GDAT" and "GDOV". By
> +changing the IDs, newer versions of Git will silently ignore those older
> +chunks and write the new information without trusting the incorrect data.

Good.

How does a new version of Git skip and ignore GDAT and GDOV in
existing files? By not having any code to recognize what they are?

I am wondering if there is some notion of "if you do not understand
what this chunk is, you are incapable of handling this file
correctly, so do not use it" kind of bit per chunks (similar to the
index extensions where ones that begin with [A-Z] are optional) that
may negatively affect this plan.

Thanks.

> diff --git a/commit-graph.c b/commit-graph.c
> index b86a6a634fe..fb2ced0bd6d 100644
> --- a/commit-graph.c
> +++ b/commit-graph.c
> @@ -39,8 +39,8 @@ void git_test_write_commit_graph_or_die(void)
>  #define GRAPH_CHUNKID_OIDFANOUT 0x4f494446 /* "OIDF" */
>  #define GRAPH_CHUNKID_OIDLOOKUP 0x4f49444c /* "OIDL" */
>  #define GRAPH_CHUNKID_DATA 0x43444154 /* "CDAT" */
> -#define GRAPH_CHUNKID_GENERATION_DATA 0x47444154 /* "GDAT" */
> -#define GRAPH_CHUNKID_GENERATION_DATA_OVERFLOW 0x47444f56 /* "GDOV" */
> +#define GRAPH_CHUNKID_GENERATION_DATA 0x47444132 /* "GDA2" */
> +#define GRAPH_CHUNKID_GENERATION_DATA_OVERFLOW 0x47444f32 /* "GDO2" */
>  #define GRAPH_CHUNKID_EXTRAEDGES 0x45444745 /* "EDGE" */
>  #define GRAPH_CHUNKID_BLOOMINDEXES 0x42494458 /* "BIDX" */
>  #define GRAPH_CHUNKID_BLOOMDATA 0x42444154 /* "BDAT" */
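
To make the "skip by not recognizing" question concrete, here is a
minimal sketch of a reader that finds chunks only by looking up the
IDs it knows. It is an illustration under assumptions, not Git's
actual chunk-format code: struct toy_chunk, toy_find_chunk() and
chunk_id_from_name() are hypothetical names, and only the two
#define values are taken from the patch. The sketch assumes there is
no per-chunk "required" bit, which is exactly the open question
above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* From the patch: each chunk ID is four ASCII bytes packed big-endian. */
#define GRAPH_CHUNKID_GENERATION_DATA 0x47444132 /* "GDA2" */
#define GRAPH_CHUNKID_GENERATION_DATA_OVERFLOW 0x47444f32 /* "GDO2" */

/* Hypothetical helper: pack a 4-character name into a 32-bit chunk ID. */
uint32_t chunk_id_from_name(const char *name)
{
	assert(strlen(name) == 4);
	return ((uint32_t)(unsigned char)name[0] << 24) |
	       ((uint32_t)(unsigned char)name[1] << 16) |
	       ((uint32_t)(unsigned char)name[2] << 8) |
	       (uint32_t)(unsigned char)name[3];
}

/* Hypothetical table-of-contents entry; not Git's real structure. */
struct toy_chunk {
	uint32_t id;
	size_t offset;
};

/*
 * Return the entry whose ID matches, or NULL if there is none.
 * Entries with any other ID are never matched, so in this model a
 * file that only carries "GDAT"/"GDOV" looks like a file with no
 * generation data at all.
 */
const struct toy_chunk *toy_find_chunk(const struct toy_chunk *toc,
				       size_t nr, uint32_t id)
{
	size_t i;

	for (i = 0; i < nr; i++)
		if (toc[i].id == id)
			return &toc[i];
	return NULL;
}
```

Under that assumption, a table of contents written by the older
version yields NULL for GRAPH_CHUNKID_GENERATION_DATA, and the
caller would fall back to not using generation number v2. If the
format did carry a "must understand" marker per chunk, this lookup
alone would not be enough.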