From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on dcvr.yhbt.net X-Spam-Level: X-Spam-ASN: AS31976 209.132.180.0/23 X-Spam-Status: No, score=-3.1 required=3.0 tests=AWL,BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, FROM_EXCESS_BASE64,HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI shortcircuit=no autolearn=ham autolearn_force=no version=3.4.2 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by dcvr.yhbt.net (Postfix) with ESMTP id 119311F453 for ; Wed, 1 May 2019 20:26:05 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726276AbfEAU0E (ORCPT ); Wed, 1 May 2019 16:26:04 -0400 Received: from mail-ed1-f66.google.com ([209.85.208.66]:35809 "EHLO mail-ed1-f66.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726077AbfEAU0D (ORCPT ); Wed, 1 May 2019 16:26:03 -0400 Received: by mail-ed1-f66.google.com with SMTP id p26so277633edr.2 for ; Wed, 01 May 2019 13:26:02 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:references:user-agent:in-reply-to:date :message-id:mime-version:content-transfer-encoding; bh=ryJplpuvXck9mwHqLS2pTBzZnld4C69eohrfdFKFAJ8=; b=XKUOmfUKfkuPyxJJ2LzoreZmDzzDsROJy140cQJHpMmV2lkn5/6BoRTbAj9P/lcR0h 57uF4tst/Eq7Idy+b/QkExJa7JTx9JPq1IH7+idIfJk7HOBW/tYsI6Vb4FVNvIpF1x4m oFiSCWSeqyOx1FzASCjelsN86CBix3IJcGxs/kC09HnGipL47zYq6QeIg5xLLNIFq7fU Mtwuz6uImgxaA0UU0PlFZ8EOVeC2rhdmrjfUpVbbpwuy1SpaO4JSmnyeloao6ty63fBn Xbpt96d4/+7gpESufKYCIRVCqO2jRpOA2zdVkW1rX8/2f68uOGDpxjw3S4FaeQetGezo 142Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:references:user-agent :in-reply-to:date:message-id:mime-version:content-transfer-encoding; bh=ryJplpuvXck9mwHqLS2pTBzZnld4C69eohrfdFKFAJ8=; b=q7GoFkARIgPJf9AGC2nOFV/wHz4dZQbbb+bEIjAedt/qJOWM/VKTQnmgLP3ox7iATP tFk04a/sfWmfVWF8XsULpS+lrgbgJbgMQcWYfbYFFfLXjmJuukJ4s4WnIVkCpm7S3/nD ZxVwZKXYIZx065xTjpwHCG/B4CjPDY2j7U8NFPnSyeIBHJh3wj1dSRQoeb97+sO9kr+x xGnrJVG4b0pZW391k1Gfs4OTpMtGi1WjyG6NCmqJ2ZpXfap4c2NhTMnUlbOtF3/fOdAI wHYeaXIgF00Tz9y1MFENcBQYBH/h+R7qjp+nYCbjVNWIbK39ycPHphtBd9PW901mKLuP 6Sew== X-Gm-Message-State: APjAAAVjFFAS/P2pz/ZMclm8bfAlipWe1p+0m9FrLySDwbC2x0jRRQ+r wVBjxbrbkQm2h2Q4BVISACg= X-Google-Smtp-Source: APXvYqyRwPj7CUG7SLC7ie4PhCoZtsLIaHfrEjzXOH7eGvN0hhJV0fKMsFfAZNNjerLbh5lNMhGswQ== X-Received: by 2002:aa7:ce07:: with SMTP id d7mr84703edv.294.1556742361392; Wed, 01 May 2019 13:26:01 -0700 (PDT) Received: from evledraar (dhcp-077-251-215-224.chello.nl. [77.251.215.224]) by smtp.gmail.com with ESMTPSA id u47sm1196667edm.86.2019.05.01.13.25.59 (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Wed, 01 May 2019 13:26:00 -0700 (PDT) From: =?utf-8?B?w4Z2YXIgQXJuZmrDtnLDsA==?= Bjarmason To: Derrick Stolee via GitGitGadget Cc: git@vger.kernel.org, sandals@crustytoothpaste.net, Junio C Hamano Subject: Re: [PATCH v3 0/6] Create commit-graph file format v2 References: User-agent: Debian GNU/Linux buster/sid; Emacs 26.1; mu4e 1.1.0 In-reply-to: Date: Wed, 01 May 2019 22:25:59 +0200 Message-ID: <87lfzprkfc.fsf@evledraar.gmail.com> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org On Wed, May 01 2019, Derrick Stolee via GitGitGadget wrote: > The commit-graph file format has some shortcomings that were discussed > on-list: > > 1. It doesn't use the 4-byte format ID from the_hash_algo. > > 2. There is no way to change the reachability index from generation numb= ers > to corrected commit date [1]. > > 3. The unused byte in the format could be used to signal the file is > incremental, but current clients ignore the value even if it is > non-zero. > > This series adds a new version (2) to the commit-graph file. The fifth by= te > already specified the file format, so existing clients will gracefully > respond to files with a different version number. The only real change now > is that the header takes 12 bytes instead of 8, due to using the 4-byte > format ID for the hash algorithm. > > The new bytes reserved for the reachability index version and incremental > file formats are now expected to be equal to the defaults. When we update > these values to be flexible in the future, if a client understands > commit-graph v2 but not those new values, then it will fail gracefully. > > NOTE: this series was rebased onto ab/commit-graph-fixes, as the conflicts > were significant and subtle. > > Updates in V3: Thanks for all the feedback so far! > > * Moved the version information into an unsigned char parameter, instead= of > a flag. > > * We no longer default to the v2 file format, as that will break users w= ho > downgrade. This required some changes to the test script. > > * Removed the "Future work" section from the commit-graph design document > in a new patch. > > * I did not change the file name for v2 file formats, as =C3=86var sugge= sted. > I'd like the discussion to continue on this topic. I won't repeat my outstanding v2 feedback about v1 & v2 incompatibilities, except to say that I'd in principle be fine with having a v2 format the way this series is adding it. I.e. saying "here it is, it's never written by default, we'll figure out these compat issues later". My only objection/nit on that point would be that the current docs/commit messages should make some mention of the really bad interactions between v1 and v2 on different git versions. However, having written this again I really don't understand why we need a v2 of this format at all. The current format is: [chunk offsets for our $num_chunks] [arbitrary chunk data for our $num_chunks] And you want to change it to: [chunk offsets for our $num_chunks] [arbitrary chunk data for our $num_chunks] Where "chunks" in the v1 format has always been a non-exhaustive list of things *where we ignore anything we don't know about*. So given our really bad compatibility issues with any non-v1 format I suggested "let's use a different filename". But on closer look I retract that. How about we instead just don't change the header? I.e.: * Let's just live with "1" as the marker for SHA-1. Yeah it would be cute to use 0x73686131 instead like "struct git_hash_algo", but we can live with a 1=3D0x73686131 ("sha1"), 2=3D0x73323536 ("s256") mapping somewhere. It's not like we're going to be running into the 255 limit of hash algorithms Git will support any time soon. * Don't add the reachability index version *to the header* or change the reserved byte to be an error (see [1] again). Instead we just add these things to new "chunks" as appropriate. As this patch of mine shows we can easily do that, and it doesn't error out on any existing version of git: https://github.com/avar/git/commit/3fca63e12a9d38867d4bc0a8a25d419c00a09d95 I now can't imagine a situation where we'd ever need to change the format. We have 32 bits of chunk ids to play with, and can have 255 of these chunks at a time, and unknown chunks are ignored by existing versions and future version. We can even have more than 255 if it comes to that by having a special "extension" chunk, or even use the existing reserved byte for that and pull the nasty trick of putting another set after the existing file checksum, but I digress. If we ever find that we e.g. don't want to write SHA-1 data anymore but just want SHA-256 we just write a tiny amount of dummy data. Older git versions will shrug at what looks like a really incomplete commit graph data, but newer versions will know the real data is in some other chunk they know about. Ditto this "gen numbers or adjusted timestamps?" plan in https://public-inbox.org/git/6367e30a-1b3a-4fe9-611b-d931f51effef@gmail.com/ We can have a chunk of adjusted timestamps into the generation number chunk, and even start adding chunks of other side-data, e.g. the path bloom filters... This E-Mail needs to stop at some point, but as a brief aside I don't see how this die("commit-graph hash algorithm does not match current algorithm") plan makes sense either. The hash-function-transition.txt plan describes how we'll have an index of SHA-1<->SHA-256 object names. Why would it be an error to read a SHA-1 commit-graph under SHA-256? Won't we just say "oh this graph lists the SHA-1s" and then use that lookup table to resolve them as SHA-256s? And as discussed once we go for extending things with chunks we can do a lot better. We'd just keep the SHA-1 data segment, and perhaps (for "just commits" cache locality) have another chunk of SHA-256 mappings to those SHA-1s in the commit-graph file. 1. See feedback on the v2 patch in https://public-inbox.org/git/87muk6q98k.fsf@evledraar.gmail.com/