Date: Thu, 13 May 2021 16:26:14 -0700
From: dwh@linuxprogrammer.org
To: Junio C Hamano
Cc: "brian m. carlson", Ævar Arnfjörð Bjarmason, git@vger.kernel.org
Subject: Re: Is the sha256 object format experimental or not?
Message-ID: <20210513232614.GF11882@localhost>
References: <20210508022225.GH3986@localhost> <87lf8mu642.fsf@evledraar.gmail.com> <20210513202919.GE11882@localhost>
X-Mailing-List: git@vger.kernel.org

On 14.05.2021 06:03, Junio C Hamano wrote:
>dwh@linuxprogrammer.org writes:
>
>> I think Git should externalize the calculation of object digests just
>> like it externalizes the calcualtion of object digital signatures.
>
>The hashing algorithms used to generate object names has
>requirements fundamentally different from that of digital
>signatures. I strongly suspect that that fact would change the
>equation when you rethink what you said above.

I agree with you. Object names are exactly that: names. Names for
resources/data must be persistent, as well as global in scope and
uniqueness, and autonomously assigned. What this means is that once an
object has a name, that name shall never change as long as the object
remains unchanged. The names must be unique in the scope of all objects
(e.g. all copies of a repo) and generated without coordination.

Calculating object names using a digest algorithm meets all of these
requirements. Choosing a strong digest algorithm creates a strong
cryptographic binding between the name and the object contents. Using
self-describing digests allows for a repo to switch digest algorithms at
arbitrary points in the history.

I think that objects named with SHA1 digests should remain named with
the SHA1 digest. I do *not* advocate going back and rewriting history to
change all of the object names to a digest with a different algorithm.
Git is a provenance log and history matters. I recommend preserving all
existing names, even if they were created with known-weak digest
algorithms, and making the change to a new algorithm at a specific point
in time (e.g. at a tag).
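
To make "self-describing digest" concrete, here is a rough Python sketch
of one possible encoding: a one-byte algorithm tag followed by the raw
digest, written as unpadded base64url. The tag values, the layout, and
the function names (encode_name, decode_name, verify) are made up for
illustration and are not taken from Git or from any standard. The sketch
also hashes the raw bytes only, whereas a real Git object name is
computed over a header plus the contents. Legacy 40-character hex names
are treated as naked SHA1 and left untouched.

```python
import base64
import hashlib
from typing import Tuple

# Hypothetical one-byte algorithm tags (an assumption, not a Git format).
ALGORITHMS = {
    0x01: "sha256",
    0x02: "sha3_256",
}
TAG_BY_NAME = {name: tag for tag, name in ALGORITHMS.items()}


def encode_name(data: bytes, algo: str) -> str:
    """Return base64url(tag byte + raw digest) without '=' padding."""
    digest = hashlib.new(algo, data).digest()
    raw = bytes([TAG_BY_NAME[algo]]) + digest
    return base64.urlsafe_b64encode(raw).decode("ascii").rstrip("=")


def decode_name(name: str) -> Tuple[str, bytes]:
    """Split a self-describing name into (algorithm name, raw digest)."""
    padded = name + "=" * (-len(name) % 4)
    raw = base64.urlsafe_b64decode(padded)
    return ALGORITHMS[raw[0]], raw[1:]


def verify(name: str, data: bytes) -> bool:
    """Check data against either a naked SHA1 name or a tagged name."""
    is_hex = all(c in "0123456789abcdef" for c in name)
    if len(name) == 40 and is_hex:
        # Legacy name: keep it as-is and verify it with SHA1.
        return hashlib.sha1(data).hexdigest() == name
    algo, digest = decode_name(name)
    return hashlib.new(algo, data).digest() == digest
```

With something like this, verify() accepts an old SHA1 name and a newer
tagged name side by side, which is the property that lets a history
change algorithms at some point without renaming the older objects.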

Using self-describing digest encoding and externalizing digest
calculation future-proofs repositories and allows for preservation of
history while allowing algorithm agility. To illustrate my point, I
envision that a repo could have a history like this:

object 2923f6fa36614586ea09b4424b438915cc1b9b67 (naked SHA1)
  |
  |
object 5f167fb6b3e96273b564fff0b041fb94fee4d3de (naked SHA1)
  |
  |
object 98c2e1c0965e60b0f137577ac5dd0a5c96ce224d (naked SHA1)
  |
  |
  |
object IAOdLVxteOxQwKa-xn8yCBUkuPkjAqcuQ2V7fKAlao8o (self-desc.SHA2-256)
  |
  |
  |
object EK832G0PFhBFf-Dfgr205UKpUMqmVXJX9ltLwQo4Awct (self-desc.SHA3-256)
  |
  .
  .
  .

Neither the decision to switch to SHA2-256 nor the later one to switch
to SHA3-256 would require any code changes. If we continue down the
current SHA-256 road, we will have to repeat that multi-year effort in
the future to switch to SHA3 or something else. Most importantly, the
choice of digest algorithm would be left up to the maintainers of a
given repo and not limited to the algorithms we have hard-coded into
Git.

Brian's work on the SHA-256 switch is valuable. We can leverage a lot of
it to switch to externalized digest calculation and self-describing
digests and never have to worry about doing that again.

Cheers!
Dave