Date: Fri, 14 May 2021 11:10:09 -0700
From: dwh@linuxprogrammer.org
To: Ævar Arnfjörð Bjarmason
Cc: Junio C Hamano, "brian m. carlson", git@vger.kernel.org
Subject: Re: Is the sha256 object format experimental or not?
Message-ID: <20210514181009.GB16542@localhost>
In-Reply-To: <875yzlsngv.fsf@evledraar.gmail.com>
References: <20210508022225.GH3986@localhost> <87lf8mu642.fsf@evledraar.gmail.com> <20210513202919.GE11882@localhost> <875yzlsngv.fsf@evledraar.gmail.com>

On 14.05.2021 10:49, Ævar Arnfjörð Bjarmason wrote:
>I agree insofar that I don't see a good reason for us to support some
>plethora of hash algorithms, but I wouldn't have objections to adding
>more if people find them useful for some reason. See e.g. [1] for an
>implementation.

I think Git should not try to do any cryptographic operations at all and
should instead rely on external tools that are implemented properly and
hardened. Implementing cryptography isn't just about translating the
algorithm into code. It also means getting memory security, file
handling, input security, control flow (equal-cost code paths, i.e.
constant time), etc. correct. Most cryptography libraries aren't
designed to be misuse resistant. The only one I know of that has that as
a top-line requirement is Hyperledger Ursa [1].

I would like to see us remove all cryptography code (e.g. digests,
digital signatures, etc.) from Git and rely on external tools entirely.
If we store the cryptographic material in a self-describing format that
identifies the associated tool as well as the cryptographic data, then
Git can be completely agnostic.

>But I really don't see how anything you've said would present a
>technical hurdle once we have SHA-1<->SHA-256 interop in a good enough
>state. At that point we'll support re-hashing on arrival of content
>hashed with algorithm X into Y, with a local lookup table between X<=>Y.
>
>So if somebody wants to maintain content hashed with algorithm Z locally
>we should easily be able to support that. The "diversity of naming"
>won't matter past that local repository, any mention of Z will be
>translated to X or Y on fetch/push.

Using self-describing formats allows us to honor history, keep old
object names as they are, and eliminate all of the added complexity you
describe. I think there is a lot of room for errors to creep in when
collaborators have copies of the same repo and each keeps local mappings
between different hashing algorithms.

How is this not setting up for a combinatorial explosion of data? If the
canonical repo uses SHA1 and one contributor uses SHA2-512, another uses
Blake2b-256, and yet another uses SHA3-384, won't they all have to
maintain six different translation tables for all objects, one per pair
of algorithms?

  SHA1 <=> SHA2-512
  SHA1 <=> Blake2b-256
  SHA1 <=> SHA3-384
  SHA2-512 <=> Blake2b-256
  SHA2-512 <=> SHA3-384
  Blake2b-256 <=> SHA3-384

I guess that's your motivation for not allowing algorithmic agility.

The way around this is to use self-describing formats and external
tools. Git repo copies wouldn't be required to have only *one* algorithm
naming all objects, which is what makes the translation tables
necessary. Instead, Git repos could have heterogeneous object names:
each object has a single name, and different objects may be named with
different digest algorithms. Git would simply treat those names as plain
strings. Validating a name means talking to the correct external tool,
sending it the name string and the object data, and reading back the
result.

I think this is a much better approach because:

1. It creates algorithmic agility in a way that isn't top-down and
   heavy-handed.
2. It eliminates the need for all of the translation tables and
   round-tripping complexity.
3. It empowers maintainers to decide which algorithms can/must be used
   when naming objects in a given repo. Merge hooks, CI/CD checks and
   etiquette guides can be used to enforce this.
4. Git's attack surface becomes smaller (a very good thing) and limited
   to doing IPC to external tools correctly and securely (easy) instead
   of trying to get cryptography client code correct (very difficult).

One other thing to consider is that there are new tools being developed
that do things similar to Git, have algorithmic agility, and use
self-describing cryptographic primitives. Late-binding trust is now a
best practice and has been for quite some time. Many people rely upon
Git and I think we should keep up with the best practices.

Cheers!
Dave
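
A minimal sketch of the self-describing naming idea above, under stated
assumptions: the "algo:hexdigest" name layout, the VERIFIERS registry and
the function names are all hypothetical and not part of Git. Python's
hashlib stands in for the external tools, which in the proposal would be
separate programs reached over IPC. The last helper just counts the
pairwise translation tables mentioned earlier.

```python
import hashlib
import hmac
from itertools import combinations

# Hypothetical registry: algorithm tag -> digest function.
# Assumption: in the proposal each entry would be an external tool reached
# over IPC; hashlib is used here only so the sketch is runnable.
VERIFIERS = {
    "sha1": lambda data: hashlib.sha1(data).hexdigest(),
    "sha2-512": lambda data: hashlib.sha512(data).hexdigest(),
    "sha3-384": lambda data: hashlib.sha3_384(data).hexdigest(),
    "blake2b-256": lambda data: hashlib.blake2b(data, digest_size=32).hexdigest(),
}


def make_name(algo: str, data: bytes) -> str:
    """Build a self-describing name of the (assumed) form 'algo:hexdigest'."""
    return f"{algo}:{VERIFIERS[algo](data)}"


def verify_name(name: str, data: bytes) -> bool:
    """Treat the name as a plain string and let the tagged tool validate it."""
    algo, _, digest = name.partition(":")
    tool = VERIFIERS.get(algo)
    if tool is None:
        raise ValueError(f"no tool registered for algorithm {algo!r}")
    # Constant-time comparison of the recomputed and the stored digest.
    return hmac.compare_digest(tool(data), digest)


def translation_tables(algos):
    """Pairwise tables needed if every repo copy re-hashes into its own algorithm."""
    return list(combinations(algos, 2))
```

With four algorithms, translation_tables() returns the six pairs listed
above, while verify_name() needs no table at all: the tag in the name
selects the tool.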