Date: Mon, 11 Sep 2017 11:59:13 -0700
From: Brandon Williams
To: Junio C Hamano
Cc: Jonathan Nieder, Linus Torvalds, Git Mailing List, Stefan Beller,
    jonathantanmy@google.com, Jeff King, David Lang, "brian m. carlson"
Subject: Re: RFC v3: Another proposed hash function transition plan
Message-ID: <20170911185913.GA5869@google.com>
References: <20170304011251.GA26789@aiede.mtv.corp.google.com>
    <20170307001709.GC26789@aiede.mtv.corp.google.com>

On 09/08, Junio C Hamano wrote:
> Junio C Hamano writes:
>
> > One thing I still do not know how I feel about after re-reading the
> > thread, and I didn't find the above doc, is Linus's suggestion to
> > use the objects themselves as NewHash-to-SHA-1 mapper [*1*].
> > ...
> > [Reference]
> >
> > *1*
>
> I think this falls into the same category as the often-talked-about
> addition of the "generation number" field.
> It is very tempting to
> add these "mechanically derivable but expensive to compute" pieces
> of information to the sha3-content while converting from
> sha1-content and creating anew.

We didn't discuss that in the doc since this particular transition plan
we made uses an external NewHash-to-SHA1 map instead of an internal one,
because we believe that at some point we would be able to drop
compatibility with SHA1.  Now I suspect that won't happen for a long
time, but I think it would be preferable over carrying the SHA1 luggage
indefinitely.  At some point, then, we would be able to stop hashing
objects twice (once with SHA1 and once with NewHash) instead of always
requiring that we hash them with each hash function which was used
historically.

>
> Because the "sha1-name" or the "generation number" can mechanically
> be computed, as long as everybody agrees to _always_ place them in
> the sha3-content, the same sha1-content will be converted into
> exactly the same sha3-content without ambiguity, and converting them
> back to sha1-content while pushing to an older repository will
> correctly produce the original sha1-content, as it would just be the
> matter of simply stripping these extra pieces of information.
>
> The reason why I still feel a bit uneasy about adding these things
> (aside from the fact that sha1-name thing will be a baggage we would
> need to carry forever even after we completely wean ourselves off of
> the old hash) is because I am not sure what we should do when we
> encounter sha3-content in the wild that has these things _wrong_.
> An object that exists today in the SHA-1 world is fetched into the
> new repository and converted to SHA-3 contents, and Linus's extra
> "original SHA-1 name" field is added to the object's header while
> recording the SHA-3 content.  But for whatever reason, the original
> SHA-1 name is recorded incorrectly in the resulting SHA-3 object.
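To make the failure mode quoted above concrete, here is a minimal sketch; it is not Git's implementation. It assumes a hypothetical `sha1-name` field, modelled for simplicity as a first line prepended to the content, and uses SHA3-256 as a stand-in for NewHash (the thread does not settle which hash that is). The helper names `object_name`, `to_new_content` and `check_embedded_name` are invented for illustration; only the `<type> <size>\0<body>` framing is how Git names objects today.

```python
import hashlib

FIELD = b"sha1-name "  # hypothetical embedded field, not a real Git header


def object_name(algo: str, obj_type: str, body: bytes) -> str:
    """Hash '<type> <size>\\0<body>', the framing Git uses for object names."""
    header = f"{obj_type} {len(body)}".encode() + b"\0"
    return hashlib.new(algo, header + body).hexdigest()


def to_new_content(sha1_body: bytes, obj_type: str, recorded_sha1=None) -> bytes:
    """Prepend the embedded SHA-1 name; recorded_sha1 lets us simulate a botched writer."""
    name = recorded_sha1 or object_name("sha1", obj_type, sha1_body)
    return FIELD + name.encode() + b"\n" + sha1_body


def check_embedded_name(new_body: bytes, obj_type: str) -> bool:
    """fsck-style check: strip the field, rehash with SHA-1, compare to the claim."""
    first, _, rest = new_body.partition(b"\n")
    claimed = first[len(FIELD):].decode()
    return claimed == object_name("sha1", obj_type, rest)


blob = b"hello\n"
good = to_new_content(blob, "blob")
bad = to_new_content(blob, "blob", recorded_sha1="0" * 40)  # wrong name recorded

# Both strip back to the same SHA-1 content ...
assert good.partition(b"\n")[2] == bad.partition(b"\n")[2] == blob
# ... but they are two distinct objects under the new hash.
print(object_name("sha3_256", "blob", good) != object_name("sha3_256", "blob", bad))
print(check_embedded_name(good, "blob"), check_embedded_name(bad, "blob"))
```

With an external NewHash-to-SHA1 map the name never enters the hashed content, so a wrong map entry can be corrected locally without changing any object's new-hash name.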

This failure mode wasn't one of the issues that I had thought of, but it
just makes the argument against adding SHA-1 names to the sha3-content
stronger.

>
> The same thing could happen if we decide to bake "generation number"
> in the SHA-3 commit objects.  One possible definition would be that
> a root commit will have gen #0; a commit with 1 or more parents will
> get max(parents' gen numbers) + 1 as its gen number.  But somebody
> may botch the counting and records sum(parents' gen numbers) as its
> gen number.
>
> In these cases, not just the SHA3-content but also the resulting
> SHA-3 object name would be different from the name of the object
> that would have recorded the same contents correctly.  So converting
> back to SHA-1 world from these botched SHA-3 contents may produce
> the original contents, but we may end up with multiple "plausibly
> looking" set of SHA-3 objects that (claim to) correspond to a single
> SHA-1 object, only one of which is a valid one.
>
> Our "git fsck" already treats certain brokenness (like a tree whose
> entry has mode that is 0-padded to the left) as broken but still
> tolerate them.  I am not sure if it is sufficient to diagnose and
> declare broken and invalid when we see sha3-content that records
> these "mechanically derivable but expensive to compute" pieces of
> information incorrectly.
>
> I am leaning towards saying "yes, catching in fsck is enough" and
> suggesting to add generation number to sha3-content of the commit
> objects, and to add even the "original sha1 name" thing if we find
> good use of it.  But I cannot shake this nagging feeling off that I
> am missing some huge problems that adding these fields and opening
> ourselves to more classes of broken objects.
>
> Thoughts?
>

-- 
Brandon Williams
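For reference, the generation-number definition quoted above (root commits get 0, every other commit gets max(parents' gen numbers) + 1) and the botched sum() variant can be sketched as follows. This is only an illustration under assumptions: the commit graph is a plain dict from a commit name to its list of parents, and `generations` and `fsck_generations` are hypothetical helpers, not anything in Git.

```python
from typing import Callable, Dict, Iterable, List

Graph = Dict[str, List[str]]  # commit name -> list of parent commit names


def generations(parents: Graph,
                combine: Callable[[Iterable[int]], int] = max) -> Dict[str, int]:
    """Root commits get 0; others get combine(parents' numbers) + 1.

    Recursive for brevity; a real history would need an iterative walk.
    """
    gen: Dict[str, int] = {}

    def visit(commit: str) -> int:
        if commit not in gen:
            ps = parents[commit]
            gen[commit] = (combine(visit(p) for p in ps) + 1) if ps else 0
        return gen[commit]

    for commit in parents:
        visit(commit)
    return gen


def fsck_generations(parents: Graph, recorded: Dict[str, int]) -> List[str]:
    """Return commits whose recorded number differs from the max()-based one."""
    expected = generations(parents)
    return sorted(c for c in parents if recorded.get(c) != expected[c])


# A is the root, B and C branch off A, D merges B and C.
history = {"A": [], "B": ["A"], "C": ["A"], "D": ["B", "C"]}

correct = generations(history)               # uses max()
botched = generations(history, combine=sum)  # the miscounting writer

print(correct)
print(botched)
print(fsck_generations(history, botched))
```

The two writers agree on every linear stretch of history and diverge only at the merge, which is what makes such a mistake easy to miss. If the number were baked into the commit content, the botched value would also change the name of D and of every descendant of D.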