From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on dcvr.yhbt.net X-Spam-Level: X-Spam-ASN: AS31976 209.132.180.0/23 X-Spam-Status: No, score=-2.5 required=3.0 tests=AWL,BAYES_00, DATE_IN_PAST_03_06,DKIM_SIGNED,HEADER_FROM_DIFFERENT_DOMAINS,RCVD_IN_DNSWL_HI, RCVD_IN_SORBS_SPAM,RP_MATCHES_RCVD,T_DKIM_INVALID shortcircuit=no autolearn=no autolearn_force=no version=3.4.0 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by dcvr.yhbt.net (Postfix) with ESMTP id EC6F12013E for ; Fri, 3 Mar 2017 01:43:44 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751530AbdCCBnm (ORCPT ); Thu, 2 Mar 2017 20:43:42 -0500 Received: from mail-it0-f54.google.com ([209.85.214.54]:34989 "EHLO mail-it0-f54.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750709AbdCCBnm (ORCPT ); Thu, 2 Mar 2017 20:43:42 -0500 Received: by mail-it0-f54.google.com with SMTP id 203so4785239ith.0 for ; Thu, 02 Mar 2017 17:42:11 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:sender:in-reply-to:references:from:date:message-id :subject:to:cc; bh=m/yj7iNaXqAPayO5bSwN9qxsqjz3T2+5b7QIZHRRlWc=; b=CtWRPq9u7VtTvI8/g2mbH5/0qkYXLVPGLtShT5nqFbKLko75TO9C+3YwwPk4cNNwb7 ITmQLyMhmK+QsDbMbf6nDLITaYiIFtLPyOoUATGCNrr03RI9//hhOGpyDYhUtLmdkI7W U+BPColSETL710J6M3AJiYln0vhlSu5Y69jX0+Fo2YJbQ5NQm5jKZ/vEo1ptaB+M+Mv6 bc1nItufMpe1HWjegJ46RqP0h5HF3wtZwpFj1XSu3inSdZzKmbct/bXjYwPFDc3d9do3 EByJALvCdfnnmEhR7LiE/L520eG4PTUnwB1xF7ZP6Q4YKOmSSPKhZR95p3TaVz//DePe zujQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:sender:in-reply-to:references:from :date:message-id:subject:to:cc; bh=m/yj7iNaXqAPayO5bSwN9qxsqjz3T2+5b7QIZHRRlWc=; b=rLgfp/DAsM93/mCMC6joY+8CtgDZskbn6fzaFdeumAS0wNLwdyRXeYBJGjByTEMDKC Und5aTdlZ3sG1A04ijLb8W7BUaIS17p5r5hIllevODYGt9WdBT1u1CsVXn30wQe83dL+ McRzOT2b6lTE1KDR3717iFA+plzUZet3WiSOwXkIuEsYlJnHLx3Nxb4oNSDntBP6sQVq ccBY2OQEXFbsjCowLT4+R9H6a6KDeK0c1f15g09bYcnU6zqk17tc97j4oiWONbF7lGg3 wKjZyqI/H/xll7WBOw/yWSmz8WQViXFfDVUZc3LFVYJUoraR5llGK5tJ/M5LUkcJsJru /5jA== X-Gm-Message-State: AMke39kxcB+UlvsL1BpiKPgRHAq6+fsdJO5os8dfIe0HfGgJ87tLTjvo8a/aRf97yCvuROazV7sWdyhzUymYKQ== X-Received: by 10.36.204.137 with SMTP id x131mr389125itf.35.1488489691328; Thu, 02 Mar 2017 13:21:31 -0800 (PST) MIME-Version: 1.0 Received: by 10.107.146.131 with HTTP; Thu, 2 Mar 2017 13:21:30 -0800 (PST) In-Reply-To: References: <20170223164306.spg2avxzukkggrpb@kitenet.net> <22704.19873.860148.22472@chiark.greenend.org.uk> <20170224233929.p2yckbc6ksyox5nu@sigill.intra.peff.net> From: Linus Torvalds Date: Thu, 2 Mar 2017 13:21:30 -0800 X-Google-Sender-Auth: T0YgLyuT4GqQ8JknK_vgPJD-xOU Message-ID: Subject: Re: SHA1 collisions found To: Junio C Hamano Cc: Jeff King , Ian Jackson , Joey Hess , Git Mailing List Content-Type: text/plain; charset=UTF-8 Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org On Thu, Mar 2, 2017 at 12:43 PM, Junio C Hamano wrote: > > My reaction heavily depends on how that "object version" thing > works. > > Would "object version" be like a truncated SHA-1 over the same data > but with different IV or something, i.e. something that guarantees > anybody would get the same result given the data to be hashed? Yes, it does need to be that in practice. So what I was thinking the object version would be is: (a) we actually take the object type into account explicitly. (b) we explicitly add another truncated hash. The first part we can already do without any actual data structure changes, since basically all users already know the type of an object when they look it up. So we already have information that we could use to narrow down the hash collision case if we saw one. There are some (very few) cases where we don't already explicitly have the object type (a tag reference can be any object, for example, and existing scripts might ask for "give me the type of this SHA1 object with "git cat-file -t"), but that just goes back to the whole "yeah, we'll handle legacy uses and we will look up objects even _without_ the extra version data, so it actually integrates well into the whole notion. Basically, once you accept that "hey, we'll just have a list of objects with that hash", it just makes sense to narrow it down by the object type we also already have. But yes, the object type is obviously only two bits of information (actually, considering the type distribution, probably just one bit), and it's already encoded in the first hash, so it doesn't actually help much as "collision avoidance" particularly once you have a particular attack against that hash in place. It's just that it *is* extra information that we already have, and that is very natural to use once you start thinking of the hash lookup as returning a list of objects. It also mitigates one of the worst _confusions_ in git, and so basically mitigates the worst-case downside of an attack basically for free, so it seems like a no-brainer. But the real new piece of object version would be a truncated second hash of the object. I don't think it matters too much what that second hash is, I would say that we'd just approximate having a total of 256 bits of hash. Since we already have basically 160 bits of fairly good hashing, and roughly 128 bits of that isn't known to be attackable, we'd just use another hash and truncate that to 128 bits. That would be *way* overkill in practice, but maybe overkill is what we want. And it wouldn't really expand the objects all that much more than just picking a new 256-bit hash would do. So you'd have to be able to attack both the full SHA1, _and_ whatever other different good hash to 128 bits. Linus PS. if people think that SHA1 is of a good _size_, and only worry about the known weaknesses of the hashing itself, we'd only need to get back the bits that the attacks take away from brute force. That's currently the 80 -> ~63 bits attack, so you'd really only want about 40 bits of second hash to claw us back back up to 80 bits of brute force (again: brute force is basically sqrt() of the search space, so half the bits, so adding 40 bits of hash adds 20 bits to the brute force cost and you'd get back up to the 2**80 we started with). So 128 bits of secondary hash really is much more than we'd need. 64 bits would probably be fine.