git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Jeff King <peff@peff.net>
To: Junio C Hamano <gitster@pobox.com>
Cc: Ian Jackson <ijackson@chiark.greenend.org.uk>,
	Joey Hess <id@joeyh.name>,
	git@vger.kernel.org
Subject: Re: SHA1 collisions found
Date: Fri, 24 Feb 2017 18:39:29 -0500	[thread overview]
Message-ID: <20170224233929.p2yckbc6ksyox5nu@sigill.intra.peff.net> (raw)
In-Reply-To: <xmqq60jz5wbm.fsf@gitster.mtv.corp.google.com>

On Fri, Feb 24, 2017 at 09:32:13AM -0800, Junio C Hamano wrote:

> >  * Therefore the transition needs to be done by giving every object
> >    two names (old and new hash function).  Objects may refer to each
> >    other by either name, but must pick one.  The usual shape of
> 
> I do not think it is necessrily so.  Existing code may not be able
> to read anything new, but you can make the new code understand
> object names in both formats, and for a smooth transition, I think
> the new code needs to.
> 
> For example, a new commit that records a merge of an old and a new
> commit whose resulting tree happens to be the same as the tree of
> the old commit may begin like so:
> 
>     tree 21b97d4c4f968d1335f16292f954dfdbb91353f0
>     parent 20769079d22a9f8010232bdf6131918c33a1bf6910232bdf6131918c33a1bf69
>     parent 22af6fef9b6538c9e87e147a920be9509acf1ddd
> 
> naming the only object whose name was done with new hash with the
> new longer hash, while recording the names of the other existing
> objects with SHA-1.  We would need to extend the object format for
> tag (which would be trivial as the object reference is textual and
> similar to a commit) and tree (much harder), of course.

One thing I worry about in a mixed-hash setting is how often the two
will be mixed. That will lead to interoperability complications, but I
also think it creates security hazards (if I can convince you somehow to
refer to my evil colliding file by its sha1, for example, then I can
subvert the strength of the new hash).

So I'd much rather see strong rules like:

  1. Once a repo has flag-day switched over to the new hash format[1],
     new references are _always_ done with the new hash. Even ones that
     point to pre-flag-day objects!

     So you get a "commit-v2" object instead of a "commit", and it has a
     distinct hash identity from its "commit" counterpart. You can point
     to a classic "commit", but you do so by its new-hash.

     The flag-day switch would probably be a repo config flag based on
     repositoryformatversion (so old versions would just punt if they
     see it). Let's call this flag "newhash" for lack of a better term.

  2. Repos that have new-hash set will consider the new hash
     format as primary, and always use it when writing and referring to
     new objects (e.g., in refs). A (purely local) sha1->new mapping can
     be maintained for doing old-style object lookups, or for quick
     equivalence checks (this mapping might need to be bi-directional
     for some use cases; I haven't thought hard enough about it to say
     either way).

  3. For protocol interop, the rules would be something like[2]:

      a. If upload-pack is serving a newhash repo, it advertises
         so in the capabilities.

	 Recent clients know that the rest of the conversation will
	 involve the new hash format. If they're cloning, they set the
	 newhash flag in their local config.  If they're fetching, they
	 probably abort and say "please enable newhash" (because for an
	 existing repo, it probably needs to migrate refs, for example).

	 An old client would fail to send back the newhash capability,
	 and the server would abort the conversation at that point.

	 A new upload-pack serving a non-newhash repo behaves the same
	 as now (use sha1, happily interoperate with existing and new
	 clients).

      b. receive-pack is more or less the mirror image.

         A server for a newhash-flagged repo has a capability for "this
	 is a newhash repo" and advertises newhash refs. An existing
	 client might still try to push, but the server would reject it
	 unless it advertises "newhash" back to the server.

	 A newhash-enabled client on a non-newhash repo would abort more
	 gracefully ("please upgrade your local repo to newhash").

	 For a newhash-enabled server with a non-newhash repo, it would
	 probably not advertise anything (not even "I understand
	 newhash"). Because the process for converting to newhash is not
	 "just push some newhash objects", but an out-of-band flag-day
	 to convert it over.

That's just a sketch I came up with. There are probably holes. And it
definitely leaves a lot of _possible_ interoperability on the table in
favor of the flag-day approach. But I think the flag-day approach is a
lot easier to reason about. Both in the code, and in terms of the
security properties.

-Peff

[1] I was intentionally vague on "new hash format" here. Obviously there
    are various contenders like SHA-256. But I think there's also an
    open question of whether the new format should be a multi-hash
    format. That would ease further transitions. At the same time, we
    really _don't_ want people picking bespoke hashes for their
    repositories. It creates complications in the code, and it destroys
    a bunch of optimizations (like knowing when we are both talking
    about the same object based on the hash).

    So I am torn between "move to SHA-256 (or whatever)" and "move to a
    hash format that encodes the hash-type in the first byte, but refuse
    to allocate more than one hash for now".

[2] If we're having a flag-day event, this _might_ be time to consider
    some of the breaking protocol changes that have been under
    discussion.  I'm really hesitant to complicate this already-tricky
    issue by throwing in the kitchen sink. But if there's going to be a
    flag day where you need to upgrade Git to access certain repos, it
    might be nice if there's only one. I dunno.

  parent reply	other threads:[~2017-02-24 23:39 UTC|newest]

Thread overview: 136+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-02-23 16:43 SHA1 collisions found Joey Hess
2017-02-23 17:00 ` David Lang
2017-02-23 17:02 ` Junio C Hamano
2017-02-23 17:12   ` David Lang
2017-02-23 20:49     ` Jakub Narębski
2017-02-23 20:57       ` Jeff King
2017-02-23 17:18   ` Junio C Hamano
2017-02-23 17:35   ` Joey Hess
2017-02-23 17:52     ` Linus Torvalds
2017-02-23 18:21       ` Joey Hess
2017-02-23 18:31         ` Joey Hess
2017-02-23 19:13           ` Morten Welinder
2017-02-24 15:52             ` Geert Uytterhoeven
2017-02-23 18:40         ` Linus Torvalds
2017-02-23 18:46           ` Jeff King
2017-02-23 19:09             ` Linus Torvalds
2017-02-23 19:32               ` Jeff King
2017-02-23 19:47                 ` Linus Torvalds
2017-02-23 19:57                   ` Jeff King
     [not found]                     ` <alpine.LFD.2.20.1702231428540.30435@i7.lan>
2017-02-23 22:43                       ` Jeff King
2017-02-23 22:50                         ` Linus Torvalds
2017-02-23 23:05                         ` Jeff King
2017-02-23 23:05                           ` [PATCH 1/3] add collision-detecting sha1 implementation Jeff King
2017-02-23 23:15                             ` Stefan Beller
2017-02-24  0:01                               ` Jeff King
2017-02-24  0:12                                 ` Linus Torvalds
2017-02-24  0:16                                   ` Jeff King
2017-02-23 23:05                           ` [PATCH 2/3] sha1dc: adjust header includes for git Jeff King
2017-02-23 23:06                           ` [PATCH 3/3] Makefile: add USE_SHA1DC knob Jeff King
2017-02-24 18:36                             ` HW42
2017-02-24 18:57                               ` Jeff King
2017-02-23 23:14                           ` SHA1 collisions found Linus Torvalds
2017-02-28 18:41                           ` Junio C Hamano
2017-02-28 19:07                             ` Junio C Hamano
2017-02-28 19:20                               ` Jeff King
2017-03-01  8:57                                 ` Dan Shumow
2017-02-28 19:34                               ` Linus Torvalds
2017-02-28 19:52                                 ` Shawn Pearce
2017-02-28 22:56                                   ` Linus Torvalds
2017-02-28 21:22                                 ` Dan Shumow
2017-02-28 22:50                                   ` Marc Stevens
2017-02-28 23:11                                     ` Linus Torvalds
2017-03-01 19:05                                       ` Jeff King
2017-02-23 20:47               ` Øyvind A. Holm
2017-02-23 20:46             ` Joey Hess
2017-02-23 18:42         ` Jeff King
2017-02-23 17:52     ` David Lang
2017-02-23 19:20   ` David Lang
2017-02-23 17:19 ` Linus Torvalds
2017-02-23 17:29   ` Linus Torvalds
2017-02-23 18:10   ` Joey Hess
2017-02-23 18:29     ` Linus Torvalds
2017-02-23 18:38     ` Junio C Hamano
2017-02-24  9:42 ` Duy Nguyen
2017-02-25 19:04   ` brian m. carlson
2017-02-27 13:29     ` René Scharfe
2017-02-28 13:25       ` brian m. carlson
2017-02-24 15:13 ` Ian Jackson
2017-02-24 17:04   ` ankostis
2017-02-24 17:23   ` Jason Cooper
2017-02-25 23:22     ` ankostis
2017-02-24 17:32   ` Junio C Hamano
2017-02-24 17:45     ` David Lang
2017-02-24 18:14       ` Junio C Hamano
2017-02-24 18:58         ` Stefan Beller
2017-02-24 19:20           ` Junio C Hamano
2017-02-24 20:05             ` ankostis
2017-02-24 20:32               ` Junio C Hamano
2017-02-25  0:31                 ` ankostis
2017-02-26  0:16                   ` Jason Cooper
2017-02-26 17:38                     ` brian m. carlson
2017-02-26 19:11                       ` Linus Torvalds
2017-02-26 21:38                         ` Ævar Arnfjörð Bjarmason
2017-02-26 21:52                           ` Jeff King
2017-02-27 13:00                             ` Transition plan for git to move to a new hash function Ian Jackson
2017-02-27 14:37                               ` Why BLAKE2? Markus Trippelsdorf
2017-02-27 15:42                                 ` Ian Jackson
2017-02-27 19:26                               ` Transition plan for git to move to a new hash function Tony Finch
2017-02-28 21:47                               ` brian m. carlson
2017-03-02 18:13                                 ` Ian Jackson
2017-03-04 22:49                                   ` brian m. carlson
2017-03-05 13:45                                     ` Ian Jackson
2017-03-05 23:45                                       ` brian m. carlson
2017-02-24 20:05             ` SHA1 collisions found Junio C Hamano
2017-02-24 20:33           ` Philip Oakley
2017-02-24 23:39     ` Jeff King [this message]
2017-02-25  0:39       ` Linus Torvalds
2017-02-25  0:54         ` Linus Torvalds
2017-02-25  1:16         ` Jeff King
2017-02-26 18:55           ` Junio C Hamano
2017-02-25  6:10         ` Junio C Hamano
2017-02-26  1:13           ` Jason Cooper
2017-02-26  5:18             ` Jeff King
2017-02-26 18:30               ` brian m. carlson
2017-03-02 21:46               ` Brandon Williams
2017-03-03 11:13                 ` Jeff King
2017-03-03 14:54                   ` Ian Jackson
2017-03-03 22:18                     ` Jeff King
2017-03-02 19:55         ` Linus Torvalds
2017-03-02 20:43           ` Junio C Hamano
2017-03-02 21:21             ` Linus Torvalds
2017-03-02 21:54               ` Joey Hess
2017-03-02 22:27                 ` Linus Torvalds
2017-03-03  1:50                   ` Mike Hommey
2017-03-03  2:19                     ` Linus Torvalds
2017-03-03 11:04           ` Jeff King
2017-03-03 21:47           ` Stefan Beller
2017-02-25  1:00       ` David Lang
2017-02-25  1:15         ` Stefan Beller
2017-02-25  1:21         ` Jeff King
2017-02-25  1:39           ` David Lang
2017-02-25  1:47             ` Jeff King
2017-02-25  1:56               ` David Lang
2017-02-25  2:28             ` Jacob Keller
2017-02-25  2:26           ` Jacob Keller
2017-02-25  5:39             ` grarpamp
2017-02-24 23:43     ` Ian Jackson
2017-02-25  0:06       ` Ian Jackson
2017-02-25 18:50     ` brian m. carlson
2017-02-25 19:26       ` Jeff King
2017-02-25 22:09         ` Mike Hommey
2017-02-26 17:38           ` brian m. carlson
2017-02-24 22:47 ` Jakub Narębski
2017-02-24 22:53   ` Santiago Torres
2017-02-24 23:05     ` Jakub Narębski
2017-02-24 23:24       ` Øyvind A. Holm
2017-02-24 23:06   ` Jeff King
2017-02-24 23:35     ` Jakub Narębski
2017-02-25 22:35     ` Lars Schneider
2017-02-26  0:46       ` Jeff King
2017-02-26 18:22         ` Junio C Hamano
2017-02-26 18:57     ` Thomas Braun
2017-02-26 21:30       ` Jeff King
2017-02-27  9:57         ` Geert Uytterhoeven
2017-02-27 10:43           ` Jeff King
2017-02-27 12:39             ` Morten Welinder

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170224233929.p2yckbc6ksyox5nu@sigill.intra.peff.net \
    --to=peff@peff.net \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=id@joeyh.name \
    --cc=ijackson@chiark.greenend.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).