git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Junio C Hamano <gitster@pobox.com>
To: Jeff King <peff@peff.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
	Ian Jackson <ijackson@chiark.greenend.org.uk>,
	Joey Hess <id@joeyh.name>, Git Mailing List <git@vger.kernel.org>
Subject: Re: SHA1 collisions found
Date: Sun, 26 Feb 2017 10:55:09 -0800	[thread overview]
Message-ID: <xmqqtw7gye7m.fsf@gitster.mtv.corp.google.com> (raw)
In-Reply-To: <20170225011636.qjlv2luj3zefmrpz@sigill.intra.peff.net> (Jeff King's message of "Fri, 24 Feb 2017 20:16:36 -0500")

Jeff King <peff@peff.net> writes:

> Trees are more difficult, as they don't have any such field. But a valid
> tree does need to start with a mode, so sticking some non-numeric flag
> at the front of the object would work (it breaks backwards
> compatibility, but that's kind of the point).

Just like the object header format does not inherently impose a
maximum length the system can handle on our objects or the number of
mode bits we can use in an entry in the tree object [*1*], the
format in which tags and commits refer to other objects does not
impose what hash is used for these references [*2*].  

The object names in the tree format is an oddball; by being a binary
20-byte field and without any other hint, it does limit us to stick
to SHA-1.

I think the helper functions in tree-walk.h, namely 

	init_tree_desc();
	tree_entry_extract();
	update_tree_entry();

and the associated data structures can be updated to read a tree
object in a new format without affecting the readers too much.  By
having a "I am in a new format" byte at the beginning that cannot be
a valid first byte in the current tree format (non-octal is a good
thing to use here), init_tree_desc() can set things up in the desc
structure to expect that the data that will be read by
tree_entry_extract() and update_tree_entry() are formatted in a new
way, and by varying that "tree-format signature" byte, we can update
the format in the future.

So at the loose-object format level, we may not even need "tree2";
we can view this update in a way similar to the change we did when
we started supporting submodules/gitlinks.  Older Git would have
said "There is an object that is not tree or blob recorded" and
barfed but newer one takes such a tree just fine.  This "we are now
introducing a new hash, and a tree can either have objects all named
by SHA-1 or all new (non SHA-1) hash" update can be treated the same
way, methinks.

The normal flow to write tree objects is (supposed to be) all
contained in cache-tree.c.  As long as we can tell from "struct
object" which hash names the object (i.e. struct object_id may
become an enum and a union), we should be able to use it to convert
objects near the tip of the existing history to new hashes
incrementally. Ideally, the flag-day for one tip of a dag may be
just a matter of

	git commit --allow-empty -m "object name hash update"

without anything else.  The commit by default would want to name
itself with the new hash, which requires it to get its tree named
with the new hash, which may read the old tree and associated blobs
all named with SHA-1, but write_index_as_tree() should be able to
(1) read the tree with its SHA-1 name to learn what is contained;
(2) read the contents of blobs with their SHA-1 names, and compute
their names with the new hash; and (3) write out a containing tree
object in the updated format and named with the new hash.  And that
would give us the tree object named with the new hash that the
command can write into the new commit object on its "tree" line.


[Footnote]

*1* These lengths and mode bits are spelled out in ASCII without any
    fixed length limit for the number of the bytes in this ASCII
    string that represents the length.  The current code may happen
    to read them into unsigned long and unsigned int, which does
    impose limit on the individual reader in the sense that if your
    ulong is only 32-bit, you cannot have an object larger than 4GB.
    But that is not an inherent limit in the format; you can lift it
    by upgrading the reader.

*2* They are also spelled out in ASCII and there is no length limit.
    Existing implementation may happen to assume that they are all
    SHA-1, but the readers and the writers can be updated to allow
    other hashes to be used in a way that does not break existing
    code when we are only using SHA-1 by marking a reference that
    uses new hash distinguishable from SHA-1 references.

  reply	other threads:[~2017-02-26 18:55 UTC|newest]

Thread overview: 136+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-02-23 16:43 SHA1 collisions found Joey Hess
2017-02-23 17:00 ` David Lang
2017-02-23 17:02 ` Junio C Hamano
2017-02-23 17:12   ` David Lang
2017-02-23 20:49     ` Jakub Narębski
2017-02-23 20:57       ` Jeff King
2017-02-23 17:18   ` Junio C Hamano
2017-02-23 17:35   ` Joey Hess
2017-02-23 17:52     ` Linus Torvalds
2017-02-23 18:21       ` Joey Hess
2017-02-23 18:31         ` Joey Hess
2017-02-23 19:13           ` Morten Welinder
2017-02-24 15:52             ` Geert Uytterhoeven
2017-02-23 18:40         ` Linus Torvalds
2017-02-23 18:46           ` Jeff King
2017-02-23 19:09             ` Linus Torvalds
2017-02-23 19:32               ` Jeff King
2017-02-23 19:47                 ` Linus Torvalds
2017-02-23 19:57                   ` Jeff King
     [not found]                     ` <alpine.LFD.2.20.1702231428540.30435@i7.lan>
2017-02-23 22:43                       ` Jeff King
2017-02-23 22:50                         ` Linus Torvalds
2017-02-23 23:05                         ` Jeff King
2017-02-23 23:05                           ` [PATCH 1/3] add collision-detecting sha1 implementation Jeff King
2017-02-23 23:15                             ` Stefan Beller
2017-02-24  0:01                               ` Jeff King
2017-02-24  0:12                                 ` Linus Torvalds
2017-02-24  0:16                                   ` Jeff King
2017-02-23 23:05                           ` [PATCH 2/3] sha1dc: adjust header includes for git Jeff King
2017-02-23 23:06                           ` [PATCH 3/3] Makefile: add USE_SHA1DC knob Jeff King
2017-02-24 18:36                             ` HW42
2017-02-24 18:57                               ` Jeff King
2017-02-23 23:14                           ` SHA1 collisions found Linus Torvalds
2017-02-28 18:41                           ` Junio C Hamano
2017-02-28 19:07                             ` Junio C Hamano
2017-02-28 19:20                               ` Jeff King
2017-03-01  8:57                                 ` Dan Shumow
2017-02-28 19:34                               ` Linus Torvalds
2017-02-28 19:52                                 ` Shawn Pearce
2017-02-28 22:56                                   ` Linus Torvalds
2017-02-28 21:22                                 ` Dan Shumow
2017-02-28 22:50                                   ` Marc Stevens
2017-02-28 23:11                                     ` Linus Torvalds
2017-03-01 19:05                                       ` Jeff King
2017-02-23 20:47               ` Øyvind A. Holm
2017-02-23 20:46             ` Joey Hess
2017-02-23 18:42         ` Jeff King
2017-02-23 17:52     ` David Lang
2017-02-23 19:20   ` David Lang
2017-02-23 17:19 ` Linus Torvalds
2017-02-23 17:29   ` Linus Torvalds
2017-02-23 18:10   ` Joey Hess
2017-02-23 18:29     ` Linus Torvalds
2017-02-23 18:38     ` Junio C Hamano
2017-02-24  9:42 ` Duy Nguyen
2017-02-25 19:04   ` brian m. carlson
2017-02-27 13:29     ` René Scharfe
2017-02-28 13:25       ` brian m. carlson
2017-02-24 15:13 ` Ian Jackson
2017-02-24 17:04   ` ankostis
2017-02-24 17:23   ` Jason Cooper
2017-02-25 23:22     ` ankostis
2017-02-24 17:32   ` Junio C Hamano
2017-02-24 17:45     ` David Lang
2017-02-24 18:14       ` Junio C Hamano
2017-02-24 18:58         ` Stefan Beller
2017-02-24 19:20           ` Junio C Hamano
2017-02-24 20:05             ` ankostis
2017-02-24 20:32               ` Junio C Hamano
2017-02-25  0:31                 ` ankostis
2017-02-26  0:16                   ` Jason Cooper
2017-02-26 17:38                     ` brian m. carlson
2017-02-26 19:11                       ` Linus Torvalds
2017-02-26 21:38                         ` Ævar Arnfjörð Bjarmason
2017-02-26 21:52                           ` Jeff King
2017-02-27 13:00                             ` Transition plan for git to move to a new hash function Ian Jackson
2017-02-27 14:37                               ` Why BLAKE2? Markus Trippelsdorf
2017-02-27 15:42                                 ` Ian Jackson
2017-02-27 19:26                               ` Transition plan for git to move to a new hash function Tony Finch
2017-02-28 21:47                               ` brian m. carlson
2017-03-02 18:13                                 ` Ian Jackson
2017-03-04 22:49                                   ` brian m. carlson
2017-03-05 13:45                                     ` Ian Jackson
2017-03-05 23:45                                       ` brian m. carlson
2017-02-24 20:05             ` SHA1 collisions found Junio C Hamano
2017-02-24 20:33           ` Philip Oakley
2017-02-24 23:39     ` Jeff King
2017-02-25  0:39       ` Linus Torvalds
2017-02-25  0:54         ` Linus Torvalds
2017-02-25  1:16         ` Jeff King
2017-02-26 18:55           ` Junio C Hamano [this message]
2017-02-25  6:10         ` Junio C Hamano
2017-02-26  1:13           ` Jason Cooper
2017-02-26  5:18             ` Jeff King
2017-02-26 18:30               ` brian m. carlson
2017-03-02 21:46               ` Brandon Williams
2017-03-03 11:13                 ` Jeff King
2017-03-03 14:54                   ` Ian Jackson
2017-03-03 22:18                     ` Jeff King
2017-03-02 19:55         ` Linus Torvalds
2017-03-02 20:43           ` Junio C Hamano
2017-03-02 21:21             ` Linus Torvalds
2017-03-02 21:54               ` Joey Hess
2017-03-02 22:27                 ` Linus Torvalds
2017-03-03  1:50                   ` Mike Hommey
2017-03-03  2:19                     ` Linus Torvalds
2017-03-03 11:04           ` Jeff King
2017-03-03 21:47           ` Stefan Beller
2017-02-25  1:00       ` David Lang
2017-02-25  1:15         ` Stefan Beller
2017-02-25  1:21         ` Jeff King
2017-02-25  1:39           ` David Lang
2017-02-25  1:47             ` Jeff King
2017-02-25  1:56               ` David Lang
2017-02-25  2:28             ` Jacob Keller
2017-02-25  2:26           ` Jacob Keller
2017-02-25  5:39             ` grarpamp
2017-02-24 23:43     ` Ian Jackson
2017-02-25  0:06       ` Ian Jackson
2017-02-25 18:50     ` brian m. carlson
2017-02-25 19:26       ` Jeff King
2017-02-25 22:09         ` Mike Hommey
2017-02-26 17:38           ` brian m. carlson
2017-02-24 22:47 ` Jakub Narębski
2017-02-24 22:53   ` Santiago Torres
2017-02-24 23:05     ` Jakub Narębski
2017-02-24 23:24       ` Øyvind A. Holm
2017-02-24 23:06   ` Jeff King
2017-02-24 23:35     ` Jakub Narębski
2017-02-25 22:35     ` Lars Schneider
2017-02-26  0:46       ` Jeff King
2017-02-26 18:22         ` Junio C Hamano
2017-02-26 18:57     ` Thomas Braun
2017-02-26 21:30       ` Jeff King
2017-02-27  9:57         ` Geert Uytterhoeven
2017-02-27 10:43           ` Jeff King
2017-02-27 12:39             ` Morten Welinder

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=xmqqtw7gye7m.fsf@gitster.mtv.corp.google.com \
    --to=gitster@pobox.com \
    --cc=git@vger.kernel.org \
    --cc=id@joeyh.name \
    --cc=ijackson@chiark.greenend.org.uk \
    --cc=peff@peff.net \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).