git@vger.kernel.org mailing list mirror (one of many)
From: Junio C Hamano <gitster@pobox.com>
To: Jeff King <peff@peff.net>
Cc: "H. Peter Anvin" <hpa@zytor.com>,
	"git\@vger.kernel.org" <git@vger.kernel.org>
Subject: Re: Migrating away from SHA-1?
Date: Tue, 12 Apr 2016 18:03:02 -0700
Message-ID: <xmqqlh4imibd.fsf@gitster.mtv.corp.google.com>
In-Reply-To: <20160412234251.GB2210@sigill.intra.peff.net> (Jeff King's message of "Tue, 12 Apr 2016 19:42:52 -0400")

Jeff King <peff@peff.net> writes:

> So a slightly nicer thing is to parameterize the algorithm for every
> object name reference. So commits look like:
>
>   tree sha256:1234abcd...
>   parent sha256:1234abcd...
>
> and so on. Of course trees don't have any space for this; they have a
> fixed length for the hash part of each record, which is basically:
>
>   <mode> <name> NUL <20-byte-sha1>
>
> So we'd probably need a "treev2" object type that gives room for an
> algorithm byte (or we'd have to try to shove it into the mode, but since
> old versions won't know the new algorithm anyway, I don't think it
> solves that much...). Or you can just define for the whole tree object
> (either implicit in its type, or in a header) that it always uses
> algorithm X.
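
To make the tree problem concrete, here is a small Python sketch (not git's code) that parses the classic tree entry layout quoted above, plus a hypothetical "treev2" layout that carries one algorithm byte in front of the hash. The algorithm byte values, the helper names and the "algo:hex" output form are assumptions made up for illustration; no such format is defined in this thread.

```python
# Classic tree entries are b"<mode> <name>\0" followed by a raw 20-byte SHA-1.
SHA1_LEN = 20

# Hypothetical "treev2" entries add one algorithm byte after the NUL; the
# byte values below are invented for illustration, not part of any format.
ALGORITHMS = {1: ("sha1", 20), 2: ("sha256", 32)}


def _split_header(data: bytes, pos: int):
    """Return (mode, name, index just past the NUL) for the entry at pos."""
    space = data.index(b" ", pos)
    nul = data.index(b"\0", space)
    mode = data[pos:space].decode("ascii")
    # Names are arbitrary bytes in git; surrogateescape keeps them lossless.
    name = data[space + 1:nul].decode("utf-8", "surrogateescape")
    return mode, name, nul + 1


def parse_tree_v1(data: bytes):
    """Parse a classic tree body into (mode, name, hex_sha1) tuples."""
    entries, pos = [], 0
    while pos < len(data):
        mode, name, pos = _split_header(data, pos)
        raw = data[pos:pos + SHA1_LEN]  # fixed width: no room for an algorithm
        if len(raw) != SHA1_LEN:
            raise ValueError("truncated tree entry")
        entries.append((mode, name, raw.hex()))
        pos += SHA1_LEN
    return entries


def parse_tree_v2(data: bytes):
    """Parse the hypothetical treev2 body into (mode, name, "algo:hex") tuples."""
    entries, pos = [], 0
    while pos < len(data):
        mode, name, pos = _split_header(data, pos)
        if pos >= len(data):
            raise ValueError("missing algorithm byte")
        algo, size = ALGORITHMS[data[pos]]  # the byte decides the hash width
        raw = data[pos + 1:pos + 1 + size]
        if len(raw) != size:
            raise ValueError("truncated tree entry")
        entries.append((mode, name, f"{algo}:{raw.hex()}"))
        pos += 1 + size
    return entries
```

Because the v1 reader hardcodes a 20-byte hash after the NUL, an old reader handed a treev2 body would misparse it, which is why a new object type (or a per-tree header) is needed rather than a silent layout change.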

This will hurt performance a lot during the transition period, as it
will no longer be possible to rely on the optimization that most of
the time a fine-grained commit changes only a small part of the tree,
so we can cheaply avoid descending into subtrees that haven't changed,
because the corresponding tree objects in the pre- and post-change
trees have the same object name.  A subtree named with SHA-1 on one
side and with the new algorithm on the other never has the same name,
even when its contents are identical.  But we cannot avoid it.
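
The optimization at stake can be sketched roughly like this, using a toy in-memory tree type (hypothetical, not git's data structures): the diff opens a subtree only when the two sides give it different object names, and records which trees it actually opened.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Union


@dataclass
class Tree:
    oid: str  # object name of this tree
    # Child name -> subtree, or a blob's object name (plain str).
    children: Dict[str, Union["Tree", str]] = field(default_factory=dict)


def diff_trees(old: Tree, new: Tree, prefix: str = "",
               visited: Optional[List[str]] = None) -> List[str]:
    """Return changed paths, descending only into subtrees whose names differ."""
    if visited is not None:
        visited.append(prefix or "/")  # record every tree we actually open
    if old.oid == new.oid:
        return []  # same object name, so nothing underneath can differ
    changed: List[str] = []
    for name in sorted(set(old.children) | set(new.children)):
        a, b = old.children.get(name), new.children.get(name)
        path = prefix + name
        if isinstance(a, Tree) and isinstance(b, Tree):
            if a.oid != b.oid:  # the cheap skip: equal names are never opened
                changed += diff_trees(a, b, path + "/", visited)
        elif a != b:  # blob changed, added, removed, or type changed
            changed.append(path)
    return changed
```

In a mixed history, a subtree written with SHA-1 names on one side and new-algorithm names on the other gets a different `oid` even when nothing in it changed, so the `a.oid != b.oid` test never lets the walk skip it.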

> Transitioning to that would be something like:
>
>   0. Overhaul all of the git code to handle arbitrary-sized object ids.
>
>   1. Decide on the new algorithm and implement it in git.
>
>   2. Recognize parameterized object ids in commits and tags (designing
>      format, implementing the reading side).
>
>   3. Recognize parameterized object ids somehow in trees (designing
>      format, implementing the reading side).
>
>   4. Teach the object database to index objects by the new algorithm (or
>      possibly both algorithms).
>
>   5. Add a protocol extension so that both sides can decide which
>      algorithm is being used when they talk about oids.
>
>   6. Add a config option to write references in objects using the new
>      algorithm.
>
>   7. After a while, flip the config option on. Hopefully the readers
>      from steps 1-5 have percolated to the masses by then, and it's not
>      a horrible flag day.
>
> We're basically on step 0 right now. I'm sure I'm missing some
> subtleties in there, too.
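
Step 4 could look roughly like the following Python sketch. It relies on the real rule that git names an object by hashing "<type> <size>", a NUL byte, and the content; the DualIndex class, its translation table and the "algo:hex" lookup syntax are hypothetical. It handles blobs only, since trees, commits and tags embed other object names and would need rewriting before being hashed with the new algorithm.

```python
import hashlib
from typing import Dict, Optional, Tuple


def object_name(obj_type: str, content: bytes, algo: str) -> str:
    """Hash b"<type> <size>" + NUL + content, as git computes object names."""
    header = f"{obj_type} {len(content)}".encode("ascii") + b"\0"
    return hashlib.new(algo, header + content).hexdigest()


class DualIndex:
    """Hypothetical blob store indexed under both SHA-1 and SHA-256."""

    def __init__(self) -> None:
        self.blobs: Dict[str, bytes] = {}         # sha256 hex -> content
        self.sha1_to_sha256: Dict[str, str] = {}  # translation table

    def add_blob(self, content: bytes) -> Tuple[str, str]:
        # Blob bytes are identical under both algorithms, so hash them twice.
        old = object_name("blob", content, "sha1")
        new = object_name("blob", content, "sha256")
        self.blobs[new] = content
        self.sha1_to_sha256[old] = new
        return old, new

    def lookup(self, name: str) -> Optional[bytes]:
        """Accept 'sha1:<hex>', 'sha256:<hex>' or a bare (assumed SHA-1) hex."""
        algo, sep, hexpart = name.partition(":")
        if not sep:
            algo, hexpart = "sha1", name  # assumption: bare names are SHA-1
        if algo == "sha1":
            mapped = self.sha1_to_sha256.get(hexpart)
            return self.blobs.get(mapped) if mapped else None
        if algo == "sha256":
            return self.blobs.get(hexpart)
        raise ValueError(f"unknown algorithm: {algo}")
```

Keeping the primary key under the new algorithm and treating SHA-1 as a lookup alias is one possible choice; indexing the other way round would work just as well for this toy.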

One subtlety is that making step 7 "not a horrible flag day" may not
be a good thing.

There has to be a section of history that spans the transition: a
set of commits and trees that point to both kinds of object names.
The narrower that section of history, the more pleasant the result
of the transition will be to use.
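
One way to measure how wide that section is: count the commits whose headers point at both kinds of names. The sketch below assumes the parameterized "tree sha256:..." header syntax quoted earlier, treats an unprefixed name as a legacy SHA-1, and takes the history as a simple oldest-first list of raw commit texts; the function names are hypothetical, and a real walk would follow parent pointers instead.

```python
from typing import List, Set


def ref_algorithms(commit_text: str) -> Set[str]:
    """Return the algorithms named by one commit's tree/parent headers."""
    algos: Set[str] = set()
    for line in commit_text.splitlines():
        if not line:
            break  # a blank line ends the commit header
        key, _, value = line.partition(" ")
        if key in ("tree", "parent"):
            algo, sep, _ = value.partition(":")
            algos.add(algo if sep else "sha1")  # assumption: bare hex = SHA-1
    return algos


def transition_span(history: List[str]) -> List[int]:
    """Indices of commits whose headers mix more than one algorithm."""
    return [i for i, c in enumerate(history) if len(ref_algorithms(c)) > 1]
```

In this toy model, a linear history with a single flag day has exactly one bridging commit; switching gradually, commit by commit, widens the span.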

Letting different projects have their own flag days at their own
pace is a good thing, though, so the above observation does not
invalidate your transition plan.
