git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Jeff King <peff@peff.net>
To: Jonathan Nieder <jrnieder@gmail.com>
Cc: git@vger.kernel.org, sbeller@google.com, bmwill@google.com,
	jonathantanmy@google.com,
	Linus Torvalds <torvalds@linux-foundation.org>
Subject: Re: RFC: Another proposed hash function transition plan
Date: Mon, 6 Mar 2017 03:43:53 -0500	[thread overview]
Message-ID: <20170306084353.nrns455dvkdsfgo5@sigill.intra.peff.net> (raw)
In-Reply-To: <20170304011251.GA26789@aiede.mtv.corp.google.com>

On Fri, Mar 03, 2017 at 05:12:51PM -0800, Jonathan Nieder wrote:

> This past week we came up with this idea for what a transition to a new
> hash function for Git would look like.  I'd be interested in your
> thoughts (especially if you can make them as comments on the document,
> which makes it easier to address them and update the document).

Overall it's an interesting idea. I thought at first that you were
suggesting servers do on-the-fly conversion, but after a more careful
reading that isn't the case. And I don't think that would work, because
the conversion is expensive.

So this pushes the conversion cost onto the clients who decide to move
to SHA-256. That may be a problem for sites which have a lot of clients
(like CI hosts). But I guess they would just stick with SHA-1 as long as
possible, until the upstream repo switches (and that _is_ a per-repo
flag day, because the upstream host isn't going to convert back to SHA-1
on the fly to serve the old clients).

> You can use the doc URL
> 
>  https://goo.gl/gh2Mzc

I'd encourage anybody following along to follow that link. I almost
didn't, but there are a ton of comments there (I'm not sure how I feel
about splitting the discussion off the list, though).

> Goals
> -----
> 1. The transition to SHA256 can be done one local repository at a time.
>    a. Requiring no action by any other party.
>    b. A SHA256 repository can communicate with SHA-1 Git servers and
>       clients (push/fetch).
>    c. Users can use SHA-1 and SHA256 identifiers for objects
>       interchangeably.
>    d. New signed objects make use of a stronger hash function than
>       SHA-1 for their security guarantees.
> 2. Allow a complete transition away from SHA-1.
>    a. Local metadata for SHA-1 compatibility can be dropped in a
>       repository if compatibility with SHA-1 is no longer needed.

I suspect we'll never get away from keeping the mapping table. You'll
need at least the sha1->sha256 table if you want to look up names found
in historic commit messages, mailing list posts, etc.

And you'll need the sha256->sha1 table if you want to verify the gpg
signatures on old tags and commits. That might be something people are
willing to drop, though.

> After negotiation, the server sends a packfile containing the
> requested objects. We convert the packfile to SHA-256 format using the
> following steps:
> 
> 1. index-pack: inflate each object in the packfile and compute its
>    SHA-1. Objects can contain deltas in OBJ_REF_DELTA format against
>    objects the client has locally. These objects can be looked up using
>    the translation table and their sha1-content read as described above
>    to resolve the deltas.
> 2. topological sort: starting at the "want"s from the negotiation
>    phase, walk through objects in the pack and emit a list of them in
>    topologically sorted order. (This list only contains objects
>    reachable from the "wants". If the pack from the server contained
>    additional extraneous objects, then they will be discarded.)

I don't think we do this right now, but you can actually find the entry
(and exit) points of a pack during the index-pack step. Basically:

  1. Keep a hashmap of objects mentioned in the pack.

  2. When we process an object's content (i.e., compute its hash), also
     parse it for any object references. Add entries in the hashmap for
     any object mentioned this way. Mark the entry for the object we
     processed with a "HAVE" bit, and mark any referenced object with a
     "REF" bit.

  3. After processing all objects, anything with a "HAVE" but no "REF"
     is an entry point to the pack (i.e., something that we should have
     asked for with a want). Anything with a "REF" but not a "HAVE" is
     an exit point (i.e., an object that we are expected to already have
     in our repo).

     (I've thought about this before because we could possibly shortcut
     the connectivity check using the exit points. It's complicated by
     the fact that we don't assume the transitive presence of objects
     unless they are reachable).

I don't think using the "want"s as the entry points is unreasonable,
though. The server _shouldn't_ generally be sending us other cruft.

I do wonder if you might be able to omit the extra object-graph walk
from your step 2, if you could assign "depths" to each object during
step 1 instead of HAVE/REF bits. The trouble, of course, is that you're
not visiting the nodes in the right order (so given two trees, you're
not sure if one might eventually be a child of the other; how do you
assign their depths?). I have a feeling there's a proof that it's
impossible, but I might just not be clever enough.


Overall the basics of the conversion seem sound to me. The "nohash"
things seems more complicated than I think it ought to be, which
probably just means I'm missing something.  I left a few related
comments on the google doc, so I won't repeat them here.

-Peff

  parent reply	other threads:[~2017-03-06  9:11 UTC|newest]

Thread overview: 113+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-03-04  1:12 RFC: Another proposed hash function transition plan Jonathan Nieder
2017-03-05  2:35 ` Linus Torvalds
2017-03-06  0:26   ` brian m. carlson
2017-03-06 18:24     ` Brandon Williams
2017-06-15 10:30       ` Which hash function to use, was " Johannes Schindelin
2017-06-15 11:05         ` Mike Hommey
2017-06-15 13:01           ` Jeff King
2017-06-15 16:30             ` Ævar Arnfjörð Bjarmason
2017-06-15 19:34               ` Johannes Schindelin
2017-06-15 21:59                 ` Adam Langley
2017-06-15 22:41                   ` brian m. carlson
2017-06-15 23:36                     ` Ævar Arnfjörð Bjarmason
2017-06-16  0:17                       ` brian m. carlson
2017-06-16  6:25                         ` Ævar Arnfjörð Bjarmason
2017-06-16 13:24                           ` Johannes Schindelin
2017-06-16 17:38                             ` Adam Langley
2017-06-16 20:52                               ` Junio C Hamano
2017-06-16 21:12                                 ` Junio C Hamano
2017-06-16 21:24                                   ` Jonathan Nieder
2017-06-16 21:39                                     ` Ævar Arnfjörð Bjarmason
2017-06-16 20:42                             ` Jeff King
2017-06-19  9:26                               ` Johannes Schindelin
2017-06-15 21:10             ` Mike Hommey
2017-06-16  4:30               ` Jeff King
2017-06-15 17:36         ` Brandon Williams
2017-06-15 19:20           ` Junio C Hamano
2017-06-15 19:13         ` Jonathan Nieder
2017-03-07  0:17   ` RFC v3: " Jonathan Nieder
2017-03-09 19:14     ` Shawn Pearce
2017-03-09 20:24       ` Jonathan Nieder
2017-03-10 19:38         ` Jeff King
2017-03-10 19:55           ` Jonathan Nieder
2017-09-28  4:43       ` [PATCH v4] technical doc: add a design doc for hash function transition Jonathan Nieder
2017-09-29  6:06         ` Junio C Hamano
2017-09-29  8:09           ` Junio C Hamano
2017-09-29 17:34           ` Jonathan Nieder
2017-10-02  8:25             ` Junio C Hamano
2017-10-02 19:41             ` Jason Cooper
2017-10-02  9:02         ` Junio C Hamano
2017-10-02 19:23         ` Jason Cooper
2017-10-03  5:40         ` Junio C Hamano
2017-10-03 13:08           ` Jason Cooper
2017-10-04  1:44         ` Junio C Hamano
2017-09-06  6:28     ` RFC v3: Another proposed hash function transition plan Junio C Hamano
2017-09-08  2:40       ` Junio C Hamano
2017-09-08  3:34         ` Jeff King
2017-09-11 18:59         ` Brandon Williams
2017-09-13 12:05           ` Johannes Schindelin
2017-09-13 13:43             ` demerphq
2017-09-13 22:51               ` Jonathan Nieder
2017-09-14 18:26                 ` Johannes Schindelin
2017-09-14 18:40                   ` Jonathan Nieder
2017-09-14 22:09                     ` Johannes Schindelin
2017-09-13 23:30               ` Linus Torvalds
2017-09-14 18:45                 ` Johannes Schindelin
2017-09-18 12:17                   ` Gilles Van Assche
2017-09-18 22:16                     ` Johannes Schindelin
2017-09-19 16:45                       ` Gilles Van Assche
2017-09-29 13:17                         ` Johannes Schindelin
2017-09-29 14:54                           ` Joan Daemen
2017-09-29 22:33                             ` Johannes Schindelin
2017-09-30 22:02                               ` Joan Daemen
2017-10-02 14:26                                 ` Johannes Schindelin
2017-09-18 22:25                     ` Jonathan Nieder
2017-09-26 17:05                   ` Jason Cooper
2017-09-26 22:11                     ` Johannes Schindelin
2017-09-26 22:25                       ` [PATCH] technical doc: add a design doc for hash function transition Stefan Beller
2017-09-26 23:38                         ` Jonathan Nieder
2017-09-26 23:51                       ` RFC v3: Another proposed hash function transition plan Jonathan Nieder
2017-10-02 14:54                         ` Jason Cooper
2017-10-02 16:50                           ` Brandon Williams
2017-10-02 14:00                       ` Jason Cooper
2017-10-02 17:18                         ` Linus Torvalds
2017-10-02 19:37                           ` Jeff King
2017-09-13 16:30             ` Jonathan Nieder
2017-09-13 21:52               ` Junio C Hamano
2017-09-13 22:07                 ` Stefan Beller
2017-09-13 22:18                   ` Jonathan Nieder
2017-09-14  2:13                     ` Junio C Hamano
2017-09-14 15:23                       ` Johannes Schindelin
2017-09-14 15:45                         ` demerphq
2017-09-14 22:06                           ` Johannes Schindelin
2017-09-13 22:15                 ` Junio C Hamano
2017-09-13 22:27                   ` Jonathan Nieder
2017-09-14  2:10                     ` Junio C Hamano
2017-09-14 12:39               ` Johannes Schindelin
2017-09-14 16:36                 ` Brandon Williams
2017-09-14 18:49                 ` Jonathan Nieder
2017-09-15 20:42                   ` Philip Oakley
2017-03-05 11:02 ` RFC: " David Lang
     [not found]   ` <CA+dhYEXHbQfJ6KUB1tWS9u1MLEOJL81fTYkbxu4XO-i+379LPw@mail.gmail.com>
2017-03-06  9:43     ` Jeff King
2017-03-06 23:40   ` Jonathan Nieder
2017-03-07  0:03     ` Mike Hommey
2017-03-06  8:43 ` Jeff King [this message]
2017-03-06 18:39   ` Jonathan Tan
2017-03-06 19:22     ` Linus Torvalds
2017-03-06 19:59       ` Brandon Williams
2017-03-06 21:53       ` Junio C Hamano
2017-03-07  8:59     ` Jeff King
2017-03-06 18:43   ` Junio C Hamano
2017-03-07 18:57 ` Ian Jackson
2017-03-07 19:15   ` Linus Torvalds
2017-03-08 11:20     ` Ian Jackson
2017-03-08 15:37       ` Johannes Schindelin
2017-03-08 15:40       ` Johannes Schindelin
2017-03-20  5:21         ` Use base32? Jason Hennessey
2017-03-20  5:58           ` Michael Steuer
2017-03-20  8:05             ` Jacob Keller
2017-03-21  3:07               ` Michael Steuer
2017-03-13  9:24 ` RFC: Another proposed hash function transition plan The Keccak Team
2017-03-13 17:48   ` Jonathan Nieder
2017-03-13 18:34     ` ankostis
2017-03-17 11:07       ` Johannes Schindelin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170306084353.nrns455dvkdsfgo5@sigill.intra.peff.net \
    --to=peff@peff.net \
    --cc=bmwill@google.com \
    --cc=git@vger.kernel.org \
    --cc=jonathantanmy@google.com \
    --cc=jrnieder@gmail.com \
    --cc=sbeller@google.com \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).