git@vger.kernel.org mailing list mirror (one of many)
 help / Atom feed
From: Junio C Hamano <gitster@pobox.com>
To: Jonathan Nieder <jrnieder@gmail.com>
Cc: Shawn Pearce <spearce@spearce.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Git Mailing List <git@vger.kernel.org>,
	Stefan Beller <sbeller@google.com>,
	bmwill@google.com, Jonathan Tan <jonathantanmy@google.com>,
	Jeff King <peff@peff.net>, David Lang <david@lang.hm>,
	"brian m. carlson" <sandals@crustytoothpaste.net>,
	Masaya Suzuki <masayasuzuki@google.com>,
	demerphq@gmail.com, The Keccak Team <keccak@noekeon.org>,
	Johannes Schindelin <Johannes.Schindelin@gmx.de>
Subject: Re: [PATCH v4] technical doc: add a design doc for hash function transition
Date: Mon, 02 Oct 2017 18:02:21 +0900
Message-ID: <xmqqefqlorc2.fsf@gitster.mtv.corp.google.com> (raw)
In-Reply-To: <20170928044320.GA84719@aiede.mtv.corp.google.com>

Jonathan Nieder <jrnieder@gmail.com> writes:

> +Reading an object's sha1-content
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +The sha1-content of an object can be read by converting all newhash-names
> +its newhash-content references to sha1-names using the translation table.

Sure.

> +Fetch
> +~~~~~
> +Fetching from a SHA-1 based server requires translating between SHA-1
> +and NewHash based representations on the fly.
> +
> +SHA-1s named in the ref advertisement that are present on the client
> +can be translated to NewHash and looked up as local objects using the
> +translation table.
> +
> +Negotiation proceeds as today. Any "have"s generated locally are
> +converted to SHA-1 before being sent to the server, and SHA-1s
> +mentioned by the server are converted to NewHash when looking them up
> +locally.

Any of our alternate object store by definition is a NewHash
repository--otherwise we'd violate "no mixing" rule.  It may or may
note have the translation table for its objects.  If it no longer
has the translation table (because it migrated to NewHash only world
before we did), then we can still use it as our alternate but we
cannot use it for the purpose of common ancestore discovery.

> +After negotiation, the server sends a packfile containing the
> +requested objects.

s/objects.$/& These are all SHA-1 contents./

> +We convert the packfile to NewHash format using
> +the following steps:
> +
> +1. index-pack: inflate each object in the packfile and compute its
> +   SHA-1. Objects can contain deltas in OBJ_REF_DELTA format against
> +   objects the client has locally. These objects can be looked up
> +   using the translation table and their sha1-content read as
> +   described above to resolve the deltas.

That procedure would give us the object's SHA-1 contents for
ref-delta objects.  For an ofs-delta object, by definition, its base
object should appear in the same packstream, so we should eventually
be able to get to the SHA-1 contents of the delta base, and from
there we can apply the delta to obtain the SHA-1 contents.  For a
non-delta object, we already have its SHA-1 contents in the
packstream.

So we can get SHA-1 names and SHA-1 contents of each and every
object in the packstream in this step.

Are we actually writing out a .pack/.idx pair that is usable in the
SHA-1 world at this stage?  Or are we going to read from something
we keep in-core in the step #3 below?

> +2. topological sort: starting at the "want"s from the negotiation
> +   phase, walk through objects in the pack and emit a list of them,
> +   excluding blobs, in reverse topologically sorted order, with each
> +   object coming later in the list than all objects it references.
> +   (This list only contains objects reachable from the "wants". If the
> +   pack from the server contained additional extraneous objects, then
> +   they will be discarded.)

Presumably this is a list of SHA-1 names, as we do not yet have
enough information to compute NewHash names yet at this point.  May
want to spell it out here.

Would it discard the auto-followed tags if we do the "traverse from
wants only"?  Traversing the objects in the packfile to find the
"tips" that are not referenced from any other object in the pack
might be necessary, and it shouldn't be too costly, I'd guess.

> +3. convert to newhash: open a new (newhash) packfile. Read the topologically
> +   sorted list just generated. For each object, inflate its
> +   sha1-content, convert to newhash-content, and write it to the newhash
> +   pack. Record the new sha1<->newhash mapping entry for use in the idx.

Are we doing any deltification here?  If we are computing .pack/.idx
pair that can be usable in the SHA-1 world in step #1, then reusing
blob deltas should be trivial (a good delta-base in the SHA-1 world
is a good delta-base in the NewHash world, too).  Things that have
outgoing references like trees, it might be possible that such a
heuristic may not give us the absolute best delta-base, but I guess
it would still be a good approximation to reuse the delta/base
object relationship in SHA-1 world to NewHash world, assuming that
the server did a good job choosing the bases.

> +4. sort: reorder entries in the new pack to match the order of objects
> +   in the pack the server generated and include blobs. Write a newhash idx
> +   file

OK.

> +5. clean up: remove the SHA-1 based pack file, index, and
> +   topologically sorted list obtained from the server in steps 1
> +   and 2.

Ah, OK, so we do write the SHA_1 pack/idx in the first step.  OK.

> +Push
> +~~~~
> +Push is simpler than fetch because the objects referenced by the
> +pushed objects are already in the translation table. The sha1-content
> +of each object being pushed can be read as described in the "Reading
> +an object's sha1-content" section to generate the pack written by git
> +send-pack.

OK.

> +Signed Commits
> +~~~~~~~~~~~~~~
> +We add a new field "gpgsig-newhash" to the commit object format to allow
> +signing commits without relying on SHA-1. It is similar to the
> +existing "gpgsig" field. Its signed payload is the newhash-content of the
> +commit object with any "gpgsig" and "gpgsig-newhash" fields removed.

Do we prepare for newerhash, too?  IOW, should the signed payload be
the newhash-contents with any field whose name is "gpgsig" or begins
with "gpgsig-" followed by anything?

> +This means commits can be signed
> +1. using SHA-1 only, as in existing signed commit objects
> +2. using both SHA-1 and NewHash, by using both gpgsig-newhash and gpgsig
> +   fields.
> +3. using only NewHash, by only using the gpgsig-newhash field.
> +
> +Old versions of "git verify-commit" can verify the gpgsig signature in
> +cases (1) and (2) without modifications and view case (3) as an
> +ordinary unsigned commit.

For old clients to be able to verify (2), signed payload for SHA-1
is everything in SHA-1 contents minus "gpgsig"; "gpgsig-newhash"
should not get excluded from the computation.  Am I correct?

I am primarily finding it a bit disturbing that there is a bit of
asymmetry here.

> +Signed Tags
> +~~~~~~~~~~~

This message stops here for now.

  parent reply index

Thread overview: 112+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-03-04  1:12 RFC: Another proposed hash function transition plan Jonathan Nieder
2017-03-05  2:35 ` Linus Torvalds
2017-03-06  0:26   ` brian m. carlson
2017-03-06 18:24     ` Brandon Williams
2017-06-15 10:30       ` Which hash function to use, was " Johannes Schindelin
2017-06-15 11:05         ` Mike Hommey
2017-06-15 13:01           ` Jeff King
2017-06-15 16:30             ` Ævar Arnfjörð Bjarmason
2017-06-15 19:34               ` Johannes Schindelin
2017-06-15 21:59                 ` Adam Langley
2017-06-15 22:41                   ` brian m. carlson
2017-06-15 23:36                     ` Ævar Arnfjörð Bjarmason
2017-06-16  0:17                       ` brian m. carlson
2017-06-16  6:25                         ` Ævar Arnfjörð Bjarmason
2017-06-16 13:24                           ` Johannes Schindelin
2017-06-16 17:38                             ` Adam Langley
2017-06-16 20:52                               ` Junio C Hamano
2017-06-16 21:12                                 ` Junio C Hamano
2017-06-16 21:24                                   ` Jonathan Nieder
2017-06-16 21:39                                     ` Ævar Arnfjörð Bjarmason
2017-06-16 20:42                             ` Jeff King
2017-06-19  9:26                               ` Johannes Schindelin
2017-06-15 21:10             ` Mike Hommey
2017-06-16  4:30               ` Jeff King
2017-06-15 17:36         ` Brandon Williams
2017-06-15 19:20           ` Junio C Hamano
2017-06-15 19:13         ` Jonathan Nieder
2017-03-07  0:17   ` RFC v3: " Jonathan Nieder
2017-03-09 19:14     ` Shawn Pearce
2017-03-09 20:24       ` Jonathan Nieder
2017-03-10 19:38         ` Jeff King
2017-03-10 19:55           ` Jonathan Nieder
2017-09-28  4:43       ` [PATCH v4] technical doc: add a design doc for hash function transition Jonathan Nieder
2017-09-29  6:06         ` Junio C Hamano
2017-09-29  8:09           ` Junio C Hamano
2017-09-29 17:34           ` Jonathan Nieder
2017-10-02  8:25             ` Junio C Hamano
2017-10-02 19:41             ` Jason Cooper
2017-10-02  9:02         ` Junio C Hamano [this message]
2017-10-02 19:23         ` Jason Cooper
2017-10-03  5:40         ` Junio C Hamano
2017-10-03 13:08           ` Jason Cooper
2017-10-04  1:44         ` Junio C Hamano
2017-09-06  6:28     ` RFC v3: Another proposed hash function transition plan Junio C Hamano
2017-09-08  2:40       ` Junio C Hamano
2017-09-08  3:34         ` Jeff King
2017-09-11 18:59         ` Brandon Williams
2017-09-13 12:05           ` Johannes Schindelin
2017-09-13 13:43             ` demerphq
2017-09-13 22:51               ` Jonathan Nieder
2017-09-14 18:26                 ` Johannes Schindelin
2017-09-14 18:40                   ` Jonathan Nieder
2017-09-14 22:09                     ` Johannes Schindelin
2017-09-13 23:30               ` Linus Torvalds
2017-09-14 18:45                 ` Johannes Schindelin
2017-09-18 12:17                   ` Gilles Van Assche
2017-09-18 22:16                     ` Johannes Schindelin
2017-09-19 16:45                       ` Gilles Van Assche
2017-09-29 13:17                         ` Johannes Schindelin
2017-09-29 14:54                           ` Joan Daemen
2017-09-29 22:33                             ` Johannes Schindelin
2017-09-30 22:02                               ` Joan Daemen
2017-10-02 14:26                                 ` Johannes Schindelin
2017-09-18 22:25                     ` Jonathan Nieder
2017-09-26 17:05                   ` Jason Cooper
2017-09-26 22:11                     ` Johannes Schindelin
2017-09-26 22:25                       ` [PATCH] technical doc: add a design doc for hash function transition Stefan Beller
2017-09-26 23:38                         ` Jonathan Nieder
2017-09-26 23:51                       ` RFC v3: Another proposed hash function transition plan Jonathan Nieder
2017-10-02 14:54                         ` Jason Cooper
2017-10-02 16:50                           ` Brandon Williams
2017-10-02 14:00                       ` Jason Cooper
2017-10-02 17:18                         ` Linus Torvalds
2017-10-02 19:37                           ` Jeff King
2017-09-13 16:30             ` Jonathan Nieder
2017-09-13 21:52               ` Junio C Hamano
2017-09-13 22:07                 ` Stefan Beller
2017-09-13 22:18                   ` Jonathan Nieder
2017-09-14  2:13                     ` Junio C Hamano
2017-09-14 15:23                       ` Johannes Schindelin
2017-09-14 15:45                         ` demerphq
2017-09-14 22:06                           ` Johannes Schindelin
2017-09-13 22:15                 ` Junio C Hamano
2017-09-13 22:27                   ` Jonathan Nieder
2017-09-14  2:10                     ` Junio C Hamano
2017-09-14 12:39               ` Johannes Schindelin
2017-09-14 16:36                 ` Brandon Williams
2017-09-14 18:49                 ` Jonathan Nieder
2017-09-15 20:42                   ` Philip Oakley
2017-03-05 11:02 ` RFC: " David Lang
     [not found]   ` <CA+dhYEXHbQfJ6KUB1tWS9u1MLEOJL81fTYkbxu4XO-i+379LPw@mail.gmail.com>
2017-03-06  9:43     ` Jeff King
2017-03-06 23:40   ` Jonathan Nieder
2017-03-07  0:03     ` Mike Hommey
2017-03-06  8:43 ` Jeff King
2017-03-06 18:39   ` Jonathan Tan
2017-03-06 19:22     ` Linus Torvalds
2017-03-06 19:59       ` Brandon Williams
2017-03-06 21:53       ` Junio C Hamano
2017-03-07  8:59     ` Jeff King
2017-03-06 18:43   ` Junio C Hamano
2017-03-07 18:57 ` Ian Jackson
2017-03-07 19:15   ` Linus Torvalds
2017-03-08 11:20     ` Ian Jackson
2017-03-08 15:37       ` Johannes Schindelin
2017-03-08 15:40       ` Johannes Schindelin
2017-03-20  5:21         ` Use base32? Jason Hennessey
2017-03-20  5:58           ` Michael Steuer
2017-03-20  8:05             ` Jacob Keller
2017-03-21  3:07               ` Michael Steuer
2017-03-13  9:24 ` RFC: Another proposed hash function transition plan The Keccak Team
2017-03-13 17:48   ` Jonathan Nieder
2017-03-13 18:34     ` ankostis
2017-03-17 11:07       ` Johannes Schindelin

Reply instructions:

You may reply publically to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=xmqqefqlorc2.fsf@gitster.mtv.corp.google.com \
    --to=gitster@pobox.com \
    --cc=Johannes.Schindelin@gmx.de \
    --cc=bmwill@google.com \
    --cc=david@lang.hm \
    --cc=demerphq@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=jonathantanmy@google.com \
    --cc=jrnieder@gmail.com \
    --cc=keccak@noekeon.org \
    --cc=masayasuzuki@google.com \
    --cc=peff@peff.net \
    --cc=sandals@crustytoothpaste.net \
    --cc=sbeller@google.com \
    --cc=spearce@spearce.org \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

git@vger.kernel.org mailing list mirror (one of many)

Archives are clonable:
	git clone --mirror https://public-inbox.org/git
	git clone --mirror http://ou63pmih66umazou.onion/git
	git clone --mirror http://czquwvybam4bgbro.onion/git
	git clone --mirror http://hjrcffqmbrq6wope.onion/git

Newsgroups are available over NNTP:
	nntp://news.public-inbox.org/inbox.comp.version-control.git
	nntp://ou63pmih66umazou.onion/inbox.comp.version-control.git
	nntp://czquwvybam4bgbro.onion/inbox.comp.version-control.git
	nntp://hjrcffqmbrq6wope.onion/inbox.comp.version-control.git
	nntp://news.gmane.org/gmane.comp.version-control.git

 note: .onion URLs require Tor: https://www.torproject.org/
       or Tor2web: https://www.tor2web.org/

AGPL code for this site: git clone https://public-inbox.org/ public-inbox