git@vger.kernel.org mailing list mirror (one of many)
From: "brian m. carlson" <sandals@crustytoothpaste.net>
To: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
Cc: git@vger.kernel.org, Junio C Hamano <gitster@pobox.com>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Edward Thomson <ethomson@edwardthomson.com>,
	Jonathan Nieder <jrnieder@gmail.com>,
	Johannes Schindelin <Johannes.Schindelin@gmx.de>,
	demerphq <demerphq@gmail.com>,
	Brandon Williams <bmwill@google.com>,
	Derrick Stolee <stolee@gmail.com>
Subject: Re: Questions about the hash function transition
Date: Fri, 24 Aug 2018 01:40:07 +0000
Message-ID: <20180824014007.GF535143@genre.crustytoothpaste.net>
In-Reply-To: <878t4xfaes.fsf@evledraar.gmail.com>

On Thu, Aug 23, 2018 at 04:02:51PM +0200, Ævar Arnfjörð Bjarmason wrote:
> > [...]
> > Goals
> > -----
> > 1. The transition to SHA-256 can be done one local repository at a time.
> >    a. Requiring no action by any other party.
> >    b. A SHA-256 repository can communicate with SHA-1 Git servers
> >       (push/fetch).
> >    c. Users can use SHA-1 and SHA-256 identifiers for objects
> >       interchangeably (see "Object names on the command line", below).
> >    d. New signed objects make use of a stronger hash function than
> >       SHA-1 for their security guarantees.
> > 2. Allow a complete transition away from SHA-1.
> >    a. Local metadata for SHA-1 compatibility can be removed from a
> >       repository if compatibility with SHA-1 is no longer needed.
> > 3. Maintainability throughout the process.
> >    a. The object format is kept simple and consistent.
> >    b. Creation of a generalized repository conversion tool.
> >
> > Non-Goals
> > ---------
> > 1. Add SHA-256 support to Git protocol. This is valuable and the
> >    logical next step but it is out of scope for this initial design.
> 
> This is a non-goal according to the docs, but now that we have protocol
> v2 in git, perhaps we could start specifying or describing how this
> protocol extension will work?

I have code that does this.  The reason is that the first stage of the
transition code is to implement stage 4 of the transition: that is, a
full SHA-256 implementation without any SHA-1 support.  Implementing it
that way means that we don't have to deal with any of the SHA-1 to
SHA-256 mapping in the first stage of the code.

In order to clone an SHA-256 repo (which the testsuite is completely
broken without), you need to be able to have basic SHA-256 support in
the protocol.  I know this was a non-goal, but the alternative is an
inability to run the testsuite using SHA-256 until all the code is
merged, which is unsuitable for development.  The transition plan also
anticipates stage 4 (full SHA-256) support before earlier stages, so
this will be required.

I hope to be able to spend some time documenting this in a little bit.
I have documentation for that code in my branch, but I haven't sent it
in yet.

I realize I have a lot of code that has not been sent in yet, but I also
tend to build on my own series a lot, and I probably need to be a bit
better about extracting reusable pieces that can go in independently
without waiting for the previous series to land.

> > [...]
> > 3. Intermixing objects using multiple hash functions in a single
> >    repository.
> 
> But isn't that the goal now per "Translation table" & writing both SHA-1
> and SHA-256 versions of objects?

No, I think this statement is basically that you have to have the entire
repository use all one algorithm under the hood in the .git directory,
translation tables excluded.  I don't think that's controversial.

> > [...]
> > Pack index
> > ~~~~~~~~~~
> > Pack index (.idx) files use a new v3 format that supports multiple
> > hash functions. They have the following format (all integers are in
> > network byte order):
> >
> > - A header appears at the beginning and consists of the following:
> >   - The 4-byte pack index signature: '\377t0c'
> >   - 4-byte version number: 3
> >   - 4-byte length of the header section, including the signature and
> >     version number
> >   - 4-byte number of objects contained in the pack
> >   - 4-byte number of object formats in this pack index: 2
> >   - For each object format:
> >     - 4-byte format identifier (e.g., 'sha1' for SHA-1)
> 
> So, given that we have 4-byte limit and have decided on SHA-256 are we
> just going to call this 'sha2'? That might be confusingly ambiguous
> since SHA2 is a standard with more than just SHA-256, maybe 's256', or
> maybe we should give this 8 bytes with trailing \0s so we can have
> "SHA-1\0\0\0" and "SHA-256\0"?

This is the format_id field in struct git_hash_algo.

For SHA-1, I have 0x73686131, which is "sha1", big-endian, and for
SHA-256, I have 0x73323536, which is "s256", big-endian.  The former is
in the codebase already; the latter, in my hash-impl branch.

If people have objections, we can change this up until we merge the pack
index v3 code (which is not yet finished).  It needs to be unique, and
that's it.  We could specify 0x00000001 and 0x00000002 if we wanted,
although I feel the values I mentioned above are self-documenting, which
is desirable.
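
To illustrate the "self-documenting" point, here is a minimal Python sketch
(not code from Git or from the hash-impl branch) of how a four-character
ASCII tag corresponds to the 32-bit big-endian values quoted above.  The
helper names format_id() and format_tag() are hypothetical and exist only
for this example.

```python
import struct


def format_id(tag: str) -> int:
    """Interpret a 4-character ASCII tag as a big-endian 32-bit integer.

    Hypothetical helper for illustration; not part of Git.
    """
    raw = tag.encode("ascii")
    if len(raw) != 4:
        raise ValueError("format tag must be exactly 4 ASCII bytes")
    return struct.unpack(">I", raw)[0]


def format_tag(ident: int) -> str:
    """Inverse of format_id(): recover the ASCII tag from the integer."""
    return struct.pack(">I", ident).decode("ascii")


SHA1_FORMAT_ID = format_id("sha1")
SHA256_FORMAT_ID = format_id("s256")

if __name__ == "__main__":
    print(hex(SHA1_FORMAT_ID))    # 0x73686131
    print(hex(SHA256_FORMAT_ID))  # 0x73323536
    print(format_tag(0x73323536)) # s256
```

Reading the identifier back out of a hex dump as ASCII is what makes these
values easier to recognize than 0x00000001 and 0x00000002 would be.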

> > [...]
> > - The trailer consists of the following:
> >   - A copy of the 20-byte SHA-256 checksum at the end of the
> >     corresponding packfile.
> >
> >   - 20-byte SHA-256 checksum of all of the above.
> 
> We need to update both of these to 32 byte, right? Or are we planning to
> truncate the checksums?
> 
> This seems like just a mistake when we did s/NewHash/SHA-256/g, but then
> again it was originally "20-byte NewHash checksum" ever since 752414ae43
> ("technical doc: add a design doc for hash function transition",
> 2017-09-27), so what do we mean here?

Yes, this will be 32 bytes.  The code I have uses 32 bytes, because
truncating it means that we have to write special code just for that
case, which seems silly.
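
As a rough sketch of what an untruncated trailer looks like, the Python
below builds the two trailer fields described in the quoted text, assuming
full 32-byte SHA-256 digests.  The function build_trailer() and its
parameters are hypothetical; this is not the pack index v3 code, which is
not yet finished.

```python
import hashlib

# Digest sizes in bytes, taken from hashlib rather than hard-coded.
SHA1_LEN = hashlib.sha1().digest_size      # 20
SHA256_LEN = hashlib.sha256().digest_size  # 32


def build_trailer(pack_checksum: bytes, index_body: bytes) -> bytes:
    """Return a trailer of two full-length SHA-256 checksums.

    pack_checksum: copy of the checksum at the end of the packfile.
    index_body:    every index byte that precedes the trailer.
    Hypothetical helper for illustration; not part of Git.
    """
    if len(pack_checksum) != SHA256_LEN:
        raise ValueError("pack checksum must be a full SHA-256 digest")
    # The second checksum covers "all of the above", which includes
    # the copy of the pack checksum that was just written.
    index_checksum = hashlib.sha256(index_body + pack_checksum).digest()
    return pack_checksum + index_checksum
```

With no truncation the trailer is 2 * 32 = 64 bytes, and the same code
path would work for any hash length without a special case.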
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204
