git@vger.kernel.org mailing list mirror (one of many)
From: Taylor Blau <me@ttaylorr.com>
To: git@vger.kernel.org
Subject: [TOPIC 2/8] State of SHA-256 transition
Date: Thu, 29 Sep 2022 15:19:58 -0400
Message-ID: <YzXv3r1OONqxAvih@nand.local>
In-Reply-To: <YzXvMRc6X60kjVeY@nand.local>

# SHA-256 transition (brian)

- (brian) Functional version of "state four" implementation with only
	SHA-256 in the repository
- Interop work (to use sha1 and sha256) is mostly stalled, brian is
	mostly not working on it at the moment
- Current implementation is partially functional, though failing a lot
	of tests.  Can write SHA-256 objects into the repo and, according to
	the transition plan, will write a loose mapping between SHA-1 and
	SHA-256, along with index v3 with the hashes for both
- When you index a pack, computes both hashes and stores them in the
	loose object store or pack
- Tricky part is when you're indexing a pack, you don't always get all
	blobs before all trees, before all commits, etc.
- In order to rewrite a commit from SHA-256 -> SHA-1, you first need all
	objects reachable from it in order to compute the hash. Try to look
	them up in a temporary lookup table ahead of time, and lazily hash
	the objects we have yet to receive, coming back to them later.
- "Rewind the pack" to compute the proper objects, which works
- For submodules (currently unwritten), going to send both hashes over
	the wire, but unfortunately no way to validate those in real time. If
	your submodules are checked out, they are rewritten automatically.
- brian working on it slowly as they get to it, hopes that their
	employer will devote more time to it
- Wants to also work on libgit2 at the same time, since it doesn't yet
	understand SHA-256, though they hope that somebody else will work on
	it, since they are tired of writing SEGVs :-).
- (demetr): what if you have a remote that speaks only SHA-1?
	 - Goal is to have that information come over the pipe, and rewrite
		 into SHA-256 upon entering the new objects into the repository
- (demetr): can you then push a converted-into-SHA-256 repository back
	to a SHA-1 repo?
	 - Goal is to be able to do that, unless you have a SHA-1 collision,
		 in which case it won't work.
	 - No major hosting platform yet supports only SHA-256 repositories,
		 though maybe Gitolite and CGit do
- (Peff): so, in the worst case, index-pack takes twice as long?
	 - brian: depends on how many are blob objects, since those only take
		 a single pass
	 - Will try to rewrite objects in as few passes as possible
	 - May need multiple passes in order to visit objects in topological
		 order
	 - Actually: worst case is N passes, where N is the maximum tree depth
- (Stolee): what you really need is reverse-topo order on the object
	graph
	 - brian: yes, would be nice if the server sent them in that order.
		 But the server doesn't know how to do that.
- (Emily): so for something like shallow/partial-clone, the server needs
	to be able to do SHA-256 for you to compute it yourself?
	 - brian: there will be a capability, since data needs to come over
		 the pipe for submodules, and could be extended for shallow and
		 partial clones as well. Would fit into protocol v2, and will be
		 essential for submodules, so will have to exist regardless.
	 - Hopefully server has that information, though how expensive that
		 will be to compute is highly dependent.
- (jrn): submodules have to be updated, do you have an idea of what that
	protocol change will look like?
	 - brian: fuzzy idea, but nothing concrete yet
	 - (jrn): this reminds me of the early days of partial clones where we
		 talked about "promised" objects at the edge and associated metadata
- (Toon): so no interop, but is there a way to do a single step
	conversion from SHA-1 to SHA-256?
	 - brian: yes, you can use fast-export and fast-import. Currently any
		 signatures or references are broken, but in the future would like
		 to update them (that code exists, but it hasn't been upstreamed)
	 - doesn't quite work smoothly with submodules, since you have to
		 rewrite them first, then generate a set of marks, and then export
		 and import
	 - verified with git/git, resulting index isn't substantially larger
		 (basically 32 bytes per object, along with slightly larger commit
		 and tree objects)
- (demetr): Could be significantly larger if you have a zillion commits
	 - brian: we'd have other problems before then :-).
- (Elijah): common in commit messages to refer back to earlier commits.
	Do we want to rewrite those?
	 - brian: maybe, depends on future plans if/when we deprecate earlier
		 hash algos
	 - (jrn): Don't have a good way to retroactively change commit
		 messages, but we do have git notes. First instinct is to use notes
		 for this kind of historical reference info
	 - (Terry): annotated tags?
	 - (Elijah): filter-repo does this kind of commit message munging
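
The index-pack discussion above (objects arriving out of order, "rewinding
the pack", and the worst-case number of passes) can be illustrated with a
small sketch. This is a toy model, not Git's index-pack code: the function
names (git_hash, toy_body, make, convert_in_passes) and the "ref" body
format are assumptions made up for the example. Only the
"<type> <size>\0" object header mirrors how Git actually hashes objects.

```python
import hashlib

def git_hash(algo, kind, body):
    # Git hashes "<type> <size>\0" followed by the object body.
    header = f"{kind} {len(body)}\0".encode()
    return hashlib.new(algo, header + body).hexdigest()

def toy_body(payload, ref_ids):
    # Toy serialization (NOT Git's real tree/commit format): the payload
    # followed by one "ref <hex id>" line per referenced object.
    return payload + b"".join(b"\nref " + r.encode() for r in ref_ids)

def make(kind, payload, refs=()):
    # Build a toy object named by SHA-256; refs are other toy objects.
    ref_ids = [r[0] for r in refs]
    oid = git_hash("sha256", kind, toy_body(payload, ref_ids))
    return (oid, kind, payload, ref_ids)

def convert_in_passes(pack):
    """Compute a SHA-256 -> SHA-1 mapping for objects in arrival order.

    An object can only be rewritten once every object it references has
    been mapped, so objects seen too early are deferred to a later pass.
    Returns (mapping, number_of_passes).
    """
    mapping, pending, passes = {}, list(pack), 0
    while pending:
        passes += 1
        deferred = []
        for oid, kind, payload, refs in pending:
            if all(r in mapping for r in refs):
                body = toy_body(payload, [mapping[r] for r in refs])
                mapping[oid] = git_hash("sha1", kind, body)
            else:
                deferred.append((oid, kind, payload, refs))
        if len(deferred) == len(pending):
            raise ValueError("referenced objects are missing from the pack")
        pending = deferred
    return mapping, passes

# A chain commit -> tree -> subtree -> blob (tree depth 2, chain length 4).
blob = make("blob", b"hello\n")
subtree = make("tree", b"sub", [blob])
tree = make("tree", b"root", [subtree])
commit = make("commit", b"initial", [tree])

# Dependencies first (the order Stolee asks for): a single pass suffices.
deps_first_map, deps_first_passes = convert_in_passes([blob, subtree, tree, commit])
# Dependencies last: each pass only unlocks one more level of the chain.
deps_last_map, deps_last_passes = convert_in_passes([commit, tree, subtree, blob])
print(deps_first_passes, deps_last_passes)
```

In this toy model the resulting mapping is the same in both orders; only
the number of passes changes, which is why receiving objects with their
dependencies first would help.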

Part of the thread "Notes from the Git Contributor's Summit, 2022"
(Taylor Blau, 2022-09-29).