From: Taylor Blau <me@ttaylorr.com>
To: git@vger.kernel.org
Subject: [TOPIC 2/8] State of SHA-256 transition
Date: Thu, 29 Sep 2022 15:19:58 -0400
Message-ID: <YzXv3r1OONqxAvih@nand.local>
In-Reply-To: <YzXvMRc6X60kjVeY@nand.local>
# SHA-256 transition (brian)
- (brian) Functional version of "state four" implementation with only
SHA-256 in the repository
- Interop work (to use sha1 and sha256) is mostly stalled, brian is
mostly not working on it at the moment
- Current implementation is partially functional, though failing a lot
of tests. Can write SHA-256 objects into the repo and, per the
transition plan, will write a loose mapping between SHA-1 and SHA-256,
along with a v3 pack index holding the hashes for both
- When you index a pack, computes both hashes and stores them in the
loose object store or pack
- Tricky part is when you're indexing a pack, you don't always get all
blobs before all trees, before all commits, etc.
- In order to rewrite a commit from SHA-256 -> SHA-1, you need all
objects reachable from it beforehand in order to compute the hash.
Try to look them up in a temporary lookup table ahead of time; if one
is missing, lazily hash the object we're going to get and come back
to it later.
- "Rewind the pack" to compute the proper objects, which works
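
The dual-hash step for a blob can be sketched as below. This is only an
illustration of how a Git object ID is derived (hash of a "<type> <size>\0"
header plus the content), not the code discussed at the summit. The names
git_object_id, both_ids, and the in-memory `mapping` dict are hypothetical
stand-ins; the real loose mapping lives on disk.

```python
import hashlib

def git_object_id(kind: str, data: bytes, algo: str) -> str:
    """Hash a Git-style header plus content with the given algorithm."""
    header = f"{kind} {len(data)}\0".encode("ascii")
    return hashlib.new(algo, header + data).hexdigest()

def both_ids(kind: str, data: bytes) -> dict:
    """Compute the SHA-1 and SHA-256 IDs of the same content in one go."""
    return {algo: git_object_id(kind, data, algo) for algo in ("sha1", "sha256")}

# Hypothetical in-memory stand-in for the SHA-256 -> SHA-1 loose mapping.
mapping = {}
ids = both_ids("blob", b"hello\n")
mapping[ids["sha256"]] = ids["sha1"]
print(ids["sha1"])  # ce013625030ba8dba906f756967f9e9ca394464a
```

This only works directly for blobs, whose content is the same under both
algorithms. Trees, commits, and tags embed other object IDs, so their
content has to be rewritten with the translated IDs before hashing, which
is why the arrival order of objects matters.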
- For submodules (currently unwritten), going to send both hashes over
the wire, but unfortunately no way to validate those in real time. If
your submodules are checked out, they are rewritten automatically.
- brian working on it slowly as they get to it, hopes that their
employer will devote more time to it
- Wants to also work on libgit2 at the same time, since it doesn't yet
understand SHA-256, though they hope that somebody else will work on
it, since they are tired of writing SEGVs :-).
- (demetr): what if you have a remote that speaks only SHA-1?
- Goal is to have that information come over the pipe, and rewrite
into SHA-256 upon entering the new objects into the repository
- (demetr): can you then push a converted-into-SHA-256 repository back
to a SHA-1 repo?
- Goal is to be able to do that, unless you have a SHA-1 collision,
in which case it won't work.
- No major hosting platform yet supports only SHA-256 repositories,
though maybe Gitolite and CGit do
- (Peff): so, in the worst case, index-pack takes twice as long?
- brian: depends on how many are blob objects, since those only take a
single pass
- Will try to rewrite objects in as few passes as possible
- May need multiple passes in order to visit objects in topological
order
- Actually: worst case is N passes, where N is the maximum tree depth
- (Stolee): what you really need is reverse-topo order on the object
graph
- brian: yes, would be nice if the server sent them in that order.
But the server doesn't know how to do that.
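
A toy model of the pass count discussed above, under loud assumptions: the
object names, the tuple layout, and the serialization in toy_oid are made
up for illustration and are NOT Git's real tree/commit formats or the
index-pack code. It only shows that an object can be translated once
everything it references has been translated, so arrival order decides how
many passes are needed.

```python
import hashlib

ALGOS = ("sha256", "sha1")

def toy_oid(algo, kind, payload, ref_ids):
    # Toy serialization (not Git's real format): payload followed by one
    # line per referenced object ID, hashed with a Git-style header.
    body = payload + b"".join(b"\n" + r.encode("ascii") for r in ref_ids)
    header = f"{kind} {len(body)}\0".encode("ascii")
    return hashlib.new(algo, header + body).hexdigest()

def translate(stream):
    """stream: (name, kind, payload, refs) tuples in arrival order.

    Returns ({name: {algo: oid}}, number_of_passes).
    """
    done = {}
    pending = list(stream)
    passes = 0
    while pending:
        passes += 1
        deferred = []
        for name, kind, payload, refs in pending:
            if all(r in done for r in refs):
                # Every referenced object already has both IDs: hash now.
                done[name] = {
                    algo: toy_oid(algo, kind, payload,
                                  [done[r][algo] for r in refs])
                    for algo in ALGOS
                }
            else:
                # Come back to this object in a later pass.
                deferred.append((name, kind, payload, refs))
        if len(deferred) == len(pending):
            raise ValueError("unresolvable references (missing objects)")
        pending = deferred
    return done, passes

chain = [
    ("commit", "commit", b"initial", ["tree"]),
    ("tree", "tree", b"", ["blob"]),
    ("blob", "blob", b"hello\n", []),
]
ids_worst, passes_worst = translate(chain)                  # dependents first
ids_best, passes_best = translate(list(reversed(chain)))    # dependencies first
print(passes_worst, passes_best)  # 3 1
```

With dependencies arriving first (the reverse-topo order Stolee mentions),
one pass suffices; with dependents first, the number of passes grows with
the depth of the chain. Both orders produce the same IDs.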
- (Emily): so for something like shallow/partial-clone, the server needs
to be able to do SHA-256 for you to compute it yourself?
- brian: there will be a capability, since data needs to come over
the pipe for submodules, and could be extended for shallow and
partial clones as well. Would fit into protocol v2, and will be
essential for submodules, so will have to exist regardless.
- Hopefully server has that information, though how expensive that
will be to compute is highly dependent.
- (jrn): submodules have to be updated, do you have an idea of what that
protocol change will look like?
- brian: fuzzy idea, but nothing concrete yet
- (jrn): this reminds me of the early days of partial clones where we
talked about "promised" objects at the edge and associated metadata
- (Toon): so no interop, but is there a way to do a single step
conversion from SHA-1 to SHA-256?
- brian: yes, you can use fast-export and fast-import. Currently any
signatures references are broken, but in the future would like to
update them (that code exists, but it hasn't been upstreamed)
- doesn't quite work smoothly with submodules, since you have to
rewrite them first, then generate a set of marks, and then export
and import
- verified with git/git, resulting index isn't substantially larger
(basically 32 bytes per object, along with slightly larger commit
and tree objects)
- (demetr): Could be significantly larger if you have a zillion commits
- brian: we'd have other problems before then :-).
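
A minimal sketch of the single-step conversion brian describes, assuming a
repository without submodules and without signed tags or commits. The
helper names conversion_commands and convert are hypothetical; the Git
commands they assemble (git init --object-format=sha256, git fast-export
--all, git fast-import) are existing ones. Nothing is executed until
convert() is called.

```python
import subprocess

def conversion_commands(src: str, dst: str) -> dict:
    """Build the argv lists for a one-shot SHA-1 -> SHA-256 conversion."""
    return {
        # New, empty repository that uses SHA-256 object IDs.
        "init": ["git", "init", "--object-format=sha256", dst],
        # Stream every ref of the SHA-1 repository; objects are referenced
        # by marks in the stream, not by their old hashes.
        "export": ["git", "-C", src, "fast-export", "--all"],
        # Re-create the history in the SHA-256 repository.
        "import": ["git", "-C", dst, "fast-import"],
    }

def convert(src: str, dst: str) -> None:
    """Run init, then pipe fast-export into fast-import."""
    cmds = conversion_commands(src, dst)
    subprocess.run(cmds["init"], check=True)
    exporter = subprocess.Popen(cmds["export"], stdout=subprocess.PIPE)
    importer = subprocess.run(cmds["import"], stdin=exporter.stdout)
    exporter.stdout.close()
    if exporter.wait() != 0 or importer.returncode != 0:
        raise RuntimeError("fast-export | fast-import failed")

# Usage (paths are placeholders):
# convert("/path/to/sha1-repo", "/path/to/sha256-repo")
```

As noted above, submodules need their own conversion and a set of marks
first, and hashes quoted in commit messages are not rewritten by this.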
- (Elijah): common in commit messages to refer back to earlier commits.
Do we want to rewrite those?
- brian: maybe, depends on future plans if/when we deprecate earlier
hash algos
- (jrn): Don't have a good way to retroactively change commit
messages, but we do have git notes. First instinct is to use notes
for this kind of historical reference info
- (Terry): annotated tags?
- (Elijah): filter-repo does this kind of commit message munging
Thread overview:
2022-09-29 19:17 Notes from the Git Contributor's Summit, 2022 Taylor Blau
2022-09-29 19:19 ` [TOPIC 1/8] Bundle URIs Taylor Blau
2022-09-29 19:19 ` Taylor Blau [this message]
2022-09-29 19:20 ` [TOPIC 3/8] Merge ORT timeline Taylor Blau
2022-09-29 19:20 ` [TOPIC 4/8] Commit `--filter`'s Taylor Blau
2022-09-29 19:21 ` [TOPIC 5/8] Server side merges and rebases Taylor Blau
2022-09-29 19:21 ` [TOPIC 6/8] State of sparsity work Taylor Blau
2022-09-29 19:21 ` [TOPIC 7/8] Speeding up the connectivity check Taylor Blau
2022-09-29 19:22 ` [TOPIC 8/8] Using Git securely in shared services Taylor Blau