git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [RFC] Extending git-replace
@ 2020-01-14  5:33 Kaushik Srenevasan
  2020-01-14  6:55 ` Elijah Newren
  2020-01-14 18:19 ` David Turner
  0 siblings, 2 replies; 9+ messages in thread
From: Kaushik Srenevasan @ 2020-01-14  5:33 UTC (permalink / raw)
  To: git

We’ve been trying to get rid of objects larger than a certain size
from one of our repositories that contains tens of thousands of
branches and hundreds of thousands of commits. While we’re able to
accomplish this using BFG[0] , it results in ~ 90% of the repository’s
history being rewritten. This presents the following problems
1. There are various systems (Phabricator for one) that use the commit
hash as a key in various databases. Rewriting history will require
that we update all of these systems.
2. We’ll have to force everyone to reclone a copy of this repository.

I was looking through the git code base to see if there is a way
around it when I chanced upon `git-replace`. While the basic idea of
`git-replace` is what I am looking for, it doesn’t quite fit the bill
due to the `--no-replace-objects` switch, the `GIT_NO_REPLACE_OBJECTS`
environment variable, and `--no-replace-objects` being the default for
certain git commands. Namely fsck, upload-pack, pack/unpack-objects,
prune and index-pack. That Git may still try to load a replaced object
when a git command is run with the `--no-replace-objects` option
prevents me from removing it from the ODB permanently. Not being able
to run prune and fsck on a repository where we’ve deleted the object
that’s been replaced with git-replace effectively rules this option
out for us.

A feature that allowed such permanent replacement (say a
`git-blacklist` or a `git-replace --blacklist`) might work as follows:
1. Blacklisted objects are stored as references under a new namespace
-- `refs/blacklist`.
2. The object loader unconditionally translates a blacklisted OID into
the OID it’s been replaced with.
3. The `+refs/blacklist/*:refs/blacklist/*` refspec is implicitly
always a part of fetch and push transactions.

This essentially turns the blacklist references namespace into an
additional piece of metadata that gets transmitted to a client when a
repository is cloned and is kept updated automatically.

I’ve been playing around with a prototype I wrote and haven’t observed
any breakage yet. I’m writing to seek advice on this approach and to
understand if this is something (if not in its current form, some
version of it) that has a chance of making it into the product if we
were to implement it. Happy to write up a more detailed design and
share my prototype as a starting point for discussion.

                                           -- Kaushik

[0] https://rtyley.github.io/bfg-repo-cleaner/

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2020-01-16  3:30 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-01-14  5:33 [RFC] Extending git-replace Kaushik Srenevasan
2020-01-14  6:55 ` Elijah Newren
2020-01-14 19:11   ` Jonathan Tan
2020-01-16  3:30   ` Kaushik Srenevasan
2020-01-14 18:19 ` David Turner
2020-01-14 19:03   ` Jonathan Tan
2020-01-14 20:39     ` Elijah Newren
2020-01-14 21:57       ` Jonathan Tan
2020-01-14 22:46         ` Elijah Newren

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).