git@vger.kernel.org list mirror (unofficial, one of many)
 help / color / Atom feed
From: Kaushik Srenevasan <kaushik@twitter.com>
To: git@vger.kernel.org
Subject: [RFC] Extending git-replace
Date: Mon, 13 Jan 2020 21:33:19 -0800
Message-ID: <CAN_uzpK1m42J19Xi8oc3Dwmhk9qF1n4hFrBVqD+0RHZuZ_15Jw@mail.gmail.com> (raw)

We’ve been trying to get rid of objects larger than a certain size
from one of our repositories that contains tens of thousands of
branches and hundreds of thousands of commits. While we’re able to
accomplish this using BFG[0] , it results in ~ 90% of the repository’s
history being rewritten. This presents the following problems
1. There are various systems (Phabricator for one) that use the commit
hash as a key in various databases. Rewriting history will require
that we update all of these systems.
2. We’ll have to force everyone to reclone a copy of this repository.

I was looking through the git code base to see if there is a way
around it when I chanced upon `git-replace`. While the basic idea of
`git-replace` is what I am looking for, it doesn’t quite fit the bill
due to the `--no-replace-objects` switch, the `GIT_NO_REPLACE_OBJECTS`
environment variable, and `--no-replace-objects` being the default for
certain git commands. Namely fsck, upload-pack, pack/unpack-objects,
prune and index-pack. That Git may still try to load a replaced object
when a git command is run with the `--no-replace-objects` option
prevents me from removing it from the ODB permanently. Not being able
to run prune and fsck on a repository where we’ve deleted the object
that’s been replaced with git-replace effectively rules this option
out for us.

A feature that allowed such permanent replacement (say a
`git-blacklist` or a `git-replace --blacklist`) might work as follows:
1. Blacklisted objects are stored as references under a new namespace
-- `refs/blacklist`.
2. The object loader unconditionally translates a blacklisted OID into
the OID it’s been replaced with.
3. The `+refs/blacklist/*:refs/blacklist/*` refspec is implicitly
always a part of fetch and push transactions.

This essentially turns the blacklist references namespace into an
additional piece of metadata that gets transmitted to a client when a
repository is cloned and is kept updated automatically.

I’ve been playing around with a prototype I wrote and haven’t observed
any breakage yet. I’m writing to seek advice on this approach and to
understand if this is something (if not in its current form, some
version of it) that has a chance of making it into the product if we
were to implement it. Happy to write up a more detailed design and
share my prototype as a starting point for discussion.

                                           -- Kaushik

[0] https://rtyley.github.io/bfg-repo-cleaner/

             reply index

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-01-14  5:33 Kaushik Srenevasan [this message]
2020-01-14  6:55 ` Elijah Newren
2020-01-14 19:11   ` Jonathan Tan
2020-01-16  3:30   ` Kaushik Srenevasan
2020-01-14 18:19 ` David Turner
2020-01-14 19:03   ` Jonathan Tan
2020-01-14 20:39     ` Elijah Newren
2020-01-14 21:57       ` Jonathan Tan
2020-01-14 22:46         ` Elijah Newren

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAN_uzpK1m42J19Xi8oc3Dwmhk9qF1n4hFrBVqD+0RHZuZ_15Jw@mail.gmail.com \
    --to=kaushik@twitter.com \
    --cc=git@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

git@vger.kernel.org list mirror (unofficial, one of many)

Archives are clonable:
	git clone --mirror https://public-inbox.org/git
	git clone --mirror http://ou63pmih66umazou.onion/git
	git clone --mirror http://czquwvybam4bgbro.onion/git
	git clone --mirror http://hjrcffqmbrq6wope.onion/git

Example config snippet for mirrors

Newsgroups are available over NNTP:
	nntp://news.public-inbox.org/inbox.comp.version-control.git
	nntp://ou63pmih66umazou.onion/inbox.comp.version-control.git
	nntp://czquwvybam4bgbro.onion/inbox.comp.version-control.git
	nntp://hjrcffqmbrq6wope.onion/inbox.comp.version-control.git
	nntp://news.gmane.io/gmane.comp.version-control.git

 note: .onion URLs require Tor: https://www.torproject.org/

AGPL code for this site: git clone https://public-inbox.org/public-inbox.git