git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Jeff King <peff@peff.net>
To: Duy Nguyen <pclouds@gmail.com>
Cc: Konstantin Ryabitsev <konstantin@linuxfoundation.org>,
	Git Mailing List <git@vger.kernel.org>
Subject: Re: Repacking a repository uses up all available disk space
Date: Mon, 13 Jun 2016 00:58:31 -0400	[thread overview]
Message-ID: <20160613045831.GA3950@sigill.intra.peff.net> (raw)
In-Reply-To: <CACsJy8Awd2oCm0puh=bnKu9snOZr85+kVRe0D5DUhP6NhmiwcQ@mail.gmail.com>

On Mon, Jun 13, 2016 at 07:24:51AM +0700, Duy Nguyen wrote:

> >> - git fsck --full
> >> - git repack -Adl -b --pack-kept-objects
> >> - git pack-refs --all
> >> - git prune
> >>
> >> The reason it's split into repack + prune instead of just gc is because
> >> we use alternates to save on disk space and try not to prune repos that
> >> are used as alternates by other repos in order to avoid potential
> >> corruption.
> 
> Isn't this what extensions.preciousObjects is for? It looks like prune
> just refuses to run in precious objects mode though, and repack is
> skipped by gc, but if that repack command works, maybe we should do
> something like that in git-gc?

Sort of. preciousObjects is a fail-safe so that you do not ever
accidentally run an object-deleting operation where you shouldn't (e.g.,
in the shared repository used by others as an alternate). So the
important step there is that before running "repack", you would want to
make sure you have taken into account the reachability of anybody
sharing from you.

So you could do something like (in your shared repository):

  git config core.repositoryFormatVersion 1
  git config extension.preciousObjects true

  # this will fail, because it's dangerous!
  git gc

  # but we can do it safely if we take into account the other repos
  for repo in $(somehow_get_list_of_shared_repos); do
	git fetch $repo +refs/*:refs/shared/$repo/*
  done
  git config extension.preciousObjects false
  git gc
  git config extension.preciousObjects true

So it really is orthogonal to running the various gc commands yourself;
it's just here to prevent you shooting yourself in the foot.

It may still be useful in such a case to split up the commands in your
own script, though. In my case, you'll note that the commands above are
racy (what happens if somebody pushes a reference to a shared object
between your fetch and the gc invocation?). So we use a custom "repack
-k" to get around that (it just keeps everything).

You _could_ have gc automatically switch to "-k" in a preciousObjects
repository. That's at least safe. But note that it doesn't really solve
all of the problems (you do still want to have ref tips from the leaf
repositories, because it affects things like bitmaps, and packing
order).

> BTW Jeff, I think we need more documentation for
> extensions.preciousObjects. It's only documented in technical/ which
> is practically invisible to all users. Maybe
> include::repository-version.txt in config.txt, or somewhere close to
> alternates?

I'm a little hesitant to document it for end users because it's still
pretty experimental. In fact, even we are not using it at GitHub
currently. We don't have a big problem with "oops, I accidentally ran
something destructive in the shared repository", because nothing except
the maintenance script ever even goes into the shared repository.

The reason I introduced it in the first place is that I was
experimenting with the idea of actually symlinking "objects/" in the
leaf repos into the shared repository. That eliminates the object
writing in the "fetch" step above, which can be a bottleneck in some
cases (not just the I/O, but the shared repo ends up having a _lot_ of
refs, and fetch can be pretty slow).

But in that case, anything that deletes an object in one of the leaf
repos is very dangerous, as it has no idea that its object store is
shared with other leaf repos. So I really wanted a fail safe so that
running "git gc" wasn't catastrophic.

I still think that's a viable approach, but my experiments got
side-tracked and I never produced anything worth looking at. So until
there's something end users can actually make use of, I'm hesitant to
push that stuff into the regular-user documentation. Anybody who is
playing with it at this point probably _should_ be familiar with what's
in Documentation/technical.

-Peff

  reply	other threads:[~2016-06-13  4:58 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-06-12 21:25 Repacking a repository uses up all available disk space Konstantin Ryabitsev
2016-06-12 21:38 ` Jeff King
2016-06-12 21:54   ` Konstantin Ryabitsev
2016-06-12 22:13     ` Jeff King
2016-06-13  0:24       ` Duy Nguyen
2016-06-13  4:58         ` Jeff King [this message]
2016-06-13  1:43       ` Nasser Grainawi
2016-06-13  4:33         ` [PATCH 0/3] repack --keep-unreachable Jeff King
2016-06-13  4:33           ` [PATCH 1/3] repack: document --unpack-unreachable option Jeff King
2016-06-13  4:36           ` [PATCH 2/3] repack: add --keep-unreachable option Jeff King
2016-06-13  4:38           ` [PATCH 3/3] repack: extend --keep-unreachable to loose objects Jeff King

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20160613045831.GA3950@sigill.intra.peff.net \
    --to=peff@peff.net \
    --cc=git@vger.kernel.org \
    --cc=konstantin@linuxfoundation.org \
    --cc=pclouds@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).