From: Jeff King <peff@peff.net>
To: Konstantin Ryabitsev <konstantin@linuxfoundation.org>
Cc: git@vger.kernel.org
Subject: Re: Repacking a repository uses up all available disk space
Date: Sun, 12 Jun 2016 17:38:04 -0400
Message-ID: <20160612213804.GA5428@sigill.intra.peff.net>
In-Reply-To: <20160612212514.GA4584@gmail.com>

On Sun, Jun 12, 2016 at 05:25:14PM -0400, Konstantin Ryabitsev wrote:

> Hello:
> 
> I have a problematic repository that:
> 
> - Takes up 9GB on disk
> - Passes 'git fsck --full' with no errors
> - When cloned with --mirror, takes up 38M on the target system

Cloning will only copy the objects that are reachable from the refs. So
presumably the other 8.9GB is either reachable from reflogs, or not
reachable at all (due to rewinding history or deleting branches).
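
One way to check where the space is going is to compare the object counts with the number of objects that no ref can reach. The following is a minimal sketch, not a definitive recipe; the helper names object_summary and count_unreachable are made up for illustration, and it assumes the repository path is passed as the first argument (defaulting to the current directory):

```shell
#!/bin/sh
# Hypothetical helpers (names are illustrative, not part of git).

# Report loose vs. packed object counts and on-disk sizes.
# Usage: object_summary [path-to-repo]
object_summary() {
    git -C "${1:-.}" count-objects -v
}

# Count objects that no ref can reach. --no-reflogs makes fsck ignore
# reflog entries too, so objects kept alive only by a reflog are counted
# as unreachable as well.
# Usage: count_unreachable [path-to-repo]
count_unreachable() {
    git -C "${1:-.}" fsck --unreachable --no-reflogs 2>/dev/null |
        grep -c '^unreachable'
}
```

In the count-objects output, "count" and "size" describe loose objects, while "in-pack" and "size-pack" describe packed ones. A large unreachable count next to a small clone size would be consistent with the explanation above.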

> - When attempting to repack, creates millions of files and eventually
>   eats up all available disk space

That means these objects fall into the unreachable category. Git will
prune unreachable loose objects after a grace period based on the
filesystem mtime of the objects; the default is 2 weeks.

For unreachable packed objects, their mtime is jumbled in with the rest
of the objects in the packfile.  So Git's strategy is to "eject" such
objects from the packfiles into individual loose objects, and let them
"age out" of the grace period individually.

Generally this works just fine, but there are corner cases where you
might have a very large number of such objects, and the loose storage is
much more expensive than the packed (e.g., because each object is stored
individually, not as a delta).

It sounds like this is the case you're running into.

The solution is to lower the grace period time, with something like:

  git gc --prune=5.minutes.ago

or even:

  git gc --prune=now

That will prune the unreachable objects immediately (and the packfile
ejector is smart enough to skip ejecting any object that would just get
deleted immediately anyway).
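
To see the effect, the object counts can be compared before and after the prune. This is a rough sketch under the same assumptions as the commands above; prune_now is a hypothetical wrapper name, and the repository path is taken from the first argument. Note that it deletes unreachable objects for good, so it should only be run on a repository where nothing unreachable is still wanted:

```shell
#!/bin/sh
# Hypothetical wrapper (the name is illustrative, not part of git).
# Prune unreachable objects right away and print the object counts
# before and after, so the reclaimed space is visible.
# Usage: prune_now [path-to-repo]
prune_now() {
    repo="${1:-.}"

    echo "before:"
    git -C "$repo" count-objects -v

    # --prune=now drops the grace period, so unreachable objects are
    # deleted instead of being ejected into loose storage.
    git -C "$repo" gc --quiet --prune=now

    echo "after:"
    git -C "$repo" count-objects -v
}
```

Objects that are still reachable from reflogs are not unreachable from gc's point of view and will survive this. If the shorter grace period should apply to every future gc in the repository, setting the gc.pruneExpire config variable should have the same effect as passing --prune each time.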

-Peff
