git@vger.kernel.org mailing list mirror (one of many)
From: Jeff King <peff@peff.net>
To: Daniel Klauer <daniel.c.klauer@web.de>
Cc: Jiang Xin <zhiyou.jx@alibaba-inc.com>, git@vger.kernel.org
Subject: Re: bug report: "git pack-redundant --all" crash in minimize()
Date: Wed, 16 Dec 2020 14:00:32 -0500
Message-ID: <X9pZUELTD9i5TrvQ@coredump.intra.peff.net>
In-Reply-To: <a46660c3-630c-5573-9ef4-75d273d37767@web.de>

On Wed, Dec 16, 2020 at 02:22:52PM +0100, Daniel Klauer wrote:

> Background: bitbake downloads git repositories during a build process
> and supports caching them locally (in the form of bare repos in some
> user-defined directory). This avoids having to re-download them during
> the next build, and it is also a convenient mirroring/backup system in
> case the original URLs stop working.
> 
> As far as I can tell (since I'm not a bitbake developer) the git
> pack-redundant invocation is one of multiple calls meant to improve
> storage (probably minimize disk usage) of the locally cached git repos.
> For reference, please take a look at the other git commands it's
> invoking [1], and at the commit messages of the commits that added these
> invocations [2] [3] [4].
> 
> If doing it that way seems wrong, I'll report the issue to bitbake
> upstream too. Maybe there is a better way to do whatever bitbake wants
> to do here?

Thanks for that context.

I don't think it's _wrong_, in the sense that what they want to do
(remove redundant packs) is a reasonable thing to want. But in practice
I suspect that it rarely helps. It only makes sense if a pack is fully
made redundant by other packs. But that is unlikely to happen after a
fetch, because Git tries not to send objects that already exist. So
while there could be overlap, it's unlikely that full packs are
candidates for deletion. And if any are, then that is probably a sign
that fetch is not being given enough information (e.g., if there are
packs being copied into the repo behind the scenes, make sure that there
are matching refs pointing to their objects, so Git knows it has that
part of the object graph).

For saving space, "git repack -ad" is a much better option. It puts
everything reachable into a single pack, which means:

  - if two packs contain duplicates of an object, we'll end up with only
    a single copy, even if those packs also contained some unique
    objects

  - by putting all objects in the same pack, we have more opportunities
    for delta compression between similar objects

  - we'll drop any unreachable objects completely (presumably this is
    desirable here, but it may not be if they're trying to keep objects
    that don't have refs pointing at them as part of some caching
    scheme; passing "-k" will keep the unreachable objects, too)
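
As a rough illustration of the points above, here is a sketch that builds
a throwaway repository with two packs and then consolidates them with
"git repack -a -d". The temporary directory, file names, and committer
identity are assumptions made up for the demo, not anything taken from
bitbake's setup.

```shell
#!/bin/sh
# Sketch only: scratch repository, hypothetical file names and identity.
set -e

repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
git config user.name "Example"
git config user.email "example@example.invalid"

# Two rounds of commit + incremental repack leave two separate packs.
echo one > a.txt
git add a.txt
git commit -q -m one
git repack -q            # packs the currently loose objects

echo two > b.txt
git add b.txt
git commit -q -m two
git repack -q            # packs only the objects not yet in a pack

packs_before=$(ls .git/objects/pack/*.pack | wc -l | tr -d ' ')

# -a: put everything reachable into one new pack
# -d: delete the packs (and loose copies) made redundant by it
git repack -a -d -q

packs_after=$(ls .git/objects/pack/*.pack | wc -l | tr -d ' ')
echo "packs before: $packs_before, after: $packs_after"
```

For a bare cache repository the same command would presumably be run as
something like "git --git-dir=/path/to/cache.git repack -a -d", where the
path is a placeholder.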

Since they're doing other maintenance like "pack-refs", running
"git gc" may be preferable, as it would cover that, too. Use
"--prune=now" to drop the unreachable objects immediately (as opposed to
giving them a 2-week grace period). Note that git-gc has no equivalent
of repack's "-k", so if they need that, they'll have to invoke
git-repack directly.
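
To make the difference concrete, here is a small sketch in a scratch
repository. It writes a blob that nothing refers to, then checks whether
the blob survives "git repack -a -d -k" and whether it survives
"git gc --prune=now". The repository, file name, identity, and blob
contents are all invented for the demo.

```shell
#!/bin/sh
# Sketch only: scratch repository, hypothetical file names and identity.
set -e

repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
git config user.name "Example"
git config user.email "example@example.invalid"

echo tracked > a.txt
git add a.txt
git commit -q -m initial

# Store a blob that no ref, index entry, or reflog points at.
blob=$(echo "cached but unreferenced" | git hash-object -w --stdin)

# repack with -k keeps unreachable objects in the new pack.
git repack -a -d -k -q
if git cat-file -e "$blob" 2>/dev/null; then
    kept_by_repack_k=yes
else
    kept_by_repack_k=no
fi

# gc with --prune=now drops unreachable objects right away.
git gc --prune=now --quiet
if git cat-file -e "$blob" 2>/dev/null; then
    kept_by_gc=yes
else
    kept_by_gc=no
fi

echo "after repack -a -d -k: $kept_by_repack_k"
echo "after gc --prune=now:  $kept_by_gc"
```

If the cache depends on objects like that blob, the gc step is the one to
avoid.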

-Peff
