git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Derrick Stolee <derrickstolee@github.com>
To: Taylor Blau <me@ttaylorr.com>, git@vger.kernel.org
Cc: gitster@pobox.com, gregory.szorc@gmail.com
Subject: Re: [PATCH 0/7] midx: ignore cruft pack with `repack`, `expire`
Date: Thu, 22 Sep 2022 10:14:26 -0400	[thread overview]
Message-ID: <63a1c3d4-eff3-af10-4263-058c88e74594@github.com> (raw)
In-Reply-To: <cover.1663638929.git.me@ttaylorr.com>

On 9/19/2022 9:55 PM, Taylor Blau wrote:
> This series fixes a pair of bugs that were originally pointed out by
> Gregory Szorc in [1].
> 
> Namely, that both `git multi-pack-index repack` and `git
> multi-pack-index expire` can cause us to "absorb" the cruft pack,
> distributing its objects into a new pack, and removing its metadata.
> 
> This is worth avoiding, since even though it doesn't result in object
> corruption, this bug removes semi-important metadata contained in the
> .mtimes file, which controls how fast objects leave the repository
> during a pruning GC.
> 
> This series teaches both sub-commands to avoid any cruft pack(s),
> preserving their metadata.

I gave these patches a careful review. They all look correct and
are closing possible gaps in this mixed mode of repacking with
different strategies.

I do think the 'expire' situation is very rare, but it's best to
be safe here.

Overall, the end result is that users could set up their background
maintenance as follows:

 1. maintenance.incremental-repack.schedule=daily to do an
    incremental repack every night.

 2. maintenance.gc.schedule=weekly to do a GC once a week.

 3. With gc.cruftPacks=true and a gc.pruneExpire value, the cruft
    packs will be written on that weekly job, only expiring old
    enough objects.

Before this series, that weekly GC creating cruft packs ran the
(very likely) risk of those cruft packs being rewritten with the
rest of the new object data, losing the .mtimes data.

(While the proposed schedule above is now possible, I don't
recommend it as a default. Not only will it delete objects
automatically, but the GC task is very expensive for very large
repositories and the incremental-repack task was explicitly
created to avoid that huge cost in those cases.)

Thanks for this update!
-Stolee

      parent reply	other threads:[~2022-09-22 14:15 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-09-20  1:55 [PATCH 0/7] midx: ignore cruft pack with `repack`, `expire` Taylor Blau
2022-09-20  1:55 ` [PATCH 1/7] Documentation/git-multi-pack-index.txt: fix typo Taylor Blau
2022-09-20  1:55 ` [PATCH 2/7] Documentation/git-multi-pack-index.txt: clarify expire behavior Taylor Blau
2022-09-20  1:55 ` [PATCH 3/7] midx.c: prevent `expire` from removing the cruft pack Taylor Blau
2022-09-22 14:08   ` Derrick Stolee
2022-09-20  1:55 ` [PATCH 4/7] midx.c: avoid cruft packs with `repack --batch-size=0` Taylor Blau
2022-09-20  1:55 ` [PATCH 5/7] midx.c: replace `xcalloc()` with `CALLOC_ARRAY()` Taylor Blau
2022-09-20  1:55 ` [PATCH 6/7] midx.c: remove unnecessary loop condition Taylor Blau
2022-09-20  1:55 ` [PATCH 7/7] midx.c: avoid cruft packs with non-zero `repack --batch-size` Taylor Blau
2022-09-20 14:39 ` [PATCH 0/7] midx: ignore cruft pack with `repack`, `expire` Christian Couder
2022-09-22 14:14 ` Derrick Stolee [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=63a1c3d4-eff3-af10-4263-058c88e74594@github.com \
    --to=derrickstolee@github.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=gregory.szorc@gmail.com \
    --cc=me@ttaylorr.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).