git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Taylor Blau <me@ttaylorr.com>
To: Gregory Szorc <gregory.szorc@gmail.com>
Cc: Taylor Blau <me@ttaylorr.com>, git@vger.kernel.org, stolee@gmail.com
Subject: Re: Race condition between repack and loose-objects maintenance task
Date: Wed, 29 Jun 2022 23:19:09 -0400	[thread overview]
Message-ID: <Yr0WLebMfBXZ1K7D@nand.local> (raw)
In-Reply-To: <CAKQoGanPBec6wRO6uWrETaoJXdszpjRWytXaJwx6jw0mrrj-gQ@mail.gmail.com>

On Wed, Jun 29, 2022 at 05:44:25PM -0700, Gregory Szorc wrote:
> Revising my initial post, not running `loose-objects` is insufficient:
> we also need to prevent `incremental-repack` from running while a `git
> gc` / `git repack` is in progress. If an `incremental-repack` runs
> concurrently with `git repack`, `repack` can error after bitmap
> generation but before the temp packfile is renamed with "could not
> find 'pack-*.pack'." I suspect this has to do with
> `incremental-repack` deleting packs that were initially in place when
> `git repack` started running. But I haven't looked into where exactly
> repack is failing.

Yeah, you would need to prevent other writers from removing packs while
writing a cruft pack.

I think the canonical way to do this would be to let `git
maintenance` use its own locking to ensure that it wasn't trying to
remove packs while simultaneously generating a cruft pack.

But assuming that the cruft pack generation is running independently and
concurrently with the maintenance incremental-repack task (which will
delete packs in the background), then there is definitely a TOCTOU race
there.

The race is that `repack` will signal to `pack-objects` which packs will
stay and which packs will be deleted during the repack. To generate a
cruft pack, the packs that stay are formed by doing a reachability
traversal and writing a pack that contains all reachable objects. That
way anything that is in `all-packs \ reachable` (which form the contents
of the cruft pack) are going to be just the unreachable objects.

But if a pack that was going to be deleted by repack *after* generating
the cruft pack disappears while `repack` is running (particularly after
it starts, but before it generates the cruft pack), then the cruft pack
generation can't proceed, since it has no idea what objects it needs to
add.

Namely, if it is told that pack P is going to be deleted, any objects in
P which don't appear in a pack that *isn't* going to be deleted need to
be saved. If we don't have a copy of P anymore, then there is no way to
come up with what that set of objects is, meaning that it's impossible
to generate the cruft pack.

I should mention, BTW, that this isn't a problem unique to cruft packs.
Geometric repacking, which uses `pack-objects --stdin-packs` has a
similar issue where if a pack being combined is removed from underneath
while `repack` is running, `pack-objects` cannot successfully generate
the pack.

> However, I think there is yet another bug at play: running
> `incremental-repack` appears to be able to repack the cruft packfile.
> In doing so, it deletes its .mtimes file and the metadata inside.

That sounds like a bug to me. I'll take a look into it and see what I
can find.

Thanks,
Taylor

  reply	other threads:[~2022-06-30  3:22 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-06-29 16:55 Race condition between repack and loose-objects maintenance task Gregory Szorc
2022-06-29 17:03 ` Taylor Blau
2022-06-29 17:10   ` Taylor Blau
2022-06-29 17:16   ` Gregory Szorc
2022-06-29 17:21     ` Taylor Blau
2022-06-30  0:44       ` Gregory Szorc
2022-06-30  3:19         ` Taylor Blau [this message]
2022-06-30  3:23           ` Taylor Blau
2022-06-30 20:12             ` Junio C Hamano
2022-07-05 18:43               ` Gregory Szorc
2022-07-06  8:50                 ` Ævar Arnfjörð Bjarmason
2022-07-20  1:40             ` Gregory Szorc
2022-07-20  9:52               ` Ævar Arnfjörð Bjarmason
2022-07-26 16:22                 ` Gregory Szorc
2022-07-26 18:11                   ` Ævar Arnfjörð Bjarmason
2022-07-20  1:41             ` Gregory Szorc

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Yr0WLebMfBXZ1K7D@nand.local \
    --to=me@ttaylorr.com \
    --cc=git@vger.kernel.org \
    --cc=gregory.szorc@gmail.com \
    --cc=stolee@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).