git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Derrick Stolee <derrickstolee@github.com>
To: Jeff King <peff@peff.net>, Taylor Blau <me@ttaylorr.com>
Cc: git@vger.kernel.org, Junio C Hamano <gitster@pobox.com>,
	Victoria Dye <vdye@github.com>
Subject: Re: [PATCH] midx.c: clear auxiliary MIDX files first
Date: Wed, 26 Oct 2022 09:31:28 -0400	[thread overview]
Message-ID: <143a588a-c98b-733b-2b23-34a87ca89431@github.com> (raw)
In-Reply-To: <Y1jIo1dPl0M2TSHo@coredump.intra.peff.net>

On 10/26/22 1:41 AM, Jeff King wrote:
> On Tue, Oct 25, 2022 at 02:25:20PM -0400, Taylor Blau wrote:
> 
>> Since they were added in c528e17966 (pack-bitmap: write multi-pack
>> bitmaps, 2021-08-31), the routine to remove MIDXs removed the
>> multi-pack-index file itself before removing its associated .bitmap and
>> .rev file(s), if any.
>>
>> This creates a window where a MIDX's .bitmap file exists without its
>> corresponding MIDX. If a reader tries to load a MIDX bitmap during that
>> time, they will get a warning, and the MIDX bitmap code will gracefully
>> degrade.
>>
>> Remove this window entirely by removing the MIDX last, and removing its
>> auxiliary files first.
> 
> We remove that window, but don't we create a new one where a reader may
> see the midx but not the bitmap? That won't generate a warning (it just
> looks like a midx that never had a bitmap generated), but it will cause
> the reader to follow the slow, non-bitmap path.

Yes, this is the worrisome direction. The midx is read first, then that
points to the .bitmap file (based on its trailing hash). If the midx isn't
there, then the .bitmap will not be read.

> Ideally this would just be atomic, but short of stuffing the metadata
> into the same file, we can't do that. But the replacement of the midx
> file itself is atomic, and I'd think everything would (or should at
> least) follow from there.

The interesting case here is that this is in clear_midx_file(), which
is called when repacking to a single pack and no longer needing a midx
file. So it's not using the atomic rewrite from the midx writing code,
but instead the "atomic" deletion.

In this case, a reader will check for the midx first, before looking
for individual packs. Further, the new pack is written, but the old
packs have not been deleted (or the midx would be invalid). So the
new code introduces the window where a midx exists without a bitmap,
so some readers will act as if no bitmap exists on-disk.

This was always possible before, too: the midx could be read by a
reader process before the repack process deletes that file. However,
if the reader does not also gain a handle on the corresponding
.bitmap before the repack process deletes that file, too, then the
reader is also left thinking that no .bitmap exists.

I think the old code is more correct, here. The window is slightly
smaller, and the new code creates a window where the filesystem
doesn't need to change for readers to get an imperfect view of
things.

Aside: in these cases where a .bitmap file is not found for a midx,
do we fall back to trying to find a .bitmap file for a pack-file?
That would assuage most of the concerns here about what happens in
this window where the repack has a new .pack/.bitmap pair but the
old midx still exists (without a .bitmap, depending on timing).

Thanks,
-Stolee


  reply	other threads:[~2022-10-26 13:31 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-10-25 18:25 [PATCH] midx.c: clear auxiliary MIDX files first Taylor Blau
2022-10-26  5:41 ` Jeff King
2022-10-26 13:31   ` Derrick Stolee [this message]
2022-10-26 19:59     ` Taylor Blau
2022-10-27 20:28     ` Jeff King

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=143a588a-c98b-733b-2b23-34a87ca89431@github.com \
    --to=derrickstolee@github.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=me@ttaylorr.com \
    --cc=peff@peff.net \
    --cc=vdye@github.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).