git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Taylor Blau <me@ttaylorr.com>
To: Junio C Hamano <gitster@pobox.com>
Cc: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>,
	haoyurenzhuxia@gmail.com, git@vger.kernel.org,
	derrickstolee@github.com, dyroneteng@gmail.com
Subject: Re: [RFC PATCH] midx.c: clean up .rev file
Date: Wed, 22 Jun 2022 14:13:47 -0400	[thread overview]
Message-ID: <YrNb2x2/7Z31XnFJ@nand.local> (raw)
In-Reply-To: <xmqqo7yk60vf.fsf@gitster.g>

On Wed, Jun 22, 2022 at 10:53:24AM -0700, Junio C Hamano wrote:
> Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:
>
> > On Wed, Jun 22 2022, haoyurenzhuxia@gmail.com wrote:
> >
> >> From: Xia XiaoWen <haoyurenzhuxia@gmail.com>
> >>
> >> The command: `git multi-pack-index write --bitmap` will create 3
> >> files in `objects/pack/`:
> >>     * multi-pack-index
> >>     * multi-pack-index-*.bitmap
> >>     * multi-pack-index-*.rev
> >>
> >> But if the command is terminated by the user (such as Ctl-C) or
> >> the system, the midx reverse index file (`multi-pack-index-*.rev`)
> >> is not removed and still exists in `objects/pack/`:
> >>
> >>     $ GIT_TEST_MIDX_WRITE_REV=1 git multi-pack-index write --bitmap
> >>     Selecting bitmap commits: 133020, done.
> >>     Building bitmaps:   0% (3/331)
> >>     ^C^C
> >>
> >>     $ tree objects/pack/
> >>     objects/pack/
> >>     ├── multi-pack-index-3b048d1b965842cd866e10b6ec1a3035dbede0a5.rev
> >>     ├── pack-b7d425f1b01727d5f364f5d9fbab2d1900fcd5c0.idx
> >>     └── pack-b7d425f1b01727d5f364f5d9fbab2d1900fcd5c0.pack
> >> ...
> > Also, the commit message doesn't really say *why*, i.e. in cmd_repack()
> > we've suffered from this already, but don't we have "git gc" cleaning
> > these up? Maybe not (I didn't check), but maybe that was the previous
> > assumption...
>
> Exactly my thought.  Well said.

`repack` relies on `git multi-pack-index write --bitmap` to do the
actual work here. A few things worth noting:

  - the MIDX file itself is written using a lock_file, so it is
    atomically moved into place, and the temporary file is either
    removed, or cleaned up automatically with a sigchain handler on
    process death

  - the bitmap (written in bitmap_writer_finish(), which is the path for
    both single- and multi-pack bitmaps) is written to a temporary file
    and moved into place after the bitmaps are written.

    ...but this temporary file isn't automatically cleaned up, so it
    could stick around after process death. Luckily the race window here
    is pretty small, since all of the bitmaps have been computed already
    and are held in memory.

    This is probably worth a cleanup on its own, too.

  - unless GIT_TEST_MIDX_WRITE_REV=1 is in your environment, we won't
    *write* a .rev file, hence this is pretty rare to deal with in
    practice.

So I think there are two things worth doing here:

  - make sure that the temporary file used to stage the .bitmap is a
    lock_file
  - use a temporary file to stage the .rev file (when forced to write
    one), and ensure that that too is a lock_file

This is mostly the same as Ævar's original suggestion, just with a
larger scope. But I agree that it's the right direction nonetheless.

> The _only_ case that might matter is if the next "write --bitmap" gets
> confused ir there is a leftover file(s) from a previous run, but then
> such a bug should be fixed on the reading side, I would think.

It shouldn't in practice. We'll recognize that the .rev file is garbage
if it's checksum doesn't match the MIDX's, and we squashed the bug where
changing the object *order* (but not the objects themselves) doesn't
change the MIDX checksum (it does now).

We won't write over an existing .rev file with the right name, but we'll
always prefer to read the .rev data from the MIDX itself, if it exists.

Thanks,
Taylor

  reply	other threads:[~2022-06-22 18:14 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-06-22 11:50 [RFC PATCH] midx.c: clean up .rev file haoyurenzhuxia
2022-06-22 15:56 ` Ævar Arnfjörð Bjarmason
2022-06-22 17:53   ` Junio C Hamano
2022-06-22 18:13     ` Taylor Blau [this message]
2022-06-22 19:58       ` Junio C Hamano
2022-06-22 21:31         ` Taylor Blau
2022-06-27  5:05           ` Xiaowen Xia
2022-06-23 12:38       ` Teng Long
2022-06-27  3:53   ` Xiaowen Xia

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YrNb2x2/7Z31XnFJ@nand.local \
    --to=me@ttaylorr.com \
    --cc=avarab@gmail.com \
    --cc=derrickstolee@github.com \
    --cc=dyroneteng@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=haoyurenzhuxia@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).