git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Derrick Stolee <stolee@gmail.com>
To: Andreas Krey <a.krey@gmx.de>,
	git@vger.kernel.org, "peff@peff.net" <peff@peff.net>
Subject: Re: Avoid race condition between fetch and repack/gc?
Date: Mon, 16 Mar 2020 08:10:56 -0400	[thread overview]
Message-ID: <759f4b3b-28a7-c002-ae51-5991bf9ad211@gmail.com> (raw)
In-Reply-To: <20200316082348.GA26581@inner.h.apk.li>

On 3/16/2020 4:23 AM, Andreas Krey wrote:
> Hi all,
> 
> we occasionally seeing things like this:
> 
> | DEBUG: 11:25:20: git -c advice.fetchShowForcedUpdates=false fetch --no-show-forced-updates -q --prune

I'm happy to see these options. I hope they are helping you!

> | Warning: Permanently added '[socgit.$company.com]:7999' (RSA) to the list of known hosts.
> | remote: fatal: packfile ./objects/pack/pack-20256f2be3bd51b57e519a9f2a4d3df09f231952.pack cannot be accessed        
This _could_ mean a lot of things, but....

> | error: git upload-pack: git-pack-objects died with error.
> | fatal: git upload-pack: aborting due to possible repository corruption on the remote side.
> | remote: aborting due to possible repository corruption on the remote side.
> | fatal: protocol error: bad pack header
> 
> and when you look in the server repository there is a new packfile dated just around
> that time. It looks like the fetch tries to access a packfile that it assumes to exist,
> but the GC on the server throws it away just in that moment, and thus upload-pack fails.

...your intuition about repacking seems accurate. The important part of the
race condition is likely that the server process read and holds a read handle
on the .idx file, but when looking for the object contents it tries to open
the .pack file which was deleted.

This error is emitted by use_pack() in packfile.c. I'm surprised there is no
fallback here, and we simply die().

The race condition seems to be related to the loop in do_oid_object_info_extended()
in sha1-file.c looping through packs until finding the object in question: it does
not verify that the .pack file is open with a valid handle before terminating the
loop and calling packed_object_info().

Perhaps the fix is to update do_oid_object_info_extended() to "accept" a pack as
the home of the object only after verifying the pack is either open or can be
opened. That seems like the least-invasive fix to me.

The more-invasive fix is to modify the stack from packed_object_info() to
use_pack() to use error messages and return codes instead of die(). This would
still need to affect the loop in do_oid_object_info_extended(), but may be a
better way to handle this situation in general.

Of course, this is a very critical code path, and maybe other community members
have more context as to why we are not already doing this?

> Is there a way to avoid this?
> 
> Should there be, like git repack waiting a bit before deleting old packfiles?

This all depends on how you are managing your server. It is likely that you
could create your own maintenance that handles this for you.

The "git multi-pack-index (expire|repack)" cycle is built to prevent this sort
of issue, but is not yet integrated well with reachability bitmaps. You likely
require the bitmaps to keep your server performance, so that may not be a way
forward for you.

Thanks,
-Stolee

  reply	other threads:[~2020-03-16 12:11 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-03-16  8:23 Avoid race condition between fetch and repack/gc? Andreas Krey
2020-03-16 12:10 ` Derrick Stolee [this message]
2020-03-16 17:17   ` Nasser Grainawi
2020-03-16 17:27   ` Jeff King
2020-03-16 23:40     ` Bryan Turner
2020-03-17 18:41       ` Jeff King

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=759f4b3b-28a7-c002-ae51-5991bf9ad211@gmail.com \
    --to=stolee@gmail.com \
    --cc=a.krey@gmx.de \
    --cc=git@vger.kernel.org \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).