git@vger.kernel.org mailing list mirror (one of many)
From: Jeff King <peff@peff.net>
To: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
Cc: Git List <git@vger.kernel.org>
Subject: Re: git gc --auto aquires *.lock files that make a subsequent git-fetch error out
Date: Wed, 12 Jul 2017 16:00:54 -0400
Message-ID: <20170712200054.mxcabiyttijpbkbb@sigill.intra.peff.net>
In-Reply-To: <87bmopzbqx.fsf@gmail.com>

On Wed, Jul 12, 2017 at 09:38:46PM +0200, Ævar Arnfjörð Bjarmason wrote:

> In 131b8fcbfb ("fetch: run gc --auto after fetching", 2013-01-26), first
> released with v1.8.2, Jeff changed git-fetch to run "git gc --auto"
> afterwards.
> 
> This means that if you run two git fetches in a row the second one may
> fail because it can't acquire the *.lock files on the remote branches you
> have & which the next git-fetch needs to update.

Is it really "in a row" that's a problem? The second fetch should not
begin until the first one is done, including until its auto-gc exits.
And even with background gc, we do the ref-locking operations first, due
to 62aad1849 (gc --auto: do not lock refs in the background,
2014-05-25).

> I happen to run into this on a git.git which has a lot of remotes (most
> people on-list whose remotes I know about) and fetch them in parallel:
> 
>     $ git config alias.pfetch
>     !parallel 'git fetch {}' ::: $(git remote)

Ah, so it's not in a row. It's parallel. Then yes, you may run into
problems with the gc locks conflicting with real operations. This isn't
really unique to fetch. Any simultaneous operation can run into problems
(e.g., on a busy server repo you may see conflicts between pack-refs and
regular pushes).

> And so would 'git fetch --all':
> 
>     $ GIT_TRACE=1 git fetch --all 2>&1|grep --line-buffered built-in|grep -v rev-list
>     19:31:26.273577 git.c:328               trace: built-in: git 'fetch' '--all'
>     19:31:26.278869 git.c:328               trace: built-in: git 'fetch' '--append' 'origin'
>     19:31:27.993312 git.c:328               trace: built-in: git 'gc' '--auto'
>     19:31:27.995855 git.c:328               trace: built-in: git 'fetch' '--append' 'avar'
>     19:31:29.656925 git.c:328               trace: built-in: git 'gc' '--auto'
> 
> I think those two cases are bugs (but ones which I don't have the
> inclination to chase myself beyond sending this E-Mail). We should be
> running the 'git gc --auto' at the very end of the entire program, not
> after fetching every single remote.
> 
> Passing some env variable (similar to the config we pass via the env) to
> subprograms to make them avoid "git gc --auto" so the main process can
> do it would probably be the most simple solution.

Yes, I agree that's poor. Ideally there would be a command-line option
to tell the sub-fetches not to run auto-gc. It could be done with:

  git -c gc.auto=0 fetch --append ...

Or we could even take the "--append" as a hint not to run auto-gc.
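
Until something like that exists, the same effect can be approximated
from the outside. A minimal sketch, assuming a wrapper script is
acceptable: the function name fetch_all_then_gc is made up for
illustration, and it fetches the remotes one at a time instead of going
through "fetch --all".

```shell
#!/bin/sh
# Hypothetical wrapper, not part of git: fetch every remote with
# auto-gc disabled, then run a single "git gc --auto" at the very end.
fetch_all_then_gc() {
    for remote in $(git remote); do
        # "-c gc.auto=0" applies to this one invocation only, so the
        # repository's configuration is left untouched.
        git -c gc.auto=0 fetch "$remote" || return 1
    done
    # One gc pass after all fetches are done.
    git gc --auto
}
```

Calling fetch_all_then_gc from inside a repository then runs gc at most
once, no matter how many remotes are configured.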

> The more general case (such as with my parallel invocation) is harder to
> solve.

Yes, I don't think it can be solved. The most general case is two totally
unrelated processes which know nothing about each other.

> Maybe "git gc --auto" should have a heuristic so it checks whether
> there's been recent activity on the repo, and waits until there's been
> say 60 seconds of no activity, or alternatively if it's waited 600
> seconds and hasn't run gc yet.

That sounds complicated.
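
For concreteness, the decision rule in that proposal (run once the repo
has been quiet for 60 seconds, or once gc has been put off for 600
seconds) might look roughly like the sketch below. The function name and
its three timestamp arguments are hypothetical. The hard part, which
this does not attempt, is tracking "recent activity" reliably in the
first place.

```shell
#!/bin/sh
# Hypothetical decision rule; nothing like this exists in git.
# Usage: should_run_gc NOW LAST_ACTIVITY FIRST_DEFERRED  (epoch seconds)
# Exits 0 when gc should run now, 1 when it should keep waiting.
should_run_gc() {
    quiet_for=$(( $1 - $2 ))   # seconds since the last activity on the repo
    waited=$(( $1 - $3 ))      # seconds since gc was first put off
    [ "$quiet_for" -ge 60 ] || [ "$waited" -ge 600 ]
}
```

A caller would poll it, e.g. should_run_gc "$(date +%s)" "$last" "$first",
and only start gc once it succeeds.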

> Ideally a "real" invocation like git-fetch would have a way to simply
> steal any *.lock a background "git gc --auto" creates, aborting the gc
> but allowing the "real" invocation to proceed. But that sounds even
> trickier to implement, and might without an extra heuristic on top
> postpone gc indefinitely.

The locks are generally due to ref-packing and reflog expiration.  I
think in the long run, it would be nice to move to a ref store that
didn't need packing, and that could do reflog expiration more
atomically.

I think the way "reflog expire" is done holds the locks for a lot longer
than is strictly necessary, too (it actually computes reachability for
--expire-unreachable on the fly while holding some locks).

-Peff

