Re: [RFC/PATCH] gc: run more pre-detach operations under lock

git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed

From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: Duy Nguyen <pclouds@gmail.com>
Cc: Jeff King <peff@peff.net>, Git Mailing List <git@vger.kernel.org>,
	Junio C Hamano <gitster@pobox.com>,
	Michael Haggerty <mhagger@alum.mit.edu>
Subject: Re: [RFC/PATCH] gc: run more pre-detach operations under lock
Date: Thu, 20 Jun 2019 23:49:02 +0200	[thread overview]
Message-ID: <87ef3o7ws1.fsf@evledraar.gmail.com> (raw)
In-Reply-To: <CACsJy8AjXXOpcKrSV4z6kEM=eyFDWSyf==tJZzvDyEN591XdGw@mail.gmail.com>

On Thu, Jun 20 2019, Duy Nguyen wrote:

> On Thu, Jun 20, 2019 at 5:49 AM Ævar Arnfjörð Bjarmason
> <avarab@gmail.com> wrote:
>>
>>
>> On Wed, Jun 19 2019, Jeff King wrote:
>>
>> > On Wed, Jun 19, 2019 at 08:01:55PM +0200, Ævar Arnfjörð Bjarmason wrote:
>> >
>> >> > You could sort of avoid the problem here too with
>> >> >
>> >> > parallel 'git fetch --no-auto-gc {}' ::: $(git remote)
>> >> > git gc --auto
>> >> >
>> >> > It's definitely simpler, but of course we have to manually add
>> >> > --no-auto-gc in everywhere we need, so not quite as elegant.
>> >> >
>> >> > Actually you could already do that with 'git -c gc.auto=false fetch', I guess.
>> >>
>> >> The point of the 'parallel' example is to show disconnected git
>> >> commands, think trying to run 'git' in a terminal while your editor
>> >> asynchronously runs a polling 'fetch', or a server with multiple
>> >> concurrent clients running 'gc --auto'.
>> >>
>> >> That's the question my RFC patch raises. As far as I can tell the
>> >> approach in your patch is only needed because our locking for gc is
>> >> buggy, rather than introduce the caveat that an fetch(N) operation won't
>> >> do "gc" until it's finished (we may have hundreds, thousands of remotes,
>> >> I use that for some more obscure use-cases) shouldn't we just fix the
>> >> locking?
>> >
>> > I think there may be room for both approaches. Yours fixes the repeated
>> > message in the more general case, but Duy's suggestion is the most
>> > efficient thing.
>> >
>> > I agree that the "thousands of remotes" case means we might want to gc
>> > in the interim. But we probably ought to do that deterministically
>> > rather than hoping that the pattern of lock contention makes sense.
>>
>> We do it deterministically, when gc.auto thresholds et al are exceeded
>> we kick one off without waiting for other stuff, if we can get the lock.
>>
>> I don't think this desire to just wait a bit until all the fetches are
>> complete makes sense as a special-case.
>>
>> If, as you noted in <20190619190845.GD28145@sigill.intra.peff.net>, the
>> desire is to reduce GC CPU use then you're better off just tweaking the
>> limits upwards. Then you get that with everything, like when you run
>> "commit" in a for-loop, not just this one special case of "fetch".
>>
>> We have existing potentially long-running operations like "fetch",
>> "rebase" and "git svn fetch" that run "gc --auto" for their incremental
>> steps, and that's a feature.
>
> gc --auto is added at arbitrary points to help garbage collection. I
> don't think it's ever intended to "do gc at this and that exact
> moment", just "hey this command has taken a lot of time already (i.e.
> no instant response needed) and it may have added a bit more garbage,
> let's just check real quick".

I don't mean we can't ever change the algorithm, but that we've
documented:

    When common porcelain operations that create objects are run, they
    will check whether the repository has grown substantially since the
    last maintenance[...]

The "fetch" command is a common porcelain operation, when it fetches
from N remotes it just runs an invocation of itself, so thus far it's
both worked & been intuitive that if we needed (potentially multiple)
gc's while doing that we'd just go ahead and run it then, even if
something concurrent was happening.

No that's not optimal in many cases, but at least doesn't create caveats
we don't have now where we have runaway object growth.

>> It keeps "gc --auto" dumb enough to avoid a pathological case where
>> we'll have a ballooning objects dir because we figure we can run
>> something "at the end", when "the end" could be hours away, and we're
>> adding a new pack or hundreds of loose objects every second.
>
> Are we optimizing for a rare (large scale) case? Such setup requires
> tuning regardless to me.

At least for me it doesn't require custom tuning before this patch of
yours.

I.e. now "gc --auto" is dumb enough that you can run it on everything
from stuff that just does "commit" from cron, user's laptops, massive
rebases that take forever, and e.g. "stats" like jobs where for
<reasons> I'll add thousands of repos and "fetch --all" them (so e.g. I
can run "log --author=<x> --all").

Yeah of course I'm an advanced user and I can just grumble and manually
invoke fetch, actually I'll probably submit a follow-up patch to add a
gc.* config to disable this thing.

But I think even if the use-case is rather obscure it's important that
if at all possible we keep "gc" elastic enough to work for pretty much
all combinations of object-adding porcelain commands, and I think in
this case we're better off doing things differently...

>> So I don't think Duy's patch is a good way to go.
>
> This reminds me of being perfect is the enemy of the good. A normal
> user has a couple remotes at most, finishing fast (enough) and in such
> case it's a good idea to wait until everything is in before running
> gc.

Even a user with two remotes will run into issues with your patch where
"gc" will print things twice, or outright error due to concurrent access
by another process, which as has been discussed here on-list is *very*
common e.g. with editor integration.

So the extent of my complains about this is:

 1) The edge case with runaway object growth (obscure)

 2) It doesn't really fix the bug except for the narrow case of users
    who invoke things one-terminal-at-a-time and don't e.g. have an
    editor with "git" integration & a terminal (not obscure).

Maybe I should have led with #2 :)

Anyway, I'm not *just* complaining. I have patches too, but so far I'm
up to 8 patches on what's probably just the first one-third of
it. *Sigh*.

But the 2/3 of that if you want to dig through my crappy WIP code on
GitHub is instrumenting the test suite to demonstrate that with an
approach like what your patch does we still get these GC race
conditions, because our locking still sucks, which brings me to...

> Of course making git-gc more robust wrt. parallel access is great, but
> it's hard work. Dealing with locks is always tricky, especially when
> new locks can come up any time.

I've poked at it a bit now, and it's really not hard, I think it's just
that nobody looked at it hard enough before.

The issue is that currently we do:

    1. parent: do_we_need_gc();
    2. parent: say_way_will_gc();
    3. parent: lock();
    4. parent: do_a_bit_of_work();
    5. parent: unlock();
    6. parent: fork();
    7. child: lock();
    8. child: do_a_lot_of_work();
    9. child: unlock();

My RFC patch currently changes that to:

    1. parent: do_we_need_gc();
    2. parent: lock_or_silently_exit();
    3. parent: say_way_will_gc();
    4. parent: do_a_bit_of_work();
    5. parent: unlock();
    6. parent: fork();
    7. child: lock();
    8. child: do_a_lot_of_work();
    9. child: unlock();

I.e. we won't duplicate the message, but *do* introduce the caveat that
it's in principle possible nobody gc's, but in practice when we fail to
get the lock in lock_or_silently_exit() it's because we lost the race to
a sister process that's going to do the actual GC, so all is well.

But we are left with the brief race when we fork. My RFC patch proposed
some elaborate PID hand-over dance to deal with this. But having looked
at it again I think we can easily get rid of that race too with just:

    1. parent: do_we_need_gc();
    2. parent: lock_or_silently_exit();
    3. parent: say_way_will_gc();
    4. parent: do_a_bit_of_work();
    5. parent: fork();
    6. parent: <writes child pid to gc.lock>
    7. parent: <exits without unlocking gc.lock>
    8. child: do_a_lot_of_work();
    9. child: unlock();

Which can fail in cases where the child segfaults, or manages to exit
earlier than the parent etc, hence my earlier proposed elaborate pid
hand-over dance.

But looking at it again we only usurp an existing gc.lock if the mtime
is >12hrs, so it's OK if we have very rare cases where the PID info got
corrupted, we can still back ourselves out of it, which is what I was
paranoid about.

Furthermore the gc.lock contains the <hostname><pid> of the working
process, but we can in a backwards-compatible way add new entries to
that file, i.e. list both the child & parent pid. Older clients will
just read whichever one comes first, but if we make new versions check
both we can be more paranoid going forward.

> Having said that, I don't mind if my patch gets dropped. It was just a
> "hey that multiple gc output looks strange, hah the fix is quite
> simple" moment for me.

next prev parent reply	other threads:[~2019-06-20 21:49 UTC|newest]

Thread overview: 61+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-06-19  9:46 [PATCH] fetch: only run 'gc' once when fetching multiple remotes Nguyễn Thái Ngọc Duy
2019-06-19 10:26 ` [RFC/PATCH] gc: run more pre-detach operations under lock Ævar Arnfjörð Bjarmason
2019-06-19 12:51   ` Duy Nguyen
2019-06-19 18:01     ` Ævar Arnfjörð Bjarmason
2019-06-19 19:10       ` Jeff King
2019-06-19 22:49         ` Ævar Arnfjörð Bjarmason
2019-06-19 23:30           ` [PATCH 0/6] Change <non-empty?> GIT_TEST_* variables to <boolean> Ævar Arnfjörð Bjarmason
2019-06-20 18:13             ` Junio C Hamano
2019-06-20 21:00               ` Ævar Arnfjörð Bjarmason
2019-06-20 20:03             ` Junio C Hamano
2019-06-20 21:09             ` [PATCH v2 0/8] " Ævar Arnfjörð Bjarmason
2019-06-21 10:18               ` [PATCH v3 " Ævar Arnfjörð Bjarmason
2019-06-21 10:18               ` [PATCH v3 1/8] config tests: simplify include cycle test Ævar Arnfjörð Bjarmason
2019-06-21 10:18               ` [PATCH v3 2/8] env--helper: new undocumented builtin wrapping git_env_*() Ævar Arnfjörð Bjarmason
2019-06-21 17:07                 ` Junio C Hamano
2019-06-21 10:18               ` [PATCH v3 3/8] config.c: refactor die_bad_number() to not call gettext() early Ævar Arnfjörð Bjarmason
2019-06-21 10:18               ` [PATCH v3 4/8] t6040 test: stop using global "script" variable Ævar Arnfjörð Bjarmason
2019-06-21 10:18               ` [PATCH v3 5/8] tests: make GIT_TEST_GETTEXT_POISON a boolean Ævar Arnfjörð Bjarmason
2019-06-24 16:47                 ` Junio C Hamano
2019-06-21 10:18               ` [PATCH v3 6/8] tests README: re-flow a previously changed paragraph Ævar Arnfjörð Bjarmason
2019-06-21 10:18               ` [PATCH v3 7/8] tests: replace test_tristate with "git env--helper" Ævar Arnfjörð Bjarmason
2019-09-06 12:13                 ` [PATCH 1/2] t/lib-git-svn.sh: check GIT_TEST_SVN_HTTPD when running SVN HTTP tests SZEDER Gábor
2019-09-06 12:13                   ` [PATCH 2/2] ci: restore running httpd tests SZEDER Gábor
2019-09-06 17:03                     ` Junio C Hamano
2019-09-06 19:13                     ` Jeff King
2019-09-07 10:16                       ` SZEDER Gábor
2019-11-22 13:14                         ` [PATCH 0/2] tests: catch non-bool GIT_TEST_* values SZEDER Gábor
2019-11-22 13:14                           ` [PATCH 1/2] tests: add 'test_bool_env' to " SZEDER Gábor
2019-11-25 13:50                             ` Jeff King
2019-11-22 13:14                           ` [PATCH 2/2] t5608-clone-2gb.sh: turn GIT_TEST_CLONE_2GB into a bool SZEDER Gábor
2019-11-25 13:53                             ` Jeff King
2019-06-21 10:18               ` [PATCH v3 8/8] tests: make GIT_TEST_FAIL_PREREQS a boolean Ævar Arnfjörð Bjarmason
2019-06-20 21:09             ` [PATCH v2 1/8] config tests: simplify include cycle test Ævar Arnfjörð Bjarmason
2019-06-20 21:09             ` [PATCH v2 2/8] env--helper: new undocumented builtin wrapping git_env_*() Ævar Arnfjörð Bjarmason
2019-06-20 22:11               ` Junio C Hamano
2019-06-20 22:21                 ` Junio C Hamano
2019-06-21  8:11                   ` Ævar Arnfjörð Bjarmason
2019-06-21 15:04                     ` Junio C Hamano
2019-06-20 21:09             ` [PATCH v2 3/8] config.c: refactor die_bad_number() to not call gettext() early Ævar Arnfjörð Bjarmason
2019-06-20 21:09             ` [PATCH v2 4/8] t6040 test: stop using global "script" variable Ævar Arnfjörð Bjarmason
2019-06-20 21:09             ` [PATCH v2 5/8] tests: make GIT_TEST_GETTEXT_POISON a boolean Ævar Arnfjörð Bjarmason
2019-06-20 21:09             ` [PATCH v2 6/8] tests README: re-flow a previously changed paragraph Ævar Arnfjörð Bjarmason
2019-06-20 21:09             ` [PATCH v2 7/8] tests: replace test_tristate with "git env--helper" Ævar Arnfjörð Bjarmason
2019-06-20 21:09             ` [PATCH v2 8/8] tests: make GIT_TEST_FAIL_PREREQS a boolean Ævar Arnfjörð Bjarmason
2019-06-19 23:30           ` [PATCH 1/6] env--helper: new undocumented builtin wrapping git_env_*() Ævar Arnfjörð Bjarmason
2019-06-20 19:25             ` Junio C Hamano
2019-06-19 23:30           ` [PATCH 2/6] t6040 test: stop using global "script" variable Ævar Arnfjörð Bjarmason
2019-06-20 19:54             ` Junio C Hamano
2019-06-19 23:30           ` [PATCH 3/6] tests: make GIT_TEST_GETTEXT_POISON a boolean Ævar Arnfjörð Bjarmason
2019-06-20 20:00             ` Junio C Hamano
2019-06-19 23:30           ` [PATCH 4/6] tests README: re-flow a previously changed paragraph Ævar Arnfjörð Bjarmason
2019-06-19 23:30           ` [PATCH 5/6] tests: replace test_tristate with "git env--helper" Ævar Arnfjörð Bjarmason
2019-06-19 23:30           ` [PATCH 6/6] tests: make GIT_TEST_FAIL_PREREQS a boolean Ævar Arnfjörð Bjarmason
2019-06-20 10:26           ` [RFC/PATCH] gc: run more pre-detach operations under lock Duy Nguyen
2019-06-20 21:49             ` Ævar Arnfjörð Bjarmason [this message]
2019-06-20 10:43           ` Jeff King
2019-06-20 18:55         ` Junio C Hamano
2019-06-19 19:08     ` Jeff King
2019-06-19 18:59 ` [PATCH] fetch: only run 'gc' once when fetching multiple remotes Jeff King
2019-06-20 10:11   ` Duy Nguyen
2019-06-20 10:21     ` Jeff King

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87ef3o7ws1.fsf@evledraar.gmail.com \
    --to=avarab@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=mhagger@alum.mit.edu \
    --cc=pclouds@gmail.com \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).