git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: Jeff King <peff@peff.net>
Cc: Duy Nguyen <pclouds@gmail.com>,
	Git Mailing List <git@vger.kernel.org>,
	Junio C Hamano <gitster@pobox.com>,
	Michael Haggerty <mhagger@alum.mit.edu>
Subject: Re: [RFC/PATCH] gc: run more pre-detach operations under lock
Date: Thu, 20 Jun 2019 00:49:32 +0200	[thread overview]
Message-ID: <87imt18a2r.fsf@evledraar.gmail.com> (raw)
In-Reply-To: <20190619191037.GE28145@sigill.intra.peff.net>


On Wed, Jun 19 2019, Jeff King wrote:

> On Wed, Jun 19, 2019 at 08:01:55PM +0200, Ævar Arnfjörð Bjarmason wrote:
>
>> > You could sort of avoid the problem here too with
>> >
>> > parallel 'git fetch --no-auto-gc {}' ::: $(git remote)
>> > git gc --auto
>> >
>> > It's definitely simpler, but of course we have to manually add
>> > --no-auto-gc in everywhere we need, so not quite as elegant.
>> >
>> > Actually you could already do that with 'git -c gc.auto=false fetch', I guess.
>>
>> The point of the 'parallel' example is to show disconnected git
>> commands, think trying to run 'git' in a terminal while your editor
>> asynchronously runs a polling 'fetch', or a server with multiple
>> concurrent clients running 'gc --auto'.
>>
>> That's the question my RFC patch raises. As far as I can tell the
>> approach in your patch is only needed because our locking for gc is
>> buggy, rather than introduce the caveat that an fetch(N) operation won't
>> do "gc" until it's finished (we may have hundreds, thousands of remotes,
>> I use that for some more obscure use-cases) shouldn't we just fix the
>> locking?
>
> I think there may be room for both approaches. Yours fixes the repeated
> message in the more general case, but Duy's suggestion is the most
> efficient thing.
>
> I agree that the "thousands of remotes" case means we might want to gc
> in the interim. But we probably ought to do that deterministically
> rather than hoping that the pattern of lock contention makes sense.

We do it deterministically, when gc.auto thresholds et al are exceeded
we kick one off without waiting for other stuff, if we can get the lock.

I don't think this desire to just wait a bit until all the fetches are
complete makes sense as a special-case.

If, as you noted in <20190619190845.GD28145@sigill.intra.peff.net>, the
desire is to reduce GC CPU use then you're better off just tweaking the
limits upwards. Then you get that with everything, like when you run
"commit" in a for-loop, not just this one special case of "fetch".

We have existing potentially long-running operations like "fetch",
"rebase" and "git svn fetch" that run "gc --auto" for their incremental
steps, and that's a feature.

It keeps "gc --auto" dumb enough to avoid a pathological case where
we'll have a ballooning objects dir because we figure we can run
something "at the end", when "the end" could be hours away, and we're
adding a new pack or hundreds of loose objects every second.

So I don't think Duy's patch is a good way to go.

The rationale in its commit message for including it can be better
addressed by something like my WIP for just fixing the locking
mechanism, since it fixes the stated problems of multiple "auto packing
in the background" messages and the "we may waste some resources" (we
take the lock earlier before doing 'real' work) without introducing its
own pathological case of deferring "gc --auto" too much as we have
unchecked object growth.

I think that's really important. It's OK if "gc --auto" isn't optimal,
but we should really avoid such pathological cases.

It's also important that it's easy to explain, i.e. if this patch goes
in I think it should update the second paragraph of the git-gc.txt
docs. I.e. now it's not just:

    When common porcelain operations that create objects are run, they
    will check whether the repository has grown substantially since the
    last maintenance[...]

But also something like:

    Except in cases where we're running porcelain commads that
    themselves might re-run aggregates of themselves, in which case we
    defer "gc" until the end. This currently only applies to "fetch",
    but other commands such as "rebase" etc. might learn to do this in
    the future. Note that if the sub-commands are numerous enough this
    might itself become pathological as "gc --auto" is deferred too
    much, so use option/config XYZ to ...

Or whatever...

  reply	other threads:[~2019-06-19 22:49 UTC|newest]

Thread overview: 61+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-06-19  9:46 [PATCH] fetch: only run 'gc' once when fetching multiple remotes Nguyễn Thái Ngọc Duy
2019-06-19 10:26 ` [RFC/PATCH] gc: run more pre-detach operations under lock Ævar Arnfjörð Bjarmason
2019-06-19 12:51   ` Duy Nguyen
2019-06-19 18:01     ` Ævar Arnfjörð Bjarmason
2019-06-19 19:10       ` Jeff King
2019-06-19 22:49         ` Ævar Arnfjörð Bjarmason [this message]
2019-06-19 23:30           ` [PATCH 0/6] Change <non-empty?> GIT_TEST_* variables to <boolean> Ævar Arnfjörð Bjarmason
2019-06-20 18:13             ` Junio C Hamano
2019-06-20 21:00               ` Ævar Arnfjörð Bjarmason
2019-06-20 20:03             ` Junio C Hamano
2019-06-20 21:09             ` [PATCH v2 0/8] " Ævar Arnfjörð Bjarmason
2019-06-21 10:18               ` [PATCH v3 " Ævar Arnfjörð Bjarmason
2019-06-21 10:18               ` [PATCH v3 1/8] config tests: simplify include cycle test Ævar Arnfjörð Bjarmason
2019-06-21 10:18               ` [PATCH v3 2/8] env--helper: new undocumented builtin wrapping git_env_*() Ævar Arnfjörð Bjarmason
2019-06-21 17:07                 ` Junio C Hamano
2019-06-21 10:18               ` [PATCH v3 3/8] config.c: refactor die_bad_number() to not call gettext() early Ævar Arnfjörð Bjarmason
2019-06-21 10:18               ` [PATCH v3 4/8] t6040 test: stop using global "script" variable Ævar Arnfjörð Bjarmason
2019-06-21 10:18               ` [PATCH v3 5/8] tests: make GIT_TEST_GETTEXT_POISON a boolean Ævar Arnfjörð Bjarmason
2019-06-24 16:47                 ` Junio C Hamano
2019-06-21 10:18               ` [PATCH v3 6/8] tests README: re-flow a previously changed paragraph Ævar Arnfjörð Bjarmason
2019-06-21 10:18               ` [PATCH v3 7/8] tests: replace test_tristate with "git env--helper" Ævar Arnfjörð Bjarmason
2019-09-06 12:13                 ` [PATCH 1/2] t/lib-git-svn.sh: check GIT_TEST_SVN_HTTPD when running SVN HTTP tests SZEDER Gábor
2019-09-06 12:13                   ` [PATCH 2/2] ci: restore running httpd tests SZEDER Gábor
2019-09-06 17:03                     ` Junio C Hamano
2019-09-06 19:13                     ` Jeff King
2019-09-07 10:16                       ` SZEDER Gábor
2019-11-22 13:14                         ` [PATCH 0/2] tests: catch non-bool GIT_TEST_* values SZEDER Gábor
2019-11-22 13:14                           ` [PATCH 1/2] tests: add 'test_bool_env' to " SZEDER Gábor
2019-11-25 13:50                             ` Jeff King
2019-11-22 13:14                           ` [PATCH 2/2] t5608-clone-2gb.sh: turn GIT_TEST_CLONE_2GB into a bool SZEDER Gábor
2019-11-25 13:53                             ` Jeff King
2019-06-21 10:18               ` [PATCH v3 8/8] tests: make GIT_TEST_FAIL_PREREQS a boolean Ævar Arnfjörð Bjarmason
2019-06-20 21:09             ` [PATCH v2 1/8] config tests: simplify include cycle test Ævar Arnfjörð Bjarmason
2019-06-20 21:09             ` [PATCH v2 2/8] env--helper: new undocumented builtin wrapping git_env_*() Ævar Arnfjörð Bjarmason
2019-06-20 22:11               ` Junio C Hamano
2019-06-20 22:21                 ` Junio C Hamano
2019-06-21  8:11                   ` Ævar Arnfjörð Bjarmason
2019-06-21 15:04                     ` Junio C Hamano
2019-06-20 21:09             ` [PATCH v2 3/8] config.c: refactor die_bad_number() to not call gettext() early Ævar Arnfjörð Bjarmason
2019-06-20 21:09             ` [PATCH v2 4/8] t6040 test: stop using global "script" variable Ævar Arnfjörð Bjarmason
2019-06-20 21:09             ` [PATCH v2 5/8] tests: make GIT_TEST_GETTEXT_POISON a boolean Ævar Arnfjörð Bjarmason
2019-06-20 21:09             ` [PATCH v2 6/8] tests README: re-flow a previously changed paragraph Ævar Arnfjörð Bjarmason
2019-06-20 21:09             ` [PATCH v2 7/8] tests: replace test_tristate with "git env--helper" Ævar Arnfjörð Bjarmason
2019-06-20 21:09             ` [PATCH v2 8/8] tests: make GIT_TEST_FAIL_PREREQS a boolean Ævar Arnfjörð Bjarmason
2019-06-19 23:30           ` [PATCH 1/6] env--helper: new undocumented builtin wrapping git_env_*() Ævar Arnfjörð Bjarmason
2019-06-20 19:25             ` Junio C Hamano
2019-06-19 23:30           ` [PATCH 2/6] t6040 test: stop using global "script" variable Ævar Arnfjörð Bjarmason
2019-06-20 19:54             ` Junio C Hamano
2019-06-19 23:30           ` [PATCH 3/6] tests: make GIT_TEST_GETTEXT_POISON a boolean Ævar Arnfjörð Bjarmason
2019-06-20 20:00             ` Junio C Hamano
2019-06-19 23:30           ` [PATCH 4/6] tests README: re-flow a previously changed paragraph Ævar Arnfjörð Bjarmason
2019-06-19 23:30           ` [PATCH 5/6] tests: replace test_tristate with "git env--helper" Ævar Arnfjörð Bjarmason
2019-06-19 23:30           ` [PATCH 6/6] tests: make GIT_TEST_FAIL_PREREQS a boolean Ævar Arnfjörð Bjarmason
2019-06-20 10:26           ` [RFC/PATCH] gc: run more pre-detach operations under lock Duy Nguyen
2019-06-20 21:49             ` Ævar Arnfjörð Bjarmason
2019-06-20 10:43           ` Jeff King
2019-06-20 18:55         ` Junio C Hamano
2019-06-19 19:08     ` Jeff King
2019-06-19 18:59 ` [PATCH] fetch: only run 'gc' once when fetching multiple remotes Jeff King
2019-06-20 10:11   ` Duy Nguyen
2019-06-20 10:21     ` Jeff King

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87imt18a2r.fsf@evledraar.gmail.com \
    --to=avarab@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=mhagger@alum.mit.edu \
    --cc=pclouds@gmail.com \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).