git@vger.kernel.org mailing list mirror (one of many)
From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: Jeff King <peff@peff.net>
Cc: git@vger.kernel.org, "Junio C Hamano" <gitster@pobox.com>,
	"Nguyễn Thái Ngọc Duy" <pclouds@gmail.com>,
	"Michael Haggerty" <mhagger@alum.mit.edu>,
	"Stefan Beller" <stefanbeller@gmail.com>,
	"Jonathan Nieder" <jrnieder@gmail.com>,
	"Matt McCutchen" <matt@mattmccutchen.net>
Subject: Re: [PATCH 0/4] gc docs: modernize and fix the documentation
Date: Fri, 10 May 2019 01:20:55 +0200
Message-ID: <8736lnxlig.fsf@evledraar.gmail.com>
In-Reply-To: <20190507075158.GG28060@sigill.intra.peff.net>


On Tue, May 07 2019, Jeff King wrote:

> On Mon, May 06, 2019 at 11:44:06AM +0200, Ævar Arnfjörð Bjarmason wrote:
>
>> Maybe there's some case I haven't thought of that makes this stupid, but
>> I wonder if something like a "gc quarantine" might be a fix for both of
>> the issues you noted above.
>>
>> I.e. it seems to me that the main issue is that we conflate "mtime 2
>> weeks old because it's unreferenced for 2 weeks" v.s. "mtime 2 weeks old
>> because we haven't gotten around to a 'gc'".
>>
>> So in such a "gc quarantine" mode when we discover an object/pack that's
>> unreachable/purely made up of unreachable objects we'd move the relevant
>> loose object/"loose" pack to such a quarantine, which would just be
>> .git/unreferenced-objects/{??,pack}/ or whatever.
>>
>> AFAICT both cases you mentioned above would be mitigated by this because
>> we'd no longer conflate "haven't gc'd this yet and it's 2 weeks old"
>> v.s. "hasn't been referenced in 2 weeks".
>
> Michael Haggerty and I have (off-list) discussed variations on that, but
> it opens up a lot of new issues.  Moving something into quarantine isn't
> atomic. So you've still corrupted the repo, but now it's recoverable by
> reaching into the quarantine. Who notices that the repo is corrupt, and
> how? When do we expire objects from quarantine?
>
> I think the heart of the issue is really the lack of atomicity in the
> operations. You need some way to mark "I am using this now" in a way
> that cannot race with "looks like nobody is using this, so I'll delete
> it".
>
> And ideally without traversing large bits of the graph on the writing
> side, and without requiring any stop-the-world locks during pruning.

I was thinking (but realize now that I didn't articulate) that the "gc
quarantine" would be another "alternate" implementing a copy-on-write
"lockless delete-but-be-able-to-rollback scheme" as you put it.

So "gc" would decide (racily) what's unreachable, but instead of
unlink()-ing it, it would "mv" the loose object/pack into the
"unreferenced-objects" quarantine.

Then in your example #1 "wants to reference ABCD. It sees that we have
it." would race on the "other side". I.e. maybe ABCD was *just* moved to
the quarantine. But in that case we'd move it back, which would bump the
mtime and thus make it ineligible for expiry.

Similarly for example #2, the "ABCD is ancient" object would be moved,
but then promptly moved back on the next GC as we notice ABCD has been
re-referenced.
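
A minimal sketch of that move-and-restore idea, in Python rather than
git's C, and only as an illustration. Plain files stand in for loose
objects; the function names (quarantine, restore, is_expired) and the
TWO_WEEKS constant are hypothetical, and nothing here is existing git
behavior. It does nothing about the atomicity concern raised above, it
only shows why a restored object stops being eligible for expiry.

```python
import os
import time
from pathlib import Path

TWO_WEEKS = 14 * 24 * 60 * 60  # hypothetical expiry window, in seconds


def quarantine(objects: Path, quarantine_dir: Path, name: str) -> Path:
    """Move an apparently unreachable object aside instead of unlink()-ing it."""
    dest = quarantine_dir / name
    dest.parent.mkdir(parents=True, exist_ok=True)
    os.replace(objects / name, dest)  # a rename leaves the file's mtime untouched
    return dest


def restore(objects: Path, quarantine_dir: Path, name: str) -> Path:
    """Move a re-referenced object back and bump its mtime."""
    dest = objects / name
    dest.parent.mkdir(parents=True, exist_ok=True)
    os.replace(quarantine_dir / name, dest)
    os.utime(dest)  # sets atime/mtime to "now", so it no longer looks old
    return dest


def is_expired(path: Path, now: float, max_age: float = TWO_WEEKS) -> bool:
    """True if the file's mtime is more than max_age seconds before now."""
    return now - path.stat().st_mtime > max_age
```

An object that sat untouched for three weeks would still count as
expired while in quarantine, but not after restore() has moved it back.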

Maybe it's just the same problem all over again, but I don't see how
yet.

Aside from that, I have a hunch that while it's theoretically true that
you can at any time re-reference some loose blob/tree/commit again, the
likelihood of that in practice goes down as it ages, since a user is
likely to e.g. re-push or rename some branch they pushed last week, not
last year.

Hence the mention of creating "unreferenced packs" with some new
--keep-unreachable mode. Since we'd pack those together they wouldn't
create the "ref explosion" problem we have with the loose refs, and thus
you could afford to keep them longer (even though the deltas would be
shittier).

Whereas now you either need --keep-unreachable (keep stuff forever) or a
more aggressive gc.pruneExpire if you'd like to not end up with a
ginormous amount of loose objects.

>> I started looking at this initially because I was wondering if the
>> --keep-unreachable mode you modified in e26a8c4721 ("repack: extend
>> --keep-unreachable to loose objects", 2016-06-13) could be made to write
>> out such "unreferenced" objects into their *own* pack, so we could
>> delete them all at once as a batch, and wouldn't create the "ref
>> explosions" mentioned in [1].
>>
>> But of course without an accompanying quarantine described above doing
>> that would just make this race condition worse.
>
> I'm not sure it really makes it worse. The pack would have the same
> mtime as the loose objects would.

