From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: Jeff King <peff@peff.net>
Cc: Eric Wong <e@80x24.org>, Janos Farkas <chexum@gmail.com>,
	git@vger.kernel.org
Subject: Re: 2.22.0 repack -a duplicating pack contents
Date: Mon, 24 Jun 2019 11:30:59 +0200
Message-ID: <874l4f8h4c.fsf@evledraar.gmail.com>
In-Reply-To: <20190623224244.GB1100@sigill.intra.peff.net>


On Mon, Jun 24 2019, Jeff King wrote:

> On Sun, Jun 23, 2019 at 06:08:25PM +0000, Eric Wong wrote:
>
>> > I'm not sure of the right solution. For maximal backwards-compatibility,
>> > the default for bitmaps could become "if not bare and if there are no
>> > .keep files". But that would mean bitmaps sometimes not getting
>> > generated because of the problems that ee34a2bead was trying to solve.
>> >
>> > That's probably OK, though; you can always flip the bitmap config to
>> > "true" yourself if you _must_ have bitmaps.
>>
>> What about something like this?  Needs tests but I need to leave, now.
>
> Yeah, I think that's the right direction.
>
> Though...
>
>> +static int has_pack_keep_file(void)
>> +{
>> +	DIR *dir;
>> +	struct dirent *e;
>> +	int found = 0;
>> +
>> +	if (!(dir = opendir(packdir)))
>> +		return found;
>> +
>> +	while ((e = readdir(dir)) != NULL) {
>> +		if (ends_with(e->d_name, ".keep")) {
>> +			found = 1;
>> +			break;
>> +		}
>> +	}
>> +	closedir(dir);
>> +	return found;
>> +}
>
> I think this can be replaced with just checking p->pack_keep for each
> item in the packed_git list.
>
> That's racy, but then so is your code here, since it's really the child
> pack-objects which is going to deal with the .keep. I don't think we
> need to care much about the race, though. Either:
>
>   1. Somebody has made an old intentional .keep, which would not be
>      racy. We'd see it in both places.
>
>   2. Somebody _just_ made an intentional .keep; we'll race with that and
>      maybe duplicate objects from the kept pack. But this is a rare
>      occurrence, and there's no real ordering promise here anyway with
>      somebody creating .keep files alongside a running repack.
>
>   3. An incoming fetch/push may create a .keep file as a temporary lock,
>      which we see here but which goes away by the time pack-objects
>      runs. That's OK; we err on the side of not generating bitmaps, but
>      they're an optimization anyway (and if you really insist on having
>      them, you should tell Git to definitely make them instead of
>      relying on this default behavior).

This sort of thing (#3) strikes me as a fairly pathological case we
should try to avoid. Now that we've turned on bitmaps by default, people
will take the sort of performance increase noted in [1] for granted.

So they'll be happily running with that & then get a CPU/IO spike as the
*.bitmap files they'd been implicitly relying on for years in their
default config go away, only to have them reappear when "repack" runs
next.

I can't think of a great solution for this case, but here are some
thoughts:

 a. Perhaps we should split the *.keep flag into two or more things.

    We're using it for all of "I want this *.pack forever"
    (e.g. debugging) and "I want only this *.pack to contain the data
    found in it" (I/O & CPU optimization, what Janos wants) and "I'm
    git.git code avoiding a race with myself" (what you describe in #3).

    So maybe for the last of those we could also use and understand
    *.tmp-keep, at which point we wouldn't have the race described in
    #3. The 1st of those is a *.noprune and the 2nd is *.highlander (but
    whether it's worth splitting all that out vs. just having
    *.tmp-keep is another matter).

 b. Shouldn't we at least print a warning to STDERR in this case, so
    that e.g. gc.log will note the performance degradation of the repo
    in its current configuration?
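
To make (a) and (b) a bit more concrete, here's a rough standalone
sketch. It is not a patch against git.git: the ".tmp-keep" suffix is
only the hypothetical name floated above, and classify_keep() and
bitmaps_by_default() are made-up helpers that exist nowhere in Git. It
only shows the decision I have in mind: a user-made *.keep turns off
bitmaps-by-default (and says so on the given stream), while Git's own
temporary lock would not.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

enum keep_kind {
	KEEP_NONE = 0,	/* not a keep marker at all */
	KEEP_USER,	/* "*.keep": left intentionally by a user */
	KEEP_TMP	/* hypothetical "*.tmp-keep": Git's own short-lived lock */
};

/* Return nonzero if str ends with suffix. */
static int has_suffix(const char *str, const char *suffix)
{
	size_t len = strlen(str), suflen = strlen(suffix);

	return len >= suflen && !strcmp(str + len - suflen, suffix);
}

/* Classify one file name from the pack directory by its suffix. */
static enum keep_kind classify_keep(const char *name)
{
	assert(name != NULL);
	if (has_suffix(name, ".tmp-keep"))
		return KEEP_TMP;
	if (has_suffix(name, ".keep"))
		return KEEP_USER;
	return KEEP_NONE;
}

/*
 * Decide whether bitmaps should be written when the user did not
 * configure them either way. Only a user-made *.keep disables them;
 * if "warn" is non-NULL we note the reason there (e.g. stderr, which
 * would end up in gc.log for a background gc).
 */
static int bitmaps_by_default(const char *const *names, size_t nr, FILE *warn)
{
	size_t i;

	for (i = 0; i < nr; i++) {
		if (classify_keep(names[i]) != KEEP_USER)
			continue;
		if (warn)
			fprintf(warn, "warning: not writing bitmaps by "
				"default, found '%s'\n", names[i]);
		return 0;
	}
	return 1;
}
```

With that, a directory listing containing only "pack-1.tmp-keep" would
still get bitmaps, while one containing "pack-1.keep" would not, and
would produce the warning.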

>   4. Like (3), but we _don't_ see the temporary .keep here but _do_ see
>      it during pack-objects. That's OK, because we'll have told
>      pack-objects to pack those objects anyway, which is the right
>      thing.
>
> -Peff
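
For comparison with the readdir() loop quoted above, the "check
p->pack_keep for each item in the packed_git list" suggestion would
look roughly like the following. This is only a sketch: struct
fake_packed_git is a stripped-down stand-in I made up so that the
example compiles on its own (Git's real struct packed_git has many more
fields), and any_pack_kept() is likewise a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for illustration only, not Git's real struct packed_git. */
struct fake_packed_git {
	struct fake_packed_git *next;	/* next pack in the list, or NULL */
	unsigned pack_keep:1;		/* set if a .keep exists for this pack */
};

/* Return 1 if any pack in the list is marked as kept, 0 otherwise. */
static int any_pack_kept(const struct fake_packed_git *packs)
{
	const struct fake_packed_git *p;

	for (p = packs; p; p = p->next)
		if (p->pack_keep)
			return 1;
	return 0;
}

/* Tiny usage example: a two-pack list whose second pack is kept. */
static void any_pack_kept_selfcheck(void)
{
	struct fake_packed_git kept = { NULL, 1 };
	struct fake_packed_git plain = { &kept, 0 };

	assert(any_pack_kept(&plain));
	assert(!any_pack_kept(NULL));
}
```

Either way the check happens in "repack" before pack-objects runs, so
it has the same races as discussed in #2, #3 and #4.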

1. https://github.blog/2015-09-22-counting-objects/

