git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Taylor Blau <me@ttaylorr.com>
To: Jeff King <peff@peff.net>
Cc: git@vger.kernel.org, avarab@gmail.com, dstolee@microsoft.com,
	gitster@pobox.com, jonathantanmy@google.com
Subject: Re: [PATCH v3 08/16] midx: allow marking a pack as preferred
Date: Mon, 29 Mar 2021 17:15:12 -0400	[thread overview]
Message-ID: <YGJDYIxicaPDMdeZ@nand.local> (raw)
In-Reply-To: <YGHBe1ZB/iCRpqgD@coredump.intra.peff.net>

On Mon, Mar 29, 2021 at 08:00:59AM -0400, Jeff King wrote:
> I think in the long run we may want to add a midx chunk that gives the
> order of the packs (and likewise allow the caller of "midx write" to
> specify the exact order), since that may allow correlating locality
> between history and object order within the .rev/.bitmap files.
>
> But I think this is a nice stopping point for this series, since we're
> not having to introduce any new on-disk formats to do it, and it seems
> to give pretty good results in practice. I guess we'll have to support
> --preferred-pack forever, but that's OK. Even if we do eventually
> support arbitrary orderings, it's just a simple subset of that
> functionality.

To add a little bit of extra detail, I think what you're getting at here
is that it would be nice to let the order of the packs be dictated by
mtime, not the order they appear in the MIDX (which is lexicographic by
their hash, and thus effectively random).

The reason there being the same as you pointed out in

    https://lore.kernel.org/git/YDRdmh8oS5%2Fxq4rB@coredump.intra.peff.net/

which is that it effectively would lay objects out from newest to
oldest.

But, there's a problem, which is that the MIDX doesn't store the packs'
mtimes. That's fine for writing, since we can just look that information
up ourselves. But the reading side can get broken. That's because the
reader also has to know the pack order to go from MIDX- to bit-position.

So if a third party goes and touches some of the packs after the .rev
file was written, then the reader is going to think the packs ought to
appear in a different order than they actually do. So relying on having
to look up the mtimes again later on isn't good enough.

There are two solutions to the problem:

  - You could write the mtimes in the MIDX itself. This would give you a
    single point of reference, and resolve the TOCTOU race I just
    described.

  - Or, you could forget about mtimes entirely and let the MIDX dictate
    the pack ordering itself. That resolves the race in a
    similar-but-different way.

Of the two, I prefer the latter, but I think it introduces functionality
that we don't necessarily need yet. That's because the objects within
the packs are still ordered as such, and so the compression we get in
the packs is just as good as it is for single-pack bitmaps. It's only at
the objects between pack boundaries that any runs of 1s or 0s might be
interrupted, but there are far fewer pack boundaries than objects, so it
doesn't seem to matter in practice.

Anyway, I think that you know all of that already (mostly because we
thought aloud together when I originally brought this up), but I figure
that this detail may be interesting for other readers, too.

> > @@ -74,7 +85,8 @@ static int cmd_multi_pack_index_write(int argc, const char **argv)
> >  		usage_with_options(builtin_multi_pack_index_write_usage,
> >  				   options);
> >
> > -	return write_midx_file(opts.object_dir, opts.flags);
> > +	return write_midx_file(opts.object_dir, opts.preferred_pack,
> > +			       opts.flags);
> >  }
>
> This has the same leak of "options" that I mentioned in the earlier
> patch.

Yup, thanks for pointing it out.

> > diff --git a/midx.c b/midx.c
> > index 971faa8cfc..46f55ff6cf 100644
> > --- a/midx.c
> > +++ b/midx.c
> > @@ -431,6 +431,24 @@ static int pack_info_compare(const void *_a, const void *_b)
> >  	return strcmp(a->pack_name, b->pack_name);
> >  }
> >
> > +static int lookup_idx_or_pack_name(struct pack_info *info,
> > +				   uint32_t nr,
> > +				   const char *pack_name)
> > +{
> > +	uint32_t lo = 0, hi = nr;
> > +	while (lo < hi) {
> > +		uint32_t mi = lo + (hi - lo) / 2;
> > +		int cmp = cmp_idx_or_pack_name(pack_name, info[mi].pack_name);
> > +		if (cmp < 0)
> > +			hi = mi;
> > +		else if (cmp > 0)
> > +			lo = mi + 1;
> > +		else
> > +			return mi;
> > +	}
> > +	return -1;
> > +}
>
> Could this just be replaced with bsearch() in the caller?

Great suggestion. Yes, it can be. FWIW, I think that I may have
originally thought that it couldn't be since we were comparing a fixed
string to an array of structs (each having a field which holds the value
we actually want to compare). But bsearch() always passes the key as the
first argument to the comparator, so this is possible to do.

> > +		git multi-pack-index --object-dir=objects \
> > +			write --preferred-pack=test-BC-$bc.idx 2>err &&
> > +		test_must_be_empty err &&
> > +
> > +		ofs=$(git show-index <objects/pack/test-BC-$bc.idx | grep $b |
> > +			cut -d" " -f1) &&
> > +		midx_expect_object_offset $b $ofs objects
> > +	)
>
> ...what we really care about is that the object came from BC. And we are
> just using the offset as a proxy for that. But doesn't "test-tool
> read-midx" give us the actual pack name? We could just be checking that.
>
> I also wondered if we should confirm that without the --preferred-pack
> option, we choose the other pack. I think it will always be true because
> the default order is to sort them lexically. A comment to that effect
> might be worth it (near the "set up two packs" comment).

Both great points, thanks.

> -Peff

Thanks,
Taylor

  reply	other threads:[~2021-03-29 21:16 UTC|newest]

Thread overview: 171+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-02-10 23:02 [PATCH 0/9] midx: implement a multi-pack reverse index Taylor Blau
2021-02-10 23:02 ` [PATCH 1/9] t/helper/test-read-midx.c: add '--show-objects' Taylor Blau
2021-02-11  2:27   ` Derrick Stolee
2021-02-11  2:34     ` Taylor Blau
2021-02-10 23:02 ` [PATCH 2/9] midx: allow marking a pack as preferred Taylor Blau
2021-02-11 19:33   ` SZEDER Gábor
2021-02-15 15:49     ` Taylor Blau
2021-02-15 17:01       ` Ævar Arnfjörð Bjarmason
2021-02-15 18:41         ` [PATCH 0/5] commit-graph: parse_options() cleanup Ævar Arnfjörð Bjarmason
2021-02-15 18:41         ` [PATCH 1/5] commit-graph: define common usage with a macro Ævar Arnfjörð Bjarmason
2021-02-16 11:33           ` Derrick Stolee
2021-02-15 18:41         ` [PATCH 2/5] commit-graph: remove redundant handling of -h Ævar Arnfjörð Bjarmason
2021-02-16 11:35           ` Derrick Stolee
2021-02-15 18:41         ` [PATCH 3/5] commit-graph: use parse_options_concat() Ævar Arnfjörð Bjarmason
2021-02-15 18:51           ` Taylor Blau
2021-02-15 19:53             ` Taylor Blau
2021-02-15 20:39             ` Ævar Arnfjörð Bjarmason
2021-09-17 21:13               ` SZEDER Gábor
2021-09-17 22:03                 ` Jeff King
2021-09-18  4:30                   ` Taylor Blau
2021-09-18  7:20                     ` Ævar Arnfjörð Bjarmason
2021-09-18 15:56                       ` Taylor Blau
2021-09-18 15:58                         ` Taylor Blau
2021-09-18  0:58                 ` Ævar Arnfjörð Bjarmason
2021-02-15 18:41         ` [PATCH 4/5] commit-graph: refactor dispatch loop for style Ævar Arnfjörð Bjarmason
2021-02-15 18:53           ` Taylor Blau
2021-02-16 11:40             ` Derrick Stolee
2021-02-16 12:02               ` Ævar Arnfjörð Bjarmason
2021-02-16 18:28                 ` Derrick Stolee
2021-02-15 18:41         ` [PATCH 5/5] commit-graph: show usage on "commit-graph [write|verify] garbage" Ævar Arnfjörð Bjarmason
2021-02-15 19:06           ` Taylor Blau
2021-02-16 11:43           ` Derrick Stolee
2021-02-15 21:01         ` [PATCH v2 0/4] midx: split out sub-commands Taylor Blau
2021-02-15 21:01           ` [PATCH v2 1/4] builtin/multi-pack-index.c: inline 'flags' with options Taylor Blau
2021-02-15 21:01           ` [PATCH v2 2/4] builtin/multi-pack-index.c: don't handle 'progress' separately Taylor Blau
2021-02-15 21:39             ` Ævar Arnfjörð Bjarmason
2021-02-15 21:45               ` Taylor Blau
2021-02-16 11:47                 ` Derrick Stolee
2021-02-15 21:01           ` [PATCH v2 3/4] builtin/multi-pack-index.c: define common usage with a macro Taylor Blau
2021-02-15 21:01           ` [PATCH v2 4/4] builtin/multi-pack-index.c: split sub-commands Taylor Blau
2021-02-15 21:54             ` Ævar Arnfjörð Bjarmason
2021-02-15 22:34               ` Taylor Blau
2021-02-15 23:11                 ` Ævar Arnfjörð Bjarmason
2021-02-15 23:49                   ` Taylor Blau
2021-02-16 11:50           ` [PATCH v2 0/4] midx: split out sub-commands Derrick Stolee
2021-02-16 14:28             ` Taylor Blau
2021-02-10 23:02 ` [PATCH 3/9] midx: don't free midx_name early Taylor Blau
2021-02-10 23:02 ` [PATCH 4/9] midx: keep track of the checksum Taylor Blau
2021-02-11  2:33   ` Derrick Stolee
2021-02-11  2:35     ` Taylor Blau
2021-02-10 23:03 ` [PATCH 5/9] midx: make some functions non-static Taylor Blau
2021-02-10 23:03 ` [PATCH 6/9] Documentation/technical: describe multi-pack reverse indexes Taylor Blau
2021-02-11  2:48   ` Derrick Stolee
2021-02-11  3:03     ` Taylor Blau
2021-02-10 23:03 ` [PATCH 7/9] pack-revindex: read " Taylor Blau
2021-02-11  2:53   ` Derrick Stolee
2021-02-11  3:04     ` Taylor Blau
2021-02-11  7:54   ` Junio C Hamano
2021-02-11 14:54     ` Taylor Blau
2021-02-10 23:03 ` [PATCH 8/9] pack-write.c: extract 'write_rev_file_order' Taylor Blau
2021-02-10 23:03 ` [PATCH 9/9] pack-revindex: write multi-pack reverse indexes Taylor Blau
2021-02-11  2:58 ` [PATCH 0/9] midx: implement a multi-pack reverse index Derrick Stolee
2021-02-11  3:06   ` Taylor Blau
2021-02-11  8:13 ` Junio C Hamano
2021-02-11 18:37   ` Derrick Stolee
2021-02-11 18:55     ` Junio C Hamano
2021-02-24 19:09 ` [PATCH v2 00/15] " Taylor Blau
2021-02-24 19:09   ` [PATCH v2 01/15] builtin/multi-pack-index.c: inline 'flags' with options Taylor Blau
2021-02-24 19:09   ` [PATCH v2 02/15] builtin/multi-pack-index.c: don't handle 'progress' separately Taylor Blau
2021-02-24 19:09   ` [PATCH v2 03/15] builtin/multi-pack-index.c: define common usage with a macro Taylor Blau
2021-02-24 19:09   ` [PATCH v2 04/15] builtin/multi-pack-index.c: split sub-commands Taylor Blau
2021-03-02  4:06     ` Jonathan Tan
2021-03-02 19:02       ` Taylor Blau
2021-03-04  1:54         ` Jonathan Tan
2021-03-04  3:02           ` Taylor Blau
2021-02-24 19:09   ` [PATCH v2 05/15] builtin/multi-pack-index.c: don't enter bogus cmd_mode Taylor Blau
2021-02-24 19:09   ` [PATCH v2 06/15] builtin/multi-pack-index.c: display usage on unrecognized command Taylor Blau
2021-02-24 19:09   ` [PATCH v2 07/15] t/helper/test-read-midx.c: add '--show-objects' Taylor Blau
2021-02-24 19:09   ` [PATCH v2 08/15] midx: allow marking a pack as preferred Taylor Blau
2021-03-02  4:17     ` Jonathan Tan
2021-03-02 19:09       ` Taylor Blau
2021-03-04  2:00         ` Jonathan Tan
2021-03-04  3:04           ` Taylor Blau
2021-02-24 19:09   ` [PATCH v2 09/15] midx: don't free midx_name early Taylor Blau
2021-02-24 19:10   ` [PATCH v2 10/15] midx: keep track of the checksum Taylor Blau
2021-02-24 19:10   ` [PATCH v2 11/15] midx: make some functions non-static Taylor Blau
2021-02-24 19:10   ` [PATCH v2 12/15] Documentation/technical: describe multi-pack reverse indexes Taylor Blau
2021-03-02  4:21     ` Jonathan Tan
2021-03-02  4:36       ` Taylor Blau
2021-03-02 19:15       ` Taylor Blau
2021-03-04  2:03         ` Jonathan Tan
2021-02-24 19:10   ` [PATCH v2 13/15] pack-revindex: read " Taylor Blau
2021-03-02 18:36     ` Jonathan Tan
2021-03-03 15:27       ` Taylor Blau
2021-02-24 19:10   ` [PATCH v2 14/15] pack-write.c: extract 'write_rev_file_order' Taylor Blau
2021-02-24 19:10   ` [PATCH v2 15/15] pack-revindex: write multi-pack reverse indexes Taylor Blau
2021-03-02 18:40     ` Jonathan Tan
2021-03-03 15:30       ` Taylor Blau
2021-03-04  2:04         ` Jonathan Tan
2021-03-04  3:06           ` Taylor Blau
2021-03-11 17:04 ` [PATCH v3 00/16] midx: implement a multi-pack reverse index Taylor Blau
2021-03-11 17:04   ` [PATCH v3 01/16] builtin/multi-pack-index.c: inline 'flags' with options Taylor Blau
2021-03-29 11:20     ` Jeff King
2021-03-11 17:04   ` [PATCH v3 02/16] builtin/multi-pack-index.c: don't handle 'progress' separately Taylor Blau
2021-03-29 11:22     ` Jeff King
2021-03-11 17:04   ` [PATCH v3 03/16] builtin/multi-pack-index.c: define common usage with a macro Taylor Blau
2021-03-11 17:04   ` [PATCH v3 04/16] builtin/multi-pack-index.c: split sub-commands Taylor Blau
2021-03-29 11:36     ` Jeff King
2021-03-29 20:38       ` Taylor Blau
2021-03-30  7:04         ` Jeff King
2021-03-11 17:04   ` [PATCH v3 05/16] builtin/multi-pack-index.c: don't enter bogus cmd_mode Taylor Blau
2021-03-11 17:04   ` [PATCH v3 06/16] builtin/multi-pack-index.c: display usage on unrecognized command Taylor Blau
2021-03-29 11:42     ` Jeff King
2021-03-29 20:41       ` Taylor Blau
2021-03-11 17:05   ` [PATCH v3 07/16] t/helper/test-read-midx.c: add '--show-objects' Taylor Blau
2021-03-11 17:05   ` [PATCH v3 08/16] midx: allow marking a pack as preferred Taylor Blau
2021-03-29 12:00     ` Jeff King
2021-03-29 21:15       ` Taylor Blau [this message]
2021-03-30  7:11         ` Jeff King
2021-03-11 17:05   ` [PATCH v3 09/16] midx: don't free midx_name early Taylor Blau
2021-03-11 17:05   ` [PATCH v3 10/16] midx: keep track of the checksum Taylor Blau
2021-03-11 17:05   ` [PATCH v3 11/16] midx: make some functions non-static Taylor Blau
2021-03-11 17:05   ` [PATCH v3 12/16] Documentation/technical: describe multi-pack reverse indexes Taylor Blau
2021-03-29 12:12     ` Jeff King
2021-03-29 21:22       ` Taylor Blau
2021-03-11 17:05   ` [PATCH v3 13/16] pack-revindex: read " Taylor Blau
2021-03-29 12:43     ` Jeff King
2021-03-29 21:27       ` Taylor Blau
2021-03-11 17:05   ` [PATCH v3 14/16] pack-write.c: extract 'write_rev_file_order' Taylor Blau
2021-03-11 17:05   ` [PATCH v3 15/16] pack-revindex: write multi-pack reverse indexes Taylor Blau
2021-03-29 12:53     ` Jeff King
2021-03-29 21:30       ` Taylor Blau
2021-03-11 17:05   ` [PATCH v3 16/16] midx.c: improve cache locality in midx_pack_order_cmp() Taylor Blau
2021-03-29 12:59     ` Jeff King
2021-03-29 21:34       ` Taylor Blau
2021-03-30  7:15         ` Jeff King
2021-03-12 15:16   ` [PATCH v3 00/16] midx: implement a multi-pack reverse index Derrick Stolee
2021-03-29 13:05   ` Jeff King
2021-03-29 21:30     ` Junio C Hamano
2021-03-29 21:37     ` Taylor Blau
2021-03-30  7:15       ` Jeff King
2021-03-30 13:37         ` Taylor Blau
2021-03-30 15:03 ` [PATCH v4 " Taylor Blau
2021-03-30 15:03   ` [PATCH v4 01/16] builtin/multi-pack-index.c: inline 'flags' with options Taylor Blau
2021-03-30 15:03   ` [PATCH v4 02/16] builtin/multi-pack-index.c: don't handle 'progress' separately Taylor Blau
2021-03-30 15:03   ` [PATCH v4 03/16] builtin/multi-pack-index.c: define common usage with a macro Taylor Blau
2021-03-30 15:03   ` [PATCH v4 04/16] builtin/multi-pack-index.c: split sub-commands Taylor Blau
2021-03-30 15:04   ` [PATCH v4 05/16] builtin/multi-pack-index.c: don't enter bogus cmd_mode Taylor Blau
2021-03-30 15:04   ` [PATCH v4 06/16] builtin/multi-pack-index.c: display usage on unrecognized command Taylor Blau
2021-03-30 15:04   ` [PATCH v4 07/16] t/helper/test-read-midx.c: add '--show-objects' Taylor Blau
2021-03-30 15:04   ` [PATCH v4 08/16] midx: allow marking a pack as preferred Taylor Blau
2021-04-01  0:32     ` Taylor Blau
2021-03-30 15:04   ` [PATCH v4 09/16] midx: don't free midx_name early Taylor Blau
2021-03-30 15:04   ` [PATCH v4 10/16] midx: keep track of the checksum Taylor Blau
2021-03-30 15:04   ` [PATCH v4 11/16] midx: make some functions non-static Taylor Blau
2021-03-30 15:04   ` [PATCH v4 12/16] Documentation/technical: describe multi-pack reverse indexes Taylor Blau
2021-03-30 15:04   ` [PATCH v4 13/16] pack-revindex: read " Taylor Blau
2021-03-30 15:04   ` [PATCH v4 14/16] pack-write.c: extract 'write_rev_file_order' Taylor Blau
2021-09-08  1:08     ` [PATCH] pack-write: skip *.rev work when not writing *.rev Ævar Arnfjörð Bjarmason
2021-09-08  1:35       ` Carlo Arenas
2021-09-08  2:42         ` Taylor Blau
2021-09-08 15:47           ` Junio C Hamano
2021-09-08  2:50       ` Taylor Blau
2021-09-08  3:50         ` Taylor Blau
2021-09-08 10:18           ` Ævar Arnfjörð Bjarmason
2021-09-08 16:32             ` Taylor Blau
2021-03-30 15:04   ` [PATCH v4 15/16] pack-revindex: write multi-pack reverse indexes Taylor Blau
2021-03-30 15:04   ` [PATCH v4 16/16] midx.c: improve cache locality in midx_pack_order_cmp() Taylor Blau
2021-03-30 15:45   ` [PATCH v4 00/16] midx: implement a multi-pack reverse index Jeff King
2021-03-30 15:49     ` Taylor Blau
2021-03-30 16:01       ` Jeff King

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YGJDYIxicaPDMdeZ@nand.local \
    --to=me@ttaylorr.com \
    --cc=avarab@gmail.com \
    --cc=dstolee@microsoft.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=jonathantanmy@google.com \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).