git@vger.kernel.org list mirror (unofficial, one of many)
* [PATCH 0/9] midx: implement a multi-pack reverse index
@ 2021-02-10 23:02 Taylor Blau
  2021-02-10 23:02 ` [PATCH 1/9] t/helper/test-read-midx.c: add '--show-objects' Taylor Blau
                   ` (13 more replies)
  0 siblings, 14 replies; 171+ messages in thread
From: Taylor Blau @ 2021-02-10 23:02 UTC (permalink / raw)
  To: git; +Cc: dstolee, gitster, peff

This series describes and implements a reverse index for the multi-pack index,
based on a "pseudo-pack" which can be uniquely described by the multi-pack
index.

The pseudo-pack and the multi-pack reverse index are described in detail in
the sixth patch.

This is in support of multi-pack reachability bitmaps, which contain objects
from the multi-pack index. Likewise, an object's bit position in a multi-pack
reachability bitmap is determined by its position within that multi-pack
index's pseudo-pack.

In this series, there are no users of the multi-pack reverse index yet, so it
is mainly about laying the groundwork for implementing multi-pack bitmaps. This
series is the final prerequisite needed before we can implement multi-pack
bitmaps, which will come in the next series[1].

Since tb/pack-revindex-on-disk is queued to be merged to 'master', but hasn't
yet been merged, this series is based on that branch.

Thanks in advance for your review of this series, and all of the many other
series in support of multi-pack bitmaps.

[1]: If you're curious, you can find the patches in the tb/multi-pack-bitmaps
branch of my fork at https://github.com/ttaylorr/git.

Taylor Blau (9):
  t/helper/test-read-midx.c: add '--show-objects'
  midx: allow marking a pack as preferred
  midx: don't free midx_name early
  midx: keep track of the checksum
  midx: make some functions non-static
  Documentation/technical: describe multi-pack reverse indexes
  pack-revindex: read multi-pack reverse indexes
  pack-write.c: extract 'write_rev_file_order'
  pack-revindex: write multi-pack reverse indexes

 Documentation/git-multi-pack-index.txt       |  11 +-
 Documentation/technical/multi-pack-index.txt |   5 +-
 Documentation/technical/pack-format.txt      |  83 +++++++
 builtin/multi-pack-index.c                   |  10 +-
 builtin/repack.c                             |   2 +-
 midx.c                                       | 239 ++++++++++++++++++-
 midx.h                                       |  11 +-
 pack-revindex.c                              | 112 +++++++++
 pack-revindex.h                              |  46 ++++
 pack-write.c                                 |  39 ++-
 pack.h                                       |   1 +
 packfile.c                                   |   3 +
 t/helper/test-read-midx.c                    |  24 +-
 t/t5319-multi-pack-index.sh                  |  39 +++
 14 files changed, 591 insertions(+), 34 deletions(-)

-- 
2.30.0.667.g81c0cbc6fd


* [PATCH 1/9] t/helper/test-read-midx.c: add '--show-objects'
  2021-02-10 23:02 [PATCH 0/9] midx: implement a multi-pack reverse index Taylor Blau
@ 2021-02-10 23:02 ` Taylor Blau
  2021-02-11  2:27   ` Derrick Stolee
  2021-02-10 23:02 ` [PATCH 2/9] midx: allow marking a pack as preferred Taylor Blau
                   ` (12 subsequent siblings)
  13 siblings, 1 reply; 171+ messages in thread
From: Taylor Blau @ 2021-02-10 23:02 UTC (permalink / raw)
  To: git; +Cc: dstolee, gitster, peff

The 'read-midx' helper is used in places like t5319 to display basic
information about a multi-pack-index.

In the next patch, the MIDX writing machinery will learn a new way to
choose from which pack an object is selected when multiple copies of
that object exist.

To disambiguate which pack introduces an object so that this feature can
be tested, add a '--show-objects' option which displays additional
information about each object in the MIDX.
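
As a rough illustration of the per-object line this option prints (hex
object ID, a space, the offset, a tab, then the pack name, following the
printf() format in the diff below), here is a minimal sketch of a parser
for one such line. The struct, the function name, and the buffer sizes are
hypothetical and exist only for this example; they are not part of the
patch.

```c
#include <inttypes.h>
#include <stdio.h>

/* Hypothetical holder for one parsed line; buffer sizes are arbitrary. */
struct sketch_show_line {
	char oid_hex[65];     /* up to 64 hex digits plus the NUL byte */
	uint64_t offset;      /* offset of the object within its pack */
	char pack_name[256];  /* pack name, read up to the end of the line */
};

/* Returns 0 on success, -1 if the line does not contain three fields. */
static int sketch_parse_show_line(const char *line,
				  struct sketch_show_line *out)
{
	int n = sscanf(line, "%64s %" SCNu64 " %255[^\n]",
		       out->oid_hex, &out->offset, out->pack_name);
	return n == 3 ? 0 : -1;
}
```

The test added in the next patch gets by with grep on the "<oid> <offset>"
prefix, so a parser like this is only useful when the pack name matters too.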

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 t/helper/test-read-midx.c | 24 ++++++++++++++++++++----
 1 file changed, 20 insertions(+), 4 deletions(-)

diff --git a/t/helper/test-read-midx.c b/t/helper/test-read-midx.c
index 2430880f78..4ec12f77a0 100644
--- a/t/helper/test-read-midx.c
+++ b/t/helper/test-read-midx.c
@@ -4,7 +4,7 @@
 #include "repository.h"
 #include "object-store.h"
 
-static int read_midx_file(const char *object_dir)
+static int read_midx_file(const char *object_dir, int show_objects)
 {
 	uint32_t i;
 	struct multi_pack_index *m;
@@ -15,6 +15,20 @@ static int read_midx_file(const char *object_dir)
 	if (!m)
 		return 1;
 
+	if (show_objects) {
+		struct object_id oid;
+		struct pack_entry e;
+
+		for (i = 0; i < m->num_objects; i++) {
+			nth_midxed_object_oid(&oid, m, i);
+			fill_midx_entry(the_repository, &oid, &e, m);
+
+			printf("%s %"PRIu64"\t%s\n",
+			       oid_to_hex(&oid), e.offset, e.p->pack_name);
+		}
+		return 0;
+	}
+
 	printf("header: %08x %d %d %d %d\n",
 	       m->signature,
 	       m->version,
@@ -48,8 +62,10 @@ static int read_midx_file(const char *object_dir)
 
 int cmd__read_midx(int argc, const char **argv)
 {
-	if (argc != 2)
-		usage("read-midx <object-dir>");
+	if (!(argc == 2 || argc == 3))
+		usage("read-midx [--show-objects] <object-dir>");
 
-	return read_midx_file(argv[1]);
+	if (!strcmp(argv[1], "--show-objects"))
+		return read_midx_file(argv[2], 1);
+	return read_midx_file(argv[1], 0);
 }
-- 
2.30.0.667.g81c0cbc6fd



* [PATCH 2/9] midx: allow marking a pack as preferred
  2021-02-10 23:02 [PATCH 0/9] midx: implement a multi-pack reverse index Taylor Blau
  2021-02-10 23:02 ` [PATCH 1/9] t/helper/test-read-midx.c: add '--show-objects' Taylor Blau
@ 2021-02-10 23:02 ` Taylor Blau
  2021-02-11 19:33   ` SZEDER Gábor
  2021-02-10 23:02 ` [PATCH 3/9] midx: don't free midx_name early Taylor Blau
                   ` (11 subsequent siblings)
  13 siblings, 1 reply; 171+ messages in thread
From: Taylor Blau @ 2021-02-10 23:02 UTC (permalink / raw)
  To: git; +Cc: dstolee, gitster, peff

When multiple packs in the multi-pack index contain the same object, the
MIDX machinery must make a choice about which pack it associates with
that object. Prior to this patch, the lowest-ordered[1] pack was always
selected.

Pack selection for duplicate objects is relatively unimportant today,
but it will become important for multi-pack bitmaps. This is because we
can only invoke the pack-reuse mechanism when all of the bits for reused
objects come from the reuse pack (in order to ensure that all reused
deltas can find their base objects in the same pack).

To encourage the pack selection process to prefer one pack over another
(the pack to be preferred is the one a caller would like to later use as
a reuse pack), introduce the concept of a "preferred pack". When
provided, the MIDX code will always prefer an object found in a
preferred pack over any other.
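
The tie-breaking rule can be sketched roughly as follows: sort by object ID,
then put the copy from the preferred pack first, then fall back to the newer
mtime, and keep only the first copy of each object. This is a simplified
sketch, not the code in midx.c: the struct is a hypothetical stand-in for
'struct pack_midx_entry', the object ID length assumes SHA-1, and the
de-duplication helper is invented for the example.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

#define SKETCH_OID_LEN 20 /* assumption: SHA-1 sized object IDs */

/* Hypothetical, simplified stand-in for git's struct pack_midx_entry. */
struct sketch_entry {
	unsigned char oid[SKETCH_OID_LEN];
	uint32_t pack_int_id;
	time_t pack_mtime;
	unsigned preferred : 1;
};

/* Order by oid; among copies of one oid: preferred first, then newest. */
static int sketch_entry_cmp(const void *va, const void *vb)
{
	const struct sketch_entry *a = va, *b = vb;
	int cmp = memcmp(a->oid, b->oid, SKETCH_OID_LEN);

	if (cmp)
		return cmp;
	if (a->preferred != b->preferred)
		return a->preferred ? -1 : 1;
	if (a->pack_mtime != b->pack_mtime)
		return a->pack_mtime > b->pack_mtime ? -1 : 1;
	return 0;
}

/* Sort, then keep the first entry of each run of equal oids. */
static size_t sketch_dedup(struct sketch_entry *e, size_t nr)
{
	size_t i, out = 0;

	qsort(e, nr, sizeof(*e), sketch_entry_cmp);
	for (i = 0; i < nr; i++) {
		if (out && !memcmp(e[out - 1].oid, e[i].oid, SKETCH_OID_LEN))
			continue; /* a better-ranked copy was already kept */
		e[out++] = e[i];
	}
	return out;
}
```

With two packs that both contain an object, the copy from the preferred pack
survives even when the other pack has the more recent mtime.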

No format changes are required to store the preferred pack, since it
will be able to be inferred with a corresponding MIDX bitmap, by looking
up the pack associated with the object in the first bit position (this
ordering is described in detail in a subsequent commit).

[1]: the ordering is specified by MIDX internals; for our purposes we
can consider the "lowest ordered" pack to be "the one with the
most-recent mtime".

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 Documentation/git-multi-pack-index.txt       | 11 ++-
 Documentation/technical/multi-pack-index.txt |  5 +-
 builtin/multi-pack-index.c                   | 10 +-
 builtin/repack.c                             |  2 +-
 midx.c                                       | 97 ++++++++++++++++++--
 midx.h                                       |  2 +-
 t/t5319-multi-pack-index.sh                  | 39 ++++++++
 7 files changed, 151 insertions(+), 15 deletions(-)

diff --git a/Documentation/git-multi-pack-index.txt b/Documentation/git-multi-pack-index.txt
index eb0caa0439..dd14eab781 100644
--- a/Documentation/git-multi-pack-index.txt
+++ b/Documentation/git-multi-pack-index.txt
@@ -9,7 +9,8 @@ git-multi-pack-index - Write and verify multi-pack-indexes
 SYNOPSIS
 --------
 [verse]
-'git multi-pack-index' [--object-dir=<dir>] [--[no-]progress] <subcommand>
+'git multi-pack-index' [--object-dir=<dir>] [--[no-]progress]
+	[--preferred-pack=<pack>] <subcommand>
 
 DESCRIPTION
 -----------
@@ -27,6 +28,14 @@ OPTIONS
 	Turn progress on/off explicitly. If neither is specified, progress is
 	shown if standard error is connected to a terminal.
 
+--preferred-pack=<pack>::
+	When using the `write` subcommand, optionally specify the
+	tie-breaking pack used when multiple packs contain the same
+	object. Incompatible with other subcommands, including `repack`,
+	which may repack the pack marked as preferred. If not given, the
+	preferred pack is inferred from an existing `multi-pack-index`,
+	if one exists, otherwise the pack with the lowest mtime.
+
 The following subcommands are available:
 
 write::
diff --git a/Documentation/technical/multi-pack-index.txt b/Documentation/technical/multi-pack-index.txt
index e8e377a59f..fb688976c4 100644
--- a/Documentation/technical/multi-pack-index.txt
+++ b/Documentation/technical/multi-pack-index.txt
@@ -43,8 +43,9 @@ Design Details
   a change in format.
 
 - The MIDX keeps only one record per object ID. If an object appears
-  in multiple packfiles, then the MIDX selects the copy in the most-
-  recently modified packfile.
+  in multiple packfiles, then the MIDX selects the copy in the
+  preferred packfile, otherwise selecting from the most-recently
+  modified packfile.
 
 - If there exist packfiles in the pack directory not registered in
   the MIDX, then those packfiles are loaded into the `packed_git`
diff --git a/builtin/multi-pack-index.c b/builtin/multi-pack-index.c
index 5bf88cd2a8..4d1ea3fe84 100644
--- a/builtin/multi-pack-index.c
+++ b/builtin/multi-pack-index.c
@@ -4,6 +4,7 @@
 #include "parse-options.h"
 #include "midx.h"
 #include "trace2.h"
+#include "object-store.h"
 
 static char const * const builtin_multi_pack_index_usage[] = {
 	N_("git multi-pack-index [<options>] (write|verify|expire|repack --batch-size=<size>)"),
@@ -12,6 +13,7 @@ static char const * const builtin_multi_pack_index_usage[] = {
 
 static struct opts_multi_pack_index {
 	const char *object_dir;
+	const char *preferred_pack;
 	unsigned long batch_size;
 	int progress;
 } opts;
@@ -24,6 +26,8 @@ int cmd_multi_pack_index(int argc, const char **argv,
 	static struct option builtin_multi_pack_index_options[] = {
 		OPT_FILENAME(0, "object-dir", &opts.object_dir,
 		  N_("object directory containing set of packfile and pack-index pairs")),
+		OPT_STRING(0, "preferred-pack", &opts.preferred_pack, N_("preferred-pack"),
+		  N_("pack for reuse when computing a multi-pack bitmap")),
 		OPT_BOOL(0, "progress", &opts.progress, N_("force progress reporting")),
 		OPT_MAGNITUDE(0, "batch-size", &opts.batch_size,
 		  N_("during repack, collect pack-files of smaller size into a batch that is larger than this size")),
@@ -51,6 +55,9 @@ int cmd_multi_pack_index(int argc, const char **argv,
 		return 1;
 	}
 
+	if (strcmp(argv[0], "write") && opts.preferred_pack)
+		die(_("'--preferred-pack' requires 'write'"));
+
 	trace2_cmd_mode(argv[0]);
 
 	if (!strcmp(argv[0], "repack"))
@@ -60,7 +67,8 @@ int cmd_multi_pack_index(int argc, const char **argv,
 		die(_("--batch-size option is only for 'repack' subcommand"));
 
 	if (!strcmp(argv[0], "write"))
-		return write_midx_file(opts.object_dir, flags);
+		return write_midx_file(opts.object_dir, opts.preferred_pack,
+				       flags);
 	if (!strcmp(argv[0], "verify"))
 		return verify_midx_file(the_repository, opts.object_dir, flags);
 	if (!strcmp(argv[0], "expire"))
diff --git a/builtin/repack.c b/builtin/repack.c
index 01440de2d5..9f00806805 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -523,7 +523,7 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
 	remove_temporary_files();
 
 	if (git_env_bool(GIT_TEST_MULTI_PACK_INDEX, 0))
-		write_midx_file(get_object_directory(), 0);
+		write_midx_file(get_object_directory(), NULL, 0);
 
 	string_list_clear(&names, 0);
 	string_list_clear(&rollback, 0);
diff --git a/midx.c b/midx.c
index 05c40a98e0..064670c0c0 100644
--- a/midx.c
+++ b/midx.c
@@ -451,6 +451,24 @@ static int pack_info_compare(const void *_a, const void *_b)
 	return strcmp(a->pack_name, b->pack_name);
 }
 
+static int lookup_idx_or_pack_name(struct pack_info *info,
+				   uint32_t nr,
+				   const char *pack_name)
+{
+	uint32_t lo = 0, hi = nr;
+	while (lo < hi) {
+		uint32_t mi = lo + (hi - lo) / 2;
+		int cmp = cmp_idx_or_pack_name(pack_name, info[mi].pack_name);
+		if (cmp < 0)
+			hi = mi;
+		else if (cmp > 0)
+			lo = mi + 1;
+		else
+			return mi;
+	}
+	return -1;
+}
+
 struct pack_list {
 	struct pack_info *info;
 	uint32_t nr;
@@ -502,6 +520,7 @@ struct pack_midx_entry {
 	uint32_t pack_int_id;
 	time_t pack_mtime;
 	uint64_t offset;
+	unsigned preferred : 1;
 };
 
 static int midx_oid_compare(const void *_a, const void *_b)
@@ -513,6 +532,12 @@ static int midx_oid_compare(const void *_a, const void *_b)
 	if (cmp)
 		return cmp;
 
+	/* Sort objects in a preferred pack first when multiple copies exist. */
+	if (a->preferred > b->preferred)
+		return -1;
+	if (a->preferred < b->preferred)
+		return 1;
+
 	if (a->pack_mtime > b->pack_mtime)
 		return -1;
 	else if (a->pack_mtime < b->pack_mtime)
@@ -540,7 +565,8 @@ static int nth_midxed_pack_midx_entry(struct multi_pack_index *m,
 static void fill_pack_entry(uint32_t pack_int_id,
 			    struct packed_git *p,
 			    uint32_t cur_object,
-			    struct pack_midx_entry *entry)
+			    struct pack_midx_entry *entry,
+			    int preferred)
 {
 	if (nth_packed_object_id(&entry->oid, p, cur_object) < 0)
 		die(_("failed to locate object %d in packfile"), cur_object);
@@ -549,6 +575,7 @@ static void fill_pack_entry(uint32_t pack_int_id,
 	entry->pack_mtime = p->mtime;
 
 	entry->offset = nth_packed_object_offset(p, cur_object);
+	entry->preferred = !!preferred;
 }
 
 /*
@@ -565,7 +592,8 @@ static void fill_pack_entry(uint32_t pack_int_id,
 static struct pack_midx_entry *get_sorted_entries(struct multi_pack_index *m,
 						  struct pack_info *info,
 						  uint32_t nr_packs,
-						  uint32_t *nr_objects)
+						  uint32_t *nr_objects,
+						  uint32_t preferred_pack)
 {
 	uint32_t cur_fanout, cur_pack, cur_object;
 	uint32_t alloc_fanout, alloc_objects, total_objects = 0;
@@ -602,12 +630,17 @@ static struct pack_midx_entry *get_sorted_entries(struct multi_pack_index *m,
 				nth_midxed_pack_midx_entry(m,
 							   &entries_by_fanout[nr_fanout],
 							   cur_object);
+				if (nth_midxed_pack_int_id(m, cur_object) == preferred_pack)
+					entries_by_fanout[nr_fanout].preferred = 1;
+				else
+					entries_by_fanout[nr_fanout].preferred = 0;
 				nr_fanout++;
 			}
 		}
 
 		for (cur_pack = start_pack; cur_pack < nr_packs; cur_pack++) {
 			uint32_t start = 0, end;
+			int preferred = cur_pack == preferred_pack;
 
 			if (cur_fanout)
 				start = get_pack_fanout(info[cur_pack].p, cur_fanout - 1);
@@ -615,7 +648,11 @@ static struct pack_midx_entry *get_sorted_entries(struct multi_pack_index *m,
 
 			for (cur_object = start; cur_object < end; cur_object++) {
 				ALLOC_GROW(entries_by_fanout, nr_fanout + 1, alloc_fanout);
-				fill_pack_entry(cur_pack, info[cur_pack].p, cur_object, &entries_by_fanout[nr_fanout]);
+				fill_pack_entry(cur_pack,
+						info[cur_pack].p,
+						cur_object,
+						&entries_by_fanout[nr_fanout],
+						preferred);
 				nr_fanout++;
 			}
 		}
@@ -794,7 +831,9 @@ static size_t write_midx_large_offsets(struct hashfile *f, uint32_t nr_large_off
 }
 
 static int write_midx_internal(const char *object_dir, struct multi_pack_index *m,
-			       struct string_list *packs_to_drop, unsigned flags)
+			       struct string_list *packs_to_drop,
+			       const char *preferred_pack_name,
+			       unsigned flags)
 {
 	unsigned char cur_chunk, num_chunks = 0;
 	char *midx_name;
@@ -813,6 +852,7 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
 	int pack_name_concat_len = 0;
 	int dropped_packs = 0;
 	int result = 0;
+	int preferred_pack_idx = -1;
 
 	midx_name = get_midx_filename(object_dir);
 	if (safe_create_leading_directories(midx_name))
@@ -853,7 +893,18 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
 	if (packs.m && packs.nr == packs.m->num_packs && !packs_to_drop)
 		goto cleanup;
 
-	entries = get_sorted_entries(packs.m, packs.info, packs.nr, &nr_entries);
+	if (preferred_pack_name) {
+		for (i = 0; i < packs.nr; i++) {
+			if (!cmp_idx_or_pack_name(preferred_pack_name,
+						  packs.info[i].pack_name)) {
+				preferred_pack_idx = i;
+				break;
+			}
+		}
+	}
+
+	entries = get_sorted_entries(packs.m, packs.info, packs.nr, &nr_entries,
+				     preferred_pack_idx);
 
 	for (i = 0; i < nr_entries; i++) {
 		if (entries[i].offset > 0x7fffffff)
@@ -913,6 +964,31 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
 			pack_name_concat_len += strlen(packs.info[i].pack_name) + 1;
 	}
 
+	/*
+	 * Recompute the preferred_pack_idx (if applicable) according to the
+	 * permuted pack order.
+	 */
+	preferred_pack_idx = -1;
+	if (preferred_pack_name) {
+		preferred_pack_idx = lookup_idx_or_pack_name(packs.info,
+							     packs.nr,
+							     preferred_pack_name);
+		if (preferred_pack_idx < 0)
+			warning(_("unknown preferred pack: '%s'"),
+				preferred_pack_name);
+		else {
+			uint32_t orig = packs.info[preferred_pack_idx].orig_pack_int_id;
+			uint32_t perm = pack_perm[orig];
+
+			if (perm == PACK_EXPIRED) {
+				warning(_("preferred pack '%s' is expired"),
+					preferred_pack_name);
+				preferred_pack_idx = -1;
+			} else
+				preferred_pack_idx = perm;
+		}
+	}
+
 	if (pack_name_concat_len % MIDX_CHUNK_ALIGNMENT)
 		pack_name_concat_len += MIDX_CHUNK_ALIGNMENT -
 					(pack_name_concat_len % MIDX_CHUNK_ALIGNMENT);
@@ -1042,9 +1118,12 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
 	return result;
 }
 
-int write_midx_file(const char *object_dir, unsigned flags)
+int write_midx_file(const char *object_dir,
+		    const char *preferred_pack_name,
+		    unsigned flags)
 {
-	return write_midx_internal(object_dir, NULL, NULL, flags);
+	return write_midx_internal(object_dir, NULL, NULL, preferred_pack_name,
+				   flags);
 }
 
 void clear_midx_file(struct repository *r)
@@ -1279,7 +1358,7 @@ int expire_midx_packs(struct repository *r, const char *object_dir, unsigned fla
 	free(count);
 
 	if (packs_to_drop.nr)
-		result = write_midx_internal(object_dir, m, &packs_to_drop, flags);
+		result = write_midx_internal(object_dir, m, &packs_to_drop, NULL, flags);
 
 	string_list_clear(&packs_to_drop, 0);
 	return result;
@@ -1468,7 +1547,7 @@ int midx_repack(struct repository *r, const char *object_dir, size_t batch_size,
 		goto cleanup;
 	}
 
-	result = write_midx_internal(object_dir, m, NULL, flags);
+	result = write_midx_internal(object_dir, m, NULL, NULL, flags);
 	m = NULL;
 
 cleanup:
diff --git a/midx.h b/midx.h
index b18cf53bc4..e7fea61109 100644
--- a/midx.h
+++ b/midx.h
@@ -47,7 +47,7 @@ int fill_midx_entry(struct repository *r, const struct object_id *oid, struct pa
 int midx_contains_pack(struct multi_pack_index *m, const char *idx_or_pack_name);
 int prepare_multi_pack_index_one(struct repository *r, const char *object_dir, int local);
 
-int write_midx_file(const char *object_dir, unsigned flags);
+int write_midx_file(const char *object_dir, const char *preferred_pack_name, unsigned flags);
 void clear_midx_file(struct repository *r);
 int verify_midx_file(struct repository *r, const char *object_dir, unsigned flags);
 int expire_midx_packs(struct repository *r, const char *object_dir, unsigned flags);
diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh
index 2fc3aadbd1..9304b33484 100755
--- a/t/t5319-multi-pack-index.sh
+++ b/t/t5319-multi-pack-index.sh
@@ -31,6 +31,14 @@ midx_read_expect () {
 	test_cmp expect actual
 }
 
+midx_expect_object_offset () {
+	OID="$1"
+	OFFSET="$2"
+	OBJECT_DIR="$3"
+	test-tool read-midx --show-objects $OBJECT_DIR >actual &&
+	grep "^$OID $OFFSET" actual
+}
+
 test_expect_success 'setup' '
 	test_oid_cache <<-EOF
 	idxoff sha1:2999
@@ -234,6 +242,37 @@ test_expect_success 'warn on improper hash version' '
 	)
 '
 
+test_expect_success 'midx picks objects from preferred pack' '
+	test_when_finished rm -rf preferred.git &&
+	git init --bare preferred.git &&
+	(
+		cd preferred.git &&
+
+		a=$(echo "a" | git hash-object -w --stdin) &&
+		b=$(echo "b" | git hash-object -w --stdin) &&
+		c=$(echo "c" | git hash-object -w --stdin) &&
+
+		# Set up two packs, duplicating the object "B" at different
+		# offsets.
+		git pack-objects objects/pack/test-AB <<-EOF &&
+		$a
+		$b
+		EOF
+		bc=$(git pack-objects objects/pack/test-BC <<-EOF
+		$b
+		$c
+		EOF
+		) &&
+
+		git multi-pack-index --object-dir=objects \
+			--preferred-pack=test-BC-$bc.idx write 2>err &&
+		test_must_be_empty err &&
+
+		ofs=$(git show-index <objects/pack/test-BC-$bc.idx | grep $b |
+			cut -d" " -f1) &&
+		midx_expect_object_offset $b $ofs objects
+	)
+'
 
 test_expect_success 'verify multi-pack-index success' '
 	git multi-pack-index verify --object-dir=$objdir
-- 
2.30.0.667.g81c0cbc6fd



* [PATCH 3/9] midx: don't free midx_name early
  2021-02-10 23:02 [PATCH 0/9] midx: implement a multi-pack reverse index Taylor Blau
  2021-02-10 23:02 ` [PATCH 1/9] t/helper/test-read-midx.c: add '--show-objects' Taylor Blau
  2021-02-10 23:02 ` [PATCH 2/9] midx: allow marking a pack as preferred Taylor Blau
@ 2021-02-10 23:02 ` Taylor Blau
  2021-02-10 23:02 ` [PATCH 4/9] midx: keep track of the checksum Taylor Blau
                   ` (10 subsequent siblings)
  13 siblings, 0 replies; 171+ messages in thread
From: Taylor Blau @ 2021-02-10 23:02 UTC (permalink / raw)
  To: git; +Cc: dstolee, gitster, peff

A subsequent patch will need to refer back to 'midx_name' later on in
the function. In fact, this variable is already free()'d later on, so
this makes the later free() no longer redundant.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 midx.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/midx.c b/midx.c
index 064670c0c0..34fb9de3f3 100644
--- a/midx.c
+++ b/midx.c
@@ -995,7 +995,6 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
 
 	hold_lock_file_for_update(&lk, midx_name, LOCK_DIE_ON_ERROR);
 	f = hashfd(get_lock_file_fd(&lk), get_lock_file_path(&lk));
-	FREE_AND_NULL(midx_name);
 
 	if (packs.m)
 		close_midx(packs.m);
-- 
2.30.0.667.g81c0cbc6fd



* [PATCH 4/9] midx: keep track of the checksum
  2021-02-10 23:02 [PATCH 0/9] midx: implement a multi-pack reverse index Taylor Blau
                   ` (2 preceding siblings ...)
  2021-02-10 23:02 ` [PATCH 3/9] midx: don't free midx_name early Taylor Blau
@ 2021-02-10 23:02 ` Taylor Blau
  2021-02-11  2:33   ` Derrick Stolee
  2021-02-10 23:03 ` [PATCH 5/9] midx: make some functions non-static Taylor Blau
                   ` (9 subsequent siblings)
  13 siblings, 1 reply; 171+ messages in thread
From: Taylor Blau @ 2021-02-10 23:02 UTC (permalink / raw)
  To: git; +Cc: dstolee, gitster, peff

write_midx_internal() uses a hashfile to write the multi-pack index, but
discards its checksum. This makes sense, since nothing that takes place
after writing the MIDX cares about its checksum.

That is about to change in a subsequent patch, when the optional
reverse index corresponding to the MIDX will want to include the MIDX's
checksum.

Store the checksum of the MIDX in preparation for that.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 midx.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/midx.c b/midx.c
index 34fb9de3f3..6e47c726af 100644
--- a/midx.c
+++ b/midx.c
@@ -837,6 +837,7 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
 {
 	unsigned char cur_chunk, num_chunks = 0;
 	char *midx_name;
+	unsigned char midx_hash[GIT_MAX_RAWSZ];
 	uint32_t i;
 	struct hashfile *f = NULL;
 	struct lock_file lk;
@@ -1098,7 +1099,7 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
 		    written,
 		    chunk_offsets[num_chunks]);
 
-	finalize_hashfile(f, NULL, CSUM_FSYNC | CSUM_HASH_IN_STREAM);
+	finalize_hashfile(f, midx_hash, CSUM_FSYNC | CSUM_HASH_IN_STREAM);
 	commit_lock_file(&lk);
 
 cleanup:
-- 
2.30.0.667.g81c0cbc6fd



* [PATCH 5/9] midx: make some functions non-static
  2021-02-10 23:02 [PATCH 0/9] midx: implement a multi-pack reverse index Taylor Blau
                   ` (3 preceding siblings ...)
  2021-02-10 23:02 ` [PATCH 4/9] midx: keep track of the checksum Taylor Blau
@ 2021-02-10 23:03 ` Taylor Blau
  2021-02-10 23:03 ` [PATCH 6/9] Documentation/technical: describe multi-pack reverse indexes Taylor Blau
                   ` (8 subsequent siblings)
  13 siblings, 0 replies; 171+ messages in thread
From: Taylor Blau @ 2021-02-10 23:03 UTC (permalink / raw)
  To: git; +Cc: dstolee, gitster, peff

In a subsequent commit, pack-revindex.c will become responsible for
sorting a list of objects in the "MIDX pack order" (which will be
defined in the following patch). To do so, it will need to know the
pack identifier and offset within that pack for each object in the MIDX.

The MIDX code already has functions for doing just that
(nth_midxed_offset() and nth_midxed_pack_int_id()), but they are
statically declared.

Since there is no reason that they couldn't be exposed publicly, and
because they are already doing exactly what the caller in
pack-revindex.c will want, expose them publicly so that they can be
reused there.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 midx.c | 4 ++--
 midx.h | 2 ++
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/midx.c b/midx.c
index 6e47c726af..bf258c4fde 100644
--- a/midx.c
+++ b/midx.c
@@ -260,7 +260,7 @@ struct object_id *nth_midxed_object_oid(struct object_id *oid,
 	return oid;
 }
 
-static off_t nth_midxed_offset(struct multi_pack_index *m, uint32_t pos)
+off_t nth_midxed_offset(struct multi_pack_index *m, uint32_t pos)
 {
 	const unsigned char *offset_data;
 	uint32_t offset32;
@@ -279,7 +279,7 @@ static off_t nth_midxed_offset(struct multi_pack_index *m, uint32_t pos)
 	return offset32;
 }
 
-static uint32_t nth_midxed_pack_int_id(struct multi_pack_index *m, uint32_t pos)
+uint32_t nth_midxed_pack_int_id(struct multi_pack_index *m, uint32_t pos)
 {
 	return get_be32(m->chunk_object_offsets + pos * MIDX_CHUNK_OFFSET_WIDTH);
 }
diff --git a/midx.h b/midx.h
index e7fea61109..93bd68189e 100644
--- a/midx.h
+++ b/midx.h
@@ -40,6 +40,8 @@ struct multi_pack_index {
 struct multi_pack_index *load_multi_pack_index(const char *object_dir, int local);
 int prepare_midx_pack(struct repository *r, struct multi_pack_index *m, uint32_t pack_int_id);
 int bsearch_midx(const struct object_id *oid, struct multi_pack_index *m, uint32_t *result);
+off_t nth_midxed_offset(struct multi_pack_index *m, uint32_t pos);
+uint32_t nth_midxed_pack_int_id(struct multi_pack_index *m, uint32_t pos);
 struct object_id *nth_midxed_object_oid(struct object_id *oid,
 					struct multi_pack_index *m,
 					uint32_t n);
-- 
2.30.0.667.g81c0cbc6fd



* [PATCH 6/9] Documentation/technical: describe multi-pack reverse indexes
  2021-02-10 23:02 [PATCH 0/9] midx: implement a multi-pack reverse index Taylor Blau
                   ` (4 preceding siblings ...)
  2021-02-10 23:03 ` [PATCH 5/9] midx: make some functions non-static Taylor Blau
@ 2021-02-10 23:03 ` Taylor Blau
  2021-02-11  2:48   ` Derrick Stolee
  2021-02-10 23:03 ` [PATCH 7/9] pack-revindex: read " Taylor Blau
                   ` (7 subsequent siblings)
  13 siblings, 1 reply; 171+ messages in thread
From: Taylor Blau @ 2021-02-10 23:03 UTC (permalink / raw)
  To: git; +Cc: dstolee, gitster, peff

As a prerequisite to implementing multi-pack bitmaps, motivate and
describe the format and ordering of the multi-pack reverse index.

The subsequent patch will implement reading this format, and the patch
after that will implement writing it while producing a multi-pack index.

Co-authored-by: Jeff King <peff@peff.net>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 Documentation/technical/pack-format.txt | 83 +++++++++++++++++++++++++
 1 file changed, 83 insertions(+)

diff --git a/Documentation/technical/pack-format.txt b/Documentation/technical/pack-format.txt
index 8833b71c8b..a14722f119 100644
--- a/Documentation/technical/pack-format.txt
+++ b/Documentation/technical/pack-format.txt
@@ -376,3 +376,86 @@ CHUNK DATA:
 TRAILER:
 
 	Index checksum of the above contents.
+
+== multi-pack-index reverse indexes
+
+Similar to the pack-based reverse index, the multi-pack index can also
+be used to generate a reverse index.
+
+Instead of mapping between offset, pack-, and index position, this
+reverse index maps between an object's position within the midx, and
+that object's position within a pseudo-pack that the midx describes.
+Crucially, the objects' positions within this pseudo-pack are the same
+as their bit positions in a multi-pack reachability bitmap.
+
+As a motivating example, consider the multi-pack reachability bitmap
+(which does not yet exist, but is what we are building towards here). We
+need each bit to correspond to an object covered by the midx, and we
+need to be able to convert bit positions back to index positions (from
+which we can get the oid, etc).
+
+One solution is to let each bit position in the index correspond to
+the same position in the oid-sorted index stored by the midx. But
+because oids are effectively random, the resulting reachability
+bitmaps would have no locality, and thus compress poorly. (This is the
+reason that single-pack bitmaps use the pack ordering, and not the .idx
+ordering, for the same purpose.)
+
+So we'd like to define an ordering for the whole midx based around
+pack ordering. We can think of it as a pseudo-pack created by the
+concatenation of all of the packs in the midx. E.g., if we had a midx
+with three packs (a, b, c), with 10, 15, and 20 objects respectively, we
+can imagine an ordering of the objects like:
+
+    |a,0|a,1|...|a,9|b,0|b,1|...|b,14|c,0|c,1|...|c,19|
+
+where the ordering of the packs is defined by the midx's pack list,
+and then the ordering of objects within each pack is the same as the
+order in the actual packfile.
+
+Given the list of packs and their counts of objects, you can
+na&iuml;vely reconstruct that pseudo-pack ordering (e.g., the object at
+position 27 must be (c,1) because packs "a" and "b" consumed 25 of the
+slots). But there's a catch. Objects may be duplicated between packs, in
+which case the midx only stores one pointer to the object (and thus we'd
+want only one slot in the bitmap).
+
+Callers could handle duplicates themselves by reading objects in order
+of their bit-position, but that's linear in the number of objects, and
+much too expensive for ordinary bitmap lookups. Building a reverse index
+solves this, since it is the logical inverse of the index, and that
+index has already removed duplicates. But, building a reverse index on
+the fly can be expensive. Since we already have an on-disk format for
+pack-based reverse indexes, let's reuse it for the midx's pseudo-pack,
+too.
+
+Objects from the midx are ordered as follows to string together the
+pseudo-pack. Let _pack(o)_ return the pack from which _o_ was selected
+by the midx, and define an ordering of packs based on their numeric ID
+(as stored by the midx). Let _offset(o)_ return the object offset of _o_
+within _pack(o)_. Then, compare _o~1~_ and _o~2~_ as follows:
+
+  - If one of _pack(o~1~)_ and _pack(o~2~)_ is preferred and the other
+    is not, then the preferred one sorts first.
++
+(This is a detail that allows the midx bitmap to determine which
+pack should be used by the pack-reuse mechanism, since it can ask
+the midx for the pack containing the object at bit position 0).
+
+  - If _pack(o~1~) &ne; pack(o~2~)_, then sort the two objects in
+    ascending order based on the pack ID.
+
+  - Otherwise, _pack(o~1~) &equals; pack(o~2~)_, and the objects are
+    sorted in pack-order (i.e., _o~1~_ sorts ahead of _o~2~_ exactly
+    when _offset(o~1~) &lt; offset(o~2~)_).
+
+In short, a midx's pseudo-pack is the de-duplicated concatenation of
+objects in packs stored by the midx, laid out in pack order, and the
+packs arranged in midx order (with the preferred pack coming first).
+
+Finally, note that the midx's reverse index is not stored as a chunk in
+the multi-pack-index itself. This is done because the reverse index
+includes the checksum of the pack or midx to which it belongs, which
+makes it impossible to write in the midx. To avoid races when rewriting
+the midx, a midx reverse index includes the midx's checksum in its
+filename (e.g., `multi-pack-index-xyz.rev`).
-- 
2.30.0.667.g81c0cbc6fd

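As a rough illustration of the naive reconstruction described in the
documentation above, here is a minimal sketch that maps a pseudo-pack
position to a (pack, object) pair using only the per-pack object counts.
The names `pseudo_slot` and `naive_pseudo_pack_slot` are hypothetical and
do not appear in the series, and positions are counted from zero. Under
that convention slot 26 is (c,1), which is the 27th object when counting
from one. The sketch deliberately ignores duplicates, which is exactly
the gap the reverse index is meant to close.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result type: which pack a slot falls in, and where. */
struct pseudo_slot {
	size_t pack;	/* index into the pack list (0 = "a", 1 = "b", ...) */
	uint32_t nth;	/* zero-based object number within that pack */
};

/*
 * Map a zero-based pseudo-pack position to (pack, nth) using only the
 * per-pack object counts. Returns 0 on success, or -1 if pos is past the
 * end of the concatenation. Duplicated objects are NOT accounted for.
 */
static int naive_pseudo_pack_slot(const uint32_t *counts, size_t nr_packs,
				  uint32_t pos, struct pseudo_slot *out)
{
	size_t i;

	assert(counts && out);
	for (i = 0; i < nr_packs; i++) {
		if (pos < counts[i]) {
			out->pack = i;
			out->nth = pos;
			return 0;
		}
		/* Skip over every slot consumed by this pack. */
		pos -= counts[i];
	}
	return -1;
}
```

With counts of 10, 15, and 20, position 24 resolves to (b,14) and
position 25 to (c,0), matching the diagram in the documentation.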


* [PATCH 7/9] pack-revindex: read multi-pack reverse indexes
  2021-02-10 23:02 [PATCH 0/9] midx: implement a multi-pack reverse index Taylor Blau
                   ` (5 preceding siblings ...)
  2021-02-10 23:03 ` [PATCH 6/9] Documentation/technical: describe multi-pack reverse indexes Taylor Blau
@ 2021-02-10 23:03 ` Taylor Blau
  2021-02-11  2:53   ` Derrick Stolee
  2021-02-11  7:54   ` Junio C Hamano
  2021-02-10 23:03 ` [PATCH 8/9] pack-write.c: extract 'write_rev_file_order' Taylor Blau
                   ` (6 subsequent siblings)
  13 siblings, 2 replies; 171+ messages in thread
From: Taylor Blau @ 2021-02-10 23:03 UTC (permalink / raw)
  To: git; +Cc: dstolee, gitster, peff

Implement reading for multi-pack reverse indexes, as described in the
previous patch.

Note that these functions don't yet have any callers, and won't until
multi-pack reachability bitmaps are introduced in a later patch series.
In the meantime, this patch implements some of the infrastructure
necessary to support multi-pack bitmaps.

There are three new functions exposed by the revindex API:

  - load_midx_revindex(): loads the reverse index corresponding to the
    given multi-pack index.

  - midx_to_pack_pos() and pack_pos_to_midx(): these convert between the
    multi-pack index and pseudo-pack order.

load_midx_revindex() and pack_pos_to_midx() are both relatively
straightforward.

load_midx_revindex() needs a few functions to be exposed from the midx
API. One to get the checksum of a midx, and another to get the .rev's
filename. Similar to recent changes in the packed_git struct, three new
fields are added to the multi_pack_index struct: one to keep track of
the size, one to keep track of the mmap'd pointer, and another to point
past the header and at the reverse index's data.

pack_pos_to_midx() simply reads the corresponding entry out of the
table.

midx_to_pack_pos() is the trickiest, since it needs to find an object's
position in the pseudo-pack order, but that order can only be recovered
from the .rev file itself. This mapping can be implemented with a binary
search, but note that the thing we're binary searching over isn't an
array, but rather a _permutation_.

So, when comparing two items, it's helpful to keep in mind the
difference. Instead of a traditional binary search, where you are
comparing two things directly, here we're comparing a (pack, offset)
tuple with an index into the multi-pack index. That index describes
another (pack, offset) tuple, and it is _those_ two tuples that are
compared.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
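To make the "binary search over a permutation" idea from the message
above concrete, here is a small standalone sketch. It is not the code in
this patch: `toy_entry`, `toy_cmp`, and `toy_midx_to_pack_pos` are
hypothetical names, the MIDX is modeled as a plain array of
(pack, offset) tuples, and an explicit loop is used instead of bsearch().
The point it tries to show is that each probe first dereferences the
permutation and only then compares tuples.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for what the MIDX records per object. */
struct toy_entry {
	uint32_t pack;		/* pack ID */
	uint64_t offset;	/* offset within that pack */
};

/* Preferred pack first, then ascending pack ID, then ascending offset. */
static int toy_cmp(const struct toy_entry *a, const struct toy_entry *b,
		   uint32_t preferred)
{
	int a_pref = a->pack == preferred;
	int b_pref = b->pack == preferred;

	if (a_pref != b_pref)
		return a_pref ? -1 : 1;
	if (a->pack != b->pack)
		return a->pack < b->pack ? -1 : 1;
	if (a->offset != b->offset)
		return a->offset < b->offset ? -1 : 1;
	return 0;
}

/*
 * rev[pos] holds the MIDX position of the object at pseudo-pack position
 * pos. Find pos such that rev[pos] == at. Returns 0 on success, -1 if no
 * such position exists.
 */
static int toy_midx_to_pack_pos(const struct toy_entry *entries,
				const uint32_t *rev, uint32_t nr,
				uint32_t at, uint32_t *pos)
{
	uint32_t lo = 0, hi = nr;
	uint32_t preferred;

	assert(nr > 0 && at < nr);
	/* The first object in pseudo-pack order names the preferred pack. */
	preferred = entries[rev[0]].pack;

	while (lo < hi) {
		uint32_t mid = lo + (hi - lo) / 2;
		/* Follow the permutation, then compare the two tuples. */
		int cmp = toy_cmp(&entries[at], &entries[rev[mid]], preferred);

		if (!cmp) {
			*pos = mid;
			return 0;
		}
		if (cmp < 0)
			hi = mid;
		else
			lo = mid + 1;
	}
	return -1;
}
```

The search is valid only because rev[] is already sorted by the same
ordering that toy_cmp() implements, so the comparator and the writer of
the table have to agree.
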
 midx.c          |  11 +++++
 midx.h          |   6 +++
 pack-revindex.c | 112 ++++++++++++++++++++++++++++++++++++++++++++++++
 pack-revindex.h |  46 ++++++++++++++++++++
 packfile.c      |   3 ++
 5 files changed, 178 insertions(+)

diff --git a/midx.c b/midx.c
index bf258c4fde..12bfce8bb1 100644
--- a/midx.c
+++ b/midx.c
@@ -48,11 +48,22 @@ static uint8_t oid_version(void)
 	}
 }
 
+static const unsigned char *get_midx_checksum(struct multi_pack_index *m)
+{
+	return m->data + m->data_len - the_hash_algo->rawsz;
+}
+
 static char *get_midx_filename(const char *object_dir)
 {
 	return xstrfmt("%s/pack/multi-pack-index", object_dir);
 }
 
+char *get_midx_rev_filename(struct multi_pack_index *m)
+{
+	return xstrfmt("%s/pack/multi-pack-index-%s.rev",
+		       m->object_dir, hash_to_hex(get_midx_checksum(m)));
+}
+
 struct multi_pack_index *load_multi_pack_index(const char *object_dir, int local)
 {
 	struct multi_pack_index *m = NULL;
diff --git a/midx.h b/midx.h
index 93bd68189e..0a8294d2ee 100644
--- a/midx.h
+++ b/midx.h
@@ -15,6 +15,10 @@ struct multi_pack_index {
 	const unsigned char *data;
 	size_t data_len;
 
+	const uint32_t *revindex_data;
+	const uint32_t *revindex_map;
+	size_t revindex_len;
+
 	uint32_t signature;
 	unsigned char version;
 	unsigned char hash_len;
@@ -37,6 +41,8 @@ struct multi_pack_index {
 
 #define MIDX_PROGRESS     (1 << 0)
 
+char *get_midx_rev_filename(struct multi_pack_index *m);
+
 struct multi_pack_index *load_multi_pack_index(const char *object_dir, int local);
 int prepare_midx_pack(struct repository *r, struct multi_pack_index *m, uint32_t pack_int_id);
 int bsearch_midx(const struct object_id *oid, struct multi_pack_index *m, uint32_t *result);
diff --git a/pack-revindex.c b/pack-revindex.c
index 83fe4de773..da4101f4b2 100644
--- a/pack-revindex.c
+++ b/pack-revindex.c
@@ -3,6 +3,7 @@
 #include "object-store.h"
 #include "packfile.h"
 #include "config.h"
+#include "midx.h"
 
 struct revindex_entry {
 	off_t offset;
@@ -292,6 +293,29 @@ int load_pack_revindex(struct packed_git *p)
 	return -1;
 }
 
+int load_midx_revindex(struct multi_pack_index *m)
+{
+	char *revindex_name;
+	int ret;
+	if (m->revindex_data)
+		return 0;
+
+	revindex_name = get_midx_rev_filename(m);
+
+	ret = load_revindex_from_disk(revindex_name,
+				      m->num_objects,
+				      &m->revindex_map,
+				      &m->revindex_len);
+	if (ret)
+		goto cleanup;
+
+	m->revindex_data = (const uint32_t *)((const char *)m->revindex_map + RIDX_HEADER_SIZE);
+
+cleanup:
+	free(revindex_name);
+	return ret;
+}
+
 int offset_to_pack_pos(struct packed_git *p, off_t ofs, uint32_t *pos)
 {
 	unsigned lo, hi;
@@ -346,3 +370,91 @@ off_t pack_pos_to_offset(struct packed_git *p, uint32_t pos)
 	else
 		return nth_packed_object_offset(p, pack_pos_to_index(p, pos));
 }
+
+uint32_t pack_pos_to_midx(struct multi_pack_index *m, uint32_t pos)
+{
+	if (!m->revindex_data)
+		BUG("pack_pos_to_midx: reverse index not yet loaded");
+	if (m->num_objects <= pos)
+		BUG("pack_pos_to_midx: out-of-bounds object at %"PRIu32, pos);
+	return get_be32((const char *)m->revindex_data + (pos * sizeof(uint32_t)));
+}
+
+struct midx_pack_key {
+	uint32_t pack;
+	off_t offset;
+
+	uint32_t preferred_pack;
+	struct multi_pack_index *midx;
+};
+
+static int midx_pack_order_cmp(const void *va, const void *vb)
+{
+	const struct midx_pack_key *key = va;
+	struct multi_pack_index *midx = key->midx;
+
+	uint32_t versus = pack_pos_to_midx(midx, (uint32_t*)vb - (const uint32_t *)midx->revindex_data);
+	uint32_t versus_pack = nth_midxed_pack_int_id(midx, versus);
+	off_t versus_offset;
+
+	uint32_t key_preferred = key->pack == key->preferred_pack;
+	uint32_t versus_preferred = versus_pack == key->preferred_pack;
+
+	/*
+	 * First, compare the preferred-ness, noting that the preferred pack
+	 * comes first.
+	 */
+	if (key_preferred && !versus_preferred)
+		return -1;
+	else if (!key_preferred && versus_preferred)
+		return 1;
+
+	/* Then, break ties first by comparing the pack IDs. */
+	if (key->pack < versus_pack)
+		return -1;
+	else if (key->pack > versus_pack)
+		return 1;
+
+	/* Finally, break ties by comparing offsets within a pack. */
+	versus_offset = nth_midxed_offset(midx, versus);
+	if (key->offset < versus_offset)
+		return -1;
+	else if (key->offset > versus_offset)
+		return 1;
+
+	return 0;
+}
+
+int midx_to_pack_pos(struct multi_pack_index *m, uint32_t at, uint32_t *pos)
+{
+	struct midx_pack_key key;
+	uint32_t *found;
+
+	if (!m->revindex_data)
+		BUG("midx_to_pack_pos: reverse index not yet loaded");
+	if (m->num_objects <= at)
+		BUG("midx_to_pack_pos: out-of-bounds object at %"PRIu32, at);
+
+	key.pack = nth_midxed_pack_int_id(m, at);
+	key.offset = nth_midxed_offset(m, at);
+	key.midx = m;
+	/*
+	 * The preferred pack sorts first, so determine its identifier by
+	 * looking at the first object in pseudo-pack order.
+	 *
+	 * Note that if no --preferred-pack is explicitly given when writing a
+	 * multi-pack index, then whichever pack has the lowest identifier
+	 * implicitly is preferred (and includes all its objects, since ties are
+	 * broken first by pack identifier).
+	 */
+	key.preferred_pack = nth_midxed_pack_int_id(m, pack_pos_to_midx(m, 0));
+
+	found = bsearch(&key, m->revindex_data, m->num_objects,
+			sizeof(uint32_t), midx_pack_order_cmp);
+
+	if (!found)
+		return error("bad offset for revindex");
+
+	*pos = found - m->revindex_data;
+	return 0;
+}
diff --git a/pack-revindex.h b/pack-revindex.h
index ba7c82c125..49d604aa9a 100644
--- a/pack-revindex.h
+++ b/pack-revindex.h
@@ -14,6 +14,20 @@
  *
  * - offset: the byte offset within the .pack file at which the object contents
  *   can be found
+ *
+ * The revindex can also be used with a multi-pack index (MIDX). In this
+ * setting:
+ *
+ *   - index position refers to an object's numeric position within the MIDX
+ *
+ *   - pack position refers to an object's position within a non-existent pack
+ *     described by the MIDX. The pack structure is described in
+ *     Documentation/technical/pack-format.txt.
+ *
+ *     It is effectively a concatenation of all packs in the MIDX (ordered by
+ *     their numeric ID within the MIDX, with objects in their original order
+ *     within each pack), removing duplicates, and placing the preferred pack
+ *     (if any) first.
  */
 
 
@@ -24,6 +38,7 @@
 #define GIT_TEST_REV_INDEX_DIE_IN_MEMORY "GIT_TEST_REV_INDEX_DIE_IN_MEMORY"
 
 struct packed_git;
+struct multi_pack_index;
 
 /*
  * load_pack_revindex populates the revindex's internal data-structures for the
@@ -34,6 +49,15 @@ struct packed_git;
  */
 int load_pack_revindex(struct packed_git *p);
 
+/*
+ * load_midx_revindex loads the '.rev' file corresponding to the given
+ * multi-pack index by mmap-ing it and assigning pointers in the
+ * multi_pack_index to point at it.
+ *
+ * A negative number is returned on error.
+ */
+int load_midx_revindex(struct multi_pack_index *m);
+
 /*
  * offset_to_pack_pos converts an object offset to a pack position. This
  * function returns zero on success, and a negative number otherwise. The
@@ -71,4 +95,26 @@ uint32_t pack_pos_to_index(struct packed_git *p, uint32_t pos);
  */
 off_t pack_pos_to_offset(struct packed_git *p, uint32_t pos);
 
+/*
+ * pack_pos_to_midx converts the object at position "pos" within the MIDX
+ * pseudo-pack into a MIDX position.
+ *
+ * If the reverse index has not yet been loaded, or the position is out of
+ * bounds, this function aborts.
+ *
+ * This function runs in constant time.
+ */
+uint32_t pack_pos_to_midx(struct multi_pack_index *m, uint32_t pos);
+
+/*
+ * midx_to_pack_pos converts from the MIDX-relative position at "at" to the
+ * corresponding pack position.
+ *
+ * If the reverse index has not yet been loaded, or the position is out of
+ * bounds, this function aborts.
+ *
+ * This function runs in time O(log N) with the number of objects in the MIDX.
+ */
+int midx_to_pack_pos(struct multi_pack_index *midx, uint32_t at, uint32_t *pos);
+
 #endif
diff --git a/packfile.c b/packfile.c
index 1fec12ac5f..82623e0cb4 100644
--- a/packfile.c
+++ b/packfile.c
@@ -862,6 +862,9 @@ static void prepare_pack(const char *full_name, size_t full_name_len,
 
 	if (!strcmp(file_name, "multi-pack-index"))
 		return;
+	if (starts_with(file_name, "multi-pack-index") &&
+	    ends_with(file_name, ".rev"))
+		return;
 	if (ends_with(file_name, ".idx") ||
 	    ends_with(file_name, ".rev") ||
 	    ends_with(file_name, ".pack") ||
-- 
2.30.0.667.g81c0cbc6fd



* [PATCH 8/9] pack-write.c: extract 'write_rev_file_order'
  2021-02-10 23:02 [PATCH 0/9] midx: implement a multi-pack reverse index Taylor Blau
                   ` (6 preceding siblings ...)
  2021-02-10 23:03 ` [PATCH 7/9] pack-revindex: read " Taylor Blau
@ 2021-02-10 23:03 ` Taylor Blau
  2021-02-10 23:03 ` [PATCH 9/9] pack-revindex: write multi-pack reverse indexes Taylor Blau
                   ` (5 subsequent siblings)
  13 siblings, 0 replies; 171+ messages in thread
From: Taylor Blau @ 2021-02-10 23:03 UTC (permalink / raw)
  To: git; +Cc: dstolee, gitster, peff

Existing callers provide the reverse index code with an array of 'struct
pack_idx_entry *'s, which is then sorted by pack order (comparing the
offsets of each object within the pack).

Prepare for the multi-pack index to write a .rev file by providing a way
to write the reverse index without an array of pack_idx_entry (which the
MIDX code does not have).

Instead, callers can invoke 'write_rev_file_order()', which takes
an array of uint32_t's. The ith entry in this array gives the index
position of the ith object in pack order.

Expose this new function for use in a later patch, and rewrite the
existing write_rev_file() in terms of this new function.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
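For readers unfamiliar with the shape of the array that
write_rev_file_order() receives, the following is a minimal sketch of how
such an array can be built from per-object offsets. It is only an
illustration: `ofs_idx`, `ofs_idx_cmp`, and `toy_pack_order` are
hypothetical names, and plain qsort() on (offset, index) pairs stands in
for the QSORT_S() call used in the diff below.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper pairing an offset with its index position. */
struct ofs_idx {
	uint64_t offset;	/* object's offset within the pack */
	uint32_t index;		/* object's position in index (oid) order */
};

static int ofs_idx_cmp(const void *va, const void *vb)
{
	const struct ofs_idx *a = va, *b = vb;

	if (a->offset < b->offset)
		return -1;
	if (a->offset > b->offset)
		return 1;
	return 0;
}

/*
 * Fill pack_order[] so that pack_order[k] is the index position of the
 * object with the k-th smallest offset. offsets[] is given in index
 * order. Returns 0 on success, -1 on allocation failure.
 */
static int toy_pack_order(const uint64_t *offsets, uint32_t nr,
			  uint32_t *pack_order)
{
	struct ofs_idx *tmp;
	uint32_t i;

	if (!nr)
		return 0;
	assert(offsets && pack_order);

	tmp = malloc(nr * sizeof(*tmp));
	if (!tmp)
		return -1;
	for (i = 0; i < nr; i++) {
		tmp[i].offset = offsets[i];
		tmp[i].index = i;
	}
	qsort(tmp, nr, sizeof(*tmp), ofs_idx_cmp);
	for (i = 0; i < nr; i++)
		pack_order[i] = tmp[i].index;

	free(tmp);
	return 0;
}
```

For example, offsets of 300, 12, 150, and 90 (in index order) would yield
the array 1, 3, 2, 0.
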
 pack-write.c | 39 ++++++++++++++++++++++++++++-----------
 pack.h       |  1 +
 2 files changed, 29 insertions(+), 11 deletions(-)

diff --git a/pack-write.c b/pack-write.c
index 680c36755d..75fcf70db1 100644
--- a/pack-write.c
+++ b/pack-write.c
@@ -201,21 +201,12 @@ static void write_rev_header(struct hashfile *f)
 }
 
 static void write_rev_index_positions(struct hashfile *f,
-				      struct pack_idx_entry **objects,
+				      uint32_t *pack_order,
 				      uint32_t nr_objects)
 {
-	uint32_t *pack_order;
 	uint32_t i;
-
-	ALLOC_ARRAY(pack_order, nr_objects);
-	for (i = 0; i < nr_objects; i++)
-		pack_order[i] = i;
-	QSORT_S(pack_order, nr_objects, pack_order_cmp, objects);
-
 	for (i = 0; i < nr_objects; i++)
 		hashwrite_be32(f, pack_order[i]);
-
-	free(pack_order);
 }
 
 static void write_rev_trailer(struct hashfile *f, const unsigned char *hash)
@@ -228,6 +219,32 @@ const char *write_rev_file(const char *rev_name,
 			   uint32_t nr_objects,
 			   const unsigned char *hash,
 			   unsigned flags)
+{
+	uint32_t *pack_order;
+	uint32_t i;
+	const char *ret;
+
+	if (!(flags & (WRITE_REV | WRITE_REV_VERIFY)))
+		return NULL;
+
+	ALLOC_ARRAY(pack_order, nr_objects);
+	for (i = 0; i < nr_objects; i++)
+		pack_order[i] = i;
+	QSORT_S(pack_order, nr_objects, pack_order_cmp, objects);
+
+	ret = write_rev_file_order(rev_name, pack_order, nr_objects, hash,
+				   flags);
+
+	free(pack_order);
+
+	return ret;
+}
+
+const char *write_rev_file_order(const char *rev_name,
+				 uint32_t *pack_order,
+				 uint32_t nr_objects,
+				 const unsigned char *hash,
+				 unsigned flags)
 {
 	struct hashfile *f;
 	int fd;
@@ -262,7 +279,7 @@ const char *write_rev_file(const char *rev_name,
 
 	write_rev_header(f);
 
-	write_rev_index_positions(f, objects, nr_objects);
+	write_rev_index_positions(f, pack_order, nr_objects);
 	write_rev_trailer(f, hash);
 
 	if (rev_name && adjust_shared_perm(rev_name) < 0)
diff --git a/pack.h b/pack.h
index afdcf8f5c7..09c2a7dd3a 100644
--- a/pack.h
+++ b/pack.h
@@ -94,6 +94,7 @@ struct ref;
 void write_promisor_file(const char *promisor_name, struct ref **sought, int nr_sought);
 
 const char *write_rev_file(const char *rev_name, struct pack_idx_entry **objects, uint32_t nr_objects, const unsigned char *hash, unsigned flags);
+const char *write_rev_file_order(const char *rev_name, uint32_t *pack_order, uint32_t nr_objects, const unsigned char *hash, unsigned flags);
 
 /*
  * The "hdr" output buffer should be at least this big, which will handle sizes
-- 
2.30.0.667.g81c0cbc6fd



* [PATCH 9/9] pack-revindex: write multi-pack reverse indexes
  2021-02-10 23:02 [PATCH 0/9] midx: implement a multi-pack reverse index Taylor Blau
                   ` (7 preceding siblings ...)
  2021-02-10 23:03 ` [PATCH 8/9] pack-write.c: extract 'write_rev_file_order' Taylor Blau
@ 2021-02-10 23:03 ` Taylor Blau
  2021-02-11  2:58 ` [PATCH 0/9] midx: implement a multi-pack reverse index Derrick Stolee
                   ` (4 subsequent siblings)
  13 siblings, 0 replies; 171+ messages in thread
From: Taylor Blau @ 2021-02-10 23:03 UTC (permalink / raw)
  To: git; +Cc: dstolee, gitster, peff

Implement the writing half of multi-pack reverse indexes. This is
nothing more than the format described a few patches ago, with a new set
of helper functions that will be used to clear out stale .rev files
corresponding to old MIDXs.

Unfortunately, a very similar comparison function as the one implemented
recently in pack-revindex.c is reimplemented here, this time accepting a
MIDX-internal type. An effort to DRY these up would create more
indirection and overhead than is necessary, so it isn't pursued here.

Currently, there are no callers which pass the MIDX_WRITE_REV_INDEX
flag, meaning that this is all dead code. But, that won't be the case
for long, since subsequent patches will introduce the multi-pack bitmap,
which will begin passing this flag.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
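The stale-file cleanup in this patch boils down to a name test on each
entry in the pack directory. Here is a minimal standalone sketch of that
test. The helper names (`has_prefix`, `has_suffix`,
`toy_is_stale_midx_file`) are hypothetical stand-ins for git's
starts_with() and ends_with() and for the callback in the diff below, and
the hash strings in the usage note are placeholders.

```c
#include <assert.h>
#include <string.h>

/* Return non-zero if s begins with prefix. */
static int has_prefix(const char *s, const char *prefix)
{
	return !strncmp(s, prefix, strlen(prefix));
}

/* Return non-zero if s ends with suffix. */
static int has_suffix(const char *s, const char *suffix)
{
	size_t n = strlen(s), m = strlen(suffix);

	return n >= m && !strcmp(s + n - m, suffix);
}

/*
 * Return 1 if file_name looks like "multi-pack-index-<hash><ext>" and is
 * not the one file we were asked to keep. keep may be NULL, in which
 * case every matching file is considered stale.
 */
static int toy_is_stale_midx_file(const char *file_name, const char *ext,
				  const char *keep)
{
	assert(file_name && ext);
	if (!has_prefix(file_name, "multi-pack-index-") ||
	    !has_suffix(file_name, ext))
		return 0;
	if (keep && !strcmp(keep, file_name))
		return 0;
	return 1;
}
```

With keep set to "multi-pack-index-bbbb.rev", a file named
"multi-pack-index-aaaa.rev" is reported as stale, while the kept file,
the bare "multi-pack-index", and ordinary "pack-*.rev" files are left
alone.
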
 midx.c | 123 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 midx.h |   1 +
 2 files changed, 124 insertions(+)

diff --git a/midx.c b/midx.c
index 12bfce8bb1..3d79a1930a 100644
--- a/midx.c
+++ b/midx.c
@@ -11,6 +11,7 @@
 #include "trace2.h"
 #include "run-command.h"
 #include "repository.h"
+#include "pack.h"
 
 #define MIDX_SIGNATURE 0x4d494458 /* "MIDX" */
 #define MIDX_VERSION 1
@@ -841,6 +842,78 @@ static size_t write_midx_large_offsets(struct hashfile *f, uint32_t nr_large_off
 	return written;
 }
 
+struct midx_pack_order_data {
+	struct pack_midx_entry *entries;
+	uint32_t *pack_perm;
+};
+
+static int midx_pack_order_cmp(const void *va, const void *vb, void *_data)
+{
+	struct midx_pack_order_data *data = _data;
+
+	struct pack_midx_entry *a = &data->entries[*(const uint32_t *)va];
+	struct pack_midx_entry *b = &data->entries[*(const uint32_t *)vb];
+
+	uint32_t perm_a = data->pack_perm[a->pack_int_id];
+	uint32_t perm_b = data->pack_perm[b->pack_int_id];
+
+	/* Sort objects in the preferred pack ahead of any others. */
+	if (a->preferred > b->preferred)
+		return -1;
+	if (a->preferred < b->preferred)
+		return 1;
+
+	/* Then, order objects by which packs they appear in. */
+	if (perm_a < perm_b)
+		return -1;
+	if (perm_a > perm_b)
+		return 1;
+
+	/* Then, disambiguate by their offset within each pack. */
+	if (a->offset < b->offset)
+		return -1;
+	if (a->offset > b->offset)
+		return 1;
+
+	return 0;
+}
+
+static uint32_t *midx_pack_order(struct pack_midx_entry *entries,
+				 uint32_t *pack_perm,
+				 uint32_t entries_nr)
+{
+	struct midx_pack_order_data data;
+	uint32_t *pack_order;
+	uint32_t i;
+
+	data.entries = entries;
+	data.pack_perm = pack_perm;
+
+	ALLOC_ARRAY(pack_order, entries_nr);
+	for (i = 0; i < entries_nr; i++)
+		pack_order[i] = i;
+	QSORT_S(pack_order, entries_nr, midx_pack_order_cmp, &data);
+
+	return pack_order;
+}
+
+static void write_midx_reverse_index(char *midx_name, unsigned char *midx_hash,
+				     uint32_t *pack_order,
+				     uint32_t entries_nr)
+{
+	struct strbuf buf = STRBUF_INIT;
+
+	strbuf_addf(&buf, "%s-%s.rev", midx_name, hash_to_hex(midx_hash));
+
+	write_rev_file_order(buf.buf, pack_order, entries_nr, midx_hash,
+			     WRITE_REV);
+
+	strbuf_release(&buf);
+}
+
+static void clear_midx_files_ext(struct repository *r, const char *ext,
+				 unsigned char *keep_hash);
+
 static int write_midx_internal(const char *object_dir, struct multi_pack_index *m,
 			       struct string_list *packs_to_drop,
 			       const char *preferred_pack_name,
@@ -854,6 +927,7 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
 	struct lock_file lk;
 	struct pack_list packs;
 	uint32_t *pack_perm = NULL;
+	uint32_t *pack_order = NULL;
 	uint64_t written = 0;
 	uint32_t chunk_ids[MIDX_MAX_CHUNKS + 1];
 	uint64_t chunk_offsets[MIDX_MAX_CHUNKS + 1];
@@ -1111,6 +1185,14 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
 		    chunk_offsets[num_chunks]);
 
 	finalize_hashfile(f, midx_hash, CSUM_FSYNC | CSUM_HASH_IN_STREAM);
+	if (flags & MIDX_WRITE_REV_INDEX)
+		pack_order = midx_pack_order(entries, pack_perm, nr_entries);
+
+	if (flags & MIDX_WRITE_REV_INDEX)
+		write_midx_reverse_index(midx_name, midx_hash, pack_order,
+					 nr_entries);
+	clear_midx_files_ext(the_repository, ".rev", midx_hash);
+
 	commit_lock_file(&lk);
 
 cleanup:
@@ -1125,6 +1207,7 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
 	free(packs.info);
 	free(entries);
 	free(pack_perm);
+	free(pack_order);
 	free(midx_name);
 	return result;
 }
@@ -1137,6 +1220,44 @@ int write_midx_file(const char *object_dir,
 				   flags);
 }
 
+struct clear_midx_data {
+	char *keep;
+	const char *ext;
+};
+
+static void clear_midx_file_ext(const char *full_path, size_t full_path_len,
+				const char *file_name, void *_data)
+{
+	struct clear_midx_data *data = _data;
+
+	if (!(starts_with(file_name, "multi-pack-index-") &&
+	      ends_with(file_name, data->ext)))
+		return;
+	if (data->keep && !strcmp(data->keep, file_name))
+		return;
+
+	if (unlink(full_path))
+		die_errno(_("failed to remove %s"), full_path);
+}
+
+static void clear_midx_files_ext(struct repository *r, const char *ext,
+				 unsigned char *keep_hash)
+{
+	struct clear_midx_data data;
+	memset(&data, 0, sizeof(struct clear_midx_data));
+
+	if (keep_hash)
+		data.keep = xstrfmt("multi-pack-index-%s%s",
+				    hash_to_hex(keep_hash), ext);
+	data.ext = ext;
+
+	for_each_file_in_pack_dir(r->objects->odb->path,
+				  clear_midx_file_ext,
+				  &data);
+
+	free(data.keep);
+}
+
 void clear_midx_file(struct repository *r)
 {
 	char *midx = get_midx_filename(r->objects->odb->path);
@@ -1149,6 +1270,8 @@ void clear_midx_file(struct repository *r)
 	if (remove_path(midx))
 		die(_("failed to clear multi-pack-index at %s"), midx);
 
+	clear_midx_files_ext(r, ".rev", NULL);
+
 	free(midx);
 }
 
diff --git a/midx.h b/midx.h
index 0a8294d2ee..8684cf0fef 100644
--- a/midx.h
+++ b/midx.h
@@ -40,6 +40,7 @@ struct multi_pack_index {
 };
 
 #define MIDX_PROGRESS     (1 << 0)
+#define MIDX_WRITE_REV_INDEX (1 << 1)
 
 char *get_midx_rev_filename(struct multi_pack_index *m);
 
-- 
2.30.0.667.g81c0cbc6fd


* Re: [PATCH 1/9] t/helper/test-read-midx.c: add '--show-objects'
  2021-02-10 23:02 ` [PATCH 1/9] t/helper/test-read-midx.c: add '--show-objects' Taylor Blau
@ 2021-02-11  2:27   ` Derrick Stolee
  2021-02-11  2:34     ` Taylor Blau
  0 siblings, 1 reply; 171+ messages in thread
From: Derrick Stolee @ 2021-02-11  2:27 UTC (permalink / raw)
  To: Taylor Blau, git; +Cc: dstolee, gitster, peff

On 2/10/21 6:02 PM, Taylor Blau wrote:
> +	if (show_objects) {
> +		struct object_id oid;
> +		struct pack_entry e;
> +
> +		for (i = 0; i < m->num_objects; i++) {
> +			nth_midxed_object_oid(&oid, m, i);
> +			fill_midx_entry(the_repository, &oid, &e, m);
> +
> +			printf("%s %"PRIu64"\t%s\n",
> +			       oid_to_hex(&oid), e.offset, e.p->pack_name);
> +		}
> +		return 0;
> +	}
> +
>  	printf("header: %08x %d %d %d %d\n",
>  	       m->signature,
>  	       m->version,

It seems a little odd to me that the list of objects happens before
the header information. Probably doesn't matter in your test cases,
but I sometimes use the test helpers to diagnose data during development
and could see piping this output into 'less' and wanting the header
at the top.

Thanks,
-Stolee


* Re: [PATCH 4/9] midx: keep track of the checksum
  2021-02-10 23:02 ` [PATCH 4/9] midx: keep track of the checksum Taylor Blau
@ 2021-02-11  2:33   ` Derrick Stolee
  2021-02-11  2:35     ` Taylor Blau
  0 siblings, 1 reply; 171+ messages in thread
From: Derrick Stolee @ 2021-02-11  2:33 UTC (permalink / raw)
  To: Taylor Blau, git; +Cc: dstolee, gitster, peff

On 2/10/21 6:02 PM, Taylor Blau wrote:
> +	unsigned char midx_hash[GIT_MAX_RAWSZ];

I was initially thinking we should use something like
'struct object_id' here, but the hash we are storing
doesn't correspond to an object, which would be
confusing. I suppose this is the most correct thing
to do.

> -	finalize_hashfile(f, NULL, CSUM_FSYNC | CSUM_HASH_IN_STREAM);
> +	finalize_hashfile(f, midx_hash, CSUM_FSYNC | CSUM_HASH_IN_STREAM);

And this is of course correct. The issue will be what
happens to 'midx_hash' in the later changes. Will
keep reading.

Thanks,
-Stolee


* Re: [PATCH 1/9] t/helper/test-read-midx.c: add '--show-objects'
  2021-02-11  2:27   ` Derrick Stolee
@ 2021-02-11  2:34     ` Taylor Blau
  0 siblings, 0 replies; 171+ messages in thread
From: Taylor Blau @ 2021-02-11  2:34 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: Taylor Blau, git, dstolee, gitster, peff

On Wed, Feb 10, 2021 at 09:27:29PM -0500, Derrick Stolee wrote:
> On 2/10/21 6:02 PM, Taylor Blau wrote:
> > +	if (show_objects) {
> > +		struct object_id oid;
> > +		struct pack_entry e;
> > +
> > +		for (i = 0; i < m->num_objects; i++) {
> > +			nth_midxed_object_oid(&oid, m, i);
> > +			fill_midx_entry(the_repository, &oid, &e, m);
> > +
> > +			printf("%s %"PRIu64"\t%s\n",
> > +			       oid_to_hex(&oid), e.offset, e.p->pack_name);
> > +		}
> > +		return 0;
> > +	}
> > +
> >  	printf("header: %08x %d %d %d %d\n",
> >  	       m->signature,
> >  	       m->version,
>
> It seems a little odd to me that the list of objects happens before
> the header information. Probably doesn't matter in your test cases,
> but I sometimes use the test helpers to diagnose data during development
> and could see piping this output into 'less' and wanting the header
> at the top.

Indeed. In theory you could pipe to tail instead (or to less and
immediately hit 'G'), but I can't think of a good reason that this would
have appeared above the header when I originally wrote the patch.

Anyway, it doesn't seem that the tests care about where this is (they're
just looking for a line that begins with the object id and ends with
its offset), so I think this could probably be moved without thinking
too hard about it.

> Thanks,
> -Stolee

Thanks,
Taylor


* Re: [PATCH 4/9] midx: keep track of the checksum
  2021-02-11  2:33   ` Derrick Stolee
@ 2021-02-11  2:35     ` Taylor Blau
  0 siblings, 0 replies; 171+ messages in thread
From: Taylor Blau @ 2021-02-11  2:35 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: Taylor Blau, git, dstolee, gitster, peff

On Wed, Feb 10, 2021 at 09:33:36PM -0500, Derrick Stolee wrote:
> On 2/10/21 6:02 PM, Taylor Blau wrote:
> > +	unsigned char midx_hash[GIT_MAX_RAWSZ];
>
> I was initially thinking we should use something like
> 'struct object_id' here, but the hash we are storing
> doesn't correspond to an object, which would be
> confusing. I suppose this is the most correct thing
> to do.

Yeah. There are a number of places that abuse the unsigned char array
inside of object_id, but there's no good reason to.

Thanks,
Taylor


* Re: [PATCH 6/9] Documentation/technical: describe multi-pack reverse indexes
  2021-02-10 23:03 ` [PATCH 6/9] Documentation/technical: describe multi-pack reverse indexes Taylor Blau
@ 2021-02-11  2:48   ` Derrick Stolee
  2021-02-11  3:03     ` Taylor Blau
  0 siblings, 1 reply; 171+ messages in thread
From: Derrick Stolee @ 2021-02-11  2:48 UTC (permalink / raw)
  To: Taylor Blau, git; +Cc: dstolee, gitster, peff

On 2/10/21 6:03 PM, Taylor Blau wrote:
> +Instead of mapping between offset, pack-, and index position, this

The "pack-," should be paired with "index-position" or drop the
hyphen in both cases. Perhaps just be explicit, especially since
"position" doesn't match with "offset":

  Instead of mapping between pack offset, pack position, and index
  position, ...

> +reverse index maps between an object's position within the midx, and
> +that object's position within a pseudo-pack that the midx describes.

nit: use multi-pack-index or MIDX, not lower-case 'midx'.

> +Crucially, the objects' positions within this pseudo-pack are the same
> +as their bit positions in a multi-pack reachability bitmap.
> +
> +As a motivating example, consider the multi-pack reachability bitmap
> +(which does not yet exist, but is what we are building towards here). We
> +need each bit to correspond to an object covered by the midx, and we
> +need to be able to convert bit positions back to index positions (from
> +which we can get the oid, etc).

These paragraphs are awkward. Instead of operating in the hypothetical
world of reachability bitmaps, focus on the fact that bitmaps need
a bidirectional mapping between "bit position" and an object ID.

Here is an attempt to reword some of the context you are using here.
Feel free to take as much or as little as you want.

  The multi-pack-index stores the object IDs in lexicographical order
  (lex-order) to allow binary search. To allow compressible reachability
  bitmaps to pair with a multi-pack-index, a different ordering is
  required. When paired with a single packfile, the order used is the
  object order within the packfile (called the pack-order). Construct
  a "pseudo-pack" by concatenating all tracked packfiles in the
  multi-pack-index. We now need a mapping between the lex-order and the
  pseudo-pack-order.

> +One solution is to let each bit position in the bitmap correspond to
> +the same position in the oid-sorted index stored by the midx. But
> +because oids are effectively random, the resulting reachability
> +bitmaps would have no locality, and thus compress poorly. (This is the
> +reason that single-pack bitmaps use the pack ordering, and not the .idx
> +ordering, for the same purpose.)
> +
> +So we'd like to define an ordering for the whole midx based around
> +pack ordering. We can think of it as a pseudo-pack created by the
> +concatenation of all of the packs in the midx. E.g., if we had a midx
> +with three packs (a, b, c), with 10, 15, and 20 objects respectively, we
> +can imagine an ordering of the objects like:
> +
> +    |a,0|a,1|...|a,9|b,0|b,1|...|b,14|c,0|c,1|...|c,19|
> +
> +where the ordering of the packs is defined by the midx's pack list,
> +and then the ordering of objects within each pack is the same as the
> +order in the actual packfile.
> +
> +Given the list of packs and their counts of objects, you can
> +na&iuml;vely reconstruct that pseudo-pack ordering (e.g., the object at
> +position 27 must be (c,1) because packs "a" and "b" consumed 25 of the
> +slots). But there's a catch. Objects may be duplicated between packs, in
> +which case the midx only stores one pointer to the object (and thus we'd
> +want only one slot in the bitmap).
> +
> +Callers could handle duplicates themselves by reading objects in order
> +of their bit-position, but that's linear in the number of objects, and
> +much too expensive for ordinary bitmap lookups. Building a reverse index
> +solves this, since it is the logical inverse of the index, and that
> +index has already removed duplicates. But, building a reverse index on
> +the fly can be expensive. Since we already have an on-disk format for
> +pack-based reverse indexes, let's reuse it for the midx's pseudo-pack,
> +too.
> +
> +Objects from the midx are ordered as follows to string together the
> +pseudo-pack. Let _pack(o)_ return the pack from which _o_ was selected
> +by the midx, and define an ordering of packs based on their numeric ID
> +(as stored by the midx). Let _offset(o)_ return the object offset of _o_
> +within _pack(o)_. Then, compare _o~1~_ and _o~2~_ as follows:
> +
> +  - If one of _pack(o~1~)_ and _pack(o~2~)_ is preferred and the other
> +    is not, then the preferred one sorts first.
> ++
> +(This is a detail that allows the midx bitmap to determine which
> +pack should be used by the pack-reuse mechanism, since it can ask
> +the midx for the pack containing the object at bit position 0).
> +
> +  - If _pack(o~1~) &ne; pack(o~2~)_, then sort the two objects in
> +    descending order based on the pack ID.
> +
> +  - Otherwise, _pack(o~1~) &equals; pack(o~2~)_, and the objects are
> +    sorted in pack-order (i.e., _o~1~_ sorts ahead of _o~2~_ exactly
> +    when _offset(o~1~) &lt; offset(o~2~)_).
> +
> +In short, a midx's pseudo-pack is the de-duplicated concatenation of
> +objects in packs stored by the midx, laid out in pack order, and the
> +packs arranged in midx order (with the preferred pack coming first).
> +
> +Finally, note that the midx's reverse index is not stored as a chunk in
> +the multi-pack-index itself. This is done because the reverse index
> +includes the checksum of the pack or midx to which it belongs, which
> +makes it impossible to write in the midx. To avoid races when rewriting
> +the midx, a midx reverse index includes the midx's checksum in its
> +filename (e.g., `multi-pack-index-xyz.rev`).

The rest of these details make sense and sufficiently motivate the
ordering, once the concept is clear.
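
For what it's worth, the three comparison rules could be sketched as a
qsort() comparator along these lines. The struct and function names are
hypothetical and not taken from the series. I am assuming that lower
pack IDs sort first, which is what the a/b/c diagram shows; the quoted
text says "descending", so flip that one comparison if that is what is
intended.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical record for one object selected by the MIDX. */
struct pseudo_obj {
	uint32_t pack_id;	/* numeric pack ID as stored by the MIDX */
	uint64_t offset;	/* offset of the object within that pack */
};

/* Hypothetical: ID of the preferred pack, set before sorting. */
static uint32_t preferred_pack_id;

/*
 * Preferred pack first, then by pack ID (lower first; an assumption
 * taken from the diagram), then by offset within the pack.
 */
static int pseudo_pack_cmp(const void *va, const void *vb)
{
	const struct pseudo_obj *a = va, *b = vb;
	int a_pref = a->pack_id == preferred_pack_id;
	int b_pref = b->pack_id == preferred_pack_id;

	if (a_pref != b_pref)
		return b_pref - a_pref;	/* the preferred one sorts first */
	if (a->pack_id != b->pack_id)
		return a->pack_id < b->pack_id ? -1 : 1;
	if (a->offset != b->offset)
		return a->offset < b->offset ? -1 : 1;
	return 0;
}

/* Sort objs[] into (this sketch's idea of) pseudo-pack order. */
static void sort_pseudo_pack(struct pseudo_obj *objs, size_t nr,
			     uint32_t preferred)
{
	preferred_pack_id = preferred;
	qsort(objs, nr, sizeof(*objs), pseudo_pack_cmp);
}
```

The file-scope variable is only there because qsort() has no context
argument; it keeps the sketch short, nothing more.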

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: [PATCH 7/9] pack-revindex: read multi-pack reverse indexes
  2021-02-10 23:03 ` [PATCH 7/9] pack-revindex: read " Taylor Blau
@ 2021-02-11  2:53   ` Derrick Stolee
  2021-02-11  3:04     ` Taylor Blau
  2021-02-11  7:54   ` Junio C Hamano
  1 sibling, 1 reply; 171+ messages in thread
From: Derrick Stolee @ 2021-02-11  2:53 UTC (permalink / raw)
  To: Taylor Blau, git; +Cc: dstolee, gitster, peff

On 2/10/21 6:03 PM, Taylor Blau wrote:
> Implement reading for multi-pack reverse indexes, as described in the
> previous patch.
> 
> Note that these functions don't yet have any callers, and won't until
> multi-pack reachability bitmaps are introduced in a later patch series.
> In the meantime, this patch implements some of the infrastructure
> necessary to support multi-pack bitmaps.
> 
> There are three new functions exposed by the revindex API:
> 
>   - load_midx_revindex(): loads the reverse index corresponding to the
>     given multi-pack index.
> 
>   - midx_to_pack_pos() and pack_pos_to_midx(): these convert between the
>     multi-pack index and pseudo-pack order.
> 
> load_midx_revindex() and pack_pos_to_midx() are both relatively
> straightforward.
> 
> load_midx_revindex() needs a few functions to be exposed from the midx
> API. One to get the checksum of a midx, and another to get the .rev's
> filename. Similar to recent changes in the packed_git struct, three new
> fields are added to the multi_pack_index struct: one to keep track of
> the size, one to keep track of the mmap'd pointer, and another to point
> past the header and at the reverse index's data.
> 
> pack_pos_to_midx() simply reads the corresponding entry out of the
> table.
> 
> midx_to_pack_pos() is the trickiest, since it needs to find an object's
> position in the pseudo-pack order, but that order can only be recovered
> in the .rev file itself. This mapping can be implemented with a binary
> search, but note that the thing we're binary searching over isn't an
> array, but rather a _permutation_.
> 
> So, when comparing two items, it's helpful to keep in mind the
> difference. Instead of a traditional binary search, where you are
> comparing two things directly, here we're comparing a (pack, offset)
> tuple with an index into the multi-pack index. That index describes
> another (pack, offset) tuple, and it is _those_ two tuples that are
> compared.
> 
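
The "binary search over a permutation" idea may be easier to see in a
stripped-down form. This is only a sketch under my own assumptions: the
struct, the array layout, and the function name are invented here, and
the preferred-pack rule is left out to keep it short.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical (pack, offset) key; not the structs used by the patch. */
struct pos_key {
	uint32_t pack_id;
	uint64_t offset;
};

static int key_cmp(const struct pos_key *a, const struct pos_key *b)
{
	if (a->pack_id != b->pack_id)
		return a->pack_id < b->pack_id ? -1 : 1;
	if (a->offset != b->offset)
		return a->offset < b->offset ? -1 : 1;
	return 0;
}

/*
 * rev[i] is the MIDX (lex-order) position of the object at pseudo-pack
 * position i, and by_midx[] holds each object's key indexed by MIDX
 * position. The search runs over rev[], but every probe is translated
 * through by_midx[] so that two (pack, offset) tuples are compared.
 * Returns 0 and stores the pseudo-pack position in *pos, or -1.
 */
static int midx_pos_to_pseudo_pos(const uint32_t *rev,
				  const struct pos_key *by_midx,
				  uint32_t nr, uint32_t at, uint32_t *pos)
{
	const struct pos_key *want = &by_midx[at];
	uint32_t lo = 0, hi = nr;

	while (lo < hi) {
		uint32_t mid = lo + (hi - lo) / 2;
		int cmp = key_cmp(want, &by_midx[rev[mid]]);

		if (!cmp) {
			*pos = mid;
			return 0;
		}
		if (cmp < 0)
			hi = mid;
		else
			lo = mid + 1;
	}
	return -1;
}
```
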
> Signed-off-by: Taylor Blau <me@ttaylorr.com>
> ---
>  midx.c          |  11 +++++
>  midx.h          |   6 +++
>  pack-revindex.c | 112 ++++++++++++++++++++++++++++++++++++++++++++++++
>  pack-revindex.h |  46 ++++++++++++++++++++
>  packfile.c      |   3 ++
>  5 files changed, 178 insertions(+)
> 
> diff --git a/midx.c b/midx.c
> index bf258c4fde..12bfce8bb1 100644
> --- a/midx.c
> +++ b/midx.c
> @@ -48,11 +48,22 @@ static uint8_t oid_version(void)
>  	}
>  }
>  
> +static const unsigned char *get_midx_checksum(struct multi_pack_index *m)
> +{
> +	return m->data + m->data_len - the_hash_algo->rawsz;

'struct multi_pack_index' has a 'hash_len' member that you could
use here. It would allow a different hash length in the stored
file than the one required by the repository. Except...

> +}
> +
>  static char *get_midx_filename(const char *object_dir)
>  {
>  	return xstrfmt("%s/pack/multi-pack-index", object_dir);
>  }
>  
> +char *get_midx_rev_filename(struct multi_pack_index *m)
> +{
> +	return xstrfmt("%s/pack/multi-pack-index-%s.rev",
> +		       m->object_dir, hash_to_hex(get_midx_checksum(m)));

...this assumes the hash is of the same length as the_hash_algo,
so you are doing the right thing. Currently, I think we check
that 'm->hash_len == the_hash_algo->rawsz' on load. We'll need
to check this again later when in the transition phase of the
new hash work.

(No changes are needed to your patch.)
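
For anyone following along, the two helpers boil down to "take the last
hash-length bytes of the mapped file and put their hex form in the
filename". A standalone sketch, with made-up function names and plain
snprintf() standing in for xstrfmt() and hash_to_hex():

```c
#include <stddef.h>
#include <stdio.h>

/* Return a pointer to the last hash_len bytes of the mapped file. */
static const unsigned char *trailing_checksum(const unsigned char *data,
					      size_t data_len, size_t hash_len)
{
	return data + data_len - hash_len;
}

/*
 * Format "<object_dir>/pack/multi-pack-index-<hex>.rev" into buf.
 * Returns 0 on success, -1 if buf is too small.
 */
static int format_rev_filename(char *buf, size_t buflen,
			       const char *object_dir,
			       const unsigned char *hash, size_t hash_len)
{
	size_t i, used;
	int n = snprintf(buf, buflen, "%s/pack/multi-pack-index-", object_dir);

	if (n < 0 || (size_t)n >= buflen)
		return -1;
	used = (size_t)n;

	for (i = 0; i < hash_len; i++) {
		/* Need room for two hex digits plus the trailing NUL. */
		if (used + 2 >= buflen)
			return -1;
		snprintf(buf + used, buflen - used, "%02x",
			 (unsigned int)hash[i]);
		used += 2;
	}

	n = snprintf(buf + used, buflen - used, ".rev");
	if (n < 0 || (size_t)n >= buflen - used)
		return -1;
	return 0;
}
```

Passing hash_len explicitly is what would let the stored length differ
from the repository's, which is the situation described above.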

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: [PATCH 0/9] midx: implement a multi-pack reverse index
  2021-02-10 23:02 [PATCH 0/9] midx: implement a multi-pack reverse index Taylor Blau
                   ` (8 preceding siblings ...)
  2021-02-10 23:03 ` [PATCH 9/9] pack-revindex: write multi-pack reverse indexes Taylor Blau
@ 2021-02-11  2:58 ` Derrick Stolee
  2021-02-11  3:06   ` Taylor Blau
  2021-02-11  8:13 ` Junio C Hamano
                   ` (3 subsequent siblings)
  13 siblings, 1 reply; 171+ messages in thread
From: Derrick Stolee @ 2021-02-11  2:58 UTC (permalink / raw)
  To: Taylor Blau, git; +Cc: dstolee, gitster, peff

On 2/10/21 6:02 PM, Taylor Blau wrote:
> This series describes and implements a reverse index for the multi-pack index,
> based on a "pseudo-pack" which can be uniquely described by the multi-pack
> index.
> 
> The details of the pseudo-pack, and multi-pack reverse index are laid out in
> detail in the sixth patch.
> 
> This is in support of multi-pack reachability bitmaps, which contain objects
> from the multi-pack index. Likewise, an object's bit position in a multi-pack
> reachability bitmap is determined by its position within that multi-pack index's
> pseudo pack.

This has been a lot of work, but I'm impressed with the progress here.

This series is good prep, and my comments are very minor.

Since the need for these multi-pack-index-<hash>.rev files doesn't show
up until the reachability bitmaps can be paired with MIDX files, it
would make sense to hold this series in 'next' until that one also
stabilizes and they merge to 'master' together.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: [PATCH 6/9] Documentation/technical: describe multi-pack reverse indexes
  2021-02-11  2:48   ` Derrick Stolee
@ 2021-02-11  3:03     ` Taylor Blau
  0 siblings, 0 replies; 171+ messages in thread
From: Taylor Blau @ 2021-02-11  3:03 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: Taylor Blau, git, dstolee, gitster, peff

On Wed, Feb 10, 2021 at 09:48:20PM -0500, Derrick Stolee wrote:
> nit: use multi-pack-index or MIDX, not lower-case 'midx'.

Thanks.

> > +Crucially, the objects' positions within this pseudo-pack are the same
> > +as their bit positions in a multi-pack reachability bitmap.
> > +
> > +As a motivating example, consider the multi-pack reachability bitmap
> > +(which does not yet exist, but is what we are building towards here). We
> > +need each bit to correspond to an object covered by the midx, and we
> > +need to be able to convert bit positions back to index positions (from
> > +which we can get the oid, etc).
>
> These paragraphs are awkward. Instead of operating in the hypothetical
> world of reachability bitmaps, focus on the fact that bitmaps need
> a bidirectional mapping between "bit position" and an object ID.

Hmm. I could buy that these paragraphs are awkward, but I'm not sure
that what you proposed makes it less so.

I may be a bad person to judge what you wrote, since I am familiar with
the details of what it's describing. But my thoughts on that second and
third paragraph are basically:

  - define the valid orderings we might consider objects in a MIDX by,
    indicating which of those orderings we're going to use for
    multi-pack bitmaps

  - motivate the need for a mapping between lexicographic order and
    pseudo-pack order

> Here is an attempt to reword some of the context you are using here.
> Feel free to take as much or as little as you want.
>
>   The multi-pack-index stores the object IDs in lexicographical order
>   (lex-order) to allow binary search. To allow compressible reachability
>   bitmaps to pair with a multi-pack-index, a different ordering is
>   required. When paired with a single packfile, the order used is the
>   object order within the packfile (called the pack-order). Construct
>   a "pseudo-pack" by concatenating all tracked packfiles in the
>   multi-pack-index. We now need a mapping between the lex-order and the
>   pseudo-pack-order.

I struggled with what you wrote because I couldn't seem to neatly
place/replace that paragraph in with the existing text without referring
to yet-undefined concepts.

Maybe the confusion lies in the fact that we stray too far from the
point in the second and third paragraphs. What if we reordered the
second, third, and fourth paragraph like this:

		Instead of mapping between offset, pack-, and index position, this
		reverse index maps between an object's position within the MIDX, and
		that object's position within a pseudo-pack that the MIDX describes.

		To clarify these three orderings, consider a multi-pack reachability
		bitmap (which does not yet exist, but is what we are building towards
		here). Each bit needs to correspond to an object in the MIDX, and so we
		need an efficient mapping from bit position to MIDX position.

		One solution is to let bits occupy the same position in the oid-sorted
		index stored by the MIDX. But because oids are effectively random, the
		resulting reachability bitmaps would have no locality, and thus compress
		poorly. (This is the reason that single-pack bitmaps use the pack
		ordering, and not the .idx ordering, for the same purpose.)

		So we'd like to define an ordering for the whole MIDX based around
		pack ordering, which has far better locality (and thus compresses more
		efficiently). We can think of a pseudo-pack created by the concatenation
		of all of the packs in the MIDX. E.g., if we had a MIDX with three packs
		(a, b, c), with 10, 15, and 20 objects respectively, we can imagine an
		ordering of the objects like:

> [snip]
>
> The rest of these details make sense and sufficiently motivate the
> ordering, once the concept is clear.
>
> Thanks,
> -Stolee

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: [PATCH 7/9] pack-revindex: read multi-pack reverse indexes
  2021-02-11  2:53   ` Derrick Stolee
@ 2021-02-11  3:04     ` Taylor Blau
  0 siblings, 0 replies; 171+ messages in thread
From: Taylor Blau @ 2021-02-11  3:04 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: Taylor Blau, git, dstolee, gitster, peff

On Wed, Feb 10, 2021 at 09:53:23PM -0500, Derrick Stolee wrote:
> ...this assumes the hash is of the same length as the_hash_algo,
> so you are doing the right thing. Currently, I think we check
> that 'm->hash_len == the_hash_algo->rawsz' on load. We'll need
> to check this again later when in the transition phase of the
> new hash work.
>
> (No changes are needed to your patch.)

All makes sense, thanks.

Taylor

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: [PATCH 0/9] midx: implement a multi-pack reverse index
  2021-02-11  2:58 ` [PATCH 0/9] midx: implement a multi-pack reverse index Derrick Stolee
@ 2021-02-11  3:06   ` Taylor Blau
  0 siblings, 0 replies; 171+ messages in thread
From: Taylor Blau @ 2021-02-11  3:06 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: Taylor Blau, git, dstolee, gitster, peff

On Wed, Feb 10, 2021 at 09:58:18PM -0500, Derrick Stolee wrote:
> This series is good prep, and my comments are very minor.

Thanks for your review, especially on a series like this one that adds
lots of code without any users :).

> Since the need for these multi-pack-index-<hash>.rev files doesn't show
> up until the reachability bitmaps can be paired with MIDX files, it
> would make sense to hold this series in 'next' until that one also
> stabilizes and they merge to 'master' together.

That's fine with me. It would be OK to merge this down to 'master', too,
since this is all dead code. In fact, that may be easier to work with,
since the next topic can be based directly off 'master' instead of
having to keep this branch around forever.

Either is fine, though.

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: [PATCH 7/9] pack-revindex: read multi-pack reverse indexes
  2021-02-10 23:03 ` [PATCH 7/9] pack-revindex: read " Taylor Blau
  2021-02-11  2:53   ` Derrick Stolee
@ 2021-02-11  7:54   ` Junio C Hamano
  2021-02-11 14:54     ` Taylor Blau
  1 sibling, 1 reply; 171+ messages in thread
From: Junio C Hamano @ 2021-02-11  7:54 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, dstolee, peff

Taylor Blau <me@ttaylorr.com> writes:

> diff --git a/pack-revindex.c b/pack-revindex.c
> index 83fe4de773..da4101f4b2 100644
> --- a/pack-revindex.c
> +++ b/pack-revindex.c
> @@ -3,6 +3,7 @@
>  #include "object-store.h"
>  #include "packfile.h"
>  #include "config.h"
> +#include "midx.h"

This seems to assume that the topic tb/pack-revindex-on-disk is
already there?

Just trying to establish what dependencies the bunch of topics have
among themselves.

Thanks.

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: [PATCH 0/9] midx: implement a multi-pack reverse index
  2021-02-10 23:02 [PATCH 0/9] midx: implement a multi-pack reverse index Taylor Blau
                   ` (9 preceding siblings ...)
  2021-02-11  2:58 ` [PATCH 0/9] midx: implement a multi-pack reverse index Derrick Stolee
@ 2021-02-11  8:13 ` Junio C Hamano
  2021-02-11 18:37   ` Derrick Stolee
  2021-02-24 19:09 ` [PATCH v2 00/15] " Taylor Blau
                   ` (2 subsequent siblings)
  13 siblings, 1 reply; 171+ messages in thread
From: Junio C Hamano @ 2021-02-11  8:13 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, dstolee, peff

Taylor Blau <me@ttaylorr.com> writes:

> Since tb/pack-revindex-on-disk is queued to be merged to 'master', but hasn't
> yet been merged, this series is based on that branch.

This seems to have a light conflict with Derrick's chunked file
format work in midx.c where pack_info is renamed and extended so the
new pack_order variable now needs to become a member in it.

I think I resolved it OK, but without any callers that actually
utilize the new code or tests, it is almost impossible to have any
confidence in the result of the conflict resolution X-<.

Could you two please look over to see if I made any silly mistakes,
when I push it out later?

Thanks.



^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: [PATCH 7/9] pack-revindex: read multi-pack reverse indexes
  2021-02-11  7:54   ` Junio C Hamano
@ 2021-02-11 14:54     ` Taylor Blau
  0 siblings, 0 replies; 171+ messages in thread
From: Taylor Blau @ 2021-02-11 14:54 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Taylor Blau, git, dstolee, peff

On Wed, Feb 10, 2021 at 11:54:23PM -0800, Junio C Hamano wrote:
> Taylor Blau <me@ttaylorr.com> writes:
>
> > diff --git a/pack-revindex.c b/pack-revindex.c
> > index 83fe4de773..da4101f4b2 100644
> > --- a/pack-revindex.c
> > +++ b/pack-revindex.c
> > @@ -3,6 +3,7 @@
> >  #include "object-store.h"
> >  #include "packfile.h"
> >  #include "config.h"
> > +#include "midx.h"
>
> This seems to assume that the topic tb/pack-revindex-on-disk is
> already there?

Yeah, this topic depends on tb/pack-revindex-on-disk, which looks like
it will be merged to master according to your last What's Cooking email.

> Just trying to establish what dependencies the bunch of topics have
> among themselves.

Thanks, I hope that it wasn't too much trouble.

Taylor

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: [PATCH 0/9] midx: implement a multi-pack reverse index
  2021-02-11  8:13 ` Junio C Hamano
@ 2021-02-11 18:37   ` Derrick Stolee
  2021-02-11 18:55     ` Junio C Hamano
  0 siblings, 1 reply; 171+ messages in thread
From: Derrick Stolee @ 2021-02-11 18:37 UTC (permalink / raw)
  To: Junio C Hamano, Taylor Blau; +Cc: git, dstolee, peff

On 2/11/2021 3:13 AM, Junio C Hamano wrote:
> Taylor Blau <me@ttaylorr.com> writes:
> 
>> Since tb/pack-revindex-on-disk is queued to be merged to 'master', but hasn't
>> yet been merged, this series is based on that branch.
> 
> This seems to have a light conflict with Derrick's chunked file
> format work in midx.c where pack_info is renamed and extended so the
> new pack_order variable now needs to become a member in it.
> 
> I think I resolved it OK, but without any callers that actually
> utilize the new code or tests, it is almost impossible to have any
> confidence in the result of the conflict resolution X-<.
> 
> Could you two please look over to see if I made any silly mistakes,
> when I push it out later?

I reproduced your merge and got very similar results. The differences
that happened between my result and yours are not meaningful in any
way.

Definitely a very subtle merge, so thank you for doing that!

-Stolee


^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: [PATCH 0/9] midx: implement a multi-pack reverse index
  2021-02-11 18:37   ` Derrick Stolee
@ 2021-02-11 18:55     ` Junio C Hamano
  0 siblings, 0 replies; 171+ messages in thread
From: Junio C Hamano @ 2021-02-11 18:55 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: Taylor Blau, git, dstolee, peff

Derrick Stolee <stolee@gmail.com> writes:

>> Could you two please look over to see if I made any silly mistakes,
>> when I push it out later?
>
> I reproduced your merge and got very similar results. The differences
> that happened between my result and yours are not meaningful in any
> way.
>
> Definitely a very subtle merge, so thank you for doing that!

Resolving this kind of conflict is like a quick quiz in the
classroom to check my understanding of what is happening in all the
topics involved.  It is like solving a puzzle, and I do not dislike
doing it (otherwise I won't be working as a maintainer), but having
a second set of eyes, helping with independent verification, is
always appreciated.

Thanks.

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: [PATCH 2/9] midx: allow marking a pack as preferred
  2021-02-10 23:02 ` [PATCH 2/9] midx: allow marking a pack as preferred Taylor Blau
@ 2021-02-11 19:33   ` SZEDER Gábor
  2021-02-15 15:49     ` Taylor Blau
  0 siblings, 1 reply; 171+ messages in thread
From: SZEDER Gábor @ 2021-02-11 19:33 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, dstolee, gitster, peff

On Wed, Feb 10, 2021 at 06:02:42PM -0500, Taylor Blau wrote:
> diff --git a/Documentation/git-multi-pack-index.txt b/Documentation/git-multi-pack-index.txt
> index eb0caa0439..dd14eab781 100644
> --- a/Documentation/git-multi-pack-index.txt
> +++ b/Documentation/git-multi-pack-index.txt
> @@ -9,7 +9,8 @@ git-multi-pack-index - Write and verify multi-pack-indexes
>  SYNOPSIS
>  --------
>  [verse]
> -'git multi-pack-index' [--object-dir=<dir>] [--[no-]progress] <subcommand>
> +'git multi-pack-index' [--object-dir=<dir>] [--[no-]progress]
> +	[--preferred-pack=<pack>] <subcommand>
>  
>  DESCRIPTION
>  -----------
> @@ -27,6 +28,14 @@ OPTIONS
>  	Turn progress on/off explicitly. If neither is specified, progress is
>  	shown if standard error is connected to a terminal.
>  
> +--preferred-pack=<pack>::
> +	When using the `write` subcommand, optionally specify the
> +	tie-breaking pack used when multiple packs contain the same
> +	object. Incompatible with other subcommands, including `repack`,

I think this shouldn't be an option of the 'git multi-pack-index'
command but an option of its 'write' subcommand.


^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: [PATCH 2/9] midx: allow marking a pack as preferred
  2021-02-11 19:33   ` SZEDER Gábor
@ 2021-02-15 15:49     ` Taylor Blau
  2021-02-15 17:01       ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 171+ messages in thread
From: Taylor Blau @ 2021-02-15 15:49 UTC (permalink / raw)
  To: SZEDER Gábor; +Cc: git, dstolee, gitster, peff

On Thu, Feb 11, 2021 at 08:33:14PM +0100, SZEDER Gábor wrote:
> > +--preferred-pack=<pack>::
> > +	When using the `write` subcommand, optionally specify the
> > +	tie-breaking pack used when multiple packs contain the same
> > +	object. Incompatible with other subcommands, including `repack`,
>
> I think this shouldn't be an option of the 'git multi-pack-index'
> command but an option of its 'write' subcommand.

:-/. I wrote a lengthy response on Friday, but Gmail must have eaten it.

The gist of my response was that the intermingling of sub-commands with
options from other sub-commands goes deeper than just the documentation,
since command-line arguments are only parsed once in
builtin/multi-pack-index.c.

I explored whether or not it would be worth it to parse the common
options first, and then have separate options for each of the
sub-commands (as is done in the commit-graph builtin). But, this is
tricky, since we accept common options on either side of the sub-command
(i.e., we'd expect 'git multi-pack-index --object-dir=... write' to
behave the same as 'git multi-pack-index write --object-dir=...').

So you could let the first call to parse_options() parse all of the
arguments, but then specialized arguments (e.g., 'repack --batch-size')
would cause the parse-options API to barf because the first call to
parse_options() doesn't recognize '--batch-size'.

I think the easiest way to do it would be to pass
PARSE_OPT_STOP_AT_NON_OPTION, and then let the subsequent calls to
parse_options() pass an array of option structs that also includes the
common options so they can be parsed on either side of the sub-command.

Obviously this leads to a lot of rather unfortunate duplication. So,
I'm content to leave it all as-is, and let the multi-pack-index
builtin check the disallowed combinations itself (e.g., if you passed
'--preferred-pack' but aren't in 'write' mode, then we should complain).

I can certainly move this piece of documentation into the 'write'
section, though, which should alleviate your immediate concern.
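
To be clear about what I mean by checking the combination by hand, it
would be something as small as the following. The function name and the
error wording are placeholders rather than what the patch does.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical check: '--preferred-pack' only makes sense for 'write'.
 * preferred_pack is NULL when the option was not given.
 * Returns 0 if the combination is fine, -1 otherwise.
 */
static int check_preferred_pack(const char *subcommand,
				const char *preferred_pack)
{
	if (preferred_pack && strcmp(subcommand, "write")) {
		fprintf(stderr,
			"error: '--preferred-pack' is only valid with 'write'\n");
		return -1;
	}
	return 0;
}
```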

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: [PATCH 2/9] midx: allow marking a pack as preferred
  2021-02-15 15:49     ` Taylor Blau
@ 2021-02-15 17:01       ` Ævar Arnfjörð Bjarmason
  2021-02-15 18:41         ` [PATCH 0/5] commit-graph: parse_options() cleanup Ævar Arnfjörð Bjarmason
                           ` (6 more replies)
  0 siblings, 7 replies; 171+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-02-15 17:01 UTC (permalink / raw)
  To: Taylor Blau; +Cc: SZEDER Gábor, git, dstolee, gitster, peff


On Mon, Feb 15 2021, Taylor Blau wrote:

> On Thu, Feb 11, 2021 at 08:33:14PM +0100, SZEDER Gábor wrote:
>> > +--preferred-pack=<pack>::
>> > +	When using the `write` subcommand, optionally specify the
>> > +	tie-breaking pack used when multiple packs contain the same
>> > +	object. Incompatible with other subcommands, including `repack`,
>>
>> I think this shouldn't be an option of the 'git multi-pack-index'
>> command but an option of its 'write' subcommand.
>
> :-/. I wrote a lengthy response on Friday, but Gmail must have eaten it.
>
> The gist of my response was that the intermingling of sub-commands with
> options from other sub-commands goes deeper than just the documentation,
> since command-line arguments are only parsed once in
> builtin/multi-pack-index.c.
>
> I explored whether or not it would be worth it to parse the common
> options first, and then have separate options for each of the
> sub-commands (as is done in the commit-graph builtin). But, this is
> tricky, since we accept common options on either side of the sub-command
> (i.e., we'd expect 'git multi-pack-index --object-dir=... write' to
> behave the same as 'git multi-pack-index write --object-dir=...').
>
> So you could let the first call to parse_options() parse all of the
> arguments, but then specialized arguments (e.g., 'repack --batch-size')
> would cause the parse-options API to barf because the first call to
> parse_options() doesn't recognize '--batch-size'.
>
> I think the easiest way to do it would be to pass
> PARSE_OPT_STOP_AT_NON_OPTION, and then let the subsequent calls to
> parse_options() pass an array of option structs that also includes the
> common options so they can be parsed on either side of the sub-command.
>
> Obviously this leads to a lot of rather unfortunate duplication. So,
> I'm content to leave it all as-is, and let the multi-pack-index
> builtin check the disallowed combinations itself (e.g., if you passed
> '--preferred-pack' but aren't in 'write' mode, then we should complain).
>
> I can certainly move this piece of documentation into the 'write'
> section, though, which should alleviate your immediate concern.

I may be missing something, but...

It sounds to me like you're imagining this is more complex than it is
because you don't know about some/all of parse_options_concat() or
PARSE_OPT_KEEP_*.

See e.g. cmd_{switch,restore} in builtin/checkout.c, or the entire family
of diff-like commands where we do parse_options() followed by
setup_revisions(). We've got a lot of commands that parse options in a
piecemeal manner.

At no point do you need to re-parse the options. You just have the
common command parse as far as it gets, and leave anything else in
argv/argc for sub-commands like "write".

I think the problem is that you read the builtin/commit-graph.c code,
which could really benefit from using parse_options_concat(). Right now
things like "object-directory" are copy/pasted in that file. See 2087182272
(checkout: split options[] array in three pieces, 2019-03-29) for a
commit which simplified that sort of code.

In this case you'd share the "opts_multi_pack_index" struct between the
various commands, it would just have unused fields for "write" that
aren't used by "verify" or whatever.

The PARSE_OPT_STOP_AT_NON_OPTION flag isn't for what you're doing with
"write" here, since as your test shows you're doing:

    git multi-pack-index <ALL_OPTS> <SUBCOMMAND>

But PARSE_OPT_STOP_AT_NON_OPTION is for cases like "git-remote" that do:

    git multi-pack-index <COMMON_OPTS> <SUBCOMMAND> <SUBCOMMAND_OPTS>

(Or maybe you really want the latter, and the test change isn't
representative).

So then we want to stop at the first non-option, i.e. the subcommand. I
think it's good practice not to emulate how "git remote" works for new
commands, which makes things a bit simpler to implement.

You say "since we accept common options on either side of the
sub-command" but without PARSE_OPT_STOP_AT_NON_OPTION this works, since
if you can parse everything you'll have "write" left, but if you truly
have unknown options you'll have more than that in argv.
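
As a toy illustration of that "parse what you know, leave the rest in
argv" behaviour, here is a hand-rolled loop. To be clear, this is not
the parse_options() API; the struct, the function, and the handful of
options it recognises are stand-ins chosen for the example.

```c
#include <string.h>

/* Hypothetical option storage shared by all subcommands. */
struct common_opts {
	const char *object_dir;
	int progress;
};

/*
 * Consume the options we recognise from argv[0..argc), wherever they
 * appear, and compact everything else to the front of argv.
 * Returns the number of arguments left over.
 */
static int parse_common(int argc, const char **argv,
			struct common_opts *opts)
{
	int i, kept = 0;

	for (i = 0; i < argc; i++) {
		if (!strncmp(argv[i], "--object-dir=", 13))
			opts->object_dir = argv[i] + 13;
		else if (!strcmp(argv[i], "--progress"))
			opts->progress = 1;
		else if (!strcmp(argv[i], "--no-progress"))
			opts->progress = 0;
		else
			argv[kept++] = argv[i];	/* not ours, leave it */
	}
	return kept;
}
```

Given "--object-dir=objs write" or "write --object-dir=objs", the only
thing left over is "write". Given "write --bogus", two arguments are
left, and that is the signal to show usage.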

All of the above shouldn't be taken as a "your patch should change"
comment, maybe it's fine as-is.

I just replied because it sounded like you didn't spot how to easily use
parse_options() to do this sort of thing. It's actually rather easy.

^ permalink raw reply	[flat|nested] 171+ messages in thread

* [PATCH 0/5] commit-graph: parse_options() cleanup
  2021-02-15 17:01       ` Ævar Arnfjörð Bjarmason
@ 2021-02-15 18:41         ` Ævar Arnfjörð Bjarmason
  2021-02-15 18:41         ` [PATCH 1/5] commit-graph: define common usage with a macro Ævar Arnfjörð Bjarmason
                           ` (5 subsequent siblings)
  6 siblings, 0 replies; 171+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-02-15 18:41 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Taylor Blau, dstolee, SZEDER Gábor, peff,
	Ævar Arnfjörð Bjarmason

A trivial cleanup series as a follow-up to my comments in
https://lore.kernel.org/git/87r1lhb6z7.fsf@evledraar.gmail.com/

Ævar Arnfjörð Bjarmason (5):
  commit-graph: define common usage with a macro
  commit-graph: remove redundant handling of -h
  commit-graph: use parse_options_concat()
  commit-graph: refactor dispatch loop for style
  commit-graph: show usage on "commit-graph [write|verify] garbage"

 builtin/commit-graph.c  | 102 ++++++++++++++++++++++------------------
 t/t5318-commit-graph.sh |   7 +++
 2 files changed, 62 insertions(+), 47 deletions(-)

-- 
2.30.0.284.gd98b1dd5eaa7


^ permalink raw reply	[flat|nested] 171+ messages in thread

* [PATCH 1/5] commit-graph: define common usage with a macro
  2021-02-15 17:01       ` Ævar Arnfjörð Bjarmason
  2021-02-15 18:41         ` [PATCH 0/5] commit-graph: parse_options() cleanup Ævar Arnfjörð Bjarmason
@ 2021-02-15 18:41         ` Ævar Arnfjörð Bjarmason
  2021-02-16 11:33           ` Derrick Stolee
  2021-02-15 18:41         ` [PATCH 2/5] commit-graph: remove redundant handling of -h Ævar Arnfjörð Bjarmason
                           ` (4 subsequent siblings)
  6 siblings, 1 reply; 171+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-02-15 18:41 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Taylor Blau, dstolee, SZEDER Gábor, peff,
	Ævar Arnfjörð Bjarmason

Share the usage message between these three variables by using a
macro. Before this, new options needed to copy/paste the usage
information, see e.g. 809e0327f5 (builtin/commit-graph.c: introduce
'--max-new-filters=<n>', 2020-09-18).

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 builtin/commit-graph.c | 29 +++++++++++++++--------------
 1 file changed, 15 insertions(+), 14 deletions(-)

diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
index cd86315221..c3fa4fde3e 100644
--- a/builtin/commit-graph.c
+++ b/builtin/commit-graph.c
@@ -9,26 +9,27 @@
 #include "progress.h"
 #include "tag.h"
 
-static char const * const builtin_commit_graph_usage[] = {
-	N_("git commit-graph verify [--object-dir <objdir>] [--shallow] [--[no-]progress]"),
-	N_("git commit-graph write [--object-dir <objdir>] [--append] "
-	   "[--split[=<strategy>]] [--reachable|--stdin-packs|--stdin-commits] "
-	   "[--changed-paths] [--[no-]max-new-filters <n>] [--[no-]progress] "
-	   "<split options>"),
+static const char * builtin_commit_graph_verify_usage[] = {
+#define BUILTIN_COMMIT_GRAPH_VERIFY_USAGE \
+	N_("git commit-graph verify [--object-dir <objdir>] [--shallow] [--[no-]progress]")
+	BUILTIN_COMMIT_GRAPH_VERIFY_USAGE,
 	NULL
 };
 
-static const char * const builtin_commit_graph_verify_usage[] = {
-	N_("git commit-graph verify [--object-dir <objdir>] [--shallow] [--[no-]progress]"),
+static const char * builtin_commit_graph_write_usage[] = {
+#define BUILTIN_COMMIT_GRAPH_WRITE_USAGE \
+	N_("git commit-graph write [--object-dir <objdir>] [--append] " \
+	   "[--split[=<strategy>]] [--reachable|--stdin-packs|--stdin-commits] " \
+	   "[--changed-paths] [--[no-]max-new-filters <n>] [--[no-]progress] " \
+	   "<split options>")
+	BUILTIN_COMMIT_GRAPH_WRITE_USAGE,
 	NULL
 };
 
-static const char * const builtin_commit_graph_write_usage[] = {
-	N_("git commit-graph write [--object-dir <objdir>] [--append] "
-	   "[--split[=<strategy>]] [--reachable|--stdin-packs|--stdin-commits] "
-	   "[--changed-paths] [--[no-]max-new-filters <n>] [--[no-]progress] "
-	   "<split options>"),
-	NULL
+static char const * const builtin_commit_graph_usage[] = {
+	BUILTIN_COMMIT_GRAPH_VERIFY_USAGE,
+	BUILTIN_COMMIT_GRAPH_WRITE_USAGE,
+	NULL,
 };
 
 static struct opts_commit_graph {
-- 
2.30.0.284.gd98b1dd5eaa7

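The macro technique from the commit message can be sketched in isolation. This is only an illustration under stated assumptions: N_() is replaced by a no-op stand-in, and the DEMO_* names and shortened usage strings are hypothetical, not the ones in builtin/commit-graph.c.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for git's N_() translation marker; here it is a no-op. */
#define N_(msg) msg

/* Each usage string is defined once (shortened, hypothetical text) ... */
#define DEMO_VERIFY_USAGE N_("demo verify [--shallow]")
#define DEMO_WRITE_USAGE  N_("demo write [--append]")

/* ... and reused by the per-subcommand arrays and the combined one. */
static const char * const demo_verify_usage[] = { DEMO_VERIFY_USAGE, NULL };
static const char * const demo_write_usage[]  = { DEMO_WRITE_USAGE, NULL };
static const char * const demo_usage[] = {
	DEMO_VERIFY_USAGE,
	DEMO_WRITE_USAGE,
	NULL
};

/* Count the entries of a NULL-terminated usage array. */
static size_t usage_count(const char * const *usage)
{
	size_t n = 0;

	assert(usage);
	while (usage[n])
		n++;
	return n;
}
```

Adding an option then means editing one macro body; the top-level and per-subcommand usage output pick up the change together.
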

^ permalink raw reply	[flat|nested] 171+ messages in thread

* [PATCH 2/5] commit-graph: remove redundant handling of -h
  2021-02-15 17:01       ` Ævar Arnfjörð Bjarmason
  2021-02-15 18:41         ` [PATCH 0/5] commit-graph: parse_options() cleanup Ævar Arnfjörð Bjarmason
  2021-02-15 18:41         ` [PATCH 1/5] commit-graph: define common usage with a macro Ævar Arnfjörð Bjarmason
@ 2021-02-15 18:41         ` Ævar Arnfjörð Bjarmason
  2021-02-16 11:35           ` Derrick Stolee
  2021-02-15 18:41         ` [PATCH 3/5] commit-graph: use parse_options_concat() Ævar Arnfjörð Bjarmason
                           ` (3 subsequent siblings)
  6 siblings, 1 reply; 171+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-02-15 18:41 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Taylor Blau, dstolee, SZEDER Gábor, peff,
	Ævar Arnfjörð Bjarmason

If we don't handle the -h option here like most parse_options() users
we'll fall through and it'll do the right thing for us.

I think this code added in 4ce58ee38d (commit-graph: create
git-commit-graph builtin, 2018-04-02) was always redundant:
parse_options() did this at the time, and the commit-graph code never
used PARSE_OPT_NO_INTERNAL_HELP.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 builtin/commit-graph.c  | 4 ----
 t/t5318-commit-graph.sh | 5 +++++
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
index c3fa4fde3e..baead04a03 100644
--- a/builtin/commit-graph.c
+++ b/builtin/commit-graph.c
@@ -319,10 +319,6 @@ int cmd_commit_graph(int argc, const char **argv, const char *prefix)
 		OPT_END(),
 	};
 
-	if (argc == 2 && !strcmp(argv[1], "-h"))
-		usage_with_options(builtin_commit_graph_usage,
-				   builtin_commit_graph_options);
-
 	git_config(git_default_config, NULL);
 	argc = parse_options(argc, argv, prefix,
 			     builtin_commit_graph_options,
diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index 2ed0c1544d..567e68bd93 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -5,6 +5,11 @@ test_description='commit graph'
 
 GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS=0
 
+test_expect_success 'usage' '
+	test_expect_code 129 git commit-graph -h 2>err &&
+	! grep error: err
+'
+
 test_expect_success 'setup full repo' '
 	mkdir full &&
 	cd "$TRASH_DIRECTORY/full" &&
-- 
2.30.0.284.gd98b1dd5eaa7


^ permalink raw reply	[flat|nested] 171+ messages in thread

* [PATCH 3/5] commit-graph: use parse_options_concat()
  2021-02-15 17:01       ` Ævar Arnfjörð Bjarmason
                           ` (2 preceding siblings ...)
  2021-02-15 18:41         ` [PATCH 2/5] commit-graph: remove redundant handling of -h Ævar Arnfjörð Bjarmason
@ 2021-02-15 18:41         ` Ævar Arnfjörð Bjarmason
  2021-02-15 18:51           ` Taylor Blau
  2021-02-15 18:41         ` [PATCH 4/5] commit-graph: refactor dispatch loop for style Ævar Arnfjörð Bjarmason
                           ` (2 subsequent siblings)
  6 siblings, 1 reply; 171+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-02-15 18:41 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Taylor Blau, dstolee, SZEDER Gábor, peff,
	Ævar Arnfjörð Bjarmason

Make use of the parse_options_concat() so we don't need to copy/paste
common options like --object-dir. This is inspired by a similar change
to "checkout" in 2087182272
(checkout: split options[] array in three pieces, 2019-03-29).

A minor behavior change here is that now we're going to list both
--object-dir and --progress first; before, we'd list --progress along
with other options.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 builtin/commit-graph.c | 43 ++++++++++++++++++++++++------------------
 1 file changed, 25 insertions(+), 18 deletions(-)

diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
index baead04a03..a7718b2025 100644
--- a/builtin/commit-graph.c
+++ b/builtin/commit-graph.c
@@ -44,6 +44,21 @@ static struct opts_commit_graph {
 	int enable_changed_paths;
 } opts;
 
+static struct option *add_common_options(struct option *prevopts)
+{
+	struct option options[] = {
+		OPT_STRING(0, "object-dir", &opts.obj_dir,
+			   N_("dir"),
+			   N_("the object directory to store the graph")),
+		OPT_BOOL(0, "progress", &opts.progress,
+			 N_("force progress reporting")),
+		OPT_END()
+	};
+	struct option *newopts = parse_options_concat(options, prevopts);
+	free(prevopts);
+	return newopts;
+}
+
 static struct object_directory *find_odb(struct repository *r,
 					 const char *obj_dir)
 {
@@ -75,22 +90,20 @@ static int graph_verify(int argc, const char **argv)
 	int fd;
 	struct stat st;
 	int flags = 0;
-
+	struct option *options = NULL;
 	static struct option builtin_commit_graph_verify_options[] = {
-		OPT_STRING(0, "object-dir", &opts.obj_dir,
-			   N_("dir"),
-			   N_("the object directory to store the graph")),
 		OPT_BOOL(0, "shallow", &opts.shallow,
 			 N_("if the commit-graph is split, only verify the tip file")),
-		OPT_BOOL(0, "progress", &opts.progress, N_("force progress reporting")),
 		OPT_END(),
 	};
+	options = parse_options_dup(builtin_commit_graph_verify_options);
+	options = add_common_options(options);
 
 	trace2_cmd_mode("verify");
 
 	opts.progress = isatty(2);
 	argc = parse_options(argc, argv, NULL,
-			     builtin_commit_graph_verify_options,
+			     options,
 			     builtin_commit_graph_verify_usage, 0);
 
 	if (!opts.obj_dir)
@@ -205,11 +218,8 @@ static int graph_write(int argc, const char **argv)
 	int result = 0;
 	enum commit_graph_write_flags flags = 0;
 	struct progress *progress = NULL;
-
+	struct option *options = NULL;
 	static struct option builtin_commit_graph_write_options[] = {
-		OPT_STRING(0, "object-dir", &opts.obj_dir,
-			N_("dir"),
-			N_("the object directory to store the graph")),
 		OPT_BOOL(0, "reachable", &opts.reachable,
 			N_("start walk at all refs")),
 		OPT_BOOL(0, "stdin-packs", &opts.stdin_packs,
@@ -220,7 +230,6 @@ static int graph_write(int argc, const char **argv)
 			N_("include all commits already in the commit-graph file")),
 		OPT_BOOL(0, "changed-paths", &opts.enable_changed_paths,
 			N_("enable computation for changed paths")),
-		OPT_BOOL(0, "progress", &opts.progress, N_("force progress reporting")),
 		OPT_CALLBACK_F(0, "split", &write_opts.split_flags, NULL,
 			N_("allow writing an incremental commit-graph file"),
 			PARSE_OPT_OPTARG | PARSE_OPT_NONEG,
@@ -236,6 +245,8 @@ static int graph_write(int argc, const char **argv)
 			0, write_option_max_new_filters),
 		OPT_END(),
 	};
+	options = parse_options_dup(builtin_commit_graph_write_options);
+	options = add_common_options(options);
 
 	opts.progress = isatty(2);
 	opts.enable_changed_paths = -1;
@@ -249,7 +260,7 @@ static int graph_write(int argc, const char **argv)
 	git_config(git_commit_graph_write_config, &opts);
 
 	argc = parse_options(argc, argv, NULL,
-			     builtin_commit_graph_write_options,
+			     options,
 			     builtin_commit_graph_write_usage, 0);
 
 	if (opts.reachable + opts.stdin_packs + opts.stdin_commits > 1)
@@ -312,12 +323,8 @@ static int graph_write(int argc, const char **argv)
 
 int cmd_commit_graph(int argc, const char **argv, const char *prefix)
 {
-	static struct option builtin_commit_graph_options[] = {
-		OPT_STRING(0, "object-dir", &opts.obj_dir,
-			N_("dir"),
-			N_("the object directory to store the graph")),
-		OPT_END(),
-	};
+	struct option *no_options = parse_options_dup(NULL);
+	struct option *builtin_commit_graph_options = add_common_options(no_options);
 
 	git_config(git_default_config, NULL);
 	argc = parse_options(argc, argv, prefix,
-- 
2.30.0.284.gd98b1dd5eaa7


^ permalink raw reply	[flat|nested] 171+ messages in thread

* [PATCH 4/5] commit-graph: refactor dispatch loop for style
  2021-02-15 17:01       ` Ævar Arnfjörð Bjarmason
                           ` (3 preceding siblings ...)
  2021-02-15 18:41         ` [PATCH 3/5] commit-graph: use parse_options_concat() Ævar Arnfjörð Bjarmason
@ 2021-02-15 18:41         ` Ævar Arnfjörð Bjarmason
  2021-02-15 18:53           ` Taylor Blau
  2021-02-15 18:41         ` [PATCH 5/5] commit-graph: show usage on "commit-graph [write|verify] garbage" Ævar Arnfjörð Bjarmason
  2021-02-15 21:01         ` [PATCH v2 0/4] midx: split out sub-commands Taylor Blau
  6 siblings, 1 reply; 171+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-02-15 18:41 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Taylor Blau, dstolee, SZEDER Gábor, peff,
	Ævar Arnfjörð Bjarmason

I think it's more readable to have one if/else if/else chain here than
the code this replaces.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 builtin/commit-graph.c | 16 +++++++---------
 1 file changed, 7 insertions(+), 9 deletions(-)

diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
index a7718b2025..66fbdb7cb1 100644
--- a/builtin/commit-graph.c
+++ b/builtin/commit-graph.c
@@ -334,13 +334,11 @@ int cmd_commit_graph(int argc, const char **argv, const char *prefix)
 
 	save_commit_buffer = 0;
 
-	if (argc > 0) {
-		if (!strcmp(argv[0], "verify"))
-			return graph_verify(argc, argv);
-		if (!strcmp(argv[0], "write"))
-			return graph_write(argc, argv);
-	}
-
-	usage_with_options(builtin_commit_graph_usage,
-			   builtin_commit_graph_options);
+	if (argc && !strcmp(argv[0], "verify"))
+		return graph_verify(argc, argv);
+	else if (argc && !strcmp(argv[0], "write"))
+		return graph_write(argc, argv);
+	else
+		usage_with_options(builtin_commit_graph_usage,
+				   builtin_commit_graph_options);
 }
-- 
2.30.0.284.gd98b1dd5eaa7


^ permalink raw reply	[flat|nested] 171+ messages in thread

* [PATCH 5/5] commit-graph: show usage on "commit-graph [write|verify] garbage"
  2021-02-15 17:01       ` Ævar Arnfjörð Bjarmason
                           ` (4 preceding siblings ...)
  2021-02-15 18:41         ` [PATCH 4/5] commit-graph: refactor dispatch loop for style Ævar Arnfjörð Bjarmason
@ 2021-02-15 18:41         ` Ævar Arnfjörð Bjarmason
  2021-02-15 19:06           ` Taylor Blau
  2021-02-16 11:43           ` Derrick Stolee
  2021-02-15 21:01         ` [PATCH v2 0/4] midx: split out sub-commands Taylor Blau
  6 siblings, 2 replies; 171+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-02-15 18:41 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Taylor Blau, dstolee, SZEDER Gábor, peff,
	Ævar Arnfjörð Bjarmason

Change the parse_options() invocation in the commit-graph code to make
sense. We're calling it twice, once for common options parsing, and
then for the sub-commands.

But we never checked if we had something leftover in argc in "write"
or "verify"; as a result we'd silently accept garbage in these
subcommands. Let's not do that.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 builtin/commit-graph.c  | 10 ++++++++--
 t/t5318-commit-graph.sh |  4 +++-
 2 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
index 66fbdb7cb1..cb57771026 100644
--- a/builtin/commit-graph.c
+++ b/builtin/commit-graph.c
@@ -104,7 +104,10 @@ static int graph_verify(int argc, const char **argv)
 	opts.progress = isatty(2);
 	argc = parse_options(argc, argv, NULL,
 			     options,
-			     builtin_commit_graph_verify_usage, 0);
+			     builtin_commit_graph_verify_usage,
+			     PARSE_OPT_KEEP_UNKNOWN);
+	if (argc)
+		usage_with_options(builtin_commit_graph_verify_usage, options);
 
 	if (!opts.obj_dir)
 		opts.obj_dir = get_object_directory();
@@ -261,7 +264,10 @@ static int graph_write(int argc, const char **argv)
 
 	argc = parse_options(argc, argv, NULL,
 			     options,
-			     builtin_commit_graph_write_usage, 0);
+			     builtin_commit_graph_write_usage,
+			     PARSE_OPT_KEEP_UNKNOWN);
+	if (argc)
+		usage_with_options(builtin_commit_graph_write_usage, options);
 
 	if (opts.reachable + opts.stdin_packs + opts.stdin_commits > 1)
 		die(_("use at most one of --reachable, --stdin-commits, or --stdin-packs"));
diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index 567e68bd93..3f1c6dbc8f 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -7,7 +7,9 @@ GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS=0
 
 test_expect_success 'usage' '
 	test_expect_code 129 git commit-graph -h 2>err &&
-	! grep error: err
+	! grep error: err &&
+	test_expect_code 129 git commit-graph write blah &&
+	test_expect_code 129 git commit-graph write verify
 '
 
 test_expect_success 'setup full repo' '
-- 
2.30.0.284.gd98b1dd5eaa7


^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: [PATCH 3/5] commit-graph: use parse_options_concat()
  2021-02-15 18:41         ` [PATCH 3/5] commit-graph: use parse_options_concat() Ævar Arnfjörð Bjarmason
@ 2021-02-15 18:51           ` Taylor Blau
  2021-02-15 19:53             ` Taylor Blau
  2021-02-15 20:39             ` Ævar Arnfjörð Bjarmason
  0 siblings, 2 replies; 171+ messages in thread
From: Taylor Blau @ 2021-02-15 18:51 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Junio C Hamano, dstolee, SZEDER Gábor, peff

On Mon, Feb 15, 2021 at 07:41:16PM +0100, Ævar Arnfjörð Bjarmason wrote:
> Make use of the parse_options_concat() so we don't need to copy/paste
> common options like --object-dir. This is inspired by a similar change
> to "checkout" in 2087182272
> (checkout: split options[] array in three pieces, 2019-03-29).
>
> A minor behavior change here is that now we're going to list both
> --object-dir and --progress first, before we'd list --progress along
> with other options.

"Behavior change" referring only to the output of `git commit-graph -h`,
no?

Looking at the code (and understanding this whole situation a little bit
better), I'd think that this wouldn't cause us to parse anything
differently before or after this change, right?

> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
> ---
>  builtin/commit-graph.c | 43 ++++++++++++++++++++++++------------------
>  1 file changed, 25 insertions(+), 18 deletions(-)
>
> diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
> index baead04a03..a7718b2025 100644
> --- a/builtin/commit-graph.c
> +++ b/builtin/commit-graph.c
> @@ -44,6 +44,21 @@ static struct opts_commit_graph {
>  	int enable_changed_paths;
>  } opts;
>
> +static struct option *add_common_options(struct option *prevopts)
> +{
> +	struct option options[] = {
> +		OPT_STRING(0, "object-dir", &opts.obj_dir,
> +			   N_("dir"),
> +			   N_("the object directory to store the graph")),
> +		OPT_BOOL(0, "progress", &opts.progress,
> +			 N_("force progress reporting")),
> +		OPT_END()
> +	};

I'm nitpicking, but I wouldn't be sad to see this called "common"
instead.

Can't this also be declared statically?

> +	struct option *newopts = parse_options_concat(options, prevopts);
> +	free(prevopts);
> +	return newopts;
> +}
> +
>  static struct object_directory *find_odb(struct repository *r,
>  					 const char *obj_dir)
>  {
> @@ -75,22 +90,20 @@ static int graph_verify(int argc, const char **argv)
>  	int fd;
>  	struct stat st;
>  	int flags = 0;
> -
> +	struct option *options = NULL;
>  	static struct option builtin_commit_graph_verify_options[] = {
> -		OPT_STRING(0, "object-dir", &opts.obj_dir,
> -			   N_("dir"),
> -			   N_("the object directory to store the graph")),
>  		OPT_BOOL(0, "shallow", &opts.shallow,
>  			 N_("if the commit-graph is split, only verify the tip file")),
> -		OPT_BOOL(0, "progress", &opts.progress, N_("force progress reporting")),
>  		OPT_END(),
>  	};
> +	options = parse_options_dup(builtin_commit_graph_verify_options);

Another nitpick, but I'd rather see the initialization of "options" and
its declaration be on the same line, after declaring
builtin_commit_graph_verify_options.

> +	options = add_common_options(options);
>
>  	trace2_cmd_mode("verify");
>
>  	opts.progress = isatty(2);
>  	argc = parse_options(argc, argv, NULL,
> -			     builtin_commit_graph_verify_options,
> +			     options,
>  			     builtin_commit_graph_verify_usage, 0);
>
>  	if (!opts.obj_dir)
> @@ -205,11 +218,8 @@ static int graph_write(int argc, const char **argv)
>  	int result = 0;
>  	enum commit_graph_write_flags flags = 0;
>  	struct progress *progress = NULL;
> -
> +	struct option *options = NULL;
>  	static struct option builtin_commit_graph_write_options[] = {
> -		OPT_STRING(0, "object-dir", &opts.obj_dir,
> -			N_("dir"),
> -			N_("the object directory to store the graph")),
>  		OPT_BOOL(0, "reachable", &opts.reachable,
>  			N_("start walk at all refs")),
>  		OPT_BOOL(0, "stdin-packs", &opts.stdin_packs,
> @@ -220,7 +230,6 @@ static int graph_write(int argc, const char **argv)
>  			N_("include all commits already in the commit-graph file")),
>  		OPT_BOOL(0, "changed-paths", &opts.enable_changed_paths,
>  			N_("enable computation for changed paths")),
> -		OPT_BOOL(0, "progress", &opts.progress, N_("force progress reporting")),
>  		OPT_CALLBACK_F(0, "split", &write_opts.split_flags, NULL,
>  			N_("allow writing an incremental commit-graph file"),
>  			PARSE_OPT_OPTARG | PARSE_OPT_NONEG,
> @@ -236,6 +245,8 @@ static int graph_write(int argc, const char **argv)
>  			0, write_option_max_new_filters),
>  		OPT_END(),
>  	};
> +	options = parse_options_dup(builtin_commit_graph_write_options);
> +	options = add_common_options(options);
>
>  	opts.progress = isatty(2);
>  	opts.enable_changed_paths = -1;
> @@ -249,7 +260,7 @@ static int graph_write(int argc, const char **argv)
>  	git_config(git_commit_graph_write_config, &opts);
>
>  	argc = parse_options(argc, argv, NULL,
> -			     builtin_commit_graph_write_options,
> +			     options,
>  			     builtin_commit_graph_write_usage, 0);
>
>  	if (opts.reachable + opts.stdin_packs + opts.stdin_commits > 1)
> @@ -312,12 +323,8 @@ static int graph_write(int argc, const char **argv)
>
>  int cmd_commit_graph(int argc, const char **argv, const char *prefix)
>  {
> -	static struct option builtin_commit_graph_options[] = {
> -		OPT_STRING(0, "object-dir", &opts.obj_dir,
> -			N_("dir"),
> -			N_("the object directory to store the graph")),
> -		OPT_END(),
> -	};
> +	struct option *no_options = parse_options_dup(NULL);

Hmm. Why bother calling add_common_options at all here?

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: [PATCH 4/5] commit-graph: refactor dispatch loop for style
  2021-02-15 18:41         ` [PATCH 4/5] commit-graph: refactor dispatch loop for style Ævar Arnfjörð Bjarmason
@ 2021-02-15 18:53           ` Taylor Blau
  2021-02-16 11:40             ` Derrick Stolee
  0 siblings, 1 reply; 171+ messages in thread
From: Taylor Blau @ 2021-02-15 18:53 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Junio C Hamano, dstolee, SZEDER Gábor, peff

On Mon, Feb 15, 2021 at 07:41:17PM +0100, Ævar Arnfjörð Bjarmason wrote:
> I think it's more readable to have one if/elsif/else chain here than
> the code this replaces.

FWIW, I find the pre-image more readable than what you are proposing
replacing it with here.

Of course, I have no doubts about the obvious correctness of this patch;
I'm merely suggesting that I wouldn't be sad to see us apply the first
three patches, and the fifth patch, but drop this one.

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: [PATCH 5/5] commit-graph: show usage on "commit-graph [write|verify] garbage"
  2021-02-15 18:41         ` [PATCH 5/5] commit-graph: show usage on "commit-graph [write|verify] garbage" Ævar Arnfjörð Bjarmason
@ 2021-02-15 19:06           ` Taylor Blau
  2021-02-16 11:43           ` Derrick Stolee
  1 sibling, 0 replies; 171+ messages in thread
From: Taylor Blau @ 2021-02-15 19:06 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Junio C Hamano, dstolee, SZEDER Gábor, peff

On Mon, Feb 15, 2021 at 07:41:18PM +0100, Ævar Arnfjörð Bjarmason wrote:
> Change the parse_options() invocation in the commit-graph code to make
> sense. We're calling it twice, once for common options parsing, and
> then for the sub-commands.
>
> But we never checked if we had something leftover in argc in "write"
> or "verify", as a result we'd silently accept garbage in these
> subcommands. Let's not do that.

...Implicit in all of this is that we need to pass
PARSE_OPT_KEEP_UNKNOWN to have the sub-commands' call to parse_options()
leave extra cruft alone so we can check for its existence with an "if (argc)".

Makes sense, thanks.

Thanks,
Taylor
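
The "if (argc)" check discussed above can be sketched outside of git as follows. This is a rough illustration only: strip_known_flags() is a hypothetical stand-in for the parser, not parse_options(), and argv here is assumed to hold only the arguments that follow the sub-command name.

```c
#include <assert.h>
#include <string.h>

/* Exit status used for usage errors; the tests in the patch expect 129. */
#define USAGE_EXIT_CODE 129

/*
 * Hypothetical stand-in for an option parser: drop recognized flags
 * from argv, compact the rest to the front, and return how many
 * arguments were left over.
 */
static int strip_known_flags(int argc, const char **argv)
{
	int i, left = 0;

	assert(argc >= 0 && argv);
	for (i = 0; i < argc; i++) {
		if (!strcmp(argv[i], "--append") || !strcmp(argv[i], "--progress"))
			continue; /* recognized: consume it */
		argv[left++] = argv[i]; /* unrecognized: keep it */
	}
	return left;
}

/* Return 0 on success, or the usage exit code if anything is left over. */
static int demo_write(int argc, const char **argv)
{
	argc = strip_known_flags(argc, argv);
	if (argc)
		return USAGE_EXIT_CODE;
	return 0;
}
```

With this shape, an invocation like "write blah" fails with the usage exit code instead of being silently accepted.
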

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: [PATCH 3/5] commit-graph: use parse_options_concat()
  2021-02-15 18:51           ` Taylor Blau
@ 2021-02-15 19:53             ` Taylor Blau
  2021-02-15 20:39             ` Ævar Arnfjörð Bjarmason
  1 sibling, 0 replies; 171+ messages in thread
From: Taylor Blau @ 2021-02-15 19:53 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Junio C Hamano, dstolee, SZEDER Gábor, peff

On Mon, Feb 15, 2021 at 01:51:35PM -0500, Taylor Blau wrote:
> Another nitpick, but I'd rather see the initialization of "options" and
> its declaration be on the same line, after declaring
> builtin_commit_graph_verify_options.

Ignore me; the NULL initialization is important.

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: [PATCH 3/5] commit-graph: use parse_options_concat()
  2021-02-15 18:51           ` Taylor Blau
  2021-02-15 19:53             ` Taylor Blau
@ 2021-02-15 20:39             ` Ævar Arnfjörð Bjarmason
  2021-09-17 21:13               ` SZEDER Gábor
  1 sibling, 1 reply; 171+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-02-15 20:39 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, Junio C Hamano, dstolee, SZEDER Gábor, peff


On Mon, Feb 15 2021, Taylor Blau wrote:

> On Mon, Feb 15, 2021 at 07:41:16PM +0100, Ævar Arnfjörð Bjarmason wrote:
>> Make use of the parse_options_concat() so we don't need to copy/paste
>> common options like --object-dir. This is inspired by a similar change
>> to "checkout" in 2087182272
>> (checkout: split options[] array in three pieces, 2019-03-29).
>>
>> A minor behavior change here is that now we're going to list both
>> --object-dir and --progress first, before we'd list --progress along
>> with other options.
>
> "Behavior change" referring only to the output of `git commit-graph -h`,
> no?
>
> Looking at the code (and understanding this whole situation a little bit
> better), I'd think that this wouldn't cause us to parse anything
> differently before or after this change, right?

Indeed, I just mean the "-h" or "--invalid-opt" output changed in the
order we show the options in.

>> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
>> ---
>>  builtin/commit-graph.c | 43 ++++++++++++++++++++++++------------------
>>  1 file changed, 25 insertions(+), 18 deletions(-)
>>
>> diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
>> index baead04a03..a7718b2025 100644
>> --- a/builtin/commit-graph.c
>> +++ b/builtin/commit-graph.c
>> @@ -44,6 +44,21 @@ static struct opts_commit_graph {
>>  	int enable_changed_paths;
>>  } opts;
>>
>> +static struct option *add_common_options(struct option *prevopts)
>> +{
>> +	struct option options[] = {
>> +		OPT_STRING(0, "object-dir", &opts.obj_dir,
>> +			   N_("dir"),
>> +			   N_("the object directory to store the graph")),
>> +		OPT_BOOL(0, "progress", &opts.progress,
>> +			 N_("force progress reporting")),
>> +		OPT_END()
>> +	};
>
> I'm nitpicking, but I wouldn't be sad to see this called "common"
> instead".
>
> Can't this also be declared statically?

It happens to work now to do that, but try it in builtin/checkout.c and
you'll see it blows up with a wall of "initializer element is not
constant".

Probably better to be consistent in parse_options() usage than make it
safe for that sort of use...

>> +	struct option *newopts = parse_options_concat(options, prevopts);
>> +	free(prevopts);
>> +	return newopts;
>> +}
>> +
>>  static struct object_directory *find_odb(struct repository *r,
>>  					 const char *obj_dir)
>>  {
>> @@ -75,22 +90,20 @@ static int graph_verify(int argc, const char **argv)
>>  	int fd;
>>  	struct stat st;
>>  	int flags = 0;
>> -
>> +	struct option *options = NULL;
>>  	static struct option builtin_commit_graph_verify_options[] = {
>> -		OPT_STRING(0, "object-dir", &opts.obj_dir,
>> -			   N_("dir"),
>> -			   N_("the object directory to store the graph")),
>>  		OPT_BOOL(0, "shallow", &opts.shallow,
>>  			 N_("if the commit-graph is split, only verify the tip file")),
>> -		OPT_BOOL(0, "progress", &opts.progress, N_("force progress reporting")),
>>  		OPT_END(),
>>  	};
>> +	options = parse_options_dup(builtin_commit_graph_verify_options);
>
> Another nitpick, but I'd rather see the initialization of "options" and
> its declaration be on the same line, after declaring
> builtin_commit_graph_verify_options.

As you noted in your own reply "the NULL initialization is important",
or more specifically: We're doing this dance here (and in other existing
code, e.g. checkout.c) to trampoline from the stack to the heap.

>> +	options = add_common_options(options);
>>
>>  	trace2_cmd_mode("verify");
>>
>>  	opts.progress = isatty(2);
>>  	argc = parse_options(argc, argv, NULL,
>> -			     builtin_commit_graph_verify_options,
>> +			     options,
>>  			     builtin_commit_graph_verify_usage, 0);
>>
>>  	if (!opts.obj_dir)
>> @@ -205,11 +218,8 @@ static int graph_write(int argc, const char **argv)
>>  	int result = 0;
>>  	enum commit_graph_write_flags flags = 0;
>>  	struct progress *progress = NULL;
>> -
>> +	struct option *options = NULL;
>>  	static struct option builtin_commit_graph_write_options[] = {
>> -		OPT_STRING(0, "object-dir", &opts.obj_dir,
>> -			N_("dir"),
>> -			N_("the object directory to store the graph")),
>>  		OPT_BOOL(0, "reachable", &opts.reachable,
>>  			N_("start walk at all refs")),
>>  		OPT_BOOL(0, "stdin-packs", &opts.stdin_packs,
>> @@ -220,7 +230,6 @@ static int graph_write(int argc, const char **argv)
>>  			N_("include all commits already in the commit-graph file")),
>>  		OPT_BOOL(0, "changed-paths", &opts.enable_changed_paths,
>>  			N_("enable computation for changed paths")),
>> -		OPT_BOOL(0, "progress", &opts.progress, N_("force progress reporting")),
>>  		OPT_CALLBACK_F(0, "split", &write_opts.split_flags, NULL,
>>  			N_("allow writing an incremental commit-graph file"),
>>  			PARSE_OPT_OPTARG | PARSE_OPT_NONEG,
>> @@ -236,6 +245,8 @@ static int graph_write(int argc, const char **argv)
>>  			0, write_option_max_new_filters),
>>  		OPT_END(),
>>  	};
>> +	options = parse_options_dup(builtin_commit_graph_write_options);
>> +	options = add_common_options(options);
>>
>>  	opts.progress = isatty(2);
>>  	opts.enable_changed_paths = -1;
>> @@ -249,7 +260,7 @@ static int graph_write(int argc, const char **argv)
>>  	git_config(git_commit_graph_write_config, &opts);
>>
>>  	argc = parse_options(argc, argv, NULL,
>> -			     builtin_commit_graph_write_options,
>> +			     options,
>>  			     builtin_commit_graph_write_usage, 0);
>>
>>  	if (opts.reachable + opts.stdin_packs + opts.stdin_commits > 1)
>> @@ -312,12 +323,8 @@ static int graph_write(int argc, const char **argv)
>>
>>  int cmd_commit_graph(int argc, const char **argv, const char *prefix)
>>  {
>> -	static struct option builtin_commit_graph_options[] = {
>> -		OPT_STRING(0, "object-dir", &opts.obj_dir,
>> -			N_("dir"),
>> -			N_("the object directory to store the graph")),
>> -		OPT_END(),
>> -	};
>> +	struct option *no_options = parse_options_dup(NULL);
>
> Hmm. Why bother calling add_common_options at all here?

I assume you mean in this line just below what you quoted:

    struct option *builtin_commit_graph_options = add_common_options(no_options);

Do you mean why not do the whole thing in graph_{verify,write}() and
only show the usage if we fail here?

Yeah arguably that makes more sense, but I wanted to just focus on
refactoring existing behavior & get rid of the copy/pasted options
rather than start a bigger rewrite of "maybe we shouldn't show this
rather useless help info if we die here....".
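
For readers following the dup/concat dance in this sub-thread, here is a minimal sketch of the pattern: copy a sub-command's table to the heap, prepend the common entries, and free the previous copy. struct demo_opt and the demo_* helpers are hypothetical, much-reduced stand-ins, not git's parse-options API.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, much-reduced option record; a NULL name ends a table. */
struct demo_opt {
	const char *name;
};

/* Number of entries before the terminator; NULL counts as empty. */
static size_t demo_opts_len(const struct demo_opt *o)
{
	size_t n = 0;

	while (o && o[n].name)
		n++;
	return n;
}

/* Heap-allocate "a" followed by "b", plus a terminator. */
static struct demo_opt *demo_opts_concat(const struct demo_opt *a,
					 const struct demo_opt *b)
{
	size_t na = demo_opts_len(a), nb = demo_opts_len(b);
	struct demo_opt *out = malloc((na + nb + 1) * sizeof(*out));

	assert(out);
	if (na)
		memcpy(out, a, na * sizeof(*out));
	if (nb)
		memcpy(out + na, b, nb * sizeof(*out));
	out[na + nb].name = NULL; /* terminator */
	return out;
}

/* Prepend the shared options and free the previous heap copy. */
static struct demo_opt *demo_add_common(struct demo_opt *prev)
{
	const struct demo_opt common[] = {
		{ "object-dir" }, { "progress" }, { NULL }
	};
	struct demo_opt *out = demo_opts_concat(common, prev);

	free(prev);
	return out;
}
```

Because the common table is passed first, the shared options come out ahead of the sub-command's own, which matches the ordering change in the "-h" output mentioned above.
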

^ permalink raw reply	[flat|nested] 171+ messages in thread

* [PATCH v2 0/4] midx: split out sub-commands
  2021-02-15 17:01       ` Ævar Arnfjörð Bjarmason
                           ` (5 preceding siblings ...)
  2021-02-15 18:41         ` [PATCH 5/5] commit-graph: show usage on "commit-graph [write|verify] garbage" Ævar Arnfjörð Bjarmason
@ 2021-02-15 21:01         ` Taylor Blau
  2021-02-15 21:01           ` [PATCH v2 1/4] builtin/multi-pack-index.c: inline 'flags' with options Taylor Blau
                             ` (4 more replies)
  6 siblings, 5 replies; 171+ messages in thread
From: Taylor Blau @ 2021-02-15 21:01 UTC (permalink / raw)
  To: git; +Cc: avarab, peff, dstolee, szeder.dev, gitster

Here are a few patches that we could add to the beginning of this series,
or queue up separately.

I think that these are all fairly straightforward, but it would be good
to have Ævar take a look and make sure I'm not doing anything wrong
here.

I'll plan to send a v2 of the reverse index series in a few days with
these four new patches at the beginning.

Taylor Blau (4):
  builtin/multi-pack-index.c: inline 'flags' with options
  builtin/multi-pack-index.c: don't handle 'progress' separately
  builtin/multi-pack-index.c: define common usage with a macro
  builtin/multi-pack-index.c: split sub-commands

 builtin/multi-pack-index.c | 155 +++++++++++++++++++++++++++++--------
 1 file changed, 124 insertions(+), 31 deletions(-)

--
2.30.0.667.g81c0cbc6fd

^ permalink raw reply	[flat|nested] 171+ messages in thread

* [PATCH v2 1/4] builtin/multi-pack-index.c: inline 'flags' with options
  2021-02-15 21:01         ` [PATCH v2 0/4] midx: split out sub-commands Taylor Blau
@ 2021-02-15 21:01           ` Taylor Blau
  2021-02-15 21:01           ` [PATCH v2 2/4] builtin/multi-pack-index.c: don't handle 'progress' separately Taylor Blau
                             ` (3 subsequent siblings)
  4 siblings, 0 replies; 171+ messages in thread
From: Taylor Blau @ 2021-02-15 21:01 UTC (permalink / raw)
  To: git; +Cc: avarab, peff, dstolee, szeder.dev, gitster

Subcommands of the 'git multi-pack-index' command (e.g., 'write',
'verify', etc.) will want to optionally change a set of shared flags
that are eventually passed to the MIDX libraries.

Right now, options and flags are handled separately. Inline them into
the same structure so that sub-commands can more easily share the
'flags' data.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 builtin/multi-pack-index.c | 13 ++++++-------
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/builtin/multi-pack-index.c b/builtin/multi-pack-index.c
index 5bf88cd2a8..4a0ddb06c4 100644
--- a/builtin/multi-pack-index.c
+++ b/builtin/multi-pack-index.c
@@ -14,13 +14,12 @@ static struct opts_multi_pack_index {
 	const char *object_dir;
 	unsigned long batch_size;
 	int progress;
+	unsigned flags;
 } opts;
 
 int cmd_multi_pack_index(int argc, const char **argv,
 			 const char *prefix)
 {
-	unsigned flags = 0;
-
 	static struct option builtin_multi_pack_index_options[] = {
 		OPT_FILENAME(0, "object-dir", &opts.object_dir,
 		  N_("object directory containing set of packfile and pack-index pairs")),
@@ -40,7 +39,7 @@ int cmd_multi_pack_index(int argc, const char **argv,
 	if (!opts.object_dir)
 		opts.object_dir = get_object_directory();
 	if (opts.progress)
-		flags |= MIDX_PROGRESS;
+		opts.flags |= MIDX_PROGRESS;
 
 	if (argc == 0)
 		usage_with_options(builtin_multi_pack_index_usage,
@@ -55,16 +54,16 @@ int cmd_multi_pack_index(int argc, const char **argv,
 
 	if (!strcmp(argv[0], "repack"))
 		return midx_repack(the_repository, opts.object_dir,
-			(size_t)opts.batch_size, flags);
+			(size_t)opts.batch_size, opts.flags);
 	if (opts.batch_size)
 		die(_("--batch-size option is only for 'repack' subcommand"));
 
 	if (!strcmp(argv[0], "write"))
-		return write_midx_file(opts.object_dir, flags);
+		return write_midx_file(opts.object_dir, opts.flags);
 	if (!strcmp(argv[0], "verify"))
-		return verify_midx_file(the_repository, opts.object_dir, flags);
+		return verify_midx_file(the_repository, opts.object_dir, opts.flags);
 	if (!strcmp(argv[0], "expire"))
-		return expire_midx_packs(the_repository, opts.object_dir, flags);
+		return expire_midx_packs(the_repository, opts.object_dir, opts.flags);
 
 	die(_("unrecognized subcommand: %s"), argv[0]);
 }
-- 
2.30.0.667.g81c0cbc6fd



* [PATCH v2 2/4] builtin/multi-pack-index.c: don't handle 'progress' separately
  2021-02-15 21:01         ` [PATCH v2 0/4] midx: split out sub-commands Taylor Blau
  2021-02-15 21:01           ` [PATCH v2 1/4] builtin/multi-pack-index.c: inline 'flags' with options Taylor Blau
@ 2021-02-15 21:01           ` Taylor Blau
  2021-02-15 21:39             ` Ævar Arnfjörð Bjarmason
  2021-02-15 21:01           ` [PATCH v2 3/4] builtin/multi-pack-index.c: define common usage with a macro Taylor Blau
                             ` (2 subsequent siblings)
  4 siblings, 1 reply; 171+ messages in thread
From: Taylor Blau @ 2021-02-15 21:01 UTC (permalink / raw)
  To: git; +Cc: avarab, peff, dstolee, szeder.dev, gitster

Now that there is a shared 'flags' member in the options structure,
there is no need to keep track of whether to force progress or not,
since ultimately the decision of whether or not to show a progress meter
is controlled by a bit in the flags member.

Manipulate that bit directly, and drop the now-unnecessary 'progress'
field while we're at it.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 builtin/multi-pack-index.c | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/builtin/multi-pack-index.c b/builtin/multi-pack-index.c
index 4a0ddb06c4..c70f020d8f 100644
--- a/builtin/multi-pack-index.c
+++ b/builtin/multi-pack-index.c
@@ -13,7 +13,6 @@ static char const * const builtin_multi_pack_index_usage[] = {
 static struct opts_multi_pack_index {
 	const char *object_dir;
 	unsigned long batch_size;
-	int progress;
 	unsigned flags;
 } opts;
 
@@ -23,7 +22,7 @@ int cmd_multi_pack_index(int argc, const char **argv,
 	static struct option builtin_multi_pack_index_options[] = {
 		OPT_FILENAME(0, "object-dir", &opts.object_dir,
 		  N_("object directory containing set of packfile and pack-index pairs")),
-		OPT_BOOL(0, "progress", &opts.progress, N_("force progress reporting")),
+		OPT_BIT(0, "progress", &opts.flags, N_("force progress reporting"), MIDX_PROGRESS),
 		OPT_MAGNITUDE(0, "batch-size", &opts.batch_size,
 		  N_("during repack, collect pack-files of smaller size into a batch that is larger than this size")),
 		OPT_END(),
@@ -31,15 +30,14 @@ int cmd_multi_pack_index(int argc, const char **argv,
 
 	git_config(git_default_config, NULL);
 
-	opts.progress = isatty(2);
+	if (isatty(2))
+		opts.flags |= MIDX_PROGRESS;
 	argc = parse_options(argc, argv, prefix,
 			     builtin_multi_pack_index_options,
 			     builtin_multi_pack_index_usage, 0);
 
 	if (!opts.object_dir)
 		opts.object_dir = get_object_directory();
-	if (opts.progress)
-		opts.flags |= MIDX_PROGRESS;
 
 	if (argc == 0)
 		usage_with_options(builtin_multi_pack_index_usage,
-- 
2.30.0.667.g81c0cbc6fd



* [PATCH v2 3/4] builtin/multi-pack-index.c: define common usage with a macro
  2021-02-15 21:01         ` [PATCH v2 0/4] midx: split out sub-commands Taylor Blau
  2021-02-15 21:01           ` [PATCH v2 1/4] builtin/multi-pack-index.c: inline 'flags' with options Taylor Blau
  2021-02-15 21:01           ` [PATCH v2 2/4] builtin/multi-pack-index.c: don't handle 'progress' separately Taylor Blau
@ 2021-02-15 21:01           ` Taylor Blau
  2021-02-15 21:01           ` [PATCH v2 4/4] builtin/multi-pack-index.c: split sub-commands Taylor Blau
  2021-02-16 11:50           ` [PATCH v2 0/4] midx: split out sub-commands Derrick Stolee
  4 siblings, 0 replies; 171+ messages in thread
From: Taylor Blau @ 2021-02-15 21:01 UTC (permalink / raw)
  To: git; +Cc: avarab, peff, dstolee, szeder.dev, gitster

Factor out the usage message into pieces corresponding to each mode.
This prevents options specific to one sub-command from being shared with
another in the usage.

A subsequent commit will use these #define macros to have usage
variables for each sub-command without duplicating their contents.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 builtin/multi-pack-index.c | 17 ++++++++++++++++-
 1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/builtin/multi-pack-index.c b/builtin/multi-pack-index.c
index c70f020d8f..eea498e026 100644
--- a/builtin/multi-pack-index.c
+++ b/builtin/multi-pack-index.c
@@ -5,8 +5,23 @@
 #include "midx.h"
 #include "trace2.h"
 
+#define BUILTIN_MIDX_WRITE_USAGE \
+	N_("git multi-pack-index [<options>] write")
+
+#define BUILTIN_MIDX_VERIFY_USAGE \
+	N_("git multi-pack-index [<options>] verify")
+
+#define BUILTIN_MIDX_EXPIRE_USAGE \
+	N_("git multi-pack-index [<options>] expire")
+
+#define BUILTIN_MIDX_REPACK_USAGE \
+	N_("git multi-pack-index [<options>] repack [--batch-size=<size>]")
+
 static char const * const builtin_multi_pack_index_usage[] = {
-	N_("git multi-pack-index [<options>] (write|verify|expire|repack --batch-size=<size>)"),
+	BUILTIN_MIDX_WRITE_USAGE,
+	BUILTIN_MIDX_VERIFY_USAGE,
+	BUILTIN_MIDX_EXPIRE_USAGE,
+	BUILTIN_MIDX_REPACK_USAGE,
 	NULL
 };
 
-- 
2.30.0.667.g81c0cbc6fd



* [PATCH v2 4/4] builtin/multi-pack-index.c: split sub-commands
  2021-02-15 21:01         ` [PATCH v2 0/4] midx: split out sub-commands Taylor Blau
                             ` (2 preceding siblings ...)
  2021-02-15 21:01           ` [PATCH v2 3/4] builtin/multi-pack-index.c: define common usage with a macro Taylor Blau
@ 2021-02-15 21:01           ` Taylor Blau
  2021-02-15 21:54             ` Ævar Arnfjörð Bjarmason
  2021-02-16 11:50           ` [PATCH v2 0/4] midx: split out sub-commands Derrick Stolee
  4 siblings, 1 reply; 171+ messages in thread
From: Taylor Blau @ 2021-02-15 21:01 UTC (permalink / raw)
  To: git; +Cc: avarab, peff, dstolee, szeder.dev, gitster

Handle sub-commands of the 'git multi-pack-index' builtin (e.g.,
"write", "repack", etc.) separately from one another. This allows
sub-commands with unique options, without forcing cmd_multi_pack_index()
to reject invalid combinations itself.

This comes at the cost of some duplication and boilerplate. Luckily, the
duplication is reduced to a minimum, since common options are shared
among sub-commands due to a suggestion by Ævar. (Sub-commands do have to
retain the common options, too, since this builtin accepts common
options on either side of the sub-command).

Roughly speaking, cmd_multi_pack_index() parses options (including
common ones), and stops at the first non-option, which is the
sub-command. It then dispatches to the appropriate sub-command, which
parses the remaining options (also including common options).

Unknown options are kept by the sub-commands in order to detect their
presence (and complain that too many arguments were given).

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 builtin/multi-pack-index.c | 131 ++++++++++++++++++++++++++++++-------
 1 file changed, 106 insertions(+), 25 deletions(-)

diff --git a/builtin/multi-pack-index.c b/builtin/multi-pack-index.c
index eea498e026..caf0248a98 100644
--- a/builtin/multi-pack-index.c
+++ b/builtin/multi-pack-index.c
@@ -5,17 +5,33 @@
 #include "midx.h"
 #include "trace2.h"
 
+static char const * const builtin_multi_pack_index_write_usage[] = {
 #define BUILTIN_MIDX_WRITE_USAGE \
 	N_("git multi-pack-index [<options>] write")
+	BUILTIN_MIDX_WRITE_USAGE,
+	NULL
+};
 
+static char const * const builtin_multi_pack_index_verify_usage[] = {
 #define BUILTIN_MIDX_VERIFY_USAGE \
 	N_("git multi-pack-index [<options>] verify")
+	BUILTIN_MIDX_VERIFY_USAGE,
+	NULL
+};
 
+static char const * const builtin_multi_pack_index_expire_usage[] = {
 #define BUILTIN_MIDX_EXPIRE_USAGE \
 	N_("git multi-pack-index [<options>] expire")
+	BUILTIN_MIDX_EXPIRE_USAGE,
+	NULL
+};
 
+static char const * const builtin_multi_pack_index_repack_usage[] = {
 #define BUILTIN_MIDX_REPACK_USAGE \
 	N_("git multi-pack-index [<options>] repack [--batch-size=<size>]")
+	BUILTIN_MIDX_REPACK_USAGE,
+	NULL
+};
 
 static char const * const builtin_multi_pack_index_usage[] = {
 	BUILTIN_MIDX_WRITE_USAGE,
@@ -31,25 +47,99 @@ static struct opts_multi_pack_index {
 	unsigned flags;
 } opts;
 
-int cmd_multi_pack_index(int argc, const char **argv,
-			 const char *prefix)
+static struct option common_opts[] = {
+	OPT_FILENAME(0, "object-dir", &opts.object_dir,
+	  N_("object directory containing set of packfile and pack-index pairs")),
+	OPT_BIT(0, "progress", &opts.flags, N_("force progress reporting"), MIDX_PROGRESS),
+	OPT_END(),
+};
+
+static struct option *add_common_options(struct option *prev)
 {
-	static struct option builtin_multi_pack_index_options[] = {
-		OPT_FILENAME(0, "object-dir", &opts.object_dir,
-		  N_("object directory containing set of packfile and pack-index pairs")),
-		OPT_BIT(0, "progress", &opts.flags, N_("force progress reporting"), MIDX_PROGRESS),
+	struct option *with_common = parse_options_concat(common_opts, prev);
+	free(prev);
+	return with_common;
+}
+
+static int cmd_multi_pack_index_write(int argc, const char **argv)
+{
+	struct option *options = common_opts;
+
+	argc = parse_options(argc, argv, NULL,
+			     options, builtin_multi_pack_index_write_usage,
+			     PARSE_OPT_KEEP_UNKNOWN);
+	if (argc)
+		usage_with_options(builtin_multi_pack_index_write_usage,
+				   options);
+
+	return write_midx_file(opts.object_dir, opts.flags);
+}
+
+static int cmd_multi_pack_index_verify(int argc, const char **argv)
+{
+	struct option *options = common_opts;
+
+	argc = parse_options(argc, argv, NULL,
+			     options, builtin_multi_pack_index_verify_usage,
+			     PARSE_OPT_KEEP_UNKNOWN);
+	if (argc)
+		usage_with_options(builtin_multi_pack_index_verify_usage,
+				   options);
+
+	return verify_midx_file(the_repository, opts.object_dir, opts.flags);
+}
+
+static int cmd_multi_pack_index_expire(int argc, const char **argv)
+{
+	struct option *options = common_opts;
+
+	argc = parse_options(argc, argv, NULL,
+			     options, builtin_multi_pack_index_expire_usage,
+			     PARSE_OPT_KEEP_UNKNOWN);
+	if (argc)
+		usage_with_options(builtin_multi_pack_index_expire_usage,
+				   options);
+
+	return expire_midx_packs(the_repository, opts.object_dir, opts.flags);
+}
+
+static int cmd_multi_pack_index_repack(int argc, const char **argv)
+{
+	struct option *options;
+	static struct option builtin_multi_pack_index_repack_options[] = {
 		OPT_MAGNITUDE(0, "batch-size", &opts.batch_size,
 		  N_("during repack, collect pack-files of smaller size into a batch that is larger than this size")),
 		OPT_END(),
 	};
 
+	options = parse_options_dup(builtin_multi_pack_index_repack_options);
+	options = add_common_options(options);
+
+	argc = parse_options(argc, argv, NULL,
+			     options,
+			     builtin_multi_pack_index_repack_usage,
+			     PARSE_OPT_KEEP_UNKNOWN);
+	if (argc)
+		usage_with_options(builtin_multi_pack_index_repack_usage,
+				   options);
+
+	return midx_repack(the_repository, opts.object_dir,
+			   (size_t)opts.batch_size, opts.flags);
+}
+
+int cmd_multi_pack_index(int argc, const char **argv,
+			 const char *prefix)
+{
+	struct option *builtin_multi_pack_index_options = common_opts;
+
 	git_config(git_default_config, NULL);
 
 	if (isatty(2))
 		opts.flags |= MIDX_PROGRESS;
 	argc = parse_options(argc, argv, prefix,
 			     builtin_multi_pack_index_options,
-			     builtin_multi_pack_index_usage, 0);
+			     builtin_multi_pack_index_usage,
+			     PARSE_OPT_STOP_AT_NON_OPTION);
 
 	if (!opts.object_dir)
 		opts.object_dir = get_object_directory();
@@ -58,25 +148,16 @@ int cmd_multi_pack_index(int argc, const char **argv,
 		usage_with_options(builtin_multi_pack_index_usage,
 				   builtin_multi_pack_index_options);
 
-	if (argc > 1) {
-		die(_("too many arguments"));
-		return 1;
-	}
-
 	trace2_cmd_mode(argv[0]);
 
 	if (!strcmp(argv[0], "repack"))
-		return midx_repack(the_repository, opts.object_dir,
-			(size_t)opts.batch_size, opts.flags);
-	if (opts.batch_size)
-		die(_("--batch-size option is only for 'repack' subcommand"));
-
-	if (!strcmp(argv[0], "write"))
-		return write_midx_file(opts.object_dir, opts.flags);
-	if (!strcmp(argv[0], "verify"))
-		return verify_midx_file(the_repository, opts.object_dir, opts.flags);
-	if (!strcmp(argv[0], "expire"))
-		return expire_midx_packs(the_repository, opts.object_dir, opts.flags);
-
-	die(_("unrecognized subcommand: %s"), argv[0]);
+		return cmd_multi_pack_index_repack(argc, argv);
+	else if (!strcmp(argv[0], "write"))
+		return cmd_multi_pack_index_write(argc, argv);
+	else if (!strcmp(argv[0], "verify"))
+		return cmd_multi_pack_index_verify(argc, argv);
+	else if (!strcmp(argv[0], "expire"))
+		return cmd_multi_pack_index_expire(argc, argv);
+	else
+		die(_("unrecognized subcommand: %s"), argv[0]);
 }
-- 
2.30.0.667.g81c0cbc6fd


* Re: [PATCH v2 2/4] builtin/multi-pack-index.c: don't handle 'progress' separately
  2021-02-15 21:01           ` [PATCH v2 2/4] builtin/multi-pack-index.c: don't handle 'progress' separately Taylor Blau
@ 2021-02-15 21:39             ` Ævar Arnfjörð Bjarmason
  2021-02-15 21:45               ` Taylor Blau
  0 siblings, 1 reply; 171+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-02-15 21:39 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, peff, dstolee, szeder.dev, gitster


On Mon, Feb 15 2021, Taylor Blau wrote:

> @@ -31,15 +30,14 @@ int cmd_multi_pack_index(int argc, const char **argv,
>  
>  	git_config(git_default_config, NULL);
>  
> -	opts.progress = isatty(2);
> +	if (isatty(2))
> +		opts.flags |= MIDX_PROGRESS;
>  	argc = parse_options(argc, argv, prefix,
>  			     builtin_multi_pack_index_options,
>  			     builtin_multi_pack_index_usage, 0);
>  
>  	if (!opts.object_dir)
>  		opts.object_dir = get_object_directory();
> -	if (opts.progress)
> -		opts.flags |= MIDX_PROGRESS;


Funnily enough we could also just do:

    opts.flags = isatty(2);

Since there's a grand total of one flag it knows about, and
MIDX_PROGRESS is defined as 1.

Not the problem of this series really, just a nit: In efbc3aee08d (midx:
add MIDX_PROGRESS flag, 2019-10-21) we added this flag, and around the
same time the similar commit-graph code got refactored to have an enum
of flags in 5af80394521 (commit-graph: collapse parameters into flags,
2019-06-12).

I prefer the commit-graph way of having a clean boundary between the two
a bit more, and then just setting a flag based on an OPT_BOOL...
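
For illustration, a minimal sketch of why assigning the isatty(2) result
directly only works while MIDX_PROGRESS is the sole flag and is defined
as 1. The bit values, the second flag MIDX_EXAMPLE_OTHER, and both helper
names are assumptions made up for this sketch; they are not taken from
midx.h.

```c
/* Assumed values for illustration; only the name MIDX_PROGRESS comes
 * from the thread. */
#define MIDX_PROGRESS      (1u << 0)
#define MIDX_EXAMPLE_OTHER (1u << 1) /* hypothetical second flag */

/* Overwrites every bit, so it is only safe while no other flag exists. */
static unsigned set_progress_by_assignment(unsigned flags, int is_tty)
{
	(void)flags; /* previous bits are discarded on purpose */
	return (unsigned)is_tty;
}

/* Sets only the progress bit and leaves all other bits untouched. */
static unsigned set_progress_by_or(unsigned flags, int is_tty)
{
	if (is_tty)
		flags |= MIDX_PROGRESS;
	return flags;
}
```

With a second flag already set, the assignment form silently drops it,
while the "|=" form keeps it. That is the practical reason to prefer the
latter once more flags are expected.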


* Re: [PATCH v2 2/4] builtin/multi-pack-index.c: don't handle 'progress' separately
  2021-02-15 21:39             ` Ævar Arnfjörð Bjarmason
@ 2021-02-15 21:45               ` Taylor Blau
  2021-02-16 11:47                 ` Derrick Stolee
  0 siblings, 1 reply; 171+ messages in thread
From: Taylor Blau @ 2021-02-15 21:45 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, peff, dstolee, szeder.dev, gitster

On Mon, Feb 15, 2021 at 10:39:16PM +0100, Ævar Arnfjörð Bjarmason wrote:
>
> On Mon, Feb 15 2021, Taylor Blau wrote:
>
> > @@ -31,15 +30,14 @@ int cmd_multi_pack_index(int argc, const char **argv,
> >
> >  	git_config(git_default_config, NULL);
> >
> > -	opts.progress = isatty(2);
> > +	if (isatty(2))
> > +		opts.flags |= MIDX_PROGRESS;
> >  	argc = parse_options(argc, argv, prefix,
> >  			     builtin_multi_pack_index_options,
> >  			     builtin_multi_pack_index_usage, 0);
> >
> >  	if (!opts.object_dir)
> >  		opts.object_dir = get_object_directory();
> > -	if (opts.progress)
> > -		opts.flags |= MIDX_PROGRESS;
>
>
> Funnily enough we could also just do:
>
>     opts.flags = isatty(2);
>
> Since there's a grand total of one flag it knows about, and
> MIDX_PROGRESS is defined as 1.

:-). I have a handful of branches that add some new flags (including the
original series I sent down-thread), so I'm not sure that I'm in favor
of this (admittedly cute) hack.

> Not the problem of this series really, just a nit: In efbc3aee08d (midx:
> add MIDX_PROGRESS flag, 2019-10-21) we added this flag, and around the
> same time the similar commit-graph code got refactored to have an enum
> of flags in 5af80394521 (commit-graph: collapse parameters into flags,
> 2019-06-12).

Hmm. I don't really have a strong opinion either way. I'd like to avoid
steering too far away from my original goal of multi-pack reverse
indexes, at least for now...

> I prefer the commit-graph way of having a clean boundary between the two
> a bit more, and then just setting a flag based on an OPT_BOOL...

Me too. But if you can part ways with it, it cuts down on the code
duplication (since callers in the sub-commands don't have to set that
bit on the flags themselves).

OTOH, we could keep half of this change and store the flags in the
options structure in addition to the progress field, then set the
appropriate bit in "flags" in cmd_multi_pack_index().

But I think at that point you're already sharing the flags field
everywhere, so you're just as well off to have something like what's
written in this patch here.

I don't have strong feelings either way.

Thanks,
Taylor


* Re: [PATCH v2 4/4] builtin/multi-pack-index.c: split sub-commands
  2021-02-15 21:01           ` [PATCH v2 4/4] builtin/multi-pack-index.c: split sub-commands Taylor Blau
@ 2021-02-15 21:54             ` Ævar Arnfjörð Bjarmason
  2021-02-15 22:34               ` Taylor Blau
  0 siblings, 1 reply; 171+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-02-15 21:54 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, peff, dstolee, szeder.dev, gitster


On Mon, Feb 15 2021, Taylor Blau wrote:

>  	trace2_cmd_mode(argv[0]);

Not a new issue, but curious that in the commit-graph.c code we'll first
validate, but here write garbage into the trace2_cmd_mode() before
potentially dying.

>  
>  	if (!strcmp(argv[0], "repack"))
> -		return midx_repack(the_repository, opts.object_dir,
> -			(size_t)opts.batch_size, opts.flags);
> -	if (opts.batch_size)
> -		die(_("--batch-size option is only for 'repack' subcommand"));
> -
> -	if (!strcmp(argv[0], "write"))
> -		return write_midx_file(opts.object_dir, opts.flags);
> -	if (!strcmp(argv[0], "verify"))
> -		return verify_midx_file(the_repository, opts.object_dir, opts.flags);
> -	if (!strcmp(argv[0], "expire"))
> -		return expire_midx_packs(the_repository, opts.object_dir, opts.flags);
> -
> -	die(_("unrecognized subcommand: %s"), argv[0]);
> +		return cmd_multi_pack_index_repack(argc, argv);
> +	else if (!strcmp(argv[0], "write"))
> +		return cmd_multi_pack_index_write(argc, argv);
> +	else if (!strcmp(argv[0], "verify"))
> +		return cmd_multi_pack_index_verify(argc, argv);
> +	else if (!strcmp(argv[0], "expire"))
> +		return cmd_multi_pack_index_expire(argc, argv);
> +	else
> +		die(_("unrecognized subcommand: %s"), argv[0]);

I realize this is the existing behavior, but let's just make this die()
be the usage_with_options() we emit above in this case?

So maybe this on top?

diff --git a/builtin/multi-pack-index.c b/builtin/multi-pack-index.c
index caf0248a98..6f9223d538 100644
--- a/builtin/multi-pack-index.c
+++ b/builtin/multi-pack-index.c
@@ -65,6 +65,8 @@ static int cmd_multi_pack_index_write(int argc, const char **argv)
 {
 	struct option *options = common_opts;
 
+	trace2_cmd_mode(argv[0]);
+
 	argc = parse_options(argc, argv, NULL,
 			     options, builtin_multi_pack_index_write_usage,
 			     PARSE_OPT_KEEP_UNKNOWN);
@@ -79,6 +81,8 @@ static int cmd_multi_pack_index_verify(int argc, const char **argv)
 {
 	struct option *options = common_opts;
 
+	trace2_cmd_mode(argv[0]);
+
 	argc = parse_options(argc, argv, NULL,
 			     options, builtin_multi_pack_index_verify_usage,
 			     PARSE_OPT_KEEP_UNKNOWN);
@@ -93,6 +97,8 @@ static int cmd_multi_pack_index_expire(int argc, const char **argv)
 {
 	struct option *options = common_opts;
 
+	trace2_cmd_mode(argv[0]);
+
 	argc = parse_options(argc, argv, NULL,
 			     options, builtin_multi_pack_index_expire_usage,
 			     PARSE_OPT_KEEP_UNKNOWN);
@@ -112,6 +118,8 @@ static int cmd_multi_pack_index_repack(int argc, const char **argv)
 		OPT_END(),
 	};
 
+	trace2_cmd_mode(argv[0]);
+
 	options = parse_options_dup(builtin_multi_pack_index_repack_options);
 	options = add_common_options(options);
 
@@ -144,20 +152,15 @@ int cmd_multi_pack_index(int argc, const char **argv,
 	if (!opts.object_dir)
 		opts.object_dir = get_object_directory();
 
-	if (argc == 0)
-		usage_with_options(builtin_multi_pack_index_usage,
-				   builtin_multi_pack_index_options);
-
-	trace2_cmd_mode(argv[0]);
-
-	if (!strcmp(argv[0], "repack"))
+	if (argc && !strcmp(argv[0], "repack"))
 		return cmd_multi_pack_index_repack(argc, argv);
-	else if (!strcmp(argv[0], "write"))
+	else if (argc && !strcmp(argv[0], "write"))
 		return cmd_multi_pack_index_write(argc, argv);
-	else if (!strcmp(argv[0], "verify"))
+	else if (argc && !strcmp(argv[0], "verify"))
 		return cmd_multi_pack_index_verify(argc, argv);
-	else if (!strcmp(argv[0], "expire"))
+	else if (argc && !strcmp(argv[0], "expire"))
 		return cmd_multi_pack_index_expire(argc, argv);
 	else
-		die(_("unrecognized subcommand: %s"), argv[0]);
+		usage_with_options(builtin_multi_pack_index_usage,
+				   builtin_multi_pack_index_options);
 }



* Re: [PATCH v2 4/4] builtin/multi-pack-index.c: split sub-commands
  2021-02-15 21:54             ` Ævar Arnfjörð Bjarmason
@ 2021-02-15 22:34               ` Taylor Blau
  2021-02-15 23:11                 ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 171+ messages in thread
From: Taylor Blau @ 2021-02-15 22:34 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Taylor Blau, git, peff, dstolee, szeder.dev, gitster

On Mon, Feb 15, 2021 at 10:54:31PM +0100, Ævar Arnfjörð Bjarmason wrote:
>
> On Mon, Feb 15 2021, Taylor Blau wrote:
>
> >  	trace2_cmd_mode(argv[0]);
>
> Not a new issue, but curious that in the commit-graph.c code we'll first
> validate, but here write garbage into the trace2_cmd_mode() before
> potentially dying.

Yeah, that's a good catch.

> I realize this is the existing behavior, but let's just make this die()
> be the usage_with_options() we emit above in this case?
>
> So maybe this on top?

I split this into two patches: one to move the trace2_cmd_mode() calls
around, and another to replace the final 'die()' with the usage text.

Like I said in my review of your patches to the commit-graph builtin
here:

    https://lore.kernel.org/git/YCrDGhIq7kU57p1s@nand.local/

I don't find the 'if (argc && ...)' style more readable, so the second
patch looks like this instead:

diff --git a/builtin/multi-pack-index.c b/builtin/multi-pack-index.c
index 5ab80bc722..ce4f1a0bcb 100644
--- a/builtin/multi-pack-index.c
+++ b/builtin/multi-pack-index.c
@@ -165,5 +165,6 @@ int cmd_multi_pack_index(int argc, const char **argv,
        else if (!strcmp(argv[0], "expire"))
                return cmd_multi_pack_index_expire(argc, argv);
        else
-               die(_("unrecognized subcommand: %s"), argv[0]);
+               usage_with_options(builtin_multi_pack_index_usage,
+                                  builtin_multi_pack_index_options);
 }

Is it OK if I use your Signed-off-by on both of those two new patches?

Thanks,
Taylor


* Re: [PATCH v2 4/4] builtin/multi-pack-index.c: split sub-commands
  2021-02-15 22:34               ` Taylor Blau
@ 2021-02-15 23:11                 ` Ævar Arnfjörð Bjarmason
  2021-02-15 23:49                   ` Taylor Blau
  0 siblings, 1 reply; 171+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-02-15 23:11 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, peff, dstolee, szeder.dev, gitster


On Mon, Feb 15 2021, Taylor Blau wrote:

> On Mon, Feb 15, 2021 at 10:54:31PM +0100, Ævar Arnfjörð Bjarmason wrote:
>>
>> On Mon, Feb 15 2021, Taylor Blau wrote:
>>
>> >  	trace2_cmd_mode(argv[0]);
>>
>> Not a new issue, but curious that in the commit-graph.c code we'll first
>> validate, but here write garbage into the trace2_cmd_mode() before
>> potentially dying.
>
> Yeah, that's a good catch.
>
>> I realize this is the existing behavior, but let's just make this die()
>> be the usage_with_options() we emit above in this case?
>>
>> So maybe this on top?
>
> I split this into two patches: one to move the trace2_cmd_mode() calls
> around, and another to replace the final 'die()' with the usage text.

Thanks for picking it up.

> Like I said in my review of your patches to the commit-graph builtin
> here:
>
>     https://lore.kernel.org/git/YCrDGhIq7kU57p1s@nand.local/
>
> I don't find the 'if (argc && ...)' style more readable, so the second
> patch looks like this instead:

*Nod* FWIW (and this is getting way too nit-y) I don't disagree with you
about the "argc &&" being not very readable.

I just lean more on the side of getting rid of duplicate branches;
you'll still need the if (!argc) usage(...) case above without that
pattern, or some replacement for it.

But we can have our cake (not re-check argc all the time) and eat it too
(not copy/paste usage_with_options()). Isn't it beautiful?
    
    diff --git a/builtin/multi-pack-index.c b/builtin/multi-pack-index.c
    index caf0248a98..7ff50439f8 100644
    --- a/builtin/multi-pack-index.c
    +++ b/builtin/multi-pack-index.c
    @@ -144,12 +144,8 @@ int cmd_multi_pack_index(int argc, const char **argv,
     	if (!opts.object_dir)
     		opts.object_dir = get_object_directory();
     
    -	if (argc == 0)
    -		usage_with_options(builtin_multi_pack_index_usage,
    -				   builtin_multi_pack_index_options);
    -
    -	trace2_cmd_mode(argv[0]);
    -
    +	if (!argc)
    +		goto usage;
     	if (!strcmp(argv[0], "repack"))
     		return cmd_multi_pack_index_repack(argc, argv);
     	else if (!strcmp(argv[0], "write"))
    @@ -159,5 +155,7 @@ int cmd_multi_pack_index(int argc, const char **argv,
     	else if (!strcmp(argv[0], "expire"))
     		return cmd_multi_pack_index_expire(argc, argv);
     	else
    -		die(_("unrecognized subcommand: %s"), argv[0]);
    +usage:
    +		usage_with_options(builtin_multi_pack_index_usage,
    +				   builtin_multi_pack_index_options);
     }
    
:)

> diff --git a/builtin/multi-pack-index.c b/builtin/multi-pack-index.c
> index 5ab80bc722..ce4f1a0bcb 100644
> --- a/builtin/multi-pack-index.c
> +++ b/builtin/multi-pack-index.c
> @@ -165,5 +165,6 @@ int cmd_multi_pack_index(int argc, const char **argv,
>         else if (!strcmp(argv[0], "expire"))
>                 return cmd_multi_pack_index_expire(argc, argv);
>         else
> -               die(_("unrecognized subcommand: %s"), argv[0]);
> +               usage_with_options(builtin_multi_pack_index_usage,
> +                                  builtin_multi_pack_index_options);
>  }
>
> Is it OK if I use your Signed-off-by on both of those two new patches?

Yes please, should have included it to begin with.


* Re: [PATCH v2 4/4] builtin/multi-pack-index.c: split sub-commands
  2021-02-15 23:11                 ` Ævar Arnfjörð Bjarmason
@ 2021-02-15 23:49                   ` Taylor Blau
  0 siblings, 0 replies; 171+ messages in thread
From: Taylor Blau @ 2021-02-15 23:49 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Taylor Blau, git, peff, dstolee, szeder.dev, gitster

On Tue, Feb 16, 2021 at 12:11:08AM +0100, Ævar Arnfjörð Bjarmason wrote:
> > I split this into two patches: one to move the trace2_cmd_mode() calls
> > around, and another to replace the final 'die()' with the usage text.
>
> Thanks for picking it up.

Of course. This has been quite a fun digression :-).

> > Like I said in my review of your patches to the commit-graph builtin
> > here:
> >
> >     https://lore.kernel.org/git/YCrDGhIq7kU57p1s@nand.local/
> >
> > I don't find the 'if (argc && ...)' style more readable, so the second
> > patch looks like this instead:
>
> *Nod* FWIW (and this is getting way too nit-y) I don't disagree with you
> about the "argc &&" being not very readable.
>
> I just lean more on the side of getting rid of duplicate branches;
> you'll still need the if (!argc) usage(...) case above without that
> pattern, or some replacement for it.
>
> But we can have our cake (not re-check argc all the time) and eat it too
> (not copy/paste usage_with_options()). Isn't it beautiful?

Heh; I'm not sure that I'd call adding a goto "beautiful", but I
actually do find this one more readable. I'm happy to squash it into the
last commit on top, but I don't feel strongly one way or the other ;).

> > Is it OK if I use your Signed-off-by on both of those two new patches?
>
> Yes please, should have included it to begin with.

Thanks, and no worries.

Thanks,
Taylor
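
For readers following along, a minimal sketch of the "goto usage"
dispatch shape discussed above. It checks argc once and has a single
usage exit. The function name, the return codes, and the two-verb set
are hypothetical stand-ins; the real code calls usage_with_options()
and the cmd_multi_pack_index_*() helpers instead of returning codes.

```c
#include <string.h>

/* Hypothetical result codes so the sketch can be checked standalone. */
enum example_result {
	EXAMPLE_RAN_WRITE,
	EXAMPLE_RAN_VERIFY,
	EXAMPLE_SHOWED_USAGE
};

static enum example_result example_dispatch(int argc, const char **argv)
{
	/* One argc check up front; argv[0] is never read when argc is 0. */
	if (!argc)
		goto usage;
	if (!strcmp(argv[0], "write"))
		return EXAMPLE_RAN_WRITE;
	else if (!strcmp(argv[0], "verify"))
		return EXAMPLE_RAN_VERIFY;
	else
usage:
		/* Shared exit for "no sub-command" and "unknown sub-command". */
		return EXAMPLE_SHOWED_USAGE;
}
```

The label sits on the statement in the final else branch, so both the
empty-argv case and the unrecognized-verb case land on the same line
without repeating the usage call.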


* Re: [PATCH 1/5] commit-graph: define common usage with a macro
  2021-02-15 18:41         ` [PATCH 1/5] commit-graph: define common usage with a macro Ævar Arnfjörð Bjarmason
@ 2021-02-16 11:33           ` Derrick Stolee
  0 siblings, 0 replies; 171+ messages in thread
From: Derrick Stolee @ 2021-02-16 11:33 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason, git
  Cc: Junio C Hamano, Taylor Blau, dstolee, SZEDER Gábor, peff

On 2/15/2021 1:41 PM, Ævar Arnfjörð Bjarmason wrote:
> +static const char * builtin_commit_graph_verify_usage[] = {
> +#define BUILTIN_COMMIT_GRAPH_VERIFY_USAGE \
> +	N_("git commit-graph verify [--object-dir <objdir>] [--shallow] [--[no-]progress]")
> +	BUILTIN_COMMIT_GRAPH_VERIFY_USAGE,
>  	NULL
>  };
>  
> +static const char * builtin_commit_graph_write_usage[] = {
> +#define BUILTIN_COMMIT_GRAPH_WRITE_USAGE \
> +	N_("git commit-graph write [--object-dir <objdir>] [--append] " \
> +	   "[--split[=<strategy>]] [--reachable|--stdin-packs|--stdin-commits] " \
> +	   "[--changed-paths] [--[no-]max-new-filters <n>] [--[no-]progress] " \
> +	   "<split options>")
> +	BUILTIN_COMMIT_GRAPH_WRITE_USAGE,
>  	NULL
>  };

This seemed very unnatural to me, but it all makes sense in the end:

> +static char const * const builtin_commit_graph_usage[] = {
> +	BUILTIN_COMMIT_GRAPH_VERIFY_USAGE,
> +	BUILTIN_COMMIT_GRAPH_WRITE_USAGE,
> +	NULL,
>  };

Clever!

Thanks,
-Stolee
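
A stripped-down sketch of the trick, for anyone else who finds it
unnatural at first: the #define lines sit inside the array initializer,
the preprocessor consumes them, and the macro stays visible afterwards
so the combined usage array can reuse the same string. All EXAMPLE_*
names, the usage strings, and the two helper functions are made up for
this sketch and do not exist in builtin/commit-graph.c.

```c
#include <stddef.h>
#include <string.h>

static const char * const example_verify_usage[] = {
#define EXAMPLE_VERIFY_USAGE \
	"example verify [--shallow]"
	EXAMPLE_VERIFY_USAGE,
	NULL
};

static const char * const example_write_usage[] = {
#define EXAMPLE_WRITE_USAGE \
	"example write [--append]"
	EXAMPLE_WRITE_USAGE,
	NULL
};

/* The macros defined above are still in scope here. */
static const char * const example_usage[] = {
	EXAMPLE_VERIFY_USAGE,
	EXAMPLE_WRITE_USAGE,
	NULL
};

/* Counts entries in a NULL-terminated usage array. */
static size_t example_usage_count(const char * const *usage)
{
	size_t n = 0;
	while (usage[n])
		n++;
	return n;
}

/* Returns 1 if 'line' appears in the NULL-terminated usage array. */
static int example_usage_contains(const char * const *usage, const char *line)
{
	for (; *usage; usage++)
		if (!strcmp(*usage, line))
			return 1;
	return 0;
}
```

Each usage string is written exactly once, yet it appears both in its
per-sub-command array and in the combined one.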


* Re: [PATCH 2/5] commit-graph: remove redundant handling of -h
  2021-02-15 18:41         ` [PATCH 2/5] commit-graph: remove redundant handling of -h Ævar Arnfjörð Bjarmason
@ 2021-02-16 11:35           ` Derrick Stolee
  0 siblings, 0 replies; 171+ messages in thread
From: Derrick Stolee @ 2021-02-16 11:35 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason, git
  Cc: Junio C Hamano, Taylor Blau, dstolee, SZEDER Gábor, peff

On 2/15/2021 1:41 PM, Ævar Arnfjörð Bjarmason wrote:
> +test_expect_success 'usage' '
> +	test_expect_code 129 git commit-graph -h 2>err &&
> +	! grep error: err
> +'

I think this test already exists in t0012-help.sh, since it
tests bogus options for all of the builtins. (I can guarantee
that I wouldn't have thought to add the check without some
instance like that.)

Thanks,
-Stolee

* Re: [PATCH 4/5] commit-graph: refactor dispatch loop for style
  2021-02-15 18:53           ` Taylor Blau
@ 2021-02-16 11:40             ` Derrick Stolee
  2021-02-16 12:02               ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 171+ messages in thread
From: Derrick Stolee @ 2021-02-16 11:40 UTC (permalink / raw)
  To: Taylor Blau, Ævar Arnfjörð Bjarmason
  Cc: git, Junio C Hamano, dstolee, SZEDER Gábor, peff

On 2/15/2021 1:53 PM, Taylor Blau wrote:
> On Mon, Feb 15, 2021 at 07:41:17PM +0100, Ævar Arnfjörð Bjarmason wrote:
>> I think it's more readable to have one if/elsif/else chain here than
>> the code this replaces.
> 
> FWIW, I find the pre-image more readable than what you are proposing
> replacing it with here.
> 
> Of course, I have no doubts about the obvious correctness of this patch;
> I'm merely suggesting that I wouldn't be sad to see us apply the first
> three patches, and the fifth patch, but drop this one.

I agree with all of your points here. I think that compared to the
current code at-rest, the new version might be preferred. It's a little
dense, which is my only complaint.

The issue comes for the future: what if we need to add a third verb
to 'git commit-graph'? Then extending this new option looks worse since
we would check 'argc' three times.

The other patches solve real readability problems or reorganize the code
to use other concepts within the codebase. This one is much more optional.

Thanks,
-Stolee

* Re: [PATCH 5/5] commit-graph: show usage on "commit-graph [write|verify] garbage"
  2021-02-15 18:41         ` [PATCH 5/5] commit-graph: show usage on "commit-graph [write|verify] garbage" Ævar Arnfjörð Bjarmason
  2021-02-15 19:06           ` Taylor Blau
@ 2021-02-16 11:43           ` Derrick Stolee
  1 sibling, 0 replies; 171+ messages in thread
From: Derrick Stolee @ 2021-02-16 11:43 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason, git
  Cc: Junio C Hamano, Taylor Blau, dstolee, SZEDER Gábor, peff

On 2/15/2021 1:41 PM, Ævar Arnfjörð Bjarmason wrote:
>  test_expect_success 'usage' '
>  	test_expect_code 129 git commit-graph -h 2>err &&
> -	! grep error: err
> +	! grep error: err &&
> +	test_expect_code 129 git commit-graph write blah &&
> +	test_expect_code 129 git commit-graph write verify
>  '

This extension of the test justifies its existence, so ignore my earlier
comment about it being redundant.

Thanks,
-Stolee

* Re: [PATCH v2 2/4] builtin/multi-pack-index.c: don't handle 'progress' separately
  2021-02-15 21:45               ` Taylor Blau
@ 2021-02-16 11:47                 ` Derrick Stolee
  0 siblings, 0 replies; 171+ messages in thread
From: Derrick Stolee @ 2021-02-16 11:47 UTC (permalink / raw)
  To: Taylor Blau, Ævar Arnfjörð Bjarmason
  Cc: git, peff, dstolee, szeder.dev, gitster

On 2/15/2021 4:45 PM, Taylor Blau wrote:
> On Mon, Feb 15, 2021 at 10:39:16PM +0100, Ævar Arnfjörð Bjarmason wrote:
>> Funnily enough we could also just do:
>>
>>     opts.flags = isatty(2);
>>
>> Since there's a grand total of one flag it knows about, and
>> MIDX_PROGRESS is defined as 1.
> 
> :-). I have a handful of branches that add some new flags (including the
> original series I sent down-thread), so I'm not sure that I'm in favor
> of this (admittedly cute) hack.

It's also _wrong_ if the user passes in '--progress' but redirects stderr
to a file. I don't know why someone would want to do that, but they
could, and we honor that throughout all commands.
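
A minimal sketch of the behavior I mean, assuming a tri-state option
value (-1 when neither '--progress' nor '--no-progress' was given).
The names SKETCH_PROGRESS, resolve_flags() and flags_for_stderr() are
hypothetical and are not taken from builtin/multi-pack-index.c:

```c
#include <unistd.h>

/* Hypothetical stand-in for a one-bit progress flag such as MIDX_PROGRESS. */
#define SKETCH_PROGRESS (1u << 0)

/*
 * progress_opt is -1 if the user gave neither --progress nor
 * --no-progress, 0 for --no-progress, and 1 for --progress.
 * The terminal check is only a fallback for the "not given" case.
 */
static unsigned resolve_flags(int progress_opt, int stderr_is_tty)
{
	unsigned flags = 0;

	if (progress_opt < 0)
		progress_opt = stderr_is_tty;
	if (progress_opt)
		flags |= SKETCH_PROGRESS;
	return flags;
}

/* Convenience wrapper that asks the OS whether stderr is a terminal. */
static unsigned flags_for_stderr(int progress_opt)
{
	return resolve_flags(progress_opt, isatty(2) ? 1 : 0);
}
```

With this shape, an explicit '--progress' survives a redirected stderr,
which a bare "flags = isatty(2)" would not allow.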

Thanks,
-Stolee

* Re: [PATCH v2 0/4] midx: split out sub-commands
  2021-02-15 21:01         ` [PATCH v2 0/4] midx: split out sub-commands Taylor Blau
                             ` (3 preceding siblings ...)
  2021-02-15 21:01           ` [PATCH v2 4/4] builtin/multi-pack-index.c: split sub-commands Taylor Blau
@ 2021-02-16 11:50           ` Derrick Stolee
  2021-02-16 14:28             ` Taylor Blau
  4 siblings, 1 reply; 171+ messages in thread
From: Derrick Stolee @ 2021-02-16 11:50 UTC (permalink / raw)
  To: Taylor Blau, git; +Cc: avarab, peff, dstolee, szeder.dev, gitster

On 2/15/2021 4:01 PM, Taylor Blau wrote:
> Here's a few patches that we could add to the beginning of this series,
> or queue up separately.
> 
> I think that these are all fairly straightforward, but it would be good
> to have Ævar take a look and make sure I'm not doing anything wrong
> here.
> 
> I'll plan to send a v2 of the reverse index series in a few days with
> these four new patches at the beginning.

Thanks, both, for cleaning up a mess I made as a new contributor. These
patches have been enlightening and definitely move the code into a
cleaner and more extensible direction. Thanks!

-Stolee

* Re: [PATCH 4/5] commit-graph: refactor dispatch loop for style
  2021-02-16 11:40             ` Derrick Stolee
@ 2021-02-16 12:02               ` Ævar Arnfjörð Bjarmason
  2021-02-16 18:28                 ` Derrick Stolee
  0 siblings, 1 reply; 171+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-02-16 12:02 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: Taylor Blau, git, Junio C Hamano, dstolee, SZEDER Gábor, peff


On Tue, Feb 16 2021, Derrick Stolee wrote:

> On 2/15/2021 1:53 PM, Taylor Blau wrote:
>> On Mon, Feb 15, 2021 at 07:41:17PM +0100, Ævar Arnfjörð Bjarmason wrote:
>>> I think it's more readable to have one if/elsif/else chain here than
>>> the code this replaces.
>> 
>> FWIW, I find the pre-image more readable than what you are proposing
>> replacing it with here.
>> 
>> Of course, I have no doubts about the obvious correctness of this patch;
>> I'm merely suggesting that I wouldn't be sad to see us apply the first
>> three patches, and the fifth patch, but drop this one.
>
> I agree with all of your points here. I think that compared to the
> current code at-rest, the new version might be preferred. It's a little
> dense, which is my only complaint.
>
> The issue comes for the future: what if we need to add a third verb
> to 'git commit-graph'? Then extending this new option looks worse since
> we would check 'argc' three times.
>
> The other patches solve real readability problems or reorganize the code
> to use other concepts within the codebase. This one is much more optional.

What do you think about
https://lore.kernel.org/git/874kidapv7.fsf@evledraar.gmail.com/ ? :)

* Re: [PATCH v2 0/4] midx: split out sub-commands
  2021-02-16 11:50           ` [PATCH v2 0/4] midx: split out sub-commands Derrick Stolee
@ 2021-02-16 14:28             ` Taylor Blau
  0 siblings, 0 replies; 171+ messages in thread
From: Taylor Blau @ 2021-02-16 14:28 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: Taylor Blau, git, avarab, peff, dstolee, szeder.dev, gitster

On Tue, Feb 16, 2021 at 06:50:13AM -0500, Derrick Stolee wrote:
> On 2/15/2021 4:01 PM, Taylor Blau wrote:
> > Here's a few patches that we could add to the beginning of this series,
> > or queue up separately.
> >
> > I think that these are all fairly straightforward, but it would be good
> > to have Ævar take a look and make sure I'm not doing anything wrong
> > here.
> >
> > I'll plan to send a v2 of the reverse index series in a few days with
> > these four new patches at the beginning.
>
> Thanks, both, for cleaning up a mess I made as a new contributor. These
> patches have been enlightening and definitely move the code into a
> cleaner and more extensible direction. Thanks!

There was hardly a mess to clean up, and clearly this pattern is new to
me, too :).

I'm planning on resubmitting my tb/reverse-midx series as soon as it
gets another set of reviewer eyes with these four or five patches as new
at the beginning.

I do wonder about the merge conflicts caused between this and your
chunk-format API series. I'd rather not create such conflicts for Junio,
and last I recall there were still some outstanding comments on that
series. So long as you don't think that you resolving those comments
would cause new conflicts, I would assume that Junio's rerere cache
would make applying both easy enough.

If you do think it would cause new conflicts, it may make sense for you
to rebase your branch on mine, but I'm not sure if that's something
you'd want to do or not.

> -Stolee

Thanks,
Taylor

* Re: [PATCH 4/5] commit-graph: refactor dispatch loop for style
  2021-02-16 12:02               ` Ævar Arnfjörð Bjarmason
@ 2021-02-16 18:28                 ` Derrick Stolee
  0 siblings, 0 replies; 171+ messages in thread
From: Derrick Stolee @ 2021-02-16 18:28 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Taylor Blau, git, Junio C Hamano, dstolee, SZEDER Gábor, peff

On 2/16/2021 7:02 AM, Ævar Arnfjörð Bjarmason wrote:
> 
> On Tue, Feb 16 2021, Derrick Stolee wrote:
>> The other patches solve real readability problems or reorganize the code
>> to use other concepts within the codebase. This one is much more optional.
> 
> What do you think about
> https://lore.kernel.org/git/874kidapv7.fsf@evledraar.gmail.com/ ? :)

Using a 'goto' is a fine way to avoid a nesting level, but I'm not sure
it "improves readability." Having the tab level makes it clear that that
code is executed only when some condition is met, in this case "if (argc),"
while with the 'goto' we need to know that execution was redirected.

I don't feel too strongly either way. If the code was presented one way or
the other, I probably wouldn't recommend changing to the other mode. In
that sense, the change isn't necessary and causes me to break the tie in
favor of leaving it where it is.

Of course, if someone else says "I like this and would prefer it be used
as an example for future contributors" then the tie is broken in the other
way.
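
To make the comparison concrete, here are the two shapes side by side.
This is only a sketch of the general pattern, not a copy of the linked
patch; the function names and return codes are made up for illustration:

```c
#include <string.h>

enum { SKETCH_USAGE = -1, SKETCH_VERIFY = 1, SKETCH_WRITE = 2 };

/* Nested form: the indentation shows what runs only when argc is non-zero. */
static int dispatch_nested(int argc, const char **argv)
{
	if (argc) {
		if (!strcmp(argv[0], "verify"))
			return SKETCH_VERIFY;
		if (!strcmp(argv[0], "write"))
			return SKETCH_WRITE;
	}
	return SKETCH_USAGE; /* stands in for printing the usage message */
}

/* goto form: one less nesting level, but the reader must follow the jump. */
static int dispatch_goto(int argc, const char **argv)
{
	if (!argc)
		goto usage;
	if (!strcmp(argv[0], "verify"))
		return SKETCH_VERIFY;
	if (!strcmp(argv[0], "write"))
		return SKETCH_WRITE;
usage:
	return SKETCH_USAGE; /* stands in for printing the usage message */
}
```

Both behave identically for every input, which is why I see this as a
question of taste rather than correctness.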

Thanks,
-Stolee

* [PATCH v2 00/15] midx: implement a multi-pack reverse index
  2021-02-10 23:02 [PATCH 0/9] midx: implement a multi-pack reverse index Taylor Blau
                   ` (10 preceding siblings ...)
  2021-02-11  8:13 ` Junio C Hamano
@ 2021-02-24 19:09 ` Taylor Blau
  2021-02-24 19:09   ` [PATCH v2 01/15] builtin/multi-pack-index.c: inline 'flags' with options Taylor Blau
                     ` (14 more replies)
  2021-03-11 17:04 ` [PATCH v3 00/16] midx: implement a multi-pack reverse index Taylor Blau
  2021-03-30 15:03 ` [PATCH v4 " Taylor Blau
  13 siblings, 15 replies; 171+ messages in thread
From: Taylor Blau @ 2021-02-24 19:09 UTC (permalink / raw)
  To: git; +Cc: peff, dstolee, avarab, gitster

Here is a reroll of my series to implement a reverse index in preparation for
multi-pack reachability bitmaps. This version is based on a merge of
'ds/chunked-file-api' (currently on 'next') with the tip of 'master'.

The range-diff below looks complicated, but it isn't. The main changes from last
time are:

  - A handful of new patches at the front inspired by Ævar to handle
    sub-command options separately from the main 'git multi-pack-index' builtin.

  - Minor tweaks to the '--show-objects' mode for the multi-pack index helper
    (the information now follows the existing output instead of preceding it).

  - Many fields have been folded into the new 'write_midx_context' struct, which
    allowed for some related cleanup.

  - A new 'close_midx_revindex()' was added, which I discovered I need for the
    multi-pack bitmaps topic.

I know that we're getting close to an -rc0, so I'm torn about what should happen
with this branch assuming that no major issues are uncovered during review. I
would be happy if this were queued soon so that we can start working on the main
topic which all of this is in support of, but I'm not sure how that would
interact with the upcoming release.

In fact, I think that the first six patches could make it into 2.31 as-is,
since they have not much to do with the substance of this series and should be
fairly uncontroversial.

In any case, thanks in advance for your review.

Taylor Blau (15):
  builtin/multi-pack-index.c: inline 'flags' with options
  builtin/multi-pack-index.c: don't handle 'progress' separately
  builtin/multi-pack-index.c: define common usage with a macro
  builtin/multi-pack-index.c: split sub-commands
  builtin/multi-pack-index.c: don't enter bogus cmd_mode
  builtin/multi-pack-index.c: display usage on unrecognized command
  t/helper/test-read-midx.c: add '--show-objects'
  midx: allow marking a pack as preferred
  midx: don't free midx_name early
  midx: keep track of the checksum
  midx: make some functions non-static
  Documentation/technical: describe multi-pack reverse indexes
  pack-revindex: read multi-pack reverse indexes
  pack-write.c: extract 'write_rev_file_order'
  pack-revindex: write multi-pack reverse indexes

 Documentation/git-multi-pack-index.txt       |  14 +-
 Documentation/technical/multi-pack-index.txt |   5 +-
 Documentation/technical/pack-format.txt      |  80 +++++++
 builtin/multi-pack-index.c                   | 182 ++++++++++++---
 builtin/repack.c                             |   2 +-
 midx.c                                       | 229 +++++++++++++++++--
 midx.h                                       |  11 +-
 pack-revindex.c                              | 127 ++++++++++
 pack-revindex.h                              |  53 +++++
 pack-write.c                                 |  39 +++-
 pack.h                                       |   1 +
 packfile.c                                   |   3 +
 t/helper/test-read-midx.c                    |  24 +-
 t/t5319-multi-pack-index.sh                  |  39 ++++
 14 files changed, 740 insertions(+), 69 deletions(-)

Range-diff against v1:
 -:  ---------- >  1:  0527fa89a9 builtin/multi-pack-index.c: inline 'flags' with options
 -:  ---------- >  2:  a4e107b1f8 builtin/multi-pack-index.c: don't handle 'progress' separately
 -:  ---------- >  3:  8679dfd212 builtin/multi-pack-index.c: define common usage with a macro
 -:  ---------- >  4:  bc42b56ea2 builtin/multi-pack-index.c: split sub-commands
 -:  ---------- >  5:  5daa2946d3 builtin/multi-pack-index.c: don't enter bogus cmd_mode
 -:  ---------- >  6:  98d9ea0770 builtin/multi-pack-index.c: display usage on unrecognized command
 1:  e36acb005d !  7:  2fd9f4debf t/helper/test-read-midx.c: add '--show-objects'
    @@ t/helper/test-read-midx.c
      	uint32_t i;
      	struct multi_pack_index *m;
     @@ t/helper/test-read-midx.c: static int read_midx_file(const char *object_dir)
    - 	if (!m)
    - 		return 1;
    + 
    + 	printf("object-dir: %s\n", m->object_dir);
      
     +	if (show_objects) {
     +		struct object_id oid;
    @@ t/helper/test-read-midx.c: static int read_midx_file(const char *object_dir)
     +		return 0;
     +	}
     +
    - 	printf("header: %08x %d %d %d %d\n",
    - 	       m->signature,
    - 	       m->version,
    -@@ t/helper/test-read-midx.c: static int read_midx_file(const char *object_dir)
    + 	return 0;
    + }
      
      int cmd__read_midx(int argc, const char **argv)
      {
 2:  4a358d57cf !  8:  223b899094 midx: allow marking a pack as preferred
    @@ Documentation/git-multi-pack-index.txt: git-multi-pack-index - Write and verify
      DESCRIPTION
      -----------
     @@ Documentation/git-multi-pack-index.txt: OPTIONS
    - 	Turn progress on/off explicitly. If neither is specified, progress is
    - 	shown if standard error is connected to a terminal.
    - 
    -+--preferred-pack=<pack>::
    -+	When using the `write` subcommand, optionally specify the
    -+	tie-breaking pack used when multiple packs contain the same
    -+	object. Incompatible with other subcommands, including `repack`,
    -+	which may repack the pack marked as preferred. If not given, the
    -+	preferred pack is inferred from an existing `multi-pack-index`,
    -+	if one exists, otherwise the pack with the lowest mtime.
    -+
      The following subcommands are available:
      
      write::
    +-	Write a new MIDX file.
    ++	Write a new MIDX file. The following options are available for
    ++	the `write` sub-command:
    +++
    ++--
    ++	--preferred-pack=<pack>::
    ++		Optionally specify the tie-breaking pack used when
    ++		multiple packs contain the same object. If not given,
    ++		ties are broken in favor of the pack with the lowest
    ++		mtime.
    ++--
    + 
    + verify::
    + 	Verify the contents of the MIDX file.
     
      ## Documentation/technical/multi-pack-index.txt ##
     @@ Documentation/technical/multi-pack-index.txt: Design Details
    @@ builtin/multi-pack-index.c
      #include "trace2.h"
     +#include "object-store.h"
      
    - static char const * const builtin_multi_pack_index_usage[] = {
    - 	N_("git multi-pack-index [<options>] (write|verify|expire|repack --batch-size=<size>)"),
    + static char const * const builtin_multi_pack_index_write_usage[] = {
    + #define BUILTIN_MIDX_WRITE_USAGE \
    +-	N_("git multi-pack-index [<options>] write")
    ++	N_("git multi-pack-index [<options>] write [--preferred-pack=<pack>]")
    + 	BUILTIN_MIDX_WRITE_USAGE,
    + 	NULL
    + };
     @@ builtin/multi-pack-index.c: static char const * const builtin_multi_pack_index_usage[] = {
      
      static struct opts_multi_pack_index {
      	const char *object_dir;
     +	const char *preferred_pack;
      	unsigned long batch_size;
    - 	int progress;
    + 	unsigned flags;
      } opts;
    -@@ builtin/multi-pack-index.c: int cmd_multi_pack_index(int argc, const char **argv,
    - 	static struct option builtin_multi_pack_index_options[] = {
    - 		OPT_FILENAME(0, "object-dir", &opts.object_dir,
    - 		  N_("object directory containing set of packfile and pack-index pairs")),
    -+		OPT_STRING(0, "preferred-pack", &opts.preferred_pack, N_("preferred-pack"),
    -+		  N_("pack for reuse when computing a multi-pack bitmap")),
    - 		OPT_BOOL(0, "progress", &opts.progress, N_("force progress reporting")),
    - 		OPT_MAGNITUDE(0, "batch-size", &opts.batch_size,
    - 		  N_("during repack, collect pack-files of smaller size into a batch that is larger than this size")),
    -@@ builtin/multi-pack-index.c: int cmd_multi_pack_index(int argc, const char **argv,
    - 		return 1;
    - 	}
    +@@ builtin/multi-pack-index.c: static struct option *add_common_options(struct option *prev)
      
    -+	if (strcmp(argv[0], "write") && opts.preferred_pack)
    -+		die(_("'--preferred-pack' requires 'write'"));
    + static int cmd_multi_pack_index_write(int argc, const char **argv)
    + {
    +-	struct option *options = common_opts;
    ++	struct option *options;
    ++	static struct option builtin_multi_pack_index_write_options[] = {
    ++		OPT_STRING(0, "preferred-pack", &opts.preferred_pack,
    ++			   N_("preferred-pack"),
    ++			   N_("pack for reuse when computing a multi-pack bitmap")),
    ++		OPT_END(),
    ++	};
     +
    ++	options = parse_options_dup(builtin_multi_pack_index_write_options);
    ++	options = add_common_options(options);
    + 
      	trace2_cmd_mode(argv[0]);
      
    - 	if (!strcmp(argv[0], "repack"))
    -@@ builtin/multi-pack-index.c: int cmd_multi_pack_index(int argc, const char **argv,
    - 		die(_("--batch-size option is only for 'repack' subcommand"));
    +@@ builtin/multi-pack-index.c: static int cmd_multi_pack_index_write(int argc, const char **argv)
    + 		usage_with_options(builtin_multi_pack_index_write_usage,
    + 				   options);
      
    - 	if (!strcmp(argv[0], "write"))
    --		return write_midx_file(opts.object_dir, flags);
    -+		return write_midx_file(opts.object_dir, opts.preferred_pack,
    -+				       flags);
    - 	if (!strcmp(argv[0], "verify"))
    - 		return verify_midx_file(the_repository, opts.object_dir, flags);
    - 	if (!strcmp(argv[0], "expire"))
    +-	return write_midx_file(opts.object_dir, opts.flags);
    ++	return write_midx_file(opts.object_dir, opts.preferred_pack,
    ++			       opts.flags);
    + }
    + 
    + static int cmd_multi_pack_index_verify(int argc, const char **argv)
     
      ## builtin/repack.c ##
     @@ builtin/repack.c: int cmd_repack(int argc, const char **argv, const char *prefix)
    @@ midx.c: static int pack_info_compare(const void *_a, const void *_b)
     +	return -1;
     +}
     +
    - struct pack_list {
    + struct write_midx_context {
      	struct pack_info *info;
      	uint32_t nr;
    +@@ midx.c: struct write_midx_context {
    + 	uint32_t *pack_perm;
    + 	unsigned large_offsets_needed:1;
    + 	uint32_t num_large_offsets;
    ++
    ++	int preferred_pack_idx;
    + };
    + 
    + static void add_pack_to_midx(const char *full_path, size_t full_path_len,
     @@ midx.c: struct pack_midx_entry {
      	uint32_t pack_int_id;
      	time_t pack_mtime;
    @@ midx.c: static struct pack_midx_entry *get_sorted_entries(struct multi_pack_inde
      				nr_fanout++;
      			}
      		}
    -@@ midx.c: static size_t write_midx_large_offsets(struct hashfile *f, uint32_t nr_large_off
    +@@ midx.c: static int write_midx_large_offsets(struct hashfile *f,
      }
      
      static int write_midx_internal(const char *object_dir, struct multi_pack_index *m,
    @@ midx.c: static size_t write_midx_large_offsets(struct hashfile *f, uint32_t nr_l
     +			       const char *preferred_pack_name,
     +			       unsigned flags)
      {
    - 	unsigned char cur_chunk, num_chunks = 0;
      	char *midx_name;
    + 	uint32_t i;
     @@ midx.c: static int write_midx_internal(const char *object_dir, struct multi_pack_index *
    - 	int pack_name_concat_len = 0;
    - 	int dropped_packs = 0;
    - 	int result = 0;
    -+	int preferred_pack_idx = -1;
    - 
    - 	midx_name = get_midx_filename(object_dir);
    - 	if (safe_create_leading_directories(midx_name))
    -@@ midx.c: static int write_midx_internal(const char *object_dir, struct multi_pack_index *
    - 	if (packs.m && packs.nr == packs.m->num_packs && !packs_to_drop)
    + 	if (ctx.m && ctx.nr == ctx.m->num_packs && !packs_to_drop)
      		goto cleanup;
      
    --	entries = get_sorted_entries(packs.m, packs.info, packs.nr, &nr_entries);
    +-	ctx.entries = get_sorted_entries(ctx.m, ctx.info, ctx.nr, &ctx.entries_nr);
     +	if (preferred_pack_name) {
    -+		for (i = 0; i < packs.nr; i++) {
    ++		for (i = 0; i < ctx.nr; i++) {
     +			if (!cmp_idx_or_pack_name(preferred_pack_name,
    -+						  packs.info[i].pack_name)) {
    -+				preferred_pack_idx = i;
    ++						  ctx.info[i].pack_name)) {
    ++				ctx.preferred_pack_idx = i;
     +				break;
     +			}
     +		}
    -+	}
    ++	} else
    ++		ctx.preferred_pack_idx = -1;
     +
    -+	entries = get_sorted_entries(packs.m, packs.info, packs.nr, &nr_entries,
    -+				     preferred_pack_idx);
    ++	ctx.entries = get_sorted_entries(ctx.m, ctx.info, ctx.nr, &ctx.entries_nr,
    ++					 ctx.preferred_pack_idx);
      
    - 	for (i = 0; i < nr_entries; i++) {
    - 		if (entries[i].offset > 0x7fffffff)
    + 	ctx.large_offsets_needed = 0;
    + 	for (i = 0; i < ctx.entries_nr; i++) {
     @@ midx.c: static int write_midx_internal(const char *object_dir, struct multi_pack_index *
    - 			pack_name_concat_len += strlen(packs.info[i].pack_name) + 1;
    + 			pack_name_concat_len += strlen(ctx.info[i].pack_name) + 1;
      	}
      
     +	/*
     +	 * Recompute the preferred_pack_idx (if applicable) according to the
     +	 * permuted pack order.
     +	 */
    -+	preferred_pack_idx = -1;
    ++	ctx.preferred_pack_idx = -1;
     +	if (preferred_pack_name) {
    -+		preferred_pack_idx = lookup_idx_or_pack_name(packs.info,
    -+							     packs.nr,
    ++		ctx.preferred_pack_idx = lookup_idx_or_pack_name(ctx.info,
    ++							     ctx.nr,
     +							     preferred_pack_name);
    -+		if (preferred_pack_idx < 0)
    ++		if (ctx.preferred_pack_idx < 0)
     +			warning(_("unknown preferred pack: '%s'"),
     +				preferred_pack_name);
     +		else {
    -+			uint32_t orig = packs.info[preferred_pack_idx].orig_pack_int_id;
    -+			uint32_t perm = pack_perm[orig];
    ++			uint32_t orig = ctx.info[ctx.preferred_pack_idx].orig_pack_int_id;
    ++			uint32_t perm = ctx.pack_perm[orig];
     +
     +			if (perm == PACK_EXPIRED) {
     +				warning(_("preferred pack '%s' is expired"),
     +					preferred_pack_name);
    -+				preferred_pack_idx = -1;
    ++				ctx.preferred_pack_idx = -1;
     +			} else
    -+				preferred_pack_idx = perm;
    ++				ctx.preferred_pack_idx = perm;
     +		}
     +	}
     +
    @@ t/t5319-multi-pack-index.sh: test_expect_success 'warn on improper hash version'
     +		) &&
     +
     +		git multi-pack-index --object-dir=objects \
    -+			--preferred-pack=test-BC-$bc.idx write 2>err &&
    ++			write --preferred-pack=test-BC-$bc.idx 2>err &&
     +		test_must_be_empty err &&
     +
     +		ofs=$(git show-index <objects/pack/test-BC-$bc.idx | grep $b |
 3:  218474158a !  9:  976848bc4b midx: don't free midx_name early
    @@ midx.c: static int write_midx_internal(const char *object_dir, struct multi_pack
      	f = hashfd(get_lock_file_fd(&lk), get_lock_file_path(&lk));
     -	FREE_AND_NULL(midx_name);
      
    - 	if (packs.m)
    - 		close_midx(packs.m);
    + 	if (ctx.m)
    + 		close_midx(ctx.m);
 4:  b4b842fa1e <  -:  ---------- midx: keep track of the checksum
 -:  ---------- > 10:  5ed47f7e3a midx: keep track of the checksum
 5:  953beabaa4 ! 11:  0292508e12 midx: make some functions non-static
    @@ midx.c: static off_t nth_midxed_offset(struct multi_pack_index *m, uint32_t pos)
     -static uint32_t nth_midxed_pack_int_id(struct multi_pack_index *m, uint32_t pos)
     +uint32_t nth_midxed_pack_int_id(struct multi_pack_index *m, uint32_t pos)
      {
    - 	return get_be32(m->chunk_object_offsets + pos * MIDX_CHUNK_OFFSET_WIDTH);
    - }
    + 	return get_be32(m->chunk_object_offsets +
    + 			(off_t)pos * MIDX_CHUNK_OFFSET_WIDTH);
     
      ## midx.h ##
     @@ midx.h: struct multi_pack_index {
 6:  e64504bad6 <  -:  ---------- Documentation/technical: describe multi-pack reverse indexes
 -:  ---------- > 12:  404d730498 Documentation/technical: describe multi-pack reverse indexes
 7:  4c5e64c5fc ! 13:  d4e01a44e7 pack-revindex: read multi-pack reverse indexes
    @@ midx.c: static uint8_t oid_version(void)
     +		       m->object_dir, hash_to_hex(get_midx_checksum(m)));
     +}
     +
    - struct multi_pack_index *load_multi_pack_index(const char *object_dir, int local)
    + static int midx_read_oid_fanout(const unsigned char *chunk_start,
    + 				size_t chunk_size, void *data)
      {
    - 	struct multi_pack_index *m = NULL;
     
      ## midx.h ##
     @@ midx.h: struct multi_pack_index {
    @@ pack-revindex.c: int load_pack_revindex(struct packed_git *p)
     +	free(revindex_name);
     +	return ret;
     +}
    ++
    ++int close_midx_revindex(struct multi_pack_index *m)
    ++{
    ++	if (!m)
    ++		return 0;
    ++
    ++	if (munmap((void*)m->revindex_map, m->revindex_len))
    ++		return -1;
    ++
    ++	m->revindex_map = NULL;
    ++	m->revindex_data = NULL;
    ++	m->revindex_len = 0;
    ++
    ++	return 0;
    ++}
     +
      int offset_to_pack_pos(struct packed_git *p, off_t ofs, uint32_t *pos)
      {
    @@ pack-revindex.h: struct packed_git;
     + * A negative number is returned on error.
     + */
     +int load_midx_revindex(struct multi_pack_index *m);
    ++
    ++/*
    ++ * Frees resources associated with a multi-pack reverse index.
    ++ *
    ++ * A negative number is returned on error.
    ++ */
    ++int close_midx_revindex(struct multi_pack_index *m);
     +
      /*
       * offset_to_pack_pos converts an object offset to a pack position. This
 8:  4829b93f42 = 14:  ab7012b283 pack-write.c: extract 'write_rev_file_order'
 9:  fb5954b769 ! 15:  01bd6a35c6 pack-revindex: write multi-pack reverse indexes
    @@ Commit message
     
      ## midx.c ##
     @@
    - #include "trace2.h"
      #include "run-command.h"
      #include "repository.h"
    + #include "chunk-format.h"
     +#include "pack.h"
      
      #define MIDX_SIGNATURE 0x4d494458 /* "MIDX" */
      #define MIDX_VERSION 1
    -@@ midx.c: static size_t write_midx_large_offsets(struct hashfile *f, uint32_t nr_large_off
    - 	return written;
    +@@ midx.c: struct write_midx_context {
    + 	uint32_t entries_nr;
    + 
    + 	uint32_t *pack_perm;
    ++	uint32_t *pack_order;
    + 	unsigned large_offsets_needed:1;
    + 	uint32_t num_large_offsets;
    + 
    +@@ midx.c: static int write_midx_large_offsets(struct hashfile *f,
    + 	return 0;
      }
      
    -+struct midx_pack_order_data {
    -+	struct pack_midx_entry *entries;
    -+	uint32_t *pack_perm;
    -+};
    -+
    -+static int midx_pack_order_cmp(const void *va, const void *vb, void *_data)
    ++static int midx_pack_order_cmp(const void *va, const void *vb, void *_ctx)
     +{
    -+	struct midx_pack_order_data *data = _data;
    ++	struct write_midx_context *ctx = _ctx;
     +
    -+	struct pack_midx_entry *a = &data->entries[*(const uint32_t *)va];
    -+	struct pack_midx_entry *b = &data->entries[*(const uint32_t *)vb];
    ++	struct pack_midx_entry *a = &ctx->entries[*(const uint32_t *)va];
    ++	struct pack_midx_entry *b = &ctx->entries[*(const uint32_t *)vb];
     +
    -+	uint32_t perm_a = data->pack_perm[a->pack_int_id];
    -+	uint32_t perm_b = data->pack_perm[b->pack_int_id];
    ++	uint32_t perm_a = ctx->pack_perm[a->pack_int_id];
    ++	uint32_t perm_b = ctx->pack_perm[b->pack_int_id];
     +
     +	/* Sort objects in the preferred pack ahead of any others. */
     +	if (a->preferred > b->preferred)
    @@ midx.c: static size_t write_midx_large_offsets(struct hashfile *f, uint32_t nr_l
     +	return 0;
     +}
     +
    -+static uint32_t *midx_pack_order(struct pack_midx_entry *entries,
    -+				 uint32_t *pack_perm,
    -+				 uint32_t entries_nr)
    ++static uint32_t *midx_pack_order(struct write_midx_context *ctx)
     +{
    -+	struct midx_pack_order_data data;
     +	uint32_t *pack_order;
     +	uint32_t i;
     +
    -+	data.entries = entries;
    -+	data.pack_perm = pack_perm;
    -+
    -+	ALLOC_ARRAY(pack_order, entries_nr);
    -+	for (i = 0; i < entries_nr; i++)
    ++	ALLOC_ARRAY(pack_order, ctx->entries_nr);
    ++	for (i = 0; i < ctx->entries_nr; i++)
     +		pack_order[i] = i;
    -+	QSORT_S(pack_order, entries_nr, midx_pack_order_cmp, &data);
    ++	QSORT_S(pack_order, ctx->entries_nr, midx_pack_order_cmp, ctx);
     +
     +	return pack_order;
     +}
     +
     +static void write_midx_reverse_index(char *midx_name, unsigned char *midx_hash,
    -+				     uint32_t *pack_order,
    -+				     uint32_t entries_nr)
    ++				     struct write_midx_context *ctx)
     +{
     +	struct strbuf buf = STRBUF_INIT;
     +
     +	strbuf_addf(&buf, "%s-%s.rev", midx_name, hash_to_hex(midx_hash));
     +
    -+	write_rev_file_order(buf.buf, pack_order, entries_nr, midx_hash,
    -+			     WRITE_REV);
    ++	write_rev_file_order(buf.buf, ctx->pack_order, ctx->entries_nr,
    ++			     midx_hash, WRITE_REV);
     +
     +	strbuf_release(&buf);
     +}
    @@ midx.c: static size_t write_midx_large_offsets(struct hashfile *f, uint32_t nr_l
      			       struct string_list *packs_to_drop,
      			       const char *preferred_pack_name,
     @@ midx.c: static int write_midx_internal(const char *object_dir, struct multi_pack_index *
    - 	struct lock_file lk;
    - 	struct pack_list packs;
    - 	uint32_t *pack_perm = NULL;
    -+	uint32_t *pack_order = NULL;
    - 	uint64_t written = 0;
    - 	uint32_t chunk_ids[MIDX_MAX_CHUNKS + 1];
    - 	uint64_t chunk_offsets[MIDX_MAX_CHUNKS + 1];
    -@@ midx.c: static int write_midx_internal(const char *object_dir, struct multi_pack_index *
    - 		    chunk_offsets[num_chunks]);
      
      	finalize_hashfile(f, midx_hash, CSUM_FSYNC | CSUM_HASH_IN_STREAM);
    + 	free_chunkfile(cf);
    ++
     +	if (flags & MIDX_WRITE_REV_INDEX)
    -+		pack_order = midx_pack_order(entries, pack_perm, nr_entries);
    ++		ctx.pack_order = midx_pack_order(&ctx);
     +
     +	if (flags & MIDX_WRITE_REV_INDEX)
    -+		write_midx_reverse_index(midx_name, midx_hash, pack_order,
    -+					 nr_entries);
    ++		write_midx_reverse_index(midx_name, midx_hash, &ctx);
     +	clear_midx_files_ext(the_repository, ".rev", midx_hash);
     +
      	commit_lock_file(&lk);
      
      cleanup:
     @@ midx.c: static int write_midx_internal(const char *object_dir, struct multi_pack_index *
    - 	free(packs.info);
    - 	free(entries);
    - 	free(pack_perm);
    -+	free(pack_order);
    + 	free(ctx.info);
    + 	free(ctx.entries);
    + 	free(ctx.pack_perm);
    ++	free(ctx.pack_order);
      	free(midx_name);
      	return result;
      }
-- 
2.30.0.667.g81c0cbc6fd


* [PATCH v2 01/15] builtin/multi-pack-index.c: inline 'flags' with options
  2021-02-24 19:09 ` [PATCH v2 00/15] " Taylor Blau
@ 2021-02-24 19:09   ` Taylor Blau
  2021-02-24 19:09   ` [PATCH v2 02/15] builtin/multi-pack-index.c: don't handle 'progress' separately Taylor Blau
                     ` (13 subsequent siblings)
  14 siblings, 0 replies; 171+ messages in thread
From: Taylor Blau @ 2021-02-24 19:09 UTC (permalink / raw)
  To: git; +Cc: peff, dstolee, avarab, gitster

Subcommands of the 'git multi-pack-index' command (e.g., 'write',
'verify', etc.) will want to optionally change a set of shared flags
that are eventually passed to the MIDX libraries.

Right now, options and flags are handled separately. Inline them into
the same structure so that sub-commands can more easily share the
'flags' data.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 builtin/multi-pack-index.c | 13 ++++++-------
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/builtin/multi-pack-index.c b/builtin/multi-pack-index.c
index 5bf88cd2a8..4a0ddb06c4 100644
--- a/builtin/multi-pack-index.c
+++ b/builtin/multi-pack-index.c
@@ -14,13 +14,12 @@ static struct opts_multi_pack_index {
 	const char *object_dir;
 	unsigned long batch_size;
 	int progress;
+	unsigned flags;
 } opts;
 
 int cmd_multi_pack_index(int argc, const char **argv,
 			 const char *prefix)
 {
-	unsigned flags = 0;
-
 	static struct option builtin_multi_pack_index_options[] = {
 		OPT_FILENAME(0, "object-dir", &opts.object_dir,
 		  N_("object directory containing set of packfile and pack-index pairs")),
@@ -40,7 +39,7 @@ int cmd_multi_pack_index(int argc, const char **argv,
 	if (!opts.object_dir)
 		opts.object_dir = get_object_directory();
 	if (opts.progress)
-		flags |= MIDX_PROGRESS;
+		opts.flags |= MIDX_PROGRESS;
 
 	if (argc == 0)
 		usage_with_options(builtin_multi_pack_index_usage,
@@ -55,16 +54,16 @@ int cmd_multi_pack_index(int argc, const char **argv,
 
 	if (!strcmp(argv[0], "repack"))
 		return midx_repack(the_repository, opts.object_dir,
-			(size_t)opts.batch_size, flags);
+			(size_t)opts.batch_size, opts.flags);
 	if (opts.batch_size)
 		die(_("--batch-size option is only for 'repack' subcommand"));
 
 	if (!strcmp(argv[0], "write"))
-		return write_midx_file(opts.object_dir, flags);
+		return write_midx_file(opts.object_dir, opts.flags);
 	if (!strcmp(argv[0], "verify"))
-		return verify_midx_file(the_repository, opts.object_dir, flags);
+		return verify_midx_file(the_repository, opts.object_dir, opts.flags);
 	if (!strcmp(argv[0], "expire"))
-		return expire_midx_packs(the_repository, opts.object_dir, flags);
+		return expire_midx_packs(the_repository, opts.object_dir, opts.flags);
 
 	die(_("unrecognized subcommand: %s"), argv[0]);
 }
-- 
2.30.0.667.g81c0cbc6fd



* [PATCH v2 02/15] builtin/multi-pack-index.c: don't handle 'progress' separately
  2021-02-24 19:09 ` [PATCH v2 00/15] " Taylor Blau
  2021-02-24 19:09   ` [PATCH v2 01/15] builtin/multi-pack-index.c: inline 'flags' with options Taylor Blau
@ 2021-02-24 19:09   ` Taylor Blau
  2021-02-24 19:09   ` [PATCH v2 03/15] builtin/multi-pack-index.c: define common usage with a macro Taylor Blau
                     ` (12 subsequent siblings)
  14 siblings, 0 replies; 171+ messages in thread
From: Taylor Blau @ 2021-02-24 19:09 UTC (permalink / raw)
  To: git; +Cc: peff, dstolee, avarab, gitster

Now that there is a shared 'flags' member in the options structure,
there is no need to keep track of whether to force progress or not,
since ultimately the decision of whether or not to show a progress meter
is controlled by a bit in the flags member.

Manipulate that bit directly, and drop the now-unnecessary 'progress'
field while we're at it.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
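An illustrative, standalone sketch of "manipulating the bit directly"
(not part of this patch). The flag bit starts from the terminal-based
default, and each later '--progress' or '--no-progress' argument sets
or clears it, assuming the negated form clears the bit as the
'--[no-]progress' synopsis suggests. SKETCH_PROGRESS and
sketch_progress_flags() are hypothetical names standing in for
MIDX_PROGRESS and the parse-options machinery.

```c
#include <string.h>

/* Hypothetical stand-in for a flag bit such as MIDX_PROGRESS. */
#define SKETCH_PROGRESS (1u << 0)

/*
 * Compute the flags word: start from the terminal-based default, then
 * let "--progress" / "--no-progress" set or clear the bit. The last
 * occurrence on the command line wins.
 */
unsigned sketch_progress_flags(int stderr_is_tty, int argc,
			       const char **argv)
{
	unsigned flags = 0;
	int i;

	if (stderr_is_tty)
		flags |= SKETCH_PROGRESS;

	for (i = 0; i < argc; i++) {
		if (!strcmp(argv[i], "--progress"))
			flags |= SKETCH_PROGRESS;
		else if (!strcmp(argv[i], "--no-progress"))
			flags &= ~SKETCH_PROGRESS;
	}
	return flags;
}
```

Because the default is applied before the arguments are examined, an
explicit option always overrides what isatty() would have chosen, and
no separate 'progress' variable is needed.
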
 builtin/multi-pack-index.c | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/builtin/multi-pack-index.c b/builtin/multi-pack-index.c
index 4a0ddb06c4..c70f020d8f 100644
--- a/builtin/multi-pack-index.c
+++ b/builtin/multi-pack-index.c
@@ -13,7 +13,6 @@ static char const * const builtin_multi_pack_index_usage[] = {
 static struct opts_multi_pack_index {
 	const char *object_dir;
 	unsigned long batch_size;
-	int progress;
 	unsigned flags;
 } opts;
 
@@ -23,7 +22,7 @@ int cmd_multi_pack_index(int argc, const char **argv,
 	static struct option builtin_multi_pack_index_options[] = {
 		OPT_FILENAME(0, "object-dir", &opts.object_dir,
 		  N_("object directory containing set of packfile and pack-index pairs")),
-		OPT_BOOL(0, "progress", &opts.progress, N_("force progress reporting")),
+		OPT_BIT(0, "progress", &opts.flags, N_("force progress reporting"), MIDX_PROGRESS),
 		OPT_MAGNITUDE(0, "batch-size", &opts.batch_size,
 		  N_("during repack, collect pack-files of smaller size into a batch that is larger than this size")),
 		OPT_END(),
@@ -31,15 +30,14 @@ int cmd_multi_pack_index(int argc, const char **argv,
 
 	git_config(git_default_config, NULL);
 
-	opts.progress = isatty(2);
+	if (isatty(2))
+		opts.flags |= MIDX_PROGRESS;
 	argc = parse_options(argc, argv, prefix,
 			     builtin_multi_pack_index_options,
 			     builtin_multi_pack_index_usage, 0);
 
 	if (!opts.object_dir)
 		opts.object_dir = get_object_directory();
-	if (opts.progress)
-		opts.flags |= MIDX_PROGRESS;
 
 	if (argc == 0)
 		usage_with_options(builtin_multi_pack_index_usage,
-- 
2.30.0.667.g81c0cbc6fd



* [PATCH v2 03/15] builtin/multi-pack-index.c: define common usage with a macro
  2021-02-24 19:09 ` [PATCH v2 00/15] " Taylor Blau
  2021-02-24 19:09   ` [PATCH v2 01/15] builtin/multi-pack-index.c: inline 'flags' with options Taylor Blau
  2021-02-24 19:09   ` [PATCH v2 02/15] builtin/multi-pack-index.c: don't handle 'progress' separately Taylor Blau
@ 2021-02-24 19:09   ` Taylor Blau
  2021-02-24 19:09   ` [PATCH v2 04/15] builtin/multi-pack-index.c: split sub-commands Taylor Blau
                     ` (11 subsequent siblings)
  14 siblings, 0 replies; 171+ messages in thread
From: Taylor Blau @ 2021-02-24 19:09 UTC (permalink / raw)
  To: git; +Cc: peff, dstolee, avarab, gitster

Factor out the usage message into pieces corresponding to each mode.
This prevents options specific to one sub-command from being listed
alongside another in the usage.

A subsequent commit will use these #define macros to have usage
variables for each sub-command without duplicating their contents.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 builtin/multi-pack-index.c | 17 ++++++++++++++++-
 1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/builtin/multi-pack-index.c b/builtin/multi-pack-index.c
index c70f020d8f..eea498e026 100644
--- a/builtin/multi-pack-index.c
+++ b/builtin/multi-pack-index.c
@@ -5,8 +5,23 @@
 #include "midx.h"
 #include "trace2.h"
 
+#define BUILTIN_MIDX_WRITE_USAGE \
+	N_("git multi-pack-index [<options>] write")
+
+#define BUILTIN_MIDX_VERIFY_USAGE \
+	N_("git multi-pack-index [<options>] verify")
+
+#define BUILTIN_MIDX_EXPIRE_USAGE \
+	N_("git multi-pack-index [<options>] expire")
+
+#define BUILTIN_MIDX_REPACK_USAGE \
+	N_("git multi-pack-index [<options>] repack [--batch-size=<size>]")
+
 static char const * const builtin_multi_pack_index_usage[] = {
-	N_("git multi-pack-index [<options>] (write|verify|expire|repack --batch-size=<size>)"),
+	BUILTIN_MIDX_WRITE_USAGE,
+	BUILTIN_MIDX_VERIFY_USAGE,
+	BUILTIN_MIDX_EXPIRE_USAGE,
+	BUILTIN_MIDX_REPACK_USAGE,
 	NULL
 };
 
-- 
2.30.0.667.g81c0cbc6fd



* [PATCH v2 04/15] builtin/multi-pack-index.c: split sub-commands
  2021-02-24 19:09 ` [PATCH v2 00/15] " Taylor Blau
                     ` (2 preceding siblings ...)
  2021-02-24 19:09   ` [PATCH v2 03/15] builtin/multi-pack-index.c: define common usage with a macro Taylor Blau
@ 2021-02-24 19:09   ` Taylor Blau
  2021-03-02  4:06     ` Jonathan Tan
  2021-02-24 19:09   ` [PATCH v2 05/15] builtin/multi-pack-index.c: don't enter bogus cmd_mode Taylor Blau
                     ` (10 subsequent siblings)
  14 siblings, 1 reply; 171+ messages in thread
From: Taylor Blau @ 2021-02-24 19:09 UTC (permalink / raw)
  To: git; +Cc: peff, dstolee, avarab, gitster

Handle sub-commands of the 'git multi-pack-index' builtin (e.g.,
"write", "repack", etc.) separately from one another. This allows
sub-commands with unique options, without forcing cmd_multi_pack_index()
to reject invalid combinations itself.

This comes at the cost of some duplication and boilerplate. Luckily, the
duplication is reduced to a minimum, since common options are shared
among sub-commands due to a suggestion by Ævar. (Sub-commands do have to
retain the common options, too, since this builtin accepts common
options on either side of the sub-command).

Roughly speaking, cmd_multi_pack_index() parses options (including
common ones), and stops at the first non-option, which is the
sub-command. It then dispatches to the appropriate sub-command, which
parses the remaining options (also including common options).

Unknown options are kept by the sub-commands in order to detect their
presence (and complain that too many arguments were given).

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
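The two-stage parsing described above can be sketched in isolation
(illustrative only, not part of this patch). The first stage consumes
leading common options and stops at the first non-option; the second
stage parses what follows the sub-command name, accepting common
options again plus any options of its own, and reports usage for
anything left over. All sketch_* names are hypothetical, and the
hand-rolled prefix matching merely stands in for parse_options().

```c
#include <stddef.h>
#include <string.h>

enum { SKETCH_OK = 0, SKETCH_USAGE = 1 };

struct sketch_opts {
	const char *object_dir; /* common option, valid in both stages */
	const char *batch_size; /* understood by "repack" only */
};

/* Return the text after 'prefix' if 'arg' starts with it, else NULL. */
const char *sketch_skip_prefix(const char *arg, const char *prefix)
{
	size_t len = strlen(prefix);
	return strncmp(arg, prefix, len) ? NULL : arg + len;
}

/* Try to consume one common option; return 1 if 'arg' was recognized. */
int sketch_common_option(struct sketch_opts *o, const char *arg)
{
	const char *val = sketch_skip_prefix(arg, "--object-dir=");
	if (!val)
		return 0;
	o->object_dir = val;
	return 1;
}

/* Second stage: argv[0] is the sub-command name, the rest its options. */
int sketch_subcommand(struct sketch_opts *o, int argc, const char **argv,
		      int is_repack)
{
	int i;

	for (i = 1; i < argc; i++) {
		const char *val;

		if (sketch_common_option(o, argv[i]))
			continue;
		val = sketch_skip_prefix(argv[i], "--batch-size=");
		if (is_repack && val) {
			o->batch_size = val;
			continue;
		}
		return SKETCH_USAGE; /* leftover argument: report usage */
	}
	return SKETCH_OK;
}

/* First stage: parse leading common options, stop at first non-option. */
int sketch_dispatch(struct sketch_opts *o, int argc, const char **argv)
{
	int i = 0;

	while (i < argc && argv[i][0] == '-') {
		if (!sketch_common_option(o, argv[i]))
			return SKETCH_USAGE;
		i++;
	}
	if (i == argc)
		return SKETCH_USAGE; /* no sub-command given */

	if (!strcmp(argv[i], "repack"))
		return sketch_subcommand(o, argc - i, argv + i, 1);
	if (!strcmp(argv[i], "write"))
		return sketch_subcommand(o, argc - i, argv + i, 0);
	return SKETCH_USAGE; /* unrecognized sub-command */
}
```

With this shape, "--object-dir=objs repack --batch-size=2" is accepted,
while "write --batch-size=2" is rejected by the 'write' stage itself
rather than by a check in the top-level function.
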
 builtin/multi-pack-index.c | 131 ++++++++++++++++++++++++++++++-------
 1 file changed, 106 insertions(+), 25 deletions(-)

diff --git a/builtin/multi-pack-index.c b/builtin/multi-pack-index.c
index eea498e026..caf0248a98 100644
--- a/builtin/multi-pack-index.c
+++ b/builtin/multi-pack-index.c
@@ -5,17 +5,33 @@
 #include "midx.h"
 #include "trace2.h"
 
+static char const * const builtin_multi_pack_index_write_usage[] = {
 #define BUILTIN_MIDX_WRITE_USAGE \
 	N_("git multi-pack-index [<options>] write")
+	BUILTIN_MIDX_WRITE_USAGE,
+	NULL
+};
 
+static char const * const builtin_multi_pack_index_verify_usage[] = {
 #define BUILTIN_MIDX_VERIFY_USAGE \
 	N_("git multi-pack-index [<options>] verify")
+	BUILTIN_MIDX_VERIFY_USAGE,
+	NULL
+};
 
+static char const * const builtin_multi_pack_index_expire_usage[] = {
 #define BUILTIN_MIDX_EXPIRE_USAGE \
 	N_("git multi-pack-index [<options>] expire")
+	BUILTIN_MIDX_EXPIRE_USAGE,
+	NULL
+};
 
+static char const * const builtin_multi_pack_index_repack_usage[] = {
 #define BUILTIN_MIDX_REPACK_USAGE \
 	N_("git multi-pack-index [<options>] repack [--batch-size=<size>]")
+	BUILTIN_MIDX_REPACK_USAGE,
+	NULL
+};
 
 static char const * const builtin_multi_pack_index_usage[] = {
 	BUILTIN_MIDX_WRITE_USAGE,
@@ -31,25 +47,99 @@ static struct opts_multi_pack_index {
 	unsigned flags;
 } opts;
 
-int cmd_multi_pack_index(int argc, const char **argv,
-			 const char *prefix)
+static struct option common_opts[] = {
+	OPT_FILENAME(0, "object-dir", &opts.object_dir,
+	  N_("object directory containing set of packfile and pack-index pairs")),
+	OPT_BIT(0, "progress", &opts.flags, N_("force progress reporting"), MIDX_PROGRESS),
+	OPT_END(),
+};
+
+static struct option *add_common_options(struct option *prev)
 {
-	static struct option builtin_multi_pack_index_options[] = {
-		OPT_FILENAME(0, "object-dir", &opts.object_dir,
-		  N_("object directory containing set of packfile and pack-index pairs")),
-		OPT_BIT(0, "progress", &opts.flags, N_("force progress reporting"), MIDX_PROGRESS),
+	struct option *with_common = parse_options_concat(common_opts, prev);
+	free(prev);
+	return with_common;
+}
+
+static int cmd_multi_pack_index_write(int argc, const char **argv)
+{
+	struct option *options = common_opts;
+
+	argc = parse_options(argc, argv, NULL,
+			     options, builtin_multi_pack_index_write_usage,
+			     PARSE_OPT_KEEP_UNKNOWN);
+	if (argc)
+		usage_with_options(builtin_multi_pack_index_write_usage,
+				   options);
+
+	return write_midx_file(opts.object_dir, opts.flags);
+}
+
+static int cmd_multi_pack_index_verify(int argc, const char **argv)
+{
+	struct option *options = common_opts;
+
+	argc = parse_options(argc, argv, NULL,
+			     options, builtin_multi_pack_index_verify_usage,
+			     PARSE_OPT_KEEP_UNKNOWN);
+	if (argc)
+		usage_with_options(builtin_multi_pack_index_verify_usage,
+				   options);
+
+	return verify_midx_file(the_repository, opts.object_dir, opts.flags);
+}
+
+static int cmd_multi_pack_index_expire(int argc, const char **argv)
+{
+	struct option *options = common_opts;
+
+	argc = parse_options(argc, argv, NULL,
+			     options, builtin_multi_pack_index_expire_usage,
+			     PARSE_OPT_KEEP_UNKNOWN);
+	if (argc)
+		usage_with_options(builtin_multi_pack_index_expire_usage,
+				   options);
+
+	return expire_midx_packs(the_repository, opts.object_dir, opts.flags);
+}
+
+static int cmd_multi_pack_index_repack(int argc, const char **argv)
+{
+	struct option *options;
+	static struct option builtin_multi_pack_index_repack_options[] = {
 		OPT_MAGNITUDE(0, "batch-size", &opts.batch_size,
 		  N_("during repack, collect pack-files of smaller size into a batch that is larger than this size")),
 		OPT_END(),
 	};
 
+	options = parse_options_dup(builtin_multi_pack_index_repack_options);
+	options = add_common_options(options);
+
+	argc = parse_options(argc, argv, NULL,
+			     options,
+			     builtin_multi_pack_index_repack_usage,
+			     PARSE_OPT_KEEP_UNKNOWN);
+	if (argc)
+		usage_with_options(builtin_multi_pack_index_repack_usage,
+				   options);
+
+	return midx_repack(the_repository, opts.object_dir,
+			   (size_t)opts.batch_size, opts.flags);
+}
+
+int cmd_multi_pack_index(int argc, const char **argv,
+			 const char *prefix)
+{
+	struct option *builtin_multi_pack_index_options = common_opts;
+
 	git_config(git_default_config, NULL);
 
 	if (isatty(2))
 		opts.flags |= MIDX_PROGRESS;
 	argc = parse_options(argc, argv, prefix,
 			     builtin_multi_pack_index_options,
-			     builtin_multi_pack_index_usage, 0);
+			     builtin_multi_pack_index_usage,
+			     PARSE_OPT_STOP_AT_NON_OPTION);
 
 	if (!opts.object_dir)
 		opts.object_dir = get_object_directory();
@@ -58,25 +148,16 @@ int cmd_multi_pack_index(int argc, const char **argv,
 		usage_with_options(builtin_multi_pack_index_usage,
 				   builtin_multi_pack_index_options);
 
-	if (argc > 1) {
-		die(_("too many arguments"));
-		return 1;
-	}
-
 	trace2_cmd_mode(argv[0]);
 
 	if (!strcmp(argv[0], "repack"))
-		return midx_repack(the_repository, opts.object_dir,
-			(size_t)opts.batch_size, opts.flags);
-	if (opts.batch_size)
-		die(_("--batch-size option is only for 'repack' subcommand"));
-
-	if (!strcmp(argv[0], "write"))
-		return write_midx_file(opts.object_dir, opts.flags);
-	if (!strcmp(argv[0], "verify"))
-		return verify_midx_file(the_repository, opts.object_dir, opts.flags);
-	if (!strcmp(argv[0], "expire"))
-		return expire_midx_packs(the_repository, opts.object_dir, opts.flags);
-
-	die(_("unrecognized subcommand: %s"), argv[0]);
+		return cmd_multi_pack_index_repack(argc, argv);
+	else if (!strcmp(argv[0], "write"))
+		return cmd_multi_pack_index_write(argc, argv);
+	else if (!strcmp(argv[0], "verify"))
+		return cmd_multi_pack_index_verify(argc, argv);
+	else if (!strcmp(argv[0], "expire"))
+		return cmd_multi_pack_index_expire(argc, argv);
+	else
+		die(_("unrecognized subcommand: %s"), argv[0]);
 }
-- 
2.30.0.667.g81c0cbc6fd



* [PATCH v2 05/15] builtin/multi-pack-index.c: don't enter bogus cmd_mode
  2021-02-24 19:09 ` [PATCH v2 00/15] " Taylor Blau
                     ` (3 preceding siblings ...)
  2021-02-24 19:09   ` [PATCH v2 04/15] builtin/multi-pack-index.c: split sub-commands Taylor Blau
@ 2021-02-24 19:09   ` Taylor Blau
  2021-02-24 19:09   ` [PATCH v2 06/15] builtin/multi-pack-index.c: display usage on unrecognized command Taylor Blau
                     ` (9 subsequent siblings)
  14 siblings, 0 replies; 171+ messages in thread
From: Taylor Blau @ 2021-02-24 19:09 UTC (permalink / raw)
  To: git; +Cc: peff, dstolee, avarab, gitster

Even before the recent refactoring, 'git multi-pack-index' calls
'trace2_cmd_mode()' before verifying that the sub-command is recognized.

Push this call down into the individual sub-commands so that we don't
enter a bogus command mode.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 builtin/multi-pack-index.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/builtin/multi-pack-index.c b/builtin/multi-pack-index.c
index caf0248a98..9fdfe168c2 100644
--- a/builtin/multi-pack-index.c
+++ b/builtin/multi-pack-index.c
@@ -65,6 +65,8 @@ static int cmd_multi_pack_index_write(int argc, const char **argv)
 {
 	struct option *options = common_opts;
 
+	trace2_cmd_mode(argv[0]);
+
 	argc = parse_options(argc, argv, NULL,
 			     options, builtin_multi_pack_index_write_usage,
 			     PARSE_OPT_KEEP_UNKNOWN);
@@ -79,6 +81,8 @@ static int cmd_multi_pack_index_verify(int argc, const char **argv)
 {
 	struct option *options = common_opts;
 
+	trace2_cmd_mode(argv[0]);
+
 	argc = parse_options(argc, argv, NULL,
 			     options, builtin_multi_pack_index_verify_usage,
 			     PARSE_OPT_KEEP_UNKNOWN);
@@ -93,6 +97,8 @@ static int cmd_multi_pack_index_expire(int argc, const char **argv)
 {
 	struct option *options = common_opts;
 
+	trace2_cmd_mode(argv[0]);
+
 	argc = parse_options(argc, argv, NULL,
 			     options, builtin_multi_pack_index_expire_usage,
 			     PARSE_OPT_KEEP_UNKNOWN);
@@ -115,6 +121,8 @@ static int cmd_multi_pack_index_repack(int argc, const char **argv)
 	options = parse_options_dup(builtin_multi_pack_index_repack_options);
 	options = add_common_options(options);
 
+	trace2_cmd_mode(argv[0]);
+
 	argc = parse_options(argc, argv, NULL,
 			     options,
 			     builtin_multi_pack_index_repack_usage,
@@ -148,8 +156,6 @@ int cmd_multi_pack_index(int argc, const char **argv,
 		usage_with_options(builtin_multi_pack_index_usage,
 				   builtin_multi_pack_index_options);
 
-	trace2_cmd_mode(argv[0]);
-
 	if (!strcmp(argv[0], "repack"))
 		return cmd_multi_pack_index_repack(argc, argv);
 	else if (!strcmp(argv[0], "write"))
-- 
2.30.0.667.g81c0cbc6fd



* [PATCH v2 06/15] builtin/multi-pack-index.c: display usage on unrecognized command
  2021-02-24 19:09 ` [PATCH v2 00/15] " Taylor Blau
                     ` (4 preceding siblings ...)
  2021-02-24 19:09   ` [PATCH v2 05/15] builtin/multi-pack-index.c: don't enter bogus cmd_mode Taylor Blau
@ 2021-02-24 19:09   ` Taylor Blau
  2021-02-24 19:09   ` [PATCH v2 07/15] t/helper/test-read-midx.c: add '--show-objects' Taylor Blau
                     ` (8 subsequent siblings)
  14 siblings, 0 replies; 171+ messages in thread
From: Taylor Blau @ 2021-02-24 19:09 UTC (permalink / raw)
  To: git; +Cc: peff, dstolee, avarab, gitster

When given a sub-command that it doesn't understand, 'git
multi-pack-index' dies with the following message:

    $ git multi-pack-index bogus
    fatal: unrecognized subcommand: bogus

Instead of 'die()'-ing, we can display the usage text, which is much
more helpful:

    $ git multi-pack-index bogus
    usage: git multi-pack-index [<options>] write
       or: git multi-pack-index [<options>] verify
       or: git multi-pack-index [<options>] expire
       or: git multi-pack-index [<options>] repack [--batch-size=<size>]

	--object-dir <file>   object directory containing set of packfile and pack-index pairs
	--progress            force progress reporting

While we're at it, clean up some duplication between the "no sub-command"
and "unrecognized sub-command" conditionals.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 builtin/multi-pack-index.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/builtin/multi-pack-index.c b/builtin/multi-pack-index.c
index 9fdfe168c2..5b05e5ce39 100644
--- a/builtin/multi-pack-index.c
+++ b/builtin/multi-pack-index.c
@@ -153,8 +153,7 @@ int cmd_multi_pack_index(int argc, const char **argv,
 		opts.object_dir = get_object_directory();
 
 	if (argc == 0)
-		usage_with_options(builtin_multi_pack_index_usage,
-				   builtin_multi_pack_index_options);
+		goto usage;
 
 	if (!strcmp(argv[0], "repack"))
 		return cmd_multi_pack_index_repack(argc, argv);
@@ -165,5 +164,7 @@ int cmd_multi_pack_index(int argc, const char **argv,
 	else if (!strcmp(argv[0], "expire"))
 		return cmd_multi_pack_index_expire(argc, argv);
 	else
-		die(_("unrecognized subcommand: %s"), argv[0]);
+usage:
+		usage_with_options(builtin_multi_pack_index_usage,
+				   builtin_multi_pack_index_options);
 }
-- 
2.30.0.667.g81c0cbc6fd



* [PATCH v2 07/15] t/helper/test-read-midx.c: add '--show-objects'
  2021-02-24 19:09 ` [PATCH v2 00/15] " Taylor Blau
                     ` (5 preceding siblings ...)
  2021-02-24 19:09   ` [PATCH v2 06/15] builtin/multi-pack-index.c: display usage on unrecognized command Taylor Blau
@ 2021-02-24 19:09   ` Taylor Blau
  2021-02-24 19:09   ` [PATCH v2 08/15] midx: allow marking a pack as preferred Taylor Blau
                     ` (7 subsequent siblings)
  14 siblings, 0 replies; 171+ messages in thread
From: Taylor Blau @ 2021-02-24 19:09 UTC (permalink / raw)
  To: git; +Cc: peff, dstolee, avarab, gitster

The 'read-midx' helper is used in places like t5319 to display basic
information about a multi-pack-index.

In the next patch, the MIDX writing machinery will learn a new way to
choose from which pack an object is selected when multiple copies of
that object exist.

To disambiguate which pack introduces an object so that this feature can
be tested, add a '--show-objects' option which displays additional
information about each object in the MIDX.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 t/helper/test-read-midx.c | 24 ++++++++++++++++++++----
 1 file changed, 20 insertions(+), 4 deletions(-)

diff --git a/t/helper/test-read-midx.c b/t/helper/test-read-midx.c
index 2430880f78..7c2eb11a8e 100644
--- a/t/helper/test-read-midx.c
+++ b/t/helper/test-read-midx.c
@@ -4,7 +4,7 @@
 #include "repository.h"
 #include "object-store.h"
 
-static int read_midx_file(const char *object_dir)
+static int read_midx_file(const char *object_dir, int show_objects)
 {
 	uint32_t i;
 	struct multi_pack_index *m;
@@ -43,13 +43,29 @@ static int read_midx_file(const char *object_dir)
 
 	printf("object-dir: %s\n", m->object_dir);
 
+	if (show_objects) {
+		struct object_id oid;
+		struct pack_entry e;
+
+		for (i = 0; i < m->num_objects; i++) {
+			nth_midxed_object_oid(&oid, m, i);
+			fill_midx_entry(the_repository, &oid, &e, m);
+
+			printf("%s %"PRIu64"\t%s\n",
+			       oid_to_hex(&oid), e.offset, e.p->pack_name);
+		}
+		return 0;
+	}
+
 	return 0;
 }
 
 int cmd__read_midx(int argc, const char **argv)
 {
-	if (argc != 2)
-		usage("read-midx <object-dir>");
+	if (!(argc == 2 || argc == 3))
+		usage("read-midx [--show-objects] <object-dir>");
 
-	return read_midx_file(argv[1]);
+	if (!strcmp(argv[1], "--show-objects"))
+		return read_midx_file(argv[2], 1);
+	return read_midx_file(argv[1], 0);
 }
-- 
2.30.0.667.g81c0cbc6fd



* [PATCH v2 08/15] midx: allow marking a pack as preferred
  2021-02-24 19:09 ` [PATCH v2 00/15] " Taylor Blau
                     ` (6 preceding siblings ...)
  2021-02-24 19:09   ` [PATCH v2 07/15] t/helper/test-read-midx.c: add '--show-objects' Taylor Blau
@ 2021-02-24 19:09   ` Taylor Blau
  2021-03-02  4:17     ` Jonathan Tan
  2021-02-24 19:09   ` [PATCH v2 09/15] midx: don't free midx_name early Taylor Blau
                     ` (6 subsequent siblings)
  14 siblings, 1 reply; 171+ messages in thread
From: Taylor Blau @ 2021-02-24 19:09 UTC (permalink / raw)
  To: git; +Cc: peff, dstolee, avarab, gitster

When multiple packs in the multi-pack index contain the same object, the
MIDX machinery must make a choice about which pack it associates with
that object. Prior to this patch, the lowest-ordered[1] pack was always
selected.

Pack selection for duplicate objects is relatively unimportant today,
but it will become important for multi-pack bitmaps. This is because we
can only invoke the pack-reuse mechanism when all of the bits for reused
objects come from the reuse pack (in order to ensure that all reused
deltas can find their base objects in the same pack).

To encourage the pack selection process to prefer one pack over another
(the pack to be preferred is the one a caller would like to later use as
a reuse pack), introduce the concept of a "preferred pack". When
provided, the MIDX code will always prefer an object found in a
preferred pack over any other.

No format changes are required to store the preferred pack, since it
can be inferred from a corresponding MIDX bitmap by looking up the pack
associated with the object in the first bit position (this ordering is
described in detail in a subsequent commit).

[1]: the ordering is specified by MIDX internals; for our purposes we
can consider the "lowest ordered" pack to be "the one with the
most-recent mtime".

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
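A standalone sketch of the selection rule described above (illustrative
only, not the code in this patch). Entries are sorted by object name;
among copies of the same object, one from the preferred pack sorts
first, and otherwise the copy from the pack with the most-recent mtime
does. Keeping the first entry of each run then yields one record per
object. The sketch_* names, the shortened string object names, and the
final tie-break on pack id are assumptions made for the example.

```c
#include <stdlib.h>
#include <string.h>
#include <time.h>

struct sketch_entry {
	char oid[8];		/* shortened, hypothetical object name */
	int pack_id;		/* which pack this copy lives in */
	time_t pack_mtime;	/* mtime of that pack */
	unsigned preferred : 1;	/* set if the pack is the preferred one */
};

/* Order by object name, then preferred-first, then newest pack first. */
int sketch_entry_cmp(const void *va, const void *vb)
{
	const struct sketch_entry *a = va;
	const struct sketch_entry *b = vb;
	int cmp = strcmp(a->oid, b->oid);

	if (cmp)
		return cmp;
	if (a->preferred != b->preferred)
		return a->preferred ? -1 : 1;
	if (a->pack_mtime != b->pack_mtime)
		return a->pack_mtime > b->pack_mtime ? -1 : 1;
	return a->pack_id - b->pack_id; /* assumed final tie-break */
}

/*
 * Sort the entries, then keep only the first copy of each object.
 * Returns the number of unique objects left at the front of 'e'.
 */
size_t sketch_dedup(struct sketch_entry *e, size_t nr)
{
	size_t i, out = 0;

	qsort(e, nr, sizeof(*e), sketch_entry_cmp);
	for (i = 0; i < nr; i++) {
		if (out && !strcmp(e[out - 1].oid, e[i].oid))
			continue; /* a better copy was already kept */
		e[out++] = e[i];
	}
	return out;
}
```

Note that the preferred bit is checked before the mtime, so a copy in
an older preferred pack still wins over a copy in a newer pack.
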
 Documentation/git-multi-pack-index.txt       | 14 ++-
 Documentation/technical/multi-pack-index.txt |  5 +-
 builtin/multi-pack-index.c                   | 18 +++-
 builtin/repack.c                             |  2 +-
 midx.c                                       | 99 ++++++++++++++++++--
 midx.h                                       |  2 +-
 t/t5319-multi-pack-index.sh                  | 39 ++++++++
 7 files changed, 161 insertions(+), 18 deletions(-)

diff --git a/Documentation/git-multi-pack-index.txt b/Documentation/git-multi-pack-index.txt
index eb0caa0439..ffd601bc17 100644
--- a/Documentation/git-multi-pack-index.txt
+++ b/Documentation/git-multi-pack-index.txt
@@ -9,7 +9,8 @@ git-multi-pack-index - Write and verify multi-pack-indexes
 SYNOPSIS
 --------
 [verse]
-'git multi-pack-index' [--object-dir=<dir>] [--[no-]progress] <subcommand>
+'git multi-pack-index' [--object-dir=<dir>] [--[no-]progress]
+	[--preferred-pack=<pack>] <subcommand>
 
 DESCRIPTION
 -----------
@@ -30,7 +31,16 @@ OPTIONS
 The following subcommands are available:
 
 write::
-	Write a new MIDX file.
+	Write a new MIDX file. The following options are available for
+	the `write` sub-command:
++
+--
+	--preferred-pack=<pack>::
+		Optionally specify the tie-breaking pack used when
+		multiple packs contain the same object. If not given,
+		ties are broken in favor of the pack with the most
+		recent mtime.
+--
 
 verify::
 	Verify the contents of the MIDX file.
diff --git a/Documentation/technical/multi-pack-index.txt b/Documentation/technical/multi-pack-index.txt
index e8e377a59f..fb688976c4 100644
--- a/Documentation/technical/multi-pack-index.txt
+++ b/Documentation/technical/multi-pack-index.txt
@@ -43,8 +43,9 @@ Design Details
   a change in format.
 
 - The MIDX keeps only one record per object ID. If an object appears
-  in multiple packfiles, then the MIDX selects the copy in the most-
-  recently modified packfile.
+  in multiple packfiles, then the MIDX selects the copy in the
+  preferred packfile, otherwise selecting from the most-recently
+  modified packfile.
 
 - If there exist packfiles in the pack directory not registered in
   the MIDX, then those packfiles are loaded into the `packed_git`
diff --git a/builtin/multi-pack-index.c b/builtin/multi-pack-index.c
index 5b05e5ce39..2329dc5ec0 100644
--- a/builtin/multi-pack-index.c
+++ b/builtin/multi-pack-index.c
@@ -4,10 +4,11 @@
 #include "parse-options.h"
 #include "midx.h"
 #include "trace2.h"
+#include "object-store.h"
 
 static char const * const builtin_multi_pack_index_write_usage[] = {
 #define BUILTIN_MIDX_WRITE_USAGE \
-	N_("git multi-pack-index [<options>] write")
+	N_("git multi-pack-index [<options>] write [--preferred-pack=<pack>]")
 	BUILTIN_MIDX_WRITE_USAGE,
 	NULL
 };
@@ -43,6 +44,7 @@ static char const * const builtin_multi_pack_index_usage[] = {
 
 static struct opts_multi_pack_index {
 	const char *object_dir;
+	const char *preferred_pack;
 	unsigned long batch_size;
 	unsigned flags;
 } opts;
@@ -63,7 +65,16 @@ static struct option *add_common_options(struct option *prev)
 
 static int cmd_multi_pack_index_write(int argc, const char **argv)
 {
-	struct option *options = common_opts;
+	struct option *options;
+	static struct option builtin_multi_pack_index_write_options[] = {
+		OPT_STRING(0, "preferred-pack", &opts.preferred_pack,
+			   N_("preferred-pack"),
+			   N_("pack for reuse when computing a multi-pack bitmap")),
+		OPT_END(),
+	};
+
+	options = parse_options_dup(builtin_multi_pack_index_write_options);
+	options = add_common_options(options);
 
 	trace2_cmd_mode(argv[0]);
 
@@ -74,7 +85,8 @@ static int cmd_multi_pack_index_write(int argc, const char **argv)
 		usage_with_options(builtin_multi_pack_index_write_usage,
 				   options);
 
-	return write_midx_file(opts.object_dir, opts.flags);
+	return write_midx_file(opts.object_dir, opts.preferred_pack,
+			       opts.flags);
 }
 
 static int cmd_multi_pack_index_verify(int argc, const char **argv)
diff --git a/builtin/repack.c b/builtin/repack.c
index 01440de2d5..9f00806805 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -523,7 +523,7 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
 	remove_temporary_files();
 
 	if (git_env_bool(GIT_TEST_MULTI_PACK_INDEX, 0))
-		write_midx_file(get_object_directory(), 0);
+		write_midx_file(get_object_directory(), NULL, 0);
 
 	string_list_clear(&names, 0);
 	string_list_clear(&rollback, 0);
diff --git a/midx.c b/midx.c
index 971faa8cfc..d2c56c4bc6 100644
--- a/midx.c
+++ b/midx.c
@@ -431,6 +431,24 @@ static int pack_info_compare(const void *_a, const void *_b)
 	return strcmp(a->pack_name, b->pack_name);
 }
 
+static int lookup_idx_or_pack_name(struct pack_info *info,
+				   uint32_t nr,
+				   const char *pack_name)
+{
+	uint32_t lo = 0, hi = nr;
+	while (lo < hi) {
+		uint32_t mi = lo + (hi - lo) / 2;
+		int cmp = cmp_idx_or_pack_name(pack_name, info[mi].pack_name);
+		if (cmp < 0)
+			hi = mi;
+		else if (cmp > 0)
+			lo = mi + 1;
+		else
+			return mi;
+	}
+	return -1;
+}
+
 struct write_midx_context {
 	struct pack_info *info;
 	uint32_t nr;
@@ -445,6 +463,8 @@ struct write_midx_context {
 	uint32_t *pack_perm;
 	unsigned large_offsets_needed:1;
 	uint32_t num_large_offsets;
+
+	int preferred_pack_idx;
 };
 
 static void add_pack_to_midx(const char *full_path, size_t full_path_len,
@@ -489,6 +509,7 @@ struct pack_midx_entry {
 	uint32_t pack_int_id;
 	time_t pack_mtime;
 	uint64_t offset;
+	unsigned preferred : 1;
 };
 
 static int midx_oid_compare(const void *_a, const void *_b)
@@ -500,6 +521,12 @@ static int midx_oid_compare(const void *_a, const void *_b)
 	if (cmp)
 		return cmp;
 
+	/* Sort objects in a preferred pack first when multiple copies exist. */
+	if (a->preferred > b->preferred)
+		return -1;
+	if (a->preferred < b->preferred)
+		return 1;
+
 	if (a->pack_mtime > b->pack_mtime)
 		return -1;
 	else if (a->pack_mtime < b->pack_mtime)
@@ -527,7 +554,8 @@ static int nth_midxed_pack_midx_entry(struct multi_pack_index *m,
 static void fill_pack_entry(uint32_t pack_int_id,
 			    struct packed_git *p,
 			    uint32_t cur_object,
-			    struct pack_midx_entry *entry)
+			    struct pack_midx_entry *entry,
+			    int preferred)
 {
 	if (nth_packed_object_id(&entry->oid, p, cur_object) < 0)
 		die(_("failed to locate object %d in packfile"), cur_object);
@@ -536,6 +564,7 @@ static void fill_pack_entry(uint32_t pack_int_id,
 	entry->pack_mtime = p->mtime;
 
 	entry->offset = nth_packed_object_offset(p, cur_object);
+	entry->preferred = !!preferred;
 }
 
 /*
@@ -552,7 +581,8 @@ static void fill_pack_entry(uint32_t pack_int_id,
 static struct pack_midx_entry *get_sorted_entries(struct multi_pack_index *m,
 						  struct pack_info *info,
 						  uint32_t nr_packs,
-						  uint32_t *nr_objects)
+						  uint32_t *nr_objects,
+						  uint32_t preferred_pack)
 {
 	uint32_t cur_fanout, cur_pack, cur_object;
 	uint32_t alloc_fanout, alloc_objects, total_objects = 0;
@@ -589,12 +619,17 @@ static struct pack_midx_entry *get_sorted_entries(struct multi_pack_index *m,
 				nth_midxed_pack_midx_entry(m,
 							   &entries_by_fanout[nr_fanout],
 							   cur_object);
+				if (nth_midxed_pack_int_id(m, cur_object) == preferred_pack)
+					entries_by_fanout[nr_fanout].preferred = 1;
+				else
+					entries_by_fanout[nr_fanout].preferred = 0;
 				nr_fanout++;
 			}
 		}
 
 		for (cur_pack = start_pack; cur_pack < nr_packs; cur_pack++) {
 			uint32_t start = 0, end;
+			int preferred = cur_pack == preferred_pack;
 
 			if (cur_fanout)
 				start = get_pack_fanout(info[cur_pack].p, cur_fanout - 1);
@@ -602,7 +637,11 @@ static struct pack_midx_entry *get_sorted_entries(struct multi_pack_index *m,
 
 			for (cur_object = start; cur_object < end; cur_object++) {
 				ALLOC_GROW(entries_by_fanout, nr_fanout + 1, alloc_fanout);
-				fill_pack_entry(cur_pack, info[cur_pack].p, cur_object, &entries_by_fanout[nr_fanout]);
+				fill_pack_entry(cur_pack,
+						info[cur_pack].p,
+						cur_object,
+						&entries_by_fanout[nr_fanout],
+						preferred);
 				nr_fanout++;
 			}
 		}
@@ -777,7 +816,9 @@ static int write_midx_large_offsets(struct hashfile *f,
 }
 
 static int write_midx_internal(const char *object_dir, struct multi_pack_index *m,
-			       struct string_list *packs_to_drop, unsigned flags)
+			       struct string_list *packs_to_drop,
+			       const char *preferred_pack_name,
+			       unsigned flags)
 {
 	char *midx_name;
 	uint32_t i;
@@ -828,7 +869,19 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
 	if (ctx.m && ctx.nr == ctx.m->num_packs && !packs_to_drop)
 		goto cleanup;
 
-	ctx.entries = get_sorted_entries(ctx.m, ctx.info, ctx.nr, &ctx.entries_nr);
+	if (preferred_pack_name) {
+		for (i = 0; i < ctx.nr; i++) {
+			if (!cmp_idx_or_pack_name(preferred_pack_name,
+						  ctx.info[i].pack_name)) {
+				ctx.preferred_pack_idx = i;
+				break;
+			}
+		}
+	} else
+		ctx.preferred_pack_idx = -1;
+
+	ctx.entries = get_sorted_entries(ctx.m, ctx.info, ctx.nr, &ctx.entries_nr,
+					 ctx.preferred_pack_idx);
 
 	ctx.large_offsets_needed = 0;
 	for (i = 0; i < ctx.entries_nr; i++) {
@@ -889,6 +942,31 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
 			pack_name_concat_len += strlen(ctx.info[i].pack_name) + 1;
 	}
 
+	/*
+	 * Recompute the preferred_pack_idx (if applicable) according to the
+	 * permuted pack order.
+	 */
+	ctx.preferred_pack_idx = -1;
+	if (preferred_pack_name) {
+		ctx.preferred_pack_idx = lookup_idx_or_pack_name(ctx.info,
+							     ctx.nr,
+							     preferred_pack_name);
+		if (ctx.preferred_pack_idx < 0)
+			warning(_("unknown preferred pack: '%s'"),
+				preferred_pack_name);
+		else {
+			uint32_t orig = ctx.info[ctx.preferred_pack_idx].orig_pack_int_id;
+			uint32_t perm = ctx.pack_perm[orig];
+
+			if (perm == PACK_EXPIRED) {
+				warning(_("preferred pack '%s' is expired"),
+					preferred_pack_name);
+				ctx.preferred_pack_idx = -1;
+			} else
+				ctx.preferred_pack_idx = perm;
+		}
+	}
+
 	if (pack_name_concat_len % MIDX_CHUNK_ALIGNMENT)
 		pack_name_concat_len += MIDX_CHUNK_ALIGNMENT -
 					(pack_name_concat_len % MIDX_CHUNK_ALIGNMENT);
@@ -947,9 +1025,12 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
 	return result;
 }
 
-int write_midx_file(const char *object_dir, unsigned flags)
+int write_midx_file(const char *object_dir,
+		    const char *preferred_pack_name,
+		    unsigned flags)
 {
-	return write_midx_internal(object_dir, NULL, NULL, flags);
+	return write_midx_internal(object_dir, NULL, NULL, preferred_pack_name,
+				   flags);
 }
 
 void clear_midx_file(struct repository *r)
@@ -1184,7 +1265,7 @@ int expire_midx_packs(struct repository *r, const char *object_dir, unsigned fla
 	free(count);
 
 	if (packs_to_drop.nr)
-		result = write_midx_internal(object_dir, m, &packs_to_drop, flags);
+		result = write_midx_internal(object_dir, m, &packs_to_drop, NULL, flags);
 
 	string_list_clear(&packs_to_drop, 0);
 	return result;
@@ -1373,7 +1454,7 @@ int midx_repack(struct repository *r, const char *object_dir, size_t batch_size,
 		goto cleanup;
 	}
 
-	result = write_midx_internal(object_dir, m, NULL, flags);
+	result = write_midx_internal(object_dir, m, NULL, NULL, flags);
 	m = NULL;
 
 cleanup:
diff --git a/midx.h b/midx.h
index b18cf53bc4..e7fea61109 100644
--- a/midx.h
+++ b/midx.h
@@ -47,7 +47,7 @@ int fill_midx_entry(struct repository *r, const struct object_id *oid, struct pa
 int midx_contains_pack(struct multi_pack_index *m, const char *idx_or_pack_name);
 int prepare_multi_pack_index_one(struct repository *r, const char *object_dir, int local);
 
-int write_midx_file(const char *object_dir, unsigned flags);
+int write_midx_file(const char *object_dir, const char *preferred_pack_name, unsigned flags);
 void clear_midx_file(struct repository *r);
 int verify_midx_file(struct repository *r, const char *object_dir, unsigned flags);
 int expire_midx_packs(struct repository *r, const char *object_dir, unsigned flags);
diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh
index b4afab1dfc..fd94ba9053 100755
--- a/t/t5319-multi-pack-index.sh
+++ b/t/t5319-multi-pack-index.sh
@@ -31,6 +31,14 @@ midx_read_expect () {
 	test_cmp expect actual
 }
 
+midx_expect_object_offset () {
+	OID="$1"
+	OFFSET="$2"
+	OBJECT_DIR="$3"
+	test-tool read-midx --show-objects $OBJECT_DIR >actual &&
+	grep "^$OID $OFFSET" actual
+}
+
 test_expect_success 'setup' '
 	test_oid_cache <<-EOF
 	idxoff sha1:2999
@@ -234,6 +242,37 @@ test_expect_success 'warn on improper hash version' '
 	)
 '
 
+test_expect_success 'midx picks objects from preferred pack' '
+	test_when_finished rm -rf preferred.git &&
+	git init --bare preferred.git &&
+	(
+		cd preferred.git &&
+
+		a=$(echo "a" | git hash-object -w --stdin) &&
+		b=$(echo "b" | git hash-object -w --stdin) &&
+		c=$(echo "c" | git hash-object -w --stdin) &&
+
+		# Set up two packs, duplicating the object "B" at different
+		# offsets.
+		git pack-objects objects/pack/test-AB <<-EOF &&
+		$a
+		$b
+		EOF
+		bc=$(git pack-objects objects/pack/test-BC <<-EOF
+		$b
+		$c
+		EOF
+		) &&
+
+		git multi-pack-index --object-dir=objects \
+			write --preferred-pack=test-BC-$bc.idx 2>err &&
+		test_must_be_empty err &&
+
+		ofs=$(git show-index <objects/pack/test-BC-$bc.idx | grep $b |
+			cut -d" " -f1) &&
+		midx_expect_object_offset $b $ofs objects
+	)
+'
 
 test_expect_success 'verify multi-pack-index success' '
 	git multi-pack-index verify --object-dir=$objdir
-- 
2.30.0.667.g81c0cbc6fd



* [PATCH v2 09/15] midx: don't free midx_name early
  2021-02-24 19:09 ` [PATCH v2 00/15] " Taylor Blau
                     ` (7 preceding siblings ...)
  2021-02-24 19:09   ` [PATCH v2 08/15] midx: allow marking a pack as preferred Taylor Blau
@ 2021-02-24 19:09   ` Taylor Blau
  2021-02-24 19:10   ` [PATCH v2 10/15] midx: keep track of the checksum Taylor Blau
                     ` (5 subsequent siblings)
  14 siblings, 0 replies; 171+ messages in thread
From: Taylor Blau @ 2021-02-24 19:09 UTC (permalink / raw)
  To: git; +Cc: peff, dstolee, avarab, gitster

A subsequent patch will need to refer back to 'midx_name' later on in
the function. In fact, this variable is already free()'d later on, so
this makes the later free() no longer redundant.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 midx.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/midx.c b/midx.c
index d2c56c4bc6..db043d3e65 100644
--- a/midx.c
+++ b/midx.c
@@ -973,7 +973,6 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
 
 	hold_lock_file_for_update(&lk, midx_name, LOCK_DIE_ON_ERROR);
 	f = hashfd(get_lock_file_fd(&lk), get_lock_file_path(&lk));
-	FREE_AND_NULL(midx_name);
 
 	if (ctx.m)
 		close_midx(ctx.m);
-- 
2.30.0.667.g81c0cbc6fd



* [PATCH v2 10/15] midx: keep track of the checksum
  2021-02-24 19:09 ` [PATCH v2 00/15] " Taylor Blau
                     ` (8 preceding siblings ...)
  2021-02-24 19:09   ` [PATCH v2 09/15] midx: don't free midx_name early Taylor Blau
@ 2021-02-24 19:10   ` Taylor Blau
  2021-02-24 19:10   ` [PATCH v2 11/15] midx: make some functions non-static Taylor Blau
                     ` (4 subsequent siblings)
  14 siblings, 0 replies; 171+ messages in thread
From: Taylor Blau @ 2021-02-24 19:10 UTC (permalink / raw)
  To: git; +Cc: peff, dstolee, avarab, gitster

write_midx_internal() uses a hashfile to write the multi-pack index, but
discards its checksum. This makes sense, since nothing that takes place
after writing the MIDX cares about its checksum.

That is about to change in a subsequent patch, when the optional
reverse index corresponding to the MIDX will want to include the MIDX's
checksum.

Store the checksum of the MIDX in preparation for that.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 midx.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/midx.c b/midx.c
index db043d3e65..3ea795f416 100644
--- a/midx.c
+++ b/midx.c
@@ -821,6 +821,7 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
 			       unsigned flags)
 {
 	char *midx_name;
+	unsigned char midx_hash[GIT_MAX_RAWSZ];
 	uint32_t i;
 	struct hashfile *f = NULL;
 	struct lock_file lk;
@@ -1004,7 +1005,7 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
 	write_midx_header(f, get_num_chunks(cf), ctx.nr - dropped_packs);
 	write_chunkfile(cf, &ctx);
 
-	finalize_hashfile(f, NULL, CSUM_FSYNC | CSUM_HASH_IN_STREAM);
+	finalize_hashfile(f, midx_hash, CSUM_FSYNC | CSUM_HASH_IN_STREAM);
 	free_chunkfile(cf);
 	commit_lock_file(&lk);
 
-- 
2.30.0.667.g81c0cbc6fd



* [PATCH v2 11/15] midx: make some functions non-static
  2021-02-24 19:09 ` [PATCH v2 00/15] " Taylor Blau
                     ` (9 preceding siblings ...)
  2021-02-24 19:10   ` [PATCH v2 10/15] midx: keep track of the checksum Taylor Blau
@ 2021-02-24 19:10   ` Taylor Blau
  2021-02-24 19:10   ` [PATCH v2 12/15] Documentation/technical: describe multi-pack reverse indexes Taylor Blau
                     ` (3 subsequent siblings)
  14 siblings, 0 replies; 171+ messages in thread
From: Taylor Blau @ 2021-02-24 19:10 UTC (permalink / raw)
  To: git; +Cc: peff, dstolee, avarab, gitster

In a subsequent commit, pack-revindex.c will become responsible for
sorting a list of objects in the "MIDX pack order" (which will be
defined in the following patch). To do so, it will need to know the
pack identifier and offset within that pack for each object in the MIDX.

The MIDX code already has functions for doing just that
(nth_midxed_offset() and nth_midxed_pack_int_id()), but they are
statically declared.

Since there is no reason that they couldn't be exposed publicly, and
because they are already doing exactly what the caller in
pack-revindex.c will want, expose them publicly so that they can be
reused there.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 midx.c | 4 ++--
 midx.h | 2 ++
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/midx.c b/midx.c
index 3ea795f416..27a8b76dfe 100644
--- a/midx.c
+++ b/midx.c
@@ -239,7 +239,7 @@ struct object_id *nth_midxed_object_oid(struct object_id *oid,
 	return oid;
 }
 
-static off_t nth_midxed_offset(struct multi_pack_index *m, uint32_t pos)
+off_t nth_midxed_offset(struct multi_pack_index *m, uint32_t pos)
 {
 	const unsigned char *offset_data;
 	uint32_t offset32;
@@ -258,7 +258,7 @@ static off_t nth_midxed_offset(struct multi_pack_index *m, uint32_t pos)
 	return offset32;
 }
 
-static uint32_t nth_midxed_pack_int_id(struct multi_pack_index *m, uint32_t pos)
+uint32_t nth_midxed_pack_int_id(struct multi_pack_index *m, uint32_t pos)
 {
 	return get_be32(m->chunk_object_offsets +
 			(off_t)pos * MIDX_CHUNK_OFFSET_WIDTH);
diff --git a/midx.h b/midx.h
index e7fea61109..93bd68189e 100644
--- a/midx.h
+++ b/midx.h
@@ -40,6 +40,8 @@ struct multi_pack_index {
 struct multi_pack_index *load_multi_pack_index(const char *object_dir, int local);
 int prepare_midx_pack(struct repository *r, struct multi_pack_index *m, uint32_t pack_int_id);
 int bsearch_midx(const struct object_id *oid, struct multi_pack_index *m, uint32_t *result);
+off_t nth_midxed_offset(struct multi_pack_index *m, uint32_t pos);
+uint32_t nth_midxed_pack_int_id(struct multi_pack_index *m, uint32_t pos);
 struct object_id *nth_midxed_object_oid(struct object_id *oid,
 					struct multi_pack_index *m,
 					uint32_t n);
-- 
2.30.0.667.g81c0cbc6fd



* [PATCH v2 12/15] Documentation/technical: describe multi-pack reverse indexes
  2021-02-24 19:09 ` [PATCH v2 00/15] " Taylor Blau
                     ` (10 preceding siblings ...)
  2021-02-24 19:10   ` [PATCH v2 11/15] midx: make some functions non-static Taylor Blau
@ 2021-02-24 19:10   ` Taylor Blau
  2021-03-02  4:21     ` Jonathan Tan
  2021-02-24 19:10   ` [PATCH v2 13/15] pack-revindex: read " Taylor Blau
                     ` (2 subsequent siblings)
  14 siblings, 1 reply; 171+ messages in thread
From: Taylor Blau @ 2021-02-24 19:10 UTC (permalink / raw)
  To: git; +Cc: peff, dstolee, avarab, gitster

As a prerequisite to implementing multi-pack bitmaps, motivate and
describe the format and ordering of the multi-pack reverse index.

The subsequent patch will implement reading this format, and the patch
after that will implement writing it while producing a multi-pack index.
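
As a rough illustration of the ordering that the documentation added by
this patch describes, the three comparison rules can be written as a
qsort() comparator. This is only a standalone sketch: 'struct pseudo_obj',
'preferred_pack_id' and 'sort_pseudo_pack()' are hypothetical names, not
code from this series. It assumes that duplicates were already removed,
and that non-preferred packs sort in ascending order of their MIDX pack
ID, which is what midx_pack_order_cmp() in the next patch does.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for one de-duplicated MIDX entry. */
struct pseudo_obj {
	uint32_t pack;   /* pack ID as stored by the MIDX */
	uint64_t offset; /* byte offset within that pack */
};

/* qsort() has no context argument, so the preferred pack is file-scoped. */
static uint32_t preferred_pack_id;

static int pseudo_pack_cmp(const void *va, const void *vb)
{
	const struct pseudo_obj *a = va, *b = vb;
	int a_pref = a->pack == preferred_pack_id;
	int b_pref = b->pack == preferred_pack_id;

	/* Rule 1: objects from the preferred pack sort first. */
	if (a_pref != b_pref)
		return a_pref ? -1 : 1;
	/* Rule 2: otherwise order by pack ID (ascending, by assumption). */
	if (a->pack != b->pack)
		return a->pack < b->pack ? -1 : 1;
	/* Rule 3: within one pack, order by offset. */
	if (a->offset != b->offset)
		return a->offset < b->offset ? -1 : 1;
	return 0;
}

/* Sort "nr" entries into pseudo-pack order for the given preferred pack. */
void sort_pseudo_pack(struct pseudo_obj *objs, size_t nr, uint32_t preferred)
{
	assert(objs || !nr);
	preferred_pack_id = preferred;
	qsort(objs, nr, sizeof(*objs), pseudo_pack_cmp);
}
```

With three packs and pack 2 marked as preferred, all entries from pack 2
come first, followed by those from packs 0 and 1, each group in offset
order.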

Co-authored-by: Jeff King <peff@peff.net>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 Documentation/technical/pack-format.txt | 80 +++++++++++++++++++++++++
 1 file changed, 80 insertions(+)

diff --git a/Documentation/technical/pack-format.txt b/Documentation/technical/pack-format.txt
index 1faa949bf6..77eb591057 100644
--- a/Documentation/technical/pack-format.txt
+++ b/Documentation/technical/pack-format.txt
@@ -379,3 +379,83 @@ CHUNK DATA:
 TRAILER:
 
 	Index checksum of the above contents.
+
+== multi-pack-index reverse indexes
+
+Similar to the pack-based reverse index, the multi-pack index can also
+be used to generate a reverse index.
+
+Instead of mapping between offset, pack-, and index position, this
+reverse index maps between an object's position within the MIDX, and
+that object's position within a pseudo-pack that the MIDX describes.
+
+To clarify these three orderings, consider a multi-pack reachability
+bitmap (which does not yet exist, but is what we are building towards
+here). Each bit needs to correspond to an object in the MIDX, and so we
+need an efficient mapping from bit position to MIDX position.
+
+One solution is to let bits occupy the same position in the oid-sorted
+index stored by the MIDX. But because oids are effectively random, their
+resulting reachability bitmaps would have no locality, and thus compress
+poorly. (This is the reason that single-pack bitmaps use the pack
+ordering, and not the .idx ordering, for the same purpose.)
+
+So we'd like to define an ordering for the whole MIDX based around
+pack ordering, which has far better locality (and thus compresses more
+efficiently). We can think of a pseudo-pack created by the concatenation
+of all of the packs in the MIDX. E.g., if we had a MIDX with three packs
+(a, b, c), with 10, 15, and 20 objects respectively, we can imagine an
+ordering of the objects like:
+
+    |a,0|a,1|...|a,9|b,0|b,1|...|b,14|c,0|c,1|...|c,19|
+
+where the ordering of the packs is defined by the MIDX's pack list,
+and then the ordering of objects within each pack is the same as the
+order in the actual packfile.
+
+Given the list of packs and their counts of objects, you can
+na&iuml;vely reconstruct that pseudo-pack ordering (e.g., the object at
+position 27 must be (c,1) because packs "a" and "b" consumed 25 of the
+slots). But there's a catch. Objects may be duplicated between packs, in
+which case the MIDX only stores one pointer to the object (and thus we'd
+want only one slot in the bitmap).
+
+Callers could handle duplicates themselves by reading objects in order
+of their bit-position, but that's linear in the number of objects, and
+much too expensive for ordinary bitmap lookups. Building a reverse index
+solves this, since it is the logical inverse of the index, and that
+index has already removed duplicates. But, building a reverse index on
+the fly can be expensive. Since we already have an on-disk format for
+pack-based reverse indexes, let's reuse it for the MIDX's pseudo-pack,
+too.
+
+Objects from the MIDX are ordered as follows to string together the
+pseudo-pack. Let _pack(o)_ return the pack from which _o_ was selected
+by the MIDX, and define an ordering of packs based on their numeric ID
+(as stored by the MIDX). Let _offset(o)_ return the object offset of _o_
+within _pack(o)_. Then, compare _o~1~_ and _o~2~_ as follows:
+
+  - If one of _pack(o~1~)_ and _pack(o~2~)_ is preferred and the other
+    is not, then the preferred one sorts first.
++
+(This is a detail that allows the MIDX bitmap to determine which
+pack should be used by the pack-reuse mechanism, since it can ask
+the MIDX for the pack containing the object at bit position 0).
+
+  - If _pack(o~1~) &ne; pack(o~2~)_, then sort the two objects in
+    ascending order based on the pack ID.
+
+  - Otherwise, _pack(o~1~) &equals; pack(o~2~)_, and the objects are
+    sorted in pack-order (i.e., _o~1~_ sorts ahead of _o~2~_ exactly
+    when _offset(o~1~) &lt; offset(o~2~)_).
+
+In short, a MIDX's pseudo-pack is the de-duplicated concatenation of
+objects in packs stored by the MIDX, laid out in pack order, and the
+packs arranged in MIDX order (with the preferred pack coming first).
+
+Finally, note that the MIDX's reverse index is not stored as a chunk in
+the multi-pack-index itself. This is done because the reverse index
+includes the checksum of the pack or MIDX to which it belongs, which
+makes it impossible to write in the MIDX. To avoid races when rewriting
+the MIDX, a MIDX reverse index includes the MIDX's checksum in its
+filename (e.g., `multi-pack-index-xyz.rev`).
-- 
2.30.0.667.g81c0cbc6fd



* [PATCH v2 13/15] pack-revindex: read multi-pack reverse indexes
  2021-02-24 19:09 ` [PATCH v2 00/15] " Taylor Blau
                     ` (11 preceding siblings ...)
  2021-02-24 19:10   ` [PATCH v2 12/15] Documentation/technical: describe multi-pack reverse indexes Taylor Blau
@ 2021-02-24 19:10   ` Taylor Blau
  2021-03-02 18:36     ` Jonathan Tan
  2021-02-24 19:10   ` [PATCH v2 14/15] pack-write.c: extract 'write_rev_file_order' Taylor Blau
  2021-02-24 19:10   ` [PATCH v2 15/15] pack-revindex: write multi-pack reverse indexes Taylor Blau
  14 siblings, 1 reply; 171+ messages in thread
From: Taylor Blau @ 2021-02-24 19:10 UTC (permalink / raw)
  To: git; +Cc: peff, dstolee, avarab, gitster

Implement reading for multi-pack reverse indexes, as described in the
previous patch.

Note that these functions don't yet have any callers, and won't until
multi-pack reachability bitmaps are introduced in a later patch series.
In the meantime, this patch implements some of the infrastructure
necessary to support multi-pack bitmaps.

There are three new functions exposed by the revindex API:

  - load_midx_revindex(): loads the reverse index corresponding to the
    given multi-pack index.

  - midx_to_pack_pos() and pack_pos_to_midx(): these convert between the
    multi-pack index and pseudo-pack order.

load_midx_revindex() and pack_pos_to_midx() are both relatively
straightforward.

load_midx_revindex() needs a few functions to be exposed from the midx
API. One to get the checksum of a midx, and another to get the .rev's
filename. Similar to recent changes in the packed_git struct, three new
fields are added to the multi_pack_index struct: one to keep track of
the size, one to keep track of the mmap'd pointer, and another to point
past the header and at the reverse index's data.
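
To illustrate how the checksum and the filename fit together, here is a
standalone sketch that derives the .rev name from the trailing checksum
bytes of a mapped MIDX. It deliberately avoids Git's own helpers:
'toy_midx_rev_filename()' and its parameters are hypothetical, and the
fixed-size output buffer is an assumption made to keep the example
self-contained (the patch itself uses xstrfmt() and hash_to_hex()).

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Write "<object_dir>/pack/multi-pack-index-<hex>.rev" into "out", where
 * <hex> is the hex form of the last "hash_len" bytes of the mapped MIDX.
 * Returns 0 on success, -1 if "out" is too small.
 */
int toy_midx_rev_filename(char *out, size_t out_len, const char *object_dir,
			  const unsigned char *data, size_t data_len,
			  size_t hash_len)
{
	static const char suffix[] = ".rev";
	const unsigned char *checksum;
	size_t used, i;
	int n;

	assert(hash_len <= data_len);
	/* The trailer (checksum) is the last hash_len bytes of the file. */
	checksum = data + data_len - hash_len;

	n = snprintf(out, out_len, "%s/pack/multi-pack-index-", object_dir);
	if (n < 0 || (size_t)n >= out_len)
		return -1;
	used = (size_t)n;

	/* Two hex digits per checksum byte, then the suffix and its NUL. */
	if (out_len - used < 2 * hash_len + sizeof(suffix))
		return -1;
	for (i = 0; i < hash_len; i++, used += 2)
		snprintf(out + used, 3, "%02x", (unsigned)checksum[i]);
	memcpy(out + used, suffix, sizeof(suffix));
	return 0;
}
```

Because the name depends on the MIDX's contents, a rewritten MIDX gets a
differently named .rev file, so a reader never pairs a new MIDX with a
stale reverse index.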

pack_pos_to_midx() simply reads the corresponding entry out of the
table.

midx_to_pack_pos() is the trickiest, since it needs to find an object's
position in the pseudo-pack order, but that order can only be recovered
from the .rev file itself. This mapping can be implemented with a binary
search, but note that the thing we're binary searching over isn't an
array, but rather a _permutation_.

So, when comparing two items, it's helpful to keep in mind the
difference. Instead of a traditional binary search, where you are
comparing two things directly, here we're comparing a (pack, offset)
tuple with an index into the multi-pack index. That index describes
another (pack, offset) tuple, and it is _those_ two tuples that are
compared.
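
To make that distinction concrete, here is a small standalone sketch of
a binary search over such a permutation. It does not use the real
revindex API: 'struct toy_midx', 'toy_midx_to_pack_pos()' and the flat
'pack', 'offset' and 'rev' arrays are hypothetical stand-ins for the
MIDX and its .rev file, and the search loop is written out by hand
instead of calling bsearch().

```c
#include <assert.h>
#include <stdint.h>

struct toy_midx {
	uint32_t nr;
	const uint32_t *pack;   /* pack[i]: pack ID at MIDX position i */
	const uint64_t *offset; /* offset[i]: offset at MIDX position i */
	const uint32_t *rev;    /* rev[p]: MIDX position at pseudo-pack pos p */
};

/* Compare the key tuple with the tuple that MIDX position "versus" names. */
static int toy_cmp(const struct toy_midx *m, uint32_t preferred,
		   uint32_t key_pack, uint64_t key_offset, uint32_t versus)
{
	uint32_t versus_pack = m->pack[versus];
	int key_pref = key_pack == preferred;
	int versus_pref = versus_pack == preferred;

	if (key_pref != versus_pref)
		return key_pref ? -1 : 1;
	if (key_pack != versus_pack)
		return key_pack < versus_pack ? -1 : 1;
	if (key_offset != m->offset[versus])
		return key_offset < m->offset[versus] ? -1 : 1;
	return 0;
}

/* Returns 0 and sets *pos on success, -1 if no table entry matches. */
int toy_midx_to_pack_pos(const struct toy_midx *m, uint32_t at, uint32_t *pos)
{
	uint32_t lo = 0, hi = m->nr;
	uint32_t preferred;

	assert(at < m->nr);
	/* The preferred pack sorts first, so it owns pseudo-pack position 0. */
	preferred = m->pack[m->rev[0]];

	while (lo < hi) {
		uint32_t mi = lo + (hi - lo) / 2;
		/* rev[mi] is only an index; the comparison is on its tuple. */
		int cmp = toy_cmp(m, preferred, m->pack[at], m->offset[at],
				  m->rev[mi]);
		if (!cmp) {
			*pos = mi;
			return 0;
		}
		if (cmp < 0)
			hi = mi;
		else
			lo = mi + 1;
	}
	return -1;
}
```

The values stored in 'rev' are not themselves sorted. What is sorted is
the sequence of (pack, offset) tuples they point at, which is why each
probe has to go through the MIDX before anything can be compared.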

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 midx.c          |  11 +++++
 midx.h          |   6 +++
 pack-revindex.c | 127 ++++++++++++++++++++++++++++++++++++++++++++++++
 pack-revindex.h |  53 ++++++++++++++++++++
 packfile.c      |   3 ++
 5 files changed, 200 insertions(+)

diff --git a/midx.c b/midx.c
index 27a8b76dfe..8d7a8927b8 100644
--- a/midx.c
+++ b/midx.c
@@ -47,11 +47,22 @@ static uint8_t oid_version(void)
 	}
 }
 
+static const unsigned char *get_midx_checksum(struct multi_pack_index *m)
+{
+	return m->data + m->data_len - the_hash_algo->rawsz;
+}
+
 static char *get_midx_filename(const char *object_dir)
 {
 	return xstrfmt("%s/pack/multi-pack-index", object_dir);
 }
 
+char *get_midx_rev_filename(struct multi_pack_index *m)
+{
+	return xstrfmt("%s/pack/multi-pack-index-%s.rev",
+		       m->object_dir, hash_to_hex(get_midx_checksum(m)));
+}
+
 static int midx_read_oid_fanout(const unsigned char *chunk_start,
 				size_t chunk_size, void *data)
 {
diff --git a/midx.h b/midx.h
index 93bd68189e..0a8294d2ee 100644
--- a/midx.h
+++ b/midx.h
@@ -15,6 +15,10 @@ struct multi_pack_index {
 	const unsigned char *data;
 	size_t data_len;
 
+	const uint32_t *revindex_data;
+	const uint32_t *revindex_map;
+	size_t revindex_len;
+
 	uint32_t signature;
 	unsigned char version;
 	unsigned char hash_len;
@@ -37,6 +41,8 @@ struct multi_pack_index {
 
 #define MIDX_PROGRESS     (1 << 0)
 
+char *get_midx_rev_filename(struct multi_pack_index *m);
+
 struct multi_pack_index *load_multi_pack_index(const char *object_dir, int local);
 int prepare_midx_pack(struct repository *r, struct multi_pack_index *m, uint32_t pack_int_id);
 int bsearch_midx(const struct object_id *oid, struct multi_pack_index *m, uint32_t *result);
diff --git a/pack-revindex.c b/pack-revindex.c
index 83fe4de773..2e15ba3a8f 100644
--- a/pack-revindex.c
+++ b/pack-revindex.c
@@ -3,6 +3,7 @@
 #include "object-store.h"
 #include "packfile.h"
 #include "config.h"
+#include "midx.h"
 
 struct revindex_entry {
 	off_t offset;
@@ -292,6 +293,44 @@ int load_pack_revindex(struct packed_git *p)
 	return -1;
 }
 
+int load_midx_revindex(struct multi_pack_index *m)
+{
+	char *revindex_name;
+	int ret;
+	if (m->revindex_data)
+		return 0;
+
+	revindex_name = get_midx_rev_filename(m);
+
+	ret = load_revindex_from_disk(revindex_name,
+				      m->num_objects,
+				      &m->revindex_map,
+				      &m->revindex_len);
+	if (ret)
+		goto cleanup;
+
+	m->revindex_data = (const uint32_t *)((const char *)m->revindex_map + RIDX_HEADER_SIZE);
+
+cleanup:
+	free(revindex_name);
+	return ret;
+}
+
+int close_midx_revindex(struct multi_pack_index *m)
+{
+	if (!m)
+		return 0;
+
+	if (munmap((void*)m->revindex_map, m->revindex_len))
+		return -1;
+
+	m->revindex_map = NULL;
+	m->revindex_data = NULL;
+	m->revindex_len = 0;
+
+	return 0;
+}
+
 int offset_to_pack_pos(struct packed_git *p, off_t ofs, uint32_t *pos)
 {
 	unsigned lo, hi;
@@ -346,3 +385,91 @@ off_t pack_pos_to_offset(struct packed_git *p, uint32_t pos)
 	else
 		return nth_packed_object_offset(p, pack_pos_to_index(p, pos));
 }
+
+uint32_t pack_pos_to_midx(struct multi_pack_index *m, uint32_t pos)
+{
+	if (!m->revindex_data)
+		BUG("pack_pos_to_midx: reverse index not yet loaded");
+	if (m->num_objects <= pos)
+		BUG("pack_pos_to_midx: out-of-bounds object at %"PRIu32, pos);
+	return get_be32((const char *)m->revindex_data + (pos * sizeof(uint32_t)));
+}
+
+struct midx_pack_key {
+	uint32_t pack;
+	off_t offset;
+
+	uint32_t preferred_pack;
+	struct multi_pack_index *midx;
+};
+
+static int midx_pack_order_cmp(const void *va, const void *vb)
+{
+	const struct midx_pack_key *key = va;
+	struct multi_pack_index *midx = key->midx;
+
+	uint32_t versus = pack_pos_to_midx(midx, (uint32_t*)vb - (const uint32_t *)midx->revindex_data);
+	uint32_t versus_pack = nth_midxed_pack_int_id(midx, versus);
+	off_t versus_offset;
+
+	uint32_t key_preferred = key->pack == key->preferred_pack;
+	uint32_t versus_preferred = versus_pack == key->preferred_pack;
+
+	/*
+	 * First, compare the preferred-ness, noting that the preferred pack
+	 * comes first.
+	 */
+	if (key_preferred && !versus_preferred)
+		return -1;
+	else if (!key_preferred && versus_preferred)
+		return 1;
+
+	/* Then, break ties first by comparing the pack IDs. */
+	if (key->pack < versus_pack)
+		return -1;
+	else if (key->pack > versus_pack)
+		return 1;
+
+	/* Finally, break ties by comparing offsets within a pack. */
+	versus_offset = nth_midxed_offset(midx, versus);
+	if (key->offset < versus_offset)
+		return -1;
+	else if (key->offset > versus_offset)
+		return 1;
+
+	return 0;
+}
+
+int midx_to_pack_pos(struct multi_pack_index *m, uint32_t at, uint32_t *pos)
+{
+	struct midx_pack_key key;
+	uint32_t *found;
+
+	if (!m->revindex_data)
+		BUG("midx_to_pack_pos: reverse index not yet loaded");
+	if (m->num_objects <= at)
+		BUG("midx_to_pack_pos: out-of-bounds object at %"PRIu32, at);
+
+	key.pack = nth_midxed_pack_int_id(m, at);
+	key.offset = nth_midxed_offset(m, at);
+	key.midx = m;
+	/*
+	 * The preferred pack sorts first, so determine its identifier by
+	 * looking at the first object in pseudo-pack order.
+	 *
+	 * Note that if no --preferred-pack is explicitly given when writing a
+	 * multi-pack index, then whichever pack has the lowest identifier
+	 * implicitly is preferred (and includes all its objects, since ties are
+	 * broken first by pack identifier).
+	 */
+	key.preferred_pack = nth_midxed_pack_int_id(m, pack_pos_to_midx(m, 0));
+
+	found = bsearch(&key, m->revindex_data, m->num_objects,
+			sizeof(uint32_t), midx_pack_order_cmp);
+
+	if (!found)
+		return error("bad offset for revindex");
+
+	*pos = found - m->revindex_data;
+	return 0;
+}
diff --git a/pack-revindex.h b/pack-revindex.h
index ba7c82c125..479b8f2f9c 100644
--- a/pack-revindex.h
+++ b/pack-revindex.h
@@ -14,6 +14,20 @@
  *
  * - offset: the byte offset within the .pack file at which the object contents
  *   can be found
+ *
+ * The revindex can also be used with a multi-pack index (MIDX). In this
+ * setting:
+ *
+ *   - index position refers to an object's numeric position within the MIDX
+ *
+ *   - pack position refers to an object's position within a non-existent pack
+ *     described by the MIDX. The pack structure is described in
+ *     Documentation/technical/pack-format.txt.
+ *
+ *     It is effectively a concatenation of all packs in the MIDX (ordered by
+ *     their numeric ID within the MIDX, each in its original pack order),
+ *     with duplicates removed and the preferred pack (if any) placed
+ *     first.
  */
 
 
@@ -24,6 +38,7 @@
 #define GIT_TEST_REV_INDEX_DIE_IN_MEMORY "GIT_TEST_REV_INDEX_DIE_IN_MEMORY"
 
 struct packed_git;
+struct multi_pack_index;
 
 /*
  * load_pack_revindex populates the revindex's internal data-structures for the
@@ -34,6 +49,22 @@ struct packed_git;
  */
 int load_pack_revindex(struct packed_git *p);
 
+/*
+ * load_midx_revindex loads the '.rev' file corresponding to the given
+ * multi-pack index by mmap-ing it and assigning pointers in the
+ * multi_pack_index to point at it.
+ *
+ * A negative number is returned on error.
+ */
+int load_midx_revindex(struct multi_pack_index *m);
+
+/*
+ * Frees resources associated with a multi-pack reverse index.
+ *
+ * A negative number is returned on error.
+ */
+int close_midx_revindex(struct multi_pack_index *m);
+
 /*
  * offset_to_pack_pos converts an object offset to a pack position. This
  * function returns zero on success, and a negative number otherwise. The
@@ -71,4 +102,26 @@ uint32_t pack_pos_to_index(struct packed_git *p, uint32_t pos);
  */
 off_t pack_pos_to_offset(struct packed_git *p, uint32_t pos);
 
+/*
+ * pack_pos_to_midx converts the object at position "pos" within the MIDX
+ * pseudo-pack into a MIDX position.
+ *
+ * If the reverse index has not yet been loaded, or the position is out of
+ * bounds, this function aborts.
+ *
+ * This function runs in constant time.
+ */
+uint32_t pack_pos_to_midx(struct multi_pack_index *m, uint32_t pos);
+
+/*
+ * midx_to_pack_pos converts from the MIDX-relative position at "at" to the
+ * corresponding pack position.
+ *
+ * If the reverse index has not yet been loaded, or the position is out of
+ * bounds, this function aborts.
+ *
+ * This function runs in time O(log N) with the number of objects in the MIDX.
+ */
+int midx_to_pack_pos(struct multi_pack_index *midx, uint32_t at, uint32_t *pos);
+
 #endif
diff --git a/packfile.c b/packfile.c
index 1fec12ac5f..82623e0cb4 100644
--- a/packfile.c
+++ b/packfile.c
@@ -862,6 +862,9 @@ static void prepare_pack(const char *full_name, size_t full_name_len,
 
 	if (!strcmp(file_name, "multi-pack-index"))
 		return;
+	if (starts_with(file_name, "multi-pack-index") &&
+	    ends_with(file_name, ".rev"))
+		return;
 	if (ends_with(file_name, ".idx") ||
 	    ends_with(file_name, ".rev") ||
 	    ends_with(file_name, ".pack") ||
-- 
2.30.0.667.g81c0cbc6fd



* [PATCH v2 14/15] pack-write.c: extract 'write_rev_file_order'
  2021-02-24 19:09 ` [PATCH v2 00/15] " Taylor Blau
                     ` (12 preceding siblings ...)
  2021-02-24 19:10   ` [PATCH v2 13/15] pack-revindex: read " Taylor Blau
@ 2021-02-24 19:10   ` Taylor Blau
  2021-02-24 19:10   ` [PATCH v2 15/15] pack-revindex: write multi-pack reverse indexes Taylor Blau
  14 siblings, 0 replies; 171+ messages in thread
From: Taylor Blau @ 2021-02-24 19:10 UTC (permalink / raw)
  To: git; +Cc: peff, dstolee, avarab, gitster

Existing callers provide the reverse index code with an array of 'struct
pack_idx_entry *'s, which is then sorted by pack order (comparing the
offsets of each object within the pack).

Prepare for the multi-pack index to write a .rev file by providing a way
to write the reverse index without an array of pack_idx_entry (which the
MIDX code does not have).

Instead, callers can invoke 'write_rev_file_order()', which takes an
array of uint32_t's. The ith entry in this array specifies the index
position of the ith object in pack order.

Expose this new function for use in a later patch, and rewrite the
existing write_rev_file() in terms of this new function.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
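Not part of the commit message: for readers following along, here is a
minimal standalone sketch of the array that write_rev_file_order()
expects, and of the big-endian encoding that hashwrite_be32() applies to
each entry. It assumes a toy object table in place of Git's
'struct pack_idx_entry'. The names toy_entry, build_pack_order() and
put_be32_toy() are hypothetical, and plain qsort() with a file-scope
pointer stands in for QSORT_S.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for an index entry: only the pack offset matters. */
struct toy_entry {
	uint64_t offset;
};

/* qsort() takes no context argument, so the table is passed via a static. */
static const struct toy_entry *cmp_ctx;

static int by_offset(const void *va, const void *vb)
{
	const struct toy_entry *a = &cmp_ctx[*(const uint32_t *)va];
	const struct toy_entry *b = &cmp_ctx[*(const uint32_t *)vb];

	if (a->offset < b->offset)
		return -1;
	if (a->offset > b->offset)
		return 1;
	return 0;
}

/*
 * "objects" is listed in index order. Returns a malloc'd array in which
 * entry i holds the index position of the object at pack position i.
 * The caller frees the result. Not reentrant because of cmp_ctx.
 */
static uint32_t *build_pack_order(const struct toy_entry *objects, uint32_t nr)
{
	uint32_t *pack_order;
	uint32_t i;

	assert(objects != NULL);
	pack_order = malloc(nr * sizeof(*pack_order));
	if (!pack_order)
		return NULL;
	for (i = 0; i < nr; i++)
		pack_order[i] = i;
	cmp_ctx = objects;
	qsort(pack_order, nr, sizeof(*pack_order), by_offset);
	return pack_order;
}

/* Store a 32-bit value in network (big-endian) byte order. */
static void put_be32_toy(unsigned char *out, uint32_t v)
{
	out[0] = (unsigned char)(v >> 24);
	out[1] = (unsigned char)(v >> 16);
	out[2] = (unsigned char)(v >> 8);
	out[3] = (unsigned char)v;
}
```

With offsets {300, 12, 150} listed in index order, build_pack_order()
yields {1, 2, 0}: the object at index position 1 comes first in the pack.
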
 pack-write.c | 39 ++++++++++++++++++++++++++++-----------
 pack.h       |  1 +
 2 files changed, 29 insertions(+), 11 deletions(-)

diff --git a/pack-write.c b/pack-write.c
index 680c36755d..75fcf70db1 100644
--- a/pack-write.c
+++ b/pack-write.c
@@ -201,21 +201,12 @@ static void write_rev_header(struct hashfile *f)
 }
 
 static void write_rev_index_positions(struct hashfile *f,
-				      struct pack_idx_entry **objects,
+				      uint32_t *pack_order,
 				      uint32_t nr_objects)
 {
-	uint32_t *pack_order;
 	uint32_t i;
-
-	ALLOC_ARRAY(pack_order, nr_objects);
-	for (i = 0; i < nr_objects; i++)
-		pack_order[i] = i;
-	QSORT_S(pack_order, nr_objects, pack_order_cmp, objects);
-
 	for (i = 0; i < nr_objects; i++)
 		hashwrite_be32(f, pack_order[i]);
-
-	free(pack_order);
 }
 
 static void write_rev_trailer(struct hashfile *f, const unsigned char *hash)
@@ -228,6 +219,32 @@ const char *write_rev_file(const char *rev_name,
 			   uint32_t nr_objects,
 			   const unsigned char *hash,
 			   unsigned flags)
+{
+	uint32_t *pack_order;
+	uint32_t i;
+	const char *ret;
+
+	if (!(flags & (WRITE_REV | WRITE_REV_VERIFY)))
+		return NULL;
+
+	ALLOC_ARRAY(pack_order, nr_objects);
+	for (i = 0; i < nr_objects; i++)
+		pack_order[i] = i;
+	QSORT_S(pack_order, nr_objects, pack_order_cmp, objects);
+
+	ret = write_rev_file_order(rev_name, pack_order, nr_objects, hash,
+				   flags);
+
+	free(pack_order);
+
+	return ret;
+}
+
+const char *write_rev_file_order(const char *rev_name,
+				 uint32_t *pack_order,
+				 uint32_t nr_objects,
+				 const unsigned char *hash,
+				 unsigned flags)
 {
 	struct hashfile *f;
 	int fd;
@@ -262,7 +279,7 @@ const char *write_rev_file(const char *rev_name,
 
 	write_rev_header(f);
 
-	write_rev_index_positions(f, objects, nr_objects);
+	write_rev_index_positions(f, pack_order, nr_objects);
 	write_rev_trailer(f, hash);
 
 	if (rev_name && adjust_shared_perm(rev_name) < 0)
diff --git a/pack.h b/pack.h
index afdcf8f5c7..09c2a7dd3a 100644
--- a/pack.h
+++ b/pack.h
@@ -94,6 +94,7 @@ struct ref;
 void write_promisor_file(const char *promisor_name, struct ref **sought, int nr_sought);
 
 const char *write_rev_file(const char *rev_name, struct pack_idx_entry **objects, uint32_t nr_objects, const unsigned char *hash, unsigned flags);
+const char *write_rev_file_order(const char *rev_name, uint32_t *pack_order, uint32_t nr_objects, const unsigned char *hash, unsigned flags);
 
 /*
  * The "hdr" output buffer should be at least this big, which will handle sizes
-- 
2.30.0.667.g81c0cbc6fd



* [PATCH v2 15/15] pack-revindex: write multi-pack reverse indexes
  2021-02-24 19:09 ` [PATCH v2 00/15] " Taylor Blau
                     ` (13 preceding siblings ...)
  2021-02-24 19:10   ` [PATCH v2 14/15] pack-write.c: extract 'write_rev_file_order' Taylor Blau
@ 2021-02-24 19:10   ` Taylor Blau
  2021-03-02 18:40     ` Jonathan Tan
  14 siblings, 1 reply; 171+ messages in thread
From: Taylor Blau @ 2021-02-24 19:10 UTC (permalink / raw)
  To: git; +Cc: peff, dstolee, avarab, gitster

Implement the writing half of multi-pack reverse indexes. This is
nothing more than the format described a few patches ago, with a new set
of helper functions that will be used to clear out stale .rev files
corresponding to old MIDXs.

Unfortunately, a comparison function very similar to the one implemented
recently in pack-revindex.c is reimplemented here, this time accepting a
MIDX-internal type. An effort to DRY these up would create more
indirection and overhead than is necessary, so it isn't pursued here.

Currently, there are no callers which pass the MIDX_WRITE_REV_INDEX
flag, meaning that this is all dead code. But, that won't be the case
for long, since subsequent patches will introduce the multi-pack bitmap,
which will begin passing this flag.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
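Not part of the commit message: the stale-file check in
clear_midx_file_ext() below boils down to a filename predicate. Here is
a minimal standalone sketch of it, assuming plain-C replacements for
starts_with() and ends_with(). The names is_stale_midx_file(),
has_prefix() and has_suffix() are hypothetical, not Git functions.

```c
#include <assert.h>
#include <string.h>

/* Plain-C stand-ins for Git's starts_with() and ends_with() helpers. */
static int has_prefix(const char *s, const char *prefix)
{
	return strncmp(s, prefix, strlen(prefix)) == 0;
}

static int has_suffix(const char *s, const char *suffix)
{
	size_t len = strlen(s), suffix_len = strlen(suffix);

	return len >= suffix_len && !strcmp(s + len - suffix_len, suffix);
}

/*
 * Returns 1 if file_name looks like a MIDX-derived file with extension
 * "ext" that should be deleted, and 0 otherwise. "keep" names the one
 * file to preserve, or is NULL to preserve nothing.
 */
static int is_stale_midx_file(const char *file_name, const char *ext,
			      const char *keep)
{
	assert(file_name != NULL && ext != NULL);

	/* Only files named "multi-pack-index-<something><ext>" qualify. */
	if (!has_prefix(file_name, "multi-pack-index-") ||
	    !has_suffix(file_name, ext))
		return 0;
	/* The file belonging to the MIDX just written must survive. */
	if (keep && !strcmp(keep, file_name))
		return 0;
	return 1;
}
```

Passing NULL for "keep" matches the clear_midx_file() case, where every
multi-pack .rev file goes away along with the MIDX itself.
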
 midx.c | 111 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 midx.h |   1 +
 2 files changed, 112 insertions(+)

diff --git a/midx.c b/midx.c
index 8d7a8927b8..820276cc45 100644
--- a/midx.c
+++ b/midx.c
@@ -12,6 +12,7 @@
 #include "run-command.h"
 #include "repository.h"
 #include "chunk-format.h"
+#include "pack.h"
 
 #define MIDX_SIGNATURE 0x4d494458 /* "MIDX" */
 #define MIDX_VERSION 1
@@ -472,6 +473,7 @@ struct write_midx_context {
 	uint32_t entries_nr;
 
 	uint32_t *pack_perm;
+	uint32_t *pack_order;
 	unsigned large_offsets_needed:1;
 	uint32_t num_large_offsets;
 
@@ -826,6 +828,66 @@ static int write_midx_large_offsets(struct hashfile *f,
 	return 0;
 }
 
+static int midx_pack_order_cmp(const void *va, const void *vb, void *_ctx)
+{
+	struct write_midx_context *ctx = _ctx;
+
+	struct pack_midx_entry *a = &ctx->entries[*(const uint32_t *)va];
+	struct pack_midx_entry *b = &ctx->entries[*(const uint32_t *)vb];
+
+	uint32_t perm_a = ctx->pack_perm[a->pack_int_id];
+	uint32_t perm_b = ctx->pack_perm[b->pack_int_id];
+
+	/* Sort objects in the preferred pack ahead of any others. */
+	if (a->preferred > b->preferred)
+		return -1;
+	if (a->preferred < b->preferred)
+		return 1;
+
+	/* Then, order objects by which packs they appear in. */
+	if (perm_a < perm_b)
+		return -1;
+	if (perm_a > perm_b)
+		return 1;
+
+	/* Then, disambiguate by their offset within each pack. */
+	if (a->offset < b->offset)
+		return -1;
+	if (a->offset > b->offset)
+		return 1;
+
+	return 0;
+}
+
+static uint32_t *midx_pack_order(struct write_midx_context *ctx)
+{
+	uint32_t *pack_order;
+	uint32_t i;
+
+	ALLOC_ARRAY(pack_order, ctx->entries_nr);
+	for (i = 0; i < ctx->entries_nr; i++)
+		pack_order[i] = i;
+	QSORT_S(pack_order, ctx->entries_nr, midx_pack_order_cmp, ctx);
+
+	return pack_order;
+}
+
+static void write_midx_reverse_index(char *midx_name, unsigned char *midx_hash,
+				     struct write_midx_context *ctx)
+{
+	struct strbuf buf = STRBUF_INIT;
+
+	strbuf_addf(&buf, "%s-%s.rev", midx_name, hash_to_hex(midx_hash));
+
+	write_rev_file_order(buf.buf, ctx->pack_order, ctx->entries_nr,
+			     midx_hash, WRITE_REV);
+
+	strbuf_release(&buf);
+}
+
+static void clear_midx_files_ext(struct repository *r, const char *ext,
+				 unsigned char *keep_hash);
+
 static int write_midx_internal(const char *object_dir, struct multi_pack_index *m,
 			       struct string_list *packs_to_drop,
 			       const char *preferred_pack_name,
@@ -1018,6 +1080,14 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
 
 	finalize_hashfile(f, midx_hash, CSUM_FSYNC | CSUM_HASH_IN_STREAM);
 	free_chunkfile(cf);
+
+	if (flags & MIDX_WRITE_REV_INDEX)
+		ctx.pack_order = midx_pack_order(&ctx);
+
+	if (flags & MIDX_WRITE_REV_INDEX)
+		write_midx_reverse_index(midx_name, midx_hash, &ctx);
+	clear_midx_files_ext(the_repository, ".rev", midx_hash);
+
 	commit_lock_file(&lk);
 
 cleanup:
@@ -1032,6 +1102,7 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
 	free(ctx.info);
 	free(ctx.entries);
 	free(ctx.pack_perm);
+	free(ctx.pack_order);
 	free(midx_name);
 	return result;
 }
@@ -1044,6 +1115,44 @@ int write_midx_file(const char *object_dir,
 				   flags);
 }
 
+struct clear_midx_data {
+	char *keep;
+	const char *ext;
+};
+
+static void clear_midx_file_ext(const char *full_path, size_t full_path_len,
+				const char *file_name, void *_data)
+{
+	struct clear_midx_data *data = _data;
+
+	if (!(starts_with(file_name, "multi-pack-index-") &&
+	      ends_with(file_name, data->ext)))
+		return;
+	if (data->keep && !strcmp(data->keep, file_name))
+		return;
+
+	if (unlink(full_path))
+		die_errno(_("failed to remove %s"), full_path);
+}
+
+static void clear_midx_files_ext(struct repository *r, const char *ext,
+				 unsigned char *keep_hash)
+{
+	struct clear_midx_data data;
+	memset(&data, 0, sizeof(struct clear_midx_data));
+
+	if (keep_hash)
+		data.keep = xstrfmt("multi-pack-index-%s%s",
+				    hash_to_hex(keep_hash), ext);
+	data.ext = ext;
+
+	for_each_file_in_pack_dir(r->objects->odb->path,
+				  clear_midx_file_ext,
+				  &data);
+
+	free(data.keep);
+}
+
 void clear_midx_file(struct repository *r)
 {
 	char *midx = get_midx_filename(r->objects->odb->path);
@@ -1056,6 +1165,8 @@ void clear_midx_file(struct repository *r)
 	if (remove_path(midx))
 		die(_("failed to clear multi-pack-index at %s"), midx);
 
+	clear_midx_files_ext(r, ".rev", NULL);
+
 	free(midx);
 }
 
diff --git a/midx.h b/midx.h
index 0a8294d2ee..8684cf0fef 100644
--- a/midx.h
+++ b/midx.h
@@ -40,6 +40,7 @@ struct multi_pack_index {
 };
 
 #define MIDX_PROGRESS     (1 << 0)
+#define MIDX_WRITE_REV_INDEX (1 << 1)
 
 char *get_midx_rev_filename(struct multi_pack_index *m);
 
-- 
2.30.0.667.g81c0cbc6fd


* Re: [PATCH v2 04/15] builtin/multi-pack-index.c: split sub-commands
  2021-02-24 19:09   ` [PATCH v2 04/15] builtin/multi-pack-index.c: split sub-commands Taylor Blau
@ 2021-03-02  4:06     ` Jonathan Tan
  2021-03-02 19:02       ` Taylor Blau
  0 siblings, 1 reply; 171+ messages in thread
From: Jonathan Tan @ 2021-03-02  4:06 UTC (permalink / raw)
  To: me; +Cc: git, peff, dstolee, avarab, gitster, Jonathan Tan

> diff --git a/builtin/multi-pack-index.c b/builtin/multi-pack-index.c
> index eea498e026..caf0248a98 100644
> --- a/builtin/multi-pack-index.c
> +++ b/builtin/multi-pack-index.c
> @@ -5,17 +5,33 @@
>  #include "midx.h"
>  #include "trace2.h"
>  
> +static char const * const builtin_multi_pack_index_write_usage[] = {
>  #define BUILTIN_MIDX_WRITE_USAGE \
>  	N_("git multi-pack-index [<options>] write")
> +	BUILTIN_MIDX_WRITE_USAGE,
> +	NULL
> +};

I think this way of writing is vulnerable to confusing errors if a
missing or extra backslash happens, so I would prefer the #define to be
outside the variable declaration.

> +static int cmd_multi_pack_index_repack(int argc, const char **argv)
> +{
> +	struct option *options;
> +	static struct option builtin_multi_pack_index_repack_options[] = {
>  		OPT_MAGNITUDE(0, "batch-size", &opts.batch_size,
>  		  N_("during repack, collect pack-files of smaller size into a batch that is larger than this size")),
>  		OPT_END(),
>  	};
>  
> +	options = parse_options_dup(builtin_multi_pack_index_repack_options);
> +	options = add_common_options(options);

I looked for where this was freed, but I guess freeing this struct is
not really something we're worried about (which makes sense).

The other patches up to this one look good.


* Re: [PATCH v2 08/15] midx: allow marking a pack as preferred
  2021-02-24 19:09   ` [PATCH v2 08/15] midx: allow marking a pack as preferred Taylor Blau
@ 2021-03-02  4:17     ` Jonathan Tan
  2021-03-02 19:09       ` Taylor Blau
  0 siblings, 1 reply; 171+ messages in thread
From: Jonathan Tan @ 2021-03-02  4:17 UTC (permalink / raw)
  To: me; +Cc: git, peff, dstolee, avarab, gitster, Jonathan Tan

> @@ -589,12 +619,17 @@ static struct pack_midx_entry *get_sorted_entries(struct multi_pack_index *m,
>  				nth_midxed_pack_midx_entry(m,
>  							   &entries_by_fanout[nr_fanout],
>  							   cur_object);
> +				if (nth_midxed_pack_int_id(m, cur_object) == preferred_pack)
> +					entries_by_fanout[nr_fanout].preferred = 1;
> +				else
> +					entries_by_fanout[nr_fanout].preferred = 0;
>  				nr_fanout++;
>  			}
>  		}
>  
>  		for (cur_pack = start_pack; cur_pack < nr_packs; cur_pack++) {
>  			uint32_t start = 0, end;
> +			int preferred = cur_pack == preferred_pack;
>  
>  			if (cur_fanout)
>  				start = get_pack_fanout(info[cur_pack].p, cur_fanout - 1);
> @@ -602,7 +637,11 @@ static struct pack_midx_entry *get_sorted_entries(struct multi_pack_index *m,
>  
>  			for (cur_object = start; cur_object < end; cur_object++) {
>  				ALLOC_GROW(entries_by_fanout, nr_fanout + 1, alloc_fanout);
> -				fill_pack_entry(cur_pack, info[cur_pack].p, cur_object, &entries_by_fanout[nr_fanout]);
> +				fill_pack_entry(cur_pack,
> +						info[cur_pack].p,
> +						cur_object,
> +						&entries_by_fanout[nr_fanout],
> +						preferred);
>  				nr_fanout++;
>  			}
>  		}

I was initially confused that "preferred" was set twice, but this makes
sense - the first one is when an existing midx is reused, and the second
one is for objects in packs that the midx (if it exists) does not cover.

> @@ -828,7 +869,19 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
>  	if (ctx.m && ctx.nr == ctx.m->num_packs && !packs_to_drop)
>  		goto cleanup;
>  
> -	ctx.entries = get_sorted_entries(ctx.m, ctx.info, ctx.nr, &ctx.entries_nr);
> +	if (preferred_pack_name) {
> +		for (i = 0; i < ctx.nr; i++) {
> +			if (!cmp_idx_or_pack_name(preferred_pack_name,
> +						  ctx.info[i].pack_name)) {
> +				ctx.preferred_pack_idx = i;
> +				break;
> +			}
> +		}
> +	} else
> +		ctx.preferred_pack_idx = -1;

Looks safer to put "ctx.preferred_pack_idx = -1" before the "if", just
in case the given pack name does not exist?

> @@ -889,6 +942,31 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
>  			pack_name_concat_len += strlen(ctx.info[i].pack_name) + 1;
>  	}
>  
> +	/*
> +	 * Recompute the preferred_pack_idx (if applicable) according to the
> +	 * permuted pack order.
> +	 */
> +	ctx.preferred_pack_idx = -1;
> +	if (preferred_pack_name) {
> +		ctx.preferred_pack_idx = lookup_idx_or_pack_name(ctx.info,
> +							     ctx.nr,
> +							     preferred_pack_name);
> +		if (ctx.preferred_pack_idx < 0)
> +			warning(_("unknown preferred pack: '%s'"),
> +				preferred_pack_name);
> +		else {
> +			uint32_t orig = ctx.info[ctx.preferred_pack_idx].orig_pack_int_id;
> +			uint32_t perm = ctx.pack_perm[orig];
> +
> +			if (perm == PACK_EXPIRED) {
> +				warning(_("preferred pack '%s' is expired"),
> +					preferred_pack_name);
> +				ctx.preferred_pack_idx = -1;
> +			} else
> +				ctx.preferred_pack_idx = perm;
> +		}
> +	}

I couldn't figure out why the preferred pack index needs to be
recalculated here, since the pack entries would have already been
sorted. Also, the tests still pass when I comment this part out. A
comment describing what's going on would be helpful.

All previous patches look good to me.


* Re: [PATCH v2 12/15] Documentation/technical: describe multi-pack reverse indexes
  2021-02-24 19:10   ` [PATCH v2 12/15] Documentation/technical: describe multi-pack reverse indexes Taylor Blau
@ 2021-03-02  4:21     ` Jonathan Tan
  2021-03-02  4:36       ` Taylor Blau
  2021-03-02 19:15       ` Taylor Blau
  0 siblings, 2 replies; 171+ messages in thread
From: Jonathan Tan @ 2021-03-02  4:21 UTC (permalink / raw)
  To: me; +Cc: git, peff, dstolee, avarab, gitster, Jonathan Tan

> +== multi-pack-index reverse indexes
> +
> +Similar to the pack-based reverse index, the multi-pack index can also
> +be used to generate a reverse index.
> +
> +Instead of mapping between offset, pack-, and index position, this
> +reverse index maps between an object's position within the MIDX, and
> +that object's position within a pseudo-pack that the MIDX describes.
> +
> +To clarify these three orderings

The paragraph seems to only describe 2 orderings - object's position
within the MIDX and object's position within the pseudo-pack. (Is the
third one the offset within the MIDX - which is, I believe, trivially
computable from the position within the MIDX?)

Also, which are stored in the .rev file?

The previous patches look good to me, and I'll review the remaining
patches hopefully tomorrow.


* Re: [PATCH v2 12/15] Documentation/technical: describe multi-pack reverse indexes
  2021-03-02  4:21     ` Jonathan Tan
@ 2021-03-02  4:36       ` Taylor Blau
  2021-03-02 19:15       ` Taylor Blau
  1 sibling, 0 replies; 171+ messages in thread
From: Taylor Blau @ 2021-03-02  4:36 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: me, git, peff, dstolee, avarab, gitster

On Mon, Mar 01, 2021 at 08:21:11PM -0800, Jonathan Tan wrote:
> The previous patches look good to me, and I'll review the remaining
> patches hopefully tomorrow.

Thanks; I am sorely behind recent activity on the list. I had a
last-minute errand to run last weekend and I haven't managed to quite
dig out of the hole I created for myself since then.

Incidentally, I have had this code (and the tb/multi-pack-bitmaps branch)
running on a couple of high-traffic repositories internal to GitHub, and
so have a couple of improvements that I was hoping to squash in, too.

Thanks,
Taylor


* Re: [PATCH v2 13/15] pack-revindex: read multi-pack reverse indexes
  2021-02-24 19:10   ` [PATCH v2 13/15] pack-revindex: read " Taylor Blau
@ 2021-03-02 18:36     ` Jonathan Tan
  2021-03-03 15:27       ` Taylor Blau
  0 siblings, 1 reply; 171+ messages in thread
From: Jonathan Tan @ 2021-03-02 18:36 UTC (permalink / raw)
  To: me; +Cc: git, peff, dstolee, avarab, gitster, Jonathan Tan

> midx_to_pack_pos() is the trickiest, since it needs to find an object's
> position in the pseudo-pack order, but that order can only be recovered
> in the .rev file itself. This mapping can be implemented with a binary
> search, but note that the thing we're binary searching over isn't an
> array, but rather a _permutation_.
> 
> So, when comparing two items, it's helpful to keep in mind the
> difference. Instead of a traditional binary search, where you are
> comparing two things directly, here we're comparing a (pack, offset)
> tuple with an index into the multi-pack index. That index describes
> another (pack, offset) tuple, and it is _those_ two tuples that are
> compared.

Well, the binary search is indeed over an array :-)

I understood that the array we're searching over is an array of indexes
into the MIDX in pack-pos order, so I understood what's written here. It
might be easier for other readers if we just say that we're treating the
elements of this array not as indexes into MIDX but as their
corresponding (is-preferred-pack, pack number, offset) tuples. But I'm
fine with retaining the existing wording too.
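
To make that concrete, here is a minimal sketch of such a search,
assuming a toy in-memory table instead of the real MIDX and .rev
readers. The struct toy_midx_entry, tuple_cmp() and
toy_midx_to_pack_pos() are hypothetical names for illustration, not the
functions in the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a MIDX entry, listed in MIDX (oid) order. */
struct toy_midx_entry {
	unsigned preferred;	/* 1 if the object is in the preferred pack */
	uint32_t pack;		/* pack number within the MIDX */
	uint64_t offset;	/* byte offset within that pack */
};

/* Orders tuples as: preferred pack first, then pack number, then offset. */
static int tuple_cmp(const struct toy_midx_entry *a,
		     const struct toy_midx_entry *b)
{
	if (a->preferred != b->preferred)
		return a->preferred > b->preferred ? -1 : 1;
	if (a->pack != b->pack)
		return a->pack < b->pack ? -1 : 1;
	if (a->offset != b->offset)
		return a->offset < b->offset ? -1 : 1;
	return 0;
}

/*
 * rev[i] is the MIDX position of the object at pseudo-pack position i.
 * Finds the pseudo-pack position of the object at MIDX position "at".
 * Returns 0 and sets *pos on success, or -1 if it is not found.
 */
static int toy_midx_to_pack_pos(const struct toy_midx_entry *entries,
				const uint32_t *rev, uint32_t nr,
				uint32_t at, uint32_t *pos)
{
	uint32_t lo = 0, hi = nr;

	assert(at < nr);
	while (lo < hi) {
		uint32_t mid = lo + (hi - lo) / 2;
		/* Compare tuples, not the MIDX positions stored in rev[]. */
		int cmp = tuple_cmp(&entries[at], &entries[rev[mid]]);

		if (!cmp) {
			*pos = mid;
			return 0;
		}
		if (cmp < 0)
			hi = mid;
		else
			lo = mid + 1;
	}
	return -1;
}
```

The search is valid because rev[] is sorted by exactly the tuple order
that tuple_cmp() defines, even though the stored values themselves (the
MIDX positions) appear in no particular numeric order.
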

The patch itself looks good.


* Re: [PATCH v2 15/15] pack-revindex: write multi-pack reverse indexes
  2021-02-24 19:10   ` [PATCH v2 15/15] pack-revindex: write multi-pack reverse indexes Taylor Blau
@ 2021-03-02 18:40     ` Jonathan Tan
  2021-03-03 15:30       ` Taylor Blau
  0 siblings, 1 reply; 171+ messages in thread
From: Jonathan Tan @ 2021-03-02 18:40 UTC (permalink / raw)
  To: me; +Cc: git, peff, dstolee, avarab, gitster, Jonathan Tan

> @@ -1018,6 +1080,14 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
>  
>  	finalize_hashfile(f, midx_hash, CSUM_FSYNC | CSUM_HASH_IN_STREAM);
>  	free_chunkfile(cf);
> +
> +	if (flags & MIDX_WRITE_REV_INDEX)
> +		ctx.pack_order = midx_pack_order(&ctx);
> +
> +	if (flags & MIDX_WRITE_REV_INDEX)
> +		write_midx_reverse_index(midx_name, midx_hash, &ctx);
> +	clear_midx_files_ext(the_repository, ".rev", midx_hash);
> +
>  	commit_lock_file(&lk);
>  
>  cleanup:

Any reason why we're using 2 separate "if" statements?

Other than that, this patch and patch 14 look good. Besides all my minor
comments, I think the overall patch set is in good shape and ready to be
merged. It's great that we could reuse some of the individual-pack reverse
index concepts and code too.


* Re: [PATCH v2 04/15] builtin/multi-pack-index.c: split sub-commands
  2021-03-02  4:06     ` Jonathan Tan
@ 2021-03-02 19:02       ` Taylor Blau
  2021-03-04  1:54         ` Jonathan Tan
  0 siblings, 1 reply; 171+ messages in thread
From: Taylor Blau @ 2021-03-02 19:02 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git, peff, dstolee, avarab, gitster

On Mon, Mar 01, 2021 at 08:06:25PM -0800, Jonathan Tan wrote:
> > +static char const * const builtin_multi_pack_index_write_usage[] = {
> >  #define BUILTIN_MIDX_WRITE_USAGE \
> >  	N_("git multi-pack-index [<options>] write")
> > +	BUILTIN_MIDX_WRITE_USAGE,
> > +	NULL
> > +};
>
> I think this way of writing is vulnerable to confusing errors if a
> missing or extra backslash happens, so I would prefer the #define to be
> outside the variable declaration.

Yeah, I can't say that I disagree with you. Of course, having the
#define's outside of the declaration makes the whole thing a little more
verbose, which isn't a huge deal.

But I was mirroring what Ævar was doing in the sub-thread he started at:

    https://public-inbox.org/git/20210215184118.11306-1-avarab@gmail.com/

Unless you feel strongly, I think that what we have isn't so bad here.

> > +static int cmd_multi_pack_index_repack(int argc, const char **argv)
> > +{
> > +	struct option *options;
> > +	static struct option builtin_multi_pack_index_repack_options[] = {
> >  		OPT_MAGNITUDE(0, "batch-size", &opts.batch_size,
> >  		  N_("during repack, collect pack-files of smaller size into a batch that is larger than this size")),
> >  		OPT_END(),
> >  	};
> >
> > +	options = parse_options_dup(builtin_multi_pack_index_repack_options);
> > +	options = add_common_options(options);
>
> I looked for where this was freed, but I guess freeing this struct is
> not really something we're worried about (which makes sense).

Right.

Thanks,
Taylor


* Re: [PATCH v2 08/15] midx: allow marking a pack as preferred
  2021-03-02  4:17     ` Jonathan Tan
@ 2021-03-02 19:09       ` Taylor Blau
  2021-03-04  2:00         ` Jonathan Tan
  0 siblings, 1 reply; 171+ messages in thread
From: Taylor Blau @ 2021-03-02 19:09 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git, peff, dstolee, avarab, gitster

On Mon, Mar 01, 2021 at 08:17:53PM -0800, Jonathan Tan wrote:
> I was initially confused that "preferred" was set twice, but this makes
> sense - the first one is when an existing midx is reused, and the second
> one is for objects in packs that the midx (if it exists) does not cover.

Yep. Those two paths permeate a lot of the MIDX writer code, since it
wants to reuse work from an existing MIDX if it can find one.

> > @@ -828,7 +869,19 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
> >  	if (ctx.m && ctx.nr == ctx.m->num_packs && !packs_to_drop)
> >  		goto cleanup;
> >
> > -	ctx.entries = get_sorted_entries(ctx.m, ctx.info, ctx.nr, &ctx.entries_nr);
> > +	if (preferred_pack_name) {
> > +		for (i = 0; i < ctx.nr; i++) {
> > +			if (!cmp_idx_or_pack_name(preferred_pack_name,
> > +						  ctx.info[i].pack_name)) {
> > +				ctx.preferred_pack_idx = i;
> > +				break;
> > +			}
> > +		}
> > +	} else
> > +		ctx.preferred_pack_idx = -1;
>
> Looks safer to put "ctx.preferred_pack_idx = -1" before the "if", just
> in case the given pack name does not exist?

Agreed.

> > @@ -889,6 +942,31 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
> >  			pack_name_concat_len += strlen(ctx.info[i].pack_name) + 1;
> >  	}
> >
> > +	/*
> > +	 * Recompute the preferred_pack_idx (if applicable) according to the
> > +	 * permuted pack order.
> > +	 */
> > +	ctx.preferred_pack_idx = -1;
> > +	if (preferred_pack_name) {
> > +		ctx.preferred_pack_idx = lookup_idx_or_pack_name(ctx.info,
> > +							     ctx.nr,
> > +							     preferred_pack_name);
> > +		if (ctx.preferred_pack_idx < 0)
> > +			warning(_("unknown preferred pack: '%s'"),
> > +				preferred_pack_name);
> > +		else {
> > +			uint32_t orig = ctx.info[ctx.preferred_pack_idx].orig_pack_int_id;
> > +			uint32_t perm = ctx.pack_perm[orig];
> > +
> > +			if (perm == PACK_EXPIRED) {
> > +				warning(_("preferred pack '%s' is expired"),
> > +					preferred_pack_name);
> > +				ctx.preferred_pack_idx = -1;
> > +			} else
> > +				ctx.preferred_pack_idx = perm;
> > +		}
> > +	}
>
> I couldn't figure out why the preferred pack index needs to be
> recalculated here, since the pack entries would have already been
> sorted. Also, the tests still pass when I comment this part out. A
> comment describing what's going on would be helpful.

Funny you mention that; I was wondering the same thing myself the other
day when reading these patches again before deploying them to a couple
of testing repositories at GitHub.

It is totally unnecessary: since we have already marked objects from the
preferred pack in get_sorted_entries(), the rest of the code doesn't
care if the preferred pack was permuted or not.

But we *do* care if the pack which was preferred expired. The 'git
repack --geometric --write-midx' caller (which will appear in a later
series) should never do that, so emitting a warning() is worthwhile. I
think ultimately you want something like this squashed in:

--- >8 ---

diff --git a/midx.c b/midx.c
index d2c56c4bc6..46f55ff6cf 100644
--- a/midx.c
+++ b/midx.c
@@ -582,7 +582,7 @@ static struct pack_midx_entry *get_sorted_entries(struct multi_pack_index *m,
 						  struct pack_info *info,
 						  uint32_t nr_packs,
 						  uint32_t *nr_objects,
-						  uint32_t preferred_pack)
+						  int preferred_pack)
 {
 	uint32_t cur_fanout, cur_pack, cur_object;
 	uint32_t alloc_fanout, alloc_objects, total_objects = 0;
@@ -869,6 +869,7 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
 	if (ctx.m && ctx.nr == ctx.m->num_packs && !packs_to_drop)
 		goto cleanup;

+	ctx.preferred_pack_idx = -1;
 	if (preferred_pack_name) {
 		for (i = 0; i < ctx.nr; i++) {
 			if (!cmp_idx_or_pack_name(preferred_pack_name,
@@ -877,8 +878,7 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
 				break;
 			}
 		}
-	} else
-		ctx.preferred_pack_idx = -1;
+	}

 	ctx.entries = get_sorted_entries(ctx.m, ctx.info, ctx.nr, &ctx.entries_nr,
 					 ctx.preferred_pack_idx);
@@ -942,28 +942,21 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
 			pack_name_concat_len += strlen(ctx.info[i].pack_name) + 1;
 	}

-	/*
-	 * Recompute the preferred_pack_idx (if applicable) according to the
-	 * permuted pack order.
-	 */
-	ctx.preferred_pack_idx = -1;
+	/* Check that the preferred pack wasn't expired (if given). */
 	if (preferred_pack_name) {
-		ctx.preferred_pack_idx = lookup_idx_or_pack_name(ctx.info,
-							     ctx.nr,
-							     preferred_pack_name);
-		if (ctx.preferred_pack_idx < 0)
+		int preferred_idx = lookup_idx_or_pack_name(ctx.info,
+							    ctx.nr,
+							    preferred_pack_name);
+		if (preferred_idx < 0)
 			warning(_("unknown preferred pack: '%s'"),
 				preferred_pack_name);
 		else {
-			uint32_t orig = ctx.info[ctx.preferred_pack_idx].orig_pack_int_id;
+			uint32_t orig = ctx.info[preferred_idx].orig_pack_int_id;
 			uint32_t perm = ctx.pack_perm[orig];

-			if (perm == PACK_EXPIRED) {
+			if (perm == PACK_EXPIRED)
 				warning(_("preferred pack '%s' is expired"),
 					preferred_pack_name);
-				ctx.preferred_pack_idx = -1;
-			} else
-				ctx.preferred_pack_idx = perm;
 		}
 	}



* Re: [PATCH v2 12/15] Documentation/technical: describe multi-pack reverse indexes
  2021-03-02  4:21     ` Jonathan Tan
  2021-03-02  4:36       ` Taylor Blau
@ 2021-03-02 19:15       ` Taylor Blau
  2021-03-04  2:03         ` Jonathan Tan
  1 sibling, 1 reply; 171+ messages in thread
From: Taylor Blau @ 2021-03-02 19:15 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git, peff, dstolee, avarab, gitster

On Mon, Mar 01, 2021 at 08:21:11PM -0800, Jonathan Tan wrote:
> > +== multi-pack-index reverse indexes
> > +
> > +Similar to the pack-based reverse index, the multi-pack index can also
> > +be used to generate a reverse index.
> > +
> > +Instead of mapping between offset, pack-, and index position, this
> > +reverse index maps between an object's position within the MIDX, and
> > +that object's position within a pseudo-pack that the MIDX describes.
> > +
> > +To clarify these three orderings
>
> The paragraph seems to only describe 2 orderings - object's position
> within the MIDX and object's position within the pseudo-pack. (Is the
> third one the offset within the MIDX - which is, I believe, trivially
> computable from the position within the MIDX?)

Sorry for the confusion. I was trying to distinguish between ordering
based on object offset, pack position, and index position.

I guess you could count that as 2, 3, or 4 different orderings (if you
classify "pack vs MIDX", "offset vs pack pos vs index pos" or the last
three plus "vs MIDX pos").

But I think that all of that is needlessly confusing, so I'd much rather
just say "To clarify the difference between these orderings".

> Also, which are stored in the .rev file?

The paragraph above describes it a little bit "this reverse index maps
between ...", but I think it could be made clearer. (I was intentionally
brief there since I wanted to not get too far into the details before
explaining the relevant concepts, but I think I went too far).

How does this sound?

--- >8 ---

diff --git a/Documentation/technical/pack-format.txt b/Documentation/technical/pack-format.txt
index 77eb591057..4bbbb188a4 100644
--- a/Documentation/technical/pack-format.txt
+++ b/Documentation/technical/pack-format.txt
@@ -387,12 +387,15 @@ be used to generate a reverse index.

 Instead of mapping between offset, pack-, and index position, this
 reverse index maps between an object's position within the MIDX, and
-that object's position within a pseudo-pack that the MIDX describes.
+that object's position within a pseudo-pack that the MIDX describes
+(i.e., the ith entry of the multi-pack reverse index holds the MIDX
+position of the ith object in pseudo-pack order).

-To clarify these three orderings, consider a multi-pack reachability
-bitmap (which does not yet exist, but is what we are building towards
-here). Each bit needs to correspond to an object in the MIDX, and so we
-need an efficient mapping from bit position to MIDX position.
+To clarify the difference between these orderings, consider a multi-pack
+reachability bitmap (which does not yet exist, but is what we are
+building towards here). Each bit needs to correspond to an object in the
+MIDX, and so we need an efficient mapping from bit position to MIDX
+position.

 One solution is to let bits occupy the same position in the oid-sorted
 index stored by the MIDX. But because oids are effectively random, there


* Re: [PATCH v2 13/15] pack-revindex: read multi-pack reverse indexes
  2021-03-02 18:36     ` Jonathan Tan
@ 2021-03-03 15:27       ` Taylor Blau
  0 siblings, 0 replies; 171+ messages in thread
From: Taylor Blau @ 2021-03-03 15:27 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git, peff, dstolee, avarab, gitster

On Tue, Mar 02, 2021 at 10:36:20AM -0800, Jonathan Tan wrote:
> > midx_to_pack_pos() is the trickiest, since it needs to find an object's
> > position in the pseudo-pack order, but that order can only be recovered
> > in the .rev file itself. This mapping can be implemented with a binary
> > search, but note that the thing we're binary searching over isn't an
> > array, but rather a _permutation_.
> >
> > So, when comparing two items, it's helpful to keep in mind the
> > difference. Instead of a traditional binary search, where you are
> > comparing two things directly, here we're comparing a (pack, offset)
> > tuple with an index into the multi-pack index. That index describes
> > another (pack, offset) tuple, and it is _those_ two tuples that are
> > compared.
>
> Well, the binary search is indeed over an array :-)

:-). This might be clearer as:

  ...isn't an array of values, but rather a permuted order of those values.
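
For readers following along, the idea can be sketched roughly as below.
This is a toy version under stated assumptions, not the function from
the patch: toy_entry, toy_cmp() and toy_midx_to_pack_pos() are
hypothetical names, entries[] is indexed by MIDX position, and rev[]
holds MIDX positions sorted by (pack, offset).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for one object in the MIDX. */
struct toy_entry {
	uint32_t pack;
	uint64_t offset;
};

/* Order two (pack, offset) tuples: by pack first, then by offset. */
static int toy_cmp(const struct toy_entry *a, const struct toy_entry *b)
{
	if (a->pack != b->pack)
		return a->pack < b->pack ? -1 : 1;
	if (a->offset != b->offset)
		return a->offset < b->offset ? -1 : 1;
	return 0;
}

/*
 * Find the position of MIDX entry 'midx_pos' in the permuted order.
 * Note that rev[mid] is itself a MIDX position: the comparison looks up
 * the tuple it describes rather than comparing against rev[mid] directly.
 * Returns 0 and sets *pos on success, or -1 if the entry is not found.
 */
static int toy_midx_to_pack_pos(const struct toy_entry *entries,
				const uint32_t *rev, uint32_t nr,
				uint32_t midx_pos, uint32_t *pos)
{
	const struct toy_entry *key;
	uint32_t lo = 0, hi = nr;

	assert(midx_pos < nr);
	key = &entries[midx_pos];

	while (lo < hi) {
		uint32_t mid = lo + (hi - lo) / 2;
		int cmp = toy_cmp(key, &entries[rev[mid]]);

		if (!cmp) {
			*pos = mid;
			return 0;
		}
		if (cmp < 0)
			hi = mid;
		else
			lo = mid + 1;
	}
	return -1;
}
```

The search works because entries[rev[0]], entries[rev[1]], ... is sorted
even though rev[] itself is not.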

> The patch itself looks good.

Thanks.

Taylor


* Re: [PATCH v2 15/15] pack-revindex: write multi-pack reverse indexes
  2021-03-02 18:40     ` Jonathan Tan
@ 2021-03-03 15:30       ` Taylor Blau
  2021-03-04  2:04         ` Jonathan Tan
  0 siblings, 1 reply; 171+ messages in thread
From: Taylor Blau @ 2021-03-03 15:30 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git, peff, dstolee, avarab, gitster

On Tue, Mar 02, 2021 at 10:40:33AM -0800, Jonathan Tan wrote:
> > @@ -1018,6 +1080,14 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
> >
> >  	finalize_hashfile(f, midx_hash, CSUM_FSYNC | CSUM_HASH_IN_STREAM);
> >  	free_chunkfile(cf);
> > +
> > +	if (flags & MIDX_WRITE_REV_INDEX)
> > +		ctx.pack_order = midx_pack_order(&ctx);
> > +
> > +	if (flags & MIDX_WRITE_REV_INDEX)
> > +		write_midx_reverse_index(midx_name, midx_hash, &ctx);
> > +	clear_midx_files_ext(the_repository, ".rev", midx_hash);
> > +
> >  	commit_lock_file(&lk);
> >
> >  cleanup:
>
> Any reason why we're using 2 separate "if" statements?

Yeah. This first if statement will turn into:

  if (flags & (MIDX_WRITE_REV_INDEX | MIDX_WRITE_BITMAP))

so that the pack order is computed in either case (since both the
existing write_midx_reverse_index() and the eventual write_midx_bitmap()
will be able to use the pack order).
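
A stripped-down sketch of that eventual shape, with stand-in names: the
TOY_* bit values and toy_finish() are hypothetical and exist only to
show how the two conditions diverge once a second flag is involved.

```c
#include <assert.h>

/* Hypothetical bit values, chosen for illustration only. */
#define TOY_WRITE_REV_INDEX (1u << 1)
#define TOY_WRITE_BITMAP    (1u << 2)

struct toy_result {
	int computed_order; /* did we compute the pack order? */
	int wrote_rev;      /* did we write the reverse index? */
};

static struct toy_result toy_finish(unsigned flags)
{
	struct toy_result r = { 0, 0 };

	/* The two flags must be distinct bits for the tests below to work. */
	assert((TOY_WRITE_REV_INDEX & TOY_WRITE_BITMAP) == 0);

	/* The pack order is needed if either consumer is requested... */
	if (flags & (TOY_WRITE_REV_INDEX | TOY_WRITE_BITMAP))
		r.computed_order = 1;

	/* ...but the reverse index is written only when asked for. */
	if (flags & TOY_WRITE_REV_INDEX)
		r.wrote_rev = 1;

	return r;
}
```

With only the bitmap bit set, this sketch computes the order but skips
the reverse index write, which is the case the separate conditions are
there to distinguish.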

Arguably there is never a practical reason to write one without the
other (and writing a MIDX bitmap without a reverse index is a bug), so
perhaps these options should be consolidated.

But that's cleanup that I'd rather do after all of this has settled
(since it'd be weird to say: "here's the option to write bitmaps, except
we can't write multi-pack bitmaps yet, but setting it actually writes
this other thing").

> Other than that, this patch and patch 14 look good. Besides all my minor
> comments, I think the overall patch set is in good shape and ready to be
> merged. It's great that we could reuse some of the individual-pack reverse
> index concepts and code too.

Thanks, I am really glad that you had a chance to take a look at it. I
always find your review quite helpful.

Thanks,
Taylor


* Re: [PATCH v2 04/15] builtin/multi-pack-index.c: split sub-commands
  2021-03-02 19:02       ` Taylor Blau
@ 2021-03-04  1:54         ` Jonathan Tan
  2021-03-04  3:02           ` Taylor Blau
  0 siblings, 1 reply; 171+ messages in thread
From: Jonathan Tan @ 2021-03-04  1:54 UTC (permalink / raw)
  To: me; +Cc: jonathantanmy, git, peff, dstolee, avarab, gitster

> On Mon, Mar 01, 2021 at 08:06:25PM -0800, Jonathan Tan wrote:
> > > +static char const * const builtin_multi_pack_index_write_usage[] = {
> > >  #define BUILTIN_MIDX_WRITE_USAGE \
> > >  	N_("git multi-pack-index [<options>] write")
> > > +	BUILTIN_MIDX_WRITE_USAGE,
> > > +	NULL
> > > +};
> >
> > I think this way of writing is vulnerable to confusing errors if a
> > missing or extra backslash happens, so I would prefer the #define to be
> > outside the variable declaration.
> 
> Yeah, I can't say that I disagree with you. Of course, having the
> #define's outside of the declaration makes the whole thing a little more
> verbose, which isn't a huge deal.

I think it's the same verbosity - you just need to move the lines?

> But I was mirroring what Ævar was doing in the sub-thread he started at:
> 
>     https://public-inbox.org/git/20210215184118.11306-1-avarab@gmail.com/
> 
> Unless you feel strongly, I think that what we have isn't so bad here.

Yeah I don't feel that strongly about it.


* Re: [PATCH v2 08/15] midx: allow marking a pack as preferred
  2021-03-02 19:09       ` Taylor Blau
@ 2021-03-04  2:00         ` Jonathan Tan
  2021-03-04  3:04           ` Taylor Blau
  0 siblings, 1 reply; 171+ messages in thread
From: Jonathan Tan @ 2021-03-04  2:00 UTC (permalink / raw)
  To: me; +Cc: jonathantanmy, git, peff, dstolee, avarab, gitster

> Funny you mention that; I was wondering the same thing myself the other
> day when reading these patches again before deploying them to a couple
> of testing repositories at GitHub.
> 
> It is totally unnecessary: since we have already marked objects from the
> preferred pack in get_sorted_entries(), the rest of the code doesn't
> care if the preferred pack was permuted or not.
> 
> But we *do* care if the pack which was preferred expired. The 'git
> repack --geometric --write-midx' caller (which will appear in a later
> series) should never do that, so emitting a warning() is worthwhile.

Ah, this makes sense.

> I
> think ultimately you want something like this squashed in:
> 
> --- >8 ---
> 
> diff --git a/midx.c b/midx.c
> index d2c56c4bc6..46f55ff6cf 100644
> --- a/midx.c
> +++ b/midx.c
> @@ -582,7 +582,7 @@ static struct pack_midx_entry *get_sorted_entries(struct multi_pack_index *m,
>  						  struct pack_info *info,
>  						  uint32_t nr_packs,
>  						  uint32_t *nr_objects,
> -						  uint32_t preferred_pack)
> +						  int preferred_pack)

Why this change?

>  {
>  	uint32_t cur_fanout, cur_pack, cur_object;
>  	uint32_t alloc_fanout, alloc_objects, total_objects = 0;
> @@ -869,6 +869,7 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
>  	if (ctx.m && ctx.nr == ctx.m->num_packs && !packs_to_drop)
>  		goto cleanup;
> 
> +	ctx.preferred_pack_idx = -1;
>  	if (preferred_pack_name) {
>  		for (i = 0; i < ctx.nr; i++) {
>  			if (!cmp_idx_or_pack_name(preferred_pack_name,
> @@ -877,8 +878,7 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
>  				break;
>  			}
>  		}
> -	} else
> -		ctx.preferred_pack_idx = -1;
> +	}
> 
>  	ctx.entries = get_sorted_entries(ctx.m, ctx.info, ctx.nr, &ctx.entries_nr,
>  					 ctx.preferred_pack_idx);
> @@ -942,28 +942,21 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
>  			pack_name_concat_len += strlen(ctx.info[i].pack_name) + 1;
>  	}
> 
> -	/*
> -	 * Recompute the preferred_pack_idx (if applicable) according to the
> -	 * permuted pack order.
> -	 */
> -	ctx.preferred_pack_idx = -1;
> +	/* Check that the preferred pack wasn't expired (if given). */
>  	if (preferred_pack_name) {
> -		ctx.preferred_pack_idx = lookup_idx_or_pack_name(ctx.info,
> -							     ctx.nr,
> -							     preferred_pack_name);
> -		if (ctx.preferred_pack_idx < 0)
> +		int preferred_idx = lookup_idx_or_pack_name(ctx.info,
> +							    ctx.nr,
> +							    preferred_pack_name);
> +		if (preferred_idx < 0)
>  			warning(_("unknown preferred pack: '%s'"),
>  				preferred_pack_name);
>  		else {
> -			uint32_t orig = ctx.info[ctx.preferred_pack_idx].orig_pack_int_id;
> +			uint32_t orig = ctx.info[preferred_idx].orig_pack_int_id;
>  			uint32_t perm = ctx.pack_perm[orig];
> 
> -			if (perm == PACK_EXPIRED) {
> +			if (perm == PACK_EXPIRED)
>  				warning(_("preferred pack '%s' is expired"),
>  					preferred_pack_name);
> -				ctx.preferred_pack_idx = -1;
> -			} else
> -				ctx.preferred_pack_idx = perm;
>  		}
>  	}

The rest makes sense.
> 


* Re: [PATCH v2 12/15] Documentation/technical: describe multi-pack reverse indexes
  2021-03-02 19:15       ` Taylor Blau
@ 2021-03-04  2:03         ` Jonathan Tan
  0 siblings, 0 replies; 171+ messages in thread
From: Jonathan Tan @ 2021-03-04  2:03 UTC (permalink / raw)
  To: me; +Cc: jonathantanmy, git, peff, dstolee, avarab, gitster

> diff --git a/Documentation/technical/pack-format.txt b/Documentation/technical/pack-format.txt
> index 77eb591057..4bbbb188a4 100644
> --- a/Documentation/technical/pack-format.txt
> +++ b/Documentation/technical/pack-format.txt
> @@ -387,12 +387,15 @@ be used to generate a reverse index.
> 
>  Instead of mapping between offset, pack-, and index position, this
>  reverse index maps between an object's position within the MIDX, and
> -that object's position within a pseudo-pack that the MIDX describes.
> +that object's position within a pseudo-pack that the MIDX describes
> +(i.e., the ith entry of the multi-pack reverse index holds the MIDX
> +position of ith object in pseudo-pack order).
> 
> -To clarify these three orderings, consider a multi-pack reachability
> -bitmap (which does not yet exist, but is what we are building towards
> -here). Each bit needs to correspond to an object in the MIDX, and so we
> -need an efficient mapping from bit position to MIDX position.
> +To clarify the difference between these orderings, consider a multi-pack
> +reachability bitmap (which does not yet exist, but is what we are
> +building towards here). Each bit needs to correspond to an object in the
> +MIDX, and so we need an efficient mapping from bit position to MIDX
> +position.
> 
>  One solution is to let bits occupy the same position in the oid-sorted
>  index stored by the MIDX. But because oids are effectively random, there

Thanks - this diff makes sense.


* Re: [PATCH v2 15/15] pack-revindex: write multi-pack reverse indexes
  2021-03-03 15:30       ` Taylor Blau
@ 2021-03-04  2:04         ` Jonathan Tan
  2021-03-04  3:06           ` Taylor Blau
  0 siblings, 1 reply; 171+ messages in thread
From: Jonathan Tan @ 2021-03-04  2:04 UTC (permalink / raw)
  To: me; +Cc: jonathantanmy, git, peff, dstolee, avarab, gitster

> On Tue, Mar 02, 2021 at 10:40:33AM -0800, Jonathan Tan wrote:
> > > @@ -1018,6 +1080,14 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
> > >
> > >  	finalize_hashfile(f, midx_hash, CSUM_FSYNC | CSUM_HASH_IN_STREAM);
> > >  	free_chunkfile(cf);
> > > +
> > > +	if (flags & MIDX_WRITE_REV_INDEX)
> > > +		ctx.pack_order = midx_pack_order(&ctx);
> > > +
> > > +	if (flags & MIDX_WRITE_REV_INDEX)
> > > +		write_midx_reverse_index(midx_name, midx_hash, &ctx);
> > > +	clear_midx_files_ext(the_repository, ".rev", midx_hash);
> > > +
> > >  	commit_lock_file(&lk);
> > >
> > >  cleanup:
> >
> > Any reason why we're using 2 separate "if" statements?
> 
> Yeah. This first if statement will turn into:
> 
>   if (flags & (MIDX_WRITE_REV_INDEX | MIDX_WRITE_BITMAP))
> 
> so that the pack order is computed in either case (since both the
> existing write_midx_reverse_index() and the eventual write_midx_bitmap()
> will be able to use the pack order).

Ah, OK. That's what I was thinking of, but nice to have confirmation.
Maybe write in the commit message that these are separated because in
the future, one of the conditions will change.


* Re: [PATCH v2 04/15] builtin/multi-pack-index.c: split sub-commands
  2021-03-04  1:54         ` Jonathan Tan
@ 2021-03-04  3:02           ` Taylor Blau
  0 siblings, 0 replies; 171+ messages in thread
From: Taylor Blau @ 2021-03-04  3:02 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git, peff, dstolee, avarab, gitster

On Wed, Mar 03, 2021 at 05:54:44PM -0800, Jonathan Tan wrote:
> > On Mon, Mar 01, 2021 at 08:06:25PM -0800, Jonathan Tan wrote:
> > > > +static char const * const builtin_multi_pack_index_write_usage[] = {
> > > >  #define BUILTIN_MIDX_WRITE_USAGE \
> > > >  	N_("git multi-pack-index [<options>] write")
> > > > +	BUILTIN_MIDX_WRITE_USAGE,
> > > > +	NULL
> > > > +};
> > >
> > > I think this way of writing is vulnerable to confusing errors if a
> > > missing or extra backslash happens, so I would prefer the #define to be
> > > outside the variable declaration.
> >
> > Yeah, I can't say that I disagree with you. Of course, having the
> > #define's outside of the declaration makes the whole thing a little more
> > verbose, which isn't a huge deal.
>
> I think it's the same verbosity - you just need to move the lines?

Yeah, you're right. I'm being too subjective, and I don't really feel
strongly, either.

>
> > But I was mirroring what Ævar was doing in the sub-thread he started at:
> >
> >     https://public-inbox.org/git/20210215184118.11306-1-avarab@gmail.com/
> >
> > Unless you feel strongly, I think that what we have isn't so bad here.
>
> Yeah I don't feel that strongly about it.

I'll take your suggestion, thanks.

Thanks,
Taylor


* Re: [PATCH v2 08/15] midx: allow marking a pack as preferred
  2021-03-04  2:00         ` Jonathan Tan
@ 2021-03-04  3:04           ` Taylor Blau
  0 siblings, 0 replies; 171+ messages in thread
From: Taylor Blau @ 2021-03-04  3:04 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git, peff, dstolee, avarab, gitster

On Wed, Mar 03, 2021 at 06:00:17PM -0800, Jonathan Tan wrote:
> > I
> > think ultimately you want something like this squashed in:
> >
> > --- >8 ---
> >
> > diff --git a/midx.c b/midx.c
> > index d2c56c4bc6..46f55ff6cf 100644
> > --- a/midx.c
> > +++ b/midx.c
> > @@ -582,7 +582,7 @@ static struct pack_midx_entry *get_sorted_entries(struct multi_pack_index *m,
> >  						  struct pack_info *info,
> >  						  uint32_t nr_packs,
> >  						  uint32_t *nr_objects,
> > -						  uint32_t preferred_pack)
> > +						  int preferred_pack)
>
> Why this change?

This was wrong in the original patch: ctx.preferred_pack_idx is a signed
integer, and is set to -1 when no preferred pack was specified.

It's certainly unlikely that we'd have 2^31 packs, but silently
converting a signed type to an unsigned one is misleading.
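
To illustrate the hazard with a toy example (the two helpers below are
hypothetical and not part of midx.c): once -1 has been converted to
uint32_t it is indistinguishable from a real, if implausibly large,
pack index.

```c
#include <assert.h>
#include <stdint.h>

/* Old-style signature: the "no preferred pack" sentinel is lost. */
static int toy_is_preferred_u32(uint32_t preferred_pack, uint32_t cur_pack)
{
	return preferred_pack == cur_pack;
}

/* Signed parameter: -1 stays recognizable as "none given". */
static int toy_is_preferred_int(int preferred_pack, uint32_t cur_pack)
{
	/* In this sketch, -1 is the only negative value callers pass. */
	assert(preferred_pack >= -1);

	if (preferred_pack < 0)
		return 0;
	return (uint32_t)preferred_pack == cur_pack;
}
```

Passing -1 to the unsigned variant is well-defined C (the value wraps to
UINT32_MAX), which is exactly why the compiler stays quiet about it by
default.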

> The rest makes sense.

Thanks for taking a look.

Taylor


* Re: [PATCH v2 15/15] pack-revindex: write multi-pack reverse indexes
  2021-03-04  2:04         ` Jonathan Tan
@ 2021-03-04  3:06           ` Taylor Blau
  0 siblings, 0 replies; 171+ messages in thread
From: Taylor Blau @ 2021-03-04  3:06 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git, peff, dstolee, avarab, gitster

On Wed, Mar 03, 2021 at 06:04:44PM -0800, Jonathan Tan wrote:
> > > Any reason why we're using 2 separate "if" statements?
> >
> > Yeah. This first if statement will turn into:
> >
> >   if (flags & (MIDX_WRITE_REV_INDEX | MIDX_WRITE_BITMAP))
> >
> > so that the pack order is computed in either case (since both the
> > existing write_midx_reverse_index() and the eventual write_midx_bitmap()
> > will be able to use the pack order).
>
> Ah, OK. That's what I was thinking of, but nice to have confirmation.
> Maybe write in the commit message that these are separated because in
> the future, one of the conditions will change.

Thanks; that's a great idea.

Thanks,
Taylor


* [PATCH v3 00/16] midx: implement a multi-pack reverse index
  2021-02-10 23:02 [PATCH 0/9] midx: implement a multi-pack reverse index Taylor Blau
                   ` (11 preceding siblings ...)
  2021-02-24 19:09 ` [PATCH v2 00/15] " Taylor Blau
@ 2021-03-11 17:04 ` Taylor Blau
  2021-03-11 17:04   ` [PATCH v3 01/16] builtin/multi-pack-index.c: inline 'flags' with options Taylor Blau
                     ` (17 more replies)
  2021-03-30 15:03 ` [PATCH v4 " Taylor Blau
  13 siblings, 18 replies; 171+ messages in thread
From: Taylor Blau @ 2021-03-11 17:04 UTC (permalink / raw)
  To: git; +Cc: avarab, dstolee, gitster, jonathantanmy, peff

Here is another reroll of my series to implement a reverse index in
preparation for multi-pack reachability bitmaps. The previous version
was based on 'ds/chunked-file-api', but that topic has since been merged
to 'master'. This series is now built directly on top of 'master'.

Not much has changed since last time. Jonathan Tan reviewed the previous
version, and I incorporated feedback from his review:

  - The usage macros in builtin/multi-pack-index.c were pulled out and
    defined separately.
  - Some sloppiness with converting a signed index referring to the
    preferred pack into an unsigned value was cleaned up.
  - Documentation clean-up, particularly in patches 12 and 13.

There are a couple of new things that we found while testing this out at
GitHub.

  - We now call finalize_object_file() on the multi-pack reverse index
    to set the correct permissions.
  - Patch 14 removed a stray hunk that introduced a memory leak.
  - Patch 16 (courtesy of Peff) is new. It improves the cache locality
    of midx_pack_order_cmp(), which has a substantial impact on
    repositories with many objects.
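
As a rough illustration of the cache-locality point in the last item (a
sketch under assumptions, not the code from patch 16): a comparator that
follows an index into a large table of entries tends to touch a
different cache line on most comparisons, whereas copying the sort keys
into a compact array first lets the sort work on small contiguous
records. The names toy_entry, toy_order_key and toy_pack_order() are
made up for this example.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical "wide" entry; the oid makes each record fairly large. */
struct toy_entry {
	unsigned char oid[32];
	uint32_t pack;
	uint64_t offset;
};

/* Compact record holding only what the comparison needs. */
struct toy_order_key {
	uint32_t midx_pos;
	uint32_t pack;
	uint64_t offset;
};

static int toy_key_cmp(const void *va, const void *vb)
{
	const struct toy_order_key *a = va, *b = vb;

	if (a->pack != b->pack)
		return a->pack < b->pack ? -1 : 1;
	if (a->offset != b->offset)
		return a->offset < b->offset ? -1 : 1;
	return 0;
}

/*
 * Fill order[] with MIDX positions sorted by (pack, offset).
 * Returns 0 on success, or -1 if the temporary array cannot be allocated.
 */
static int toy_pack_order(const struct toy_entry *entries, uint32_t nr,
			  uint32_t *order)
{
	struct toy_order_key *keys;
	uint32_t i;

	if (!nr)
		return 0;
	keys = malloc((size_t)nr * sizeof(*keys));
	if (!keys)
		return -1;

	/* One sequential pass to gather the keys... */
	for (i = 0; i < nr; i++) {
		keys[i].midx_pos = i;
		keys[i].pack = entries[i].pack;
		keys[i].offset = entries[i].offset;
	}

	/* ...so the sort itself never dereferences into entries[]. */
	qsort(keys, nr, sizeof(*keys), toy_key_cmp);

	for (i = 0; i < nr; i++)
		order[i] = keys[i].midx_pos;

	free(keys);
	return 0;
}
```

The trade-off is a temporary allocation proportional to the number of
objects in exchange for fewer cache misses during the sort.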

Thanks in advance for your review.

Jeff King (1):
  midx.c: improve cache locality in midx_pack_order_cmp()

Taylor Blau (15):
  builtin/multi-pack-index.c: inline 'flags' with options
  builtin/multi-pack-index.c: don't handle 'progress' separately
  builtin/multi-pack-index.c: define common usage with a macro
  builtin/multi-pack-index.c: split sub-commands
  builtin/multi-pack-index.c: don't enter bogus cmd_mode
  builtin/multi-pack-index.c: display usage on unrecognized command
  t/helper/test-read-midx.c: add '--show-objects'
  midx: allow marking a pack as preferred
  midx: don't free midx_name early
  midx: keep track of the checksum
  midx: make some functions non-static
  Documentation/technical: describe multi-pack reverse indexes
  pack-revindex: read multi-pack reverse indexes
  pack-write.c: extract 'write_rev_file_order'
  pack-revindex: write multi-pack reverse indexes

 Documentation/git-multi-pack-index.txt       |  14 +-
 Documentation/technical/multi-pack-index.txt |   5 +-
 Documentation/technical/pack-format.txt      |  83 +++++++
 builtin/multi-pack-index.c                   | 182 ++++++++++++---
 builtin/repack.c                             |   2 +-
 midx.c                                       | 229 +++++++++++++++++--
 midx.h                                       |  11 +-
 pack-revindex.c                              | 127 ++++++++++
 pack-revindex.h                              |  53 +++++
 pack-write.c                                 |  36 ++-
 pack.h                                       |   1 +
 packfile.c                                   |   3 +
 t/helper/test-read-midx.c                    |  24 +-
 t/t5319-multi-pack-index.sh                  |  39 ++++
 14 files changed, 740 insertions(+), 69 deletions(-)

Range-diff against v2:
 1:  0527fa89a9 =  1:  43fc0ad276 builtin/multi-pack-index.c: inline 'flags' with options
 2:  a4e107b1f8 =  2:  181f11e4c5 builtin/multi-pack-index.c: don't handle 'progress' separately
 3:  8679dfd212 =  3:  94c498f0e2 builtin/multi-pack-index.c: define common usage with a macro
 4:  bc42b56ea2 !  4:  d084f90466 builtin/multi-pack-index.c: split sub-commands
    @@ Commit message
     
      ## builtin/multi-pack-index.c ##
     @@
    - #include "midx.h"
    - #include "trace2.h"
    + #define BUILTIN_MIDX_REPACK_USAGE \
    + 	N_("git multi-pack-index [<options>] repack [--batch-size=<size>]")
      
     +static char const * const builtin_multi_pack_index_write_usage[] = {
    - #define BUILTIN_MIDX_WRITE_USAGE \
    - 	N_("git multi-pack-index [<options>] write")
     +	BUILTIN_MIDX_WRITE_USAGE,
     +	NULL
     +};
    - 
     +static char const * const builtin_multi_pack_index_verify_usage[] = {
    - #define BUILTIN_MIDX_VERIFY_USAGE \
    - 	N_("git multi-pack-index [<options>] verify")
     +	BUILTIN_MIDX_VERIFY_USAGE,
     +	NULL
     +};
    - 
     +static char const * const builtin_multi_pack_index_expire_usage[] = {
    - #define BUILTIN_MIDX_EXPIRE_USAGE \
    - 	N_("git multi-pack-index [<options>] expire")
     +	BUILTIN_MIDX_EXPIRE_USAGE,
     +	NULL
     +};
    - 
     +static char const * const builtin_multi_pack_index_repack_usage[] = {
    - #define BUILTIN_MIDX_REPACK_USAGE \
    - 	N_("git multi-pack-index [<options>] repack [--batch-size=<size>]")
     +	BUILTIN_MIDX_REPACK_USAGE,
     +	NULL
     +};
    - 
      static char const * const builtin_multi_pack_index_usage[] = {
      	BUILTIN_MIDX_WRITE_USAGE,
    + 	BUILTIN_MIDX_VERIFY_USAGE,
     @@ builtin/multi-pack-index.c: static struct opts_multi_pack_index {
      	unsigned flags;
      } opts;
 5:  5daa2946d3 =  5:  bc3b6837f2 builtin/multi-pack-index.c: don't enter bogus cmd_mode
 6:  98d9ea0770 =  6:  f117e442c3 builtin/multi-pack-index.c: display usage on unrecognized command
 7:  2fd9f4debf =  7:  ae85a68ef2 t/helper/test-read-midx.c: add '--show-objects'
 8:  223b899094 !  8:  30194a6786 midx: allow marking a pack as preferred
    @@ builtin/multi-pack-index.c
      #include "trace2.h"
     +#include "object-store.h"
      
    - static char const * const builtin_multi_pack_index_write_usage[] = {
      #define BUILTIN_MIDX_WRITE_USAGE \
     -	N_("git multi-pack-index [<options>] write")
     +	N_("git multi-pack-index [<options>] write [--preferred-pack=<pack>]")
    - 	BUILTIN_MIDX_WRITE_USAGE,
    - 	NULL
    - };
    + 
    + #define BUILTIN_MIDX_VERIFY_USAGE \
    + 	N_("git multi-pack-index [<options>] verify")
     @@ builtin/multi-pack-index.c: static char const * const builtin_multi_pack_index_usage[] = {
      
      static struct opts_multi_pack_index {
    @@ midx.c: static void fill_pack_entry(uint32_t pack_int_id,
      						  uint32_t nr_packs,
     -						  uint32_t *nr_objects)
     +						  uint32_t *nr_objects,
    -+						  uint32_t preferred_pack)
    ++						  int preferred_pack)
      {
      	uint32_t cur_fanout, cur_pack, cur_object;
      	uint32_t alloc_fanout, alloc_objects, total_objects = 0;
    @@ midx.c: static int write_midx_internal(const char *object_dir, struct multi_pack
      		goto cleanup;
      
     -	ctx.entries = get_sorted_entries(ctx.m, ctx.info, ctx.nr, &ctx.entries_nr);
    ++	ctx.preferred_pack_idx = -1;
     +	if (preferred_pack_name) {
     +		for (i = 0; i < ctx.nr; i++) {
     +			if (!cmp_idx_or_pack_name(preferred_pack_name,
    @@ midx.c: static int write_midx_internal(const char *object_dir, struct multi_pack
     +				break;
     +			}
     +		}
    -+	} else
    -+		ctx.preferred_pack_idx = -1;
    ++	}
     +
     +	ctx.entries = get_sorted_entries(ctx.m, ctx.info, ctx.nr, &ctx.entries_nr,
     +					 ctx.preferred_pack_idx);
    @@ midx.c: static int write_midx_internal(const char *object_dir, struct multi_pack
      			pack_name_concat_len += strlen(ctx.info[i].pack_name) + 1;
      	}
      
    -+	/*
    -+	 * Recompute the preferred_pack_idx (if applicable) according to the
    -+	 * permuted pack order.
    -+	 */
    -+	ctx.preferred_pack_idx = -1;
    ++	/* Check that the preferred pack wasn't expired (if given). */
     +	if (preferred_pack_name) {
    -+		ctx.preferred_pack_idx = lookup_idx_or_pack_name(ctx.info,
    -+							     ctx.nr,
    -+							     preferred_pack_name);
    -+		if (ctx.preferred_pack_idx < 0)
    ++		int preferred_idx = lookup_idx_or_pack_name(ctx.info,
    ++							    ctx.nr,
    ++							    preferred_pack_name);
    ++		if (preferred_idx < 0)
     +			warning(_("unknown preferred pack: '%s'"),
     +				preferred_pack_name);
     +		else {
    -+			uint32_t orig = ctx.info[ctx.preferred_pack_idx].orig_pack_int_id;
    ++			uint32_t orig = ctx.info[preferred_idx].orig_pack_int_id;
     +			uint32_t perm = ctx.pack_perm[orig];
     +
    -+			if (perm == PACK_EXPIRED) {
    ++			if (perm == PACK_EXPIRED)
     +				warning(_("preferred pack '%s' is expired"),
     +					preferred_pack_name);
    -+				ctx.preferred_pack_idx = -1;
    -+			} else
    -+				ctx.preferred_pack_idx = perm;
     +		}
     +	}
     +
 9:  976848bc4b =  9:  5c5aca761a midx: don't free midx_name early
10:  5ed47f7e3a = 10:  a22a1463a5 midx: keep track of the checksum
11:  0292508e12 = 11:  efa54479b1 midx: make some functions non-static
12:  404d730498 ! 12:  4745bb8590 Documentation/technical: describe multi-pack reverse indexes
    @@ Documentation/technical/pack-format.txt: CHUNK DATA:
     +
     +Instead of mapping between offset, pack-, and index position, this
     +reverse index maps between an object's position within the MIDX, and
    -+that object's position within a pseudo-pack that the MIDX describes.
    ++that object's position within a pseudo-pack that the MIDX describes
    ++(i.e., the ith entry of the multi-pack reverse index holds the MIDX
    ++position of ith object in pseudo-pack order).
     +
    -+To clarify these three orderings, consider a multi-pack reachability
    -+bitmap (which does not yet exist, but is what we are building towards
    -+here). Each bit needs to correspond to an object in the MIDX, and so we
    -+need an efficient mapping from bit position to MIDX position.
    ++To clarify the difference between these orderings, consider a multi-pack
    ++reachability bitmap (which does not yet exist, but is what we are
    ++building towards here). Each bit needs to correspond to an object in the
    ++MIDX, and so we need an efficient mapping from bit position to MIDX
    ++position.
     +
     +One solution is to let bits occupy the same position in the oid-sorted
     +index stored by the MIDX. But because oids are effectively random, there
13:  d4e01a44e7 ! 13:  a6ebd4be91 pack-revindex: read multi-pack reverse indexes
    @@ Commit message
         position in the pseudo-pack order, but that order can only be recovered
         in the .rev file itself. This mapping can be implemented with a binary
         search, but note that the thing we're binary searching over isn't an
    -    array, but rather a _permutation_.
    +    array of values, but rather a permuted order of those values.
     
         So, when comparing two items, it's helpful to keep in mind the
         difference. Instead of a traditional binary search, where you are
14:  ab7012b283 ! 14:  f5314f1822 pack-write.c: extract 'write_rev_file_order'
    @@ pack-write.c: const char *write_rev_file(const char *rev_name,
     +		pack_order[i] = i;
     +	QSORT_S(pack_order, nr_objects, pack_order_cmp, objects);
     +
    -+	if (!(flags & (WRITE_REV | WRITE_REV_VERIFY)))
    -+		return NULL;
    -+
     +	ret = write_rev_file_order(rev_name, pack_order, nr_objects, hash,
     +				   flags);
     +
15:  01bd6a35c6 ! 15:  fa3acb5d5a pack-revindex: write multi-pack reverse indexes
    @@ Commit message
         for long, since subsequent patches will introduce the multi-pack bitmap,
         which will begin passing this field.
     
    +    (In midx.c:write_midx_internal(), the two adjacent if statements share a
    +    conditional, but are written separately since the first one will
    +    eventually also handle the MIDX_WRITE_BITMAP flag, which does not yet
    +    exist.)
    +
         Signed-off-by: Taylor Blau <me@ttaylorr.com>
     
      ## midx.c ##
    @@ midx.c: static int write_midx_large_offsets(struct hashfile *f,
     +				     struct write_midx_context *ctx)
     +{
     +	struct strbuf buf = STRBUF_INIT;
    ++	const char *tmp_file;
     +
     +	strbuf_addf(&buf, "%s-%s.rev", midx_name, hash_to_hex(midx_hash));
     +
    -+	write_rev_file_order(buf.buf, ctx->pack_order, ctx->entries_nr,
    -+			     midx_hash, WRITE_REV);
    ++	tmp_file = write_rev_file_order(NULL, ctx->pack_order, ctx->entries_nr,
    ++					midx_hash, WRITE_REV);
    ++
    ++	if (finalize_object_file(tmp_file, buf.buf))
    ++		die(_("cannot store reverse index file"));
     +
     +	strbuf_release(&buf);
     +}
 -:  ---------- > 16:  550e785f10 midx.c: improve cache locality in midx_pack_order_cmp()
-- 
2.30.0.667.g81c0cbc6fd


* [PATCH v3 01/16] builtin/multi-pack-index.c: inline 'flags' with options
  2021-03-11 17:04 ` [PATCH v3 00/16] midx: implement a multi-pack reverse index Taylor Blau
@ 2021-03-11 17:04   ` Taylor Blau
  2021-03-29 11:20     ` Jeff King
  2021-03-11 17:04   ` [PATCH v3 02/16] builtin/multi-pack-index.c: don't handle 'progress' separately Taylor Blau
                     ` (16 subsequent siblings)
  17 siblings, 1 reply; 171+ messages in thread
From: Taylor Blau @ 2021-03-11 17:04 UTC (permalink / raw)
  To: git; +Cc: avarab, dstolee, gitster, jonathantanmy, peff

Subcommands of the 'git multi-pack-index' command (e.g., 'write',
'verify', etc.) will want to optionally change a set of shared flags
that are eventually passed to the MIDX libraries.

Right now, options and flags are handled separately. Inline them into
the same structure so that sub-commands can more easily share the
'flags' data.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 builtin/multi-pack-index.c | 13 ++++++-------
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/builtin/multi-pack-index.c b/builtin/multi-pack-index.c
index 5bf88cd2a8..4a0ddb06c4 100644
--- a/builtin/multi-pack-index.c
+++ b/builtin/multi-pack-index.c
@@ -14,13 +14,12 @@ static struct opts_multi_pack_index {
 	const char *object_dir;
 	unsigned long batch_size;
 	int progress;
+	unsigned flags;
 } opts;
 
 int cmd_multi_pack_index(int argc, const char **argv,
 			 const char *prefix)
 {
-	unsigned flags = 0;
-
 	static struct option builtin_multi_pack_index_options[] = {
 		OPT_FILENAME(0, "object-dir", &opts.object_dir,
 		  N_("object directory containing set of packfile and pack-index pairs")),
@@ -40,7 +39,7 @@ int cmd_multi_pack_index(int argc, const char **argv,
 	if (!opts.object_dir)
 		opts.object_dir = get_object_directory();
 	if (opts.progress)
-		flags |= MIDX_PROGRESS;
+		opts.flags |= MIDX_PROGRESS;
 
 	if (argc == 0)
 		usage_with_options(builtin_multi_pack_index_usage,
@@ -55,16 +54,16 @@ int cmd_multi_pack_index(int argc, const char **argv,
 
 	if (!strcmp(argv[0], "repack"))
 		return midx_repack(the_repository, opts.object_dir,
-			(size_t)opts.batch_size, flags);
+			(size_t)opts.batch_size, opts.flags);
 	if (opts.batch_size)
 		die(_("--batch-size option is only for 'repack' subcommand"));
 
 	if (!strcmp(argv[0], "write"))
-		return write_midx_file(opts.object_dir, flags);
+		return write_midx_file(opts.object_dir, opts.flags);
 	if (!strcmp(argv[0], "verify"))
-		return verify_midx_file(the_repository, opts.object_dir, flags);
+		return verify_midx_file(the_repository, opts.object_dir, opts.flags);
 	if (!strcmp(argv[0], "expire"))
-		return expire_midx_packs(the_repository, opts.object_dir, flags);
+		return expire_midx_packs(the_repository, opts.object_dir, opts.flags);
 
 	die(_("unrecognized subcommand: %s"), argv[0]);
 }
-- 
2.30.0.667.g81c0cbc6fd



* [PATCH v3 02/16] builtin/multi-pack-index.c: don't handle 'progress' separately
  2021-03-11 17:04 ` [PATCH v3 00/16] midx: implement a multi-pack reverse index Taylor Blau
  2021-03-11 17:04   ` [PATCH v3 01/16] builtin/multi-pack-index.c: inline 'flags' with options Taylor Blau
@ 2021-03-11 17:04   ` Taylor Blau
  2021-03-29 11:22     ` Jeff King
  2021-03-11 17:04   ` [PATCH v3 03/16] builtin/multi-pack-index.c: define common usage with a macro Taylor Blau
                     ` (15 subsequent siblings)
  17 siblings, 1 reply; 171+ messages in thread
From: Taylor Blau @ 2021-03-11 17:04 UTC (permalink / raw)
  To: git; +Cc: avarab, dstolee, gitster, jonathantanmy, peff

Now that there is a shared 'flags' member in the options structure,
there is no need to keep track of whether to force progress or not,
since ultimately the decision of whether or not to show a progress meter
is controlled by a bit in the flags member.

Manipulate that bit directly, and drop the now-unnecessary 'progress'
field while we're at it.
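
As a rough illustration of the idea (not the code in this patch), the
sketch below seeds a flags word from whether stderr is a terminal and
then lets an explicit option set or clear the same bit. SKETCH_PROGRESS,
struct sketch_opts and resolve_flags() are hypothetical names; the real
bit is MIDX_PROGRESS and the real parsing is done by OPT_BIT.

```c
/* Hypothetical stand-in for MIDX_PROGRESS; the real value lives in midx.h. */
#define SKETCH_PROGRESS (1u << 0)

struct sketch_opts {
	unsigned flags;
};

/*
 * Seed the default from whether stderr is a terminal, then let an
 * explicit option override it by touching the same bit.
 * 'explicit_progress' is 1 for "--progress", 0 for "--no-progress",
 * and -1 when neither was given (an assumption of this sketch).
 */
static unsigned resolve_flags(int stderr_is_tty, int explicit_progress)
{
	struct sketch_opts opts = { 0 };

	if (stderr_is_tty)
		opts.flags |= SKETCH_PROGRESS;

	if (explicit_progress == 1)
		opts.flags |= SKETCH_PROGRESS;
	else if (explicit_progress == 0)
		opts.flags &= ~SKETCH_PROGRESS;

	return opts.flags;
}
```

With a single bit there is no second variable to keep in sync: the
default and the command-line override both land in 'flags'.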

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 builtin/multi-pack-index.c | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/builtin/multi-pack-index.c b/builtin/multi-pack-index.c
index 4a0ddb06c4..c70f020d8f 100644
--- a/builtin/multi-pack-index.c
+++ b/builtin/multi-pack-index.c
@@ -13,7 +13,6 @@ static char const * const builtin_multi_pack_index_usage[] = {
 static struct opts_multi_pack_index {
 	const char *object_dir;
 	unsigned long batch_size;
-	int progress;
 	unsigned flags;
 } opts;
 
@@ -23,7 +22,7 @@ int cmd_multi_pack_index(int argc, const char **argv,
 	static struct option builtin_multi_pack_index_options[] = {
 		OPT_FILENAME(0, "object-dir", &opts.object_dir,
 		  N_("object directory containing set of packfile and pack-index pairs")),
-		OPT_BOOL(0, "progress", &opts.progress, N_("force progress reporting")),
+		OPT_BIT(0, "progress", &opts.flags, N_("force progress reporting"), MIDX_PROGRESS),
 		OPT_MAGNITUDE(0, "batch-size", &opts.batch_size,
 		  N_("during repack, collect pack-files of smaller size into a batch that is larger than this size")),
 		OPT_END(),
@@ -31,15 +30,14 @@ int cmd_multi_pack_index(int argc, const char **argv,
 
 	git_config(git_default_config, NULL);
 
-	opts.progress = isatty(2);
+	if (isatty(2))
+		opts.flags |= MIDX_PROGRESS;
 	argc = parse_options(argc, argv, prefix,
 			     builtin_multi_pack_index_options,
 			     builtin_multi_pack_index_usage, 0);
 
 	if (!opts.object_dir)
 		opts.object_dir = get_object_directory();
-	if (opts.progress)
-		opts.flags |= MIDX_PROGRESS;
 
 	if (argc == 0)
 		usage_with_options(builtin_multi_pack_index_usage,
-- 
2.30.0.667.g81c0cbc6fd



* [PATCH v3 03/16] builtin/multi-pack-index.c: define common usage with a macro
  2021-03-11 17:04 ` [PATCH v3 00/16] midx: implement a multi-pack reverse index Taylor Blau
  2021-03-11 17:04   ` [PATCH v3 01/16] builtin/multi-pack-index.c: inline 'flags' with options Taylor Blau
  2021-03-11 17:04   ` [PATCH v3 02/16] builtin/multi-pack-index.c: don't handle 'progress' separately Taylor Blau
@ 2021-03-11 17:04   ` Taylor Blau
  2021-03-11 17:04   ` [PATCH v3 04/16] builtin/multi-pack-index.c: split sub-commands Taylor Blau
                     ` (14 subsequent siblings)
  17 siblings, 0 replies; 171+ messages in thread
From: Taylor Blau @ 2021-03-11 17:04 UTC (permalink / raw)
  To: git; +Cc: avarab, dstolee, gitster, jonathantanmy, peff

Factor out the usage message into pieces corresponding to each mode.
This prevents options specific to one sub-command from being shown
alongside another in the usage.

A subsequent commit will use these #define macros to have usage
variables for each sub-command without duplicating their contents.
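
The pattern can be sketched in isolation as follows. The SKETCH_* macros,
the array names and usage_count() are hypothetical, and the strings are
not wrapped in N_() as the real ones are.

```c
#include <stddef.h>

/* Hypothetical usage strings; the real ones are wrapped in N_(). */
#define SKETCH_WRITE_USAGE  "tool [<options>] write"
#define SKETCH_VERIFY_USAGE "tool [<options>] verify"

/* Combined usage, listing every mode. */
static const char * const sketch_usage[] = {
	SKETCH_WRITE_USAGE,
	SKETCH_VERIFY_USAGE,
	NULL
};

/* Per-mode usage reuses the same macro, so the text cannot drift. */
static const char * const sketch_write_usage[] = {
	SKETCH_WRITE_USAGE,
	NULL
};

/* Count the entries in a NULL-terminated usage array. */
static size_t usage_count(const char * const *usage)
{
	size_t n = 0;

	while (usage[n])
		n++;
	return n;
}
```

Each string is spelled once in a macro and expanded wherever a usage
array needs it.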

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 builtin/multi-pack-index.c | 17 ++++++++++++++++-
 1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/builtin/multi-pack-index.c b/builtin/multi-pack-index.c
index c70f020d8f..eea498e026 100644
--- a/builtin/multi-pack-index.c
+++ b/builtin/multi-pack-index.c
@@ -5,8 +5,23 @@
 #include "midx.h"
 #include "trace2.h"
 
+#define BUILTIN_MIDX_WRITE_USAGE \
+	N_("git multi-pack-index [<options>] write")
+
+#define BUILTIN_MIDX_VERIFY_USAGE \
+	N_("git multi-pack-index [<options>] verify")
+
+#define BUILTIN_MIDX_EXPIRE_USAGE \
+	N_("git multi-pack-index [<options>] expire")
+
+#define BUILTIN_MIDX_REPACK_USAGE \
+	N_("git multi-pack-index [<options>] repack [--batch-size=<size>]")
+
 static char const * const builtin_multi_pack_index_usage[] = {
-	N_("git multi-pack-index [<options>] (write|verify|expire|repack --batch-size=<size>)"),
+	BUILTIN_MIDX_WRITE_USAGE,
+	BUILTIN_MIDX_VERIFY_USAGE,
+	BUILTIN_MIDX_EXPIRE_USAGE,
+	BUILTIN_MIDX_REPACK_USAGE,
 	NULL
 };
 
-- 
2.30.0.667.g81c0cbc6fd



* [PATCH v3 04/16] builtin/multi-pack-index.c: split sub-commands
  2021-03-11 17:04 ` [PATCH v3 00/16] midx: implement a multi-pack reverse index Taylor Blau
                     ` (2 preceding siblings ...)
  2021-03-11 17:04   ` [PATCH v3 03/16] builtin/multi-pack-index.c: define common usage with a macro Taylor Blau
@ 2021-03-11 17:04   ` Taylor Blau
  2021-03-29 11:36     ` Jeff King
  2021-03-11 17:04   ` [PATCH v3 05/16] builtin/multi-pack-index.c: don't enter bogus cmd_mode Taylor Blau
                     ` (13 subsequent siblings)
  17 siblings, 1 reply; 171+ messages in thread
From: Taylor Blau @ 2021-03-11 17:04 UTC (permalink / raw)
  To: git; +Cc: avarab, dstolee, gitster, jonathantanmy, peff

Handle sub-commands of the 'git multi-pack-index' builtin (e.g.,
"write", "repack", etc.) separately from one another. This allows
sub-commands with unique options, without forcing cmd_multi_pack_index()
to reject invalid combinations itself.

This comes at the cost of some duplication and boilerplate. Luckily, the
duplication is reduced to a minimum, since common options are shared
among sub-commands due to a suggestion by Ævar. (Sub-commands do have to
retain the common options, too, since this builtin accepts common
options on either side of the sub-command.)

Roughly speaking, cmd_multi_pack_index() parses options (including
common ones), and stops at the first non-option, which is the
sub-command. It then dispatches to the appropriate sub-command, which
parses the remaining options (also including common options).

Unknown options are kept by the sub-commands in order to detect their
presence (and complain that too many arguments were given).
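
The two-stage flow described above can be sketched without the
parse-options API. Everything here is hypothetical: the table-driven
lookup is a simplification chosen for this sketch (the patch itself uses
an if/else chain), "options" are approximated as any argument starting
with "--", and the return codes are made up.

```c
#include <stddef.h>
#include <string.h>

typedef int (*sketch_cmd_fn)(int argc, const char **argv);

/* Count leftover non-option words; argv[0] is the sub-command name. */
static int count_non_options(int argc, const char **argv)
{
	int i, n = 0;

	for (i = 1; i < argc; i++)
		if (strncmp(argv[i], "--", 2))
			n++;
	return n;
}

/* Hypothetical handlers: -1 means "too many arguments". */
static int sketch_write(int argc, const char **argv)
{
	return count_non_options(argc, argv) ? -1 : 1;
}

static int sketch_verify(int argc, const char **argv)
{
	return count_non_options(argc, argv) ? -1 : 2;
}

static const struct {
	const char *name;
	sketch_cmd_fn fn;
} sketch_cmds[] = {
	{ "write", sketch_write },
	{ "verify", sketch_verify },
};

/* Skip leading options, then dispatch on the first non-option word. */
static int sketch_dispatch(int argc, const char **argv)
{
	size_t i;

	while (argc > 0 && !strncmp(argv[0], "--", 2)) {
		argc--;
		argv++;
	}
	if (argc == 0)
		return -2; /* no sub-command given */

	for (i = 0; i < sizeof(sketch_cmds) / sizeof(sketch_cmds[0]); i++)
		if (!strcmp(argv[0], sketch_cmds[i].name))
			return sketch_cmds[i].fn(argc, argv);
	return -2; /* unrecognized sub-command */
}
```

The top level never needs to know which options belong to which mode;
each handler rejects whatever it does not understand.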

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 builtin/multi-pack-index.c | 131 ++++++++++++++++++++++++++++++-------
 1 file changed, 106 insertions(+), 25 deletions(-)

diff --git a/builtin/multi-pack-index.c b/builtin/multi-pack-index.c
index eea498e026..23e51dfeb4 100644
--- a/builtin/multi-pack-index.c
+++ b/builtin/multi-pack-index.c
@@ -17,6 +17,22 @@
 #define BUILTIN_MIDX_REPACK_USAGE \
 	N_("git multi-pack-index [<options>] repack [--batch-size=<size>]")
 
+static char const * const builtin_multi_pack_index_write_usage[] = {
+	BUILTIN_MIDX_WRITE_USAGE,
+	NULL
+};
+static char const * const builtin_multi_pack_index_verify_usage[] = {
+	BUILTIN_MIDX_VERIFY_USAGE,
+	NULL
+};
+static char const * const builtin_multi_pack_index_expire_usage[] = {
+	BUILTIN_MIDX_EXPIRE_USAGE,
+	NULL
+};
+static char const * const builtin_multi_pack_index_repack_usage[] = {
+	BUILTIN_MIDX_REPACK_USAGE,
+	NULL
+};
 static char const * const builtin_multi_pack_index_usage[] = {
 	BUILTIN_MIDX_WRITE_USAGE,
 	BUILTIN_MIDX_VERIFY_USAGE,
@@ -31,25 +47,99 @@ static struct opts_multi_pack_index {
 	unsigned flags;
 } opts;
 
-int cmd_multi_pack_index(int argc, const char **argv,
-			 const char *prefix)
+static struct option common_opts[] = {
+	OPT_FILENAME(0, "object-dir", &opts.object_dir,
+	  N_("object directory containing set of packfile and pack-index pairs")),
+	OPT_BIT(0, "progress", &opts.flags, N_("force progress reporting"), MIDX_PROGRESS),
+	OPT_END(),
+};
+
+static struct option *add_common_options(struct option *prev)
 {
-	static struct option builtin_multi_pack_index_options[] = {
-		OPT_FILENAME(0, "object-dir", &opts.object_dir,
-		  N_("object directory containing set of packfile and pack-index pairs")),
-		OPT_BIT(0, "progress", &opts.flags, N_("force progress reporting"), MIDX_PROGRESS),
+	struct option *with_common = parse_options_concat(common_opts, prev);
+	free(prev);
+	return with_common;
+}
+
+static int cmd_multi_pack_index_write(int argc, const char **argv)
+{
+	struct option *options = common_opts;
+
+	argc = parse_options(argc, argv, NULL,
+			     options, builtin_multi_pack_index_write_usage,
+			     PARSE_OPT_KEEP_UNKNOWN);
+	if (argc)
+		usage_with_options(builtin_multi_pack_index_write_usage,
+				   options);
+
+	return write_midx_file(opts.object_dir, opts.flags);
+}
+
+static int cmd_multi_pack_index_verify(int argc, const char **argv)
+{
+	struct option *options = common_opts;
+
+	argc = parse_options(argc, argv, NULL,
+			     options, builtin_multi_pack_index_verify_usage,
+			     PARSE_OPT_KEEP_UNKNOWN);
+	if (argc)
+		usage_with_options(builtin_multi_pack_index_verify_usage,
+				   options);
+
+	return verify_midx_file(the_repository, opts.object_dir, opts.flags);
+}
+
+static int cmd_multi_pack_index_expire(int argc, const char **argv)
+{
+	struct option *options = common_opts;
+
+	argc = parse_options(argc, argv, NULL,
+			     options, builtin_multi_pack_index_expire_usage,
+			     PARSE_OPT_KEEP_UNKNOWN);
+	if (argc)
+		usage_with_options(builtin_multi_pack_index_expire_usage,
+				   options);
+
+	return expire_midx_packs(the_repository, opts.object_dir, opts.flags);
+}
+
+static int cmd_multi_pack_index_repack(int argc, const char **argv)
+{
+	struct option *options;
+	static struct option builtin_multi_pack_index_repack_options[] = {
 		OPT_MAGNITUDE(0, "batch-size", &opts.batch_size,
 		  N_("during repack, collect pack-files of smaller size into a batch that is larger than this size")),
 		OPT_END(),
 	};
 
+	options = parse_options_dup(builtin_multi_pack_index_repack_options);
+	options = add_common_options(options);
+
+	argc = parse_options(argc, argv, NULL,
+			     options,
+			     builtin_multi_pack_index_repack_usage,
+			     PARSE_OPT_KEEP_UNKNOWN);
+	if (argc)
+		usage_with_options(builtin_multi_pack_index_repack_usage,
+				   options);
+
+	return midx_repack(the_repository, opts.object_dir,
+			   (size_t)opts.batch_size, opts.flags);
+}
+
+int cmd_multi_pack_index(int argc, const char **argv,
+			 const char *prefix)
+{
+	struct option *builtin_multi_pack_index_options = common_opts;
+
 	git_config(git_default_config, NULL);
 
 	if (isatty(2))
 		opts.flags |= MIDX_PROGRESS;
 	argc = parse_options(argc, argv, prefix,
 			     builtin_multi_pack_index_options,
-			     builtin_multi_pack_index_usage, 0);
+			     builtin_multi_pack_index_usage,
+			     PARSE_OPT_STOP_AT_NON_OPTION);
 
 	if (!opts.object_dir)
 		opts.object_dir = get_object_directory();
@@ -58,25 +148,16 @@ int cmd_multi_pack_index(int argc, const char **argv,
 		usage_with_options(builtin_multi_pack_index_usage,
 				   builtin_multi_pack_index_options);
 
-	if (argc > 1) {
-		die(_("too many arguments"));
-		return 1;
-	}
-
 	trace2_cmd_mode(argv[0]);
 
 	if (!strcmp(argv[0], "repack"))
-		return midx_repack(the_repository, opts.object_dir,
-			(size_t)opts.batch_size, opts.flags);
-	if (opts.batch_size)
-		die(_("--batch-size option is only for 'repack' subcommand"));
-
-	if (!strcmp(argv[0], "write"))
-		return write_midx_file(opts.object_dir, opts.flags);
-	if (!strcmp(argv[0], "verify"))
-		return verify_midx_file(the_repository, opts.object_dir, opts.flags);
-	if (!strcmp(argv[0], "expire"))
-		return expire_midx_packs(the_repository, opts.object_dir, opts.flags);
-
-	die(_("unrecognized subcommand: %s"), argv[0]);
+		return cmd_multi_pack_index_repack(argc, argv);
+	else if (!strcmp(argv[0], "write"))
+		return cmd_multi_pack_index_write(argc, argv);
+	else if (!strcmp(argv[0], "verify"))
+		return cmd_multi_pack_index_verify(argc, argv);
+	else if (!strcmp(argv[0], "expire"))
+		return cmd_multi_pack_index_expire(argc, argv);
+	else
+		die(_("unrecognized subcommand: %s"), argv[0]);
 }
-- 
2.30.0.667.g81c0cbc6fd



* [PATCH v3 05/16] builtin/multi-pack-index.c: don't enter bogus cmd_mode
  2021-03-11 17:04 ` [PATCH v3 00/16] midx: implement a multi-pack reverse index Taylor Blau
                     ` (3 preceding siblings ...)
  2021-03-11 17:04   ` [PATCH v3 04/16] builtin/multi-pack-index.c: split sub-commands Taylor Blau
@ 2021-03-11 17:04   ` Taylor Blau
  2021-03-11 17:04   ` [PATCH v3 06/16] builtin/multi-pack-index.c: display usage on unrecognized command Taylor Blau
                     ` (12 subsequent siblings)
  17 siblings, 0 replies; 171+ messages in thread
From: Taylor Blau @ 2021-03-11 17:04 UTC (permalink / raw)
  To: git; +Cc: avarab, dstolee, gitster, jonathantanmy, peff

Even before the recent refactoring, 'git multi-pack-index' called
'trace2_cmd_mode()' before verifying that the sub-command is recognized.

Push this call down into the individual sub-commands so that we don't
enter a bogus command mode.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 builtin/multi-pack-index.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/builtin/multi-pack-index.c b/builtin/multi-pack-index.c
index 23e51dfeb4..b5678cc2bb 100644
--- a/builtin/multi-pack-index.c
+++ b/builtin/multi-pack-index.c
@@ -65,6 +65,8 @@ static int cmd_multi_pack_index_write(int argc, const char **argv)
 {
 	struct option *options = common_opts;
 
+	trace2_cmd_mode(argv[0]);
+
 	argc = parse_options(argc, argv, NULL,
 			     options, builtin_multi_pack_index_write_usage,
 			     PARSE_OPT_KEEP_UNKNOWN);
@@ -79,6 +81,8 @@ static int cmd_multi_pack_index_verify(int argc, const char **argv)
 {
 	struct option *options = common_opts;
 
+	trace2_cmd_mode(argv[0]);
+
 	argc = parse_options(argc, argv, NULL,
 			     options, builtin_multi_pack_index_verify_usage,
 			     PARSE_OPT_KEEP_UNKNOWN);
@@ -93,6 +97,8 @@ static int cmd_multi_pack_index_expire(int argc, const char **argv)
 {
 	struct option *options = common_opts;
 
+	trace2_cmd_mode(argv[0]);
+
 	argc = parse_options(argc, argv, NULL,
 			     options, builtin_multi_pack_index_expire_usage,
 			     PARSE_OPT_KEEP_UNKNOWN);
@@ -115,6 +121,8 @@ static int cmd_multi_pack_index_repack(int argc, const char **argv)
 	options = parse_options_dup(builtin_multi_pack_index_repack_options);
 	options = add_common_options(options);
 
+	trace2_cmd_mode(argv[0]);
+
 	argc = parse_options(argc, argv, NULL,
 			     options,
 			     builtin_multi_pack_index_repack_usage,
@@ -148,8 +156,6 @@ int cmd_multi_pack_index(int argc, const char **argv,
 		usage_with_options(builtin_multi_pack_index_usage,
 				   builtin_multi_pack_index_options);
 
-	trace2_cmd_mode(argv[0]);
-
 	if (!strcmp(argv[0], "repack"))
 		return cmd_multi_pack_index_repack(argc, argv);
 	else if (!strcmp(argv[0], "write"))
-- 
2.30.0.667.g81c0cbc6fd



* [PATCH v3 06/16] builtin/multi-pack-index.c: display usage on unrecognized command
  2021-03-11 17:04 ` [PATCH v3 00/16] midx: implement a multi-pack reverse index Taylor Blau
                     ` (4 preceding siblings ...)
  2021-03-11 17:04   ` [PATCH v3 05/16] builtin/multi-pack-index.c: don't enter bogus cmd_mode Taylor Blau
@ 2021-03-11 17:04   ` Taylor Blau
  2021-03-29 11:42     ` Jeff King
  2021-03-11 17:05   ` [PATCH v3 07/16] t/helper/test-read-midx.c: add '--show-objects' Taylor Blau
                     ` (11 subsequent siblings)
  17 siblings, 1 reply; 171+ messages in thread
From: Taylor Blau @ 2021-03-11 17:04 UTC (permalink / raw)
  To: git; +Cc: avarab, dstolee, gitster, jonathantanmy, peff

When given a sub-command that it doesn't understand, 'git
multi-pack-index' dies with the following message:

    $ git multi-pack-index bogus
    fatal: unrecognized subcommand: bogus

Instead of 'die()'-ing, we can display the usage text, which is much
more helpful:

    $ git.compile multi-pack-index bogus
    usage: git multi-pack-index [<options>] write
       or: git multi-pack-index [<options>] verify
       or: git multi-pack-index [<options>] expire
       or: git multi-pack-index [<options>] repack [--batch-size=<size>]

	--object-dir <file>   object directory containing set of packfile and pack-index pairs
	--progress            force progress reporting

While we're at it, clean up some duplication between the "no sub-command"
and "unrecognized sub-command" conditionals.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 builtin/multi-pack-index.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/builtin/multi-pack-index.c b/builtin/multi-pack-index.c
index b5678cc2bb..243b6ccc7c 100644
--- a/builtin/multi-pack-index.c
+++ b/builtin/multi-pack-index.c
@@ -153,8 +153,7 @@ int cmd_multi_pack_index(int argc, const char **argv,
 		opts.object_dir = get_object_directory();
 
 	if (argc == 0)
-		usage_with_options(builtin_multi_pack_index_usage,
-				   builtin_multi_pack_index_options);
+		goto usage;
 
 	if (!strcmp(argv[0], "repack"))
 		return cmd_multi_pack_index_repack(argc, argv);
@@ -165,5 +164,7 @@ int cmd_multi_pack_index(int argc, const char **argv,
 	else if (!strcmp(argv[0], "expire"))
 		return cmd_multi_pack_index_expire(argc, argv);
 	else
-		die(_("unrecognized subcommand: %s"), argv[0]);
+usage:
+		usage_with_options(builtin_multi_pack_index_usage,
+				   builtin_multi_pack_index_options);
 }
-- 
2.30.0.667.g81c0cbc6fd



* [PATCH v3 07/16] t/helper/test-read-midx.c: add '--show-objects'
  2021-03-11 17:04 ` [PATCH v3 00/16] midx: implement a multi-pack reverse index Taylor Blau
                     ` (5 preceding siblings ...)
  2021-03-11 17:04   ` [PATCH v3 06/16] builtin/multi-pack-index.c: display usage on unrecognized command Taylor Blau
@ 2021-03-11 17:05   ` Taylor Blau
  2021-03-11 17:05   ` [PATCH v3 08/16] midx: allow marking a pack as preferred Taylor Blau
                     ` (10 subsequent siblings)
  17 siblings, 0 replies; 171+ messages in thread
From: Taylor Blau @ 2021-03-11 17:05 UTC (permalink / raw)
  To: git; +Cc: avarab, dstolee, gitster, jonathantanmy, peff

The 'read-midx' helper is used in places like t5319 to display basic
information about a multi-pack-index.

In the next patch, the MIDX writing machinery will learn a new way to
choose from which pack an object is selected when multiple copies of
that object exist.

To disambiguate which pack introduces an object so that this feature can
be tested, add a '--show-objects' option which displays additional
information about each object in the MIDX.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 t/helper/test-read-midx.c | 24 ++++++++++++++++++++----
 1 file changed, 20 insertions(+), 4 deletions(-)

diff --git a/t/helper/test-read-midx.c b/t/helper/test-read-midx.c
index 2430880f78..7c2eb11a8e 100644
--- a/t/helper/test-read-midx.c
+++ b/t/helper/test-read-midx.c
@@ -4,7 +4,7 @@
 #include "repository.h"
 #include "object-store.h"
 
-static int read_midx_file(const char *object_dir)
+static int read_midx_file(const char *object_dir, int show_objects)
 {
 	uint32_t i;
 	struct multi_pack_index *m;
@@ -43,13 +43,29 @@ static int read_midx_file(const char *object_dir)
 
 	printf("object-dir: %s\n", m->object_dir);
 
+	if (show_objects) {
+		struct object_id oid;
+		struct pack_entry e;
+
+		for (i = 0; i < m->num_objects; i++) {
+			nth_midxed_object_oid(&oid, m, i);
+			fill_midx_entry(the_repository, &oid, &e, m);
+
+			printf("%s %"PRIu64"\t%s\n",
+			       oid_to_hex(&oid), e.offset, e.p->pack_name);
+		}
+		return 0;
+	}
+
 	return 0;
 }
 
 int cmd__read_midx(int argc, const char **argv)
 {
-	if (argc != 2)
-		usage("read-midx <object-dir>");
+	if (!(argc == 2 || argc == 3))
+		usage("read-midx [--show-objects] <object-dir>");
 
-	return read_midx_file(argv[1]);
+	if (!strcmp(argv[1], "--show-objects"))
+		return read_midx_file(argv[2], 1);
+	return read_midx_file(argv[1], 0);
 }
-- 
2.30.0.667.g81c0cbc6fd



* [PATCH v3 08/16] midx: allow marking a pack as preferred
  2021-03-11 17:04 ` [PATCH v3 00/16] midx: implement a multi-pack reverse index Taylor Blau
                     ` (6 preceding siblings ...)
  2021-03-11 17:05   ` [PATCH v3 07/16] t/helper/test-read-midx.c: add '--show-objects' Taylor Blau
@ 2021-03-11 17:05   ` Taylor Blau
  2021-03-29 12:00     ` Jeff King
  2021-03-11 17:05   ` [PATCH v3 09/16] midx: don't free midx_name early Taylor Blau
                     ` (9 subsequent siblings)
  17 siblings, 1 reply; 171+ messages in thread
From: Taylor Blau @ 2021-03-11 17:05 UTC (permalink / raw)
  To: git; +Cc: avarab, dstolee, gitster, jonathantanmy, peff

When multiple packs in the multi-pack index contain the same object, the
MIDX machinery must make a choice about which pack it associates with
that object. Prior to this patch, the lowest-ordered[1] pack was always
selected.

Pack selection for duplicate objects is relatively unimportant today,
but it will become important for multi-pack bitmaps. This is because we
can only invoke the pack-reuse mechanism when all of the bits for reused
objects come from the reuse pack (in order to ensure that all reused
deltas can find their base objects in the same pack).

To encourage the pack selection process to prefer one pack over another
(the pack to be preferred is the one a caller would like to later use as
a reuse pack), introduce the concept of a "preferred pack". When
provided, the MIDX code will always prefer an object found in a
preferred pack over any other.

No format changes are required to store the preferred pack, since it
can be inferred from a corresponding MIDX bitmap by looking up the pack
associated with the object in the first bit position (this ordering is
described in detail in a subsequent commit).

[1]: the ordering is specified by MIDX internals; for our purposes we
can consider the "lowest ordered" pack to be "the one with the
most-recent mtime".
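
To make the tie-breaking concrete, here is a small standalone sketch of
the selection rule. The struct, its short string object names, the final
tie-break on pack_id and sketch_dedup() are all assumptions of the
sketch; only the "preferred first, then most-recent mtime" ordering is
taken from this patch.

```c
#include <stdlib.h>
#include <string.h>
#include <time.h>

struct sketch_entry {
	char oid[8];            /* hypothetical short object name */
	int pack_id;
	time_t pack_mtime;
	unsigned preferred : 1; /* set for objects in the preferred pack */
};

static int sketch_entry_cmp(const void *_a, const void *_b)
{
	const struct sketch_entry *a = _a, *b = _b;
	int cmp = strcmp(a->oid, b->oid);

	if (cmp)
		return cmp;
	/* Copies in the preferred pack sort first. */
	if (a->preferred != b->preferred)
		return a->preferred ? -1 : 1;
	/* Otherwise the most recently modified pack sorts first. */
	if (a->pack_mtime != b->pack_mtime)
		return a->pack_mtime > b->pack_mtime ? -1 : 1;
	/* Assumed final tie-break, to keep the order deterministic. */
	return a->pack_id - b->pack_id;
}

/* Sort, then keep only the first entry of each run of equal names. */
static size_t sketch_dedup(struct sketch_entry *e, size_t nr)
{
	size_t i, out = 0;

	qsort(e, nr, sizeof(*e), sketch_entry_cmp);
	for (i = 0; i < nr; i++) {
		if (out && !strcmp(e[out - 1].oid, e[i].oid))
			continue;
		e[out++] = e[i];
	}
	return out;
}
```

Given two copies of an object, the one in the preferred pack is kept
even when another pack has a newer mtime; without a preferred copy, the
newest pack wins.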

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 Documentation/git-multi-pack-index.txt       | 14 ++-
 Documentation/technical/multi-pack-index.txt |  5 +-
 builtin/multi-pack-index.c                   | 18 +++-
 builtin/repack.c                             |  2 +-
 midx.c                                       | 92 ++++++++++++++++++--
 midx.h                                       |  2 +-
 t/t5319-multi-pack-index.sh                  | 39 +++++++++
 7 files changed, 154 insertions(+), 18 deletions(-)

diff --git a/Documentation/git-multi-pack-index.txt b/Documentation/git-multi-pack-index.txt
index eb0caa0439..ffd601bc17 100644
--- a/Documentation/git-multi-pack-index.txt
+++ b/Documentation/git-multi-pack-index.txt
@@ -9,7 +9,8 @@ git-multi-pack-index - Write and verify multi-pack-indexes
 SYNOPSIS
 --------
 [verse]
-'git multi-pack-index' [--object-dir=<dir>] [--[no-]progress] <subcommand>
+'git multi-pack-index' [--object-dir=<dir>] [--[no-]progress]
+	[--preferred-pack=<pack>] <subcommand>
 
 DESCRIPTION
 -----------
@@ -30,7 +31,16 @@ OPTIONS
 The following subcommands are available:
 
 write::
-	Write a new MIDX file.
+	Write a new MIDX file. The following options are available for
+	the `write` sub-command:
++
+--
+	--preferred-pack=<pack>::
+		Optionally specify the tie-breaking pack used when
+		multiple packs contain the same object. If not given,
+		ties are broken in favor of the pack with the most
+		recent mtime.
+--
 
 verify::
 	Verify the contents of the MIDX file.
diff --git a/Documentation/technical/multi-pack-index.txt b/Documentation/technical/multi-pack-index.txt
index e8e377a59f..fb688976c4 100644
--- a/Documentation/technical/multi-pack-index.txt
+++ b/Documentation/technical/multi-pack-index.txt
@@ -43,8 +43,9 @@ Design Details
   a change in format.
 
 - The MIDX keeps only one record per object ID. If an object appears
-  in multiple packfiles, then the MIDX selects the copy in the most-
-  recently modified packfile.
+  in multiple packfiles, then the MIDX selects the copy in the
+  preferred packfile, otherwise selecting from the most-recently
+  modified packfile.
 
 - If there exist packfiles in the pack directory not registered in
   the MIDX, then those packfiles are loaded into the `packed_git`
diff --git a/builtin/multi-pack-index.c b/builtin/multi-pack-index.c
index 243b6ccc7c..92f358f212 100644
--- a/builtin/multi-pack-index.c
+++ b/builtin/multi-pack-index.c
@@ -4,9 +4,10 @@
 #include "parse-options.h"
 #include "midx.h"
 #include "trace2.h"
+#include "object-store.h"
 
 #define BUILTIN_MIDX_WRITE_USAGE \
-	N_("git multi-pack-index [<options>] write")
+	N_("git multi-pack-index [<options>] write [--preferred-pack=<pack>]")
 
 #define BUILTIN_MIDX_VERIFY_USAGE \
 	N_("git multi-pack-index [<options>] verify")
@@ -43,6 +44,7 @@ static char const * const builtin_multi_pack_index_usage[] = {
 
 static struct opts_multi_pack_index {
 	const char *object_dir;
+	const char *preferred_pack;
 	unsigned long batch_size;
 	unsigned flags;
 } opts;
@@ -63,7 +65,16 @@ static struct option *add_common_options(struct option *prev)
 
 static int cmd_multi_pack_index_write(int argc, const char **argv)
 {
-	struct option *options = common_opts;
+	struct option *options;
+	static struct option builtin_multi_pack_index_write_options[] = {
+		OPT_STRING(0, "preferred-pack", &opts.preferred_pack,
+			   N_("preferred-pack"),
+			   N_("pack for reuse when computing a multi-pack bitmap")),
+		OPT_END(),
+	};
+
+	options = parse_options_dup(builtin_multi_pack_index_write_options);
+	options = add_common_options(options);
 
 	trace2_cmd_mode(argv[0]);
 
@@ -74,7 +85,8 @@ static int cmd_multi_pack_index_write(int argc, const char **argv)
 		usage_with_options(builtin_multi_pack_index_write_usage,
 				   options);
 
-	return write_midx_file(opts.object_dir, opts.flags);
+	return write_midx_file(opts.object_dir, opts.preferred_pack,
+			       opts.flags);
 }
 
 static int cmd_multi_pack_index_verify(int argc, const char **argv)
diff --git a/builtin/repack.c b/builtin/repack.c
index 01440de2d5..9f00806805 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -523,7 +523,7 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
 	remove_temporary_files();
 
 	if (git_env_bool(GIT_TEST_MULTI_PACK_INDEX, 0))
-		write_midx_file(get_object_directory(), 0);
+		write_midx_file(get_object_directory(), NULL, 0);
 
 	string_list_clear(&names, 0);
 	string_list_clear(&rollback, 0);
diff --git a/midx.c b/midx.c
index 971faa8cfc..46f55ff6cf 100644
--- a/midx.c
+++ b/midx.c
@@ -431,6 +431,24 @@ static int pack_info_compare(const void *_a, const void *_b)
 	return strcmp(a->pack_name, b->pack_name);
 }
 
+static int lookup_idx_or_pack_name(struct pack_info *info,
+				   uint32_t nr,
+				   const char *pack_name)
+{
+	uint32_t lo = 0, hi = nr;
+	while (lo < hi) {
+		uint32_t mi = lo + (hi - lo) / 2;
+		int cmp = cmp_idx_or_pack_name(pack_name, info[mi].pack_name);
+		if (cmp < 0)
+			hi = mi;
+		else if (cmp > 0)
+			lo = mi + 1;
+		else
+			return mi;
+	}
+	return -1;
+}
+
 struct write_midx_context {
 	struct pack_info *info;
 	uint32_t nr;
@@ -445,6 +463,8 @@ struct write_midx_context {
 	uint32_t *pack_perm;
 	unsigned large_offsets_needed:1;
 	uint32_t num_large_offsets;
+
+	int preferred_pack_idx;
 };
 
 static void add_pack_to_midx(const char *full_path, size_t full_path_len,
@@ -489,6 +509,7 @@ struct pack_midx_entry {
 	uint32_t pack_int_id;
 	time_t pack_mtime;
 	uint64_t offset;
+	unsigned preferred : 1;
 };
 
 static int midx_oid_compare(const void *_a, const void *_b)
@@ -500,6 +521,12 @@ static int midx_oid_compare(const void *_a, const void *_b)
 	if (cmp)
 		return cmp;
 
+	/* Sort objects in a preferred pack first when multiple copies exist. */
+	if (a->preferred > b->preferred)
+		return -1;
+	if (a->preferred < b->preferred)
+		return 1;
+
 	if (a->pack_mtime > b->pack_mtime)
 		return -1;
 	else if (a->pack_mtime < b->pack_mtime)
@@ -527,7 +554,8 @@ static int nth_midxed_pack_midx_entry(struct multi_pack_index *m,
 static void fill_pack_entry(uint32_t pack_int_id,
 			    struct packed_git *p,
 			    uint32_t cur_object,
-			    struct pack_midx_entry *entry)
+			    struct pack_midx_entry *entry,
+			    int preferred)
 {
 	if (nth_packed_object_id(&entry->oid, p, cur_object) < 0)
 		die(_("failed to locate object %d in packfile"), cur_object);
@@ -536,6 +564,7 @@ static void fill_pack_entry(uint32_t pack_int_id,
 	entry->pack_mtime = p->mtime;
 
 	entry->offset = nth_packed_object_offset(p, cur_object);
+	entry->preferred = !!preferred;
 }
 
 /*
@@ -552,7 +581,8 @@ static void fill_pack_entry(uint32_t pack_int_id,
 static struct pack_midx_entry *get_sorted_entries(struct multi_pack_index *m,
 						  struct pack_info *info,
 						  uint32_t nr_packs,
-						  uint32_t *nr_objects)
+						  uint32_t *nr_objects,
+						  int preferred_pack)
 {
 	uint32_t cur_fanout, cur_pack, cur_object;
 	uint32_t alloc_fanout, alloc_objects, total_objects = 0;
@@ -589,12 +619,17 @@ static struct pack_midx_entry *get_sorted_entries(struct multi_pack_index *m,
 				nth_midxed_pack_midx_entry(m,
 							   &entries_by_fanout[nr_fanout],
 							   cur_object);
+				if (nth_midxed_pack_int_id(m, cur_object) == preferred_pack)
+					entries_by_fanout[nr_fanout].preferred = 1;
+				else
+					entries_by_fanout[nr_fanout].preferred = 0;
 				nr_fanout++;
 			}
 		}
 
 		for (cur_pack = start_pack; cur_pack < nr_packs; cur_pack++) {
 			uint32_t start = 0, end;
+			int preferred = cur_pack == preferred_pack;
 
 			if (cur_fanout)
 				start = get_pack_fanout(info[cur_pack].p, cur_fanout - 1);
@@ -602,7 +637,11 @@ static struct pack_midx_entry *get_sorted_entries(struct multi_pack_index *m,
 
 			for (cur_object = start; cur_object < end; cur_object++) {
 				ALLOC_GROW(entries_by_fanout, nr_fanout + 1, alloc_fanout);
-				fill_pack_entry(cur_pack, info[cur_pack].p, cur_object, &entries_by_fanout[nr_fanout]);
+				fill_pack_entry(cur_pack,
+						info[cur_pack].p,
+						cur_object,
+						&entries_by_fanout[nr_fanout],
+						preferred);
 				nr_fanout++;
 			}
 		}
@@ -777,7 +816,9 @@ static int write_midx_large_offsets(struct hashfile *f,
 }
 
 static int write_midx_internal(const char *object_dir, struct multi_pack_index *m,
-			       struct string_list *packs_to_drop, unsigned flags)
+			       struct string_list *packs_to_drop,
+			       const char *preferred_pack_name,
+			       unsigned flags)
 {
 	char *midx_name;
 	uint32_t i;
@@ -828,7 +869,19 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
 	if (ctx.m && ctx.nr == ctx.m->num_packs && !packs_to_drop)
 		goto cleanup;
 
-	ctx.entries = get_sorted_entries(ctx.m, ctx.info, ctx.nr, &ctx.entries_nr);
+	ctx.preferred_pack_idx = -1;
+	if (preferred_pack_name) {
+		for (i = 0; i < ctx.nr; i++) {
+			if (!cmp_idx_or_pack_name(preferred_pack_name,
+						  ctx.info[i].pack_name)) {
+				ctx.preferred_pack_idx = i;
+				break;
+			}
+		}
+	}
+
+	ctx.entries = get_sorted_entries(ctx.m, ctx.info, ctx.nr, &ctx.entries_nr,
+					 ctx.preferred_pack_idx);
 
 	ctx.large_offsets_needed = 0;
 	for (i = 0; i < ctx.entries_nr; i++) {
@@ -889,6 +942,24 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
 			pack_name_concat_len += strlen(ctx.info[i].pack_name) + 1;
 	}
 
+	/* Check that the preferred pack wasn't expired (if given). */
+	if (preferred_pack_name) {
+		int preferred_idx = lookup_idx_or_pack_name(ctx.info,
+							    ctx.nr,
+							    preferred_pack_name);
+		if (preferred_idx < 0)
+			warning(_("unknown preferred pack: '%s'"),
+				preferred_pack_name);
+		else {
+			uint32_t orig = ctx.info[preferred_idx].orig_pack_int_id;
+			uint32_t perm = ctx.pack_perm[orig];
+
+			if (perm == PACK_EXPIRED)
+				warning(_("preferred pack '%s' is expired"),
+					preferred_pack_name);
+		}
+	}
+
 	if (pack_name_concat_len % MIDX_CHUNK_ALIGNMENT)
 		pack_name_concat_len += MIDX_CHUNK_ALIGNMENT -
 					(pack_name_concat_len % MIDX_CHUNK_ALIGNMENT);
@@ -947,9 +1018,12 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
 	return result;
 }
 
-int write_midx_file(const char *object_dir, unsigned flags)
+int write_midx_file(const char *object_dir,
+		    const char *preferred_pack_name,
+		    unsigned flags)
 {
-	return write_midx_internal(object_dir, NULL, NULL, flags);
+	return write_midx_internal(object_dir, NULL, NULL, preferred_pack_name,
+				   flags);
 }
 
 void clear_midx_file(struct repository *r)
@@ -1184,7 +1258,7 @@ int expire_midx_packs(struct repository *r, const char *object_dir, unsigned fla
 	free(count);
 
 	if (packs_to_drop.nr)
-		result = write_midx_internal(object_dir, m, &packs_to_drop, flags);
+		result = write_midx_internal(object_dir, m, &packs_to_drop, NULL, flags);
 
 	string_list_clear(&packs_to_drop, 0);
 	return result;
@@ -1373,7 +1447,7 @@ int midx_repack(struct repository *r, const char *object_dir, size_t batch_size,
 		goto cleanup;
 	}
 
-	result = write_midx_internal(object_dir, m, NULL, flags);
+	result = write_midx_internal(object_dir, m, NULL, NULL, flags);
 	m = NULL;
 
 cleanup:
diff --git a/midx.h b/midx.h
index b18cf53bc4..e7fea61109 100644
--- a/midx.h
+++ b/midx.h
@@ -47,7 +47,7 @@ int fill_midx_entry(struct repository *r, const struct object_id *oid, struct pa
 int midx_contains_pack(struct multi_pack_index *m, const char *idx_or_pack_name);
 int prepare_multi_pack_index_one(struct repository *r, const char *object_dir, int local);
 
-int write_midx_file(const char *object_dir, unsigned flags);
+int write_midx_file(const char *object_dir, const char *preferred_pack_name, unsigned flags);
 void clear_midx_file(struct repository *r);
 int verify_midx_file(struct repository *r, const char *object_dir, unsigned flags);
 int expire_midx_packs(struct repository *r, const char *object_dir, unsigned flags);
diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh
index b4afab1dfc..fd94ba9053 100755
--- a/t/t5319-multi-pack-index.sh
+++ b/t/t5319-multi-pack-index.sh
@@ -31,6 +31,14 @@ midx_read_expect () {
 	test_cmp expect actual
 }
 
+midx_expect_object_offset () {
+	OID="$1"
+	OFFSET="$2"
+	OBJECT_DIR="$3"
+	test-tool read-midx --show-objects $OBJECT_DIR >actual &&
+	grep "^$OID $OFFSET" actual
+}
+
 test_expect_success 'setup' '
 	test_oid_cache <<-EOF
 	idxoff sha1:2999
@@ -234,6 +242,37 @@ test_expect_success 'warn on improper hash version' '
 	)
 '
 
+test_expect_success 'midx picks objects from preferred pack' '
+	test_when_finished rm -rf preferred.git &&
+	git init --bare preferred.git &&
+	(
+		cd preferred.git &&
+
+		a=$(echo "a" | git hash-object -w --stdin) &&
+		b=$(echo "b" | git hash-object -w --stdin) &&
+		c=$(echo "c" | git hash-object -w --stdin) &&
+
+		# Set up two packs, duplicating the object "B" at different
+		# offsets.
+		git pack-objects objects/pack/test-AB <<-EOF &&
+		$a
+		$b
+		EOF
+		bc=$(git pack-objects objects/pack/test-BC <<-EOF
+		$b
+		$c
+		EOF
+		) &&
+
+		git multi-pack-index --object-dir=objects \
+			write --preferred-pack=test-BC-$bc.idx 2>err &&
+		test_must_be_empty err &&
+
+		ofs=$(git show-index <objects/pack/test-BC-$bc.idx | grep $b |
+			cut -d" " -f1) &&
+		midx_expect_object_offset $b $ofs objects
+	)
+'
 
 test_expect_success 'verify multi-pack-index success' '
 	git multi-pack-index verify --object-dir=$objdir
-- 
2.30.0.667.g81c0cbc6fd



* [PATCH v3 09/16] midx: don't free midx_name early
  2021-03-11 17:04 ` [PATCH v3 00/16] midx: implement a multi-pack reverse index Taylor Blau
                     ` (7 preceding siblings ...)
  2021-03-11 17:05   ` [PATCH v3 08/16] midx: allow marking a pack as preferred Taylor Blau
@ 2021-03-11 17:05   ` Taylor Blau
  2021-03-11 17:05   ` [PATCH v3 10/16] midx: keep track of the checksum Taylor Blau
                     ` (8 subsequent siblings)
  17 siblings, 0 replies; 171+ messages in thread
From: Taylor Blau @ 2021-03-11 17:05 UTC (permalink / raw)
  To: git; +Cc: avarab, dstolee, gitster, jonathantanmy, peff

A subsequent patch will need to refer back to 'midx_name' later on in
the function. In fact, this variable is already free()'d later on, so
this makes the later free() no longer redundant.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 midx.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/midx.c b/midx.c
index 46f55ff6cf..e0009d3314 100644
--- a/midx.c
+++ b/midx.c
@@ -966,7 +966,6 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
 
 	hold_lock_file_for_update(&lk, midx_name, LOCK_DIE_ON_ERROR);
 	f = hashfd(get_lock_file_fd(&lk), get_lock_file_path(&lk));
-	FREE_AND_NULL(midx_name);
 
 	if (ctx.m)
 		close_midx(ctx.m);
-- 
2.30.0.667.g81c0cbc6fd



* [PATCH v3 10/16] midx: keep track of the checksum
  2021-03-11 17:04 ` [PATCH v3 00/16] midx: implement a multi-pack reverse index Taylor Blau
                     ` (8 preceding siblings ...)
  2021-03-11 17:05   ` [PATCH v3 09/16] midx: don't free midx_name early Taylor Blau
@ 2021-03-11 17:05   ` Taylor Blau
  2021-03-11 17:05   ` [PATCH v3 11/16] midx: make some functions non-static Taylor Blau
                     ` (7 subsequent siblings)
  17 siblings, 0 replies; 171+ messages in thread
From: Taylor Blau @ 2021-03-11 17:05 UTC (permalink / raw)
  To: git; +Cc: avarab, dstolee, gitster, jonathantanmy, peff

write_midx_internal() uses a hashfile to write the multi-pack index, but
discards its checksum. This makes sense, since nothing that takes place
after writing the MIDX cares about its checksum.

That is about to change in a subsequent patch, when the optional
reverse index corresponding to the MIDX will want to include the MIDX's
checksum.

Store the checksum of the MIDX in preparation for that.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 midx.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/midx.c b/midx.c
index e0009d3314..31e6d3d2df 100644
--- a/midx.c
+++ b/midx.c
@@ -821,6 +821,7 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
 			       unsigned flags)
 {
 	char *midx_name;
+	unsigned char midx_hash[GIT_MAX_RAWSZ];
 	uint32_t i;
 	struct hashfile *f = NULL;
 	struct lock_file lk;
@@ -997,7 +998,7 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
 	write_midx_header(f, get_num_chunks(cf), ctx.nr - dropped_packs);
 	write_chunkfile(cf, &ctx);
 
-	finalize_hashfile(f, NULL, CSUM_FSYNC | CSUM_HASH_IN_STREAM);
+	finalize_hashfile(f, midx_hash, CSUM_FSYNC | CSUM_HASH_IN_STREAM);
 	free_chunkfile(cf);
 	commit_lock_file(&lk);
 
-- 
2.30.0.667.g81c0cbc6fd



* [PATCH v3 11/16] midx: make some functions non-static
  2021-03-11 17:04 ` [PATCH v3 00/16] midx: implement a multi-pack reverse index Taylor Blau
                     ` (9 preceding siblings ...)
  2021-03-11 17:05   ` [PATCH v3 10/16] midx: keep track of the checksum Taylor Blau
@ 2021-03-11 17:05   ` Taylor Blau
  2021-03-11 17:05   ` [PATCH v3 12/16] Documentation/technical: describe multi-pack reverse indexes Taylor Blau
                     ` (6 subsequent siblings)
  17 siblings, 0 replies; 171+ messages in thread
From: Taylor Blau @ 2021-03-11 17:05 UTC (permalink / raw)
  To: git; +Cc: avarab, dstolee, gitster, jonathantanmy, peff

In a subsequent commit, pack-revindex.c will become responsible for
sorting a list of objects in the "MIDX pack order" (which will be
defined in the following patch). To do so, it will need to know the
pack identifier and offset within that pack for each object in the MIDX.

The MIDX code already has functions for doing just that
(nth_midxed_offset() and nth_midxed_pack_int_id()), but they are
statically declared.

Since there is no reason that they couldn't be exposed publicly, and
because they are already doing exactly what the caller in
pack-revindex.c will want, expose them publicly so that they can be
reused there.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 midx.c | 4 ++--
 midx.h | 2 ++
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/midx.c b/midx.c
index 31e6d3d2df..0a5da49ed6 100644
--- a/midx.c
+++ b/midx.c
@@ -239,7 +239,7 @@ struct object_id *nth_midxed_object_oid(struct object_id *oid,
 	return oid;
 }
 
-static off_t nth_midxed_offset(struct multi_pack_index *m, uint32_t pos)
+off_t nth_midxed_offset(struct multi_pack_index *m, uint32_t pos)
 {
 	const unsigned char *offset_data;
 	uint32_t offset32;
@@ -258,7 +258,7 @@ static off_t nth_midxed_offset(struct multi_pack_index *m, uint32_t pos)
 	return offset32;
 }
 
-static uint32_t nth_midxed_pack_int_id(struct multi_pack_index *m, uint32_t pos)
+uint32_t nth_midxed_pack_int_id(struct multi_pack_index *m, uint32_t pos)
 {
 	return get_be32(m->chunk_object_offsets +
 			(off_t)pos * MIDX_CHUNK_OFFSET_WIDTH);
diff --git a/midx.h b/midx.h
index e7fea61109..93bd68189e 100644
--- a/midx.h
+++ b/midx.h
@@ -40,6 +40,8 @@ struct multi_pack_index {
 struct multi_pack_index *load_multi_pack_index(const char *object_dir, int local);
 int prepare_midx_pack(struct repository *r, struct multi_pack_index *m, uint32_t pack_int_id);
 int bsearch_midx(const struct object_id *oid, struct multi_pack_index *m, uint32_t *result);
+off_t nth_midxed_offset(struct multi_pack_index *m, uint32_t pos);
+uint32_t nth_midxed_pack_int_id(struct multi_pack_index *m, uint32_t pos);
 struct object_id *nth_midxed_object_oid(struct object_id *oid,
 					struct multi_pack_index *m,
 					uint32_t n);
-- 
2.30.0.667.g81c0cbc6fd



* [PATCH v3 12/16] Documentation/technical: describe multi-pack reverse indexes
  2021-03-11 17:04 ` [PATCH v3 00/16] midx: implement a multi-pack reverse index Taylor Blau
                     ` (10 preceding siblings ...)
  2021-03-11 17:05   ` [PATCH v3 11/16] midx: make some functions non-static Taylor Blau
@ 2021-03-11 17:05   ` Taylor Blau
  2021-03-29 12:12     ` Jeff King
  2021-03-11 17:05   ` [PATCH v3 13/16] pack-revindex: read " Taylor Blau
                     ` (5 subsequent siblings)
  17 siblings, 1 reply; 171+ messages in thread
From: Taylor Blau @ 2021-03-11 17:05 UTC (permalink / raw)
  To: git; +Cc: avarab, dstolee, gitster, jonathantanmy, peff

As a prerequisite to implementing multi-pack bitmaps, motivate and
describe the format and ordering of the multi-pack reverse index.

The subsequent patch will implement reading this format, and the patch
after that will implement writing it while producing a multi-pack index.
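
For readers who want a concrete picture of the ordering before reading the
documentation hunk, here is a minimal sketch of it. It is not the code from
this series: 'struct toy_entry', 'toy_pseudo_pack_cmp()' and
'toy_build_revindex()' are hypothetical names, and the pack-ID tie-break
assumes that lower pack IDs sort first, as the comparison functions in the
later patches do.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for one object selected by the MIDX. */
struct toy_entry {
	uint32_t pack_id;	/* numeric ID of the pack the object came from */
	uint64_t offset;	/* byte offset of the object within that pack */
	int preferred;		/* non-zero if pack_id is the preferred pack */
};

/* qsort() takes no context argument, so the table being sorted lives here. */
static const struct toy_entry *toy_entries;

static int toy_pseudo_pack_cmp(const void *va, const void *vb)
{
	const struct toy_entry *a = &toy_entries[*(const uint32_t *)va];
	const struct toy_entry *b = &toy_entries[*(const uint32_t *)vb];

	/* Objects from the preferred pack sort ahead of all others. */
	if (!a->preferred != !b->preferred)
		return a->preferred ? -1 : 1;
	/* Then order by pack ID (assumption: lower IDs first). */
	if (a->pack_id != b->pack_id)
		return a->pack_id < b->pack_id ? -1 : 1;
	/* Within one pack, order by offset, i.e. pack order. */
	if (a->offset != b->offset)
		return a->offset < b->offset ? -1 : 1;
	return 0;
}

/*
 * Fill rev[0..nr-1] so that rev[i] is the MIDX position of the i-th
 * object in pseudo-pack order. entries[] is indexed by MIDX position.
 */
void toy_build_revindex(const struct toy_entry *entries, uint32_t nr,
			uint32_t *rev)
{
	uint32_t i;

	assert(entries && rev);
	for (i = 0; i < nr; i++)
		rev[i] = i;
	toy_entries = entries;
	qsort(rev, nr, sizeof(*rev), toy_pseudo_pack_cmp);
}
```

Because the MIDX has already dropped duplicate objects, every (pack, offset)
pair in entries[] is distinct, so the sort order is fully determined even
though qsort() is not stable.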

Co-authored-by: Jeff King <peff@peff.net>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 Documentation/technical/pack-format.txt | 83 +++++++++++++++++++++++++
 1 file changed, 83 insertions(+)

diff --git a/Documentation/technical/pack-format.txt b/Documentation/technical/pack-format.txt
index 1faa949bf6..4bbbb188a4 100644
--- a/Documentation/technical/pack-format.txt
+++ b/Documentation/technical/pack-format.txt
@@ -379,3 +379,86 @@ CHUNK DATA:
 TRAILER:
 
 	Index checksum of the above contents.
+
+== multi-pack-index reverse indexes
+
+Similar to the pack-based reverse index, the multi-pack index can also
+be used to generate a reverse index.
+
+Instead of mapping between offset, pack-, and index position, this
+reverse index maps between an object's position within the MIDX, and
+that object's position within a pseudo-pack that the MIDX describes
+(i.e., the ith entry of the multi-pack reverse index holds the MIDX
+position of ith object in pseudo-pack order).
+
+To clarify the difference between these orderings, consider a multi-pack
+reachability bitmap (which does not yet exist, but is what we are
+building towards here). Each bit needs to correspond to an object in the
+MIDX, and so we need an efficient mapping from bit position to MIDX
+position.
+
+One solution is to let bits occupy the same position in the oid-sorted
+index stored by the MIDX. But because oids are effectively random, the
+resulting reachability bitmaps would have no locality, and thus compress
+poorly. (This is the reason that single-pack bitmaps use the pack
+ordering, and not the .idx ordering, for the same purpose.)
+
+So we'd like to define an ordering for the whole MIDX based around
+pack ordering, which has far better locality (and thus compresses more
+efficiently). We can think of a pseudo-pack created by the concatenation
+of all of the packs in the MIDX. E.g., if we had a MIDX with three packs
+(a, b, c), with 10, 15, and 20 objects respectively, we can imagine an
+ordering of the objects like:
+
+    |a,0|a,1|...|a,9|b,0|b,1|...|b,14|c,0|c,1|...|c,19|
+
+where the ordering of the packs is defined by the MIDX's pack list,
+and then the ordering of objects within each pack is the same as the
+order in the actual packfile.
+
+Given the list of packs and their counts of objects, you can
+na&iuml;vely reconstruct that pseudo-pack ordering (e.g., the object at
+position 27 must be (c,1) because packs "a" and "b" consumed 25 of the
+slots). But there's a catch. Objects may be duplicated between packs, in
+which case the MIDX only stores one pointer to the object (and thus we'd
+want only one slot in the bitmap).
+
+Callers could handle duplicates themselves by reading objects in order
+of their bit-position, but that's linear in the number of objects, and
+much too expensive for ordinary bitmap lookups. Building a reverse index
+solves this, since it is the logical inverse of the index, and that
+index has already removed duplicates. But, building a reverse index on
+the fly can be expensive. Since we already have an on-disk format for
+pack-based reverse indexes, let's reuse it for the MIDX's pseudo-pack,
+too.
+
+Objects from the MIDX are ordered as follows to string together the
+pseudo-pack. Let _pack(o)_ return the pack from which _o_ was selected
+by the MIDX, and define an ordering of packs based on their numeric ID
+(as stored by the MIDX). Let _offset(o)_ return the object offset of _o_
+within _pack(o)_. Then, compare _o~1~_ and _o~2~_ as follows:
+
+  - If one of _pack(o~1~)_ and _pack(o~2~)_ is preferred and the other
+    is not, then the preferred one sorts first.
++
+(This is a detail that allows the MIDX bitmap to determine which
+pack should be used by the pack-reuse mechanism, since it can ask
+the MIDX for the pack containing the object at bit position 0).
+
+  - If _pack(o~1~) &ne; pack(o~2~)_, then sort the two objects in
+    ascending order based on the pack ID.
+
+  - Otherwise, _pack(o~1~) &equals; pack(o~2~)_, and the objects are
+    sorted in pack-order (i.e., _o~1~_ sorts ahead of _o~2~_ exactly
+    when _offset(o~1~) &lt; offset(o~2~)_).
+
+In short, a MIDX's pseudo-pack is the de-duplicated concatenation of
+objects in packs stored by the MIDX, laid out in pack order, and the
+packs arranged in MIDX order (with the preferred pack coming first).
+
+Finally, note that the MIDX's reverse index is not stored as a chunk in
+the multi-pack-index itself. This is done because the reverse index
+includes the checksum of the pack or MIDX to which it belongs, which
+makes it impossible to write in the MIDX. To avoid races when rewriting
+the MIDX, a MIDX reverse index includes the MIDX's checksum in its
+filename (e.g., `multi-pack-index-xyz.rev`).
-- 
2.30.0.667.g81c0cbc6fd



* [PATCH v3 13/16] pack-revindex: read multi-pack reverse indexes
  2021-03-11 17:04 ` [PATCH v3 00/16] midx: implement a multi-pack reverse index Taylor Blau
                     ` (11 preceding siblings ...)
  2021-03-11 17:05   ` [PATCH v3 12/16] Documentation/technical: describe multi-pack reverse indexes Taylor Blau
@ 2021-03-11 17:05   ` Taylor Blau
  2021-03-29 12:43     ` Jeff King
  2021-03-11 17:05   ` [PATCH v3 14/16] pack-write.c: extract 'write_rev_file_order' Taylor Blau
                     ` (4 subsequent siblings)
  17 siblings, 1 reply; 171+ messages in thread
From: Taylor Blau @ 2021-03-11 17:05 UTC (permalink / raw)
  To: git; +Cc: avarab, dstolee, gitster, jonathantanmy, peff

Implement reading for multi-pack reverse indexes, as described in the
previous patch.

Note that these functions don't yet have any callers, and won't until
multi-pack reachability bitmaps are introduced in a later patch series.
In the meantime, this patch implements some of the infrastructure
necessary to support multi-pack bitmaps.

There are three new functions exposed by the revindex API:

  - load_midx_revindex(): loads the reverse index corresponding to the
    given multi-pack index.

  - midx_to_pack_pos() and pack_pos_to_midx(): these convert between the
    multi-pack index and pseudo-pack order.

load_midx_revindex() and pack_pos_to_midx() are both relatively
straightforward.

load_midx_revindex() needs a few functions to be exposed from the midx
API. One to get the checksum of a midx, and another to get the .rev's
filename. Similar to recent changes in the packed_git struct, three new
fields are added to the multi_pack_index struct: one to keep track of
the size, one to keep track of the mmap'd pointer, and another to point
past the header and at the reverse index's data.

pack_pos_to_midx() simply reads the corresponding entry out of the
table.

midx_to_pack_pos() is the trickiest, since it needs to find an object's
position in the pseudo-pack order, but that order can only be recovered
in the .rev file itself. This mapping can be implemented with a binary
search, but note that the thing we're binary searching over isn't an
array of values, but rather a permuted order of those values.

So, when comparing two items, it's helpful to keep in mind the
difference. Instead of a traditional binary search, where you are
comparing two things directly, here we're comparing a (pack, offset)
tuple with an index into the multi-pack index. That index describes
another (pack, offset) tuple, and it is _those_ two tuples that are
compared.
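
To illustrate the idea, here is a minimal sketch of a binary search over a
permuted order. It is not the function added below: 'struct toy_object',
'toy_cmp()' and 'toy_index_to_pack_pos()' are hypothetical names, and the
preferred-pack step is left out to keep the example short.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical (pack, offset) tuple for one object. */
struct toy_object {
	uint32_t pack_id;
	uint64_t offset;
};

/* Compare two tuples by pack ID first, then by offset. */
static int toy_cmp(const struct toy_object *a, const struct toy_object *b)
{
	if (a->pack_id != b->pack_id)
		return a->pack_id < b->pack_id ? -1 : 1;
	if (a->offset != b->offset)
		return a->offset < b->offset ? -1 : 1;
	return 0;
}

/*
 * objects[] is in index order; rev[i] is the index position of the i-th
 * object in pack order. Find the pack position of objects[at].
 *
 * Returns 0 and sets *pos on success, or -1 if the object is not found.
 */
int toy_index_to_pack_pos(const struct toy_object *objects,
			  const uint32_t *rev, uint32_t nr,
			  uint32_t at, uint32_t *pos)
{
	uint32_t lo = 0, hi = nr;

	assert(at < nr);
	while (lo < hi) {
		uint32_t mi = lo + (hi - lo) / 2;
		/*
		 * rev[mi] is not compared directly: it is only an index,
		 * and the tuple it points at is what gets compared.
		 */
		int cmp = toy_cmp(&objects[at], &objects[rev[mi]]);

		if (!cmp) {
			*pos = mi;
			return 0;
		}
		if (cmp < 0)
			hi = mi;
		else
			lo = mi + 1;
	}
	return -1;
}
```

The search works because rev[] is sorted with respect to the tuples it points
at, even though the values stored in rev[] are in no particular numeric order.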

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 midx.c          |  11 +++++
 midx.h          |   6 +++
 pack-revindex.c | 127 ++++++++++++++++++++++++++++++++++++++++++++++++
 pack-revindex.h |  53 ++++++++++++++++++++
 packfile.c      |   3 ++
 5 files changed, 200 insertions(+)

diff --git a/midx.c b/midx.c
index 0a5da49ed6..55f4567fca 100644
--- a/midx.c
+++ b/midx.c
@@ -47,11 +47,22 @@ static uint8_t oid_version(void)
 	}
 }
 
+static const unsigned char *get_midx_checksum(struct multi_pack_index *m)
+{
+	return m->data + m->data_len - the_hash_algo->rawsz;
+}
+
 static char *get_midx_filename(const char *object_dir)
 {
 	return xstrfmt("%s/pack/multi-pack-index", object_dir);
 }
 
+char *get_midx_rev_filename(struct multi_pack_index *m)
+{
+	return xstrfmt("%s/pack/multi-pack-index-%s.rev",
+		       m->object_dir, hash_to_hex(get_midx_checksum(m)));
+}
+
 static int midx_read_oid_fanout(const unsigned char *chunk_start,
 				size_t chunk_size, void *data)
 {
diff --git a/midx.h b/midx.h
index 93bd68189e..0a8294d2ee 100644
--- a/midx.h
+++ b/midx.h
@@ -15,6 +15,10 @@ struct multi_pack_index {
 	const unsigned char *data;
 	size_t data_len;
 
+	const uint32_t *revindex_data;
+	const uint32_t *revindex_map;
+	size_t revindex_len;
+
 	uint32_t signature;
 	unsigned char version;
 	unsigned char hash_len;
@@ -37,6 +41,8 @@ struct multi_pack_index {
 
 #define MIDX_PROGRESS     (1 << 0)
 
+char *get_midx_rev_filename(struct multi_pack_index *m);
+
 struct multi_pack_index *load_multi_pack_index(const char *object_dir, int local);
 int prepare_midx_pack(struct repository *r, struct multi_pack_index *m, uint32_t pack_int_id);
 int bsearch_midx(const struct object_id *oid, struct multi_pack_index *m, uint32_t *result);
diff --git a/pack-revindex.c b/pack-revindex.c
index 83fe4de773..2e15ba3a8f 100644
--- a/pack-revindex.c
+++ b/pack-revindex.c
@@ -3,6 +3,7 @@
 #include "object-store.h"
 #include "packfile.h"
 #include "config.h"
+#include "midx.h"
 
 struct revindex_entry {
 	off_t offset;
@@ -292,6 +293,44 @@ int load_pack_revindex(struct packed_git *p)
 	return -1;
 }
 
+int load_midx_revindex(struct multi_pack_index *m)
+{
+	char *revindex_name;
+	int ret;
+	if (m->revindex_data)
+		return 0;
+
+	revindex_name = get_midx_rev_filename(m);
+
+	ret = load_revindex_from_disk(revindex_name,
+				      m->num_objects,
+				      &m->revindex_map,
+				      &m->revindex_len);
+	if (ret)
+		goto cleanup;
+
+	m->revindex_data = (const uint32_t *)((const char *)m->revindex_map + RIDX_HEADER_SIZE);
+
+cleanup:
+	free(revindex_name);
+	return ret;
+}
+
+int close_midx_revindex(struct multi_pack_index *m)
+{
+	if (!m || !m->revindex_map)
+		return 0;
+
+	if (munmap((void*)m->revindex_map, m->revindex_len))
+		return -1;
+
+	m->revindex_map = NULL;
+	m->revindex_data = NULL;
+	m->revindex_len = 0;
+
+	return 0;
+}
+
 int offset_to_pack_pos(struct packed_git *p, off_t ofs, uint32_t *pos)
 {
 	unsigned lo, hi;
@@ -346,3 +385,91 @@ off_t pack_pos_to_offset(struct packed_git *p, uint32_t pos)
 	else
 		return nth_packed_object_offset(p, pack_pos_to_index(p, pos));
 }
+
+uint32_t pack_pos_to_midx(struct multi_pack_index *m, uint32_t pos)
+{
+	if (!m->revindex_data)
+		BUG("pack_pos_to_midx: reverse index not yet loaded");
+	if (m->num_objects <= pos)
+		BUG("pack_pos_to_midx: out-of-bounds object at %"PRIu32, pos);
+	return get_be32((const char *)m->revindex_data + (pos * sizeof(uint32_t)));
+}
+
+struct midx_pack_key {
+	uint32_t pack;
+	off_t offset;
+
+	uint32_t preferred_pack;
+	struct multi_pack_index *midx;
+};
+
+static int midx_pack_order_cmp(const void *va, const void *vb)
+{
+	const struct midx_pack_key *key = va;
+	struct multi_pack_index *midx = key->midx;
+
+	uint32_t versus = pack_pos_to_midx(midx, (uint32_t*)vb - (const uint32_t *)midx->revindex_data);
+	uint32_t versus_pack = nth_midxed_pack_int_id(midx, versus);
+	off_t versus_offset;
+
+	uint32_t key_preferred = key->pack == key->preferred_pack;
+	uint32_t versus_preferred = versus_pack == key->preferred_pack;
+
+	/*
+	 * First, compare the preferred-ness, noting that the preferred pack
+	 * comes first.
+	 */
+	if (key_preferred && !versus_preferred)
+		return -1;
+	else if (!key_preferred && versus_preferred)
+		return 1;
+
+	/* Then, break ties first by comparing the pack IDs. */
+	if (key->pack < versus_pack)
+		return -1;
+	else if (key->pack > versus_pack)
+		return 1;
+
+	/* Finally, break ties by comparing offsets within a pack. */
+	versus_offset = nth_midxed_offset(midx, versus);
+	if (key->offset < versus_offset)
+		return -1;
+	else if (key->offset > versus_offset)
+		return 1;
+
+	return 0;
+}
+
+int midx_to_pack_pos(struct multi_pack_index *m, uint32_t at, uint32_t *pos)
+{
+	struct midx_pack_key key;
+	uint32_t *found;
+
+	if (!m->revindex_data)
+		BUG("midx_to_pack_pos: reverse index not yet loaded");
+	if (m->num_objects <= at)
+		BUG("midx_to_pack_pos: out-of-bounds object at %"PRIu32, at);
+
+	key.pack = nth_midxed_pack_int_id(m, at);
+	key.offset = nth_midxed_offset(m, at);
+	key.midx = m;
+	/*
+	 * The preferred pack sorts first, so determine its identifier by
+	 * looking at the first object in pseudo-pack order.
+	 *
+	 * Note that if no --preferred-pack is explicitly given when writing a
+	 * multi-pack index, then whichever pack has the lowest identifier
+	 * implicitly is preferred (and includes all its objects, since ties are
+	 * broken first by pack identifier).
+	 */
+	key.preferred_pack = nth_midxed_pack_int_id(m, pack_pos_to_midx(m, 0));
+
+	found = bsearch(&key, m->revindex_data, m->num_objects,
+			sizeof(uint32_t), midx_pack_order_cmp);
+
+	if (!found)
+		return error("bad offset for revindex");
+
+	*pos = found - m->revindex_data;
+	return 0;
+}
diff --git a/pack-revindex.h b/pack-revindex.h
index ba7c82c125..479b8f2f9c 100644
--- a/pack-revindex.h
+++ b/pack-revindex.h
@@ -14,6 +14,20 @@
  *
  * - offset: the byte offset within the .pack file at which the object contents
  *   can be found
+ *
+ * The revindex can also be used with a multi-pack index (MIDX). In this
+ * setting:
+ *
+ *   - index position refers to an object's numeric position within the MIDX
+ *
+ *   - pack position refers to an object's position within a non-existent pack
+ *     described by the MIDX. The pack structure is described in
+ *     Documentation/technical/pack-format.txt.
+ *
+ *     It is effectively a concatenation of all packs in the MIDX (ordered by
+ *     their numeric ID within the MIDX, with objects in their original order
+ *     within each pack), removing duplicates, and placing the preferred pack
+ *     (if any) first.
  */
 
 
@@ -24,6 +38,7 @@
 #define GIT_TEST_REV_INDEX_DIE_IN_MEMORY "GIT_TEST_REV_INDEX_DIE_IN_MEMORY"
 
 struct packed_git;
+struct multi_pack_index;
 
 /*
  * load_pack_revindex populates the revindex's internal data-structures for the
@@ -34,6 +49,22 @@ struct packed_git;
  */
 int load_pack_revindex(struct packed_git *p);
 
+/*
+ * load_midx_revindex loads the '.rev' file corresponding to the given
+ * multi-pack index by mmap-ing it and assigning pointers in the
+ * multi_pack_index to point at it.
+ *
+ * A negative number is returned on error.
+ */
+int load_midx_revindex(struct multi_pack_index *m);
+
+/*
+ * Frees resources associated with a multi-pack reverse index.
+ *
+ * A negative number is returned on error.
+ */
+int close_midx_revindex(struct multi_pack_index *m);
+
 /*
  * offset_to_pack_pos converts an object offset to a pack position. This
  * function returns zero on success, and a negative number otherwise. The
@@ -71,4 +102,26 @@ uint32_t pack_pos_to_index(struct packed_git *p, uint32_t pos);
  */
 off_t pack_pos_to_offset(struct packed_git *p, uint32_t pos);
 
+/*
+ * pack_pos_to_midx converts the object at position "pos" within the MIDX
+ * pseudo-pack into a MIDX position.
+ *
+ * If the reverse index has not yet been loaded, or the position is out of
+ * bounds, this function aborts.
+ *
+ * This function runs in constant time.
+ */
+uint32_t pack_pos_to_midx(struct multi_pack_index *m, uint32_t pos);
+
+/*
+ * midx_to_pack_pos converts from the MIDX-relative position at "at" to the
+ * corresponding pack position.
+ *
+ * If the reverse index has not yet been loaded, or the position is out of
+ * bounds, this function aborts.
+ *
+ * This function runs in time O(log N) with the number of objects in the MIDX.
+ */
+int midx_to_pack_pos(struct multi_pack_index *midx, uint32_t at, uint32_t *pos);
+
 #endif
diff --git a/packfile.c b/packfile.c
index 1fec12ac5f..82623e0cb4 100644
--- a/packfile.c
+++ b/packfile.c
@@ -862,6 +862,9 @@ static void prepare_pack(const char *full_name, size_t full_name_len,
 
 	if (!strcmp(file_name, "multi-pack-index"))
 		return;
+	if (starts_with(file_name, "multi-pack-index") &&
+	    ends_with(file_name, ".rev"))
+		return;
 	if (ends_with(file_name, ".idx") ||
 	    ends_with(file_name, ".rev") ||
 	    ends_with(file_name, ".pack") ||
-- 
2.30.0.667.g81c0cbc6fd



* [PATCH v3 14/16] pack-write.c: extract 'write_rev_file_order'
  2021-03-11 17:04 ` [PATCH v3 00/16] midx: implement a multi-pack reverse index Taylor Blau
                     ` (12 preceding siblings ...)
  2021-03-11 17:05   ` [PATCH v3 13/16] pack-revindex: read " Taylor Blau
@ 2021-03-11 17:05   ` Taylor Blau
  2021-03-11 17:05   ` [PATCH v3 15/16] pack-revindex: write multi-pack reverse indexes Taylor Blau
                     ` (3 subsequent siblings)
  17 siblings, 0 replies; 171+ messages in thread
From: Taylor Blau @ 2021-03-11 17:05 UTC (permalink / raw)
  To: git; +Cc: avarab, dstolee, gitster, jonathantanmy, peff

Existing callers provide the reverse index code with an array of 'struct
pack_idx_entry *'s, which is then sorted by pack order (comparing the
offsets of each object within the pack).

Prepare for the multi-pack index to write a .rev file by providing a way
to write the reverse index without an array of pack_idx_entry (which the
MIDX code does not have).

Instead, callers can invoke 'write_rev_file_order()', which takes an
array of uint32_t's. The ith entry in this array specifies the index
position of the ith object in pack order.

Expose this new function for use in a later patch, and rewrite the
existing write_rev_file() in terms of this new function.
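
As a rough illustration of what the body of a .rev file amounts to once the
order is known, here is a minimal sketch that serializes an array of
positions as 4-byte big-endian entries and reads one back. It is not the
code in this patch: 'toy_put_be32()', 'toy_write_positions()' and
'toy_read_position()' are hypothetical names, and the file header, trailer
and hashing are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: store v at out[0..3], most significant byte first. */
static void toy_put_be32(unsigned char *out, uint32_t v)
{
	out[0] = (unsigned char)(v >> 24);
	out[1] = (unsigned char)(v >> 16);
	out[2] = (unsigned char)(v >> 8);
	out[3] = (unsigned char)v;
}

/*
 * Serialize order[0..nr-1] into out as a flat table of 4-byte big-endian
 * entries. The caller must provide at least nr * 4 bytes of space.
 * Returns the number of bytes written.
 */
size_t toy_write_positions(unsigned char *out, const uint32_t *order,
			   uint32_t nr)
{
	uint32_t i;

	assert(out && order);
	for (i = 0; i < nr; i++)
		toy_put_be32(out + (size_t)i * 4, order[i]);
	return (size_t)nr * 4;
}

/* Read entry number pos back out of a table written as above. */
uint32_t toy_read_position(const unsigned char *table, uint32_t pos)
{
	const unsigned char *p = table + (size_t)pos * 4;

	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}
```

Since every entry has the same width, looking up the entry for a given
position is a single fixed-offset read, and the writer only needs the
already-sorted array, not the objects themselves.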

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 pack-write.c | 36 +++++++++++++++++++++++++-----------
 pack.h       |  1 +
 2 files changed, 26 insertions(+), 11 deletions(-)

diff --git a/pack-write.c b/pack-write.c
index 2ca85a9d16..f1fc3ecafa 100644
--- a/pack-write.c
+++ b/pack-write.c
@@ -201,21 +201,12 @@ static void write_rev_header(struct hashfile *f)
 }
 
 static void write_rev_index_positions(struct hashfile *f,
-				      struct pack_idx_entry **objects,
+				      uint32_t *pack_order,
 				      uint32_t nr_objects)
 {
-	uint32_t *pack_order;
 	uint32_t i;
-
-	ALLOC_ARRAY(pack_order, nr_objects);
-	for (i = 0; i < nr_objects; i++)
-		pack_order[i] = i;
-	QSORT_S(pack_order, nr_objects, pack_order_cmp, objects);
-
 	for (i = 0; i < nr_objects; i++)
 		hashwrite_be32(f, pack_order[i]);
-
-	free(pack_order);
 }
 
 static void write_rev_trailer(struct hashfile *f, const unsigned char *hash)
@@ -228,6 +219,29 @@ const char *write_rev_file(const char *rev_name,
 			   uint32_t nr_objects,
 			   const unsigned char *hash,
 			   unsigned flags)
+{
+	uint32_t *pack_order;
+	uint32_t i;
+	const char *ret;
+
+	ALLOC_ARRAY(pack_order, nr_objects);
+	for (i = 0; i < nr_objects; i++)
+		pack_order[i] = i;
+	QSORT_S(pack_order, nr_objects, pack_order_cmp, objects);
+
+	ret = write_rev_file_order(rev_name, pack_order, nr_objects, hash,
+				   flags);
+
+	free(pack_order);
+
+	return ret;
+}
+
+const char *write_rev_file_order(const char *rev_name,
+				 uint32_t *pack_order,
+				 uint32_t nr_objects,
+				 const unsigned char *hash,
+				 unsigned flags)
 {
 	struct hashfile *f;
 	int fd;
@@ -262,7 +276,7 @@ const char *write_rev_file(const char *rev_name,
 
 	write_rev_header(f);
 
-	write_rev_index_positions(f, objects, nr_objects);
+	write_rev_index_positions(f, pack_order, nr_objects);
 	write_rev_trailer(f, hash);
 
 	if (rev_name && adjust_shared_perm(rev_name) < 0)
diff --git a/pack.h b/pack.h
index 857cbd5bd4..fa13954526 100644
--- a/pack.h
+++ b/pack.h
@@ -94,6 +94,7 @@ struct ref;
 void write_promisor_file(const char *promisor_name, struct ref **sought, int nr_sought);
 
 const char *write_rev_file(const char *rev_name, struct pack_idx_entry **objects, uint32_t nr_objects, const unsigned char *hash, unsigned flags);
+const char *write_rev_file_order(const char *rev_name, uint32_t *pack_order, uint32_t nr_objects, const unsigned char *hash, unsigned flags);
 
 /*
  * The "hdr" output buffer should be at least this big, which will handle sizes
-- 
2.30.0.667.g81c0cbc6fd



* [PATCH v3 15/16] pack-revindex: write multi-pack reverse indexes
  2021-03-11 17:04 ` [PATCH v3 00/16] midx: implement a multi-pack reverse index Taylor Blau
                     ` (13 preceding siblings ...)
  2021-03-11 17:05   ` [PATCH v3 14/16] pack-write.c: extract 'write_rev_file_order' Taylor Blau
@ 2021-03-11 17:05   ` Taylor Blau
  2021-03-29 12:53     ` Jeff King
  2021-03-11 17:05   ` [PATCH v3 16/16] midx.c: improve cache locality in midx_pack_order_cmp() Taylor Blau
                     ` (2 subsequent siblings)
  17 siblings, 1 reply; 171+ messages in thread
From: Taylor Blau @ 2021-03-11 17:05 UTC (permalink / raw)
  To: git; +Cc: avarab, dstolee, gitster, jonathantanmy, peff

Implement the writing half of multi-pack reverse indexes. This is
nothing more than the format described a few patches ago, with a new set
of helper functions that will be used to clear out stale .rev files
corresponding to old MIDXs.

Unfortunately, a comparison function very similar to the one implemented
recently in pack-revindex.c is reimplemented here, this time accepting a
MIDX-internal type. An effort to DRY these up would create more
indirection and overhead than is necessary, so it isn't pursued here.

Currently, there are no callers which pass the MIDX_WRITE_REV_INDEX
flag, meaning that this is all dead code. But, that won't be the case
for long, since subsequent patches will introduce the multi-pack bitmap,
which will begin passing this flag.

(In midx.c:write_midx_internal(), the two adjacent if statements share a
conditional, but are written separately since the first one will
eventually also handle the MIDX_WRITE_BITMAP flag, which does not yet
exist.)

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 midx.c | 115 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 midx.h |   1 +
 2 files changed, 116 insertions(+)

diff --git a/midx.c b/midx.c
index 55f4567fca..eea9574d92 100644
--- a/midx.c
+++ b/midx.c
@@ -12,6 +12,7 @@
 #include "run-command.h"
 #include "repository.h"
 #include "chunk-format.h"
+#include "pack.h"
 
 #define MIDX_SIGNATURE 0x4d494458 /* "MIDX" */
 #define MIDX_VERSION 1
@@ -472,6 +473,7 @@ struct write_midx_context {
 	uint32_t entries_nr;
 
 	uint32_t *pack_perm;
+	uint32_t *pack_order;
 	unsigned large_offsets_needed:1;
 	uint32_t num_large_offsets;
 
@@ -826,6 +828,70 @@ static int write_midx_large_offsets(struct hashfile *f,
 	return 0;
 }
 
+static int midx_pack_order_cmp(const void *va, const void *vb, void *_ctx)
+{
+	struct write_midx_context *ctx = _ctx;
+
+	struct pack_midx_entry *a = &ctx->entries[*(const uint32_t *)va];
+	struct pack_midx_entry *b = &ctx->entries[*(const uint32_t *)vb];
+
+	uint32_t perm_a = ctx->pack_perm[a->pack_int_id];
+	uint32_t perm_b = ctx->pack_perm[b->pack_int_id];
+
+	/* Sort objects in the preferred pack ahead of any others. */
+	if (a->preferred > b->preferred)
+		return -1;
+	if (a->preferred < b->preferred)
+		return 1;
+
+	/* Then, order objects by which packs they appear in. */
+	if (perm_a < perm_b)
+		return -1;
+	if (perm_a > perm_b)
+		return 1;
+
+	/* Then, disambiguate by their offset within each pack. */
+	if (a->offset < b->offset)
+		return -1;
+	if (a->offset > b->offset)
+		return 1;
+
+	return 0;
+}
+
+static uint32_t *midx_pack_order(struct write_midx_context *ctx)
+{
+	uint32_t *pack_order;
+	uint32_t i;
+
+	ALLOC_ARRAY(pack_order, ctx->entries_nr);
+	for (i = 0; i < ctx->entries_nr; i++)
+		pack_order[i] = i;
+	QSORT_S(pack_order, ctx->entries_nr, midx_pack_order_cmp, ctx);
+
+	return pack_order;
+}
+
+static void write_midx_reverse_index(char *midx_name, unsigned char *midx_hash,
+				     struct write_midx_context *ctx)
+{
+	struct strbuf buf = STRBUF_INIT;
+	const char *tmp_file;
+
+	strbuf_addf(&buf, "%s-%s.rev", midx_name, hash_to_hex(midx_hash));
+
+	tmp_file = write_rev_file_order(NULL, ctx->pack_order, ctx->entries_nr,
+					midx_hash, WRITE_REV);
+
+	if (finalize_object_file(tmp_file, buf.buf))
+		die(_("cannot store reverse index file"));
+
+	strbuf_release(&buf);
+}
+
+static void clear_midx_files_ext(struct repository *r, const char *ext,
+				 unsigned char *keep_hash);
+
 static int write_midx_internal(const char *object_dir, struct multi_pack_index *m,
 			       struct string_list *packs_to_drop,
 			       const char *preferred_pack_name,
@@ -1011,6 +1077,14 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
 
 	finalize_hashfile(f, midx_hash, CSUM_FSYNC | CSUM_HASH_IN_STREAM);
 	free_chunkfile(cf);
+
+	if (flags & MIDX_WRITE_REV_INDEX)
+		ctx.pack_order = midx_pack_order(&ctx);
+
+	if (flags & MIDX_WRITE_REV_INDEX)
+		write_midx_reverse_index(midx_name, midx_hash, &ctx);
+	clear_midx_files_ext(the_repository, ".rev", midx_hash);
+
 	commit_lock_file(&lk);
 
 cleanup:
@@ -1025,6 +1099,7 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
 	free(ctx.info);
 	free(ctx.entries);
 	free(ctx.pack_perm);
+	free(ctx.pack_order);
 	free(midx_name);
 	return result;
 }
@@ -1037,6 +1112,44 @@ int write_midx_file(const char *object_dir,
 				   flags);
 }
 
+struct clear_midx_data {
+	char *keep;
+	const char *ext;
+};
+
+static void clear_midx_file_ext(const char *full_path, size_t full_path_len,
+				const char *file_name, void *_data)
+{
+	struct clear_midx_data *data = _data;
+
+	if (!(starts_with(file_name, "multi-pack-index-") &&
+	      ends_with(file_name, data->ext)))
+		return;
+	if (data->keep && !strcmp(data->keep, file_name))
+		return;
+
+	if (unlink(full_path))
+		die_errno(_("failed to remove %s"), full_path);
+}
+
+static void clear_midx_files_ext(struct repository *r, const char *ext,
+				 unsigned char *keep_hash)
+{
+	struct clear_midx_data data;
+	memset(&data, 0, sizeof(struct clear_midx_data));
+
+	if (keep_hash)
+		data.keep = xstrfmt("multi-pack-index-%s%s",
+				    hash_to_hex(keep_hash), ext);
+	data.ext = ext;
+
+	for_each_file_in_pack_dir(r->objects->odb->path,
+				  clear_midx_file_ext,
+				  &data);
+
+	free(data.keep);
+}
+
 void clear_midx_file(struct repository *r)
 {
 	char *midx = get_midx_filename(r->objects->odb->path);
@@ -1049,6 +1162,8 @@ void clear_midx_file(struct repository *r)
 	if (remove_path(midx))
 		die(_("failed to clear multi-pack-index at %s"), midx);
 
+	clear_midx_files_ext(r, ".rev", NULL);
+
 	free(midx);
 }
 
diff --git a/midx.h b/midx.h
index 0a8294d2ee..8684cf0fef 100644
--- a/midx.h
+++ b/midx.h
@@ -40,6 +40,7 @@ struct multi_pack_index {
 };
 
 #define MIDX_PROGRESS     (1 << 0)
+#define MIDX_WRITE_REV_INDEX (1 << 1)
 
 char *get_midx_rev_filename(struct multi_pack_index *m);
 
-- 
2.30.0.667.g81c0cbc6fd



* [PATCH v3 16/16] midx.c: improve cache locality in midx_pack_order_cmp()
  2021-03-11 17:04 ` [PATCH v3 00/16] midx: implement a multi-pack reverse index Taylor Blau
                     ` (14 preceding siblings ...)
  2021-03-11 17:05   ` [PATCH v3 15/16] pack-revindex: write multi-pack reverse indexes Taylor Blau
@ 2021-03-11 17:05   ` Taylor Blau
  2021-03-29 12:59     ` Jeff King
  2021-03-12 15:16   ` [PATCH v3 00/16] midx: implement a multi-pack reverse index Derrick Stolee
  2021-03-29 13:05   ` Jeff King
  17 siblings, 1 reply; 171+ messages in thread
From: Taylor Blau @ 2021-03-11 17:05 UTC (permalink / raw)
  To: git; +Cc: avarab, dstolee, gitster, jonathantanmy, peff

From: Jeff King <peff@peff.net>

There is a lot of pointer dereferencing in the pre-image version of
'midx_pack_order_cmp()', which this patch gets rid of.

Instead of comparing the pack preferred-ness and then the pack id, both
of these checks are done at the same time by using the high-order bit of
the pack id to represent whether it's preferred. Then the pack id and
offset are compared as usual.

This produces the same result so long as there are fewer than 2^31 packs,
which seems like a safe assumption to make in practice.
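
As a minimal sketch of the packing trick (not part of the patch itself):
the names 'struct key', 'make_key()' and 'key_cmp()' are hypothetical,
and 'perm' stands in for an entry's ctx->pack_perm[...] value. Folding
"not preferred" into the high bit lets a single unsigned comparison do
the work of the preferred-ness check and the pack-id check together:

```c
#include <stdint.h>

/* Hypothetical stand-in for the patch's midx_pack_order_data. */
struct key {
	uint32_t pack;   /* pack position; high bit set if NOT preferred */
	uint64_t offset; /* object offset within its pack */
};

/* Fold "preferred" into the high bit; assumes perm < 2^31. */
struct key make_key(uint32_t perm, int preferred, uint64_t offset)
{
	struct key k;
	k.pack = perm;
	if (!preferred)
		k.pack |= (1U << 31);
	k.offset = offset;
	return k;
}

/* Negative, zero, or positive, in the style of a qsort() comparator. */
int key_cmp(const struct key *a, const struct key *b)
{
	if (a->pack != b->pack)
		return a->pack < b->pack ? -1 : 1;
	if (a->offset != b->offset)
		return a->offset < b->offset ? -1 : 1;
	return 0;
}
```

With these keys, an object in the preferred pack sorts ahead of every
non-preferred object regardless of pack position, because its high bit
is clear.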

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 midx.c | 55 +++++++++++++++++++++++++++++--------------------------
 1 file changed, 29 insertions(+), 26 deletions(-)

diff --git a/midx.c b/midx.c
index eea9574d92..4835cc13d1 100644
--- a/midx.c
+++ b/midx.c
@@ -828,46 +828,49 @@ static int write_midx_large_offsets(struct hashfile *f,
 	return 0;
 }
 
-static int midx_pack_order_cmp(const void *va, const void *vb, void *_ctx)
+struct midx_pack_order_data {
+	uint32_t nr;
+	uint32_t pack;
+	off_t offset;
+};
+
+static int midx_pack_order_cmp(const void *va, const void *vb)
 {
-	struct write_midx_context *ctx = _ctx;
-
-	struct pack_midx_entry *a = &ctx->entries[*(const uint32_t *)va];
-	struct pack_midx_entry *b = &ctx->entries[*(const uint32_t *)vb];
-
-	uint32_t perm_a = ctx->pack_perm[a->pack_int_id];
-	uint32_t perm_b = ctx->pack_perm[b->pack_int_id];
-
-	/* Sort objects in the preferred pack ahead of any others. */
-	if (a->preferred > b->preferred)
+	const struct midx_pack_order_data *a = va, *b = vb;
+	if (a->pack < b->pack)
 		return -1;
-	if (a->preferred < b->preferred)
+	else if (a->pack > b->pack)
 		return 1;
-
-	/* Then, order objects by which packs they appear in. */
-	if (perm_a < perm_b)
+	else if (a->offset < b->offset)
 		return -1;
-	if (perm_a > perm_b)
+	else if (a->offset > b->offset)
 		return 1;
-
-	/* Then, disambiguate by their offset within each pack. */
-	if (a->offset < b->offset)
-		return -1;
-	if (a->offset > b->offset)
-		return 1;
-
-	return 0;
+	else
+		return 0;
 }
 
 static uint32_t *midx_pack_order(struct write_midx_context *ctx)
 {
+	struct midx_pack_order_data *data;
 	uint32_t *pack_order;
 	uint32_t i;
 
+	ALLOC_ARRAY(data, ctx->entries_nr);
+	for (i = 0; i < ctx->entries_nr; i++) {
+		struct pack_midx_entry *e = &ctx->entries[i];
+		data[i].nr = i;
+		data[i].pack = ctx->pack_perm[e->pack_int_id];
+		if (!e->preferred)
+			data[i].pack |= (1U << 31);
+		data[i].offset = e->offset;
+	}
+
+	QSORT(data, ctx->entries_nr, midx_pack_order_cmp);
+
 	ALLOC_ARRAY(pack_order, ctx->entries_nr);
 	for (i = 0; i < ctx->entries_nr; i++)
-		pack_order[i] = i;
-	QSORT_S(pack_order, ctx->entries_nr, midx_pack_order_cmp, ctx);
+		pack_order[i] = data[i].nr;
+	free(data);
 
 	return pack_order;
 }
-- 
2.30.0.667.g81c0cbc6fd


* Re: [PATCH v3 00/16] midx: implement a multi-pack reverse index
  2021-03-11 17:04 ` [PATCH v3 00/16] midx: implement a multi-pack reverse index Taylor Blau
                     ` (15 preceding siblings ...)
  2021-03-11 17:05   ` [PATCH v3 16/16] midx.c: improve cache locality in midx_pack_order_cmp() Taylor Blau
@ 2021-03-12 15:16   ` Derrick Stolee
  2021-03-29 13:05   ` Jeff King
  17 siblings, 0 replies; 171+ messages in thread
From: Derrick Stolee @ 2021-03-12 15:16 UTC (permalink / raw)
  To: Taylor Blau, git; +Cc: avarab, dstolee, gitster, jonathantanmy, peff

On 3/11/2021 12:04 PM, Taylor Blau wrote:
> Here is another reroll of my series to implement a reverse index in
> preparation for multi-pack reachability bitmaps. The previous version
> was based on 'ds/chunked-file-api', but that topic has since been merged
> to 'master'. This series is now built directly on top of 'master'.
> 
> Not much has changed since last time. Jonathan Tan reviewed the previous
> version, and I incorporated feedback from his review:
> 
>   - The usage macros in builtin/multi-pack-index.c were pulled out and
>     defined separately.
>   - Some sloppiness with converting a signed index referring to the
>     preferred pack into an unsigned value was cleaned up.
>   - Documentation clean-up, particularly in patches 12 and 13.
> 
> There are a couple of new things that we found while testing this out at
> GitHub.
> 
>   - We now call finalize_object_file() on the multi-pack reverse index
>     to set the correct permissions.
>   - Patch 14 removed a stray hunk that introduced a memory leak.
>   - Patch 16 (courtesy of Peff) is new. It improves the cache locality
>     of midx_pack_order_cmp(), which has a substantial impact on
>     repositories with many objects.
> 
> Thanks in advance for your review.

I've reviewed the changes since my last review and this one looks
good, including that new patch from Peff.

Thanks,
-Stolee


* Re: [PATCH v3 01/16] builtin/multi-pack-index.c: inline 'flags' with options
  2021-03-11 17:04   ` [PATCH v3 01/16] builtin/multi-pack-index.c: inline 'flags' with options Taylor Blau
@ 2021-03-29 11:20     ` Jeff King
  0 siblings, 0 replies; 171+ messages in thread
From: Jeff King @ 2021-03-29 11:20 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, avarab, dstolee, gitster, jonathantanmy

On Thu, Mar 11, 2021 at 12:04:36PM -0500, Taylor Blau wrote:

> Subcommands of the 'git multi-pack-index' command (e.g., 'write',
> 'verify', etc.) will want to optionally change a set of shared flags
> that are eventually passed to the MIDX libraries.
> 
> Right now, options and flags are handled separately. Inline them into
> the same structure so that sub-commands can more easily share the
> 'flags' data.

This "opts" struct is kind of funny. It is used to collect the options
in cmd_multi_pack_index(), but nobody ever passes it anywhere! Instead,
we pass individual components of it around.

So I'm not sure I buy "...so that sub-commands can more easily share the
flags data", since either way they are all receiving the individual
flags field already. And your patch 2 could just as easily do the same
simplification by modifying the function-local "flags" variable.

But. I think things get more interesting when you later introduce
common_opts, because now those options have to refer back to actual
storage for each item. Which means that "flags" would have to become a
global variable. And there it's nicer to have all of the options stuffed
into a struct, even if it is a single global struct.

So I think this is the right direction, but it took me a minute to
realize quite why.

-Peff


* Re: [PATCH v3 02/16] builtin/multi-pack-index.c: don't handle 'progress' separately
  2021-03-11 17:04   ` [PATCH v3 02/16] builtin/multi-pack-index.c: don't handle 'progress' separately Taylor Blau
@ 2021-03-29 11:22     ` Jeff King
  0 siblings, 0 replies; 171+ messages in thread
From: Jeff King @ 2021-03-29 11:22 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, avarab, dstolee, gitster, jonathantanmy

On Thu, Mar 11, 2021 at 12:04:40PM -0500, Taylor Blau wrote:

> Now that there is a shared 'flags' member in the options structure,
> there is no need to keep track of whether to force progress or not,
> since ultimately the decision of whether or not to show a progress meter
> is controlled by a bit in the flags member.

Just going back to what I wrote for patch 1, I think this "now that
there is a shared flags..." bit is what misled me.

You can easily have done this patch by just manipulating the local
"flags" variable. And the rationale for the patch is "we can get rid of
opts.progress, because nobody ever reads it except to set a bit in
opts.flags".

Definitely not worth re-rolling or anything; I'm just explaining my
earlier comments. :)

-Peff


* Re: [PATCH v3 04/16] builtin/multi-pack-index.c: split sub-commands
  2021-03-11 17:04   ` [PATCH v3 04/16] builtin/multi-pack-index.c: split sub-commands Taylor Blau
@ 2021-03-29 11:36     ` Jeff King
  2021-03-29 20:38       ` Taylor Blau
  0 siblings, 1 reply; 171+ messages in thread
From: Jeff King @ 2021-03-29 11:36 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, avarab, dstolee, gitster, jonathantanmy

On Thu, Mar 11, 2021 at 12:04:49PM -0500, Taylor Blau wrote:

> Handle sub-commands of the 'git multi-pack-index' builtin (e.g.,
> "write", "repack", etc.) separately from one another. This allows
> sub-commands with unique options, without forcing cmd_multi_pack_index()
> to reject invalid combinations itself.
> 
> This comes at the cost of some duplication and boilerplate. Luckily, the
> duplication is reduced to a minimum, since common options are shared
> among sub-commands due to a suggestion by Ævar. (Sub-commands do have to
> retain the common options, too, since this builtin accepts common
> options on either side of the sub-command).
> 
> Roughly speaking, cmd_multi_pack_index() parses options (including
> common ones), and stops at the first non-option, which is the
> sub-command. It then dispatches to the appropriate sub-command, which
> parses the remaining options (also including common options).
> 
> Unknown options are kept by the sub-commands in order to detect their
> presence (and complain that too many arguments were given).

Makes sense, and the implementation looks pretty clean.

A few small nits:

> +static struct option *add_common_options(struct option *prev)
>  {
> -	static struct option builtin_multi_pack_index_options[] = {
> -		OPT_FILENAME(0, "object-dir", &opts.object_dir,
> -		  N_("object directory containing set of packfile and pack-index pairs")),
> -		OPT_BIT(0, "progress", &opts.flags, N_("force progress reporting"), MIDX_PROGRESS),
> +	struct option *with_common = parse_options_concat(common_opts, prev);
> +	free(prev);
> +	return with_common;
> +}

This free(prev) pattern is copied from builtin/checkout.c, where we have
multiple layers of options, each added by a function. So it requires
that callers duplicate the base set of options, and each subsequent
"add_foo_options()" concatenates that and frees the old one.

But here, we only have one layer, so in the caller which uses it:

> +static int cmd_multi_pack_index_repack(int argc, const char **argv)
> +{
> [..]
> +	options = parse_options_dup(builtin_multi_pack_index_repack_options);
> +	options = add_common_options(options);

we do a rather pointless dup() followed by free(). Perhaps not that big
a deal, and this would naturally extend to adding other option sets, so
it may even be considered future-proofing. But it did confuse me for a
moment.

However, we do end up leaking the return value from add_common_options()
at the end of the function:

> +	options = parse_options_dup(builtin_multi_pack_index_repack_options);
> +	options = add_common_options(options);
> +
> +	argc = parse_options(argc, argv, NULL,
> +			     options,
> +			     builtin_multi_pack_index_repack_usage,
> +			     PARSE_OPT_KEEP_UNKNOWN);
> +	if (argc)
> +		usage_with_options(builtin_multi_pack_index_repack_usage,
> +				   options);
> +
> +	return midx_repack(the_repository, opts.object_dir,
> +			   (size_t)opts.batch_size, opts.flags);
> +}

This is definitely a harmless leak in the sense that we are going to
exit the program after midx_repack() returns anyway. But it might be
worth keeping things tidy, as we've recently seen a renewed effort to do
some leak-checking of the test suite. I _think_ we can just free the
options struct (even though we are still using the values themselves, we
don't care about the "struct options" anymore). But even if not, an
UNLEAK(options) annotation would do it.

(This doesn't apply to the other functions, because they just use
common_opts directly).

-Peff


* Re: [PATCH v3 06/16] builtin/multi-pack-index.c: display usage on unrecognized command
  2021-03-11 17:04   ` [PATCH v3 06/16] builtin/multi-pack-index.c: display usage on unrecognized command Taylor Blau
@ 2021-03-29 11:42     ` Jeff King
  2021-03-29 20:41       ` Taylor Blau
  0 siblings, 1 reply; 171+ messages in thread
From: Jeff King @ 2021-03-29 11:42 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, avarab, dstolee, gitster, jonathantanmy

On Thu, Mar 11, 2021 at 12:04:57PM -0500, Taylor Blau wrote:

> When given a sub-command that it doesn't understand, 'git
> multi-pack-index' dies with the following message:
> 
>     $ git multi-pack-index bogus
>     fatal: unrecognized subcommand: bogus
> 
> Instead of 'die()'-ing, we can display the usage text, which is much
> more helpful:
> 
>     $ git.compile multi-pack-index bogus
>     usage: git multi-pack-index [<options>] write
>        or: git multi-pack-index [<options>] verify
>        or: git multi-pack-index [<options>] expire
>        or: git multi-pack-index [<options>] repack [--batch-size=<size>]
> 
> 	--object-dir <file>   object directory containing set of packfile and pack-index pairs
> 	--progress            force progress reporting
> 
> While we're at it, clean up some duplication between the "no sub-command"
> and "unrecognized sub-command" conditionals.

I agree that it's much nicer to give the usage. But my preference in
general for cases like this is to _also_ explain what we found wrong
with the options we were given.

E.g., with a bogus option, we say so:

  $ git multi-pack-index --foo
  error: unknown option `foo'
  usage: git multi-pack-index [<options>] write [--preferred-pack=<pack>]
  [etc...]

but with a bogus sub-command, we get just the usage string:

  $ git multi-pack-index foo
  usage: git multi-pack-index [<options>] write [--preferred-pack=<pack>]
  [etc...]

Sometimes it is quite obvious what is wrong, but sometimes typos can be
hard to spot, especially because the usage message is so long.

I.e., I'd suggest changing this:

>  	else
> -		die(_("unrecognized subcommand: %s"), argv[0]);
> +usage:
> +		usage_with_options(builtin_multi_pack_index_usage,
> +				   builtin_multi_pack_index_options);

to:

  error(_("unrecognized subcommand: %s"), argv[0]);
  usage_with_options(...);

-Peff


* Re: [PATCH v3 08/16] midx: allow marking a pack as preferred
  2021-03-11 17:05   ` [PATCH v3 08/16] midx: allow marking a pack as preferred Taylor Blau
@ 2021-03-29 12:00     ` Jeff King
  2021-03-29 21:15       ` Taylor Blau
  0 siblings, 1 reply; 171+ messages in thread
From: Jeff King @ 2021-03-29 12:00 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, avarab, dstolee, gitster, jonathantanmy

On Thu, Mar 11, 2021 at 12:05:07PM -0500, Taylor Blau wrote:

> To encourage the pack selection process to prefer one pack over another
> (the pack to be preferred is the one a caller would like to later use as
> a reuse pack), introduce the concept of a "preferred pack". When
> provided, the MIDX code will always prefer an object found in a
> preferred pack over any other.
> 
> No format changes are required to store the preferred pack, since it
> will be able to be inferred with a corresponding MIDX bitmap, by looking
> up the pack associated with the object in the first bit position (this
> ordering is described in detail in a subsequent commit).

I think in the long run we may want to add a midx chunk that gives the
order of the packs (and likewise allow the caller of "midx write" to
specify the exact order), since that may allow correlating locality
between history and object order within the .rev/.bitmap files.

But I think this is a nice stopping point for this series, since we're
not having to introduce any new on-disk formats to do it, and it seems
to give pretty good results in practice. I guess we'll have to support
--preferred-pack forever, but that's OK. Even if we do eventually
support arbitrary orderings, it's just a simple subset of that
functionality.

>  static int cmd_multi_pack_index_write(int argc, const char **argv)
>  {
> -	struct option *options = common_opts;
> +	struct option *options;
> +	static struct option builtin_multi_pack_index_write_options[] = {
> +		OPT_STRING(0, "preferred-pack", &opts.preferred_pack,
> +			   N_("preferred-pack"),
> +			   N_("pack for reuse when computing a multi-pack bitmap")),
> +		OPT_END(),
> +	};
> +
> +	options = parse_options_dup(builtin_multi_pack_index_write_options);
> +	options = add_common_options(options);
>  
>  	trace2_cmd_mode(argv[0]);
>  
> @@ -74,7 +85,8 @@ static int cmd_multi_pack_index_write(int argc, const char **argv)
>  		usage_with_options(builtin_multi_pack_index_write_usage,
>  				   options);
>  
> -	return write_midx_file(opts.object_dir, opts.flags);
> +	return write_midx_file(opts.object_dir, opts.preferred_pack,
> +			       opts.flags);
>  }

This has the same leak of "options" that I mentioned in the earlier
patch.

> diff --git a/midx.c b/midx.c
> index 971faa8cfc..46f55ff6cf 100644
> --- a/midx.c
> +++ b/midx.c
> @@ -431,6 +431,24 @@ static int pack_info_compare(const void *_a, const void *_b)
>  	return strcmp(a->pack_name, b->pack_name);
>  }
>  
> +static int lookup_idx_or_pack_name(struct pack_info *info,
> +				   uint32_t nr,
> +				   const char *pack_name)
> +{
> +	uint32_t lo = 0, hi = nr;
> +	while (lo < hi) {
> +		uint32_t mi = lo + (hi - lo) / 2;
> +		int cmp = cmp_idx_or_pack_name(pack_name, info[mi].pack_name);
> +		if (cmp < 0)
> +			hi = mi;
> +		else if (cmp > 0)
> +			lo = mi + 1;
> +		else
> +			return mi;
> +	}
> +	return -1;
> +}

Could this just be replaced with bsearch() in the caller?
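
A minimal sketch of what that might look like, under some assumptions:
'struct pack_info_sketch' is a trimmed-down, hypothetical stand-in for
the real pack_info, and plain strcmp() stands in for
cmp_idx_or_pack_name(), which the real code would need to keep using.
The array must already be sorted by the same comparison.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical, trimmed-down stand-in for midx.c's struct pack_info. */
struct pack_info_sketch {
	const char *pack_name;
};

/* bsearch() passes the key first, then a pointer to an array element. */
int pack_name_cmp(const void *key, const void *elem)
{
	const char *name = key;
	const struct pack_info_sketch *info = elem;
	/* Assumption: strcmp() stands in for cmp_idx_or_pack_name(). */
	return strcmp(name, info->pack_name);
}

/* Returns the index of the matching pack, or -1 if there is none. */
int lookup_pack(const struct pack_info_sketch *info, size_t nr,
		const char *pack_name)
{
	const struct pack_info_sketch *found =
		bsearch(pack_name, info, nr, sizeof(*info), pack_name_cmp);
	return found ? (int)(found - info) : -1;
}
```

The caller still gets an index back (recovered by pointer subtraction),
so the -1 "not found" convention of the hand-rolled loop is preserved.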

> +test_expect_success 'midx picks objects from preferred pack' '
> +	test_when_finished rm -rf preferred.git &&
> +	git init --bare preferred.git &&
> +	(
> +		cd preferred.git &&
> +
> +		a=$(echo "a" | git hash-object -w --stdin) &&
> +		b=$(echo "b" | git hash-object -w --stdin) &&
> +		c=$(echo "c" | git hash-object -w --stdin) &&
> +
> +		# Set up two packs, duplicating the object "B" at different
> +		# offsets.
> +		git pack-objects objects/pack/test-AB <<-EOF &&
> +		$a
> +		$b
> +		EOF
> +		bc=$(git pack-objects objects/pack/test-BC <<-EOF
> +		$b
> +		$c
> +		EOF
> +		) &&

I don't think pack-objects guarantees that the pack ordering matches the
input it received. compute_write_order() uses a variety of heuristics to
reorder things. I think this will work in practice with the current
code, because the objects have the same type, there are no deltas, and
the fallback ordering is input-order (or traversal order, if --revs is
used).

So it's probably OK in practice, though if we wanted to be paranoid we
could check that show-index produces different results for the $b entry
of both packs. That said...

> +		git multi-pack-index --object-dir=objects \
> +			write --preferred-pack=test-BC-$bc.idx 2>err &&
> +		test_must_be_empty err &&
> +
> +		ofs=$(git show-index <objects/pack/test-BC-$bc.idx | grep $b |
> +			cut -d" " -f1) &&
> +		midx_expect_object_offset $b $ofs objects
> +	)

...what we really care about is that the object came from BC. And we are
just using the offset as a proxy for that. But doesn't "test-tool
read-midx" give us the actual pack name? We could just be checking that.

I also wondered if we should confirm that without the --preferred-pack
option, we choose the other pack. I think it will always be true because
the default order is to sort them lexically. A comment to that effect
might be worth it (near the "set up two packs" comment).

-Peff


* Re: [PATCH v3 12/16] Documentation/technical: describe multi-pack reverse indexes
  2021-03-11 17:05   ` [PATCH v3 12/16] Documentation/technical: describe multi-pack reverse indexes Taylor Blau
@ 2021-03-29 12:12     ` Jeff King
  2021-03-29 21:22       ` Taylor Blau
  0 siblings, 1 reply; 171+ messages in thread
From: Jeff King @ 2021-03-29 12:12 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, avarab, dstolee, gitster, jonathantanmy

On Thu, Mar 11, 2021 at 12:05:25PM -0500, Taylor Blau wrote:

> As a prerequisite to implementing multi-pack bitmaps, motivate and
> describe the format and ordering of the multi-pack reverse index.

Nicely written overall. I found a few typos / formatting issues.

> +One solution is to let bits occupy the same position in the oid-sorted
> +index stored by the MIDX. But because oids are effectively random, there

s/there/their/

> +Given the list of packs and their counts of objects, you can
> +na&iuml;vely reconstruct that pseudo-pack ordering (e.g., the object at

An HTML entity seems to have snuck in. The source is utf8, so we can
just say ï.

> +position 27 must be (c,1) because packs "a" and "b" consumed 25 of the
> +slots). But there's a catch. Objects may be duplicated between packs, in
> +which case the MIDX only stores one pointer to the object (and thus we'd
> +want only one slot in the bitmap).
> +
> +Callers could handle duplicates themselves by reading objects in order
> +of their bit-position, but that's linear in the number of objects, and
> +much too expensive for ordinary bitmap lookups. Building a reverse index
> +solves this, since it is the logical inverse of the index, and that
> +index has already removed duplicates. But, building a reverse index on
> +the fly can be expensive. Since we already have an on-disk format for
> +pack-based reverse indexes, let's reuse it for the MIDX's pseudo-pack,
> +too.

Yep, I think this nicely builds up the logic explaining the need for the
midx .rev file.

> +Objects from the MIDX are ordered as follows to string together the
> +pseudo-pack. Let _pack(o)_ return the pack from which _o_ was selected
> +by the MIDX, and define an ordering of packs based on their numeric ID
> +(as stored by the MIDX). Let _offset(o)_ return the object offset of _o_
> +within _pack(o)_. Then, compare _o~1~_ and _o~2~_ as follows:

I guess the asciidoc-formatted version of this makes these nicely
italicized and subscripted. Personally I think pack(o) and o1 would be
more readable in the source (which is what I would tend to read). Or
maybe backticks if you want to be fancy.

> +  - If _pack(o~1~) &ne; pack(o~2~)_, then sort the two objects in
> +    descending order based on the pack ID.
> +
> +  - Otherwise, _pack(o~1~) &equals; pack(o~2~)_, and the objects are
> +    sorted in pack-order (i.e., _o~1~_ sorts ahead of _o~2~_ exactly
> +    when _offset(o~1~) &lt; offset(o~2~)_).

A few more HTML bits in the comparison operators.

-Peff


* Re: [PATCH v3 13/16] pack-revindex: read multi-pack reverse indexes
  2021-03-11 17:05   ` [PATCH v3 13/16] pack-revindex: read " Taylor Blau
@ 2021-03-29 12:43     ` Jeff King
  2021-03-29 21:27       ` Taylor Blau
  0 siblings, 1 reply; 171+ messages in thread
From: Jeff King @ 2021-03-29 12:43 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, avarab, dstolee, gitster, jonathantanmy

On Thu, Mar 11, 2021 at 12:05:29PM -0500, Taylor Blau wrote:

> Implement reading for multi-pack reverse indexes, as described in the
> previous patch.

Looks good overall. I found a few tiny nits below.

> +int load_midx_revindex(struct multi_pack_index *m)
> +{
> +	char *revindex_name;
> +	int ret;
> +	if (m->revindex_data)
> +		return 0;
> +
> +	revindex_name = get_midx_rev_filename(m);
> +
> +	ret = load_revindex_from_disk(revindex_name,
> +				      m->num_objects,
> +				      &m->revindex_map,
> +				      &m->revindex_len);
> +	if (ret)
> +		goto cleanup;

On error, I wondered if m->revindex_map, etc, would be modified. But it
looks like no, load_revindex_from_disk() is careful not to touch them
unless it sees a valid revindex. Good.

> +int close_midx_revindex(struct multi_pack_index *m)
> +{
> +	if (!m)
> +		return 0;
> +
> +	if (munmap((void*)m->revindex_map, m->revindex_len))
> +		return -1;
> +
> +	m->revindex_map = NULL;
> +	m->revindex_data = NULL;
> +	m->revindex_len = 0;
> +
> +	return 0;
> +}

It's hard to imagine why munmap() would fail. But if it does, we should
probably clear the struct fields anyway. I note that the matching code
for a "struct packed_git" does not bother even checking the return value
of munmap. Perhaps we should just do the same here.

The packed_git version also returned early if revindex_map is NULL. Here
the burden is placed on the caller (it's hard to tell if that matters
since there aren't any callers yet, but it probably makes sense to push
the check down into this function).
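
Roughly, and only as a sketch: 'struct midx_sketch' below is a
hypothetical stand-in carrying just the three fields involved, and
'close_revindex_sketch()' is not a function in the series. It returns
early when nothing is mapped, ignores munmap()'s return value, and
always clears the fields:

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/* Hypothetical stand-in holding only the revindex-related fields. */
struct midx_sketch {
	const uint32_t *revindex_map;
	const uint32_t *revindex_data;
	size_t revindex_len;
};

int close_revindex_sketch(struct midx_sketch *m)
{
	/* Nothing to do if there is no MIDX or no mapping. */
	if (!m || !m->revindex_map)
		return 0;

	/* Return value deliberately ignored; fields are cleared either way. */
	munmap((void *)m->revindex_map, m->revindex_len);

	m->revindex_map = NULL;
	m->revindex_data = NULL;
	m->revindex_len = 0;

	return 0;
}
```

Calling it twice in a row is then harmless, since the second call sees
a NULL map and returns immediately.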

> +uint32_t pack_pos_to_midx(struct multi_pack_index *m, uint32_t pos)
> +{
> +	if (!m->revindex_data)
> +		BUG("pack_pos_to_midx: reverse index not yet loaded");
> +	if (m->num_objects <= pos)
> +		BUG("pack_pos_to_midx: out-of-bounds object at %"PRIu32, pos);
> +	return get_be32((const char *)m->revindex_data + (pos * sizeof(uint32_t)));
> +}

OK, this one is just a direct read of the .rev data, like
pack_pos_to_index() is. I think the final line can be simplified to:

  return get_be32(m->revindex_data + pos);

just like pack_pos_to_index(). (I suspect this is a leftover from the
earlier version of your .rev series where the pointer was still a "void
*").
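
To illustrate why the two forms are equivalent, here is a small
self-contained sketch. 'read_be32()' is a local stand-in for git's
get_be32(), and the two 'entry_by_*()' helpers are hypothetical names;
the only assumption is that the data pointer has type 'const uint32_t *':

```c
#include <stdint.h>

/* Local stand-in for git's get_be32(): read a big-endian 32-bit value. */
uint32_t read_be32(const void *ptr)
{
	const unsigned char *p = ptr;
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Byte-pointer form, as in the patch: scale the position by hand. */
uint32_t entry_by_bytes(const uint32_t *data, uint32_t pos)
{
	return read_be32((const char *)data + (pos * sizeof(uint32_t)));
}

/* Typed-pointer form: uint32_t pointer arithmetic does the scaling. */
uint32_t entry_by_index(const uint32_t *data, uint32_t pos)
{
	return read_be32(data + pos);
}
```

Both helpers compute the same address, so the shorter form loses
nothing once the pointer is no longer a "void *".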

> +int midx_to_pack_pos(struct multi_pack_index *m, uint32_t at, uint32_t *pos)
> +{
> +	struct midx_pack_key key;
> +	uint32_t *found;
> +
> +	if (!m->revindex_data)
> +		BUG("midx_to_pack_pos: reverse index not yet loaded");
> +	if (m->num_objects <= at)
> +		BUG("midx_to_pack_pos: out-of-bounds object at %"PRIu32, at);
> +
> +	key.pack = nth_midxed_pack_int_id(m, at);
> +	key.offset = nth_midxed_offset(m, at);
> +	key.midx = m;
> +	/*
> +	 * The preferred pack sorts first, so determine its identifier by
> +	 * looking at the first object in pseudo-pack order.
> +	 *
> +	 * Note that if no --preferred-pack is explicitly given when writing a
> +	 * multi-pack index, then whichever pack has the lowest identifier
> +	 * implicitly is preferred (and includes all its objects, since ties are
> +	 * broken first by pack identifier).
> +	 */
> +	key.preferred_pack = nth_midxed_pack_int_id(m, pack_pos_to_midx(m, 0));
> +
> +	found = bsearch(&key, m->revindex_data, m->num_objects,
> +			sizeof(uint32_t), midx_pack_order_cmp);

OK, this one is _roughly_ equivalent to offset_to_pack_pos(), in that we
have to binary search within the pack-ordered list to find the entry.
Makes sense.

Probably sizeof(*m->revindex_data) would be slightly nicer in the
bsearch call (again, I suspect a holdover from when that was a void
pointer).

-Peff

* Re: [PATCH v3 15/16] pack-revindex: write multi-pack reverse indexes
  2021-03-11 17:05   ` [PATCH v3 15/16] pack-revindex: write multi-pack reverse indexes Taylor Blau
@ 2021-03-29 12:53     ` Jeff King
  2021-03-29 21:30       ` Taylor Blau
  0 siblings, 1 reply; 171+ messages in thread
From: Jeff King @ 2021-03-29 12:53 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, avarab, dstolee, gitster, jonathantanmy

On Thu, Mar 11, 2021 at 12:05:38PM -0500, Taylor Blau wrote:

> Implement the writing half of multi-pack reverse indexes. This is
> nothing more than the format described a few patches ago, with a new set
> of helper functions that will be used to clear out stale .rev files
> corresponding to old MIDXs.

Looks good.

> +struct clear_midx_data {
> +	char *keep;
> +	const char *ext;
> +};
> +
> +static void clear_midx_file_ext(const char *full_path, size_t full_path_len,
> +				const char *file_name, void *_data)

This will clean up _any_ stale midx .rev file. So even if we miss one
when writing a new midx (due to a bug, race, power loss, etc), we'll
catch it later.

We _might_ want to also teach various tempfile-cleanup code run by gc to
likewise look for unattached midx .rev files, but I don't think we
necessarily have to do it now.

>  void clear_midx_file(struct repository *r)
>  {
>  	char *midx = get_midx_filename(r->objects->odb->path);
> @@ -1049,6 +1162,8 @@ void clear_midx_file(struct repository *r)
>  	if (remove_path(midx))
>  		die(_("failed to clear multi-pack-index at %s"), midx);
>  
> +	clear_midx_files_ext(r, ".rev", NULL);
> +
>  	free(midx);

The sole caller now doesn't pass the "keep" hash, so we'd always delete
all of them. I guess we'll see that change once somebody starts actually
writing them.

-Peff

* Re: [PATCH v3 16/16] midx.c: improve cache locality in midx_pack_order_cmp()
  2021-03-11 17:05   ` [PATCH v3 16/16] midx.c: improve cache locality in midx_pack_order_cmp() Taylor Blau
@ 2021-03-29 12:59     ` Jeff King
  2021-03-29 21:34       ` Taylor Blau
  0 siblings, 1 reply; 171+ messages in thread
From: Jeff King @ 2021-03-29 12:59 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, avarab, dstolee, gitster, jonathantanmy

On Thu, Mar 11, 2021 at 12:05:42PM -0500, Taylor Blau wrote:

> From: Jeff King <peff@peff.net>
> 
> There is a lot of pointer dereferencing in the pre-image version of
> 'midx_pack_order_cmp()', which this patch gets rid of.
> 
> Instead of comparing the pack preferred-ness and then the pack id, both
> of these checks are done at the same time by using the high-order bit of
> the pack id to represent whether it's preferred. Then the pack id and
> offset are compared as usual.
> 
> This produces the same result so long as there are less than 2^31 packs,
> which seems like a likely assumption to make in practice.

Obviously this patch is brilliant. ;)
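
A rough sketch of the trick described in the commit message, under
stated assumptions: the struct, the field names, and the helpers below
are hypothetical rather than the code in midx.c, and the sketch assumes
the high bit is set on entries that are NOT preferred, so that preferred
entries sort first under a plain unsigned comparison:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical sort record; field names are illustrative, not Git's. */
struct order_entry {
	uint32_t pack;   /* pack id, with bit 31 set when NOT preferred */
	uint64_t offset; /* object offset within its pack */
};

/* Fold preferred-ness into the pack id so one comparison covers both. */
static uint32_t encode_pack(uint32_t pack_id, int preferred)
{
	/* Only valid while there are fewer than 2^31 packs. */
	assert(pack_id < (UINT32_C(1) << 31));
	return preferred ? pack_id : (pack_id | (UINT32_C(1) << 31));
}

/* Compare by (preferred-ness + pack id) in one step, then by offset. */
static int order_cmp(const void *va, const void *vb)
{
	const struct order_entry *a = va, *b = vb;
	if (a->pack != b->pack)
		return a->pack < b->pack ? -1 : 1;
	if (a->offset != b->offset)
		return a->offset < b->offset ? -1 : 1;
	return 0;
}

static void sort_entries(struct order_entry *entries, size_t nr)
{
	qsort(entries, nr, sizeof(*entries), order_cmp);
}
```

Because each record carries everything the comparator needs, sorting
touches only the array being sorted and follows no pointers into other
structures.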

Did we record any numbers to show the improvement here? I don't think it
can be demonstrated with this series (since most of the code is dead),
but I recall that this was motivated by a noticeable slowdown.

I briefly wondered whether the complicated midx_pack_order_cmp() in
pack-revindex.c, which is used for the bsearch() there, could benefit
from the same speedup. It's only log(n), of course, instead of n*log(n),
but one might imagine making "n" calls to it. I don't think it makes
sense, though. The pointer dereferencing there is into the midx mmap
itself. Creating an auxiliary array would defeat the purpose.

-Peff

* Re: [PATCH v3 00/16] midx: implement a multi-pack reverse index
  2021-03-11 17:04 ` [PATCH v3 00/16] midx: implement a multi-pack reverse index Taylor Blau
                     ` (16 preceding siblings ...)
  2021-03-12 15:16   ` [PATCH v3 00/16] midx: implement a multi-pack reverse index Derrick Stolee
@ 2021-03-29 13:05   ` Jeff King
  2021-03-29 21:30     ` Junio C Hamano
  2021-03-29 21:37     ` Taylor Blau
  17 siblings, 2 replies; 171+ messages in thread
From: Jeff King @ 2021-03-29 13:05 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, avarab, dstolee, gitster, jonathantanmy

On Thu, Mar 11, 2021 at 12:04:31PM -0500, Taylor Blau wrote:

> Here is another reroll of my series to implement a reverse index in
> preparation for multi-pack reachability bitmaps. The previous version
> was based on 'ds/chunked-file-api', but that topic has since been merged
> to 'master'. This series is now built directly on top of 'master'.

I gave the whole thing another careful read. Most of what I found were
small nits, but enough that I think one more re-roll is worth it.

The biggest question is what we want to happen next. As you note, the
concept of a midx .rev file is useless until we have the matching
.bitmap file. So we _could_ let this sit in next while the dependent
bitmap topic is reviewed, and then merge them down together. But I'm
inclined to treat this as an independent topic that can get merged to
master on its own, since the early cleanups are valuable on their own,
and the .rev parts at the end, even if dead, won't hurt anything.

If we did want to break it up, the useful line would be after "allow
marking a pack as preferred" (while it is mostly intended for the bitmap
selection, it is theoretically useful on its own to make it more likely
to find a copy of an object with a useful delta).

-Peff

* Re: [PATCH v3 04/16] builtin/multi-pack-index.c: split sub-commands
  2021-03-29 11:36     ` Jeff King
@ 2021-03-29 20:38       ` Taylor Blau
  2021-03-30  7:04         ` Jeff King
  0 siblings, 1 reply; 171+ messages in thread
From: Taylor Blau @ 2021-03-29 20:38 UTC (permalink / raw)
  To: Jeff King; +Cc: git, avarab, dstolee, gitster, jonathantanmy

On Mon, Mar 29, 2021 at 07:36:21AM -0400, Jeff King wrote:
> This is definitely a harmless leak in the sense that we are going to
> exit the program after midx_repack() returns anyway. But it might be
> worth keeping things tidy, as we've recently seen a renewed effort to do
> some leak-checking of the test suite. I _think_ we can just free the
> options struct (even though we are still using the values themselves, we
> don't care about the "struct options" anymore). But even if not, an
> UNLEAK(options) annotation would do it.

I see what you're saying. Let me make sure that I got the right idea in
mind after reading your email. I'm thinking of squashing the following
diff into this patch. For what it's worth, it causes 'valgrind
--leak-check=full ./git-multi-pack-index repack' to exit cleanly (when
it didn't before).

Does this match your expectations?

--- >8 ---

diff --git a/builtin/multi-pack-index.c b/builtin/multi-pack-index.c
index 23e51dfeb4..a78640c061 100644
--- a/builtin/multi-pack-index.c
+++ b/builtin/multi-pack-index.c
@@ -56,9 +56,7 @@ static struct option common_opts[] = {

 static struct option *add_common_options(struct option *prev)
 {
-	struct option *with_common = parse_options_concat(common_opts, prev);
-	free(prev);
-	return with_common;
+	return parse_options_concat(common_opts, prev);
 }

 static int cmd_multi_pack_index_write(int argc, const char **argv)
@@ -112,8 +110,7 @@ static int cmd_multi_pack_index_repack(int argc, const char **argv)
 		OPT_END(),
 	};

-	options = parse_options_dup(builtin_multi_pack_index_repack_options);
-	options = add_common_options(options);
+	options = add_common_options(builtin_multi_pack_index_repack_options);

 	argc = parse_options(argc, argv, NULL,
 			     options,
@@ -123,6 +120,8 @@ static int cmd_multi_pack_index_repack(int argc, const char **argv)
 		usage_with_options(builtin_multi_pack_index_repack_usage,
 				   options);

+	FREE_AND_NULL(options);
+
 	return midx_repack(the_repository, opts.object_dir,
 			   (size_t)opts.batch_size, opts.flags);
 }

* Re: [PATCH v3 06/16] builtin/multi-pack-index.c: display usage on unrecognized command
  2021-03-29 11:42     ` Jeff King
@ 2021-03-29 20:41       ` Taylor Blau
  0 siblings, 0 replies; 171+ messages in thread
From: Taylor Blau @ 2021-03-29 20:41 UTC (permalink / raw)
  To: Jeff King; +Cc: git, avarab, dstolee, gitster, jonathantanmy

On Mon, Mar 29, 2021 at 07:42:18AM -0400, Jeff King wrote:
> I.e., I'd suggest changing this:
>
> >  	else
> > -		die(_("unrecognized subcommand: %s"), argv[0]);
> > +usage:
> > +		usage_with_options(builtin_multi_pack_index_usage,
> > +				   builtin_multi_pack_index_options);
>
> to:
>
>   error(_("unrecognized subcommand: %s"), argv[0]);
>   usage_with_options(...);

Thanks, that's a helpful suggestion (and pretty easy to change, too).

Thanks,
Taylor

* Re: [PATCH v3 08/16] midx: allow marking a pack as preferred
  2021-03-29 12:00     ` Jeff King
@ 2021-03-29 21:15       ` Taylor Blau
  2021-03-30  7:11         ` Jeff King
  0 siblings, 1 reply; 171+ messages in thread
From: Taylor Blau @ 2021-03-29 21:15 UTC (permalink / raw)
  To: Jeff King; +Cc: git, avarab, dstolee, gitster, jonathantanmy

On Mon, Mar 29, 2021 at 08:00:59AM -0400, Jeff King wrote:
> I think in the long run we may want to add a midx chunk that gives the
> order of the packs (and likewise allow the caller of "midx write" to
> specify the exact order), since that may allow correlating locality
> between history and object order within the .rev/.bitmap files.
>
> But I think this is a nice stopping point for this series, since we're
> not having to introduce any new on-disk formats to do it, and it seems
> to give pretty good results in practice. I guess we'll have to support
> --preferred-pack forever, but that's OK. Even if we do eventually
> support arbitrary orderings, it's just a simple subset of that
> functionality.

To add a little bit of extra detail, I think what you're getting at here
is that it would be nice to let the order of the packs be dictated by
mtime, not the order they appear in the MIDX (which is lexicographic by
their hash, and thus effectively random).

The reason is the same one you pointed out in

    https://lore.kernel.org/git/YDRdmh8oS5%2Fxq4rB@coredump.intra.peff.net/

which is that it effectively would lay objects out from newest to
oldest.

But, there's a problem, which is that the MIDX doesn't store the packs'
mtimes. That's fine for writing, since we can just look that information
up ourselves. But the reading side can get broken. That's because the
reader also has to know the pack order to go from MIDX- to bit-position.

So if a third party goes and touches some of the packs after the .rev
file was written, then the reader is going to think the packs ought to
appear in a different order than they actually do. So relying on having
to look up the mtimes again later on isn't good enough.

There are two solutions to the problem:

  - You could write the mtimes in the MIDX itself. This would give you a
    single point of reference, and resolve the TOCTOU race I just
    described.

  - Or, you could forget about mtimes entirely and let the MIDX dictate
    the pack ordering itself. That resolves the race in a
    similar-but-different way.

Of the two, I prefer the latter, but I think it introduces functionality
that we don't necessarily need yet. That's because the objects within
the packs are still ordered as such, and so the compression we get in
the packs is just as good as it is for single-pack bitmaps. It's only at
the objects between pack boundaries that any runs of 1s or 0s might be
interrupted, but there are far fewer pack boundaries than objects, so it
doesn't seem to matter in practice.

Anyway, I think that you know all of that already (mostly because we
thought aloud together when I originally brought this up), but I figure
that this detail may be interesting for other readers, too.

> > @@ -74,7 +85,8 @@ static int cmd_multi_pack_index_write(int argc, const char **argv)
> >  		usage_with_options(builtin_multi_pack_index_write_usage,
> >  				   options);
> >
> > -	return write_midx_file(opts.object_dir, opts.flags);
> > +	return write_midx_file(opts.object_dir, opts.preferred_pack,
> > +			       opts.flags);
> >  }
>
> This has the same leak of "options" that I mentioned in the earlier
> patch.

Yup, thanks for pointing it out.

> > diff --git a/midx.c b/midx.c
> > index 971faa8cfc..46f55ff6cf 100644
> > --- a/midx.c
> > +++ b/midx.c
> > @@ -431,6 +431,24 @@ static int pack_info_compare(const void *_a, const void *_b)
> >  	return strcmp(a->pack_name, b->pack_name);
> >  }
> >
> > +static int lookup_idx_or_pack_name(struct pack_info *info,
> > +				   uint32_t nr,
> > +				   const char *pack_name)
> > +{
> > +	uint32_t lo = 0, hi = nr;
> > +	while (lo < hi) {
> > +		uint32_t mi = lo + (hi - lo) / 2;
> > +		int cmp = cmp_idx_or_pack_name(pack_name, info[mi].pack_name);
> > +		if (cmp < 0)
> > +			hi = mi;
> > +		else if (cmp > 0)
> > +			lo = mi + 1;
> > +		else
> > +			return mi;
> > +	}
> > +	return -1;
> > +}
>
> Could this just be replaced with bsearch() in the caller?

Great suggestion. Yes, it can be. FWIW, I think that I may have
originally thought that it couldn't be since we were comparing a fixed
string to an array of structs (each having a field which holds the value
we actually want to compare). But bsearch() always passes the key as the
first argument to the comparator, so this is possible to do.
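
A minimal sketch of that shape, for illustration only: "struct
pack_entry" is a hypothetical, trimmed-down stand-in for midx.c's
"struct pack_info", and plain strcmp() is used where the patch calls
cmp_idx_or_pack_name(), so the ".idx"-versus-".pack" handling is left
out:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, trimmed-down stand-in for midx.c's struct pack_info. */
struct pack_entry {
	const char *pack_name;
	unsigned id;
};

/* bsearch() always passes the key first and the array element second. */
static int name_vs_entry_cmp(const void *key, const void *elem)
{
	const char *name = key;
	const struct pack_entry *entry = elem;
	return strcmp(name, entry->pack_name);
}

/* Returns the index of the matching entry, or -1 if there is none. */
static int lookup_pack(const struct pack_entry *info, size_t nr,
		       const char *name)
{
	const struct pack_entry *found;

	assert(name);
	found = bsearch(name, info, nr, sizeof(*info), name_vs_entry_cmp);
	return found ? (int)(found - info) : -1;
}
```

The array must already be sorted by pack_name under the same ordering
the comparator uses, which is the case here because the caller has
sorted it with pack_info_compare().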

> > +		git multi-pack-index --object-dir=objects \
> > +			write --preferred-pack=test-BC-$bc.idx 2>err &&
> > +		test_must_be_empty err &&
> > +
> > +		ofs=$(git show-index <objects/pack/test-BC-$bc.idx | grep $b |
> > +			cut -d" " -f1) &&
> > +		midx_expect_object_offset $b $ofs objects
> > +	)
>
> ...what we really care about is that the object came from BC. And we are
> just using the offset as a proxy for that. But doesn't "test-tool
> read-midx" give us the actual pack name? We could just be checking that.
>
> I also wondered if we should confirm that without the --preferred-pack
> option, we choose the other pack. I think it will always be true because
> the default order is to sort them lexically. A comment to that effect
> might be worth it (near the "set up two packs" comment).

Both great points, thanks.

> -Peff

Thanks,
Taylor

* Re: [PATCH v3 12/16] Documentation/technical: describe multi-pack reverse indexes
  2021-03-29 12:12     ` Jeff King
@ 2021-03-29 21:22       ` Taylor Blau
  0 siblings, 0 replies; 171+ messages in thread
From: Taylor Blau @ 2021-03-29 21:22 UTC (permalink / raw)
  To: Jeff King; +Cc: git, avarab, dstolee, gitster, jonathantanmy

On Mon, Mar 29, 2021 at 08:12:39AM -0400, Jeff King wrote:
> On Thu, Mar 11, 2021 at 12:05:25PM -0500, Taylor Blau wrote:
>
> > As a prerequisite to implementing multi-pack bitmaps, motivate and
> > describe the format and ordering of the multi-pack reverse index.
>
> Nicely written overall. I found a few typos / formatting issues.

Thanks for the attention to detail. Everything you wrote makes sense to
me (including a quite-embarrassing mistake to switch "their" with
"there").

Thanks,
Taylor

* Re: [PATCH v3 13/16] pack-revindex: read multi-pack reverse indexes
  2021-03-29 12:43     ` Jeff King
@ 2021-03-29 21:27       ` Taylor Blau
  0 siblings, 0 replies; 171+ messages in thread
From: Taylor Blau @ 2021-03-29 21:27 UTC (permalink / raw)
  To: Jeff King; +Cc: git, avarab, dstolee, gitster, jonathantanmy

On Mon, Mar 29, 2021 at 08:43:39AM -0400, Jeff King wrote:
> On Thu, Mar 11, 2021 at 12:05:29PM -0500, Taylor Blau wrote:
>
> > Implement reading for multi-pack reverse indexes, as described in the
> > previous patch.
>
> Looks good overall. I found a few tiny nits below.
>
> > +int load_midx_revindex(struct multi_pack_index *m)
> > +{
> > +	char *revindex_name;
> > +	int ret;
> > +	if (m->revindex_data)
> > +		return 0;
> > +
> > +	revindex_name = get_midx_rev_filename(m);
> > +
> > +	ret = load_revindex_from_disk(revindex_name,
> > +				      m->num_objects,
> > +				      &m->revindex_map,
> > +				      &m->revindex_len);
> > +	if (ret)
> > +		goto cleanup;
>
> On error, I wondered if m->revindex_map, etc, would be modified. But it
> looks like no, load_revindex_from_disk() is careful not to touch them
> unless it sees a valid revindex. Good.

Yep, that was intentional. Thanks.

> > +int close_midx_revindex(struct multi_pack_index *m)
> > +{
> > +	if (!m)
> > +		return 0;
> > +
> > +	if (munmap((void*)m->revindex_map, m->revindex_len))
> > +		return -1;
> > +
> > +	m->revindex_map = NULL;
> > +	m->revindex_data = NULL;
> > +	m->revindex_len = 0;
> > +
> > +	return 0;
> > +}
>
> It's hard to imagine why munmap() would fail. But if it does, we should
> probably clear the struct fields anyway. I note that the matching code
> for a "struct packed_git" does not bother even checking the return value
> of munmap. Perhaps we should just do the same here.

I tend to agree that we should match the behavior of
"packfile.c:close_pack_revindex()" and just not check the return value
of munmap. Either the call to munmap() worked, and we shouldn't be
reading revindex_map anymore, or it didn't, and something else is
probably wrong enough with the original mmap call that we probably also
shouldn't be reading it.

> The packed_git version also returned early if revindex_map is NULL. Here
> the burden is placed on the caller (it's hard to tell if that matters
> since there aren't any callers yet, but it probably makes sense to push
> the check down into this function).

Yeah, I think that that function actually is doing the worst of both
worlds (which is to check p->revindex_map, but not p itself).

I modified the MIDX version to check both m and m->revindex_map (but I
agree it's hard to tell with the caller coming in a later series).

>
> > +uint32_t pack_pos_to_midx(struct multi_pack_index *m, uint32_t pos)
> > +{
> > +	if (!m->revindex_data)
> > +		BUG("pack_pos_to_midx: reverse index not yet loaded");
> > +	if (m->num_objects <= pos)
> > +		BUG("pack_pos_to_midx: out-of-bounds object at %"PRIu32, pos);
> > +	return get_be32((const char *)m->revindex_data + (pos * sizeof(uint32_t)));
> > +}
>
> OK, this one is just a direct read of the .rev data, like
> pack_pos_to_index() is. I think the final line can be simplified to:
>
>   return get_be32(m->revindex_data + pos);
>
> just like pack_pos_to_index(). (I suspect this is a leftover from the
> earlier version of your .rev series where the pointer was still a "void
> *").

Yes, definitely.

> Probably sizeof(*m->revindex_data) would be slightly nicer in the
> bsearch call (again, I suspect a holdover from when that was a void
> pointer).

Yes, exactly.

Thanks,
Taylor

* Re: [PATCH v3 00/16] midx: implement a multi-pack reverse index
  2021-03-29 13:05   ` Jeff King
@ 2021-03-29 21:30     ` Junio C Hamano
  2021-03-29 21:37     ` Taylor Blau
  1 sibling, 0 replies; 171+ messages in thread
From: Junio C Hamano @ 2021-03-29 21:30 UTC (permalink / raw)
  To: Jeff King; +Cc: Taylor Blau, git, avarab, dstolee, jonathantanmy

Jeff King <peff@peff.net> writes:

> On Thu, Mar 11, 2021 at 12:04:31PM -0500, Taylor Blau wrote:
>
>> Here is another reroll of my series to implement a reverse index in
>> preparation for multi-pack reachability bitmaps. The previous version
>> was based on 'ds/chunked-file-api', but that topic has since been merged
>> to 'master'. This series is now built directly on top of 'master'.
>
> I gave the whole thing another careful read. Most of what I found were
> small nits, but enough that I think one more re-roll is worth it.

Thanks.

> The biggest question is what we want to happen next. As you note, the
> concept of a midx .rev file is useless until we have the matching
> .bitmap file. So we _could_ let this sit in next while the dependent
> bitmap topic is reviewed, and then merge them down together. But I'm
> inclined to treat this as an independent topic that can get merged to
> master on its own, since the early cleanups are valuable on their own,
> and the .rev parts at the end, even if dead, won't hurt anything.

It was my impression as well that the early clean-ups are worthwhile on
their own.

* Re: [PATCH v3 15/16] pack-revindex: write multi-pack reverse indexes
  2021-03-29 12:53     ` Jeff King
@ 2021-03-29 21:30       ` Taylor Blau
  0 siblings, 0 replies; 171+ messages in thread
From: Taylor Blau @ 2021-03-29 21:30 UTC (permalink / raw)
  To: Jeff King; +Cc: git, avarab, dstolee, gitster, jonathantanmy

On Mon, Mar 29, 2021 at 08:53:22AM -0400, Jeff King wrote:
> On Thu, Mar 11, 2021 at 12:05:38PM -0500, Taylor Blau wrote:
>
> > Implement the writing half of multi-pack reverse indexes. This is
> > nothing more than the format described a few patches ago, with a new set
> > of helper functions that will be used to clear out stale .rev files
> > corresponding to old MIDXs.
>
> Looks good.
>
> > +struct clear_midx_data {
> > +	char *keep;
> > +	const char *ext;
> > +};
> > +
> > +static void clear_midx_file_ext(const char *full_path, size_t full_path_len,
> > +				const char *file_name, void *_data)
>
> This will clean up _any_ stale midx .rev file. So even if we miss one
> when writing a new midx (due to a bug, race, power loss, etc), we'll
> catch it later.
>
> We _might_ want to also teach various tempfile-cleanup code run by gc to
> likewise look for unattached midx .rev files, but I don't think we
> necessarily have to do it now.

Agreed there on both counts.

> >  void clear_midx_file(struct repository *r)
> >  {
> >  	char *midx = get_midx_filename(r->objects->odb->path);
> > @@ -1049,6 +1162,8 @@ void clear_midx_file(struct repository *r)
> >  	if (remove_path(midx))
> >  		die(_("failed to clear multi-pack-index at %s"), midx);
> >
> > +	clear_midx_files_ext(r, ".rev", NULL);
> > +
> >  	free(midx);
>
> The sole caller now doesn't pass the "keep" hash, so we'd always delete
> all of them. I guess we'll see that change once somebody starts actually
> writing them.

That's right. I hope that the benefits of splitting the MIDX bitmaps
topic into two series has generally outweighed the drawbacks, but in
instances like these it can be kind of annoying.

Thanks,
Taylor

* Re: [PATCH v3 16/16] midx.c: improve cache locality in midx_pack_order_cmp()
  2021-03-29 12:59     ` Jeff King
@ 2021-03-29 21:34       ` Taylor Blau
  2021-03-30  7:15         ` Jeff King
  0 siblings, 1 reply; 171+ messages in thread
From: Taylor Blau @ 2021-03-29 21:34 UTC (permalink / raw)
  To: Jeff King; +Cc: git, avarab, dstolee, gitster, jonathantanmy

On Mon, Mar 29, 2021 at 08:59:12AM -0400, Jeff King wrote:
> On Thu, Mar 11, 2021 at 12:05:42PM -0500, Taylor Blau wrote:
>
> > From: Jeff King <peff@peff.net>
> >
> > There is a lot of pointer dereferencing in the pre-image version of
> > 'midx_pack_order_cmp()', which this patch gets rid of.
> >
> > Instead of comparing the pack preferred-ness and then the pack id, both
> > of these checks are done at the same time by using the high-order bit of
> > the pack id to represent whether it's preferred. Then the pack id and
> > offset are compared as usual.
> >
> > This produces the same result so long as there are less than 2^31 packs,
> > which seems like a likely assumption to make in practice.
>
> Obviously this patch is brilliant. ;)

Obviously.

> Did we record any numbers to show the improvement here? I don't think it
> can be demonstrated with this series (since most of the code is dead),
> but I recall that this was motivated by a noticeable slowdown.

Looking through our messages, you wrote that this seemed to produce a
.8 second speed-up on a large-ish repository that we were testing.
That's not significant overall, but the fact that we were spending so
long there probably caught our attention when looking at a profiler.

I could go either way on mentioning it. It does feel a little like
cheating to say, "well, if you applied these other patches it would make
it about this much faster". So I'm mostly happy to just keep it vague
and say that it makes things a little faster, unless you feel strongly
otherwise.

> I briefly wondered whether the complicated midx_pack_order_cmp() in
> pack-revindex.c, which is used for the bsearch() there, could benefit
> from the same speedup. It's only log(n), of course, instead of n*log(n),
> but one might imagine making "n" calls to it. I don't think it makes
> sense, though. The pointer dereferencing there is into the midx mmap
> itself. Creating an auxiliary array would defeat the purpose.

Right.

> -Peff

Thanks,
Taylor

* Re: [PATCH v3 00/16] midx: implement a multi-pack reverse index
  2021-03-29 13:05   ` Jeff King
  2021-03-29 21:30     ` Junio C Hamano
@ 2021-03-29 21:37     ` Taylor Blau
  2021-03-30  7:15       ` Jeff King
  1 sibling, 1 reply; 171+ messages in thread
From: Taylor Blau @ 2021-03-29 21:37 UTC (permalink / raw)
  To: Jeff King; +Cc: git, avarab, dstolee, gitster, jonathantanmy

On Mon, Mar 29, 2021 at 09:05:33AM -0400, Jeff King wrote:
> On Thu, Mar 11, 2021 at 12:04:31PM -0500, Taylor Blau wrote:
>
> > Here is another reroll of my series to implement a reverse index in
> > preparation for multi-pack reachability bitmaps. The previous version
> > was based on 'ds/chunked-file-api', but that topic has since been merged
> > to 'master'. This series is now built directly on top of 'master'.
>
> I gave the whole thing another careful read. Most of what I found were
> small nits, but enough that I think one more re-roll is worth it.

Thanks. I agree that another re-roll is worth it. I have one prepared
locally, and I just had one outstanding question in:

    https://lore.kernel.org/git/YGI6ySogGoYZi66A@nand.local/

that I'll wait on your reply to before sending a reroll.

> The biggest question is what we want to happen next. As you note, the
> concept of a midx .rev file is useless until we have the matching
> .bitmap file. So we _could_ let this sit in next while the dependent
> bitmap topic is reviewed, and then merge them down together. But I'm
> inclined to treat this as an independent topic that can get merged to
> master on its own, since the early cleanups are valuable on their own,
> and the .rev parts at the end, even if dead, won't hurt anything.

That matches what I was hoping for. I think the clean-ups are worth it
on their own, but I also think it's a good idea to take the whole
series, since it means there's one less long-running branch in flight
while we review the MIDX bitmaps topic.

(FWIW, I can also see an argument in the other direction along the lines
of "we may discover something later on that requires us to change the
way multi-pack .rev files work". I think that such an outcome is fairly
unlikely, but worth considering anyway).

Thanks,
Taylor

* Re: [PATCH v3 04/16] builtin/multi-pack-index.c: split sub-commands
  2021-03-29 20:38       ` Taylor Blau
@ 2021-03-30  7:04         ` Jeff King
  0 siblings, 0 replies; 171+ messages in thread
From: Jeff King @ 2021-03-30  7:04 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, avarab, dstolee, gitster, jonathantanmy

On Mon, Mar 29, 2021 at 04:38:33PM -0400, Taylor Blau wrote:

> On Mon, Mar 29, 2021 at 07:36:21AM -0400, Jeff King wrote:
> > This is definitely a harmless leak in the sense that we are going to
> > exit the program after midx_repack() returns anyway. But it might be
> > worth keeping things tidy, as we've recently seen a renewed effort to do
> > some leak-checking of the test suite. I _think_ we can just free the
> > options struct (even though we are still using the values themselves, we
> > don't care about the "struct options" anymore). But even if not, an
> > UNLEAK(options) annotation would do it.
> 
> I see what you're saying. Let me make sure that I got the right idea in
> mind after reading your email. I'm thinking of squashing the following
> diff into this patch. For what it's worth, it causes 'valgrind
> --leak-check=full ./git-multi-pack-index repack' to exit cleanly (when
> it didn't before).
> 
> Does this match your expectations?

Yes, though...

> diff --git a/builtin/multi-pack-index.c b/builtin/multi-pack-index.c
> index 23e51dfeb4..a78640c061 100644
> --- a/builtin/multi-pack-index.c
> +++ b/builtin/multi-pack-index.c
> @@ -56,9 +56,7 @@ static struct option common_opts[] = {
> 
>  static struct option *add_common_options(struct option *prev)
>  {
> -	struct option *with_common = parse_options_concat(common_opts, prev);
> -	free(prev);
> -	return with_common;
> +	return parse_options_concat(common_opts, prev);
>  }

This simplification is orthogonal to the leak, and I'd be OK if you
wanted to retain it as it was before (because it future-proofs against
adding more add_foo_options() later, though for now it is a useless
dup/free pair).

> @@ -123,6 +120,8 @@ static int cmd_multi_pack_index_repack(int argc, const char **argv)
>  		usage_with_options(builtin_multi_pack_index_repack_usage,
>  				   options);
> 
> +	FREE_AND_NULL(options);
> +

And this is the leak fix I care about. We'd want the same thing in the
later caller that adds another use of add_common_options(), of course.
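
To illustrate why freeing the table is safe even though the parsed
values are still in use, here is a small standalone sketch. Everything
in it is hypothetical ("struct opt_sketch", concat_opts(), and
run_repack_sketch() are not the parse-options API); the macro is
redefined locally, on the assumption that it matches the free-then-NULL
behavior of Git's FREE_AND_NULL():

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Local stand-in, assumed to behave like Git's FREE_AND_NULL(). */
#define FREE_AND_NULL_SKETCH(p) do { free(p); (p) = NULL; } while (0)

/* Hypothetical option record: a name plus a pointer to where it stores. */
struct opt_sketch {
	const char *name;
	int *value;
};

/* Hypothetical concat: returns a heap-allocated copy of a followed by b. */
static struct opt_sketch *concat_opts(const struct opt_sketch *a, size_t na,
				      const struct opt_sketch *b, size_t nb)
{
	struct opt_sketch *out = malloc((na + nb) * sizeof(*out));
	if (!out)
		return NULL;
	memcpy(out, a, na * sizeof(*out));
	memcpy(out + na, b, nb * sizeof(*out));
	return out;
}

static int run_repack_sketch(void)
{
	int batch_size = 0, progress = 0;
	const struct opt_sketch common[] = { { "progress", &progress } };
	const struct opt_sketch repack[] = { { "batch-size", &batch_size } };
	struct opt_sketch *options = concat_opts(common, 1, repack, 1);

	if (!options)
		return -1;

	*options[1].value = 42; /* pretend parsing stored a value */

	/* The table is no longer needed; the values live in the locals. */
	FREE_AND_NULL_SKETCH(options);
	assert(!options);

	return batch_size;
}
```

The table only holds pointers to where values get stored, so once
parsing is done it can be released without touching the values.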

-Peff

* Re: [PATCH v3 08/16] midx: allow marking a pack as preferred
  2021-03-29 21:15       ` Taylor Blau
@ 2021-03-30  7:11         ` Jeff King
  0 siblings, 0 replies; 171+ messages in thread
From: Jeff King @ 2021-03-30  7:11 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, avarab, dstolee, gitster, jonathantanmy

On Mon, Mar 29, 2021 at 05:15:12PM -0400, Taylor Blau wrote:

> There are two solutions to the problem:
> 
>   - You could write the mtimes in the MIDX itself. This would give you a
>     single point of reference, and resolve the TOCTOU race I just
>     described.
> 
>   - Or, you could forget about mtimes entirely and let the MIDX dictate
>     the pack ordering itself. That resolves the race in a
>     similar-but-different way.
> 
> Of the two, I prefer the latter, but I think it introduces functionality
> that we don't necessarily need yet.

Yeah, I'd strongly favor the latter over the former. The reason to go
with the solution you have in this series is that it doesn't require
changing anything in the on-disk midx format, and we think it is good
enough. But once we are going to change the on-disk format, we might as
well give the writing side as much flexibility as possible.

Of course the mtimes themselves are really just numbers, so in a sense
the two are really equivalent. ;)

> That's because the objects within
> the packs are still ordered as such, and so the compression we get in
> the packs is just as good as it is for single-pack bitmaps. It's only at
> the objects between pack boundaries that any runs of 1s or 0s might be
> interrupted, but there are far fewer pack boundaries than objects, so it
> doesn't seem to matter in practice.

Right. The absolute worst case is a large number of single-object packs,
in which case the bitmap order becomes essentially random with respect
to history (because it would be sorted by sha1 of the packs).

The effect _might_ be measurable in more real-world cases, like say one
big pack and 100 pushes each with a handful of commits. The big pack
would be in good shape, but you have a lot of extra pack boundaries that
hurt the bitmap compression.
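
As a rough way to picture this, here is a small sketch that counts runs of
equal bits, on the assumption that a run-length-style encoding does better
with fewer runs. count_runs() is a hypothetical helper for illustration
only; it is not how git measures or encodes bitmaps.

```c
#include <stddef.h>

/*
 * Count maximal runs of equal values in 'bits' (one bit per byte, for
 * simplicity). Fewer runs means a run-length-style encoding has less
 * to store.
 */
static size_t count_runs(const unsigned char *bits, size_t n)
{
	size_t runs = 0, i;

	for (i = 0; i < n; i++)
		if (i == 0 || bits[i] != bits[i - 1])
			runs++;
	return runs;
}
```

A bitmap whose set bits are clustered (one big pack in history order)
yields a couple of runs, while the same number of set bits scattered
across many tiny packs yields close to one run per bit.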

But in practice, generating bitmaps is expensive enough that you'd
probably want to roll up some of the packs anyway (and that is certainly
what we are doing at GitHub, using your "repack --geometric"). So you'd
usually end up with one big pack representing most of history, and then a
handful of roll-up packs.

So I'm a little curious whether one could even measure the impact of,
say, 100 little packs. But not enough to even run the experiment,
because even that is not a case that is really that interesting.

> Anyway, I think that you know all of that already (mostly because we
> thought aloud together when I originally brought this up), but I figure
> that this detail may be interesting for other readers, too.

Indeed. And I know that you know everything I just wrote, but I agree
it's nice to get a record of these discussions onto the list. :)

-Peff

* Re: [PATCH v3 00/16] midx: implement a multi-pack reverse index
  2021-03-29 21:37     ` Taylor Blau
@ 2021-03-30  7:15       ` Jeff King
  2021-03-30 13:37         ` Taylor Blau
  0 siblings, 1 reply; 171+ messages in thread
From: Jeff King @ 2021-03-30  7:15 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, avarab, dstolee, gitster, jonathantanmy

On Mon, Mar 29, 2021 at 05:37:01PM -0400, Taylor Blau wrote:

> > The biggest question is what we want to happen next. As you note, the
> > concept of a midx .rev file is useless until we have the matching
> > .bitmap file. So we _could_ let this sit in next while the dependent
> > bitmap topic is reviewed, and then merge them down together. But I'm
> > inclined to treat this as an independent topic that can get merged to
> > master on its own, since the early cleanups are valuable on their own,
> > and the .rev parts at the end, even if dead, won't hurt anything.
> 
> That matches what I was hoping for. I think the clean-ups are worth it
> on their own, but I also think it's a good idea to take the whole
> series, since it means there's one less long-running branch in flight
> while we review the MIDX bitmaps topic.
> 
> (FWIW, I can also see an argument in the other direction along the lines
> of "we may discover something later on that requires us to change the
> way multi-pack .rev files work". I think that such an outcome is fairly
> unlikely, but worth considering anyway).

That would be my general worry, too, but in this case I am not too
concerned because I know the code has received substantial exercise
already on real-world production servers. So while we may clean up some
cosmetic bits or respond to review as it goes upstream, I'm much less
worried about seeing some brown-paper-bag bug that would be sufficient
to make us want to re-roll these .rev commits. And hopefully the
existing rounds have addressed the cosmetic/review bits.

-Peff

* Re: [PATCH v3 16/16] midx.c: improve cache locality in midx_pack_order_cmp()
  2021-03-29 21:34       ` Taylor Blau
@ 2021-03-30  7:15         ` Jeff King
  0 siblings, 0 replies; 171+ messages in thread
From: Jeff King @ 2021-03-30  7:15 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, avarab, dstolee, gitster, jonathantanmy

On Mon, Mar 29, 2021 at 05:34:21PM -0400, Taylor Blau wrote:

> > Did we record any numbers to show the improvement here? I don't think it
> > can be demonstrated with this series (since most of the code is dead),
> > but I recall that this was motivated by a noticeable slowdown.
> 
> Looking through our messages, you wrote that this seemed to produce a
> .8 second speed-up on a large-ish repository that we were testing.
> That's not significant overall, but the fact that we were spending so
> long there probably caught our attention when looking at a profiler.

That sounds about right from my recollection.

> I could go either way on mentioning it. It does feel a little like
> cheating to say, "well, if you applied these other patches it would make
> it about this much faster". So I'm mostly happy to just keep it vague
> and say that it makes things a little faster, unless you feel strongly
> otherwise.

No, I don't feel strongly. I just wanted to give people reading a sense
of what to expect. Now we have.

-Peff

* Re: [PATCH v3 00/16] midx: implement a multi-pack reverse index
  2021-03-30  7:15       ` Jeff King
@ 2021-03-30 13:37         ` Taylor Blau
  0 siblings, 0 replies; 171+ messages in thread
From: Taylor Blau @ 2021-03-30 13:37 UTC (permalink / raw)
  To: Jeff King; +Cc: Taylor Blau, git, avarab, dstolee, gitster, jonathantanmy

On Tue, Mar 30, 2021 at 03:15:02AM -0400, Jeff King wrote:
> On Mon, Mar 29, 2021 at 05:37:01PM -0400, Taylor Blau wrote:
> > (FWIW, I can also see an argument in the other direction along the lines
> > of "we may discover something later on that requires us to change the
> > way multi-pack .rev files work". I think that such an outcome is fairly
> > unlikely, but worth considering anyway).
>
> That would be my general worry, too, but in this case I am not too
> concerned because I know the code has received substantial exercise
> already on real-world production servers. So while we may clean up some
> cosmetic bits or respond to review as it goes upstream, I'm much less
> worried about seeing some brown-paper-bag bug that would be sufficient
> to make us want to re-roll these .rev commits. And hopefully the
> existing rounds have addressed the cosmetic/review bits.

Yes. Another benefit is that it should give us substantial confidence in
the correctness not just of this topic, but of the multi-pack bitmaps
that are built on top, too.

Thanks,
Taylor

* [PATCH v4 00/16] midx: implement a multi-pack reverse index
  2021-02-10 23:02 [PATCH 0/9] midx: implement a multi-pack reverse index Taylor Blau
                   ` (12 preceding siblings ...)
  2021-03-11 17:04 ` [PATCH v3 00/16] midx: implement a multi-pack reverse index Taylor Blau
@ 2021-03-30 15:03 ` Taylor Blau
  2021-03-30 15:03   ` [PATCH v4 01/16] builtin/multi-pack-index.c: inline 'flags' with options Taylor Blau
                     ` (16 more replies)
  13 siblings, 17 replies; 171+ messages in thread
From: Taylor Blau @ 2021-03-30 15:03 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, dstolee, jonathantanmy

Here is another reroll of my series to implement a reverse index in
preparation for multi-pack reachability bitmaps.

This reroll differs from v3 only in the feedback I incorporated from Peff's
review. The changes are mostly cosmetic; the most substantial one is that the
--preferred-pack code now uses bsearch() to locate the name of the preferred
pack (instead of implementing a binary search itself).
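
For readers unfamiliar with the pattern, here is a minimal sketch of using
bsearch() with a key whose type differs from the array elements. It uses
plain strcmp() and a made-up 'struct pack_entry'; the series itself compares
through cmp_idx_or_pack_name() against 'struct pack_info', so treat the
names below as illustrative assumptions.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical, trimmed-down stand-in for the series' 'struct pack_info'. */
struct pack_entry {
	const char *pack_name;
	unsigned orig_id;
};

/*
 * bsearch() always passes the key as the first argument and an array
 * element as the second, so the two may have different types.
 */
static int name_vs_entry_cmp(const void *key, const void *elem)
{
	const char *name = key;
	const struct pack_entry *entry = elem;

	return strcmp(name, entry->pack_name);
}

/* 'packs' must already be sorted by pack_name. Returns NULL if absent. */
static const struct pack_entry *find_pack(const struct pack_entry *packs,
					  size_t nr, const char *name)
{
	return bsearch(name, packs, nr, sizeof(*packs), name_vs_entry_cmp);
}
```

Returning a pointer to the element (or NULL) also removes the need for a
signed index just to signal "not found".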

I think that this version is ready to go. I would hope that it can head for
'master' and avoid sitting in 'next' forever (since it has some worthwhile
cleanups outside of preparing for MIDX bitmaps).

But in either case, this is the last prereq series before MIDX bitmaps, which
I'll send shortly (based on this one).

Jeff King (1):
  midx.c: improve cache locality in midx_pack_order_cmp()

Taylor Blau (15):
  builtin/multi-pack-index.c: inline 'flags' with options
  builtin/multi-pack-index.c: don't handle 'progress' separately
  builtin/multi-pack-index.c: define common usage with a macro
  builtin/multi-pack-index.c: split sub-commands
  builtin/multi-pack-index.c: don't enter bogus cmd_mode
  builtin/multi-pack-index.c: display usage on unrecognized command
  t/helper/test-read-midx.c: add '--show-objects'
  midx: allow marking a pack as preferred
  midx: don't free midx_name early
  midx: keep track of the checksum
  midx: make some functions non-static
  Documentation/technical: describe multi-pack reverse indexes
  pack-revindex: read multi-pack reverse indexes
  pack-write.c: extract 'write_rev_file_order'
  pack-revindex: write multi-pack reverse indexes

 Documentation/git-multi-pack-index.txt       |  14 +-
 Documentation/technical/multi-pack-index.txt |   5 +-
 Documentation/technical/pack-format.txt      |  83 +++++++
 builtin/multi-pack-index.c                   | 182 ++++++++++++---
 builtin/repack.c                             |   2 +-
 midx.c                                       | 219 +++++++++++++++++--
 midx.h                                       |  11 +-
 pack-revindex.c                              | 126 +++++++++++
 pack-revindex.h                              |  53 +++++
 pack-write.c                                 |  36 ++-
 pack.h                                       |   1 +
 packfile.c                                   |   3 +
 t/helper/test-read-midx.c                    |  24 +-
 t/t5319-multi-pack-index.sh                  |  43 ++++
 14 files changed, 734 insertions(+), 68 deletions(-)

Range-diff against v3:
 1:  43fc0ad276 !  1:  90e021725f builtin/multi-pack-index.c: inline 'flags' with options
    @@ Commit message
         'verify', etc.) will want to optionally change a set of shared flags
         that are eventually passed to the MIDX libraries.
     
    -    Right now, options and flags are handled separately. Inline them into
    -    the same structure so that sub-commands can more easily share the
    -    'flags' data.
    +    Right now, options and flags are handled separately. That's fine, since
    +    the options structure is never passed around. But a future patch will
    +    make it so that common options shared by all sub-commands are defined in
    +    a common location. That means that "flags" would have to become a global
    +    variable.
    +
    +    Group it with the options structure so that we reduce the number of
    +    global variables we have overall.
     
         Signed-off-by: Taylor Blau <me@ttaylorr.com>
     
 2:  181f11e4c5 =  2:  130c191b80 builtin/multi-pack-index.c: don't handle 'progress' separately
 3:  94c498f0e2 =  3:  5a274b9096 builtin/multi-pack-index.c: define common usage with a macro
 4:  d084f90466 !  4:  b8c89cc239 builtin/multi-pack-index.c: split sub-commands
    @@ builtin/multi-pack-index.c: static struct opts_multi_pack_index {
     -		OPT_FILENAME(0, "object-dir", &opts.object_dir,
     -		  N_("object directory containing set of packfile and pack-index pairs")),
     -		OPT_BIT(0, "progress", &opts.flags, N_("force progress reporting"), MIDX_PROGRESS),
    -+	struct option *with_common = parse_options_concat(common_opts, prev);
    -+	free(prev);
    -+	return with_common;
    ++	return parse_options_concat(common_opts, prev);
     +}
     +
     +static int cmd_multi_pack_index_write(int argc, const char **argv)
    @@ builtin/multi-pack-index.c: static struct opts_multi_pack_index {
      		OPT_END(),
      	};
      
    -+	options = parse_options_dup(builtin_multi_pack_index_repack_options);
    -+	options = add_common_options(options);
    ++	options = add_common_options(builtin_multi_pack_index_repack_options);
     +
     +	argc = parse_options(argc, argv, NULL,
     +			     options,
    @@ builtin/multi-pack-index.c: static struct opts_multi_pack_index {
     +		usage_with_options(builtin_multi_pack_index_repack_usage,
     +				   options);
     +
    ++	FREE_AND_NULL(options);
    ++
     +	return midx_repack(the_repository, opts.object_dir,
     +			   (size_t)opts.batch_size, opts.flags);
     +}
 5:  bc3b6837f2 !  5:  d817920e2a builtin/multi-pack-index.c: don't enter bogus cmd_mode
    @@ builtin/multi-pack-index.c: static int cmd_multi_pack_index_expire(int argc, con
      			     options, builtin_multi_pack_index_expire_usage,
      			     PARSE_OPT_KEEP_UNKNOWN);
     @@ builtin/multi-pack-index.c: static int cmd_multi_pack_index_repack(int argc, const char **argv)
    - 	options = parse_options_dup(builtin_multi_pack_index_repack_options);
    - 	options = add_common_options(options);
    + 
    + 	options = add_common_options(builtin_multi_pack_index_repack_options);
      
     +	trace2_cmd_mode(argv[0]);
     +
 6:  f117e442c3 !  6:  604a02ce85 builtin/multi-pack-index.c: display usage on unrecognized command
    @@ Commit message
         more helpful:
     
             $ git.compile multi-pack-index bogus
    +        error: unrecognized subcommand: bogus
             usage: git multi-pack-index [<options>] write
                or: git multi-pack-index [<options>] verify
                or: git multi-pack-index [<options>] expire
    @@ builtin/multi-pack-index.c: int cmd_multi_pack_index(int argc, const char **argv
      	if (!strcmp(argv[0], "repack"))
      		return cmd_multi_pack_index_repack(argc, argv);
     @@ builtin/multi-pack-index.c: int cmd_multi_pack_index(int argc, const char **argv,
    + 		return cmd_multi_pack_index_verify(argc, argv);
      	else if (!strcmp(argv[0], "expire"))
      		return cmd_multi_pack_index_expire(argc, argv);
    - 	else
    +-	else
     -		die(_("unrecognized subcommand: %s"), argv[0]);
    ++	else {
     +usage:
    ++		error(_("unrecognized subcommand: %s"), argv[0]);
     +		usage_with_options(builtin_multi_pack_index_usage,
     +				   builtin_multi_pack_index_options);
    ++	}
      }
 7:  ae85a68ef2 =  7:  37e073ea27 t/helper/test-read-midx.c: add '--show-objects'
 8:  30194a6786 !  8:  d061828e7e midx: allow marking a pack as preferred
    @@ builtin/multi-pack-index.c: static struct option *add_common_options(struct opti
     +		OPT_END(),
     +	};
     +
    -+	options = parse_options_dup(builtin_multi_pack_index_write_options);
    -+	options = add_common_options(options);
    ++	options = add_common_options(builtin_multi_pack_index_write_options);
      
      	trace2_cmd_mode(argv[0]);
      
    @@ builtin/multi-pack-index.c: static int cmd_multi_pack_index_write(int argc, cons
      				   options);
      
     -	return write_midx_file(opts.object_dir, opts.flags);
    ++	FREE_AND_NULL(options);
    ++
     +	return write_midx_file(opts.object_dir, opts.preferred_pack,
     +			       opts.flags);
      }
    @@ midx.c: static int pack_info_compare(const void *_a, const void *_b)
      	return strcmp(a->pack_name, b->pack_name);
      }
      
    -+static int lookup_idx_or_pack_name(struct pack_info *info,
    -+				   uint32_t nr,
    -+				   const char *pack_name)
    ++static int idx_or_pack_name_cmp(const void *_va, const void *_vb)
     +{
    -+	uint32_t lo = 0, hi = nr;
    -+	while (lo < hi) {
    -+		uint32_t mi = lo + (hi - lo) / 2;
    -+		int cmp = cmp_idx_or_pack_name(pack_name, info[mi].pack_name);
    -+		if (cmp < 0)
    -+			hi = mi;
    -+		else if (cmp > 0)
    -+			lo = mi + 1;
    -+		else
    -+			return mi;
    -+	}
    -+	return -1;
    ++	const char *pack_name = _va;
    ++	const struct pack_info *compar = _vb;
    ++
    ++	return cmp_idx_or_pack_name(pack_name, compar->pack_name);
     +}
     +
      struct write_midx_context {
    @@ midx.c: static int write_midx_internal(const char *object_dir, struct multi_pack
      
     +	/* Check that the preferred pack wasn't expired (if given). */
     +	if (preferred_pack_name) {
    -+		int preferred_idx = lookup_idx_or_pack_name(ctx.info,
    -+							    ctx.nr,
    -+							    preferred_pack_name);
    -+		if (preferred_idx < 0)
    ++		struct pack_info *preferred = bsearch(preferred_pack_name,
    ++						      ctx.info, ctx.nr,
    ++						      sizeof(*ctx.info),
    ++						      idx_or_pack_name_cmp);
    ++
    ++		if (!preferred)
     +			warning(_("unknown preferred pack: '%s'"),
     +				preferred_pack_name);
     +		else {
    -+			uint32_t orig = ctx.info[preferred_idx].orig_pack_int_id;
    -+			uint32_t perm = ctx.pack_perm[orig];
    -+
    ++			uint32_t perm = ctx.pack_perm[preferred->orig_pack_int_id];
     +			if (perm == PACK_EXPIRED)
     +				warning(_("preferred pack '%s' is expired"),
     +					preferred_pack_name);
    @@ midx.h: int fill_midx_entry(struct repository *r, const struct object_id *oid, s
      int expire_midx_packs(struct repository *r, const char *object_dir, unsigned flags);
     
      ## t/t5319-multi-pack-index.sh ##
    -@@ t/t5319-multi-pack-index.sh: midx_read_expect () {
    - 	test_cmp expect actual
    - }
    - 
    -+midx_expect_object_offset () {
    -+	OID="$1"
    -+	OFFSET="$2"
    -+	OBJECT_DIR="$3"
    -+	test-tool read-midx --show-objects $OBJECT_DIR >actual &&
    -+	grep "^$OID $OFFSET" actual
    -+}
    -+
    - test_expect_success 'setup' '
    - 	test_oid_cache <<-EOF
    - 	idxoff sha1:2999
     @@ t/t5319-multi-pack-index.sh: test_expect_success 'warn on improper hash version' '
      	)
      '
    @@ t/t5319-multi-pack-index.sh: test_expect_success 'warn on improper hash version'
     +
     +		# Set up two packs, duplicating the object "B" at different
     +		# offsets.
    ++		#
    ++		# Note that the "BC" pack (the one we choose as preferred) sorts
    ++		# lexically after the "AB" pack, meaning that omitting the
    ++		# --preferred-pack argument would cause this test to fail (since
    ++		# the MIDX code would select the copy of "b" in the "AB" pack).
     +		git pack-objects objects/pack/test-AB <<-EOF &&
     +		$a
     +		$b
    @@ t/t5319-multi-pack-index.sh: test_expect_success 'warn on improper hash version'
     +			write --preferred-pack=test-BC-$bc.idx 2>err &&
     +		test_must_be_empty err &&
     +
    ++		echo hi &&
    ++		test-tool read-midx --show-objects objects >out &&
    ++
     +		ofs=$(git show-index <objects/pack/test-BC-$bc.idx | grep $b |
     +			cut -d" " -f1) &&
    -+		midx_expect_object_offset $b $ofs objects
    ++		printf "%s %s\tobjects/pack/test-BC-%s.pack\n" \
    ++			"$b" "$ofs" "$bc" >expect &&
    ++		grep ^$b out >actual &&
    ++
    ++		test_cmp expect actual
     +	)
     +'
      
 9:  5c5aca761a =  9:  33b8af97e7 midx: don't free midx_name early
10:  a22a1463a5 = 10:  3fc9b83dc6 midx: keep track of the checksum
11:  efa54479b1 = 11:  2ada397320 midx: make some functions non-static
12:  4745bb8590 ! 12:  8bb3dd24a7 Documentation/technical: describe multi-pack reverse indexes
    @@ Documentation/technical/pack-format.txt: CHUNK DATA:
     +position.
     +
     +One solution is to let bits occupy the same position in the oid-sorted
    -+index stored by the MIDX. But because oids are effectively random, there
    ++index stored by the MIDX. But because oids are effectively random, their
     +resulting reachability bitmaps would have no locality, and thus compress
     +poorly. (This is the reason that single-pack bitmaps use the pack
     +ordering, and not the .idx ordering, for the same purpose.)
    @@ Documentation/technical/pack-format.txt: CHUNK DATA:
     +order in the actual packfile.
     +
     +Given the list of packs and their counts of objects, you can
    -+na&iuml;vely reconstruct that pseudo-pack ordering (e.g., the object at
    ++naïvely reconstruct that pseudo-pack ordering (e.g., the object at
     +position 27 must be (c,1) because packs "a" and "b" consumed 25 of the
     +slots). But there's a catch. Objects may be duplicated between packs, in
     +which case the MIDX only stores one pointer to the object (and thus we'd
    @@ Documentation/technical/pack-format.txt: CHUNK DATA:
     +too.
     +
     +Objects from the MIDX are ordered as follows to string together the
    -+pseudo-pack. Let _pack(o)_ return the pack from which _o_ was selected
    ++pseudo-pack. Let `pack(o)` return the pack from which `o` was selected
     +by the MIDX, and define an ordering of packs based on their numeric ID
    -+(as stored by the MIDX). Let _offset(o)_ return the object offset of _o_
    -+within _pack(o)_. Then, compare _o~1~_ and _o~2~_ as follows:
    ++(as stored by the MIDX). Let `offset(o)` return the object offset of `o`
    ++within `pack(o)`. Then, compare `o1` and `o2` as follows:
     +
    -+  - If one of _pack(o~1~)_ and _pack(o~2~)_ is preferred and the other
    ++  - If one of `pack(o1)` and `pack(o2)` is preferred and the other
     +    is not, then the preferred one sorts first.
     ++
     +(This is a detail that allows the MIDX bitmap to determine which
     +pack should be used by the pack-reuse mechanism, since it can ask
     +the MIDX for the pack containing the object at bit position 0).
     +
    -+  - If _pack(o~1~) &ne; pack(o~2~)_, then sort the two objects in
    -+    descending order based on the pack ID.
    ++  - If `pack(o1) ≠ pack(o2)`, then sort the two objects in descending
    ++    order based on the pack ID.
     +
    -+  - Otherwise, _pack(o~1~) &equals; pack(o~2~)_, and the objects are
    -+    sorted in pack-order (i.e., _o~1~_ sorts ahead of _o~2~_ exactly
    -+    when _offset(o~1~) &lt; offset(o~2~)_).
    ++  - Otherwise, `pack(o1) = pack(o2)`, and the objects are sorted in
    ++    pack-order (i.e., `o1` sorts ahead of `o2` exactly when `offset(o1)
    ++    < offset(o2)`).
     +
     +In short, a MIDX's pseudo-pack is the de-duplicated concatenation of
     +objects in packs stored by the MIDX, laid out in pack order, and the
13:  a6ebd4be91 ! 13:  c070b9c99f pack-revindex: read multi-pack reverse indexes
    @@ pack-revindex.c: int load_pack_revindex(struct packed_git *p)
     +
     +int close_midx_revindex(struct multi_pack_index *m)
     +{
    -+	if (!m)
    ++	if (!m || !m->revindex_map)
     +		return 0;
     +
    -+	if (munmap((void*)m->revindex_map, m->revindex_len))
    -+		return -1;
    ++	munmap((void*)m->revindex_map, m->revindex_len);
     +
     +	m->revindex_map = NULL;
     +	m->revindex_data = NULL;
    @@ pack-revindex.c: off_t pack_pos_to_offset(struct packed_git *p, uint32_t pos)
     +		BUG("pack_pos_to_midx: reverse index not yet loaded");
     +	if (m->num_objects <= pos)
     +		BUG("pack_pos_to_midx: out-of-bounds object at %"PRIu32, pos);
    -+	return get_be32((const char *)m->revindex_data + (pos * sizeof(uint32_t)));
    ++	return get_be32(m->revindex_data + pos);
     +}
     +
     +struct midx_pack_key {
    @@ pack-revindex.c: off_t pack_pos_to_offset(struct packed_git *p, uint32_t pos)
     +	key.preferred_pack = nth_midxed_pack_int_id(m, pack_pos_to_midx(m, 0));
     +
     +	found = bsearch(&key, m->revindex_data, m->num_objects,
    -+			sizeof(uint32_t), midx_pack_order_cmp);
    ++			sizeof(*m->revindex_data), midx_pack_order_cmp);
     +
     +	if (!found)
     +		return error("bad offset for revindex");
14:  f5314f1822 = 14:  9f40019eb3 pack-write.c: extract 'write_rev_file_order'
15:  fa3acb5d5a = 15:  47409cc508 pack-revindex: write multi-pack reverse indexes
16:  550e785f10 = 16:  7b793e7d09 midx.c: improve cache locality in midx_pack_order_cmp()
-- 
2.30.0.667.g81c0cbc6fd

* [PATCH v4 01/16] builtin/multi-pack-index.c: inline 'flags' with options
  2021-03-30 15:03 ` [PATCH v4 " Taylor Blau
@ 2021-03-30 15:03   ` Taylor Blau
  2021-03-30 15:03   ` [PATCH v4 02/16] builtin/multi-pack-index.c: don't handle 'progress' separately Taylor Blau
                     ` (15 subsequent siblings)
  16 siblings, 0 replies; 171+ messages in thread
From: Taylor Blau @ 2021-03-30 15:03 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, dstolee, jonathantanmy

Subcommands of the 'git multi-pack-index' command (e.g., 'write',
'verify', etc.) will want to optionally change a set of shared flags
that are eventually passed to the MIDX libraries.

Right now, options and flags are handled separately. That's fine, since
the options structure is never passed around. But a future patch will
make it so that common options shared by all sub-commands are defined in
a common location. That means that "flags" would have to become a global
variable.

Group it with the options structure so that we reduce the number of
global variables we have overall.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 builtin/multi-pack-index.c | 13 ++++++-------
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/builtin/multi-pack-index.c b/builtin/multi-pack-index.c
index 5bf88cd2a8..4a0ddb06c4 100644
--- a/builtin/multi-pack-index.c
+++ b/builtin/multi-pack-index.c
@@ -14,13 +14,12 @@ static struct opts_multi_pack_index {
 	const char *object_dir;
 	unsigned long batch_size;
 	int progress;
+	unsigned flags;
 } opts;
 
 int cmd_multi_pack_index(int argc, const char **argv,
 			 const char *prefix)
 {
-	unsigned flags = 0;
-
 	static struct option builtin_multi_pack_index_options[] = {
 		OPT_FILENAME(0, "object-dir", &opts.object_dir,
 		  N_("object directory containing set of packfile and pack-index pairs")),
@@ -40,7 +39,7 @@ int cmd_multi_pack_index(int argc, const char **argv,
 	if (!opts.object_dir)
 		opts.object_dir = get_object_directory();
 	if (opts.progress)
-		flags |= MIDX_PROGRESS;
+		opts.flags |= MIDX_PROGRESS;
 
 	if (argc == 0)
 		usage_with_options(builtin_multi_pack_index_usage,
@@ -55,16 +54,16 @@ int cmd_multi_pack_index(int argc, const char **argv,
 
 	if (!strcmp(argv[0], "repack"))
 		return midx_repack(the_repository, opts.object_dir,
-			(size_t)opts.batch_size, flags);
+			(size_t)opts.batch_size, opts.flags);
 	if (opts.batch_size)
 		die(_("--batch-size option is only for 'repack' subcommand"));
 
 	if (!strcmp(argv[0], "write"))
-		return write_midx_file(opts.object_dir, flags);
+		return write_midx_file(opts.object_dir, opts.flags);
 	if (!strcmp(argv[0], "verify"))
-		return verify_midx_file(the_repository, opts.object_dir, flags);
+		return verify_midx_file(the_repository, opts.object_dir, opts.flags);
 	if (!strcmp(argv[0], "expire"))
-		return expire_midx_packs(the_repository, opts.object_dir, flags);
+		return expire_midx_packs(the_repository, opts.object_dir, opts.flags);
 
 	die(_("unrecognized subcommand: %s"), argv[0]);
 }
-- 
2.30.0.667.g81c0cbc6fd


* [PATCH v4 02/16] builtin/multi-pack-index.c: don't handle 'progress' separately
  2021-03-30 15:03 ` [PATCH v4 " Taylor Blau
  2021-03-30 15:03   ` [PATCH v4 01/16] builtin/multi-pack-index.c: inline 'flags' with options Taylor Blau
@ 2021-03-30 15:03   ` Taylor Blau
  2021-03-30 15:03   ` [PATCH v4 03/16] builtin/multi-pack-index.c: define common usage with a macro Taylor Blau
                     ` (14 subsequent siblings)
  16 siblings, 0 replies; 171+ messages in thread
From: Taylor Blau @ 2021-03-30 15:03 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, dstolee, jonathantanmy

Now that there is a shared 'flags' member in the options structure,
there is no need to keep track of whether to force progress or not,
since ultimately the decision of whether or not to show a progress meter
is controlled by a bit in the flags member.

Manipulate that bit directly, and drop the now-unnecessary 'progress'
field while we're at it.
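
As a rough illustration of the idea (not the parse-options implementation),
the sketch below keeps a single 'flags' word and sets or clears one bit in
it, instead of mirroring that bit in a separate int. SKETCH_PROGRESS,
SKETCH_OTHER and 'struct sketch_opts' are hypothetical names.

```c
/* Hypothetical flag bits, standing in for MIDX_PROGRESS and friends. */
#define SKETCH_PROGRESS (1u << 0)
#define SKETCH_OTHER    (1u << 1)

struct sketch_opts {
	unsigned flags;
};

/* Set or clear one bit, leaving every other bit untouched. */
static void sketch_set_bit(struct sketch_opts *o, unsigned bit, int enable)
{
	if (enable)
		o->flags |= bit;
	else
		o->flags &= ~bit;
}

/* The flag word is the only source of truth for the decision. */
static int sketch_wants_progress(const struct sketch_opts *o)
{
	return !!(o->flags & SKETCH_PROGRESS);
}
```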

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 builtin/multi-pack-index.c | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/builtin/multi-pack-index.c b/builtin/multi-pack-index.c
index 4a0ddb06c4..c70f020d8f 100644
--- a/builtin/multi-pack-index.c
+++ b/builtin/multi-pack-index.c
@@ -13,7 +13,6 @@ static char const * const builtin_multi_pack_index_usage[] = {
 static struct opts_multi_pack_index {
 	const char *object_dir;
 	unsigned long batch_size;
-	int progress;
 	unsigned flags;
 } opts;
 
@@ -23,7 +22,7 @@ int cmd_multi_pack_index(int argc, const char **argv,
 	static struct option builtin_multi_pack_index_options[] = {
 		OPT_FILENAME(0, "object-dir", &opts.object_dir,
 		  N_("object directory containing set of packfile and pack-index pairs")),
-		OPT_BOOL(0, "progress", &opts.progress, N_("force progress reporting")),
+		OPT_BIT(0, "progress", &opts.flags, N_("force progress reporting"), MIDX_PROGRESS),
 		OPT_MAGNITUDE(0, "batch-size", &opts.batch_size,
 		  N_("during repack, collect pack-files of smaller size into a batch that is larger than this size")),
 		OPT_END(),
@@ -31,15 +30,14 @@ int cmd_multi_pack_index(int argc, const char **argv,
 
 	git_config(git_default_config, NULL);
 
-	opts.progress = isatty(2);
+	if (isatty(2))
+		opts.flags |= MIDX_PROGRESS;
 	argc = parse_options(argc, argv, prefix,
 			     builtin_multi_pack_index_options,
 			     builtin_multi_pack_index_usage, 0);
 
 	if (!opts.object_dir)
 		opts.object_dir = get_object_directory();
-	if (opts.progress)
-		opts.flags |= MIDX_PROGRESS;
 
 	if (argc == 0)
 		usage_with_options(builtin_multi_pack_index_usage,
-- 
2.30.0.667.g81c0cbc6fd


* [PATCH v4 03/16] builtin/multi-pack-index.c: define common usage with a macro
  2021-03-30 15:03 ` [PATCH v4 " Taylor Blau
  2021-03-30 15:03   ` [PATCH v4 01/16] builtin/multi-pack-index.c: inline 'flags' with options Taylor Blau
  2021-03-30 15:03   ` [PATCH v4 02/16] builtin/multi-pack-index.c: don't handle 'progress' separately Taylor Blau
@ 2021-03-30 15:03   ` Taylor Blau
  2021-03-30 15:03   ` [PATCH v4 04/16] builtin/multi-pack-index.c: split sub-commands Taylor Blau
                     ` (13 subsequent siblings)
  16 siblings, 0 replies; 171+ messages in thread
From: Taylor Blau @ 2021-03-30 15:03 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, dstolee, jonathantanmy

Factor out the usage message into pieces corresponding to each mode.
This prevents options specific to one sub-command from being shared with
another in the usage.

A subsequent commit will use these #define macros to have usage
variables for each sub-command without duplicating their contents.
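
A minimal sketch of the pattern, with made-up SKETCH_* names and without
the N_() translation markers: each usage string is written once as a macro
and then reused in both the combined array and a per-sub-command array.

```c
#include <stddef.h>

/* Hypothetical usage strings; each one is spelled out exactly once. */
#define SKETCH_WRITE_USAGE  "tool [<options>] write"
#define SKETCH_VERIFY_USAGE "tool [<options>] verify"

/* Combined usage, shown when no sub-command is known yet. */
static const char * const sketch_usage[] = {
	SKETCH_WRITE_USAGE,
	SKETCH_VERIFY_USAGE,
	NULL
};

/* Per-sub-command usage, reusing the same macro. */
static const char * const sketch_write_usage[] = {
	SKETCH_WRITE_USAGE,
	NULL
};

/* Count the entries in a NULL-terminated usage array. */
static size_t sketch_usage_lines(const char * const *usage)
{
	size_t n = 0;

	while (usage[n])
		n++;
	return n;
}
```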

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 builtin/multi-pack-index.c | 17 ++++++++++++++++-
 1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/builtin/multi-pack-index.c b/builtin/multi-pack-index.c
index c70f020d8f..eea498e026 100644
--- a/builtin/multi-pack-index.c
+++ b/builtin/multi-pack-index.c
@@ -5,8 +5,23 @@
 #include "midx.h"
 #include "trace2.h"
 
+#define BUILTIN_MIDX_WRITE_USAGE \
+	N_("git multi-pack-index [<options>] write")
+
+#define BUILTIN_MIDX_VERIFY_USAGE \
+	N_("git multi-pack-index [<options>] verify")
+
+#define BUILTIN_MIDX_EXPIRE_USAGE \
+	N_("git multi-pack-index [<options>] expire")
+
+#define BUILTIN_MIDX_REPACK_USAGE \
+	N_("git multi-pack-index [<options>] repack [--batch-size=<size>]")
+
 static char const * const builtin_multi_pack_index_usage[] = {
-	N_("git multi-pack-index [<options>] (write|verify|expire|repack --batch-size=<size>)"),
+	BUILTIN_MIDX_WRITE_USAGE,
+	BUILTIN_MIDX_VERIFY_USAGE,
+	BUILTIN_MIDX_EXPIRE_USAGE,
+	BUILTIN_MIDX_REPACK_USAGE,
 	NULL
 };
 
-- 
2.30.0.667.g81c0cbc6fd


* [PATCH v4 04/16] builtin/multi-pack-index.c: split sub-commands
  2021-03-30 15:03 ` [PATCH v4 " Taylor Blau
                     ` (2 preceding siblings ...)
  2021-03-30 15:03   ` [PATCH v4 03/16] builtin/multi-pack-index.c: define common usage with a macro Taylor Blau
@ 2021-03-30 15:03   ` Taylor Blau
  2021-03-30 15:04   ` [PATCH v4 05/16] builtin/multi-pack-index.c: don't enter bogus cmd_mode Taylor Blau
                     ` (12 subsequent siblings)
  16 siblings, 0 replies; 171+ messages in thread
From: Taylor Blau @ 2021-03-30 15:03 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, dstolee, jonathantanmy

Handle sub-commands of the 'git multi-pack-index' builtin (e.g.,
"write", "repack", etc.) separately from one another. This allows
sub-commands with unique options, without forcing cmd_multi_pack_index()
to reject invalid combinations itself.

This comes at the cost of some duplication and boilerplate. Luckily, the
duplication is reduced to a minimum, since common options are shared
among sub-commands due to a suggestion by Ævar. (Sub-commands do have to
retain the common options, too, since this builtin accepts common
options on either side of the sub-command).

Roughly speaking, cmd_multi_pack_index() parses options (including
common ones), and stops at the first non-option, which is the
sub-command. It then dispatches to the appropriate sub-command, which
parses the remaining options (also including common options).

Unknown options are kept by the sub-commands in order to detect their
presence (and complain that too many arguments were given).
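
The two-stage flow described above can be sketched without git's
parse-options machinery. Everything below is a hand-rolled stand-in:
`struct opts`, `parse_common()`, `cmd_repack()`, `cmd_write()` and
`dispatch()` are hypothetical names, and only the `--opt=value` spelling
is handled. It is meant to show the shape of the control flow, not the
actual implementation:

```c
#include <stdlib.h>
#include <string.h>

struct opts {
	const char *object_dir;
	unsigned long batch_size;
};

/*
 * Consume leading "--object-dir=<dir>" arguments and stop at the first
 * argument that is not a common option. Returns how many were consumed.
 */
static int parse_common(int argc, const char **argv, struct opts *o)
{
	int i;

	for (i = 0; i < argc; i++) {
		if (strncmp(argv[i], "--object-dir=", 13))
			break;
		o->object_dir = argv[i] + 13;
	}
	return i;
}

/* "repack" accepts the common options plus its own --batch-size. */
static int cmd_repack(int argc, const char **argv, struct opts *o)
{
	int i = 1; /* argv[0] is the sub-command name */

	while (i < argc) {
		int used = parse_common(argc - i, argv + i, o);

		i += used;
		if (i < argc && !strncmp(argv[i], "--batch-size=", 13)) {
			o->batch_size = strtoul(argv[i] + 13, NULL, 10);
			i++;
		} else if (!used) {
			break; /* nothing recognized at this position */
		}
	}
	return i < argc ? -1 : 0; /* leftovers mean "too many arguments" */
}

/* "write" accepts only the common options. */
static int cmd_write(int argc, const char **argv, struct opts *o)
{
	int used = parse_common(argc - 1, argv + 1, o);

	return 1 + used < argc ? -1 : 0;
}

/* Stage one: common options, then hand off at the first non-option. */
static int dispatch(int argc, const char **argv, struct opts *o)
{
	int used = parse_common(argc, argv, o);

	argc -= used;
	argv += used;
	if (!argc)
		return -1; /* no sub-command given */
	if (!strcmp(argv[0], "repack"))
		return cmd_repack(argc, argv, o);
	if (!strcmp(argv[0], "write"))
		return cmd_write(argc, argv, o);
	return -1; /* unrecognized sub-command */
}
```

In this sketch, `write --batch-size=10` fails inside `cmd_write()`
because the option is left over after parsing, which mirrors how the
sub-commands in the patch treat unknown options as extra arguments.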

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 builtin/multi-pack-index.c | 130 ++++++++++++++++++++++++++++++-------
 1 file changed, 105 insertions(+), 25 deletions(-)

diff --git a/builtin/multi-pack-index.c b/builtin/multi-pack-index.c
index eea498e026..a78640c061 100644
--- a/builtin/multi-pack-index.c
+++ b/builtin/multi-pack-index.c
@@ -17,6 +17,22 @@
 #define BUILTIN_MIDX_REPACK_USAGE \
 	N_("git multi-pack-index [<options>] repack [--batch-size=<size>]")
 
+static char const * const builtin_multi_pack_index_write_usage[] = {
+	BUILTIN_MIDX_WRITE_USAGE,
+	NULL
+};
+static char const * const builtin_multi_pack_index_verify_usage[] = {
+	BUILTIN_MIDX_VERIFY_USAGE,
+	NULL
+};
+static char const * const builtin_multi_pack_index_expire_usage[] = {
+	BUILTIN_MIDX_EXPIRE_USAGE,
+	NULL
+};
+static char const * const builtin_multi_pack_index_repack_usage[] = {
+	BUILTIN_MIDX_REPACK_USAGE,
+	NULL
+};
 static char const * const builtin_multi_pack_index_usage[] = {
 	BUILTIN_MIDX_WRITE_USAGE,
 	BUILTIN_MIDX_VERIFY_USAGE,
@@ -31,25 +47,98 @@ static struct opts_multi_pack_index {
 	unsigned flags;
 } opts;
 
-int cmd_multi_pack_index(int argc, const char **argv,
-			 const char *prefix)
+static struct option common_opts[] = {
+	OPT_FILENAME(0, "object-dir", &opts.object_dir,
+	  N_("object directory containing set of packfile and pack-index pairs")),
+	OPT_BIT(0, "progress", &opts.flags, N_("force progress reporting"), MIDX_PROGRESS),
+	OPT_END(),
+};
+
+static struct option *add_common_options(struct option *prev)
 {
-	static struct option builtin_multi_pack_index_options[] = {
-		OPT_FILENAME(0, "object-dir", &opts.object_dir,
-		  N_("object directory containing set of packfile and pack-index pairs")),
-		OPT_BIT(0, "progress", &opts.flags, N_("force progress reporting"), MIDX_PROGRESS),
+	return parse_options_concat(common_opts, prev);
+}
+
+static int cmd_multi_pack_index_write(int argc, const char **argv)
+{
+	struct option *options = common_opts;
+
+	argc = parse_options(argc, argv, NULL,
+			     options, builtin_multi_pack_index_write_usage,
+			     PARSE_OPT_KEEP_UNKNOWN);
+	if (argc)
+		usage_with_options(builtin_multi_pack_index_write_usage,
+				   options);
+
+	return write_midx_file(opts.object_dir, opts.flags);
+}
+
+static int cmd_multi_pack_index_verify(int argc, const char **argv)
+{
+	struct option *options = common_opts;
+
+	argc = parse_options(argc, argv, NULL,
+			     options, builtin_multi_pack_index_verify_usage,
+			     PARSE_OPT_KEEP_UNKNOWN);
+	if (argc)
+		usage_with_options(builtin_multi_pack_index_verify_usage,
+				   options);
+
+	return verify_midx_file(the_repository, opts.object_dir, opts.flags);
+}
+
+static int cmd_multi_pack_index_expire(int argc, const char **argv)
+{
+	struct option *options = common_opts;
+
+	argc = parse_options(argc, argv, NULL,
+			     options, builtin_multi_pack_index_expire_usage,
+			     PARSE_OPT_KEEP_UNKNOWN);
+	if (argc)
+		usage_with_options(builtin_multi_pack_index_expire_usage,
+				   options);
+
+	return expire_midx_packs(the_repository, opts.object_dir, opts.flags);
+}
+
+static int cmd_multi_pack_index_repack(int argc, const char **argv)
+{
+	struct option *options;
+	static struct option builtin_multi_pack_index_repack_options[] = {
 		OPT_MAGNITUDE(0, "batch-size", &opts.batch_size,
 		  N_("during repack, collect pack-files of smaller size into a batch that is larger than this size")),
 		OPT_END(),
 	};
 
+	options = add_common_options(builtin_multi_pack_index_repack_options);
+
+	argc = parse_options(argc, argv, NULL,
+			     options,
+			     builtin_multi_pack_index_repack_usage,
+			     PARSE_OPT_KEEP_UNKNOWN);
+	if (argc)
+		usage_with_options(builtin_multi_pack_index_repack_usage,
+				   options);
+
+	FREE_AND_NULL(options);
+
+	return midx_repack(the_repository, opts.object_dir,
+			   (size_t)opts.batch_size, opts.flags);
+}
+
+int cmd_multi_pack_index(int argc, const char **argv,
+			 const char *prefix)
+{
+	struct option *builtin_multi_pack_index_options = common_opts;
+
 	git_config(git_default_config, NULL);
 
 	if (isatty(2))
 		opts.flags |= MIDX_PROGRESS;
 	argc = parse_options(argc, argv, prefix,
 			     builtin_multi_pack_index_options,
-			     builtin_multi_pack_index_usage, 0);
+			     builtin_multi_pack_index_usage,
+			     PARSE_OPT_STOP_AT_NON_OPTION);
 
 	if (!opts.object_dir)
 		opts.object_dir = get_object_directory();
@@ -58,25 +147,16 @@ int cmd_multi_pack_index(int argc, const char **argv,
 		usage_with_options(builtin_multi_pack_index_usage,
 				   builtin_multi_pack_index_options);
 
-	if (argc > 1) {
-		die(_("too many arguments"));
-		return 1;
-	}
-
 	trace2_cmd_mode(argv[0]);
 
 	if (!strcmp(argv[0], "repack"))
-		return midx_repack(the_repository, opts.object_dir,
-			(size_t)opts.batch_size, opts.flags);
-	if (opts.batch_size)
-		die(_("--batch-size option is only for 'repack' subcommand"));
-
-	if (!strcmp(argv[0], "write"))
-		return write_midx_file(opts.object_dir, opts.flags);
-	if (!strcmp(argv[0], "verify"))
-		return verify_midx_file(the_repository, opts.object_dir, opts.flags);
-	if (!strcmp(argv[0], "expire"))
-		return expire_midx_packs(the_repository, opts.object_dir, opts.flags);
-
-	die(_("unrecognized subcommand: %s"), argv[0]);
+		return cmd_multi_pack_index_repack(argc, argv);
+	else if (!strcmp(argv[0], "write"))
+		return cmd_multi_pack_index_write(argc, argv);
+	else if (!strcmp(argv[0], "verify"))
+		return cmd_multi_pack_index_verify(argc, argv);
+	else if (!strcmp(argv[0], "expire"))
+		return cmd_multi_pack_index_expire(argc, argv);
+	else
+		die(_("unrecognized subcommand: %s"), argv[0]);
 }
-- 
2.30.0.667.g81c0cbc6fd



* [PATCH v4 05/16] builtin/multi-pack-index.c: don't enter bogus cmd_mode
  2021-03-30 15:03 ` [PATCH v4 " Taylor Blau
                     ` (3 preceding siblings ...)
  2021-03-30 15:03   ` [PATCH v4 04/16] builtin/multi-pack-index.c: split sub-commands Taylor Blau
@ 2021-03-30 15:04   ` Taylor Blau
  2021-03-30 15:04   ` [PATCH v4 06/16] builtin/multi-pack-index.c: display usage on unrecognized command Taylor Blau
                     ` (11 subsequent siblings)
  16 siblings, 0 replies; 171+ messages in thread
From: Taylor Blau @ 2021-03-30 15:04 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, dstolee, jonathantanmy

Even before the recent refactoring, 'git multi-pack-index' called
'trace2_cmd_mode()' before verifying that the sub-command was recognized.

Push this call down into the individual sub-commands so that we don't
enter a bogus command mode.
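
A small sketch of the idea, with a stand-in `record_mode()` in place of
the real trace2 call (all names here are hypothetical): the mode is
recorded inside each handler, so an unrecognized name never reaches it.

```c
#include <stddef.h>
#include <string.h>

/* Stand-in for a trace sink: remembers the last mode that was recorded. */
static const char *recorded_mode;

static void record_mode(const char *mode)
{
	recorded_mode = mode;
}

static int run_write(const char *name)
{
	record_mode(name); /* only reached for a recognized sub-command */
	return 0;
}

static int run_verify(const char *name)
{
	record_mode(name);
	return 0;
}

static int dispatch_mode(const char *name)
{
	/* No record_mode() here: the name has not been checked yet. */
	if (!strcmp(name, "write"))
		return run_write(name);
	if (!strcmp(name, "verify"))
		return run_verify(name);
	return -1;
}
```

The cost is one extra call per handler, in exchange for never recording
a mode that does not exist.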

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 builtin/multi-pack-index.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/builtin/multi-pack-index.c b/builtin/multi-pack-index.c
index a78640c061..b590c4fc88 100644
--- a/builtin/multi-pack-index.c
+++ b/builtin/multi-pack-index.c
@@ -63,6 +63,8 @@ static int cmd_multi_pack_index_write(int argc, const char **argv)
 {
 	struct option *options = common_opts;
 
+	trace2_cmd_mode(argv[0]);
+
 	argc = parse_options(argc, argv, NULL,
 			     options, builtin_multi_pack_index_write_usage,
 			     PARSE_OPT_KEEP_UNKNOWN);
@@ -77,6 +79,8 @@ static int cmd_multi_pack_index_verify(int argc, const char **argv)
 {
 	struct option *options = common_opts;
 
+	trace2_cmd_mode(argv[0]);
+
 	argc = parse_options(argc, argv, NULL,
 			     options, builtin_multi_pack_index_verify_usage,
 			     PARSE_OPT_KEEP_UNKNOWN);
@@ -91,6 +95,8 @@ static int cmd_multi_pack_index_expire(int argc, const char **argv)
 {
 	struct option *options = common_opts;
 
+	trace2_cmd_mode(argv[0]);
+
 	argc = parse_options(argc, argv, NULL,
 			     options, builtin_multi_pack_index_expire_usage,
 			     PARSE_OPT_KEEP_UNKNOWN);
@@ -112,6 +118,8 @@ static int cmd_multi_pack_index_repack(int argc, const char **argv)
 
 	options = add_common_options(builtin_multi_pack_index_repack_options);
 
+	trace2_cmd_mode(argv[0]);
+
 	argc = parse_options(argc, argv, NULL,
 			     options,
 			     builtin_multi_pack_index_repack_usage,
@@ -147,8 +155,6 @@ int cmd_multi_pack_index(int argc, const char **argv,
 		usage_with_options(builtin_multi_pack_index_usage,
 				   builtin_multi_pack_index_options);
 
-	trace2_cmd_mode(argv[0]);
-
 	if (!strcmp(argv[0], "repack"))
 		return cmd_multi_pack_index_repack(argc, argv);
 	else if (!strcmp(argv[0], "write"))
-- 
2.30.0.667.g81c0cbc6fd



* [PATCH v4 06/16] builtin/multi-pack-index.c: display usage on unrecognized command
  2021-03-30 15:03 ` [PATCH v4 " Taylor Blau
                     ` (4 preceding siblings ...)
  2021-03-30 15:04   ` [PATCH v4 05/16] builtin/multi-pack-index.c: don't enter bogus cmd_mode Taylor Blau
@ 2021-03-30 15:04   ` Taylor Blau
  2021-03-30 15:04   ` [PATCH v4 07/16] t/helper/test-read-midx.c: add '--show-objects' Taylor Blau
                     ` (10 subsequent siblings)
  16 siblings, 0 replies; 171+ messages in thread
From: Taylor Blau @ 2021-03-30 15:04 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, dstolee, jonathantanmy

When given a sub-command that it doesn't understand, 'git
multi-pack-index' dies with the following message:

    $ git multi-pack-index bogus
    fatal: unrecognized subcommand: bogus

Instead of 'die()'-ing, we can display the usage text, which is much
more helpful:

    $ git multi-pack-index bogus
    error: unrecognized subcommand: bogus
    usage: git multi-pack-index [<options>] write
       or: git multi-pack-index [<options>] verify
       or: git multi-pack-index [<options>] expire
       or: git multi-pack-index [<options>] repack [--batch-size=<size>]

        --object-dir <file>   object directory containing set of packfile and pack-index pairs
        --progress            force progress reporting

While we're at it, clean up some duplication between the "no sub-command"
and "unrecognized sub-command" conditionals.
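
The behavior can be sketched as follows. `check_subcommand()` and
`print_usage()` are hypothetical helpers, not git functions, and the
usage list is shortened. In the diff, the "no sub-command" case jumps to
the same label as the "unrecognized" case and so also reaches the
error() call that formats argv[0]; the sketch instead guards a missing
name explicitly, which is a choice made here and not something the patch
does:

```c
#include <stdio.h>
#include <string.h>

static const char * const usage_lines[] = {
	"git multi-pack-index [<options>] write",
	"git multi-pack-index [<options>] verify",
	NULL
};

/* Print "usage: ..." for the first line and "   or: ..." for the rest. */
static void print_usage(FILE *out)
{
	size_t i;

	for (i = 0; usage_lines[i]; i++)
		fprintf(out, "%s %s\n", i ? "   or:" : "usage:", usage_lines[i]);
}

/*
 * Returns 0 for a known sub-command. Otherwise reports the problem,
 * shows the usage, and returns -1. A missing name (NULL) skips the
 * error line so that NULL is never passed to a "%s" format.
 */
static int check_subcommand(const char *name, FILE *out)
{
	if (name && (!strcmp(name, "write") || !strcmp(name, "verify")))
		return 0;
	if (name)
		fprintf(out, "error: unrecognized subcommand: %s\n", name);
	print_usage(out);
	return -1;
}
```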

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 builtin/multi-pack-index.c | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/builtin/multi-pack-index.c b/builtin/multi-pack-index.c
index b590c4fc88..8711174fae 100644
--- a/builtin/multi-pack-index.c
+++ b/builtin/multi-pack-index.c
@@ -152,8 +152,7 @@ int cmd_multi_pack_index(int argc, const char **argv,
 		opts.object_dir = get_object_directory();
 
 	if (argc == 0)
-		usage_with_options(builtin_multi_pack_index_usage,
-				   builtin_multi_pack_index_options);
+		goto usage;
 
 	if (!strcmp(argv[0], "repack"))
 		return cmd_multi_pack_index_repack(argc, argv);
@@ -163,6 +162,10 @@ int cmd_multi_pack_index(int argc, const char **argv,
 		return cmd_multi_pack_index_verify(argc, argv);
 	else if (!strcmp(argv[0], "expire"))
 		return cmd_multi_pack_index_expire(argc, argv);
-	else
-		die(_("unrecognized subcommand: %s"), argv[0]);
+	else {
+usage:
+		error(_("unrecognized subcommand: %s"), argv[0]);
+		usage_with_options(builtin_multi_pack_index_usage,
+				   builtin_multi_pack_index_options);
+	}
 }
-- 
2.30.0.667.g81c0cbc6fd



* [PATCH v4 07/16] t/helper/test-read-midx.c: add '--show-objects'
  2021-03-30 15:03 ` [PATCH v4 " Taylor Blau
                     ` (5 preceding siblings ...)
  2021-03-30 15:04   ` [PATCH v4 06/16] builtin/multi-pack-index.c: display usage on unrecognized command Taylor Blau
@ 2021-03-30 15:04   ` Taylor Blau
  2021-03-30 15:04   ` [PATCH v4 08/16] midx: allow marking a pack as preferred Taylor Blau
                     ` (9 subsequent siblings)
  16 siblings, 0 replies; 171+ messages in thread
From: Taylor Blau @ 2021-03-30 15:04 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, dstolee, jonathantanmy

The 'read-midx' helper is used in places like t5319 to display basic
information about a multi-pack-index.

In the next patch, the MIDX writing machinery will learn a new way to
choose from which pack an object is selected when multiple copies of
that object exist.

To disambiguate which pack introduces an object so that this feature can
be tested, add a '--show-objects' option which displays additional
information about each object in the MIDX.
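
Each line of the new output has the form "<oid> <offset>\t<pack>". A
sketch of that formatting, using a hypothetical `format_object_line()`
helper that takes plain strings instead of git's object and pack
structures:

```c
#include <inttypes.h>
#include <stdio.h>

/*
 * Format one object line as "<hex-oid> <offset>\t<pack-name>\n" into buf.
 * Returns the value reported by snprintf().
 */
static int format_object_line(char *buf, size_t len, const char *hex_oid,
			      uint64_t offset, const char *pack_name)
{
	return snprintf(buf, len, "%s %" PRIu64 "\t%s\n",
			hex_oid, offset, pack_name);
}
```

A test can then grep for a given object ID at the start of a line and
compare the offset and pack name against what the pack's own index
reports.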

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 t/helper/test-read-midx.c | 24 ++++++++++++++++++++----
 1 file changed, 20 insertions(+), 4 deletions(-)

diff --git a/t/helper/test-read-midx.c b/t/helper/test-read-midx.c
index 2430880f78..7c2eb11a8e 100644
--- a/t/helper/test-read-midx.c
+++ b/t/helper/test-read-midx.c
@@ -4,7 +4,7 @@
 #include "repository.h"
 #include "object-store.h"
 
-static int read_midx_file(const char *object_dir)
+static int read_midx_file(const char *object_dir, int show_objects)
 {
 	uint32_t i;
 	struct multi_pack_index *m;
@@ -43,13 +43,29 @@ static int read_midx_file(const char *object_dir)
 
 	printf("object-dir: %s\n", m->object_dir);
 
+	if (show_objects) {
+		struct object_id oid;
+		struct pack_entry e;
+
+		for (i = 0; i < m->num_objects; i++) {
+			nth_midxed_object_oid(&oid, m, i);
+			fill_midx_entry(the_repository, &oid, &e, m);
+
+			printf("%s %"PRIu64"\t%s\n",
+			       oid_to_hex(&oid), e.offset, e.p->pack_name);
+		}
+		return 0;
+	}
+
 	return 0;
 }
 
 int cmd__read_midx(int argc, const char **argv)
 {
-	if (argc != 2)
-		usage("read-midx <object-dir>");
+	if (!(argc == 2 || argc == 3))
+		usage("read-midx [--show-objects] <object-dir>");
 
-	return read_midx_file(argv[1]);
+	if (!strcmp(argv[1], "--show-objects"))
+		return read_midx_file(argv[2], 1);
+	return read_midx_file(argv[1], 0);
 }
-- 
2.30.0.667.g81c0cbc6fd



* [PATCH v4 08/16] midx: allow marking a pack as preferred
  2021-03-30 15:03 ` [PATCH v4 " Taylor Blau
                     ` (6 preceding siblings ...)
  2021-03-30 15:04   ` [PATCH v4 07/16] t/helper/test-read-midx.c: add '--show-objects' Taylor Blau
@ 2021-03-30 15:04   ` Taylor Blau
  2021-04-01  0:32     ` Taylor Blau
  2021-03-30 15:04   ` [PATCH v4 09/16] midx: don't free midx_name early Taylor Blau
                     ` (8 subsequent siblings)
  16 siblings, 1 reply; 171+ messages in thread
From: Taylor Blau @ 2021-03-30 15:04 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, dstolee, jonathantanmy

When multiple packs in the multi-pack index contain the same object, the
MIDX machinery must make a choice about which pack it associates with
that object. Prior to this patch, the lowest-ordered[1] pack was always
selected.

Pack selection for duplicate objects is relatively unimportant today,
but it will become important for multi-pack bitmaps. This is because we
can only invoke the pack-reuse mechanism when all of the bits for reused
objects come from the reuse pack (in order to ensure that all reused
deltas can find their base objects in the same pack).

To encourage the pack selection process to prefer one pack over another
(the pack to be preferred is the one a caller would like to later use as
a reuse pack), introduce the concept of a "preferred pack". When
provided, the MIDX code will always prefer an object found in a
preferred pack over any other.

No format changes are required to store the preferred pack, since it
can be inferred from a corresponding MIDX bitmap by looking up the pack
associated with the object in the first bit position (this ordering is
described in detail in a subsequent commit).

[1]: the ordering is specified by MIDX internals; for our purposes we
can consider the "lowest-ordered" pack to be the one with the
most-recent mtime.
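
The selection rule can be sketched as a sort followed by keeping the
first copy of each object. This is a simplified model: `struct entry`
uses a short string in place of a real object ID, `select_copies()` is
a hypothetical helper, and the final tie-break on pack ID is an
assumption of the sketch:

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Simplified stand-in for an entry; "oid" is a short string here. */
struct entry {
	const char *oid;
	uint32_t pack_id;
	time_t pack_mtime;
	unsigned preferred : 1;
};

/* Order by oid; among copies, preferred first, then newest mtime. */
static int entry_cmp(const void *va, const void *vb)
{
	const struct entry *a = va, *b = vb;
	int cmp = strcmp(a->oid, b->oid);

	if (cmp)
		return cmp;
	if (a->preferred != b->preferred)
		return a->preferred ? -1 : 1;
	if (a->pack_mtime != b->pack_mtime)
		return a->pack_mtime > b->pack_mtime ? -1 : 1;
	/* Assumed tie-break so that the order is fully determined. */
	return (a->pack_id > b->pack_id) - (a->pack_id < b->pack_id);
}

/*
 * Mark entries from preferred_pack (-1 for none), sort, and keep only
 * the first copy of each oid. Returns the number of entries kept.
 */
static size_t select_copies(struct entry *e, size_t nr, int preferred_pack)
{
	size_t i, kept = 0;

	for (i = 0; i < nr; i++)
		e[i].preferred = preferred_pack >= 0 &&
				 e[i].pack_id == (uint32_t)preferred_pack;
	qsort(e, nr, sizeof(*e), entry_cmp);
	for (i = 0; i < nr; i++)
		if (!kept || strcmp(e[kept - 1].oid, e[i].oid))
			e[kept++] = e[i];
	return kept;
}
```

With packs "AB" (ID 0, newer) and "BC" (ID 1, older), passing 1 as the
preferred pack makes the surviving copy of "b" come from "BC"; passing
-1 falls back to the newer pack.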

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 Documentation/git-multi-pack-index.txt       | 14 +++-
 Documentation/technical/multi-pack-index.txt |  5 +-
 builtin/multi-pack-index.c                   | 19 ++++-
 builtin/repack.c                             |  2 +-
 midx.c                                       | 82 +++++++++++++++++---
 midx.h                                       |  2 +-
 t/t5319-multi-pack-index.sh                  | 43 ++++++++++
 7 files changed, 149 insertions(+), 18 deletions(-)

diff --git a/Documentation/git-multi-pack-index.txt b/Documentation/git-multi-pack-index.txt
index eb0caa0439..ffd601bc17 100644
--- a/Documentation/git-multi-pack-index.txt
+++ b/Documentation/git-multi-pack-index.txt
@@ -9,7 +9,8 @@ git-multi-pack-index - Write and verify multi-pack-indexes
 SYNOPSIS
 --------
 [verse]
-'git multi-pack-index' [--object-dir=<dir>] [--[no-]progress] <subcommand>
+'git multi-pack-index' [--object-dir=<dir>] [--[no-]progress]
+	[--preferred-pack=<pack>] <subcommand>
 
 DESCRIPTION
 -----------
@@ -30,7 +31,16 @@ OPTIONS
 The following subcommands are available:
 
 write::
-	Write a new MIDX file.
+	Write a new MIDX file. The following options are available for
+	the `write` sub-command:
++
+--
+	--preferred-pack=<pack>::
+		Optionally specify the tie-breaking pack used when
+		multiple packs contain the same object. If not given,
+		ties are broken in favor of the pack with the most
+		recent mtime.
+--
 
 verify::
 	Verify the contents of the MIDX file.
diff --git a/Documentation/technical/multi-pack-index.txt b/Documentation/technical/multi-pack-index.txt
index e8e377a59f..fb688976c4 100644
--- a/Documentation/technical/multi-pack-index.txt
+++ b/Documentation/technical/multi-pack-index.txt
@@ -43,8 +43,9 @@ Design Details
   a change in format.
 
 - The MIDX keeps only one record per object ID. If an object appears
-  in multiple packfiles, then the MIDX selects the copy in the most-
-  recently modified packfile.
+  in multiple packfiles, then the MIDX selects the copy in the
+  preferred packfile, otherwise selecting from the most-recently
+  modified packfile.
 
 - If there exist packfiles in the pack directory not registered in
   the MIDX, then those packfiles are loaded into the `packed_git`
diff --git a/builtin/multi-pack-index.c b/builtin/multi-pack-index.c
index 8711174fae..5d3ea445fd 100644
--- a/builtin/multi-pack-index.c
+++ b/builtin/multi-pack-index.c
@@ -4,9 +4,10 @@
 #include "parse-options.h"
 #include "midx.h"
 #include "trace2.h"
+#include "object-store.h"
 
 #define BUILTIN_MIDX_WRITE_USAGE \
-	N_("git multi-pack-index [<options>] write")
+	N_("git multi-pack-index [<options>] write [--preferred-pack=<pack>]")
 
 #define BUILTIN_MIDX_VERIFY_USAGE \
 	N_("git multi-pack-index [<options>] verify")
@@ -43,6 +44,7 @@ static char const * const builtin_multi_pack_index_usage[] = {
 
 static struct opts_multi_pack_index {
 	const char *object_dir;
+	const char *preferred_pack;
 	unsigned long batch_size;
 	unsigned flags;
 } opts;
@@ -61,7 +63,15 @@ static struct option *add_common_options(struct option *prev)
 
 static int cmd_multi_pack_index_write(int argc, const char **argv)
 {
-	struct option *options = common_opts;
+	struct option *options;
+	static struct option builtin_multi_pack_index_write_options[] = {
+		OPT_STRING(0, "preferred-pack", &opts.preferred_pack,
+			   N_("preferred-pack"),
+			   N_("pack for reuse when computing a multi-pack bitmap")),
+		OPT_END(),
+	};
+
+	options = add_common_options(builtin_multi_pack_index_write_options);
 
 	trace2_cmd_mode(argv[0]);
 
@@ -72,7 +82,10 @@ static int cmd_multi_pack_index_write(int argc, const char **argv)
 		usage_with_options(builtin_multi_pack_index_write_usage,
 				   options);
 
-	return write_midx_file(opts.object_dir, opts.flags);
+	FREE_AND_NULL(options);
+
+	return write_midx_file(opts.object_dir, opts.preferred_pack,
+			       opts.flags);
 }
 
 static int cmd_multi_pack_index_verify(int argc, const char **argv)
diff --git a/builtin/repack.c b/builtin/repack.c
index 6ce2556c9e..2847fdfbab 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -721,7 +721,7 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
 	remove_temporary_files();
 
 	if (git_env_bool(GIT_TEST_MULTI_PACK_INDEX, 0))
-		write_midx_file(get_object_directory(), 0);
+		write_midx_file(get_object_directory(), NULL, 0);
 
 	string_list_clear(&names, 0);
 	string_list_clear(&rollback, 0);
diff --git a/midx.c b/midx.c
index becfafe65e..4a9d84352c 100644
--- a/midx.c
+++ b/midx.c
@@ -431,6 +431,14 @@ static int pack_info_compare(const void *_a, const void *_b)
 	return strcmp(a->pack_name, b->pack_name);
 }
 
+static int idx_or_pack_name_cmp(const void *_va, const void *_vb)
+{
+	const char *pack_name = _va;
+	const struct pack_info *compar = _vb;
+
+	return cmp_idx_or_pack_name(pack_name, compar->pack_name);
+}
+
 struct write_midx_context {
 	struct pack_info *info;
 	uint32_t nr;
@@ -445,6 +453,8 @@ struct write_midx_context {
 	uint32_t *pack_perm;
 	unsigned large_offsets_needed:1;
 	uint32_t num_large_offsets;
+
+	int preferred_pack_idx;
 };
 
 static void add_pack_to_midx(const char *full_path, size_t full_path_len,
@@ -489,6 +499,7 @@ struct pack_midx_entry {
 	uint32_t pack_int_id;
 	time_t pack_mtime;
 	uint64_t offset;
+	unsigned preferred : 1;
 };
 
 static int midx_oid_compare(const void *_a, const void *_b)
@@ -500,6 +511,12 @@ static int midx_oid_compare(const void *_a, const void *_b)
 	if (cmp)
 		return cmp;
 
+	/* Sort objects in a preferred pack first when multiple copies exist. */
+	if (a->preferred > b->preferred)
+		return -1;
+	if (a->preferred < b->preferred)
+		return 1;
+
 	if (a->pack_mtime > b->pack_mtime)
 		return -1;
 	else if (a->pack_mtime < b->pack_mtime)
@@ -527,7 +544,8 @@ static int nth_midxed_pack_midx_entry(struct multi_pack_index *m,
 static void fill_pack_entry(uint32_t pack_int_id,
 			    struct packed_git *p,
 			    uint32_t cur_object,
-			    struct pack_midx_entry *entry)
+			    struct pack_midx_entry *entry,
+			    int preferred)
 {
 	if (nth_packed_object_id(&entry->oid, p, cur_object) < 0)
 		die(_("failed to locate object %d in packfile"), cur_object);
@@ -536,6 +554,7 @@ static void fill_pack_entry(uint32_t pack_int_id,
 	entry->pack_mtime = p->mtime;
 
 	entry->offset = nth_packed_object_offset(p, cur_object);
+	entry->preferred = !!preferred;
 }
 
 /*
@@ -552,7 +571,8 @@ static void fill_pack_entry(uint32_t pack_int_id,
 static struct pack_midx_entry *get_sorted_entries(struct multi_pack_index *m,
 						  struct pack_info *info,
 						  uint32_t nr_packs,
-						  uint32_t *nr_objects)
+						  uint32_t *nr_objects,
+						  int preferred_pack)
 {
 	uint32_t cur_fanout, cur_pack, cur_object;
 	uint32_t alloc_fanout, alloc_objects, total_objects = 0;
@@ -589,12 +609,17 @@ static struct pack_midx_entry *get_sorted_entries(struct multi_pack_index *m,
 				nth_midxed_pack_midx_entry(m,
 							   &entries_by_fanout[nr_fanout],
 							   cur_object);
+				if (nth_midxed_pack_int_id(m, cur_object) == preferred_pack)
+					entries_by_fanout[nr_fanout].preferred = 1;
+				else
+					entries_by_fanout[nr_fanout].preferred = 0;
 				nr_fanout++;
 			}
 		}
 
 		for (cur_pack = start_pack; cur_pack < nr_packs; cur_pack++) {
 			uint32_t start = 0, end;
+			int preferred = cur_pack == preferred_pack;
 
 			if (cur_fanout)
 				start = get_pack_fanout(info[cur_pack].p, cur_fanout - 1);
@@ -602,7 +627,11 @@ static struct pack_midx_entry *get_sorted_entries(struct multi_pack_index *m,
 
 			for (cur_object = start; cur_object < end; cur_object++) {
 				ALLOC_GROW(entries_by_fanout, nr_fanout + 1, alloc_fanout);
-				fill_pack_entry(cur_pack, info[cur_pack].p, cur_object, &entries_by_fanout[nr_fanout]);
+				fill_pack_entry(cur_pack,
+						info[cur_pack].p,
+						cur_object,
+						&entries_by_fanout[nr_fanout],
+						preferred);
 				nr_fanout++;
 			}
 		}
@@ -777,7 +806,9 @@ static int write_midx_large_offsets(struct hashfile *f,
 }
 
 static int write_midx_internal(const char *object_dir, struct multi_pack_index *m,
-			       struct string_list *packs_to_drop, unsigned flags)
+			       struct string_list *packs_to_drop,
+			       const char *preferred_pack_name,
+			       unsigned flags)
 {
 	char *midx_name;
 	uint32_t i;
@@ -828,7 +859,19 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
 	if (ctx.m && ctx.nr == ctx.m->num_packs && !packs_to_drop)
 		goto cleanup;
 
-	ctx.entries = get_sorted_entries(ctx.m, ctx.info, ctx.nr, &ctx.entries_nr);
+	ctx.preferred_pack_idx = -1;
+	if (preferred_pack_name) {
+		for (i = 0; i < ctx.nr; i++) {
+			if (!cmp_idx_or_pack_name(preferred_pack_name,
+						  ctx.info[i].pack_name)) {
+				ctx.preferred_pack_idx = i;
+				break;
+			}
+		}
+	}
+
+	ctx.entries = get_sorted_entries(ctx.m, ctx.info, ctx.nr, &ctx.entries_nr,
+					 ctx.preferred_pack_idx);
 
 	ctx.large_offsets_needed = 0;
 	for (i = 0; i < ctx.entries_nr; i++) {
@@ -889,6 +932,24 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
 			pack_name_concat_len += strlen(ctx.info[i].pack_name) + 1;
 	}
 
+	/* Check that the preferred pack wasn't expired (if given). */
+	if (preferred_pack_name) {
+		struct pack_info *preferred = bsearch(preferred_pack_name,
+						      ctx.info, ctx.nr,
+						      sizeof(*ctx.info),
+						      idx_or_pack_name_cmp);
+
+		if (!preferred)
+			warning(_("unknown preferred pack: '%s'"),
+				preferred_pack_name);
+		else {
+			uint32_t perm = ctx.pack_perm[preferred->orig_pack_int_id];
+			if (perm == PACK_EXPIRED)
+				warning(_("preferred pack '%s' is expired"),
+					preferred_pack_name);
+		}
+	}
+
 	if (pack_name_concat_len % MIDX_CHUNK_ALIGNMENT)
 		pack_name_concat_len += MIDX_CHUNK_ALIGNMENT -
 					(pack_name_concat_len % MIDX_CHUNK_ALIGNMENT);
@@ -947,9 +1008,12 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
 	return result;
 }
 
-int write_midx_file(const char *object_dir, unsigned flags)
+int write_midx_file(const char *object_dir,
+		    const char *preferred_pack_name,
+		    unsigned flags)
 {
-	return write_midx_internal(object_dir, NULL, NULL, flags);
+	return write_midx_internal(object_dir, NULL, NULL, preferred_pack_name,
+				   flags);
 }
 
 void clear_midx_file(struct repository *r)
@@ -1184,7 +1248,7 @@ int expire_midx_packs(struct repository *r, const char *object_dir, unsigned fla
 	free(count);
 
 	if (packs_to_drop.nr)
-		result = write_midx_internal(object_dir, m, &packs_to_drop, flags);
+		result = write_midx_internal(object_dir, m, &packs_to_drop, NULL, flags);
 
 	string_list_clear(&packs_to_drop, 0);
 	return result;
@@ -1373,7 +1437,7 @@ int midx_repack(struct repository *r, const char *object_dir, size_t batch_size,
 		goto cleanup;
 	}
 
-	result = write_midx_internal(object_dir, m, NULL, flags);
+	result = write_midx_internal(object_dir, m, NULL, NULL, flags);
 	m = NULL;
 
 cleanup:
diff --git a/midx.h b/midx.h
index b18cf53bc4..e7fea61109 100644
--- a/midx.h
+++ b/midx.h
@@ -47,7 +47,7 @@ int fill_midx_entry(struct repository *r, const struct object_id *oid, struct pa
 int midx_contains_pack(struct multi_pack_index *m, const char *idx_or_pack_name);
 int prepare_multi_pack_index_one(struct repository *r, const char *object_dir, int local);
 
-int write_midx_file(const char *object_dir, unsigned flags);
+int write_midx_file(const char *object_dir, const char *preferred_pack_name, unsigned flags);
 void clear_midx_file(struct repository *r);
 int verify_midx_file(struct repository *r, const char *object_dir, unsigned flags);
 int expire_midx_packs(struct repository *r, const char *object_dir, unsigned flags);
diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh
index b4afab1dfc..031a5570c0 100755
--- a/t/t5319-multi-pack-index.sh
+++ b/t/t5319-multi-pack-index.sh
@@ -234,6 +234,49 @@ test_expect_success 'warn on improper hash version' '
 	)
 '
 
+test_expect_success 'midx picks objects from preferred pack' '
+	test_when_finished rm -rf preferred.git &&
+	git init --bare preferred.git &&
+	(
+		cd preferred.git &&
+
+		a=$(echo "a" | git hash-object -w --stdin) &&
+		b=$(echo "b" | git hash-object -w --stdin) &&
+		c=$(echo "c" | git hash-object -w --stdin) &&
+
+		# Set up two packs, duplicating the object "B" at different
+		# offsets.
+		#
+		# Note that the "BC" pack (the one we choose as preferred) sorts
+		# lexically after the "AB" pack, meaning that omitting the
+		# --preferred-pack argument would cause this test to fail (since
+		# the MIDX code would select the copy of "b" in the "AB" pack).
+		git pack-objects objects/pack/test-AB <<-EOF &&
+		$a
+		$b
+		EOF
+		bc=$(git pack-objects objects/pack/test-BC <<-EOF
+		$b
+		$c
+		EOF
+		) &&
+
+		git multi-pack-index --object-dir=objects \
+			write --preferred-pack=test-BC-$bc.idx 2>err &&
+		test_must_be_empty err &&
+
+		echo hi &&
+		test-tool read-midx --show-objects objects >out &&
+
+		ofs=$(git show-index <objects/pack/test-BC-$bc.idx | grep $b |
+			cut -d" " -f1) &&
+		printf "%s %s\tobjects/pack/test-BC-%s.pack\n" \
+			"$b" "$ofs" "$bc" >expect &&
+		grep ^$b out >actual &&
+
+		test_cmp expect actual
+	)
+'
 
 test_expect_success 'verify multi-pack-index success' '
 	git multi-pack-index verify --object-dir=$objdir
-- 
2.30.0.667.g81c0cbc6fd



* [PATCH v4 09/16] midx: don't free midx_name early
  2021-03-30 15:03 ` [PATCH v4 " Taylor Blau
                     ` (7 preceding siblings ...)
  2021-03-30 15:04   ` [PATCH v4 08/16] midx: allow marking a pack as preferred Taylor Blau
@ 2021-03-30 15:04   ` Taylor Blau
  2021-03-30 15:04   ` [PATCH v4 10/16] midx: keep track of the checksum Taylor Blau
                     ` (7 subsequent siblings)
  16 siblings, 0 replies; 171+ messages in thread
From: Taylor Blau @ 2021-03-30 15:04 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, dstolee, jonathantanmy

A subsequent patch will need to refer back to 'midx_name' later on in
the function. In fact, this variable is already free()'d later on, so
this makes the later free() no longer redundant.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 midx.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/midx.c b/midx.c
index 4a9d84352c..3edde2b68d 100644
--- a/midx.c
+++ b/midx.c
@@ -956,7 +956,6 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
 
 	hold_lock_file_for_update(&lk, midx_name, LOCK_DIE_ON_ERROR);
 	f = hashfd(get_lock_file_fd(&lk), get_lock_file_path(&lk));
-	FREE_AND_NULL(midx_name);
 
 	if (ctx.m)
 		close_midx(ctx.m);
-- 
2.30.0.667.g81c0cbc6fd



* [PATCH v4 10/16] midx: keep track of the checksum
  2021-03-30 15:03 ` [PATCH v4 " Taylor Blau
                     ` (8 preceding siblings ...)
  2021-03-30 15:04   ` [PATCH v4 09/16] midx: don't free midx_name early Taylor Blau
@ 2021-03-30 15:04   ` Taylor Blau
  2021-03-30 15:04   ` [PATCH v4 11/16] midx: make some functions non-static Taylor Blau
                     ` (6 subsequent siblings)
  16 siblings, 0 replies; 171+ messages in thread
From: Taylor Blau @ 2021-03-30 15:04 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, dstolee, jonathantanmy

write_midx_internal() uses a hashfile to write the multi-pack index, but
discards its checksum. This makes sense, since nothing that takes place
after writing the MIDX cares about its checksum.

That is about to change in a subsequent patch, when the optional
reverse index corresponding to the MIDX will want to include the MIDX's
checksum.

Store the checksum of the MIDX in preparation for that.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 midx.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/midx.c b/midx.c
index 3edde2b68d..526795ff0e 100644
--- a/midx.c
+++ b/midx.c
@@ -811,6 +811,7 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
 			       unsigned flags)
 {
 	char *midx_name;
+	unsigned char midx_hash[GIT_MAX_RAWSZ];
 	uint32_t i;
 	struct hashfile *f = NULL;
 	struct lock_file lk;
@@ -987,7 +988,7 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
 	write_midx_header(f, get_num_chunks(cf), ctx.nr - dropped_packs);
 	write_chunkfile(cf, &ctx);
 
-	finalize_hashfile(f, NULL, CSUM_FSYNC | CSUM_HASH_IN_STREAM);
+	finalize_hashfile(f, midx_hash, CSUM_FSYNC | CSUM_HASH_IN_STREAM);
 	free_chunkfile(cf);
 	commit_lock_file(&lk);
 
-- 
2.30.0.667.g81c0cbc6fd



* [PATCH v4 11/16] midx: make some functions non-static
  2021-03-30 15:03 ` [PATCH v4 " Taylor Blau
                     ` (9 preceding siblings ...)
  2021-03-30 15:04   ` [PATCH v4 10/16] midx: keep track of the checksum Taylor Blau
@ 2021-03-30 15:04   ` Taylor Blau
  2021-03-30 15:04   ` [PATCH v4 12/16] Documentation/technical: describe multi-pack reverse indexes Taylor Blau
                     ` (5 subsequent siblings)
  16 siblings, 0 replies; 171+ messages in thread
From: Taylor Blau @ 2021-03-30 15:04 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, dstolee, jonathantanmy

In a subsequent commit, pack-revindex.c will become responsible for
sorting a list of objects in the "MIDX pack order" (which will be
defined in the following patch). To do so, it will need to know the
pack identifier and offset within that pack for each object in the MIDX.

The MIDX code already has functions for doing just that
(nth_midxed_offset() and nth_midxed_pack_int_id()), but they are
statically declared.

Since there is no reason that they couldn't be exposed publicly, and
because they are already doing exactly what the caller in
pack-revindex.c will want, expose them publicly so that they can be
reused there.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 midx.c | 4 ++--
 midx.h | 2 ++
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/midx.c b/midx.c
index 526795ff0e..2173a9b45c 100644
--- a/midx.c
+++ b/midx.c
@@ -239,7 +239,7 @@ struct object_id *nth_midxed_object_oid(struct object_id *oid,
 	return oid;
 }
 
-static off_t nth_midxed_offset(struct multi_pack_index *m, uint32_t pos)
+off_t nth_midxed_offset(struct multi_pack_index *m, uint32_t pos)
 {
 	const unsigned char *offset_data;
 	uint32_t offset32;
@@ -258,7 +258,7 @@ static off_t nth_midxed_offset(struct multi_pack_index *m, uint32_t pos)
 	return offset32;
 }
 
-static uint32_t nth_midxed_pack_int_id(struct multi_pack_index *m, uint32_t pos)
+uint32_t nth_midxed_pack_int_id(struct multi_pack_index *m, uint32_t pos)
 {
 	return get_be32(m->chunk_object_offsets +
 			(off_t)pos * MIDX_CHUNK_OFFSET_WIDTH);
diff --git a/midx.h b/midx.h
index e7fea61109..93bd68189e 100644
--- a/midx.h
+++ b/midx.h
@@ -40,6 +40,8 @@ struct multi_pack_index {
 struct multi_pack_index *load_multi_pack_index(const char *object_dir, int local);
 int prepare_midx_pack(struct repository *r, struct multi_pack_index *m, uint32_t pack_int_id);
 int bsearch_midx(const struct object_id *oid, struct multi_pack_index *m, uint32_t *result);
+off_t nth_midxed_offset(struct multi_pack_index *m, uint32_t pos);
+uint32_t nth_midxed_pack_int_id(struct multi_pack_index *m, uint32_t pos);
 struct object_id *nth_midxed_object_oid(struct object_id *oid,
 					struct multi_pack_index *m,
 					uint32_t n);
-- 
2.30.0.667.g81c0cbc6fd



* [PATCH v4 12/16] Documentation/technical: describe multi-pack reverse indexes
  2021-03-30 15:03 ` [PATCH v4 " Taylor Blau
                     ` (10 preceding siblings ...)
  2021-03-30 15:04   ` [PATCH v4 11/16] midx: make some functions non-static Taylor Blau
@ 2021-03-30 15:04   ` Taylor Blau
  2021-03-30 15:04   ` [PATCH v4 13/16] pack-revindex: read " Taylor Blau
                     ` (4 subsequent siblings)
  16 siblings, 0 replies; 171+ messages in thread
From: Taylor Blau @ 2021-03-30 15:04 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, dstolee, jonathantanmy

As a prerequisite to implementing multi-pack bitmaps, motivate and
describe the format and ordering of the multi-pack reverse index.

The subsequent patch will implement reading this format, and the patch
after that will implement writing it while producing a multi-pack index.

Co-authored-by: Jeff King <peff@peff.net>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
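For illustration, the "naive" reconstruction described in the
documentation below can be sketched as follows. This is only a sketch
under the assumption that no object is duplicated between packs; the
names (pseudo_pack_slot, naive_pseudo_pack_lookup) are hypothetical and
do not exist in git. Positions are counted from zero here, so slot 26
lands on (c,1); the "position 27" in the text matches if positions are
counted from one.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result type: which pack, and which object within it. */
struct pseudo_pack_slot {
	size_t pack;  /* index into the pack list (0 = "a", 1 = "b", ...) */
	uint32_t nth; /* zero-based object position within that pack */
};

/*
 * Naively map a zero-based pseudo-pack position to a (pack, nth) pair,
 * assuming no object is duplicated between packs. Returns 0 on success,
 * or -1 if "pos" is past the end of the concatenation.
 */
int naive_pseudo_pack_lookup(const uint32_t *counts, size_t nr_packs,
			     uint32_t pos, struct pseudo_pack_slot *out)
{
	size_t i;

	for (i = 0; i < nr_packs; i++) {
		if (pos < counts[i]) {
			out->pack = i;
			out->nth = pos;
			return 0;
		}
		pos -= counts[i]; /* skip over all of pack i */
	}
	return -1;
}
```

Once duplicates are dropped, the per-pack counts no longer line up with
the slots, which is why the format stores an explicit table instead.
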
 Documentation/technical/pack-format.txt | 83 +++++++++++++++++++++++++
 1 file changed, 83 insertions(+)

diff --git a/Documentation/technical/pack-format.txt b/Documentation/technical/pack-format.txt
index 1faa949bf6..8d2f42f29e 100644
--- a/Documentation/technical/pack-format.txt
+++ b/Documentation/technical/pack-format.txt
@@ -379,3 +379,86 @@ CHUNK DATA:
 TRAILER:
 
 	Index checksum of the above contents.
+
+== multi-pack-index reverse indexes
+
+Similar to the pack-based reverse index, the multi-pack index can also
+be used to generate a reverse index.
+
+Instead of mapping between offset, pack-, and index position, this
+reverse index maps between an object's position within the MIDX, and
+that object's position within a pseudo-pack that the MIDX describes
+(i.e., the ith entry of the multi-pack reverse index holds the MIDX
+position of the ith object in pseudo-pack order).
+
+To clarify the difference between these orderings, consider a multi-pack
+reachability bitmap (which does not yet exist, but is what we are
+building towards here). Each bit needs to correspond to an object in the
+MIDX, and so we need an efficient mapping from bit position to MIDX
+position.
+
+One solution is to let bits occupy the same position in the oid-sorted
+index stored by the MIDX. But because oids are effectively random, their
+resulting reachability bitmaps would have no locality, and thus compress
+poorly. (This is the reason that single-pack bitmaps use the pack
+ordering, and not the .idx ordering, for the same purpose.)
+
+So we'd like to define an ordering for the whole MIDX based around
+pack ordering, which has far better locality (and thus compresses more
+efficiently). We can think of a pseudo-pack created by the concatenation
+of all of the packs in the MIDX. E.g., if we had a MIDX with three packs
+(a, b, c), with 10, 15, and 20 objects respectively, we can imagine an
+ordering of the objects like:
+
+    |a,0|a,1|...|a,9|b,0|b,1|...|b,14|c,0|c,1|...|c,19|
+
+where the ordering of the packs is defined by the MIDX's pack list,
+and then the ordering of objects within each pack is the same as the
+order in the actual packfile.
+
+Given the list of packs and their counts of objects, you can
+naïvely reconstruct that pseudo-pack ordering (e.g., the object at
+position 27 must be (c,1) because packs "a" and "b" consumed 25 of the
+slots). But there's a catch. Objects may be duplicated between packs, in
+which case the MIDX only stores one pointer to the object (and thus we'd
+want only one slot in the bitmap).
+
+Callers could handle duplicates themselves by reading objects in order
+of their bit-position, but that's linear in the number of objects, and
+much too expensive for ordinary bitmap lookups. Building a reverse index
+solves this, since it is the logical inverse of the index, and that
+index has already removed duplicates. But, building a reverse index on
+the fly can be expensive. Since we already have an on-disk format for
+pack-based reverse indexes, let's reuse it for the MIDX's pseudo-pack,
+too.
+
+Objects from the MIDX are ordered as follows to string together the
+pseudo-pack. Let `pack(o)` return the pack from which `o` was selected
+by the MIDX, and define an ordering of packs based on their numeric ID
+(as stored by the MIDX). Let `offset(o)` return the object offset of `o`
+within `pack(o)`. Then, compare `o1` and `o2` as follows:
+
+  - If one of `pack(o1)` and `pack(o2)` is preferred and the other
+    is not, then the preferred one sorts first.
++
+(This is a detail that allows the MIDX bitmap to determine which
+pack should be used by the pack-reuse mechanism, since it can ask
+the MIDX for the pack containing the object at bit position 0).
+
+  - If `pack(o1) ≠ pack(o2)`, then sort the two objects in ascending
+    order based on the pack ID.
+
+  - Otherwise, `pack(o1) = pack(o2)`, and the objects are sorted in
+    pack-order (i.e., `o1` sorts ahead of `o2` exactly when `offset(o1)
+    < offset(o2)`).
+
+In short, a MIDX's pseudo-pack is the de-duplicated concatenation of
+objects in packs stored by the MIDX, laid out in pack order, and the
+packs arranged in MIDX order (with the preferred pack coming first).
+
+Finally, note that the MIDX's reverse index is not stored as a chunk in
+the multi-pack-index itself. This is done because the reverse index
+includes the checksum of the pack or MIDX to which it belongs, which
+makes it impossible to write in the MIDX. To avoid races when rewriting
+the MIDX, a MIDX reverse index includes the MIDX's checksum in its
+filename (e.g., `multi-pack-index-xyz.rev`).
-- 
2.30.0.667.g81c0cbc6fd



* [PATCH v4 13/16] pack-revindex: read multi-pack reverse indexes
  2021-03-30 15:03 ` [PATCH v4 " Taylor Blau
                     ` (11 preceding siblings ...)
  2021-03-30 15:04   ` [PATCH v4 12/16] Documentation/technical: describe multi-pack reverse indexes Taylor Blau
@ 2021-03-30 15:04   ` Taylor Blau
  2021-03-30 15:04   ` [PATCH v4 14/16] pack-write.c: extract 'write_rev_file_order' Taylor Blau
                     ` (3 subsequent siblings)
  16 siblings, 0 replies; 171+ messages in thread
From: Taylor Blau @ 2021-03-30 15:04 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, dstolee, jonathantanmy

Implement reading for multi-pack reverse indexes, as described in the
previous patch.

Note that these functions don't yet have any callers, and won't until
multi-pack reachability bitmaps are introduced in a later patch series.
In the meantime, this patch implements some of the infrastructure
necessary to support multi-pack bitmaps.

There are three new functions exposed by the revindex API:

  - load_midx_revindex(): loads the reverse index corresponding to the
    given multi-pack index.

  - midx_to_pack_pos() and pack_pos_to_midx(): these convert between the
    multi-pack index and pseudo-pack order.

load_midx_revindex() and pack_pos_to_midx() are both relatively
straightforward.

load_midx_revindex() needs a few functions to be exposed from the midx
API. One to get the checksum of a midx, and another to get the .rev's
filename. Similar to recent changes in the packed_git struct, three new
fields are added to the multi_pack_index struct: one to keep track of
the size, one to keep track of the mmap'd pointer, and another to point
past the header and at the reverse index's data.

pack_pos_to_midx() simply reads the corresponding entry out of the
table.

midx_to_pack_pos() is the trickiest, since it needs to find an object's
position in the pseudo-pack order, but that order can only be recovered
from the .rev file itself. This mapping can be implemented with a binary
search, but note that the thing we're binary searching over isn't an
array of values, but rather a permuted order of those values.

So, when comparing two items, it's helpful to keep in mind the
difference. Instead of a traditional binary search, where you are
comparing two things directly, here we're comparing a (pack, offset)
tuple with an index into the multi-pack index. That index describes
another (pack, offset) tuple, and it is _those_ two tuples that are
compared.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
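For illustration, the "binary search over a permutation" idea from the
message above can be sketched in isolation. This is a toy under stated
assumptions, not the code in this patch: all toy_* names are
hypothetical, the table is held in host byte order rather than read
with get_be32(), and the preferred-pack tie-break is left out.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for what the MIDX stores per object. */
struct toy_entry {
	uint32_t pack;
	uint64_t offset;
};

/* Hypothetical search key: the tuple we want, plus the table to look in. */
struct toy_key {
	uint32_t pack;
	uint64_t offset;
	const struct toy_entry *entries; /* indexed by MIDX position */
};

/*
 * bsearch() hands us the key and a pointer to one element of the
 * permutation. Dereference that element to get a MIDX position, look up
 * its (pack, offset) tuple, and compare the two tuples.
 */
static int toy_order_cmp(const void *va, const void *vb)
{
	const struct toy_key *key = va;
	const struct toy_entry *versus = &key->entries[*(const uint32_t *)vb];

	if (key->pack != versus->pack)
		return key->pack < versus->pack ? -1 : 1;
	if (key->offset != versus->offset)
		return key->offset < versus->offset ? -1 : 1;
	return 0;
}

/* Returns the pseudo-pack position of MIDX position "at", or -1. */
long toy_midx_to_pack_pos(const struct toy_entry *entries,
			  const uint32_t *rev, size_t nr, uint32_t at)
{
	struct toy_key key = { entries[at].pack, entries[at].offset, entries };
	const uint32_t *found = bsearch(&key, rev, nr, sizeof(*rev),
					toy_order_cmp);

	return found ? (long)(found - rev) : -1;
}
```

The search is only valid because rev[] is sorted by the tuples it points
at, not by its own values; rev[] itself looks like an arbitrary
permutation.
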
 midx.c          |  11 +++++
 midx.h          |   6 +++
 pack-revindex.c | 126 ++++++++++++++++++++++++++++++++++++++++++++++++
 pack-revindex.h |  53 ++++++++++++++++++++
 packfile.c      |   3 ++
 5 files changed, 199 insertions(+)

diff --git a/midx.c b/midx.c
index 2173a9b45c..c04e794888 100644
--- a/midx.c
+++ b/midx.c
@@ -47,11 +47,22 @@ static uint8_t oid_version(void)
 	}
 }
 
+static const unsigned char *get_midx_checksum(struct multi_pack_index *m)
+{
+	return m->data + m->data_len - the_hash_algo->rawsz;
+}
+
 static char *get_midx_filename(const char *object_dir)
 {
 	return xstrfmt("%s/pack/multi-pack-index", object_dir);
 }
 
+char *get_midx_rev_filename(struct multi_pack_index *m)
+{
+	return xstrfmt("%s/pack/multi-pack-index-%s.rev",
+		       m->object_dir, hash_to_hex(get_midx_checksum(m)));
+}
+
 static int midx_read_oid_fanout(const unsigned char *chunk_start,
 				size_t chunk_size, void *data)
 {
diff --git a/midx.h b/midx.h
index 93bd68189e..0a8294d2ee 100644
--- a/midx.h
+++ b/midx.h
@@ -15,6 +15,10 @@ struct multi_pack_index {
 	const unsigned char *data;
 	size_t data_len;
 
+	const uint32_t *revindex_data;
+	const uint32_t *revindex_map;
+	size_t revindex_len;
+
 	uint32_t signature;
 	unsigned char version;
 	unsigned char hash_len;
@@ -37,6 +41,8 @@ struct multi_pack_index {
 
 #define MIDX_PROGRESS     (1 << 0)
 
+char *get_midx_rev_filename(struct multi_pack_index *m);
+
 struct multi_pack_index *load_multi_pack_index(const char *object_dir, int local);
 int prepare_midx_pack(struct repository *r, struct multi_pack_index *m, uint32_t pack_int_id);
 int bsearch_midx(const struct object_id *oid, struct multi_pack_index *m, uint32_t *result);
diff --git a/pack-revindex.c b/pack-revindex.c
index 4262530449..0e4a31d9db 100644
--- a/pack-revindex.c
+++ b/pack-revindex.c
@@ -3,6 +3,7 @@
 #include "object-store.h"
 #include "packfile.h"
 #include "config.h"
+#include "midx.h"
 
 struct revindex_entry {
 	off_t offset;
@@ -293,6 +294,43 @@ int load_pack_revindex(struct packed_git *p)
 	return -1;
 }
 
+int load_midx_revindex(struct multi_pack_index *m)
+{
+	char *revindex_name;
+	int ret;
+	if (m->revindex_data)
+		return 0;
+
+	revindex_name = get_midx_rev_filename(m);
+
+	ret = load_revindex_from_disk(revindex_name,
+				      m->num_objects,
+				      &m->revindex_map,
+				      &m->revindex_len);
+	if (ret)
+		goto cleanup;
+
+	m->revindex_data = (const uint32_t *)((const char *)m->revindex_map + RIDX_HEADER_SIZE);
+
+cleanup:
+	free(revindex_name);
+	return ret;
+}
+
+int close_midx_revindex(struct multi_pack_index *m)
+{
+	if (!m || !m->revindex_map)
+		return 0;
+
+	munmap((void*)m->revindex_map, m->revindex_len);
+
+	m->revindex_map = NULL;
+	m->revindex_data = NULL;
+	m->revindex_len = 0;
+
+	return 0;
+}
+
 int offset_to_pack_pos(struct packed_git *p, off_t ofs, uint32_t *pos)
 {
 	unsigned lo, hi;
@@ -347,3 +385,91 @@ off_t pack_pos_to_offset(struct packed_git *p, uint32_t pos)
 	else
 		return nth_packed_object_offset(p, pack_pos_to_index(p, pos));
 }
+
+uint32_t pack_pos_to_midx(struct multi_pack_index *m, uint32_t pos)
+{
+	if (!m->revindex_data)
+		BUG("pack_pos_to_midx: reverse index not yet loaded");
+	if (m->num_objects <= pos)
+		BUG("pack_pos_to_midx: out-of-bounds object at %"PRIu32, pos);
+	return get_be32(m->revindex_data + pos);
+}
+
+struct midx_pack_key {
+	uint32_t pack;
+	off_t offset;
+
+	uint32_t preferred_pack;
+	struct multi_pack_index *midx;
+};
+
+static int midx_pack_order_cmp(const void *va, const void *vb)
+{
+	const struct midx_pack_key *key = va;
+	struct multi_pack_index *midx = key->midx;
+
+	uint32_t versus = pack_pos_to_midx(midx, (uint32_t*)vb - (const uint32_t *)midx->revindex_data);
+	uint32_t versus_pack = nth_midxed_pack_int_id(midx, versus);
+	off_t versus_offset;
+
+	uint32_t key_preferred = key->pack == key->preferred_pack;
+	uint32_t versus_preferred = versus_pack == key->preferred_pack;
+
+	/*
+	 * First, compare the preferred-ness, noting that the preferred pack
+	 * comes first.
+	 */
+	if (key_preferred && !versus_preferred)
+		return -1;
+	else if (!key_preferred && versus_preferred)
+		return 1;
+
+	/* Then, break ties first by comparing the pack IDs. */
+	if (key->pack < versus_pack)
+		return -1;
+	else if (key->pack > versus_pack)
+		return 1;
+
+	/* Finally, break ties by comparing offsets within a pack. */
+	versus_offset = nth_midxed_offset(midx, versus);
+	if (key->offset < versus_offset)
+		return -1;
+	else if (key->offset > versus_offset)
+		return 1;
+
+	return 0;
+}
+
+int midx_to_pack_pos(struct multi_pack_index *m, uint32_t at, uint32_t *pos)
+{
+	struct midx_pack_key key;
+	uint32_t *found;
+
+	if (!m->revindex_data)
+		BUG("midx_to_pack_pos: reverse index not yet loaded");
+	if (m->num_objects <= at)
+		BUG("midx_to_pack_pos: out-of-bounds object at %"PRIu32, at);
+
+	key.pack = nth_midxed_pack_int_id(m, at);
+	key.offset = nth_midxed_offset(m, at);
+	key.midx = m;
+	/*
+	 * The preferred pack sorts first, so determine its identifier by
+	 * looking at the first object in pseudo-pack order.
+	 *
+	 * Note that if no --preferred-pack is explicitly given when writing a
+	 * multi-pack index, then whichever pack has the lowest identifier
+	 * implicitly is preferred (and includes all its objects, since ties are
+	 * broken first by pack identifier).
+	 */
+	key.preferred_pack = nth_midxed_pack_int_id(m, pack_pos_to_midx(m, 0));
+
+	found = bsearch(&key, m->revindex_data, m->num_objects,
+			sizeof(*m->revindex_data), midx_pack_order_cmp);
+
+	if (!found)
+		return error("bad offset for revindex");
+
+	*pos = found - m->revindex_data;
+	return 0;
+}
diff --git a/pack-revindex.h b/pack-revindex.h
index ba7c82c125..479b8f2f9c 100644
--- a/pack-revindex.h
+++ b/pack-revindex.h
@@ -14,6 +14,20 @@
  *
  * - offset: the byte offset within the .pack file at which the object contents
  *   can be found
+ *
+ * The revindex can also be used with a multi-pack index (MIDX). In this
+ * setting:
+ *
+ *   - index position refers to an object's numeric position within the MIDX
+ *
+ *   - pack position refers to an object's position within a non-existent pack
+ *     described by the MIDX. The pack structure is described in
+ *     Documentation/technical/pack-format.txt.
+ *
+ *     It is effectively a concatenation of all packs in the MIDX (ordered by
+ *     their numeric ID within the MIDX, with objects in their original order
+ *     within each pack), removing duplicates, and placing the preferred pack
+ *     (if any) first.
  */
 
 
@@ -24,6 +38,7 @@
 #define GIT_TEST_REV_INDEX_DIE_IN_MEMORY "GIT_TEST_REV_INDEX_DIE_IN_MEMORY"
 
 struct packed_git;
+struct multi_pack_index;
 
 /*
  * load_pack_revindex populates the revindex's internal data-structures for the
@@ -34,6 +49,22 @@ struct packed_git;
  */
 int load_pack_revindex(struct packed_git *p);
 
+/*
+ * load_midx_revindex loads the '.rev' file corresponding to the given
+ * multi-pack index by mmap-ing it and assigning pointers in the
+ * multi_pack_index to point at it.
+ *
+ * A negative number is returned on error.
+ */
+int load_midx_revindex(struct multi_pack_index *m);
+
+/*
+ * Frees resources associated with a multi-pack reverse index.
+ *
+ * A negative number is returned on error.
+ */
+int close_midx_revindex(struct multi_pack_index *m);
+
 /*
  * offset_to_pack_pos converts an object offset to a pack position. This
  * function returns zero on success, and a negative number otherwise. The
@@ -71,4 +102,26 @@ uint32_t pack_pos_to_index(struct packed_git *p, uint32_t pos);
  */
 off_t pack_pos_to_offset(struct packed_git *p, uint32_t pos);
 
+/*
+ * pack_pos_to_midx converts the object at position "pos" within the MIDX
+ * pseudo-pack into a MIDX position.
+ *
+ * If the reverse index has not yet been loaded, or the position is out of
+ * bounds, this function aborts.
+ *
+ * This function runs in constant time.
+ */
+uint32_t pack_pos_to_midx(struct multi_pack_index *m, uint32_t pos);
+
+/*
+ * midx_to_pack_pos converts from the MIDX-relative position at "at" to the
+ * corresponding pack position.
+ *
+ * If the reverse index has not yet been loaded, or the position is out of
+ * bounds, this function aborts.
+ *
+ * This function runs in time O(log N) with the number of objects in the MIDX.
+ */
+int midx_to_pack_pos(struct multi_pack_index *midx, uint32_t at, uint32_t *pos);
+
 #endif
diff --git a/packfile.c b/packfile.c
index 6661f3325a..8668345d93 100644
--- a/packfile.c
+++ b/packfile.c
@@ -862,6 +862,9 @@ static void prepare_pack(const char *full_name, size_t full_name_len,
 
 	if (!strcmp(file_name, "multi-pack-index"))
 		return;
+	if (starts_with(file_name, "multi-pack-index") &&
+	    ends_with(file_name, ".rev"))
+		return;
 	if (ends_with(file_name, ".idx") ||
 	    ends_with(file_name, ".rev") ||
 	    ends_with(file_name, ".pack") ||
-- 
2.30.0.667.g81c0cbc6fd



* [PATCH v4 14/16] pack-write.c: extract 'write_rev_file_order'
  2021-03-30 15:03 ` [PATCH v4 " Taylor Blau
                     ` (12 preceding siblings ...)
  2021-03-30 15:04   ` [PATCH v4 13/16] pack-revindex: read " Taylor Blau
@ 2021-03-30 15:04   ` Taylor Blau
  2021-09-08  1:08     ` [PATCH] pack-write: skip *.rev work when not writing *.rev Ævar Arnfjörð Bjarmason
  2021-03-30 15:04   ` [PATCH v4 15/16] pack-revindex: write multi-pack reverse indexes Taylor Blau
                     ` (2 subsequent siblings)
  16 siblings, 1 reply; 171+ messages in thread
From: Taylor Blau @ 2021-03-30 15:04 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, dstolee, jonathantanmy

Existing callers provide the reverse index code with an array of 'struct
pack_idx_entry *'s, which is then sorted by pack order (comparing the
offsets of each object within the pack).

Prepare for the multi-pack index to write a .rev file by providing a way
to write the reverse index without an array of pack_idx_entry (which the
MIDX code does not have).

Instead, callers can invoke 'write_rev_file_order()', which takes an
array of uint32_t's. The ith entry in this array specifies the index
position of the ith object in pack order.

Expose this new function for use in a later patch, and rewrite the
existing write_rev_file() in terms of this new function.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
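For illustration, here is one way to build such an array outside of
git. It is a sketch only: the toy_* names are hypothetical, and it
sorts (index position, offset) pairs with plain qsort() instead of
using QSORT_S() with a context pointer as the diff below does.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical pair: where an object sits in the index and in the pack. */
struct toy_order_item {
	uint32_t index_pos; /* position in index (oid-sorted) order */
	uint64_t offset;    /* byte offset within the pack */
};

static int toy_offset_cmp(const void *va, const void *vb)
{
	const struct toy_order_item *a = va, *b = vb;

	if (a->offset < b->offset)
		return -1;
	if (a->offset > b->offset)
		return 1;
	return 0;
}

/*
 * Fill pack_order[] so that pack_order[i] is the index position of the
 * i-th object in pack order. "offsets" is indexed by index position.
 * Returns 0 on success, or -1 if allocation fails.
 */
int toy_build_pack_order(const uint64_t *offsets, uint32_t nr,
			 uint32_t *pack_order)
{
	struct toy_order_item *items;
	uint32_t i;

	if (!nr)
		return 0;
	items = malloc(nr * sizeof(*items));
	if (!items)
		return -1;

	for (i = 0; i < nr; i++) {
		items[i].index_pos = i;
		items[i].offset = offsets[i];
	}
	qsort(items, nr, sizeof(*items), toy_offset_cmp);
	for (i = 0; i < nr; i++)
		pack_order[i] = items[i].index_pos;

	free(items);
	return 0;
}
```

With offsets {300, 12, 150} in index order, the object at index
position 1 comes first in the pack, then 2, then 0.
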
 pack-write.c | 36 +++++++++++++++++++++++++-----------
 pack.h       |  1 +
 2 files changed, 26 insertions(+), 11 deletions(-)

diff --git a/pack-write.c b/pack-write.c
index 2ca85a9d16..f1fc3ecafa 100644
--- a/pack-write.c
+++ b/pack-write.c
@@ -201,21 +201,12 @@ static void write_rev_header(struct hashfile *f)
 }
 
 static void write_rev_index_positions(struct hashfile *f,
-				      struct pack_idx_entry **objects,
+				      uint32_t *pack_order,
 				      uint32_t nr_objects)
 {
-	uint32_t *pack_order;
 	uint32_t i;
-
-	ALLOC_ARRAY(pack_order, nr_objects);
-	for (i = 0; i < nr_objects; i++)
-		pack_order[i] = i;
-	QSORT_S(pack_order, nr_objects, pack_order_cmp, objects);
-
 	for (i = 0; i < nr_objects; i++)
 		hashwrite_be32(f, pack_order[i]);
-
-	free(pack_order);
 }
 
 static void write_rev_trailer(struct hashfile *f, const unsigned char *hash)
@@ -228,6 +219,29 @@ const char *write_rev_file(const char *rev_name,
 			   uint32_t nr_objects,
 			   const unsigned char *hash,
 			   unsigned flags)
+{
+	uint32_t *pack_order;
+	uint32_t i;
+	const char *ret;
+
+	ALLOC_ARRAY(pack_order, nr_objects);
+	for (i = 0; i < nr_objects; i++)
+		pack_order[i] = i;
+	QSORT_S(pack_order, nr_objects, pack_order_cmp, objects);
+
+	ret = write_rev_file_order(rev_name, pack_order, nr_objects, hash,
+				   flags);
+
+	free(pack_order);
+
+	return ret;
+}
+
+const char *write_rev_file_order(const char *rev_name,
+				 uint32_t *pack_order,
+				 uint32_t nr_objects,
+				 const unsigned char *hash,
+				 unsigned flags)
 {
 	struct hashfile *f;
 	int fd;
@@ -262,7 +276,7 @@ const char *write_rev_file(const char *rev_name,
 
 	write_rev_header(f);
 
-	write_rev_index_positions(f, objects, nr_objects);
+	write_rev_index_positions(f, pack_order, nr_objects);
 	write_rev_trailer(f, hash);
 
 	if (rev_name && adjust_shared_perm(rev_name) < 0)
diff --git a/pack.h b/pack.h
index 857cbd5bd4..fa13954526 100644
--- a/pack.h
+++ b/pack.h
@@ -94,6 +94,7 @@ struct ref;
 void write_promisor_file(const char *promisor_name, struct ref **sought, int nr_sought);
 
 const char *write_rev_file(const char *rev_name, struct pack_idx_entry **objects, uint32_t nr_objects, const unsigned char *hash, unsigned flags);
+const char *write_rev_file_order(const char *rev_name, uint32_t *pack_order, uint32_t nr_objects, const unsigned char *hash, unsigned flags);
 
 /*
  * The "hdr" output buffer should be at least this big, which will handle sizes
-- 
2.30.0.667.g81c0cbc6fd



* [PATCH v4 15/16] pack-revindex: write multi-pack reverse indexes
  2021-03-30 15:03 ` [PATCH v4 " Taylor Blau
                     ` (13 preceding siblings ...)
  2021-03-30 15:04   ` [PATCH v4 14/16] pack-write.c: extract 'write_rev_file_order' Taylor Blau
@ 2021-03-30 15:04   ` Taylor Blau
  2021-03-30 15:04   ` [PATCH v4 16/16] midx.c: improve cache locality in midx_pack_order_cmp() Taylor Blau
  2021-03-30 15:45   ` [PATCH v4 00/16] midx: implement a multi-pack reverse index Jeff King
  16 siblings, 0 replies; 171+ messages in thread
From: Taylor Blau @ 2021-03-30 15:04 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, dstolee, jonathantanmy

Implement the writing half of multi-pack reverse indexes. This is
nothing more than the format described a few patches ago, with a new set
of helper functions that will be used to clear out stale .rev files
corresponding to old MIDXs.

Unfortunately, a very similar comparison function as the one implemented
recently in pack-revindex.c is reimplemented here, this time accepting a
MIDX-internal type. An effort to DRY these up would create more
indirection and overhead than is necessary, so it isn't pursued here.

Currently, there are no callers which pass the MIDX_WRITE_REV_INDEX
flag, meaning that this is all dead code. But, that won't be the case
for long, since subsequent patches will introduce the multi-pack bitmap,
which will begin passing this flag.

(In midx.c:write_midx_internal(), the two adjacent if statements share a
conditional, but are written separately since the first one will
eventually also handle the MIDX_WRITE_BITMAP flag, which does not yet
exist.)

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
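For illustration, the rule used to pick out stale .rev files can be
sketched as a standalone predicate. The toy_* helpers are hypothetical
stand-ins for git's starts_with() and ends_with(), and
toy_is_stale_midx_file() is not a function in this patch; it only
mirrors the checks that decide whether a file gets unlinked.

```c
#include <string.h>

static int toy_starts_with(const char *s, const char *prefix)
{
	return strncmp(s, prefix, strlen(prefix)) == 0;
}

static int toy_ends_with(const char *s, const char *suffix)
{
	size_t len = strlen(s), suffix_len = strlen(suffix);

	return len >= suffix_len && !strcmp(s + len - suffix_len, suffix);
}

/*
 * Returns 1 if "file_name" looks like a MIDX auxiliary file with the
 * given extension and is not the one named by "keep" (which may be
 * NULL, meaning nothing is kept). Returns 0 otherwise.
 */
int toy_is_stale_midx_file(const char *file_name, const char *ext,
			   const char *keep)
{
	if (!toy_starts_with(file_name, "multi-pack-index-") ||
	    !toy_ends_with(file_name, ext))
		return 0;
	if (keep && !strcmp(keep, file_name))
		return 0;
	return 1;
}
```

Note the trailing dash in the prefix: the MIDX itself is named
"multi-pack-index" and so never matches.
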
 midx.c | 115 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 midx.h |   1 +
 2 files changed, 116 insertions(+)

diff --git a/midx.c b/midx.c
index c04e794888..b96eaa12fb 100644
--- a/midx.c
+++ b/midx.c
@@ -12,6 +12,7 @@
 #include "run-command.h"
 #include "repository.h"
 #include "chunk-format.h"
+#include "pack.h"
 
 #define MIDX_SIGNATURE 0x4d494458 /* "MIDX" */
 #define MIDX_VERSION 1
@@ -462,6 +463,7 @@ struct write_midx_context {
 	uint32_t entries_nr;
 
 	uint32_t *pack_perm;
+	uint32_t *pack_order;
 	unsigned large_offsets_needed:1;
 	uint32_t num_large_offsets;
 
@@ -816,6 +818,70 @@ static int write_midx_large_offsets(struct hashfile *f,
 	return 0;
 }
 
+static int midx_pack_order_cmp(const void *va, const void *vb, void *_ctx)
+{
+	struct write_midx_context *ctx = _ctx;
+
+	struct pack_midx_entry *a = &ctx->entries[*(const uint32_t *)va];
+	struct pack_midx_entry *b = &ctx->entries[*(const uint32_t *)vb];
+
+	uint32_t perm_a = ctx->pack_perm[a->pack_int_id];
+	uint32_t perm_b = ctx->pack_perm[b->pack_int_id];
+
+	/* Sort objects in the preferred pack ahead of any others. */
+	if (a->preferred > b->preferred)
+		return -1;
+	if (a->preferred < b->preferred)
+		return 1;
+
+	/* Then, order objects by which packs they appear in. */
+	if (perm_a < perm_b)
+		return -1;
+	if (perm_a > perm_b)
+		return 1;
+
+	/* Then, disambiguate by their offset within each pack. */
+	if (a->offset < b->offset)
+		return -1;
+	if (a->offset > b->offset)
+		return 1;
+
+	return 0;
+}
+
+static uint32_t *midx_pack_order(struct write_midx_context *ctx)
+{
+	uint32_t *pack_order;
+	uint32_t i;
+
+	ALLOC_ARRAY(pack_order, ctx->entries_nr);
+	for (i = 0; i < ctx->entries_nr; i++)
+		pack_order[i] = i;
+	QSORT_S(pack_order, ctx->entries_nr, midx_pack_order_cmp, ctx);
+
+	return pack_order;
+}
+
+static void write_midx_reverse_index(char *midx_name, unsigned char *midx_hash,
+				     struct write_midx_context *ctx)
+{
+	struct strbuf buf = STRBUF_INIT;
+	const char *tmp_file;
+
+	strbuf_addf(&buf, "%s-%s.rev", midx_name, hash_to_hex(midx_hash));
+
+	tmp_file = write_rev_file_order(NULL, ctx->pack_order, ctx->entries_nr,
+					midx_hash, WRITE_REV);
+
+	if (finalize_object_file(tmp_file, buf.buf))
+		die(_("cannot store reverse index file"));
+
+	strbuf_release(&buf);
+}
+
+static void clear_midx_files_ext(struct repository *r, const char *ext,
+				 unsigned char *keep_hash);
+
 static int write_midx_internal(const char *object_dir, struct multi_pack_index *m,
 			       struct string_list *packs_to_drop,
 			       const char *preferred_pack_name,
@@ -1001,6 +1067,14 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
 
 	finalize_hashfile(f, midx_hash, CSUM_FSYNC | CSUM_HASH_IN_STREAM);
 	free_chunkfile(cf);
+
+	if (flags & MIDX_WRITE_REV_INDEX)
+		ctx.pack_order = midx_pack_order(&ctx);
+
+	if (flags & MIDX_WRITE_REV_INDEX)
+		write_midx_reverse_index(midx_name, midx_hash, &ctx);
+	clear_midx_files_ext(the_repository, ".rev", midx_hash);
+
 	commit_lock_file(&lk);
 
 cleanup:
@@ -1015,6 +1089,7 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
 	free(ctx.info);
 	free(ctx.entries);
 	free(ctx.pack_perm);
+	free(ctx.pack_order);
 	free(midx_name);
 	return result;
 }
@@ -1027,6 +1102,44 @@ int write_midx_file(const char *object_dir,
 				   flags);
 }
 
+struct clear_midx_data {
+	char *keep;
+	const char *ext;
+};
+
+static void clear_midx_file_ext(const char *full_path, size_t full_path_len,
+				const char *file_name, void *_data)
+{
+	struct clear_midx_data *data = _data;
+
+	if (!(starts_with(file_name, "multi-pack-index-") &&
+	      ends_with(file_name, data->ext)))
+		return;
+	if (data->keep && !strcmp(data->keep, file_name))
+		return;
+
+	if (unlink(full_path))
+		die_errno(_("failed to remove %s"), full_path);
+}
+
+static void clear_midx_files_ext(struct repository *r, const char *ext,
+				 unsigned char *keep_hash)
+{
+	struct clear_midx_data data;
+	memset(&data, 0, sizeof(struct clear_midx_data));
+
+	if (keep_hash)
+		data.keep = xstrfmt("multi-pack-index-%s%s",
+				    hash_to_hex(keep_hash), ext);
+	data.ext = ext;
+
+	for_each_file_in_pack_dir(r->objects->odb->path,
+				  clear_midx_file_ext,
+				  &data);
+
+	free(data.keep);
+}
+
 void clear_midx_file(struct repository *r)
 {
 	char *midx = get_midx_filename(r->objects->odb->path);
@@ -1039,6 +1152,8 @@ void clear_midx_file(struct repository *r)
 	if (remove_path(midx))
 		die(_("failed to clear multi-pack-index at %s"), midx);
 
+	clear_midx_files_ext(r, ".rev", NULL);
+
 	free(midx);
 }
 
diff --git a/midx.h b/midx.h
index 0a8294d2ee..8684cf0fef 100644
--- a/midx.h
+++ b/midx.h
@@ -40,6 +40,7 @@ struct multi_pack_index {
 };
 
 #define MIDX_PROGRESS     (1 << 0)
+#define MIDX_WRITE_REV_INDEX (1 << 1)
 
 char *get_midx_rev_filename(struct multi_pack_index *m);
 
-- 
2.30.0.667.g81c0cbc6fd



* [PATCH v4 16/16] midx.c: improve cache locality in midx_pack_order_cmp()
  2021-03-30 15:03 ` [PATCH v4 " Taylor Blau
                     ` (14 preceding siblings ...)
  2021-03-30 15:04   ` [PATCH v4 15/16] pack-revindex: write multi-pack reverse indexes Taylor Blau
@ 2021-03-30 15:04   ` Taylor Blau
  2021-03-30 15:45   ` [PATCH v4 00/16] midx: implement a multi-pack reverse index Jeff King
  16 siblings, 0 replies; 171+ messages in thread
From: Taylor Blau @ 2021-03-30 15:04 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, dstolee, jonathantanmy

From: Jeff King <peff@peff.net>

There is a lot of pointer dereferencing in the pre-image version of
'midx_pack_order_cmp()', which this patch gets rid of.

Instead of comparing the pack preferred-ness and then the pack id, both
of these checks are done at the same time by using the high-order bit of
the pack id to represent whether it's preferred. Then the pack id and
offset are compared as usual.

This produces the same result so long as there are fewer than 2^31 packs,
which seems like a safe assumption to make in practice.
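
As a rough illustration of why folding the preferred bit into the pack id
gives the same ordering, here is a minimal standalone sketch. The struct
and function names (entry, sort_key, cmp_steps, cmp_packed) are
hypothetical and are not the ones used in midx.c; it only assumes that
pack ids stay below 2^31.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for one object entry; not git's real struct. */
struct entry {
	uint32_t pack;   /* pack position, assumed to be below 2^31 */
	int preferred;   /* non-zero if the object lives in the preferred pack */
	uint64_t offset; /* offset of the object within its pack */
};

/* Fold preferred-ness into the top bit: preferred entries keep it clear. */
static uint32_t sort_key(const struct entry *e)
{
	assert(e->pack < (1U << 31));
	return e->preferred ? e->pack : (e->pack | (1U << 31));
}

/* Three separate checks: preferred-ness, then pack, then offset. */
static int cmp_steps(const struct entry *a, const struct entry *b)
{
	int pa = !!a->preferred, pb = !!b->preferred;

	if (pa != pb)
		return pa > pb ? -1 : 1;
	if (a->pack != b->pack)
		return a->pack < b->pack ? -1 : 1;
	if (a->offset != b->offset)
		return a->offset < b->offset ? -1 : 1;
	return 0;
}

/* Two checks: the combined key, then the offset. */
static int cmp_packed(const struct entry *a, const struct entry *b)
{
	uint32_t ka = sort_key(a), kb = sort_key(b);

	if (ka != kb)
		return ka < kb ? -1 : 1;
	if (a->offset != b->offset)
		return a->offset < b->offset ? -1 : 1;
	return 0;
}
```

Since a non-preferred entry always carries the high bit, it compares
greater than any preferred entry regardless of the low 31 bits, which is
what the separate preferred check was doing.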

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 midx.c | 55 +++++++++++++++++++++++++++++--------------------------
 1 file changed, 29 insertions(+), 26 deletions(-)

diff --git a/midx.c b/midx.c
index b96eaa12fb..9e86583172 100644
--- a/midx.c
+++ b/midx.c
@@ -818,46 +818,49 @@ static int write_midx_large_offsets(struct hashfile *f,
 	return 0;
 }
 
-static int midx_pack_order_cmp(const void *va, const void *vb, void *_ctx)
+struct midx_pack_order_data {
+	uint32_t nr;
+	uint32_t pack;
+	off_t offset;
+};
+
+static int midx_pack_order_cmp(const void *va, const void *vb)
 {
-	struct write_midx_context *ctx = _ctx;
-
-	struct pack_midx_entry *a = &ctx->entries[*(const uint32_t *)va];
-	struct pack_midx_entry *b = &ctx->entries[*(const uint32_t *)vb];
-
-	uint32_t perm_a = ctx->pack_perm[a->pack_int_id];
-	uint32_t perm_b = ctx->pack_perm[b->pack_int_id];
-
-	/* Sort objects in the preferred pack ahead of any others. */
-	if (a->preferred > b->preferred)
+	const struct midx_pack_order_data *a = va, *b = vb;
+	if (a->pack < b->pack)
 		return -1;
-	if (a->preferred < b->preferred)
+	else if (a->pack > b->pack)
 		return 1;
-
-	/* Then, order objects by which packs they appear in. */
-	if (perm_a < perm_b)
+	else if (a->offset < b->offset)
 		return -1;
-	if (perm_a > perm_b)
+	else if (a->offset > b->offset)
 		return 1;
-
-	/* Then, disambiguate by their offset within each pack. */
-	if (a->offset < b->offset)
-		return -1;
-	if (a->offset > b->offset)
-		return 1;
-
-	return 0;
+	else
+		return 0;
 }
 
 static uint32_t *midx_pack_order(struct write_midx_context *ctx)
 {
+	struct midx_pack_order_data *data;
 	uint32_t *pack_order;
 	uint32_t i;
 
+	ALLOC_ARRAY(data, ctx->entries_nr);
+	for (i = 0; i < ctx->entries_nr; i++) {
+		struct pack_midx_entry *e = &ctx->entries[i];
+		data[i].nr = i;
+		data[i].pack = ctx->pack_perm[e->pack_int_id];
+		if (!e->preferred)
+			data[i].pack |= (1U << 31);
+		data[i].offset = e->offset;
+	}
+
+	QSORT(data, ctx->entries_nr, midx_pack_order_cmp);
+
 	ALLOC_ARRAY(pack_order, ctx->entries_nr);
 	for (i = 0; i < ctx->entries_nr; i++)
-		pack_order[i] = i;
-	QSORT_S(pack_order, ctx->entries_nr, midx_pack_order_cmp, ctx);
+		pack_order[i] = data[i].nr;
+	free(data);
 
 	return pack_order;
 }
-- 
2.30.0.667.g81c0cbc6fd


* Re: [PATCH v4 00/16] midx: implement a multi-pack reverse index
  2021-03-30 15:03 ` [PATCH v4 " Taylor Blau
                     ` (15 preceding siblings ...)
  2021-03-30 15:04   ` [PATCH v4 16/16] midx.c: improve cache locality in midx_pack_order_cmp() Taylor Blau
@ 2021-03-30 15:45   ` Jeff King
  2021-03-30 15:49     ` Taylor Blau
  16 siblings, 1 reply; 171+ messages in thread
From: Jeff King @ 2021-03-30 15:45 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, gitster, dstolee, jonathantanmy

On Tue, Mar 30, 2021 at 11:03:44AM -0400, Taylor Blau wrote:

> Here is another reroll of my series to implement a reverse index in
> preparation for multi-pack reachability bitmaps.

Thanks, this addresses all of my comments from the last round.

> This reroll differs only in the feedback I incorporated from Peff's review. They
> are mostly cosmetic; the most substantial change being that the --preferred-pack
> code now uses bsearch() to locate the name of the preferred pack (instead of
> implementing a binary search itself).

Yeah, I read over this part carefully, since it's actual new code (that
isn't run yet!), but I think it is correct.

One minor observation:

>     @@ t/t5319-multi-pack-index.sh: test_expect_success 'warn on improper hash version'
>      +			write --preferred-pack=test-BC-$bc.idx 2>err &&
>      +		test_must_be_empty err &&
>      +
>     ++		echo hi &&
>     ++		test-tool read-midx --show-objects objects >out &&
>     ++
>      +		ofs=$(git show-index <objects/pack/test-BC-$bc.idx | grep $b |
>      +			cut -d" " -f1) &&
>     -+		midx_expect_object_offset $b $ofs objects
>     ++		printf "%s %s\tobjects/pack/test-BC-%s.pack\n" \
>     ++			"$b" "$ofs" "$bc" >expect &&
>     ++		grep ^$b out >actual &&
>     ++
>     ++		test_cmp expect actual
>      +	)
>      +'

I'd probably have just skipped show-index entirely, and done:

  grep "^$b .* objects/pack/test-BC" actual

which expresses the intent ($b came from that pack). But I don't mind
the more exacting version (and certainly it is not worth a re-roll even
if you prefer mine).

-Peff


* Re: [PATCH v4 00/16] midx: implement a multi-pack reverse index
  2021-03-30 15:45   ` [PATCH v4 00/16] midx: implement a multi-pack reverse index Jeff King
@ 2021-03-30 15:49     ` Taylor Blau
  2021-03-30 16:01       ` Jeff King
  0 siblings, 1 reply; 171+ messages in thread
From: Taylor Blau @ 2021-03-30 15:49 UTC (permalink / raw)
  To: Jeff King; +Cc: Taylor Blau, git, gitster, dstolee, jonathantanmy

On Tue, Mar 30, 2021 at 11:45:19AM -0400, Jeff King wrote:
> > This reroll differs only in the feedback I incorporated from Peff's review. They
> > are mostly cosmetic; the most substantial change being that the --preferred-pack
> > code now uses bsearch() to locate the name of the preferred pack (instead of
> > implementing a binary search itself).
>
> Yeah, I read over this part carefully, since it's actual new code (that
> isn't run yet!), but I think it is correct.

Thankfully this does have coverage via any test that passes
`--preferred-pack` (like the one below).

> One minor observation:
>
> >     @@ t/t5319-multi-pack-index.sh: test_expect_success 'warn on improper hash version'
> >      +			write --preferred-pack=test-BC-$bc.idx 2>err &&
> >      +		test_must_be_empty err &&
> >      +
> >     ++		echo hi &&
> >     ++		test-tool read-midx --show-objects objects >out &&
> >     ++
> >      +		ofs=$(git show-index <objects/pack/test-BC-$bc.idx | grep $b |
> >      +			cut -d" " -f1) &&
> >     -+		midx_expect_object_offset $b $ofs objects
> >     ++		printf "%s %s\tobjects/pack/test-BC-%s.pack\n" \
> >     ++			"$b" "$ofs" "$bc" >expect &&
> >     ++		grep ^$b out >actual &&
> >     ++
> >     ++		test_cmp expect actual
> >      +	)
> >      +'
>
> I'd probably have just skipped show-index entirely, and done:
>
>   grep "^$b .* objects/pack/test-BC" actual
>
> which expresses the intent ($b came from that pack). But I don't mind
> the more exacting version (and certainly it is not worth a re-roll even
> if you prefer mine).

I originally wrote it that way, but decided to write both expect and
actual to make debugging easier if this ever regresses. Not like it's
that hard to run the test-tool yourself in the trash directory, but
having a snapshot of that object from the MIDX's perspective might make
things a little easier.

Anyway, I agree with you that it probably doesn't matter a ton either
way.

> -Peff

Thanks,
Taylor


* Re: [PATCH v4 00/16] midx: implement a multi-pack reverse index
  2021-03-30 15:49     ` Taylor Blau
@ 2021-03-30 16:01       ` Jeff King
  0 siblings, 0 replies; 171+ messages in thread
From: Jeff King @ 2021-03-30 16:01 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, gitster, dstolee, jonathantanmy

On Tue, Mar 30, 2021 at 11:49:59AM -0400, Taylor Blau wrote:

> On Tue, Mar 30, 2021 at 11:45:19AM -0400, Jeff King wrote:
> > > This reroll differs only in the feedback I incorporated from Peff's review. They
> > > are mostly cosmetic; the most substantial change being that the --preferred-pack
> > > code now uses bsearch() to locate the name of the preferred pack (instead of
> > > implementing a binary search itself).
> >
> > Yeah, I read over this part carefully, since it's actual new code (that
> > isn't run yet!), but I think it is correct.
> 
> Thankfully this does have coverage via any test that passes
> `--preferred-pack` (like the one below).

Oh right, I forgot this was touching that early part. So now I'm doubly
confident in it.

-Peff


* Re: [PATCH v4 08/16] midx: allow marking a pack as preferred
  2021-03-30 15:04   ` [PATCH v4 08/16] midx: allow marking a pack as preferred Taylor Blau
@ 2021-04-01  0:32     ` Taylor Blau
  0 siblings, 0 replies; 171+ messages in thread
From: Taylor Blau @ 2021-04-01  0:32 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, dstolee, jonathantanmy

Junio,

On Tue, Mar 30, 2021 at 11:04:11AM -0400, Taylor Blau wrote:

I accidentally left a stray debugging line in here, and managed to skip
over it when reading the range-diff. It's right...

> diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh
> index b4afab1dfc..031a5570c0 100755
> --- a/t/t5319-multi-pack-index.sh
> +++ b/t/t5319-multi-pack-index.sh
> @@ -234,6 +234,49 @@ test_expect_success 'warn on improper hash version' '
>  	)
>  '
>
> +test_expect_success 'midx picks objects from preferred pack' '
> +	test_when_finished rm -rf preferred.git &&
> +	git init --bare preferred.git &&
> +	(
> +		cd preferred.git &&
> +
> +		a=$(echo "a" | git hash-object -w --stdin) &&
> +		b=$(echo "b" | git hash-object -w --stdin) &&
> +		c=$(echo "c" | git hash-object -w --stdin) &&
> +
> +		# Set up two packs, duplicating the object "B" at different
> +		# offsets.
> +		#
> +		# Note that the "BC" pack (the one we choose as preferred) sorts
> +		# lexically after the "AB" pack, meaning that omitting the
> +		# --preferred-pack argument would cause this test to fail (since
> +		# the MIDX code would select the copy of "b" in the "AB" pack).
> +		git pack-objects objects/pack/test-AB <<-EOF &&
> +		$a
> +		$b
> +		EOF
> +		bc=$(git pack-objects objects/pack/test-BC <<-EOF
> +		$b
> +		$c
> +		EOF
> +		) &&
> +
> +		git multi-pack-index --object-dir=objects \
> +			write --preferred-pack=test-BC-$bc.idx 2>err &&
> +		test_must_be_empty err &&
> +
> +		echo hi &&

...here. Would you mind fixing it up locally before applying this to
next?

Sorry for the trouble.

Thanks,
Taylor


* [PATCH] pack-write: skip *.rev work when not writing *.rev
  2021-03-30 15:04   ` [PATCH v4 14/16] pack-write.c: extract 'write_rev_file_order' Taylor Blau
@ 2021-09-08  1:08     ` Ævar Arnfjörð Bjarmason
  2021-09-08  1:35       ` Carlo Arenas
  2021-09-08  2:50       ` Taylor Blau
  0 siblings, 2 replies; 171+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-08  1:08 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Taylor Blau, Derrick Stolee,
	Ævar Arnfjörð Bjarmason

Fix a performance regression introduced in a587b5a786 (pack-write.c:
extract 'write_rev_file_order', 2021-03-30) and stop needlessly
allocating the "pack_order" array and sorting it with
"pack_order_cmp()", only to throw that work away when we discover that
we're not writing *.rev files after all.

This redundant work was not present in the original version of this
code added in 8ef50d9958 (pack-write.c: prepare to write 'pack-*.rev'
files, 2021-01-25). There we'd call write_rev_file() from
e.g. finish_tmp_packfile(), but we'd "return NULL" early in
write_rev_file() if not doing a "WRITE_REV" or "WRITE_REV_VERIFY".

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 pack-write.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/pack-write.c b/pack-write.c
index f1fc3ecafa..1883848e7c 100644
--- a/pack-write.c
+++ b/pack-write.c
@@ -224,6 +224,9 @@ const char *write_rev_file(const char *rev_name,
 	uint32_t i;
 	const char *ret;
 
+	if (!(flags & WRITE_REV) && !(flags & WRITE_REV_VERIFY))
+		return NULL;
+
 	ALLOC_ARRAY(pack_order, nr_objects);
 	for (i = 0; i < nr_objects; i++)
 		pack_order[i] = i;
-- 
2.33.0.819.gea1b153a43c



* Re: [PATCH] pack-write: skip *.rev work when not writing *.rev
  2021-09-08  1:08     ` [PATCH] pack-write: skip *.rev work when not writing *.rev Ævar Arnfjörð Bjarmason
@ 2021-09-08  1:35       ` Carlo Arenas
  2021-09-08  2:42         ` Taylor Blau
  2021-09-08  2:50       ` Taylor Blau
  1 sibling, 1 reply; 171+ messages in thread
From: Carlo Arenas @ 2021-09-08  1:35 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Junio C Hamano, Jeff King, Taylor Blau, Derrick Stolee

On Tue, Sep 7, 2021 at 6:10 PM Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote:
> +       if (!(flags & WRITE_REV) && !(flags & WRITE_REV_VERIFY))
> +               return NULL;

I see this expression matches exactly the logic from 8ef50d9958 which
is why I presume
you used it, but the simpler (and logically equivalent[1]) :

  if (!((flags & WRITE_REV) || (flags & WRITE_REV_VERIFY)))

is easier to read IMHO

Carlo

[1] https://en.wikipedia.org/wiki/De_Morgan%27s_laws


* Re: [PATCH] pack-write: skip *.rev work when not writing *.rev
  2021-09-08  1:35       ` Carlo Arenas
@ 2021-09-08  2:42         ` Taylor Blau
  2021-09-08 15:47           ` Junio C Hamano
  0 siblings, 1 reply; 171+ messages in thread
From: Taylor Blau @ 2021-09-08  2:42 UTC (permalink / raw)
  To: Carlo Arenas
  Cc: Ævar Arnfjörð Bjarmason, git, Junio C Hamano,
	Jeff King, Taylor Blau, Derrick Stolee

On Tue, Sep 07, 2021 at 06:35:10PM -0700, Carlo Arenas wrote:
> On Tue, Sep 7, 2021 at 6:10 PM Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote:
> > +       if (!(flags & WRITE_REV) && !(flags & WRITE_REV_VERIFY))
> > +               return NULL;
>
> I see this expression matches exactly the logic from 8ef50d9958 which
> is why I presume
> you used it, but the simpler (and logically equivalent[1]) :
>
>   if (!((flags & WRITE_REV) || (flags & WRITE_REV_VERIFY)))

Even simpler would be:

    if (!(flags & (WRITE_REV | WRITE_REV_VERIFY)))

although with optimization flags other than -O0, it seems that each of
these three produce the same result [1], so I don't think that it
matters much either way ;-).
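
For anyone who wants to convince themselves of the logical equivalence
without reading assembly, a small sketch like the following checks all
three spellings against each other. The bit values given to WRITE_REV
and WRITE_REV_VERIFY here are placeholders for illustration (the real
definitions live in git's headers), and the helper names are made up.

```c
/* Illustrative bit values only; not copied from git's headers. */
#define WRITE_REV        (1u << 2)
#define WRITE_REV_VERIFY (1u << 3)

/* The form used in the patch. */
static int skip_original(unsigned flags)
{
	return !(flags & WRITE_REV) && !(flags & WRITE_REV_VERIFY);
}

/* The De Morgan rewrite. */
static int skip_de_morgan(unsigned flags)
{
	return !((flags & WRITE_REV) || (flags & WRITE_REV_VERIFY));
}

/* The single-mask form. */
static int skip_masked(unsigned flags)
{
	return !(flags & (WRITE_REV | WRITE_REV_VERIFY));
}

/* Returns 1 if all three forms agree for every flags value below limit. */
static int all_forms_agree(unsigned limit)
{
	unsigned flags;

	for (flags = 0; flags < limit; flags++) {
		int want = skip_original(flags);

		if (skip_de_morgan(flags) != want || skip_masked(flags) != want)
			return 0;
	}
	return 1;
}
```

Calling all_forms_agree(16) covers every combination of the low four
bits, which includes both placeholder flags.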

Thanks,
Taylor

[1]: https://godbolt.org/z/fxxhzEz79


* Re: [PATCH] pack-write: skip *.rev work when not writing *.rev
  2021-09-08  1:08     ` [PATCH] pack-write: skip *.rev work when not writing *.rev Ævar Arnfjörð Bjarmason
  2021-09-08  1:35       ` Carlo Arenas
@ 2021-09-08  2:50       ` Taylor Blau
  2021-09-08  3:50         ` Taylor Blau
  1 sibling, 1 reply; 171+ messages in thread
From: Taylor Blau @ 2021-09-08  2:50 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Junio C Hamano, Jeff King, Taylor Blau, Derrick Stolee

On Wed, Sep 08, 2021 at 03:08:03AM +0200, Ævar Arnfjörð Bjarmason wrote:
> Fix a performance regression introduced in a587b5a786 (pack-write.c:
> extract 'write_rev_file_order', 2021-03-30) and stop needlessly
> allocating the "pack_order" array and sorting it with
> "pack_order_cmp()", only to throw that work away when we discover that
> we're not writing *.rev files after all.
>
> This redundant work was not present in the original version of this
> code added in 8ef50d9958 (pack-write.c: prepare to write 'pack-*.rev'
> files, 2021-01-25). There we'd call write_rev_file() from
> e.g. finish_tmp_packfile(), but we'd "return NULL" early in
> write_rev_file() if not doing a "WRITE_REV" or "WRITE_REV_VERIFY".
>
> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
> ---
>  pack-write.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/pack-write.c b/pack-write.c
> index f1fc3ecafa..1883848e7c 100644
> --- a/pack-write.c
> +++ b/pack-write.c
> @@ -224,6 +224,9 @@ const char *write_rev_file(const char *rev_name,
>  	uint32_t i;
>  	const char *ret;
>
> +	if (!(flags & WRITE_REV) && !(flags & WRITE_REV_VERIFY))
> +		return NULL;
> +

Great catch! This fix as-is is obviously correct, but it does make the
same checks in write_rev_file_order redundant as a result.

If we call write_rev_file() without WRITE_REV, then we'll never call
write_rev_file_order(). The other caller of write_rev_file_order() is
from midx.c:write_midx_reverse_index(), which is only called by
write_midx_internal() where it is guarded by a similar conditional.

So I think we could probably either:

  - remove the check from write_rev_file_order() altogether, moving it
    to write_rev_file() like you did here, or

  - remove the check from write_rev_file_order() and elevate it to the
    caller which is missing the check in finish_tmp_packfile()

Of the two, I think the former is more appealing (since no other
functions called by finish_tmp_packfile() are guarded like that; they
conditionally behave as noops depending on `flags`).

But what you wrote here is just fine as-is, so the above are just some
optional ideas for potential improvements.

Thanks,
Taylor


* Re: [PATCH] pack-write: skip *.rev work when not writing *.rev
  2021-09-08  2:50       ` Taylor Blau
@ 2021-09-08  3:50         ` Taylor Blau
  2021-09-08 10:18           ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 171+ messages in thread
From: Taylor Blau @ 2021-09-08  3:50 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Junio C Hamano, Jeff King, Derrick Stolee

On Tue, Sep 07, 2021 at 10:50:58PM -0400, Taylor Blau wrote:
> Of the two, I think the former is more appealing (since no other
> functions called by finish_tmp_packfile() are guarded like that; they
> conditionally behave as noops depending on `flags`).

Sorry; this is nonsensical. The only other function we call is
write_idx_file() which merely changes its behavior based on flags, but
it never behaves as a noop.

That doesn't change my thinking about preferring the former of my two
suggestions, but just wanted to correct my error.

Thanks,
Taylor


* Re: [PATCH] pack-write: skip *.rev work when not writing *.rev
  2021-09-08  3:50         ` Taylor Blau
@ 2021-09-08 10:18           ` Ævar Arnfjörð Bjarmason
  2021-09-08 16:32             ` Taylor Blau
  0 siblings, 1 reply; 171+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-08 10:18 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, Junio C Hamano, Jeff King, Derrick Stolee


On Tue, Sep 07 2021, Taylor Blau wrote:

> On Tue, Sep 07, 2021 at 10:50:58PM -0400, Taylor Blau wrote:
>> Of the two, I think the former is more appealing (since no other
>> functions called by finish_tmp_packfile() are guarded like that; they
>> conditionally behave as noops depending on `flags`).
>
> Sorry; this is nonsensical. The only other function we call is
> write_idx_file() which merely changes its behavior based on flags, but
> it never behaves as a noop.
>
> That doesn't change my thinking about preferring the former of my two
> suggestions, but just wanted to correct my error.

I agree that this code is very confusing overall, but would prefer to
wait on refactoring further until the two topics in flight (this and the
other pack-write topic) settle.

As shown in
https://lore.kernel.org/git/87v93bidhn.fsf@evledraar.gmail.com/ I think
the best thing to do is neither of the narrow fixes you suggest, but to
more deeply untangle the whole mess around how we choose to write these
files & with what options. A lot of it is bit-twiddling back and forth
for no real reason.

Once we do that it becomes impossible to land in a mode where these
functions need to in principle deal with writing a "real" file and the
"verify" mode, which as noted in the linked E-Mail is the case now, we
just need/want these "is more than one set?" checks & assertions because
we've made the interface overly confusing/general.


* Re: [PATCH] pack-write: skip *.rev work when not writing *.rev
  2021-09-08  2:42         ` Taylor Blau
@ 2021-09-08 15:47           ` Junio C Hamano
  0 siblings, 0 replies; 171+ messages in thread
From: Junio C Hamano @ 2021-09-08 15:47 UTC (permalink / raw)
  To: Taylor Blau
  Cc: Carlo Arenas, Ævar Arnfjörð Bjarmason, git,
	Jeff King, Derrick Stolee

Taylor Blau <me@ttaylorr.com> writes:

> On Tue, Sep 07, 2021 at 06:35:10PM -0700, Carlo Arenas wrote:
>> On Tue, Sep 7, 2021 at 6:10 PM Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote:
>> > +       if (!(flags & WRITE_REV) && !(flags & WRITE_REV_VERIFY))
>> > +               return NULL;
>>
>> I see this expression matches exactly the logic from 8ef50d9958 which
>> is why I presume
>> you used it, but the simpler (and logically equivalent[1]) :
>>
>>   if (!((flags & WRITE_REV) || (flags & WRITE_REV_VERIFY)))
>
> Even simpler would be:
>
>     if (!(flags & (WRITE_REV | WRITE_REV_VERIFY)))
>
> although with optimization flags other than -O0, it seems that each of
> these three produce the same result [1], so I don't think that it
> matters much either way ;-).

If all result in the same binary, the only deciding factor would be
how readable the code is to human readers.

I too find that your version, "we care about these two bits---if
flags does not have either of them, then...", the easiest to follow.
But the original is not unreadable and is good enough once it has
already been written.

Thanks.


* Re: [PATCH] pack-write: skip *.rev work when not writing *.rev
  2021-09-08 10:18           ` Ævar Arnfjörð Bjarmason
@ 2021-09-08 16:32             ` Taylor Blau
  0 siblings, 0 replies; 171+ messages in thread
From: Taylor Blau @ 2021-09-08 16:32 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Junio C Hamano, Jeff King, Derrick Stolee

On Wed, Sep 08, 2021 at 12:18:38PM +0200, Ævar Arnfjörð Bjarmason wrote:
>
> On Tue, Sep 07 2021, Taylor Blau wrote:
>
> > On Tue, Sep 07, 2021 at 10:50:58PM -0400, Taylor Blau wrote:
> >> Of the two, I think the former is more appealing (since no other
> >> functions called by finish_tmp_packfile() are guarded like that; they
> >> conditionally behave as noops depending on `flags`).
> >
> > Sorry; this is nonsensical. The only other function we call is
> > write_idx_file() which merely changes its behavior based on flags, but
> > it never behaves as a noop.
> >
> > That doesn't change my thinking about preferring the former of my two
> > suggestions, but just wanted to correct my error.
>
> I agree that this code is very confusing overall, but would prefer to
> wait on refactoring further until the two topics in flight (this and the
> other pack-write topic) settle.

I'm fine to wait on any further refactorings. And I agree that this code
is confusing, since when I read it last night I thought that the check
in write_rev_file_order() was a duplicate of the one you introduced, but
it is not:

    if ((flags & WRITE_REV) && (flags & WRITE_REV_VERIFY))
      die(_("cannot both write and verify reverse index"));

and that check is different than the one you added, which I think is
appropriate.
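
To spell out the difference between the two checks, here is a sketch
that classifies a flags word into the four possible cases. The enum,
the function name, and the bit values are hypothetical and only meant
to illustrate the distinction; this is not how pack-write.c is
structured.

```c
/* Illustrative bit values only; not copied from git's headers. */
#define WRITE_REV        (1u << 2)
#define WRITE_REV_VERIFY (1u << 3)

enum rev_action {
	REV_SKIP,     /* neither bit set: the new early return */
	REV_WRITE,    /* only WRITE_REV set */
	REV_VERIFY,   /* only WRITE_REV_VERIFY set */
	REV_CONFLICT  /* both bits set: the existing die() case */
};

static enum rev_action classify_rev_flags(unsigned flags)
{
	int do_write = !!(flags & WRITE_REV);
	int do_verify = !!(flags & WRITE_REV_VERIFY);

	if (do_write && do_verify)
		return REV_CONFLICT;
	if (!do_write && !do_verify)
		return REV_SKIP;
	return do_write ? REV_WRITE : REV_VERIFY;
}
```

The "neither" and "both" cases sit at opposite corners of the truth
table, so one check cannot stand in for the other.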

So this patch looks good to me, and sorry for the confusion.

Thanks,
Taylor


* Re: [PATCH 3/5] commit-graph: use parse_options_concat()
  2021-02-15 20:39             ` Ævar Arnfjörð Bjarmason
@ 2021-09-17 21:13               ` SZEDER Gábor
  2021-09-17 22:03                 ` Jeff King
  2021-09-18  0:58                 ` Ævar Arnfjörð Bjarmason
  0 siblings, 2 replies; 171+ messages in thread
From: SZEDER Gábor @ 2021-09-17 21:13 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Taylor Blau, git, Junio C Hamano, dstolee, peff

On Mon, Feb 15, 2021 at 09:39:11PM +0100, Ævar Arnfjörð Bjarmason wrote:
> 
> On Mon, Feb 15 2021, Taylor Blau wrote:
> 
> > On Mon, Feb 15, 2021 at 07:41:16PM +0100, Ævar Arnfjörð Bjarmason wrote:
> >> Make use of the parse_options_concat() so we don't need to copy/paste
> >> common options like --object-dir. This is inspired by a similar change
> >> to "checkout" in 2087182272
> >> (checkout: split options[] array in three pieces, 2019-03-29).
> >>
> >> A minor behavior change here is that now we're going to list both
> >> --object-dir and --progress first, before we'd list --progress along
> >> with other options.

The final version of this patch that was picked up is at 

  https://public-inbox.org/git/patch-v4-3.7-32cc0d1c7bc-20210823T122854Z-avarab@gmail.com/

I reply to this old version because of the following pieces of the
discussion:

> > "Behavior change" referring only to the output of `git commit-graph -h`,
> > no?
> >
> > Looking at the code (and understanding this whole situation a little bit
> > better), I'd think that this wouldn't cause us to parse anything
> > differently before or after this change, right?
> 
> Indeed, I just mean the "-h" or "--invalid-opt" output changed in the
> order we show the options in.

[...]

> but I wanted to just focus on
> refactoring existing behavior & get rid of the copy/pasted options

No, there is more behavior change: since 84e4484f12 (commit-graph: use
parse_options_concat(), 2021-08-23) the 'git commit-graph' command
does accept the '--[no-]progress' options as well, but before that
only its subcommands did, and 'git commit-graph --progress ...'
errored out with "unknown option".

Worse, sometimes 'git commit-graph --progress ...' doesn't work as
it's supposed to.  The patch below describes the problem and fixes it,
but on second thought I don't think that it is the right approach.

In general, even when all subcommands of a git command understand a
particular --option, that does not mean that it's a good idea to teach
that option to that git command.  E.g. what if we later add another
subcommand for which that --option doesn't make any sense?  And from
the quoted discussion above it seems that teaching 'git commit-graph'
the '--progress' option was not intentional at all.

I'm inclined to think that '--progress' should rather be removed from
the common 'git commit-graph' options; luckily it's not too late,
because it hasn't been released yet.


  ---  >8  ---

Subject: [PATCH] commit-graph: fix 'git commit-graph --[no-]progress ...'

Until recently 'git commit-graph' didn't have a '--progress' option,
only its subcommands did, but this changed with 84e4484f12
(commit-graph: use parse_options_concat(), 2021-08-23), and now the
'git commit-graph' command accepts the '--[no-]progress' options as
well.  Alas, they don't always work as they are supposed to, because
the isatty(2) check is only performed in the subcommands, i.e. after
the "main" 'git commit-graph' command has parsed its options, and it
unconditionally overwrites whatever '--[no-]progress' option might
have been given:

  $ GIT_PROGRESS_DELAY=0 git commit-graph --no-progress write --reachable
  Collecting referenced commits: 1617, done.
  Loading known commits in commit graph: 100% (1617/1617), done.
  [...]
  $ GIT_PROGRESS_DELAY=0 git commit-graph --progress write 2>out
  $ wc -c out
  0 out

Move the isatty(2) check to cmd_commit_graph(), before it calls
parse_options(), so 'git commit-graph --[no-]progress' will be able to
override it as well.
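
The ordering problem can be reduced to a small standalone sketch. The
names below (struct opts, parse_progress and the two
progress_default_* helpers) are hypothetical, the parser is a toy
rather than parse_options(), and the terminal check is passed in as a
plain int instead of calling isatty(2), so the behavior is
deterministic.

```c
#include <string.h>

struct opts {
	int progress;
};

/* Toy parser: only understands --progress and --no-progress. */
static void parse_progress(struct opts *o, int argc, const char **argv)
{
	int i;

	for (i = 0; i < argc; i++) {
		if (!strcmp(argv[i], "--progress"))
			o->progress = 1;
		else if (!strcmp(argv[i], "--no-progress"))
			o->progress = 0;
	}
}

/* Default applied after the top-level parse: the option is clobbered. */
static int progress_default_late(int is_tty, int argc, const char **argv)
{
	struct opts o = { 0 };

	parse_progress(&o, argc, argv);
	o.progress = is_tty; /* what the subcommand used to do */
	return o.progress;
}

/* Default applied before parsing: the option can override it. */
static int progress_default_early(int is_tty, int argc, const char **argv)
{
	struct opts o = { 0 };

	o.progress = is_tty;
	parse_progress(&o, argc, argv);
	return o.progress;
}
```

With is_tty set to 1 and "--no-progress" given, the late variant still
reports progress as enabled, while the early variant honors the option.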

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
---
 builtin/commit-graph.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
index 21fc6e934b..3a873ceaf6 100644
--- a/builtin/commit-graph.c
+++ b/builtin/commit-graph.c
@@ -101,7 +101,6 @@ static int graph_verify(int argc, const char **argv)
 
 	trace2_cmd_mode("verify");
 
-	opts.progress = isatty(2);
 	argc = parse_options(argc, argv, NULL,
 			     options,
 			     builtin_commit_graph_verify_usage, 0);
@@ -250,7 +249,6 @@ static int graph_write(int argc, const char **argv)
 	};
 	struct option *options = add_common_options(builtin_commit_graph_write_options);
 
-	opts.progress = isatty(2);
 	opts.enable_changed_paths = -1;
 	write_opts.size_multiple = 2;
 	write_opts.max_commits = 0;
@@ -331,6 +329,7 @@ int cmd_commit_graph(int argc, const char **argv, const char *prefix)
 	struct option *builtin_commit_graph_options = common_opts;
 
 	git_config(git_default_config, NULL);
+	opts.progress = isatty(2);
 	argc = parse_options(argc, argv, prefix,
 			     builtin_commit_graph_options,
 			     builtin_commit_graph_usage,
-- 
2.33.0.517.ga8dcee0d0a

  ---  8<  ---



* Re: [PATCH 3/5] commit-graph: use parse_options_concat()
  2021-09-17 21:13               ` SZEDER Gábor
@ 2021-09-17 22:03                 ` Jeff King
  2021-09-18  4:30                   ` Taylor Blau
  2021-09-18  0:58                 ` Ævar Arnfjörð Bjarmason
  1 sibling, 1 reply; 171+ messages in thread
From: Jeff King @ 2021-09-17 22:03 UTC (permalink / raw)
  To: SZEDER Gábor
  Cc: Ævar Arnfjörð Bjarmason, Taylor Blau, git,
	Junio C Hamano, dstolee

On Fri, Sep 17, 2021 at 11:13:37PM +0200, SZEDER Gábor wrote:

> Worse, sometimes 'git commit-graph --progress ...' doesn't work as
> it's supposed to.  The patch below describes the problem and fixes it,
> but on second thought I don't think that it is the right approach.
> 
> In general, even when all subcommands of a git command understand a
> particular --option, that does not mean that it's a good idea to teach
> that option to that git command.  E.g. what if we later add another
> subcommand for which that --option doesn't make any sense?  And from
> the quoted discussion above it seems that teaching 'git commit-graph'
> the '--progress' option was not intentional at all.
> 
> I'm inclined to think that '--progress' should rather be removed from
> the common 'git commit-graph' options; luckily it's not too late,
> because it hasn't been released yet.

I wasn't following this series closely, but having seen your fix below,
I'm inclined to agree with you. Just because we _can_ allow options
before or after sub-commands does not necessarily make it a good idea.

There is a distinct meaning to options before/after the command for the
base "git" command (e.g., "git -C foo branch" versus "git branch -C
foo"), and I think that has been useful overall.

>   ---  >8  ---
> 
> Subject: [PATCH] commit-graph: fix 'git commit-graph --[no-]progress ...'

This patch looks like a sensible fix if we don't simply remove the "git
commit-graph --progress write" version.

-Peff


* Re: [PATCH 3/5] commit-graph: use parse_options_concat()
  2021-09-17 21:13               ` SZEDER Gábor
  2021-09-17 22:03                 ` Jeff King
@ 2021-09-18  0:58                 ` Ævar Arnfjörð Bjarmason
  1 sibling, 0 replies; 171+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-18  0:58 UTC (permalink / raw)
  To: SZEDER Gábor; +Cc: Taylor Blau, git, Junio C Hamano, dstolee, peff


On Fri, Sep 17 2021, SZEDER Gábor wrote:

> On Mon, Feb 15, 2021 at 09:39:11PM +0100, Ævar Arnfjörð Bjarmason wrote:
>> 
>> On Mon, Feb 15 2021, Taylor Blau wrote:
>> 
>> > On Mon, Feb 15, 2021 at 07:41:16PM +0100, Ævar Arnfjörð Bjarmason wrote:
>> >> Make use of the parse_options_concat() so we don't need to copy/paste
>> >> common options like --object-dir. This is inspired by a similar change
>> >> to "checkout" in 2087182272
>> >> (checkout: split options[] array in three pieces, 2019-03-29).
>> >>
>> >> A minor behavior change here is that now we're going to list both
>> >> --object-dir and --progress first, before we'd list --progress along
>> >> with other options.
>
> The final version of this patch that was picked up is at 
>
>   https://public-inbox.org/git/patch-v4-3.7-32cc0d1c7bc-20210823T122854Z-avarab@gmail.com/
>
> I reply to this old version because of the following pieces of the
> discussion:
>
>> > "Behavior change" referring only to the output of `git commit-graph -h`,
>> > no?
>> >
>> > Looking at the code (and understanding this whole situation a little bit
>> > better), I'd think that this wouldn't cause us to parse anything
>> > differently before or after this change, right?
>> 
>> Indeed, I just mean the "-h" or "--invalid-opt" output changed in the
>> order we show the options in.
>
> [...]
>
>> but I wanted to just focus on
>> refactoring existing behavior & get rid of the copy/pasted options
>
> No, there is more behavior change: since 84e4484f12 (commit-graph: use
> parse_options_concat(), 2021-08-23) the 'git commit-graph' command
> does accept the '--[no-]progress' options as well, but before that
> only its subcommands did, and 'git commit-graph --progress ...'
> errored out with "unknown option".
>
> Worse, sometimes 'git commit-graph --progress ...' doesn't work as
> it's supposed to.  The patch below describes the problem and fixes it,
> but on second thought I don't think that it is the right approach.
>
> In general, even when all subcommands of a git command understand a
> particular --option, that does not mean that it's a good idea to teach
> that option to that git command.  E.g. what if we later add another
> subcommand for which that --option doesn't make any sense?  And from
> the quoted discussion above it seems that teaching 'git commit-graph'
> the '--progress' option was not intentional at all.
>
> I'm inclined to think that '--progress' should rather be removed from
> the common 'git commit-graph' options; luckily it's not too late,
> because it hasn't been released yet.
>
>
>   ---  >8  ---
>
> Subject: [PATCH] commit-graph: fix 'git commit-graph --[no-]progress ...'
>
> Until recently 'git commit-graph' didn't have a '--progress' option,
> only its subcommands did, but this changed with 84e4484f12
> (commit-graph: use parse_options_concat(), 2021-08-23), and now the
> 'git commit-graph' command accepts the '--[no-]progress' options as
> well.  Alas, they don't always work as they are supposed to, because
> the isatty(2) check is only performed in the subcommands, i.e. after
> the "main" 'git commit-graph' command has parsed its options, and it
> unconditionally overwrites whatever '--[no-]progress' option might
> have been given:
>
>   $ GIT_PROGRESS_DELAY=0 git commit-graph --no-progress write --reachable
>   Collecting referenced commits: 1617, done.
>   Loading known commits in commit graph: 100% (1617/1617), done.
>   [...]
>   $ GIT_PROGRESS_DELAY=0 git commit-graph --progress write 2>out
>   $ wc -c out
>   0 out
>
> Move the isatty(2) check to cmd_commit_graph(), before it calls
> parse_options(), so 'git commit-graph --[no-]progress' will be able to
> override it as well.
>
> Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
> ---
>  builtin/commit-graph.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
> index 21fc6e934b..3a873ceaf6 100644
> --- a/builtin/commit-graph.c
> +++ b/builtin/commit-graph.c
> @@ -101,7 +101,6 @@ static int graph_verify(int argc, const char **argv)
>  
>  	trace2_cmd_mode("verify");
>  
> -	opts.progress = isatty(2);
>  	argc = parse_options(argc, argv, NULL,
>  			     options,
>  			     builtin_commit_graph_verify_usage, 0);
> @@ -250,7 +249,6 @@ static int graph_write(int argc, const char **argv)
>  	};
>  	struct option *options = add_common_options(builtin_commit_graph_write_options);
>  
> -	opts.progress = isatty(2);
>  	opts.enable_changed_paths = -1;
>  	write_opts.size_multiple = 2;
>  	write_opts.max_commits = 0;
> @@ -331,6 +329,7 @@ int cmd_commit_graph(int argc, const char **argv, const char *prefix)
>  	struct option *builtin_commit_graph_options = common_opts;
>  
>  	git_config(git_default_config, NULL);
> +	opts.progress = isatty(2);
>  	argc = parse_options(argc, argv, prefix,
>  			     builtin_commit_graph_options,
>  			     builtin_commit_graph_usage,

Yes, this was unintentional on my part, sorry, and thanks for cleaning
up my mess.
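
To make the ordering problem concrete, here is a minimal sketch in C. It
is not the code from builtin/commit-graph.c: "struct cg_opts",
parse_progress() and the two progress_default_*() helpers are
hypothetical names, and parse_progress() is only a toy stand-in for
parse_options() that knows nothing but --[no-]progress. The
"stderr_is_tty" parameter stands in for the result of isatty(2).

```c
#include <string.h>

/* Hypothetical stand-in for the shared options struct. */
struct cg_opts {
	int progress;
};

/*
 * Toy replacement for parse_options(): it only understands
 * --progress and --no-progress and ignores everything else.
 */
static void parse_progress(struct cg_opts *o, int argc, const char **argv)
{
	for (int i = 0; i < argc; i++) {
		if (!strcmp(argv[i], "--progress"))
			o->progress = 1;
		else if (!strcmp(argv[i], "--no-progress"))
			o->progress = 0;
	}
}

/* Default assigned in the sub-command, after the top-level parse. */
static int progress_default_in_subcommand(int stderr_is_tty,
					  int top_argc, const char **top_argv,
					  int sub_argc, const char **sub_argv)
{
	struct cg_opts o = { 0 };

	parse_progress(&o, top_argc, top_argv);
	o.progress = stderr_is_tty;	/* clobbers the top-level choice */
	parse_progress(&o, sub_argc, sub_argv);
	return o.progress;
}

/* Default assigned once, before any option parsing. */
static int progress_default_up_front(int stderr_is_tty,
				     int top_argc, const char **top_argv,
				     int sub_argc, const char **sub_argv)
{
	struct cg_opts o = { 0 };

	o.progress = stderr_is_tty;	/* only a default from here on */
	parse_progress(&o, top_argc, top_argv);
	parse_progress(&o, sub_argc, sub_argv);
	return o.progress;
}
```

With stderr_is_tty set to 1 and a top-level --no-progress, the first
helper still returns 1 (the option is lost), while the second returns 0.
An option given after the sub-command is honored by both.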

However, I have wondered how we should be dealing with these
sub-commands in general.

In the case of commit-graph we've always documented it at the top-level
as OPTIONS, so even though the usage shows:

    git commit-graph write <options>

We've always accepted "--object-dir" after "git commit-graph", and all
the other options are documented in their per-subcommand sections.

So just from reading the documentation you might think that this (with
your fix here) is intentional behavior, and we should just fix the
synopsis.

Then we have the more recent multi-pack-index which *is* documented as:

    'git multi-pack-index' [--object-dir=<dir>] [--[no-]progress]
            [--preferred-pack=<pack>] <subcommand>

So actually, the reason this crept in is probably because I was copying
the pattern we've had there since 60ca94769ce
(builtin/multi-pack-index.c: split sub-commands, 2021-03-30), my commit
message says as much.

Given that and multi-pack-index's documented behavior I think that it
probably makes sense to keep and document this, and as a follow-up
(which I or Taylor could do) change the synopsis accordingly.

Aside from whatever bugs have crept in or existing behavior, I think it
makes sense as UI to do things like:

    git commit-graph --object-dir=<dir> write --reachable
    git commit-graph --progress write
    git commit-graph --progress verify

etc., as --progress is not really a subcommand-specific option. We
might have a subcommand that doesn't have progress output, but I still
think it makes sense to have it in that position; maybe we'll end up
adding it later.

Brian and I also had a discussion back in April[1] about
--object-format, i.e. should we be making every single command support:

    git hash-object --object-format=sha256

Or (as I suggested) doesn't it make more sense to do:

    git --object-format=sha256 hash-object

Like the --progress option it does mean that you'll end up with commands
for whom that'll just be ignored:

    git --object-format=sha256 version

But that's conceptually similar to repo settings, and I don't think it's
confusing, the same can be said about e.g.:

    git -c this.doesNotUse=thisConfig version

Having said that, for --progress it probably makes sense to eventually
have:

    git --progress commit-graph write

I.e. maybe we'd want a top-level option for it, given how many commands
have that option and us needing to pass a "do_progress" flag all over
the place.

Of course we'd need to (silently or not) support it also as:

    git commit-graph --progress write
    git commit-graph write --progress

Which is the case here.

1. https://lore.kernel.org/git/8735vq2l8a.fsf@evledraar.gmail.com/

* Re: [PATCH 3/5] commit-graph: use parse_options_concat()
  2021-09-17 22:03                 ` Jeff King
@ 2021-09-18  4:30                   ` Taylor Blau
  2021-09-18  7:20                     ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 171+ messages in thread
From: Taylor Blau @ 2021-09-18  4:30 UTC (permalink / raw)
  To: Jeff King
  Cc: SZEDER Gábor, Ævar Arnfjörð Bjarmason, git,
	Junio C Hamano, dstolee

On Fri, Sep 17, 2021 at 06:03:58PM -0400, Jeff King wrote:
> > I'm inclined to think that '--progress' should rather be removed from
> > the common 'git commit-graph' options; luckily it's not too late,
> > because it hasn't been released yet.
>
> I wasn't following this series closely, but having seen your fix below,
> I'm inclined to agree with you. Just because we _can_ allow options
> before or after sub-commands does not necessarily make it a good idea.
>
I agree. Suppose we had a "git commit-graph remove" sub-command that
removed the commit-graph file (ignoring that there are probably better
hypothetical examples than this ;)). It's not obvious what --progress
means in the context of that mode.

Here's a patch that does what you and Gábor are suggesting as an
alternative. Unfortunately, we can't do the same for the
multi-pack-index command, since the analogous change there is 60ca94769c
(builtin/multi-pack-index.c: split sub-commands, 2021-03-30), which was
released in 2.32.

Anyway, as promised:

--- 8< ---

Subject: [PATCH] builtin/commit-graph.c: don't accept common --[no-]progress

In 84e4484f12 (commit-graph: use parse_options_concat(), 2021-08-23) we
unified common options of commit-graph's subcommands into a single
"common_opts" array.

But 84e4484f12 introduced a behavior change which is to accept the
"--[no-]progress" option before any sub-commands, e.g.,

    git commit-graph --progress write ...

Prior to that commit, the above would error out with "unknown option".

There are two issues with this behavior change. First is that the
top-level --[no-]progress is not always respected. This is because
isatty(2) is performed in the sub-commands, which unconditionally
overwrites any --[no-]progress that was given at the top-level.

But the second issue is that the existing sub-commands of commit-graph
only happen to both have a sensible interpretation of what `--progress`
or `--no-progress` means. If we ever added a sub-command which didn't
have a notion of progress, we would be forced to ignore the top-level
`--[no-]progress` altogether.

Since we haven't released a version of Git that supports --[no-]progress
as a top-level option for `git commit-graph`, let's remove it.

Suggested-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 builtin/commit-graph.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
index 21fc6e934b..067587a0fd 100644
--- a/builtin/commit-graph.c
+++ b/builtin/commit-graph.c
@@ -50,8 +50,6 @@ static struct option common_opts[] = {
 	OPT_STRING(0, "object-dir", &opts.obj_dir,
 		   N_("dir"),
 		   N_("the object directory to store the graph")),
-	OPT_BOOL(0, "progress", &opts.progress,
-		 N_("force progress reporting")),
 	OPT_END()
 };

@@ -95,6 +93,8 @@ static int graph_verify(int argc, const char **argv)
 	static struct option builtin_commit_graph_verify_options[] = {
 		OPT_BOOL(0, "shallow", &opts.shallow,
 			 N_("if the commit-graph is split, only verify the tip file")),
+		OPT_BOOL(0, "progress", &opts.progress,
+			 N_("force progress reporting")),
 		OPT_END(),
 	};
 	struct option *options = add_common_options(builtin_commit_graph_verify_options);
@@ -246,6 +246,8 @@ static int graph_write(int argc, const char **argv)
 		OPT_CALLBACK_F(0, "max-new-filters", &write_opts.max_new_filters,
 			NULL, N_("maximum number of changed-path Bloom filters to compute"),
 			0, write_option_max_new_filters),
+		OPT_BOOL(0, "progress", &opts.progress,
+			 N_("force progress reporting")),
 		OPT_END(),
 	};
 	struct option *options = add_common_options(builtin_commit_graph_write_options);
--
2.33.0.96.g73915697e6
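
The effect of the patch above can be sketched with toy option tables.
This is an illustration under stated assumptions, not Git's
parse-options API: the tables hold option names only, concat_names() is
a hypothetical stand-in for the concatenation that add_common_options()
performs, and accepts() is a hypothetical lookup helper. The point is
just that once "progress" lives in the per-subcommand tables, the
top-level table no longer matches it.

```c
#include <stddef.h>
#include <string.h>

#define MAX_OPTS 8

/* Options understood before the sub-command (after the patch). */
static const char *const common_names[] = { "object-dir", NULL };

/* Per-subcommand options; "progress" now lives here. */
static const char *const write_names[] = { "reachable", "progress", NULL };
static const char *const verify_names[] = { "shallow", "progress", NULL };

/*
 * Hypothetical stand-in for option-table concatenation: copy the
 * names from two NULL-terminated tables into "out" and terminate it.
 * Returns the number of names copied.
 */
static size_t concat_names(const char **out, size_t cap,
			   const char *const *a, const char *const *b)
{
	size_t n = 0;

	for (; *a && n + 1 < cap; a++)
		out[n++] = *a;
	for (; *b && n + 1 < cap; b++)
		out[n++] = *b;
	out[n] = NULL;
	return n;
}

/* Return 1 if "name" appears in the NULL-terminated table. */
static int accepts(const char *const *table, const char *name)
{
	for (; *table; table++)
		if (!strcmp(*table, name))
			return 1;
	return 0;
}
```

Looking up "progress" in common_names fails, which corresponds to
"git commit-graph --progress write" being rejected again, while the
concatenated write and verify tables accept both "object-dir" and
"progress".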


* Re: [PATCH 3/5] commit-graph: use parse_options_concat()
  2021-09-18  4:30                   ` Taylor Blau
@ 2021-09-18  7:20                     ` Ævar Arnfjörð Bjarmason
  2021-09-18 15:56                       ` Taylor Blau
  0 siblings, 1 reply; 171+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-18  7:20 UTC (permalink / raw)
  To: Taylor Blau; +Cc: Jeff King, SZEDER Gábor, git, Junio C Hamano, dstolee


On Sat, Sep 18 2021, Taylor Blau wrote:

> On Fri, Sep 17, 2021 at 06:03:58PM -0400, Jeff King wrote:
>> > I'm inclined to think that '--progress' should rather be removed from
>> > the common 'git commit-graph' options; luckily it's not too late,
>> > because it hasn't been released yet.
>>
>> I wasn't following this series closely, but having seen your fix below,
>> I'm inclined to agree with you. Just because we _can_ allow options
>> before or after sub-commands does not necessarily make it a good idea.
>>
> I agree. Suppose we had a "git commit-graph remove" sub-command that
> removed the commit-graph file (ignoring that there are probably better
> hypothetical examples than this ;)). It's not obvious what --progress
> means in the context of that mode.

Well, you might have as many as tens of commit-graph files, and could be
running this on AIX where I/O is apparently implemented in terms of
carrier pigeon messaging judging by how slow it is :)

But as argued in
https://lore.kernel.org/git/87zgsad6mn.fsf@evledraar.gmail.com/ I don't
see how it's going to be any more confusing to users than "git -c foo=bar
version" (the -c does nothing there).

> Here's a patch that does what you and Gábor are suggesting as an
> alternative. Unfortunately, we can't do the same for the
> multi-pack-index command, since the analogous change there is 60ca94769c
> (builtin/multi-pack-index.c: split sub-commands, 2021-03-30), which was
> released in 2.32.

If we came up with some call about what we want subcommands in general
to look like I'd think it would be fine to convert multi-pack-index to
it, perhaps with some deprecation period where it would issue a
warning() while it understood both forms.
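
As a rough sketch only, such a deprecation period could look something
like the following. Everything here is an assumption made for
illustration: take_deprecated_progress() is a hypothetical helper, the
warning text is invented, and plain fprintf() is used instead of Git's
warning() so that the example stands alone.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: scan the arguments that precede the
 * sub-command (scanning stops at the first argument that does not
 * start with '-') for --[no-]progress. If the deprecated position
 * was used, store the requested value in *progress, print a warning
 * and return 1. Otherwise leave *progress alone and return 0.
 */
static int take_deprecated_progress(int argc, const char **argv,
				    int *progress)
{
	int seen = 0;

	for (int i = 0; i < argc && argv[i][0] == '-'; i++) {
		if (!strcmp(argv[i], "--progress")) {
			*progress = 1;
			seen = 1;
		} else if (!strcmp(argv[i], "--no-progress")) {
			*progress = 0;
			seen = 1;
		}
	}
	if (seen)
		fprintf(stderr, "warning: --[no-]progress before the "
			"sub-command is deprecated; pass it after the "
			"sub-command instead\n");
	return seen;
}
```

Both forms keep working during the transition; only the old position
produces the warning.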

> Anyway, as promised:
>
> --- 8< ---
>
> Subject: [PATCH] builtin/commit-graph.c: don't accept common --[no-]progress
>
> In 84e4484f12 (commit-graph: use parse_options_concat(), 2021-08-23) we
> unified common options of commit-graph's subcommands into a single
> "common_opts" array.
>
> But 84e4484f12 introduced a behavior change which is to accept the
> "--[no-]progress" option before any sub-commands, e.g.,
>
>     git commit-graph --progress write ...
>
> Prior to that commit, the above would error out with "unknown option".
>
> There are two issues with this behavior change. First is that the
> top-level --[no-]progress is not always respected. This is because
> isatty(2) is performed in the sub-commands, which unconditionally
> overwrites any --[no-]progress that was given at the top-level.
>
> But the second issue is that the existing sub-commands of commit-graph
> only happen to both have a sensible interpretation of what `--progress`
> or `--no-progress` means. If we ever added a sub-command which didn't
> have a notion of progress, we would be forced to ignore the top-level
> `--[no-]progress` altogether.
>
> Since we haven't released a version of Git that supports --[no-]progress
> as a top-level option for `git commit-graph`, let's remove it.
>
> Suggested-by: SZEDER Gábor <szeder.dev@gmail.com>
> Signed-off-by: Taylor Blau <me@ttaylorr.com>
> ---
>  builtin/commit-graph.c | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
> index 21fc6e934b..067587a0fd 100644
> --- a/builtin/commit-graph.c
> +++ b/builtin/commit-graph.c
> @@ -50,8 +50,6 @@ static struct option common_opts[] = {
>  	OPT_STRING(0, "object-dir", &opts.obj_dir,
>  		   N_("dir"),
>  		   N_("the object directory to store the graph")),
> -	OPT_BOOL(0, "progress", &opts.progress,
> -		 N_("force progress reporting")),
>  	OPT_END()
>  };
>
> @@ -95,6 +93,8 @@ static int graph_verify(int argc, const char **argv)
>  	static struct option builtin_commit_graph_verify_options[] = {
>  		OPT_BOOL(0, "shallow", &opts.shallow,
>  			 N_("if the commit-graph is split, only verify the tip file")),
> +		OPT_BOOL(0, "progress", &opts.progress,
> +			 N_("force progress reporting")),
>  		OPT_END(),
>  	};
>  	struct option *options = add_common_options(builtin_commit_graph_verify_options);
> @@ -246,6 +246,8 @@ static int graph_write(int argc, const char **argv)
>  		OPT_CALLBACK_F(0, "max-new-filters", &write_opts.max_new_filters,
>  			NULL, N_("maximum number of changed-path Bloom filters to compute"),
>  			0, write_option_max_new_filters),
> +		OPT_BOOL(0, "progress", &opts.progress,
> +			 N_("force progress reporting")),
>  		OPT_END(),
>  	};
>  	struct option *options = add_common_options(builtin_commit_graph_write_options);

This is a good change, but (if you're up for bonus points) it leaves the
docs in an odd state where we (as noted in [1]) document the --object-dir
and --progress options under OPTIONS, but now only take the former before
the sub-command.

1. https://lore.kernel.org/git/87zgsad6mn.fsf@evledraar.gmail.com/

* Re: [PATCH 3/5] commit-graph: use parse_options_concat()
  2021-09-18  7:20                     ` Ævar Arnfjörð Bjarmason
@ 2021-09-18 15:56                       ` Taylor Blau
  2021-09-18 15:58                         ` Taylor Blau
  0 siblings, 1 reply; 171+ messages in thread
From: Taylor Blau @ 2021-09-18 15:56 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Taylor Blau, Jeff King, SZEDER Gábor, git, Junio C Hamano, dstolee

On Sat, Sep 18, 2021 at 09:20:38AM +0200, Ævar Arnfjörð Bjarmason wrote:
> If we came up with some call about what we want subcommands in general
> to look like I'd think it would be fine to convert multi-pack-index to
> it, perhaps with some deprecation period where it would issue a
> warning() while it understood both forms.

I'm not sure about what to do with the multi-pack-index command.
Probably going through a deprecation process makes the most sense if we
do want to get rid of the top-level `--[no-]progress` there, too. But
let's have the discussion elsewhere and not buried in the MIDX
bitmaps thread ;).

> This is a good change, but (if you're up for bonus points) it leaves the
> docs in an odd state where we (as noted in [1]) document the --object-dir
> and --progress options under OPTIONS, but now only take the former before
> the sub-command.

Thanks for noticing. I got up and did something in between writing and
sending this patch, and had a nagging feeling of forgetting something
before I sent. But I couldn't figure out what ;).

I'll resubmit this patch (with the doc changes squashed in) as a new
thread on the list so we can have the discussion not buried in another
thread.

Thanks,
Taylor

* Re: [PATCH 3/5] commit-graph: use parse_options_concat()
  2021-09-18 15:56                       ` Taylor Blau
@ 2021-09-18 15:58                         ` Taylor Blau
  0 siblings, 0 replies; 171+ messages in thread
From: Taylor Blau @ 2021-09-18 15:58 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Jeff King, SZEDER Gábor, git, Junio C Hamano, dstolee

On Sat, Sep 18, 2021 at 11:56:16AM -0400, Taylor Blau wrote:
> > This is a good change, but (if you're up for bonus points) it leaves the
> > docs in an odd state where we (as noted in [1]) document the --object-dir
> > and --progress options under OPTIONS, but now only take the former before
> > the sub-command.
>
> Thanks for noticing. I got up and did something in between writing and
> sending this patch, and had a nagging feeling of forgetting something
> before I sent. But I couldn't figure out what ;).

Actually, I stand by the original patch. Yes, the top-level OPTIONS of
git-commit-graph(1) mentions `--[no-]progress`, but the synopsis makes
clear that those are only accepted after the sub-commands.

So I think it's fine as-is.

Thanks,
Taylor

Thread overview: 171+ messages
2021-02-10 23:02 [PATCH 0/9] midx: implement a multi-pack reverse index Taylor Blau
2021-02-10 23:02 ` [PATCH 1/9] t/helper/test-read-midx.c: add '--show-objects' Taylor Blau
2021-02-11  2:27   ` Derrick Stolee
2021-02-11  2:34     ` Taylor Blau
2021-02-10 23:02 ` [PATCH 2/9] midx: allow marking a pack as preferred Taylor Blau
2021-02-11 19:33   ` SZEDER Gábor
2021-02-15 15:49     ` Taylor Blau
2021-02-15 17:01       ` Ævar Arnfjörð Bjarmason
2021-02-15 18:41         ` [PATCH 0/5] commit-graph: parse_options() cleanup Ævar Arnfjörð Bjarmason
2021-02-15 18:41         ` [PATCH 1/5] commit-graph: define common usage with a macro Ævar Arnfjörð Bjarmason
2021-02-16 11:33           ` Derrick Stolee
2021-02-15 18:41         ` [PATCH 2/5] commit-graph: remove redundant handling of -h Ævar Arnfjörð Bjarmason
2021-02-16 11:35           ` Derrick Stolee
2021-02-15 18:41         ` [PATCH 3/5] commit-graph: use parse_options_concat() Ævar Arnfjörð Bjarmason
2021-02-15 18:51           ` Taylor Blau
2021-02-15 19:53             ` Taylor Blau
2021-02-15 20:39             ` Ævar Arnfjörð Bjarmason
2021-09-17 21:13               ` SZEDER Gábor
2021-09-17 22:03                 ` Jeff King
2021-09-18  4:30                   ` Taylor Blau
2021-09-18  7:20                     ` Ævar Arnfjörð Bjarmason
2021-09-18 15:56                       ` Taylor Blau
2021-09-18 15:58                         ` Taylor Blau
2021-09-18  0:58                 ` Ævar Arnfjörð Bjarmason
2021-02-15 18:41         ` [PATCH 4/5] commit-graph: refactor dispatch loop for style Ævar Arnfjörð Bjarmason
2021-02-15 18:53           ` Taylor Blau
2021-02-16 11:40             ` Derrick Stolee
2021-02-16 12:02               ` Ævar Arnfjörð Bjarmason
2021-02-16 18:28                 ` Derrick Stolee
2021-02-15 18:41         ` [PATCH 5/5] commit-graph: show usage on "commit-graph [write|verify] garbage" Ævar Arnfjörð Bjarmason
2021-02-15 19:06           ` Taylor Blau
2021-02-16 11:43           ` Derrick Stolee
2021-02-15 21:01         ` [PATCH v2 0/4] midx: split out sub-commands Taylor Blau
2021-02-15 21:01           ` [PATCH v2 1/4] builtin/multi-pack-index.c: inline 'flags' with options Taylor Blau
2021-02-15 21:01           ` [PATCH v2 2/4] builtin/multi-pack-index.c: don't handle 'progress' separately Taylor Blau
2021-02-15 21:39             ` Ævar Arnfjörð Bjarmason
2021-02-15 21:45               ` Taylor Blau
2021-02-16 11:47                 ` Derrick Stolee
2021-02-15 21:01           ` [PATCH v2 3/4] builtin/multi-pack-index.c: define common usage with a macro Taylor Blau
2021-02-15 21:01           ` [PATCH v2 4/4] builtin/multi-pack-index.c: split sub-commands Taylor Blau
2021-02-15 21:54             ` Ævar Arnfjörð Bjarmason
2021-02-15 22:34               ` Taylor Blau
2021-02-15 23:11                 ` Ævar Arnfjörð Bjarmason
2021-02-15 23:49                   ` Taylor Blau
2021-02-16 11:50           ` [PATCH v2 0/4] midx: split out sub-commands Derrick Stolee
2021-02-16 14:28             ` Taylor Blau
2021-02-10 23:02 ` [PATCH 3/9] midx: don't free midx_name early Taylor Blau
2021-02-10 23:02 ` [PATCH 4/9] midx: keep track of the checksum Taylor Blau
2021-02-11  2:33   ` Derrick Stolee
2021-02-11  2:35     ` Taylor Blau
2021-02-10 23:03 ` [PATCH 5/9] midx: make some functions non-static Taylor Blau
2021-02-10 23:03 ` [PATCH 6/9] Documentation/technical: describe multi-pack reverse indexes Taylor Blau
2021-02-11  2:48   ` Derrick Stolee
2021-02-11  3:03     ` Taylor Blau
2021-02-10 23:03 ` [PATCH 7/9] pack-revindex: read " Taylor Blau
2021-02-11  2:53   ` Derrick Stolee
2021-02-11  3:04     ` Taylor Blau
2021-02-11  7:54   ` Junio C Hamano
2021-02-11 14:54     ` Taylor Blau
2021-02-10 23:03 ` [PATCH 8/9] pack-write.c: extract 'write_rev_file_order' Taylor Blau
2021-02-10 23:03 ` [PATCH 9/9] pack-revindex: write multi-pack reverse indexes Taylor Blau
2021-02-11  2:58 ` [PATCH 0/9] midx: implement a multi-pack reverse index Derrick Stolee
2021-02-11  3:06   ` Taylor Blau
2021-02-11  8:13 ` Junio C Hamano
2021-02-11 18:37   ` Derrick Stolee
2021-02-11 18:55     ` Junio C Hamano
2021-02-24 19:09 ` [PATCH v2 00/15] " Taylor Blau
2021-02-24 19:09   ` [PATCH v2 01/15] builtin/multi-pack-index.c: inline 'flags' with options Taylor Blau
2021-02-24 19:09   ` [PATCH v2 02/15] builtin/multi-pack-index.c: don't handle 'progress' separately Taylor Blau
2021-02-24 19:09   ` [PATCH v2 03/15] builtin/multi-pack-index.c: define common usage with a macro Taylor Blau
2021-02-24 19:09   ` [PATCH v2 04/15] builtin/multi-pack-index.c: split sub-commands Taylor Blau
2021-03-02  4:06     ` Jonathan Tan
2021-03-02 19:02       ` Taylor Blau
2021-03-04  1:54         ` Jonathan Tan
2021-03-04  3:02           ` Taylor Blau
2021-02-24 19:09   ` [PATCH v2 05/15] builtin/multi-pack-index.c: don't enter bogus cmd_mode Taylor Blau
2021-02-24 19:09   ` [PATCH v2 06/15] builtin/multi-pack-index.c: display usage on unrecognized command Taylor Blau
2021-02-24 19:09   ` [PATCH v2 07/15] t/helper/test-read-midx.c: add '--show-objects' Taylor Blau
2021-02-24 19:09   ` [PATCH v2 08/15] midx: allow marking a pack as preferred Taylor Blau
2021-03-02  4:17     ` Jonathan Tan
2021-03-02 19:09       ` Taylor Blau
2021-03-04  2:00         ` Jonathan Tan
2021-03-04  3:04           ` Taylor Blau
2021-02-24 19:09   ` [PATCH v2 09/15] midx: don't free midx_name early Taylor Blau
2021-02-24 19:10   ` [PATCH v2 10/15] midx: keep track of the checksum Taylor Blau
2021-02-24 19:10   ` [PATCH v2 11/15] midx: make some functions non-static Taylor Blau
2021-02-24 19:10   ` [PATCH v2 12/15] Documentation/technical: describe multi-pack reverse indexes Taylor Blau
2021-03-02  4:21     ` Jonathan Tan
2021-03-02  4:36       ` Taylor Blau
2021-03-02 19:15       ` Taylor Blau
2021-03-04  2:03         ` Jonathan Tan
2021-02-24 19:10   ` [PATCH v2 13/15] pack-revindex: read " Taylor Blau
2021-03-02 18:36     ` Jonathan Tan
2021-03-03 15:27       ` Taylor Blau
2021-02-24 19:10   ` [PATCH v2 14/15] pack-write.c: extract 'write_rev_file_order' Taylor Blau
2021-02-24 19:10   ` [PATCH v2 15/15] pack-revindex: write multi-pack reverse indexes Taylor Blau
2021-03-02 18:40     ` Jonathan Tan
2021-03-03 15:30       ` Taylor Blau
2021-03-04  2:04         ` Jonathan Tan
2021-03-04  3:06           ` Taylor Blau
2021-03-11 17:04 ` [PATCH v3 00/16] midx: implement a multi-pack reverse index Taylor Blau
2021-03-11 17:04   ` [PATCH v3 01/16] builtin/multi-pack-index.c: inline 'flags' with options Taylor Blau
2021-03-29 11:20     ` Jeff King
2021-03-11 17:04   ` [PATCH v3 02/16] builtin/multi-pack-index.c: don't handle 'progress' separately Taylor Blau
2021-03-29 11:22     ` Jeff King
2021-03-11 17:04   ` [PATCH v3 03/16] builtin/multi-pack-index.c: define common usage with a macro Taylor Blau
2021-03-11 17:04   ` [PATCH v3 04/16] builtin/multi-pack-index.c: split sub-commands Taylor Blau
2021-03-29 11:36     ` Jeff King
2021-03-29 20:38       ` Taylor Blau
2021-03-30  7:04         ` Jeff King
2021-03-11 17:04   ` [PATCH v3 05/16] builtin/multi-pack-index.c: don't enter bogus cmd_mode Taylor Blau
2021-03-11 17:04   ` [PATCH v3 06/16] builtin/multi-pack-index.c: display usage on unrecognized command Taylor Blau
2021-03-29 11:42     ` Jeff King
2021-03-29 20:41       ` Taylor Blau
2021-03-11 17:05   ` [PATCH v3 07/16] t/helper/test-read-midx.c: add '--show-objects' Taylor Blau
2021-03-11 17:05   ` [PATCH v3 08/16] midx: allow marking a pack as preferred Taylor Blau
2021-03-29 12:00     ` Jeff King
2021-03-29 21:15       ` Taylor Blau
2021-03-30  7:11         ` Jeff King
2021-03-11 17:05   ` [PATCH v3 09/16] midx: don't free midx_name early Taylor Blau
2021-03-11 17:05   ` [PATCH v3 10/16] midx: keep track of the checksum Taylor Blau
2021-03-11 17:05   ` [PATCH v3 11/16] midx: make some functions non-static Taylor Blau
2021-03-11 17:05   ` [PATCH v3 12/16] Documentation/technical: describe multi-pack reverse indexes Taylor Blau
2021-03-29 12:12     ` Jeff King
2021-03-29 21:22       ` Taylor Blau
2021-03-11 17:05   ` [PATCH v3 13/16] pack-revindex: read " Taylor Blau
2021-03-29 12:43     ` Jeff King
2021-03-29 21:27       ` Taylor Blau
2021-03-11 17:05   ` [PATCH v3 14/16] pack-write.c: extract 'write_rev_file_order' Taylor Blau
2021-03-11 17:05   ` [PATCH v3 15/16] pack-revindex: write multi-pack reverse indexes Taylor Blau
2021-03-29 12:53     ` Jeff King
2021-03-29 21:30       ` Taylor Blau
2021-03-11 17:05   ` [PATCH v3 16/16] midx.c: improve cache locality in midx_pack_order_cmp() Taylor Blau
2021-03-29 12:59     ` Jeff King
2021-03-29 21:34       ` Taylor Blau
2021-03-30  7:15         ` Jeff King
2021-03-12 15:16   ` [PATCH v3 00/16] midx: implement a multi-pack reverse index Derrick Stolee
2021-03-29 13:05   ` Jeff King
2021-03-29 21:30     ` Junio C Hamano
2021-03-29 21:37     ` Taylor Blau
2021-03-30  7:15       ` Jeff King
2021-03-30 13:37         ` Taylor Blau
2021-03-30 15:03 ` [PATCH v4 " Taylor Blau
2021-03-30 15:03   ` [PATCH v4 01/16] builtin/multi-pack-index.c: inline 'flags' with options Taylor Blau
2021-03-30 15:03   ` [PATCH v4 02/16] builtin/multi-pack-index.c: don't handle 'progress' separately Taylor Blau
2021-03-30 15:03   ` [PATCH v4 03/16] builtin/multi-pack-index.c: define common usage with a macro Taylor Blau
2021-03-30 15:03   ` [PATCH v4 04/16] builtin/multi-pack-index.c: split sub-commands Taylor Blau
2021-03-30 15:04   ` [PATCH v4 05/16] builtin/multi-pack-index.c: don't enter bogus cmd_mode Taylor Blau
2021-03-30 15:04   ` [PATCH v4 06/16] builtin/multi-pack-index.c: display usage on unrecognized command Taylor Blau
2021-03-30 15:04   ` [PATCH v4 07/16] t/helper/test-read-midx.c: add '--show-objects' Taylor Blau
2021-03-30 15:04   ` [PATCH v4 08/16] midx: allow marking a pack as preferred Taylor Blau
2021-04-01  0:32     ` Taylor Blau
2021-03-30 15:04   ` [PATCH v4 09/16] midx: don't free midx_name early Taylor Blau
2021-03-30 15:04   ` [PATCH v4 10/16] midx: keep track of the checksum Taylor Blau
2021-03-30 15:04   ` [PATCH v4 11/16] midx: make some functions non-static Taylor Blau
2021-03-30 15:04   ` [PATCH v4 12/16] Documentation/technical: describe multi-pack reverse indexes Taylor Blau
2021-03-30 15:04   ` [PATCH v4 13/16] pack-revindex: read " Taylor Blau
2021-03-30 15:04   ` [PATCH v4 14/16] pack-write.c: extract 'write_rev_file_order' Taylor Blau
2021-09-08  1:08     ` [PATCH] pack-write: skip *.rev work when not writing *.rev Ævar Arnfjörð Bjarmason
2021-09-08  1:35       ` Carlo Arenas
2021-09-08  2:42         ` Taylor Blau
2021-09-08 15:47           ` Junio C Hamano
2021-09-08  2:50       ` Taylor Blau
2021-09-08  3:50         ` Taylor Blau
2021-09-08 10:18           ` Ævar Arnfjörð Bjarmason
2021-09-08 16:32             ` Taylor Blau
2021-03-30 15:04   ` [PATCH v4 15/16] pack-revindex: write multi-pack reverse indexes Taylor Blau
2021-03-30 15:04   ` [PATCH v4 16/16] midx.c: improve cache locality in midx_pack_order_cmp() Taylor Blau
2021-03-30 15:45   ` [PATCH v4 00/16] midx: implement a multi-pack reverse index Jeff King
2021-03-30 15:49     ` Taylor Blau
2021-03-30 16:01       ` Jeff King
