git@vger.kernel.org list mirror (unofficial, one of many)
 help / color / mirror / Atom feed
* [PATCH 0/8] pack-revindex: introduce on-disk '.rev' format
@ 2021-01-08 18:19 Taylor Blau
  2021-01-08 18:19 ` [PATCH 1/8] packfile: prepare for the existence of '*.rev' files Taylor Blau
                   ` (8 more replies)
  0 siblings, 9 replies; 32+ messages in thread
From: Taylor Blau @ 2021-01-08 18:19 UTC (permalink / raw)
  To: git; +Cc: peff, jrnieder

Hi,

This is the second of two series to implement support for an on-disk format for
storing the reverse index. (It depends on the patches in the previous series
[1]).

The format is described in the first patch, but it is roughly as follows:

  - It begins with a 12-byte header, containing a magic string, a version
    identifier, and a hash function identifier.

  - It then contains a 4 * N (where 'N' is the number of objects) table of index
    positions, sorted by each object's offset within the corresponding packfile.

  - Finally, a trailer contains a checksum of the corresponding packfile, and a
    checksum of the above contents.

Since this is a large change, a new 'pack.writeReverseIndex' option is
introduced, which defaults to 'false'. When false, `*.rev` files are not
written, and Git gracefully falls back to generate each reverse index in
memory. This could optionally be tied to the "feature.experimental" option, and
eventually the defalt changed to 'true' in a couple of releases.

To test these new changes, the test suite now understands
'GIT_TEST_WRITE_REV_INDEX' to mean that 'pack.writeReverseIndex' should be
'true' everywhere. Some minor test fall-out is addressed in the sixth patch
before enabling this new mode in the seventh patch.

One option that is _not_ persued in this series is to store the (pack) offset of
each object in the `.rev` file. This would at worst triple the size of the file
(by having to store an additional eight bytes per entry), and add complexity
(like storing an extended offset table as in the `*.idx` format). An extensive
discussion about why this option was not persued can be found in the first
patch.

Thanks in advance for your review.

[1]: https://lore.kernel.org/git/cover.1610129796.git.me@ttaylorr.com/

Taylor Blau (8):
  packfile: prepare for the existence of '*.rev' files
  pack-write.c: prepare to write 'pack-*.rev' files
  builtin/index-pack.c: write reverse indexes
  builtin/pack-objects.c: respect 'pack.writeReverseIndex'
  Documentation/config/pack.txt: advertise 'pack.writeReverseIndex'
  t: prepare for GIT_TEST_WRITE_REV_INDEX
  t: support GIT_TEST_WRITE_REV_INDEX
  pack-revindex: ensure that on-disk reverse indexes are given
    precedence

 Documentation/config/pack.txt           |   7 ++
 Documentation/git-index-pack.txt        |  20 ++--
 Documentation/technical/pack-format.txt |  17 ++++
 builtin/index-pack.c                    |  67 +++++++++++--
 builtin/pack-objects.c                  |   9 ++
 builtin/repack.c                        |   1 +
 object-store.h                          |   3 +
 pack-revindex.c                         | 116 ++++++++++++++++++++--
 pack-revindex.h                         |   3 +
 pack-write.c                            | 123 +++++++++++++++++++++++-
 pack.h                                  |   4 +
 packfile.c                              |  13 ++-
 packfile.h                              |   1 +
 t/README                                |   3 +
 t/t5319-multi-pack-index.sh             |   2 +-
 t/t5325-reverse-index.sh                |  94 ++++++++++++++++++
 t/t5604-clone-reference.sh              |   2 +-
 t/t5702-protocol-v2.sh                  |   4 +-
 t/t6500-gc.sh                           |   4 +-
 t/t9300-fast-import.sh                  |   2 +-
 tmp-objdir.c                            |   4 +-
 21 files changed, 463 insertions(+), 36 deletions(-)
 create mode 100755 t/t5325-reverse-index.sh

-- 
2.30.0.138.g6d7191ea01

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH 1/8] packfile: prepare for the existence of '*.rev' files
  2021-01-08 18:19 [PATCH 0/8] pack-revindex: introduce on-disk '.rev' format Taylor Blau
@ 2021-01-08 18:19 ` Taylor Blau
  2021-01-08 18:20 ` [PATCH 2/8] pack-write.c: prepare to write 'pack-*.rev' files Taylor Blau
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 32+ messages in thread
From: Taylor Blau @ 2021-01-08 18:19 UTC (permalink / raw)
  To: git; +Cc: peff, jrnieder

Specify the format of the on-disk reverse index 'pack-*.rev' file, as
well as prepare the code for the existence of such files.

The reverse index maps from pack relative positions (i.e., an index into
the array of object which is sorted by their offsets within the
packfile) to their position within the 'pack-*.idx' file. Today, this is
done by building up a list of (off_t, uint32_t) tuples for each object
(the off_t corresponding to that object's offset, and the uint32_t
corresponding to its position in the index). To convert between pack and
index position quickly, this array of tuples is radix sorted based on
its offset.

This has two major drawbacks:

First, the in-memory cost scales linearly with the number of objects in
a pack.  Each 'struct revindex_entry' is sizeof(off_t) +
sizeof(uint32_t) + padding bytes for a total of 16.

To observe this, force Git to load the reverse index by, for e.g.,
running 'git cat-file --batch-check="%(objectsize:disk)"'. When asking
for a single object in a fresh clone of the kernel, Git needs to
allocate 120+ MB of memory in order to hold the reverse index in memory.

Second, the cost to sort also scales with the size of the pack.
Luckily, this is a linear function since 'load_pack_revindex()' uses a
radix sort, but this cost still must be paid once per pack per process.

As an example, it takes ~60x longer to print the _size_ of an object as
it does to print that entire object's _contents_:

  Benchmark #1: git.compile cat-file --batch <obj
    Time (mean ± σ):       3.4 ms ±   0.1 ms    [User: 3.3 ms, System: 2.1 ms]
    Range (min … max):     3.2 ms …   3.7 ms    726 runs

  Benchmark #2: git.compile cat-file --batch-check="%(objectsize:disk)" <obj
    Time (mean ± σ):     210.3 ms ±   8.9 ms    [User: 188.2 ms, System: 23.2 ms]
    Range (min … max):   193.7 ms … 224.4 ms    13 runs

Instead, avoid computing and sorting the revindex once per process by
writing it to a file when the pack itself is generated.

The format is relatively straightforward. It contains an array of
uint32_t's, the length of which is equal to the number of objects in the
pack.  The ith entry in this table contains the index position of the
ith object in the pack, where "ith object in the pack" is determined by
pack offset.

One thing that the on-disk format does _not_ contain is the full (up to)
eight-byte offset corresponding to each object. This is something that
the in-memory revindex contains (it stores an off_t in 'struct
revindex_entry' along with the same uint32_t that the on-disk format
has). Omit it in the on-disk format, since knowing the index position
for some object is sufficient to get a constant-time lookup in the
pack-*.idx file to ask for an object's offset within the pack.

This trades off between the on-disk size of the 'pack-*.rev' file for
runtime to chase down the offset for some object. Even though the lookup
is constant time, the constant is heavier, since it can potentially
involve two pointer walks in v2 indexes (one to access the 4-byte offset
table, and potentially a second to access the double wide offset table).

Consider trying to map an object's pack offset to a relative position
within that pack. In a cold-cache scenario, more page faults occur while
switching between binary searching through the reverse index and
searching through the *.idx file for an object's offset. Sure enough,
with a cold cache (writing '3' into '/proc/sys/vm/drop_caches' after
'sync'ing), printing out the entire object's contents is still
marginally faster than printing its size:

  Benchmark #1: git.compile cat-file --batch-check="%(objectsize:disk)" <obj >/dev/null
    Time (mean ± σ):      22.6 ms ±   0.5 ms    [User: 2.4 ms, System: 7.9 ms]
    Range (min … max):    21.4 ms …  23.5 ms    41 runs

  Benchmark #2: git.compile cat-file --batch <obj >/dev/null
    Time (mean ± σ):      17.2 ms ±   0.7 ms    [User: 2.8 ms, System: 5.5 ms]
    Range (min … max):    15.6 ms …  18.2 ms    45 runs

(Numbers taken in the kernel after cheating and using the next patch to
generate a reverse index). There are a couple of approaches to improve
cold cache performance not pursued here:

  - We could include the object offsets in the reverse index format.
    Predictably, this does result in fewer page faults, but it triples
    the size of the file, while simultaneously duplicating a ton of data
    already available in the .idx file. (This was the original way I
    implemented the format, and it did show
    `--batch-check='%(objectsize:disk)'` winning out against `--batch`.)

    On the other hand, this increase in size also results in a large
    block-cache footprint, which could potentially hurt other workloads.

  - We could store the mapping from pack to index position in more
    cache-friendly way, like constructing a binary search tree from the
    table and writing the values in breadth-first order. This would
    result in much better locality, but the price you pay is trading
    O(1) lookup in 'pack_pos_to_index()' for an O(log n) one (since you
    can no longer directly index the table).

So, neither of these approaches are taken here. (Thankfully, the format
is versioned, so we are free to pursue these in the future.) But, cold
cache performance likely isn't interesting outside of one-off cases like
asking for the size of an object directly. In real-world usage, Git is
often performing many operations in the revindex,

The trade-off is worth it, since we will avoid the vast majority of the
cost of generating the revindex that the extra pointer chase will look
like noise in the following patch's benchmarks.

This patch describes the format and prepares callers (like in
pack-revindex.c) to be able to read *.rev files once they exist. An
implementation of the writer will appear in the next patch, and callers
will gradually begin to start using the writer in the patches that
follow after that.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 Documentation/technical/pack-format.txt |  17 ++++
 builtin/repack.c                        |   1 +
 object-store.h                          |   3 +
 pack-revindex.c                         | 112 +++++++++++++++++++++---
 packfile.c                              |  13 ++-
 packfile.h                              |   1 +
 tmp-objdir.c                            |   4 +-
 7 files changed, 139 insertions(+), 12 deletions(-)

diff --git a/Documentation/technical/pack-format.txt b/Documentation/technical/pack-format.txt
index f96b2e605f..9593f8bc68 100644
--- a/Documentation/technical/pack-format.txt
+++ b/Documentation/technical/pack-format.txt
@@ -259,6 +259,23 @@ Pack file entry: <+
 
     Index checksum of all of the above.
 
+== pack-*.rev files have the format:
+
+  - A 4-byte magic number '0x52494458' ('RIDX').
+
+  - A 4-byte version identifier (= 1)
+
+  - A 4-byte hash function identifier (= 1 for SHA-1, 2 for SHA-256)
+
+  - A table of index positions, sorted by their corresponding offsets in the
+    packfile.
+
+  - A trailer, containing a:
+
+    checksum of the corresponding packfile, and
+
+    a checksum of all of the above.
+
 == multi-pack-index (MIDX) files have the following format:
 
 The multi-pack-index files refer to multiple pack-files and loose objects.
diff --git a/builtin/repack.c b/builtin/repack.c
index 279be11a16..8d643ddcb9 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -208,6 +208,7 @@ static struct {
 } exts[] = {
 	{".pack"},
 	{".idx"},
+	{".rev", 1},
 	{".bitmap", 1},
 	{".promisor", 1},
 };
diff --git a/object-store.h b/object-store.h
index c4fc9dd74e..3fbf11280f 100644
--- a/object-store.h
+++ b/object-store.h
@@ -85,6 +85,9 @@ struct packed_git {
 		 multi_pack_index:1;
 	unsigned char hash[GIT_MAX_RAWSZ];
 	struct revindex_entry *revindex;
+	const void *revindex_data;
+	const void *revindex_map;
+	size_t revindex_size;
 	/* something like ".git/objects/pack/xxxxx.pack" */
 	char pack_name[FLEX_ARRAY]; /* more */
 };
diff --git a/pack-revindex.c b/pack-revindex.c
index 2cd9d632f1..1baaf2c42a 100644
--- a/pack-revindex.c
+++ b/pack-revindex.c
@@ -164,16 +164,98 @@ static void create_pack_revindex(struct packed_git *p)
 	sort_revindex(p->revindex, num_ent, p->pack_size);
 }
 
-int load_pack_revindex(struct packed_git *p)
+static int load_pack_revindex_from_memory(struct packed_git *p)
 {
-	if (!p->revindex) {
-		if (open_pack_index(p))
-			return -1;
-		create_pack_revindex(p);
-	}
+	if (open_pack_index(p))
+		return -1;
+	create_pack_revindex(p);
 	return 0;
 }
 
+static char *pack_revindex_filename(struct packed_git *p)
+{
+	size_t len;
+	if (!strip_suffix(p->pack_name, ".pack", &len))
+		BUG("pack_name does not end in .pack");
+	return xstrfmt("%.*s.rev", (int)len, p->pack_name);
+}
+
+#define RIDX_MIN_SIZE (12 + (2 * the_hash_algo->rawsz))
+
+static int load_revindex_from_disk(char *revindex_name,
+				   uint32_t num_objects,
+				   const void **data, size_t *len)
+{
+	int fd, ret = 0;
+	struct stat st;
+	size_t revindex_size;
+
+	fd = git_open(revindex_name);
+
+	if (fd < 0) {
+		ret = -1;
+		goto cleanup;
+	}
+	if (fstat(fd, &st)) {
+		ret = error_errno(_("failed to read %s"), revindex_name);
+		goto cleanup;
+	}
+
+	revindex_size = xsize_t(st.st_size);
+
+	if (revindex_size < RIDX_MIN_SIZE) {
+		ret = error(_("reverse-index file %s is too small"), revindex_name);
+		goto cleanup;
+	}
+
+	if (revindex_size - RIDX_MIN_SIZE != st_mult(sizeof(uint32_t), num_objects)) {
+		ret = error(_("reverse-index file %s is corrupt"), revindex_name);
+		goto cleanup;
+	}
+
+	*len = revindex_size;
+	*data = xmmap(NULL, revindex_size, PROT_READ, MAP_PRIVATE, fd, 0);
+
+cleanup:
+	close(fd);
+	return ret;
+}
+
+static int load_pack_revindex_from_disk(struct packed_git *p)
+{
+	char *revindex_name;
+	int ret;
+	if (open_pack_index(p))
+		return -1;
+
+	revindex_name = pack_revindex_filename(p);
+
+	ret = load_revindex_from_disk(revindex_name,
+				      p->num_objects,
+				      &p->revindex_map,
+				      &p->revindex_size);
+	if (ret)
+		goto cleanup;
+
+	p->revindex_data = (char *)p->revindex_map + 12;
+
+cleanup:
+	free(revindex_name);
+	return ret;
+}
+
+int load_pack_revindex(struct packed_git *p)
+{
+	if (p->revindex || p->revindex_data)
+		return 0;
+
+	if (!load_pack_revindex_from_disk(p))
+		return 0;
+	else if (!load_pack_revindex_from_memory(p))
+		return 0;
+	return -1;
+}
+
 int offset_to_pack_pos(struct packed_git *p, off_t ofs, uint32_t *pos)
 {
 	int lo = 0;
@@ -203,18 +285,28 @@ int offset_to_pack_pos(struct packed_git *p, off_t ofs, uint32_t *pos)
 
 uint32_t pack_pos_to_index(struct packed_git *p, uint32_t pos)
 {
-	if (!p->revindex)
+	if (!(p->revindex || p->revindex_data))
 		BUG("pack_pos_to_index: reverse index not yet loaded");
 	if (pos >= p->num_objects)
 		BUG("pack_pos_to_index: out-of-bounds object at %"PRIu32, pos);
-	return p->revindex[pos].nr;
+
+	if (p->revindex)
+		return p->revindex[pos].nr;
+	else
+		return get_be32((char *)p->revindex_data + (pos * sizeof(uint32_t)));
 }
 
 off_t pack_pos_to_offset(struct packed_git *p, uint32_t pos)
 {
-	if (!p->revindex)
+	if (!(p->revindex || p->revindex_data))
 		BUG("pack_pos_to_index: reverse index not yet loaded");
 	if (pos > p->num_objects)
 		BUG("pack_pos_to_offset: out-of-bounds object at %"PRIu32, pos);
-	return p->revindex[pos].offset;
+
+	if (p->revindex)
+		return p->revindex[pos].offset;
+	else if (pos == p->num_objects)
+		return p->pack_size - the_hash_algo->rawsz;
+	else
+		return nth_packed_object_offset(p, pack_pos_to_index(p, pos));
 }
diff --git a/packfile.c b/packfile.c
index 46c9c7ea3c..e636e5ca17 100644
--- a/packfile.c
+++ b/packfile.c
@@ -324,11 +324,21 @@ void close_pack_index(struct packed_git *p)
 	}
 }
 
+void close_pack_revindex(struct packed_git *p) {
+	if (!p->revindex_map)
+		return;
+
+	munmap((void *)p->revindex_map, p->revindex_size);
+	p->revindex_map = NULL;
+	p->revindex_data = NULL;
+}
+
 void close_pack(struct packed_git *p)
 {
 	close_pack_windows(p);
 	close_pack_fd(p);
 	close_pack_index(p);
+	close_pack_revindex(p);
 }
 
 void close_object_store(struct raw_object_store *o)
@@ -351,7 +361,7 @@ void close_object_store(struct raw_object_store *o)
 
 void unlink_pack_path(const char *pack_name, int force_delete)
 {
-	static const char *exts[] = {".pack", ".idx", ".keep", ".bitmap", ".promisor"};
+	static const char *exts[] = {".pack", ".idx", ".rev", ".keep", ".bitmap", ".promisor"};
 	int i;
 	struct strbuf buf = STRBUF_INIT;
 	size_t plen;
@@ -853,6 +863,7 @@ static void prepare_pack(const char *full_name, size_t full_name_len,
 	if (!strcmp(file_name, "multi-pack-index"))
 		return;
 	if (ends_with(file_name, ".idx") ||
+	    ends_with(file_name, ".rev") ||
 	    ends_with(file_name, ".pack") ||
 	    ends_with(file_name, ".bitmap") ||
 	    ends_with(file_name, ".keep") ||
diff --git a/packfile.h b/packfile.h
index a58fc738e0..4cfec9e8d3 100644
--- a/packfile.h
+++ b/packfile.h
@@ -90,6 +90,7 @@ uint32_t get_pack_fanout(struct packed_git *p, uint32_t value);
 
 unsigned char *use_pack(struct packed_git *, struct pack_window **, off_t, unsigned long *);
 void close_pack_windows(struct packed_git *);
+void close_pack_revindex(struct packed_git *);
 void close_pack(struct packed_git *);
 void close_object_store(struct raw_object_store *o);
 void unuse_pack(struct pack_window **);
diff --git a/tmp-objdir.c b/tmp-objdir.c
index 42ed4db5d3..da414df14f 100644
--- a/tmp-objdir.c
+++ b/tmp-objdir.c
@@ -187,7 +187,9 @@ static int pack_copy_priority(const char *name)
 		return 2;
 	if (ends_with(name, ".idx"))
 		return 3;
-	return 4;
+	if (ends_with(name, ".rev"))
+		return 4;
+	return 5;
 }
 
 static int pack_copy_cmp(const char *a, const char *b)
-- 
2.30.0.138.g6d7191ea01


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH 2/8] pack-write.c: prepare to write 'pack-*.rev' files
  2021-01-08 18:19 [PATCH 0/8] pack-revindex: introduce on-disk '.rev' format Taylor Blau
  2021-01-08 18:19 ` [PATCH 1/8] packfile: prepare for the existence of '*.rev' files Taylor Blau
@ 2021-01-08 18:20 ` Taylor Blau
  2021-01-08 18:20 ` [PATCH 3/8] builtin/index-pack.c: write reverse indexes Taylor Blau
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 32+ messages in thread
From: Taylor Blau @ 2021-01-08 18:20 UTC (permalink / raw)
  To: git; +Cc: peff, jrnieder

This patch prepares for callers to be able to write reverse index files
to disk.

It adds the necessary machinery to write a format-compliant .rev file
from within 'write_rev_file()', which is called from
'finish_tmp_packfile()'.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 pack-write.c | 123 ++++++++++++++++++++++++++++++++++++++++++++++++++-
 pack.h       |   4 ++
 2 files changed, 126 insertions(+), 1 deletion(-)

diff --git a/pack-write.c b/pack-write.c
index 3513665e1e..68db5a9edf 100644
--- a/pack-write.c
+++ b/pack-write.c
@@ -166,6 +166,116 @@ const char *write_idx_file(const char *index_name, struct pack_idx_entry **objec
 	return index_name;
 }
 
+static int pack_order_cmp(const void *va, const void *vb, void *ctx)
+{
+	struct pack_idx_entry **objects = ctx;
+
+	off_t oa = objects[*(uint32_t*)va]->offset;
+	off_t ob = objects[*(uint32_t*)vb]->offset;
+
+	if (oa < ob)
+		return -1;
+	if (oa > ob)
+		return 1;
+	return 0;
+}
+
+#define RIDX_SIGNATURE 0x52494458 /* "RIDX" */
+#define RIDX_VERSION 1
+
+static void write_rev_header(struct hashfile *f)
+{
+	uint32_t oid_version;
+	switch (hash_algo_by_ptr(the_hash_algo)) {
+	case GIT_HASH_SHA1:
+		oid_version = 1;
+		break;
+	case GIT_HASH_SHA256:
+		oid_version = 2;
+		break;
+	default:
+		die("write_rev_header: unknown hash version");
+	}
+
+	hashwrite_be32(f, RIDX_SIGNATURE);
+	hashwrite_be32(f, RIDX_VERSION);
+	hashwrite_be32(f, oid_version);
+}
+
+static void write_rev_index_positions(struct hashfile *f,
+				      struct pack_idx_entry **objects,
+				      uint32_t nr_objects)
+{
+	uint32_t *pack_order;
+	uint32_t i;
+
+	ALLOC_ARRAY(pack_order, nr_objects);
+	for (i = 0; i < nr_objects; i++)
+		pack_order[i] = i;
+	QSORT_S(pack_order, nr_objects, pack_order_cmp, objects);
+
+	for (i = 0; i < nr_objects; i++)
+		hashwrite_be32(f, pack_order[i]);
+
+	free(pack_order);
+}
+
+static void write_rev_trailer(struct hashfile *f, const unsigned char *hash)
+{
+	hashwrite(f, hash, the_hash_algo->rawsz);
+}
+
+const char *write_rev_file(const char *rev_name,
+			   struct pack_idx_entry **objects,
+			   uint32_t nr_objects,
+			   const unsigned char *hash,
+			   unsigned flags)
+{
+	struct hashfile *f;
+	int fd;
+
+	if ((flags & WRITE_REV) && (flags & WRITE_REV_VERIFY))
+		die(_("cannot both write and verify reverse index"));
+
+	if (flags & WRITE_REV) {
+		if (!rev_name) {
+			struct strbuf tmp_file = STRBUF_INIT;
+			fd = odb_mkstemp(&tmp_file, "pack/tmp_rev_XXXXXX");
+			rev_name = strbuf_detach(&tmp_file, NULL);
+		} else {
+			unlink(rev_name);
+			fd = open(rev_name, O_CREAT|O_EXCL|O_WRONLY, 0600);
+			if (fd < 0)
+				die_errno("unable to create '%s'", rev_name);
+		}
+		f = hashfd(fd, rev_name);
+	} else if (flags & WRITE_REV_VERIFY) {
+		struct stat statbuf;
+		if (stat(rev_name, &statbuf)) {
+			if (errno == ENOENT) {
+				/* .rev files are optional */
+				return NULL;
+			} else
+				die_errno(_("could not stat: %s"), rev_name);
+		}
+		f = hashfd_check(rev_name);
+	} else
+		return NULL;
+
+	write_rev_header(f);
+
+	write_rev_index_positions(f, objects, nr_objects);
+	write_rev_trailer(f, hash);
+
+	if (rev_name && adjust_shared_perm(rev_name) < 0)
+		die(_("failed to make %s readable"), rev_name);
+
+	finalize_hashfile(f, NULL, CSUM_HASH_IN_STREAM | CSUM_CLOSE |
+				    ((flags & WRITE_IDX_VERIFY) ? 0 : CSUM_FSYNC));
+
+	return rev_name;
+}
+
 off_t write_pack_header(struct hashfile *f, uint32_t nr_entries)
 {
 	struct pack_header hdr;
@@ -341,7 +451,7 @@ void finish_tmp_packfile(struct strbuf *name_buffer,
 			 struct pack_idx_option *pack_idx_opts,
 			 unsigned char hash[])
 {
-	const char *idx_tmp_name;
+	const char *idx_tmp_name, *rev_tmp_name = NULL;
 	int basename_len = name_buffer->len;
 
 	if (adjust_shared_perm(pack_tmp_name))
@@ -352,6 +462,9 @@ void finish_tmp_packfile(struct strbuf *name_buffer,
 	if (adjust_shared_perm(idx_tmp_name))
 		die_errno("unable to make temporary index file readable");
 
+	rev_tmp_name = write_rev_file(NULL, written_list, nr_written, hash,
+				      pack_idx_opts->flags);
+
 	strbuf_addf(name_buffer, "%s.pack", hash_to_hex(hash));
 
 	if (rename(pack_tmp_name, name_buffer->buf))
@@ -365,5 +478,13 @@ void finish_tmp_packfile(struct strbuf *name_buffer,
 
 	strbuf_setlen(name_buffer, basename_len);
 
+	if (rev_tmp_name) {
+		strbuf_addf(name_buffer, "%s.rev", hash_to_hex(hash));
+		if (rename(rev_tmp_name, name_buffer->buf))
+			die_errno("unable to rename temporary reverse-index file");
+	}
+
+	strbuf_setlen(name_buffer, basename_len);
+
 	free((void *)idx_tmp_name);
 }
diff --git a/pack.h b/pack.h
index 9fc0945ac9..30439e0784 100644
--- a/pack.h
+++ b/pack.h
@@ -42,6 +42,8 @@ struct pack_idx_option {
 	/* flag bits */
 #define WRITE_IDX_VERIFY 01 /* verify only, do not write the idx file */
 #define WRITE_IDX_STRICT 02
+#define WRITE_REV 04
+#define WRITE_REV_VERIFY 010
 
 	uint32_t version;
 	uint32_t off32_limit;
@@ -87,6 +89,8 @@ off_t write_pack_header(struct hashfile *f, uint32_t);
 void fixup_pack_header_footer(int, unsigned char *, const char *, uint32_t, unsigned char *, off_t);
 char *index_pack_lockfile(int fd);
 
+const char *write_rev_file(const char *rev_name, struct pack_idx_entry **objects, uint32_t nr_objects, const unsigned char *hash, unsigned flags);
+
 /*
  * The "hdr" output buffer should be at least this big, which will handle sizes
  * up to 2^67.
-- 
2.30.0.138.g6d7191ea01


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH 3/8] builtin/index-pack.c: write reverse indexes
  2021-01-08 18:19 [PATCH 0/8] pack-revindex: introduce on-disk '.rev' format Taylor Blau
  2021-01-08 18:19 ` [PATCH 1/8] packfile: prepare for the existence of '*.rev' files Taylor Blau
  2021-01-08 18:20 ` [PATCH 2/8] pack-write.c: prepare to write 'pack-*.rev' files Taylor Blau
@ 2021-01-08 18:20 ` Taylor Blau
  2021-01-08 18:20 ` [PATCH 4/8] builtin/pack-objects.c: respect 'pack.writeReverseIndex' Taylor Blau
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 32+ messages in thread
From: Taylor Blau @ 2021-01-08 18:20 UTC (permalink / raw)
  To: git; +Cc: peff, jrnieder

Teach 'git index-pack' to optionally write and verify reverse index with
'--[no-]rev-index', as well as respecting the 'pack.writeReverseIndex'
configuration option.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 Documentation/git-index-pack.txt | 20 ++++++---
 builtin/index-pack.c             | 64 ++++++++++++++++++++++++-----
 t/t5325-reverse-index.sh         | 69 ++++++++++++++++++++++++++++++++
 3 files changed, 137 insertions(+), 16 deletions(-)
 create mode 100755 t/t5325-reverse-index.sh

diff --git a/Documentation/git-index-pack.txt b/Documentation/git-index-pack.txt
index af0c26232c..b65f380269 100644
--- a/Documentation/git-index-pack.txt
+++ b/Documentation/git-index-pack.txt
@@ -9,17 +9,18 @@ git-index-pack - Build pack index file for an existing packed archive
 SYNOPSIS
 --------
 [verse]
-'git index-pack' [-v] [-o <index-file>] <pack-file>
+'git index-pack' [-v] [-o <index-file>] [--[no-]rev-index] <pack-file>
 'git index-pack' --stdin [--fix-thin] [--keep] [-v] [-o <index-file>]
-                 [<pack-file>]
+		  [--[no-]rev-index] [<pack-file>]
 
 
 DESCRIPTION
 -----------
 Reads a packed archive (.pack) from the specified file, and
-builds a pack index file (.idx) for it.  The packed archive
-together with the pack index can then be placed in the
-objects/pack/ directory of a Git repository.
+builds a pack index file (.idx) for it. Optionally writes a
+reverse-index (.rev) for the specified pack. The packed
+archive together with the pack index can then be placed in
+the objects/pack/ directory of a Git repository.
 
 
 OPTIONS
@@ -33,7 +34,14 @@ OPTIONS
 	file is constructed from the name of packed archive
 	file by replacing .pack with .idx (and the program
 	fails if the name of packed archive does not end
-	with .pack).
+	with .pack). Incompatible with `--rev-index`.
+
+--[no-]rev-index::
+	When this flag is provided, generate a reverse index
+	(a `.rev` file) corresponding to the given pack. If
+	`--verify` is given, ensure that the existing
+	reverse index is correct. Takes precedence over
+	`pack.writeReverseIndex`.
 
 --stdin::
 	When this flag is provided, the pack is read from stdin
diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 4b8d86e0ad..03408250b1 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -17,7 +17,7 @@
 #include "promisor-remote.h"
 
 static const char index_pack_usage[] =
-"git index-pack [-v] [-o <index-file>] [--keep | --keep=<msg>] [--verify] [--strict] (<pack-file> | --stdin [--fix-thin] [<pack-file>])";
+"git index-pack [-v] [-o <index-file>] [--keep | --keep=<msg>] [--[no-]rev-index] [--verify] [--strict] (<pack-file> | --stdin [--fix-thin] [<pack-file>])";
 
 struct object_entry {
 	struct pack_idx_entry idx;
@@ -1436,13 +1436,13 @@ static void fix_unresolved_deltas(struct hashfile *f)
 	free(sorted_by_pos);
 }
 
-static const char *derive_filename(const char *pack_name, const char *suffix,
-				   struct strbuf *buf)
+static const char *derive_filename(const char *pack_name, const char *strip,
+				   const char *suffix, struct strbuf *buf)
 {
 	size_t len;
-	if (!strip_suffix(pack_name, ".pack", &len))
-		die(_("packfile name '%s' does not end with '.pack'"),
-		    pack_name);
+	if (!strip_suffix(pack_name, strip, &len))
+		die(_("packfile name '%s' does not end with '%s'"),
+		    pack_name, strip);
 	strbuf_add(buf, pack_name, len);
 	strbuf_addch(buf, '.');
 	strbuf_addstr(buf, suffix);
@@ -1459,7 +1459,7 @@ static void write_special_file(const char *suffix, const char *msg,
 	int msg_len = strlen(msg);
 
 	if (pack_name)
-		filename = derive_filename(pack_name, suffix, &name_buf);
+		filename = derive_filename(pack_name, ".pack", suffix, &name_buf);
 	else
 		filename = odb_pack_name(&name_buf, hash, suffix);
 
@@ -1484,12 +1484,14 @@ static void write_special_file(const char *suffix, const char *msg,
 
 static void final(const char *final_pack_name, const char *curr_pack_name,
 		  const char *final_index_name, const char *curr_index_name,
+		  const char *final_rev_index_name, const char *curr_rev_index_name,
 		  const char *keep_msg, const char *promisor_msg,
 		  unsigned char *hash)
 {
 	const char *report = "pack";
 	struct strbuf pack_name = STRBUF_INIT;
 	struct strbuf index_name = STRBUF_INIT;
+	struct strbuf rev_index_name = STRBUF_INIT;
 	int err;
 
 	if (!from_stdin) {
@@ -1524,6 +1526,16 @@ static void final(const char *final_pack_name, const char *curr_pack_name,
 	} else
 		chmod(final_index_name, 0444);
 
+	if (curr_rev_index_name) {
+		if (final_rev_index_name != curr_rev_index_name) {
+			if (!final_rev_index_name)
+				final_rev_index_name = odb_pack_name(&rev_index_name, hash, "rev");
+			if (finalize_object_file(curr_rev_index_name, final_rev_index_name))
+				die(_("cannot store reverse index file"));
+		} else
+			chmod(final_rev_index_name, 0444);
+	}
+
 	if (do_fsck_object) {
 		struct packed_git *p;
 		p = add_packed_git(final_index_name, strlen(final_index_name), 0);
@@ -1553,6 +1565,7 @@ static void final(const char *final_pack_name, const char *curr_pack_name,
 		}
 	}
 
+	strbuf_release(&rev_index_name);
 	strbuf_release(&index_name);
 	strbuf_release(&pack_name);
 }
@@ -1578,6 +1591,12 @@ static int git_index_pack_config(const char *k, const char *v, void *cb)
 		}
 		return 0;
 	}
+	if (!strcmp(k, "pack.writereverseindex")) {
+		if (git_config_bool(k, v))
+			opts->flags |= WRITE_REV;
+		else
+			opts->flags &= ~WRITE_REV;
+	}
 	return git_default_config(k, v, cb);
 }
 
@@ -1695,12 +1714,14 @@ static void show_pack_info(int stat_only)
 
 int cmd_index_pack(int argc, const char **argv, const char *prefix)
 {
-	int i, fix_thin_pack = 0, verify = 0, stat_only = 0;
+	int i, fix_thin_pack = 0, verify = 0, stat_only = 0, rev_index;
 	const char *curr_index;
-	const char *index_name = NULL, *pack_name = NULL;
+	const char *curr_rev_index = NULL;
+	const char *index_name = NULL, *pack_name = NULL, *rev_index_name = NULL;
 	const char *keep_msg = NULL;
 	const char *promisor_msg = NULL;
 	struct strbuf index_name_buf = STRBUF_INIT;
+	struct strbuf rev_index_name_buf = STRBUF_INIT;
 	struct pack_idx_entry **idx_objects;
 	struct pack_idx_option opts;
 	unsigned char pack_hash[GIT_MAX_RAWSZ];
@@ -1727,6 +1748,8 @@ int cmd_index_pack(int argc, const char **argv, const char *prefix)
 	if (prefix && chdir(prefix))
 		die(_("Cannot come back to cwd"));
 
+	rev_index = !!(opts.flags & (WRITE_REV_VERIFY | WRITE_REV));
+
 	for (i = 1; i < argc; i++) {
 		const char *arg = argv[i];
 
@@ -1805,6 +1828,10 @@ int cmd_index_pack(int argc, const char **argv, const char *prefix)
 				if (hash_algo == GIT_HASH_UNKNOWN)
 					die(_("unknown hash algorithm '%s'"), arg);
 				repo_set_hash_algo(the_repository, hash_algo);
+			} else if (!strcmp(arg, "--rev-index")) {
+				rev_index = 1;
+			} else if (!strcmp(arg, "--no-rev-index")) {
+				rev_index = 0;
 			} else
 				usage(index_pack_usage);
 			continue;
@@ -1824,7 +1851,16 @@ int cmd_index_pack(int argc, const char **argv, const char *prefix)
 	if (from_stdin && hash_algo)
 		die(_("--object-format cannot be used with --stdin"));
 	if (!index_name && pack_name)
-		index_name = derive_filename(pack_name, "idx", &index_name_buf);
+		index_name = derive_filename(pack_name, ".pack", "idx", &index_name_buf);
+
+	opts.flags &= ~(WRITE_REV | WRITE_REV_VERIFY);
+	if (rev_index) {
+		opts.flags |= verify ? WRITE_REV_VERIFY : WRITE_REV;
+		if (index_name)
+			rev_index_name = derive_filename(index_name,
+							 ".idx", "rev",
+							 &rev_index_name_buf);
+	}
 
 	if (verify) {
 		if (!index_name)
@@ -1878,11 +1914,16 @@ int cmd_index_pack(int argc, const char **argv, const char *prefix)
 	for (i = 0; i < nr_objects; i++)
 		idx_objects[i] = &objects[i].idx;
 	curr_index = write_idx_file(index_name, idx_objects, nr_objects, &opts, pack_hash);
+	if (rev_index)
+		curr_rev_index = write_rev_file(rev_index_name, idx_objects,
+						nr_objects, pack_hash,
+						opts.flags);
 	free(idx_objects);
 
 	if (!verify)
 		final(pack_name, curr_pack,
 		      index_name, curr_index,
+		      rev_index_name, curr_rev_index,
 		      keep_msg, promisor_msg,
 		      pack_hash);
 	else
@@ -1893,10 +1934,13 @@ int cmd_index_pack(int argc, const char **argv, const char *prefix)
 
 	free(objects);
 	strbuf_release(&index_name_buf);
+	strbuf_release(&rev_index_name_buf);
 	if (pack_name == NULL)
 		free((void *) curr_pack);
 	if (index_name == NULL)
 		free((void *) curr_index);
+	if (rev_index_name == NULL)
+		free((void *) curr_rev_index);
 
 	/*
 	 * Let the caller know this pack is not self contained
diff --git a/t/t5325-reverse-index.sh b/t/t5325-reverse-index.sh
new file mode 100755
index 0000000000..f9afa698dc
--- /dev/null
+++ b/t/t5325-reverse-index.sh
@@ -0,0 +1,69 @@
+#!/bin/sh
+
+test_description='on-disk reverse index'
+. ./test-lib.sh
+
+packdir=.git/objects/pack
+
+test_expect_success 'setup' '
+	test_commit base &&
+
+	pack=$(git pack-objects --all $packdir/pack) &&
+	rev=$packdir/pack-$pack.rev &&
+
+	test_path_is_missing $rev
+'
+
+test_index_pack () {
+	rm -f $rev &&
+	conf=$1 &&
+	shift &&
+	git -c pack.writeReverseIndex=$conf index-pack "$@" \
+		$packdir/pack-$pack.pack
+}
+
+test_expect_success 'index-pack with pack.writeReverseIndex' '
+	test_index_pack "" &&
+	test_path_is_missing $rev &&
+
+	test_index_pack false &&
+	test_path_is_missing $rev &&
+
+	test_index_pack true &&
+	test_path_is_file $rev
+'
+
+test_expect_success 'index-pack with --[no-]rev-index' '
+	for conf in "" true false
+	do
+		test_index_pack "$conf" --rev-index &&
+		test_path_exists $rev &&
+
+		test_index_pack "$conf" --no-rev-index &&
+		test_path_is_missing $rev
+	done
+'
+
+test_expect_success 'index-pack can verify reverse indexes' '
+	test_when_finished "rm -f $rev" &&
+	test_index_pack true &&
+
+	test_path_is_file $rev &&
+	git index-pack --rev-index --verify $packdir/pack-$pack.pack &&
+
+	# Intentionally corrupt the reverse index.
+	chmod u+w $rev &&
+	printf "xxxx" | dd of=$rev bs=1 count=4 conv=notrunc &&
+
+	test_must_fail git index-pack --rev-index --verify \
+		$packdir/pack-$pack.pack 2>err &&
+	grep "validation error" err
+'
+
+test_expect_success 'index-pack infers reverse index name with -o' '
+	git index-pack --rev-index -o other.idx $packdir/pack-$pack.pack &&
+	test_path_is_file other.idx &&
+	test_path_is_file other.rev
+'
+
+test_done
-- 
2.30.0.138.g6d7191ea01


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH 4/8] builtin/pack-objects.c: respect 'pack.writeReverseIndex'
  2021-01-08 18:19 [PATCH 0/8] pack-revindex: introduce on-disk '.rev' format Taylor Blau
                   ` (2 preceding siblings ...)
  2021-01-08 18:20 ` [PATCH 3/8] builtin/index-pack.c: write reverse indexes Taylor Blau
@ 2021-01-08 18:20 ` Taylor Blau
  2021-01-08 18:20 ` [PATCH 5/8] Documentation/config/pack.txt: advertise 'pack.writeReverseIndex' Taylor Blau
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 32+ messages in thread
From: Taylor Blau @ 2021-01-08 18:20 UTC (permalink / raw)
  To: git; +Cc: peff, jrnieder

Now that we have an implementation that can write the new reverse index
format, enable writing a .rev file in 'git pack-objects' by consulting
the pack.writeReverseIndex configuration variable.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 builtin/pack-objects.c   |  7 +++++++
 t/t5325-reverse-index.sh | 13 +++++++++++++
 2 files changed, 20 insertions(+)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index a193cdaf2f..80adce154a 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -2951,6 +2951,13 @@ static int git_pack_config(const char *k, const char *v, void *cb)
 			    pack_idx_opts.version);
 		return 0;
 	}
+	if (!strcmp(k, "pack.writereverseindex")) {
+		if (git_config_bool(k, v))
+			pack_idx_opts.flags |= WRITE_REV;
+		else
+			pack_idx_opts.flags &= ~WRITE_REV;
+		return 0;
+	}
 	if (!strcmp(k, "uploadpack.blobpackfileuri")) {
 		struct configured_exclusion *ex = xmalloc(sizeof(*ex));
 		const char *oid_end, *pack_end;
diff --git a/t/t5325-reverse-index.sh b/t/t5325-reverse-index.sh
index f9afa698dc..431699ade2 100755
--- a/t/t5325-reverse-index.sh
+++ b/t/t5325-reverse-index.sh
@@ -66,4 +66,17 @@ test_expect_success 'index-pack infers reverse index name with -o' '
 	test_path_is_file other.rev
 '
 
+test_expect_success 'pack-objects respects pack.writeReverseIndex' '
+	test_when_finished "rm -fr pack-1-*" &&
+
+	git -c pack.writeReverseIndex= pack-objects --all pack-1 &&
+	test_path_is_missing pack-1-*.rev &&
+
+	git -c pack.writeReverseIndex=false pack-objects --all pack-1 &&
+	test_path_is_missing pack-1-*.rev &&
+
+	git -c pack.writeReverseIndex=true pack-objects --all pack-1 &&
+	test_path_is_file pack-1-*.rev
+'
+
 test_done
-- 
2.30.0.138.g6d7191ea01


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH 5/8] Documentation/config/pack.txt: advertise 'pack.writeReverseIndex'
  2021-01-08 18:19 [PATCH 0/8] pack-revindex: introduce on-disk '.rev' format Taylor Blau
                   ` (3 preceding siblings ...)
  2021-01-08 18:20 ` [PATCH 4/8] builtin/pack-objects.c: respect 'pack.writeReverseIndex' Taylor Blau
@ 2021-01-08 18:20 ` Taylor Blau
  2021-01-08 18:20 ` [PATCH 6/8] t: prepare for GIT_TEST_WRITE_REV_INDEX Taylor Blau
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 32+ messages in thread
From: Taylor Blau @ 2021-01-08 18:20 UTC (permalink / raw)
  To: git; +Cc: peff, jrnieder

Now that the pack.writeReverseIndex configuration is respected in both
'git index-pack' and 'git pack-objects' (and therefore, all of their
callers), we can safely advertise it for use in the git-config manual.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 Documentation/config/pack.txt | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/Documentation/config/pack.txt b/Documentation/config/pack.txt
index 837f1b1679..3da4ea98e2 100644
--- a/Documentation/config/pack.txt
+++ b/Documentation/config/pack.txt
@@ -133,3 +133,10 @@ pack.writeBitmapHashCache::
 	between an older, bitmapped pack and objects that have been
 	pushed since the last gc). The downside is that it consumes 4
 	bytes per object of disk space. Defaults to true.
+
+pack.writeReverseIndex::
+	When true, git will write a corresponding .rev file (see:
+	link:../technical/pack-format.html[Documentation/technical/pack-format.txt])
+	for each new packfile that it writes in all places except for
+	linkgit:git-fast-import[1] and in the bulk checkin mechanism.
+	Defaults to false.
-- 
2.30.0.138.g6d7191ea01


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH 6/8] t: prepare for GIT_TEST_WRITE_REV_INDEX
  2021-01-08 18:19 [PATCH 0/8] pack-revindex: introduce on-disk '.rev' format Taylor Blau
                   ` (4 preceding siblings ...)
  2021-01-08 18:20 ` [PATCH 5/8] Documentation/config/pack.txt: advertise 'pack.writeReverseIndex' Taylor Blau
@ 2021-01-08 18:20 ` Taylor Blau
  2021-01-12 17:11   ` Ævar Arnfjörð Bjarmason
  2021-01-08 18:20 ` [PATCH 7/8] t: support GIT_TEST_WRITE_REV_INDEX Taylor Blau
                   ` (2 subsequent siblings)
  8 siblings, 1 reply; 32+ messages in thread
From: Taylor Blau @ 2021-01-08 18:20 UTC (permalink / raw)
  To: git; +Cc: peff, jrnieder

In the next patch, we'll add support for unconditionally enabling the
'pack.writeReverseIndex' setting with a new GIT_TEST_WRITE_REV_INDEX
environment variable.

This causes a little bit of fallout with tests that, for example,
compare the list of files in the pack directory being unprepared to see
.rev files in its output.

For now, sprinkle these locations with a 'grep -v "\.rev$"' to ignore
them. Once the pack.writeReverseIndex option has been thoroughly
tested, we will default it to 'true', removing GIT_TEST_WRITE_REV_INDEX,
and making it possible to revert this patch.

At that time, we'll have to adjust the expected output to contain the
relevant .rev files, but for now this will do just fine.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 t/t5319-multi-pack-index.sh | 2 +-
 t/t5325-reverse-index.sh    | 4 ++++
 t/t5604-clone-reference.sh  | 2 +-
 t/t5702-protocol-v2.sh      | 4 ++--
 t/t6500-gc.sh               | 4 ++--
 t/t9300-fast-import.sh      | 2 +-
 6 files changed, 11 insertions(+), 7 deletions(-)

diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh
index 297de502a9..9696f88c2f 100755
--- a/t/t5319-multi-pack-index.sh
+++ b/t/t5319-multi-pack-index.sh
@@ -710,7 +710,7 @@ test_expect_success 'expire respects .keep files' '
 		PACKA=$(ls .git/objects/pack/a-pack*\.pack | sed s/\.pack\$//) &&
 		touch $PACKA.keep &&
 		git multi-pack-index expire &&
-		ls -S .git/objects/pack/a-pack* | grep $PACKA >a-pack-files &&
+		ls -S .git/objects/pack/a-pack* | grep $PACKA | grep -v "\.rev$" >a-pack-files &&
 		test_line_count = 3 a-pack-files &&
 		test-tool read-midx .git/objects | grep idx >midx-list &&
 		test_line_count = 2 midx-list
diff --git a/t/t5325-reverse-index.sh b/t/t5325-reverse-index.sh
index 431699ade2..5a59cc71e4 100755
--- a/t/t5325-reverse-index.sh
+++ b/t/t5325-reverse-index.sh
@@ -3,6 +3,10 @@
 test_description='on-disk reverse index'
 . ./test-lib.sh
 
+# The below tests want control over the 'pack.writeReverseIndex' setting
+# themselves to assert various combinations of it with other options.
+sane_unset GIT_TEST_WRITE_REV_INDEX
+
 packdir=.git/objects/pack
 
 test_expect_success 'setup' '
diff --git a/t/t5604-clone-reference.sh b/t/t5604-clone-reference.sh
index 2f7be23044..d064426d03 100755
--- a/t/t5604-clone-reference.sh
+++ b/t/t5604-clone-reference.sh
@@ -318,7 +318,7 @@ test_expect_success SYMLINKS 'clone repo with symlinked or unknown files at obje
 		test_cmp T.objects T$option.objects &&
 		(
 			cd T$option/.git/objects &&
-			find . -type f | sort >../../../T$option.objects-files.raw &&
+			find . -type f | grep -v \.rev$ | sort >../../../T$option.objects-files.raw &&
 			find . -type l | sort >../../../T$option.objects-symlinks.raw
 		)
 	done &&
diff --git a/t/t5702-protocol-v2.sh b/t/t5702-protocol-v2.sh
index 7d5b17909b..9ebf045739 100755
--- a/t/t5702-protocol-v2.sh
+++ b/t/t5702-protocol-v2.sh
@@ -848,7 +848,7 @@ test_expect_success 'part of packfile response provided as URI' '
 	test -f h2found &&
 
 	# Ensure that there are exactly 6 files (3 .pack and 3 .idx).
-	ls http_child/.git/objects/pack/* >filelist &&
+	ls http_child/.git/objects/pack/* | grep -v \.rev$ >filelist &&
 	test_line_count = 6 filelist
 '
 
@@ -902,7 +902,7 @@ test_expect_success 'packfile-uri with transfer.fsckobjects' '
 		clone "$HTTPD_URL/smart/http_parent" http_child &&
 
 	# Ensure that there are exactly 4 files (2 .pack and 2 .idx).
-	ls http_child/.git/objects/pack/* >filelist &&
+	ls http_child/.git/objects/pack/* | grep -v \.rev$ >filelist &&
 	test_line_count = 4 filelist
 '
 
diff --git a/t/t6500-gc.sh b/t/t6500-gc.sh
index 4a3b8f48ac..d52f92f5a1 100755
--- a/t/t6500-gc.sh
+++ b/t/t6500-gc.sh
@@ -106,13 +106,13 @@ test_expect_success 'auto gc with too many loose objects does not attempt to cre
 	test_commit "$(test_oid obj2)" &&
 	# Our first gc will create a pack; our second will create a second pack
 	git gc --auto &&
-	ls .git/objects/pack | sort >existing_packs &&
+	ls .git/objects/pack | grep -v \.rev$ | sort >existing_packs &&
 	test_commit "$(test_oid obj3)" &&
 	test_commit "$(test_oid obj4)" &&
 
 	git gc --auto 2>err &&
 	test_i18ngrep ! "^warning:" err &&
-	ls .git/objects/pack/ | sort >post_packs &&
+	ls .git/objects/pack/ | grep -v \.rev$ | sort >post_packs &&
 	comm -1 -3 existing_packs post_packs >new &&
 	comm -2 -3 existing_packs post_packs >del &&
 	test_line_count = 0 del && # No packs are deleted
diff --git a/t/t9300-fast-import.sh b/t/t9300-fast-import.sh
index 308c1ef42c..100df52a71 100755
--- a/t/t9300-fast-import.sh
+++ b/t/t9300-fast-import.sh
@@ -1629,7 +1629,7 @@ test_expect_success 'O: blank lines not necessary after other commands' '
 	INPUT_END
 
 	git fast-import <input &&
-	test 8 = $(find .git/objects/pack -type f | grep -v multi-pack-index | wc -l) &&
+	test 8 = $(find .git/objects/pack -type f \( -name "*.idx" -o -name "*.pack" \) | wc -l) &&
 	test $(git rev-parse refs/tags/O3-2nd) = $(git rev-parse O3^) &&
 	git log --reverse --pretty=oneline O3 | sed s/^.*z// >actual &&
 	test_cmp expect actual
-- 
2.30.0.138.g6d7191ea01


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH 7/8] t: support GIT_TEST_WRITE_REV_INDEX
  2021-01-08 18:19 [PATCH 0/8] pack-revindex: introduce on-disk '.rev' format Taylor Blau
                   ` (5 preceding siblings ...)
  2021-01-08 18:20 ` [PATCH 6/8] t: prepare for GIT_TEST_WRITE_REV_INDEX Taylor Blau
@ 2021-01-08 18:20 ` Taylor Blau
  2021-01-12 16:49   ` Derrick Stolee
  2021-01-12 17:18   ` Ævar Arnfjörð Bjarmason
  2021-01-08 18:20 ` [PATCH 8/8] pack-revindex: ensure that on-disk reverse indexes are given precedence Taylor Blau
  2021-01-13 22:28 ` [PATCH v2 0/8] pack-revindex: introduce on-disk '.rev' format Taylor Blau
  8 siblings, 2 replies; 32+ messages in thread
From: Taylor Blau @ 2021-01-08 18:20 UTC (permalink / raw)
  To: git; +Cc: peff, jrnieder

Add a new option that unconditionally enables the pack.writeReverseIndex
setting in order to run the whole test suite in a mode that generates
on-disk reverse indexes.

Once on-disk reverse indexes are proven out over several releases, we
can change the default value of that configuration to 'true', and drop
this patch.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 builtin/index-pack.c   | 5 ++++-
 builtin/pack-objects.c | 2 ++
 pack-revindex.h        | 2 ++
 t/README               | 3 +++
 4 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 03408250b1..0bde325a8b 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -1748,7 +1748,10 @@ int cmd_index_pack(int argc, const char **argv, const char *prefix)
 	if (prefix && chdir(prefix))
 		die(_("Cannot come back to cwd"));
 
-	rev_index = !!(opts.flags & (WRITE_REV_VERIFY | WRITE_REV));
+	if (git_env_bool(GIT_TEST_WRITE_REV_INDEX, 0))
+		rev_index = 1;
+	else
+		rev_index = !!(opts.flags & (WRITE_REV_VERIFY | WRITE_REV));
 
 	for (i = 1; i < argc; i++) {
 		const char *arg = argv[i];
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 80adce154a..5b3395bd7a 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -3597,6 +3597,8 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
 
 	reset_pack_idx_option(&pack_idx_opts);
 	git_config(git_pack_config, NULL);
+	if (git_env_bool(GIT_TEST_WRITE_REV_INDEX, 0))
+		pack_idx_opts.flags |= WRITE_REV;
 
 	progress = isatty(2);
 	argc = parse_options(argc, argv, prefix, pack_objects_options,
diff --git a/pack-revindex.h b/pack-revindex.h
index b501a7cd62..d2d466e298 100644
--- a/pack-revindex.h
+++ b/pack-revindex.h
@@ -1,6 +1,8 @@
 #ifndef PACK_REVINDEX_H
 #define PACK_REVINDEX_H
 
+#define GIT_TEST_WRITE_REV_INDEX "GIT_TEST_WRITE_REV_INDEX"
+
 struct packed_git;
 
 int load_pack_revindex(struct packed_git *p);
diff --git a/t/README b/t/README
index c730a70770..0f97a51640 100644
--- a/t/README
+++ b/t/README
@@ -439,6 +439,9 @@ GIT_TEST_DEFAULT_HASH=<hash-algo> specifies which hash algorithm to
 use in the test scripts. Recognized values for <hash-algo> are "sha1"
 and "sha256".
 
+GIT_TEST_WRITE_REV_INDEX=<boolean>, when true enables the
+'pack.writeReverseIndex' setting.
+
 Naming Tests
 ------------
 
-- 
2.30.0.138.g6d7191ea01


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH 8/8] pack-revindex: ensure that on-disk reverse indexes are given precedence
  2021-01-08 18:19 [PATCH 0/8] pack-revindex: introduce on-disk '.rev' format Taylor Blau
                   ` (6 preceding siblings ...)
  2021-01-08 18:20 ` [PATCH 7/8] t: support GIT_TEST_WRITE_REV_INDEX Taylor Blau
@ 2021-01-08 18:20 ` Taylor Blau
  2021-01-13 22:28 ` [PATCH v2 0/8] pack-revindex: introduce on-disk '.rev' format Taylor Blau
  8 siblings, 0 replies; 32+ messages in thread
From: Taylor Blau @ 2021-01-08 18:20 UTC (permalink / raw)
  To: git; +Cc: peff, jrnieder

When an on-disk reverse index exists, there is no need to generate one
in memory. In fact, doing so can be slow, and require large amounts of
the heap.

Let's make sure that we treat the on-disk reverse index with precedence
(i.e., that when it exists, we don't bother trying to generate an
equivalent one in memory) by teaching Git how to conditionally die()
when generating a reverse index in memory.

Then, add a test to ensure that when (a) an on-disk reverse index
exists, and (b) when setting GIT_TEST_REV_INDEX_DIE_IN_MEMORY, that we
do not die, implying that we read from the on-disk one.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 pack-revindex.c          | 4 ++++
 pack-revindex.h          | 1 +
 t/t5325-reverse-index.sh | 8 ++++++++
 3 files changed, 13 insertions(+)

diff --git a/pack-revindex.c b/pack-revindex.c
index 1baaf2c42a..683d22e898 100644
--- a/pack-revindex.c
+++ b/pack-revindex.c
@@ -2,6 +2,7 @@
 #include "pack-revindex.h"
 #include "object-store.h"
 #include "packfile.h"
+#include "config.h"
 
 struct revindex_entry {
 	off_t offset;
@@ -166,6 +167,9 @@ static void create_pack_revindex(struct packed_git *p)
 
 static int load_pack_revindex_from_memory(struct packed_git *p)
 {
+	if (git_env_bool(GIT_TEST_REV_INDEX_DIE_IN_MEMORY, 0))
+		die("dying as requested by '%s'",
+		    GIT_TEST_REV_INDEX_DIE_IN_MEMORY);
 	if (open_pack_index(p))
 		return -1;
 	create_pack_revindex(p);
diff --git a/pack-revindex.h b/pack-revindex.h
index d2d466e298..e271da871a 100644
--- a/pack-revindex.h
+++ b/pack-revindex.h
@@ -2,6 +2,7 @@
 #define PACK_REVINDEX_H
 
 #define GIT_TEST_WRITE_REV_INDEX "GIT_TEST_WRITE_REV_INDEX"
+#define GIT_TEST_REV_INDEX_DIE_IN_MEMORY "GIT_TEST_REV_INDEX_DIE_IN_MEMORY"
 
 struct packed_git;
 
diff --git a/t/t5325-reverse-index.sh b/t/t5325-reverse-index.sh
index 5a59cc71e4..9d4eecccc9 100755
--- a/t/t5325-reverse-index.sh
+++ b/t/t5325-reverse-index.sh
@@ -83,4 +83,12 @@ test_expect_success 'pack-objects respects pack.writeReverseIndex' '
 	test_path_is_file pack-1-*.rev
 '
 
+test_expect_success 'reverse index is not generated when available on disk' '
+	git index-pack --rev-index $packdir/pack-$pack.pack &&
+
+	git rev-parse HEAD >tip &&
+	GIT_TEST_REV_INDEX_DIE_IN_MEMORY=1 git cat-file \
+		--batch-check="%(objectsize:disk)" <tip
+'
+
 test_done
-- 
2.30.0.138.g6d7191ea01

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 7/8] t: support GIT_TEST_WRITE_REV_INDEX
  2021-01-08 18:20 ` [PATCH 7/8] t: support GIT_TEST_WRITE_REV_INDEX Taylor Blau
@ 2021-01-12 16:49   ` Derrick Stolee
  2021-01-12 17:34     ` Taylor Blau
  2021-01-12 17:18   ` Ævar Arnfjörð Bjarmason
  1 sibling, 1 reply; 32+ messages in thread
From: Derrick Stolee @ 2021-01-12 16:49 UTC (permalink / raw)
  To: Taylor Blau, git; +Cc: peff, jrnieder

On 1/8/2021 1:20 PM, Taylor Blau wrote:
> Add a new option that unconditionally enables the pack.writeReverseIndex
> setting in order to run the whole test suite in a mode that generates
> on-disk reverse indexes.
> 
> Once on-disk reverse indexes are proven out over several releases, we
> can change the default value of that configuration to 'true', and drop
> this patch.

...

> +GIT_TEST_WRITE_REV_INDEX=<boolean>, when true enables the
> +'pack.writeReverseIndex' setting.
> +

Should this also be added to the second run of the test
suite with optional variables?

diff --git a/ci/run-build-and-tests.sh b/ci/run-build-and-tests.sh
index 6c27b886b8f..d1cbf330a14 100755
--- a/ci/run-build-and-tests.sh
+++ b/ci/run-build-and-tests.sh
@@ -22,6 +22,7 @@ linux-gcc)
 	export GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS=1
 	export GIT_TEST_MULTI_PACK_INDEX=1
 	export GIT_TEST_ADD_I_USE_BUILTIN=1
+	export GIT_TEST_WRITE_REV_INDEX=1
 	make test
 	;;
 linux-clang)

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 6/8] t: prepare for GIT_TEST_WRITE_REV_INDEX
  2021-01-08 18:20 ` [PATCH 6/8] t: prepare for GIT_TEST_WRITE_REV_INDEX Taylor Blau
@ 2021-01-12 17:11   ` Ævar Arnfjörð Bjarmason
  2021-01-12 18:40     ` Taylor Blau
  0 siblings, 1 reply; 32+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-01-12 17:11 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, peff, jrnieder


On Fri, Jan 08 2021, Taylor Blau wrote:

> For now, sprinkle these locations with a 'grep -v "\.rev$"' to ignore
> them. Once the pack.writeReverseIndex option has been thoroughly
> tested, we will default it to 'true', removing GIT_TEST_WRITE_REV_INDEX,
> and making it possible to revert this patch.

Maybe some of it we can change/revert, but some of it just seems to be
test warts we can fix:

> diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh
> index 297de502a9..9696f88c2f 100755
> --- a/t/t5319-multi-pack-index.sh
> +++ b/t/t5319-multi-pack-index.sh
> @@ -710,7 +710,7 @@ test_expect_success 'expire respects .keep files' '
>  		PACKA=$(ls .git/objects/pack/a-pack*\.pack | sed s/\.pack\$//) &&
>  		touch $PACKA.keep &&
>  		git multi-pack-index expire &&
> -		ls -S .git/objects/pack/a-pack* | grep $PACKA >a-pack-files &&
> +		ls -S .git/objects/pack/a-pack* | grep $PACKA | grep -v "\.rev$" >a-pack-files &&

This seems to be testing "a *.keep file made my pack not expire". Can't
it just check for *.{pack,idx,keep} or even just *.pack?

>  		test_line_count = 3 a-pack-files &&
>  		test-tool read-midx .git/objects | grep idx >midx-list &&
>  		test_line_count = 2 midx-list
> diff --git a/t/t5325-reverse-index.sh b/t/t5325-reverse-index.sh
> index 431699ade2..5a59cc71e4 100755
> --- a/t/t5325-reverse-index.sh
> +++ b/t/t5325-reverse-index.sh
> @@ -3,6 +3,10 @@
>  test_description='on-disk reverse index'
>  . ./test-lib.sh
>  
> +# The below tests want control over the 'pack.writeReverseIndex' setting
> +# themselves to assert various combinations of it with other options.
> +sane_unset GIT_TEST_WRITE_REV_INDEX
> +
>  packdir=.git/objects/pack
>  
>  test_expect_success 'setup' '
> diff --git a/t/t5604-clone-reference.sh b/t/t5604-clone-reference.sh
> index 2f7be23044..d064426d03 100755
> --- a/t/t5604-clone-reference.sh
> +++ b/t/t5604-clone-reference.sh
> @@ -318,7 +318,7 @@ test_expect_success SYMLINKS 'clone repo with symlinked or unknown files at obje
>  		test_cmp T.objects T$option.objects &&
>  		(
>  			cd T$option/.git/objects &&
> -			find . -type f | sort >../../../T$option.objects-files.raw &&
> +			find . -type f | grep -v \.rev$ | sort >../../../T$option.objects-files.raw &&
>  			find . -type l | sort >../../../T$option.objects-symlinks.raw

There's an existing loop just below that where we grep out
/commit-graph/, /multi-pack-index/ etc. whith other test modes add to
the objects directory with sed. Seems like this belongs there, not in
the find above it.

>  		)
>  	done &&
> diff --git a/t/t5702-protocol-v2.sh b/t/t5702-protocol-v2.sh
> index 7d5b17909b..9ebf045739 100755
> --- a/t/t5702-protocol-v2.sh
> +++ b/t/t5702-protocol-v2.sh
> @@ -848,7 +848,7 @@ test_expect_success 'part of packfile response provided as URI' '
>  	test -f h2found &&
>  
>  	# Ensure that there are exactly 6 files (3 .pack and 3 .idx).
> -	ls http_child/.git/objects/pack/* >filelist &&
> +	ls http_child/.git/objects/pack/* | grep -v \.rev$ >filelist &&
>  	test_line_count = 6 filelist
>  '

Maybe just check *.{pack,idx,keep}. I was looking at that code the other
day and it's really just being overly specific. It really just cares
about the *.pack files.

> @@ -902,7 +902,7 @@ test_expect_success 'packfile-uri with transfer.fsckobjects' '
>  		clone "$HTTPD_URL/smart/http_parent" http_child &&
>  
>  	# Ensure that there are exactly 4 files (2 .pack and 2 .idx).
> -	ls http_child/.git/objects/pack/* >filelist &&
> +	ls http_child/.git/objects/pack/* | grep -v \.rev$ >filelist &&

ditto.

>  	test_line_count = 4 filelist
>  '
>  
> diff --git a/t/t6500-gc.sh b/t/t6500-gc.sh
> index 4a3b8f48ac..d52f92f5a1 100755
> --- a/t/t6500-gc.sh
> +++ b/t/t6500-gc.sh
> @@ -106,13 +106,13 @@ test_expect_success 'auto gc with too many loose objects does not attempt to cre
>  	test_commit "$(test_oid obj2)" &&
>  	# Our first gc will create a pack; our second will create a second pack
>  	git gc --auto &&
> -	ls .git/objects/pack | sort >existing_packs &&
> +	ls .git/objects/pack | grep -v \.rev$ | sort >existing_packs &&
>  	test_commit "$(test_oid obj3)" &&
>  	test_commit "$(test_oid obj4)" &&
>  
>  	git gc --auto 2>err &&
>  	test_i18ngrep ! "^warning:" err &&
> -	ls .git/objects/pack/ | sort >post_packs &&
> +	ls .git/objects/pack/ | grep -v \.rev$ | sort >post_packs &&
>  	comm -1 -3 existing_packs post_packs >new &&
>  	comm -2 -3 existing_packs post_packs >del &&
>  	test_line_count = 0 del && # No packs are deleted

This is all part of account where we later use comm/wc -l to check how
many new packs we have,so just check *.pack?

> diff --git a/t/t9300-fast-import.sh b/t/t9300-fast-import.sh
> index 308c1ef42c..100df52a71 100755
> --- a/t/t9300-fast-import.sh
> +++ b/t/t9300-fast-import.sh
> @@ -1629,7 +1629,7 @@ test_expect_success 'O: blank lines not necessary after other commands' '
>  	INPUT_END
>  
>  	git fast-import <input &&
> -	test 8 = $(find .git/objects/pack -type f | grep -v multi-pack-index | wc -l) &&
> +	test 8 = $(find .git/objects/pack -type f \( -name "*.idx" -o -name "*.pack" \) | wc -l) &&

Yay, there the existing multi-pack-index case is amended in a
future-proof way :)

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 7/8] t: support GIT_TEST_WRITE_REV_INDEX
  2021-01-08 18:20 ` [PATCH 7/8] t: support GIT_TEST_WRITE_REV_INDEX Taylor Blau
  2021-01-12 16:49   ` Derrick Stolee
@ 2021-01-12 17:18   ` Ævar Arnfjörð Bjarmason
  2021-01-12 17:39     ` Derrick Stolee
  1 sibling, 1 reply; 32+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-01-12 17:18 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, peff, jrnieder


On Fri, Jan 08 2021, Taylor Blau wrote:

> Signed-off-by: Taylor Blau <me@ttaylorr.com>
> ---
>  builtin/index-pack.c   | 5 ++++-
>  builtin/pack-objects.c | 2 ++
>  pack-revindex.h        | 2 ++
>  t/README               | 3 +++
>  4 files changed, 11 insertions(+), 1 deletion(-)
>
> diff --git a/builtin/index-pack.c b/builtin/index-pack.c
> index 03408250b1..0bde325a8b 100644
> --- a/builtin/index-pack.c
> +++ b/builtin/index-pack.c
> @@ -1748,7 +1748,10 @@ int cmd_index_pack(int argc, const char **argv, const char *prefix)
>  	if (prefix && chdir(prefix))
>  		die(_("Cannot come back to cwd"));
>  
> -	rev_index = !!(opts.flags & (WRITE_REV_VERIFY | WRITE_REV));
> +	if (git_env_bool(GIT_TEST_WRITE_REV_INDEX, 0))
> +		rev_index = 1;
> +	else
> +		rev_index = !!(opts.flags & (WRITE_REV_VERIFY | WRITE_REV));

Why not make an explicit GIT_TEST_WRITE_REV_INDEX=false meaningful? It's
also sometimes handy to turn these off in the tests.

    rev_index = git_env_bool("GIT_TEST_WRITE_REV_INDEX",
    	!!(opts.flags & (WRITE_REV_VERIFY | WRITE_REV)));

> +#define GIT_TEST_WRITE_REV_INDEX "GIT_TEST_WRITE_REV_INDEX"

Micro style nit: FWIW I'm not a fan of this macro->string indirection a
few GIT_TEST_* names have had since 859fdc0c3c (commit-graph: define
GIT_TEST_COMMIT_GRAPH, 2018-08-29).

Most of them just use git_env_bool("GIT_TEST_[...]") which IMO makes it
easier to eyeball a "git grep"

> +GIT_TEST_WRITE_REV_INDEX=<boolean>, when true enables the
> +'pack.writeReverseIndex' setting.
> +

Re the git_env_bool() default value comment above: I see our other
boolean GIT_TEST_* docs say "when true", but mostly they mean things
like:

    GIT_TEST_WRITE_REV_INDEX=<boolean>, when set, configures the
    'pack.writeReverseIndex' setting. Defaults to 'false'.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 7/8] t: support GIT_TEST_WRITE_REV_INDEX
  2021-01-12 16:49   ` Derrick Stolee
@ 2021-01-12 17:34     ` Taylor Blau
  0 siblings, 0 replies; 32+ messages in thread
From: Taylor Blau @ 2021-01-12 17:34 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: git, peff, jrnieder

On Tue, Jan 12, 2021 at 11:49:40AM -0500, Derrick Stolee wrote:
> Should this also be added to the second run of the test
> suite with optional variables?

Ah, I wasn't aware of this script myself. Yes, it should be there,
thanks.

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 7/8] t: support GIT_TEST_WRITE_REV_INDEX
  2021-01-12 17:18   ` Ævar Arnfjörð Bjarmason
@ 2021-01-12 17:39     ` Derrick Stolee
  2021-01-12 18:17       ` Taylor Blau
  0 siblings, 1 reply; 32+ messages in thread
From: Derrick Stolee @ 2021-01-12 17:39 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason, Taylor Blau; +Cc: git, peff, jrnieder

On 1/12/2021 12:18 PM, Ævar Arnfjörð Bjarmason wrote:
> 
> On Fri, Jan 08 2021, Taylor Blau wrote:
>> -	rev_index = !!(opts.flags & (WRITE_REV_VERIFY | WRITE_REV));
>> +	if (git_env_bool(GIT_TEST_WRITE_REV_INDEX, 0))
>> +		rev_index = 1;
>> +	else
>> +		rev_index = !!(opts.flags & (WRITE_REV_VERIFY | WRITE_REV));
> 
> Why not make an explicit GIT_TEST_WRITE_REV_INDEX=false meaningful? It's
> also sometimes handy to turn these off in the tests.
> 
>     rev_index = git_env_bool("GIT_TEST_WRITE_REV_INDEX",
>     	!!(opts.flags & (WRITE_REV_VERIFY | WRITE_REV)));
 
This will cause tests that explicitly request a rev-index to fail when
GIT_TEST_WRITE_REF_INDEX=false. I'm not sure that's a good pattern to
follow.

>> +#define GIT_TEST_WRITE_REV_INDEX "GIT_TEST_WRITE_REV_INDEX"
> 
> Micro style nit: FWIW I'm not a fan of this macro->string indirection a
> few GIT_TEST_* names have had since 859fdc0c3c (commit-graph: define
> GIT_TEST_COMMIT_GRAPH, 2018-08-29).
> 
> Most of them just use git_env_bool("GIT_TEST_[...]") which IMO makes it
> easier to eyeball a "git grep"

In the case of GIT_TEST_COMMIT_GRAPH, there are multiple places that
check the environment variable, and it is probably best to have the
strings consistent through a macro.

For something like GIT_TEST_WRITE_REV_INDEX, this macro is less
important because it is checked only once.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 7/8] t: support GIT_TEST_WRITE_REV_INDEX
  2021-01-12 17:39     ` Derrick Stolee
@ 2021-01-12 18:17       ` Taylor Blau
  0 siblings, 0 replies; 32+ messages in thread
From: Taylor Blau @ 2021-01-12 18:17 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: Ævar Arnfjörð Bjarmason, git, peff, jrnieder

On Tue, Jan 12, 2021 at 12:39:37PM -0500, Derrick Stolee wrote:
> On 1/12/2021 12:18 PM, Ævar Arnfjörð Bjarmason wrote:
> >
> > On Fri, Jan 08 2021, Taylor Blau wrote:
> >> -	rev_index = !!(opts.flags & (WRITE_REV_VERIFY | WRITE_REV));
> >> +	if (git_env_bool(GIT_TEST_WRITE_REV_INDEX, 0))
> >> +		rev_index = 1;
> >> +	else
> >> +		rev_index = !!(opts.flags & (WRITE_REV_VERIFY | WRITE_REV));
> >
> > Why not make an explicit GIT_TEST_WRITE_REV_INDEX=false meaningful? It's
> > also sometimes handy to turn these off in the tests.
> >
> >     rev_index = git_env_bool("GIT_TEST_WRITE_REV_INDEX",
> >     	!!(opts.flags & (WRITE_REV_VERIFY | WRITE_REV)));
>
> This will cause tests that explicitly request a rev-index to fail when
> GIT_TEST_WRITE_REF_INDEX=false. I'm not sure that's a good pattern to
> follow.

I agree that this wouldn't work; the second parameter is the default
value, not a substitute for "false".

It _would_ work currently, since there aren't any tests in this series
that explicitly

    export GIT_TEST_WRITE_REF_INDEX=false

t5325 does 'sane_unset GIT_TEST_WRITE_REV_INDEX', which accomplishes
the same thing without setting a value. If we instead used the export
form, this would break, so I think the implementation is more robust
as-is.

> >> +#define GIT_TEST_WRITE_REV_INDEX "GIT_TEST_WRITE_REV_INDEX"
> >
> > Micro style nit: FWIW I'm not a fan of this macro->string indirection a
> > few GIT_TEST_* names have had since 859fdc0c3c (commit-graph: define
> > GIT_TEST_COMMIT_GRAPH, 2018-08-29).
> >
> > Most of them just use git_env_bool("GIT_TEST_[...]") which IMO makes it
> > easier to eyeball a "git grep"
>
> In the case of GIT_TEST_COMMIT_GRAPH, there are multiple places that
> check the environment variable, and it is probably best to have the
> strings consistent through a macro.
>
> For something like GIT_TEST_WRITE_REV_INDEX, this macro is less
> important because it is checked only once.

FWIW, I'm not a huge fan of the indirection either, but I think that its
use here is warranted, since it is checked twice (in index-pack, and
pack-objects).

It _could_ be checked lower in the call stack, but I tried this when
developing this series and it ended up being far messier than what is
presented here. The trickiness is with the extra verification mode
(which ensures that an existing '.rev' file was written correctly), and
having to indicate that at each caller.

> Thanks,
> -Stolee

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 6/8] t: prepare for GIT_TEST_WRITE_REV_INDEX
  2021-01-12 17:11   ` Ævar Arnfjörð Bjarmason
@ 2021-01-12 18:40     ` Taylor Blau
  0 siblings, 0 replies; 32+ messages in thread
From: Taylor Blau @ 2021-01-12 18:40 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: git, peff, jrnieder

Hi Ævar,

Your suggestions are quite helpful, and I'm glad to apply them,
especially if it means that this "clean up" patch can actually harden us
from similar changes in the future.

On Tue, Jan 12, 2021 at 06:11:00PM +0100, Ævar Arnfjörð Bjarmason wrote:
>
> On Fri, Jan 08 2021, Taylor Blau wrote:
>
> > For now, sprinkle these locations with a 'grep -v "\.rev$"' to ignore
> > them. Once the pack.writeReverseIndex option has been thoroughly
> > tested, we will default it to 'true', removing GIT_TEST_WRITE_REV_INDEX,
> > and making it possible to revert this patch.
>
> Maybe some of it we can change/revert, but some of it just seems to be
> test warts we can fix:
>
> > diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh
> > index 297de502a9..9696f88c2f 100755
> > --- a/t/t5319-multi-pack-index.sh
> > +++ b/t/t5319-multi-pack-index.sh
> > @@ -710,7 +710,7 @@ test_expect_success 'expire respects .keep files' '
> >  		PACKA=$(ls .git/objects/pack/a-pack*\.pack | sed s/\.pack\$//) &&
> >  		touch $PACKA.keep &&
> >  		git multi-pack-index expire &&
> > -		ls -S .git/objects/pack/a-pack* | grep $PACKA >a-pack-files &&
> > +		ls -S .git/objects/pack/a-pack* | grep $PACKA | grep -v "\.rev$" >a-pack-files &&
>
> This seems to be testing "a *.keep file made my pack not expire". Can't
> it just check for *.{pack,idx,keep} or even just *.pack?

Yeah, and I think the simplest thing to do here is just check that these
files exist at all. Something like:

diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh
index 9696f88c2f..f5e50508c9 100755
--- a/t/t5319-multi-pack-index.sh
+++ b/t/t5319-multi-pack-index.sh
@@ -710,7 +710,9 @@ test_expect_success 'expire respects .keep files' '
 		PACKA=$(ls .git/objects/pack/a-pack*\.pack | sed s/\.pack\$//) &&
 		touch $PACKA.keep &&
 		git multi-pack-index expire &&
-		ls -S .git/objects/pack/a-pack* | grep $PACKA | grep -v "\.rev$" >a-pack-files &&
+		test_is_file .git/objects/pack/pack-a-$PACKA.idx &&
+		test_is_file .git/objects/pack/pack-a-$PACKA.keep &&
+		test_is_file .git/objects/pack/pack-a-$PACKA.pack &&
 		test_line_count = 3 a-pack-files &&
 		test-tool read-midx .git/objects | grep idx >midx-list &&
 		test_line_count = 2 midx-list

> > diff --git a/t/t5604-clone-reference.sh b/t/t5604-clone-reference.sh
> > index 2f7be23044..d064426d03 100755
> > --- a/t/t5604-clone-reference.sh
> > +++ b/t/t5604-clone-reference.sh
> > @@ -318,7 +318,7 @@ test_expect_success SYMLINKS 'clone repo with symlinked or unknown files at obje
> >  		test_cmp T.objects T$option.objects &&
> >  		(
> >  			cd T$option/.git/objects &&
> > -			find . -type f | sort >../../../T$option.objects-files.raw &&
> > +			find . -type f | grep -v \.rev$ | sort >../../../T$option.objects-files.raw &&
> >  			find . -type l | sort >../../../T$option.objects-symlinks.raw
>
> There's an existing loop just below that where we grep out
> /commit-graph/, /multi-pack-index/ etc. whith other test modes add to
> the objects directory with sed. Seems like this belongs there, not in
> the find above it.

Much cleaner! Thank you.

> > diff --git a/t/t5702-protocol-v2.sh b/t/t5702-protocol-v2.sh
> > index 7d5b17909b..9ebf045739 100755
> > --- a/t/t5702-protocol-v2.sh
> > +++ b/t/t5702-protocol-v2.sh
> > @@ -848,7 +848,7 @@ test_expect_success 'part of packfile response provided as URI' '
> >  	test -f h2found &&
> >
> >  	# Ensure that there are exactly 6 files (3 .pack and 3 .idx).
> > -	ls http_child/.git/objects/pack/* >filelist &&
> > +	ls http_child/.git/objects/pack/* | grep -v \.rev$ >filelist &&
> >  	test_line_count = 6 filelist
> >  '
>
> Maybe just check *.{pack,idx,keep}. I was looking at that code the other
> day and it's really just being overly specific. It really just cares
> about the *.pack files.

I think here and in t6500 as well as t9300 the easiest thing to do is
just

diff --git a/t/t5702-protocol-v2.sh b/t/t5702-protocol-v2.sh
index 9ebf045739..73cd9e3ff6 100755
--- a/t/t5702-protocol-v2.sh
+++ b/t/t5702-protocol-v2.sh
@@ -848,8 +848,10 @@ test_expect_success 'part of packfile response provided as URI' '
 	test -f h2found &&

 	# Ensure that there are exactly 6 files (3 .pack and 3 .idx).
-	ls http_child/.git/objects/pack/* | grep -v \.rev$ >filelist &&
-	test_line_count = 6 filelist
+	ls http_child/.git/objects/pack/*.pack >packlist &&
+	ls http_child/.git/objects/pack/*.idx >idxlist &&
+	test_line_count = 3 idxlist &&
+	test_line_count = 3 packlist
 '

> > diff --git a/t/t9300-fast-import.sh b/t/t9300-fast-import.sh
> > index 308c1ef42c..100df52a71 100755
> > --- a/t/t9300-fast-import.sh
> > +++ b/t/t9300-fast-import.sh
> > @@ -1629,7 +1629,7 @@ test_expect_success 'O: blank lines not necessary after other commands' '
> >  	INPUT_END
> >
> >  	git fast-import <input &&
> > -	test 8 = $(find .git/objects/pack -type f | grep -v multi-pack-index | wc -l) &&
> > +	test 8 = $(find .git/objects/pack -type f \( -name "*.idx" -o -name "*.pack" \) | wc -l) &&
>
> Yay, there the existing multi-pack-index case is amended in a
> future-proof way :)

Here's actually a spot where I'm unhappy with the resulting complexity,
and I think that it would be much cleaner if we did the same
packlist+idxlist thing and then checked the line count of each.

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH v2 0/8] pack-revindex: introduce on-disk '.rev' format
  2021-01-08 18:19 [PATCH 0/8] pack-revindex: introduce on-disk '.rev' format Taylor Blau
                   ` (7 preceding siblings ...)
  2021-01-08 18:20 ` [PATCH 8/8] pack-revindex: ensure that on-disk reverse indexes are given precedence Taylor Blau
@ 2021-01-13 22:28 ` Taylor Blau
  2021-01-13 22:28   ` [PATCH v2 1/8] packfile: prepare for the existence of '*.rev' files Taylor Blau
                     ` (7 more replies)
  8 siblings, 8 replies; 32+ messages in thread
From: Taylor Blau @ 2021-01-13 22:28 UTC (permalink / raw)
  To: git; +Cc: dstolee, gitster, jrnieder, peff

Hi,

This is the second of two series to implement support for an on-disk format for
storing the reverse index. Note that this depends on the patches in the previous
series [1], which was recently updated).

This version is largely unchanged from the original, with the following
exceptions:

  - It has been rebased onto the patches in the first series.
  - The operands of two comparisons in 'offset_to_pack_pos()' were swapped so
    that the smaller of the two appears on the left-hand side of the comparison.
  - A brown-paper-bag bug was fixed in tests so that they pass on Windows (last
    night's integration broke 'seen' on Windows).
  - The GIT_TEST_WRITE_REV_INDEX mode was enabled in the "all-features" test.

Thanks in advance for your review.

[1]: https://lore.kernel.org/git/cover.1610129796.git.me@ttaylorr.com/

Taylor Blau (8):
  packfile: prepare for the existence of '*.rev' files
  pack-write.c: prepare to write 'pack-*.rev' files
  builtin/index-pack.c: write reverse indexes
  builtin/pack-objects.c: respect 'pack.writeReverseIndex'
  Documentation/config/pack.txt: advertise 'pack.writeReverseIndex'
  t: prepare for GIT_TEST_WRITE_REV_INDEX
  t: support GIT_TEST_WRITE_REV_INDEX
  pack-revindex: ensure that on-disk reverse indexes are given
    precedence

 Documentation/config/pack.txt           |   7 ++
 Documentation/git-index-pack.txt        |  20 ++--
 Documentation/technical/pack-format.txt |  17 ++++
 builtin/index-pack.c                    |  67 +++++++++++--
 builtin/pack-objects.c                  |   9 ++
 builtin/repack.c                        |   1 +
 ci/run-build-and-tests.sh               |   1 +
 object-store.h                          |   3 +
 pack-revindex.c                         | 116 ++++++++++++++++++++--
 pack-revindex.h                         |  10 +-
 pack-write.c                            | 123 +++++++++++++++++++++++-
 pack.h                                  |   4 +
 packfile.c                              |  13 ++-
 packfile.h                              |   1 +
 t/README                                |   3 +
 t/t5319-multi-pack-index.sh             |   5 +-
 t/t5325-reverse-index.sh                |  97 +++++++++++++++++++
 t/t5604-clone-reference.sh              |   2 +-
 t/t5702-protocol-v2.sh                  |  12 ++-
 t/t6500-gc.sh                           |   6 +-
 t/t9300-fast-import.sh                  |   5 +-
 tmp-objdir.c                            |   4 +-
 22 files changed, 485 insertions(+), 41 deletions(-)
 create mode 100755 t/t5325-reverse-index.sh

 -:  ---------- >  1:  e1aa89244a pack-revindex: introduce a new API
 -:  ---------- >  2:  0fca7d5812 write_reuse_object(): convert to new revindex API
 -:  ---------- >  3:  7676822a54 write_reused_pack_one(): convert to new revindex API
 -:  ---------- >  4:  dd7133fdb7 write_reused_pack_verbatim(): convert to new revindex API
 -:  ---------- >  5:  8e93ca3886 check_object(): convert to new revindex API
 -:  ---------- >  6:  084bbf2145 bitmap_position_packfile(): convert to new revindex API
 -:  ---------- >  7:  68794e9484 show_objects_for_type(): convert to new revindex API
 -:  ---------- >  8:  31ac6f5703 get_size_by_pos(): convert to new revindex API
 -:  ---------- >  9:  acd80069a2 try_partial_reuse(): convert to new revindex API
 -:  ---------- > 10:  569acdca7f rebuild_existing_bitmaps(): convert to new revindex API
 -:  ---------- > 11:  9881637724 get_delta_base_oid(): convert to new revindex API
 -:  ---------- > 12:  df8bb571a5 retry_bad_packed_offset(): convert to new revindex API
 -:  ---------- > 13:  41b2e00947 packed_object_info(): convert to new revindex API
 -:  ---------- > 14:  8ad49d231f unpack_entry(): convert to new revindex API
 -:  ---------- > 15:  e757476351 for_each_object_in_pack(): convert to new revindex API
 -:  ---------- > 16:  a500311e33 builtin/gc.c: guess the size of the revindex
 -:  ---------- > 17:  67d14da04a pack-revindex: remove unused 'find_pack_revindex()'
 -:  ---------- > 18:  3b5c92be68 pack-revindex: remove unused 'find_revindex_position()'
 -:  ---------- > 19:  cabafce4a1 pack-revindex: hide the definition of 'revindex_entry'
 -:  ---------- > 20:  8400ff6c96 pack-revindex.c: avoid direct revindex access in 'offset_to_pack_pos()'
 1:  ddf47a0a50 ! 21:  6742c15c84 packfile: prepare for the existence of '*.rev' files
    @@ pack-revindex.c: static void create_pack_revindex(struct packed_git *p)
     +
      int offset_to_pack_pos(struct packed_git *p, off_t ofs, uint32_t *pos)
      {
    - 	int lo = 0;
    + 	unsigned lo, hi;
     @@ pack-revindex.c: int offset_to_pack_pos(struct packed_git *p, off_t ofs, uint32_t *pos)

      uint32_t pack_pos_to_index(struct packed_git *p, uint32_t pos)
    @@ pack-revindex.c: int offset_to_pack_pos(struct packed_git *p, off_t ofs, uint32_
     -	if (!p->revindex)
     +	if (!(p->revindex || p->revindex_data))
      		BUG("pack_pos_to_index: reverse index not yet loaded");
    - 	if (pos >= p->num_objects)
    + 	if (p->num_objects <= pos)
      		BUG("pack_pos_to_index: out-of-bounds object at %"PRIu32, pos);
     -	return p->revindex[pos].nr;
     +
    @@ pack-revindex.c: int offset_to_pack_pos(struct packed_git *p, off_t ofs, uint32_
     -	if (!p->revindex)
     +	if (!(p->revindex || p->revindex_data))
      		BUG("pack_pos_to_index: reverse index not yet loaded");
    - 	if (pos > p->num_objects)
    + 	if (p->num_objects < pos)
      		BUG("pack_pos_to_offset: out-of-bounds object at %"PRIu32, pos);
     -	return p->revindex[pos].offset;
     +
    @@ pack-revindex.c: int offset_to_pack_pos(struct packed_git *p, off_t ofs, uint32_
     +		return nth_packed_object_offset(p, pack_pos_to_index(p, pos));
      }

    + ## pack-revindex.h ##
    +@@ pack-revindex.h: struct packed_git;
    + /*
    +  * load_pack_revindex populates the revindex's internal data-structures for the
    +  * given pack, returning zero on success and a negative value otherwise.
    ++ *
    ++ * If a '.rev' file is present, it is checked for consistency, mmap'd, and
    ++ * pointers are assigned into it (instead of using the in-memory variant).
    +  */
    + int load_pack_revindex(struct packed_git *p);
    +
    +@@ pack-revindex.h: uint32_t pack_pos_to_index(struct packed_git *p, uint32_t pos);
    +  * If the reverse index has not yet been loaded, or the position is out of
    +  * bounds, this function aborts.
    +  *
    +- * This function runs in constant time.
    ++ * This function runs in constant time under both in-memory and on-disk reverse
    ++ * indexes, but an additional step is taken to consult the corresponding .idx
    ++ * file when using the on-disk format.
    +  */
    + off_t pack_pos_to_offset(struct packed_git *p, uint32_t pos);
    +
    +
      ## packfile.c ##
     @@ packfile.c: void close_pack_index(struct packed_git *p)
      	}
 2:  88393e2662 = 22:  8648c87fa7 pack-write.c: prepare to write 'pack-*.rev' files
 3:  b0a7329824 ! 23:  5b18ada611 builtin/index-pack.c: write reverse indexes
    @@ t/t5325-reverse-index.sh (new)
     +	rm -f $rev &&
     +	conf=$1 &&
     +	shift &&
    ++	# remove the index since Windows won't overwrite an existing file
    ++	rm $packdir/pack-$pack.idx &&
     +	git -c pack.writeReverseIndex=$conf index-pack "$@" \
     +		$packdir/pack-$pack.pack
     +}
 4:  e297a31875 = 24:  68bde3ea97 builtin/pack-objects.c: respect 'pack.writeReverseIndex'
 5:  5d3e96a498 = 25:  38a253d0ce Documentation/config/pack.txt: advertise 'pack.writeReverseIndex'
 6:  2288571fbe <  -:  ---------- t: prepare for GIT_TEST_WRITE_REV_INDEX
 -:  ---------- > 26:  12cdf2d67a t: prepare for GIT_TEST_WRITE_REV_INDEX
 7:  3525c4d114 ! 27:  6b647d9775 t: support GIT_TEST_WRITE_REV_INDEX
    @@ Commit message

         Add a new option that unconditionally enables the pack.writeReverseIndex
         setting in order to run the whole test suite in a mode that generates
    -    on-disk reverse indexes.
    +    on-disk reverse indexes. Additionally, enable this mode in the second
    +    run of tests under linux-gcc in 'ci/run-build-and-tests.sh'.

         Once on-disk reverse indexes are proven out over several releases, we
         can change the default value of that configuration to 'true', and drop
    @@ builtin/pack-objects.c: int cmd_pack_objects(int argc, const char **argv, const
      	progress = isatty(2);
      	argc = parse_options(argc, argv, prefix, pack_objects_options,

    + ## ci/run-build-and-tests.sh ##
    +@@ ci/run-build-and-tests.sh: linux-gcc)
    + 	export GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS=1
    + 	export GIT_TEST_MULTI_PACK_INDEX=1
    + 	export GIT_TEST_ADD_I_USE_BUILTIN=1
    ++	export GIT_TEST_WRITE_REV_INDEX=1
    + 	make test
    + 	;;
    + linux-clang)
    +
      ## pack-revindex.h ##
     @@
    - #ifndef PACK_REVINDEX_H
    - #define PACK_REVINDEX_H
    +  *   can be found
    +  */

     +#define GIT_TEST_WRITE_REV_INDEX "GIT_TEST_WRITE_REV_INDEX"
     +
      struct packed_git;

    - int load_pack_revindex(struct packed_git *p);
    + /*

      ## t/README ##
     @@ t/README: GIT_TEST_DEFAULT_HASH=<hash-algo> specifies which hash algorithm to
 8:  6e580d43d1 ! 28:  48926ae182 pack-revindex: ensure that on-disk reverse indexes are given precedence
    @@ pack-revindex.c: static void create_pack_revindex(struct packed_git *p)

      ## pack-revindex.h ##
     @@
    - #define PACK_REVINDEX_H
    +  */

      #define GIT_TEST_WRITE_REV_INDEX "GIT_TEST_WRITE_REV_INDEX"
     +#define GIT_TEST_REV_INDEX_DIE_IN_MEMORY "GIT_TEST_REV_INDEX_DIE_IN_MEMORY"
    @@ t/t5325-reverse-index.sh: test_expect_success 'pack-objects respects pack.writeR
      '

     +test_expect_success 'reverse index is not generated when available on disk' '
    -+	git index-pack --rev-index $packdir/pack-$pack.pack &&
    ++	test_index_pack true &&
    ++	test_path_is_file $rev &&
     +
     +	git rev-parse HEAD >tip &&
     +	GIT_TEST_REV_INDEX_DIE_IN_MEMORY=1 git cat-file \
--
2.30.0.138.g6d7191ea01

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH v2 1/8] packfile: prepare for the existence of '*.rev' files
  2021-01-13 22:28 ` [PATCH v2 0/8] pack-revindex: introduce on-disk '.rev' format Taylor Blau
@ 2021-01-13 22:28   ` Taylor Blau
  2021-01-14  7:22     ` Junio C Hamano
  2021-01-14  7:26     ` Junio C Hamano
  2021-01-13 22:28   ` [PATCH v2 2/8] pack-write.c: prepare to write 'pack-*.rev' files Taylor Blau
                     ` (6 subsequent siblings)
  7 siblings, 2 replies; 32+ messages in thread
From: Taylor Blau @ 2021-01-13 22:28 UTC (permalink / raw)
  To: git; +Cc: dstolee, gitster, jrnieder, peff

Specify the format of the on-disk reverse index 'pack-*.rev' file, as
well as prepare the code for the existence of such files.

The reverse index maps from pack relative positions (i.e., an index into
the array of object which is sorted by their offsets within the
packfile) to their position within the 'pack-*.idx' file. Today, this is
done by building up a list of (off_t, uint32_t) tuples for each object
(the off_t corresponding to that object's offset, and the uint32_t
corresponding to its position in the index). To convert between pack and
index position quickly, this array of tuples is radix sorted based on
its offset.

This has two major drawbacks:

First, the in-memory cost scales linearly with the number of objects in
a pack.  Each 'struct revindex_entry' is sizeof(off_t) +
sizeof(uint32_t) + padding bytes for a total of 16.

To observe this, force Git to load the reverse index by, for e.g.,
running 'git cat-file --batch-check="%(objectsize:disk)"'. When asking
for a single object in a fresh clone of the kernel, Git needs to
allocate 120+ MB of memory in order to hold the reverse index in memory.

Second, the cost to sort also scales with the size of the pack.
Luckily, this is a linear function since 'load_pack_revindex()' uses a
radix sort, but this cost still must be paid once per pack per process.

As an example, it takes ~60x longer to print the _size_ of an object as
it does to print that entire object's _contents_:

  Benchmark #1: git.compile cat-file --batch <obj
    Time (mean ± σ):       3.4 ms ±   0.1 ms    [User: 3.3 ms, System: 2.1 ms]
    Range (min … max):     3.2 ms …   3.7 ms    726 runs

  Benchmark #2: git.compile cat-file --batch-check="%(objectsize:disk)" <obj
    Time (mean ± σ):     210.3 ms ±   8.9 ms    [User: 188.2 ms, System: 23.2 ms]
    Range (min … max):   193.7 ms … 224.4 ms    13 runs

Instead, avoid computing and sorting the revindex once per process by
writing it to a file when the pack itself is generated.

The format is relatively straightforward. It contains an array of
uint32_t's, the length of which is equal to the number of objects in the
pack.  The ith entry in this table contains the index position of the
ith object in the pack, where "ith object in the pack" is determined by
pack offset.

One thing that the on-disk format does _not_ contain is the full (up to)
eight-byte offset corresponding to each object. This is something that
the in-memory revindex contains (it stores an off_t in 'struct
revindex_entry' along with the same uint32_t that the on-disk format
has). Omit it in the on-disk format, since knowing the index position
for some object is sufficient to get a constant-time lookup in the
pack-*.idx file to ask for an object's offset within the pack.

This trades off between the on-disk size of the 'pack-*.rev' file for
runtime to chase down the offset for some object. Even though the lookup
is constant time, the constant is heavier, since it can potentially
involve two pointer walks in v2 indexes (one to access the 4-byte offset
table, and potentially a second to access the double wide offset table).

Consider trying to map an object's pack offset to a relative position
within that pack. In a cold-cache scenario, more page faults occur while
switching between binary searching through the reverse index and
searching through the *.idx file for an object's offset. Sure enough,
with a cold cache (writing '3' into '/proc/sys/vm/drop_caches' after
'sync'ing), printing out the entire object's contents is still
marginally faster than printing its size:

  Benchmark #1: git.compile cat-file --batch-check="%(objectsize:disk)" <obj >/dev/null
    Time (mean ± σ):      22.6 ms ±   0.5 ms    [User: 2.4 ms, System: 7.9 ms]
    Range (min … max):    21.4 ms …  23.5 ms    41 runs

  Benchmark #2: git.compile cat-file --batch <obj >/dev/null
    Time (mean ± σ):      17.2 ms ±   0.7 ms    [User: 2.8 ms, System: 5.5 ms]
    Range (min … max):    15.6 ms …  18.2 ms    45 runs

(Numbers taken in the kernel after cheating and using the next patch to
generate a reverse index). There are a couple of approaches to improve
cold cache performance not pursued here:

  - We could include the object offsets in the reverse index format.
    Predictably, this does result in fewer page faults, but it triples
    the size of the file, while simultaneously duplicating a ton of data
    already available in the .idx file. (This was the original way I
    implemented the format, and it did show
    `--batch-check='%(objectsize:disk)'` winning out against `--batch`.)

    On the other hand, this increase in size also results in a large
    block-cache footprint, which could potentially hurt other workloads.

  - We could store the mapping from pack to index position in more
    cache-friendly way, like constructing a binary search tree from the
    table and writing the values in breadth-first order. This would
    result in much better locality, but the price you pay is trading
    O(1) lookup in 'pack_pos_to_index()' for an O(log n) one (since you
    can no longer directly index the table).

So, neither of these approaches are taken here. (Thankfully, the format
is versioned, so we are free to pursue these in the future.) But, cold
cache performance likely isn't interesting outside of one-off cases like
asking for the size of an object directly. In real-world usage, Git is
often performing many operations in the revindex,

The trade-off is worth it, since we will avoid the vast majority of the
cost of generating the revindex that the extra pointer chase will look
like noise in the following patch's benchmarks.

This patch describes the format and prepares callers (like in
pack-revindex.c) to be able to read *.rev files once they exist. An
implementation of the writer will appear in the next patch, and callers
will gradually begin to start using the writer in the patches that
follow after that.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 Documentation/technical/pack-format.txt |  17 ++++
 builtin/repack.c                        |   1 +
 object-store.h                          |   3 +
 pack-revindex.c                         | 112 +++++++++++++++++++++---
 pack-revindex.h                         |   7 +-
 packfile.c                              |  13 ++-
 packfile.h                              |   1 +
 tmp-objdir.c                            |   4 +-
 8 files changed, 145 insertions(+), 13 deletions(-)

diff --git a/Documentation/technical/pack-format.txt b/Documentation/technical/pack-format.txt
index f96b2e605f..9593f8bc68 100644
--- a/Documentation/technical/pack-format.txt
+++ b/Documentation/technical/pack-format.txt
@@ -259,6 +259,23 @@ Pack file entry: <+
 
     Index checksum of all of the above.
 
+== pack-*.rev files have the format:
+
+  - A 4-byte magic number '0x52494458' ('RIDX').
+
+  - A 4-byte version identifier (= 1)
+
+  - A 4-byte hash function identifier (= 1 for SHA-1, 2 for SHA-256)
+
+  - A table of index positions, sorted by their corresponding offsets in the
+    packfile.
+
+  - A trailer, containing a:
+
+    checksum of the corresponding packfile, and
+
+    a checksum of all of the above.
+
 == multi-pack-index (MIDX) files have the following format:
 
 The multi-pack-index files refer to multiple pack-files and loose objects.
diff --git a/builtin/repack.c b/builtin/repack.c
index 279be11a16..8d643ddcb9 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -208,6 +208,7 @@ static struct {
 } exts[] = {
 	{".pack"},
 	{".idx"},
+	{".rev", 1},
 	{".bitmap", 1},
 	{".promisor", 1},
 };
diff --git a/object-store.h b/object-store.h
index c4fc9dd74e..3fbf11280f 100644
--- a/object-store.h
+++ b/object-store.h
@@ -85,6 +85,9 @@ struct packed_git {
 		 multi_pack_index:1;
 	unsigned char hash[GIT_MAX_RAWSZ];
 	struct revindex_entry *revindex;
+	const void *revindex_data;
+	const void *revindex_map;
+	size_t revindex_size;
 	/* something like ".git/objects/pack/xxxxx.pack" */
 	char pack_name[FLEX_ARRAY]; /* more */
 };
diff --git a/pack-revindex.c b/pack-revindex.c
index 5e69bc7372..369812dd21 100644
--- a/pack-revindex.c
+++ b/pack-revindex.c
@@ -164,16 +164,98 @@ static void create_pack_revindex(struct packed_git *p)
 	sort_revindex(p->revindex, num_ent, p->pack_size);
 }
 
-int load_pack_revindex(struct packed_git *p)
+static int load_pack_revindex_from_memory(struct packed_git *p)
 {
-	if (!p->revindex) {
-		if (open_pack_index(p))
-			return -1;
-		create_pack_revindex(p);
-	}
+	if (open_pack_index(p))
+		return -1;
+	create_pack_revindex(p);
 	return 0;
 }
 
+static char *pack_revindex_filename(struct packed_git *p)
+{
+	size_t len;
+	if (!strip_suffix(p->pack_name, ".pack", &len))
+		BUG("pack_name does not end in .pack");
+	return xstrfmt("%.*s.rev", (int)len, p->pack_name);
+}
+
+#define RIDX_MIN_SIZE (12 + (2 * the_hash_algo->rawsz))
+
+static int load_revindex_from_disk(char *revindex_name,
+				   uint32_t num_objects,
+				   const void **data, size_t *len)
+{
+	int fd, ret = 0;
+	struct stat st;
+	size_t revindex_size;
+
+	fd = git_open(revindex_name);
+
+	if (fd < 0) {
+		ret = -1;
+		goto cleanup;
+	}
+	if (fstat(fd, &st)) {
+		ret = error_errno(_("failed to read %s"), revindex_name);
+		goto cleanup;
+	}
+
+	revindex_size = xsize_t(st.st_size);
+
+	if (revindex_size < RIDX_MIN_SIZE) {
+		ret = error(_("reverse-index file %s is too small"), revindex_name);
+		goto cleanup;
+	}
+
+	if (revindex_size - RIDX_MIN_SIZE != st_mult(sizeof(uint32_t), num_objects)) {
+		ret = error(_("reverse-index file %s is corrupt"), revindex_name);
+		goto cleanup;
+	}
+
+	*len = revindex_size;
+	*data = xmmap(NULL, revindex_size, PROT_READ, MAP_PRIVATE, fd, 0);
+
+cleanup:
+	close(fd);
+	return ret;
+}
+
+static int load_pack_revindex_from_disk(struct packed_git *p)
+{
+	char *revindex_name;
+	int ret;
+	if (open_pack_index(p))
+		return -1;
+
+	revindex_name = pack_revindex_filename(p);
+
+	ret = load_revindex_from_disk(revindex_name,
+				      p->num_objects,
+				      &p->revindex_map,
+				      &p->revindex_size);
+	if (ret)
+		goto cleanup;
+
+	p->revindex_data = (char *)p->revindex_map + 12;
+
+cleanup:
+	free(revindex_name);
+	return ret;
+}
+
+int load_pack_revindex(struct packed_git *p)
+{
+	if (p->revindex || p->revindex_data)
+		return 0;
+
+	if (!load_pack_revindex_from_disk(p))
+		return 0;
+	else if (!load_pack_revindex_from_memory(p))
+		return 0;
+	return -1;
+}
+
 int offset_to_pack_pos(struct packed_git *p, off_t ofs, uint32_t *pos)
 {
 	unsigned lo, hi;
@@ -203,18 +285,28 @@ int offset_to_pack_pos(struct packed_git *p, off_t ofs, uint32_t *pos)
 
 uint32_t pack_pos_to_index(struct packed_git *p, uint32_t pos)
 {
-	if (!p->revindex)
+	if (!(p->revindex || p->revindex_data))
 		BUG("pack_pos_to_index: reverse index not yet loaded");
 	if (p->num_objects <= pos)
 		BUG("pack_pos_to_index: out-of-bounds object at %"PRIu32, pos);
-	return p->revindex[pos].nr;
+
+	if (p->revindex)
+		return p->revindex[pos].nr;
+	else
+		return get_be32((char *)p->revindex_data + (pos * sizeof(uint32_t)));
 }
 
 off_t pack_pos_to_offset(struct packed_git *p, uint32_t pos)
 {
-	if (!p->revindex)
+	if (!(p->revindex || p->revindex_data))
 		BUG("pack_pos_to_index: reverse index not yet loaded");
 	if (p->num_objects < pos)
 		BUG("pack_pos_to_offset: out-of-bounds object at %"PRIu32, pos);
-	return p->revindex[pos].offset;
+
+	if (p->revindex)
+		return p->revindex[pos].offset;
+	else if (pos == p->num_objects)
+		return p->pack_size - the_hash_algo->rawsz;
+	else
+		return nth_packed_object_offset(p, pack_pos_to_index(p, pos));
 }
diff --git a/pack-revindex.h b/pack-revindex.h
index 6e0320b08b..01622cf21a 100644
--- a/pack-revindex.h
+++ b/pack-revindex.h
@@ -21,6 +21,9 @@ struct packed_git;
 /*
  * load_pack_revindex populates the revindex's internal data-structures for the
  * given pack, returning zero on success and a negative value otherwise.
+ *
+ * If a '.rev' file is present, it is checked for consistency, mmap'd, and
+ * pointers are assigned into it (instead of using the in-memory variant).
  */
 int load_pack_revindex(struct packed_git *p);
 
@@ -55,7 +58,9 @@ uint32_t pack_pos_to_index(struct packed_git *p, uint32_t pos);
  * If the reverse index has not yet been loaded, or the position is out of
  * bounds, this function aborts.
  *
- * This function runs in constant time.
+ * This function runs in constant time under both in-memory and on-disk reverse
+ * indexes, but an additional step is taken to consult the corresponding .idx
+ * file when using the on-disk format.
  */
 off_t pack_pos_to_offset(struct packed_git *p, uint32_t pos);
 
diff --git a/packfile.c b/packfile.c
index 7bb1750934..b04eac9286 100644
--- a/packfile.c
+++ b/packfile.c
@@ -324,11 +324,21 @@ void close_pack_index(struct packed_git *p)
 	}
 }
 
+void close_pack_revindex(struct packed_git *p) {
+	if (!p->revindex_map)
+		return;
+
+	munmap((void *)p->revindex_map, p->revindex_size);
+	p->revindex_map = NULL;
+	p->revindex_data = NULL;
+}
+
 void close_pack(struct packed_git *p)
 {
 	close_pack_windows(p);
 	close_pack_fd(p);
 	close_pack_index(p);
+	close_pack_revindex(p);
 }
 
 void close_object_store(struct raw_object_store *o)
@@ -351,7 +361,7 @@ void close_object_store(struct raw_object_store *o)
 
 void unlink_pack_path(const char *pack_name, int force_delete)
 {
-	static const char *exts[] = {".pack", ".idx", ".keep", ".bitmap", ".promisor"};
+	static const char *exts[] = {".pack", ".idx", ".rev", ".keep", ".bitmap", ".promisor"};
 	int i;
 	struct strbuf buf = STRBUF_INIT;
 	size_t plen;
@@ -853,6 +863,7 @@ static void prepare_pack(const char *full_name, size_t full_name_len,
 	if (!strcmp(file_name, "multi-pack-index"))
 		return;
 	if (ends_with(file_name, ".idx") ||
+	    ends_with(file_name, ".rev") ||
 	    ends_with(file_name, ".pack") ||
 	    ends_with(file_name, ".bitmap") ||
 	    ends_with(file_name, ".keep") ||
diff --git a/packfile.h b/packfile.h
index a58fc738e0..4cfec9e8d3 100644
--- a/packfile.h
+++ b/packfile.h
@@ -90,6 +90,7 @@ uint32_t get_pack_fanout(struct packed_git *p, uint32_t value);
 
 unsigned char *use_pack(struct packed_git *, struct pack_window **, off_t, unsigned long *);
 void close_pack_windows(struct packed_git *);
+void close_pack_revindex(struct packed_git *);
 void close_pack(struct packed_git *);
 void close_object_store(struct raw_object_store *o);
 void unuse_pack(struct pack_window **);
diff --git a/tmp-objdir.c b/tmp-objdir.c
index 42ed4db5d3..da414df14f 100644
--- a/tmp-objdir.c
+++ b/tmp-objdir.c
@@ -187,7 +187,9 @@ static int pack_copy_priority(const char *name)
 		return 2;
 	if (ends_with(name, ".idx"))
 		return 3;
-	return 4;
+	if (ends_with(name, ".rev"))
+		return 4;
+	return 5;
 }
 
 static int pack_copy_cmp(const char *a, const char *b)
-- 
2.30.0.138.g6d7191ea01


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH v2 2/8] pack-write.c: prepare to write 'pack-*.rev' files
  2021-01-13 22:28 ` [PATCH v2 0/8] pack-revindex: introduce on-disk '.rev' format Taylor Blau
  2021-01-13 22:28   ` [PATCH v2 1/8] packfile: prepare for the existence of '*.rev' files Taylor Blau
@ 2021-01-13 22:28   ` Taylor Blau
  2021-01-13 22:28   ` [PATCH v2 3/8] builtin/index-pack.c: write reverse indexes Taylor Blau
                     ` (5 subsequent siblings)
  7 siblings, 0 replies; 32+ messages in thread
From: Taylor Blau @ 2021-01-13 22:28 UTC (permalink / raw)
  To: git; +Cc: dstolee, gitster, jrnieder, peff

This patch prepares for callers to be able to write reverse index files
to disk.

It adds the necessary machinery to write a format-compliant .rev file
from within 'write_rev_file()', which is called from
'finish_tmp_packfile()'.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 pack-write.c | 123 ++++++++++++++++++++++++++++++++++++++++++++++++++-
 pack.h       |   4 ++
 2 files changed, 126 insertions(+), 1 deletion(-)

diff --git a/pack-write.c b/pack-write.c
index 3513665e1e..68db5a9edf 100644
--- a/pack-write.c
+++ b/pack-write.c
@@ -166,6 +166,116 @@ const char *write_idx_file(const char *index_name, struct pack_idx_entry **objec
 	return index_name;
 }
 
+static int pack_order_cmp(const void *va, const void *vb, void *ctx)
+{
+	struct pack_idx_entry **objects = ctx;
+
+	off_t oa = objects[*(uint32_t*)va]->offset;
+	off_t ob = objects[*(uint32_t*)vb]->offset;
+
+	if (oa < ob)
+		return -1;
+	if (oa > ob)
+		return 1;
+	return 0;
+}
+
+#define RIDX_SIGNATURE 0x52494458 /* "RIDX" */
+#define RIDX_VERSION 1
+
+static void write_rev_header(struct hashfile *f)
+{
+	uint32_t oid_version;
+	switch (hash_algo_by_ptr(the_hash_algo)) {
+	case GIT_HASH_SHA1:
+		oid_version = 1;
+		break;
+	case GIT_HASH_SHA256:
+		oid_version = 2;
+		break;
+	default:
+		die("write_rev_header: unknown hash version");
+	}
+
+	hashwrite_be32(f, RIDX_SIGNATURE);
+	hashwrite_be32(f, RIDX_VERSION);
+	hashwrite_be32(f, oid_version);
+}
+
+static void write_rev_index_positions(struct hashfile *f,
+				      struct pack_idx_entry **objects,
+				      uint32_t nr_objects)
+{
+	uint32_t *pack_order;
+	uint32_t i;
+
+	ALLOC_ARRAY(pack_order, nr_objects);
+	for (i = 0; i < nr_objects; i++)
+		pack_order[i] = i;
+	QSORT_S(pack_order, nr_objects, pack_order_cmp, objects);
+
+	for (i = 0; i < nr_objects; i++)
+		hashwrite_be32(f, pack_order[i]);
+
+	free(pack_order);
+}
+
+static void write_rev_trailer(struct hashfile *f, const unsigned char *hash)
+{
+	hashwrite(f, hash, the_hash_algo->rawsz);
+}
+
+const char *write_rev_file(const char *rev_name,
+			   struct pack_idx_entry **objects,
+			   uint32_t nr_objects,
+			   const unsigned char *hash,
+			   unsigned flags)
+{
+	struct hashfile *f;
+	int fd;
+
+	if ((flags & WRITE_REV) && (flags & WRITE_REV_VERIFY))
+		die(_("cannot both write and verify reverse index"));
+
+	if (flags & WRITE_REV) {
+		if (!rev_name) {
+			struct strbuf tmp_file = STRBUF_INIT;
+			fd = odb_mkstemp(&tmp_file, "pack/tmp_rev_XXXXXX");
+			rev_name = strbuf_detach(&tmp_file, NULL);
+		} else {
+			unlink(rev_name);
+			fd = open(rev_name, O_CREAT|O_EXCL|O_WRONLY, 0600);
+			if (fd < 0)
+				die_errno("unable to create '%s'", rev_name);
+		}
+		f = hashfd(fd, rev_name);
+	} else if (flags & WRITE_REV_VERIFY) {
+		struct stat statbuf;
+		if (stat(rev_name, &statbuf)) {
+			if (errno == ENOENT) {
+				/* .rev files are optional */
+				return NULL;
+			} else
+				die_errno(_("could not stat: %s"), rev_name);
+		}
+		f = hashfd_check(rev_name);
+	} else
+		return NULL;
+
+	write_rev_header(f);
+
+	write_rev_index_positions(f, objects, nr_objects);
+	write_rev_trailer(f, hash);
+
+	if (rev_name && adjust_shared_perm(rev_name) < 0)
+		die(_("failed to make %s readable"), rev_name);
+
+	finalize_hashfile(f, NULL, CSUM_HASH_IN_STREAM | CSUM_CLOSE |
+				    ((flags & WRITE_IDX_VERIFY) ? 0 : CSUM_FSYNC));
+
+	return rev_name;
+}
+
 off_t write_pack_header(struct hashfile *f, uint32_t nr_entries)
 {
 	struct pack_header hdr;
@@ -341,7 +451,7 @@ void finish_tmp_packfile(struct strbuf *name_buffer,
 			 struct pack_idx_option *pack_idx_opts,
 			 unsigned char hash[])
 {
-	const char *idx_tmp_name;
+	const char *idx_tmp_name, *rev_tmp_name = NULL;
 	int basename_len = name_buffer->len;
 
 	if (adjust_shared_perm(pack_tmp_name))
@@ -352,6 +462,9 @@ void finish_tmp_packfile(struct strbuf *name_buffer,
 	if (adjust_shared_perm(idx_tmp_name))
 		die_errno("unable to make temporary index file readable");
 
+	rev_tmp_name = write_rev_file(NULL, written_list, nr_written, hash,
+				      pack_idx_opts->flags);
+
 	strbuf_addf(name_buffer, "%s.pack", hash_to_hex(hash));
 
 	if (rename(pack_tmp_name, name_buffer->buf))
@@ -365,5 +478,13 @@ void finish_tmp_packfile(struct strbuf *name_buffer,
 
 	strbuf_setlen(name_buffer, basename_len);
 
+	if (rev_tmp_name) {
+		strbuf_addf(name_buffer, "%s.rev", hash_to_hex(hash));
+		if (rename(rev_tmp_name, name_buffer->buf))
+			die_errno("unable to rename temporary reverse-index file");
+	}
+
+	strbuf_setlen(name_buffer, basename_len);
+
 	free((void *)idx_tmp_name);
 }
diff --git a/pack.h b/pack.h
index 9fc0945ac9..30439e0784 100644
--- a/pack.h
+++ b/pack.h
@@ -42,6 +42,8 @@ struct pack_idx_option {
 	/* flag bits */
 #define WRITE_IDX_VERIFY 01 /* verify only, do not write the idx file */
 #define WRITE_IDX_STRICT 02
+#define WRITE_REV 04
+#define WRITE_REV_VERIFY 010
 
 	uint32_t version;
 	uint32_t off32_limit;
@@ -87,6 +89,8 @@ off_t write_pack_header(struct hashfile *f, uint32_t);
 void fixup_pack_header_footer(int, unsigned char *, const char *, uint32_t, unsigned char *, off_t);
 char *index_pack_lockfile(int fd);
 
+const char *write_rev_file(const char *rev_name, struct pack_idx_entry **objects, uint32_t nr_objects, const unsigned char *hash, unsigned flags);
+
 /*
  * The "hdr" output buffer should be at least this big, which will handle sizes
  * up to 2^67.
-- 
2.30.0.138.g6d7191ea01


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH v2 3/8] builtin/index-pack.c: write reverse indexes
  2021-01-13 22:28 ` [PATCH v2 0/8] pack-revindex: introduce on-disk '.rev' format Taylor Blau
  2021-01-13 22:28   ` [PATCH v2 1/8] packfile: prepare for the existence of '*.rev' files Taylor Blau
  2021-01-13 22:28   ` [PATCH v2 2/8] pack-write.c: prepare to write 'pack-*.rev' files Taylor Blau
@ 2021-01-13 22:28   ` Taylor Blau
  2021-01-13 22:28   ` [PATCH v2 4/8] builtin/pack-objects.c: respect 'pack.writeReverseIndex' Taylor Blau
                     ` (4 subsequent siblings)
  7 siblings, 0 replies; 32+ messages in thread
From: Taylor Blau @ 2021-01-13 22:28 UTC (permalink / raw)
  To: git; +Cc: dstolee, gitster, jrnieder, peff

Teach 'git index-pack' to optionally write and verify reverse index with
'--[no-]rev-index', as well as respecting the 'pack.writeReverseIndex'
configuration option.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 Documentation/git-index-pack.txt | 20 ++++++---
 builtin/index-pack.c             | 64 +++++++++++++++++++++++-----
 t/t5325-reverse-index.sh         | 71 ++++++++++++++++++++++++++++++++
 3 files changed, 139 insertions(+), 16 deletions(-)
 create mode 100755 t/t5325-reverse-index.sh

diff --git a/Documentation/git-index-pack.txt b/Documentation/git-index-pack.txt
index af0c26232c..b65f380269 100644
--- a/Documentation/git-index-pack.txt
+++ b/Documentation/git-index-pack.txt
@@ -9,17 +9,18 @@ git-index-pack - Build pack index file for an existing packed archive
 SYNOPSIS
 --------
 [verse]
-'git index-pack' [-v] [-o <index-file>] <pack-file>
+'git index-pack' [-v] [-o <index-file>] [--[no-]rev-index] <pack-file>
 'git index-pack' --stdin [--fix-thin] [--keep] [-v] [-o <index-file>]
-                 [<pack-file>]
+		  [--[no-]rev-index] [<pack-file>]
 
 
 DESCRIPTION
 -----------
 Reads a packed archive (.pack) from the specified file, and
-builds a pack index file (.idx) for it.  The packed archive
-together with the pack index can then be placed in the
-objects/pack/ directory of a Git repository.
+builds a pack index file (.idx) for it. Optionally writes a
+reverse-index (.rev) for the specified pack. The packed
+archive together with the pack index can then be placed in
+the objects/pack/ directory of a Git repository.
 
 
 OPTIONS
@@ -33,7 +34,14 @@ OPTIONS
 	file is constructed from the name of packed archive
 	file by replacing .pack with .idx (and the program
 	fails if the name of packed archive does not end
-	with .pack).
+	with .pack). Incompatible with `--rev-index`.
+
+--[no-]rev-index::
+	When this flag is provided, generate a reverse index
+	(a `.rev` file) corresponding to the given pack. If
+	`--verify` is given, ensure that the existing
+	reverse index is correct. Takes precedence over
+	`pack.writeReverseIndex`.
 
 --stdin::
 	When this flag is provided, the pack is read from stdin
diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 4b8d86e0ad..03408250b1 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -17,7 +17,7 @@
 #include "promisor-remote.h"
 
 static const char index_pack_usage[] =
-"git index-pack [-v] [-o <index-file>] [--keep | --keep=<msg>] [--verify] [--strict] (<pack-file> | --stdin [--fix-thin] [<pack-file>])";
+"git index-pack [-v] [-o <index-file>] [--keep | --keep=<msg>] [--[no-]rev-index] [--verify] [--strict] (<pack-file> | --stdin [--fix-thin] [<pack-file>])";
 
 struct object_entry {
 	struct pack_idx_entry idx;
@@ -1436,13 +1436,13 @@ static void fix_unresolved_deltas(struct hashfile *f)
 	free(sorted_by_pos);
 }
 
-static const char *derive_filename(const char *pack_name, const char *suffix,
-				   struct strbuf *buf)
+static const char *derive_filename(const char *pack_name, const char *strip,
+				   const char *suffix, struct strbuf *buf)
 {
 	size_t len;
-	if (!strip_suffix(pack_name, ".pack", &len))
-		die(_("packfile name '%s' does not end with '.pack'"),
-		    pack_name);
+	if (!strip_suffix(pack_name, strip, &len))
+		die(_("packfile name '%s' does not end with '%s'"),
+		    pack_name, strip);
 	strbuf_add(buf, pack_name, len);
 	strbuf_addch(buf, '.');
 	strbuf_addstr(buf, suffix);
@@ -1459,7 +1459,7 @@ static void write_special_file(const char *suffix, const char *msg,
 	int msg_len = strlen(msg);
 
 	if (pack_name)
-		filename = derive_filename(pack_name, suffix, &name_buf);
+		filename = derive_filename(pack_name, ".pack", suffix, &name_buf);
 	else
 		filename = odb_pack_name(&name_buf, hash, suffix);
 
@@ -1484,12 +1484,14 @@ static void write_special_file(const char *suffix, const char *msg,
 
 static void final(const char *final_pack_name, const char *curr_pack_name,
 		  const char *final_index_name, const char *curr_index_name,
+		  const char *final_rev_index_name, const char *curr_rev_index_name,
 		  const char *keep_msg, const char *promisor_msg,
 		  unsigned char *hash)
 {
 	const char *report = "pack";
 	struct strbuf pack_name = STRBUF_INIT;
 	struct strbuf index_name = STRBUF_INIT;
+	struct strbuf rev_index_name = STRBUF_INIT;
 	int err;
 
 	if (!from_stdin) {
@@ -1524,6 +1526,16 @@ static void final(const char *final_pack_name, const char *curr_pack_name,
 	} else
 		chmod(final_index_name, 0444);
 
+	if (curr_rev_index_name) {
+		if (final_rev_index_name != curr_rev_index_name) {
+			if (!final_rev_index_name)
+				final_rev_index_name = odb_pack_name(&rev_index_name, hash, "rev");
+			if (finalize_object_file(curr_rev_index_name, final_rev_index_name))
+				die(_("cannot store reverse index file"));
+		} else
+			chmod(final_rev_index_name, 0444);
+	}
+
 	if (do_fsck_object) {
 		struct packed_git *p;
 		p = add_packed_git(final_index_name, strlen(final_index_name), 0);
@@ -1553,6 +1565,7 @@ static void final(const char *final_pack_name, const char *curr_pack_name,
 		}
 	}
 
+	strbuf_release(&rev_index_name);
 	strbuf_release(&index_name);
 	strbuf_release(&pack_name);
 }
@@ -1578,6 +1591,12 @@ static int git_index_pack_config(const char *k, const char *v, void *cb)
 		}
 		return 0;
 	}
+	if (!strcmp(k, "pack.writereverseindex")) {
+		if (git_config_bool(k, v))
+			opts->flags |= WRITE_REV;
+		else
+			opts->flags &= ~WRITE_REV;
+	}
 	return git_default_config(k, v, cb);
 }
 
@@ -1695,12 +1714,14 @@ static void show_pack_info(int stat_only)
 
 int cmd_index_pack(int argc, const char **argv, const char *prefix)
 {
-	int i, fix_thin_pack = 0, verify = 0, stat_only = 0;
+	int i, fix_thin_pack = 0, verify = 0, stat_only = 0, rev_index;
 	const char *curr_index;
-	const char *index_name = NULL, *pack_name = NULL;
+	const char *curr_rev_index = NULL;
+	const char *index_name = NULL, *pack_name = NULL, *rev_index_name = NULL;
 	const char *keep_msg = NULL;
 	const char *promisor_msg = NULL;
 	struct strbuf index_name_buf = STRBUF_INIT;
+	struct strbuf rev_index_name_buf = STRBUF_INIT;
 	struct pack_idx_entry **idx_objects;
 	struct pack_idx_option opts;
 	unsigned char pack_hash[GIT_MAX_RAWSZ];
@@ -1727,6 +1748,8 @@ int cmd_index_pack(int argc, const char **argv, const char *prefix)
 	if (prefix && chdir(prefix))
 		die(_("Cannot come back to cwd"));
 
+	rev_index = !!(opts.flags & (WRITE_REV_VERIFY | WRITE_REV));
+
 	for (i = 1; i < argc; i++) {
 		const char *arg = argv[i];
 
@@ -1805,6 +1828,10 @@ int cmd_index_pack(int argc, const char **argv, const char *prefix)
 				if (hash_algo == GIT_HASH_UNKNOWN)
 					die(_("unknown hash algorithm '%s'"), arg);
 				repo_set_hash_algo(the_repository, hash_algo);
+			} else if (!strcmp(arg, "--rev-index")) {
+				rev_index = 1;
+			} else if (!strcmp(arg, "--no-rev-index")) {
+				rev_index = 0;
 			} else
 				usage(index_pack_usage);
 			continue;
@@ -1824,7 +1851,16 @@ int cmd_index_pack(int argc, const char **argv, const char *prefix)
 	if (from_stdin && hash_algo)
 		die(_("--object-format cannot be used with --stdin"));
 	if (!index_name && pack_name)
-		index_name = derive_filename(pack_name, "idx", &index_name_buf);
+		index_name = derive_filename(pack_name, ".pack", "idx", &index_name_buf);
+
+	opts.flags &= ~(WRITE_REV | WRITE_REV_VERIFY);
+	if (rev_index) {
+		opts.flags |= verify ? WRITE_REV_VERIFY : WRITE_REV;
+		if (index_name)
+			rev_index_name = derive_filename(index_name,
+							 ".idx", "rev",
+							 &rev_index_name_buf);
+	}
 
 	if (verify) {
 		if (!index_name)
@@ -1878,11 +1914,16 @@ int cmd_index_pack(int argc, const char **argv, const char *prefix)
 	for (i = 0; i < nr_objects; i++)
 		idx_objects[i] = &objects[i].idx;
 	curr_index = write_idx_file(index_name, idx_objects, nr_objects, &opts, pack_hash);
+	if (rev_index)
+		curr_rev_index = write_rev_file(rev_index_name, idx_objects,
+						nr_objects, pack_hash,
+						opts.flags);
 	free(idx_objects);
 
 	if (!verify)
 		final(pack_name, curr_pack,
 		      index_name, curr_index,
+		      rev_index_name, curr_rev_index,
 		      keep_msg, promisor_msg,
 		      pack_hash);
 	else
@@ -1893,10 +1934,13 @@ int cmd_index_pack(int argc, const char **argv, const char *prefix)
 
 	free(objects);
 	strbuf_release(&index_name_buf);
+	strbuf_release(&rev_index_name_buf);
 	if (pack_name == NULL)
 		free((void *) curr_pack);
 	if (index_name == NULL)
 		free((void *) curr_index);
+	if (rev_index_name == NULL)
+		free((void *) curr_rev_index);
 
 	/*
 	 * Let the caller know this pack is not self contained
diff --git a/t/t5325-reverse-index.sh b/t/t5325-reverse-index.sh
new file mode 100755
index 0000000000..2dae213126
--- /dev/null
+++ b/t/t5325-reverse-index.sh
@@ -0,0 +1,71 @@
+#!/bin/sh
+
+test_description='on-disk reverse index'
+. ./test-lib.sh
+
+packdir=.git/objects/pack
+
+test_expect_success 'setup' '
+	test_commit base &&
+
+	pack=$(git pack-objects --all $packdir/pack) &&
+	rev=$packdir/pack-$pack.rev &&
+
+	test_path_is_missing $rev
+'
+
+test_index_pack () {
+	rm -f $rev &&
+	conf=$1 &&
+	shift &&
+	# remove the index since Windows won't overwrite an existing file
+	rm $packdir/pack-$pack.idx &&
+	git -c pack.writeReverseIndex=$conf index-pack "$@" \
+		$packdir/pack-$pack.pack
+}
+
+test_expect_success 'index-pack with pack.writeReverseIndex' '
+	test_index_pack "" &&
+	test_path_is_missing $rev &&
+
+	test_index_pack false &&
+	test_path_is_missing $rev &&
+
+	test_index_pack true &&
+	test_path_is_file $rev
+'
+
+test_expect_success 'index-pack with --[no-]rev-index' '
+	for conf in "" true false
+	do
+		test_index_pack "$conf" --rev-index &&
+		test_path_exists $rev &&
+
+		test_index_pack "$conf" --no-rev-index &&
+		test_path_is_missing $rev
+	done
+'
+
+test_expect_success 'index-pack can verify reverse indexes' '
+	test_when_finished "rm -f $rev" &&
+	test_index_pack true &&
+
+	test_path_is_file $rev &&
+	git index-pack --rev-index --verify $packdir/pack-$pack.pack &&
+
+	# Intentionally corrupt the reverse index.
+	chmod u+w $rev &&
+	printf "xxxx" | dd of=$rev bs=1 count=4 conv=notrunc &&
+
+	test_must_fail git index-pack --rev-index --verify \
+		$packdir/pack-$pack.pack 2>err &&
+	grep "validation error" err
+'
+
+test_expect_success 'index-pack infers reverse index name with -o' '
+	git index-pack --rev-index -o other.idx $packdir/pack-$pack.pack &&
+	test_path_is_file other.idx &&
+	test_path_is_file other.rev
+'
+
+test_done
-- 
2.30.0.138.g6d7191ea01


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH v2 4/8] builtin/pack-objects.c: respect 'pack.writeReverseIndex'
  2021-01-13 22:28 ` [PATCH v2 0/8] pack-revindex: introduce on-disk '.rev' format Taylor Blau
                     ` (2 preceding siblings ...)
  2021-01-13 22:28   ` [PATCH v2 3/8] builtin/index-pack.c: write reverse indexes Taylor Blau
@ 2021-01-13 22:28   ` Taylor Blau
  2021-01-13 22:28   ` [PATCH v2 5/8] Documentation/config/pack.txt: advertise 'pack.writeReverseIndex' Taylor Blau
                     ` (3 subsequent siblings)
  7 siblings, 0 replies; 32+ messages in thread
From: Taylor Blau @ 2021-01-13 22:28 UTC (permalink / raw)
  To: git; +Cc: dstolee, gitster, jrnieder, peff

Now that we have an implementation that can write the new reverse index
format, enable writing a .rev file in 'git pack-objects' by consulting
the pack.writeReverseIndex configuration variable.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 builtin/pack-objects.c   |  7 +++++++
 t/t5325-reverse-index.sh | 13 +++++++++++++
 2 files changed, 20 insertions(+)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 5b0c4489e2..d784569200 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -2955,6 +2955,13 @@ static int git_pack_config(const char *k, const char *v, void *cb)
 			    pack_idx_opts.version);
 		return 0;
 	}
+	if (!strcmp(k, "pack.writereverseindex")) {
+		if (git_config_bool(k, v))
+			pack_idx_opts.flags |= WRITE_REV;
+		else
+			pack_idx_opts.flags &= ~WRITE_REV;
+		return 0;
+	}
 	if (!strcmp(k, "uploadpack.blobpackfileuri")) {
 		struct configured_exclusion *ex = xmalloc(sizeof(*ex));
 		const char *oid_end, *pack_end;
diff --git a/t/t5325-reverse-index.sh b/t/t5325-reverse-index.sh
index 2dae213126..87040263b7 100755
--- a/t/t5325-reverse-index.sh
+++ b/t/t5325-reverse-index.sh
@@ -68,4 +68,17 @@ test_expect_success 'index-pack infers reverse index name with -o' '
 	test_path_is_file other.rev
 '
 
+test_expect_success 'pack-objects respects pack.writeReverseIndex' '
+	test_when_finished "rm -fr pack-1-*" &&
+
+	git -c pack.writeReverseIndex= pack-objects --all pack-1 &&
+	test_path_is_missing pack-1-*.rev &&
+
+	git -c pack.writeReverseIndex=false pack-objects --all pack-1 &&
+	test_path_is_missing pack-1-*.rev &&
+
+	git -c pack.writeReverseIndex=true pack-objects --all pack-1 &&
+	test_path_is_file pack-1-*.rev
+'
+
 test_done
-- 
2.30.0.138.g6d7191ea01


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH v2 5/8] Documentation/config/pack.txt: advertise 'pack.writeReverseIndex'
  2021-01-13 22:28 ` [PATCH v2 0/8] pack-revindex: introduce on-disk '.rev' format Taylor Blau
                     ` (3 preceding siblings ...)
  2021-01-13 22:28   ` [PATCH v2 4/8] builtin/pack-objects.c: respect 'pack.writeReverseIndex' Taylor Blau
@ 2021-01-13 22:28   ` Taylor Blau
  2021-01-13 22:28   ` [PATCH v2 6/8] t: prepare for GIT_TEST_WRITE_REV_INDEX Taylor Blau
                     ` (2 subsequent siblings)
  7 siblings, 0 replies; 32+ messages in thread
From: Taylor Blau @ 2021-01-13 22:28 UTC (permalink / raw)
  To: git; +Cc: dstolee, gitster, jrnieder, peff

Now that the pack.writeReverseIndex configuration is respected in both
'git index-pack' and 'git pack-objects' (and therefore, all of their
callers), we can safely advertise it for use in the git-config manual.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 Documentation/config/pack.txt | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/Documentation/config/pack.txt b/Documentation/config/pack.txt
index 837f1b1679..3da4ea98e2 100644
--- a/Documentation/config/pack.txt
+++ b/Documentation/config/pack.txt
@@ -133,3 +133,10 @@ pack.writeBitmapHashCache::
 	between an older, bitmapped pack and objects that have been
 	pushed since the last gc). The downside is that it consumes 4
 	bytes per object of disk space. Defaults to true.
+
+pack.writeReverseIndex::
+	When true, git will write a corresponding .rev file (see:
+	link:../technical/pack-format.html[Documentation/technical/pack-format.txt])
+	for each new packfile that it writes in all places except for
+	linkgit:git-fast-import[1] and in the bulk checkin mechanism.
+	Defaults to false.
-- 
2.30.0.138.g6d7191ea01


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH v2 6/8] t: prepare for GIT_TEST_WRITE_REV_INDEX
  2021-01-13 22:28 ` [PATCH v2 0/8] pack-revindex: introduce on-disk '.rev' format Taylor Blau
                     ` (4 preceding siblings ...)
  2021-01-13 22:28   ` [PATCH v2 5/8] Documentation/config/pack.txt: advertise 'pack.writeReverseIndex' Taylor Blau
@ 2021-01-13 22:28   ` Taylor Blau
  2021-01-13 22:28   ` [PATCH v2 7/8] t: support GIT_TEST_WRITE_REV_INDEX Taylor Blau
  2021-01-13 22:28   ` [PATCH v2 8/8] pack-revindex: ensure that on-disk reverse indexes are given precedence Taylor Blau
  7 siblings, 0 replies; 32+ messages in thread
From: Taylor Blau @ 2021-01-13 22:28 UTC (permalink / raw)
  To: git; +Cc: dstolee, gitster, jrnieder, peff

In the next patch, we'll add support for unconditionally enabling the
'pack.writeReverseIndex' setting with a new GIT_TEST_WRITE_REV_INDEX
environment variable.

This causes a little bit of fallout with tests that, for example,
compare the list of files in the pack directory being unprepared to see
.rev files in its output.

Those locations can be cleaned up to look for specific file extensions,
rather than take everything in the pack directory (for instance) and
then grep out unwanted items.

Once the pack.writeReverseIndex option has been thoroughly
tested, we will default it to 'true', removing GIT_TEST_WRITE_REV_INDEX,
and making it possible to revert this patch.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 t/t5319-multi-pack-index.sh |  5 +++--
 t/t5325-reverse-index.sh    |  4 ++++
 t/t5604-clone-reference.sh  |  2 +-
 t/t5702-protocol-v2.sh      | 12 ++++++++----
 t/t6500-gc.sh               |  6 +++---
 t/t9300-fast-import.sh      |  5 ++++-
 6 files changed, 23 insertions(+), 11 deletions(-)

diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh
index 297de502a9..2fc3aadbd1 100755
--- a/t/t5319-multi-pack-index.sh
+++ b/t/t5319-multi-pack-index.sh
@@ -710,8 +710,9 @@ test_expect_success 'expire respects .keep files' '
 		PACKA=$(ls .git/objects/pack/a-pack*\.pack | sed s/\.pack\$//) &&
 		touch $PACKA.keep &&
 		git multi-pack-index expire &&
-		ls -S .git/objects/pack/a-pack* | grep $PACKA >a-pack-files &&
-		test_line_count = 3 a-pack-files &&
+		test_path_is_file $PACKA.idx &&
+		test_path_is_file $PACKA.keep &&
+		test_path_is_file $PACKA.pack &&
 		test-tool read-midx .git/objects | grep idx >midx-list &&
 		test_line_count = 2 midx-list
 	)
diff --git a/t/t5325-reverse-index.sh b/t/t5325-reverse-index.sh
index 87040263b7..be452bb343 100755
--- a/t/t5325-reverse-index.sh
+++ b/t/t5325-reverse-index.sh
@@ -3,6 +3,10 @@
 test_description='on-disk reverse index'
 . ./test-lib.sh
 
+# The below tests want control over the 'pack.writeReverseIndex' setting
+# themselves to assert various combinations of it with other options.
+sane_unset GIT_TEST_WRITE_REV_INDEX
+
 packdir=.git/objects/pack
 
 test_expect_success 'setup' '
diff --git a/t/t5604-clone-reference.sh b/t/t5604-clone-reference.sh
index 2f7be23044..7d93588982 100755
--- a/t/t5604-clone-reference.sh
+++ b/t/t5604-clone-reference.sh
@@ -326,7 +326,7 @@ test_expect_success SYMLINKS 'clone repo with symlinked or unknown files at obje
 	for raw in $(ls T*.raw)
 	do
 		sed -e "s!/../!/Y/!; s![0-9a-f]\{38,\}!Z!" -e "/commit-graph/d" \
-		    -e "/multi-pack-index/d" <$raw >$raw.de-sha-1 &&
+		    -e "/multi-pack-index/d" -e "/rev/d" <$raw >$raw.de-sha-1 &&
 		sort $raw.de-sha-1 >$raw.de-sha || return 1
 	done &&
 
diff --git a/t/t5702-protocol-v2.sh b/t/t5702-protocol-v2.sh
index 7d5b17909b..73cd9e3ff6 100755
--- a/t/t5702-protocol-v2.sh
+++ b/t/t5702-protocol-v2.sh
@@ -848,8 +848,10 @@ test_expect_success 'part of packfile response provided as URI' '
 	test -f h2found &&
 
 	# Ensure that there are exactly 6 files (3 .pack and 3 .idx).
-	ls http_child/.git/objects/pack/* >filelist &&
-	test_line_count = 6 filelist
+	ls http_child/.git/objects/pack/*.pack >packlist &&
+	ls http_child/.git/objects/pack/*.idx >idxlist &&
+	test_line_count = 3 idxlist &&
+	test_line_count = 3 packlist
 '
 
 test_expect_success 'fetching with valid packfile URI but invalid hash fails' '
@@ -902,8 +904,10 @@ test_expect_success 'packfile-uri with transfer.fsckobjects' '
 		clone "$HTTPD_URL/smart/http_parent" http_child &&
 
 	# Ensure that there are exactly 4 files (2 .pack and 2 .idx).
-	ls http_child/.git/objects/pack/* >filelist &&
-	test_line_count = 4 filelist
+	ls http_child/.git/objects/pack/*.pack >packlist &&
+	ls http_child/.git/objects/pack/*.idx >idxlist &&
+	test_line_count = 2 idxlist &&
+	test_line_count = 2 packlist
 '
 
 test_expect_success 'packfile-uri with transfer.fsckobjects fails on bad object' '
diff --git a/t/t6500-gc.sh b/t/t6500-gc.sh
index 4a3b8f48ac..f76586f808 100755
--- a/t/t6500-gc.sh
+++ b/t/t6500-gc.sh
@@ -106,17 +106,17 @@ test_expect_success 'auto gc with too many loose objects does not attempt to cre
 	test_commit "$(test_oid obj2)" &&
 	# Our first gc will create a pack; our second will create a second pack
 	git gc --auto &&
-	ls .git/objects/pack | sort >existing_packs &&
+	ls .git/objects/pack/pack-*.pack | sort >existing_packs &&
 	test_commit "$(test_oid obj3)" &&
 	test_commit "$(test_oid obj4)" &&
 
 	git gc --auto 2>err &&
 	test_i18ngrep ! "^warning:" err &&
-	ls .git/objects/pack/ | sort >post_packs &&
+	ls .git/objects/pack/pack-*.pack | sort >post_packs &&
 	comm -1 -3 existing_packs post_packs >new &&
 	comm -2 -3 existing_packs post_packs >del &&
 	test_line_count = 0 del && # No packs are deleted
-	test_line_count = 2 new # There is one new pack and its .idx
+	test_line_count = 1 new # There is one new pack
 '
 
 test_expect_success 'gc --no-quiet' '
diff --git a/t/t9300-fast-import.sh b/t/t9300-fast-import.sh
index 308c1ef42c..2cc1f43c1b 100755
--- a/t/t9300-fast-import.sh
+++ b/t/t9300-fast-import.sh
@@ -1629,7 +1629,10 @@ test_expect_success 'O: blank lines not necessary after other commands' '
 	INPUT_END
 
 	git fast-import <input &&
-	test 8 = $(find .git/objects/pack -type f | grep -v multi-pack-index | wc -l) &&
+	ls -la .git/objects/pack/pack-*.pack >packlist &&
+	ls -la .git/objects/pack/pack-*.pack >idxlist &&
+	test_line_count = 4 idxlist &&
+	test_line_count = 4 packlist &&
 	test $(git rev-parse refs/tags/O3-2nd) = $(git rev-parse O3^) &&
 	git log --reverse --pretty=oneline O3 | sed s/^.*z// >actual &&
 	test_cmp expect actual
-- 
2.30.0.138.g6d7191ea01


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH v2 7/8] t: support GIT_TEST_WRITE_REV_INDEX
  2021-01-13 22:28 ` [PATCH v2 0/8] pack-revindex: introduce on-disk '.rev' format Taylor Blau
                     ` (5 preceding siblings ...)
  2021-01-13 22:28   ` [PATCH v2 6/8] t: prepare for GIT_TEST_WRITE_REV_INDEX Taylor Blau
@ 2021-01-13 22:28   ` Taylor Blau
  2021-01-13 22:28   ` [PATCH v2 8/8] pack-revindex: ensure that on-disk reverse indexes are given precedence Taylor Blau
  7 siblings, 0 replies; 32+ messages in thread
From: Taylor Blau @ 2021-01-13 22:28 UTC (permalink / raw)
  To: git; +Cc: dstolee, gitster, jrnieder, peff

Add a new option that unconditionally enables the pack.writeReverseIndex
setting in order to run the whole test suite in a mode that generates
on-disk reverse indexes. Additionally, enable this mode in the second
run of tests under linux-gcc in 'ci/run-build-and-tests.sh'.

Once on-disk reverse indexes are proven out over several releases, we
can change the default value of that configuration to 'true', and drop
this patch.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 builtin/index-pack.c      | 5 ++++-
 builtin/pack-objects.c    | 2 ++
 ci/run-build-and-tests.sh | 1 +
 pack-revindex.h           | 2 ++
 t/README                  | 3 +++
 5 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 03408250b1..0bde325a8b 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -1748,7 +1748,10 @@ int cmd_index_pack(int argc, const char **argv, const char *prefix)
 	if (prefix && chdir(prefix))
 		die(_("Cannot come back to cwd"));
 
-	rev_index = !!(opts.flags & (WRITE_REV_VERIFY | WRITE_REV));
+	if (git_env_bool(GIT_TEST_WRITE_REV_INDEX, 0))
+		rev_index = 1;
+	else
+		rev_index = !!(opts.flags & (WRITE_REV_VERIFY | WRITE_REV));
 
 	for (i = 1; i < argc; i++) {
 		const char *arg = argv[i];
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index d784569200..24df0c98f7 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -3601,6 +3601,8 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
 
 	reset_pack_idx_option(&pack_idx_opts);
 	git_config(git_pack_config, NULL);
+	if (git_env_bool(GIT_TEST_WRITE_REV_INDEX, 0))
+		pack_idx_opts.flags |= WRITE_REV;
 
 	progress = isatty(2);
 	argc = parse_options(argc, argv, prefix, pack_objects_options,
diff --git a/ci/run-build-and-tests.sh b/ci/run-build-and-tests.sh
index 6c27b886b8..d1cbf330a1 100755
--- a/ci/run-build-and-tests.sh
+++ b/ci/run-build-and-tests.sh
@@ -22,6 +22,7 @@ linux-gcc)
 	export GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS=1
 	export GIT_TEST_MULTI_PACK_INDEX=1
 	export GIT_TEST_ADD_I_USE_BUILTIN=1
+	export GIT_TEST_WRITE_REV_INDEX=1
 	make test
 	;;
 linux-clang)
diff --git a/pack-revindex.h b/pack-revindex.h
index 01622cf21a..7237b2b6f8 100644
--- a/pack-revindex.h
+++ b/pack-revindex.h
@@ -16,6 +16,8 @@
  *   can be found
  */
 
+#define GIT_TEST_WRITE_REV_INDEX "GIT_TEST_WRITE_REV_INDEX"
+
 struct packed_git;
 
 /*
diff --git a/t/README b/t/README
index c730a70770..0f97a51640 100644
--- a/t/README
+++ b/t/README
@@ -439,6 +439,9 @@ GIT_TEST_DEFAULT_HASH=<hash-algo> specifies which hash algorithm to
 use in the test scripts. Recognized values for <hash-algo> are "sha1"
 and "sha256".
 
+GIT_TEST_WRITE_REV_INDEX=<boolean>, when true enables the
+'pack.writeReverseIndex' setting.
+
 Naming Tests
 ------------
 
-- 
2.30.0.138.g6d7191ea01


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH v2 8/8] pack-revindex: ensure that on-disk reverse indexes are given precedence
  2021-01-13 22:28 ` [PATCH v2 0/8] pack-revindex: introduce on-disk '.rev' format Taylor Blau
                     ` (6 preceding siblings ...)
  2021-01-13 22:28   ` [PATCH v2 7/8] t: support GIT_TEST_WRITE_REV_INDEX Taylor Blau
@ 2021-01-13 22:28   ` Taylor Blau
  7 siblings, 0 replies; 32+ messages in thread
From: Taylor Blau @ 2021-01-13 22:28 UTC (permalink / raw)
  To: git; +Cc: dstolee, gitster, jrnieder, peff

When an on-disk reverse index exists, there is no need to generate one
in memory. In fact, doing so can be slow, and require large amounts of
the heap.

Let's make sure that we treat the on-disk reverse index with precedence
(i.e., that when it exists, we don't bother trying to generate an
equivalent one in memory) by teaching Git how to conditionally die()
when generating a reverse index in memory.

Then, add a test to ensure that when (a) an on-disk reverse index
exists, and (b) when setting GIT_TEST_REV_INDEX_DIE_IN_MEMORY, that we
do not die, implying that we read from the on-disk one.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 pack-revindex.c          | 4 ++++
 pack-revindex.h          | 1 +
 t/t5325-reverse-index.sh | 9 +++++++++
 3 files changed, 14 insertions(+)

diff --git a/pack-revindex.c b/pack-revindex.c
index 369812dd21..f264319f34 100644
--- a/pack-revindex.c
+++ b/pack-revindex.c
@@ -2,6 +2,7 @@
 #include "pack-revindex.h"
 #include "object-store.h"
 #include "packfile.h"
+#include "config.h"
 
 struct revindex_entry {
 	off_t offset;
@@ -166,6 +167,9 @@ static void create_pack_revindex(struct packed_git *p)
 
 static int load_pack_revindex_from_memory(struct packed_git *p)
 {
+	if (git_env_bool(GIT_TEST_REV_INDEX_DIE_IN_MEMORY, 0))
+		die("dying as requested by '%s'",
+		    GIT_TEST_REV_INDEX_DIE_IN_MEMORY);
 	if (open_pack_index(p))
 		return -1;
 	create_pack_revindex(p);
diff --git a/pack-revindex.h b/pack-revindex.h
index 7237b2b6f8..97f5893d3a 100644
--- a/pack-revindex.h
+++ b/pack-revindex.h
@@ -17,6 +17,7 @@
  */
 
 #define GIT_TEST_WRITE_REV_INDEX "GIT_TEST_WRITE_REV_INDEX"
+#define GIT_TEST_REV_INDEX_DIE_IN_MEMORY "GIT_TEST_REV_INDEX_DIE_IN_MEMORY"
 
 struct packed_git;
 
diff --git a/t/t5325-reverse-index.sh b/t/t5325-reverse-index.sh
index be452bb343..a344b18d7e 100755
--- a/t/t5325-reverse-index.sh
+++ b/t/t5325-reverse-index.sh
@@ -85,4 +85,13 @@ test_expect_success 'pack-objects respects pack.writeReverseIndex' '
 	test_path_is_file pack-1-*.rev
 '
 
+test_expect_success 'reverse index is not generated when available on disk' '
+	test_index_pack true &&
+	test_path_is_file $rev &&
+
+	git rev-parse HEAD >tip &&
+	GIT_TEST_REV_INDEX_DIE_IN_MEMORY=1 git cat-file \
+		--batch-check="%(objectsize:disk)" <tip
+'
+
 test_done
-- 
2.30.0.138.g6d7191ea01

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 1/8] packfile: prepare for the existence of '*.rev' files
  2021-01-13 22:28   ` [PATCH v2 1/8] packfile: prepare for the existence of '*.rev' files Taylor Blau
@ 2021-01-14  7:22     ` Junio C Hamano
  2021-01-14 12:07       ` Derrick Stolee
  2021-01-14 18:28       ` Taylor Blau
  2021-01-14  7:26     ` Junio C Hamano
  1 sibling, 2 replies; 32+ messages in thread
From: Junio C Hamano @ 2021-01-14  7:22 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, dstolee, jrnieder, peff

Taylor Blau <me@ttaylorr.com> writes:

> +== pack-*.rev files have the format:
> +
> +  - A 4-byte magic number '0x52494458' ('RIDX').
> +
> +  - A 4-byte version identifier (= 1)
> +
> +  - A 4-byte hash function identifier (= 1 for SHA-1, 2 for SHA-256)

These two are presumably 4-byte-wide network byte order integers.
We should spell it out.

> +  - A table of index positions, sorted by their corresponding offsets in the
> +    packfile.

Likewise, how wide is each entry and in what byte order, and how
many entries are there in the table?

	... oh, what about the "one beyond the last"?  We cannot
	go back to the forward index to learn the offset of such
	an non-existent object, can we?

Again, I expect it to be 4-byte-wide network byte order integer.

> -int load_pack_revindex(struct packed_git *p)
> +static int load_pack_revindex_from_memory(struct packed_git *p)

I said it when I saw the beginning of v1 API patches, but it is a
bit unreasonable to call the act of "computing from the forward
index" "to load from memory".  Loading from disk perfectly works as
a phrase, though.

> +#define RIDX_MIN_SIZE (12 + (2 * the_hash_algo->rawsz))
> +
> +static int load_revindex_from_disk(char *revindex_name,
> +				   uint32_t num_objects,
> +				   const void **data, size_t *len)
> +{
> +	int fd, ret = 0;
> +	struct stat st;
> +	size_t revindex_size;
> +
> +	fd = git_open(revindex_name);
> +
> +	if (fd < 0) {
> +		ret = -1;
> +		goto cleanup;
> +	}
> +	if (fstat(fd, &st)) {
> +		ret = error_errno(_("failed to read %s"), revindex_name);
> +		goto cleanup;
> +	}
> +
> +	revindex_size = xsize_t(st.st_size);
> +
> +	if (revindex_size < RIDX_MIN_SIZE) {
> +		ret = error(_("reverse-index file %s is too small"), revindex_name);
> +		goto cleanup;
> +	}
> +
> +	if (revindex_size - RIDX_MIN_SIZE != st_mult(sizeof(uint32_t), num_objects)) {
> +		ret = error(_("reverse-index file %s is corrupt"), revindex_name);
> +		goto cleanup;
> +	}
> +
> +	*len = revindex_size;
> +	*data = xmmap(NULL, revindex_size, PROT_READ, MAP_PRIVATE, fd, 0);
> +
> +cleanup:
> +	close(fd);
> +	return ret;
> +}
> +
> +static int load_pack_revindex_from_disk(struct packed_git *p)
> +{
> +	char *revindex_name;
> +	int ret;
> +	if (open_pack_index(p))
> +		return -1;
> +
> +	revindex_name = pack_revindex_filename(p);
> +
> +	ret = load_revindex_from_disk(revindex_name,
> +				      p->num_objects,
> +				      &p->revindex_map,
> +				      &p->revindex_size);
> +	if (ret)
> +		goto cleanup;
> +
> +	p->revindex_data = (char *)p->revindex_map + 12;

We've seen hardcoded constant "12" twice so far in this patch.

We need a C proprocessor macro "#define RIDX_FILE_HEADER_SIZE 12" or
something, perhaps?

> +cleanup:
> +	free(revindex_name);
> +	return ret;
> +}
> +
> +int load_pack_revindex(struct packed_git *p)
> +{
> +	if (p->revindex || p->revindex_data)
> +		return 0;
> +
> +	if (!load_pack_revindex_from_disk(p))
> +		return 0;
> +	else if (!load_pack_revindex_from_memory(p))
> +		return 0;
> +	return -1;
> +}
> +
>  int offset_to_pack_pos(struct packed_git *p, off_t ofs, uint32_t *pos)
>  {
>  	unsigned lo, hi;
> @@ -203,18 +285,28 @@ int offset_to_pack_pos(struct packed_git *p, off_t ofs, uint32_t *pos)
>  
>  uint32_t pack_pos_to_index(struct packed_git *p, uint32_t pos)
>  {
> -	if (!p->revindex)
> +	if (!(p->revindex || p->revindex_data))
>  		BUG("pack_pos_to_index: reverse index not yet loaded");
>  	if (p->num_objects <= pos)
>  		BUG("pack_pos_to_index: out-of-bounds object at %"PRIu32, pos);
> -	return p->revindex[pos].nr;
> +
> +	if (p->revindex)
> +		return p->revindex[pos].nr;
> +	else
> +		return get_be32((char *)p->revindex_data + (pos * sizeof(uint32_t)));

Good.  We are using 32-bit uint in network byte order.  We should
document it as such.

Let's not strip const away while casting, though.  get_be32()
ensures that it only reads and never writes thru the pointer, and
p->revindex_data is a "const void *".

>  }
>  
>  off_t pack_pos_to_offset(struct packed_git *p, uint32_t pos)
>  {
> -	if (!p->revindex)
> +	if (!(p->revindex || p->revindex_data))
>  		BUG("pack_pos_to_index: reverse index not yet loaded");
>  	if (p->num_objects < pos)
>  		BUG("pack_pos_to_offset: out-of-bounds object at %"PRIu32, pos);
> -	return p->revindex[pos].offset;
> +
> +	if (p->revindex)
> +		return p->revindex[pos].offset;
> +	else if (pos == p->num_objects)
> +		return p->pack_size - the_hash_algo->rawsz;

OK, here is the answer to my previous question.  We should document
that the table has num_objects entries in the on-disk file (we do
not need to say that there is no sentinel entry in the table at the
end).

> +	else
> +		return nth_packed_object_offset(p, pack_pos_to_index(p, pos));
>  }
> diff --git a/pack-revindex.h b/pack-revindex.h
> index 6e0320b08b..01622cf21a 100644
> --- a/pack-revindex.h
> +++ b/pack-revindex.h
> @@ -21,6 +21,9 @@ struct packed_git;
>  /*
>   * load_pack_revindex populates the revindex's internal data-structures for the
>   * given pack, returning zero on success and a negative value otherwise.
> + *
> + * If a '.rev' file is present, it is checked for consistency, mmap'd, and
> + * pointers are assigned into it (instead of using the in-memory variant).

Hmph, I missed where it got checked for consistency, though.  If the
file is corrupt and has say duplicated entries, we'd happily grab
the data via get_be32(), for example.

> @@ -55,7 +58,9 @@ uint32_t pack_pos_to_index(struct packed_git *p, uint32_t pos);
>   * If the reverse index has not yet been loaded, or the position is out of
>   * bounds, this function aborts.
>   *
> - * This function runs in constant time.
> + * This function runs in constant time under both in-memory and on-disk reverse
> + * indexes, but an additional step is taken to consult the corresponding .idx
> + * file when using the on-disk format.

Again, I know this is a kind of detail that is interesting to those
who implemented the function, but I wonder how it would help those
who wonder if they should call it or use some other method to
achieve what they want.


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 1/8] packfile: prepare for the existence of '*.rev' files
  2021-01-13 22:28   ` [PATCH v2 1/8] packfile: prepare for the existence of '*.rev' files Taylor Blau
  2021-01-14  7:22     ` Junio C Hamano
@ 2021-01-14  7:26     ` Junio C Hamano
  2021-01-14 18:13       ` Taylor Blau
  1 sibling, 1 reply; 32+ messages in thread
From: Junio C Hamano @ 2021-01-14  7:26 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, dstolee, jrnieder, peff

Taylor Blau <me@ttaylorr.com> writes:

> Specify the format of the on-disk reverse index 'pack-*.rev' file, as
> well as prepare the code for the existence of such files.

We've changed the pack .idx file format once as the file format is
versioned.  I wonder if you considered placing the reverse index
information in the same file, with version bump, in the .idx?

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 1/8] packfile: prepare for the existence of '*.rev' files
  2021-01-14  7:22     ` Junio C Hamano
@ 2021-01-14 12:07       ` Derrick Stolee
  2021-01-14 19:57         ` Jeff King
  2021-01-14 18:28       ` Taylor Blau
  1 sibling, 1 reply; 32+ messages in thread
From: Derrick Stolee @ 2021-01-14 12:07 UTC (permalink / raw)
  To: Junio C Hamano, Taylor Blau; +Cc: git, dstolee, jrnieder, peff

On 1/14/2021 2:22 AM, Junio C Hamano wrote:
> Taylor Blau <me@ttaylorr.com> writes:
> 
>> +== pack-*.rev files have the format:
>> +
>> +  - A 4-byte magic number '0x52494458' ('RIDX').
>> +
>> +  - A 4-byte version identifier (= 1)
>> +
>> +  - A 4-byte hash function identifier (= 1 for SHA-1, 2 for SHA-256)
> 
> These two are presumably 4-byte-wide network byte order integers.
> We should spell it out.

In the past, we've included the sentence "All multi-byte numbers are
in network byte order." to clarify this. These two entries need to
specify that the "identifier" is actually an integer. Perhaps these
three values could be provided as:

+== pack-*.rev files have the format:
+
+  - A 4-byte identifier '0x52494458' ('RIDX').
+
+  - A 4-byte integer version (= 1)
+
+  - A 4-byte integer hash function version (= 1 for SHA-1, 2 for SHA-256)

>>  /*
>>   * load_pack_revindex populates the revindex's internal data-structures for the
>>   * given pack, returning zero on success and a negative value otherwise.
>> + *
>> + * If a '.rev' file is present, it is checked for consistency, mmap'd, and
>> + * pointers are assigned into it (instead of using the in-memory variant).
> 
> Hmph, I missed where it got checked for consistency, though.  If the
> file is corrupt and has say duplicated entries, we'd happily grab
> the data via get_be32(), for example.

Even if the consistency check is just verifying the trailing hash, that
seems like something that requires O(N) before performing a lookup. Perhaps
this was copied from somewhere else, or means something different?

Thanks,
-Stolee


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 1/8] packfile: prepare for the existence of '*.rev' files
  2021-01-14  7:26     ` Junio C Hamano
@ 2021-01-14 18:13       ` Taylor Blau
  2021-01-14 20:57         ` Junio C Hamano
  0 siblings, 1 reply; 32+ messages in thread
From: Taylor Blau @ 2021-01-14 18:13 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Taylor Blau, git, dstolee, jrnieder, peff

On Wed, Jan 13, 2021 at 11:26:59PM -0800, Junio C Hamano wrote:
> Taylor Blau <me@ttaylorr.com> writes:
>
> > Specify the format of the on-disk reverse index 'pack-*.rev' file, as
> > well as prepare the code for the existence of such files.
>
> We've changed the pack .idx file format once as the file format is
> versioned.  I wonder if you considered placing the reverse index
> information in the same file, with version bump, in the .idx?

Funny enough, I couldn't remember whether I considered this or not.
Peff reminded me off-list that he and I *had* considered this, but
decided against it.

The main benefit to introducing a new '.rev' format is that we can have
packs that can be upgraded to write a reverse index without having to
rewrite their forward index. It would also allow us to avoid teaching
other implementations about a new version of the index file (since they
can ignore it and continue to build their equivalent of the reverse
index file in memory by reading the forward index).

(Peff reminds me that dumb-http does look at remote .idx files, so this
new format would leak across to clients, whether or not that's something
to be concerned about...).

Of course, having the contents of the .rev file be included in the .idx
file nets us one fewer file to manage, but I'm not sure that's a reason
to do things one way or another.

Your response did pique my interest, since I was wondering if we could
improve the cold cache performance if the .rev file's contents were
included in the .idx, but after giving it some thought I don't think we
can. Reasons are:

  - If the reverse index's contents appears at the end of the .idx file,
    then in any .idx file large enough to matter, we'll almost certainly
    still be evicting cache lines back and forth when swapping between
    reading the forward- and reverse-indexes. So, no gains to be had
    there.

  - If, on the other hand, we included the reverse index's contents by
    interleaving it with the forward index's offsets, then we'd be
    worsening the cache performance of the forward index.

So, I'm more in favor of a new .rev file rather than a v3 .idx version.
Apologies for not including more of a rationale "why" in the cover
letter (had I not forgotten that I'd even considered it, I would have).

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 1/8] packfile: prepare for the existence of '*.rev' files
  2021-01-14  7:22     ` Junio C Hamano
  2021-01-14 12:07       ` Derrick Stolee
@ 2021-01-14 18:28       ` Taylor Blau
  1 sibling, 0 replies; 32+ messages in thread
From: Taylor Blau @ 2021-01-14 18:28 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, dstolee, jrnieder, peff

On Wed, Jan 13, 2021 at 11:22:53PM -0800, Junio C Hamano wrote:
> Taylor Blau <me@ttaylorr.com> writes:
>
> > +== pack-*.rev files have the format:
> > +
> > +  - A 4-byte magic number '0x52494458' ('RIDX').
> > +
> > +  - A 4-byte version identifier (= 1)
> > +
> > +  - A 4-byte hash function identifier (= 1 for SHA-1, 2 for SHA-256)
>
> These two are presumably 4-byte-wide network byte order integers.
> We should spell it out.

Yep, all entries are network order. I added a sentence at the bottom of
this sub-section to say as much.

> We've seen hardcoded constant "12" twice so far in this patch.
>
> We need a C proprocessor macro "#define RIDX_FILE_HEADER_SIZE 12" or
> something, perhaps?

Good idea, thanks.

> > -	return p->revindex[pos].nr;
> > +
> > +	if (p->revindex)
> > +		return p->revindex[pos].nr;
> > +	else
> > +		return get_be32((char *)p->revindex_data + (pos * sizeof(uint32_t)));
>
> Good.  We are using 32-bit uint in network byte order.  We should
> document it as such.
>
> Let's not strip const away while casting, though.  get_be32()
> ensures that it only reads and never writes thru the pointer, and
> p->revindex_data is a "const void *".

Agreed, and thanks for the suggestion. I take it that what you mean is:

-		return get_be32((char *)p->revindex_data + (pos * sizeof(uint32_t)));
+		return get_be32((const char *)p->revindex_data + (pos * sizeof(uint32_t)));

...yes?

> > diff --git a/pack-revindex.h b/pack-revindex.h
> > index 6e0320b08b..01622cf21a 100644
> > --- a/pack-revindex.h
> > +++ b/pack-revindex.h
> > @@ -21,6 +21,9 @@ struct packed_git;
> >  /*
> >   * load_pack_revindex populates the revindex's internal data-structures for the
> >   * given pack, returning zero on success and a negative value otherwise.
> > + *
> > + * If a '.rev' file is present, it is checked for consistency, mmap'd, and
> > + * pointers are assigned into it (instead of using the in-memory variant).
>
> Hmph, I missed where it got checked for consistency, though.  If the
> file is corrupt and has say duplicated entries, we'd happily grab
> the data via get_be32(), for example.

It doesn't, I'm mistaken. I removed that incorrect detail from this
comment. Thanks for catching it.

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 1/8] packfile: prepare for the existence of '*.rev' files
  2021-01-14 12:07       ` Derrick Stolee
@ 2021-01-14 19:57         ` Jeff King
  0 siblings, 0 replies; 32+ messages in thread
From: Jeff King @ 2021-01-14 19:57 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: Junio C Hamano, Taylor Blau, git, dstolee, jrnieder

On Thu, Jan 14, 2021 at 07:07:08AM -0500, Derrick Stolee wrote:

> >> + * If a '.rev' file is present, it is checked for consistency, mmap'd, and
> >> + * pointers are assigned into it (instead of using the in-memory variant).
> > 
> > Hmph, I missed where it got checked for consistency, though.  If the
> > file is corrupt and has say duplicated entries, we'd happily grab
> > the data via get_be32(), for example.
> 
> Even if the consistency check is just verifying the trailing hash, that
> seems like something that requires O(N) before performing a lookup. Perhaps
> this was copied from somewhere else, or means something different?

For the .idx file, we check that the size is what we expect. This is
important because it lets us access the mapped bytes in normal use
without having to do a bounds check.

It looks like we do the same for the .rev file here, which is good.  If
calling that "checked for consistency" is too strong, I don't think it's
a big deal to drop the wording (we do not make any such claim for
open_pack_index()).

-Peff

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 1/8] packfile: prepare for the existence of '*.rev' files
  2021-01-14 18:13       ` Taylor Blau
@ 2021-01-14 20:57         ` Junio C Hamano
  0 siblings, 0 replies; 32+ messages in thread
From: Junio C Hamano @ 2021-01-14 20:57 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, dstolee, jrnieder, peff

Taylor Blau <me@ttaylorr.com> writes:

> Your response did pique my interest, since I was wondering if we could
> improve the cold cache performance if the .rev file's contents were
> included in the .idx,...

Yes, that was the primary thing I was wondering.  As long as it was
considered and rejected for good reason, that is good.

Thanks.



^ permalink raw reply	[flat|nested] 32+ messages in thread

end of thread, other threads:[~2021-01-14 21:04 UTC | newest]

Thread overview: 32+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-01-08 18:19 [PATCH 0/8] pack-revindex: introduce on-disk '.rev' format Taylor Blau
2021-01-08 18:19 ` [PATCH 1/8] packfile: prepare for the existence of '*.rev' files Taylor Blau
2021-01-08 18:20 ` [PATCH 2/8] pack-write.c: prepare to write 'pack-*.rev' files Taylor Blau
2021-01-08 18:20 ` [PATCH 3/8] builtin/index-pack.c: write reverse indexes Taylor Blau
2021-01-08 18:20 ` [PATCH 4/8] builtin/pack-objects.c: respect 'pack.writeReverseIndex' Taylor Blau
2021-01-08 18:20 ` [PATCH 5/8] Documentation/config/pack.txt: advertise 'pack.writeReverseIndex' Taylor Blau
2021-01-08 18:20 ` [PATCH 6/8] t: prepare for GIT_TEST_WRITE_REV_INDEX Taylor Blau
2021-01-12 17:11   ` Ævar Arnfjörð Bjarmason
2021-01-12 18:40     ` Taylor Blau
2021-01-08 18:20 ` [PATCH 7/8] t: support GIT_TEST_WRITE_REV_INDEX Taylor Blau
2021-01-12 16:49   ` Derrick Stolee
2021-01-12 17:34     ` Taylor Blau
2021-01-12 17:18   ` Ævar Arnfjörð Bjarmason
2021-01-12 17:39     ` Derrick Stolee
2021-01-12 18:17       ` Taylor Blau
2021-01-08 18:20 ` [PATCH 8/8] pack-revindex: ensure that on-disk reverse indexes are given precedence Taylor Blau
2021-01-13 22:28 ` [PATCH v2 0/8] pack-revindex: introduce on-disk '.rev' format Taylor Blau
2021-01-13 22:28   ` [PATCH v2 1/8] packfile: prepare for the existence of '*.rev' files Taylor Blau
2021-01-14  7:22     ` Junio C Hamano
2021-01-14 12:07       ` Derrick Stolee
2021-01-14 19:57         ` Jeff King
2021-01-14 18:28       ` Taylor Blau
2021-01-14  7:26     ` Junio C Hamano
2021-01-14 18:13       ` Taylor Blau
2021-01-14 20:57         ` Junio C Hamano
2021-01-13 22:28   ` [PATCH v2 2/8] pack-write.c: prepare to write 'pack-*.rev' files Taylor Blau
2021-01-13 22:28   ` [PATCH v2 3/8] builtin/index-pack.c: write reverse indexes Taylor Blau
2021-01-13 22:28   ` [PATCH v2 4/8] builtin/pack-objects.c: respect 'pack.writeReverseIndex' Taylor Blau
2021-01-13 22:28   ` [PATCH v2 5/8] Documentation/config/pack.txt: advertise 'pack.writeReverseIndex' Taylor Blau
2021-01-13 22:28   ` [PATCH v2 6/8] t: prepare for GIT_TEST_WRITE_REV_INDEX Taylor Blau
2021-01-13 22:28   ` [PATCH v2 7/8] t: support GIT_TEST_WRITE_REV_INDEX Taylor Blau
2021-01-13 22:28   ` [PATCH v2 8/8] pack-revindex: ensure that on-disk reverse indexes are given precedence Taylor Blau

git@vger.kernel.org list mirror (unofficial, one of many)

This inbox may be cloned and mirrored by anyone:

	git clone --mirror https://public-inbox.org/git
	git clone --mirror http://ou63pmih66umazou.onion/git
	git clone --mirror http://czquwvybam4bgbro.onion/git
	git clone --mirror http://hjrcffqmbrq6wope.onion/git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V1 git git/ https://public-inbox.org/git \
		git@vger.kernel.org
	public-inbox-index git

Example config snippet for mirrors.
Newsgroups are available over NNTP:
	nntp://news.public-inbox.org/inbox.comp.version-control.git
	nntp://ou63pmih66umazou.onion/inbox.comp.version-control.git
	nntp://czquwvybam4bgbro.onion/inbox.comp.version-control.git
	nntp://hjrcffqmbrq6wope.onion/inbox.comp.version-control.git
	nntp://news.gmane.io/gmane.comp.version-control.git
 note: .onion URLs require Tor: https://www.torproject.org/

code repositories for the project(s) associated with this inbox:

	https://80x24.org/mirrors/git.git

AGPL code for this site: git clone https://public-inbox.org/public-inbox.git