git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH v4 00/12] Rewrite packfile reuse code
@ 2019-12-18 11:25 Christian Couder
  2019-12-18 11:25 ` [PATCH v4 01/12] builtin/pack-objects: report reused packfile objects Christian Couder
                   ` (11 more replies)
  0 siblings, 12 replies; 13+ messages in thread
From: Christian Couder @ 2019-12-18 11:25 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Jeff King, Christian Couder, Ramsay Jones,
	Jonathan Tan

This patch series is rewriting the code that tries to reuse existing
packfiles.

The code in this patch series was written by GitHub, and Peff nicely
provided it in the following discussion:

https://public-inbox.org/git/3E56B0FD-EBE8-4057-A93A-16EBB09FBCE0@jramsay.com.au/

The first versions of this patch series were also discussed:

v3: https://public-inbox.org/git/20191115141541.11149-1-chriscool@tuxfamily.org/
V2: https://public-inbox.org/git/20191019103531.23274-1-chriscool@tuxfamily.org/
V1: https://public-inbox.org/git/20190913130226.7449-1-chriscool@tuxfamily.org/

Thanks to the reviewers!

According to Peff this new code is a lot smarter than what it
replaces. It allows "holes" in the chunks of packfile to be reused,
and skips over them. It rewrites OFS_DELTA offsets as it goes to
account for the holes. So it's basically a linear walk over the
packfile, but with the important distinction that we don't add those
objects to the object_entry array, which makes them very lightweight
(especially in memory use, but they also aren't considered bases for
finding new deltas, etc). It seems like a good compromise between the
cost to serve a clone and the quality of the resulting packfile.

This series has been rebased onto current master ad05a3d8e5 (The fifth
batch, 2019-12-10).

Other changes since the previous patch series have all been suggested
by Peff. Thanks to him! They are the following:

  - Add note in commit message of patch 3/12.

  - Move previous patch 4/9 to patch 12/12 at the end of the series to
    avoid test failures.

  - Add new patches 5/12 and 6/12.

  - Improve commit message and documentation of pack.allowPackReuse in
    patch 8/12.

  - Improve commit message of patch 10/12.

  - Extract patch 11/12 from patch 10/12.

Jeff King (12):
  builtin/pack-objects: report reused packfile objects
  packfile: expose get_delta_base()
  ewah/bitmap: introduce bitmap_word_alloc()
  pack-bitmap: introduce bitmap_walk_contains()
  pack-bitmap: uninteresting oid can be outside bitmapped packfile
  pack-bitmap: simplify bitmap_has_oid_in_uninteresting()
  csum-file: introduce hashfile_total()
  pack-objects: introduce pack.allowPackReuse
  builtin/pack-objects: introduce obj_is_packed()
  pack-objects: improve partial packfile reuse
  pack-objects: add checks for duplicate objects
  pack-bitmap: don't rely on bitmap_git->reuse_objects

 Documentation/config/pack.txt |   7 +
 builtin/pack-objects.c        | 243 +++++++++++++++++++++++++++-------
 csum-file.h                   |   9 ++
 ewah/bitmap.c                 |  13 +-
 ewah/ewok.h                   |   1 +
 pack-bitmap.c                 | 192 ++++++++++++++++++---------
 pack-bitmap.h                 |   6 +-
 packfile.c                    |  10 +-
 packfile.h                    |   3 +
 9 files changed, 362 insertions(+), 122 deletions(-)

-- 
2.24.1.498.g561400140f


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH v4 01/12] builtin/pack-objects: report reused packfile objects
  2019-12-18 11:25 [PATCH v4 00/12] Rewrite packfile reuse code Christian Couder
@ 2019-12-18 11:25 ` Christian Couder
  2019-12-18 11:25 ` [PATCH v4 02/12] packfile: expose get_delta_base() Christian Couder
                   ` (10 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: Christian Couder @ 2019-12-18 11:25 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Christian Couder, Ramsay Jones,
	Jonathan Tan, James Ramsay

From: Jeff King <peff@peff.net>

To see when packfile reuse kicks in or not, it is useful to
show reused packfile objects statistics in the output of
upload-pack.

Helped-by: James Ramsay <james@jramsay.com.au>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
 builtin/pack-objects.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 393c20a2d7..369f46fbc5 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -3509,7 +3509,9 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
 	if (progress)
 		fprintf_ln(stderr,
 			   _("Total %"PRIu32" (delta %"PRIu32"),"
-			     " reused %"PRIu32" (delta %"PRIu32")"),
-			   written, written_delta, reused, reused_delta);
+			     " reused %"PRIu32" (delta %"PRIu32"),"
+			     " pack-reused %"PRIu32),
+			   written, written_delta, reused, reused_delta,
+			   reuse_packfile_objects);
 	return 0;
 }
-- 
2.24.1.498.g561400140f


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v4 02/12] packfile: expose get_delta_base()
  2019-12-18 11:25 [PATCH v4 00/12] Rewrite packfile reuse code Christian Couder
  2019-12-18 11:25 ` [PATCH v4 01/12] builtin/pack-objects: report reused packfile objects Christian Couder
@ 2019-12-18 11:25 ` Christian Couder
  2019-12-18 11:25 ` [PATCH v4 03/12] ewah/bitmap: introduce bitmap_word_alloc() Christian Couder
                   ` (9 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: Christian Couder @ 2019-12-18 11:25 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Jeff King, Christian Couder, Ramsay Jones,
	Jonathan Tan

From: Jeff King <peff@peff.net>

In a following commit get_delta_base() will be used outside
packfile.c, so let's make it non static and declare it in
packfile.h.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
 packfile.c | 10 +++++-----
 packfile.h |  3 +++
 2 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/packfile.c b/packfile.c
index 355066de17..81e66847bf 100644
--- a/packfile.c
+++ b/packfile.c
@@ -1173,11 +1173,11 @@ const struct packed_git *has_packed_and_bad(struct repository *r,
 	return NULL;
 }
 
-static off_t get_delta_base(struct packed_git *p,
-				    struct pack_window **w_curs,
-				    off_t *curpos,
-				    enum object_type type,
-				    off_t delta_obj_offset)
+off_t get_delta_base(struct packed_git *p,
+		     struct pack_window **w_curs,
+		     off_t *curpos,
+		     enum object_type type,
+		     off_t delta_obj_offset)
 {
 	unsigned char *base_info = use_pack(p, w_curs, *curpos, NULL);
 	off_t base_offset;
diff --git a/packfile.h b/packfile.h
index fc7904ec81..ec536a4ae5 100644
--- a/packfile.h
+++ b/packfile.h
@@ -151,6 +151,9 @@ void *unpack_entry(struct repository *r, struct packed_git *, off_t, enum object
 unsigned long unpack_object_header_buffer(const unsigned char *buf, unsigned long len, enum object_type *type, unsigned long *sizep);
 unsigned long get_size_from_delta(struct packed_git *, struct pack_window **, off_t);
 int unpack_object_header(struct packed_git *, struct pack_window **, off_t *, unsigned long *);
+off_t get_delta_base(struct packed_git *p, struct pack_window **w_curs,
+		     off_t *curpos, enum object_type type,
+		     off_t delta_obj_offset);
 
 void release_pack_memory(size_t);
 
-- 
2.24.1.498.g561400140f


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v4 03/12] ewah/bitmap: introduce bitmap_word_alloc()
  2019-12-18 11:25 [PATCH v4 00/12] Rewrite packfile reuse code Christian Couder
  2019-12-18 11:25 ` [PATCH v4 01/12] builtin/pack-objects: report reused packfile objects Christian Couder
  2019-12-18 11:25 ` [PATCH v4 02/12] packfile: expose get_delta_base() Christian Couder
@ 2019-12-18 11:25 ` Christian Couder
  2019-12-18 11:25 ` [PATCH v4 04/12] pack-bitmap: introduce bitmap_walk_contains() Christian Couder
                   ` (8 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: Christian Couder @ 2019-12-18 11:25 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Jeff King, Christian Couder, Ramsay Jones,
	Jonathan Tan

From: Jeff King <peff@peff.net>

In a following commit we will need to allocate a variable
number of bitmap words, instead of always 32, so let's add
bitmap_word_alloc() for this purpose.

Note that we have to adjust the block growth in bitmap_set(),
since a caller could now use an initial size of "0" (we don't
plan to do that, but it doesn't hurt to be defensive).

Helped-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
 ewah/bitmap.c | 13 +++++++++----
 ewah/ewok.h   |  1 +
 2 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/ewah/bitmap.c b/ewah/bitmap.c
index 52f1178db4..b5fed9621f 100644
--- a/ewah/bitmap.c
+++ b/ewah/bitmap.c
@@ -22,21 +22,26 @@
 #define EWAH_MASK(x) ((eword_t)1 << (x % BITS_IN_EWORD))
 #define EWAH_BLOCK(x) (x / BITS_IN_EWORD)
 
-struct bitmap *bitmap_new(void)
+struct bitmap *bitmap_word_alloc(size_t word_alloc)
 {
 	struct bitmap *bitmap = xmalloc(sizeof(struct bitmap));
-	bitmap->words = xcalloc(32, sizeof(eword_t));
-	bitmap->word_alloc = 32;
+	bitmap->words = xcalloc(word_alloc, sizeof(eword_t));
+	bitmap->word_alloc = word_alloc;
 	return bitmap;
 }
 
+struct bitmap *bitmap_new(void)
+{
+	return bitmap_word_alloc(32);
+}
+
 void bitmap_set(struct bitmap *self, size_t pos)
 {
 	size_t block = EWAH_BLOCK(pos);
 
 	if (block >= self->word_alloc) {
 		size_t old_size = self->word_alloc;
-		self->word_alloc = block * 2;
+		self->word_alloc = block ? block * 2 : 1;
 		REALLOC_ARRAY(self->words, self->word_alloc);
 		memset(self->words + old_size, 0x0,
 			(self->word_alloc - old_size) * sizeof(eword_t));
diff --git a/ewah/ewok.h b/ewah/ewok.h
index 84b2a29faa..1b98b57c8b 100644
--- a/ewah/ewok.h
+++ b/ewah/ewok.h
@@ -172,6 +172,7 @@ struct bitmap {
 };
 
 struct bitmap *bitmap_new(void);
+struct bitmap *bitmap_word_alloc(size_t word_alloc);
 void bitmap_set(struct bitmap *self, size_t pos);
 int bitmap_get(struct bitmap *self, size_t pos);
 void bitmap_reset(struct bitmap *self);
-- 
2.24.1.498.g561400140f


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v4 04/12] pack-bitmap: introduce bitmap_walk_contains()
  2019-12-18 11:25 [PATCH v4 00/12] Rewrite packfile reuse code Christian Couder
                   ` (2 preceding siblings ...)
  2019-12-18 11:25 ` [PATCH v4 03/12] ewah/bitmap: introduce bitmap_word_alloc() Christian Couder
@ 2019-12-18 11:25 ` Christian Couder
  2019-12-18 11:25 ` [PATCH v4 05/12] pack-bitmap: uninteresting oid can be outside bitmapped packfile Christian Couder
                   ` (7 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: Christian Couder @ 2019-12-18 11:25 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Jeff King, Christian Couder, Ramsay Jones,
	Jonathan Tan

From: Jeff King <peff@peff.net>

We will use this helper function in a following commit to
tell us if an object is packed.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
 pack-bitmap.c | 12 ++++++++++++
 pack-bitmap.h |  3 +++
 2 files changed, 15 insertions(+)

diff --git a/pack-bitmap.c b/pack-bitmap.c
index e07c798879..fb4f6297f2 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -830,6 +830,18 @@ int reuse_partial_packfile_from_bitmap(struct bitmap_index *bitmap_git,
 	return 0;
 }
 
+int bitmap_walk_contains(struct bitmap_index *bitmap_git,
+			 struct bitmap *bitmap, const struct object_id *oid)
+{
+	int idx;
+
+	if (!bitmap)
+		return 0;
+
+	idx = bitmap_position(bitmap_git, oid);
+	return idx >= 0 && bitmap_get(bitmap, idx);
+}
+
 void traverse_bitmap_commit_list(struct bitmap_index *bitmap_git,
 				 show_reachable_fn show_reachable)
 {
diff --git a/pack-bitmap.h b/pack-bitmap.h
index 466c5afa09..6ab6033dbe 100644
--- a/pack-bitmap.h
+++ b/pack-bitmap.h
@@ -3,6 +3,7 @@
 
 #include "ewah/ewok.h"
 #include "khash.h"
+#include "pack.h"
 #include "pack-objects.h"
 
 struct commit;
@@ -53,6 +54,8 @@ int reuse_partial_packfile_from_bitmap(struct bitmap_index *,
 int rebuild_existing_bitmaps(struct bitmap_index *, struct packing_data *mapping,
 			     kh_oid_map_t *reused_bitmaps, int show_progress);
 void free_bitmap_index(struct bitmap_index *);
+int bitmap_walk_contains(struct bitmap_index *,
+			 struct bitmap *bitmap, const struct object_id *oid);
 
 /*
  * After a traversal has been performed by prepare_bitmap_walk(), this can be
-- 
2.24.1.498.g561400140f


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v4 05/12] pack-bitmap: uninteresting oid can be outside bitmapped packfile
  2019-12-18 11:25 [PATCH v4 00/12] Rewrite packfile reuse code Christian Couder
                   ` (3 preceding siblings ...)
  2019-12-18 11:25 ` [PATCH v4 04/12] pack-bitmap: introduce bitmap_walk_contains() Christian Couder
@ 2019-12-18 11:25 ` Christian Couder
  2019-12-18 11:25 ` [PATCH v4 06/12] pack-bitmap: simplify bitmap_has_oid_in_uninteresting() Christian Couder
                   ` (6 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: Christian Couder @ 2019-12-18 11:25 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Jeff King, Christian Couder, Ramsay Jones,
	Jonathan Tan

From: Jeff King <peff@peff.net>

bitmap_has_oid_in_uninteresting() only used bitmap_position_packfile(),
not bitmap_position(). So it wouldn't find objects which weren't in the
bitmapped packfile (i.e., ones where we extended the bitmap to handle
loose objects, or objects in other packs).

As we could reuse a delta against such an object it is suboptimal not
to use bitmap_position(), so let's use it instead of
bitmap_position_packfile().

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
 pack-bitmap.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/pack-bitmap.c b/pack-bitmap.c
index fb4f6297f2..de65f2fc36 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -1137,7 +1137,7 @@ int bitmap_has_oid_in_uninteresting(struct bitmap_index *bitmap_git,
 	if (!bitmap_git->haves)
 		return 0; /* walk had no "haves" */
 
-	pos = bitmap_position_packfile(bitmap_git, oid);
+	pos = bitmap_position(bitmap_git, oid);
 	if (pos < 0)
 		return 0;
 
-- 
2.24.1.498.g561400140f


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v4 06/12] pack-bitmap: simplify bitmap_has_oid_in_uninteresting()
  2019-12-18 11:25 [PATCH v4 00/12] Rewrite packfile reuse code Christian Couder
                   ` (4 preceding siblings ...)
  2019-12-18 11:25 ` [PATCH v4 05/12] pack-bitmap: uninteresting oid can be outside bitmapped packfile Christian Couder
@ 2019-12-18 11:25 ` Christian Couder
  2019-12-18 11:25 ` [PATCH v4 07/12] csum-file: introduce hashfile_total() Christian Couder
                   ` (5 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: Christian Couder @ 2019-12-18 11:25 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Jeff King, Christian Couder, Ramsay Jones,
	Jonathan Tan

From: Jeff King <peff@peff.net>

Let's refactor bitmap_has_oid_in_uninteresting() using
bitmap_walk_contains().

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
 pack-bitmap.c | 14 ++------------
 1 file changed, 2 insertions(+), 12 deletions(-)

diff --git a/pack-bitmap.c b/pack-bitmap.c
index de65f2fc36..0bbc651dde 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -1130,16 +1130,6 @@ void free_bitmap_index(struct bitmap_index *b)
 int bitmap_has_oid_in_uninteresting(struct bitmap_index *bitmap_git,
 				    const struct object_id *oid)
 {
-	int pos;
-
-	if (!bitmap_git)
-		return 0; /* no bitmap loaded */
-	if (!bitmap_git->haves)
-		return 0; /* walk had no "haves" */
-
-	pos = bitmap_position(bitmap_git, oid);
-	if (pos < 0)
-		return 0;
-
-	return bitmap_get(bitmap_git->haves, pos);
+	return bitmap_git &&
+		bitmap_walk_contains(bitmap_git, bitmap_git->haves, oid);
 }
-- 
2.24.1.498.g561400140f


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v4 07/12] csum-file: introduce hashfile_total()
  2019-12-18 11:25 [PATCH v4 00/12] Rewrite packfile reuse code Christian Couder
                   ` (5 preceding siblings ...)
  2019-12-18 11:25 ` [PATCH v4 06/12] pack-bitmap: simplify bitmap_has_oid_in_uninteresting() Christian Couder
@ 2019-12-18 11:25 ` Christian Couder
  2019-12-18 11:25 ` [PATCH v4 08/12] pack-objects: introduce pack.allowPackReuse Christian Couder
                   ` (4 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: Christian Couder @ 2019-12-18 11:25 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Jeff King, Christian Couder, Ramsay Jones,
	Jonathan Tan

From: Jeff King <peff@peff.net>

We will need this helper function in a following commit
to give us total number of bytes fed to the hashfile so far.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
 csum-file.h | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/csum-file.h b/csum-file.h
index a98b1eee53..f9cbd317fb 100644
--- a/csum-file.h
+++ b/csum-file.h
@@ -42,6 +42,15 @@ void hashflush(struct hashfile *f);
 void crc32_begin(struct hashfile *);
 uint32_t crc32_end(struct hashfile *);
 
+/*
+ * Returns the total number of bytes fed to the hashfile so far (including ones
+ * that have not been written out to the descriptor yet).
+ */
+static inline off_t hashfile_total(struct hashfile *f)
+{
+	return f->total + f->offset;
+}
+
 static inline void hashwrite_u8(struct hashfile *f, uint8_t data)
 {
 	hashwrite(f, &data, sizeof(data));
-- 
2.24.1.498.g561400140f


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v4 08/12] pack-objects: introduce pack.allowPackReuse
  2019-12-18 11:25 [PATCH v4 00/12] Rewrite packfile reuse code Christian Couder
                   ` (6 preceding siblings ...)
  2019-12-18 11:25 ` [PATCH v4 07/12] csum-file: introduce hashfile_total() Christian Couder
@ 2019-12-18 11:25 ` Christian Couder
  2019-12-18 11:25 ` [PATCH v4 09/12] builtin/pack-objects: introduce obj_is_packed() Christian Couder
                   ` (3 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: Christian Couder @ 2019-12-18 11:25 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Jeff King, Christian Couder, Ramsay Jones,
	Jonathan Tan

From: Jeff King <peff@peff.net>

Let's make it possible to configure if we want pack reuse or not.

The main reason it might not be wanted is probably debugging and
performance testing, though pack reuse _might_ cause larger packs,
because we wouldn't consider the reused objects as bases for
finding new deltas.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
 Documentation/config/pack.txt | 7 +++++++
 builtin/pack-objects.c        | 8 +++++++-
 2 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/Documentation/config/pack.txt b/Documentation/config/pack.txt
index 1d66f0c992..0dac580581 100644
--- a/Documentation/config/pack.txt
+++ b/Documentation/config/pack.txt
@@ -27,6 +27,13 @@ Note that changing the compression level will not automatically recompress
 all existing objects. You can force recompression by passing the -F option
 to linkgit:git-repack[1].
 
+pack.allowPackReuse::
+	When true, and when reachability bitmaps are enabled,
+	pack-objects will try to send parts of the bitmapped packfile
+	verbatim. This can reduce memory and CPU usage to serve fetches,
+	but might result in sending a slightly larger pack. Defaults to
+	true.
+
 pack.island::
 	An extended regular expression configuring a set of delta
 	islands. See "DELTA ISLANDS" in linkgit:git-pack-objects[1]
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 369f46fbc5..801e23dfe7 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -96,6 +96,7 @@ static off_t reuse_packfile_offset;
 
 static int use_bitmap_index_default = 1;
 static int use_bitmap_index = -1;
+static int allow_pack_reuse = 1;
 static enum {
 	WRITE_BITMAP_FALSE = 0,
 	WRITE_BITMAP_QUIET,
@@ -2699,6 +2700,10 @@ static int git_pack_config(const char *k, const char *v, void *cb)
 		use_bitmap_index_default = git_config_bool(k, v);
 		return 0;
 	}
+	if (!strcmp(k, "pack.allowpackreuse")) {
+		allow_pack_reuse = git_config_bool(k, v);
+		return 0;
+	}
 	if (!strcmp(k, "pack.threads")) {
 		delta_search_threads = git_config_int(k, v);
 		if (delta_search_threads < 0)
@@ -3030,7 +3035,8 @@ static void loosen_unused_packed_objects(void)
  */
 static int pack_options_allow_reuse(void)
 {
-	return pack_to_stdout &&
+	return allow_pack_reuse &&
+	       pack_to_stdout &&
 	       allow_ofs_delta &&
 	       !ignore_packed_keep_on_disk &&
 	       !ignore_packed_keep_in_core &&
-- 
2.24.1.498.g561400140f


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v4 09/12] builtin/pack-objects: introduce obj_is_packed()
  2019-12-18 11:25 [PATCH v4 00/12] Rewrite packfile reuse code Christian Couder
                   ` (7 preceding siblings ...)
  2019-12-18 11:25 ` [PATCH v4 08/12] pack-objects: introduce pack.allowPackReuse Christian Couder
@ 2019-12-18 11:25 ` Christian Couder
  2019-12-18 11:25 ` [PATCH v4 10/12] pack-objects: improve partial packfile reuse Christian Couder
                   ` (2 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: Christian Couder @ 2019-12-18 11:25 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Jeff King, Christian Couder, Ramsay Jones,
	Jonathan Tan

From: Jeff King <peff@peff.net>

Let's refactor the way we check if an object is packed by
introducing obj_is_packed(). This function is now a simple
wrapper around packlist_find(), but it will evolve in a
following commit.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
 builtin/pack-objects.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 801e23dfe7..43730154ba 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -2553,6 +2553,11 @@ static void ll_find_deltas(struct object_entry **list, unsigned list_size,
 	free(p);
 }
 
+static int obj_is_packed(const struct object_id *oid)
+{
+	return !!packlist_find(&to_pack, oid);
+}
+
 static void add_tag_chain(const struct object_id *oid)
 {
 	struct tag *tag;
@@ -2564,7 +2569,7 @@ static void add_tag_chain(const struct object_id *oid)
 	 * it was included via bitmaps, we would not have parsed it
 	 * previously).
 	 */
-	if (packlist_find(&to_pack, oid))
+	if (obj_is_packed(oid))
 		return;
 
 	tag = lookup_tag(the_repository, oid);
@@ -2588,7 +2593,7 @@ static int add_ref_tag(const char *path, const struct object_id *oid, int flag,
 
 	if (starts_with(path, "refs/tags/") && /* is a tag? */
 	    !peel_ref(path, &peeled)    && /* peelable? */
-	    packlist_find(&to_pack, &peeled))      /* object packed? */
+	    obj_is_packed(&peeled)) /* object packed? */
 		add_tag_chain(oid);
 	return 0;
 }
-- 
2.24.1.498.g561400140f


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v4 10/12] pack-objects: improve partial packfile reuse
  2019-12-18 11:25 [PATCH v4 00/12] Rewrite packfile reuse code Christian Couder
                   ` (8 preceding siblings ...)
  2019-12-18 11:25 ` [PATCH v4 09/12] builtin/pack-objects: introduce obj_is_packed() Christian Couder
@ 2019-12-18 11:25 ` Christian Couder
  2019-12-18 11:25 ` [PATCH v4 11/12] pack-objects: add checks for duplicate objects Christian Couder
  2019-12-18 11:25 ` [PATCH v4 12/12] pack-bitmap: don't rely on bitmap_git->reuse_objects Christian Couder
  11 siblings, 0 replies; 13+ messages in thread
From: Christian Couder @ 2019-12-18 11:25 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Jeff King, Christian Couder, Ramsay Jones,
	Jonathan Tan

From: Jeff King <peff@peff.net>

The old code to reuse deltas from an existing packfile
just tried to dump a whole segment of the pack verbatim.
That's faster than the traditional way of actually adding
objects to the packing list, but it didn't kick in very
often. This new code is really going for a middle ground:
do _some_ per-object work, but way less than we'd
traditionally do.

The general strategy of the new code is to make a bitmap
of objects from the packfile we'll include, and then
iterate over it, writing out each object exactly as it is
in our on-disk pack, but _not_ adding it to our packlist
(which costs memory, and increases the search space for
deltas).

One complication is that if we're omitting some objects,
we can't set a delta against a base that we're not
sending. So we have to check each object in
try_partial_reuse() to make sure we have its delta.

About performance, in the worst case we might have
interleaved objects that we are sending or not sending,
and we'd have as many chunks as objects. But in practice
we send big chunks.

For instance, packing torvalds/linux on GitHub servers
now reused 6.5M objects, but only needed ~50k chunks.

Helped-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
 builtin/pack-objects.c | 214 ++++++++++++++++++++++++++++++++---------
 pack-bitmap.c          | 150 +++++++++++++++++++++--------
 pack-bitmap.h          |   3 +-
 3 files changed, 281 insertions(+), 86 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 43730154ba..c80c1fac94 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -92,7 +92,7 @@ static struct progress *progress_state;
 
 static struct packed_git *reuse_packfile;
 static uint32_t reuse_packfile_objects;
-static off_t reuse_packfile_offset;
+static struct bitmap *reuse_packfile_bitmap;
 
 static int use_bitmap_index_default = 1;
 static int use_bitmap_index = -1;
@@ -785,57 +785,185 @@ static struct object_entry **compute_write_order(void)
 	return wo;
 }
 
-static off_t write_reused_pack(struct hashfile *f)
+
+/*
+ * A reused set of objects. All objects in a chunk have the same
+ * relative position in the original packfile and the generated
+ * packfile.
+ */
+
+static struct reused_chunk {
+	/* The offset of the first object of this chunk in the original
+	 * packfile. */
+	off_t original;
+	/* The offset of the first object of this chunk in the generated
+	 * packfile minus "original". */
+	off_t difference;
+} *reused_chunks;
+static int reused_chunks_nr;
+static int reused_chunks_alloc;
+
+static void record_reused_object(off_t where, off_t offset)
+{
+	if (reused_chunks_nr && reused_chunks[reused_chunks_nr-1].difference == offset)
+		return;
+
+	ALLOC_GROW(reused_chunks, reused_chunks_nr + 1,
+		   reused_chunks_alloc);
+	reused_chunks[reused_chunks_nr].original = where;
+	reused_chunks[reused_chunks_nr].difference = offset;
+	reused_chunks_nr++;
+}
+
+/*
+ * Binary search to find the chunk that "where" is in. Note
+ * that we're not looking for an exact match, just the first
+ * chunk that contains it (which implicitly ends at the start
+ * of the next chunk.
+ */
+static off_t find_reused_offset(off_t where)
 {
-	unsigned char buffer[8192];
-	off_t to_write, total;
-	int fd;
+	int lo = 0, hi = reused_chunks_nr;
+	while (lo < hi) {
+		int mi = lo + ((hi - lo) / 2);
+		if (where == reused_chunks[mi].original)
+			return reused_chunks[mi].difference;
+		if (where < reused_chunks[mi].original)
+			hi = mi;
+		else
+			lo = mi + 1;
+	}
 
-	if (!is_pack_valid(reuse_packfile))
-		die(_("packfile is invalid: %s"), reuse_packfile->pack_name);
+	/*
+	 * The first chunk starts at zero, so we can't have gone below
+	 * there.
+	 */
+	assert(lo);
+	return reused_chunks[lo-1].difference;
+}
 
-	fd = git_open(reuse_packfile->pack_name);
-	if (fd < 0)
-		die_errno(_("unable to open packfile for reuse: %s"),
-			  reuse_packfile->pack_name);
+static void write_reused_pack_one(size_t pos, struct hashfile *out,
+				  struct pack_window **w_curs)
+{
+	off_t offset, next, cur;
+	enum object_type type;
+	unsigned long size;
 
-	if (lseek(fd, sizeof(struct pack_header), SEEK_SET) == -1)
-		die_errno(_("unable to seek in reused packfile"));
+	offset = reuse_packfile->revindex[pos].offset;
+	next = reuse_packfile->revindex[pos + 1].offset;
 
-	if (reuse_packfile_offset < 0)
-		reuse_packfile_offset = reuse_packfile->pack_size - the_hash_algo->rawsz;
+	record_reused_object(offset, offset - hashfile_total(out));
 
-	total = to_write = reuse_packfile_offset - sizeof(struct pack_header);
+	cur = offset;
+	type = unpack_object_header(reuse_packfile, w_curs, &cur, &size);
+	assert(type >= 0);
 
-	while (to_write) {
-		int read_pack = xread(fd, buffer, sizeof(buffer));
+	if (type == OBJ_OFS_DELTA) {
+		off_t base_offset;
+		off_t fixup;
+
+		unsigned char header[MAX_PACK_OBJECT_HEADER];
+		unsigned len;
+
+		base_offset = get_delta_base(reuse_packfile, w_curs, &cur, type, offset);
+		assert(base_offset != 0);
+
+		/* Convert to REF_DELTA if we must... */
+		if (!allow_ofs_delta) {
+			int base_pos = find_revindex_position(reuse_packfile, base_offset);
+			const unsigned char *base_sha1 =
+				nth_packed_object_sha1(reuse_packfile,
+						       reuse_packfile->revindex[base_pos].nr);
+
+			len = encode_in_pack_object_header(header, sizeof(header),
+							   OBJ_REF_DELTA, size);
+			hashwrite(out, header, len);
+			hashwrite(out, base_sha1, 20);
+			copy_pack_data(out, reuse_packfile, w_curs, cur, next - cur);
+			return;
+		}
 
-		if (read_pack <= 0)
-			die_errno(_("unable to read from reused packfile"));
+		/* Otherwise see if we need to rewrite the offset... */
+		fixup = find_reused_offset(offset) -
+			find_reused_offset(base_offset);
+		if (fixup) {
+			unsigned char ofs_header[10];
+			unsigned i, ofs_len;
+			off_t ofs = offset - base_offset - fixup;
 
-		if (read_pack > to_write)
-			read_pack = to_write;
+			len = encode_in_pack_object_header(header, sizeof(header),
+							   OBJ_OFS_DELTA, size);
 
-		hashwrite(f, buffer, read_pack);
-		to_write -= read_pack;
+			i = sizeof(ofs_header) - 1;
+			ofs_header[i] = ofs & 127;
+			while (ofs >>= 7)
+				ofs_header[--i] = 128 | (--ofs & 127);
+
+			ofs_len = sizeof(ofs_header) - i;
+
+			hashwrite(out, header, len);
+			hashwrite(out, ofs_header + sizeof(ofs_header) - ofs_len, ofs_len);
+			copy_pack_data(out, reuse_packfile, w_curs, cur, next - cur);
+			return;
+		}
+
+		/* ...otherwise we have no fixup, and can write it verbatim */
+	}
+
+	copy_pack_data(out, reuse_packfile, w_curs, offset, next - offset);
+}
+
+static size_t write_reused_pack_verbatim(struct hashfile *out,
+					 struct pack_window **w_curs)
+{
+	size_t pos = 0;
+
+	while (pos < reuse_packfile_bitmap->word_alloc &&
+			reuse_packfile_bitmap->words[pos] == (eword_t)~0)
+		pos++;
+
+	if (pos) {
+		off_t to_write;
+
+		written = (pos * BITS_IN_EWORD);
+		to_write = reuse_packfile->revindex[written].offset
+			- sizeof(struct pack_header);
+
+		/* We're recording one chunk, not one object. */
+		record_reused_object(sizeof(struct pack_header), 0);
+		hashflush(out);
+		copy_pack_data(out, reuse_packfile, w_curs,
+			sizeof(struct pack_header), to_write);
 
-		/*
-		 * We don't know the actual number of objects written,
-		 * only how many bytes written, how many bytes total, and
-		 * how many objects total. So we can fake it by pretending all
-		 * objects we are writing are the same size. This gives us a
-		 * smooth progress meter, and at the end it matches the true
-		 * answer.
-		 */
-		written = reuse_packfile_objects *
-				(((double)(total - to_write)) / total);
 		display_progress(progress_state, written);
 	}
+	return pos;
+}
+
+static void write_reused_pack(struct hashfile *f)
+{
+	size_t i = 0;
+	uint32_t offset;
+	struct pack_window *w_curs = NULL;
+
+	if (allow_ofs_delta)
+		i = write_reused_pack_verbatim(f, &w_curs);
+
+	for (; i < reuse_packfile_bitmap->word_alloc; ++i) {
+		eword_t word = reuse_packfile_bitmap->words[i];
+		size_t pos = (i * BITS_IN_EWORD);
+
+		for (offset = 0; offset < BITS_IN_EWORD; ++offset) {
+			if ((word >> offset) == 0)
+				break;
+
+			offset += ewah_bit_ctz64(word >> offset);
+			write_reused_pack_one(pos + offset, f, &w_curs);
+			display_progress(progress_state, ++written);
+		}
+	}
 
-	close(fd);
-	written = reuse_packfile_objects;
-	display_progress(progress_state, written);
-	return reuse_packfile_offset - sizeof(struct pack_header);
+	unuse_pack(&w_curs);
 }
 
 static const char no_split_warning[] = N_(
@@ -868,11 +996,9 @@ static void write_pack_file(void)
 		offset = write_pack_header(f, nr_remaining);
 
 		if (reuse_packfile) {
-			off_t packfile_size;
 			assert(pack_to_stdout);
-
-			packfile_size = write_reused_pack(f);
-			offset += packfile_size;
+			write_reused_pack(f);
+			offset = hashfile_total(f);
 		}
 
 		nr_written = 0;
@@ -2661,6 +2787,7 @@ static void prepare_pack(int window, int depth)
 
 	if (nr_deltas && n > 1) {
 		unsigned nr_done = 0;
+
 		if (progress)
 			progress_state = start_progress(_("Compressing objects"),
 							nr_deltas);
@@ -3042,7 +3169,6 @@ static int pack_options_allow_reuse(void)
 {
 	return allow_pack_reuse &&
 	       pack_to_stdout &&
-	       allow_ofs_delta &&
 	       !ignore_packed_keep_on_disk &&
 	       !ignore_packed_keep_in_core &&
 	       (!local || !have_non_local_packs) &&
@@ -3059,7 +3185,7 @@ static int get_object_list_from_bitmap(struct rev_info *revs)
 			bitmap_git,
 			&reuse_packfile,
 			&reuse_packfile_objects,
-			&reuse_packfile_offset)) {
+			&reuse_packfile_bitmap)) {
 		assert(reuse_packfile_objects);
 		nr_result += reuse_packfile_objects;
 		display_progress(progress_state, nr_result);
diff --git a/pack-bitmap.c b/pack-bitmap.c
index 0bbc651dde..a1d43d367a 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -326,6 +326,13 @@ static int load_pack_bitmap(struct bitmap_index *bitmap_git)
 	munmap(bitmap_git->map, bitmap_git->map_size);
 	bitmap_git->map = NULL;
 	bitmap_git->map_size = 0;
+
+	kh_destroy_oid_map(bitmap_git->bitmaps);
+	bitmap_git->bitmaps = NULL;
+
+	kh_destroy_oid_pos(bitmap_git->ext_index.positions);
+	bitmap_git->ext_index.positions = NULL;
+
 	return -1;
 }
 
@@ -768,65 +775,126 @@ struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs)
 	return NULL;
 }
 
-int reuse_partial_packfile_from_bitmap(struct bitmap_index *bitmap_git,
-				       struct packed_git **packfile,
-				       uint32_t *entries,
-				       off_t *up_to)
+static void try_partial_reuse(struct bitmap_index *bitmap_git,
+			      size_t pos,
+			      struct bitmap *reuse,
+			      struct pack_window **w_curs)
 {
+	struct revindex_entry *revidx;
+	off_t offset;
+	enum object_type type;
+	unsigned long size;
+
+	if (pos >= bitmap_git->pack->num_objects)
+		return; /* not actually in the pack */
+
+	revidx = &bitmap_git->pack->revindex[pos];
+	offset = revidx->offset;
+	type = unpack_object_header(bitmap_git->pack, w_curs, &offset, &size);
+	if (type < 0)
+		return; /* broken packfile, punt */
+
+	if (type == OBJ_REF_DELTA || type == OBJ_OFS_DELTA) {
+		off_t base_offset;
+		int base_pos;
+
+		/*
+		 * Find the position of the base object so we can look it up
+		 * in our bitmaps. If we can't come up with an offset, or if
+		 * that offset is not in the revidx, the pack is corrupt.
+		 * There's nothing we can do, so just punt on this object,
+		 * and the normal slow path will complain about it in
+		 * more detail.
+		 */
+		base_offset = get_delta_base(bitmap_git->pack, w_curs,
+					     &offset, type, revidx->offset);
+		if (!base_offset)
+			return;
+		base_pos = find_revindex_position(bitmap_git->pack, base_offset);
+		if (base_pos < 0)
+			return;
+
+		/*
+		 * We assume delta dependencies always point backwards. This
+		 * lets us do a single pass, and is basically always true
+		 * due to the way OFS_DELTAs work. You would not typically
+		 * find REF_DELTA in a bitmapped pack, since we only bitmap
+		 * packs we write fresh, and OFS_DELTA is the default). But
+		 * let's double check to make sure the pack wasn't written with
+		 * odd parameters.
+		 */
+		if (base_pos >= pos)
+			return;
+
+		/*
+		 * And finally, if we're not sending the base as part of our
+		 * reuse chunk, then don't send this object either. The base
+		 * would come after us, along with other objects not
+		 * necessarily in the pack, which means we'd need to convert
+		 * to REF_DELTA on the fly. Better to just let the normal
+		 * object_entry code path handle it.
+		 */
+		if (!bitmap_get(reuse, base_pos))
+			return;
+	}
+
 	/*
-	 * Reuse the packfile content if we need more than
-	 * 90% of its objects
+	 * If we got here, then the object is OK to reuse. Mark it.
 	 */
-	static const double REUSE_PERCENT = 0.9;
+	bitmap_set(reuse, pos);
+}
 
+int reuse_partial_packfile_from_bitmap(struct bitmap_index *bitmap_git,
+				       struct packed_git **packfile_out,
+				       uint32_t *entries,
+				       struct bitmap **reuse_out)
+{
 	struct bitmap *result = bitmap_git->result;
-	uint32_t reuse_threshold;
-	uint32_t i, reuse_objects = 0;
+	struct bitmap *reuse;
+	struct pack_window *w_curs = NULL;
+	size_t i = 0;
+	uint32_t offset;
 
 	assert(result);
 
-	for (i = 0; i < result->word_alloc; ++i) {
-		if (result->words[i] != (eword_t)~0) {
-			reuse_objects += ewah_bit_ctz64(~result->words[i]);
-			break;
-		}
+	while (i < result->word_alloc && result->words[i] == (eword_t)~0)
+		i++;
 
-		reuse_objects += BITS_IN_EWORD;
-	}
+	/* Don't mark objects not in the packfile */
+	if (i > bitmap_git->pack->num_objects / BITS_IN_EWORD)
+		i = bitmap_git->pack->num_objects / BITS_IN_EWORD;
 
-#ifdef GIT_BITMAP_DEBUG
-	{
-		const unsigned char *sha1;
-		struct revindex_entry *entry;
+	reuse = bitmap_word_alloc(i);
+	memset(reuse->words, 0xFF, i * sizeof(eword_t));
 
-		entry = &bitmap_git->reverse_index->revindex[reuse_objects];
-		sha1 = nth_packed_object_sha1(bitmap_git->pack, entry->nr);
-
-		fprintf(stderr, "Failed to reuse at %d (%016llx)\n",
-			reuse_objects, result->words[i]);
-		fprintf(stderr, " %s\n", hash_to_hex(sha1));
-	}
-#endif
+	for (; i < result->word_alloc; ++i) {
+		eword_t word = result->words[i];
+		size_t pos = (i * BITS_IN_EWORD);
 
-	if (!reuse_objects)
-		return -1;
+		for (offset = 0; offset < BITS_IN_EWORD; ++offset) {
+			if ((word >> offset) == 0)
+				break;
 
-	if (reuse_objects >= bitmap_git->pack->num_objects) {
-		bitmap_git->reuse_objects = *entries = bitmap_git->pack->num_objects;
-		*up_to = -1; /* reuse the full pack */
-		*packfile = bitmap_git->pack;
-		return 0;
+			offset += ewah_bit_ctz64(word >> offset);
+			try_partial_reuse(bitmap_git, pos + offset, reuse, &w_curs);
+		}
 	}
 
-	reuse_threshold = bitmap_popcount(bitmap_git->result) * REUSE_PERCENT;
+	unuse_pack(&w_curs);
 
-	if (reuse_objects < reuse_threshold)
+	*entries = bitmap_popcount(reuse);
+	if (!*entries) {
+		bitmap_free(reuse);
 		return -1;
+	}
 
-	bitmap_git->reuse_objects = *entries = reuse_objects;
-	*up_to = bitmap_git->pack->revindex[reuse_objects].offset;
-	*packfile = bitmap_git->pack;
-
+	/*
+	 * Drop any reused objects from the result, since they will not
+	 * need to be handled separately.
+	 */
+	bitmap_and_not(result, reuse);
+	*packfile_out = bitmap_git->pack;
+	*reuse_out = reuse;
 	return 0;
 }
 
diff --git a/pack-bitmap.h b/pack-bitmap.h
index 6ab6033dbe..bcd03b8993 100644
--- a/pack-bitmap.h
+++ b/pack-bitmap.h
@@ -50,7 +50,8 @@ void test_bitmap_walk(struct rev_info *revs);
 struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs);
 int reuse_partial_packfile_from_bitmap(struct bitmap_index *,
 				       struct packed_git **packfile,
-				       uint32_t *entries, off_t *up_to);
+				       uint32_t *entries,
+				       struct bitmap **reuse_out);
 int rebuild_existing_bitmaps(struct bitmap_index *, struct packing_data *mapping,
 			     kh_oid_map_t *reused_bitmaps, int show_progress);
 void free_bitmap_index(struct bitmap_index *);
-- 
2.24.1.498.g561400140f


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v4 11/12] pack-objects: add checks for duplicate objects
  2019-12-18 11:25 [PATCH v4 00/12] Rewrite packfile reuse code Christian Couder
                   ` (9 preceding siblings ...)
  2019-12-18 11:25 ` [PATCH v4 10/12] pack-objects: improve partial packfile reuse Christian Couder
@ 2019-12-18 11:25 ` Christian Couder
  2019-12-18 11:25 ` [PATCH v4 12/12] pack-bitmap: don't rely on bitmap_git->reuse_objects Christian Couder
  11 siblings, 0 replies; 13+ messages in thread
From: Christian Couder @ 2019-12-18 11:25 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Jeff King, Christian Couder, Ramsay Jones,
	Jonathan Tan

From: Jeff King <peff@peff.net>

Additional checks are added in have_duplicate_entry() and
obj_is_packed() to avoid duplicate objects in the reuse
bitmap. It was probably buggy to not have such a check
before.

Git as a client would never both asks for a tag by sha1 and
specify "include-tag", but libgit2 will, so a libgit2 client
cloning from a Git server would trigger the bug.

If a client both asks for a tag by sha1 and specifies
"include-tag", we may end up including the tag in the reuse
bitmap (due to the first thing), and then later adding it to
the packlist (due to the second). This results in duplicate
objects in the pack, which git chokes on. We should notice
that we are already including it when doing the include-tag
portion, and avoid adding it to the packlist.

The simplest place to fix this is right in add_ref_tag(),
where we could avoid peeling the tag at all if we know that
we are already including it. However, this pushes the check
instead into have_duplicate_entry(). This fixes not only
this case, but also means that we cannot have any similar
problems lurking in other code.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
 builtin/pack-objects.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index c80c1fac94..b1998202fb 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -1127,6 +1127,10 @@ static int have_duplicate_entry(const struct object_id *oid,
 {
 	struct object_entry *entry;
 
+	if (reuse_packfile_bitmap &&
+	    bitmap_walk_contains(bitmap_git, reuse_packfile_bitmap, oid))
+		return 1;
+
 	entry = packlist_find(&to_pack, oid);
 	if (!entry)
 		return 0;
@@ -2681,7 +2685,9 @@ static void ll_find_deltas(struct object_entry **list, unsigned list_size,
 
 static int obj_is_packed(const struct object_id *oid)
 {
-	return !!packlist_find(&to_pack, oid);
+	return packlist_find(&to_pack, oid) ||
+		(reuse_packfile_bitmap &&
+		 bitmap_walk_contains(bitmap_git, reuse_packfile_bitmap, oid));
 }
 
 static void add_tag_chain(const struct object_id *oid)
-- 
2.24.1.498.g561400140f


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v4 12/12] pack-bitmap: don't rely on bitmap_git->reuse_objects
  2019-12-18 11:25 [PATCH v4 00/12] Rewrite packfile reuse code Christian Couder
                   ` (10 preceding siblings ...)
  2019-12-18 11:25 ` [PATCH v4 11/12] pack-objects: add checks for duplicate objects Christian Couder
@ 2019-12-18 11:25 ` Christian Couder
  11 siblings, 0 replies; 13+ messages in thread
From: Christian Couder @ 2019-12-18 11:25 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Jeff King, Christian Couder, Ramsay Jones,
	Jonathan Tan

From: Jeff King <peff@peff.net>

We no longer compute bitmap_git->reuse_objects, so we
cannot rely on it anymore to terminate the loop early;
we have to iterate to the end.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
 pack-bitmap.c | 18 +++++++-----------
 1 file changed, 7 insertions(+), 11 deletions(-)

diff --git a/pack-bitmap.c b/pack-bitmap.c
index a1d43d367a..5a8689cdf8 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -629,7 +629,7 @@ static void show_objects_for_type(
 	enum object_type object_type,
 	show_reachable_fn show_reach)
 {
-	size_t pos = 0, i = 0;
+	size_t i = 0;
 	uint32_t offset;
 
 	struct ewah_iterator it;
@@ -637,13 +637,15 @@ static void show_objects_for_type(
 
 	struct bitmap *objects = bitmap_git->result;
 
-	if (bitmap_git->reuse_objects == bitmap_git->pack->num_objects)
-		return;
-
 	ewah_iterator_init(&it, type_filter);
 
-	while (i < objects->word_alloc && ewah_iterator_next(&filter, &it)) {
+	for (i = 0; i < objects->word_alloc &&
+			ewah_iterator_next(&filter, &it); i++) {
 		eword_t word = objects->words[i] & filter;
+		size_t pos = (i * BITS_IN_EWORD);
+
+		if (!word)
+			continue;
 
 		for (offset = 0; offset < BITS_IN_EWORD; ++offset) {
 			struct object_id oid;
@@ -655,9 +657,6 @@ static void show_objects_for_type(
 
 			offset += ewah_bit_ctz64(word >> offset);
 
-			if (pos + offset < bitmap_git->reuse_objects)
-				continue;
-
 			entry = &bitmap_git->pack->revindex[pos + offset];
 			nth_packed_object_oid(&oid, bitmap_git->pack, entry->nr);
 
@@ -666,9 +665,6 @@ static void show_objects_for_type(
 
 			show_reach(&oid, object_type, 0, hash, bitmap_git->pack, entry->offset);
 		}
-
-		pos += BITS_IN_EWORD;
-		i++;
 	}
 }
 
-- 
2.24.1.498.g561400140f


^ permalink raw reply related	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2019-12-18 11:26 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-12-18 11:25 [PATCH v4 00/12] Rewrite packfile reuse code Christian Couder
2019-12-18 11:25 ` [PATCH v4 01/12] builtin/pack-objects: report reused packfile objects Christian Couder
2019-12-18 11:25 ` [PATCH v4 02/12] packfile: expose get_delta_base() Christian Couder
2019-12-18 11:25 ` [PATCH v4 03/12] ewah/bitmap: introduce bitmap_word_alloc() Christian Couder
2019-12-18 11:25 ` [PATCH v4 04/12] pack-bitmap: introduce bitmap_walk_contains() Christian Couder
2019-12-18 11:25 ` [PATCH v4 05/12] pack-bitmap: uninteresting oid can be outside bitmapped packfile Christian Couder
2019-12-18 11:25 ` [PATCH v4 06/12] pack-bitmap: simplify bitmap_has_oid_in_uninteresting() Christian Couder
2019-12-18 11:25 ` [PATCH v4 07/12] csum-file: introduce hashfile_total() Christian Couder
2019-12-18 11:25 ` [PATCH v4 08/12] pack-objects: introduce pack.allowPackReuse Christian Couder
2019-12-18 11:25 ` [PATCH v4 09/12] builtin/pack-objects: introduce obj_is_packed() Christian Couder
2019-12-18 11:25 ` [PATCH v4 10/12] pack-objects: improve partial packfile reuse Christian Couder
2019-12-18 11:25 ` [PATCH v4 11/12] pack-objects: add checks for duplicate objects Christian Couder
2019-12-18 11:25 ` [PATCH v4 12/12] pack-bitmap: don't rely on bitmap_git->reuse_objects Christian Couder

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).