git@vger.kernel.org list mirror (unofficial, one of many)
 help / color / Atom feed
* [PATCH 0/3] some prune optimizations
@ 2019-02-14  4:31 Jeff King
  2019-02-14  4:35 ` [PATCH 1/3] prune: lazily perform reachability traversal Jeff King
                   ` (2 more replies)
  0 siblings, 3 replies; 41+ messages in thread
From: Jeff King @ 2019-02-14  4:31 UTC (permalink / raw)
  To: git

Here are two optimizations for git-prune that we've been running at
GitHub for ages. No particular reason to share them now; they just
finally floated to the top of my todo pile. Do note that I rebased and
polished them (and the third one is brand new), so the concepts are
proven, but it's possible I introduced a new bug. ;)

  [1/3]: prune: lazily perform reachability traversal
  [2/3]: prune: use bitmaps for reachability traversal
  [3/3]: prune: check SEEN flag for reachability

 builtin/prune.c       | 44 +++++++++++++++++++++++++++++++------------
 reachable.c           | 42 +++++++++++++++++++++++++++++++++++++++++
 t/perf/p5304-prune.sh | 35 ++++++++++++++++++++++++++++++++++
 t/t5304-prune.sh      | 12 ++++++++++++
 4 files changed, 121 insertions(+), 12 deletions(-)
 create mode 100755 t/perf/p5304-prune.sh

-Peff

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH 1/3] prune: lazily perform reachability traversal
  2019-02-14  4:31 [PATCH 0/3] some prune optimizations Jeff King
@ 2019-02-14  4:35 ` Jeff King
  2019-02-14 10:54   ` Eric Sunshine
  2019-02-14  4:37 ` [PATCH 2/3] prune: use bitmaps for " Jeff King
  2019-02-14  4:38 ` [PATCH 3/3] prune: check SEEN flag for reachability Jeff King
  2 siblings, 1 reply; 41+ messages in thread
From: Jeff King @ 2019-02-14  4:35 UTC (permalink / raw)
  To: git

The general strategy of "git prune" is to do a full reachability walk,
then for each loose object see if we found it in our walk. But if we
don't have any loose objects, we don't need to do the expensive walk in
the first place.

This patch postpones that walk until the first time we need to see its
results.

Note that this is really a specific case of a more general optimization,
which is that we could traverse only far enough to find the object under
consideration (i.e., stop the traversal when we find it, then pick up
again when asked about the next object, etc). That could save us in some
instances from having to do a full walk. But it's actually a bit tricky
to do with our traversal code, and you'd need to do a full walk anyway
if you have even a single unreachable object (which you generally do, if
any objects are actually left after running git-repack).

So in practice this lazy-load of the full walk catches one easy but
common case (i.e., you've just repacked via git-gc, and there's nothing
unreachable).

The perf script is fairly contrived, but it does show off the
improvement:

  Test                            HEAD^             HEAD
  -------------------------------------------------------------------------
  5304.4: prune with no objects   3.66(3.60+0.05)   0.00(0.00+0.00) -100.0%

and would let us know if we accidentally regress this optimization.

Note also that we need to take special care with prune_shallow(), which
relies on us having performed the traversal. So this optimization can
only kick in for a non-shallow repository. Since this is easy to get
wrong and is not covered by existing tests, let's add an extra test to
t5304 that covers this case explicitly.

Signed-off-by: Jeff King <peff@peff.net>
---
The diff looks nice with --color-moved. I wish there was a way to
communicate that information in a plaintext email.

 builtin/prune.c       | 43 ++++++++++++++++++++++++++++++++-----------
 t/perf/p5304-prune.sh | 24 ++++++++++++++++++++++++
 t/t5304-prune.sh      | 12 ++++++++++++
 3 files changed, 68 insertions(+), 11 deletions(-)
 create mode 100755 t/perf/p5304-prune.sh

diff --git a/builtin/prune.c b/builtin/prune.c
index 1ec9ddd751..04b6573945 100644
--- a/builtin/prune.c
+++ b/builtin/prune.c
@@ -31,16 +31,40 @@ static int prune_tmp_file(const char *fullpath)
 	return 0;
 }
 
-static int prune_object(const struct object_id *oid, const char *fullpath,
-			void *data)
+static void perform_reachability_traversal(struct rev_info *revs)
 {
-	struct stat st;
+	static int initialized;
+	struct progress *progress = NULL;
+
+	if (initialized)
+		return;
+
+	if (show_progress)
+		progress = start_delayed_progress(_("Checking connectivity"), 0);
+	mark_reachable_objects(revs, 1, expire, progress);
+	stop_progress(&progress);
+	initialized = 1;
+}
+
+static int is_object_reachable(const struct object_id *oid,
+			       struct rev_info *revs)
+{
+	perform_reachability_traversal(revs);
 
 	/*
 	 * Do we know about this object?
 	 * It must have been reachable
 	 */
-	if (lookup_object(the_repository, oid->hash))
+	return !!lookup_object(the_repository, oid->hash);
+}
+
+static int prune_object(const struct object_id *oid, const char *fullpath,
+			void *data)
+{
+	struct rev_info *revs = data;
+	struct stat st;
+
+	if (is_object_reachable(oid, revs))
 		return 0;
 
 	if (lstat(fullpath, &st)) {
@@ -102,7 +126,6 @@ static void remove_temporary_files(const char *path)
 int cmd_prune(int argc, const char **argv, const char *prefix)
 {
 	struct rev_info revs;
-	struct progress *progress = NULL;
 	int exclude_promisor_objects = 0;
 	const struct option options[] = {
 		OPT__DRY_RUN(&show_only, N_("do not remove, show only")),
@@ -142,17 +165,13 @@ int cmd_prune(int argc, const char **argv, const char *prefix)
 
 	if (show_progress == -1)
 		show_progress = isatty(2);
-	if (show_progress)
-		progress = start_delayed_progress(_("Checking connectivity"), 0);
 	if (exclude_promisor_objects) {
 		fetch_if_missing = 0;
 		revs.exclude_promisor_objects = 1;
 	}
 
-	mark_reachable_objects(&revs, 1, expire, progress);
-	stop_progress(&progress);
 	for_each_loose_file_in_objdir(get_object_directory(), prune_object,
-				      prune_cruft, prune_subdir, NULL);
+				      prune_cruft, prune_subdir, &revs);
 
 	prune_packed_objects(show_only ? PRUNE_PACKED_DRY_RUN : 0);
 	remove_temporary_files(get_object_directory());
@@ -160,8 +179,10 @@ int cmd_prune(int argc, const char **argv, const char *prefix)
 	remove_temporary_files(s);
 	free(s);
 
-	if (is_repository_shallow(the_repository))
+	if (is_repository_shallow(the_repository)) {
+		perform_reachability_traversal(&revs);
 		prune_shallow(show_only ? PRUNE_SHOW_ONLY : 0);
+	}
 
 	return 0;
 }
diff --git a/t/perf/p5304-prune.sh b/t/perf/p5304-prune.sh
new file mode 100755
index 0000000000..3c852084eb
--- /dev/null
+++ b/t/perf/p5304-prune.sh
@@ -0,0 +1,24 @@
+#!/bin/sh
+
+test_description='performance tests of prune'
+. ./perf-lib.sh
+
+test_perf_default_repo
+
+test_expect_success 'remove reachable loose objects' '
+	git repack -ad
+'
+
+test_expect_success 'remove unreachable loose objects' '
+	git prune
+'
+
+test_expect_success 'confirm there are no loose objects' '
+	git count-objects | grep ^0
+'
+
+test_perf 'prune with no objects' '
+	git prune
+'
+
+test_done
diff --git a/t/t5304-prune.sh b/t/t5304-prune.sh
index 270da21ac3..2c19a790c1 100755
--- a/t/t5304-prune.sh
+++ b/t/t5304-prune.sh
@@ -274,6 +274,18 @@ test_expect_success 'prune .git/shallow' '
 	test_path_is_missing .git/shallow
 '
 
+test_expect_success 'prune .git/shallow when there are no loose objects' '
+	SHA1=$(echo hi|git commit-tree HEAD^{tree}) &&
+	echo $SHA1 >.git/shallow &&
+	git update-ref refs/heads/shallow-tip $SHA1 &&
+	git repack -ad &&
+	# verify assumption that all loose objects are gone
+	git count-objects | grep ^0 &&
+	git prune &&
+	echo $SHA1 >expect &&
+	test_cmp expect .git/shallow
+'
+
 test_expect_success 'prune: handle alternate object database' '
 	test_create_repo A &&
 	git -C A commit --allow-empty -m "initial commit" &&
-- 
2.21.0.rc0.586.gffba1126a0


^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH 2/3] prune: use bitmaps for reachability traversal
  2019-02-14  4:31 [PATCH 0/3] some prune optimizations Jeff King
  2019-02-14  4:35 ` [PATCH 1/3] prune: lazily perform reachability traversal Jeff King
@ 2019-02-14  4:37 ` " Jeff King
  2019-03-09  2:49   ` bitmaps by default? [was: prune: use bitmaps for reachability traversal] Eric Wong
  2019-04-15 15:00   ` [PATCH 2/3] prune: use bitmaps for reachability traversal Derrick Stolee
  2019-02-14  4:38 ` [PATCH 3/3] prune: check SEEN flag for reachability Jeff King
  2 siblings, 2 replies; 41+ messages in thread
From: Jeff King @ 2019-02-14  4:37 UTC (permalink / raw)
  To: git

Pruning generally has to traverse the whole commit graph in order to
see which objects are reachable. This is the exact problem that
reachability bitmaps were meant to solve, so let's use them (if they're
available, of course).

Here are timings on git.git:

  Test                            HEAD^             HEAD
  ------------------------------------------------------------------------
  5304.6: prune with bitmaps      3.65(3.56+0.09)   1.01(0.92+0.08) -72.3%

And on linux.git:

  Test                            HEAD^               HEAD
  --------------------------------------------------------------------------
  5304.6: prune with bitmaps      35.05(34.79+0.23)   3.00(2.78+0.21) -91.4%

The tests show a pretty optimal case, as we'll have just repacked and
should have pretty good coverage of all refs with our bitmaps. But
that's actually pretty realistic: normally prune is run via "gc" right
after repacking.

A few notes on the implementation:

  - the change is actually in reachable.c, so it would improve
    reachability traversals by "reflog expire --stale-fix", as well.
    Those aren't performed regularly, though (a normal "git gc" doesn't
    use --stale-fix), so they're not really worth measuring. There's a
    low chance of regressing that caller, since the use of bitmaps is
    totally transparent from the caller's perspective.

  - The bitmap case could actually get away without creating a "struct
    object", and instead the caller could just look up each object id in
    the bitmap result. However, this would be a marginal improvement in
    runtime, and it would make the callers much more complicated. They'd
    have to handle both the bitmap and non-bitmap cases separately, and
    in the case of git-prune, we'd also have to tweak prune_shallow(),
    which relies on our SEEN flags.

  - Because we do create real object structs, we go through a few
    contortions to create ones of the right type. This isn't strictly
    necessary (lookup_unknown_object() would suffice), but it's more
    memory efficient to use the correct types, since we already know
    them.

Signed-off-by: Jeff King <peff@peff.net>
---
 reachable.c           | 42 ++++++++++++++++++++++++++++++++++++++++++
 t/perf/p5304-prune.sh | 11 +++++++++++
 2 files changed, 53 insertions(+)

diff --git a/reachable.c b/reachable.c
index 6e9b810d2a..0d00a91de4 100644
--- a/reachable.c
+++ b/reachable.c
@@ -12,6 +12,7 @@
 #include "packfile.h"
 #include "worktree.h"
 #include "object-store.h"
+#include "pack-bitmap.h"
 
 struct connectivity_progress {
 	struct progress *progress;
@@ -158,10 +159,44 @@ int add_unseen_recent_objects_to_traversal(struct rev_info *revs,
 				      FOR_EACH_OBJECT_LOCAL_ONLY);
 }
 
+static void *lookup_object_by_type(struct repository *r,
+				   const struct object_id *oid,
+				   enum object_type type)
+{
+	switch (type) {
+	case OBJ_COMMIT:
+		return lookup_commit(r, oid);
+	case OBJ_TREE:
+		return lookup_tree(r, oid);
+	case OBJ_TAG:
+		return lookup_tag(r, oid);
+	case OBJ_BLOB:
+		return lookup_blob(r, oid);
+	default:
+		die("BUG: unknown object type %d", type);
+	}
+}
+
+static int mark_object_seen(const struct object_id *oid,
+			     enum object_type type,
+			     int exclude,
+			     uint32_t name_hash,
+			     struct packed_git *found_pack,
+			     off_t found_offset)
+{
+	struct object *obj = lookup_object_by_type(the_repository, oid, type);
+	if (!obj)
+		die("unable to create object '%s'", oid_to_hex(oid));
+
+	obj->flags |= SEEN;
+	return 0;
+}
+
 void mark_reachable_objects(struct rev_info *revs, int mark_reflog,
 			    timestamp_t mark_recent, struct progress *progress)
 {
 	struct connectivity_progress cp;
+	struct bitmap_index *bitmap_git;
 
 	/*
 	 * Set up revision parsing, and mark us as being interested
@@ -188,6 +223,13 @@ void mark_reachable_objects(struct rev_info *revs, int mark_reflog,
 	cp.progress = progress;
 	cp.count = 0;
 
+	bitmap_git = prepare_bitmap_walk(revs);
+	if (bitmap_git) {
+		traverse_bitmap_commit_list(bitmap_git, mark_object_seen);
+		free_bitmap_index(bitmap_git);
+		return;
+	}
+
 	/*
 	 * Set up the revision walk - this will move all commits
 	 * from the pending list to the commit walking list.
diff --git a/t/perf/p5304-prune.sh b/t/perf/p5304-prune.sh
index 3c852084eb..83baedb8a4 100755
--- a/t/perf/p5304-prune.sh
+++ b/t/perf/p5304-prune.sh
@@ -21,4 +21,15 @@ test_perf 'prune with no objects' '
 	git prune
 '
 
+test_expect_success 'repack with bitmaps' '
+	git repack -adb
+'
+
+# We have to create the object in each trial run, since otherwise
+# runs after the first see no object and just skip the traversal entirely!
+test_perf 'prune with bitmaps' '
+	echo "probably not present in repo" | git hash-object -w --stdin &&
+	git prune
+'
+
 test_done
-- 
2.21.0.rc0.586.gffba1126a0


^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH 3/3] prune: check SEEN flag for reachability
  2019-02-14  4:31 [PATCH 0/3] some prune optimizations Jeff King
  2019-02-14  4:35 ` [PATCH 1/3] prune: lazily perform reachability traversal Jeff King
  2019-02-14  4:37 ` [PATCH 2/3] prune: use bitmaps for " Jeff King
@ 2019-02-14  4:38 ` Jeff King
  2 siblings, 0 replies; 41+ messages in thread
From: Jeff King @ 2019-02-14  4:38 UTC (permalink / raw)
  To: git

The git-prune command checks reachability by doing a traversal, and then
checking whether a given object exists in the global object hash. This
can yield false positives if any other part of the code had to create an
object struct for some reason. It's not clear whether this is even
possible, but it's more robust to rely on something a little more
concrete: the SEEN flag set by our traversal.

Note that there is a slight possibility of regression here, as we're
relying on mark_reachable_objects() to consistently set the flag.
However, it has always done so, and we're already relying on that fact
in prune_shallow(), which is called as part of git-prune. So this is
making these two parts of the prune operation more consistent.

Signed-off-by: Jeff King <peff@peff.net>
---
 builtin/prune.c | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/builtin/prune.c b/builtin/prune.c
index 04b6573945..97613eccb5 100644
--- a/builtin/prune.c
+++ b/builtin/prune.c
@@ -49,13 +49,12 @@ static void perform_reachability_traversal(struct rev_info *revs)
 static int is_object_reachable(const struct object_id *oid,
 			       struct rev_info *revs)
 {
+	struct object *obj;
+
 	perform_reachability_traversal(revs);
 
-	/*
-	 * Do we know about this object?
-	 * It must have been reachable
-	 */
-	return !!lookup_object(the_repository, oid->hash);
+	obj = lookup_object(the_repository, oid->hash);
+	return obj && (obj->flags & SEEN);
 }
 
 static int prune_object(const struct object_id *oid, const char *fullpath,
-- 
2.21.0.rc0.586.gffba1126a0

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 1/3] prune: lazily perform reachability traversal
  2019-02-14  4:35 ` [PATCH 1/3] prune: lazily perform reachability traversal Jeff King
@ 2019-02-14 10:54   ` Eric Sunshine
  2019-02-14 11:07     ` Jeff King
  0 siblings, 1 reply; 41+ messages in thread
From: Eric Sunshine @ 2019-02-14 10:54 UTC (permalink / raw)
  To: Jeff King; +Cc: Git List

On Wed, Feb 13, 2019 at 11:35 PM Jeff King <peff@peff.net> wrote:
> diff --git a/t/t5304-prune.sh b/t/t5304-prune.sh
> @@ -274,6 +274,18 @@ test_expect_success 'prune .git/shallow' '
> +test_expect_success 'prune .git/shallow when there are no loose objects' '
> +       SHA1=$(echo hi|git commit-tree HEAD^{tree}) &&

Perhaps name this variable 'oid' rather than 'SHA1' to make brian happy.

> +       echo $SHA1 >.git/shallow &&
> +       git update-ref refs/heads/shallow-tip $SHA1 &&
> +       git repack -ad &&
> +       # verify assumption that all loose objects are gone
> +       git count-objects | grep ^0 &&
> +       git prune &&
> +       echo $SHA1 >expect &&
> +       test_cmp expect .git/shallow
> +'

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 1/3] prune: lazily perform reachability traversal
  2019-02-14 10:54   ` Eric Sunshine
@ 2019-02-14 11:07     ` Jeff King
  0 siblings, 0 replies; 41+ messages in thread
From: Jeff King @ 2019-02-14 11:07 UTC (permalink / raw)
  To: Eric Sunshine; +Cc: Git List

On Thu, Feb 14, 2019 at 05:54:25AM -0500, Eric Sunshine wrote:

> On Wed, Feb 13, 2019 at 11:35 PM Jeff King <peff@peff.net> wrote:
> > diff --git a/t/t5304-prune.sh b/t/t5304-prune.sh
> > @@ -274,6 +274,18 @@ test_expect_success 'prune .git/shallow' '
> > +test_expect_success 'prune .git/shallow when there are no loose objects' '
> > +       SHA1=$(echo hi|git commit-tree HEAD^{tree}) &&
> 
> Perhaps name this variable 'oid' rather than 'SHA1' to make brian happy.

Heh, that line (and the one below it) are cut-and-paste from the setup
of the test directly above.

Perhaps we should do this on top:

-- >8 --
Subject: [PATCH] t5304: rename "sha1" variables to "oid"

Let's make the script less jarring to read in a post-sha1 world by
using more hash-agnostic variable names.

Signed-off-by: Jeff King <peff@peff.net>
---
 t/t5304-prune.sh | 24 ++++++++++++------------
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/t/t5304-prune.sh b/t/t5304-prune.sh
index 2c19a790c1..1eeb828a15 100755
--- a/t/t5304-prune.sh
+++ b/t/t5304-prune.sh
@@ -118,10 +118,10 @@ test_expect_success 'prune: do not prune detached HEAD with no reflog' '
 
 test_expect_success 'prune: prune former HEAD after checking out branch' '
 
-	head_sha1=$(git rev-parse HEAD) &&
+	head_oid=$(git rev-parse HEAD) &&
 	git checkout --quiet master &&
 	git prune -v >prune_actual &&
-	grep "$head_sha1" prune_actual
+	grep "$head_oid" prune_actual
 
 '
 
@@ -265,24 +265,24 @@ EOF
 '
 
 test_expect_success 'prune .git/shallow' '
-	SHA1=$(echo hi|git commit-tree HEAD^{tree}) &&
-	echo $SHA1 >.git/shallow &&
+	oid=$(echo hi|git commit-tree HEAD^{tree}) &&
+	echo $oid >.git/shallow &&
 	git prune --dry-run >out &&
-	grep $SHA1 .git/shallow &&
-	grep $SHA1 out &&
+	grep $oid .git/shallow &&
+	grep $oid out &&
 	git prune &&
 	test_path_is_missing .git/shallow
 '
 
 test_expect_success 'prune .git/shallow when there are no loose objects' '
-	SHA1=$(echo hi|git commit-tree HEAD^{tree}) &&
-	echo $SHA1 >.git/shallow &&
-	git update-ref refs/heads/shallow-tip $SHA1 &&
+	oid=$(echo hi|git commit-tree HEAD^{tree}) &&
+	echo $oid >.git/shallow &&
+	git update-ref refs/heads/shallow-tip $oid &&
 	git repack -ad &&
 	# verify assumption that all loose objects are gone
 	git count-objects | grep ^0 &&
 	git prune &&
-	echo $SHA1 >expect &&
+	echo $oid >expect &&
 	test_cmp expect .git/shallow
 '
 
@@ -326,8 +326,8 @@ test_expect_success 'prune: handle HEAD reflog in multiple worktrees' '
 		git reset --hard HEAD^
 	) &&
 	git prune --expire=now &&
-	SHA1=`git hash-object expected` &&
-	git -C third-worktree show "$SHA1" >actual &&
+	oid=`git hash-object expected` &&
+	git -C third-worktree show "$oid" >actual &&
 	test_cmp expected actual
 '
 
-- 
2.21.0.rc1.567.g7f045cdc92


^ permalink raw reply	[flat|nested] 41+ messages in thread

* bitmaps by default? [was: prune: use bitmaps for reachability traversal]
  2019-02-14  4:37 ` [PATCH 2/3] prune: use bitmaps for " Jeff King
@ 2019-03-09  2:49   ` Eric Wong
  2019-03-10 23:39     ` Jeff King
  2019-04-15 15:00   ` [PATCH 2/3] prune: use bitmaps for reachability traversal Derrick Stolee
  1 sibling, 1 reply; 41+ messages in thread
From: Eric Wong @ 2019-03-09  2:49 UTC (permalink / raw)
  To: Jeff King; +Cc: git

Jeff King <peff@peff.net> wrote:
> Pruning generally has to traverse the whole commit graph in order to
> see which objects are reachable. This is the exact problem that
> reachability bitmaps were meant to solve, so let's use them (if they're
> available, of course).

Perhaps this is good impetus for doing bitmaps by default?

It would make life easier for people new to hosting git servers
(and hopefully reduce centralization :)


I started working on it, but t0410-partial-clone.sh fails with
"Failed to write bitmap index. Packfile doesn't have full
closure"; so more work needs to be done w.r.t. default behavior
on partial clones...

Here's my WIP:

diff --git a/builtin/repack.c b/builtin/repack.c
index 67f8978043..ca98d32715 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -14,7 +14,7 @@
 
 static int delta_base_offset = 1;
 static int pack_kept_objects = -1;
-static int write_bitmaps;
+static int write_bitmaps = -1;
 static int use_delta_islands;
 static char *packdir, *packtmp;
 
@@ -344,10 +344,14 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
 		die(_("--keep-unreachable and -A are incompatible"));
 
 	if (pack_kept_objects < 0)
-		pack_kept_objects = write_bitmaps;
+		pack_kept_objects = write_bitmaps > 0;
 
-	if (write_bitmaps && !(pack_everything & ALL_INTO_ONE))
-		die(_(incremental_bitmap_conflict_error));
+	if (!(pack_everything & ALL_INTO_ONE)) {
+		if (write_bitmaps > 0)
+			die(_(incremental_bitmap_conflict_error));
+	} else if (write_bitmaps < 0) {
+		write_bitmaps = 1;
+	}
 
 	packdir = mkpathdup("%s/pack", get_object_directory());
 	packtmp = mkpathdup("%s/.tmp-%d-pack", packdir, (int)getpid());
@@ -368,7 +372,7 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
 	argv_array_push(&cmd.args, "--indexed-objects");
 	if (repository_format_partial_clone)
 		argv_array_push(&cmd.args, "--exclude-promisor-objects");
-	if (write_bitmaps)
+	if (write_bitmaps > 0)
 		argv_array_push(&cmd.args, "--write-bitmap-index");
 	if (use_delta_islands)
 		argv_array_push(&cmd.args, "--delta-islands");

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: bitmaps by default? [was: prune: use bitmaps for reachability traversal]
  2019-03-09  2:49   ` bitmaps by default? [was: prune: use bitmaps for reachability traversal] Eric Wong
@ 2019-03-10 23:39     ` Jeff King
  2019-03-12  3:13       ` [PATCH] repack: enable bitmaps by default on bare repos Eric Wong
  0 siblings, 1 reply; 41+ messages in thread
From: Jeff King @ 2019-03-10 23:39 UTC (permalink / raw)
  To: Eric Wong; +Cc: git

On Sat, Mar 09, 2019 at 02:49:44AM +0000, Eric Wong wrote:

> Jeff King <peff@peff.net> wrote:
> > Pruning generally has to traverse the whole commit graph in order to
> > see which objects are reachable. This is the exact problem that
> > reachability bitmaps were meant to solve, so let's use them (if they're
> > available, of course).
> 
> Perhaps this is good impetus for doing bitmaps by default?

I'm actually not sure it is, because the prune costs less than making
the bitmaps. Here are some timings on linux.git. This full-graph
traversal is roughly the same cost as the reachability walk that prune
would do internally:

	$ time git rev-list --objects --all >/dev/null
	real	0m47.714s
	user	0m47.113s
	sys	0m0.600s

Here's a normal noop repack as a baseline.

	$ time git repack -ad
	real	1m26.922s
	user	1m20.029s
	sys	0m7.878s

And here's another immediately after with bitmap generation. Generating
the bitmaps takes about 100s, compared to the 47s it would save us on
the prune.

	$ time git repack -adb
	real	3m5.915s
	user	2m59.377s
	sys	0m7.718s

Things are a little rosier if you generate the bitmaps a second time:

	$ time git repack -adb
	real	1m43.571s
	user	1m37.403s
	sys	0m8.179s

We can reuse some of the old bitmaps and it only takes 20 extra seconds,
making it a net win. But I'm not sure how realistic that is. There were
literally no new objects introduced between those two command. If this
were a "real" repack occurring after we'd accumulated a week or two
worth of objects, how long would it take?

A few other random observations:

  - I do suspect there are some real inefficiencies in the way we
    generate bitmaps. It _should_ be about as expensive as the graph
    traversal, but clearly it's not. I think this is because of the way
    the current bitmap code picks commits to bitmap, and then walks
    somewhat haphazardly over the history, trying to accumulate the set
    of objects for each commit. IOW, I believe it may sometimes
    traverse over some sequences of history more than once. So if we
    could make that faster, then the balance would shift in its favor.

  - This is comparing the cost of generating the bitmaps to the time
    saved for _one_ operation. On a repo serving many fetches, the cost
    to generate it is amortized over many requests. But for a normal
    end-user, that's not true (they'd presumably push out their work,
    but that usually only needs to walk a small bit of history anyway).

    The balance would change if we had more operations that used bitmaps
    (e.g., --contains can use them, as can ahead/behind checks). We
    don't do those things yet, but we could. However, those algorithms
    are also using other commit-graph optimizations, and we've discussed
    revamping the bitmap format as part of that work (one problem in
    particular is that to use the current bitmaps you have to parse the
    whole .bitmap file, making it sometimes a net negative to use the
    bitmaps). So I'd consider holding off any decision like "let's make
    this the default" until we see where that work goes.

> It would make life easier for people new to hosting git servers
> (and hopefully reduce centralization :)

I do think they're a net win for people hosting git servers. But if
that's the goal, I think at most you'd want to make bitmaps the default
for bare repos. They're really not much help for normal end-user repos
at this point.

> I started working on it, but t0410-partial-clone.sh fails with
> "Failed to write bitmap index. Packfile doesn't have full
> closure"; so more work needs to be done w.r.t. default behavior
> on partial clones...

Yeah, you can't use bitmaps at all in an incomplete clone. Shallow
clones would probably have the same issue (though in general we just
disable bitmaps entirely in shallow situations, so that might kick in).

-Peff

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH] repack: enable bitmaps by default on bare repos
  2019-03-10 23:39     ` Jeff King
@ 2019-03-12  3:13       ` Eric Wong
  2019-03-12  9:07         ` Ævar Arnfjörð Bjarmason
  2019-03-12 10:49         ` Jeff King
  0 siblings, 2 replies; 41+ messages in thread
From: Eric Wong @ 2019-03-12  3:13 UTC (permalink / raw)
  To: Jeff King; +Cc: git

Jeff King <peff@peff.net> wrote:
> On Sat, Mar 09, 2019 at 02:49:44AM +0000, Eric Wong wrote:
> > It would make life easier for people new to hosting git servers
> > (and hopefully reduce centralization :)
> 
> I do think they're a net win for people hosting git servers. But if
> that's the goal, I think at most you'd want to make bitmaps the default
> for bare repos. They're really not much help for normal end-user repos
> at this point.

Fair enough, hopefully this can make life easier for admins
new to hosting git:

----------8<---------
Subject: [PATCH] repack: enable bitmaps by default on bare repos

A typical use case for bare repos is for serving clones and
fetches to clients.  Enable bitmaps by default on bare repos to
make it easier for admins to host git repos in a performant way.

Signed-off-by: Eric Wong <e@80x24.org>
---
 builtin/repack.c  | 16 ++++++++++------
 t/t7700-repack.sh | 14 +++++++++++++-
 2 files changed, 23 insertions(+), 7 deletions(-)

diff --git a/builtin/repack.c b/builtin/repack.c
index 67f8978043..5d4758b515 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -14,7 +14,7 @@
 
 static int delta_base_offset = 1;
 static int pack_kept_objects = -1;
-static int write_bitmaps;
+static int write_bitmaps = -1;
 static int use_delta_islands;
 static char *packdir, *packtmp;
 
@@ -343,11 +343,15 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
 	    (unpack_unreachable || (pack_everything & LOOSEN_UNREACHABLE)))
 		die(_("--keep-unreachable and -A are incompatible"));
 
+	if (!(pack_everything & ALL_INTO_ONE)) {
+		if (write_bitmaps > 0)
+			die(_(incremental_bitmap_conflict_error));
+	} else if (write_bitmaps < 0) {
+		write_bitmaps = is_bare_repository();
+	}
+
 	if (pack_kept_objects < 0)
-		pack_kept_objects = write_bitmaps;
-
-	if (write_bitmaps && !(pack_everything & ALL_INTO_ONE))
-		die(_(incremental_bitmap_conflict_error));
+		pack_kept_objects = write_bitmaps > 0;
 
 	packdir = mkpathdup("%s/pack", get_object_directory());
 	packtmp = mkpathdup("%s/.tmp-%d-pack", packdir, (int)getpid());
@@ -368,7 +372,7 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
 	argv_array_push(&cmd.args, "--indexed-objects");
 	if (repository_format_partial_clone)
 		argv_array_push(&cmd.args, "--exclude-promisor-objects");
-	if (write_bitmaps)
+	if (write_bitmaps > 0)
 		argv_array_push(&cmd.args, "--write-bitmap-index");
 	if (use_delta_islands)
 		argv_array_push(&cmd.args, "--delta-islands");
diff --git a/t/t7700-repack.sh b/t/t7700-repack.sh
index 6162e2a8e6..3e0b5c40e4 100755
--- a/t/t7700-repack.sh
+++ b/t/t7700-repack.sh
@@ -221,5 +221,17 @@ test_expect_success 'repack --keep-pack' '
 	)
 '
 
+test_expect_success 'bitmaps are created by default in bare repos' '
+	git clone --bare .git bare.git &&
+	cd bare.git &&
+	mkdir old &&
+	mv objects/pack/* old &&
+	pack=$(ls old/*.pack) &&
+	test_path_is_file "$pack" &&
+	git unpack-objects -q <"$pack" &&
+	git repack -ad &&
+	bitmap=$(ls objects/pack/*.bitmap) &&
+	test_path_is_file "$bitmap"
+'
+
 test_done
-

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH] repack: enable bitmaps by default on bare repos
  2019-03-12  3:13       ` [PATCH] repack: enable bitmaps by default on bare repos Eric Wong
@ 2019-03-12  9:07         ` Ævar Arnfjörð Bjarmason
  2019-03-12 10:49         ` Jeff King
  1 sibling, 0 replies; 41+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2019-03-12  9:07 UTC (permalink / raw)
  To: Eric Wong; +Cc: Jeff King, git


On Tue, Mar 12 2019, Eric Wong wrote:

> Jeff King <peff@peff.net> wrote:
>> On Sat, Mar 09, 2019 at 02:49:44AM +0000, Eric Wong wrote:
>> > It would make life easier for people new to hosting git servers
>> > (and hopefully reduce centralization :)
>>
>> I do think they're a net win for people hosting git servers. But if
>> that's the goal, I think at most you'd want to make bitmaps the default
>> for bare repos. They're really not much help for normal end-user repos
>> at this point.
>
> Fair enough, hopefully this can make life easier for admins
> new to hosting git:
>
> ----------8<---------
> Subject: [PATCH] repack: enable bitmaps by default on bare repos
>
> A typical use case for bare repos is for serving clones and
> fetches to clients.  Enable bitmaps by default on bare repos to
> make it easier for admins to host git repos in a performant way.
>
> Signed-off-by: Eric Wong <e@80x24.org>
> ---
>  builtin/repack.c  | 16 ++++++++++------
>  t/t7700-repack.sh | 14 +++++++++++++-
>  2 files changed, 23 insertions(+), 7 deletions(-)
>
> diff --git a/builtin/repack.c b/builtin/repack.c
> index 67f8978043..5d4758b515 100644
> --- a/builtin/repack.c
> +++ b/builtin/repack.c
> @@ -14,7 +14,7 @@
>
>  static int delta_base_offset = 1;
>  static int pack_kept_objects = -1;
> -static int write_bitmaps;
> +static int write_bitmaps = -1;
>  static int use_delta_islands;
>  static char *packdir, *packtmp;
>
> @@ -343,11 +343,15 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
>  	    (unpack_unreachable || (pack_everything & LOOSEN_UNREACHABLE)))
>  		die(_("--keep-unreachable and -A are incompatible"));
>
> +	if (!(pack_everything & ALL_INTO_ONE)) {
> +		if (write_bitmaps > 0)
> +			die(_(incremental_bitmap_conflict_error));
> +	} else if (write_bitmaps < 0) {
> +		write_bitmaps = is_bare_repository();
> +	}
> +
>  	if (pack_kept_objects < 0)
> -		pack_kept_objects = write_bitmaps;
> -
> -	if (write_bitmaps && !(pack_everything & ALL_INTO_ONE))
> -		die(_(incremental_bitmap_conflict_error));
> +		pack_kept_objects = write_bitmaps > 0;
>
>  	packdir = mkpathdup("%s/pack", get_object_directory());
>  	packtmp = mkpathdup("%s/.tmp-%d-pack", packdir, (int)getpid());
> @@ -368,7 +372,7 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
>  	argv_array_push(&cmd.args, "--indexed-objects");
>  	if (repository_format_partial_clone)
>  		argv_array_push(&cmd.args, "--exclude-promisor-objects");
> -	if (write_bitmaps)
> +	if (write_bitmaps > 0)
>  		argv_array_push(&cmd.args, "--write-bitmap-index");
>  	if (use_delta_islands)
>  		argv_array_push(&cmd.args, "--delta-islands");
> diff --git a/t/t7700-repack.sh b/t/t7700-repack.sh
> index 6162e2a8e6..3e0b5c40e4 100755
> --- a/t/t7700-repack.sh
> +++ b/t/t7700-repack.sh
> @@ -221,5 +221,17 @@ test_expect_success 'repack --keep-pack' '
>  	)
>  '
>
> +test_expect_success 'bitmaps are created by default in bare repos' '
> +	git clone --bare .git bare.git &&
> +	cd bare.git &&
> +	mkdir old &&
> +	mv objects/pack/* old &&
> +	pack=$(ls old/*.pack) &&
> +	test_path_is_file "$pack" &&
> +	git unpack-objects -q <"$pack" &&
> +	git repack -ad &&
> +	bitmap=$(ls objects/pack/*.bitmap) &&
> +	test_path_is_file "$bitmap"
> +'
> +
>  test_done
> -

Looks sensible in principle, but now the git-repack and git-config docs
talking about repack.writeBitmaps are out-of-date. A v2 should update
those.

Also worth testing that -c repack.writeBitmaps=false on a bare
repository still overrides it, even though glancing at the code it looks
like that case is handled, but without being tested for.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH] repack: enable bitmaps by default on bare repos
  2019-03-12  3:13       ` [PATCH] repack: enable bitmaps by default on bare repos Eric Wong
  2019-03-12  9:07         ` Ævar Arnfjörð Bjarmason
@ 2019-03-12 10:49         ` Jeff King
  2019-03-12 12:05           ` Jeff King
  2019-03-13  1:51           ` Eric Wong
  1 sibling, 2 replies; 41+ messages in thread
From: Jeff King @ 2019-03-12 10:49 UTC (permalink / raw)
  To: Eric Wong; +Cc: git

On Tue, Mar 12, 2019 at 03:13:03AM +0000, Eric Wong wrote:

> > I do think they're a net win for people hosting git servers. But if
> > that's the goal, I think at most you'd want to make bitmaps the default
> > for bare repos. They're really not much help for normal end-user repos
> > at this point.
> 
> Fair enough, hopefully this can make life easier for admins
> new to hosting git:
> 
> ----------8<---------
> Subject: [PATCH] repack: enable bitmaps by default on bare repos
> 
> A typical use case for bare repos is for serving clones and
> fetches to clients.  Enable bitmaps by default on bare repos to
> make it easier for admins to host git repos in a performant way.

OK. I still think of bitmaps as something that might need manual care
and feeding, but I think that may be leftover superstition. I can't
offhand think of any real downsides to this.

>  static int delta_base_offset = 1;
>  static int pack_kept_objects = -1;
> -static int write_bitmaps;
> +static int write_bitmaps = -1;

So we'll have "-1" be "not decided yet". Makes sense.

> @@ -343,11 +343,15 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
>  	    (unpack_unreachable || (pack_everything & LOOSEN_UNREACHABLE)))
>  		die(_("--keep-unreachable and -A are incompatible"));
>  
> +	if (!(pack_everything & ALL_INTO_ONE)) {
> +		if (write_bitmaps > 0)
> +			die(_(incremental_bitmap_conflict_error));
> +	} else if (write_bitmaps < 0) {
> +		write_bitmaps = is_bare_repository();
> +	}

Might it be easier here to always resolve "-1" into a 0/1? I.e., like:

  if (write_bitmaps < 0)
	write_bitmaps = (pack_everything & ALL_INTO_ONE) && is_bare_repository();

and then the rest of the logic can stay the same, and does not need to
be modified to handle "write_bitmaps < 0"?

> +test_expect_success 'bitmaps are created by default in bare repos' '
> +	git clone --bare .git bare.git &&
> +	cd bare.git &&

Please don't "cd" outside of a subshell, since it impacts further tests
that are added.

> +	mkdir old &&
> +	mv objects/pack/* old &&
> +	pack=$(ls old/*.pack) &&

Are we sure we have just done $pack here? Our repo came from a
local-disk clone, which would have just hard-linked whatever was in the
source repo. So we're subtly relying on the state that other tests have
left.

I'm not sure what we're trying to accomplish with this unpacking,
though. Running "git repack -ad" should generate bitmaps whether the
objects were already in a single pack or not. So I think this test can
just be:

  git clone --bare . bare.git &&
  git -C bare.git repack -ad &&
  bitmap=$(ls objects/pack/*.bitmap)
  test_path_is_file "$bitmap"

I do agree with Ævar it might also be worth testing that disabling
bitmaps explicitly still works. And also that repacking _without_ "-a"
(i.e., an incremental) does not complain about being unable to generate
bitmaps.

-Peff

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH] repack: enable bitmaps by default on bare repos
  2019-03-12 10:49         ` Jeff King
@ 2019-03-12 12:05           ` Jeff King
  2019-03-13  1:51           ` Eric Wong
  1 sibling, 0 replies; 41+ messages in thread
From: Jeff King @ 2019-03-12 12:05 UTC (permalink / raw)
  To: Eric Wong; +Cc: git

On Tue, Mar 12, 2019 at 06:49:54AM -0400, Jeff King wrote:

> I'm not sure what we're trying to accomplish with this unpacking,
> though. Running "git repack -ad" should generate bitmaps whether the
> objects were already in a single pack or not. So I think this test can
> just be:
> 
>   git clone --bare . bare.git &&
>   git -C bare.git repack -ad &&
>   bitmap=$(ls objects/pack/*.bitmap)
>   test_path_is_file "$bitmap"

Of course that should be "bare.git/objects/pack/*.bitmap" in the third
line, since we skipped the "cd" entirely.

-Peff

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH] repack: enable bitmaps by default on bare repos
  2019-03-12 10:49         ` Jeff King
  2019-03-12 12:05           ` Jeff King
@ 2019-03-13  1:51           ` Eric Wong
  2019-03-13 14:54             ` Jeff King
  1 sibling, 1 reply; 41+ messages in thread
From: Eric Wong @ 2019-03-13  1:51 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Ævar Arnfjörð Bjarmason

Jeff King <peff@peff.net> wrote:
> OK. I still think of bitmaps as something that might need manual care
> and feeding, but I think that may be leftover superstition. I can't
> offhand think of any real downsides to this.

It's a _relatively_ new feature to long-time users like us, so
maybe the "new == immature" mindset sets in[*].  I skimmed some
mail archives but couldn't find any reason not to...

But I did find Ævar's forgotten gitperformance doc and thread
where the topic was brought up:

  https://public-inbox.org/git/20170403211644.26814-1-avarab@gmail.com/

> On Tue, Mar 12, 2019 at 03:13:03AM +0000, Eric Wong wrote:
> > @@ -343,11 +343,15 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
> >  	    (unpack_unreachable || (pack_everything & LOOSEN_UNREACHABLE)))
> >  		die(_("--keep-unreachable and -A are incompatible"));
> >  
> > +	if (!(pack_everything & ALL_INTO_ONE)) {
> > +		if (write_bitmaps > 0)
> > +			die(_(incremental_bitmap_conflict_error));
> > +	} else if (write_bitmaps < 0) {
> > +		write_bitmaps = is_bare_repository();
> > +	}
> 
> Might it be easier here to always resolve "-1" into a 0/1? I.e., like:
> 
>   if (write_bitmaps < 0)
> 	write_bitmaps = (pack_everything & ALL_INTO_ONE) && is_bare_repository();
> 
> and then the rest of the logic can stay the same, and does not need to
> be modified to handle "write_bitmaps < 0"?

Good point, changed in v2.

> > +test_expect_success 'bitmaps are created by default in bare repos' '
> > +	git clone --bare .git bare.git &&
> > +	cd bare.git &&
> 
> Please don't "cd" outside of a subshell, since it impacts further tests
> that are added.

Oops, I got it into my head that each test was already run in a
separate subshell :x  Fixed.

> > +	mkdir old &&
> > +	mv objects/pack/* old &&
> > +	pack=$(ls old/*.pack) &&
> 
> Are we sure we have just done $pack here? Our repo came from a
> local-disk clone, which would have just hard-linked whatever was in the
> source repo. So we're subtly relying on the state that other tests have
> left.
> 
> I'm not sure what we're trying to accomplish with this unpacking,
> though. Running "git repack -ad" should generate bitmaps whether the
> objects were already in a single pack or not. So I think this test can
> just be:

Right, I forgot "repack -a" didn't need loose objects to operate :x

> I do agree with Ævar it might also be worth testing that disabling
> bitmaps explicitly still works. And also that repacking _without_ "-a"
> (i.e., an incremental) does not complain about being unable to generate
> bitmaps.

Yup, now with additional tests for repack.writeBitmaps=false,
repack (w/o "-a") and a config/repack.txt update

------------8<---------
Subject: [PATCH] repack: enable bitmaps by default on bare repos

A typical use case for bare repos is for serving clones and
fetches to clients.  Enable bitmaps by default on bare repos to
make it easier for admins to host git repos in a performant way.

Signed-off-by: Eric Wong <e@80x24.org>
Helped-by: Jeff King <peff@peff.net>
---
 Documentation/config/repack.txt |  2 +-
 builtin/repack.c                |  5 ++++-
 t/t7700-repack.sh               | 19 ++++++++++++++++++-
 3 files changed, 23 insertions(+), 3 deletions(-)

diff --git a/Documentation/config/repack.txt b/Documentation/config/repack.txt
index a5c37813fd..9c413e177e 100644
--- a/Documentation/config/repack.txt
+++ b/Documentation/config/repack.txt
@@ -24,4 +24,4 @@ repack.writeBitmaps::
 	packs created for clones and fetches, at the cost of some disk
 	space and extra time spent on the initial repack.  This has
 	no effect if multiple packfiles are created.
-	Defaults to false.
+	Defaults to true on bare repos, false otherwise.
diff --git a/builtin/repack.c b/builtin/repack.c
index 67f8978043..caca113927 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -14,7 +14,7 @@
 
 static int delta_base_offset = 1;
 static int pack_kept_objects = -1;
-static int write_bitmaps;
+static int write_bitmaps = -1;
 static int use_delta_islands;
 static char *packdir, *packtmp;
 
@@ -343,6 +343,9 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
 	    (unpack_unreachable || (pack_everything & LOOSEN_UNREACHABLE)))
 		die(_("--keep-unreachable and -A are incompatible"));
 
+	if (write_bitmaps < 0)
+		write_bitmaps = (pack_everything & ALL_INTO_ONE) &&
+				 is_bare_repository();
 	if (pack_kept_objects < 0)
 		pack_kept_objects = write_bitmaps;
 
diff --git a/t/t7700-repack.sh b/t/t7700-repack.sh
index 6162e2a8e6..6458286dcf 100755
--- a/t/t7700-repack.sh
+++ b/t/t7700-repack.sh
@@ -221,5 +221,22 @@ test_expect_success 'repack --keep-pack' '
 	)
 '
 
-test_done
+test_expect_success 'bitmaps are created by default in bare repos' '
+	git clone --bare .git bare.git &&
+	git -C bare.git repack -ad &&
+	bitmap=$(ls bare.git/objects/pack/*.bitmap) &&
+	test_path_is_file "$bitmap"
+'
+
+test_expect_success 'incremental repack does not complain' '
+	git -C bare.git repack -q 2>repack.err &&
+	! test -s repack.err
+'
 
+test_expect_success 'bitmaps can be disabled on bare repos' '
+	git -c repack.writeBitmaps=false -C bare.git repack -ad &&
+	bitmap=$(ls bare.git/objects/pack/*.bitmap 2>/dev/null || :) &&
+	test -z "$bitmap"
+'
+
+test_done
-- 
EW

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH] repack: enable bitmaps by default on bare repos
  2019-03-13  1:51           ` Eric Wong
@ 2019-03-13 14:54             ` Jeff King
  2019-03-14  9:12               ` [PATCH v3] " Eric Wong
  0 siblings, 1 reply; 41+ messages in thread
From: Jeff King @ 2019-03-13 14:54 UTC (permalink / raw)
  To: Eric Wong; +Cc: git, Ævar Arnfjörð Bjarmason

On Wed, Mar 13, 2019 at 01:51:33AM +0000, Eric Wong wrote:

> But I did find Ævar's forgotten gitperformance doc and thread
> where the topic was brought up:
> 
>   https://public-inbox.org/git/20170403211644.26814-1-avarab@gmail.com/

One thing that thread reminded me of: we probably also want to default
pack.writebitmaphashcache on. Otherwise the time saved during the object
enumeration can backfire when we spend much more time trying delta
compression (because we don't know the pathnames of any objects).

The reason it defaults to off is for on-disk compatibility with JGit.
But I have very little experience running without the hash-cache on. We
added it very early on because we found performance was not great
without it (I don't know if people running JGit have run into the same
problem and if not, why not).

> ------------8<---------
> Subject: [PATCH] repack: enable bitmaps by default on bare repos
> 
> A typical use case for bare repos is for serving clones and
> fetches to clients.  Enable bitmaps by default on bare repos to
> make it easier for admins to host git repos in a performant way.

This looks good to me, with one minor nit:

> +test_expect_success 'incremental repack does not complain' '
> +	git -C bare.git repack -q 2>repack.err &&
> +	! test -s repack.err
> +'

This last line could use "test_must_be_empty".

-Peff

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH v3] repack: enable bitmaps by default on bare repos
  2019-03-13 14:54             ` Jeff King
@ 2019-03-14  9:12               ` " Eric Wong
  2019-03-14 16:02                 ` Jeff King
  2019-04-09 15:10                 ` [PATCH v3] repack: enable bitmaps by default on bare repos Ævar Arnfjörð Bjarmason
  0 siblings, 2 replies; 41+ messages in thread
From: Eric Wong @ 2019-03-14  9:12 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Ævar Arnfjörð Bjarmason

Jeff King <peff@peff.net> wrote:
> On Wed, Mar 13, 2019 at 01:51:33AM +0000, Eric Wong wrote:
> 
> > But I did find Ævar's forgotten gitperformance doc and thread
> > where the topic was brought up:
> > 
> >   https://public-inbox.org/git/20170403211644.26814-1-avarab@gmail.com/
> 
> One thing that thread reminded me of: we probably also want to default
> pack.writebitmaphashcache on. Otherwise the time saved during the object
> enumeration can backfire when we spend much more time trying delta
> compression (because we don't know the pathnames of any objects).

Interesting...  I found a big improvement with public-inbox
just using bitmaps; but have never tried the hash cache.

> The reason it defaults to off is for on-disk compatibility with JGit.

Right.  Our documentation seems to indicate JGit just warns (but
doesn't fall over), so maybe that can be considered separately.

I've never used JGit myself; and was satisfied enough with
bitmaps alone that I never tried the hash-cache.

> But I have very little experience running without the hash-cache on. We
> added it very early on because we found performance was not great
> without it (I don't know if people running JGit have run into the same
> problem and if not, why not).

As far as serving clones and fetches, public-inbox-init has
always created bare repos with bitmaps enabled, but without
the hash-cache for compatibility concerns.

That's a lot of fetches and clones over the years.

> > +test_expect_success 'incremental repack does not complain' '
> > +	git -C bare.git repack -q 2>repack.err &&
> > +	! test -s repack.err
> > +'
> 
> This last line could use "test_must_be_empty".

Thanks for the review!

---------8<-----------
Subject: [PATCH] repack: enable bitmaps by default on bare repos

A typical use case for bare repos is for serving clones and
fetches to clients.  Enable bitmaps by default on bare repos to
make it easier for admins to host git repos in a performant way.

Signed-off-by: Eric Wong <e@80x24.org>
Helped-by: Jeff King <peff@peff.net>
---
 Documentation/config/repack.txt |  2 +-
 builtin/repack.c                |  5 ++++-
 t/t7700-repack.sh               | 19 ++++++++++++++++++-
 3 files changed, 23 insertions(+), 3 deletions(-)

diff --git a/Documentation/config/repack.txt b/Documentation/config/repack.txt
index a5c37813fd..9c413e177e 100644
--- a/Documentation/config/repack.txt
+++ b/Documentation/config/repack.txt
@@ -24,4 +24,4 @@ repack.writeBitmaps::
 	packs created for clones and fetches, at the cost of some disk
 	space and extra time spent on the initial repack.  This has
 	no effect if multiple packfiles are created.
-	Defaults to false.
+	Defaults to true on bare repos, false otherwise.
diff --git a/builtin/repack.c b/builtin/repack.c
index 67f8978043..caca113927 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -14,7 +14,7 @@
 
 static int delta_base_offset = 1;
 static int pack_kept_objects = -1;
-static int write_bitmaps;
+static int write_bitmaps = -1;
 static int use_delta_islands;
 static char *packdir, *packtmp;
 
@@ -343,6 +343,9 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
 	    (unpack_unreachable || (pack_everything & LOOSEN_UNREACHABLE)))
 		die(_("--keep-unreachable and -A are incompatible"));
 
+	if (write_bitmaps < 0)
+		write_bitmaps = (pack_everything & ALL_INTO_ONE) &&
+				 is_bare_repository();
 	if (pack_kept_objects < 0)
 		pack_kept_objects = write_bitmaps;
 
diff --git a/t/t7700-repack.sh b/t/t7700-repack.sh
index 6162e2a8e6..86d05160a3 100755
--- a/t/t7700-repack.sh
+++ b/t/t7700-repack.sh
@@ -221,5 +221,22 @@ test_expect_success 'repack --keep-pack' '
 	)
 '
 
-test_done
+test_expect_success 'bitmaps are created by default in bare repos' '
+	git clone --bare .git bare.git &&
+	git -C bare.git repack -ad &&
+	bitmap=$(ls bare.git/objects/pack/*.bitmap) &&
+	test_path_is_file "$bitmap"
+'
+
+test_expect_success 'incremental repack does not complain' '
+	git -C bare.git repack -q 2>repack.err &&
+	test_must_be_empty repack.err
+'
 
+test_expect_success 'bitmaps can be disabled on bare repos' '
+	git -c repack.writeBitmaps=false -C bare.git repack -ad &&
+	bitmap=$(ls bare.git/objects/pack/*.bitmap 2>/dev/null || :) &&
+	test -z "$bitmap"
+'
+
+test_done
-- 
EW

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v3] repack: enable bitmaps by default on bare repos
  2019-03-14  9:12               ` [PATCH v3] " Eric Wong
@ 2019-03-14 16:02                 ` Jeff King
  2019-03-15  6:21                   ` [PATCH 0/2] enable bitmap hash-cache by default Jeff King
  2019-04-09 15:10                 ` [PATCH v3] repack: enable bitmaps by default on bare repos Ævar Arnfjörð Bjarmason
  1 sibling, 1 reply; 41+ messages in thread
From: Jeff King @ 2019-03-14 16:02 UTC (permalink / raw)
  To: Eric Wong; +Cc: git, Ævar Arnfjörð Bjarmason

On Thu, Mar 14, 2019 at 09:12:54AM +0000, Eric Wong wrote:

> > The reason it defaults to off is for on-disk compatibility with JGit.
> 
> Right.  Our documentation seems to indicate JGit just warns (but
> doesn't fall over), so maybe that can be considered separately.

I think it was a hard error in the beginning, but they changed it pretty
soon after we added more flags. So it might be reasonable to just enable
it by default (but it wouldn't hurt to double check the behavior).

I tried running t5310 (which does a back-and-forth with jgit) using this
patch:

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index a154fc29f6..5264bf055a 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -97,7 +97,7 @@ static off_t reuse_packfile_offset;
 static int use_bitmap_index_default = 1;
 static int use_bitmap_index = -1;
 static int write_bitmap_index;
-static uint16_t write_bitmap_options;
+static uint16_t write_bitmap_options = BITMAP_OPT_HASH_CACHE;
 
 static int exclude_promisor_objects;
 

and it seemed happy.

> As far as serving clones and fetches, public-inbox-init has
> always created bare repos with bitmaps enabled, but without
> the hash-cache for compatibility concerns.
> 
> That's a lot of fetches and clones over the years.

The symptom you'd see is that "Compressing objects" during a fetch takes
a long time, and/or produces lousy deltas. But it matters less if:

  - you keep things packed pretty promptly, because we don't bother
    looking for new deltas between objects in the same pack. Just trying
    to clone public-inbox.org/git, it does look like it's mostly packed
    (based on the object counts) but the compression phase still takes
    10+ seconds.

  - how much the names actually help. In your case, I'd think not at
    all, because being based on hashes, they're effectively random. So
    the pack-objects heuristics to try to find deltas between files of
    similar filenames will not help you.

Regarding the second thing, I wondered if the overall packing of your
public-inbox git repo might not be good, so I did a "git repack -adf
--window=1000" on a clone.  Hundreds of CPU minutes later, I was only
able to shave off about 80MB. I'm not sure if that means you
occasionally do very aggressive repacks, or if there simply isn't all
that much delta opportunity (after all, you're not storing many versions
of one file, but rather tons of different emails; I would expect to find
deltas between various versions of a patch, though).

Anyway...

> ---------8<-----------
> Subject: [PATCH] repack: enable bitmaps by default on bare repos
> 
> A typical use case for bare repos is for serving clones and
> fetches to clients.  Enable bitmaps by default on bare repos to
> make it easier for admins to host git repos in a performant way.
> 
> Signed-off-by: Eric Wong <e@80x24.org>
> Helped-by: Jeff King <peff@peff.net>

This version looks good to me. If we're going to flip the hash-cache
default, I think that should be a separate patch anyway.

-Peff

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH 0/2] enable bitmap hash-cache by default
  2019-03-14 16:02                 ` Jeff King
@ 2019-03-15  6:21                   ` Jeff King
  2019-03-15  6:22                     ` [PATCH 1/2] t5310: correctly remove bitmaps for jgit test Jeff King
  2019-03-15  6:25                     ` [PATCH 2/2] pack-objects: default to writing bitmap hash-cache Jeff King
  0 siblings, 2 replies; 41+ messages in thread
From: Jeff King @ 2019-03-15  6:21 UTC (permalink / raw)
  To: Eric Wong; +Cc: git, Ævar Arnfjörð Bjarmason

On Thu, Mar 14, 2019 at 12:02:56PM -0400, Jeff King wrote:

> On Thu, Mar 14, 2019 at 09:12:54AM +0000, Eric Wong wrote:
> 
> > > The reason it defaults to off is for on-disk compatibility with JGit.
> > 
> > Right.  Our documentation seems to indicate JGit just warns (but
> > doesn't fall over), so maybe that can be considered separately.
> 
> I think it was a hard error in the beginning, but they changed it pretty
> soon after we added more flags. So it might be reasonable to just enable
> it by default (but it wouldn't hurt to double check the behavior).
> 
> I tried running t5310 (which does a back-and-forth with jgit) using this
> patch:

I dug up the actual JGit change, and it was indeed from 2014. So here's
a more complete series to handle that. There's a minor performance
mystery in the second patch, but I think it might be OK to proceed even
without solving it.

Conceptually these go on top of your patch, but they could be applied
separately.

  [1/2]: t5310: correctly remove bitmaps for jgit test
  [2/2]: pack-objects: default to writing bitmap hash-cache

 Documentation/config/pack.txt      | 4 +---
 builtin/pack-objects.c             | 2 +-
 t/perf/p5310-pack-bitmaps.sh       | 3 +--
 t/perf/p5311-pack-bitmaps-fetch.sh | 1 -
 t/t5310-pack-bitmaps.sh            | 5 ++---
 5 files changed, 5 insertions(+), 10 deletions(-)

-Peff

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH 1/2] t5310: correctly remove bitmaps for jgit test
  2019-03-15  6:21                   ` [PATCH 0/2] enable bitmap hash-cache by default Jeff King
@ 2019-03-15  6:22                     ` Jeff King
  2019-03-15 13:25                       ` SZEDER Gábor
  2019-03-15  6:25                     ` [PATCH 2/2] pack-objects: default to writing bitmap hash-cache Jeff King
  1 sibling, 1 reply; 41+ messages in thread
From: Jeff King @ 2019-03-15  6:22 UTC (permalink / raw)
  To: Eric Wong; +Cc: git, Ævar Arnfjörð Bjarmason

We use "jgit gc" to generate a pack bitmap file, and then make sure our
implementation can read it. To prepare the repo before running jgit, we
try to "rm -f" any existing bitmap files. But we got the path wrong;
we're in a bare repo, so looking in ".git/" finds nothing. Our "rm"
doesn't complain because of the "-f", and when we run "rev-list" there
are two bitmap files (ours and jgit's).

Our reader implementation will ignore one of the bitmap files, but it's
likely non-deterministic which one we will use. We'd prefer the one with
the more recent timestamp (just because of the way the packed_git list
is sorted), but in most test runs they'd have identical timestamps.

So this was probably actually testing something useful about 50% of the
time, and other half just testing that we could read our own bitmaps
(which is covered elsewhere).

Signed-off-by: Jeff King <peff@peff.net>
---
Just a cleanup I noticed in the area; can be applied independently from
patch 2.

 t/t5310-pack-bitmaps.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/t/t5310-pack-bitmaps.sh b/t/t5310-pack-bitmaps.sh
index 82d7f7f6a5..44a038881a 100755
--- a/t/t5310-pack-bitmaps.sh
+++ b/t/t5310-pack-bitmaps.sh
@@ -269,7 +269,7 @@ test_expect_success JGIT 'we can read jgit bitmaps' '
 	git clone --bare . compat-jgit.git &&
 	(
 		cd compat-jgit.git &&
-		rm -f .git/objects/pack/*.bitmap &&
+		rm -f objects/pack/*.bitmap &&
 		jgit gc &&
 		git rev-list --test-bitmap HEAD
 	)
-- 
2.21.0.543.g90eed137f3


^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH 2/2] pack-objects: default to writing bitmap hash-cache
  2019-03-15  6:21                   ` [PATCH 0/2] enable bitmap hash-cache by default Jeff King
  2019-03-15  6:22                     ` [PATCH 1/2] t5310: correctly remove bitmaps for jgit test Jeff King
@ 2019-03-15  6:25                     ` Jeff King
  1 sibling, 0 replies; 41+ messages in thread
From: Jeff King @ 2019-03-15  6:25 UTC (permalink / raw)
  To: Eric Wong; +Cc: git, Ævar Arnfjörð Bjarmason

Enabling pack.writebitmaphashcache should always be a performance win.
It costs only 4 bytes per object on disk, and the timings in ae4f07fbcc
(pack-bitmap: implement optional name_hash cache, 2013-12-21) show it
improving fetch and partial-bitmap clone times by 40-50%.

The only reason we didn't enable it by default at the time is that early
versions of JGit's bitmap reader complained about the presence of
optional header bits it didn't understand. But that was changed in
JGit's d2fa3987a (Use bitcheck to check for presence of OPT_FULL option,
2013-10-30), which made it into JGit v3.5.0 in late 2014.

So let's turn this option on by default. It's backwards-compatible with
all versions of Git, and if you are also using JGit on the same
repository, you'd only run into problems using a version that's almost 5
years old.

We'll drop the manual setting from all of our test scripts, including
perf tests. This isn't strictly necessary, but it has two advantages:

  1. If the hash-cache ever stops being enabled by default, our perf
     regression tests will notice.

  2. We can use the modified perf tests to show off the behavior of an
     otherwise unconfigured repo, as shown below.

These are the results of a few of a perf tests against linux.git that
showed interesting results. You can see the expected speedup in 5310.4,
which was noted in ae4f07fbcc. Curiously, 5310.8 did not improve (and
actually got slower), despite seeing the opposite in ae4f07fbcc.
I don't have an explanation for that.

The tests from p5311 did not exist back then, but do show improvements
(a smaller pack due to better deltas, which we found in less time).

  Test                                    HEAD^                HEAD
  -------------------------------------------------------------------------------------
  5310.4: simulated fetch                 7.39(22.70+0.25)     5.64(11.43+0.22) -23.7%
  5310.8: clone (partial bitmap)          18.45(24.83+1.19)    19.94(28.40+1.36) +8.1%
  5311.31: server (128 days)              0.41(1.13+0.05)      0.34(0.72+0.02) -17.1%
  5311.32: size   (128 days)                         7.4M                 7.0M -4.8%
  5311.33: client (128 days)              1.33(1.49+0.06)      1.29(1.37+0.12) -3.0%

Signed-off-by: Jeff King <peff@peff.net>
---
 Documentation/config/pack.txt      | 4 +---
 builtin/pack-objects.c             | 2 +-
 t/perf/p5310-pack-bitmaps.sh       | 3 +--
 t/perf/p5311-pack-bitmaps-fetch.sh | 1 -
 t/t5310-pack-bitmaps.sh            | 3 +--
 5 files changed, 4 insertions(+), 9 deletions(-)

diff --git a/Documentation/config/pack.txt b/Documentation/config/pack.txt
index 425c73aa52..9cdcfa7324 100644
--- a/Documentation/config/pack.txt
+++ b/Documentation/config/pack.txt
@@ -124,6 +124,4 @@ pack.writeBitmapHashCache::
 	bitmapped and non-bitmapped objects (e.g., when serving a fetch
 	between an older, bitmapped pack and objects that have been
 	pushed since the last gc). The downside is that it consumes 4
-	bytes per object of disk space, and that JGit's bitmap
-	implementation does not understand it, causing it to complain if
-	Git and JGit are used on the same repository. Defaults to false.
+	bytes per object of disk space. Defaults to true.
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index a154fc29f6..5264bf055a 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -97,7 +97,7 @@ static off_t reuse_packfile_offset;
 static int use_bitmap_index_default = 1;
 static int use_bitmap_index = -1;
 static int write_bitmap_index;
-static uint16_t write_bitmap_options;
+static uint16_t write_bitmap_options = BITMAP_OPT_HASH_CACHE;
 
 static int exclude_promisor_objects;
 
diff --git a/t/perf/p5310-pack-bitmaps.sh b/t/perf/p5310-pack-bitmaps.sh
index bb91dbb173..6a3a42531b 100755
--- a/t/perf/p5310-pack-bitmaps.sh
+++ b/t/perf/p5310-pack-bitmaps.sh
@@ -12,8 +12,7 @@ test_perf_large_repo
 # We intentionally use the deprecated pack.writebitmaps
 # config so that we can test against older versions of git.
 test_expect_success 'setup bitmap config' '
-	git config pack.writebitmaps true &&
-	git config pack.writebitmaphashcache true
+	git config pack.writebitmaps true
 '
 
 test_perf 'repack to disk' '
diff --git a/t/perf/p5311-pack-bitmaps-fetch.sh b/t/perf/p5311-pack-bitmaps-fetch.sh
index b04575951f..47c3fd7581 100755
--- a/t/perf/p5311-pack-bitmaps-fetch.sh
+++ b/t/perf/p5311-pack-bitmaps-fetch.sh
@@ -7,7 +7,6 @@ test_perf_default_repo
 
 test_expect_success 'create bitmapped server repo' '
 	git config pack.writebitmaps true &&
-	git config pack.writebitmaphashcache true &&
 	git repack -ad
 '
 
diff --git a/t/t5310-pack-bitmaps.sh b/t/t5310-pack-bitmaps.sh
index 44a038881a..a26c8ba9a2 100755
--- a/t/t5310-pack-bitmaps.sh
+++ b/t/t5310-pack-bitmaps.sh
@@ -34,8 +34,7 @@ test_expect_success 'setup repo with moderate-sized history' '
 	bitmaptip=$(git rev-parse master) &&
 	blob=$(echo tagged-blob | git hash-object -w --stdin) &&
 	git tag tagged-blob $blob &&
-	git config repack.writebitmaps true &&
-	git config pack.writebitmaphashcache true
+	git config repack.writebitmaps true
 '
 
 test_expect_success 'full repack creates bitmaps' '
-- 
2.21.0.543.g90eed137f3

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 1/2] t5310: correctly remove bitmaps for jgit test
  2019-03-15  6:22                     ` [PATCH 1/2] t5310: correctly remove bitmaps for jgit test Jeff King
@ 2019-03-15 13:25                       ` SZEDER Gábor
  2019-03-15 18:36                         ` Jeff King
  0 siblings, 1 reply; 41+ messages in thread
From: SZEDER Gábor @ 2019-03-15 13:25 UTC (permalink / raw)
  To: Jeff King; +Cc: Eric Wong, git, Ævar Arnfjörð Bjarmason

On Fri, Mar 15, 2019 at 02:22:44AM -0400, Jeff King wrote:
> We use "jgit gc" to generate a pack bitmap file, and then make sure our
> implementation can read it. To prepare the repo before running jgit, we
> try to "rm -f" any existing bitmap files. But we got the path wrong;
> we're in a bare repo, so looking in ".git/" finds nothing. Our "rm"
> doesn't complain because of the "-f", and when we run "rev-list" there
> are two bitmap files (ours and jgit's).

Oh, indeed.

> Our reader implementation will ignore one of the bitmap files, but it's
> likely non-deterministic which one we will use. We'd prefer the one with
> the more recent timestamp (just because of the way the packed_git list
> is sorted), but in most test runs they'd have identical timestamps.
> 
> So this was probably actually testing something useful about 50% of the
> time, and other half just testing that we could read our own bitmaps
> (which is covered elsewhere).

At least it was testing the right thing up until 87a6bb701a
(t5310-pack-bitmaps: make JGit tests work with GIT_TEST_SPLIT_INDEX,
2018-05-10) came along.


>  t/t5310-pack-bitmaps.sh | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/t/t5310-pack-bitmaps.sh b/t/t5310-pack-bitmaps.sh
> index 82d7f7f6a5..44a038881a 100755
> --- a/t/t5310-pack-bitmaps.sh
> +++ b/t/t5310-pack-bitmaps.sh
> @@ -269,7 +269,7 @@ test_expect_success JGIT 'we can read jgit bitmaps' '
>  	git clone --bare . compat-jgit.git &&
>  	(
>  		cd compat-jgit.git &&
> -		rm -f .git/objects/pack/*.bitmap &&
> +		rm -f objects/pack/*.bitmap &&
>  		jgit gc &&
>  		git rev-list --test-bitmap HEAD
>  	)
> -- 
> 2.21.0.543.g90eed137f3
> 

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 1/2] t5310: correctly remove bitmaps for jgit test
  2019-03-15 13:25                       ` SZEDER Gábor
@ 2019-03-15 18:36                         ` Jeff King
  0 siblings, 0 replies; 41+ messages in thread
From: Jeff King @ 2019-03-15 18:36 UTC (permalink / raw)
  To: SZEDER Gábor; +Cc: Eric Wong, git, Ævar Arnfjörð Bjarmason

On Fri, Mar 15, 2019 at 02:25:45PM +0100, SZEDER Gábor wrote:

> > Our reader implementation will ignore one of the bitmap files, but it's
> > likely non-deterministic which one we will use. We'd prefer the one with
> > the more recent timestamp (just because of the way the packed_git list
> > is sorted), but in most test runs they'd have identical timestamps.
> > 
> > So this was probably actually testing something useful about 50% of the
> > time, and other half just testing that we could read our own bitmaps
> > (which is covered elsewhere).
> 
> At least it was testing the right thing up until 87a6bb701a
> (t5310-pack-bitmaps: make JGit tests work with GIT_TEST_SPLIT_INDEX,
> 2018-05-10) came along.

Heh, indeed. I didn't even dig into the history, and just assumed I had
botched it in 2013. :)

-Peff

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v3] repack: enable bitmaps by default on bare repos
  2019-03-14  9:12               ` [PATCH v3] " Eric Wong
  2019-03-14 16:02                 ` Jeff King
@ 2019-04-09 15:10                 ` Ævar Arnfjörð Bjarmason
  2019-04-10 22:57                   ` Jeff King
  1 sibling, 1 reply; 41+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2019-04-09 15:10 UTC (permalink / raw)
  To: Eric Wong; +Cc: Jeff King, git


On Thu, Mar 14 2019, Eric Wong wrote:

> Jeff King <peff@peff.net> wrote:
>> On Wed, Mar 13, 2019 at 01:51:33AM +0000, Eric Wong wrote:
>>
>> > But I did find Ævar's forgotten gitperformance doc and thread
>> > where the topic was brought up:
>> >
>> >   https://public-inbox.org/git/20170403211644.26814-1-avarab@gmail.com/
>>
>> One thing that thread reminded me of: we probably also want to default
>> pack.writebitmaphashcache on. Otherwise the time saved during the object
>> enumeration can backfire when we spend much more time trying delta
>> compression (because we don't know the pathnames of any objects).
>
> Interesting...  I found a big improvement with public-inbox
> just using bitmaps; but have never tried the hash cache.
>
>> The reason it defaults to off is for on-disk compatibility with JGit.
>
> Right.  Our documentation seems to indicate JGit just warns (but
> doesn't fall over), so maybe that can be considered separately.
>
> I've never used JGit myself; and was satisfied enough with
> bitmaps alone that I never tried the hash-cache.
>
>> But I have very little experience running without the hash-cache on. We
>> added it very early on because we found performance was not great
>> without it (I don't know if people running JGit have run into the same
>> problem and if not, why not).
>
> As far as serving clones and fetches, public-inbox-init has
> always created bare repos with bitmaps enabled, but without
> the hash-cache for compatibility concerns.
>
> That's a lot of fetches and clones over the years.
>
>> > +test_expect_success 'incremental repack does not complain' '
>> > +	git -C bare.git repack -q 2>repack.err &&
>> > +	! test -s repack.err
>> > +'
>>
>> This last line could use "test_must_be_empty".
>
> Thanks for the review!
>
> ---------8<-----------
> Subject: [PATCH] repack: enable bitmaps by default on bare repos
>
> A typical use case for bare repos is for serving clones and
> fetches to clients.  Enable bitmaps by default on bare repos to
> make it easier for admins to host git repos in a performant way.
>
> Signed-off-by: Eric Wong <e@80x24.org>
> Helped-by: Jeff King <peff@peff.net>
> ---
>  Documentation/config/repack.txt |  2 +-
>  builtin/repack.c                |  5 ++++-
>  t/t7700-repack.sh               | 19 ++++++++++++++++++-
>  3 files changed, 23 insertions(+), 3 deletions(-)
>
> diff --git a/Documentation/config/repack.txt b/Documentation/config/repack.txt
> index a5c37813fd..9c413e177e 100644
> --- a/Documentation/config/repack.txt
> +++ b/Documentation/config/repack.txt
> @@ -24,4 +24,4 @@ repack.writeBitmaps::
>  	packs created for clones and fetches, at the cost of some disk
>  	space and extra time spent on the initial repack.  This has
>  	no effect if multiple packfiles are created.
> -	Defaults to false.
> +	Defaults to true on bare repos, false otherwise.
> diff --git a/builtin/repack.c b/builtin/repack.c
> index 67f8978043..caca113927 100644
> --- a/builtin/repack.c
> +++ b/builtin/repack.c
> @@ -14,7 +14,7 @@
>
>  static int delta_base_offset = 1;
>  static int pack_kept_objects = -1;
> -static int write_bitmaps;
> +static int write_bitmaps = -1;
>  static int use_delta_islands;
>  static char *packdir, *packtmp;
>
> @@ -343,6 +343,9 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
>  	    (unpack_unreachable || (pack_everything & LOOSEN_UNREACHABLE)))
>  		die(_("--keep-unreachable and -A are incompatible"));
>
> +	if (write_bitmaps < 0)
> +		write_bitmaps = (pack_everything & ALL_INTO_ONE) &&
> +				 is_bare_repository();
>  	if (pack_kept_objects < 0)
>  		pack_kept_objects = write_bitmaps;
>
> diff --git a/t/t7700-repack.sh b/t/t7700-repack.sh
> index 6162e2a8e6..86d05160a3 100755
> --- a/t/t7700-repack.sh
> +++ b/t/t7700-repack.sh
> @@ -221,5 +221,22 @@ test_expect_success 'repack --keep-pack' '
>  	)
>  '
>
> -test_done
> +test_expect_success 'bitmaps are created by default in bare repos' '
> +	git clone --bare .git bare.git &&
> +	git -C bare.git repack -ad &&
> +	bitmap=$(ls bare.git/objects/pack/*.bitmap) &&
> +	test_path_is_file "$bitmap"
> +'
> +
> +test_expect_success 'incremental repack does not complain' '
> +	git -C bare.git repack -q 2>repack.err &&
> +	test_must_be_empty repack.err
> +'
>
> +test_expect_success 'bitmaps can be disabled on bare repos' '
> +	git -c repack.writeBitmaps=false -C bare.git repack -ad &&
> +	bitmap=$(ls bare.git/objects/pack/*.bitmap 2>/dev/null || :) &&
> +	test -z "$bitmap"
> +'
> +
> +test_done

I've found a case where turning bitmaps on does horrible things for
bitmap "push" performance.

As it turns out it's not related to this patch per-se, because I had a
*.bitmap for other reasons, but replying to this because we'd presumably
get the same thing in the bare repo case once this merges down.

I can't share the repo, but I had a report where just a "git push" of a
topic branch that was 2/58 ahead/behind took ~2 minutes just in
"Enumerating objects", but ~500ms without bitmaps.

Using a horrible "print to stderr"[1] monkeypatch I'd get without
bitmaps and reported by trace2 / ts:

    Apr 09 16:45:15 16:45:15.365817 git.c:433                         | d1 | main                     | cmd_name     |     |           |           |            | pack-objects (push/pack-objects)
    Apr 09 16:45:15 16:45:15.366220 builtin/pack-objects.c:3493       | d1 | main                     | region_enter | r1  |  0.000928 |           | pack-objects | label:enumerate-objects
    Apr 09 16:45:15 16:45:15.366241 builtin/pack-objects.c:3495       | d1 | main                     | region_enter | r1  |  0.000950 |           | pack-objects | ..label:enumerate-objects-prepare-packing-data
    Apr 09 16:45:15 16:45:15.366384 builtin/pack-objects.c:3498       | d1 | main                     | region_leave | r1  |  0.001091 |  0.000141 | pack-objects | ..label:enumerate-objects-prepare-packing-data
    Apr 09 16:45:15 16:45:15.366394 builtin/pack-objects.c:3510       | d1 | main                     | region_enter | r1  |  0.001102 |           | pack-objects | ..label:enumerate-objects-get-obj-list
    Apr 09 16:45:15 get obj list 1
    Apr 09 16:45:15 get obj list 2, did 29391 lines
    Apr 09 16:45:15 get obj list 3
    Apr 09 16:45:15 get obj list 4
    Apr 09 16:45:15 get obj list 5
    Apr 09 16:45:15 get obj list 6
    Apr 09 16:45:15 get obj list 7
    Apr 09 16:45:15 get obj list 8
    Apr 09 16:45:15 get obj list 9
    Apr 09 16:45:15 16:45:15.776559 builtin/pack-objects.c:3514       | d1 | main                     | region_leave | r1  |  0.411263 |  0.410161 | pack-objects | ..label:enumerate-objects-get-obj-list
    Apr 09 16:45:15 16:45:15.776577 builtin/pack-objects.c:3517       | d1 | main                     | region_enter | r1  |  0.411285 |           | pack-objects | ..label:enumerate-objects-cleanup-preferred-base
    Apr 09 16:45:15 16:45:15.776584 builtin/pack-objects.c:3520       | d1 | main                     | region_leave | r1  |  0.411292 |  0.000007 | pack-objects | ..label:enumerate-objects-cleanup-preferred-base
    Apr 09 16:45:15 16:45:15.776605 builtin/pack-objects.c:3530       | d1 | main                     | region_leave | r1  |  0.411313 |  0.410385 | pack-objects | label:enumerate-objects
    Apr 09 16:45:15 16:45:15.776609 builtin/pack-objects.c:3542       | d1 | main                     | region_enter | r1  |  0.411318 |           | pack-objects | label:write-pack-file
    Apr 09 16:45:15 16:45:15.794235 builtin/pack-objects.c:3544       | d1 | main                     | region_leave | r1  |  0.428942 |  0.017624 | pack-objects | label:write-pack-file

But with pack.useBitmaps=true:

    Apr 09 16:39:59 16:39:59.139022 git.c:433                         | d1 | main                     | cmd_name     |     |           |           |            | pack-objects (push/pack-objects)
    Apr 09 16:39:59 16:39:59.139398 builtin/pack-objects.c:3493       | d1 | main                     | region_enter | r1  |  0.000869 |           | pack-objects | label:enumerate-objects
    Apr 09 16:39:59 16:39:59.139419 builtin/pack-objects.c:3495       | d1 | main                     | region_enter | r1  |  0.000892 |           | pack-objects | ..label:enumerate-objects-prepare-packing-data
    Apr 09 16:39:59 16:39:59.139551 builtin/pack-objects.c:3498       | d1 | main                     | region_leave | r1  |  0.001023 |  0.000131 | pack-objects | ..label:enumerate-objects-prepare-packing-data
    Apr 09 16:39:59 16:39:59.139559 builtin/pack-objects.c:3510       | d1 | main                     | region_enter | r1  |  0.001032 |           | pack-objects | ..label:enumerate-objects-get-obj-list
    Apr 09 16:39:59 get obj list 1
    Apr 09 16:39:59 get obj list 2, did 29392 lines
    Apr 09 16:39:59 get obj list 3
    Apr 09 16:39:59 prepping walk
    Apr 09 16:39:59 opening packed bitmap...
    Apr 09 16:39:59 opening packed bitmap done
    Apr 09 16:39:59 walking 29392 pending
    Apr 09 16:39:59 done walking 29392 pending
    Apr 09 16:39:59 prepare_bitmap_walk 3
    Apr 09 16:39:59 prepare_bitmap_walk 4
    Apr 09 16:39:59 prepare_bitmap_walk 5
    Apr 09 16:40:00 prepare_bitmap_walk 6
    Apr 09 16:40:00 prepare_bitmap_walk 6.1
    Apr 09 16:41:35 prepare_bitmap_walk 6.2
    Apr 09 16:41:35 prepare_bitmap_walk 7
    Apr 09 16:41:52 prepare_bitmap_walk 8
    Apr 09 16:41:52 walking?
    Apr 09 16:41:52 traversing
    Apr 09 16:41:52 traversing done
    Apr 09 16:41:52 16:41:52.091634 builtin/pack-objects.c:3514       | d1 | main                     | region_leave | r1  | 112.953099 | 112.952067 | pack-objects | ..label:enumerate-objects-get-obj-list
    Apr 09 16:41:52 16:41:52.091655 builtin/pack-objects.c:3517       | d1 | main                     | region_enter | r1  | 112.953128 |           | pack-objects | ..label:enumerate-objects-cleanup-preferred-base
    Apr 09 16:41:52 16:41:52.091668 builtin/pack-objects.c:3520       | d1 | main                     | region_leave | r1  | 112.953141 |  0.000013 | pack-objects | ..label:enumerate-objects-cleanup-preferred-base
    Apr 09 16:41:52 16:41:52.091700 builtin/pack-objects.c:3530       | d1 | main                     | region_leave | r1  | 112.953172 | 112.952303 | pack-objects | label:enumerate-objects
    Apr 09 16:41:52 16:41:52.091706 builtin/pack-objects.c:3542       | d1 | main                     | region_enter | r1  | 112.953179 |           | pack-objects | label:write-pack-file
    Apr 09 16:41:52 16:41:52.111966 builtin/pack-objects.c:3544       | d1 | main                     | region_leave | r1  | 112.973438 |  0.020259 | pack-objects | label:write-pack-file

I.e. almost all the time is in get_object_list_from_bitmap() and around
1m30s in just this in pack-bitmap.c:

    haves_bitmap = find_objects(bitmap_git, revs, haves, NULL);

And then another ~20s in:

    wants_bitmap = find_objects(bitmap_git, revs, wants, haves_bitmap);

This is with 10 packs and where only the largest (initial clone pack)
had a *.bitmap, but I can also reproduce with a 'git repack -A -d -b',
i.e. with only one pack with a *.bitmap, although that makes it a bit
better for the first bit, and almost completely cuts down on the time
spent in the second phase:

    Apr 09 17:08:37 17:08:37.261507 builtin/pack-objects.c:3493       | d1 | main                     | region_enter | r1  |  0.000922 |           | pack-objects | label:enumerate-objects
    Apr 09 17:08:37 17:08:37.261527 builtin/pack-objects.c:3495       | d1 | main                     | region_enter | r1  |  0.000943 |           | pack-objects | ..label:enumerate-objects-prepare-packing-data
    Apr 09 17:08:37 17:08:37.261600 builtin/pack-objects.c:3498       | d1 | main                     | region_leave | r1  |  0.001015 |  0.000072 | pack-objects | ..label:enumerate-objects-prepare-packing-data
    Apr 09 17:08:37 17:08:37.261608 builtin/pack-objects.c:3510       | d1 | main                     | region_enter | r1  |  0.001024 |           | pack-objects | ..label:enumerate-objects-get-obj-list
    Apr 09 17:08:37 get obj list 1
    Apr 09 17:08:37 get obj list 2, did 29380 lines
    Apr 09 17:08:37 get obj list 3
    Apr 09 17:08:37 prepping walk
    Apr 09 17:08:37 opening packed bitmap...
    Apr 09 17:08:37 opening packed bitmap done
    Apr 09 17:08:37 walking 29380 pending
    Apr 09 17:08:37 done walking 29380 pending
    Apr 09 17:08:37 prepare_bitmap_walk 3
    Apr 09 17:08:37 prepare_bitmap_walk 4
    Apr 09 17:08:37 prepare_bitmap_walk 5
    Apr 09 17:08:38 prepare_bitmap_walk 6
    Apr 09 17:08:38 prepare_bitmap_walk 6.1
    Apr 09 17:09:07 prepare_bitmap_walk 6.2
    Apr 09 17:09:07 prepare_bitmap_walk 7
    Apr 09 17:09:09 prepare_bitmap_walk 8
    Apr 09 17:09:09 walking?
    Apr 09 17:09:09 traversing
    Apr 09 17:09:09 traversing done
    Apr 09 17:09:09 17:09:09.229185 builtin/pack-objects.c:3514       | d1 | main                     | region_leave | r1  | 31.968595 | 31.967571 | pack-objects | ..label:enumerate-objects-get-obj-list
    Apr 09 17:09:09 17:09:09.229203 builtin/pack-objects.c:3517       | d1 | main                     | region_enter | r1  | 31.968619 |           | pack-objects | ..label:enumerate-objects-cleanup-preferred-base
    Apr 09 17:09:09 17:09:09.229214 builtin/pack-objects.c:3520       | d1 | main                     | region_leave | r1  | 31.968630 |  0.000011 | pack-objects | ..label:enumerate-objects-cleanup-preferred-base
    Apr 09 17:09:09 17:09:09.229242 builtin/pack-objects.c:3530       | d1 | main                     | region_leave | r1  | 31.968658 | 31.967736 | pack-objects | label:enumerate-objects
    Apr 09 17:09:09 17:09:09.229265 builtin/pack-objects.c:3542       | d1 | main                     | region_enter | r1  | 31.968681 |           | pack-objects | label:write-pack-file
    Apr 09 17:09:09 17:09:09.251998 builtin/pack-objects.c:3544       | d1 | main                     | region_leave | r1  | 31.991412 |  0.022731 | pack-objects | label:write-pack-file

I don't have time to dig more into this now, just wanted to send these
initial results...

1.



    diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
    index a154fc29f6..8b2af1740e 100644
    --- a/builtin/pack-objects.c
    +++ b/builtin/pack-objects.c
    @@ -3052,2 +3052,3 @@ static int get_object_list_from_bitmap(struct rev_info *revs)
     {
    +	fprintf(stderr, "prepping walk\n");
     	if (!(bitmap_git = prepare_bitmap_walk(revs)))
    @@ -3055,2 +3056,3 @@ static int get_object_list_from_bitmap(struct rev_info *revs)

    +	fprintf(stderr, "walking?\n");
     	if (pack_options_allow_reuse() &&
    @@ -3066,3 +3068,5 @@ static int get_object_list_from_bitmap(struct rev_info *revs)

    +	fprintf(stderr, "traversing\n");
     	traverse_bitmap_commit_list(bitmap_git, &add_object_entry_from_bitmap);
    +	fprintf(stderr, "traversing done\n");
     	return 0;
    @@ -3091,2 +3095,3 @@ static void get_object_list(int ac, const char **av)
     	int save_warning;
    +	int lns = 0;

    @@ -3102,3 +3107,5 @@ static void get_object_list(int ac, const char **av)

    +	fprintf(stderr, "get obj list 1\n");
     	while (fgets(line, sizeof(line), stdin) != NULL) {
    +		lns++;
     		int len = strlen(line);
    @@ -3128,4 +3135,6 @@ static void get_object_list(int ac, const char **av)

    +	fprintf(stderr, "get obj list 2, did %d lines\n", lns);
     	warn_on_object_refname_ambiguity = save_warning;

    +	fprintf(stderr, "get obj list 3\n");
     	if (use_bitmap_index && !get_object_list_from_bitmap(&revs))
    @@ -3133,2 +3142,3 @@ static void get_object_list(int ac, const char **av)

    +	fprintf(stderr, "get obj list 4\n");
     	if (use_delta_islands)
    @@ -3136,2 +3146,3 @@ static void get_object_list(int ac, const char **av)

    +	fprintf(stderr, "get obj list 5\n");
     	if (prepare_revision_walk(&revs))
    @@ -3140,2 +3151,3 @@ static void get_object_list(int ac, const char **av)

    +	fprintf(stderr, "get obj list 6\n");
     	if (!fn_show_object)
    @@ -3146,2 +3158,3 @@ static void get_object_list(int ac, const char **av)

    +	fprintf(stderr, "get obj list 7\n");
     	if (unpack_unreachable_expiration) {
    @@ -3157,2 +3170,3 @@ static void get_object_list(int ac, const char **av)

    +	fprintf(stderr, "get obj list 8\n");
     	if (keep_unreachable)
    @@ -3163,2 +3177,3 @@ static void get_object_list(int ac, const char **av)
     		loosen_unused_packed_objects(&revs);
    +	fprintf(stderr, "get obj list 9\n");

    @@ -3478,3 +3493,7 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
     			    the_repository);
    +	trace2_region_enter("pack-objects", "enumerate-objects-prepare-packing-data",
    +			    the_repository);
     	prepare_packing_data(the_repository, &to_pack);
    +	trace2_region_leave("pack-objects", "enumerate-objects-prepare-packing-data",
    +			    the_repository);

    @@ -3482,11 +3501,28 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
     		progress_state = start_progress(_("Enumerating objects"), 0);
    -	if (!use_internal_rev_list)
    +	if (!use_internal_rev_list) {
    +		trace2_region_enter("pack-objects", "enumerate-objects-read-stdin",
    +				    the_repository);
     		read_object_list_from_stdin();
    -	else {
    +		trace2_region_leave("pack-objects", "enumerate-objects-read-stdin",
    +				    the_repository);
    +	} else {
    +		trace2_region_enter("pack-objects", "enumerate-objects-get-obj-list",
    +				    the_repository);
     		get_object_list(rp.argc, rp.argv);
     		argv_array_clear(&rp);
    +		trace2_region_leave("pack-objects", "enumerate-objects-get-obj-list",
    +				    the_repository);
     	}
    +	trace2_region_enter("pack-objects", "enumerate-objects-cleanup-preferred-base",
    +			    the_repository);
     	cleanup_preferred_base();
    -	if (include_tag && nr_result)
    +	trace2_region_leave("pack-objects", "enumerate-objects-cleanup-preferred-base",
    +			    the_repository);
    +	if (include_tag && nr_result) {
    +		trace2_region_enter("pack-objects", "enumerate-objects-add-tags",
    +				    the_repository);
     		for_each_ref(add_ref_tag, NULL);
    +		trace2_region_leave("pack-objects", "enumerate-objects-add-tags",
    +				    the_repository);
    +	}
     	stop_progress(&progress_state);
    diff --git a/pack-bitmap.c b/pack-bitmap.c
    index 4695aaf6b4..0ab71597fd 100644
    --- a/pack-bitmap.c
    +++ b/pack-bitmap.c
    @@ -693,5 +693,8 @@ struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs)
     	 * because we may not need to use it */
    +	fprintf(stderr, "opening packed bitmap...\n");
     	if (open_pack_bitmap(revs->repo, bitmap_git) < 0)
     		goto cleanup;
    +	fprintf(stderr, "opening packed bitmap done\n");

    +	fprintf(stderr, "walking %d pending\n", revs->pending.nr);
     	for (i = 0; i < revs->pending.nr; ++i) {
    @@ -720,2 +723,3 @@ struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs)
     	}
    +	fprintf(stderr, "done walking %d pending\n", revs->pending.nr);

    @@ -726,2 +730,3 @@ struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs)
     	 */
    +	fprintf(stderr, "prepare_bitmap_walk 3\n");
     	if (haves && !in_bitmapped_pack(bitmap_git, haves))
    @@ -729,2 +734,3 @@ struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs)

    +	fprintf(stderr, "prepare_bitmap_walk 4\n");
     	/* if we don't want anything, we're done here */
    @@ -738,2 +744,3 @@ struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs)
     	 */
    +	fprintf(stderr, "prepare_bitmap_walk 5\n");
     	if (load_pack_bitmap(bitmap_git) < 0)
    @@ -741,2 +748,3 @@ struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs)

    +	fprintf(stderr, "prepare_bitmap_walk 6\n");
     	object_array_clear(&revs->pending);
    @@ -745,3 +753,5 @@ struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs)
     		revs->ignore_missing_links = 1;
    + 		fprintf(stderr, "prepare_bitmap_walk 6.1\n");
     		haves_bitmap = find_objects(bitmap_git, revs, haves, NULL);
    + 		fprintf(stderr, "prepare_bitmap_walk 6.2\n");
     		reset_revision_walk();
    @@ -752,4 +762,6 @@ struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs)
     	}
    +	fprintf(stderr, "prepare_bitmap_walk 7\n");

     	wants_bitmap = find_objects(bitmap_git, revs, wants, haves_bitmap);
    +	fprintf(stderr, "prepare_bitmap_walk 8\n");

    diff --git a/revision.c b/revision.c
    index eb8e51bc63..4592d01ee7 100644
    --- a/revision.c
    +++ b/revision.c
    @@ -63,2 +63,4 @@ static void mark_tree_contents_uninteresting(struct repository *r,

    +	fprintf(stderr, "MTCU\n");
    +
     	if (parse_tree_gently(tree, 1) < 0)
    @@ -167,2 +169,4 @@ static void add_children_by_path(struct repository *r,

    +	fprintf(stderr, "ACBP\n");
    +
     	if (!tree)

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v3] repack: enable bitmaps by default on bare repos
  2019-04-09 15:10                 ` [PATCH v3] repack: enable bitmaps by default on bare repos Ævar Arnfjörð Bjarmason
@ 2019-04-10 22:57                   ` Jeff King
  2019-04-25  7:16                     ` Junio C Hamano
  0 siblings, 1 reply; 41+ messages in thread
From: Jeff King @ 2019-04-10 22:57 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: Eric Wong, git

On Tue, Apr 09, 2019 at 05:10:43PM +0200, Ævar Arnfjörð Bjarmason wrote:

> I've found a case where turning bitmaps on does horrible things for
> bitmap "push" performance.
> [...]
> I can't share the repo, but I had a report where just a "git push" of a
> topic branch that was 2/58 ahead/behind took ~2 minutes just in
> "Enumerating objects", but ~500ms without bitmaps.

That's pretty bad, though I'm not terribly surprised. The worst cases
are ones where we have to traverse a lot to fill in the bitmap. So
either there are a lot of commits newer than the bitmapped pack, or
we've done a bad job of picking old commits to bitmap, requiring us to
walk back to find all of the reachable objects (until we find something
that does have a bitmap).

And "bad" here is somewhat subjective. The other side told us some
"have" lines, and those are what we have to walk back from. _Usually_
those would correspond to ref tips, and the bitmap code tries to put a
bitmap at each ref tip. But if you have tons of refs, it can't always do
so.

> I.e. almost all the time is in get_object_list_from_bitmap() and around
> 1m30s in just this in pack-bitmap.c:
> 
>     haves_bitmap = find_objects(bitmap_git, revs, haves, NULL);
> 
> And then another ~20s in:
> 
>     wants_bitmap = find_objects(bitmap_git, revs, wants, haves_bitmap);

Yeah, that's where I'd expect the time to go. Inside find_objects()
we'll call traverse_commit_list() to do the actual walk.

The first walk for "haves" is looking mostly at old commits. Stuff that
is mentioned in the pack, but for which we have to walk to find the
bitmap.

The second walk for wants is probably mostly looking at new commits.
Things that don't have a bitmap yet, and for which we have to walk
(though it's interesting that it's so much more expensive than the
non-bitmap walk, which has to fully enumerate those trees itself; that
implies that some of the "haves" are recent, too, with respect to the
last pack).

> This is with 10 packs and where only the largest (initial clone pack)
> had a *.bitmap, but I can also reproduce with a 'git repack -A -d -b',
> i.e. with only one pack with a *.bitmap, although that makes it a bit
> better for the first bit, and almost completely cuts down on the time
> spent in the second phase:

Yeah, that makes sense. By repacking you've taken all those new commits
and included them in on-disk bitmaps. So I'd expect the "wants" to get
much shorter, but the "haves" phase staying long means we could do a
better job of picking commits to have on-disk bitmaps.

So two avenues for exploration I think:

  1. I've long suspected that the bitmap selection code isn't ideal.
     Both in terms of what it picks, but also in its runtime (I think it
     ends up walking the same slices of history multiple times in some
     cases).

  2. The answer we get from a bitmap versus a regular traversal are not
     apples-to-apples equivalent. The regular traversal walks down to
     the UNINTERESTING commits, marks the boundaries trees and blobs as
     UNINTERESTING, and then adds in all the interesting trees and blobs
     minus the UNINTERESTING parts. So it can sometimes give the wrong
     answer, claiming something is interesting when it is not.

     Whereas with bitmaps we fill in the trees and blobs as we walk, and
     you get the true answer. But it means we may open up a lot more
     trees than the equivalent traversal would.

     So one thing I've never really experimented with (and indeed, never
     really thought about until writing this email) is that the bitmaps
     could try to do that looser style of traversal, knowing that we
     might err on the side of calling things interesting in a few cases.
     But hopefully spending a lot less time opening trees.

     I'm not even 100% sure what that would look like in code, but just
     thinking about it from a high level, I don't there's a particular
     mathematical reason it couldn't work.

-Peff

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 2/3] prune: use bitmaps for reachability traversal
  2019-02-14  4:37 ` [PATCH 2/3] prune: use bitmaps for " Jeff King
  2019-03-09  2:49   ` bitmaps by default? [was: prune: use bitmaps for reachability traversal] Eric Wong
@ 2019-04-15 15:00   ` Derrick Stolee
  2019-04-18 19:49     ` Jeff King
  1 sibling, 1 reply; 41+ messages in thread
From: Derrick Stolee @ 2019-04-15 15:00 UTC (permalink / raw)
  To: Jeff King, git

On 2/13/2019 11:37 PM, Jeff King wrote:
> +static void *lookup_object_by_type(struct repository *r,
> +				   const struct object_id *oid,
> +				   enum object_type type)
> +{
> +	switch (type) {
> +	case OBJ_COMMIT:
> +		return lookup_commit(r, oid);
> +	case OBJ_TREE:
> +		return lookup_tree(r, oid);
> +	case OBJ_TAG:
> +		return lookup_tag(r, oid);
> +	case OBJ_BLOB:
> +		return lookup_blob(r, oid);
> +	default:
> +		die("BUG: unknown object type %d", type);
> +	}
> +}
> +
> +static int mark_object_seen(const struct object_id *oid,
> +			     enum object_type type,
> +			     int exclude,
> +			     uint32_t name_hash,
> +			     struct packed_git *found_pack,
> +			     off_t found_offset)
> +{
> +	struct object *obj = lookup_object_by_type(the_repository, oid, type);
> +	if (!obj)
> +		die("unable to create object '%s'", oid_to_hex(oid));
> +
> +	obj->flags |= SEEN;
> +	return 0;
> +}
> +
>  void mark_reachable_objects(struct rev_info *revs, int mark_reflog,
>  			    timestamp_t mark_recent, struct progress *progress)
>  {
>  	struct connectivity_progress cp;
> +	struct bitmap_index *bitmap_git;
>  
>  	/*
>  	 * Set up revision parsing, and mark us as being interested
> @@ -188,6 +223,13 @@ void mark_reachable_objects(struct rev_info *revs, int mark_reflog,
>  	cp.progress = progress;
>  	cp.count = 0;
>  
> +	bitmap_git = prepare_bitmap_walk(revs);
> +	if (bitmap_git) {
> +		traverse_bitmap_commit_list(bitmap_git, mark_object_seen);
> +		free_bitmap_index(bitmap_git);
> +		return;
> +	}
> +

Peff,

This block after "if (bitmap_git)" is not exercised by the (non-performance)
test suite, so the rest of the code above is not tested, either. Could a test
of this "prune" capability be added to the regression tests around the bitmaps?

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 2/3] prune: use bitmaps for reachability traversal
  2019-04-15 15:00   ` [PATCH 2/3] prune: use bitmaps for reachability traversal Derrick Stolee
@ 2019-04-18 19:49     ` Jeff King
  2019-04-18 20:08       ` [PATCH] t5304: add a test for pruning with bitmaps Jeff King
  0 siblings, 1 reply; 41+ messages in thread
From: Jeff King @ 2019-04-18 19:49 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: git

On Mon, Apr 15, 2019 at 11:00:45AM -0400, Derrick Stolee wrote:

> >  void mark_reachable_objects(struct rev_info *revs, int mark_reflog,
> >  			    timestamp_t mark_recent, struct progress *progress)
> [...]
> >  	cp.progress = progress;
> >  	cp.count = 0;
> >  
> > +	bitmap_git = prepare_bitmap_walk(revs);
> > +	if (bitmap_git) {
> > +		traverse_bitmap_commit_list(bitmap_git, mark_object_seen);
> > +		free_bitmap_index(bitmap_git);
> > +		return;
> > +	}
> > +
> 
> This block after "if (bitmap_git)" is not exercised by the (non-performance)
> test suite, so the rest of the code above is not tested, either. Could a test
> of this "prune" capability be added to the regression tests around the bitmaps?

I have somewhat mixed feelings here.  We can add a test with a trivial
bitmap to get code coverage here. But as with many optimizations
(bitmaps, but also your commit graph work), we get a much more thorough
correctness test by re-running the whole test suite (though we do not
yet have one for running with bitmaps on), and a better test that the
optimization is kicking in and working via the t/perf tests.

I dunno. I guess it does not hurt to at least to at least make sure this
code is running in the normal suite. I don't think that will find the
more interesting regressions, but at least saves us the from the most
egregious ones ("oops, the whole thing segfaults because somebody
changed the API" kinds of things).

-Peff

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH] t5304: add a test for pruning with bitmaps
  2019-04-18 19:49     ` Jeff King
@ 2019-04-18 20:08       ` Jeff King
  2019-04-20  1:01         ` Derrick Stolee
  0 siblings, 1 reply; 41+ messages in thread
From: Jeff King @ 2019-04-18 20:08 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: git, Junio C Hamano

On Thu, Apr 18, 2019 at 03:49:53PM -0400, Jeff King wrote:

> > This block after "if (bitmap_git)" is not exercised by the (non-performance)
> > test suite, so the rest of the code above is not tested, either. Could a test
> > of this "prune" capability be added to the regression tests around the bitmaps?
> 
> I have somewhat mixed feelings here.  We can add a test with a trivial
> bitmap to get code coverage here. But as with many optimizations
> (bitmaps, but also your commit graph work), we get a much more thorough
> correctness test by re-running the whole test suite (though we do not
> yet have one for running with bitmaps on), and a better test that the
> optimization is kicking in and working via the t/perf tests.
> 
> I dunno. I guess it does not hurt to at least to at least make sure this
> code is running in the normal suite. I don't think that will find the
> more interesting regressions, but at least saves us the from the most
> egregious ones ("oops, the whole thing segfaults because somebody
> changed the API" kinds of things).

So here's a test. This goes on top of jk/prune-optim (which is also
already in master).

-- >8 --
Subject: [PATCH] t5304: add a test for pruning with bitmaps

Commit fde67d6896 (prune: use bitmaps for reachability traversal,
2019-02-13) uses bitmaps for pruning when they're available, but only
covers this functionality in the t/perf tests. This makes a kind of
sense, since the point is that the behaviour is indistinguishable before
and after the patch, just faster.

But since the bitmap code path is not exercised at all in the regular
test suite, it leaves us open to a regression where the behavior does in
fact change. The most thorough way to test that would be running the
whole suite with bitmaps enabled. But we don't yet have a way to do
that, and anyway it's expensive to do so. Let's at least add a basic
test that exercises this path and make sure we prune an object we should
(and not one that we shouldn't).

That would hopefully catch the most obvious breakages early.

Signed-off-by: Jeff King <peff@peff.net>
---
 t/t5304-prune.sh | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/t/t5304-prune.sh b/t/t5304-prune.sh
index 1eeb828a15..df60f18fb8 100755
--- a/t/t5304-prune.sh
+++ b/t/t5304-prune.sh
@@ -341,4 +341,12 @@ test_expect_success 'prune: handle expire option correctly' '
 	git prune --no-expire
 '
 
+test_expect_success 'trivial prune with bitmaps enabled' '
+	git repack -adb &&
+	blob=$(echo bitmap-unreachable-blob | git hash-object -w --stdin) &&
+	git prune --expire=now &&
+	git cat-file -e HEAD &&
+	test_must_fail git cat-file -e $blob
+'
+
 test_done
-- 
2.21.0.1090.g9d17dac192


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH] t5304: add a test for pruning with bitmaps
  2019-04-18 20:08       ` [PATCH] t5304: add a test for pruning with bitmaps Jeff King
@ 2019-04-20  1:01         ` Derrick Stolee
  2019-04-20  3:24           ` Jeff King
  0 siblings, 1 reply; 41+ messages in thread
From: Derrick Stolee @ 2019-04-20  1:01 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Junio C Hamano

On 4/18/2019 4:08 PM, Jeff King wrote:
> On Thu, Apr 18, 2019 at 03:49:53PM -0400, Jeff King wrote:
>> I dunno. I guess it does not hurt to at least to at least make sure this
>> code is running in the normal suite. I don't think that will find the
>> more interesting regressions, but at least saves us the from the most
>> egregious ones ("oops, the whole thing segfaults because somebody
>> changed the API" kinds of things).

That's the level of coverage I was hoping to see. I won't be picky if
the OBJ_TAG case isn't hit or something.

> So here's a test. This goes on top of jk/prune-optim (which is also
> already in master).

[snip]

> +test_expect_success 'trivial prune with bitmaps enabled' '
> +	git repack -adb &&
> +	blob=$(echo bitmap-unreachable-blob | git hash-object -w --stdin) &&
> +	git prune --expire=now &&
> +	git cat-file -e HEAD &&
> +	test_must_fail git cat-file -e $blob
> +'
> +
>  test_done

This test does cover the 'mark_object_seen()' method, which I tested by
removing the "!" in the "if (!obj) die();" condition.

However, my first inclination was to remove the line

	obj->flags |= SEEN;

from the method, and I found that it still worked! This was confusing,
so I looked for places where the SEEN flag was added during bitmap walks,
and it turns out that the objects are marked as SEEN by prepare_bitmap_walk().

This means that we can remove a lot of the implementation from reachable.c
and have the same effect (and drop an iteration through the objects). See the
diff below.

Thoughts?
-Stolee

-->8--

diff --git a/reachable.c b/reachable.c
index 0d00a91de4..7d2762d47f 100644
--- a/reachable.c
+++ b/reachable.c
@@ -159,39 +159,6 @@ int add_unseen_recent_objects_to_traversal(struct rev_info *revs,
                                      FOR_EACH_OBJECT_LOCAL_ONLY);
 }

-static void *lookup_object_by_type(struct repository *r,
-                                  const struct object_id *oid,
-                                  enum object_type type)
-{
-       switch (type) {
-       case OBJ_COMMIT:
-               return lookup_commit(r, oid);
-       case OBJ_TREE:
-               return lookup_tree(r, oid);
-       case OBJ_TAG:
-               return lookup_tag(r, oid);
-       case OBJ_BLOB:
-               return lookup_blob(r, oid);
-       default:
-               die("BUG: unknown object type %d", type);
-       }
-}
-
-static int mark_object_seen(const struct object_id *oid,
-                            enum object_type type,
-                            int exclude,
-                            uint32_t name_hash,
-                            struct packed_git *found_pack,
-                            off_t found_offset)
-{
-       struct object *obj = lookup_object_by_type(the_repository, oid, type);
-       if (!obj)
-               die("unable to create object '%s'", oid_to_hex(oid));
-
-       obj->flags |= SEEN;
-       return 0;
-}
-
 void mark_reachable_objects(struct rev_info *revs, int mark_reflog,
                            timestamp_t mark_recent, struct progress *progress)
 {
@@ -225,7 +192,10 @@ void mark_reachable_objects(struct rev_info *revs, int mark_reflog,

        bitmap_git = prepare_bitmap_walk(revs);
        if (bitmap_git) {
-               traverse_bitmap_commit_list(bitmap_git, mark_object_seen);
+               /*
+                * reachable objects are marked as SEEN
+                * by prepare_bitmap_walk(revs).
+                */
                free_bitmap_index(bitmap_git);
                return;
        }

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH] t5304: add a test for pruning with bitmaps
  2019-04-20  1:01         ` Derrick Stolee
@ 2019-04-20  3:24           ` Jeff King
  2019-04-20 21:01             ` Derrick Stolee
  0 siblings, 1 reply; 41+ messages in thread
From: Jeff King @ 2019-04-20  3:24 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: git, Junio C Hamano

On Fri, Apr 19, 2019 at 09:01:50PM -0400, Derrick Stolee wrote:

> > +test_expect_success 'trivial prune with bitmaps enabled' '
> > +	git repack -adb &&
> > +	blob=$(echo bitmap-unreachable-blob | git hash-object -w --stdin) &&
> > +	git prune --expire=now &&
> > +	git cat-file -e HEAD &&
> > +	test_must_fail git cat-file -e $blob
> > +'
> > +
> >  test_done
> 
> This test does cover the 'mark_object_seen()' method, which I tested by
> removing the "!" in the "if (!obj) die();" condition.
> 
> However, my first inclination was to remove the line
> 
> 	obj->flags |= SEEN;
> 
> from the method, and I found that it still worked! This was confusing,
> so I looked for places where the SEEN flag was added during bitmap walks,
> and it turns out that the objects are marked as SEEN by prepare_bitmap_walk().

I think this is only _some_ objects. The full bitmap is created by
reading the on-disk bitmaps, and then filling in any necessary gaps by
doing a traditional traversal. We also set it on tip objects to de-dup
the initial revs->pending list.

Try running t5304 with this:

diff --git a/reachable.c b/reachable.c
index eba6f64e01..7ec73ef43f 100644
--- a/reachable.c
+++ b/reachable.c
@@ -191,6 +191,8 @@ static int mark_object_seen(const struct object_id *oid,
 	if (!obj)
 		die("unable to create object '%s'", oid_to_hex(oid));
 
+	if (!(obj->flags & SEEN))
+		die("seen flag not already there");
 	obj->flags |= SEEN;
 	return 0;
 }

which shows that we are indeed freshly setting SEEN on some objects.

Interestingly, I don't _think_ you can cause an object to get pruned
accidentally here, because:

  1. Any object that will miss its SEEN has to be in the bitmapped pack,
     and not found through normal traversal.

  2. git-prune only removes loose objects, not packed ones.

  3. Loose objects cannot be in the bitmapped pack. Therefore, no object
     can hit both cases (1) and (2).

Even if that were the end of the story, I don't think I'd want to rely
on it here. The post-condition of the function should be that SEEN is
set on all reachable objects, whether bitmaps are used or not. And we do
indeed use this elsewhere:

We'll later call prune_shallow(), which uses SEEN to discard unreachable
entries. I'm not sure it even makes sense to have a shallow repo with
bitmaps, though.  Obviously we're lacking the full graph, but I wouldn't
be surprised if the shallow code quietly pretends that all is well and
we generate bogus bitmaps or something. Or maybe it even mostly works as
long as you don't walk over the shallow cut-points.

mark_reachable_objects() is also used for reflog expiration with
--stale-fix. I tried generating a test that would fail with your patch,
but I think it's not quite possible, because --stale-fix will do a
follow-up walk of the graph to see which objects mentioned in the reflog
we have and do not have. So it doesn't actually break things, it just
makes them slower (because we erroneously fail to mark SEEN things that
it's then forced to walk).

So I don't _think_ your patch actually breaks anything user-visible, but
it's a bit like leaving a live grenade in your living room for somebody
to find.

And I think we are indeed covering lookup_object_by_type(), etc in the
t5304 test I added. So AFAICT all of the new code is covered after that?

-Peff

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH] t5304: add a test for pruning with bitmaps
  2019-04-20  3:24           ` Jeff King
@ 2019-04-20 21:01             ` Derrick Stolee
  0 siblings, 0 replies; 41+ messages in thread
From: Derrick Stolee @ 2019-04-20 21:01 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Junio C Hamano

On 4/19/2019 11:24 PM, Jeff King wrote:
> Try running t5304 with this:
> 
> diff --git a/reachable.c b/reachable.c
> index eba6f64e01..7ec73ef43f 100644
> --- a/reachable.c
> +++ b/reachable.c
> @@ -191,6 +191,8 @@ static int mark_object_seen(const struct object_id *oid,
>  	if (!obj)
>  		die("unable to create object '%s'", oid_to_hex(oid));
>  
> +	if (!(obj->flags & SEEN))
> +		die("seen flag not already there");
>  	obj->flags |= SEEN;
>  	return 0;
>  }
> 
> which shows that we are indeed freshly setting SEEN on some objects.

Good point! Thanks for clearing that up for me.
 
> Interestingly, I don't _think_ you can cause an object to get pruned
> accidentally here, because:

[snip]

I appreciate the additional context that you gave (and I snipped). This
makes me comfortable accepting this test as an exercise of the code (to
guard against future changes that create failures like null-refs) and we
will need to rely on the performance suite to catch issues like removing
the SEEN markers that I had tested.

Thanks,
-Stolee


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v3] repack: enable bitmaps by default on bare repos
  2019-04-10 22:57                   ` Jeff King
@ 2019-04-25  7:16                     ` Junio C Hamano
  2019-05-04  1:37                       ` Jeff King
  0 siblings, 1 reply; 41+ messages in thread
From: Junio C Hamano @ 2019-04-25  7:16 UTC (permalink / raw)
  To: Jeff King; +Cc: Ævar Arnfjörð Bjarmason, Eric Wong, git

Jeff King <peff@peff.net> writes:

> On Tue, Apr 09, 2019 at 05:10:43PM +0200, Ævar Arnfjörð Bjarmason wrote:
>
>> I've found a case where turning bitmaps on does horrible things for
>> bitmap "push" performance.
>> [...]
>> I can't share the repo, but I had a report where just a "git push" of a
>> topic branch that was 2/58 ahead/behind took ~2 minutes just in
>> "Enumerating objects", but ~500ms without bitmaps.
>
> That's pretty bad, though I'm not terribly surprised.

I was revisiting the recent "What's cooking" report, and I am not
sure what the current status of the topic is.

I do not get a feel that the current bitmap implementation has been
widely used in repositories that have vastly different access
patterns---it probably has been tried only by those who can afford
the engineering cost to see if the implementation happens to work
well for their workload and some may have chosen to adopt it while
others didn't.  So it may be very well tuned for the former people
but once we merge this topic down, we'll hear from others with quite
different workload, which may lead to us tuning the code to bit
better to their workload while not hurting other existing users,
hopefully.

Or not.

I am somewhat tempted to make things more exciting by merging it to
'next' soonish, but I guess Ævar and you are not quite ready for
that excitement yet, judging from the following (which looks quite
sensible suggestions)?

Thanks.

> Yeah, that makes sense. By repacking you've taken all those new commits
> and included them in on-disk bitmaps. So I'd expect the "wants" to get
> much shorter, but the "haves" phase staying long means we could do a
> better job of picking commits to have on-disk bitmaps.
>
> So two avenues for exploration I think:
>
>   1. I've long suspected that the bitmap selection code isn't ideal.
>      Both in terms of what it picks, but also in its runtime (I think it
>      ends up walking the same slices of history multiple times in some
>      cases).
>
>   2. The answer we get from a bitmap versus a regular traversal are not
>      apples-to-apples equivalent. The regular traversal walks down to
>      the UNINTERESTING commits, marks the boundaries trees and blobs as
>      UNINTERESTING, and then adds in all the interesting trees and blobs
>      minus the UNINTERESTING parts. So it can sometimes give the wrong
>      answer, claiming something is interesting when it is not.
>
>      Whereas with bitmaps we fill in the trees and blobs as we walk, and
>      you get the true answer. But it means we may open up a lot more
>      trees than the equivalent traversal would.
>
>      So one thing I've never really experimented with (and indeed, never
>      really thought about until writing this email) is that the bitmaps
>      could try to do that looser style of traversal, knowing that we
>      might err on the side of calling things interesting in a few cases.
>      But hopefully spending a lot less time opening trees.
>
>      I'm not even 100% sure what that would look like in code, but just
>      thinking about it from a high level, I don't there's a particular
>      mathematical reason it couldn't work.
>
> -Peff

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v3] repack: enable bitmaps by default on bare repos
  2019-04-25  7:16                     ` Junio C Hamano
@ 2019-05-04  1:37                       ` Jeff King
  2019-05-04  6:52                         ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 41+ messages in thread
From: Jeff King @ 2019-05-04  1:37 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Ævar Arnfjörð Bjarmason, Eric Wong, git

On Thu, Apr 25, 2019 at 04:16:46PM +0900, Junio C Hamano wrote:

> I was revisiting the recent "What's cooking" report, and I am not
> sure what the current status of the topic is.
> 
> I do not get a feel that the current bitmap implementation has been
> widely used in repositories that have vastly different access
> patterns---it probably has been tried only by those who can afford
> the engineering cost to see if the implementation happens to work
> well for their workload and some may have chosen to adopt it while
> others didn't.  So it may be very well tuned for the former people
> but once we merge this topic down, we'll hear from others with quite
> different workload, which may lead to us tuning the code to bit
> better to their workload while not hurting other existing users,
> hopefully.
> 
> Or not.

Note that Ævar's case was somebody running bitmaps locally and trying to
push, which I think is generally not a good match for bitmaps (even when
they work, they cost more to generate than what you save if you're only
pushing once).

The goal of Eric's patch was that by kicking in for bare repos, we'd
mostly be hitting servers that are serving up fetches. So if by
"workload" you mean that we some people might use bare repos for other
cases, yeah, there's a potential for confusion or regression there.

If you mean that bitmaps might not work for some workloads even when
we're serving a lot of fetches, I won't say that's _not_ true, but my
experience is that they are generally a net win. Both for the smaller
repositories we see on github.com, but also for big, busy ones that our
on-premises customers use.

  Actually, there is one curiosity with Eric's patch that I haven't
  tested. As I've mentioned before, we store "forks" as single
  repositories pointing to a single shared alternates repository. Since
  the bitmap code only handles one .bitmap per invocation, you really
  want just one big one in the shared repo. If "git repack" in the forks
  started generating one, that would be surprising and annoying.

  In practice this is a pretty extreme corner case. And a lot would
  depend on how you're using "repack" in the fork (e.g., a partial
  repack would know that it can't generate bitmaps anyway). I'm pretty
  sure it would not even impact our setup at all, but I can probably
  come up with a devils advocate one where it would.

> I am somewhat tempted to make things more exciting by merging it to
> 'next' soonish, but I guess Ævar and you are not quite ready for
> that excitement yet, judging from the following (which looks quite
> sensible suggestions)?

It's OK with me for this to go to 'next'. Note that the other two
patches from me could actually graduate separately. One is a
straight-out test fix, and the other should always be a win (and does
nothing if you're not already generating bitmaps).

By the way, there were some timing puzzles mentioned in that second
commit. I re-ran them today and everything was what I'd expect. So I
wonder if I just screwed up the timings before. I can re-write that
commit message if it hasn't made it to 'next' yet.

-Peff

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v3] repack: enable bitmaps by default on bare repos
  2019-05-04  1:37                       ` Jeff King
@ 2019-05-04  6:52                         ` Ævar Arnfjörð Bjarmason
  2019-05-04 13:23                           ` SZEDER Gábor
  2019-05-07  7:45                           ` Jeff King
  0 siblings, 2 replies; 41+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2019-05-04  6:52 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, Eric Wong, git, SZEDER Gábor


On Sat, May 04 2019, Jeff King wrote:

> On Thu, Apr 25, 2019 at 04:16:46PM +0900, Junio C Hamano wrote:
>
>> I was revisiting the recent "What's cooking" report, and I am not
>> sure what the current status of the topic is.
>>
>> I do not get a feel that the current bitmap implementation has been
>> widely used in repositories that have vastly different access
>> patterns---it probably has been tried only by those who can afford
>> the engineering cost to see if the implementation happens to work
>> well for their workload and some may have chosen to adopt it while
>> others didn't.  So it may be very well tuned for the former people
>> but once we merge this topic down, we'll hear from others with quite
>> different workload, which may lead to us tuning the code to bit
>> better to their workload while not hurting other existing users,
>> hopefully.
>>
>> Or not.
>
> Note that Ævar's case was somebody running bitmaps locally and trying to
> push, which I think is generally not a good match for bitmaps (even when
> they work, they cost more to generate than what you save if you're only
> pushing once).

Right. It was *not* caused by this "enable bitmaps by default on bare
repos" patch (which I wasn't even running with), but *is* indicative of
a pretty big edge case with enabling bitmaps that *will* happen for some
on such bare repos if we ship the patch.

> The goal of Eric's patch was that by kicking in for bare repos, we'd
> mostly be hitting servers that are serving up fetches. So if by
> "workload" you mean that we some people might use bare repos for other
> cases, yeah, there's a potential for confusion or regression there.

As noted I suspect that's a really rare use-case in practice, and in
reply to Junio's <xmqq1s1qy2ox.fsf@gitster-ct.c.googlers.com> upthread I
think it's fine to just try this. Maybe we'll finally get such use-cases
out of the woodworks by turning it on by default.

As an aside this is the Nth time I notice how crappy that "Enumerating
objects" progress bar is. We do a *lot* of things there, including this
bitmap calculation.

But just splitting it up might result in either no progress (all
individually below 2 seconds), or a lot of noise as you have 20 things
that each take 2 seconds. I wonder if someone's looked at supporting:

    Enumerating Objects (X%) => Calculating bitmaps (Y%)

Where X% is the total progres, and %Y is the sub-progress. I eyeballed
doing this once by "chaining" the progress structs, but there's probably
a less crappy way...


> If you mean that bitmaps might not work for some workloads even when
> we're serving a lot of fetches, I won't say that's _not_ true, but my
> experience is that they are generally a net win. Both for the smaller
> repositories we see on github.com, but also for big, busy ones that our
> on-premises customers use.
>
>   Actually, there is one curiosity with Eric's patch that I haven't
>   tested. As I've mentioned before, we store "forks" as single
>   repositories pointing to a single shared alternates repository. Since
>   the bitmap code only handles one .bitmap per invocation, you really
>   want just one big one in the shared repo. If "git repack" in the forks
>   started generating one, that would be surprising and annoying.
>
>   In practice this is a pretty extreme corner case. And a lot would
>   depend on how you're using "repack" in the fork (e.g., a partial
>   repack would know that it can't generate bitmaps anyway). I'm pretty
>   sure it would not even impact our setup at all, but I can probably
>   come up with a devils advocate one where it would.
>
>> I am somewhat tempted to make things more exciting by merging it to
>> 'next' soonish, but I guess Ævar and you are not quite ready for
>> that excitement yet, judging from the following (which looks quite
>> sensible suggestions)?
>
> It's OK with me for this to go to 'next'. Note that the other two
> patches from me could actually graduate separately. One is a
> straight-out test fix, and the other should always be a win (and does
> nothing if you're not already generating bitmaps).
>
> By the way, there were some timing puzzles mentioned in that second
> commit. I re-ran them today and everything was what I'd expect. So I
> wonder if I just screwed up the timings before. I can re-write that
> commit message if it hasn't made it to 'next' yet.
>
> -Peff

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v3] repack: enable bitmaps by default on bare repos
  2019-05-04  6:52                         ` Ævar Arnfjörð Bjarmason
@ 2019-05-04 13:23                           ` SZEDER Gábor
  2019-05-08 20:17                             ` Ævar Arnfjörð Bjarmason
  2019-05-07  7:45                           ` Jeff King
  1 sibling, 1 reply; 41+ messages in thread
From: SZEDER Gábor @ 2019-05-04 13:23 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Jeff King, Junio C Hamano, Eric Wong, git

On Sat, May 04, 2019 at 08:52:01AM +0200, Ævar Arnfjörð Bjarmason wrote:
> As an aside this is the Nth time I notice how crappy that "Enumerating
> objects" progress bar is.

And don't forget that the commit-graph progress bar still prints
nonsense :)

  https://public-inbox.org/git/87ef6ydds8.fsf@evledraar.gmail.com/


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v3] repack: enable bitmaps by default on bare repos
  2019-05-04  6:52                         ` Ævar Arnfjörð Bjarmason
  2019-05-04 13:23                           ` SZEDER Gábor
@ 2019-05-07  7:45                           ` Jeff King
  2019-05-07  8:12                             ` Ævar Arnfjörð Bjarmason
  1 sibling, 1 reply; 41+ messages in thread
From: Jeff King @ 2019-05-07  7:45 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Junio C Hamano, Eric Wong, git, SZEDER Gábor

On Sat, May 04, 2019 at 08:52:01AM +0200, Ævar Arnfjörð Bjarmason wrote:

> > Note that Ævar's case was somebody running bitmaps locally and trying to
> > push, which I think is generally not a good match for bitmaps (even when
> > they work, they cost more to generate than what you save if you're only
> > pushing once).
> 
> Right. It was *not* caused by this "enable bitmaps by default on bare
> repos" patch (which I wasn't even running with), but *is* indicative of
> a pretty big edge case with enabling bitmaps that *will* happen for some
> on such bare repos if we ship the patch.

Yeah. To clarify my comments a bit: I do think it would be possible to
hit a weird case like this while serving fetches (i.e., as far as I know
there is nothing in what you saw that is inherent to pushes). But I do
think for serving fetches, bitmaps are overall a big net win (based on
my experiences).

So I think it may come down to a tradeoff: enabling this by default
would probably be a net win across the population, but that's little
comfort to the unlucky somebody who may see it as a regression. I'm not
sure which is more important to maintain.

> As an aside this is the Nth time I notice how crappy that "Enumerating
> objects" progress bar is. We do a *lot* of things there, including this
> bitmap calculation.
> 
> But just splitting it up might result in either no progress (all
> individually below 2 seconds), or a lot of noise as you have 20 things
> that each take 2 seconds. I wonder if someone's looked at supporting:
> 
>     Enumerating Objects (X%) => Calculating bitmaps (Y%)
> 
> Where X% is the total progres, and %Y is the sub-progress. I eyeballed
> doing this once by "chaining" the progress structs, but there's probably
> a less crappy way...

I don't think it needs to be split; I think we just need to update the
object count while we're traversing the bitmaps. The problem is that the
progress object is known to pack-objects.c. Without bitmaps, as the
revision machinery walks the graph, our callbacks bump the progress
meter every time we see an object.

With bitmaps, all that walking happens behind the scenes, and the bitmap
code delivers us the final answer. So we pause for a long time, and then
suddenly it shoots forward.

I think we'd want a way to tell the bitmap code to update our progress
meter as it traverses (both single objects, but also taking into account
when it finds a bitmap and then suddenly bumps the value by a large
amount).

-Peff

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v3] repack: enable bitmaps by default on bare repos
  2019-05-07  7:45                           ` Jeff King
@ 2019-05-07  8:12                             ` Ævar Arnfjörð Bjarmason
  2019-05-08  7:11                               ` Jeff King
  0 siblings, 1 reply; 41+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2019-05-07  8:12 UTC (permalink / raw)
  To: Jeff King
  Cc: Junio C Hamano, Eric Wong, git, SZEDER Gábor, Derrick Stolee


On Tue, May 07 2019, Jeff King wrote:

> On Sat, May 04, 2019 at 08:52:01AM +0200, Ævar Arnfjörð Bjarmason wrote:
>
>> > Note that Ævar's case was somebody running bitmaps locally and trying to
>> > push, which I think is generally not a good match for bitmaps (even when
>> > they work, they cost more to generate than what you save if you're only
>> > pushing once).
>>
>> Right. It was *not* caused by this "enable bitmaps by default on bare
>> repos" patch (which I wasn't even running with), but *is* indicative of
>> a pretty big edge case with enabling bitmaps that *will* happen for some
>> on such bare repos if we ship the patch.
>
> Yeah. To clarify my comments a bit: I do think it would be possible to
> hit a weird case like this while serving fetches (i.e., as far as I know
> there is nothing in what you saw that is inherent to pushes). But I do
> think for serving fetches, bitmaps are overall a big net win (based on
> my experiences).
>
> So I think it may come down to a tradeoff: enabling this by default
> would probably be a net win across the population, but that's little
> comfort to the unlucky somebody who may see it as a regression. I'm not
> sure which is more important to maintain.
>
>> As an aside this is the Nth time I notice how crappy that "Enumerating
>> objects" progress bar is. We do a *lot* of things there, including this
>> bitmap calculation.
>>
>> But just splitting it up might result in either no progress (all
>> individually below 2 seconds), or a lot of noise as you have 20 things
>> that each take 2 seconds. I wonder if someone's looked at supporting:
>>
>>     Enumerating Objects (X%) => Calculating bitmaps (Y%)
>>
>> Where X% is the total progres, and %Y is the sub-progress. I eyeballed
>> doing this once by "chaining" the progress structs, but there's probably
>> a less crappy way...
>
> I don't think it needs to be split; I think we just need to update the
> object count while we're traversing the bitmaps. The problem is that the
> progress object is known to pack-objects.c. Without bitmaps, as the
> revision machinery walks the graph, our callbacks bump the progress
> meter every time we see an object.
>
> With bitmaps, all that walking happens behind the scenes, and the bitmap
> code delivers us the final answer. So we pause for a long time, and then
> suddenly it shoots forward.
>
> I think we'd want a way to tell the bitmap code to update our progress
> meter as it traverses (both single objects, but also taking into account
> when it finds a bitmap and then suddenly bumps the value by a large
> amount).

Not splitting it will fix the progress bar stalling, so it fixes the
problem that the user is wondering if the command is entirely hanging.

But I was hoping to give the user an idea of roughly where we're
spending our time, e.g. so you can see how much the pack.useSparse
setting is helping (or not).

So something where we report sub-progress as we go along, and perhaps
print some brief summary at the end if it took long enough, e.g.:

    Enumerating Objects (X^1%) => Marking trees (Y^1%)
    Enumerating Objects (X^2%) => Calculating bitmaps (Y^2%)

And at the end:

    Enumerating Objects (100%) in ~2m30s -- (~10s marking trees, ~2m10s bitmaps, ~10s other)

I.e. bringing the whole "nested" trace2 regions full circle with the
progress bar where we could elect to trace/show some of that info, and
then you could turn on some trace2 mode/verbose progress to see more.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v3] repack: enable bitmaps by default on bare repos
  2019-05-07  8:12                             ` Ævar Arnfjörð Bjarmason
@ 2019-05-08  7:11                               ` Jeff King
  2019-05-08 14:20                                 ` Derrick Stolee
  2019-05-08 16:13                                 ` Ævar Arnfjörð Bjarmason
  0 siblings, 2 replies; 41+ messages in thread
From: Jeff King @ 2019-05-08  7:11 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Junio C Hamano, Eric Wong, git, SZEDER Gábor, Derrick Stolee

On Tue, May 07, 2019 at 10:12:06AM +0200, Ævar Arnfjörð Bjarmason wrote:

> > I think we'd want a way to tell the bitmap code to update our progress
> > meter as it traverses (both single objects, but also taking into account
> > when it finds a bitmap and then suddenly bumps the value by a large
> > amount).
> 
> Not splitting it will fix the progress bar stalling, so it fixes the
> problem that the user is wondering if the command is entirely hanging.
> 
> But I was hoping to give the user an idea of roughly where we're
> spending our time, e.g. so you can see how much the pack.useSparse
> setting is helping (or not).

Yeah, I think that's a bigger and more complicated problem. I admit that
my main annoyance is just the stall while we fill in the bitmaps (and
it's easy because the bitmap traversal is the same unit of work as a
regular traversal).

> So something where we report sub-progress as we go along, and perhaps
> print some brief summary at the end if it took long enough, e.g.:
> 
>     Enumerating Objects (X^1%) => Marking trees (Y^1%)
>     Enumerating Objects (X^2%) => Calculating bitmaps (Y^2%)
> 
> And at the end:
> 
>     Enumerating Objects (100%) in ~2m30s -- (~10s marking trees, ~2m10s bitmaps, ~10s other)
> 
> I.e. bringing the whole "nested" trace2 regions full circle with the
> progress bar where we could elect to trace/show some of that info, and
> then you could turn on some trace2 mode/verbose progress to see more.

I do wonder if this really needs to be part of the progress bar. The
goal of the progress bar is to give the user a sense that work is
happening, and (if possible, but not for "enumerating") an idea of when
it might finish. If the trace code can already do detailed timings, then
shouldn't we just be encouraging people to use that?

-Peff

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v3] repack: enable bitmaps by default on bare repos
  2019-05-08  7:11                               ` Jeff King
@ 2019-05-08 14:20                                 ` Derrick Stolee
  2019-05-08 16:13                                 ` Ævar Arnfjörð Bjarmason
  1 sibling, 0 replies; 41+ messages in thread
From: Derrick Stolee @ 2019-05-08 14:20 UTC (permalink / raw)
  To: Jeff King, Ævar Arnfjörð Bjarmason
  Cc: Junio C Hamano, Eric Wong, git, SZEDER Gábor

On 5/8/2019 3:11 AM, Jeff King wrote:
> On Tue, May 07, 2019 at 10:12:06AM +0200, Ævar Arnfjörð Bjarmason wrote:
> 
>>> I think we'd want a way to tell the bitmap code to update our progress
>>> meter as it traverses (both single objects, but also taking into account
>>> when it finds a bitmap and then suddenly bumps the value by a large
>>> amount).
>>
>> Not splitting it will fix the progress bar stalling, so it fixes the
>> problem that the user is wondering if the command is entirely hanging.
>>
>> But I was hoping to give the user an idea of roughly where we're
>> spending our time, e.g. so you can see how much the pack.useSparse
>> setting is helping (or not).
> 
> Yeah, I think that's a bigger and more complicated problem. I admit that
> my main annoyance is just the stall while we fill in the bitmaps (and
> it's easy because the bitmap traversal is the same unit of work as a
> regular traversal).

The pack.useSparse setting also speeds up a section that is not marked
by progress: that portion usually is walking all UNINTERESTING trees and
the"Enumerating Objects" progress is just for walking the INTERESTING objects.
 
>> So something where we report sub-progress as we go along, and perhaps
>> print some brief summary at the end if it took long enough, e.g.:
>>
>>     Enumerating Objects (X^1%) => Marking trees (Y^1%)
>>     Enumerating Objects (X^2%) => Calculating bitmaps (Y^2%)

I like this idea for splitting the "normal" mechanism, too:

     Enumerating Objects (X^1%) => Marking trees (Y^1%)
     Enumerating Objects (X^2%) => Enumerating objects to pack (Y^2%)

>> I.e. bringing the whole "nested" trace2 regions full circle with the
>> progress bar where we could elect to trace/show some of that info, and
>> then you could turn on some trace2 mode/verbose progress to see more.
> 
> I do wonder if this really needs to be part of the progress bar. The
> goal of the progress bar is to give the user a sense that work is
> happening, and (if possible, but not for "enumerating") an idea of when
> it might finish. If the trace code can already do detailed timings, then
> shouldn't we just be encouraging people to use that?

The problem I've seen (without bitmaps) is that running `git push` can
take a while before _any_ progress is listed.

Good news is: `pack.useSparse` fixed our push problem in the Windows OS
repo. The end-to-end time for `git push` sped up by 7.7x with the change,
and this "blank" time is too fast for users to notice.

Updating the progress could help in cases without pack.useSparse.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v3] repack: enable bitmaps by default on bare repos
  2019-05-08  7:11                               ` Jeff King
  2019-05-08 14:20                                 ` Derrick Stolee
@ 2019-05-08 16:13                                 ` Ævar Arnfjörð Bjarmason
  2019-05-08 22:25                                   ` Jeff King
  1 sibling, 1 reply; 41+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2019-05-08 16:13 UTC (permalink / raw)
  To: Jeff King
  Cc: Junio C Hamano, Eric Wong, git, SZEDER Gábor, Derrick Stolee


On Wed, May 08 2019, Jeff King wrote:

> On Tue, May 07, 2019 at 10:12:06AM +0200, Ævar Arnfjörð Bjarmason wrote:
>
>> > I think we'd want a way to tell the bitmap code to update our progress
>> > meter as it traverses (both single objects, but also taking into account
>> > when it finds a bitmap and then suddenly bumps the value by a large
>> > amount).
>>
>> Not splitting it will fix the progress bar stalling, so it fixes the
>> problem that the user is wondering if the command is entirely hanging.
>>
>> But I was hoping to give the user an idea of roughly where we're
>> spending our time, e.g. so you can see how much the pack.useSparse
>> setting is helping (or not).
>
> Yeah, I think that's a bigger and more complicated problem. I admit that
> my main annoyance is just the stall while we fill in the bitmaps (and
> it's easy because the bitmap traversal is the same unit of work as a
> regular traversal).
>
>> So something where we report sub-progress as we go along, and perhaps
>> print some brief summary at the end if it took long enough, e.g.:
>>
>>     Enumerating Objects (X^1%) => Marking trees (Y^1%)
>>     Enumerating Objects (X^2%) => Calculating bitmaps (Y^2%)
>>
>> And at the end:
>>
>>     Enumerating Objects (100%) in ~2m30s -- (~10s marking trees, ~2m10s bitmaps, ~10s other)
>>
>> I.e. bringing the whole "nested" trace2 regions full circle with the
>> progress bar where we could elect to trace/show some of that info, and
>> then you could turn on some trace2 mode/verbose progress to see more.
>
> I do wonder if this really needs to be part of the progress bar. The
> goal of the progress bar is to give the user a sense that work is
> happening, and (if possible, but not for "enumerating") an idea of when
> it might finish. If the trace code can already do detailed timings, then
> shouldn't we just be encouraging people to use that?

To just show work happening we could save ourselves some horizontal
space and the debates over counting v.s. enumerating with:

     diff --git a/progress.c b/progress.c
     index 0318bdd41b..83336ca391 100644
     --- a/progress.c
     +++ b/progress.c
     @@ -226,3 +226,3 @@ static struct progress *start_progress_delay(const char *title, uint64_t total,
             struct progress *progress = xmalloc(sizeof(*progress));
     -       progress->title = title;
     +       progress->title = "Reticulating splines";
             progress->total = total;

:)

Obviously that's silly, but the point is we do show some user messaging
with these now, and e.g. the other day here on-list (can't be bothered
to find the msgid) someone was lamenting that the N progressbars we show
on "push" were too verbose.

So by coalescing some of the existing bars that do one logical operation
(push) in N steps we could be less verbose without losing the "stuff's
happening" part of it, and would see if something odd was going on,
e.g. the "I/O write" part being proportionally slower on this box than
the other, or when they upgrade bitmaps suddenly showing up as >95% of
the time.

The bit I find interesting about tying it into trace2 is that once you
do that the trace logs can contain e.g. min/max/avg/median/percentile
time for doing some operation we can break into N steps same/similar
steps, which might be interesting for performance analysis.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v3] repack: enable bitmaps by default on bare repos
  2019-05-04 13:23                           ` SZEDER Gábor
@ 2019-05-08 20:17                             ` Ævar Arnfjörð Bjarmason
  2019-05-09  4:24                               ` Junio C Hamano
  0 siblings, 1 reply; 41+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2019-05-08 20:17 UTC (permalink / raw)
  To: SZEDER Gábor
  Cc: Jeff King, Junio C Hamano, Eric Wong, git, Derrick Stolee


On Sat, May 04 2019, SZEDER Gábor wrote:

> On Sat, May 04, 2019 at 08:52:01AM +0200, Ævar Arnfjörð Bjarmason wrote:
>> As an aside this is the Nth time I notice how crappy that "Enumerating
>> objects" progress bar is.
>
> And don't forget that the commit-graph progress bar still prints
> nonsense :)
>
>   https://public-inbox.org/git/87ef6ydds8.fsf@evledraar.gmail.com/

I forgot for a bit, but then figured I'd get out of Derrick's way with
his more significant commit-graph work (which would have inevitably had
merge conflicts with this patch), and look at the time. Here we are
coming up on rc0.

I'm inclined to just wait until Derrick's refactorings land post-2.22.0
unless a) we need it enough before 2.22.0 to cause him trouble b) Junio
agrees with "a" and would take such a patch to fix this part of the
progress output before 2.22.0.

If people (just you would do) tell me "yes I/we want it" and Junio says
"yeah I'll do that" I'll write this patch in the next couple of days,
otherwise I'll do other stuff.

Junio?

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v3] repack: enable bitmaps by default on bare repos
  2019-05-08 16:13                                 ` Ævar Arnfjörð Bjarmason
@ 2019-05-08 22:25                                   ` Jeff King
  0 siblings, 0 replies; 41+ messages in thread
From: Jeff King @ 2019-05-08 22:25 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Junio C Hamano, Eric Wong, git, SZEDER Gábor, Derrick Stolee

On Wed, May 08, 2019 at 06:13:58PM +0200, Ævar Arnfjörð Bjarmason wrote:

> > I do wonder if this really needs to be part of the progress bar. The
> > goal of the progress bar is to give the user a sense that work is
> > happening, and (if possible, but not for "enumerating") an idea of when
> > it might finish. If the trace code can already do detailed timings, then
> > shouldn't we just be encouraging people to use that?
> [...]
>      -       progress->title = title;
>      +       progress->title = "Reticulating splines";

Heh, OK, that's a fair reductio ad absurdum of my point. I guess what I
was trying to say is that we don't need to go overboard with accuracy as
long as we're giving a vague sense of the work. Precision measurements
can be handled through a different system.

But I don't mind if you can find a way to do these kind of cascaded
progress meters in a way that is pleasing to the eye and not hard to
handle in the callers of the code.

-Peff

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v3] repack: enable bitmaps by default on bare repos
  2019-05-08 20:17                             ` Ævar Arnfjörð Bjarmason
@ 2019-05-09  4:24                               ` Junio C Hamano
  0 siblings, 0 replies; 41+ messages in thread
From: Junio C Hamano @ 2019-05-09  4:24 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: SZEDER Gábor, Jeff King, Eric Wong, git, Derrick Stolee

Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

> On Sat, May 04 2019, SZEDER Gábor wrote:
> ...
>> And don't forget that the commit-graph progress bar still prints
>> nonsense :)
>
> I'm inclined to just wait until Derrick's refactorings land post-2.22.0

Fine by me.  Thanks.

^ permalink raw reply	[flat|nested] 41+ messages in thread

end of thread, back to index

Thread overview: 41+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-02-14  4:31 [PATCH 0/3] some prune optimizations Jeff King
2019-02-14  4:35 ` [PATCH 1/3] prune: lazily perform reachability traversal Jeff King
2019-02-14 10:54   ` Eric Sunshine
2019-02-14 11:07     ` Jeff King
2019-02-14  4:37 ` [PATCH 2/3] prune: use bitmaps for " Jeff King
2019-03-09  2:49   ` bitmaps by default? [was: prune: use bitmaps for reachability traversal] Eric Wong
2019-03-10 23:39     ` Jeff King
2019-03-12  3:13       ` [PATCH] repack: enable bitmaps by default on bare repos Eric Wong
2019-03-12  9:07         ` Ævar Arnfjörð Bjarmason
2019-03-12 10:49         ` Jeff King
2019-03-12 12:05           ` Jeff King
2019-03-13  1:51           ` Eric Wong
2019-03-13 14:54             ` Jeff King
2019-03-14  9:12               ` [PATCH v3] " Eric Wong
2019-03-14 16:02                 ` Jeff King
2019-03-15  6:21                   ` [PATCH 0/2] enable bitmap hash-cache by default Jeff King
2019-03-15  6:22                     ` [PATCH 1/2] t5310: correctly remove bitmaps for jgit test Jeff King
2019-03-15 13:25                       ` SZEDER Gábor
2019-03-15 18:36                         ` Jeff King
2019-03-15  6:25                     ` [PATCH 2/2] pack-objects: default to writing bitmap hash-cache Jeff King
2019-04-09 15:10                 ` [PATCH v3] repack: enable bitmaps by default on bare repos Ævar Arnfjörð Bjarmason
2019-04-10 22:57                   ` Jeff King
2019-04-25  7:16                     ` Junio C Hamano
2019-05-04  1:37                       ` Jeff King
2019-05-04  6:52                         ` Ævar Arnfjörð Bjarmason
2019-05-04 13:23                           ` SZEDER Gábor
2019-05-08 20:17                             ` Ævar Arnfjörð Bjarmason
2019-05-09  4:24                               ` Junio C Hamano
2019-05-07  7:45                           ` Jeff King
2019-05-07  8:12                             ` Ævar Arnfjörð Bjarmason
2019-05-08  7:11                               ` Jeff King
2019-05-08 14:20                                 ` Derrick Stolee
2019-05-08 16:13                                 ` Ævar Arnfjörð Bjarmason
2019-05-08 22:25                                   ` Jeff King
2019-04-15 15:00   ` [PATCH 2/3] prune: use bitmaps for reachability traversal Derrick Stolee
2019-04-18 19:49     ` Jeff King
2019-04-18 20:08       ` [PATCH] t5304: add a test for pruning with bitmaps Jeff King
2019-04-20  1:01         ` Derrick Stolee
2019-04-20  3:24           ` Jeff King
2019-04-20 21:01             ` Derrick Stolee
2019-02-14  4:38 ` [PATCH 3/3] prune: check SEEN flag for reachability Jeff King

git@vger.kernel.org list mirror (unofficial, one of many)

Archives are clonable:
	git clone --mirror https://public-inbox.org/git
	git clone --mirror http://ou63pmih66umazou.onion/git
	git clone --mirror http://czquwvybam4bgbro.onion/git
	git clone --mirror http://hjrcffqmbrq6wope.onion/git

Newsgroups are available over NNTP:
	nntp://news.public-inbox.org/inbox.comp.version-control.git
	nntp://ou63pmih66umazou.onion/inbox.comp.version-control.git
	nntp://czquwvybam4bgbro.onion/inbox.comp.version-control.git
	nntp://hjrcffqmbrq6wope.onion/inbox.comp.version-control.git
	nntp://news.gmane.org/gmane.comp.version-control.git

 note: .onion URLs require Tor: https://www.torproject.org/

AGPL code for this site: git clone https://public-inbox.org/ public-inbox