git@vger.kernel.org list mirror (unofficial, one of many)
 help / color / mirror / Atom feed
* [PATCH 0/4] Speed up repacking when lots of pack-kept objects
@ 2019-03-12 13:18 Nathaniel Filardo
  2019-03-12 13:18 ` [PATCH 1/4] count-objects: report statistics about kept packs Nathaniel Filardo
                   ` (3 more replies)
  0 siblings, 4 replies; 8+ messages in thread
From: Nathaniel Filardo @ 2019-03-12 13:18 UTC (permalink / raw)
  To: git; +Cc: Derrick Stolee, Nathaniel Filardo

This patch series improves handling of very large repositories, as generated
by, for example, bup (https://github.com/bup/bup).  Prolonged operation
thereof creates quite a lot of small pack files; repacking improves
filesystem performance of the objects/pack directory, but is quite
expensive, in terms of time and memory.  We have adopted a strategy that
marks "large" (tens of GB) of pack files as "kept" and defers repacking
until there are enough un-kept packs or enough bytes of un-kept objects.
(The first patch in the series will make our accounting easier, replacing
some terrible shell scripting with grep.)

While this strategy has generally improved our lives relative to either
extreme (not repacking, or repacking after every bup save operation), it
still leaves a good bit to be desired.  Because our packs are marked as
kept, repacking will leave the objects therein alone, but it still must
instantiate in memory and walk the entire object graph.  However, because
our kept packs are transitively closed, such that an object in one
necessarily references only objects in other kept packs, we should like to
avoid reasoning about them more or less altogether.

This series attempts to do just that.  The middle patches are just some
groundwork for the last patch, which carries the punch line.  This last
patch adds an option to builtin/repack to enumerate commit and tree objects
within kept packs as UNINTERESTING to its spawned builtin/pack-objects
command.  Together with inducing the use of sparse reachability, this speeds
enumerating candidate objects for repacking and thereby substantially
reduces the runtime of our repack operations, while producing identical
results.

I am, however, rather a novice when it comes to git internals, so any and
all feedback is quite welcome.

Nathaniel Filardo (4):
  count-objects: report statistics about kept packs
  revision walk: optionally use sparse reachability
  repack: add --sparse and pass to pack-objects
  repack: optionally assume transitive kept packs

 Documentation/git-gc.txt         |  5 +++
 Documentation/git-repack.txt     | 25 +++++++++++++
 bisect.c                         |  2 +-
 blame.c                          |  2 +-
 builtin/checkout.c               |  2 +-
 builtin/commit.c                 |  2 +-
 builtin/count-objects.c          | 17 ++++++++-
 builtin/describe.c               |  2 +-
 builtin/fast-export.c            |  2 +-
 builtin/fmt-merge-msg.c          |  2 +-
 builtin/gc.c                     |  5 +++
 builtin/log.c                    | 10 ++---
 builtin/merge.c                  |  2 +-
 builtin/pack-objects.c           |  4 +-
 builtin/repack.c                 | 64 +++++++++++++++++++++++++++++++-
 builtin/rev-list.c               |  2 +-
 builtin/shortlog.c               |  2 +-
 bundle.c                         |  2 +-
 http-push.c                      |  2 +-
 merge-recursive.c                |  2 +-
 pack-bitmap-write.c              |  2 +-
 pack-bitmap.c                    |  4 +-
 reachable.c                      |  4 +-
 ref-filter.c                     |  2 +-
 remote.c                         |  2 +-
 revision.c                       | 10 +++--
 revision.h                       |  2 +-
 sequencer.c                      |  6 +--
 shallow.c                        |  2 +-
 submodule.c                      |  4 +-
 t/helper/test-revision-walking.c |  2 +-
 31 files changed, 154 insertions(+), 42 deletions(-)

-- 
2.17.1


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH 1/4] count-objects: report statistics about kept packs
  2019-03-12 13:18 [PATCH 0/4] Speed up repacking when lots of pack-kept objects Nathaniel Filardo
@ 2019-03-12 13:18 ` Nathaniel Filardo
  2019-03-12 13:18 ` [PATCH 2/4] revision walk: optionally use sparse reachability Nathaniel Filardo
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 8+ messages in thread
From: Nathaniel Filardo @ 2019-03-12 13:18 UTC (permalink / raw)
  To: git; +Cc: Derrick Stolee, Nathaniel Filardo

Specifically: number of kept packs, size of kept packs (and indexes),
and number of objects in kept packs.

Signed-off-by: Nathaniel Filardo <nwf20@cl.cam.ac.uk>
---
 builtin/count-objects.c | 17 +++++++++++++++--
 1 file changed, 15 insertions(+), 2 deletions(-)

diff --git a/builtin/count-objects.c b/builtin/count-objects.c
index 3fae474f6f..0309c7907f 100644
--- a/builtin/count-objects.c
+++ b/builtin/count-objects.c
@@ -117,17 +117,24 @@ int cmd_count_objects(int argc, const char **argv, const char *prefix)
 
 	if (verbose) {
 		struct packed_git *p;
-		unsigned long num_pack = 0;
-		off_t size_pack = 0;
+		unsigned long num_pack = 0, num_pack_keep = 0;
+		unsigned long packed_keep = 0;
+		off_t size_pack = 0, size_pack_keep = 0;
 		struct strbuf loose_buf = STRBUF_INIT;
 		struct strbuf pack_buf = STRBUF_INIT;
 		struct strbuf garbage_buf = STRBUF_INIT;
+		struct strbuf pack_keep_buf = STRBUF_INIT;
 
 		for (p = get_all_packs(the_repository); p; p = p->next) {
 			if (!p->pack_local)
 				continue;
 			if (open_pack_index(p))
 				continue;
+			if (p->pack_keep || p->pack_keep_in_core) {
+				packed_keep += p->num_objects;
+				size_pack_keep += p->pack_size + p->index_size;
+				num_pack_keep++;
+			}
 			packed += p->num_objects;
 			size_pack += p->pack_size + p->index_size;
 			num_pack++;
@@ -136,12 +143,15 @@ int cmd_count_objects(int argc, const char **argv, const char *prefix)
 		if (human_readable) {
 			strbuf_humanise_bytes(&loose_buf, loose_size);
 			strbuf_humanise_bytes(&pack_buf, size_pack);
+			strbuf_humanise_bytes(&pack_keep_buf, size_pack_keep);
 			strbuf_humanise_bytes(&garbage_buf, size_garbage);
 		} else {
 			strbuf_addf(&loose_buf, "%lu",
 				    (unsigned long)(loose_size / 1024));
 			strbuf_addf(&pack_buf, "%lu",
 				    (unsigned long)(size_pack / 1024));
+			strbuf_addf(&pack_keep_buf, "%lu",
+				    (unsigned long)(size_pack_keep / 1024));
 			strbuf_addf(&garbage_buf, "%lu",
 				    (unsigned long)(size_garbage / 1024));
 		}
@@ -149,8 +159,11 @@ int cmd_count_objects(int argc, const char **argv, const char *prefix)
 		printf("count: %lu\n", loose);
 		printf("size: %s\n", loose_buf.buf);
 		printf("in-pack: %lu\n", packed);
+		printf("in-pack-keep: %lu\n", packed_keep);
 		printf("packs: %lu\n", num_pack);
+		printf("packs-keep: %lu\n", num_pack_keep);
 		printf("size-pack: %s\n", pack_buf.buf);
+		printf("size-pack-keep: %s\n", pack_keep_buf.buf);
 		printf("prune-packable: %lu\n", packed_loose);
 		printf("garbage: %lu\n", garbage);
 		printf("size-garbage: %s\n", garbage_buf.buf);
-- 
2.17.1


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH 2/4] revision walk: optionally use sparse reachability
  2019-03-12 13:18 [PATCH 0/4] Speed up repacking when lots of pack-kept objects Nathaniel Filardo
  2019-03-12 13:18 ` [PATCH 1/4] count-objects: report statistics about kept packs Nathaniel Filardo
@ 2019-03-12 13:18 ` Nathaniel Filardo
  2019-03-12 13:59   ` Derrick Stolee
  2019-03-12 13:18 ` [PATCH 3/4] repack: add --sparse and pass to pack-objects Nathaniel Filardo
  2019-03-12 13:18 ` [PATCH 4/4] repack: optionally assume transitive kept packs Nathaniel Filardo
  3 siblings, 1 reply; 8+ messages in thread
From: Nathaniel Filardo @ 2019-03-12 13:18 UTC (permalink / raw)
  To: git; +Cc: Derrick Stolee, Nathaniel Filardo

The only caller that passes a non-zero value to prepare_revision_walk
after this patch is builtin/pack-objects.  Without this, sparsity seems
to do little good therein, as prepare_revision_walk will densely
propagate UNINTERESTING flags from trees to tree contents, before
mark_edges_uninteresting has a chance to be faster by being sparse.

Signed-off-by: Nathaniel Filardo <nwf20@cl.cam.ac.uk>
---
 bisect.c                         |  2 +-
 blame.c                          |  2 +-
 builtin/checkout.c               |  2 +-
 builtin/commit.c                 |  2 +-
 builtin/describe.c               |  2 +-
 builtin/fast-export.c            |  2 +-
 builtin/fmt-merge-msg.c          |  2 +-
 builtin/log.c                    | 10 +++++-----
 builtin/merge.c                  |  2 +-
 builtin/pack-objects.c           |  4 ++--
 builtin/rev-list.c               |  2 +-
 builtin/shortlog.c               |  2 +-
 bundle.c                         |  2 +-
 http-push.c                      |  2 +-
 merge-recursive.c                |  2 +-
 pack-bitmap-write.c              |  2 +-
 pack-bitmap.c                    |  4 ++--
 reachable.c                      |  4 ++--
 ref-filter.c                     |  2 +-
 remote.c                         |  2 +-
 revision.c                       | 10 ++++++----
 revision.h                       |  2 +-
 sequencer.c                      |  6 +++---
 shallow.c                        |  2 +-
 submodule.c                      |  4 ++--
 t/helper/test-revision-walking.c |  2 +-
 26 files changed, 41 insertions(+), 39 deletions(-)

diff --git a/bisect.c b/bisect.c
index 3af955c4bc..47f8e3a7cc 100644
--- a/bisect.c
+++ b/bisect.c
@@ -655,7 +655,7 @@ static void bisect_rev_setup(struct repository *r, struct rev_info *revs,
 
 static void bisect_common(struct rev_info *revs)
 {
-	if (prepare_revision_walk(revs))
+	if (prepare_revision_walk(revs, 0))
 		die("revision walk setup failed");
 	if (revs->tree_objects)
 		mark_edges_uninteresting(revs, NULL, 0);
diff --git a/blame.c b/blame.c
index 5c07dec190..13270ba07b 100644
--- a/blame.c
+++ b/blame.c
@@ -1833,7 +1833,7 @@ void setup_scoreboard(struct blame_scoreboard *sb,
 	 * bottom commits we would reach while traversing as
 	 * uninteresting.
 	 */
-	if (prepare_revision_walk(sb->revs))
+	if (prepare_revision_walk(sb->revs, 0))
 		die(_("revision walk setup failed"));
 
 	if (sb->reverse && sb->revs->first_parent_only) {
diff --git a/builtin/checkout.c b/builtin/checkout.c
index 0e6037b296..8d6773cbaf 100644
--- a/builtin/checkout.c
+++ b/builtin/checkout.c
@@ -997,7 +997,7 @@ static void orphaned_commit_warning(struct commit *old_commit, struct commit *ne
 	for_each_ref(add_pending_uninteresting_ref, &revs);
 	add_pending_oid(&revs, "HEAD", &new_commit->object.oid, UNINTERESTING);
 
-	if (prepare_revision_walk(&revs))
+	if (prepare_revision_walk(&revs, 0))
 		die(_("internal error in revision walk"));
 	if (!(old_commit->object.flags & UNINTERESTING))
 		suggest_reattach(old_commit, &revs);
diff --git a/builtin/commit.c b/builtin/commit.c
index f17537474a..5072e1c85e 100644
--- a/builtin/commit.c
+++ b/builtin/commit.c
@@ -1001,7 +1001,7 @@ static const char *find_author_by_nickname(const char *name)
 	revs.mailmap = &mailmap;
 	read_mailmap(revs.mailmap, NULL);
 
-	if (prepare_revision_walk(&revs))
+	if (prepare_revision_walk(&revs, 0))
 		die(_("revision walk setup failed"));
 	commit = get_revision(&revs);
 	if (commit) {
diff --git a/builtin/describe.c b/builtin/describe.c
index 1409cedce2..0e0d9c5173 100644
--- a/builtin/describe.c
+++ b/builtin/describe.c
@@ -493,7 +493,7 @@ static void describe_blob(struct object_id oid, struct strbuf *dst)
 	if (setup_revisions(args.argc, args.argv, &revs, NULL) > 1)
 		BUG("setup_revisions could not handle all args?");
 
-	if (prepare_revision_walk(&revs))
+	if (prepare_revision_walk(&revs, 0))
 		die("revision walk setup failed");
 
 	traverse_commit_list(&revs, process_commit, process_object, &pcd);
diff --git a/builtin/fast-export.c b/builtin/fast-export.c
index 9e283482ef..7962b4a96c 100644
--- a/builtin/fast-export.c
+++ b/builtin/fast-export.c
@@ -1148,7 +1148,7 @@ int cmd_fast_export(int argc, const char **argv, const char *prefix)
 
 	get_tags_and_duplicates(&revs.cmdline);
 
-	if (prepare_revision_walk(&revs))
+	if (prepare_revision_walk(&revs, 0))
 		die("revision walk setup failed");
 	revs.diffopt.format_callback = show_filemodify;
 	revs.diffopt.format_callback_data = &paths_of_changed_objects;
diff --git a/builtin/fmt-merge-msg.c b/builtin/fmt-merge-msg.c
index a4615587fd..d38af04e0c 100644
--- a/builtin/fmt-merge-msg.c
+++ b/builtin/fmt-merge-msg.c
@@ -356,7 +356,7 @@ static void shortlog(const char *name,
 	add_pending_object(rev, branch, name);
 	add_pending_object(rev, &head->object, "^HEAD");
 	head->object.flags |= UNINTERESTING;
-	if (prepare_revision_walk(rev))
+	if (prepare_revision_walk(rev, 0))
 		die("revision walk setup failed");
 	while ((commit = get_revision(rev)) != NULL) {
 		struct pretty_print_context ctx = {0};
diff --git a/builtin/log.c b/builtin/log.c
index ab859f5904..e2f12a88bb 100644
--- a/builtin/log.c
+++ b/builtin/log.c
@@ -378,7 +378,7 @@ static int cmd_log_walk(struct rev_info *rev)
 	if (rev->early_output)
 		setup_early_output(rev);
 
-	if (prepare_revision_walk(rev))
+	if (prepare_revision_walk(rev, 0))
 		die(_("revision walk setup failed"));
 
 	if (rev->early_output)
@@ -936,7 +936,7 @@ static void get_patch_ids(struct rev_info *rev, struct patch_ids *ids)
 	o2->flags ^= UNINTERESTING;
 	add_pending_object(&check_rev, o1, "o1");
 	add_pending_object(&check_rev, o2, "o2");
-	if (prepare_revision_walk(&check_rev))
+	if (prepare_revision_walk(&check_rev, 0))
 		die(_("revision walk setup failed"));
 
 	while ((commit = get_revision(&check_rev)) != NULL) {
@@ -1424,7 +1424,7 @@ static void prepare_bases(struct base_tree_info *bases,
 	base->object.flags |= UNINTERESTING;
 	add_pending_object(&revs, &base->object, "base");
 
-	if (prepare_revision_walk(&revs))
+	if (prepare_revision_walk(&revs, 0))
 		die(_("revision walk setup failed"));
 	/*
 	 * Traverse the commits list, get prerequisite patch ids
@@ -1800,7 +1800,7 @@ int cmd_format_patch(int argc, const char **argv, const char *prefix)
 		get_patch_ids(&rev, &ids);
 	}
 
-	if (prepare_revision_walk(&rev))
+	if (prepare_revision_walk(&rev, 0))
 		die(_("revision walk setup failed"));
 	rev.boundary = 1;
 	while ((commit = get_revision(&rev)) != NULL) {
@@ -2090,7 +2090,7 @@ int cmd_cherry(int argc, const char **argv, const char *prefix)
 		die(_("Unknown commit %s"), limit);
 
 	/* reverse the list of commits */
-	if (prepare_revision_walk(&revs))
+	if (prepare_revision_walk(&revs, 0))
 		die(_("revision walk setup failed"));
 	while ((commit = get_revision(&revs)) != NULL) {
 		commit_list_insert(commit, &list);
diff --git a/builtin/merge.c b/builtin/merge.c
index 5ce8946d39..1fbb4069cf 100644
--- a/builtin/merge.c
+++ b/builtin/merge.c
@@ -406,7 +406,7 @@ static void squash_message(struct commit *commit, struct commit_list *remotehead
 		add_pending_object(&rev, &j->item->object, NULL);
 
 	setup_revisions(0, NULL, &rev, NULL);
-	if (prepare_revision_walk(&rev))
+	if (prepare_revision_walk(&rev, 0))
 		die(_("revision walk setup failed"));
 
 	ctx.abbrev = rev.abbrev;
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index a154fc29f6..c4ff6f79af 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -3134,7 +3134,7 @@ static void get_object_list(int ac, const char **av)
 	if (use_delta_islands)
 		load_delta_islands(the_repository);
 
-	if (prepare_revision_walk(&revs))
+	if (prepare_revision_walk(&revs, sparse))
 		die(_("revision walk setup failed"));
 	mark_edges_uninteresting(&revs, show_edge, sparse);
 
@@ -3149,7 +3149,7 @@ static void get_object_list(int ac, const char **av)
 		if (add_unseen_recent_objects_to_traversal(&revs,
 				unpack_unreachable_expiration))
 			die(_("unable to add recent objects"));
-		if (prepare_revision_walk(&revs))
+		if (prepare_revision_walk(&revs, sparse))
 			die(_("revision walk setup failed"));
 		traverse_commit_list(&revs, record_recent_commit,
 				     record_recent_object, NULL);
diff --git a/builtin/rev-list.c b/builtin/rev-list.c
index 5b5b6dbb1c..f192bfdce8 100644
--- a/builtin/rev-list.c
+++ b/builtin/rev-list.c
@@ -543,7 +543,7 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix)
 		}
 	}
 
-	if (prepare_revision_walk(&revs))
+	if (prepare_revision_walk(&revs, 0))
 		die("revision walk setup failed");
 	if (revs.tree_objects)
 		mark_edges_uninteresting(&revs, show_edge, 0);
diff --git a/builtin/shortlog.c b/builtin/shortlog.c
index 65cd41392c..8a236aa317 100644
--- a/builtin/shortlog.c
+++ b/builtin/shortlog.c
@@ -188,7 +188,7 @@ static void get_from_rev(struct rev_info *rev, struct shortlog *log)
 {
 	struct commit *commit;
 
-	if (prepare_revision_walk(rev))
+	if (prepare_revision_walk(rev, 0))
 		die(_("revision walk setup failed"));
 	while ((commit = get_revision(rev)) != NULL)
 		shortlog_add_commit(log, commit);
diff --git a/bundle.c b/bundle.c
index b45666c49b..b1ffa02a70 100644
--- a/bundle.c
+++ b/bundle.c
@@ -160,7 +160,7 @@ int verify_bundle(struct repository *r,
 	req_nr = revs.pending.nr;
 	setup_revisions(2, argv, &revs, NULL);
 
-	if (prepare_revision_walk(&revs))
+	if (prepare_revision_walk(&revs, 0))
 		die(_("revision walk setup failed"));
 
 	i = req_nr;
diff --git a/http-push.c b/http-push.c
index b22c7caea0..ba9416f2e0 100644
--- a/http-push.c
+++ b/http-push.c
@@ -1931,7 +1931,7 @@ int cmd_main(int argc, const char **argv)
 
 		/* Generate a list of objects that need to be pushed */
 		pushing = 0;
-		if (prepare_revision_walk(&revs))
+		if (prepare_revision_walk(&revs, 0))
 			die("revision walk setup failed");
 		mark_edges_uninteresting(&revs, NULL, 0);
 		objects_to_send = get_delta(&revs, ref_lock);
diff --git a/merge-recursive.c b/merge-recursive.c
index 6c40c61c47..adf218f66b 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -1141,7 +1141,7 @@ static int find_first_merges(struct repository *repo,
 	setup_revisions(ARRAY_SIZE(rev_args)-1, rev_args, &revs, &rev_opts);
 
 	/* save all revisions from the above list that contain b */
-	if (prepare_revision_walk(&revs))
+	if (prepare_revision_walk(&revs, 0))
 		die("revision walk setup failed");
 	while ((commit = get_revision(&revs)) != NULL) {
 		struct object *o = &(commit->object);
diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c
index 5566e94abe..e317550020 100644
--- a/pack-bitmap-write.c
+++ b/pack-bitmap-write.c
@@ -296,7 +296,7 @@ void bitmap_writer_build(struct packing_data *to_pack)
 			add_pending_object(&revs, object, "");
 			revs.include_check_data = base;
 
-			if (prepare_revision_walk(&revs))
+			if (prepare_revision_walk(&revs, 0))
 				die("revision walk setup failed");
 
 			traverse_commit_list(&revs, show_commit, show_object, base);
diff --git a/pack-bitmap.c b/pack-bitmap.c
index 4695aaf6b4..21bbbab063 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -584,7 +584,7 @@ static struct bitmap *find_objects(struct bitmap_index *bitmap_git,
 		revs->include_check = should_include;
 		revs->include_check_data = &incdata;
 
-		if (prepare_revision_walk(revs))
+		if (prepare_revision_walk(revs, 0))
 			die("revision walk setup failed");
 
 		show_data.bitmap_git = bitmap_git;
@@ -987,7 +987,7 @@ void test_bitmap_walk(struct rev_info *revs)
 
 	result_popcnt = bitmap_popcount(result);
 
-	if (prepare_revision_walk(revs))
+	if (prepare_revision_walk(revs, 0))
 		die("revision walk setup failed");
 
 	tdata.bitmap_git = bitmap_git;
diff --git a/reachable.c b/reachable.c
index 0d00a91de4..0339adc074 100644
--- a/reachable.c
+++ b/reachable.c
@@ -234,7 +234,7 @@ void mark_reachable_objects(struct rev_info *revs, int mark_reflog,
 	 * Set up the revision walk - this will move all commits
 	 * from the pending list to the commit walking list.
 	 */
-	if (prepare_revision_walk(revs))
+	if (prepare_revision_walk(revs, 0))
 		die("revision walk setup failed");
 	traverse_commit_list(revs, mark_commit, mark_object, &cp);
 
@@ -242,7 +242,7 @@ void mark_reachable_objects(struct rev_info *revs, int mark_reflog,
 		revs->ignore_missing_links = 1;
 		if (add_unseen_recent_objects_to_traversal(revs, mark_recent))
 			die("unable to mark recent objects");
-		if (prepare_revision_walk(revs))
+		if (prepare_revision_walk(revs, 0))
 			die("revision walk setup failed");
 		traverse_commit_list(revs, mark_commit, mark_object, &cp);
 	}
diff --git a/ref-filter.c b/ref-filter.c
index 3aca105307..a5e16f3b80 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -2071,7 +2071,7 @@ static void do_merge_filter(struct ref_filter_cbdata *ref_cbdata)
 	add_pending_object(&revs, &filter->merge_commit->object, "");
 
 	revs.limited = 1;
-	if (prepare_revision_walk(&revs))
+	if (prepare_revision_walk(&revs, 0))
 		die(_("revision walk setup failed"));
 
 	old_nr = array->nr;
diff --git a/remote.c b/remote.c
index 9cc3b07d21..4ddd93d5c7 100644
--- a/remote.c
+++ b/remote.c
@@ -1944,7 +1944,7 @@ int stat_tracking_info(struct branch *branch, int *num_ours, int *num_theirs,
 
 	repo_init_revisions(the_repository, &revs, NULL);
 	setup_revisions(argv.argc, argv.argv, &revs, NULL);
-	if (prepare_revision_walk(&revs))
+	if (prepare_revision_walk(&revs, 0))
 		die(_("revision walk setup failed"));
 
 	/* ... and count the commits on each side. */
diff --git a/revision.c b/revision.c
index eb8e51bc63..d108fe49bb 100644
--- a/revision.c
+++ b/revision.c
@@ -389,7 +389,8 @@ void add_pending_oid(struct rev_info *revs, const char *name,
 }
 
 static struct commit *handle_commit(struct rev_info *revs,
-				    struct object_array_entry *entry)
+				    struct object_array_entry *entry,
+				    int sparse)
 {
 	struct object *object = entry->item;
 	const char *name = entry->name;
@@ -456,7 +457,8 @@ static struct commit *handle_commit(struct rev_info *revs,
 		if (!revs->tree_objects)
 			return NULL;
 		if (flags & UNINTERESTING) {
-			mark_tree_contents_uninteresting(revs->repo, tree);
+			if (!sparse)
+				mark_tree_contents_uninteresting(revs->repo, tree);
 			return NULL;
 		}
 		add_pending_object_with_path(revs, object, name, mode, path);
@@ -3289,7 +3291,7 @@ static void expand_topo_walk(struct rev_info *revs, struct commit *commit)
 	}
 }
 
-int prepare_revision_walk(struct rev_info *revs)
+int prepare_revision_walk(struct rev_info *revs, int sparse)
 {
 	int i;
 	struct object_array old_pending;
@@ -3301,7 +3303,7 @@ int prepare_revision_walk(struct rev_info *revs)
 	revs->pending.objects = NULL;
 	for (i = 0; i < old_pending.nr; i++) {
 		struct object_array_entry *e = old_pending.objects + i;
-		struct commit *commit = handle_commit(revs, e);
+		struct commit *commit = handle_commit(revs, e, sparse);
 		if (commit) {
 			if (!(commit->object.flags & SEEN)) {
 				commit->object.flags |= SEEN;
diff --git a/revision.h b/revision.h
index 4134dc6029..2e921c515a 100644
--- a/revision.h
+++ b/revision.h
@@ -320,7 +320,7 @@ int handle_revision_arg(const char *arg, struct rev_info *revs,
 			int flags, unsigned revarg_opt);
 
 void reset_revision_walk(void);
-int prepare_revision_walk(struct rev_info *revs);
+int prepare_revision_walk(struct rev_info *revs, int sparse);
 struct commit *get_revision(struct rev_info *revs);
 char *get_revision_mark(const struct rev_info *revs,
 			const struct commit *commit);
diff --git a/sequencer.c b/sequencer.c
index 95dda23eee..df32dc1e71 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -1990,7 +1990,7 @@ static int prepare_revs(struct replay_opts *opts)
 	if (opts->action == REPLAY_PICK && !opts->revs->no_walk)
 		opts->revs->reverse ^= 1;
 
-	if (prepare_revision_walk(opts->revs))
+	if (prepare_revision_walk(opts->revs, 0))
 		return error(_("revision walk setup failed"));
 
 	return 0;
@@ -4062,7 +4062,7 @@ int sequencer_pick_revisions(struct repository *r,
 	    opts->revs->no_walk &&
 	    !opts->revs->cmdline.rev->flags) {
 		struct commit *cmit;
-		if (prepare_revision_walk(opts->revs))
+		if (prepare_revision_walk(opts->revs, 0))
 			return error(_("revision walk setup failed"));
 		cmit = get_revision(opts->revs);
 		if (!cmit)
@@ -4529,7 +4529,7 @@ int sequencer_make_script(struct repository *r, FILE *out,
 	if (setup_revisions(argc, argv, &revs, NULL) > 1)
 		return error(_("make_script: unhandled options"));
 
-	if (prepare_revision_walk(&revs) < 0)
+	if (prepare_revision_walk(&revs, 0) < 0)
 		return error(_("make_script: error preparing revisions"));
 
 	if (rebase_merges)
diff --git a/shallow.c b/shallow.c
index ce45297940..e9f3886a59 100644
--- a/shallow.c
+++ b/shallow.c
@@ -196,7 +196,7 @@ struct commit_list *get_shallow_commits_by_rev_list(int ac, const char **av,
 	save_commit_buffer = 0;
 	setup_revisions(ac, av, &revs, NULL);
 
-	if (prepare_revision_walk(&revs))
+	if (prepare_revision_walk(&revs, 0))
 		die("revision walk setup failed");
 	traverse_commit_list(&revs, show_commit, NULL, &not_shallow_list);
 
diff --git a/submodule.c b/submodule.c
index 21cf50ca15..2b18f888ca 100644
--- a/submodule.c
+++ b/submodule.c
@@ -456,7 +456,7 @@ static int prepare_submodule_summary(struct rev_info *rev, const char *path,
 		add_pending_object(rev, &list->item->object,
 			oid_to_hex(&list->item->object.oid));
 	}
-	return prepare_revision_walk(rev);
+	return prepare_revision_walk(rev, 0);
 }
 
 static void print_submodule_summary(struct repository *r, struct rev_info *rev, struct diff_options *o)
@@ -843,7 +843,7 @@ static void collect_changed_submodules(struct repository *r,
 
 	repo_init_revisions(r, &rev, NULL);
 	setup_revisions(argv->argc, argv->argv, &rev, NULL);
-	if (prepare_revision_walk(&rev))
+	if (prepare_revision_walk(&rev, 0))
 		die("revision walk setup failed");
 
 	while ((commit = get_revision(&rev))) {
diff --git a/t/helper/test-revision-walking.c b/t/helper/test-revision-walking.c
index 625b2dbf82..dc1ff31d5c 100644
--- a/t/helper/test-revision-walking.c
+++ b/t/helper/test-revision-walking.c
@@ -34,7 +34,7 @@ static int run_revision_walk(void)
 
 	repo_init_revisions(the_repository, &rev, NULL);
 	setup_revisions(argc, argv, &rev, NULL);
-	if (prepare_revision_walk(&rev))
+	if (prepare_revision_walk(&rev, 0))
 		die("revision walk setup failed");
 
 	while ((commit = get_revision(&rev)) != NULL) {
-- 
2.17.1


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH 3/4] repack: add --sparse and pass to pack-objects
  2019-03-12 13:18 [PATCH 0/4] Speed up repacking when lots of pack-kept objects Nathaniel Filardo
  2019-03-12 13:18 ` [PATCH 1/4] count-objects: report statistics about kept packs Nathaniel Filardo
  2019-03-12 13:18 ` [PATCH 2/4] revision walk: optionally use sparse reachability Nathaniel Filardo
@ 2019-03-12 13:18 ` Nathaniel Filardo
  2019-03-12 13:47   ` Derrick Stolee
  2019-03-12 13:18 ` [PATCH 4/4] repack: optionally assume transitive kept packs Nathaniel Filardo
  3 siblings, 1 reply; 8+ messages in thread
From: Nathaniel Filardo @ 2019-03-12 13:18 UTC (permalink / raw)
  To: git; +Cc: Derrick Stolee, Nathaniel Filardo

The sparse connectivity algorithm saves a whole lot of time when there
are UNINTERESTING trees around.
---
 Documentation/git-repack.txt |  4 ++++
 builtin/repack.c             | 10 ++++++++++
 2 files changed, 14 insertions(+)

diff --git a/Documentation/git-repack.txt b/Documentation/git-repack.txt
index aa0cc8bd44..836d81457a 100644
--- a/Documentation/git-repack.txt
+++ b/Documentation/git-repack.txt
@@ -165,6 +165,10 @@ depth is 4095.
 	Pass the `--delta-islands` option to `git-pack-objects`, see
 	linkgit:git-pack-objects[1].
 
+--sparse::
+	Pass the `--sparse` option to `git-pack-objects`; see
+	linkgit:git-pack-objects[1].
+
 Configuration
 -------------
 
diff --git a/builtin/repack.c b/builtin/repack.c
index 67f8978043..71e715b594 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -17,6 +17,7 @@ static int pack_kept_objects = -1;
 static int write_bitmaps;
 static int use_delta_islands;
 static char *packdir, *packtmp;
+static int sparse;
 
 static const char *const git_repack_usage[] = {
 	N_("git repack [<options>]"),
@@ -48,6 +49,10 @@ static int repack_config(const char *var, const char *value, void *cb)
 		use_delta_islands = git_config_bool(var, value);
 		return 0;
 	}
+	if (!strcmp(var, "pack.usesparse")) {
+		sparse = git_config_bool(var, value);
+		return 0;
+	}
 	return git_default_config(var, value, cb);
 }
 
@@ -326,11 +331,14 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
 				N_("maximum size of each packfile")),
 		OPT_BOOL(0, "pack-kept-objects", &pack_kept_objects,
 				N_("repack objects in packs marked with .keep")),
+		OPT_BOOL(0, "sparse", &sparse,
+			 N_("use the sparse reachability algorithm")),
 		OPT_STRING_LIST(0, "keep-pack", &keep_pack_list, N_("name"),
 				N_("do not repack this pack")),
 		OPT_END()
 	};
 
+	sparse = git_env_bool("GIT_TEST_PACK_SPARSE", 0);
 	git_config(repack_config, NULL);
 
 	argc = parse_options(argc, argv, prefix, builtin_repack_options,
@@ -366,6 +374,8 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
 	argv_array_push(&cmd.args, "--all");
 	argv_array_push(&cmd.args, "--reflog");
 	argv_array_push(&cmd.args, "--indexed-objects");
+	if (sparse)
+		argv_array_push(&cmd.args, "--sparse");
 	if (repository_format_partial_clone)
 		argv_array_push(&cmd.args, "--exclude-promisor-objects");
 	if (write_bitmaps)
-- 
2.17.1


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH 4/4] repack: optionally assume transitive kept packs
  2019-03-12 13:18 [PATCH 0/4] Speed up repacking when lots of pack-kept objects Nathaniel Filardo
                   ` (2 preceding siblings ...)
  2019-03-12 13:18 ` [PATCH 3/4] repack: add --sparse and pass to pack-objects Nathaniel Filardo
@ 2019-03-12 13:18 ` Nathaniel Filardo
  3 siblings, 0 replies; 8+ messages in thread
From: Nathaniel Filardo @ 2019-03-12 13:18 UTC (permalink / raw)
  To: git; +Cc: Derrick Stolee, Nathaniel Filardo

If the user is careful to mark .pack files as kept only when they refer
to (other) kept packs, then we can rely on this when walking the object
graph in subsequent repack operations and reduce the time and memory
spent building the object graph.

Towards that end, then, teach git repack to enumerate the non-BLOBs
objects (the tree and commits and tags) we're skipping because they're
in kept packs and mark them as UNINTERESTING for pack-object's
operation.  Because this creates UNINTERESTING marks, we make repack's
--assume-pack-keep-transitive imply --sparse for pack-object.

In testing with a 203GB repository with 80M objects, this amounts to
several gigabytes of memory and over ten minutes saved.  This machine
has 32G of RAM and the repository is on a fast SSD to speed testing.

All told, this test repository takes about 33 minutes elapsed (28
minutes in user code) and 3 GB of RAM to repack 12M objects occupying
33GB scattered across 477 pack files not marked as kept (the remainder
of the objects are spread across three kept pack files).  The time and
RAM usage with --assume-pack-keep-transitive should be dominated by
factors proportional to the size and number of un-kept objects.

Without these optimizations, it takes about 45 minutes (34 minutes in
user code) and 7 GB of RAM to compute the same pack file.  The extra
time and total memory use are expected to be proportional to the kept
packs' size, not the working set of the repack.  The time discrepancy
can be expected to be more dramatic when the repository is on spinning
rust rather than SSD.

Signed-off-by: Nathaniel Filardo <nwf20@cl.cam.ac.uk>
---
 Documentation/git-gc.txt     |  5 ++++
 Documentation/git-repack.txt | 21 ++++++++++++++
 builtin/gc.c                 |  5 ++++
 builtin/repack.c             | 56 ++++++++++++++++++++++++++++++++++--
 4 files changed, 85 insertions(+), 2 deletions(-)

diff --git a/Documentation/git-gc.txt b/Documentation/git-gc.txt
index a7c1b0f60e..7115564f7d 100644
--- a/Documentation/git-gc.txt
+++ b/Documentation/git-gc.txt
@@ -96,6 +96,11 @@ be performed as well.
 	`.keep` files are consolidated into a single pack. When this
 	option is used, `gc.bigPackThreshold` is ignored.
 
+--assume-pack-keep-transitive::
+	Pass the `--assume-pack-keep-transitive` option to `git-repack`;
+	see linkgit:git-repack[1].
+
+
 CONFIGURATION
 -------------
 
diff --git a/Documentation/git-repack.txt b/Documentation/git-repack.txt
index 836d81457a..014812c4bb 100644
--- a/Documentation/git-repack.txt
+++ b/Documentation/git-repack.txt
@@ -169,6 +169,27 @@ depth is 4095.
 	Pass the `--sparse` option to `git-pack-objects`; see
 	linkgit:git-pack-objects[1].
 
+--assume-pack-keep-transitive::
+	This flag accelerates the search for objects to pack by assuming
+	that commit objects found in kept packfiles make reference only
+	to that or other kept packfiles.  This is very useful if, for
+	example, this repository periodically repacks all reachable objects
+	and marks the resulting pack file as kept.
+
+	Because this option may prevent git from descending into kept packs,
+	no objects inside kept packs will be available for delta processing.
+	One should therefore restrict the use of this option to situations
+	where delta processing is disabled or when relatively large amounts
+	of data are considered and the relative gain of a wider set of
+	possible delta base objects is reduced.
+
+	The simplest way to ensure transitivity of kept packs is to run `git
+	repack` with `-a` (or `-A`) and `-d` and mark all resulting packs as
+	kept, never removing the kept marker.  In practice, one may wish to
+	wait to mark packs as kept until they are sufficiently large
+	(reducing the number of pack files necessary for a given set of
+	objects) but not so large as to be unwieldy.
+
 Configuration
 -------------
 
diff --git a/builtin/gc.c b/builtin/gc.c
index 020f725acc..1346afb3b1 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -49,6 +49,7 @@ static const char *prune_expire = "2.weeks.ago";
 static const char *prune_worktrees_expire = "3.months.ago";
 static unsigned long big_pack_threshold;
 static unsigned long max_delta_cache_size = DEFAULT_DELTA_CACHE_SIZE;
+static int assume_pack_keep_transitive;
 
 static struct argv_array pack_refs_cmd = ARGV_ARRAY_INIT;
 static struct argv_array reflog = ARGV_ARRAY_INIT;
@@ -526,6 +527,8 @@ int cmd_gc(int argc, const char **argv, const char *prefix)
 			   PARSE_OPT_NOCOMPLETE),
 		OPT_BOOL(0, "keep-largest-pack", &keep_base_pack,
 			 N_("repack all other packs except the largest pack")),
+		OPT_BOOL(0, "assume-pack-keep-transitive", &assume_pack_keep_transitive,
+			 N_("assume kept packs reference only kept packs")),
 		OPT_END()
 	};
 
@@ -564,6 +567,8 @@ int cmd_gc(int argc, const char **argv, const char *prefix)
 	}
 	if (quiet)
 		argv_array_push(&repack, "-q");
+	if (assume_pack_keep_transitive)
+		argv_array_push(&repack, "--assume-pack-keep-transitive");
 
 	if (auto_gc) {
 		/*
diff --git a/builtin/repack.c b/builtin/repack.c
index 71e715b594..b2b407d42c 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -11,6 +11,8 @@
 #include "midx.h"
 #include "packfile.h"
 #include "object-store.h"
+#include "revision.h"
+#include "list-objects.h"
 
 static int delta_base_offset = 1;
 static int pack_kept_objects = -1;
@@ -261,6 +263,30 @@ static void repack_promisor_objects(const struct pack_objects_args *args,
 		die(_("could not finish pack-objects to repack promisor objects"));
 }
 
+static void apkt_show_commit(struct commit *commit, void *data)
+{
+	struct tree *tree;
+	struct pack_entry e;
+
+	if (!find_pack_entry(the_repository, &commit->object.oid, &e))
+		return;
+
+	if (!(e.p->pack_keep || e.p->pack_keep_in_core))
+		return;
+
+	tree = get_commit_tree(commit);
+	if (tree)
+		tree->object.flags |= UNINTERESTING;
+
+	write_oid(&commit->object.oid, e.p, 0, data);
+	write_oid(&tree->object.oid, NULL, 0, data);
+}
+
+static void apkt_show_object(struct object *obj, const char *name, void *data)
+{
+	return;
+}
+
 #define ALL_INTO_ONE 1
 #define LOOSEN_UNREACHABLE 2
 
@@ -292,6 +318,7 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
 	struct string_list keep_pack_list = STRING_LIST_INIT_NODUP;
 	int no_update_server_info = 0;
 	int midx_cleared = 0;
+	int assume_pack_keep_transitive = 0;
 	struct pack_objects_args po_args = {NULL};
 
 	struct option builtin_repack_options[] = {
@@ -333,6 +360,8 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
 				N_("repack objects in packs marked with .keep")),
 		OPT_BOOL(0, "sparse", &sparse,
 			 N_("use the sparse reachability algorithm")),
+		OPT_BOOL(0, "assume-pack-keep-transitive", &assume_pack_keep_transitive,
+				N_("assume kept packs reference only kept packs")),
 		OPT_STRING_LIST(0, "keep-pack", &keep_pack_list, N_("name"),
 				N_("do not repack this pack")),
 		OPT_END()
@@ -351,6 +380,11 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
 	    (unpack_unreachable || (pack_everything & LOOSEN_UNREACHABLE)))
 		die(_("--keep-unreachable and -A are incompatible"));
 
+	if (assume_pack_keep_transitive) {
+		/* imply --honor-pack-keep */
+		pack_kept_objects = 0;
+	}
+
 	if (pack_kept_objects < 0)
 		pack_kept_objects = write_bitmaps;
 
@@ -374,7 +408,7 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
 	argv_array_push(&cmd.args, "--all");
 	argv_array_push(&cmd.args, "--reflog");
 	argv_array_push(&cmd.args, "--indexed-objects");
-	if (sparse)
+	if (sparse || assume_pack_keep_transitive)
 		argv_array_push(&cmd.args, "--sparse");
 	if (repository_format_partial_clone)
 		argv_array_push(&cmd.args, "--exclude-promisor-objects");
@@ -409,12 +443,30 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
 		argv_array_push(&cmd.args, "--incremental");
 	}
 
-	cmd.no_stdin = 1;
+	cmd.in = -1;
+	cmd.no_stdin = !assume_pack_keep_transitive;
 
 	ret = start_command(&cmd);
 	if (ret)
 		return ret;
 
+	if (assume_pack_keep_transitive) {
+		struct rev_info revs;
+		const char *revargv[] = { "skip", "--all", "--reflog", "--indexed-objects", NULL };
+
+		repo_init_revisions(the_repository, &revs, NULL);
+		setup_revisions(sizeof(revargv)/sizeof(revargv[0]) - 1 , revargv, &revs, NULL);
+		if (prepare_revision_walk(&revs, sparse))
+			die("revision walk setup failed");
+
+		xwrite(cmd.in, "--not\n", 6);
+		traverse_commit_list(&revs, apkt_show_commit, apkt_show_object,
+				     &cmd);
+		xwrite(cmd.in, "--not\n", 6);
+
+		close(cmd.in);
+	}
+
 	out = xfdopen(cmd.out, "r");
 	while (strbuf_getline_lf(&line, out) != EOF) {
 		if (line.len != the_hash_algo->hexsz)
-- 
2.17.1


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 3/4] repack: add --sparse and pass to pack-objects
  2019-03-12 13:18 ` [PATCH 3/4] repack: add --sparse and pass to pack-objects Nathaniel Filardo
@ 2019-03-12 13:47   ` Derrick Stolee
  2019-03-12 14:03     ` Dr N.W. Filardo
  0 siblings, 1 reply; 8+ messages in thread
From: Derrick Stolee @ 2019-03-12 13:47 UTC (permalink / raw)
  To: Nathaniel Filardo, git; +Cc: Derrick Stolee

On 3/12/2019 9:18 AM, Nathaniel Filardo wrote:
> The sparse connectivity algorithm saves a whole lot of time when there
> are UNINTERESTING trees around.

Interesting! Do you have some performance numbers to include with
this statement?
> @@ -48,6 +49,10 @@ static int repack_config(const char *var, const char *value, void *cb)
>  		use_delta_islands = git_config_bool(var, value);
>  		return 0;
>  	}
> +	if (!strcmp(var, "pack.usesparse")) {
> +		sparse = git_config_bool(var, value);
> +		return 0;
> +	}

This part is not handled inside of `pack-objects`. Since you are not
sending '--no-sparse' when the variable 'sparse' is zero, the config
setting will automatically be picked up by the pack-objects builtin.

Now, a question of whether you _should_ allow the '--no-sparse' option
in the 'repack' command, and send it along to the inner command (when
it is present) is another question.

> @@ -366,6 +374,8 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
>  	argv_array_push(&cmd.args, "--all");
>  	argv_array_push(&cmd.args, "--reflog");
>  	argv_array_push(&cmd.args, "--indexed-objects");
> +	if (sparse)
> +		argv_array_push(&cmd.args, "--sparse");
>  	if (repository_format_partial_clone)
>  		argv_array_push(&cmd.args, "--exclude-promisor-objects");
>  	if (write_bitmaps)
> 

How about a test with this new option? You can probably just add to
t5322-pack-objects-sparse.sh.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 2/4] revision walk: optionally use sparse reachability
  2019-03-12 13:18 ` [PATCH 2/4] revision walk: optionally use sparse reachability Nathaniel Filardo
@ 2019-03-12 13:59   ` Derrick Stolee
  0 siblings, 0 replies; 8+ messages in thread
From: Derrick Stolee @ 2019-03-12 13:59 UTC (permalink / raw)
  To: Nathaniel Filardo, git; +Cc: Derrick Stolee

On 3/12/2019 9:18 AM, Nathaniel Filardo wrote:
> The only caller that passes a non-zero value to prepare_revision_walk
> after this patch is builtin/pack-objects.  Without this, sparsity seems
> to do little good therein, as prepare_revision_walk will densely
> propagate UNINTERESTING flags from trees to tree contents, before
> mark_edges_uninteresting has a chance to be faster by being sparse.
> 
> Signed-off-by: Nathaniel Filardo <nwf20@cl.cam.ac.uk>
> ---
>  bisect.c                         |  2 +-
>  blame.c                          |  2 +-
>  builtin/checkout.c               |  2 +-
>  builtin/commit.c                 |  2 +-
>  builtin/describe.c               |  2 +-
>  builtin/fast-export.c            |  2 +-
>  builtin/fmt-merge-msg.c          |  2 +-
>  builtin/log.c                    | 10 +++++-----
>  builtin/merge.c                  |  2 +-
>  builtin/pack-objects.c           |  4 ++--
>  builtin/rev-list.c               |  2 +-
>  builtin/shortlog.c               |  2 +-
>  bundle.c                         |  2 +-
>  http-push.c                      |  2 +-
>  merge-recursive.c                |  2 +-
>  pack-bitmap-write.c              |  2 +-
>  pack-bitmap.c                    |  4 ++--
>  reachable.c                      |  4 ++--
>  ref-filter.c                     |  2 +-
>  remote.c                         |  2 +-
>  revision.c                       | 10 ++++++----
>  revision.h                       |  2 +-
>  sequencer.c                      |  6 +++---
>  shallow.c                        |  2 +-
>  submodule.c                      |  4 ++--
>  t/helper/test-revision-walking.c |  2 +-
>  26 files changed, 41 insertions(+), 39 deletions(-)

This patch is very noisy. Perhaps the pattern established in 4f6d26b1:
"list-objects: consume sparse tree walk" should be reversed and instead
include a 'sparse_tree_walk' setting into 'struct rev_info'.

Changing so many method prototypes is rather invasive and unlikely to
benefit many of these callers.

If the setting is added to 'struct rev_info', then you'll want to remove
the parameter from mark_edges_uninteresting().

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 3/4] repack: add --sparse and pass to pack-objects
  2019-03-12 13:47   ` Derrick Stolee
@ 2019-03-12 14:03     ` Dr N.W. Filardo
  0 siblings, 0 replies; 8+ messages in thread
From: Dr N.W. Filardo @ 2019-03-12 14:03 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: git

On 2019-03-12 13:47, Derrick Stolee wrote:
> On 3/12/2019 9:18 AM, Nathaniel Filardo wrote:
>> The sparse connectivity algorithm saves a whole lot of time when there
>> are UNINTERESTING trees around.
> 
> Interesting! Do you have some performance numbers to include with
> this statement?

Not UNINTERESTING? ;)

Not directly, no, but the performance numbers reported for the next 
patch in
the series hinge on using sparse reachability.  It seemed like a good 
idea to
expose this knob through repack even if one isn't going to use the
--assume-pack-keep-transitive flag introduced in the next patch.

>> @@ -48,6 +49,10 @@ static int repack_config(const char *var, const 
>> char *value, void *cb)
>>  		use_delta_islands = git_config_bool(var, value);
>>  		return 0;
>>  	}
>> +	if (!strcmp(var, "pack.usesparse")) {
>> +		sparse = git_config_bool(var, value);
>> +		return 0;
>> +	}
> 
> This part is not handled inside of `pack-objects`. Since you are not
> sending '--no-sparse' when the variable 'sparse' is zero, the config
> setting will automatically be picked up by the pack-objects builtin.

OK, I will drop this hunk.

> Now, a question of whether you _should_ allow the '--no-sparse' option
> in the 'repack' command, and send it along to the inner command (when
> it is present) is another question.

I'm inclined to say yes, but am open to suggestions. :)

>> @@ -366,6 +374,8 @@ int cmd_repack(int argc, const char **argv, const 
>> char *prefix)
>>  	argv_array_push(&cmd.args, "--all");
>>  	argv_array_push(&cmd.args, "--reflog");
>>  	argv_array_push(&cmd.args, "--indexed-objects");
>> +	if (sparse)
>> +		argv_array_push(&cmd.args, "--sparse");
>>  	if (repository_format_partial_clone)
>>  		argv_array_push(&cmd.args, "--exclude-promisor-objects");
>>  	if (write_bitmaps)
>> 
> 
> How about a test with this new option? You can probably just add to
> t5322-pack-objects-sparse.sh.

Can do.

Cheers,
--nwf;


^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2019-03-12 14:20 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-03-12 13:18 [PATCH 0/4] Speed up repacking when lots of pack-kept objects Nathaniel Filardo
2019-03-12 13:18 ` [PATCH 1/4] count-objects: report statistics about kept packs Nathaniel Filardo
2019-03-12 13:18 ` [PATCH 2/4] revision walk: optionally use sparse reachability Nathaniel Filardo
2019-03-12 13:59   ` Derrick Stolee
2019-03-12 13:18 ` [PATCH 3/4] repack: add --sparse and pass to pack-objects Nathaniel Filardo
2019-03-12 13:47   ` Derrick Stolee
2019-03-12 14:03     ` Dr N.W. Filardo
2019-03-12 13:18 ` [PATCH 4/4] repack: optionally assume transitive kept packs Nathaniel Filardo

git@vger.kernel.org list mirror (unofficial, one of many)

This inbox may be cloned and mirrored by anyone:

	git clone --mirror https://public-inbox.org/git
	git clone --mirror http://ou63pmih66umazou.onion/git
	git clone --mirror http://czquwvybam4bgbro.onion/git
	git clone --mirror http://hjrcffqmbrq6wope.onion/git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V1 git git/ https://public-inbox.org/git \
		git@vger.kernel.org
	public-inbox-index git

Example config snippet for mirrors.
Newsgroups are available over NNTP:
	nntp://news.public-inbox.org/inbox.comp.version-control.git
	nntp://ou63pmih66umazou.onion/inbox.comp.version-control.git
	nntp://czquwvybam4bgbro.onion/inbox.comp.version-control.git
	nntp://hjrcffqmbrq6wope.onion/inbox.comp.version-control.git
	nntp://news.gmane.io/gmane.comp.version-control.git
 note: .onion URLs require Tor: https://www.torproject.org/

code repositories for the project(s) associated with this inbox:

	https://80x24.org/mirrors/git.git

AGPL code for this site: git clone https://public-inbox.org/public-inbox.git