git@vger.kernel.org list mirror (unofficial, one of many)
 help / color / mirror / code / Atom feed
* [PATCH 0/7] Better threaded delta resolution in index-pack (another try)
@ 2020-08-24 19:16 Jonathan Tan
  2020-08-24 19:16 ` [PATCH 1/7] Documentation: deltaBaseCacheLimit is per-thread Jonathan Tan
                   ` (11 more replies)
  0 siblings, 12 replies; 31+ messages in thread
From: Jonathan Tan @ 2020-08-24 19:16 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, peff, steadmon

I'm trying to resurrect [1] and have rebased it to latest master
(675a4aaf3b ("Ninth batch", 2020-08-19)).

Peff said [2] (of v1) that the overall direction seems reasonable and
Josh Steadmon said [3] (of v2) that it looks mostly good except for
possible improvements to commit messages and comments. Josh did not list
out specific improvements to commit messages but I have taken his
suggestions for comments.

[1] https://lore.kernel.org/git/cover.1571343096.git.jonathantanmy@google.com/
[2] https://lore.kernel.org/git/20191017063554.GG10862@sigill.intra.peff.net/
[3] https://lore.kernel.org/git/20200228000350.GB12115@google.com/

Jonathan Tan (7):
  Documentation: deltaBaseCacheLimit is per-thread
  index-pack: remove redundant parameter
  index-pack: unify threaded and unthreaded code
  index-pack: remove redundant child field
  index-pack: calculate {ref,ofs}_{first,last} early
  index-pack: make resolve_delta() assume base data
  index-pack: make quantum of work smaller

 Documentation/config/core.txt |   2 +-
 builtin/index-pack.c          | 449 ++++++++++++++++++----------------
 2 files changed, 244 insertions(+), 207 deletions(-)

-- 
2.28.0.297.g1956fa8f8d-goog


^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH 1/7] Documentation: deltaBaseCacheLimit is per-thread
  2020-08-24 19:16 [PATCH 0/7] Better threaded delta resolution in index-pack (another try) Jonathan Tan
@ 2020-08-24 19:16 ` Jonathan Tan
  2020-08-24 19:16 ` [PATCH] fetch-pack: in partial clone, pass --promisor Jonathan Tan
                   ` (10 subsequent siblings)
  11 siblings, 0 replies; 31+ messages in thread
From: Jonathan Tan @ 2020-08-24 19:16 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, peff, steadmon

Clarify that core.deltaBaseCacheLimit is per-thread, as can be seen from
the fact that cache usage (base_cache_used in struct thread_local in
builtin/index-pack.c) is tracked individually for each thread and
compared against delta_base_cache_limit.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
 Documentation/config/core.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/config/core.txt b/Documentation/config/core.txt
index 74619a9c03..02002cf109 100644
--- a/Documentation/config/core.txt
+++ b/Documentation/config/core.txt
@@ -399,7 +399,7 @@ the largest projects.  You probably do not need to adjust this value.
 Common unit suffixes of 'k', 'm', or 'g' are supported.
 
 core.deltaBaseCacheLimit::
-	Maximum number of bytes to reserve for caching base objects
+	Maximum number of bytes per thread to reserve for caching base objects
 	that may be referenced by multiple deltified objects.  By storing the
 	entire decompressed base objects in a cache Git is able
 	to avoid unpacking and decompressing frequently used base
-- 
2.28.0.297.g1956fa8f8d-goog


^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH] fetch-pack: in partial clone, pass --promisor
  2020-08-24 19:16 [PATCH 0/7] Better threaded delta resolution in index-pack (another try) Jonathan Tan
  2020-08-24 19:16 ` [PATCH 1/7] Documentation: deltaBaseCacheLimit is per-thread Jonathan Tan
@ 2020-08-24 19:16 ` Jonathan Tan
  2020-08-24 19:36   ` Jonathan Tan
  2020-08-24 19:16 ` [PATCH 2/7] index-pack: remove redundant parameter Jonathan Tan
                   ` (9 subsequent siblings)
  11 siblings, 1 reply; 31+ messages in thread
From: Jonathan Tan @ 2020-08-24 19:16 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan

When fetching a pack from a promisor remote, the corresponding .promisor
file needs to be created. "fetch-pack" originally did this by passing
"--promisor" to "index-pack", but in 5374a290aa ("fetch-pack: write
fetched refs to .promisor", 2019-10-16), "fetch-pack" was taught to do
this itself instead, because it needed to store ref information in the
.promisor file.

This causes a problem with superprojects when transfer.fsckobjects is
set, because in the current implementation, it is "index-pack" that
calls fsck_finish() to check the objects; before 5374a290aa,
fsck_finish() would see that .gitmodules is a promisor object and
tolerate it being missing, but after, there is no .promisor file (at the
time of the invocation of fsck_finish() by "index-pack") to tell it that
.gitmodules is a promisor object, so it returns an error.

Therefore, teach "fetch-pack" to pass "--promisor" to index pack once
again. "fetch-pack" will subsequently overwrite this file with the ref
information.

An alternative is to instead move object checking to "fetch-pack", and
let "index-pack" only index the files. However, since "index-pack" has
to inflate objects in order to index them, it seems reasonable to also
let it check the objects (which also require inflated files).

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
 fetch-pack.c             | 17 ++++++++++-------
 t/t5616-partial-clone.sh | 16 ++++++++++++++++
 2 files changed, 26 insertions(+), 7 deletions(-)

diff --git a/fetch-pack.c b/fetch-pack.c
index 7f20eca4f8..d467edc24e 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -866,13 +866,16 @@ static int get_pack(struct fetch_pack_args *args,
 			 * have this responsibility.
 			 */
 			args->check_self_contained_and_connected = 0;
-		/*
-		 * If we're obtaining the filename of a lockfile, we'll use
-		 * that filename to write a .promisor file with more
-		 * information below. If not, we need index-pack to do it for
-		 * us.
-		 */
-		if (!(do_keep && pack_lockfiles) && args->from_promisor)
+
+		if (args->from_promisor)
+			/*
+			 * write_promisor_file() may be called afterwards but
+			 * we still need index-pack to know that this is a
+			 * promisor pack. For example, if transfer.fsckobjects
+			 * is true, index-pack needs to know that .gitmodules
+			 * is a promisor object (so that it won't complain if
+			 * it is missing).
+			 */
 			strvec_push(&cmd.args, "--promisor");
 	}
 	else {
diff --git a/t/t5616-partial-clone.sh b/t/t5616-partial-clone.sh
index 8827c2ed18..5a01466db4 100755
--- a/t/t5616-partial-clone.sh
+++ b/t/t5616-partial-clone.sh
@@ -163,6 +163,22 @@ test_expect_success 'manual prefetch of missing objects' '
 	test_line_count = 0 observed.oids
 '
 
+test_expect_success 'partial clone with transfer.fsckobjects=1 works with submodules' '
+	test_create_repo submodule &&
+	test_commit -C submodule mycommit &&
+
+	test_create_repo src_with_sub &&
+	test_config -C src_with_sub uploadpack.allowfilter 1 &&
+	test_config -C src_with_sub uploadpack.allowanysha1inwant 1 &&
+
+	git -C src_with_sub submodule add "file://$(pwd)/submodule" mysub &&
+	git -C src_with_sub commit -m "commit with submodule" &&
+
+	git -c transfer.fsckobjects=1 \
+		clone --filter="blob:none" "file://$(pwd)/src_with_sub" dst &&
+	test_when_finished rm -rf dst
+'
+
 test_expect_success 'partial clone with transfer.fsckobjects=1 uses index-pack --fsck-objects' '
 	git init src &&
 	test_commit -C src x &&
-- 
2.28.0.297.g1956fa8f8d-goog


^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH 2/7] index-pack: remove redundant parameter
  2020-08-24 19:16 [PATCH 0/7] Better threaded delta resolution in index-pack (another try) Jonathan Tan
  2020-08-24 19:16 ` [PATCH 1/7] Documentation: deltaBaseCacheLimit is per-thread Jonathan Tan
  2020-08-24 19:16 ` [PATCH] fetch-pack: in partial clone, pass --promisor Jonathan Tan
@ 2020-08-24 19:16 ` Jonathan Tan
  2020-08-24 21:01   ` Junio C Hamano
  2020-08-24 19:16 ` [PATCH 3/7] index-pack: unify threaded and unthreaded code Jonathan Tan
                   ` (8 subsequent siblings)
  11 siblings, 1 reply; 31+ messages in thread
From: Jonathan Tan @ 2020-08-24 19:16 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, peff, steadmon

find_{ref,ofs}_delta_{,children} take an enum object_type parameter, but
the object type is already present in the name of the function. Remove
that parameter from these functions.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
 builtin/index-pack.c | 26 ++++++++++++--------------
 1 file changed, 12 insertions(+), 14 deletions(-)

diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index f865666db9..a8444d550b 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -614,7 +614,7 @@ static int compare_ofs_delta_bases(off_t offset1, off_t offset2,
 	       0;
 }
 
-static int find_ofs_delta(const off_t offset, enum object_type type)
+static int find_ofs_delta(const off_t offset)
 {
 	int first = 0, last = nr_ofs_deltas;
 
@@ -624,7 +624,8 @@ static int find_ofs_delta(const off_t offset, enum object_type type)
 		int cmp;
 
 		cmp = compare_ofs_delta_bases(offset, delta->offset,
-					      type, objects[delta->obj_no].type);
+					      OBJ_OFS_DELTA,
+					      objects[delta->obj_no].type);
 		if (!cmp)
 			return next;
 		if (cmp < 0) {
@@ -637,10 +638,9 @@ static int find_ofs_delta(const off_t offset, enum object_type type)
 }
 
 static void find_ofs_delta_children(off_t offset,
-				    int *first_index, int *last_index,
-				    enum object_type type)
+				    int *first_index, int *last_index)
 {
-	int first = find_ofs_delta(offset, type);
+	int first = find_ofs_delta(offset);
 	int last = first;
 	int end = nr_ofs_deltas - 1;
 
@@ -668,7 +668,7 @@ static int compare_ref_delta_bases(const struct object_id *oid1,
 	return oidcmp(oid1, oid2);
 }
 
-static int find_ref_delta(const struct object_id *oid, enum object_type type)
+static int find_ref_delta(const struct object_id *oid)
 {
 	int first = 0, last = nr_ref_deltas;
 
@@ -678,7 +678,8 @@ static int find_ref_delta(const struct object_id *oid, enum object_type type)
 		int cmp;
 
 		cmp = compare_ref_delta_bases(oid, &delta->oid,
-					      type, objects[delta->obj_no].type);
+					      OBJ_REF_DELTA,
+					      objects[delta->obj_no].type);
 		if (!cmp)
 			return next;
 		if (cmp < 0) {
@@ -691,10 +692,9 @@ static int find_ref_delta(const struct object_id *oid, enum object_type type)
 }
 
 static void find_ref_delta_children(const struct object_id *oid,
-				    int *first_index, int *last_index,
-				    enum object_type type)
+				    int *first_index, int *last_index)
 {
-	int first = find_ref_delta(oid, type);
+	int first = find_ref_delta(oid);
 	int last = first;
 	int end = nr_ref_deltas - 1;
 
@@ -983,12 +983,10 @@ static struct base_data *find_unresolved_deltas_1(struct base_data *base,
 {
 	if (base->ref_last == -1 && base->ofs_last == -1) {
 		find_ref_delta_children(&base->obj->idx.oid,
-					&base->ref_first, &base->ref_last,
-					OBJ_REF_DELTA);
+					&base->ref_first, &base->ref_last);
 
 		find_ofs_delta_children(base->obj->idx.offset,
-					&base->ofs_first, &base->ofs_last,
-					OBJ_OFS_DELTA);
+					&base->ofs_first, &base->ofs_last);
 
 		if (base->ref_last == -1 && base->ofs_last == -1) {
 			free(base->data);
-- 
2.28.0.297.g1956fa8f8d-goog


^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH 3/7] index-pack: unify threaded and unthreaded code
  2020-08-24 19:16 [PATCH 0/7] Better threaded delta resolution in index-pack (another try) Jonathan Tan
                   ` (2 preceding siblings ...)
  2020-08-24 19:16 ` [PATCH 2/7] index-pack: remove redundant parameter Jonathan Tan
@ 2020-08-24 19:16 ` Jonathan Tan
  2020-08-24 21:11   ` Junio C Hamano
  2020-08-24 19:16 ` [PATCH 4/7] index-pack: remove redundant child field Jonathan Tan
                   ` (7 subsequent siblings)
  11 siblings, 1 reply; 31+ messages in thread
From: Jonathan Tan @ 2020-08-24 19:16 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, peff, steadmon

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
 builtin/index-pack.c | 10 +---------
 1 file changed, 1 insertion(+), 9 deletions(-)

diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index a8444d550b..357e03b5aa 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -1211,15 +1211,7 @@ static void resolve_deltas(void)
 		cleanup_thread();
 		return;
 	}
-
-	for (i = 0; i < nr_objects; i++) {
-		struct object_entry *obj = &objects[i];
-
-		if (is_delta_type(obj->type))
-			continue;
-		resolve_base(obj);
-		display_progress(progress, nr_resolved_deltas);
-	}
+	threaded_second_pass(&nothread_data);
 }
 
 /*
-- 
2.28.0.297.g1956fa8f8d-goog


^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH 4/7] index-pack: remove redundant child field
  2020-08-24 19:16 [PATCH 0/7] Better threaded delta resolution in index-pack (another try) Jonathan Tan
                   ` (3 preceding siblings ...)
  2020-08-24 19:16 ` [PATCH 3/7] index-pack: unify threaded and unthreaded code Jonathan Tan
@ 2020-08-24 19:16 ` Jonathan Tan
  2020-08-24 19:16 ` [PATCH 5/7] index-pack: calculate {ref,ofs}_{first,last} early Jonathan Tan
                   ` (6 subsequent siblings)
  11 siblings, 0 replies; 31+ messages in thread
From: Jonathan Tan @ 2020-08-24 19:16 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, peff, steadmon

This is refactoring 1 of 2 to simplify struct base_data.

In index-pack, each thread maintains a doubly-linked list of the delta
chain that it is currently processing (the "base" and "child" pointers
in struct base_data). When a thread exceeds the delta base cache limit
and needs to reclaim memory, it uses the "child" pointers to traverse
the lineage, reclaiming the memory of the eldest delta bases first.

A subsequent patch will perform memory reclaiming in a different way and
will thus no longer need the "child" pointer. Because the "child"
pointer is redundant even now, remove it so that the aforementioned
subsequent patch will be clearer. In the meantime, reclaim memory in the
reverse order of the "base" pointers.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
 builtin/index-pack.c | 41 ++++++++++++++++++++++-------------------
 1 file changed, 22 insertions(+), 19 deletions(-)

diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 357e03b5aa..032716553c 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -34,7 +34,6 @@ struct object_stat {
 
 struct base_data {
 	struct base_data *base;
-	struct base_data *child;
 	struct object_entry *obj;
 	void *data;
 	unsigned long size;
@@ -44,7 +43,6 @@ struct base_data {
 
 struct thread_local {
 	pthread_t thread;
-	struct base_data *base_cache;
 	size_t base_cache_used;
 	int pack_fd;
 };
@@ -380,27 +378,37 @@ static void free_base_data(struct base_data *c)
 	}
 }
 
-static void prune_base_data(struct base_data *retain)
+static void prune_base_data(struct base_data *youngest_child)
 {
 	struct base_data *b;
 	struct thread_local *data = get_thread_data();
-	for (b = data->base_cache;
-	     data->base_cache_used > delta_base_cache_limit && b;
-	     b = b->child) {
-		if (b->data && b != retain)
-			free_base_data(b);
+	struct base_data **ancestry = NULL;
+	size_t nr = 0, alloc = 0;
+	ssize_t i;
+
+	if (data->base_cache_used <= delta_base_cache_limit)
+		return;
+
+	/*
+	 * Free all ancestors of youngest_child until we have enough space,
+	 * starting with the oldest. (We cannot free youngest_child itself.)
+	 */
+	for (b = youngest_child->base; b != NULL; b = b->base) {
+		ALLOC_GROW(ancestry, nr + 1, alloc);
+		ancestry[nr++] = b;
 	}
+	for (i = nr - 1;
+	     i >= 0 && data->base_cache_used > delta_base_cache_limit;
+	     i--) {
+		if (ancestry[i]->data)
+			free_base_data(ancestry[i]);
+	}
+	free(ancestry);
 }
 
 static void link_base_data(struct base_data *base, struct base_data *c)
 {
-	if (base)
-		base->child = c;
-	else
-		get_thread_data()->base_cache = c;
-
 	c->base = base;
-	c->child = NULL;
 	if (c->data)
 		get_thread_data()->base_cache_used += c->size;
 	prune_base_data(c);
@@ -408,11 +416,6 @@ static void link_base_data(struct base_data *base, struct base_data *c)
 
 static void unlink_base_data(struct base_data *c)
 {
-	struct base_data *base = c->base;
-	if (base)
-		base->child = NULL;
-	else
-		get_thread_data()->base_cache = NULL;
 	free_base_data(c);
 }
 
-- 
2.28.0.297.g1956fa8f8d-goog


^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH 5/7] index-pack: calculate {ref,ofs}_{first,last} early
  2020-08-24 19:16 [PATCH 0/7] Better threaded delta resolution in index-pack (another try) Jonathan Tan
                   ` (4 preceding siblings ...)
  2020-08-24 19:16 ` [PATCH 4/7] index-pack: remove redundant child field Jonathan Tan
@ 2020-08-24 19:16 ` Jonathan Tan
  2020-08-24 19:16 ` [PATCH 6/7] index-pack: make resolve_delta() assume base data Jonathan Tan
                   ` (5 subsequent siblings)
  11 siblings, 0 replies; 31+ messages in thread
From: Jonathan Tan @ 2020-08-24 19:16 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, peff, steadmon

This is refactoring 2 of 2 to simplify struct base_data.

Whenever we make a struct base_data, immediately calculate its delta
children. This eliminates confusion as to when the
{ref,ofs}_{first,last} fields are initialized.

Before this patch, the delta children were calculated at the last
possible moment. This allowed the members of struct base_data to be
populated in any order, superficially useful when we have the object
contents before the struct object_entry. But this makes reasoning about
the state of struct base_data more complicated, hence this patch.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
 builtin/index-pack.c | 123 +++++++++++++++++++++----------------------
 1 file changed, 60 insertions(+), 63 deletions(-)

diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 032716553c..e98b11ab37 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -33,12 +33,15 @@ struct object_stat {
 };
 
 struct base_data {
+	/* Initialized by make_base(). */
 	struct base_data *base;
 	struct object_entry *obj;
-	void *data;
-	unsigned long size;
 	int ref_first, ref_last;
 	int ofs_first, ofs_last;
+
+	/* Not initialized by make_base(). */
+	void *data;
+	unsigned long size;
 };
 
 struct thread_local {
@@ -362,14 +365,6 @@ static void set_thread_data(struct thread_local *data)
 		pthread_setspecific(key, data);
 }
 
-static struct base_data *alloc_base_data(void)
-{
-	struct base_data *base = xcalloc(1, sizeof(struct base_data));
-	base->ref_last = -1;
-	base->ofs_last = -1;
-	return base;
-}
-
 static void free_base_data(struct base_data *c)
 {
 	if (c->data) {
@@ -406,19 +401,6 @@ static void prune_base_data(struct base_data *youngest_child)
 	free(ancestry);
 }
 
-static void link_base_data(struct base_data *base, struct base_data *c)
-{
-	c->base = base;
-	if (c->data)
-		get_thread_data()->base_cache_used += c->size;
-	prune_base_data(c);
-}
-
-static void unlink_base_data(struct base_data *c)
-{
-	free_base_data(c);
-}
-
 static int is_delta_type(enum object_type type)
 {
 	return (type == OBJ_REF_DELTA || type == OBJ_OFS_DELTA);
@@ -929,10 +911,25 @@ static void *get_base_data(struct base_data *c)
 	return c->data;
 }
 
-static void resolve_delta(struct object_entry *delta_obj,
-			  struct base_data *base, struct base_data *result)
+static struct base_data *make_base(struct object_entry *obj,
+				   struct base_data *parent)
 {
-	void *base_data, *delta_data;
+	struct base_data *base = xcalloc(1, sizeof(struct base_data));
+	base->base = parent;
+	base->obj = obj;
+	find_ref_delta_children(&obj->idx.oid,
+				&base->ref_first, &base->ref_last);
+	find_ofs_delta_children(obj->idx.offset,
+				&base->ofs_first, &base->ofs_last);
+	return base;
+}
+
+static struct base_data *resolve_delta(struct object_entry *delta_obj,
+				       struct base_data *base)
+{
+	void *base_data, *delta_data, *result_data;
+	struct base_data *result;
+	unsigned long result_size;
 
 	if (show_stat) {
 		int i = delta_obj - objects;
@@ -946,19 +943,31 @@ static void resolve_delta(struct object_entry *delta_obj,
 	}
 	delta_data = get_data_from_pack(delta_obj);
 	base_data = get_base_data(base);
-	result->obj = delta_obj;
-	result->data = patch_delta(base_data, base->size,
-				   delta_data, delta_obj->size, &result->size);
+	result_data = patch_delta(base_data, base->size,
+				  delta_data, delta_obj->size, &result_size);
 	free(delta_data);
-	if (!result->data)
+	if (!result_data)
 		bad_object(delta_obj->idx.offset, _("failed to apply delta"));
-	hash_object_file(the_hash_algo, result->data, result->size,
+	hash_object_file(the_hash_algo, result_data, result_size,
 			 type_name(delta_obj->real_type), &delta_obj->idx.oid);
-	sha1_object(result->data, NULL, result->size, delta_obj->real_type,
+	sha1_object(result_data, NULL, result_size, delta_obj->real_type,
 		    &delta_obj->idx.oid);
+
+	result = make_base(delta_obj, base);
+	if (result->ref_last == -1 && result->ofs_last == -1) {
+		free(result_data);
+	} else {
+		result->data = result_data;
+		result->size = result_size;
+		get_thread_data()->base_cache_used += result->size;
+		prune_base_data(result);
+	}
+
 	counter_lock();
 	nr_resolved_deltas++;
 	counter_unlock();
+
+	return result;
 }
 
 /*
@@ -984,24 +993,9 @@ static int compare_and_swap_type(signed char *type,
 static struct base_data *find_unresolved_deltas_1(struct base_data *base,
 						  struct base_data *prev_base)
 {
-	if (base->ref_last == -1 && base->ofs_last == -1) {
-		find_ref_delta_children(&base->obj->idx.oid,
-					&base->ref_first, &base->ref_last);
-
-		find_ofs_delta_children(base->obj->idx.offset,
-					&base->ofs_first, &base->ofs_last);
-
-		if (base->ref_last == -1 && base->ofs_last == -1) {
-			free(base->data);
-			return NULL;
-		}
-
-		link_base_data(prev_base, base);
-	}
-
 	if (base->ref_first <= base->ref_last) {
 		struct object_entry *child = objects + ref_deltas[base->ref_first].obj_no;
-		struct base_data *result = alloc_base_data();
+		struct base_data *result;
 
 		if (!compare_and_swap_type(&child->real_type, OBJ_REF_DELTA,
 					   base->obj->real_type))
@@ -1009,7 +1003,7 @@ static struct base_data *find_unresolved_deltas_1(struct base_data *base,
 			    (uintmax_t)child->idx.offset,
 			    oid_to_hex(&base->obj->idx.oid));
 
-		resolve_delta(child, base, result);
+		result = resolve_delta(child, base);
 		if (base->ref_first == base->ref_last && base->ofs_last == -1)
 			free_base_data(base);
 
@@ -1019,11 +1013,11 @@ static struct base_data *find_unresolved_deltas_1(struct base_data *base,
 
 	if (base->ofs_first <= base->ofs_last) {
 		struct object_entry *child = objects + ofs_deltas[base->ofs_first].obj_no;
-		struct base_data *result = alloc_base_data();
+		struct base_data *result;
 
 		assert(child->real_type == OBJ_OFS_DELTA);
 		child->real_type = base->obj->real_type;
-		resolve_delta(child, base, result);
+		result = resolve_delta(child, base);
 		if (base->ofs_first == base->ofs_last)
 			free_base_data(base);
 
@@ -1031,7 +1025,7 @@ static struct base_data *find_unresolved_deltas_1(struct base_data *base,
 		return result;
 	}
 
-	unlink_base_data(base);
+	free_base_data(base);
 	return NULL;
 }
 
@@ -1074,9 +1068,8 @@ static int compare_ref_delta_entry(const void *a, const void *b)
 
 static void resolve_base(struct object_entry *obj)
 {
-	struct base_data *base_obj = alloc_base_data();
-	base_obj->obj = obj;
-	base_obj->data = NULL;
+	struct base_data *base_obj = make_base(obj, NULL);
+
 	find_unresolved_deltas(base_obj);
 }
 
@@ -1369,22 +1362,26 @@ static void fix_unresolved_deltas(struct hashfile *f)
 	for (i = 0; i < nr_ref_deltas; i++) {
 		struct ref_delta_entry *d = sorted_by_pos[i];
 		enum object_type type;
-		struct base_data *base_obj = alloc_base_data();
+		struct base_data *base;
+		void *data;
+		unsigned long size;
+		struct object_entry *obj;
 
 		if (objects[d->obj_no].real_type != OBJ_REF_DELTA)
 			continue;
-		base_obj->data = read_object_file(&d->oid, &type,
-						  &base_obj->size);
-		if (!base_obj->data)
+		data = read_object_file(&d->oid, &type, &size);
+		if (!data)
 			continue;
 
 		if (check_object_signature(the_repository, &d->oid,
-					   base_obj->data, base_obj->size,
+					   data, size,
 					   type_name(type)))
 			die(_("local object %s is corrupt"), oid_to_hex(&d->oid));
-		base_obj->obj = append_obj_to_pack(f, d->oid.hash,
-					base_obj->data, base_obj->size, type);
-		find_unresolved_deltas(base_obj);
+		obj = append_obj_to_pack(f, d->oid.hash, data, size, type);
+		base = make_base(obj, NULL);
+		base->data = data;
+		base->size = size;
+		find_unresolved_deltas(base);
 		display_progress(progress, nr_resolved_deltas);
 	}
 	free(sorted_by_pos);
-- 
2.28.0.297.g1956fa8f8d-goog


^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH 6/7] index-pack: make resolve_delta() assume base data
  2020-08-24 19:16 [PATCH 0/7] Better threaded delta resolution in index-pack (another try) Jonathan Tan
                   ` (5 preceding siblings ...)
  2020-08-24 19:16 ` [PATCH 5/7] index-pack: calculate {ref,ofs}_{first,last} early Jonathan Tan
@ 2020-08-24 19:16 ` Jonathan Tan
  2020-08-24 19:16 ` [PATCH 7/7] index-pack: make quantum of work smaller Jonathan Tan
                   ` (4 subsequent siblings)
  11 siblings, 0 replies; 31+ messages in thread
From: Jonathan Tan @ 2020-08-24 19:16 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, peff, steadmon

A subsequent commit will make the quantum of work smaller, necessitating
more locking. This commit allows resolve_delta() to be called outside
the lock.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
 builtin/index-pack.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index e98b11ab37..c6d2acc13a 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -927,7 +927,7 @@ static struct base_data *make_base(struct object_entry *obj,
 static struct base_data *resolve_delta(struct object_entry *delta_obj,
 				       struct base_data *base)
 {
-	void *base_data, *delta_data, *result_data;
+	void *delta_data, *result_data;
 	struct base_data *result;
 	unsigned long result_size;
 
@@ -942,8 +942,8 @@ static struct base_data *resolve_delta(struct object_entry *delta_obj,
 		obj_stat[i].base_object_no = j;
 	}
 	delta_data = get_data_from_pack(delta_obj);
-	base_data = get_base_data(base);
-	result_data = patch_delta(base_data, base->size,
+	assert(base->data);
+	result_data = patch_delta(base->data, base->size,
 				  delta_data, delta_obj->size, &result_size);
 	free(delta_data);
 	if (!result_data)
@@ -1003,6 +1003,7 @@ static struct base_data *find_unresolved_deltas_1(struct base_data *base,
 			    (uintmax_t)child->idx.offset,
 			    oid_to_hex(&base->obj->idx.oid));
 
+		get_base_data(base);
 		result = resolve_delta(child, base);
 		if (base->ref_first == base->ref_last && base->ofs_last == -1)
 			free_base_data(base);
@@ -1017,6 +1018,7 @@ static struct base_data *find_unresolved_deltas_1(struct base_data *base,
 
 		assert(child->real_type == OBJ_OFS_DELTA);
 		child->real_type = base->obj->real_type;
+		get_base_data(base);
 		result = resolve_delta(child, base);
 		if (base->ofs_first == base->ofs_last)
 			free_base_data(base);
-- 
2.28.0.297.g1956fa8f8d-goog


^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH 7/7] index-pack: make quantum of work smaller
  2020-08-24 19:16 [PATCH 0/7] Better threaded delta resolution in index-pack (another try) Jonathan Tan
                   ` (6 preceding siblings ...)
  2020-08-24 19:16 ` [PATCH 6/7] index-pack: make resolve_delta() assume base data Jonathan Tan
@ 2020-08-24 19:16 ` Jonathan Tan
  2020-08-24 21:19   ` Junio C Hamano
  2020-08-24 20:47 ` [PATCH 0/7] Better threaded delta resolution in index-pack (another try) Junio C Hamano
                   ` (3 subsequent siblings)
  11 siblings, 1 reply; 31+ messages in thread
From: Jonathan Tan @ 2020-08-24 19:16 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, peff, steadmon

Currently, when index-pack resolves deltas, it does not split up delta
trees into threads: each delta base root (an object that is not a
REF_DELTA or OFS_DELTA) can go into its own thread, but all deltas on
that root (direct or indirect) are processed in the same thread.

This is a problem when a repository contains a large text file (thus,
delta-able) that is modified many times - delta resolution time during
fetching is dominated by processing the deltas corresponding to that
text file.

This patch contains a solution to that. When cloning using

  git -c core.deltabasecachelimit=1g clone \
    https://fuchsia.googlesource.com/third_party/vulkan-cts

on my laptop, clone time improved from 3m2s to 2m5s (using 3 threads,
which is the default).

The solution is to have a global work stack. This stack contains delta
bases (objects, whether appearing directly in the packfile or generated
by delta resolution, that themselves have delta children) that need to
be processed; whenever a thread needs work, it peeks at the top of the
stack and processes its next unprocessed child. If a thread finds the
stack empty, it will look for more delta base roots to push on the stack
instead.

The main weakness of having a global work stack is that more time is
spent in the mutex, but profiling has shown that most time is spent in
the resolution of the deltas themselves, so this shouldn't be an issue
in practice. In any case, experimentation (as described in the clone
command above) shows that this patch is a net improvement.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
 builtin/index-pack.c | 341 ++++++++++++++++++++++++-------------------
 1 file changed, 193 insertions(+), 148 deletions(-)

diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index c6d2acc13a..0a5b938e1e 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -38,15 +38,49 @@ struct base_data {
 	struct object_entry *obj;
 	int ref_first, ref_last;
 	int ofs_first, ofs_last;
+	/*
+	 * Threads should increment retain_data if they are about to call
+	 * patch_delta() using this struct's data as a base, and decrement this
+	 * when they are done. While retain_data is nonzero, this struct's data
+	 * will not be freed even if the delta base cache limit is exceeded.
+	 */
+	int retain_data;
+	/*
+	 * The number of direct children that have not been fully processed
+	 * (entered work_head, entered done_head, left done_head). When this
+	 * number reaches zero, this struct base_data can be freed.
+	 */
+	int children_remaining;
 
 	/* Not initialized by make_base(). */
+	struct list_head list;
 	void *data;
 	unsigned long size;
 };
 
+/*
+ * Stack of struct base_data that have unprocessed children.
+ * threaded_second_pass() uses this as a source of work (the other being the
+ * objects array).
+ */
+LIST_HEAD(work_head);
+
+/*
+ * Stack of struct base_data that have children, all of whom have been
+ * processed or are being processed, and at least one child is being processed.
+ * These struct base_data must be kept around until the last child is
+ * processed.
+ */
+LIST_HEAD(done_head);
+
+/*
+ * All threads share one delta base cache.
+ */
+size_t base_cache_used;
+size_t base_cache_limit;
+
 struct thread_local {
 	pthread_t thread;
-	size_t base_cache_used;
 	int pack_fd;
 };
 
@@ -369,36 +403,38 @@ static void free_base_data(struct base_data *c)
 {
 	if (c->data) {
 		FREE_AND_NULL(c->data);
-		get_thread_data()->base_cache_used -= c->size;
+		base_cache_used -= c->size;
 	}
 }
 
-static void prune_base_data(struct base_data *youngest_child)
+static void prune_base_data(struct base_data *retain)
 {
-	struct base_data *b;
-	struct thread_local *data = get_thread_data();
-	struct base_data **ancestry = NULL;
-	size_t nr = 0, alloc = 0;
-	ssize_t i;
+	struct list_head *pos;
 
-	if (data->base_cache_used <= delta_base_cache_limit)
+	if (base_cache_used <= base_cache_limit)
 		return;
 
-	/*
-	 * Free all ancestors of youngest_child until we have enough space,
-	 * starting with the oldest. (We cannot free youngest_child itself.)
-	 */
-	for (b = youngest_child->base; b != NULL; b = b->base) {
-		ALLOC_GROW(ancestry, nr + 1, alloc);
-		ancestry[nr++] = b;
+	list_for_each_prev(pos, &done_head) {
+		struct base_data *b = list_entry(pos, struct base_data, list);
+		if (b->retain_data || b == retain)
+			continue;
+		if (b->data) {
+			free_base_data(b);
+			if (base_cache_used <= base_cache_limit)
+				return;
+		}
 	}
-	for (i = nr - 1;
-	     i >= 0 && data->base_cache_used > delta_base_cache_limit;
-	     i--) {
-		if (ancestry[i]->data)
-			free_base_data(ancestry[i]);
+
+	list_for_each_prev(pos, &work_head) {
+		struct base_data *b = list_entry(pos, struct base_data, list);
+		if (b->retain_data || b == retain)
+			continue;
+		if (b->data) {
+			free_base_data(b);
+			if (base_cache_used <= base_cache_limit)
+				return;
+		}
 	}
-	free(ancestry);
 }
 
 static int is_delta_type(enum object_type type)
@@ -851,15 +887,7 @@ static void sha1_object(const void *data, struct object_entry *obj_entry,
 }
 
 /*
- * This function is part of find_unresolved_deltas(). There are two
- * walkers going in the opposite ways.
- *
- * The first one in find_unresolved_deltas() traverses down from
- * parent node to children, deflating nodes along the way. However,
- * memory for deflated nodes is limited by delta_base_cache_limit, so
- * at some point parent node's deflated content may be freed.
- *
- * The second walker is this function, which goes from current node up
+ * Walk from current node up
  * to top parent if necessary to deflate the node. In normal
  * situation, its parent node would be already deflated, so it just
  * needs to apply delta.
@@ -887,7 +915,7 @@ static void *get_base_data(struct base_data *c)
 		if (!delta_nr) {
 			c->data = get_data_from_pack(obj);
 			c->size = obj->size;
-			get_thread_data()->base_cache_used += c->size;
+			base_cache_used += c->size;
 			prune_base_data(c);
 		}
 		for (; delta_nr > 0; delta_nr--) {
@@ -903,7 +931,7 @@ static void *get_base_data(struct base_data *c)
 			free(raw);
 			if (!c->data)
 				bad_object(obj->idx.offset, _("failed to apply delta"));
-			get_thread_data()->base_cache_used += c->size;
+			base_cache_used += c->size;
 			prune_base_data(c);
 		}
 		free(delta);
@@ -921,6 +949,8 @@ static struct base_data *make_base(struct object_entry *obj,
 				&base->ref_first, &base->ref_last);
 	find_ofs_delta_children(obj->idx.offset,
 				&base->ofs_first, &base->ofs_last);
+	base->children_remaining = base->ref_last - base->ref_first +
+		base->ofs_last - base->ofs_first + 2;
 	return base;
 }
 
@@ -954,14 +984,8 @@ static struct base_data *resolve_delta(struct object_entry *delta_obj,
 		    &delta_obj->idx.oid);
 
 	result = make_base(delta_obj, base);
-	if (result->ref_last == -1 && result->ofs_last == -1) {
-		free(result_data);
-	} else {
-		result->data = result_data;
-		result->size = result_size;
-		get_thread_data()->base_cache_used += result->size;
-		prune_base_data(result);
-	}
+	result->data = result_data;
+	result->size = result_size;
 
 	counter_lock();
 	nr_resolved_deltas++;
@@ -970,86 +994,6 @@ static struct base_data *resolve_delta(struct object_entry *delta_obj,
 	return result;
 }
 
-/*
- * Standard boolean compare-and-swap: atomically check whether "*type" is
- * "want"; if so, swap in "set" and return true. Otherwise, leave it untouched
- * and return false.
- */
-static int compare_and_swap_type(signed char *type,
-				 enum object_type want,
-				 enum object_type set)
-{
-	enum object_type old;
-
-	type_cas_lock();
-	old = *type;
-	if (old == want)
-		*type = set;
-	type_cas_unlock();
-
-	return old == want;
-}
-
-static struct base_data *find_unresolved_deltas_1(struct base_data *base,
-						  struct base_data *prev_base)
-{
-	if (base->ref_first <= base->ref_last) {
-		struct object_entry *child = objects + ref_deltas[base->ref_first].obj_no;
-		struct base_data *result;
-
-		if (!compare_and_swap_type(&child->real_type, OBJ_REF_DELTA,
-					   base->obj->real_type))
-			die("REF_DELTA at offset %"PRIuMAX" already resolved (duplicate base %s?)",
-			    (uintmax_t)child->idx.offset,
-			    oid_to_hex(&base->obj->idx.oid));
-
-		get_base_data(base);
-		result = resolve_delta(child, base);
-		if (base->ref_first == base->ref_last && base->ofs_last == -1)
-			free_base_data(base);
-
-		base->ref_first++;
-		return result;
-	}
-
-	if (base->ofs_first <= base->ofs_last) {
-		struct object_entry *child = objects + ofs_deltas[base->ofs_first].obj_no;
-		struct base_data *result;
-
-		assert(child->real_type == OBJ_OFS_DELTA);
-		child->real_type = base->obj->real_type;
-		get_base_data(base);
-		result = resolve_delta(child, base);
-		if (base->ofs_first == base->ofs_last)
-			free_base_data(base);
-
-		base->ofs_first++;
-		return result;
-	}
-
-	free_base_data(base);
-	return NULL;
-}
-
-static void find_unresolved_deltas(struct base_data *base)
-{
-	struct base_data *new_base, *prev_base = NULL;
-	for (;;) {
-		new_base = find_unresolved_deltas_1(base, prev_base);
-
-		if (new_base) {
-			prev_base = base;
-			base = new_base;
-		} else {
-			free(base);
-			base = prev_base;
-			if (!base)
-				return;
-			prev_base = base->base;
-		}
-	}
-}
-
 static int compare_ofs_delta_entry(const void *a, const void *b)
 {
 	const struct ofs_delta_entry *delta_a = a;
@@ -1068,33 +1012,131 @@ static int compare_ref_delta_entry(const void *a, const void *b)
 	return oidcmp(&delta_a->oid, &delta_b->oid);
 }
 
-static void resolve_base(struct object_entry *obj)
-{
-	struct base_data *base_obj = make_base(obj, NULL);
-
-	find_unresolved_deltas(base_obj);
-}
-
 static void *threaded_second_pass(void *data)
 {
-	set_thread_data(data);
+	if (data)
+		set_thread_data(data);
 	for (;;) {
-		int i;
-		counter_lock();
-		display_progress(progress, nr_resolved_deltas);
-		counter_unlock();
+		struct base_data *parent = NULL;
+		struct object_entry *child_obj;
+		struct base_data *child;
+
 		work_lock();
-		while (nr_dispatched < nr_objects &&
-		       is_delta_type(objects[nr_dispatched].type))
-			nr_dispatched++;
-		if (nr_dispatched >= nr_objects) {
-			work_unlock();
-			break;
+		if (list_empty(&work_head)) {
+			/*
+			 * Take an object from the object array.
+			 */
+			while (nr_dispatched < nr_objects &&
+			       is_delta_type(objects[nr_dispatched].type))
+				nr_dispatched++;
+			if (nr_dispatched >= nr_objects) {
+				work_unlock();
+				break;
+			}
+			child_obj = &objects[nr_dispatched++];
+		} else {
+			/*
+			 * Peek at the top of the stack, and take a child from
+			 * it.
+			 */
+			parent = list_first_entry(&work_head, struct base_data,
+						  list);
+
+			if (parent->ref_first <= parent->ref_last) {
+				int offset = ref_deltas[parent->ref_first++].obj_no;
+				child_obj = objects + offset;
+				if (child_obj->real_type != OBJ_REF_DELTA)
+					die("REF_DELTA at offset %"PRIuMAX" already resolved (duplicate base %s?)",
+					    (uintmax_t) child_obj->idx.offset,
+					    oid_to_hex(&parent->obj->idx.oid));
+				child_obj->real_type = parent->obj->real_type;
+			} else {
+				child_obj = objects +
+					ofs_deltas[parent->ofs_first++].obj_no;
+				assert(child_obj->real_type == OBJ_OFS_DELTA);
+				child_obj->real_type = parent->obj->real_type;
+			}
+
+			if (parent->ref_first > parent->ref_last &&
+			    parent->ofs_first > parent->ofs_last) {
+				/*
+				 * This parent has run out of children, so move
+				 * it to done_head.
+				 */
+				list_del(&parent->list);
+				list_add(&parent->list, &done_head);
+			}
+
+			/*
+			 * Ensure that the parent has data, since we will need
+			 * it later.
+			 *
+			 * NEEDSWORK: If parent data needs to be reloaded, this
+			 * prolongs the time that the current thread spends in
+			 * the mutex. A mitigating factor is that parent data
+			 * needs to be reloaded only if the delta base cache
+			 * limit is exceeded, so in the typical case, this does
+			 * not happen.
+			 */
+			get_base_data(parent);
+			parent->retain_data++;
 		}
-		i = nr_dispatched++;
 		work_unlock();
 
-		resolve_base(&objects[i]);
+		if (parent) {
+			child = resolve_delta(child_obj, parent);
+			if (!child->children_remaining)
+				FREE_AND_NULL(child->data);
+		} else {
+			child = make_base(child_obj, NULL);
+			if (child->children_remaining) {
+				/*
+				 * Since this child has its own delta children,
+				 * we will need this data in the future.
+				 * Inflate now so that future iterations will
+				 * have access to this object's data while
+				 * outside the work mutex.
+				 */
+				child->data = get_data_from_pack(child_obj);
+				child->size = child_obj->size;
+			}
+		}
+
+		work_lock();
+		if (parent)
+			parent->retain_data--;
+		if (child->data) {
+			/*
+			 * This child has its own children, so add it to
+			 * work_head.
+			 */
+			list_add(&child->list, &work_head);
+			base_cache_used += child->size;
+			prune_base_data(NULL);
+		} else {
+			/*
+			 * This child does not have its own children. It may be
+			 * the last descendant of its ancestors; free those
+			 * that we can.
+			 */
+			struct base_data *p = parent;
+
+			while (p) {
+				struct base_data *next_p;
+
+				p->children_remaining--;
+				if (p->children_remaining)
+					break;
+
+				next_p = p->base;
+				free_base_data(p);
+				list_del(&p->list);
+				free(p);
+
+				p = next_p;
+			}
+		}
+		work_unlock();
 	}
 	return NULL;
 }
@@ -1195,6 +1237,7 @@ static void resolve_deltas(void)
 					  nr_ref_deltas + nr_ofs_deltas);
 
 	nr_dispatched = 0;
+	base_cache_limit = delta_base_cache_limit * nr_threads;
 	if (nr_threads > 1 || getenv("GIT_FORCE_THREADS")) {
 		init_thread();
 		for (i = 0; i < nr_threads; i++) {
@@ -1364,10 +1407,8 @@ static void fix_unresolved_deltas(struct hashfile *f)
 	for (i = 0; i < nr_ref_deltas; i++) {
 		struct ref_delta_entry *d = sorted_by_pos[i];
 		enum object_type type;
-		struct base_data *base;
 		void *data;
 		unsigned long size;
-		struct object_entry *obj;
 
 		if (objects[d->obj_no].real_type != OBJ_REF_DELTA)
 			continue;
@@ -1379,11 +1420,15 @@ static void fix_unresolved_deltas(struct hashfile *f)
 					   data, size,
 					   type_name(type)))
 			die(_("local object %s is corrupt"), oid_to_hex(&d->oid));
-		obj = append_obj_to_pack(f, d->oid.hash, data, size, type);
-		base = make_base(obj, NULL);
-		base->data = data;
-		base->size = size;
-		find_unresolved_deltas(base);
+
+		/*
+		 * Add this as an object to the objects array and call
+		 * threaded_second_pass() (which will pick up the added
+		 * object).
+		 */
+		append_obj_to_pack(f, d->oid.hash, data, size, type);
+		threaded_second_pass(NULL);
+
 		display_progress(progress, nr_resolved_deltas);
 	}
 	free(sorted_by_pos);
-- 
2.28.0.297.g1956fa8f8d-goog


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] fetch-pack: in partial clone, pass --promisor
  2020-08-24 19:16 ` [PATCH] fetch-pack: in partial clone, pass --promisor Jonathan Tan
@ 2020-08-24 19:36   ` Jonathan Tan
  0 siblings, 0 replies; 31+ messages in thread
From: Jonathan Tan @ 2020-08-24 19:36 UTC (permalink / raw)
  To: jonathantanmy; +Cc: git

> When fetching a pack from a promisor remote, the corresponding .promisor
> file needs to be created. "fetch-pack" originally did this by passing
> "--promisor" to "index-pack", but in 5374a290aa ("fetch-pack: write
> fetched refs to .promisor", 2019-10-16), "fetch-pack" was taught to do
> this itself instead, because it needed to store ref information in the
> .promisor file.

Oops...please ignore this. This has already been sent out as
https://lore.kernel.org/git/20200820175116.3889786-1-jonathantanmy@google.com/

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 0/7] Better threaded delta resolution in index-pack (another try)
  2020-08-24 19:16 [PATCH 0/7] Better threaded delta resolution in index-pack (another try) Jonathan Tan
                   ` (7 preceding siblings ...)
  2020-08-24 19:16 ` [PATCH 7/7] index-pack: make quantum of work smaller Jonathan Tan
@ 2020-08-24 20:47 ` Junio C Hamano
  2020-08-24 21:27 ` [PATCH] fixup! index-pack: make quantum of work smaller Jonathan Tan
                   ` (2 subsequent siblings)
  11 siblings, 0 replies; 31+ messages in thread
From: Junio C Hamano @ 2020-08-24 20:47 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git, peff, steadmon

Jonathan Tan <jonathantanmy@google.com> writes:

> I'm trying to resurrect [1] and have rebased it to latest master
> (675a4aaf3b ("Ninth batch", 2020-08-19)).

Yay.

Thanks.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 2/7] index-pack: remove redundant parameter
  2020-08-24 19:16 ` [PATCH 2/7] index-pack: remove redundant parameter Jonathan Tan
@ 2020-08-24 21:01   ` Junio C Hamano
  0 siblings, 0 replies; 31+ messages in thread
From: Junio C Hamano @ 2020-08-24 21:01 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git, peff, steadmon

Jonathan Tan <jonathantanmy@google.com> writes:

> find_{ref,ofs}_delta_{,children} take an enum object_type parameter, but
> the object type is already present in the name of the function. Remove
> that parameter from these functions.

Interesting.  

These "find children" are the only caller for each delta type and
they always pass OFS/REF constant that corresponds to them, so
find_*_delta() can lose the type.  But compare_*_delta_bases()
cannot, as the other one in the objects[] array may be of different
type.

Makes sense.  Thanks.


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 3/7] index-pack: unify threaded and unthreaded code
  2020-08-24 19:16 ` [PATCH 3/7] index-pack: unify threaded and unthreaded code Jonathan Tan
@ 2020-08-24 21:11   ` Junio C Hamano
  0 siblings, 0 replies; 31+ messages in thread
From: Junio C Hamano @ 2020-08-24 21:11 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git, peff, steadmon

Jonathan Tan <jonathantanmy@google.com> writes:

> Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
> ---
>  builtin/index-pack.c | 10 +---------
>  1 file changed, 1 insertion(+), 9 deletions(-)
>
> diff --git a/builtin/index-pack.c b/builtin/index-pack.c
> index a8444d550b..357e03b5aa 100644
> --- a/builtin/index-pack.c
> +++ b/builtin/index-pack.c
> @@ -1211,15 +1211,7 @@ static void resolve_deltas(void)
>  		cleanup_thread();
>  		return;
>  	}
> -
> -	for (i = 0; i < nr_objects; i++) {
> -		struct object_entry *obj = &objects[i];
> -
> -		if (is_delta_type(obj->type))
> -			continue;
> -		resolve_base(obj);
> -		display_progress(progress, nr_resolved_deltas);
> -	}
> +	threaded_second_pass(&nothread_data);
>  }

The threaded_second_pass() function has a lot more noise than the
above simple and easy-to-grok loop, managing the locks, but those
all turn into no-op unless init_thread() is called, which happens
only when this code is not reached.  So OK, this looks essentially
a no-op clean-up.

Thanks.


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 7/7] index-pack: make quantum of work smaller
  2020-08-24 19:16 ` [PATCH 7/7] index-pack: make quantum of work smaller Jonathan Tan
@ 2020-08-24 21:19   ` Junio C Hamano
  0 siblings, 0 replies; 31+ messages in thread
From: Junio C Hamano @ 2020-08-24 21:19 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git, peff, steadmon

Jonathan Tan <jonathantanmy@google.com> writes:

> Currently, when index-pack resolves deltas, it does not split up delta
> trees into threads: each delta base root (an object that is not a
> REF_DELTA or OFS_DELTA) can go into its own thread, but all deltas on
> that root (direct or indirect) are processed in the same thread.
>
> This is a problem when a repository contains a large text file (thus,
> delta-able) that is modified many times - delta resolution time during
> fetching is dominated by processing the deltas corresponding to that
> text file.

We favor wide and shallow delta trees for both reduced storage
footprint and lower runtime overhead, so optimizing the index-pack
for that use case makes a lot of sense.


^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH] fixup! index-pack: make quantum of work smaller
  2020-08-24 19:16 [PATCH 0/7] Better threaded delta resolution in index-pack (another try) Jonathan Tan
                   ` (8 preceding siblings ...)
  2020-08-24 20:47 ` [PATCH 0/7] Better threaded delta resolution in index-pack (another try) Junio C Hamano
@ 2020-08-24 21:27 ` Jonathan Tan
  2020-08-24 22:08 ` [PATCH 0/7] Better threaded delta resolution in index-pack (another try) Jeff King
  2020-09-08 19:48 ` [PATCH v2 " Jonathan Tan
  11 siblings, 0 replies; 31+ messages in thread
From: Jonathan Tan @ 2020-08-24 21:27 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan

---
> Josh did not list
> out specific improvements to commit messages but I have taken his
> suggestions for comments.

...and somehow I forgot to commit them before sending out this patch
set, so here they are.

 builtin/index-pack.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 0a5b938e1e..4fa9127c35 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -62,6 +62,8 @@ struct base_data {
  * Stack of struct base_data that have unprocessed children.
  * threaded_second_pass() uses this as a source of work (the other being the
  * objects array).
+ *
+ * Guarded by work_mutex.
  */
 LIST_HEAD(work_head);
 
@@ -70,11 +72,16 @@ LIST_HEAD(work_head);
  * processed or are being processed, and at least one child is being processed.
  * These struct base_data must be kept around until the last child is
  * processed.
+ *
+ * Guarded by work_mutex.
  */
 LIST_HEAD(done_head);
 
 /*
  * All threads share one delta base cache.
+ *
+ * base_cache_used is guarded by work_mutex, and base_cache_limit is read-only
+ * in a thread.
  */
 size_t base_cache_used;
 size_t base_cache_limit;
-- 
2.28.0.297.g1956fa8f8d-goog


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 0/7] Better threaded delta resolution in index-pack (another try)
  2020-08-24 19:16 [PATCH 0/7] Better threaded delta resolution in index-pack (another try) Jonathan Tan
                   ` (9 preceding siblings ...)
  2020-08-24 21:27 ` [PATCH] fixup! index-pack: make quantum of work smaller Jonathan Tan
@ 2020-08-24 22:08 ` Jeff King
  2020-08-25 18:11   ` Jonathan Tan
  2020-09-08 19:48 ` [PATCH v2 " Jonathan Tan
  11 siblings, 1 reply; 31+ messages in thread
From: Jeff King @ 2020-08-24 22:08 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git, steadmon

On Mon, Aug 24, 2020 at 12:16:30PM -0700, Jonathan Tan wrote:

> I'm trying to resurrect [1] and have rebased it to latest master
> (675a4aaf3b ("Ninth batch", 2020-08-19)).
> 
> Peff said [2] (of v1) that the overall direction seems reasonable and
> Josh Steadmon said [3] (of v2) that it looks mostly good except for
> possible improvements to commit messages and comments. Josh did not list
> out specific improvements to commit messages but I have taken his
> suggestions for comments.

I haven't looked closely yet, but since I was doing timings of
index-pack recently[1], I wondered if this might change anything
(spoiler: it doesn't really seem to).

Here's the result of p5302 with my extra tests on my 8-core (16 with
hyperthreading) laptop against linux.git:

  5302.3: index-pack 0 threads                   266.66(263.85+2.71)
  5302.4: index-pack 1 threads                   275.06(272.11+2.85)
  5302.5: index-pack 2 threads                   159.49(285.44+3.51)
  5302.6: index-pack 4 threads                   102.54(318.86+4.30)
  5302.7: index-pack 8 threads                   75.60(391.39+6.56) 
  5302.8: index-pack 16 threads                  75.56(748.45+13.37)
  5302.9: index-pack default number of threads   75.01(389.33+6.59) 

So the conclusions from that other series remain pretty similar: nothing
gets faster as we move past the number of actual cores. The penalty for
doing so seems less than what I got before, though (though it might just
be a fluke; it was something like 2s worse before your patches, and
there's a bit of noise; the increased CPU time can be disregarded as the
processors are throttled down when more are running).

The overall time seems to get slightly worse, though (HEAD~7 is before
your patch, HEAD is with it):

  Test                                           HEAD~7               HEAD                    
  --------------------------------------------------------------------------------------------
  5302.9: index-pack default number of threads   71.96(376.11+3.66)   74.18(390.62+6.08) +3.1%

There may be other cases that get better, though. A 3% increase here is
probably OK if we get something for it. But if our primary goal here is
increasing multithread efficiency, then we should be able to show some
benchmark that improves. :)

-Peff

[1] https://lore.kernel.org/git/20200821175153.GA3263018@coredump.intra.peff.net/

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 0/7] Better threaded delta resolution in index-pack (another try)
  2020-08-24 22:08 ` [PATCH 0/7] Better threaded delta resolution in index-pack (another try) Jeff King
@ 2020-08-25 18:11   ` Jonathan Tan
  2020-08-25 21:18     ` Jeff King
  0 siblings, 1 reply; 31+ messages in thread
From: Jonathan Tan @ 2020-08-25 18:11 UTC (permalink / raw)
  To: peff; +Cc: jonathantanmy, git, steadmon

> The overall time seems to get slightly worse, though (HEAD~7 is before
> your patch, HEAD is with it):
> 
>   Test                                           HEAD~7               HEAD                    
>   --------------------------------------------------------------------------------------------
>   5302.9: index-pack default number of threads   71.96(376.11+3.66)   74.18(390.62+6.08) +3.1%
> 
> There may be other cases that get better, though. A 3% increase here is
> probably OK if we get something for it. But if our primary goal here is
> increasing multithread efficiency, then we should be able to show some
> benchmark that improves. :)

Ah...good question. Cloning from
https://fuchsia.googlesource.com/third_party/vulkan-cts (mentioned in
patch 7), cd-ing to the pack dir, and running:

  git index-pack --stdin -o foo <*.pack

I got 8m2.878s with my patches and 12m6.365s without. But I ran this on
a cloud virtual machine (what I have access to right now) so the numbers
might look different on a dedicated machine.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 0/7] Better threaded delta resolution in index-pack (another try)
  2020-08-25 18:11   ` Jonathan Tan
@ 2020-08-25 21:18     ` Jeff King
  2020-08-25 21:46       ` Jeff King
  0 siblings, 1 reply; 31+ messages in thread
From: Jeff King @ 2020-08-25 21:18 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git, steadmon

On Tue, Aug 25, 2020 at 11:11:45AM -0700, Jonathan Tan wrote:

> > There may be other cases that get better, though. A 3% increase here is
> > probably OK if we get something for it. But if our primary goal here is
> > increasing multithread efficiency, then we should be able to show some
> > benchmark that improves. :)
> 
> Ah...good question. Cloning from
> https://fuchsia.googlesource.com/third_party/vulkan-cts (mentioned in
> patch 7), cd-ing to the pack dir, and running:
> 
>   git index-pack --stdin -o foo <*.pack
> 
> I got 8m2.878s with my patches and 12m6.365s without. But I ran this on
> a cloud virtual machine (what I have access to right now) so the numbers
> might look different on a dedicated machine.

Thanks, that's a much more interesting example. Here's what I get on my
8-core machine:

  5302.9: index-pack default number of threads   167.70(546.19+12.00)   83.69(585.61+6.95) -50.1%

So that's a considerable improvement. And hardly surprising given the
repository structure. I used the script below to show the size of the
delta families, and the vk-master ones really dominate in size and
object number (the biggest is 50GB in one delta family).

I also ran my PERF_EXTRA tests on them to see if it behaved differently
as the threads increased. Nope:

  5302.3: index-pack 0 threads                   434.13(425.90+8.16)
  5302.4: index-pack 1 threads                   428.65(421.82+6.77)
  5302.5: index-pack 2 threads                   224.05(424.13+6.21)
  5302.6: index-pack 4 threads                   125.43(457.68+5.77)
  5302.7: index-pack 8 threads                   82.60(579.10+7.78) 
  5302.8: index-pack 16 threads                  82.89(1147.82+9.66)
  5302.9: index-pack default number of threads   83.91(576.92+8.52) 

Still maxes out at the number of physical cores (not unexpected, but
that was the thing I was most curious about ;) ). I may run it on the
40-core machine, too. It's possible that with the new threading we're
able to do better going past 20-threads. I doubt it, because I think
it's mostly a function of Git's locking granularity, but worth checking.

-Peff

-- >8 --
#!/bin/sh
# script to output size, count, and filenames for each delta family

git rev-list --objects --all |
git cat-file --buffer \
  --batch-check='%(objectname) %(deltabase) %(objectsize) %(rest)' |
perl -alne '
  if ($F[1] =~ /[^0]/) {
    push @{$children{$F[1]}}, $F[0];
  } else {
    push @bases, $F[0];
  }
  $size{$F[0]} = $F[2];
  $name{$F[0]} = $F[3];
  END {
    sub add_to_component {
      my ($oid, $data) = @_;
      $data->{names}->{$name{$oid}}++;
      $data->{size} += $size{$oid};
      $data->{nr}++;
      add_to_component($_, $data) for @{$children{$oid}};
    }
    for my $b (@bases) {
      my $data = { size => 0, nr => 0, names => {} };
      add_to_component($b, $data);
      print join(" ",
                 $data->{size}, $data->{nr},
		 sort keys(%{$data->{names}})
            ), "\n";
    }
  }
' |
sort -rn

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 0/7] Better threaded delta resolution in index-pack (another try)
  2020-08-25 21:18     ` Jeff King
@ 2020-08-25 21:46       ` Jeff King
  0 siblings, 0 replies; 31+ messages in thread
From: Jeff King @ 2020-08-25 21:46 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git, steadmon

On Tue, Aug 25, 2020 at 05:18:36PM -0400, Jeff King wrote:

> Still maxes out at the number of physical cores (not unexpected, but
> that was the thing I was most curious about ;) ). I may run it on the
> 40-core machine, too. It's possible that with the new threading we're
> able to do better going past 20-threads. I doubt it, because I think
> it's mostly a function of Git's locking granularity, but worth checking.

For the curious (I know you were all on the edge of your seat):

  5302.3: index-pack 0 threads                    519.23(479.10+40.10) 
  5302.4: index-pack 1 threads                    525.94(476.27+49.64) 
  5302.5: index-pack 2 threads                    271.82(458.93+55.52) 
  5302.6: index-pack 5 threads                    115.88(463.84+50.69) 
  5302.7: index-pack 10 threads                   67.26(478.37+57.38)  
  5302.8: index-pack 20 threads                   43.02(524.01+77.33)  
  5302.9: index-pack 40 threads                   33.42(709.86+100.24) 
  5302.10: index-pack 80 threads                  32.02(1030.75+228.28)
  5302.11: index-pack default number of threads   43.58(551.13+68.92)  

So it actually does do a slight improvement to go from 20 to 40 threads
on this repository/machine combo. Not enough to make me revise the code
I sent the other day, though. And we still get nothing from going past
the number of physical cores.

-Peff

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH v2 0/7] Better threaded delta resolution in index-pack (another try)
  2020-08-24 19:16 [PATCH 0/7] Better threaded delta resolution in index-pack (another try) Jonathan Tan
                   ` (10 preceding siblings ...)
  2020-08-24 22:08 ` [PATCH 0/7] Better threaded delta resolution in index-pack (another try) Jeff King
@ 2020-09-08 19:48 ` Jonathan Tan
  2020-09-08 19:48   ` [PATCH v2 1/7] Documentation: deltaBaseCacheLimit is per-thread Jonathan Tan
                     ` (7 more replies)
  11 siblings, 8 replies; 31+ messages in thread
From: Jonathan Tan @ 2020-09-08 19:48 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, gitster

From What's Cooking [1]:

> * jt/threaded-index-pack (2020-08-27) 9 commits
>  - builtin/index-pack.c: fix some sparse warnings
>  - fixup! index-pack: make quantum of work smaller
>  - index-pack: make quantum of work smaller
>  - index-pack: make resolve_delta() assume base data
>  - index-pack: calculate {ref,ofs}_{first,last} early
>  - index-pack: remove redundant child field
>  - index-pack: unify threaded and unthreaded code
>  - index-pack: remove redundant parameter
>  - Documentation: deltaBaseCacheLimit is per-thread
> 
>  "git index-pack" learned to resolve deltified objects with greater
>  parallelism.
> 
>  Expecting the final reroll.
>  cf. https://colabti.org/irclogger/irclogger_log/git-devel?date=2020-08-31#l82

Here's the reroll.

[1] https://lore.kernel.org/git/xmqqk0xa8rvn.fsf@gitster.c.googlers.com/

Jonathan Tan (7):
  Documentation: deltaBaseCacheLimit is per-thread
  index-pack: remove redundant parameter
  index-pack: unify threaded and unthreaded code
  index-pack: remove redundant child field
  index-pack: calculate {ref,ofs}_{first,last} early
  index-pack: make resolve_delta() assume base data
  index-pack: make quantum of work smaller

 Documentation/config/core.txt |   2 +-
 builtin/index-pack.c          | 456 +++++++++++++++++++---------------
 2 files changed, 251 insertions(+), 207 deletions(-)

-- 
2.28.0.526.ge36021eeef-goog


^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH v2 1/7] Documentation: deltaBaseCacheLimit is per-thread
  2020-09-08 19:48 ` [PATCH v2 " Jonathan Tan
@ 2020-09-08 19:48   ` Jonathan Tan
  2020-09-08 19:48   ` [PATCH v2 2/7] index-pack: remove redundant parameter Jonathan Tan
                     ` (6 subsequent siblings)
  7 siblings, 0 replies; 31+ messages in thread
From: Jonathan Tan @ 2020-09-08 19:48 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, gitster

Clarify that core.deltaBaseCacheLimit is per-thread, as can be seen from
the fact that cache usage (base_cache_used in struct thread_local in
builtin/index-pack.c) is tracked individually for each thread and
compared against delta_base_cache_limit.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 Documentation/config/core.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/config/core.txt b/Documentation/config/core.txt
index 74619a9c03..02002cf109 100644
--- a/Documentation/config/core.txt
+++ b/Documentation/config/core.txt
@@ -399,7 +399,7 @@ the largest projects.  You probably do not need to adjust this value.
 Common unit suffixes of 'k', 'm', or 'g' are supported.
 
 core.deltaBaseCacheLimit::
-	Maximum number of bytes to reserve for caching base objects
+	Maximum number of bytes per thread to reserve for caching base objects
 	that may be referenced by multiple deltified objects.  By storing the
 	entire decompressed base objects in a cache Git is able
 	to avoid unpacking and decompressing frequently used base
-- 
2.28.0.526.ge36021eeef-goog


^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH v2 2/7] index-pack: remove redundant parameter
  2020-09-08 19:48 ` [PATCH v2 " Jonathan Tan
  2020-09-08 19:48   ` [PATCH v2 1/7] Documentation: deltaBaseCacheLimit is per-thread Jonathan Tan
@ 2020-09-08 19:48   ` Jonathan Tan
  2020-09-08 19:48   ` [PATCH v2 3/7] index-pack: unify threaded and unthreaded code Jonathan Tan
                     ` (5 subsequent siblings)
  7 siblings, 0 replies; 31+ messages in thread
From: Jonathan Tan @ 2020-09-08 19:48 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, gitster

find_{ref,ofs}_delta_{,children} take an enum object_type parameter, but
the object type is already present in the name of the function. Remove
that parameter from these functions.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 builtin/index-pack.c | 26 ++++++++++++--------------
 1 file changed, 12 insertions(+), 14 deletions(-)

diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 9721bf1ffe..0e0889afc4 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -614,7 +614,7 @@ static int compare_ofs_delta_bases(off_t offset1, off_t offset2,
 	       0;
 }
 
-static int find_ofs_delta(const off_t offset, enum object_type type)
+static int find_ofs_delta(const off_t offset)
 {
 	int first = 0, last = nr_ofs_deltas;
 
@@ -624,7 +624,8 @@ static int find_ofs_delta(const off_t offset, enum object_type type)
 		int cmp;
 
 		cmp = compare_ofs_delta_bases(offset, delta->offset,
-					      type, objects[delta->obj_no].type);
+					      OBJ_OFS_DELTA,
+					      objects[delta->obj_no].type);
 		if (!cmp)
 			return next;
 		if (cmp < 0) {
@@ -637,10 +638,9 @@ static int find_ofs_delta(const off_t offset, enum object_type type)
 }
 
 static void find_ofs_delta_children(off_t offset,
-				    int *first_index, int *last_index,
-				    enum object_type type)
+				    int *first_index, int *last_index)
 {
-	int first = find_ofs_delta(offset, type);
+	int first = find_ofs_delta(offset);
 	int last = first;
 	int end = nr_ofs_deltas - 1;
 
@@ -668,7 +668,7 @@ static int compare_ref_delta_bases(const struct object_id *oid1,
 	return oidcmp(oid1, oid2);
 }
 
-static int find_ref_delta(const struct object_id *oid, enum object_type type)
+static int find_ref_delta(const struct object_id *oid)
 {
 	int first = 0, last = nr_ref_deltas;
 
@@ -678,7 +678,8 @@ static int find_ref_delta(const struct object_id *oid, enum object_type type)
 		int cmp;
 
 		cmp = compare_ref_delta_bases(oid, &delta->oid,
-					      type, objects[delta->obj_no].type);
+					      OBJ_REF_DELTA,
+					      objects[delta->obj_no].type);
 		if (!cmp)
 			return next;
 		if (cmp < 0) {
@@ -691,10 +692,9 @@ static int find_ref_delta(const struct object_id *oid, enum object_type type)
 }
 
 static void find_ref_delta_children(const struct object_id *oid,
-				    int *first_index, int *last_index,
-				    enum object_type type)
+				    int *first_index, int *last_index)
 {
-	int first = find_ref_delta(oid, type);
+	int first = find_ref_delta(oid);
 	int last = first;
 	int end = nr_ref_deltas - 1;
 
@@ -983,12 +983,10 @@ static struct base_data *find_unresolved_deltas_1(struct base_data *base,
 {
 	if (base->ref_last == -1 && base->ofs_last == -1) {
 		find_ref_delta_children(&base->obj->idx.oid,
-					&base->ref_first, &base->ref_last,
-					OBJ_REF_DELTA);
+					&base->ref_first, &base->ref_last);
 
 		find_ofs_delta_children(base->obj->idx.offset,
-					&base->ofs_first, &base->ofs_last,
-					OBJ_OFS_DELTA);
+					&base->ofs_first, &base->ofs_last);
 
 		if (base->ref_last == -1 && base->ofs_last == -1) {
 			free(base->data);
-- 
2.28.0.526.ge36021eeef-goog


^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH v2 3/7] index-pack: unify threaded and unthreaded code
  2020-09-08 19:48 ` [PATCH v2 " Jonathan Tan
  2020-09-08 19:48   ` [PATCH v2 1/7] Documentation: deltaBaseCacheLimit is per-thread Jonathan Tan
  2020-09-08 19:48   ` [PATCH v2 2/7] index-pack: remove redundant parameter Jonathan Tan
@ 2020-09-08 19:48   ` Jonathan Tan
  2020-09-08 19:48   ` [PATCH v2 4/7] index-pack: remove redundant child field Jonathan Tan
                     ` (4 subsequent siblings)
  7 siblings, 0 replies; 31+ messages in thread
From: Jonathan Tan @ 2020-09-08 19:48 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, gitster

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 builtin/index-pack.c | 10 +---------
 1 file changed, 1 insertion(+), 9 deletions(-)

diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 0e0889afc4..c7b4aef4e4 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -1211,15 +1211,7 @@ static void resolve_deltas(void)
 		cleanup_thread();
 		return;
 	}
-
-	for (i = 0; i < nr_objects; i++) {
-		struct object_entry *obj = &objects[i];
-
-		if (is_delta_type(obj->type))
-			continue;
-		resolve_base(obj);
-		display_progress(progress, nr_resolved_deltas);
-	}
+	threaded_second_pass(&nothread_data);
 }
 
 /*
-- 
2.28.0.526.ge36021eeef-goog


^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH v2 4/7] index-pack: remove redundant child field
  2020-09-08 19:48 ` [PATCH v2 " Jonathan Tan
                     ` (2 preceding siblings ...)
  2020-09-08 19:48   ` [PATCH v2 3/7] index-pack: unify threaded and unthreaded code Jonathan Tan
@ 2020-09-08 19:48   ` Jonathan Tan
  2020-09-08 19:48   ` [PATCH v2 5/7] index-pack: calculate {ref,ofs}_{first,last} early Jonathan Tan
                     ` (3 subsequent siblings)
  7 siblings, 0 replies; 31+ messages in thread
From: Jonathan Tan @ 2020-09-08 19:48 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, gitster

This is refactoring 1 of 2 to simplify struct base_data.

In index-pack, each thread maintains a doubly-linked list of the delta
chain that it is currently processing (the "base" and "child" pointers
in struct base_data). When a thread exceeds the delta base cache limit
and needs to reclaim memory, it uses the "child" pointers to traverse
the lineage, reclaiming the memory of the eldest delta bases first.

A subsequent patch will perform memory reclaiming in a different way and
will thus no longer need the "child" pointer. Because the "child"
pointer is redundant even now, remove it so that the aforementioned
subsequent patch will be clearer. In the meantime, reclaim memory in the
reverse order of the "base" pointers.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 builtin/index-pack.c | 41 ++++++++++++++++++++++-------------------
 1 file changed, 22 insertions(+), 19 deletions(-)

diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index c7b4aef4e4..c8db464557 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -34,7 +34,6 @@ struct object_stat {
 
 struct base_data {
 	struct base_data *base;
-	struct base_data *child;
 	struct object_entry *obj;
 	void *data;
 	unsigned long size;
@@ -44,7 +43,6 @@ struct base_data {
 
 struct thread_local {
 	pthread_t thread;
-	struct base_data *base_cache;
 	size_t base_cache_used;
 	int pack_fd;
 };
@@ -380,27 +378,37 @@ static void free_base_data(struct base_data *c)
 	}
 }
 
-static void prune_base_data(struct base_data *retain)
+static void prune_base_data(struct base_data *youngest_child)
 {
 	struct base_data *b;
 	struct thread_local *data = get_thread_data();
-	for (b = data->base_cache;
-	     data->base_cache_used > delta_base_cache_limit && b;
-	     b = b->child) {
-		if (b->data && b != retain)
-			free_base_data(b);
+	struct base_data **ancestry = NULL;
+	size_t nr = 0, alloc = 0;
+	ssize_t i;
+
+	if (data->base_cache_used <= delta_base_cache_limit)
+		return;
+
+	/*
+	 * Free all ancestors of youngest_child until we have enough space,
+	 * starting with the oldest. (We cannot free youngest_child itself.)
+	 */
+	for (b = youngest_child->base; b != NULL; b = b->base) {
+		ALLOC_GROW(ancestry, nr + 1, alloc);
+		ancestry[nr++] = b;
 	}
+	for (i = nr - 1;
+	     i >= 0 && data->base_cache_used > delta_base_cache_limit;
+	     i--) {
+		if (ancestry[i]->data)
+			free_base_data(ancestry[i]);
+	}
+	free(ancestry);
 }
 
 static void link_base_data(struct base_data *base, struct base_data *c)
 {
-	if (base)
-		base->child = c;
-	else
-		get_thread_data()->base_cache = c;
-
 	c->base = base;
-	c->child = NULL;
 	if (c->data)
 		get_thread_data()->base_cache_used += c->size;
 	prune_base_data(c);
@@ -408,11 +416,6 @@ static void link_base_data(struct base_data *base, struct base_data *c)
 
 static void unlink_base_data(struct base_data *c)
 {
-	struct base_data *base = c->base;
-	if (base)
-		base->child = NULL;
-	else
-		get_thread_data()->base_cache = NULL;
 	free_base_data(c);
 }
 
-- 
2.28.0.526.ge36021eeef-goog


^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH v2 5/7] index-pack: calculate {ref,ofs}_{first,last} early
  2020-09-08 19:48 ` [PATCH v2 " Jonathan Tan
                     ` (3 preceding siblings ...)
  2020-09-08 19:48   ` [PATCH v2 4/7] index-pack: remove redundant child field Jonathan Tan
@ 2020-09-08 19:48   ` Jonathan Tan
  2020-09-08 19:48   ` [PATCH v2 6/7] index-pack: make resolve_delta() assume base data Jonathan Tan
                     ` (2 subsequent siblings)
  7 siblings, 0 replies; 31+ messages in thread
From: Jonathan Tan @ 2020-09-08 19:48 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, gitster

This is refactoring 2 of 2 to simplify struct base_data.

Whenever we make a struct base_data, immediately calculate its delta
children. This eliminates confusion as to when the
{ref,ofs}_{first,last} fields are initialized.

Before this patch, the delta children were calculated at the last
possible moment. This allowed the members of struct base_data to be
populated in any order, superficially useful when we have the object
contents before the struct object_entry. But this makes reasoning about
the state of struct base_data more complicated, hence this patch.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 builtin/index-pack.c | 123 +++++++++++++++++++++----------------------
 1 file changed, 60 insertions(+), 63 deletions(-)

diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index c8db464557..94d0f53b03 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -33,12 +33,15 @@ struct object_stat {
 };
 
 struct base_data {
+	/* Initialized by make_base(). */
 	struct base_data *base;
 	struct object_entry *obj;
-	void *data;
-	unsigned long size;
 	int ref_first, ref_last;
 	int ofs_first, ofs_last;
+
+	/* Not initialized by make_base(). */
+	void *data;
+	unsigned long size;
 };
 
 struct thread_local {
@@ -362,14 +365,6 @@ static void set_thread_data(struct thread_local *data)
 		pthread_setspecific(key, data);
 }
 
-static struct base_data *alloc_base_data(void)
-{
-	struct base_data *base = xcalloc(1, sizeof(struct base_data));
-	base->ref_last = -1;
-	base->ofs_last = -1;
-	return base;
-}
-
 static void free_base_data(struct base_data *c)
 {
 	if (c->data) {
@@ -406,19 +401,6 @@ static void prune_base_data(struct base_data *youngest_child)
 	free(ancestry);
 }
 
-static void link_base_data(struct base_data *base, struct base_data *c)
-{
-	c->base = base;
-	if (c->data)
-		get_thread_data()->base_cache_used += c->size;
-	prune_base_data(c);
-}
-
-static void unlink_base_data(struct base_data *c)
-{
-	free_base_data(c);
-}
-
 static int is_delta_type(enum object_type type)
 {
 	return (type == OBJ_REF_DELTA || type == OBJ_OFS_DELTA);
@@ -929,10 +911,25 @@ static void *get_base_data(struct base_data *c)
 	return c->data;
 }
 
-static void resolve_delta(struct object_entry *delta_obj,
-			  struct base_data *base, struct base_data *result)
+static struct base_data *make_base(struct object_entry *obj,
+				   struct base_data *parent)
 {
-	void *base_data, *delta_data;
+	struct base_data *base = xcalloc(1, sizeof(struct base_data));
+	base->base = parent;
+	base->obj = obj;
+	find_ref_delta_children(&obj->idx.oid,
+				&base->ref_first, &base->ref_last);
+	find_ofs_delta_children(obj->idx.offset,
+				&base->ofs_first, &base->ofs_last);
+	return base;
+}
+
+static struct base_data *resolve_delta(struct object_entry *delta_obj,
+				       struct base_data *base)
+{
+	void *base_data, *delta_data, *result_data;
+	struct base_data *result;
+	unsigned long result_size;
 
 	if (show_stat) {
 		int i = delta_obj - objects;
@@ -946,19 +943,31 @@ static void resolve_delta(struct object_entry *delta_obj,
 	}
 	delta_data = get_data_from_pack(delta_obj);
 	base_data = get_base_data(base);
-	result->obj = delta_obj;
-	result->data = patch_delta(base_data, base->size,
-				   delta_data, delta_obj->size, &result->size);
+	result_data = patch_delta(base_data, base->size,
+				  delta_data, delta_obj->size, &result_size);
 	free(delta_data);
-	if (!result->data)
+	if (!result_data)
 		bad_object(delta_obj->idx.offset, _("failed to apply delta"));
-	hash_object_file(the_hash_algo, result->data, result->size,
+	hash_object_file(the_hash_algo, result_data, result_size,
 			 type_name(delta_obj->real_type), &delta_obj->idx.oid);
-	sha1_object(result->data, NULL, result->size, delta_obj->real_type,
+	sha1_object(result_data, NULL, result_size, delta_obj->real_type,
 		    &delta_obj->idx.oid);
+
+	result = make_base(delta_obj, base);
+	if (result->ref_last == -1 && result->ofs_last == -1) {
+		free(result_data);
+	} else {
+		result->data = result_data;
+		result->size = result_size;
+		get_thread_data()->base_cache_used += result->size;
+		prune_base_data(result);
+	}
+
 	counter_lock();
 	nr_resolved_deltas++;
 	counter_unlock();
+
+	return result;
 }
 
 /*
@@ -984,24 +993,9 @@ static int compare_and_swap_type(signed char *type,
 static struct base_data *find_unresolved_deltas_1(struct base_data *base,
 						  struct base_data *prev_base)
 {
-	if (base->ref_last == -1 && base->ofs_last == -1) {
-		find_ref_delta_children(&base->obj->idx.oid,
-					&base->ref_first, &base->ref_last);
-
-		find_ofs_delta_children(base->obj->idx.offset,
-					&base->ofs_first, &base->ofs_last);
-
-		if (base->ref_last == -1 && base->ofs_last == -1) {
-			free(base->data);
-			return NULL;
-		}
-
-		link_base_data(prev_base, base);
-	}
-
 	if (base->ref_first <= base->ref_last) {
 		struct object_entry *child = objects + ref_deltas[base->ref_first].obj_no;
-		struct base_data *result = alloc_base_data();
+		struct base_data *result;
 
 		if (!compare_and_swap_type(&child->real_type, OBJ_REF_DELTA,
 					   base->obj->real_type))
@@ -1009,7 +1003,7 @@ static struct base_data *find_unresolved_deltas_1(struct base_data *base,
 			    (uintmax_t)child->idx.offset,
 			    oid_to_hex(&base->obj->idx.oid));
 
-		resolve_delta(child, base, result);
+		result = resolve_delta(child, base);
 		if (base->ref_first == base->ref_last && base->ofs_last == -1)
 			free_base_data(base);
 
@@ -1019,11 +1013,11 @@ static struct base_data *find_unresolved_deltas_1(struct base_data *base,
 
 	if (base->ofs_first <= base->ofs_last) {
 		struct object_entry *child = objects + ofs_deltas[base->ofs_first].obj_no;
-		struct base_data *result = alloc_base_data();
+		struct base_data *result;
 
 		assert(child->real_type == OBJ_OFS_DELTA);
 		child->real_type = base->obj->real_type;
-		resolve_delta(child, base, result);
+		result = resolve_delta(child, base);
 		if (base->ofs_first == base->ofs_last)
 			free_base_data(base);
 
@@ -1031,7 +1025,7 @@ static struct base_data *find_unresolved_deltas_1(struct base_data *base,
 		return result;
 	}
 
-	unlink_base_data(base);
+	free_base_data(base);
 	return NULL;
 }
 
@@ -1074,9 +1068,8 @@ static int compare_ref_delta_entry(const void *a, const void *b)
 
 static void resolve_base(struct object_entry *obj)
 {
-	struct base_data *base_obj = alloc_base_data();
-	base_obj->obj = obj;
-	base_obj->data = NULL;
+	struct base_data *base_obj = make_base(obj, NULL);
+
 	find_unresolved_deltas(base_obj);
 }
 
@@ -1369,22 +1362,26 @@ static void fix_unresolved_deltas(struct hashfile *f)
 	for (i = 0; i < nr_ref_deltas; i++) {
 		struct ref_delta_entry *d = sorted_by_pos[i];
 		enum object_type type;
-		struct base_data *base_obj = alloc_base_data();
+		struct base_data *base;
+		void *data;
+		unsigned long size;
+		struct object_entry *obj;
 
 		if (objects[d->obj_no].real_type != OBJ_REF_DELTA)
 			continue;
-		base_obj->data = read_object_file(&d->oid, &type,
-						  &base_obj->size);
-		if (!base_obj->data)
+		data = read_object_file(&d->oid, &type, &size);
+		if (!data)
 			continue;
 
 		if (check_object_signature(the_repository, &d->oid,
-					   base_obj->data, base_obj->size,
+					   data, size,
 					   type_name(type)))
 			die(_("local object %s is corrupt"), oid_to_hex(&d->oid));
-		base_obj->obj = append_obj_to_pack(f, d->oid.hash,
-					base_obj->data, base_obj->size, type);
-		find_unresolved_deltas(base_obj);
+		obj = append_obj_to_pack(f, d->oid.hash, data, size, type);
+		base = make_base(obj, NULL);
+		base->data = data;
+		base->size = size;
+		find_unresolved_deltas(base);
 		display_progress(progress, nr_resolved_deltas);
 	}
 	free(sorted_by_pos);
-- 
2.28.0.526.ge36021eeef-goog


^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH v2 6/7] index-pack: make resolve_delta() assume base data
  2020-09-08 19:48 ` [PATCH v2 " Jonathan Tan
                     ` (4 preceding siblings ...)
  2020-09-08 19:48   ` [PATCH v2 5/7] index-pack: calculate {ref,ofs}_{first,last} early Jonathan Tan
@ 2020-09-08 19:48   ` Jonathan Tan
  2020-09-08 19:48   ` [PATCH v2 7/7] index-pack: make quantum of work smaller Jonathan Tan
  2020-09-08 22:53   ` [PATCH v2 0/7] Better threaded delta resolution in index-pack (another try) Junio C Hamano
  7 siblings, 0 replies; 31+ messages in thread
From: Jonathan Tan @ 2020-09-08 19:48 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, gitster

A subsequent commit will make the quantum of work smaller, necessitating
more locking. This commit allows resolve_delta() to be called outside
the lock.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 builtin/index-pack.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 94d0f53b03..f69a50d46b 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -927,7 +927,7 @@ static struct base_data *make_base(struct object_entry *obj,
 static struct base_data *resolve_delta(struct object_entry *delta_obj,
 				       struct base_data *base)
 {
-	void *base_data, *delta_data, *result_data;
+	void *delta_data, *result_data;
 	struct base_data *result;
 	unsigned long result_size;
 
@@ -942,8 +942,8 @@ static struct base_data *resolve_delta(struct object_entry *delta_obj,
 		obj_stat[i].base_object_no = j;
 	}
 	delta_data = get_data_from_pack(delta_obj);
-	base_data = get_base_data(base);
-	result_data = patch_delta(base_data, base->size,
+	assert(base->data);
+	result_data = patch_delta(base->data, base->size,
 				  delta_data, delta_obj->size, &result_size);
 	free(delta_data);
 	if (!result_data)
@@ -1003,6 +1003,7 @@ static struct base_data *find_unresolved_deltas_1(struct base_data *base,
 			    (uintmax_t)child->idx.offset,
 			    oid_to_hex(&base->obj->idx.oid));
 
+		get_base_data(base);
 		result = resolve_delta(child, base);
 		if (base->ref_first == base->ref_last && base->ofs_last == -1)
 			free_base_data(base);
@@ -1017,6 +1018,7 @@ static struct base_data *find_unresolved_deltas_1(struct base_data *base,
 
 		assert(child->real_type == OBJ_OFS_DELTA);
 		child->real_type = base->obj->real_type;
+		get_base_data(base);
 		result = resolve_delta(child, base);
 		if (base->ofs_first == base->ofs_last)
 			free_base_data(base);
-- 
2.28.0.526.ge36021eeef-goog


^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH v2 7/7] index-pack: make quantum of work smaller
  2020-09-08 19:48 ` [PATCH v2 " Jonathan Tan
                     ` (5 preceding siblings ...)
  2020-09-08 19:48   ` [PATCH v2 6/7] index-pack: make resolve_delta() assume base data Jonathan Tan
@ 2020-09-08 19:48   ` Jonathan Tan
  2020-09-08 22:53   ` [PATCH v2 0/7] Better threaded delta resolution in index-pack (another try) Junio C Hamano
  7 siblings, 0 replies; 31+ messages in thread
From: Jonathan Tan @ 2020-09-08 19:48 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, gitster

Currently, when index-pack resolves deltas, it does not split up delta
trees into threads: each delta base root (an object that is not a
REF_DELTA or OFS_DELTA) can go into its own thread, but all deltas on
that root (direct or indirect) are processed in the same thread.

This is a problem when a repository contains a large text file (thus,
delta-able) that is modified many times - delta resolution time during
fetching is dominated by processing the deltas corresponding to that
text file.

This patch contains a solution to that. When cloning using

  git -c core.deltabasecachelimit=1g clone \
    https://fuchsia.googlesource.com/third_party/vulkan-cts

on my laptop, clone time improved from 3m2s to 2m5s (using 3 threads,
which is the default).

The solution is to have a global work stack. This stack contains delta
bases (objects, whether appearing directly in the packfile or generated
by delta resolution, that themselves have delta children) that need to
be processed; whenever a thread needs work, it peeks at the top of the
stack and processes its next unprocessed child. If a thread finds the
stack empty, it will look for more delta base roots to push on the stack
instead.

The main weakness of having a global work stack is that more time is
spent in the mutex, but profiling has shown that most time is spent in
the resolution of the deltas themselves, so this shouldn't be an issue
in practice. In any case, experimentation (as described in the clone
command above) shows that this patch is a net improvement.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 builtin/index-pack.c | 348 +++++++++++++++++++++++++------------------
 1 file changed, 200 insertions(+), 148 deletions(-)

diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index f69a50d46b..8acd078aa0 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -38,15 +38,56 @@ struct base_data {
 	struct object_entry *obj;
 	int ref_first, ref_last;
 	int ofs_first, ofs_last;
+	/*
+	 * Threads should increment retain_data if they are about to call
+	 * patch_delta() using this struct's data as a base, and decrement this
+	 * when they are done. While retain_data is nonzero, this struct's data
+	 * will not be freed even if the delta base cache limit is exceeded.
+	 */
+	int retain_data;
+	/*
+	 * The number of direct children that have not been fully processed
+	 * (entered work_head, entered done_head, left done_head). When this
+	 * number reaches zero, this struct base_data can be freed.
+	 */
+	int children_remaining;
 
 	/* Not initialized by make_base(). */
+	struct list_head list;
 	void *data;
 	unsigned long size;
 };
 
+/*
+ * Stack of struct base_data that have unprocessed children.
+ * threaded_second_pass() uses this as a source of work (the other being the
+ * objects array).
+ *
+ * Guarded by work_mutex.
+ */
+static LIST_HEAD(work_head);
+
+/*
+ * Stack of struct base_data that have children, all of whom have been
+ * processed or are being processed, and at least one child is being processed.
+ * These struct base_data must be kept around until the last child is
+ * processed.
+ *
+ * Guarded by work_mutex.
+ */
+static LIST_HEAD(done_head);
+
+/*
+ * All threads share one delta base cache.
+ *
+ * base_cache_used is guarded by work_mutex, and base_cache_limit is read-only
+ * in a thread.
+ */
+static size_t base_cache_used;
+static size_t base_cache_limit;
+
 struct thread_local {
 	pthread_t thread;
-	size_t base_cache_used;
 	int pack_fd;
 };
 
@@ -369,36 +410,38 @@ static void free_base_data(struct base_data *c)
 {
 	if (c->data) {
 		FREE_AND_NULL(c->data);
-		get_thread_data()->base_cache_used -= c->size;
+		base_cache_used -= c->size;
 	}
 }
 
-static void prune_base_data(struct base_data *youngest_child)
+static void prune_base_data(struct base_data *retain)
 {
-	struct base_data *b;
-	struct thread_local *data = get_thread_data();
-	struct base_data **ancestry = NULL;
-	size_t nr = 0, alloc = 0;
-	ssize_t i;
+	struct list_head *pos;
 
-	if (data->base_cache_used <= delta_base_cache_limit)
+	if (base_cache_used <= base_cache_limit)
 		return;
 
-	/*
-	 * Free all ancestors of youngest_child until we have enough space,
-	 * starting with the oldest. (We cannot free youngest_child itself.)
-	 */
-	for (b = youngest_child->base; b != NULL; b = b->base) {
-		ALLOC_GROW(ancestry, nr + 1, alloc);
-		ancestry[nr++] = b;
+	list_for_each_prev(pos, &done_head) {
+		struct base_data *b = list_entry(pos, struct base_data, list);
+		if (b->retain_data || b == retain)
+			continue;
+		if (b->data) {
+			free_base_data(b);
+			if (base_cache_used <= base_cache_limit)
+				return;
+		}
 	}
-	for (i = nr - 1;
-	     i >= 0 && data->base_cache_used > delta_base_cache_limit;
-	     i--) {
-		if (ancestry[i]->data)
-			free_base_data(ancestry[i]);
+
+	list_for_each_prev(pos, &work_head) {
+		struct base_data *b = list_entry(pos, struct base_data, list);
+		if (b->retain_data || b == retain)
+			continue;
+		if (b->data) {
+			free_base_data(b);
+			if (base_cache_used <= base_cache_limit)
+				return;
+		}
 	}
-	free(ancestry);
 }
 
 static int is_delta_type(enum object_type type)
@@ -851,15 +894,7 @@ static void sha1_object(const void *data, struct object_entry *obj_entry,
 }
 
 /*
- * This function is part of find_unresolved_deltas(). There are two
- * walkers going in the opposite ways.
- *
- * The first one in find_unresolved_deltas() traverses down from
- * parent node to children, deflating nodes along the way. However,
- * memory for deflated nodes is limited by delta_base_cache_limit, so
- * at some point parent node's deflated content may be freed.
- *
- * The second walker is this function, which goes from current node up
+ * Walk from current node up
  * to top parent if necessary to deflate the node. In normal
  * situation, its parent node would be already deflated, so it just
  * needs to apply delta.
@@ -887,7 +922,7 @@ static void *get_base_data(struct base_data *c)
 		if (!delta_nr) {
 			c->data = get_data_from_pack(obj);
 			c->size = obj->size;
-			get_thread_data()->base_cache_used += c->size;
+			base_cache_used += c->size;
 			prune_base_data(c);
 		}
 		for (; delta_nr > 0; delta_nr--) {
@@ -903,7 +938,7 @@ static void *get_base_data(struct base_data *c)
 			free(raw);
 			if (!c->data)
 				bad_object(obj->idx.offset, _("failed to apply delta"));
-			get_thread_data()->base_cache_used += c->size;
+			base_cache_used += c->size;
 			prune_base_data(c);
 		}
 		free(delta);
@@ -921,6 +956,8 @@ static struct base_data *make_base(struct object_entry *obj,
 				&base->ref_first, &base->ref_last);
 	find_ofs_delta_children(obj->idx.offset,
 				&base->ofs_first, &base->ofs_last);
+	base->children_remaining = base->ref_last - base->ref_first +
+		base->ofs_last - base->ofs_first + 2;
 	return base;
 }
 
@@ -954,14 +991,8 @@ static struct base_data *resolve_delta(struct object_entry *delta_obj,
 		    &delta_obj->idx.oid);
 
 	result = make_base(delta_obj, base);
-	if (result->ref_last == -1 && result->ofs_last == -1) {
-		free(result_data);
-	} else {
-		result->data = result_data;
-		result->size = result_size;
-		get_thread_data()->base_cache_used += result->size;
-		prune_base_data(result);
-	}
+	result->data = result_data;
+	result->size = result_size;
 
 	counter_lock();
 	nr_resolved_deltas++;
@@ -970,86 +1001,6 @@ static struct base_data *resolve_delta(struct object_entry *delta_obj,
 	return result;
 }
 
-/*
- * Standard boolean compare-and-swap: atomically check whether "*type" is
- * "want"; if so, swap in "set" and return true. Otherwise, leave it untouched
- * and return false.
- */
-static int compare_and_swap_type(signed char *type,
-				 enum object_type want,
-				 enum object_type set)
-{
-	enum object_type old;
-
-	type_cas_lock();
-	old = *type;
-	if (old == want)
-		*type = set;
-	type_cas_unlock();
-
-	return old == want;
-}
-
-static struct base_data *find_unresolved_deltas_1(struct base_data *base,
-						  struct base_data *prev_base)
-{
-	if (base->ref_first <= base->ref_last) {
-		struct object_entry *child = objects + ref_deltas[base->ref_first].obj_no;
-		struct base_data *result;
-
-		if (!compare_and_swap_type(&child->real_type, OBJ_REF_DELTA,
-					   base->obj->real_type))
-			die("REF_DELTA at offset %"PRIuMAX" already resolved (duplicate base %s?)",
-			    (uintmax_t)child->idx.offset,
-			    oid_to_hex(&base->obj->idx.oid));
-
-		get_base_data(base);
-		result = resolve_delta(child, base);
-		if (base->ref_first == base->ref_last && base->ofs_last == -1)
-			free_base_data(base);
-
-		base->ref_first++;
-		return result;
-	}
-
-	if (base->ofs_first <= base->ofs_last) {
-		struct object_entry *child = objects + ofs_deltas[base->ofs_first].obj_no;
-		struct base_data *result;
-
-		assert(child->real_type == OBJ_OFS_DELTA);
-		child->real_type = base->obj->real_type;
-		get_base_data(base);
-		result = resolve_delta(child, base);
-		if (base->ofs_first == base->ofs_last)
-			free_base_data(base);
-
-		base->ofs_first++;
-		return result;
-	}
-
-	free_base_data(base);
-	return NULL;
-}
-
-static void find_unresolved_deltas(struct base_data *base)
-{
-	struct base_data *new_base, *prev_base = NULL;
-	for (;;) {
-		new_base = find_unresolved_deltas_1(base, prev_base);
-
-		if (new_base) {
-			prev_base = base;
-			base = new_base;
-		} else {
-			free(base);
-			base = prev_base;
-			if (!base)
-				return;
-			prev_base = base->base;
-		}
-	}
-}
-
 static int compare_ofs_delta_entry(const void *a, const void *b)
 {
 	const struct ofs_delta_entry *delta_a = a;
@@ -1068,33 +1019,131 @@ static int compare_ref_delta_entry(const void *a, const void *b)
 	return oidcmp(&delta_a->oid, &delta_b->oid);
 }
 
-static void resolve_base(struct object_entry *obj)
-{
-	struct base_data *base_obj = make_base(obj, NULL);
-
-	find_unresolved_deltas(base_obj);
-}
-
 static void *threaded_second_pass(void *data)
 {
-	set_thread_data(data);
+	if (data)
+		set_thread_data(data);
 	for (;;) {
-		int i;
-		counter_lock();
-		display_progress(progress, nr_resolved_deltas);
-		counter_unlock();
+		struct base_data *parent = NULL;
+		struct object_entry *child_obj;
+		struct base_data *child;
+
 		work_lock();
-		while (nr_dispatched < nr_objects &&
-		       is_delta_type(objects[nr_dispatched].type))
-			nr_dispatched++;
-		if (nr_dispatched >= nr_objects) {
-			work_unlock();
-			break;
+		if (list_empty(&work_head)) {
+			/*
+			 * Take an object from the object array.
+			 */
+			while (nr_dispatched < nr_objects &&
+			       is_delta_type(objects[nr_dispatched].type))
+				nr_dispatched++;
+			if (nr_dispatched >= nr_objects) {
+				work_unlock();
+				break;
+			}
+			child_obj = &objects[nr_dispatched++];
+		} else {
+			/*
+			 * Peek at the top of the stack, and take a child from
+			 * it.
+			 */
+			parent = list_first_entry(&work_head, struct base_data,
+						  list);
+
+			if (parent->ref_first <= parent->ref_last) {
+				int offset = ref_deltas[parent->ref_first++].obj_no;
+				child_obj = objects + offset;
+				if (child_obj->real_type != OBJ_REF_DELTA)
+					die("REF_DELTA at offset %"PRIuMAX" already resolved (duplicate base %s?)",
+					    (uintmax_t) child_obj->idx.offset,
+					    oid_to_hex(&parent->obj->idx.oid));
+				child_obj->real_type = parent->obj->real_type;
+			} else {
+				child_obj = objects +
+					ofs_deltas[parent->ofs_first++].obj_no;
+				assert(child_obj->real_type == OBJ_OFS_DELTA);
+				child_obj->real_type = parent->obj->real_type;
+			}
+
+			if (parent->ref_first > parent->ref_last &&
+			    parent->ofs_first > parent->ofs_last) {
+				/*
+				 * This parent has run out of children, so move
+				 * it to done_head.
+				 */
+				list_del(&parent->list);
+				list_add(&parent->list, &done_head);
+			}
+
+			/*
+			 * Ensure that the parent has data, since we will need
+			 * it later.
+			 *
+			 * NEEDSWORK: If parent data needs to be reloaded, this
+			 * prolongs the time that the current thread spends in
+			 * the mutex. A mitigating factor is that parent data
+			 * needs to be reloaded only if the delta base cache
+			 * limit is exceeded, so in the typical case, this does
+			 * not happen.
+			 */
+			get_base_data(parent);
+			parent->retain_data++;
 		}
-		i = nr_dispatched++;
 		work_unlock();
 
-		resolve_base(&objects[i]);
+		if (parent) {
+			child = resolve_delta(child_obj, parent);
+			if (!child->children_remaining)
+				FREE_AND_NULL(child->data);
+		} else {
+			child = make_base(child_obj, NULL);
+			if (child->children_remaining) {
+				/*
+				 * Since this child has its own delta children,
+				 * we will need this data in the future.
+				 * Inflate now so that future iterations will
+				 * have access to this object's data while
+				 * outside the work mutex.
+				 */
+				child->data = get_data_from_pack(child_obj);
+				child->size = child_obj->size;
+			}
+		}
+
+		work_lock();
+		if (parent)
+			parent->retain_data--;
+		if (child->data) {
+			/*
+			 * This child has its own children, so add it to
+			 * work_head.
+			 */
+			list_add(&child->list, &work_head);
+			base_cache_used += child->size;
+			prune_base_data(NULL);
+		} else {
+			/*
+			 * This child does not have its own children. It may be
+			 * the last descendant of its ancestors; free those
+			 * that we can.
+			 */
+			struct base_data *p = parent;
+
+			while (p) {
+				struct base_data *next_p;
+
+				p->children_remaining--;
+				if (p->children_remaining)
+					break;
+
+				next_p = p->base;
+				free_base_data(p);
+				list_del(&p->list);
+				free(p);
+
+				p = next_p;
+			}
+		}
+		work_unlock();
 	}
 	return NULL;
 }
@@ -1195,6 +1244,7 @@ static void resolve_deltas(void)
 					  nr_ref_deltas + nr_ofs_deltas);
 
 	nr_dispatched = 0;
+	base_cache_limit = delta_base_cache_limit * nr_threads;
 	if (nr_threads > 1 || getenv("GIT_FORCE_THREADS")) {
 		init_thread();
 		for (i = 0; i < nr_threads; i++) {
@@ -1364,10 +1414,8 @@ static void fix_unresolved_deltas(struct hashfile *f)
 	for (i = 0; i < nr_ref_deltas; i++) {
 		struct ref_delta_entry *d = sorted_by_pos[i];
 		enum object_type type;
-		struct base_data *base;
 		void *data;
 		unsigned long size;
-		struct object_entry *obj;
 
 		if (objects[d->obj_no].real_type != OBJ_REF_DELTA)
 			continue;
@@ -1379,11 +1427,15 @@ static void fix_unresolved_deltas(struct hashfile *f)
 					   data, size,
 					   type_name(type)))
 			die(_("local object %s is corrupt"), oid_to_hex(&d->oid));
-		obj = append_obj_to_pack(f, d->oid.hash, data, size, type);
-		base = make_base(obj, NULL);
-		base->data = data;
-		base->size = size;
-		find_unresolved_deltas(base);
+
+		/*
+		 * Add this as an object to the objects array and call
+		 * threaded_second_pass() (which will pick up the added
+		 * object).
+		 */
+		append_obj_to_pack(f, d->oid.hash, data, size, type);
+		threaded_second_pass(NULL);
+
 		display_progress(progress, nr_resolved_deltas);
 	}
 	free(sorted_by_pos);
-- 
2.28.0.526.ge36021eeef-goog


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2 0/7] Better threaded delta resolution in index-pack (another try)
  2020-09-08 19:48 ` [PATCH v2 " Jonathan Tan
                     ` (6 preceding siblings ...)
  2020-09-08 19:48   ` [PATCH v2 7/7] index-pack: make quantum of work smaller Jonathan Tan
@ 2020-09-08 22:53   ` Junio C Hamano
  7 siblings, 0 replies; 31+ messages in thread
From: Junio C Hamano @ 2020-09-08 22:53 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git

Jonathan Tan <jonathantanmy@google.com> writes:

> Here's the reroll.
>
> [1] https://lore.kernel.org/git/xmqqk0xa8rvn.fsf@gitster.c.googlers.com/

It seems that the only changes (as expected) are squashing in the
follow-up "oops" fixes into the right place in the series?

Will replace; thanks.


>
> Jonathan Tan (7):
>   Documentation: deltaBaseCacheLimit is per-thread
>   index-pack: remove redundant parameter
>   index-pack: unify threaded and unthreaded code
>   index-pack: remove redundant child field
>   index-pack: calculate {ref,ofs}_{first,last} early
>   index-pack: make resolve_delta() assume base data
>   index-pack: make quantum of work smaller
>
>  Documentation/config/core.txt |   2 +-
>  builtin/index-pack.c          | 456 +++++++++++++++++++---------------
>  2 files changed, 251 insertions(+), 207 deletions(-)

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2 7/7] index-pack: make quantum of work smaller
  2020-02-28  0:04     ` Josh Steadmon
@ 2020-03-10 21:42       ` Jonathan Tan
  0 siblings, 0 replies; 31+ messages in thread
From: Jonathan Tan @ 2020-03-10 21:42 UTC (permalink / raw)
  To: steadmon; +Cc: jonathantanmy, git, stolee, peff

> > +/*
> > + * Stack of struct base_data that have unprocessed children.
> > + * threaded_second_pass() uses this as a source of work (the other being the
> > + * objects array).
> > + */
> > +LIST_HEAD(work_head);
> > +
> > +/*
> > + * Stack of struct base_data that have children, all of whom have been
> > + * processed or are being processed, and at least one child is being processed.
> > + * These struct base_data must be kept around until the last child is
> > + * processed.
> > + */
> > +LIST_HEAD(done_head);
> > +
> > +/*
> > + * All threads share one delta base cache.
> > + */
> > +size_t base_cache_used;
> > +size_t base_cache_limit;
> > +
> >  struct thread_local {
> >  	pthread_t thread;
> > -	size_t base_cache_used;
> >  	int pack_fd;
> >  };
> 
> Seeing several new shared variables made me paranoid about whether or
> not these were thread-safe. There are already five mutexes in this file,
> but there are no clear descriptions (apart from hints in the lock names)
> about which locks protect which variables. It would be nice to make this
> a bit more explicit.

I think that this patch set should confine itself to the work mutex and
what it controls (since that's what this patch set is affecting), but
admittedly, that is already more than a handful.

The work mutex controls work_head and done_head, and all the struct
base_data contained therein, except for data. However, data is
controlled by the semaphore retain_data (and the semaphore is itself
controlled by the work mutex, as described in the previous sentence):
when retain_data is positive, data can be read from but not written to.
The work mutex also controls base_cache_used but not base_cache_limit.
The work mutex does not control any struct base_data outside of
work_head and done_head, so it is possible to generate a struct
base_data outside the mutex, take the mutex, and add it to the list
(reducing time spent in the mutex).

This scheme seems rather complicated to me, but I couldn't make it
simpler - I was hoping for advice on this. If this is the best we can
do, I can try to document it in code comments.

> I wish we had something similar to Abseil's "GUARDED_BY" annotations [1]
> so that we could ensure thread safety with static analysis, but that's
> obviously out of scope for this series.
> 
> [1]: https://abseil.io/docs/cpp/guides/synchronization#thread-annotations

I think what we need is more complicated than what these annotations
provide, but anyway, Abseil is C++-only, I believe.

> > @@ -1065,33 +1011,128 @@ static int compare_ref_delta_entry(const void *a, const void *b)
> [snip]
> >  		work_unlock();
> >  
> > -		resolve_base(&objects[i]);
> > +		if (parent) {
> > +			child = resolve_delta(child_obj, parent);
> > +			if (!child->children_remaining)
> > +				FREE_AND_NULL(child->data);
> > +		} else {
> > +			child = make_base(child_obj, NULL);
> > +			if (child->children_remaining) {
> > +				/*
> > +				 * Since this child has its own delta children,
> > +				 * we will need this data in the future.
> > +				 * Inflate now so that future iterations will
> > +				 * have access to this object's data while
> > +				 * outside the work mutex.
> > +				 */
> > +				child->data = get_data_from_pack(child_obj);
> > +				child->size = child_obj->size;
> > +			}
> > +		}
> > +
> > +		work_lock();
> 
> This part made me uneasy at first, because it looks like we might be
> doing work on shared data outside of the work lock. But just prior to
> unlocking, we parent->retain_data, so we know it won't be freed.

That's correct that it made you uneasy (I think the usual solution is to
make data refcounted, but we don't have a refcounting implementation in
Git), and that's correct that my solution was that parent->retain_data
prevents data from being concurrently written to (in this case, freed).
I didn't know how better to implement this.

> And the
> only part we modify is child, which I don't believe is shared with any
> other threads.

Yes - in the part you quoted, child is a whole new object (created from
resolve_delta() or make_base()). It is only added to the list within the
mutex (the work_lock()).

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2 7/7] index-pack: make quantum of work smaller
  2019-10-17 20:17   ` [PATCH v2 7/7] index-pack: make quantum of work smaller Jonathan Tan
@ 2020-02-28  0:04     ` Josh Steadmon
  2020-03-10 21:42       ` Jonathan Tan
  0 siblings, 1 reply; 31+ messages in thread
From: Josh Steadmon @ 2020-02-28  0:04 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git, stolee, peff

On 2019.10.17 13:17, Jonathan Tan wrote:
> Currently, when index-pack resolves deltas, it does not split up delta
> trees into threads: each delta base root (an object that is not a
> REF_DELTA or OFS_DELTA) can go into its own thread, but all deltas on
> that root (direct or indirect) are processed in the same thread.
> 
> This is a problem when a repository contains a large text file (thus,
> delta-able) that is modified many times - delta resolution time during
> fetching is dominated by processing the deltas corresponding to that
> text file.
> 
> This patch contains a solution to that. When cloning using
> 
>   git -c core.deltabasecachelimit=1g clone \
>     https://fuchsia.googlesource.com/third_party/vulkan-cts
> 
> on my laptop, clone time improved from 3m2s to 2m5s (using 3 threads,
> which is the default).
> 
> The solution is to have a global work stack. This stack contains delta
> bases (objects, whether appearing directly in the packfile or generated
> by delta resolution, that themselves have delta children) that need to
> be processed; whenever a thread needs work, it peeks at the top of the
> stack and processes its next unprocessed child. If a thread finds the
> stack empty, it will look for more delta base roots to push on the stack
> instead.
> 
> The main weakness of having a global work stack is that more time is
> spent in the mutex, but profiling has shown that most time is spent in
> the resolution of the deltas themselves, so this shouldn't be an issue
> in practice. In any case, experimentation (as described in the clone
> command above) shows that this patch is a net improvement.
> 
> Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
> ---
>  builtin/index-pack.c | 336 ++++++++++++++++++++++++-------------------
>  1 file changed, 190 insertions(+), 146 deletions(-)
> 
> diff --git a/builtin/index-pack.c b/builtin/index-pack.c
> index 31607a77fc..072592a35d 100644
> --- a/builtin/index-pack.c
> +++ b/builtin/index-pack.c
> @@ -38,15 +38,49 @@ struct base_data {
>  	struct object_entry *obj;
>  	int ref_first, ref_last;
>  	int ofs_first, ofs_last;
> +	/*
> +	 * Threads should increment retain_data if they are about to call
> +	 * patch_delta() using this struct's data as a base, and decrement this
> +	 * when they are done. While retain_data is nonzero, this struct's data
> +	 * will not be freed even if the delta base cache limit is exceeded.
> +	 */
> +	int retain_data;
> +	/*
> +	 * The number of direct children that have not been fully processed
> +	 * (entered work_head, entered done_head, left done_head). When this
> +	 * number reaches zero, this struct base_data can be freed.
> +	 */
> +	int children_remaining;
>  
>  	/* Not initialized by make_base(). */
> +	struct list_head list;
>  	void *data;
>  	unsigned long size;
>  };

As a newb, the addition of a field just called "list" struck me as
needing a more descriptive name. But this makes more sense now after
looking at list.h. And it seems that most uses in other parts of the
code are similarly just named "list".


> +/*
> + * Stack of struct base_data that have unprocessed children.
> + * threaded_second_pass() uses this as a source of work (the other being the
> + * objects array).
> + */
> +LIST_HEAD(work_head);
> +
> +/*
> + * Stack of struct base_data that have children, all of whom have been
> + * processed or are being processed, and at least one child is being processed.
> + * These struct base_data must be kept around until the last child is
> + * processed.
> + */
> +LIST_HEAD(done_head);
> +
> +/*
> + * All threads share one delta base cache.
> + */
> +size_t base_cache_used;
> +size_t base_cache_limit;
> +
>  struct thread_local {
>  	pthread_t thread;
> -	size_t base_cache_used;
>  	int pack_fd;
>  };

Seeing several new shared variables made me paranoid about whether or
not these were thread-safe. There are already five mutexes in this file,
but there are no clear descriptions (apart from hints in the lock names)
about which locks protect which variables. It would be nice to make this
a bit more explicit.

I wish we had something similar to Abseil's "GUARDED_BY" annotations [1]
so that we could ensure thread safety with static analysis, but that's
obviously out of scope for this series.

[1]: https://abseil.io/docs/cpp/guides/synchronization#thread-annotations


> @@ -920,6 +948,8 @@ static struct base_data *make_base(struct object_entry *obj,
>  				&base->ref_first, &base->ref_last);
>  	find_ofs_delta_children(obj->idx.offset,
>  				&base->ofs_first, &base->ofs_last);
> +	base->children_remaining = base->ref_last - base->ref_first +
> +		base->ofs_last - base->ofs_first + 2;
>  	return base;
>  }

The extra +2 here confused me for a bit, but it makes sense now. You're
subtracting indices to get the length, so you want +1 to make the math
work out, and you're doing it across two different lists. In the edge
case where a list is empty, "first" is 0 and "last" is -1, so the math
still works out.


> @@ -1065,33 +1011,128 @@ static int compare_ref_delta_entry(const void *a, const void *b)
[snip]
>  		work_unlock();
>  
> -		resolve_base(&objects[i]);
> +		if (parent) {
> +			child = resolve_delta(child_obj, parent);
> +			if (!child->children_remaining)
> +				FREE_AND_NULL(child->data);
> +		} else {
> +			child = make_base(child_obj, NULL);
> +			if (child->children_remaining) {
> +				/*
> +				 * Since this child has its own delta children,
> +				 * we will need this data in the future.
> +				 * Inflate now so that future iterations will
> +				 * have access to this object's data while
> +				 * outside the work mutex.
> +				 */
> +				child->data = get_data_from_pack(child_obj);
> +				child->size = child_obj->size;
> +			}
> +		}
> +
> +		work_lock();

This part made me uneasy at first, because it looks like we might be
doing work on shared data outside of the work lock. But just prior to
unlocking, we parent->retain_data, so we know it won't be freed. And the
only part we modify is child, which I don't believe is shared with any
other threads.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH v2 7/7] index-pack: make quantum of work smaller
  2019-10-17 20:17 ` [PATCH v2 0/7] " Jonathan Tan
@ 2019-10-17 20:17   ` Jonathan Tan
  2020-02-28  0:04     ` Josh Steadmon
  0 siblings, 1 reply; 31+ messages in thread
From: Jonathan Tan @ 2019-10-17 20:17 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, stolee, peff

Currently, when index-pack resolves deltas, it does not split up delta
trees into threads: each delta base root (an object that is not a
REF_DELTA or OFS_DELTA) can go into its own thread, but all deltas on
that root (direct or indirect) are processed in the same thread.

This is a problem when a repository contains a large text file (thus,
delta-able) that is modified many times - delta resolution time during
fetching is dominated by processing the deltas corresponding to that
text file.

This patch contains a solution to that. When cloning using

  git -c core.deltabasecachelimit=1g clone \
    https://fuchsia.googlesource.com/third_party/vulkan-cts

on my laptop, clone time improved from 3m2s to 2m5s (using 3 threads,
which is the default).

The solution is to have a global work stack. This stack contains delta
bases (objects, whether appearing directly in the packfile or generated
by delta resolution, that themselves have delta children) that need to
be processed; whenever a thread needs work, it peeks at the top of the
stack and processes its next unprocessed child. If a thread finds the
stack empty, it will look for more delta base roots to push on the stack
instead.

The main weakness of having a global work stack is that more time is
spent in the mutex, but profiling has shown that most time is spent in
the resolution of the deltas themselves, so this shouldn't be an issue
in practice. In any case, experimentation (as described in the clone
command above) shows that this patch is a net improvement.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
 builtin/index-pack.c | 336 ++++++++++++++++++++++++-------------------
 1 file changed, 190 insertions(+), 146 deletions(-)

diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 31607a77fc..072592a35d 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -38,15 +38,49 @@ struct base_data {
 	struct object_entry *obj;
 	int ref_first, ref_last;
 	int ofs_first, ofs_last;
+	/*
+	 * Threads should increment retain_data if they are about to call
+	 * patch_delta() using this struct's data as a base, and decrement this
+	 * when they are done. While retain_data is nonzero, this struct's data
+	 * will not be freed even if the delta base cache limit is exceeded.
+	 */
+	int retain_data;
+	/*
+	 * The number of direct children that have not been fully processed
+	 * (entered work_head, entered done_head, left done_head). When this
+	 * number reaches zero, this struct base_data can be freed.
+	 */
+	int children_remaining;
 
 	/* Not initialized by make_base(). */
+	struct list_head list;
 	void *data;
 	unsigned long size;
 };
 
+/*
+ * Stack of struct base_data that have unprocessed children.
+ * threaded_second_pass() uses this as a source of work (the other being the
+ * objects array).
+ */
+LIST_HEAD(work_head);
+
+/*
+ * Stack of struct base_data that have children, all of whom have been
+ * processed or are being processed, and at least one child is being processed.
+ * These struct base_data must be kept around until the last child is
+ * processed.
+ */
+LIST_HEAD(done_head);
+
+/*
+ * All threads share one delta base cache.
+ */
+size_t base_cache_used;
+size_t base_cache_limit;
+
 struct thread_local {
 	pthread_t thread;
-	size_t base_cache_used;
 	int pack_fd;
 };
 
@@ -369,36 +403,38 @@ static void free_base_data(struct base_data *c)
 {
 	if (c->data) {
 		FREE_AND_NULL(c->data);
-		get_thread_data()->base_cache_used -= c->size;
+		base_cache_used -= c->size;
 	}
 }
 
-static void prune_base_data(struct base_data *youngest_child)
+static void prune_base_data(struct base_data *retain)
 {
-	struct base_data *b;
-	struct thread_local *data = get_thread_data();
-	struct base_data **ancestry = NULL;
-	size_t nr = 0, alloc = 0;
-	ssize_t i;
+	struct list_head *pos;
 
-	if (data->base_cache_used <= delta_base_cache_limit)
+	if (base_cache_used <= base_cache_limit)
 		return;
 
-	/*
-	 * Free all ancestors of youngest_child until we have enough space,
-	 * starting with the oldest. (We cannot free youngest_child itself.)
-	 */
-	for (b = youngest_child->base; b != NULL; b = b->base) {
-		ALLOC_GROW(ancestry, nr + 1, alloc);
-		ancestry[nr++] = b;
+	list_for_each_prev(pos, &done_head) {
+		struct base_data *b = list_entry(pos, struct base_data, list);
+		if (b->retain_data || b == retain)
+			continue;
+		if (b->data) {
+			free_base_data(b);
+			if (base_cache_used <= base_cache_limit)
+				return;
+		}
 	}
-	for (i = nr - 1;
-	     i >= 0 && data->base_cache_used > delta_base_cache_limit;
-	     i--) {
-		if (ancestry[i]->data)
-			free_base_data(ancestry[i]);
+
+	list_for_each_prev(pos, &work_head) {
+		struct base_data *b = list_entry(pos, struct base_data, list);
+		if (b->retain_data || b == retain)
+			continue;
+		if (b->data) {
+			free_base_data(b);
+			if (base_cache_used <= base_cache_limit)
+				return;
+		}
 	}
-	free(ancestry);
 }
 
 static int is_delta_type(enum object_type type)
@@ -850,15 +886,7 @@ static void sha1_object(const void *data, struct object_entry *obj_entry,
 }
 
 /*
- * This function is part of find_unresolved_deltas(). There are two
- * walkers going in the opposite ways.
- *
- * The first one in find_unresolved_deltas() traverses down from
- * parent node to children, deflating nodes along the way. However,
- * memory for deflated nodes is limited by delta_base_cache_limit, so
- * at some point parent node's deflated content may be freed.
- *
- * The second walker is this function, which goes from current node up
+ * Walk from current node up
  * to top parent if necessary to deflate the node. In normal
  * situation, its parent node would be already deflated, so it just
  * needs to apply delta.
@@ -886,7 +914,7 @@ static void *get_base_data(struct base_data *c)
 		if (!delta_nr) {
 			c->data = get_data_from_pack(obj);
 			c->size = obj->size;
-			get_thread_data()->base_cache_used += c->size;
+			base_cache_used += c->size;
 			prune_base_data(c);
 		}
 		for (; delta_nr > 0; delta_nr--) {
@@ -902,7 +930,7 @@ static void *get_base_data(struct base_data *c)
 			free(raw);
 			if (!c->data)
 				bad_object(obj->idx.offset, _("failed to apply delta"));
-			get_thread_data()->base_cache_used += c->size;
+			base_cache_used += c->size;
 			prune_base_data(c);
 		}
 		free(delta);
@@ -920,6 +948,8 @@ static struct base_data *make_base(struct object_entry *obj,
 				&base->ref_first, &base->ref_last);
 	find_ofs_delta_children(obj->idx.offset,
 				&base->ofs_first, &base->ofs_last);
+	base->children_remaining = base->ref_last - base->ref_first +
+		base->ofs_last - base->ofs_first + 2;
 	return base;
 }
 
@@ -953,14 +983,8 @@ static struct base_data *resolve_delta(struct object_entry *delta_obj,
 		    &delta_obj->idx.oid);
 
 	result = make_base(delta_obj, base);
-	if (result->ref_last == -1 && result->ofs_last == -1) {
-		free(result_data);
-	} else {
-		result->data = result_data;
-		result->size = result_size;
-		get_thread_data()->base_cache_used += result->size;
-		prune_base_data(result);
-	}
+	result->data = result_data;
+	result->size = result_size;
 
 	counter_lock();
 	nr_resolved_deltas++;
@@ -969,84 +993,6 @@ static struct base_data *resolve_delta(struct object_entry *delta_obj,
 	return result;
 }
 
-/*
- * Standard boolean compare-and-swap: atomically check whether "*type" is
- * "want"; if so, swap in "set" and return true. Otherwise, leave it untouched
- * and return false.
- */
-static int compare_and_swap_type(signed char *type,
-				 enum object_type want,
-				 enum object_type set)
-{
-	enum object_type old;
-
-	type_cas_lock();
-	old = *type;
-	if (old == want)
-		*type = set;
-	type_cas_unlock();
-
-	return old == want;
-}
-
-static struct base_data *find_unresolved_deltas_1(struct base_data *base,
-						  struct base_data *prev_base)
-{
-	if (base->ref_first <= base->ref_last) {
-		struct object_entry *child = objects + ref_deltas[base->ref_first].obj_no;
-		struct base_data *result;
-
-		if (!compare_and_swap_type(&child->real_type, OBJ_REF_DELTA,
-					   base->obj->real_type))
-			BUG("child->real_type != OBJ_REF_DELTA");
-
-		get_base_data(base);
-		result = resolve_delta(child, base);
-		if (base->ref_first == base->ref_last && base->ofs_last == -1)
-			free_base_data(base);
-
-		base->ref_first++;
-		return result;
-	}
-
-	if (base->ofs_first <= base->ofs_last) {
-		struct object_entry *child = objects + ofs_deltas[base->ofs_first].obj_no;
-		struct base_data *result;
-
-		assert(child->real_type == OBJ_OFS_DELTA);
-		child->real_type = base->obj->real_type;
-		get_base_data(base);
-		result = resolve_delta(child, base);
-		if (base->ofs_first == base->ofs_last)
-			free_base_data(base);
-
-		base->ofs_first++;
-		return result;
-	}
-
-	free_base_data(base);
-	return NULL;
-}
-
-static void find_unresolved_deltas(struct base_data *base)
-{
-	struct base_data *new_base, *prev_base = NULL;
-	for (;;) {
-		new_base = find_unresolved_deltas_1(base, prev_base);
-
-		if (new_base) {
-			prev_base = base;
-			base = new_base;
-		} else {
-			free(base);
-			base = prev_base;
-			if (!base)
-				return;
-			prev_base = base->base;
-		}
-	}
-}
-
 static int compare_ofs_delta_entry(const void *a, const void *b)
 {
 	const struct ofs_delta_entry *delta_a = a;
@@ -1065,33 +1011,128 @@ static int compare_ref_delta_entry(const void *a, const void *b)
 	return oidcmp(&delta_a->oid, &delta_b->oid);
 }
 
-static void resolve_base(struct object_entry *obj)
-{
-	struct base_data *base_obj = make_base(obj, NULL);
-
-	find_unresolved_deltas(base_obj);
-}
-
 static void *threaded_second_pass(void *data)
 {
-	set_thread_data(data);
+	if (data)
+		set_thread_data(data);
 	for (;;) {
-		int i;
-		counter_lock();
-		display_progress(progress, nr_resolved_deltas);
-		counter_unlock();
+		struct base_data *parent = NULL;
+		struct object_entry *child_obj;
+		struct base_data *child;
+
 		work_lock();
-		while (nr_dispatched < nr_objects &&
-		       is_delta_type(objects[nr_dispatched].type))
-			nr_dispatched++;
-		if (nr_dispatched >= nr_objects) {
-			work_unlock();
-			break;
+		if (list_empty(&work_head)) {
+			/*
+			 * Take an object from the object array.
+			 */
+			while (nr_dispatched < nr_objects &&
+			       is_delta_type(objects[nr_dispatched].type))
+				nr_dispatched++;
+			if (nr_dispatched >= nr_objects) {
+				work_unlock();
+				break;
+			}
+			child_obj = &objects[nr_dispatched++];
+		} else {
+			/*
+			 * Peek at the top of the stack, and take a child from
+			 * it.
+			 */
+			parent = list_first_entry(&work_head, struct base_data,
+						  list);
+
+			if (parent->ref_first <= parent->ref_last) {
+				child_obj = objects +
+					ref_deltas[parent->ref_first++].obj_no;
+				assert(child_obj->real_type == OBJ_REF_DELTA);
+				child_obj->real_type = parent->obj->real_type;
+			} else {
+				child_obj = objects +
+					ofs_deltas[parent->ofs_first++].obj_no;
+				assert(child_obj->real_type == OBJ_OFS_DELTA);
+				child_obj->real_type = parent->obj->real_type;
+			}
+
+			if (parent->ref_first > parent->ref_last &&
+			    parent->ofs_first > parent->ofs_last) {
+				/*
+				 * This parent has run out of children, so move
+				 * it to done_head.
+				 */
+				list_del(&parent->list);
+				list_add(&parent->list, &done_head);
+			}
+
+			/*
+			 * Ensure that the parent has data, since we will need
+			 * it later.
+			 *
+			 * NEEDSWORK: If parent data needs to be reloaded, this
+			 * prolongs the time that the current thread spends in
+			 * the mutex. A mitigating factor is that parent data
+			 * needs to be reloaded only if the delta base cache
+			 * limit is exceeded, so in the typical case, this does
+			 * not happen.
+			 */
+			get_base_data(parent);
+			parent->retain_data++;
 		}
-		i = nr_dispatched++;
 		work_unlock();
 
-		resolve_base(&objects[i]);
+		if (parent) {
+			child = resolve_delta(child_obj, parent);
+			if (!child->children_remaining)
+				FREE_AND_NULL(child->data);
+		} else {
+			child = make_base(child_obj, NULL);
+			if (child->children_remaining) {
+				/*
+				 * Since this child has its own delta children,
+				 * we will need this data in the future.
+				 * Inflate now so that future iterations will
+				 * have access to this object's data while
+				 * outside the work mutex.
+				 */
+				child->data = get_data_from_pack(child_obj);
+				child->size = child_obj->size;
+			}
+		}
+
+		work_lock();
+		if (parent)
+			parent->retain_data--;
+		if (child->data) {
+			/*
+			 * This child has its own children, so add it to
+			 * work_head.
+			 */
+			list_add(&child->list, &work_head);
+			base_cache_used += child->size;
+			prune_base_data(NULL);
+		} else {
+			/*
+			 * This child does not have its own children. It may be
+			 * the last descendant of its ancestors; free those
+			 * that we can.
+			 */
+			struct base_data *p = parent;
+
+			while (p) {
+				struct base_data *next_p;
+
+				p->children_remaining--;
+				if (p->children_remaining)
+					break;
+
+				next_p = p->base;
+				free_base_data(p);
+				list_del(&p->list);
+				free(p);
+
+				p = next_p;
+			}
+		}
+		work_unlock();
 	}
 	return NULL;
 }
@@ -1192,6 +1233,7 @@ static void resolve_deltas(void)
 					  nr_ref_deltas + nr_ofs_deltas);
 
 	nr_dispatched = 0;
+	base_cache_limit = delta_base_cache_limit * nr_threads;
 	if (nr_threads > 1 || getenv("GIT_FORCE_THREADS")) {
 		init_thread();
 		for (i = 0; i < nr_threads; i++) {
@@ -1362,10 +1404,8 @@ static void fix_unresolved_deltas(struct hashfile *f)
 	for (i = 0; i < nr_ref_deltas; i++) {
 		struct ref_delta_entry *d = sorted_by_pos[i];
 		enum object_type type;
-		struct base_data *base;
 		void *data;
 		unsigned long size;
-		struct object_entry *obj;
 
 		if (objects[d->obj_no].real_type != OBJ_REF_DELTA)
 			continue;
@@ -1376,11 +1416,15 @@ static void fix_unresolved_deltas(struct hashfile *f)
 		if (check_object_signature(&d->oid, data,
 					   size, type_name(type)))
 			die(_("local object %s is corrupt"), oid_to_hex(&d->oid));
-		obj = append_obj_to_pack(f, d->oid.hash, data, size, type);
-		base = make_base(obj, NULL);
-		base->data = data;
-		base->size = size;
-		find_unresolved_deltas(base);
+
+		/*
+		 * Add this as an object to the objects array and call
+		 * threaded_second_pass() (which will pick up the added
+		 * object).
+		 */
+		append_obj_to_pack(f, d->oid.hash, data, size, type);
+		threaded_second_pass(NULL);
+
 		display_progress(progress, nr_resolved_deltas);
 	}
 	free(sorted_by_pos);
-- 
2.23.0.866.gb869b98d4c-goog


^ permalink raw reply	[flat|nested] 31+ messages in thread

end of thread, other threads:[~2020-09-08 22:53 UTC | newest]

Thread overview: 31+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-08-24 19:16 [PATCH 0/7] Better threaded delta resolution in index-pack (another try) Jonathan Tan
2020-08-24 19:16 ` [PATCH 1/7] Documentation: deltaBaseCacheLimit is per-thread Jonathan Tan
2020-08-24 19:16 ` [PATCH] fetch-pack: in partial clone, pass --promisor Jonathan Tan
2020-08-24 19:36   ` Jonathan Tan
2020-08-24 19:16 ` [PATCH 2/7] index-pack: remove redundant parameter Jonathan Tan
2020-08-24 21:01   ` Junio C Hamano
2020-08-24 19:16 ` [PATCH 3/7] index-pack: unify threaded and unthreaded code Jonathan Tan
2020-08-24 21:11   ` Junio C Hamano
2020-08-24 19:16 ` [PATCH 4/7] index-pack: remove redundant child field Jonathan Tan
2020-08-24 19:16 ` [PATCH 5/7] index-pack: calculate {ref,ofs}_{first,last} early Jonathan Tan
2020-08-24 19:16 ` [PATCH 6/7] index-pack: make resolve_delta() assume base data Jonathan Tan
2020-08-24 19:16 ` [PATCH 7/7] index-pack: make quantum of work smaller Jonathan Tan
2020-08-24 21:19   ` Junio C Hamano
2020-08-24 20:47 ` [PATCH 0/7] Better threaded delta resolution in index-pack (another try) Junio C Hamano
2020-08-24 21:27 ` [PATCH] fixup! index-pack: make quantum of work smaller Jonathan Tan
2020-08-24 22:08 ` [PATCH 0/7] Better threaded delta resolution in index-pack (another try) Jeff King
2020-08-25 18:11   ` Jonathan Tan
2020-08-25 21:18     ` Jeff King
2020-08-25 21:46       ` Jeff King
2020-09-08 19:48 ` [PATCH v2 " Jonathan Tan
2020-09-08 19:48   ` [PATCH v2 1/7] Documentation: deltaBaseCacheLimit is per-thread Jonathan Tan
2020-09-08 19:48   ` [PATCH v2 2/7] index-pack: remove redundant parameter Jonathan Tan
2020-09-08 19:48   ` [PATCH v2 3/7] index-pack: unify threaded and unthreaded code Jonathan Tan
2020-09-08 19:48   ` [PATCH v2 4/7] index-pack: remove redundant child field Jonathan Tan
2020-09-08 19:48   ` [PATCH v2 5/7] index-pack: calculate {ref,ofs}_{first,last} early Jonathan Tan
2020-09-08 19:48   ` [PATCH v2 6/7] index-pack: make resolve_delta() assume base data Jonathan Tan
2020-09-08 19:48   ` [PATCH v2 7/7] index-pack: make quantum of work smaller Jonathan Tan
2020-09-08 22:53   ` [PATCH v2 0/7] Better threaded delta resolution in index-pack (another try) Junio C Hamano
  -- strict thread matches above, loose matches on Subject: below --
2019-10-09 23:44 [RFC PATCH 0/6] Better threaded delta resolution in index-pack Jonathan Tan
2019-10-17 20:17 ` [PATCH v2 0/7] " Jonathan Tan
2019-10-17 20:17   ` [PATCH v2 7/7] index-pack: make quantum of work smaller Jonathan Tan
2020-02-28  0:04     ` Josh Steadmon
2020-03-10 21:42       ` Jonathan Tan

git@vger.kernel.org list mirror (unofficial, one of many)

This inbox may be cloned and mirrored by anyone:

	git clone --mirror https://public-inbox.org/git
	git clone --mirror http://ou63pmih66umazou.onion/git
	git clone --mirror http://czquwvybam4bgbro.onion/git
	git clone --mirror http://hjrcffqmbrq6wope.onion/git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V1 git git/ https://public-inbox.org/git \
		git@vger.kernel.org
	public-inbox-index git

Example config snippet for mirrors.
Newsgroups are available over NNTP:
	nntp://news.public-inbox.org/inbox.comp.version-control.git
	nntp://ou63pmih66umazou.onion/inbox.comp.version-control.git
	nntp://czquwvybam4bgbro.onion/inbox.comp.version-control.git
	nntp://hjrcffqmbrq6wope.onion/inbox.comp.version-control.git
	nntp://news.gmane.io/gmane.comp.version-control.git
 note: .onion URLs require Tor: https://www.torproject.org/

code repositories for project(s) associated with this inbox:

	https://80x24.org/mirrors/git.git

AGPL code for this site: git clone https://public-inbox.org/public-inbox.git