git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [RFC PATCH 0/6] Better threaded delta resolution in index-pack
@ 2019-10-09 23:44 Jonathan Tan
  2019-10-09 23:44 ` [PATCH 1/6] index-pack: unify threaded and unthreaded code Jonathan Tan
                   ` (6 more replies)
  0 siblings, 7 replies; 32+ messages in thread
From: Jonathan Tan @ 2019-10-09 23:44 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, peff, mh

Quoting myself [1]:

> index-pack does parallelize delta resolution, but
> it cannot split up trees into threads: each delta base root can go into
> its own thread, but when a delta base root is processed, all deltas on
> that root (direct or indirect) is processed in the same thread.

This is a problem when a repository contains a large text file (thus,
delta-able) that is modified many times - delta resolution time during
fetching is dominated by processing the deltas corresponding to that
text file. Here are patches that teach index-pack to better divide up
the work.

As an example of the effect, when cloning using

  git -c core.deltabasecachelimit=1g clone \
    https://fuchsia.googlesource.com/third_party/vulkan-cts

on my laptop, clone time improved from 3m2s to 2m5s (using 3 threads,
which is the default).

As you can see from the diff stats, my new algorithm uses comparable
lines of code to the existing one, but I think that it is a bit more
complicated. My main point of difficulty was in handling the delta base
cache - it must be GC-able, but at the same time available to another
thread if it was being used as a base to inflate a delta. In the end,
what I did was to make individual mutex-guarded refcounts for each
inflation result, but the buffer itself is not mutex-guarded: so a
thread could increment the refcount within the mutex, inflate (and
verify) outside the mutex, and then decrement the refcount within the
mutex. (One global mutex guards all the refcounts, as well as other
things.) Any ideas for making this design less complicated is
appreciated.

If this is a good direction, let me know and I'll refine the patches. I
personally think that the improvement in performance is worth the slight
added complexity. Also, in this patch set, I did some cleanup to make
future patches clearer, but some of the cleanup is undone by the future
patches themselves; let me know if it's easier to review if I should
squash those patches.

Also CC-ing Mike Hommey because Mike brought up a repo with a similar
case [2], although that case happens during repack.

[1] https://public-inbox.org/git/20190926003300.195781-1-jonathantanmy@google.com/
[2] https://public-inbox.org/git/20190704100530.smn4rpiekwtfylhz@glandium.org/

Jonathan Tan (6):
  index-pack: unify threaded and unthreaded code
  index-pack: remove redundant parameter
  index-pack: remove redundant child field
  index-pack: calculate {ref,ofs}_{first,last} early
  index-pack: make resolve_delta() assume base data
  index-pack: make quantum of work smaller

 builtin/index-pack.c | 375 ++++++++++++++++++++-----------------------
 1 file changed, 177 insertions(+), 198 deletions(-)

-- 
2.23.0.581.g78d2f28ef7-goog


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH 1/6] index-pack: unify threaded and unthreaded code
  2019-10-09 23:44 [RFC PATCH 0/6] Better threaded delta resolution in index-pack Jonathan Tan
@ 2019-10-09 23:44 ` Jonathan Tan
  2019-10-17  6:20   ` Jeff King
  2019-10-09 23:44 ` [PATCH 2/6] index-pack: remove redundant parameter Jonathan Tan
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 32+ messages in thread
From: Jonathan Tan @ 2019-10-09 23:44 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, peff, mh

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
 builtin/index-pack.c | 10 +---------
 1 file changed, 1 insertion(+), 9 deletions(-)

diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index a23454da6e..1a2600d4b3 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -1210,15 +1210,7 @@ static void resolve_deltas(void)
 		cleanup_thread();
 		return;
 	}
-
-	for (i = 0; i < nr_objects; i++) {
-		struct object_entry *obj = &objects[i];
-
-		if (is_delta_type(obj->type))
-			continue;
-		resolve_base(obj);
-		display_progress(progress, nr_resolved_deltas);
-	}
+	threaded_second_pass(&nothread_data);
 }
 
 /*
-- 
2.23.0.581.g78d2f28ef7-goog


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 2/6] index-pack: remove redundant parameter
  2019-10-09 23:44 [RFC PATCH 0/6] Better threaded delta resolution in index-pack Jonathan Tan
  2019-10-09 23:44 ` [PATCH 1/6] index-pack: unify threaded and unthreaded code Jonathan Tan
@ 2019-10-09 23:44 ` Jonathan Tan
  2019-10-17  6:21   ` Jeff King
  2019-10-09 23:44 ` [PATCH 3/6] index-pack: remove redundant child field Jonathan Tan
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 32+ messages in thread
From: Jonathan Tan @ 2019-10-09 23:44 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, peff, mh

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
 builtin/index-pack.c | 26 ++++++++++++--------------
 1 file changed, 12 insertions(+), 14 deletions(-)

diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 1a2600d4b3..99f6e2957f 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -614,7 +614,7 @@ static int compare_ofs_delta_bases(off_t offset1, off_t offset2,
 	       0;
 }
 
-static int find_ofs_delta(const off_t offset, enum object_type type)
+static int find_ofs_delta(const off_t offset)
 {
 	int first = 0, last = nr_ofs_deltas;
 
@@ -624,7 +624,8 @@ static int find_ofs_delta(const off_t offset, enum object_type type)
 		int cmp;
 
 		cmp = compare_ofs_delta_bases(offset, delta->offset,
-					      type, objects[delta->obj_no].type);
+					      OBJ_OFS_DELTA,
+					      objects[delta->obj_no].type);
 		if (!cmp)
 			return next;
 		if (cmp < 0) {
@@ -637,10 +638,9 @@ static int find_ofs_delta(const off_t offset, enum object_type type)
 }
 
 static void find_ofs_delta_children(off_t offset,
-				    int *first_index, int *last_index,
-				    enum object_type type)
+				    int *first_index, int *last_index)
 {
-	int first = find_ofs_delta(offset, type);
+	int first = find_ofs_delta(offset);
 	int last = first;
 	int end = nr_ofs_deltas - 1;
 
@@ -668,7 +668,7 @@ static int compare_ref_delta_bases(const struct object_id *oid1,
 	return oidcmp(oid1, oid2);
 }
 
-static int find_ref_delta(const struct object_id *oid, enum object_type type)
+static int find_ref_delta(const struct object_id *oid)
 {
 	int first = 0, last = nr_ref_deltas;
 
@@ -678,7 +678,8 @@ static int find_ref_delta(const struct object_id *oid, enum object_type type)
 		int cmp;
 
 		cmp = compare_ref_delta_bases(oid, &delta->oid,
-					      type, objects[delta->obj_no].type);
+					      OBJ_REF_DELTA,
+					      objects[delta->obj_no].type);
 		if (!cmp)
 			return next;
 		if (cmp < 0) {
@@ -691,10 +692,9 @@ static int find_ref_delta(const struct object_id *oid, enum object_type type)
 }
 
 static void find_ref_delta_children(const struct object_id *oid,
-				    int *first_index, int *last_index,
-				    enum object_type type)
+				    int *first_index, int *last_index)
 {
-	int first = find_ref_delta(oid, type);
+	int first = find_ref_delta(oid);
 	int last = first;
 	int end = nr_ref_deltas - 1;
 
@@ -982,12 +982,10 @@ static struct base_data *find_unresolved_deltas_1(struct base_data *base,
 {
 	if (base->ref_last == -1 && base->ofs_last == -1) {
 		find_ref_delta_children(&base->obj->idx.oid,
-					&base->ref_first, &base->ref_last,
-					OBJ_REF_DELTA);
+					&base->ref_first, &base->ref_last);
 
 		find_ofs_delta_children(base->obj->idx.offset,
-					&base->ofs_first, &base->ofs_last,
-					OBJ_OFS_DELTA);
+					&base->ofs_first, &base->ofs_last);
 
 		if (base->ref_last == -1 && base->ofs_last == -1) {
 			free(base->data);
-- 
2.23.0.581.g78d2f28ef7-goog


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 3/6] index-pack: remove redundant child field
  2019-10-09 23:44 [RFC PATCH 0/6] Better threaded delta resolution in index-pack Jonathan Tan
  2019-10-09 23:44 ` [PATCH 1/6] index-pack: unify threaded and unthreaded code Jonathan Tan
  2019-10-09 23:44 ` [PATCH 2/6] index-pack: remove redundant parameter Jonathan Tan
@ 2019-10-09 23:44 ` Jonathan Tan
  2019-10-10 14:45   ` Derrick Stolee
  2019-10-17  6:26   ` Jeff King
  2019-10-09 23:44 ` [PATCH 4/6] index-pack: calculate {ref,ofs}_{first,last} early Jonathan Tan
                   ` (3 subsequent siblings)
  6 siblings, 2 replies; 32+ messages in thread
From: Jonathan Tan @ 2019-10-09 23:44 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, peff, mh

Instead, recompute ancestry if we ever need to reclaim memory.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
 builtin/index-pack.c | 41 ++++++++++++++++++++++-------------------
 1 file changed, 22 insertions(+), 19 deletions(-)

diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 99f6e2957f..35f7f9e52b 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -34,7 +34,6 @@ struct object_stat {
 
 struct base_data {
 	struct base_data *base;
-	struct base_data *child;
 	struct object_entry *obj;
 	void *data;
 	unsigned long size;
@@ -44,7 +43,6 @@ struct base_data {
 
 struct thread_local {
 	pthread_t thread;
-	struct base_data *base_cache;
 	size_t base_cache_used;
 	int pack_fd;
 };
@@ -380,27 +378,37 @@ static void free_base_data(struct base_data *c)
 	}
 }
 
-static void prune_base_data(struct base_data *retain)
+static void prune_base_data(struct base_data *youngest_child)
 {
 	struct base_data *b;
 	struct thread_local *data = get_thread_data();
-	for (b = data->base_cache;
-	     data->base_cache_used > delta_base_cache_limit && b;
-	     b = b->child) {
-		if (b->data && b != retain)
-			free_base_data(b);
+	struct base_data **ancestry = NULL;
+	int nr = 0, alloc = 0;
+	int i;
+
+	if (data->base_cache_used <= delta_base_cache_limit)
+		return;
+
+	/*
+	 * Free all ancestors of youngest_child until we have enough space,
+	 * starting with the oldest. (We cannot free youngest_child itself.)
+	 */
+	for (b = youngest_child->base; b != NULL; b = b->base) {
+		ALLOC_GROW(ancestry, nr + 1, alloc);
+		ancestry[nr++] = b;
+	}
+	for (i = nr - 1;
+	     i >= 0 && data->base_cache_used > delta_base_cache_limit;
+	     i--) {
+		if (ancestry[i]->data)
+			free_base_data(ancestry[i]);
 	}
+	free(ancestry);
 }
 
 static void link_base_data(struct base_data *base, struct base_data *c)
 {
-	if (base)
-		base->child = c;
-	else
-		get_thread_data()->base_cache = c;
-
 	c->base = base;
-	c->child = NULL;
 	if (c->data)
 		get_thread_data()->base_cache_used += c->size;
 	prune_base_data(c);
@@ -408,11 +416,6 @@ static void link_base_data(struct base_data *base, struct base_data *c)
 
 static void unlink_base_data(struct base_data *c)
 {
-	struct base_data *base = c->base;
-	if (base)
-		base->child = NULL;
-	else
-		get_thread_data()->base_cache = NULL;
 	free_base_data(c);
 }
 
-- 
2.23.0.581.g78d2f28ef7-goog


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 4/6] index-pack: calculate {ref,ofs}_{first,last} early
  2019-10-09 23:44 [RFC PATCH 0/6] Better threaded delta resolution in index-pack Jonathan Tan
                   ` (2 preceding siblings ...)
  2019-10-09 23:44 ` [PATCH 3/6] index-pack: remove redundant child field Jonathan Tan
@ 2019-10-09 23:44 ` Jonathan Tan
  2019-10-17  6:30   ` Jeff King
  2019-10-09 23:44 ` [PATCH 5/6] index-pack: make resolve_delta() assume base data Jonathan Tan
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 32+ messages in thread
From: Jonathan Tan @ 2019-10-09 23:44 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, peff, mh

Whenever we make a struct base_data, immediately calculate its delta
children. This eliminates confusion as to when the
{ref,ofs}_{first,last} fields are initialized.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
 builtin/index-pack.c | 125 +++++++++++++++++++++----------------------
 1 file changed, 61 insertions(+), 64 deletions(-)

diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 35f7f9e52b..17997834ec 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -33,12 +33,15 @@ struct object_stat {
 };
 
 struct base_data {
+	/* Initialized by make_base(). */
 	struct base_data *base;
 	struct object_entry *obj;
-	void *data;
-	unsigned long size;
 	int ref_first, ref_last;
 	int ofs_first, ofs_last;
+
+	/* Not initialized by make_base(). */
+	void *data;
+	unsigned long size;
 };
 
 struct thread_local {
@@ -362,14 +365,6 @@ static void set_thread_data(struct thread_local *data)
 		pthread_setspecific(key, data);
 }
 
-static struct base_data *alloc_base_data(void)
-{
-	struct base_data *base = xcalloc(1, sizeof(struct base_data));
-	base->ref_last = -1;
-	base->ofs_last = -1;
-	return base;
-}
-
 static void free_base_data(struct base_data *c)
 {
 	if (c->data) {
@@ -406,19 +401,6 @@ static void prune_base_data(struct base_data *youngest_child)
 	free(ancestry);
 }
 
-static void link_base_data(struct base_data *base, struct base_data *c)
-{
-	c->base = base;
-	if (c->data)
-		get_thread_data()->base_cache_used += c->size;
-	prune_base_data(c);
-}
-
-static void unlink_base_data(struct base_data *c)
-{
-	free_base_data(c);
-}
-
 static int is_delta_type(enum object_type type)
 {
 	return (type == OBJ_REF_DELTA || type == OBJ_OFS_DELTA);
@@ -928,10 +910,25 @@ static void *get_base_data(struct base_data *c)
 	return c->data;
 }
 
-static void resolve_delta(struct object_entry *delta_obj,
-			  struct base_data *base, struct base_data *result)
+static struct base_data *make_base(struct object_entry *obj,
+				   struct base_data *parent)
 {
-	void *base_data, *delta_data;
+	struct base_data *base = xcalloc(1, sizeof(struct base_data));
+	base->base = parent;
+	base->obj = obj;
+	find_ref_delta_children(&obj->idx.oid,
+				&base->ref_first, &base->ref_last);
+	find_ofs_delta_children(obj->idx.offset,
+				&base->ofs_first, &base->ofs_last);
+	return base;
+}
+
+static struct base_data *resolve_delta(struct object_entry *delta_obj,
+				       struct base_data *base)
+{
+	void *base_data, *delta_data, *result_data;
+	struct base_data *result;
+	unsigned long result_size;
 
 	if (show_stat) {
 		int i = delta_obj - objects;
@@ -945,19 +942,31 @@ static void resolve_delta(struct object_entry *delta_obj,
 	}
 	delta_data = get_data_from_pack(delta_obj);
 	base_data = get_base_data(base);
-	result->obj = delta_obj;
-	result->data = patch_delta(base_data, base->size,
-				   delta_data, delta_obj->size, &result->size);
+	result_data = patch_delta(base_data, base->size,
+				  delta_data, delta_obj->size, &result_size);
 	free(delta_data);
-	if (!result->data)
+	if (!result_data)
 		bad_object(delta_obj->idx.offset, _("failed to apply delta"));
-	hash_object_file(result->data, result->size,
+	hash_object_file(result_data, result_size,
 			 type_name(delta_obj->real_type), &delta_obj->idx.oid);
-	sha1_object(result->data, NULL, result->size, delta_obj->real_type,
+	sha1_object(result_data, NULL, result_size, delta_obj->real_type,
 		    &delta_obj->idx.oid);
+
+	result = make_base(delta_obj, base);
+	if (result->ref_last == -1 && result->ofs_last == -1) {
+		free(result_data);
+	} else {
+		result->data = result_data;
+		result->size = result_size;
+		get_thread_data()->base_cache_used += result->size;
+		prune_base_data(result);
+	}
+
 	counter_lock();
 	nr_resolved_deltas++;
 	counter_unlock();
+
+	return result;
 }
 
 /*
@@ -983,30 +992,15 @@ static int compare_and_swap_type(signed char *type,
 static struct base_data *find_unresolved_deltas_1(struct base_data *base,
 						  struct base_data *prev_base)
 {
-	if (base->ref_last == -1 && base->ofs_last == -1) {
-		find_ref_delta_children(&base->obj->idx.oid,
-					&base->ref_first, &base->ref_last);
-
-		find_ofs_delta_children(base->obj->idx.offset,
-					&base->ofs_first, &base->ofs_last);
-
-		if (base->ref_last == -1 && base->ofs_last == -1) {
-			free(base->data);
-			return NULL;
-		}
-
-		link_base_data(prev_base, base);
-	}
-
 	if (base->ref_first <= base->ref_last) {
 		struct object_entry *child = objects + ref_deltas[base->ref_first].obj_no;
-		struct base_data *result = alloc_base_data();
+		struct base_data *result;
 
 		if (!compare_and_swap_type(&child->real_type, OBJ_REF_DELTA,
 					   base->obj->real_type))
 			BUG("child->real_type != OBJ_REF_DELTA");
 
-		resolve_delta(child, base, result);
+		result = resolve_delta(child, base);
 		if (base->ref_first == base->ref_last && base->ofs_last == -1)
 			free_base_data(base);
 
@@ -1016,11 +1010,11 @@ static struct base_data *find_unresolved_deltas_1(struct base_data *base,
 
 	if (base->ofs_first <= base->ofs_last) {
 		struct object_entry *child = objects + ofs_deltas[base->ofs_first].obj_no;
-		struct base_data *result = alloc_base_data();
+		struct base_data *result;
 
 		assert(child->real_type == OBJ_OFS_DELTA);
 		child->real_type = base->obj->real_type;
-		resolve_delta(child, base, result);
+		result = resolve_delta(child, base);
 		if (base->ofs_first == base->ofs_last)
 			free_base_data(base);
 
@@ -1028,7 +1022,7 @@ static struct base_data *find_unresolved_deltas_1(struct base_data *base,
 		return result;
 	}
 
-	unlink_base_data(base);
+	free_base_data(base);
 	return NULL;
 }
 
@@ -1071,9 +1065,8 @@ static int compare_ref_delta_entry(const void *a, const void *b)
 
 static void resolve_base(struct object_entry *obj)
 {
-	struct base_data *base_obj = alloc_base_data();
-	base_obj->obj = obj;
-	base_obj->data = NULL;
+	struct base_data *base_obj = make_base(obj, NULL);
+
 	find_unresolved_deltas(base_obj);
 }
 
@@ -1367,21 +1360,25 @@ static void fix_unresolved_deltas(struct hashfile *f)
 	for (i = 0; i < nr_ref_deltas; i++) {
 		struct ref_delta_entry *d = sorted_by_pos[i];
 		enum object_type type;
-		struct base_data *base_obj = alloc_base_data();
+		struct base_data *base;
+		void *data;
+		unsigned long size;
+		struct object_entry *obj;
 
 		if (objects[d->obj_no].real_type != OBJ_REF_DELTA)
 			continue;
-		base_obj->data = read_object_file(&d->oid, &type,
-						  &base_obj->size);
-		if (!base_obj->data)
+		data = read_object_file(&d->oid, &type, &size);
+		if (!data)
 			continue;
 
-		if (check_object_signature(&d->oid, base_obj->data,
-				base_obj->size, type_name(type)))
+		if (check_object_signature(&d->oid, data,
+					   size, type_name(type)))
 			die(_("local object %s is corrupt"), oid_to_hex(&d->oid));
-		base_obj->obj = append_obj_to_pack(f, d->oid.hash,
-					base_obj->data, base_obj->size, type);
-		find_unresolved_deltas(base_obj);
+		obj = append_obj_to_pack(f, d->oid.hash, data, size, type);
+		base = make_base(obj, NULL);
+		base->data = data;
+		base->size = size;
+		find_unresolved_deltas(base);
 		display_progress(progress, nr_resolved_deltas);
 	}
 	free(sorted_by_pos);
-- 
2.23.0.581.g78d2f28ef7-goog


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 5/6] index-pack: make resolve_delta() assume base data
  2019-10-09 23:44 [RFC PATCH 0/6] Better threaded delta resolution in index-pack Jonathan Tan
                   ` (3 preceding siblings ...)
  2019-10-09 23:44 ` [PATCH 4/6] index-pack: calculate {ref,ofs}_{first,last} early Jonathan Tan
@ 2019-10-09 23:44 ` Jonathan Tan
  2019-10-17  6:32   ` Jeff King
  2019-10-09 23:44 ` [PATCH 6/6] index-pack: make quantum of work smaller Jonathan Tan
  2019-10-17 20:17 ` [PATCH v2 0/7] Better threaded delta resolution in index-pack Jonathan Tan
  6 siblings, 1 reply; 32+ messages in thread
From: Jonathan Tan @ 2019-10-09 23:44 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, peff, mh

A subsequent commit will make the quantum of work smaller, necessitating
more locking. This commit allows resolve_delta() to be called outside
the lock.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
 builtin/index-pack.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 17997834ec..3908cd3115 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -926,7 +926,7 @@ static struct base_data *make_base(struct object_entry *obj,
 static struct base_data *resolve_delta(struct object_entry *delta_obj,
 				       struct base_data *base)
 {
-	void *base_data, *delta_data, *result_data;
+	void *delta_data, *result_data;
 	struct base_data *result;
 	unsigned long result_size;
 
@@ -941,8 +941,8 @@ static struct base_data *resolve_delta(struct object_entry *delta_obj,
 		obj_stat[i].base_object_no = j;
 	}
 	delta_data = get_data_from_pack(delta_obj);
-	base_data = get_base_data(base);
-	result_data = patch_delta(base_data, base->size,
+	assert(base->data);
+	result_data = patch_delta(base->data, base->size,
 				  delta_data, delta_obj->size, &result_size);
 	free(delta_data);
 	if (!result_data)
@@ -1000,6 +1000,7 @@ static struct base_data *find_unresolved_deltas_1(struct base_data *base,
 					   base->obj->real_type))
 			BUG("child->real_type != OBJ_REF_DELTA");
 
+		get_base_data(base);
 		result = resolve_delta(child, base);
 		if (base->ref_first == base->ref_last && base->ofs_last == -1)
 			free_base_data(base);
@@ -1014,6 +1015,7 @@ static struct base_data *find_unresolved_deltas_1(struct base_data *base,
 
 		assert(child->real_type == OBJ_OFS_DELTA);
 		child->real_type = base->obj->real_type;
+		get_base_data(base);
 		result = resolve_delta(child, base);
 		if (base->ofs_first == base->ofs_last)
 			free_base_data(base);
-- 
2.23.0.581.g78d2f28ef7-goog


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 6/6] index-pack: make quantum of work smaller
  2019-10-09 23:44 [RFC PATCH 0/6] Better threaded delta resolution in index-pack Jonathan Tan
                   ` (4 preceding siblings ...)
  2019-10-09 23:44 ` [PATCH 5/6] index-pack: make resolve_delta() assume base data Jonathan Tan
@ 2019-10-09 23:44 ` Jonathan Tan
  2019-10-17  6:35   ` Jeff King
  2019-10-17 20:17 ` [PATCH v2 0/7] Better threaded delta resolution in index-pack Jonathan Tan
  6 siblings, 1 reply; 32+ messages in thread
From: Jonathan Tan @ 2019-10-09 23:44 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, peff, mh

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
 builtin/index-pack.c | 267 ++++++++++++++++++++-----------------------
 1 file changed, 127 insertions(+), 140 deletions(-)

diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 3908cd3115..f6318037ca 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -38,15 +38,22 @@ struct base_data {
 	struct object_entry *obj;
 	int ref_first, ref_last;
 	int ofs_first, ofs_last;
+	int retain_data;
+	int children_remaining;
 
 	/* Not initialized by make_base(). */
+	struct list_head list;
 	void *data;
 	unsigned long size;
 };
 
+LIST_HEAD(work_head);
+LIST_HEAD(done_head);
+size_t base_cache_used;
+size_t base_cache_limit;
+
 struct thread_local {
 	pthread_t thread;
-	size_t base_cache_used;
 	int pack_fd;
 };
 
@@ -369,36 +376,38 @@ static void free_base_data(struct base_data *c)
 {
 	if (c->data) {
 		FREE_AND_NULL(c->data);
-		get_thread_data()->base_cache_used -= c->size;
+		base_cache_used -= c->size;
 	}
 }
 
-static void prune_base_data(struct base_data *youngest_child)
+static void prune_base_data(struct base_data *retain)
 {
-	struct base_data *b;
-	struct thread_local *data = get_thread_data();
-	struct base_data **ancestry = NULL;
-	int nr = 0, alloc = 0;
-	int i;
+	struct list_head *pos;
 
-	if (data->base_cache_used <= delta_base_cache_limit)
+	if (base_cache_used <= base_cache_limit)
 		return;
 
-	/*
-	 * Free all ancestors of youngest_child until we have enough space,
-	 * starting with the oldest. (We cannot free youngest_child itself.)
-	 */
-	for (b = youngest_child->base; b != NULL; b = b->base) {
-		ALLOC_GROW(ancestry, nr + 1, alloc);
-		ancestry[nr++] = b;
+	list_for_each_prev(pos, &done_head) {
+		struct base_data *b = list_entry(pos, struct base_data, list);
+		if (b->retain_data || b == retain)
+			continue;
+		if (b->data) {
+			free_base_data(b);
+			if (base_cache_used <= base_cache_limit)
+				return;
+		}
 	}
-	for (i = nr - 1;
-	     i >= 0 && data->base_cache_used > delta_base_cache_limit;
-	     i--) {
-		if (ancestry[i]->data)
-			free_base_data(ancestry[i]);
+
+	list_for_each_prev(pos, &work_head) {
+		struct base_data *b = list_entry(pos, struct base_data, list);
+		if (b->retain_data || b == retain)
+			continue;
+		if (b->data) {
+			free_base_data(b);
+			if (base_cache_used <= base_cache_limit)
+				return;
+		}
 	}
-	free(ancestry);
 }
 
 static int is_delta_type(enum object_type type)
@@ -886,7 +895,7 @@ static void *get_base_data(struct base_data *c)
 		if (!delta_nr) {
 			c->data = get_data_from_pack(obj);
 			c->size = obj->size;
-			get_thread_data()->base_cache_used += c->size;
+			base_cache_used += c->size;
 			prune_base_data(c);
 		}
 		for (; delta_nr > 0; delta_nr--) {
@@ -902,7 +911,7 @@ static void *get_base_data(struct base_data *c)
 			free(raw);
 			if (!c->data)
 				bad_object(obj->idx.offset, _("failed to apply delta"));
-			get_thread_data()->base_cache_used += c->size;
+			base_cache_used += c->size;
 			prune_base_data(c);
 		}
 		free(delta);
@@ -920,6 +929,8 @@ static struct base_data *make_base(struct object_entry *obj,
 				&base->ref_first, &base->ref_last);
 	find_ofs_delta_children(obj->idx.offset,
 				&base->ofs_first, &base->ofs_last);
+	base->children_remaining = base->ref_last - base->ref_first +
+		base->ofs_last - base->ofs_first + 2;
 	return base;
 }
 
@@ -953,14 +964,8 @@ static struct base_data *resolve_delta(struct object_entry *delta_obj,
 		    &delta_obj->idx.oid);
 
 	result = make_base(delta_obj, base);
-	if (result->ref_last == -1 && result->ofs_last == -1) {
-		free(result_data);
-	} else {
-		result->data = result_data;
-		result->size = result_size;
-		get_thread_data()->base_cache_used += result->size;
-		prune_base_data(result);
-	}
+	result->data = result_data;
+	result->size = result_size;
 
 	counter_lock();
 	nr_resolved_deltas++;
@@ -969,84 +974,6 @@ static struct base_data *resolve_delta(struct object_entry *delta_obj,
 	return result;
 }
 
-/*
- * Standard boolean compare-and-swap: atomically check whether "*type" is
- * "want"; if so, swap in "set" and return true. Otherwise, leave it untouched
- * and return false.
- */
-static int compare_and_swap_type(signed char *type,
-				 enum object_type want,
-				 enum object_type set)
-{
-	enum object_type old;
-
-	type_cas_lock();
-	old = *type;
-	if (old == want)
-		*type = set;
-	type_cas_unlock();
-
-	return old == want;
-}
-
-static struct base_data *find_unresolved_deltas_1(struct base_data *base,
-						  struct base_data *prev_base)
-{
-	if (base->ref_first <= base->ref_last) {
-		struct object_entry *child = objects + ref_deltas[base->ref_first].obj_no;
-		struct base_data *result;
-
-		if (!compare_and_swap_type(&child->real_type, OBJ_REF_DELTA,
-					   base->obj->real_type))
-			BUG("child->real_type != OBJ_REF_DELTA");
-
-		get_base_data(base);
-		result = resolve_delta(child, base);
-		if (base->ref_first == base->ref_last && base->ofs_last == -1)
-			free_base_data(base);
-
-		base->ref_first++;
-		return result;
-	}
-
-	if (base->ofs_first <= base->ofs_last) {
-		struct object_entry *child = objects + ofs_deltas[base->ofs_first].obj_no;
-		struct base_data *result;
-
-		assert(child->real_type == OBJ_OFS_DELTA);
-		child->real_type = base->obj->real_type;
-		get_base_data(base);
-		result = resolve_delta(child, base);
-		if (base->ofs_first == base->ofs_last)
-			free_base_data(base);
-
-		base->ofs_first++;
-		return result;
-	}
-
-	free_base_data(base);
-	return NULL;
-}
-
-static void find_unresolved_deltas(struct base_data *base)
-{
-	struct base_data *new_base, *prev_base = NULL;
-	for (;;) {
-		new_base = find_unresolved_deltas_1(base, prev_base);
-
-		if (new_base) {
-			prev_base = base;
-			base = new_base;
-		} else {
-			free(base);
-			base = prev_base;
-			if (!base)
-				return;
-			prev_base = base->base;
-		}
-	}
-}
-
 static int compare_ofs_delta_entry(const void *a, const void *b)
 {
 	const struct ofs_delta_entry *delta_a = a;
@@ -1065,33 +992,97 @@ static int compare_ref_delta_entry(const void *a, const void *b)
 	return oidcmp(&delta_a->oid, &delta_b->oid);
 }
 
-static void resolve_base(struct object_entry *obj)
+static void *do_work(void *data)
 {
-	struct base_data *base_obj = make_base(obj, NULL);
-
-	find_unresolved_deltas(base_obj);
-}
-
-static void *threaded_second_pass(void *data)
-{
-	set_thread_data(data);
+	if (data)
+		set_thread_data(data);
 	for (;;) {
-		int i;
-		counter_lock();
-		display_progress(progress, nr_resolved_deltas);
-		counter_unlock();
+		struct base_data *parent = NULL;
+		struct object_entry *child_obj;
+		struct base_data *child;
+
 		work_lock();
-		while (nr_dispatched < nr_objects &&
-		       is_delta_type(objects[nr_dispatched].type))
-			nr_dispatched++;
-		if (nr_dispatched >= nr_objects) {
-			work_unlock();
-			break;
+		if (list_empty(&work_head)) {
+			while (nr_dispatched < nr_objects &&
+			       is_delta_type(objects[nr_dispatched].type))
+				nr_dispatched++;
+			if (nr_dispatched >= nr_objects) {
+				work_unlock();
+				break;
+			}
+			child_obj = &objects[nr_dispatched++];
+		} else {
+			parent = list_first_entry(&work_head, struct base_data,
+						  list);
+
+			if (parent->ref_first <= parent->ref_last) {
+				child_obj = objects +
+					ref_deltas[parent->ref_first++].obj_no;
+				assert(child_obj->real_type == OBJ_REF_DELTA);
+				child_obj->real_type = parent->obj->real_type;
+			} else {
+				child_obj = objects +
+					ofs_deltas[parent->ofs_first++].obj_no;
+				assert(child_obj->real_type == OBJ_OFS_DELTA);
+				child_obj->real_type = parent->obj->real_type;
+			}
+
+			if (parent->ref_first > parent->ref_last &&
+			    parent->ofs_first > parent->ofs_last) {
+				list_del(&parent->list);
+				list_add(&parent->list, &done_head);
+			}
+
+			get_base_data(parent);
+			parent->retain_data++;
 		}
-		i = nr_dispatched++;
 		work_unlock();
 
-		resolve_base(&objects[i]);
+		if (parent) {
+			child = resolve_delta(child_obj, parent);
+			if (!child->children_remaining)
+				FREE_AND_NULL(child->data);
+		} else {
+			child = make_base(child_obj, NULL);
+			if (child->children_remaining) {
+				/*
+				 * Since this child has its own delta children,
+				 * we will need this data in the future.
+				 * Inflate now so that future iterations will
+				 * have access to this object's data while
+				 * outside the work mutex.
+				 */
+				child->data = get_data_from_pack(child_obj);
+				child->size = child_obj->size;
+			}
+		}
+
+		work_lock();
+		if (parent)
+			parent->retain_data--;
+		if (child->data) {
+			list_add(&child->list, &work_head);
+			base_cache_used += child->size;
+			prune_base_data(NULL);
+		} else {
+			struct base_data *p = parent;
+
+			while (p) {
+				struct base_data *next_p;
+
+				p->children_remaining--;
+				if (p->children_remaining)
+					break;
+
+				next_p = p->base;
+				free_base_data(p);
+				list_del(&p->list);
+				free(p);
+
+				p = next_p;
+			}
+		}
+		work_unlock();
 	}
 	return NULL;
 }
@@ -1192,11 +1183,12 @@ static void resolve_deltas(void)
 					  nr_ref_deltas + nr_ofs_deltas);
 
 	nr_dispatched = 0;
+	base_cache_limit = delta_base_cache_limit * nr_threads;
 	if (nr_threads > 1 || getenv("GIT_FORCE_THREADS")) {
 		init_thread();
 		for (i = 0; i < nr_threads; i++) {
 			int ret = pthread_create(&thread_data[i].thread, NULL,
-						 threaded_second_pass, thread_data + i);
+						 do_work, thread_data + i);
 			if (ret)
 				die(_("unable to create thread: %s"),
 				    strerror(ret));
@@ -1206,7 +1198,7 @@ static void resolve_deltas(void)
 		cleanup_thread();
 		return;
 	}
-	threaded_second_pass(&nothread_data);
+	do_work(&nothread_data);
 }
 
 /*
@@ -1362,10 +1354,8 @@ static void fix_unresolved_deltas(struct hashfile *f)
 	for (i = 0; i < nr_ref_deltas; i++) {
 		struct ref_delta_entry *d = sorted_by_pos[i];
 		enum object_type type;
-		struct base_data *base;
 		void *data;
 		unsigned long size;
-		struct object_entry *obj;
 
 		if (objects[d->obj_no].real_type != OBJ_REF_DELTA)
 			continue;
@@ -1376,11 +1366,8 @@ static void fix_unresolved_deltas(struct hashfile *f)
 		if (check_object_signature(&d->oid, data,
 					   size, type_name(type)))
 			die(_("local object %s is corrupt"), oid_to_hex(&d->oid));
-		obj = append_obj_to_pack(f, d->oid.hash, data, size, type);
-		base = make_base(obj, NULL);
-		base->data = data;
-		base->size = size;
-		find_unresolved_deltas(base);
+		append_obj_to_pack(f, d->oid.hash, data, size, type);
+		do_work(NULL); /* will pick up new object in objects array (added by append_obj_to_pack) */
 		display_progress(progress, nr_resolved_deltas);
 	}
 	free(sorted_by_pos);
-- 
2.23.0.581.g78d2f28ef7-goog


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [PATCH 3/6] index-pack: remove redundant child field
  2019-10-09 23:44 ` [PATCH 3/6] index-pack: remove redundant child field Jonathan Tan
@ 2019-10-10 14:45   ` Derrick Stolee
  2019-10-10 19:02     ` Jonathan Tan
  2019-10-17  6:26   ` Jeff King
  1 sibling, 1 reply; 32+ messages in thread
From: Derrick Stolee @ 2019-10-10 14:45 UTC (permalink / raw)
  To: Jonathan Tan, git; +Cc: peff, mh

On 10/9/2019 7:44 PM, Jonathan Tan wrote:
> Instead, recompute ancestry if we ever need to reclaim memory.

I find this message lacking in important details:

1. Where do we recompute ancestry?
2. What are the performance implications of this change?
3. Why is it important that you construct a stack of deltas in prune_base_data()?

> Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
> ---
>  builtin/index-pack.c | 41 ++++++++++++++++++++++-------------------
>  1 file changed, 22 insertions(+), 19 deletions(-)
> 
> diff --git a/builtin/index-pack.c b/builtin/index-pack.c
> index 99f6e2957f..35f7f9e52b 100644
> --- a/builtin/index-pack.c
> +++ b/builtin/index-pack.c
> @@ -34,7 +34,6 @@ struct object_stat {
>  
>  struct base_data {
>  	struct base_data *base;
> -	struct base_data *child;
>  	struct object_entry *obj;
>  	void *data;
>  	unsigned long size;
> @@ -44,7 +43,6 @@ struct base_data {
>  
>  struct thread_local {
>  	pthread_t thread;
> -	struct base_data *base_cache;
>  	size_t base_cache_used;
>  	int pack_fd;
>  };
> @@ -380,27 +378,37 @@ static void free_base_data(struct base_data *c)
>  	}
>  }
>  
> -static void prune_base_data(struct base_data *retain)
> +static void prune_base_data(struct base_data *youngest_child)
>  {
>  	struct base_data *b;
>  	struct thread_local *data = get_thread_data();
> -	for (b = data->base_cache;
> -	     data->base_cache_used > delta_base_cache_limit && b;
> -	     b = b->child) {
> -		if (b->data && b != retain)
> -			free_base_data(b);
> +	struct base_data **ancestry = NULL;
> +	int nr = 0, alloc = 0;
> +	int i;
> +
> +	if (data->base_cache_used <= delta_base_cache_limit)
> +		return;
> +
> +	/*
> +	 * Free all ancestors of youngest_child until we have enough space,
> +	 * starting with the oldest. (We cannot free youngest_child itself.)
> +	 */
> +	for (b = youngest_child->base; b != NULL; b = b->base) {
> +		ALLOC_GROW(ancestry, nr + 1, alloc);
> +		ancestry[nr++] = b;
> +	}
> +	for (i = nr - 1;
> +	     i >= 0 && data->base_cache_used > delta_base_cache_limit;
> +	     i--) {
> +		if (ancestry[i]->data)
> +			free_base_data(ancestry[i]);
>  	}
> +	free(ancestry);
>  }
>  
>  static void link_base_data(struct base_data *base, struct base_data *c)
>  {
> -	if (base)
> -		base->child = c;
> -	else
> -		get_thread_data()->base_cache = c;
> -
>  	c->base = base;
> -	c->child = NULL;
>  	if (c->data)
>  		get_thread_data()->base_cache_used += c->size;
>  	prune_base_data(c);
> @@ -408,11 +416,6 @@ static void link_base_data(struct base_data *base, struct base_data *c)
>  
>  static void unlink_base_data(struct base_data *c)
>  {
> -	struct base_data *base = c->base;
> -	if (base)
> -		base->child = NULL;
> -	else
> -		get_thread_data()->base_cache = NULL;
>  	free_base_data(c);
>  }

Seems like this method should be removed and all callers should
call free_base_data() instead.

-Stolee


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 3/6] index-pack: remove redundant child field
  2019-10-10 14:45   ` Derrick Stolee
@ 2019-10-10 19:02     ` Jonathan Tan
  2019-10-17  6:24       ` Jeff King
  0 siblings, 1 reply; 32+ messages in thread
From: Jonathan Tan @ 2019-10-10 19:02 UTC (permalink / raw)
  To: stolee; +Cc: jonathantanmy, git, peff, mh

> On 10/9/2019 7:44 PM, Jonathan Tan wrote:
> > Instead, recompute ancestry if we ever need to reclaim memory.
> 
> I find this message lacking in important details:
> 
> 1. Where do we recompute ancestry?
> 2. What are the performance implications of this change?
> 3. Why is it important that you construct a stack of deltas in prune_base_data()?

Thanks for taking a look at this. My original plan (as I perhaps badly
explained in the cover letter [1]) was to show the individual small
steps that I took to reach the end goal, each step still passing all
tests, in the hope that small steps are easier to understand than one
big one. Hence why I didn't explain much in this commit message (and
others), because I thought that I might have to squash them later. But
perhaps that is too confusing and I should have just squashed them in
the first place (and explain all the changes in the commit message -
it's +177 -198, which is not too big anyway).

To answer the question anyway, the short answer is that it doesn't
matter because I'm going to replace this mechanism in later patches. But
a longer answer:

 1. In prune_base_delta() (the stack of deltas you mention in question
    3).
 2. Slightly fewer pointer management during the normal course of
    operation, but an allocation if we ever need to reclaim memory.
 3. To recompute the ancestry. We have ancestry using the "base"
    pointer, but I need to iterate from the oldest to newest, so I
    create an array of all the "base" pointers and iterate in reverse.

[1] https://public-inbox.org/git/cover.1570663470.git.jonathantanmy@google.com/

> >  static void link_base_data(struct base_data *base, struct base_data *c)
> >  {
> > -	if (base)
> > -		base->child = c;
> > -	else
> > -		get_thread_data()->base_cache = c;
> > -
> >  	c->base = base;
> > -	c->child = NULL;
> >  	if (c->data)
> >  		get_thread_data()->base_cache_used += c->size;
> >  	prune_base_data(c);
> > @@ -408,11 +416,6 @@ static void link_base_data(struct base_data *base, struct base_data *c)
> >  
> >  static void unlink_base_data(struct base_data *c)
> >  {
> > -	struct base_data *base = c->base;
> > -	if (base)
> > -		base->child = NULL;
> > -	else
> > -		get_thread_data()->base_cache = NULL;
> >  	free_base_data(c);
> >  }
> 
> Seems like this method should be removed and all callers should
> call free_base_data() instead.

I agree, and did it in the next patch. Here I left it to preserve the
{link,unlink}_base_data symmetry.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 1/6] index-pack: unify threaded and unthreaded code
  2019-10-09 23:44 ` [PATCH 1/6] index-pack: unify threaded and unthreaded code Jonathan Tan
@ 2019-10-17  6:20   ` Jeff King
  0 siblings, 0 replies; 32+ messages in thread
From: Jeff King @ 2019-10-17  6:20 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git, mh

On Wed, Oct 09, 2019 at 04:44:17PM -0700, Jonathan Tan wrote:

> Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
> ---
>  builtin/index-pack.c | 10 +---------
>  1 file changed, 1 insertion(+), 9 deletions(-)
> 
> diff --git a/builtin/index-pack.c b/builtin/index-pack.c
> index a23454da6e..1a2600d4b3 100644
> --- a/builtin/index-pack.c
> +++ b/builtin/index-pack.c
> @@ -1210,15 +1210,7 @@ static void resolve_deltas(void)
>  		cleanup_thread();
>  		return;
>  	}
> -
> -	for (i = 0; i < nr_objects; i++) {
> -		struct object_entry *obj = &objects[i];
> -
> -		if (is_delta_type(obj->type))
> -			continue;
> -		resolve_base(obj);
> -		display_progress(progress, nr_resolved_deltas);
> -	}
> +	threaded_second_pass(&nothread_data);
>  }

I wondered at first if this would be a problem with NO_PTHREADS. But I
think since the cleanups around 2094c5e582 (index-pack: remove #ifdef
NO_PTHREADS, 2018-11-03), we always define threaded_second_pass(), even
if all the locks are noops and we only ever have one thread.

-Peff

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 2/6] index-pack: remove redundant parameter
  2019-10-09 23:44 ` [PATCH 2/6] index-pack: remove redundant parameter Jonathan Tan
@ 2019-10-17  6:21   ` Jeff King
  0 siblings, 0 replies; 32+ messages in thread
From: Jeff King @ 2019-10-17  6:21 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git, mh

On Wed, Oct 09, 2019 at 04:44:18PM -0700, Jonathan Tan wrote:

> Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
> ---
>  builtin/index-pack.c | 26 ++++++++++++--------------
>  1 file changed, 12 insertions(+), 14 deletions(-)

Yeah, this seems like a good cleanup. Probably worth explaining in the
commit message that we can infer it from the functions themselves.

-Peff

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 3/6] index-pack: remove redundant child field
  2019-10-10 19:02     ` Jonathan Tan
@ 2019-10-17  6:24       ` Jeff King
  0 siblings, 0 replies; 32+ messages in thread
From: Jeff King @ 2019-10-17  6:24 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: stolee, git, mh

On Thu, Oct 10, 2019 at 12:02:29PM -0700, Jonathan Tan wrote:

> > On 10/9/2019 7:44 PM, Jonathan Tan wrote:
> > > Instead, recompute ancestry if we ever need to reclaim memory.
> > 
> > I find this message lacking in important details:
> > 
> > 1. Where do we recompute ancestry?
> > 2. What are the performance implications of this change?
> > 3. Why is it important that you construct a stack of deltas in prune_base_data()?
> 
> Thanks for taking a look at this. My original plan (as I perhaps badly
> explained in the cover letter [1]) was to show the individual small
> steps that I took to reach the end goal, each step still passing all
> tests, in the hope that small steps are easier to understand than one
> big one. Hence why I didn't explain much in this commit message (and
> others), because I thought that I might have to squash them later. But
> perhaps that is too confusing and I should have just squashed them in
> the first place (and explain all the changes in the commit message -
> it's +177 -198, which is not too big anyway).

FWIW, I like the breakdown. These are tricky cleanups, and seeing them
individually makes it easy to verify that they don't themselves break
anything. I think they just need more explanation.

-Peff

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 3/6] index-pack: remove redundant child field
  2019-10-09 23:44 ` [PATCH 3/6] index-pack: remove redundant child field Jonathan Tan
  2019-10-10 14:45   ` Derrick Stolee
@ 2019-10-17  6:26   ` Jeff King
  1 sibling, 0 replies; 32+ messages in thread
From: Jeff King @ 2019-10-17  6:26 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git, mh

On Wed, Oct 09, 2019 at 04:44:19PM -0700, Jonathan Tan wrote:

> -static void prune_base_data(struct base_data *retain)
> +static void prune_base_data(struct base_data *youngest_child)
>  {
>  	struct base_data *b;
>  	struct thread_local *data = get_thread_data();
> -	for (b = data->base_cache;
> -	     data->base_cache_used > delta_base_cache_limit && b;
> -	     b = b->child) {
> -		if (b->data && b != retain)
> -			free_base_data(b);
> +	struct base_data **ancestry = NULL;
> +	int nr = 0, alloc = 0;

Minor, but please use size_t for allocation variables.

> +	int i;

Technically this probably ought to be a size_t as well, but I'm much
more concerned about the allocation ones, where we might accidentally
overflow and underallocate a buffer. Overflowing "i" would probably just
lead to an error or bad result.

I _think_ what the patch is actually doing makes sense (taking as an
assumption that it's heading in a useful direction for the end of the
series).

-Peff

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 4/6] index-pack: calculate {ref,ofs}_{first,last} early
  2019-10-09 23:44 ` [PATCH 4/6] index-pack: calculate {ref,ofs}_{first,last} early Jonathan Tan
@ 2019-10-17  6:30   ` Jeff King
  0 siblings, 0 replies; 32+ messages in thread
From: Jeff King @ 2019-10-17  6:30 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git, mh

On Wed, Oct 09, 2019 at 04:44:20PM -0700, Jonathan Tan wrote:

> Whenever we make a struct base_data, immediately calculate its delta
> children. This eliminates confusion as to when the
> {ref,ofs}_{first,last} fields are initialized.

That _seems_ like a good idea, but I'm a little worried just because I
don't entirely understand why it was being done lazily before. If you've
puzzled all that out, it would be nice to make the argument in the
commit message.

-Peff

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 5/6] index-pack: make resolve_delta() assume base data
  2019-10-09 23:44 ` [PATCH 5/6] index-pack: make resolve_delta() assume base data Jonathan Tan
@ 2019-10-17  6:32   ` Jeff King
  0 siblings, 0 replies; 32+ messages in thread
From: Jeff King @ 2019-10-17  6:32 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git, mh

On Wed, Oct 09, 2019 at 04:44:21PM -0700, Jonathan Tan wrote:

> A subsequent commit will make the quantum of work smaller, necessitating
> more locking. This commit allows resolve_delta() to be called outside
> the lock.

OK, that makes sense, I think.

-Peff

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 6/6] index-pack: make quantum of work smaller
  2019-10-09 23:44 ` [PATCH 6/6] index-pack: make quantum of work smaller Jonathan Tan
@ 2019-10-17  6:35   ` Jeff King
  0 siblings, 0 replies; 32+ messages in thread
From: Jeff King @ 2019-10-17  6:35 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git, mh

On Wed, Oct 09, 2019 at 04:44:22PM -0700, Jonathan Tan wrote:

> Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
> ---
>  builtin/index-pack.c | 267 ++++++++++++++++++++-----------------------
>  1 file changed, 127 insertions(+), 140 deletions(-)

I think this is a good direction to go in. I confess I didn't carefully
go over the implementation details, since you've marked this as RFC and
it sounds like you're mainly asking about direction. It looks pretty
reasonable from a high level, though.

-Peff

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH v2 0/7] Better threaded delta resolution in index-pack
  2019-10-09 23:44 [RFC PATCH 0/6] Better threaded delta resolution in index-pack Jonathan Tan
                   ` (5 preceding siblings ...)
  2019-10-09 23:44 ` [PATCH 6/6] index-pack: make quantum of work smaller Jonathan Tan
@ 2019-10-17 20:17 ` Jonathan Tan
  2019-10-17 20:17   ` [PATCH v2 1/7] Documentation: deltaBaseCacheLimit is per-thread Jonathan Tan
                     ` (7 more replies)
  6 siblings, 8 replies; 32+ messages in thread
From: Jonathan Tan @ 2019-10-17 20:17 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, stolee, peff

Thanks, Stolee and Peff, for taking a look at it. Here is a v2. It is
mostly unchanged, except for expanded commit messages and code comments.

I've also added a documentation clarification that
core.deltaBaseCacheLimit is per-thread, appearing as the first patch in
this patch series.

From patch 3 (now patch 4):

> > +	int i;
> 
> Technically this probably ought to be a size_t as well, but I'm much
> more concerned about the allocation ones, where we might accidentally
> overflow and underallocate a buffer. Overflowing "i" would probably just
> lead to an error or bad result.

I believe this needs to be signed, since we're iterating in reverse
order, so I made it a ssize_t instead (note the extra "s" in front).

From patch 4 (now patch 5):

> > Whenever we make a struct base_data, immediately calculate its delta
> > children. This eliminates confusion as to when the
> > {ref,ofs}_{first,last} fields are initialized.
> 
> That _seems_ like a good idea, but I'm a little worried just because I
> don't entirely understand why it was being done lazily before. If you've
> puzzled all that out, it would be nice to make the argument in the
> commit message.

I've added an explanation in the commit message.

Jonathan Tan (7):
  Documentation: deltaBaseCacheLimit is per-thread
  index-pack: unify threaded and unthreaded code
  index-pack: remove redundant parameter
  index-pack: remove redundant child field
  index-pack: calculate {ref,ofs}_{first,last} early
  index-pack: make resolve_delta() assume base data
  index-pack: make quantum of work smaller

 Documentation/config/core.txt |   2 +-
 builtin/index-pack.c          | 446 ++++++++++++++++++----------------
 2 files changed, 242 insertions(+), 206 deletions(-)

Range-diff against v1:
-:  ---------- > 1:  0a6777a243 Documentation: deltaBaseCacheLimit is per-thread
1:  7562afbaa9 = 2:  b19d6131e0 index-pack: unify threaded and unthreaded code
2:  a8567333dc ! 3:  f01f069a08 index-pack: remove redundant parameter
    @@ Metadata
      ## Commit message ##
         index-pack: remove redundant parameter
     
    +    find_{ref,ofs}_delta_{,children} take an enum object_type parameter, but
    +    the object type is already present in the name of the function. Remove
    +    that parameter from these functions.
    +
         Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
     
      ## builtin/index-pack.c ##
3:  97eddde2ec ! 4:  3359b66b84 index-pack: remove redundant child field
    @@ Metadata
      ## Commit message ##
         index-pack: remove redundant child field
     
    -    Instead, recompute ancestry if we ever need to reclaim memory.
    +    This is refactoring 1 of 2 to simplify struct base_data.
    +
    +    In index-pack, each thread maintains a doubly-linked list of the delta
    +    chain that it is currently processing (the "base" and "child" pointers
    +    in struct base_data). When a thread exceeds the delta base cache limit
    +    and needs to reclaim memory, it uses the "child" pointers to traverse
    +    the lineage, reclaiming the memory of the eldest delta bases first.
    +
    +    A subsequent patch will perform memory reclaiming in a different way and
    +    will thus no longer need the "child" pointer. Because the "child"
    +    pointer is redundant even now, remove it so that the aforementioned
    +    subsequent patch will be clearer. In the meantime, reclaim memory in the
    +    reverse order of the "base" pointers.
     
         Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
     
    @@ builtin/index-pack.c: static void free_base_data(struct base_data *c)
     -		if (b->data && b != retain)
     -			free_base_data(b);
     +	struct base_data **ancestry = NULL;
    -+	int nr = 0, alloc = 0;
    -+	int i;
    ++	size_t nr = 0, alloc = 0;
    ++	ssize_t i;
     +
     +	if (data->base_cache_used <= delta_base_cache_limit)
     +		return;
    @@ builtin/index-pack.c: static void free_base_data(struct base_data *c)
     +	for (b = youngest_child->base; b != NULL; b = b->base) {
     +		ALLOC_GROW(ancestry, nr + 1, alloc);
     +		ancestry[nr++] = b;
    -+	}
    + 	}
     +	for (i = nr - 1;
     +	     i >= 0 && data->base_cache_used > delta_base_cache_limit;
     +	     i--) {
     +		if (ancestry[i]->data)
     +			free_base_data(ancestry[i]);
    - 	}
    ++	}
     +	free(ancestry);
      }
      
4:  5d9687145d ! 5:  7f18480c45 index-pack: calculate {ref,ofs}_{first,last} early
    @@ Metadata
      ## Commit message ##
         index-pack: calculate {ref,ofs}_{first,last} early
     
    +    This is refactoring 2 of 2 to simplify struct base_data.
    +
         Whenever we make a struct base_data, immediately calculate its delta
         children. This eliminates confusion as to when the
         {ref,ofs}_{first,last} fields are initialized.
     
    +    Before this patch, the delta children were calculated at the last
    +    possible moment. This allowed the members of struct base_data to be
    +    populated in any order, superficially useful when we have the object
    +    contents before the struct object_entry. But this makes reasoning about
    +    the state of struct base_data more complicated, hence this patch.
    +
         Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
     
      ## builtin/index-pack.c ##
5:  ca4997017d = 6:  910b15219f index-pack: make resolve_delta() assume base data
6:  4f7c252a7c ! 7:  2f2e36d3ef index-pack: make quantum of work smaller
    @@ Metadata
      ## Commit message ##
         index-pack: make quantum of work smaller
     
    +    Currently, when index-pack resolves deltas, it does not split up delta
    +    trees into threads: each delta base root (an object that is not a
    +    REF_DELTA or OFS_DELTA) can go into its own thread, but all deltas on
    +    that root (direct or indirect) are processed in the same thread.
    +
    +    This is a problem when a repository contains a large text file (thus,
    +    delta-able) that is modified many times - delta resolution time during
    +    fetching is dominated by processing the deltas corresponding to that
    +    text file.
    +
    +    This patch contains a solution to that. When cloning using
    +
    +      git -c core.deltabasecachelimit=1g clone \
    +        https://fuchsia.googlesource.com/third_party/vulkan-cts
    +
    +    on my laptop, clone time improved from 3m2s to 2m5s (using 3 threads,
    +    which is the default).
    +
    +    The solution is to have a global work stack. This stack contains delta
    +    bases (objects, whether appearing directly in the packfile or generated
    +    by delta resolution, that themselves have delta children) that need to
    +    be processed; whenever a thread needs work, it peeks at the top of the
    +    stack and processes its next unprocessed child. If a thread finds the
    +    stack empty, it will look for more delta base roots to push on the stack
    +    instead.
    +
    +    The main weakness of having a global work stack is that more time is
    +    spent in the mutex, but profiling has shown that most time is spent in
    +    the resolution of the deltas themselves, so this shouldn't be an issue
    +    in practice. In any case, experimentation (as described in the clone
    +    command above) shows that this patch is a net improvement.
    +
         Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
     
      ## builtin/index-pack.c ##
    @@ builtin/index-pack.c: struct base_data {
      	struct object_entry *obj;
      	int ref_first, ref_last;
      	int ofs_first, ofs_last;
    ++	/*
    ++	 * Threads should increment retain_data if they are about to call
    ++	 * patch_delta() using this struct's data as a base, and decrement this
    ++	 * when they are done. While retain_data is nonzero, this struct's data
    ++	 * will not be freed even if the delta base cache limit is exceeded.
    ++	 */
     +	int retain_data;
    ++	/*
    ++	 * The number of direct children that have not been fully processed
    ++	 * (entered work_head, entered done_head, left done_head). When this
    ++	 * number reaches zero, this struct base_data can be freed.
    ++	 */
     +	int children_remaining;
      
      	/* Not initialized by make_base(). */
    @@ builtin/index-pack.c: struct base_data {
      	unsigned long size;
      };
      
    ++/*
    ++ * Stack of struct base_data that have unprocessed children.
    ++ * threaded_second_pass() uses this as a source of work (the other being the
    ++ * objects array).
    ++ */
     +LIST_HEAD(work_head);
    ++
    ++/*
    ++ * Stack of struct base_data that have children, all of whom have been
    ++ * processed or are being processed, and at least one child is being processed.
    ++ * These struct base_data must be kept around until the last child is
    ++ * processed.
    ++ */
     +LIST_HEAD(done_head);
    ++
    ++/*
    ++ * All threads share one delta base cache.
    ++ */
     +size_t base_cache_used;
     +size_t base_cache_limit;
     +
    @@ builtin/index-pack.c: static void free_base_data(struct base_data *c)
     -	struct base_data *b;
     -	struct thread_local *data = get_thread_data();
     -	struct base_data **ancestry = NULL;
    --	int nr = 0, alloc = 0;
    --	int i;
    +-	size_t nr = 0, alloc = 0;
    +-	ssize_t i;
     +	struct list_head *pos;
      
     -	if (data->base_cache_used <= delta_base_cache_limit)
    @@ builtin/index-pack.c: static void free_base_data(struct base_data *c)
      }
      
      static int is_delta_type(enum object_type type)
    +@@ builtin/index-pack.c: static void sha1_object(const void *data, struct object_entry *obj_entry,
    + }
    + 
    + /*
    +- * This function is part of find_unresolved_deltas(). There are two
    +- * walkers going in the opposite ways.
    +- *
    +- * The first one in find_unresolved_deltas() traverses down from
    +- * parent node to children, deflating nodes along the way. However,
    +- * memory for deflated nodes is limited by delta_base_cache_limit, so
    +- * at some point parent node's deflated content may be freed.
    +- *
    +- * The second walker is this function, which goes from current node up
    ++ * Walk from current node up
    +  * to top parent if necessary to deflate the node. In normal
    +  * situation, its parent node would be already deflated, so it just
    +  * needs to apply delta.
     @@ builtin/index-pack.c: static void *get_base_data(struct base_data *c)
      		if (!delta_nr) {
      			c->data = get_data_from_pack(obj);
    @@ builtin/index-pack.c: static int compare_ref_delta_entry(const void *a, const vo
      }
      
     -static void resolve_base(struct object_entry *obj)
    -+static void *do_work(void *data)
    - {
    +-{
     -	struct base_data *base_obj = make_base(obj, NULL);
     -
     -	find_unresolved_deltas(base_obj);
     -}
     -
    --static void *threaded_second_pass(void *data)
    --{
    + static void *threaded_second_pass(void *data)
    + {
     -	set_thread_data(data);
     +	if (data)
     +		set_thread_data(data);
    @@ builtin/index-pack.c: static int compare_ref_delta_entry(const void *a, const vo
     -			work_unlock();
     -			break;
     +		if (list_empty(&work_head)) {
    ++			/*
    ++			 * Take an object from the object array.
    ++			 */
     +			while (nr_dispatched < nr_objects &&
     +			       is_delta_type(objects[nr_dispatched].type))
     +				nr_dispatched++;
    @@ builtin/index-pack.c: static int compare_ref_delta_entry(const void *a, const vo
     +			}
     +			child_obj = &objects[nr_dispatched++];
     +		} else {
    ++			/*
    ++			 * Peek at the top of the stack, and take a child from
    ++			 * it.
    ++			 */
     +			parent = list_first_entry(&work_head, struct base_data,
     +						  list);
     +
    @@ builtin/index-pack.c: static int compare_ref_delta_entry(const void *a, const vo
     +
     +			if (parent->ref_first > parent->ref_last &&
     +			    parent->ofs_first > parent->ofs_last) {
    ++				/*
    ++				 * This parent has run out of children, so move
    ++				 * it to done_head.
    ++				 */
     +				list_del(&parent->list);
     +				list_add(&parent->list, &done_head);
     +			}
     +
    ++			/*
    ++			 * Ensure that the parent has data, since we will need
    ++			 * it later.
    ++			 *
    ++			 * NEEDSWORK: If parent data needs to be reloaded, this
    ++			 * prolongs the time that the current thread spends in
    ++			 * the mutex. A mitigating factor is that parent data
    ++			 * needs to be reloaded only if the delta base cache
    ++			 * limit is exceeded, so in the typical case, this does
    ++			 * not happen.
    ++			 */
     +			get_base_data(parent);
     +			parent->retain_data++;
      		}
    @@ builtin/index-pack.c: static int compare_ref_delta_entry(const void *a, const vo
     +		if (parent)
     +			parent->retain_data--;
     +		if (child->data) {
    ++			/*
    ++			 * This child has its own children, so add it to
    ++			 * work_head.
    ++			 */
     +			list_add(&child->list, &work_head);
     +			base_cache_used += child->size;
     +			prune_base_data(NULL);
     +		} else {
    ++			/*
    ++			 * This child does not have its own children. It may be
    ++			 * the last descendant of its ancestors; free those
    ++			 * that we can.
    ++			 */
     +			struct base_data *p = parent;
     +
     +			while (p) {
    @@ builtin/index-pack.c: static void resolve_deltas(void)
      	if (nr_threads > 1 || getenv("GIT_FORCE_THREADS")) {
      		init_thread();
      		for (i = 0; i < nr_threads; i++) {
    - 			int ret = pthread_create(&thread_data[i].thread, NULL,
    --						 threaded_second_pass, thread_data + i);
    -+						 do_work, thread_data + i);
    - 			if (ret)
    - 				die(_("unable to create thread: %s"),
    - 				    strerror(ret));
    -@@ builtin/index-pack.c: static void resolve_deltas(void)
    - 		cleanup_thread();
    - 		return;
    - 	}
    --	threaded_second_pass(&nothread_data);
    -+	do_work(&nothread_data);
    - }
    - 
    - /*
     @@ builtin/index-pack.c: static void fix_unresolved_deltas(struct hashfile *f)
      	for (i = 0; i < nr_ref_deltas; i++) {
      		struct ref_delta_entry *d = sorted_by_pos[i];
    @@ builtin/index-pack.c: static void fix_unresolved_deltas(struct hashfile *f)
     -		base->data = data;
     -		base->size = size;
     -		find_unresolved_deltas(base);
    ++
    ++		/*
    ++		 * Add this as an object to the objects array and call
    ++		 * threaded_second_pass() (which will pick up the added
    ++		 * object).
    ++		 */
     +		append_obj_to_pack(f, d->oid.hash, data, size, type);
    -+		do_work(NULL); /* will pick up new object in objects array (added by append_obj_to_pack) */
    ++		threaded_second_pass(NULL);
    ++
      		display_progress(progress, nr_resolved_deltas);
      	}
      	free(sorted_by_pos);
-- 
2.23.0.866.gb869b98d4c-goog


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH v2 1/7] Documentation: deltaBaseCacheLimit is per-thread
  2019-10-17 20:17 ` [PATCH v2 0/7] Better threaded delta resolution in index-pack Jonathan Tan
@ 2019-10-17 20:17   ` Jonathan Tan
  2019-10-17 20:17   ` [PATCH v2 2/7] index-pack: unify threaded and unthreaded code Jonathan Tan
                     ` (6 subsequent siblings)
  7 siblings, 0 replies; 32+ messages in thread
From: Jonathan Tan @ 2019-10-17 20:17 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, stolee, peff

Clarify that core.deltaBaseCacheLimit is per-thread, as can be seen from
the fact that cache usage (base_cache_used in struct thread_local in
builtin/index-pack.c) is tracked individually for each thread and
compared against delta_base_cache_limit.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
 Documentation/config/core.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/config/core.txt b/Documentation/config/core.txt
index 852d2ba37a..ce1ba8d55f 100644
--- a/Documentation/config/core.txt
+++ b/Documentation/config/core.txt
@@ -388,7 +388,7 @@ the largest projects.  You probably do not need to adjust this value.
 Common unit suffixes of 'k', 'm', or 'g' are supported.
 
 core.deltaBaseCacheLimit::
-	Maximum number of bytes to reserve for caching base objects
+	Maximum number of bytes per thread to reserve for caching base objects
 	that may be referenced by multiple deltified objects.  By storing the
 	entire decompressed base objects in a cache Git is able
 	to avoid unpacking and decompressing frequently used base
-- 
2.23.0.866.gb869b98d4c-goog


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v2 2/7] index-pack: unify threaded and unthreaded code
  2019-10-17 20:17 ` [PATCH v2 0/7] Better threaded delta resolution in index-pack Jonathan Tan
  2019-10-17 20:17   ` [PATCH v2 1/7] Documentation: deltaBaseCacheLimit is per-thread Jonathan Tan
@ 2019-10-17 20:17   ` Jonathan Tan
  2019-10-17 20:17   ` [PATCH v2 3/7] index-pack: remove redundant parameter Jonathan Tan
                     ` (5 subsequent siblings)
  7 siblings, 0 replies; 32+ messages in thread
From: Jonathan Tan @ 2019-10-17 20:17 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, stolee, peff

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
 builtin/index-pack.c | 10 +---------
 1 file changed, 1 insertion(+), 9 deletions(-)

diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 60a5591039..df6b3b8cf6 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -1210,15 +1210,7 @@ static void resolve_deltas(void)
 		cleanup_thread();
 		return;
 	}
-
-	for (i = 0; i < nr_objects; i++) {
-		struct object_entry *obj = &objects[i];
-
-		if (is_delta_type(obj->type))
-			continue;
-		resolve_base(obj);
-		display_progress(progress, nr_resolved_deltas);
-	}
+	threaded_second_pass(&nothread_data);
 }
 
 /*
-- 
2.23.0.866.gb869b98d4c-goog


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v2 3/7] index-pack: remove redundant parameter
  2019-10-17 20:17 ` [PATCH v2 0/7] Better threaded delta resolution in index-pack Jonathan Tan
  2019-10-17 20:17   ` [PATCH v2 1/7] Documentation: deltaBaseCacheLimit is per-thread Jonathan Tan
  2019-10-17 20:17   ` [PATCH v2 2/7] index-pack: unify threaded and unthreaded code Jonathan Tan
@ 2019-10-17 20:17   ` Jonathan Tan
  2020-02-28  0:04     ` Josh Steadmon
  2019-10-17 20:17   ` [PATCH v2 4/7] index-pack: remove redundant child field Jonathan Tan
                     ` (4 subsequent siblings)
  7 siblings, 1 reply; 32+ messages in thread
From: Jonathan Tan @ 2019-10-17 20:17 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, stolee, peff

find_{ref,ofs}_delta_{,children} take an enum object_type parameter, but
the object type is already present in the name of the function. Remove
that parameter from these functions.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
 builtin/index-pack.c | 26 ++++++++++++--------------
 1 file changed, 12 insertions(+), 14 deletions(-)

diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index df6b3b8cf6..296804230c 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -614,7 +614,7 @@ static int compare_ofs_delta_bases(off_t offset1, off_t offset2,
 	       0;
 }
 
-static int find_ofs_delta(const off_t offset, enum object_type type)
+static int find_ofs_delta(const off_t offset)
 {
 	int first = 0, last = nr_ofs_deltas;
 
@@ -624,7 +624,8 @@ static int find_ofs_delta(const off_t offset, enum object_type type)
 		int cmp;
 
 		cmp = compare_ofs_delta_bases(offset, delta->offset,
-					      type, objects[delta->obj_no].type);
+					      OBJ_OFS_DELTA,
+					      objects[delta->obj_no].type);
 		if (!cmp)
 			return next;
 		if (cmp < 0) {
@@ -637,10 +638,9 @@ static int find_ofs_delta(const off_t offset, enum object_type type)
 }
 
 static void find_ofs_delta_children(off_t offset,
-				    int *first_index, int *last_index,
-				    enum object_type type)
+				    int *first_index, int *last_index)
 {
-	int first = find_ofs_delta(offset, type);
+	int first = find_ofs_delta(offset);
 	int last = first;
 	int end = nr_ofs_deltas - 1;
 
@@ -668,7 +668,7 @@ static int compare_ref_delta_bases(const struct object_id *oid1,
 	return oidcmp(oid1, oid2);
 }
 
-static int find_ref_delta(const struct object_id *oid, enum object_type type)
+static int find_ref_delta(const struct object_id *oid)
 {
 	int first = 0, last = nr_ref_deltas;
 
@@ -678,7 +678,8 @@ static int find_ref_delta(const struct object_id *oid, enum object_type type)
 		int cmp;
 
 		cmp = compare_ref_delta_bases(oid, &delta->oid,
-					      type, objects[delta->obj_no].type);
+					      OBJ_REF_DELTA,
+					      objects[delta->obj_no].type);
 		if (!cmp)
 			return next;
 		if (cmp < 0) {
@@ -691,10 +692,9 @@ static int find_ref_delta(const struct object_id *oid, enum object_type type)
 }
 
 static void find_ref_delta_children(const struct object_id *oid,
-				    int *first_index, int *last_index,
-				    enum object_type type)
+				    int *first_index, int *last_index)
 {
-	int first = find_ref_delta(oid, type);
+	int first = find_ref_delta(oid);
 	int last = first;
 	int end = nr_ref_deltas - 1;
 
@@ -982,12 +982,10 @@ static struct base_data *find_unresolved_deltas_1(struct base_data *base,
 {
 	if (base->ref_last == -1 && base->ofs_last == -1) {
 		find_ref_delta_children(&base->obj->idx.oid,
-					&base->ref_first, &base->ref_last,
-					OBJ_REF_DELTA);
+					&base->ref_first, &base->ref_last);
 
 		find_ofs_delta_children(base->obj->idx.offset,
-					&base->ofs_first, &base->ofs_last,
-					OBJ_OFS_DELTA);
+					&base->ofs_first, &base->ofs_last);
 
 		if (base->ref_last == -1 && base->ofs_last == -1) {
 			free(base->data);
-- 
2.23.0.866.gb869b98d4c-goog


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v2 4/7] index-pack: remove redundant child field
  2019-10-17 20:17 ` [PATCH v2 0/7] Better threaded delta resolution in index-pack Jonathan Tan
                     ` (2 preceding siblings ...)
  2019-10-17 20:17   ` [PATCH v2 3/7] index-pack: remove redundant parameter Jonathan Tan
@ 2019-10-17 20:17   ` Jonathan Tan
  2020-02-28  0:04     ` Josh Steadmon
  2019-10-17 20:17   ` [PATCH v2 5/7] index-pack: calculate {ref,ofs}_{first,last} early Jonathan Tan
                     ` (3 subsequent siblings)
  7 siblings, 1 reply; 32+ messages in thread
From: Jonathan Tan @ 2019-10-17 20:17 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, stolee, peff

This is refactoring 1 of 2 to simplify struct base_data.

In index-pack, each thread maintains a doubly-linked list of the delta
chain that it is currently processing (the "base" and "child" pointers
in struct base_data). When a thread exceeds the delta base cache limit
and needs to reclaim memory, it uses the "child" pointers to traverse
the lineage, reclaiming the memory of the eldest delta bases first.

A subsequent patch will perform memory reclaiming in a different way and
will thus no longer need the "child" pointer. Because the "child"
pointer is redundant even now, remove it so that the aforementioned
subsequent patch will be clearer. In the meantime, reclaim memory in the
reverse order of the "base" pointers.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
 builtin/index-pack.c | 41 ++++++++++++++++++++++-------------------
 1 file changed, 22 insertions(+), 19 deletions(-)

diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 296804230c..220e1e3693 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -34,7 +34,6 @@ struct object_stat {
 
 struct base_data {
 	struct base_data *base;
-	struct base_data *child;
 	struct object_entry *obj;
 	void *data;
 	unsigned long size;
@@ -44,7 +43,6 @@ struct base_data {
 
 struct thread_local {
 	pthread_t thread;
-	struct base_data *base_cache;
 	size_t base_cache_used;
 	int pack_fd;
 };
@@ -380,27 +378,37 @@ static void free_base_data(struct base_data *c)
 	}
 }
 
-static void prune_base_data(struct base_data *retain)
+static void prune_base_data(struct base_data *youngest_child)
 {
 	struct base_data *b;
 	struct thread_local *data = get_thread_data();
-	for (b = data->base_cache;
-	     data->base_cache_used > delta_base_cache_limit && b;
-	     b = b->child) {
-		if (b->data && b != retain)
-			free_base_data(b);
+	struct base_data **ancestry = NULL;
+	size_t nr = 0, alloc = 0;
+	ssize_t i;
+
+	if (data->base_cache_used <= delta_base_cache_limit)
+		return;
+
+	/*
+	 * Free all ancestors of youngest_child until we have enough space,
+	 * starting with the oldest. (We cannot free youngest_child itself.)
+	 */
+	for (b = youngest_child->base; b != NULL; b = b->base) {
+		ALLOC_GROW(ancestry, nr + 1, alloc);
+		ancestry[nr++] = b;
 	}
+	for (i = nr - 1;
+	     i >= 0 && data->base_cache_used > delta_base_cache_limit;
+	     i--) {
+		if (ancestry[i]->data)
+			free_base_data(ancestry[i]);
+	}
+	free(ancestry);
 }
 
 static void link_base_data(struct base_data *base, struct base_data *c)
 {
-	if (base)
-		base->child = c;
-	else
-		get_thread_data()->base_cache = c;
-
 	c->base = base;
-	c->child = NULL;
 	if (c->data)
 		get_thread_data()->base_cache_used += c->size;
 	prune_base_data(c);
@@ -408,11 +416,6 @@ static void link_base_data(struct base_data *base, struct base_data *c)
 
 static void unlink_base_data(struct base_data *c)
 {
-	struct base_data *base = c->base;
-	if (base)
-		base->child = NULL;
-	else
-		get_thread_data()->base_cache = NULL;
 	free_base_data(c);
 }
 
-- 
2.23.0.866.gb869b98d4c-goog


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v2 5/7] index-pack: calculate {ref,ofs}_{first,last} early
  2019-10-17 20:17 ` [PATCH v2 0/7] Better threaded delta resolution in index-pack Jonathan Tan
                     ` (3 preceding siblings ...)
  2019-10-17 20:17   ` [PATCH v2 4/7] index-pack: remove redundant child field Jonathan Tan
@ 2019-10-17 20:17   ` Jonathan Tan
  2019-10-17 20:17   ` [PATCH v2 6/7] index-pack: make resolve_delta() assume base data Jonathan Tan
                     ` (2 subsequent siblings)
  7 siblings, 0 replies; 32+ messages in thread
From: Jonathan Tan @ 2019-10-17 20:17 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, stolee, peff

This is refactoring 2 of 2 to simplify struct base_data.

Whenever we make a struct base_data, immediately calculate its delta
children. This eliminates confusion as to when the
{ref,ofs}_{first,last} fields are initialized.

Before this patch, the delta children were calculated at the last
possible moment. This allowed the members of struct base_data to be
populated in any order, superficially useful when we have the object
contents before the struct object_entry. But this makes reasoning about
the state of struct base_data more complicated, hence this patch.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
 builtin/index-pack.c | 125 +++++++++++++++++++++----------------------
 1 file changed, 61 insertions(+), 64 deletions(-)

diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 220e1e3693..d21353757d 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -33,12 +33,15 @@ struct object_stat {
 };
 
 struct base_data {
+	/* Initialized by make_base(). */
 	struct base_data *base;
 	struct object_entry *obj;
-	void *data;
-	unsigned long size;
 	int ref_first, ref_last;
 	int ofs_first, ofs_last;
+
+	/* Not initialized by make_base(). */
+	void *data;
+	unsigned long size;
 };
 
 struct thread_local {
@@ -362,14 +365,6 @@ static void set_thread_data(struct thread_local *data)
 		pthread_setspecific(key, data);
 }
 
-static struct base_data *alloc_base_data(void)
-{
-	struct base_data *base = xcalloc(1, sizeof(struct base_data));
-	base->ref_last = -1;
-	base->ofs_last = -1;
-	return base;
-}
-
 static void free_base_data(struct base_data *c)
 {
 	if (c->data) {
@@ -406,19 +401,6 @@ static void prune_base_data(struct base_data *youngest_child)
 	free(ancestry);
 }
 
-static void link_base_data(struct base_data *base, struct base_data *c)
-{
-	c->base = base;
-	if (c->data)
-		get_thread_data()->base_cache_used += c->size;
-	prune_base_data(c);
-}
-
-static void unlink_base_data(struct base_data *c)
-{
-	free_base_data(c);
-}
-
 static int is_delta_type(enum object_type type)
 {
 	return (type == OBJ_REF_DELTA || type == OBJ_OFS_DELTA);
@@ -928,10 +910,25 @@ static void *get_base_data(struct base_data *c)
 	return c->data;
 }
 
-static void resolve_delta(struct object_entry *delta_obj,
-			  struct base_data *base, struct base_data *result)
+static struct base_data *make_base(struct object_entry *obj,
+				   struct base_data *parent)
 {
-	void *base_data, *delta_data;
+	struct base_data *base = xcalloc(1, sizeof(struct base_data));
+	base->base = parent;
+	base->obj = obj;
+	find_ref_delta_children(&obj->idx.oid,
+				&base->ref_first, &base->ref_last);
+	find_ofs_delta_children(obj->idx.offset,
+				&base->ofs_first, &base->ofs_last);
+	return base;
+}
+
+static struct base_data *resolve_delta(struct object_entry *delta_obj,
+				       struct base_data *base)
+{
+	void *base_data, *delta_data, *result_data;
+	struct base_data *result;
+	unsigned long result_size;
 
 	if (show_stat) {
 		int i = delta_obj - objects;
@@ -945,19 +942,31 @@ static void resolve_delta(struct object_entry *delta_obj,
 	}
 	delta_data = get_data_from_pack(delta_obj);
 	base_data = get_base_data(base);
-	result->obj = delta_obj;
-	result->data = patch_delta(base_data, base->size,
-				   delta_data, delta_obj->size, &result->size);
+	result_data = patch_delta(base_data, base->size,
+				  delta_data, delta_obj->size, &result_size);
 	free(delta_data);
-	if (!result->data)
+	if (!result_data)
 		bad_object(delta_obj->idx.offset, _("failed to apply delta"));
-	hash_object_file(result->data, result->size,
+	hash_object_file(result_data, result_size,
 			 type_name(delta_obj->real_type), &delta_obj->idx.oid);
-	sha1_object(result->data, NULL, result->size, delta_obj->real_type,
+	sha1_object(result_data, NULL, result_size, delta_obj->real_type,
 		    &delta_obj->idx.oid);
+
+	result = make_base(delta_obj, base);
+	if (result->ref_last == -1 && result->ofs_last == -1) {
+		free(result_data);
+	} else {
+		result->data = result_data;
+		result->size = result_size;
+		get_thread_data()->base_cache_used += result->size;
+		prune_base_data(result);
+	}
+
 	counter_lock();
 	nr_resolved_deltas++;
 	counter_unlock();
+
+	return result;
 }
 
 /*
@@ -983,30 +992,15 @@ static int compare_and_swap_type(signed char *type,
 static struct base_data *find_unresolved_deltas_1(struct base_data *base,
 						  struct base_data *prev_base)
 {
-	if (base->ref_last == -1 && base->ofs_last == -1) {
-		find_ref_delta_children(&base->obj->idx.oid,
-					&base->ref_first, &base->ref_last);
-
-		find_ofs_delta_children(base->obj->idx.offset,
-					&base->ofs_first, &base->ofs_last);
-
-		if (base->ref_last == -1 && base->ofs_last == -1) {
-			free(base->data);
-			return NULL;
-		}
-
-		link_base_data(prev_base, base);
-	}
-
 	if (base->ref_first <= base->ref_last) {
 		struct object_entry *child = objects + ref_deltas[base->ref_first].obj_no;
-		struct base_data *result = alloc_base_data();
+		struct base_data *result;
 
 		if (!compare_and_swap_type(&child->real_type, OBJ_REF_DELTA,
 					   base->obj->real_type))
 			BUG("child->real_type != OBJ_REF_DELTA");
 
-		resolve_delta(child, base, result);
+		result = resolve_delta(child, base);
 		if (base->ref_first == base->ref_last && base->ofs_last == -1)
 			free_base_data(base);
 
@@ -1016,11 +1010,11 @@ static struct base_data *find_unresolved_deltas_1(struct base_data *base,
 
 	if (base->ofs_first <= base->ofs_last) {
 		struct object_entry *child = objects + ofs_deltas[base->ofs_first].obj_no;
-		struct base_data *result = alloc_base_data();
+		struct base_data *result;
 
 		assert(child->real_type == OBJ_OFS_DELTA);
 		child->real_type = base->obj->real_type;
-		resolve_delta(child, base, result);
+		result = resolve_delta(child, base);
 		if (base->ofs_first == base->ofs_last)
 			free_base_data(base);
 
@@ -1028,7 +1022,7 @@ static struct base_data *find_unresolved_deltas_1(struct base_data *base,
 		return result;
 	}
 
-	unlink_base_data(base);
+	free_base_data(base);
 	return NULL;
 }
 
@@ -1071,9 +1065,8 @@ static int compare_ref_delta_entry(const void *a, const void *b)
 
 static void resolve_base(struct object_entry *obj)
 {
-	struct base_data *base_obj = alloc_base_data();
-	base_obj->obj = obj;
-	base_obj->data = NULL;
+	struct base_data *base_obj = make_base(obj, NULL);
+
 	find_unresolved_deltas(base_obj);
 }
 
@@ -1367,21 +1360,25 @@ static void fix_unresolved_deltas(struct hashfile *f)
 	for (i = 0; i < nr_ref_deltas; i++) {
 		struct ref_delta_entry *d = sorted_by_pos[i];
 		enum object_type type;
-		struct base_data *base_obj = alloc_base_data();
+		struct base_data *base;
+		void *data;
+		unsigned long size;
+		struct object_entry *obj;
 
 		if (objects[d->obj_no].real_type != OBJ_REF_DELTA)
 			continue;
-		base_obj->data = read_object_file(&d->oid, &type,
-						  &base_obj->size);
-		if (!base_obj->data)
+		data = read_object_file(&d->oid, &type, &size);
+		if (!data)
 			continue;
 
-		if (check_object_signature(&d->oid, base_obj->data,
-				base_obj->size, type_name(type)))
+		if (check_object_signature(&d->oid, data,
+					   size, type_name(type)))
 			die(_("local object %s is corrupt"), oid_to_hex(&d->oid));
-		base_obj->obj = append_obj_to_pack(f, d->oid.hash,
-					base_obj->data, base_obj->size, type);
-		find_unresolved_deltas(base_obj);
+		obj = append_obj_to_pack(f, d->oid.hash, data, size, type);
+		base = make_base(obj, NULL);
+		base->data = data;
+		base->size = size;
+		find_unresolved_deltas(base);
 		display_progress(progress, nr_resolved_deltas);
 	}
 	free(sorted_by_pos);
-- 
2.23.0.866.gb869b98d4c-goog


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v2 6/7] index-pack: make resolve_delta() assume base data
  2019-10-17 20:17 ` [PATCH v2 0/7] Better threaded delta resolution in index-pack Jonathan Tan
                     ` (4 preceding siblings ...)
  2019-10-17 20:17   ` [PATCH v2 5/7] index-pack: calculate {ref,ofs}_{first,last} early Jonathan Tan
@ 2019-10-17 20:17   ` Jonathan Tan
  2019-10-17 20:17   ` [PATCH v2 7/7] index-pack: make quantum of work smaller Jonathan Tan
  2020-02-28  0:03   ` [PATCH v2 0/7] Better threaded delta resolution in index-pack Josh Steadmon
  7 siblings, 0 replies; 32+ messages in thread
From: Jonathan Tan @ 2019-10-17 20:17 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, stolee, peff

A subsequent commit will make the quantum of work smaller, necessitating
more locking. This commit allows resolve_delta() to be called outside
the lock.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
 builtin/index-pack.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index d21353757d..31607a77fc 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -926,7 +926,7 @@ static struct base_data *make_base(struct object_entry *obj,
 static struct base_data *resolve_delta(struct object_entry *delta_obj,
 				       struct base_data *base)
 {
-	void *base_data, *delta_data, *result_data;
+	void *delta_data, *result_data;
 	struct base_data *result;
 	unsigned long result_size;
 
@@ -941,8 +941,8 @@ static struct base_data *resolve_delta(struct object_entry *delta_obj,
 		obj_stat[i].base_object_no = j;
 	}
 	delta_data = get_data_from_pack(delta_obj);
-	base_data = get_base_data(base);
-	result_data = patch_delta(base_data, base->size,
+	assert(base->data);
+	result_data = patch_delta(base->data, base->size,
 				  delta_data, delta_obj->size, &result_size);
 	free(delta_data);
 	if (!result_data)
@@ -1000,6 +1000,7 @@ static struct base_data *find_unresolved_deltas_1(struct base_data *base,
 					   base->obj->real_type))
 			BUG("child->real_type != OBJ_REF_DELTA");
 
+		get_base_data(base);
 		result = resolve_delta(child, base);
 		if (base->ref_first == base->ref_last && base->ofs_last == -1)
 			free_base_data(base);
@@ -1014,6 +1015,7 @@ static struct base_data *find_unresolved_deltas_1(struct base_data *base,
 
 		assert(child->real_type == OBJ_OFS_DELTA);
 		child->real_type = base->obj->real_type;
+		get_base_data(base);
 		result = resolve_delta(child, base);
 		if (base->ofs_first == base->ofs_last)
 			free_base_data(base);
-- 
2.23.0.866.gb869b98d4c-goog


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v2 7/7] index-pack: make quantum of work smaller
  2019-10-17 20:17 ` [PATCH v2 0/7] Better threaded delta resolution in index-pack Jonathan Tan
                     ` (5 preceding siblings ...)
  2019-10-17 20:17   ` [PATCH v2 6/7] index-pack: make resolve_delta() assume base data Jonathan Tan
@ 2019-10-17 20:17   ` Jonathan Tan
  2020-02-28  0:04     ` Josh Steadmon
  2020-02-28  0:03   ` [PATCH v2 0/7] Better threaded delta resolution in index-pack Josh Steadmon
  7 siblings, 1 reply; 32+ messages in thread
From: Jonathan Tan @ 2019-10-17 20:17 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, stolee, peff

Currently, when index-pack resolves deltas, it does not split up delta
trees into threads: each delta base root (an object that is not a
REF_DELTA or OFS_DELTA) can go into its own thread, but all deltas on
that root (direct or indirect) are processed in the same thread.

This is a problem when a repository contains a large text file (thus,
delta-able) that is modified many times - delta resolution time during
fetching is dominated by processing the deltas corresponding to that
text file.

This patch contains a solution to that. When cloning using

  git -c core.deltabasecachelimit=1g clone \
    https://fuchsia.googlesource.com/third_party/vulkan-cts

on my laptop, clone time improved from 3m2s to 2m5s (using 3 threads,
which is the default).

The solution is to have a global work stack. This stack contains delta
bases (objects, whether appearing directly in the packfile or generated
by delta resolution, that themselves have delta children) that need to
be processed; whenever a thread needs work, it peeks at the top of the
stack and processes its next unprocessed child. If a thread finds the
stack empty, it will look for more delta base roots to push on the stack
instead.

The main weakness of having a global work stack is that more time is
spent in the mutex, but profiling has shown that most time is spent in
the resolution of the deltas themselves, so this shouldn't be an issue
in practice. In any case, experimentation (as described in the clone
command above) shows that this patch is a net improvement.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
 builtin/index-pack.c | 336 ++++++++++++++++++++++++-------------------
 1 file changed, 190 insertions(+), 146 deletions(-)

diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 31607a77fc..072592a35d 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -38,15 +38,49 @@ struct base_data {
 	struct object_entry *obj;
 	int ref_first, ref_last;
 	int ofs_first, ofs_last;
+	/*
+	 * Threads should increment retain_data if they are about to call
+	 * patch_delta() using this struct's data as a base, and decrement this
+	 * when they are done. While retain_data is nonzero, this struct's data
+	 * will not be freed even if the delta base cache limit is exceeded.
+	 */
+	int retain_data;
+	/*
+	 * The number of direct children that have not been fully processed
+	 * (entered work_head, entered done_head, left done_head). When this
+	 * number reaches zero, this struct base_data can be freed.
+	 */
+	int children_remaining;
 
 	/* Not initialized by make_base(). */
+	struct list_head list;
 	void *data;
 	unsigned long size;
 };
 
+/*
+ * Stack of struct base_data that have unprocessed children.
+ * threaded_second_pass() uses this as a source of work (the other being the
+ * objects array).
+ */
+LIST_HEAD(work_head);
+
+/*
+ * Stack of struct base_data that have children, all of whom have been
+ * processed or are being processed, and at least one child is being processed.
+ * These struct base_data must be kept around until the last child is
+ * processed.
+ */
+LIST_HEAD(done_head);
+
+/*
+ * All threads share one delta base cache.
+ */
+size_t base_cache_used;
+size_t base_cache_limit;
+
 struct thread_local {
 	pthread_t thread;
-	size_t base_cache_used;
 	int pack_fd;
 };
 
@@ -369,36 +403,38 @@ static void free_base_data(struct base_data *c)
 {
 	if (c->data) {
 		FREE_AND_NULL(c->data);
-		get_thread_data()->base_cache_used -= c->size;
+		base_cache_used -= c->size;
 	}
 }
 
-static void prune_base_data(struct base_data *youngest_child)
+static void prune_base_data(struct base_data *retain)
 {
-	struct base_data *b;
-	struct thread_local *data = get_thread_data();
-	struct base_data **ancestry = NULL;
-	size_t nr = 0, alloc = 0;
-	ssize_t i;
+	struct list_head *pos;
 
-	if (data->base_cache_used <= delta_base_cache_limit)
+	if (base_cache_used <= base_cache_limit)
 		return;
 
-	/*
-	 * Free all ancestors of youngest_child until we have enough space,
-	 * starting with the oldest. (We cannot free youngest_child itself.)
-	 */
-	for (b = youngest_child->base; b != NULL; b = b->base) {
-		ALLOC_GROW(ancestry, nr + 1, alloc);
-		ancestry[nr++] = b;
+	list_for_each_prev(pos, &done_head) {
+		struct base_data *b = list_entry(pos, struct base_data, list);
+		if (b->retain_data || b == retain)
+			continue;
+		if (b->data) {
+			free_base_data(b);
+			if (base_cache_used <= base_cache_limit)
+				return;
+		}
 	}
-	for (i = nr - 1;
-	     i >= 0 && data->base_cache_used > delta_base_cache_limit;
-	     i--) {
-		if (ancestry[i]->data)
-			free_base_data(ancestry[i]);
+
+	list_for_each_prev(pos, &work_head) {
+		struct base_data *b = list_entry(pos, struct base_data, list);
+		if (b->retain_data || b == retain)
+			continue;
+		if (b->data) {
+			free_base_data(b);
+			if (base_cache_used <= base_cache_limit)
+				return;
+		}
 	}
-	free(ancestry);
 }
 
 static int is_delta_type(enum object_type type)
@@ -850,15 +886,7 @@ static void sha1_object(const void *data, struct object_entry *obj_entry,
 }
 
 /*
- * This function is part of find_unresolved_deltas(). There are two
- * walkers going in the opposite ways.
- *
- * The first one in find_unresolved_deltas() traverses down from
- * parent node to children, deflating nodes along the way. However,
- * memory for deflated nodes is limited by delta_base_cache_limit, so
- * at some point parent node's deflated content may be freed.
- *
- * The second walker is this function, which goes from current node up
+ * Walk from current node up
  * to top parent if necessary to deflate the node. In normal
  * situation, its parent node would be already deflated, so it just
  * needs to apply delta.
@@ -886,7 +914,7 @@ static void *get_base_data(struct base_data *c)
 		if (!delta_nr) {
 			c->data = get_data_from_pack(obj);
 			c->size = obj->size;
-			get_thread_data()->base_cache_used += c->size;
+			base_cache_used += c->size;
 			prune_base_data(c);
 		}
 		for (; delta_nr > 0; delta_nr--) {
@@ -902,7 +930,7 @@ static void *get_base_data(struct base_data *c)
 			free(raw);
 			if (!c->data)
 				bad_object(obj->idx.offset, _("failed to apply delta"));
-			get_thread_data()->base_cache_used += c->size;
+			base_cache_used += c->size;
 			prune_base_data(c);
 		}
 		free(delta);
@@ -920,6 +948,8 @@ static struct base_data *make_base(struct object_entry *obj,
 				&base->ref_first, &base->ref_last);
 	find_ofs_delta_children(obj->idx.offset,
 				&base->ofs_first, &base->ofs_last);
+	base->children_remaining = base->ref_last - base->ref_first +
+		base->ofs_last - base->ofs_first + 2;
 	return base;
 }
 
@@ -953,14 +983,8 @@ static struct base_data *resolve_delta(struct object_entry *delta_obj,
 		    &delta_obj->idx.oid);
 
 	result = make_base(delta_obj, base);
-	if (result->ref_last == -1 && result->ofs_last == -1) {
-		free(result_data);
-	} else {
-		result->data = result_data;
-		result->size = result_size;
-		get_thread_data()->base_cache_used += result->size;
-		prune_base_data(result);
-	}
+	result->data = result_data;
+	result->size = result_size;
 
 	counter_lock();
 	nr_resolved_deltas++;
@@ -969,84 +993,6 @@ static struct base_data *resolve_delta(struct object_entry *delta_obj,
 	return result;
 }
 
-/*
- * Standard boolean compare-and-swap: atomically check whether "*type" is
- * "want"; if so, swap in "set" and return true. Otherwise, leave it untouched
- * and return false.
- */
-static int compare_and_swap_type(signed char *type,
-				 enum object_type want,
-				 enum object_type set)
-{
-	enum object_type old;
-
-	type_cas_lock();
-	old = *type;
-	if (old == want)
-		*type = set;
-	type_cas_unlock();
-
-	return old == want;
-}
-
-static struct base_data *find_unresolved_deltas_1(struct base_data *base,
-						  struct base_data *prev_base)
-{
-	if (base->ref_first <= base->ref_last) {
-		struct object_entry *child = objects + ref_deltas[base->ref_first].obj_no;
-		struct base_data *result;
-
-		if (!compare_and_swap_type(&child->real_type, OBJ_REF_DELTA,
-					   base->obj->real_type))
-			BUG("child->real_type != OBJ_REF_DELTA");
-
-		get_base_data(base);
-		result = resolve_delta(child, base);
-		if (base->ref_first == base->ref_last && base->ofs_last == -1)
-			free_base_data(base);
-
-		base->ref_first++;
-		return result;
-	}
-
-	if (base->ofs_first <= base->ofs_last) {
-		struct object_entry *child = objects + ofs_deltas[base->ofs_first].obj_no;
-		struct base_data *result;
-
-		assert(child->real_type == OBJ_OFS_DELTA);
-		child->real_type = base->obj->real_type;
-		get_base_data(base);
-		result = resolve_delta(child, base);
-		if (base->ofs_first == base->ofs_last)
-			free_base_data(base);
-
-		base->ofs_first++;
-		return result;
-	}
-
-	free_base_data(base);
-	return NULL;
-}
-
-static void find_unresolved_deltas(struct base_data *base)
-{
-	struct base_data *new_base, *prev_base = NULL;
-	for (;;) {
-		new_base = find_unresolved_deltas_1(base, prev_base);
-
-		if (new_base) {
-			prev_base = base;
-			base = new_base;
-		} else {
-			free(base);
-			base = prev_base;
-			if (!base)
-				return;
-			prev_base = base->base;
-		}
-	}
-}
-
 static int compare_ofs_delta_entry(const void *a, const void *b)
 {
 	const struct ofs_delta_entry *delta_a = a;
@@ -1065,33 +1011,128 @@ static int compare_ref_delta_entry(const void *a, const void *b)
 	return oidcmp(&delta_a->oid, &delta_b->oid);
 }
 
-static void resolve_base(struct object_entry *obj)
-{
-	struct base_data *base_obj = make_base(obj, NULL);
-
-	find_unresolved_deltas(base_obj);
-}
-
 static void *threaded_second_pass(void *data)
 {
-	set_thread_data(data);
+	if (data)
+		set_thread_data(data);
 	for (;;) {
-		int i;
-		counter_lock();
-		display_progress(progress, nr_resolved_deltas);
-		counter_unlock();
+		struct base_data *parent = NULL;
+		struct object_entry *child_obj;
+		struct base_data *child;
+
 		work_lock();
-		while (nr_dispatched < nr_objects &&
-		       is_delta_type(objects[nr_dispatched].type))
-			nr_dispatched++;
-		if (nr_dispatched >= nr_objects) {
-			work_unlock();
-			break;
+		if (list_empty(&work_head)) {
+			/*
+			 * Take an object from the object array.
+			 */
+			while (nr_dispatched < nr_objects &&
+			       is_delta_type(objects[nr_dispatched].type))
+				nr_dispatched++;
+			if (nr_dispatched >= nr_objects) {
+				work_unlock();
+				break;
+			}
+			child_obj = &objects[nr_dispatched++];
+		} else {
+			/*
+			 * Peek at the top of the stack, and take a child from
+			 * it.
+			 */
+			parent = list_first_entry(&work_head, struct base_data,
+						  list);
+
+			if (parent->ref_first <= parent->ref_last) {
+				child_obj = objects +
+					ref_deltas[parent->ref_first++].obj_no;
+				assert(child_obj->real_type == OBJ_REF_DELTA);
+				child_obj->real_type = parent->obj->real_type;
+			} else {
+				child_obj = objects +
+					ofs_deltas[parent->ofs_first++].obj_no;
+				assert(child_obj->real_type == OBJ_OFS_DELTA);
+				child_obj->real_type = parent->obj->real_type;
+			}
+
+			if (parent->ref_first > parent->ref_last &&
+			    parent->ofs_first > parent->ofs_last) {
+				/*
+				 * This parent has run out of children, so move
+				 * it to done_head.
+				 */
+				list_del(&parent->list);
+				list_add(&parent->list, &done_head);
+			}
+
+			/*
+			 * Ensure that the parent has data, since we will need
+			 * it later.
+			 *
+			 * NEEDSWORK: If parent data needs to be reloaded, this
+			 * prolongs the time that the current thread spends in
+			 * the mutex. A mitigating factor is that parent data
+			 * needs to be reloaded only if the delta base cache
+			 * limit is exceeded, so in the typical case, this does
+			 * not happen.
+			 */
+			get_base_data(parent);
+			parent->retain_data++;
 		}
-		i = nr_dispatched++;
 		work_unlock();
 
-		resolve_base(&objects[i]);
+		if (parent) {
+			child = resolve_delta(child_obj, parent);
+			if (!child->children_remaining)
+				FREE_AND_NULL(child->data);
+		} else {
+			child = make_base(child_obj, NULL);
+			if (child->children_remaining) {
+				/*
+				 * Since this child has its own delta children,
+				 * we will need this data in the future.
+				 * Inflate now so that future iterations will
+				 * have access to this object's data while
+				 * outside the work mutex.
+				 */
+				child->data = get_data_from_pack(child_obj);
+				child->size = child_obj->size;
+			}
+		}
+
+		work_lock();
+		if (parent)
+			parent->retain_data--;
+		if (child->data) {
+			/*
+			 * This child has its own children, so add it to
+			 * work_head.
+			 */
+			list_add(&child->list, &work_head);
+			base_cache_used += child->size;
+			prune_base_data(NULL);
+		} else {
+			/*
+			 * This child does not have its own children. It may be
+			 * the last descendant of its ancestors; free those
+			 * that we can.
+			 */
+			struct base_data *p = parent;
+
+			while (p) {
+				struct base_data *next_p;
+
+				p->children_remaining--;
+				if (p->children_remaining)
+					break;
+
+				next_p = p->base;
+				free_base_data(p);
+				list_del(&p->list);
+				free(p);
+
+				p = next_p;
+			}
+		}
+		work_unlock();
 	}
 	return NULL;
 }
@@ -1192,6 +1233,7 @@ static void resolve_deltas(void)
 					  nr_ref_deltas + nr_ofs_deltas);
 
 	nr_dispatched = 0;
+	base_cache_limit = delta_base_cache_limit * nr_threads;
 	if (nr_threads > 1 || getenv("GIT_FORCE_THREADS")) {
 		init_thread();
 		for (i = 0; i < nr_threads; i++) {
@@ -1362,10 +1404,8 @@ static void fix_unresolved_deltas(struct hashfile *f)
 	for (i = 0; i < nr_ref_deltas; i++) {
 		struct ref_delta_entry *d = sorted_by_pos[i];
 		enum object_type type;
-		struct base_data *base;
 		void *data;
 		unsigned long size;
-		struct object_entry *obj;
 
 		if (objects[d->obj_no].real_type != OBJ_REF_DELTA)
 			continue;
@@ -1376,11 +1416,15 @@ static void fix_unresolved_deltas(struct hashfile *f)
 		if (check_object_signature(&d->oid, data,
 					   size, type_name(type)))
 			die(_("local object %s is corrupt"), oid_to_hex(&d->oid));
-		obj = append_obj_to_pack(f, d->oid.hash, data, size, type);
-		base = make_base(obj, NULL);
-		base->data = data;
-		base->size = size;
-		find_unresolved_deltas(base);
+
+		/*
+		 * Add this as an object to the objects array and call
+		 * threaded_second_pass() (which will pick up the added
+		 * object).
+		 */
+		append_obj_to_pack(f, d->oid.hash, data, size, type);
+		threaded_second_pass(NULL);
+
 		display_progress(progress, nr_resolved_deltas);
 	}
 	free(sorted_by_pos);
-- 
2.23.0.866.gb869b98d4c-goog


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 0/7] Better threaded delta resolution in index-pack
  2019-10-17 20:17 ` [PATCH v2 0/7] Better threaded delta resolution in index-pack Jonathan Tan
                     ` (6 preceding siblings ...)
  2019-10-17 20:17   ` [PATCH v2 7/7] index-pack: make quantum of work smaller Jonathan Tan
@ 2020-02-28  0:03   ` Josh Steadmon
  2020-03-10 21:45     ` Jonathan Tan
  7 siblings, 1 reply; 32+ messages in thread
From: Josh Steadmon @ 2020-02-28  0:03 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git, stolee, peff

On 2019.10.17 13:17, Jonathan Tan wrote:
> Thanks, Stolee and Peff, for taking a look at it. Here is a v2. It is
> mostly unchanged, except for expanded commit messages and code comments.
> 
> I've also added a documentation clarification that
> core.deltaBaseCacheLimit is per-thread, appearing as the first patch in
> this patch series.
> 
> From patch 3 (now patch 4):
> 
> > > +	int i;
> > 
> > Technically this probably ought to be a size_t as well, but I'm much
> > more concerned about the allocation ones, where we might accidentally
> > overflow and underallocate a buffer. Overflowing "i" would probably just
> > lead to an error or bad result.
> 
> I believe this needs to be signed, since we're iterating in reverse
> order, so I made it a ssize_t instead (note the extra "s" in front).
> 
> From patch 4 (now patch 5):
> 
> > > Whenever we make a struct base_data, immediately calculate its delta
> > > children. This eliminates confusion as to when the
> > > {ref,ofs}_{first,last} fields are initialized.
> > 
> > That _seems_ like a good idea, but I'm a little worried just because I
> > don't entirely understand why it was being done lazily before. If you've
> > puzzled all that out, it would be nice to make the argument in the
> > commit message.
> 
> I've added an explanation in the commit message.
> 
> Jonathan Tan (7):
>   Documentation: deltaBaseCacheLimit is per-thread
>   index-pack: unify threaded and unthreaded code
>   index-pack: remove redundant parameter
>   index-pack: remove redundant child field
>   index-pack: calculate {ref,ofs}_{first,last} early
>   index-pack: make resolve_delta() assume base data
>   index-pack: make quantum of work smaller
> 
>  Documentation/config/core.txt |   2 +-
>  builtin/index-pack.c          | 446 ++++++++++++++++++----------------
>  2 files changed, 242 insertions(+), 206 deletions(-)

This series mostly looks good to me. There were a few parts I had
trouble following or convincing myself were safe, so there could be some
improvements with comments or more explicit commit messages, but no
problems apart from that.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 3/7] index-pack: remove redundant parameter
  2019-10-17 20:17   ` [PATCH v2 3/7] index-pack: remove redundant parameter Jonathan Tan
@ 2020-02-28  0:04     ` Josh Steadmon
  2020-03-10 21:29       ` Jonathan Tan
  0 siblings, 1 reply; 32+ messages in thread
From: Josh Steadmon @ 2020-02-28  0:04 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git, stolee, peff

On 2019.10.17 13:17, Jonathan Tan wrote:
> find_{ref,ofs}_delta_{,children} take an enum object_type parameter, but
> the object type is already present in the name of the function. Remove
> that parameter from these functions.
> 
> Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
> ---
>  builtin/index-pack.c | 26 ++++++++++++--------------
>  1 file changed, 12 insertions(+), 14 deletions(-)
> 
> diff --git a/builtin/index-pack.c b/builtin/index-pack.c
> index df6b3b8cf6..296804230c 100644
> --- a/builtin/index-pack.c
> +++ b/builtin/index-pack.c
> @@ -614,7 +614,7 @@ static int compare_ofs_delta_bases(off_t offset1, off_t offset2,
>  	       0;
>  }
>  
> -static int find_ofs_delta(const off_t offset, enum object_type type)
> +static int find_ofs_delta(const off_t offset)
>  {
>  	int first = 0, last = nr_ofs_deltas;
>  
> @@ -624,7 +624,8 @@ static int find_ofs_delta(const off_t offset, enum object_type type)
>  		int cmp;
>  
>  		cmp = compare_ofs_delta_bases(offset, delta->offset,
> -					      type, objects[delta->obj_no].type);
> +					      OBJ_OFS_DELTA,
> +					      objects[delta->obj_no].type);
>  		if (!cmp)
>  			return next;
>  		if (cmp < 0) {

It looks like compare_ofs_delta_bases() could be similarly cleaned up
here? This seems to be the only caller.


> @@ -668,7 +668,7 @@ static int compare_ref_delta_bases(const struct object_id *oid1,
>  	return oidcmp(oid1, oid2);
>  }
>  
> -static int find_ref_delta(const struct object_id *oid, enum object_type type)
> +static int find_ref_delta(const struct object_id *oid)
>  {
>  	int first = 0, last = nr_ref_deltas;
>  
> @@ -678,7 +678,8 @@ static int find_ref_delta(const struct object_id *oid, enum object_type type)
>  		int cmp;
>  
>  		cmp = compare_ref_delta_bases(oid, &delta->oid,
> -					      type, objects[delta->obj_no].type);
> +					      OBJ_REF_DELTA,
> +					      objects[delta->obj_no].type);
>  		if (!cmp)
>  			return next;
>  		if (cmp < 0) {

And same with compare_ref_delta_bases here.


> @@ -691,10 +692,9 @@ static int find_ref_delta(const struct object_id *oid, enum object_type type)
>  }
>  
>  static void find_ref_delta_children(const struct object_id *oid,
> -				    int *first_index, int *last_index,
> -				    enum object_type type)
> +				    int *first_index, int *last_index)
>  {
> -	int first = find_ref_delta(oid, type);
> +	int first = find_ref_delta(oid);
>  	int last = first;
>  	int end = nr_ref_deltas - 1;
>  
> @@ -982,12 +982,10 @@ static struct base_data *find_unresolved_deltas_1(struct base_data *base,
>  {
>  	if (base->ref_last == -1 && base->ofs_last == -1) {
>  		find_ref_delta_children(&base->obj->idx.oid,
> -					&base->ref_first, &base->ref_last,
> -					OBJ_REF_DELTA);
> +					&base->ref_first, &base->ref_last);
>  
>  		find_ofs_delta_children(base->obj->idx.offset,
> -					&base->ofs_first, &base->ofs_last,
> -					OBJ_OFS_DELTA);
> +					&base->ofs_first, &base->ofs_last);
>  
>  		if (base->ref_last == -1 && base->ofs_last == -1) {
>  			free(base->data);
> -- 
> 2.23.0.866.gb869b98d4c-goog
> 

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 4/7] index-pack: remove redundant child field
  2019-10-17 20:17   ` [PATCH v2 4/7] index-pack: remove redundant child field Jonathan Tan
@ 2020-02-28  0:04     ` Josh Steadmon
  0 siblings, 0 replies; 32+ messages in thread
From: Josh Steadmon @ 2020-02-28  0:04 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git, stolee, peff

On 2019.10.17 13:17, Jonathan Tan wrote:
> @@ -380,27 +378,37 @@ static void free_base_data(struct base_data *c)
>  	}
>  }
>  
> -static void prune_base_data(struct base_data *retain)
> +static void prune_base_data(struct base_data *youngest_child)
>  {
>  	struct base_data *b;
>  	struct thread_local *data = get_thread_data();
> -	for (b = data->base_cache;
> -	     data->base_cache_used > delta_base_cache_limit && b;
> -	     b = b->child) {
> -		if (b->data && b != retain)
> -			free_base_data(b);
> +	struct base_data **ancestry = NULL;
> +	size_t nr = 0, alloc = 0;
> +	ssize_t i;
> +
> +	if (data->base_cache_used <= delta_base_cache_limit)
> +		return;
> +
> +	/*
> +	 * Free all ancestors of youngest_child until we have enough space,
> +	 * starting with the oldest. (We cannot free youngest_child itself.)
> +	 */
> +	for (b = youngest_child->base; b != NULL; b = b->base) {
> +		ALLOC_GROW(ancestry, nr + 1, alloc);
> +		ancestry[nr++] = b;
>  	}
> +	for (i = nr - 1;
> +	     i >= 0 && data->base_cache_used > delta_base_cache_limit;
> +	     i--) {
> +		if (ancestry[i]->data)
> +			free_base_data(ancestry[i]);
> +	}
> +	free(ancestry);
>  }

I had a small complaint that we're allocating new memory in a function
where we are trying to free up space, but in practice it probably
doesn't matter. And this is removed in a later patch anyway.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 7/7] index-pack: make quantum of work smaller
  2019-10-17 20:17   ` [PATCH v2 7/7] index-pack: make quantum of work smaller Jonathan Tan
@ 2020-02-28  0:04     ` Josh Steadmon
  2020-03-10 21:42       ` Jonathan Tan
  0 siblings, 1 reply; 32+ messages in thread
From: Josh Steadmon @ 2020-02-28  0:04 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git, stolee, peff

On 2019.10.17 13:17, Jonathan Tan wrote:
> Currently, when index-pack resolves deltas, it does not split up delta
> trees into threads: each delta base root (an object that is not a
> REF_DELTA or OFS_DELTA) can go into its own thread, but all deltas on
> that root (direct or indirect) are processed in the same thread.
> 
> This is a problem when a repository contains a large text file (thus,
> delta-able) that is modified many times - delta resolution time during
> fetching is dominated by processing the deltas corresponding to that
> text file.
> 
> This patch contains a solution to that. When cloning using
> 
>   git -c core.deltabasecachelimit=1g clone \
>     https://fuchsia.googlesource.com/third_party/vulkan-cts
> 
> on my laptop, clone time improved from 3m2s to 2m5s (using 3 threads,
> which is the default).
> 
> The solution is to have a global work stack. This stack contains delta
> bases (objects, whether appearing directly in the packfile or generated
> by delta resolution, that themselves have delta children) that need to
> be processed; whenever a thread needs work, it peeks at the top of the
> stack and processes its next unprocessed child. If a thread finds the
> stack empty, it will look for more delta base roots to push on the stack
> instead.
> 
> The main weakness of having a global work stack is that more time is
> spent in the mutex, but profiling has shown that most time is spent in
> the resolution of the deltas themselves, so this shouldn't be an issue
> in practice. In any case, experimentation (as described in the clone
> command above) shows that this patch is a net improvement.
> 
> Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
> ---
>  builtin/index-pack.c | 336 ++++++++++++++++++++++++-------------------
>  1 file changed, 190 insertions(+), 146 deletions(-)
> 
> diff --git a/builtin/index-pack.c b/builtin/index-pack.c
> index 31607a77fc..072592a35d 100644
> --- a/builtin/index-pack.c
> +++ b/builtin/index-pack.c
> @@ -38,15 +38,49 @@ struct base_data {
>  	struct object_entry *obj;
>  	int ref_first, ref_last;
>  	int ofs_first, ofs_last;
> +	/*
> +	 * Threads should increment retain_data if they are about to call
> +	 * patch_delta() using this struct's data as a base, and decrement this
> +	 * when they are done. While retain_data is nonzero, this struct's data
> +	 * will not be freed even if the delta base cache limit is exceeded.
> +	 */
> +	int retain_data;
> +	/*
> +	 * The number of direct children that have not been fully processed
> +	 * (entered work_head, entered done_head, left done_head). When this
> +	 * number reaches zero, this struct base_data can be freed.
> +	 */
> +	int children_remaining;
>  
>  	/* Not initialized by make_base(). */
> +	struct list_head list;
>  	void *data;
>  	unsigned long size;
>  };

As a newb, the addition of a field just called "list" struck me as
needing a more descriptive name. But this makes more sense now after
looking at list.h. And it seems that most uses in other parts of the
code are similarly just named "list".


> +/*
> + * Stack of struct base_data that have unprocessed children.
> + * threaded_second_pass() uses this as a source of work (the other being the
> + * objects array).
> + */
> +LIST_HEAD(work_head);
> +
> +/*
> + * Stack of struct base_data that have children, all of whom have been
> + * processed or are being processed, and at least one child is being processed.
> + * These struct base_data must be kept around until the last child is
> + * processed.
> + */
> +LIST_HEAD(done_head);
> +
> +/*
> + * All threads share one delta base cache.
> + */
> +size_t base_cache_used;
> +size_t base_cache_limit;
> +
>  struct thread_local {
>  	pthread_t thread;
> -	size_t base_cache_used;
>  	int pack_fd;
>  };

Seeing several new shared variables made me paranoid about whether or
not these were thread-safe. There are already five mutexes in this file,
but there are no clear descriptions (apart from hints in the lock names)
about which locks protect which variables. It would be nice to make this
a bit more explicit.

I wish we had something similar to Abseil's "GUARDED_BY" annotations [1]
so that we could ensure thread safety with static analysis, but that's
obviously out of scope for this series.

[1]: https://abseil.io/docs/cpp/guides/synchronization#thread-annotations


> @@ -920,6 +948,8 @@ static struct base_data *make_base(struct object_entry *obj,
>  				&base->ref_first, &base->ref_last);
>  	find_ofs_delta_children(obj->idx.offset,
>  				&base->ofs_first, &base->ofs_last);
> +	base->children_remaining = base->ref_last - base->ref_first +
> +		base->ofs_last - base->ofs_first + 2;
>  	return base;
>  }

The extra +2 here confused me for a bit, but it makes sense now. You're
subtracting indices to get the length, so you want +1 to make the math
work out, and you're doing it across two different lists. In the edge
case where a list is empty, "first" is 0 and "last" is -1, so the math
still works out.


> @@ -1065,33 +1011,128 @@ static int compare_ref_delta_entry(const void *a, const void *b)
[snip]
>  		work_unlock();
>  
> -		resolve_base(&objects[i]);
> +		if (parent) {
> +			child = resolve_delta(child_obj, parent);
> +			if (!child->children_remaining)
> +				FREE_AND_NULL(child->data);
> +		} else {
> +			child = make_base(child_obj, NULL);
> +			if (child->children_remaining) {
> +				/*
> +				 * Since this child has its own delta children,
> +				 * we will need this data in the future.
> +				 * Inflate now so that future iterations will
> +				 * have access to this object's data while
> +				 * outside the work mutex.
> +				 */
> +				child->data = get_data_from_pack(child_obj);
> +				child->size = child_obj->size;
> +			}
> +		}
> +
> +		work_lock();

This part made me uneasy at first, because it looks like we might be
doing work on shared data outside of the work lock. But just prior to
unlocking, we parent->retain_data, so we know it won't be freed. And the
only part we modify is child, which I don't believe is shared with any
other threads.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 3/7] index-pack: remove redundant parameter
  2020-02-28  0:04     ` Josh Steadmon
@ 2020-03-10 21:29       ` Jonathan Tan
  0 siblings, 0 replies; 32+ messages in thread
From: Jonathan Tan @ 2020-03-10 21:29 UTC (permalink / raw)
  To: steadmon; +Cc: jonathantanmy, git, stolee, peff

> It looks like compare_ofs_delta_bases() could be similarly cleaned up
> here? This seems to be the only caller.

[snip]

> And same with compare_ref_delta_bases here.

That's true, although I didn't need these cleanups for the rest of my patches,
so I think it's better to defer them to reduce the overall size of the
patch.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 7/7] index-pack: make quantum of work smaller
  2020-02-28  0:04     ` Josh Steadmon
@ 2020-03-10 21:42       ` Jonathan Tan
  0 siblings, 0 replies; 32+ messages in thread
From: Jonathan Tan @ 2020-03-10 21:42 UTC (permalink / raw)
  To: steadmon; +Cc: jonathantanmy, git, stolee, peff

> > +/*
> > + * Stack of struct base_data that have unprocessed children.
> > + * threaded_second_pass() uses this as a source of work (the other being the
> > + * objects array).
> > + */
> > +LIST_HEAD(work_head);
> > +
> > +/*
> > + * Stack of struct base_data that have children, all of whom have been
> > + * processed or are being processed, and at least one child is being processed.
> > + * These struct base_data must be kept around until the last child is
> > + * processed.
> > + */
> > +LIST_HEAD(done_head);
> > +
> > +/*
> > + * All threads share one delta base cache.
> > + */
> > +size_t base_cache_used;
> > +size_t base_cache_limit;
> > +
> >  struct thread_local {
> >  	pthread_t thread;
> > -	size_t base_cache_used;
> >  	int pack_fd;
> >  };
> 
> Seeing several new shared variables made me paranoid about whether or
> not these were thread-safe. There are already five mutexes in this file,
> but there are no clear descriptions (apart from hints in the lock names)
> about which locks protect which variables. It would be nice to make this
> a bit more explicit.

I think that this patch set should confine itself to the work mutex and
what it controls (since that's what this patch set is affecting), but
admittedly, that is already more than a handful.

The work mutex controls work_head and done_head, and all the struct
base_data contained therein, except for data. However, data is
controlled by the semaphore retain_data (and the semaphore is itself
controlled by the work mutex, as described in the previous sentence):
when retain_data is positive, data can be read from but not written to.
The work mutex also controls base_cache_used but not base_cache_limit.
The work mutex does not control any struct base_data outside of
work_head and done_head, so it is possible to generate a struct
base_data outside the mutex, take the mutex, and add it to the list
(reducing time spent in the mutex).

This scheme seems rather complicated to me, but I couldn't make it
simpler - I was hoping for advice on this. If this is the best we can
do, I can try to document it in code comments.

> I wish we had something similar to Abseil's "GUARDED_BY" annotations [1]
> so that we could ensure thread safety with static analysis, but that's
> obviously out of scope for this series.
> 
> [1]: https://abseil.io/docs/cpp/guides/synchronization#thread-annotations

I think what we need is more complicated than what these annotations
provide, but anyway, Abseil is C++-only, I believe.

> > @@ -1065,33 +1011,128 @@ static int compare_ref_delta_entry(const void *a, const void *b)
> [snip]
> >  		work_unlock();
> >  
> > -		resolve_base(&objects[i]);
> > +		if (parent) {
> > +			child = resolve_delta(child_obj, parent);
> > +			if (!child->children_remaining)
> > +				FREE_AND_NULL(child->data);
> > +		} else {
> > +			child = make_base(child_obj, NULL);
> > +			if (child->children_remaining) {
> > +				/*
> > +				 * Since this child has its own delta children,
> > +				 * we will need this data in the future.
> > +				 * Inflate now so that future iterations will
> > +				 * have access to this object's data while
> > +				 * outside the work mutex.
> > +				 */
> > +				child->data = get_data_from_pack(child_obj);
> > +				child->size = child_obj->size;
> > +			}
> > +		}
> > +
> > +		work_lock();
> 
> This part made me uneasy at first, because it looks like we might be
> doing work on shared data outside of the work lock. But just prior to
> unlocking, we parent->retain_data, so we know it won't be freed.

That's correct that it made you uneasy (I think the usual solution is to
make data refcounted, but we don't have a refcounting implementation in
Git), and that's correct that my solution was that parent->retain_data
prevents data from being concurrently written to (in this case, freed).
I didn't know how better to implement this.

> And the
> only part we modify is child, which I don't believe is shared with any
> other threads.

Yes - in the part you quoted, child is a whole new object (created from
resolve_delta() or make_base()). It is only added to the list within the
mutex (the work_lock()).

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 0/7] Better threaded delta resolution in index-pack
  2020-02-28  0:03   ` [PATCH v2 0/7] Better threaded delta resolution in index-pack Josh Steadmon
@ 2020-03-10 21:45     ` Jonathan Tan
  0 siblings, 0 replies; 32+ messages in thread
From: Jonathan Tan @ 2020-03-10 21:45 UTC (permalink / raw)
  To: steadmon; +Cc: jonathantanmy, git, stolee, peff

> This series mostly looks good to me. There were a few parts I had
> trouble following or convincing myself were safe, so there could be some
> improvements with comments or more explicit commit messages, but no
> problems apart from that.

Thanks!

For the rest who are following along, I somehow couldn't "git am" these
on latest master, but I could "git am" these on an old master commit
(from the time of the original work). The subsequent rebase works with
trivially-resolved merge conflicts except for one due to a21781011f
("index-pack: downgrade twice-resolved REF_DELTA to die()", 2020-02-04)
- but that was fixed by changing the relevant assert() command to
if/die().

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH v2 6/7] index-pack: make resolve_delta() assume base data
  2020-09-08 19:48 ` [PATCH v2 " Jonathan Tan
@ 2020-09-08 19:48   ` Jonathan Tan
  0 siblings, 0 replies; 32+ messages in thread
From: Jonathan Tan @ 2020-09-08 19:48 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, gitster

A subsequent commit will make the quantum of work smaller, necessitating
more locking. This commit allows resolve_delta() to be called outside
the lock.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 builtin/index-pack.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 94d0f53b03..f69a50d46b 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -927,7 +927,7 @@ static struct base_data *make_base(struct object_entry *obj,
 static struct base_data *resolve_delta(struct object_entry *delta_obj,
 				       struct base_data *base)
 {
-	void *base_data, *delta_data, *result_data;
+	void *delta_data, *result_data;
 	struct base_data *result;
 	unsigned long result_size;
 
@@ -942,8 +942,8 @@ static struct base_data *resolve_delta(struct object_entry *delta_obj,
 		obj_stat[i].base_object_no = j;
 	}
 	delta_data = get_data_from_pack(delta_obj);
-	base_data = get_base_data(base);
-	result_data = patch_delta(base_data, base->size,
+	assert(base->data);
+	result_data = patch_delta(base->data, base->size,
 				  delta_data, delta_obj->size, &result_size);
 	free(delta_data);
 	if (!result_data)
@@ -1003,6 +1003,7 @@ static struct base_data *find_unresolved_deltas_1(struct base_data *base,
 			    (uintmax_t)child->idx.offset,
 			    oid_to_hex(&base->obj->idx.oid));
 
+		get_base_data(base);
 		result = resolve_delta(child, base);
 		if (base->ref_first == base->ref_last && base->ofs_last == -1)
 			free_base_data(base);
@@ -1017,6 +1018,7 @@ static struct base_data *find_unresolved_deltas_1(struct base_data *base,
 
 		assert(child->real_type == OBJ_OFS_DELTA);
 		child->real_type = base->obj->real_type;
+		get_base_data(base);
 		result = resolve_delta(child, base);
 		if (base->ofs_first == base->ofs_last)
 			free_base_data(base);
-- 
2.28.0.526.ge36021eeef-goog


^ permalink raw reply related	[flat|nested] 32+ messages in thread

end of thread, other threads:[~2020-09-08 19:49 UTC | newest]

Thread overview: 32+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-10-09 23:44 [RFC PATCH 0/6] Better threaded delta resolution in index-pack Jonathan Tan
2019-10-09 23:44 ` [PATCH 1/6] index-pack: unify threaded and unthreaded code Jonathan Tan
2019-10-17  6:20   ` Jeff King
2019-10-09 23:44 ` [PATCH 2/6] index-pack: remove redundant parameter Jonathan Tan
2019-10-17  6:21   ` Jeff King
2019-10-09 23:44 ` [PATCH 3/6] index-pack: remove redundant child field Jonathan Tan
2019-10-10 14:45   ` Derrick Stolee
2019-10-10 19:02     ` Jonathan Tan
2019-10-17  6:24       ` Jeff King
2019-10-17  6:26   ` Jeff King
2019-10-09 23:44 ` [PATCH 4/6] index-pack: calculate {ref,ofs}_{first,last} early Jonathan Tan
2019-10-17  6:30   ` Jeff King
2019-10-09 23:44 ` [PATCH 5/6] index-pack: make resolve_delta() assume base data Jonathan Tan
2019-10-17  6:32   ` Jeff King
2019-10-09 23:44 ` [PATCH 6/6] index-pack: make quantum of work smaller Jonathan Tan
2019-10-17  6:35   ` Jeff King
2019-10-17 20:17 ` [PATCH v2 0/7] Better threaded delta resolution in index-pack Jonathan Tan
2019-10-17 20:17   ` [PATCH v2 1/7] Documentation: deltaBaseCacheLimit is per-thread Jonathan Tan
2019-10-17 20:17   ` [PATCH v2 2/7] index-pack: unify threaded and unthreaded code Jonathan Tan
2019-10-17 20:17   ` [PATCH v2 3/7] index-pack: remove redundant parameter Jonathan Tan
2020-02-28  0:04     ` Josh Steadmon
2020-03-10 21:29       ` Jonathan Tan
2019-10-17 20:17   ` [PATCH v2 4/7] index-pack: remove redundant child field Jonathan Tan
2020-02-28  0:04     ` Josh Steadmon
2019-10-17 20:17   ` [PATCH v2 5/7] index-pack: calculate {ref,ofs}_{first,last} early Jonathan Tan
2019-10-17 20:17   ` [PATCH v2 6/7] index-pack: make resolve_delta() assume base data Jonathan Tan
2019-10-17 20:17   ` [PATCH v2 7/7] index-pack: make quantum of work smaller Jonathan Tan
2020-02-28  0:04     ` Josh Steadmon
2020-03-10 21:42       ` Jonathan Tan
2020-02-28  0:03   ` [PATCH v2 0/7] Better threaded delta resolution in index-pack Josh Steadmon
2020-03-10 21:45     ` Jonathan Tan
  -- strict thread matches above, loose matches on Subject: below --
2020-08-24 19:16 [PATCH 0/7] Better threaded delta resolution in index-pack (another try) Jonathan Tan
2020-09-08 19:48 ` [PATCH v2 " Jonathan Tan
2020-09-08 19:48   ` [PATCH v2 6/7] index-pack: make resolve_delta() assume base data Jonathan Tan

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).