From: Jonathan Tan <jonathantanmy@google.com>
To: git@vger.kernel.org
Cc: Jonathan Tan <jonathantanmy@google.com>, stolee@gmail.com, peff@peff.net
Subject: [PATCH v2 0/7] Better threaded delta resolution in index-pack
Date: Thu, 17 Oct 2019 13:17:09 -0700 [thread overview]
Message-ID: <cover.1571343096.git.jonathantanmy@google.com> (raw)
In-Reply-To: <cover.1570663470.git.jonathantanmy@google.com>
Thanks, Stolee and Peff, for taking a look at it. Here is a v2. It is
mostly unchanged, except for expanded commit messages and code comments.
I've also added a documentation clarification that
core.deltaBaseCacheLimit is per-thread, appearing as the first patch in
this patch series.
From patch 3 (now patch 4):
> > + int i;
>
> Technically this probably ought to be a size_t as well, but I'm much
> more concerned about the allocation ones, where we might accidentally
> overflow and underallocate a buffer. Overflowing "i" would probably just
> lead to an error or bad result.
I believe this needs to be signed, since we're iterating in reverse
order, so I made it a ssize_t instead (note the extra "s" in front).
From patch 4 (now patch 5):
> > Whenever we make a struct base_data, immediately calculate its delta
> > children. This eliminates confusion as to when the
> > {ref,ofs}_{first,last} fields are initialized.
>
> That _seems_ like a good idea, but I'm a little worried just because I
> don't entirely understand why it was being done lazily before. If you've
> puzzled all that out, it would be nice to make the argument in the
> commit message.
I've added an explanation in the commit message.
Jonathan Tan (7):
Documentation: deltaBaseCacheLimit is per-thread
index-pack: unify threaded and unthreaded code
index-pack: remove redundant parameter
index-pack: remove redundant child field
index-pack: calculate {ref,ofs}_{first,last} early
index-pack: make resolve_delta() assume base data
index-pack: make quantum of work smaller
Documentation/config/core.txt | 2 +-
builtin/index-pack.c | 446 ++++++++++++++++++----------------
2 files changed, 242 insertions(+), 206 deletions(-)
Range-diff against v1:
-: ---------- > 1: 0a6777a243 Documentation: deltaBaseCacheLimit is per-thread
1: 7562afbaa9 = 2: b19d6131e0 index-pack: unify threaded and unthreaded code
2: a8567333dc ! 3: f01f069a08 index-pack: remove redundant parameter
@@ Metadata
## Commit message ##
index-pack: remove redundant parameter
+ find_{ref,ofs}_delta_{,children} take an enum object_type parameter, but
+ the object type is already present in the name of the function. Remove
+ that parameter from these functions.
+
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
## builtin/index-pack.c ##
3: 97eddde2ec ! 4: 3359b66b84 index-pack: remove redundant child field
@@ Metadata
## Commit message ##
index-pack: remove redundant child field
- Instead, recompute ancestry if we ever need to reclaim memory.
+ This is refactoring 1 of 2 to simplify struct base_data.
+
+ In index-pack, each thread maintains a doubly-linked list of the delta
+ chain that it is currently processing (the "base" and "child" pointers
+ in struct base_data). When a thread exceeds the delta base cache limit
+ and needs to reclaim memory, it uses the "child" pointers to traverse
+ the lineage, reclaiming the memory of the eldest delta bases first.
+
+ A subsequent patch will perform memory reclaiming in a different way and
+ will thus no longer need the "child" pointer. Because the "child"
+ pointer is redundant even now, remove it so that the aforementioned
+ subsequent patch will be clearer. In the meantime, reclaim memory in the
+ reverse order of the "base" pointers.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
@@ builtin/index-pack.c: static void free_base_data(struct base_data *c)
- if (b->data && b != retain)
- free_base_data(b);
+ struct base_data **ancestry = NULL;
-+ int nr = 0, alloc = 0;
-+ int i;
++ size_t nr = 0, alloc = 0;
++ ssize_t i;
+
+ if (data->base_cache_used <= delta_base_cache_limit)
+ return;
@@ builtin/index-pack.c: static void free_base_data(struct base_data *c)
+ for (b = youngest_child->base; b != NULL; b = b->base) {
+ ALLOC_GROW(ancestry, nr + 1, alloc);
+ ancestry[nr++] = b;
-+ }
+ }
+ for (i = nr - 1;
+ i >= 0 && data->base_cache_used > delta_base_cache_limit;
+ i--) {
+ if (ancestry[i]->data)
+ free_base_data(ancestry[i]);
- }
++ }
+ free(ancestry);
}
4: 5d9687145d ! 5: 7f18480c45 index-pack: calculate {ref,ofs}_{first,last} early
@@ Metadata
## Commit message ##
index-pack: calculate {ref,ofs}_{first,last} early
+ This is refactoring 2 of 2 to simplify struct base_data.
+
Whenever we make a struct base_data, immediately calculate its delta
children. This eliminates confusion as to when the
{ref,ofs}_{first,last} fields are initialized.
+ Before this patch, the delta children were calculated at the last
+ possible moment. This allowed the members of struct base_data to be
+ populated in any order, superficially useful when we have the object
+ contents before the struct object_entry. But this makes reasoning about
+ the state of struct base_data more complicated, hence this patch.
+
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
## builtin/index-pack.c ##
5: ca4997017d = 6: 910b15219f index-pack: make resolve_delta() assume base data
6: 4f7c252a7c ! 7: 2f2e36d3ef index-pack: make quantum of work smaller
@@ Metadata
## Commit message ##
index-pack: make quantum of work smaller
+ Currently, when index-pack resolves deltas, it does not split up delta
+ trees into threads: each delta base root (an object that is not a
+ REF_DELTA or OFS_DELTA) can go into its own thread, but all deltas on
+ that root (direct or indirect) are processed in the same thread.
+
+ This is a problem when a repository contains a large text file (thus,
+ delta-able) that is modified many times - delta resolution time during
+ fetching is dominated by processing the deltas corresponding to that
+ text file.
+
+ This patch contains a solution to that. When cloning using
+
+ git -c core.deltabasecachelimit=1g clone \
+ https://fuchsia.googlesource.com/third_party/vulkan-cts
+
+ on my laptop, clone time improved from 3m2s to 2m5s (using 3 threads,
+ which is the default).
+
+ The solution is to have a global work stack. This stack contains delta
+ bases (objects, whether appearing directly in the packfile or generated
+ by delta resolution, that themselves have delta children) that need to
+ be processed; whenever a thread needs work, it peeks at the top of the
+ stack and processes its next unprocessed child. If a thread finds the
+ stack empty, it will look for more delta base roots to push on the stack
+ instead.
+
+ The main weakness of having a global work stack is that more time is
+ spent in the mutex, but profiling has shown that most time is spent in
+ the resolution of the deltas themselves, so this shouldn't be an issue
+ in practice. In any case, experimentation (as described in the clone
+ command above) shows that this patch is a net improvement.
+
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
## builtin/index-pack.c ##
@@ builtin/index-pack.c: struct base_data {
struct object_entry *obj;
int ref_first, ref_last;
int ofs_first, ofs_last;
++ /*
++ * Threads should increment retain_data if they are about to call
++ * patch_delta() using this struct's data as a base, and decrement this
++ * when they are done. While retain_data is nonzero, this struct's data
++ * will not be freed even if the delta base cache limit is exceeded.
++ */
+ int retain_data;
++ /*
++ * The number of direct children that have not been fully processed
++ * (entered work_head, entered done_head, left done_head). When this
++ * number reaches zero, this struct base_data can be freed.
++ */
+ int children_remaining;
/* Not initialized by make_base(). */
@@ builtin/index-pack.c: struct base_data {
unsigned long size;
};
++/*
++ * Stack of struct base_data that have unprocessed children.
++ * threaded_second_pass() uses this as a source of work (the other being the
++ * objects array).
++ */
+LIST_HEAD(work_head);
++
++/*
++ * Stack of struct base_data that have children, all of whom have been
++ * processed or are being processed, and at least one child is being processed.
++ * These struct base_data must be kept around until the last child is
++ * processed.
++ */
+LIST_HEAD(done_head);
++
++/*
++ * All threads share one delta base cache.
++ */
+size_t base_cache_used;
+size_t base_cache_limit;
+
@@ builtin/index-pack.c: static void free_base_data(struct base_data *c)
- struct base_data *b;
- struct thread_local *data = get_thread_data();
- struct base_data **ancestry = NULL;
-- int nr = 0, alloc = 0;
-- int i;
+- size_t nr = 0, alloc = 0;
+- ssize_t i;
+ struct list_head *pos;
- if (data->base_cache_used <= delta_base_cache_limit)
@@ builtin/index-pack.c: static void free_base_data(struct base_data *c)
}
static int is_delta_type(enum object_type type)
+@@ builtin/index-pack.c: static void sha1_object(const void *data, struct object_entry *obj_entry,
+ }
+
+ /*
+- * This function is part of find_unresolved_deltas(). There are two
+- * walkers going in the opposite ways.
+- *
+- * The first one in find_unresolved_deltas() traverses down from
+- * parent node to children, deflating nodes along the way. However,
+- * memory for deflated nodes is limited by delta_base_cache_limit, so
+- * at some point parent node's deflated content may be freed.
+- *
+- * The second walker is this function, which goes from current node up
++ * Walk from current node up
+ * to top parent if necessary to deflate the node. In normal
+ * situation, its parent node would be already deflated, so it just
+ * needs to apply delta.
@@ builtin/index-pack.c: static void *get_base_data(struct base_data *c)
if (!delta_nr) {
c->data = get_data_from_pack(obj);
@@ builtin/index-pack.c: static int compare_ref_delta_entry(const void *a, const vo
}
-static void resolve_base(struct object_entry *obj)
-+static void *do_work(void *data)
- {
+-{
- struct base_data *base_obj = make_base(obj, NULL);
-
- find_unresolved_deltas(base_obj);
-}
-
--static void *threaded_second_pass(void *data)
--{
+ static void *threaded_second_pass(void *data)
+ {
- set_thread_data(data);
+ if (data)
+ set_thread_data(data);
@@ builtin/index-pack.c: static int compare_ref_delta_entry(const void *a, const vo
- work_unlock();
- break;
+ if (list_empty(&work_head)) {
++ /*
++ * Take an object from the object array.
++ */
+ while (nr_dispatched < nr_objects &&
+ is_delta_type(objects[nr_dispatched].type))
+ nr_dispatched++;
@@ builtin/index-pack.c: static int compare_ref_delta_entry(const void *a, const vo
+ }
+ child_obj = &objects[nr_dispatched++];
+ } else {
++ /*
++ * Peek at the top of the stack, and take a child from
++ * it.
++ */
+ parent = list_first_entry(&work_head, struct base_data,
+ list);
+
@@ builtin/index-pack.c: static int compare_ref_delta_entry(const void *a, const vo
+
+ if (parent->ref_first > parent->ref_last &&
+ parent->ofs_first > parent->ofs_last) {
++ /*
++ * This parent has run out of children, so move
++ * it to done_head.
++ */
+ list_del(&parent->list);
+ list_add(&parent->list, &done_head);
+ }
+
++ /*
++ * Ensure that the parent has data, since we will need
++ * it later.
++ *
++ * NEEDSWORK: If parent data needs to be reloaded, this
++ * prolongs the time that the current thread spends in
++ * the mutex. A mitigating factor is that parent data
++ * needs to be reloaded only if the delta base cache
++ * limit is exceeded, so in the typical case, this does
++ * not happen.
++ */
+ get_base_data(parent);
+ parent->retain_data++;
}
@@ builtin/index-pack.c: static int compare_ref_delta_entry(const void *a, const vo
+ if (parent)
+ parent->retain_data--;
+ if (child->data) {
++ /*
++ * This child has its own children, so add it to
++ * work_head.
++ */
+ list_add(&child->list, &work_head);
+ base_cache_used += child->size;
+ prune_base_data(NULL);
+ } else {
++ /*
++ * This child does not have its own children. It may be
++ * the last descendant of its ancestors; free those
++ * that we can.
++ */
+ struct base_data *p = parent;
+
+ while (p) {
@@ builtin/index-pack.c: static void resolve_deltas(void)
if (nr_threads > 1 || getenv("GIT_FORCE_THREADS")) {
init_thread();
for (i = 0; i < nr_threads; i++) {
- int ret = pthread_create(&thread_data[i].thread, NULL,
-- threaded_second_pass, thread_data + i);
-+ do_work, thread_data + i);
- if (ret)
- die(_("unable to create thread: %s"),
- strerror(ret));
-@@ builtin/index-pack.c: static void resolve_deltas(void)
- cleanup_thread();
- return;
- }
-- threaded_second_pass(¬hread_data);
-+ do_work(¬hread_data);
- }
-
- /*
@@ builtin/index-pack.c: static void fix_unresolved_deltas(struct hashfile *f)
for (i = 0; i < nr_ref_deltas; i++) {
struct ref_delta_entry *d = sorted_by_pos[i];
@@ builtin/index-pack.c: static void fix_unresolved_deltas(struct hashfile *f)
- base->data = data;
- base->size = size;
- find_unresolved_deltas(base);
++
++ /*
++ * Add this as an object to the objects array and call
++ * threaded_second_pass() (which will pick up the added
++ * object).
++ */
+ append_obj_to_pack(f, d->oid.hash, data, size, type);
-+ do_work(NULL); /* will pick up new object in objects array (added by append_obj_to_pack) */
++ threaded_second_pass(NULL);
++
display_progress(progress, nr_resolved_deltas);
}
free(sorted_by_pos);
--
2.23.0.866.gb869b98d4c-goog
next prev parent reply other threads:[~2019-10-17 20:17 UTC|newest]
Thread overview: 31+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-10-09 23:44 [RFC PATCH 0/6] Better threaded delta resolution in index-pack Jonathan Tan
2019-10-09 23:44 ` [PATCH 1/6] index-pack: unify threaded and unthreaded code Jonathan Tan
2019-10-17 6:20 ` Jeff King
2019-10-09 23:44 ` [PATCH 2/6] index-pack: remove redundant parameter Jonathan Tan
2019-10-17 6:21 ` Jeff King
2019-10-09 23:44 ` [PATCH 3/6] index-pack: remove redundant child field Jonathan Tan
2019-10-10 14:45 ` Derrick Stolee
2019-10-10 19:02 ` Jonathan Tan
2019-10-17 6:24 ` Jeff King
2019-10-17 6:26 ` Jeff King
2019-10-09 23:44 ` [PATCH 4/6] index-pack: calculate {ref,ofs}_{first,last} early Jonathan Tan
2019-10-17 6:30 ` Jeff King
2019-10-09 23:44 ` [PATCH 5/6] index-pack: make resolve_delta() assume base data Jonathan Tan
2019-10-17 6:32 ` Jeff King
2019-10-09 23:44 ` [PATCH 6/6] index-pack: make quantum of work smaller Jonathan Tan
2019-10-17 6:35 ` Jeff King
2019-10-17 20:17 ` Jonathan Tan [this message]
2019-10-17 20:17 ` [PATCH v2 1/7] Documentation: deltaBaseCacheLimit is per-thread Jonathan Tan
2019-10-17 20:17 ` [PATCH v2 2/7] index-pack: unify threaded and unthreaded code Jonathan Tan
2019-10-17 20:17 ` [PATCH v2 3/7] index-pack: remove redundant parameter Jonathan Tan
2020-02-28 0:04 ` Josh Steadmon
2020-03-10 21:29 ` Jonathan Tan
2019-10-17 20:17 ` [PATCH v2 4/7] index-pack: remove redundant child field Jonathan Tan
2020-02-28 0:04 ` Josh Steadmon
2019-10-17 20:17 ` [PATCH v2 5/7] index-pack: calculate {ref,ofs}_{first,last} early Jonathan Tan
2019-10-17 20:17 ` [PATCH v2 6/7] index-pack: make resolve_delta() assume base data Jonathan Tan
2019-10-17 20:17 ` [PATCH v2 7/7] index-pack: make quantum of work smaller Jonathan Tan
2020-02-28 0:04 ` Josh Steadmon
2020-03-10 21:42 ` Jonathan Tan
2020-02-28 0:03 ` [PATCH v2 0/7] Better threaded delta resolution in index-pack Josh Steadmon
2020-03-10 21:45 ` Jonathan Tan
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: http://vger.kernel.org/majordomo-info.html
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=cover.1571343096.git.jonathantanmy@google.com \
--to=jonathantanmy@google.com \
--cc=git@vger.kernel.org \
--cc=peff@peff.net \
--cc=stolee@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://80x24.org/mirrors/git.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).