* [RFC PATCH 0/6] Better threaded delta resolution in index-pack
@ 2019-10-09 23:44 Jonathan Tan
  2019-10-09 23:44 ` [PATCH 1/6] index-pack: unify threaded and unthreaded code Jonathan Tan
                   ` (6 more replies)
  0 siblings, 7 replies; 32+ messages in thread
From: Jonathan Tan @ 2019-10-09 23:44 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, peff, mh

Quoting myself [1]:

> index-pack does parallelize delta resolution, but
> it cannot split up trees into threads: each delta base root can go into
> its own thread, but when a delta base root is processed, all deltas on
> that root (direct or indirect) are processed in the same thread.

This is a problem when a repository contains a large text file (thus,
delta-able) that is modified many times - delta resolution time during
fetching is dominated by processing the deltas corresponding to that
text file. Here are patches that teach index-pack to better divide up
the work.

As an example of the effect, when cloning using

  git -c core.deltabasecachelimit=1g clone \
    https://fuchsia.googlesource.com/third_party/vulkan-cts

on my laptop, clone time improved from 3m2s to 2m5s (using 3 threads,
which is the default).

As you can see from the diff stats, my new algorithm uses a comparable
number of lines of code to the existing one, but I think that it is a bit more
complicated. My main point of difficulty was in handling the delta base
cache - it must be GC-able, but at the same time available to another
thread if it was being used as a base to inflate a delta. In the end,
what I did was to make individual mutex-guarded refcounts for each
inflation result, but the buffer itself is not mutex-guarded: so a
thread could increment the refcount within the mutex, inflate (and
verify) outside the mutex, and then decrement the refcount within the
mutex. (One global mutex guards all the refcounts, as well as other
things.) Any ideas for making this design less complicated are
appreciated.
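
To make the refcount scheme described above concrete, here is a minimal
sketch. It is not the code in the patches: every name in it (base_entry,
base_pin, base_unpin, base_try_free, apply_fake_delta) is hypothetical,
and the "delta" step is a stand-in that just copies the base and appends
one byte. It only illustrates the locking pattern: the refcount is
changed inside the single global mutex, while the buffer is read outside
it.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical names; not the identifiers used in builtin/index-pack.c. */

/* One global mutex guards every refcount (and the buf pointers). */
static pthread_mutex_t work_mutex = PTHREAD_MUTEX_INITIALIZER;

struct base_entry {
	unsigned char *buf; /* inflated contents; read without the mutex while pinned */
	size_t len;
	int refcount;       /* guarded by work_mutex */
};

/* Pin the entry so that it cannot be freed. Returns 0 if already freed. */
static int base_pin(struct base_entry *e)
{
	int ok;

	pthread_mutex_lock(&work_mutex);
	ok = e->buf != NULL;
	if (ok)
		e->refcount++;
	pthread_mutex_unlock(&work_mutex);
	return ok;
}

/* Drop a pin taken with base_pin(). */
static void base_unpin(struct base_entry *e)
{
	pthread_mutex_lock(&work_mutex);
	assert(e->refcount > 0);
	e->refcount--;
	pthread_mutex_unlock(&work_mutex);
}

/* Cache pruning: free the buffer only if no thread has it pinned. */
static int base_try_free(struct base_entry *e)
{
	int freed = 0;

	pthread_mutex_lock(&work_mutex);
	if (!e->refcount && e->buf) {
		free(e->buf);
		e->buf = NULL;
		freed = 1;
	}
	pthread_mutex_unlock(&work_mutex);
	return freed;
}

/*
 * Stand-in for "inflate a delta against its base": copies the base and
 * appends one byte. The copy runs outside the mutex; the pin is what
 * keeps base->buf alive in the meantime.
 */
static unsigned char *apply_fake_delta(struct base_entry *base,
				       unsigned char extra, size_t *out_len)
{
	unsigned char *out;

	if (!base_pin(base))
		return NULL; /* a real caller would re-inflate the base here */
	out = malloc(base->len + 1);
	if (out) {
		memcpy(out, base->buf, base->len);
		out[base->len] = extra;
		*out_len = base->len + 1;
	}
	base_unpin(base);
	return out;
}
```

In this sketch a pruner calling base_try_free() can run concurrently
with any number of apply_fake_delta() callers: it either sees a nonzero
refcount and skips the entry, or frees it before the pin is taken, in
which case the caller learns that the base is gone.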

If this is a good direction, let me know and I'll refine the patches. I
personally think that the improvement in performance is worth the slight
added complexity. Also, in this patch set, I did some cleanup to make
future patches clearer, but some of the cleanup is undone by the future
patches themselves; let me know if squashing those patches would make
this easier to review.

Also CC-ing Mike Hommey because Mike brought up a repo with a similar
case [2], although that case happens during repack.

[1] https://public-inbox.org/git/20190926003300.195781-1-jonathantanmy@google.com/
[2] https://public-inbox.org/git/20190704100530.smn4rpiekwtfylhz@glandium.org/

Jonathan Tan (6):
  index-pack: unify threaded and unthreaded code
  index-pack: remove redundant parameter
  index-pack: remove redundant child field
  index-pack: calculate {ref,ofs}_{first,last} early
  index-pack: make resolve_delta() assume base data
  index-pack: make quantum of work smaller

 builtin/index-pack.c | 375 ++++++++++++++++++++-----------------------
 1 file changed, 177 insertions(+), 198 deletions(-)

-- 
2.23.0.581.g78d2f28ef7-goog


* [PATCH 0/7] Better threaded delta resolution in index-pack (another try)
@ 2020-08-24 19:16 Jonathan Tan
  2020-09-08 19:48 ` [PATCH v2 " Jonathan Tan
  0 siblings, 1 reply; 32+ messages in thread
From: Jonathan Tan @ 2020-08-24 19:16 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, peff, steadmon

I'm trying to resurrect [1] and have rebased it onto the latest master
(675a4aaf3b ("Ninth batch", 2020-08-19)).

Peff said [2] (of v1) that the overall direction seems reasonable and
Josh Steadmon said [3] (of v2) that it looks mostly good except for
possible improvements to commit messages and comments. Josh did not list
out specific improvements to commit messages but I have taken his
suggestions for comments.

[1] https://lore.kernel.org/git/cover.1571343096.git.jonathantanmy@google.com/
[2] https://lore.kernel.org/git/20191017063554.GG10862@sigill.intra.peff.net/
[3] https://lore.kernel.org/git/20200228000350.GB12115@google.com/

Jonathan Tan (7):
  Documentation: deltaBaseCacheLimit is per-thread
  index-pack: remove redundant parameter
  index-pack: unify threaded and unthreaded code
  index-pack: remove redundant child field
  index-pack: calculate {ref,ofs}_{first,last} early
  index-pack: make resolve_delta() assume base data
  index-pack: make quantum of work smaller

 Documentation/config/core.txt |   2 +-
 builtin/index-pack.c          | 449 ++++++++++++++++++----------------
 2 files changed, 244 insertions(+), 207 deletions(-)

-- 
2.28.0.297.g1956fa8f8d-goog



end of thread, other threads:[~2020-09-08 19:50 UTC | newest]

Thread overview: 32+ messages
2019-10-09 23:44 [RFC PATCH 0/6] Better threaded delta resolution in index-pack Jonathan Tan
2019-10-09 23:44 ` [PATCH 1/6] index-pack: unify threaded and unthreaded code Jonathan Tan
2019-10-17  6:20   ` Jeff King
2019-10-09 23:44 ` [PATCH 2/6] index-pack: remove redundant parameter Jonathan Tan
2019-10-17  6:21   ` Jeff King
2019-10-09 23:44 ` [PATCH 3/6] index-pack: remove redundant child field Jonathan Tan
2019-10-10 14:45   ` Derrick Stolee
2019-10-10 19:02     ` Jonathan Tan
2019-10-17  6:24       ` Jeff King
2019-10-17  6:26   ` Jeff King
2019-10-09 23:44 ` [PATCH 4/6] index-pack: calculate {ref,ofs}_{first,last} early Jonathan Tan
2019-10-17  6:30   ` Jeff King
2019-10-09 23:44 ` [PATCH 5/6] index-pack: make resolve_delta() assume base data Jonathan Tan
2019-10-17  6:32   ` Jeff King
2019-10-09 23:44 ` [PATCH 6/6] index-pack: make quantum of work smaller Jonathan Tan
2019-10-17  6:35   ` Jeff King
2019-10-17 20:17 ` [PATCH v2 0/7] Better threaded delta resolution in index-pack Jonathan Tan
2019-10-17 20:17   ` [PATCH v2 1/7] Documentation: deltaBaseCacheLimit is per-thread Jonathan Tan
2019-10-17 20:17   ` [PATCH v2 2/7] index-pack: unify threaded and unthreaded code Jonathan Tan
2019-10-17 20:17   ` [PATCH v2 3/7] index-pack: remove redundant parameter Jonathan Tan
2020-02-28  0:04     ` Josh Steadmon
2020-03-10 21:29       ` Jonathan Tan
2019-10-17 20:17   ` [PATCH v2 4/7] index-pack: remove redundant child field Jonathan Tan
2020-02-28  0:04     ` Josh Steadmon
2019-10-17 20:17   ` [PATCH v2 5/7] index-pack: calculate {ref,ofs}_{first,last} early Jonathan Tan
2019-10-17 20:17   ` [PATCH v2 6/7] index-pack: make resolve_delta() assume base data Jonathan Tan
2019-10-17 20:17   ` [PATCH v2 7/7] index-pack: make quantum of work smaller Jonathan Tan
2020-02-28  0:04     ` Josh Steadmon
2020-03-10 21:42       ` Jonathan Tan
2020-02-28  0:03   ` [PATCH v2 0/7] Better threaded delta resolution in index-pack Josh Steadmon
2020-03-10 21:45     ` Jonathan Tan
  -- strict thread matches above, loose matches on Subject: below --
2020-08-24 19:16 [PATCH 0/7] Better threaded delta resolution in index-pack (another try) Jonathan Tan
2020-09-08 19:48 ` [PATCH v2 " Jonathan Tan
2020-09-08 19:48   ` [PATCH v2 5/7] index-pack: calculate {ref,ofs}_{first,last} early Jonathan Tan
