From: Taylor Blau <me@ttaylorr.com>
To: Taylor Blau <me@ttaylorr.com>
Cc: Jeff King <peff@peff.net>, git@vger.kernel.org, dstolee@microsoft.com
Subject: Re: [PATCH 1/1] commit-graph.c: avoid unnecessary tag dereference when merging
Date: Sat, 21 Mar 2020 00:24:19 -0600 [thread overview]
Message-ID: <20200321062419.GA37345@syl.local> (raw)
In-Reply-To: <20200321061141.GA30636@syl.local>
On Sat, Mar 21, 2020 at 12:11:41AM -0600, Taylor Blau wrote:
> On Sat, Mar 21, 2020 at 01:00:25AM -0400, Jeff King wrote:
> > On Fri, Mar 20, 2020 at 09:44:23PM -0600, Taylor Blau wrote:
> >
> > > In my testing environment, this improves the time to "merge" a split
> > > commit-graph containing all reachable commits in the kernel by
> > > re-writing the same commit-graph (effectively measuring the time it
> > > takes to check that all of those commits still exist) from:
> > >
> > > Attempt 1: 9.614
> > > Attempt 2: 10.984
> > > Attempt 3: 10.39
> > > Attempt 4: 9.14
> > > Attempt 5: 9.439
> > >
> > > real 0m9.140s
> > > user 0m8.207s
> > > sys 0m0.602s
> > >
> > > to:
> > >
> > > Attempt 1: 9.12
> > > Attempt 2: 8.904
> > > Attempt 3: 9.361
> > > Attempt 4: 9.288
> > > Attempt 5: 9.677
> > >
> > > real 0m8.904s
> > > user 0m8.208s
> > > sys 0m0.596s
> > >
> > > yielding a modest ~2.6% improvement in the best timings from each run,
> > > and ~7.4% improvement on average.
> >
> > That still seems really slow to me. If we were truly eliding the load of
> > most of the commit objects, I'd expect an order of magnitude or so
> > improvement. For example, with a fully constructed commit graph in
> > linux.git, I get:
> >
> > $ time git -c core.commitGraph=1 rev-list HEAD | wc -l
> > 886922
> >
> > real 0m1.088s
> > user 0m0.659s
> > sys 0m1.161s
> >
> > $ time git -c core.commitGraph=0 rev-list HEAD | wc -l
> > 886922
> >
> > real 0m7.185s
> > user 0m6.729s
> > sys 0m1.882s
> >
> > Obviously not the same operation, but that should give us a rough idea
> > that commit graph lookups are 6-7 times cheaper than loading the actual
> > objects. I don't remember the details of the case that originally led us
> > towards this patch. Can you share more of the setup you used to generate
> > the numbers above (which repo, but more importantly the commands to
> > create the initial state and then time the test).
>
> Sure. I'm running best-of-five on the time it takes to re-generate and
> merge a commit-graph based on in-pack commits.
>
> The script is (in linux.git):
>
> $ best-of-five \
> -p 'rm -rf .git/objects/info/commit-graph{,s/}; git commit-graph write --split=no-merge 2>/dev/null' \
> git commit-graph write --split=merge-all
>
> So we're measuring the time it takes to crawl all the packs, decide on
> the splitting strategy, and then compare all commits in the new merged
> graph to make sure that they don't already exist in the object store.
>
> But, here's where things get... Bizarre. I was trying to come up with a
> way to do fewer things and spend proportionally more time in
> 'merge_commit_graphs', so I did something like:
>
> - Generate a pack containing a single, empty commit.
> - Generate a split commit-graph containing commits in the single large
> pack containing all of history.
> - Generate a commit-graph of the small pack, and merge it with the
> large pack.
>
> That script is:
>
> $ git --version
> $ git commit -m "empty" --allow-empty
> $ pack="pack-$(git rev-parse HEAD | git pack-objects .git/objects/pack/pack).idx"
> $ best-of-five \
> -p "rm -rf .git/objects/info/commit-graphs && cp -r .git/objects/info/commit-graphs{.bak,}" \
> sh -c "echo $pack | git commit-graph write --split=merge-all"
>
> but things get... slower with this patch? Here are the results before
> and after:
>
> Attempt 1: 8.444
> Attempt 2: 8.453
> Attempt 3: 8.391
> Attempt 4: 8.376
> Attempt 5: 7.859
>
> real 0m7.859s
> user 0m7.309s
> sys 0m0.511s
>
> vs:
>
> Attempt 1: 8.69
> Attempt 2: 8.735
> Attempt 3: 8.619
> Attempt 4: 8.747
> Attempt 5: 8.695
>
> real 0m8.619s
> user 0m8.030s
> sys 0m0.538s
>
> Without more profiling, I'm puzzled by why this patch seems to make
> things *worse* under this scenario.
Hmm. I tried roughly the same setup as the latter of the above two, but
this time under perf. On a warm cache, it does look like this parch is
an improvement:
$ sudo perf diff perf.data perf.data.patch | grep merge_commit
0.11% -0.02% git [.] merge_commit_graph
(perf.data is from running 'next', perf.data.patch is from running
'next' with this patch applied on top). So... a slight improvement, but
nothing like what I had reported in the original patch.
I am puzzled.
> So, I'd appreciate your thoughts: does this set-up seem reasonable? Is
> it introducing some latency that isn't being accounted for in the
> original setup?
>
> > The patch otherwise still makes sense to me, but I suspect there are
> > other similar optimizations nearby that we'll need to do in tandem.
> >
> > -Peff
>
> Thanks,
> Taylor
Thanks,
Taylor
next prev parent reply other threads:[~2020-03-21 6:24 UTC|newest]
Thread overview: 31+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-03-21 3:44 [PATCH 0/1] commit-graph: avoid unnecessary tag deference when merging Taylor Blau
2020-03-21 3:44 ` [PATCH 1/1] commit-graph.c: avoid unnecessary tag dereference " Taylor Blau
2020-03-21 5:00 ` Jeff King
2020-03-21 6:11 ` Taylor Blau
2020-03-21 6:24 ` Taylor Blau [this message]
2020-03-21 7:03 ` Jeff King
2020-03-21 17:27 ` Taylor Blau
2020-03-22 5:36 ` Jeff King
2020-03-22 11:04 ` SZEDER Gábor
2020-03-22 18:45 ` looking up object types quickly, was " Jeff King
2020-03-22 19:18 ` Jeff King
2020-03-23 20:15 ` Taylor Blau
2020-03-22 16:45 ` Taylor Blau
2020-03-24 6:06 ` Jeff King
2020-03-21 18:50 ` Junio C Hamano
2020-03-22 0:03 ` Derrick Stolee
2020-03-22 0:20 ` Taylor Blau
2020-03-22 0:23 ` Derrick Stolee
2020-03-22 5:49 ` Jeff King
2020-03-22 6:04 ` Jeff King
2020-03-22 15:47 ` Taylor Blau
2020-03-24 6:11 ` Jeff King
2020-03-24 23:08 ` Taylor Blau
2020-03-27 8:42 ` Jeff King
2020-03-27 15:03 ` Taylor Blau
2020-03-22 15:44 ` Taylor Blau
2020-03-24 6:14 ` Jeff King
2020-03-21 5:01 ` Junio C Hamano
2020-03-21 4:56 ` [PATCH 0/1] commit-graph: avoid unnecessary tag deference " Junio C Hamano
2020-03-21 5:04 ` Jeff King
2020-03-21 6:12 ` Taylor Blau
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: http://vger.kernel.org/majordomo-info.html
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20200321062419.GA37345@syl.local \
--to=me@ttaylorr.com \
--cc=dstolee@microsoft.com \
--cc=git@vger.kernel.org \
--cc=peff@peff.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://80x24.org/mirrors/git.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).