git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Taylor Blau <me@ttaylorr.com>
To: Jeff King <peff@peff.net>
Cc: Taylor Blau <me@ttaylorr.com>,
	git@vger.kernel.org, dstolee@microsoft.com
Subject: Re: [PATCH 1/1] commit-graph.c: avoid unnecessary tag dereference when merging
Date: Sat, 21 Mar 2020 00:11:41 -0600	[thread overview]
Message-ID: <20200321061141.GA30636@syl.local> (raw)
In-Reply-To: <20200321050025.GA1438317@coredump.intra.peff.net>

On Sat, Mar 21, 2020 at 01:00:25AM -0400, Jeff King wrote:
> On Fri, Mar 20, 2020 at 09:44:23PM -0600, Taylor Blau wrote:
>
> > In my testing environment, this improves the time to "merge" a split
> > commit-graph containing all reachable commits in the kernel by
> > re-writing the same commit-graph (effectively measuring the time it
> > takes to check that all of those commits still exist) from:
> >
> >   Attempt 1: 9.614
> >   Attempt 2: 10.984
> >   Attempt 3: 10.39
> >   Attempt 4: 9.14
> >   Attempt 5: 9.439
> >
> >   real	0m9.140s
> >   user	0m8.207s
> >   sys	0m0.602s
> >
> > to:
> >
> >   Attempt 1: 9.12
> >   Attempt 2: 8.904
> >   Attempt 3: 9.361
> >   Attempt 4: 9.288
> >   Attempt 5: 9.677
> >
> >   real	0m8.904s
> >   user	0m8.208s
> >   sys	0m0.596s
> >
> > yielding a modest ~2.6% improvement in the best timings from each run,
> > and ~7.4% improvement on average.
>
> That still seems really slow to me. If we were truly eliding the load of
> most of the commit objects, I'd expect an order of magnitude or so
> improvement. For example, with a fully constructed commit graph in
> linux.git, I get:
>
>   $ time git -c core.commitGraph=1 rev-list HEAD | wc -l
>   886922
>
>   real	0m1.088s
>   user	0m0.659s
>   sys	0m1.161s
>
>   $ time git -c core.commitGraph=0 rev-list HEAD | wc -l
>   886922
>
>   real	0m7.185s
>   user	0m6.729s
>   sys	0m1.882s
>
> Obviously not the same operation, but that should give us a rough idea
> that commit graph lookups are 6-7 times cheaper than loading the actual
> objects. I don't remember the details of the case that originally led us
> towards this patch. Can you share more of the setup you used to generate
> the numbers above (which repo, but more importantly the commands to
> create the initial state and then time the test).

Sure. I'm running best-of-five on the time it takes to re-generate and
merge a commit-graph based on in-pack commits.

The script is (in linux.git):

  $ best-of-five \
      -p 'rm -rf .git/objects/info/commit-graph{,s/}; git commit-graph write --split=no-merge 2>/dev/null' \
      git commit-graph write --split=merge-all

So we're measuring the time it takes to crawl all the packs, decide on
the splitting strategy, and then compare all commits in the new merged
graph to make sure that they don't already exist in the object store.

But, here's where things get... Bizarre. I was trying to come up with a
way to do fewer things and spend proportionally more time in
'merge_commit_graphs', so I did something like:

  - Generate a pack containing a single, empty commit.
  - Generate a split commit-graph containing commits in the single large
    pack containing all of history.
  - Generate a commit-graph of the small pack, and merge it with the
    large pack.

That script is:

  $ git --version
  $ git commit -m "empty" --allow-empty
  $ pack="pack-$(git rev-parse HEAD | git pack-objects .git/objects/pack/pack).idx"
  $ best-of-five \
      -p "rm -rf .git/objects/info/commit-graphs && cp -r .git/objects/info/commit-graphs{.bak,}" \
      sh -c "echo $pack | git commit-graph write --split=merge-all"

but things get... slower with this patch? Here are the results before
and after:

  Attempt 1: 8.444
  Attempt 2: 8.453
  Attempt 3: 8.391
  Attempt 4: 8.376
  Attempt 5: 7.859

  real	0m7.859s
  user	0m7.309s
  sys	0m0.511s

vs:

  Attempt 1: 8.69
  Attempt 2: 8.735
  Attempt 3: 8.619
  Attempt 4: 8.747
  Attempt 5: 8.695

  real	0m8.619s
  user	0m8.030s
  sys	0m0.538s

Without more profiling, I'm puzzled by why this patch seems to make
things *worse* under this scenario.

So, I'd appreciate your thoughts: does this set-up seem reasonable? Is
it introducing some latency that isn't being accounted for in the
original setup?

> The patch otherwise still makes sense to me, but I suspect there are
> other similar optimizations nearby that we'll need to do in tandem.
>
> -Peff

Thanks,
Taylor

  reply	other threads:[~2020-03-21  6:20 UTC|newest]

Thread overview: 31+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-03-21  3:44 [PATCH 0/1] commit-graph: avoid unnecessary tag deference when merging Taylor Blau
2020-03-21  3:44 ` [PATCH 1/1] commit-graph.c: avoid unnecessary tag dereference " Taylor Blau
2020-03-21  5:00   ` Jeff King
2020-03-21  6:11     ` Taylor Blau [this message]
2020-03-21  6:24       ` Taylor Blau
2020-03-21  7:03       ` Jeff King
2020-03-21 17:27         ` Taylor Blau
2020-03-22  5:36           ` Jeff King
2020-03-22 11:04             ` SZEDER Gábor
2020-03-22 18:45               ` looking up object types quickly, was " Jeff King
2020-03-22 19:18                 ` Jeff King
2020-03-23 20:15               ` Taylor Blau
2020-03-22 16:45             ` Taylor Blau
2020-03-24  6:06               ` Jeff King
2020-03-21 18:50         ` Junio C Hamano
2020-03-22  0:03           ` Derrick Stolee
2020-03-22  0:20             ` Taylor Blau
2020-03-22  0:23               ` Derrick Stolee
2020-03-22  5:49                 ` Jeff King
2020-03-22  6:04                   ` Jeff King
2020-03-22 15:47                     ` Taylor Blau
2020-03-24  6:11                       ` Jeff King
2020-03-24 23:08                         ` Taylor Blau
2020-03-27  8:42                           ` Jeff King
2020-03-27 15:03                             ` Taylor Blau
2020-03-22 15:44                   ` Taylor Blau
2020-03-24  6:14                     ` Jeff King
2020-03-21  5:01   ` Junio C Hamano
2020-03-21  4:56 ` [PATCH 0/1] commit-graph: avoid unnecessary tag deference " Junio C Hamano
2020-03-21  5:04   ` Jeff King
2020-03-21  6:12     ` Taylor Blau

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200321061141.GA30636@syl.local \
    --to=me@ttaylorr.com \
    --cc=dstolee@microsoft.com \
    --cc=git@vger.kernel.org \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).