git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Jeff King <peff@peff.net>
To: Taylor Blau <me@ttaylorr.com>
Cc: git@vger.kernel.org, dstolee@microsoft.com
Subject: Re: [PATCH 1/1] commit-graph.c: avoid unnecessary tag dereference when merging
Date: Sat, 21 Mar 2020 03:03:33 -0400	[thread overview]
Message-ID: <20200321070333.GB1441446@coredump.intra.peff.net> (raw)
In-Reply-To: <20200321061141.GA30636@syl.local>

On Sat, Mar 21, 2020 at 12:11:41AM -0600, Taylor Blau wrote:

> Sure. I'm running best-of-five on the time it takes to re-generate and
> merge a commit-graph based on in-pack commits.
> 
> The script is (in linux.git):
> 
>   $ best-of-five \
>       -p 'rm -rf .git/objects/info/commit-graph{,s/}; git commit-graph write --split=no-merge 2>/dev/null' \
>       git commit-graph write --split=merge-all

If I build Git without your patch and run the "--split=merge-all"
command under a debugger, and then break on parse_object() I find that
all of the commits are already parsed. This happens in
close_reachable().

So we weren't actually reading all of the commits even under the old
code. We were just going into deref_tag(), seeing that the object is
already parsed, and then quickly returning when we see that we already
have an OBJ_COMMIT. I suspect most of your timing differences are mostly
noise.

Perhaps a more interesting case is when you're _not_ adding all of the
existing packed commits as input. There we'd feed only a few objects to
close_reachable(), and it would stop traversing as soon as it hits a
parent that's already in a graph file. So most of the existing objects
would remain unparsed.

I'm not sure how to do that, though. Saying "--input=none" still puts
all of those existing graphed objects into the list of oids to include.
I think you'd need a case where you were legitimately only adding a few
commits, but the merge rules say we need to create one big commit-graph
file.

I guess --input=stdin-commits is a good way to simulate that. Try this
(assuming there's already a split-graph file with all of the commits in
it):

  git rev-parse HEAD >input
  time git commit-graph write --input=stdin-commits --split=merge-all <input

Without your patch on linux.git I get:

  real	0m11.713s
  user	0m11.349s
  sys	0m0.341s

but with it I get:

  real	0m2.305s
  user	0m2.177s
  sys	0m0.100s

A more realistic case would probably be feeding a new small pack to
--input=stdin-packs.

> But, here's where things get... Bizarre. I was trying to come up with a
> way to do fewer things and spend proportionally more time in
> 'merge_commit_graphs', so I did something like:
> 
>   - Generate a pack containing a single, empty commit.
>   - Generate a split commit-graph containing commits in the single large
>     pack containing all of history.
>   - Generate a commit-graph of the small pack, and merge it with the
>     large pack.
> 
> That script is:
> 
>   $ git --version
>   $ git commit -m "empty" --allow-empty
>   $ pack="pack-$(git rev-parse HEAD | git pack-objects .git/objects/pack/pack).idx"
>   $ best-of-five \
>       -p "rm -rf .git/objects/info/commit-graphs && cp -r .git/objects/info/commit-graphs{.bak,}" \
>       sh -c "echo $pack | git commit-graph write --split=merge-all"

I think you'd need --stdin-packs in the actual timed command?

At any rate, I think there is a demonstrable speedup there. But
moreover, I'm pretty sure this existing code is not doing what it
expects:

  /* only add commits if they still exist in the repo */
  result = lookup_commit_reference_gently(ctx->r, &oid, 1);

That won't look at the object database at all if the commit is already
marked as parsed. And that parsing might have come from the commit graph
itself, as our earlier attempts showed. So switching to a real
has_object_file() call is an important _correctness_ fix, even leaving
aside the speed improvements.

-Peff

  parent reply	other threads:[~2020-03-21  7:03 UTC|newest]

Thread overview: 31+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-03-21  3:44 [PATCH 0/1] commit-graph: avoid unnecessary tag deference when merging Taylor Blau
2020-03-21  3:44 ` [PATCH 1/1] commit-graph.c: avoid unnecessary tag dereference " Taylor Blau
2020-03-21  5:00   ` Jeff King
2020-03-21  6:11     ` Taylor Blau
2020-03-21  6:24       ` Taylor Blau
2020-03-21  7:03       ` Jeff King [this message]
2020-03-21 17:27         ` Taylor Blau
2020-03-22  5:36           ` Jeff King
2020-03-22 11:04             ` SZEDER Gábor
2020-03-22 18:45               ` looking up object types quickly, was " Jeff King
2020-03-22 19:18                 ` Jeff King
2020-03-23 20:15               ` Taylor Blau
2020-03-22 16:45             ` Taylor Blau
2020-03-24  6:06               ` Jeff King
2020-03-21 18:50         ` Junio C Hamano
2020-03-22  0:03           ` Derrick Stolee
2020-03-22  0:20             ` Taylor Blau
2020-03-22  0:23               ` Derrick Stolee
2020-03-22  5:49                 ` Jeff King
2020-03-22  6:04                   ` Jeff King
2020-03-22 15:47                     ` Taylor Blau
2020-03-24  6:11                       ` Jeff King
2020-03-24 23:08                         ` Taylor Blau
2020-03-27  8:42                           ` Jeff King
2020-03-27 15:03                             ` Taylor Blau
2020-03-22 15:44                   ` Taylor Blau
2020-03-24  6:14                     ` Jeff King
2020-03-21  5:01   ` Junio C Hamano
2020-03-21  4:56 ` [PATCH 0/1] commit-graph: avoid unnecessary tag deference " Junio C Hamano
2020-03-21  5:04   ` Jeff King
2020-03-21  6:12     ` Taylor Blau

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200321070333.GB1441446@coredump.intra.peff.net \
    --to=peff@peff.net \
    --cc=dstolee@microsoft.com \
    --cc=git@vger.kernel.org \
    --cc=me@ttaylorr.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).