git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Jeff King <peff@peff.net>
To: Taylor Blau <me@ttaylorr.com>
Cc: git@vger.kernel.org, dstolee@microsoft.com
Subject: Re: [PATCH 1/1] commit-graph.c: avoid unnecessary tag dereference when merging
Date: Sun, 22 Mar 2020 01:36:35 -0400	[thread overview]
Message-ID: <20200322053635.GA578498@coredump.intra.peff.net> (raw)
In-Reply-To: <20200321172716.GA39461@syl.local>

On Sat, Mar 21, 2020 at 11:27:16AM -0600, Taylor Blau wrote:

> > I'm not sure how to do that, though. Saying "--input=none" still puts
> > all of those existing graphed objects into the list of oids to include.
> > I think you'd need a case where you were legitimately only adding a few
> > commits, but the merge rules say we need to create one big commit-graph
> > file.
> 
> You have to be careful, since we're taking the reachability closure over
> any commits that you do give. So, one thing you could do to ensure that
> you have an actually small graph is to take something from the output of
> 'git rev-list --max-parents=0 HEAD'.

I don't think you need to be that careful, though. In split-mode,
close_reachable() will stop traversing when it finds a graphed commit.
That's why using the tip of HEAD in my previous example worked.

> To try and reproduce your results, I used '1da177e4c3', which is the
> kernel's first commit in Git. If my interpretation of your setup is
> faithful, it goes something like:
> 
>   $ graphdir=.git/objects/info/commit-graphs
>   $ git rev-parse 1da177e4c3 |
>     git commit-graph write --split=no-merge --stdin-commits
>   $ cp -r "$graphdir{,.bak}"
> 
>   $ best-of-five -p "rm -rf $graphdir && cp -r $graphdir{.bak,}" \
>     'git commit-graph write --split=merge-all'

My case is the opposite, isn't it? Here it looks like you've made a very
_tiny_ commit-graph file (with one commit), and then you're going to end
up adding in all of the new objects. I don't think it would be improved
much by this patch (which makes me very confused by the numbers you got
below).

I also don't think it's that interesting a real-world case.

The more interesting one is where you do already have a big
commit-graph, and want to add just a bit more to it. In the real world,
that might look something like this:

  # create a fake server repo
  git clone --bare . dst.git

  # imagine the server already has everything in a graph file
  git -C dst.git commit-graph write --split=no-merge --reachable

  # and now do a small push
  git commit --allow-empty -m foo
  git push dst.git

  # the server might do an incremental immediately to cover the new
  # objects; here we'll use --stdin-commits with the new data, but a
  # real server might feed the new packfile. We'd probably just use
  # regular --split here in practice, but let's imagine that we're
  # starting to have a lot of graph files, and that triggers a desire to
  # merge. We'll force that state with --split=merge-all.
  git rev-list HEAD^..HEAD |
  git -C dst.git commit-graph write --split=merge-all --stdin-commits

Without your patch, that takes ~11s for me. With it, it takes ~2s.

Another equally interesting case is if the per-push generation _doesn't_
merge anything, and just creates a new, tiny graph file. And then later
we want to do a real maintenance, merging them all done. I think that
would be something like:

  git -C dst.git commit-graph write --input=none --split=merge-all

But that _isn't_ improved by your patch. For the simple reason that all
of the commits will already have been parsed. The "--input=none" option
isn't "no input"; it's actually "take all existing graphed objects as
input" (i.e., it implies --append). So each of those objects will
already have been examined in an earlier stage.

> Where the last step is taking all commits listed in any pack, which is
> cheap to iterate.

I'm not sure it's all that cheap. It has to find the type of every
object in every pack. And finding types involves walking delta chains.
That's something like 7s on my machine for linux.git (compared to the 2s
in which I can just merge down the existing graph files).

> In the above setup, I get something like:
> 
>   git version 2.26.0.rc2.221.ge327a58236
>   Attempt 1: 16.933
>   Attempt 2: 18.101
>   Attempt 3: 17.603
>   Attempt 4: 20.404
>   Attempt 5: 18.871
> 
>   real	0m16.933s
>   user	0m16.440s
>   sys	0m0.472s
> 
> versus:
> 
>   git version 2.26.0.rc2.222.g295e7905ee
>   Attempt 1: 5.34
>   Attempt 2: 4.623
>   Attempt 3: 5.263
>   Attempt 4: 5.268
>   Attempt 5: 5.606
> 
>   real	0m4.623s
>   user	0m4.428s
>   sys	0m0.176s
> 
> which is a best-case savings of ~72.7%, and a savings of ~71.5%. That
> seems much better.

I'm still puzzled by this. In the setup you showed, hardly anything is
graphed. But the change is only in the graph-merge code.

-Peff

  reply	other threads:[~2020-03-22  5:36 UTC|newest]

Thread overview: 31+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-03-21  3:44 [PATCH 0/1] commit-graph: avoid unnecessary tag deference when merging Taylor Blau
2020-03-21  3:44 ` [PATCH 1/1] commit-graph.c: avoid unnecessary tag dereference " Taylor Blau
2020-03-21  5:00   ` Jeff King
2020-03-21  6:11     ` Taylor Blau
2020-03-21  6:24       ` Taylor Blau
2020-03-21  7:03       ` Jeff King
2020-03-21 17:27         ` Taylor Blau
2020-03-22  5:36           ` Jeff King [this message]
2020-03-22 11:04             ` SZEDER Gábor
2020-03-22 18:45               ` looking up object types quickly, was " Jeff King
2020-03-22 19:18                 ` Jeff King
2020-03-23 20:15               ` Taylor Blau
2020-03-22 16:45             ` Taylor Blau
2020-03-24  6:06               ` Jeff King
2020-03-21 18:50         ` Junio C Hamano
2020-03-22  0:03           ` Derrick Stolee
2020-03-22  0:20             ` Taylor Blau
2020-03-22  0:23               ` Derrick Stolee
2020-03-22  5:49                 ` Jeff King
2020-03-22  6:04                   ` Jeff King
2020-03-22 15:47                     ` Taylor Blau
2020-03-24  6:11                       ` Jeff King
2020-03-24 23:08                         ` Taylor Blau
2020-03-27  8:42                           ` Jeff King
2020-03-27 15:03                             ` Taylor Blau
2020-03-22 15:44                   ` Taylor Blau
2020-03-24  6:14                     ` Jeff King
2020-03-21  5:01   ` Junio C Hamano
2020-03-21  4:56 ` [PATCH 0/1] commit-graph: avoid unnecessary tag deference " Junio C Hamano
2020-03-21  5:04   ` Jeff King
2020-03-21  6:12     ` Taylor Blau

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200322053635.GA578498@coredump.intra.peff.net \
    --to=peff@peff.net \
    --cc=dstolee@microsoft.com \
    --cc=git@vger.kernel.org \
    --cc=me@ttaylorr.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).