git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Elijah Newren <newren@gmail.com>
To: Derrick Stolee <stolee@gmail.com>
Cc: Git Mailing List <git@vger.kernel.org>,
	Junio C Hamano <gitster@pobox.com>, Jeff King <peff@peff.net>,
	Derrick Stolee <dstolee@microsoft.com>
Subject: Re: [PATCH 0/3] Make add_missing_tags() linear
Date: Thu, 1 Nov 2018 11:57:29 -0700	[thread overview]
Message-ID: <CABPp-BG6+3F+FY0JP2mj0OCLjp17xQXTe54mBCN+RifQd4Aj2A@mail.gmail.com> (raw)
In-Reply-To: <ad4ab388-6244-1811-ec1b-6f8143df620e@gmail.com>

On Thu, Nov 1, 2018 at 5:32 AM Derrick Stolee <stolee@gmail.com> wrote:
> On 11/1/2018 2:52 AM, Elijah Newren wrote:
> > On Wed, Oct 31, 2018 at 5:05 AM Derrick Stolee <stolee@gmail.com> wrote:
> >> On 10/31/2018 2:04 AM, Elijah Newren wrote:
> >>>
> >>> On the original repo where the topic was brought up, with commit-graph
> >>> NOT turned on and using origin/master, I see:
> >>>
> >>> $ time git push --dry-run --follow-tags /home/newren/repo-mirror
> >>> To /home/newren/repo-mirror
> >>>   * [new branch]       test5 -> test5
> >>>
> >>> real 1m20.081s
> >>> user 1m19.688s
> >>> sys 0m0.292s
> >>>
> >>> Merging this series in, I now get:
> >>>
> >>> $ time git push --dry-run --follow-tags /home/newren/repo-mirror
> >>> To /home/newren/repo-mirror
> >>>   * [new branch]       test5 -> test5
> >>>
> >>> real 0m2.857s
> >>> user 0m2.580s
> >>> sys 0m0.328s
> >>>
> >>> which provides a very nice speedup.
> >>>
> >>> Oddly enough, if I _also_ do the following:
> >>> $ git config core.commitgraph true
> >>> $ git config gc.writecommitgraph true
> >>> $ git gc
> >>>
> >>> then my timing actually slows down just slightly:
> >>> $ time git push --follow-tags --dry-run /home/newren/repo-mirror
> >>> To /home/newren/repo-mirror
> >>>   * [new branch]          test5 -> test5
> >>>
> >>> real 0m3.027s
> >>> user 0m2.696s
> >>> sys 0m0.400s
> >> So you say that the commit-graph is off in the 2.8s case, but not here
> >> in the 3.1s case? I would expect _at minimum_ that the cost of parsing
> >> commits would have a speedup in the commit-graph case.  There may be
> >> something else going on here, since you are timing a `push` event that
> >> is doing more than the current walk.
> >>
> >>> (run-to-run variation seems pretty consistent, < .1s variation, so
> >>> this difference is just enough to notice.)  I wouldn't be that
> >>> surprised if that means there's some really old tags with very small
> >>> generation numbers, meaning it's not gaining anything in this special
> >>> case from the commit-graph, but it does pay the cost of loading the
> >>> commit-graph.
> >> While you have this test environment, do you mind applying the diff
> >> below and re-running the tests? It will output a count for how many
> >> commits are walked by the algorithm. This should help us determine if
> >> this is another case where generation numbers are worse than commit-date,
> >> or if there is something else going on. Thanks!
> > I can do that, but wouldn't you want a similar patch for the old
> > get_merge_bases_many() in order to compare?  Does an absolute number
> > help by itself?
> > It's going to have to be tomorrow, though; not enough time tonight.
>
> No rush. I'd just like to understand how removing the commit-graph file
> can make the new algorithm faster. Putting a similar count in the old
> algorithm would involve giving a count for every call to
> in_merge_bases_many(), which would be very noisy.

$ time git push --dry-run --follow-tags /home/newren/repo-mirror
count: 92912
To /home/newren/repo-mirror
 * [new branch]              test5 -> test5

real    0m3.024s
user    0m2.752s
sys    0m0.320s


Also:
$ git rev-list --count HEAD
55764
$ git rev-list --count --all
91820

Seems a little odd to me that count is greater than `git rev-list
--count --all`.  However, the fact that they are close in magnitude
isn't surprising since I went digging for the commit with smallest
generation number not found in the upstream repo, and found:
$ git ls-remote /home/newren/repo-mirror/ | grep refs/tags/v0.2.0; echo $?
1
$ git rev-list --count refs/tags/v0.2.0
4
$ git rev-list --count refs/tags/v0.2.0 ^HEAD
4


So, the commit-graph can only help us avoid parsing 3 or so commits,
but we have to parse the 5M .git/objects/info/commit-graph file, and
then for every parse_commit() call we need to bsearch_graph() for the
commit.    My theory is that parsing the commit-graph file and binary
searching it for commits is relatively fast, but that the time is just
big enough to measure and notice, while avoiding parsing 3 more
commits is a negligible time savings.

To me, I'm thinking this is one of those bizarre corner cases where
the commit-graph is almost imperceptibly slower than without the
commit-graph.  (And it is a very weird repo; someone repeatedly
filter-branched lots of small independent repos into a monorepo, but
didn't always push everything and didn't clean out all old stuff.)
But if you still see weird stuff you want to dig into further (maybe
the 92912 > 91820 bit?), I'm happy to try out other stuff.

  reply	other threads:[~2018-11-01 18:57 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-10-30 14:16 [PATCH 0/3] Make add_missing_tags() linear Derrick Stolee via GitGitGadget
2018-10-30 14:16 ` [PATCH 1/3] commit-reach: implement get_reachable_subset Derrick Stolee via GitGitGadget
2018-10-31  3:35   ` Junio C Hamano
2018-10-31 12:01     ` Derrick Stolee
2018-11-02  1:51       ` Junio C Hamano
2018-10-31  6:07   ` Elijah Newren
2018-10-31 11:54     ` Derrick Stolee
2018-10-30 14:16 ` [PATCH 2/3] test-reach: test get_reachable_subset Derrick Stolee via GitGitGadget
2018-10-30 14:16 ` [PATCH 3/3] remote: make add_missing_tags() linear Derrick Stolee via GitGitGadget
2018-10-31  3:05 ` [PATCH 0/3] Make " Junio C Hamano
2018-10-31  6:04 ` Elijah Newren
2018-10-31 12:05   ` Derrick Stolee
2018-11-01  6:52     ` Elijah Newren
2018-11-01 12:32       ` Derrick Stolee
2018-11-01 18:57         ` Elijah Newren [this message]
2018-11-01 19:02           ` Derrick Stolee
2018-11-02 14:58             ` Elijah Newren
2018-11-02 15:38               ` Derrick Stolee
2018-11-02 13:14 ` [PATCH v2 " Derrick Stolee via GitGitGadget
2018-11-02 13:14   ` [PATCH v2 1/3] commit-reach: implement get_reachable_subset Derrick Stolee via GitGitGadget
2018-11-02 13:14   ` [PATCH v2 2/3] test-reach: test get_reachable_subset Derrick Stolee via GitGitGadget
2018-11-02 13:14   ` [PATCH v2 3/3] remote: make add_missing_tags() linear Derrick Stolee via GitGitGadget

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CABPp-BG6+3F+FY0JP2mj0OCLjp17xQXTe54mBCN+RifQd4Aj2A@mail.gmail.com \
    --to=newren@gmail.com \
    --cc=dstolee@microsoft.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=peff@peff.net \
    --cc=stolee@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).