git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Jakub Narebski <jnareb@gmail.com>
To: "SZEDER Gábor" <szeder.dev@gmail.com>
Cc: Mike Hommey <mh@glandium.org>,
	git@vger.kernel.org, Kacper Kornet <draenog@pld-linux.org>,
	Derrick Stolee <dstolee@microsoft.com>
Subject: Re: Revision walking, commit dates, slop
Date: Mon, 20 May 2019 00:28:28 +0200	[thread overview]
Message-ID: <8636laqdtf.fsf@gmail.com> (raw)
In-Reply-To: <20190518120140.GB951@szeder.dev> ("SZEDER \=\?utf-8\?Q\?G\=C3\=A1b\?\= \=\?utf-8\?Q\?or\=22's\?\= message of "Sat, 18 May 2019 14:01:40 +0200")

SZEDER Gábor <szeder.dev@gmail.com> writes:
> On Sat, May 18, 2019 at 01:17:06PM +0900, Mike Hommey wrote:
>> On Sat, May 18, 2019 at 12:58:28PM +0900, Mike Hommey wrote:
>>> On Sat, May 18, 2019 at 03:50:05AM +0200, SZEDER Gábor wrote:
>>>> On Sat, May 18, 2019 at 09:54:12AM +0900, Mike Hommey wrote:
[...]
>>>>> There are established corner cases, where in a repo where commit dates
>>>>> are not monotonically increasing, revision walking can go horribly
>>>>> wrong. This was discussed in the past in e.g.
>>>>> https://public-inbox.org/git/20150521061553.GA29269@glandium.org/
>>>>> 
>>>>> The only (simple) workable way, given the current algorithm, to get an
>>>>> accurate view off rev-list is to essentially make slop infinite. This
>>>>> works fine, at the expense of runtime.
>>>>> 
>>>>> Now, ignoring any modification for the above, I'm hitting another corner
>>>>> case in some other "weird" history, where I have 500k commits all with
>>>>> the same date. With such a commit dag, something as trivial as
>>>>> `git rev-list HEAD~..HEAD` goes through all commits from the root commit
>>>>> to HEAD, which takes multiple seconds, when the (obvious) output is one
>>>>> commit.
[...]
>>>> All the above is without commit-graph, I presume?  If so, then you
>>>> should give it a try, as it might bring immediate help in your
>>>> pathological repo.  With 5k commit in the same second (enforced via
>>>> 'export GIT_COMMITTER_DATE=$(date); for i in {1..5000} ...') I get:
>>>> 
>>>>   $ best-of-five -q git rev-list HEAD~..HEAD
>>>>   0.069
>>>>   $ git commit-graph write --reachableComputing commit graph generation
>>>>   numbers: 100% (5000/5000), done.
>>>>   $ best-of-five -q git rev-list HEAD~..HEAD
>>>>   0.004
>>> 
>>> I'm not observing any difference from using commit-graph, whether in
>>> time or in the number of commits that are looked at in limit_list().
>> 
>> -c core.commitGraph=true does make a difference in time, but not in the
>> number of commits looked at in limit_list(). So it's only faster because
>> each iteration of the loop is faster. It means it's still dependent on
>> the depth of the dag, and the larger the repo will grow, the slower it
>> will get.
>
> Oh, indeed.  Well, at least you'll waste about an order of magnitude
> less processor time until you figure out how to fix it :)
>
> Btw, once upon a time this was fast, but it became slow with commit
> c19d1b4e84 (Fix revision walk for commits with the same dates,
> 2013-03-22).

This might be related to the fact, that with generation numbers v1
(i.e. topological levels) there were cases when using those generation
numbers for ordering caused performance regressions (e.g. for some parts
of Linux kernel history).

There was an idea of replacing them with generation numbers v2 [1],
which if I remember correctly was chosen to be corrected date (that is,
committer date or one more than maximum of dates of parents).  This is
incompatibile change; it turned out however that while commit-graph
format is versioned, unfortunately Git fails hard if commit-graph
version is too new, instead of falling back to not using commit graph.

[1]: https://github.com/derrickstolee/gen-test

Stolee, could you tell us what is the current status on generation
numbers v2?  Thanks in advance.

Best,
--
Jakub Narębski

  reply	other threads:[~2019-05-19 22:28 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-05-18  0:54 Revision walking, commit dates, slop Mike Hommey
2019-05-18  1:50 ` SZEDER Gábor
2019-05-18  3:58   ` Mike Hommey
2019-05-18  4:17     ` Mike Hommey
2019-05-18 12:01       ` SZEDER Gábor
2019-05-19 22:28         ` Jakub Narebski [this message]
2019-05-20  1:33       ` Derrick Stolee
2019-05-20 11:02         ` Jakub Narebski
2019-05-20 11:20           ` Derrick Stolee
2019-05-20 13:42             ` Jakub Narebski
2019-05-20 23:27               ` Jakub Narebski
2019-05-21  1:20                 ` Derrick Stolee
2019-05-22 18:29                   ` Jakub Narebski
2019-05-22 19:06                     ` Derrick Stolee
2019-05-23 21:04                       ` Jakub Narebski
2019-06-25  7:51               ` Jakub Narebski
2019-06-25 10:54                 ` Derrick Stolee
2019-09-18  8:43                   ` [RFC/PATCH] commit-graph: generation v5 (backward compatible date ceiling) Jakub Narebski
2019-09-18 12:26                     ` Derrick Stolee
2019-05-21 13:14         ` [PATCH] revision: use generation for A..B --topo-order queries Derrick Stolee
2019-05-21 13:59           ` [PATCH 2/2] revision: keep topo-walk free of unintersting commits Derrick Stolee
2019-05-22  2:19             ` Mike Hommey
2019-05-21  2:00   ` Revision walking, commit dates, slop Jonathan Nieder

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=8636laqdtf.fsf@gmail.com \
    --to=jnareb@gmail.com \
    --cc=draenog@pld-linux.org \
    --cc=dstolee@microsoft.com \
    --cc=git@vger.kernel.org \
    --cc=mh@glandium.org \
    --cc=szeder.dev@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).