From: "SZEDER Gábor" <szeder.dev@gmail.com>
To: Mike Hommey <mh@glandium.org>
Cc: git@vger.kernel.org, Kacper Kornet <draenog@pld-linux.org>
Subject: Re: Revision walking, commit dates, slop
Date: Sat, 18 May 2019 14:01:40 +0200 [thread overview]
Message-ID: <20190518120140.GB951@szeder.dev> (raw)
In-Reply-To: <20190518041706.ct6ie5trvxgdhjar@glandium.org>
On Sat, May 18, 2019 at 01:17:06PM +0900, Mike Hommey wrote:
> On Sat, May 18, 2019 at 12:58:28PM +0900, Mike Hommey wrote:
> > On Sat, May 18, 2019 at 03:50:05AM +0200, SZEDER Gábor wrote:
> > > On Sat, May 18, 2019 at 09:54:12AM +0900, Mike Hommey wrote:
> > > > There are established corner cases, where in a repo where commit dates
> > > > are not monotonically increasing, revision walking can go horribly
> > > > wrong. This was discussed in the past in e.g.
> > > > https://public-inbox.org/git/20150521061553.GA29269@glandium.org/
> > > >
> > > > The only (simple) workable way, given the current algorithm, to get an
> > > > accurate view off rev-list is to essentially make slop infinite. This
> > > > works fine, at the expense of runtime.
> > > >
> > > > Now, ignoring any modification for the above, I'm hitting another corner
> > > > case in some other "weird" history, where I have 500k commits all with
> > > > the same date. With such a commit dag, something as trivial as
> > > > `git rev-list HEAD~..HEAD` goes through all commits from the root commit
> > > > to HEAD, which takes multiple seconds, when the (obvious) output is one
> > > > commit.
> > > >
> > > > It looks like the only way revision walking stops going through all the
> > > > ancestry is through slop, and slop is essentially made infinite by the
> > > > fact all commits have the same date (because of the date check in
> > > > still_interesting(). By extension, this means the workaound for the
> > > > first corner case above, which is to make slop infinite, essentially
> > > > makes all rev walking go through the entire ancestry of the commits
> > > > given on the command line.
> > > >
> > > > It feels like some cases of everybody_uninteresting should shorcut slop
> > > > entirely, but considering the only way for slop to decrease at all is
> > > > when everybody_uninteresting returns true, that would seem like a wrong
> > > > assumption. But I'm also not sure what slop helps with in the first
> > > > place (but I don't have a clear view of the broader picture of how the
> > > > entire revision walking works).
> > > >
> > > > Anyways, a rather easy way to witness this happening is to create a
> > > > dummy repo like:
> > > > git init foo
> > > > cd foo
> > > > for i in $(seq 1 50); do
> > > > echo $i > a;
> > > > git add a;
> > > > git commit -a -m $i;
> > > > done
> > > >
> > > > The something as simple as `git rev-list HEAD~..HEAD` will go through
> > > > all 50 commits (assuming the script above created commits in the same
> > > > second, which it did on my machine)
> > > >
> > > > By the time both HEAD~ and HEAD have been processed, the revision
> > > > walking should have enough information to determine that it doesn't need
> > > > to go further, but still does. Even with something like HEAD~2..HEAD,
> > > > after the first round of processing parents it should be able to see
> > > > there's not going to be any more interesting commits.
> > > >
> > > > I'm willing to dig into this, but if someone familiar with the
> > > > algorithm could give me some hints as to what I might be missing in the
> > > > big picture, that would be helpful.
> > >
> > > All the above is without commit-graph, I presume? If so, then you
> > > should give it a try, as it might bring immediate help in your
> > > pathological repo. With 5k commit in the same second (enforced via
> > > 'export GIT_COMMITTER_DATE=$(date); for i in {1..5000} ...') I get:
> > >
> > > $ best-of-five -q git rev-list HEAD~..HEAD
> > > 0.069
> > > $ git commit-graph write --reachableComputing commit graph generation
> > > numbers: 100% (5000/5000), done.
> > > $ best-of-five -q git rev-list HEAD~..HEAD
> > > 0.004
> >
> > I'm not observing any difference from using commit-graph, whether in
> > time or in the number of commits that are looked at in limit_list().
>
> -c core.commitGraph=true does make a difference in time, but not in the
> number of commits looked at in limit_list(). So it's only faster because
> each iteration of the loop is faster. It means it's still dependent on
> the depth of the dag, and the larger the repo will grow, the slower it
> will get.
Oh, indeed. Well, at least you'll waste about an order of magnitude
less processor time until you figure out how to fix it :)
Btw, once upon a time this was fast, but it became slow with commit
c19d1b4e84 (Fix revision walk for commits with the same dates,
2013-03-22).
next prev parent reply other threads:[~2019-05-18 12:01 UTC|newest]
Thread overview: 23+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-05-18 0:54 Revision walking, commit dates, slop Mike Hommey
2019-05-18 1:50 ` SZEDER Gábor
2019-05-18 3:58 ` Mike Hommey
2019-05-18 4:17 ` Mike Hommey
2019-05-18 12:01 ` SZEDER Gábor [this message]
2019-05-19 22:28 ` Jakub Narebski
2019-05-20 1:33 ` Derrick Stolee
2019-05-20 11:02 ` Jakub Narebski
2019-05-20 11:20 ` Derrick Stolee
2019-05-20 13:42 ` Jakub Narebski
2019-05-20 23:27 ` Jakub Narebski
2019-05-21 1:20 ` Derrick Stolee
2019-05-22 18:29 ` Jakub Narebski
2019-05-22 19:06 ` Derrick Stolee
2019-05-23 21:04 ` Jakub Narebski
2019-06-25 7:51 ` Jakub Narebski
2019-06-25 10:54 ` Derrick Stolee
2019-09-18 8:43 ` [RFC/PATCH] commit-graph: generation v5 (backward compatible date ceiling) Jakub Narebski
2019-09-18 12:26 ` Derrick Stolee
2019-05-21 13:14 ` [PATCH] revision: use generation for A..B --topo-order queries Derrick Stolee
2019-05-21 13:59 ` [PATCH 2/2] revision: keep topo-walk free of unintersting commits Derrick Stolee
2019-05-22 2:19 ` Mike Hommey
2019-05-21 2:00 ` Revision walking, commit dates, slop Jonathan Nieder
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: http://vger.kernel.org/majordomo-info.html
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20190518120140.GB951@szeder.dev \
--to=szeder.dev@gmail.com \
--cc=draenog@pld-linux.org \
--cc=git@vger.kernel.org \
--cc=mh@glandium.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://80x24.org/mirrors/git.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).