git@vger.kernel.org list mirror (unofficial, one of many)
 help / color / Atom feed
From: Mike Hommey <mh@glandium.org>
To: SZEDER Gábor <szeder.dev@gmail.com>
Cc: git@vger.kernel.org
Subject: Re: Revision walking, commit dates, slop
Date: Sat, 18 May 2019 13:17:06 +0900
Message-ID: <20190518041706.ct6ie5trvxgdhjar@glandium.org> (raw)
In-Reply-To: <20190518035828.pjaqfrkkvldhri6v@glandium.org>

On Sat, May 18, 2019 at 12:58:28PM +0900, Mike Hommey wrote:
> On Sat, May 18, 2019 at 03:50:05AM +0200, SZEDER Gábor wrote:
> > On Sat, May 18, 2019 at 09:54:12AM +0900, Mike Hommey wrote:
> > > There are established corner cases, where in a repo where commit dates
> > > are not monotonically increasing, revision walking can go horribly
> > > wrong. This was discussed in the past in e.g.
> > > https://public-inbox.org/git/20150521061553.GA29269@glandium.org/
> > > 
> > > The only (simple) workable way, given the current algorithm, to get an
> > > accurate view off rev-list is to essentially make slop infinite. This
> > > works fine, at the expense of runtime.
> > > 
> > > Now, ignoring any modification for the above, I'm hitting another corner
> > > case in some other "weird" history, where I have 500k commits all with
> > > the same date. With such a commit dag, something as trivial as
> > > `git rev-list HEAD~..HEAD` goes through all commits from the root commit
> > > to HEAD, which takes multiple seconds, when the (obvious) output is one
> > > commit.
> > > 
> > > It looks like the only way revision walking stops going through all the
> > > ancestry is through slop, and slop is essentially made infinite by the
> > > fact all commits have the same date (because of the date check in
> > > still_interesting(). By extension, this means the workaound for the
> > > first corner case above, which is to make slop infinite, essentially
> > > makes all rev walking go through the entire ancestry of the commits
> > > given on the command line.
> > > 
> > > It feels like some cases of everybody_uninteresting should shorcut slop
> > > entirely, but considering the only way for slop to decrease at all is
> > > when everybody_uninteresting returns true, that would seem like a wrong
> > > assumption. But I'm also not sure what slop helps with in the first
> > > place (but I don't have a clear view of the broader picture of how the
> > > entire revision walking works).
> > > 
> > > Anyways, a rather easy way to witness this happening is to create a
> > > dummy repo like:
> > >   git init foo
> > >   cd foo
> > >   for i in $(seq 1 50); do
> > >     echo $i > a;
> > >     git add a;
> > >     git commit -a -m $i;
> > >   done
> > > 
> > > The something as simple as `git rev-list HEAD~..HEAD` will go through
> > > all 50 commits (assuming the script above created commits in the same
> > > second, which it did on my machine)
> > > 
> > > By the time both HEAD~ and HEAD have been processed, the revision
> > > walking should have enough information to determine that it doesn't need
> > > to go further, but still does. Even with something like HEAD~2..HEAD,
> > > after the first round of processing parents it should be able to see
> > > there's not going to be any more interesting commits.
> > > 
> > > I'm willing to dig into this, but if someone familiar with the
> > > algorithm could give me some hints as to what I might be missing in the
> > > big picture, that would be helpful.
> > 
> > All the above is without commit-graph, I presume?  If so, then you
> > should give it a try, as it might bring immediate help in your
> > pathological repo.  With 5k commit in the same second (enforced via
> > 'export GIT_COMMITTER_DATE=$(date); for i in {1..5000} ...') I get:
> > 
> >   $ best-of-five -q git rev-list HEAD~..HEAD
> >   0.069
> >   $ git commit-graph write --reachableComputing commit graph generation
> >   numbers: 100% (5000/5000), done.
> >   $ best-of-five -q git rev-list HEAD~..HEAD
> >   0.004
> 
> I'm not observing any difference from using commit-graph, whether in
> time or in the number of commits that are looked at in limit_list().

-c core.commitGraph=true does make a difference in time, but not in the
number of commits looked at in limit_list(). So it's only faster because
each iteration of the loop is faster. It means it's still dependent on
the depth of the dag, and the larger the repo will grow, the slower it
will get.

Mike

  reply index

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-05-18  0:54 Mike Hommey
2019-05-18  1:50 ` SZEDER Gábor
2019-05-18  3:58   ` Mike Hommey
2019-05-18  4:17     ` Mike Hommey [this message]
2019-05-18 12:01       ` SZEDER Gábor
2019-05-19 22:28         ` Jakub Narebski
2019-05-20  1:33       ` Derrick Stolee
2019-05-20 11:02         ` Jakub Narebski
2019-05-20 11:20           ` Derrick Stolee
2019-05-20 13:42             ` Jakub Narebski
2019-05-20 23:27               ` Jakub Narebski
2019-05-21  1:20                 ` Derrick Stolee
2019-05-22 18:29                   ` Jakub Narebski
2019-05-22 19:06                     ` Derrick Stolee
2019-05-23 21:04                       ` Jakub Narebski
2019-06-25  7:51               ` Jakub Narebski
2019-06-25 10:54                 ` Derrick Stolee
2019-09-18  8:43                   ` [RFC/PATCH] commit-graph: generation v5 (backward compatible date ceiling) Jakub Narebski
2019-09-18 12:26                     ` Derrick Stolee
2019-05-21 13:14         ` [PATCH] revision: use generation for A..B --topo-order queries Derrick Stolee
2019-05-21 13:59           ` [PATCH 2/2] revision: keep topo-walk free of unintersting commits Derrick Stolee
2019-05-22  2:19             ` Mike Hommey
2019-05-21  2:00   ` Revision walking, commit dates, slop Jonathan Nieder

Reply instructions:

You may reply publically to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190518041706.ct6ie5trvxgdhjar@glandium.org \
    --to=mh@glandium.org \
    --cc=git@vger.kernel.org \
    --cc=szeder.dev@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

git@vger.kernel.org list mirror (unofficial, one of many)

Archives are clonable:
	git clone --mirror http://public-inbox.org/git
	git clone --mirror http://ou63pmih66umazou.onion/git
	git clone --mirror http://czquwvybam4bgbro.onion/git
	git clone --mirror http://hjrcffqmbrq6wope.onion/git

Example config snippet for mirrors

Newsgroups are available over NNTP:
	nntp://news.public-inbox.org/inbox.comp.version-control.git
	nntp://ou63pmih66umazou.onion/inbox.comp.version-control.git
	nntp://czquwvybam4bgbro.onion/inbox.comp.version-control.git
	nntp://hjrcffqmbrq6wope.onion/inbox.comp.version-control.git
	nntp://news.gmane.org/gmane.comp.version-control.git

 note: .onion URLs require Tor: https://www.torproject.org/

AGPL code for this site: git clone https://public-inbox.org/public-inbox.git