git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Mike Hommey <mh@glandium.org>
To: git@vger.kernel.org
Subject: Revision walking, commit dates, slop
Date: Sat, 18 May 2019 09:54:12 +0900	[thread overview]
Message-ID: <20190518005412.n45pj5p2rrtm2bfj@glandium.org> (raw)

Hi,

There are established corner cases, where in a repo where commit dates
are not monotonically increasing, revision walking can go horribly
wrong. This was discussed in the past in e.g.
https://public-inbox.org/git/20150521061553.GA29269@glandium.org/

The only (simple) workable way, given the current algorithm, to get an
accurate view off rev-list is to essentially make slop infinite. This
works fine, at the expense of runtime.

Now, ignoring any modification for the above, I'm hitting another corner
case in some other "weird" history, where I have 500k commits all with
the same date. With such a commit dag, something as trivial as
`git rev-list HEAD~..HEAD` goes through all commits from the root commit
to HEAD, which takes multiple seconds, when the (obvious) output is one
commit.

It looks like the only way revision walking stops going through all the
ancestry is through slop, and slop is essentially made infinite by the
fact all commits have the same date (because of the date check in
still_interesting(). By extension, this means the workaound for the
first corner case above, which is to make slop infinite, essentially
makes all rev walking go through the entire ancestry of the commits
given on the command line.

It feels like some cases of everybody_uninteresting should shorcut slop
entirely, but considering the only way for slop to decrease at all is
when everybody_uninteresting returns true, that would seem like a wrong
assumption. But I'm also not sure what slop helps with in the first
place (but I don't have a clear view of the broader picture of how the
entire revision walking works).

Anyways, a rather easy way to witness this happening is to create a
dummy repo like:
  git init foo
  cd foo
  for i in $(seq 1 50); do
    echo $i > a;
    git add a;
    git commit -a -m $i;
  done

The something as simple as `git rev-list HEAD~..HEAD` will go through
all 50 commits (assuming the script above created commits in the same
second, which it did on my machine)

By the time both HEAD~ and HEAD have been processed, the revision
walking should have enough information to determine that it doesn't need
to go further, but still does. Even with something like HEAD~2..HEAD,
after the first round of processing parents it should be able to see
there's not going to be any more interesting commits.

I'm willing to dig into this, but if someone familiar with the
algorithm could give me some hints as to what I might be missing in the
big picture, that would be helpful.

Cheers,

Mike

             reply	other threads:[~2019-05-18  0:54 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-05-18  0:54 Mike Hommey [this message]
2019-05-18  1:50 ` Revision walking, commit dates, slop SZEDER Gábor
2019-05-18  3:58   ` Mike Hommey
2019-05-18  4:17     ` Mike Hommey
2019-05-18 12:01       ` SZEDER Gábor
2019-05-19 22:28         ` Jakub Narebski
2019-05-20  1:33       ` Derrick Stolee
2019-05-20 11:02         ` Jakub Narebski
2019-05-20 11:20           ` Derrick Stolee
2019-05-20 13:42             ` Jakub Narebski
2019-05-20 23:27               ` Jakub Narebski
2019-05-21  1:20                 ` Derrick Stolee
2019-05-22 18:29                   ` Jakub Narebski
2019-05-22 19:06                     ` Derrick Stolee
2019-05-23 21:04                       ` Jakub Narebski
2019-06-25  7:51               ` Jakub Narebski
2019-06-25 10:54                 ` Derrick Stolee
2019-09-18  8:43                   ` [RFC/PATCH] commit-graph: generation v5 (backward compatible date ceiling) Jakub Narebski
2019-09-18 12:26                     ` Derrick Stolee
2019-05-21 13:14         ` [PATCH] revision: use generation for A..B --topo-order queries Derrick Stolee
2019-05-21 13:59           ` [PATCH 2/2] revision: keep topo-walk free of unintersting commits Derrick Stolee
2019-05-22  2:19             ` Mike Hommey
2019-05-21  2:00   ` Revision walking, commit dates, slop Jonathan Nieder

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190518005412.n45pj5p2rrtm2bfj@glandium.org \
    --to=mh@glandium.org \
    --cc=git@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).