git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Jeff King <peff@peff.net>
To: Junio C Hamano <gitster@pobox.com>
Cc: Patrick Steinhardt <ps@pks.im>, git@vger.kernel.org
Subject: Re: [ANNOUNCE] Git v2.34.0-rc2
Date: Mon, 15 Nov 2021 11:52:45 -0500	[thread overview]
Message-ID: <YZKQXYdemZCNS/Bz@coredump.intra.peff.net> (raw)
In-Reply-To: <YY17rBFIdDl+H47I@coredump.intra.peff.net>

On Thu, Nov 11, 2021 at 03:23:09PM -0500, Jeff King wrote:

> Yes, I think that framing is right: it is making SLOP much worse. We
> could similarly have had bogus timestamps in those commits which would
> cause the same outcome. So in that sense it is nothing new. On the other
> hand, I wonder how often it will cause extra traversal work (keeping in
> mind that this commit traversal is just the first stage; after we find
> the commits, then we talk all of their trees, which is the more
> expensive part).
> 
> For the case of adding new commits directly on top of another branch, I
> think there would be no change. But any time you have to walk down to a
> common fork point (e.g., imagine I made a new branch forked from an old
> bit of history), we may fail to find that. I haven't quite constructed
> an example, but I have a feeling we could end up walking over
> arbitrarily long segments of history.

I was playing around with this a bit more, and there is one subtlety in
the "day-10" snippet I showed that I hadn't noted before. It's important
the day-1 does not have any parents. If we used day-2 instead, then
limit_list() would insert its parent (day-1, in this case) into the
queue, without an UNINTERESTING flag (because it's the parent of
something interesting).  And thus when we call still_interesting(), we
would never decrement the slop counter, because we know we are still
walking back to something potentially interesting.

This "works" because we put the new commit at the end of the list via
commit_list_insert_by_date(). That can be fooled, of course, because
it's assuming the list is already in sorted order (which it isn't). So
there could be an "old" commit at the front, and we place the parent in
front of that, even though it's UNINTERESTING descendants are further
back in the list.

So I do think we could walk an arbitrary string of history in this way,
all the way down to the root, or to something else pointed to by a ref
tip. Here's the example I came up with:

-- >8 --
git init -q repo
cd repo

commit_at() {
	echo $1 >$1
	git add .
	base=1234567890
	unit=86400
	timestamp="@$((base + $1 * unit)) +0000"
	GIT_COMMITTER_DATE=$timestamp \
	GIT_AUTHOR_DATE=$timestamp \
	git commit -qm "commit at day $1"
}

# imagine a bunch of base history
for i in $(seq 100); do
	commit_at $i
done
git tag base

# And then we have some older branches hanging around.
for i in $(seq 1 10); do
	git checkout -b branch-$i base~$((60+$i))
done

# But also a newer one; it's important that this refname
# sort after the other ones, because that's what confuses
# the sorting.
git branch new-branch

# and then somebody pushes/fetches a branch based on an old part of history,
# newer than our old branches, but older than our new one.
#
# We won't actually create the branch here, because we're simulating the state
# before the ref is created, when we do the connectivity check.
old_commit=$(git rev-parse base~50)
new_commit=$(echo foo | git commit-tree -p $old_commit HEAD^{tree})

# and now here's the connectivity check we would do
git rev-list $new_commit --not --all >expect
git rev-list --unsorted-input $new_commit --not --all >actual
diff -u expect actual
-- >8 --

That will report all of base~60..base~50 in the output, when it should
just report the single new commit.

I don't think any of this changes the plan for the 2.34 release (in
fact, it makes me more confident that reverting this change was the
right thing to do). I'm just recording my notes here for revisiting the
topic later.

My suspicion is that there's no easy way to make this work. We're
violating the assumption in still_interesting() that it can easily find
the lowest-date commit. We could drop that assumption for the unsorted
case, but then I think we'd be forced to walk all the way to the root
commits, which is even worse than sorting the tips.

I suspect a better solution would be to make use of generation numbers
from the commit graph if we have them. The --topo-order stuff already
does this, and I kind of wonder if we could piggy-back on that.

-Peff

      parent reply	other threads:[~2021-11-15 16:53 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-11-10  0:59 [ANNOUNCE] Git v2.34.0-rc2 Junio C Hamano
2021-11-10  5:41 ` Jeff King
2021-11-10  6:00   ` Jeff King
2021-11-10  8:11     ` Carlo Arenas
2021-11-10  8:22       ` Jeff King
2021-11-10  9:15         ` Carlo Arenas
2021-11-10  9:35           ` Jeff King
2021-11-10  9:39     ` [PATCH] RelNotes: mention known crasher when ssh signing with OpenSSH 8.7 Carlo Marcelo Arenas Belón
2021-11-10 21:39       ` Junio C Hamano
2021-11-10 22:11         ` Junio C Hamano
2021-11-11  9:16           ` Jeff King
     [not found]             ` <YY7/peK1EOHtATEI@camp.crustytoothpaste.net>
2021-11-13  0:16               ` Junio C Hamano
2021-11-10 21:49     ` [ANNOUNCE] Git v2.34.0-rc2 Junio C Hamano
2021-11-10  6:35 ` Johannes Altmanninger
2021-11-10  8:22   ` Jeff King
2021-11-10 21:43     ` Junio C Hamano
2021-11-11 12:07 ` Jeff King
2021-11-11 17:32   ` Junio C Hamano
2021-11-11 20:23     ` Jeff King
2021-11-11 20:33       ` Junio C Hamano
2021-11-15 15:06       ` Patrick Steinhardt
2021-11-15 15:27         ` Jeff King
2021-11-15 16:52       ` Jeff King [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YZKQXYdemZCNS/Bz@coredump.intra.peff.net \
    --to=peff@peff.net \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=ps@pks.im \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).