git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Junio C Hamano <gitster@pobox.com>
To: Jeff King <peff@peff.net>
Cc: Patrick Steinhardt <ps@pks.im>, git@vger.kernel.org
Subject: Re: [ANNOUNCE] Git v2.34.0-rc2
Date: Thu, 11 Nov 2021 09:32:29 -0800	[thread overview]
Message-ID: <xmqqwnlemwcy.fsf@gitster.g> (raw)
In-Reply-To: <YY0HbQJEWbOwuuFj@coredump.intra.peff.net> (Jeff King's message of "Thu, 11 Nov 2021 07:07:09 -0500")

Jeff King <peff@peff.net> writes:

> On Tue, Nov 09, 2021 at 04:59:29PM -0800, Junio C Hamano wrote:
>
>>  * The revision traversal API has been optimized by taking advantage
>>    of the commit-graph, when available, to determine if a commit is
>>    reachable from any of the existing refs.
>
> I was thinking a bit about this change, specifically f45022dc2f
> (connected: do not sort input revisions, 2021-08-09). At the time, Junio
> said[1]:
>
>   Sorting of positive side is done to help both performance and
>   correctness in regular use of the traversal machinery, especially
>   when reachability bitmap is not in effect, but on the negative side
>   I do not think there is any downside to omit sorting offhand.  The
>   only case that may get affected is when the revision.c::SLOP kicks
>   in to deal with oddball commits with incorrect committer timestamps,
>   but then the result of the sorting isn't to be trusted anyway, so...
>
> But wouldn't failure to sort the commits actually _create_ such an
> "oddball commits" scenario, because we'd see the UNINTERESTING commits
> in an arbitrary order, especially with respect to the interesting ones?

You're right.

> So indeed, we get confused by the unsorted input and give the wrong
> answer. We should look at and propagate UNINTERESTING marks from day-9,
> etc on down to day-1, but we don't.
>
> Now in this case, we're sending too much output, which is OK for the
> purposes of the connectivity check. It will just walk day-1 and its
> tree unnecessarily, which is a performance loss but not incorrect.

The primary use of the traversal in check_connected() is when we
have new refs we haven't seen, and they go to the positive end of
the traversal, which will end in the refs we do have (there may be
tons).  The idea is that when one or more of the new refs are truly
"new" in the sense that one or more the objects necessary to connect
them to our refs do not exist, not even in the "not reachable but
not yet pruned" state, this traversal will hit a missing object and
will error out.  So, it is alarming that "day-1" is shown without
painted uninteresting via any of the negative [day-4..day-9]
commits.  Which means, if we are checking if we need to initiate a
real fetch to connect day-1 and day-10 to our DAG, when we think we
have [day-4..day-9] and everything behind them, we stopped traversal
before seeing all the "objects necessary to connect them to our
refs".  If day-2 were missing in our repository, we would have
noticed if we did traversal from sorted tips, but the unsorted
traversal happily misses it.

Not that noticing that day-2 is missing from our repository does not
help much in *this* particular case, though.  It is likely that a
common negotiation would say "I have [day-4..day-9], and want day-1
and day-10", that is reponded with "OK, I know enough, here is
day-10 and its tree/blob that would be missing from a healthy clone
with everything behind day-9", and it won't include day-2 (nor
day-1).  So in this particular example, it would not matter if the
new unsorted traversal is subtly broken (I think the extent of the
damage is similar to making the SLOP problem deliberately worse),
but I am not sure if there are other failure modes that would yield
outright incorrect result.

> My intuition is that this is _probably_ the only type of incorrect
> answer we'd give (i.e., that we'd never go the other way, omitting an
> object we should have included) because nobody would ever set the
> UNINTERESTING flag unnecessarily, and everybody_uninteresting() does do
> a full scan looking for any positive tips.
>
> Still, I didn't see this subtlety mentioned in the earlier discussion,
> and it's a bit alarming not to understand all of the possible
> implications. I'm not sure if it rises to the level of something we want
> to consider more or address before the release. Sorry to only come up
> with this at the -rc2 stage, but I figure now is better than right after
> the release. ;)
>
> -Peff
>
> [1] https://lore.kernel.org/git/xmqqa6lzwu31.fsf@gitster.g/

We probably should revert this step as it can affect correctness in
a big way, but I wonder if the other steps in the same series, or
other topic that came later, rely on it.

At the very least, I think this may be prudent during -rc period,
but on the other hand, I do not know offhand what would later
pursuade us to reinstate it and convince us that it is a safe thing
to do.

 connected.c | 1 -
 1 file changed, 1 deletion(-)

diff --git c/connected.c w/connected.c
index cf68e37a97..35bd4a2638 100644
--- c/connected.c
+++ w/connected.c
@@ -107,7 +107,6 @@ int check_connected(oid_iterate_fn fn, void *cb_data,
 	if (opt->progress)
 		strvec_pushf(&rev_list.args, "--progress=%s",
 			     _("Checking connectivity"));
-	strvec_push(&rev_list.args, "--unsorted-input");
 
 	rev_list.git_cmd = 1;
 	rev_list.env = opt->env;

  reply	other threads:[~2021-11-11 17:32 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-11-10  0:59 [ANNOUNCE] Git v2.34.0-rc2 Junio C Hamano
2021-11-10  5:41 ` Jeff King
2021-11-10  6:00   ` Jeff King
2021-11-10  8:11     ` Carlo Arenas
2021-11-10  8:22       ` Jeff King
2021-11-10  9:15         ` Carlo Arenas
2021-11-10  9:35           ` Jeff King
2021-11-10  9:39     ` [PATCH] RelNotes: mention known crasher when ssh signing with OpenSSH 8.7 Carlo Marcelo Arenas Belón
2021-11-10 21:39       ` Junio C Hamano
2021-11-10 22:11         ` Junio C Hamano
2021-11-11  9:16           ` Jeff King
     [not found]             ` <YY7/peK1EOHtATEI@camp.crustytoothpaste.net>
2021-11-13  0:16               ` Junio C Hamano
2021-11-10 21:49     ` [ANNOUNCE] Git v2.34.0-rc2 Junio C Hamano
2021-11-10  6:35 ` Johannes Altmanninger
2021-11-10  8:22   ` Jeff King
2021-11-10 21:43     ` Junio C Hamano
2021-11-11 12:07 ` Jeff King
2021-11-11 17:32   ` Junio C Hamano [this message]
2021-11-11 20:23     ` Jeff King
2021-11-11 20:33       ` Junio C Hamano
2021-11-15 15:06       ` Patrick Steinhardt
2021-11-15 15:27         ` Jeff King
2021-11-15 16:52       ` Jeff King

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=xmqqwnlemwcy.fsf@gitster.g \
    --to=gitster@pobox.com \
    --cc=git@vger.kernel.org \
    --cc=peff@peff.net \
    --cc=ps@pks.im \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).