git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Jeff King <peff@peff.net>
To: Junio C Hamano <gitster@pobox.com>
Cc: Patrick Steinhardt <ps@pks.im>, git@vger.kernel.org
Subject: Re: [ANNOUNCE] Git v2.34.0-rc2
Date: Thu, 11 Nov 2021 15:23:08 -0500	[thread overview]
Message-ID: <YY17rBFIdDl+H47I@coredump.intra.peff.net> (raw)
In-Reply-To: <xmqqwnlemwcy.fsf@gitster.g>

On Thu, Nov 11, 2021 at 09:32:29AM -0800, Junio C Hamano wrote:

> > Now in this case, we're sending too much output, which is OK for the
> > purposes of the connectivity check. It will just walk day-1 and its
> > tree unnecessarily, which is a performance loss but not incorrect.
> 
> The primary use of the traversal in check_connected() is when we
> have new refs we haven't seen, and they go to the positive end of
> the traversal, which will end in the refs we do have (there may be
> tons).  The idea is that when one or more of the new refs are truly
> "new" in the sense that one or more the objects necessary to connect
> them to our refs do not exist, not even in the "not reachable but
> not yet pruned" state, this traversal will hit a missing object and
> will error out.  So, it is alarming that "day-1" is shown without
> painted uninteresting via any of the negative [day-4..day-9]
> commits.  Which means, if we are checking if we need to initiate a
> real fetch to connect day-1 and day-10 to our DAG, when we think we
> have [day-4..day-9] and everything behind them, we stopped traversal
> before seeing all the "objects necessary to connect them to our
> refs".  If day-2 were missing in our repository, we would have
> noticed if we did traversal from sorted tips, but the unsorted
> traversal happily misses it.

Yes, but if day-2 were missing in our repository, then we are already
corrupt. And in most cases we would not notice adding a new ref that is
also corrupt (e.g., imagine adding _just_ day-10 which is a descendent
of day-2, but we stop traversal at day-9 when we see that we already
have it reachable).

So I don't think it is actually changing the check_connected() outcome.
I couldn't come up with a case where we should be checking a commit and
don't. Only the other way around.

> Not that noticing that day-2 is missing from our repository does not
> help much in *this* particular case, though.  It is likely that a
> common negotiation would say "I have [day-4..day-9], and want day-1
> and day-10", that is reponded with "OK, I know enough, here is
> day-10 and its tree/blob that would be missing from a healthy clone
> with everything behind day-9", and it won't include day-2 (nor
> day-1).  So in this particular example, it would not matter if the
> new unsorted traversal is subtly broken (I think the extent of the
> damage is similar to making the SLOP problem deliberately worse),
> but I am not sure if there are other failure modes that would yield
> outright incorrect result.

Yes, I think that framing is right: it is making SLOP much worse. We
could similarly have had bogus timestamps in those commits which would
cause the same outcome. So in that sense it is nothing new. On the other
hand, I wonder how often it will cause extra traversal work (keeping in
mind that this commit traversal is just the first stage; after we find
the commits, then we talk all of their trees, which is the more
expensive part).

For the case of adding new commits directly on top of another branch, I
think there would be no change. But any time you have to walk down to a
common fork point (e.g., imagine I made a new branch forked from an old
bit of history), we may fail to find that. I haven't quite constructed
an example, but I have a feeling we could end up walking over
arbitrarily long segments of history.

> We probably should revert this step as it can affect correctness in
> a big way, but I wonder if the other steps in the same series, or
> other topic that came later, rely on it.

I looked them over, and I think this is pretty independent (with the
exception of the refactoring of the no_walk/unsorted flags, but
obviously that had to come first).

> At the very least, I think this may be prudent during -rc period,
> but on the other hand, I do not know offhand what would later
> pursuade us to reinstate it and convince us that it is a safe thing
> to do.
> [...]
> diff --git c/connected.c w/connected.c
> index cf68e37a97..35bd4a2638 100644
> --- c/connected.c
> +++ w/connected.c
> @@ -107,7 +107,6 @@ int check_connected(oid_iterate_fn fn, void *cb_data,
>  	if (opt->progress)
>  		strvec_pushf(&rev_list.args, "--progress=%s",
>  			     _("Checking connectivity"));
> -	strvec_push(&rev_list.args, "--unsorted-input");
>  
>  	rev_list.git_cmd = 1;
>  	rev_list.env = opt->env;

This seems like a pretty safe and minimal backing-out for the -rc
period. We would still ship with "--unsorted-input" as an option (and
mentioned in the docs), though. If we're worried that it might be a dead
end and we don't want to support it, we could revert f45022dc2f
(connected: do not sort input revisions, 2021-08-09) entirely. That
carries a little more risk of accidentally breaking something during the
revert, but from my reading of the patch it should be pretty safe.

I'd be curious to hear Patrick's thoughts on the whole thing.

-Peff

  reply	other threads:[~2021-11-11 20:23 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-11-10  0:59 [ANNOUNCE] Git v2.34.0-rc2 Junio C Hamano
2021-11-10  5:41 ` Jeff King
2021-11-10  6:00   ` Jeff King
2021-11-10  8:11     ` Carlo Arenas
2021-11-10  8:22       ` Jeff King
2021-11-10  9:15         ` Carlo Arenas
2021-11-10  9:35           ` Jeff King
2021-11-10  9:39     ` [PATCH] RelNotes: mention known crasher when ssh signing with OpenSSH 8.7 Carlo Marcelo Arenas Belón
2021-11-10 21:39       ` Junio C Hamano
2021-11-10 22:11         ` Junio C Hamano
2021-11-11  9:16           ` Jeff King
     [not found]             ` <YY7/peK1EOHtATEI@camp.crustytoothpaste.net>
2021-11-13  0:16               ` Junio C Hamano
2021-11-10 21:49     ` [ANNOUNCE] Git v2.34.0-rc2 Junio C Hamano
2021-11-10  6:35 ` Johannes Altmanninger
2021-11-10  8:22   ` Jeff King
2021-11-10 21:43     ` Junio C Hamano
2021-11-11 12:07 ` Jeff King
2021-11-11 17:32   ` Junio C Hamano
2021-11-11 20:23     ` Jeff King [this message]
2021-11-11 20:33       ` Junio C Hamano
2021-11-15 15:06       ` Patrick Steinhardt
2021-11-15 15:27         ` Jeff King
2021-11-15 16:52       ` Jeff King

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YY17rBFIdDl+H47I@coredump.intra.peff.net \
    --to=peff@peff.net \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=ps@pks.im \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).