git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Jonathan Tan <jonathantanmy@google.com>
To: Derrick Stolee <derrickstolee@github.com>
Cc: Jonathan Tan <jonathantanmy@google.com>,
	Junio C Hamano <gitster@pobox.com>,
	git@vger.kernel.org, stolee@gmail.com
Subject: Re: [PATCH] commit-reach: do not parse and iterate minima
Date: Thu, 24 Mar 2022 15:15:19 -0700	[thread overview]
Message-ID: <20220324221519.1324898-1-jonathantanmy@google.com> (raw)
In-Reply-To: <34da1688-7d94-d54c-6b3a-3106d0f15380@github.com>

Derrick Stolee <derrickstolee@github.com> writes:
> On 3/23/2022 7:30 PM, Junio C Hamano wrote:
> > This sounds quite tricky.  In this case we may know which commit we
> > need to avoid (re)parsing to avoid the bug, but would it always be
> > the case?  It feels almost like we want to unparse the commit
> > objects when we clear the grafts information in the previous patch,
> > doesn't it?
>  
> I agree that the adjustment to paint_down_to_common() is a bit too
> coupled to this scenario, when we should just trust our commits to
> be valid if they have been parsed. We should always be able to
> parse our parents.

Thanks for the comments from both of you. I think Stolee's comment
squarely hits the relevant points: it is precisely this scenario
(revision walk to remove unreachable shallow commits) that must be
careful of what it parses, and we *must not* parse the shallow boundary
commit's parents.

I think that there are 2 questions. First, whether we should parse the
shallow boundary commit's parents, and second, whether we should parse
the shallow boundary commit itself. In the commit message, I only
discussed the second (because that implies the first), but perhaps I
should have discussed both. Anyway, the discussion:

(a) Should we parse the shallow boundary commit's parents? I think the
    obvious answer is no, because the remote probably wouldn't have sent
    them. But the code currently does: in paint_down_to_common(), they
    are parsed before being added to the priority queue (and parsing is
    necessary because the priority queue requires their date).
    Incidentally, this results in an error message from
    repo_parse_commit_internal() being printed, but
    repo_in_merge_bases_many() swallows the error by not checking the
    return value (it only checks whether a certain commit has a certain
    flag, which is true by the time the failing parent parse occurs). So
    we should have some sort of one_is_at_min_generation anyway, at
    least so that we can prevent its parents from being parsed.

(b) Should we parse the shallow boundary commit itself? If we don't
    care, then we should unparse commits when grafts are cleared.

    In this case, though, I think that it is the responsibility of the
    shallow code to be careful with what it does with the commits. It is
    performing operations on commits that it alone knows shallow
    information about (because the shallow information is still being
    checked and thus not yet written to the repo). As I wrote in the
    commit message (which is admittedly long and perhaps hard to
    understand), I think that in the typical case, we only have a commit
    when its graft information is already present, so we don't need to
    worry about graft information changing from under it.

> Finding this bug is interesting, but I agree with Junio that a
> better fix would be to "unparse" a commit when modifying its graft
> in any way. That will universally fix it for any potential future
> commit walks that might be hit due to future code changes.

Unparsing also means that code can't rely on commits being already
parsed, even if they would expect it to be true (for example, a commit
in a priority queue would be expected to be parsed, since we would have
needed the date for it to enter the queue in the first place), but maybe
this is not a big problem in practice.

  parent reply	other threads:[~2022-03-24 22:15 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-03-23 21:08 [PATCH] commit-reach: do not parse and iterate minima Jonathan Tan
2022-03-23 23:30 ` Junio C Hamano
2022-03-24 15:27   ` Derrick Stolee
2022-03-24 22:06     ` Taylor Blau
2022-03-25 14:32       ` Derrick Stolee
2022-03-24 22:15     ` Jonathan Tan [this message]
2022-03-24 12:05 ` Bagas Sanjaya
2022-03-24 22:19   ` Jonathan Tan
2022-03-24 15:29 ` Derrick Stolee
2022-03-24 22:21   ` Jonathan Tan
2022-06-02 23:11 ` [PATCH v2] commit,shallow: unparse commits if grafts changed Jonathan Tan
2022-06-03  9:30   ` Ævar Arnfjörð Bjarmason
2022-06-03 13:29     ` Derrick Stolee
2022-06-03 15:27       ` Jonathan Tan
2022-06-03 15:26     ` Jonathan Tan
2022-06-06 17:54 ` [PATCH v3] " Jonathan Tan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20220324221519.1324898-1-jonathantanmy@google.com \
    --to=jonathantanmy@google.com \
    --cc=derrickstolee@github.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=stolee@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).