git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Elijah Newren <newren@gmail.com>
To: Junio C Hamano <gitster@pobox.com>
Cc: William Sprent via GitGitGadget <gitgitgadget@gmail.com>,
	Git Mailing List <git@vger.kernel.org>,
	William Sprent <williams@unity3d.com>
Subject: Re: [PATCH v2] fast-export: fix surprising behavior with --first-parent
Date: Fri, 10 Dec 2021 14:02:33 -0800	[thread overview]
Message-ID: <CABPp-BEU2pbfa6CSSMe9Dw7YQCaw+uU1rNMJn1YRraHKJ5D_8g@mail.gmail.com> (raw)
In-Reply-To: <xmqqee6km8ez.fsf@gitster.g>

On Fri, Dec 10, 2021 at 1:55 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Elijah Newren <newren@gmail.com> writes:
>
> > ...  Here's what I think are the relevant points
> > (and yeah, it's lengthy):
> >
> >
> > The revision traversal machinery typically processes and returns all
> > children before any parent.  fast-export needs to operate in the
> > reverse fashion, handling parents before any of their children in
> > order to build up the history starting from the root commit(s).  This
> > would be a clear case where we could just use the revision traversal
> > machinery's "reverse" option to achieve this desired affect.
> >
> > However, this wasn't what the code did.  It added its own array for
> > queuing.  The obvious hand-rolled solution would be to just push all
> > the commits into the array and then traverse afterwards, but it didn't
> > quite do that either.  It instead attempted to process anything it
> > could as soon as it could, and once it could, check whether it could
> > process anything that had been queued.  As far as I can tell, this was
> > an effort to save a little memory in the case of multiple root commits
> > since it could process some commits before queueing all of them.  This
> > involved some helper functions named has_unshown_parent() and
> > handle_tail().  For typical invocations of fast-export, this
> > alternative essentially amounted to a hand-rolled method of reversing
> > the commits -- it was a bunch of work to duplicate the revision
> > traversal machinery's "reverse" option.
> >
> > This hand-rolled reversing mechanism is actually somewhat difficult to
> > reason about.  It takes some time to figure out how it ensures in
> > normal cases that it will actually process all traversed commits
> > (rather than just dropping some and not printing anything for them).
> >
> > And it turns out there are some cases where the code does drop commits
> > without handling them, and not even printing an error or warning for
> > the user.  Due to the has_unshown_parent() checks, some commits could
> > be left in the array at the end of the "while...get_revision()" loop
> > which would be unprocessed.  This could be triggered for example with
> >     git fast-export main -- --first-parent
> > or non-sensical traversal rules such as
> >     git fast-export main -- --grep=Merge --invert-grep
> >
> > While most traversals that don't include all parents should likely
> > trigger errors in fast-export (or at least require being used in
> > combination with --reference-excluded-parents), the --first-parent
> > traversal is at least reasonable and it'd be nice if it didn't just
> > drop commits.  It'd also be nice to have a simpler "reverse traversal"
> > mechanism.  Use the "reverse" option of the revision traversal
> > machinery to achieve both.
>
> The above is a very helpful and understandable explanation of what
> is going on.  I am a bit puzzled by the very last part, though. By
> "It'd also be nice to have a simpler 'reverse traversal' mechanism",
> do you mean that the end users have need to control the direction
> the traversal goes (in other words, they use "git fast-export" for
> some thing, and "git fast-export --reverse" to achieve some other
> things)?  Or do you just mean that we need to do a reverse traversal
> but that is already available in the revision traversal machinery,
> and not using it and rolling our own does not make sense?

Sorry, yeah, I meant the latter.  I do not think end users should have
control of the direction.  Perhaps if that was reworded to "...It'd
also be nice for future readers of the code to have a simpler..." it'd
be clearer?

> > Even for the non-sensical traversal flags like the --grep one above,
> > this would be an improvement.  For example, in that case, the code
> > previously would have silently truncated history to only those commits
> > that do not have an ancestor containing "Merge" in their commit
> > message.  After this code change, that case would would include all
>
> "would would" -> "would"

Good catch.

> > commits without "Merge" in their commit message -- but any commit that
> > previously had a "Merge"-mentioning parent would lose that parent
> > (likely resulting in many new root commits).  While the new behavior
> > is still odd, it is at least understandable given that
> > --reference-excluded-parents is not the default.
>
> Nicely written.

Thanks.  :-)

  reply	other threads:[~2021-12-10 22:02 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-11-23 11:28 [PATCH] fast-export: fix surprising behavior with --first-parent William Sprent via GitGitGadget
2021-11-23 13:07 ` Ævar Arnfjörð Bjarmason
2021-11-24 11:57   ` William Sprent
2021-11-23 19:14 ` Elijah Newren
2021-11-24 13:05   ` William Sprent
2021-11-24  0:41 ` Junio C Hamano
2021-11-24 13:05   ` William Sprent
2021-12-09  8:13 ` [PATCH v2] " William Sprent via GitGitGadget
2021-12-10  3:48   ` Elijah Newren
2021-12-10 21:55     ` Junio C Hamano
2021-12-10 22:02       ` Elijah Newren [this message]
2021-12-13 15:09     ` William Sprent
2021-12-14  0:31       ` Elijah Newren
2021-12-14 13:11         ` William Sprent
2021-12-16 16:23   ` [PATCH v3] " William Sprent via GitGitGadget
2021-12-21 18:47     ` Elijah Newren
2021-12-21 20:50       ` Junio C Hamano
2021-12-22  8:38         ` William Sprent

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CABPp-BEU2pbfa6CSSMe9Dw7YQCaw+uU1rNMJn1YRraHKJ5D_8g@mail.gmail.com \
    --to=newren@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=gitster@pobox.com \
    --cc=williams@unity3d.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).