git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Elijah Newren <newren@gmail.com>
To: William Sprent via GitGitGadget <gitgitgadget@gmail.com>
Cc: Git Mailing List <git@vger.kernel.org>,
	William Sprent <williams@unity3d.com>,
	Johannes Schindelin <Johannes.Schindelin@gmx.de>
Subject: Re: [PATCH] fast-export: fix surprising behavior with --first-parent
Date: Tue, 23 Nov 2021 11:14:26 -0800	[thread overview]
Message-ID: <CABPp-BGm0VtU=z0r0oGc48J020Meh_xdibnagNeDmBGQ-RVu+g@mail.gmail.com> (raw)
In-Reply-To: <pull.1084.git.1637666927224.gitgitgadget@gmail.com>

On Tue, Nov 23, 2021 at 5:25 AM William Sprent via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: William Sprent <williams@unity3d.com>
>
> When invoking git-fast-export with the --first-parent flag on a branch
> with merges, fast-export would early-out on processing the first merge
> on the branch. If combined with --reverse, fast-export would instead
> output all single parent commits on the branch.
>
> This commit makes fast-export output the same commits as rev-list
> --first-parent, and makes --reverse not have an effect on which commits
> are output.
>
> The fix involves removing logic within fast-export which was responsible
> for ensuring that parents are processed before their children, which was
> what was exiting early due to missing second parents. This is replaced
> by setting 'reverse = 1' before revision walking, which, in conjuction
> with topo_order, allows for delegating the ordering of commits to
> revision.c. The reverse flag is set after parsing rev-list arguments to
> avoid having it disabled.
>
> Signed-off-by: William Sprent <williams@unity3d.com>
> ---
>     fast-export: fix surprising behavior with --first-parent
>
>     Hi,
>
>     This is my first time patching git, so I probably need some guidance on
>     my approach. :)
>
>     I've noticed that git fast-export exhibits some odd behavior when passed
>     the --first-parent flag. In the repository I was working on, it would
>     only output the initial commit before exiting. Moreover, adding the
>     --reverse flag causes it to behave differently and instead output all
>     commits in the first parent line that only have one parent. My
>     expectation is more or less that git fast-export should output the same
>     set of commits as git rev-list would output given the same arguments,
>     which matches how it acts when given revision ranges.
>
>     It seems like this behavior comes from the fact that has_unshown_parents
>     will never be false for any merge commits encountered when fast-export
>     is called with --first-parent. This causes the while loop to follow the
>     pattern of pushing all commits into the "commits" queue until the
>     initial commit is encountered, at which point it calls handle_tail which
>     falls through on the first merge commit, causing most of the commits to
>     be unhandled.
>
>     My impression is that this logic only serves to ensure that parents are
>     processed before their children, so in my patch I've opted to fix the
>     issue by delegating that responsibility to revision.c by adding the
>     reverse flag before performing the revision walk. From what I can see,
>     this should be equivalent to what the previous logic is trying to
>     achieve, but I can also see that there could be risk in these changes.
>
> Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1084%2Fwilliams-unity%2Ffast-export%2Ffirst-parent-issues-v1
> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1084/williams-unity/fast-export/first-parent-issues-v1
> Pull-Request: https://github.com/gitgitgadget/git/pull/1084
>
>  builtin/fast-export.c  | 36 ++----------------------------------
>  t/t9350-fast-export.sh | 30 ++++++++++++++++++++++++++++++
>  2 files changed, 32 insertions(+), 34 deletions(-)
>
> diff --git a/builtin/fast-export.c b/builtin/fast-export.c
> index 8e2caf72819..50f8c224b6e 100644
> --- a/builtin/fast-export.c
> +++ b/builtin/fast-export.c
> @@ -107,18 +107,6 @@ static int parse_opt_reencode_mode(const struct option *opt,
>
>  static struct decoration idnums;
>  static uint32_t last_idnum;
> -
> -static int has_unshown_parent(struct commit *commit)
> -{
> -       struct commit_list *parent;
> -
> -       for (parent = commit->parents; parent; parent = parent->next)
> -               if (!(parent->item->object.flags & SHOWN) &&
> -                   !(parent->item->object.flags & UNINTERESTING))
> -                       return 1;
> -       return 0;
> -}
> -
>  struct anonymized_entry {
>         struct hashmap_entry hash;
>         const char *anon;
> @@ -752,20 +740,6 @@ static char *anonymize_tag(void *data)
>         return strbuf_detach(&out, NULL);
>  }
>
> -static void handle_tail(struct object_array *commits, struct rev_info *revs,
> -                       struct string_list *paths_of_changed_objects)
> -{
> -       struct commit *commit;
> -       while (commits->nr) {
> -               commit = (struct commit *)object_array_pop(commits);
> -               if (has_unshown_parent(commit)) {
> -                       /* Queue again, to be handled later */
> -                       add_object_array(&commit->object, NULL, commits);
> -                       return;
> -               }
> -               handle_commit(commit, revs, paths_of_changed_objects);
> -       }
> -}
>
>  static void handle_tag(const char *name, struct tag *tag)
>  {
> @@ -1185,7 +1159,6 @@ static int parse_opt_anonymize_map(const struct option *opt,
>  int cmd_fast_export(int argc, const char **argv, const char *prefix)
>  {
>         struct rev_info revs;
> -       struct object_array commits = OBJECT_ARRAY_INIT;
>         struct commit *commit;
>         char *export_filename = NULL,
>              *import_filename = NULL,
> @@ -1281,19 +1254,14 @@ int cmd_fast_export(int argc, const char **argv, const char *prefix)
>
>         get_tags_and_duplicates(&revs.cmdline);
>
> +       revs.reverse = 1;
>         if (prepare_revision_walk(&revs))
>                 die("revision walk setup failed");
>         revs.diffopt.format_callback = show_filemodify;
>         revs.diffopt.format_callback_data = &paths_of_changed_objects;
>         revs.diffopt.flags.recursive = 1;
>         while ((commit = get_revision(&revs))) {
> -               if (has_unshown_parent(commit)) {
> -                       add_object_array(&commit->object, NULL, &commits);
> -               }
> -               else {
> -                       handle_commit(commit, &revs, &paths_of_changed_objects);
> -                       handle_tail(&commits, &revs, &paths_of_changed_objects);
> -               }
> +               handle_commit(commit, &revs, &paths_of_changed_objects);

Wow, interesting.  I did some related work that I never submitted at
https://github.com/newren/git/commit/08f799b4667de1c755c78e1ea1657f946b588556
-- in that commit, I also noticed that fast-export did not seem
careful enough about making sure to process all commits, and came up
with a much different fix.  However, in my case, I didn't know how to
trigger the problems in fast-export without some additional changes I
was trying to make, and since I never got those other changes working,
I didn't think it was worth submitting my fix.  My solution is more
complex, in part because it was designed to handle ordering
requirements & recommendations other than just ancestry.  However, we
don't currently have any additional ordering requirements (the current
code only considers ancestry), so your solution is much simpler.  If I
do at some point get my other changes working, I'd need to
re-introduce the queue and handle_tail() and my changes to these, but
that can always be done later if and when I get those other changes
working.

Interestingly, Dscho added both the --reverse ability to revision.c
and the commits object_array and the handle_tail() stuff in
fast-export.c.  Both were done in the same year, and --reverse came
first.  See 9c5e66e97da8 (Teach revision machinery about --reverse,
2007-01-20) and f2dc849e9c5f (Add 'git fast-export', the sister of
'git fast-import', 2007-12-02).  It's not clear why revs.reverse = 1
wasn't used previously, although it certainly didn't occur to me
either and I think it would have been an alternative solution to my
first ever git.git contribution -- 784f8affe4df (fast-export: ensure
we traverse commits in topological order, 2009-02-10).  (Though
rev.reverse = 1 wouldn't have provided the same speedups that
rev.topo_order=1 provided.)

I can't think of any cases where this would cause any problems, and it
seems like using rev.reverse is a pretty clean solution.  Just in case
I'm missing something, though, it would be nice to get Dscho to also
comment on this.  I'm cc'ing him.

(Also, I see that Ævar has already provided some good feedback on the
testcase, so I'll stop here.)

>         }
>
>         handle_tags_and_duplicates(&extra_refs);
> diff --git a/t/t9350-fast-export.sh b/t/t9350-fast-export.sh
> index 409b48e2442..bd08101b81d 100755
> --- a/t/t9350-fast-export.sh
> +++ b/t/t9350-fast-export.sh
> @@ -750,4 +750,34 @@ test_expect_success 'merge commit gets exported with --import-marks' '
>         )
>  '
>
> +
> +test_expect_success 'fast-export --first-parent outputs all revisions output by revision walk' '
> +       git init first-parent &&
> +       cd first-parent &&
> +       test_commit init &&
> +       git checkout -b topic1 &&
> +       test_commit file2 file2.txt &&
> +       git checkout main &&
> +       git merge topic1 --no-ff &&
> +
> +       git checkout -b topic2 &&
> +       test_commit file3 file3.txt &&
> +       git checkout main &&
> +       git merge topic2 --no-ff &&
> +
> +       test_commit branch-head &&
> +
> +       git rev-list --format="%ad%B" --first-parent --topo-order --no-commit-header main > expected &&
> +
> +       git fast-export main -- --first-parent > first-parent-export &&
> +       git fast-export main -- --first-parent --reverse > first-parent-reverse-export &&
> +       git init import && cd import &&
> +       cat ../first-parent-export | git fast-import &&
> +
> +       git rev-list --format="%ad%B" --topo-order --all --no-commit-header > actual &&
> +       test $(git rev-list --all | wc -l) -eq 4 &&
> +       test_cmp ../expected actual &&
> +       test_cmp ../first-parent-export ../first-parent-reverse-export
> +'
> +
>  test_done
>
> base-commit: cd3e606211bb1cf8bc57f7d76bab98cc17a150bc
> --
> gitgitgadget

  parent reply	other threads:[~2021-11-23 19:14 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-11-23 11:28 [PATCH] fast-export: fix surprising behavior with --first-parent William Sprent via GitGitGadget
2021-11-23 13:07 ` Ævar Arnfjörð Bjarmason
2021-11-24 11:57   ` William Sprent
2021-11-23 19:14 ` Elijah Newren [this message]
2021-11-24 13:05   ` William Sprent
2021-11-24  0:41 ` Junio C Hamano
2021-11-24 13:05   ` William Sprent
2021-12-09  8:13 ` [PATCH v2] " William Sprent via GitGitGadget
2021-12-10  3:48   ` Elijah Newren
2021-12-10 21:55     ` Junio C Hamano
2021-12-10 22:02       ` Elijah Newren
2021-12-13 15:09     ` William Sprent
2021-12-14  0:31       ` Elijah Newren
2021-12-14 13:11         ` William Sprent
2021-12-16 16:23   ` [PATCH v3] " William Sprent via GitGitGadget
2021-12-21 18:47     ` Elijah Newren
2021-12-21 20:50       ` Junio C Hamano
2021-12-22  8:38         ` William Sprent

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CABPp-BGm0VtU=z0r0oGc48J020Meh_xdibnagNeDmBGQ-RVu+g@mail.gmail.com' \
    --to=newren@gmail.com \
    --cc=Johannes.Schindelin@gmx.de \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=williams@unity3d.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).