git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: William Sprent via GitGitGadget <gitgitgadget@gmail.com>
Cc: git@vger.kernel.org, William Sprent <williams@unity3d.com>
Subject: Re: [PATCH] fast-export: fix surprising behavior with --first-parent
Date: Tue, 23 Nov 2021 14:07:26 +0100	[thread overview]
Message-ID: <211123.865ysjui34.gmgdl@evledraar.gmail.com> (raw)
In-Reply-To: <pull.1084.git.1637666927224.gitgitgadget@gmail.com>


On Tue, Nov 23 2021, William Sprent via GitGitGadget wrote:

> From: William Sprent <williams@unity3d.com>
>
> When invoking git-fast-export with the --first-parent flag on a branch
> with merges, fast-export would early-out on processing the first merge
> on the branch. If combined with --reverse, fast-export would instead
> output all single parent commits on the branch.
>
> This commit makes fast-export output the same commits as rev-list
> --first-parent, and makes --reverse not have an effect on which commits
> are output.
>
> The fix involves removing logic within fast-export which was responsible
> for ensuring that parents are processed before their children, which was
> what was exiting early due to missing second parents. This is replaced
> by setting 'reverse = 1' before revision walking, which, in conjuction
> with topo_order, allows for delegating the ordering of commits to
> revision.c. The reverse flag is set after parsing rev-list arguments to
> avoid having it disabled.
>
> Signed-off-by: William Sprent <williams@unity3d.com>
> ---
>     fast-export: fix surprising behavior with --first-parent
>     
>     Hi,
>     
>     This is my first time patching git, so I probably need some guidance on
>     my approach. :)

Hi, thanks for your first contribution to git. This is a rather shallow
review, a deeper one is much deserved.

I notice that you're removing code in builtin/fast-export.c, presumably
we have code in revision.c that does the same thing. It would really
help a reviewer for you to dig a bit into the relevant commit history
and note it in the commit message.

I.e. could revision.c always do this, and this was always needless
duplication, or at time X it was needed, but as of Y revision.c learned
to do this, and callers A, B and C were adjusted, but just not this
missed call D? etc.

> -static int has_unshown_parent(struct commit *commit)
> -{
> -	struct commit_list *parent;
> -
> -	for (parent = commit->parents; parent; parent = parent->next)
> -		if (!(parent->item->object.flags & SHOWN) &&
> -		    !(parent->item->object.flags & UNINTERESTING))
> -			return 1;
> -	return 0;
> -}
> -
>  struct anonymized_entry {
>  	struct hashmap_entry hash;
>  	const char *anon;
> @@ -752,20 +740,6 @@ static char *anonymize_tag(void *data)
>  	return strbuf_detach(&out, NULL);
>  }
>  
> -static void handle_tail(struct object_array *commits, struct rev_info *revs,
> -			struct string_list *paths_of_changed_objects)
> -{
> -	struct commit *commit;
> -	while (commits->nr) {
> -		commit = (struct commit *)object_array_pop(commits);
> -		if (has_unshown_parent(commit)) {
> -			/* Queue again, to be handled later */
> -			add_object_array(&commit->object, NULL, commits);
> -			return;
> -		}
> -		handle_commit(commit, revs, paths_of_changed_objects);
> -	}
> -}

...

>  static void handle_tag(const char *name, struct tag *tag)
>  {
> @@ -1185,7 +1159,6 @@ static int parse_opt_anonymize_map(const struct option *opt,
>  int cmd_fast_export(int argc, const char **argv, const char *prefix)
>  {
>  	struct rev_info revs;
> -	struct object_array commits = OBJECT_ARRAY_INIT;
>  	struct commit *commit;
>  	char *export_filename = NULL,
>  	     *import_filename = NULL,
> @@ -1281,19 +1254,14 @@ int cmd_fast_export(int argc, const char **argv, const char *prefix)
>  
>  	get_tags_and_duplicates(&revs.cmdline);
>  
> +	revs.reverse = 1;

Is the placement of revs.reverse = 1 here important, or could it go
earlier after init_revision_sources() when we assign some other values
ir revs?

>  	if (prepare_revision_walk(&revs))
>  		die("revision walk setup failed");

A light reading of prepare_revision_walk() suggests it could come after,
but maybe I'm entirely wrong.

>  	revs.diffopt.format_callback = show_filemodify;
>  	revs.diffopt.format_callback_data = &paths_of_changed_objects;
>  	revs.diffopt.flags.recursive = 1;
>  	while ((commit = get_revision(&revs))) {
> -		if (has_unshown_parent(commit)) {
> -			add_object_array(&commit->object, NULL, &commits);
> -		}
> -		else {
> -			handle_commit(commit, &revs, &paths_of_changed_objects);
> -			handle_tail(&commits, &revs, &paths_of_changed_objects);
> -		}
> +		handle_commit(commit, &revs, &paths_of_changed_objects);
>  	}
>  

Yay code deletion, good if it works (I didn't check).

Since this is just a one-statement while-loop we can also remove its
braces now.

> +test_expect_success 'fast-export --first-parent outputs all revisions output by revision walk' '
> +	git init first-parent &&
> +	cd first-parent &&

Do any such "cd" in a sub-shell:

	git init x &&
	(
    		cd x &&
                ...
	)
Otherwise the next test after you is going to run in anotherdirectory.

> +	test_commit init &&
> +	git checkout -b topic1 &&
> +	test_commit file2 file2.txt &&
> +	git checkout main &&
> +	git merge topic1 --no-ff &&
> +
> +	git checkout -b topic2 &&
> +	test_commit file3 file3.txt &&
> +	git checkout main &&
> +	git merge topic2 --no-ff &&

Just a nit. I'd use "test_commit A", "test_commit B" etc. when the
filenames etc. aren't important. There's no subsequent reference here,
so I assume they're not.

> +	test_commit branch-head &&
> +
> +	git rev-list --format="%ad%B" --first-parent --topo-order --no-commit-header main > expected &&

nit; >expected, not > expected is the usual style.
> +
> +	git fast-export main -- --first-parent > first-parent-export &&
> +	git fast-export main -- --first-parent --reverse > first-parent-reverse-export &&

ditto:

> +	git init import && cd import &&

ditto earlier "cd" comment.

> +	cat ../first-parent-export | git fast-import &&

Instead of "cat x | prog" do "prog <x".

> +	git rev-list --format="%ad%B" --topo-order --all --no-commit-header > actual &&

> +	test $(git rev-list --all | wc -l) -eq 4 &&

Instead:

    git rev-list --all >tmp &&
    test_line_count = 4 tmp

(for some value of tmp)

> +	test_cmp ../expected actual &&
> +	test_cmp ../first-parent-export ../first-parent-reverse-export
> +'

Maybe some of the CD-ing around here wouldu be easier by not doing that
and instead running e.g.:

    git -C subdir fast-export >file-not-in-subdir &&
    ...

  reply	other threads:[~2021-11-23 13:17 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-11-23 11:28 [PATCH] fast-export: fix surprising behavior with --first-parent William Sprent via GitGitGadget
2021-11-23 13:07 ` Ævar Arnfjörð Bjarmason [this message]
2021-11-24 11:57   ` William Sprent
2021-11-23 19:14 ` Elijah Newren
2021-11-24 13:05   ` William Sprent
2021-11-24  0:41 ` Junio C Hamano
2021-11-24 13:05   ` William Sprent
2021-12-09  8:13 ` [PATCH v2] " William Sprent via GitGitGadget
2021-12-10  3:48   ` Elijah Newren
2021-12-10 21:55     ` Junio C Hamano
2021-12-10 22:02       ` Elijah Newren
2021-12-13 15:09     ` William Sprent
2021-12-14  0:31       ` Elijah Newren
2021-12-14 13:11         ` William Sprent
2021-12-16 16:23   ` [PATCH v3] " William Sprent via GitGitGadget
2021-12-21 18:47     ` Elijah Newren
2021-12-21 20:50       ` Junio C Hamano
2021-12-22  8:38         ` William Sprent

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=211123.865ysjui34.gmgdl@evledraar.gmail.com \
    --to=avarab@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=williams@unity3d.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).