git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: William Sprent <williams@unity3d.com>
To: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>,
	"William Sprent via GitGitGadget" <gitgitgadget@gmail.com>
Cc: git@vger.kernel.org
Subject: Re: [PATCH] fast-export: fix surprising behavior with --first-parent
Date: Wed, 24 Nov 2021 12:57:03 +0100	[thread overview]
Message-ID: <7c2a01ec-8a69-b7af-cabb-c6a6ef7483a9@unity3d.com> (raw)
In-Reply-To: <211123.865ysjui34.gmgdl@evledraar.gmail.com>

On 23/11/2021 14.07, Ævar Arnfjörð Bjarmason wrote:
> 
> On Tue, Nov 23 2021, William Sprent via GitGitGadget wrote:
> 
>> From: William Sprent <williams@unity3d.com>
>>
>> When invoking git-fast-export with the --first-parent flag on a branch
>> with merges, fast-export would early-out on processing the first merge
>> on the branch. If combined with --reverse, fast-export would instead
>> output all single parent commits on the branch.
>>
>> This commit makes fast-export output the same commits as rev-list
>> --first-parent, and makes --reverse not have an effect on which commits
>> are output.
>>
>> The fix involves removing logic within fast-export which was responsible
>> for ensuring that parents are processed before their children, which was
>> what was exiting early due to missing second parents. This is replaced
>> by setting 'reverse = 1' before revision walking, which, in conjuction
>> with topo_order, allows for delegating the ordering of commits to
>> revision.c. The reverse flag is set after parsing rev-list arguments to
>> avoid having it disabled.
>>
>> Signed-off-by: William Sprent <williams@unity3d.com>
>> ---
>>      fast-export: fix surprising behavior with --first-parent
>>      
>>      Hi,
>>      
>>      This is my first time patching git, so I probably need some guidance on
>>      my approach. :)
> 
> Hi, thanks for your first contribution to git. This is a rather shallow
> review, a deeper one is much deserved.
> > I notice that you're removing code in builtin/fast-export.c, presumably
> we have code in revision.c that does the same thing. It would really
> help a reviewer for you to dig a bit into the relevant commit history
> and note it in the commit message.
> 
> I.e. could revision.c always do this, and this was always needless
> duplication, or at time X it was needed, but as of Y revision.c learned
> to do this, and callers A, B and C were adjusted, but just not this
> missed call D? etc.
> 

That's a really good suggestion. I should've done that. I did dig just 
enough to see that the logic has been around since fast-export was 
introduced, but I didn't check whether the 'reverse' option was part of 
revision.c at that point. I see that Elijah has done that homework for 
me later in this thread and discovered that --reverse was introduce a 
year or so before fast-export though.

>> -static int has_unshown_parent(struct commit *commit)
>> -{
>> -	struct commit_list *parent;
>> -
>> -	for (parent = commit->parents; parent; parent = parent->next)
>> -		if (!(parent->item->object.flags & SHOWN) &&
>> -		    !(parent->item->object.flags & UNINTERESTING))
>> -			return 1;
>> -	return 0;
>> -}
>> -
>>   struct anonymized_entry {
>>   	struct hashmap_entry hash;
>>   	const char *anon;
>> @@ -752,20 +740,6 @@ static char *anonymize_tag(void *data)
>>   	return strbuf_detach(&out, NULL);
>>   }
>>   
>> -static void handle_tail(struct object_array *commits, struct rev_info *revs,
>> -			struct string_list *paths_of_changed_objects)
>> -{
>> -	struct commit *commit;
>> -	while (commits->nr) {
>> -		commit = (struct commit *)object_array_pop(commits);
>> -		if (has_unshown_parent(commit)) {
>> -			/* Queue again, to be handled later */
>> -			add_object_array(&commit->object, NULL, commits);
>> -			return;
>> -		}
>> -		handle_commit(commit, revs, paths_of_changed_objects);
>> -	}
>> -}
> 
> ...
> 
>>   static void handle_tag(const char *name, struct tag *tag)
>>   {
>> @@ -1185,7 +1159,6 @@ static int parse_opt_anonymize_map(const struct option *opt,
>>   int cmd_fast_export(int argc, const char **argv, const char *prefix)
>>   {
>>   	struct rev_info revs;
>> -	struct object_array commits = OBJECT_ARRAY_INIT;
>>   	struct commit *commit;
>>   	char *export_filename = NULL,
>>   	     *import_filename = NULL,
>> @@ -1281,19 +1254,14 @@ int cmd_fast_export(int argc, const char **argv, const char *prefix)
>>   
>>   	get_tags_and_duplicates(&revs.cmdline);
>>   
>> +	revs.reverse = 1;
> 
> Is the placement of revs.reverse = 1 here important, or could it go
> earlier after init_revision_sources() when we assign some other values
> ir revs?
> 
>>   	if (prepare_revision_walk(&revs))
>>   		die("revision walk setup failed");
> 
> A light reading of prepare_revision_walk() suggests it could come after,
> but maybe I'm entirely wrong.

It needs to go after the setup_revisions() call as otherwise a --reverse 
passed to fast-export will toggle the option off again. You are right in 
that it can be moved down. I've done that in my working directory for 
the next patch.

Another option would be to move the revs.reverse up as you suggest and 
then then call die() if it was toggled off by setup_revisions(). The 
only downside I can think of is that it would make any current usage of 
'fast-export --reverse' go from a no-op to an error.

>>   	revs.diffopt.format_callback = show_filemodify;
>>   	revs.diffopt.format_callback_data = &paths_of_changed_objects;
>>   	revs.diffopt.flags.recursive = 1;
>>   	while ((commit = get_revision(&revs))) {
>> -		if (has_unshown_parent(commit)) {
>> -			add_object_array(&commit->object, NULL, &commits);
>> -		}
>> -		else {
>> -			handle_commit(commit, &revs, &paths_of_changed_objects);
>> -			handle_tail(&commits, &revs, &paths_of_changed_objects);
>> -		}
>> +		handle_commit(commit, &revs, &paths_of_changed_objects);
>>   	}
>>   
> 
> Yay code deletion, good if it works (I didn't check).
> 
> Since this is just a one-statement while-loop we can also remove its
> braces now.
>  >> +test_expect_success 'fast-export --first-parent outputs all 
revisions output by revision walk' '
>> +	git init first-parent &&
>> +	cd first-parent &&
> 
> Do any such "cd" in a sub-shell:
> 
> 	git init x &&
> 	(
>      		cd x &&
>                  ...
> 	)
> Otherwise the next test after you is going to run in anotherdirectory.
> 
>> +	test_commit init &&
>> +	git checkout -b topic1 &&
>> +	test_commit file2 file2.txt &&
>> +	git checkout main &&
>> +	git merge topic1 --no-ff &&
>> +
>> +	git checkout -b topic2 &&
>> +	test_commit file3 file3.txt &&
>> +	git checkout main &&
>> +	git merge topic2 --no-ff &&
> 
> Just a nit. I'd use "test_commit A", "test_commit B" etc. when the
> filenames etc. aren't important. There's no subsequent reference here,
> so I assume they're not.
> 
>> +	test_commit branch-head &&
>> +
>> +	git rev-list --format="%ad%B" --first-parent --topo-order --no-commit-header main > expected &&
> 
> nit; >expected, not > expected is the usual style.
>> +
>> +	git fast-export main -- --first-parent > first-parent-export &&
>> +	git fast-export main -- --first-parent --reverse > first-parent-reverse-export &&
> 
> ditto:
> 
>> +	git init import && cd import &&
> 
> ditto earlier "cd" comment.
> 
>> +	cat ../first-parent-export | git fast-import &&
> 
> Instead of "cat x | prog" do "prog <x".
> 
>> +	git rev-list --format="%ad%B" --topo-order --all --no-commit-header > actual &&
> 
>> +	test $(git rev-list --all | wc -l) -eq 4 &&
> 
> Instead:
> 
>      git rev-list --all >tmp &&
>      test_line_count = 4 tmp
> 
> (for some value of tmp)
> 
>> +	test_cmp ../expected actual &&
>> +	test_cmp ../first-parent-export ../first-parent-reverse-export
>> +'
> 
> Maybe some of the CD-ing around here wouldu be easier by not doing that
> and instead running e.g.:
> 
>      git -C subdir fast-export >file-not-in-subdir &&
>      ...
> 

Thanks for taking the time to give your feedback. :) I'll remove those 
braces from the while loop and incorporate your comments about the test 
for v2.

  reply	other threads:[~2021-11-24 11:57 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-11-23 11:28 [PATCH] fast-export: fix surprising behavior with --first-parent William Sprent via GitGitGadget
2021-11-23 13:07 ` Ævar Arnfjörð Bjarmason
2021-11-24 11:57   ` William Sprent [this message]
2021-11-23 19:14 ` Elijah Newren
2021-11-24 13:05   ` William Sprent
2021-11-24  0:41 ` Junio C Hamano
2021-11-24 13:05   ` William Sprent
2021-12-09  8:13 ` [PATCH v2] " William Sprent via GitGitGadget
2021-12-10  3:48   ` Elijah Newren
2021-12-10 21:55     ` Junio C Hamano
2021-12-10 22:02       ` Elijah Newren
2021-12-13 15:09     ` William Sprent
2021-12-14  0:31       ` Elijah Newren
2021-12-14 13:11         ` William Sprent
2021-12-16 16:23   ` [PATCH v3] " William Sprent via GitGitGadget
2021-12-21 18:47     ` Elijah Newren
2021-12-21 20:50       ` Junio C Hamano
2021-12-22  8:38         ` William Sprent

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=7c2a01ec-8a69-b7af-cabb-c6a6ef7483a9@unity3d.com \
    --to=williams@unity3d.com \
    --cc=avarab@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).