From: "Peter Kästle" <>
To: Junio C Hamano <>
Cc:, Stefan Beller <>
Subject: Re: [RFC 1/2] submodules: test for fetch of non-init subsub-repo
Date: Wed, 11 Nov 2020 13:45:46 +0100
Message-ID: <>
In-Reply-To: <>

On 09.11.20 18:52, Junio C Hamano wrote:
> Peter Kaestle <> writes:
>> This test case triggers a regression, which was introduced by
>> a62387b3fc9f5aeeb04a2db278121d33a9caafa7 in following setup:
> Minor nit.  Please refer to a commit like so:
> a62387b3 (submodule.c: fetch in submodules git directory instead of in worktree, 2018-11-28)
> That is what "git show -s --pretty=reference" gives for the commit.
> If you have older git, "--format='%h (%s, %ad)' --date=short" would
> work.

Thanks for this hint, this is really useful.
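
As a minimal sketch of the reference format described above, the
following creates a throwaway repository and prints its only commit in
the "%h (%s, %ad)" style. The repository, subject line, date and
identity are made up for the demonstration; only the format string and
the --date=short option come from the review comment.

```shell
# Throwaway repository; subject, date and identity are made-up demo values.
repo=$(mktemp -d)
git init -q "$repo"
GIT_AUTHOR_DATE="2018-11-28 12:00:00 +0000" \
GIT_COMMITTER_DATE="2018-11-28 12:00:00 +0000" \
git -C "$repo" -c user.name="A U Thor" -c user.email="author@example.com" \
	commit -q --allow-empty -m "demo: example subject"

# Portable spelling of the reference format, usable with older git too.
ref=$(git -C "$repo" show -s --format='%h (%s, %ad)' --date=short HEAD)
echo "$ref"
rm -rf "$repo"
```

With a recent git, "git show -s --pretty=reference HEAD" should give
the same shape without spelling out the format string.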

> Instead of saying "if you follow this complex thing, it breaks and
> it is a regression at there", please describe it as a regular bugfix
> log message.  Describe the set-up first, explain the operation you'd
> perform under the condition, and tell readers what your expected
> outcome is.  Then tell readers what actually happens, and how that
> is different from your expected outcome.  Additionally, tell readers
> that it used to work before such and such commit broke it and what
> the root cause of the breakage is.

Hm... I did do this in the cover letter; maybe you missed it, or I did
not express myself well enough there.
Anyhow, I'll add it to the commit message, which goes into the log.

Here is my proposal for a new commit message of the test case:

A regression has been introduced by 'a62387b (submodule.c: fetch in 
submodules git directory instead of in worktree, 2018-11-28)'.

The scenario in which it triggers is when one has a remote repository
with a subrepository inside a subrepository like this:
superproject/middle_repo/inner_repo

Person A and B both have a clone of it, while Person B is not working
with the inner_repo and thus does not have it initialized in his working
copy.

Now person A introduces a change to the inner_repo and propagates it 
through the middle_repo and the superproject.
Once person A has pushed the changes and person B tries to fetch them
using "git fetch" at the superproject level, git fails with the error:

Could not access submodule 'inner_repo'
Errors during submodule fetch:

The expectation is that in this case the inner submodule is recognized
as an uninitialized subrepository and skipped by the git fetch command.

This used to work correctly before 'a62387b (submodule.c: fetch in 
submodules git directory instead of in worktree, 2018-11-28)'.

Starting with a62387b, the code evaluates "is_empty_dir()" inside
.git/modules for a directory that only exists in the worktree, which of
course yields the wrong return value.

As for the revert of a62387b, which I proposed in the second patch, I'm
not sure it's the right way.  The revert was simply my quick approach to
fixing it.  As I'm not fully aware of the idea behind handling the
submodules inside .git/modules instead of the worktree, I don't know
whether this is the best solution.  Maybe rethinking the whole
get_next_submodule() algorithm, or simply fixing the is_empty_dir() call
to use the worktree path, would be a better solution.
We should discuss this.
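
To illustrate the path mix-up, here is a rough shell sketch, not git's
actual C code. The is_empty_dir function below is a hypothetical
stand-in, and it is an assumption of this sketch that a directory which
cannot be opened is reported as "not empty". The directory layout only
mimics a worktree and a .git/modules area; no real repository is
involved.

```shell
# Hypothetical stand-in for an emptiness check. Assumption of this sketch:
# a directory that does not exist counts as "not empty".
is_empty_dir () {
	test -d "$1" && test -z "$(ls -A "$1")"
}

root=$(mktemp -d)
# Worktree of B: "inner" exists only as an empty placeholder directory.
mkdir -p "$root/B/middle/inner"
# Git directory of the middle submodule: it has no entry called "inner".
mkdir -p "$root/B/.git/modules/middle"

# Same relative path, evaluated from two different working directories.
if (cd "$root/B/middle" && is_empty_dir inner)
then
	from_worktree=empty
else
	from_worktree=not-empty
fi
if (cd "$root/B/.git/modules/middle" && is_empty_dir inner)
then
	from_gitdir=empty
else
	from_gitdir=not-empty
fi
echo "worktree: $from_worktree, git dir: $from_gitdir"
rm -rf "$root"
```

Under these assumptions, only the check made relative to the worktree
sees the empty placeholder; the same relative path checked from the git
directory is reported as not empty, so the submodule is not skipped.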

> What commit the set-up was broken is also an interesting piece of
> information, but it is not as important in the overall picture.
> Also, it probably is a better arrangement, after explaining how the
> current system does not work in the log message, to have the code
> fix in the same patch and add test to ensure the bug will stay
> fixed, in a single patch.  That way, you do not have to start with
> expect_failure and then flip the polarity to expect_success, which
> is a horrible style for reviewers to understand the code fix because
> the second "fix" step does not actually show the effect of what got
> fixed in the patch (the test change shows the flip of the polarity
> of the test plus only a few context lines and does not show what
> behaviour change the "fix" causes).

Ok, will deliver the test and the fix proposal in a single patch.

>> diff --git a/t/ b/t/
>> index dd8e423..9fbd481 100755
>> --- a/t/
>> +++ b/t/
>> @@ -719,4 +719,42 @@ test_expect_success 'fetch new submodule commit intermittently referenced by sup
>>   	)
>>   '
>> +add_commit_push()
>> +{
> Style.
>      add_commit_push () {


> cf. Documentation/CodingGuidelines.
>> +	dir="$1"
>> +	msg="$2"
>> +	shift 2
>> +	git -C "$dir" add "$@" &&
>> +	git -C "$dir" commit -a -m "$msg" &&
>> +	git -C "$dir" push
>> +}
>> +
>> +test_expect_failure 'fetching a superproject containing an uninitialized sub/sub project' '
>> +	# does not depend on any previous test setups
>> +
>> +	for repo in outer middle inner
>> +	do
>> +		git init --bare $repo &&
>> +		git clone $repo ${repo}_content &&
>> +		echo $repo > ${repo}_content/file &&
> Style.
>      echo "$repo" >"${repo}_content/file" &&


> cf. Documentation/CodingGuidelines.
>> +		add_commit_push ${repo}_content "initial" file
> If any of these iterations, except for the last one, fails in the
> loop, you do not notice the breakage and go on to the next
> iteration.  You'd need "|| return 1" at the end, perhaps.

Yes, I definitely missed that.
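
The effect can be seen in a small sketch without git. The helper
"step" and the two loop functions are hypothetical names used only for
this illustration; "step" fails for the middle iteration and succeeds
for the others.

```shell
# Hypothetical stand-in for a command that fails on one iteration only.
step () {
	test "$1" != middle
}

# Without "|| return 1" the loop reports the status of its last iteration.
loop_unchecked () {
	for repo in outer middle inner
	do
		step "$repo"
	done
}

# With "|| return 1" the first failing iteration aborts the function.
loop_checked () {
	for repo in outer middle inner
	do
		step "$repo" || return 1
	done
}

if loop_unchecked; then unchecked=passed; else unchecked=failed; fi
if loop_checked; then checked=passed; else checked=failed; fi
echo "unchecked=$unchecked checked=$checked"
```

This prints "unchecked=passed checked=failed": the failure in the
middle iteration is only noticed when the loop body ends in
"|| return 1".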

> So far, you created three bare repositories called outer, middle and
> inner, and each of {outer,middle,inner}_content repositories is a
> copy with a working tree of its counterpart.
>> +	done &&
>> +
>> +	git clone outer A &&
>> +	git -C A submodule add "$pwd/middle" &&
>> +	git -C A/middle/ submodule add "$pwd/inner" &&
> Hmph.  Is it essential to name these directories with full pathname
> for the problem to reproduce, or would the issue also appear if
> these repositories refer to each other with relative pathnames?
> Just being curious---if it only breaks with one and succeeds with
> the other, that deserves commenting here.

I haven't tried that, as the case was intended to simulate an
environment where one has remote repositories.  And with remote
repositories, you have a URL, which is a kind of absolute path.  From
reading the failing code, I doubt that it really matters.

> So far, you created A that is "outer", added "middle" as its
> submodule and then added "inner" as a submodule of "middle".
> Although it is not wrong per-se, it somehow feels a bit unnatural
> that you didn't do all of the above in the working trees you created
> in the previous step---I would have expected that middle_content
> working tree would be used to add "inner" as its submodule, for
> example.

I'm not sure I got your concern; maybe it helps if I add a description
of the scenario we want to mimic:
The "bare" repos outer, middle and inner are created by an administrator
on a remote server.  Person A prepares the split of the sources for
all the other users working in the environment by adding the submodules
in the way specified by the software architecture we intend to
develop in.

>> +	add_commit_push A/middle/ "adding inner sub" .gitmodules inner &&
>> +	add_commit_push A/ "adding middle sub" .gitmodules middle &&
> And then you conclude the addition of submodules by recording each
> of these two "submodule add" events in a commit and push it out.
>> +	git clone outer B &&
>> +	git -C B/ submodule update --init middle &&
> And then you clone the outer thing (which does not recursively
> instantiate) from A, and instantiate the middle layer (which does
> not recursively instantiate the bottom layer, I presume?)

Yes, Person B is cloning the outer layer without recursively going
into all the submodules, initializing only the ones he is expected to
work on.  In the test scenario he's only working on the middle layer,
but not on the inner one.

> I _think_ the state here should be minimally validated in this test.

Of course we could do so.  My intention was to keep it focused on the
one thing we needed to test, namely the fetch of an outer repo with an
uninitialized sub-sub repo.

> If you expect 'outer' and 'middle' are instantiated, perhaps check
> its contents (e.g. do you have a thing called 'file'?  What does it
> have in it?) and check the commit (e.g. does 'rev-parse HEAD' give
> you the commit you expect?).  If you expect 'inner' is not
> instantiated at this point, that should be validated as well.  If
> anything, that would explain what your expectations are better than
> any word in the proposed log message.
> In any case, I presume that up to this point things work as expected
> with or without the "fix" patch?  If so, the usual way we structure
> these tests is to stop here and make that a single "setup" test.
> Start the whole sequence above like so, perhaps.
>      test_expect_success 'setup nested submodule fetch test' '
> 		...

Ok, got it, will refactor.
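
A rough standalone sketch of such a setup step with minimal state
validation could look like the following. It follows the commands of
the quoted test, but runs outside git's test harness, so a few things
are assumptions of this sketch: the identity values, the hypothetical
wrapper "g" that sets protocol.file.allow=always (recent git versions
otherwise refuse file URLs for submodules), the function name
"setup_nested" and the "state" variable.

```shell
# Identity values are assumptions so that commits work in a bare environment.
export GIT_AUTHOR_NAME="A U Thor" GIT_AUTHOR_EMAIL="author@example.com"
export GIT_COMMITTER_NAME="C O Mitter" GIT_COMMITTER_EMAIL="committer@example.com"

# Hypothetical wrapper: lets submodule commands use local (file) URLs.
g () {
	git -c protocol.file.allow=always "$@"
}

add_commit_push () {
	dir="$1" &&
	msg="$2" &&
	shift 2 &&
	g -C "$dir" add "$@" &&
	g -C "$dir" commit -q -a -m "$msg" &&
	g -C "$dir" push -q origin HEAD
}

setup_nested () {
	# Three bare repositories, each with one initial commit.
	for repo in outer middle inner
	do
		g init -q --bare "$repo" &&
		g clone -q "$repo" "${repo}_content" &&
		echo "$repo" >"${repo}_content/file" &&
		add_commit_push "${repo}_content" "initial" file ||
		return 1
	done &&
	# Person A nests inner in middle and middle in outer.
	g clone -q outer A &&
	g -C A submodule add "$PWD/middle" &&
	g -C A/middle submodule add "$PWD/inner" &&
	add_commit_push A/middle "adding inner sub" .gitmodules inner &&
	add_commit_push A "adding middle sub" .gitmodules middle &&
	# Person B initializes only the middle layer.
	g clone -q outer B &&
	g -C B submodule update --init middle
}

top=$(mktemp -d) &&
cd "$top" &&
setup_nested >/dev/null 2>&1 &&
# middle is populated in B ...
test "$(cat B/middle/file)" = middle &&
# ... while inner is only an empty placeholder directory.
test -d B/middle/inner &&
test -z "$(ls -A B/middle/inner)" &&
state=ok
echo "state=${state:-broken}"
```

Inside the real test suite the wrapper and the identity exports would
not be needed in this form, and the checks would live in a
test_expect_success 'setup ...' block as suggested above.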

> And then the "interesting" part of the test.
>> +	echo "change on inner repo of A" > A/middle/inner/file &&
> Style.


>> +	add_commit_push A/middle/inner "change on inner" file &&
>> +	add_commit_push A/middle "change on inner" inner &&
>> +	add_commit_push A "change on inner" middle &&
> So you create a new commit in the bottom layer, propagate it up to
> the middle layer, and to the outer layer.  Are these steps also what
> you expect to succeed, or does the "regression" break any of these?
> If these are still part of set-up that is expected to work, you
> probably need to roll these up to the 'setup' step (with some
> validation to express what the tests are expecting). From your
> description, which did not say where exactly in this long sequence
> you expect things to break, unfortunately no reader can tell, so
> I'll leave the restructuring up to you.

Yes, those steps are also expected to succeed; it's just important that
the initial clone of B happens before those pushes.  With your proposed
restructuring this could also go into the setup step, leaving only a
single command for the actual test to fail:

>> +
>> +	git -C B/ fetch
> And from B that was an original copy of A with only the top and
> middle layer instantiated, you run "git fetch".  Are you happy as
> long as "git fetch" does not exit with non-zero status?  That is
> hard to believe---it may be a necessary condition for the command to
> exit with zero status, but you have other expectations, like what
> commit the remote tracking branch refs/remotes/origin/HEAD ought to
> be pointing at.  I think we should check that, too.

Checking the return code is the one thing that catches this regression,
but checking whether all the repositories are at the correct HEAD is
another thing we probably want to have in place for testing future
changes to that part of the code.  I will add it.

Thank you very much for all the comments, I learned a lot by working
through them.  I'll send a patch v2 soon.

kind regards,
