git@vger.kernel.org mailing list mirror (one of many)
From: Philippe Blain <levraiphilippeblain@gmail.com>
To: "brian m. carlson" <sandals@crustytoothpaste.net>, git@vger.kernel.org
Cc: Junio C Hamano <gitster@pobox.com>,
	Rose Kunkel <rose@rosekunkel.me>,
	Emily Shaffer <emilyshaffer@google.com>,
	Jonathan Tan <jonathantanmy@google.com>
Subject: Re: [PATCH v2] submodule: mark submodules with update=none as inactive
Date: Fri, 9 Jul 2021 16:26:35 -0400
Message-ID: <bf1893ee-6973-d8b2-659e-bb239a0a9ae2@gmail.com>
In-Reply-To: <20210701225117.909892-1-sandals@crustytoothpaste.net>

Hi brian,

[re-cc'ing Emily and Jonathan, whom Junio cc'ed in <xmqqeed2sdwc.fsf@gitster.g>
but who seem to have been dropped when you sent v1 and v2 of the patch]

On 2021-07-01 at 18:51, brian m. carlson wrote:
> When the user recursively clones a repository with submodules 

Here I would add:

", or runs 'git submodule update --init' after a
non-recursive clone of such a repository, "

> and one or
> more of those submodules is marked with the submodule.<name>.update=none
> configuration, the submodule 

"those submodules" would be clearer, I think.

> will end up being active.  This is a
> problem because we will have skipped cloning or checking out the
> submodule, and as a result, other commands, such as git reset or git
> checkout, will fail if they are invoked with --recurse-submodules (or
> when submodule.recurse is true).
> 
> This is obviously not the behavior the user wanted, so let's fix this by
> specifically setting the submodule as inactive in this case when we're
> initializing the repository.  That will make us properly ignore the
> submodule when performing recursive operations.
> 
> We only do this when initializing a submodule, 

Here for even more clarity I would add:

i.e. 'git submodule init' or 'git submodule update --init',

> since git submodule
> update can update the submodule with various options despite the setting
> of "none" and we want those options to override it as they currently do.
> 
> Reported-by: Rose Kunkel <rose@rosekunkel.me>
> Helped-by: Philippe Blain <levraiphilippeblain@gmail.com>
> Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
> ---
>   builtin/submodule--helper.c |  6 ++++++
>   t/t5601-clone.sh            | 24 ++++++++++++++++++++++++
>   2 files changed, 30 insertions(+)

As I said in my review of v1, I think this would warrant a mention in the doc.

In general, I think 'git-submodule(1)' could be more precise about which submodules
are touched by which subcommands. Since the topic that introduced the 'active' concept
was merged in a93dcb0a56 (Merge branch 'bw/submodule-is-active', 2017-03-30), these subcommands
operate only on active submodules:

- init (with a big caveat, see below)
- sync
- update

The doc makes no mention of that for sync and update. sync says it synchronizes 'all'
submodules, and update says it updates 'registered' submodules ('registered' is not
defined formally anywhere either). And 'active' is mentioned in the description of
'init', but not defined. It would be good to explicitly say "see the 'Active submodules'
section in gitsubmodules(7) for a definition of 'active'", or something like that.

I'm not saying we need to fix that necessarily in this patch, I'm just noting
what my reading of the code and of the doc reveals.

> 
> diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
> index ae6174ab05..a3f8c45d97 100644
> --- a/builtin/submodule--helper.c
> +++ b/builtin/submodule--helper.c
> @@ -686,6 +686,12 @@ static void init_submodule(const char *path, const char *prefix,
>   
>   		if (git_config_set_gently(sb.buf, upd))
>   			die(_("Failed to register update mode for submodule path '%s'"), displaypath);
> +
> +		if (sub->update_strategy.type == SM_UPDATE_NONE) {
> +			strbuf_reset(&sb);
> +			strbuf_addf(&sb, "submodule.%s.active", sub->name);
> +			git_config_set_gently(sb.buf, "false");
> +		}
>   	}
>   	strbuf_release(&sb);
>   	free(displaypath);

I did more testing with this patch applied and I fear it is not
completely sufficient. There are 2 main problems, I think.
The first is that the following still triggers the bug:

     git clone server client
     git -C client submodule update --init
     git -C client submodule init       # should be no-op, but isn't
     git -C client reset --hard --recurse-submodules

That's because:

1) 'git submodule init' operates on *all* submodules if 'submodule.active' is unset
     and no <path> is given
     (see submodule--helper.c::module_init, or the doc [1]).
2) 'git submodule init' sets 'submodule.$name.active' to true for the submodules
     on which it operates, unless already covered by 'submodule.active'
     (see submodule--helper.c::init_submodule)
3) the code we're adding to set 'active' to false if 'update=none' is only executed
    if 'submodule.c.update' is not yet in the config, so it gets skipped if we
    repeat 'git submodule init'. (I think this behaviour is sound).

So that's unfortunate, and is also kind of contradictory to what the doc says
for 'git submodule init':
"This command does not alter existing information in .git/config.".
And just to be clear, the behaviour I describe above already exists; the current
patch just makes it more obvious.
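
To make the three points above concrete, here is a toy model in C. It is not
Git's code: 'struct sm_config' and 'toy_init_submodule' are made-up names that
only mirror my reading of init_submodule with the v2 patch applied and
'submodule.active' unset.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one submodule's entries in .git/config. */
struct sm_config {
	bool has_active;  /* is submodule.<name>.active set? */
	bool active;      /* its value, when set */
	bool has_update;  /* is submodule.<name>.update set? */
};

/*
 * Toy model of one 'git submodule init' pass over a single submodule.
 * 'gitmodules_update_none' says whether .gitmodules has update=none.
 */
static void toy_init_submodule(struct sm_config *cfg, bool gitmodules_update_none)
{
	assert(cfg != NULL);

	/* point 2: init marks the submodule as active if it is not already */
	if (!(cfg->has_active && cfg->active)) {
		cfg->has_active = true;
		cfg->active = true;
	}

	/* point 3: the update mode is only copied the first time */
	if (!cfg->has_update) {
		cfg->has_update = true;
		if (gitmodules_update_none)
			cfg->active = false;  /* the code added by the patch */
	}
}
```

Calling toy_init_submodule() twice on a zeroed 'struct sm_config' with
update=none leaves 'active' false after the first call and true after the
second, which is the sequence that re-triggers the bug.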

I think we could manage to change that behaviour a bit
in order to have 'submodule init' not modify the config for submodules which are already marked inactive,
*unless* they are explicitly matched by the pathspec on the command line.
So we would have:

     git clone server client; cd client
     git submodule init      # initial call sets 'submodule.c.active=false'
     git submodule init      # does not touch c, it's already marked inactive
     git submodule init c    # OK, we really want to mark it as active

To do that, we could use the same trick that we do in update_clone, i.e.

     if (pathspec.nr)
         info.explicit = 1;

where 'explicit' (tentative name) is a new field in 'struct init_cb', so that 'init_submodule'
knows if the current submodule was explicitly listed on the command line.
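
As a rough sketch of what I have in mind, here is the same idea as a standalone
toy model in C. Nothing here is Git's code: 'struct sm_state',
'toy_init_proposed' and the 'explicit_path' parameter (standing in for the
tentative 'explicit' field) are hypothetical names for illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one submodule's entries in .git/config. */
struct sm_state {
	bool has_active;  /* is submodule.<name>.active set? */
	bool active;      /* its value, when set */
	bool has_update;  /* is submodule.<name>.update set? */
};

/*
 * Toy model of the proposed 'git submodule init' behaviour.
 * 'explicit_path' is true when the submodule was listed on the command line.
 */
static void toy_init_proposed(struct sm_state *cfg, bool gitmodules_update_none,
			      bool explicit_path)
{
	assert(cfg != NULL);

	bool marked_inactive = cfg->has_active && !cfg->active;

	/* leave an already-inactive submodule alone unless explicitly listed */
	if (!marked_inactive || explicit_path) {
		cfg->has_active = true;
		cfg->active = true;
	}

	/* unchanged from the patch: only runs on the first registration */
	if (!cfg->has_update) {
		cfg->has_update = true;
		if (gitmodules_update_none)
			cfg->active = false;
	}
}
```

With this model, two plain calls leave c inactive and a third call with
'explicit_path' set marks it active, matching the three 'git submodule init'
invocations above. How a first, explicit 'git submodule init c' should behave
is not settled here; the sketch simply keeps the patch's behaviour in that case.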


Then there is a second thing. As stated in the commit message,
'git submodule update --checkout' should override the 'update=none'
setting and clone and checkout the submodule. But this behaviour
is broken by the code we're adding, because 'submodule update' only recurses into
active submodules! (see the call to 'is_submodule_active' in
submodule--helper.c::prepare_to_clone_next_submodule).

So this does not clone the submodule:

     git clone --recurse server client   # recursive clone
     git -C client submodule update --checkout  # should clone c, doesn't

Neither does this:
    
     git clone server client                   # non-recursive clone
     git -C client submodule update --init
     git -C client submodule update --checkout # should clone c, doesn't

But because of the first problem above, this works(!):

     git clone server client
     git -C client submodule update --init
     git -C client submodule update --init --checkout

This is because, in the third call, c is set to 'active' by init_submodule
and is then *not* skipped by prepare_to_clone_next_submodule.
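
The three scenarios can be replayed with a small toy model in C. Again, this
is not Git's code: 'struct sm_flags', 'toy_init' and 'toy_update' are made-up
names, the model only covers a submodule with update=none, and it assumes
'submodule.active' is unset.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one submodule's entries in .git/config. */
struct sm_flags {
	bool has_active;  /* is submodule.<name>.active set? */
	bool active;      /* its value, when set */
	bool has_update;  /* is submodule.<name>.update set? */
};

/* Toy 'init' for an update=none submodule, with the patch applied. */
static void toy_init(struct sm_flags *cfg)
{
	assert(cfg != NULL);
	cfg->has_active = true;
	cfg->active = true;           /* init marks the submodule active */
	if (!cfg->has_update) {
		cfg->has_update = true;
		cfg->active = false;  /* only on the first registration */
	}
}

/*
 * Toy 'git submodule update [--init] [--checkout]'.
 * Returns true if the update=none submodule would be cloned.
 */
static bool toy_update(struct sm_flags *cfg, bool init, bool checkout)
{
	assert(cfg != NULL);
	if (init)
		toy_init(cfg);
	/* inactive submodules are skipped before any option is looked at */
	if (!(cfg->has_active && cfg->active))
		return false;
	/* update=none is only overridden by an explicit --checkout */
	return checkout;
}
```

In this model, a single init followed by 'update --checkout' does not clone c,
while 'update --init' followed by 'update --init --checkout' does, because the
second init flips c back to active.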


So it's all a little bit complicated! But I think that with my suggestion above,
i.e. that 'git submodule init', in the absence of 'submodule.active', would
only switch inactive submodules to active if they are explicitly listed, then
we could get a saner behaviour, at the expense of having to explicitly init
'update=none' submodules on the command line if we really want to '--checkout':
     
     git clone server client
     git -C client submodule update --init        # first call: set c to inactive
     git -C client submodule update --init        # no-op
     git -C client submodule update --checkout    # does not clone c (currently quiet)
     git -C client submodule update --checkout c  # does not clone c, but warns (current behaviour)
     git -C client submodule init c               # sets c to active
     git -C client submodule update --checkout    # clones c

where the last two commands could be a single
'git submodule update --init --checkout c'. Ideally, the
4th command should also warn the user that they now have to explicitly 'init'
c if they want to check it out. This could simply mean tweaking the already
existing message in next_submodule_warn_missing to also check if
the current submodule has 'update=none' and then display the warning
(instead of just showing it if the submodule was listed on the command
line, which is the current behaviour). Additionally, the warning should
say "Maybe you want to use 'update --init %s'?", i.e. specify the path.
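
A minimal sketch of that warning condition, as standalone C rather than a
patch against next_submodule_warn_missing: 'toy_warn_missing' and its
parameters are hypothetical, and only the hint line quoted above is modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Toy model of the proposed warning logic.
 * Writes the hint into 'out' and returns true when a warning is due,
 * otherwise leaves 'out' empty and returns false.
 */
static bool toy_warn_missing(char *out, size_t outlen, const char *displaypath,
			     bool listed_on_cmdline, bool update_none)
{
	assert(out != NULL && outlen > 0 && displaypath != NULL);

	out[0] = '\0';

	/* current behaviour: warn only for paths given on the command line;
	 * proposed: also warn when the submodule has update=none */
	if (!listed_on_cmdline && !update_none)
		return false;

	/* the hint names the path so it can be pasted back as-is */
	snprintf(out, outlen, "Maybe you want to use 'update --init %s'?\n",
		 displaypath);
	return true;
}
```

With this condition, a plain 'git submodule update --checkout' would stop being
quiet for c, and the hint would read "Maybe you want to use 'update --init c'?".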


What do you think of my suggestions? I can help push this forward
by contributing patches if we agree that we should go forward with
this slight behaviour change in 'git submodule init' ...


> diff --git a/t/t5601-clone.sh b/t/t5601-clone.sh
> index c0688467e7..efe6b13be0 100755
> --- a/t/t5601-clone.sh
> +++ b/t/t5601-clone.sh
> @@ -752,6 +752,30 @@ test_expect_success 'batch missing blob request does not inadvertently try to fe
>   	git clone --filter=blob:limit=0 "file://$(pwd)/server" client
>   '
>   
> +test_expect_success 'clone with submodule with update=none is not active' '
> +	rm -rf server client &&
> +
> +	test_create_repo server &&
> +	echo a >server/a &&
> +	echo b >server/b &&
> +	git -C server add a b &&
> +	git -C server commit -m x &&
> +
> +	echo aa >server/a &&
> +	echo bb >server/b &&
> +	git -C server submodule add --name c "$(pwd)/repo_for_submodule" c &&
> +	git -C server config -f .gitmodules submodule.c.update none &&
> +	git -C server add a b c .gitmodules &&
> +	git -C server commit -m x &&
> +
> +	git clone --recurse-submodules server client &&
> +	git -C client config submodule.c.active >actual &&
> +	echo false >expected &&
> +	test_cmp actual expected &&
> +	# This would fail if the submodule were active, since it is not checked out.
> +	git -C client reset --recurse-submodules --hard
> +'

I think we might also want to test the non-recursive clone case,
i.e. 'git clone' and then 'git submodule update --init', as well as
subsequent calls to 'git submodule init' in light of my analysis above.

Also, the only place in the test suite that I could find where
'update=none' is tested is in t7406.35-38 in t7406-submodule-update.sh
so maybe it would make more sense to put the test(s) there?

Thanks,

Philippe.

[1] https://git-scm.com/docs/git-submodule#Documentation/git-submodule.txt-init--ltpathgt82308203
