git@vger.kernel.org list mirror (unofficial, one of many)
 help / color / Atom feed
From: Johannes Schindelin <Johannes.Schindelin@gmx.de>
To: SZEDER Gábor <szeder.dev@gmail.com>
Cc: Aleksey Mikhaylov <almikhailov@plesk.com>,
	"git@vger.kernel.org" <git@vger.kernel.org>
Subject: Re: Issue: "Could not access submodule" error when pulling recursively with Git 2.22.0
Date: Fri, 25 Oct 2019 14:41:22 +0200 (CEST)
Message-ID: <nycvar.QRO.7.76.6.1910251415560.46@tvgsbejvaqbjf.bet> (raw)
In-Reply-To: <20191023100449.GH4348@szeder.dev>

[-- Attachment #1: Type: text/plain, Size: 9313 bytes --]

Hi Gábor,

On Wed, 23 Oct 2019, SZEDER Gábor wrote:

> On Wed, Oct 23, 2019 at 07:22:12AM +0000, Aleksey Mikhaylov wrote:
> > "Could not access submodule" error when pulling recursively with Git 2.22.0.
> > This issue causes if there is submodule in submodule.
>
> > Please use my simple test repository to reproduce the problem:
> > https://github.com/agmikhailov/example2361.git
> >
> > It is just an empty repository with one submodule (https://github.com/agmikhailov/example2361M1.git)
> > and one submodule of submodule (https://github.com/agmikhailov/example2361M2.git):
> >
> > git clone https://github.com/agmikhailov/example2361.git
> > cd example2361/
> > git submodule init
>
> According to the docs of 'git submodule init', it "initializes the
> submodules recorded in the index".  Therefore, it can only initialize
> the submodule in 'example2361M1', because at this point we have no
> idea that there is a nested submodule in there.
>
>   $ git submodule init
>   Submodule 'example2361M1' (https://github.com/agmikhailov/example2361M1.git) registered for path 'example2361M1'

Indeed, `git submodule init` is not recursive.

> > git submodule update
>
> This command clones 'example2361M1':
>
>   $ git submodule update --recursive
>   Cloning into '/tmp/example2361/example2361M1'...
>   Submodule path 'example2361M1': checked out '6a9be24a1c0ebd44d91ae4dcf1fd62580b936540'
>
> Only at this point can we finally see that there is a nested
> submodule, and can initialize and clone it with:
>
>   $ cd example2361M1
>   $ git submodule init
>   Submodule 'example2361M2' (https://github.com/agmikhailov/example2361M2.git) registered for path 'example2361M2'
>   $ git submodule update
>   Cloning into '/tmp/example2361/example2361M1/example2361M2'...
>   Submodule path 'example2361M2': checked out '9ed39cf1fe0a8cf34e72d2e7ebff1ea9d4a63ac1'

I concur.

> > git pull --recurse-submodules=true
>
> And after that:
>
>   $ cd ../..
>   $ git pull --recurse-submodules=true
>   Fetching submodule example2361M1
>   Fetching submodule example2361M1/example2361M2
>   Already up to date.

Yes, I agree that _probably_ what the user wanted is to initialize the
submodules recursively.

Having said that, I vaguely remember that e.g. Boost has this insane
forest of submodules, and I am almost certain that no sane person wants
to clone them all. _I_ wouldn't.

> > ACTUAL RESULT
> >
> > "git --recurse-submodules=true" command fails with message "Could not access submodule":
> >
> > $ git --recurse-submodules=true
> > Fetching submodule example2361M1
> > Could not access submodule 'example2361M2'
> >
> > EXPECTED RESULT
> >
> > All submodules are successfully updated by "git --recurse-submodules=true" command.
> >
> > ADDITIONAL INFORMATION
> >
> > Git version 2.20.1 does not have this problem.
> > So we had to downgrade Git to work with submodules.
>
> The behavior was indeed different with v2.20.1, but that version
> didn't show the behavior you expected.  When running your commands
> with v2.20.1 I get:
>
>   $ ~/src/git/bin-wrappers/git pull --recurse-submodules=true
>   Fetching submodule example2361M1
>   Already up to date.
>   $ find example2361M1/example2361M2/
>   example2361M1/example2361M2/
>
> So while that 'git pull' didn't error out, it didn't even look at the
> nested submodule, which is still uninitialized and empty.

I would actually argue that this is what is expected: the entire _point_
of submodules is that they can be inactive.

Coming back to the Boost example, what I would want Git to do when only
a fraction of the submodules is active is to skip the inactive ones
during a `git pull --recurse-submodules=true`.

Which v2.20.1 apparently did, and I would call the current behavior a
regression.

> FWIW, bisecting shows that the behavior changed with commit
> a62387b3fc,

Thanks for digging this out!

> but, unfortunately, the commit message doesn't seem to be very helpful
> to me, but perhaps others with more experience with submodules can
> make something out of it.
>
> commit a62387b3fc9f5aeeb04a2db278121d33a9caafa7
> Author: Stefan Beller <sbeller@google.com>
> Date:   Wed Nov 28 16:27:55 2018 -0800
>
>     submodule.c: fetch in submodules git directory instead of in worktree
>
>     Keep the properties introduced in 10f5c52656 (submodule: avoid
>     auto-discovery in prepare_submodule_repo_env(), 2016-09-01), by fixating
>     the git directory of the submodule.
>
>     Signed-off-by: Stefan Beller <sbeller@google.com>
>     Signed-off-by: Junio C Hamano <gitster@pobox.com>
>
>  submodule.c | 10 ++++++++--
>  1 file changed, 8 insertions(+), 2 deletions(-)

To me, this commit message suggests that the bahvior should not have
changed in the reported manner.

Let's look at the diff:

-- snip --
diff --git a/submodule.c b/submodule.c
index 77ace5e784a..d1b6646f42d 100644
--- a/submodule.c
+++ b/submodule.c
@@ -494,6 +494,12 @@ void prepare_submodule_repo_env(struct argv_array *out)
 			 DEFAULT_GIT_DIR_ENVIRONMENT);
 }

+static void prepare_submodule_repo_env_in_gitdir(struct argv_array *out)
+{
+	prepare_submodule_repo_env_no_git_dir(out);
+	argv_array_pushf(out, "%s=.", GIT_DIR_ENVIRONMENT);
+}
+
 /* Helper function to display the submodule header line prior to the full
  * summary output. If it can locate the submodule objects directory it will
  * attempt to lookup both the left and right commits and put them into the
@@ -1327,8 +1333,8 @@ static int get_next_submodule(struct child_process *cp,
 		repo = get_submodule_repo_for(spf->r, submodule);
 		if (repo) {
 			child_process_init(cp);
-			cp->dir = xstrdup(repo->worktree);
-			prepare_submodule_repo_env(&cp->env_array);
+			cp->dir = xstrdup(repo->gitdir);
+			prepare_submodule_repo_env_in_gitdir(&cp->env_array);
 			cp->git_cmd = 1;
 			if (!spf->quiet)
 				strbuf_addf(err, "Fetching submodule %s%s\n",
-- snap --

The obvious difference in behavior is that we no longer let Git
discover the `.git` file/directory, but we define it (because we can).

But actually, we cannot, not if the submodule is not checked out!
Because in this case, there is no `.git` file and Git then tries to
initialize the repository and the worktree, and fails rather miserably
in the reported manner.

Side note: I think there is something spooky going on where we try to
fetch submodules twice when the first time failed, and somehow slip into
the `else` arm of this code:

-- snip --
		task->repo = get_submodule_repo_for(spf->r, task->sub);
		if (task->repo) {
			struct strbuf submodule_prefix = STRBUF_INIT;
			child_process_init(cp);
			cp->dir = task->repo->gitdir;
			prepare_submodule_repo_env_in_gitdir(&cp->env_array);
			cp->git_cmd = 1;
			if (!spf->quiet)
				strbuf_addf(err, "Fetching submodule %s%s\n",
					    spf->prefix, ce->name);
			argv_array_init(&cp->args);
			argv_array_pushv(&cp->args, spf->args.argv);
			argv_array_push(&cp->args, default_argv);
			argv_array_push(&cp->args, "--submodule-prefix");

			strbuf_addf(&submodule_prefix, "%s%s/",
						       spf->prefix,
						       task->sub->path);
			argv_array_push(&cp->args, submodule_prefix.buf);

			spf->count++;
			*task_cb = task;

			strbuf_release(&submodule_prefix);
			return 1;
		} else {

			fetch_task_release(task);
			free(task);

			/*
			 * An empty directory is normal,
			 * the submodule is not initialized
			 */
			if (S_ISGITLINK(ce->ce_mode) &&
			    !is_empty_dir(ce->name)) {
				spf->result = 1;
				strbuf_addf(err,
					    _("Could not access submodule '%s'"),
					    ce->name);
			}
		}
-- snap --

As you can see, at that point, `is_empty_dir()` indicates that it is _no
longer empty_, which means that the recursive pull actually tried to
initialize it, and left things in a half-consistent state.

BTW I also think that Stefan's commit just made a bug very obvious that
had not mattered all that much before: it would seem that before that
commit, when the code tried to fetch a submodule, it would actually
fetch the super project (because it would discover the .git
file/directory, and as the inactive submodule does not have any, it
would have found the super project's).

In essence, I think that the report points out a very real bug, and this
bug should be fixed.

I _could_ imagine that it would be as easy as applying this patch (and
turning the provided reproducer into a regression test), then polishing
this with a nice commit message:

-- snip --
diff --git a/submodule.c b/submodule.c
index 0f199c51378..5694905610a 100644
--- a/submodule.c
+++ b/submodule.c
@@ -1443,6 +1443,12 @@ static int get_next_submodule(struct child_process *cp,
 		task->repo = get_submodule_repo_for(spf->r, task->sub);
 		if (task->repo) {
 			struct strbuf submodule_prefix = STRBUF_INIT;
+
+			/* skip uninitialized submodule */
+			if (!file_exists(task->repo->gitdir) ||
+			    is_empty_dir(task->repo->gitdir))
+				continue;
+
 			child_process_init(cp);
 			cp->dir = task->repo->gitdir;
 			prepare_submodule_repo_env_in_gitdir(&cp->env_array);
-- snap --

Any takers?

Ciao,
Johannes

      reply index

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-10-23  7:22 Aleksey Mikhaylov
2019-10-23 10:04 ` SZEDER Gábor
2019-10-25 12:41   ` Johannes Schindelin [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=nycvar.QRO.7.76.6.1910251415560.46@tvgsbejvaqbjf.bet \
    --to=johannes.schindelin@gmx.de \
    --cc=almikhailov@plesk.com \
    --cc=git@vger.kernel.org \
    --cc=szeder.dev@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

git@vger.kernel.org list mirror (unofficial, one of many)

Archives are clonable:
	git clone --mirror https://public-inbox.org/git
	git clone --mirror http://ou63pmih66umazou.onion/git
	git clone --mirror http://czquwvybam4bgbro.onion/git
	git clone --mirror http://hjrcffqmbrq6wope.onion/git

Example config snippet for mirrors

Newsgroups are available over NNTP:
	nntp://news.public-inbox.org/inbox.comp.version-control.git
	nntp://ou63pmih66umazou.onion/inbox.comp.version-control.git
	nntp://czquwvybam4bgbro.onion/inbox.comp.version-control.git
	nntp://hjrcffqmbrq6wope.onion/inbox.comp.version-control.git
	nntp://news.gmane.io/gmane.comp.version-control.git

 note: .onion URLs require Tor: https://www.torproject.org/

AGPL code for this site: git clone https://public-inbox.org/public-inbox.git