git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Jonathan Tan <jonathantanmy@google.com>
To: Glen Choo <chooglen@google.com>
Cc: "Jonathan Tan" <jonathantanmy@google.com>,
	git@vger.kernel.org, "Junio C Hamano" <gitster@pobox.com>,
	"Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
Subject: Re: [PATCH v3 09/10] fetch: fetch unpopulated, changed submodules
Date: Thu, 24 Feb 2022 16:39:27 -0800	[thread overview]
Message-ID: <20220225003928.2891685-1-jonathantanmy@google.com> (raw)
In-Reply-To: <20220224100842.95827-10-chooglen@google.com>

Glen Choo <chooglen@google.com> writes:
> In the process of writing the new tests [1], I noticed some failures of
> the form:
> 
>   # rm the submodule's working tree directory.
>   $ git rm submodule
>   [...]
> 
>   # Do a fetch that requires running a child process from the submodule.
>   $ git fetch --recurse-submodules same-name-1 
>   [...]
> 
>   # Fatal error tells us that we cannot chdir to the deleted working
>     tree.
>   fatal: cannot chdir to '../../../submodule': No such file or directory
> 
> This happens because submodules set/unset a value for core.worktree when
> they are checked out/"un-checked out" (see submodule_move_head() and
> connect_work_tree_and_git_dir()), but "git rm" doesn't know that
> core.worktree should be updated.
> 
> I've worked around this by passing "--work-tree=." to the child process
> [2], but this feels like a hack, especially because this bug should
> affect all child processes in a "git rm"-ed submodule (this probably
> includes the "git branch" processes in gc/branch-recurse-submodules, but
> I haven't confirmed it yet). 

Ah...that's a tricky bug. Thanks for finding it.

> Some more comprehensive solutions that
> could be future work are:
> 
> - Teach "git [add|rm]" to unset core.worktree (the reverse operation,
>   "git restore", should already do the correct thing). This won't detect
>   submodules removed with "rm -r" though.

This might work with the caveat you mentioned.

> - Teach submodule child processes to ignore stale core.worktree values.

This might work, coupled with Emily Shaffer's work on teaching
submodules to know that they're submodules (so we know when a stale
core.worktree can be safely ignored).

[1] https://lore.kernel.org/git/20220203215914.683922-1-emilyshaffer@google.com/

> - Do more things in-core instead of using child processes (avoiding the
>   failing chdir() call).

This might not work if the invocation needs to check the worktree (for
example, as far as I know, we won't delete a branch if it's currently
checked out in a worktree).

> I'm not sure what future work we should pursue, or even if the
> "--work-tree=." workaround is even good, so I'd appreciate feedback
> here.

I can't think of better solutions than what you listed, unfortunately. I
also can't think of a better workaround, but at least it's narrowly
scoped: we know that we're running on a submodule and that the operation
is not affected by a worktree (for example, we're fetching, but we know
we're not fetching with a refspec that updates a currently checked out
branch). Let's see what others have to say.

> @@ -865,6 +876,8 @@ static void collect_changed_submodules_cb(struct diff_queue_struct *q,
>  		if (!item->util)
>  			item->util = xcalloc(1, sizeof(struct changed_submodule_data));
>  		cs_data = item->util;
> +		cs_data->super_oid = commit_oid;
> +		cs_data->path = xstrdup(p->two->path);

Junio mentioned the possibility of cs_data->path already being non-NULL
[1].

[1] https://lore.kernel.org/git/xmqqy220j6kf.fsf@gitster.g/

> @@ -1253,14 +1266,33 @@ void check_for_new_submodule_commits(struct object_id *oid)
>  	oid_array_append(&ref_tips_after_fetch, oid);
>  }
>  
> +/*
> + * Returns 1 if there is at least one submodule gitdir in
> + * $GIT_DIR/modules and 0 otherwise. This follows
> + * submodule_name_to_gitdir(), which looks for submodules in
> + * $GIT_DIR/modules, not $GIT_COMMON_DIR.
> + *
> + * A submodule can be moved to $GIT_DIR/modules manually by running "git
> + * submodule absorbgitdirs", or it may be initialized there by "git
> + * submodule update".
> + */
> +static int repo_has_absorbed_submodules(struct repository *r)
> +{
> +	struct strbuf buf = STRBUF_INIT;
> +
> +	strbuf_repo_git_path(&buf, r, "modules/");
> +	return file_exists(buf.buf) && !is_empty_dir(buf.buf);
> +}

buf needs to be released?

> @@ -1333,7 +1365,8 @@ int submodule_touches_in_range(struct repository *r,
>  }
>  
>  struct submodule_parallel_fetch {
> -	int count;
> +	int index_count;
> +	int changed_count;
>  	struct strvec args;
>  	struct repository *r;
>  	const char *prefix;

If we're sticking with these names, probably worth a comment. E.g.
"index_count" is the number of submodules in <name of field that this is
an index of> that we have processed, and likewise for "changed_count".

> @@ -1343,6 +1376,7 @@ struct submodule_parallel_fetch {
>  	int result;
>  
>  	struct string_list changed_submodule_names;
> +	struct string_list seen_submodule_names;
>  
>  	/* Pending fetches by OIDs */
>  	struct fetch_task **oid_fetch_tasks;

Also here - changed is the list that we generated from walking the
fetched superproject commits, and seen is the list of submodules we've
processed in <name of function>.

> @@ -1539,11 +1582,64 @@ get_fetch_task(struct submodule_parallel_fetch *spf, struct strbuf *err)

[snip]

> +		/*
> +		 * NEEDSWORK: A submodule unpopulated by "git rm" will
> +		 * have core.worktree set, but the actual core.worktree
> +		 * directory won't exist, causing the child process to
> +		 * fail. Forcibly set --work-tree until we get smarter
> +		 * handling for core.worktree in unpopulated submodules.
> +		 */
> +		strvec_push(&task->git_args, "--work-tree=.");
> +		return task;
> +	}
> +	return NULL;
> +}

If we end up sticking to this workaround (which sounds reasonable to
me), the comment here probably should contain a lot of what was written
under the "---" in the commit message.

> +test_expect_success "'--recurse-submodules=on-demand' should fetch submodule commits if the submodule is changed but the index has no submodules" '

[snip]

> +# In downstream, init "submodule2", but do not check it out while
> +# fetching. This lets us assert that unpopulated submodules can be
> +# fetched.
> +test_expect_success 'setup downstream branch with other submodule' '
> +	mkdir submodule2 &&
> +	(
> +		cd submodule2 &&
> +		git init &&
> +		echo sub2content >sub2file &&
> +		git add sub2file &&
> +		git commit -a -m new &&
> +		git branch -M sub2
> +	) &&
> +	git checkout -b super-sub2-only &&
> +	git submodule add "$pwd/submodule2" submodule2 &&
> +	git commit -m "add sub2" &&
> +	git checkout super &&
> +	(
> +		cd downstream &&
> +		git fetch --recurse-submodules origin &&
> +		git checkout super-sub2-only &&
> +		# Explicitly run "git submodule update" because sub2 is new
> +		# and has not been cloned.
> +		git submodule update --init &&
> +		git checkout --recurse-submodules super
> +	)
> +'

Hmm...what is the difference between this and the original case in which
the index has no submodules? Both assert that unpopulated submodules
(submodules that cannot be found by iterating the index, as described in
your commit message) can be fetched.

> +	# Use test_cmp manually because verify_fetch_result does not
> +	# consider submodule2. All the repos should be fetched, but only
> +	# submodule2 should be read from a commit
> +	cat <<-EOF > expect.err.combined &&
> +	From $pwd/.
> +	   OLD_HEAD..$super_head  super           -> origin/super
> +	   OLD_HEAD..$super_sub2_only_head  super-sub2-only -> origin/super-sub2-only
> +	Fetching submodule submodule
> +	From $pwd/submodule
> +	   OLD_HEAD..$sub_head  sub        -> origin/sub
> +	Fetching submodule submodule/subdir/deepsubmodule
> +	From $pwd/deepsubmodule
> +	   OLD_HEAD..$deep_head  deep       -> origin/deep
> +	Fetching submodule submodule2 at commit $super_sub2_only_head
> +	From $pwd/submodule2
> +	   OLD_HEAD..$sub2_head  sub2       -> origin/sub2
> +	EOF
> +	sed -E "s/[0-9a-f]+\.\./OLD_HEAD\.\./" actual.err >actual.err.cmp &&
> +	test_cmp expect.err.combined actual.err.cmp
> +'

Could verify_fetch_result be modified to consider the new submodule
instead?

> +test_expect_success 'fetch --recurse-submodules updates name-conflicted, populated submodule' '
> +	test_when_finished "git -C same-name-downstream checkout master" &&
> +	(
> +		cd same-name-1 &&
> +		test_commit -C submodule --no-tag b1 &&
> +		git add submodule &&
> +		git commit -m "super-b1"
> +	) &&
> +	(
> +		cd same-name-2 &&
> +		test_commit -C submodule --no-tag b2 &&
> +		git add submodule &&
> +		git commit -m "super-b2"
> +	) &&
> +	(
> +		cd same-name-downstream &&
> +		# even though the .gitmodules is correct, we cannot
> +		# fetch from same-name-2
> +		git checkout same-name-2/master &&
> +		git fetch --recurse-submodules same-name-1 &&
> +		test_must_fail git fetch --recurse-submodules same-name-2

What's the error message printed to the user here? (Just from reading
the code, I would have expected this to succeed, with the submodule
fetch being from same-name-1's submodule since we're fetching submodules
by name, but apparently that is not the case.)

> +	(
> +		cd same-name-downstream/.git/modules/submodule &&
> +		# The submodule has core.worktree pointing to the "git
> +		# rm"-ed directory, overwrite the invalid value.
> +		git --work-tree=. cat-file -e $head1 &&
> +		test_must_fail git --work-tree=. cat-file -e $head2
> +	)

Regarding the worktree workaround, also say "see comment in
get_fetch_task() for more information" or something like that.

  parent reply	other threads:[~2022-02-25  0:39 UTC|newest]

Thread overview: 149+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-02-10  4:41 [PATCH 0/8] fetch --recurse-submodules: fetch unpopulated submodules Glen Choo
2022-02-10  4:41 ` [PATCH 1/8] submodule: inline submodule_commits() into caller Glen Choo
2022-02-10  4:41 ` [PATCH 2/8] submodule: store new submodule commits oid_array in a struct Glen Choo
2022-02-10 19:00   ` Jonathan Tan
2022-02-10 22:05     ` Junio C Hamano
2022-02-10  4:41 ` [PATCH 3/8] submodule: make static functions read submodules from commits Glen Choo
2022-02-10 19:15   ` Jonathan Tan
2022-02-11 10:07     ` Glen Choo
2022-02-11 10:09     ` Glen Choo
2022-02-10  4:41 ` [PATCH 4/8] t5526: introduce test helper to assert on fetches Glen Choo
2022-02-10  4:41 ` [PATCH 5/8] t5526: use grep " Glen Choo
2022-02-10 19:17   ` Jonathan Tan
2022-02-10  4:41 ` [PATCH 6/8] submodule: extract get_fetch_task() Glen Choo
2022-02-10 19:33   ` Jonathan Tan
2022-02-10  4:41 ` [PATCH 7/8] fetch: fetch unpopulated, changed submodules Glen Choo
2022-02-10 22:49   ` Junio C Hamano
2022-02-11  7:15     ` Glen Choo
2022-02-11 17:07       ` Junio C Hamano
2022-02-10 22:51   ` Jonathan Tan
2022-02-14  4:24     ` Glen Choo
2022-02-14 18:04     ` Glen Choo
2022-02-14 10:17   ` Glen Choo
2022-02-10  4:41 ` [PATCH 8/8] submodule: fix bug and remove add_submodule_odb() Glen Choo
2022-02-10 22:54   ` Junio C Hamano
2022-02-11  3:13     ` Glen Choo
2022-02-10 23:04   ` Jonathan Tan
2022-02-11  3:18     ` Glen Choo
2022-02-11 17:19     ` Junio C Hamano
2022-02-14  2:52       ` Glen Choo
2022-02-10  7:07 ` [PATCH 0/8] fetch --recurse-submodules: fetch unpopulated submodules Junio C Hamano
2022-02-10  8:51   ` Glen Choo
2022-02-10 17:40     ` Junio C Hamano
2022-02-11  2:39       ` Glen Choo
2022-02-15 17:23 ` [PATCH v2 0/9] " Glen Choo
2022-02-15 17:23   ` [PATCH v2 1/9] t5526: introduce test helper to assert on fetches Glen Choo
2022-02-15 21:37     ` Ævar Arnfjörð Bjarmason
2022-02-15 17:23   ` [PATCH v2 2/9] t5526: use grep " Glen Choo
2022-02-15 21:53     ` Ævar Arnfjörð Bjarmason
2022-02-16  3:09       ` Glen Choo
2022-02-16 10:02         ` Ævar Arnfjörð Bjarmason
2022-02-17  4:04           ` Glen Choo
2022-02-17  9:25             ` Ævar Arnfjörð Bjarmason
2022-02-17 16:16               ` Glen Choo
2022-02-15 17:23   ` [PATCH v2 3/9] submodule: make static functions read submodules from commits Glen Choo
2022-02-15 21:18     ` Jonathan Tan
2022-02-16  6:59       ` Glen Choo
2022-02-15 22:00     ` Ævar Arnfjörð Bjarmason
2022-02-16  7:08       ` Glen Choo
2022-02-15 17:23   ` [PATCH v2 4/9] submodule: inline submodule_commits() into caller Glen Choo
2022-02-15 22:02     ` Ævar Arnfjörð Bjarmason
2022-02-15 17:23   ` [PATCH v2 5/9] submodule: store new submodule commits oid_array in a struct Glen Choo
2022-02-15 21:33     ` Ævar Arnfjörð Bjarmason
2022-02-15 17:23   ` [PATCH v2 6/9] submodule: extract get_fetch_task() Glen Choo
2022-02-15 17:23   ` [PATCH v2 7/9] fetch: fetch unpopulated, changed submodules Glen Choo
2022-02-15 22:02     ` Jonathan Tan
2022-02-16  5:46       ` Glen Choo
2022-02-16  9:11         ` Glen Choo
2022-02-16  9:39           ` Ævar Arnfjörð Bjarmason
2022-02-16 17:33             ` Glen Choo
2022-02-15 22:06     ` Ævar Arnfjörð Bjarmason
2022-02-15 17:23   ` [PATCH v2 8/9] submodule: read shallows when finding " Glen Choo
2022-02-15 22:03     ` Jonathan Tan
2022-02-15 17:23   ` [PATCH v2 9/9] submodule: fix latent check_has_commit() bug Glen Choo
2022-02-15 22:04     ` Jonathan Tan
2022-02-24 10:08   ` [PATCH v3 00/10] fetch --recurse-submodules: fetch unpopulated submodules Glen Choo
2022-02-24 10:08     ` [PATCH v3 01/10] t5526: introduce test helper to assert on fetches Glen Choo
2022-02-25  0:34       ` Junio C Hamano
2022-02-24 10:08     ` [PATCH v3 02/10] t5526: stop asserting on stderr literally Glen Choo
2022-02-24 11:52       ` Ævar Arnfjörð Bjarmason
2022-02-24 16:15         ` Glen Choo
2022-02-24 18:13           ` Eric Sunshine
2022-02-24 23:05       ` Jonathan Tan
2022-02-25  2:26         ` Glen Choo
2022-02-24 10:08     ` [PATCH v3 03/10] t5526: create superproject commits with test helper Glen Choo
2022-02-24 23:14       ` Jonathan Tan
2022-02-25  2:52         ` Glen Choo
2022-02-25 11:42           ` Ævar Arnfjörð Bjarmason
2022-02-28 18:11             ` Glen Choo
2022-02-24 10:08     ` [PATCH v3 04/10] submodule: make static functions read submodules from commits Glen Choo
2022-02-24 10:08     ` [PATCH v3 05/10] submodule: inline submodule_commits() into caller Glen Choo
2022-02-24 10:08     ` [PATCH v3 06/10] submodule: store new submodule commits oid_array in a struct Glen Choo
2022-02-24 10:08     ` [PATCH v3 07/10] submodule: extract get_fetch_task() Glen Choo
2022-02-24 23:26       ` Jonathan Tan
2022-02-24 10:08     ` [PATCH v3 08/10] submodule: move logic into fetch_task_create() Glen Choo
2022-02-24 23:36       ` Jonathan Tan
2022-02-24 10:08     ` [PATCH v3 09/10] fetch: fetch unpopulated, changed submodules Glen Choo
2022-02-24 21:30       ` Junio C Hamano
2022-02-25  3:04         ` Glen Choo
2022-02-25  0:33       ` Junio C Hamano
2022-02-25  3:07         ` Glen Choo
2022-02-25  0:39       ` Jonathan Tan [this message]
2022-02-25  3:46         ` Glen Choo
2022-03-04 23:46           ` Jonathan Tan
2022-03-05  0:22             ` Glen Choo
2022-03-04 23:53           ` Jonathan Tan
2022-02-26 18:53       ` Junio C Hamano
2022-03-01 20:24         ` Johannes Schindelin
2022-03-01 20:33           ` Junio C Hamano
2022-03-02 23:25             ` Glen Choo
2022-03-01 20:32         ` Junio C Hamano
2022-02-24 10:08     ` [PATCH v3 10/10] submodule: fix latent check_has_commit() bug Glen Choo
2022-03-04  0:57     ` [PATCH v4 00/10] fetch --recurse-submodules: fetch unpopulated submodules Glen Choo
2022-03-04  0:57       ` [PATCH v4 01/10] t5526: introduce test helper to assert on fetches Glen Choo
2022-03-04  2:06         ` Junio C Hamano
2022-03-04 22:11           ` Glen Choo
2022-03-04  0:57       ` [PATCH v4 02/10] t5526: stop asserting on stderr literally Glen Choo
2022-03-04  2:12         ` Junio C Hamano
2022-03-04 22:41         ` Jonathan Tan
2022-03-04 23:48           ` Junio C Hamano
2022-03-05  0:25             ` Glen Choo
2022-03-04  0:57       ` [PATCH v4 03/10] t5526: create superproject commits with test helper Glen Choo
2022-03-04 22:59         ` Jonathan Tan
2022-03-04  0:57       ` [PATCH v4 04/10] submodule: make static functions read submodules from commits Glen Choo
2022-03-04  0:57       ` [PATCH v4 05/10] submodule: inline submodule_commits() into caller Glen Choo
2022-03-04  0:57       ` [PATCH v4 06/10] submodule: store new submodule commits oid_array in a struct Glen Choo
2022-03-04  0:57       ` [PATCH v4 07/10] submodule: extract get_fetch_task() Glen Choo
2022-03-04  0:57       ` [PATCH v4 08/10] submodule: move logic into fetch_task_create() Glen Choo
2022-03-04  0:57       ` [PATCH v4 09/10] fetch: fetch unpopulated, changed submodules Glen Choo
2022-03-04  2:37         ` Junio C Hamano
2022-03-04 22:59           ` Glen Choo
2022-03-05  0:13             ` Junio C Hamano
2022-03-05  0:37               ` Glen Choo
2022-03-08  0:11                 ` Junio C Hamano
2022-03-04 23:56         ` Jonathan Tan
2022-03-04  0:57       ` [PATCH v4 10/10] submodule: fix latent check_has_commit() bug Glen Choo
2022-03-04  2:17         ` Junio C Hamano
2022-03-04  2:22       ` [PATCH v4 00/10] fetch --recurse-submodules: fetch unpopulated submodules Junio C Hamano
2022-03-08  0:14       ` [PATCH v5 " Glen Choo
2022-03-08  0:14         ` [PATCH v5 01/10] t5526: introduce test helper to assert on fetches Glen Choo
2022-03-08  0:14         ` [PATCH v5 02/10] t5526: stop asserting on stderr literally Glen Choo
2022-03-08  0:14         ` [PATCH v5 03/10] t5526: create superproject commits with test helper Glen Choo
2022-03-08  0:14         ` [PATCH v5 04/10] submodule: make static functions read submodules from commits Glen Choo
2022-03-08  0:14         ` [PATCH v5 05/10] submodule: inline submodule_commits() into caller Glen Choo
2022-03-08  0:14         ` [PATCH v5 06/10] submodule: store new submodule commits oid_array in a struct Glen Choo
2022-03-08  0:14         ` [PATCH v5 07/10] submodule: extract get_fetch_task() Glen Choo
2022-03-08  0:14         ` [PATCH v5 08/10] submodule: move logic into fetch_task_create() Glen Choo
2022-03-08  0:14         ` [PATCH v5 09/10] fetch: fetch unpopulated, changed submodules Glen Choo
2022-03-08  0:14         ` [PATCH v5 10/10] submodule: fix latent check_has_commit() bug Glen Choo
2022-03-08  0:50         ` [PATCH v5 00/10] fetch --recurse-submodules: fetch unpopulated submodules Junio C Hamano
2022-03-08 18:24           ` Glen Choo
2022-03-09 19:13             ` Junio C Hamano
2022-03-09 19:49               ` Glen Choo
2022-03-09 20:22                 ` Junio C Hamano
2022-03-09 22:11                   ` Glen Choo
2022-03-16 21:58                     ` Glen Choo
2022-03-16 22:06                       ` Junio C Hamano
2022-03-16 22:37                         ` Glen Choo
2022-03-16 23:08                           ` Junio C Hamano
2022-03-08 21:42         ` Jonathan Tan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20220225003928.2891685-1-jonathantanmy@google.com \
    --to=jonathantanmy@google.com \
    --cc=avarab@gmail.com \
    --cc=chooglen@google.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).