git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Stefan Beller <sbeller@google.com>
To: Junio C Hamano <gitster@pobox.com>
Cc: Uma Srinivasan <usrinivasan@twitter.com>,
	Jacob Keller <jacob.keller@gmail.com>,
	Git Mailing List <git@vger.kernel.org>,
	Jens Lehmann <Jens.Lehmann@web.de>,
	Heiko Voigt <hvoigt@hvoigt.net>
Subject: Re: git submodules implementation question
Date: Thu, 1 Sep 2016 14:04:07 -0700	[thread overview]
Message-ID: <CAGZ79kaVBX_zLsSwkz=d_JUfvkeYHyZ1kkYm7Ae_dGBfFarDCQ@mail.gmail.com> (raw)
In-Reply-To: <xmqq4m5zwevl.fsf@gitster.mtv.corp.google.com>

On Thu, Sep 1, 2016 at 1:21 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Junio C Hamano <gitster@pobox.com> writes:
>
>> Stefan Beller <sbeller@google.com> writes:
>>
>>>> The final version needs to be accompanied with tests to show the
>>>> effect of this change for callers.  A test would set up a top-level
>>>> and submodule, deliberately break submodule/.git/ repository and
>>>> show what breaks and how without this change.
>>>
>>> Tests are really good at providing this context as well, or to communicate
>>> the actual underlying problem, which is not quite clear to me.
>>> That is why I refrained from jumping into the discussion as I think the
>>> first few emails were dropped from the mailing list and I am missing context.
>>
>> I do not know where you started reading, but the gist of it is that
>> submodule.c spawns subprocess to run in the submodule's context by
>> assuming that chdir'ing into the <path> of the submodule and running
>> it (i.e. cp.dir set to <path> to drive start_command(&cp)) is
>> sufficient.  When <path>/.git (either it is a directory itself or it
>> points at a directory in .git/module/<name> in the superproject) is
>> a corrupt repository, running "git -C <path> command" would try to
>> auto-detect the repository, because it thinks <path>/.git is not a
>> repository and it thinks it is not at the top-level of the working
>> tree, and instead finds the repository of the top-level, which is
>> almost never what we want.
>
> This is with a test that covers the call in get_next_submodule() for
> the parallel fetch callback.  I think many of the codepaths will end
> up recursing forever the same way without the fix in a submodule
> repository that is broken in a similar way, but I didn't check, so
> I do not consider this to be completed.

Oh I see. That seems like a nasty bug.

>
> -- >8 --
> Subject: submodule: avoid auto-discovery in prepare_submodule_repo_env()
>
> The function is used to set up the environment variable used in a
> subprocess we spawn in a submodule directory.  The callers set up a
> child_process structure, find the working tree path of one submodule
> and set .dir field to it, and then use start_command() API to spawn
> the subprocess like "status", "fetch", etc.
>
> When this happens, we expect that the ".git" (either a directory or
> a gitfile that points at the real location) in the current working
> directory of the subprocess MUST be the repository for the submodule.
>
> If this ".git" thing is a corrupt repository, however, because
> prepare_submodule_repo_env() unsets GIT_DIR and GIT_WORK_TREE, the
> subprocess will see ".git", thinks it is not a repository, and
> attempt to find one by going up, likely to end up in finding the
> repository of the superproject.  In some codepaths, this will cause
> a command run with the "--recurse-submodules" option to recurse
> forever.
>
> By exporting GIT_DIR=.git, disable the auto-discovery logic in the
> subprocess, which would instead stop it and report an error.

and GIT_DIR=.git works for both .git files as well as the old fashioned way,
with the submodule repository at .git/, although that is not really documented.

>
> Not-signed-off-yet.
> ---
>  submodule.c                 |  1 +
>  t/t5526-fetch-submodules.sh | 29 +++++++++++++++++++++++++++++
>  2 files changed, 30 insertions(+)
>
> diff --git a/submodule.c b/submodule.c
> index 1b5cdfb..e8258f0 100644
> --- a/submodule.c
> +++ b/submodule.c
> @@ -1160,4 +1160,5 @@ void prepare_submodule_repo_env(struct argv_array *out)
>                 if (strcmp(*var, CONFIG_DATA_ENVIRONMENT))
>                         argv_array_push(out, *var);
>         }
> +       argv_array_push(out, "GIT_DIR=.git");
>  }
> diff --git a/t/t5526-fetch-submodules.sh b/t/t5526-fetch-submodules.sh
> index 954d0e4..b2dee30 100755
> --- a/t/t5526-fetch-submodules.sh
> +++ b/t/t5526-fetch-submodules.sh
> @@ -485,4 +485,33 @@ test_expect_success 'fetching submodules respects parallel settings' '
>         )
>  '
>
> +test_expect_success 'fetching submodule into a broken repository' '
> +       # Prepare src and src/sub nested in it
> +       git init src &&
> +       (
> +               cd src &&
> +               git init sub &&
> +               git -C sub commit --allow-empty -m "initial in sub" &&
> +               git submodule add -- ./sub sub &&
> +               git commit -m "initial in top"
> +       ) &&

This is not needed, as setup() set up some repositories for you.

> +
> +       # Clone the old-fashoned way
> +       git clone src dst &&

if you don't go with your own setup, maybe:

        git clone . dst
        git -C dst clone ../submodule submodule

    # and further down s/sub/submodule/

> +       git -C dst clone ../src/sub sub &&
> +
> +       # Make sure that old-fashoned layout is still supported

fashioned

> +       git -C dst status &&
> +
> +       # Recursive-fetch works fine
> +       git -C dst fetch --recurse-submodules &&
> +
> +       # Break the receiving submodule
> +       rm -f dst/sub/.git/HEAD &&
> +
> +       # Recursive-fetch must terminate
> +       # NOTE: without fix this will recurse forever!
> +       test_must_fail git -C dst fetch --recurse-submodules

So in case we'd break it again in the future we'd notice by an
infinite time for the test suite, then we'd notice this comment and
know what's going on?

I would have suggested to run this actual test in a subshell and then
check if the expected failure would have recovered after a second or two
and depending on that trigger a test failure.

But it doesn't seem to be easy with shells, so this is fine (even the newer
signed off patch, apart from nits above)

> +'
> +
>  test_done
>
>

  parent reply	other threads:[~2016-09-01 21:10 UTC|newest]

Thread overview: 31+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-08-28 23:24 git submodules implementation question Uma Srinivasan
2016-08-29 20:03 ` Junio C Hamano
2016-08-29 21:03   ` Uma Srinivasan
2016-08-29 21:09     ` Junio C Hamano
2016-08-29 21:13       ` Uma Srinivasan
2016-08-29 23:04         ` Uma Srinivasan
2016-08-29 23:15           ` Junio C Hamano
2016-08-29 23:34             ` Uma Srinivasan
2016-08-30  0:02             ` Jacob Keller
2016-08-30  0:12               ` Uma Srinivasan
2016-08-30  6:09                 ` Jacob Keller
2016-08-30  6:23                   ` Jacob Keller
2016-08-30 17:40                     ` Uma Srinivasan
2016-08-30 17:53                       ` Junio C Hamano
2016-08-31  2:54                         ` Uma Srinivasan
2016-08-31 16:42                           ` Junio C Hamano
2016-08-31 18:40                             ` Uma Srinivasan
2016-08-31 18:44                               ` Junio C Hamano
2016-08-31 18:58                                 ` Uma Srinivasan
2016-09-01  1:04                                   ` Uma Srinivasan
2016-09-01  4:09                             ` Junio C Hamano
2016-09-01 16:05                               ` Uma Srinivasan
2016-09-01 18:32                                 ` Junio C Hamano
2016-09-01 18:37                                   ` Stefan Beller
2016-09-01 19:19                                     ` Junio C Hamano
2016-09-01 19:56                                       ` Uma Srinivasan
2016-09-01 20:29                                         ` Junio C Hamano
2016-09-01 20:21                                       ` Junio C Hamano
2016-09-01 21:02                                         ` Junio C Hamano
2016-09-01 21:04                                         ` Stefan Beller [this message]
2016-09-01 21:12                                           ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAGZ79kaVBX_zLsSwkz=d_JUfvkeYHyZ1kkYm7Ae_dGBfFarDCQ@mail.gmail.com' \
    --to=sbeller@google.com \
    --cc=Jens.Lehmann@web.de \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=hvoigt@hvoigt.net \
    --cc=jacob.keller@gmail.com \
    --cc=usrinivasan@twitter.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).