git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Antonio Ospite <ao2@ao2.it>
To: "SZEDER Gábor" <szeder.dev@gmail.com>
Cc: gitster@pobox.com, git@vger.kernel.org,
	"Brandon Williams" <bmwill@google.com>,
	"Daniel Graña" <dangra@gmail.com>,
	"Jonathan Nieder" <jrnieder@gmail.com>,
	"Richard Hartmann" <richih.mailinglist@gmail.com>,
	"Stefan Beller" <sbeller@google.com>,
	"Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
Subject: Re: [PATCH v5 9/9] submodule: support reading .gitmodules when it's not in the working tree
Date: Mon, 24 Sep 2018 12:20:31 +0200	[thread overview]
Message-ID: <20180924122031.9dbec6b4c2e2a8c1bff3365b@ao2.it> (raw)
In-Reply-To: <20180918171257.GC27036@localhost>

On Tue, 18 Sep 2018 19:12:57 +0200
SZEDER Gábor <szeder.dev@gmail.com> wrote:

[...]
> On Mon, Sep 17, 2018 at 04:09:40PM +0200, Antonio Ospite wrote:
> > When the .gitmodules file is not available in the working tree, try
> > using the content from the index and from the current branch.
> 
> "from the index and from the current branch" of which repository?
>

I took a look, some comments below.

> > diff --git a/submodule-config.c b/submodule-config.c
> > index 61a555e920..bdb1d0e2c9 100644
> > --- a/submodule-config.c
> > +++ b/submodule-config.c
> 
> > @@ -603,8 +604,21 @@ static void submodule_cache_check_init(struct repository *repo)
> >  static void config_from_gitmodules(config_fn_t fn, struct repository *repo, void *data)
> >  {
[...]
> > +		file = repo_worktree_path(repo, GITMODULES_FILE);
> > +		if (file_exists(file))
> > +			config_source.file = file;
> > +		else if (get_oid(GITMODULES_INDEX, &oid) >= 0)
> > +			config_source.blob = GITMODULES_INDEX;
> > +		else if (get_oid(GITMODULES_HEAD, &oid) >= 0)
> > +			config_source.blob = GITMODULES_HEAD;
> > +
> 
> The repository used in t7814 contains nested submodules, which means
> that config_from_gitmodules() is invoked three times.
> 
> Now, the first two of those calls look at the superproject and at
> 'submodule', and find the existing files '.../trash
> directory.t7814-grep-recurse-submodules/.gitmodules' and '.../trash
> directory.t7814-grep-recurse-submodules/submodule/.gitmodules',
> respectively.  So far so good.
> 
> The third call, however, looks at the nested submodule at
> 'submodule/sub', which doesn't contain a '.gitmodules' file.  So this
> function goes on with the second condition and calls
> get_oid(GITMODULES_INDEX, &oid), which then appears to find the blob
> in the _superproject's_ index.
>

You are correct.

This is a limitation of the object store in git, there is no equivalent
of get_oid() to get the oid from a specific repository and this affects
config_with_options too when the config source is a blob.

This does not affect commands called via "git -C submodule_dir cmd"
because in that case the chdir happens before the_repository is set up,
for instance "git-submodule $SOMETHING --recursive" commands seem to
change the working directory before the recursion.

> > +		config_with_options(fn, data, &config_source, &opts);
> > +
> >  		free(file);
> >  	}
> >  }
> I'm no expert on submodules, but my gut feeling says that this can't
> be right.  But if it _is_ right, then I would say that the commit
> message should explain in detail, why it is right.
> 

I agree it isn't right, I didn't consider the case of nested
submodules, but even if I did the current git design does not
allow to correctly solve the problem: "read config from blob of an
arbitrary repository".

So what to do for the time being?

The issue is there but in a "normal" scenario it is not causing any real
harm because:

  1. currently it is exposed "only" by git grep and nested submodules.

  2. the new mechanism works fine when reading the submodule config for
     the root repository.

  3. the new mechanism does not "usually" impact non-leaf nested
     submodules, because the .gitmodules file is "normally" there.

  4. git grep never *uses* the GITMODULES_INDEX erroneously read
     from the root project when scanning the _leaf_ nested submodule
     because there are no further submodules down the line, the
     following check fails in builtin/grep.c:

       if (recurse_submodules && S_ISGITLINK(entry.mode) ...
       ...

In fact, because of 4. the test suite passes even if the gitmodule
config is not correct for the leaf submodule.

Actually 4. makes me think that the repo_read_gitmodules() call in
builtin/grep.c might not be strictly necessary, its purpose seems to be
to *anticipate* reading the config of *child* submodules while still
processing the *current* submodule, the config for the *current*
submodule was already read from the superproject by the immediately
preceding repo_submodule_init(), via:

  repo_submodule_init()
    submodule_from_path()
      gitmodules_read_check()
        repo_read_gitmodules()

And this would happen anyway also for child submodules down the
recursion path if we removed repo_read_gitmodules() in builtin/grep.c,
the operation would be not protected by grep_read_lock() tho.

The test suite passes even after removing repo_read_gitmodules()
entirely from builtin/grep.c, but I am still not confident that I get
all the implication of why that call was originally added in commit
f9ee2fcdfa (grep: recurse in-process using 'struct repository',
2017-08-02).

Anyways, even if we removed the call we would prevent the problem from
happening in the test suite, but not in the real world, in case non-leaf
submodules without .gitmodules in their working tree.

To recap:
  - patch 9/9 exposes a problem with the object store but for now it's
    only a potential problem in the future case that someone wanted to
    use *nested* submodules without .gitmodules in the working tree.
  - the new code should not affect current users which
    assume .gitmodules to be in the working tree for nested submodules;
  - even if we removed the call to repo_read_gitmodules() call in
    builtin/grep.c we would not avoid the problem entirely, just avoid
    it for the the _leaf_ submodules in case of nested submodules.

Selfishly, I'd propose to still merge the changes (I can send a v6 with
the locking fix in) and I'll write a test_expect_failure snippet to
document the problem Gábor spotted so we remember about it and fix it
when the object store can be accessed per-repository.

I am afraid I cannot look into the core issue about the object store in
my free time, however if someone wanted to sponsor some time I might
consider taking a stab at it.

Ciao,
   Antonio

-- 
Antonio Ospite
https://ao2.it
https://twitter.com/ao2it

A: Because it messes up the order in which people normally read text.
   See http://en.wikipedia.org/wiki/Posting_style
Q: Why is top-posting such a bad thing?

  parent reply	other threads:[~2018-09-24 10:20 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-09-17 14:09 [PATCH v5 0/9] Make submodules work if .gitmodules is not checked out Antonio Ospite
2018-09-17 14:09 ` [PATCH v5 1/9] submodule: add a print_config_from_gitmodules() helper Antonio Ospite
2018-09-24 10:25   ` Antonio Ospite
2018-09-24 23:06     ` Stefan Beller
2018-09-17 14:09 ` [PATCH v5 2/9] submodule: factor out a config_set_in_gitmodules_file_gently function Antonio Ospite
2018-09-17 14:09 ` [PATCH v5 3/9] t7411: merge tests 5 and 6 Antonio Ospite
2018-09-17 14:09 ` [PATCH v5 4/9] t7411: be nicer to future tests and really clean things up Antonio Ospite
2018-09-17 14:09 ` [PATCH v5 5/9] submodule--helper: add a new 'config' subcommand Antonio Ospite
2018-09-17 14:09 ` [PATCH v5 6/9] submodule: use the 'submodule--helper config' command Antonio Ospite
2018-09-17 14:09 ` [PATCH v5 7/9] t7506: clean up .gitmodules properly before setting up new scenario Antonio Ospite
2018-09-17 14:09 ` [PATCH v5 8/9] submodule: add a helper to check if it is safe to write to .gitmodules Antonio Ospite
2018-09-17 14:09 ` [PATCH v5 9/9] submodule: support reading .gitmodules when it's not in the working tree Antonio Ospite
2018-09-18 17:12   ` SZEDER Gábor
2018-09-19 19:24     ` Junio C Hamano
2018-09-20 15:35     ` Antonio Ospite
2018-09-21 16:19       ` Junio C Hamano
2018-09-27 14:49         ` Antonio Ospite
2018-09-24 10:20     ` Antonio Ospite [this message]
2018-09-24 21:00       ` Stefan Beller
2018-09-27 14:44         ` Antonio Ospite
2018-09-27 18:00           ` Stefan Beller
2018-10-01 15:45             ` Antonio Ospite
2018-10-01 19:42               ` Stefan Beller

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180924122031.9dbec6b4c2e2a8c1bff3365b@ao2.it \
    --to=ao2@ao2.it \
    --cc=avarab@gmail.com \
    --cc=bmwill@google.com \
    --cc=dangra@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=jrnieder@gmail.com \
    --cc=richih.mailinglist@gmail.com \
    --cc=sbeller@google.com \
    --cc=szeder.dev@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).