git@vger.kernel.org mailing list mirror (one of many)
From: Antonio Ospite <ao2@ao2.it>
To: "SZEDER Gábor" <szeder.dev@gmail.com>
Cc: gitster@pobox.com, git@vger.kernel.org,
	"Brandon Williams" <bmwill@google.com>,
	"Daniel Graña" <dangra@gmail.com>,
	"Jonathan Nieder" <jrnieder@gmail.com>,
	"Richard Hartmann" <richih.mailinglist@gmail.com>,
	"Stefan Beller" <sbeller@google.com>,
	"Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
Subject: Re: [PATCH v5 9/9] submodule: support reading .gitmodules when it's not in the working tree
Date: Thu, 20 Sep 2018 17:35:52 +0200
Message-ID: <20180920173552.6109014827a062dcf3821632@ao2.it>
In-Reply-To: <20180918171257.GC27036@localhost>

On Tue, 18 Sep 2018 19:12:57 +0200
SZEDER Gábor <szeder.dev@gmail.com> wrote:

> Hi Antonio,
> 
> it appears that this patch (and its previous versions as well) is
> responsible for triggering occasional test failures in
> 't7814-grep-recurse-submodules.sh': more frequently (about once in
> every ten runs) on macOS on Travis CI, and less frequently (about once
> in a couple of hundred runs) on Linux on my machine.
>

Thanks a lot for testing Gábor, it's really appreciated.

> The reason for the failure is memory corruption manifesting in various
> ways: segfault, malloc() or use after free() errors from libc, corrupt
> loose object, invalid ref, bogus output, etc.
> 
> Applying the following patch makes t7814 fail almost every time,
> though sometimes that loop has to iterate over 1000 times until that
> 'git grep' finally fails...  so good luck with debugging ;)
[...]

I managed to capture some traces of the segfaults by running each 'git grep' under gdb with this variation:

diff --git a/t/t7814-grep-recurse-submodules.sh b/t/t7814-grep-recurse-submodules.sh
index 7184113b9b..56e87c3f8a 100755
--- a/t/t7814-grep-recurse-submodules.sh
+++ b/t/t7814-grep-recurse-submodules.sh
@@ -337,6 +337,10 @@ test_expect_success 'grep --recurse-submodules should pass the pattern type alon
        test_must_fail git -c grep.patternType=fixed grep --recurse-submodules -e "(.|.)[\d]" &&

        # Basic
+       for i in $(test_seq 0 2000)
+       do
+               debug --debugger="gdb --silent -ex run -ex quit --return-child-result --args" git grep --recurse-submodules 1 >/dev/null || return 1
+       done &&
        git grep -G --recurse-submodules -e "(.|.)[\d]" >actual &&
        cat >expect <<-\EOF &&
        a:(1|2)d(3|4)


Running t7814 with --run="1,6,22" is enough to observe the issue.

From what I can see, these corruptions are caused by concurrent accesses
to the object store.

The issue is caused by these facts:
  1. git grep uses threads;
  2. git grep reads the submodules config with repo_read_gitmodules();
  3. repo_read_gitmodules() calls config_from_gitmodules();
  4. the changes in patch 9 of this series make config_from_gitmodules()
     use the object store, which apparently is not thread-safe, whereas
     the previous code using git_config_from_file() was.

> On first look I didn't notice anything that is obviously wrong in this
> patch and could be responsible for the memory corruption, but there is
> one thing I found strange, though:
> 
> 
> On Mon, Sep 17, 2018 at 04:09:40PM +0200, Antonio Ospite wrote:
> > When the .gitmodules file is not available in the working tree, try
> > using the content from the index and from the current branch.
> 
> "from the index and from the current branch" of which repository?
> 
[...]

> > diff --git a/submodule-config.c b/submodule-config.c
> > index 61a555e920..bdb1d0e2c9 100644
> > --- a/submodule-config.c
> > +++ b/submodule-config.c
> 
> > @@ -603,8 +604,21 @@ static void submodule_cache_check_init(struct repository *repo)
> >  static void config_from_gitmodules(config_fn_t fn, struct repository *repo, void *data)
> >  {
> >  	if (repo->worktree) {
> > -		char *file = repo_worktree_path(repo, GITMODULES_FILE);
> > -		git_config_from_file(fn, file, data);
> > +		struct git_config_source config_source = { 0 };
> > +		const struct config_options opts = { 0 };
> > +		struct object_id oid;
> > +		char *file;
> > +
> > +		file = repo_worktree_path(repo, GITMODULES_FILE);
> > +		if (file_exists(file))
> > +			config_source.file = file;
> > +		else if (get_oid(GITMODULES_INDEX, &oid) >= 0)
> > +			config_source.blob = GITMODULES_INDEX;
> 
> The repository used in t7814 contains nested submodules, which means
> that config_from_gitmodules() is invoked three times.
> 
> Now, the first two of those calls look at the superproject and at
> 'submodule', and find the existing files '.../trash
> directory.t7814-grep-recurse-submodules/.gitmodules' and '.../trash
> directory.t7814-grep-recurse-submodules/submodule/.gitmodules',
> respectively.  So far so good.
> 
> The third call, however, looks at the nested submodule at
> 'submodule/sub', which doesn't contain a '.gitmodules' file.  So this
> function goes on with the second condition and calls
> get_oid(GITMODULES_INDEX, &oid), which then appears to find the blob
> in the _superproject's_ index.
> 
> I'm no expert on submodules, but my gut feeling says that this can't
> be right.  But if it _is_ right, then I would say that the commit
> message should explain in detail, why it is right.
>

I'll think about that too.

> Anyway, even if it is indeed wrong, I'm not sure whether this is the
> root cause of the memory corruption.
> 

I think the immediate cause of the corruptions is multi-threading in
grep: I can prevent the issue from happening by using "git grep
--threads 1 ...".

Protecting the problematic submodule function with the grep read lock
could work for now, but I'd like more comments on this. My proposal is:

diff --git a/builtin/grep.c b/builtin/grep.c
index 601f801158..52b45de749 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -427,6 +427,11 @@ static int grep_submodule(struct grep_opt *opt, struct repository *superproject,
        if (repo_submodule_init(&submodule, superproject, path))
                return 0;

+       grep_read_lock();
+       /*
+        * NEEDSWORK: repo_read_gitmodules accesses the object store which is
+        * global, thus it needs to be protected.
+        */
        repo_read_gitmodules(&submodule);

        /*
@@ -439,7 +444,6 @@ static int grep_submodule(struct grep_opt *opt, struct repository *superproject,
         * store is no longer global and instead is a member of the repository
         * object.
         */
-       grep_read_lock();
        add_to_alternates_memory(submodule.objects->objectdir);
        grep_read_unlock();


The pre-existing NEEDSWORK comment there also suggests that these
problems with the object store are already known; I was not aware of them.

With the change above I could not reproduce the problem anymore. This
should be the only place where repo_read_gitmodules()/config_from_gitmodules()
is called from a thread.

Thank you,
   Antonio

-- 
Antonio Ospite
https://ao2.it
https://twitter.com/ao2it

A: Because it messes up the order in which people normally read text.
   See http://en.wikipedia.org/wiki/Posting_style
Q: Why is top-posting such a bad thing?


Thread overview: 23+ messages
2018-09-17 14:09 [PATCH v5 0/9] Make submodules work if .gitmodules is not checked out Antonio Ospite
2018-09-17 14:09 ` [PATCH v5 1/9] submodule: add a print_config_from_gitmodules() helper Antonio Ospite
2018-09-24 10:25   ` Antonio Ospite
2018-09-24 23:06     ` Stefan Beller
2018-09-17 14:09 ` [PATCH v5 2/9] submodule: factor out a config_set_in_gitmodules_file_gently function Antonio Ospite
2018-09-17 14:09 ` [PATCH v5 3/9] t7411: merge tests 5 and 6 Antonio Ospite
2018-09-17 14:09 ` [PATCH v5 4/9] t7411: be nicer to future tests and really clean things up Antonio Ospite
2018-09-17 14:09 ` [PATCH v5 5/9] submodule--helper: add a new 'config' subcommand Antonio Ospite
2018-09-17 14:09 ` [PATCH v5 6/9] submodule: use the 'submodule--helper config' command Antonio Ospite
2018-09-17 14:09 ` [PATCH v5 7/9] t7506: clean up .gitmodules properly before setting up new scenario Antonio Ospite
2018-09-17 14:09 ` [PATCH v5 8/9] submodule: add a helper to check if it is safe to write to .gitmodules Antonio Ospite
2018-09-17 14:09 ` [PATCH v5 9/9] submodule: support reading .gitmodules when it's not in the working tree Antonio Ospite
2018-09-18 17:12   ` SZEDER Gábor
2018-09-19 19:24     ` Junio C Hamano
2018-09-20 15:35     ` Antonio Ospite [this message]
2018-09-21 16:19       ` Junio C Hamano
2018-09-27 14:49         ` Antonio Ospite
2018-09-24 10:20     ` Antonio Ospite
2018-09-24 21:00       ` Stefan Beller
2018-09-27 14:44         ` Antonio Ospite
2018-09-27 18:00           ` Stefan Beller
2018-10-01 15:45             ` Antonio Ospite
2018-10-01 19:42               ` Stefan Beller
