git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: "Martin Ågren" <martin.agren@gmail.com>
To: git@vger.kernel.org
Cc: Brandon Williams <bmwill@google.com>
Subject: [PATCH/RFC] grep: turn off threading when recursing submodules
Date: Thu, 19 Oct 2017 21:07:03 +0200	[thread overview]
Message-ID: <20171019190703.1534-1-martin.agren@gmail.com> (raw)

With --recurse-submodules, we use the machinery for alternate object
databases and pretend that the repository is an alternate ODB. This has
some drawbacks since the list of alternates is global and will grow as
we proceed. Still, that's a problem with performance, not correctness.
The other immediately available option is to spawn an external process.
We actually used to do this until f9ee2fcdf (grep: recurse in-process
using 'struct repository', 2017-08-02).

So, as we encounter submodules during our recursing, we add them to the
list of ODBs. But with threading, our changes to the list are not
protected against races. Indeed, ThreadSanitizer reports a problem in
this area with t7814.

The race which ThreadSanitizer detects is that `grep_submodule()` calls
`add_to_alternates_memory()` around the same time that
`grep_source_load_oid()` ends up calling `prepare_packed_git()`. A
similar race might occur if two threads encounter a submodule each at
the same time and add them to the list of ODBs. This might end up losing
one of them and could give incorrect results.

Turn off threading when we're recursing submodules to avoid this race.
Long term, a better approach will be to address the existing NEEDSWORK
in builtin/grep.c to introduce per-repo object stores. Then we should be
able to revert this commit.

Alternatively, we could equip the list with a mutex (or maybe do some
lock-less cleverness), but it seems a bit sad considering it shouldn't
really be needed: the list of ODBs would normally be fully populated
before we start using it.

Another approach would be to first recurse the submodules and collect
the ODBs, then recurse again to do the actual grepping. That would be a
more involved change. Or, we could revert f9ee2fcdf. That would hurt
those who do not use threading.

Signed-off-by: Martin Ågren <martin.agren@gmail.com>
---
This is a simple, stupid solution. I'm posting this in patch-form so
that I can do one better than just mailing about the race and waving my
hands. Maybe threading is common enough that reverting could be a better
approach. Or implementing some (optional) locking...

Output from ThreadSanitizer below the patch.

 builtin/grep.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/builtin/grep.c b/builtin/grep.c
index 2d65f27d0..29f79104d 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -1023,6 +1023,15 @@ int cmd_grep(int argc, const char **argv, const char *prefix)
 		die(_("invalid number of threads specified (%d)"), num_threads);
 	if (num_threads == 1)
 		num_threads = 0;
+	/*
+	 * NEEDSWORK: When we recurse through submodules we misuse the
+	 * alt-odb-mechanism (alternate object databases). Doing so may hit
+	 * data-races if we are threading. This can be removed once the object
+	 * store is no longer global and instead is a member of the repository
+	 * object.
+	 */
+	if (recurse_submodules)
+		num_threads = 0;
 #else
 	if (num_threads)
 		warning(_("no threads support, ignoring --threads"));
-- 
WARNING: ThreadSanitizer: data race (pid=27207)
  Write of size 8 at 0x7d4c0000de40 by main thread:
    #0 link_alt_odb_entry sha1_file.c:361
    #1 link_alt_odb_entries.lto_priv.1377 sha1_file.c:422
    #2 add_to_alternates_memory sha1_file.c:510
    #3 grep_submodule builtin/grep.c:434
    #4 grep_cache builtin/grep.c:508
    #5 grep_submodule builtin/grep.c:462
    #6 grep_cache builtin/grep.c:508
    #7 cmd_grep builtin/grep.c:1092
    #8 run_builtin git.c:346
    #9 handle_builtin.lto_priv.1929 git.c:554
    #10 run_argv git.c:606
    #11 cmd_main git.c:683
    #12 main common-main.c:43

  Previous read of size 8 at 0x7d4c0000de40 by thread T4 (mutexes: write
M6):
    #0 prepare_packed_git.part.9.lto_priv.953 packfile.c:883
    #1 prepare_packed_git packfile.c:289
    #2 find_pack_entry packfile.c:1836
    #3 sha1_object_info_extended sha1_file.c:1179
    #4 read_object.lto_priv.900 packfile.c:1453
    #5 read_sha1_file_extended.constprop.779 sha1_file.c:1279
    #6 read_sha1_file cache.h:1162
    #7 grep_source_load_oid grep.c:1966
    #8 grep_source_load grep.c:2019
    #9 grep_source_is_binary grep.c:2045
    #10 grep_source_1 grep.c:1689
    #11 grep_source grep.c:1897
    #12 run builtin/grep.c:183
    #13 <null> <null>

  Location is heap block of size 408 at 0x7d4c0000de40 allocated by main
thread:
    #0 calloc <null>
    #1 xcalloc.constprop.820 wrapper.c:160
    #2 alloc_alt_odb sha1_file.c:449
    #3 link_alt_odb_entry sha1_file.c:358
    #4 link_alt_odb_entries.lto_priv.1377 sha1_file.c:422
    #5 add_to_alternates_memory sha1_file.c:510
    #6 grep_submodule builtin/grep.c:434
    #7 grep_cache builtin/grep.c:508
    #8 cmd_grep builtin/grep.c:1092
    #9 run_builtin git.c:346
    #10 handle_builtin.lto_priv.1929 git.c:554
    #11 run_argv git.c:606
    #12 cmd_main git.c:683
    #13 main common-main.c:43

  Mutex M6 (0x000000959340) created at:
    #0 pthread_mutex_init <null>
    #1 start_threads builtin/grep.c:204
    #2 cmd_grep builtin/grep.c:1047
    #3 run_builtin git.c:346
    #4 handle_builtin.lto_priv.1929 git.c:554
    #5 run_argv git.c:606
    #6 cmd_main git.c:683
    #7 main common-main.c:43

  Thread T4 (tid=27212, running) created by main thread at:
    #0 pthread_create <null>
    #1 start_threads builtin/grep.c:223
    #2 cmd_grep builtin/grep.c:1047
    #3 run_builtin git.c:346
    #4 handle_builtin.lto_priv.1929 git.c:554
    #5 run_argv git.c:606
    #6 cmd_main git.c:683
    #7 main common-main.c:43

SUMMARY: ThreadSanitizer: data race sha1_file.c:361 link_alt_odb_entry

-- 
2.15.0.rc1.71.g1477d2401


             reply	other threads:[~2017-10-19 19:07 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-10-19 19:07 Martin Ågren [this message]
2017-10-19 19:26 ` [PATCH/RFC] grep: turn off threading when recursing submodules Brandon Williams
2017-10-19 19:34   ` Jeff King
2017-10-19 19:50     ` Martin Ågren
2017-11-01 20:45       ` [PATCH v2] grep: take the read-lock when adding a submodule Martin Ågren
2017-11-01 20:55         ` Brandon Williams
2017-11-01 21:11         ` Jeff King

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20171019190703.1534-1-martin.agren@gmail.com \
    --to=martin.agren@gmail.com \
    --cc=bmwill@google.com \
    --cc=git@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).