From: Matheus Tavares <matheus.bernardino@usp.br>
To: git@vger.kernel.org
Cc: christian.couder@gmail.com, gitster@pobox.com,
	jrnieder@gmail.com, olyatelezhnaya@gmail.com, pclouds@gmail.com,
	jonathantanmy@google.com, peff@peff.net,
	Brandon Williams <bwilliams.eng@gmail.com>,
	Stefan Beller <stefanbeller@gmail.com>
Subject: [PATCH v3 08/12] grep: allow submodule functions to run in parallel
Date: Wed, 15 Jan 2020 23:39:56 -0300
Message-ID: <af8ad95d413aa3d763769eb3ae9544e25ccbe2d1.1579141989.git.matheus.bernardino@usp.br>
In-Reply-To: <cover.1579141989.git.matheus.bernardino@usp.br>

Now that object reading operations are internally protected, the
submodule initialization functions at builtin/grep.c:grep_submodule()
are very close to being thread-safe. Let's take a look at each call and
remove from the critical section what we can, for better performance:

- submodule_from_path() and is_submodule_active() cannot be called in
  parallel yet only because they call repo_read_gitmodules() which
  contains, in its call stack, operations that would otherwise be in
  race condition with object reading (for example parse_object() and
  is_promisor_remote()). However, they only call repo_read_gitmodules()
  if it wasn't read before. So let's pre-read it before firing the
  threads and allow these two functions to safely be called in
  parallel.

- repo_submodule_init() is already thread-safe, so remove it from the
  critical section without other necessary changes.

- The repo_read_gitmodules(&subrepo) call at grep_submodule() is safe as
  no other thread is performing object reading operations in the subrepo
  yet. However, threads might be working in the superproject, and this
  function calls add_to_alternates_memory() internally, which is racy
  with object readings in the superproject. So it must be kept
  protected for now. Let's add a "NEEDSWORK" to it, informing why it
  cannot be removed from the critical section yet.

- Finally, add_to_alternates_memory() must be kept protected for the
  same reason as the item above.

Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
---
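Not part of the commit message: a minimal sketch of the pre-read idea
from the first item above, assuming POSIX threads. The names
lazy_config, read_config(), config_value() and run_workers() are
hypothetical stand-ins, not git's API; read_config() only mimics the
"skip if already read" behaviour that repo_read_gitmodules() is called
with in this patch.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define MAX_WORKERS 16

/* Hypothetical stand-in for lazily read data such as .gitmodules. */
struct lazy_config {
	int loaded;	/* set once the read has happened */
	int load_count;	/* how many times the read actually ran */
	int value;
};

static struct lazy_config cfg;

/*
 * Hypothetical analogue of a reader with a skip_if_read flag. It
 * writes to shared state, so it is not safe to run concurrently.
 */
static void read_config(struct lazy_config *c, int skip_if_read)
{
	if (skip_if_read && c->loaded)
		return;
	c->value = 42;
	c->load_count++;
	c->loaded = 1;
}

/* Lazy getter: it only writes if nobody has read the config yet. */
static int config_value(struct lazy_config *c)
{
	read_config(c, 1);
	return c->value;
}

static void *worker(void *arg)
{
	int *out = arg;

	/* After the pre-read this is a pure read, so no lock is taken. */
	assert(cfg.loaded);
	*out = config_value(&cfg);
	return NULL;
}

/* Returns how many times the config was loaded in total. */
int run_workers(int nr)
{
	pthread_t threads[MAX_WORKERS];
	int results[MAX_WORKERS];
	int i, started = 0;

	if (nr > MAX_WORKERS)
		nr = MAX_WORKERS;

	/* Pre-read before firing the threads, as the patch does. */
	read_config(&cfg, 1);

	for (i = 0; i < nr; i++) {
		if (pthread_create(&threads[i], NULL, worker, &results[i]))
			break;
		started++;
	}
	for (i = 0; i < started; i++) {
		pthread_join(threads[i], NULL);
		assert(results[i] == 42);
	}
	return cfg.load_count;
}
```

Calling run_workers(4) from a main() loads the config exactly once, in
the main thread; the workers then only read it. Without the pre-read,
several workers could enter read_config() at the same time and race on
the writes.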
builtin/grep.c | 38 ++++++++++++++++++++++----------------
1 file changed, 22 insertions(+), 16 deletions(-)
diff --git a/builtin/grep.c b/builtin/grep.c
index d3ed05c1da..ac3d86c2e5 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -401,25 +401,23 @@ static int grep_submodule(struct grep_opt *opt,
 	struct grep_opt subopt;
 	int hit;

-	/*
-	 * NEEDSWORK: submodules functions need to be protected because they
-	 * call config_from_gitmodules(): the latter contains in its call stack
-	 * many thread-unsafe operations that are racy with object reading, such
-	 * as parse_object() and is_promisor_object().
-	 */
-	obj_read_lock();
 	sub = submodule_from_path(superproject, &null_oid, path);

-	if (!is_submodule_active(superproject, path)) {
-		obj_read_unlock();
+	if (!is_submodule_active(superproject, path))
 		return 0;
-	}

-	if (repo_submodule_init(&subrepo, superproject, sub)) {
-		obj_read_unlock();
+	if (repo_submodule_init(&subrepo, superproject, sub))
 		return 0;
-	}

+	/*
+	 * NEEDSWORK: repo_read_gitmodules() might call
+	 * add_to_alternates_memory() via config_from_gitmodules(). This
+	 * operation causes a race condition with concurrent object readings
+	 * performed by the worker threads. That's why we need obj_read_lock()
+	 * here. It should be removed once it's no longer necessary to add the
+	 * subrepo's odbs to the in-memory alternates list.
+	 */
+	obj_read_lock();
 	repo_read_gitmodules(&subrepo, 0);

 	/*
@@ -1052,6 +1050,9 @@ int cmd_grep(int argc, const char **argv, const char *prefix)
 	pathspec.recursive = 1;
 	pathspec.recurse_submodules = !!recurse_submodules;

+	if (recurse_submodules && (!use_index || untracked))
+		die(_("option not supported with --recurse-submodules"));
+
 	if (list.nr || cached || show_in_pager) {
 		if (num_threads > 1)
 			warning(_("invalid option combination, ignoring --threads"));
@@ -1071,6 +1072,14 @@ int cmd_grep(int argc, const char **argv, const char *prefix)
 		    && (opt.pre_context || opt.post_context ||
 			opt.file_break || opt.funcbody))
 			skip_first_line = 1;
+
+		/*
+		 * Pre-read gitmodules (if not read already) to prevent racy
+		 * lazy reading in worker threads.
+		 */
+		if (recurse_submodules)
+			repo_read_gitmodules(the_repository, 1);
+
 		start_threads(&opt);
 	} else {
 		/*
@@ -1105,9 +1114,6 @@ int cmd_grep(int argc, const char **argv, const char *prefix)
 		}
 	}

-	if (recurse_submodules && (!use_index || untracked))
-		die(_("option not supported with --recurse-submodules"));
-
 	if (!show_in_pager && !opt.status_only)
 		setup_pager();

--
2.24.1