git@vger.kernel.org list mirror (unofficial, one of many)
 help / color / Atom feed
From: Matheus Tavares <matheus.bernardino@usp.br>
To: git@vger.kernel.org
Cc: christian.couder@gmail.com, olyatelezhnaya@gmail.com,
	pclouds@gmail.com, gitster@pobox.com, jrnieder@gmail.com,
	Brandon Williams <bwilliams.eng@gmail.com>,
	Stefan Beller <stefanbeller@gmail.com>, Jeff King <peff@peff.net>
Subject: [PATCH v2 08/11] grep: allow submodule functions to run in parallel
Date: Sun, 29 Sep 2019 22:50:54 -0300
Message-ID: <8070e154070e6eaef6f31077d15349e74147ff84.1569808052.git.matheus.bernardino@usp.br> (raw)
In-Reply-To: <cover.1569808052.git.matheus.bernardino@usp.br>

Now that object reading operations are internally protected, the
submodule initialization functions at builtin/grep.c:grep_submodule()
are very close to being thread-safe. Let's take a look at each call and
remove from the critical section what we can, for better performance:

- submodule_from_path() and is_submodule_active() cannot be called in
  parallel yet only because they call repo_read_gitmodules() which
  contains, in its call stack, operations that would otherwise be in
  race condition with object reading (for example parse_object() and
  is_promisor_remote()). However, they only call repo_read_gitmodules()
  if it wasn't read before. So let's pre-read it before firing the
  threads and allow these two functions to safelly be called in
  parallel.

- repo_submodule_init() is already thread-safe, so remove it from the
  critical section without other necessary changes.

- The repo_read_gitmodules(&subrepo) call at grep_submodule() is safe as
  no other thread is performing object reading operations in the subrepo
  yet. However, threads might be working in the superproject, and this
  function calls add_to_alternates_memory() internally, which is racy
  with object readings in the superproject. So it must be kept
  protected for now. Let's add a "NEEDSWORK" to it, informing why it
  cannot be removed from the critical section yet.

- Finally, add_to_alternates_memory() must be kept protected by the same
  reason of the above item.

Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
---
 builtin/grep.c | 38 ++++++++++++++++++++++----------------
 1 file changed, 22 insertions(+), 16 deletions(-)

diff --git a/builtin/grep.c b/builtin/grep.c
index 1c4ff4a75f..c973ac46a7 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -401,25 +401,23 @@ static int grep_submodule(struct grep_opt *opt,
 	struct grep_opt subopt;
 	int hit;
 
-	/*
-	 * NEEDSWORK: submodules functions need to be protected because they
-	 * call config_from_gitmodules(): the latter contains in its call stack
-	 * many thread-unsafe operations that are racy with object reading, such
-	 * as parse_object() and is_promisor_object().
-	 */
-	obj_read_lock();
 	sub = submodule_from_path(superproject, &null_oid, path);
 
-	if (!is_submodule_active(superproject, path)) {
-		obj_read_unlock();
+	if (!is_submodule_active(superproject, path))
 		return 0;
-	}
 
-	if (repo_submodule_init(&subrepo, superproject, sub)) {
-		obj_read_unlock();
+	if (repo_submodule_init(&subrepo, superproject, sub))
 		return 0;
-	}
 
+	/*
+	 * NEEDSWORK: repo_read_gitmodules() might call
+	 * add_to_alternates_memory() via config_from_gitmodules(). This
+	 * operation causes a race condition with concurrent object readings
+	 * performed by the worker threads. That's why we need obj_read_lock()
+	 * here. It should be removed once it's no longer necessary to add the
+	 * subrepo's odbs to the in-memory alternates list.
+	 */
+	obj_read_lock();
 	repo_read_gitmodules(&subrepo, 0);
 
 	/*
@@ -1052,6 +1050,9 @@ int cmd_grep(int argc, const char **argv, const char *prefix)
 	pathspec.recursive = 1;
 	pathspec.recurse_submodules = !!recurse_submodules;
 
+	if (recurse_submodules && (!use_index || untracked))
+		die(_("option not supported with --recurse-submodules"));
+
 	if (list.nr || cached || show_in_pager) {
 		if (num_threads > 1)
 			warning(_("invalid option combination, ignoring --threads"));
@@ -1071,6 +1072,14 @@ int cmd_grep(int argc, const char **argv, const char *prefix)
 		    && (opt.pre_context || opt.post_context ||
 			opt.file_break || opt.funcbody))
 			skip_first_line = 1;
+
+		/*
+		 * Pre-read gitmodules (if not read already) to prevent racy
+		 * lazy reading in worker threads.
+		 */
+		if (recurse_submodules)
+			repo_read_gitmodules(the_repository, 1);
+
 		start_threads(&opt);
 	} else {
 		/*
@@ -1105,9 +1114,6 @@ int cmd_grep(int argc, const char **argv, const char *prefix)
 		}
 	}
 
-	if (recurse_submodules && (!use_index || untracked))
-		die(_("option not supported with --recurse-submodules"));
-
 	if (!show_in_pager && !opt.status_only)
 		setup_pager();
 
-- 
2.23.0


  parent reply index

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-08-10 20:27 [GSoC][PATCH 0/4] grep: re-enable threads when cached, w/ parallel inflation Matheus Tavares
2019-08-10 20:27 ` [GSoC][PATCH 1/4] object-store: add lock to read_object_file_extended() Matheus Tavares
2019-08-10 20:27 ` [GSoC][PATCH 2/4] grep: allow locks to be enabled individually Matheus Tavares
2019-08-10 20:27 ` [GSoC][PATCH 3/4] grep: disable grep_read_mutex when possible Matheus Tavares
2019-08-10 20:27 ` [GSoC][PATCH 4/4] grep: re-enable threads in some non-worktree cases Matheus Tavares
2019-09-30  1:50 ` [PATCH v2 00/11] grep: improve threading and fix race conditions Matheus Tavares
2019-09-30  1:50   ` [PATCH v2 01/11] grep: fix race conditions on userdiff calls Matheus Tavares
2019-09-30  1:50   ` [PATCH v2 02/11] grep: fix race conditions at grep_submodule() Matheus Tavares
2019-09-30  1:50   ` [PATCH v2 03/11] grep: fix racy calls in grep_objects() Matheus Tavares
2019-09-30  1:50   ` [PATCH v2 04/11] replace-object: make replace operations thread-safe Matheus Tavares
2019-09-30  1:50   ` [PATCH v2 05/11] object-store: allow threaded access to object reading Matheus Tavares
2019-11-12  2:54     ` Jonathan Tan
2019-11-13  5:20       ` Jeff King
2019-11-14  5:57         ` Matheus Tavares Bernardino
2019-11-14  6:01           ` Jeff King
2019-11-14 18:15             ` Jonathan Tan
2019-11-15  4:12               ` Jeff King
2019-09-30  1:50   ` [PATCH v2 06/11] grep: replace grep_read_mutex by internal obj read lock Matheus Tavares
2019-10-01 19:23     ` [PATCH] squash! " Matheus Tavares
2019-09-30  1:50   ` [PATCH v2 07/11] submodule-config: add skip_if_read option to repo_read_gitmodules() Matheus Tavares
2019-09-30  1:50   ` Matheus Tavares [this message]
2019-09-30  1:50   ` [PATCH v2 09/11] grep: protect packed_git [re-]initialization Matheus Tavares
2019-09-30  1:50   ` [PATCH v2 10/11] grep: re-enable threads in non-worktree case Matheus Tavares
2019-09-30  1:50   ` [PATCH v2 11/11] grep: move driver pre-load out of critical section Matheus Tavares

Reply instructions:

You may reply publically to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=8070e154070e6eaef6f31077d15349e74147ff84.1569808052.git.matheus.bernardino@usp.br \
    --to=matheus.bernardino@usp.br \
    --cc=bwilliams.eng@gmail.com \
    --cc=christian.couder@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=jrnieder@gmail.com \
    --cc=olyatelezhnaya@gmail.com \
    --cc=pclouds@gmail.com \
    --cc=peff@peff.net \
    --cc=stefanbeller@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

git@vger.kernel.org list mirror (unofficial, one of many)

Archives are clonable:
	git clone --mirror https://public-inbox.org/git
	git clone --mirror http://ou63pmih66umazou.onion/git
	git clone --mirror http://czquwvybam4bgbro.onion/git
	git clone --mirror http://hjrcffqmbrq6wope.onion/git

Example config snippet for mirrors

Newsgroups are available over NNTP:
	nntp://news.public-inbox.org/inbox.comp.version-control.git
	nntp://ou63pmih66umazou.onion/inbox.comp.version-control.git
	nntp://czquwvybam4bgbro.onion/inbox.comp.version-control.git
	nntp://hjrcffqmbrq6wope.onion/inbox.comp.version-control.git
	nntp://news.gmane.org/gmane.comp.version-control.git

 note: .onion URLs require Tor: https://www.torproject.org/

AGPL code for this site: git clone https://public-inbox.org/public-inbox.git