git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: git@vger.kernel.org
Cc: "Junio C Hamano" <gitster@pobox.com>,
	"Nguyễn Thái Ngọc Duy" <pclouds@gmail.com>,
	"Jeff King" <peff@peff.net>,
	"Michael Haggerty" <mhagger@alum.mit.edu>,
	"Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
Subject: [RFC/PATCH] gc: run more pre-detach operations under lock
Date: Wed, 19 Jun 2019 12:26:00 +0200	[thread overview]
Message-ID: <20190619102601.24913-1-avarab@gmail.com> (raw)
In-Reply-To: <20190619094630.32557-1-pclouds@gmail.com>

Continue the work started in c45af94dbc ("gc: run pre-detach
operations under lock", 2017-07-11).

Now we'll take the lock before we print out "Auto packing the
repository[...]", and we'll optimistically tolerate a locking failure
at that point instead of dying.

This (mostly) solves two issues:

 1) Fetching in parallel from multiple remotes (or other
    parallel/concurrent "gc --auto") would, when the need_to_gc()
    heuristic fired, emit a bunch of "Auto packing the repository in
    background" messages. See [1] for a description of a "pfetch"
    alias that can trigger such an issue. Now we'll (usually) only
    print one of these messages.

 2) Because we'd call hold_lock_file_for_update() with
    "LOCK_DIE_ON_ERROR" such concurrent "gc" would error out with
    "Another git process seems to be running[...]".

    Now we'll (usually) avoid that by saying that "gc --auto" locking
    isn't so important. We tolerate a lock failure on "gc --auto" with
    the assumption that a "gc --auto" runs frequently enough that any
    one such invocation can fail, whereas without "--auto" we'll error
    on failure to acquire the lock

Why "mostly" and "usually"? There's still unaddressed caveats with
these two, respectively:

 1) The gc.pid lock is only held while the "git gc" runs. Thus a
    "pfetch" operation as described in [1] might start N jobs, one of
    which runs "gc", finishes, and then "gc --auto" finds it needs to
    run again in the context of a single (from the user's perspective)
    git command.

    I don't think that's worth dealing with. The cases where we'll
    find we need to take action "gc --auto" right after it's finished
    are rare (see e.g. [2]).

 2) There's still the unresolved race condition noted in c45af94dbc
    where we "hand off" the lock to the child process we fork under
    gc.autoDetach=true. It means we might start another "gc --auto" if
    we're so unlucky as to make the check and get the lock in the time
    the "real" earlier "gc --auto" parent/child process takes to
    "unlock() && fork() && lock()".

    Fixing that is outside the scope of this change. It's fixable by
    refactoring the lockfile.c code and daemonize() so that we'd e.g.:

      1. parent: lock() in parent, write PID to gc.pid
      2. parent: fork() the child process
      3. child: spin for X amount of time waiting until gc.pid lists
         our PID, not our parent's
      4. parent: amend the gc.pid lock to
         s/PARENT_PID/CHILD_PID/ (in-place rename(gc.pid.new,
         gc.pid)), fsync(gc.pid) and exit()
      5. child: notices updated gc.pid, proceeds with its gc. Does
         delete_tempfile(gc.pid) before exit()
      6. parent: Once lock is handed over (child trusted to poll the
         lock) exit() without delete_tempfile(gc.pid)

    Using some cross-process lock handover like that we could
    guarantee that there's never a point at which an outside process
    could get the gc.pid lock, that the PID written in the file always
    referred to a live process, and with #3 reasonably
    guarantee (aside from pathological scheduling shenanigans) that we
    don't blindly hand the PID off to a process that's already
    finished with its gc.

Testing for this new behavior is hard.

1. https://public-inbox.org/git/20170712200054.mxcabiyttijpbkbb@sigill.intra.peff.net/ (should
   be https://public-inbox.org/git/87bmopzbqx.fsf@gmail.com/ but vger
   seems to not have gotten a copy)
2. https://public-inbox.org/git/87fu6bmr0j.fsf@evledraar.gmail.com/

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---

This patch is part of a WIP branch I have that's a bit of a mess. It's
more-gc-detach-under-lock on github.com/avar/git.git. It doesn't apply
on master because it relies on some previous test work, but for RFC
purposes I figured it was better to send it stand-alone.

But I think this sort of approach is better than Duy's proposed patch,
because...

On Wed, Jun 19 2019, Nguyễn Thái Ngọc Duy wrote:

> So let's try to avoid that. We should only need one 'gc' run after all
> objects and references are added anyway. Add a new option --no-auto-gc
> that will be used by those n-1 processes. 'gc --auto' will always run on
> the main fetch process (*).
>
> (*) even if we fetch remotes in parallel at some point in future, this
>     should still be fine because we should "join" all those processes
>     before this step.

This is what I'm trying to fix in my version of this patch. This is
only true for yours if you assume that the user is going to be
invoking "fetch" in a single terminal window, IOW that we have an
implicit global mutex of one top-level git command at a time.

Wheras mine fixes e.g. the same issue for:

    parallel 'git fetch {}' ::: $(git remote)

Ditto for you running a "git" command and your editor running a
"fetch" at the same time.

A similar "one terminal" assumption was made when changing the
auto-detach behavior in your 62aad1849f ("gc --auto: do not lock refs
in the background", 2014-05-25).

To be clear, I think that (and your patch here) would (mostly) be an
improvement. I just wonder if we can do better.

I got stuck with this WIP series of mine because while it's mostly
ready sans the 'Why "mostly" and "usually"' caveats mentioned in my
commit message, I wondered if we couldn't do much better by:

 a) Do some version of reverting your 62aad1849f, this would entirely
    get rid of this need for handing off a lock to a child (noted
    above), but as-is would have e.g. "commit" run into a *.lock.

    Is that mitigated by 4ff0f01cb7 ("refs: retry acquiring reference
    locks for 100ms", 2017-08-21)? Or could we have "gc" run in some
    "lock-less" mode where it does all the work of expiring a
    ref/reflog at a given OID, and if it changed just tries again
    without needing to *.lock it except for the brief period of
    changing the already prepared file?

 b) If we couldn't do "a" for whatever reason implement some "locking
    priority" where a not-gc command could create a file (which "gc"
    would poll) saying "I want it for real work, release it!", or
    alternatively "gc" would create "*.locked-by-gc" in addition to
    "*.lock" files, and if the "*.locked-by-gc" was seen when the
    100ms retry would kick in, we'd make that say 10 seconds instead
    of 100 ms.

 builtin/gc.c  | 19 ++++++++++++-------
 lockfile.c    |  2 ++
 lockfile.h    |  8 +++++++-
 t/t6500-gc.sh |  8 ++++----
 4 files changed, 25 insertions(+), 12 deletions(-)

diff --git a/builtin/gc.c b/builtin/gc.c
index d12316fa48..2cd5803e86 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -391,7 +391,7 @@ static int need_to_gc(void)
 }
 
 /* return NULL on success, else hostname running the gc */
-static const char *lock_repo_for_gc(int force, pid_t* ret_pid)
+static const char *lock_repo_for_gc(int force, pid_t* ret_pid, int die_on_error)
 {
 	struct lock_file lock = LOCK_INIT;
 	char my_host[HOST_NAME_MAX + 1];
@@ -401,13 +401,17 @@ static const char *lock_repo_for_gc(int force, pid_t* ret_pid)
 	FILE *fp;
 	int fd;
 	char *pidfile_path;
+	int flags = die_on_error ? LOCK_DIE_ON_ERROR : LOCK_QUIET_ON_ERROR;
 
 	if (xgethostname(my_host, sizeof(my_host)))
 		xsnprintf(my_host, sizeof(my_host), "unknown");
 
 	pidfile_path = git_pathdup("gc.pid");
-	fd = hold_lock_file_for_update(&lock, pidfile_path,
-				       LOCK_DIE_ON_ERROR);
+	fd = hold_lock_file_for_update(&lock, pidfile_path, flags);
+	if (fd < 0) {
+		assert(!die_on_error); /* should die in unable_to_lock_die() */
+		return pidfile_path;
+	}
 	if (!force) {
 		static char locking_host[HOST_NAME_MAX + 1];
 		static char *scan_fmt;
@@ -533,7 +537,7 @@ int cmd_gc(int argc, const char **argv, const char *prefix)
 	int auto_gc = 0;
 	int quiet = -1;
 	int force = 0;
-	const char *name;
+	const char *name = NULL;
 	pid_t pid;
 	int daemonized = 0;
 	int keep_base_pack = -1;
@@ -599,6 +603,8 @@ int cmd_gc(int argc, const char **argv, const char *prefix)
 		 */
 		if (!need_to_gc())
 			return 0;
+		if (lock_repo_for_gc(force, &pid, 0))
+			return 0;
 		if (!quiet) {
 			if (detach_auto)
 				fprintf(stderr, _("Auto packing the repository in background for optimum performance.\n"));
@@ -616,8 +622,6 @@ int cmd_gc(int argc, const char **argv, const char *prefix)
 				/* Last gc --auto failed. Skip this one. */
 				return 0;
 
-			if (lock_repo_for_gc(force, &pid))
-				return 0;
 			gc_before_repack(); /* dies on failure */
 			git_test_sleep("GIT_TEST_GC_SLEEP_PRE_FORK");
 			delete_tempfile(&pidfile);
@@ -644,7 +648,8 @@ int cmd_gc(int argc, const char **argv, const char *prefix)
 		string_list_clear(&keep_pack, 0);
 	}
 
-	name = lock_repo_for_gc(force, &pid);
+	if (!pidfile)
+		name = lock_repo_for_gc(force, &pid, 1);
 	if (name) {
 		if (auto_gc)
 			return 0; /* be quiet on --auto */
diff --git a/lockfile.c b/lockfile.c
index 8e8ab4f29f..2a86d3a656 100644
--- a/lockfile.c
+++ b/lockfile.c
@@ -173,6 +173,8 @@ int hold_lock_file_for_update_timeout(struct lock_file *lk, const char *path,
 				      int flags, long timeout_ms)
 {
 	int fd = lock_file_timeout(lk, path, flags, timeout_ms);
+	if (flags & LOCK_QUIET_ON_ERROR)
+		return fd;
 	if (fd < 0) {
 		if (flags & LOCK_DIE_ON_ERROR)
 			unable_to_lock_die(path, errno);
diff --git a/lockfile.h b/lockfile.h
index 9843053ce8..dc366d70cc 100644
--- a/lockfile.h
+++ b/lockfile.h
@@ -135,10 +135,16 @@ struct lock_file {
 
 /*
  * ... this flag can be passed instead to return -1 and give the usual
- * error message upon an error.
+ * error message upon an error, or ...
  */
 #define LOCK_REPORT_ON_ERROR 4
 
+/*
+ * ... just be quiet and let the caller handle it by checking the
+ * return return value.
+ */
+#define LOCK_QUIET_ON_ERROR 8
+
 /*
  * Usually symbolic links in the destination path are resolved. This
  * means that (1) the lockfile is created by adding ".lock" to the
diff --git a/t/t6500-gc.sh b/t/t6500-gc.sh
index 363ca18980..4dea98f1c3 100755
--- a/t/t6500-gc.sh
+++ b/t/t6500-gc.sh
@@ -175,14 +175,14 @@ test_racy_gc_auto () {
 	"
 }
 
-test_racy_gc_auto failure false N/A N/A N/A
+test_racy_gc_auto success false N/A N/A N/A
 for fork_works in true false
 do
 	for sleep_before_fork in 0 1
 	do
 		for sleep_before_fork_no_lock in 0 1
 		do
-			test_racy_gc_auto failure true $sleep_before_fork $sleep_before_fork_no_lock $fork_works
+			test_racy_gc_auto success true $sleep_before_fork $sleep_before_fork_no_lock $fork_works
 		done
 	done
 done
@@ -204,8 +204,8 @@ test_racy_faked_gc_auto () {
 		test_line_count = 0 errors
 	"
 }
-test_racy_faked_gc_auto failure true
-test_racy_faked_gc_auto failure false
+test_racy_faked_gc_auto success true
+test_racy_faked_gc_auto success false
 
 run_and_wait_for_auto_gc () {
 	# We read stdout from gc for the side effect of waiting until the
-- 
2.22.0.rc1.257.g3120a18244


  reply	other threads:[~2019-06-19 10:26 UTC|newest]

Thread overview: 61+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-06-19  9:46 [PATCH] fetch: only run 'gc' once when fetching multiple remotes Nguyễn Thái Ngọc Duy
2019-06-19 10:26 ` Ævar Arnfjörð Bjarmason [this message]
2019-06-19 12:51   ` [RFC/PATCH] gc: run more pre-detach operations under lock Duy Nguyen
2019-06-19 18:01     ` Ævar Arnfjörð Bjarmason
2019-06-19 19:10       ` Jeff King
2019-06-19 22:49         ` Ævar Arnfjörð Bjarmason
2019-06-19 23:30           ` [PATCH 0/6] Change <non-empty?> GIT_TEST_* variables to <boolean> Ævar Arnfjörð Bjarmason
2019-06-20 18:13             ` Junio C Hamano
2019-06-20 21:00               ` Ævar Arnfjörð Bjarmason
2019-06-20 20:03             ` Junio C Hamano
2019-06-20 21:09             ` [PATCH v2 0/8] " Ævar Arnfjörð Bjarmason
2019-06-21 10:18               ` [PATCH v3 " Ævar Arnfjörð Bjarmason
2019-06-21 10:18               ` [PATCH v3 1/8] config tests: simplify include cycle test Ævar Arnfjörð Bjarmason
2019-06-21 10:18               ` [PATCH v3 2/8] env--helper: new undocumented builtin wrapping git_env_*() Ævar Arnfjörð Bjarmason
2019-06-21 17:07                 ` Junio C Hamano
2019-06-21 10:18               ` [PATCH v3 3/8] config.c: refactor die_bad_number() to not call gettext() early Ævar Arnfjörð Bjarmason
2019-06-21 10:18               ` [PATCH v3 4/8] t6040 test: stop using global "script" variable Ævar Arnfjörð Bjarmason
2019-06-21 10:18               ` [PATCH v3 5/8] tests: make GIT_TEST_GETTEXT_POISON a boolean Ævar Arnfjörð Bjarmason
2019-06-24 16:47                 ` Junio C Hamano
2019-06-21 10:18               ` [PATCH v3 6/8] tests README: re-flow a previously changed paragraph Ævar Arnfjörð Bjarmason
2019-06-21 10:18               ` [PATCH v3 7/8] tests: replace test_tristate with "git env--helper" Ævar Arnfjörð Bjarmason
2019-09-06 12:13                 ` [PATCH 1/2] t/lib-git-svn.sh: check GIT_TEST_SVN_HTTPD when running SVN HTTP tests SZEDER Gábor
2019-09-06 12:13                   ` [PATCH 2/2] ci: restore running httpd tests SZEDER Gábor
2019-09-06 17:03                     ` Junio C Hamano
2019-09-06 19:13                     ` Jeff King
2019-09-07 10:16                       ` SZEDER Gábor
2019-11-22 13:14                         ` [PATCH 0/2] tests: catch non-bool GIT_TEST_* values SZEDER Gábor
2019-11-22 13:14                           ` [PATCH 1/2] tests: add 'test_bool_env' to " SZEDER Gábor
2019-11-25 13:50                             ` Jeff King
2019-11-22 13:14                           ` [PATCH 2/2] t5608-clone-2gb.sh: turn GIT_TEST_CLONE_2GB into a bool SZEDER Gábor
2019-11-25 13:53                             ` Jeff King
2019-06-21 10:18               ` [PATCH v3 8/8] tests: make GIT_TEST_FAIL_PREREQS a boolean Ævar Arnfjörð Bjarmason
2019-06-20 21:09             ` [PATCH v2 1/8] config tests: simplify include cycle test Ævar Arnfjörð Bjarmason
2019-06-20 21:09             ` [PATCH v2 2/8] env--helper: new undocumented builtin wrapping git_env_*() Ævar Arnfjörð Bjarmason
2019-06-20 22:11               ` Junio C Hamano
2019-06-20 22:21                 ` Junio C Hamano
2019-06-21  8:11                   ` Ævar Arnfjörð Bjarmason
2019-06-21 15:04                     ` Junio C Hamano
2019-06-20 21:09             ` [PATCH v2 3/8] config.c: refactor die_bad_number() to not call gettext() early Ævar Arnfjörð Bjarmason
2019-06-20 21:09             ` [PATCH v2 4/8] t6040 test: stop using global "script" variable Ævar Arnfjörð Bjarmason
2019-06-20 21:09             ` [PATCH v2 5/8] tests: make GIT_TEST_GETTEXT_POISON a boolean Ævar Arnfjörð Bjarmason
2019-06-20 21:09             ` [PATCH v2 6/8] tests README: re-flow a previously changed paragraph Ævar Arnfjörð Bjarmason
2019-06-20 21:09             ` [PATCH v2 7/8] tests: replace test_tristate with "git env--helper" Ævar Arnfjörð Bjarmason
2019-06-20 21:09             ` [PATCH v2 8/8] tests: make GIT_TEST_FAIL_PREREQS a boolean Ævar Arnfjörð Bjarmason
2019-06-19 23:30           ` [PATCH 1/6] env--helper: new undocumented builtin wrapping git_env_*() Ævar Arnfjörð Bjarmason
2019-06-20 19:25             ` Junio C Hamano
2019-06-19 23:30           ` [PATCH 2/6] t6040 test: stop using global "script" variable Ævar Arnfjörð Bjarmason
2019-06-20 19:54             ` Junio C Hamano
2019-06-19 23:30           ` [PATCH 3/6] tests: make GIT_TEST_GETTEXT_POISON a boolean Ævar Arnfjörð Bjarmason
2019-06-20 20:00             ` Junio C Hamano
2019-06-19 23:30           ` [PATCH 4/6] tests README: re-flow a previously changed paragraph Ævar Arnfjörð Bjarmason
2019-06-19 23:30           ` [PATCH 5/6] tests: replace test_tristate with "git env--helper" Ævar Arnfjörð Bjarmason
2019-06-19 23:30           ` [PATCH 6/6] tests: make GIT_TEST_FAIL_PREREQS a boolean Ævar Arnfjörð Bjarmason
2019-06-20 10:26           ` [RFC/PATCH] gc: run more pre-detach operations under lock Duy Nguyen
2019-06-20 21:49             ` Ævar Arnfjörð Bjarmason
2019-06-20 10:43           ` Jeff King
2019-06-20 18:55         ` Junio C Hamano
2019-06-19 19:08     ` Jeff King
2019-06-19 18:59 ` [PATCH] fetch: only run 'gc' once when fetching multiple remotes Jeff King
2019-06-20 10:11   ` Duy Nguyen
2019-06-20 10:21     ` Jeff King

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190619102601.24913-1-avarab@gmail.com \
    --to=avarab@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=mhagger@alum.mit.edu \
    --cc=pclouds@gmail.com \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).