git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: David Turner <dturner@twosigma.com>
To: git@vger.kernel.org
Cc: peff@peff.net, pclouds@gmail.com, David Turner <dturner@twosigma.com>
Subject: [PATCH v3] gc: ignore old gc.log files
Date: Fri, 10 Feb 2017 14:20:19 -0500	[thread overview]
Message-ID: <20170210192019.13927-1-dturner@twosigma.com> (raw)

Git should never get itself into a state where it refuses to do any
maintenance, just because at some point some piece of the maintenance
didn't make progress.

In this commit, git learns to ignore gc.log files which are older than
(by default) one day old.  It also learns about a config, gc.logExpiry
to manage this.  There is also some cleanup: a successful manual gc,
or a warning-free auto gc with an old log file, will remove any old
gc.log files.

It might still happen that manual intervention is required
(e.g. because the repo is corrupt), but at the very least it won't be
because Git is too dumb to try again.

Automatic gc was intended to make client repositories be
self-maintaining.  It would be good if automatic gc were also useful
to server operators.  A server can end up in a state whre there are
lots of unreferenced loose objects (say, because many users are doing
a bunch of rebasing and pushing their rebased branches). Before this
patch, this state would cause a gc.log file to be created, preventing
future auto gcs.  Then pack files could pile up.  Since many git
operations are O(n) in the number of pack files, this would lead to
poor performance.  Now, the pack files will get cleaned up, if
necessary, at least once per day.  And operators who find a need for
more-frequent gcs can adjust gc.logExpiry to meet their needs.

Signed-off-by: David Turner <dturner@twosigma.com>
Helped-by: Jeff King <peff@peff.net>
---
 Documentation/config.txt |  6 ++++
 builtin/gc.c             | 75 +++++++++++++++++++++++++++++++++++++++++++-----
 t/t6500-gc.sh            | 13 +++++++++
 3 files changed, 87 insertions(+), 7 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index fc5a28a32..a684b7e3e 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -1402,6 +1402,12 @@ gc.autoDetach::
 	Make `git gc --auto` return immediately and run in background
 	if the system supports it. Default is true.
 
+gc.logExpiry::
+	If the file gc.log exists, then `git gc --auto` won't run
+	unless that file is more than 'gc.logExpiry' old.  Default is
+	"1.day".  See `gc.pruneExpire` for more ways to specify its
+	value.
+
 gc.packRefs::
 	Running `git pack-refs` in a repository renders it
 	unclonable by Git versions prior to 1.5.1.2 over dumb
diff --git a/builtin/gc.c b/builtin/gc.c
index 331f21926..6f297aa91 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -33,6 +33,7 @@ static int aggressive_window = 250;
 static int gc_auto_threshold = 6700;
 static int gc_auto_pack_limit = 50;
 static int detach_auto = 1;
+static unsigned long gc_log_expire_time;
 static const char *prune_expire = "2.weeks.ago";
 static const char *prune_worktrees_expire = "3.months.ago";
 
@@ -76,10 +77,47 @@ static void git_config_date_string(const char *key, const char **output)
 static void process_log_file(void)
 {
 	struct stat st;
-	if (!fstat(get_lock_file_fd(&log_lock), &st) && st.st_size)
+	if (fstat(get_lock_file_fd(&log_lock), &st)) {
+		if (errno == ENOENT) {
+			/*
+			 * The user has probably intentionally deleted
+			 * gc.log.lock (perhaps because they're blowing
+			 * away the whole repo), so thre's no need to
+			 * report anything here.  But we also won't
+			 * delete gc.log, because we don't know what
+			 * the user's intentions are.
+			 */
+		} else {
+			FILE *fp;
+			int fd;
+			int saved_errno = errno;
+			/*
+			 * Perhaps there was an i/o error or another
+			 * unlikely situation.  Try to make a note of
+			 * this in gc.log.  If this fails again,
+			 * give up and leave gc.log as it was.
+			 */
+			rollback_lock_file(&log_lock);
+			fd = hold_lock_file_for_update(&log_lock,
+						       git_path("gc.log"),
+						       LOCK_DIE_ON_ERROR);
+
+			fp = fdopen(fd, "w");
+			fprintf(fp, _("Failed to fstat %s: %s"),
+				get_tempfile_path(&log_lock.tempfile),
+				strerror(errno));
+			fclose(fp);
+			commit_lock_file(&log_lock);
+		}
+
+	} else if (st.st_size) {
+		/* There was some error recorded in the lock file */
 		commit_lock_file(&log_lock);
-	else
+	} else {
+		/* No error, clean up any old gc.log */
+		unlink(git_path("gc.log"));
 		rollback_lock_file(&log_lock);
+	}
 }
 
 static void process_log_file_at_exit(void)
@@ -113,6 +151,9 @@ static void gc_config(void)
 	git_config_get_bool("gc.autodetach", &detach_auto);
 	git_config_date_string("gc.pruneexpire", &prune_expire);
 	git_config_date_string("gc.worktreepruneexpire", &prune_worktrees_expire);
+	if (!git_config_get_value("gc.logexpiry", &value))
+		parse_expiry_date(value, &gc_log_expire_time);
+
 	git_config(git_default_config, NULL);
 }
 
@@ -290,19 +331,34 @@ static const char *lock_repo_for_gc(int force, pid_t* ret_pid)
 static int report_last_gc_error(void)
 {
 	struct strbuf sb = STRBUF_INIT;
-	int ret;
+	int ret = 0;
+	struct stat st;
+	char *gc_log_path = git_pathdup("gc.log");
+
+	if (stat(gc_log_path, &st)) {
+		if (errno == ENOENT)
+			goto done;
+
+		ret = error_errno(_("Can't stat %s"), gc_log_path);
+		goto done;
+	}
+
+	if (st.st_mtime < gc_log_expire_time)
+		goto done;
 
-	ret = strbuf_read_file(&sb, git_path("gc.log"), 0);
+	ret = strbuf_read_file(&sb, gc_log_path, 0);
 	if (ret > 0)
-		return error(_("The last gc run reported the following. "
+		ret = error(_("The last gc run reported the following. "
 			       "Please correct the root cause\n"
 			       "and remove %s.\n"
 			       "Automatic cleanup will not be performed "
 			       "until the file is removed.\n\n"
 			       "%s"),
-			     git_path("gc.log"), sb.buf);
+			    gc_log_path, sb.buf);
 	strbuf_release(&sb);
-	return 0;
+done:
+	free(gc_log_path);
+	return ret;
 }
 
 static int gc_before_repack(void)
@@ -349,6 +405,8 @@ int cmd_gc(int argc, const char **argv, const char *prefix)
 	argv_array_pushl(&prune_worktrees, "worktree", "prune", "--expire", NULL);
 	argv_array_pushl(&rerere, "rerere", "gc", NULL);
 
+	/* default expiry time, overwritten in gc_config */
+	parse_expiry_date("1.day", &gc_log_expire_time);
 	gc_config();
 
 	if (pack_refs < 0)
@@ -448,5 +506,8 @@ int cmd_gc(int argc, const char **argv, const char *prefix)
 		warning(_("There are too many unreachable loose objects; "
 			"run 'git prune' to remove them."));
 
+	if (!daemonized)
+		unlink(git_path("gc.log"));
+
 	return 0;
 }
diff --git a/t/t6500-gc.sh b/t/t6500-gc.sh
index 1762dfa6a..84ad07eb2 100755
--- a/t/t6500-gc.sh
+++ b/t/t6500-gc.sh
@@ -67,5 +67,18 @@ test_expect_success 'auto gc with too many loose objects does not attempt to cre
 	test_line_count = 2 new # There is one new pack and its .idx
 '
 
+test_expect_success 'background auto gc does not run if gc.log is present and recent but does if it is old' '
+	keep=$(ls .git/objects/pack/*.pack|head -1|sed -e "s/pack$/keep/") &&
+	test_commit foo &&
+	test_commit bar &&
+	git repack &&
+	test_config gc.autopacklimit 1 &&
+	test_config gc.autodetach true &&
+	echo fleem >.git/gc.log &&
+	test_must_fail git gc --auto 2>err &&
+	test_i18ngrep "^error:" err &&
+	test-chmtime =-86401 .git/gc.log &&
+	git gc --auto
+'
 
 test_done
-- 
2.11.GIT


             reply	other threads:[~2017-02-10 19:20 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-02-10 19:20 David Turner [this message]
2017-02-10 19:47 ` [PATCH v3] gc: ignore old gc.log files Junio C Hamano
2017-02-10 19:56   ` Junio C Hamano
2017-02-10 20:08 ` Jeff King
2017-02-10 20:44   ` David Turner
2017-02-10 20:51     ` Jeff King
2017-02-10 21:04       ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170210192019.13927-1-dturner@twosigma.com \
    --to=dturner@twosigma.com \
    --cc=git@vger.kernel.org \
    --cc=pclouds@gmail.com \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).