git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Matt McCutchen <matt@mattmccutchen.net>
To: git@vger.kernel.org
Cc: Jeff King <peff@peff.net>
Subject: [PATCH] git-gc.txt: expand discussion of races with other processes
Date: Tue, 15 Nov 2016 14:08:51 -0500	[thread overview]
Message-ID: <1479237042.2406.89.camel@mattmccutchen.net> (raw)
In-Reply-To: <20161115174028.zvohfcw4jse3jrmm@sigill.intra.peff.net>

In general, "git gc" may delete objects that another concurrent process
is using but hasn't created a reference to.  Git has some mitigations,
but they fall short of a complete solution.  Document this in the
git-gc(1) man page and add a reference from the documentation of the
gc.pruneExpire config variable.

Based on a write-up by Jeff King:

http://marc.info/?l=git&m=147922960131779&w=2

Signed-off-by: Matt McCutchen <matt@mattmccutchen.net>
---
 Documentation/config.txt |  4 +++-
 Documentation/git-gc.txt | 34 ++++++++++++++++++++++++++--------
 2 files changed, 29 insertions(+), 9 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 21fdddf..3f1d931 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -1409,7 +1409,9 @@ gc.pruneExpire::
 	Override the grace period with this config variable.  The value
 	"now" may be used to disable this grace period and always prune
 	unreachable objects immediately, or "never" may be used to
-	suppress pruning.
+	suppress pruning.  This feature helps prevent corruption when
+	'git gc' runs concurrently with another process writing to the
+	repository; see the "NOTES" section of linkgit:git-gc[1].
 
 gc.worktreePruneExpire::
 	When 'git gc' is run, it calls
diff --git a/Documentation/git-gc.txt b/Documentation/git-gc.txt
index bed60f4..852b72c 100644
--- a/Documentation/git-gc.txt
+++ b/Documentation/git-gc.txt
@@ -63,11 +63,10 @@ automatic consolidation of packs.
 --prune=<date>::
 	Prune loose objects older than date (default is 2 weeks ago,
 	overridable by the config variable `gc.pruneExpire`).
-	--prune=all prunes loose objects regardless of their age (do
-	not use --prune=all unless you know exactly what you are doing.
-	Unless the repository is quiescent, you will lose newly created
-	objects that haven't been anchored with the refs and end up
-	corrupting your repository).  --prune is on by default.
+	--prune=all prunes loose objects regardless of their age and
+	increases the risk of corruption if another process is writing to
+	the repository concurrently; see "NOTES" below. --prune is on by
+	default.
 
 --no-prune::
 	Do not prune any loose objects.
@@ -138,17 +137,36 @@ default is "2 weeks ago".
 Notes
 -----
 
-'git gc' tries very hard to be safe about the garbage it collects. In
+'git gc' tries very hard not to delete objects that are referenced
+anywhere in your repository. In
 particular, it will keep not only objects referenced by your current set
 of branches and tags, but also objects referenced by the index,
 remote-tracking branches, refs saved by 'git filter-branch' in
 refs/original/, or reflogs (which may reference commits in branches
 that were later amended or rewound).
-
-If you are expecting some objects to be collected and they aren't, check
+If you are expecting some objects to be deleted and they aren't, check
 all of those locations and decide whether it makes sense in your case to
 remove those references.
 
+On the other hand, when 'git gc' runs concurrently with another process,
+there is a risk of it deleting an object that the other process is using
+but hasn't created a reference to. This may just cause the other process
+to fail or may corrupt the repository if the other process later adds a
+reference to the deleted object. Git has two features that significantly
+mitigate this problem:
+
+. Any object with modification time newer than the `--prune` date is kept,
+  along with everything reachable from it.
+
+. Most operations that add an object to the database update the
+  modification time of the object if it is already present so that #1
+  applies.
+
+However, these features fall short of a complete solution, so users who
+run commands concurrently have to live with some risk of corruption (which
+seems to be low in practice) unless they turn off automatic garbage
+collection with 'git config gc.auto 0'.
+
 HOOKS
 -----
 
-- 
2.7.4



  reply	other threads:[~2016-11-15 19:10 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-11-15 14:13 Protecting old temporary objects being reused from concurrent "git gc"? Matt McCutchen
2016-11-15 17:06 ` Jeff King
2016-11-15 17:33   ` Matt McCutchen
2016-11-15 17:40     ` Jeff King
2016-11-15 19:08       ` Matt McCutchen [this message]
2016-11-15 19:12       ` Matt McCutchen
2016-11-15 20:01       ` Junio C Hamano
2016-11-16  8:07         ` Jeff King
2016-11-16 18:18           ` Junio C Hamano
2016-11-16 18:58       ` Junio C Hamano
2016-11-17  1:04         ` Jeff King
2016-11-17  1:35           ` Re* " Junio C Hamano
2016-11-17  1:43             ` Jeff King

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1479237042.2406.89.camel@mattmccutchen.net \
    --to=matt@mattmccutchen.net \
    --cc=git@vger.kernel.org \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).