git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Jeff King <peff@peff.net>
To: git@vger.kernel.org
Cc: Jonathan Tan <jonathantanmy@google.com>
Subject: [PATCH 12/16] refs: turn on GIT_REF_PARANOIA by default
Date: Fri, 24 Sep 2021 14:46:13 -0400	[thread overview]
Message-ID: <YU4c9S6jypbS4YCK@coredump.intra.peff.net> (raw)
In-Reply-To: <YU4ZOF9+ubmoItmK@coredump.intra.peff.net>

The original point of the GIT_REF_PARANOIA flag was to include broken
refs in iterations, so that possibly-destructive operations would not
silently ignore them (and would generally instead try to operate on the
oids and fail when the objects could not be accessed).

We already turned this on by default for some dangerous operations, like
"repack -ad" (where missing a reachability tip would mean dropping the
associated history). But it was not on for general use, even though it
could easily result in the spreading of corruption (e.g., imagine
cloning a repository which simply omits some of its refs because
their objects are missing; the result quietly succeeds even though you
did not clone everything!).

This patch turns on GIT_REF_PARANOIA by default. So a clone as mentioned
above would actually fail (upload-pack tells us about the broken ref,
and when we ask for the objects, pack-objects fails to deliver them).
This may be inconvenient when working with a corrupted repository, but:

  - we are better off to err on the side of complaining about
    corruption, and then provide mechanisms for explicitly loosening
    safety.

  - this is only one type of corruption anyway. If we are missing any
    other objects in the history that _aren't_ ref tips, then we'd
    behave similarly (happily show the ref, but then barf when we
    started traversing).

We retain the GIT_REF_PARANOIA variable, but simply default it to "1"
instead of "0". That gives the user an escape hatch for loosening this
when working with a corrupt repository. It won't work across a remote
connection to upload-pack (because we can't necessarily set environment
variables on the remote), but there the client has other options (e.g.,
choosing which refs to fetch).

As a bonus, this also makes ref iteration faster in general (because we
don't have to call has_object_file() for each ref), though probably not
noticeably so in the general case. In a repo with a million refs, it
shaved a few hundred milliseconds off of upload-pack's advertisement;
that's noticeable, but most repos are not nearly that large.

The possible downside here is that any operation which iterates refs but
doesn't ever open their objects may now quietly claim to have X when the
object is corrupted (e.g., "git rev-list new-branch --not --all" will
treat a broken ref as uninteresting). But again, that's not really any
different than corruption below the ref level. We might have
refs/heads/old-branch as non-corrupt, but we are not actively checking
that we have the entire reachable history. Or the pointed-to object
could even be corrupted on-disk (but our "do we have it" check would
still succeed). In that sense, this is merely bringing ref-corruption in
line with general object corruption.

One alternative implementation would be to actually check for broken
refs, and then _immediately die_ if we see any. That would cause the
"rev-list --not --all" case above to abort immediately. But in many ways
that's the worst of all worlds:

  - it still spends time looking up the objects an extra time

  - it still doesn't catch corruption below the ref level

  - it's even more inconvenient; with the current implementation of
    GIT_REF_PARANOIA for something like upload-pack, we can make
    the advertisement and let the client choose a non-broken piece of
    history. If we bail as soon as we see a broken ref, they cannot even
    see the advertisement.

The test changes here show some of the fallout. A non-destructive "git
repack -adk" now fails by default (but we can override it). Deleting a
broken ref now actually tells the hooks the correct "before" state,
rather than a confusing null oid.

Signed-off-by: Jeff King <peff@peff.net>
---
 Documentation/git.txt       | 19 ++++++++++---------
 refs.c                      |  2 +-
 t/t5312-prune-corruption.sh | 10 ++++++++--
 t/t5516-fetch-push.sh       |  7 ++++---
 4 files changed, 23 insertions(+), 15 deletions(-)

diff --git a/Documentation/git.txt b/Documentation/git.txt
index abace9eac2..d63c65e67d 100644
--- a/Documentation/git.txt
+++ b/Documentation/git.txt
@@ -867,15 +867,16 @@ for full details.
 	end user, to be recorded in the body of the reflog.
 
 `GIT_REF_PARANOIA`::
-	If set to `1`, include broken or badly named refs when iterating
-	over lists of refs. In a normal, non-corrupted repository, this
-	does nothing. However, enabling it may help git to detect and
-	abort some operations in the presence of broken refs. Git sets
-	this variable automatically when performing destructive
-	operations like linkgit:git-prune[1]. You should not need to set
-	it yourself unless you want to be paranoid about making sure
-	an operation has touched every ref (e.g., because you are
-	cloning a repository to make a backup).
+	If set to `0`, ignore broken or badly named refs when iterating
+	over lists of refs. Normally Git will try to include any such
+	refs, which may cause some operations to fail. This is usually
+	preferable, as potentially destructive operations (e.g.,
+	linkgit:git-prune[1]) are better off aborting rather than
+	ignoring broken refs (and thus considering the history they
+	point to as not worth saving). The default value is `1` (i.e.,
+	be paranoid about detecting and aborting all operations). You
+	should not normally need to set this to `0`, but it may be
+	useful when trying to salvage data from a corrupted repository.
 
 `GIT_ALLOW_PROTOCOL`::
 	If set to a colon-separated list of protocols, behave as if
diff --git a/refs.c b/refs.c
index e72510813e..ac19c689fa 100644
--- a/refs.c
+++ b/refs.c
@@ -1420,7 +1420,7 @@ struct ref_iterator *refs_ref_iterator_begin(
 
 	if (!(flags & DO_FOR_EACH_INCLUDE_BROKEN)) {
 		if (ref_paranoia < 0)
-			ref_paranoia = git_env_bool("GIT_REF_PARANOIA", 0);
+			ref_paranoia = git_env_bool("GIT_REF_PARANOIA", 1);
 		if (ref_paranoia) {
 			flags |= DO_FOR_EACH_INCLUDE_BROKEN;
 			flags |= DO_FOR_EACH_OMIT_DANGLING_SYMREFS;
diff --git a/t/t5312-prune-corruption.sh b/t/t5312-prune-corruption.sh
index d8ec5a7462..ea889c088a 100755
--- a/t/t5312-prune-corruption.sh
+++ b/t/t5312-prune-corruption.sh
@@ -49,11 +49,17 @@ test_expect_success 'put bogus object into pack' '
 	git cat-file -e $bogus
 '
 
-test_expect_success 'non-destructive repack ignores bogus name' '
+test_expect_success 'non-destructive repack bails on bogus ref' '
 	create_bogus_ref &&
-	git repack -adk
+	test_must_fail git repack -adk
 '
 
+test_expect_success 'GIT_REF_PARANOIA=0 overrides safety' '
+	create_bogus_ref &&
+	GIT_REF_PARANOIA=0 git repack -adk
+'
+
+
 test_expect_success 'destructive repack keeps packed object' '
 	create_bogus_ref &&
 	test_must_fail git repack -Ad --unpack-unreachable=now &&
diff --git a/t/t5516-fetch-push.sh b/t/t5516-fetch-push.sh
index b13553ecf4..8212ca56dc 100755
--- a/t/t5516-fetch-push.sh
+++ b/t/t5516-fetch-push.sh
@@ -707,20 +707,21 @@ test_expect_success 'pushing valid refs triggers post-receive and post-update ho
 
 test_expect_success 'deleting dangling ref triggers hooks with correct args' '
 	mk_test_with_hooks testrepo heads/branch &&
+	orig=$(git -C testrepo rev-parse refs/heads/branch) &&
 	rm -f testrepo/.git/objects/??/* &&
 	git push testrepo :refs/heads/branch &&
 	(
 		cd testrepo/.git &&
 		cat >pre-receive.expect <<-EOF &&
-		$ZERO_OID $ZERO_OID refs/heads/branch
+		$orig $ZERO_OID refs/heads/branch
 		EOF
 
 		cat >update.expect <<-EOF &&
-		refs/heads/branch $ZERO_OID $ZERO_OID
+		refs/heads/branch $orig $ZERO_OID
 		EOF
 
 		cat >post-receive.expect <<-EOF &&
-		$ZERO_OID $ZERO_OID refs/heads/branch
+		$orig $ZERO_OID refs/heads/branch
 		EOF
 
 		cat >post-update.expect <<-EOF &&
-- 
2.33.0.1071.gb37e412355


  parent reply	other threads:[~2021-09-24 18:46 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-09-24 18:30 [PATCH 0/16] enabling GIT_REF_PARANOIA by default Jeff King
2021-09-24 18:32 ` [PATCH 01/16] t7900: clean up some more broken refs Jeff King
2021-09-27 17:38   ` Jonathan Tan
2021-09-27 19:49     ` Jeff King
2021-09-24 18:33 ` [PATCH 02/16] t5516: don't use HEAD ref for invalid ref-deletion tests Jeff King
2021-09-24 18:34 ` [PATCH 03/16] t5600: provide detached HEAD for corruption failures Jeff King
2021-09-24 18:35 ` [PATCH 04/16] t5312: drop "verbose" helper Jeff King
2021-09-24 18:36 ` [PATCH 05/16] t5312: create bogus ref as necessary Jeff King
2021-09-24 18:36 ` [PATCH 06/16] t5312: test non-destructive repack Jeff King
2021-09-24 18:37 ` [PATCH 07/16] t5312: be more assertive about command failure Jeff King
2021-09-24 18:37 ` [PATCH 08/16] refs-internal.h: move DO_FOR_EACH_* flags next to each other Jeff King
2021-09-24 18:39 ` [PATCH 09/16] refs-internal.h: reorganize DO_FOR_EACH_* flag documentation Jeff King
2021-09-24 18:41 ` [PATCH 10/16] refs: add DO_FOR_EACH_OMIT_DANGLING_SYMREFS flag Jeff King
2021-09-24 18:42 ` [PATCH 11/16] refs: omit dangling symrefs when using GIT_REF_PARANOIA Jeff King
2021-09-24 18:46 ` Jeff King [this message]
2021-09-27 17:42   ` [PATCH 12/16] refs: turn on GIT_REF_PARANOIA by default Jonathan Tan
2021-09-24 18:46 ` [PATCH 13/16] repack, prune: drop GIT_REF_PARANOIA settings Jeff King
2021-09-24 18:48 ` [PATCH 14/16] ref-filter: stop setting FILTER_REFS_INCLUDE_BROKEN Jeff King
2021-09-24 18:48 ` [PATCH 15/16] ref-filter: drop broken-ref code entirely Jeff King
2021-09-24 18:48 ` [PATCH 16/16] refs: drop "broken" flag from for_each_fullref_in() Jeff King
2021-09-27 17:47   ` Jonathan Tan
2021-09-24 20:22 ` [PATCH 0/16] enabling GIT_REF_PARANOIA by default Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YU4c9S6jypbS4YCK@coredump.intra.peff.net \
    --to=peff@peff.net \
    --cc=git@vger.kernel.org \
    --cc=jonathantanmy@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).