From: Jeff King <peff@peff.net>
To: git@vger.kernel.org
Subject: Re: [PATCH] doc/fsck: discuss mix of --connectivity-only and --dangling
Date: Wed, 27 Feb 2019 09:59:28 -0500 [thread overview]
Message-ID: <20190227145928.GA3727@sigill.intra.peff.net> (raw)
In-Reply-To: <20190227145549.GA3255@sigill.intra.peff.net>
On Wed, Feb 27, 2019 at 09:55:49AM -0500, Jeff King wrote:
> We could make the two cases work identically by taking a separate pass
> over the unreachable objects, parsing them and marking objects they
> refer to as USED. That would still avoid parsing any blobs, but we'd pay
> the cost to access any unreachable commits and trees. Since the point of
> --connectivity-only is to quickly report whether all reachable objects
> are present, I'd argue that it's not worth slowing it down to produce
> a better-analyzed dangling list.
>
> Instead, let's document this somewhat surprising property of
> connectivity-only. If somebody really wants to the extra analysis, we
> can add a separate option to enable it.
I'm actually a little torn on this. We could consider this a bug, and
the "option" to disable it when you want things to go fast is to say
"--no-dangling". That leaves no way to say "show me the list of
unreachable objects, but don't bother spending extra time on dangling
analysis". But I don't think I've ever really wanted that list of
unreachable objects anyway (and besides, you could do it pretty easily
with cat-file, rev-list, and comm).
So I sketched up what it might look like to just fix the bug (but kick
in only when needed), which is below.
diff --git a/builtin/fsck.c b/builtin/fsck.c
index bb4227bebc..d26fb0a044 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -235,6 +235,48 @@ static int mark_used(struct object *obj, int type, void *data, struct fsck_optio
return 0;
}
+static void mark_unreachable_referents(const struct object_id *oid)
+{
+ struct fsck_options options = FSCK_OPTIONS_DEFAULT;
+ struct object *obj = lookup_object(the_repository, oid->hash);
+
+ if (!obj || !(obj->flags & HAS_OBJ))
+ return; /* not part of our original set */
+ if (obj->flags & REACHABLE)
+ return; /* reachable objects already traversed */
+
+ /*
+ * Avoid passing OBJ_NONE to fsck_walk, which will parse the object
+ * (and we want to avoid parsing blobs).
+ */
+ if (obj->type == OBJ_NONE) {
+ enum object_type type = oid_object_info(the_repository,
+ &obj->oid, NULL);
+ if (type > 0)
+ object_as_type(the_repository, obj, type, 0);
+ }
+
+ options.walk = mark_used;
+ fsck_walk(obj, NULL, &options);
+}
+
+static int mark_loose_unreachable_referents(const struct object_id *oid,
+ const char *path,
+ void *data)
+{
+ mark_unreachable_referents(oid);
+ return 0;
+}
+
+static int mark_packed_unreachable_referents(const struct object_id *oid,
+ struct packed_git *pack,
+ uint32_t pos,
+ void *data)
+{
+ mark_unreachable_referents(oid);
+ return 0;
+}
+
/*
* Check a single reachable object
*/
@@ -347,6 +389,26 @@ static void check_connectivity(void)
/* Traverse the pending reachable objects */
traverse_reachable();
+ /*
+ * With --connectivity-only, we won't have actually opened and marked
+ * unreachable objects with USED. Do that now to make --dangling, etc
+ * accurate.
+ */
+ if (connectivity_only && (show_dangling || write_lost_and_found)) {
+ /*
+ * Even though we already have a "struct object" for each of
+ * these in memory, we must not iterate over the internal
+ * object hash as we do below. Our loop would potentially
+ * resize the hash, making our iteration invalid.
+ *
+ * Instead, we'll just go back to the source list of objects,
+ * and ignore any that weren't present in our earlier
+ * traversal.
+ */
+ for_each_loose_object(mark_loose_unreachable_referents, NULL, 0);
+ for_each_packed_object(mark_packed_unreachable_referents, NULL, 0);
+ }
+
/* Look up all the requirements, warn about missing objects.. */
max = get_max_object_index();
if (verbose)
diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index c61f972141..49f08d5b9c 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -740,7 +740,7 @@ test_expect_success 'fsck detects truncated loose object' '
# for each of type, we have one version which is referenced by another object
# (and so while unreachable, not dangling), and another variant which really is
# dangling.
-test_expect_success 'fsck notices dangling objects' '
+test_expect_success 'create dangling-object repository' '
git init dangling &&
(
cd dangling &&
@@ -751,19 +751,34 @@ test_expect_success 'fsck notices dangling objects' '
commit=$(git commit-tree $tree) &&
dcommit=$(git commit-tree -p $commit $tree) &&
- cat >expect <<-EOF &&
+ cat >expect <<-EOF
dangling blob $dblob
dangling commit $dcommit
dangling tree $dtree
EOF
+ )
+'
+test_expect_success 'fsck notices dangling objects' '
+ (
+ cd dangling &&
git fsck >actual &&
# the output order is non-deterministic, as it comes from a hash
sort <actual >actual.sorted &&
test_i18ncmp expect actual.sorted
)
'
+test_expect_success 'fsck --connectivity-only notices dangling objects' '
+ (
+ cd dangling &&
+ git fsck --connectivity-only >actual &&
+ # the output order is non-deterministic, as it comes from a hash
+ sort <actual >actual.sorted &&
+ test_i18ncmp expect actual.sorted
+ )
+'
+
test_expect_success 'fsck $name notices bogus $name' '
test_must_fail git fsck bogus &&
test_must_fail git fsck $ZERO_OID
next prev parent reply other threads:[~2019-02-27 14:59 UTC|newest]
Thread overview: 7+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-02-27 14:55 [PATCH] doc/fsck: discuss mix of --connectivity-only and --dangling Jeff King
2019-02-27 14:59 ` Jeff King [this message]
2019-03-01 2:50 ` Junio C Hamano
2019-03-05 4:26 ` Jeff King
2019-03-05 4:45 ` [PATCH v2 0/2] fsck --connectivity-only --dangling Jeff King
2019-03-05 4:46 ` [PATCH v2 1/2] doc/fsck: clarify --connectivity-only behavior Jeff King
2019-03-05 4:47 ` [PATCH v2 2/2] fsck: always compute USED flags for unreachable objects Jeff King
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: http://vger.kernel.org/majordomo-info.html
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20190227145928.GA3727@sigill.intra.peff.net \
--to=peff@peff.net \
--cc=git@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://80x24.org/mirrors/git.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).