[PATCH] connected: distinguish local/remote bad objects

git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed

* [PATCH] connected: distinguish local/remote bad objects
@ 2022-06-08 21:05 Jonathan Tan
  2022-06-08 22:33 ` Junio C Hamano
                   ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Jonathan Tan @ 2022-06-08 21:05 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan

When the connectivity check after a fetch fails, an error message
"<remote> did not send all necessary objects" is printed. That error
message is printed regardless of the reason of failure: in particular,
that message may be printed if the connectivity check fails because a
local object is missing. (The connectivity check reads local objects too
because it compares the set of objects that the remote claims to send
against the set of objects that our refs directly or indirectly
reference.)

The connectivity check passes, to "git rev-list", remote objects
directly and local objects through "--not". And internally, the latter
are marked with the UNINTERESTING flag. When reading a commit during the
commit walk, we know whether the commit came from an UNINTERESTING
commit or not. Therefore, use this flag to produce a clearer error
message when a bad object is read.

This necessitates changes in revision.c which is used by components
other than the connectivity check and may have different meanings for
objects passe with and without "--not", so guard the extra diagnostics
behind a CLI argument.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
We noticed, at $DAYJOB, some of these messages that were likely caused
by missing objects in the local repository instead, so here is a patch
that will make it easier to diagnose such issues.
---
 builtin/fetch.c              |  2 +-
 connected.c                  |  1 +
 revision.c                   | 16 ++++++++++++--
 revision.h                   |  3 +++
 t/t5518-fetch-exit-status.sh | 43 ++++++++++++++++++++++++++++++++++++
 5 files changed, 62 insertions(+), 3 deletions(-)

diff --git a/builtin/fetch.c b/builtin/fetch.c
index ac29c2b1ae..6f43b2bf8d 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -1133,7 +1133,7 @@ static int store_updated_refs(const char *raw_url, const char *remote_name,
 
 		rm = ref_map;
 		if (check_connected(iterate_ref_map, &rm, &opt)) {
-			rc = error(_("%s did not send all necessary objects\n"), url);
+			rc = error(_("connectivity check failed for %s\n"), url);
 			goto abort;
 		}
 	}
diff --git a/connected.c b/connected.c
index ed3025e7a2..ea773f25db 100644
--- a/connected.c
+++ b/connected.c
@@ -94,6 +94,7 @@ int check_connected(oid_iterate_fn fn, void *cb_data,
 		strvec_push(&rev_list.args, opt->shallow_file);
 	}
 	strvec_push(&rev_list.args,"rev-list");
+	strvec_push(&rev_list.args, "--detailed-bad-object");
 	strvec_push(&rev_list.args, "--objects");
 	strvec_push(&rev_list.args, "--stdin");
 	if (has_promisor_remote())
diff --git a/revision.c b/revision.c
index 090a967bf4..777e762373 100644
--- a/revision.c
+++ b/revision.c
@@ -367,6 +367,16 @@ void add_head_to_pending(struct rev_info *revs)
 	add_pending_object(revs, obj, "HEAD");
 }
 
+static void NORETURN bad_object(struct rev_info *revs, const char *name,
+				unsigned int flags)
+{
+	if (!revs->detailed_bad_object)
+		die("bad object %s", name);
+	if (flags & UNINTERESTING)
+		die("bad object %s (from local object store)", name);
+	die("bad object %s (from remote)", name);
+}
+
 static struct object *get_reference(struct rev_info *revs, const char *name,
 				    const struct object_id *oid,
 				    unsigned int flags)
@@ -390,7 +400,7 @@ static struct object *get_reference(struct rev_info *revs, const char *name,
 			return object;
 		if (revs->exclude_promisor_objects && is_promisor_object(oid))
 			return NULL;
-		die("bad object %s", name);
+		bad_object(revs, name, flags);
 	}
 	object->flags |= flags;
 	return object;
@@ -426,7 +436,7 @@ static struct commit *handle_commit(struct rev_info *revs,
 			if (revs->exclude_promisor_objects &&
 			    is_promisor_object(&tag->tagged->oid))
 				return NULL;
-			die("bad object %s", oid_to_hex(&tag->tagged->oid));
+			bad_object(revs, oid_to_hex(&tag->tagged->oid), flags);
 		}
 		object->flags |= flags;
 		/*
@@ -2537,6 +2547,8 @@ static int handle_revision_opt(struct rev_info *revs, int argc, const char **arg
 		if (fetch_if_missing)
 			BUG("exclude_promisor_objects can only be used when fetch_if_missing is 0");
 		revs->exclude_promisor_objects = 1;
+	} else if (!strcmp(arg, "--detailed-bad-object")) {
+		revs->detailed_bad_object = 1;
 	} else {
 		int opts = diff_opt_parse(&revs->diffopt, argv, argc, revs->prefix);
 		if (!opts)
diff --git a/revision.h b/revision.h
index e80c148b19..7f685dd5bb 100644
--- a/revision.h
+++ b/revision.h
@@ -328,6 +328,9 @@ struct rev_info {
 
 	/* Location where temporary objects for remerge-diff are written. */
 	struct tmp_objdir *remerge_objdir;
+
+	/* Error reporting info */
+	unsigned detailed_bad_object : 1;
 };
 
 int ref_excluded(struct string_list *, const char *path);
diff --git a/t/t5518-fetch-exit-status.sh b/t/t5518-fetch-exit-status.sh
index 5c4ac2556e..f1adac1dd6 100755
--- a/t/t5518-fetch-exit-status.sh
+++ b/t/t5518-fetch-exit-status.sh
@@ -37,4 +37,47 @@ test_expect_success 'forced update' '
 
 '
 
+. "$TEST_DIRECTORY"/lib-httpd.sh
+start_httpd
+
+test_expect_success 'connectivity check failure due to missing local object' '
+	SERVER="$HTTPD_DOCUMENT_ROOT_PATH/server" &&
+	test_when_finished "rm -rf \"$SERVER\" client" &&
+	test_create_repo "$SERVER" &&
+	test_commit -C "$SERVER" foo &&
+
+	test_create_repo client &&
+	test_commit -C client bar &&
+
+	# Simulate missing client objects.
+	rm -rf client/.git/objects/* &&
+	test_must_fail git -C client fetch $HTTPD_URL/smart/server 2>err &&
+	grep "(from local object store)" err &&
+	! grep "(from remote)" err &&
+	grep "error: connectivity check failed for" err
+'
+
+test_expect_success 'connectivity check failure due to missing remote object' '
+	SERVER="$HTTPD_DOCUMENT_ROOT_PATH/server" &&
+	test_when_finished "rm -rf \"$SERVER\" client" &&
+	test_create_repo "$SERVER" &&
+	test_commit -C "$SERVER" foo &&
+	git -C "$SERVER" config uploadpack.allowRefInWant true &&
+	SERVER_HEAD=$(git -C "$SERVER" rev-parse HEAD) &&
+	SERVER_FAKE_HEAD=$(echo $SERVER_HEAD | tr "0123456789abcdef" "123456789abcdef0") &&
+
+	test_create_repo client &&
+
+	# Make the server claim that it has $SERVER_FAKE_HEAD as
+	# refs/heads/main. The server still sends $SERVER_HEAD in the packfile,
+	# so the client will see $SERVER_FAKE_HEAD as missing.
+	echo "s=$SERVER_HEAD refs/heads/main=$SERVER_FAKE_HEAD refs/heads/main= if /wanted-refs/../packfile/" >"$HTTPD_ROOT_PATH/one-time-perl" &&
+
+	test_must_fail git -C client fetch $HTTPD_URL/one_time_perl/server refs/heads/main 2>err &&
+	grep "(from remote)" err &&
+	! grep "(from local object store)" err &&
+	grep "error: connectivity check failed for" err
+'
+
 test_done
+
-- 
2.36.1.255.ge46751e96f-goog


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH] connected: distinguish local/remote bad objects
  2022-06-08 21:05 [PATCH] connected: distinguish local/remote bad objects Jonathan Tan
@ 2022-06-08 22:33 ` Junio C Hamano
  2022-06-09 17:17   ` Jonathan Tan
  2022-06-09 16:55 ` Junio C Hamano
  2022-06-10 19:52 ` [PATCH v2] fetch,fetch-pack: clarify connectivity check error Jonathan Tan
  2 siblings, 1 reply; 9+ messages in thread
From: Junio C Hamano @ 2022-06-08 22:33 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git

> diff --git a/builtin/fetch.c b/builtin/fetch.c
> index ac29c2b1ae..6f43b2bf8d 100644
> --- a/builtin/fetch.c
> +++ b/builtin/fetch.c
> @@ -1133,7 +1133,7 @@ static int store_updated_refs(const char *raw_url, const char *remote_name,
>  
>  		rm = ref_map;
>  		if (check_connected(iterate_ref_map, &rm, &opt)) {
> -			rc = error(_("%s did not send all necessary objects\n"), url);
> +			rc = error(_("connectivity check failed for %s\n"), url);
>  			goto abort;
>  		}
>  	}

Clever.

I was wondering how you are going to deal with the different exit
condition from the rev-list that is only expressed with the two
different messages.  You _could_ grep for "from local" vs "from
remote", but you just let the human user who is staring at the error
message to read it, and stop mentioning the details in this message.

OK.

> diff --git a/connected.c b/connected.c
> index ed3025e7a2..ea773f25db 100644
> --- a/connected.c
> +++ b/connected.c
> @@ -94,6 +94,7 @@ int check_connected(oid_iterate_fn fn, void *cb_data,
>  		strvec_push(&rev_list.args, opt->shallow_file);
>  	}
>  	strvec_push(&rev_list.args,"rev-list");
> +	strvec_push(&rev_list.args, "--detailed-bad-object");
>  	strvec_push(&rev_list.args, "--objects");
>  	strvec_push(&rev_list.args, "--stdin");
>  	if (has_promisor_remote())
> diff --git a/revision.c b/revision.c
> index 090a967bf4..777e762373 100644
> --- a/revision.c
> +++ b/revision.c
> @@ -367,6 +367,16 @@ void add_head_to_pending(struct rev_info *revs)
>  	add_pending_object(revs, obj, "HEAD");
>  }
>  
> +static void NORETURN bad_object(struct rev_info *revs, const char *name,
> +				unsigned int flags)
> +{
> +	if (!revs->detailed_bad_object)
> +		die("bad object %s", name);
> +	if (flags & UNINTERESTING)
> +		die("bad object %s (from local object store)", name);
> +	die("bad object %s (from remote)", name);
> +}

If the missing object you found is reachable from existing tips
(i.e. local aka UNINTERESTING) and also from what they should have
sent (i.e. remote tips), when we discover that the object does not
exist locally (but we can have an in-core shell object whose object
name is already known because child objects that are closer to the
tips than the missing object do exist and point at it), does this
new heuristic work reliably?  

Do we always die and report bad_object() with UNINTERESTING in the
the flags variable, or only when we are lucky?

IOW, if our current branch pionts at A, while the other side says
they are updating it to B,

    X-----o-----A
     \
      x---B

we try to traverse "rev-list ^A B" and make sure everything exists.
If we find objects 'o' missing, it is clear that it was something we
were supposed to have on the local side even before we started the
fetch.  But if 'X' is missing, by the time we notice and report a
missing object, do we always reliably know that it ought to be
reachable from both?  Or do we have 50/50 chance that the traversal
comes from 'o' earlier than from 'x' (in which case X is found to be
missing when we try to see who is the parent of 'o', and flags have
UNINTERESTING bit), or later than from 'x' (in which case X is found
when trying to see who the parents of 'x' is, and we may not know
and we may not bother to find out if X is also a parent of 'o',
hence we'd still say 'You do not have X, and we were looking at 'x',
which we got from the other end, so they were supposed to have sent
it', which would be a misdiagnosis)?

Thanks.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] connected: distinguish local/remote bad objects
  2022-06-08 21:05 [PATCH] connected: distinguish local/remote bad objects Jonathan Tan
  2022-06-08 22:33 ` Junio C Hamano
@ 2022-06-09 16:55 ` Junio C Hamano
  2022-06-09 17:17   ` Jonathan Tan
  2022-06-09 18:00   ` Ævar Arnfjörð Bjarmason
  2022-06-10 19:52 ` [PATCH v2] fetch,fetch-pack: clarify connectivity check error Jonathan Tan
  2 siblings, 2 replies; 9+ messages in thread
From: Junio C Hamano @ 2022-06-09 16:55 UTC (permalink / raw)
  To: Jonathan Tan, Ævar Arnfjörð Bjarmason; +Cc: git

Jonathan Tan <jonathantanmy@google.com> writes:

>  builtin/fetch.c              |  2 +-
>  connected.c                  |  1 +
>  revision.c                   | 16 ++++++++++++--
>  revision.h                   |  3 +++
>  t/t5518-fetch-exit-status.sh | 43 ++++++++++++++++++++++++++++++++++++
>  5 files changed, 62 insertions(+), 3 deletions(-)

This seems to break linux-leaks CI job by making 5518, which was
marked in some topic in flight to expect to be leak-free, fail.

Because of the way linux-leaks test framework is done, it is not
easy to tell if the code changes essential to this topic introduced
new leaks, in which case we would want to fix that.

Note that this may not the fault of the code changes in this patch.
If the tests added by the patch started using git commands that are
known to leak (i.e. not ready to be subjected to the "leaks" test)
in order to prepare the scenario or to inspect the result, even if
the code changes in this topic did not introduce any leak, we can
see the same breakage in linux-leaks CI job.  An easy way out would
be to disable leak-check CI for the entire 5518, but that is not
very satisfactory, as the earlier part of that script should still
be leak-free.  Another way out might be to add these two tests in a
new script, which is not marked as not-leaking.  After all, what the
new topic adds is not about exit status but how that exit status
comes about, so it might not be a bad idea even without the CI leak
stuff anyway.

Ævar, does the internal state used for revision walking count as
leaking when it is still held by the time we hit die() in
bad_object(), or anything on stack when we die() are still reachable
and won't be reported as a failure?

Thanks.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] connected: distinguish local/remote bad objects
  2022-06-09 16:55 ` Junio C Hamano
@ 2022-06-09 17:17   ` Jonathan Tan
  2022-06-09 18:00   ` Ævar Arnfjörð Bjarmason
  1 sibling, 0 replies; 9+ messages in thread
From: Jonathan Tan @ 2022-06-09 17:17 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jonathan Tan, Ævar Arnfjörð Bjarmason, git

Junio C Hamano <gitster@pobox.com> writes:
> Jonathan Tan <jonathantanmy@google.com> writes:
> 
> >  builtin/fetch.c              |  2 +-
> >  connected.c                  |  1 +
> >  revision.c                   | 16 ++++++++++++--
> >  revision.h                   |  3 +++
> >  t/t5518-fetch-exit-status.sh | 43 ++++++++++++++++++++++++++++++++++++
> >  5 files changed, 62 insertions(+), 3 deletions(-)
> 
> This seems to break linux-leaks CI job by making 5518, which was
> marked in some topic in flight to expect to be leak-free, fail.
> 
> Because of the way linux-leaks test framework is done, it is not
> easy to tell if the code changes essential to this topic introduced
> new leaks, in which case we would want to fix that.

Thanks. I'll recheck this once I make a new version.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] connected: distinguish local/remote bad objects
  2022-06-08 22:33 ` Junio C Hamano
@ 2022-06-09 17:17   ` Jonathan Tan
  0 siblings, 0 replies; 9+ messages in thread
From: Jonathan Tan @ 2022-06-09 17:17 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jonathan Tan, git

Junio C Hamano <gitster@pobox.com> writes:
> If the missing object you found is reachable from existing tips
> (i.e. local aka UNINTERESTING) and also from what they should have
> sent (i.e. remote tips), when we discover that the object does not
> exist locally (but we can have an in-core shell object whose object
> name is already known because child objects that are closer to the
> tips than the missing object do exist and point at it), does this
> new heuristic work reliably?  
> 
> Do we always die and report bad_object() with UNINTERESTING in the
> the flags variable, or only when we are lucky?
> 
> IOW, if our current branch pionts at A, while the other side says
> they are updating it to B,
> 
>     X-----o-----A
>      \
>       x---B
> 
> we try to traverse "rev-list ^A B" and make sure everything exists.
> If we find objects 'o' missing, it is clear that it was something we
> were supposed to have on the local side even before we started the
> fetch.  But if 'X' is missing, by the time we notice and report a
> missing object, do we always reliably know that it ought to be
> reachable from both?  Or do we have 50/50 chance that the traversal
> comes from 'o' earlier than from 'x' (in which case X is found to be
> missing when we try to see who is the parent of 'o', and flags have
> UNINTERESTING bit), or later than from 'x' (in which case X is found
> when trying to see who the parents of 'x' is, and we may not know
> and we may not bother to find out if X is also a parent of 'o',
> hence we'd still say 'You do not have X, and we were looking at 'x',
> which we got from the other end, so they were supposed to have sent
> it', which would be a misdiagnosis)?
> 
> Thanks.

Ah, good catch. I'll take a look into fixing this. (A misdiagnosis would
defeat the purpose of this patch, yes.)

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] connected: distinguish local/remote bad objects
  2022-06-09 16:55 ` Junio C Hamano
  2022-06-09 17:17   ` Jonathan Tan
@ 2022-06-09 18:00   ` Ævar Arnfjörð Bjarmason
  1 sibling, 0 replies; 9+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-06-09 18:00 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jonathan Tan, git


On Thu, Jun 09 2022, Junio C Hamano wrote:

> Jonathan Tan <jonathantanmy@google.com> writes:
>
>>  builtin/fetch.c              |  2 +-
>>  connected.c                  |  1 +
>>  revision.c                   | 16 ++++++++++++--
>>  revision.h                   |  3 +++
>>  t/t5518-fetch-exit-status.sh | 43 ++++++++++++++++++++++++++++++++++++
>>  5 files changed, 62 insertions(+), 3 deletions(-)
>
> This seems to break linux-leaks CI job by making 5518, which was
> marked in some topic in flight to expect to be leak-free, fail.
>
> Because of the way linux-leaks test framework is done, it is not
> easy to tell if the code changes essential to this topic introduced
> new leaks, in which case we would want to fix that.

I think this is just an existing leak that happens to be exposed by a
new (in this file) test, i.e. transport_get() leaks via an xmalloc() for
transport_helper.

> Note that this may not the fault of the code changes in this patch.
> If the tests added by the patch started using git commands that are
> known to leak (i.e. not ready to be subjected to the "leaks" test)
> in order to prepare the scenario or to inspect the result, even if
> the code changes in this topic did not introduce any leak, we can
> see the same breakage in linux-leaks CI job.  An easy way out would
> be to disable leak-check CI for the entire 5518, but that is not
> very satisfactory, as the earlier part of that script should still
> be leak-free.

I think doing that would be fine in this case. It will get easier to fix
leaks now that "struct rev_info" is out of the way (and I've got a lot
of pending patches), but I can always loop back & re-mark this
particular test as leak-free at some future date.

> Another way out might be to add these two tests in a
> new script, which is not marked as not-leaking.  After all, what the
> new topic adds is not about exit status but how that exit status
> comes about, so it might not be a bad idea even without the CI leak
> stuff anyway.

Yeah, that sounds especially good in this case, as if we can't run httpd
we'll print a meaningful "skip" message in that case. See 0a2bfccb9c8
(t0051: use "skip_all" under !MINGW in single-test file, 2022-02-04)

> Ævar, does the internal state used for revision walking count as
> leaking when it is still held by the time we hit die() in
> bad_object(), or anything on stack when we die() are still reachable
> and won't be reported as a failure?

No, but in this case the variable containing the leaked data isn't in
scope by the time we exit, i.e. it was used by fetch_one() which had it
malloc'd, but the struct it lived in went away, and now we're exiting
from cmd_fetch() etc.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH v2] fetch,fetch-pack: clarify connectivity check error
  2022-06-08 21:05 [PATCH] connected: distinguish local/remote bad objects Jonathan Tan
  2022-06-08 22:33 ` Junio C Hamano
  2022-06-09 16:55 ` Junio C Hamano
@ 2022-06-10 19:52 ` Jonathan Tan
  2022-06-10 20:25   ` Junio C Hamano
  2 siblings, 1 reply; 9+ messages in thread
From: Jonathan Tan @ 2022-06-10 19:52 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, gitster, avarab

When the connectivity check after a fetch fails, an error message
"<remote> did not send all necessary objects" is printed. That error
message is printed regardless of the reason of failure: in particular,
that message may be printed if the connectivity check fails because a
local object is missing. (The connectivity check reads local objects too
because it compares the set of objects that the remote claims to send
against the set of objects that our refs directly or indirectly
reference.)

Since it is not necessarily true that the remote did not send all
necessary objects, update the error message to be something that more
correctly reflects the situation.

The updated error message is admittedly vague and one alternative
solution would be to update the revision walking code to be able to more
precisely specify if a bad object is supposed to be available locally or
supposed to have been sent by the remote. That would likely require, in
the revision walking code, to delay parsing any object until all its
children has been parsed, which seems to be a significant undertaking.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
As Junio said in [1], my original v1 code doesn't work when an object is
referenced both by a local object and a remote one. I tried looking into
making it work but it looks difficult.

So here is a patch that just changes the error message to a vaguer but
hopefully more correct one. I wonder what the community thinks of it -
I think it is more correct (and means that we do not need to say "it's
not the remote fault despite what the error message says") but at the
same time, many people are already used to this message (and a search
online reveals several web sites that talk about this).

[1] https://lore.kernel.org/git/xmqqh74upyz3.fsf@gitster.g/
---
 builtin/fetch.c | 2 +-
 fetch-pack.c    | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/builtin/fetch.c b/builtin/fetch.c
index ac29c2b1ae..15737ca293 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -1133,7 +1133,7 @@ static int store_updated_refs(const char *raw_url, const char *remote_name,

 		rm = ref_map;
 		if (check_connected(iterate_ref_map, &rm, &opt)) {
-			rc = error(_("%s did not send all necessary objects\n"), url);
+			rc = error(_("connectivity check failed after fetching from %s\n"), url);
 			goto abort;
 		}
 	}
diff --git a/fetch-pack.c b/fetch-pack.c
index cb6647d657..91269008cc 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -2021,7 +2021,7 @@ struct ref *fetch_pack(struct fetch_pack_args *args,
 		if (args->deepen)
 			opt.is_deepening_fetch = 1;
 		if (check_connected(iterate_ref_map, &iterator, &opt)) {
-			error(_("remote did not send all necessary objects"));
+			error(_("connectivity check failed after fetching from remote"));
 			free_refs(ref_cpy);
 			ref_cpy = NULL;
 			rollback_shallow_file(the_repository, &shallow_lock);
-- 
2.36.1.476.g0c4daa206d-goog

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH v2] fetch,fetch-pack: clarify connectivity check error
  2022-06-10 19:52 ` [PATCH v2] fetch,fetch-pack: clarify connectivity check error Jonathan Tan
@ 2022-06-10 20:25   ` Junio C Hamano
  2022-06-17 20:03     ` Jonathan Tan
  0 siblings, 1 reply; 9+ messages in thread
From: Junio C Hamano @ 2022-06-10 20:25 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git, avarab

Jonathan Tan <jonathantanmy@google.com> writes:

> When the connectivity check after a fetch fails, an error message
> "<remote> did not send all necessary objects" is printed. That error
> message is printed regardless of the reason of failure: in particular,
> that message may be printed if the connectivity check fails because a
> local object is missing. (The connectivity check reads local objects too
> because it compares the set of objects that the remote claims to send
> against the set of objects that our refs directly or indirectly
> reference.)
>
> Since it is not necessarily true that the remote did not send all
> necessary objects, update the error message to be something that more
> correctly reflects the situation.
>
> The updated error message is admittedly vague and one alternative
> solution would be to update the revision walking code to be able to more
> precisely specify if a bad object is supposed to be available locally or
> supposed to have been sent by the remote. That would likely require, in
> the revision walking code, to delay parsing any object until all its
> children has been parsed, which seems to be a significant undertaking.
>

While that is all true, I do not think the posted patch as-is is a
good idea.  After all, object missing on our side locally before you
start "git fetch" is a local repository corruption, which ought to
be a much rarer event than a still in development and in flux server
end sending nonsense packfiles, no?

At least, "they didn't do X" would give the user somewhere to start
investigation (e.g. complain to the server folks about the error,
stating where they started from and what they tried to fetch).  The
new message may be playing it "safe" by not saying anything you are
not absolutely sure about, but that is much less useful to the users
who got the message.

Now, when check_connected() fails early in store_updated_refs(), we
haven't updated the remote-tracking refs to point at the tips we
obtained, have we?

One thing we could do is to run "git fsck" to see if the local
history is broken.  We may have added unreferenced objects that have
come from the remote end, but having dangling or unreachable objects
is not an error, while it is a sure sign of repository corruption if
there are missing objects.  Because we haven't update any of the
refs (local or remote-tracking), we still should be able to see if
the problem is on our end at this point

If "o" or "X" (in the illustrated history in my review of your v1)
is found to be missing by "git fsck" (or "git fsck A"), then you
know that the server end is not the culprit (or at least may not be
the sole culprit) for this connectivity check failure.  If "git
fsck" reports no missing objects, then what was found to be missing
is not "X" but is "x" or "B", and we are more certain that the
remote end is more likely to be problematic.

As it is on the error codepath, we could even automatically run fsck
to figure out if we have objects missing after giving an initial
diagnosis, but at least it would be easy to give an advice at this
point in the code to run fsck after seeing a failure from this
fetch.

As you have two places that show the error, perhaps the first step
is to introduce a helper function:

	int diagnose_check_connected_error(const char *url)
	{
		return error(_("%s may not have sent necessary objects"),
                		url ? url : "remote");
	}

and call it instead of calling error() in the places you touched in
v2.

As a follow up step, then, we could do something like

	int diagnose_check_connected_error(const char *url)
	{
+		static const char advice[] =
+			N_("run 'git fsck' to see if your repository "
+			"is missing objects; if it passes, the problem is "
+			"on the remote end");
+
+		advise_if_enabled(ADVICE_CONNECTIVETY_FAILURE, advice);
		return error(_("%s may not have sent necessary	objects"),
                		url ? url : "remote");
	}

or if we are more ambitious, actually run "git fsck" via the run_command() 
API for the user.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v2] fetch,fetch-pack: clarify connectivity check error
  2022-06-10 20:25   ` Junio C Hamano
@ 2022-06-17 20:03     ` Jonathan Tan
  0 siblings, 0 replies; 9+ messages in thread
From: Jonathan Tan @ 2022-06-17 20:03 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jonathan Tan, git, avarab

Junio C Hamano <gitster@pobox.com> writes:
> While that is all true, I do not think the posted patch as-is is a
> good idea.  After all, object missing on our side locally before you
> start "git fetch" is a local repository corruption, which ought to
> be a much rarer event than a still in development and in flux server
> end sending nonsense packfiles, no?
> 
> At least, "they didn't do X" would give the user somewhere to start
> investigation (e.g. complain to the server folks about the error,
> stating where they started from and what they tried to fetch).  The
> new message may be playing it "safe" by not saying anything you are
> not absolutely sure about, but that is much less useful to the users
> who got the message.

Hmm...that's true. Let me try to understand the revision walking code a
bit more, and if by then I still don't end up changing the revision walk
to avoid prematurely parsing a parent, I'll make a patch with your
suggestion.

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2022-06-17 20:03 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-06-08 21:05 [PATCH] connected: distinguish local/remote bad objects Jonathan Tan
2022-06-08 22:33 ` Junio C Hamano
2022-06-09 17:17   ` Jonathan Tan
2022-06-09 16:55 ` Junio C Hamano
2022-06-09 17:17   ` Jonathan Tan
2022-06-09 18:00   ` Ævar Arnfjörð Bjarmason
2022-06-10 19:52 ` [PATCH v2] fetch,fetch-pack: clarify connectivity check error Jonathan Tan
2022-06-10 20:25   ` Junio C Hamano
2022-06-17 20:03     ` Jonathan Tan

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).