git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH] revision: allow missing promisor objects on CLI
@ 2019-12-28  0:34 Jonathan Tan
  2019-12-28  3:50 ` Junio C Hamano
                   ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Jonathan Tan @ 2019-12-28  0:34 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, matvore

Commit 4cf67869b2 ("list-objects.c: don't segfault for missing cmdline
objects", 2018-12-06) prevented some segmentation faults from occurring
by tightening handling of missing objects provided through the CLI: if
--ignore-missing is set, then it is OK (and the missing object ignored,
just like one would if encountered in traversal).

However, in the case that --ignore-missing is not set but
--exclude-promisor-objects is set, there is still no distinction between
the case wherein the missing object is a promisor object and the case
wherein it is not. This is unnecessarily restrictive, since if a missing
promisor object is encountered in traversal, it is ignored; likewise it
should be ignored if provided through the CLI. Therefore, distinguish
between these 2 cases. (As a bonus, the code is now simpler.)

(Note that this only affects handling of missing promisor objects.
Handling of non-missing promisor objects is already done by setting all
of them to UNINTERESTING in prepare_revision_walk().)

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
For those curious, I discovered this while trying to extend the
check-targets-only optimization done in dfa33a298d ("clone: do faster
object check for partial clones", 2019-04-21) to fetches as well. While
investigating my previous work of adding check-target functionality in
addition to the usual connectivity check (for correctness, not for
performance) in 35f9e3e5e7 ("fetch: in partial clone, check presence of
targets", 2018-09-21), I discovered that at current master, the check
was somehow no longer needed because rev-list dies on missing objects on
CLI. But I don't think that the current behavior is obvious, hence this
commit (which also restores the need for the check-target
functionality).
---
 revision.c               | 10 +++++++++-
 t/t0410-partial-clone.sh | 10 ++--------
 2 files changed, 11 insertions(+), 9 deletions(-)

diff --git a/revision.c b/revision.c
index 8136929e23..345615e300 100644
--- a/revision.c
+++ b/revision.c
@@ -1907,7 +1907,15 @@ int handle_revision_arg(const char *arg_, struct rev_info *revs, int flags, unsi
 		verify_non_filename(revs->prefix, arg);
 	object = get_reference(revs, arg, &oid, flags ^ local_flags);
 	if (!object)
-		return revs->ignore_missing ? 0 : -1;
+		/*
+		 * Either this object is missing and ignore_missing is true, or
+		 * this object is a (missing) promisor object and
+		 * exclude_promisor_objects is true. In any case, we are
+		 * allowed to skip processing of this object; this object will
+		 * not appear in output and cannot be used as a source of
+		 * UNINTERESTING ancestors (since it is missing).
+		 */
+		return 0;
 	add_rev_cmdline(revs, object, arg_, REV_CMD_REV, flags ^ local_flags);
 	add_pending_object_with_path(revs, object, arg, oc.mode, oc.path);
 	free(oc.path);
diff --git a/t/t0410-partial-clone.sh b/t/t0410-partial-clone.sh
index a3988bd4b8..fd28f5402a 100755
--- a/t/t0410-partial-clone.sh
+++ b/t/t0410-partial-clone.sh
@@ -416,15 +416,9 @@ test_expect_success 'rev-list dies for missing objects on cmd line' '
 	git -C repo config extensions.partialclone "arbitrary string" &&
 
 	for OBJ in "$COMMIT" "$TREE" "$BLOB"; do
-		test_must_fail git -C repo rev-list --objects \
+		git -C repo rev-list --objects \
 			--exclude-promisor-objects "$OBJ" &&
-		test_must_fail git -C repo rev-list --objects-edge-aggressive \
-			--exclude-promisor-objects "$OBJ" &&
-
-		# Do not die or crash when --ignore-missing is passed.
-		git -C repo rev-list --ignore-missing --objects \
-			--exclude-promisor-objects "$OBJ" &&
-		git -C repo rev-list --ignore-missing --objects-edge-aggressive \
+		git -C repo rev-list --objects-edge-aggressive \
 			--exclude-promisor-objects "$OBJ"
 	done
 '
-- 
2.24.1.735.g03f4e72817-goog


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH] revision: allow missing promisor objects on CLI
  2019-12-28  0:34 [PATCH] revision: allow missing promisor objects on CLI Jonathan Tan
@ 2019-12-28  3:50 ` Junio C Hamano
  2019-12-30 18:38   ` Jonathan Tan
  2019-12-30 23:44 ` [PATCH v2] " Jonathan Tan
  2020-01-11 22:34 ` [PATCH v3 0/2] Un-regress rev-list --exclude-promisor-objects Jonathan Tan
  2 siblings, 1 reply; 12+ messages in thread
From: Junio C Hamano @ 2019-12-28  3:50 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git, matvore

Jonathan Tan <jonathantanmy@google.com> writes:

>  	object = get_reference(revs, arg, &oid, flags ^ local_flags);
>  	if (!object)
> -		return revs->ignore_missing ? 0 : -1;
> +		/*
> +		 * Either this object is missing and ignore_missing is true, or
> +		 * this object is a (missing) promisor object and
> +		 * exclude_promisor_objects is true.

I had to guess and dig where these assertions are coming from; we
should not force future readers of the code to.

At least this comment must say why these assertions hold.  Say
something like "get_reference() yields NULL on only such and such
cases" before concluding with "and in any of these cases, we can
safely ignore it because ...".

I think the two cases the comment covers are safe for this caller to
silently return 0.  Another case get_reference() yields NULL is when
oid_object_info() says it is a commit but it turns out that the
object is found by repo_parse_commit() to be a non-commit, isn't it?
I am not sure if it is safe for this caller to just return 0.  There
may be some other "unusual-but-not-fatal" cases where get_reference()
does not hit a die() but returns NULL.

Thanks.


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] revision: allow missing promisor objects on CLI
  2019-12-28  3:50 ` Junio C Hamano
@ 2019-12-30 18:38   ` Jonathan Tan
  2019-12-30 20:33     ` Junio C Hamano
  0 siblings, 1 reply; 12+ messages in thread
From: Jonathan Tan @ 2019-12-30 18:38 UTC (permalink / raw)
  To: gitster; +Cc: jonathantanmy, git, matvore

> Jonathan Tan <jonathantanmy@google.com> writes:
> 
> >  	object = get_reference(revs, arg, &oid, flags ^ local_flags);
> >  	if (!object)
> > -		return revs->ignore_missing ? 0 : -1;
> > +		/*
> > +		 * Either this object is missing and ignore_missing is true, or
> > +		 * this object is a (missing) promisor object and
> > +		 * exclude_promisor_objects is true.
> 
> I had to guess and dig where these assertions are coming from; we
> should not force future readers of the code to.
> 
> At least this comment must say why these assertions hold.  Say
> something like "get_reference() yields NULL on only such and such
> cases" before concluding with "and in any of these cases, we can
> safely ignore it because ...".

OK, will do.

> I think the two cases the comment covers are safe for this caller to
> silently return 0.  Another case get_reference() yields NULL is when
> oid_object_info() says it is a commit but it turns out that the
> object is found by repo_parse_commit() to be a non-commit, isn't it?
> I am not sure if it is safe for this caller to just return 0.  There
> may be some other "unusual-but-not-fatal" cases where get_reference()
> does not hit a die() but returns NULL.

I don't think there is any other case where get_reference() yields NULL,
at least where I based my patch (99c33bed56 ("Git 2.25-rc0",
2019-12-25)). Quoting the entire get_reference():

> static struct object *get_reference(struct rev_info *revs, const char *name,
>                                     const struct object_id *oid,
>                                     unsigned int flags)
> {
>         struct object *object;
> 
>         /*
>          * If the repository has commit graphs, repo_parse_commit() avoids
>          * reading the object buffer, so use it whenever possible.
>          */
>         if (oid_object_info(revs->repo, oid, NULL) == OBJ_COMMIT) {
>                 struct commit *c = lookup_commit(revs->repo, oid);
>                 if (!repo_parse_commit(revs->repo, c))
>                         object = (struct object *) c;
>                 else
>                         object = NULL;
>         } else {
>                 object = parse_object(revs->repo, oid);
>         }

No return statements at all prior to this line.

>         if (!object) {
>                 if (revs->ignore_missing)
>                         return object;

Return NULL (the value of object).

>                 if (revs->exclude_promisor_objects && is_promisor_object(oid))
>                         return NULL;

Return NULL.

>                 die("bad object %s", name);

Die (so this function invocation never returns). In conclusion, if
object is NULL at this point in time, get_reference() either returns
NULL or dies.

>         }

Since get_reference() did not return NULL or die, object is non-NULL
here.

>         object->flags |= flags;
>         return object;

Nothing has overridden object since, so we're returning non-NULL here.

> }

So I think get_reference() only returns NULL in those two safe cases.
(Or did I miss something?)

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] revision: allow missing promisor objects on CLI
  2019-12-30 18:38   ` Jonathan Tan
@ 2019-12-30 20:33     ` Junio C Hamano
  0 siblings, 0 replies; 12+ messages in thread
From: Junio C Hamano @ 2019-12-30 20:33 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git, matvore

Jonathan Tan <jonathantanmy@google.com> writes:

>> Jonathan Tan <jonathantanmy@google.com> writes:
>> 
>> >  	object = get_reference(revs, arg, &oid, flags ^ local_flags);
>> >  	if (!object)
>> > -		return revs->ignore_missing ? 0 : -1;
>> > +		/*
>> > +		 * Either this object is missing and ignore_missing is true, or
>> > +		 * this object is a (missing) promisor object and
>> > +		 * exclude_promisor_objects is true.
>> 
>> I had to guess and dig where these assertions are coming from; we
>> should not force future readers of the code to.
>> 
>> At least this comment must say why these assertions hold.  Say
>> something like "get_reference() yields NULL on only such and such
>> cases" before concluding with "and in any of these cases, we can
>> safely ignore it because ...".
>
> OK, will do.
>
>> I think the two cases the comment covers are safe for this caller to
>> silently return 0.  Another case get_reference() yields NULL is when
>> oid_object_info() says it is a commit but it turns out that the
>> object is found by repo_parse_commit() to be a non-commit, isn't it?
>> I am not sure if it is safe for this caller to just return 0.  There
>> may be some other "unusual-but-not-fatal" cases where get_reference()
>> does not hit a die() but returns NULL.
>
> I don't think there is any other case where get_reference() yields NULL,
> at least where I based my patch (99c33bed56 ("Git 2.25-rc0",
> 2019-12-25)). Quoting the entire get_reference():
>
>> static struct object *get_reference(struct rev_info *revs, const char *name,
>>                                     const struct object_id *oid,
>>                                     unsigned int flags)
>> {
>>         struct object *object;
>> 
>>         /*
>>          * If the repository has commit graphs, repo_parse_commit() avoids
>>          * reading the object buffer, so use it whenever possible.
>>          */
>>         if (oid_object_info(revs->repo, oid, NULL) == OBJ_COMMIT) {
>>                 struct commit *c = lookup_commit(revs->repo, oid);
>>                 if (!repo_parse_commit(revs->repo, c))
>>                         object = (struct object *) c;
>>                 else
>>                         object = NULL;

This is the case where oid must be COMMIT from oid_object_info()'s
point of view, but repo_parse_commit() finds it as a non-commit, and
object becomes NULL.  This is quite different from the normal lazy
clone case where exclude-promisor-objects etc. wants to cover, that
the object whose name is oid is truly missing because it can be
fetched later from elsewhere.  Instead, we have found that there is
an inconsistency in the data we have about the object, iow, a
possible corruption.

>>         if (!object) {
>>                 if (revs->ignore_missing)
>>                         return object;
>
> Return NULL (the value of object).
>
>>                 if (revs->exclude_promisor_objects && is_promisor_object(oid))
>>                         return NULL;
>
> Return NULL.
>
>>                 die("bad object %s", name);
>
> Die (so this function invocation never returns). In conclusion, if
> object is NULL at this point in time, get_reference() either returns
> NULL or dies.

And when !object, this does not die if

 - ignore-missing is in effect, or
 - exclude-promisor is in effect and this is a promisor object that
   is missing from the local repository.

and instead return NULL.

It just felt funny that the "we found something fishy about the
asked-for object" case is treated the same way for the purpose of
ignore-missing and exclude-promisor.  The asked-for objet is
certainly not missing (i.e. we know more than we want to know about
the object---some part of our database says it is a commit and some
other part says it is not).

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH v2] revision: allow missing promisor objects on CLI
  2019-12-28  0:34 [PATCH] revision: allow missing promisor objects on CLI Jonathan Tan
  2019-12-28  3:50 ` Junio C Hamano
@ 2019-12-30 23:44 ` Jonathan Tan
  2019-12-31  0:09   ` Jonathan Nieder
  2020-01-11 22:34 ` [PATCH v3 0/2] Un-regress rev-list --exclude-promisor-objects Jonathan Tan
  2 siblings, 1 reply; 12+ messages in thread
From: Jonathan Tan @ 2019-12-30 23:44 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, gitster

Commit 4cf67869b2 ("list-objects.c: don't segfault for missing cmdline
objects", 2018-12-06) prevented some segmentation faults from occurring
by tightening handling of missing objects provided through the CLI: if
--ignore-missing is set, then it is OK (and the missing object ignored,
just like one would if encountered in traversal).

However, in the case that --ignore-missing is not set but
--exclude-promisor-objects is set, there is still no distinction between
the case wherein the missing object is a promisor object and the case
wherein it is not. This is unnecessarily restrictive, since if a missing
promisor object is encountered in traversal, it is ignored; likewise it
should be ignored if provided through the CLI. Therefore, distinguish
between these 2 cases. (As a bonus, the code is now simpler.)

(Note that this only affects handling of missing promisor objects.
Handling of non-missing promisor objects is already done by setting all
of them to UNINTERESTING in prepare_revision_walk().)

Additionally, clarify in get_reference() that error messages are already
being printed by the functions called (parse_object(),
repo_parse_commit(), and parse_commit_buffer() - invoked by the latter).

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
Changes from v1: Improved code comments and commit message

> This is the case where oid must be COMMIT from oid_object_info()'s
> point of view, but repo_parse_commit() finds it as a non-commit, and
> object becomes NULL.  This is quite different from the normal lazy
> clone case where exclude-promisor-objects etc. wants to cover, that
> the object whose name is oid is truly missing because it can be
> fetched later from elsewhere.  Instead, we have found that there is
> an inconsistency in the data we have about the object, iow, a
> possible corruption.

Thanks! I should have looked at the first half of get_reference() more
carefully.

If there is corruption in the form of hash mismatch, parse_object() will
print a message and then return NULL, leaving get_reference() to handle
it - and treat it as missing in this case. It seems reasonable to me to
handle the repo_parse_commit() failure in a similar way. I've added
comments to clarify that error messages are being printed.
---
 revision.c               | 23 ++++++++++++++++++++++-
 t/t0410-partial-clone.sh | 10 ++--------
 2 files changed, 24 insertions(+), 9 deletions(-)

diff --git a/revision.c b/revision.c
index 8136929e23..af1e31b4fc 100644
--- a/revision.c
+++ b/revision.c
@@ -370,8 +370,18 @@ static struct object *get_reference(struct rev_info *revs, const char *name,
 		if (!repo_parse_commit(revs->repo, c))
 			object = (struct object *) c;
 		else
+			/*
+			 * There is something wrong with the commit.
+			 * repo_parse_commit() will have already printed an
+			 * error message. For our purposes, treat as missing.
+			 */
 			object = NULL;
 	} else {
+		/*
+		 * There is something wrong with the object. parse_object()
+		 * will have already printed an error message. For our
+		 * purposes, treat as missing.
+		 */
 		object = parse_object(revs->repo, oid);
 	}
 
@@ -1907,7 +1917,18 @@ int handle_revision_arg(const char *arg_, struct rev_info *revs, int flags, unsi
 		verify_non_filename(revs->prefix, arg);
 	object = get_reference(revs, arg, &oid, flags ^ local_flags);
 	if (!object)
-		return revs->ignore_missing ? 0 : -1;
+		/*
+		 * If this object is corrupt, get_reference() prints an error
+		 * message and treats it as missing.
+		 *
+		 * get_reference() returns NULL only if this object is missing
+		 * and ignore_missing is true, or this object is a (missing)
+		 * promisor object and exclude_promisor_objects is true. In
+		 * both these cases, we can safely ignore this object because
+		 * this object will not appear in output and cannot be used as
+		 * a source of UNINTERESTING ancestors (since it is missing).
+		 */
+		return 0;
 	add_rev_cmdline(revs, object, arg_, REV_CMD_REV, flags ^ local_flags);
 	add_pending_object_with_path(revs, object, arg, oc.mode, oc.path);
 	free(oc.path);
diff --git a/t/t0410-partial-clone.sh b/t/t0410-partial-clone.sh
index a3988bd4b8..fd28f5402a 100755
--- a/t/t0410-partial-clone.sh
+++ b/t/t0410-partial-clone.sh
@@ -416,15 +416,9 @@ test_expect_success 'rev-list dies for missing objects on cmd line' '
 	git -C repo config extensions.partialclone "arbitrary string" &&
 
 	for OBJ in "$COMMIT" "$TREE" "$BLOB"; do
-		test_must_fail git -C repo rev-list --objects \
+		git -C repo rev-list --objects \
 			--exclude-promisor-objects "$OBJ" &&
-		test_must_fail git -C repo rev-list --objects-edge-aggressive \
-			--exclude-promisor-objects "$OBJ" &&
-
-		# Do not die or crash when --ignore-missing is passed.
-		git -C repo rev-list --ignore-missing --objects \
-			--exclude-promisor-objects "$OBJ" &&
-		git -C repo rev-list --ignore-missing --objects-edge-aggressive \
+		git -C repo rev-list --objects-edge-aggressive \
 			--exclude-promisor-objects "$OBJ"
 	done
 '
-- 
2.24.1.735.g03f4e72817-goog


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH v2] revision: allow missing promisor objects on CLI
  2019-12-30 23:44 ` [PATCH v2] " Jonathan Tan
@ 2019-12-31  0:09   ` Jonathan Nieder
  2020-01-02 20:49     ` Jonathan Tan
  0 siblings, 1 reply; 12+ messages in thread
From: Jonathan Nieder @ 2019-12-31  0:09 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git, gitster

Jonathan Tan wrote:

> Commit 4cf67869b2 ("list-objects.c: don't segfault for missing cmdline
> objects", 2018-12-06) prevented some segmentation faults from occurring
> by tightening handling of missing objects provided through the CLI: if
> --ignore-missing is set, then it is OK (and the missing object ignored,
> just like one would if encountered in traversal).
>
> However, in the case that --ignore-missing is not set but
> --exclude-promisor-objects is set, there is still no distinction between
> the case wherein the missing object is a promisor object and the case
> wherein it is not. This is unnecessarily restrictive, since if a missing
> promisor object is encountered in traversal, it is ignored; likewise it
> should be ignored if provided through the CLI. Therefore, distinguish
> between these 2 cases. (As a bonus, the code is now simpler.)

nit about tenses, not worth a reroll on its own: "As a bonus, this
makes the code simpler" (since commit messages describe the status quo
before the patch in the present tense).

[...]
> --- a/revision.c
> +++ b/revision.c
> @@ -370,8 +370,18 @@ static struct object *get_reference(struct rev_info *revs, const char *name,
>  		if (!repo_parse_commit(revs->repo, c))
>  			object = (struct object *) c;
>  		else
> +			/*
> +			 * There is something wrong with the commit.
> +			 * repo_parse_commit() will have already printed an
> +			 * error message. For our purposes, treat as missing.
> +			 */
>  			object = NULL;
>  	} else {
> +		/*
> +		 * There is something wrong with the object. parse_object()

If we got here, we are in the !commit case, which is not inherently wrong,
right?  Is the intent something like the following?

	if (oid_object_info(...) == OBJ_COMMIT && !repo_parse_commit(...)) {
		... good ...
	} else if (parse_object(...)) {
		... good ...
	} else {
		/*
		 * An error occured while parsing the object.
		 * parse_commit or parse_object has already printed an
		 * error message.  For our purposes, it's safe to
		 * assume the object as missing.
		 */
		object = NULL;
	}

This might be easiest to understand as a separate mini-function.
Something like

 /*
  * Like parse_object, but optimized by reading commits from the
  * commit graph.
  *
  * If the repository has commit graphs, repo_parse_commit() avoids
  * reading the object buffer, so use it whenever possible.
  */
 static struct object *parse_object_probably_commit(
		struct repository *r, const struct object_id *oid)
 {
	struct commit *c;

	if (oid_object_info(r, oid, NULL) != OBJ_COMMIT)
		return parse_object(r, oid);

	c = lookup_commit(r, oid);
	if (repo_parse_commit(r, c))
		return NULL;
	return (struct object *) c;
 }

 static struct object *get_reference(struct rev_info *revs, ...)
 {
	struct object *object = parse_object_probably_commit(revs->repo, oid);
	if (!object)
		/*
		 * An error occured while parsing the object.
		 * parse_object_probably_commit has already printed an
		 * error message.  For our purposes, it's safe to
		 * assume the object as missing.
		 */
		;

By the way, why doesn't parse_object itself check the commit graph for
commit objects?

[...]
> @@ -1907,7 +1917,18 @@ int handle_revision_arg(const char *arg_, struct rev_info *revs, int flags, unsi
>  		verify_non_filename(revs->prefix, arg);
>  	object = get_reference(revs, arg, &oid, flags ^ local_flags);
>  	if (!object)
> -		return revs->ignore_missing ? 0 : -1;
> +		/*
> +		 * If this object is corrupt, get_reference() prints an error
> +		 * message and treats it as missing.

By "and treats it as missing" does this mean "and we should treat it
as missing"?  (Forgive my ignorance.)

Why do we treat corrupt objects as missing?  Is this for graceful
degredation when trying to recover data from a corrupt repository?

Thanks,
Jonathan

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2] revision: allow missing promisor objects on CLI
  2019-12-31  0:09   ` Jonathan Nieder
@ 2020-01-02 20:49     ` Jonathan Tan
  0 siblings, 0 replies; 12+ messages in thread
From: Jonathan Tan @ 2020-01-02 20:49 UTC (permalink / raw)
  To: jrnieder; +Cc: jonathantanmy, git, gitster

> > However, in the case that --ignore-missing is not set but
> > --exclude-promisor-objects is set, there is still no distinction between
> > the case wherein the missing object is a promisor object and the case
> > wherein it is not. This is unnecessarily restrictive, since if a missing
> > promisor object is encountered in traversal, it is ignored; likewise it
> > should be ignored if provided through the CLI. Therefore, distinguish
> > between these 2 cases. (As a bonus, the code is now simpler.)
> 
> nit about tenses, not worth a reroll on its own: "As a bonus, this
> makes the code simpler" (since commit messages describe the status quo
> before the patch in the present tense).

OK.

> > --- a/revision.c
> > +++ b/revision.c
> > @@ -370,8 +370,18 @@ static struct object *get_reference(struct rev_info *revs, const char *name,
> >  		if (!repo_parse_commit(revs->repo, c))
> >  			object = (struct object *) c;
> >  		else
> > +			/*
> > +			 * There is something wrong with the commit.
> > +			 * repo_parse_commit() will have already printed an
> > +			 * error message. For our purposes, treat as missing.
> > +			 */
> >  			object = NULL;
> >  	} else {
> > +		/*
> > +		 * There is something wrong with the object. parse_object()
> 
> If we got here, we are in the !commit case, which is not inherently wrong,
> right?

[snip]

Ah, good catch. It should be "If parse_object() returns NULL, ..."

> By the way, why doesn't parse_object itself check the commit graph for
> commit objects?

Because that's how I coded it in ec0c5798ee ("revision: use commit graph
in get_reference()", 2018-12-28). In the commit message, I said:

> Another way to accomplish this effect would be to modify parse_object()
> to use the commit graph if possible; however, I did not want to change
> parse_object()'s current behavior of always checking the object
> signature of the returned object.

But that doesn't mean that we cannot change it.

> By "and treats it as missing" does this mean "and we should treat it
> as missing"?  (Forgive my ignorance.)

I don't fully know if we should, hence my specific wording :-P but see
my answer to your next question.

> Why do we treat corrupt objects as missing?  Is this for graceful
> degredation when trying to recover data from a corrupt repository?

This (at least, treating wrong-hash objects the same as missing) has
been true since acdeec62cb ("Don't ever return corrupt objects from
"parse_object()"", 2007-03-20) (and that commit message has no
explanation). I think this is best - I consider corrupt objects to be
similar to missing objects with respect to repository corruption, so for
me it is logical to treat them the same way.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH v3 0/2] Un-regress rev-list --exclude-promisor-objects
  2019-12-28  0:34 [PATCH] revision: allow missing promisor objects on CLI Jonathan Tan
  2019-12-28  3:50 ` Junio C Hamano
  2019-12-30 23:44 ` [PATCH v2] " Jonathan Tan
@ 2020-01-11 22:34 ` Jonathan Tan
  2020-01-11 22:34   ` [PATCH v3 1/2] revision: document get_reference() Jonathan Tan
  2020-01-11 22:34   ` [PATCH v3 2/2] revision: un-regress --exclude-promisor-objects Jonathan Tan
  2 siblings, 2 replies; 12+ messages in thread
From: Jonathan Tan @ 2020-01-11 22:34 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, gitster, jrnieder

I took another look at this and tried to simplify things. The main
points were:

 - there is a real bug
 - it can be fixed by relying on get_reference() more
    - but there was some discussion about what get_reference() does, so
      I added some documentation first

Hopefully those main points were adequately conveyed in the new commit
messages, and I didn't oversimplify things.

There was some discussion about whether get_reference() should treat
corrupt objects as missing. After some thought, I think the best
argument for doing so is that this has been its behavior for some time,
and have wrote that in the first commit.

Jonathan Tan (2):
  revision: document get_reference()
  revision: un-regress --exclude-promisor-objects

 revision.c               | 12 +++++++++++-
 t/t0410-partial-clone.sh | 12 +++---------
 2 files changed, 14 insertions(+), 10 deletions(-)

-- 
2.25.0.rc1.283.g88dfdc4193-goog


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH v3 1/2] revision: document get_reference()
  2020-01-11 22:34 ` [PATCH v3 0/2] Un-regress rev-list --exclude-promisor-objects Jonathan Tan
@ 2020-01-11 22:34   ` Jonathan Tan
  2020-03-25 20:46     ` Emily Shaffer
  2020-01-11 22:34   ` [PATCH v3 2/2] revision: un-regress --exclude-promisor-objects Jonathan Tan
  1 sibling, 1 reply; 12+ messages in thread
From: Jonathan Tan @ 2020-01-11 22:34 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, gitster, jrnieder

In particular, document the behavior when the object is corrupt. The
existing behavior when parse_object() encounters a hash mismatch has
been there since cc243c3ceb ("show: --ignore-missing", 2011-05-19), and
the existing behavior when the code disagrees on whether an object is a
commit has been there since ec0c5798ee ("revision: use commit graph in
get_reference()", 2018-12-28).
---
 revision.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/revision.c b/revision.c
index 8136929e23..91ca194388 100644
--- a/revision.c
+++ b/revision.c
@@ -355,6 +355,16 @@ void add_head_to_pending(struct rev_info *revs)
 	add_pending_object(revs, obj, "HEAD");
 }
 
+/*
+ * Returns the object corresponding to "oid" and sets the given flags on
+ * it.
+ *
+ * If that object is missing or corrupt, this function returns NULL if
+ * "revs" permits it (that is, if revs->ignore_missing is true or if
+ * revs->exclude_promisor_objects is true and the object is a promisor
+ * object), and dies otherwise. Note that corrupt objects are treated
+ * like missing objects, to preserve existing behavior.
+ */
 static struct object *get_reference(struct rev_info *revs, const char *name,
 				    const struct object_id *oid,
 				    unsigned int flags)
-- 
2.25.0.rc1.283.g88dfdc4193-goog


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v3 2/2] revision: un-regress --exclude-promisor-objects
  2020-01-11 22:34 ` [PATCH v3 0/2] Un-regress rev-list --exclude-promisor-objects Jonathan Tan
  2020-01-11 22:34   ` [PATCH v3 1/2] revision: document get_reference() Jonathan Tan
@ 2020-01-11 22:34   ` Jonathan Tan
  2020-03-25 20:50     ` Emily Shaffer
  1 sibling, 1 reply; 12+ messages in thread
From: Jonathan Tan @ 2020-01-11 22:34 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, gitster, jrnieder

Before commit 4cf67869b2 ("list-objects.c: don't segfault for missing
cmdline objects", 2018-12-06),

  git rev-list --exclude-promisor-objects $A_MISSING_PROMISOR_OBJECT

succeeds. But after that commit, this invocation produces a non-zero
result.

Restore this functionality: since get_reference() already does what we
need, we can just use its return value; skip the arg if the return value
is NULL, and use it otherwise (if the arg is invalid, get_reference()
already dies). With this commit, --exclude-promisor-objects treats both
promisor objects passed through the CLI and promisor objects found
through traversal in the same say: it excludes them, so it does not
matter whether they're missing or not.
---
 revision.c               |  2 +-
 t/t0410-partial-clone.sh | 12 +++---------
 2 files changed, 4 insertions(+), 10 deletions(-)

diff --git a/revision.c b/revision.c
index 91ca194388..0659a09b02 100644
--- a/revision.c
+++ b/revision.c
@@ -1917,7 +1917,7 @@ int handle_revision_arg(const char *arg_, struct rev_info *revs, int flags, unsi
 		verify_non_filename(revs->prefix, arg);
 	object = get_reference(revs, arg, &oid, flags ^ local_flags);
 	if (!object)
-		return revs->ignore_missing ? 0 : -1;
+		return 0;
 	add_rev_cmdline(revs, object, arg_, REV_CMD_REV, flags ^ local_flags);
 	add_pending_object_with_path(revs, object, arg, oc.mode, oc.path);
 	free(oc.path);
diff --git a/t/t0410-partial-clone.sh b/t/t0410-partial-clone.sh
index a3988bd4b8..b251985e82 100755
--- a/t/t0410-partial-clone.sh
+++ b/t/t0410-partial-clone.sh
@@ -397,7 +397,7 @@ test_expect_success 'rev-list stops traversal at promisor commit, tree, and blob
 	grep $(git -C repo rev-parse bar) out  # sanity check that some walking was done
 '
 
-test_expect_success 'rev-list dies for missing objects on cmd line' '
+test_expect_success 'rev-list accepts missing and promised objects on command line ' '
 	rm -rf repo &&
 	test_create_repo repo &&
 	test_commit -C repo foo &&
@@ -416,15 +416,9 @@ test_expect_success 'rev-list dies for missing objects on cmd line' '
 	git -C repo config extensions.partialclone "arbitrary string" &&
 
 	for OBJ in "$COMMIT" "$TREE" "$BLOB"; do
-		test_must_fail git -C repo rev-list --objects \
+		git -C repo rev-list --objects \
 			--exclude-promisor-objects "$OBJ" &&
-		test_must_fail git -C repo rev-list --objects-edge-aggressive \
-			--exclude-promisor-objects "$OBJ" &&
-
-		# Do not die or crash when --ignore-missing is passed.
-		git -C repo rev-list --ignore-missing --objects \
-			--exclude-promisor-objects "$OBJ" &&
-		git -C repo rev-list --ignore-missing --objects-edge-aggressive \
+		git -C repo rev-list --objects-edge-aggressive \
 			--exclude-promisor-objects "$OBJ"
 	done
 '
-- 
2.25.0.rc1.283.g88dfdc4193-goog


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH v3 1/2] revision: document get_reference()
  2020-01-11 22:34   ` [PATCH v3 1/2] revision: document get_reference() Jonathan Tan
@ 2020-03-25 20:46     ` Emily Shaffer
  0 siblings, 0 replies; 12+ messages in thread
From: Emily Shaffer @ 2020-03-25 20:46 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git, gitster, jrnieder

On Sat, Jan 11, 2020 at 02:34:55PM -0800, Jonathan Tan wrote:
> In particular, document the behavior when the object is corrupt. The
> existing behavior when parse_object() encounters a hash mismatch has
> been there since cc243c3ceb ("show: --ignore-missing", 2011-05-19), and
> the existing behavior when the code disagrees on whether an object is a
> commit has been there since ec0c5798ee ("revision: use commit graph in
> get_reference()", 2018-12-28).

Does "disagreement on whether an object is a commit" count as corrupt
object? Otherwise I'm not seeing mention of it in the comment that was
added.

> ---
>  revision.c | 10 ++++++++++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/revision.c b/revision.c
> index 8136929e23..91ca194388 100644
> --- a/revision.c
> +++ b/revision.c
> @@ -355,6 +355,16 @@ void add_head_to_pending(struct rev_info *revs)
>  	add_pending_object(revs, obj, "HEAD");
>  }
>  
> +/*
> + * Returns the object corresponding to "oid" and sets the given flags on
> + * it.
> + *
> + * If that object is missing or corrupt, this function returns NULL if
> + * "revs" permits it (that is, if revs->ignore_missing is true or if
> + * revs->exclude_promisor_objects is true and the object is a promisor
> + * object), and dies otherwise. Note that corrupt objects are treated

The parenthetical is hard to parse. Is it "(revs->ignore_missing is
true) or (revs->exclude_promisor_objects is true and the object is a
promisor)" or something else? Maybe an extra comma,

  (that is, if revs->ignore_missing is true, or if
  revs->exclude_promisor_objects is true and the object is a promisor
  object)

> + * like missing objects, to preserve existing behavior.
> + */
>  static struct object *get_reference(struct rev_info *revs, const char *name,
>  				    const struct object_id *oid,
>  				    unsigned int flags)
> -- 
> 2.25.0.rc1.283.g88dfdc4193-goog
> 

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v3 2/2] revision: un-regress --exclude-promisor-objects
  2020-01-11 22:34   ` [PATCH v3 2/2] revision: un-regress --exclude-promisor-objects Jonathan Tan
@ 2020-03-25 20:50     ` Emily Shaffer
  0 siblings, 0 replies; 12+ messages in thread
From: Emily Shaffer @ 2020-03-25 20:50 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git, gitster, jrnieder

On Sat, Jan 11, 2020 at 02:34:56PM -0800, Jonathan Tan wrote:
> Before commit 4cf67869b2 ("list-objects.c: don't segfault for missing
> cmdline objects", 2018-12-06),
> 
>   git rev-list --exclude-promisor-objects $A_MISSING_PROMISOR_OBJECT
> 
> succeeds. But after that commit, this invocation produces a non-zero
> result.
> 
> Restore this functionality: since get_reference() already does what we
> need, we can just use its return value; skip the arg if the return value
> is NULL, and use it otherwise (if the arg is invalid, get_reference()
> already dies). With this commit, --exclude-promisor-objects treats both
> promisor objects passed through the CLI and promisor objects found
> through traversal in the same say: it excludes them, so it does not
> matter whether they're missing or not.

Since the return code is changing I'm kind of worried about what other
impacts this will have. What's the call tree for handle_revision_arg
look like?

It looks like it's called in a couple places by revision.c and by
builtin/pack-objects.c, but because of the way args are packed up in
struct rev_info, it's hard to tell when those callers will be affected
or not.

> ---
>  revision.c               |  2 +-
>  t/t0410-partial-clone.sh | 12 +++---------
>  2 files changed, 4 insertions(+), 10 deletions(-)
> 
> diff --git a/revision.c b/revision.c
> index 91ca194388..0659a09b02 100644
> --- a/revision.c
> +++ b/revision.c
> @@ -1917,7 +1917,7 @@ int handle_revision_arg(const char *arg_, struct rev_info *revs, int flags, unsi
>  		verify_non_filename(revs->prefix, arg);
>  	object = get_reference(revs, arg, &oid, flags ^ local_flags);
>  	if (!object)
> -		return revs->ignore_missing ? 0 : -1;
> +		return 0;
>  	add_rev_cmdline(revs, object, arg_, REV_CMD_REV, flags ^ local_flags);
>  	add_pending_object_with_path(revs, object, arg, oc.mode, oc.path);
>  	free(oc.path);
> diff --git a/t/t0410-partial-clone.sh b/t/t0410-partial-clone.sh
> index a3988bd4b8..b251985e82 100755
> --- a/t/t0410-partial-clone.sh
> +++ b/t/t0410-partial-clone.sh
> @@ -397,7 +397,7 @@ test_expect_success 'rev-list stops traversal at promisor commit, tree, and blob
>  	grep $(git -C repo rev-parse bar) out  # sanity check that some walking was done
>  '
>  
> -test_expect_success 'rev-list dies for missing objects on cmd line' '
> +test_expect_success 'rev-list accepts missing and promised objects on command line ' '
>  	rm -rf repo &&
>  	test_create_repo repo &&
>  	test_commit -C repo foo &&
> @@ -416,15 +416,9 @@ test_expect_success 'rev-list dies for missing objects on cmd line' '
>  	git -C repo config extensions.partialclone "arbitrary string" &&
>  
>  	for OBJ in "$COMMIT" "$TREE" "$BLOB"; do
> -		test_must_fail git -C repo rev-list --objects \
> +		git -C repo rev-list --objects \
>  			--exclude-promisor-objects "$OBJ" &&
> -		test_must_fail git -C repo rev-list --objects-edge-aggressive \
> -			--exclude-promisor-objects "$OBJ" &&
> -
> -		# Do not die or crash when --ignore-missing is passed.
> -		git -C repo rev-list --ignore-missing --objects \
> -			--exclude-promisor-objects "$OBJ" &&
> -		git -C repo rev-list --ignore-missing --objects-edge-aggressive \
> +		git -C repo rev-list --objects-edge-aggressive \
>  			--exclude-promisor-objects "$OBJ"

It seems to me the -ignore-missing tests should still pass and therefore
shouldn't be removed, no? But then I looked a little harder, and it
looks like before the test said, "This call fails, but with
--ignore-missing it does not fail" - and now the test just says "This
call does not fail". So it looks OK to me.

>  	done
>  '
> -- 
> 2.25.0.rc1.283.g88dfdc4193-goog
> 

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2020-03-25 20:50 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-12-28  0:34 [PATCH] revision: allow missing promisor objects on CLI Jonathan Tan
2019-12-28  3:50 ` Junio C Hamano
2019-12-30 18:38   ` Jonathan Tan
2019-12-30 20:33     ` Junio C Hamano
2019-12-30 23:44 ` [PATCH v2] " Jonathan Tan
2019-12-31  0:09   ` Jonathan Nieder
2020-01-02 20:49     ` Jonathan Tan
2020-01-11 22:34 ` [PATCH v3 0/2] Un-regress rev-list --exclude-promisor-objects Jonathan Tan
2020-01-11 22:34   ` [PATCH v3 1/2] revision: document get_reference() Jonathan Tan
2020-03-25 20:46     ` Emily Shaffer
2020-01-11 22:34   ` [PATCH v3 2/2] revision: un-regress --exclude-promisor-objects Jonathan Tan
2020-03-25 20:50     ` Emily Shaffer

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).