git@vger.kernel.org list mirror (unofficial, one of many)
 help / color / Atom feed
* [PATCH v3 0/2] partial-clone: fix two issues with sparse filter handling
@ 2019-08-29 23:19 Jon Simons
  2019-08-29 23:19 ` [PATCH v3 1/2] list-objects-filter: only parse sparse OID when 'have_git_dir' Jon Simons
  2019-08-29 23:19 ` [PATCH v3 2/2] list-objects-filter: handle unresolved sparse filter OID Jon Simons
  0 siblings, 2 replies; 17+ messages in thread
From: Jon Simons @ 2019-08-29 23:19 UTC (permalink / raw)
  To: jon, git; +Cc: me, peff, sunshine, stolee

Included here are two fixes for partial cloning with sparse filters.
These issues were uncovered in early testing internally at GitHub,
where Taylor and Peff have provided early offlist review feedback.

This third revision includes a fix for a test bug introduced in the
second revision.

Jon Simons (2):
  list-objects-filter: only parse sparse OID when 'have_git_dir'
  list-objects-filter: handle unresolved sparse filter OID

 list-objects-filter-options.c |  3 ++-
 list-objects-filter.c         |  6 +++++-
 t/t5616-partial-clone.sh      | 28 ++++++++++++++++++++++++++++
 3 files changed, 35 insertions(+), 2 deletions(-)

-- 
2.23.0.37.g745f681289.dirty


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH v3 1/2] list-objects-filter: only parse sparse OID when 'have_git_dir'
  2019-08-29 23:19 [PATCH v3 0/2] partial-clone: fix two issues with sparse filter handling Jon Simons
@ 2019-08-29 23:19 ` Jon Simons
  2019-08-30 18:08   ` Junio C Hamano
  2019-08-29 23:19 ` [PATCH v3 2/2] list-objects-filter: handle unresolved sparse filter OID Jon Simons
  1 sibling, 1 reply; 17+ messages in thread
From: Jon Simons @ 2019-08-29 23:19 UTC (permalink / raw)
  To: jon, git; +Cc: me, peff, sunshine, stolee

Fix a bug in partial cloning with sparse filters by ensuring to check
for 'have_git_dir' before attempting to resolve the sparse filter OID.

Otherwise the client will trigger:

    BUG: refs.c:1851: attempting to get main_ref_store outside of repository

when attempting to git clone with a sparse filter.

Note that this fix is the minimal one which avoids the BUG and allows
for the clone to complete successfully:

There is an open question as to whether there should be any attempt
to resolve the OID provided by the client in this context, as a filter
for the clone to be used on the remote side.  For cases where local
and remote OID resolutions differ, resolving on the client side could
be considered a bug.  For now, the minimal approach here is used to
unblock further testing for partial clones with sparse filters, while
a more invasive fix could make sense to pursue as a future direction.

t5616 is updated to demonstrate the change.

Signed-off-by: Jon Simons <jon@jonsimons.org>
---
 list-objects-filter-options.c |  3 ++-
 t/t5616-partial-clone.sh      | 21 +++++++++++++++++++++
 2 files changed, 23 insertions(+), 1 deletion(-)

diff --git a/list-objects-filter-options.c b/list-objects-filter-options.c
index 1cb20c659c..aaba312edb 100644
--- a/list-objects-filter-options.c
+++ b/list-objects-filter-options.c
@@ -71,7 +71,8 @@ static int gently_parse_list_objects_filter(
 		 * command, but DO NOT complain if we don't have the blob or
 		 * ref locally.
 		 */
-		if (!get_oid_with_context(the_repository, v0, GET_OID_BLOB,
+		if (have_git_dir() &&
+		    !get_oid_with_context(the_repository, v0, GET_OID_BLOB,
 					  &sparse_oid, &oc))
 			filter_options->sparse_oid_value = oiddup(&sparse_oid);
 		filter_options->choice = LOFC_SPARSE_OID;
diff --git a/t/t5616-partial-clone.sh b/t/t5616-partial-clone.sh
index 565254558f..5c68431d10 100755
--- a/t/t5616-partial-clone.sh
+++ b/t/t5616-partial-clone.sh
@@ -241,6 +241,27 @@ test_expect_success 'fetch what is specified on CLI even if already promised' '
 	! grep "?$(cat blob)" missing_after
 '
 
+test_expect_success 'setup src repo for sparse filter' '
+	git init sparse-src &&
+	git -C sparse-src config --local uploadpack.allowfilter 1 &&
+	git -C sparse-src config --local uploadpack.allowanysha1inwant 1 &&
+	for n in 1 2 3 4
+	do
+		test_commit -C sparse-src "this-is-file-$n" file.$n.txt || return 1
+	done &&
+	test_write_lines /file1.txt /file3.txt >sparse-src/odd-files &&
+	test_write_lines /file2.txt /file4.txt >sparse-src/even-files &&
+	echo "/*" >sparse-src/all-files &&
+	git -C sparse-src add odd-files even-files all-files &&
+	git -C sparse-src commit -m "some sparse checkout files"
+'
+
+test_expect_success 'partial clone with sparse filter succeeds' '
+	git clone --no-local --no-checkout --filter=sparse:oid=master:all-files sparse-src pc-all &&
+	git clone --no-local --no-checkout --filter=sparse:oid=master:even-files sparse-src pc-even &&
+	git clone --no-local --no-checkout --filter=sparse:oid=master:odd-files sparse-src pc-odd
+'
+
 . "$TEST_DIRECTORY"/lib-httpd.sh
 start_httpd
 
-- 
2.23.0.37.g745f681289.dirty


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH v3 2/2] list-objects-filter: handle unresolved sparse filter OID
  2019-08-29 23:19 [PATCH v3 0/2] partial-clone: fix two issues with sparse filter handling Jon Simons
  2019-08-29 23:19 ` [PATCH v3 1/2] list-objects-filter: only parse sparse OID when 'have_git_dir' Jon Simons
@ 2019-08-29 23:19 ` Jon Simons
  1 sibling, 0 replies; 17+ messages in thread
From: Jon Simons @ 2019-08-29 23:19 UTC (permalink / raw)
  To: jon, git; +Cc: me, peff, sunshine, stolee

Handle a potential NULL 'sparse_oid_value' when attempting to load
sparse filter exclusions by blob, to avoid segfaulting later during
'add_excludes_from_blob_to_list'.

While here, uniquify the errors emitted to distinguish between the
case that a given OID is NULL due to an earlier failure to resolve it,
and when an OID resolves but parsing the sparse filter spec fails.

t5616 is updated to demonstrate the change.

Co-authored-by: Jeff King <peff@peff.net>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Jon Simons <jon@jonsimons.org>
---
 list-objects-filter.c    | 6 +++++-
 t/t5616-partial-clone.sh | 7 +++++++
 2 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/list-objects-filter.c b/list-objects-filter.c
index 36e1f774bc..252fae5d4e 100644
--- a/list-objects-filter.c
+++ b/list-objects-filter.c
@@ -464,9 +464,13 @@ static void *filter_sparse_oid__init(
 {
 	struct filter_sparse_data *d = xcalloc(1, sizeof(*d));
 	d->omits = omitted;
+	if (!filter_options->sparse_oid_value)
+		die(_("unable to read sparse filter specification from %s"),
+		      filter_options->filter_spec);
 	if (add_excludes_from_blob_to_list(filter_options->sparse_oid_value,
 					   NULL, 0, &d->el) < 0)
-		die("could not load filter specification");
+		die(_("unable to parse sparse filter data in %s"),
+		      oid_to_hex(filter_options->sparse_oid_value));
 
 	ALLOC_GROW(d->array_frame, d->nr + 1, d->alloc);
 	d->array_frame[d->nr].defval = 0; /* default to include */
diff --git a/t/t5616-partial-clone.sh b/t/t5616-partial-clone.sh
index 5c68431d10..d01886c464 100755
--- a/t/t5616-partial-clone.sh
+++ b/t/t5616-partial-clone.sh
@@ -262,6 +262,13 @@ test_expect_success 'partial clone with sparse filter succeeds' '
 	git clone --no-local --no-checkout --filter=sparse:oid=master:odd-files sparse-src pc-odd
 '
 
+test_expect_success 'partial clone with unresolvable sparse filter fails cleanly' '
+	test_must_fail git clone --no-local --no-checkout --filter=sparse:oid=master:sparse-filter sparse-src sc1 2>err &&
+	test_i18ngrep "unable to read sparse filter specification from sparse:oid=master:sparse-filter" err &&
+	test_must_fail git clone --no-local --no-checkout --filter=sparse:oid=master sparse-src sc2 2>err &&
+	test_i18ngrep "unable to parse sparse filter data in $(git -C sparse-src rev-parse master)" err
+'
+
 . "$TEST_DIRECTORY"/lib-httpd.sh
 start_httpd
 
-- 
2.23.0.37.g745f681289.dirty


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v3 1/2] list-objects-filter: only parse sparse OID when 'have_git_dir'
  2019-08-29 23:19 ` [PATCH v3 1/2] list-objects-filter: only parse sparse OID when 'have_git_dir' Jon Simons
@ 2019-08-30 18:08   ` Junio C Hamano
  2019-09-04  4:54     ` Jeff King
  0 siblings, 1 reply; 17+ messages in thread
From: Junio C Hamano @ 2019-08-30 18:08 UTC (permalink / raw)
  To: Jon Simons; +Cc: git, me, peff, sunshine, stolee

Jon Simons <jon@jonsimons.org> writes:

> diff --git a/list-objects-filter-options.c b/list-objects-filter-options.c
> index 1cb20c659c..aaba312edb 100644
> --- a/list-objects-filter-options.c
> +++ b/list-objects-filter-options.c
> @@ -71,7 +71,8 @@ static int gently_parse_list_objects_filter(
>  		 * command, but DO NOT complain if we don't have the blob or
>  		 * ref locally.
>  		 */
> -		if (!get_oid_with_context(the_repository, v0, GET_OID_BLOB,
> +		if (have_git_dir() &&
> +		    !get_oid_with_context(the_repository, v0, GET_OID_BLOB,
>  					  &sparse_oid, &oc))
>  			filter_options->sparse_oid_value = oiddup(&sparse_oid);
>  		filter_options->choice = LOFC_SPARSE_OID;

Sorry, I do not quite understand what this wants to do.  We say "we
parsed this correctly, this filter is sparse:oid=<blob>" without
filling sparse_oid_value field at all.  What do the consumers of
such a filter_options instance do to such a half-read option?

If they say "ah, the parser wanted to do sparse:oid but we couldn't
really figure the <blob> part out, so let's ignore it", that might
be the best they could do anyway, but it somewhat feels iffy.  I
wonder if we are better off inventing a new "we tried to parse but
we lack sufficient info to make it useful" value to use in .choice
field and return it instead.

In the longer term, what do we want to happen in such a case where
"read this blob to figure out what I want you to do" cannot be
satisfied due to chicken-and-egg situation like this?  Can we
postpone fetching or cloning that *wants* to use the named <blob>
when we discover that the <blob> is not available (of which, your
"!have_git_dir()" is a subset), grab that single <blob> first from
the other side before doing the main transfer, and then resume the
transfer that wants to use the <blob> after we successfully do so,
or something?

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v3 1/2] list-objects-filter: only parse sparse OID when 'have_git_dir'
  2019-08-30 18:08   ` Junio C Hamano
@ 2019-09-04  4:54     ` Jeff King
  2019-09-05 18:57       ` Junio C Hamano
  0 siblings, 1 reply; 17+ messages in thread
From: Jeff King @ 2019-09-04  4:54 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jeff Hostetler, Jon Simons, git, me, sunshine, stolee

On Fri, Aug 30, 2019 at 11:08:34AM -0700, Junio C Hamano wrote:

> Jon Simons <jon@jonsimons.org> writes:
> 
> > diff --git a/list-objects-filter-options.c b/list-objects-filter-options.c
> > index 1cb20c659c..aaba312edb 100644
> > --- a/list-objects-filter-options.c
> > +++ b/list-objects-filter-options.c
> > @@ -71,7 +71,8 @@ static int gently_parse_list_objects_filter(
> >  		 * command, but DO NOT complain if we don't have the blob or
> >  		 * ref locally.
> >  		 */
> > -		if (!get_oid_with_context(the_repository, v0, GET_OID_BLOB,
> > +		if (have_git_dir() &&
> > +		    !get_oid_with_context(the_repository, v0, GET_OID_BLOB,
> >  					  &sparse_oid, &oc))
> >  			filter_options->sparse_oid_value = oiddup(&sparse_oid);
> >  		filter_options->choice = LOFC_SPARSE_OID;
> 
> Sorry, I do not quite understand what this wants to do.  We say "we
> parsed this correctly, this filter is sparse:oid=<blob>" without
> filling sparse_oid_value field at all.  What do the consumers of
> such a filter_options instance do to such a half-read option?

Empirically, they either die() or segfault. ;)

I agree the code even before Jon's patch is pretty funny. Forgetting the
"not a repo" case for a moment, we know that get_oid_with_context()
might not find the name. In the case of rev-list, we manually check the
validity of sparse_oid_value later and die().  Which seems like a
layering violation, but at least gives the desired outcome.

In the case of clone, it gets parsed twice:

  1. On the client side, we don't have a repo yet, and so we BUG(). But
     once fixed, we don't care about the value at all! We never use it,
     and instead send the original filter spec over to the server, so at
     most this whole option-parsing is giving us an early syntax check.

  2. On the server side, upload-pack receives the filter spec, parses
     it, but doesn't remember to check whether it's a valid oid. It
     segfaults if it isn't (that's patch 2 here).

So these patches are punting on the greater question of why we want to
parse so early, and are not making anything worse. AFAICT, "clone
--filter=sparse:oid" has never worked (even though our tests did cover
the underlying rev-list and pack-objects code paths).

> If they say "ah, the parser wanted to do sparse:oid but we couldn't
> really figure the <blob> part out, so let's ignore it", that might
> be the best they could do anyway, but it somewhat feels iffy.  I
> wonder if we are better off inventing a new "we tried to parse but
> we lack sufficient info to make it useful" value to use in .choice
> field and return it instead.

TBH, I'm not sure why the original is so eager to parse early. I guess
it allows:

  - a dual use of the options parser; we can use it both to sanity-check
    the options before sending them to a server, and to actually use the
    filter ourselves.

  - earlier detection maybe gives us a cleaner error path (e.g.,
    rev-list can do its own error handling). But I'd think doing it when
    we actually initialize the filter would be enough.

I.e., if we want to go all the way, I think this two-patch series could
basically be replaced with something like the (totally untested)
approach below, which just pushes the parsing closer to the
point-of-use.

Adding Jeff Hostetler to the cc, in case he recalls any reason not to
use that approach.

> In the longer term, what do we want to happen in such a case where
> "read this blob to figure out what I want you to do" cannot be
> satisfied due to chicken-and-egg situation like this?  Can we
> postpone fetching or cloning that *wants* to use the named <blob>
> when we discover that the <blob> is not available (of which, your
> "!have_git_dir()" is a subset), grab that single <blob> first from
> the other side before doing the main transfer, and then resume the
> transfer that wants to use the <blob> after we successfully do so,
> or something?

I don't think there's any chicken-and-egg here. The client side of a
clone does not ever care about what's in sparse_oid_value at all, before
or after (and the name needs to be resolved from the server's view of
the refs, anyway). I think there could be a feature where we do a narrow
clone based on the spec in an oid, and then do a matching sparse
checkout. And that feature would want to make sure it knows how the
server resolved the spec in the first step. But AFAIK we don't yet have
such a feature (and the simplest thing would probably be a capability
where the server tells us the blob oid, and makes sure it is
transmitted).

-Peff

---
diff --git a/builtin/rev-list.c b/builtin/rev-list.c
index 301ccb970b..74dbfad73d 100644
--- a/builtin/rev-list.c
+++ b/builtin/rev-list.c
@@ -471,10 +471,6 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix)
 			parse_list_objects_filter(&filter_options, arg);
 			if (filter_options.choice && !revs.blob_objects)
 				die(_("object filtering requires --objects"));
-			if (filter_options.choice == LOFC_SPARSE_OID &&
-			    !filter_options.sparse_oid_value)
-				die(_("invalid sparse value '%s'"),
-				    filter_options.filter_spec);
 			continue;
 		}
 		if (!strcmp(arg, ("--no-" CL_ARG__FILTER))) {
diff --git a/list-objects-filter-options.c b/list-objects-filter-options.c
index 1cb20c659c..76a9a49a75 100644
--- a/list-objects-filter-options.c
+++ b/list-objects-filter-options.c
@@ -63,17 +63,8 @@ static int gently_parse_list_objects_filter(
 		return 0;
 
 	} else if (skip_prefix(arg, "sparse:oid=", &v0)) {
-		struct object_context oc;
-		struct object_id sparse_oid;
-
-		/*
-		 * Try to parse <oid-expression> into an OID for the current
-		 * command, but DO NOT complain if we don't have the blob or
-		 * ref locally.
-		 */
-		if (!get_oid_with_context(the_repository, v0, GET_OID_BLOB,
-					  &sparse_oid, &oc))
-			filter_options->sparse_oid_value = oiddup(&sparse_oid);
+		/* v0 is durable because it points into our saved filter_spec */
+		filter_options->sparse_oid_name = v0;
 		filter_options->choice = LOFC_SPARSE_OID;
 		return 0;
 
@@ -138,7 +129,6 @@ void list_objects_filter_release(
 	struct list_objects_filter_options *filter_options)
 {
 	free(filter_options->filter_spec);
-	free(filter_options->sparse_oid_value);
 	memset(filter_options, 0, sizeof(*filter_options));
 }
 
diff --git a/list-objects-filter-options.h b/list-objects-filter-options.h
index c54f0000fb..a819e42ffb 100644
--- a/list-objects-filter-options.h
+++ b/list-objects-filter-options.h
@@ -42,7 +42,7 @@ struct list_objects_filter_options {
 	 * choice-specific; not all values will be defined for any given
 	 * choice.
 	 */
-	struct object_id *sparse_oid_value;
+	const char *sparse_oid_name;
 	unsigned long blob_limit_value;
 	unsigned long tree_exclude_depth;
 };
diff --git a/list-objects-filter.c b/list-objects-filter.c
index 36e1f774bc..130c17b95c 100644
--- a/list-objects-filter.c
+++ b/list-objects-filter.c
@@ -463,9 +463,16 @@ static void *filter_sparse_oid__init(
 	filter_free_fn *filter_free_fn)
 {
 	struct filter_sparse_data *d = xcalloc(1, sizeof(*d));
+	struct object_context oc;
+	struct object_id sparse_oid;
+
+	if (!get_oid_with_context(the_repository,
+				  filter_options->sparse_oid_name,
+				  GET_OID_BLOB, &sparse_oid, &oc))
+		die("unable to access sparse blob in '%s'",
+		    filter_options->sparse_oid_name);
 	d->omits = omitted;
-	if (add_excludes_from_blob_to_list(filter_options->sparse_oid_value,
-					   NULL, 0, &d->el) < 0)
+	if (add_excludes_from_blob_to_list(&sparse_oid, NULL, 0, &d->el) < 0)
 		die("could not load filter specification");
 
 	ALLOC_GROW(d->array_frame, d->nr + 1, d->alloc);

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v3 1/2] list-objects-filter: only parse sparse OID when 'have_git_dir'
  2019-09-04  4:54     ` Jeff King
@ 2019-09-05 18:57       ` Junio C Hamano
  2019-09-09 13:54         ` Jeff Hostetler
  0 siblings, 1 reply; 17+ messages in thread
From: Junio C Hamano @ 2019-09-05 18:57 UTC (permalink / raw)
  To: Jeff King; +Cc: Jeff Hostetler, Jon Simons, git, me, sunshine, stolee

Jeff King <peff@peff.net> writes:

> So these patches are punting on the greater question of why we want to
> parse so early, and are not making anything worse. AFAICT, "clone
> --filter=sparse:oid" has never worked (even though our tests did cover
> the underlying rev-list and pack-objects code paths).
> ...
> TBH, I'm not sure why the original is so eager to parse early. I guess
> it allows:
>
>   - a dual use of the options parser; we can use it both to sanity-check
>     the options before sending them to a server, and to actually use the
>     filter ourselves.
>
>   - earlier detection maybe gives us a cleaner error path (e.g.,
>     rev-list can do its own error handling). But I'd think doing it when
>     we actually initialize the filter would be enough.
>
> I.e., if we want to go all the way, I think this two-patch series could
> basically be replaced with something like the (totally untested)
> approach below, which just pushes the parsing closer to the
> point-of-use.
>
> Adding Jeff Hostetler to the cc, in case he recalls any reason not to
> use that approach.

Thanks.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v3 1/2] list-objects-filter: only parse sparse OID when 'have_git_dir'
  2019-09-05 18:57       ` Junio C Hamano
@ 2019-09-09 13:54         ` Jeff Hostetler
  2019-09-09 17:08           ` Jeff King
  2019-09-09 17:12           ` [PATCH v3 1/2] list-objects-filter: only parse sparse OID when 'have_git_dir' Junio C Hamano
  0 siblings, 2 replies; 17+ messages in thread
From: Jeff Hostetler @ 2019-09-09 13:54 UTC (permalink / raw)
  To: Junio C Hamano, Jeff King
  Cc: Jeff Hostetler, Jon Simons, git, me, sunshine, stolee



On 9/5/2019 2:57 PM, Junio C Hamano wrote:
> Jeff King <peff@peff.net> writes:
> 
>> So these patches are punting on the greater question of why we want to
>> parse so early, and are not making anything worse. AFAICT, "clone
>> --filter=sparse:oid" has never worked (even though our tests did cover
>> the underlying rev-list and pack-objects code paths).
>> ...
>> TBH, I'm not sure why the original is so eager to parse early. I guess
>> it allows:
>>
>>    - a dual use of the options parser; we can use it both to sanity-check
>>      the options before sending them to a server, and to actually use the
>>      filter ourselves.
>>
>>    - earlier detection maybe gives us a cleaner error path (e.g.,
>>      rev-list can do its own error handling). But I'd think doing it when
>>      we actually initialize the filter would be enough.
>>
>> I.e., if we want to go all the way, I think this two-patch series could
>> basically be replaced with something like the (totally untested)
>> approach below, which just pushes the parsing closer to the
>> point-of-use.
>>
>> Adding Jeff Hostetler to the cc, in case he recalls any reason not to
>> use that approach.
> 
> Thanks.
> 

I think both of Peff's guesses are correct.

IIRC I wrote the original parse_list_objects_filter() and friends to
syntax check the command line arguments of rev-list.  In hindsight,
this looks a bit aggressive at that layer, or rather now that it is
being used by various places in other commands (such as parsing
messages from the wire), it shouldn't call die() as Peff suggests.

I like the code Peff suggests.  Making parse_list_objects_filter()
a bit simpler and not call die().  Callers should then check the
function return value as necessary.

It would be nice if we could continue to let parse_list_objects_filter()
do the syntax checking -- that is, we can still check that we received a
ulong in "blob:limit:<nr>" and that "sparse:oid:<oid>" looks like a hex
value, for example.  Just save the actual <oid> lookup to the higher
layer, if and when that makes sense.

Jeff


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v3 1/2] list-objects-filter: only parse sparse OID when 'have_git_dir'
  2019-09-09 13:54         ` Jeff Hostetler
@ 2019-09-09 17:08           ` Jeff King
  2019-09-09 20:03             ` Jeff Hostetler
  2019-09-15  1:09             ` [PATCH 0/3] clone --filter=sparse:oid bugs Jeff King
  2019-09-09 17:12           ` [PATCH v3 1/2] list-objects-filter: only parse sparse OID when 'have_git_dir' Junio C Hamano
  1 sibling, 2 replies; 17+ messages in thread
From: Jeff King @ 2019-09-09 17:08 UTC (permalink / raw)
  To: Jeff Hostetler
  Cc: Junio C Hamano, Jeff Hostetler, Jon Simons, git, me, sunshine, stolee

On Mon, Sep 09, 2019 at 09:54:36AM -0400, Jeff Hostetler wrote:

> It would be nice if we could continue to let parse_list_objects_filter()
> do the syntax checking -- that is, we can still check that we received a
> ulong in "blob:limit:<nr>" and that "sparse:oid:<oid>" looks like a hex
> value, for example.  Just save the actual <oid> lookup to the higher
> layer, if and when that makes sense.

Yeah, I agree that is the right place to do syntactic checking. But I
think we can't do much checking for the sparse-oid. We currently accept
not just a hex oid, but any name. And I think that is useful; I should
be able to say "master:sparse-file" and have it resolved by the remote
side.

So it really is "any name which is syntactically resolvable as a sha1
expression". At which point I think you might as well punt and just wait
until we _actually_ try to resolve the name to see if it's valid.

I'll work up what I sent earlier into a real patch, and include some of
this discussion.

-Peff

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v3 1/2] list-objects-filter: only parse sparse OID when 'have_git_dir'
  2019-09-09 13:54         ` Jeff Hostetler
  2019-09-09 17:08           ` Jeff King
@ 2019-09-09 17:12           ` Junio C Hamano
  2019-09-09 19:49             ` Jeff Hostetler
  1 sibling, 1 reply; 17+ messages in thread
From: Junio C Hamano @ 2019-09-09 17:12 UTC (permalink / raw)
  To: Jeff Hostetler
  Cc: Jeff King, Jeff Hostetler, Jon Simons, git, me, sunshine, stolee

Jeff Hostetler <git@jeffhostetler.com> writes:

> It would be nice if we could continue to let parse_list_objects_filter()
> do the syntax checking -- that is, we can still check that we received a
> ulong in "blob:limit:<nr>" and that "sparse:oid:<oid>" looks like a hex
> value, for example.  Just save the actual <oid> lookup to the higher
> layer, if and when that makes sense.

Hmmm, am I misremembering things or did I hear somebody mention in
this thread that people give "sparse:oid:master" (not blob object
name in hex, but a refname) and expect the other side to resolve it?

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v3 1/2] list-objects-filter: only parse sparse OID when 'have_git_dir'
  2019-09-09 17:12           ` [PATCH v3 1/2] list-objects-filter: only parse sparse OID when 'have_git_dir' Junio C Hamano
@ 2019-09-09 19:49             ` Jeff Hostetler
  0 siblings, 0 replies; 17+ messages in thread
From: Jeff Hostetler @ 2019-09-09 19:49 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Jeff King, Jeff Hostetler, Jon Simons, git, me, sunshine, stolee



On 9/9/2019 1:12 PM, Junio C Hamano wrote:
> Jeff Hostetler <git@jeffhostetler.com> writes:
> 
>> It would be nice if we could continue to let parse_list_objects_filter()
>> do the syntax checking -- that is, we can still check that we received a
>> ulong in "blob:limit:<nr>" and that "sparse:oid:<oid>" looks like a hex
>> value, for example.  Just save the actual <oid> lookup to the higher
>> layer, if and when that makes sense.
> 
> Hmmm, am I misremembering things or did I hear somebody mention in
> this thread that people give "sparse:oid:master" (not blob object
> name in hex, but a refname) and expect the other side to resolve it?
> 

You're right.  I misspoke.  I was thinking about the hex OID case
and forgetting about the <rev>:<path> form.

Jeff

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v3 1/2] list-objects-filter: only parse sparse OID when 'have_git_dir'
  2019-09-09 17:08           ` Jeff King
@ 2019-09-09 20:03             ` Jeff Hostetler
  2019-09-15  1:09             ` [PATCH 0/3] clone --filter=sparse:oid bugs Jeff King
  1 sibling, 0 replies; 17+ messages in thread
From: Jeff Hostetler @ 2019-09-09 20:03 UTC (permalink / raw)
  To: Jeff King
  Cc: Junio C Hamano, Jeff Hostetler, Jon Simons, git, me, sunshine, stolee



On 9/9/2019 1:08 PM, Jeff King wrote:
> On Mon, Sep 09, 2019 at 09:54:36AM -0400, Jeff Hostetler wrote:
> 
>> It would be nice if we could continue to let parse_list_objects_filter()
>> do the syntax checking -- that is, we can still check that we received a
>> ulong in "blob:limit:<nr>" and that "sparse:oid:<oid>" looks like a hex
>> value, for example.  Just save the actual <oid> lookup to the higher
>> layer, if and when that makes sense.
> 
> Yeah, I agree that is the right place to do syntactic checking. But I
> think we can't do much checking for the sparse-oid. We currently accept
> not just a hex oid, but any name. And I think that is useful; I should
> be able to say "master:sparse-file" and have it resolved by the remote
> side.

Right. I forgot about the "master:sparse-file" case and was only
thinking about the hex oid case.  The sparse-file case is very
useful.

> 
> So it really is "any name which is syntactically resolvable as a sha1
> expression". At which point I think you might as well punt and just wait
> until we _actually_ try to resolve the name to see if it's valid.
> 
> I'll work up what I sent earlier into a real patch, and include some of
> this discussion.
> 
> -Peff
> 

thanks
Jeff


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH 0/3] clone --filter=sparse:oid bugs
  2019-09-09 17:08           ` Jeff King
  2019-09-09 20:03             ` Jeff Hostetler
@ 2019-09-15  1:09             ` Jeff King
  2019-09-15  1:11               ` [PATCH 1/3] t5616: test cloning/fetching with sparse:oid=<oid> filter Jeff King
                                 ` (3 more replies)
  1 sibling, 4 replies; 17+ messages in thread
From: Jeff King @ 2019-09-15  1:09 UTC (permalink / raw)
  To: Jeff Hostetler
  Cc: Junio C Hamano, Jeff Hostetler, Jon Simons, git, me, sunshine, stolee

On Mon, Sep 09, 2019 at 01:08:24PM -0400, Jeff King wrote:

> I'll work up what I sent earlier into a real patch, and include some of
> this discussion.

Here it is. I pulled Jon's tests out into their own patch (mostly
because it makes it easier to give credit). Then patch 2 is my fix, and
patch 3 is the message fixups he had done.

This replaces what's queued in js/partial-clone-sparse-blob.

  [1/3]: t5616: test cloning/fetching with sparse:oid=<oid> filter
  [2/3]: list-objects-filter: delay parsing of sparse oid
  [3/3]: list-objects-filter: give a more specific error sparse parsing error

 builtin/rev-list.c            |  4 ----
 list-objects-filter-options.c | 14 ++------------
 list-objects-filter-options.h |  2 +-
 list-objects-filter.c         | 14 +++++++++++---
 t/t5616-partial-clone.sh      | 36 +++++++++++++++++++++++++++++++++++
 5 files changed, 50 insertions(+), 20 deletions(-)


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH 1/3] t5616: test cloning/fetching with sparse:oid=<oid> filter
  2019-09-15  1:09             ` [PATCH 0/3] clone --filter=sparse:oid bugs Jeff King
@ 2019-09-15  1:11               ` Jeff King
  2019-09-15  1:13               ` [PATCH 2/3] list-objects-filter: delay parsing of sparse oid Jeff King
                                 ` (2 subsequent siblings)
  3 siblings, 0 replies; 17+ messages in thread
From: Jeff King @ 2019-09-15  1:11 UTC (permalink / raw)
  To: Jeff Hostetler
  Cc: Junio C Hamano, Jeff Hostetler, Jon Simons, git, me, sunshine, stolee

From: Jon Simons <jon@jonsimons.org>

We test in t5317 that "sparse:oid" filters work with rev-list, but
there's no coverage at all confirming that they work with a fetch or
clone (and in fact, there are several bugs). Let's do a basic test that
a clone fetches the correct objects.

[jk: extracted from Jon's earlier fix patches. I also simplified the
     setup down to a single sparse file, and I added checks that we got the
     right blobs]

Signed-off-by: Jon Simons <jon@jonsimons.org>
Signed-off-by: Jeff King <peff@peff.net>
---
 t/t5616-partial-clone.sh | 36 ++++++++++++++++++++++++++++++++++++
 1 file changed, 36 insertions(+)

diff --git a/t/t5616-partial-clone.sh b/t/t5616-partial-clone.sh
index 565254558f..0bdbc819f1 100755
--- a/t/t5616-partial-clone.sh
+++ b/t/t5616-partial-clone.sh
@@ -241,6 +241,42 @@ test_expect_success 'fetch what is specified on CLI even if already promised' '
 	! grep "?$(cat blob)" missing_after
 '
 
+test_expect_success 'setup src repo for sparse filter' '
+	git init sparse-src &&
+	git -C sparse-src config --local uploadpack.allowfilter 1 &&
+	git -C sparse-src config --local uploadpack.allowanysha1inwant 1 &&
+	test_commit -C sparse-src one &&
+	test_commit -C sparse-src two &&
+	echo /one.t >sparse-src/only-one &&
+	git -C sparse-src add . &&
+	git -C sparse-src commit -m "add sparse checkout files"
+'
+
+test_expect_failure 'partial clone with sparse filter succeeds' '
+	rm -rf dst.git &&
+	git clone --no-local --bare \
+		  --filter=sparse:oid=master:only-one \
+		  sparse-src dst.git &&
+	(
+		cd dst.git &&
+		git rev-list --objects --missing=print HEAD >out &&
+		grep "^$(git rev-parse HEAD:one.t)" out &&
+		grep "^?$(git rev-parse HEAD:two.t)" out
+	)
+'
+
+test_expect_failure 'partial clone with unresolvable sparse filter fails cleanly' '
+	rm -rf dst.git &&
+	test_must_fail git clone --no-local --bare \
+				 --filter=sparse:oid=master:no-such-name \
+				 sparse-src dst.git 2>err &&
+	test_i18ngrep "unable to access sparse blob in .master:no-such-name" err &&
+	test_must_fail git clone --no-local --bare \
+				 --filter=sparse:oid=master \
+				 sparse-src dst.git 2>err &&
+	test_i18ngrep "could not load filter specification" err
+'
+
 . "$TEST_DIRECTORY"/lib-httpd.sh
 start_httpd
 
-- 
2.23.0.667.gcccf1fbb03


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH 2/3] list-objects-filter: delay parsing of sparse oid
  2019-09-15  1:09             ` [PATCH 0/3] clone --filter=sparse:oid bugs Jeff King
  2019-09-15  1:11               ` [PATCH 1/3] t5616: test cloning/fetching with sparse:oid=<oid> filter Jeff King
@ 2019-09-15  1:13               ` Jeff King
  2019-09-15 16:12                 ` Jeff King
  2019-09-15  1:13               ` [PATCH 3/3] list-objects-filter: give a more specific error sparse parsing error Jeff King
  2019-09-15 16:51               ` [PATCH 4/3] list-objects-filter: use empty string instead of NULL for sparse "base" Jeff King
  3 siblings, 1 reply; 17+ messages in thread
From: Jeff King @ 2019-09-15  1:13 UTC (permalink / raw)
  To: Jeff Hostetler
  Cc: Junio C Hamano, Jeff Hostetler, Jon Simons, git, me, sunshine, stolee

The list-objects-filter code has two steps to its initialization:

  1. parse_list_objects_filter() makes sure the spec is a filter we know
     about and is syntactically correct. This step is done by "rev-list"
     or "upload-pack" that is going to apply a filter, but also by "git
     clone" or "git fetch" before they send the spec across the wire.

  2. list_objects_filter__init() runs the type-specific initialization
     (using function pointers established in step 1). This happens at
     the start of traverse_commit_list_filtered(), when we're about to
     actually use the filter.

It's a good idea to parse as much as we can in step 1, in order to catch
problems early (e.g., a blob size limit that isn't a number). But one
thing we _shouldn't_ do is resolve any oids at that step (e.g., for
sparse-file contents specified by oid). In the case of a fetch, the oid
has to be resolved on the remote side.

The current code does resolve the oid during the parse phase, but
ignores any error (which we must do, because we might just be sending
the spec across the wire). This leads to two bugs:

  - if we're not in a repository (e.g., because it's git-clone parsing
    the spec), then we trigger a BUG() trying to resolve the name

  - if we did hit the error case, we still have to notice that later and
    bail. The code path in rev-list handles this, but the one in
    upload-pack does not, leading to a segfault.

We can fix both by moving the oid resolution into the sparse-oid init
function. At that point we know we have a repository (because we're
about to traverse), and handling the error there fixes the segfault.

As a bonus, we can drop the NULL sparse_oid_value check in rev-list,
since this is now handled in the sparse-oid-filter init function.

Signed-off-by: Jeff King <peff@peff.net>
---
 builtin/rev-list.c            |  4 ----
 list-objects-filter-options.c | 14 ++------------
 list-objects-filter-options.h |  2 +-
 list-objects-filter.c         | 11 +++++++++--
 t/t5616-partial-clone.sh      |  4 ++--
 5 files changed, 14 insertions(+), 21 deletions(-)

diff --git a/builtin/rev-list.c b/builtin/rev-list.c
index 301ccb970b..74dbfad73d 100644
--- a/builtin/rev-list.c
+++ b/builtin/rev-list.c
@@ -471,10 +471,6 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix)
 			parse_list_objects_filter(&filter_options, arg);
 			if (filter_options.choice && !revs.blob_objects)
 				die(_("object filtering requires --objects"));
-			if (filter_options.choice == LOFC_SPARSE_OID &&
-			    !filter_options.sparse_oid_value)
-				die(_("invalid sparse value '%s'"),
-				    filter_options.filter_spec);
 			continue;
 		}
 		if (!strcmp(arg, ("--no-" CL_ARG__FILTER))) {
diff --git a/list-objects-filter-options.c b/list-objects-filter-options.c
index 1cb20c659c..43f41f446c 100644
--- a/list-objects-filter-options.c
+++ b/list-objects-filter-options.c
@@ -63,17 +63,8 @@ static int gently_parse_list_objects_filter(
 		return 0;
 
 	} else if (skip_prefix(arg, "sparse:oid=", &v0)) {
-		struct object_context oc;
-		struct object_id sparse_oid;
-
-		/*
-		 * Try to parse <oid-expression> into an OID for the current
-		 * command, but DO NOT complain if we don't have the blob or
-		 * ref locally.
-		 */
-		if (!get_oid_with_context(the_repository, v0, GET_OID_BLOB,
-					  &sparse_oid, &oc))
-			filter_options->sparse_oid_value = oiddup(&sparse_oid);
+		/* v0 points into filter_options->filter_spec; no allocation needed */
+		filter_options->sparse_oid_name = v0;
 		filter_options->choice = LOFC_SPARSE_OID;
 		return 0;
 
@@ -138,7 +129,6 @@ void list_objects_filter_release(
 	struct list_objects_filter_options *filter_options)
 {
 	free(filter_options->filter_spec);
-	free(filter_options->sparse_oid_value);
 	memset(filter_options, 0, sizeof(*filter_options));
 }
 
diff --git a/list-objects-filter-options.h b/list-objects-filter-options.h
index c54f0000fb..a819e42ffb 100644
--- a/list-objects-filter-options.h
+++ b/list-objects-filter-options.h
@@ -42,7 +42,7 @@ struct list_objects_filter_options {
 	 * choice-specific; not all values will be defined for any given
 	 * choice.
 	 */
-	struct object_id *sparse_oid_value;
+	const char *sparse_oid_name;
 	unsigned long blob_limit_value;
 	unsigned long tree_exclude_depth;
 };
diff --git a/list-objects-filter.c b/list-objects-filter.c
index 36e1f774bc..d2cdc03a73 100644
--- a/list-objects-filter.c
+++ b/list-objects-filter.c
@@ -463,9 +463,16 @@ static void *filter_sparse_oid__init(
 	filter_free_fn *filter_free_fn)
 {
 	struct filter_sparse_data *d = xcalloc(1, sizeof(*d));
+	struct object_context oc;
+	struct object_id sparse_oid;
+
+	if (get_oid_with_context(the_repository,
+				 filter_options->sparse_oid_name,
+				 GET_OID_BLOB, &sparse_oid, &oc))
+		die("unable to access sparse blob in '%s'",
+		    filter_options->sparse_oid_name);
 	d->omits = omitted;
-	if (add_excludes_from_blob_to_list(filter_options->sparse_oid_value,
-					   NULL, 0, &d->el) < 0)
+	if (add_excludes_from_blob_to_list(&sparse_oid, NULL, 0, &d->el) < 0)
 		die("could not load filter specification");
 
 	ALLOC_GROW(d->array_frame, d->nr + 1, d->alloc);
diff --git a/t/t5616-partial-clone.sh b/t/t5616-partial-clone.sh
index 0bdbc819f1..84365b802a 100755
--- a/t/t5616-partial-clone.sh
+++ b/t/t5616-partial-clone.sh
@@ -252,7 +252,7 @@ test_expect_success 'setup src repo for sparse filter' '
 	git -C sparse-src commit -m "add sparse checkout files"
 '
 
-test_expect_failure 'partial clone with sparse filter succeeds' '
+test_expect_success 'partial clone with sparse filter succeeds' '
 	rm -rf dst.git &&
 	git clone --no-local --bare \
 		  --filter=sparse:oid=master:only-one \
@@ -265,7 +265,7 @@ test_expect_failure 'partial clone with sparse filter succeeds' '
 	)
 '
 
-test_expect_failure 'partial clone with unresolvable sparse filter fails cleanly' '
+test_expect_success 'partial clone with unresolvable sparse filter fails cleanly' '
 	rm -rf dst.git &&
 	test_must_fail git clone --no-local --bare \
 				 --filter=sparse:oid=master:no-such-name \
-- 
2.23.0.667.gcccf1fbb03


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH 3/3] list-objects-filter: give a more specific error sparse parsing error
  2019-09-15  1:09             ` [PATCH 0/3] clone --filter=sparse:oid bugs Jeff King
  2019-09-15  1:11               ` [PATCH 1/3] t5616: test cloning/fetching with sparse:oid=<oid> filter Jeff King
  2019-09-15  1:13               ` [PATCH 2/3] list-objects-filter: delay parsing of sparse oid Jeff King
@ 2019-09-15  1:13               ` Jeff King
  2019-09-15 16:51               ` [PATCH 4/3] list-objects-filter: use empty string instead of NULL for sparse "base" Jeff King
  3 siblings, 0 replies; 17+ messages in thread
From: Jeff King @ 2019-09-15  1:13 UTC (permalink / raw)
  To: Jeff Hostetler
  Cc: Junio C Hamano, Jeff Hostetler, Jon Simons, git, me, sunshine, stolee

From: Jon Simons <jon@jonsimons.org>

The sparse:oid filter has two error modes: we might fail to resolve the
name to an OID, or we might fail to parse the contents of that OID. In
the latter case, let's give a less generic error message, and mention
the OID we did find.

While we're here, let's also mark both messages as translatable.

Signed-off-by: Jeff King <peff@peff.net>
---
 list-objects-filter.c    | 5 +++--
 t/t5616-partial-clone.sh | 2 +-
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/list-objects-filter.c b/list-objects-filter.c
index d2cdc03a73..50f0c6d07b 100644
--- a/list-objects-filter.c
+++ b/list-objects-filter.c
@@ -469,11 +469,12 @@ static void *filter_sparse_oid__init(
 	if (get_oid_with_context(the_repository,
 				 filter_options->sparse_oid_name,
 				 GET_OID_BLOB, &sparse_oid, &oc))
-		die("unable to access sparse blob in '%s'",
+		die(_("unable to access sparse blob in '%s'"),
 		    filter_options->sparse_oid_name);
 	d->omits = omitted;
 	if (add_excludes_from_blob_to_list(&sparse_oid, NULL, 0, &d->el) < 0)
-		die("could not load filter specification");
+		die(_("unable to parse sparse filter data in %s"),
+		    oid_to_hex(&sparse_oid));
 
 	ALLOC_GROW(d->array_frame, d->nr + 1, d->alloc);
 	d->array_frame[d->nr].defval = 0; /* default to include */
diff --git a/t/t5616-partial-clone.sh b/t/t5616-partial-clone.sh
index 84365b802a..b85d3f5e4c 100755
--- a/t/t5616-partial-clone.sh
+++ b/t/t5616-partial-clone.sh
@@ -274,7 +274,7 @@ test_expect_success 'partial clone with unresolvable sparse filter fails cleanly
 	test_must_fail git clone --no-local --bare \
 				 --filter=sparse:oid=master \
 				 sparse-src dst.git 2>err &&
-	test_i18ngrep "could not load filter specification" err
+	test_i18ngrep "unable to parse sparse filter data in" err
 '
 
 . "$TEST_DIRECTORY"/lib-httpd.sh
-- 
2.23.0.667.gcccf1fbb03

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/3] list-objects-filter: delay parsing of sparse oid
  2019-09-15  1:13               ` [PATCH 2/3] list-objects-filter: delay parsing of sparse oid Jeff King
@ 2019-09-15 16:12                 ` Jeff King
  0 siblings, 0 replies; 17+ messages in thread
From: Jeff King @ 2019-09-15 16:12 UTC (permalink / raw)
  To: Jeff Hostetler
  Cc: Junio C Hamano, Jeff Hostetler, Jon Simons, git, me, sunshine, stolee

On Sat, Sep 14, 2019 at 09:13:19PM -0400, Jeff King wrote:

> --- a/list-objects-filter-options.c
> +++ b/list-objects-filter-options.c
> @@ -63,17 +63,8 @@ static int gently_parse_list_objects_filter(
>  		return 0;
>  
>  	} else if (skip_prefix(arg, "sparse:oid=", &v0)) {
> -		struct object_context oc;
> -		struct object_id sparse_oid;
> -
> -		/*
> -		 * Try to parse <oid-expression> into an OID for the current
> -		 * command, but DO NOT complain if we don't have the blob or
> -		 * ref locally.
> -		 */
> -		if (!get_oid_with_context(the_repository, v0, GET_OID_BLOB,
> -					  &sparse_oid, &oc))
> -			filter_options->sparse_oid_value = oiddup(&sparse_oid);
> +		/* v0 points into filter_options->filter_spec; no allocation needed */
> +		filter_options->sparse_oid_name = v0;

Looks like this comment is no longer true after merging with
md/list-objects-filter-combo, which is in 'next'.

We can obviously deal with it during the merge, but it probably makes
sense to just be more defensive from the start. Here's a revised version
of patch 2, with these changes:

    diff --git a/list-objects-filter-options.c b/list-objects-filter-options.c
    index 43f41f446c..adbea552a0 100644
    --- a/list-objects-filter-options.c
    +++ b/list-objects-filter-options.c
    @@ -63,8 +63,7 @@ static int gently_parse_list_objects_filter(
     		return 0;
     
     	} else if (skip_prefix(arg, "sparse:oid=", &v0)) {
    -		/* v0 points into filter_options->filter_spec; no allocation needed */
    -		filter_options->sparse_oid_name = v0;
    +		filter_options->sparse_oid_name = xstrdup(v0);
     		filter_options->choice = LOFC_SPARSE_OID;
     		return 0;
     
    @@ -129,6 +128,7 @@ void list_objects_filter_release(
     	struct list_objects_filter_options *filter_options)
     {
     	free(filter_options->filter_spec);
    +	free(filter_options->sparse_oid_name);
     	memset(filter_options, 0, sizeof(*filter_options));
     }
     
    diff --git a/list-objects-filter-options.h b/list-objects-filter-options.h
    index a819e42ffb..20b9d1e587 100644
    --- a/list-objects-filter-options.h
    +++ b/list-objects-filter-options.h
    @@ -42,7 +42,7 @@ struct list_objects_filter_options {
     	 * choice-specific; not all values will be defined for any given
     	 * choice.
     	 */
    -	const char *sparse_oid_name;
    +	char *sparse_oid_name;
     	unsigned long blob_limit_value;
     	unsigned long tree_exclude_depth;
     };

-- >8 --
Subject: [PATCH] list-objects-filter: delay parsing of sparse oid

The list-objects-filter code has two steps to its initialization:

  1. parse_list_objects_filter() makes sure the spec is a filter we know
     about and is syntactically correct. This step is done by "rev-list"
     or "upload-pack" that is going to apply a filter, but also by "git
     clone" or "git fetch" before they send the spec across the wire.

  2. list_objects_filter__init() runs the type-specific initialization
     (using function pointers established in step 1). This happens at
     the start of traverse_commit_list_filtered(), when we're about to
     actually use the filter.

It's a good idea to parse as much as we can in step 1, in order to catch
problems early (e.g., a blob size limit that isn't a number). But one
thing we _shouldn't_ do is resolve any oids at that step (e.g., for
sparse-file contents specified by oid). In the case of a fetch, the oid
has to be resolved on the remote side.

The current code does resolve the oid during the parse phase, but
ignores any error (which we must do, because we might just be sending
the spec across the wire). This leads to two bugs:

  - if we're not in a repository (e.g., because it's git-clone parsing
    the spec), then we trigger a BUG() trying to resolve the name

  - if we did hit the error case, we still have to notice that later and
    bail. The code path in rev-list handles this, but the one in
    upload-pack does not, leading to a segfault.

We can fix both by moving the oid resolution into the sparse-oid init
function. At that point we know we have a repository (because we're
about to traverse), and handling the error there fixes the segfault.

As a bonus, we can drop the NULL sparse_oid_value check in rev-list,
since this is now handled in the sparse-oid-filter init function.

Signed-off-by: Jeff King <peff@peff.net>
---
 builtin/rev-list.c            |  4 ----
 list-objects-filter-options.c | 14 ++------------
 list-objects-filter-options.h |  2 +-
 list-objects-filter.c         | 11 +++++++++--
 t/t5616-partial-clone.sh      |  4 ++--
 5 files changed, 14 insertions(+), 21 deletions(-)

diff --git a/builtin/rev-list.c b/builtin/rev-list.c
index 301ccb970b..74dbfad73d 100644
--- a/builtin/rev-list.c
+++ b/builtin/rev-list.c
@@ -471,10 +471,6 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix)
 			parse_list_objects_filter(&filter_options, arg);
 			if (filter_options.choice && !revs.blob_objects)
 				die(_("object filtering requires --objects"));
-			if (filter_options.choice == LOFC_SPARSE_OID &&
-			    !filter_options.sparse_oid_value)
-				die(_("invalid sparse value '%s'"),
-				    filter_options.filter_spec);
 			continue;
 		}
 		if (!strcmp(arg, ("--no-" CL_ARG__FILTER))) {
diff --git a/list-objects-filter-options.c b/list-objects-filter-options.c
index 1cb20c659c..adbea552a0 100644
--- a/list-objects-filter-options.c
+++ b/list-objects-filter-options.c
@@ -63,17 +63,7 @@ static int gently_parse_list_objects_filter(
 		return 0;
 
 	} else if (skip_prefix(arg, "sparse:oid=", &v0)) {
-		struct object_context oc;
-		struct object_id sparse_oid;
-
-		/*
-		 * Try to parse <oid-expression> into an OID for the current
-		 * command, but DO NOT complain if we don't have the blob or
-		 * ref locally.
-		 */
-		if (!get_oid_with_context(the_repository, v0, GET_OID_BLOB,
-					  &sparse_oid, &oc))
-			filter_options->sparse_oid_value = oiddup(&sparse_oid);
+		filter_options->sparse_oid_name = xstrdup(v0);
 		filter_options->choice = LOFC_SPARSE_OID;
 		return 0;
 
@@ -138,7 +128,7 @@ void list_objects_filter_release(
 	struct list_objects_filter_options *filter_options)
 {
 	free(filter_options->filter_spec);
-	free(filter_options->sparse_oid_value);
+	free(filter_options->sparse_oid_name);
 	memset(filter_options, 0, sizeof(*filter_options));
 }
 
diff --git a/list-objects-filter-options.h b/list-objects-filter-options.h
index c54f0000fb..20b9d1e587 100644
--- a/list-objects-filter-options.h
+++ b/list-objects-filter-options.h
@@ -42,7 +42,7 @@ struct list_objects_filter_options {
 	 * choice-specific; not all values will be defined for any given
 	 * choice.
 	 */
-	struct object_id *sparse_oid_value;
+	char *sparse_oid_name;
 	unsigned long blob_limit_value;
 	unsigned long tree_exclude_depth;
 };
diff --git a/list-objects-filter.c b/list-objects-filter.c
index 36e1f774bc..d2cdc03a73 100644
--- a/list-objects-filter.c
+++ b/list-objects-filter.c
@@ -463,9 +463,16 @@ static void *filter_sparse_oid__init(
 	filter_free_fn *filter_free_fn)
 {
 	struct filter_sparse_data *d = xcalloc(1, sizeof(*d));
+	struct object_context oc;
+	struct object_id sparse_oid;
+
+	if (get_oid_with_context(the_repository,
+				 filter_options->sparse_oid_name,
+				 GET_OID_BLOB, &sparse_oid, &oc))
+		die("unable to access sparse blob in '%s'",
+		    filter_options->sparse_oid_name);
 	d->omits = omitted;
-	if (add_excludes_from_blob_to_list(filter_options->sparse_oid_value,
-					   NULL, 0, &d->el) < 0)
+	if (add_excludes_from_blob_to_list(&sparse_oid, NULL, 0, &d->el) < 0)
 		die("could not load filter specification");
 
 	ALLOC_GROW(d->array_frame, d->nr + 1, d->alloc);
diff --git a/t/t5616-partial-clone.sh b/t/t5616-partial-clone.sh
index 0bdbc819f1..84365b802a 100755
--- a/t/t5616-partial-clone.sh
+++ b/t/t5616-partial-clone.sh
@@ -252,7 +252,7 @@ test_expect_success 'setup src repo for sparse filter' '
 	git -C sparse-src commit -m "add sparse checkout files"
 '
 
-test_expect_failure 'partial clone with sparse filter succeeds' '
+test_expect_success 'partial clone with sparse filter succeeds' '
 	rm -rf dst.git &&
 	git clone --no-local --bare \
 		  --filter=sparse:oid=master:only-one \
@@ -265,7 +265,7 @@ test_expect_failure 'partial clone with sparse filter succeeds' '
 	)
 '
 
-test_expect_failure 'partial clone with unresolvable sparse filter fails cleanly' '
+test_expect_success 'partial clone with unresolvable sparse filter fails cleanly' '
 	rm -rf dst.git &&
 	test_must_fail git clone --no-local --bare \
 				 --filter=sparse:oid=master:no-such-name \
-- 
2.23.0.667.gcccf1fbb03


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH 4/3] list-objects-filter: use empty string instead of NULL for sparse "base"
  2019-09-15  1:09             ` [PATCH 0/3] clone --filter=sparse:oid bugs Jeff King
                                 ` (2 preceding siblings ...)
  2019-09-15  1:13               ` [PATCH 3/3] list-objects-filter: give a more specific error sparse parsing error Jeff King
@ 2019-09-15 16:51               ` Jeff King
  3 siblings, 0 replies; 17+ messages in thread
From: Jeff King @ 2019-09-15 16:51 UTC (permalink / raw)
  To: Jeff Hostetler
  Cc: Junio C Hamano, Jeff Hostetler, Jon Simons, git, me, sunshine, stolee

On Sat, Sep 14, 2019 at 09:09:42PM -0400, Jeff King wrote:

> On Mon, Sep 09, 2019 at 01:08:24PM -0400, Jeff King wrote:
> 
> > I'll work up what I sent earlier into a real patch, and include some of
> > this discussion.
> 
> Here it is. I pulled Jon's tests out into their own patch (mostly
> because it makes it easier to give credit). Then patch 2 is my fix, and
> patch 3 is the message fixups he had done.
> 
> This replaces what's queued in js/partial-clone-sparse-blob.
> 
>   [1/3]: t5616: test cloning/fetching with sparse:oid=<oid> filter
>   [2/3]: list-objects-filter: delay parsing of sparse oid
>   [3/3]: list-objects-filter: give a more specific error sparse parsing error

And here's a bonus patch that I found while running under ASan/UBSan
(since I wanted to double-check the memory handling of patch 2 when
merged with 'next').

-- >8 --
Subject: list-objects-filter: use empty string instead of NULL for sparse "base"

We use add_excludes_from_blob_to_list() to parse a sparse blob. Since
we don't have a base path, we pass NULL and 0 for the base and baselen,
respectively. But the rest of the exclude code passes a literal empty
string instead of NULL for this case. And indeed, we eventually end up
with match_pathname() calling fspathncmp(), which then calls the system
strncmp(path, base, baselen).

This works on many platforms, which notice that baselen is 0 and do not
look at the bytes of "base" at all. But it does violate the C standard,
and building with SANITIZE=undefined will complain. You can also see it
by instrumenting fspathncmp like this:

	diff --git a/dir.c b/dir.c
	index d021c908e5..4bb3d3ec96 100644
	--- a/dir.c
	+++ b/dir.c
	@@ -71,6 +71,8 @@ int fspathcmp(const char *a, const char *b)

	 int fspathncmp(const char *a, const char *b, size_t count)
	 {
	+	if (!a || !b)
	+		BUG("null fspathncmp arguments");
	 	return ignore_case ? strncasecmp(a, b, count) : strncmp(a, b, count);
	 }

We could perhaps be more defensive in match_pathname(), but even if we
did so, it makes sense for this code to match the rest of the exclude
callers.

Signed-off-by: Jeff King <peff@peff.net>
---
 list-objects-filter.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/list-objects-filter.c b/list-objects-filter.c
index 50f0c6d07b..83c788e8b5 100644
--- a/list-objects-filter.c
+++ b/list-objects-filter.c
@@ -472,7 +472,7 @@ static void *filter_sparse_oid__init(
 		die(_("unable to access sparse blob in '%s'"),
 		    filter_options->sparse_oid_name);
 	d->omits = omitted;
-	if (add_excludes_from_blob_to_list(&sparse_oid, NULL, 0, &d->el) < 0)
+	if (add_excludes_from_blob_to_list(&sparse_oid, "", 0, &d->el) < 0)
 		die(_("unable to parse sparse filter data in %s"),
 		    oid_to_hex(&sparse_oid));
 
-- 
2.23.0.667.gcccf1fbb03


^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, back to index

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-08-29 23:19 [PATCH v3 0/2] partial-clone: fix two issues with sparse filter handling Jon Simons
2019-08-29 23:19 ` [PATCH v3 1/2] list-objects-filter: only parse sparse OID when 'have_git_dir' Jon Simons
2019-08-30 18:08   ` Junio C Hamano
2019-09-04  4:54     ` Jeff King
2019-09-05 18:57       ` Junio C Hamano
2019-09-09 13:54         ` Jeff Hostetler
2019-09-09 17:08           ` Jeff King
2019-09-09 20:03             ` Jeff Hostetler
2019-09-15  1:09             ` [PATCH 0/3] clone --filter=sparse:oid bugs Jeff King
2019-09-15  1:11               ` [PATCH 1/3] t5616: test cloning/fetching with sparse:oid=<oid> filter Jeff King
2019-09-15  1:13               ` [PATCH 2/3] list-objects-filter: delay parsing of sparse oid Jeff King
2019-09-15 16:12                 ` Jeff King
2019-09-15  1:13               ` [PATCH 3/3] list-objects-filter: give a more specific error sparse parsing error Jeff King
2019-09-15 16:51               ` [PATCH 4/3] list-objects-filter: use empty string instead of NULL for sparse "base" Jeff King
2019-09-09 17:12           ` [PATCH v3 1/2] list-objects-filter: only parse sparse OID when 'have_git_dir' Junio C Hamano
2019-09-09 19:49             ` Jeff Hostetler
2019-08-29 23:19 ` [PATCH v3 2/2] list-objects-filter: handle unresolved sparse filter OID Jon Simons

git@vger.kernel.org list mirror (unofficial, one of many)

Archives are clonable:
	git clone --mirror https://public-inbox.org/git
	git clone --mirror http://ou63pmih66umazou.onion/git
	git clone --mirror http://czquwvybam4bgbro.onion/git
	git clone --mirror http://hjrcffqmbrq6wope.onion/git

Newsgroups are available over NNTP:
	nntp://news.public-inbox.org/inbox.comp.version-control.git
	nntp://ou63pmih66umazou.onion/inbox.comp.version-control.git
	nntp://czquwvybam4bgbro.onion/inbox.comp.version-control.git
	nntp://hjrcffqmbrq6wope.onion/inbox.comp.version-control.git
	nntp://news.gmane.org/gmane.comp.version-control.git

 note: .onion URLs require Tor: https://www.torproject.org/

AGPL code for this site: git clone https://public-inbox.org/ public-inbox