git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH 0/2] Avoiding errors when partial cloning a tagged blob
@ 2018-07-06 19:34 Jonathan Tan
  2018-07-06 19:34 ` [PATCH 1/2] upload-pack: send refs' objects despite "filter" Jonathan Tan
                   ` (3 more replies)
  0 siblings, 4 replies; 12+ messages in thread
From: Jonathan Tan @ 2018-07-06 19:34 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, git

When cloning a repository with a tagged blob (like the Git repository)
with --filter=blob:none, the following message appears:

        error: missing object referenced by 'refs/tags/junio-gpg-pub'

and the resulting repository also fails fsck.

Patch 1 fixes the protocol documentation and the server side of Git, and
patch 2 makes clone error out when such a situation occurs.

An argument could be made that we should not merge patch 2 just yet due
to the fact that some server implementations (such as Git and JGit)
still exhibit the old behavior, and the resulting clones (albeit failing
fsck) are still usable, because when attempting to load the blob, Git
will automatically fetch it. I'm on the fence about this, and have
included patch 2 in this patch set nevertheless for completeness.

Jonathan Tan (2):
  upload-pack: send refs' objects despite "filter"
  clone: check connectivity even if clone is partial

 Documentation/technical/pack-protocol.txt |  4 +-
 builtin/clone.c                           |  2 +-
 list-objects.c                            |  6 +--
 object.h                                  |  2 +-
 revision.c                                |  1 +
 revision.h                                |  3 +-
 t/t5317-pack-objects-filter-objects.sh    | 16 ++++++
 t/t5616-partial-clone.sh                  | 64 +++++++++++++++++++++++
 8 files changed, 91 insertions(+), 7 deletions(-)

-- 
2.18.0.203.gfac676dfb9-goog


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH 1/2] upload-pack: send refs' objects despite "filter"
  2018-07-06 19:34 [PATCH 0/2] Avoiding errors when partial cloning a tagged blob Jonathan Tan
@ 2018-07-06 19:34 ` Jonathan Tan
  2018-07-06 19:34 ` [PATCH 2/2] clone: check connectivity even if clone is partial Jonathan Tan
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 12+ messages in thread
From: Jonathan Tan @ 2018-07-06 19:34 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, git

A filter line in a request to upload-pack filters out objects regardless
of whether they are directly referenced by a "want" line or not. This
means that cloning with "--filter=blob:none" (or another filter that
excludes blobs) from a repository with at least one ref pointing to a
blob (for example, the Git repository itself) results in output like the
following:

    error: missing object referenced by 'refs/tags/junio-gpg-pub'

and if that particular blob is not referenced by a fetched tree, the
resulting clone fails fsck because there is no object from the remote to
vouch that the missing object is a promisor object.

Update both the protocol and the upload-pack implementation to include
all explicitly specified "want" objects in the packfile regardless of
the filter specification.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
 Documentation/technical/pack-protocol.txt |  4 +++-
 list-objects.c                            |  6 +++---
 object.h                                  |  2 +-
 revision.c                                |  1 +
 revision.h                                |  3 ++-
 t/t5317-pack-objects-filter-objects.sh    | 16 ++++++++++++++++
 t/t5616-partial-clone.sh                  | 16 ++++++++++++++++
 7 files changed, 42 insertions(+), 6 deletions(-)

diff --git a/Documentation/technical/pack-protocol.txt b/Documentation/technical/pack-protocol.txt
index 7fee6b780a..508a344cf1 100644
--- a/Documentation/technical/pack-protocol.txt
+++ b/Documentation/technical/pack-protocol.txt
@@ -284,7 +284,9 @@ information is sent back to the client in the next step.
 The client can optionally request that pack-objects omit various
 objects from the packfile using one of several filtering techniques.
 These are intended for use with partial clone and partial fetch
-operations.  See `rev-list` for possible "filter-spec" values.
+operations. An object that does not meet a filter-spec value is
+omitted unless explicitly requested in a 'want' line. See `rev-list`
+for possible filter-spec values.
 
 Once all the 'want's and 'shallow's (and optional 'deepen') are
 transferred, clients MUST send a flush-pkt, to tell the server side
diff --git a/list-objects.c b/list-objects.c
index 3eec510357..be4118889a 100644
--- a/list-objects.c
+++ b/list-objects.c
@@ -47,7 +47,7 @@ static void process_blob(struct rev_info *revs,
 
 	pathlen = path->len;
 	strbuf_addstr(path, name);
-	if (filter_fn)
+	if (!(obj->flags & USER_GIVEN) && filter_fn)
 		r = filter_fn(LOFS_BLOB, obj,
 			      path->buf, &path->buf[pathlen],
 			      filter_data);
@@ -132,7 +132,7 @@ static void process_tree(struct rev_info *revs,
 	}
 
 	strbuf_addstr(base, name);
-	if (filter_fn)
+	if (!(obj->flags & USER_GIVEN) && filter_fn)
 		r = filter_fn(LOFS_BEGIN_TREE, obj,
 			      base->buf, &base->buf[baselen],
 			      filter_data);
@@ -171,7 +171,7 @@ static void process_tree(struct rev_info *revs,
 				     cb_data, filter_fn, filter_data);
 	}
 
-	if (filter_fn) {
+	if (!(obj->flags & USER_GIVEN) && filter_fn) {
 		r = filter_fn(LOFS_END_TREE, obj,
 			      base->buf, &base->buf[baselen],
 			      filter_data);
diff --git a/object.h b/object.h
index 5c13955000..cf8da6ba83 100644
--- a/object.h
+++ b/object.h
@@ -27,7 +27,7 @@ struct object_array {
 
 /*
  * object flag allocation:
- * revision.h:               0---------10                                26
+ * revision.h:               0---------10                              2526
  * fetch-pack.c:             0----5
  * walker.c:                 0-2
  * upload-pack.c:                4       11----------------19
diff --git a/revision.c b/revision.c
index 40fd91ff2b..1b37da988d 100644
--- a/revision.c
+++ b/revision.c
@@ -172,6 +172,7 @@ static void add_pending_object_with_path(struct rev_info *revs,
 		strbuf_release(&buf);
 		return; /* do not add the commit itself */
 	}
+	obj->flags |= USER_GIVEN;
 	add_object_array_with_path(obj, name, &revs->pending, mode, path);
 }
 
diff --git a/revision.h b/revision.h
index b8c47b98e2..dc8f96366e 100644
--- a/revision.h
+++ b/revision.h
@@ -19,8 +19,9 @@
 #define SYMMETRIC_LEFT	(1u<<8)
 #define PATCHSAME	(1u<<9)
 #define BOTTOM		(1u<<10)
+#define USER_GIVEN	(1u<<25) /* given directly by the user */
 #define TRACK_LINEAR	(1u<<26)
-#define ALL_REV_FLAGS	(((1u<<11)-1) | TRACK_LINEAR)
+#define ALL_REV_FLAGS	(((1u<<11)-1) | USER_GIVEN | TRACK_LINEAR)
 
 #define DECORATE_SHORT_REFS	1
 #define DECORATE_FULL_REFS	2
diff --git a/t/t5317-pack-objects-filter-objects.sh b/t/t5317-pack-objects-filter-objects.sh
index 1b0acc383b..6710c8bc8c 100755
--- a/t/t5317-pack-objects-filter-objects.sh
+++ b/t/t5317-pack-objects-filter-objects.sh
@@ -160,6 +160,22 @@ test_expect_success 'verify blob:limit=1k' '
 	test_cmp observed expected
 '
 
+test_expect_success 'verify explicitly specifying oversized blob in input' '
+	git -C r2 ls-files -s large.1000 large.10000 \
+		| awk -f print_2.awk \
+		| sort >expected &&
+	git -C r2 pack-objects --rev --stdout --filter=blob:limit=1k >filter.pack <<-EOF &&
+	HEAD
+	$(git -C r2 rev-parse HEAD:large.10000)
+	EOF
+	git -C r2 index-pack ../filter.pack &&
+	git -C r2 verify-pack -v ../filter.pack \
+		| grep blob \
+		| awk -f print_1.awk \
+		| sort >observed &&
+	test_cmp observed expected
+'
+
 test_expect_success 'verify blob:limit=1m' '
 	git -C r2 ls-files -s large.1000 large.10000 \
 		| awk -f print_2.awk \
diff --git a/t/t5616-partial-clone.sh b/t/t5616-partial-clone.sh
index cee5565367..8a2bf86491 100755
--- a/t/t5616-partial-clone.sh
+++ b/t/t5616-partial-clone.sh
@@ -154,4 +154,20 @@ test_expect_success 'partial clone with transfer.fsckobjects=1 uses index-pack -
 	grep "git index-pack.*--fsck-objects" trace
 '
 
+test_expect_success 'partial clone fetches blobs pointed to by refs even if normally filtered out' '
+	rm -rf src dst &&
+	git init src &&
+	test_commit -C src x &&
+	test_config -C src uploadpack.allowfilter 1 &&
+	test_config -C src uploadpack.allowanysha1inwant 1 &&
+
+	# Create a tag pointing to a blob.
+	BLOB=$(echo blob-contents | git -C src hash-object --stdin -w) &&
+	git -C src tag myblob "$BLOB" &&
+
+	git clone --filter="blob:none" "file://$(pwd)/src" dst 2>err &&
+	! grep "does not point to a valid object" err &&
+	git -C dst fsck
+'
+
 test_done
-- 
2.18.0.203.gfac676dfb9-goog


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 2/2] clone: check connectivity even if clone is partial
  2018-07-06 19:34 [PATCH 0/2] Avoiding errors when partial cloning a tagged blob Jonathan Tan
  2018-07-06 19:34 ` [PATCH 1/2] upload-pack: send refs' objects despite "filter" Jonathan Tan
@ 2018-07-06 19:34 ` Jonathan Tan
  2018-07-06 19:38 ` [PATCH 0/2] Avoiding errors when partial cloning a tagged blob Jonathan Tan
  2018-07-09 19:33 ` Junio C Hamano
  3 siblings, 0 replies; 12+ messages in thread
From: Jonathan Tan @ 2018-07-06 19:34 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, git

The commit that introduced the partial clone feature - 548719fbdc
("clone: partial clone", 2017-12-08) - excluded connectivity checks
for partial clones, but this also meant that it is possible for a clone
to succeed, yet not have all objects either present or promised.
Specifically, if cloning with --filter=blob:none from a repository that
has a tag pointing to a blob, and the blob is not sent in the packfile,
the clone will pass, even if the blob is not referenced by any tree in
the packfile.

Turn on connectivity checks for partial clone.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
 builtin/clone.c          |  2 +-
 t/t5616-partial-clone.sh | 48 ++++++++++++++++++++++++++++++++++++++++
 2 files changed, 49 insertions(+), 1 deletion(-)

diff --git a/builtin/clone.c b/builtin/clone.c
index 8f86d99c51..fa53550758 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -1201,7 +1201,7 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
 
 	update_remote_refs(refs, mapped_refs, remote_head_points_at,
 			   branch_top.buf, reflog_msg.buf, transport,
-			   !is_local && !filter_options.choice);
+			   !is_local);
 
 	update_head(our_head_points_at, remote_head, reflog_msg.buf);
 
diff --git a/t/t5616-partial-clone.sh b/t/t5616-partial-clone.sh
index 8a2bf86491..44d8e80171 100755
--- a/t/t5616-partial-clone.sh
+++ b/t/t5616-partial-clone.sh
@@ -170,4 +170,52 @@ test_expect_success 'partial clone fetches blobs pointed to by refs even if norm
 	git -C dst fsck
 '
 
+. "$TEST_DIRECTORY"/lib-httpd.sh
+start_httpd
+
+# Converts bytes into a form suitable for inclusion in a sed command. For
+# example, "printf 'ab\r\n' | hex_unpack" results in '\x61\x62\x0d\x0a'.
+sed_escape () {
+	perl -e '$/ = undef; $input = <>; print unpack("H2" x length($input), $input)' |
+		sed 's/\(..\)/\\x\1/g'
+}
+
+test_expect_success 'upon cloning, check that all refs point to objects' '
+	SERVER="$HTTPD_DOCUMENT_ROOT_PATH/server" &&
+	rm -rf "$SERVER" repo &&
+	test_create_repo "$SERVER" &&
+	test_commit -C "$SERVER" foo &&
+	test_config -C "$SERVER" uploadpack.allowfilter 1 &&
+	test_config -C "$SERVER" uploadpack.allowanysha1inwant 1 &&
+
+	# Create a tag pointing to a blob.
+	BLOB=$(echo blob-contents | git -C "$SERVER" hash-object --stdin -w) &&
+	git -C "$SERVER" tag myblob "$BLOB" &&
+
+	# Craft a packfile not including that blob.
+	git -C "$SERVER" rev-parse HEAD |
+		git -C "$SERVER" pack-objects --stdout >incomplete.pack &&
+
+	# Replace the existing packfile with the crafted one. The protocol
+	# requires that the packfile be sent in sideband 1, hence the extra
+	# \x01 byte at the beginning.
+	printf "1,/packfile/!c %04x\\\\x01%s0000" \
+		"$(($(wc -c <incomplete.pack) + 5))" \
+		"$(sed_escape <incomplete.pack)" \
+		>"$HTTPD_ROOT_PATH/one-time-sed" &&
+
+	# Use protocol v2 because the sed command looks for the "packfile"
+	# section header.
+	test_config -C "$SERVER" protocol.version 2 &&
+	test_must_fail git -c protocol.version=2 clone \
+		--filter=blob:none $HTTPD_URL/one_time_sed/server repo 2>err &&
+
+	grep "did not send all necessary objects" err &&
+
+	# Ensure that the one-time-sed script was used.
+	! test -e "$HTTPD_ROOT_PATH/one-time-sed"
+'
+
+stop_httpd
+
 test_done
-- 
2.18.0.203.gfac676dfb9-goog


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH 0/2] Avoiding errors when partial cloning a tagged blob
  2018-07-06 19:34 [PATCH 0/2] Avoiding errors when partial cloning a tagged blob Jonathan Tan
  2018-07-06 19:34 ` [PATCH 1/2] upload-pack: send refs' objects despite "filter" Jonathan Tan
  2018-07-06 19:34 ` [PATCH 2/2] clone: check connectivity even if clone is partial Jonathan Tan
@ 2018-07-06 19:38 ` Jonathan Tan
  2018-07-06 20:10   ` Junio C Hamano
  2018-07-09 19:33 ` Junio C Hamano
  3 siblings, 1 reply; 12+ messages in thread
From: Jonathan Tan @ 2018-07-06 19:38 UTC (permalink / raw)
  To: jonathantanmy; +Cc: git, git

> When cloning a repository with a tagged blob (like the Git repository)
> with --filter=blob:none, the following message appears:
> 
>         error: missing object referenced by 'refs/tags/junio-gpg-pub'
> 
> and the resulting repository also fails fsck.
> 
> Patch 1 fixes the protocol documentation and the server side of Git, and
> patch 2 makes clone error out when such a situation occurs.
> 
> An argument could be made that we should not merge patch 2 just yet due
> to the fact that some server implementations (such as Git and JGit)
> still exhibit the old behavior, and the resulting clones (albeit failing
> fsck) are still usable, because when attempting to load the blob, Git
> will automatically fetch it. I'm on the fence about this, and have
> included patch 2 in this patch set nevertheless for completeness.
> 
> Jonathan Tan (2):
>   upload-pack: send refs' objects despite "filter"
>   clone: check connectivity even if clone is partial

Forgot to mention: this patch set is based on bw/ref-in-want because I
needed its one-time-sed functionality.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 0/2] Avoiding errors when partial cloning a tagged blob
  2018-07-06 19:38 ` [PATCH 0/2] Avoiding errors when partial cloning a tagged blob Jonathan Tan
@ 2018-07-06 20:10   ` Junio C Hamano
  2018-07-06 21:25     ` Jonathan Tan
  2018-07-06 21:29     ` Jonathan Nieder
  0 siblings, 2 replies; 12+ messages in thread
From: Junio C Hamano @ 2018-07-06 20:10 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git, git

Jonathan Tan <jonathantanmy@google.com> writes:

>> When cloning a repository with a tagged blob (like the Git repository)
>> with --filter=blob:none, the following message appears:
>> 
>>         error: missing object referenced by 'refs/tags/junio-gpg-pub'
>> 
>> and the resulting repository also fails fsck.

Hmph, the approach taken by these two patches smells bad.

When a blob is deliberately omitted with --fitler=blob:none, the
fsck that encounters an entry in a tree object that is about that
expected-to-be-and-actually-is-missing blob does *not* complain and
that is by design, right?  Naïvely, I may expect fsck to follow the
same principle--when it encounters a reference to an object that is
deliberately missing, refrain from complaining, regardless of the
place the offending reference was found, be it inside a tree object
or a ref.

Perhaps that line of consistent behaviour may be impossible due to
the way the reference is stored inside a tree object and a ref?
E.g. a reference to an expected-to-be-missing blob still lets us
know that the missing object is expected to be a blob, but a ref
only stores the object name and not its type, so we can tell it is
missing, but we cannot say it is OK to be missing because we expect
it to be a blob, or something?

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 0/2] Avoiding errors when partial cloning a tagged blob
  2018-07-06 20:10   ` Junio C Hamano
@ 2018-07-06 21:25     ` Jonathan Tan
  2018-07-06 21:29     ` Jonathan Nieder
  1 sibling, 0 replies; 12+ messages in thread
From: Jonathan Tan @ 2018-07-06 21:25 UTC (permalink / raw)
  To: gitster; +Cc: jonathantanmy, git, git

> Hmph, the approach taken by these two patches smells bad.
> 
> When a blob is deliberately omitted with --fitler=blob:none, the
> fsck that encounters an entry in a tree object that is about that
> expected-to-be-and-actually-is-missing blob does *not* complain and
> that is by design, right?  Naïvely, I may expect fsck to follow the
> same principle--when it encounters a reference to an object that is
> deliberately missing, refrain from complaining, regardless of the
> place the offending reference was found, be it inside a tree object
> or a ref.

The difference between the two is that this tree object is from a
packfile with an associated ".promisor" file, indicating that the
destinations of any "dangling" links are promisor objects (and thus,
fsck and others will not complain about those). We don't have such a
mechanism for refs.

This was a tradeoff between the desire to still detect repository
corruption whenever we can, and to not impact performance greatly when
operating on a partial repository.

Besides the solution in this patch set, we could (1) allow any ref to
point to a missing object, (2) allow a subset of refs to point to a
missing object, or (3) have another place in $GIT_DIR that can record
the hashes of promisor objects. (1) seems contrary to the repository
corruption detection goal that we have. (2) is plausible, but we would
have to figure out what to record - it's fine to record the RHS of the
default refspec when we clone, but I don't know what we should record
when the user subsequently runs "git fetch origin
refs/heads/master:refs/heads/master". As for (3), that would require
another repository extension since we wouldn't want an old Git to fetch
into a new Git repository without updating the promisor-hashes file -
but at this point, it seems easier to update the protocol instead. Any
comments are welcome, and I'll think about this some more.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 0/2] Avoiding errors when partial cloning a tagged blob
  2018-07-06 20:10   ` Junio C Hamano
  2018-07-06 21:25     ` Jonathan Tan
@ 2018-07-06 21:29     ` Jonathan Nieder
  2018-07-06 22:18       ` Junio C Hamano
  2018-07-09 16:07       ` Junio C Hamano
  1 sibling, 2 replies; 12+ messages in thread
From: Jonathan Nieder @ 2018-07-06 21:29 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jonathan Tan, git, git

Hi,

Junio C Hamano wrote:
>> Jonathan Tan <jonathantanmy@google.com> writes:

>>> When cloning a repository with a tagged blob (like the Git repository)
>>> with --filter=blob:none, the following message appears:
>>>
>>>         error: missing object referenced by 'refs/tags/junio-gpg-pub'
>>>
>>> and the resulting repository also fails fsck.
[...]
>                            Naïvely, I may expect fsck to follow the
> same principle--when it encounters a reference to an object that is
> deliberately missing, refrain from complaining, regardless of the
> place the offending reference was found, be it inside a tree object
> or a ref.
>
> Perhaps that line of consistent behaviour may be impossible due to
> the way the reference is stored inside a tree object and a ref?

Yes, exactly.  I believe this is why we did not treat refs as
promisors when we introduced promised objects.

We could revisit that decision and make refs into promisors, storing
any extra metadata on the side.  That would be a more invasive change
than this series does, though.  In the model used today, where refs
are not promisors, this series is the logical thing to do.

> E.g. a reference to an expected-to-be-missing blob still lets us
> know that the missing object is expected to be a blob, but a ref
> only stores the object name and not its type, so we can tell it is
> missing, but we cannot say it is OK to be missing because we expect
> it to be a blob, or something?

The main downside of saying that filters should not be applied to the
objects named directly in a "want" is that such objects could be large.

The upsides are that

- it makes the protocol more consistent with the current invariants
  maintained by "git fsck"

- down the line, it should make operations like "fetch just this one
  tree" a little simpler, since you can use

   filter blob:none
   filter tree:none
   want <this tree>

Later we could introduce a separate kind of "want" for which filters
apply to the object named directly, too, to provide more flexibility
for the client.  I can imagine cases where a person would want each of
those two behaviors, so I don't think we are incurring unnecessary
complexity by always sending the object named in a "want" without such
an explicit filtering request.  So I'm comfortable with the direction
this series goes in, though I haven't looked at the patches in detail.

Thanks,
Jonathan

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 0/2] Avoiding errors when partial cloning a tagged blob
  2018-07-06 21:29     ` Jonathan Nieder
@ 2018-07-06 22:18       ` Junio C Hamano
  2018-07-09 16:07       ` Junio C Hamano
  1 sibling, 0 replies; 12+ messages in thread
From: Junio C Hamano @ 2018-07-06 22:18 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: Jonathan Tan, git, git

Jonathan Nieder <jrnieder@gmail.com> writes:

> The upsides are that
> ...
> - down the line, it should make operations like "fetch just this one
>   tree" a little simpler, since you can use
>
>    filter blob:none
>    filter tree:none
>    want <this tree>

;-) 

I think this example, especially without the first line, would have
a practical use, i.e. "grab the contents of this directory without
recursing", and as such is a very convincing argument to support the
approach taken by the solution presented here.

> ....  So I'm comfortable with the direction
> this series goes in, though I haven't looked at the patches in detail.

Likewise.  Thanks.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 0/2] Avoiding errors when partial cloning a tagged blob
  2018-07-06 21:29     ` Jonathan Nieder
  2018-07-06 22:18       ` Junio C Hamano
@ 2018-07-09 16:07       ` Junio C Hamano
  2018-07-09 17:10         ` Jonathan Tan
  1 sibling, 1 reply; 12+ messages in thread
From: Junio C Hamano @ 2018-07-09 16:07 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: Jonathan Tan, git, git

Jonathan Nieder <jrnieder@gmail.com> writes:

>>>> When cloning a repository with a tagged blob (like the Git repository)
>>>> with --filter=blob:none, the following message appears:
>>>>
>>>>         error: missing object referenced by 'refs/tags/junio-gpg-pub'
>>>>
>>>> and the resulting repository also fails fsck.
> [...]
>>                            Naïvely, I may expect fsck to follow the
>> same principle--when it encounters a reference to an object that is
>> deliberately missing, refrain from complaining, regardless of the
>> place the offending reference was found, be it inside a tree object
>> or a ref.
>>
>> Perhaps that line of consistent behaviour may be impossible due to
>> the way the reference is stored inside a tree object and a ref?
>
> Yes, exactly.  I believe this is why we did not treat refs as
> promisors when we introduced promised objects.
>
> We could revisit that decision and make refs into promisors, storing
> any extra metadata on the side.  That would be a more invasive change
> than this series does, though.  In the model used today, where refs
> are not promisors, this series is the logical thing to do.

I think that refs in files-backend is a bit hard to update to
annotate with extra information like "is this merely promising or a
tip of the connectivity graph?" and agree that we should not make
refs promisors.

Such an otherwise unreferenced blob could be made into a promised
object by the promisor repository, though, I think.  It can include
a synthetic tree that points at such a blob in the pack stream data
if it knows that the resulting pack will be marked with the
.promisor bit, and by doing so, there is no need to update the
client side, no?


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 0/2] Avoiding errors when partial cloning a tagged blob
  2018-07-09 16:07       ` Junio C Hamano
@ 2018-07-09 17:10         ` Jonathan Tan
  0 siblings, 0 replies; 12+ messages in thread
From: Jonathan Tan @ 2018-07-09 17:10 UTC (permalink / raw)
  To: gitster; +Cc: jrnieder, jonathantanmy, git, git

> I think that refs in files-backend is a bit hard to update to
> annotate with extra information like "is this merely promising or a
> tip of the connectivity graph?" and agree that we should not make
> refs promisors.

OK, we agree on this :-)

> Such an otherwise unreferenced blob could be made into a promised
> object by the promisor repository, though, I think.  It can include
> a synthetic tree that points at such a blob in the pack stream data
> if it knows that the resulting pack will be marked with the
> .promisor bit, and by doing so, there is no need to update the
> client side, no?

Yes, that is possible, but I think the client side still needs to be
updated, for the same reason as in my implementation (the client should
check that the server sent all necessary objects - the blob itself in my
case, and the synthetic tree in yours).

I think that if we were planning to introduce a synthetic tree, though,
it would be better just to include the object itself (as in my
implementation). In both solutions, both the protocol and the server
need to be updated, but existing clients will still work (minus the
checking that would still be desirable with both solutions).

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 0/2] Avoiding errors when partial cloning a tagged blob
  2018-07-06 19:34 [PATCH 0/2] Avoiding errors when partial cloning a tagged blob Jonathan Tan
                   ` (2 preceding siblings ...)
  2018-07-06 19:38 ` [PATCH 0/2] Avoiding errors when partial cloning a tagged blob Jonathan Tan
@ 2018-07-09 19:33 ` Junio C Hamano
  2018-07-09 20:07   ` Jonathan Tan
  3 siblings, 1 reply; 12+ messages in thread
From: Junio C Hamano @ 2018-07-09 19:33 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git, git

Jonathan Tan <jonathantanmy@google.com> writes:

> When cloning a repository with a tagged blob (like the Git repository)
> with --filter=blob:none, the following message appears:
>
>         error: missing object referenced by 'refs/tags/junio-gpg-pub'
>
> and the resulting repository also fails fsck.
>
> Patch 1 fixes the protocol documentation and the server side of Git, and
> patch 2 makes clone error out when such a situation occurs.
>
> An argument could be made that we should not merge patch 2 just yet due
> to the fact that some server implementations (such as Git and JGit)
> still exhibit the old behavior, and the resulting clones (albeit failing
> fsck) are still usable, because when attempting to load the blob, Git
> will automatically fetch it. I'm on the fence about this, and have
> included patch 2 in this patch set nevertheless for completeness.

I think the latter is probably a good thing to nudge the server
implementations in the right direction by gently poking them ;-)

The patches textually apply cleanly on 'master' but apparently it
needs some fixes in jt/partial-clone-fsck-connectivity topic to
function correctly?


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 0/2] Avoiding errors when partial cloning a tagged blob
  2018-07-09 19:33 ` Junio C Hamano
@ 2018-07-09 20:07   ` Jonathan Tan
  0 siblings, 0 replies; 12+ messages in thread
From: Jonathan Tan @ 2018-07-09 20:07 UTC (permalink / raw)
  To: gitster; +Cc: jonathantanmy, git, git

> > An argument could be made that we should not merge patch 2 just yet due
> > to the fact that some server implementations (such as Git and JGit)
> > still exhibit the old behavior, and the resulting clones (albeit failing
> > fsck) are still usable, because when attempting to load the blob, Git
> > will automatically fetch it. I'm on the fence about this, and have
> > included patch 2 in this patch set nevertheless for completeness.
> 
> I think the latter is probably a good thing to nudge the server
> implementations in the right direction by gently poking them ;-)

OK, I'll keep patch 2 then.

> The patches textually apply cleanly on 'master' but apparently it
> needs some fixes in jt/partial-clone-fsck-connectivity topic to
> function correctly?

I forgot to mention in the original e-mail that this is on
bw/ref-in-want. (I mentioned it in an e-mail I sent later [1].) I've
checked and basing it on both bw/ref-in-want and
jt/partial-clone-fsck-connectivity (which is based on bw/ref-in-want
itself) both work, so you can choose either one.

[1] https://public-inbox.org/git/20180706193847.160161-1-jonathantanmy@google.com/

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2018-07-09 20:07 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-07-06 19:34 [PATCH 0/2] Avoiding errors when partial cloning a tagged blob Jonathan Tan
2018-07-06 19:34 ` [PATCH 1/2] upload-pack: send refs' objects despite "filter" Jonathan Tan
2018-07-06 19:34 ` [PATCH 2/2] clone: check connectivity even if clone is partial Jonathan Tan
2018-07-06 19:38 ` [PATCH 0/2] Avoiding errors when partial cloning a tagged blob Jonathan Tan
2018-07-06 20:10   ` Junio C Hamano
2018-07-06 21:25     ` Jonathan Tan
2018-07-06 21:29     ` Jonathan Nieder
2018-07-06 22:18       ` Junio C Hamano
2018-07-09 16:07       ` Junio C Hamano
2018-07-09 17:10         ` Jonathan Tan
2018-07-09 19:33 ` Junio C Hamano
2018-07-09 20:07   ` Jonathan Tan

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).