git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH 0/2] Make partial clone/fetch work when transfer.fsckobjects=1
@ 2018-03-14 18:42 Jonathan Tan
  2018-03-14 18:42 ` [PATCH 1/2] index-pack: support checking objects but not links Jonathan Tan
  2018-03-14 18:42 ` [PATCH 2/2] fetch-pack: do not check links for partial fetch Jonathan Tan
  0 siblings, 2 replies; 6+ messages in thread
From: Jonathan Tan @ 2018-03-14 18:42 UTC (permalink / raw)
  To: git; +Cc: git, Jonathan Tan

One of my colleagues noticed that we obtain a "fatal: did not receive
expected object" error when partial-cloning (that is, with --filter set)
if transfer.fsckobjects is true. Here's a fix for that.

Jonathan Tan (2):
  index-pack: support checking objects but not links
  fetch-pack: do not check links for partial fetch

 Documentation/git-index-pack.txt |  3 +++
 builtin/index-pack.c             |  6 ++++--
 fetch-pack.c                     | 13 +++++++++++--
 t/t5302-pack-index.sh            |  5 +++++
 t/t5616-partial-clone.sh         | 11 +++++++++++
 5 files changed, 34 insertions(+), 4 deletions(-)

-- 
2.16.2.520.gd0db9edba.dirty


^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH 1/2] index-pack: support checking objects but not links
  2018-03-14 18:42 [PATCH 0/2] Make partial clone/fetch work when transfer.fsckobjects=1 Jonathan Tan
@ 2018-03-14 18:42 ` Jonathan Tan
  2018-03-14 18:42 ` [PATCH 2/2] fetch-pack: do not check links for partial fetch Jonathan Tan
  1 sibling, 0 replies; 6+ messages in thread
From: Jonathan Tan @ 2018-03-14 18:42 UTC (permalink / raw)
  To: git; +Cc: git, Jonathan Tan

The index-pack command currently supports the
--check-self-contained-and-connected argument, for internal use only,
that instructs it to only check for broken links and not broken objects.
For partial clones, we need the inverse, so add a --fsck-objects
argument that checks for broken objects and not broken links, also for
internal use only.

This will be used by fetch-pack in a subsequent patch.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
 Documentation/git-index-pack.txt | 3 +++
 builtin/index-pack.c             | 6 ++++--
 t/t5302-pack-index.sh            | 5 +++++
 3 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/Documentation/git-index-pack.txt b/Documentation/git-index-pack.txt
index 1b4b65d66..138edb47b 100644
--- a/Documentation/git-index-pack.txt
+++ b/Documentation/git-index-pack.txt
@@ -77,6 +77,9 @@ OPTIONS
 --check-self-contained-and-connected::
 	Die if the pack contains broken links. For internal use only.
 
+--fsck-objects::
+	Die if the pack contains broken objects. For internal use only.
+
 --threads=<n>::
 	Specifies the number of threads to spawn when resolving
 	deltas. This requires that index-pack be compiled with
diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 59878e70b..f46cb5967 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -827,7 +827,7 @@ static void sha1_object(const void *data, struct object_entry *obj_entry,
 		free(has_data);
 	}
 
-	if (strict) {
+	if (strict || do_fsck_object) {
 		read_lock();
 		if (type == OBJ_BLOB) {
 			struct blob *blob = lookup_blob(oid);
@@ -853,7 +853,7 @@ static void sha1_object(const void *data, struct object_entry *obj_entry,
 			if (do_fsck_object &&
 			    fsck_object(obj, buf, size, &fsck_options))
 				die(_("Error in object"));
-			if (fsck_walk(obj, NULL, &fsck_options))
+			if (strict && fsck_walk(obj, NULL, &fsck_options))
 				die(_("Not all child objects of %s are reachable"), oid_to_hex(&obj->oid));
 
 			if (obj->type == OBJ_TREE) {
@@ -1688,6 +1688,8 @@ int cmd_index_pack(int argc, const char **argv, const char *prefix)
 			} else if (!strcmp(arg, "--check-self-contained-and-connected")) {
 				strict = 1;
 				check_self_contained_and_connected = 1;
+			} else if (!strcmp(arg, "--fsck-objects")) {
+				do_fsck_object = 1;
 			} else if (!strcmp(arg, "--verify")) {
 				verify = 1;
 			} else if (!strcmp(arg, "--verify-stat")) {
diff --git a/t/t5302-pack-index.sh b/t/t5302-pack-index.sh
index c2fc584da..d695a6082 100755
--- a/t/t5302-pack-index.sh
+++ b/t/t5302-pack-index.sh
@@ -262,4 +262,9 @@ EOF
     grep "^warning:.* expected .tagger. line" err
 '
 
+test_expect_success 'index-pack --fsck-objects also warns upon missing tagger in tag' '
+    git index-pack --fsck-objects tag-test-${pack1}.pack 2>err &&
+    grep "^warning:.* expected .tagger. line" err
+'
+
 test_done
-- 
2.16.2.520.gd0db9edba.dirty


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH 2/2] fetch-pack: do not check links for partial fetch
  2018-03-14 18:42 [PATCH 0/2] Make partial clone/fetch work when transfer.fsckobjects=1 Jonathan Tan
  2018-03-14 18:42 ` [PATCH 1/2] index-pack: support checking objects but not links Jonathan Tan
@ 2018-03-14 18:42 ` Jonathan Tan
  2018-03-14 21:55   ` Junio C Hamano
  1 sibling, 1 reply; 6+ messages in thread
From: Jonathan Tan @ 2018-03-14 18:42 UTC (permalink / raw)
  To: git; +Cc: git, Jonathan Tan

When doing a partial clone or fetch with transfer.fsckobjects=1, use the
--fsck-objects instead of the --strict flag when invoking index-pack so
that links are not checked, only objects. This is because incomplete
links are expected when doing a partial clone or fetch.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
 fetch-pack.c             | 13 +++++++++++--
 t/t5616-partial-clone.sh | 11 +++++++++++
 2 files changed, 22 insertions(+), 2 deletions(-)

diff --git a/fetch-pack.c b/fetch-pack.c
index d97461296..1d6117565 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -886,8 +886,17 @@ static int get_pack(struct fetch_pack_args *args,
 	    ? fetch_fsck_objects
 	    : transfer_fsck_objects >= 0
 	    ? transfer_fsck_objects
-	    : 0)
-		argv_array_push(&cmd.args, "--strict");
+	    : 0) {
+		if (args->from_promisor)
+			/*
+			 * We cannot use --strict in index-pack because it
+			 * checks both broken objects and links, but we only
+			 * want to check for broken objects.
+			 */
+			argv_array_push(&cmd.args, "--fsck-objects");
+		else
+			argv_array_push(&cmd.args, "--strict");
+	}
 
 	cmd.in = demux.out;
 	cmd.git_cmd = 1;
diff --git a/t/t5616-partial-clone.sh b/t/t5616-partial-clone.sh
index 29d863118..cee556536 100755
--- a/t/t5616-partial-clone.sh
+++ b/t/t5616-partial-clone.sh
@@ -143,4 +143,15 @@ test_expect_success 'manual prefetch of missing objects' '
 	test_line_count = 0 observed.oids
 '
 
+test_expect_success 'partial clone with transfer.fsckobjects=1 uses index-pack --fsck-objects' '
+	git init src &&
+	test_commit -C src x &&
+	test_config -C src uploadpack.allowfilter 1 &&
+	test_config -C src uploadpack.allowanysha1inwant 1 &&
+
+	GIT_TRACE="$(pwd)/trace" git -c transfer.fsckobjects=1 \
+		clone --filter="blob:none" "file://$(pwd)/src" dst &&
+	grep "git index-pack.*--fsck-objects" trace
+'
+
 test_done
-- 
2.16.2.520.gd0db9edba.dirty


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH 2/2] fetch-pack: do not check links for partial fetch
  2018-03-14 18:42 ` [PATCH 2/2] fetch-pack: do not check links for partial fetch Jonathan Tan
@ 2018-03-14 21:55   ` Junio C Hamano
  2018-03-14 22:24     ` Jonathan Tan
  0 siblings, 1 reply; 6+ messages in thread
From: Junio C Hamano @ 2018-03-14 21:55 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git, git

Jonathan Tan <jonathantanmy@google.com> writes:

> When doing a partial clone or fetch with transfer.fsckobjects=1, use the
> --fsck-objects instead of the --strict flag when invoking index-pack so
> that links are not checked, only objects. This is because incomplete
> links are expected when doing a partial clone or fetch.

It is expected that _some_ links are missing, but this makes me
wonder if we can do better than disabling the connectivity check
altogether.  Does "git fetch" lack sufficient information to attempt
the connectivity check, and when (and only when) it hits a broken
link, see if that is because the connectivity check traversal is
crossing a "partial" fetch boundary, or something along that line?

>
> Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
> ---
>  fetch-pack.c             | 13 +++++++++++--
>  t/t5616-partial-clone.sh | 11 +++++++++++
>  2 files changed, 22 insertions(+), 2 deletions(-)
>
> diff --git a/fetch-pack.c b/fetch-pack.c
> index d97461296..1d6117565 100644
> --- a/fetch-pack.c
> +++ b/fetch-pack.c
> @@ -886,8 +886,17 @@ static int get_pack(struct fetch_pack_args *args,
>  	    ? fetch_fsck_objects
>  	    : transfer_fsck_objects >= 0
>  	    ? transfer_fsck_objects
> -	    : 0)
> -		argv_array_push(&cmd.args, "--strict");
> +	    : 0) {
> +		if (args->from_promisor)
> +			/*
> +			 * We cannot use --strict in index-pack because it
> +			 * checks both broken objects and links, but we only
> +			 * want to check for broken objects.
> +			 */
> +			argv_array_push(&cmd.args, "--fsck-objects");
> +		else
> +			argv_array_push(&cmd.args, "--strict");
> +	}
>  
>  	cmd.in = demux.out;
>  	cmd.git_cmd = 1;
> diff --git a/t/t5616-partial-clone.sh b/t/t5616-partial-clone.sh
> index 29d863118..cee556536 100755
> --- a/t/t5616-partial-clone.sh
> +++ b/t/t5616-partial-clone.sh
> @@ -143,4 +143,15 @@ test_expect_success 'manual prefetch of missing objects' '
>  	test_line_count = 0 observed.oids
>  '
>  
> +test_expect_success 'partial clone with transfer.fsckobjects=1 uses index-pack --fsck-objects' '
> +	git init src &&
> +	test_commit -C src x &&
> +	test_config -C src uploadpack.allowfilter 1 &&
> +	test_config -C src uploadpack.allowanysha1inwant 1 &&
> +
> +	GIT_TRACE="$(pwd)/trace" git -c transfer.fsckobjects=1 \
> +		clone --filter="blob:none" "file://$(pwd)/src" dst &&
> +	grep "git index-pack.*--fsck-objects" trace
> +'
> +
>  test_done

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH 2/2] fetch-pack: do not check links for partial fetch
  2018-03-14 21:55   ` Junio C Hamano
@ 2018-03-14 22:24     ` Jonathan Tan
  2018-03-15 17:15       ` Junio C Hamano
  0 siblings, 1 reply; 6+ messages in thread
From: Jonathan Tan @ 2018-03-14 22:24 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, git

On Wed, 14 Mar 2018 14:55:31 -0700
Junio C Hamano <gitster@pobox.com> wrote:

> Jonathan Tan <jonathantanmy@google.com> writes:
> 
> > When doing a partial clone or fetch with transfer.fsckobjects=1, use the
> > --fsck-objects instead of the --strict flag when invoking index-pack so
> > that links are not checked, only objects. This is because incomplete
> > links are expected when doing a partial clone or fetch.
> 
> It is expected that _some_ links are missing, but this makes me
> wonder if we can do better than disabling the connectivity check
> altogether.  Does "git fetch" lack sufficient information to attempt
> the connectivity check, and when (and only when) it hits a broken
> link, see if that is because the connectivity check traversal is
> crossing a "partial" fetch boundary, or something along that line?

Our only definition (currently) for the "partial" fetch boundary is
whether an object in a promisor packfile (a packfile obtained from the
promisor remote) references it, so I think that checking for crossing a
"partial" fetch boundary does not add anything. This is because by that
definition, any missing links observed from objects newly fetched from
the promisor remote cross a "partial" fetch boundary (since all objects
fetched in this way "promise" all objects that they refer to).

But it is true that we might be able to do better in checking, for
example, that a packfile fetched using a blob size limit contains all
referenced trees (that is, only blobs are allowed to be missing).

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH 2/2] fetch-pack: do not check links for partial fetch
  2018-03-14 22:24     ` Jonathan Tan
@ 2018-03-15 17:15       ` Junio C Hamano
  0 siblings, 0 replies; 6+ messages in thread
From: Junio C Hamano @ 2018-03-15 17:15 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git, git

Jonathan Tan <jonathantanmy@google.com> writes:

> Our only definition (currently) for the "partial" fetch boundary is
> whether an object in a promisor packfile (a packfile obtained from the
> promisor remote) references it, so I think that checking for crossing a
> "partial" fetch boundary does not add anything.

Ah, that's a perfect answer. When we find a link that leads to a
missing object in a pack after receiving it from a remote repository
that gives us promises, the missing object must be available from
the remote repository by definition.  Makes perfect sense ;-)

Thanks.

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2018-03-15 17:15 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-03-14 18:42 [PATCH 0/2] Make partial clone/fetch work when transfer.fsckobjects=1 Jonathan Tan
2018-03-14 18:42 ` [PATCH 1/2] index-pack: support checking objects but not links Jonathan Tan
2018-03-14 18:42 ` [PATCH 2/2] fetch-pack: do not check links for partial fetch Jonathan Tan
2018-03-14 21:55   ` Junio C Hamano
2018-03-14 22:24     ` Jonathan Tan
2018-03-15 17:15       ` Junio C Hamano

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).