git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH] fetch: in partial clone, check presence of targets
@ 2018-09-20 18:48 Jonathan Tan
  2018-09-20 20:50 ` Junio C Hamano
  2018-09-21 18:22 ` [PATCH v2 0/2] Check presence of targets when fetching to partial clone Jonathan Tan
  0 siblings, 2 replies; 8+ messages in thread
From: Jonathan Tan @ 2018-09-20 18:48 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, matvore

When fetching an object that is known as a promisor object to the local
repository, the connectivity check in quickfetch() in builtin/fetch.c
succeeds, causing object transfer to be bypassed. However, this should
not happen if that object is merely promised and not actually present.

Because this happens, when a user invokes "git fetch origin <sha-1>" on
the command-line, the <sha-1> object may not actually be fetched even
though the command returns an exit code of 0. This is a similar issue
(but with a different cause) to the one fixed by a0c9016abd
("upload-pack: send refs' objects despite "filter"", 2018-07-09).

Therefore, update quickfetch() to also directly check for the presence
of all objects to be fetched.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
In this patch, I have striven to avoid piping from Git commands that may
fail, following the guidelines in [1].

[1] https://public-inbox.org/git/c625bfe2205d51b3158ef71e4bf472708642c146.1537223021.git.matvore@google.com/
---
 builtin/fetch.c          | 19 +++++++++++++++++++
 t/t5616-partial-clone.sh | 17 +++++++++++++++++
 2 files changed, 36 insertions(+)

diff --git a/builtin/fetch.c b/builtin/fetch.c
index 61bec5d21..e9640fe5a 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -938,6 +938,25 @@ static int quickfetch(struct ref *ref_map)
 	 */
 	if (deepen)
 		return -1;
+
+	if (repository_format_partial_clone) {
+		/*
+		 * For the purposes of the connectivity check,
+		 * check_connected() considers all objects promised by
+		 * promisor objects as existing, which means that the
+		 * connectivity check would pass even if a target object
+		 * in rm is merely promised and not present. When
+		 * fetching objects, we need all of them to be present
+		 * (in addition to having correct connectivity), so
+		 * check them directly.
+		 */
+		struct ref *r;
+		for (r = rm; r; r = r->next) {
+			if (!has_object_file(&r->old_oid))
+				return -1;
+		}
+	}
+
 	opt.quiet = 1;
 	return check_connected(iterate_ref_map, &rm, &opt);
 }
diff --git a/t/t5616-partial-clone.sh b/t/t5616-partial-clone.sh
index bbbe7537d..359d27d02 100755
--- a/t/t5616-partial-clone.sh
+++ b/t/t5616-partial-clone.sh
@@ -170,6 +170,23 @@ test_expect_success 'partial clone fetches blobs pointed to by refs even if norm
 	git -C dst fsck
 '
 
+test_expect_success 'fetch what is specified on CLI even if already promised' '
+	rm -rf src dst.git &&
+	git init src &&
+	test_commit -C src foo &&
+	test_config -C src uploadpack.allowfilter 1 &&
+	test_config -C src uploadpack.allowanysha1inwant 1 &&
+
+	git hash-object --stdin <src/foo.t >blob &&
+
+	git clone --bare --filter=blob:none "file://$(pwd)/src" dst.git &&
+	git -C dst.git rev-list --objects --quiet --missing=print HEAD >missing_before &&
+	grep "?$(cat blob)" missing_before &&
+	git -C dst.git fetch origin $(cat blob) &&
+	git -C dst.git rev-list --objects --quiet --missing=print HEAD >missing_after &&
+	! grep "?$(cat blob)" missing_after
+'
+
 . "$TEST_DIRECTORY"/lib-httpd.sh
 start_httpd
 
-- 
2.19.0.444.g18242da7ef-goog


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH] fetch: in partial clone, check presence of targets
  2018-09-20 18:48 [PATCH] fetch: in partial clone, check presence of targets Jonathan Tan
@ 2018-09-20 20:50 ` Junio C Hamano
  2018-09-20 22:10   ` Jonathan Tan
  2018-09-21 18:22 ` [PATCH v2 0/2] Check presence of targets when fetching to partial clone Jonathan Tan
  1 sibling, 1 reply; 8+ messages in thread
From: Junio C Hamano @ 2018-09-20 20:50 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git, matvore

Jonathan Tan <jonathantanmy@google.com> writes:

> diff --git a/builtin/fetch.c b/builtin/fetch.c
> index 61bec5d21..e9640fe5a 100644
> --- a/builtin/fetch.c
> +++ b/builtin/fetch.c
> @@ -938,6 +938,25 @@ static int quickfetch(struct ref *ref_map)
>  	 */
>  	if (deepen)
>  		return -1;
> +
> +	if (repository_format_partial_clone) {
> +		/*
> +		 * For the purposes of the connectivity check,
> +		 * check_connected() considers all objects promised by
> +		 * promisor objects as existing, which means that the
> +		 * connectivity check would pass even if a target object
> +		 * in rm is merely promised and not present. When
> +		 * fetching objects, we need all of them to be present
> +		 * (in addition to having correct connectivity), so
> +		 * check them directly.
> +		 */
> +		struct ref *r;
> +		for (r = rm; r; r = r->next) {
> +			if (!has_object_file(&r->old_oid))
> +				return -1;
> +		}

Because check_connected() lies in the presense of "promisor", we can
defeat it this way, which makes sense.  

I wonder if it makes sense to do this check unconditionally, even
when we are in a fully functioning clone.  The check is quite cheap
and can reject a ref_map that has an object we do not know about,
without check_connected() having to ask the rev-list.

> +	}
> +
>  	opt.quiet = 1;
>  	return check_connected(iterate_ref_map, &rm, &opt);
>  }
> diff --git a/t/t5616-partial-clone.sh b/t/t5616-partial-clone.sh
> index bbbe7537d..359d27d02 100755
> --- a/t/t5616-partial-clone.sh
> +++ b/t/t5616-partial-clone.sh
> @@ -170,6 +170,23 @@ test_expect_success 'partial clone fetches blobs pointed to by refs even if norm
>  	git -C dst fsck
>  '
>  
> +test_expect_success 'fetch what is specified on CLI even if already promised' '
> +	rm -rf src dst.git &&
> +	git init src &&
> +	test_commit -C src foo &&
> +	test_config -C src uploadpack.allowfilter 1 &&
> +	test_config -C src uploadpack.allowanysha1inwant 1 &&
> +
> +	git hash-object --stdin <src/foo.t >blob &&
> +
> +	git clone --bare --filter=blob:none "file://$(pwd)/src" dst.git &&
> +	git -C dst.git rev-list --objects --quiet --missing=print HEAD >missing_before &&
> +	grep "?$(cat blob)" missing_before &&
> +	git -C dst.git fetch origin $(cat blob) &&
> +	git -C dst.git rev-list --objects --quiet --missing=print HEAD >missing_after &&
> +	! grep "?$(cat blob)" missing_after
> +'
> +
>  . "$TEST_DIRECTORY"/lib-httpd.sh
>  start_httpd

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] fetch: in partial clone, check presence of targets
  2018-09-20 20:50 ` Junio C Hamano
@ 2018-09-20 22:10   ` Jonathan Tan
  2018-09-20 23:28     ` Junio C Hamano
  0 siblings, 1 reply; 8+ messages in thread
From: Jonathan Tan @ 2018-09-20 22:10 UTC (permalink / raw)
  To: gitster; +Cc: jonathantanmy, git, matvore

> Jonathan Tan <jonathantanmy@google.com> writes:
> 
> > +	if (repository_format_partial_clone) {
> > +		/*
> > +		 * For the purposes of the connectivity check,
> > +		 * check_connected() considers all objects promised by
> > +		 * promisor objects as existing, which means that the
> > +		 * connectivity check would pass even if a target object
> > +		 * in rm is merely promised and not present. When
> > +		 * fetching objects, we need all of them to be present
> > +		 * (in addition to having correct connectivity), so
> > +		 * check them directly.
> > +		 */
> > +		struct ref *r;
> > +		for (r = rm; r; r = r->next) {
> > +			if (!has_object_file(&r->old_oid))
> > +				return -1;
> > +		}
> 
> Because check_connected() lies in the presense of "promisor", we can
> defeat it this way, which makes sense.  
> 
> I wonder if it makes sense to do this check unconditionally, even
> when we are in a fully functioning clone.  The check is quite cheap
> and can reject a ref_map that has an object we do not know about,
> without check_connected() having to ask the rev-list.

It makes sense to me from a runtime point of view - the usual case, for
me, is when we're missing at least one object that we're fetching, and
doing the cheap check even allows us to skip the expensive check.

The hard part for me lies in how to communicate to future readers of the
code that they cannot remove this section to simplify the code. We would
need a more complicated comment, something like this:

 /*
  * Check if all wanted objects are present.
  *
  * If the local repository is a partial clone, check_connected() is not
  * sufficient - it would pass even if a wanted object is merely
  * promised and not present. This is because, for the purposes of the
  * connectivity check, check_connected() considers all objects promised
  * by promisor objects as existing.
  *
  * If the local repository is not a partial clone, check_connected() is
  * sufficient, but this loop allows us to avoid the expensive
  * check_connected() in the usual case that at least one wanted object
  * is missing.
  */

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] fetch: in partial clone, check presence of targets
  2018-09-20 22:10   ` Jonathan Tan
@ 2018-09-20 23:28     ` Junio C Hamano
  0 siblings, 0 replies; 8+ messages in thread
From: Junio C Hamano @ 2018-09-20 23:28 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git, matvore

Jonathan Tan <jonathantanmy@google.com> writes:

> The hard part for me lies in how to communicate to future readers of the
> code that they cannot remove this section to simplify the code. We would
> need a more complicated comment, something like this:

That suggests two things.

 - Perhaps quickfetch() is misnamed.  It is to ensure "these exist,
   and are 'connected'"; renaming the helper to convey that would be
   sufficient to prevent future readers from removing the "these
   exist" check from it.

 - Perhaps check_connected() is also misnamed, if the above "these
   exist, and are 'connected'" is not a sufficient warning against
   removal of the "these exist" test, perhaps "check_connected()" is
   not telling the readers that things that are 'connected' do not
   have to exist.  What does being 'connected' mean in the world
   with "promised" objects anyway?  The designer of the feature
   should probably have a concise and clear answer.

>  /*
>   * Check if all wanted objects are present.

Here 'wanted' means... the tip that was directly requested must
exist, and in addition, anything that is reachable from it must
either exist locally or available from the lazy-clone source?  But
that is not quite it. Your definition of 'present' is fuzzy and mean
two different things---for the wanted tips, they must exist.  For
the objects that are required for these wanted tips to be well
formed, they do not have to exist but it is OK for them to be merely
promised.

Perhaps the comment for the quickfetch() function itself should say

/*
 * Ensure that the requested tips exist locally, and anything that is
 * reachable from them either exist locally or promised to be available.
 */

Adding a similar comment to check_connected() function is left as an
exercise, but I suspect it would be the latter half of the above
sentence.

It may be worth renaming both functions for clarity, as I mentioned
already.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH v2 0/2] Check presence of targets when fetching to partial clone
  2018-09-20 18:48 [PATCH] fetch: in partial clone, check presence of targets Jonathan Tan
  2018-09-20 20:50 ` Junio C Hamano
@ 2018-09-21 18:22 ` Jonathan Tan
  2018-09-21 18:22   ` [PATCH v2 1/2] connected: document connectivity in partial clones Jonathan Tan
  2018-09-21 18:22   ` [PATCH v2 2/2] fetch: in partial clone, check presence of targets Jonathan Tan
  1 sibling, 2 replies; 8+ messages in thread
From: Jonathan Tan @ 2018-09-21 18:22 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, gitster

New in v2:
 - added patch to clarify in documentation what check_connected() does
 - renamed quickfetch() to check_exist_and_connected() to better reflect
   what it does
 - also updated its documentation; I avoided usage of "wanted objects"
   and used "fetch targets" instead to clarify that I'm only talking
   about the direct targets, not all objects referenced by those targets
 - in check_exist_and_connected() (previously quickfetch()), check
   existence directly regardless of whether the repository is a partial
   clone or not

This should resolve most of Junio's comments in [1], except that I chose
not to modify or rename check_connected(). In this current world, it is
true that we usually require existence of ref targets, but that might
not be so in the future (having said that, I don't know of any schedules
for this future). Also, check_connected() is used in a few places, so
such a change would cause some churn in the codebase. So I left this
function alone.

[1] https://public-inbox.org/git/xmqqy3bvycie.fsf@gitster-ct.c.googlers.com/

Jonathan Tan (2):
  connected: document connectivity in partial clones
  fetch: in partial clone, check presence of targets

 builtin/fetch.c          | 15 +++++++++++++--
 connected.h              |  6 +++---
 t/t5616-partial-clone.sh | 17 +++++++++++++++++
 3 files changed, 33 insertions(+), 5 deletions(-)

-- 
2.19.0.444.g18242da7ef-goog


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH v2 1/2] connected: document connectivity in partial clones
  2018-09-21 18:22 ` [PATCH v2 0/2] Check presence of targets when fetching to partial clone Jonathan Tan
@ 2018-09-21 18:22   ` Jonathan Tan
  2018-09-21 20:19     ` Junio C Hamano
  2018-09-21 18:22   ` [PATCH v2 2/2] fetch: in partial clone, check presence of targets Jonathan Tan
  1 sibling, 1 reply; 8+ messages in thread
From: Jonathan Tan @ 2018-09-21 18:22 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, gitster

In acb0c57260 ("fetch: support filters", 2017-12-08), check_connected()
was extended to allow objects to either be promised to be available (if
the repository is a partial clone) or to be present; previously, this
function required the latter. However, this change was not reflected in
the documentation of that function. Update the documentation
accordingly.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
 connected.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/connected.h b/connected.h
index e4c961817..8d5a6b3ad 100644
--- a/connected.h
+++ b/connected.h
@@ -51,9 +51,9 @@ struct check_connected_options {
 #define CHECK_CONNECTED_INIT { 0 }
 
 /*
- * Make sure that our object store has all the commits necessary to
- * connect the ancestry chain to some of our existing refs, and all
- * the trees and blobs that these commits use.
+ * Make sure that all given objects and all objects reachable from them
+ * either exist in our object store or (if the repository is a partial
+ * clone) are promised to be available.
  *
  * Return 0 if Ok, non zero otherwise (i.e. some missing objects)
  *
-- 
2.19.0.444.g18242da7ef-goog


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH v2 2/2] fetch: in partial clone, check presence of targets
  2018-09-21 18:22 ` [PATCH v2 0/2] Check presence of targets when fetching to partial clone Jonathan Tan
  2018-09-21 18:22   ` [PATCH v2 1/2] connected: document connectivity in partial clones Jonathan Tan
@ 2018-09-21 18:22   ` Jonathan Tan
  1 sibling, 0 replies; 8+ messages in thread
From: Jonathan Tan @ 2018-09-21 18:22 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, gitster

When fetching an object that is known as a promisor object to the local
repository, the connectivity check in quickfetch() in builtin/fetch.c
succeeds, causing object transfer to be bypassed. However, this should
not happen if that object is merely promised and not actually present.

Because this happens, when a user invokes "git fetch origin <sha-1>" on
the command-line, the <sha-1> object may not actually be fetched even
though the command returns an exit code of 0. This is a similar issue
(but with a different cause) to the one fixed by a0c9016abd
("upload-pack: send refs' objects despite "filter"", 2018-07-09).

Therefore, update quickfetch() to also directly check for the presence
of all objects to be fetched. Its documentation and name are also
updated to better reflect what it does.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
 builtin/fetch.c          | 15 +++++++++++++--
 t/t5616-partial-clone.sh | 17 +++++++++++++++++
 2 files changed, 30 insertions(+), 2 deletions(-)

diff --git a/builtin/fetch.c b/builtin/fetch.c
index 61bec5d21..b9e74c129 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -924,10 +924,11 @@ static int store_updated_refs(const char *raw_url, const char *remote_name,
  * everything we are going to fetch already exists and is connected
  * locally.
  */
-static int quickfetch(struct ref *ref_map)
+static int check_exist_and_connected(struct ref *ref_map)
 {
 	struct ref *rm = ref_map;
 	struct check_connected_options opt = CHECK_CONNECTED_INIT;
+	struct ref *r;
 
 	/*
 	 * If we are deepening a shallow clone we already have these
@@ -938,13 +939,23 @@ static int quickfetch(struct ref *ref_map)
 	 */
 	if (deepen)
 		return -1;
+
+	/*
+	 * check_connected() allows objects to merely be promised, but
+	 * we need all direct targets to exist.
+	 */
+	for (r = rm; r; r = r->next) {
+		if (!has_object_file(&r->old_oid))
+			return -1;
+	}
+
 	opt.quiet = 1;
 	return check_connected(iterate_ref_map, &rm, &opt);
 }
 
 static int fetch_refs(struct transport *transport, struct ref *ref_map)
 {
-	int ret = quickfetch(ref_map);
+	int ret = check_exist_and_connected(ref_map);
 	if (ret)
 		ret = transport_fetch_refs(transport, ref_map);
 	if (!ret)
diff --git a/t/t5616-partial-clone.sh b/t/t5616-partial-clone.sh
index bbbe7537d..359d27d02 100755
--- a/t/t5616-partial-clone.sh
+++ b/t/t5616-partial-clone.sh
@@ -170,6 +170,23 @@ test_expect_success 'partial clone fetches blobs pointed to by refs even if norm
 	git -C dst fsck
 '
 
+test_expect_success 'fetch what is specified on CLI even if already promised' '
+	rm -rf src dst.git &&
+	git init src &&
+	test_commit -C src foo &&
+	test_config -C src uploadpack.allowfilter 1 &&
+	test_config -C src uploadpack.allowanysha1inwant 1 &&
+
+	git hash-object --stdin <src/foo.t >blob &&
+
+	git clone --bare --filter=blob:none "file://$(pwd)/src" dst.git &&
+	git -C dst.git rev-list --objects --quiet --missing=print HEAD >missing_before &&
+	grep "?$(cat blob)" missing_before &&
+	git -C dst.git fetch origin $(cat blob) &&
+	git -C dst.git rev-list --objects --quiet --missing=print HEAD >missing_after &&
+	! grep "?$(cat blob)" missing_after
+'
+
 . "$TEST_DIRECTORY"/lib-httpd.sh
 start_httpd
 
-- 
2.19.0.444.g18242da7ef-goog


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH v2 1/2] connected: document connectivity in partial clones
  2018-09-21 18:22   ` [PATCH v2 1/2] connected: document connectivity in partial clones Jonathan Tan
@ 2018-09-21 20:19     ` Junio C Hamano
  0 siblings, 0 replies; 8+ messages in thread
From: Junio C Hamano @ 2018-09-21 20:19 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git

Jonathan Tan <jonathantanmy@google.com> writes:

> In acb0c57260 ("fetch: support filters", 2017-12-08), check_connected()
> was extended to allow objects to either be promised to be available (if
> the repository is a partial clone) or to be present; previously, this
> function required the latter. However, this change was not reflected in
> the documentation of that function. Update the documentation
> accordingly.
>
> Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
> ---
>  connected.h | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)

Very much makes sense.  I think this is sufficient clarification to
allay your earlier worry of having to have a huge in-code comment to
prevent the "must exist" loop from getting removed.

> diff --git a/connected.h b/connected.h
> index e4c961817..8d5a6b3ad 100644
> --- a/connected.h
> +++ b/connected.h
> @@ -51,9 +51,9 @@ struct check_connected_options {
>  #define CHECK_CONNECTED_INIT { 0 }
>  
>  /*
> - * Make sure that our object store has all the commits necessary to
> - * connect the ancestry chain to some of our existing refs, and all
> - * the trees and blobs that these commits use.
> + * Make sure that all given objects and all objects reachable from them
> + * either exist in our object store or (if the repository is a partial
> + * clone) are promised to be available.
>   *
>   * Return 0 if Ok, non zero otherwise (i.e. some missing objects)
>   *

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2018-09-21 20:19 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-09-20 18:48 [PATCH] fetch: in partial clone, check presence of targets Jonathan Tan
2018-09-20 20:50 ` Junio C Hamano
2018-09-20 22:10   ` Jonathan Tan
2018-09-20 23:28     ` Junio C Hamano
2018-09-21 18:22 ` [PATCH v2 0/2] Check presence of targets when fetching to partial clone Jonathan Tan
2018-09-21 18:22   ` [PATCH v2 1/2] connected: document connectivity in partial clones Jonathan Tan
2018-09-21 20:19     ` Junio C Hamano
2018-09-21 18:22   ` [PATCH v2 2/2] fetch: in partial clone, check presence of targets Jonathan Tan

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).