git@vger.kernel.org list mirror (unofficial, one of many)
 help / color / mirror / Atom feed
* [PATCH 0/2] Skip a connectivity check during fetch --filter
@ 2020-01-12  4:15 Jonathan Tan
  2020-01-12  4:15 ` [PATCH 1/2] connected: verify promisor-ness of partial clone Jonathan Tan
  2020-01-12  4:15 ` [PATCH 2/2] fetch: forgo full connectivity check if --filter Jonathan Tan
  0 siblings, 2 replies; 6+ messages in thread
From: Jonathan Tan @ 2020-01-12  4:15 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan

The same optimization in dfa33a298d ("clone: do faster object check for
partial clones", 2019-04-21) can be applied to fetch as well, so this
patch set does so. Patch 1 makes the check more robust, and patch 2
applies it to one of two connectivity checks performed during the fetch.

As mentioned in patch 2, when fetching from a local repo, I got a
speedup of 6.63s to 3.39s. 

Jonathan Tan (2):
  connected: verify promisor-ness of partial clone
  fetch: forgo full connectivity check if --filter

 builtin/clone.c |  5 +++--
 builtin/fetch.c | 11 ++++++++++-
 connected.c     | 19 ++++++++++++++-----
 connected.h     | 11 ++++++-----
 4 files changed, 33 insertions(+), 13 deletions(-)

-- 
2.25.0.rc1.283.g88dfdc4193-goog


^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH 1/2] connected: verify promisor-ness of partial clone
  2020-01-12  4:15 [PATCH 0/2] Skip a connectivity check during fetch --filter Jonathan Tan
@ 2020-01-12  4:15 ` Jonathan Tan
  2020-01-29 20:41   ` Jonathan Nieder
  2020-01-12  4:15 ` [PATCH 2/2] fetch: forgo full connectivity check if --filter Jonathan Tan
  1 sibling, 1 reply; 6+ messages in thread
From: Jonathan Tan @ 2020-01-12  4:15 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan

Commit dfa33a298d ("clone: do faster object check for partial clones",
2019-04-21) optimized the connectivity check done when cloning with
--filter to check only the existence of objects directly pointed to by
refs. But this is not sufficient: they also need to be promisor objects.
Make this check more robust by instead checking that these objects are
promisor objects, that is, they appear in a promisor pack.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
 builtin/clone.c |  5 +++--
 connected.c     | 19 ++++++++++++++-----
 connected.h     | 11 ++++++-----
 3 files changed, 23 insertions(+), 12 deletions(-)

diff --git a/builtin/clone.c b/builtin/clone.c
index 0fc89ae2b9..0516181052 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -673,7 +673,7 @@ static void update_remote_refs(const struct ref *refs,
 			       const char *msg,
 			       struct transport *transport,
 			       int check_connectivity,
-			       int check_refs_only)
+			       int check_refs_are_promisor_objects_only)
 {
 	const struct ref *rm = mapped_refs;
 
@@ -682,7 +682,8 @@ static void update_remote_refs(const struct ref *refs,
 
 		opt.transport = transport;
 		opt.progress = transport->progress;
-		opt.check_refs_only = !!check_refs_only;
+		opt.check_refs_are_promisor_objects_only =
+			!!check_refs_are_promisor_objects_only;
 
 		if (check_connected(iterate_ref_map, &rm, &opt))
 			die(_("remote did not send all necessary objects"));
diff --git a/connected.c b/connected.c
index c337f5f7f4..7e9bd1bc62 100644
--- a/connected.c
+++ b/connected.c
@@ -52,19 +52,28 @@ int check_connected(oid_iterate_fn fn, void *cb_data,
 		strbuf_release(&idx_file);
 	}
 
-	if (opt->check_refs_only) {
+	if (opt->check_refs_are_promisor_objects_only) {
 		/*
 		 * For partial clones, we don't want to have to do a regular
 		 * connectivity check because we have to enumerate and exclude
 		 * all promisor objects (slow), and then the connectivity check
 		 * itself becomes a no-op because in a partial clone every
 		 * object is a promisor object. Instead, just make sure we
-		 * received the objects pointed to by each wanted ref.
+		 * received, in a promisor packfile, the objects pointed to by
+		 * each wanted ref.
 		 */
 		do {
-			if (!repo_has_object_file_with_flags(the_repository, &oid,
-							     OBJECT_INFO_SKIP_FETCH_OBJECT))
-				return 1;
+			struct packed_git *p;
+
+			for (p = get_all_packs(the_repository); p; p = p->next) {
+				if (!p->pack_promisor)
+					continue;
+				if (find_pack_entry_one(oid.hash, p))
+					goto promisor_pack_found;
+			}
+			return 1;
+promisor_pack_found:
+			;
 		} while (!fn(cb_data, &oid));
 		return 0;
 	}
diff --git a/connected.h b/connected.h
index ce2e7d8f2e..eba5c261ba 100644
--- a/connected.h
+++ b/connected.h
@@ -48,12 +48,13 @@ struct check_connected_options {
 	unsigned is_deepening_fetch : 1;
 
 	/*
-	 * If non-zero, only check the top-level objects referenced by the
-	 * wanted refs (passed in as cb_data). This is useful for partial
-	 * clones, where enumerating and excluding all promisor objects is very
-	 * slow and the commit-walk itself becomes a no-op.
+	 * If non-zero, only check that the top-level objects referenced by the
+	 * wanted refs (passed in as cb_data) are promisor objects. This is
+	 * useful for partial clones, where enumerating and excluding all
+	 * promisor objects is very slow and the commit-walk itself becomes a
+	 * no-op.
 	 */
-	unsigned check_refs_only : 1;
+	unsigned check_refs_are_promisor_objects_only : 1;
 };
 
 #define CHECK_CONNECTED_INIT { 0 }
-- 
2.25.0.rc1.283.g88dfdc4193-goog


^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH 2/2] fetch: forgo full connectivity check if --filter
  2020-01-12  4:15 [PATCH 0/2] Skip a connectivity check during fetch --filter Jonathan Tan
  2020-01-12  4:15 ` [PATCH 1/2] connected: verify promisor-ness of partial clone Jonathan Tan
@ 2020-01-12  4:15 ` Jonathan Tan
  2020-01-29 20:43   ` Jonathan Nieder
  1 sibling, 1 reply; 6+ messages in thread
From: Jonathan Tan @ 2020-01-12  4:15 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan

If a filter is specified, we do not need a full connectivity check on
the contents of the packfile we just fetched; we only need to check that
the objects referenced are promisor objects.

This significantly speeds up fetches into repositories that have many
promisor objects, because during the connectivity check, all promisor
objects are enumerated (to mark them UNINTERESTING), and that takes a
significant amount of time.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
For example, a local fetch was sped up from 6.63s to 3.39s. The bulk of
the remaining time is spent in yet another connectivity check
(fetch_refs -> check_exist_and_connected) prior to the fetch - that will
hopefully be done in a subsequent patch.
---
 builtin/fetch.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/builtin/fetch.c b/builtin/fetch.c
index b4c6d921d0..6fb50320eb 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -906,8 +906,17 @@ static int store_updated_refs(const char *raw_url, const char *remote_name,
 		url = xstrdup("foreign");
 
 	if (!connectivity_checked) {
+		struct check_connected_options opt = CHECK_CONNECTED_INIT;
+
+		if (filter_options.choice)
+			/*
+			 * Since a filter is specified, objects indirectly
+			 * referenced by refs are allowed to be absent.
+			 */
+			opt.check_refs_are_promisor_objects_only = 1;
+
 		rm = ref_map;
-		if (check_connected(iterate_ref_map, &rm, NULL)) {
+		if (check_connected(iterate_ref_map, &rm, &opt)) {
 			rc = error(_("%s did not send all necessary objects\n"), url);
 			goto abort;
 		}
-- 
2.25.0.rc1.283.g88dfdc4193-goog


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH 1/2] connected: verify promisor-ness of partial clone
  2020-01-12  4:15 ` [PATCH 1/2] connected: verify promisor-ness of partial clone Jonathan Tan
@ 2020-01-29 20:41   ` Jonathan Nieder
  0 siblings, 0 replies; 6+ messages in thread
From: Jonathan Nieder @ 2020-01-29 20:41 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git

Jonathan Tan wrote:

> Commit dfa33a298d ("clone: do faster object check for partial clones",
> 2019-04-21) optimized the connectivity check done when cloning with
> --filter to check only the existence of objects directly pointed to by
> refs. But this is not sufficient: they also need to be promisor objects.
> Make this check more robust by instead checking that these objects are
> promisor objects, that is, they appear in a promisor pack.
>
> Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
> ---
>  builtin/clone.c |  5 +++--
>  connected.c     | 19 ++++++++++++++-----
>  connected.h     | 11 ++++++-----
>  3 files changed, 23 insertions(+), 12 deletions(-)

Good call.  Sorry for the slow review.

[...]
> --- a/connected.c
> +++ b/connected.c
> @@ -52,19 +52,28 @@ int check_connected(oid_iterate_fn fn, void *cb_data,
>  		strbuf_release(&idx_file);
>  	}
>  
> -	if (opt->check_refs_only) {
> +	if (opt->check_refs_are_promisor_objects_only) {
>  		/*
>  		 * For partial clones, we don't want to have to do a regular
>  		 * connectivity check because we have to enumerate and exclude
>  		 * all promisor objects (slow), and then the connectivity check
>  		 * itself becomes a no-op because in a partial clone every
>  		 * object is a promisor object. Instead, just make sure we
> -		 * received the objects pointed to by each wanted ref.
> +		 * received, in a promisor packfile, the objects pointed to by
> +		 * each wanted ref.
>  		 */
>  		do {
> -			if (!repo_has_object_file_with_flags(the_repository, &oid,
> -							     OBJECT_INFO_SKIP_FETCH_OBJECT))
> -				return 1;
> +			struct packed_git *p;
> +
> +			for (p = get_all_packs(the_repository); p; p = p->next) {
> +				if (!p->pack_promisor)
> +					continue;
> +				if (find_pack_entry_one(oid.hash, p))
> +					goto promisor_pack_found;
> +			}
> +			return 1;
> +promisor_pack_found:
> +			;
>  		} while (!fn(cb_data, &oid));
>  		return 0;

Yep, does what it says on the tin.

Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH 2/2] fetch: forgo full connectivity check if --filter
  2020-01-12  4:15 ` [PATCH 2/2] fetch: forgo full connectivity check if --filter Jonathan Tan
@ 2020-01-29 20:43   ` Jonathan Nieder
  2020-01-30 18:57     ` Junio C Hamano
  0 siblings, 1 reply; 6+ messages in thread
From: Jonathan Nieder @ 2020-01-29 20:43 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git

Jonathan Tan wrote:

> If a filter is specified, we do not need a full connectivity check on
> the contents of the packfile we just fetched; we only need to check that
> the objects referenced are promisor objects.
>
> This significantly speeds up fetches into repositories that have many
> promisor objects, because during the connectivity check, all promisor
> objects are enumerated (to mark them UNINTERESTING), and that takes a
> significant amount of time.
>
> Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
> ---
> For example, a local fetch was sped up from 6.63s to 3.39s. The bulk of
> the remaining time is spent in yet another connectivity check
> (fetch_refs -> check_exist_and_connected) prior to the fetch - that will
> hopefully be done in a subsequent patch.

Can this information (at least the speedup) be included in the comment
message?

Or even better, can we demonstrate the impact using a perf test?

> --- a/builtin/fetch.c
> +++ b/builtin/fetch.c
> @@ -906,8 +906,17 @@ static int store_updated_refs(const char *raw_url, const char *remote_name,
>  		url = xstrdup("foreign");
>  
>  	if (!connectivity_checked) {
> +		struct check_connected_options opt = CHECK_CONNECTED_INIT;
> +
> +		if (filter_options.choice)
> +			/*
> +			 * Since a filter is specified, objects indirectly
> +			 * referenced by refs are allowed to be absent.
> +			 */
> +			opt.check_refs_are_promisor_objects_only = 1;
> +
>  		rm = ref_map;
> -		if (check_connected(iterate_ref_map, &rm, NULL)) {
> +		if (check_connected(iterate_ref_map, &rm, &opt)) {
>  			rc = error(_("%s did not send all necessary objects\n"), url);
>  			goto abort;
>  		}

Simple and sensible.  With or without a change like described above,
Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>

Thanks.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH 2/2] fetch: forgo full connectivity check if --filter
  2020-01-29 20:43   ` Jonathan Nieder
@ 2020-01-30 18:57     ` Junio C Hamano
  0 siblings, 0 replies; 6+ messages in thread
From: Junio C Hamano @ 2020-01-30 18:57 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: Jonathan Tan, git

Jonathan Nieder <jrnieder@gmail.com> writes:

> Jonathan Tan wrote:
>
>> If a filter is specified, we do not need a full connectivity check on
>> the contents of the packfile we just fetched; we only need to check that
>> the objects referenced are promisor objects.
>>
>> This significantly speeds up fetches into repositories that have many
>> promisor objects, because during the connectivity check, all promisor
>> objects are enumerated (to mark them UNINTERESTING), and that takes a
>> significant amount of time.
>>
>> Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
>> ---
>> For example, a local fetch was sped up from 6.63s to 3.39s. The bulk of
>> the remaining time is spent in yet another connectivity check
>> (fetch_refs -> check_exist_and_connected) prior to the fetch - that will
>> hopefully be done in a subsequent patch.
>
> Can this information (at least the speedup) be included in the comment
> message?
>
> Or even better, can we demonstrate the impact using a perf test?

It does make sense, but let's queue these two first and then add it
as a follow-up patch on top.

Thanks for writing and reviewing.

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2020-01-30 18:57 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-01-12  4:15 [PATCH 0/2] Skip a connectivity check during fetch --filter Jonathan Tan
2020-01-12  4:15 ` [PATCH 1/2] connected: verify promisor-ness of partial clone Jonathan Tan
2020-01-29 20:41   ` Jonathan Nieder
2020-01-12  4:15 ` [PATCH 2/2] fetch: forgo full connectivity check if --filter Jonathan Tan
2020-01-29 20:43   ` Jonathan Nieder
2020-01-30 18:57     ` Junio C Hamano

git@vger.kernel.org list mirror (unofficial, one of many)

This inbox may be cloned and mirrored by anyone:

	git clone --mirror https://public-inbox.org/git
	git clone --mirror http://ou63pmih66umazou.onion/git
	git clone --mirror http://czquwvybam4bgbro.onion/git
	git clone --mirror http://hjrcffqmbrq6wope.onion/git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V1 git git/ https://public-inbox.org/git \
		git@vger.kernel.org
	public-inbox-index git

Example config snippet for mirrors.
Newsgroups are available over NNTP:
	nntp://news.public-inbox.org/inbox.comp.version-control.git
	nntp://ou63pmih66umazou.onion/inbox.comp.version-control.git
	nntp://czquwvybam4bgbro.onion/inbox.comp.version-control.git
	nntp://hjrcffqmbrq6wope.onion/inbox.comp.version-control.git
	nntp://news.gmane.io/gmane.comp.version-control.git
 note: .onion URLs require Tor: https://www.torproject.org/

code repositories for the project(s) associated with this inbox:

	https://80x24.org/mirrors/git.git

AGPL code for this site: git clone https://public-inbox.org/public-inbox.git