git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Emily Shaffer <emilyshaffer@google.com>
To: Jonathan Tan <jonathantanmy@google.com>
Cc: git@vger.kernel.org
Subject: Re: [PATCH] connected: always use partial clone optimization
Date: Thu, 26 Mar 2020 14:11:56 -0700	[thread overview]
Message-ID: <20200326211156.GA37946@google.com> (raw)
In-Reply-To: <20200320220045.258462-1-jonathantanmy@google.com>

On Fri, Mar 20, 2020 at 03:00:45PM -0700, Jonathan Tan wrote:
> With 50033772d5 ("connected: verify promisor-ness of partial clone",
> 2020-01-30), the fast path (checking promisor packs) in
> check_connected() now passes a subset of the slow path (rev-list) - if
> all objects to be checked are found in promisor packs, both the fast
> path and the slow path will pass; otherwise, the fast path will
> definitely not pass. This means that we can always attempt the fast path
> whenever we need to do the slow path.
> 
> The fast path is currently guarded by a flag; therefore, remove that
> flag. Also, make the fast path fallback to the slow path - if the fast
> path fails, the failing OID and all remaining OIDs will be passed to
> rev-list.

It looks like a pretty simple change. I had one probably-biased
complaint about gotos below, otherwise it looks reasonable to me.

> 
> The main user-visible benefit is the performance of fetch from a partial
> clone - specifically, the speedup of the connectivity check done before
> the fetch. In particular, a no-op fetch into a partial clone on my
> computer was sped up from 7 seconds to 0.01 seconds. This is a
> complement to the work in 2df1aa239c ("fetch: forgo full
> connectivity check if --filter", 2020-01-30), which is the child of the
> aforementioned 50033772d5. In that commit, the connectivity check
> *after* the fetch was sped up.
> 
> The addition of the fast path might cause performance reductions in
> these cases:
> 
>  - If a partial clone or a fetch into a partial clone fails, Git will
>    fruitlessly run rev-list (it is expected that everything fetched
>    would go into promisor packs, so if that didn't happen, it is most
>    likely that rev-list will fail too).
> 
>  - Any connectivity checks done by receive-pack, in the (in my opinion,
>    unlikely) event that a partial clone serves receive-pack.
> 
> I think that these cases are rare enough, and the performance reduction
> in this case minor enough (additional object DB access), that the
> benefit of avoiding a flag outweighs these.
> 
> Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
> ---
> This is the second half of the work I did previously [1]. Quoting from
> [1]:
> 
> > For example, a local fetch was sped up from 6.63s to 3.39s. The bulk of
> > the remaining time is spent in yet another connectivity check
> > (fetch_refs -> check_exist_and_connected) prior to the fetch - that will
> > hopefully be done in a subsequent patch.
> 
> This is the subsequent patch. (Note that the timings were done on
> another computer, so don't compare the timings from [1] and this patch
> directly.)
> 
> [1] https://lore.kernel.org/git/be1d6aa4c4fd8868f3682b73c01a92d3830534ad.1578802317.git.jonathantanmy@google.com/
> ---
>  builtin/clone.c | 7 ++-----
>  builtin/fetch.c | 7 -------
>  connected.c     | 9 +++++++--
>  connected.h     | 9 ---------
>  4 files changed, 9 insertions(+), 23 deletions(-)
> 
> diff --git a/builtin/clone.c b/builtin/clone.c
> index 1ad26f4d8c..4b2b14ff61 100644
> --- a/builtin/clone.c
> +++ b/builtin/clone.c
> @@ -672,8 +672,7 @@ static void update_remote_refs(const struct ref *refs,
>  			       const char *branch_top,
>  			       const char *msg,
>  			       struct transport *transport,
> -			       int check_connectivity,
> -			       int check_refs_are_promisor_objects_only)
> +			       int check_connectivity)
>  {
>  	const struct ref *rm = mapped_refs;
>  
> @@ -682,8 +681,6 @@ static void update_remote_refs(const struct ref *refs,
>  
>  		opt.transport = transport;
>  		opt.progress = transport->progress;
> -		opt.check_refs_are_promisor_objects_only =
> -			!!check_refs_are_promisor_objects_only;
>  
>  		if (check_connected(iterate_ref_map, &rm, &opt))
>  			die(_("remote did not send all necessary objects"));
> @@ -1275,7 +1272,7 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
>  
>  	update_remote_refs(refs, mapped_refs, remote_head_points_at,
>  			   branch_top.buf, reflog_msg.buf, transport,
> -			   !is_local, filter_options.choice);
> +			   !is_local);
>  
>  	update_head(our_head_points_at, remote_head, reflog_msg.buf);
>  
> diff --git a/builtin/fetch.c b/builtin/fetch.c
> index bf6bab80fa..1097e1e512 100644
> --- a/builtin/fetch.c
> +++ b/builtin/fetch.c
> @@ -908,13 +908,6 @@ static int store_updated_refs(const char *raw_url, const char *remote_name,
>  	if (!connectivity_checked) {
>  		struct check_connected_options opt = CHECK_CONNECTED_INIT;
>  
> -		if (filter_options.choice)
> -			/*
> -			 * Since a filter is specified, objects indirectly
> -			 * referenced by refs are allowed to be absent.
> -			 */
> -			opt.check_refs_are_promisor_objects_only = 1;
> -
>  		rm = ref_map;
>  		if (check_connected(iterate_ref_map, &rm, &opt)) {
>  			rc = error(_("%s did not send all necessary objects\n"), url);
> diff --git a/connected.c b/connected.c
> index 7e9bd1bc62..846f2e4eef 100644
> --- a/connected.c
> +++ b/connected.c
> @@ -52,7 +52,7 @@ int check_connected(oid_iterate_fn fn, void *cb_data,
>  		strbuf_release(&idx_file);
>  	}
>  
> -	if (opt->check_refs_are_promisor_objects_only) {
> +	if (has_promisor_remote()) {
>  		/*
>  		 * For partial clones, we don't want to have to do a regular
>  		 * connectivity check because we have to enumerate and exclude
> @@ -71,13 +71,18 @@ int check_connected(oid_iterate_fn fn, void *cb_data,
>  				if (find_pack_entry_one(oid.hash, p))
>  					goto promisor_pack_found;
>  			}
> -			return 1;
> +			/*
> +			 * Fallback to rev-list with oid and the rest of the
> +			 * object IDs provided by fn.
> +			 */
> +			goto no_promisor_pack_found;
>  promisor_pack_found:
>  			;
>  		} while (!fn(cb_data, &oid));
>  		return 0;
>  	}
>  
> +no_promisor_pack_found:

Having a look at the final structure of the loop with these gotos, I'm a
little confused. Could be this isn't C-idiomatic but I think the code
could be easier to read with helpers instead of gotos. I realize it's
longer but I have a hard time understanding that your gotos are used to
double-continue or double-break; nested loops tend to make me want to
use helpers. But - I'm a lowly barely-reformed C++ developer, so what do
I know ;)

  int oid_in_promisor(oid) {
    for (p = get_all_packs(the_repository); p; p = p->next) {
      if (!p->pack_promisor)
        continue;
      if (find_pack_entry_one(oid.hash, p)
        return 1;
    }
  }

  int all_oids_in_promisors(oid, fn, cb_data)
  {
    do {
      if (! oid_in_promisor(oid))
        return 0;
    } while (!fn(cb_data, &oid));
    return 1;
  }

  int check_connected(...)
  {
    ...
    if (has_promisor_remote()) {
      if (all_oids_in_promisors(oid, fn, cb_data))
        return 0;
      if (opt->shallow_file) {
       ...
  }

>  	if (opt->shallow_file) {
>  		argv_array_push(&rev_list.args, "--shallow-file");
>  		argv_array_push(&rev_list.args, opt->shallow_file);
> diff --git a/connected.h b/connected.h
> index eba5c261ba..8d5a6b3ad6 100644
> --- a/connected.h
> +++ b/connected.h
> @@ -46,15 +46,6 @@ struct check_connected_options {
>  	 * during a fetch.
>  	 */
>  	unsigned is_deepening_fetch : 1;
> -
> -	/*
> -	 * If non-zero, only check that the top-level objects referenced by the
> -	 * wanted refs (passed in as cb_data) are promisor objects. This is
> -	 * useful for partial clones, where enumerating and excluding all
> -	 * promisor objects is very slow and the commit-walk itself becomes a
> -	 * no-op.
> -	 */
> -	unsigned check_refs_are_promisor_objects_only : 1;
>  };
>  
>  #define CHECK_CONNECTED_INIT { 0 }
> -- 
> 2.25.1.696.g5e7596f4ac-goog
> 

  parent reply	other threads:[~2020-03-26 21:12 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-03-20 22:00 [PATCH] connected: always use partial clone optimization Jonathan Tan
2020-03-20 22:54 ` Junio C Hamano
2020-03-26 19:01 ` Josh Steadmon
2020-03-26 21:11 ` Emily Shaffer [this message]
2020-03-26 23:14   ` Josh Steadmon
2020-03-29 17:39     ` Junio C Hamano
2020-03-30  3:32       ` Jonathan Tan
2020-03-30  5:12         ` Junio C Hamano
2020-03-30 16:04           ` Jonathan Tan
2020-03-30 18:09             ` Junio C Hamano
2020-03-30 13:37   ` Jeff King

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200326211156.GA37946@google.com \
    --to=emilyshaffer@google.com \
    --cc=git@vger.kernel.org \
    --cc=jonathantanmy@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).