git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Josh Steadmon <steadmon@google.com>
To: git@vger.kernel.org
Cc: jonathantanmy@google.com, peff@peff.net
Subject: [PATCH v4] clone: do faster object check for partial clones
Date: Fri, 19 Apr 2019 14:00:13 -0700	[thread overview]
Message-ID: <b4a285e2a199ea059c165aa344d037374797fa40.1555707373.git.steadmon@google.com> (raw)
In-Reply-To: <6de682d5e48186970644569586fc6613763d5caa.1554312374.git.steadmon@google.com>

For partial clones, doing a full connectivity check is wasteful; we skip
promisor objects (which, for a partial clone, is all known objects), and
enumerating them all to exclude them from the connectivity check can
take a significant amount of time on large repos.

At most, we want to make sure that we get the objects referred to by any
wanted refs. For partial clones, just check that these objects were
transferred.

Signed-off-by: Josh Steadmon <steadmon@google.com>
---
This is an update of the original V1 approach (skipping the fully
connectivity check when doing a partial clone) with updated commit
message and comments to address the review concerns.


Range-diff against v1:
1:  9c29e1ce8d ! 1:  b4a285e2a1 clone: do faster object check for partial clones
    @@ -4,8 +4,8 @@
     
         For partial clones, doing a full connectivity check is wasteful; we skip
         promisor objects (which, for a partial clone, is all known objects), and
    -    excluding them all from the connectivity check can take a significant
    -    amount of time on large repos.
    +    enumerating them all to exclude them from the connectivity check can
    +    take a significant amount of time on large repos.
     
         At most, we want to make sure that we get the objects referred to by any
         wanted refs. For partial clones, just check that these objects were
    @@ -59,10 +59,12 @@
      
     +	if (opt->check_refs_only) {
     +		/*
    -+		 * For partial clones, we don't want to walk the full commit
    -+		 * graph because we're skipping promisor objects anyway. We
    -+		 * should just check that objects referenced by wanted refs were
    -+		 * transferred.
    ++		 * For partial clones, we don't want to have to do a regular
    ++		 * connectivity check because we have to enumerate and exclude
    ++		 * all promisor objects (slow), and then the connectivity check
    ++		 * itself becomes a no-op because in a partial clone every
    ++		 * object is a promisor object. Instead, just make sure we
    ++		 * received the objects pointed to by each wanted ref.
     +		 */
     +		do {
     +			if (!repo_has_object_file(the_repository, &oid))
    @@ -86,8 +88,8 @@
     +	/*
     +	 * If non-zero, only check the top-level objects referenced by the
     +	 * wanted refs (passed in as cb_data). This is useful for partial
    -+	 * clones, where this can be much faster than excluding all promisor
    -+	 * objects prior to walking the commit graph.
    ++	 * clones, where enumerating and excluding all promisor objects is very
    ++	 * slow and the commit-walk itself becomes a no-op.
     +	 */
     +	unsigned check_refs_only : 1;
      };

 builtin/clone.c |  6 ++++--
 connected.c     | 17 +++++++++++++++++
 connected.h     |  8 ++++++++
 3 files changed, 29 insertions(+), 2 deletions(-)

diff --git a/builtin/clone.c b/builtin/clone.c
index 50bde99618..fdbbd8942a 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -657,7 +657,8 @@ static void update_remote_refs(const struct ref *refs,
 			       const char *branch_top,
 			       const char *msg,
 			       struct transport *transport,
-			       int check_connectivity)
+			       int check_connectivity,
+			       int check_refs_only)
 {
 	const struct ref *rm = mapped_refs;
 
@@ -666,6 +667,7 @@ static void update_remote_refs(const struct ref *refs,
 
 		opt.transport = transport;
 		opt.progress = transport->progress;
+		opt.check_refs_only = !!check_refs_only;
 
 		if (check_connected(iterate_ref_map, &rm, &opt))
 			die(_("remote did not send all necessary objects"));
@@ -1224,7 +1226,7 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
 
 	update_remote_refs(refs, mapped_refs, remote_head_points_at,
 			   branch_top.buf, reflog_msg.buf, transport,
-			   !is_local);
+			   !is_local, filter_options.choice);
 
 	update_head(our_head_points_at, remote_head, reflog_msg.buf);
 
diff --git a/connected.c b/connected.c
index 1bba888eff..1ab481fed6 100644
--- a/connected.c
+++ b/connected.c
@@ -1,4 +1,5 @@
 #include "cache.h"
+#include "object-store.h"
 #include "run-command.h"
 #include "sigchain.h"
 #include "connected.h"
@@ -49,6 +50,22 @@ int check_connected(oid_iterate_fn fn, void *cb_data,
 		strbuf_release(&idx_file);
 	}
 
+	if (opt->check_refs_only) {
+		/*
+		 * For partial clones, we don't want to have to do a regular
+		 * connectivity check because we have to enumerate and exclude
+		 * all promisor objects (slow), and then the connectivity check
+		 * itself becomes a no-op because in a partial clone every
+		 * object is a promisor object. Instead, just make sure we
+		 * received the objects pointed to by each wanted ref.
+		 */
+		do {
+			if (!repo_has_object_file(the_repository, &oid))
+				return 1;
+		} while (!fn(cb_data, &oid));
+		return 0;
+	}
+
 	if (opt->shallow_file) {
 		argv_array_push(&rev_list.args, "--shallow-file");
 		argv_array_push(&rev_list.args, opt->shallow_file);
diff --git a/connected.h b/connected.h
index 8d5a6b3ad6..ce2e7d8f2e 100644
--- a/connected.h
+++ b/connected.h
@@ -46,6 +46,14 @@ struct check_connected_options {
 	 * during a fetch.
 	 */
 	unsigned is_deepening_fetch : 1;
+
+	/*
+	 * If non-zero, only check the top-level objects referenced by the
+	 * wanted refs (passed in as cb_data). This is useful for partial
+	 * clones, where enumerating and excluding all promisor objects is very
+	 * slow and the commit-walk itself becomes a no-op.
+	 */
+	unsigned check_refs_only : 1;
 };
 
 #define CHECK_CONNECTED_INIT { 0 }
-- 
2.21.0.593.g511ec345e18-goog


  parent reply	other threads:[~2019-04-19 21:00 UTC|newest]

Thread overview: 27+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-04-03 17:27 [PATCH] clone: do faster object check for partial clones Josh Steadmon
2019-04-03 18:58 ` Jonathan Tan
2019-04-03 19:41 ` Jeff King
2019-04-03 20:57   ` Jonathan Tan
2019-04-04  0:21     ` Josh Steadmon
2019-04-04  1:33     ` Jeff King
2019-04-04 22:53 ` [PATCH v2] rev-list: exclude promisor objects at walk time Josh Steadmon
2019-04-04 23:08   ` Jeff King
2019-04-04 23:47     ` Josh Steadmon
2019-04-05  0:00       ` Jeff King
2019-04-05  0:09         ` Josh Steadmon
2019-04-08 20:59           ` Josh Steadmon
2019-04-08 21:06 ` [PATCH v3] " Josh Steadmon
2019-04-08 22:23   ` Christian Couder
2019-04-08 23:12     ` Josh Steadmon
2019-04-09 15:14   ` Junio C Hamano
2019-04-09 15:15     ` Jeff King
2019-04-09 15:43       ` Junio C Hamano
2019-04-09 16:35         ` Josh Steadmon
2019-04-09 18:04   ` SZEDER Gábor
2019-04-09 23:42     ` Josh Steadmon
2019-04-11  4:06       ` Jeff King
2019-04-12 22:38         ` Josh Steadmon
2019-04-13  5:34           ` Jeff King
2019-04-19 20:26             ` Josh Steadmon
2019-04-19 21:00 ` Josh Steadmon [this message]
2019-04-22 21:31   ` [PATCH v4] clone: do faster object check for partial clones Jeff King

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=b4a285e2a199ea059c165aa344d037374797fa40.1555707373.git.steadmon@google.com \
    --to=steadmon@google.com \
    --cc=git@vger.kernel.org \
    --cc=jonathantanmy@google.com \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).