git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH 00/11] Resumable clone
@ 2016-09-16  0:12 Kevin Wern
  2016-09-16  0:12 ` [PATCH 01/11] Resumable clone: create service git-prime-clone Kevin Wern
                   ` (12 more replies)
  0 siblings, 13 replies; 39+ messages in thread
From: Kevin Wern @ 2016-09-16  0:12 UTC (permalink / raw)
  To: git

Hey, all,

It's been a while (sent a very short patch in May), but I've
still been working on the resumable clone feature and checking up on
the mailing list for any updates. After submitting the prime-clone
service alone, I figured implementing the whole thing would be the best
way to understand the full scope of the problem (this is my first real
contribution here, and learning while working on such an involved
feature has not been easy). 

This is a functional implementation handling a direct http/ftp URI to a
single, fully connected packfile (i.e. the link is a direct path to the
file, not a prefix or guess). My hope is that this acts as a bare
minimum cross-section spanning the full requirments that can expand in
width as more cases are added (.info file, split bundle, daemon
download service). This is certainly not perfect, but I think it at
least prototypes each component involved in the workflow.

This patch series is based on jc/bundle, because the logic to find the
tips of a pack's history already exists there (I call index-pack
--clone-bundle on the downloaded file, and read the file to write the
references to a temporary directory). If I need to re-implement this
logic or base it on another branch, let me know. For ease of pulling
and testing, I included the branch here:

https://github.com/kevinwern/git/tree/feature/prime-clone

Although there are a few changes internally from the last patch,
the "alternate resource" url to download is configured on the
server side in exactly the same way:

[primeclone]
	url = http://location/pack-$NAME.pack
	filetype = pack

The prime-clone service simply outputs the components as:

####url filetype
0000

On the client side, the transport_prime_clone and
transport_download_primer APIs are built to be more robust (i.e. read
messages without dying due to protocol errors), so that git clone can
always try them without being dependent on the capability output of
git-upload-pack. transport_download_primer is dependent on the success
of transport_prime_clone, but transport_prime_clone is always run on an
initial clone. Part of achieving this robustness involves adding
*_gentle functions to pkt_line, so that prime_clone can fail silently
without dying.

The transport_download_primer function uses a resumable download,
which is applicable to both automatic and manual resuming. Automatic
is programmatically reconnecting to the resource after being
interrupted (up to a set number of times). Manual is using a newly
taught --resume option on the command line:

git clone --resume <resumable_work_or_git_dir>

Right now, a manually resumable directory is left behind only if the
*client* is interrupted while a new junk mode, JUNK_LEAVE_RESUMABLE,
is set (right before the download). For an initial clone, if the
connection fails after automatic resuming, the client erases the
partial resources and falls through to a normal clone. However, once a
resumable directory is left behind by the program, it is NEVER
deleted/abandoned after it is continued with --resume.

I think determining when a resource is "unsalvageable" should be more
nuanced. Especially in a case where a connection is perpetually poor
and the user wishes to resume over a long period of time. The timeout
logic itself *definitely* needs more nuance than "repeat 5 times", such
as expanding wait times and using earlier successes when deciding to
try again. Right now, I think the most important part of this patch is
that these two paths (falling through after a failed download, exiting
to be manually resumed later) exist.

Off the top of my head, outstanding issues/TODOs inlcude:
	- The above issue of determining when to fall through, when to
	  reattempt, and when to write the resumable info and exit
	  in git clone.
	- Creating git-daemon service to download a resumable resource.
	  Pretty straightforward, I think, especially if
	  http.getanyfile already exists. This falls more under
	  "haven't gotten to yet" than dilemma.
	- Logic for git clone to determine when a full clone would
	  be superior, such as when a clone is local or a reference is
	  given.
	- Configuring prime-clone for multiple resources, in two
	  dimensions: (a) resources to choose from (e.g. fall back to
	  a second resource if the first one doesn't work) and (b)
	  resources to be downloaded together or in sequence (e.g.
	  download http://host/this, then http://host/that). Maybe
	  prime-clone could also handle client preferences in terms of
	  filetype or protocol. For this, I just have to re-read a few
	  discussions about the filetypes we use to see if there are
	  any outliers that aren't representable in this way. I think
	  this is another "haven't gotten to yet".
	- Related to the above, seeing if there are any outlying
	  resource types whose process can't be modularized into:
	  download to location, use, clean one way if failed, clean
	  another way if succeeded. The "split bundle," for example,
	  is retrieved (download), read for the pack location (use),
	  and then the packfile is retrieved (download). I believe, in
	  this case, all of that can be considered the "download," and
	  then indexing/writing can be considered "use." But I'm not
	  sure if there are more extreme cases.
	- Creating the logic to guess a packfile, and append that to a
	  prefix specified by the admin. Additionally, allowing the
	  admin to use a custom script to use their own logic to
	  output the URL.
	- Preventing the retry wait period (currently set by using
	  select()) from being interrupted by other system calls.
	  I believe there is a setting in libcurl, but I don't want
	  to make any potentially large-impact changes without
	  discussing it first. Plus, I believe changes to http.c were
	  up for discussion anyway.
	- Finding if there's a more elegant way to access the alternate
	  resource than invoking remote-helper with a url we don't care
	  about (the same url that will be specified later to stdin
	  with "download-primer").
	- Finding if there is a better way to suppress index-pack's
	  output than creating a run-command option specifically to
	  suppress stdout.
	- When running with ssh and a password, the credentials are
	  prompted for twice. I don't know if there is a way to
	  preserve credentials between executions. I couldn't find any
	  examples in git's source.

Some of these are issues I've been actively working on, but I'm
hitting a point where keeping everyone up-to-date trumps completeness.
Hopefully, the bulk of the 'learning and re-doing' is done and I can
update more frequently in smaller increments.

I will probably work on the git-daemon download service, the curl
timeout issue, and supporting other filetypes next.

Feedback is appreciated.

Kevin Wern (11):
  Resumable clone: create service git-prime-clone
  Resumable clone: add prime-clone endpoints
  pkt-line: create gentle packet_read_line functions
  Resumable clone: add prime-clone to remote-curl
  Resumable clone: add output parsing to connect.c
  Resumable clone: implement transport_prime_clone
  Resumable clone: add resumable download to http/curl
  Resumable clone: create transport_download_primer
  path: add resumable marker
  run command: add RUN_COMMAND_NO_STDOUT
  Resumable clone: implement primer logic in git-clone

 .gitignore                         |   1 +
 Documentation/git-clone.txt        |  16 +
 Documentation/git-daemon.txt       |   7 +
 Documentation/git-http-backend.txt |   7 +
 Documentation/git-prime-clone.txt  |  39 +++
 Makefile                           |   2 +
 builtin.h                          |   1 +
 builtin/clone.c                    | 590 +++++++++++++++++++++++++++++++------
 builtin/prime-clone.c              |  77 +++++
 cache.h                            |   1 +
 connect.c                          |  47 +++
 connect.h                          |  10 +-
 daemon.c                           |   7 +
 git.c                              |   1 +
 http-backend.c                     |  22 +-
 http.c                             |  86 +++++-
 http.h                             |   7 +-
 path.c                             |   1 +
 pkt-line.c                         |  47 ++-
 pkt-line.h                         |  16 +
 remote-curl.c                      | 192 +++++++++---
 run-command.c                      |   1 +
 run-command.h                      |   1 +
 t/t9904-git-prime-clone.sh         | 181 ++++++++++++
 transport-helper.c                 |  75 ++++-
 transport.c                        |  53 ++++
 transport.h                        |  27 ++
 27 files changed, 1361 insertions(+), 154 deletions(-)
 create mode 100644 Documentation/git-prime-clone.txt
 create mode 100644 builtin/prime-clone.c
 create mode 100755 t/t9904-git-prime-clone.sh

-- 
2.7.4


^ permalink raw reply	[flat|nested] 39+ messages in thread

* [PATCH 01/11] Resumable clone: create service git-prime-clone
  2016-09-16  0:12 [PATCH 00/11] Resumable clone Kevin Wern
@ 2016-09-16  0:12 ` Kevin Wern
  2016-09-16 20:53   ` Junio C Hamano
  2016-09-16  0:12 ` [PATCH 02/11] Resumable clone: add prime-clone endpoints Kevin Wern
                   ` (11 subsequent siblings)
  12 siblings, 1 reply; 39+ messages in thread
From: Kevin Wern @ 2016-09-16  0:12 UTC (permalink / raw)
  To: git

Create git-prime-clone, a program to be executed on the server that
returns the location and type of static resource to download before
performing the rest of a clone.

Additionally, as this executable's location will be configurable (see:
upload-pack and receive-pack), add the program to
BINDIR_PROGRAMS_NEED_X, in addition to the usual builtin places. Add
git-prime-clone executable to gitignore, as well

Signed-off-by: Kevin Wern <kevin.m.wern@gmail.com>
---
 .gitignore                        |  1 +
 Documentation/git-prime-clone.txt | 39 ++++++++++++++++++++
 Makefile                          |  2 +
 builtin.h                         |  1 +
 builtin/prime-clone.c             | 77 +++++++++++++++++++++++++++++++++++++++
 git.c                             |  1 +
 6 files changed, 121 insertions(+)
 create mode 100644 Documentation/git-prime-clone.txt
 create mode 100644 builtin/prime-clone.c

diff --git a/.gitignore b/.gitignore
index 5087ce1..bfea25c 100644
--- a/.gitignore
+++ b/.gitignore
@@ -106,6 +106,7 @@
 /git-pack-refs
 /git-parse-remote
 /git-patch-id
+/git-prime-clone
 /git-prune
 /git-prune-packed
 /git-pull
diff --git a/Documentation/git-prime-clone.txt b/Documentation/git-prime-clone.txt
new file mode 100644
index 0000000..fc5917d
--- /dev/null
+++ b/Documentation/git-prime-clone.txt
@@ -0,0 +1,39 @@
+git-prime-clone(1)
+============
+
+NAME
+----
+git-prime-clone - Get the location of an alternate resource
+to fetch before clone
+
+
+SYNOPSIS
+--------
+[verse]
+'git prime-clone' [--strict] <dir>
+
+DESCRIPTION
+-----------
+
+Outputs the resource, if configured to do so. Otherwise, returns
+nothing (packet flush 0000).
+
+CONFIGURE
+---------
+
+primeclone.url::
+	The full url of the resource (e.g.
+	http://examplehost/pack-$NAME.pack).
+
+primeclone.filetype::
+	The type of the resource (e.g. pack).
+
+primeclone.enabled::
+	When 'false', git-prime-clone will return an empty response,
+	regardless of what the rest of the configuration specifies;
+	otherwise, it will return the configured response. Is 'true'
+	by default.
+
+GIT
+---
+Part of the linkgit:git[1] suite
diff --git a/Makefile b/Makefile
index 24bef8d..f2564ec 100644
--- a/Makefile
+++ b/Makefile
@@ -648,6 +648,7 @@ OTHER_PROGRAMS = git$X
 # what test wrappers are needed and 'install' will install, in bindir
 BINDIR_PROGRAMS_NEED_X += git
 BINDIR_PROGRAMS_NEED_X += git-upload-pack
+BINDIR_PROGRAMS_NEED_X += git-prime-clone
 BINDIR_PROGRAMS_NEED_X += git-receive-pack
 BINDIR_PROGRAMS_NEED_X += git-upload-archive
 BINDIR_PROGRAMS_NEED_X += git-shell
@@ -904,6 +905,7 @@ BUILTIN_OBJS += builtin/pack-objects.o
 BUILTIN_OBJS += builtin/pack-redundant.o
 BUILTIN_OBJS += builtin/pack-refs.o
 BUILTIN_OBJS += builtin/patch-id.o
+BUILTIN_OBJS += builtin/prime-clone.o
 BUILTIN_OBJS += builtin/prune-packed.o
 BUILTIN_OBJS += builtin/prune.o
 BUILTIN_OBJS += builtin/pull.o
diff --git a/builtin.h b/builtin.h
index 6b95006..c9e2254 100644
--- a/builtin.h
+++ b/builtin.h
@@ -97,6 +97,7 @@ extern int cmd_notes(int argc, const char **argv, const char *prefix);
 extern int cmd_pack_objects(int argc, const char **argv, const char *prefix);
 extern int cmd_pack_redundant(int argc, const char **argv, const char *prefix);
 extern int cmd_patch_id(int argc, const char **argv, const char *prefix);
+extern int cmd_prime_clone(int argc, const char **argv, const char *prefix);
 extern int cmd_prune(int argc, const char **argv, const char *prefix);
 extern int cmd_prune_packed(int argc, const char **argv, const char *prefix);
 extern int cmd_pull(int argc, const char **argv, const char *prefix);
diff --git a/builtin/prime-clone.c b/builtin/prime-clone.c
new file mode 100644
index 0000000..ce914d3
--- /dev/null
+++ b/builtin/prime-clone.c
@@ -0,0 +1,77 @@
+#include "cache.h"
+#include "parse-options.h"
+#include "pkt-line.h"
+
+static char const * const prime_clone_usage[] = {
+	N_("git prime-clone [--strict] <dir>"),
+	NULL
+};
+
+static unsigned int enabled = 1;
+static const char *url = NULL, *filetype = NULL;
+static int strict;
+
+static struct option prime_clone_options[] = {
+	OPT_BOOL(0, "strict", &strict, N_("Do not attempt <dir>/.git if <dir> "
+					  "is not a git directory")),
+	OPT_END(),
+};
+
+static void prime_clone(void)
+{
+	if (!enabled) {
+		fprintf(stderr, _("prime-clone not enabled\n"));
+	}
+	else if (url && filetype){
+		packet_write(1, "%s %s\n", filetype, url);
+	}
+	else if (url || filetype) {
+		if (filetype)
+			fprintf(stderr, _("prime-clone not properly "
+					  "configured: missing url\n"));
+		else if (url)
+			fprintf(stderr, _("prime-clone not properly "
+					  "configured: missing filetype\n"));
+	}
+	packet_flush(1);
+}
+
+static int prime_clone_config(const char *var, const char *value, void *unused)
+{
+	if (!strcmp("primeclone.url",var)) {
+		return git_config_pathname(&url, var, value);
+	}
+	if (!strcmp("primeclone.enabled",var)) {
+		enabled = git_config_bool(var, value);
+		return 0;
+	}
+	if (!strcmp("primeclone.filetype",var)) {
+		return git_config_string(&filetype, var, value);
+	}
+	return git_default_config(var, value, unused);
+}
+
+int cmd_prime_clone(int argc, const char **argv, const char *prefix)
+{
+	const char *dir;
+	argc = parse_options(argc, argv, prefix, prime_clone_options,
+			     prime_clone_usage, 0);
+	if (argc == 0) {
+		usage_msg_opt(_("No repository specified."), prime_clone_usage,
+			      prime_clone_options);
+	}
+	else if (argc > 1) {
+		usage_msg_opt(_("Too many arguments."), prime_clone_usage,
+			      prime_clone_options);
+	}
+
+	dir = argv[0];
+
+	if (!enter_repo(dir, 0)){
+		die(_("'%s' does not appear to be a git repository"), dir);
+	}
+
+	git_config(prime_clone_config, NULL);
+	prime_clone();
+	return 0;
+}
diff --git a/git.c b/git.c
index 6cc0c07..a5681fb 100644
--- a/git.c
+++ b/git.c
@@ -447,6 +447,7 @@ static struct cmd_struct commands[] = {
 	{ "pack-refs", cmd_pack_refs, RUN_SETUP },
 	{ "patch-id", cmd_patch_id },
 	{ "pickaxe", cmd_blame, RUN_SETUP },
+	{ "prime-clone", cmd_prime_clone },
 	{ "prune", cmd_prune, RUN_SETUP },
 	{ "prune-packed", cmd_prune_packed, RUN_SETUP },
 	{ "pull", cmd_pull, RUN_SETUP | NEED_WORK_TREE },
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH 02/11] Resumable clone: add prime-clone endpoints
  2016-09-16  0:12 [PATCH 00/11] Resumable clone Kevin Wern
  2016-09-16  0:12 ` [PATCH 01/11] Resumable clone: create service git-prime-clone Kevin Wern
@ 2016-09-16  0:12 ` Kevin Wern
  2016-09-19 13:15   ` Duy Nguyen
  2016-09-16  0:12 ` [PATCH 03/11] pkt-line: create gentle packet_read_line functions Kevin Wern
                   ` (10 subsequent siblings)
  12 siblings, 1 reply; 39+ messages in thread
From: Kevin Wern @ 2016-09-16  0:12 UTC (permalink / raw)
  To: git

Add logic to serve git-prime-clone to git and http clients.

Do not pass --stateless-rpc and --advertise-refs options to
prime-clone. It is inherently stateless and an 'advertisement'.

Signed-off-by: Kevin Wern <kevin.m.wern@gmail.com>
---
 Documentation/git-daemon.txt       |  7 +++++++
 Documentation/git-http-backend.txt |  7 +++++++
 daemon.c                           |  7 +++++++
 http-backend.c                     | 22 +++++++++++++++++-----
 4 files changed, 38 insertions(+), 5 deletions(-)

diff --git a/Documentation/git-daemon.txt b/Documentation/git-daemon.txt
index a69b361..853faab 100644
--- a/Documentation/git-daemon.txt
+++ b/Documentation/git-daemon.txt
@@ -231,6 +231,13 @@ receive-pack::
 	enabled by setting `daemon.receivepack` configuration item to
 	`true`.
 
+primeclone::
+	This serves 'git prime-clone' service to clients, allowing
+	'git clone' clients to get the location of a static resource
+	to download and integrate before performing an incremental
+	fetch. It is 'false' by default, but can be enabled by setting
+	it to `true`.
+
 EXAMPLES
 --------
 We assume the following in /etc/services::
diff --git a/Documentation/git-http-backend.txt b/Documentation/git-http-backend.txt
index 9268fb6..40be74e 100644
--- a/Documentation/git-http-backend.txt
+++ b/Documentation/git-http-backend.txt
@@ -54,6 +54,13 @@ http.receivepack::
 	disabled by setting this item to `false`, or enabled for all
 	users, including anonymous users, by setting it to `true`.
 
+http.primeclone::
+	This serves 'git prime-clone' service to clients, allowing
+	'git clone' clients to get the location of a static resource
+	to download and integrate before performing an incremental
+	fetch. It is 'false' by default, but can be enabled by setting
+	it to `true`.
+
 URL TRANSLATION
 ---------------
 To determine the location of the repository on disk, 'git http-backend'
diff --git a/daemon.c b/daemon.c
index 8d45c33..c2f539c 100644
--- a/daemon.c
+++ b/daemon.c
@@ -475,10 +475,17 @@ static int receive_pack(void)
 	return run_service_command(argv);
 }
 
+static int prime_clone(void)
+{
+	static const char *argv[] = { "prime-clone", "--strict", ".", NULL };
+	return run_service_command(argv);
+}
+
 static struct daemon_service daemon_service[] = {
 	{ "upload-archive", "uploadarch", upload_archive, 0, 1 },
 	{ "upload-pack", "uploadpack", upload_pack, 1, 1 },
 	{ "receive-pack", "receivepack", receive_pack, 0, 1 },
+	{ "prime-clone", "primeclone", prime_clone, 0, 1 },
 };
 
 static void enable_service(const char *name, int ena)
diff --git a/http-backend.c b/http-backend.c
index 8870a26..9c89a10 100644
--- a/http-backend.c
+++ b/http-backend.c
@@ -27,6 +27,7 @@ struct rpc_service {
 static struct rpc_service rpc_service[] = {
 	{ "upload-pack", "uploadpack", 1, 1 },
 	{ "receive-pack", "receivepack", 0, -1 },
+	{ "prime-clone", "primeclone", 0, -1 },
 };
 
 static struct string_list *get_parameters(void)
@@ -450,11 +451,22 @@ static void get_info_refs(char *arg)
 	hdr_nocache();
 
 	if (service_name) {
-		const char *argv[] = {NULL /* service name */,
-			"--stateless-rpc", "--advertise-refs",
-			".", NULL};
+		struct argv_array argv;
 		struct rpc_service *svc = select_service(service_name);
 
+		argv_array_init(&argv);
+		argv_array_push(&argv, svc->name);
+
+		// prime-clone does not need --stateless-rpc and
+		// --advertise-refs options. Maybe it will in the future, but
+		// until then it seems best to do this instead of adding
+		// "dummy" options.
+		if (strcmp(svc->name, "prime-clone") != 0) {
+			argv_array_pushl(&argv, "--stateless-rpc",
+					 "--advertise-refs", NULL);
+		}
+
+		argv_array_pushl(&argv, ".", NULL);
 		strbuf_addf(&buf, "application/x-git-%s-advertisement",
 			svc->name);
 		hdr_str(content_type, buf.buf);
@@ -463,8 +475,8 @@ static void get_info_refs(char *arg)
 		packet_write(1, "# service=git-%s\n", svc->name);
 		packet_flush(1);
 
-		argv[0] = svc->name;
-		run_service(argv, 0);
+		run_service(argv.argv, 0);
+		argv_array_clear(&argv);
 
 	} else {
 		select_getanyfile();
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH 03/11] pkt-line: create gentle packet_read_line functions
  2016-09-16  0:12 [PATCH 00/11] Resumable clone Kevin Wern
  2016-09-16  0:12 ` [PATCH 01/11] Resumable clone: create service git-prime-clone Kevin Wern
  2016-09-16  0:12 ` [PATCH 02/11] Resumable clone: add prime-clone endpoints Kevin Wern
@ 2016-09-16  0:12 ` Kevin Wern
  2016-09-16 22:17   ` Junio C Hamano
  2016-09-16  0:12 ` [PATCH 04/11] Resumable clone: add prime-clone to remote-curl Kevin Wern
                   ` (9 subsequent siblings)
  12 siblings, 1 reply; 39+ messages in thread
From: Kevin Wern @ 2016-09-16  0:12 UTC (permalink / raw)
  To: git

Create a functions that can read malformed messages without dying.
Includes creation of flag PACKET_READ_GENTLE_ALL. For use handling
prime-clone (or other server error) responses.

Signed-off-by: Kevin Wern <kevin.m.wern@gmail.com>
---
 pkt-line.c | 47 ++++++++++++++++++++++++++++++++++++++---------
 pkt-line.h | 16 ++++++++++++++++
 2 files changed, 54 insertions(+), 9 deletions(-)

diff --git a/pkt-line.c b/pkt-line.c
index 62fdb37..96060e5 100644
--- a/pkt-line.c
+++ b/pkt-line.c
@@ -155,13 +155,17 @@ static int get_packet_data(int fd, char **src_buf, size_t *src_size,
 		*src_size -= ret;
 	} else {
 		ret = read_in_full(fd, dst, size);
-		if (ret < 0)
+		if (ret < 0) {
+			if (options & PACKET_READ_GENTLE_ALL)
+				return -1;
+
 			die_errno("read error");
+		}
 	}
 
 	/* And complain if we didn't get enough bytes to satisfy the read. */
 	if (ret < size) {
-		if (options & PACKET_READ_GENTLE_ON_EOF)
+		if (options & (PACKET_READ_GENTLE_ON_EOF | PACKET_READ_GENTLE_ALL))
 			return -1;
 
 		die("The remote end hung up unexpectedly");
@@ -205,15 +209,23 @@ int packet_read(int fd, char **src_buf, size_t *src_len,
 	if (ret < 0)
 		return ret;
 	len = packet_length(linelen);
-	if (len < 0)
+	if (len < 0) {
+		if (options & PACKET_READ_GENTLE_ALL)
+			return -1;
+
 		die("protocol error: bad line length character: %.4s", linelen);
+	}
 	if (!len) {
 		packet_trace("0000", 4, 0);
 		return 0;
 	}
 	len -= 4;
-	if (len >= size)
+	if (len >= size) {
+		if (options & PACKET_READ_GENTLE_ALL)
+			return -1;
+
 		die("protocol error: bad line length %d", len);
+	}
 	ret = get_packet_data(fd, src_buf, src_len, buffer, len, options);
 	if (ret < 0)
 		return ret;
@@ -229,22 +241,39 @@ int packet_read(int fd, char **src_buf, size_t *src_len,
 
 static char *packet_read_line_generic(int fd,
 				      char **src, size_t *src_len,
-				      int *dst_len)
+				      int *dst_len, int flags)
 {
 	int len = packet_read(fd, src, src_len,
 			      packet_buffer, sizeof(packet_buffer),
-			      PACKET_READ_CHOMP_NEWLINE);
+			      flags);
 	if (dst_len)
 		*dst_len = len;
-	return len ? packet_buffer : NULL;
+	return len > 0 ? packet_buffer : NULL;
 }
 
 char *packet_read_line(int fd, int *len_p)
 {
-	return packet_read_line_generic(fd, NULL, NULL, len_p);
+	return packet_read_line_generic(fd, NULL, NULL, len_p,
+			PACKET_READ_CHOMP_NEWLINE);
 }
 
 char *packet_read_line_buf(char **src, size_t *src_len, int *dst_len)
 {
-	return packet_read_line_generic(-1, src, src_len, dst_len);
+	return packet_read_line_generic(-1, src, src_len, dst_len,
+			PACKET_READ_CHOMP_NEWLINE);
+}
+
+char *packet_read_line_gentle(int fd, int *len_p)
+{
+	return packet_read_line_generic(fd, NULL, NULL, len_p,
+			PACKET_READ_CHOMP_NEWLINE |
+			PACKET_READ_GENTLE_ALL);
+}
+
+
+char *packet_read_line_buf_gentle(char **src, size_t *src_len, int *dst_len)
+{
+	return packet_read_line_generic(-1, src, src_len, dst_len,
+			PACKET_READ_CHOMP_NEWLINE |
+			PACKET_READ_GENTLE_ALL);
 }
diff --git a/pkt-line.h b/pkt-line.h
index 3cb9d91..553e42e 100644
--- a/pkt-line.h
+++ b/pkt-line.h
@@ -52,11 +52,15 @@ void packet_buf_write(struct strbuf *buf, const char *fmt, ...) __attribute__((f
  * condition 4 (truncated input), but instead return -1. However, we will still
  * die for the other 3 conditions.
  *
+ * If options contains PACKET_READ_GENTLE_ALL, we will not die on any of the
+ * conditions, but return -1 instead.
+ *
  * If options contains PACKET_READ_CHOMP_NEWLINE, a trailing newline (if
  * present) is removed from the buffer before returning.
  */
 #define PACKET_READ_GENTLE_ON_EOF (1u<<0)
 #define PACKET_READ_CHOMP_NEWLINE (1u<<1)
+#define PACKET_READ_GENTLE_ALL    (1u<<2)
 int packet_read(int fd, char **src_buffer, size_t *src_len, char
 		*buffer, unsigned size, int options);
 
@@ -75,6 +79,18 @@ char *packet_read_line(int fd, int *size);
  */
 char *packet_read_line_buf(char **src_buf, size_t *src_len, int *size);
 
+/*
+ * Same as packet_read_line, but does not die on any errors (uses
+ * PACKET_READ_GENTLE_ALL).
+ */
+char *packet_read_line_gentle(int fd, int *len_p);
+
+/*
+ * Same as packet_read_line_buf, but does not die on any errors (uses
+ * PACKET_READ_GENTLE_ALL).
+ */
+char *packet_read_line_buf_gentle(char **src_buf, size_t *src_len, int *size);
+
 #define DEFAULT_PACKET_MAX 1000
 #define LARGE_PACKET_MAX 65520
 extern char packet_buffer[LARGE_PACKET_MAX];
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH 04/11] Resumable clone: add prime-clone to remote-curl
  2016-09-16  0:12 [PATCH 00/11] Resumable clone Kevin Wern
                   ` (2 preceding siblings ...)
  2016-09-16  0:12 ` [PATCH 03/11] pkt-line: create gentle packet_read_line functions Kevin Wern
@ 2016-09-16  0:12 ` Kevin Wern
  2016-09-19 13:52   ` Duy Nguyen
  2016-09-16  0:12 ` [PATCH 05/11] Resumable clone: add output parsing to connect.c Kevin Wern
                   ` (8 subsequent siblings)
  12 siblings, 1 reply; 39+ messages in thread
From: Kevin Wern @ 2016-09-16  0:12 UTC (permalink / raw)
  To: git

Add function and interface to handle prime-clone input, extracting
and using duplicate functionality from discover_refs as function
request_service.

Because part of our goal is for prime_clone to recover from errors,
HTTP errors are only optionally printed to screen and never cause
death in this case.

Signed-off-by: Kevin Wern <kevin.m.wern@gmail.com>
---
 remote-curl.c | 165 ++++++++++++++++++++++++++++++++++++++++++----------------
 1 file changed, 121 insertions(+), 44 deletions(-)

diff --git a/remote-curl.c b/remote-curl.c
index 15e48e2..8ebb587 100644
--- a/remote-curl.c
+++ b/remote-curl.c
@@ -13,6 +13,8 @@
 #include "sha1-array.h"
 #include "send-pack.h"
 
+#define HTTP_ERROR_GENTLE (1u << 0)
+
 static struct remote *remote;
 /* always ends with a trailing slash */
 static struct strbuf url = STRBUF_INIT;
@@ -244,7 +246,31 @@ static int show_http_message(struct strbuf *type, struct strbuf *charset,
 	return 0;
 }
 
-static struct discovery *discover_refs(const char *service, int for_push)
+static char *http_handle_result(int http_return)
+{
+	struct strbuf error = STRBUF_INIT;
+
+	switch (http_return) {
+	case HTTP_OK:
+		return NULL;
+	case HTTP_MISSING_TARGET:
+		strbuf_addf(&error, "repository '%s' not found", url.buf);
+		break;
+	case HTTP_NOAUTH:
+		strbuf_addf(&error, "Authentication failed for '%s'",
+			    url.buf);
+		break;
+	default:
+		strbuf_addf(&error, "unable to access '%s': %s", url.buf,
+			    curl_errorstr);
+		break;
+	}
+
+	return strbuf_detach(&error, NULL);
+}
+
+static int request_service(char const *const service, char **buffer_full,
+			    char **buffer_msg, size_t *buffer_len, int flags)
 {
 	struct strbuf exp = STRBUF_INIT;
 	struct strbuf type = STRBUF_INIT;
@@ -252,13 +278,9 @@ static struct discovery *discover_refs(const char *service, int for_push)
 	struct strbuf buffer = STRBUF_INIT;
 	struct strbuf refs_url = STRBUF_INIT;
 	struct strbuf effective_url = STRBUF_INIT;
-	struct discovery *last = last_discovery;
-	int http_ret, maybe_smart = 0;
-	struct http_get_options options;
-
-	if (last && !strcmp(service, last->service))
-		return last;
-	free_discovery(last);
+	int http_ret, maybe_smart = 0, ran_smart = 0;
+	struct http_get_options get_options;
+	const char *error_string;
 
 	strbuf_addf(&refs_url, "%sinfo/refs", url.buf);
 	if ((starts_with(url.buf, "http://") || starts_with(url.buf, "https://")) &&
@@ -271,45 +293,41 @@ static struct discovery *discover_refs(const char *service, int for_push)
 		strbuf_addf(&refs_url, "service=%s", service);
 	}
 
-	memset(&options, 0, sizeof(options));
-	options.content_type = &type;
-	options.charset = &charset;
-	options.effective_url = &effective_url;
-	options.base_url = &url;
-	options.no_cache = 1;
-	options.keep_error = 1;
-
-	http_ret = http_get_strbuf(refs_url.buf, &buffer, &options);
-	switch (http_ret) {
-	case HTTP_OK:
-		break;
-	case HTTP_MISSING_TARGET:
-		show_http_message(&type, &charset, &buffer);
-		die("repository '%s' not found", url.buf);
-	case HTTP_NOAUTH:
-		show_http_message(&type, &charset, &buffer);
-		die("Authentication failed for '%s'", url.buf);
-	default:
-		show_http_message(&type, &charset, &buffer);
-		die("unable to access '%s': %s", url.buf, curl_errorstr);
+	memset(&get_options, 0, sizeof(get_options));
+	get_options.content_type = &type;
+	get_options.charset = &charset;
+	get_options.effective_url = &effective_url;
+	get_options.base_url = &url;
+	get_options.no_cache = 1;
+	get_options.keep_error = 1;
+
+	http_ret = http_get_strbuf(refs_url.buf, &buffer, &get_options);
+	error_string = http_handle_result(http_ret);
+	if (error_string) {
+		if (!(flags & HTTP_ERROR_GENTLE)) {
+			show_http_message(&type, &charset, &buffer);
+			die("%s", error_string);
+		}
+		else if (options.verbosity > 1) {
+			show_http_message(&type, &charset, &buffer);
+			fprintf(stderr, "%s\n", error_string);
+		}
 	}
 
-	last= xcalloc(1, sizeof(*last_discovery));
-	last->service = service;
-	last->buf_alloc = strbuf_detach(&buffer, &last->len);
-	last->buf = last->buf_alloc;
+	*buffer_full = strbuf_detach(&buffer, buffer_len);
+	*buffer_msg = *buffer_full;
 
 	strbuf_addf(&exp, "application/x-%s-advertisement", service);
 	if (maybe_smart &&
-	    (5 <= last->len && last->buf[4] == '#') &&
-	    !strbuf_cmp(&exp, &type)) {
+	    (5 <= *buffer_len && (*buffer_msg)[4] == '#') &&
+	    !strbuf_cmp(&exp, &type) && http_ret == HTTP_OK) {
 		char *line;
 
 		/*
 		 * smart HTTP response; validate that the service
 		 * pkt-line matches our request.
 		 */
-		line = packet_read_line_buf(&last->buf, &last->len, NULL);
+		line = packet_read_line_buf(buffer_msg, buffer_len, NULL);
 
 		strbuf_reset(&exp);
 		strbuf_addf(&exp, "# service=%s", service);
@@ -321,23 +339,80 @@ static struct discovery *discover_refs(const char *service, int for_push)
 		 * until a packet flush marker.  Ignore these now, but
 		 * in the future we might start to scan them.
 		 */
-		while (packet_read_line_buf(&last->buf, &last->len, NULL))
+		while (packet_read_line_buf(buffer_msg, buffer_len, NULL))
 			;
 
-		last->proto_git = 1;
+		ran_smart = 1;
 	}
 
-	if (last->proto_git)
-		last->refs = parse_git_refs(last, for_push);
-	else
-		last->refs = parse_info_refs(last);
-
 	strbuf_release(&refs_url);
 	strbuf_release(&exp);
 	strbuf_release(&type);
 	strbuf_release(&charset);
 	strbuf_release(&effective_url);
 	strbuf_release(&buffer);
+
+	return ran_smart;
+}
+
+static void prime_clone(void)
+{
+	char *result, *result_full, *line;
+	size_t result_len;
+	int err = 0, one_successful = 0;
+
+	if (request_service("git-prime-clone", &result_full, &result,
+			&result_len, HTTP_ERROR_GENTLE)) {
+		while (line = packet_read_line_buf_gentle(&result, &result_len,
+							  NULL)) {
+			char *space = strchr(line ,' ');
+
+			// We will eventually support multiple resources, so
+			// always parse the whole message
+			if (err)
+				continue;
+			if (!space || strchr(space + 1, ' ')) {
+				if (options.verbosity > 1)
+					fprintf(stderr, "prime clone "
+						"protocol error: got '%s'\n",
+						line);
+				printf("error\n");
+				err = 1;
+				continue;
+			}
+
+			one_successful = 1;
+			printf("%s\n", line);
+		}
+		if (!one_successful && options.verbosity > 1)
+			fprintf(stderr, "did not get required components for "
+				"alternate resource\n");
+	}
+
+	printf("\n");
+	fflush(stdout);
+	free(result_full);
+}
+
+
+static struct discovery *discover_refs(const char *service, int for_push)
+{
+	struct discovery *last = last_discovery;
+
+	if (last && !strcmp(service, last->service))
+		return last;
+	free_discovery(last);
+
+	last= xcalloc(1, sizeof(*last_discovery));
+	last->service = service;
+	last->proto_git = request_service(service, &last->buf_alloc,
+					  &last->buf, &last->len, 0);
+
+	if (last->proto_git)
+		last->refs = parse_git_refs(last, for_push);
+	else
+		last->refs = parse_info_refs(last);
+
 	last_discovery = last;
 	return last;
 }
@@ -1030,7 +1105,8 @@ int main(int argc, const char **argv)
 		} else if (!strcmp(buf.buf, "list") || starts_with(buf.buf, "list ")) {
 			int for_push = !!strstr(buf.buf + 4, "for-push");
 			output_refs(get_refs(for_push));
-
+		} else if (!strcmp(buf.buf, "prime-clone")) {
+			prime_clone();
 		} else if (starts_with(buf.buf, "push ")) {
 			parse_push(&buf);
 
@@ -1056,6 +1132,7 @@ int main(int argc, const char **argv)
 			printf("fetch\n");
 			printf("option\n");
 			printf("push\n");
+			printf("prime-clone\n");
 			printf("check-connectivity\n");
 			printf("\n");
 			fflush(stdout);
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH 05/11] Resumable clone: add output parsing to connect.c
  2016-09-16  0:12 [PATCH 00/11] Resumable clone Kevin Wern
                   ` (3 preceding siblings ...)
  2016-09-16  0:12 ` [PATCH 04/11] Resumable clone: add prime-clone to remote-curl Kevin Wern
@ 2016-09-16  0:12 ` Kevin Wern
  2016-09-16  0:12 ` [PATCH 06/11] Resumable clone: implement transport_prime_clone Kevin Wern
                   ` (7 subsequent siblings)
  12 siblings, 0 replies; 39+ messages in thread
From: Kevin Wern @ 2016-09-16  0:12 UTC (permalink / raw)
  To: git

Add method for transport to call when parsing primeclone output from
stdin. Suppress stderr when using git_connect with ssh, unless output
is verbose.

Signed-off-by: Kevin Wern <kevin.m.wern@gmail.com>
---
 connect.c | 47 +++++++++++++++++++++++++++++++++++++++++++++++
 connect.h | 10 ++++++----
 2 files changed, 53 insertions(+), 4 deletions(-)

diff --git a/connect.c b/connect.c
index 0478631..284de53 100644
--- a/connect.c
+++ b/connect.c
@@ -804,6 +804,10 @@ struct child_process *git_connect(int fd[2], const char *url,
 		}
 		argv_array_push(&conn->args, cmd.buf);
 
+		if (flags & CONNECT_SUPPRESS_ERRORS) {
+			conn->no_stderr = 1;
+		}
+
 		if (start_command(conn))
 			die("unable to fork");
 
@@ -831,3 +835,46 @@ int finish_connect(struct child_process *conn)
 	free(conn);
 	return code;
 }
+
+const struct alt_resource *const get_alt_res_connect(int fd, int flags)
+{
+	struct alt_resource *res = NULL;
+	const char *line;
+	char *url = NULL, *filetype = NULL;
+	char *error_string = NULL;
+
+	while (line = packet_read_line_gentle(fd, NULL)) {
+		const char *space = strchr(line, ' ');
+
+		// We will eventually support multiple resources, so always
+		// parse the whole message
+		if ((filetype && url) || error_string) {
+			continue;
+		}
+		if (skip_prefix(line, "ERR ", &line) || !space ||
+				strchr(space + 1, ' ')) {
+			error_string = xstrdup(line);
+			continue;
+		}
+		filetype = xstrndup(line, (space - line));
+		url = xstrdup(space + 1);
+	}
+
+	if (filetype && url && !error_string){
+		res = xcalloc(1, sizeof(*res));
+		res->filetype = filetype;
+		res->url = url;
+	}
+	else {
+		if (!(flags & CONNECT_SUPPRESS_ERRORS)) {
+			if (error_string)
+				fprintf(stderr, "prime clone protocol error: "
+					"got '%s'\n", error_string);
+			else
+				fprintf(stderr, "did not get required "
+					"components for alternate resource\n");
+		}
+	}
+
+	return res;
+}
diff --git a/connect.h b/connect.h
index 01f14cd..966c0eb 100644
--- a/connect.h
+++ b/connect.h
@@ -1,10 +1,11 @@
 #ifndef CONNECT_H
 #define CONNECT_H
 
-#define CONNECT_VERBOSE       (1u << 0)
-#define CONNECT_DIAG_URL      (1u << 1)
-#define CONNECT_IPV4          (1u << 2)
-#define CONNECT_IPV6          (1u << 3)
+#define CONNECT_VERBOSE         (1u << 0)
+#define CONNECT_DIAG_URL        (1u << 1)
+#define CONNECT_IPV4            (1u << 2)
+#define CONNECT_IPV6            (1u << 3)
+#define CONNECT_SUPPRESS_ERRORS (1u << 4)
 extern struct child_process *git_connect(int fd[2], const char *url, const char *prog, int flags);
 extern int finish_connect(struct child_process *conn);
 extern int git_connection_is_socket(struct child_process *conn);
@@ -12,5 +13,6 @@ extern int server_supports(const char *feature);
 extern int parse_feature_request(const char *features, const char *feature);
 extern const char *server_feature_value(const char *feature, int *len_ret);
 extern int url_is_local_not_ssh(const char *url);
+const struct alt_resource *const get_alt_res_connect(int fd, int flags);
 
 #endif
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH 06/11] Resumable clone: implement transport_prime_clone
  2016-09-16  0:12 [PATCH 00/11] Resumable clone Kevin Wern
                   ` (4 preceding siblings ...)
  2016-09-16  0:12 ` [PATCH 05/11] Resumable clone: add output parsing to connect.c Kevin Wern
@ 2016-09-16  0:12 ` Kevin Wern
  2016-09-16  0:12 ` [PATCH 07/11] Resumable clone: add resumable download to http/curl Kevin Wern
                   ` (6 subsequent siblings)
  12 siblings, 0 replies; 39+ messages in thread
From: Kevin Wern @ 2016-09-16  0:12 UTC (permalink / raw)
  To: git

Create transport_prime_clone API, as well as all internal methods.
Create representations of alt_resource and prime-clone path options.

The intention of get_alt_res_helper is solely to parse the output of
remote-curl because transport-helper does not handle verbose options
or speaking to the user verbosely. Therefore, all error parsing is
done with remote-curl, and any protocol breach between remote-curl and
transport-helper will treated as a bug and result in death.

Signed-off-by: Kevin Wern <kevin.m.wern@gmail.com>
---
 transport-helper.c | 51 ++++++++++++++++++++++++++++++++++++++++++++++++++-
 transport.c        | 44 ++++++++++++++++++++++++++++++++++++++++++++
 transport.h        | 20 ++++++++++++++++++++
 3 files changed, 114 insertions(+), 1 deletion(-)

diff --git a/transport-helper.c b/transport-helper.c
index b934183..eb185d5 100644
--- a/transport-helper.c
+++ b/transport-helper.c
@@ -28,7 +28,8 @@ struct helper_data {
 		signed_tags : 1,
 		check_connectivity : 1,
 		no_disconnect_req : 1,
-		no_private_update : 1;
+		no_private_update : 1,
+		prime_clone : 1;
 	char *export_marks;
 	char *import_marks;
 	/* These go from remote name (as in "list") to private name */
@@ -180,6 +181,8 @@ static struct child_process *get_helper(struct transport *transport)
 			data->export = 1;
 		else if (!strcmp(capname, "check-connectivity"))
 			data->check_connectivity = 1;
+		else if (!strcmp(capname, "prime-clone"))
+			data->prime_clone = 1;
 		else if (!data->refspecs && skip_prefix(capname, "refspec ", &arg)) {
 			ALLOC_GROW(refspecs,
 				   refspec_nr + 1,
@@ -248,6 +251,7 @@ static int disconnect_helper(struct transport *transport)
 }
 
 static const char *unsupported_options[] = {
+	TRANS_OPT_PRIMECLONE,
 	TRANS_OPT_UPLOADPACK,
 	TRANS_OPT_RECEIVEPACK,
 	TRANS_OPT_THIN,
@@ -1054,6 +1058,50 @@ static struct ref *get_refs_list(struct transport *transport, int for_push)
 	return ret;
 }
 
+static const struct alt_resource *const get_alt_res_helper(struct transport *transport)
+{
+	struct helper_data *data = transport->data;
+	char *url = NULL, *filetype = NULL;
+	struct alt_resource *ret = NULL;
+	struct strbuf out = STRBUF_INIT;
+	struct child_process *helper = get_helper(transport);
+	int err = 0;
+
+	helper = get_helper(transport);
+	write_constant(helper->in, "prime-clone\n");
+
+	while (!recvline(data, &out)) {
+		char *space = strchr(out.buf, ' ');
+
+		if (!*out.buf)
+			break;
+
+		if (starts_with(out.buf, "error")) {
+			err = 1;
+			continue;
+		}
+
+		if (!space || strchr(space + 1, ' '))
+			die("malformed alternate resource response: %s\n",
+			    out.buf);
+
+		if ((filetype && url) || err)
+			continue;
+
+		filetype = xstrndup(out.buf, (space - out.buf));
+		url = xstrdup(space + 1);
+	}
+
+	if (filetype && url && !err) {
+		ret = xcalloc(1, sizeof(*ret));
+		ret->filetype = filetype;
+		ret->url = url;
+	}
+
+	strbuf_release(&out);
+	return ret;
+}
+
 int transport_helper_init(struct transport *transport, const char *name)
 {
 	struct helper_data *data = xcalloc(1, sizeof(*data));
@@ -1067,6 +1115,7 @@ int transport_helper_init(struct transport *transport, const char *name)
 	transport->data = data;
 	transport->set_option = set_helper_option;
 	transport->get_refs_list = get_refs_list;
+	transport->prime_clone = get_alt_res_helper;
 	transport->fetch = fetch;
 	transport->push_refs = push_refs;
 	transport->disconnect = release_helper;
diff --git a/transport.c b/transport.c
index 7bd3206..dd0d839 100644
--- a/transport.c
+++ b/transport.c
@@ -131,6 +131,9 @@ static int set_git_option(struct git_transport_options *opts,
 	} else if (!strcmp(name, TRANS_OPT_RECEIVEPACK)) {
 		opts->receivepack = value;
 		return 0;
+	} else if (!strcmp(name, TRANS_OPT_PRIMECLONE)) {
+		opts->primeclone = value;
+		return 0;
 	} else if (!strcmp(name, TRANS_OPT_THIN)) {
 		opts->thin = !!value;
 		return 0;
@@ -533,6 +536,42 @@ static int git_transport_push(struct transport *transport, struct ref *remote_re
 	return ret;
 }
 
+const struct alt_resource *const get_alt_res_via_connect(struct transport *transport)
+{
+	struct git_transport_data *data = transport->data;
+	const struct alt_resource *res = NULL;
+	int flags = transport->verbose > 0 ? 0 : CONNECT_SUPPRESS_ERRORS;
+
+	data->conn = git_connect(data->fd, transport->url,
+				 transport->smart_options->primeclone, flags);
+	res = get_alt_res_connect(data->fd[0], flags);
+
+	close(data->fd[0]);
+	close(data->fd[1]);
+	finish_connect(data->conn);
+	data->conn = NULL;
+
+	return res;
+}
+
+const struct alt_resource *const transport_prime_clone(struct transport *transport)
+{
+	if (transport->prime_clone && !transport->alt_res)
+		transport->alt_res = transport->prime_clone(transport);
+	if (transport->verbose > 0) {
+		if (transport->alt_res)
+			// redundant at this point, but will be
+			// more useful in future iterations with
+			// lists of potential resources
+			fprintf(stderr, "alt resource found: %s (%s)\n",
+				transport->alt_res->url,
+				transport->alt_res->filetype);
+		else
+			fprintf(stderr, "alt res not found\n");
+	}
+	return transport->alt_res;
+}
+
 static int connect_git(struct transport *transport, const char *name,
 		       const char *executable, int fd[2])
 {
@@ -691,6 +730,7 @@ struct transport *transport_get(struct remote *remote, const char *url)
 		ret->set_option = NULL;
 		ret->get_refs_list = get_refs_via_connect;
 		ret->fetch = fetch_refs_via_pack;
+		ret->prime_clone = get_alt_res_via_connect;
 		ret->push_refs = git_transport_push;
 		ret->connect = connect_git;
 		ret->disconnect = disconnect_git;
@@ -713,6 +753,10 @@ struct transport *transport_get(struct remote *remote, const char *url)
 		ret->smart_options->receivepack = "git-receive-pack";
 		if (remote->receivepack)
 			ret->smart_options->receivepack = remote->receivepack;
+		// No remote.*.primeclone config because prime-clone only
+		// applies to clone. After that, it is never used the repo
+		// again.
+		ret->smart_options->primeclone = "git-prime-clone";
 	}
 
 	return ret;
diff --git a/transport.h b/transport.h
index c681408..2bb6963 100644
--- a/transport.h
+++ b/transport.h
@@ -15,6 +15,7 @@ struct git_transport_options {
 	int depth;
 	const char *uploadpack;
 	const char *receivepack;
+	const char *primeclone;
 	struct push_cas_option *cas;
 };
 
@@ -24,11 +25,17 @@ enum transport_family {
 	TRANSPORT_FAMILY_IPV6
 };
 
+struct alt_resource {
+	char *url;
+	char *filetype;
+};
+
 struct transport {
 	struct remote *remote;
 	const char *url;
 	void *data;
 	const struct ref *remote_refs;
+	const struct alt_resource *alt_res;
 
 	/**
 	 * Indicates whether we already called get_refs_list(); set by
@@ -68,6 +75,15 @@ struct transport {
 	struct ref *(*get_refs_list)(struct transport *transport, int for_push);
 
 	/**
+	 * Returns the location of an alternate resource to fetch before
+	 * cloning.
+	 *
+	 * If the transport cannot determine an alternate resource, then
+	 * NULL is returned.
+	 **/
+	const struct alt_resource *const (*prime_clone)(struct transport *transport);
+
+	/**
 	 * Fetch the objects for the given refs. Note that this gets
 	 * an array, and should ignore the list structure.
 	 *
@@ -164,6 +180,9 @@ int transport_restrict_protocols(void);
 /* The program to use on the remote side to send a pack */
 #define TRANS_OPT_UPLOADPACK "uploadpack"
 
+/* The program to use on the remote side to check for alternate resource */
+#define TRANS_OPT_PRIMECLONE "primeclone"
+
 /* The program to use on the remote side to receive a pack */
 #define TRANS_OPT_RECEIVEPACK "receivepack"
 
@@ -208,6 +227,7 @@ int transport_push(struct transport *connection,
 		   unsigned int * reject_reasons);
 
 const struct ref *transport_get_remote_refs(struct transport *transport);
+const struct alt_resource *const transport_prime_clone(struct transport *transport);
 
 int transport_fetch_refs(struct transport *transport, struct ref *refs);
 void transport_unlock_pack(struct transport *transport);
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH 07/11] Resumable clone: add resumable download to http/curl
  2016-09-16  0:12 [PATCH 00/11] Resumable clone Kevin Wern
                   ` (5 preceding siblings ...)
  2016-09-16  0:12 ` [PATCH 06/11] Resumable clone: implement transport_prime_clone Kevin Wern
@ 2016-09-16  0:12 ` Kevin Wern
  2016-09-16 22:45   ` Junio C Hamano
  2016-09-16  0:12 ` [PATCH 08/11] Resumable clone: create transport_download_primer Kevin Wern
                   ` (5 subsequent siblings)
  12 siblings, 1 reply; 39+ messages in thread
From: Kevin Wern @ 2016-09-16  0:12 UTC (permalink / raw)
  To: git

Create resumable download procedure and progress display function.
The conversion from B to KB occurs because otherwise the byte counts
for large repos (i.e. Linux) overflow calculating percentage.

The download protocol includes the resource's URL, and the directory
the resource will be downloaded to. The url passed to remote-curl on
invocation does not matter (git clone will use the resource url
again here).

Signed-off-by: Kevin Wern <kevin.m.wern@gmail.com>
---
 http.c        | 86 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 http.h        |  7 ++++-
 remote-curl.c | 27 +++++++++++++++++++
 3 files changed, 118 insertions(+), 2 deletions(-)

diff --git a/http.c b/http.c
index 1d5e3bb..93d6324 100644
--- a/http.c
+++ b/http.c
@@ -10,6 +10,8 @@
 #include "pkt-line.h"
 #include "gettext.h"
 #include "transport.h"
+#include "progress.h"
+#include "dir.h"
 
 #if LIBCURL_VERSION_NUM >= 0x070a08
 long int git_curl_ipresolve = CURL_IPRESOLVE_WHATEVER;
@@ -1136,7 +1138,10 @@ static int handle_curl_result(struct slot_results *results)
 				curl_easy_strerror(results->curl_result),
 				sizeof(curl_errorstr));
 #endif
-		return HTTP_ERROR;
+		if (results->http_code >= 400)
+			return HTTP_ERROR;
+		else
+			return HTTP_ERROR_RESUMABLE;
 	}
 }
 
@@ -1365,6 +1370,40 @@ static void http_opt_request_remainder(CURL *curl, off_t pos)
 #define HTTP_REQUEST_STRBUF	0
 #define HTTP_REQUEST_FILE	1
 
+static int bytes_to_rounded_kb(double bytes)
+{
+	return (int) (bytes + 512)/1024;
+}
+
+int progress_func(void *data, double total_to_download, double now_downloaded,
+		  double total_to_upload, double now_uploadeded)
+{
+	struct progress **progress = data;
+	int kilobytes = total_to_download >= 1024;
+
+	if (total_to_download <= 0.0) {
+		return 0;
+	}
+	if (kilobytes) {
+		now_downloaded = bytes_to_rounded_kb(now_downloaded);
+		total_to_download = bytes_to_rounded_kb(total_to_download);
+	}
+	if (!*progress && now_downloaded < total_to_download) {
+		if (total_to_download > 1024)
+			*progress = start_progress("Downloading (KB)",
+						   total_to_download);
+		else
+			*progress = start_progress("Downloading (B)",
+						   total_to_download);
+	}
+	display_progress(*progress, now_downloaded);
+	if (now_downloaded == total_to_download) {
+		stop_progress(progress);
+	}
+	return 0;
+}
+
+
 static int http_request(const char *url,
 			void *result, int target,
 			const struct http_get_options *options)
@@ -1373,6 +1412,7 @@ static int http_request(const char *url,
 	struct slot_results results;
 	struct curl_slist *headers = NULL;
 	struct strbuf buf = STRBUF_INIT;
+	struct progress *progress = NULL;
 	const char *accept_language;
 	int ret;
 
@@ -1389,6 +1429,16 @@ static int http_request(const char *url,
 			off_t posn = ftello(result);
 			curl_easy_setopt(slot->curl, CURLOPT_WRITEFUNCTION,
 					 fwrite);
+			if (options && options->progress) {
+				curl_easy_setopt(slot->curl,
+						 CURLOPT_NOPROGRESS, 0);
+				curl_easy_setopt(slot->curl,
+						 CURLOPT_PROGRESSFUNCTION,
+						 progress_func);
+				curl_easy_setopt(slot->curl,
+						 CURLOPT_PROGRESSDATA,
+						 &progress);
+			}
 			if (posn > 0)
 				http_opt_request_remainder(slot->curl, posn);
 		} else
@@ -1559,6 +1609,40 @@ cleanup:
 	return ret;
 }
 
+int http_download_primer(const char *url, const char *out_file)
+{
+	int ret = 0, try_count = HTTP_TRY_COUNT;
+	struct http_get_options options = {0};
+	options.progress = 1;
+
+	if (file_exists(out_file)) {
+		fprintf(stderr,
+			"File already downloaded: '%s', skipping...\n",
+			out_file);
+		return ret;
+	}
+
+	do {
+		if (try_count != HTTP_TRY_COUNT) {
+			fprintf(stderr, "Connection interrupted for some "
+				"reason, retrying (%d attempts left)\n",
+				try_count);
+			struct timeval time = {10, 0}; // 1s
+			select(0, NULL, NULL, NULL, &time);
+		}
+		ret = http_get_file(url, out_file, &options);
+		try_count--;
+	} while (try_count > 0 && ret == HTTP_ERROR_RESUMABLE);
+
+	if (ret != HTTP_OK) {
+		error("Unable to get resource: %s", url);
+		ret = -1;
+	}
+
+	return ret;
+}
+
+
 int http_fetch_ref(const char *base, struct ref *ref)
 {
 	struct http_get_options options = {0};
diff --git a/http.h b/http.h
index 4ef4bbd..6a7ce7b 100644
--- a/http.h
+++ b/http.h
@@ -138,7 +138,8 @@ extern char *get_remote_object_url(const char *url, const char *hex,
 /* Options for http_get_*() */
 struct http_get_options {
 	unsigned no_cache:1,
-		 keep_error:1;
+		 keep_error:1,
+		 progress:1;
 
 	/* If non-NULL, returns the content-type of the response. */
 	struct strbuf *content_type;
@@ -172,6 +173,7 @@ struct http_get_options {
 #define HTTP_START_FAILED	3
 #define HTTP_REAUTH	4
 #define HTTP_NOAUTH	5
+#define HTTP_ERROR_RESUMABLE	6
 
 /*
  * Requests a URL and stores the result in a strbuf.
@@ -180,6 +182,9 @@ struct http_get_options {
  */
 int http_get_strbuf(const char *url, struct strbuf *result, struct http_get_options *options);
 
+#define HTTP_TRY_COUNT 5
+int http_download_primer(const char *url, const char *out_file);
+
 extern int http_fetch_ref(const char *base, struct ref *ref);
 
 /* Helpers for fetching packs */
diff --git a/remote-curl.c b/remote-curl.c
index 8ebb587..051ba52 100644
--- a/remote-curl.c
+++ b/remote-curl.c
@@ -394,6 +394,30 @@ static void prime_clone(void)
 	free(result_full);
 }
 
+static void download_primer(const char *url, const char *base_dir)
+{
+	char *slash_ptr = strchr(url, '/'), *out_file;
+	struct strbuf out_path = STRBUF_INIT;
+	do {
+		out_file = slash_ptr + 1;
+	} while (slash_ptr = strchr(out_file, '/'));
+	strbuf_addf(&out_path, "%s/%s", base_dir, out_file);
+	if (!http_download_primer(url, out_path.buf))
+		printf("%s\n", out_path.buf);
+	printf("\n");
+	fflush(stdout);
+}
+
+static void parse_download_primer(struct strbuf *buf)
+{
+	const char *remote_url;
+	if (skip_prefix(buf->buf, "download-primer ", &remote_url)) {
+		char *base_path;
+		base_path = strchr(remote_url, ' ');
+		*base_path++ = '\0';
+		download_primer(remote_url, base_path);
+	}
+}
 
 static struct discovery *discover_refs(const char *service, int for_push)
 {
@@ -1105,6 +1129,8 @@ int main(int argc, const char **argv)
 		} else if (!strcmp(buf.buf, "list") || starts_with(buf.buf, "list ")) {
 			int for_push = !!strstr(buf.buf + 4, "for-push");
 			output_refs(get_refs(for_push));
+		} else if (starts_with(buf.buf, "download-primer")) {
+			parse_download_primer(&buf);
 		} else if (!strcmp(buf.buf, "prime-clone")) {
 			prime_clone();
 		} else if (starts_with(buf.buf, "push ")) {
@@ -1132,6 +1158,7 @@ int main(int argc, const char **argv)
 			printf("fetch\n");
 			printf("option\n");
 			printf("push\n");
+			printf("download-primer\n");
 			printf("prime-clone\n");
 			printf("check-connectivity\n");
 			printf("\n");
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH 08/11] Resumable clone: create transport_download_primer
  2016-09-16  0:12 [PATCH 00/11] Resumable clone Kevin Wern
                   ` (6 preceding siblings ...)
  2016-09-16  0:12 ` [PATCH 07/11] Resumable clone: add resumable download to http/curl Kevin Wern
@ 2016-09-16  0:12 ` Kevin Wern
  2016-09-16  0:12 ` [PATCH 09/11] path: add resumable marker Kevin Wern
                   ` (4 subsequent siblings)
  12 siblings, 0 replies; 39+ messages in thread
From: Kevin Wern @ 2016-09-16  0:12 UTC (permalink / raw)
  To: git

Create function transport_download_primer and components
to invoke and pass commands to remote-curl.

Signed-off-by: Kevin Wern <kevin.m.wern@gmail.com>
---
 transport-helper.c | 24 ++++++++++++++++++++++++
 transport.c        |  9 +++++++++
 transport.h        |  7 +++++++
 3 files changed, 40 insertions(+)

diff --git a/transport-helper.c b/transport-helper.c
index eb185d5..2ff96ef 100644
--- a/transport-helper.c
+++ b/transport-helper.c
@@ -29,6 +29,7 @@ struct helper_data {
 		check_connectivity : 1,
 		no_disconnect_req : 1,
 		no_private_update : 1,
+		download_primer : 1,
 		prime_clone : 1;
 	char *export_marks;
 	char *import_marks;
@@ -183,6 +184,8 @@ static struct child_process *get_helper(struct transport *transport)
 			data->check_connectivity = 1;
 		else if (!strcmp(capname, "prime-clone"))
 			data->prime_clone = 1;
+		else if (!strcmp(capname, "download-primer"))
+			data->download_primer = 1;
 		else if (!data->refspecs && skip_prefix(capname, "refspec ", &arg)) {
 			ALLOC_GROW(refspecs,
 				   refspec_nr + 1,
@@ -1058,6 +1061,26 @@ static struct ref *get_refs_list(struct transport *transport, int for_push)
 	return ret;
 }
 
+static const char *download_primer(struct transport *transport,
+				   const struct alt_resource *res,
+				   const char *base_path)
+{
+	struct helper_data *data = transport->data;
+	struct child_process *helper;
+	struct strbuf buf = STRBUF_INIT, out = STRBUF_INIT;
+	char *ret = NULL;
+
+	helper = get_helper(transport);
+
+	strbuf_addf(&buf, "download-primer %s %s\n", res->url, base_path);
+	sendline(data, &buf);
+	recvline(data, &out);
+	strbuf_release(&buf);
+	if (out.len > 0)
+		ret = strbuf_detach(&out, NULL);
+	return ret;
+}
+
 static const struct alt_resource *const get_alt_res_helper(struct transport *transport)
 {
 	struct helper_data *data = transport->data;
@@ -1115,6 +1138,7 @@ int transport_helper_init(struct transport *transport, const char *name)
 	transport->data = data;
 	transport->set_option = set_helper_option;
 	transport->get_refs_list = get_refs_list;
+	transport->download_primer = download_primer;
 	transport->prime_clone = get_alt_res_helper;
 	transport->fetch = fetch;
 	transport->push_refs = push_refs;
diff --git a/transport.c b/transport.c
index dd0d839..3b33029 100644
--- a/transport.c
+++ b/transport.c
@@ -572,6 +572,15 @@ const struct alt_resource *const transport_prime_clone(struct transport *transpo
 	return transport->alt_res;
 }
 
+const char *transport_download_primer(struct transport *transport,
+				      const struct alt_resource *alt_res,
+				      const char *base_dir)
+{
+	if (transport->download_primer)
+		return transport->download_primer(transport, alt_res, base_dir);
+	return NULL;
+}
+
 static int connect_git(struct transport *transport, const char *name,
 		       const char *executable, int fd[2])
 {
diff --git a/transport.h b/transport.h
index 2bb6963..1484d6d 100644
--- a/transport.h
+++ b/transport.h
@@ -83,6 +83,10 @@ struct transport {
 	 **/
 	const struct alt_resource *const (*prime_clone)(struct transport *transport);
 
+	const char *(*download_primer)(struct transport *transport,
+			const struct alt_resource *alt_res,
+			const char *base_path);
+
 	/**
 	 * Fetch the objects for the given refs. Note that this gets
 	 * an array, and should ignore the list structure.
@@ -228,6 +232,9 @@ int transport_push(struct transport *connection,
 
 const struct ref *transport_get_remote_refs(struct transport *transport);
 const struct alt_resource *const transport_prime_clone(struct transport *transport);
+const char *transport_download_primer(struct transport *transport,
+		const struct alt_resource *alt_res,
+		const char *base_path);
 
 int transport_fetch_refs(struct transport *transport, struct ref *refs);
 void transport_unlock_pack(struct transport *transport);
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH 09/11] path: add resumable marker
  2016-09-16  0:12 [PATCH 00/11] Resumable clone Kevin Wern
                   ` (7 preceding siblings ...)
  2016-09-16  0:12 ` [PATCH 08/11] Resumable clone: create transport_download_primer Kevin Wern
@ 2016-09-16  0:12 ` Kevin Wern
  2016-09-19 13:24   ` Duy Nguyen
  2016-09-16  0:12 ` [PATCH 10/11] run command: add RUN_COMMAND_NO_STDOUT Kevin Wern
                   ` (3 subsequent siblings)
  12 siblings, 1 reply; 39+ messages in thread
From: Kevin Wern @ 2016-09-16  0:12 UTC (permalink / raw)
  To: git

Create function to get gitdir file RESUMABLE.

Signed-off-by: Kevin Wern <kevin.m.wern@gmail.com>
---
 cache.h | 1 +
 path.c  | 1 +
 2 files changed, 2 insertions(+)

diff --git a/cache.h b/cache.h
index d7ff46e..1f4117c 100644
--- a/cache.h
+++ b/cache.h
@@ -811,6 +811,7 @@ const char *git_path_merge_mode(void);
 const char *git_path_merge_head(void);
 const char *git_path_fetch_head(void);
 const char *git_path_shallow(void);
+const char *git_path_resumable(void);
 
 /*
  * Return the name of the file in the local object database that would
diff --git a/path.c b/path.c
index 8b7e168..9360ed9 100644
--- a/path.c
+++ b/path.c
@@ -1201,4 +1201,5 @@ GIT_PATH_FUNC(git_path_merge_rr, "MERGE_RR")
 GIT_PATH_FUNC(git_path_merge_mode, "MERGE_MODE")
 GIT_PATH_FUNC(git_path_merge_head, "MERGE_HEAD")
 GIT_PATH_FUNC(git_path_fetch_head, "FETCH_HEAD")
+GIT_PATH_FUNC(git_path_resumable, "RESUMABLE")
 GIT_PATH_FUNC(git_path_shallow, "shallow")
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH 10/11] run command: add RUN_COMMAND_NO_STDOUT
  2016-09-16  0:12 [PATCH 00/11] Resumable clone Kevin Wern
                   ` (8 preceding siblings ...)
  2016-09-16  0:12 ` [PATCH 09/11] path: add resumable marker Kevin Wern
@ 2016-09-16  0:12 ` Kevin Wern
  2016-09-16 23:07   ` Junio C Hamano
  2016-09-16  0:12 ` [PATCH 11/11] Resumable clone: implement primer logic in git-clone Kevin Wern
                   ` (2 subsequent siblings)
  12 siblings, 1 reply; 39+ messages in thread
From: Kevin Wern @ 2016-09-16  0:12 UTC (permalink / raw)
  To: git

Add option RUN_COMMAND_NO_STDOUT, which sets no_stdout on a child
process.

This will be used by git clone when calling index-pack on a downloaded
packfile.

Signed-off-by: Kevin Wern <kevin.m.wern@gmail.com>
---
 run-command.c | 1 +
 run-command.h | 1 +
 2 files changed, 2 insertions(+)

diff --git a/run-command.c b/run-command.c
index 863dad5..c4f82f9 100644
--- a/run-command.c
+++ b/run-command.c
@@ -574,6 +574,7 @@ int run_command_v_opt_cd_env(const char **argv, int opt, const char *dir, const
 	cmd.silent_exec_failure = opt & RUN_SILENT_EXEC_FAILURE ? 1 : 0;
 	cmd.use_shell = opt & RUN_USING_SHELL ? 1 : 0;
 	cmd.clean_on_exit = opt & RUN_CLEAN_ON_EXIT ? 1 : 0;
+	cmd.no_stdout = opt & RUN_COMMAND_NO_STDOUT ? 1 : 0;
 	cmd.dir = dir;
 	cmd.env = env;
 	return run_command(&cmd);
diff --git a/run-command.h b/run-command.h
index 42917e8..2d2c871 100644
--- a/run-command.h
+++ b/run-command.h
@@ -70,6 +70,7 @@ extern int run_hook_ve(const char *const *env, const char *name, va_list args);
 #define RUN_SILENT_EXEC_FAILURE 8
 #define RUN_USING_SHELL 16
 #define RUN_CLEAN_ON_EXIT 32
+#define RUN_COMMAND_NO_STDOUT 64
 int run_command_v_opt(const char **argv, int opt);
 
 /*
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH 11/11] Resumable clone: implement primer logic in git-clone
  2016-09-16  0:12 [PATCH 00/11] Resumable clone Kevin Wern
                   ` (9 preceding siblings ...)
  2016-09-16  0:12 ` [PATCH 10/11] run command: add RUN_COMMAND_NO_STDOUT Kevin Wern
@ 2016-09-16  0:12 ` Kevin Wern
  2016-09-16 23:32   ` Junio C Hamano
  2016-09-19 14:04   ` Duy Nguyen
  2016-09-16 20:47 ` [PATCH 00/11] Resumable clone Junio C Hamano
  2016-09-27 21:51 ` Eric Wong
  12 siblings, 2 replies; 39+ messages in thread
From: Kevin Wern @ 2016-09-16  0:12 UTC (permalink / raw)
  To: git

Use transport_download_primer and transport_prime_clone in git clone.
This only supports a fully connected packfile.

transport_prime_clone and transport_download_primer are executed
completely independent of transport_(get|fetch)_remote_refs, et al.
transport_download_primer is executed based on the existence of an
alt_resource. The idea is that the "prime clone" execution should be
able to attempt retrieving an alternate resource without dying, as
opposed to depending on the result of upload pack's "capabilities" to
indicate whether or not the client can attempt it.

If a resumable resource is available, execute a codepath with the
following modular components:
- downloading resource to a specific directory
- using the resource (for pack, indexing and generating the bundle
  file)
- cleaning up the resource (if the download or use fails)
- cleaning up the resource (if the download or use succeeds)

If resume is interrupted on the client side, the alternate resource
info is written to the RESUMABLE file in the git directory.

On resume, the required info is extracted by parsing the created
config file, and that info is used to determine the work and git
directories. If these cannot be determined, the program exits.
The writing of the refspec and determination of the initial git
directories are skipped, along with transport_prime_clone.

The main purpose of this series of patches is to flesh out a codepath
for automatic resuming, manual resuming, and leaving a resumable
directory on exit--the logic for when to do these still needs more
work.

Signed-off-by: Kevin Wern <kevin.m.wern@gmail.com>
---
 Documentation/git-clone.txt |  16 ++
 builtin/clone.c             | 590 +++++++++++++++++++++++++++++++++++++-------
 t/t9904-git-prime-clone.sh  | 181 ++++++++++++++
 3 files changed, 698 insertions(+), 89 deletions(-)
 create mode 100755 t/t9904-git-prime-clone.sh

diff --git a/Documentation/git-clone.txt b/Documentation/git-clone.txt
index b7c467a..5934bb6 100644
--- a/Documentation/git-clone.txt
+++ b/Documentation/git-clone.txt
@@ -16,6 +16,7 @@ SYNOPSIS
 	  [--depth <depth>] [--[no-]single-branch]
 	  [--recursive | --recurse-submodules] [--] <repository>
 	  [<directory>]
+'git clone --resume <resumable_dir>'
 
 DESCRIPTION
 -----------
@@ -172,6 +173,12 @@ objects from the source repository into a pack in the cloned repository.
 	via ssh, this specifies a non-default path for the command
 	run on the other end.
 
+--prime-clone <prime-clone>::
+-p <prime-clone>::
+	When given and the repository to clone from is accessed
+	via ssh, this specifies a non-default path for the command
+	run on the other end.
+
 --template=<template_directory>::
 	Specify the directory from which templates will be used;
 	(See the "TEMPLATE DIRECTORY" section of linkgit:git-init[1].)
@@ -232,6 +239,15 @@ objects from the source repository into a pack in the cloned repository.
 	for `host.xz:foo/.git`).  Cloning into an existing directory
 	is only allowed if the directory is empty.
 
+--resume::
+	Resume a partially cloned repo in a "resumable" state. This
+	can only be specified with a single local directory (<resumable
+	dir>). This is incompatible with all other options.
+
+<resumable_dir>::
+	The directory of the partial clone. This could be either the
+	work directory or the git directory.
+
 :git-clone: 1
 include::urls.txt[]
 
diff --git a/builtin/clone.c b/builtin/clone.c
index 9ac6c01..d9a13dc 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -8,7 +8,9 @@
  * Clone a repository into a different directory that does not yet exist.
  */
 
+#include "cache.h"
 #include "builtin.h"
+#include "bundle.h"
 #include "lockfile.h"
 #include "parse-options.h"
 #include "fetch-pack.h"
@@ -40,17 +42,20 @@ static const char * const builtin_clone_usage[] = {
 
 static int option_no_checkout, option_bare, option_mirror, option_single_branch = -1;
 static int option_local = -1, option_no_hardlinks, option_shared, option_recursive;
+static int option_resume;
 static char *option_template, *option_depth;
-static char *option_origin = NULL;
+static const char *option_origin = NULL;
 static char *option_branch = NULL;
 static const char *real_git_dir;
 static char *option_upload_pack = "git-upload-pack";
+static char *option_prime_clone = "git-prime-clone";
 static int option_verbosity;
 static int option_progress = -1;
 static enum transport_family family;
 static struct string_list option_config;
 static struct string_list option_reference;
 static int option_dissociate;
+static const struct alt_resource *alt_res = NULL;
 
 static struct option builtin_clone_options[] = {
 	OPT__VERBOSITY(&option_verbosity),
@@ -85,10 +90,14 @@ static struct option builtin_clone_options[] = {
 		   N_("checkout <branch> instead of the remote's HEAD")),
 	OPT_STRING('u', "upload-pack", &option_upload_pack, N_("path"),
 		   N_("path to git-upload-pack on the remote")),
+	OPT_STRING('p', "prime-clone", &option_prime_clone, N_("path"),
+		   N_("path to git-prime-clone on the remote")),
 	OPT_STRING(0, "depth", &option_depth, N_("depth"),
 		    N_("create a shallow clone of that depth")),
 	OPT_BOOL(0, "single-branch", &option_single_branch,
 		    N_("clone only one branch, HEAD or --branch")),
+	OPT_BOOL(0, "resume", &option_resume,
+		    N_("continue a resumable clone")),
 	OPT_STRING(0, "separate-git-dir", &real_git_dir, N_("gitdir"),
 		   N_("separate git dir from working tree")),
 	OPT_STRING_LIST('c', "config", &option_config, N_("key=value"),
@@ -278,6 +287,21 @@ static void strip_trailing_slashes(char *dir)
 	*end = '\0';
 }
 
+static char *get_filename(const char *dir)
+{
+	char *dir_copy = xstrdup(dir);
+	strip_trailing_slashes(dir_copy);
+	char *filename, *final = NULL;
+
+	filename = find_last_dir_sep(dir);
+
+	if (filename && *(++filename))
+		final = xstrdup(filename);
+
+	free(dir_copy);
+	return final;
+}
+
 static int add_one_reference(struct string_list_item *item, void *cb_data)
 {
 	char *ref_git;
@@ -451,6 +475,7 @@ static const char *junk_work_tree;
 static const char *junk_git_dir;
 static enum {
 	JUNK_LEAVE_NONE,
+	JUNK_LEAVE_RESUMABLE,
 	JUNK_LEAVE_REPO,
 	JUNK_LEAVE_ALL
 } junk_mode = JUNK_LEAVE_NONE;
@@ -460,6 +485,29 @@ N_("Clone succeeded, but checkout failed.\n"
    "You can inspect what was checked out with 'git status'\n"
    "and retry the checkout with 'git checkout -f HEAD'\n");
 
+static const char junk_leave_resumable_msg[] =
+N_("Clone interrupted while copying resumable resource.\n"
+   "Try using 'git clone --resume <new_directory>',\n"
+   "where <new_directory> is either the new working \n"
+   "directory or git directory.\n\n"
+   "If this does not succeed, it could be because the\n"
+   "resource has been moved, corrupted, or changed.\n"
+   "If this is the case, you should remove <new_directory>\n"
+   "and run the original command.\n");
+
+static void write_resumable_resource()
+{
+	const char *filename = git_path_resumable();
+	struct strbuf content = STRBUF_INIT;
+	strbuf_addf(&content, "%s\n%s\n", alt_res->url, alt_res->filetype);
+	int fd = open(filename, O_WRONLY | O_CREAT, 0666);
+	if (fd < 0)
+		die_errno(_("Could not open '%s' for writing"), filename);
+	if (write_in_full(fd, content.buf, content.len) != content.len)
+		die_errno(_("Could not write to '%s'"), filename);
+	close(fd);
+}
+
 static void remove_junk(void)
 {
 	struct strbuf sb = STRBUF_INIT;
@@ -467,7 +515,11 @@ static void remove_junk(void)
 	switch (junk_mode) {
 	case JUNK_LEAVE_REPO:
 		warning("%s", _(junk_leave_repo_msg));
-		/* fall-through */
+		return;
+	case JUNK_LEAVE_RESUMABLE:
+		write_resumable_resource();
+		warning("%s", _(junk_leave_resumable_msg));
+		return;
 	case JUNK_LEAVE_ALL:
 		return;
 	default:
@@ -562,7 +614,7 @@ static void write_remote_refs(const struct ref *local_refs)
 		die("%s", err.buf);
 
 	for (r = local_refs; r; r = r->next) {
-		if (!r->peer_ref)
+		if (!r->peer_ref || ref_exists(r->peer_ref->name))
 			continue;
 		if (ref_transaction_create(t, r->peer_ref->name, r->old_oid.hash,
 					   0, NULL, &err))
@@ -820,11 +872,296 @@ static void dissociate_from_references(void)
 	free(alternates);
 }
 
+static int do_index_pack(const char *in_pack_file, const char *out_idx_file)
+{
+	const char *argv[] = { "index-pack", "--clone-bundle", "-v",
+			       "--check-self-contained-and-connected", "-o",
+			       out_idx_file, in_pack_file, NULL };
+	return run_command_v_opt(argv, RUN_GIT_CMD|RUN_COMMAND_NO_STDOUT);
+}
+
+static const char *replace_extension(const char *filename, const char *existing,
+				     const char *replacement)
+{
+	struct strbuf new_filename = STRBUF_INIT;
+	int existing_len = strlen(existing);
+	int replacement_len = strlen(replacement);
+	int filename_len = strlen(filename);
+
+	if (!(filename && existing && replacement)) {
+		return NULL;
+	}
+
+	if (!strncmp(filename + filename_len - existing_len,
+		     existing, existing_len)) {
+		int existing_position = filename_len - existing_len;
+		strbuf_addstr(&new_filename, filename);
+		strbuf_splice(&new_filename, existing_position, existing_len,
+				replacement, replacement_len);
+	}
+
+	return strbuf_detach(&new_filename, NULL);
+}
+
+static const char *setup_and_index_pack(const char *filename)
+{
+	const char *primer_idx_path = NULL, *primer_bndl_path = NULL;
+	primer_idx_path = replace_extension(filename, ".pack", ".idx");
+	primer_bndl_path = replace_extension(filename, ".pack", ".bndl");
+
+	if (!(primer_idx_path && primer_bndl_path)) {
+		warning("invalid pack filename '%s', falling back to full "
+			"clone", filename);
+		return NULL;
+	}
+
+	if (!file_exists(primer_bndl_path)) {
+		if (do_index_pack(filename, primer_idx_path)) {
+			warning("could not index primer pack, falling back to "
+				"full clone");
+			return NULL;
+		}
+	}
+
+	return primer_bndl_path;
+}
+
+static int write_bundle_refs(const char *bundle_filename)
+{
+	struct ref_transaction *t;
+	struct bundle_header history_tips;
+	const char *temp_ref_base = "resume";
+	struct strbuf err = STRBUF_INIT;
+	int i;
+
+	init_bundle_header(&history_tips, bundle_filename);
+	read_bundle_header(&history_tips);
+
+	t = ref_transaction_begin(&err);
+	for (i = 0; i < history_tips.references.nr; i++) {
+		struct strbuf ref_name = STRBUF_INIT;
+		strbuf_addf(&ref_name, "refs/temp/%s/%s/temp-%s",
+			    option_origin, temp_ref_base,
+			    sha1_to_hex(history_tips.references.list[i].sha1));
+		if (!ref_exists(ref_name.buf)) {
+			if (ref_transaction_create(t, ref_name.buf,
+					history_tips.references.list[i].sha1,
+					0, NULL, &err)) {
+				warning(_("%s"), err.buf);
+				return -1;
+			}
+		}
+		strbuf_release(&ref_name);
+	}
+
+	if (initial_ref_transaction_commit(t, &err)) {
+		warning("%s", err.buf);
+		return -1;
+	}
+	ref_transaction_free(t);
+	release_bundle_header(&history_tips);
+	return 0;
+}
+
+static int use_alt_resource_pack(const char *alt_res_path)
+{
+	int ret = -1;
+	const char *bundle_path = setup_and_index_pack(alt_res_path);
+	if (bundle_path)
+		ret = write_bundle_refs(bundle_path);
+	return ret;
+}
+
+static int use_alt_resource(const char *alt_res_path)
+{
+	int ret = -1;
+	if (!strcmp(alt_res->filetype, "pack"))
+		ret = use_alt_resource_pack(alt_res_path);
+	return ret;
+}
+
+static void clean_alt_resource_pack(const char *resource_path,
+				    int prime_successful)
+{
+	struct bundle_header history_tips;
+	const char *temp_ref_base = "resume";
+	const char *bundle_path;
+
+	if (!resource_path)
+		return;
+
+	bundle_path = replace_extension(resource_path, ".pack", ".bndl");
+
+	if (prime_successful) {
+		init_bundle_header(&history_tips, bundle_path);
+		read_bundle_header(&history_tips);
+
+		for (int i = 0; i < history_tips.references.nr; i++) {
+			struct strbuf ref_name = STRBUF_INIT;
+			strbuf_addf(&ref_name, "refs/temp/%s/%s/temp-%s",
+				    option_origin, temp_ref_base,
+				    sha1_to_hex(history_tips.references.list[i].sha1));
+			if (ref_exists(ref_name.buf)) {
+				delete_ref(ref_name.buf,
+					   history_tips.references.list[i].sha1,
+					   0);
+			}
+			strbuf_release(&ref_name);
+		}
+		release_bundle_header(&history_tips);
+	}
+
+	if (!prime_successful) {
+		const char *tmp_path = mkpath("%s.temp", resource_path);
+		const char *idx_path = replace_extension(resource_path, ".pack",
+							 ".idx");
+		if (file_exists(resource_path)) {
+			unlink(resource_path);
+		}
+		if (file_exists(tmp_path)) {
+			unlink(tmp_path);
+		}
+		if (file_exists(idx_path)) {
+			unlink(idx_path);
+		}
+	}
+	if (file_exists(bundle_path)) {
+		unlink(bundle_path);
+	}
+}
+
+static const char *fetch_alt_resource_pack(struct transport *transport,
+					  const char *base_dir)
+{
+	struct strbuf download_path = STRBUF_INIT;
+	const char *resource_path = NULL;
+	struct remote *primer_remote = remote_get(alt_res->url);
+	struct transport *primer_transport = transport_get(primer_remote,
+							   alt_res->url);
+	strbuf_addf(&download_path, "%s/objects/pack", base_dir);
+	fprintf(stderr, "Downloading primer: %s...\n", alt_res->url);
+	resource_path = transport_download_primer(primer_transport, alt_res,
+						  download_path.buf);
+	transport_disconnect(primer_transport);
+	return resource_path;
+}
+
+static void clean_alt_resource(const char *resource_path, int prime_successful)
+{
+	if (!strcmp(alt_res->filetype, "pack"))
+		clean_alt_resource_pack(resource_path, prime_successful);
+}
+
+static const char *fetch_alt_resource(struct transport *transport,
+				      const char *base_dir)
+{
+	const char *resource_path = NULL;
+	if (!strcmp(alt_res->filetype, "pack"))
+		resource_path = fetch_alt_resource_pack(transport, base_dir);
+	return resource_path;
+}
+
+static const struct alt_resource *get_last_alt_resource(void)
+{
+	struct alt_resource *ret = NULL;
+	FILE *fp;
+	if (fp = fopen(git_path_resumable(), "r")) {
+		ret = xcalloc(1, sizeof(struct alt_resource));
+		struct strbuf line = STRBUF_INIT;
+		strbuf_getline(&line, fp);
+		ret->url = strbuf_detach(&line, NULL);
+		strbuf_getline(&line, fp);
+		ret->filetype = strbuf_detach(&line, NULL);
+		fclose(fp);
+	}
+	return ret;
+}
+
+struct remote_config {
+	const char *name;
+	const char *fetch_pattern;
+	const char *worktree;
+	int bare;
+	int mirror;
+};
+
+static int get_remote_info(const char *key, const char *value, void *priv)
+{
+	struct remote_config *p = priv;
+	char *sub_key;
+	char *name;
+
+	if (skip_prefix(key, "remote.", &key)) {
+		name = xstrdup(key);
+		sub_key = strchr(name, '.');
+		*sub_key++ = 0;
+		if (!p->name)
+			p->name = xstrdup(name);
+		else if (!strcmp(sub_key, "fetch"))
+			git_config_string(&p->fetch_pattern, key, value);
+		else if (!strcmp(sub_key, "mirror"))
+			p->mirror =  git_config_bool(key, value);
+		free(name);
+	}
+	else if (!strcmp(key, "core.bare"))
+		p->bare =  git_config_bool(key, value);
+	else if (!strcmp(key, "core.worktree"))
+		git_config_string(&p->worktree, key, value);
+
+	return 0;
+}
+
+static void get_existing_state(char *dir, const char **git_dir,
+			   const char **work_tree,
+			   struct remote_config *past_info)
+{
+	if (is_git_directory(dir)) {
+		*git_dir = xstrdup(dir);
+		*work_tree = NULL;
+	}
+	else if (file_exists(mkpath("%s/.git", dir))){
+		*work_tree = xstrdup(dir);
+		*git_dir = xstrdup(resolve_gitdir(mkpath("%s/.git", dir)));
+	}
+
+	if (!*git_dir)
+		die(_("'%s' does not appear to be a git repo."), dir);
+
+	set_git_dir(*git_dir);
+	git_config(get_remote_info, past_info);
+
+	if (!*work_tree) {
+		if (past_info->worktree) {
+			*work_tree = past_info->worktree;
+		}
+		else if (!past_info->bare) {
+			int containing_dir_success = 1;
+			char *filename = get_filename(*git_dir);
+			if (filename && !strcmp(filename, ".git")) {
+				const char *parent_dir = mkpath("%s/..",
+								*git_dir);
+				*work_tree = xstrdup(real_path(parent_dir));
+				if (access(*work_tree, W_OK) < 0)
+					containing_dir_success = 0;
+			}
+			else {
+				containing_dir_success = 0;
+			}
+			if (!containing_dir_success)
+				die(_("'%s' is configured for a work tree, "
+				      "but no candidate exists."), dir);
+		}
+	}
+	if (*work_tree)
+		set_git_work_tree(*work_tree);
+}
+
 int cmd_clone(int argc, const char **argv, const char *prefix)
 {
-	int is_bundle = 0, is_local;
+	int is_bundle = 0, is_local, argc_original, option_count;
 	struct stat buf;
-	const char *repo_name, *repo, *work_tree, *git_dir;
+	const char *repo_name, *repo, *work_tree, *git_dir = NULL;
+	const char *resource_path;
 	char *path, *dir;
 	int dest_exists;
 	const struct ref *refs, *remote_head;
@@ -838,13 +1175,23 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
 	const char *src_ref_prefix = "refs/heads/";
 	struct remote *remote;
 	int err = 0, complete_refs_before_fetch = 1;
-
 	struct refspec *refspec;
 	const char *fetch_pattern;
 
 	packet_trace_identity("clone");
+	argc_original = argc;
 	argc = parse_options(argc, argv, prefix, builtin_clone_options,
 			     builtin_clone_usage, 0);
+	option_count = argc_original - argc;
+
+	if (option_resume && option_count > 2) {
+		die(_("--resume is incompatible with all other options."));
+	}
+
+	if (option_resume && argc != 1) {
+		die(_("--resume must be specified with a single resumable "
+		      "directory."));
+	}
 
 	if (argc > 2)
 		usage_msg_opt(_("Too many arguments."),
@@ -872,105 +1219,140 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
 	if (!option_origin)
 		option_origin = "origin";
 
-	repo_name = argv[0];
+	if (option_resume) {
+		struct remote_config past_info = {0};
+		dir = xstrdup(real_path(argv[0]));
+		strip_trailing_slashes(dir);
+		if (!file_exists(dir))
+			die(_("directory '%s' does not exist."), dir);
+		get_existing_state(dir, &git_dir, &work_tree, &past_info);
+
+		if (!work_tree)
+			option_no_checkout = 1;
+		if (past_info.fetch_pattern)
+			fetch_pattern = past_info.fetch_pattern;
+		else {
+			struct strbuf fetch_temp = STRBUF_INIT;
+			strbuf_addstr(&branch_top, src_ref_prefix);
+			strbuf_addf(&fetch_temp, "+%s*:%s*", src_ref_prefix,
+					branch_top.buf);
+			fetch_pattern = strbuf_detach(&fetch_temp, NULL);
+		}
 
-	path = get_repo_path(repo_name, &is_bundle);
-	if (path)
-		repo = xstrdup(absolute_path(repo_name));
-	else if (!strchr(repo_name, ':'))
-		die(_("repository '%s' does not exist"), repo_name);
-	else
-		repo = repo_name;
+		option_origin = past_info.name;
+		option_mirror = past_info.mirror;
+		option_bare = past_info.bare;
+		refspec = parse_fetch_refspec(1, &fetch_pattern);
 
-	/* no need to be strict, transport_set_option() will validate it again */
-	if (option_depth && atoi(option_depth) < 1)
-		die(_("depth %s is not a positive number"), option_depth);
+		if (!(alt_res = get_last_alt_resource()))
+			die(_("--resume option used, but current "
+			      "directory is not resumable"));
 
-	if (argc == 2)
-		dir = xstrdup(argv[1]);
-	else
-		dir = guess_dir_name(repo_name, is_bundle, option_bare);
-	strip_trailing_slashes(dir);
+		junk_mode = JUNK_LEAVE_RESUMABLE;
+		atexit(remove_junk);
+		sigchain_push_common(remove_junk_on_signal);
+	}
+	else {
+		repo_name = argv[0];
+
+		path = get_repo_path(repo_name, &is_bundle);
+		if (path)
+			repo = xstrdup(absolute_path(repo_name));
+		else if (!strchr(repo_name, ':'))
+			die(_("repository '%s' does not exist"), repo_name);
+		else
+			repo = repo_name;
 
-	dest_exists = !stat(dir, &buf);
-	if (dest_exists && !is_empty_dir(dir))
-		die(_("destination path '%s' already exists and is not "
-			"an empty directory."), dir);
+		/* no need to be strict, transport_set_option() will validate it again */
+		if (option_depth && atoi(option_depth) < 1)
+			die(_("depth %s is not a positive number"), option_depth);
 
-	strbuf_addf(&reflog_msg, "clone: from %s", repo);
+		if (argc == 2)
+			dir = xstrdup(argv[1]);
+		else
+			dir = guess_dir_name(repo_name, is_bundle, option_bare);
+		strip_trailing_slashes(dir);
 
-	if (option_bare)
-		work_tree = NULL;
-	else {
-		work_tree = getenv("GIT_WORK_TREE");
-		if (work_tree && !stat(work_tree, &buf))
-			die(_("working tree '%s' already exists."), work_tree);
-	}
+		dest_exists = !stat(dir, &buf);
+		if (dest_exists && !is_empty_dir(dir))
+			die(_("destination path '%s' already exists and is not "
+				"an empty directory."), dir);
 
-	if (option_bare || work_tree)
-		git_dir = xstrdup(dir);
-	else {
-		work_tree = dir;
-		git_dir = mkpathdup("%s/.git", dir);
-	}
+		strbuf_addf(&reflog_msg, "clone: from %s", repo);
 
-	atexit(remove_junk);
-	sigchain_push_common(remove_junk_on_signal);
-
-	if (!option_bare) {
-		if (safe_create_leading_directories_const(work_tree) < 0)
-			die_errno(_("could not create leading directories of '%s'"),
-				  work_tree);
-		if (!dest_exists && mkdir(work_tree, 0777))
-			die_errno(_("could not create work tree dir '%s'"),
-				  work_tree);
-		junk_work_tree = work_tree;
-		set_git_work_tree(work_tree);
-	}
+		if (option_bare)
+			work_tree = NULL;
+		else {
+			work_tree = getenv("GIT_WORK_TREE");
+			if (work_tree && !stat(work_tree, &buf))
+				die(_("working tree '%s' already exists."), work_tree);
+		}
 
-	junk_git_dir = git_dir;
-	if (safe_create_leading_directories_const(git_dir) < 0)
-		die(_("could not create leading directories of '%s'"), git_dir);
+		if (option_bare || work_tree)
+			git_dir = xstrdup(dir);
+		else {
+			work_tree = dir;
+			git_dir = mkpathdup("%s/.git", dir);
+		}
 
-	set_git_dir_init(git_dir, real_git_dir, 0);
-	if (real_git_dir) {
-		git_dir = real_git_dir;
-		junk_git_dir = real_git_dir;
-	}
+		atexit(remove_junk);
+		sigchain_push_common(remove_junk_on_signal);
 
-	if (0 <= option_verbosity) {
-		if (option_bare)
-			fprintf(stderr, _("Cloning into bare repository '%s'...\n"), dir);
-		else
-			fprintf(stderr, _("Cloning into '%s'...\n"), dir);
-	}
-	init_db(option_template, INIT_DB_QUIET);
-	write_config(&option_config);
+		if (!option_bare) {
+			if (safe_create_leading_directories_const(work_tree) < 0)
+				die_errno(_("could not create leading directories of '%s'"),
+					  work_tree);
+			if (!dest_exists && mkdir(work_tree, 0777))
+				die_errno(_("could not create work tree dir '%s'"),
+					  work_tree);
+			junk_work_tree = work_tree;
+			set_git_work_tree(work_tree);
+		}
 
-	git_config(git_default_config, NULL);
+		junk_git_dir = git_dir;
+		if (safe_create_leading_directories_const(git_dir) < 0)
+			die(_("could not create leading directories of '%s'"), git_dir);
 
-	if (option_bare) {
-		if (option_mirror)
-			src_ref_prefix = "refs/";
-		strbuf_addstr(&branch_top, src_ref_prefix);
+		set_git_dir_init(git_dir, real_git_dir, 0);
+		if (real_git_dir) {
+			git_dir = real_git_dir;
+			junk_git_dir = real_git_dir;
+		}
 
-		git_config_set("core.bare", "true");
-	} else {
-		strbuf_addf(&branch_top, "refs/remotes/%s/", option_origin);
-	}
+		if (0 <= option_verbosity) {
+			if (option_bare)
+				fprintf(stderr, _("Cloning into bare repository '%s'...\n"), dir);
+			else
+				fprintf(stderr, _("Cloning into '%s'...\n"), dir);
+		}
+		init_db(option_template, INIT_DB_QUIET);
+		write_config(&option_config);
+
+		git_config(git_default_config, NULL);
 
-	strbuf_addf(&value, "+%s*:%s*", src_ref_prefix, branch_top.buf);
-	strbuf_addf(&key, "remote.%s.url", option_origin);
-	git_config_set(key.buf, repo);
-	strbuf_reset(&key);
+		if (option_bare) {
+			if (option_mirror)
+				src_ref_prefix = "refs/";
+			strbuf_addstr(&branch_top, src_ref_prefix);
 
-	if (option_reference.nr)
-		setup_reference();
+			git_config_set("core.bare", "true");
+		} else {
+			strbuf_addf(&branch_top, "refs/remotes/%s/", option_origin);
+		}
+
+		strbuf_addf(&value, "+%s*:%s*", src_ref_prefix, branch_top.buf);
+		strbuf_addf(&key, "remote.%s.url", option_origin);
+		git_config_set(key.buf, repo);
+		strbuf_reset(&key);
 
-	fetch_pattern = value.buf;
-	refspec = parse_fetch_refspec(1, &fetch_pattern);
+		if (option_reference.nr)
+			setup_reference();
 
-	strbuf_reset(&value);
+		fetch_pattern = value.buf;
+		refspec = parse_fetch_refspec(1, &fetch_pattern);
+
+		strbuf_reset(&value);
+	}
 
 	remote = remote_get(option_origin);
 	transport = transport_get(remote, remote->url[0]);
@@ -1003,6 +1385,10 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
 	if (option_single_branch)
 		transport_set_option(transport, TRANS_OPT_FOLLOWTAGS, "1");
 
+	if (option_prime_clone)
+		transport_set_option(transport, TRANS_OPT_PRIMECLONE,
+				     option_prime_clone);
+
 	if (option_upload_pack)
 		transport_set_option(transport, TRANS_OPT_UPLOADPACK,
 				     option_upload_pack);
@@ -1010,6 +1396,8 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
 	if (transport->smart_options && !option_depth)
 		transport->smart_options->check_self_contained_and_connected = 1;
 
+	if (!is_local && option_reference.nr == 0 && !alt_res)
+		alt_res = transport_prime_clone(transport);
 	refs = transport_get_remote_refs(transport);
 
 	if (refs) {
@@ -1064,8 +1452,24 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
 					      "refs/heads/master");
 	}
 
-	write_refspec_config(src_ref_prefix, our_head_points_at,
-			remote_head_points_at, &branch_top);
+	if (!option_resume) {
+		write_refspec_config(src_ref_prefix, our_head_points_at,
+				     remote_head_points_at, &branch_top);
+	}
+
+	if (alt_res) {
+		junk_mode = JUNK_LEAVE_RESUMABLE;
+		resource_path = fetch_alt_resource(transport, git_dir);
+		if (!resource_path || use_alt_resource(resource_path) < 0) {
+			if (option_resume)
+				die(_("resumable resource is no longer "
+				      "available or usable"));
+			junk_mode = JUNK_LEAVE_NONE;
+			clean_alt_resource(resource_path, 0);
+			resource_path = NULL;
+			alt_res = NULL;
+		}
+	}
 
 	if (is_local)
 		clone_local(path, git_dir);
@@ -1085,9 +1489,17 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
 		dissociate_from_references();
 	}
 
+	if (resource_path) {
+		clean_alt_resource(resource_path, 1);
+	}
+
 	junk_mode = JUNK_LEAVE_REPO;
 	err = checkout();
 
+	if (file_exists(git_path_resumable())) {
+		unlink(git_path_resumable());
+	}
+
 	strbuf_release(&reflog_msg);
 	strbuf_release(&branch_top);
 	strbuf_release(&key);
diff --git a/t/t9904-git-prime-clone.sh b/t/t9904-git-prime-clone.sh
new file mode 100755
index 0000000..257cea9
--- /dev/null
+++ b/t/t9904-git-prime-clone.sh
@@ -0,0 +1,181 @@
+#!/bin/sh
+
+test_description='tests for git prime-clone'
+. ./test-lib.sh
+
+ROOT_PATH="$PWD"
+. "$TEST_DIRECTORY"/lib-httpd.sh
+start_httpd
+
+test_expect_success 'resume fails for no parameters' '
+	test_must_fail git clone --resume
+'
+
+test_expect_success 'resume fails with other options' '
+	test_must_fail git clone --resume --bare
+'
+
+test_expect_success 'resume fails for excess parameters' '
+	test_must_fail git clone --resume a b
+'
+
+test_expect_success 'resume fails for nonexistent directory' '
+	test_must_fail git clone --resume nonexistent
+'
+
+test_expect_success 'setup repo and httpd' '
+	mkdir server &&
+	cd server &&
+	git init &&
+	echo "content\\n" >example.c &&
+	git add example.c &&
+	git commit -m "I am a packed commit" &&
+	git repack . &&
+	git config --local http.primeclone true &&
+	git config --local primeclone.url \
+		$HTTPD_URL/server/.git/objects/pack/$(find .git/objects/pack/*.pack -printf "%f") &&
+	git config --local primeclone.filetype pack &&
+	echo "content\\n" >example2.c &&
+	echo "new content\\n" >example.c &&
+	git add example.c example2.c &&
+	git commit -m "I am an unpacked commit" &&
+	cd - &&
+	mv server "$HTTPD_DOCUMENT_ROOT_PATH"
+'
+
+test_expect_success 'prime-clone works http' '
+	git clone $HTTPD_URL/smart/server client &&
+	rm -r client
+'
+
+test_expect_success 'prime-clone falls back not permitted' '
+	cd "$HTTPD_DOCUMENT_ROOT_PATH/server" &&
+	git config --local http.primeclone false &&
+	cd - &&
+	git clone $HTTPD_URL/smart/server client &&
+	rm -r client &&
+	cd "$HTTPD_DOCUMENT_ROOT_PATH/server" &&
+	git config --local http.primeclone true &&
+	cd -
+'
+
+test_expect_success 'prime-clone falls back not enabled' '
+	cd "$HTTPD_DOCUMENT_ROOT_PATH/server" &&
+	git config --local primeclone.enabled 0 &&
+	cd - &&
+	git clone $HTTPD_URL/smart/server client &&
+	rm -r client &&
+	cd "$HTTPD_DOCUMENT_ROOT_PATH/server" &&
+	git config --local --unset-all primeclone.enabled &&
+	cd -
+'
+
+test_expect_success 'prime-clone falls back incorrect config' '
+	cd "$HTTPD_DOCUMENT_ROOT_PATH/server" &&
+	git config --local --unset-all primeclone.filetype &&
+	cd - &&
+	git clone $HTTPD_URL/smart/server client &&
+	rm -r client &&
+	cd "$HTTPD_DOCUMENT_ROOT_PATH/server" &&
+	git config --local primeclone.filetype pack &&
+	cd -
+'
+
+test_expect_success 'clone resume fails in complete/unmarked directory' '
+	git clone $HTTPD_URL/smart/server client &&
+	test_must_fail git clone --resume client &&
+	rm -r client
+'
+
+test_expect_success 'clone resume works with marked repo (work dir, normal)' '
+	git clone $HTTPD_URL/smart/server client &&
+	cd client &&
+	rm .git/objects/pack/*.idx &&
+	echo -n "$HTTPD_URL/server/" > .git/RESUMABLE &&
+	find .git/objects/pack/*.pack >> .git/RESUMABLE &&
+	echo "pack" >> .git/RESUMABLE &&
+	mv $(find .git/objects/pack/*.pack) $(find .git/objects/pack/*.pack).tmp &&
+	sed -i "2,$ d" $(find .git/objects/pack/*.pack.tmp) &&
+	rm * &&
+	git clone --resume . &&
+	cd - &&
+	rm -r client
+'
+
+test_expect_success 'clone resume works with marked repo (git dir, normal)' '
+	git clone $HTTPD_URL/smart/server client &&
+	cd client &&
+	rm .git/objects/pack/*.idx &&
+	echo -n "$HTTPD_URL/server/" > .git/RESUMABLE &&
+	find .git/objects/pack/*.pack >> .git/RESUMABLE &&
+	echo "pack" >> .git/RESUMABLE &&
+	mv $(find .git/objects/pack/*.pack) $(find .git/objects/pack/*.pack).tmp &&
+	sed -i "2,$ d" $(find .git/objects/pack/*.pack.tmp) &&
+	rm * &&
+	git clone --resume .git &&
+	cd - &&
+	rm -r client
+'
+
+test_expect_success 'clone resume works inside marked repo (git dir, normal)' '
+	git clone $HTTPD_URL/smart/server client &&
+	cd client &&
+	rm .git/objects/pack/*.idx &&
+	echo -n "$HTTPD_URL/server/" > .git/RESUMABLE &&
+	find .git/objects/pack/*.pack >> .git/RESUMABLE &&
+	echo "pack" >> .git/RESUMABLE &&
+	mv $(find .git/objects/pack/*.pack) $(find .git/objects/pack/*.pack).tmp &&
+	sed -i "2,$ d" $(find .git/objects/pack/*.pack.tmp) &&
+	rm * &&
+	cd .git &&
+	git clone --resume . &&
+	cd ../.. &&
+	rm -r client
+'
+
+test_expect_success 'clone resume works with marked repo (work dir, split)' '
+	git clone --separate-git-dir=separate_dir.git \
+		$HTTPD_URL/smart/server client &&
+	cd separate_dir.git &&
+	rm objects/pack/*.idx &&
+	echo -n "$HTTPD_URL/server/" > RESUMABLE &&
+	echo ".git/$(find objects/pack/*.pack)" >> RESUMABLE &&
+	echo "pack" >> RESUMABLE &&
+	mv $(find objects/pack/*.pack) $(find objects/pack/*.pack).tmp &&
+	sed -i "2,$ d" $(find objects/pack/*.pack.tmp) &&
+	cd ../client &&
+	rm * &&
+	cd .. &&
+	git clone --resume client &&
+	rm -r client separate_dir.git
+'
+
+test_expect_success 'clone resume works with marked repo (git dir, split)' '
+	git clone --separate-git-dir=separate_dir.git \
+		$HTTPD_URL/smart/server client &&
+	cd separate_dir.git &&
+	rm objects/pack/*.idx &&
+	echo -n "$HTTPD_URL/server/" > RESUMABLE &&
+	echo ".git/$(find objects/pack/*.pack)" >> RESUMABLE &&
+	echo "pack" >> RESUMABLE &&
+	mv $(find objects/pack/*.pack) $(find objects/pack/*.pack).tmp &&
+	sed -i "2,$ d" $(find objects/pack/*.pack.tmp) &&
+	cd ../client &&
+	rm * &&
+	cd .. &&
+	git clone --resume separate_dir.git &&
+	rm -r client separate_dir.git
+'
+
+test_expect_success 'prime-clone falls back unusable file' '
+	cd "$HTTPD_DOCUMENT_ROOT_PATH/server" &&
+	git config --local primeclone.url $HTTPD_URL/server/.git/HEAD &&
+	cd - &&
+	git clone $HTTPD_URL/smart/server client &&
+	rm -r client &&
+	cd "$HTTPD_DOCUMENT_ROOT_PATH/server" &&
+	cd -
+'
+
+stop_httpd
+test_done
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* Re: [PATCH 00/11] Resumable clone
  2016-09-16  0:12 [PATCH 00/11] Resumable clone Kevin Wern
                   ` (10 preceding siblings ...)
  2016-09-16  0:12 ` [PATCH 11/11] Resumable clone: implement primer logic in git-clone Kevin Wern
@ 2016-09-16 20:47 ` Junio C Hamano
  2016-09-27 21:51 ` Eric Wong
  12 siblings, 0 replies; 39+ messages in thread
From: Junio C Hamano @ 2016-09-16 20:47 UTC (permalink / raw)
  To: Kevin Wern; +Cc: git

Kevin Wern <kevin.m.wern@gmail.com> writes:

> It's been a while (sent a very short patch in May), but I've
> still been working on the resumable clone feature and checking up on
> the mailing list for any updates. After submitting the prime-clone
> service alone, I figured implementing the whole thing would be the best
> way to understand the full scope of the problem (this is my first real
> contribution here, and learning while working on such an involved
> feature has not been easy). 

It may not have been easy but I hope it has been a fun journey for
you ;-)

> On the client side, the transport_prime_clone and
> transport_download_primer APIs are built to be more robust (i.e. read
> messages without dying due to protocol errors), so that git clone can
> always try them without being dependent on the capability output of
> git-upload-pack. transport_download_primer is dependent on the success
> of transport_prime_clone, but transport_prime_clone is always run on an
> initial clone. Part of achieving this robustness involves adding
> *_gentle functions to pkt_line, so that prime_clone can fail silently
> without dying.

OK.

> Right now, a manually resumable directory is left behind only if the
> *client* is interrupted while a new junk mode, JUNK_LEAVE_RESUMABLE,
> is set (right before the download). For an initial clone, if the
> connection fails after automatic resuming, the client erases the
> partial resources and falls through to a normal clone. However, once a
> resumable directory is left behind by the program, it is NEVER
> deleted/abandoned after it is continued with --resume.

Sounds like you made a sensible design decision here.

> 	- When running with ssh and a password, the credentials are
> 	  prompted for twice. I don't know if there is a way to
> 	  preserve credentials between executions. I couldn't find any
> 	  examples in git's source.

We leave credentail reuse to keyring services like ssh-agent.


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 01/11] Resumable clone: create service git-prime-clone
  2016-09-16  0:12 ` [PATCH 01/11] Resumable clone: create service git-prime-clone Kevin Wern
@ 2016-09-16 20:53   ` Junio C Hamano
  2016-09-28  4:40     ` Kevin Wern
  0 siblings, 1 reply; 39+ messages in thread
From: Junio C Hamano @ 2016-09-16 20:53 UTC (permalink / raw)
  To: Kevin Wern; +Cc: git

Kevin Wern <kevin.m.wern@gmail.com> writes:

> Create git-prime-clone, a program to be executed on the server that
> returns the location and type of static resource to download before
> performing the rest of a clone.
>
> Additionally, as this executable's location will be configurable (see:
> upload-pack and receive-pack), add the program to
> BINDIR_PROGRAMS_NEED_X, in addition to the usual builtin places. Add
> git-prime-clone executable to gitignore, as well
>
> Signed-off-by: Kevin Wern <kevin.m.wern@gmail.com>
> ---

I wonder if we even need a separate service like this.

Wouldn't a new protocol capability that is advertised from
upload-pack sufficient to tell the "git clone" that it can
and should consider priming from this static resource?

> +static void prime_clone(void)
> +{
> +	if (!enabled) {
> +		fprintf(stderr, _("prime-clone not enabled\n"));
> +	}
> +	else if (url && filetype){
> +		packet_write(1, "%s %s\n", filetype, url);
> +	}
> +	else if (url || filetype) {
> +		if (filetype)
> +			fprintf(stderr, _("prime-clone not properly "
> +					  "configured: missing url\n"));
> +		else if (url)
> +			fprintf(stderr, _("prime-clone not properly "
> +					  "configured: missing filetype\n"));
> +	}
> +	packet_flush(1);
> +}

Two minor comments:

 - For whom are you going to localize these strings?  This program
   is running on the server side and we do not know the locale
   preferred by the end-user who is sitting on the other end of the
   connection, no?

 - Turn "}\n\s+else " into "} else ", please.


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 03/11] pkt-line: create gentle packet_read_line functions
  2016-09-16  0:12 ` [PATCH 03/11] pkt-line: create gentle packet_read_line functions Kevin Wern
@ 2016-09-16 22:17   ` Junio C Hamano
  2016-09-28  4:42     ` Kevin Wern
  0 siblings, 1 reply; 39+ messages in thread
From: Junio C Hamano @ 2016-09-16 22:17 UTC (permalink / raw)
  To: Kevin Wern; +Cc: git

Kevin Wern <kevin.m.wern@gmail.com> writes:

>  	/* And complain if we didn't get enough bytes to satisfy the read. */
>  	if (ret < size) {
> -		if (options & PACKET_READ_GENTLE_ON_EOF)
> +		if (options & (PACKET_READ_GENTLE_ON_EOF | PACKET_READ_GENTLE_ALL))
>  			return -1;

The name _ALL suggested to me that there may be multiple "under this
condition, be gentle", "under that condition, be gentle", and _ALL
is used as a catch-all "under any condition, be gentle".  If you
defined _ALL symbol to have all GENTLE bits on, this line could have
become

	if (options & PACKET_READ_GENTLE_ALL)

> @@ -205,15 +209,23 @@ int packet_read(int fd, char **src_buf, size_t *src_len,
>  	if (ret < 0)
>  		return ret;
>  	len = packet_length(linelen);
> -	if (len < 0)
> +	if (len < 0) {
> +		if (options & PACKET_READ_GENTLE_ALL)
> +			return -1;

On the other hand, however, you do want to die here when only
GENTLE_ON_EOF is set.

Taking the above two observations together, I'd have to say that
_ALL is probably a misnomer.  I agree with a need for a flag with
the behaviour you defined in this patch, though.

>  		die("protocol error: bad line length character: %.4s", linelen);

>  static char *packet_read_line_generic(int fd,
>  				      char **src, size_t *src_len,
> -				      int *dst_len)
> +				      int *dst_len, int flags)

The original one is called options, not flags, and it would be
easier to follow if it is consistently called options, instead of
requiring the reader to keep track of "ah, it is called flags here
but the callee renames it to options".

> +/*
> + * Same as packet_read_line, but does not die on any errors (uses
> + * PACKET_READ_GENTLE_ALL).
> + */
> +char *packet_read_line_gentle(int fd, int *len_p);
> +
> +/*
> + * Same as packet_read_line_buf, but does not die on any errors (uses
> + * PACKET_READ_GENTLE_ALL).
> + */
> +char *packet_read_line_buf_gentle(char **src_buf, size_t *src_len, int *size);

I think most if not all "do the same thing as do_something() but
report errors instead of dying" variant of functions are named
do_something_gently(), not do_something_gentle().


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 07/11] Resumable clone: add resumable download to http/curl
  2016-09-16  0:12 ` [PATCH 07/11] Resumable clone: add resumable download to http/curl Kevin Wern
@ 2016-09-16 22:45   ` Junio C Hamano
  2016-09-28  6:41     ` Kevin Wern
  0 siblings, 1 reply; 39+ messages in thread
From: Junio C Hamano @ 2016-09-16 22:45 UTC (permalink / raw)
  To: Kevin Wern; +Cc: git

Kevin Wern <kevin.m.wern@gmail.com> writes:

> +int http_download_primer(const char *url, const char *out_file)
> +{
> +	int ret = 0, try_count = HTTP_TRY_COUNT;
> +	struct http_get_options options = {0};
> +	options.progress = 1;
> +
> +	if (file_exists(out_file)) {
> +		fprintf(stderr,
> +			"File already downloaded: '%s', skipping...\n",
> +			out_file);
> +		return ret;
> +	}
> +
> +	do {
> +		if (try_count != HTTP_TRY_COUNT) {
> +			fprintf(stderr, "Connection interrupted for some "
> +				"reason, retrying (%d attempts left)\n",
> +				try_count);
> +			struct timeval time = {10, 0}; // 1s

We do not use // comment.

> +			select(0, NULL, NULL, NULL, &time);
> +		}
> +		ret = http_get_file(url, out_file, &options);

I didn't realize that http_get_file() -> http_request() codepath,
when it is the output file, already can do the "ftell and request
the reminder".  Very nice.

> @@ -1136,7 +1138,10 @@ static int handle_curl_result(struct slot_results *results)
>  				curl_easy_strerror(results->curl_result),
>  				sizeof(curl_errorstr));
>  #endif
> -		return HTTP_ERROR;
> +		if (results->http_code >= 400)
> +			return HTTP_ERROR;
> +		else
> +			return HTTP_ERROR_RESUMABLE;
>  	}
>  }

Hmm, is "anything below 400" a good definition of resumable errors?

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 10/11] run command: add RUN_COMMAND_NO_STDOUT
  2016-09-16  0:12 ` [PATCH 10/11] run command: add RUN_COMMAND_NO_STDOUT Kevin Wern
@ 2016-09-16 23:07   ` Junio C Hamano
  2016-09-18 19:22     ` Johannes Schindelin
  2016-09-28  4:46     ` Kevin Wern
  0 siblings, 2 replies; 39+ messages in thread
From: Junio C Hamano @ 2016-09-16 23:07 UTC (permalink / raw)
  To: Kevin Wern; +Cc: git

Kevin Wern <kevin.m.wern@gmail.com> writes:

> Add option RUN_COMMAND_NO_STDOUT, which sets no_stdout on a child
> process.
>
> This will be used by git clone when calling index-pack on a downloaded
> packfile.

If it is just one caller, would't it make more sense for that caller
set no_stdout explicitly itself?

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 11/11] Resumable clone: implement primer logic in git-clone
  2016-09-16  0:12 ` [PATCH 11/11] Resumable clone: implement primer logic in git-clone Kevin Wern
@ 2016-09-16 23:32   ` Junio C Hamano
  2016-09-28  5:49     ` Kevin Wern
  2016-09-19 14:04   ` Duy Nguyen
  1 sibling, 1 reply; 39+ messages in thread
From: Junio C Hamano @ 2016-09-16 23:32 UTC (permalink / raw)
  To: Kevin Wern; +Cc: git

Kevin Wern <kevin.m.wern@gmail.com> writes:

> Use transport_download_primer and transport_prime_clone in git clone.
> This only supports a fully connected packfile.
>
> transport_prime_clone and transport_download_primer are executed
> completely independent of transport_(get|fetch)_remote_refs, et al.
> transport_download_primer is executed based on the existence of an
> alt_resource. The idea is that the "prime clone" execution should be
> able to attempt retrieving an alternate resource without dying, as
> opposed to depending on the result of upload pack's "capabilities" to
> indicate whether or not the client can attempt it.
>
> If a resumable resource is available, execute a codepath with the
> following modular components:
> - downloading resource to a specific directory
> - using the resource (for pack, indexing and generating the bundle
>   file)
> - cleaning up the resource (if the download or use fails)
> - cleaning up the resource (if the download or use succeeds)
>
> If resume is interrupted on the client side, the alternate resource
> info is written to the RESUMABLE file in the git directory.
>
> On resume, the required info is extracted by parsing the created
> config file, and that info is used to determine the work and git
> directories. If these cannot be determined, the program exits.
> The writing of the refspec and determination of the initial git
> directories are skipped, along with transport_prime_clone.
>
> The main purpose of this series of patches is to flesh out a codepath
> for automatic resuming, manual resuming, and leaving a resumable
> directory on exit--the logic for when to do these still needs more
> work.
>
> Signed-off-by: Kevin Wern <kevin.m.wern@gmail.com>
> ---
>  Documentation/git-clone.txt |  16 ++
>  builtin/clone.c             | 590 +++++++++++++++++++++++++++++++++++++-------
>  t/t9904-git-prime-clone.sh  | 181 ++++++++++++++
>  3 files changed, 698 insertions(+), 89 deletions(-)
>  create mode 100755 t/t9904-git-prime-clone.sh
>
> diff --git a/Documentation/git-clone.txt b/Documentation/git-clone.txt
> index b7c467a..5934bb6 100644
> --- a/Documentation/git-clone.txt
> +++ b/Documentation/git-clone.txt
> @@ -16,6 +16,7 @@ SYNOPSIS
>  	  [--depth <depth>] [--[no-]single-branch]
>  	  [--recursive | --recurse-submodules] [--] <repository>
>  	  [<directory>]
> +'git clone --resume <resumable_dir>'
>  
>  DESCRIPTION
>  -----------
> @@ -172,6 +173,12 @@ objects from the source repository into a pack in the cloned repository.
>  	via ssh, this specifies a non-default path for the command
>  	run on the other end.
>  
> +--prime-clone <prime-clone>::
> +-p <prime-clone>::

Not many other options have single letter shorthand.  Is it expected
that it is worth to let this option squat on a short-and-sweet "-p",
perhaps because it is so frequently used?

> +--resume::
> +	Resume a partially cloned repo in a "resumable" state. This
> +	can only be specified with a single local directory (<resumable
> +	dir>). This is incompatible with all other options.
> +
> +<resumable_dir>::
> +	The directory of the partial clone. This could be either the
> +	work directory or the git directory.

I think these should be described this way:

    --resume <resumable_dir>::
            description if what resume option does and how resumable_dir
            is used by the option.

in a single bullet point.

> diff --git a/builtin/clone.c b/builtin/clone.c
> index 9ac6c01..d9a13dc 100644
> --- a/builtin/clone.c
> +++ b/builtin/clone.c
> @@ -8,7 +8,9 @@
>   * Clone a repository into a different directory that does not yet exist.
>   */
>  
> +#include "cache.h"
>  #include "builtin.h"

I do not think you need to include cache.h if you are including
builtin.h; Documentation/CodingGuidelines says:

 - The first #include in C files, except in platform specific compat/
   implementations, must be either "git-compat-util.h", "cache.h" or
   "builtin.h".  You do not have to include more than one of these.

> @@ -40,17 +42,20 @@ static const char * const builtin_clone_usage[] = {
>  
>  static int option_no_checkout, option_bare, option_mirror, option_single_branch = -1;
>  static int option_local = -1, option_no_hardlinks, option_shared, option_recursive;
> +static int option_resume;
>  static char *option_template, *option_depth;
> -static char *option_origin = NULL;
> +static const char *option_origin = NULL;

Is this change related to anything you are doing here?

If you are fixing things while at it, please don't ;-) If you really
want to, please also remove " = NULL", from this line and also from
the next line.  Also do not add " = NULL" at the end of alt_res.

>  static char *option_branch = NULL;
>  ...
> +static const struct alt_resource *alt_res = NULL;

> +static char *get_filename(const char *dir)
> +{
> +	char *dir_copy = xstrdup(dir);
> +	strip_trailing_slashes(dir_copy);
> +	char *filename, *final = NULL;
> +
> +	filename = find_last_dir_sep(dir);
> +
> +	if (filename && *(++filename))
> +		final = xstrdup(filename);
> +
> +	free(dir_copy);
> +	return final;
> +}

Hmph, don't we have our own basename(3) lookalike that knows about
dir-sep already?

> @@ -451,6 +475,7 @@ static const char *junk_work_tree;
>  static const char *junk_git_dir;
>  static enum {
>  	JUNK_LEAVE_NONE,
> +	JUNK_LEAVE_RESUMABLE,
>  	JUNK_LEAVE_REPO,
>  	JUNK_LEAVE_ALL
>  } junk_mode = JUNK_LEAVE_NONE;
> @@ -460,6 +485,29 @@ N_("Clone succeeded, but checkout failed.\n"
>     "You can inspect what was checked out with 'git status'\n"
>     "and retry the checkout with 'git checkout -f HEAD'\n");
>  
> +static const char junk_leave_resumable_msg[] =
> +N_("Clone interrupted while copying resumable resource.\n"
> +   "Try using 'git clone --resume <new_directory>',\n"
> +   "where <new_directory> is either the new working \n"
> +   "directory or git directory.\n\n"
> +   "If this does not succeed, it could be because the\n"
> +   "resource has been moved, corrupted, or changed.\n"
> +   "If this is the case, you should remove <new_directory>\n"
> +   "and run the original command.\n");
> +
> +static void write_resumable_resource()
> +{
> +	const char *filename = git_path_resumable();
> +	struct strbuf content = STRBUF_INIT;
> +	strbuf_addf(&content, "%s\n%s\n", alt_res->url, alt_res->filetype);
> +	int fd = open(filename, O_WRONLY | O_CREAT, 0666);
> +	if (fd < 0)
> +		die_errno(_("Could not open '%s' for writing"), filename);
> +	if (write_in_full(fd, content.buf, content.len) != content.len)
> +		die_errno(_("Could not write to '%s'"), filename);
> +	close(fd);
> +}
>
>  static void remove_junk(void)
>  {
>  	struct strbuf sb = STRBUF_INIT;
> @@ -467,7 +515,11 @@ static void remove_junk(void)
>  	switch (junk_mode) {
>  	case JUNK_LEAVE_REPO:
>  		warning("%s", _(junk_leave_repo_msg));
> -		/* fall-through */
> +		return;
> +	case JUNK_LEAVE_RESUMABLE:
> +		write_resumable_resource();
> +		warning("%s", _(junk_leave_resumable_msg));
> +		return;

Nice.

> @@ -562,7 +614,7 @@ static void write_remote_refs(const struct ref *local_refs)
>  		die("%s", err.buf);
>  
>  	for (r = local_refs; r; r = r->next) {
> -		if (!r->peer_ref)
> +		if (!r->peer_ref || ref_exists(r->peer_ref->name))
>  			continue;
>  		if (ref_transaction_create(t, r->peer_ref->name, r->old_oid.hash,
>  					   0, NULL, &err))

What is this change about?


> @@ -820,11 +872,296 @@ static void dissociate_from_references(void)
>  	free(alternates);
>  }
>  
> +static int do_index_pack(const char *in_pack_file, const char *out_idx_file)
> +{
> +	const char *argv[] = { "index-pack", "--clone-bundle", "-v",
> +			       "--check-self-contained-and-connected", "-o",
> +			       out_idx_file, in_pack_file, NULL };
> +	return run_command_v_opt(argv, RUN_GIT_CMD|RUN_COMMAND_NO_STDOUT);
> +}

That looks vaguely familiar ;-)

> +static const char *setup_and_index_pack(const char *filename)
> +{
> +	const char *primer_idx_path = NULL, *primer_bndl_path = NULL;
> +	primer_idx_path = replace_extension(filename, ".pack", ".idx");
> +	primer_bndl_path = replace_extension(filename, ".pack", ".bndl");
> +
> +	if (!(primer_idx_path && primer_bndl_path)) {
> +		warning("invalid pack filename '%s', falling back to full "
> +			"clone", filename);
> +		return NULL;
> +	}
> +
> +	if (!file_exists(primer_bndl_path)) {
> +		if (do_index_pack(filename, primer_idx_path)) {
> +			warning("could not index primer pack, falling back to "
> +				"full clone");
> +			return NULL;
> +		}
> +	}

Can it be another (undetected) failure mode that .bndl somehow
already existed, but not .idx, leaving the resulting object store in
an incosistent state?  Can do_index_pack() fail and leave .bndl
behind to get you into such a state?

> +static int write_bundle_refs(const char *bundle_filename)
> +{
> +	struct ref_transaction *t;
> +	struct bundle_header history_tips;
> +	const char *temp_ref_base = "resume";
> +	struct strbuf err = STRBUF_INIT;
> +	int i;
> +
> +	init_bundle_header(&history_tips, bundle_filename);
> +	read_bundle_header(&history_tips);
> +
> +	t = ref_transaction_begin(&err);
> +	for (i = 0; i < history_tips.references.nr; i++) {
> +		struct strbuf ref_name = STRBUF_INIT;
> +		strbuf_addf(&ref_name, "refs/temp/%s/%s/temp-%s",
> +			    option_origin, temp_ref_base,
> +			    sha1_to_hex(history_tips.references.list[i].sha1));

Can we do this without polluting refs/temp/ namespace?

I am imagining that you are first fetching the .pack file from
sideways when primer service is available, running index-pack on it
to produce the bundle, and the step after that is to run "git fetch"
against the original remote to fill the gap between the bit-stale
history you got in the bundle and the reality that has progressed
since the primer pack was made, and you need a way to tell to the
other end that you already have the history leading to these refs
when you run "git fetch".  I think a bit better way to do so is to
send these has ".have" while you run the "fetch".

Wouldn't it do if you add the "--advertise-bundle-tips=<bndl>"
option to "git fetch", move the code to read the bundle header to
it, and point the bundle's filename with the option when you spawn
"git fetch"?

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 10/11] run command: add RUN_COMMAND_NO_STDOUT
  2016-09-16 23:07   ` Junio C Hamano
@ 2016-09-18 19:22     ` Johannes Schindelin
  2016-09-28  4:46     ` Kevin Wern
  1 sibling, 0 replies; 39+ messages in thread
From: Johannes Schindelin @ 2016-09-18 19:22 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Kevin Wern, git

Hi,

On Fri, 16 Sep 2016, Junio C Hamano wrote:

> Kevin Wern <kevin.m.wern@gmail.com> writes:
> 
> > Add option RUN_COMMAND_NO_STDOUT, which sets no_stdout on a child
> > process.
> >
> > This will be used by git clone when calling index-pack on a downloaded
> > packfile.
> 
> If it is just one caller, would't it make more sense for that caller
> set no_stdout explicitly itself?

Taking a step back, maybe it is not such a good idea to swallow the output
in all cases, including the error cases?

Maybe the best course of action is to hide stdout/stderr by default but
show it in case of a non-zero exit code, i.e. using
https://public-inbox.org/git/6383b7afcdeb6c999862aa32ba437997f2dd3d4e.1472633606.git.johannes.schindelin@gmx.de/ ?

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 02/11] Resumable clone: add prime-clone endpoints
  2016-09-16  0:12 ` [PATCH 02/11] Resumable clone: add prime-clone endpoints Kevin Wern
@ 2016-09-19 13:15   ` Duy Nguyen
  2016-09-28  4:43     ` Kevin Wern
  0 siblings, 1 reply; 39+ messages in thread
From: Duy Nguyen @ 2016-09-19 13:15 UTC (permalink / raw)
  To: Kevin Wern; +Cc: Git Mailing List

On Fri, Sep 16, 2016 at 7:12 AM, Kevin Wern <kevin.m.wern@gmail.com> wrote:
>  static struct daemon_service daemon_service[] = {
>         { "upload-archive", "uploadarch", upload_archive, 0, 1 },
>         { "upload-pack", "uploadpack", upload_pack, 1, 1 },
>         { "receive-pack", "receivepack", receive_pack, 0, 1 },
> +       { "prime-clone", "primeclone", prime_clone, 0, 1 },
>  };

I guess this is why you chose to implement a new command in 01/11,
simpler to be called from http-backend?

> +               // prime-clone does not need --stateless-rpc and
> +               // --advertise-refs options. Maybe it will in the future, but
> +               // until then it seems best to do this instead of adding
> +               // "dummy" options.

Stick to /* .. */

> +               if (strcmp(svc->name, "prime-clone") != 0) {
> +                       argv_array_pushl(&argv, "--stateless-rpc",
> +                                        "--advertise-refs", NULL);
> +               }

We also have an exception for select_getanyfile() below. I think it's
time we add a function callback in struct rpc_service to run each
service the way they want. Then prime-clone won't need an exception
(neither does select_anyfile, mostly)
-- 
Duy

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 09/11] path: add resumable marker
  2016-09-16  0:12 ` [PATCH 09/11] path: add resumable marker Kevin Wern
@ 2016-09-19 13:24   ` Duy Nguyen
  0 siblings, 0 replies; 39+ messages in thread
From: Duy Nguyen @ 2016-09-19 13:24 UTC (permalink / raw)
  To: Kevin Wern; +Cc: Git Mailing List

On Fri, Sep 16, 2016 at 7:12 AM, Kevin Wern <kevin.m.wern@gmail.com> wrote:
> Create function to get gitdir file RESUMABLE.

A very good opportunity to explain what this file is or will be (and
whether its content matters or just its existence). Either as commit
message, or even better in Documentation/gitrepository-layout.txt.
-- 
Duy

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 04/11] Resumable clone: add prime-clone to remote-curl
  2016-09-16  0:12 ` [PATCH 04/11] Resumable clone: add prime-clone to remote-curl Kevin Wern
@ 2016-09-19 13:52   ` Duy Nguyen
  2016-09-28  6:45     ` Kevin Wern
  0 siblings, 1 reply; 39+ messages in thread
From: Duy Nguyen @ 2016-09-19 13:52 UTC (permalink / raw)
  To: Kevin Wern; +Cc: Git Mailing List

On Fri, Sep 16, 2016 at 7:12 AM, Kevin Wern <kevin.m.wern@gmail.com> wrote:
> +static void prime_clone(void)
> +{
> +       char *result, *result_full, *line;
> +       size_t result_len;
> +       int err = 0, one_successful = 0;
> +
> +       if (request_service("git-prime-clone", &result_full, &result,
> +                       &result_len, HTTP_ERROR_GENTLE)) {
> +               while (line = packet_read_line_buf_gentle(&result, &result_len,
> +                                                         NULL)) {
> +                       char *space = strchr(line ,' ');
> +
> +                       // We will eventually support multiple resources, so
> +                       // always parse the whole message
> +                       if (err)
> +                               continue;
> +                       if (!space || strchr(space + 1, ' ')) {
> +                               if (options.verbosity > 1)
> +                                       fprintf(stderr, "prime clone "
> +                                               "protocol error: got '%s'\n",
> +                                               line);
> +                               printf("error\n");
> +                               err = 1;
> +                               continue;
> +                       }
> +
> +                       one_successful = 1;
> +                       printf("%s\n", line);

A brief overview for this service in
Documentation/technical/http-protocol.txt (and maybe
Documentation/gitremote-helpers.txt as well) would be great help. It's
a bit hard to follow because at this point I don't know anything about
the server side (and on top of that I was confused between http
send/receive vs transport send/receive, but this is my fault).

> +               }
> +               if (!one_successful && options.verbosity > 1)
> +                       fprintf(stderr, "did not get required components for "
> +                               "alternate resource\n");
> +       }
> +
> +       printf("\n");
> +       fflush(stdout);
> +       free(result_full);
> +}
-- 
Duy

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 11/11] Resumable clone: implement primer logic in git-clone
  2016-09-16  0:12 ` [PATCH 11/11] Resumable clone: implement primer logic in git-clone Kevin Wern
  2016-09-16 23:32   ` Junio C Hamano
@ 2016-09-19 14:04   ` Duy Nguyen
  2016-09-19 17:16     ` Junio C Hamano
  2016-09-28  4:44     ` Kevin Wern
  1 sibling, 2 replies; 39+ messages in thread
From: Duy Nguyen @ 2016-09-19 14:04 UTC (permalink / raw)
  To: Kevin Wern; +Cc: Git Mailing List

On Fri, Sep 16, 2016 at 7:12 AM, Kevin Wern <kevin.m.wern@gmail.com> wrote:
>  builtin/clone.c             | 590 +++++++++++++++++++++++++++++++++++++-------

Argh.. this is too big for my brain at this hour. It might be easier
to follow if you separate out some code move (I think I've seen some,
not sure). I'll try to have another look when I find time. But it's
great to hear from you again, the pleasant surprise in my inbox today,
as I thought we lost you ;-) There's hope for resumable clone maybe
before 2018 again.
-- 
Duy

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 11/11] Resumable clone: implement primer logic in git-clone
  2016-09-19 14:04   ` Duy Nguyen
@ 2016-09-19 17:16     ` Junio C Hamano
  2016-09-28  4:44     ` Kevin Wern
  1 sibling, 0 replies; 39+ messages in thread
From: Junio C Hamano @ 2016-09-19 17:16 UTC (permalink / raw)
  To: Duy Nguyen; +Cc: Kevin Wern, Git Mailing List

Duy Nguyen <pclouds@gmail.com> writes:

> On Fri, Sep 16, 2016 at 7:12 AM, Kevin Wern <kevin.m.wern@gmail.com> wrote:
>>  builtin/clone.c             | 590 +++++++++++++++++++++++++++++++++++++-------
>
> Argh.. this is too big for my brain at this hour. It might be easier
> to follow if you separate out some code move (I think I've seen some,
> not sure). I'll try to have another look when I find time. But it's
> great to hear from you again, the pleasant surprise in my inbox today,
> as I thought we lost you ;-) There's hope for resumable clone maybe
> before 2018 again.

I had a similar thought.

What "git clone" (WITHOUT Kevin's update) should have been was to be
as close to

    * Parse command line arguments;

    * Create a new repository and go into it; this step would
      require us to have parsed the command line for --template,
      <directory>, --separate-git-dir, etc.

    * Talk to the remote and do get_remote_heads() aka ls-remote
      output;

    * Decide what fetch refspec to use, which alternate object store
      to borrow from; this step would require us to have parsed the
      command line for --reference, --mirror, --origin, etc;

    * Issue "git fetch" with the refspec determined above; this step
      would require us to have parsed the command line for --depth, etc.

    * Run "git checkout -b" to create an initial checkout; this step
      would require us to have parsed the command line for --branch,
      etc.

and the current code Kevin is basing his work on is not quite in
that shape.  This round of the patches may be RFC so it may be OK
but the ready-for-review work may need to do the refactoring to get
it close to the above shape as a preparatory step before doing
anything else.  Once that is done, Kevin's series (other than the
part that acutally does the resumable static file download) will
become a very easily understood "Ah, we know where to prime the well
from, so let's do that first" step inserted immediately before the
"Issue 'git fetch'" step.


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 00/11] Resumable clone
  2016-09-16  0:12 [PATCH 00/11] Resumable clone Kevin Wern
                   ` (11 preceding siblings ...)
  2016-09-16 20:47 ` [PATCH 00/11] Resumable clone Junio C Hamano
@ 2016-09-27 21:51 ` Eric Wong
  2016-09-27 22:07   ` Junio C Hamano
  12 siblings, 1 reply; 39+ messages in thread
From: Eric Wong @ 2016-09-27 21:51 UTC (permalink / raw)
  To: Kevin Wern; +Cc: git

Kevin Wern <kevin.m.wern@gmail.com> wrote:
> Hey, all,
> 
> It's been a while (sent a very short patch in May), but I've
> still been working on the resumable clone feature and checking up on
> the mailing list for any updates. After submitting the prime-clone
> service alone, I figured implementing the whole thing would be the best
> way to understand the full scope of the problem (this is my first real
> contribution here, and learning while working on such an involved
> feature has not been easy). 

Thank you for working on this.  I'm hugely interested in this
feature as both a cheapskate sysadmin and as a client with
unreliable connectivity (and I've barely been connected this month).

> This is a functional implementation handling a direct http/ftp URI to a
> single, fully connected packfile (i.e. the link is a direct path to the
> file, not a prefix or guess). My hope is that this acts as a bare
> minimum cross-section spanning the full requirments that can expand in
> width as more cases are added (.info file, split bundle, daemon
> download service). This is certainly not perfect, but I think it at
> least prototypes each component involved in the workflow.
> 
> This patch series is based on jc/bundle, because the logic to find the
> tips of a pack's history already exists there (I call index-pack
> --clone-bundle on the downloaded file, and read the file to write the
> references to a temporary directory). If I need to re-implement this
> logic or base it on another branch, let me know. For ease of pulling
> and testing, I included the branch here:
> 
> https://github.com/kevinwern/git/tree/feature/prime-clone

Am I correct this imposes no additional storage burden for servers?

(unlike the current .bundle dance used by kernel.org:
  https://www.kernel.org/cloning-linux-from-a-bundle.html )

That would be great!

> Although there are a few changes internally from the last patch,
> the "alternate resource" url to download is configured on the
> server side in exactly the same way:
> 
> [primeclone]
> 	url = http://location/pack-$NAME.pack
> 	filetype = pack

If unconfigured, I wonder if a primeclone pack can be inferred by
the existence of a pack bitmap (or merely being the biggest+oldest
pack for dumb HTTP).

> The prime-clone service simply outputs the components as:
> 
> ####url filetype
> 0000
> 
> On the client side, the transport_prime_clone and
> transport_download_primer APIs are built to be more robust (i.e. read
> messages without dying due to protocol errors), so that git clone can
> always try them without being dependent on the capability output of
> git-upload-pack. transport_download_primer is dependent on the success
> of transport_prime_clone, but transport_prime_clone is always run on an
> initial clone. Part of achieving this robustness involves adding
> *_gentle functions to pkt_line, so that prime_clone can fail silently
> without dying.
> 
> The transport_download_primer function uses a resumable download,
> which is applicable to both automatic and manual resuming. Automatic
> is programmatically reconnecting to the resource after being
> interrupted (up to a set number of times). Manual is using a newly
> taught --resume option on the command line:
> 
> git clone --resume <resumable_work_or_git_dir>

I think calling "git fetch" should resume, actually.
It would reduce the learning curve and seems natural to me:
"fetch" is jabout grabbing whatever else appeared since the
last clone/fetch happened.

> Right now, a manually resumable directory is left behind only if the
> *client* is interrupted while a new junk mode, JUNK_LEAVE_RESUMABLE,
> is set (right before the download). For an initial clone, if the
> connection fails after automatic resuming, the client erases the
> partial resources and falls through to a normal clone. However, once a
> resumable directory is left behind by the program, it is NEVER
> deleted/abandoned after it is continued with --resume.

I'm not sure if erasing partial resources should ever be done
automatically.  Perhaps a note to the user explaining the
situation and potential ways to correct/resume it.

> I think determining when a resource is "unsalvageable" should be more
> nuanced. Especially in a case where a connection is perpetually poor
> and the user wishes to resume over a long period of time. The timeout
> logic itself *definitely* needs more nuance than "repeat 5 times", such
> as expanding wait times and using earlier successes when deciding to
> try again. Right now, I think the most important part of this patch is
> that these two paths (falling through after a failed download, exiting
> to be manually resumed later) exist.
> 
> Off the top of my head, outstanding issues/TODOs inlcude:
> 	- The above issue of determining when to fall through, when to
> 	  reattempt, and when to write the resumable info and exit
> 	  in git clone.

My current (initial) reaction is: you're overthinking this.

I think it's less surprising to a user to always write resumable
info and let them know how to resume (or abort); rather than
trying to second-guess their intent.

Going by the zero-one-infinity rule, I'd probably attempt an
auto-retry once on socket errors before saving state and bailing
with instructions on how to resume.

If they hit Ctrl-C manually, then just tell them they can
either resume or "rm -r" the directory.

> 	- Creating git-daemon service to download a resumable resource.
> 	  Pretty straightforward, I think, especially if
> 	  http.getanyfile already exists. This falls more under
> 	  "haven't gotten to yet" than dilemma.

I think this could be handled natively by git-daemon for
trickling data to slow clients in the existing event loop (and
expanded to use epoll/kqueue).  Similar to how X-Sendfile works
with (Apache|lighttpd) or X-Accel in nginx.

This would be cheaper than wasting a process (or thread) to
trickle to low-bandwidth clients.  But this may be an
optimization we defer until we've ironed out other parts.

> 	- Logic for git clone to determine when a full clone would
> 	  be superior, such as when a clone is local or a reference is
> 	  given.
> 	- Configuring prime-clone for multiple resources, in two
> 	  dimensions: (a) resources to choose from (e.g. fall back to
> 	  a second resource if the first one doesn't work) and (b)
> 	  resources to be downloaded together or in sequence (e.g.
> 	  download http://host/this, then http://host/that). Maybe
> 	  prime-clone could also handle client preferences in terms of
> 	  filetype or protocol. For this, I just have to re-read a few
> 	  discussions about the filetypes we use to see if there are
> 	  any outliers that aren't representable in this way. I think
> 	  this is another "haven't gotten to yet".

Perhaps using the existing http-alternates (and automatic
primeclone pack inference I wrote about above) can be done.

<snip>

> 	- Creating the logic to guess a packfile, and append that to a
> 	  prefix specified by the admin. Additionally, allowing the
> 	  admin to use a custom script to use their own logic to
> 	  output the URL.

Yes :) Though I'm not sure if the custom script is necessary.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 00/11] Resumable clone
  2016-09-27 21:51 ` Eric Wong
@ 2016-09-27 22:07   ` Junio C Hamano
  2016-09-28 17:32     ` Junio C Hamano
  2016-09-28 20:46     ` Eric Wong
  0 siblings, 2 replies; 39+ messages in thread
From: Junio C Hamano @ 2016-09-27 22:07 UTC (permalink / raw)
  To: Eric Wong; +Cc: Kevin Wern, git

Eric Wong <e@80x24.org> writes:

>> [primeclone]
>> 	url = http://location/pack-$NAME.pack
>> 	filetype = pack
>
> If unconfigured, I wonder if a primeclone pack can be inferred by
> the existence of a pack bitmap (or merely being the biggest+oldest
> pack for dumb HTTP).

That would probably be a nice heuristics but it is unclear who
should find that out at runtime.  The downloading side would not
have a visiblity into directory listing.

>> git clone --resume <resumable_work_or_git_dir>
>
> I think calling "git fetch" should resume, actually.
> It would reduce the learning curve and seems natural to me:
> "fetch" is jabout grabbing whatever else appeared since the
> last clone/fetch happened.

I hate say this but it sounds to me like a terrible idea.  At that
point when you need to resume, there is not even ref for "fetch" to
base its incremental work off of.  It is better to keep the knowledge
of this "priming" dance inside "clone".  Hopefully the original "clone"
whose connection was disconnected in the middle would automatically
attempt resuming and "clone --resume" would not be as often as needed.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 01/11] Resumable clone: create service git-prime-clone
  2016-09-16 20:53   ` Junio C Hamano
@ 2016-09-28  4:40     ` Kevin Wern
  0 siblings, 0 replies; 39+ messages in thread
From: Kevin Wern @ 2016-09-28  4:40 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Kevin Wern, git

On Fri, Sep 16, 2016 at 01:53:15PM -0700, Junio C Hamano wrote:
> Kevin Wern <kevin.m.wern@gmail.com> writes:
> 
> > Create git-prime-clone, a program to be executed on the server that
> > returns the location and type of static resource to download before
> > performing the rest of a clone.
> >
> > Additionally, as this executable's location will be configurable (see:
> > upload-pack and receive-pack), add the program to
> > BINDIR_PROGRAMS_NEED_X, in addition to the usual builtin places. Add
> > git-prime-clone executable to gitignore, as well
> >
> > Signed-off-by: Kevin Wern <kevin.m.wern@gmail.com>
> > ---
> 
> I wonder if we even need a separate service like this.
> 
> Wouldn't a new protocol capability that is advertised from
> upload-pack sufficient to tell the "git clone" that it can
> and should consider priming from this static resource?

The short answer is yes, it could be done that way. Both methods--extending
upload-pack and creating a new service--were suggested in different
discussions.

However, my thought was to implement the separate service because:
	- It is much easier for an admin trying the feature to sanity check the
	  output of an executable, compared to passing messages to upload-pack.
	- In the other scenario, upload-pack might get too expansive in size
	  and scope--not only codewise, but in terms of config namespace if
	  "uploadpack" concerns too many things that are only tangentially
	  related (the properties of the primer resource).
	- The transport_prime_clone API can be called independent of other
	  transport API functions, which might prove useful when revisiting or
	  refactoring code.
	- You favored the creation of a service in our original discussion [1].
	  I'm not sure if your reasoning was similar to mine.

It definitely was a tight decision--for me, ultimately weighing the value added
in usability (point 1) against the need for a "failsafe" implementation. All
the other points are more speculative, IMO, but the first was strong enough for
me.

What do you think?

[1] http://www.spinics.net/lists/git/msg269992.html

> Two minor comments:
> 
>  - For whom are you going to localize these strings?  This program
>    is running on the server side and we do not know the locale
>    preferred by the end-user who is sitting on the other end of the
>    connection, no?
> 
>  - Turn "}\n\s+else " into "} else ", please.

These are fair points. Changing for the revised version.

- Kevin

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 03/11] pkt-line: create gentle packet_read_line functions
  2016-09-16 22:17   ` Junio C Hamano
@ 2016-09-28  4:42     ` Kevin Wern
  0 siblings, 0 replies; 39+ messages in thread
From: Kevin Wern @ 2016-09-28  4:42 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Kevin Wern, git

On Fri, Sep 16, 2016 at 03:17:28PM -0700, Junio C Hamano wrote:
> Kevin Wern <kevin.m.wern@gmail.com> writes:
> 
> >  	/* And complain if we didn't get enough bytes to satisfy the read. */
> >  	if (ret < size) {
> > -		if (options & PACKET_READ_GENTLE_ON_EOF)
> > +		if (options & (PACKET_READ_GENTLE_ON_EOF | PACKET_READ_GENTLE_ALL))
> >  			return -1;
> 
> The name _ALL suggested to me that there may be multiple "under this
> condition, be gentle", "under that condition, be gentle", and _ALL
> is used as a catch-all "under any condition, be gentle".  If you
> defined _ALL symbol to have all GENTLE bits on, this line could have
> become
> 
> 	if (options & PACKET_READ_GENTLE_ALL)
> 
> > @@ -205,15 +209,23 @@ int packet_read(int fd, char **src_buf, size_t *src_len,
> >  	if (ret < 0)
> >  		return ret;
> >  	len = packet_length(linelen);
> > -	if (len < 0)
> > +	if (len < 0) {
> > +		if (options & PACKET_READ_GENTLE_ALL)
> > +			return -1;
> 
> On the other hand, however, you do want to die here when only
> GENTLE_ON_EOF is set.
> 
> Taking the above two observations together, I'd have to say that
> _ALL is probably a misnomer.  I agree with a need for a flag with
> the behaviour you defined in this patch, though.

OK, my thought is either:
	- Come up with a name for a flag, or flags, for the other cases (to
	  check in the function, i.e. PACKET_READ_GENTLE_INVALID), and still
	  pass in PACKET_READ_GENTLE_ALL, which is all those bits on plus
	  *_GENTLE_ON_EOF.
	- Come up with a better name for this single flag, like
	  PACKET_READ_DONT_DIE ... only better.

What would you suggest here?

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 02/11] Resumable clone: add prime-clone endpoints
  2016-09-19 13:15   ` Duy Nguyen
@ 2016-09-28  4:43     ` Kevin Wern
  0 siblings, 0 replies; 39+ messages in thread
From: Kevin Wern @ 2016-09-28  4:43 UTC (permalink / raw)
  To: Duy Nguyen; +Cc: Kevin Wern, Git Mailing List

On Mon, Sep 19, 2016 at 08:15:00PM +0700, Duy Nguyen wrote:
> We also have an exception for select_getanyfile() below. I think it's
> time we add a function callback in struct rpc_service to run each
> service the way they want. Then prime-clone won't need an exception
> (neither does select_anyfile, mostly)
> -- 
> Duy

Great idea! I'll definitely do that if we decide to go with the service I
implemented.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 11/11] Resumable clone: implement primer logic in git-clone
  2016-09-19 14:04   ` Duy Nguyen
  2016-09-19 17:16     ` Junio C Hamano
@ 2016-09-28  4:44     ` Kevin Wern
  1 sibling, 0 replies; 39+ messages in thread
From: Kevin Wern @ 2016-09-28  4:44 UTC (permalink / raw)
  To: Duy Nguyen; +Cc: Kevin Wern, Git Mailing List

On Mon, Sep 19, 2016 at 09:04:40PM +0700, Duy Nguyen wrote:
> On Fri, Sep 16, 2016 at 7:12 AM, Kevin Wern <kevin.m.wern@gmail.com> wrote:
> >  builtin/clone.c             | 590 +++++++++++++++++++++++++++++++++++++-------
> 
> Argh.. this is too big for my brain at this hour. It might be easier
> to follow if you separate out some code move (I think I've seen some,
> not sure). I'll try to have another look when I find time. But it's
> great to hear from you again, the pleasant surprise in my inbox today,
> as I thought we lost you ;-) There's hope for resumable clone maybe
> before 2018 again.

Sorry, I didn't mean to hurt anyone haha. I probably should have broken it down
into the bare process (priming from static resource, with no options), and the
resume option.

Thanks so much for your feedback! :)

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 10/11] run command: add RUN_COMMAND_NO_STDOUT
  2016-09-16 23:07   ` Junio C Hamano
  2016-09-18 19:22     ` Johannes Schindelin
@ 2016-09-28  4:46     ` Kevin Wern
  2016-09-28 17:54       ` Junio C Hamano
  1 sibling, 1 reply; 39+ messages in thread
From: Kevin Wern @ 2016-09-28  4:46 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Kevin Wern, git

On Fri, Sep 16, 2016 at 04:07:00PM -0700, Junio C Hamano wrote:
> Kevin Wern <kevin.m.wern@gmail.com> writes:
> 
> > Add option RUN_COMMAND_NO_STDOUT, which sets no_stdout on a child
> > process.
> >
> > This will be used by git clone when calling index-pack on a downloaded
> > packfile.
> 
> If it is just one caller, would't it make more sense for that caller
> set no_stdout explicitly itself?

I based the calling code in do_index_pack on dissociate_from_references, which
uses run_command_v_opt, so it never occured to me to do that. I thought it was
just good, uniform style and encapsulation. Like how transport's methods and
internals aren't really intended to be changed or accessed--unless it's through
the APIs we create.

However, I don't feel very strongly about this, so I'm okay with this change.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 11/11] Resumable clone: implement primer logic in git-clone
  2016-09-16 23:32   ` Junio C Hamano
@ 2016-09-28  5:49     ` Kevin Wern
  0 siblings, 0 replies; 39+ messages in thread
From: Kevin Wern @ 2016-09-28  5:49 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Kevin Wern, git

On Fri, Sep 16, 2016 at 04:32:05PM -0700, Junio C Hamano wrote:
> >  -----------
> > @@ -172,6 +173,12 @@ objects from the source repository into a pack in the cloned repository.
> >  	via ssh, this specifies a non-default path for the command
> >  	run on the other end.
> >  
> > +--prime-clone <prime-clone>::
> > +-p <prime-clone>::
> 
> Not many other options have single letter shorthand.  Is it expected
> that it is worth to let this option squat on a short-and-sweet "-p",
> perhaps because it is so frequently used?

I based my decision more on precedent than value--the "--upload-pack" option
had the shorthand "-u".

> > @@ -40,17 +42,20 @@ static const char * const builtin_clone_usage[] = {
> >  
> >  static int option_no_checkout, option_bare, option_mirror, option_single_branch = -1;
> >  static int option_local = -1, option_no_hardlinks, option_shared, option_recursive;
> > +static int option_resume;
> >  static char *option_template, *option_depth;
> > -static char *option_origin = NULL;
> > +static const char *option_origin = NULL;
> 
> Is this change related to anything you are doing here?
> 
> If you are fixing things while at it, please don't ;-) If you really
> want to, please also remove " = NULL", from this line and also from
> the next line.  Also do not add " = NULL" at the end of alt_res.
> 

Yes, it was because remote_config uses all const char* strings, and .name gets
assigned to option_origin when it pulls the previous info from config. I looked
to see if option_origin's underlying string is ever modified in a way that
makes it ineligible to be const, and it wasn't.

So, I should remove the NULLs here? And then initalize alt_res to NULL in
cmd_clone?

> >  static char *option_branch = NULL;
> >  ...
> > +static const struct alt_resource *alt_res = NULL;
> 
> > +static char *get_filename(const char *dir)
> > +{
> > +	char *dir_copy = xstrdup(dir);
> > +	strip_trailing_slashes(dir_copy);
> > +	char *filename, *final = NULL;
> > +
> > +	filename = find_last_dir_sep(dir);
> > +
> > +	if (filename && *(++filename))
> > +		final = xstrdup(filename);
> > +
> > +	free(dir_copy);
> > +	return final;
> > +}
> 
> Hmph, don't we have our own basename(3) lookalike that knows about
> dir-sep already?

Whoops, did not catch that. Thanks!

> > @@ -562,7 +614,7 @@ static void write_remote_refs(const struct ref *local_refs)
> >  		die("%s", err.buf);
> >  
> >  	for (r = local_refs; r; r = r->next) {
> > -		if (!r->peer_ref)
> > +		if (!r->peer_ref || ref_exists(r->peer_ref->name))
> >  			continue;
> >  		if (ref_transaction_create(t, r->peer_ref->name, r->old_oid.hash,
> >  					   0, NULL, &err))
> 
> What is this change about?

Because resumable clone is supposed to handle halting, I included the
possibility of halting after the final fetch happened and remote refs were
updated (basically anything after resumable download happens). This change
basically says "If the remote ref was not previously written, then write it."
Because the write is all-or-nothing, this means the logic should either write
all of the intended references, or none of them (because they were written
in the previous invocation).

> > +static const char *setup_and_index_pack(const char *filename)
> > +{
> > +	const char *primer_idx_path = NULL, *primer_bndl_path = NULL;
> > +	primer_idx_path = replace_extension(filename, ".pack", ".idx");
> > +	primer_bndl_path = replace_extension(filename, ".pack", ".bndl");
> > +
> > +	if (!(primer_idx_path && primer_bndl_path)) {
> > +		warning("invalid pack filename '%s', falling back to full "
> > +			"clone", filename);
> > +		return NULL;
> > +	}
> > +
> > +	if (!file_exists(primer_bndl_path)) {
> > +		if (do_index_pack(filename, primer_idx_path)) {
> > +			warning("could not index primer pack, falling back to "
> > +				"full clone");
> > +			return NULL;
> > +		}
> > +	}
> 
> Can it be another (undetected) failure mode that .bndl somehow
> already existed, but not .idx, leaving the resulting object store in
> an incosistent state?  Can do_index_pack() fail and leave .bndl
> behind to get you into such a state?

I don't think so, based on looking at builtin/index-pack.c. write_bundle_file()
happens at the very end of final(), which is long after the process created the
resulting .idx, or has died in the event one is not created. There also is no
automatic cleanup of .idx after that if the process dies somehow after .bndl
is written.

> > +static int write_bundle_refs(const char *bundle_filename)
> > +{
> > +	struct ref_transaction *t;
> > +	struct bundle_header history_tips;
> > +	const char *temp_ref_base = "resume";
> > +	struct strbuf err = STRBUF_INIT;
> > +	int i;
> > +
> > +	init_bundle_header(&history_tips, bundle_filename);
> > +	read_bundle_header(&history_tips);
> > +
> > +	t = ref_transaction_begin(&err);
> > +	for (i = 0; i < history_tips.references.nr; i++) {
> > +		struct strbuf ref_name = STRBUF_INIT;
> > +		strbuf_addf(&ref_name, "refs/temp/%s/%s/temp-%s",
> > +			    option_origin, temp_ref_base,
> > +			    sha1_to_hex(history_tips.references.list[i].sha1));
> 
> Can we do this without polluting refs/temp/ namespace?
> 
> I am imagining that you are first fetching the .pack file from
> sideways when primer service is available, running index-pack on it
> to produce the bundle, and the step after that is to run "git fetch"
> against the original remote to fill the gap between the bit-stale
> history you got in the bundle and the reality that has progressed
> since the primer pack was made, and you need a way to tell to the
> other end that you already have the history leading to these refs
> when you run "git fetch".  I think a bit better way to do so is to
> send these has ".have" while you run the "fetch".
> 
> Wouldn't it do if you add the "--advertise-bundle-tips=<bndl>"
> option to "git fetch", move the code to read the bundle header to
> it, and point the bundle's filename with the option when you spawn
> "git fetch"?

My implementation is definitely a minimum working model. I was hoping to
move to something cleaner.

Is this new option preferable to leveraging the "--reference" option you
mentioned in the earlier discussion? I thought that was a clean solution,
especially because it uses an existing option. Additionally, would there be
any use for this new fetch option outside of cloning? If so, I could see the
value--otherwise, knowing that we want to keep as much resume-specific
knowledge inside clone as possible, "--reference" with a .bndl seems better.

- Kevin

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 07/11] Resumable clone: add resumable download to http/curl
  2016-09-16 22:45   ` Junio C Hamano
@ 2016-09-28  6:41     ` Kevin Wern
  0 siblings, 0 replies; 39+ messages in thread
From: Kevin Wern @ 2016-09-28  6:41 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Kevin Wern, git

On Fri, Sep 16, 2016 at 03:45:44PM -0700, Junio C Hamano wrote:
> > @@ -1136,7 +1138,10 @@ static int handle_curl_result(struct slot_results *results)
> >  				curl_easy_strerror(results->curl_result),
> >  				sizeof(curl_errorstr));
> >  #endif
> > -		return HTTP_ERROR;
> > +		if (results->http_code >= 400)
> > +			return HTTP_ERROR;
> > +		else
> > +			return HTTP_ERROR_RESUMABLE;
> >  	}
> >  }
> 
> Hmm, is "anything below 400" a good definition of resumable errors?

Definitely a rough definition, but I think it covers the majority of cases. I
based this on the fact that my example resumable errors (disconnections during
download) either had results->http_code set to 206 or 0, and that http errors
generally indicate some underlying issue that can't be solved with a repeat
request (there are plenty of exceptions, such as 503 or 504 errors that happen
intermittently). I thought that was enough to be a placeholder in the model. I
did consider something like "has (the file changed size or http_code been
206, some indication of success happened) in the last [x] attempts," but I
question the precision of something like that, especially before exploring
other tried-and-true solutions.

This patch is definitely the sketchiest part of my series, IMO. The concept
isn't fleshed out enough, even theoretically--containing only the element of
retrying with some very basic components to those decisions. I think that this
will ultimately require fishing through other examples of resumable downloads
(which was very painful to read when I tried). I know curl has a --retry option,
but doesn't include it in libcurl...harumph. I'll probably try to read that
implementation and chromium's resumable download code again.

I'll get back to you with a better plan here once I read through a few
examples.

- Kevin

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 04/11] Resumable clone: add prime-clone to remote-curl
  2016-09-19 13:52   ` Duy Nguyen
@ 2016-09-28  6:45     ` Kevin Wern
  0 siblings, 0 replies; 39+ messages in thread
From: Kevin Wern @ 2016-09-28  6:45 UTC (permalink / raw)
  To: Duy Nguyen; +Cc: Kevin Wern, Git Mailing List

On Mon, Sep 19, 2016 at 08:52:34PM +0700, Duy Nguyen wrote:
> 
> A brief overview for this service in
> Documentation/technical/http-protocol.txt (and maybe
> Documentation/gitremote-helpers.txt as well) would be great help. It's
> a bit hard to follow because at this point I don't know anything about
> the server side (and on top of that I was confused between http
> send/receive vs transport send/receive, but this is my fault).
> 

I figured I would miss something in this vein. So many things to cover!

Thanks again for reading.

- Kevin

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 00/11] Resumable clone
  2016-09-27 22:07   ` Junio C Hamano
@ 2016-09-28 17:32     ` Junio C Hamano
  2016-09-28 18:22       ` Junio C Hamano
  2016-09-28 20:46     ` Eric Wong
  1 sibling, 1 reply; 39+ messages in thread
From: Junio C Hamano @ 2016-09-28 17:32 UTC (permalink / raw)
  To: Eric Wong; +Cc: Kevin Wern, git

Junio C Hamano <gitster@pobox.com> writes:

>>> git clone --resume <resumable_work_or_git_dir>
>>
>> I think calling "git fetch" should resume, actually.
>> It would reduce the learning curve and seems natural to me:
>> "fetch" is jabout grabbing whatever else appeared since the
>> last clone/fetch happened.
>
> I hate say this but it sounds to me like a terrible idea.  At that
> point when you need to resume, there is not even ref for "fetch" to
> base its incremental work off of.  It is better to keep the knowledge
> of this "priming" dance inside "clone".  Hopefully the original "clone"
> whose connection was disconnected in the middle would automatically
> attempt resuming and "clone --resume" would not be as often as needed.

After sleeping on this, I want to take the above back.

I think teaching "git fetch" about the "resume" part makes tons of
sense.

What "git clone" should have been was:

    * Parse command line arguments;

    * Create a new repository and go into it; this step would
      require us to have parsed the command line for --template,
      <directory>, --separate-git-dir, etc.

    * Talk to the remote and do get_remote_heads() aka ls-remote
      output;

    * Decide what fetch refspec to use, which alternate object store
      to borrow from; this step would require us to have parsed the
      command line for --reference, --mirror, --origin, etc;

    --- we'll insert something new here ---

    * Issue "git fetch" with the refspec determined above; this step
      would require us to have parsed the command line for --depth, etc.

    * Run "git checkout -b" to create an initial checkout; this step
      would require us to have parsed the command line for --branch,
      etc.

Even though the current code conceptually does the above, these
steps are not cleanly separated as such.  I think our update to gain
"resumable clone" feature on the client side need to start by
refactoring the current code, before learning "resumable clone", to
look like the above.

Once we do that, we can insert an extra step before the step that
runs "git fetch" to optionally [*1*] grab the extra piece of
information Kevin's "prime-clone" service produces [*2*], and store
it in the "new repository" somewhere [*3*].

And then, as you suggested, an updated "git fetch" can be taught to
notice the priming information left by the previous step, and use it
to attempt to download the pack until success, and to index that
pack to learn the tips that can be used as ".have" entries in the
request.  From the original server's point of view, this fetch
request would "want" the same set of objects, but would appear as
an incremental update.

Of course, the final step that happens in "git clone", i.e. the
initial checkout, needs to be done somehow, if your user decides to
resume with "git fetch", as "git fetch" _never_ touches the working
tree.  So for that purpose, the primary end-user facing interface
may still have to be "git clone --resume <dir>".  That would
probably skip all four steps in the above sequence, the new
"download priming information" step and go directly to the step that
runs "git fetch".

I do agree that is a much better design, and the crucial design
decision that makes it a better design is your making "git fetch"
aware of this "ah, we have the instruction left in this repository
how to prime its object store" information.

Thanks.


[Footnotes]

*1* It is debatable if it would be an overall win to use the "first
    prime by grabbing a large packfile" clone if we are doing
    shallow or single-branch clone, hence "optionally".  It is
    important to notice that we already have enough information to
    base the decision at this point in the above sequence.

*2* As I said, I do not think it needs to be a separate new service,
    and I suspect it may be a better design to carry it over the
    protocol extension.  At this point in the above sequence, we
    have done an equivalent of ls-remote and if we designed a
    protocol extension to carry the information we should already
    have it.  If we use a separate new service, we can of course
    make a separate connection to ask about "prime-clone"
    information.  The way this piece of information is transmitted
    is of secondary importance.

*3* In addition to the "prime-clone" information, we may need to
    store some information that is only known to "clone" (perhaps
    because it was given from the command line) to help the final
    "checkout -b" step to know what to checkout around here, in case
    the next "fetch" step is interrupted and killed.



^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 10/11] run command: add RUN_COMMAND_NO_STDOUT
  2016-09-28  4:46     ` Kevin Wern
@ 2016-09-28 17:54       ` Junio C Hamano
  2016-09-28 18:06         ` Kevin Wern
  0 siblings, 1 reply; 39+ messages in thread
From: Junio C Hamano @ 2016-09-28 17:54 UTC (permalink / raw)
  To: Kevin Wern; +Cc: git

Kevin Wern <kevin.m.wern@gmail.com> writes:

> On Fri, Sep 16, 2016 at 04:07:00PM -0700, Junio C Hamano wrote:
>> Kevin Wern <kevin.m.wern@gmail.com> writes:
>> 
>> > Add option RUN_COMMAND_NO_STDOUT, which sets no_stdout on a child
>> > process.
>> >
>> > This will be used by git clone when calling index-pack on a downloaded
>> > packfile.
>> 
>> If it is just one caller, would't it make more sense for that caller
>> set no_stdout explicitly itself?
>
> I based the calling code in do_index_pack on dissociate_from_references, which
> uses run_command_v_opt, so it never occured to me to do that. I thought it was
> just good, uniform style and encapsulation. Like how transport's methods and
> internals aren't really intended to be changed or accessed--unless it's through
> the APIs we create.
>
> However, I don't feel very strongly about this, so I'm okay with this change.

I am neutral and with no opinion.  I may have offered a solution to
a problem that did not exist.

I just got an impression that you were apologetic for having to add
this option that is otherwise useless and tried to suggest a simpler
solution that does not involve such an addition.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 10/11] run command: add RUN_COMMAND_NO_STDOUT
  2016-09-28 17:54       ` Junio C Hamano
@ 2016-09-28 18:06         ` Kevin Wern
  0 siblings, 0 replies; 39+ messages in thread
From: Kevin Wern @ 2016-09-28 18:06 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Kevin Wern, git

On Wed, Sep 28, 2016 at 10:54:52AM -0700, Junio C Hamano wrote:
> 
> I just got an impression that you were apologetic for having to add
> this option that is otherwise useless and tried to suggest a simpler
> solution that does not involve such an addition.

Sorry, to be clear, I meant I was ok with your suggestion. That's what I meant
by 'this change.'

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 00/11] Resumable clone
  2016-09-28 17:32     ` Junio C Hamano
@ 2016-09-28 18:22       ` Junio C Hamano
  0 siblings, 0 replies; 39+ messages in thread
From: Junio C Hamano @ 2016-09-28 18:22 UTC (permalink / raw)
  To: Eric Wong; +Cc: Kevin Wern, git

Junio C Hamano <gitster@pobox.com> writes:

> Junio C Hamano <gitster@pobox.com> writes:
>
> What "git clone" should have been was:
>
>     * Parse command line arguments;
>
>     * Create a new repository and go into it; this step would
>       require us to have parsed the command line for --template,
>       <directory>, --separate-git-dir, etc.
>
>     * Talk to the remote and do get_remote_heads() aka ls-remote
>       output;
>
>     * Decide what fetch refspec to use, which alternate object store
>       to borrow from; this step would require us to have parsed the
>       command line for --reference, --mirror, --origin, etc;
>
>     --- we'll insert something new here ---
>
>     * Issue "git fetch" with the refspec determined above; this step
>       would require us to have parsed the command line for --depth, etc.
>
>     * Run "git checkout -b" to create an initial checkout; this step
>       would require us to have parsed the command line for --branch,
>       etc.
>
> Even though the current code conceptually does the above, these
> steps are not cleanly separated as such.  I think our update to gain
> "resumable clone" feature on the client side need to start by
> refactoring the current code, before learning "resumable clone", to
> look like the above.
>
> Once we do that, we can insert an extra step before the step that
> runs "git fetch" to optionally [*1*] grab the extra piece of
> information Kevin's "prime-clone" service produces [*2*], and store
> it in the "new repository" somewhere [*3*].
>
> And then, as you suggested, an updated "git fetch" can be taught to
> notice the priming information left by the previous step, and use it
> to attempt to download the pack until success, and to index that
> pack to learn the tips that can be used as ".have" entries in the
> request.  From the original server's point of view, this fetch
> request would "want" the same set of objects, but would appear as
> an incremental update.

Thinking about this even more, it probably makes even more sense to
move the new "learn prime info and store it in repository somewhere,
so that later re-invocation of 'git fetch' can take advantage of it"
step _into_ "git fetch".  That would allow "git fetch" in a freshly
created empty repository take advantage of this feature for free.

The step that "git clone" internally drives "git fetch" would not
actually be done by spawning a separate process with run_command()
because we would want to reuse the connection we already have with
the server when "git clone" first talked to it to learn "ls-remote"
equivalent (i.e. transport_get_remote_refs()).  I wonder if we can
do without this early "ls-remote"; that would further simplify
things by allowing us to just spawn "git fetch" internally.


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 00/11] Resumable clone
  2016-09-27 22:07   ` Junio C Hamano
  2016-09-28 17:32     ` Junio C Hamano
@ 2016-09-28 20:46     ` Eric Wong
  1 sibling, 0 replies; 39+ messages in thread
From: Eric Wong @ 2016-09-28 20:46 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Kevin Wern, git

Junio C Hamano <gitster@pobox.com> wrote:
> Eric Wong <e@80x24.org> writes:
> 
> >> [primeclone]
> >> 	url = http://location/pack-$NAME.pack
> >> 	filetype = pack
> >
> > If unconfigured, I wonder if a primeclone pack can be inferred by
> > the existence of a pack bitmap (or merely being the biggest+oldest
> > pack for dumb HTTP).
> 
> That would probably be a nice heuristics but it is unclear who
> should find that out at runtime.  The downloading side would not
> have a visiblity into directory listing.

I think making a bunch of HEAD requests based on the contents of
$GIT_DIR/objects/info/packs wouldn't be too expensive on either
end, especially when HTTP/1.1 persistent connections + pipelining
may be used.

^ permalink raw reply	[flat|nested] 39+ messages in thread

end of thread, other threads:[~2016-09-28 20:46 UTC | newest]

Thread overview: 39+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-09-16  0:12 [PATCH 00/11] Resumable clone Kevin Wern
2016-09-16  0:12 ` [PATCH 01/11] Resumable clone: create service git-prime-clone Kevin Wern
2016-09-16 20:53   ` Junio C Hamano
2016-09-28  4:40     ` Kevin Wern
2016-09-16  0:12 ` [PATCH 02/11] Resumable clone: add prime-clone endpoints Kevin Wern
2016-09-19 13:15   ` Duy Nguyen
2016-09-28  4:43     ` Kevin Wern
2016-09-16  0:12 ` [PATCH 03/11] pkt-line: create gentle packet_read_line functions Kevin Wern
2016-09-16 22:17   ` Junio C Hamano
2016-09-28  4:42     ` Kevin Wern
2016-09-16  0:12 ` [PATCH 04/11] Resumable clone: add prime-clone to remote-curl Kevin Wern
2016-09-19 13:52   ` Duy Nguyen
2016-09-28  6:45     ` Kevin Wern
2016-09-16  0:12 ` [PATCH 05/11] Resumable clone: add output parsing to connect.c Kevin Wern
2016-09-16  0:12 ` [PATCH 06/11] Resumable clone: implement transport_prime_clone Kevin Wern
2016-09-16  0:12 ` [PATCH 07/11] Resumable clone: add resumable download to http/curl Kevin Wern
2016-09-16 22:45   ` Junio C Hamano
2016-09-28  6:41     ` Kevin Wern
2016-09-16  0:12 ` [PATCH 08/11] Resumable clone: create transport_download_primer Kevin Wern
2016-09-16  0:12 ` [PATCH 09/11] path: add resumable marker Kevin Wern
2016-09-19 13:24   ` Duy Nguyen
2016-09-16  0:12 ` [PATCH 10/11] run command: add RUN_COMMAND_NO_STDOUT Kevin Wern
2016-09-16 23:07   ` Junio C Hamano
2016-09-18 19:22     ` Johannes Schindelin
2016-09-28  4:46     ` Kevin Wern
2016-09-28 17:54       ` Junio C Hamano
2016-09-28 18:06         ` Kevin Wern
2016-09-16  0:12 ` [PATCH 11/11] Resumable clone: implement primer logic in git-clone Kevin Wern
2016-09-16 23:32   ` Junio C Hamano
2016-09-28  5:49     ` Kevin Wern
2016-09-19 14:04   ` Duy Nguyen
2016-09-19 17:16     ` Junio C Hamano
2016-09-28  4:44     ` Kevin Wern
2016-09-16 20:47 ` [PATCH 00/11] Resumable clone Junio C Hamano
2016-09-27 21:51 ` Eric Wong
2016-09-27 22:07   ` Junio C Hamano
2016-09-28 17:32     ` Junio C Hamano
2016-09-28 18:22       ` Junio C Hamano
2016-09-28 20:46     ` Eric Wong

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).