* [RFC 0/4] Shallow clones with on-demand fetch
From: Mark Thomas @ 2017-03-04 19:18 UTC
  To: git; +Cc: Mark Thomas

Hello everyone,

This is an RFC for an enhancement to shallow repositories to make them
behave more like full clones.

I was inspired a bit by Microsoft's announcement of their Git VFS.  I
saw that people have talked in the past about making git fetch objects
from remotes as they are needed, and decided to give it a try.

The patch series adds a "--on-demand" option to git clone, which, when
used in conjunction with the existing shallow clone operations, clones
the full history of the repository's commits, but only the files that
would be included in the shallow clone.

When a file that is missing is required, git requests the file on-demand
from the remote, via a new 'upload-file' service.
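
To make the lookup order concrete, here is a minimal sketch in plain C of the fallback described above: local storage is consulted first, and the remote is only asked when both the loose and the packed lookups miss. The `store` type, `find_object()` and the `SRC_*` values are hypothetical stand-ins, not Git's API; in the patches the real hooks are `read_object()` and `sha1_object_info_extended()` in sha1_file.c.

```c
#include <stddef.h>
#include <string.h>

/* Where an object was found (hypothetical; compare OI_LOOSE/OI_PACKED/OI_ONDEMAND). */
enum source { SRC_NONE, SRC_LOOSE, SRC_PACKED, SRC_REMOTE };

/* Hypothetical object store: a NULL-terminated list of object names it holds. */
struct store {
	const char *const *names;
};

static int store_has(const struct store *s, const char *name)
{
	const char *const *p;

	for (p = s->names; *p; p++)
		if (!strcmp(*p, name))
			return 1;
	return 0;
}

/*
 * Try the local sources first; only fall back to the remote when both
 * miss.  A NULL remote models a repository without on-demand support.
 */
static enum source find_object(const struct store *loose,
			       const struct store *packed,
			       const struct store *remote,
			       const char *name)
{
	if (store_has(loose, name))
		return SRC_LOOSE;
	if (store_has(packed, name))
		return SRC_PACKED;
	if (remote && store_has(remote, name))
		return SRC_REMOTE;
	return SRC_NONE;
}
```

The remote is deliberately last: every hit there costs a round trip, so it only serves objects the shallow part of the clone left out.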

Public git servers are unlikely to want to enable this, due to the
additional load it may cause, but within an organization's own network, it
will allow full access to the repository history without needing a full
initial clone.

The patch set is in four parts:
  1:
    Adds the "upload-file" command, which starts a new protocol
    conversation with the client allowing it to request file info and
    file contents.  The connection is kept open so that the client
    can make as many requests as it likes.  The client terminates the
    connection by sending a packet containing "end".
  2:
    Adds the ability for file info and content to be requested from
    the remote if the file cannot be found in any pack, or loose in
    the repository.  Currently this only looks at the default remote,
    but the intention is this would be configurable.
  3:
    Adds the "on-demand" capability to "upload-pack".  When a client
    requests this capability, "upload-pack" includes in the pack
    all commits, even those that would normally be dropped by the
    shallow clone.
  4:
    Adds the "--on-demand" option to clone, to request an on-demand
    shallow clone.
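
On the wire, each request in the upload-file conversation is an ordinary pkt-line, as written by `packet_write_fmt()`: four hex digits giving the total length (including those four bytes), followed by the payload. The sketch below frames a request the same way; `frame_pkt_line()` is a hypothetical helper for illustration, and the 65520-byte limit assumes Git's `LARGE_PACKET_MAX`.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: frame one payload as a pkt-line, i.e. four
 * lowercase hex digits holding the total length (prefix included),
 * then the payload itself.  Returns the number of bytes written
 * (excluding the terminating NUL), or -1 if it does not fit.
 */
static int frame_pkt_line(char *out, size_t outsz, const char *payload)
{
	size_t len = strlen(payload) + 4;

	if (len > 65520 || len + 1 > outsz)	/* assumed LARGE_PACKET_MAX */
		return -1;
	snprintf(out, outsz, "%04x%s", (unsigned)len, payload);
	return (int)len;
}
```

For example, the 45-byte payload `get <40 hex digits>\n` becomes a 49-byte packet starting with `0031`, and the closing `end\n` becomes `0008end\n`.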

This is a proof-of-concept, so it is in no way complete.  It contains a
few hacks to make it work, but these can be ironed out with a bit more
work.  What I have so far is sufficient to try out the idea.  I'd like
to get people's opinions on it before I spend any more time working on
it.  Also, I'm not very familiar with the git codebase, so some help
would be appreciated.

As an example, the Linux repository currently stands at 2.0GB of packed
data.  A "git clone --shallow-since=2016-01-01 --on-demand" is only
561MB, and yet remains fully functional.  A git blame on the Makefile,
for example, shows all changes to the file, right back to Linus's
original commit in 2005.

Still to do:

 - Fix up the hacks and make everything work correctly.
 - Make fetching of further updates work correctly.
 - Store the retrieved files in an LRU cache, possibly with the option
   of storing them in the main repo data, too.
 - Add a gc/enshallow operation to make the repo shallower by forgetting
   old files, or moving them to the LRU cache.
 - Add configurable remote to fetch from.
 - Documentation.
 - Much more.
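
The LRU cache item could start from something as simple as the following sketch, which only tracks which object names are cached and evicts the least recently used one when the cache is full. The capacity, the `lru` struct and both functions are hypothetical; a real version would also need to store the object contents alongside each name.

```c
#include <string.h>

#define LRU_CAPACITY 4	/* hypothetical, tiny for illustration */
#define RAWSZ 20	/* raw SHA-1 length */

struct lru_entry {
	unsigned char sha1[RAWSZ];
	unsigned long last_used;	/* 0 means the slot is empty */
};

struct lru {
	struct lru_entry slot[LRU_CAPACITY];
	unsigned long tick;		/* increases on every use */
};

/* Return 1 and mark sha1 as most recently used if cached, else 0. */
static int lru_lookup(struct lru *c, const unsigned char *sha1)
{
	int i;

	for (i = 0; i < LRU_CAPACITY; i++)
		if (c->slot[i].last_used &&
		    !memcmp(c->slot[i].sha1, sha1, RAWSZ)) {
			c->slot[i].last_used = ++c->tick;
			return 1;
		}
	return 0;
}

/* Insert sha1; empty slots (last_used == 0) are filled before evicting. */
static void lru_insert(struct lru *c, const unsigned char *sha1)
{
	int i, victim = 0;

	if (lru_lookup(c, sha1))
		return;
	for (i = 1; i < LRU_CAPACITY; i++)
		if (c->slot[i].last_used < c->slot[victim].last_used)
			victim = i;
	memcpy(c->slot[victim].sha1, sha1, RAWSZ);
	c->slot[victim].last_used = ++c->tick;
}
```

A linear scan is fine at this size; a larger cache would want a hash table plus a linked list to make both operations constant time.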

Please let me know what you think, and if an experienced git developer
would like to help out with finishing this, that would be even better.

Mark Thomas (4):
  upload-file: Add upload-file command
  on-demand: Fetch missing files from remote
  upload-pack: Send all commits if client requests on-demand
  clone: Request on-demand shallow clones

 .gitignore             |   1 +
 Makefile               |   3 +
 builtin/clone.c        |   7 +-
 builtin/pack-objects.c |  26 ++++++-
 cache-tree.c           |   2 +-
 cache.h                |   3 +-
 daemon.c               |   6 ++
 fetch-pack.c           |   3 +
 fetch-pack.h           |   1 +
 list-objects.c         |  12 ++--
 list-objects.h         |  13 +++-
 object.h               |   1 +
 on_demand.c            | 183 +++++++++++++++++++++++++++++++++++++++++++++++++
 on_demand.h            |  12 ++++
 sha1_file.c            |   8 ++-
 shallow.c              |   2 +-
 transport.c            |   3 +
 transport.h            |   4 ++
 upload-file.c          |  87 +++++++++++++++++++++++
 upload-pack.c          |   8 ++-
 20 files changed, 370 insertions(+), 15 deletions(-)
 create mode 100644 on_demand.c
 create mode 100644 on_demand.h
 create mode 100644 upload-file.c

-- 
2.7.4



* [RFC 1/4] upload-file: Add upload-file command
From: Mark Thomas @ 2017-03-04 19:18 UTC
  To: git; +Cc: Mark Thomas

The upload-file command allows a client to request specific files by sha1.

Signed-off-by: Mark Thomas <markbt@efaref.net>
---
 .gitignore    |  1 +
 Makefile      |  2 ++
 daemon.c      |  6 +++++
 upload-file.c | 87 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 96 insertions(+)
 create mode 100644 upload-file.c

diff --git a/.gitignore b/.gitignore
index 833ef3b..c2db9c2 100644
--- a/.gitignore
+++ b/.gitignore
@@ -165,6 +165,7 @@
 /git-update-ref
 /git-update-server-info
 /git-upload-archive
+/git-upload-file
 /git-upload-pack
 /git-var
 /git-verify-commit
diff --git a/Makefile b/Makefile
index 9ec6065..1b84322 100644
--- a/Makefile
+++ b/Makefile
@@ -597,6 +597,7 @@ PROGRAM_OBJS += sh-i18n--envsubst.o
 PROGRAM_OBJS += shell.o
 PROGRAM_OBJS += show-index.o
 PROGRAM_OBJS += upload-pack.o
+PROGRAM_OBJS += upload-file.o
 PROGRAM_OBJS += remote-testsvn.o
 
 # Binary suffix, set to .exe for Windows builds
@@ -668,6 +669,7 @@ BINDIR_PROGRAMS_NEED_X += git-upload-pack
 BINDIR_PROGRAMS_NEED_X += git-receive-pack
 BINDIR_PROGRAMS_NEED_X += git-upload-archive
 BINDIR_PROGRAMS_NEED_X += git-shell
+BINDIR_PROGRAMS_NEED_X += git-upload-file
 
 BINDIR_PROGRAMS_NO_X += git-cvsserver
 
diff --git a/daemon.c b/daemon.c
index 473e6b6..8b5b026 100644
--- a/daemon.c
+++ b/daemon.c
@@ -479,6 +479,11 @@ static int upload_pack(void)
 	return run_service_command(argv);
 }
 
+static int upload_file(void)
+{
+	const char *argv[] = { "upload-file", ".", NULL };
+	return run_service_command(argv);
+}
 static int upload_archive(void)
 {
 	static const char *argv[] = { "upload-archive", ".", NULL };
@@ -494,6 +499,7 @@ static int receive_pack(void)
 static struct daemon_service daemon_service[] = {
 	{ "upload-archive", "uploadarch", upload_archive, 0, 1 },
 	{ "upload-pack", "uploadpack", upload_pack, 1, 1 },
+	{ "upload-file", "uploadfile", upload_file, 0, 1 },
 	{ "receive-pack", "receivepack", receive_pack, 0, 1 },
 };
 
diff --git a/upload-file.c b/upload-file.c
new file mode 100644
index 0000000..cb2bfe8
--- /dev/null
+++ b/upload-file.c
@@ -0,0 +1,87 @@
+
+#include "cache.h"
+#include "exec_cmd.h"
+#include "parse-options.h"
+#include "pkt-line.h"
+
+static const char * const upload_file_usage[] = {
+	N_("git upload-file [<options>] <dir>"),
+	NULL
+};
+
+
+static void upload_file(void)
+{
+	for (;;) {
+		char *line = packet_read_line(0, NULL);
+		const char *arg;
+		if (!line)
+			break;
+
+		if (skip_prefix(line, "info ", &arg)) {
+			unsigned char sha1[20];
+			void *buffer;
+			enum object_type type;
+			unsigned long size;
+
+			if (get_sha1_hex(arg, sha1))
+				die("invalid sha: %s", arg);
+
+			buffer = read_sha1_file(sha1, &type, &size);
+			if (buffer) {
+				packet_write_fmt(1, "found %s %d %lu\n", sha1_to_hex(sha1), type, size);
+				free(buffer);
+			} else {
+				packet_write_fmt(1, "missing %s\n", sha1_to_hex(sha1));
+			}
+		}
+
+		if (skip_prefix(line, "get ", &arg)) {
+			unsigned char sha1[20];
+			void *buffer;
+			enum object_type type;
+			unsigned long size;
+
+			if (get_sha1_hex(arg, sha1))
+				die("invalid sha: %s", arg);
+
+			buffer = read_sha1_file(sha1, &type, &size);
+			if (buffer) {
+				packet_write_fmt(1, "found %s %d %lu\n", sha1_to_hex(sha1), type, size);
+				write_or_die(1, buffer, size);
+				free(buffer);
+			} else {
+				packet_write_fmt(1, "missing %s\n", sha1_to_hex(sha1));
+			}
+
+		}
+
+		if (!strcmp(line, "end"))
+			break;
+	}
+}
+
+int cmd_main(int argc, const char **argv)
+{
+	const char *dir;
+	struct option options[] = {
+		OPT_END()
+	};
+
+	packet_trace_identity("upload-file");
+
+	argc = parse_options(argc, argv, NULL, options, upload_file_usage, 0);
+
+	if (argc != 1)
+		usage_with_options(upload_file_usage, options);
+
+	setup_path();
+
+	dir = argv[0];
+
+	if (!enter_repo(dir, 0))
+		die("'%s' does not appear to be a git repository", dir);
+
+	upload_file();
+	return 0;
+}
-- 
2.7.4
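
For illustration, the request handling in `upload_file()` above can be sketched as a standalone classifier. `skip_prefix_str()` mimics Git's `skip_prefix()`, and `parse_request()` and `is_hex40()` are hypothetical names. One deliberate difference from the patch: this version rejects unknown verbs instead of silently skipping them, and requires the hash to be exactly 40 hex digits, so a caller could report a protocol error.

```c
#include <ctype.h>
#include <string.h>

enum request { REQ_INVALID, REQ_INFO, REQ_GET, REQ_END };

/* Hypothetical stand-in for skip_prefix(): on match, *out points past prefix. */
static int skip_prefix_str(const char *s, const char *prefix, const char **out)
{
	size_t n = strlen(prefix);

	if (strncmp(s, prefix, n))
		return 0;
	*out = s + n;
	return 1;
}

/* Exactly 40 hex digits (either case) and nothing after them. */
static int is_hex40(const char *s)
{
	int i;

	for (i = 0; i < 40; i++)
		if (!isxdigit((unsigned char)s[i]))
			return 0;	/* also stops safely at a short string's NUL */
	return s[40] == '\0';
}

/* Classify one request line whose trailing newline was already removed. */
static enum request parse_request(const char *line, const char **hex)
{
	const char *arg;

	if (!strcmp(line, "end"))
		return REQ_END;
	if (skip_prefix_str(line, "info ", &arg) && is_hex40(arg)) {
		*hex = arg;
		return REQ_INFO;
	}
	if (skip_prefix_str(line, "get ", &arg) && is_hex40(arg)) {
		*hex = arg;
		return REQ_GET;
	}
	return REQ_INVALID;
}
```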



* [RFC 2/4] on-demand: Fetch missing files from remote
From: Mark Thomas @ 2017-03-04 19:18 UTC
  To: git; +Cc: Mark Thomas

If an object (tree, blob, ...) is not found either in the
packs or loose, check if it is available on-demand from
the remote.

Signed-off-by: Mark Thomas <markbt@efaref.net>
---
 Makefile    |   1 +
 cache.h     |   3 +-
 on_demand.c | 157 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 on_demand.h |   8 ++++
 sha1_file.c |   8 +++-
 5 files changed, 174 insertions(+), 3 deletions(-)
 create mode 100644 on_demand.c
 create mode 100644 on_demand.h

diff --git a/Makefile b/Makefile
index 1b84322..fb8ca6c 100644
--- a/Makefile
+++ b/Makefile
@@ -784,6 +784,7 @@ LIB_OBJS += notes-merge.o
 LIB_OBJS += notes-utils.o
 LIB_OBJS += object.o
 LIB_OBJS += oidset.o
+LIB_OBJS += on_demand.o
 LIB_OBJS += pack-bitmap.o
 LIB_OBJS += pack-bitmap-write.o
 LIB_OBJS += pack-check.o
diff --git a/cache.h b/cache.h
index 80b6372..d34af06 100644
--- a/cache.h
+++ b/cache.h
@@ -1730,7 +1730,8 @@ struct object_info {
 		OI_CACHED,
 		OI_LOOSE,
 		OI_PACKED,
-		OI_DBCACHED
+		OI_DBCACHED,
+		OI_ONDEMAND
 	} whence;
 	union {
 		/*
diff --git a/on_demand.c b/on_demand.c
new file mode 100644
index 0000000..a0aaf18
--- /dev/null
+++ b/on_demand.c
@@ -0,0 +1,157 @@
+#include "transport.h"
+#include "pkt-line.h"
+#include "remote.h"
+#include "commit.h"
+
+static struct remote *remote = NULL;
+static struct transport *transport = NULL;
+static int fd[2];
+static int connected;
+
+struct trace_key trace_on_demand = TRACE_KEY_INIT(ON_DEMAND);
+
+static void on_demand_cleanup(void)
+{
+	if (connected) {
+		packet_write_fmt(fd[1], "end\n");
+		transport_disconnect(transport);
+		connected = 0;
+	}
+}
+
+static int on_demand_connect(void)
+{
+	if (!connected) {
+		if (!remote)
+			remote = remote_get(NULL);
+		if (remote && !transport)
+			transport = transport_get(remote, NULL);
+		if (!remote || !transport)
+			return 0;
+		if (transport_connect(transport, transport->url, "git-upload-file", fd))
+			return 0;
+		connected = 1;
+		atexit(on_demand_cleanup);
+	}
+	return 1;
+}
+
+void *read_remote_on_demand(const unsigned char *sha1, enum object_type *type,
+			    unsigned long *size)
+{
+	const char *line;
+	const char *arg;
+	int line_size;
+
+	if (!on_demand_connect())
+		return NULL;
+
+	packet_write_fmt(fd[1], "get %s\n", sha1_to_hex(sha1));
+
+	line = packet_read_line(fd[0], &line_size);
+
+	if (line_size == 0)
+		return NULL;
+
+	if (skip_prefix(line, "missing ", &arg))
+		return NULL;
+
+	if (skip_prefix(line, "found ", &arg)) {
+		char *end = NULL;
+		void *buffer;
+		unsigned char file_sha1[GIT_SHA1_RAWSZ];
+		enum object_type file_type;
+		unsigned long file_size;
+		ssize_t size_read;
+
+		if (get_sha1_hex(arg, file_sha1))
+			die("git on-demand: protocol error, "
+			    "expected to get sha in '%s'", line);
+		arg += GIT_SHA1_HEXSZ;
+
+		file_type = strtol(arg, &end, 0);
+		if (!end || file_type < 0 || file_type >= OBJ_MAX)
+			die("git on-demand: protocol error, "
+			    "invalid object type in '%s'", line);
+		arg = end;
+
+		file_size = strtoul(arg, &end, 0);
+		if (!end || *end || file_size > LONG_MAX)
+			die("git on-demand: protocol error, "
+			    "invalid file size in '%s'", line);
+
+		buffer = xmallocz(file_size);
+		if (!buffer)
+			die("git on-demand: failed to allocate "
+			    "buffer for %lu bytes", file_size);
+
+		size_read = read_in_full(fd[0], buffer, file_size);
+		if (size_read != (ssize_t)file_size)
+			die("git on-demand: protocol error, "
+			    "failed to read file data");
+
+		trace_printf_key(&trace_on_demand, "on-demand: fetched %s\n",
+				 sha1_to_hex(sha1));
+		*type = file_type;
+		*size = file_size;
+		return buffer;
+	}
+
+	die("git on-demand: protocol error, "
+	    "unexpected response: '%s'", line);
+}
+
+int object_info_on_demand(const unsigned char *sha1, struct object_info *oi)
+{
+	const char *line;
+	const char *arg;
+	int line_size;
+
+	if (!on_demand_connect())
+		return -1;
+
+	packet_write_fmt(fd[1], "info %s\n", sha1_to_hex(sha1));
+
+	line = packet_read_line(fd[0], &line_size);
+
+	if (line_size == 0)
+		return -1;
+
+	if (skip_prefix(line, "missing ", &arg))
+		return -1;
+
+	if (skip_prefix(line, "found ", &arg)) {
+		char *end = NULL;
+		unsigned char file_sha1[GIT_SHA1_RAWSZ];
+		enum object_type file_type;
+		unsigned long file_size;
+
+		if (get_sha1_hex(arg, file_sha1))
+			die("git on-demand: protocol error, "
+			    "expected to get sha in '%s'", line);
+		arg += GIT_SHA1_HEXSZ;
+
+		file_type = strtol(arg, &end, 0);
+		if (!end || file_type < 0 || file_type >= OBJ_MAX)
+			die("git on-demand: protocol error, "
+			    "invalid object type in '%s'", line);
+		arg = end;
+
+		file_size = strtoul(arg, &end, 0);
+		if (!end || *end || file_size > LONG_MAX)
+			die("git on-demand: protocol error, "
+			    "invalid file size in '%s'", line);
+
+		if (oi->typep)
+			*oi->typep = file_type;
+		if (oi->sizep)
+			*oi->sizep = file_size;
+		if (oi->disk_sizep)
+			*oi->disk_sizep = 0;
+		oi->whence = OI_ONDEMAND;
+		return 0;
+	}
+
+	die("git on-demand: protocol error, "
+	    "unexpected response: '%s'", line);
+}
diff --git a/on_demand.h b/on_demand.h
new file mode 100644
index 0000000..09a8072
--- /dev/null
+++ b/on_demand.h
@@ -0,0 +1,8 @@
+#ifndef ON_DEMAND_H
+#define ON_DEMAND_H
+
+void *read_remote_on_demand(const unsigned char *sha1, enum object_type *type,
+			    unsigned long *size);
+int object_info_on_demand(const unsigned char *sha1, struct object_info *oi);
+
+#endif
diff --git a/sha1_file.c b/sha1_file.c
index 6628f06..510da41 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -27,6 +27,7 @@
 #include "list.h"
 #include "mergesort.h"
 #include "quote.h"
+#include "on_demand.h"
 
 #define SZ_FMT PRIuMAX
 static inline uintmax_t sz_fmt(size_t s) { return s; }
@@ -2979,7 +2980,7 @@ int sha1_object_info_extended(const unsigned char *sha1, struct object_info *oi,
 		/* Not a loose object; someone else may have just packed it. */
 		reprepare_packed_git();
 		if (!find_pack_entry(real, &e))
-			return -1;
+			return object_info_on_demand(real, oi);
 	}
 
 	/*
@@ -3091,7 +3092,10 @@ static void *read_object(const unsigned char *sha1, enum object_type *type,
 		return buf;
 	}
 	reprepare_packed_git();
-	return read_packed_sha1(sha1, type, size);
+	buf = read_packed_sha1(sha1, type, size);
+	if (buf)
+		return buf;
+	return read_remote_on_demand(sha1, type, size);
 }
 
 /*
-- 
2.7.4
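
The two parsers in on_demand.c use `!end` to detect a bad number, but `strtol()` and `strtoul()` always set `end` when given a pointer, so that test never fires and an empty field is read as 0 without complaint. A stricter standalone parser for the `found` reply might look like this sketch. `struct found_reply` and `parse_found()` are hypothetical, and the accepted type range 1-4 assumes Git's numbering of `OBJ_COMMIT`, `OBJ_TREE`, `OBJ_BLOB` and `OBJ_TAG`.

```c
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical parsed form of an upload-file "found" reply. */
struct found_reply {
	char hex[41];		/* object name: 40 hex digits plus NUL */
	long type;		/* assumed 1..4: commit, tree, blob, tag */
	unsigned long size;	/* object size in bytes */
};

/* Parse "found <40-hex> <type> <size>" (newline already removed); 0 on success. */
static int parse_found(const char *line, struct found_reply *r)
{
	const char *p;
	char *end;

	if (strncmp(line, "found ", 6))
		return -1;
	p = line + 6;

	/* Exactly 40 hex digits followed by a single space. */
	if (strspn(p, "0123456789abcdefABCDEF") != 40 || p[40] != ' ')
		return -1;
	memcpy(r->hex, p, 40);
	r->hex[40] = '\0';
	p += 41;

	/* Type: must start with a digit, so "", "-1" or " 3" are rejected. */
	if (!isdigit((unsigned char)*p))
		return -1;
	errno = 0;
	r->type = strtol(p, &end, 10);
	if (errno || *end != ' ' || r->type < 1 || r->type > 4)
		return -1;
	p = end + 1;

	/* Size: digits running to the end of the line, no sign allowed. */
	if (!isdigit((unsigned char)*p))
		return -1;
	errno = 0;
	r->size = strtoul(p, &end, 10);
	if (errno || *end != '\0')
		return -1;
	return 0;
}
```

Checking for a leading digit before each conversion is what guarantees that `end` moved; it also stops `strtoul()` from wrapping a negative value around to a huge size.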



* [RFC 3/4] upload-pack: Send all commits if client requests on-demand
From: Mark Thomas @ 2017-03-04 19:19 UTC
  To: git; +Cc: Mark Thomas

Signed-off-by: Mark Thomas <markbt@efaref.net>
---
 builtin/pack-objects.c | 26 ++++++++++++++++++++++++--
 list-objects.c         | 12 +++++++-----
 list-objects.h         | 13 ++++++++++++-
 object.h               |  1 +
 on_demand.c            | 26 ++++++++++++++++++++++++++
 on_demand.h            |  4 ++++
 upload-pack.c          |  8 +++++++-
 7 files changed, 81 insertions(+), 9 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index f294dcf..c8b2503 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -24,6 +24,7 @@
 #include "sha1-array.h"
 #include "argv-array.h"
 #include "mru.h"
+#include "on_demand.h"
 
 static const char *pack_usage[] = {
 	N_("git pack-objects --stdout [<options>...] [< <ref-list> | < <object-list>]"),
@@ -77,6 +78,8 @@ static unsigned long cache_max_small_delta_size = 1000;
 
 static unsigned long window_memory_limit = 0;
 
+static int send_all_commits;
+
 /*
  * stats
  */
@@ -2750,12 +2753,15 @@ static void record_recent_commit(struct commit *commit, void *data)
 static void get_object_list(int ac, const char **av)
 {
 	struct rev_info revs;
+	struct rev_info revs2;
 	char line[1000];
 	int flags = 0;
 
 	init_revisions(&revs, NULL);
+	init_revisions(&revs2, NULL);
 	save_commit_buffer = 0;
 	setup_revisions(ac, av, &revs, NULL);
+	setup_revisions(ac, av, &revs2, NULL);
 
 	/* make sure shallows are read */
 	is_repository_shallow();
@@ -2776,7 +2782,10 @@ static void get_object_list(int ac, const char **av)
 				unsigned char sha1[20];
 				if (get_sha1_hex(line + 10, sha1))
 					die("not an SHA-1 '%s'", line + 10);
-				register_shallow(sha1);
+				if (send_all_commits)
+					register_on_demand_cutoff(sha1);
+				else
+					register_shallow(sha1);
 				use_bitmap_index = 0;
 				continue;
 			}
@@ -2784,6 +2793,8 @@ static void get_object_list(int ac, const char **av)
 		}
 		if (handle_revision_arg(line, &revs, flags, REVARG_CANNOT_BE_FILENAME))
 			die("bad revision '%s'", line);
+		if (handle_revision_arg(line, &revs2, flags, REVARG_CANNOT_BE_FILENAME))
+			die("bad revision '%s'", line);
 	}
 
 	if (use_bitmap_index && !get_object_list_from_bitmap(&revs))
@@ -2792,7 +2803,16 @@ static void get_object_list(int ac, const char **av)
 	if (prepare_revision_walk(&revs))
 		die("revision walk setup failed");
 	mark_edges_uninteresting(&revs, show_edge);
-	traverse_commit_list(&revs, show_commit, show_object, NULL);
+
+	if (send_all_commits) {
+		revs2.include_check = on_demand_include_check;
+		traverse_commit_list(&revs2, on_demand_show_commit_tree, NULL,
+				     NULL);
+		reset_revision_walk();
+	}
+
+	traverse_commit_list_extended(&revs, show_commit, show_object,
+				      on_demand_show_tree_check, NULL);
 
 	if (unpack_unreachable_expiration) {
 		revs.ignore_missing_links = 1;
@@ -2928,6 +2948,8 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
 			 N_("use a bitmap index if available to speed up counting objects")),
 		OPT_BOOL(0, "write-bitmap-index", &write_bitmap_index,
 			 N_("write a bitmap index together with the pack index")),
+		OPT_BOOL(0, "send-all-commits", &send_all_commits,
+			 N_("send all commits for on-demand shallow fetches")),
 		OPT_END(),
 	};
 
diff --git a/list-objects.c b/list-objects.c
index f3ca6aa..2607549 100644
--- a/list-objects.c
+++ b/list-objects.c
@@ -183,10 +183,11 @@ static void add_pending_tree(struct rev_info *revs, struct tree *tree)
 	add_pending_object(revs, &tree->object, "");
 }
 
-void traverse_commit_list(struct rev_info *revs,
-			  show_commit_fn show_commit,
-			  show_object_fn show_object,
-			  void *data)
+void traverse_commit_list_extended(struct rev_info *revs,
+				   show_commit_fn show_commit,
+				   show_object_fn show_object,
+				   show_tree_check_fn show_tree_check,
+				   void *data)
 {
 	int i;
 	struct commit *commit;
@@ -198,7 +199,8 @@ void traverse_commit_list(struct rev_info *revs,
 		 * an uninteresting boundary commit may not have its tree
 		 * parsed yet, but we are not going to show them anyway
 		 */
-		if (commit->tree)
+		if (show_object && commit->tree &&
+		    (!show_tree_check || show_tree_check(commit, data)))
 			add_pending_tree(revs, commit->tree);
 		show_commit(commit, data);
 	}
diff --git a/list-objects.h b/list-objects.h
index 0cebf85..e80dc8c 100644
--- a/list-objects.h
+++ b/list-objects.h
@@ -3,7 +3,18 @@
 
 typedef void (*show_commit_fn)(struct commit *, void *);
 typedef void (*show_object_fn)(struct object *, const char *, void *);
-void traverse_commit_list(struct rev_info *, show_commit_fn, show_object_fn, void *);
+typedef int (*show_tree_check_fn)(struct commit *, void *);
+void traverse_commit_list_extended(struct rev_info *, show_commit_fn,
+				   show_object_fn, show_tree_check_fn, void *);
+
+static inline void traverse_commit_list(struct rev_info *revs,
+					show_commit_fn show_commit,
+					show_object_fn show_object,
+					void *data)
+{
+	traverse_commit_list_extended(revs, show_commit, show_object, NULL,
+				      data);
+}
 
 typedef void (*show_edge_fn)(struct commit *);
 void mark_edges_uninteresting(struct rev_info *, show_edge_fn);
diff --git a/object.h b/object.h
index f52957d..25177fd 100644
--- a/object.h
+++ b/object.h
@@ -38,6 +38,7 @@ struct object_array {
  * http-push.c:                            16-----19
  * commit.c:                               16-----19
  * sha1_name.c:                                     20
+ * on_demand.c:                                       21-22
  */
 #define FLAG_BITS  27
 
diff --git a/on_demand.c b/on_demand.c
index a0aaf18..c72d7a5 100644
--- a/on_demand.c
+++ b/on_demand.c
@@ -155,3 +155,29 @@ int object_info_on_demand(const unsigned char *sha1, struct object_info *oi)
 	die("git on-demand: protocol error, "
 	    "unexpected response: '%s'", line);
 }
+
+#define ON_DEMAND_CUTOFF	(1u << 21)
+#define ON_DEMAND_SHOW_TREE	(1u << 22)
+
+void register_on_demand_cutoff(const unsigned char *sha1)
+{
+	struct commit *commit = lookup_commit(sha1);
+	if (commit)
+		commit->object.flags |= ON_DEMAND_CUTOFF;
+}
+
+int on_demand_include_check(struct commit *commit, void *data)
+{
+	return !(commit->object.flags & ON_DEMAND_CUTOFF);
+}
+
+void on_demand_show_commit_tree(struct commit *commit, void *data)
+{
+	commit->object.flags |= ON_DEMAND_SHOW_TREE;
+}
+
+int on_demand_show_tree_check(struct commit *commit, void *data)
+{
+	return !!(commit->object.flags &
+		  (ON_DEMAND_SHOW_TREE|ON_DEMAND_CUTOFF));
+}
diff --git a/on_demand.h b/on_demand.h
index 09a8072..7bbb523 100644
--- a/on_demand.h
+++ b/on_demand.h
@@ -4,5 +4,9 @@
 void *read_remote_on_demand(const unsigned char *sha1, enum object_type *type,
 			    unsigned long *size);
 int object_info_on_demand(const unsigned char *sha1, struct object_info *oi);
+void register_on_demand_cutoff(const unsigned char *sha1);
+int on_demand_include_check(struct commit *commit, void *data);
+void on_demand_show_commit_tree(struct commit *commit, void *data);
+int on_demand_show_tree_check(struct commit *commit, void *data);
 
 #endif
diff --git a/upload-pack.c b/upload-pack.c
index 7597ba3..1b552b4 100644
--- a/upload-pack.c
+++ b/upload-pack.c
@@ -42,6 +42,7 @@ static int multi_ack;
 static int no_done;
 static int use_thin_pack, use_ofs_delta, use_include_tag;
 static int no_progress, daemon_mode;
+static int send_all_commits;
 /* Allow specifying sha1 if it is a ref tip. */
 #define ALLOW_TIP_SHA1	01
 /* Allow request of a sha1 if it is reachable from a ref (possibly hidden ref). */
@@ -130,6 +131,8 @@ static void create_pack_file(void)
 		argv_array_push(&pack_objects.args, "--delta-base-offset");
 	if (use_include_tag)
 		argv_array_push(&pack_objects.args, "--include-tag");
+	if (send_all_commits)
+		argv_array_push(&pack_objects.args, "--send-all-commits");
 
 	pack_objects.in = -1;
 	pack_objects.out = -1;
@@ -820,6 +823,8 @@ static void receive_needs(void)
 			no_progress = 1;
 		if (parse_feature_request(features, "include-tag"))
 			use_include_tag = 1;
+		if (parse_feature_request(features, "on-demand"))
+			send_all_commits = 1;
 
 		o = parse_object(sha1_buf);
 		if (!o)
@@ -924,7 +929,8 @@ static int send_ref(const char *refname, const struct object_id *oid,
 {
 	static const char *capabilities = "multi_ack thin-pack side-band"
 		" side-band-64k ofs-delta shallow deepen-since deepen-not"
-		" deepen-relative no-progress include-tag multi_ack_detailed";
+		" deepen-relative no-progress include-tag multi_ack_detailed"
+		" on-demand";
 	const char *refname_nons = strip_namespace(refname);
 	struct object_id peeled;
 
-- 
2.7.4
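
The two flag bits that patch 3 reserves are easy to lose track of in the revision-walk plumbing, so here is a much simplified model of them. It assumes a linear history stored newest-first and a hypothetical `fake_commit` struct carrying only the flags word. The first loop plays the role of the `revs2` walk with `on_demand_include_check()` and `on_demand_show_commit_tree()`, and `send_tree()` corresponds to `on_demand_show_tree_check()`. In this model every commit would still be sent; only trees are filtered.

```c
#include <stdbool.h>

/* Same bit positions the patch reserves in object.h. */
#define ON_DEMAND_CUTOFF	(1u << 21)
#define ON_DEMAND_SHOW_TREE	(1u << 22)

/* Hypothetical stand-in for struct commit: only the flags matter here. */
struct fake_commit {
	unsigned flags;
};

/* Walk past this commit's parents only if it is not a cutoff commit. */
static bool include_commit(const struct fake_commit *c)
{
	return !(c->flags & ON_DEMAND_CUTOFF);
}

/* Send the tree if the first walk reached the commit or it is a cutoff. */
static bool send_tree(const struct fake_commit *c)
{
	return (c->flags & (ON_DEMAND_SHOW_TREE | ON_DEMAND_CUTOFF)) != 0;
}

/* Model both passes over a linear, newest-first history; count trees sent. */
static int count_trees_sent(struct fake_commit *c, int n)
{
	int i, trees = 0;

	/* Pass 1: mark commits down to and including the first cutoff. */
	for (i = 0; i < n; i++) {
		c[i].flags |= ON_DEMAND_SHOW_TREE;
		if (!include_commit(&c[i]))
			break;
	}
	/* Pass 2: every commit goes into the pack; trees only when marked. */
	for (i = 0; i < n; i++)
		if (send_tree(&c[i]))
			trees++;
	return trees;
}
```

In a five-commit history with the cutoff on the third commit, trees are sent for the first three commits only, while all five commits would still go into the pack.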



* [RFC 4/4] clone: Request on-demand shallow clones
From: Mark Thomas @ 2017-03-04 19:19 UTC
  To: git; +Cc: Mark Thomas

Add the --on-demand option to git-clone, which, when used in combination
with the existing shallow clone options, requests an on-demand shallow
clone.

An on-demand shallow clone contains all commits from all history, but
the commits that would normally be omitted in the shallow clone do not
have their trees or blobs in the repository.  Instead, they will be
fetched on-demand from the remote.

Signed-off-by: Mark Thomas <markbt@efaref.net>
---
 builtin/clone.c | 7 ++++++-
 cache-tree.c    | 2 +-
 fetch-pack.c    | 3 +++
 fetch-pack.h    | 1 +
 shallow.c       | 2 +-
 transport.c     | 3 +++
 transport.h     | 4 ++++
 7 files changed, 19 insertions(+), 3 deletions(-)

diff --git a/builtin/clone.c b/builtin/clone.c
index 3f63edb..7541016 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -40,6 +40,7 @@ static const char * const builtin_clone_usage[] = {
 
 static int option_no_checkout, option_bare, option_mirror, option_single_branch = -1;
 static int option_local = -1, option_no_hardlinks, option_shared, option_recursive;
+static int option_on_demand;
 static int option_shallow_submodules;
 static int deepen;
 static char *option_template, *option_depth, *option_since;
@@ -100,6 +101,8 @@ static struct option builtin_clone_options[] = {
 		    N_("create a shallow clone since a specific time")),
 	OPT_STRING_LIST(0, "shallow-exclude", &option_not, N_("revision"),
 			N_("deepen history of shallow clone, excluding rev")),
+	OPT_BOOL(0, "on-demand", &option_on_demand,
+		 N_("make shallow clone an on-demand clone")),
 	OPT_BOOL(0, "single-branch", &option_single_branch,
 		    N_("clone only one branch, HEAD or --branch")),
 	OPT_BOOL(0, "shallow-submodules", &option_shallow_submodules,
@@ -1045,6 +1048,8 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
 	if (option_not.nr)
 		transport_set_option(transport, TRANS_OPT_DEEPEN_NOT,
 				     (const char *)&option_not);
+	if (option_on_demand)
+		transport_set_option(transport, TRANS_OPT_ON_DEMAND, "1");
 	if (option_single_branch)
 		transport_set_option(transport, TRANS_OPT_FOLLOWTAGS, "1");
 
@@ -1118,7 +1123,7 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
 		transport_fetch_refs(transport, mapped_refs);
 
 	update_remote_refs(refs, mapped_refs, remote_head_points_at,
-			   branch_top.buf, reflog_msg.buf, transport, !is_local);
+			   branch_top.buf, reflog_msg.buf, transport, !is_local && !option_on_demand);
 
 	update_head(our_head_points_at, remote_head, reflog_msg.buf);
 
diff --git a/cache-tree.c b/cache-tree.c
index 345ea35..10b14fe 100644
--- a/cache-tree.c
+++ b/cache-tree.c
@@ -356,7 +356,7 @@ static int update_one(struct cache_tree *it,
 		}
 		if (mode != S_IFGITLINK && !missing_ok && !has_sha1_file(sha1)) {
 			strbuf_release(&buffer);
-			if (expected_missing)
+			if (expected_missing || 1 /*** FIXME: markbt temp hack, to allow missing files ***/)
 				return -1;
 			return error("invalid object %06o %s for '%.*s'",
 				mode, sha1_to_hex(sha1), entlen+baselen, path);
diff --git a/fetch-pack.c b/fetch-pack.c
index e0f5d5c..1dd4823 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -372,6 +372,7 @@ static int find_common(struct fetch_pack_args *args,
 			if (prefer_ofs_delta)   strbuf_addstr(&c, " ofs-delta");
 			if (deepen_since_ok)    strbuf_addstr(&c, " deepen-since");
 			if (deepen_not_ok)      strbuf_addstr(&c, " deepen-not");
+			if (args->on_demand)    strbuf_addstr(&c, " on-demand");
 			if (agent_supported)    strbuf_addf(&c, " agent=%s",
 							    git_user_agent_sanitized());
 			packet_buf_write(&req_buf, "want %s%s\n", remote_hex, c.buf);
@@ -936,6 +937,8 @@ static struct ref *do_fetch_pack(struct fetch_pack_args *args,
 		die(_("Server does not support --shallow-exclude"));
 	if (!server_supports("deepen-relative") && args->deepen_relative)
 		die(_("Server does not support --deepen"));
+	if (!server_supports("on-demand") && args->on_demand)
+		die(_("Server does not support --on-demand"));
 
 	if (everything_local(args, &ref, sought, nr_sought)) {
 		packet_flush(fd[1]);
diff --git a/fetch-pack.h b/fetch-pack.h
index c912e3d..16ab8bd 100644
--- a/fetch-pack.h
+++ b/fetch-pack.h
@@ -29,6 +29,7 @@ struct fetch_pack_args {
 	unsigned cloning:1;
 	unsigned update_shallow:1;
 	unsigned deepen:1;
+	unsigned on_demand:1;
 };
 
 /*
diff --git a/shallow.c b/shallow.c
index 11f7dde..a24292b 100644
--- a/shallow.c
+++ b/shallow.c
@@ -45,7 +45,7 @@ int is_repository_shallow(void)
 	FILE *fp;
 	char buf[1024];
 	const char *path = alternate_shallow_file;
-
+	is_shallow = 0;  /*** FIXME: markbt temp hack to allow shallow repos with on-demand files ***/
 	if (is_shallow >= 0)
 		return is_shallow;
 
diff --git a/transport.c b/transport.c
index 5828e06..69f8d72 100644
--- a/transport.c
+++ b/transport.c
@@ -160,6 +160,8 @@ static int set_git_option(struct git_transport_options *opts,
 	} else if (!strcmp(name, TRANS_OPT_DEEPEN_RELATIVE)) {
 		opts->deepen_relative = !!value;
 		return 0;
+	} else if (!strcmp(name, TRANS_OPT_ON_DEMAND)) {
+		opts->on_demand = !!value;
 	}
 	return 1;
 }
@@ -223,6 +225,7 @@ static int fetch_refs_via_pack(struct transport *transport,
 	args.deepen_since = data->options.deepen_since;
 	args.deepen_not = data->options.deepen_not;
 	args.deepen_relative = data->options.deepen_relative;
+	args.on_demand = data->options.on_demand;
 	args.check_self_contained_and_connected =
 		data->options.check_self_contained_and_connected;
 	args.cloning = transport->cloning;
diff --git a/transport.h b/transport.h
index bc55715..d4f848b 100644
--- a/transport.h
+++ b/transport.h
@@ -15,6 +15,7 @@ struct git_transport_options {
 	unsigned self_contained_and_connected : 1;
 	unsigned update_shallow : 1;
 	unsigned deepen_relative : 1;
+	unsigned on_demand : 1;
 	int depth;
 	const char *deepen_since;
 	const struct string_list *deepen_not;
@@ -210,6 +211,9 @@ void transport_check_allowed(const char *type);
 /* Send push certificates */
 #define TRANS_OPT_PUSH_CERT "pushcert"
 
+/* On-demand clone */
+#define TRANS_OPT_ON_DEMAND "on-demand"
+
 /**
  * Returns 0 if the option was used, non-zero otherwise. Prints a
  * message to stderr if the option is not used.
-- 
2.7.4


* Re: [RFC 0/4] Shallow clones with on-demand fetch
  2017-03-04 19:18 [RFC 0/4] Shallow clones with on-demand fetch Mark Thomas
                   ` (3 preceding siblings ...)
  2017-03-04 19:19 ` [RFC 4/4] clone: Request on-demand shallow clones Mark Thomas
@ 2017-03-06 19:16 ` Jonathan Tan
  2017-03-06 20:01   ` Stefan Beller
  2017-03-06 19:18 ` Junio C Hamano
  5 siblings, 1 reply; 9+ messages in thread
From: Jonathan Tan @ 2017-03-06 19:16 UTC (permalink / raw)
  To: Mark Thomas, git; +Cc: peartben

On 03/04/2017 11:18 AM, Mark Thomas wrote:
> I was inspired a bit by Microsoft's announcement of their Git VFS.  I
> saw that people have talked in the past about making git fetch objects
> from remotes as they are needed, and decided to give it a try.

For reference, one such conversation is [1]. (cc-ing Ben Peart also)

> The patch series adds a "--on-demand" option to git clone, which, when
> used in conjunction with the existing shallow clone operations, clones
> the full history of the repository's commits, but only the files that
> would be included in the shallow clone.
>
> When a file that is missing is required, git requests the file on-demand
> from the remote, via a new 'upload-file' service.

A reachability check (of the blob) might be a good idea. The current Git
implementation already supports fetching a blob (perhaps a bug), but it
has problems with reachability calculations. I tried to fix those in
[2], but found some bugs that weren't easily fixable.
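A minimal sketch of such a blob reachability check, assuming a toy in-memory object graph: `struct toy_obj`, `blob_is_reachable()` and the integer object ids are hypothetical stand-ins, not Git's object API. The server serves a blob only if it can be reached from one of the ref tips it advertises.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Toy object graph (hypothetical, not Git's internal API): each object
 * is an index into a table and "links" lists the objects it points to.
 * A commit links to its tree and parents; a tree links to its entries.
 */
enum obj_type { OBJ_COMMIT, OBJ_TREE, OBJ_BLOB };

#define MAX_OBJS 64
#define MAX_LINKS 8

struct toy_obj {
	enum obj_type type;
	size_t nr_links;
	size_t links[MAX_LINKS];
};

/* Depth-first walk; "seen" is shared so no object is visited twice. */
static bool reaches(const struct toy_obj *objs, size_t from, size_t target,
		    bool *seen)
{
	if (from == target)
		return true;
	if (seen[from])
		return false;
	seen[from] = true;
	for (size_t i = 0; i < objs[from].nr_links; i++)
		if (reaches(objs, objs[from].links[i], target, seen))
			return true;
	return false;
}

/*
 * Allow the request only if "blob" is a blob reachable from at least
 * one advertised tip, so a client cannot fetch arbitrary (possibly
 * confidential) objects merely by knowing their names.
 */
bool blob_is_reachable(const struct toy_obj *objs, size_t nr_objs,
		       const size_t *tips, size_t nr_tips, size_t blob)
{
	bool seen[MAX_OBJS] = { false };

	assert(nr_objs <= MAX_OBJS);
	if (blob >= nr_objs || objs[blob].type != OBJ_BLOB)
		return false;
	for (size_t i = 0; i < nr_tips; i++)
		if (reaches(objs, tips[i], blob, seen))
			return true;
	return false;
}
```

Sharing one `seen` array across all tips keeps the walk linear in the number of objects, which matters because this check would run on every on-demand request.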

As I said in [2], I think that proper fetching of blobs on demand is a 
prerequisite to any sort of missing object tolerance (like your 
on-demand clones), so I haven't thought much about the topics in the 
rest of your patch set.

[1] <20170113155253.1644-1-benpeart@microsoft.com> (you can search for 
emails by Message ID in online archives like 
https://public-inbox.org/git if you don't already have them)
[2] <cover.1487984670.git.jonathantanmy@google.com>

* Re: [RFC 0/4] Shallow clones with on-demand fetch
  2017-03-04 19:18 [RFC 0/4] Shallow clones with on-demand fetch Mark Thomas
                   ` (4 preceding siblings ...)
  2017-03-06 19:16 ` [RFC 0/4] Shallow clones with on-demand fetch Jonathan Tan
@ 2017-03-06 19:18 ` Junio C Hamano
  2017-03-07  9:42   ` Jeff King
  5 siblings, 1 reply; 9+ messages in thread
From: Junio C Hamano @ 2017-03-06 19:18 UTC (permalink / raw)
  To: Mark Thomas; +Cc: git

Mark Thomas <markbt@efaref.net> writes:

> This is a proof-of-concept, so it is in no way complete.  It contains a
> few hacks to make it work, but these can be ironed out with a bit more
> work.  What I have so far is sufficient to try out the idea.

Two things that immediately come to mind (which may or may not be
real issues) are:

 (1) What (if any) security model you have in mind.

     From an object-confidentiality point of view, this should be
     enabled only on a host that already allows
     uploadpack.allowAnySHA1InWant, and it is even riskier than that.

     From a DoS point of view, you can make a short 40-byte request to
     cause the other side to emit megabytes of stuff.  I do not think
     it is a new problem (anybody can repeatedly request a clone of
     large stuff), but there may be new ramifications.

 (2) Whether an interface that asks for just one object at a time
     kills the whole idea due to round-trip latency.

     You may want to be able to say "I want all objects reachable
     from this tree; please give me a packfile of needed objects
     assuming that I have all objects reachable from this other tree
     (or these other trees)".
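The batched request described above amounts to a set difference: everything reachable from the wanted tree, minus everything reachable from the trees the client already has. A minimal sketch over a toy tree/blob graph follows; `struct toy_obj` and `objects_to_send()` are hypothetical names. Git's ordinary fetch already performs a similar want/have computation at commit granularity when it builds a pack.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_OBJS 64
#define MAX_ENTRIES 8

/* Toy tree/blob graph: trees list their entries, blobs have none. */
struct toy_obj {
	size_t nr_entries;
	size_t entries[MAX_ENTRIES];
};

/* Mark an object and everything below it as already present. */
static void mark_all(const struct toy_obj *objs, size_t id, bool *have)
{
	if (have[id])
		return;
	have[id] = true;
	for (size_t i = 0; i < objs[id].nr_entries; i++)
		mark_all(objs, objs[id].entries[i], have);
}

static void collect(const struct toy_obj *objs, size_t id, bool *have,
		    size_t *out, size_t *nr_out)
{
	if (have[id])
		return;	/* client has it (and its subtree), or already queued */
	have[id] = true;
	out[(*nr_out)++] = id;
	for (size_t i = 0; i < objs[id].nr_entries; i++)
		collect(objs, objs[id].entries[i], have, out, nr_out);
}

/*
 * Objects reachable from "want" but not from any tree in "haves": what
 * one batched reply (e.g. a single packfile) would carry.  "out" must
 * have room for nr_objs ids.  Returns the number of ids written.
 */
size_t objects_to_send(const struct toy_obj *objs, size_t nr_objs,
		       size_t want, const size_t *haves, size_t nr_haves,
		       size_t *out)
{
	bool have[MAX_OBJS] = { false };
	size_t nr_out = 0;

	assert(nr_objs <= MAX_OBJS);
	for (size_t i = 0; i < nr_haves; i++)
		mark_all(objs, haves[i], have);
	collect(objs, want, have, out, &nr_out);
	return nr_out;
}
```

Because an unchanged subtree is skipped as soon as its root is found in the "have" set, one request can cover a whole directory at the cost of a single round trip.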


* Re: [RFC 0/4] Shallow clones with on-demand fetch
  2017-03-06 19:16 ` [RFC 0/4] Shallow clones with on-demand fetch Jonathan Tan
@ 2017-03-06 20:01   ` Stefan Beller
  0 siblings, 0 replies; 9+ messages in thread
From: Stefan Beller @ 2017-03-06 20:01 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: Mark Thomas, git@vger.kernel.org, Ben Peart

On Mon, Mar 6, 2017 at 11:16 AM, Jonathan Tan <jonathantanmy@google.com> wrote:

> [1] <20170113155253.1644-1-benpeart@microsoft.com> (you can search for
> emails by Message ID in online archives like https://public-inbox.org/git if
> you don't already have them)

Not just search; you can also look a message up directly at
 https://public-inbox.org/git/<message-id>
so
 https://public-inbox.org/git/20170113155253.1644-1-benpeart@microsoft.com

> [2] <cover.1487984670.git.jonathantanmy@google.com>

and
  https://public-inbox.org/git/cover.1487984670.git.jonathantanmy@google.com
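The direct lookup amounts to stripping the angle brackets from the Message-ID and appending it to the archive prefix. A small sketch; `message_id_url()` is a hypothetical helper, not part of Git or public-inbox.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Turn a Message-ID as quoted in mail ("<id@host>") into a direct
 * public-inbox.org lookup URL.  Returns 0 on success, -1 if "buf" is
 * too small to hold the whole URL.
 */
int message_id_url(const char *msgid, char *buf, size_t bufsize)
{
	size_t len;
	int n;

	assert(msgid != NULL && buf != NULL);
	len = strlen(msgid);
	/* Drop the surrounding angle brackets, if present. */
	if (len >= 2 && msgid[0] == '<' && msgid[len - 1] == '>') {
		msgid++;
		len -= 2;
	}
	n = snprintf(buf, bufsize, "https://public-inbox.org/git/%.*s",
		     (int)len, msgid);
	return (n < 0 || (size_t)n >= bufsize) ? -1 : 0;
}
```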

* Re: [RFC 0/4] Shallow clones with on-demand fetch
  2017-03-06 19:18 ` Junio C Hamano
@ 2017-03-07  9:42   ` Jeff King
  0 siblings, 0 replies; 9+ messages in thread
From: Jeff King @ 2017-03-07  9:42 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Mark Thomas, git

On Mon, Mar 06, 2017 at 11:18:30AM -0800, Junio C Hamano wrote:

> Mark Thomas <markbt@efaref.net> writes:
> 
> > This is a proof-of-concept, so it is in no way complete.  It contains a
> > few hacks to make it work, but these can be ironed out with a bit more
> > work.  What I have so far is sufficient to try out the idea.
> 
> Two things that immediately come to mind (which may or may not be
> real issues) are:
>
>  (1) What (if any) security model you have in mind.
>
>      From an object-confidentiality point of view, this should be
>      enabled only on a host that already allows
>      uploadpack.allowAnySHA1InWant, and it is even riskier than that.
>
>      From a DoS point of view, you can make a short 40-byte request to
>      cause the other side to emit megabytes of stuff.  I do not think
>      it is a new problem (anybody can repeatedly request a clone of
>      large stuff), but there may be new ramifications.
>
>  (2) Whether an interface that asks for just one object at a time
>      kills the whole idea due to round-trip latency.
> 
>      You may want to be able to say "I want all objects reachable
>      from this tree; please give me a packfile of needed objects
>      assuming that I have all objects reachable from this other tree
>      (or these other trees)".

Not just latency, but you also lose all of the benefits of delta
compression. So if I asked for:

  git log -p -- foo.c

and git is going to fault in all of the various versions of foo.c over
time, it's _much_ more efficient to batch them into a single request, so
that the server can reuse on-disk deltas between the various versions.
That makes the transmission smaller, and it also makes it more likely
for the server to be able to transmit the bits straight off the disk
(rather than assembling each delta itself then zlib-compressing the
result).
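A back-of-the-envelope model of the two costs described above: one round trip and one full copy per version, versus a single batched reply carrying one full copy plus deltas. The function names and every parameter are illustrative assumptions; the model ignores zlib compression and server CPU time.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Rough model of fetching n versions of one file.  All inputs are
 * caller-supplied assumptions: rtt in seconds, sizes in bytes,
 * bandwidth in bytes per second.
 */

/* One request per version: a round trip plus a full copy each time. */
double piecemeal_seconds(size_t n, double rtt, double full_size,
			 double bandwidth)
{
	assert(bandwidth > 0);
	return (double)n * (rtt + full_size / bandwidth);
}

/*
 * One batched request: a single round trip, one full copy, and a delta
 * for each of the remaining n - 1 versions.
 */
double batched_seconds(size_t n, double rtt, double full_size,
		       double delta_size, double bandwidth)
{
	assert(bandwidth > 0);
	if (n == 0)
		return 0.0;
	return rtt + (full_size + (double)(n - 1) * delta_size) / bandwidth;
}
```

With hypothetical inputs of 100 versions, a 50 ms round trip, 20,000-byte full copies, 500-byte deltas and 1,000,000 bytes/s, `piecemeal_seconds()` gives 7 s and `batched_seconds()` about 0.12 s; the round trips and the lost deltas both contribute to the gap.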

Similarly, there's a latency tension in just finding out whether an
object exists. When we call has_sha1_file() as part of a fetch, for
example, we really want to be able to answer it quickly. So you'd
probably want some mechanism to say "tell me the sha1, type, and size
of each object I _could_ get via upload-file". The size of that data is
far from trivial for a large repository, but you're probably better off
getting it once than paying the latency cost to fetch it piecemeal.
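The "fetch it once" idea can be sketched as a manifest of (sha1, type, size) entries that is downloaded once, sorted, and then searched locally, so an existence check needs no network round trip. The `remote_obj_info` layout, `manifest_sort()` and `manifest_lookup()` are hypothetical; they are not part of the upload-file protocol in this series.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define RAW_SHA1_LEN 20

/* One entry of a hypothetical "objects I could fetch" manifest. */
struct remote_obj_info {
	unsigned char sha1[RAW_SHA1_LEN];
	int type;		/* object type code (commit/tree/blob/tag) */
	unsigned long size;	/* uncompressed object size in bytes */
};

static int cmp_info(const void *a, const void *b)
{
	const struct remote_obj_info *x = a, *y = b;

	return memcmp(x->sha1, y->sha1, RAW_SHA1_LEN);
}

/* Sort once, right after the manifest has been downloaded. */
void manifest_sort(struct remote_obj_info *entries, size_t nr)
{
	qsort(entries, nr, sizeof(*entries), cmp_info);
}

/*
 * Answer "could I fetch this object, and how big is it?" locally by
 * binary search.  Returns NULL if the remote does not list the object.
 */
const struct remote_obj_info *manifest_lookup(
	const struct remote_obj_info *entries, size_t nr,
	const unsigned char *sha1)
{
	struct remote_obj_info key;

	assert(sha1 != NULL);
	memcpy(key.sha1, sha1, RAW_SHA1_LEN);
	return bsearch(&key, entries, nr, sizeof(*entries), cmp_info);
}
```

Each lookup is O(log n) in memory, at the price of storing one entry per remote object, which is the space cost mentioned above.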

-Peff

Thread overview: 9+ messages
2017-03-04 19:18 [RFC 0/4] Shallow clones with on-demand fetch Mark Thomas
2017-03-04 19:18 ` [RFC 1/4] upload-file: Add upload-file command Mark Thomas
2017-03-04 19:18 ` [RFC 2/4] on-demand: Fetch missing files from remote Mark Thomas
2017-03-04 19:19 ` [RFC 3/4] upload-pack: Send all commits if client requests on-demand Mark Thomas
2017-03-04 19:19 ` [RFC 4/4] clone: Request on-demand shallow clones Mark Thomas
2017-03-06 19:16 ` [RFC 0/4] Shallow clones with on-demand fetch Jonathan Tan
2017-03-06 20:01   ` Stefan Beller
2017-03-06 19:18 ` Junio C Hamano
2017-03-07  9:42   ` Jeff King
