git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Jonathan Tan <jonathantanmy@google.com>
To: git@vger.kernel.org
Cc: Jonathan Tan <jonathantanmy@google.com>,
	jrnieder@gmail.com, sbeller@google.com, git@jeffhostetler.com,
	peartben@gmail.com, philipoakley@iee.org
Subject: [RFC PATCH v2 4/4] sha1_file: support promised object hook
Date: Wed, 19 Jul 2017 17:21:47 -0700	[thread overview]
Message-ID: <a7fa848672281e29bff01f8caf2abe9f6ce5f4ba.1500508695.git.jonathantanmy@google.com> (raw)
In-Reply-To: <cover.1500508695.git.jonathantanmy@google.com>
In-Reply-To: <cover.1500508695.git.jonathantanmy@google.com>

Teach sha1_file to invoke a hook whenever an object is requested and
unavailable but is promised. The hook is a shell command that can be
configured through "git config"; this hook takes in a list of hashes and
writes (if successful) the corresponding objects to the repo's local
storage.

The usage of the hook can be suppressed through a flag when invoking
has_object_file_with_flags() and other similar functions.
parse_or_promise_object() in object.c requires this functionality, and
has been modified to use it.

This is meant as a temporary measure to ensure that all Git commands
work in such a situation. Future patches will update some commands to
either tolerate promised objects (without invoking the hook) or be more
efficient in invoking the promised objects hook.

In order to determine the code changes in sha1_file.c necessary, I
investigated the following:
 (1) functions in sha1_file that take in a hash, without the user
     regarding how the object is stored (loose or packed)
 (2) functions in sha1_file that operate on packed objects (because I
     need to check callers that know about the loose/packed distinction
     and operate on both differently, and ensure that they can handle
     the concept of objects that are neither loose nor packed)

(1) is handled by the modification to sha1_object_info_extended().

For (2), I looked at for_each_packed_object and at the packed-related
functions that take in a hash. For for_each_packed_object, the callers
either already work or are fixed in this patch:
 - reachable - only to find recent objects
 - builtin/fsck - already knows about promised objects
 - builtin/cat-file - fixed in this commit

Callers of the other functions do not need to be changed:
 - parse_pack_index
   - http - indirectly from http_get_info_packs
 - find_pack_entry_one
   - this searches a single pack that is provided as an argument; the
     caller already knows (through other means) that the sought object
     is in a specific pack
 - find_sha1_pack
   - fast-import - appears to be an optimization to not store a
     file if it is already in a pack
   - http-walker - to search through a struct alt_base
   - http-push - to search through remote packs
 - has_sha1_pack
   - builtin/fsck - already knows about promised objects
   - builtin/count-objects - informational purposes only (check if loose
     object is also packed)
   - builtin/prune-packed - check if object to be pruned is packed (if
     not, don't prune it)
   - revision - used to exclude packed objects if requested by user
   - diff - just for optimization

An alternative design that I considered but rejected:

 - Adding a hook whenever a packed object is requested, not on any
   object.  That is, whenever we attempt to search the packfiles for an
   object, if it is missing (from the packfiles and from the loose
   object storage), to invoke the hook (which must then store it as a
   packfile), open the packfile the hook generated, and report that the
   object is found in that new packfile. This reduces the amount of
   analysis needed (in that we only need to look at how packed objects
   are handled), but requires that the hook generate packfiles (or for
   sha1_file to pack whatever loose objects are generated), creating one
   packfile for each missing object and potentially very many packfiles
   that must be linearly searched. This may be tolerable now for repos
   that only have a few missing objects (for example, repos that only
   want to exclude large blobs), and might be tolerable in the future if
   we have batching support for the most commonly used commands, but is
   not tolerable now for repos that exclude a large amount of objects.

Helped-by: Ben Peart <benpeart@microsoft.com>
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
 Documentation/config.txt                         |   8 +
 Documentation/gitrepository-layout.txt           |   8 +
 Documentation/technical/read-object-protocol.txt | 102 ++++++++++++
 builtin/cat-file.c                               |   9 ++
 cache.h                                          |   2 +
 object.c                                         |   3 +-
 promised-object.c                                | 194 +++++++++++++++++++++++
 promised-object.h                                |  12 ++
 sha1_file.c                                      |  44 +++--
 t/t3907-promised-object.sh                       |  32 ++++
 t/t3907/read-object                              | 114 +++++++++++++
 11 files changed, 513 insertions(+), 15 deletions(-)
 create mode 100644 Documentation/technical/read-object-protocol.txt
 create mode 100755 t/t3907/read-object

diff --git a/Documentation/config.txt b/Documentation/config.txt
index d5c9c4cab..c293ac921 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -393,6 +393,14 @@ The default is false, except linkgit:git-clone[1] or linkgit:git-init[1]
 will probe and set core.ignoreCase true if appropriate when the repository
 is created.
 
+core.promisedBlobCommand::
+	If set, whenever a blob in the local repo is attempted to be read, but
+	is both missing and a promised blob, invoke this shell command to
+	generate or obtain that blob before reporting an error. This shell
+	command should take one or more hashes, each terminated by a newline,
+	as standard input, and (if successful) should write the corresponding
+	objects to the local repo (packed or loose).
+
 core.precomposeUnicode::
 	This option is only used by Mac OS implementation of Git.
 	When core.precomposeUnicode=true, Git reverts the unicode decomposition
diff --git a/Documentation/gitrepository-layout.txt b/Documentation/gitrepository-layout.txt
index f51ed4e37..7dea7fe6b 100644
--- a/Documentation/gitrepository-layout.txt
+++ b/Documentation/gitrepository-layout.txt
@@ -47,6 +47,10 @@ use with dumb transports but otherwise is OK as long as
 `objects/info/alternates` points at the object stores it
 borrows from.
 +
+. You could have objects that are merely promised by another source.
+When Git requires those objects, it will invoke the command in the
+`extensions.promisedObjects` configuration variable.
++
 This directory is ignored if $GIT_COMMON_DIR is set and
 "$GIT_COMMON_DIR/objects" will be used instead.
 
@@ -91,6 +95,10 @@ objects/info/http-alternates::
 	this object store borrows objects from, to be used when
 	the repository is fetched over HTTP.
 
+objects/promised::
+	This file records the sha1 object names, types, and sizes of
+	promised objects.
+
 refs::
 	References are stored in subdirectories of this
 	directory.  The 'git prune' command knows to preserve
diff --git a/Documentation/technical/read-object-protocol.txt b/Documentation/technical/read-object-protocol.txt
new file mode 100644
index 000000000..a893b46e7
--- /dev/null
+++ b/Documentation/technical/read-object-protocol.txt
@@ -0,0 +1,102 @@
+Read Object Process
+^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The read-object process enables Git to read all missing blobs with a
+single process invocation for the entire life of a single Git command.
+This is achieved by using a packet format (pkt-line, see technical/
+protocol-common.txt) based protocol over standard input and standard
+output as follows. All packets, except for the "*CONTENT" packets and
+the "0000" flush packet, are considered text and therefore are
+terminated by a LF.
+
+Git starts the process when it encounters the first missing object that
+needs to be retrieved. After the process is started, Git sends a welcome
+message ("git-read-object-client"), a list of supported protocol version
+numbers, and a flush packet. Git expects to read a welcome response
+message ("git-read-object-server"), exactly one protocol version number
+from the previously sent list, and a flush packet. All further
+communication will be based on the selected version.
+
+The remaining protocol description below documents "version=1". Please
+note that "version=42" in the example below does not exist and is only
+there to illustrate how the protocol would look with more than one
+version.
+
+After the version negotiation Git sends a list of all capabilities that
+it supports and a flush packet. Git expects to read a list of desired
+capabilities, which must be a subset of the supported capabilities list,
+and a flush packet as response:
+------------------------
+packet: git> git-read-object-client
+packet: git> version=1
+packet: git> version=42
+packet: git> 0000
+packet: git< git-read-object-server
+packet: git< version=1
+packet: git< 0000
+packet: git> capability=get
+packet: git> capability=have
+packet: git> capability=put
+packet: git> capability=not-yet-invented
+packet: git> 0000
+packet: git< capability=get
+packet: git< 0000
+------------------------
+The only supported capability in version 1 is "get".
+
+Afterwards Git sends a list of "key=value" pairs terminated with a flush
+packet. The list will contain at least the command (based on the
+supported capabilities) and the sha1 of the object to retrieve. Please
+note, that the process must not send any response before it received the
+final flush packet.
+
+When the process receives the "get" command, it should make the requested
+object available in the git object store and then return success. Git will
+then check the object store again and this time find it and proceed.
+------------------------
+packet: git> command=get
+packet: git> sha1=0a214a649e1b3d5011e14a3dc227753f2bd2be05
+packet: git> 0000
+------------------------
+
+The process is expected to respond with a list of "key=value" pairs
+terminated with a flush packet. If the process does not experience
+problems then the list must contain a "success" status.
+------------------------
+packet: git< status=success
+packet: git< 0000
+------------------------
+
+In case the process cannot or does not want to process the content, it
+is expected to respond with an "error" status.
+------------------------
+packet: git< status=error
+packet: git< 0000
+------------------------
+
+In case the process cannot or does not want to process the content as
+well as any future content for the lifetime of the Git process, then it
+is expected to respond with an "abort" status at any point in the
+protocol.
+------------------------
+packet: git< status=abort
+packet: git< 0000
+------------------------
+
+Git neither stops nor restarts the process in case the "error"/"abort"
+status is set.
+
+If the process dies during the communication or does not adhere to the
+protocol then Git will stop the process and restart it with the next
+object that needs to be processed.
+
+After the read-object process has processed an object it is expected to
+wait for the next "key=value" list containing a command. Git will close
+the command pipe on exit. The process is expected to detect EOF and exit
+gracefully on its own. Git will wait until the process has stopped.
+
+A long running read-object process demo implementation can be found in
+`contrib/long-running-read-object/example.pl` located in the Git core
+repository. If you develop your own long running process then the
+`GIT_TRACE_PACKET` environment variables can be very helpful for
+debugging (see linkgit:git[1]).
diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index 96b786e48..33f636926 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -12,6 +12,7 @@
 #include "streaming.h"
 #include "tree-walk.h"
 #include "sha1-array.h"
+#include "promised-object.h"
 
 struct batch_options {
 	int enabled;
@@ -432,6 +433,13 @@ static int batch_packed_object(const struct object_id *oid,
 	return 0;
 }
 
+static int batch_promised_object(const struct object_id *oid,
+				 void *data)
+{
+	oid_array_append(data, oid);
+	return 0;
+}
+
 static int batch_objects(struct batch_options *opt)
 {
 	struct strbuf buf = STRBUF_INIT;
@@ -473,6 +481,7 @@ static int batch_objects(struct batch_options *opt)
 
 		for_each_loose_object(batch_loose_object, &sa, 0);
 		for_each_packed_object(batch_packed_object, &sa, 0);
+		for_each_promised_object(batch_promised_object, &sa);
 
 		cb.opt = opt;
 		cb.expand = &data;
diff --git a/cache.h b/cache.h
index dd94b5ffc..75b71f38b 100644
--- a/cache.h
+++ b/cache.h
@@ -1835,6 +1835,8 @@ struct object_info {
 #define OBJECT_INFO_SKIP_CACHED 4
 /* Do not retry packed storage after checking packed and loose storage */
 #define OBJECT_INFO_QUICK 8
+/* Ignore list of promised objects */
+#define OBJECT_INFO_IGNORE_PROMISES 16
 extern int sha1_object_info_extended(const unsigned char *, struct object_info *, unsigned flags);
 extern int packed_object_info(struct packed_git *pack, off_t offset, struct object_info *);
 
diff --git a/object.c b/object.c
index 0aeb95084..23f2a6cbc 100644
--- a/object.c
+++ b/object.c
@@ -285,7 +285,8 @@ struct object *parse_or_promise_object(const struct object_id *oid)
 {
 	enum object_type type;
 
-	if (has_object_file(oid))
+	if (has_object_file_with_flags(oid, OBJECT_INFO_SKIP_CACHED |
+					    OBJECT_INFO_IGNORE_PROMISES))
 		return parse_object(oid);
 
 	if (is_promised_object(oid, &type, NULL)) {
diff --git a/promised-object.c b/promised-object.c
index 487ade437..d8d95ebb2 100644
--- a/promised-object.c
+++ b/promised-object.c
@@ -2,6 +2,12 @@
 #include "promised-object.h"
 #include "sha1-lookup.h"
 #include "strbuf.h"
+#include "run-command.h"
+#include "sha1-array.h"
+#include "config.h"
+#include "sigchain.h"
+#include "sub-process.h"
+#include "pkt-line.h"
 
 #define ENTRY_SIZE (GIT_SHA1_RAWSZ + 1 + 8)
 /*
@@ -128,3 +134,191 @@ int fsck_promised_objects(void)
 	}
 	return 0;
 }
+
+#define CAP_GET    (1u<<0)
+
+static int subprocess_map_initialized;
+static struct hashmap subprocess_map;
+
+struct read_object_process {
+	struct subprocess_entry subprocess;
+	unsigned int supported_capabilities;
+};
+
+int start_read_object_fn(struct subprocess_entry *subprocess)
+{
+	int err;
+	struct read_object_process *entry = (struct read_object_process *)subprocess;
+	struct child_process *process;
+	struct string_list cap_list = STRING_LIST_INIT_NODUP;
+	char *cap_buf;
+	const char *cap_name;
+
+	process = subprocess_get_child_process(&entry->subprocess);
+
+	sigchain_push(SIGPIPE, SIG_IGN);
+
+	err = packet_writel(process->in, "git-read-object-client", "version=1", NULL);
+	if (err)
+		goto done;
+
+	err = strcmp(packet_read_line(process->out, NULL), "git-read-object-server");
+	if (err) {
+		error("external process '%s' does not support read-object protocol version 1", subprocess->cmd);
+		goto done;
+	}
+	err = strcmp(packet_read_line(process->out, NULL), "version=1");
+	if (err)
+		goto done;
+	err = packet_read_line(process->out, NULL) != NULL;
+	if (err)
+		goto done;
+
+	err = packet_writel(process->in, "capability=get", NULL);
+	if (err)
+		goto done;
+
+	for (;;) {
+		cap_buf = packet_read_line(process->out, NULL);
+		if (!cap_buf)
+			break;
+		string_list_split_in_place(&cap_list, cap_buf, '=', 1);
+
+		if (cap_list.nr != 2 || strcmp(cap_list.items[0].string, "capability"))
+			continue;
+
+		cap_name = cap_list.items[1].string;
+		if (!strcmp(cap_name, "get")) {
+			entry->supported_capabilities |= CAP_GET;
+		}
+		else {
+			warning(
+				"external process '%s' requested unsupported read-object capability '%s'",
+				subprocess->cmd, cap_name
+			);
+		}
+
+		string_list_clear(&cap_list, 0);
+	}
+
+done:
+	sigchain_pop(SIGPIPE);
+
+	if (err || errno == EPIPE)
+		return err ? err : errno;
+
+	return 0;
+}
+
+static int read_object_process(const unsigned char *sha1)
+{
+	int err;
+	struct read_object_process *entry;
+	struct child_process *process;
+	struct strbuf status = STRBUF_INIT;
+	uint64_t start;
+
+	start = getnanotime();
+
+	if (!repository_format_promised_objects)
+		die("BUG: if extensions.promisedObjects is not set, there "
+		    "should not be any promised objects");
+
+	if (!subprocess_map_initialized) {
+		subprocess_map_initialized = 1;
+		hashmap_init(&subprocess_map, (hashmap_cmp_fn)cmd2process_cmp, 0);
+		entry = NULL;
+	} else {
+		entry = (struct read_object_process *)subprocess_find_entry(&subprocess_map, repository_format_promised_objects);
+	}
+	if (!entry) {
+		entry = xmalloc(sizeof(*entry));
+		entry->supported_capabilities = 0;
+
+		if (subprocess_start(&subprocess_map, &entry->subprocess, repository_format_promised_objects, start_read_object_fn)) {
+			free(entry);
+			return -1;
+		}
+	}
+	process = subprocess_get_child_process(&entry->subprocess);
+
+	if (!(CAP_GET & entry->supported_capabilities))
+		return -1;
+
+	sigchain_push(SIGPIPE, SIG_IGN);
+
+	err = packet_write_fmt_gently(process->in, "command=get\n");
+	if (err)
+		goto done;
+
+	err = packet_write_fmt_gently(process->in, "sha1=%s\n", sha1_to_hex(sha1));
+	if (err)
+		goto done;
+
+	err = packet_flush_gently(process->in);
+	if (err)
+		goto done;
+
+	err = subprocess_read_status(process->out, &status);
+	err = err ? err : strcmp(status.buf, "success");
+
+done:
+	sigchain_pop(SIGPIPE);
+
+	if (err || errno == EPIPE) {
+		err = err ? err : errno;
+		if (!strcmp(status.buf, "error")) {
+			/* The process signaled a problem with the file. */
+		}
+		else if (!strcmp(status.buf, "abort")) {
+			/*
+			 * The process signaled a permanent problem. Don't try to read
+			 * objects with the same command for the lifetime of the current
+			 * Git process.
+			 */
+			entry->supported_capabilities &= ~CAP_GET;
+		}
+		else {
+			/*
+			 * Something went wrong with the read-object process.
+			 * Force shutdown and restart if needed.
+			 */
+			error("external process '%s' failed", repository_format_promised_objects);
+			subprocess_stop(&subprocess_map, (struct subprocess_entry *)entry);
+			free(entry);
+		}
+	}
+
+	trace_performance_since(start, "read_object_process");
+
+	return err;
+}
+
+int request_promised_objects(const struct oid_array *oids)
+{
+	int oids_requested = 0;
+	int i;
+
+	for (i = 0; i < oids->nr; i++) {
+		if (is_promised_object(&oids->oid[i], NULL, NULL))
+			break;
+	}
+
+	if (i == oids->nr)
+		/* Nothing to fetch */
+		return 0;
+
+	for (; i < oids->nr; i++) {
+		if (is_promised_object(&oids->oid[i], NULL, NULL)) {
+			read_object_process(oids->oid[i].hash);
+			oids_requested++;
+		}
+	}
+
+	/*
+	 * The command above may have updated packfiles, so update our record
+	 * of them.
+	 */
+	reprepare_packed_git();
+	return oids_requested;
+}
diff --git a/promised-object.h b/promised-object.h
index 7eaedff17..8ad47aa4c 100644
--- a/promised-object.h
+++ b/promised-object.h
@@ -2,6 +2,7 @@
 #define PROMISED_OBJECT_H
 
 #include "cache.h"
+#include "sha1-array.h"
 
 /*
  * Returns 1 if oid is the name of a promised object. For non-blobs, 0 is
@@ -19,4 +20,15 @@ int for_each_promised_object(each_promised_object_fn, void *);
  */
 int fsck_promised_objects(void);
 
+/*
+ * If any of the given objects are promised objects, invokes
+ * core.promisedobjectcommand with those objects and returns the number of
+ * objects requested. No check is made as to whether the invocation actually
+ * populated the repository with the promised objects.
+ *
+ * If none of the given objects are promised objects, this function does not
+ * invoke anything and returns 0.
+ */
+int request_promised_objects(const struct oid_array *oids);
+
 #endif
diff --git a/sha1_file.c b/sha1_file.c
index 5862386cd..ded0ef46b 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -28,6 +28,11 @@
 #include "list.h"
 #include "mergesort.h"
 #include "quote.h"
+#include "iterator.h"
+#include "dir-iterator.h"
+#include "sha1-lookup.h"
+#include "promised-object.h"
+#include "sha1-array.h"
 
 #define SZ_FMT PRIuMAX
 static inline uintmax_t sz_fmt(size_t s) { return s; }
@@ -2983,6 +2988,7 @@ int sha1_object_info_extended(const unsigned char *sha1, struct object_info *oi,
 	const unsigned char *real = (flags & OBJECT_INFO_LOOKUP_REPLACE) ?
 				    lookup_replace_object(sha1) :
 				    sha1;
+	int already_retried = 0;
 
 	if (!oi)
 		oi = &blank_oi;
@@ -3007,30 +3013,40 @@ int sha1_object_info_extended(const unsigned char *sha1, struct object_info *oi,
 		}
 	}
 
-	if (!find_pack_entry(real, &e)) {
-		/* Most likely it's a loose object. */
-		if (!sha1_loose_object_info(real, oi, flags)) {
-			oi->whence = OI_LOOSE;
-			return 0;
-		}
+retry:
+	if (find_pack_entry(real, &e))
+		goto found_packed;
 
-		/* Not a loose object; someone else may have just packed it. */
-		if (flags & OBJECT_INFO_QUICK) {
-			return -1;
-		} else {
-			reprepare_packed_git();
-			if (!find_pack_entry(real, &e))
-				return -1;
+	/* Most likely it's a loose object. */
+	if (!sha1_loose_object_info(real, oi, flags)) {
+		oi->whence = OI_LOOSE;
+		return 0;
+	}
+
+	/* Not a loose object; someone else may have just packed it. */
+	reprepare_packed_git();
+	if (find_pack_entry(real, &e))
+		goto found_packed;
+
+	/* Check if it is a promised blob */
+	if (!already_retried && !(flags & OBJECT_INFO_IGNORE_PROMISES)) {
+		struct oid_array promised = OID_ARRAY_INIT;
+		oid_array_append_sha1(&promised, real);
+		if (request_promised_objects(&promised)) {
+			already_retried = 1;
+			goto retry;
 		}
 	}
 
+	return -1;
+
+found_packed:
 	if (oi == &blank_oi)
 		/*
 		 * We know that the caller doesn't actually need the
 		 * information below, so return early.
 		 */
 		return 0;
-
 	rtype = packed_object_info(e.p, e.offset, oi);
 	if (rtype < 0) {
 		mark_bad_packed_object(e.p, real);
diff --git a/t/t3907-promised-object.sh b/t/t3907-promised-object.sh
index 3e0caf4f9..d9e6a6486 100755
--- a/t/t3907-promised-object.sh
+++ b/t/t3907-promised-object.sh
@@ -38,4 +38,36 @@ test_expect_success '...but fails again with GIT_IGNORE_PROMISED_OBJECTS' '
 	unset GIT_IGNORE_PROMISED_OBJECTS
 '
 
+test_expect_success 'sha1_object_info_extended (through git cat-file)' '
+	test_create_repo server &&
+	test_commit -C server 1 1.t abcdefgh &&
+	HASH=$(git hash-object server/1.t) &&
+
+	test_create_repo client &&
+	test_must_fail git -C client cat-file -p "$HASH"
+'
+
+test_expect_success '...succeeds if it is a promised object' '
+	printf "%s03%016x" "$HASH" "$(wc -c <server/1.t)" |
+		hex_pack >client/.git/objects/promised &&
+	git -C client config core.repositoryformatversion 1 &&
+	git -C client config extensions.promisedobjects \
+		"\"$TEST_DIRECTORY/t3907/read-object\" \"$(pwd)/server/.git\"" &&
+	git -C client cat-file -p "$HASH"
+'
+
+test_expect_success 'cat-file --batch-all-objects with promised objects' '
+	rm -rf client &&
+	test_create_repo client &&
+	git -C client config core.repositoryformatversion 1 &&
+	git -C client config extensions.promisedobjects \
+		"\"$TEST_DIRECTORY/t3907/read-object\" \"$(pwd)/server/.git\"" &&
+	printf "%s03%016x" "$HASH" "$(wc -c <server/1.t)" |
+		hex_pack >client/.git/objects/promised &&
+
+	# Verify that the promised object is printed
+	git -C client cat-file --batch --batch-all-objects | tee out |
+		grep abcdefgh
+'
+
 test_done
diff --git a/t/t3907/read-object b/t/t3907/read-object
new file mode 100755
index 000000000..9666ad597
--- /dev/null
+++ b/t/t3907/read-object
@@ -0,0 +1,114 @@
+#!/usr/bin/perl
+#
+# Example implementation for the Git read-object protocol version 1
+# See Documentation/technical/read-object-protocol.txt
+#
+# Allows you to test the ability for blobs to be pulled from a host git repo
+# "on demand."  Called when git needs a blob it couldn't find locally due to
+# a lazy clone that only cloned the commits and trees.
+#
+# A lazy clone can be simulated via the following commands from the host repo
+# you wish to create a lazy clone of:
+#
+# cd /host_repo
+# git rev-parse HEAD
+# git init /guest_repo
+# git cat-file --batch-check --batch-all-objects | grep -v 'blob' |
+#	cut -d' ' -f1 | git pack-objects /guest_repo/.git/objects/pack/noblobs
+# cd /guest_repo
+# git config core.virtualizeobjects true
+# git reset --hard <sha from rev-parse call above>
+#
+# Please note, this sample is a minimal skeleton. No proper error handling
+# was implemented.
+#
+
+use strict;
+use warnings;
+
+#
+# Point $DIR to the folder where your host git repo is located so we can pull
+# missing objects from it
+#
+my $DIR = $ARGV[0];
+
+sub packet_bin_read {
+	my $buffer;
+	my $bytes_read = read STDIN, $buffer, 4;
+	if ( $bytes_read == 0 ) {
+
+		# EOF - Git stopped talking to us!
+		exit();
+	}
+	elsif ( $bytes_read != 4 ) {
+		die "invalid packet: '$buffer'";
+	}
+	my $pkt_size = hex($buffer);
+	if ( $pkt_size == 0 ) {
+		return ( 1, "" );
+	}
+	elsif ( $pkt_size > 4 ) {
+		my $content_size = $pkt_size - 4;
+		$bytes_read = read STDIN, $buffer, $content_size;
+		if ( $bytes_read != $content_size ) {
+			die "invalid packet ($content_size bytes expected; $bytes_read bytes read)";
+		}
+		return ( 0, $buffer );
+	}
+	else {
+		die "invalid packet size: $pkt_size";
+	}
+}
+
+sub packet_txt_read {
+	my ( $res, $buf ) = packet_bin_read();
+	unless ( $buf =~ s/\n$// ) {
+		die "A non-binary line MUST be terminated by an LF.";
+	}
+	return ( $res, $buf );
+}
+
+sub packet_bin_write {
+	my $buf = shift;
+	print STDOUT sprintf( "%04x", length($buf) + 4 );
+	print STDOUT $buf;
+	STDOUT->flush();
+}
+
+sub packet_txt_write {
+	packet_bin_write( $_[0] . "\n" );
+}
+
+sub packet_flush {
+	print STDOUT sprintf( "%04x", 0 );
+	STDOUT->flush();
+}
+
+( packet_txt_read() eq ( 0, "git-read-object-client" ) ) || die "bad initialize";
+( packet_txt_read() eq ( 0, "version=1" ) )				 || die "bad version";
+( packet_bin_read() eq ( 1, "" ) )                       || die "bad version end";
+
+packet_txt_write("git-read-object-server");
+packet_txt_write("version=1");
+packet_flush();
+
+( packet_txt_read() eq ( 0, "capability=get" ) )    || die "bad capability";
+( packet_bin_read() eq ( 1, "" ) )                  || die "bad capability end";
+
+packet_txt_write("capability=get");
+packet_flush();
+
+while (1) {
+	my ($command) = packet_txt_read() =~ /^command=([^=]+)$/;
+
+	if ( $command eq "get" ) {
+		my ($sha1) = packet_txt_read() =~ /^sha1=([0-9a-f]{40})$/;
+		packet_bin_read();
+
+		system ('git --git-dir="' . $DIR . '" cat-file blob ' . $sha1 . ' | git -c core.virtualizeobjects=false hash-object -w --stdin >/dev/null 2>&1');
+		packet_txt_write(($?) ? "status=error" : "status=success");
+		packet_flush();
+	} else {
+		die "bad command '$command'";
+	}
+}
-- 
2.14.0.rc0.284.gd933b75aa4-goog


  parent reply	other threads:[~2017-07-20  0:22 UTC|newest]

Thread overview: 45+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-07-11 19:48 [RFC PATCH 0/3] Partial clone: promised blobs (formerly "missing blobs") Jonathan Tan
2017-07-11 19:48 ` [RFC PATCH 1/3] promised-blob, fsck: introduce promised blobs Jonathan Tan
2017-07-11 22:02   ` Stefan Beller
2017-07-19 23:37     ` Jonathan Tan
2017-07-12 17:29   ` Jeff Hostetler
2017-07-12 19:28     ` Jonathan Nieder
2017-07-13 14:48       ` Jeff Hostetler
2017-07-13 15:05         ` Jeff Hostetler
2017-07-13 19:39     ` Jonathan Tan
2017-07-14 20:03       ` Jeff Hostetler
2017-07-14 21:30         ` Jonathan Nieder
2017-07-11 19:48 ` [RFC PATCH 2/3] sha1-array: support appending unsigned char hash Jonathan Tan
2017-07-11 22:06   ` Stefan Beller
2017-07-19 23:56     ` Jonathan Tan
2017-07-20  0:06       ` Stefan Beller
2017-07-11 19:48 ` [RFC PATCH 3/3] sha1_file: add promised blob hook support Jonathan Tan
2017-07-11 22:38   ` Stefan Beller
2017-07-12 17:40   ` Ben Peart
2017-07-12 20:38     ` Jonathan Nieder
2017-07-16 15:23 ` [RFC PATCH 0/3] Partial clone: promised blobs (formerly "missing blobs") Philip Oakley
2017-07-17 17:43   ` Ben Peart
2017-07-25 20:48     ` Philip Oakley
2017-07-17 18:03   ` Jonathan Nieder
2017-07-29 12:51     ` Philip Oakley
2017-07-20  0:21 ` [RFC PATCH v2 0/4] Partial clone: promised objects (not only blobs) Jonathan Tan
2017-07-20  0:21 ` [RFC PATCH v2 1/4] object: remove "used" field from struct object Jonathan Tan
2017-07-20  0:36   ` Stefan Beller
2017-07-20  0:55     ` Jonathan Tan
2017-07-20 17:44       ` Ben Peart
2017-07-20 21:20   ` Junio C Hamano
2017-07-20  0:21 ` [RFC PATCH v2 2/4] promised-object, fsck: introduce promised objects Jonathan Tan
2017-07-20 18:07   ` Stefan Beller
2017-07-20 19:17     ` Jonathan Tan
2017-07-20 19:58   ` Ben Peart
2017-07-20 21:13     ` Jonathan Tan
2017-07-21 16:24       ` Ben Peart
2017-07-21 20:33         ` Jonathan Tan
2017-07-25 15:10           ` Ben Peart
2017-07-29 13:26             ` Philip Oakley
2017-07-20  0:21 ` [RFC PATCH v2 3/4] sha1-array: support appending unsigned char hash Jonathan Tan
2017-07-20  0:21 ` Jonathan Tan [this message]
2017-07-20 18:23   ` [RFC PATCH v2 4/4] sha1_file: support promised object hook Stefan Beller
2017-07-20 20:58     ` Ben Peart
2017-07-20 21:18       ` Jonathan Tan
2017-07-21 16:27         ` Ben Peart

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=a7fa848672281e29bff01f8caf2abe9f6ce5f4ba.1500508695.git.jonathantanmy@google.com \
    --to=jonathantanmy@google.com \
    --cc=git@jeffhostetler.com \
    --cc=git@vger.kernel.org \
    --cc=jrnieder@gmail.com \
    --cc=peartben@gmail.com \
    --cc=philipoakley@iee.org \
    --cc=sbeller@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).