git@vger.kernel.org list mirror (unofficial, one of many)
 help / color / mirror / code / Atom feed
* [PATCH 00/40] Promisor remotes and external ODB support
@ 2018-01-03 16:33 Christian Couder
  2018-01-03 16:33 ` [PATCH 01/40] Add initial external odb support Christian Couder
                   ` (39 more replies)
  0 siblings, 40 replies; 51+ messages in thread
From: Christian Couder @ 2018-01-03 16:33 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Ben Peart, Jonathan Tan,
	Nguyen Thai Ngoc Duy, Mike Hommey, Lars Schneider, Eric Wong,
	Christian Couder, Jeff Hostetler

This is an early patch series that start to merge the
jh/fsck-promisors patch series (which is currently in pu) with the
external odb patch series.

The merge is not complete and there is still work needed, but all the
tests pass and in my opinion this shows that it is a good way forward
to share the same mechanism to handle (many) remote object stores and
the related fsck and gc problems.

The jh/partial-clone (a separate patch series on top of
jh/fsck-promisors) still needs some work before it can be used on top
of this series. I rebased it on top but the tests do not pass yet. The
result of my rebase and current attempt to fix tests is here:

https://github.com/chriscool/git/commits/gl-partial-clone-rebased

This patch series does not include the last part of the previous
external odb series which was about adding an `--inital-refspec`
option to `git clone`.

A few promisor related links
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

v6 partial clone part 2:
https://public-inbox.org/git/20171205165854.64979-1-git@jeffhostetler.com/

v7 partial clone part 3:
https://public-inbox.org/git/20171208155851.855-1-git@jeffhostetler.com/

External odb related links
~~~~~~~~~~~~~~~~~~~~~~~~~~

Peff started to work on external odbs some years ago:

http://thread.gmane.org/gmane.comp.version-control.git/206886/focus=207040
http://thread.gmane.org/gmane.comp.version-control.git/247171
http://thread.gmane.org/gmane.comp.version-control.git/202902/focus=203020

His work, which is not compile-tested any more, is still there:

https://github.com/peff/git/commits/jk/external-odb-wip

Initial discussions about external odbs are there:

http://thread.gmane.org/gmane.comp.version-control.git/288151/focus=295160

Version 1, 2, 3, 4, 5 and 6 of the external odbs series are here:

https://public-inbox.org/git/20160613085546.11784-1-chriscool@tuxfamily.org/
https://public-inbox.org/git/20160628181933.24620-1-chriscool@tuxfamily.org/
https://public-inbox.org/git/20161130210420.15982-1-chriscool@tuxfamily.org/
https://public-inbox.org/git/20170620075523.26961-1-chriscool@tuxfamily.org/
https://public-inbox.org/git/20170803091926.1755-1-chriscool@tuxfamily.org/
https://public-inbox.org/git/20170916080731.13925-1-chriscool@tuxfamily.org/

Some of the discussions related to Ben Peart's work that is used by
this series are here:

https://public-inbox.org/git/20170113155253.1644-1-benpeart@microsoft.com/
https://public-inbox.org/git/20170322165220.5660-1-benpeart@microsoft.com/
https://public-inbox.org/git/20170714132651.170708-1-benpeart@microsoft.com/

Version 1, 2, 3, 4, 5 and 6 of the external odbs series are here:

https://github.com/chriscool/git/commits/gl-external-odb12
https://github.com/chriscool/git/commits/gl-external-odb22
https://github.com/chriscool/git/commits/gl-external-odb61
https://github.com/chriscool/git/commits/gl-external-odb239
https://github.com/chriscool/git/commits/gl-external-odb373
https://github.com/chriscool/git/commits/gl-external-odb411

A patch series to add Git/Packet.pm that is now in master is also
related:

https://public-inbox.org/git/20171019123030.17338-1-chriscool@tuxfamily.org/
https://public-inbox.org/git/20171105213836.11717-1-chriscool@tuxfamily.org/
https://public-inbox.org/git/20171110132200.7871-1-chriscool@tuxfamily.org/


https://public-inbox.org/git/20171019123030.17338-1-chriscool@tuxfamily.org/

Ben Peart (1):
  Add t0450 to test 'get_direct' mechanism

Christian Couder (30):
  Add initial external odb support
  Add GIT_NO_EXTERNAL_ODB env variable
  external-odb: add has_external_odb()
  external-odb: implement external_odb_get_direct
  sha1_file: prepare for external odbs
  external-odb: add script mode support
  odb-helper: add 'enum odb_helper_type'
  odb-helper: add odb_helper_init() to send 'init' instruction
  t0400: add 'put_raw_obj' instruction to odb-helper script
  external odb: add 'put_raw_obj' support
  external-odb: accept only blobs for now
  t0400: add test for external odb write support
  Add t0410 to test external ODB transfer
  lib-httpd: pass config file to start_httpd()
  lib-httpd: add upload.sh
  lib-httpd: add list.sh
  lib-httpd: add apache-e-odb.conf
  odb-helper: add odb_helper_get_raw_object()
  pack-objects: don't pack objects in external odbs
  Add t0420 to test transfer to HTTP external odb
  external-odb: add 'get_direct' support
  odb-helper: add 'script_mode' to 'struct odb_helper'
  odb-helper: add init_object_process()
  Add t0460 to test passing git objects
  odb-helper: add put_object_process()
  Add t0470 to test passing raw objects
  odb-helper: add have_object_process()
  Add t0480 to test "have" capability and raw objects
  external-odb: use 'odb=magic' attribute to mark odb blobs
  Add Documentation/technical/external-odb.txt

Jonathan Tan (9):
  fsck: introduce promisor objects
  fsck: support refs pointing to promisor objects
  fsck: support referenced promisor objects
  fsck: support promisor objects as CLI argument
  index-pack: refactor writing of .keep files
  introduce fetch-object: fetch one promisor object
  sha1_file: support lazily fetching missing objects
  rev-list: support termination at promisor objects
  gc: do not repack promisor packfiles

 Documentation/git-pack-objects.txt       |   11 +
 Documentation/gitremote-helpers.txt      |    7 +
 Documentation/rev-list-options.txt       |   11 +
 Documentation/technical/external-odb.txt |  342 ++++++++++
 Makefile                                 |    3 +
 builtin/cat-file.c                       |    3 +
 builtin/fetch-pack.c                     |   10 +
 builtin/fsck.c                           |   26 +-
 builtin/gc.c                             |    4 +
 builtin/index-pack.c                     |  111 +--
 builtin/pack-objects.c                   |   41 +-
 builtin/prune.c                          |    7 +
 builtin/repack.c                         |    9 +-
 builtin/rev-list.c                       |   69 +-
 cache.h                                  |   29 +-
 environment.c                            |    4 +
 external-odb.c                           |  204 ++++++
 external-odb.h                           |   13 +
 fetch-object.c                           |   31 +
 fetch-object.h                           |    6 +
 fetch-pack.c                             |   48 +-
 fetch-pack.h                             |    8 +
 list-objects.c                           |   29 +-
 object.c                                 |    2 +-
 odb-helper.c                             | 1079 ++++++++++++++++++++++++++++++
 odb-helper.h                             |   53 ++
 packfile.c                               |   78 ++-
 packfile.h                               |   13 +
 remote-curl.c                            |   14 +-
 revision.c                               |   33 +-
 revision.h                               |    5 +-
 sha1_file.c                              |  139 +++-
 t/lib-httpd.sh                           |    8 +-
 t/lib-httpd/apache-e-odb.conf            |  214 ++++++
 t/lib-httpd/list.sh                      |   41 ++
 t/lib-httpd/upload.sh                    |   45 ++
 t/t0400-external-odb.sh                  |   83 +++
 t/t0410-partial-clone.sh                 |  343 ++++++++++
 t/t0410-transfer-e-odb.sh                |  147 ++++
 t/t0420-transfer-http-e-odb.sh           |  145 ++++
 t/t0450-read-object.sh                   |   28 +
 t/t0450/read-object                      |   68 ++
 t/t0460-read-object-git.sh               |   28 +
 t/t0460/read-object-git                  |   78 +++
 t/t0470-read-object-http-e-odb.sh        |  112 ++++
 t/t0470/read-object-plain                |   83 +++
 t/t0480-read-object-have-http-e-odb.sh   |  112 ++++
 t/t0480/read-object-plain-have           |  103 +++
 transport.c                              |    8 +
 transport.h                              |   11 +
 50 files changed, 3978 insertions(+), 121 deletions(-)
 create mode 100644 Documentation/technical/external-odb.txt
 create mode 100644 external-odb.c
 create mode 100644 external-odb.h
 create mode 100644 fetch-object.c
 create mode 100644 fetch-object.h
 create mode 100644 odb-helper.c
 create mode 100644 odb-helper.h
 create mode 100644 t/lib-httpd/apache-e-odb.conf
 create mode 100644 t/lib-httpd/list.sh
 create mode 100644 t/lib-httpd/upload.sh
 create mode 100755 t/t0400-external-odb.sh
 create mode 100755 t/t0410-partial-clone.sh
 create mode 100755 t/t0410-transfer-e-odb.sh
 create mode 100755 t/t0420-transfer-http-e-odb.sh
 create mode 100755 t/t0450-read-object.sh
 create mode 100755 t/t0450/read-object
 create mode 100755 t/t0460-read-object-git.sh
 create mode 100755 t/t0460/read-object-git
 create mode 100755 t/t0470-read-object-http-e-odb.sh
 create mode 100755 t/t0470/read-object-plain
 create mode 100755 t/t0480-read-object-have-http-e-odb.sh
 create mode 100755 t/t0480/read-object-plain-have

-- 
2.16.0.rc0.16.g82191dbc6c.dirty


^ permalink raw reply	[flat|nested] 51+ messages in thread

* [PATCH 01/40] Add initial external odb support
  2018-01-03 16:33 [PATCH 00/40] Promisor remotes and external ODB support Christian Couder
@ 2018-01-03 16:33 ` Christian Couder
  2018-01-04 19:59   ` Jeff Hostetler
  2018-01-03 16:33 ` [PATCH 02/40] Add GIT_NO_EXTERNAL_ODB env variable Christian Couder
                   ` (38 subsequent siblings)
  39 siblings, 1 reply; 51+ messages in thread
From: Christian Couder @ 2018-01-03 16:33 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Ben Peart, Jonathan Tan,
	Nguyen Thai Ngoc Duy, Mike Hommey, Lars Schneider, Eric Wong,
	Christian Couder, Jeff Hostetler

The external-odb.{c,h} files will contain the functions
that are called by the rest of Git mostly from
"sha1_file.c" to access the objects managed by the
external odbs.

The odb-helper.{c,h} files will contain the functions to
actually implement communication with either the internal
functions or the external scripts or processes that will
manage and provide external git objects.

For now only infrastructure to create helpers from the
config and to manage a cache for the 'have' command is
implemented.

Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
 Makefile       |  2 ++
 cache.h        |  1 +
 external-odb.c | 72 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 external-odb.h |  7 ++++++
 odb-helper.c   | 54 +++++++++++++++++++++++++++++++++++++++++++
 odb-helper.h   | 24 ++++++++++++++++++++
 sha1_file.c    | 19 +++++++++++++++-
 7 files changed, 178 insertions(+), 1 deletion(-)
 create mode 100644 external-odb.c
 create mode 100644 external-odb.h
 create mode 100644 odb-helper.c
 create mode 100644 odb-helper.h

diff --git a/Makefile b/Makefile
index 2a81ae22e9..07694185c9 100644
--- a/Makefile
+++ b/Makefile
@@ -799,6 +799,7 @@ LIB_OBJS += ewah/ewah_bitmap.o
 LIB_OBJS += ewah/ewah_io.o
 LIB_OBJS += ewah/ewah_rlw.o
 LIB_OBJS += exec_cmd.o
+LIB_OBJS += external-odb.o
 LIB_OBJS += fetch-pack.o
 LIB_OBJS += fsck.o
 LIB_OBJS += fsmonitor.o
@@ -834,6 +835,7 @@ LIB_OBJS += notes-cache.o
 LIB_OBJS += notes-merge.o
 LIB_OBJS += notes-utils.o
 LIB_OBJS += object.o
+LIB_OBJS += odb-helper.o
 LIB_OBJS += oidmap.o
 LIB_OBJS += oidset.o
 LIB_OBJS += packfile.o
diff --git a/cache.h b/cache.h
index a2ec8c0b55..21af6442af 100644
--- a/cache.h
+++ b/cache.h
@@ -1587,6 +1587,7 @@ extern void prepare_alt_odb(void);
 extern char *compute_alternate_path(const char *path, struct strbuf *err);
 typedef int alt_odb_fn(struct alternate_object_database *, void *);
 extern int foreach_alt_odb(alt_odb_fn, void*);
+extern void prepare_external_alt_odb(void);
 
 /*
  * Allocate a "struct alternate_object_database" but do _not_ actually
diff --git a/external-odb.c b/external-odb.c
new file mode 100644
index 0000000000..f3ea491333
--- /dev/null
+++ b/external-odb.c
@@ -0,0 +1,72 @@
+#include "cache.h"
+#include "external-odb.h"
+#include "odb-helper.h"
+#include "config.h"
+
+static struct odb_helper *helpers;
+static struct odb_helper **helpers_tail = &helpers;
+
+static struct odb_helper *find_or_create_helper(const char *name, int len)
+{
+	struct odb_helper *o;
+
+	for (o = helpers; o; o = o->next)
+		if (!strncmp(o->name, name, len) && !o->name[len])
+			return o;
+
+	o = odb_helper_new(name, len);
+	*helpers_tail = o;
+	helpers_tail = &o->next;
+
+	return o;
+}
+
+static int external_odb_config(const char *var, const char *value, void *data)
+{
+	struct odb_helper *o;
+	const char *name;
+	int namelen;
+	const char *subkey;
+
+	if (parse_config_key(var, "odb", &name, &namelen, &subkey) < 0)
+		return 0;
+
+	o = find_or_create_helper(name, namelen);
+
+	if (!strcmp(subkey, "promisorremote"))
+		return git_config_string(&o->dealer, var, value);
+
+	return 0;
+}
+
+static void external_odb_init(void)
+{
+	static int initialized;
+
+	if (initialized)
+		return;
+	initialized = 1;
+
+	git_config(external_odb_config, NULL);
+}
+
+const char *external_odb_root(void)
+{
+	static const char *root;
+	if (!root)
+		root = git_pathdup("objects/external");
+	return root;
+}
+
+int external_odb_has_object(const unsigned char *sha1)
+{
+	struct odb_helper *o;
+
+	external_odb_init();
+
+	for (o = helpers; o; o = o->next)
+		if (odb_helper_has_object(o, sha1))
+			return 1;
+	return 0;
+}
+
diff --git a/external-odb.h b/external-odb.h
new file mode 100644
index 0000000000..ae2b228792
--- /dev/null
+++ b/external-odb.h
@@ -0,0 +1,7 @@
+#ifndef EXTERNAL_ODB_H
+#define EXTERNAL_ODB_H
+
+extern const char *external_odb_root(void);
+extern int external_odb_has_object(const unsigned char *sha1);
+
+#endif /* EXTERNAL_ODB_H */
diff --git a/odb-helper.c b/odb-helper.c
new file mode 100644
index 0000000000..1404393807
--- /dev/null
+++ b/odb-helper.c
@@ -0,0 +1,54 @@
+#include "cache.h"
+#include "object.h"
+#include "argv-array.h"
+#include "odb-helper.h"
+#include "run-command.h"
+#include "sha1-lookup.h"
+
+struct odb_helper *odb_helper_new(const char *name, int namelen)
+{
+	struct odb_helper *o;
+
+	o = xcalloc(1, sizeof(*o));
+	o->name = xmemdupz(name, namelen);
+
+	return o;
+}
+
+struct odb_helper_cmd {
+	struct argv_array argv;
+	struct child_process child;
+};
+
+static void odb_helper_load_have(struct odb_helper *o)
+{
+	if (o->have_valid)
+		return;
+	o->have_valid = 1;
+
+	/* TODO */
+}
+
+static const unsigned char *have_sha1_access(size_t index, void *table)
+{
+	struct odb_helper_object *have = table;
+	return have[index].sha1;
+}
+
+static struct odb_helper_object *odb_helper_lookup(struct odb_helper *o,
+						   const unsigned char *sha1)
+{
+	int idx;
+
+	odb_helper_load_have(o);
+	idx = sha1_pos(sha1, o->have, o->have_nr, have_sha1_access);
+	if (idx < 0)
+		return NULL;
+	return &o->have[idx];
+}
+
+int odb_helper_has_object(struct odb_helper *o, const unsigned char *sha1)
+{
+	return !!odb_helper_lookup(o, sha1);
+}
+
diff --git a/odb-helper.h b/odb-helper.h
new file mode 100644
index 0000000000..9395e606ce
--- /dev/null
+++ b/odb-helper.h
@@ -0,0 +1,24 @@
+#ifndef ODB_HELPER_H
+#define ODB_HELPER_H
+
+struct odb_helper {
+	const char *name;
+	const char *dealer;
+
+	struct odb_helper_object {
+		unsigned char sha1[20];
+		unsigned long size;
+		enum object_type type;
+	} *have;
+	int have_nr;
+	int have_alloc;
+	int have_valid;
+
+	struct odb_helper *next;
+};
+
+extern struct odb_helper *odb_helper_new(const char *name, int namelen);
+extern int odb_helper_has_object(struct odb_helper *o,
+				 const unsigned char *sha1);
+
+#endif /* ODB_HELPER_H */
diff --git a/sha1_file.c b/sha1_file.c
index 3da70ac650..3f5ff274e2 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -29,6 +29,7 @@
 #include "mergesort.h"
 #include "quote.h"
 #include "packfile.h"
+#include "external-odb.h"
 
 const unsigned char null_sha1[GIT_MAX_RAWSZ];
 const struct object_id null_oid;
@@ -669,6 +670,21 @@ int foreach_alt_odb(alt_odb_fn fn, void *cb)
 	return r;
 }
 
+void prepare_external_alt_odb(void)
+{
+	static int linked_external;
+	const char *path;
+
+	if (linked_external)
+		return;
+
+	path = external_odb_root();
+	if (!access(path, F_OK)) {
+		link_alt_odb_entry(path, NULL, 0, "");
+		linked_external = 1;
+	}
+}
+
 void prepare_alt_odb(void)
 {
 	const char *alt;
@@ -682,6 +698,7 @@ void prepare_alt_odb(void)
 	link_alt_odb_entries(alt, PATH_SEP, NULL, 0);
 
 	read_info_alternates(get_object_directory(), 0);
+	prepare_external_alt_odb();
 }
 
 /* Returns 1 if we have successfully freshened the file, 0 otherwise. */
@@ -722,7 +739,7 @@ static int check_and_freshen_nonlocal(const unsigned char *sha1, int freshen)
 		if (check_and_freshen_file(path, freshen))
 			return 1;
 	}
-	return 0;
+	return external_odb_has_object(sha1);
 }
 
 static int check_and_freshen(const unsigned char *sha1, int freshen)
-- 
2.16.0.rc0.16.g82191dbc6c.dirty


^ permalink raw reply	[flat|nested] 51+ messages in thread

* [PATCH 02/40] Add GIT_NO_EXTERNAL_ODB env variable
  2018-01-03 16:33 [PATCH 00/40] Promisor remotes and external ODB support Christian Couder
  2018-01-03 16:33 ` [PATCH 01/40] Add initial external odb support Christian Couder
@ 2018-01-03 16:33 ` Christian Couder
  2018-01-03 16:33 ` [PATCH 03/40] external-odb: add has_external_odb() Christian Couder
                   ` (37 subsequent siblings)
  39 siblings, 0 replies; 51+ messages in thread
From: Christian Couder @ 2018-01-03 16:33 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Ben Peart, Jonathan Tan,
	Nguyen Thai Ngoc Duy, Mike Hommey, Lars Schneider, Eric Wong,
	Christian Couder, Jeff Hostetler

This new environment variable will be used to perform git
commands without involving any external odb mechanism.

This makes it possible for example to create new blobs that
will not be sent to an external odb even if the external odb
supports "put_*" instructions.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
 cache.h        | 9 +++++++++
 environment.c  | 4 ++++
 external-odb.c | 3 +--
 sha1_file.c    | 3 +++
 4 files changed, 17 insertions(+), 2 deletions(-)

diff --git a/cache.h b/cache.h
index 21af6442af..7b27abac35 100644
--- a/cache.h
+++ b/cache.h
@@ -437,6 +437,7 @@ static inline enum object_type object_type(unsigned int mode)
 #define CEILING_DIRECTORIES_ENVIRONMENT "GIT_CEILING_DIRECTORIES"
 #define NO_REPLACE_OBJECTS_ENVIRONMENT "GIT_NO_REPLACE_OBJECTS"
 #define GIT_REPLACE_REF_BASE_ENVIRONMENT "GIT_REPLACE_REF_BASE"
+#define NO_EXTERNAL_ODB_ENVIRONMENT "GIT_NO_EXTERNAL_ODB"
 #define GITATTRIBUTES_FILE ".gitattributes"
 #define INFOATTRIBUTES_FILE "info/attributes"
 #define ATTRIBUTE_MACRO_PREFIX "[attr]"
@@ -813,6 +814,14 @@ void reset_shared_repository(void);
 extern int check_replace_refs;
 extern char *git_replace_ref_base;
 
+/*
+ * Do external odbs need to be used this run?  This variable is
+ * initialized to true unless $GIT_NO_EXTERNAL_ODB is set, but it
+ * maybe set to false by some commands that do not want external
+ * odbs to be active.
+ */
+extern int use_external_odb;
+
 extern int fsync_object_files;
 extern int core_preload_index;
 extern int core_apply_sparse_checkout;
diff --git a/environment.c b/environment.c
index 63ac38a46f..b3bd0daae2 100644
--- a/environment.c
+++ b/environment.c
@@ -48,6 +48,7 @@ const char *excludes_file;
 enum auto_crlf auto_crlf = AUTO_CRLF_FALSE;
 int check_replace_refs = 1;
 char *git_replace_ref_base;
+int use_external_odb = 1;
 enum eol core_eol = EOL_UNSET;
 enum safe_crlf safe_crlf = SAFE_CRLF_WARN;
 unsigned whitespace_rule_cfg = WS_DEFAULT_RULE;
@@ -117,6 +118,7 @@ const char * const local_repo_env[] = {
 	INDEX_ENVIRONMENT,
 	NO_REPLACE_OBJECTS_ENVIRONMENT,
 	GIT_REPLACE_REF_BASE_ENVIRONMENT,
+	NO_EXTERNAL_ODB_ENVIRONMENT,
 	GIT_PREFIX_ENVIRONMENT,
 	GIT_SUPER_PREFIX_ENVIRONMENT,
 	GIT_SHALLOW_FILE_ENVIRONMENT,
@@ -156,6 +158,8 @@ void setup_git_env(void)
 	free(git_replace_ref_base);
 	git_replace_ref_base = xstrdup(replace_ref_base ? replace_ref_base
 							  : "refs/replace/");
+	if (getenv(NO_EXTERNAL_ODB_ENVIRONMENT))
+		use_external_odb = 0;
 	free(namespace);
 	namespace = expand_namespace(getenv(GIT_NAMESPACE_ENVIRONMENT));
 	shallow_file = getenv(GIT_SHALLOW_FILE_ENVIRONMENT);
diff --git a/external-odb.c b/external-odb.c
index f3ea491333..390958dbfe 100644
--- a/external-odb.c
+++ b/external-odb.c
@@ -43,7 +43,7 @@ static void external_odb_init(void)
 {
 	static int initialized;
 
-	if (initialized)
+	if (initialized || !use_external_odb)
 		return;
 	initialized = 1;
 
@@ -69,4 +69,3 @@ int external_odb_has_object(const unsigned char *sha1)
 			return 1;
 	return 0;
 }
-
diff --git a/sha1_file.c b/sha1_file.c
index 3f5ff274e2..cba6b2a537 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -675,6 +675,9 @@ void prepare_external_alt_odb(void)
 	static int linked_external;
 	const char *path;
 
+	if (!use_external_odb)
+		return;
+
 	if (linked_external)
 		return;
 
-- 
2.16.0.rc0.16.g82191dbc6c.dirty


^ permalink raw reply	[flat|nested] 51+ messages in thread

* [PATCH 03/40] external-odb: add has_external_odb()
  2018-01-03 16:33 [PATCH 00/40] Promisor remotes and external ODB support Christian Couder
  2018-01-03 16:33 ` [PATCH 01/40] Add initial external odb support Christian Couder
  2018-01-03 16:33 ` [PATCH 02/40] Add GIT_NO_EXTERNAL_ODB env variable Christian Couder
@ 2018-01-03 16:33 ` Christian Couder
  2018-01-03 16:33 ` [PATCH 04/40] fsck: introduce promisor objects Christian Couder
                   ` (36 subsequent siblings)
  39 siblings, 0 replies; 51+ messages in thread
From: Christian Couder @ 2018-01-03 16:33 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Ben Peart, Jonathan Tan,
	Nguyen Thai Ngoc Duy, Mike Hommey, Lars Schneider, Eric Wong,
	Christian Couder, Jeff Hostetler

This function will be used to check if the external odb
mechanism is actually used.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
 external-odb.c | 7 +++++++
 external-odb.h | 1 +
 2 files changed, 8 insertions(+)

diff --git a/external-odb.c b/external-odb.c
index 390958dbfe..d26e63d8b1 100644
--- a/external-odb.c
+++ b/external-odb.c
@@ -50,6 +50,13 @@ static void external_odb_init(void)
 	git_config(external_odb_config, NULL);
 }
 
+int has_external_odb(void)
+{
+	external_odb_init();
+
+	return !!helpers;
+}
+
 const char *external_odb_root(void)
 {
 	static const char *root;
diff --git a/external-odb.h b/external-odb.h
index ae2b228792..9a3c2f01b3 100644
--- a/external-odb.h
+++ b/external-odb.h
@@ -1,6 +1,7 @@
 #ifndef EXTERNAL_ODB_H
 #define EXTERNAL_ODB_H
 
+extern int has_external_odb(void);
 extern const char *external_odb_root(void);
 extern int external_odb_has_object(const unsigned char *sha1);
 
-- 
2.16.0.rc0.16.g82191dbc6c.dirty


^ permalink raw reply	[flat|nested] 51+ messages in thread

* [PATCH 04/40] fsck: introduce promisor objects
  2018-01-03 16:33 [PATCH 00/40] Promisor remotes and external ODB support Christian Couder
                   ` (2 preceding siblings ...)
  2018-01-03 16:33 ` [PATCH 03/40] external-odb: add has_external_odb() Christian Couder
@ 2018-01-03 16:33 ` Christian Couder
  2018-01-03 16:33 ` [PATCH 05/40] fsck: support refs pointing to " Christian Couder
                   ` (35 subsequent siblings)
  39 siblings, 0 replies; 51+ messages in thread
From: Christian Couder @ 2018-01-03 16:33 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Ben Peart, Jonathan Tan,
	Nguyen Thai Ngoc Duy, Mike Hommey, Lars Schneider, Eric Wong,
	Christian Couder, Jeff Hostetler

From: Jonathan Tan <jonathantanmy@google.com>

Currently, Git does not support repos with very large numbers of objects
or repos that wish to minimize manipulation of certain blobs (for
example, because they are very large) very well, even if the user
operates mostly on part of the repo, because Git is designed on the
assumption that every referenced object is available somewhere in the
repo storage. In such an arrangement, the full set of objects is usually
available in remote storage, ready to be lazily downloaded.

Teach fsck about the new state of affairs. In this commit, teach fsck
that missing promisor objects referenced from the reflog are not an
error case; in future commits, fsck will be taught about other cases.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 builtin/fsck.c           |  2 +-
 cache.h                  |  3 +-
 packfile.c               | 78 ++++++++++++++++++++++++++++++++++++++++++++--
 packfile.h               | 13 ++++++++
 t/t0410-partial-clone.sh | 81 ++++++++++++++++++++++++++++++++++++++++++++++++
 5 files changed, 172 insertions(+), 5 deletions(-)
 create mode 100755 t/t0410-partial-clone.sh

diff --git a/builtin/fsck.c b/builtin/fsck.c
index 04846d46f9..793d289367 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -398,7 +398,7 @@ static void fsck_handle_reflog_oid(const char *refname, struct object_id *oid,
 					xstrfmt("%s@{%"PRItime"}", refname, timestamp));
 			obj->flags |= USED;
 			mark_object_reachable(obj);
-		} else {
+		} else if (!is_promisor_object(oid)) {
 			error("%s: invalid reflog entry %s", refname, oid_to_hex(oid));
 			errors_found |= ERROR_REACHABLE;
 		}
diff --git a/cache.h b/cache.h
index 7b27abac35..078607ee91 100644
--- a/cache.h
+++ b/cache.h
@@ -1649,7 +1649,8 @@ extern struct packed_git {
 	unsigned pack_local:1,
 		 pack_keep:1,
 		 freshened:1,
-		 do_not_close:1;
+		 do_not_close:1,
+		 pack_promisor:1;
 	unsigned char sha1[20];
 	struct revindex_entry *revindex;
 	/* something like ".git/objects/pack/xxxxx.pack" */
diff --git a/packfile.c b/packfile.c
index 4a5fe7ab18..aee6c3d674 100644
--- a/packfile.c
+++ b/packfile.c
@@ -8,6 +8,12 @@
 #include "list.h"
 #include "streaming.h"
 #include "sha1-lookup.h"
+#include "commit.h"
+#include "object.h"
+#include "tag.h"
+#include "tree-walk.h"
+#include "tree.h"
+#include "external-odb.h"
 
 char *odb_pack_name(struct strbuf *buf,
 		    const unsigned char *sha1,
@@ -643,10 +649,10 @@ struct packed_git *add_packed_git(const char *path, size_t path_len, int local)
 		return NULL;
 
 	/*
-	 * ".pack" is long enough to hold any suffix we're adding (and
+	 * ".promisor" is long enough to hold any suffix we're adding (and
 	 * the use xsnprintf double-checks that)
 	 */
-	alloc = st_add3(path_len, strlen(".pack"), 1);
+	alloc = st_add3(path_len, strlen(".promisor"), 1);
 	p = alloc_packed_git(alloc);
 	memcpy(p->pack_name, path, path_len);
 
@@ -654,6 +660,10 @@ struct packed_git *add_packed_git(const char *path, size_t path_len, int local)
 	if (!access(p->pack_name, F_OK))
 		p->pack_keep = 1;
 
+	xsnprintf(p->pack_name + path_len, alloc - path_len, ".promisor");
+	if (!access(p->pack_name, F_OK))
+		p->pack_promisor = 1;
+
 	xsnprintf(p->pack_name + path_len, alloc - path_len, ".pack");
 	if (stat(p->pack_name, &st) || !S_ISREG(st.st_mode)) {
 		free(p);
@@ -781,7 +791,8 @@ static void prepare_packed_git_one(char *objdir, int local)
 		if (ends_with(de->d_name, ".idx") ||
 		    ends_with(de->d_name, ".pack") ||
 		    ends_with(de->d_name, ".bitmap") ||
-		    ends_with(de->d_name, ".keep"))
+		    ends_with(de->d_name, ".keep") ||
+		    ends_with(de->d_name, ".promisor"))
 			string_list_append(&garbage, path.buf);
 		else
 			report_garbage(PACKDIR_FILE_GARBAGE, path.buf);
@@ -1889,6 +1900,9 @@ int for_each_packed_object(each_packed_object_fn cb, void *data, unsigned flags)
 	for (p = packed_git; p; p = p->next) {
 		if ((flags & FOR_EACH_OBJECT_LOCAL_ONLY) && !p->pack_local)
 			continue;
+		if ((flags & FOR_EACH_OBJECT_PROMISOR_ONLY) &&
+		    !p->pack_promisor)
+			continue;
 		if (open_pack_index(p)) {
 			pack_errors = 1;
 			continue;
@@ -1899,3 +1913,61 @@ int for_each_packed_object(each_packed_object_fn cb, void *data, unsigned flags)
 	}
 	return r ? r : pack_errors;
 }
+
+static int add_promisor_object(const struct object_id *oid,
+			       struct packed_git *pack,
+			       uint32_t pos,
+			       void *set_)
+{
+	struct oidset *set = set_;
+	struct object *obj = parse_object(oid);
+	if (!obj)
+		return 1;
+
+	oidset_insert(set, oid);
+
+	/*
+	 * If this is a tree, commit, or tag, the objects it refers
+	 * to are also promisor objects. (Blobs refer to no objects.)
+	 */
+	if (obj->type == OBJ_TREE) {
+		struct tree *tree = (struct tree *)obj;
+		struct tree_desc desc;
+		struct name_entry entry;
+		if (init_tree_desc_gently(&desc, tree->buffer, tree->size))
+			/*
+			 * Error messages are given when packs are
+			 * verified, so do not print any here.
+			 */
+			return 0;
+		while (tree_entry_gently(&desc, &entry))
+			oidset_insert(set, entry.oid);
+	} else if (obj->type == OBJ_COMMIT) {
+		struct commit *commit = (struct commit *) obj;
+		struct commit_list *parents = commit->parents;
+
+		oidset_insert(set, &commit->tree->object.oid);
+		for (; parents; parents = parents->next)
+			oidset_insert(set, &parents->item->object.oid);
+	} else if (obj->type == OBJ_TAG) {
+		struct tag *tag = (struct tag *) obj;
+		oidset_insert(set, &tag->tagged->oid);
+	}
+	return 0;
+}
+
+int is_promisor_object(const struct object_id *oid)
+{
+	static struct oidset promisor_objects;
+	static int promisor_objects_prepared;
+
+	if (!promisor_objects_prepared) {
+		if (has_external_odb()) {
+			for_each_packed_object(add_promisor_object,
+					       &promisor_objects,
+					       FOR_EACH_OBJECT_PROMISOR_ONLY);
+		}
+		promisor_objects_prepared = 1;
+	}
+	return oidset_contains(&promisor_objects, oid);
+}
diff --git a/packfile.h b/packfile.h
index 0cdeb54dcd..a7fca598d6 100644
--- a/packfile.h
+++ b/packfile.h
@@ -1,6 +1,8 @@
 #ifndef PACKFILE_H
 #define PACKFILE_H
 
+#include "oidset.h"
+
 /*
  * Generate the filename to be used for a pack file with checksum "sha1" and
  * extension "ext". The result is written into the strbuf "buf", overwriting
@@ -124,6 +126,11 @@ extern int has_sha1_pack(const unsigned char *sha1);
 
 extern int has_pack_index(const unsigned char *sha1);
 
+/*
+ * Only iterate over packs obtained from the promisor remote.
+ */
+#define FOR_EACH_OBJECT_PROMISOR_ONLY 2
+
 /*
  * Iterate over packed objects in both the local
  * repository and any alternates repositories (unless the
@@ -135,4 +142,10 @@ typedef int each_packed_object_fn(const struct object_id *oid,
 				  void *data);
 extern int for_each_packed_object(each_packed_object_fn, void *, unsigned flags);
 
+/*
+ * Return 1 if an object in a promisor packfile is or refers to the given
+ * object, 0 otherwise.
+ */
+extern int is_promisor_object(const struct object_id *oid);
+
 #endif
diff --git a/t/t0410-partial-clone.sh b/t/t0410-partial-clone.sh
new file mode 100755
index 0000000000..9257b8c885
--- /dev/null
+++ b/t/t0410-partial-clone.sh
@@ -0,0 +1,81 @@
+#!/bin/sh
+
+test_description='partial clone'
+
+. ./test-lib.sh
+
+delete_object () {
+	rm $1/.git/objects/$(echo $2 | sed -e 's|^..|&/|')
+}
+
+pack_as_from_promisor () {
+	HASH=$(git -C repo pack-objects .git/objects/pack/pack) &&
+	>repo/.git/objects/pack/pack-$HASH.promisor
+}
+
+test_expect_success 'missing reflog object, but promised by a commit, passes fsck' '
+	test_create_repo repo &&
+	test_commit -C repo my_commit &&
+
+	A=$(git -C repo commit-tree -m a HEAD^{tree}) &&
+	C=$(git -C repo commit-tree -m c -p $A HEAD^{tree}) &&
+
+	# Reference $A only from reflog, and delete it
+	git -C repo branch my_branch "$A" &&
+	git -C repo branch -f my_branch my_commit &&
+	delete_object repo "$A" &&
+
+	# State that we got $C, which refers to $A, from promisor
+	printf "$C\n" | pack_as_from_promisor &&
+
+	# Normally, it fails
+	test_must_fail git -C repo fsck &&
+
+	# But with the extension, it succeeds
+	git -C repo config core.repositoryformatversion 1 &&
+	git -C repo config odb.magic.promisorRemote "arbitrary string" &&
+	git -C repo fsck
+'
+
+test_expect_success 'missing reflog object, but promised by a tag, passes fsck' '
+	rm -rf repo &&
+	test_create_repo repo &&
+	test_commit -C repo my_commit &&
+
+	A=$(git -C repo commit-tree -m a HEAD^{tree}) &&
+	git -C repo tag -a -m d my_tag_name $A &&
+	T=$(git -C repo rev-parse my_tag_name) &&
+	git -C repo tag -d my_tag_name &&
+
+	# Reference $A only from reflog, and delete it
+	git -C repo branch my_branch "$A" &&
+	git -C repo branch -f my_branch my_commit &&
+	delete_object repo "$A" &&
+
+	# State that we got $T, which refers to $A, from promisor
+	printf "$T\n" | pack_as_from_promisor &&
+
+	git -C repo config core.repositoryformatversion 1 &&
+	git -C repo config odb.magic.promisorRemote "arbitrary string" &&
+	git -C repo fsck
+'
+
+test_expect_success 'missing reflog object alone fails fsck, even with extension set' '
+	rm -rf repo &&
+	test_create_repo repo &&
+	test_commit -C repo my_commit &&
+
+	A=$(git -C repo commit-tree -m a HEAD^{tree}) &&
+	B=$(git -C repo commit-tree -m b HEAD^{tree}) &&
+
+	# Reference $A only from reflog, and delete it
+	git -C repo branch my_branch "$A" &&
+	git -C repo branch -f my_branch my_commit &&
+	delete_object repo "$A" &&
+
+	git -C repo config core.repositoryformatversion 1 &&
+	git -C repo config odb.magic.promisorRemote "arbitrary string" &&
+	test_must_fail git -C repo fsck
+'
+
+test_done
-- 
2.16.0.rc0.16.g82191dbc6c.dirty


^ permalink raw reply	[flat|nested] 51+ messages in thread

* [PATCH 05/40] fsck: support refs pointing to promisor objects
  2018-01-03 16:33 [PATCH 00/40] Promisor remotes and external ODB support Christian Couder
                   ` (3 preceding siblings ...)
  2018-01-03 16:33 ` [PATCH 04/40] fsck: introduce promisor objects Christian Couder
@ 2018-01-03 16:33 ` Christian Couder
  2018-01-03 16:33 ` [PATCH 06/40] fsck: support referenced " Christian Couder
                   ` (34 subsequent siblings)
  39 siblings, 0 replies; 51+ messages in thread
From: Christian Couder @ 2018-01-03 16:33 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Ben Peart, Jonathan Tan,
	Nguyen Thai Ngoc Duy, Mike Hommey, Lars Schneider, Eric Wong,
	Christian Couder, Jeff Hostetler

From: Jonathan Tan <jonathantanmy@google.com>

Teach fsck to not treat refs referring to missing promisor objects as an
error when extensions.partialclone is set.

For the purposes of warning about no default refs, such refs are still
treated as legitimate refs.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 builtin/fsck.c           |  8 ++++++++
 t/t0410-partial-clone.sh | 24 ++++++++++++++++++++++++
 2 files changed, 32 insertions(+)

diff --git a/builtin/fsck.c b/builtin/fsck.c
index 793d289367..c6bb29d242 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -434,6 +434,14 @@ static int fsck_handle_ref(const char *refname, const struct object_id *oid,
 
 	obj = parse_object(oid);
 	if (!obj) {
+		if (is_promisor_object(oid)) {
+			/*
+			 * Increment default_refs anyway, because this is a
+			 * valid ref.
+			 */
+			 default_refs++;
+			 return 0;
+		}
 		error("%s: invalid sha1 pointer %s", refname, oid_to_hex(oid));
 		errors_found |= ERROR_REACHABLE;
 		/* We'll continue with the rest despite the error.. */
diff --git a/t/t0410-partial-clone.sh b/t/t0410-partial-clone.sh
index 9257b8c885..c4639e1134 100755
--- a/t/t0410-partial-clone.sh
+++ b/t/t0410-partial-clone.sh
@@ -13,6 +13,14 @@ pack_as_from_promisor () {
 	>repo/.git/objects/pack/pack-$HASH.promisor
 }
 
+promise_and_delete () {
+	HASH=$(git -C repo rev-parse "$1") &&
+	git -C repo tag -a -m message my_annotated_tag "$HASH" &&
+	git -C repo rev-parse my_annotated_tag | pack_as_from_promisor &&
+	git -C repo tag -d my_annotated_tag &&
+	delete_object repo "$HASH"
+}
+
 test_expect_success 'missing reflog object, but promised by a commit, passes fsck' '
 	test_create_repo repo &&
 	test_commit -C repo my_commit &&
@@ -78,4 +86,20 @@ test_expect_success 'missing reflog object alone fails fsck, even with extension
 	test_must_fail git -C repo fsck
 '
 
+test_expect_success 'missing ref object, but promised, passes fsck' '
+	rm -rf repo &&
+	test_create_repo repo &&
+	test_commit -C repo my_commit &&
+
+	A=$(git -C repo commit-tree -m a HEAD^{tree}) &&
+
+	# Reference $A only from ref
+	git -C repo branch my_branch "$A" &&
+	promise_and_delete "$A" &&
+
+	git -C repo config core.repositoryformatversion 1 &&
+	git -C repo config odb.magic.promisorRemote "arbitrary string" &&
+	git -C repo fsck
+'
+
 test_done
-- 
2.16.0.rc0.16.g82191dbc6c.dirty


^ permalink raw reply	[flat|nested] 51+ messages in thread

* [PATCH 06/40] fsck: support referenced promisor objects
  2018-01-03 16:33 [PATCH 00/40] Promisor remotes and external ODB support Christian Couder
                   ` (4 preceding siblings ...)
  2018-01-03 16:33 ` [PATCH 05/40] fsck: support refs pointing to " Christian Couder
@ 2018-01-03 16:33 ` Christian Couder
  2018-01-03 16:33 ` [PATCH 07/40] fsck: support promisor objects as CLI argument Christian Couder
                   ` (33 subsequent siblings)
  39 siblings, 0 replies; 51+ messages in thread
From: Christian Couder @ 2018-01-03 16:33 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Ben Peart, Jonathan Tan,
	Nguyen Thai Ngoc Duy, Mike Hommey, Lars Schneider, Eric Wong,
	Christian Couder, Jeff Hostetler

From: Jonathan Tan <jonathantanmy@google.com>

Teach fsck to not treat missing promisor objects indirectly pointed to
by refs as an error when extensions.partialclone is set.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 builtin/fsck.c           | 11 +++++++++++
 t/t0410-partial-clone.sh | 23 +++++++++++++++++++++++
 2 files changed, 34 insertions(+)

diff --git a/builtin/fsck.c b/builtin/fsck.c
index c6bb29d242..b8bcb0e40c 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -149,6 +149,15 @@ static int mark_object(struct object *obj, int type, void *data, struct fsck_opt
 	if (obj->flags & REACHABLE)
 		return 0;
 	obj->flags |= REACHABLE;
+
+	if (is_promisor_object(&obj->oid))
+		/*
+		 * Further recursion does not need to be performed on this
+		 * object since it is a promisor object (so it does not need to
+		 * be added to "pending").
+		 */
+		return 0;
+
 	if (!(obj->flags & HAS_OBJ)) {
 		if (parent && !has_object_file(&obj->oid)) {
 			printf("broken link from %7s %s\n",
@@ -208,6 +217,8 @@ static void check_reachable_object(struct object *obj)
 	 * do a full fsck
 	 */
 	if (!(obj->flags & HAS_OBJ)) {
+		if (is_promisor_object(&obj->oid))
+			return;
 		if (has_sha1_pack(obj->oid.hash))
 			return; /* it is in pack - forget about it */
 		printf("missing %s %s\n", printable_type(obj),
diff --git a/t/t0410-partial-clone.sh b/t/t0410-partial-clone.sh
index c4639e1134..46c88e8dfa 100755
--- a/t/t0410-partial-clone.sh
+++ b/t/t0410-partial-clone.sh
@@ -102,4 +102,27 @@ test_expect_success 'missing ref object, but promised, passes fsck' '
 	git -C repo fsck
 '
 
+test_expect_success 'missing object, but promised, passes fsck' '
+	rm -rf repo &&
+	test_create_repo repo &&
+	test_commit -C repo 1 &&
+	test_commit -C repo 2 &&
+	test_commit -C repo 3 &&
+	git -C repo tag -a annotated_tag -m "annotated tag" &&
+
+	C=$(git -C repo rev-parse 1) &&
+	T=$(git -C repo rev-parse 2^{tree}) &&
+	B=$(git hash-object repo/3.t) &&
+	AT=$(git -C repo rev-parse annotated_tag) &&
+
+	promise_and_delete "$C" &&
+	promise_and_delete "$T" &&
+	promise_and_delete "$B" &&
+	promise_and_delete "$AT" &&
+
+	git -C repo config core.repositoryformatversion 1 &&
+	git -C repo config odb.magic.promisorRemote "arbitrary string" &&
+	git -C repo fsck
+'
+
 test_done
-- 
2.16.0.rc0.16.g82191dbc6c.dirty


^ permalink raw reply	[flat|nested] 51+ messages in thread

* [PATCH 07/40] fsck: support promisor objects as CLI argument
  2018-01-03 16:33 [PATCH 00/40] Promisor remotes and external ODB support Christian Couder
                   ` (5 preceding siblings ...)
  2018-01-03 16:33 ` [PATCH 06/40] fsck: support referenced " Christian Couder
@ 2018-01-03 16:33 ` Christian Couder
  2018-01-03 16:33 ` [PATCH 08/40] index-pack: refactor writing of .keep files Christian Couder
                   ` (32 subsequent siblings)
  39 siblings, 0 replies; 51+ messages in thread
From: Christian Couder @ 2018-01-03 16:33 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Ben Peart, Jonathan Tan,
	Nguyen Thai Ngoc Duy, Mike Hommey, Lars Schneider, Eric Wong,
	Christian Couder, Jeff Hostetler

From: Jonathan Tan <jonathantanmy@google.com>

Teach fsck to not treat missing promisor objects provided on the CLI as
an error when extensions.partialclone is set.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 builtin/fsck.c           |  2 ++
 t/t0410-partial-clone.sh | 13 +++++++++++++
 2 files changed, 15 insertions(+)

diff --git a/builtin/fsck.c b/builtin/fsck.c
index b8bcb0e40c..a6fa6d6482 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -750,6 +750,8 @@ int cmd_fsck(int argc, const char **argv, const char *prefix)
 			struct object *obj = lookup_object(oid.hash);
 
 			if (!obj || !(obj->flags & HAS_OBJ)) {
+				if (is_promisor_object(&oid))
+					continue;
 				error("%s: object missing", oid_to_hex(&oid));
 				errors_found |= ERROR_OBJECT;
 				continue;
diff --git a/t/t0410-partial-clone.sh b/t/t0410-partial-clone.sh
index 46c88e8dfa..a0f901fa1d 100755
--- a/t/t0410-partial-clone.sh
+++ b/t/t0410-partial-clone.sh
@@ -125,4 +125,17 @@ test_expect_success 'missing object, but promised, passes fsck' '
 	git -C repo fsck
 '
 
+test_expect_success 'missing CLI object, but promised, passes fsck' '
+	rm -rf repo &&
+	test_create_repo repo &&
+	test_commit -C repo my_commit &&
+
+	A=$(git -C repo commit-tree -m a HEAD^{tree}) &&
+	promise_and_delete "$A" &&
+
+	git -C repo config core.repositoryformatversion 1 &&
+	git -C repo config odb.magic.promisorRemote "arbitrary string" &&
+	git -C repo fsck "$A"
+'
+
 test_done
-- 
2.16.0.rc0.16.g82191dbc6c.dirty


^ permalink raw reply	[flat|nested] 51+ messages in thread

* [PATCH 08/40] index-pack: refactor writing of .keep files
  2018-01-03 16:33 [PATCH 00/40] Promisor remotes and external ODB support Christian Couder
                   ` (6 preceding siblings ...)
  2018-01-03 16:33 ` [PATCH 07/40] fsck: support promisor objects as CLI argument Christian Couder
@ 2018-01-03 16:33 ` Christian Couder
  2018-01-03 16:33 ` [PATCH 09/40] introduce fetch-object: fetch one promisor object Christian Couder
                   ` (31 subsequent siblings)
  39 siblings, 0 replies; 51+ messages in thread
From: Christian Couder @ 2018-01-03 16:33 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Ben Peart, Jonathan Tan,
	Nguyen Thai Ngoc Duy, Mike Hommey, Lars Schneider, Eric Wong,
	Christian Couder, Jeff Hostetler

From: Jonathan Tan <jonathantanmy@google.com>

In a subsequent commit, index-pack will be taught to write ".promisor"
files which are similar to the ".keep" files it knows how to write.
Refactor the writing of ".keep" files, so that the implementation of
writing ".promisor" files becomes easier.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 builtin/index-pack.c | 99 ++++++++++++++++++++++++++++------------------------
 1 file changed, 53 insertions(+), 46 deletions(-)

diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 4c51aec81f..47515d8977 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -1389,15 +1389,58 @@ static void fix_unresolved_deltas(struct sha1file *f)
 	free(sorted_by_pos);
 }
 
+static const char *derive_filename(const char *pack_name, const char *suffix,
+				   struct strbuf *buf)
+{
+	size_t len;
+	if (!strip_suffix(pack_name, ".pack", &len))
+		die(_("packfile name '%s' does not end with '.pack'"),
+		    pack_name);
+	strbuf_add(buf, pack_name, len);
+	strbuf_addch(buf, '.');
+	strbuf_addstr(buf, suffix);
+	return buf->buf;
+}
+
+static void write_special_file(const char *suffix, const char *msg,
+			       const char *pack_name, const unsigned char *sha1,
+			       const char **report)
+{
+	struct strbuf name_buf = STRBUF_INIT;
+	const char *filename;
+	int fd;
+	int msg_len = strlen(msg);
+
+	if (pack_name)
+		filename = derive_filename(pack_name, suffix, &name_buf);
+	else
+		filename = odb_pack_name(&name_buf, sha1, suffix);
+
+	fd = odb_pack_keep(filename);
+	if (fd < 0) {
+		if (errno != EEXIST)
+			die_errno(_("cannot write %s file '%s'"),
+				  suffix, filename);
+	} else {
+		if (msg_len > 0) {
+			write_or_die(fd, msg, msg_len);
+			write_or_die(fd, "\n", 1);
+		}
+		if (close(fd) != 0)
+			die_errno(_("cannot close written %s file '%s'"),
+				  suffix, filename);
+		*report = suffix;
+	}
+	strbuf_release(&name_buf);
+}
+
 static void final(const char *final_pack_name, const char *curr_pack_name,
 		  const char *final_index_name, const char *curr_index_name,
-		  const char *keep_name, const char *keep_msg,
-		  unsigned char *sha1)
+		  const char *keep_msg, unsigned char *sha1)
 {
 	const char *report = "pack";
 	struct strbuf pack_name = STRBUF_INIT;
 	struct strbuf index_name = STRBUF_INIT;
-	struct strbuf keep_name_buf = STRBUF_INIT;
 	int err;
 
 	if (!from_stdin) {
@@ -1409,28 +1452,9 @@ static void final(const char *final_pack_name, const char *curr_pack_name,
 			die_errno(_("error while closing pack file"));
 	}
 
-	if (keep_msg) {
-		int keep_fd, keep_msg_len = strlen(keep_msg);
-
-		if (!keep_name)
-			keep_name = odb_pack_name(&keep_name_buf, sha1, "keep");
-
-		keep_fd = odb_pack_keep(keep_name);
-		if (keep_fd < 0) {
-			if (errno != EEXIST)
-				die_errno(_("cannot write keep file '%s'"),
-					  keep_name);
-		} else {
-			if (keep_msg_len > 0) {
-				write_or_die(keep_fd, keep_msg, keep_msg_len);
-				write_or_die(keep_fd, "\n", 1);
-			}
-			if (close(keep_fd) != 0)
-				die_errno(_("cannot close written keep file '%s'"),
-					  keep_name);
-			report = "keep";
-		}
-	}
+	if (keep_msg)
+		write_special_file("keep", keep_msg, final_pack_name, sha1,
+				   &report);
 
 	if (final_pack_name != curr_pack_name) {
 		if (!final_pack_name)
@@ -1472,7 +1496,6 @@ static void final(const char *final_pack_name, const char *curr_pack_name,
 
 	strbuf_release(&index_name);
 	strbuf_release(&pack_name);
-	strbuf_release(&keep_name_buf);
 }
 
 static int git_index_pack_config(const char *k, const char *v, void *cb)
@@ -1615,26 +1638,13 @@ static void show_pack_info(int stat_only)
 	}
 }
 
-static const char *derive_filename(const char *pack_name, const char *suffix,
-				   struct strbuf *buf)
-{
-	size_t len;
-	if (!strip_suffix(pack_name, ".pack", &len))
-		die(_("packfile name '%s' does not end with '.pack'"),
-		    pack_name);
-	strbuf_add(buf, pack_name, len);
-	strbuf_addstr(buf, suffix);
-	return buf->buf;
-}
-
 int cmd_index_pack(int argc, const char **argv, const char *prefix)
 {
 	int i, fix_thin_pack = 0, verify = 0, stat_only = 0;
 	const char *curr_index;
 	const char *index_name = NULL, *pack_name = NULL;
-	const char *keep_name = NULL, *keep_msg = NULL;
-	struct strbuf index_name_buf = STRBUF_INIT,
-		      keep_name_buf = STRBUF_INIT;
+	const char *keep_msg = NULL;
+	struct strbuf index_name_buf = STRBUF_INIT;
 	struct pack_idx_entry **idx_objects;
 	struct pack_idx_option opts;
 	unsigned char pack_sha1[20];
@@ -1740,9 +1750,7 @@ int cmd_index_pack(int argc, const char **argv, const char *prefix)
 	if (from_stdin && !startup_info->have_repository)
 		die(_("--stdin requires a git repository"));
 	if (!index_name && pack_name)
-		index_name = derive_filename(pack_name, ".idx", &index_name_buf);
-	if (keep_msg && !keep_name && pack_name)
-		keep_name = derive_filename(pack_name, ".keep", &keep_name_buf);
+		index_name = derive_filename(pack_name, "idx", &index_name_buf);
 
 	if (verify) {
 		if (!index_name)
@@ -1790,13 +1798,12 @@ int cmd_index_pack(int argc, const char **argv, const char *prefix)
 	if (!verify)
 		final(pack_name, curr_pack,
 		      index_name, curr_index,
-		      keep_name, keep_msg,
+		      keep_msg,
 		      pack_sha1);
 	else
 		close(input_fd);
 	free(objects);
 	strbuf_release(&index_name_buf);
-	strbuf_release(&keep_name_buf);
 	if (pack_name == NULL)
 		free((void *) curr_pack);
 	if (index_name == NULL)
-- 
2.16.0.rc0.16.g82191dbc6c.dirty


^ permalink raw reply	[flat|nested] 51+ messages in thread

* [PATCH 09/40] introduce fetch-object: fetch one promisor object
  2018-01-03 16:33 [PATCH 00/40] Promisor remotes and external ODB support Christian Couder
                   ` (7 preceding siblings ...)
  2018-01-03 16:33 ` [PATCH 08/40] index-pack: refactor writing of .keep files Christian Couder
@ 2018-01-03 16:33 ` Christian Couder
  2018-01-03 16:33 ` [PATCH 10/40] external-odb: implement external_odb_get_direct Christian Couder
                   ` (30 subsequent siblings)
  39 siblings, 0 replies; 51+ messages in thread
From: Christian Couder @ 2018-01-03 16:33 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Ben Peart, Jonathan Tan,
	Nguyen Thai Ngoc Duy, Mike Hommey, Lars Schneider, Eric Wong,
	Christian Couder, Jeff Hostetler

From: Jonathan Tan <jonathantanmy@google.com>

Introduce fetch-object, providing the ability to fetch one object from a
promisor remote.

This uses fetch-pack. To do this, the transport mechanism has been
updated with 2 flags, "from-promisor" to indicate that the resulting
pack comes from a promisor remote (and thus should be annotated as such
by index-pack), and "no-dependents" to indicate that only the objects
themselves need to be fetched (but fetching additional objects is
nevertheless safe).

Whenever "no-dependents" is used, fetch-pack will refrain from using any
object flags, because it is most likely invoked as part of a dynamic
object fetch by another Git command (which may itself use object flags).
An alternative to this is to leave fetch-pack alone, and instead update
the allocation of flags so that fetch-pack's flags never overlap with
any others, but this will end up shrinking the number of flags available
to nearly every other Git command (that is, every Git command that
accesses objects), so the approach in this commit was used instead.

This will be tested in a subsequent commit.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 Documentation/gitremote-helpers.txt |  7 ++++++
 Makefile                            |  1 +
 builtin/fetch-pack.c                |  8 +++++++
 builtin/index-pack.c                | 14 ++++++++---
 fetch-object.c                      | 28 ++++++++++++++++++++++
 fetch-object.h                      |  6 +++++
 fetch-pack.c                        | 48 +++++++++++++++++++++----------------
 fetch-pack.h                        |  8 +++++++
 remote-curl.c                       | 14 ++++++++++-
 transport.c                         |  8 +++++++
 transport.h                         | 11 +++++++++
 11 files changed, 128 insertions(+), 25 deletions(-)
 create mode 100644 fetch-object.c
 create mode 100644 fetch-object.h

diff --git a/Documentation/gitremote-helpers.txt b/Documentation/gitremote-helpers.txt
index 4a584f3c5d..4b8c93ec59 100644
--- a/Documentation/gitremote-helpers.txt
+++ b/Documentation/gitremote-helpers.txt
@@ -466,6 +466,13 @@ set by Git if the remote helper has the 'option' capability.
 	Transmit <string> as a push option. As the push option
 	must not contain LF or NUL characters, the string is not encoded.
 
+'option from-promisor' {'true'|'false'}::
+	Indicate that these objects are being fetched from a promisor.
+
+'option no-dependents' {'true'|'false'}::
+	Indicate that only the objects wanted need to be fetched, not
+	their dependents.
+
 SEE ALSO
 --------
 linkgit:git-remote[1]
diff --git a/Makefile b/Makefile
index 07694185c9..25b878279e 100644
--- a/Makefile
+++ b/Makefile
@@ -800,6 +800,7 @@ LIB_OBJS += ewah/ewah_io.o
 LIB_OBJS += ewah/ewah_rlw.o
 LIB_OBJS += exec_cmd.o
 LIB_OBJS += external-odb.o
+LIB_OBJS += fetch-object.o
 LIB_OBJS += fetch-pack.o
 LIB_OBJS += fsck.o
 LIB_OBJS += fsmonitor.o
diff --git a/builtin/fetch-pack.c b/builtin/fetch-pack.c
index 366b9d13f9..02abe7211e 100644
--- a/builtin/fetch-pack.c
+++ b/builtin/fetch-pack.c
@@ -143,6 +143,14 @@ int cmd_fetch_pack(int argc, const char **argv, const char *prefix)
 			args.update_shallow = 1;
 			continue;
 		}
+		if (!strcmp("--from-promisor", arg)) {
+			args.from_promisor = 1;
+			continue;
+		}
+		if (!strcmp("--no-dependents", arg)) {
+			args.no_dependents = 1;
+			continue;
+		}
 		usage(fetch_pack_usage);
 	}
 	if (deepen_not.nr)
diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 47515d8977..9dffaf20ae 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -1429,14 +1429,16 @@ static void write_special_file(const char *suffix, const char *msg,
 		if (close(fd) != 0)
 			die_errno(_("cannot close written %s file '%s'"),
 				  suffix, filename);
-		*report = suffix;
+		if (report)
+			*report = suffix;
 	}
 	strbuf_release(&name_buf);
 }
 
 static void final(const char *final_pack_name, const char *curr_pack_name,
 		  const char *final_index_name, const char *curr_index_name,
-		  const char *keep_msg, unsigned char *sha1)
+		  const char *keep_msg, const char *promisor_msg,
+		  unsigned char *sha1)
 {
 	const char *report = "pack";
 	struct strbuf pack_name = STRBUF_INIT;
@@ -1455,6 +1457,9 @@ static void final(const char *final_pack_name, const char *curr_pack_name,
 	if (keep_msg)
 		write_special_file("keep", keep_msg, final_pack_name, sha1,
 				   &report);
+	if (promisor_msg)
+		write_special_file("promisor", promisor_msg, final_pack_name,
+				   sha1, NULL);
 
 	if (final_pack_name != curr_pack_name) {
 		if (!final_pack_name)
@@ -1644,6 +1649,7 @@ int cmd_index_pack(int argc, const char **argv, const char *prefix)
 	const char *curr_index;
 	const char *index_name = NULL, *pack_name = NULL;
 	const char *keep_msg = NULL;
+	const char *promisor_msg = NULL;
 	struct strbuf index_name_buf = STRBUF_INIT;
 	struct pack_idx_entry **idx_objects;
 	struct pack_idx_option opts;
@@ -1688,6 +1694,8 @@ int cmd_index_pack(int argc, const char **argv, const char *prefix)
 				stat_only = 1;
 			} else if (skip_to_optional_arg(arg, "--keep", &keep_msg)) {
 				; /* nothing to do */
+			} else if (skip_to_optional_arg(arg, "--promisor", &promisor_msg)) {
+				; /* nothing to do */
 			} else if (starts_with(arg, "--threads=")) {
 				char *end;
 				nr_threads = strtoul(arg+10, &end, 0);
@@ -1798,7 +1806,7 @@ int cmd_index_pack(int argc, const char **argv, const char *prefix)
 	if (!verify)
 		final(pack_name, curr_pack,
 		      index_name, curr_index,
-		      keep_msg,
+		      keep_msg, promisor_msg,
 		      pack_sha1);
 	else
 		close(input_fd);
diff --git a/fetch-object.c b/fetch-object.c
new file mode 100644
index 0000000000..23ec2bb0d0
--- /dev/null
+++ b/fetch-object.c
@@ -0,0 +1,28 @@
+#include "cache.h"
+#include "packfile.h"
+#include "pkt-line.h"
+#include "strbuf.h"
+#include "transport.h"
+#include "fetch-object.h"
+
+/*
+ * TODO Investigate having fetch_object() return error/success,
+ * changing callers.
+ */
+void fetch_object(const char *remote_name, const unsigned char *sha1)
+{
+	struct remote *remote;
+	struct transport *transport;
+	struct ref *ref;
+
+	remote = remote_get(remote_name);
+	if (!remote->url[0])
+		die(_("Remote with no URL"));
+	transport = transport_get(remote, remote->url[0]);
+
+	ref = alloc_ref(sha1_to_hex(sha1));
+	hashcpy(ref->old_oid.hash, sha1);
+	transport_set_option(transport, TRANS_OPT_FROM_PROMISOR, "1");
+	transport_set_option(transport, TRANS_OPT_NO_DEPENDENTS, "1");
+	transport_fetch_refs(transport, ref);
+}
diff --git a/fetch-object.h b/fetch-object.h
new file mode 100644
index 0000000000..f371300c88
--- /dev/null
+++ b/fetch-object.h
@@ -0,0 +1,6 @@
+#ifndef FETCH_OBJECT_H
+#define FETCH_OBJECT_H
+
+extern void fetch_object(const char *remote_name, const unsigned char *sha1);
+
+#endif
diff --git a/fetch-pack.c b/fetch-pack.c
index 9f6b07ad91..7aa1f58995 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -450,6 +450,8 @@ static int find_common(struct fetch_pack_args *args,
 
 	flushes = 0;
 	retval = -1;
+	if (args->no_dependents)
+		goto done;
 	while ((oid = get_rev())) {
 		packet_buf_write(&req_buf, "have %s\n", oid_to_hex(oid));
 		print_verbose(args, "have %s", oid_to_hex(oid));
@@ -735,29 +737,31 @@ static int everything_local(struct fetch_pack_args *args,
 		}
 	}
 
-	if (!args->deepen) {
-		for_each_ref(mark_complete_oid, NULL);
-		for_each_cached_alternate(mark_alternate_complete);
-		commit_list_sort_by_date(&complete);
-		if (cutoff)
-			mark_recent_complete_commits(args, cutoff);
-	}
+	if (!args->no_dependents) {
+		if (!args->deepen) {
+			for_each_ref(mark_complete_oid, NULL);
+			for_each_cached_alternate(mark_alternate_complete);
+			commit_list_sort_by_date(&complete);
+			if (cutoff)
+				mark_recent_complete_commits(args, cutoff);
+		}
 
-	/*
-	 * Mark all complete remote refs as common refs.
-	 * Don't mark them common yet; the server has to be told so first.
-	 */
-	for (ref = *refs; ref; ref = ref->next) {
-		struct object *o = deref_tag(lookup_object(ref->old_oid.hash),
-					     NULL, 0);
+		/*
+		 * Mark all complete remote refs as common refs.
+		 * Don't mark them common yet; the server has to be told so first.
+		 */
+		for (ref = *refs; ref; ref = ref->next) {
+			struct object *o = deref_tag(lookup_object(ref->old_oid.hash),
+						     NULL, 0);
 
-		if (!o || o->type != OBJ_COMMIT || !(o->flags & COMPLETE))
-			continue;
+			if (!o || o->type != OBJ_COMMIT || !(o->flags & COMPLETE))
+				continue;
 
-		if (!(o->flags & SEEN)) {
-			rev_list_push((struct commit *)o, COMMON_REF | SEEN);
+			if (!(o->flags & SEEN)) {
+				rev_list_push((struct commit *)o, COMMON_REF | SEEN);
 
-			mark_common((struct commit *)o, 1, 1);
+				mark_common((struct commit *)o, 1, 1);
+			}
 		}
 	}
 
@@ -833,7 +837,7 @@ static int get_pack(struct fetch_pack_args *args,
 		argv_array_push(&cmd.args, alternate_shallow_file);
 	}
 
-	if (do_keep) {
+	if (do_keep || args->from_promisor) {
 		if (pack_lockfile)
 			cmd.out = -1;
 		cmd_name = "index-pack";
@@ -843,7 +847,7 @@ static int get_pack(struct fetch_pack_args *args,
 			argv_array_push(&cmd.args, "-v");
 		if (args->use_thin_pack)
 			argv_array_push(&cmd.args, "--fix-thin");
-		if (args->lock_pack || unpack_limit) {
+		if (do_keep && (args->lock_pack || unpack_limit)) {
 			char hostname[HOST_NAME_MAX + 1];
 			if (xgethostname(hostname, sizeof(hostname)))
 				xsnprintf(hostname, sizeof(hostname), "localhost");
@@ -853,6 +857,8 @@ static int get_pack(struct fetch_pack_args *args,
 		}
 		if (args->check_self_contained_and_connected)
 			argv_array_push(&cmd.args, "--check-self-contained-and-connected");
+		if (args->from_promisor)
+			argv_array_push(&cmd.args, "--promisor");
 	}
 	else {
 		cmd_name = "unpack-objects";
diff --git a/fetch-pack.h b/fetch-pack.h
index b6aeb43a8e..aeac152644 100644
--- a/fetch-pack.h
+++ b/fetch-pack.h
@@ -29,6 +29,14 @@ struct fetch_pack_args {
 	unsigned cloning:1;
 	unsigned update_shallow:1;
 	unsigned deepen:1;
+	unsigned from_promisor:1;
+
+	/*
+	 * If 1, fetch_pack() will also not modify any object flags.
+	 * This allows fetch_pack() to safely be called by any function,
+	 * regardless of which object flags it uses (if any).
+	 */
+	unsigned no_dependents:1;
 };
 
 /*
diff --git a/remote-curl.c b/remote-curl.c
index 0053b09549..431839111a 100644
--- a/remote-curl.c
+++ b/remote-curl.c
@@ -33,7 +33,9 @@ struct options {
 		thin : 1,
 		/* One of the SEND_PACK_PUSH_CERT_* constants. */
 		push_cert : 2,
-		deepen_relative : 1;
+		deepen_relative : 1,
+		from_promisor : 1,
+		no_dependents : 1;
 };
 static struct options options;
 static struct string_list cas_options = STRING_LIST_INIT_DUP;
@@ -157,6 +159,12 @@ static int set_option(const char *name, const char *value)
 			return -1;
 		return 0;
 #endif /* LIBCURL_VERSION_NUM >= 0x070a08 */
+	} else if (!strcmp(name, "from-promisor")) {
+		options.from_promisor = 1;
+		return 0;
+	} else if (!strcmp(name, "no-dependents")) {
+		options.no_dependents = 1;
+		return 0;
 	} else {
 		return 1 /* unsupported */;
 	}
@@ -822,6 +830,10 @@ static int fetch_git(struct discovery *heads,
 				 options.deepen_not.items[i].string);
 	if (options.deepen_relative && options.depth)
 		argv_array_push(&args, "--deepen-relative");
+	if (options.from_promisor)
+		argv_array_push(&args, "--from-promisor");
+	if (options.no_dependents)
+		argv_array_push(&args, "--no-dependents");
 	argv_array_push(&args, url.buf);
 
 	for (i = 0; i < nr_heads; i++) {
diff --git a/transport.c b/transport.c
index fc802260f6..e82db773fd 100644
--- a/transport.c
+++ b/transport.c
@@ -161,6 +161,12 @@ static int set_git_option(struct git_transport_options *opts,
 	} else if (!strcmp(name, TRANS_OPT_DEEPEN_RELATIVE)) {
 		opts->deepen_relative = !!value;
 		return 0;
+	} else if (!strcmp(name, TRANS_OPT_FROM_PROMISOR)) {
+		opts->from_promisor = !!value;
+		return 0;
+	} else if (!strcmp(name, TRANS_OPT_NO_DEPENDENTS)) {
+		opts->no_dependents = !!value;
+		return 0;
 	}
 	return 1;
 }
@@ -229,6 +235,8 @@ static int fetch_refs_via_pack(struct transport *transport,
 		data->options.check_self_contained_and_connected;
 	args.cloning = transport->cloning;
 	args.update_shallow = data->options.update_shallow;
+	args.from_promisor = data->options.from_promisor;
+	args.no_dependents = data->options.no_dependents;
 
 	if (!data->got_remote_heads) {
 		connect_setup(transport, 0);
diff --git a/transport.h b/transport.h
index 731c78b679..8c3430a5b9 100644
--- a/transport.h
+++ b/transport.h
@@ -15,6 +15,8 @@ struct git_transport_options {
 	unsigned self_contained_and_connected : 1;
 	unsigned update_shallow : 1;
 	unsigned deepen_relative : 1;
+	unsigned from_promisor : 1;
+	unsigned no_dependents : 1;
 	int depth;
 	const char *deepen_since;
 	const struct string_list *deepen_not;
@@ -159,6 +161,15 @@ void transport_check_allowed(const char *type);
 /* Send push certificates */
 #define TRANS_OPT_PUSH_CERT "pushcert"
 
+/* Indicate that these objects are being fetched by a promisor */
+#define TRANS_OPT_FROM_PROMISOR "from-promisor"
+
+/*
+ * Indicate that only the objects wanted need to be fetched, not their
+ * dependents
+ */
+#define TRANS_OPT_NO_DEPENDENTS "no-dependents"
+
 /**
  * Returns 0 if the option was used, non-zero otherwise. Prints a
  * message to stderr if the option is not used.
-- 
2.16.0.rc0.16.g82191dbc6c.dirty


^ permalink raw reply	[flat|nested] 51+ messages in thread

* [PATCH 10/40] external-odb: implement external_odb_get_direct
  2018-01-03 16:33 [PATCH 00/40] Promisor remotes and external ODB support Christian Couder
                   ` (8 preceding siblings ...)
  2018-01-03 16:33 ` [PATCH 09/40] introduce fetch-object: fetch one promisor object Christian Couder
@ 2018-01-03 16:33 ` Christian Couder
  2018-01-04 17:44   ` Jeff Hostetler
  2018-01-03 16:33 ` [PATCH 11/40] sha1_file: support lazily fetching missing objects Christian Couder
                   ` (29 subsequent siblings)
  39 siblings, 1 reply; 51+ messages in thread
From: Christian Couder @ 2018-01-03 16:33 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Ben Peart, Jonathan Tan,
	Nguyen Thai Ngoc Duy, Mike Hommey, Lars Schneider, Eric Wong,
	Christian Couder, Jeff Hostetler

This is implemented only in the promisor remote mode
for now by calling fetch_object().

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
 external-odb.c | 15 +++++++++++++++
 external-odb.h |  1 +
 odb-helper.c   | 13 +++++++++++++
 odb-helper.h   |  3 ++-
 4 files changed, 31 insertions(+), 1 deletion(-)

diff --git a/external-odb.c b/external-odb.c
index d26e63d8b1..5d0afb9762 100644
--- a/external-odb.c
+++ b/external-odb.c
@@ -76,3 +76,18 @@ int external_odb_has_object(const unsigned char *sha1)
 			return 1;
 	return 0;
 }
+
+int external_odb_get_direct(const unsigned char *sha1)
+{
+	struct odb_helper *o;
+
+	external_odb_init();
+
+	for (o = helpers; o; o = o->next) {
+		if (odb_helper_get_direct(o, sha1) < 0)
+			continue;
+		return 0;
+	}
+
+	return -1;
+}
diff --git a/external-odb.h b/external-odb.h
index 9a3c2f01b3..fd6708163e 100644
--- a/external-odb.h
+++ b/external-odb.h
@@ -4,5 +4,6 @@
 extern int has_external_odb(void);
 extern const char *external_odb_root(void);
 extern int external_odb_has_object(const unsigned char *sha1);
+extern int external_odb_get_direct(const unsigned char *sha1);
 
 #endif /* EXTERNAL_ODB_H */
diff --git a/odb-helper.c b/odb-helper.c
index 1404393807..4b70b287af 100644
--- a/odb-helper.c
+++ b/odb-helper.c
@@ -4,6 +4,7 @@
 #include "odb-helper.h"
 #include "run-command.h"
 #include "sha1-lookup.h"
+#include "fetch-object.h"
 
 struct odb_helper *odb_helper_new(const char *name, int namelen)
 {
@@ -52,3 +53,15 @@ int odb_helper_has_object(struct odb_helper *o, const unsigned char *sha1)
 	return !!odb_helper_lookup(o, sha1);
 }
 
+int odb_helper_get_direct(struct odb_helper *o,
+			  const unsigned char *sha1)
+{
+	int res = 0;
+	uint64_t start = getnanotime();
+
+	fetch_object(o->dealer, sha1);
+
+	trace_performance_since(start, "odb_helper_get_direct");
+
+	return res;
+}
diff --git a/odb-helper.h b/odb-helper.h
index 9395e606ce..f4bc66b0ef 100644
--- a/odb-helper.h
+++ b/odb-helper.h
@@ -20,5 +20,6 @@ struct odb_helper {
 extern struct odb_helper *odb_helper_new(const char *name, int namelen);
 extern int odb_helper_has_object(struct odb_helper *o,
 				 const unsigned char *sha1);
-
+extern int odb_helper_get_direct(struct odb_helper *o,
+				 const unsigned char *sha1);
 #endif /* ODB_HELPER_H */
-- 
2.16.0.rc0.16.g82191dbc6c.dirty


^ permalink raw reply	[flat|nested] 51+ messages in thread

* [PATCH 11/40] sha1_file: support lazily fetching missing objects
  2018-01-03 16:33 [PATCH 00/40] Promisor remotes and external ODB support Christian Couder
                   ` (9 preceding siblings ...)
  2018-01-03 16:33 ` [PATCH 10/40] external-odb: implement external_odb_get_direct Christian Couder
@ 2018-01-03 16:33 ` Christian Couder
  2018-01-03 16:33 ` [PATCH 12/40] rev-list: support termination at promisor objects Christian Couder
                   ` (28 subsequent siblings)
  39 siblings, 0 replies; 51+ messages in thread
From: Christian Couder @ 2018-01-03 16:33 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Ben Peart, Jonathan Tan,
	Nguyen Thai Ngoc Duy, Mike Hommey, Lars Schneider, Eric Wong,
	Christian Couder, Jeff Hostetler

From: Jonathan Tan <jonathantanmy@google.com>

Teach sha1_file to fetch objects from the remote configured in
extensions.partialclone whenever an object is requested but missing.

The fetching of objects can be suppressed through a global variable.
This is used by fsck and index-pack.

However, by default, such fetching is not suppressed. This is meant as a
temporary measure to ensure that all Git commands work in such a
situation. Future patches will update some commands to either tolerate
missing objects (without fetching them) or be more efficient in fetching
them.

In order to determine the code changes in sha1_file.c necessary, I
investigated the following:
 (1) functions in sha1_file.c that take in a hash, without the user
     regarding how the object is stored (loose or packed)
 (2) functions in packfile.c (because I need to check callers that know
     about the loose/packed distinction and operate on both differently,
     and ensure that they can handle the concept of objects that are
     neither loose nor packed)

(1) is handled by the modification to sha1_object_info_extended().

For (2), I looked at for_each_packed_object and others.  For
for_each_packed_object, the callers either already work or are fixed in
this patch:
 - reachable - only to find recent objects
 - builtin/fsck - already knows about missing objects
 - builtin/cat-file - warning message added in this commit

Callers of the other functions do not need to be changed:
 - parse_pack_index
   - http - indirectly from http_get_info_packs
   - find_pack_entry_one
     - this searches a single pack that is provided as an argument; the
       caller already knows (through other means) that the sought object
       is in a specific pack
 - find_sha1_pack
   - fast-import - appears to be an optimization to not store a file if
     it is already in a pack
   - http-walker - to search through a struct alt_base
   - http-push - to search through remote packs
 - has_sha1_pack
   - builtin/fsck - already knows about promisor objects
   - builtin/count-objects - informational purposes only (check if loose
     object is also packed)
   - builtin/prune-packed - check if object to be pruned is packed (if
     not, don't prune it)
   - revision - used to exclude packed objects if requested by user
   - diff - just for optimization

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 builtin/cat-file.c       |  3 +++
 builtin/fetch-pack.c     |  2 ++
 builtin/fsck.c           |  3 +++
 builtin/index-pack.c     |  6 ++++++
 cache.h                  |  8 ++++++++
 fetch-object.c           |  3 +++
 sha1_file.c              | 28 ++++++++++++++++++--------
 t/t0410-partial-clone.sh | 51 ++++++++++++++++++++++++++++++++++++++++++++++++
 8 files changed, 96 insertions(+), 8 deletions(-)

diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index f5fa4fd75a..1e4edd81a0 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -13,6 +13,7 @@
 #include "tree-walk.h"
 #include "sha1-array.h"
 #include "packfile.h"
+#include "external-odb.h"
 
 struct batch_options {
 	int enabled;
@@ -475,6 +476,8 @@ static int batch_objects(struct batch_options *opt)
 
 		for_each_loose_object(batch_loose_object, &sa, 0);
 		for_each_packed_object(batch_packed_object, &sa, 0);
+		if (has_external_odb())
+			warning("This repository uses an odb. Some objects may not be loaded.");
 
 		cb.opt = opt;
 		cb.expand = &data;
diff --git a/builtin/fetch-pack.c b/builtin/fetch-pack.c
index 02abe7211e..15eeed7b17 100644
--- a/builtin/fetch-pack.c
+++ b/builtin/fetch-pack.c
@@ -53,6 +53,8 @@ int cmd_fetch_pack(int argc, const char **argv, const char *prefix)
 	struct oid_array shallow = OID_ARRAY_INIT;
 	struct string_list deepen_not = STRING_LIST_INIT_DUP;
 
+	fetch_if_missing = 0;
+
 	packet_trace_identity("fetch-pack");
 
 	memset(&args, 0, sizeof(args));
diff --git a/builtin/fsck.c b/builtin/fsck.c
index a6fa6d6482..7a8a679d4f 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -678,6 +678,9 @@ int cmd_fsck(int argc, const char **argv, const char *prefix)
 	int i;
 	struct alternate_object_database *alt;
 
+	/* fsck knows how to handle missing promisor objects */
+	fetch_if_missing = 0;
+
 	errors_found = 0;
 	check_replace_refs = 0;
 
diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 9dffaf20ae..54c921fa71 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -1657,6 +1657,12 @@ int cmd_index_pack(int argc, const char **argv, const char *prefix)
 	unsigned foreign_nr = 1;	/* zero is a "good" value, assume bad */
 	int report_end_of_input = 0;
 
+	/*
+	 * index-pack never needs to fetch missing objects, since it only
+	 * accesses the repo to do hash collision checks
+	 */
+	fetch_if_missing = 0;
+
 	if (argc == 2 && !strcmp(argv[1], "-h"))
 		usage(index_pack_usage);
 
diff --git a/cache.h b/cache.h
index 078607ee91..3fabf998ce 100644
--- a/cache.h
+++ b/cache.h
@@ -1789,6 +1789,14 @@ struct object_info {
 #define OBJECT_INFO_QUICK 8
 extern int sha1_object_info_extended(const unsigned char *, struct object_info *, unsigned flags);
 
+/*
+ * Set this to 0 to prevent sha1_object_info_extended() from fetching missing
+ * blobs. This has a difference only if extensions.partialClone is set.
+ *
+ * Its default value is 1.
+ */
+extern int fetch_if_missing;
+
 /* Dumb servers support */
 extern int update_server_info(int);
 
diff --git a/fetch-object.c b/fetch-object.c
index 23ec2bb0d0..8afadeda2b 100644
--- a/fetch-object.c
+++ b/fetch-object.c
@@ -14,7 +14,9 @@ void fetch_object(const char *remote_name, const unsigned char *sha1)
 	struct remote *remote;
 	struct transport *transport;
 	struct ref *ref;
+	int original_fetch_if_missing = fetch_if_missing;
 
+	fetch_if_missing = 0;
 	remote = remote_get(remote_name);
 	if (!remote->url[0])
 		die(_("Remote with no URL"));
@@ -25,4 +27,5 @@ void fetch_object(const char *remote_name, const unsigned char *sha1)
 	transport_set_option(transport, TRANS_OPT_FROM_PROMISOR, "1");
 	transport_set_option(transport, TRANS_OPT_NO_DEPENDENTS, "1");
 	transport_fetch_refs(transport, ref);
+	fetch_if_missing = original_fetch_if_missing;
 }
diff --git a/sha1_file.c b/sha1_file.c
index cba6b2a537..261baf800f 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -1233,6 +1233,8 @@ static int sha1_loose_object_info(const unsigned char *sha1,
 	return (status < 0) ? status : 0;
 }
 
+int fetch_if_missing = 1;
+
 int sha1_object_info_extended(const unsigned char *sha1, struct object_info *oi, unsigned flags)
 {
 	static struct object_info blank_oi = OBJECT_INFO_INIT;
@@ -1241,6 +1243,7 @@ int sha1_object_info_extended(const unsigned char *sha1, struct object_info *oi,
 	const unsigned char *real = (flags & OBJECT_INFO_LOOKUP_REPLACE) ?
 				    lookup_replace_object(sha1) :
 				    sha1;
+	int already_retried = 0;
 
 	if (is_null_sha1(real))
 		return -1;
@@ -1268,19 +1271,29 @@ int sha1_object_info_extended(const unsigned char *sha1, struct object_info *oi,
 		}
 	}
 
-	if (!find_pack_entry(real, &e)) {
+	while (1) {
+		if (find_pack_entry(real, &e))
+			break;
+
 		/* Most likely it's a loose object. */
 		if (!sha1_loose_object_info(real, oi, flags))
 			return 0;
 
 		/* Not a loose object; someone else may have just packed it. */
-		if (flags & OBJECT_INFO_QUICK) {
-			return -1;
-		} else {
-			reprepare_packed_git();
-			if (!find_pack_entry(real, &e))
-				return -1;
+		reprepare_packed_git();
+		if (find_pack_entry(real, &e))
+			break;
+
+		/* Check if it is a missing object */
+		if (fetch_if_missing && has_external_odb() &&
+		    !already_retried) {
+			if (!external_odb_get_direct(real))
+				return 0;
+			already_retried = 1;
+			continue;
 		}
+
+		return -1;
 	}
 
 	if (oi == &blank_oi)
@@ -1289,7 +1302,6 @@ int sha1_object_info_extended(const unsigned char *sha1, struct object_info *oi,
 		 * information below, so return early.
 		 */
 		return 0;
-
 	rtype = packed_object_info(e.p, e.offset, oi);
 	if (rtype < 0) {
 		mark_bad_packed_object(e.p, real);
diff --git a/t/t0410-partial-clone.sh b/t/t0410-partial-clone.sh
index a0f901fa1d..8b20d18603 100755
--- a/t/t0410-partial-clone.sh
+++ b/t/t0410-partial-clone.sh
@@ -138,4 +138,55 @@ test_expect_success 'missing CLI object, but promised, passes fsck' '
 	git -C repo fsck "$A"
 '
 
+test_expect_success 'fetching of missing objects' '
+	rm -rf repo &&
+	test_create_repo server &&
+	test_commit -C server foo &&
+	git -C server repack -a -d --write-bitmap-index &&
+
+	git clone "file://$(pwd)/server" repo &&
+	HASH=$(git -C repo rev-parse foo) &&
+	rm -rf repo/.git/objects/* &&
+
+	git -C repo config core.repositoryformatversion 1 &&
+	git -C repo config odb.magic.promisorRemote "origin" &&
+	git -C repo cat-file -p "$HASH" &&
+
+	# Ensure that the .promisor file is written, and check that its
+	# associated packfile contains the object
+	ls repo/.git/objects/pack/pack-*.promisor >promisorlist &&
+	test_line_count = 1 promisorlist &&
+	IDX=$(cat promisorlist | sed "s/promisor$/idx/") &&
+	git verify-pack --verbose "$IDX" | grep "$HASH"
+'
+
+LIB_HTTPD_PORT=12345  # default port, 410, cannot be used as non-root
+. "$TEST_DIRECTORY"/lib-httpd.sh
+start_httpd
+
+test_expect_success 'fetching of missing objects from an HTTP server' '
+	rm -rf repo &&
+	SERVER="$HTTPD_DOCUMENT_ROOT_PATH/server" &&
+	test_create_repo "$SERVER" &&
+	test_commit -C "$SERVER" foo &&
+	git -C "$SERVER" repack -a -d --write-bitmap-index &&
+
+	git clone $HTTPD_URL/smart/server repo &&
+	HASH=$(git -C repo rev-parse foo) &&
+	rm -rf repo/.git/objects/* &&
+
+	git -C repo config core.repositoryformatversion 1 &&
+	git -C repo config odb.magic.promisorRemote "origin" &&
+	git -C repo cat-file -p "$HASH" &&
+
+	# Ensure that the .promisor file is written, and check that its
+	# associated packfile contains the object
+	ls repo/.git/objects/pack/pack-*.promisor >promisorlist &&
+	test_line_count = 1 promisorlist &&
+	IDX=$(cat promisorlist | sed "s/promisor$/idx/") &&
+	git verify-pack --verbose "$IDX" | grep "$HASH"
+'
+
+stop_httpd
+
 test_done
-- 
2.16.0.rc0.16.g82191dbc6c.dirty


^ permalink raw reply	[flat|nested] 51+ messages in thread

* [PATCH 12/40] rev-list: support termination at promisor objects
  2018-01-03 16:33 [PATCH 00/40] Promisor remotes and external ODB support Christian Couder
                   ` (10 preceding siblings ...)
  2018-01-03 16:33 ` [PATCH 11/40] sha1_file: support lazily fetching missing objects Christian Couder
@ 2018-01-03 16:33 ` Christian Couder
  2018-01-03 16:33 ` [PATCH 13/40] gc: do not repack promisor packfiles Christian Couder
                   ` (27 subsequent siblings)
  39 siblings, 0 replies; 51+ messages in thread
From: Christian Couder @ 2018-01-03 16:33 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Ben Peart, Jonathan Tan,
	Nguyen Thai Ngoc Duy, Mike Hommey, Lars Schneider, Eric Wong,
	Christian Couder, Jeff Hostetler

From: Jonathan Tan <jonathantanmy@google.com>

Teach rev-list to support termination of an object traversal at any
object from a promisor remote (whether one that the local repo also has,
or one that the local repo knows about because it has another promisor
object that references it).

This will be used subsequently in gc and in the connectivity check used
by fetch.

For efficiency, if an object is referenced by a promisor object, and is
in the local repo only as a non-promisor object, object traversal will
not stop there. This is to avoid building the list of promisor object
references.

(In list-objects.c, the case where obj is NULL in process_blob() and
process_tree() do not need to be changed because those happen only when
there is a conflict between the expected type and the existing object.
If the object doesn't exist, an object will be synthesized, which is
fine.)

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 Documentation/rev-list-options.txt |  11 ++++
 builtin/rev-list.c                 |  69 ++++++++++++++++++++++---
 list-objects.c                     |  29 ++++++++++-
 object.c                           |   2 +-
 revision.c                         |  33 +++++++++++-
 revision.h                         |   5 +-
 t/t0410-partial-clone.sh           | 101 +++++++++++++++++++++++++++++++++++++
 7 files changed, 239 insertions(+), 11 deletions(-)

diff --git a/Documentation/rev-list-options.txt b/Documentation/rev-list-options.txt
index 22f5c9b43d..7b273635de 100644
--- a/Documentation/rev-list-options.txt
+++ b/Documentation/rev-list-options.txt
@@ -750,10 +750,21 @@ The form '--missing=allow-any' will allow object traversal to continue
 if a missing object is encountered.  Missing objects will silently be
 omitted from the results.
 +
+The form '--missing=allow-promisor' is like 'allow-any', but will only
+allow object traversal to continue for EXPECTED promisor missing objects.
+Unexpected missing objects will raise an error.
++
 The form '--missing=print' is like 'allow-any', but will also print a
 list of the missing objects.  Object IDs are prefixed with a ``?'' character.
 endif::git-rev-list[]
 
+--exclude-promisor-objects::
+	(For internal use only.)  Prefilter object traversal at
+	promisor boundary.  This is used with partial clone.  This is
+	stronger than `--missing=allow-promisor` because it limits the
+	traversal, rather than just silencing errors about missing
+	objects.
+
 --no-walk[=(sorted|unsorted)]::
 	Only show the given commits, but do not traverse their ancestors.
 	This has no effect if a range is specified. If the argument
diff --git a/builtin/rev-list.c b/builtin/rev-list.c
index d5345b6a2e..e27aa1fc07 100644
--- a/builtin/rev-list.c
+++ b/builtin/rev-list.c
@@ -15,6 +15,7 @@
 #include "progress.h"
 #include "reflog-walk.h"
 #include "oidset.h"
+#include "packfile.h"
 
 static const char rev_list_usage[] =
 "git rev-list [OPTION] <commit-id>... [ -- paths... ]\n"
@@ -67,6 +68,7 @@ enum missing_action {
 	MA_ERROR = 0,    /* fail if any missing objects are encountered */
 	MA_ALLOW_ANY,    /* silently allow ALL missing objects */
 	MA_PRINT,        /* print ALL missing objects in special section */
+	MA_ALLOW_PROMISOR, /* silently allow all missing PROMISOR objects */
 };
 static enum missing_action arg_missing_action;
 
@@ -197,6 +199,12 @@ static void finish_commit(struct commit *commit, void *data)
 
 static inline void finish_object__ma(struct object *obj)
 {
+	/*
+	 * Whether or not we try to dynamically fetch missing objects
+	 * from the server, we currently DO NOT have the object.  We
+	 * can either print, allow (ignore), or conditionally allow
+	 * (ignore) them.
+	 */
 	switch (arg_missing_action) {
 	case MA_ERROR:
 		die("missing blob object '%s'", oid_to_hex(&obj->oid));
@@ -209,25 +217,36 @@ static inline void finish_object__ma(struct object *obj)
 		oidset_insert(&missing_objects, &obj->oid);
 		return;
 
+	case MA_ALLOW_PROMISOR:
+		if (is_promisor_object(&obj->oid))
+			return;
+		die("unexpected missing blob object '%s'",
+		    oid_to_hex(&obj->oid));
+		return;
+
 	default:
 		BUG("unhandled missing_action");
 		return;
 	}
 }
 
-static void finish_object(struct object *obj, const char *name, void *cb_data)
+static int finish_object(struct object *obj, const char *name, void *cb_data)
 {
 	struct rev_list_info *info = cb_data;
-	if (obj->type == OBJ_BLOB && !has_object_file(&obj->oid))
+	if (obj->type == OBJ_BLOB && !has_object_file(&obj->oid)) {
 		finish_object__ma(obj);
+		return 1;
+	}
 	if (info->revs->verify_objects && !obj->parsed && obj->type != OBJ_COMMIT)
 		parse_object(&obj->oid);
+	return 0;
 }
 
 static void show_object(struct object *obj, const char *name, void *cb_data)
 {
 	struct rev_list_info *info = cb_data;
-	finish_object(obj, name, cb_data);
+	if (finish_object(obj, name, cb_data))
+		return;
 	display_progress(progress, ++progress_counter);
 	if (info->flags & REV_LIST_QUIET)
 		return;
@@ -315,11 +334,19 @@ static inline int parse_missing_action_value(const char *value)
 
 	if (!strcmp(value, "allow-any")) {
 		arg_missing_action = MA_ALLOW_ANY;
+		fetch_if_missing = 0;
 		return 1;
 	}
 
 	if (!strcmp(value, "print")) {
 		arg_missing_action = MA_PRINT;
+		fetch_if_missing = 0;
+		return 1;
+	}
+
+	if (!strcmp(value, "allow-promisor")) {
+		arg_missing_action = MA_ALLOW_PROMISOR;
+		fetch_if_missing = 0;
 		return 1;
 	}
 
@@ -344,6 +371,35 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix)
 	init_revisions(&revs, prefix);
 	revs.abbrev = DEFAULT_ABBREV;
 	revs.commit_format = CMIT_FMT_UNSPECIFIED;
+
+	/*
+	 * Scan the argument list before invoking setup_revisions(), so that we
+	 * know if fetch_if_missing needs to be set to 0.
+	 *
+	 * "--exclude-promisor-objects" acts as a pre-filter on missing objects
+	 * by not crossing the boundary from realized objects to promisor
+	 * objects.
+	 *
+	 * Let "--missing" to conditionally set fetch_if_missing.
+	 */
+	for (i = 1; i < argc; i++) {
+		const char *arg = argv[i];
+		if (!strcmp(arg, "--exclude-promisor-objects")) {
+			fetch_if_missing = 0;
+			revs.exclude_promisor_objects = 1;
+			break;
+		}
+	}
+	for (i = 1; i < argc; i++) {
+		const char *arg = argv[i];
+		if (skip_prefix(arg, "--missing=", &arg)) {
+			if (revs.exclude_promisor_objects)
+				die(_("cannot combine --exclude-promisor-objects and --missing"));
+			if (parse_missing_action_value(arg))
+				break;
+		}
+	}
+
 	argc = setup_revisions(argc, argv, &revs, NULL);
 
 	memset(&info, 0, sizeof(info));
@@ -412,9 +468,10 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix)
 			continue;
 		}
 
-		if (skip_prefix(arg, "--missing=", &arg) &&
-		    parse_missing_action_value(arg))
-			continue;
+		if (!strcmp(arg, "--exclude-promisor-objects"))
+			continue; /* already handled above */
+		if (skip_prefix(arg, "--missing=", &arg))
+			continue; /* already handled above */
 
 		usage(rev_list_usage);
 
diff --git a/list-objects.c b/list-objects.c
index 0966cdc9fa..168bef688a 100644
--- a/list-objects.c
+++ b/list-objects.c
@@ -9,6 +9,7 @@
 #include "list-objects.h"
 #include "list-objects-filter.h"
 #include "list-objects-filter-options.h"
+#include "packfile.h"
 
 static void process_blob(struct rev_info *revs,
 			 struct blob *blob,
@@ -30,6 +31,20 @@ static void process_blob(struct rev_info *revs,
 	if (obj->flags & (UNINTERESTING | SEEN))
 		return;
 
+	/*
+	 * Pre-filter known-missing objects when explicitly requested.
+	 * Otherwise, a missing object error message may be reported
+	 * later (depending on other filtering criteria).
+	 *
+	 * Note that this "--exclude-promisor-objects" pre-filtering
+	 * may cause the actual filter to report an incomplete list
+	 * of missing objects.
+	 */
+	if (revs->exclude_promisor_objects &&
+	    !has_object_file(&obj->oid) &&
+	    is_promisor_object(&obj->oid))
+		return;
+
 	pathlen = path->len;
 	strbuf_addstr(path, name);
 	if (filter_fn)
@@ -91,6 +106,8 @@ static void process_tree(struct rev_info *revs,
 		all_entries_interesting: entry_not_interesting;
 	int baselen = base->len;
 	enum list_objects_filter_result r = LOFR_MARK_SEEN | LOFR_DO_SHOW;
+	int gently = revs->ignore_missing_links ||
+		     revs->exclude_promisor_objects;
 
 	if (!revs->tree_objects)
 		return;
@@ -98,9 +115,19 @@ static void process_tree(struct rev_info *revs,
 		die("bad tree object");
 	if (obj->flags & (UNINTERESTING | SEEN))
 		return;
-	if (parse_tree_gently(tree, revs->ignore_missing_links) < 0) {
+	if (parse_tree_gently(tree, gently) < 0) {
 		if (revs->ignore_missing_links)
 			return;
+
+		/*
+		 * Pre-filter known-missing tree objects when explicitly
+		 * requested.  This may cause the actual filter to report
+		 * an incomplete list of missing objects.
+		 */
+		if (revs->exclude_promisor_objects &&
+		    is_promisor_object(&obj->oid))
+			return;
+
 		die("bad tree object %s", oid_to_hex(&obj->oid));
 	}
 
diff --git a/object.c b/object.c
index b9a4a0e501..4c222d6260 100644
--- a/object.c
+++ b/object.c
@@ -252,7 +252,7 @@ struct object *parse_object(const struct object_id *oid)
 	if (obj && obj->parsed)
 		return obj;
 
-	if ((obj && obj->type == OBJ_BLOB) ||
+	if ((obj && obj->type == OBJ_BLOB && has_object_file(oid)) ||
 	    (!obj && has_object_file(oid) &&
 	     sha1_object_info(oid->hash, NULL) == OBJ_BLOB)) {
 		if (check_sha1_signature(repl, NULL, 0, NULL) < 0) {
diff --git a/revision.c b/revision.c
index 72f2b4572e..f246728f2d 100644
--- a/revision.c
+++ b/revision.c
@@ -198,6 +198,8 @@ static struct object *get_reference(struct rev_info *revs, const char *name,
 	if (!object) {
 		if (revs->ignore_missing)
 			return object;
+		if (revs->exclude_promisor_objects && is_promisor_object(oid))
+			return NULL;
 		die("bad object %s", name);
 	}
 	object->flags |= flags;
@@ -799,9 +801,17 @@ static int add_parents_to_list(struct rev_info *revs, struct commit *commit,
 
 	for (parent = commit->parents; parent; parent = parent->next) {
 		struct commit *p = parent->item;
-
-		if (parse_commit_gently(p, revs->ignore_missing_links) < 0)
+		int gently = revs->ignore_missing_links ||
+			     revs->exclude_promisor_objects;
+		if (parse_commit_gently(p, gently) < 0) {
+			if (revs->exclude_promisor_objects &&
+			    is_promisor_object(&p->object.oid)) {
+				if (revs->first_parent_only)
+					break;
+				continue;
+			}
 			return -1;
+		}
 		if (revs->show_source && !p->util)
 			p->util = commit->util;
 		p->object.flags |= left_flag;
@@ -2100,6 +2110,10 @@ static int handle_revision_opt(struct rev_info *revs, int argc, const char **arg
 		revs->limited = 1;
 	} else if (!strcmp(arg, "--ignore-missing")) {
 		revs->ignore_missing = 1;
+	} else if (!strcmp(arg, "--exclude-promisor-objects")) {
+		if (fetch_if_missing)
+			die("BUG: exclude_promisor_objects can only be used when fetch_if_missing is 0");
+		revs->exclude_promisor_objects = 1;
 	} else {
 		int opts = diff_opt_parse(&revs->diffopt, argv, argc, revs->prefix);
 		if (!opts)
@@ -2842,6 +2856,16 @@ void reset_revision_walk(void)
 	clear_object_flags(SEEN | ADDED | SHOWN);
 }
 
+static int mark_uninteresting(const struct object_id *oid,
+			      struct packed_git *pack,
+			      uint32_t pos,
+			      void *unused)
+{
+	struct object *o = parse_object(oid);
+	o->flags |= UNINTERESTING | SEEN;
+	return 0;
+}
+
 int prepare_revision_walk(struct rev_info *revs)
 {
 	int i;
@@ -2870,6 +2894,11 @@ int prepare_revision_walk(struct rev_info *revs)
 	    (revs->limited && limiting_can_increase_treesame(revs)))
 		revs->treesame.name = "treesame";
 
+	if (revs->exclude_promisor_objects) {
+		for_each_packed_object(mark_uninteresting, NULL,
+				       FOR_EACH_OBJECT_PROMISOR_ONLY);
+	}
+
 	if (revs->no_walk != REVISION_WALK_NO_WALK_UNSORTED)
 		commit_list_sort_by_date(&revs->commits);
 	if (revs->no_walk)
diff --git a/revision.h b/revision.h
index 19dc9bdddb..eb14994316 100644
--- a/revision.h
+++ b/revision.h
@@ -122,7 +122,10 @@ struct rev_info {
 			ancestry_path:1,
 			first_parent_only:1,
 			line_level_traverse:1,
-			tree_blobs_in_commit_order:1;
+			tree_blobs_in_commit_order:1,
+
+			/* for internal use only */
+			exclude_promisor_objects:1;
 
 	/* Diff flags */
 	unsigned int	diff:1,
diff --git a/t/t0410-partial-clone.sh b/t/t0410-partial-clone.sh
index 8b20d18603..4dc02459a5 100755
--- a/t/t0410-partial-clone.sh
+++ b/t/t0410-partial-clone.sh
@@ -160,6 +160,107 @@ test_expect_success 'fetching of missing objects' '
 	git verify-pack --verbose "$IDX" | grep "$HASH"
 '
 
+test_expect_success 'rev-list stops traversal at missing and promised commit' '
+	rm -rf repo &&
+	test_create_repo repo &&
+	test_commit -C repo foo &&
+	test_commit -C repo bar &&
+
+	FOO=$(git -C repo rev-parse foo) &&
+	promise_and_delete "$FOO" &&
+
+	git -C repo config core.repositoryformatversion 1 &&
+	git -C repo config odb.magic.promisorRemote "arbitrary string" &&
+	git -C repo rev-list --exclude-promisor-objects --objects bar >out &&
+	grep $(git -C repo rev-parse bar) out &&
+	! grep $FOO out
+'
+
+test_expect_success 'rev-list stops traversal at missing and promised tree' '
+	rm -rf repo &&
+	test_create_repo repo &&
+	test_commit -C repo foo &&
+	mkdir repo/a_dir &&
+	echo something >repo/a_dir/something &&
+	git -C repo add a_dir/something &&
+	git -C repo commit -m bar &&
+
+	# foo^{tree} (tree referenced from commit)
+	TREE=$(git -C repo rev-parse foo^{tree}) &&
+
+	# a tree referenced by HEAD^{tree} (tree referenced from tree)
+	TREE2=$(git -C repo ls-tree HEAD^{tree} | grep " tree " | head -1 | cut -b13-52) &&
+
+	promise_and_delete "$TREE" &&
+	promise_and_delete "$TREE2" &&
+
+	git -C repo config core.repositoryformatversion 1 &&
+	git -C repo config odb.magic.promisorRemote "arbitrary string" &&
+	git -C repo rev-list --exclude-promisor-objects --objects HEAD >out &&
+	grep $(git -C repo rev-parse foo) out &&
+	! grep $TREE out &&
+	grep $(git -C repo rev-parse HEAD) out &&
+	! grep $TREE2 out
+'
+
+test_expect_success 'rev-list stops traversal at missing and promised blob' '
+	rm -rf repo &&
+	test_create_repo repo &&
+	echo something >repo/something &&
+	git -C repo add something &&
+	git -C repo commit -m foo &&
+
+	BLOB=$(git -C repo hash-object -w something) &&
+	promise_and_delete "$BLOB" &&
+
+	git -C repo config core.repositoryformatversion 1 &&
+	git -C repo config odb.magic.promisorRemote "arbitrary string" &&
+	git -C repo rev-list --exclude-promisor-objects --objects HEAD >out &&
+	grep $(git -C repo rev-parse HEAD) out &&
+	! grep $BLOB out
+'
+
+test_expect_success 'rev-list stops traversal at promisor commit, tree, and blob' '
+	rm -rf repo &&
+	test_create_repo repo &&
+	test_commit -C repo foo &&
+	test_commit -C repo bar &&
+	test_commit -C repo baz &&
+
+	COMMIT=$(git -C repo rev-parse foo) &&
+	TREE=$(git -C repo rev-parse bar^{tree}) &&
+	BLOB=$(git hash-object repo/baz.t) &&
+	printf "%s\n%s\n%s\n" $COMMIT $TREE $BLOB | pack_as_from_promisor &&
+
+	git -C repo config core.repositoryformatversion 1 &&
+	git -C repo config odb.magic.promisorRemote "arbitrary string" &&
+	git -C repo rev-list --exclude-promisor-objects --objects HEAD >out &&
+	! grep $COMMIT out &&
+	! grep $TREE out &&
+	! grep $BLOB out &&
+	grep $(git -C repo rev-parse bar) out  # sanity check that some walking was done
+'
+
+test_expect_success 'rev-list accepts missing and promised objects on command line' '
+	rm -rf repo &&
+	test_create_repo repo &&
+	test_commit -C repo foo &&
+	test_commit -C repo bar &&
+	test_commit -C repo baz &&
+
+	COMMIT=$(git -C repo rev-parse foo) &&
+	TREE=$(git -C repo rev-parse bar^{tree}) &&
+	BLOB=$(git hash-object repo/baz.t) &&
+
+	promise_and_delete $COMMIT &&
+	promise_and_delete $TREE &&
+	promise_and_delete $BLOB &&
+
+	git -C repo config core.repositoryformatversion 1 &&
+	git -C repo config odb.magic.promisorRemote "arbitrary string" &&
+	git -C repo rev-list --exclude-promisor-objects --objects "$COMMIT" "$TREE" "$BLOB"
+'
+
 LIB_HTTPD_PORT=12345  # default port, 410, cannot be used as non-root
 . "$TEST_DIRECTORY"/lib-httpd.sh
 start_httpd
-- 
2.16.0.rc0.16.g82191dbc6c.dirty


^ permalink raw reply	[flat|nested] 51+ messages in thread

* [PATCH 13/40] gc: do not repack promisor packfiles
  2018-01-03 16:33 [PATCH 00/40] Promisor remotes and external ODB support Christian Couder
                   ` (11 preceding siblings ...)
  2018-01-03 16:33 ` [PATCH 12/40] rev-list: support termination at promisor objects Christian Couder
@ 2018-01-03 16:33 ` Christian Couder
  2018-01-03 16:33 ` [PATCH 14/40] sha1_file: prepare for external odbs Christian Couder
                   ` (26 subsequent siblings)
  39 siblings, 0 replies; 51+ messages in thread
From: Christian Couder @ 2018-01-03 16:33 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Ben Peart, Jonathan Tan,
	Nguyen Thai Ngoc Duy, Mike Hommey, Lars Schneider, Eric Wong,
	Christian Couder, Jeff Hostetler

From: Jonathan Tan <jonathantanmy@google.com>

Teach gc to stop traversal at promisor objects, and to leave promisor
packfiles alone. This has the effect of only repacking non-promisor
packfiles, and preserves the distinction between promisor packfiles and
non-promisor packfiles.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 Documentation/git-pack-objects.txt | 11 ++++++++
 builtin/gc.c                       |  4 +++
 builtin/pack-objects.c             | 37 ++++++++++++++++++++++++--
 builtin/prune.c                    |  7 +++++
 builtin/repack.c                   |  9 +++++--
 t/t0410-partial-clone.sh           | 54 ++++++++++++++++++++++++++++++++++++--
 6 files changed, 116 insertions(+), 6 deletions(-)

diff --git a/Documentation/git-pack-objects.txt b/Documentation/git-pack-objects.txt
index aa403d02f3..81bc490ac5 100644
--- a/Documentation/git-pack-objects.txt
+++ b/Documentation/git-pack-objects.txt
@@ -255,6 +255,17 @@ a missing object is encountered.  This is the default action.
 The form '--missing=allow-any' will allow object traversal to continue
 if a missing object is encountered.  Missing objects will silently be
 omitted from the results.
++
+The form '--missing=allow-promisor' is like 'allow-any', but will only
+allow object traversal to continue for EXPECTED promisor missing objects.
+Unexpected missing object will raise an error.
+
+--exclude-promisor-objects::
+	Omit objects that are known to be in the promisor remote.  (This
+	option has the purpose of operating only on locally created objects,
+	so that when we repack, we still maintain a distinction between
+	locally created objects [without .promisor] and objects from the
+	promisor remote [with .promisor].)  This is used with partial clone.
 
 SEE ALSO
 --------
diff --git a/builtin/gc.c b/builtin/gc.c
index 3c5eae0edf..cef1461d1a 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -20,6 +20,7 @@
 #include "argv-array.h"
 #include "commit.h"
 #include "packfile.h"
+#include "external-odb.h"
 
 #define FAILED_RUN "failed to run %s"
 
@@ -458,6 +459,9 @@ int cmd_gc(int argc, const char **argv, const char *prefix)
 			argv_array_push(&prune, prune_expire);
 			if (quiet)
 				argv_array_push(&prune, "--no-progress");
+			if (has_external_odb())
+				argv_array_push(&prune,
+						"--exclude-promisor-objects");
 			if (run_command_v_opt(prune.argv, RUN_GIT_CMD))
 				return error(FAILED_RUN, prune.argv[0]);
 		}
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 6b9cfc289d..6c71552cdf 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -75,6 +75,8 @@ static int use_bitmap_index = -1;
 static int write_bitmap_index;
 static uint16_t write_bitmap_options;
 
+static int exclude_promisor_objects;
+
 static unsigned long delta_cache_size = 0;
 static unsigned long max_delta_cache_size = 256 * 1024 * 1024;
 static unsigned long cache_max_small_delta_size = 1000;
@@ -84,8 +86,9 @@ static unsigned long window_memory_limit = 0;
 static struct list_objects_filter_options filter_options;
 
 enum missing_action {
-	MA_ERROR = 0,    /* fail if any missing objects are encountered */
-	MA_ALLOW_ANY,    /* silently allow ALL missing objects */
+	MA_ERROR = 0,      /* fail if any missing objects are encountered */
+	MA_ALLOW_ANY,      /* silently allow ALL missing objects */
+	MA_ALLOW_PROMISOR, /* silently allow all missing PROMISOR objects */
 };
 static enum missing_action arg_missing_action;
 static show_object_fn fn_show_object;
@@ -2578,6 +2581,20 @@ static void show_object__ma_allow_any(struct object *obj, const char *name, void
 	show_object(obj, name, data);
 }
 
+static void show_object__ma_allow_promisor(struct object *obj, const char *name, void *data)
+{
+	assert(arg_missing_action == MA_ALLOW_PROMISOR);
+
+	/*
+	 * Quietly ignore EXPECTED missing objects.  This avoids problems with
+	 * staging them now and getting an odd error later.
+	 */
+	if (!has_object_file(&obj->oid) && is_promisor_object(&obj->oid))
+		return;
+
+	show_object(obj, name, data);
+}
+
 static int option_parse_missing_action(const struct option *opt,
 				       const char *arg, int unset)
 {
@@ -2592,10 +2609,18 @@ static int option_parse_missing_action(const struct option *opt,
 
 	if (!strcmp(arg, "allow-any")) {
 		arg_missing_action = MA_ALLOW_ANY;
+		fetch_if_missing = 0;
 		fn_show_object = show_object__ma_allow_any;
 		return 0;
 	}
 
+	if (!strcmp(arg, "allow-promisor")) {
+		arg_missing_action = MA_ALLOW_PROMISOR;
+		fetch_if_missing = 0;
+		fn_show_object = show_object__ma_allow_promisor;
+		return 0;
+	}
+
 	die(_("invalid value for --missing"));
 	return 0;
 }
@@ -3009,6 +3034,8 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
 		{ OPTION_CALLBACK, 0, "missing", NULL, N_("action"),
 		  N_("handling for missing objects"), PARSE_OPT_NONEG,
 		  option_parse_missing_action },
+		OPT_BOOL(0, "exclude-promisor-objects", &exclude_promisor_objects,
+			 N_("do not pack objects in promisor packfiles")),
 		OPT_END(),
 	};
 
@@ -3054,6 +3081,12 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
 		argv_array_push(&rp, "--unpacked");
 	}
 
+	if (exclude_promisor_objects) {
+		use_internal_rev_list = 1;
+		fetch_if_missing = 0;
+		argv_array_push(&rp, "--exclude-promisor-objects");
+	}
+
 	if (!reuse_object)
 		reuse_delta = 0;
 	if (pack_compression_level == -1)
diff --git a/builtin/prune.c b/builtin/prune.c
index d2fdae680a..4cfec82f40 100644
--- a/builtin/prune.c
+++ b/builtin/prune.c
@@ -101,12 +101,15 @@ int cmd_prune(int argc, const char **argv, const char *prefix)
 {
 	struct rev_info revs;
 	struct progress *progress = NULL;
+	int exclude_promisor_objects = 0;
 	const struct option options[] = {
 		OPT__DRY_RUN(&show_only, N_("do not remove, show only")),
 		OPT__VERBOSE(&verbose, N_("report pruned objects")),
 		OPT_BOOL(0, "progress", &show_progress, N_("show progress")),
 		OPT_EXPIRY_DATE(0, "expire", &expire,
 				N_("expire objects older than <time>")),
+		OPT_BOOL(0, "exclude-promisor-objects", &exclude_promisor_objects,
+			 N_("limit traversal to objects outside promisor packfiles")),
 		OPT_END()
 	};
 	char *s;
@@ -139,6 +142,10 @@ int cmd_prune(int argc, const char **argv, const char *prefix)
 		show_progress = isatty(2);
 	if (show_progress)
 		progress = start_delayed_progress(_("Checking connectivity"), 0);
+	if (exclude_promisor_objects) {
+		fetch_if_missing = 0;
+		revs.exclude_promisor_objects = 1;
+	}
 
 	mark_reachable_objects(&revs, 1, expire, progress);
 	stop_progress(&progress);
diff --git a/builtin/repack.c b/builtin/repack.c
index f17a68a17d..a6d9cfb92c 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -8,6 +8,7 @@
 #include "strbuf.h"
 #include "string-list.h"
 #include "argv-array.h"
+#include "external-odb.h"
 
 static int delta_base_offset = 1;
 static int pack_kept_objects = -1;
@@ -83,7 +84,8 @@ static void remove_pack_on_signal(int signo)
 
 /*
  * Adds all packs hex strings to the fname list, which do not
- * have a corresponding .keep file.
+ * have a corresponding .keep or .promisor file. These packs are not to
+ * be kept if we are going to pack everything into one file.
  */
 static void get_non_kept_pack_filenames(struct string_list *fname_list)
 {
@@ -101,7 +103,8 @@ static void get_non_kept_pack_filenames(struct string_list *fname_list)
 
 		fname = xmemdupz(e->d_name, len);
 
-		if (!file_exists(mkpath("%s/%s.keep", packdir, fname)))
+		if (!file_exists(mkpath("%s/%s.keep", packdir, fname)) &&
+		    !file_exists(mkpath("%s/%s.promisor", packdir, fname)))
 			string_list_append_nodup(fname_list, fname);
 		else
 			free(fname);
@@ -232,6 +235,8 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
 	argv_array_push(&cmd.args, "--all");
 	argv_array_push(&cmd.args, "--reflog");
 	argv_array_push(&cmd.args, "--indexed-objects");
+	if (has_external_odb())
+		argv_array_push(&cmd.args, "--exclude-promisor-objects");
 	if (window)
 		argv_array_pushf(&cmd.args, "--window=%s", window);
 	if (window_memory)
diff --git a/t/t0410-partial-clone.sh b/t/t0410-partial-clone.sh
index 4dc02459a5..6af4712da8 100755
--- a/t/t0410-partial-clone.sh
+++ b/t/t0410-partial-clone.sh
@@ -10,14 +10,16 @@ delete_object () {
 
 pack_as_from_promisor () {
 	HASH=$(git -C repo pack-objects .git/objects/pack/pack) &&
-	>repo/.git/objects/pack/pack-$HASH.promisor
+	>repo/.git/objects/pack/pack-$HASH.promisor &&
+	echo $HASH
 }
 
 promise_and_delete () {
 	HASH=$(git -C repo rev-parse "$1") &&
 	git -C repo tag -a -m message my_annotated_tag "$HASH" &&
 	git -C repo rev-parse my_annotated_tag | pack_as_from_promisor &&
-	git -C repo tag -d my_annotated_tag &&
+	# tag -d prints a message to stdout, so redirect it
+	git -C repo tag -d my_annotated_tag >/dev/null &&
 	delete_object repo "$HASH"
 }
 
@@ -261,6 +263,54 @@ test_expect_success 'rev-list accepts missing and promised objects on command li
 	git -C repo rev-list --exclude-promisor-objects --objects "$COMMIT" "$TREE" "$BLOB"
 '
 
+test_expect_success 'gc does not repack promisor objects' '
+	rm -rf repo &&
+	test_create_repo repo &&
+	test_commit -C repo my_commit &&
+
+	TREE_HASH=$(git -C repo rev-parse HEAD^{tree}) &&
+	HASH=$(printf "$TREE_HASH\n" | pack_as_from_promisor) &&
+
+	git -C repo config core.repositoryformatversion 1 &&
+	git -C repo config odb.magic.promisorRemote "arbitrary string" &&
+	git -C repo gc &&
+
+	# Ensure that the promisor packfile still exists, and remove it
+	test -e repo/.git/objects/pack/pack-$HASH.pack &&
+	rm repo/.git/objects/pack/pack-$HASH.* &&
+
+	# Ensure that the single other pack contains the commit, but not the tree
+	ls repo/.git/objects/pack/pack-*.pack >packlist &&
+	test_line_count = 1 packlist &&
+	git verify-pack repo/.git/objects/pack/pack-*.pack -v >out &&
+	grep "$(git -C repo rev-parse HEAD)" out &&
+	! grep "$TREE_HASH" out
+'
+
+test_expect_success 'gc stops traversal when a missing but promised object is reached' '
+	rm -rf repo &&
+	test_create_repo repo &&
+	test_commit -C repo my_commit &&
+
+	TREE_HASH=$(git -C repo rev-parse HEAD^{tree}) &&
+	HASH=$(promise_and_delete $TREE_HASH) &&
+
+	git -C repo config core.repositoryformatversion 1 &&
+	git -C repo config odb.magic.promisorRemote "arbitrary string" &&
+	git -C repo gc &&
+
+	# Ensure that the promisor packfile still exists, and remove it
+	test -e repo/.git/objects/pack/pack-$HASH.pack &&
+	rm repo/.git/objects/pack/pack-$HASH.* &&
+
+	# Ensure that the single other pack contains the commit, but not the tree
+	ls repo/.git/objects/pack/pack-*.pack >packlist &&
+	test_line_count = 1 packlist &&
+	git verify-pack repo/.git/objects/pack/pack-*.pack -v >out &&
+	grep "$(git -C repo rev-parse HEAD)" out &&
+	! grep "$TREE_HASH" out
+'
+
 LIB_HTTPD_PORT=12345  # default port, 410, cannot be used as non-root
 . "$TEST_DIRECTORY"/lib-httpd.sh
 start_httpd
-- 
2.16.0.rc0.16.g82191dbc6c.dirty


^ permalink raw reply	[flat|nested] 51+ messages in thread

* [PATCH 14/40] sha1_file: prepare for external odbs
  2018-01-03 16:33 [PATCH 00/40] Promisor remotes and external ODB support Christian Couder
                   ` (12 preceding siblings ...)
  2018-01-03 16:33 ` [PATCH 13/40] gc: do not repack promisor packfiles Christian Couder
@ 2018-01-03 16:33 ` Christian Couder
  2018-01-04 18:00   ` Jeff Hostetler
  2018-01-03 16:33 ` [PATCH 15/40] external-odb: add script mode support Christian Couder
                   ` (25 subsequent siblings)
  39 siblings, 1 reply; 51+ messages in thread
From: Christian Couder @ 2018-01-03 16:33 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Ben Peart, Jonathan Tan,
	Nguyen Thai Ngoc Duy, Mike Hommey, Lars Schneider, Eric Wong,
	Christian Couder, Jeff Hostetler

In the following commits we will need some functions that were
internal to sha1_file.c, so let's first make them non static
and declare them in "cache.h". While at it, let's rename
'create_tmpfile()' to 'create_object_tmpfile()' to make its
name less generic.

Let's also split out 'sha1_file_name_alt()' from
'sha1_file_name()' and 'open_sha1_file_alt()' from
'open_sha1_file()', as we will need both of these new
functions too.

Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
 cache.h     |  8 ++++++++
 sha1_file.c | 47 +++++++++++++++++++++++++++++------------------
 2 files changed, 37 insertions(+), 18 deletions(-)

diff --git a/cache.h b/cache.h
index 3fabf998ce..f41c102cb4 100644
--- a/cache.h
+++ b/cache.h
@@ -964,6 +964,12 @@ extern void check_repository_format(void);
  */
 extern const char *sha1_file_name(const unsigned char *sha1);
 
+/*
+ * Like sha1_file_name, but return the filename within a specific alternate
+ * object directory. Shares the same static buffer with sha1_file_name.
+ */
+extern const char *sha1_file_name_alt(const char *objdir, const unsigned char *sha1);
+
 /*
  * Return an abbreviated sha1 unique within this repository's object database.
  * The result will be at least `len` characters long, and will be NUL
@@ -1251,6 +1257,8 @@ extern int parse_sha1_header(const char *hdr, unsigned long *sizep);
 
 extern int check_sha1_signature(const unsigned char *sha1, void *buf, unsigned long size, const char *type);
 
+extern int create_object_tmpfile(struct strbuf *tmp, const char *filename);
+extern void close_sha1_file(int fd);
 extern int finalize_object_file(const char *tmpfile, const char *filename);
 
 /*
diff --git a/sha1_file.c b/sha1_file.c
index 261baf800f..785e8dda03 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -322,17 +322,22 @@ static void fill_sha1_path(struct strbuf *buf, const unsigned char *sha1)
 	}
 }
 
-const char *sha1_file_name(const unsigned char *sha1)
+const char *sha1_file_name_alt(const char *objdir, const unsigned char *sha1)
 {
 	static struct strbuf buf = STRBUF_INIT;
 
 	strbuf_reset(&buf);
-	strbuf_addf(&buf, "%s/", get_object_directory());
+	strbuf_addf(&buf, "%s/", objdir);
 
 	fill_sha1_path(&buf, sha1);
 	return buf.buf;
 }
 
+const char *sha1_file_name(const unsigned char *sha1)
+{
+	return sha1_file_name_alt(get_object_directory(), sha1);
+}
+
 struct strbuf *alt_scratch_buf(struct alternate_object_database *alt)
 {
 	strbuf_setlen(&alt->scratch, alt->base_len);
@@ -902,24 +907,14 @@ static int stat_sha1_file(const unsigned char *sha1, struct stat *st,
 	return -1;
 }
 
-/*
- * Like stat_sha1_file(), but actually open the object and return the
- * descriptor. See the caveats on the "path" parameter above.
- */
-static int open_sha1_file(const unsigned char *sha1, const char **path)
+static int open_sha1_file_alt(const unsigned char *sha1, const char **path)
 {
-	int fd;
 	struct alternate_object_database *alt;
-	int most_interesting_errno;
-
-	*path = sha1_file_name(sha1);
-	fd = git_open(*path);
-	if (fd >= 0)
-		return fd;
-	most_interesting_errno = errno;
+	int most_interesting_errno = errno;
 
 	prepare_alt_odb();
 	for (alt = alt_odb_list; alt; alt = alt->next) {
+		int fd;
 		*path = alt_sha1_path(alt, sha1);
 		fd = git_open(*path);
 		if (fd >= 0)
@@ -931,6 +926,22 @@ static int open_sha1_file(const unsigned char *sha1, const char **path)
 	return -1;
 }
 
+/*
+ * Like stat_sha1_file(), but actually open the object and return the
+ * descriptor. See the caveats on the "path" parameter above.
+ */
+static int open_sha1_file(const unsigned char *sha1, const char **path)
+{
+	int fd;
+
+	*path = sha1_file_name(sha1);
+	fd = git_open(*path);
+	if (fd >= 0)
+		return fd;
+
+	return open_sha1_file_alt(sha1, path);
+}
+
 /*
  * Map the loose object at "path" if it is not NULL, or the path found by
  * searching for a loose object named "sha1".
@@ -1527,7 +1538,7 @@ int hash_sha1_file(const void *buf, unsigned long len, const char *type,
 }
 
 /* Finalize a file on disk, and close it. */
-static void close_sha1_file(int fd)
+void close_sha1_file(int fd)
 {
 	if (fsync_object_files)
 		fsync_or_die(fd, "sha1 file");
@@ -1551,7 +1562,7 @@ static inline int directory_size(const char *filename)
  * We want to avoid cross-directory filename renames, because those
  * can have problems on various filesystems (FAT, NFS, Coda).
  */
-static int create_tmpfile(struct strbuf *tmp, const char *filename)
+int create_object_tmpfile(struct strbuf *tmp, const char *filename)
 {
 	int fd, dirlen = directory_size(filename);
 
@@ -1591,7 +1602,7 @@ static int write_loose_object(const unsigned char *sha1, char *hdr, int hdrlen,
 	static struct strbuf tmp_file = STRBUF_INIT;
 	const char *filename = sha1_file_name(sha1);
 
-	fd = create_tmpfile(&tmp_file, filename);
+	fd = create_object_tmpfile(&tmp_file, filename);
 	if (fd < 0) {
 		if (errno == EACCES)
 			return error("insufficient permission for adding an object to repository database %s", get_object_directory());
-- 
2.16.0.rc0.16.g82191dbc6c.dirty


^ permalink raw reply	[flat|nested] 51+ messages in thread

* [PATCH 15/40] external-odb: add script mode support
  2018-01-03 16:33 [PATCH 00/40] Promisor remotes and external ODB support Christian Couder
                   ` (13 preceding siblings ...)
  2018-01-03 16:33 ` [PATCH 14/40] sha1_file: prepare for external odbs Christian Couder
@ 2018-01-03 16:33 ` Christian Couder
  2018-01-04 19:55   ` Jeff Hostetler
  2018-01-03 16:33 ` [PATCH 16/40] odb-helper: add 'enum odb_helper_type' Christian Couder
                   ` (24 subsequent siblings)
  39 siblings, 1 reply; 51+ messages in thread
From: Christian Couder @ 2018-01-03 16:33 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Ben Peart, Jonathan Tan,
	Nguyen Thai Ngoc Duy, Mike Hommey, Lars Schneider, Eric Wong,
	Christian Couder, Jeff Hostetler

This adds support for the script command mode where
an helper script or command is called to retrieve or
manage objects.

This implements the 'have' and 'get_git_obj'
instructions for the script mode.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
 external-odb.c          |  51 ++++++++++-
 external-odb.h          |   1 +
 odb-helper.c            | 218 +++++++++++++++++++++++++++++++++++++++++++++++-
 odb-helper.h            |   4 +
 sha1_file.c             |  12 ++-
 t/t0400-external-odb.sh |  44 ++++++++++
 6 files changed, 327 insertions(+), 3 deletions(-)
 create mode 100755 t/t0400-external-odb.sh

diff --git a/external-odb.c b/external-odb.c
index 5d0afb9762..81f2aa5fac 100644
--- a/external-odb.c
+++ b/external-odb.c
@@ -33,8 +33,14 @@ static int external_odb_config(const char *var, const char *value, void *data)
 
 	o = find_or_create_helper(name, namelen);
 
-	if (!strcmp(subkey, "promisorremote"))
+	if (!strcmp(subkey, "promisorremote")) {
+		o->type = ODB_HELPER_GIT_REMOTE;
 		return git_config_string(&o->dealer, var, value);
+	}
+	if (!strcmp(subkey, "scriptcommand")) {
+		o->type = ODB_HELPER_SCRIPT_CMD;
+		return git_config_string(&o->dealer, var, value);
+	}
 
 	return 0;
 }
@@ -77,6 +83,49 @@ int external_odb_has_object(const unsigned char *sha1)
 	return 0;
 }
 
+int external_odb_get_object(const unsigned char *sha1)
+{
+	struct odb_helper *o;
+	const char *path;
+
+	if (!external_odb_has_object(sha1))
+		return -1;
+
+	path = sha1_file_name_alt(external_odb_root(), sha1);
+	safe_create_leading_directories_const(path);
+	prepare_external_alt_odb();
+
+	for (o = helpers; o; o = o->next) {
+		struct strbuf tmpfile = STRBUF_INIT;
+		int ret;
+		int fd;
+
+		if (!odb_helper_has_object(o, sha1))
+			continue;
+
+		fd = create_object_tmpfile(&tmpfile, path);
+		if (fd < 0) {
+			strbuf_release(&tmpfile);
+			return -1;
+		}
+
+		if (odb_helper_get_object(o, sha1, fd) < 0) {
+			close(fd);
+			unlink(tmpfile.buf);
+			strbuf_release(&tmpfile);
+			continue;
+		}
+
+		close_sha1_file(fd);
+		ret = finalize_object_file(tmpfile.buf, path);
+		strbuf_release(&tmpfile);
+		if (!ret)
+			return 0;
+	}
+
+	return -1;
+}
+
 int external_odb_get_direct(const unsigned char *sha1)
 {
 	struct odb_helper *o;
diff --git a/external-odb.h b/external-odb.h
index fd6708163e..fb8b94972f 100644
--- a/external-odb.h
+++ b/external-odb.h
@@ -4,6 +4,7 @@
 extern int has_external_odb(void);
 extern const char *external_odb_root(void);
 extern int external_odb_has_object(const unsigned char *sha1);
+extern int external_odb_get_object(const unsigned char *sha1);
 extern int external_odb_get_direct(const unsigned char *sha1);
 
 #endif /* EXTERNAL_ODB_H */
diff --git a/odb-helper.c b/odb-helper.c
index 4b70b287af..c1a3443dc7 100644
--- a/odb-helper.c
+++ b/odb-helper.c
@@ -21,13 +21,124 @@ struct odb_helper_cmd {
 	struct child_process child;
 };
 
+/*
+ * Callers are responsible to ensure that the result of vaddf(fmt, ap)
+ * is properly shell-quoted.
+ */
+static void prepare_helper_command(struct argv_array *argv, const char *cmd,
+				   const char *fmt, va_list ap)
+{
+	struct strbuf buf = STRBUF_INIT;
+
+	strbuf_addstr(&buf, cmd);
+	strbuf_addch(&buf, ' ');
+	strbuf_vaddf(&buf, fmt, ap);
+
+	argv_array_push(argv, buf.buf);
+	strbuf_release(&buf);
+}
+
+__attribute__((format (printf,3,4)))
+static int odb_helper_start(struct odb_helper *o,
+			    struct odb_helper_cmd *cmd,
+			    const char *fmt, ...)
+{
+	va_list ap;
+
+	memset(cmd, 0, sizeof(*cmd));
+	argv_array_init(&cmd->argv);
+
+	if (!o->dealer)
+		return -1;
+
+	va_start(ap, fmt);
+	prepare_helper_command(&cmd->argv, o->dealer, fmt, ap);
+	va_end(ap);
+
+	cmd->child.argv = cmd->argv.argv;
+	cmd->child.use_shell = 1;
+	cmd->child.no_stdin = 1;
+	cmd->child.out = -1;
+
+	if (start_command(&cmd->child) < 0) {
+		argv_array_clear(&cmd->argv);
+		return -1;
+	}
+
+	return 0;
+}
+
+static int odb_helper_finish(struct odb_helper *o,
+			     struct odb_helper_cmd *cmd)
+{
+	int ret = finish_command(&cmd->child);
+	argv_array_clear(&cmd->argv);
+	if (ret) {
+		warning("odb helper '%s' reported failure", o->name);
+		return -1;
+	}
+	return 0;
+}
+
+static int parse_object_line(struct odb_helper_object *o, const char *line)
+{
+	char *end;
+	if (get_sha1_hex(line, o->sha1) < 0)
+		return -1;
+
+	line += 40;
+	if (*line++ != ' ')
+		return -1;
+
+	o->size = strtoul(line, &end, 10);
+	if (line == end || *end++ != ' ')
+		return -1;
+
+	o->type = type_from_string(end);
+	return 0;
+}
+
+static int add_have_entry(struct odb_helper *o, const char *line)
+{
+	ALLOC_GROW(o->have, o->have_nr+1, o->have_alloc);
+	if (parse_object_line(&o->have[o->have_nr], line) < 0) {
+		warning("bad 'have' input from odb helper '%s': %s",
+			o->name, line);
+		return 1;
+	}
+	o->have_nr++;
+	return 0;
+}
+
+static int odb_helper_object_cmp(const void *va, const void *vb)
+{
+	const struct odb_helper_object *a = va, *b = vb;
+	return hashcmp(a->sha1, b->sha1);
+}
+
 static void odb_helper_load_have(struct odb_helper *o)
 {
+	struct odb_helper_cmd cmd;
+	FILE *fh;
+	struct strbuf line = STRBUF_INIT;
+
 	if (o->have_valid)
 		return;
 	o->have_valid = 1;
 
-	/* TODO */
+	if (odb_helper_start(o, &cmd, "have") < 0)
+		return;
+
+	fh = xfdopen(cmd.child.out, "r");
+	while (strbuf_getline(&line, fh) != EOF)
+		if (add_have_entry(o, line.buf))
+			break;
+
+	strbuf_release(&line);
+	fclose(fh);
+	odb_helper_finish(o, &cmd);
+
+	qsort(o->have, o->have_nr, sizeof(*o->have), odb_helper_object_cmp);
 }
 
 static const unsigned char *have_sha1_access(size_t index, void *table)
@@ -53,6 +164,111 @@ int odb_helper_has_object(struct odb_helper *o, const unsigned char *sha1)
 	return !!odb_helper_lookup(o, sha1);
 }
 
+int odb_helper_get_object(struct odb_helper *o, const unsigned char *sha1,
+			    int fd)
+{
+	struct odb_helper_object *obj;
+	struct odb_helper_cmd cmd;
+	unsigned long total_got;
+	git_zstream stream;
+	int zret = Z_STREAM_END;
+	git_SHA_CTX hash;
+	unsigned char real_sha1[20];
+	struct strbuf header = STRBUF_INIT;
+	unsigned long hdr_size;
+
+	obj = odb_helper_lookup(o, sha1);
+	if (!obj)
+		return -1;
+
+	if (odb_helper_start(o, &cmd, "get_git_obj %s", sha1_to_hex(sha1)) < 0)
+		return -1;
+
+	memset(&stream, 0, sizeof(stream));
+	git_inflate_init(&stream);
+	git_SHA1_Init(&hash);
+	total_got = 0;
+
+	for (;;) {
+		unsigned char buf[4096];
+		int r;
+
+		r = xread(cmd.child.out, buf, sizeof(buf));
+		if (r < 0) {
+			error("unable to read from odb helper '%s': %s",
+			      o->name, strerror(errno));
+			close(cmd.child.out);
+			odb_helper_finish(o, &cmd);
+			git_inflate_end(&stream);
+			return -1;
+		}
+		if (r == 0)
+			break;
+
+		write_or_die(fd, buf, r);
+
+		stream.next_in = buf;
+		stream.avail_in = r;
+		do {
+			unsigned char inflated[4096];
+			unsigned long got;
+
+			stream.next_out = inflated;
+			stream.avail_out = sizeof(inflated);
+			zret = git_inflate(&stream, Z_SYNC_FLUSH);
+			got = sizeof(inflated) - stream.avail_out;
+
+			git_SHA1_Update(&hash, inflated, got);
+			/* skip header when counting size */
+			if (!total_got) {
+				const unsigned char *p = memchr(inflated, '\0', got);
+				if (p) {
+					unsigned long hdr_last = p - inflated + 1;
+					strbuf_add(&header, inflated, hdr_last);
+					got -= hdr_last;
+				} else {
+					strbuf_add(&header, inflated, got);
+					got = 0;
+				}
+			}
+			total_got += got;
+		} while (stream.avail_in && zret == Z_OK);
+	}
+
+	close(cmd.child.out);
+	git_inflate_end(&stream);
+	git_SHA1_Final(real_sha1, &hash);
+	if (odb_helper_finish(o, &cmd))
+		return -1;
+	if (zret != Z_STREAM_END) {
+		warning("bad zlib data from odb helper '%s' for %s",
+			o->name, sha1_to_hex(sha1));
+		return -1;
+	}
+	if (total_got != obj->size) {
+		warning("size mismatch from odb helper '%s' for %s (%lu != %lu)",
+			o->name, sha1_to_hex(sha1), total_got, obj->size);
+		return -1;
+	}
+	if (hashcmp(real_sha1, sha1)) {
+		warning("sha1 mismatch from odb helper '%s' for %s (got %s)",
+			o->name, sha1_to_hex(sha1), sha1_to_hex(real_sha1));
+		return -1;
+	}
+	if (parse_sha1_header(header.buf, &hdr_size) < 0) {
+		warning("could not parse header from odb helper '%s' for %s",
+			o->name, sha1_to_hex(sha1));
+		return -1;
+	}
+	if (total_got != hdr_size) {
+		warning("size mismatch from odb helper '%s' for %s (%lu != %lu)",
+			o->name, sha1_to_hex(sha1), total_got, hdr_size);
+		return -1;
+	}
+
+	return 0;
+}
+
 int odb_helper_get_direct(struct odb_helper *o,
 			  const unsigned char *sha1)
 {
diff --git a/odb-helper.h b/odb-helper.h
index f4bc66b0ef..90b279c07e 100644
--- a/odb-helper.h
+++ b/odb-helper.h
@@ -20,6 +20,10 @@ struct odb_helper {
 extern struct odb_helper *odb_helper_new(const char *name, int namelen);
 extern int odb_helper_has_object(struct odb_helper *o,
 				 const unsigned char *sha1);
+extern int odb_helper_get_object(struct odb_helper *o,
+				 const unsigned char *sha1,
+				 int fd);
 extern int odb_helper_get_direct(struct odb_helper *o,
 				 const unsigned char *sha1);
+
 #endif /* ODB_HELPER_H */
diff --git a/sha1_file.c b/sha1_file.c
index 785e8dda03..f1d688c10a 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -904,6 +904,9 @@ static int stat_sha1_file(const unsigned char *sha1, struct stat *st,
 			return 0;
 	}
 
+	if (!external_odb_get_object(sha1) && !lstat(*path, st))
+		return 0;
+
 	return -1;
 }
 
@@ -939,7 +942,14 @@ static int open_sha1_file(const unsigned char *sha1, const char **path)
 	if (fd >= 0)
 		return fd;
 
-	return open_sha1_file_alt(sha1, path);
+	fd = open_sha1_file_alt(sha1, path);
+	if (fd >= 0)
+		return fd;
+
+	if (!external_odb_get_object(sha1))
+		fd = open_sha1_file_alt(sha1, path);
+
+	return fd;
 }
 
 /*
diff --git a/t/t0400-external-odb.sh b/t/t0400-external-odb.sh
new file mode 100755
index 0000000000..eaa3688a31
--- /dev/null
+++ b/t/t0400-external-odb.sh
@@ -0,0 +1,44 @@
+#!/bin/sh
+
+test_description='basic tests for external object databases'
+
+. ./test-lib.sh
+
+ALT_SOURCE="$PWD/alt-repo/.git"
+export ALT_SOURCE
+write_script odb-helper <<\EOF
+GIT_DIR=$ALT_SOURCE; export GIT_DIR
+case "$1" in
+have)
+	git cat-file --batch-check --batch-all-objects |
+	awk '{print $1 " " $3 " " $2}'
+	;;
+get_git_obj)
+	cat "$GIT_DIR"/objects/$(echo $2 | sed 's#..#&/#')
+	;;
+esac
+EOF
+HELPER="\"$PWD\"/odb-helper"
+
+test_expect_success 'setup alternate repo' '
+	git init alt-repo &&
+	test_commit -C alt-repo one &&
+	test_commit -C alt-repo two &&
+	alt_head=$(git -C alt-repo rev-parse HEAD)
+'
+
+test_expect_success 'alt objects are missing' '
+	test_must_fail git log --format=%s $alt_head
+'
+
+test_expect_success 'helper can retrieve alt objects' '
+	test_config odb.magic.scriptCommand "$HELPER" &&
+	cat >expect <<-\EOF &&
+	two
+	one
+	EOF
+	git log --format=%s $alt_head >actual &&
+	test_cmp expect actual
+'
+
+test_done
-- 
2.16.0.rc0.16.g82191dbc6c.dirty


^ permalink raw reply	[flat|nested] 51+ messages in thread

* [PATCH 16/40] odb-helper: add 'enum odb_helper_type'
  2018-01-03 16:33 [PATCH 00/40] Promisor remotes and external ODB support Christian Couder
                   ` (14 preceding siblings ...)
  2018-01-03 16:33 ` [PATCH 15/40] external-odb: add script mode support Christian Couder
@ 2018-01-03 16:33 ` Christian Couder
  2018-01-03 16:33 ` [PATCH 17/40] odb-helper: add odb_helper_init() to send 'init' instruction Christian Couder
                   ` (23 subsequent siblings)
  39 siblings, 0 replies; 51+ messages in thread
From: Christian Couder @ 2018-01-03 16:33 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Ben Peart, Jonathan Tan,
	Nguyen Thai Ngoc Duy, Mike Hommey, Lars Schneider, Eric Wong,
	Christian Couder, Jeff Hostetler

As there will be different kinds of helpers, let's add
an "enum odb_helper_type" to tell between the different
kinds.

Let's add a field with this type in "struct odb_helper",
and set it when reading the config file.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
 odb-helper.h | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/odb-helper.h b/odb-helper.h
index 90b279c07e..4f2ac5e476 100644
--- a/odb-helper.h
+++ b/odb-helper.h
@@ -1,9 +1,18 @@
 #ifndef ODB_HELPER_H
 #define ODB_HELPER_H
 
+enum odb_helper_type {
+	ODB_HELPER_NONE = 0,
+	ODB_HELPER_GIT_REMOTE,
+	ODB_HELPER_SCRIPT_CMD,
+	ODB_HELPER_PROCESS_CMD,
+	OBJ_HELPER_MAX
+};
+
 struct odb_helper {
 	const char *name;
 	const char *dealer;
+	enum odb_helper_type type;
 
 	struct odb_helper_object {
 		unsigned char sha1[20];
-- 
2.16.0.rc0.16.g82191dbc6c.dirty


^ permalink raw reply	[flat|nested] 51+ messages in thread

* [PATCH 17/40] odb-helper: add odb_helper_init() to send 'init' instruction
  2018-01-03 16:33 [PATCH 00/40] Promisor remotes and external ODB support Christian Couder
                   ` (15 preceding siblings ...)
  2018-01-03 16:33 ` [PATCH 16/40] odb-helper: add 'enum odb_helper_type' Christian Couder
@ 2018-01-03 16:33 ` Christian Couder
  2018-01-03 16:33 ` [PATCH 18/40] t0400: add 'put_raw_obj' instruction to odb-helper script Christian Couder
                   ` (22 subsequent siblings)
  39 siblings, 0 replies; 51+ messages in thread
From: Christian Couder @ 2018-01-03 16:33 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Ben Peart, Jonathan Tan,
	Nguyen Thai Ngoc Duy, Mike Hommey, Lars Schneider, Eric Wong,
	Christian Couder, Jeff Hostetler

Let's add an odb_helper_init() function to send an 'init'
instruction to the helpers. This 'init' instruction is
especially useful to get the capabilities that are supported
by the helpers.

So while at it, let's also add a parse_capabilities()
function to parse them and a supported_capabilities
variable in struct odb_helper to store them.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
 external-odb.c          | 13 +++++++++++-
 odb-helper.c            | 54 +++++++++++++++++++++++++++++++++++++++++++++++++
 odb-helper.h            | 12 +++++++++++
 t/t0400-external-odb.sh |  4 ++++
 4 files changed, 82 insertions(+), 1 deletion(-)

diff --git a/external-odb.c b/external-odb.c
index 81f2aa5fac..2622c12853 100644
--- a/external-odb.c
+++ b/external-odb.c
@@ -35,6 +35,8 @@ static int external_odb_config(const char *var, const char *value, void *data)
 
 	if (!strcmp(subkey, "promisorremote")) {
 		o->type = ODB_HELPER_GIT_REMOTE;
+		o->supported_capabilities |= ODB_HELPER_CAP_HAVE;
+		o->supported_capabilities |= ODB_HELPER_CAP_GET_DIRECT;
 		return git_config_string(&o->dealer, var, value);
 	}
 	if (!strcmp(subkey, "scriptcommand")) {
@@ -48,12 +50,16 @@ static int external_odb_config(const char *var, const char *value, void *data)
 static void external_odb_init(void)
 {
 	static int initialized;
+	struct odb_helper *o;
 
 	if (initialized || !use_external_odb)
 		return;
 	initialized = 1;
 
 	git_config(external_odb_config, NULL);
+
+	for (o = helpers; o; o = o->next)
+		odb_helper_init(o);
 }
 
 int has_external_odb(void)
@@ -77,9 +83,12 @@ int external_odb_has_object(const unsigned char *sha1)
 
 	external_odb_init();
 
-	for (o = helpers; o; o = o->next)
+	for (o = helpers; o; o = o->next) {
+		if (!(o->supported_capabilities & ODB_HELPER_CAP_HAVE))
+			return 1;
 		if (odb_helper_has_object(o, sha1))
 			return 1;
+	}
 	return 0;
 }
 
@@ -133,6 +142,8 @@ int external_odb_get_direct(const unsigned char *sha1)
 	external_odb_init();
 
 	for (o = helpers; o; o = o->next) {
+		if (!(o->supported_capabilities & ODB_HELPER_CAP_GET_DIRECT))
+			continue;
 		if (odb_helper_get_direct(o, sha1) < 0)
 			continue;
 		return 0;
diff --git a/odb-helper.c b/odb-helper.c
index c1a3443dc7..ea642fd438 100644
--- a/odb-helper.c
+++ b/odb-helper.c
@@ -6,6 +6,40 @@
 #include "sha1-lookup.h"
 #include "fetch-object.h"
 
+static void parse_capabilities(char *cap_buf,
+			       unsigned int *supported_capabilities,
+			       const char *process_name)
+{
+	struct string_list cap_list = STRING_LIST_INIT_NODUP;
+
+	string_list_split_in_place(&cap_list, cap_buf, '=', 1);
+
+	if (cap_list.nr == 2 && !strcmp(cap_list.items[0].string, "capability")) {
+		const char *cap_name = cap_list.items[1].string;
+
+		if (!strcmp(cap_name, "get_git_obj")) {
+			*supported_capabilities |= ODB_HELPER_CAP_GET_GIT_OBJ;
+		} else if (!strcmp(cap_name, "get_raw_obj")) {
+			*supported_capabilities |= ODB_HELPER_CAP_GET_RAW_OBJ;
+		} else if (!strcmp(cap_name, "get_direct")) {
+			*supported_capabilities |= ODB_HELPER_CAP_GET_DIRECT;
+		} else if (!strcmp(cap_name, "put_git_obj")) {
+			*supported_capabilities |= ODB_HELPER_CAP_PUT_GIT_OBJ;
+		} else if (!strcmp(cap_name, "put_raw_obj")) {
+			*supported_capabilities |= ODB_HELPER_CAP_PUT_RAW_OBJ;
+		} else if (!strcmp(cap_name, "put_direct")) {
+			*supported_capabilities |= ODB_HELPER_CAP_PUT_DIRECT;
+		} else if (!strcmp(cap_name, "have")) {
+			*supported_capabilities |= ODB_HELPER_CAP_HAVE;
+		} else {
+			warning("external process '%s' requested unsupported read-object capability '%s'",
+				process_name, cap_name);
+		}
+	}
+
+	string_list_clear(&cap_list, 0);
+}
+
 struct odb_helper *odb_helper_new(const char *name, int namelen)
 {
 	struct odb_helper *o;
@@ -80,6 +114,26 @@ static int odb_helper_finish(struct odb_helper *o,
 	return 0;
 }
 
+int odb_helper_init(struct odb_helper *o)
+{
+	struct odb_helper_cmd cmd;
+	FILE *fh;
+	struct strbuf line = STRBUF_INIT;
+
+	if (odb_helper_start(o, &cmd, "init") < 0)
+		return -1;
+
+	fh = xfdopen(cmd.child.out, "r");
+	while (strbuf_getline(&line, fh) != EOF)
+		parse_capabilities(line.buf, &o->supported_capabilities, o->name);
+
+	strbuf_release(&line);
+	fclose(fh);
+	odb_helper_finish(o, &cmd);
+
+	return 0;
+}
+
 static int parse_object_line(struct odb_helper_object *o, const char *line)
 {
 	char *end;
diff --git a/odb-helper.h b/odb-helper.h
index 4f2ac5e476..f8eac7f44c 100644
--- a/odb-helper.h
+++ b/odb-helper.h
@@ -1,6 +1,8 @@
 #ifndef ODB_HELPER_H
 #define ODB_HELPER_H
 
+#include "external-odb.h"
+
 enum odb_helper_type {
 	ODB_HELPER_NONE = 0,
 	ODB_HELPER_GIT_REMOTE,
@@ -9,10 +11,19 @@ enum odb_helper_type {
 	OBJ_HELPER_MAX
 };
 
+#define ODB_HELPER_CAP_GET_GIT_OBJ    (1u<<0)
+#define ODB_HELPER_CAP_GET_RAW_OBJ    (1u<<1)
+#define ODB_HELPER_CAP_GET_DIRECT     (1u<<2)
+#define ODB_HELPER_CAP_PUT_GIT_OBJ    (1u<<3)
+#define ODB_HELPER_CAP_PUT_RAW_OBJ    (1u<<4)
+#define ODB_HELPER_CAP_PUT_DIRECT     (1u<<5)
+#define ODB_HELPER_CAP_HAVE           (1u<<6)
+
 struct odb_helper {
 	const char *name;
 	const char *dealer;
 	enum odb_helper_type type;
+	unsigned int supported_capabilities;
 
 	struct odb_helper_object {
 		unsigned char sha1[20];
@@ -27,6 +38,7 @@ struct odb_helper {
 };
 
 extern struct odb_helper *odb_helper_new(const char *name, int namelen);
+extern int odb_helper_init(struct odb_helper *o);
 extern int odb_helper_has_object(struct odb_helper *o,
 				 const unsigned char *sha1);
 extern int odb_helper_get_object(struct odb_helper *o,
diff --git a/t/t0400-external-odb.sh b/t/t0400-external-odb.sh
index eaa3688a31..977fea852d 100755
--- a/t/t0400-external-odb.sh
+++ b/t/t0400-external-odb.sh
@@ -9,6 +9,10 @@ export ALT_SOURCE
 write_script odb-helper <<\EOF
 GIT_DIR=$ALT_SOURCE; export GIT_DIR
 case "$1" in
+init)
+	echo "capability=get_git_obj"
+	echo "capability=have"
+	;;
 have)
 	git cat-file --batch-check --batch-all-objects |
 	awk '{print $1 " " $3 " " $2}'
-- 
2.16.0.rc0.16.g82191dbc6c.dirty


^ permalink raw reply	[flat|nested] 51+ messages in thread

* [PATCH 18/40] t0400: add 'put_raw_obj' instruction to odb-helper script
  2018-01-03 16:33 [PATCH 00/40] Promisor remotes and external ODB support Christian Couder
                   ` (16 preceding siblings ...)
  2018-01-03 16:33 ` [PATCH 17/40] odb-helper: add odb_helper_init() to send 'init' instruction Christian Couder
@ 2018-01-03 16:33 ` Christian Couder
  2018-01-03 16:33 ` [PATCH 19/40] external odb: add 'put_raw_obj' support Christian Couder
                   ` (21 subsequent siblings)
  39 siblings, 0 replies; 51+ messages in thread
From: Christian Couder @ 2018-01-03 16:33 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Ben Peart, Jonathan Tan,
	Nguyen Thai Ngoc Duy, Mike Hommey, Lars Schneider, Eric Wong,
	Christian Couder, Jeff Hostetler

To properly test passing objects from Git to an external odb
we need an odb-helper script that supports a 'put'
capability/instruction.

For now we will support only sending raw blobs, so the
supported capability/instruction will be 'put_raw_obj'.

While at it let's add a test to check that our odb-helper
script works well.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
 t/t0400-external-odb.sh | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

diff --git a/t/t0400-external-odb.sh b/t/t0400-external-odb.sh
index 977fea852d..4ccca1e965 100755
--- a/t/t0400-external-odb.sh
+++ b/t/t0400-external-odb.sh
@@ -7,10 +7,15 @@ test_description='basic tests for external object databases'
 ALT_SOURCE="$PWD/alt-repo/.git"
 export ALT_SOURCE
 write_script odb-helper <<\EOF
+die() {
+	printf >&2 "%s\n" "$@"
+	exit 1
+}
 GIT_DIR=$ALT_SOURCE; export GIT_DIR
 case "$1" in
 init)
 	echo "capability=get_git_obj"
+	echo "capability=put_raw_obj"
 	echo "capability=have"
 	;;
 have)
@@ -20,6 +25,16 @@ have)
 get_git_obj)
 	cat "$GIT_DIR"/objects/$(echo $2 | sed 's#..#&/#')
 	;;
+put_raw_obj)
+	sha1="$2"
+	size="$3"
+	kind="$4"
+	written=$(git hash-object -w -t "$kind" --stdin)
+	test "$written" = "$sha1" || die "bad sha1 passed '$sha1' vs written '$written'"
+	;;
+*)
+	die "unknown command '$1'"
+	;;
 esac
 EOF
 HELPER="\"$PWD\"/odb-helper"
@@ -45,4 +60,13 @@ test_expect_success 'helper can retrieve alt objects' '
 	test_cmp expect actual
 '
 
+test_expect_success 'helper can add objects to alt repo' '
+	hash=$(echo "Hello odb!" | git hash-object -w -t blob --stdin) &&
+	test -f .git/objects/$(echo $hash | sed "s#..#&/#") &&
+	size=$(git cat-file -s "$hash") &&
+	git cat-file blob "$hash" | ./odb-helper put_raw_obj "$hash" "$size" blob &&
+	alt_size=$(git -C alt-repo cat-file -s "$hash") &&
+	test "$size" -eq "$alt_size"
+'
+
 test_done
-- 
2.16.0.rc0.16.g82191dbc6c.dirty


^ permalink raw reply	[flat|nested] 51+ messages in thread

* [PATCH 19/40] external odb: add 'put_raw_obj' support
  2018-01-03 16:33 [PATCH 00/40] Promisor remotes and external ODB support Christian Couder
                   ` (17 preceding siblings ...)
  2018-01-03 16:33 ` [PATCH 18/40] t0400: add 'put_raw_obj' instruction to odb-helper script Christian Couder
@ 2018-01-03 16:33 ` Christian Couder
  2018-01-03 16:33 ` [PATCH 20/40] external-odb: accept only blobs for now Christian Couder
                   ` (20 subsequent siblings)
  39 siblings, 0 replies; 51+ messages in thread
From: Christian Couder @ 2018-01-03 16:33 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Ben Peart, Jonathan Tan,
	Nguyen Thai Ngoc Duy, Mike Hommey, Lars Schneider, Eric Wong,
	Christian Couder, Jeff Hostetler

Add support for a 'put_raw_obj' capability/instruction to send new
objects to an external odb. Objects will be sent as they are (in
their 'raw' format). They will not be converted to Git objects.

For now any new Git object (blob, tree, commit, ...) would be sent
if 'put_raw_obj' is supported by an odb helper. This is not a great
default, but let's leave it to following commits to tweak that.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
 external-odb.c | 15 +++++++++++++++
 external-odb.h |  2 ++
 odb-helper.c   | 43 ++++++++++++++++++++++++++++++++++++++-----
 odb-helper.h   |  3 +++
 sha1_file.c    |  2 ++
 5 files changed, 60 insertions(+), 5 deletions(-)

diff --git a/external-odb.c b/external-odb.c
index 2622c12853..337bdd2540 100644
--- a/external-odb.c
+++ b/external-odb.c
@@ -151,3 +151,18 @@ int external_odb_get_direct(const unsigned char *sha1)
 
 	return -1;
 }
+
+int external_odb_put_object(const void *buf, size_t len,
+			    const char *type, unsigned char *sha1)
+{
+	struct odb_helper *o;
+
+	external_odb_init();
+
+	for (o = helpers; o; o = o->next) {
+		int r = odb_helper_put_object(o, buf, len, type, sha1);
+		if (r <= 0)
+			return r;
+	}
+	return 1;
+}
diff --git a/external-odb.h b/external-odb.h
index fb8b94972f..26bb931685 100644
--- a/external-odb.h
+++ b/external-odb.h
@@ -6,5 +6,7 @@ extern const char *external_odb_root(void);
 extern int external_odb_has_object(const unsigned char *sha1);
 extern int external_odb_get_object(const unsigned char *sha1);
 extern int external_odb_get_direct(const unsigned char *sha1);
+extern int external_odb_put_object(const void *buf, size_t len,
+				   const char *type, unsigned char *sha1);
 
 #endif /* EXTERNAL_ODB_H */
diff --git a/odb-helper.c b/odb-helper.c
index ea642fd438..6f56f07b38 100644
--- a/odb-helper.c
+++ b/odb-helper.c
@@ -72,9 +72,10 @@ static void prepare_helper_command(struct argv_array *argv, const char *cmd,
 	strbuf_release(&buf);
 }
 
-__attribute__((format (printf,3,4)))
+__attribute__((format (printf,4,5)))
 static int odb_helper_start(struct odb_helper *o,
 			    struct odb_helper_cmd *cmd,
+			    int use_stdin,
 			    const char *fmt, ...)
 {
 	va_list ap;
@@ -91,7 +92,10 @@ static int odb_helper_start(struct odb_helper *o,
 
 	cmd->child.argv = cmd->argv.argv;
 	cmd->child.use_shell = 1;
-	cmd->child.no_stdin = 1;
+	if (use_stdin)
+		cmd->child.in = -1;
+	else
+		cmd->child.no_stdin = 1;
 	cmd->child.out = -1;
 
 	if (start_command(&cmd->child) < 0) {
@@ -120,7 +124,7 @@ int odb_helper_init(struct odb_helper *o)
 	FILE *fh;
 	struct strbuf line = STRBUF_INIT;
 
-	if (odb_helper_start(o, &cmd, "init") < 0)
+	if (odb_helper_start(o, &cmd, 0, "init") < 0)
 		return -1;
 
 	fh = xfdopen(cmd.child.out, "r");
@@ -180,7 +184,7 @@ static void odb_helper_load_have(struct odb_helper *o)
 		return;
 	o->have_valid = 1;
 
-	if (odb_helper_start(o, &cmd, "have") < 0)
+	if (odb_helper_start(o, &cmd, 0, "have") < 0)
 		return;
 
 	fh = xfdopen(cmd.child.out, "r");
@@ -235,7 +239,7 @@ int odb_helper_get_object(struct odb_helper *o, const unsigned char *sha1,
 	if (!obj)
 		return -1;
 
-	if (odb_helper_start(o, &cmd, "get_git_obj %s", sha1_to_hex(sha1)) < 0)
+	if (odb_helper_start(o, &cmd, 0, "get_git_obj %s", sha1_to_hex(sha1)) < 0)
 		return -1;
 
 	memset(&stream, 0, sizeof(stream));
@@ -335,3 +339,32 @@ int odb_helper_get_direct(struct odb_helper *o,
 
 	return res;
 }
+
+int odb_helper_put_object(struct odb_helper *o,
+			  const void *buf, size_t len,
+			  const char *type, unsigned char *sha1)
+{
+	struct odb_helper_cmd cmd;
+
+	if (odb_helper_start(o, &cmd, 1, "put_raw_obj %s %"PRIuMAX" %s",
+			     sha1_to_hex(sha1), (uintmax_t)len, type) < 0)
+		return -1;
+
+	do {
+		int w = xwrite(cmd.child.in, buf, len);
+		if (w < 0) {
+			error("unable to write to odb helper '%s': %s",
+			      o->name, strerror(errno));
+			close(cmd.child.in);
+			close(cmd.child.out);
+			odb_helper_finish(o, &cmd);
+			return -1;
+		}
+		len -= w;
+	} while (len > 0);
+
+	close(cmd.child.in);
+	close(cmd.child.out);
+	odb_helper_finish(o, &cmd);
+	return 0;
+}
diff --git a/odb-helper.h b/odb-helper.h
index f8eac7f44c..4a9cc7f07b 100644
--- a/odb-helper.h
+++ b/odb-helper.h
@@ -46,5 +46,8 @@ extern int odb_helper_get_object(struct odb_helper *o,
 				 int fd);
 extern int odb_helper_get_direct(struct odb_helper *o,
 				 const unsigned char *sha1);
+extern int odb_helper_put_object(struct odb_helper *o,
+				 const void *buf, size_t len,
+				 const char *type, unsigned char *sha1);
 
 #endif /* ODB_HELPER_H */
diff --git a/sha1_file.c b/sha1_file.c
index f1d688c10a..8c873554b2 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -1695,6 +1695,8 @@ int write_sha1_file(const void *buf, unsigned long len, const char *type, unsign
 	 * it out into .git/objects/??/?{38} file.
 	 */
 	write_sha1_file_prepare(buf, len, type, sha1, hdr, &hdrlen);
+	if (!external_odb_put_object(buf, len, type, sha1))
+		return 0;
 	if (freshen_packed_object(sha1) || freshen_loose_object(sha1))
 		return 0;
 	return write_loose_object(sha1, hdr, hdrlen, buf, len, 0);
-- 
2.16.0.rc0.16.g82191dbc6c.dirty


^ permalink raw reply	[flat|nested] 51+ messages in thread

* [PATCH 20/40] external-odb: accept only blobs for now
  2018-01-03 16:33 [PATCH 00/40] Promisor remotes and external ODB support Christian Couder
                   ` (18 preceding siblings ...)
  2018-01-03 16:33 ` [PATCH 19/40] external odb: add 'put_raw_obj' support Christian Couder
@ 2018-01-03 16:33 ` Christian Couder
  2018-01-03 16:33 ` [PATCH 21/40] t0400: add test for external odb write support Christian Couder
                   ` (19 subsequent siblings)
  39 siblings, 0 replies; 51+ messages in thread
From: Christian Couder @ 2018-01-03 16:33 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Ben Peart, Jonathan Tan,
	Nguyen Thai Ngoc Duy, Mike Hommey, Lars Schneider, Eric Wong,
	Christian Couder, Jeff Hostetler

The mechanism to decide which blobs should be sent to which
external object database will be very simple for now.
If the external odb helper support any "put_*" instruction
all the new blobs will be sent to it.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
 external-odb.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/external-odb.c b/external-odb.c
index 337bdd2540..93971e9ce4 100644
--- a/external-odb.c
+++ b/external-odb.c
@@ -159,6 +159,10 @@ int external_odb_put_object(const void *buf, size_t len,
 
 	external_odb_init();
 
+	/* For now accept only blobs */
+	if (strcmp(type, "blob"))
+		return 1;
+
 	for (o = helpers; o; o = o->next) {
 		int r = odb_helper_put_object(o, buf, len, type, sha1);
 		if (r <= 0)
-- 
2.16.0.rc0.16.g82191dbc6c.dirty


^ permalink raw reply	[flat|nested] 51+ messages in thread

* [PATCH 21/40] t0400: add test for external odb write support
  2018-01-03 16:33 [PATCH 00/40] Promisor remotes and external ODB support Christian Couder
                   ` (19 preceding siblings ...)
  2018-01-03 16:33 ` [PATCH 20/40] external-odb: accept only blobs for now Christian Couder
@ 2018-01-03 16:33 ` Christian Couder
  2018-01-03 16:33 ` [PATCH 22/40] Add t0410 to test external ODB transfer Christian Couder
                   ` (18 subsequent siblings)
  39 siblings, 0 replies; 51+ messages in thread
From: Christian Couder @ 2018-01-03 16:33 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Ben Peart, Jonathan Tan,
	Nguyen Thai Ngoc Duy, Mike Hommey, Lars Schneider, Eric Wong,
	Christian Couder, Jeff Hostetler

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
 t/t0400-external-odb.sh | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/t/t0400-external-odb.sh b/t/t0400-external-odb.sh
index 4ccca1e965..f924de870f 100755
--- a/t/t0400-external-odb.sh
+++ b/t/t0400-external-odb.sh
@@ -69,4 +69,12 @@ test_expect_success 'helper can add objects to alt repo' '
 	test "$size" -eq "$alt_size"
 '
 
+test_expect_success 'commit adds objects to alt repo' '
+	test_config odb.magic.scriptCommand "$HELPER" &&
+	test_commit three &&
+	hash3=$(git ls-tree HEAD | grep three.t | cut -f1 | cut -d\  -f3) &&
+	content=$(git -C alt-repo show "$hash3") &&
+	test "$content" = "three"
+'
+
 test_done
-- 
2.16.0.rc0.16.g82191dbc6c.dirty


^ permalink raw reply	[flat|nested] 51+ messages in thread

* [PATCH 22/40] Add t0410 to test external ODB transfer
  2018-01-03 16:33 [PATCH 00/40] Promisor remotes and external ODB support Christian Couder
                   ` (20 preceding siblings ...)
  2018-01-03 16:33 ` [PATCH 21/40] t0400: add test for external odb write support Christian Couder
@ 2018-01-03 16:33 ` Christian Couder
  2018-01-03 16:33 ` [PATCH 23/40] lib-httpd: pass config file to start_httpd() Christian Couder
                   ` (17 subsequent siblings)
  39 siblings, 0 replies; 51+ messages in thread
From: Christian Couder @ 2018-01-03 16:33 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Ben Peart, Jonathan Tan,
	Nguyen Thai Ngoc Duy, Mike Hommey, Lars Schneider, Eric Wong,
	Christian Couder, Jeff Hostetler

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
 t/t0410-transfer-e-odb.sh | 144 ++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 144 insertions(+)
 create mode 100755 t/t0410-transfer-e-odb.sh

diff --git a/t/t0410-transfer-e-odb.sh b/t/t0410-transfer-e-odb.sh
new file mode 100755
index 0000000000..065ec7d759
--- /dev/null
+++ b/t/t0410-transfer-e-odb.sh
@@ -0,0 +1,144 @@
+#!/bin/sh
+
+test_description='basic tests for transfering external ODBs'
+
+. ./test-lib.sh
+
+ORIG_SOURCE="$PWD/.git"
+export ORIG_SOURCE
+
+ALT_SOURCE1="$PWD/alt-repo1/.git"
+export ALT_SOURCE1
+write_script odb-helper1 <<\EOF
+die() {
+	printf >&2 "%s\n" "$@"
+	exit 1
+}
+GIT_DIR=$ALT_SOURCE1; export GIT_DIR
+case "$1" in
+init)
+	echo "capability=get_git_obj"
+	echo "capability=have"
+	;;
+have)
+	git cat-file --batch-check --batch-all-objects |
+	awk '{print $1 " " $3 " " $2}'
+	;;
+get_git_obj)
+	cat "$GIT_DIR"/objects/$(echo $2 | sed 's#..#&/#')
+	;;
+put_raw_obj)
+	sha1="$2"
+	size="$3"
+	kind="$4"
+	writen=$(git hash-object -w -t "$kind" --stdin)
+	test "$writen" = "$sha1" || die "bad sha1 passed '$sha1' vs writen '$writen'"
+	ref_hash=$(echo "$sha1 $size $kind" | GIT_DIR=$ORIG_SOURCE GIT_NO_EXTERNAL_ODB=1 git hash-object -w -t blob --stdin) || exit
+	GIT_DIR=$ORIG_SOURCE git update-ref refs/odbs/magic/"$sha1" "$ref_hash"
+	;;
+*)
+	die "unknown command '$1'"
+	;;
+esac
+EOF
+HELPER1="\"$PWD\"/odb-helper1"
+
+OTHER_SOURCE="$PWD/.git"
+export OTHER_SOURCE
+
+ALT_SOURCE2="$PWD/alt-repo2/.git"
+export ALT_SOURCE2
+write_script odb-helper2 <<\EOF
+die() {
+	printf >&2 "%s\n" "$@"
+	exit 1
+}
+GIT_DIR=$ALT_SOURCE2; export GIT_DIR
+case "$1" in
+init)
+	echo "capability=get_git_obj"
+	echo "capability=have"
+	;;
+have)
+	GIT_DIR=$OTHER_SOURCE git for-each-ref --format='%(objectname)' refs/odbs/magic/ | GIT_DIR=$OTHER_SOURCE xargs git show
+	;;
+get_git_obj)
+	OBJ_FILE="$GIT_DIR"/objects/$(echo $2 | sed 's#..#&/#')
+	if ! test -f "$OBJ_FILE"
+	then
+		# "Download" the missing object by copying it from alt-repo1
+		OBJ_DIR=$(echo $2 | sed 's/\(..\).*/\1/')
+		OBJ_BASE=$(basename "$OBJ_FILE")
+		ALT_OBJ_DIR1="$ALT_SOURCE1/objects/$OBJ_DIR"
+		ALT_OBJ_DIR2="$ALT_SOURCE2/objects/$OBJ_DIR"
+		mkdir -p "$ALT_OBJ_DIR2" || die "Could not mkdir '$ALT_OBJ_DIR2'"
+		OBJ_SRC="$ALT_OBJ_DIR1/$OBJ_BASE"
+		cp "$OBJ_SRC" "$ALT_OBJ_DIR2" ||
+		die "Could not cp '$OBJ_SRC' into '$ALT_OBJ_DIR2'"
+	fi
+	cat "$OBJ_FILE" || die "Could not cat '$OBJ_FILE'"
+	;;
+put_raw_obj)
+	sha1="$2"
+	size="$3"
+	kind="$4"
+	writen=$(git hash-object -w -t "$kind" --stdin)
+	test "$writen" = "$sha1" || die "bad sha1 passed '$sha1' vs writen '$writen'"
+	ref_hash=$(echo "$sha1 $size $kind" | GIT_DIR=$OTHER_SOURCE GIT_NO_EXTERNAL_ODB=1 git hash-object -w -t blob --stdin) || exit
+	GIT_DIR=$OTHER_SOURCE git update-ref refs/odbs/magic/"$sha1" "$ref_hash"
+	;;
+*)
+	die "unknown command '$1'"
+	;;
+esac
+EOF
+HELPER2="\"$PWD\"/odb-helper2"
+
+test_expect_success 'setup first alternate repo' '
+	git init alt-repo1 &&
+	test_commit zero &&
+	git config odb.magic.scriptCommand "$HELPER1"
+'
+
+test_expect_success 'setup other repo and its alternate repo' '
+	git init other-repo &&
+	git init alt-repo2 &&
+	(cd other-repo &&
+	 git remote add origin .. &&
+	 git pull origin master &&
+	 git checkout master &&
+	 git log)
+'
+
+test_expect_success 'new blobs are put in first object store' '
+	test_commit one &&
+	hash1=$(git ls-tree HEAD | grep one.t | cut -f1 | cut -d\  -f3) &&
+	content=$(cd alt-repo1 && git show "$hash1") &&
+	test "$content" = "one" &&
+	test_commit two &&
+	hash2=$(git ls-tree HEAD | grep two.t | cut -f1 | cut -d\  -f3) &&
+	content=$(cd alt-repo1 && git show "$hash2") &&
+	test "$content" = "two"
+'
+
+test_expect_success 'other repo gets the blobs from object store' '
+	(cd other-repo &&
+	 git fetch origin "refs/odbs/magic/*:refs/odbs/magic/*" &&
+	 test_must_fail git cat-file blob "$hash1" &&
+	 test_must_fail git cat-file blob "$hash2" &&
+	 git config odb.magic.scriptCommand "$HELPER2" &&
+	 git cat-file blob "$hash1" &&
+	 git cat-file blob "$hash2"
+	)
+'
+
+test_expect_success 'other repo gets everything else' '
+	(cd other-repo &&
+	 git fetch origin &&
+	 content=$(git show "$hash1") &&
+	 test "$content" = "one" &&
+	 content=$(git show "$hash2") &&
+	 test "$content" = "two")
+'
+
+test_done
-- 
2.16.0.rc0.16.g82191dbc6c.dirty


^ permalink raw reply	[flat|nested] 51+ messages in thread

* [PATCH 23/40] lib-httpd: pass config file to start_httpd()
  2018-01-03 16:33 [PATCH 00/40] Promisor remotes and external ODB support Christian Couder
                   ` (21 preceding siblings ...)
  2018-01-03 16:33 ` [PATCH 22/40] Add t0410 to test external ODB transfer Christian Couder
@ 2018-01-03 16:33 ` Christian Couder
  2018-01-03 16:33 ` [PATCH 24/40] lib-httpd: add upload.sh Christian Couder
                   ` (16 subsequent siblings)
  39 siblings, 0 replies; 51+ messages in thread
From: Christian Couder @ 2018-01-03 16:33 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Ben Peart, Jonathan Tan,
	Nguyen Thai Ngoc Duy, Mike Hommey, Lars Schneider, Eric Wong,
	Christian Couder, Jeff Hostetler

This makes it possible to start an apache web server with different
config files.

This will be used in a later patch to pass a config file that makes
apache store external objects.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
 t/lib-httpd.sh | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/t/lib-httpd.sh b/t/lib-httpd.sh
index 435a37465a..2e659a8ee2 100644
--- a/t/lib-httpd.sh
+++ b/t/lib-httpd.sh
@@ -171,12 +171,14 @@ prepare_httpd() {
 }
 
 start_httpd() {
+	APACHE_CONF_FILE=${1-apache.conf}
+
 	prepare_httpd >&3 2>&4
 
 	trap 'code=$?; stop_httpd; (exit $code); die' EXIT
 
 	"$LIB_HTTPD_PATH" -d "$HTTPD_ROOT_PATH" \
-		-f "$TEST_PATH/apache.conf" $HTTPD_PARA \
+		-f "$TEST_PATH/$APACHE_CONF_FILE" $HTTPD_PARA \
 		-c "Listen 127.0.0.1:$LIB_HTTPD_PORT" -k start \
 		>&3 2>&4
 	if test $? -ne 0
@@ -191,7 +193,7 @@ stop_httpd() {
 	trap 'die' EXIT
 
 	"$LIB_HTTPD_PATH" -d "$HTTPD_ROOT_PATH" \
-		-f "$TEST_PATH/apache.conf" $HTTPD_PARA -k stop
+		-f "$TEST_PATH/$APACHE_CONF_FILE" $HTTPD_PARA -k stop
 }
 
 test_http_push_nonff () {
-- 
2.16.0.rc0.16.g82191dbc6c.dirty


^ permalink raw reply	[flat|nested] 51+ messages in thread

* [PATCH 24/40] lib-httpd: add upload.sh
  2018-01-03 16:33 [PATCH 00/40] Promisor remotes and external ODB support Christian Couder
                   ` (22 preceding siblings ...)
  2018-01-03 16:33 ` [PATCH 23/40] lib-httpd: pass config file to start_httpd() Christian Couder
@ 2018-01-03 16:33 ` Christian Couder
  2018-01-03 16:33 ` [PATCH 25/40] lib-httpd: add list.sh Christian Couder
                   ` (15 subsequent siblings)
  39 siblings, 0 replies; 51+ messages in thread
From: Christian Couder @ 2018-01-03 16:33 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Ben Peart, Jonathan Tan,
	Nguyen Thai Ngoc Duy, Mike Hommey, Lars Schneider, Eric Wong,
	Christian Couder, Jeff Hostetler

This cgi will be used to upload objects to, or to delete
objects from, an apache web server.

This way the apache server can work as an external object
database.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
 t/lib-httpd.sh        |  1 +
 t/lib-httpd/upload.sh | 45 +++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 46 insertions(+)
 create mode 100644 t/lib-httpd/upload.sh

diff --git a/t/lib-httpd.sh b/t/lib-httpd.sh
index 2e659a8ee2..d80b004549 100644
--- a/t/lib-httpd.sh
+++ b/t/lib-httpd.sh
@@ -132,6 +132,7 @@ prepare_httpd() {
 	cp "$TEST_PATH"/passwd "$HTTPD_ROOT_PATH"
 	install_script broken-smart-http.sh
 	install_script error.sh
+	install_script upload.sh
 
 	ln -s "$LIB_HTTPD_MODULE_PATH" "$HTTPD_ROOT_PATH/modules"
 
diff --git a/t/lib-httpd/upload.sh b/t/lib-httpd/upload.sh
new file mode 100644
index 0000000000..64d3f31c31
--- /dev/null
+++ b/t/lib-httpd/upload.sh
@@ -0,0 +1,45 @@
+#!/bin/sh
+
+# In part from http://codereview.stackexchange.com/questions/79549/bash-cgi-upload-file
+
+FILES_DIR="www/files"
+
+OLDIFS="$IFS"
+IFS='&'
+set -- $QUERY_STRING
+IFS="$OLDIFS"
+
+while test $# -gt 0
+do
+	key=${1%%=*}
+	val=${1#*=}
+
+	case "$key" in
+	"sha1") sha1="$val" ;;
+	"type") type="$val" ;;
+	"size") size="$val" ;;
+	"delete") delete=1 ;;
+	*) echo >&2 "unknown key '$key'" ;;
+	esac
+
+	shift
+done
+
+case "$REQUEST_METHOD" in
+POST)
+	if test "$delete" = "1"
+	then
+		rm -f "$FILES_DIR/$sha1-$size-$type"
+	else
+		mkdir -p "$FILES_DIR"
+		cat >"$FILES_DIR/$sha1-$size-$type"
+	fi
+
+	echo 'Status: 204 No Content'
+	echo
+	;;
+
+*)
+	echo 'Status: 405 Method Not Allowed'
+	echo
+esac
-- 
2.16.0.rc0.16.g82191dbc6c.dirty


^ permalink raw reply	[flat|nested] 51+ messages in thread

* [PATCH 25/40] lib-httpd: add list.sh
  2018-01-03 16:33 [PATCH 00/40] Promisor remotes and external ODB support Christian Couder
                   ` (23 preceding siblings ...)
  2018-01-03 16:33 ` [PATCH 24/40] lib-httpd: add upload.sh Christian Couder
@ 2018-01-03 16:33 ` Christian Couder
  2018-01-03 16:33 ` [PATCH 26/40] lib-httpd: add apache-e-odb.conf Christian Couder
                   ` (14 subsequent siblings)
  39 siblings, 0 replies; 51+ messages in thread
From: Christian Couder @ 2018-01-03 16:33 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Ben Peart, Jonathan Tan,
	Nguyen Thai Ngoc Duy, Mike Hommey, Lars Schneider, Eric Wong,
	Christian Couder, Jeff Hostetler

This cgi script can list Git objects that have been uploaded as
files to an apache web server. This script can also retrieve
the content of each of these files.

This will help make apache work as an external object database.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
 t/lib-httpd.sh      |  1 +
 t/lib-httpd/list.sh | 41 +++++++++++++++++++++++++++++++++++++++++
 2 files changed, 42 insertions(+)
 create mode 100644 t/lib-httpd/list.sh

diff --git a/t/lib-httpd.sh b/t/lib-httpd.sh
index d80b004549..f31ea261f5 100644
--- a/t/lib-httpd.sh
+++ b/t/lib-httpd.sh
@@ -133,6 +133,7 @@ prepare_httpd() {
 	install_script broken-smart-http.sh
 	install_script error.sh
 	install_script upload.sh
+	install_script list.sh
 
 	ln -s "$LIB_HTTPD_MODULE_PATH" "$HTTPD_ROOT_PATH/modules"
 
diff --git a/t/lib-httpd/list.sh b/t/lib-httpd/list.sh
new file mode 100644
index 0000000000..b6d6c29a2f
--- /dev/null
+++ b/t/lib-httpd/list.sh
@@ -0,0 +1,41 @@
+#!/bin/sh
+
+FILES_DIR="www/files"
+
+OLDIFS="$IFS"
+IFS='&'
+set -- $QUERY_STRING
+IFS="$OLDIFS"
+
+while test $# -gt 0
+do
+	key=${1%%=*}
+	val=${1#*=}
+
+	case "$key" in
+	"sha1") sha1="$val" ;;
+	*) echo >&2 "unknown key '$key'" ;;
+	esac
+
+	shift
+done
+
+if test -d "$FILES_DIR"
+then
+	if test -z "$sha1"
+	then
+		echo 'Status: 200 OK'
+		echo
+		ls "$FILES_DIR" | tr '-' ' '
+	else
+		if test -f "$FILES_DIR/$sha1"-*
+		then
+			echo 'Status: 200 OK'
+			echo
+			cat "$FILES_DIR/$sha1"-*
+		else
+			echo 'Status: 404 Not Found'
+			echo
+		fi
+	fi
+fi
-- 
2.16.0.rc0.16.g82191dbc6c.dirty


^ permalink raw reply	[flat|nested] 51+ messages in thread

* [PATCH 26/40] lib-httpd: add apache-e-odb.conf
  2018-01-03 16:33 [PATCH 00/40] Promisor remotes and external ODB support Christian Couder
                   ` (24 preceding siblings ...)
  2018-01-03 16:33 ` [PATCH 25/40] lib-httpd: add list.sh Christian Couder
@ 2018-01-03 16:33 ` Christian Couder
  2018-01-03 16:33 ` [PATCH 27/40] odb-helper: add odb_helper_get_raw_object() Christian Couder
                   ` (13 subsequent siblings)
  39 siblings, 0 replies; 51+ messages in thread
From: Christian Couder @ 2018-01-03 16:33 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Ben Peart, Jonathan Tan,
	Nguyen Thai Ngoc Duy, Mike Hommey, Lars Schneider, Eric Wong,
	Christian Couder, Jeff Hostetler

This is an apache config file to test external object databases.
It uses the upload.sh and list.sh cgi that have been added
previously to make apache store external objects.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
 t/lib-httpd/apache-e-odb.conf | 214 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 214 insertions(+)
 create mode 100644 t/lib-httpd/apache-e-odb.conf

diff --git a/t/lib-httpd/apache-e-odb.conf b/t/lib-httpd/apache-e-odb.conf
new file mode 100644
index 0000000000..19a1540c82
--- /dev/null
+++ b/t/lib-httpd/apache-e-odb.conf
@@ -0,0 +1,214 @@
+ServerName dummy
+PidFile httpd.pid
+DocumentRoot www
+LogFormat "%h %l %u %t \"%r\" %>s %b" common
+CustomLog access.log common
+ErrorLog error.log
+<IfModule !mod_log_config.c>
+	LoadModule log_config_module modules/mod_log_config.so
+</IfModule>
+<IfModule !mod_alias.c>
+	LoadModule alias_module modules/mod_alias.so
+</IfModule>
+<IfModule !mod_cgi.c>
+	LoadModule cgi_module modules/mod_cgi.so
+</IfModule>
+<IfModule !mod_env.c>
+	LoadModule env_module modules/mod_env.so
+</IfModule>
+<IfModule !mod_rewrite.c>
+	LoadModule rewrite_module modules/mod_rewrite.so
+</IFModule>
+<IfModule !mod_version.c>
+	LoadModule version_module modules/mod_version.so
+</IfModule>
+<IfModule !mod_headers.c>
+	LoadModule headers_module modules/mod_headers.so
+</IfModule>
+
+<IfVersion < 2.4>
+LockFile accept.lock
+</IfVersion>
+
+<IfVersion < 2.1>
+<IfModule !mod_auth.c>
+	LoadModule auth_module modules/mod_auth.so
+</IfModule>
+</IfVersion>
+
+<IfVersion >= 2.1>
+<IfModule !mod_auth_basic.c>
+	LoadModule auth_basic_module modules/mod_auth_basic.so
+</IfModule>
+<IfModule !mod_authn_file.c>
+	LoadModule authn_file_module modules/mod_authn_file.so
+</IfModule>
+<IfModule !mod_authz_user.c>
+	LoadModule authz_user_module modules/mod_authz_user.so
+</IfModule>
+<IfModule !mod_authz_host.c>
+	LoadModule authz_host_module modules/mod_authz_host.so
+</IfModule>
+</IfVersion>
+
+<IfVersion >= 2.4>
+<IfModule !mod_authn_core.c>
+	LoadModule authn_core_module modules/mod_authn_core.so
+</IfModule>
+<IfModule !mod_authz_core.c>
+	LoadModule authz_core_module modules/mod_authz_core.so
+</IfModule>
+<IfModule !mod_access_compat.c>
+	LoadModule access_compat_module modules/mod_access_compat.so
+</IfModule>
+<IfModule !mod_mpm_prefork.c>
+	LoadModule mpm_prefork_module modules/mod_mpm_prefork.so
+</IfModule>
+<IfModule !mod_unixd.c>
+	LoadModule unixd_module modules/mod_unixd.so
+</IfModule>
+</IfVersion>
+
+PassEnv GIT_VALGRIND
+PassEnv GIT_VALGRIND_OPTIONS
+PassEnv GNUPGHOME
+PassEnv ASAN_OPTIONS
+PassEnv GIT_TRACE
+PassEnv GIT_CONFIG_NOSYSTEM
+
+Alias /dumb/ www/
+Alias /auth/dumb/ www/auth/dumb/
+
+<LocationMatch /smart/>
+	SetEnv GIT_EXEC_PATH ${GIT_EXEC_PATH}
+	SetEnv GIT_HTTP_EXPORT_ALL
+</LocationMatch>
+<LocationMatch /smart_noexport/>
+	SetEnv GIT_EXEC_PATH ${GIT_EXEC_PATH}
+</LocationMatch>
+<LocationMatch /smart_custom_env/>
+	SetEnv GIT_EXEC_PATH ${GIT_EXEC_PATH}
+	SetEnv GIT_HTTP_EXPORT_ALL
+	SetEnv GIT_COMMITTER_NAME "Custom User"
+	SetEnv GIT_COMMITTER_EMAIL custom@example.com
+</LocationMatch>
+<LocationMatch /smart_namespace/>
+	SetEnv GIT_EXEC_PATH ${GIT_EXEC_PATH}
+	SetEnv GIT_HTTP_EXPORT_ALL
+	SetEnv GIT_NAMESPACE ns
+</LocationMatch>
+<LocationMatch /smart_cookies/>
+	SetEnv GIT_EXEC_PATH ${GIT_EXEC_PATH}
+	SetEnv GIT_HTTP_EXPORT_ALL
+	Header set Set-Cookie name=value
+</LocationMatch>
+<LocationMatch /smart_headers/>
+	SetEnv GIT_EXEC_PATH ${GIT_EXEC_PATH}
+	SetEnv GIT_HTTP_EXPORT_ALL
+</LocationMatch>
+ScriptAlias /upload/ upload.sh/
+ScriptAlias /list/ list.sh/
+<Directory ${GIT_EXEC_PATH}>
+	Options FollowSymlinks
+</Directory>
+<Files upload.sh>
+  Options ExecCGI
+</Files>
+<Files list.sh>
+  Options ExecCGI
+</Files>
+<Files ${GIT_EXEC_PATH}/git-http-backend>
+	Options ExecCGI
+</Files>
+
+RewriteEngine on
+RewriteRule ^/smart-redir-perm/(.*)$ /smart/$1 [R=301]
+RewriteRule ^/smart-redir-temp/(.*)$ /smart/$1 [R=302]
+RewriteRule ^/smart-redir-auth/(.*)$ /auth/smart/$1 [R=301]
+RewriteRule ^/smart-redir-limited/(.*)/info/refs$ /smart/$1/info/refs [R=301]
+RewriteRule ^/ftp-redir/(.*)$ ftp://localhost:1000/$1 [R=302]
+
+RewriteRule ^/loop-redir/x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-(.*) /$1 [R=302]
+RewriteRule ^/loop-redir/(.*)$ /loop-redir/x-$1 [R=302]
+
+# Apache 2.2 does not understand <RequireAll>, so we use RewriteCond.
+# And as RewriteCond does not allow testing for non-matches, we match
+# the desired case first (one has abra, two has cadabra), and let it
+# pass by marking the RewriteRule as [L], "last rule, do not process
+# any other matching RewriteRules after this"), and then have another
+# RewriteRule that matches all other cases and lets them fail via '[F]',
+# "fail the request".
+RewriteCond %{HTTP:x-magic-one} =abra
+RewriteCond %{HTTP:x-magic-two} =cadabra
+RewriteRule ^/smart_headers/.* - [L]
+RewriteRule ^/smart_headers/.* - [F]
+
+<IfDefine SSL>
+LoadModule ssl_module modules/mod_ssl.so
+
+SSLCertificateFile httpd.pem
+SSLCertificateKeyFile httpd.pem
+SSLRandomSeed startup file:/dev/urandom 512
+SSLRandomSeed connect file:/dev/urandom 512
+SSLSessionCache none
+SSLMutex file:ssl_mutex
+SSLEngine On
+</IfDefine>
+
+<Location /auth/>
+	AuthType Basic
+	AuthName "git-auth"
+	AuthUserFile passwd
+	Require valid-user
+</Location>
+
+<LocationMatch "^/auth-push/.*/git-receive-pack$">
+	AuthType Basic
+	AuthName "git-auth"
+	AuthUserFile passwd
+	Require valid-user
+</LocationMatch>
+
+<LocationMatch "^/auth-fetch/.*/git-upload-pack$">
+	AuthType Basic
+	AuthName "git-auth"
+	AuthUserFile passwd
+	Require valid-user
+</LocationMatch>
+
+RewriteCond %{QUERY_STRING} service=git-receive-pack [OR]
+RewriteCond %{REQUEST_URI} /git-receive-pack$
+RewriteRule ^/half-auth-complete/ - [E=AUTHREQUIRED:yes]
+
+<Location /half-auth-complete/>
+  Order Deny,Allow
+  Deny from env=AUTHREQUIRED
+
+  AuthType Basic
+  AuthName "Git Access"
+  AuthUserFile passwd
+  Require valid-user
+  Satisfy Any
+</Location>
+
+<IfDefine DAV>
+	LoadModule dav_module modules/mod_dav.so
+	LoadModule dav_fs_module modules/mod_dav_fs.so
+
+	DAVLockDB DAVLock
+	<Location /dumb/>
+		Dav on
+	</Location>
+	<Location /auth/dumb>
+		Dav on
+	</Location>
+</IfDefine>
+
+<IfDefine SVN>
+	LoadModule dav_svn_module modules/mod_dav_svn.so
+
+	<Location /${LIB_HTTPD_SVN}>
+		DAV svn
+		SVNPath "${LIB_HTTPD_SVNPATH}"
+	</Location>
+</IfDefine>
-- 
2.16.0.rc0.16.g82191dbc6c.dirty


^ permalink raw reply	[flat|nested] 51+ messages in thread

* [PATCH 27/40] odb-helper: add odb_helper_get_raw_object()
  2018-01-03 16:33 [PATCH 00/40] Promisor remotes and external ODB support Christian Couder
                   ` (25 preceding siblings ...)
  2018-01-03 16:33 ` [PATCH 26/40] lib-httpd: add apache-e-odb.conf Christian Couder
@ 2018-01-03 16:33 ` Christian Couder
  2018-01-03 16:33 ` [PATCH 28/40] pack-objects: don't pack objects in external odbs Christian Couder
                   ` (12 subsequent siblings)
  39 siblings, 0 replies; 51+ messages in thread
From: Christian Couder @ 2018-01-03 16:33 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Ben Peart, Jonathan Tan,
	Nguyen Thai Ngoc Duy, Mike Hommey, Lars Schneider, Eric Wong,
	Christian Couder, Jeff Hostetler

The existing odb_helper_get_object() is renamed
odb_helper_get_git_object() and a new odb_helper_get_raw_object()
is introduced to deal with external objects that are not in Git format.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
 odb-helper.c | 113 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 111 insertions(+), 2 deletions(-)

diff --git a/odb-helper.c b/odb-helper.c
index 6f56f07b38..fc30c2fa57 100644
--- a/odb-helper.c
+++ b/odb-helper.c
@@ -222,8 +222,107 @@ int odb_helper_has_object(struct odb_helper *o, const unsigned char *sha1)
 	return !!odb_helper_lookup(o, sha1);
 }
 
-int odb_helper_get_object(struct odb_helper *o, const unsigned char *sha1,
-			    int fd)
+static int odb_helper_get_raw_object(struct odb_helper *o,
+				     const unsigned char *sha1,
+				     int fd)
+{
+	struct odb_helper_object *obj;
+	struct odb_helper_cmd cmd;
+	unsigned long total_got = 0;
+
+	char hdr[32];
+	int hdrlen;
+
+	int ret = Z_STREAM_END;
+	unsigned char compressed[4096];
+	git_zstream stream;
+	git_SHA_CTX hash;
+	unsigned char real_sha1[20];
+
+	obj = odb_helper_lookup(o, sha1);
+	if (!obj)
+		return -1;
+
+	if (odb_helper_start(o, &cmd, 0, "get_raw_obj %s", sha1_to_hex(sha1)) < 0)
+		return -1;
+
+	/* Set it up */
+	git_deflate_init(&stream, zlib_compression_level);
+	stream.next_out = compressed;
+	stream.avail_out = sizeof(compressed);
+	git_SHA1_Init(&hash);
+
+	/* First header.. */
+	hdrlen = xsnprintf(hdr, sizeof(hdr), "%s %lu", typename(obj->type), obj->size) + 1;
+	stream.next_in = (unsigned char *)hdr;
+	stream.avail_in = hdrlen;
+	while (git_deflate(&stream, 0) == Z_OK)
+		; /* nothing */
+	git_SHA1_Update(&hash, hdr, hdrlen);
+
+	for (;;) {
+		unsigned char buf[4096];
+		int r;
+
+		r = xread(cmd.child.out, buf, sizeof(buf));
+		if (r < 0) {
+			error("unable to read from odb helper '%s': %s",
+			      o->name, strerror(errno));
+			close(cmd.child.out);
+			odb_helper_finish(o, &cmd);
+			git_deflate_end(&stream);
+			return -1;
+		}
+		if (r == 0)
+			break;
+
+		total_got += r;
+
+		/* Then the data itself.. */
+		stream.next_in = (void *)buf;
+		stream.avail_in = r;
+		do {
+			unsigned char *in0 = stream.next_in;
+			ret = git_deflate(&stream, Z_FINISH);
+			git_SHA1_Update(&hash, in0, stream.next_in - in0);
+			write_or_die(fd, compressed, stream.next_out - compressed);
+			stream.next_out = compressed;
+			stream.avail_out = sizeof(compressed);
+		} while (ret == Z_OK);
+	}
+
+	close(cmd.child.out);
+	if (ret != Z_STREAM_END) {
+		warning("bad zlib data from odb helper '%s' for %s",
+			o->name, sha1_to_hex(sha1));
+		return -1;
+	}
+	ret = git_deflate_end_gently(&stream);
+	if (ret != Z_OK) {
+		warning("deflateEnd on object %s from odb helper '%s' failed (%d)",
+			sha1_to_hex(sha1), o->name, ret);
+		return -1;
+	}
+	git_SHA1_Final(real_sha1, &hash);
+	if (hashcmp(sha1, real_sha1)) {
+		warning("sha1 mismatch from odb helper '%s' for %s (got %s)",
+			o->name, sha1_to_hex(sha1), sha1_to_hex(real_sha1));
+		return -1;
+	}
+	if (odb_helper_finish(o, &cmd))
+		return -1;
+	if (total_got != obj->size) {
+		warning("size mismatch from odb helper '%s' for %s (%lu != %lu)",
+			o->name, sha1_to_hex(sha1), total_got, obj->size);
+		return -1;
+	}
+
+	return 0;
+}
+
+static int odb_helper_get_git_object(struct odb_helper *o,
+				     const unsigned char *sha1,
+				     int fd)
 {
 	struct odb_helper_object *obj;
 	struct odb_helper_cmd cmd;
@@ -340,6 +439,16 @@ int odb_helper_get_direct(struct odb_helper *o,
 	return res;
 }
 
+int odb_helper_get_object(struct odb_helper *o,
+			  const unsigned char *sha1,
+			  int fd)
+{
+	if (o->supported_capabilities & ODB_HELPER_CAP_GET_RAW_OBJ)
+		return odb_helper_get_raw_object(o, sha1, fd);
+	else
+		return odb_helper_get_git_object(o, sha1, fd);
+}
+
 int odb_helper_put_object(struct odb_helper *o,
 			  const void *buf, size_t len,
 			  const char *type, unsigned char *sha1)
-- 
2.16.0.rc0.16.g82191dbc6c.dirty


^ permalink raw reply	[flat|nested] 51+ messages in thread

* [PATCH 28/40] pack-objects: don't pack objects in external odbs
  2018-01-03 16:33 [PATCH 00/40] Promisor remotes and external ODB support Christian Couder
                   ` (26 preceding siblings ...)
  2018-01-03 16:33 ` [PATCH 27/40] odb-helper: add odb_helper_get_raw_object() Christian Couder
@ 2018-01-03 16:33 ` Christian Couder
  2018-01-04 20:54   ` Jeff Hostetler
  2018-01-03 16:33 ` [PATCH 29/40] Add t0420 to test transfer to HTTP external odb Christian Couder
                   ` (11 subsequent siblings)
  39 siblings, 1 reply; 51+ messages in thread
From: Christian Couder @ 2018-01-03 16:33 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Ben Peart, Jonathan Tan,
	Nguyen Thai Ngoc Duy, Mike Hommey, Lars Schneider, Eric Wong,
	Christian Couder, Jeff Hostetler

Objects managed by an external ODB should not be put into
pack files. They should be transfered using other mechanism
that can be specific to the external odb.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
 builtin/pack-objects.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 6c71552cdf..4ed66c7677 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -28,6 +28,7 @@
 #include "argv-array.h"
 #include "mru.h"
 #include "packfile.h"
+#include "external-odb.h"
 
 static const char *pack_usage[] = {
 	N_("git pack-objects --stdout [<options>...] [< <ref-list> | < <object-list>]"),
@@ -1026,6 +1027,9 @@ static int want_object_in_pack(const struct object_id *oid,
 			return want;
 	}
 
+	if (external_odb_has_object(oid->hash))
+		return 0;
+
 	for (entry = packed_git_mru.head; entry; entry = entry->next) {
 		struct packed_git *p = entry->item;
 		off_t offset;
-- 
2.16.0.rc0.16.g82191dbc6c.dirty


^ permalink raw reply	[flat|nested] 51+ messages in thread

* [PATCH 29/40] Add t0420 to test transfer to HTTP external odb
  2018-01-03 16:33 [PATCH 00/40] Promisor remotes and external ODB support Christian Couder
                   ` (27 preceding siblings ...)
  2018-01-03 16:33 ` [PATCH 28/40] pack-objects: don't pack objects in external odbs Christian Couder
@ 2018-01-03 16:33 ` Christian Couder
  2018-01-03 16:33 ` [PATCH 30/40] external-odb: add 'get_direct' support Christian Couder
                   ` (10 subsequent siblings)
  39 siblings, 0 replies; 51+ messages in thread
From: Christian Couder @ 2018-01-03 16:33 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Ben Peart, Jonathan Tan,
	Nguyen Thai Ngoc Duy, Mike Hommey, Lars Schneider, Eric Wong,
	Christian Couder, Jeff Hostetler

This tests that an apache web server can be used as an
external object database and store files in their native
format instead of converting them to a Git object.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
 t/t0420-transfer-http-e-odb.sh | 142 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 142 insertions(+)
 create mode 100755 t/t0420-transfer-http-e-odb.sh

diff --git a/t/t0420-transfer-http-e-odb.sh b/t/t0420-transfer-http-e-odb.sh
new file mode 100755
index 0000000000..f84fe950ec
--- /dev/null
+++ b/t/t0420-transfer-http-e-odb.sh
@@ -0,0 +1,142 @@
+#!/bin/sh
+
+test_description='tests for transfering external objects to an HTTPD server'
+
+. ./test-lib.sh
+
+# If we don't specify a port, the current test number will be used
+# which will not work as it is less than 1024, so it can only be used by root.
+LIB_HTTPD_PORT=$(expr ${this_test#t} + 12000)
+
+. "$TEST_DIRECTORY"/lib-httpd.sh
+
+start_httpd apache-e-odb.conf
+
+# odb helper script must see this
+export HTTPD_URL
+
+write_script odb-http-helper <<\EOF
+die() {
+	printf >&2 "%s\n" "$@"
+	exit 1
+}
+echo >&2 "odb-http-helper args:" "$@"
+case "$1" in
+init)
+	echo "capability=get_raw_obj"
+	echo "capability=put_raw_obj"
+	echo "capability=have"
+	;;
+have)
+	list_url="$HTTPD_URL/list/"
+	curl "$list_url" ||
+	die "curl '$list_url' failed"
+	;;
+get_raw_obj)
+	get_url="$HTTPD_URL/list/?sha1=$2"
+	curl "$get_url" ||
+	die "curl '$get_url' failed"
+	;;
+put_raw_obj)
+	sha1="$2"
+	size="$3"
+	kind="$4"
+	upload_url="$HTTPD_URL/upload/?sha1=$sha1&size=$size&type=$kind"
+	curl --data-binary @- --include "$upload_url" >out ||
+	die "curl '$upload_url' failed"
+	ref_hash=$(echo "$sha1 $size $kind" | GIT_NO_EXTERNAL_ODB=1 git hash-object -w -t blob --stdin) || exit
+	git update-ref refs/odbs/magic/"$sha1" "$ref_hash"
+	;;
+*)
+	die "unknown command '$1'"
+	;;
+esac
+EOF
+HELPER="\"$PWD\"/odb-http-helper"
+
+test_expect_success 'setup repo with a root commit and the helper' '
+	test_commit zero &&
+	git config odb.magic.scriptCommand "$HELPER"
+'
+
+test_expect_success 'setup another repo from the first one' '
+	git init other-repo &&
+	(cd other-repo &&
+	 git remote add origin .. &&
+	 git pull origin master &&
+	 git checkout master &&
+	 git log)
+'
+
+UPLOADFILENAME="hello_apache_upload.txt"
+
+UPLOAD_URL="$HTTPD_URL/upload/?sha1=$UPLOADFILENAME&size=123&type=blob"
+
+test_expect_success 'can upload a file' '
+	echo "Hello Apache World!" >hello_to_send.txt &&
+	echo "How are you?" >>hello_to_send.txt &&
+	curl --data-binary @hello_to_send.txt --include "$UPLOAD_URL" >out_upload
+'
+
+LIST_URL="$HTTPD_URL/list/"
+
+test_expect_success 'can list uploaded files' '
+	curl --include "$LIST_URL" >out_list &&
+	grep "$UPLOADFILENAME" out_list
+'
+
+test_expect_success 'can delete uploaded files' '
+	curl --data "delete" --include "$UPLOAD_URL&delete=1" >out_delete &&
+	curl --include "$LIST_URL" >out_list2 &&
+	! grep "$UPLOADFILENAME" out_list2
+'
+
+FILES_DIR="httpd/www/files"
+
+test_expect_success 'new blobs are transfered to the http server' '
+	test_commit one &&
+	hash1=$(git ls-tree HEAD | grep one.t | cut -f1 | cut -d\  -f3) &&
+	echo "$hash1-4-blob" >expected &&
+	ls "$FILES_DIR" >actual &&
+	test_cmp expected actual
+'
+
+test_expect_success 'blobs can be retrieved from the http server' '
+	git cat-file blob "$hash1" &&
+	git log -p >expected
+'
+
+test_expect_success 'update other repo from the first one' '
+	(cd other-repo &&
+	 git fetch origin "refs/odbs/magic/*:refs/odbs/magic/*" &&
+	 test_must_fail git cat-file blob "$hash1" &&
+	 git config odb.magic.scriptCommand "$HELPER" &&
+	 git cat-file blob "$hash1" &&
+	 git pull origin master)
+'
+
+test_expect_success 'local clone from the first repo' '
+	mkdir my-clone &&
+	(cd my-clone &&
+	 git clone .. . &&
+	 git cat-file blob "$hash1")
+'
+
+test_expect_success 'no-local clone from the first repo fails' '
+	mkdir my-other-clone &&
+	(cd my-other-clone &&
+	 test_must_fail git clone --no-local .. .) &&
+	rm -rf my-other-clone
+'
+
+test_expect_success 'no-local clone from the first repo with helper succeeds' '
+	mkdir my-other-clone &&
+	(cd my-other-clone &&
+	 git clone -c odb.magic.scriptCommand="$HELPER" \
+		--no-local .. .) &&
+	rm -rf my-other-clone
+'
+
+stop_httpd
+
+test_done
-- 
2.16.0.rc0.16.g82191dbc6c.dirty


^ permalink raw reply	[flat|nested] 51+ messages in thread

* [PATCH 30/40] external-odb: add 'get_direct' support
  2018-01-03 16:33 [PATCH 00/40] Promisor remotes and external ODB support Christian Couder
                   ` (28 preceding siblings ...)
  2018-01-03 16:33 ` [PATCH 29/40] Add t0420 to test transfer to HTTP external odb Christian Couder
@ 2018-01-03 16:33 ` Christian Couder
  2018-01-03 16:33 ` [PATCH 31/40] odb-helper: add 'script_mode' to 'struct odb_helper' Christian Couder
                   ` (9 subsequent siblings)
  39 siblings, 0 replies; 51+ messages in thread
From: Christian Couder @ 2018-01-03 16:33 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Ben Peart, Jonathan Tan,
	Nguyen Thai Ngoc Duy, Mike Hommey, Lars Schneider, Eric Wong,
	Christian Couder, Jeff Hostetler

This implements the 'get_direct' capability/instruction that makes
it possible for external odb helper scripts to pass blobs to Git
by directly writing them as loose objects files.

It is better to call this a "direct" mode rather than a "fault-in"
mode as we could have the same kind of mechanism to "put" objects
into an external odb, where the odb helper would access blobs it
wants to send to an external odb directly from files, but it
would be strange to call that a fault-in mode too.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
 external-odb.c |  3 ++-
 odb-helper.c   | 28 +++++++++++++++++++++++-----
 2 files changed, 25 insertions(+), 6 deletions(-)

diff --git a/external-odb.c b/external-odb.c
index 93971e9ce4..3ce3d111f3 100644
--- a/external-odb.c
+++ b/external-odb.c
@@ -109,7 +109,8 @@ int external_odb_get_object(const unsigned char *sha1)
 		int ret;
 		int fd;
 
-		if (!odb_helper_has_object(o, sha1))
+		if (!(o->supported_capabilities & ODB_HELPER_CAP_GET_RAW_OBJ) &&
+		    !(o->supported_capabilities & ODB_HELPER_CAP_GET_GIT_OBJ))
 			continue;
 
 		fd = create_object_tmpfile(&tmpfile, path);
diff --git a/odb-helper.c b/odb-helper.c
index fc30c2fa57..0fa7af0348 100644
--- a/odb-helper.c
+++ b/odb-helper.c
@@ -429,24 +429,42 @@ static int odb_helper_get_git_object(struct odb_helper *o,
 int odb_helper_get_direct(struct odb_helper *o,
 			  const unsigned char *sha1)
 {
-	int res = 0;
 	uint64_t start = getnanotime();
 
-	fetch_object(o->dealer, sha1);
+	if (o->type == ODB_HELPER_GIT_REMOTE) {
+		fetch_object(o->dealer, sha1);
+	} else {
+		struct odb_helper_object *obj;
+		struct odb_helper_cmd cmd;
+
+		obj = odb_helper_lookup(o, sha1);
+		if (!obj)
+			return -1;
+
+		if (odb_helper_start(o, &cmd, 0, "get_direct %s", sha1_to_hex(sha1)) < 0)
+			return -1;
+
+		if (odb_helper_finish(o, &cmd))
+			return -1;
+	}
 
 	trace_performance_since(start, "odb_helper_get_direct");
 
-	return res;
+	return 0;
 }
 
 int odb_helper_get_object(struct odb_helper *o,
 			  const unsigned char *sha1,
 			  int fd)
 {
+	if (o->supported_capabilities & ODB_HELPER_CAP_GET_GIT_OBJ)
+		return odb_helper_get_git_object(o, sha1, fd);
 	if (o->supported_capabilities & ODB_HELPER_CAP_GET_RAW_OBJ)
 		return odb_helper_get_raw_object(o, sha1, fd);
-	else
-		return odb_helper_get_git_object(o, sha1, fd);
+	if (o->supported_capabilities & ODB_HELPER_CAP_GET_DIRECT)
+		return 0;
+
+	BUG("invalid get capability (capabilities: '%d')", o->supported_capabilities);
 }
 
 int odb_helper_put_object(struct odb_helper *o,
-- 
2.16.0.rc0.16.g82191dbc6c.dirty


^ permalink raw reply	[flat|nested] 51+ messages in thread

* [PATCH 31/40] odb-helper: add 'script_mode' to 'struct odb_helper'
  2018-01-03 16:33 [PATCH 00/40] Promisor remotes and external ODB support Christian Couder
                   ` (29 preceding siblings ...)
  2018-01-03 16:33 ` [PATCH 30/40] external-odb: add 'get_direct' support Christian Couder
@ 2018-01-03 16:33 ` Christian Couder
  2018-01-03 16:33 ` [PATCH 32/40] odb-helper: add init_object_process() Christian Couder
                   ` (8 subsequent siblings)
  39 siblings, 0 replies; 51+ messages in thread
From: Christian Couder @ 2018-01-03 16:33 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Ben Peart, Jonathan Tan,
	Nguyen Thai Ngoc Duy, Mike Hommey, Lars Schneider, Eric Wong,
	Christian Couder, Jeff Hostetler

to prepare for having a long running odb helper sub-process
handling the communication between Git and an external odb.

We introduce "odb.<name>.subprocesscommand" to make it
possible to define such a sub-process, and we mark such odb
helpers with the new 'script_mode' field set to 0.

Helpers defined using the existing "odb.<name>.scriptcommand"
are marked with the 'script_mode' field set to 1.

Implementation of the different capabilities/instructions in
the new (sub-)process mode is left for following commits.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
 external-odb.c |  4 ++++
 odb-helper.c   | 19 ++++++++++++++-----
 odb-helper.h   |  2 +-
 3 files changed, 19 insertions(+), 6 deletions(-)

diff --git a/external-odb.c b/external-odb.c
index 3ce3d111f3..f38c2c2fe3 100644
--- a/external-odb.c
+++ b/external-odb.c
@@ -43,6 +43,10 @@ static int external_odb_config(const char *var, const char *value, void *data)
 		o->type = ODB_HELPER_SCRIPT_CMD;
 		return git_config_string(&o->dealer, var, value);
 	}
+	if (!strcmp(subkey, "subprocesscommand")) {
+		o->type = ODB_HELPER_SUBPROCESS_CMD;
+		return git_config_string(&o->dealer, var, value);
+	}
 
 	return 0;
 }
diff --git a/odb-helper.c b/odb-helper.c
index 0fa7af0348..91b4de1a05 100644
--- a/odb-helper.c
+++ b/odb-helper.c
@@ -124,6 +124,9 @@ int odb_helper_init(struct odb_helper *o)
 	FILE *fh;
 	struct strbuf line = STRBUF_INIT;
 
+	if (o->type != ODB_HELPER_SCRIPT_CMD)
+		return 0;
+
 	if (odb_helper_start(o, &cmd, 0, "init") < 0)
 		return -1;
 
@@ -174,16 +177,12 @@ static int odb_helper_object_cmp(const void *va, const void *vb)
 	return hashcmp(a->sha1, b->sha1);
 }
 
-static void odb_helper_load_have(struct odb_helper *o)
+static void have_object_script(struct odb_helper *o)
 {
 	struct odb_helper_cmd cmd;
 	FILE *fh;
 	struct strbuf line = STRBUF_INIT;
 
-	if (o->have_valid)
-		return;
-	o->have_valid = 1;
-
 	if (odb_helper_start(o, &cmd, 0, "have") < 0)
 		return;
 
@@ -195,6 +194,16 @@ static void odb_helper_load_have(struct odb_helper *o)
 	strbuf_release(&line);
 	fclose(fh);
 	odb_helper_finish(o, &cmd);
+}
+
+static void odb_helper_load_have(struct odb_helper *o)
+{
+	if (o->have_valid)
+		return;
+	o->have_valid = 1;
+
+	if (o->type == ODB_HELPER_SCRIPT_CMD)
+		have_object_script(o);
 
 	qsort(o->have, o->have_nr, sizeof(*o->have), odb_helper_object_cmp);
 }
diff --git a/odb-helper.h b/odb-helper.h
index 4a9cc7f07b..9a33adbf0b 100644
--- a/odb-helper.h
+++ b/odb-helper.h
@@ -7,7 +7,7 @@ enum odb_helper_type {
 	ODB_HELPER_NONE = 0,
 	ODB_HELPER_GIT_REMOTE,
 	ODB_HELPER_SCRIPT_CMD,
-	ODB_HELPER_PROCESS_CMD,
+	ODB_HELPER_SUBPROCESS_CMD,
 	OBJ_HELPER_MAX
 };
 
-- 
2.16.0.rc0.16.g82191dbc6c.dirty


^ permalink raw reply	[flat|nested] 51+ messages in thread

* [PATCH 32/40] odb-helper: add init_object_process()
  2018-01-03 16:33 [PATCH 00/40] Promisor remotes and external ODB support Christian Couder
                   ` (30 preceding siblings ...)
  2018-01-03 16:33 ` [PATCH 31/40] odb-helper: add 'script_mode' to 'struct odb_helper' Christian Couder
@ 2018-01-03 16:33 ` Christian Couder
  2018-01-03 16:33 ` [PATCH 33/40] Add t0450 to test 'get_direct' mechanism Christian Couder
                   ` (7 subsequent siblings)
  39 siblings, 0 replies; 51+ messages in thread
From: Christian Couder @ 2018-01-03 16:33 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Ben Peart, Jonathan Tan,
	Nguyen Thai Ngoc Duy, Mike Hommey, Lars Schneider, Eric Wong,
	Christian Couder, Jeff Hostetler

This adds the infrastructure to launch and use long running
sub-processes as external odb helpers.

For now only the 'init' and 'get_direct' capabilities are
supported with sub-processes.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
 external-odb.c |  46 +++---
 odb-helper.c   | 474 ++++++++++++++++++++++++++++++++++++++++++++++++++++++---
 sha1_file.c    |  12 +-
 3 files changed, 491 insertions(+), 41 deletions(-)

diff --git a/external-odb.c b/external-odb.c
index f38c2c2fe3..50c1cec50b 100644
--- a/external-odb.c
+++ b/external-odb.c
@@ -81,29 +81,11 @@ const char *external_odb_root(void)
 	return root;
 }
 
-int external_odb_has_object(const unsigned char *sha1)
-{
-	struct odb_helper *o;
-
-	external_odb_init();
-
-	for (o = helpers; o; o = o->next) {
-		if (!(o->supported_capabilities & ODB_HELPER_CAP_HAVE))
-			return 1;
-		if (odb_helper_has_object(o, sha1))
-			return 1;
-	}
-	return 0;
-}
-
-int external_odb_get_object(const unsigned char *sha1)
+static int external_odb_do_get_object(const unsigned char *sha1)
 {
 	struct odb_helper *o;
 	const char *path;
 
-	if (!external_odb_has_object(sha1))
-		return -1;
-
 	path = sha1_file_name_alt(external_odb_root(), sha1);
 	safe_create_leading_directories_const(path);
 	prepare_external_alt_odb();
@@ -157,6 +139,32 @@ int external_odb_get_direct(const unsigned char *sha1)
 	return -1;
 }
 
+int external_odb_has_object(const unsigned char *sha1)
+{
+	struct odb_helper *o;
+
+	external_odb_init();
+
+	for (o = helpers; o; o = o->next) {
+		if (!(o->supported_capabilities & ODB_HELPER_CAP_HAVE)) {
+			if (o->supported_capabilities & ODB_HELPER_CAP_GET_DIRECT)
+				return 1;
+			return !external_odb_do_get_object(sha1);
+		}
+		if (odb_helper_has_object(o, sha1))
+			return 1;
+	}
+	return 0;
+}
+
+int external_odb_get_object(const unsigned char *sha1)
+{
+	if (!external_odb_has_object(sha1))
+		return -1;
+
+	return external_odb_do_get_object(sha1);
+}
+
 int external_odb_put_object(const void *buf, size_t len,
 			    const char *type, unsigned char *sha1)
 {
diff --git a/odb-helper.c b/odb-helper.c
index 91b4de1a05..a67dfddca0 100644
--- a/odb-helper.c
+++ b/odb-helper.c
@@ -5,6 +5,22 @@
 #include "run-command.h"
 #include "sha1-lookup.h"
 #include "fetch-object.h"
+#include "sub-process.h"
+#include "pkt-line.h"
+#include "sigchain.h"
+
+struct object_process {
+	struct subprocess_entry subprocess;
+	unsigned int supported_capabilities;
+};
+
+static struct hashmap subprocess_map;
+
+static int check_object_process_status(int fd, struct strbuf *status)
+{
+	subprocess_read_status(fd, status);
+	return strcmp(status->buf, "success");
+}
 
 static void parse_capabilities(char *cap_buf,
 			       unsigned int *supported_capabilities,
@@ -40,6 +56,384 @@ static void parse_capabilities(char *cap_buf,
 	string_list_clear(&cap_list, 0);
 }
 
+static int start_object_process_fn(struct subprocess_entry *subprocess)
+{
+	static int versions[] = {1, 0};
+	static struct subprocess_capability capabilities[] = {
+		{ "get_git_obj", ODB_HELPER_CAP_GET_GIT_OBJ },
+		{ "get_raw_obj", ODB_HELPER_CAP_GET_RAW_OBJ },
+		{ "get_direct",  ODB_HELPER_CAP_GET_DIRECT  },
+		{ "put_git_obj", ODB_HELPER_CAP_PUT_GIT_OBJ },
+		{ "put_raw_obj", ODB_HELPER_CAP_PUT_RAW_OBJ },
+		{ "put_direct",  ODB_HELPER_CAP_PUT_DIRECT  },
+		{ "have",        ODB_HELPER_CAP_HAVE },
+		{ NULL, 0 }
+	};
+	struct object_process *entry = (struct object_process *)subprocess;
+	return subprocess_handshake(subprocess, "git-read-object", versions, NULL,
+				    capabilities,
+				    &entry->supported_capabilities);
+}
+
+static struct object_process *launch_object_process(struct odb_helper *o,
+						    unsigned int capability)
+{
+	struct object_process *entry = NULL;
+
+	if (!subprocess_map.tablesize)
+		hashmap_init(&subprocess_map, (hashmap_cmp_fn) cmd2process_cmp, NULL, 0);
+	else
+		entry = (struct object_process *)subprocess_find_entry(&subprocess_map, o->dealer);
+
+	fflush(NULL);
+
+	if (!entry) {
+		entry = xmalloc(sizeof(*entry));
+		entry->supported_capabilities = 0;
+
+		if (subprocess_start(&subprocess_map, &entry->subprocess, o->dealer, start_object_process_fn)) {
+			error("Could not launch process for cmd '%s'", o->dealer);
+			free(entry);
+			return NULL;
+		}
+	}
+
+	o->supported_capabilities = entry->supported_capabilities;
+
+	if (capability && !(capability & entry->supported_capabilities)) {
+		error("The cmd '%s' does not support capability '%d'", o->dealer, capability);
+		return NULL;
+	}
+
+	sigchain_push(SIGPIPE, SIG_IGN);
+
+	return entry;
+}
+
+static int check_object_process_error(int err,
+				      const char *status,
+				      struct object_process *entry,
+				      const char *cmd,
+				      unsigned int capability)
+{
+	sigchain_pop(SIGPIPE);
+
+	if (!err)
+		return 0;
+
+	if (!strcmp(status, "error")) {
+		/* The process signaled a problem with the file. */
+	} else if (!strcmp(status, "notfound")) {
+		/* Object was not found */
+		err = -1;
+	} else if (!strcmp(status, "abort")) {
+		/*
+		 * The process signaled a permanent problem. Don't try to read
+		 * objects with the same command for the lifetime of the current
+		 * Git process.
+		 */
+		if (capability)
+			entry->supported_capabilities &= ~capability;
+	} else {
+		/*
+		 * Something went wrong with the read-object process.
+		 * Force shutdown and restart if needed.
+		 */
+		error("external object process '%s' failed", cmd);
+		subprocess_stop(&subprocess_map, &entry->subprocess);
+		free(entry);
+	}
+
+	return err;
+}
+
+static int send_init_packets(struct object_process *entry,
+			     struct strbuf *status)
+{
+	struct child_process *process = &entry->subprocess.process;
+
+	return packet_write_fmt_gently(process->in, "command=init\n") ||
+		packet_flush_gently(process->in) ||
+		check_object_process_status(process->out, status);
+}
+
+static int init_object_process(struct odb_helper *o)
+{
+	int err;
+	struct strbuf status = STRBUF_INIT;
+	struct object_process *entry = launch_object_process(o, 0);
+	if (!entry)
+		return -1;
+
+	err = send_init_packets(entry, &status);
+
+	return check_object_process_error(err, status.buf, entry,
+					  o->dealer, 0);
+}
+
+static ssize_t read_packetized_raw_object_to_fd(struct odb_helper *o,
+						const unsigned char *sha1,
+						int fd_in, int fd_out)
+{
+	ssize_t total_read = 0;
+	unsigned long total_got = 0;
+	int packet_len;
+
+	char hdr[32];
+	int hdrlen;
+
+	int ret = Z_STREAM_END;
+	unsigned char compressed[4096];
+	git_zstream stream;
+	git_SHA_CTX hash;
+	unsigned char real_sha1[20];
+
+	off_t size;
+	enum object_type type;
+	const char *s;
+	int pkt_size;
+	char *size_buf;
+
+	size_buf = packet_read_line(fd_in, &pkt_size);
+	if (!skip_prefix(size_buf, "size=", &s))
+		return error("odb helper '%s' did not send size of plain object", o->name);
+	size = strtoumax(s, NULL, 10);
+	if (!skip_prefix(packet_read_line(fd_in, NULL), "kind=", &s))
+		return error("odb helper '%s' did not send kind of plain object", o->name);
+	/* Check if the object is not available */
+	if (!strcmp(s, "none"))
+		return -1;
+	type = type_from_string_gently(s, strlen(s), 1);
+	if (type < 0)
+		return error("odb helper '%s' sent bad type '%s'", o->name, s);
+
+	/* Set it up */
+	git_deflate_init(&stream, zlib_compression_level);
+	stream.next_out = compressed;
+	stream.avail_out = sizeof(compressed);
+	git_SHA1_Init(&hash);
+
+	/* First header.. */
+	hdrlen = xsnprintf(hdr, sizeof(hdr), "%s %lu", typename(type), size) + 1;
+	stream.next_in = (unsigned char *)hdr;
+	stream.avail_in = hdrlen;
+	while (git_deflate(&stream, 0) == Z_OK)
+		; /* nothing */
+	git_SHA1_Update(&hash, hdr, hdrlen);
+
+	for (;;) {
+		/* packet_read() writes a '\0' extra byte at the end */
+		char buf[LARGE_PACKET_DATA_MAX + 1];
+
+		packet_len = packet_read(fd_in, NULL, NULL,
+			buf, LARGE_PACKET_DATA_MAX + 1,
+			PACKET_READ_GENTLE_ON_EOF);
+
+		if (packet_len <= 0)
+			break;
+
+		total_got += packet_len;
+
+		/* Then the data itself.. */
+		stream.next_in = (void *)buf;
+		stream.avail_in = packet_len;
+		do {
+			unsigned char *in0 = stream.next_in;
+			ret = git_deflate(&stream, Z_FINISH);
+			git_SHA1_Update(&hash, in0, stream.next_in - in0);
+			write_or_die(fd_out, compressed, stream.next_out - compressed);
+			stream.next_out = compressed;
+			stream.avail_out = sizeof(compressed);
+		} while (ret == Z_OK);
+
+		total_read += packet_len;
+	}
+
+	if (packet_len < 0) {
+		error("unable to read from odb helper '%s': %s",
+		      o->name, strerror(errno));
+		git_deflate_end(&stream);
+		return packet_len;
+	}
+
+	if (ret != Z_STREAM_END) {
+		warning("bad zlib data from odb helper '%s' for %s",
+			o->name, sha1_to_hex(sha1));
+		return -1;
+	}
+
+	ret = git_deflate_end_gently(&stream);
+	if (ret != Z_OK) {
+		warning("deflateEnd on object %s from odb helper '%s' failed (%d)",
+			sha1_to_hex(sha1), o->name, ret);
+		return -1;
+	}
+	git_SHA1_Final(real_sha1, &hash);
+	if (hashcmp(sha1, real_sha1)) {
+		warning("sha1 mismatch from odb helper '%s' for %s (got %s)",
+			o->name, sha1_to_hex(sha1), sha1_to_hex(real_sha1));
+		return -1;
+	}
+	if (total_got != size) {
+		warning("size mismatch from odb helper '%s' for %s (%lu != %lu)",
+			o->name, sha1_to_hex(sha1), total_got, size);
+		return -1;
+	}
+
+	return total_read;
+}
+
+static ssize_t read_packetized_git_object_to_fd(struct odb_helper *o,
+						const unsigned char *sha1,
+						int fd_in, int fd_out)
+{
+	ssize_t total_read = 0;
+	unsigned long total_got = 0;
+	int packet_len;
+	git_zstream stream;
+	int zret = Z_STREAM_END;
+	git_SHA_CTX hash;
+	unsigned char real_sha1[20];
+	struct strbuf header = STRBUF_INIT;
+	unsigned long hdr_size;
+
+	memset(&stream, 0, sizeof(stream));
+	git_inflate_init(&stream);
+	git_SHA1_Init(&hash);
+
+	for (;;) {
+		/* packet_read() writes a '\0' extra byte at the end */
+		char buf[LARGE_PACKET_DATA_MAX + 1];
+
+		packet_len = packet_read(fd_in, NULL, NULL,
+			buf, LARGE_PACKET_DATA_MAX + 1,
+			PACKET_READ_GENTLE_ON_EOF);
+
+		if (packet_len <= 0)
+			break;
+
+		write_or_die(fd_out, buf, packet_len);
+
+		stream.next_in = (unsigned char *)buf;
+		stream.avail_in = packet_len;
+		do {
+			unsigned char inflated[4096];
+			unsigned long got;
+
+			stream.next_out = inflated;
+			stream.avail_out = sizeof(inflated);
+			zret = git_inflate(&stream, Z_SYNC_FLUSH);
+			got = sizeof(inflated) - stream.avail_out;
+
+			git_SHA1_Update(&hash, inflated, got);
+			/* skip header when counting size */
+			if (!total_got) {
+				const unsigned char *p = memchr(inflated, '\0', got);
+				if (p) {
+					unsigned long hdr_last = p - inflated + 1;
+					strbuf_add(&header, inflated, hdr_last);
+					got -= hdr_last;
+				} else {
+					strbuf_add(&header, inflated, got);
+					got = 0;
+				}
+			}
+			total_got += got;
+		} while (stream.avail_in && zret == Z_OK);
+
+		total_read += packet_len;
+	}
+
+	git_inflate_end(&stream);
+
+	if (packet_len < 0)
+		return packet_len;
+
+	git_SHA1_Final(real_sha1, &hash);
+
+	if (zret != Z_STREAM_END) {
+		warning("bad zlib data from odb helper '%s' for %s",
+			o->name, sha1_to_hex(sha1));
+		return -1;
+	}
+	if (hashcmp(real_sha1, sha1)) {
+		warning("sha1 mismatch from odb helper '%s' for %s (got %s)",
+			o->name, sha1_to_hex(sha1), sha1_to_hex(real_sha1));
+		return -1;
+	}
+	if (parse_sha1_header(header.buf, &hdr_size) < 0) {
+		warning("could not parse header from odb helper '%s' for %s",
+			o->name, sha1_to_hex(sha1));
+		return -1;
+	}
+	if (total_got != hdr_size) {
+		warning("size mismatch from odb helper '%s' for %s (%lu != %lu)",
+			o->name, sha1_to_hex(sha1), total_got, hdr_size);
+		return -1;
+	}
+
+	return total_read;
+}
+
+static int send_get_packets(struct odb_helper *o,
+			    struct object_process *entry,
+			    const unsigned char *sha1,
+			    int fd,
+			    unsigned int *cur_cap,
+			    struct strbuf *status)
+{
+	const char *instruction;
+	int err;
+	struct child_process *process = &entry->subprocess.process;
+
+	if (entry->supported_capabilities & ODB_HELPER_CAP_GET_GIT_OBJ) {
+		*cur_cap = ODB_HELPER_CAP_GET_GIT_OBJ;
+		instruction = "get_git_obj";
+	} else if (entry->supported_capabilities & ODB_HELPER_CAP_GET_RAW_OBJ) {
+		*cur_cap = ODB_HELPER_CAP_GET_RAW_OBJ;
+		instruction = "get_raw_obj";
+	} else if (entry->supported_capabilities & ODB_HELPER_CAP_GET_DIRECT) {
+		*cur_cap = ODB_HELPER_CAP_GET_DIRECT;
+		instruction = "get_direct";
+	} else {
+		BUG("No known ODB_HELPER_CAP_GET_XXX capability!");
+	}
+
+	err = packet_write_fmt_gently(process->in, "command=%s\n", instruction);
+	if (err)
+		return err;
+
+	err = packet_write_fmt_gently(process->in, "sha1=%s\n", sha1_to_hex(sha1));
+	if (err)
+		return err;
+
+	err = packet_flush_gently(process->in);
+	if (err)
+		return err;
+
+	if (entry->supported_capabilities & ODB_HELPER_CAP_GET_RAW_OBJ)
+		err = read_packetized_raw_object_to_fd(o, sha1, process->out, fd) < 0;
+	else if (entry->supported_capabilities & ODB_HELPER_CAP_GET_GIT_OBJ)
+		err = read_packetized_git_object_to_fd(o, sha1, process->out, fd) < 0;
+
+	return check_object_process_status(process->out, status);
+}
+
+static int get_object_process(struct odb_helper *o, const unsigned char *sha1, int fd)
+{
+	int err;
+	struct strbuf status = STRBUF_INIT;
+	unsigned int cur_cap = 0;
+	struct object_process *entry = launch_object_process(o, 0);
+	if (!entry)
+		return -1;
+
+	err = send_get_packets(o, entry, sha1, fd, &cur_cap, &status);
+
+	return check_object_process_error(err, status.buf, entry,
+					  o->dealer, cur_cap);
+}
+
 struct odb_helper *odb_helper_new(const char *name, int namelen)
 {
 	struct odb_helper *o;
@@ -118,15 +512,12 @@ static int odb_helper_finish(struct odb_helper *o,
 	return 0;
 }
 
-int odb_helper_init(struct odb_helper *o)
+static int init_object_script(struct odb_helper *o)
 {
 	struct odb_helper_cmd cmd;
 	FILE *fh;
 	struct strbuf line = STRBUF_INIT;
 
-	if (o->type != ODB_HELPER_SCRIPT_CMD)
-		return 0;
-
 	if (odb_helper_start(o, &cmd, 0, "init") < 0)
 		return -1;
 
@@ -141,6 +532,21 @@ int odb_helper_init(struct odb_helper *o)
 	return 0;
 }
 
+int odb_helper_init(struct odb_helper *o)
+{
+	int res = 0;
+	uint64_t start = getnanotime();
+
+	if (o->type == ODB_HELPER_SCRIPT_CMD)
+		res = init_object_script(o);
+	else if (o->type == ODB_HELPER_SUBPROCESS_CMD)
+		res = init_object_process(o);
+
+	trace_performance_since(start, "odb_helper_init");
+
+	return res;
+}
+
 static int parse_object_line(struct odb_helper_object *o, const char *line)
 {
 	char *end;
@@ -435,36 +841,45 @@ static int odb_helper_get_git_object(struct odb_helper *o,
 	return 0;
 }
 
+static int get_direct_script(struct odb_helper *o, const unsigned char *sha1)
+{
+	struct odb_helper_cmd cmd;
+
+	if (odb_helper_start(o, &cmd, 0, "get_direct %s", sha1_to_hex(sha1)) < 0)
+		return -1;
+	if (odb_helper_finish(o, &cmd))
+		return -1;
+	return 0;
+}
+
 int odb_helper_get_direct(struct odb_helper *o,
 			  const unsigned char *sha1)
 {
-	uint64_t start = getnanotime();
-
-	if (o->type == ODB_HELPER_GIT_REMOTE) {
-		fetch_object(o->dealer, sha1);
-	} else {
-		struct odb_helper_object *obj;
-		struct odb_helper_cmd cmd;
+	int res = 0;
+	uint64_t start;
 
-		obj = odb_helper_lookup(o, sha1);
+	if (o->supported_capabilities & ODB_HELPER_CAP_HAVE &&
+	    o->type != ODB_HELPER_GIT_REMOTE) {
+		struct odb_helper_object *obj = odb_helper_lookup(o, sha1);
 		if (!obj)
 			return -1;
+	}
 
-		if (odb_helper_start(o, &cmd, 0, "get_direct %s", sha1_to_hex(sha1)) < 0)
-			return -1;
+	start = getnanotime();
 
-		if (odb_helper_finish(o, &cmd))
-			return -1;
-	}
+	if (o->type == ODB_HELPER_GIT_REMOTE)
+		fetch_object(o->dealer, sha1);
+	else if (o->type == ODB_HELPER_SCRIPT_CMD)
+		res = get_direct_script(o, sha1);
+	else
+		res = get_object_process(o, sha1, -1);
 
 	trace_performance_since(start, "odb_helper_get_direct");
 
-	return 0;
+	return res;
 }
 
-int odb_helper_get_object(struct odb_helper *o,
-			  const unsigned char *sha1,
-			  int fd)
+static int get_object_script(struct odb_helper *o, const unsigned char *sha1, int fd)
 {
 	if (o->supported_capabilities & ODB_HELPER_CAP_GET_GIT_OBJ)
 		return odb_helper_get_git_object(o, sha1, fd);
@@ -476,6 +891,23 @@ int odb_helper_get_object(struct odb_helper *o,
 	BUG("invalid get capability (capabilities: '%d')", o->supported_capabilities);
 }
 
+int odb_helper_get_object(struct odb_helper *o,
+			  const unsigned char *sha1,
+			  int fd)
+{
+	int res = 0;
+	uint64_t start = getnanotime();
+
+	if (o->type == ODB_HELPER_SCRIPT_CMD)
+		res = get_object_script(o, sha1, fd);
+	else if (o->type == ODB_HELPER_SUBPROCESS_CMD)
+		res = get_object_process(o, sha1, fd);
+
+	trace_performance_since(start, "odb_helper_get_object");
+
+	return res;
+}
+
 int odb_helper_put_object(struct odb_helper *o,
 			  const void *buf, size_t len,
 			  const char *type, unsigned char *sha1)
diff --git a/sha1_file.c b/sha1_file.c
index 8c873554b2..300029459f 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -735,7 +735,17 @@ int check_and_freshen_file(const char *fn, int freshen)
 
 static int check_and_freshen_local(const unsigned char *sha1, int freshen)
 {
-	return check_and_freshen_file(sha1_file_name(sha1), freshen);
+	int ret;
+	int tried_hook = 0;
+
+retry:
+	ret = check_and_freshen_file(sha1_file_name(sha1), freshen);
+	if (!ret && !tried_hook) {
+		tried_hook = 1;
+		if (!external_odb_get_direct(sha1))
+			goto retry;
+	}
+	return ret;
 }
 
 static int check_and_freshen_nonlocal(const unsigned char *sha1, int freshen)
-- 
2.16.0.rc0.16.g82191dbc6c.dirty


^ permalink raw reply	[flat|nested] 51+ messages in thread

* [PATCH 33/40] Add t0450 to test 'get_direct' mechanism
  2018-01-03 16:33 [PATCH 00/40] Promisor remotes and external ODB support Christian Couder
                   ` (31 preceding siblings ...)
  2018-01-03 16:33 ` [PATCH 32/40] odb-helper: add init_object_process() Christian Couder
@ 2018-01-03 16:33 ` Christian Couder
  2018-01-03 16:33 ` [PATCH 34/40] Add t0460 to test passing git objects Christian Couder
                   ` (6 subsequent siblings)
  39 siblings, 0 replies; 51+ messages in thread
From: Christian Couder @ 2018-01-03 16:33 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Ben Peart, Jonathan Tan,
	Nguyen Thai Ngoc Duy, Mike Hommey, Lars Schneider, Eric Wong,
	Christian Couder, Jeff Hostetler, Ben Peart

From: Ben Peart <benpeart@microsoft.com>

Signed-off-by: Ben Peart <benpeart@microsoft.com>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
 t/t0450-read-object.sh | 28 +++++++++++++++++++++
 t/t0450/read-object    | 68 ++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 96 insertions(+)
 create mode 100755 t/t0450-read-object.sh
 create mode 100755 t/t0450/read-object

diff --git a/t/t0450-read-object.sh b/t/t0450-read-object.sh
new file mode 100755
index 0000000000..6b97305452
--- /dev/null
+++ b/t/t0450-read-object.sh
@@ -0,0 +1,28 @@
+#!/bin/sh
+
+test_description='tests for long running read-object process'
+
+. ./test-lib.sh
+
+PATH="$PATH:$TEST_DIRECTORY/t0450"
+
+test_expect_success 'setup host repo with a root commit' '
+	test_commit zero &&
+	hash1=$(git ls-tree HEAD | grep zero.t | cut -f1 | cut -d\  -f3)
+'
+
+HELPER="read-object"
+
+test_expect_success 'blobs can be retrieved from the host repo' '
+	git init guest-repo &&
+	(cd guest-repo &&
+	 git config odb.magic.subprocessCommand "$HELPER" &&
+	 git cat-file blob "$hash1" >/dev/null)
+'
+
+test_expect_success 'invalid blobs generate errors' '
+	cd guest-repo &&
+	test_must_fail git cat-file blob "invalid"
+'
+
+test_done
diff --git a/t/t0450/read-object b/t/t0450/read-object
new file mode 100755
index 0000000000..004e9368c9
--- /dev/null
+++ b/t/t0450/read-object
@@ -0,0 +1,68 @@
+#!/usr/bin/perl
+#
+# Example implementation for the Git read-object protocol version 1
+# See Documentation/technical/read-object-protocol.txt
+#
+# Allows you to test the ability for blobs to be pulled from a host git repo
+# "on demand."  Called when git needs a blob it couldn't find locally due to
+# a lazy clone that only cloned the commits and trees.
+#
+# A lazy clone can be simulated via the following commands from the host repo
+# you wish to create a lazy clone of:
+#
+# cd /host_repo
+# git rev-parse HEAD
+# git init /guest_repo
+# git cat-file --batch-check --batch-all-objects | grep -v 'blob' |
+#	cut -d' ' -f1 | git pack-objects /e/guest_repo/.git/objects/pack/noblobs
+# cd /guest_repo
+# git config core.virtualizeobjects true
+# git reset --hard <sha from rev-parse call above>
+#
+# Please note, this sample is a minimal skeleton. No proper error handling 
+# was implemented.
+#
+
+use 5.008;
+use lib (split(/:/, $ENV{GITPERLLIB}));
+use strict;
+use warnings;
+use Git::Packet;
+
+#
+# Point $DIR to the folder where your host git repo is located so we can pull
+# missing objects from it
+#
+my $DIR = "../.git/";
+
+packet_initialize("git-read-object", 1);
+
+my %remote_caps = packet_read_and_check_capabilities("get_direct");
+packet_check_and_write_capabilities(\%remote_caps, "get_direct");
+
+while (1) {
+	my ($res, $command) = packet_txt_read();
+
+	if ( $res == -1 ) {
+		exit 0;
+	}
+
+	$command =~ s/^command=//;
+
+	if ( $command eq "init" ) {
+		packet_bin_read();
+
+		packet_txt_write("status=success");
+		packet_flush();
+	} elsif ( $command eq "get_direct" ) {
+		my ($sha1) = packet_txt_read() =~ /^sha1=([0-9a-f]{40})$/;
+		packet_bin_read();
+
+		system ('git --git-dir="' . $DIR . '" cat-file blob ' . $sha1 . ' | GIT_NO_EXTERNAL_ODB=1 git hash-object -w --stdin >/dev/null 2>&1');
+
+		packet_txt_write(($?) ? "status=error" : "status=success");
+		packet_flush();
+	} else {
+		die "bad command '$command'";
+	}
+}
-- 
2.16.0.rc0.16.g82191dbc6c.dirty


^ permalink raw reply	[flat|nested] 51+ messages in thread

* [PATCH 34/40] Add t0460 to test passing git objects
  2018-01-03 16:33 [PATCH 00/40] Promisor remotes and external ODB support Christian Couder
                   ` (32 preceding siblings ...)
  2018-01-03 16:33 ` [PATCH 33/40] Add t0450 to test 'get_direct' mechanism Christian Couder
@ 2018-01-03 16:33 ` Christian Couder
  2018-01-03 16:33 ` [PATCH 35/40] odb-helper: add put_object_process() Christian Couder
                   ` (5 subsequent siblings)
  39 siblings, 0 replies; 51+ messages in thread
From: Christian Couder @ 2018-01-03 16:33 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Ben Peart, Jonathan Tan,
	Nguyen Thai Ngoc Duy, Mike Hommey, Lars Schneider, Eric Wong,
	Christian Couder, Jeff Hostetler

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
 t/t0460-read-object-git.sh | 28 +++++++++++++++++
 t/t0460/read-object-git    | 78 ++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 106 insertions(+)
 create mode 100755 t/t0460-read-object-git.sh
 create mode 100755 t/t0460/read-object-git

diff --git a/t/t0460-read-object-git.sh b/t/t0460-read-object-git.sh
new file mode 100755
index 0000000000..2873b445f3
--- /dev/null
+++ b/t/t0460-read-object-git.sh
@@ -0,0 +1,28 @@
+#!/bin/sh
+
+test_description='tests for long running read-object process passing git objects'
+
+. ./test-lib.sh
+
+PATH="$PATH:$TEST_DIRECTORY/t0460"
+
+test_expect_success 'setup host repo with a root commit' '
+	test_commit zero &&
+	hash1=$(git ls-tree HEAD | grep zero.t | cut -f1 | cut -d\  -f3)
+'
+
+HELPER="read-object-git"
+
+test_expect_success 'blobs can be retrieved from the host repo' '
+	git init guest-repo &&
+	(cd guest-repo &&
+	 git config odb.magic.subprocessCommand "$HELPER" &&
+	 git cat-file blob "$hash1" >/dev/null)
+'
+
+test_expect_success 'invalid blobs generate errors' '
+	cd guest-repo &&
+	test_must_fail git cat-file blob "invalid"
+'
+
+test_done
diff --git a/t/t0460/read-object-git b/t/t0460/read-object-git
new file mode 100755
index 0000000000..4b3ca0948b
--- /dev/null
+++ b/t/t0460/read-object-git
@@ -0,0 +1,78 @@
+#!/usr/bin/perl
+#
+# Example implementation for the Git read-object protocol version 1
+# See Documentation/technical/read-object-protocol.txt
+#
+# Allows you to test the ability for blobs to be pulled from a host git repo
+# "on demand."  Called when git needs a blob it couldn't find locally due to
+# a lazy clone that only cloned the commits and trees.
+#
+# A lazy clone can be simulated via the following commands from the host repo
+# you wish to create a lazy clone of:
+#
+# cd /host_repo
+# git rev-parse HEAD
+# git init /guest_repo
+# git cat-file --batch-check --batch-all-objects | grep -v 'blob' |
+#	cut -d' ' -f1 | git pack-objects /e/guest_repo/.git/objects/pack/noblobs
+# cd /guest_repo
+# git config core.virtualizeobjects true
+# git reset --hard <sha from rev-parse call above>
+#
+# Please note, this sample is a minimal skeleton. No proper error handling 
+# was implemented.
+#
+
+use 5.008;
+use lib (split(/:/, $ENV{GITPERLLIB}));
+use strict;
+use warnings;
+use Git::Packet;
+
+#
+# Point $DIR to the folder where your host git repo is located so we can pull
+# missing objects from it
+#
+my $DIR = "../.git/";
+
+packet_initialize("git-read-object", 1);
+
+my %remote_caps = packet_read_and_check_capabilities("get_git_obj");
+packet_check_and_write_capabilities(\%remote_caps, "get_git_obj");
+
+while (1) {
+	my ($res, $command) = packet_txt_read();
+
+	if ( $res == -1 ) {
+		exit 0;
+	}
+
+	$command =~ s/^command=//;
+
+	if ( $command eq "init" ) {
+		packet_bin_read();
+
+		packet_txt_write("status=success");
+		packet_flush();
+	} elsif ( $command eq "get_git_obj" ) {
+		my ($sha1) = packet_txt_read() =~ /^sha1=([0-9a-f]{40})$/;
+		packet_bin_read();
+
+		my $path = $sha1;
+		$path =~ s{..}{$&/};
+		$path = $DIR . "/objects/" . $path;
+
+		my $contents = do {
+		    local $/;
+		    open my $fh, $path or die "Can't open '$path': $!";
+		    <$fh>
+		};
+
+		packet_bin_write($contents);
+		packet_flush();
+		packet_txt_write("status=success");
+		packet_flush();
+	} else {
+		die "bad command '$command'";
+	}
+}
-- 
2.16.0.rc0.16.g82191dbc6c.dirty


^ permalink raw reply	[flat|nested] 51+ messages in thread

* [PATCH 35/40] odb-helper: add put_object_process()
  2018-01-03 16:33 [PATCH 00/40] Promisor remotes and external ODB support Christian Couder
                   ` (33 preceding siblings ...)
  2018-01-03 16:33 ` [PATCH 34/40] Add t0460 to test passing git objects Christian Couder
@ 2018-01-03 16:33 ` Christian Couder
  2018-01-03 16:33 ` [PATCH 36/40] Add t0470 to test passing raw objects Christian Couder
                   ` (4 subsequent siblings)
  39 siblings, 0 replies; 51+ messages in thread
From: Christian Couder @ 2018-01-03 16:33 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Ben Peart, Jonathan Tan,
	Nguyen Thai Ngoc Duy, Mike Hommey, Lars Schneider, Eric Wong,
	Christian Couder, Jeff Hostetler

This adds the infrastructure to send objects to a sub-process
handling the communication with an external odb.

For now we only handle sending raw blobs using the 'put_raw_obj'
instruction.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
 odb-helper.c | 75 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 72 insertions(+), 3 deletions(-)

diff --git a/odb-helper.c b/odb-helper.c
index a67dfddca0..d901f6d0bc 100644
--- a/odb-helper.c
+++ b/odb-helper.c
@@ -434,6 +434,58 @@ static int get_object_process(struct odb_helper *o, const unsigned char *sha1, i
 					  o->dealer, cur_cap);
 }
 
+static int send_put_packets(struct object_process *entry,
+			    const unsigned char *sha1,
+			    const void *buf,
+			    size_t len,
+			    struct strbuf *status)
+{
+	struct child_process *process = &entry->subprocess.process;
+	int err = packet_write_fmt_gently(process->in, "command=put_raw_obj\n");
+	if (err)
+		return err;
+
+	err = packet_write_fmt_gently(process->in, "sha1=%s\n", sha1_to_hex(sha1));
+	if (err)
+		return err;
+
+	err = packet_write_fmt_gently(process->in, "size=%"PRIuMAX"\n", len);
+	if (err)
+		return err;
+
+	err = packet_write_fmt_gently(process->in, "kind=blob\n");
+	if (err)
+		return err;
+
+	err = packet_flush_gently(process->in);
+	if (err)
+		return err;
+
+	err = write_packetized_from_buf(buf, len, process->in);
+	if (err)
+		return err;
+
+	return check_object_process_status(process->out, status);
+}
+
+static int put_object_process(struct odb_helper *o,
+			      const void *buf, size_t len,
+			      const char *type, unsigned char *sha1)
+{
+	int err;
+	struct object_process *entry;
+	struct strbuf status = STRBUF_INIT;
+
+	entry = launch_object_process(o, ODB_HELPER_CAP_PUT_RAW_OBJ);
+	if (!entry)
+		return -1;
+
+	err = send_put_packets(entry, sha1, buf, len, &status);
+
+	return check_object_process_error(err, status.buf, entry, o->dealer,
+					  ODB_HELPER_CAP_PUT_RAW_OBJ);
+}
+
 struct odb_helper *odb_helper_new(const char *name, int namelen)
 {
 	struct odb_helper *o;
@@ -908,9 +960,9 @@ int odb_helper_get_object(struct odb_helper *o,
 	return res;
 }
 
-int odb_helper_put_object(struct odb_helper *o,
-			  const void *buf, size_t len,
-			  const char *type, unsigned char *sha1)
+static int put_raw_object_script(struct odb_helper *o,
+				 const void *buf, size_t len,
+				 const char *type, unsigned char *sha1)
 {
 	struct odb_helper_cmd cmd;
 
@@ -936,3 +988,20 @@ int odb_helper_put_object(struct odb_helper *o,
 	odb_helper_finish(o, &cmd);
 	return 0;
 }
+
+int odb_helper_put_object(struct odb_helper *o,
+			  const void *buf, size_t len,
+			  const char *type, unsigned char *sha1)
+{
+	int res;
+	uint64_t start = getnanotime();
+
+	if (o->type == ODB_HELPER_SCRIPT_CMD)
+		res = put_raw_object_script(o, buf, len, type, sha1);
+	else if (o->type == ODB_HELPER_SUBPROCESS_CMD)
+		res = put_object_process(o, buf, len, type, sha1);
+
+	trace_performance_since(start, "odb_helper_put_object");
+
+	return res;
+}
-- 
2.16.0.rc0.16.g82191dbc6c.dirty


^ permalink raw reply	[flat|nested] 51+ messages in thread

* [PATCH 36/40] Add t0470 to test passing raw objects
  2018-01-03 16:33 [PATCH 00/40] Promisor remotes and external ODB support Christian Couder
                   ` (34 preceding siblings ...)
  2018-01-03 16:33 ` [PATCH 35/40] odb-helper: add put_object_process() Christian Couder
@ 2018-01-03 16:33 ` Christian Couder
  2018-01-03 16:34 ` [PATCH 37/40] odb-helper: add have_object_process() Christian Couder
                   ` (3 subsequent siblings)
  39 siblings, 0 replies; 51+ messages in thread
From: Christian Couder @ 2018-01-03 16:33 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Ben Peart, Jonathan Tan,
	Nguyen Thai Ngoc Duy, Mike Hommey, Lars Schneider, Eric Wong,
	Christian Couder, Jeff Hostetler

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
 t/t0470-read-object-http-e-odb.sh | 109 ++++++++++++++++++++++++++++++++++++++
 t/t0470/read-object-plain         |  83 +++++++++++++++++++++++++++++
 2 files changed, 192 insertions(+)
 create mode 100755 t/t0470-read-object-http-e-odb.sh
 create mode 100755 t/t0470/read-object-plain

diff --git a/t/t0470-read-object-http-e-odb.sh b/t/t0470-read-object-http-e-odb.sh
new file mode 100755
index 0000000000..774528c04f
--- /dev/null
+++ b/t/t0470-read-object-http-e-odb.sh
@@ -0,0 +1,109 @@
+#!/bin/sh
+
+test_description='tests for read-object process passing plain objects to an HTTPD server'
+
+. ./test-lib.sh
+
+# If we don't specify a port, the current test number will be used
+# which will not work as it is less than 1024, so it can only be used by root.
+LIB_HTTPD_PORT=$(expr ${this_test#t} + 12000)
+
+. "$TEST_DIRECTORY"/lib-httpd.sh
+
+start_httpd apache-e-odb.conf
+
+PATH="$PATH:$TEST_DIRECTORY/t0470"
+
+# odb helper script must see this
+export HTTPD_URL
+
+HELPER="read-object-plain"
+
+test_expect_success 'setup repo with a root commit' '
+	test_commit zero
+'
+
+test_expect_success 'setup another repo from the first one' '
+	git init other-repo &&
+	(cd other-repo &&
+	 git remote add origin .. &&
+	 git pull origin master &&
+	 git checkout master &&
+	 git log)
+'
+
+test_expect_success 'setup the helper in the root repo' '
+	git config odb.magic.subprocessCommand "$HELPER"
+'
+
+UPLOADFILENAME="hello_apache_upload.txt"
+
+UPLOAD_URL="$HTTPD_URL/upload/?sha1=$UPLOADFILENAME&size=123&type=blob"
+
+test_expect_success 'can upload a file' '
+	echo "Hello Apache World!" >hello_to_send.txt &&
+	echo "How are you?" >>hello_to_send.txt &&
+	curl --data-binary @hello_to_send.txt --include "$UPLOAD_URL" >out_upload
+'
+
+LIST_URL="$HTTPD_URL/list/"
+
+test_expect_success 'can list uploaded files' '
+	curl --include "$LIST_URL" >out_list &&
+	grep "$UPLOADFILENAME" out_list
+'
+
+test_expect_success 'can delete uploaded files' '
+	curl --data "delete" --include "$UPLOAD_URL&delete=1" >out_delete &&
+	curl --include "$LIST_URL" >out_list2 &&
+	! grep "$UPLOADFILENAME" out_list2
+'
+
+FILES_DIR="httpd/www/files"
+
+test_expect_success 'new blobs are transfered to the http server' '
+	test_commit one &&
+	hash1=$(git ls-tree HEAD | grep one.t | cut -f1 | cut -d\  -f3) &&
+	echo "$hash1-4-blob" >expected &&
+	ls "$FILES_DIR" >actual &&
+	test_cmp expected actual
+'
+
+test_expect_success 'blobs can be retrieved from the http server' '
+	git cat-file blob "$hash1" &&
+	git log -p >expected
+'
+
+test_expect_success 'update other repo from the first one' '
+	(cd other-repo &&
+	 git fetch origin "refs/odbs/magic/*:refs/odbs/magic/*" &&
+	 test_must_fail git cat-file blob "$hash1" &&
+	 git config odb.magic.subprocesscommand "$HELPER" &&
+	 git cat-file blob "$hash1" &&
+	 git pull origin master)
+'
+
+test_expect_success 'local clone from the first repo' '
+	mkdir my-clone &&
+	(cd my-clone &&
+	 git clone .. . &&
+	 git cat-file blob "$hash1")
+'
+
+test_expect_success 'no-local clone from the first repo fails' '
+	mkdir my-other-clone &&
+	(cd my-other-clone &&
+	 test_must_fail git clone --no-local .. .) &&
+	rm -rf my-other-clone
+'
+
+test_expect_success 'no-local clone from the first repo with helper succeeds' '
+	mkdir my-other-clone &&
+	(cd my-other-clone &&
+	 git clone -c odb.magic.subprocessCommand="$HELPER" --no-local .. .) &&
+	rm -rf my-other-clone
+'
+
+stop_httpd
+
+test_done
diff --git a/t/t0470/read-object-plain b/t/t0470/read-object-plain
new file mode 100755
index 0000000000..0766e16032
--- /dev/null
+++ b/t/t0470/read-object-plain
@@ -0,0 +1,83 @@
+#!/usr/bin/perl
+#
+
+use 5.008;
+use lib (split(/:/, $ENV{GITPERLLIB}));
+use strict;
+use warnings;
+use Git::Packet;
+use LWP::UserAgent;
+use HTTP::Request::Common;
+
+packet_initialize("git-read-object", 1);
+
+my %remote_caps = packet_read_and_check_capabilities("get_raw_obj", "put_raw_obj");
+packet_check_and_write_capabilities(\%remote_caps, "get_raw_obj", "put_raw_obj");
+
+my $http_url = $ENV{HTTPD_URL};
+
+while (1) {
+	my ($res, $command) = packet_txt_read();
+
+	if ( $res == -1 ) {
+		exit 0;
+	}
+
+	$command =~ s/^command=//;
+
+	if ( $command eq "init" ) {
+		packet_bin_read();
+
+		packet_txt_write("status=success");
+		packet_flush();
+	} elsif ( $command eq "get_raw_obj" ) {
+		my ($sha1) = packet_txt_read() =~ /^sha1=([0-9a-f]{40})$/;
+		packet_bin_read();
+
+		my $get_url = $http_url . "/list/?sha1=" . $sha1;
+
+		my $userAgent = LWP::UserAgent->new();
+
+		my $response = $userAgent->get( $get_url );
+
+		if ($response->is_error) {
+		    packet_txt_write("size=0");
+		    packet_txt_write("kind=none");	    
+		    packet_txt_write("status=notfound");
+		} else {
+		    packet_txt_write("size=" . length($response->content));
+		    packet_txt_write("kind=blob");
+		    packet_bin_write($response->content);
+		    packet_flush();
+		    packet_txt_write("status=success");
+		}
+
+		packet_flush();
+	} elsif ( $command eq "put_raw_obj" ) {
+		my ($sha1) = packet_txt_read() =~ /^sha1=([0-9a-f]{40})$/;
+		my ($size) = packet_txt_read() =~ /^size=([0-9]+)$/;
+		my ($kind) = packet_txt_read() =~ /^kind=(\w+)$/;
+		packet_bin_read();
+
+		# We must read the content we are sent and send it to the right url
+		my ($res, $buf) = packet_bin_read();
+		die "bad packet_bin_read res ($res)" unless ($res eq 0);
+		( packet_bin_read() eq ( 1, "" ) ) || die "bad send end";		
+
+		my $upload_url = $http_url . "/upload/?sha1=" . $sha1 . "&size=" . $size . "&type=blob";
+
+		my $userAgent = LWP::UserAgent->new();
+		my $request = POST $upload_url, Content_Type => 'multipart/form-data', Content => $buf;
+
+		my $response = $userAgent->request($request);
+
+		if ($response->is_error) {
+			packet_txt_write("status=failure");
+		} else {
+			packet_txt_write("status=success");
+		}
+		packet_flush();
+	} else {
+		die "bad command '$command'";
+	}
+}
-- 
2.16.0.rc0.16.g82191dbc6c.dirty


^ permalink raw reply	[flat|nested] 51+ messages in thread

* [PATCH 37/40] odb-helper: add have_object_process()
  2018-01-03 16:33 [PATCH 00/40] Promisor remotes and external ODB support Christian Couder
                   ` (35 preceding siblings ...)
  2018-01-03 16:33 ` [PATCH 36/40] Add t0470 to test passing raw objects Christian Couder
@ 2018-01-03 16:34 ` Christian Couder
  2018-01-03 16:34 ` [PATCH 38/40] Add t0480 to test "have" capability and raw objects Christian Couder
                   ` (2 subsequent siblings)
  39 siblings, 0 replies; 51+ messages in thread
From: Christian Couder @ 2018-01-03 16:34 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Ben Peart, Jonathan Tan,
	Nguyen Thai Ngoc Duy, Mike Hommey, Lars Schneider, Eric Wong,
	Christian Couder, Jeff Hostetler

This adds the infrastructure to handle 'have' instructions in
process mode.

The answer from the helper sub-process should be like the
output in script mode, that is lines like this:

sha1 SPACE size SPACE type NEWLINE

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
 odb-helper.c | 76 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 74 insertions(+), 2 deletions(-)

diff --git a/odb-helper.c b/odb-helper.c
index d901f6d0bc..d8902a9541 100644
--- a/odb-helper.c
+++ b/odb-helper.c
@@ -635,6 +635,70 @@ static int odb_helper_object_cmp(const void *va, const void *vb)
 	return hashcmp(a->sha1, b->sha1);
 }
 
+static int send_have_packets(struct odb_helper *o,
+			     struct object_process *entry,
+			     struct strbuf *status)
+{
+	int packet_len;
+	int total_got = 0;
+	struct child_process *process = &entry->subprocess.process;
+	int err = packet_write_fmt_gently(process->in, "command=have\n");
+
+	if (err)
+		return err;
+
+	err = packet_flush_gently(process->in);
+	if (err)
+		return err;
+
+	for (;;) {
+		/* packet_read() writes a '\0' extra byte at the end */
+		char buf[LARGE_PACKET_DATA_MAX + 1];
+		char *p = buf;
+		int more;
+
+		packet_len = packet_read(process->out, NULL, NULL,
+			buf, LARGE_PACKET_DATA_MAX + 1,
+			PACKET_READ_GENTLE_ON_EOF);
+
+		if (packet_len <= 0)
+			break;
+
+		total_got += packet_len;
+
+		/* 'have' packets should end with '\n' or '\0' */
+		do {
+			char *eol = strchrnul(p, '\n');
+			more = (*eol == '\n');
+			*eol = '\0';
+			if (add_have_entry(o, p))
+				break;
+			p = eol + 1;
+		} while (more && *p);
+	}
+
+	if (packet_len < 0)
+		return packet_len;
+
+	return check_object_process_status(process->out, status);
+}
+
+static int have_object_process(struct odb_helper *o)
+{
+	int err;
+	struct object_process *entry;
+	struct strbuf status = STRBUF_INIT;
+
+	entry = launch_object_process(o, ODB_HELPER_CAP_HAVE);
+	if (!entry)
+		return -1;
+
+	err = send_have_packets(o, entry, &status);
+
+	return check_object_process_error(err, status.buf, entry, o->dealer,
+					  ODB_HELPER_CAP_HAVE);
+}
+
 static void have_object_script(struct odb_helper *o)
 {
 	struct odb_helper_cmd cmd;
@@ -656,12 +720,20 @@ static void have_object_script(struct odb_helper *o)
 
 static void odb_helper_load_have(struct odb_helper *o)
 {
+	uint64_t start;
+
 	if (o->have_valid)
 		return;
 	o->have_valid = 1;
 
+	start = getnanotime();
+
 	if (o->type == ODB_HELPER_SCRIPT_CMD)
 		have_object_script(o);
+	else if (o->type == ODB_HELPER_SUBPROCESS_CMD)
+		have_object_process(o);
+
+	trace_performance_since(start, "odb_helper_load_have");
 
 	qsort(o->have, o->have_nr, sizeof(*o->have), odb_helper_object_cmp);
 }
@@ -923,7 +995,7 @@ int odb_helper_get_direct(struct odb_helper *o,
 		fetch_object(o->dealer, sha1);
 	else if (o->type == ODB_HELPER_SCRIPT_CMD)
 		res = get_direct_script(o, sha1);
-	else
+	else if (o->type == ODB_HELPER_SUBPROCESS_CMD)
 		res = get_object_process(o, sha1, -1);
 
 	trace_performance_since(start, "odb_helper_get_direct");
@@ -993,7 +1065,7 @@ int odb_helper_put_object(struct odb_helper *o,
 			  const void *buf, size_t len,
 			  const char *type, unsigned char *sha1)
 {
-	int res;
+	int res = 0;
 	uint64_t start = getnanotime();
 
 	if (o->type == ODB_HELPER_SCRIPT_CMD)
-- 
2.16.0.rc0.16.g82191dbc6c.dirty


^ permalink raw reply	[flat|nested] 51+ messages in thread

* [PATCH 38/40] Add t0480 to test "have" capability and raw objects
  2018-01-03 16:33 [PATCH 00/40] Promisor remotes and external ODB support Christian Couder
                   ` (36 preceding siblings ...)
  2018-01-03 16:34 ` [PATCH 37/40] odb-helper: add have_object_process() Christian Couder
@ 2018-01-03 16:34 ` Christian Couder
  2018-01-03 16:34 ` [PATCH 39/40] external-odb: use 'odb=magic' attribute to mark odb blobs Christian Couder
  2018-01-03 16:34 ` [PATCH 40/40] Add Documentation/technical/external-odb.txt Christian Couder
  39 siblings, 0 replies; 51+ messages in thread
From: Christian Couder @ 2018-01-03 16:34 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Ben Peart, Jonathan Tan,
	Nguyen Thai Ngoc Duy, Mike Hommey, Lars Schneider, Eric Wong,
	Christian Couder, Jeff Hostetler

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
 t/t0480-read-object-have-http-e-odb.sh | 109 +++++++++++++++++++++++++++++++++
 t/t0480/read-object-plain-have         | 103 +++++++++++++++++++++++++++++++
 2 files changed, 212 insertions(+)
 create mode 100755 t/t0480-read-object-have-http-e-odb.sh
 create mode 100755 t/t0480/read-object-plain-have

diff --git a/t/t0480-read-object-have-http-e-odb.sh b/t/t0480-read-object-have-http-e-odb.sh
new file mode 100755
index 0000000000..056a40f2bb
--- /dev/null
+++ b/t/t0480-read-object-have-http-e-odb.sh
@@ -0,0 +1,109 @@
+#!/bin/sh
+
+test_description='tests for read-object process with "have" cap and plain objects'
+
+. ./test-lib.sh
+
+# If we don't specify a port, the current test number will be used
+# which will not work as it is less than 1024, so it can only be used by root.
+LIB_HTTPD_PORT=$(expr ${this_test#t} + 12000)
+
+. "$TEST_DIRECTORY"/lib-httpd.sh
+
+start_httpd apache-e-odb.conf
+
+PATH="$PATH:$TEST_DIRECTORY/t0480"
+
+# odb helper script must see this
+export HTTPD_URL
+
+HELPER="read-object-plain-have"
+
+test_expect_success 'setup repo with a root commit' '
+	test_commit zero
+'
+
+test_expect_success 'setup another repo from the first one' '
+	git init other-repo &&
+	(cd other-repo &&
+	 git remote add origin .. &&
+	 git pull origin master &&
+	 git checkout master &&
+	 git log)
+'
+
+test_expect_success 'setup the helper in the root repo' '
+	git config odb.magic.subprocessCommand "$HELPER"
+'
+
+UPLOADFILENAME="hello_apache_upload.txt"
+
+UPLOAD_URL="$HTTPD_URL/upload/?sha1=$UPLOADFILENAME&size=123&type=blob"
+
+test_expect_success 'can upload a file' '
+	echo "Hello Apache World!" >hello_to_send.txt &&
+	echo "How are you?" >>hello_to_send.txt &&
+	curl --data-binary @hello_to_send.txt --include "$UPLOAD_URL" >out_upload
+'
+
+LIST_URL="$HTTPD_URL/list/"
+
+test_expect_success 'can list uploaded files' '
+	curl --include "$LIST_URL" >out_list &&
+	grep "$UPLOADFILENAME" out_list
+'
+
+test_expect_success 'can delete uploaded files' '
+	curl --data "delete" --include "$UPLOAD_URL&delete=1" >out_delete &&
+	curl --include "$LIST_URL" >out_list2 &&
+	! grep "$UPLOADFILENAME" out_list2
+'
+
+FILES_DIR="httpd/www/files"
+
+test_expect_success 'new blobs are transfered to the http server' '
+	test_commit one &&
+	hash1=$(git ls-tree HEAD | grep one.t | cut -f1 | cut -d\  -f3) &&
+	echo "$hash1-4-blob" >expected &&
+	ls "$FILES_DIR" >actual &&
+	test_cmp expected actual
+'
+
+test_expect_success 'blobs can be retrieved from the http server' '
+	git cat-file blob "$hash1" &&
+	git log -p >expected
+'
+
+test_expect_success 'update other repo from the first one' '
+	(cd other-repo &&
+	 git fetch origin "refs/odbs/magic/*:refs/odbs/magic/*" &&
+	 test_must_fail git cat-file blob "$hash1" &&
+	 git config odb.magic.subprocessCommand "$HELPER" &&
+	 git cat-file blob "$hash1" &&
+	 git pull origin master)
+'
+
+test_expect_success 'local clone from the first repo' '
+	mkdir my-clone &&
+	(cd my-clone &&
+	 git clone .. . &&
+	 git cat-file blob "$hash1")
+'
+
+test_expect_success 'no-local clone from the first repo fails' '
+	mkdir my-other-clone &&
+	(cd my-other-clone &&
+	 test_must_fail git clone --no-local .. .) &&
+	rm -rf my-other-clone
+'
+
+test_expect_success 'no-local clone from the first repo with helper succeeds' '
+	mkdir my-other-clone &&
+	(cd my-other-clone &&
+	 git clone -c odb.magic.subprocessCommand="$HELPER" --no-local .. .) &&
+	rm -rf my-other-clone
+'
+
+stop_httpd
+
+test_done
diff --git a/t/t0480/read-object-plain-have b/t/t0480/read-object-plain-have
new file mode 100755
index 0000000000..f230cbd5eb
--- /dev/null
+++ b/t/t0480/read-object-plain-have
@@ -0,0 +1,103 @@
+#!/usr/bin/perl
+#
+
+use 5.008;
+use lib (split(/:/, $ENV{GITPERLLIB}));
+use strict;
+use warnings;
+use Git::Packet;
+use LWP::UserAgent;
+use HTTP::Request::Common;
+
+packet_initialize("git-read-object", 1);
+
+my %remote_caps = packet_read_and_check_capabilities("get_raw_obj", "put_raw_obj", "have");
+packet_check_and_write_capabilities(\%remote_caps, "get_raw_obj", "put_raw_obj", "have");
+
+my $http_url = $ENV{HTTPD_URL};
+
+while (1) {
+	my ($res, $command) = packet_txt_read();
+
+	if ( $res == -1 ) {
+		exit 0;
+	}
+
+	$command =~ s/^command=//;
+
+	if ( $command eq "init" ) {
+		packet_bin_read();
+
+		packet_txt_write("status=success");
+		packet_flush();
+	} elsif ( $command eq "have" ) {
+		# read the flush after the command
+		packet_bin_read();
+
+		my $have_url = $http_url . "/list/";
+
+		my $userAgent = LWP::UserAgent->new();
+		my $response = $userAgent->get( $have_url );
+
+		if ($response->is_error) {
+			packet_bin_write("");
+			packet_flush();
+			packet_txt_write("status=failure");
+		} else {
+			packet_bin_write($response->content);
+			packet_flush();
+			packet_txt_write("status=success");
+		}
+		packet_flush();
+	} elsif ( $command eq "get_raw_obj" ) {
+		my ($sha1) = packet_txt_read() =~ /^sha1=([0-9a-f]{40})$/;
+		packet_bin_read();
+
+		my $get_url = $http_url . "/list/?sha1=" . $sha1;
+
+		my $userAgent = LWP::UserAgent->new();
+
+		my $response = $userAgent->get( $get_url );
+
+		if ($response->is_error) {
+			packet_txt_write("size=0");
+			packet_txt_write("kind=none");	    
+			packet_txt_write("status=notfound");
+		} else {
+			packet_txt_write("size=" . length($response->content));
+			packet_txt_write("kind=blob");
+			packet_bin_write($response->content);
+			packet_flush();
+			packet_txt_write("status=success");
+		}
+
+		packet_flush();
+	} elsif ( $command eq "put_raw_obj" ) {
+		my ($sha1) = packet_txt_read() =~ /^sha1=([0-9a-f]{40})$/;
+		my ($size) = packet_txt_read() =~ /^size=([0-9]+)$/;
+		my ($kind) = packet_txt_read() =~ /^kind=(\w+)$/;
+
+		packet_bin_read();
+
+		# We must read the content we are sent and send it to the right url
+		my ($res, $buf) = packet_bin_read();
+		die "bad packet_bin_read res ($res)" unless ($res eq 0);
+		( packet_bin_read() eq ( 1, "" ) ) || die "bad send end";		
+
+		my $upload_url = $http_url . "/upload/?sha1=" . $sha1 . "&size=" . $size . "&type=blob";
+
+		my $userAgent = LWP::UserAgent->new();
+		my $request = POST $upload_url, Content_Type => 'multipart/form-data', Content => $buf;
+
+		my $response = $userAgent->request($request);
+
+		if ($response->is_error) {
+			packet_txt_write("status=failure");
+		} else {
+			packet_txt_write("status=success");
+		}
+		packet_flush();
+	} else {
+		die "bad command '$command'";
+	}
+}
-- 
2.16.0.rc0.16.g82191dbc6c.dirty


^ permalink raw reply	[flat|nested] 51+ messages in thread

* [PATCH 39/40] external-odb: use 'odb=magic' attribute to mark odb blobs
  2018-01-03 16:33 [PATCH 00/40] Promisor remotes and external ODB support Christian Couder
                   ` (37 preceding siblings ...)
  2018-01-03 16:34 ` [PATCH 38/40] Add t0480 to test "have" capability and raw objects Christian Couder
@ 2018-01-03 16:34 ` Christian Couder
  2018-01-03 16:34 ` [PATCH 40/40] Add Documentation/technical/external-odb.txt Christian Couder
  39 siblings, 0 replies; 51+ messages in thread
From: Christian Couder @ 2018-01-03 16:34 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Ben Peart, Jonathan Tan,
	Nguyen Thai Ngoc Duy, Mike Hommey, Lars Schneider, Eric Wong,
	Christian Couder, Jeff Hostetler

To tell which blobs should be sent to the "magic" external odb,
let's require that the blobs be marked using the 'odb=magic'
attribute.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
 external-odb.c                         | 25 ++++++++++++++++++++++---
 external-odb.h                         |  3 ++-
 sha1_file.c                            | 20 +++++++++++++++-----
 t/t0400-external-odb.sh                |  3 +++
 t/t0410-transfer-e-odb.sh              |  3 +++
 t/t0420-transfer-http-e-odb.sh         |  3 +++
 t/t0470-read-object-http-e-odb.sh      |  3 +++
 t/t0480-read-object-have-http-e-odb.sh |  3 +++
 8 files changed, 54 insertions(+), 9 deletions(-)

diff --git a/external-odb.c b/external-odb.c
index 50c1cec50b..e3a05e24e3 100644
--- a/external-odb.c
+++ b/external-odb.c
@@ -2,6 +2,7 @@
 #include "external-odb.h"
 #include "odb-helper.h"
 #include "config.h"
+#include "attr.h"
 
 static struct odb_helper *helpers;
 static struct odb_helper **helpers_tail = &helpers;
@@ -165,19 +166,37 @@ int external_odb_get_object(const unsigned char *sha1)
 	return external_odb_do_get_object(sha1);
 }
 
+static int has_odb_attrs(struct odb_helper *o, const char *path)
+{
+	static struct attr_check *check;
+
+	if (!check)
+		check = attr_check_initl("odb", NULL);
+
+	if (!git_check_attr(path, check)) {
+		const char *value = check->items[0].value;
+		return value ? !strcmp(o->name, value) : 0;
+	}
+	return 0;
+}
+
 int external_odb_put_object(const void *buf, size_t len,
-			    const char *type, unsigned char *sha1)
+			    const char *type, unsigned char *sha1,
+			    const char *path)
 {
 	struct odb_helper *o;
 
 	external_odb_init();
 
 	/* For now accept only blobs */
-	if (strcmp(type, "blob"))
+	if (!path || strcmp(type, "blob"))
 		return 1;
 
 	for (o = helpers; o; o = o->next) {
-		int r = odb_helper_put_object(o, buf, len, type, sha1);
+		int r;
+		if (!has_odb_attrs(o, path))
+			continue;
+		r = odb_helper_put_object(o, buf, len, type, sha1);
 		if (r <= 0)
 			return r;
 	}
diff --git a/external-odb.h b/external-odb.h
index 26bb931685..5a8936f417 100644
--- a/external-odb.h
+++ b/external-odb.h
@@ -7,6 +7,7 @@ extern int external_odb_has_object(const unsigned char *sha1);
 extern int external_odb_get_object(const unsigned char *sha1);
 extern int external_odb_get_direct(const unsigned char *sha1);
 extern int external_odb_put_object(const void *buf, size_t len,
-				   const char *type, unsigned char *sha1);
+				   const char *type, unsigned char *sha1,
+				   const char *path);
 
 #endif /* EXTERNAL_ODB_H */
diff --git a/sha1_file.c b/sha1_file.c
index 300029459f..d3f395e967 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -1696,7 +1696,9 @@ static int freshen_packed_object(const unsigned char *sha1)
 	return 1;
 }
 
-int write_sha1_file(const void *buf, unsigned long len, const char *type, unsigned char *sha1)
+static int write_sha1_file_with_path(const void *buf, unsigned long len,
+				     const char *type, unsigned char *sha1,
+				     const char *path)
 {
 	char hdr[32];
 	int hdrlen = sizeof(hdr);
@@ -1705,13 +1707,19 @@ int write_sha1_file(const void *buf, unsigned long len, const char *type, unsign
 	 * it out into .git/objects/??/?{38} file.
 	 */
 	write_sha1_file_prepare(buf, len, type, sha1, hdr, &hdrlen);
-	if (!external_odb_put_object(buf, len, type, sha1))
+	if (!external_odb_put_object(buf, len, type, sha1, path))
 		return 0;
 	if (freshen_packed_object(sha1) || freshen_loose_object(sha1))
 		return 0;
 	return write_loose_object(sha1, hdr, hdrlen, buf, len, 0);
 }
 
+int write_sha1_file(const void *buf, unsigned long len,
+		    const char *type, unsigned char *sha1)
+{
+	return write_sha1_file_with_path(buf, len, type, sha1, NULL);
+}
+
 int hash_sha1_file_literally(const void *buf, unsigned long len, const char *type,
 			     struct object_id *oid, unsigned flags)
 {
@@ -1832,7 +1840,8 @@ static int index_mem(struct object_id *oid, void *buf, size_t size,
 	}
 
 	if (write_object)
-		ret = write_sha1_file(buf, size, typename(type), oid->hash);
+		ret = write_sha1_file_with_path(buf, size, typename(type),
+						oid->hash, path);
 	else
 		ret = hash_sha1_file(buf, size, typename(type), oid->hash);
 	if (re_allocated)
@@ -1854,8 +1863,9 @@ static int index_stream_convert_blob(struct object_id *oid, int fd,
 				 get_safe_crlf(flags));
 
 	if (write_object)
-		ret = write_sha1_file(sbuf.buf, sbuf.len, typename(OBJ_BLOB),
-				      oid->hash);
+		ret = write_sha1_file_with_path(sbuf.buf, sbuf.len,
+						typename(OBJ_BLOB),
+						oid->hash, path);
 	else
 		ret = hash_sha1_file(sbuf.buf, sbuf.len, typename(OBJ_BLOB),
 				     oid->hash);
diff --git a/t/t0400-external-odb.sh b/t/t0400-external-odb.sh
index f924de870f..944b7bb28f 100755
--- a/t/t0400-external-odb.sh
+++ b/t/t0400-external-odb.sh
@@ -71,6 +71,9 @@ test_expect_success 'helper can add objects to alt repo' '
 
 test_expect_success 'commit adds objects to alt repo' '
 	test_config odb.magic.scriptCommand "$HELPER" &&
+	echo "* odb=magic" >.gitattributes &&
+	GIT_NO_EXTERNAL_ODB=1 git add .gitattributes &&
+	GIT_NO_EXTERNAL_ODB=1 git commit -m "Add .gitattributes" &&
 	test_commit three &&
 	hash3=$(git ls-tree HEAD | grep three.t | cut -f1 | cut -d\  -f3) &&
 	content=$(git -C alt-repo show "$hash3") &&
diff --git a/t/t0410-transfer-e-odb.sh b/t/t0410-transfer-e-odb.sh
index 065ec7d759..fd3e37918c 100755
--- a/t/t0410-transfer-e-odb.sh
+++ b/t/t0410-transfer-e-odb.sh
@@ -111,6 +111,9 @@ test_expect_success 'setup other repo and its alternate repo' '
 '
 
 test_expect_success 'new blobs are put in first object store' '
+	echo "* odb=magic" >.gitattributes &&
+	GIT_NO_EXTERNAL_ODB=1 git add .gitattributes &&
+	GIT_NO_EXTERNAL_ODB=1 git commit -m "Add .gitattributes" &&
 	test_commit one &&
 	hash1=$(git ls-tree HEAD | grep one.t | cut -f1 | cut -d\  -f3) &&
 	content=$(cd alt-repo1 && git show "$hash1") &&
diff --git a/t/t0420-transfer-http-e-odb.sh b/t/t0420-transfer-http-e-odb.sh
index f84fe950ec..d307af0457 100755
--- a/t/t0420-transfer-http-e-odb.sh
+++ b/t/t0420-transfer-http-e-odb.sh
@@ -94,6 +94,9 @@ test_expect_success 'can delete uploaded files' '
 FILES_DIR="httpd/www/files"
 
 test_expect_success 'new blobs are transfered to the http server' '
+	echo "* odb=magic" >.gitattributes &&
+	GIT_NO_EXTERNAL_ODB=1 git add .gitattributes &&
+	GIT_NO_EXTERNAL_ODB=1 git commit -m "Add .gitattributes" &&
 	test_commit one &&
 	hash1=$(git ls-tree HEAD | grep one.t | cut -f1 | cut -d\  -f3) &&
 	echo "$hash1-4-blob" >expected &&
diff --git a/t/t0470-read-object-http-e-odb.sh b/t/t0470-read-object-http-e-odb.sh
index 774528c04f..d814a43d59 100755
--- a/t/t0470-read-object-http-e-odb.sh
+++ b/t/t0470-read-object-http-e-odb.sh
@@ -62,6 +62,9 @@ test_expect_success 'can delete uploaded files' '
 FILES_DIR="httpd/www/files"
 
 test_expect_success 'new blobs are transfered to the http server' '
+	echo "* odb=magic" >.gitattributes &&
+	GIT_NO_EXTERNAL_ODB=1 git add .gitattributes &&
+	GIT_NO_EXTERNAL_ODB=1 git commit -m "Add .gitattributes" &&
 	test_commit one &&
 	hash1=$(git ls-tree HEAD | grep one.t | cut -f1 | cut -d\  -f3) &&
 	echo "$hash1-4-blob" >expected &&
diff --git a/t/t0480-read-object-have-http-e-odb.sh b/t/t0480-read-object-have-http-e-odb.sh
index 056a40f2bb..fe1fac5ef3 100755
--- a/t/t0480-read-object-have-http-e-odb.sh
+++ b/t/t0480-read-object-have-http-e-odb.sh
@@ -62,6 +62,9 @@ test_expect_success 'can delete uploaded files' '
 FILES_DIR="httpd/www/files"
 
 test_expect_success 'new blobs are transfered to the http server' '
+	echo "* odb=magic" >.gitattributes &&
+	GIT_NO_EXTERNAL_ODB=1 git add .gitattributes &&
+	GIT_NO_EXTERNAL_ODB=1 git commit -m "Add .gitattributes" &&
 	test_commit one &&
 	hash1=$(git ls-tree HEAD | grep one.t | cut -f1 | cut -d\  -f3) &&
 	echo "$hash1-4-blob" >expected &&
-- 
2.16.0.rc0.16.g82191dbc6c.dirty


^ permalink raw reply	[flat|nested] 51+ messages in thread

* [PATCH 40/40] Add Documentation/technical/external-odb.txt
  2018-01-03 16:33 [PATCH 00/40] Promisor remotes and external ODB support Christian Couder
                   ` (38 preceding siblings ...)
  2018-01-03 16:34 ` [PATCH 39/40] external-odb: use 'odb=magic' attribute to mark odb blobs Christian Couder
@ 2018-01-03 16:34 ` Christian Couder
  39 siblings, 0 replies; 51+ messages in thread
From: Christian Couder @ 2018-01-03 16:34 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Ben Peart, Jonathan Tan,
	Nguyen Thai Ngoc Duy, Mike Hommey, Lars Schneider, Eric Wong,
	Christian Couder, Jeff Hostetler

This describes the external odb mechanism's purpose and
how it works.

Helped-by: Ben Peart <benpeart@microsoft.com>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
 Documentation/technical/external-odb.txt | 342 +++++++++++++++++++++++++++++++
 1 file changed, 342 insertions(+)
 create mode 100644 Documentation/technical/external-odb.txt

diff --git a/Documentation/technical/external-odb.txt b/Documentation/technical/external-odb.txt
new file mode 100644
index 0000000000..58ec8a8145
--- /dev/null
+++ b/Documentation/technical/external-odb.txt
@@ -0,0 +1,342 @@
+External ODBs
+^^^^^^^^^^^^^
+
+The External ODB mechanism makes it possible for Git objects, only
+blobs for now though, to be stored in an "external object database"
+(External ODB).
+
+An External ODB can be any object store as long as there is an helper
+program called an "odb helper" that can communicate with Git to
+transfer objects to/from the external odb and to retrieve information
+about available objects in the external odb.
+
+Purpose
+=======
+
+The purpose of this mechanism is to make possible to handle Git
+objects, especially blobs, in much more flexible ways.
+
+Currently Git can store its objects only in the form of loose objects
+in separate files or packed objects in a pack file. These existing
+object stores cannot be easily optimized for many different kind of
+contents.
+
+So the current stores are not flexible enough for some important use
+cases like handling really big binary files or handling a really big
+number of files that are fetched only as needed. And it is not
+realistic to expect that Git could fully natively handle many of such
+use cases. Git would need to natively implement different internal
+stores which would be a huge burden and which could lead to
+re-implement things like HTTP servers, Docker registries or artifact
+stores that already exist outside Git.
+
+Furthermore many improvements that are dependent on specific setups
+could be implemented in the way Git objects are managed if it was
+possible to customize how the Git objects are handled. For example a
+restartable clone using the bundle mechanism has often been requested,
+but implementing that would go against the current strict rules under
+which the Git objects are currently handled.
+
+What Git needs is a mechanism to make it possible to customize in a
+lot of different ways how the Git objects are handled. Though this
+mechanism should try as much as possible to avoid interfering with the
+usual way in which Git handle its objects.
+
+Helpers
+=======
+
+ODB helpers are commands that have to be registered using either the
+"odb.<odbname>.subprocessCommand" or the "odb.<odbname>.scriptCommand"
+config variables.
+
+Registering such a command tells Git that an external odb called
+<odbname> exists and that the registered command should be used to
+communicate with it.
+
+The communication happens through instructions that are sent by Git
+and that the commands should answer. If it makes sense, Git can send
+the same instruction to many commands in the order in which they are
+configured.
+
+There are 2 kinds of commands. Commands registered using the
+"odb.<odbname>.subprocessCommand" config variable are called "process
+commands" and the associated mode is called "process mode". Commands
+registered using the "odb.<odbname>.scriptCommand" config variables
+are called "script commands" and the associated mode is called "script
+mode".
+
+Early on git commands send an 'init' instruction to the registered
+commands. A capability negociation will take place during this
+request/response exchange which will let Git and the helpers know how
+they can further collaborate. The attribute system can also be used to
+tell Git which objects should be handled by which helper.
+
+Process Mode
+============
+
+In process mode the command is started as a single process invocation
+that should last for the entire life of the single Git command that
+started it.
+
+A packet format (pkt-line, see technical/protocol-common.txt) based
+protocol over standard input and standard output is used for
+communication between Git and the helper command.
+
+After the process command is started, Git sends a welcome message
+("git-read-object-client"), a list of supported protocol version
+numbers, and a flush packet. Git expects to read a welcome response
+message ("git-read-object-server"), exactly one protocol version
+number from the previously sent list, and a flush packet. All further
+communication will be based on the selected version.
+
+The remaining protocol description below documents "version=1". Please
+note that "version=42" in the example below does not exist and is only
+there to illustrate how the protocol would look with more than one
+version.
+
+After the version negotiation Git sends a list of all capabilities
+that it supports and a flush packet. Git expects to read a list of
+desired capabilities, which must be a subset of the supported
+capabilities list, and a flush packet as response:
+
+------------------------
+packet: git> git-read-object-client
+packet: git> version=1
+packet: git> version=42
+packet: git> 0000
+packet: git< git-read-object-server
+packet: git< version=1
+packet: git< 0000
+packet: git> capability=get_raw_obj
+packet: git> capability=have
+packet: git> capability=put_raw_obj
+packet: git> capability=not-yet-invented
+packet: git> 0000
+packet: git< capability=get_raw_obj
+packet: git< 0000
+------------------------
+
+Afterwards Git sends a list of "key=value" pairs terminated with a
+flush packet. The list will contain at least the instruction (based on
+the supported capabilities) and the arguments for the
+instruction. Please note, that the process must not send any response
+before it received the final flush packet.
+
+In general any response from the helper should end with a status
+packet. See the documentation of the 'get_*' instructions below for
+examples of status packets.
+
+After the helper has processed an instruction, it is expected to wait
+for the next "key=value" list containing another instruction.
+
+On exit Git will close the pipe to the helper. The helper is then
+expected to detect EOF and exit gracefully on its own. Git will wait
+until the process has stopped.
+
+Script Mode
+===========
+
+In this mode Git launches the script command each time it wants to
+communicates with the helper. There is no welcome message and no
+protocol version in this mode.
+
+The instruction and associated arguments are passed as arguments when
+launching the script command and if needed further information is
+passed between Git and the command through stdin and stdout.
+
+Capabilities/Instructions
+=========================
+
+The following instructions are currently supported by Git:
+
+- init
+- get_git_obj
+- get_raw_obj
+- get_direct
+- put_raw_obj
+- have
+
+The plan is to also support 'put_git_obj' and 'put_direct' soon, for
+consistency with the 'get_*' instructions.
+
+ - 'init'
+
+All the process and script commands must accept the 'init'
+instruction. It should be the first instruction sent to a command. It
+should not be advertised in the capability exchange. Any argument
+should be ignored.
+
+In process mode, after receiving the 'init' instruction and a flush
+packet, the helper should just send a status packet and then a flush
+packet. See the 'get_*' instructions below for examples of status
+packets.
+
+In script mode the command should print on stdout the capabilities
+that it supports if any. This is the only time in script mode when a
+capability exchange happens.
+
+For example a script command could use the following shell code
+snippet to handle the 'init' instruction:
+
+------------------------
+case "$1" in
+init)
+	echo "capability=get_git_obj"
+	echo "capability=put_raw_obj"
+	echo "capability=have"
+	;;
+------------------------
+
+ - 'get_git_obj <sha1>' and 'get_raw_obj <sha1>'
+
+These instructions should have a hexadecimal <sha1> argument to tell
+which object the helper should send to git.
+
+In process mode the sha1 argument should be followed by a flush packet
+like this:
+
+------------------------
+packet: git> command=get_git_obj
+packet: git> sha1=0a214a649e1b3d5011e14a3dc227753f2bd2be05
+packet: git> 0000
+------------------------
+
+After reading that the helper should send the requested object to Git in a
+packet series followed by a flush packet. If the helper does not experience
+problems then the helper must send a "success" status like the following:
+
+------------------------
+packet: git< status=success
+packet: git< 0000
+------------------------
+
+In case the helper cannot or does not want to send the requested
+object as well as any other object for the lifetime of the Git
+process, then it is expected to respond with an "abort" status at any
+point in the protocol:
+
+------------------------
+packet: git< status=abort
+packet: git< 0000
+------------------------
+
+Git neither stops nor restarts the helper in case a
+"notfound"/"error"/"abort" status is set. An "error" status means a
+possibly more transient error than an abort. The helper should also
+send a "notfound" error in case of a "get_*" instruction, which means
+that the requested object cannot be found.
+
+If the helper dies during the communication or does not adhere to the
+protocol then Git will stop and restart it with the next instruction.
+
+In script mode the helper should just send the requested object to Git
+by writing it to stdout and should then exit. The exit code should
+signal to Git if a problem occured or not.
+
+The only difference between 'get_git_obj' and 'get_raw_obj' is that in
+case of 'get_git_obj' the requested object should be sent as a Git
+object (that is in the same format as loose object files). In case of
+'get_raw_obj' the object should be sent in its raw format (that is the
+same output as `git cat-file <type> <sha1>`).
+
+ - 'get_direct <sha1>'
+
+This instruction is similar as the other 'get_*' instructions except
+that no object should be sent from the helper to Git. Instead the
+helper should directly write the requested object into a loose object
+file in the ".git/objects" directory.
+
+After the helper has sent the "status=success" packet and the
+following flush packet in process mode, or after it has exited in the
+script mode, Git will lookup again for the requested sha1 in its loose
+object files and pack files.
+
+ - 'put_raw_obj <sha1> <size> <type>'
+
+This instruction should be following by three arguments to tell which
+object the helper will receive from git: <sha1>, <size> and
+<type>. The hexadecimal <sha1> argument describes the object that will
+be sent from Git to the helper. The <type> is the object type ("blob",
+"tree", "commit" or "tag") of this object. The <size> is the size in
+bytes of the (decompressed) object content.
+
+In process mode the last argument (the type) should be followed by a
+flush packet.
+
+After reading that the helper should read the announced object from
+Git in a packet series followed by a flush packet.
+
+If the helper does not experience problems when receiving and storing
+or processing the object, then the helper must send a "success" status
+as described for the 'get_*' instructions.
+
+In script mode the helper should just receive the announced object
+from its standard input. After receiving and processing the object,
+the helper should exit and its exit code should signal to Git if a
+problem occured or not.
+
+- 'have'
+
+In process mode this instruction should be followed by a flush
+packet. After receiving this packet the helper should send the sha1,
+size and type, in this order, of all the objects it can provide to Git
+(through a 'get_*' instruction). There should be a space character
+between the sha1 and the size and between the size and the type, and
+then a new line character after the type.
+
+If many packets are needed to send back all this information, the
+split between packets should be made after the new line characters.
+
+If the helper does not experience problems, then it must then send a
+"success" status as described for the 'get_*' instructions.
+
+In script mode the helper should send to its standard output the sha1,
+size and type, in this order of all the objects it can provide to
+Git. There should also be a space character between the sha1 and the
+size and between the size and the type, and then a new line character
+after the type.
+
+After sending this, the script helper should exit and its exit code
+should signal to Git if a problem occured or not.
+
+Order of instructions
+=====================
+
+For get_*_object instructions the regular code to find objects is
+called before the odb helpers.
+
+For put_*_object instructions the regular code to store the objects is
+called after the odb helpers.
+
+For now this order is not configurable.
+
+Object caching
+==============
+
+If a helper returns the object data as requested by get_git_obj or
+get_raw_obj, then Git will itself store the object locally in its
+regular object store, so it is redundant for the helper to also store
+or try to store the object in the regular object store.
+
+Yeah, this seems to defeat the goal of enabling specialized object
+handlers to handle large or other "unusual" objects that git normally
+doesn't deal well with. So in the long run there should be a way to
+make this configurable.
+
+Selecting objects
+=================
+
+To select objects that should be handled by an external odb, one can
+use the git attributes system. For now this will only work with blobs
+and this will only work along with the 'put_raw_obj' instruction.
+
+For example if one has an external odb called "magic" and has
+registered an associated a process command helper that supports the
+'put_raw_obj' instruction, then one can tell Git that all the .jpg
+files should be handled by the "magic" odb using a .gitattributes file
+can that contains:
+
+------------------------
+*.jpg           odb=magic
+------------------------
+
-- 
2.16.0.rc0.16.g82191dbc6c.dirty


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 10/40] external-odb: implement external_odb_get_direct
  2018-01-03 16:33 ` [PATCH 10/40] external-odb: implement external_odb_get_direct Christian Couder
@ 2018-01-04 17:44   ` Jeff Hostetler
  2018-01-15 14:47     ` Christian Couder
  0 siblings, 1 reply; 51+ messages in thread
From: Jeff Hostetler @ 2018-01-04 17:44 UTC (permalink / raw)
  To: Christian Couder, git
  Cc: Junio C Hamano, Jeff King, Ben Peart, Jonathan Tan,
	Nguyen Thai Ngoc Duy, Mike Hommey, Lars Schneider, Eric Wong,
	Christian Couder, Jeff Hostetler



On 1/3/2018 11:33 AM, Christian Couder wrote:
> This is implemented only in the promisor remote mode
> for now by calling fetch_object().
> 
> Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
> ---
>   external-odb.c | 15 +++++++++++++++
>   external-odb.h |  1 +
>   odb-helper.c   | 13 +++++++++++++
>   odb-helper.h   |  3 ++-
>   4 files changed, 31 insertions(+), 1 deletion(-)
> 
> diff --git a/external-odb.c b/external-odb.c
> index d26e63d8b1..5d0afb9762 100644
> --- a/external-odb.c
> +++ b/external-odb.c
> @@ -76,3 +76,18 @@ int external_odb_has_object(const unsigned char *sha1)
>   			return 1;
>   	return 0;
>   }
> +
> +int external_odb_get_direct(const unsigned char *sha1)
> +{
> +	struct odb_helper *o;
> +
> +	external_odb_init();
> +
> +	for (o = helpers; o; o = o->next) {
> +		if (odb_helper_get_direct(o, sha1) < 0)
> +			continue;
> +		return 0;
 > +	}

Would this be simpler said as:
	for (o = ...)
		if (!odb_helper_get_direct(...))
			return 0;


> +
> +	return -1;
> +}
> diff --git a/external-odb.h b/external-odb.h
> index 9a3c2f01b3..fd6708163e 100644
> --- a/external-odb.h
> +++ b/external-odb.h
> @@ -4,5 +4,6 @@
>   extern int has_external_odb(void);
>   extern const char *external_odb_root(void);
>   extern int external_odb_has_object(const unsigned char *sha1);
> +extern int external_odb_get_direct(const unsigned char *sha1);
>   
>   #endif /* EXTERNAL_ODB_H */
> diff --git a/odb-helper.c b/odb-helper.c
> index 1404393807..4b70b287af 100644
> --- a/odb-helper.c
> +++ b/odb-helper.c
> @@ -4,6 +4,7 @@
>   #include "odb-helper.h"
>   #include "run-command.h"
>   #include "sha1-lookup.h"
> +#include "fetch-object.h"
>   
>   struct odb_helper *odb_helper_new(const char *name, int namelen)
>   {
> @@ -52,3 +53,15 @@ int odb_helper_has_object(struct odb_helper *o, const unsigned char *sha1)
>   	return !!odb_helper_lookup(o, sha1);
>   }
>   
> +int odb_helper_get_direct(struct odb_helper *o,
> +			  const unsigned char *sha1)
> +{
> +	int res = 0;
> +	uint64_t start = getnanotime();
> +
> +	fetch_object(o->dealer, sha1);
> +
> +	trace_performance_since(start, "odb_helper_get_direct");
> +
> +	return res;

'res' will always be 0, so the external_odb_get_direct() will
only do the first helper.  i haven't looked at the rest of the
series yet, so maybe you've already addressed this.

Also, I put a TODO comment in the fetch_object() header to
consider returning an error/success, so maybe that could help
here too.

> +}
> diff --git a/odb-helper.h b/odb-helper.h
> index 9395e606ce..f4bc66b0ef 100644
> --- a/odb-helper.h
> +++ b/odb-helper.h
> @@ -20,5 +20,6 @@ struct odb_helper {
>   extern struct odb_helper *odb_helper_new(const char *name, int namelen);
>   extern int odb_helper_has_object(struct odb_helper *o,
>   				 const unsigned char *sha1);
> -
> +extern int odb_helper_get_direct(struct odb_helper *o,
> +				 const unsigned char *sha1);
>   #endif /* ODB_HELPER_H */
> 

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 14/40] sha1_file: prepare for external odbs
  2018-01-03 16:33 ` [PATCH 14/40] sha1_file: prepare for external odbs Christian Couder
@ 2018-01-04 18:00   ` Jeff Hostetler
  2018-01-16  7:23     ` Christian Couder
  0 siblings, 1 reply; 51+ messages in thread
From: Jeff Hostetler @ 2018-01-04 18:00 UTC (permalink / raw)
  To: Christian Couder, git
  Cc: Junio C Hamano, Jeff King, Ben Peart, Jonathan Tan,
	Nguyen Thai Ngoc Duy, Mike Hommey, Lars Schneider, Eric Wong,
	Christian Couder, Jeff Hostetler



On 1/3/2018 11:33 AM, Christian Couder wrote:
> In the following commits we will need some functions that were
> internal to sha1_file.c, so let's first make them non static
> and declare them in "cache.h". While at it, let's rename
> 'create_tmpfile()' to 'create_object_tmpfile()' to make its
> name less generic.
> 
> Let's also split out 'sha1_file_name_alt()' from
> 'sha1_file_name()' and 'open_sha1_file_alt()' from
> 'open_sha1_file()', as we will need both of these new
> functions too.
[...]
> diff --git a/sha1_file.c b/sha1_file.c
> index 261baf800f..785e8dda03 100644
> --- a/sha1_file.c
> +++ b/sha1_file.c
> @@ -322,17 +322,22 @@ static void fill_sha1_path(struct strbuf *buf, const unsigned char *sha1)
>   	}
>   }
>   
> -const char *sha1_file_name(const unsigned char *sha1)
> +const char *sha1_file_name_alt(const char *objdir, const unsigned char *sha1)
>   {
>   	static struct strbuf buf = STRBUF_INIT;

While we are refactoring sha1_file_name() and adding
sha1_file_name_alt(), could we also change the API and
pass in the strbuf so we can get rid of the static buffer?
Granted, it is a little off topic, but it will help out
in the long run.

[...]
> @@ -1551,7 +1562,7 @@ static inline int directory_size(const char *filename)
>    * We want to avoid cross-directory filename renames, because those
>    * can have problems on various filesystems (FAT, NFS, Coda).
>    */
> -static int create_tmpfile(struct strbuf *tmp, const char *filename)
> +int create_object_tmpfile(struct strbuf *tmp, const char *filename)
>   {
>   	int fd, dirlen = directory_size(filename);
>   
> @@ -1591,7 +1602,7 @@ static int write_loose_object(const unsigned char *sha1, char *hdr, int hdrlen,
>   	static struct strbuf tmp_file = STRBUF_INIT;

Same thing here, since we are renaming the function anyway, could we
add a strbuf arg and get rid of the static one?


>   	const char *filename = sha1_file_name(sha1);
>   
> -	fd = create_tmpfile(&tmp_file, filename);
> +	fd = create_object_tmpfile(&tmp_file, filename);
>   	if (fd < 0) {
>   		if (errno == EACCES)
>   			return error("insufficient permission for adding an object to repository database %s", get_object_directory());
> 

Jeff

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 15/40] external-odb: add script mode support
  2018-01-03 16:33 ` [PATCH 15/40] external-odb: add script mode support Christian Couder
@ 2018-01-04 19:55   ` Jeff Hostetler
  2018-03-19 13:15     ` Christian Couder
  0 siblings, 1 reply; 51+ messages in thread
From: Jeff Hostetler @ 2018-01-04 19:55 UTC (permalink / raw)
  To: Christian Couder, git
  Cc: Junio C Hamano, Jeff King, Ben Peart, Jonathan Tan,
	Nguyen Thai Ngoc Duy, Mike Hommey, Lars Schneider, Eric Wong,
	Christian Couder, Jeff Hostetler



On 1/3/2018 11:33 AM, Christian Couder wrote:
> This adds support for the script command mode where
> an helper script or command is called to retrieve or
> manage objects.
> 
> This implements the 'have' and 'get_git_obj'
> instructions for the script mode.
> 
> Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
> ---
>   external-odb.c          |  51 ++++++++++-
>   external-odb.h          |   1 +
>   odb-helper.c            | 218 +++++++++++++++++++++++++++++++++++++++++++++++-
>   odb-helper.h            |   4 +
>   sha1_file.c             |  12 ++-
>   t/t0400-external-odb.sh |  44 ++++++++++
>   6 files changed, 327 insertions(+), 3 deletions(-)
>   create mode 100755 t/t0400-external-odb.sh
> 
> diff --git a/external-odb.c b/external-odb.c
> index 5d0afb9762..81f2aa5fac 100644
> --- a/external-odb.c
> +++ b/external-odb.c
> @@ -33,8 +33,14 @@ static int external_odb_config(const char *var, const char *value, void *data)
>   
>   	o = find_or_create_helper(name, namelen);
>   
> -	if (!strcmp(subkey, "promisorremote"))
> +	if (!strcmp(subkey, "promisorremote")) {
> +		o->type = ODB_HELPER_GIT_REMOTE;
>   		return git_config_string(&o->dealer, var, value);
> +	}
> +	if (!strcmp(subkey, "scriptcommand")) {
> +		o->type = ODB_HELPER_SCRIPT_CMD;
> +		return git_config_string(&o->dealer, var, value);
> +	}
>   
>   	return 0;
>   }
> @@ -77,6 +83,49 @@ int external_odb_has_object(const unsigned char *sha1)
>   	return 0;
>   }
>   
> +int external_odb_get_object(const unsigned char *sha1)
> +{
> +	struct odb_helper *o;
> +	const char *path;
> +
> +	if (!external_odb_has_object(sha1))
> +		return -1;
> +
> +	path = sha1_file_name_alt(external_odb_root(), sha1);
> +	safe_create_leading_directories_const(path);
> +	prepare_external_alt_odb();
> +
> +	for (o = helpers; o; o = o->next) {
> +		struct strbuf tmpfile = STRBUF_INIT;
> +		int ret;
> +		int fd;
> +
> +		if (!odb_helper_has_object(o, sha1))
> +			continue;
> +
> +		fd = create_object_tmpfile(&tmpfile, path);
> +		if (fd < 0) {
> +			strbuf_release(&tmpfile);
> +			return -1;
> +		}
> +
> +		if (odb_helper_get_object(o, sha1, fd) < 0) {
> +			close(fd);
> +			unlink(tmpfile.buf);
> +			strbuf_release(&tmpfile);
> +			continue;
> +		}
> +
> +		close_sha1_file(fd);
> +		ret = finalize_object_file(tmpfile.buf, path);
> +		strbuf_release(&tmpfile);
> +		if (!ret)
> +			return 0;
> +	}
> +
> +	return -1;
> +}
> +
>   int external_odb_get_direct(const unsigned char *sha1)
>   {
>   	struct odb_helper *o;
> diff --git a/external-odb.h b/external-odb.h
> index fd6708163e..fb8b94972f 100644
> --- a/external-odb.h
> +++ b/external-odb.h
> @@ -4,6 +4,7 @@
>   extern int has_external_odb(void);
>   extern const char *external_odb_root(void);
>   extern int external_odb_has_object(const unsigned char *sha1);
> +extern int external_odb_get_object(const unsigned char *sha1);
>   extern int external_odb_get_direct(const unsigned char *sha1);
>   
>   #endif /* EXTERNAL_ODB_H */
> diff --git a/odb-helper.c b/odb-helper.c
> index 4b70b287af..c1a3443dc7 100644
> --- a/odb-helper.c
> +++ b/odb-helper.c
> @@ -21,13 +21,124 @@ struct odb_helper_cmd {
>   	struct child_process child;
>   };
>   
> +/*
> + * Callers are responsible to ensure that the result of vaddf(fmt, ap)
> + * is properly shell-quoted.
> + */
> +static void prepare_helper_command(struct argv_array *argv, const char *cmd,
> +				   const char *fmt, va_list ap)
> +{
> +	struct strbuf buf = STRBUF_INIT;
> +
> +	strbuf_addstr(&buf, cmd);
> +	strbuf_addch(&buf, ' ');
> +	strbuf_vaddf(&buf, fmt, ap);

I do find this a bit odd that you're putting the cmd, a space,
and the printf results into a single arg in the argv, rather than
directly loading up the argv.

Is there an issue with the whitespace between the cmd and the
printf result being in the same arg -- especially if there are
quoting issues in the fmt string as you mention in the comment
above?  (not sure, just asking)

> +
> +	argv_array_push(argv, buf.buf);
> +	strbuf_release(&buf);
> +}
> +
> +__attribute__((format (printf,3,4)))
> +static int odb_helper_start(struct odb_helper *o,
> +			    struct odb_helper_cmd *cmd,
> +			    const char *fmt, ...)
> +{
> +	va_list ap;
> +
> +	memset(cmd, 0, sizeof(*cmd));
> +	argv_array_init(&cmd->argv);
> +
> +	if (!o->dealer)
> +		return -1;
> +
> +	va_start(ap, fmt);
> +	prepare_helper_command(&cmd->argv, o->dealer, fmt, ap);
> +	va_end(ap);
> +
> +	cmd->child.argv = cmd->argv.argv;
> +	cmd->child.use_shell = 1;
> +	cmd->child.no_stdin = 1;
> +	cmd->child.out = -1;
> +
> +	if (start_command(&cmd->child) < 0) {
> +		argv_array_clear(&cmd->argv);
> +		return -1;
> +	}
> +
> +	return 0;
> +}
> +
> +static int odb_helper_finish(struct odb_helper *o,
> +			     struct odb_helper_cmd *cmd)
> +{
> +	int ret = finish_command(&cmd->child);
> +	argv_array_clear(&cmd->argv);
> +	if (ret) {
> +		warning("odb helper '%s' reported failure", o->name);
> +		return -1;
> +	}
> +	return 0;
> +}
> +
> +static int parse_object_line(struct odb_helper_object *o, const char *line)
> +{

Is there a reason to order the fields this way?  In the test
at the bottom of this commit, you take cat-file output and
re-order the columns with awk.   I'm just wondering if we kept
cat-file ordering in the format here, we could simplify things.

> +	char *end;
> +	if (get_sha1_hex(line, o->sha1) < 0)
> +		return -1;
> +
> +	line += 40;
> +	if (*line++ != ' ')
> +		return -1;
> +
> +	o->size = strtoul(line, &end, 10);
> +	if (line == end || *end++ != ' ')
> +		return -1;
> +
> +	o->type = type_from_string(end);
> +	return 0;
> +}
> +
> +static int add_have_entry(struct odb_helper *o, const char *line)
> +{
> +	ALLOC_GROW(o->have, o->have_nr+1, o->have_alloc);

I didn't see where o->have is initially allocated.  The default is
to start with 64 and then grow by 3/2 as needed.  If we are getting
lots of objects here, we'll have lots of reallocs slowing things down.
It would be better to seed it somewhere to a large value.


> +	if (parse_object_line(&o->have[o->have_nr], line) < 0) {
> +		warning("bad 'have' input from odb helper '%s': %s",
> +			o->name, line);
> +		return 1;
> +	}
> +	o->have_nr++;
> +	return 0;
> +}
> +
> +static int odb_helper_object_cmp(const void *va, const void *vb)
> +{
> +	const struct odb_helper_object *a = va, *b = vb;
> +	return hashcmp(a->sha1, b->sha1);
> +}
> +
>   static void odb_helper_load_have(struct odb_helper *o)
>   {
> +	struct odb_helper_cmd cmd;
> +	FILE *fh;
> +	struct strbuf line = STRBUF_INIT;
> +
>   	if (o->have_valid)
>   		return;
>   	o->have_valid = 1;
>   
> -	/* TODO */
> +	if (odb_helper_start(o, &cmd, "have") < 0)
> +		return;
> +
> +	fh = xfdopen(cmd.child.out, "r");
> +	while (strbuf_getline(&line, fh) != EOF)
> +		if (add_have_entry(o, line.buf))
> +			break;
> +
> +	strbuf_release(&line);
> +	fclose(fh);
> +	odb_helper_finish(o, &cmd);
> +
> +	qsort(o->have, o->have_nr, sizeof(*o->have), odb_helper_object_cmp);
>   }

Help me understand this.  I originally thought that the "have"
command would ask about one or more specific OIDs, but after a
couple of readings it looks like the "have" command is getting the
*complete* list of objects known to this external ODB and building
a sorted array of them.  And then we do this for each external ODB
configured.

If this is the case, I'm concerned that this will have scale problems.
"git cat-file..." shows that even my little git.git repo has 360K
objects.  And "time git cat-file..." takes over 1.1 seconds.

I'm wondering if there is a better/different way to do this.
(Sorry if you've already covered this in earlier versions of this
patch series and I missed it.)

I'm wondering about "struct odb_helper" maintain a long-running
connection to the sub-command which would let git ask for the object
and either get it back (as you have below) or get an not-found error.
The sub-command would then wait for another get-object request over
stdin.

I say this because whatever operation I'm doing (like a checkout or
log or blame) is only going to need a small percentage of the total
set of objects.  I think it would be more efficient to try/retry to
fault them in as-needed via one or more external helpers than to build
these tables in advance.


>   static const unsigned char *have_sha1_access(size_t index, void *table)
> @@ -53,6 +164,111 @@ int odb_helper_has_object(struct odb_helper *o, const unsigned char *sha1)
>   	return !!odb_helper_lookup(o, sha1);
>   }
>   
> +int odb_helper_get_object(struct odb_helper *o, const unsigned char *sha1,
> +			    int fd)
> +{
> +	struct odb_helper_object *obj;
> +	struct odb_helper_cmd cmd;
> +	unsigned long total_got;
> +	git_zstream stream;
> +	int zret = Z_STREAM_END;
> +	git_SHA_CTX hash;
> +	unsigned char real_sha1[20];
> +	struct strbuf header = STRBUF_INIT;
> +	unsigned long hdr_size;
> +
> +	obj = odb_helper_lookup(o, sha1);
> +	if (!obj)
> +		return -1;
> +
> +	if (odb_helper_start(o, &cmd, "get_git_obj %s", sha1_to_hex(sha1)) < 0)
> +		return -1;
> +
> +	memset(&stream, 0, sizeof(stream));
> +	git_inflate_init(&stream);
> +	git_SHA1_Init(&hash);
> +	total_got = 0;
> +
> +	for (;;) {
> +		unsigned char buf[4096];
> +		int r;
> +
> +		r = xread(cmd.child.out, buf, sizeof(buf));
> +		if (r < 0) {
> +			error("unable to read from odb helper '%s': %s",
> +			      o->name, strerror(errno));
> +			close(cmd.child.out);
> +			odb_helper_finish(o, &cmd);
> +			git_inflate_end(&stream);
> +			return -1;
> +		}
> +		if (r == 0)
> +			break;
> +
> +		write_or_die(fd, buf, r);
> +
> +		stream.next_in = buf;
> +		stream.avail_in = r;
> +		do {
> +			unsigned char inflated[4096];
> +			unsigned long got;
> +
> +			stream.next_out = inflated;
> +			stream.avail_out = sizeof(inflated);
> +			zret = git_inflate(&stream, Z_SYNC_FLUSH);
> +			got = sizeof(inflated) - stream.avail_out;
> +
> +			git_SHA1_Update(&hash, inflated, got);
> +			/* skip header when counting size */
> +			if (!total_got) {
> +				const unsigned char *p = memchr(inflated, '\0', got);
> +				if (p) {
> +					unsigned long hdr_last = p - inflated + 1;
> +					strbuf_add(&header, inflated, hdr_last);
> +					got -= hdr_last;
> +				} else {
> +					strbuf_add(&header, inflated, got);
> +					got = 0;
> +				}
> +			}
> +			total_got += got;
> +		} while (stream.avail_in && zret == Z_OK);
> +	}
> +
> +	close(cmd.child.out);
> +	git_inflate_end(&stream);
> +	git_SHA1_Final(real_sha1, &hash);
> +	if (odb_helper_finish(o, &cmd))
> +		return -1;
> +	if (zret != Z_STREAM_END) {
> +		warning("bad zlib data from odb helper '%s' for %s",
> +			o->name, sha1_to_hex(sha1));
> +		return -1;
> +	}
> +	if (total_got != obj->size) {
> +		warning("size mismatch from odb helper '%s' for %s (%lu != %lu)",
> +			o->name, sha1_to_hex(sha1), total_got, obj->size);
> +		return -1;
> +	}
> +	if (hashcmp(real_sha1, sha1)) {
> +		warning("sha1 mismatch from odb helper '%s' for %s (got %s)",
> +			o->name, sha1_to_hex(sha1), sha1_to_hex(real_sha1));
> +		return -1;
> +	}
> +	if (parse_sha1_header(header.buf, &hdr_size) < 0) {
> +		warning("could not parse header from odb helper '%s' for %s",
> +			o->name, sha1_to_hex(sha1));
> +		return -1;
> +	}
> +	if (total_got != hdr_size) {
> +		warning("size mismatch from odb helper '%s' for %s (%lu != %lu)",
> +			o->name, sha1_to_hex(sha1), total_got, hdr_size);
> +		return -1;
> +	}

Does "strbuf header" need to be released before all of the returns?

> +
> +	return 0;
> +}

If I understand this function, it is receiving an object from the odb
helper sub-command and creating a loose object for it in the local odb.
The object is compressed on the wire and this function unzips it to
verify the object is completely received and without errors.

But this function does not keep the resulting object around in memory
(which might not fit anyway).  I'm assuming that the higher-layer that
needs the object will just be told to read the new loose object.


>   int odb_helper_get_direct(struct odb_helper *o,
>   			  const unsigned char *sha1)
>   {
> diff --git a/odb-helper.h b/odb-helper.h
> index f4bc66b0ef..90b279c07e 100644
> --- a/odb-helper.h
> +++ b/odb-helper.h
> @@ -20,6 +20,10 @@ struct odb_helper {
>   extern struct odb_helper *odb_helper_new(const char *name, int namelen);
>   extern int odb_helper_has_object(struct odb_helper *o,
>   				 const unsigned char *sha1);
> +extern int odb_helper_get_object(struct odb_helper *o,
> +				 const unsigned char *sha1,
> +				 int fd);
>   extern int odb_helper_get_direct(struct odb_helper *o,
>   				 const unsigned char *sha1);
> +
>   #endif /* ODB_HELPER_H */
> diff --git a/sha1_file.c b/sha1_file.c
> index 785e8dda03..f1d688c10a 100644
> --- a/sha1_file.c
> +++ b/sha1_file.c
> @@ -904,6 +904,9 @@ static int stat_sha1_file(const unsigned char *sha1, struct stat *st,
>   			return 0;
>   	}
>   
> +	if (!external_odb_get_object(sha1) && !lstat(*path, st))
> +		return 0;
> +
>   	return -1;
>   }
>   
> @@ -939,7 +942,14 @@ static int open_sha1_file(const unsigned char *sha1, const char **path)
>   	if (fd >= 0)
>   		return fd;
>   
> -	return open_sha1_file_alt(sha1, path);
> +	fd = open_sha1_file_alt(sha1, path);
> +	if (fd >= 0)
> +		return fd;
> +
> +	if (!external_odb_get_object(sha1))
> +		fd = open_sha1_file_alt(sha1, path);
> +
> +	return fd;
>   }
>   
>   /*
> diff --git a/t/t0400-external-odb.sh b/t/t0400-external-odb.sh
> new file mode 100755
> index 0000000000..eaa3688a31
> --- /dev/null
> +++ b/t/t0400-external-odb.sh
> @@ -0,0 +1,44 @@
> +#!/bin/sh
> +
> +test_description='basic tests for external object databases'
> +
> +. ./test-lib.sh
> +
> +ALT_SOURCE="$PWD/alt-repo/.git"
> +export ALT_SOURCE
> +write_script odb-helper <<\EOF
> +GIT_DIR=$ALT_SOURCE; export GIT_DIR
> +case "$1" in
> +have)
> +	git cat-file --batch-check --batch-all-objects |
> +	awk '{print $1 " " $3 " " $2}'
> +	;;
> +get_git_obj)
> +	cat "$GIT_DIR"/objects/$(echo $2 | sed 's#..#&/#')
> +	;;
> +esac
> +EOF
> +HELPER="\"$PWD\"/odb-helper"
> +
> +test_expect_success 'setup alternate repo' '
> +	git init alt-repo &&
> +	test_commit -C alt-repo one &&
> +	test_commit -C alt-repo two &&
> +	alt_head=$(git -C alt-repo rev-parse HEAD)
> +'
> +
> +test_expect_success 'alt objects are missing' '
> +	test_must_fail git log --format=%s $alt_head
> +'
> +
> +test_expect_success 'helper can retrieve alt objects' '
> +	test_config odb.magic.scriptCommand "$HELPER" &&
> +	cat >expect <<-\EOF &&
> +	two
> +	one
> +	EOF
> +	git log --format=%s $alt_head >actual &&
> +	test_cmp expect actual
> +'
> +
> +test_done
> 

Jeff


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 01/40] Add initial external odb support
  2018-01-03 16:33 ` [PATCH 01/40] Add initial external odb support Christian Couder
@ 2018-01-04 19:59   ` Jeff Hostetler
  2018-01-15 14:34     ` Christian Couder
  0 siblings, 1 reply; 51+ messages in thread
From: Jeff Hostetler @ 2018-01-04 19:59 UTC (permalink / raw)
  To: Christian Couder, git
  Cc: Junio C Hamano, Jeff King, Ben Peart, Jonathan Tan,
	Nguyen Thai Ngoc Duy, Mike Hommey, Lars Schneider, Eric Wong,
	Christian Couder, Jeff Hostetler



On 1/3/2018 11:33 AM, Christian Couder wrote:
> The external-odb.{c,h} files will contain the functions
> that are called by the rest of Git mostly from
> "sha1_file.c" to access the objects managed by the
> external odbs.
> 
> The odb-helper.{c,h} files will contain the functions to
> actually implement communication with either the internal
> functions or the external scripts or processes that will
> manage and provide external git objects.
> 
> For now only infrastructure to create helpers from the
> config and to manage a cache for the 'have' command is
> implemented.
> 
> Helped-by: Jeff King <peff@peff.net>
> Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
> ---
[...]
> diff --git a/odb-helper.h b/odb-helper.h
> new file mode 100644
> index 0000000000..9395e606ce
> --- /dev/null
> +++ b/odb-helper.h
> @@ -0,0 +1,24 @@
> +#ifndef ODB_HELPER_H
> +#define ODB_HELPER_H
> +
> +struct odb_helper {
> +	const char *name;
> +	const char *dealer;
> +
> +	struct odb_helper_object {
> +		unsigned char sha1[20];

Should this be "struct object_id" ?

> +		unsigned long size;
> +		enum object_type type;
> +	} *have;
> +	int have_nr;
> +	int have_alloc;
> +	int have_valid;
> +
> +	struct odb_helper *next;
> +};
> +
> +extern struct odb_helper *odb_helper_new(const char *name, int namelen);
> +extern int odb_helper_has_object(struct odb_helper *o,
> +				 const unsigned char *sha1);
> +
> +#endif /* ODB_HELPER_H */

Jeff


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 28/40] pack-objects: don't pack objects in external odbs
  2018-01-03 16:33 ` [PATCH 28/40] pack-objects: don't pack objects in external odbs Christian Couder
@ 2018-01-04 20:54   ` Jeff Hostetler
  2018-03-19 13:27     ` Christian Couder
  0 siblings, 1 reply; 51+ messages in thread
From: Jeff Hostetler @ 2018-01-04 20:54 UTC (permalink / raw)
  To: Christian Couder, git
  Cc: Junio C Hamano, Jeff King, Ben Peart, Jonathan Tan,
	Nguyen Thai Ngoc Duy, Mike Hommey, Lars Schneider, Eric Wong,
	Christian Couder, Jeff Hostetler



On 1/3/2018 11:33 AM, Christian Couder wrote:
> Objects managed by an external ODB should not be put into
> pack files. They should be transfered using other mechanism
> that can be specific to the external odb.
> 
> Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
> ---
>   builtin/pack-objects.c | 4 ++++
>   1 file changed, 4 insertions(+)
> 
> diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
> index 6c71552cdf..4ed66c7677 100644
> --- a/builtin/pack-objects.c
> +++ b/builtin/pack-objects.c
> @@ -28,6 +28,7 @@
>   #include "argv-array.h"
>   #include "mru.h"
>   #include "packfile.h"
> +#include "external-odb.h"
>   
>   static const char *pack_usage[] = {
>   	N_("git pack-objects --stdout [<options>...] [< <ref-list> | < <object-list>]"),
> @@ -1026,6 +1027,9 @@ static int want_object_in_pack(const struct object_id *oid,
>   			return want;
>   	}
>   
> +	if (external_odb_has_object(oid->hash))
> +		return 0;
> +

I worry about the performance of this in light of my comments
earlier in the patch series about the expense of building the
"have" sets.

Since we've already checked for a loose object and we are about
to walk thru the local packfiles, so if we don't find it in any
of them, then we don't have it locally.  Only then do we need
to worry about external odbs.

If we don't have it locally, does the caller of this function
have sufficient "promisor" infomation infer that the object
should exist on the promisor remote?   Since you're going to
omit it from the packfile anyway, you don't really need to know
if the remote actually has it.


>   	for (entry = packed_git_mru.head; entry; entry = entry->next) {
>   		struct packed_git *p = entry->item;
>   		off_t offset;
> 

Jeff

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 01/40] Add initial external odb support
  2018-01-04 19:59   ` Jeff Hostetler
@ 2018-01-15 14:34     ` Christian Couder
  0 siblings, 0 replies; 51+ messages in thread
From: Christian Couder @ 2018-01-15 14:34 UTC (permalink / raw)
  To: Jeff Hostetler
  Cc: git, Junio C Hamano, Jeff King, Ben Peart, Jonathan Tan,
	Nguyen Thai Ngoc Duy, Mike Hommey, Lars Schneider, Eric Wong,
	Christian Couder, Jeff Hostetler

On Thu, Jan 4, 2018 at 8:59 PM, Jeff Hostetler <git@jeffhostetler.com> wrote:
>
>> diff --git a/odb-helper.h b/odb-helper.h
>> new file mode 100644
>> index 0000000000..9395e606ce
>> --- /dev/null
>> +++ b/odb-helper.h
>> @@ -0,0 +1,24 @@
>> +#ifndef ODB_HELPER_H
>> +#define ODB_HELPER_H
>> +
>> +struct odb_helper {
>> +       const char *name;
>> +       const char *dealer;
>> +
>> +       struct odb_helper_object {
>> +               unsigned char sha1[20];
>
>
> Should this be "struct object_id" ?

Yeah, I changed this in my current version.

I cannot change all the functions to take a struct object_id though as
some are called inside sha1_file.c where there are only sha1.

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 10/40] external-odb: implement external_odb_get_direct
  2018-01-04 17:44   ` Jeff Hostetler
@ 2018-01-15 14:47     ` Christian Couder
  0 siblings, 0 replies; 51+ messages in thread
From: Christian Couder @ 2018-01-15 14:47 UTC (permalink / raw)
  To: Jeff Hostetler
  Cc: git, Junio C Hamano, Jeff King, Ben Peart, Jonathan Tan,
	Nguyen Thai Ngoc Duy, Mike Hommey, Lars Schneider, Eric Wong,
	Christian Couder, Jeff Hostetler

On Thu, Jan 4, 2018 at 6:44 PM, Jeff Hostetler <git@jeffhostetler.com> wrote:
>
>
> On 1/3/2018 11:33 AM, Christian Couder wrote:
>>
>> This is implemented only in the promisor remote mode
>> for now by calling fetch_object().
>>
>> Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
>> ---
>>   external-odb.c | 15 +++++++++++++++
>>   external-odb.h |  1 +
>>   odb-helper.c   | 13 +++++++++++++
>>   odb-helper.h   |  3 ++-
>>   4 files changed, 31 insertions(+), 1 deletion(-)
>>
>> diff --git a/external-odb.c b/external-odb.c
>> index d26e63d8b1..5d0afb9762 100644
>> --- a/external-odb.c
>> +++ b/external-odb.c
>> @@ -76,3 +76,18 @@ int external_odb_has_object(const unsigned char *sha1)
>>                         return 1;
>>         return 0;
>>   }
>> +
>> +int external_odb_get_direct(const unsigned char *sha1)
>> +{
>> +       struct odb_helper *o;
>> +
>> +       external_odb_init();
>> +
>> +       for (o = helpers; o; o = o->next) {
>> +               if (odb_helper_get_direct(o, sha1) < 0)
>> +                       continue;
>> +               return 0;
>
>> +     }
>
> Would this be simpler said as:
>         for (o = ...)
>                 if (!odb_helper_get_direct(...))
>                         return 0;

At then end of the series the content of the loop is:

        if (!(o->supported_capabilities & ODB_HELPER_CAP_GET_DIRECT))
            continue;
        if (odb_helper_get_direct(o, sha1) < 0)
            continue;
        return 0;

And I think it is fine like that, so I don't think changing this
commit is a good idea.

>> +       return -1;
>> +}
>> diff --git a/external-odb.h b/external-odb.h
>> index 9a3c2f01b3..fd6708163e 100644
>> --- a/external-odb.h
>> +++ b/external-odb.h
>> @@ -4,5 +4,6 @@
>>   extern int has_external_odb(void);
>>   extern const char *external_odb_root(void);
>>   extern int external_odb_has_object(const unsigned char *sha1);
>> +extern int external_odb_get_direct(const unsigned char *sha1);
>>     #endif /* EXTERNAL_ODB_H */
>> diff --git a/odb-helper.c b/odb-helper.c
>> index 1404393807..4b70b287af 100644
>> --- a/odb-helper.c
>> +++ b/odb-helper.c
>> @@ -4,6 +4,7 @@
>>   #include "odb-helper.h"
>>   #include "run-command.h"
>>   #include "sha1-lookup.h"
>> +#include "fetch-object.h"
>>     struct odb_helper *odb_helper_new(const char *name, int namelen)
>>   {
>> @@ -52,3 +53,15 @@ int odb_helper_has_object(struct odb_helper *o, const
>> unsigned char *sha1)
>>         return !!odb_helper_lookup(o, sha1);
>>   }
>>   +int odb_helper_get_direct(struct odb_helper *o,
>> +                         const unsigned char *sha1)
>> +{
>> +       int res = 0;
>> +       uint64_t start = getnanotime();
>> +
>> +       fetch_object(o->dealer, sha1);
>> +
>> +       trace_performance_since(start, "odb_helper_get_direct");
>> +
>> +       return res;
>
>
> 'res' will always be 0, so the external_odb_get_direct() will
> only do the first helper.  i haven't looked at the rest of the
> series yet, so maybe you've already addressed this.

That's why I previously suggested in one of your or Jonathan's patch
that fetch_object() should return an int that tells the caller if the
object has been fetched instead of void.

If we make it possible at one point to have the objects fetched
fetch_object() in different remotes (and I think that's a
straightforward goal), this will actually fail but callers will have
no simple way to know that.

> Also, I put a TODO comment in the fetch_object() header to
> consider returning an error/success, so maybe that could help
> here too.

Yeah, indeed.

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 14/40] sha1_file: prepare for external odbs
  2018-01-04 18:00   ` Jeff Hostetler
@ 2018-01-16  7:23     ` Christian Couder
  0 siblings, 0 replies; 51+ messages in thread
From: Christian Couder @ 2018-01-16  7:23 UTC (permalink / raw)
  To: Jeff Hostetler
  Cc: git, Junio C Hamano, Jeff King, Ben Peart, Jonathan Tan,
	Nguyen Thai Ngoc Duy, Mike Hommey, Lars Schneider, Eric Wong,
	Christian Couder, Jeff Hostetler

On Thu, Jan 4, 2018 at 7:00 PM, Jeff Hostetler <git@jeffhostetler.com> wrote:
>
>
> On 1/3/2018 11:33 AM, Christian Couder wrote:

>> diff --git a/sha1_file.c b/sha1_file.c
>> index 261baf800f..785e8dda03 100644
>> --- a/sha1_file.c
>> +++ b/sha1_file.c
>> @@ -322,17 +322,22 @@ static void fill_sha1_path(struct strbuf *buf, const
>> unsigned char *sha1)
>>         }
>>   }
>>   -const char *sha1_file_name(const unsigned char *sha1)
>> +const char *sha1_file_name_alt(const char *objdir, const unsigned char
>> *sha1)
>>   {
>>         static struct strbuf buf = STRBUF_INIT;
>
> While we are refactoring sha1_file_name() and adding
> sha1_file_name_alt(), could we also change the API and
> pass in the strbuf so we can get rid of the static buffer?
> Granted, it is a little off topic, but it will help out
> in the long run.

Ok, but I prefer to do that in a separate patch series, so I just sent:

https://public-inbox.org/git/20180116071814.19884-1-chriscool@tuxfamily.org

>> @@ -1551,7 +1562,7 @@ static inline int directory_size(const char
>> *filename)
>>    * We want to avoid cross-directory filename renames, because those
>>    * can have problems on various filesystems (FAT, NFS, Coda).
>>    */
>> -static int create_tmpfile(struct strbuf *tmp, const char *filename)
>> +int create_object_tmpfile(struct strbuf *tmp, const char *filename)
>>   {
>>         int fd, dirlen = directory_size(filename);
>>   @@ -1591,7 +1602,7 @@ static int write_loose_object(const unsigned char
>> *sha1, char *hdr, int hdrlen,
>>         static struct strbuf tmp_file = STRBUF_INIT;
>
> Same thing here, since we are renaming the function anyway, could we
> add a strbuf arg and get rid of the static one?

I will see how it goes with the above patch to remove the static
buffer in sha1_file_name() before preparing a patch to remove the
static buffer in create_tmpfile().

Thanks,
Christian.

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 15/40] external-odb: add script mode support
  2018-01-04 19:55   ` Jeff Hostetler
@ 2018-03-19 13:15     ` Christian Couder
  0 siblings, 0 replies; 51+ messages in thread
From: Christian Couder @ 2018-03-19 13:15 UTC (permalink / raw)
  To: Jeff Hostetler
  Cc: git, Junio C Hamano, Jeff King, Ben Peart, Jonathan Tan,
	Nguyen Thai Ngoc Duy, Mike Hommey, Lars Schneider, Eric Wong,
	Christian Couder, Jeff Hostetler

On Thu, Jan 4, 2018 at 8:55 PM, Jeff Hostetler <git@jeffhostetler.com> wrote:
>
> On 1/3/2018 11:33 AM, Christian Couder wrote:
>>
>> diff --git a/odb-helper.c b/odb-helper.c
>> index 4b70b287af..c1a3443dc7 100644
>> --- a/odb-helper.c
>> +++ b/odb-helper.c
>> @@ -21,13 +21,124 @@ struct odb_helper_cmd {
>>         struct child_process child;
>>   };
>>   +/*
>> + * Callers are responsible to ensure that the result of vaddf(fmt, ap)
>> + * is properly shell-quoted.
>> + */
>> +static void prepare_helper_command(struct argv_array *argv, const char
>> *cmd,
>> +                                  const char *fmt, va_list ap)
>> +{
>> +       struct strbuf buf = STRBUF_INIT;
>> +
>> +       strbuf_addstr(&buf, cmd);
>> +       strbuf_addch(&buf, ' ');
>> +       strbuf_vaddf(&buf, fmt, ap);
>
> I do find this a bit odd that you're putting the cmd, a space,
> and the printf results into a single arg in the argv, rather than
> directly loading up the argv.
>
> Is there an issue with the whitespace between the cmd and the
> printf result being in the same arg -- especially if there are
> quoting issues in the fmt string as you mention in the comment
> above?  (not sure, just asking)

This was discussed with Junio here:

https://public-inbox.org/git/xmqqmvggbl6m.fsf@gitster.mtv.corp.google.com/

I agree that I should take another look at it though. I will do that
in the next version after the one I will send really soon now.

>> +static int parse_object_line(struct odb_helper_object *o, const char
>> *line)
>> +{
>
> Is there a reason to order the fields this way?  In the test
> at the bottom of this commit, you take cat-file output and
> re-order the columns with awk.   I'm just wondering if we kept
> cat-file ordering in the format here, we could simplify things.

Yeah, maybe the shell script could be simplified while the C code
would not be more complex. I don't remember if that was in the
original version from Peff or if there was a reason to do it this way.
I will take a look at that in the next version after the one I will
send really soon now.

>> +       char *end;
>> +       if (get_sha1_hex(line, o->sha1) < 0)
>> +               return -1;
>> +
>> +       line += 40;
>> +       if (*line++ != ' ')
>> +               return -1;
>> +
>> +       o->size = strtoul(line, &end, 10);
>> +       if (line == end || *end++ != ' ')
>> +               return -1;
>> +
>> +       o->type = type_from_string(end);
>> +       return 0;
>> +}
>> +
>> +static int add_have_entry(struct odb_helper *o, const char *line)
>> +{
>> +       ALLOC_GROW(o->have, o->have_nr+1, o->have_alloc);
>
> I didn't see where o->have is initially allocated.  The default is
> to start with 64 and then grow by 3/2 as needed.  If we are getting
> lots of objects here, we'll have lots of reallocs slowing things down.
> It would be better to seed it somewhere to a large value.

Yeah but using this is optional. It depends on the cap_have
capability. And I think that if there is a really huge number of
objects in the external odb, it might be better to just not use
cap_have.

Another possibility would be for the helper to send the number of
objects it has first, so that we could alloc the right number before
receiving the haves, but this could require the helper to be more
complex.

>> +       if (parse_object_line(&o->have[o->have_nr], line) < 0) {
>> +               warning("bad 'have' input from odb helper '%s': %s",
>> +                       o->name, line);
>> +               return 1;
>> +       }
>> +       o->have_nr++;
>> +       return 0;
>> +}
>> +
>> +static int odb_helper_object_cmp(const void *va, const void *vb)
>> +{
>> +       const struct odb_helper_object *a = va, *b = vb;
>> +       return hashcmp(a->sha1, b->sha1);
>> +}
>> +
>>   static void odb_helper_load_have(struct odb_helper *o)
>>   {
>> +       struct odb_helper_cmd cmd;
>> +       FILE *fh;
>> +       struct strbuf line = STRBUF_INIT;
>> +
>>         if (o->have_valid)
>>                 return;
>>         o->have_valid = 1;
>>   -     /* TODO */
>> +       if (odb_helper_start(o, &cmd, "have") < 0)
>> +               return;
>> +
>> +       fh = xfdopen(cmd.child.out, "r");
>> +       while (strbuf_getline(&line, fh) != EOF)
>> +               if (add_have_entry(o, line.buf))
>> +                       break;
>> +
>> +       strbuf_release(&line);
>> +       fclose(fh);
>> +       odb_helper_finish(o, &cmd);
>> +
>> +       qsort(o->have, o->have_nr, sizeof(*o->have),
>> odb_helper_object_cmp);
>>   }
>
> Help me understand this.  I originally thought that the "have"
> command would ask about one or more specific OIDs, but after a
> couple of readings it looks like the "have" command is getting the
> *complete* list of objects known to this external ODB and building
> a sorted array of them.  And then we do this for each external ODB
> configured.
>
> If this is the case, I'm concerned that this will have scale problems.
> "git cat-file..." shows that even my little git.git repo has 360K
> objects.  And "time git cat-file..." takes over 1.1 seconds.

Yeah, but originally external odbs were not supposed to contain a huge
number of objects. And now that cap_have is optional, it should
probably not be used if indeed the external odb contains a huge number
of objects.

> I'm wondering if there is a better/different way to do this.
> (Sorry if you've already covered this in earlier versions of this
> patch series and I missed it.)
>
> I'm wondering about "struct odb_helper" maintain a long-running
> connection to the sub-command which would let git ask for the object
> and either get it back (as you have below) or get an not-found error.
> The sub-command would then wait for another get-object request over
> stdin.
>
> I say this because whatever operation I'm doing (like a checkout or
> log or blame) is only going to need a small percentage of the total
> set of objects.  I think it would be more efficient to try/retry to
> fault them in as-needed via one or more external helpers than to build
> these tables in advance.

This is possible when using the "process mode" and not using cap_have.
I reused Ben Peart's work to make that possible by the way.

>>   static const unsigned char *have_sha1_access(size_t index, void *table)
>> @@ -53,6 +164,111 @@ int odb_helper_has_object(struct odb_helper *o, const
>> unsigned char *sha1)
>>         return !!odb_helper_lookup(o, sha1);
>>   }
>>   +int odb_helper_get_object(struct odb_helper *o, const unsigned char
>> *sha1,
>> +                           int fd)
>> +{
>> +       struct odb_helper_object *obj;
>> +       struct odb_helper_cmd cmd;
>> +       unsigned long total_got;
>> +       git_zstream stream;
>> +       int zret = Z_STREAM_END;
>> +       git_SHA_CTX hash;
>> +       unsigned char real_sha1[20];
>> +       struct strbuf header = STRBUF_INIT;
>> +       unsigned long hdr_size;
>> +
>> +       obj = odb_helper_lookup(o, sha1);
>> +       if (!obj)
>> +               return -1;
>> +
>> +       if (odb_helper_start(o, &cmd, "get_git_obj %s", sha1_to_hex(sha1))
>> < 0)
>> +               return -1;
>> +
>> +       memset(&stream, 0, sizeof(stream));
>> +       git_inflate_init(&stream);
>> +       git_SHA1_Init(&hash);
>> +       total_got = 0;
>> +
>> +       for (;;) {
>> +               unsigned char buf[4096];
>> +               int r;
>> +
>> +               r = xread(cmd.child.out, buf, sizeof(buf));
>> +               if (r < 0) {
>> +                       error("unable to read from odb helper '%s': %s",
>> +                             o->name, strerror(errno));
>> +                       close(cmd.child.out);
>> +                       odb_helper_finish(o, &cmd);
>> +                       git_inflate_end(&stream);
>> +                       return -1;
>> +               }
>> +               if (r == 0)
>> +                       break;
>> +
>> +               write_or_die(fd, buf, r);
>> +
>> +               stream.next_in = buf;
>> +               stream.avail_in = r;
>> +               do {
>> +                       unsigned char inflated[4096];
>> +                       unsigned long got;
>> +
>> +                       stream.next_out = inflated;
>> +                       stream.avail_out = sizeof(inflated);
>> +                       zret = git_inflate(&stream, Z_SYNC_FLUSH);
>> +                       got = sizeof(inflated) - stream.avail_out;
>> +
>> +                       git_SHA1_Update(&hash, inflated, got);
>> +                       /* skip header when counting size */
>> +                       if (!total_got) {
>> +                               const unsigned char *p = memchr(inflated,
>> '\0', got);
>> +                               if (p) {
>> +                                       unsigned long hdr_last = p -
>> inflated + 1;
>> +                                       strbuf_add(&header, inflated,
>> hdr_last);
>> +                                       got -= hdr_last;
>> +                               } else {
>> +                                       strbuf_add(&header, inflated,
>> got);
>> +                                       got = 0;
>> +                               }
>> +                       }
>> +                       total_got += got;
>> +               } while (stream.avail_in && zret == Z_OK);
>> +       }
>> +
>> +       close(cmd.child.out);
>> +       git_inflate_end(&stream);
>> +       git_SHA1_Final(real_sha1, &hash);
>> +       if (odb_helper_finish(o, &cmd))
>> +               return -1;
>> +       if (zret != Z_STREAM_END) {
>> +               warning("bad zlib data from odb helper '%s' for %s",
>> +                       o->name, sha1_to_hex(sha1));
>> +               return -1;
>> +       }
>> +       if (total_got != obj->size) {
>> +               warning("size mismatch from odb helper '%s' for %s (%lu !=
>> %lu)",
>> +                       o->name, sha1_to_hex(sha1), total_got, obj->size);
>> +               return -1;
>> +       }
>> +       if (hashcmp(real_sha1, sha1)) {
>> +               warning("sha1 mismatch from odb helper '%s' for %s (got
>> %s)",
>> +                       o->name, sha1_to_hex(sha1),
>> sha1_to_hex(real_sha1));
>> +               return -1;
>> +       }
>> +       if (parse_sha1_header(header.buf, &hdr_size) < 0) {
>> +               warning("could not parse header from odb helper '%s' for
>> %s",
>> +                       o->name, sha1_to_hex(sha1));
>> +               return -1;
>> +       }
>> +       if (total_got != hdr_size) {
>> +               warning("size mismatch from odb helper '%s' for %s (%lu !=
>> %lu)",
>> +                       o->name, sha1_to_hex(sha1), total_got, hdr_size);
>> +               return -1;
>> +       }
>
> Does "strbuf header" need to be released before all of the returns?

Yeah, thanks for noticing. This is fixed in the version I will send
really soon now.

>> +
>> +       return 0;
>> +}
>
> If I understand this function, it is receiving an object from the odb
> helper sub-command and creating a loose object for it in the local odb.
> The object is compressed on the wire and this function unzips it to
> verify the object is completely received and without errors.
>
> But this function does not keep the resulting object around in memory
> (which might not fit anyway).  I'm assuming that the higher-layer that
> needs the object will just be told to read the new loose object.

Yeah, that's how it works. I think I took most of the code and the way
this function works from the work Ben Peart had sent to the mailing
list.

Thanks,
Christian.

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 28/40] pack-objects: don't pack objects in external odbs
  2018-01-04 20:54   ` Jeff Hostetler
@ 2018-03-19 13:27     ` Christian Couder
  0 siblings, 0 replies; 51+ messages in thread
From: Christian Couder @ 2018-03-19 13:27 UTC (permalink / raw)
  To: Jeff Hostetler
  Cc: git, Junio C Hamano, Jeff King, Ben Peart, Jonathan Tan,
	Nguyen Thai Ngoc Duy, Mike Hommey, Lars Schneider, Eric Wong,
	Christian Couder, Jeff Hostetler

On Thu, Jan 4, 2018 at 9:54 PM, Jeff Hostetler <git@jeffhostetler.com> wrote:
>
> On 1/3/2018 11:33 AM, Christian Couder wrote:
>>
>> Objects managed by an external ODB should not be put into
>> pack files. They should be transfered using other mechanism
>> that can be specific to the external odb.
>>
>> Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
>> ---
>>   builtin/pack-objects.c | 4 ++++
>>   1 file changed, 4 insertions(+)
>>
>> diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
>> index 6c71552cdf..4ed66c7677 100644
>> --- a/builtin/pack-objects.c
>> +++ b/builtin/pack-objects.c
>> @@ -28,6 +28,7 @@
>>   #include "argv-array.h"
>>   #include "mru.h"
>>   #include "packfile.h"
>> +#include "external-odb.h"
>>     static const char *pack_usage[] = {
>>         N_("git pack-objects --stdout [<options>...] [< <ref-list> | <
>> <object-list>]"),
>> @@ -1026,6 +1027,9 @@ static int want_object_in_pack(const struct
>> object_id *oid,
>>                         return want;
>>         }
>>   +     if (external_odb_has_object(oid->hash))
>> +               return 0;
>
> I worry about the performance of this in light of my comments
> earlier in the patch series about the expense of building the
> "have" sets.

Building the have set is done only if the cap_have capability is used,
(see my answer to your comments on patch 15/40).

> Since we've already checked for a loose object and we are about
> to walk thru the local packfiles, so if we don't find it in any
> of them, then we don't have it locally.  Only then do we need
> to worry about external odbs.
>
> If we don't have it locally, does the caller of this function
> have sufficient "promisor" infomation infer that the object
> should exist on the promisor remote?   Since you're going to
> omit it from the packfile anyway, you don't really need to know
> if the remote actually has it.

Yeah, I think it works in the same way as how the promisor code works,
but I haven't checked carefully, so I will take a look at that.

Thanks,
Christian.

^ permalink raw reply	[flat|nested] 51+ messages in thread

end of thread, other threads:[~2018-03-19 13:27 UTC | newest]

Thread overview: 51+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-01-03 16:33 [PATCH 00/40] Promisor remotes and external ODB support Christian Couder
2018-01-03 16:33 ` [PATCH 01/40] Add initial external odb support Christian Couder
2018-01-04 19:59   ` Jeff Hostetler
2018-01-15 14:34     ` Christian Couder
2018-01-03 16:33 ` [PATCH 02/40] Add GIT_NO_EXTERNAL_ODB env variable Christian Couder
2018-01-03 16:33 ` [PATCH 03/40] external-odb: add has_external_odb() Christian Couder
2018-01-03 16:33 ` [PATCH 04/40] fsck: introduce promisor objects Christian Couder
2018-01-03 16:33 ` [PATCH 05/40] fsck: support refs pointing to " Christian Couder
2018-01-03 16:33 ` [PATCH 06/40] fsck: support referenced " Christian Couder
2018-01-03 16:33 ` [PATCH 07/40] fsck: support promisor objects as CLI argument Christian Couder
2018-01-03 16:33 ` [PATCH 08/40] index-pack: refactor writing of .keep files Christian Couder
2018-01-03 16:33 ` [PATCH 09/40] introduce fetch-object: fetch one promisor object Christian Couder
2018-01-03 16:33 ` [PATCH 10/40] external-odb: implement external_odb_get_direct Christian Couder
2018-01-04 17:44   ` Jeff Hostetler
2018-01-15 14:47     ` Christian Couder
2018-01-03 16:33 ` [PATCH 11/40] sha1_file: support lazily fetching missing objects Christian Couder
2018-01-03 16:33 ` [PATCH 12/40] rev-list: support termination at promisor objects Christian Couder
2018-01-03 16:33 ` [PATCH 13/40] gc: do not repack promisor packfiles Christian Couder
2018-01-03 16:33 ` [PATCH 14/40] sha1_file: prepare for external odbs Christian Couder
2018-01-04 18:00   ` Jeff Hostetler
2018-01-16  7:23     ` Christian Couder
2018-01-03 16:33 ` [PATCH 15/40] external-odb: add script mode support Christian Couder
2018-01-04 19:55   ` Jeff Hostetler
2018-03-19 13:15     ` Christian Couder
2018-01-03 16:33 ` [PATCH 16/40] odb-helper: add 'enum odb_helper_type' Christian Couder
2018-01-03 16:33 ` [PATCH 17/40] odb-helper: add odb_helper_init() to send 'init' instruction Christian Couder
2018-01-03 16:33 ` [PATCH 18/40] t0400: add 'put_raw_obj' instruction to odb-helper script Christian Couder
2018-01-03 16:33 ` [PATCH 19/40] external odb: add 'put_raw_obj' support Christian Couder
2018-01-03 16:33 ` [PATCH 20/40] external-odb: accept only blobs for now Christian Couder
2018-01-03 16:33 ` [PATCH 21/40] t0400: add test for external odb write support Christian Couder
2018-01-03 16:33 ` [PATCH 22/40] Add t0410 to test external ODB transfer Christian Couder
2018-01-03 16:33 ` [PATCH 23/40] lib-httpd: pass config file to start_httpd() Christian Couder
2018-01-03 16:33 ` [PATCH 24/40] lib-httpd: add upload.sh Christian Couder
2018-01-03 16:33 ` [PATCH 25/40] lib-httpd: add list.sh Christian Couder
2018-01-03 16:33 ` [PATCH 26/40] lib-httpd: add apache-e-odb.conf Christian Couder
2018-01-03 16:33 ` [PATCH 27/40] odb-helper: add odb_helper_get_raw_object() Christian Couder
2018-01-03 16:33 ` [PATCH 28/40] pack-objects: don't pack objects in external odbs Christian Couder
2018-01-04 20:54   ` Jeff Hostetler
2018-03-19 13:27     ` Christian Couder
2018-01-03 16:33 ` [PATCH 29/40] Add t0420 to test transfer to HTTP external odb Christian Couder
2018-01-03 16:33 ` [PATCH 30/40] external-odb: add 'get_direct' support Christian Couder
2018-01-03 16:33 ` [PATCH 31/40] odb-helper: add 'script_mode' to 'struct odb_helper' Christian Couder
2018-01-03 16:33 ` [PATCH 32/40] odb-helper: add init_object_process() Christian Couder
2018-01-03 16:33 ` [PATCH 33/40] Add t0450 to test 'get_direct' mechanism Christian Couder
2018-01-03 16:33 ` [PATCH 34/40] Add t0460 to test passing git objects Christian Couder
2018-01-03 16:33 ` [PATCH 35/40] odb-helper: add put_object_process() Christian Couder
2018-01-03 16:33 ` [PATCH 36/40] Add t0470 to test passing raw objects Christian Couder
2018-01-03 16:34 ` [PATCH 37/40] odb-helper: add have_object_process() Christian Couder
2018-01-03 16:34 ` [PATCH 38/40] Add t0480 to test "have" capability and raw objects Christian Couder
2018-01-03 16:34 ` [PATCH 39/40] external-odb: use 'odb=magic' attribute to mark odb blobs Christian Couder
2018-01-03 16:34 ` [PATCH 40/40] Add Documentation/technical/external-odb.txt Christian Couder

git@vger.kernel.org list mirror (unofficial, one of many)

This inbox may be cloned and mirrored by anyone:

	git clone --mirror https://public-inbox.org/git
	git clone --mirror http://ou63pmih66umazou.onion/git
	git clone --mirror http://czquwvybam4bgbro.onion/git
	git clone --mirror http://hjrcffqmbrq6wope.onion/git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V1 git git/ https://public-inbox.org/git \
		git@vger.kernel.org
	public-inbox-index git

Example config snippet for mirrors.
Newsgroups are available over NNTP:
	nntp://news.public-inbox.org/inbox.comp.version-control.git
	nntp://7fh6tueqddpjyxjmgtdiueylzoqt6pt7hec3pukyptlmohoowvhde4yd.onion/inbox.comp.version-control.git
	nntp://ie5yzdi7fg72h7s4sdcztq5evakq23rdt33mfyfcddc5u3ndnw24ogqd.onion/inbox.comp.version-control.git
	nntp://4uok3hntl7oi7b4uf4rtfwefqeexfzil2w6kgk2jn5z2f764irre7byd.onion/inbox.comp.version-control.git
	nntp://news.gmane.io/gmane.comp.version-control.git
 note: .onion URLs require Tor: https://www.torproject.org/

code repositories for project(s) associated with this inbox:

	https://80x24.org/mirrors/git.git

AGPL code for this site: git clone https://public-inbox.org/public-inbox.git