git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH v4 0/9] Introducing remote ODBs
@ 2018-08-02  6:14 Christian Couder
  2018-08-02  6:14 ` [PATCH v4 1/9] fetch-object: make functions return an error code Christian Couder
                   ` (8 more replies)
  0 siblings, 9 replies; 23+ messages in thread
From: Christian Couder @ 2018-08-02  6:14 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Ben Peart, Jonathan Tan,
	Nguyen Thai Ngoc Duy, Mike Hommey, Lars Schneider, Eric Wong,
	Christian Couder, Jeff Hostetler, Eric Sunshine, Beat Bolli

This path series is a follow up from the patch series called "odb
remote" that I sent earlier this year, which was itself a follow up
from previous series. See the links section for more information.

As with the previous "odb remote" series, this series is only about
integrating with the promisor/narrow clone work and showing that it
makes it possible to use more than one promisor remote. Everything
that is not necessary for that integration has been removed for now
(though you can still find it in one of my branches on GitHub if you
want).

There is one test in patch 8/9 that shows that more than one promisor
remote can now be used, but I still feel that it could be interesting
to add other such tests, so I am open to ideas in this area.

Changes compared to V3 of this patch series
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

  - the remote_odb_reinit() function is not "inline" anymore in patch
    5/9 as suggested by Beat Bolli in:

    https://public-inbox.org/git/20180724215223.27516-3-dev+git@drbeat.li/

High level overview of this patch series
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

All the patches except 5/9 are unchanged compared to V3.

  - Patch 1/9:

This makes functions in fetch-object.c return an error code, which is
necessary to later tell that they failed and try another remote odb
when there is more than one. This could also just be seen as a fix to
these functions.

  - Patch 2/9:

This introduces the minimum infrastructure for remote odbs.

  - Patches 3/9 and 4/9:

These patches implement remote_odb_get_direct() and
remote_odb_get_many_direct() using the functions from fetch-object.c.
These new functions will be used in following patches to replace the
functions from fetch-object.c.

  - Patch 5/9:

This implement remote_odb_reinit() which will be needed to reparse the
remote odb configuration.

  - Patches 6/9 and 7/9:

These patches integrate the remote odb mechanism into the
promisor/narrow clone code. The "extensions.partialClone" config
option is replaced by "odb.<name>.promisorRemote" and
"core.partialCloneFilter" is replaced by
"odb.<name>.partialCloneFilter".

  - Patch 8/9:

This adds a test case that shows that now more than one promisor
remote can be used.

  - Patch 9/9:

This starts documenting the remote odb mechanism.

Discussion
~~~~~~~~~~

I am not sure that it is ok to completely replace the
"extensions.partialclone" config option. Even if it is fully replaced,
no "extensions.remoteodb" is implemented in these patches, as maybe
the "extensions.partialclone" name could be kept even if the
underlying mechanism is the remote odb mechanism.

Anyway I think that the remote odb mechanism is much more extensible,
so I think using "extensions.partialclone" to specify a promisor
remote should be at least deprecated.

Links
~~~~~

This patch series on GitHub:

V4: https://github.com/chriscool/git/commits/remote-odb
V3: https://github.com/chriscool/git/commits/remote-odb3
V2: https://github.com/chriscool/git/commits/remote-odb2
V1: https://github.com/chriscool/git/commits/remote-odb1

Discussions related to previous versions:

V3: https://public-inbox.org/git/20180713174959.16748-1-chriscool@tuxfamily.org/
V2: https://public-inbox.org/git/20180630083542.20347-1-chriscool@tuxfamily.org/
V1: https://public-inbox.org/git/20180623121846.19750-1-chriscool@tuxfamily.org/

Previous "odb remote" series:

https://public-inbox.org/git/20180513103232.17514-1-chriscool@tuxfamily.org/
https://github.com/chriscool/git/commits/odb-remote

Version 1 and 2 of the "Promisor remotes and external ODB support" series:

https://public-inbox.org/git/20180103163403.11303-1-chriscool@tuxfamily.org/
https://public-inbox.org/git/20180319133147.15413-1-chriscool@tuxfamily.org/

Version 1 and 2 of the "Promisor remotes and external ODB support" series on GitHub:

https://github.com/chriscool/git/commits/gl-small-promisor-external-odb12
https://github.com/chriscool/git/commits/gl-small-promisor-external-odb71

Christian Couder (9):
  fetch-object: make functions return an error code
  Add initial remote odb support
  remote-odb: implement remote_odb_get_direct()
  remote-odb: implement remote_odb_get_many_direct()
  remote-odb: add remote_odb_reinit()
  Use remote_odb_get_direct() and has_remote_odb()
  Use odb.origin.partialclonefilter instead of core.partialclonefilter
  t0410: test fetching from many promisor remotes
  Documentation/config: add odb.<name>.promisorRemote

 Documentation/config.txt      |   5 ++
 Makefile                      |   2 +
 builtin/cat-file.c            |   5 +-
 builtin/fetch.c               |  13 ++--
 builtin/gc.c                  |   3 +-
 builtin/repack.c              |   3 +-
 cache.h                       |   2 -
 connected.c                   |   3 +-
 environment.c                 |   1 -
 fetch-object.c                |  15 ++--
 fetch-object.h                |   6 +-
 list-objects-filter-options.c |  51 +++++++------
 list-objects-filter-options.h |   3 +-
 odb-helper.c                  |  45 +++++++++++
 odb-helper.h                  |  24 ++++++
 packfile.c                    |   3 +-
 remote-odb.c                  | 137 ++++++++++++++++++++++++++++++++++
 remote-odb.h                  |  10 +++
 setup.c                       |   7 +-
 sha1-file.c                   |  14 ++--
 t/t0410-partial-clone.sh      |  58 +++++++++-----
 t/t5500-fetch-pack.sh         |   4 +-
 t/t5601-clone.sh              |   2 +-
 t/t5616-partial-clone.sh      |   4 +-
 t/t5702-protocol-v2.sh        |   2 +-
 unpack-trees.c                |   6 +-
 26 files changed, 342 insertions(+), 86 deletions(-)
 create mode 100644 odb-helper.c
 create mode 100644 odb-helper.h
 create mode 100644 remote-odb.c
 create mode 100644 remote-odb.h

-- 
2.18.0.330.g17eb9fed90


^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH v4 1/9] fetch-object: make functions return an error code
  2018-08-02  6:14 [PATCH v4 0/9] Introducing remote ODBs Christian Couder
@ 2018-08-02  6:14 ` Christian Couder
  2018-08-02  6:14 ` [PATCH v4 2/9] Add initial remote odb support Christian Couder
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 23+ messages in thread
From: Christian Couder @ 2018-08-02  6:14 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Ben Peart, Jonathan Tan,
	Nguyen Thai Ngoc Duy, Mike Hommey, Lars Schneider, Eric Wong,
	Christian Couder, Jeff Hostetler, Eric Sunshine, Beat Bolli,
	Christian Couder

From: Christian Couder <christian.couder@gmail.com>

The callers of the fetch_object() and fetch_objects() might
be interested in knowing if these functions succeeded or not.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 fetch-object.c | 15 +++++++++------
 fetch-object.h |  6 +++---
 sha1-file.c    |  4 ++--
 3 files changed, 14 insertions(+), 11 deletions(-)

diff --git a/fetch-object.c b/fetch-object.c
index 48fe63dd6c..3c52009266 100644
--- a/fetch-object.c
+++ b/fetch-object.c
@@ -5,11 +5,12 @@
 #include "transport.h"
 #include "fetch-object.h"
 
-static void fetch_refs(const char *remote_name, struct ref *ref)
+static int fetch_refs(const char *remote_name, struct ref *ref)
 {
 	struct remote *remote;
 	struct transport *transport;
 	int original_fetch_if_missing = fetch_if_missing;
+	int res;
 
 	fetch_if_missing = 0;
 	remote = remote_get(remote_name);
@@ -19,18 +20,20 @@ static void fetch_refs(const char *remote_name, struct ref *ref)
 
 	transport_set_option(transport, TRANS_OPT_FROM_PROMISOR, "1");
 	transport_set_option(transport, TRANS_OPT_NO_DEPENDENTS, "1");
-	transport_fetch_refs(transport, ref, NULL);
+	res = transport_fetch_refs(transport, ref, NULL);
 	fetch_if_missing = original_fetch_if_missing;
+
+	return res;
 }
 
-void fetch_object(const char *remote_name, const unsigned char *sha1)
+int fetch_object(const char *remote_name, const unsigned char *sha1)
 {
 	struct ref *ref = alloc_ref(sha1_to_hex(sha1));
 	hashcpy(ref->old_oid.hash, sha1);
-	fetch_refs(remote_name, ref);
+	return fetch_refs(remote_name, ref);
 }
 
-void fetch_objects(const char *remote_name, const struct oid_array *to_fetch)
+int fetch_objects(const char *remote_name, const struct oid_array *to_fetch)
 {
 	struct ref *ref = NULL;
 	int i;
@@ -41,5 +44,5 @@ void fetch_objects(const char *remote_name, const struct oid_array *to_fetch)
 		new_ref->next = ref;
 		ref = new_ref;
 	}
-	fetch_refs(remote_name, ref);
+	return fetch_refs(remote_name, ref);
 }
diff --git a/fetch-object.h b/fetch-object.h
index 4b269d07ed..12e1f9ee70 100644
--- a/fetch-object.h
+++ b/fetch-object.h
@@ -3,9 +3,9 @@
 
 #include "sha1-array.h"
 
-extern void fetch_object(const char *remote_name, const unsigned char *sha1);
+extern int fetch_object(const char *remote_name, const unsigned char *sha1);
 
-extern void fetch_objects(const char *remote_name,
-			  const struct oid_array *to_fetch);
+extern int fetch_objects(const char *remote_name,
+			 const struct oid_array *to_fetch);
 
 #endif
diff --git a/sha1-file.c b/sha1-file.c
index de4839e634..c099f5584d 100644
--- a/sha1-file.c
+++ b/sha1-file.c
@@ -1312,8 +1312,8 @@ int oid_object_info_extended(struct repository *r, const struct object_id *oid,
 		if (fetch_if_missing && repository_format_partial_clone &&
 		    !already_retried && r == the_repository) {
 			/*
-			 * TODO Investigate having fetch_object() return
-			 * TODO error/success and stopping the music here.
+			 * TODO Investigate checking fetch_object() return
+			 * TODO value and stopping on error here.
 			 * TODO Pass a repository struct through fetch_object,
 			 * such that arbitrary repositories work.
 			 */
-- 
2.18.0.330.g17eb9fed90


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v4 2/9] Add initial remote odb support
  2018-08-02  6:14 [PATCH v4 0/9] Introducing remote ODBs Christian Couder
  2018-08-02  6:14 ` [PATCH v4 1/9] fetch-object: make functions return an error code Christian Couder
@ 2018-08-02  6:14 ` Christian Couder
  2018-08-02  6:14 ` [PATCH v4 3/9] remote-odb: implement remote_odb_get_direct() Christian Couder
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 23+ messages in thread
From: Christian Couder @ 2018-08-02  6:14 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Ben Peart, Jonathan Tan,
	Nguyen Thai Ngoc Duy, Mike Hommey, Lars Schneider, Eric Wong,
	Christian Couder, Jeff Hostetler, Eric Sunshine, Beat Bolli,
	Christian Couder

From: Christian Couder <christian.couder@gmail.com>

The remote-odb.{c,h} files will contain the functions
that are called by the rest of Git mostly from
"sha1-file.c" to access the objects managed by the
remote odbs.

The odb-helper.{c,h} files will contain the functions to
actually implement communication with either the internal
functions or the external scripts or processes that will
manage and provide remote git objects.

For now only infrastructure to create helpers from the
config and to check if there are remote odbs and odb
helpers is implemented.

We expect that there will not be a lot of helpers, so it
is ok to use a simple linked list to manage them.

Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 Makefile     |  2 ++
 odb-helper.c | 16 +++++++++
 odb-helper.h | 19 +++++++++++
 remote-odb.c | 91 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 remote-odb.h |  7 ++++
 5 files changed, 135 insertions(+)
 create mode 100644 odb-helper.c
 create mode 100644 odb-helper.h
 create mode 100644 remote-odb.c
 create mode 100644 remote-odb.h

diff --git a/Makefile b/Makefile
index 08e5c54549..2a9bd02208 100644
--- a/Makefile
+++ b/Makefile
@@ -896,6 +896,8 @@ LIB_OBJS += notes-cache.o
 LIB_OBJS += notes-merge.o
 LIB_OBJS += notes-utils.o
 LIB_OBJS += object.o
+LIB_OBJS += odb-helper.o
+LIB_OBJS += remote-odb.o
 LIB_OBJS += oidmap.o
 LIB_OBJS += oidset.o
 LIB_OBJS += packfile.o
diff --git a/odb-helper.c b/odb-helper.c
new file mode 100644
index 0000000000..b4d403ffa9
--- /dev/null
+++ b/odb-helper.c
@@ -0,0 +1,16 @@
+#include "cache.h"
+#include "object.h"
+#include "argv-array.h"
+#include "odb-helper.h"
+#include "run-command.h"
+#include "sha1-lookup.h"
+
+struct odb_helper *odb_helper_new(const char *name, int namelen)
+{
+	struct odb_helper *o;
+
+	o = xcalloc(1, sizeof(*o));
+	o->name = xmemdupz(name, namelen);
+
+	return o;
+}
diff --git a/odb-helper.h b/odb-helper.h
new file mode 100644
index 0000000000..4b792a3896
--- /dev/null
+++ b/odb-helper.h
@@ -0,0 +1,19 @@
+#ifndef ODB_HELPER_H
+#define ODB_HELPER_H
+
+/*
+ * An odb helper is a way to access a remote odb.
+ *
+ * Information in its fields comes from the config file [odb "NAME"]
+ * entries.
+ */
+struct odb_helper {
+	const char *name;                 /* odb.<NAME>.* */
+	const char *remote;               /* odb.<NAME>.promisorRemote */
+
+	struct odb_helper *next;
+};
+
+extern struct odb_helper *odb_helper_new(const char *name, int namelen);
+
+#endif /* ODB_HELPER_H */
diff --git a/remote-odb.c b/remote-odb.c
new file mode 100644
index 0000000000..03be54ba2e
--- /dev/null
+++ b/remote-odb.c
@@ -0,0 +1,91 @@
+#include "cache.h"
+#include "remote-odb.h"
+#include "odb-helper.h"
+#include "config.h"
+
+static struct odb_helper *helpers;
+static struct odb_helper **helpers_tail = &helpers;
+
+static struct odb_helper *find_or_create_helper(const char *name, int len)
+{
+	struct odb_helper *o;
+
+	for (o = helpers; o; o = o->next)
+		if (!strncmp(o->name, name, len) && !o->name[len])
+			return o;
+
+	o = odb_helper_new(name, len);
+	*helpers_tail = o;
+	helpers_tail = &o->next;
+
+	return o;
+}
+
+static struct odb_helper *do_find_odb_helper(const char *remote)
+{
+	struct odb_helper *o;
+
+	for (o = helpers; o; o = o->next)
+		if (o->remote && !strcmp(o->remote, remote))
+			return o;
+
+	return NULL;
+}
+
+static int remote_odb_config(const char *var, const char *value, void *data)
+{
+	struct odb_helper *o;
+	const char *name;
+	int namelen;
+	const char *subkey;
+
+	if (parse_config_key(var, "odb", &name, &namelen, &subkey) < 0)
+		return 0;
+
+	o = find_or_create_helper(name, namelen);
+
+	if (!strcmp(subkey, "promisorremote")) {
+		const char *remote;
+		int res = git_config_string(&remote, var, value);
+
+		if (res)
+			return res;
+
+		if (do_find_odb_helper(remote))
+			return error(_("when parsing config key '%s' "
+				       "helper for remote '%s' already exists"),
+				     var, remote);
+
+		o->remote = remote;
+
+		return 0;
+	}
+
+	return 0;
+}
+
+static void remote_odb_init(void)
+{
+	static int initialized;
+
+	if (initialized)
+		return;
+	initialized = 1;
+
+	git_config(remote_odb_config, NULL);
+}
+
+struct odb_helper *find_odb_helper(const char *remote)
+{
+	remote_odb_init();
+
+	if (!remote)
+		return helpers;
+
+	return do_find_odb_helper(remote);
+}
+
+int has_remote_odb(void)
+{
+	return !!find_odb_helper(NULL);
+}
diff --git a/remote-odb.h b/remote-odb.h
new file mode 100644
index 0000000000..4c7b13695f
--- /dev/null
+++ b/remote-odb.h
@@ -0,0 +1,7 @@
+#ifndef REMOTE_ODB_H
+#define REMOTE_ODB_H
+
+extern struct odb_helper *find_odb_helper(const char *remote);
+extern int has_remote_odb(void);
+
+#endif /* REMOTE_ODB_H */
-- 
2.18.0.330.g17eb9fed90


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v4 3/9] remote-odb: implement remote_odb_get_direct()
  2018-08-02  6:14 [PATCH v4 0/9] Introducing remote ODBs Christian Couder
  2018-08-02  6:14 ` [PATCH v4 1/9] fetch-object: make functions return an error code Christian Couder
  2018-08-02  6:14 ` [PATCH v4 2/9] Add initial remote odb support Christian Couder
@ 2018-08-02  6:14 ` Christian Couder
  2018-08-02  6:15 ` [PATCH v4 4/9] remote-odb: implement remote_odb_get_many_direct() Christian Couder
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 23+ messages in thread
From: Christian Couder @ 2018-08-02  6:14 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Ben Peart, Jonathan Tan,
	Nguyen Thai Ngoc Duy, Mike Hommey, Lars Schneider, Eric Wong,
	Christian Couder, Jeff Hostetler, Eric Sunshine, Beat Bolli,
	Christian Couder

From: Christian Couder <christian.couder@gmail.com>

This is implemented only in the promisor remote mode
for now by calling fetch_object().

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 odb-helper.c | 14 ++++++++++++++
 odb-helper.h |  3 ++-
 remote-odb.c | 17 +++++++++++++++++
 remote-odb.h |  1 +
 4 files changed, 34 insertions(+), 1 deletion(-)

diff --git a/odb-helper.c b/odb-helper.c
index b4d403ffa9..99b5a345e8 100644
--- a/odb-helper.c
+++ b/odb-helper.c
@@ -4,6 +4,7 @@
 #include "odb-helper.h"
 #include "run-command.h"
 #include "sha1-lookup.h"
+#include "fetch-object.h"
 
 struct odb_helper *odb_helper_new(const char *name, int namelen)
 {
@@ -14,3 +15,16 @@ struct odb_helper *odb_helper_new(const char *name, int namelen)
 
 	return o;
 }
+
+int odb_helper_get_direct(struct odb_helper *o,
+			  const unsigned char *sha1)
+{
+	int res;
+	uint64_t start = getnanotime();
+
+	res = fetch_object(o->remote, sha1);
+
+	trace_performance_since(start, "odb_helper_get_direct");
+
+	return res;
+}
diff --git a/odb-helper.h b/odb-helper.h
index 4b792a3896..4c52e81ce8 100644
--- a/odb-helper.h
+++ b/odb-helper.h
@@ -15,5 +15,6 @@ struct odb_helper {
 };
 
 extern struct odb_helper *odb_helper_new(const char *name, int namelen);
-
+extern int odb_helper_get_direct(struct odb_helper *o,
+				 const unsigned char *sha1);
 #endif /* ODB_HELPER_H */
diff --git a/remote-odb.c b/remote-odb.c
index 03be54ba2e..7f815c7ebb 100644
--- a/remote-odb.c
+++ b/remote-odb.c
@@ -89,3 +89,20 @@ int has_remote_odb(void)
 {
 	return !!find_odb_helper(NULL);
 }
+
+int remote_odb_get_direct(const unsigned char *sha1)
+{
+	struct odb_helper *o;
+
+	trace_printf("trace: remote_odb_get_direct: %s", sha1_to_hex(sha1));
+
+	remote_odb_init();
+
+	for (o = helpers; o; o = o->next) {
+		if (odb_helper_get_direct(o, sha1) < 0)
+			continue;
+		return 0;
+	}
+
+	return -1;
+}
diff --git a/remote-odb.h b/remote-odb.h
index 4c7b13695f..c5384c922d 100644
--- a/remote-odb.h
+++ b/remote-odb.h
@@ -3,5 +3,6 @@
 
 extern struct odb_helper *find_odb_helper(const char *remote);
 extern int has_remote_odb(void);
+extern int remote_odb_get_direct(const unsigned char *sha1);
 
 #endif /* REMOTE_ODB_H */
-- 
2.18.0.330.g17eb9fed90


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v4 4/9] remote-odb: implement remote_odb_get_many_direct()
  2018-08-02  6:14 [PATCH v4 0/9] Introducing remote ODBs Christian Couder
                   ` (2 preceding siblings ...)
  2018-08-02  6:14 ` [PATCH v4 3/9] remote-odb: implement remote_odb_get_direct() Christian Couder
@ 2018-08-02  6:15 ` Christian Couder
  2018-08-02  6:15 ` [PATCH v4 5/9] remote-odb: add remote_odb_reinit() Christian Couder
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 23+ messages in thread
From: Christian Couder @ 2018-08-02  6:15 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Ben Peart, Jonathan Tan,
	Nguyen Thai Ngoc Duy, Mike Hommey, Lars Schneider, Eric Wong,
	Christian Couder, Jeff Hostetler, Eric Sunshine, Beat Bolli,
	Christian Couder

From: Christian Couder <christian.couder@gmail.com>

This function will be used to get many objects from a promisor
remote.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 odb-helper.c | 15 +++++++++++++++
 odb-helper.h |  3 +++
 remote-odb.c | 17 +++++++++++++++++
 remote-odb.h |  1 +
 4 files changed, 36 insertions(+)

diff --git a/odb-helper.c b/odb-helper.c
index 99b5a345e8..246ebf8f7a 100644
--- a/odb-helper.c
+++ b/odb-helper.c
@@ -28,3 +28,18 @@ int odb_helper_get_direct(struct odb_helper *o,
 
 	return res;
 }
+
+int odb_helper_get_many_direct(struct odb_helper *o,
+			       const struct oid_array *to_get)
+{
+	int res;
+	uint64_t start;
+
+	start = getnanotime();
+
+	res = fetch_objects(o->remote, to_get);
+
+	trace_performance_since(start, "odb_helper_get_many_direct");
+
+	return res;
+}
diff --git a/odb-helper.h b/odb-helper.h
index 4c52e81ce8..154ce4c7e4 100644
--- a/odb-helper.h
+++ b/odb-helper.h
@@ -17,4 +17,7 @@ struct odb_helper {
 extern struct odb_helper *odb_helper_new(const char *name, int namelen);
 extern int odb_helper_get_direct(struct odb_helper *o,
 				 const unsigned char *sha1);
+extern int odb_helper_get_many_direct(struct odb_helper *o,
+				      const struct oid_array *to_get);
+
 #endif /* ODB_HELPER_H */
diff --git a/remote-odb.c b/remote-odb.c
index 7f815c7ebb..09dfc2e16f 100644
--- a/remote-odb.c
+++ b/remote-odb.c
@@ -106,3 +106,20 @@ int remote_odb_get_direct(const unsigned char *sha1)
 
 	return -1;
 }
+
+int remote_odb_get_many_direct(const struct oid_array *to_get)
+{
+	struct odb_helper *o;
+
+	trace_printf("trace: remote_odb_get_many_direct: nr: %d", to_get->nr);
+
+	remote_odb_init();
+
+	for (o = helpers; o; o = o->next) {
+		if (odb_helper_get_many_direct(o, to_get) < 0)
+			continue;
+		return 0;
+	}
+
+	return -1;
+}
diff --git a/remote-odb.h b/remote-odb.h
index c5384c922d..e10a8bf855 100644
--- a/remote-odb.h
+++ b/remote-odb.h
@@ -4,5 +4,6 @@
 extern struct odb_helper *find_odb_helper(const char *remote);
 extern int has_remote_odb(void);
 extern int remote_odb_get_direct(const unsigned char *sha1);
+extern int remote_odb_get_many_direct(const struct oid_array *to_get);
 
 #endif /* REMOTE_ODB_H */
-- 
2.18.0.330.g17eb9fed90


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v4 5/9] remote-odb: add remote_odb_reinit()
  2018-08-02  6:14 [PATCH v4 0/9] Introducing remote ODBs Christian Couder
                   ` (3 preceding siblings ...)
  2018-08-02  6:15 ` [PATCH v4 4/9] remote-odb: implement remote_odb_get_many_direct() Christian Couder
@ 2018-08-02  6:15 ` Christian Couder
  2018-08-02  6:15 ` [PATCH v4 6/9] Use remote_odb_get_direct() and has_remote_odb() Christian Couder
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 23+ messages in thread
From: Christian Couder @ 2018-08-02  6:15 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Ben Peart, Jonathan Tan,
	Nguyen Thai Ngoc Duy, Mike Hommey, Lars Schneider, Eric Wong,
	Christian Couder, Jeff Hostetler, Eric Sunshine, Beat Bolli,
	Christian Couder

From: Christian Couder <christian.couder@gmail.com>

We will need to reinitialize the remote odb configuration
as we will make some changes to it in a later commit when
we will detect that a remote is also a remote odb.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 remote-odb.c | 14 ++++++++++++--
 remote-odb.h |  1 +
 2 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/remote-odb.c b/remote-odb.c
index 09dfc2e16f..f063ba0fda 100644
--- a/remote-odb.c
+++ b/remote-odb.c
@@ -64,17 +64,27 @@ static int remote_odb_config(const char *var, const char *value, void *data)
 	return 0;
 }
 
-static void remote_odb_init(void)
+static void remote_odb_do_init(int force)
 {
 	static int initialized;
 
-	if (initialized)
+	if (!force && initialized)
 		return;
 	initialized = 1;
 
 	git_config(remote_odb_config, NULL);
 }
 
+static inline void remote_odb_init(void)
+{
+	remote_odb_do_init(0);
+}
+
+void remote_odb_reinit(void)
+{
+	remote_odb_do_init(1);
+}
+
 struct odb_helper *find_odb_helper(const char *remote)
 {
 	remote_odb_init();
diff --git a/remote-odb.h b/remote-odb.h
index e10a8bf855..79803782af 100644
--- a/remote-odb.h
+++ b/remote-odb.h
@@ -1,6 +1,7 @@
 #ifndef REMOTE_ODB_H
 #define REMOTE_ODB_H
 
+extern void remote_odb_reinit(void);
 extern struct odb_helper *find_odb_helper(const char *remote);
 extern int has_remote_odb(void);
 extern int remote_odb_get_direct(const unsigned char *sha1);
-- 
2.18.0.330.g17eb9fed90


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v4 6/9] Use remote_odb_get_direct() and has_remote_odb()
  2018-08-02  6:14 [PATCH v4 0/9] Introducing remote ODBs Christian Couder
                   ` (4 preceding siblings ...)
  2018-08-02  6:15 ` [PATCH v4 5/9] remote-odb: add remote_odb_reinit() Christian Couder
@ 2018-08-02  6:15 ` Christian Couder
  2018-08-02  6:15 ` [PATCH v4 7/9] Use odb.origin.partialclonefilter instead of core.partialclonefilter Christian Couder
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 23+ messages in thread
From: Christian Couder @ 2018-08-02  6:15 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Ben Peart, Jonathan Tan,
	Nguyen Thai Ngoc Duy, Mike Hommey, Lars Schneider, Eric Wong,
	Christian Couder, Jeff Hostetler, Eric Sunshine, Beat Bolli,
	Christian Couder

From: Christian Couder <christian.couder@gmail.com>

Instead of using the repository_format_partial_clone global
and fetch_object() directly, let's use has_remote_odb() and
remote_odb_get_direct().

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 builtin/cat-file.c            |  5 +++--
 builtin/fetch.c               | 11 ++++++-----
 builtin/gc.c                  |  3 ++-
 builtin/repack.c              |  3 ++-
 cache.h                       |  2 --
 connected.c                   |  3 ++-
 environment.c                 |  1 -
 list-objects-filter-options.c | 27 +++++++++++++++------------
 packfile.c                    |  3 ++-
 setup.c                       |  7 +------
 sha1-file.c                   | 14 ++++++++------
 t/t0410-partial-clone.sh      | 34 +++++++++++++++++-----------------
 t/t5500-fetch-pack.sh         |  4 ++--
 t/t5601-clone.sh              |  2 +-
 t/t5616-partial-clone.sh      |  2 +-
 t/t5702-protocol-v2.sh        |  2 +-
 unpack-trees.c                |  6 +++---
 17 files changed, 66 insertions(+), 63 deletions(-)

diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index 4a44b2404f..4d19e9277e 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -14,6 +14,7 @@
 #include "sha1-array.h"
 #include "packfile.h"
 #include "object-store.h"
+#include "remote-odb.h"
 
 struct batch_options {
 	int enabled;
@@ -478,8 +479,8 @@ static int batch_objects(struct batch_options *opt)
 
 		for_each_loose_object(batch_loose_object, &sa, 0);
 		for_each_packed_object(batch_packed_object, &sa, 0);
-		if (repository_format_partial_clone)
-			warning("This repository has extensions.partialClone set. Some objects may not be loaded.");
+		if (has_remote_odb())
+			warning("This repository uses an odb. Some objects may not be loaded.");
 
 		cb.opt = opt;
 		cb.expand = &data;
diff --git a/builtin/fetch.c b/builtin/fetch.c
index ac06f6a576..9c7df8356c 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -22,6 +22,7 @@
 #include "utf8.h"
 #include "packfile.h"
 #include "list-objects-filter-options.h"
+#include "remote-odb.h"
 
 static const char * const builtin_fetch_usage[] = {
 	N_("git fetch [<options>] [<repository> [<refspec>...]]"),
@@ -1339,7 +1340,7 @@ static inline void fetch_one_setup_partial(struct remote *remote)
 	 * If no prior partial clone/fetch and the current fetch DID NOT
 	 * request a partial-fetch, do a normal fetch.
 	 */
-	if (!repository_format_partial_clone && !filter_options.choice)
+	if (!has_remote_odb() && !filter_options.choice)
 		return;
 
 	/*
@@ -1347,7 +1348,7 @@ static inline void fetch_one_setup_partial(struct remote *remote)
 	 * on this repo and remember the given filter-spec as the default
 	 * for subsequent fetches to this remote.
 	 */
-	if (!repository_format_partial_clone && filter_options.choice) {
+	if (!has_remote_odb() && filter_options.choice) {
 		partial_clone_register(remote->name, &filter_options);
 		return;
 	}
@@ -1356,7 +1357,7 @@ static inline void fetch_one_setup_partial(struct remote *remote)
 	 * We are currently limited to only ONE promisor remote and only
 	 * allow partial-fetches from the promisor remote.
 	 */
-	if (strcmp(remote->name, repository_format_partial_clone)) {
+	if (!find_odb_helper(remote->name)) {
 		if (filter_options.choice)
 			die(_("--filter can only be used with the remote configured in core.partialClone"));
 		return;
@@ -1487,7 +1488,7 @@ int cmd_fetch(int argc, const char **argv, const char *prefix)
 	if (depth || deepen_since || deepen_not.nr)
 		deepen = 1;
 
-	if (filter_options.choice && !repository_format_partial_clone)
+	if (filter_options.choice && !has_remote_odb())
 		die("--filter can only be used when extensions.partialClone is set");
 
 	if (all) {
@@ -1521,7 +1522,7 @@ int cmd_fetch(int argc, const char **argv, const char *prefix)
 	}
 
 	if (remote) {
-		if (filter_options.choice || repository_format_partial_clone)
+		if (filter_options.choice || has_remote_odb())
 			fetch_one_setup_partial(remote);
 		result = fetch_one(remote, argc, argv, prune_tags_ok);
 	} else {
diff --git a/builtin/gc.c b/builtin/gc.c
index ccfb1ceaeb..02c783b514 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -26,6 +26,7 @@
 #include "pack-objects.h"
 #include "blob.h"
 #include "tree.h"
+#include "remote-odb.h"
 
 #define FAILED_RUN "failed to run %s"
 
@@ -619,7 +620,7 @@ int cmd_gc(int argc, const char **argv, const char *prefix)
 			argv_array_push(&prune, prune_expire);
 			if (quiet)
 				argv_array_push(&prune, "--no-progress");
-			if (repository_format_partial_clone)
+			if (has_remote_odb())
 				argv_array_push(&prune,
 						"--exclude-promisor-objects");
 			if (run_command_v_opt(prune.argv, RUN_GIT_CMD))
diff --git a/builtin/repack.c b/builtin/repack.c
index 6c636e159e..72b1e2d94f 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -8,6 +8,7 @@
 #include "strbuf.h"
 #include "string-list.h"
 #include "argv-array.h"
+#include "remote-odb.h"
 
 static int delta_base_offset = 1;
 static int pack_kept_objects = -1;
@@ -249,7 +250,7 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
 	argv_array_push(&cmd.args, "--all");
 	argv_array_push(&cmd.args, "--reflog");
 	argv_array_push(&cmd.args, "--indexed-objects");
-	if (repository_format_partial_clone)
+	if (has_remote_odb())
 		argv_array_push(&cmd.args, "--exclude-promisor-objects");
 	if (window)
 		argv_array_pushf(&cmd.args, "--window=%s", window);
diff --git a/cache.h b/cache.h
index 8b447652a7..a0772bd099 100644
--- a/cache.h
+++ b/cache.h
@@ -904,13 +904,11 @@ extern int grafts_replace_parents;
 #define GIT_REPO_VERSION 0
 #define GIT_REPO_VERSION_READ 1
 extern int repository_format_precious_objects;
-extern char *repository_format_partial_clone;
 extern const char *core_partial_clone_filter_default;
 
 struct repository_format {
 	int version;
 	int precious_objects;
-	char *partial_clone; /* value of extensions.partialclone */
 	int is_bare;
 	int hash_algo;
 	char *work_tree;
diff --git a/connected.c b/connected.c
index 1bba888eff..7743b78290 100644
--- a/connected.c
+++ b/connected.c
@@ -4,6 +4,7 @@
 #include "connected.h"
 #include "transport.h"
 #include "packfile.h"
+#include "remote-odb.h"
 
 /*
  * If we feed all the commits we want to verify to this command
@@ -56,7 +57,7 @@ int check_connected(oid_iterate_fn fn, void *cb_data,
 	argv_array_push(&rev_list.args,"rev-list");
 	argv_array_push(&rev_list.args, "--objects");
 	argv_array_push(&rev_list.args, "--stdin");
-	if (repository_format_partial_clone)
+	if (has_remote_odb())
 		argv_array_push(&rev_list.args, "--exclude-promisor-objects");
 	if (!opt->is_deepening_fetch) {
 		argv_array_push(&rev_list.args, "--not");
diff --git a/environment.c b/environment.c
index 013e845235..8234940046 100644
--- a/environment.c
+++ b/environment.c
@@ -30,7 +30,6 @@ int warn_ambiguous_refs = 1;
 int warn_on_object_refname_ambiguity = 1;
 int ref_paranoia = -1;
 int repository_format_precious_objects;
-char *repository_format_partial_clone;
 const char *core_partial_clone_filter_default;
 const char *git_commit_encoding;
 const char *git_log_output_encoding;
diff --git a/list-objects-filter-options.c b/list-objects-filter-options.c
index c0e2bd6a06..60452c8f36 100644
--- a/list-objects-filter-options.c
+++ b/list-objects-filter-options.c
@@ -6,6 +6,7 @@
 #include "list-objects.h"
 #include "list-objects-filter.h"
 #include "list-objects-filter-options.h"
+#include "remote-odb.h"
 
 /*
  * Parse value of the argument to the "filter" keyword.
@@ -114,30 +115,32 @@ void partial_clone_register(
 	const char *remote,
 	const struct list_objects_filter_options *filter_options)
 {
-	/*
-	 * Record the name of the partial clone remote in the
-	 * config and in the global variable -- the latter is
-	 * used throughout to indicate that partial clone is
-	 * enabled and to expect missing objects.
-	 */
-	if (repository_format_partial_clone &&
-	    *repository_format_partial_clone &&
-	    strcmp(remote, repository_format_partial_clone))
-		die(_("cannot change partial clone promisor remote"));
+	char *cfg_name;
+
+	/* Check if it is already registered */
+	if (find_odb_helper(remote))
+		return;
 
 	git_config_set("core.repositoryformatversion", "1");
-	git_config_set("extensions.partialclone", remote);
 
-	repository_format_partial_clone = xstrdup(remote);
+	/* Add odb config for the remote */
+	cfg_name = xstrfmt("odb.%s.promisorRemote", remote);
+	git_config_set(cfg_name, remote);
+	free(cfg_name);
 
 	/*
 	 * Record the initial filter-spec in the config as
 	 * the default for subsequent fetches from this remote.
+	 *
+	 * TODO: move core.partialclonefilter into odb.<name>
 	 */
 	core_partial_clone_filter_default =
 		xstrdup(filter_options->filter_spec);
 	git_config_set("core.partialclonefilter",
 		       core_partial_clone_filter_default);
+
+	/* Make sure the config info are reset */
+	remote_odb_reinit();
 }
 
 void partial_clone_get_default_filter_spec(
diff --git a/packfile.c b/packfile.c
index 7cd45aa4b2..9cbc974612 100644
--- a/packfile.c
+++ b/packfile.c
@@ -15,6 +15,7 @@
 #include "tree-walk.h"
 #include "tree.h"
 #include "object-store.h"
+#include "remote-odb.h"
 
 char *odb_pack_name(struct strbuf *buf,
 		    const unsigned char *sha1,
@@ -1976,7 +1977,7 @@ int is_promisor_object(const struct object_id *oid)
 	static int promisor_objects_prepared;
 
 	if (!promisor_objects_prepared) {
-		if (repository_format_partial_clone) {
+		if (has_remote_odb()) {
 			for_each_packed_object(add_promisor_object,
 					       &promisor_objects,
 					       FOR_EACH_OBJECT_PROMISOR_ONLY);
diff --git a/setup.c b/setup.c
index b24c811c1c..4eeb7fe073 100644
--- a/setup.c
+++ b/setup.c
@@ -419,11 +419,7 @@ static int check_repo_format(const char *var, const char *value, void *vdata)
 			;
 		else if (!strcmp(ext, "preciousobjects"))
 			data->precious_objects = git_config_bool(var, value);
-		else if (!strcmp(ext, "partialclone")) {
-			if (!value)
-				return config_error_nonbool(var);
-			data->partial_clone = xstrdup(value);
-		} else
+		else
 			string_list_append(&data->unknown_extensions, ext);
 	} else if (strcmp(var, "core.bare") == 0) {
 		data->is_bare = git_config_bool(var, value);
@@ -465,7 +461,6 @@ static int check_repository_format_gently(const char *gitdir, struct repository_
 	}
 
 	repository_format_precious_objects = candidate->precious_objects;
-	repository_format_partial_clone = candidate->partial_clone;
 	string_list_clear(&candidate->unknown_extensions, 0);
 	if (!has_common) {
 		if (candidate->is_bare != -1) {
diff --git a/sha1-file.c b/sha1-file.c
index c099f5584d..5e183d2cd4 100644
--- a/sha1-file.c
+++ b/sha1-file.c
@@ -32,6 +32,7 @@
 #include "packfile.h"
 #include "fetch-object.h"
 #include "object-store.h"
+#include "remote-odb.h"
 
 /* The maximum size for an object header. */
 #define MAX_HEADER_LEN 32
@@ -1309,15 +1310,16 @@ int oid_object_info_extended(struct repository *r, const struct object_id *oid,
 		}
 
 		/* Check if it is a missing object */
-		if (fetch_if_missing && repository_format_partial_clone &&
+		if (fetch_if_missing && has_remote_odb() &&
 		    !already_retried && r == the_repository) {
 			/*
-			 * TODO Investigate checking fetch_object() return
-			 * TODO value and stopping on error here.
-			 * TODO Pass a repository struct through fetch_object,
-			 * such that arbitrary repositories work.
+			 * TODO Investigate checking remote_odb_get_direct()
+			 * TODO return value and stopping on error here.
+			 * TODO Pass a repository struct through
+			 * remote_odb_get_direct(), such that arbitrary
+			 * repositories work.
 			 */
-			fetch_object(repository_format_partial_clone, real->hash);
+			remote_odb_get_direct(real->hash);
 			already_retried = 1;
 			continue;
 		}
diff --git a/t/t0410-partial-clone.sh b/t/t0410-partial-clone.sh
index 4984ca583d..71a9b9a3e8 100755
--- a/t/t0410-partial-clone.sh
+++ b/t/t0410-partial-clone.sh
@@ -23,10 +23,10 @@ promise_and_delete () {
 	delete_object repo "$HASH"
 }
 
-test_expect_success 'extensions.partialclone without filter' '
+test_expect_success 'promisor remote without filter' '
 	test_create_repo server &&
 	git clone --filter="blob:none" "file://$(pwd)/server" client &&
-	git -C client config --unset core.partialclonefilter &&
+	git -C client config --unset odb.origin.partialclonefilter &&
 	git -C client fetch origin
 '
 
@@ -51,7 +51,7 @@ test_expect_success 'missing reflog object, but promised by a commit, passes fsc
 
 	# But with the extension, it succeeds
 	git -C repo config core.repositoryformatversion 1 &&
-	git -C repo config extensions.partialclone "arbitrary string" &&
+	git -C repo config odb.magic.promisorRemote "arbitrary string" &&
 	git -C repo fsck
 '
 
@@ -74,7 +74,7 @@ test_expect_success 'missing reflog object, but promised by a tag, passes fsck'
 	printf "$T\n" | pack_as_from_promisor &&
 
 	git -C repo config core.repositoryformatversion 1 &&
-	git -C repo config extensions.partialclone "arbitrary string" &&
+	git -C repo config odb.magic.promisorRemote "arbitrary string" &&
 	git -C repo fsck
 '
 
@@ -92,7 +92,7 @@ test_expect_success 'missing reflog object alone fails fsck, even with extension
 	delete_object repo "$A" &&
 
 	git -C repo config core.repositoryformatversion 1 &&
-	git -C repo config extensions.partialclone "arbitrary string" &&
+	git -C repo config odb.magic.promisorRemote "arbitrary string" &&
 	test_must_fail git -C repo fsck
 '
 
@@ -108,7 +108,7 @@ test_expect_success 'missing ref object, but promised, passes fsck' '
 	promise_and_delete "$A" &&
 
 	git -C repo config core.repositoryformatversion 1 &&
-	git -C repo config extensions.partialclone "arbitrary string" &&
+	git -C repo config odb.magic.promisorRemote "arbitrary string" &&
 	git -C repo fsck
 '
 
@@ -131,7 +131,7 @@ test_expect_success 'missing object, but promised, passes fsck' '
 	promise_and_delete "$AT" &&
 
 	git -C repo config core.repositoryformatversion 1 &&
-	git -C repo config extensions.partialclone "arbitrary string" &&
+	git -C repo config odb.magic.promisorRemote "arbitrary string" &&
 	git -C repo fsck
 '
 
@@ -144,7 +144,7 @@ test_expect_success 'missing CLI object, but promised, passes fsck' '
 	promise_and_delete "$A" &&
 
 	git -C repo config core.repositoryformatversion 1 &&
-	git -C repo config extensions.partialclone "arbitrary string" &&
+	git -C repo config odb.magic.promisorRemote "arbitrary string" &&
 	git -C repo fsck "$A"
 '
 
@@ -159,7 +159,7 @@ test_expect_success 'fetching of missing objects' '
 	rm -rf repo/.git/objects/* &&
 
 	git -C repo config core.repositoryformatversion 1 &&
-	git -C repo config extensions.partialclone "origin" &&
+	git -C repo config odb.magic.promisorRemote "origin" &&
 	git -C repo cat-file -p "$HASH" &&
 
 	# Ensure that the .promisor file is written, and check that its
@@ -180,7 +180,7 @@ test_expect_success 'rev-list stops traversal at missing and promised commit' '
 	promise_and_delete "$FOO" &&
 
 	git -C repo config core.repositoryformatversion 1 &&
-	git -C repo config extensions.partialclone "arbitrary string" &&
+	git -C repo config odb.magic.promisorRemote "arbitrary string" &&
 	git -C repo rev-list --exclude-promisor-objects --objects bar >out &&
 	grep $(git -C repo rev-parse bar) out &&
 	! grep $FOO out
@@ -205,7 +205,7 @@ test_expect_success 'rev-list stops traversal at missing and promised tree' '
 	promise_and_delete "$TREE2" &&
 
 	git -C repo config core.repositoryformatversion 1 &&
-	git -C repo config extensions.partialclone "arbitrary string" &&
+	git -C repo config odb.magic.promisorRemote "arbitrary string" &&
 	git -C repo rev-list --exclude-promisor-objects --objects HEAD >out &&
 	grep $(git -C repo rev-parse foo) out &&
 	! grep $TREE out &&
@@ -224,7 +224,7 @@ test_expect_success 'rev-list stops traversal at missing and promised blob' '
 	promise_and_delete "$BLOB" &&
 
 	git -C repo config core.repositoryformatversion 1 &&
-	git -C repo config extensions.partialclone "arbitrary string" &&
+	git -C repo config odb.magic.promisorRemote "arbitrary string" &&
 	git -C repo rev-list --exclude-promisor-objects --objects HEAD >out &&
 	grep $(git -C repo rev-parse HEAD) out &&
 	! grep $BLOB out
@@ -243,7 +243,7 @@ test_expect_success 'rev-list stops traversal at promisor commit, tree, and blob
 	printf "%s\n%s\n%s\n" $COMMIT $TREE $BLOB | pack_as_from_promisor &&
 
 	git -C repo config core.repositoryformatversion 1 &&
-	git -C repo config extensions.partialclone "arbitrary string" &&
+	git -C repo config odb.magic.promisorRemote "arbitrary string" &&
 	git -C repo rev-list --exclude-promisor-objects --objects HEAD >out &&
 	! grep $COMMIT out &&
 	! grep $TREE out &&
@@ -267,7 +267,7 @@ test_expect_success 'rev-list accepts missing and promised objects on command li
 	promise_and_delete $BLOB &&
 
 	git -C repo config core.repositoryformatversion 1 &&
-	git -C repo config extensions.partialclone "arbitrary string" &&
+	git -C repo config odb.magic.promisorRemote "arbitrary string" &&
 	git -C repo rev-list --exclude-promisor-objects --objects "$COMMIT" "$TREE" "$BLOB"
 '
 
@@ -280,7 +280,7 @@ test_expect_success 'gc does not repack promisor objects' '
 	HASH=$(printf "$TREE_HASH\n" | pack_as_from_promisor) &&
 
 	git -C repo config core.repositoryformatversion 1 &&
-	git -C repo config extensions.partialclone "arbitrary string" &&
+	git -C repo config odb.magic.promisorRemote "arbitrary string" &&
 	git -C repo gc &&
 
 	# Ensure that the promisor packfile still exists, and remove it
@@ -304,7 +304,7 @@ test_expect_success 'gc stops traversal when a missing but promised object is re
 	HASH=$(promise_and_delete $TREE_HASH) &&
 
 	git -C repo config core.repositoryformatversion 1 &&
-	git -C repo config extensions.partialclone "arbitrary string" &&
+	git -C repo config odb.magic.promisorRemote "arbitrary string" &&
 	git -C repo gc &&
 
 	# Ensure that the promisor packfile still exists, and remove it
@@ -335,7 +335,7 @@ test_expect_success 'fetching of missing objects from an HTTP server' '
 	rm -rf repo/.git/objects/* &&
 
 	git -C repo config core.repositoryformatversion 1 &&
-	git -C repo config extensions.partialclone "origin" &&
+	git -C repo config odb.magic.promisorRemote "origin" &&
 	git -C repo cat-file -p "$HASH" &&
 
 	# Ensure that the .promisor file is written, and check that its
diff --git a/t/t5500-fetch-pack.sh b/t/t5500-fetch-pack.sh
index 3d33ab3875..8605b9b3b5 100755
--- a/t/t5500-fetch-pack.sh
+++ b/t/t5500-fetch-pack.sh
@@ -851,14 +851,14 @@ fetch_filter_blob_limit_zero () {
 	test_config -C "$SERVER" uploadpack.allowfilter 1 &&
 
 	git clone "$URL" client &&
-	test_config -C client extensions.partialclone origin &&
+	test_config -C client odb.magic.promisorRemote origin &&
 
 	test_commit -C "$SERVER" two &&
 
 	git -C client fetch --filter=blob:limit=0 origin HEAD:somewhere &&
 
 	# Ensure that commit is fetched, but blob is not
-	test_config -C client extensions.partialclone "arbitrary string" &&
+	test_config -C client odb.magic.promisorRemote "arbitrary string" &&
 	git -C client cat-file -e $(git -C "$SERVER" rev-parse two) &&
 	test_must_fail git -C client cat-file -e $(git hash-object "$SERVER/two.t")
 }
diff --git a/t/t5601-clone.sh b/t/t5601-clone.sh
index 0b62037744..022c65a7ae 100755
--- a/t/t5601-clone.sh
+++ b/t/t5601-clone.sh
@@ -647,7 +647,7 @@ partial_clone () {
 	git -C client fsck &&
 
 	# Ensure that unneeded blobs are not inadvertently fetched.
-	test_config -C client extensions.partialclone "not a remote" &&
+	test_config -C client odb.origin.promisorRemote "not a remote" &&
 	test_must_fail git -C client cat-file -e "$HASH1" &&
 
 	# But this blob was fetched, because clone performs an initial checkout
diff --git a/t/t5616-partial-clone.sh b/t/t5616-partial-clone.sh
index 44d8e80171..f7ed3c40e2 100755
--- a/t/t5616-partial-clone.sh
+++ b/t/t5616-partial-clone.sh
@@ -40,7 +40,7 @@ test_expect_success 'do partial clone 1' '
 		| sort >observed.oids &&
 	test_cmp expect_1.oids observed.oids &&
 	test "$(git -C pc1 config --local core.repositoryformatversion)" = "1" &&
-	test "$(git -C pc1 config --local extensions.partialclone)" = "origin" &&
+	test "$(git -C pc1 config --local odb.origin.promisorRemote)" = "origin" &&
 	test "$(git -C pc1 config --local core.partialclonefilter)" = "blob:none"
 '
 
diff --git a/t/t5702-protocol-v2.sh b/t/t5702-protocol-v2.sh
index a4fe6508bd..0c47599568 100755
--- a/t/t5702-protocol-v2.sh
+++ b/t/t5702-protocol-v2.sh
@@ -285,7 +285,7 @@ test_expect_success 'partial fetch' '
 	rm -rf client "$(pwd)/trace" &&
 	git init client &&
 	SERVER="file://$(pwd)/server" &&
-	test_config -C client extensions.partialClone "$SERVER" &&
+	test_config -C client odb.magic.promisorRemote "$SERVER" &&
 
 	GIT_TRACE_PACKET="$(pwd)/trace" git -C client -c protocol.version=2 \
 		fetch --filter=blob:none "$SERVER" master:refs/heads/other &&
diff --git a/unpack-trees.c b/unpack-trees.c
index cd0680f11e..13da008198 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -18,6 +18,7 @@
 #include "fsmonitor.h"
 #include "object-store.h"
 #include "fetch-object.h"
+#include "remote-odb.h"
 
 /*
  * Error messages expected by scripts out of plumbing commands such as
@@ -383,7 +384,7 @@ static int check_updates(struct unpack_trees_options *o)
 		load_gitmodules_file(index, &state);
 
 	enable_delayed_checkout(&state);
-	if (repository_format_partial_clone && o->update && !o->dry_run) {
+	if (has_remote_odb() && o->update && !o->dry_run) {
 		/*
 		 * Prefetch the objects that are to be checked out in the loop
 		 * below.
@@ -400,8 +401,7 @@ static int check_updates(struct unpack_trees_options *o)
 			}
 		}
 		if (to_fetch.nr)
-			fetch_objects(repository_format_partial_clone,
-				      &to_fetch);
+			remote_odb_get_many_direct(&to_fetch);
 		fetch_if_missing = fetch_if_missing_store;
 		oid_array_clear(&to_fetch);
 	}
-- 
2.18.0.330.g17eb9fed90


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v4 7/9] Use odb.origin.partialclonefilter instead of core.partialclonefilter
  2018-08-02  6:14 [PATCH v4 0/9] Introducing remote ODBs Christian Couder
                   ` (5 preceding siblings ...)
  2018-08-02  6:15 ` [PATCH v4 6/9] Use remote_odb_get_direct() and has_remote_odb() Christian Couder
@ 2018-08-02  6:15 ` Christian Couder
  2018-08-02  6:15 ` [PATCH v4 8/9] t0410: test fetching from many promisor remotes Christian Couder
  2018-08-02  6:15 ` [PATCH v4 9/9] Documentation/config: add odb.<name>.promisorRemote Christian Couder
  8 siblings, 0 replies; 23+ messages in thread
From: Christian Couder @ 2018-08-02  6:15 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Ben Peart, Jonathan Tan,
	Nguyen Thai Ngoc Duy, Mike Hommey, Lars Schneider, Eric Wong,
	Christian Couder, Jeff Hostetler, Eric Sunshine, Beat Bolli,
	Christian Couder

From: Christian Couder <christian.couder@gmail.com>

Let's make the partial clone filter specific to one odb
instead of general to all the odbs.

This makes it possible to have different partial clone
filters for different odbs.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 builtin/fetch.c               |  2 +-
 list-objects-filter-options.c | 28 ++++++++++++++++------------
 list-objects-filter-options.h |  3 ++-
 odb-helper.h                  |  1 +
 remote-odb.c                  |  2 ++
 t/t5616-partial-clone.sh      |  2 +-
 6 files changed, 23 insertions(+), 15 deletions(-)

diff --git a/builtin/fetch.c b/builtin/fetch.c
index 9c7df8356c..f3dee73179 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -1369,7 +1369,7 @@ static inline void fetch_one_setup_partial(struct remote *remote)
 	 * the config.
 	 */
 	if (!filter_options.choice)
-		partial_clone_get_default_filter_spec(&filter_options);
+		partial_clone_get_default_filter_spec(&filter_options, remote->name);
 	return;
 }
 
diff --git a/list-objects-filter-options.c b/list-objects-filter-options.c
index 60452c8f36..755a793664 100644
--- a/list-objects-filter-options.c
+++ b/list-objects-filter-options.c
@@ -7,6 +7,7 @@
 #include "list-objects-filter.h"
 #include "list-objects-filter-options.h"
 #include "remote-odb.h"
+#include "odb-helper.h"
 
 /*
  * Parse value of the argument to the "filter" keyword.
@@ -29,6 +30,9 @@ static int gently_parse_list_objects_filter(
 {
 	const char *v0;
 
+	if (!arg)
+		return 0;
+
 	if (filter_options->choice) {
 		if (errbuf) {
 			strbuf_init(errbuf, 0);
@@ -116,6 +120,7 @@ void partial_clone_register(
 	const struct list_objects_filter_options *filter_options)
 {
 	char *cfg_name;
+	char *filter_name;
 
 	/* Check if it is already registered */
 	if (find_odb_helper(remote))
@@ -131,27 +136,26 @@ void partial_clone_register(
 	/*
 	 * Record the initial filter-spec in the config as
 	 * the default for subsequent fetches from this remote.
-	 *
-	 * TODO: move core.partialclonefilter into odb.<name>
 	 */
-	core_partial_clone_filter_default =
-		xstrdup(filter_options->filter_spec);
-	git_config_set("core.partialclonefilter",
-		       core_partial_clone_filter_default);
+	filter_name = xstrfmt("odb.%s.partialclonefilter", remote);
+	git_config_set(filter_name, filter_options->filter_spec);
+	free(filter_name);
 
 	/* Make sure the config info are reset */
 	remote_odb_reinit();
 }
 
 void partial_clone_get_default_filter_spec(
-	struct list_objects_filter_options *filter_options)
+	struct list_objects_filter_options *filter_options,
+	const char *remote)
 {
+	struct odb_helper *helper = find_odb_helper(remote);
+
 	/*
 	 * Parse default value, but silently ignore it if it is invalid.
 	 */
-	if (!core_partial_clone_filter_default)
-		return;
-	gently_parse_list_objects_filter(filter_options,
-					 core_partial_clone_filter_default,
-					 NULL);
+	if (helper)
+		gently_parse_list_objects_filter(filter_options,
+						 helper->partial_clone_filter,
+						 NULL);
 }
diff --git a/list-objects-filter-options.h b/list-objects-filter-options.h
index 0000a61f82..12ceef3230 100644
--- a/list-objects-filter-options.h
+++ b/list-objects-filter-options.h
@@ -74,6 +74,7 @@ void partial_clone_register(
 	const char *remote,
 	const struct list_objects_filter_options *filter_options);
 void partial_clone_get_default_filter_spec(
-	struct list_objects_filter_options *filter_options);
+	struct list_objects_filter_options *filter_options,
+	const char *remote);
 
 #endif /* LIST_OBJECTS_FILTER_OPTIONS_H */
diff --git a/odb-helper.h b/odb-helper.h
index 154ce4c7e4..4af75ef55c 100644
--- a/odb-helper.h
+++ b/odb-helper.h
@@ -10,6 +10,7 @@
 struct odb_helper {
 	const char *name;                 /* odb.<NAME>.* */
 	const char *remote;               /* odb.<NAME>.promisorRemote */
+	const char *partial_clone_filter; /* odb.<NAME>.partialCloneFilter */
 
 	struct odb_helper *next;
 };
diff --git a/remote-odb.c b/remote-odb.c
index f063ba0fda..49cf8e30aa 100644
--- a/remote-odb.c
+++ b/remote-odb.c
@@ -60,6 +60,8 @@ static int remote_odb_config(const char *var, const char *value, void *data)
 
 		return 0;
 	}
+	if (!strcmp(subkey, "partialclonefilter"))
+		return git_config_string(&o->partial_clone_filter, var, value);
 
 	return 0;
 }
diff --git a/t/t5616-partial-clone.sh b/t/t5616-partial-clone.sh
index f7ed3c40e2..e2aeee1d7d 100755
--- a/t/t5616-partial-clone.sh
+++ b/t/t5616-partial-clone.sh
@@ -41,7 +41,7 @@ test_expect_success 'do partial clone 1' '
 	test_cmp expect_1.oids observed.oids &&
 	test "$(git -C pc1 config --local core.repositoryformatversion)" = "1" &&
 	test "$(git -C pc1 config --local odb.origin.promisorRemote)" = "origin" &&
-	test "$(git -C pc1 config --local core.partialclonefilter)" = "blob:none"
+	test "$(git -C pc1 config --local odb.origin.partialclonefilter)" = "blob:none"
 '
 
 # checkout master to force dynamic object fetch of blobs at HEAD.
-- 
2.18.0.330.g17eb9fed90


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v4 8/9] t0410: test fetching from many promisor remotes
  2018-08-02  6:14 [PATCH v4 0/9] Introducing remote ODBs Christian Couder
                   ` (6 preceding siblings ...)
  2018-08-02  6:15 ` [PATCH v4 7/9] Use odb.origin.partialclonefilter instead of core.partialclonefilter Christian Couder
@ 2018-08-02  6:15 ` Christian Couder
  2018-08-02  6:15 ` [PATCH v4 9/9] Documentation/config: add odb.<name>.promisorRemote Christian Couder
  8 siblings, 0 replies; 23+ messages in thread
From: Christian Couder @ 2018-08-02  6:15 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Ben Peart, Jonathan Tan,
	Nguyen Thai Ngoc Duy, Mike Hommey, Lars Schneider, Eric Wong,
	Christian Couder, Jeff Hostetler, Eric Sunshine, Beat Bolli,
	Christian Couder

From: Christian Couder <christian.couder@gmail.com>

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 t/t0410-partial-clone.sh | 24 +++++++++++++++++++++++-
 1 file changed, 23 insertions(+), 1 deletion(-)

diff --git a/t/t0410-partial-clone.sh b/t/t0410-partial-clone.sh
index 71a9b9a3e8..9d513ebf57 100755
--- a/t/t0410-partial-clone.sh
+++ b/t/t0410-partial-clone.sh
@@ -170,8 +170,30 @@ test_expect_success 'fetching of missing objects' '
 	git verify-pack --verbose "$IDX" | grep "$HASH"
 '
 
+test_expect_success 'fetching of missing objects from another odb remote' '
+	git clone "file://$(pwd)/server" server2 &&
+	test_commit -C server2 bar &&
+	git -C server2 repack -a -d --write-bitmap-index &&
+	HASH2=$(git -C server2 rev-parse bar) &&
+
+	git -C repo remote add server2 "file://$(pwd)/server2" &&
+	git -C repo config odb.magic2.promisorRemote server2 &&
+	git -C repo cat-file -p "$HASH2" &&
+
+	git -C repo fetch server2 &&
+	rm -rf repo/.git/objects/* &&
+	git -C repo cat-file -p "$HASH2" &&
+
+	# Ensure that the .promisor file is written, and check that its
+	# associated packfile contains the object
+	ls repo/.git/objects/pack/pack-*.promisor >promisorlist &&
+	test_line_count = 1 promisorlist &&
+	IDX=$(cat promisorlist | sed "s/promisor$/idx/") &&
+	git verify-pack --verbose "$IDX" | grep "$HASH2"
+'
+
 test_expect_success 'rev-list stops traversal at missing and promised commit' '
-	rm -rf repo &&
+	rm -rf repo server server2 &&
 	test_create_repo repo &&
 	test_commit -C repo foo &&
 	test_commit -C repo bar &&
-- 
2.18.0.330.g17eb9fed90


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v4 9/9] Documentation/config: add odb.<name>.promisorRemote
  2018-08-02  6:14 [PATCH v4 0/9] Introducing remote ODBs Christian Couder
                   ` (7 preceding siblings ...)
  2018-08-02  6:15 ` [PATCH v4 8/9] t0410: test fetching from many promisor remotes Christian Couder
@ 2018-08-02  6:15 ` Christian Couder
  2018-08-02 22:55   ` Stefan Beller
  8 siblings, 1 reply; 23+ messages in thread
From: Christian Couder @ 2018-08-02  6:15 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Ben Peart, Jonathan Tan,
	Nguyen Thai Ngoc Duy, Mike Hommey, Lars Schneider, Eric Wong,
	Christian Couder, Jeff Hostetler, Eric Sunshine, Beat Bolli,
	Christian Couder

From: Christian Couder <christian.couder@gmail.com>

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 Documentation/config.txt | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 43b2de7b5f..2d048d47f2 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -2513,6 +2513,11 @@ This setting can be overridden with the `GIT_NOTES_REWRITE_REF`
 environment variable, which must be a colon separated list of refs or
 globs.
 
+odb.<name>.promisorRemote::
+	The name of a promisor remote. For now promisor remotes are
+	the only kind of remote object database (odb) that is
+	supported.
+
 pack.window::
 	The size of the window used by linkgit:git-pack-objects[1] when no
 	window size is given on the command line. Defaults to 10.
-- 
2.18.0.330.g17eb9fed90


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH v4 9/9] Documentation/config: add odb.<name>.promisorRemote
  2018-08-02  6:15 ` [PATCH v4 9/9] Documentation/config: add odb.<name>.promisorRemote Christian Couder
@ 2018-08-02 22:55   ` Stefan Beller
  2018-09-25  8:07     ` Christian Couder
  0 siblings, 1 reply; 23+ messages in thread
From: Stefan Beller @ 2018-08-02 22:55 UTC (permalink / raw)
  To: Christian Couder
  Cc: git, Junio C Hamano, Jeff King, Ben Peart, Jonathan Tan,
	Duy Nguyen, Mike Hommey, Lars Schneider, Eric Wong,
	Christian Couder, Jeff Hostetler, Eric Sunshine, Beat Bolli

On Wed, Aug 1, 2018 at 11:16 PM Christian Couder
<christian.couder@gmail.com> wrote:
>
> From: Christian Couder <christian.couder@gmail.com>
>
> Signed-off-by: Junio C Hamano <gitster@pobox.com>
> ---
>  Documentation/config.txt | 5 +++++
>  1 file changed, 5 insertions(+)
>
> diff --git a/Documentation/config.txt b/Documentation/config.txt
> index 43b2de7b5f..2d048d47f2 100644
> --- a/Documentation/config.txt
> +++ b/Documentation/config.txt
> @@ -2513,6 +2513,11 @@ This setting can be overridden with the `GIT_NOTES_REWRITE_REF`
>  environment variable, which must be a colon separated list of refs or
>  globs.
>
> +odb.<name>.promisorRemote::
> +       The name of a promisor remote. For now promisor remotes are
> +       the only kind of remote object database (odb) that is
> +       supported.
> +

Can you explain the end goal for this? (I did not find it in the cover letter,
nor do I make sense of this documentation)

So from what I understand, this series relates to partialClone, which
has the remote name of the "promisor" in extensions.partialclone.
That is the remote to contact for any needs w.r.t. partial clone and
fetching on demand.

This key "odb.<name1>.promisorRemote = <name2>" introduces
2 new names, where do each of these two names hook in?
name2 is a remote, such as "origin" from what I can tell, but
which naming scheme does name1 follow here?

What makes the odb key different, in that the partial clone
feature only handles objects as well?

>  pack.window::
>         The size of the window used by linkgit:git-pack-objects[1] when no
>         window size is given on the command line. Defaults to 10.
> --
> 2.18.0.330.g17eb9fed90
>

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v4 9/9] Documentation/config: add odb.<name>.promisorRemote
  2018-08-02 22:55   ` Stefan Beller
@ 2018-09-25  8:07     ` Christian Couder
  2018-09-25 22:31       ` Junio C Hamano
  2018-10-16 17:43       ` Jonathan Nieder
  0 siblings, 2 replies; 23+ messages in thread
From: Christian Couder @ 2018-09-25  8:07 UTC (permalink / raw)
  To: Stefan Beller
  Cc: git, Junio C Hamano, Jeff King, Ben Peart, Jonathan Tan,
	Duy Nguyen, Mike Hommey, Lars Schneider, Eric Wong,
	Christian Couder, Jeff Hostetler, Eric Sunshine, Beat Bolli

On Fri, Aug 3, 2018 at 12:55 AM, Stefan Beller <sbeller@google.com> wrote:
> On Wed, Aug 1, 2018 at 11:16 PM Christian Couder
> <christian.couder@gmail.com> wrote:
>>
>> From: Christian Couder <christian.couder@gmail.com>
>>
>> Signed-off-by: Junio C Hamano <gitster@pobox.com>
>> ---
>>  Documentation/config.txt | 5 +++++
>>  1 file changed, 5 insertions(+)
>>
>> diff --git a/Documentation/config.txt b/Documentation/config.txt
>> index 43b2de7b5f..2d048d47f2 100644
>> --- a/Documentation/config.txt
>> +++ b/Documentation/config.txt
>> @@ -2513,6 +2513,11 @@ This setting can be overridden with the `GIT_NOTES_REWRITE_REF`
>>  environment variable, which must be a colon separated list of refs or
>>  globs.
>>
>> +odb.<name>.promisorRemote::
>> +       The name of a promisor remote. For now promisor remotes are
>> +       the only kind of remote object database (odb) that is
>> +       supported.
>> +
>
> Can you explain the end goal for this? (I did not find it in the cover letter,
> nor do I make sense of this documentation)

First thank you for (re)opening this discussion, as I think it may
help resolve the issues related to my work.

In the cover letter there is a "Discussion" section which is about
this, but I agree that it might not be very clear.

The main issue that this patch series tries to solve is that
extensions.partialclone config option limits the partial clone and
promisor features to only one remote. One related issue is that it
also prevents to have other kind of promisor/partial clone/odb
remotes. By other kind I mean remotes that would not necessarily be
git repos, but that could store objects (that's where ODB, for Object
DataBase, comes from) and could provide those objects to Git through a
helper (or driver) script or program.

For reference I tried to raise these issues (especially the first one)
at least twice before extensions.partialclone was merged:

https://public-inbox.org/git/CAP8UFD3Jt+0Lq9Yx_7x3sJD+jG+A25bAgDg7zp+dZV43+1-vow@mail.gmail.com/
https://public-inbox.org/git/CAP8UFD0P7kVo2NP4Wq7OaSV4H1+sqHapuzW5AQef+enNS0S5hw@mail.gmail.com/

but it was still merged as is.

(So of course now it's not surprising that my work on this patch
series keeps conflicting with work that is still going on promisors
and partial clone, and unfortunately the result is that my work keeps
being ejected from pu when it can reach it.)

> So from what I understand, this series relates to partialClone, which
> has the remote name of the "promisor" in extensions.partialclone.
> That is the remote to contact for any needs w.r.t. partial clone and
> fetching on demand.

Yes.

> This key "odb.<name1>.promisorRemote = <name2>" introduces
> 2 new names, where do each of these two names hook in?
> name2 is a remote, such as "origin" from what I can tell, but
> which naming scheme does name1 follow here?

There is just one new name. Instead of:

  extensions.partialclone = <remote name>

there is:

  odb.<remote odb name>.promisorRemote = <remote name>

So it is now like:

  remote.<remote name>.url = <remote url>

which we use for remote repositories.

And it enables us to:

  - have more than one promisor remote
  - specify different parameters for each promisor remote
  - make it possible later to have other kind of promisor/odb remotes

It also restores the distributed nature of Git which was kind of
broken for promisor remotes.

> What makes the odb key different, in that the partial clone
> feature only handles objects as well?

I am not sure I understand this question. Anyway if we want more than
one promisor remote, we need to be able to specify different
parameters for each promisor remote. For example now
core.partialclonefilter is used to specify some filters for the
promisor remote, but how can we nicely specify different partial clone
filters if we have more than one promisor remote?

With the changes in this patch series core.partialclonefilter is
replaced with odb.<remote odb name>.partialclonefilter, so that
parameters for a remote odb are properly grouped together in the
section where the remote odb is defined.

One alternative scheme could be for example to have:

  remote.<remote name>.promisor = (true|false)

or maybe:

  remote.<remote name>.partialclone = (true|false)

instead of:

  extensions.partialclone = <remote name>

And then we could also have:

  remote.<remote name>.partialclonefilter = <partial clone filter>

The issue with this scheme is that it kind of overloads the
"remote.<remote name>.*" namespace for something that can be seen as
different especially if, as I want to do it later, we are going to
have other kind of promisor/odb remotes.

I plan to send a V5 of this patch series really soon now, where I will
try to explain better the end goal.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v4 9/9] Documentation/config: add odb.<name>.promisorRemote
  2018-09-25  8:07     ` Christian Couder
@ 2018-09-25 22:31       ` Junio C Hamano
  2018-09-26  4:12         ` Jeff King
  2018-10-16 17:43       ` Jonathan Nieder
  1 sibling, 1 reply; 23+ messages in thread
From: Junio C Hamano @ 2018-09-25 22:31 UTC (permalink / raw)
  To: Christian Couder
  Cc: Stefan Beller, git, Jeff King, Ben Peart, Jonathan Tan,
	Duy Nguyen, Mike Hommey, Lars Schneider, Eric Wong,
	Christian Couder, Jeff Hostetler, Eric Sunshine, Beat Bolli

Christian Couder <christian.couder@gmail.com> writes:

> The main issue that this patch series tries to solve is that
> extensions.partialclone config option limits the partial clone and
> promisor features to only one remote. One related issue is that it
> also prevents to have other kind of promisor/partial clone/odb
> remotes. By other kind I mean remotes that would not necessarily be
> git repos, but that could store objects (that's where ODB, for Object
> DataBase, comes from) and could provide those objects to Git through a
> helper (or driver) script or program.

I do not think "sources that are not git repositories" is all that
interesting, unless they can also serve as the source for ext::
remote helper.  And if they can serve "git fetch ext::...", I think
they can be treated just like a normal Git repository by the
backfill code that needs to lazily populate the partial clone.

And it would be nice to be able to say "I took these commits from
that remote and that remote should be able to backfill the trees and
the blobs necessary to complete these commits" for more than one
remote would obviously be a good thing.  The way we mark the
promisor packs currently is by a mere presence of a file, but
nothing prevents us from extending it to write the nickname of the
configured remote the pack was taken from to help us answer "who can
feed us the remaining objects?", for example, so I do not think it
is an insurmountable problem

I guess JTan is the primary person who is interested/working on the
partial clone with backfill?  Have you two been collaborating well?

Do you two need help from us to make that happen, and if so what do
you need?  Stop the world and declare this and that source files are
off limits for two weeks, or something like that?



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v4 9/9] Documentation/config: add odb.<name>.promisorRemote
  2018-09-25 22:31       ` Junio C Hamano
@ 2018-09-26  4:12         ` Jeff King
  2018-09-26 13:44           ` Taylor Blau
  2018-09-26 18:11           ` Junio C Hamano
  0 siblings, 2 replies; 23+ messages in thread
From: Jeff King @ 2018-09-26  4:12 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Christian Couder, Stefan Beller, git, Ben Peart, Jonathan Tan,
	Duy Nguyen, Mike Hommey, Lars Schneider, Eric Wong,
	Christian Couder, Jeff Hostetler, Eric Sunshine, Beat Bolli

On Tue, Sep 25, 2018 at 03:31:36PM -0700, Junio C Hamano wrote:

> Christian Couder <christian.couder@gmail.com> writes:
> 
> > The main issue that this patch series tries to solve is that
> > extensions.partialclone config option limits the partial clone and
> > promisor features to only one remote. One related issue is that it
> > also prevents to have other kind of promisor/partial clone/odb
> > remotes. By other kind I mean remotes that would not necessarily be
> > git repos, but that could store objects (that's where ODB, for Object
> > DataBase, comes from) and could provide those objects to Git through a
> > helper (or driver) script or program.
> 
> I do not think "sources that are not git repositories" is all that
> interesting, unless they can also serve as the source for ext::
> remote helper.  And if they can serve "git fetch ext::...", I think
> they can be treated just like a normal Git repository by the
> backfill code that needs to lazily populate the partial clone.

I don't know about that. Imagine I had a regular Git repo with a bunch
of large blobs, and then I also stored those large blobs in something
like S3 that provides caching, geographic locality, and resumable
transfers.

It would be nice to be able to say:

  1. Clone from the real repo, but do not transfer any blobs larger than
     10MB.

  2. When you need a blob, check the external odb that points to S3. Git
     cannot know about this automatically, but presumably you would set
     a few config variables to point to an external-odb helper script.

  3. If for some reason S3 doesn't work, you can always request it from
     the original repo. That part _doesn't_ need extra config, since we
     can assume that the source of the promisor pack can feed us the
     extra objects[1].

But you don't need to ever be able to "git fetch" from the S3 repo.

Now if you are arguing that the interface to the external-odb helper
script should be that it _looks_ like upload-pack, but simply advertises
no refs and will let you fetch any object, that makes more sense to me.
It's not something you could "git clone", but you can "git fetch" from
it.

However, that may be an overly constricting interface for the helper.
E.g., we might want to be able to issue several requests and have them
transfer in parallel. But I suppose we could teach that trick to
upload-pack in the long run, as it may be applicable even to fetching
from "real" git repos.

Hmm. Actually, I kind of like that direction the more I think about it.

-Peff

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v4 9/9] Documentation/config: add odb.<name>.promisorRemote
  2018-09-26  4:12         ` Jeff King
@ 2018-09-26 13:44           ` Taylor Blau
  2018-09-26 18:11           ` Junio C Hamano
  1 sibling, 0 replies; 23+ messages in thread
From: Taylor Blau @ 2018-09-26 13:44 UTC (permalink / raw)
  To: Jeff King
  Cc: Junio C Hamano, Christian Couder, Stefan Beller, git, Ben Peart,
	Jonathan Tan, Duy Nguyen, Mike Hommey, Lars Schneider, Eric Wong,
	Christian Couder, Jeff Hostetler, Eric Sunshine, Beat Bolli

On Wed, Sep 26, 2018 at 12:12:22AM -0400, Jeff King wrote:
> On Tue, Sep 25, 2018 at 03:31:36PM -0700, Junio C Hamano wrote:
>
> > Christian Couder <christian.couder@gmail.com> writes:
> >
> > > The main issue that this patch series tries to solve is that
> > > extensions.partialclone config option limits the partial clone and
> > > promisor features to only one remote. One related issue is that it
> > > also prevents to have other kind of promisor/partial clone/odb
> > > remotes. By other kind I mean remotes that would not necessarily be
> > > git repos, but that could store objects (that's where ODB, for Object
> > > DataBase, comes from) and could provide those objects to Git through a
> > > helper (or driver) script or program.
> >
> > I do not think "sources that are not git repositories" is all that
> > interesting, unless they can also serve as the source for ext::
> > remote helper.  And if they can serve "git fetch ext::...", I think
> > they can be treated just like a normal Git repository by the
> > backfill code that needs to lazily populate the partial clone.
>
> I don't know about that. Imagine I had a regular Git repo with a bunch
> of large blobs, and then I also stored those large blobs in something
> like S3 that provides caching, geographic locality, and resumable
> transfers.
>
> [ ... ]
>
> Now if you are arguing that the interface to the external-odb helper
> script should be that it _looks_ like upload-pack, but simply advertises
> no refs and will let you fetch any object, that makes more sense to me.
> It's not something you could "git clone", but you can "git fetch" from
> it.
>
> However, that may be an overly constricting interface for the helper.
> E.g., we might want to be able to issue several requests and have them
> transfer in parallel. But I suppose we could teach that trick to
> upload-pack in the long run, as it may be applicable even to fetching
> from "real" git repos.
>
> Hmm. Actually, I kind of like that direction the more I think about it.

Yes, this is an important design decision for Git LFS, which I believe
is important to this series. Git LFS allows the caller to issue `n`
parallel object transfers (uploads or downloads) at a time, which is
useful when, say, checking out a repository that has many large objects.

We do this trick with 'filter.lfs.process', where we accumulate many Git
LFS objects that we wish to tell Git about so that it can check them out
into the working copy, and then promise that we will provide the
contents later (e.g., by sending status=delayed).

We then "batch" up all of those requests, issue them all at once (after
which the LFS API will tell us the URLs of where to upload/download each
item), and then we open "N" threads to do that work.

After all of that, we respond back with all of the objects that we had
to download, and close the process filter.

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v4 9/9] Documentation/config: add odb.<name>.promisorRemote
  2018-09-26  4:12         ` Jeff King
  2018-09-26 13:44           ` Taylor Blau
@ 2018-09-26 18:11           ` Junio C Hamano
  1 sibling, 0 replies; 23+ messages in thread
From: Junio C Hamano @ 2018-09-26 18:11 UTC (permalink / raw)
  To: Jeff King
  Cc: Christian Couder, Stefan Beller, git, Ben Peart, Jonathan Tan,
	Duy Nguyen, Mike Hommey, Lars Schneider, Eric Wong,
	Christian Couder, Jeff Hostetler, Eric Sunshine, Beat Bolli

Jeff King <peff@peff.net> writes:

>> I do not think "sources that are not git repositories" is all that
>> interesting, unless they can also serve as the source for ext::
>> remote helper.  And if they can serve "git fetch ext::...", I think
>> they can be treated just like a normal Git repository by the
>> backfill code that needs to lazily populate the partial clone.
>
> I don't know about that. Imagine I had a regular Git repo with a bunch
> of large blobs, and then I also stored those large blobs in something
> like S3 that provides caching, geographic locality, and resumable
> transfers.
>
> It would be nice to be able to say:
>
>   1. Clone from the real repo, but do not transfer any blobs larger than
>      10MB.
>
>   2. When you need a blob, check the external odb that points to S3. Git
>      cannot know about this automatically, but presumably you would set
>      a few config variables to point to an external-odb helper script.
>
>   3. If for some reason S3 doesn't work, you can always request it from
>      the original repo. That part _doesn't_ need extra config, since we
>      can assume that the source of the promisor pack can feed us the
>      extra objects[1].
>
> But you don't need to ever be able to "git fetch" from the S3 repo.
>
> Now if you are arguing that the interface to the external-odb helper
> script should be that it _looks_ like upload-pack, but simply advertises
> no refs and will let you fetch any object, that makes more sense to me.
> It's not something you could "git clone", but you can "git fetch" from
> it.

Yup.  The lazy backfill JTan has, if I understand correctly, only
wants "Please give me this and that object" and use of "upload-pack"
is an implementation detail.  Over the existing Git protocols, you
may implement it as sending these object names as "want" and perhaps
restrict the traversal (if there is a "want" object that is commit)
by giving some commits as "have", i.e. "upload-pack" may not be the
best model for the other side, but that is what we have readily
available.  I was hoping that the way we take to move forward is to
enhance that interface so that we can use different "object store"
backends as needed, to satisfy needs from both parties.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v4 9/9] Documentation/config: add odb.<name>.promisorRemote
  2018-09-25  8:07     ` Christian Couder
  2018-09-25 22:31       ` Junio C Hamano
@ 2018-10-16 17:43       ` Jonathan Nieder
  2018-10-16 22:22         ` Jonathan Tan
  2018-10-31  6:28         ` Christian Couder
  1 sibling, 2 replies; 23+ messages in thread
From: Jonathan Nieder @ 2018-10-16 17:43 UTC (permalink / raw)
  To: Christian Couder
  Cc: Stefan Beller, git, Junio C Hamano, Jeff King, Ben Peart,
	Jonathan Tan, Duy Nguyen, Mike Hommey, Lars Schneider, Eric Wong,
	Christian Couder, Jeff Hostetler, Eric Sunshine, Beat Bolli

Hi Christian,

On Tue, Sep 25, 2018, Christian Couder wrote:

> In the cover letter there is a "Discussion" section which is about
> this, but I agree that it might not be very clear.
>
> The main issue that this patch series tries to solve is that
> extensions.partialclone config option limits the partial clone and
> promisor features to only one remote. One related issue is that it
> also prevents to have other kind of promisor/partial clone/odb
> remotes. By other kind I mean remotes that would not necessarily be
> git repos, but that could store objects (that's where ODB, for Object
> DataBase, comes from) and could provide those objects to Git through a
> helper (or driver) script or program.

Thanks for this explanation.  I took the opportunity to learn more
while you were in the bay area for the google summer of code mentor
summit and learned a little more, which was very helpful to me.

The broader picture is that this is meant to make Git natively handle
large blobs in a nicer way.  The design in this series has a few
components:

 1. Teaching partial clone to attempt to fetch missing objects from
    multiple remotes instead of only one.  This is useful because you
    can have a server that is nearby and cheaper to serve from (some
    kind of local cache server) that you make requests to first before
    falling back to the canonical source of objects.

 2. Simplifying the protocol for fetching missing objects so that it
    can be satisfied by a lighter weight object storage system than
    a full Git server.  The ODB helpers introduced in this series are
    meant to speak such a simpler protocol since they are only used
    for one-off requests of a collection of missing objects instead of
    needing to understand refs, Git's negotiation, etc.

 3. (possibly, though not in this series) Making the criteria for what
    objects can be missing more aggressive, so that I can "git add"
    a large file and work with it using Git without even having a
    second copy of that object in my local object store.

For (2), I would like to see us improve the remote helper
infrastructure instead of introducing a new ODB helper.  Remote
helpers are already permitted to fetch some objects without listing
refs --- perhaps we will want to

 i. split listing refs to a separate capability, so that a remote
    helper can advertise that it doesn't support that.  (Alternatively
    the remote could advertise that it has no refs.)

 ii. Use the "long-running process" mechanism to improve how Git
     communicates with a remote helper.

For (1), things get more tricky.  In an object store from a partial
clone today, we relax the ordinary "closure under reachability"
invariant but in a minor way.  We'll need to work out how this works
with multiple promisor remotes.

The idea today is that there are two kinds of packs: promisor packs
(from the promisor remote) and non-promisor packs.  Promisor packs are
allowed to have reachability edges (for example a tree->blob edge)
that point to a missing object, since the promisor remote has promised
that we will be able to access that object on demand.  Non-promisor
packs are also allowed to have reachability edges that point to a
missing object, as long as there is a reachability edge from an object
in a promisor pack to the same object (because of the same promise).
See "Handling Missing Objects" in Documentation/technical/partial-clone.txt
for more details.

To prevent older versions of Git from being confused by partial clone
repositories, they use the repositoryFormatVersion mechanism:

	[core]
		repositoryFormatVersion = 1
	[extensions]
		partialClone = ...

If we change the invariant, we will need to use a new extensions.* key
to ensure that versions of Git that are not aware of the new invariant
do not operate on the repository.

A promisor pack is indicated by there being a .promisor file next to
the usual .pack file.  Currently the .promisor file is empty.  The
previous idea was that once we want more metadata (e.g. for the sake of
multiple promisor remotes), we could write it in that file.  For
example, remotes could be associated to a <promisor-id> and the
.promisor file could indicate which <promisor-id> has promised to serve
requests for objects reachable from objects in this pack.

That will complicate the object access code as well, since currently
we only find who has promised an object during "git fsck" and similar
operations.  During everyday access we do not care which promisor
pack caused the object to be promised, since there is only one promisor
remote to fetch from anyway.

So much for the current setup.  For (1), I believe you are proposing to
still have only one effective <promisor-id>, so it doesn't necessarily
require modifying the extensions.* configuration.  Instead, the idea is
that when trying to access an object, we would follow one of a list of
steps:

 1. First, check the local object store. If it's there, we're done.
 2. Second, try alternates --- maybe the object is in one of those!
 3. Now, try promisor remotes, one at a time, in user-configured order.

In other words, I think that for (1) all we would need is a new
configuration

	[object]
		missingObjectRemote = local-cache-remote
		missingObjectRemote = origin

The semantics would be that when trying to access a promised object,
we attempt to fetch from these remotes one at a time, in the order
specified.  We could require that the remote named in
extensions.partialClone be one of the listed remotes, without having
to care where it shows up in the list.

That way, we get the benefit (1) without having to change the
semantics of extensions.partialClone and without having to care about
the order of sections in the config.  What do you think?

Thanks,
Jonathan

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v4 9/9] Documentation/config: add odb.<name>.promisorRemote
  2018-10-16 17:43       ` Jonathan Nieder
@ 2018-10-16 22:22         ` Jonathan Tan
  2018-10-19  0:01           ` Junio C Hamano
  2018-10-31  6:28         ` Christian Couder
  1 sibling, 1 reply; 23+ messages in thread
From: Jonathan Tan @ 2018-10-16 22:22 UTC (permalink / raw)
  To: jrnieder
  Cc: christian.couder, sbeller, git, gitster, peff, Ben.Peart,
	jonathantanmy, pclouds, mh, larsxschneider, e, chriscool,
	jeffhost, sunshine, dev+git

>  1. Teaching partial clone to attempt to fetch missing objects from
>     multiple remotes instead of only one.  This is useful because you
>     can have a server that is nearby and cheaper to serve from (some
>     kind of local cache server) that you make requests to first before
>     falling back to the canonical source of objects.

Quoting the above definition of (1) for reference. I think Jonathan
Nieder has covered the relevant points well - I'll just expand on (1).

> So much for the current setup.  For (1), I believe you are proposing to
> still have only one effective <promisor-id>, so it doesn't necessarily
> require modifying the extensions.* configuration.  Instead, the idea is
> that when trying to access an object, we would follow one of a list of
> steps:
> 
>  1. First, check the local object store. If it's there, we're done.
>  2. Second, try alternates --- maybe the object is in one of those!
>  3. Now, try promisor remotes, one at a time, in user-configured order.
> 
> In other words, I think that for (1) all we would need is a new
> configuration
> 
> 	[object]
> 		missingObjectRemote = local-cache-remote
> 		missingObjectRemote = origin
> 
> The semantics would be that when trying to access a promised object,
> we attempt to fetch from these remotes one at a time, in the order
> specified.  We could require that the remote named in
> extensions.partialClone be one of the listed remotes, without having
> to care where it shows up in the list.

Or allow extensions.partialClone=<R> wherein <R> is not in the
missingObjectRemote, in which case <R> is tried first, so that we don't
have to reject some configurations.

> That way, we get the benefit (1) without having to change the
> semantics of extensions.partialClone and without having to care about
> the order of sections in the config.  What do you think?

Let's define the promisor remotes of a repository as those in
missingObjectRemote or extensions.partialClone (currently, we talk about
"the promisor remote" (singular), defined in extensions.partialClone).

Overall, this seems like a reasonable idea to me, if we keep the
restriction that we can only fetch with filter from a promisor remote.
This allows us to extend the definition of a promisor object in a
manner consistent to the current definition - to say "a promisor object
is one promised by at least one promisor remote" (currently, "a promisor
object is promised by the promisor remote").

In the presence of missingObjectRemote, old versions of Git, when lazily
fetching, would only know to try the extensions.partialClone remote. But
this is safe because existing data wouldn't be clobbered (since we're
not using ideas like adding meaning to the contents of the .promisor
file). Also, other things like fsck and gc still work.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v4 9/9] Documentation/config: add odb.<name>.promisorRemote
  2018-10-16 22:22         ` Jonathan Tan
@ 2018-10-19  0:01           ` Junio C Hamano
  2018-10-19  0:33             ` Jonathan Nieder
  0 siblings, 1 reply; 23+ messages in thread
From: Junio C Hamano @ 2018-10-19  0:01 UTC (permalink / raw)
  To: Jonathan Tan
  Cc: jrnieder, christian.couder, sbeller, git, peff, Ben.Peart,
	pclouds, mh, larsxschneider, e, chriscool, jeffhost, sunshine,
	dev+git

Jonathan Tan <jonathantanmy@google.com> writes:

>> 	[object]
>> 		missingObjectRemote = local-cache-remote
>> 		missingObjectRemote = origin
>> 
> In the presence of missingObjectRemote, old versions of Git, when lazily
> fetching, would only know to try the extensions.partialClone remote. But
> this is safe because existing data wouldn't be clobbered (since we're
> not using ideas like adding meaning to the contents of the .promisor
> file). Also, other things like fsck and gc still work.

It is a good idea to implicitly include the promisor-remote to the
set of secondary places to consult to help existing versions of Git,
but once the repository starts fetching incomplete subgraphs and
adding new object.missingobjectremote [*1*], these versions of Git
will stop working correctly, so I am not sure if it is all that
useful approach for compatibility in practice.


[Footnote]

*1* That name with two "object" in it sounds horrible.  I think the
same keyname in 'core' section may sit better (this feature sounds
more 'core' than other cruft that crept into 'core' section over
time).  

Or "odb.remoteAlternate" (as opposed to object/info/alternates that
are local alternates), perhaps.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v4 9/9] Documentation/config: add odb.<name>.promisorRemote
  2018-10-19  0:01           ` Junio C Hamano
@ 2018-10-19  0:33             ` Jonathan Nieder
  2018-10-19  2:55               ` Junio C Hamano
  0 siblings, 1 reply; 23+ messages in thread
From: Jonathan Nieder @ 2018-10-19  0:33 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Jonathan Tan, christian.couder, sbeller, git, peff, Ben.Peart,
	pclouds, mh, larsxschneider, e, chriscool, jeffhost, sunshine,
	dev+git

Junio C Hamano wrote:
> Jonathan Tan <jonathantanmy@google.com> writes:
>> Jonathan Nieder wrote:

>>> 	[object]
>>> 		missingObjectRemote = local-cache-remote
>>> 		missingObjectRemote = origin
>>
>> In the presence of missingObjectRemote, old versions of Git, when lazily
>> fetching, would only know to try the extensions.partialClone remote. But
>> this is safe because existing data wouldn't be clobbered (since we're
>> not using ideas like adding meaning to the contents of the .promisor
>> file). Also, other things like fsck and gc still work.
>
> It is a good idea to implicitly include the promisor-remote to the
> set of secondary places to consult to help existing versions of Git,
> but once the repository starts fetching incomplete subgraphs and
> adding new object.missingobjectremote [*1*], these versions of Git
> will stop working correctly, so I am not sure if it is all that
> useful approach for compatibility in practice.

Can you spell this out for me more?  Do you mean that a remote from
this list might make a promise that the original partialClone remote
can't keep?

If we're careful to only page in objects that were promised by the
original partialClone remote, then a promise "I promise to supply you
on demand with any object directly or indirectly reachable from any
object you have fetched from me" from the partialClone remote should
be enough.

> [Footnote]
>
> *1* That name with two "object" in it sounds horrible.

Sorry about that.  Another name used while discussing this was
objectAccess.promisorRemote.  As long as the idea is clear (that this
means "remotes to use when attempting to obtain an object that is not
already available locally"), I am not attached to any particular name.

Thanks,
Jonathan

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v4 9/9] Documentation/config: add odb.<name>.promisorRemote
  2018-10-19  0:33             ` Jonathan Nieder
@ 2018-10-19  2:55               ` Junio C Hamano
  0 siblings, 0 replies; 23+ messages in thread
From: Junio C Hamano @ 2018-10-19  2:55 UTC (permalink / raw)
  To: Jonathan Nieder
  Cc: Jonathan Tan, christian.couder, sbeller, git, peff, Ben.Peart,
	pclouds, mh, larsxschneider, e, chriscool, jeffhost, sunshine,
	dev+git

Jonathan Nieder <jrnieder@gmail.com> writes:

> Junio C Hamano wrote:
> ...
>> It is a good idea to implicitly include the promisor-remote to the
>> set of secondary places to consult to help existing versions of Git,
>> but once the repository starts fetching incomplete subgraphs and
>> adding new object.missingobjectremote [*1*], these versions of Git
>> will stop working correctly, so I am not sure if it is all that
>> useful approach for compatibility in practice.
>
> Can you spell this out for me more?  Do you mean that a remote from
> this list might make a promise that the original partialClone remote
> can't keep?

It was my failed attempt to demonstrate that I understood what was
being discussed by rephrasing JTan's

    Or allow extensions.partialClone=<R> wherein <R> is not in the
    missingObjectRemote, in which case <R> is tried first, so that
    we don't have to reject some configurations.


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v4 9/9] Documentation/config: add odb.<name>.promisorRemote
  2018-10-16 17:43       ` Jonathan Nieder
  2018-10-16 22:22         ` Jonathan Tan
@ 2018-10-31  6:28         ` Christian Couder
  2018-11-01 21:16           ` Jeff King
  1 sibling, 1 reply; 23+ messages in thread
From: Christian Couder @ 2018-10-31  6:28 UTC (permalink / raw)
  To: Jonathan Nieder
  Cc: Stefan Beller, git, Junio C Hamano, Jeff King, Ben Peart,
	Jonathan Tan, Nguyen Thai Ngoc Duy, Mike Hommey, Lars Schneider,
	Eric Wong, Christian Couder, Jeff Hostetler, Eric Sunshine,
	Beat Bolli

Hi Jonathan,

On Tue, Oct 16, 2018 at 7:43 PM Jonathan Nieder <jrnieder@gmail.com> wrote:
>
> Hi Christian,
>
> On Tue, Sep 25, 2018, Christian Couder wrote:
>
> > In the cover letter there is a "Discussion" section which is about
> > this, but I agree that it might not be very clear.
> >
> > The main issue that this patch series tries to solve is that
> > extensions.partialclone config option limits the partial clone and
> > promisor features to only one remote. One related issue is that it
> > also prevents to have other kind of promisor/partial clone/odb
> > remotes. By other kind I mean remotes that would not necessarily be
> > git repos, but that could store objects (that's where ODB, for Object
> > DataBase, comes from) and could provide those objects to Git through a
> > helper (or driver) script or program.
>
> Thanks for this explanation.  I took the opportunity to learn more
> while you were in the bay area for the google summer of code mentor
> summit and learned a little more, which was very helpful to me.

Thanks for inviting me at the Google offices in Sunnyvale and San
Francisco to discuss about this and other issues.

> The broader picture is that this is meant to make Git natively handle
> large blobs in a nicer way.  The design in this series has a few
> components:
>
>  1. Teaching partial clone to attempt to fetch missing objects from
>     multiple remotes instead of only one.  This is useful because you
>     can have a server that is nearby and cheaper to serve from (some
>     kind of local cache server) that you make requests to first before
>     falling back to the canonical source of objects.
>
>  2. Simplifying the protocol for fetching missing objects so that it
>     can be satisfied by a lighter weight object storage system than
>     a full Git server.  The ODB helpers introduced in this series are
>     meant to speak such a simpler protocol since they are only used
>     for one-off requests of a collection of missing objects instead of
>     needing to understand refs, Git's negotiation, etc.
>
>  3. (possibly, though not in this series) Making the criteria for what
>     objects can be missing more aggressive, so that I can "git add"
>     a large file and work with it using Git without even having a
>     second copy of that object in my local object store.

Yeah, I think this is a good summary of the issues I have been trying
to address.

> For (2), I would like to see us improve the remote helper
> infrastructure instead of introducing a new ODB helper.  Remote
> helpers are already permitted to fetch some objects without listing
> refs --- perhaps we will want to
>
>  i. split listing refs to a separate capability, so that a remote
>     helper can advertise that it doesn't support that.  (Alternatively
>     the remote could advertise that it has no refs.)
>
>  ii. Use the "long-running process" mechanism to improve how Git
>      communicates with a remote helper.

Yeah, I agree that improving the remote helper infrastructure is
probably better than what I have been trying to add. And I agree with
the above 2 steps you propose.

> For (1), things get more tricky.  In an object store from a partial
> clone today, we relax the ordinary "closure under reachability"
> invariant but in a minor way.  We'll need to work out how this works
> with multiple promisor remotes.
>
> The idea today is that there are two kinds of packs: promisor packs
> (from the promisor remote) and non-promisor packs.  Promisor packs are
> allowed to have reachability edges (for example a tree->blob edge)
> that point to a missing object, since the promisor remote has promised
> that we will be able to access that object on demand.  Non-promisor
> packs are also allowed to have reachability edges that point to a
> missing object, as long as there is a reachability edge from an object
> in a promisor pack to the same object (because of the same promise).
> See "Handling Missing Objects" in Documentation/technical/partial-clone.txt
> for more details.
>
> To prevent older versions of Git from being confused by partial clone
> repositories, they use the repositoryFormatVersion mechanism:
>
>         [core]
>                 repositoryFormatVersion = 1
>         [extensions]
>                 partialClone = ...
>
> If we change the invariant, we will need to use a new extensions.* key
> to ensure that versions of Git that are not aware of the new invariant
> do not operate on the repository.

Maybe the versions of Git that are not aware of the new invariant
could still work using only the specified remote while the new
versions would know that they can use other remotes by looking at
other config variables.

> A promisor pack is indicated by there being a .promisor file next to
> the usual .pack file.  Currently the .promisor file is empty.  The
> previous idea was that once we want more metadata (e.g. for the sake of
> multiple promisor remotes), we could write it in that file.  For
> example, remotes could be associated to a <promisor-id> and the
> .promisor file could indicate which <promisor-id> has promised to serve
> requests for objects reachable from objects in this pack.

I think it would be nicer if we could avoid any modification of the
file format. Also I don't think it is important for the user to be
able to blame any specific remote for not being able to serve a given
object, because sometimes it could be better for a remote that would
be too slow to provide a large blob to just reject any request for
this blob and to instead expect users to get it from another faster
remote.

I think that when Git cannot get a promised object it should just tell
that it could not get it from the configured promisor remotes and list
the promisor remotes (along maybe with the associated URLs).

> That will complicate the object access code as well, since currently
> we only find who has promised an object during "git fsck" and similar
> operations.  During everyday access we do not care which promisor
> pack caused the object to be promised, since there is only one promisor
> remote to fetch from anyway.

I agree.

> So much for the current setup.  For (1), I believe you are proposing to
> still have only one effective <promisor-id>, so it doesn't necessarily
> require modifying the extensions.* configuration.

Yes, it would be better if we don't need to modify the "extensions.*"
configuration. So the remote given by the "extensions.partialClone"
config option would only be a remote that can provide all the objects
to previous versions of Git (those that don't work with more than one
promisor remote).

> Instead, the idea is
> that when trying to access an object, we would follow one of a list of
> steps:
>
>  1. First, check the local object store. If it's there, we're done.
>  2. Second, try alternates --- maybe the object is in one of those!
>  3. Now, try promisor remotes, one at a time, in user-configured order.

Yeah, 1. and 2. are already working, so we just need to implement 3..

> In other words, I think that for (1) all we would need is a new
> configuration
>
>         [object]
>                 missingObjectRemote = local-cache-remote
>                 missingObjectRemote = origin

Yeah, we could add some configuration to define an order. I don't
think it's really important though to be able to configure an order
right now. I think it's more important to first just have a mechanism
to iterate over the promisor remotes. By default if no order is
configured, the order we use to iterate can be the order of the
promisor remotes in the config file(s).

> The semantics would be that when trying to access a promised object,
> we attempt to fetch from these remotes one at a time, in the order
> specified.  We could require that the remote named in
> extensions.partialClone be one of the listed remotes, without having
> to care where it shows up in the list.

Yeah, when we implement this in the config, it would make sense to
have such kind of semantics. If some promisor remotes are not
configured in the order, we could also decide that these remotes
should be tried after those that are configured, or maybe we should
decide that they should not be tried except if the remote is specified
by "extensions.partialClone".

> That way, we get the benefit (1) without having to change the
> semantics of extensions.partialClone and without having to care about
> the order of sections in the config.  What do you think?

I mostly agree with you, though I would like to work on things in the
following order:

  1) Teaching partial clone to attempt to fetch missing objects from
multiple remotes instead of only one using the order in the config.

  2) Simplifying the protocol for fetching missing objects so that it
can be satisfied by a lighter weight object storage system than a full
Git server.

  3) Making it possible to explicitly define an order in which the
remotes are accessed.

  4) Making the criteria for what objects can be missing more
aggressive, so that I can "git add" a large file and work with it
using Git without even having a second copy of that object in my local
object store.

Or maybe even switching the above 3) and 4).

I think we should also decide how we specify different parameters for
each promisor remote. There is already "core.partialclonefilter", but
if we have more than one promisor remote, we might want to have
different partial clone filters for different remotes.

I am ok with using the "remote.<remote name>.*" namespace, for example
"remote.<remote name>.partialclonefilter". "core.partialclonefilter"
would then be the default partial clone filter for the remote
specified by "extensions.partialClone".

After saying all the above though, I still wonder if keeping
"extensions.partialClone" and "core.partialclonefilter" is not already
a burden that will make us implement unnecessarily complex semantics
and will become even more a burden over time.

For example what if we later want to say something like:

"When fetching from this remote, I want to use these promisor remotes
in this order, though, when pushing to that (maybe different) remote,
I want to
use those (maybe a bit different) promisor remotes in that order."

which I think should become quite reasonable to ask for.

So I think it might be interesting to look at how we would ideally
want to specify such kind of configuration, as it might lead us to
follow a different, though perhaps saner over time, direction.

Thanks,
Christian.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v4 9/9] Documentation/config: add odb.<name>.promisorRemote
  2018-10-31  6:28         ` Christian Couder
@ 2018-11-01 21:16           ` Jeff King
  0 siblings, 0 replies; 23+ messages in thread
From: Jeff King @ 2018-11-01 21:16 UTC (permalink / raw)
  To: Christian Couder
  Cc: Jonathan Nieder, Stefan Beller, git, Junio C Hamano, Ben Peart,
	Jonathan Tan, Nguyen Thai Ngoc Duy, Mike Hommey, Lars Schneider,
	Eric Wong, Christian Couder, Jeff Hostetler, Eric Sunshine,
	Beat Bolli

On Wed, Oct 31, 2018 at 07:28:09AM +0100, Christian Couder wrote:

> > For (2), I would like to see us improve the remote helper
> > infrastructure instead of introducing a new ODB helper.  Remote
> > helpers are already permitted to fetch some objects without listing
> > refs --- perhaps we will want to
> >
> >  i. split listing refs to a separate capability, so that a remote
> >     helper can advertise that it doesn't support that.  (Alternatively
> >     the remote could advertise that it has no refs.)
> >
> >  ii. Use the "long-running process" mechanism to improve how Git
> >      communicates with a remote helper.
> 
> Yeah, I agree that improving the remote helper infrastructure is
> probably better than what I have been trying to add. And I agree with
> the above 2 steps you propose.

One thing you might want to port over is the ability to ask the remote
helper "tell me the type and size of these objects". The reason I built
that into the original external-odb interface proposal was so that diff
could easily skip large objects without faulting them in (because
they're considered binary, and we'd just say "binary files differ"
anyway). That makes things like "git log -p" work a lot better (try it
with a blob-less partial clone now; it's pretty painful).

I know that's kind of the _opposite_ of how partial clones work now,
where we try really hard not to have to even tell the client the full
list of objects. That's good if the reason you want a partial clone is
because there are gigantic numbers of objects (e.g., the Windows repo).
But I think many people are interested in having a more moderate number
of large objects (e.g., things like game development that are using
git-lfs now). It would be great if we could support both use cases
easily.

-Peff

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2018-11-01 21:16 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-08-02  6:14 [PATCH v4 0/9] Introducing remote ODBs Christian Couder
2018-08-02  6:14 ` [PATCH v4 1/9] fetch-object: make functions return an error code Christian Couder
2018-08-02  6:14 ` [PATCH v4 2/9] Add initial remote odb support Christian Couder
2018-08-02  6:14 ` [PATCH v4 3/9] remote-odb: implement remote_odb_get_direct() Christian Couder
2018-08-02  6:15 ` [PATCH v4 4/9] remote-odb: implement remote_odb_get_many_direct() Christian Couder
2018-08-02  6:15 ` [PATCH v4 5/9] remote-odb: add remote_odb_reinit() Christian Couder
2018-08-02  6:15 ` [PATCH v4 6/9] Use remote_odb_get_direct() and has_remote_odb() Christian Couder
2018-08-02  6:15 ` [PATCH v4 7/9] Use odb.origin.partialclonefilter instead of core.partialclonefilter Christian Couder
2018-08-02  6:15 ` [PATCH v4 8/9] t0410: test fetching from many promisor remotes Christian Couder
2018-08-02  6:15 ` [PATCH v4 9/9] Documentation/config: add odb.<name>.promisorRemote Christian Couder
2018-08-02 22:55   ` Stefan Beller
2018-09-25  8:07     ` Christian Couder
2018-09-25 22:31       ` Junio C Hamano
2018-09-26  4:12         ` Jeff King
2018-09-26 13:44           ` Taylor Blau
2018-09-26 18:11           ` Junio C Hamano
2018-10-16 17:43       ` Jonathan Nieder
2018-10-16 22:22         ` Jonathan Tan
2018-10-19  0:01           ` Junio C Hamano
2018-10-19  0:33             ` Jonathan Nieder
2018-10-19  2:55               ` Junio C Hamano
2018-10-31  6:28         ` Christian Couder
2018-11-01 21:16           ` Jeff King

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).