* [PATCH v6 00/40] Add initial experimental external ODB support
@ 2017-09-16  8:06 Christian Couder
  2017-09-16  8:06 ` [PATCH v6 01/40] builtin/clone: get rid of 'value' strbuf Christian Couder
                   ` (40 more replies)
  0 siblings, 41 replies; 49+ messages in thread
From: Christian Couder @ 2017-09-16  8:06 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Ben Peart, Jonathan Tan,
	Nguyen Thai Ngoc Duy, Mike Hommey, Lars Schneider, Eric Wong,
	Christian Couder

Note: a lot of information about the goals, the design and how things
work is now in the following patches:

  - [PATCH v6 34/40] Add Documentation/technical/external-odb.txt
  - [PATCH v6 40/40] Doc/external-odb: explain transfering objects and metadata

Goal
~~~~

Git can store its objects only in the form of loose objects in
separate files or packed objects in a pack file.

To better handle some kinds of objects, for example big blobs, it
would be nice if Git could store its objects in other object
databases (ODBs).

To do that, this patch series makes it possible to register commands,
also called "helpers", using "odb.<odbname>.scriptCommand" or
"odb.<odbname>.subprocessCommand" config variables, to access external
ODBs where objects can be stored and retrieved.

Design
~~~~~~

* The "helpers" (registered commands)

Each helper manages access to one external ODB.

There are two different modes for helpers:

  - Helpers configured using "odb.<odbname>.scriptCommand" are
    launched each time Git wants to communicate with the <odbname>
    external ODB. This is called "script mode".

  - Helpers configured using "odb.<odbname>.subprocessCommand" are
    launched launched once as a sub-process (using sub-process.h), and
    Git communicates with them using packet lines. This is called
    "process mode".

A helper can be given different instructions by Git. The instructions
that are supported are negotiated at the beginning of the
communication using a capability mechanism.

See patches 34/40 and 40/40 (the documentation patches) for more
information about the different instructions and their arguments.
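
As a rough illustration, in script mode Git runs the configured
command with the instruction and its arguments appended, and reads
the helper's answer on its standard output. For the two instructions
implemented by the t0400 helper later in this series, that boils down
to something like:

  "$scriptCommand" have                 # one "<sha1> <size> <type>" line per object
  "$scriptCommand" get_git_obj <sha1>   # the zlib-compressed loose object on stdout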

* Performance

The process mode has been implemented using the refactoring that Ben
Peart did on top of Lars Schneider's work on using sub-processes and
packet lines in the smudge/clean filters for git-lfs.

This also uses further work from Ben Peart called "read object
process".

See:

https://public-inbox.org/git/20170113155253.1644-1-benpeart@microsoft.com/
https://public-inbox.org/git/20170322165220.5660-1-benpeart@microsoft.com/

Ben recently sent an update of this work but this update has not been
integrated into the current patch series. See:

https://public-inbox.org/git/20170714132651.170708-1-benpeart@microsoft.com/

Thanks to this, the external ODB mechanism should in the end perform
as well as the git-lfs mechanism when many objects need to be
transferred.

Implementation
~~~~~~~~~~~~~~

* Mechanism to call the registered commands

A set of functions in external-odb.{c,h} is called by the rest of Git
to manage all the external ODBs.

These functions use 'struct odb_helper' and its associated functions
defined in odb-helper.{c,h} to talk to the different external ODBs by
launching the configured "odb.<odbname>.*command" commands and writing
to or reading from them.

* Transferring information

To transfer information about the blobs stored in an external ODB,
some special refs, called "odb refs" and similar to replace refs, are
used in the tests of this series, but in general nothing forces a
helper to use that mechanism.

The external odb helper is responsible for creating and using the
refs in refs/odbs/<odbname>/, if it wants to do that. It is free, for
example, to create just one ref, or to create many refs. Git just
transmits the refs that have been created by this helper, if it is
asked to do so.

For now the tests use one odb ref per blob, as it is simple and
similar to what git-lfs does. Each ref is named
refs/odbs/<odbname>/<sha1>, where <sha1> is the sha1 of the blob
stored in the external odb named <odbname>.

Each odb ref points to a blob that is stored in the Git repository
and contains information about the blob stored in the external
odb. This information can be specific to the external odb.
Repositories can then share this information using commands like:

`git fetch origin "refs/odbs/<odbname>/*:refs/odbs/<odbname>/*"`

At the end of the current patch series, "git clone" is taught a
"--initial-refspec" option, which asks it to first fetch some
specified refs. This is used in the tests to fetch the odb refs first.

This way a single "git clone" command can set up a repo using the
external ODB mechanism, as long as the right helper is installed on
the machine and as long as the following options are used:

  - "--initial-refspec <odbrefspec>" to fetch the odb refspec
  - "-c odb.<odbname>.command=<helper>" to configure the helper

There is also a test script (t0430) that shows that the
"--initial-refspec" option along with the external ODB mechanism can
be used to implement cloning using bundles.

* ODB refs

For now odb ref management is only implemented in a helper in t0410.

When a new blob is added to an external odb, its sha1, size and type
are written into another new blob and the odb ref is created.
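
A minimal sketch of that step, assuming an odb named "magic" and
assuming the metadata blob simply holds a "<sha1> <size> <type>" line
(the exact format is up to the helper):

  # store the metadata in a new blob and point an odb ref at it
  info=$(printf '%s %s %s\n' "$sha1" "$size" "$type" |
         git hash-object -w --stdin)
  git update-ref "refs/odbs/magic/$sha1" "$info"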

When the list of existing blobs is requested from the external odb,
the content of the blobs pointed to by the odb refs can also be used
by the helper to claim that it can provide those objects.
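
With the one-ref-per-blob layout above, answering that request can be
as simple as dumping the metadata blobs (again assuming the "magic"
name and the "<sha1> <size> <type>" format):

  # print one "<sha1> <size> <type>" line for each blob we can provide
  git for-each-ref --format='%(objectname)' 'refs/odbs/magic/*' |
  while read info_blob; do
          git cat-file blob "$info_blob"
  done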

When a blob is actually requested from the external odb, the helper
can use the content stored in the blobs pointed to by the odb refs to
get the actual blob and pass it back to Git.

Highlevel view of the patches in the series
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

    - Patch 1/40 is a small code cleanup that I already sent to the
      mailing list but may be removed in the end due to ongoing work
      on "git clone".

    - Patches 02/40 to 07/40 create a "Git/Packet.pm" module by
      refactoring "t0021/rot13-filter.pl". Functions from this new
      module will be used later in test scripts. Following Junio's
      suggestion, compared to v5 we now first fully refactor
      "t0021/rot13-filter.pl" before creating the "Git/Packet.pm"
      module.

    - Patches 08/40 to 16/40 create the external ODB infrastructure
      in external-odb.{c,h} and odb-helper.{c,h} for the script mode.
      The main changes compared to v5 are the following:
        - we mark functions in *.h files as "extern"
        - we use sha1_pos() instead of sha1_entry_pos()
        - we check the size in the header when we 'get' a Git object

    - Patches 17/40 to 23/40 improve lib-httpd to make it possible to
      use it as an external ODB, to test storing blobs on an HTTP
      server. Compared to v5, the "upload.sh" and "list.sh" files are
      now properly indented and they use %% instead of % in parameter
      substitutions.

    - Patches 24/40 to 32/40 improve the external ODB infrastructure
      to support sub-processes and make everything work using
      them. The main changes compared to v5 are the following:
        - we mark functions in *.h files as "extern"
        - we use the new subprocess_handshake() function
        - we check the size in the header when we 'get' a Git object

    - Patch 33/40 uses attributes to mark blobs that should be handled
      by an external odb.

    - Patch 34/40 adds documentation about the external odb
      mechanism. This patch has been much improved since v5.

    - Patches 35/40 to 39/40 add the --initial-refspec option to
      "git clone", along with tests.

    - Patch 40/40 adds documentation about transferring objects and
      metadata when using the external odb mechanism. This patch is
      new since v5.

Future work
~~~~~~~~~~~

There are still things that could be cleaned up or improved. I think
I may work on:

  - Integrate changes from Ben Peart's recent "read-object-process"
    work.

  - Better test all the combinations of the different modes with and
    without "have" and "put_*" instructions.

  - Maybe implement the missing kinds of 'put' ('put_git_obj' and
    'put_direct'), so that Git could pass either a Git object or a
    plain object, or ask the helper to retrieve it directly from
    Git's object database.

  - Add more long running tests and improve tests in general.

Previous work and discussions
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

(Sorry for the old Gmane links; I hope to replace them with
public-inbox.org links at some point.)

Peff started working on this and discussing it some years ago:

http://thread.gmane.org/gmane.comp.version-control.git/206886/focus=207040
http://thread.gmane.org/gmane.comp.version-control.git/247171
http://thread.gmane.org/gmane.comp.version-control.git/202902/focus=203020

His work, which is not compile-tested any more, is still available here:

https://github.com/peff/git/commits/jk/external-odb-wip

Initial discussions about this new series are here:

http://thread.gmane.org/gmane.comp.version-control.git/288151/focus=295160

Versions 1, 2, 3, 4 and 5 of this series are here:

https://public-inbox.org/git/20160613085546.11784-1-chriscool@tuxfamily.org/
https://public-inbox.org/git/20160628181933.24620-1-chriscool@tuxfamily.org/
https://public-inbox.org/git/20161130210420.15982-1-chriscool@tuxfamily.org/
https://public-inbox.org/git/20170620075523.26961-1-chriscool@tuxfamily.org/
https://public-inbox.org/git/20170803091926.1755-1-chriscool@tuxfamily.org/

Some of the discussions related to Ben Peart's work that is used by
this series are here:

https://public-inbox.org/git/20170113155253.1644-1-benpeart@microsoft.com/
https://public-inbox.org/git/20170322165220.5660-1-benpeart@microsoft.com/
https://public-inbox.org/git/20170714132651.170708-1-benpeart@microsoft.com/

Links
~~~~~

This patch series is available here:

https://github.com/chriscool/git/commits/external-odb

Versions 1, 2, 3, 4 and 5 are here:

https://github.com/chriscool/git/commits/gl-external-odb12
https://github.com/chriscool/git/commits/gl-external-odb22
https://github.com/chriscool/git/commits/gl-external-odb61
https://github.com/chriscool/git/commits/gl-external-odb239
https://github.com/chriscool/git/commits/gl-external-odb373


Ben Peart (2):
  odb-helper: add init_object_process()
  Add t0450 to test 'get_direct' mechanism

Christian Couder (38):
  builtin/clone: get rid of 'value' strbuf
  t0021/rot13-filter: refactor packet reading functions
  t0021/rot13-filter: improve 'if .. elsif .. else' style
  t0021/rot13-filter: improve error message
  t0021/rot13-filter: add packet_initialize()
  t0021/rot13-filter: add capability functions
  Add Git/Packet.pm from parts of t0021/rot13-filter.pl
  sha1_file: prepare for external odbs
  Add initial external odb support
  odb-helper: add odb_helper_init() to send 'init' instruction
  t0400: add 'put_raw_obj' instruction to odb-helper script
  external odb: add 'put_raw_obj' support
  external-odb: accept only blobs for now
  t0400: add test for external odb write support
  Add GIT_NO_EXTERNAL_ODB env variable
  Add t0410 to test external ODB transfer
  lib-httpd: pass config file to start_httpd()
  lib-httpd: add upload.sh
  lib-httpd: add list.sh
  lib-httpd: add apache-e-odb.conf
  odb-helper: add odb_helper_get_raw_object()
  pack-objects: don't pack objects in external odbs
  Add t0420 to test transfer to HTTP external odb
  external-odb: add 'get_direct' support
  odb-helper: add 'script_mode' to 'struct odb_helper'
  Add t0460 to test passing git objects
  odb-helper: add put_object_process()
  Add t0470 to test passing raw objects
  odb-helper: add have_object_process()
  Add t0480 to test "have" capability and raw objects
  external-odb: use 'odb=magic' attribute to mark odb blobs
  Add Documentation/technical/external-odb.txt
  clone: add 'initial' param to write_remote_refs()
  clone: add --initial-refspec option
  clone: disable external odb before initial clone
  Add tests for 'clone --initial-refspec'
  Add t0430 to test cloning using bundles
  Doc/external-odb: explain transfering objects and metadata

 Documentation/technical/external-odb.txt |  447 +++++++++++++
 Makefile                                 |    2 +
 builtin/clone.c                          |   91 ++-
 builtin/pack-objects.c                   |    4 +
 cache.h                                  |   18 +
 environment.c                            |    4 +
 external-odb.c                           |  196 ++++++
 external-odb.h                           |   12 +
 odb-helper.c                             | 1076 ++++++++++++++++++++++++++++++
 odb-helper.h                             |   45 ++
 perl/Git/Packet.pm                       |  118 ++++
 sha1_file.c                              |  155 +++--
 t/lib-httpd.sh                           |    8 +-
 t/lib-httpd/apache-e-odb.conf            |  214 ++++++
 t/lib-httpd/list.sh                      |   41 ++
 t/lib-httpd/upload.sh                    |   45 ++
 t/t0021/rot13-filter.pl                  |  110 +--
 t/t0400-external-odb.sh                  |   85 +++
 t/t0410-transfer-e-odb.sh                |  147 ++++
 t/t0420-transfer-http-e-odb.sh           |  152 +++++
 t/t0430-clone-bundle-e-odb.sh            |   85 +++
 t/t0450-read-object.sh                   |   28 +
 t/t0450/read-object                      |   68 ++
 t/t0460-read-object-git.sh               |   28 +
 t/t0460/read-object-git                  |   78 +++
 t/t0470-read-object-http-e-odb.sh        |  119 ++++
 t/t0470/read-object-plain                |   83 +++
 t/t0480-read-object-have-http-e-odb.sh   |  119 ++++
 t/t0480/read-object-plain-have           |  103 +++
 t/t5616-clone-initial-refspec.sh         |   48 ++
 30 files changed, 3588 insertions(+), 141 deletions(-)
 create mode 100644 Documentation/technical/external-odb.txt
 create mode 100644 external-odb.c
 create mode 100644 external-odb.h
 create mode 100644 odb-helper.c
 create mode 100644 odb-helper.h
 create mode 100644 perl/Git/Packet.pm
 create mode 100644 t/lib-httpd/apache-e-odb.conf
 create mode 100644 t/lib-httpd/list.sh
 create mode 100644 t/lib-httpd/upload.sh
 create mode 100755 t/t0400-external-odb.sh
 create mode 100755 t/t0410-transfer-e-odb.sh
 create mode 100755 t/t0420-transfer-http-e-odb.sh
 create mode 100755 t/t0430-clone-bundle-e-odb.sh
 create mode 100755 t/t0450-read-object.sh
 create mode 100755 t/t0450/read-object
 create mode 100755 t/t0460-read-object-git.sh
 create mode 100755 t/t0460/read-object-git
 create mode 100755 t/t0470-read-object-http-e-odb.sh
 create mode 100755 t/t0470/read-object-plain
 create mode 100755 t/t0480-read-object-have-http-e-odb.sh
 create mode 100755 t/t0480/read-object-plain-have
 create mode 100755 t/t5616-clone-initial-refspec.sh

-- 
2.14.1.576.g3f707d88cd


^ permalink raw reply	[flat|nested] 49+ messages in thread

* [PATCH v6 01/40] builtin/clone: get rid of 'value' strbuf
  2017-09-16  8:06 [PATCH v6 00/40] Add initial experimental external ODB support Christian Couder
@ 2017-09-16  8:06 ` Christian Couder
  2017-09-16  8:06 ` [PATCH v6 02/40] t0021/rot13-filter: refactor packet reading functions Christian Couder
                   ` (39 subsequent siblings)
  40 siblings, 0 replies; 49+ messages in thread
From: Christian Couder @ 2017-09-16  8:06 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Ben Peart, Jonathan Tan,
	Nguyen Thai Ngoc Duy, Mike Hommey, Lars Schneider, Eric Wong,
	Christian Couder

This makes the code simpler by removing a few lines, and getting
rid of one variable.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
 builtin/clone.c | 9 +++------
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/builtin/clone.c b/builtin/clone.c
index 8d11b570a1..dcd5b878f1 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -875,7 +875,7 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
 	const struct ref *our_head_points_at;
 	struct ref *mapped_refs;
 	const struct ref *ref;
-	struct strbuf key = STRBUF_INIT, value = STRBUF_INIT;
+	struct strbuf key = STRBUF_INIT;
 	struct strbuf branch_top = STRBUF_INIT, reflog_msg = STRBUF_INIT;
 	struct transport *transport = NULL;
 	const char *src_ref_prefix = "refs/heads/";
@@ -1040,7 +1040,6 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
 		strbuf_addf(&branch_top, "refs/remotes/%s/", option_origin);
 	}
 
-	strbuf_addf(&value, "+%s*:%s*", src_ref_prefix, branch_top.buf);
 	strbuf_addf(&key, "remote.%s.url", option_origin);
 	git_config_set(key.buf, repo);
 	strbuf_reset(&key);
@@ -1054,10 +1053,9 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
 	if (option_required_reference.nr || option_optional_reference.nr)
 		setup_reference();
 
-	fetch_pattern = value.buf;
+	fetch_pattern = xstrfmt("+%s*:%s*", src_ref_prefix, branch_top.buf);
 	refspec = parse_fetch_refspec(1, &fetch_pattern);
-
-	strbuf_reset(&value);
+	free((char *)fetch_pattern);
 
 	remote = remote_get(option_origin);
 	transport = transport_get(remote, remote->url[0]);
@@ -1196,7 +1194,6 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
 	strbuf_release(&reflog_msg);
 	strbuf_release(&branch_top);
 	strbuf_release(&key);
-	strbuf_release(&value);
 	junk_mode = JUNK_LEAVE_ALL;
 
 	free(refspec);
-- 
2.14.1.576.g3f707d88cd


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH v6 02/40] t0021/rot13-filter: refactor packet reading functions
  2017-09-16  8:06 [PATCH v6 00/40] Add initial experimental external ODB support Christian Couder
  2017-09-16  8:06 ` [PATCH v6 01/40] builtin/clone: get rid of 'value' strbuf Christian Couder
@ 2017-09-16  8:06 ` Christian Couder
  2017-09-16  8:06 ` [PATCH v6 03/40] t0021/rot13-filter: improve 'if .. elsif .. else' style Christian Couder
                   ` (38 subsequent siblings)
  40 siblings, 0 replies; 49+ messages in thread
From: Christian Couder @ 2017-09-16  8:06 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Ben Peart, Jonathan Tan,
	Nguyen Thai Ngoc Duy, Mike Hommey, Lars Schneider, Eric Wong,
	Christian Couder

To make it possible in a following commit to move packet
reading and writing functions into a Packet.pm module,
let's refactor these functions, so they don't handle
printing debug output and exiting.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
 t/t0021/rot13-filter.pl | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/t/t0021/rot13-filter.pl b/t/t0021/rot13-filter.pl
index ad685d92f8..e4495a52f3 100644
--- a/t/t0021/rot13-filter.pl
+++ b/t/t0021/rot13-filter.pl
@@ -60,8 +60,7 @@ sub packet_bin_read {
 	my $bytes_read = read STDIN, $buffer, 4;
 	if ( $bytes_read == 0 ) {
 		# EOF - Git stopped talking to us!
-		print $debug "STOP\n";
-		exit();
+		return ( -1, "" );
 	}
 	elsif ( $bytes_read != 4 ) {
 		die "invalid packet: '$buffer'";
@@ -85,7 +84,7 @@ sub packet_bin_read {
 
 sub packet_txt_read {
 	my ( $res, $buf ) = packet_bin_read();
-	unless ( $buf eq '' or $buf =~ s/\n$// ) {
+	unless ( $res == -1 or $buf eq '' or $buf =~ s/\n$// ) {
 		die "A non-binary line MUST be terminated by an LF.";
 	}
 	return ( $res, $buf );
@@ -131,7 +130,12 @@ print $debug "init handshake complete\n";
 $debug->flush();
 
 while (1) {
-	my ( $command ) = packet_txt_read() =~ /^command=(.+)$/;
+	my ( $res, $command ) = packet_txt_read();
+	if ( $res == -1 ) {
+		print $debug "STOP\n";
+		exit();
+	}
+	$command =~ s/^command=//;
 	print $debug "IN: $command";
 	$debug->flush();
 
-- 
2.14.1.576.g3f707d88cd


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH v6 03/40] t0021/rot13-filter: improve 'if .. elsif .. else' style
  2017-09-16  8:06 [PATCH v6 00/40] Add initial experimental external ODB support Christian Couder
  2017-09-16  8:06 ` [PATCH v6 01/40] builtin/clone: get rid of 'value' strbuf Christian Couder
  2017-09-16  8:06 ` [PATCH v6 02/40] t0021/rot13-filter: refactor packet reading functions Christian Couder
@ 2017-09-16  8:06 ` Christian Couder
  2017-09-16  8:06 ` [PATCH v6 04/40] t0021/rot13-filter: improve error message Christian Couder
                   ` (37 subsequent siblings)
  40 siblings, 0 replies; 49+ messages in thread
From: Christian Couder @ 2017-09-16  8:06 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Ben Peart, Jonathan Tan,
	Nguyen Thai Ngoc Duy, Mike Hommey, Lars Schneider, Eric Wong,
	Christian Couder

Before further refactoring the "t0021/rot13-filter.pl" script,
let's modernize the style of its 'if .. elsif .. else' clauses
to improve its readability by making it more similar to our
other perl scripts.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
 t/t0021/rot13-filter.pl | 39 +++++++++++++--------------------------
 1 file changed, 13 insertions(+), 26 deletions(-)

diff --git a/t/t0021/rot13-filter.pl b/t/t0021/rot13-filter.pl
index e4495a52f3..82882392ae 100644
--- a/t/t0021/rot13-filter.pl
+++ b/t/t0021/rot13-filter.pl
@@ -61,23 +61,20 @@ sub packet_bin_read {
 	if ( $bytes_read == 0 ) {
 		# EOF - Git stopped talking to us!
 		return ( -1, "" );
-	}
-	elsif ( $bytes_read != 4 ) {
+	} elsif ( $bytes_read != 4 ) {
 		die "invalid packet: '$buffer'";
 	}
 	my $pkt_size = hex($buffer);
 	if ( $pkt_size == 0 ) {
 		return ( 1, "" );
-	}
-	elsif ( $pkt_size > 4 ) {
+	} elsif ( $pkt_size > 4 ) {
 		my $content_size = $pkt_size - 4;
 		$bytes_read = read STDIN, $buffer, $content_size;
 		if ( $bytes_read != $content_size ) {
 			die "invalid packet ($content_size bytes expected; $bytes_read bytes read)";
 		}
 		return ( 0, $buffer );
-	}
-	else {
+	} else {
 		die "invalid packet size: $pkt_size";
 	}
 }
@@ -165,8 +162,7 @@ while (1) {
 		$debug->flush();
 		packet_txt_write("status=success");
 		packet_flush();
-	}
-	else {
+	} else {
 		my ( $pathname ) = packet_txt_read() =~ /^pathname=(.+)$/;
 		print $debug " $pathname";
 		$debug->flush();
@@ -205,17 +201,13 @@ while (1) {
 		my $output;
 		if ( exists $DELAY{$pathname} and exists $DELAY{$pathname}{"output"} ) {
 			$output = $DELAY{$pathname}{"output"}
-		}
-		elsif ( $pathname eq "error.r" or $pathname eq "abort.r" ) {
+		} elsif ( $pathname eq "error.r" or $pathname eq "abort.r" ) {
 			$output = "";
-		}
-		elsif ( $command eq "clean" and grep( /^clean$/, @capabilities ) ) {
+		} elsif ( $command eq "clean" and grep( /^clean$/, @capabilities ) ) {
 			$output = rot13($input);
-		}
-		elsif ( $command eq "smudge" and grep( /^smudge$/, @capabilities ) ) {
+		} elsif ( $command eq "smudge" and grep( /^smudge$/, @capabilities ) ) {
 			$output = rot13($input);
-		}
-		else {
+		} else {
 			die "bad command '$command'";
 		}
 
@@ -224,25 +216,21 @@ while (1) {
 			$debug->flush();
 			packet_txt_write("status=error");
 			packet_flush();
-		}
-		elsif ( $pathname eq "abort.r" ) {
+		} elsif ( $pathname eq "abort.r" ) {
 			print $debug "[ABORT]\n";
 			$debug->flush();
 			packet_txt_write("status=abort");
 			packet_flush();
-		}
-		elsif ( $command eq "smudge" and
+		} elsif ( $command eq "smudge" and
 			exists $DELAY{$pathname} and
-			$DELAY{$pathname}{"requested"} == 1
-		) {
+			$DELAY{$pathname}{"requested"} == 1 ) {
 			print $debug "[DELAYED]\n";
 			$debug->flush();
 			packet_txt_write("status=delayed");
 			packet_flush();
 			$DELAY{$pathname}{"requested"} = 2;
 			$DELAY{$pathname}{"output"} = $output;
-		}
-		else {
+		} else {
 			packet_txt_write("status=success");
 			packet_flush();
 
@@ -262,8 +250,7 @@ while (1) {
 				print $debug ".";
 				if ( length($output) > $MAX_PACKET_CONTENT_SIZE ) {
 					$output = substr( $output, $MAX_PACKET_CONTENT_SIZE );
-				}
-				else {
+				} else {
 					$output = "";
 				}
 			}
-- 
2.14.1.576.g3f707d88cd


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH v6 04/40] t0021/rot13-filter: improve error message
  2017-09-16  8:06 [PATCH v6 00/40] Add initial experimental external ODB support Christian Couder
                   ` (2 preceding siblings ...)
  2017-09-16  8:06 ` [PATCH v6 03/40] t0021/rot13-filter: improve 'if .. elsif .. else' style Christian Couder
@ 2017-09-16  8:06 ` Christian Couder
  2017-09-16  8:06 ` [PATCH v6 05/40] t0021/rot13-filter: add packet_initialize() Christian Couder
                   ` (36 subsequent siblings)
  40 siblings, 0 replies; 49+ messages in thread
From: Christian Couder @ 2017-09-16  8:06 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Ben Peart, Jonathan Tan,
	Nguyen Thai Ngoc Duy, Mike Hommey, Lars Schneider, Eric Wong,
	Christian Couder

If there is no newline at the end of something it receives,
the packet_txt_read() function die()s, but it's difficult to
debug without much context.

Let's give a bit more information when that happens.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
 t/t0021/rot13-filter.pl | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/t/t0021/rot13-filter.pl b/t/t0021/rot13-filter.pl
index 82882392ae..3b3da8a03d 100644
--- a/t/t0021/rot13-filter.pl
+++ b/t/t0021/rot13-filter.pl
@@ -82,7 +82,8 @@ sub packet_bin_read {
 sub packet_txt_read {
 	my ( $res, $buf ) = packet_bin_read();
 	unless ( $res == -1 or $buf eq '' or $buf =~ s/\n$// ) {
-		die "A non-binary line MUST be terminated by an LF.";
+		die "A non-binary line MUST be terminated by an LF.\n"
+		    . "Received: '$buf'";
 	}
 	return ( $res, $buf );
 }
-- 
2.14.1.576.g3f707d88cd


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH v6 05/40] t0021/rot13-filter: add packet_initialize()
  2017-09-16  8:06 [PATCH v6 00/40] Add initial experimental external ODB support Christian Couder
                   ` (3 preceding siblings ...)
  2017-09-16  8:06 ` [PATCH v6 04/40] t0021/rot13-filter: improve error message Christian Couder
@ 2017-09-16  8:06 ` Christian Couder
  2017-09-16  8:06 ` [PATCH v6 06/40] t0021/rot13-filter: add capability functions Christian Couder
                   ` (35 subsequent siblings)
  40 siblings, 0 replies; 49+ messages in thread
From: Christian Couder @ 2017-09-16  8:06 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Ben Peart, Jonathan Tan,
	Nguyen Thai Ngoc Duy, Mike Hommey, Lars Schneider, Eric Wong,
	Christian Couder

Let's refactor the code to initialize communication into its own
packet_initialize() function, so that we can reuse this
functionality in following patches.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
 t/t0021/rot13-filter.pl | 20 +++++++++++++-------
 1 file changed, 13 insertions(+), 7 deletions(-)

diff --git a/t/t0021/rot13-filter.pl b/t/t0021/rot13-filter.pl
index 3b3da8a03d..278fc6f534 100644
--- a/t/t0021/rot13-filter.pl
+++ b/t/t0021/rot13-filter.pl
@@ -104,16 +104,22 @@ sub packet_flush {
 	STDOUT->flush();
 }
 
+sub packet_initialize {
+	my ($name, $version) = @_;
+
+	( packet_txt_read() eq ( 0, $name . "-client" ) )       || die "bad initialize";
+	( packet_txt_read() eq ( 0, "version=" . $version ) )   || die "bad version";
+	( packet_bin_read() eq ( 1, "" ) )                      || die "bad version end";
+
+	packet_txt_write( $name . "-server" );
+	packet_txt_write( "version=" . $version );
+	packet_flush();
+}
+
 print $debug "START\n";
 $debug->flush();
 
-( packet_txt_read() eq ( 0, "git-filter-client" ) ) || die "bad initialize";
-( packet_txt_read() eq ( 0, "version=2" ) )         || die "bad version";
-( packet_bin_read() eq ( 1, "" ) )                  || die "bad version end";
-
-packet_txt_write("git-filter-server");
-packet_txt_write("version=2");
-packet_flush();
+packet_initialize("git-filter", 2);
 
 ( packet_txt_read() eq ( 0, "capability=clean" ) )  || die "bad capability";
 ( packet_txt_read() eq ( 0, "capability=smudge" ) ) || die "bad capability";
-- 
2.14.1.576.g3f707d88cd


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH v6 06/40] t0021/rot13-filter: add capability functions
  2017-09-16  8:06 [PATCH v6 00/40] Add initial experimental external ODB support Christian Couder
                   ` (4 preceding siblings ...)
  2017-09-16  8:06 ` [PATCH v6 05/40] t0021/rot13-filter: add packet_initialize() Christian Couder
@ 2017-09-16  8:06 ` Christian Couder
  2017-09-16  8:06 ` [PATCH v6 07/40] Add Git/Packet.pm from parts of t0021/rot13-filter.pl Christian Couder
                   ` (34 subsequent siblings)
  40 siblings, 0 replies; 49+ messages in thread
From: Christian Couder @ 2017-09-16  8:06 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Ben Peart, Jonathan Tan,
	Nguyen Thai Ngoc Duy, Mike Hommey, Lars Schneider, Eric Wong,
	Christian Couder

Add functions to help read and write capabilities.
These functions will be reused in following patches.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
 t/t0021/rot13-filter.pl | 40 ++++++++++++++++++++++++++++++++--------
 1 file changed, 32 insertions(+), 8 deletions(-)

diff --git a/t/t0021/rot13-filter.pl b/t/t0021/rot13-filter.pl
index 278fc6f534..ba18b207c6 100644
--- a/t/t0021/rot13-filter.pl
+++ b/t/t0021/rot13-filter.pl
@@ -116,20 +116,44 @@ sub packet_initialize {
 	packet_flush();
 }
 
+sub packet_read_capabilities {
+	my @cap;
+	while (1) {
+		my ( $res, $buf ) = packet_bin_read();
+		return ( $res, @cap ) if ( $res != 0 );
+		unless ( $buf =~ s/\n$// ) {
+			die "A non-binary line MUST be terminated by an LF.\n"
+			    . "Received: '$buf'";
+		}
+		die "bad capability buf: '$buf'" unless ( $buf =~ s/capability=// );
+		push @cap, $buf;
+	}
+}
+
+sub packet_read_and_check_capabilities {
+	my @local_caps = @_;
+	my @remote_res_caps = packet_read_capabilities();
+	my $res = shift @remote_res_caps;
+	my %remote_caps = map { $_ => 1 } @remote_res_caps;
+	foreach (@local_caps) {
+        	die "'$_' capability not available" unless (exists($remote_caps{$_}));
+	}
+	return $res;
+}
+
+sub packet_write_capabilities {
+	packet_txt_write( "capability=" . $_ ) foreach (@_);
+	packet_flush();
+}
+
 print $debug "START\n";
 $debug->flush();
 
 packet_initialize("git-filter", 2);
 
-( packet_txt_read() eq ( 0, "capability=clean" ) )  || die "bad capability";
-( packet_txt_read() eq ( 0, "capability=smudge" ) ) || die "bad capability";
-( packet_txt_read() eq ( 0, "capability=delay" ) )  || die "bad capability";
-( packet_bin_read() eq ( 1, "" ) )                  || die "bad capability end";
+packet_read_and_check_capabilities("clean", "smudge", "delay");
+packet_write_capabilities(@capabilities);
 
-foreach (@capabilities) {
-	packet_txt_write( "capability=" . $_ );
-}
-packet_flush();
 print $debug "init handshake complete\n";
 $debug->flush();
 
-- 
2.14.1.576.g3f707d88cd


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH v6 07/40] Add Git/Packet.pm from parts of t0021/rot13-filter.pl
  2017-09-16  8:06 [PATCH v6 00/40] Add initial experimental external ODB support Christian Couder
                   ` (5 preceding siblings ...)
  2017-09-16  8:06 ` [PATCH v6 06/40] t0021/rot13-filter: add capability functions Christian Couder
@ 2017-09-16  8:06 ` Christian Couder
  2017-09-16  8:06 ` [PATCH v6 08/40] sha1_file: prepare for external odbs Christian Couder
                   ` (33 subsequent siblings)
  40 siblings, 0 replies; 49+ messages in thread
From: Christian Couder @ 2017-09-16  8:06 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Ben Peart, Jonathan Tan,
	Nguyen Thai Ngoc Duy, Mike Hommey, Lars Schneider, Eric Wong,
	Christian Couder

And while at it let's simplify t0021/rot13-filter.pl by
using Git/Packet.pm.

This will make it possible to reuse packet related
functions in other test scripts.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
 perl/Git/Packet.pm      | 118 ++++++++++++++++++++++++++++++++++++++++++++++++
 t/t0021/rot13-filter.pl |  94 ++------------------------------------
 2 files changed, 121 insertions(+), 91 deletions(-)
 create mode 100644 perl/Git/Packet.pm

diff --git a/perl/Git/Packet.pm b/perl/Git/Packet.pm
new file mode 100644
index 0000000000..b1e67477a0
--- /dev/null
+++ b/perl/Git/Packet.pm
@@ -0,0 +1,118 @@
+package Git::Packet;
+use 5.008;
+use strict;
+use warnings;
+BEGIN {
+	require Exporter;
+	if ($] < 5.008003) {
+		*import = \&Exporter::import;
+	} else {
+		# Exporter 5.57 which supports this invocation was
+		# released with perl 5.8.3
+		Exporter->import('import');
+	}
+}
+
+our @EXPORT = qw(
+			packet_bin_read
+			packet_txt_read
+			packet_bin_write
+			packet_txt_write
+			packet_flush
+			packet_initialize
+			packet_read_capabilities
+			packet_write_capabilities
+			packet_read_and_check_capabilities
+		);
+our @EXPORT_OK = @EXPORT;
+
+sub packet_bin_read {
+	my $buffer;
+	my $bytes_read = read STDIN, $buffer, 4;
+	if ( $bytes_read == 0 ) {
+		# EOF - Git stopped talking to us!
+		return ( -1, "" );
+	} elsif ( $bytes_read != 4 ) {
+		die "invalid packet: '$buffer'";
+	}
+	my $pkt_size = hex($buffer);
+	if ( $pkt_size == 0 ) {
+		return ( 1, "" );
+	} elsif ( $pkt_size > 4 ) {
+		my $content_size = $pkt_size - 4;
+		$bytes_read = read STDIN, $buffer, $content_size;
+		if ( $bytes_read != $content_size ) {
+			die "invalid packet ($content_size bytes expected; $bytes_read bytes read)";
+		}
+		return ( 0, $buffer );
+	} else {
+		die "invalid packet size: $pkt_size";
+	}
+}
+
+sub packet_txt_read {
+	my ( $res, $buf ) = packet_bin_read();
+	unless ( $res == -1 or $buf eq '' or $buf =~ s/\n$// ) {
+		die "A non-binary line MUST be terminated by an LF.\n"
+		    . "Received: '$buf'";
+	}
+	return ( $res, $buf );
+}
+
+sub packet_bin_write {
+	my $buf = shift;
+	print STDOUT sprintf( "%04x", length($buf) + 4 );
+	print STDOUT $buf;
+	STDOUT->flush();
+}
+
+sub packet_txt_write {
+	packet_bin_write( $_[0] . "\n" );
+}
+
+sub packet_flush {
+	print STDOUT sprintf( "%04x", 0 );
+	STDOUT->flush();
+}
+
+sub packet_initialize {
+	my ($name, $version) = @_;
+
+	( packet_txt_read() eq ( 0, $name . "-client" ) )	|| die "bad initialize";
+	( packet_txt_read() eq ( 0, "version=" . $version ) )	|| die "bad version";
+	( packet_bin_read() eq ( 1, "" ) )			|| die "bad version end";
+
+	packet_txt_write( $name . "-server" );
+	packet_txt_write( "version=" . $version );
+	packet_flush();
+}
+
+sub packet_read_capabilities {
+	my @cap;
+	while (1) {
+		my ( $res, $buf ) = packet_bin_read();
+		return ( $res, @cap ) if ( $res != 0 );
+		unless ( $buf =~ s/\n$// ) {
+			die "A non-binary line MUST be terminated by an LF.\n"
+			    . "Received: '$buf'";
+		}
+		die "bad capability buf: '$buf'" unless ( $buf =~ s/capability=// );
+		push @cap, $buf;
+	}
+}
+
+sub packet_read_and_check_capabilities {
+	my @local_caps = @_;
+	my @remote_res_caps = packet_read_capabilities();
+	my $res = shift @remote_res_caps;
+	my %remote_caps = map { $_ => 1 } @remote_res_caps;
+	foreach (@local_caps) {
+		die "'$_' capability not available" unless (exists($remote_caps{$_}));
+	}
+	return $res;
+}
+
+sub packet_write_capabilities {
+	packet_txt_write( "capability=" . $_ ) foreach (@_);
+	packet_flush();
+}
diff --git a/t/t0021/rot13-filter.pl b/t/t0021/rot13-filter.pl
index ba18b207c6..2e8ad4d496 100644
--- a/t/t0021/rot13-filter.pl
+++ b/t/t0021/rot13-filter.pl
@@ -30,9 +30,12 @@
 #     to the "list_available_blobs" response.
 #
 
+use 5.008;
+use lib (split(/:/, $ENV{GITPERLLIB}));
 use strict;
 use warnings;
 use IO::File;
+use Git::Packet;
 
 my $MAX_PACKET_CONTENT_SIZE = 65516;
 my $log_file                = shift @ARGV;
@@ -55,97 +58,6 @@ sub rot13 {
 	return $str;
 }
 
-sub packet_bin_read {
-	my $buffer;
-	my $bytes_read = read STDIN, $buffer, 4;
-	if ( $bytes_read == 0 ) {
-		# EOF - Git stopped talking to us!
-		return ( -1, "" );
-	} elsif ( $bytes_read != 4 ) {
-		die "invalid packet: '$buffer'";
-	}
-	my $pkt_size = hex($buffer);
-	if ( $pkt_size == 0 ) {
-		return ( 1, "" );
-	} elsif ( $pkt_size > 4 ) {
-		my $content_size = $pkt_size - 4;
-		$bytes_read = read STDIN, $buffer, $content_size;
-		if ( $bytes_read != $content_size ) {
-			die "invalid packet ($content_size bytes expected; $bytes_read bytes read)";
-		}
-		return ( 0, $buffer );
-	} else {
-		die "invalid packet size: $pkt_size";
-	}
-}
-
-sub packet_txt_read {
-	my ( $res, $buf ) = packet_bin_read();
-	unless ( $res == -1 or $buf eq '' or $buf =~ s/\n$// ) {
-		die "A non-binary line MUST be terminated by an LF.\n"
-		    . "Received: '$buf'";
-	}
-	return ( $res, $buf );
-}
-
-sub packet_bin_write {
-	my $buf = shift;
-	print STDOUT sprintf( "%04x", length($buf) + 4 );
-	print STDOUT $buf;
-	STDOUT->flush();
-}
-
-sub packet_txt_write {
-	packet_bin_write( $_[0] . "\n" );
-}
-
-sub packet_flush {
-	print STDOUT sprintf( "%04x", 0 );
-	STDOUT->flush();
-}
-
-sub packet_initialize {
-	my ($name, $version) = @_;
-
-	( packet_txt_read() eq ( 0, $name . "-client" ) )       || die "bad initialize";
-	( packet_txt_read() eq ( 0, "version=" . $version ) )   || die "bad version";
-	( packet_bin_read() eq ( 1, "" ) )                      || die "bad version end";
-
-	packet_txt_write( $name . "-server" );
-	packet_txt_write( "version=" . $version );
-	packet_flush();
-}
-
-sub packet_read_capabilities {
-	my @cap;
-	while (1) {
-		my ( $res, $buf ) = packet_bin_read();
-		return ( $res, @cap ) if ( $res != 0 );
-		unless ( $buf =~ s/\n$// ) {
-			die "A non-binary line MUST be terminated by an LF.\n"
-			    . "Received: '$buf'";
-		}
-		die "bad capability buf: '$buf'" unless ( $buf =~ s/capability=// );
-		push @cap, $buf;
-	}
-}
-
-sub packet_read_and_check_capabilities {
-	my @local_caps = @_;
-	my @remote_res_caps = packet_read_capabilities();
-	my $res = shift @remote_res_caps;
-	my %remote_caps = map { $_ => 1 } @remote_res_caps;
-	foreach (@local_caps) {
-        	die "'$_' capability not available" unless (exists($remote_caps{$_}));
-	}
-	return $res;
-}
-
-sub packet_write_capabilities {
-	packet_txt_write( "capability=" . $_ ) foreach (@_);
-	packet_flush();
-}
-
 print $debug "START\n";
 $debug->flush();
 
-- 
2.14.1.576.g3f707d88cd


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH v6 08/40] sha1_file: prepare for external odbs
  2017-09-16  8:06 [PATCH v6 00/40] Add initial experimental external ODB support Christian Couder
                   ` (6 preceding siblings ...)
  2017-09-16  8:06 ` [PATCH v6 07/40] Add Git/Packet.pm from parts of t0021/rot13-filter.pl Christian Couder
@ 2017-09-16  8:06 ` Christian Couder
  2017-09-16  8:07 ` [PATCH v6 09/40] Add initial external odb support Christian Couder
                   ` (32 subsequent siblings)
  40 siblings, 0 replies; 49+ messages in thread
From: Christian Couder @ 2017-09-16  8:06 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Ben Peart, Jonathan Tan,
	Nguyen Thai Ngoc Duy, Mike Hommey, Lars Schneider, Eric Wong,
	Christian Couder

In the following commits we will need some functions that were
internal to sha1_file.c, so let's first make them non static
and declare them in "cache.h". While at it, let's rename
'create_tmpfile()' to 'create_object_tmpfile()' to make its
name less generic.

Let's also split out 'sha1_file_name_alt()' from
'sha1_file_name()' and 'open_sha1_file_alt()' from
'open_sha1_file()', as we will need both of these new
functions too.

Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
 cache.h     |  8 ++++++++
 sha1_file.c | 47 +++++++++++++++++++++++++++++------------------
 2 files changed, 37 insertions(+), 18 deletions(-)

diff --git a/cache.h b/cache.h
index a916bc79e3..00d89568f3 100644
--- a/cache.h
+++ b/cache.h
@@ -902,6 +902,12 @@ extern void check_repository_format(void);
  */
 extern const char *sha1_file_name(const unsigned char *sha1);
 
+/*
+ * Like sha1_file_name, but return the filename within a specific alternate
+ * object directory. Shares the same static buffer with sha1_file_name.
+ */
+extern const char *sha1_file_name_alt(const char *objdir, const unsigned char *sha1);
+
 /*
  * Return an abbreviated sha1 unique within this repository's object database.
  * The result will be at least `len` characters long, and will be NUL
@@ -1189,6 +1195,8 @@ extern int parse_sha1_header(const char *hdr, unsigned long *sizep);
 
 extern int check_sha1_signature(const unsigned char *sha1, void *buf, unsigned long size, const char *type);
 
+extern int create_object_tmpfile(struct strbuf *tmp, const char *filename);
+extern void close_sha1_file(int fd);
 extern int finalize_object_file(const char *tmpfile, const char *filename);
 
 /*
diff --git a/sha1_file.c b/sha1_file.c
index 5f71bbac3e..bea1ae6afb 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -251,17 +251,22 @@ static void fill_sha1_path(struct strbuf *buf, const unsigned char *sha1)
 	}
 }
 
-const char *sha1_file_name(const unsigned char *sha1)
+const char *sha1_file_name_alt(const char *objdir, const unsigned char *sha1)
 {
 	static struct strbuf buf = STRBUF_INIT;
 
 	strbuf_reset(&buf);
-	strbuf_addf(&buf, "%s/", get_object_directory());
+	strbuf_addf(&buf, "%s/", objdir);
 
 	fill_sha1_path(&buf, sha1);
 	return buf.buf;
 }
 
+const char *sha1_file_name(const unsigned char *sha1)
+{
+	return sha1_file_name_alt(get_object_directory(), sha1);
+}
+
 struct strbuf *alt_scratch_buf(struct alternate_object_database *alt)
 {
 	strbuf_setlen(&alt->scratch, alt->base_len);
@@ -822,24 +827,14 @@ static int stat_sha1_file(const unsigned char *sha1, struct stat *st,
 	return -1;
 }
 
-/*
- * Like stat_sha1_file(), but actually open the object and return the
- * descriptor. See the caveats on the "path" parameter above.
- */
-static int open_sha1_file(const unsigned char *sha1, const char **path)
+static int open_sha1_file_alt(const unsigned char *sha1, const char **path)
 {
-	int fd;
 	struct alternate_object_database *alt;
-	int most_interesting_errno;
-
-	*path = sha1_file_name(sha1);
-	fd = git_open(*path);
-	if (fd >= 0)
-		return fd;
-	most_interesting_errno = errno;
+	int most_interesting_errno = errno;
 
 	prepare_alt_odb();
 	for (alt = alt_odb_list; alt; alt = alt->next) {
+		int fd;
 		*path = alt_sha1_path(alt, sha1);
 		fd = git_open(*path);
 		if (fd >= 0)
@@ -851,6 +846,22 @@ static int open_sha1_file(const unsigned char *sha1, const char **path)
 	return -1;
 }
 
+/*
+ * Like stat_sha1_file(), but actually open the object and return the
+ * descriptor. See the caveats on the "path" parameter above.
+ */
+static int open_sha1_file(const unsigned char *sha1, const char **path)
+{
+	int fd;
+
+	*path = sha1_file_name(sha1);
+	fd = git_open(*path);
+	if (fd >= 0)
+		return fd;
+
+	return open_sha1_file_alt(sha1, path);
+}
+
 /*
  * Map the loose object at "path" if it is not NULL, or the path found by
  * searching for a loose object named "sha1".
@@ -1428,7 +1439,7 @@ int hash_sha1_file(const void *buf, unsigned long len, const char *type,
 }
 
 /* Finalize a file on disk, and close it. */
-static void close_sha1_file(int fd)
+void close_sha1_file(int fd)
 {
 	if (fsync_object_files)
 		fsync_or_die(fd, "sha1 file");
@@ -1452,7 +1463,7 @@ static inline int directory_size(const char *filename)
  * We want to avoid cross-directory filename renames, because those
  * can have problems on various filesystems (FAT, NFS, Coda).
  */
-static int create_tmpfile(struct strbuf *tmp, const char *filename)
+int create_object_tmpfile(struct strbuf *tmp, const char *filename)
 {
 	int fd, dirlen = directory_size(filename);
 
@@ -1492,7 +1503,7 @@ static int write_loose_object(const unsigned char *sha1, char *hdr, int hdrlen,
 	static struct strbuf tmp_file = STRBUF_INIT;
 	const char *filename = sha1_file_name(sha1);
 
-	fd = create_tmpfile(&tmp_file, filename);
+	fd = create_object_tmpfile(&tmp_file, filename);
 	if (fd < 0) {
 		if (errno == EACCES)
 			return error("insufficient permission for adding an object to repository database %s", get_object_directory());
-- 
2.14.1.576.g3f707d88cd


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH v6 09/40] Add initial external odb support
  2017-09-16  8:06 [PATCH v6 00/40] Add initial experimental external ODB support Christian Couder
                   ` (7 preceding siblings ...)
  2017-09-16  8:06 ` [PATCH v6 08/40] sha1_file: prepare for external odbs Christian Couder
@ 2017-09-16  8:07 ` Christian Couder
  2017-09-19 17:45   ` Jonathan Tan
  2017-09-16  8:07 ` [PATCH v6 10/40] odb-helper: add odb_helper_init() to send 'init' instruction Christian Couder
                   ` (31 subsequent siblings)
  40 siblings, 1 reply; 49+ messages in thread
From: Christian Couder @ 2017-09-16  8:07 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Ben Peart, Jonathan Tan,
	Nguyen Thai Ngoc Duy, Mike Hommey, Lars Schneider, Eric Wong,
	Christian Couder

The external-odb.{c,h} files contain the functions that are
called by the rest of Git from "sha1_file.c".

The odb-helper.{c,h} files contain the functions to
actually implement communication with the external scripts or
processes that will manage external git objects.

For now only script mode is supported, and only the 'have' and
'get_git_obj' instructions are supported.

Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
 Makefile                |   2 +
 cache.h                 |   1 +
 external-odb.c          | 113 ++++++++++++++++++++
 external-odb.h          |   8 ++
 odb-helper.c            | 269 ++++++++++++++++++++++++++++++++++++++++++++++++
 odb-helper.h            |  27 +++++
 sha1_file.c             |  31 +++++-
 t/t0400-external-odb.sh |  46 +++++++++
 8 files changed, 495 insertions(+), 2 deletions(-)
 create mode 100644 external-odb.c
 create mode 100644 external-odb.h
 create mode 100644 odb-helper.c
 create mode 100644 odb-helper.h
 create mode 100755 t/t0400-external-odb.sh

diff --git a/Makefile b/Makefile
index f2bb7f2f63..24aab8ace3 100644
--- a/Makefile
+++ b/Makefile
@@ -784,6 +784,7 @@ LIB_OBJS += ewah/ewah_bitmap.o
 LIB_OBJS += ewah/ewah_io.o
 LIB_OBJS += ewah/ewah_rlw.o
 LIB_OBJS += exec_cmd.o
+LIB_OBJS += external-odb.o
 LIB_OBJS += fetch-pack.o
 LIB_OBJS += fsck.o
 LIB_OBJS += gettext.o
@@ -816,6 +817,7 @@ LIB_OBJS += notes-cache.o
 LIB_OBJS += notes-merge.o
 LIB_OBJS += notes-utils.o
 LIB_OBJS += object.o
+LIB_OBJS += odb-helper.o
 LIB_OBJS += oidset.o
 LIB_OBJS += packfile.o
 LIB_OBJS += pack-bitmap.o
diff --git a/cache.h b/cache.h
index 00d89568f3..6c22bd0525 100644
--- a/cache.h
+++ b/cache.h
@@ -1534,6 +1534,7 @@ extern void prepare_alt_odb(void);
 extern char *compute_alternate_path(const char *path, struct strbuf *err);
 typedef int alt_odb_fn(struct alternate_object_database *, void *);
 extern int foreach_alt_odb(alt_odb_fn, void*);
+extern void prepare_external_alt_odb(void);
 
 /*
  * Allocate a "struct alternate_object_database" but do _not_ actually
diff --git a/external-odb.c b/external-odb.c
new file mode 100644
index 0000000000..e9c3f11666
--- /dev/null
+++ b/external-odb.c
@@ -0,0 +1,113 @@
+#include "cache.h"
+#include "external-odb.h"
+#include "odb-helper.h"
+
+static struct odb_helper *helpers;
+static struct odb_helper **helpers_tail = &helpers;
+
+static struct odb_helper *find_or_create_helper(const char *name, int len)
+{
+	struct odb_helper *o;
+
+	for (o = helpers; o; o = o->next)
+		if (!strncmp(o->name, name, len) && !o->name[len])
+			return o;
+
+	o = odb_helper_new(name, len);
+	*helpers_tail = o;
+	helpers_tail = &o->next;
+
+	return o;
+}
+
+static int external_odb_config(const char *var, const char *value, void *data)
+{
+	struct odb_helper *o;
+	const char *name;
+	int namelen;
+	const char *subkey;
+
+	if (parse_config_key(var, "odb", &name, &namelen, &subkey) < 0)
+		return 0;
+
+	o = find_or_create_helper(name, namelen);
+
+	if (!strcmp(subkey, "scriptcommand"))
+		return git_config_string(&o->cmd, var, value);
+
+	return 0;
+}
+
+static void external_odb_init(void)
+{
+	static int initialized;
+
+	if (initialized)
+		return;
+	initialized = 1;
+
+	git_config(external_odb_config, NULL);
+}
+
+const char *external_odb_root(void)
+{
+	static const char *root;
+	if (!root)
+		root = git_pathdup("objects/external");
+	return root;
+}
+
+int external_odb_has_object(const unsigned char *sha1)
+{
+	struct odb_helper *o;
+
+	external_odb_init();
+
+	for (o = helpers; o; o = o->next)
+		if (odb_helper_has_object(o, sha1))
+			return 1;
+	return 0;
+}
+
+int external_odb_get_object(const unsigned char *sha1)
+{
+	struct odb_helper *o;
+	const char *path;
+
+	if (!external_odb_has_object(sha1))
+		return -1;
+
+	path = sha1_file_name_alt(external_odb_root(), sha1);
+	safe_create_leading_directories_const(path);
+	prepare_external_alt_odb();
+
+	for (o = helpers; o; o = o->next) {
+		struct strbuf tmpfile = STRBUF_INIT;
+		int ret;
+		int fd;
+
+		if (!odb_helper_has_object(o, sha1))
+			continue;
+
+		fd = create_object_tmpfile(&tmpfile, path);
+		if (fd < 0) {
+			strbuf_release(&tmpfile);
+			return -1;
+		}
+
+		if (odb_helper_get_object(o, sha1, fd) < 0) {
+			close(fd);
+			unlink(tmpfile.buf);
+			strbuf_release(&tmpfile);
+			continue;
+		}
+
+		close_sha1_file(fd);
+		ret = finalize_object_file(tmpfile.buf, path);
+		strbuf_release(&tmpfile);
+		if (!ret)
+			return 0;
+	}
+
+	return -1;
+}
diff --git a/external-odb.h b/external-odb.h
new file mode 100644
index 0000000000..dc5635f452
--- /dev/null
+++ b/external-odb.h
@@ -0,0 +1,8 @@
+#ifndef EXTERNAL_ODB_H
+#define EXTERNAL_ODB_H
+
+extern const char *external_odb_root(void);
+extern int external_odb_has_object(const unsigned char *sha1);
+extern int external_odb_get_object(const unsigned char *sha1);
+
+#endif /* EXTERNAL_ODB_H */
diff --git a/odb-helper.c b/odb-helper.c
new file mode 100644
index 0000000000..5e91044872
--- /dev/null
+++ b/odb-helper.c
@@ -0,0 +1,269 @@
+#include "cache.h"
+#include "object.h"
+#include "argv-array.h"
+#include "odb-helper.h"
+#include "run-command.h"
+#include "sha1-lookup.h"
+
+struct odb_helper *odb_helper_new(const char *name, int namelen)
+{
+	struct odb_helper *o;
+
+	o = xcalloc(1, sizeof(*o));
+	o->name = xmemdupz(name, namelen);
+
+	return o;
+}
+
+struct odb_helper_cmd {
+	struct argv_array argv;
+	struct child_process child;
+};
+
+/*
+ * Callers are responsible to ensure that the result of vaddf(fmt, ap)
+ * is properly shell-quoted.
+ */
+static void prepare_helper_command(struct argv_array *argv, const char *cmd,
+				   const char *fmt, va_list ap)
+{
+	struct strbuf buf = STRBUF_INIT;
+
+	strbuf_addstr(&buf, cmd);
+	strbuf_addch(&buf, ' ');
+	strbuf_vaddf(&buf, fmt, ap);
+
+	argv_array_push(argv, buf.buf);
+	strbuf_release(&buf);
+}
+
+__attribute__((format (printf,3,4)))
+static int odb_helper_start(struct odb_helper *o,
+			    struct odb_helper_cmd *cmd,
+			    const char *fmt, ...)
+{
+	va_list ap;
+
+	memset(cmd, 0, sizeof(*cmd));
+	argv_array_init(&cmd->argv);
+
+	if (!o->cmd)
+		return -1;
+
+	va_start(ap, fmt);
+	prepare_helper_command(&cmd->argv, o->cmd, fmt, ap);
+	va_end(ap);
+
+	cmd->child.argv = cmd->argv.argv;
+	cmd->child.use_shell = 1;
+	cmd->child.no_stdin = 1;
+	cmd->child.out = -1;
+
+	if (start_command(&cmd->child) < 0) {
+		argv_array_clear(&cmd->argv);
+		return -1;
+	}
+
+	return 0;
+}
+
+static int odb_helper_finish(struct odb_helper *o,
+			     struct odb_helper_cmd *cmd)
+{
+	int ret = finish_command(&cmd->child);
+	argv_array_clear(&cmd->argv);
+	if (ret) {
+		warning("odb helper '%s' reported failure", o->name);
+		return -1;
+	}
+	return 0;
+}
+
+static int parse_object_line(struct odb_helper_object *o, const char *line)
+{
+	char *end;
+	if (get_sha1_hex(line, o->sha1) < 0)
+		return -1;
+
+	line += 40;
+	if (*line++ != ' ')
+		return -1;
+
+	o->size = strtoul(line, &end, 10);
+	if (line == end || *end++ != ' ')
+		return -1;
+
+	o->type = type_from_string(end);
+	return 0;
+}
+
+static int add_have_entry(struct odb_helper *o, const char *line)
+{
+	ALLOC_GROW(o->have, o->have_nr+1, o->have_alloc);
+	if (parse_object_line(&o->have[o->have_nr], line) < 0) {
+		warning("bad 'have' input from odb helper '%s': %s",
+			o->name, line);
+		return 1;
+	}
+	o->have_nr++;
+	return 0;
+}
+
+static int odb_helper_object_cmp(const void *va, const void *vb)
+{
+	const struct odb_helper_object *a = va, *b = vb;
+	return hashcmp(a->sha1, b->sha1);
+}
+
+static void odb_helper_load_have(struct odb_helper *o)
+{
+	struct odb_helper_cmd cmd;
+	FILE *fh;
+	struct strbuf line = STRBUF_INIT;
+
+	if (o->have_valid)
+		return;
+	o->have_valid = 1;
+
+	if (odb_helper_start(o, &cmd, "have") < 0)
+		return;
+
+	fh = xfdopen(cmd.child.out, "r");
+	while (strbuf_getline(&line, fh) != EOF)
+		if (add_have_entry(o, line.buf))
+			break;
+
+	strbuf_release(&line);
+	fclose(fh);
+	odb_helper_finish(o, &cmd);
+
+	qsort(o->have, o->have_nr, sizeof(*o->have), odb_helper_object_cmp);
+}
+
+static const unsigned char *have_sha1_access(size_t index, void *table)
+{
+	struct odb_helper_object *have = table;
+	return have[index].sha1;
+}
+
+static struct odb_helper_object *odb_helper_lookup(struct odb_helper *o,
+						   const unsigned char *sha1)
+{
+	int idx;
+
+	odb_helper_load_have(o);
+	idx = sha1_pos(sha1, o->have, o->have_nr, have_sha1_access);
+	if (idx < 0)
+		return NULL;
+	return &o->have[idx];
+}
+
+int odb_helper_has_object(struct odb_helper *o, const unsigned char *sha1)
+{
+	return !!odb_helper_lookup(o, sha1);
+}
+
+int odb_helper_get_object(struct odb_helper *o, const unsigned char *sha1,
+			    int fd)
+{
+	struct odb_helper_object *obj;
+	struct odb_helper_cmd cmd;
+	unsigned long total_got;
+	git_zstream stream;
+	int zret = Z_STREAM_END;
+	git_SHA_CTX hash;
+	unsigned char real_sha1[20];
+	struct strbuf header = STRBUF_INIT;
+	unsigned long hdr_size;
+
+	obj = odb_helper_lookup(o, sha1);
+	if (!obj)
+		return -1;
+
+	if (odb_helper_start(o, &cmd, "get_git_obj %s", sha1_to_hex(sha1)) < 0)
+		return -1;
+
+	memset(&stream, 0, sizeof(stream));
+	git_inflate_init(&stream);
+	git_SHA1_Init(&hash);
+	total_got = 0;
+
+	for (;;) {
+		unsigned char buf[4096];
+		int r;
+
+		r = xread(cmd.child.out, buf, sizeof(buf));
+		if (r < 0) {
+			error("unable to read from odb helper '%s': %s",
+			      o->name, strerror(errno));
+			close(cmd.child.out);
+			odb_helper_finish(o, &cmd);
+			git_inflate_end(&stream);
+			return -1;
+		}
+		if (r == 0)
+			break;
+
+		write_or_die(fd, buf, r);
+
+		stream.next_in = buf;
+		stream.avail_in = r;
+		do {
+			unsigned char inflated[4096];
+			unsigned long got;
+
+			stream.next_out = inflated;
+			stream.avail_out = sizeof(inflated);
+			zret = git_inflate(&stream, Z_SYNC_FLUSH);
+			got = sizeof(inflated) - stream.avail_out;
+
+			git_SHA1_Update(&hash, inflated, got);
+			/* skip header when counting size */
+			if (!total_got) {
+				const unsigned char *p = memchr(inflated, '\0', got);
+				if (p) {
+					unsigned long hdr_last = p - inflated + 1;
+					strbuf_add(&header, inflated, hdr_last);
+					got -= hdr_last;
+				} else {
+					strbuf_add(&header, inflated, got);
+					got = 0;
+				}
+			}
+			total_got += got;
+		} while (stream.avail_in && zret == Z_OK);
+	}
+
+	close(cmd.child.out);
+	git_inflate_end(&stream);
+	git_SHA1_Final(real_sha1, &hash);
+	if (odb_helper_finish(o, &cmd))
+		return -1;
+	if (zret != Z_STREAM_END) {
+		warning("bad zlib data from odb helper '%s' for %s",
+			o->name, sha1_to_hex(sha1));
+		return -1;
+	}
+	if (total_got != obj->size) {
+		warning("size mismatch from odb helper '%s' for %s (%lu != %lu)",
+			o->name, sha1_to_hex(sha1), total_got, obj->size);
+		return -1;
+	}
+	if (hashcmp(real_sha1, sha1)) {
+		warning("sha1 mismatch from odb helper '%s' for %s (got %s)",
+			o->name, sha1_to_hex(sha1), sha1_to_hex(real_sha1));
+		return -1;
+	}
+	if (parse_sha1_header(header.buf, &hdr_size) < 0) {
+		warning("could not parse header from odb helper '%s' for %s",
+			o->name, sha1_to_hex(sha1));
+		return -1;
+	}
+	if (total_got != hdr_size) {
+		warning("size mismatch from odb helper '%s' for %s (%lu != %lu)",
+			o->name, sha1_to_hex(sha1), total_got, hdr_size);
+		return -1;
+	}
+
+	return 0;
+}
diff --git a/odb-helper.h b/odb-helper.h
new file mode 100644
index 0000000000..fb25ad579e
--- /dev/null
+++ b/odb-helper.h
@@ -0,0 +1,27 @@
+#ifndef ODB_HELPER_H
+#define ODB_HELPER_H
+
+struct odb_helper {
+	const char *name;
+	const char *cmd;
+
+	struct odb_helper_object {
+		unsigned char sha1[20];
+		unsigned long size;
+		enum object_type type;
+	} *have;
+	int have_nr;
+	int have_alloc;
+	int have_valid;
+
+	struct odb_helper *next;
+};
+
+extern struct odb_helper *odb_helper_new(const char *name, int namelen);
+extern int odb_helper_has_object(struct odb_helper *o,
+				 const unsigned char *sha1);
+extern int odb_helper_get_object(struct odb_helper *o,
+				 const unsigned char *sha1,
+				 int fd);
+
+#endif /* ODB_HELPER_H */
diff --git a/sha1_file.c b/sha1_file.c
index bea1ae6afb..4a4f5df5ec 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -29,6 +29,7 @@
 #include "mergesort.h"
 #include "quote.h"
 #include "packfile.h"
+#include "external-odb.h"
 
 const unsigned char null_sha1[GIT_MAX_RAWSZ];
 const struct object_id null_oid;
@@ -613,6 +614,21 @@ int foreach_alt_odb(alt_odb_fn fn, void *cb)
 	return r;
 }
 
+void prepare_external_alt_odb(void)
+{
+	static int linked_external;
+	const char *path;
+
+	if (linked_external)
+		return;
+
+	path = external_odb_root();
+	if (!access(path, F_OK)) {
+		link_alt_odb_entry(path, NULL, 0, "");
+		linked_external = 1;
+	}
+}
+
 void prepare_alt_odb(void)
 {
 	const char *alt;
@@ -627,6 +643,7 @@ void prepare_alt_odb(void)
 	link_alt_odb_entries(alt, strlen(alt), PATH_SEP, NULL, 0);
 
 	read_info_alternates(get_object_directory(), 0);
+	prepare_external_alt_odb();
 }
 
 /* Returns 1 if we have successfully freshened the file, 0 otherwise. */
@@ -667,7 +684,7 @@ static int check_and_freshen_nonlocal(const unsigned char *sha1, int freshen)
 		if (check_and_freshen_file(path, freshen))
 			return 1;
 	}
-	return 0;
+	return external_odb_has_object(sha1);
 }
 
 static int check_and_freshen(const unsigned char *sha1, int freshen)
@@ -824,6 +841,9 @@ static int stat_sha1_file(const unsigned char *sha1, struct stat *st,
 			return 0;
 	}
 
+	if (!external_odb_get_object(sha1) && !lstat(*path, st))
+		return 0;
+
 	return -1;
 }
 
@@ -859,7 +879,14 @@ static int open_sha1_file(const unsigned char *sha1, const char **path)
 	if (fd >= 0)
 		return fd;
 
-	return open_sha1_file_alt(sha1, path);
+	fd = open_sha1_file_alt(sha1, path);
+	if (fd >= 0)
+		return fd;
+
+	if (!external_odb_get_object(sha1))
+		fd = open_sha1_file_alt(sha1, path);
+
+	return fd;
 }
 
 /*
diff --git a/t/t0400-external-odb.sh b/t/t0400-external-odb.sh
new file mode 100755
index 0000000000..2f4749fab1
--- /dev/null
+++ b/t/t0400-external-odb.sh
@@ -0,0 +1,46 @@
+#!/bin/sh
+
+test_description='basic tests for external object databases'
+
+. ./test-lib.sh
+
+ALT_SOURCE="$PWD/alt-repo/.git"
+export ALT_SOURCE
+write_script odb-helper <<\EOF
+GIT_DIR=$ALT_SOURCE; export GIT_DIR
+case "$1" in
+have)
+	git cat-file --batch-check --batch-all-objects |
+	awk '{print $1 " " $3 " " $2}'
+	;;
+get_git_obj)
+	cat "$GIT_DIR"/objects/$(echo $2 | sed 's#..#&/#')
+	;;
+esac
+EOF
+HELPER="\"$PWD\"/odb-helper"
+
+test_expect_success 'setup alternate repo' '
+	git init alt-repo &&
+	(cd alt-repo &&
+	 test_commit one &&
+	 test_commit two
+	) &&
+	alt_head=`cd alt-repo && git rev-parse HEAD`
+'
+
+test_expect_success 'alt objects are missing' '
+	test_must_fail git log --format=%s $alt_head
+'
+
+test_expect_success 'helper can retrieve alt objects' '
+	test_config odb.magic.scriptCommand "$HELPER" &&
+	cat >expect <<-\EOF &&
+	two
+	one
+	EOF
+	git log --format=%s $alt_head >actual &&
+	test_cmp expect actual
+'
+
+test_done
-- 
2.14.1.576.g3f707d88cd



* [PATCH v6 10/40] odb-helper: add odb_helper_init() to send 'init' instruction
  2017-09-16  8:06 [PATCH v6 00/40] Add initial experimental external ODB support Christian Couder
                   ` (8 preceding siblings ...)
  2017-09-16  8:07 ` [PATCH v6 09/40] Add initial external odb support Christian Couder
@ 2017-09-16  8:07 ` Christian Couder
  2017-09-16  8:07 ` [PATCH v6 11/40] t0400: add 'put_raw_obj' instruction to odb-helper script Christian Couder
                   ` (30 subsequent siblings)
  40 siblings, 0 replies; 49+ messages in thread
From: Christian Couder @ 2017-09-16  8:07 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Ben Peart, Jonathan Tan,
	Nguyen Thai Ngoc Duy, Mike Hommey, Lars Schneider, Eric Wong,
	Christian Couder

Let's add an odb_helper_init() function to send an 'init'
instruction to the helpers. This 'init' instruction is
especially useful to get the capabilities that are supported
by the helpers.

While at it, let's also add a parse_capabilities()
function to parse them and a supported_capabilities
variable in struct odb_helper to store them.
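
As an illustration, a script mode helper answers 'init' by printing
one "capability=<name>" line per supported instruction on its
standard output. The odb-helper script updated in t0400 below
answers like this:

    $ ./odb-helper init
    capability=get_git_obj
    capability=have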

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
 external-odb.c          |  9 ++++++++-
 odb-helper.c            | 54 +++++++++++++++++++++++++++++++++++++++++++++++++
 odb-helper.h            | 12 +++++++++++
 t/t0400-external-odb.sh |  4 ++++
 4 files changed, 78 insertions(+), 1 deletion(-)

diff --git a/external-odb.c b/external-odb.c
index e9c3f11666..0f0de170b8 100644
--- a/external-odb.c
+++ b/external-odb.c
@@ -41,12 +41,16 @@ static int external_odb_config(const char *var, const char *value, void *data)
 static void external_odb_init(void)
 {
 	static int initialized;
+	struct odb_helper *o;
 
 	if (initialized)
 		return;
 	initialized = 1;
 
 	git_config(external_odb_config, NULL);
+
+	for (o = helpers; o; o = o->next)
+		odb_helper_init(o);
 }
 
 const char *external_odb_root(void)
@@ -63,9 +67,12 @@ int external_odb_has_object(const unsigned char *sha1)
 
 	external_odb_init();
 
-	for (o = helpers; o; o = o->next)
+	for (o = helpers; o; o = o->next) {
+		if (!(o->supported_capabilities & ODB_HELPER_CAP_HAVE))
+			return 1;
 		if (odb_helper_has_object(o, sha1))
 			return 1;
+	}
 	return 0;
 }
 
diff --git a/odb-helper.c b/odb-helper.c
index 5e91044872..9375eca58f 100644
--- a/odb-helper.c
+++ b/odb-helper.c
@@ -5,6 +5,40 @@
 #include "run-command.h"
 #include "sha1-lookup.h"
 
+static void parse_capabilities(char *cap_buf,
+			       unsigned int *supported_capabilities,
+			       const char *process_name)
+{
+	struct string_list cap_list = STRING_LIST_INIT_NODUP;
+
+	string_list_split_in_place(&cap_list, cap_buf, '=', 1);
+
+	if (cap_list.nr == 2 && !strcmp(cap_list.items[0].string, "capability")) {
+		const char *cap_name = cap_list.items[1].string;
+
+		if (!strcmp(cap_name, "get_git_obj")) {
+			*supported_capabilities |= ODB_HELPER_CAP_GET_GIT_OBJ;
+		} else if (!strcmp(cap_name, "get_raw_obj")) {
+			*supported_capabilities |= ODB_HELPER_CAP_GET_RAW_OBJ;
+		} else if (!strcmp(cap_name, "get_direct")) {
+			*supported_capabilities |= ODB_HELPER_CAP_GET_DIRECT;
+		} else if (!strcmp(cap_name, "put_git_obj")) {
+			*supported_capabilities |= ODB_HELPER_CAP_PUT_GIT_OBJ;
+		} else if (!strcmp(cap_name, "put_raw_obj")) {
+			*supported_capabilities |= ODB_HELPER_CAP_PUT_RAW_OBJ;
+		} else if (!strcmp(cap_name, "put_direct")) {
+			*supported_capabilities |= ODB_HELPER_CAP_PUT_DIRECT;
+		} else if (!strcmp(cap_name, "have")) {
+			*supported_capabilities |= ODB_HELPER_CAP_HAVE;
+		} else {
+			warning("external process '%s' requested unsupported read-object capability '%s'",
+				process_name, cap_name);
+		}
+	}
+
+	string_list_clear(&cap_list, 0);
+}
+
 struct odb_helper *odb_helper_new(const char *name, int namelen)
 {
 	struct odb_helper *o;
@@ -79,6 +113,26 @@ static int odb_helper_finish(struct odb_helper *o,
 	return 0;
 }
 
+int odb_helper_init(struct odb_helper *o)
+{
+	struct odb_helper_cmd cmd;
+	FILE *fh;
+	struct strbuf line = STRBUF_INIT;
+
+	if (odb_helper_start(o, &cmd, "init") < 0)
+		return -1;
+
+	fh = xfdopen(cmd.child.out, "r");
+	while (strbuf_getline(&line, fh) != EOF)
+		parse_capabilities(line.buf, &o->supported_capabilities, o->name);
+
+	strbuf_release(&line);
+	fclose(fh);
+	odb_helper_finish(o, &cmd);
+
+	return 0;
+}
+
 static int parse_object_line(struct odb_helper_object *o, const char *line)
 {
 	char *end;
diff --git a/odb-helper.h b/odb-helper.h
index fb25ad579e..5f28a6e512 100644
--- a/odb-helper.h
+++ b/odb-helper.h
@@ -1,9 +1,20 @@
 #ifndef ODB_HELPER_H
 #define ODB_HELPER_H
 
+#include "external-odb.h"
+
+#define ODB_HELPER_CAP_GET_GIT_OBJ    (1u<<0)
+#define ODB_HELPER_CAP_GET_RAW_OBJ    (1u<<1)
+#define ODB_HELPER_CAP_GET_DIRECT     (1u<<2)
+#define ODB_HELPER_CAP_PUT_GIT_OBJ    (1u<<3)
+#define ODB_HELPER_CAP_PUT_RAW_OBJ    (1u<<4)
+#define ODB_HELPER_CAP_PUT_DIRECT     (1u<<5)
+#define ODB_HELPER_CAP_HAVE           (1u<<6)
+
 struct odb_helper {
 	const char *name;
 	const char *cmd;
+	unsigned int supported_capabilities;
 
 	struct odb_helper_object {
 		unsigned char sha1[20];
@@ -18,6 +29,7 @@ struct odb_helper {
 };
 
 extern struct odb_helper *odb_helper_new(const char *name, int namelen);
+extern int odb_helper_init(struct odb_helper *o);
 extern int odb_helper_has_object(struct odb_helper *o,
 				 const unsigned char *sha1);
 extern int odb_helper_get_object(struct odb_helper *o,
diff --git a/t/t0400-external-odb.sh b/t/t0400-external-odb.sh
index 2f4749fab1..ed89f3ab40 100755
--- a/t/t0400-external-odb.sh
+++ b/t/t0400-external-odb.sh
@@ -9,6 +9,10 @@ export ALT_SOURCE
 write_script odb-helper <<\EOF
 GIT_DIR=$ALT_SOURCE; export GIT_DIR
 case "$1" in
+init)
+	echo "capability=get_git_obj"
+	echo "capability=have"
+	;;
 have)
 	git cat-file --batch-check --batch-all-objects |
 	awk '{print $1 " " $3 " " $2}'
-- 
2.14.1.576.g3f707d88cd



* [PATCH v6 11/40] t0400: add 'put_raw_obj' instruction to odb-helper script
  2017-09-16  8:06 [PATCH v6 00/40] Add initial experimental external ODB support Christian Couder
                   ` (9 preceding siblings ...)
  2017-09-16  8:07 ` [PATCH v6 10/40] odb-helper: add odb_helper_init() to send 'init' instruction Christian Couder
@ 2017-09-16  8:07 ` Christian Couder
  2017-09-16  8:07 ` [PATCH v6 12/40] external odb: add 'put_raw_obj' support Christian Couder
                   ` (29 subsequent siblings)
  40 siblings, 0 replies; 49+ messages in thread
From: Christian Couder @ 2017-09-16  8:07 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Ben Peart, Jonathan Tan,
	Nguyen Thai Ngoc Duy, Mike Hommey, Lars Schneider, Eric Wong,
	Christian Couder

To properly test passing objects from Git to an external odb
we need an odb-helper script that supports a 'put'
capability/instruction.

For now we will support only sending raw blobs, so the
supported capability/instruction will be 'put_raw_obj'.

While at it, let's add a test to check that our odb-helper
script works as expected.
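
For reference, the calling convention exercised by the new test is
that the object's metadata is passed as arguments while its raw
content is piped on stdin:

    git cat-file blob "$sha1" | ./odb-helper put_raw_obj "$sha1" "$size" blob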

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
 t/t0400-external-odb.sh | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

diff --git a/t/t0400-external-odb.sh b/t/t0400-external-odb.sh
index ed89f3ab40..f9e6ea1015 100755
--- a/t/t0400-external-odb.sh
+++ b/t/t0400-external-odb.sh
@@ -7,10 +7,15 @@ test_description='basic tests for external object databases'
 ALT_SOURCE="$PWD/alt-repo/.git"
 export ALT_SOURCE
 write_script odb-helper <<\EOF
+die() {
+	printf >&2 "%s\n" "$@"
+	exit 1
+}
 GIT_DIR=$ALT_SOURCE; export GIT_DIR
 case "$1" in
 init)
 	echo "capability=get_git_obj"
+	echo "capability=put_raw_obj"
 	echo "capability=have"
 	;;
 have)
@@ -20,6 +25,16 @@ have)
 get_git_obj)
 	cat "$GIT_DIR"/objects/$(echo $2 | sed 's#..#&/#')
 	;;
+put_raw_obj)
+	sha1="$2"
+	size="$3"
+	kind="$4"
+	written=$(git hash-object -w -t "$kind" --stdin)
+	test "$written" = "$sha1" || die "bad sha1 passed '$sha1' vs written '$written'"
+	;;
+*)
+	die "unknown command '$1'"
+	;;
 esac
 EOF
 HELPER="\"$PWD\"/odb-helper"
@@ -47,4 +62,13 @@ test_expect_success 'helper can retrieve alt objects' '
 	test_cmp expect actual
 '
 
+test_expect_success 'helper can add objects to alt repo' '
+	hash=$(echo "Hello odb!" | git hash-object -w -t blob --stdin) &&
+	test -f .git/objects/$(echo $hash | sed "s#..#&/#") &&
+	size=$(git cat-file -s "$hash") &&
+	git cat-file blob "$hash" | ./odb-helper put_raw_obj "$hash" "$size" blob &&
+	alt_size=$(cd alt-repo && git cat-file -s "$hash") &&
+	test "$size" -eq "$alt_size"
+'
+
 test_done
-- 
2.14.1.576.g3f707d88cd



* [PATCH v6 12/40] external odb: add 'put_raw_obj' support
  2017-09-16  8:06 [PATCH v6 00/40] Add initial experimental external ODB support Christian Couder
                   ` (10 preceding siblings ...)
  2017-09-16  8:07 ` [PATCH v6 11/40] t0400: add 'put_raw_obj' instruction to odb-helper script Christian Couder
@ 2017-09-16  8:07 ` Christian Couder
  2017-09-16  8:07 ` [PATCH v6 13/40] external-odb: accept only blobs for now Christian Couder
                   ` (28 subsequent siblings)
  40 siblings, 0 replies; 49+ messages in thread
From: Christian Couder @ 2017-09-16  8:07 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Ben Peart, Jonathan Tan,
	Nguyen Thai Ngoc Duy, Mike Hommey, Lars Schneider, Eric Wong,
	Christian Couder

Add support for a 'put_raw_obj' capability/instruction to send new
objects to an external odb. Objects will be sent as they are (in
their 'raw' format). They will not be converted to Git objects.

For now any new Git object (blob, tree, commit, ...) will be sent
if 'put_raw_obj' is supported by an odb helper. This is not a great
default, but let's leave it to the following commits to tweak that.
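
As a rough sketch of the resulting write path, assuming a helper
registered under the purely illustrative name "magic":

    git config odb.magic.scriptCommand "$PWD/odb-helper"
    echo "Hello odb!" | git hash-object -w --stdin
    # write_sha1_file() now also runs something like:
    #   odb-helper put_raw_obj <sha1> <size> blob   (content on stdin)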

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
 external-odb.c | 15 +++++++++++++++
 external-odb.h |  2 ++
 odb-helper.c   | 43 ++++++++++++++++++++++++++++++++++++++-----
 odb-helper.h   |  3 +++
 sha1_file.c    |  2 ++
 5 files changed, 60 insertions(+), 5 deletions(-)

diff --git a/external-odb.c b/external-odb.c
index 0f0de170b8..82fac702e8 100644
--- a/external-odb.c
+++ b/external-odb.c
@@ -118,3 +118,18 @@ int external_odb_get_object(const unsigned char *sha1)
 
 	return -1;
 }
+
+int external_odb_put_object(const void *buf, size_t len,
+			    const char *type, unsigned char *sha1)
+{
+	struct odb_helper *o;
+
+	external_odb_init();
+
+	for (o = helpers; o; o = o->next) {
+		int r = odb_helper_put_object(o, buf, len, type, sha1);
+		if (r <= 0)
+			return r;
+	}
+	return 1;
+}
diff --git a/external-odb.h b/external-odb.h
index dc5635f452..d369dfdf6f 100644
--- a/external-odb.h
+++ b/external-odb.h
@@ -4,5 +4,7 @@
 extern const char *external_odb_root(void);
 extern int external_odb_has_object(const unsigned char *sha1);
 extern int external_odb_get_object(const unsigned char *sha1);
+extern int external_odb_put_object(const void *buf, size_t len,
+				   const char *type, unsigned char *sha1);
 
 #endif /* EXTERNAL_ODB_H */
diff --git a/odb-helper.c b/odb-helper.c
index 9375eca58f..39d20fdfd7 100644
--- a/odb-helper.c
+++ b/odb-helper.c
@@ -71,9 +71,10 @@ static void prepare_helper_command(struct argv_array *argv, const char *cmd,
 	strbuf_release(&buf);
 }
 
-__attribute__((format (printf,3,4)))
+__attribute__((format (printf,4,5)))
 static int odb_helper_start(struct odb_helper *o,
 			    struct odb_helper_cmd *cmd,
+			    int use_stdin,
 			    const char *fmt, ...)
 {
 	va_list ap;
@@ -90,7 +91,10 @@ static int odb_helper_start(struct odb_helper *o,
 
 	cmd->child.argv = cmd->argv.argv;
 	cmd->child.use_shell = 1;
-	cmd->child.no_stdin = 1;
+	if (use_stdin)
+		cmd->child.in = -1;
+	else
+		cmd->child.no_stdin = 1;
 	cmd->child.out = -1;
 
 	if (start_command(&cmd->child) < 0) {
@@ -119,7 +123,7 @@ int odb_helper_init(struct odb_helper *o)
 	FILE *fh;
 	struct strbuf line = STRBUF_INIT;
 
-	if (odb_helper_start(o, &cmd, "init") < 0)
+	if (odb_helper_start(o, &cmd, 0, "init") < 0)
 		return -1;
 
 	fh = xfdopen(cmd.child.out, "r");
@@ -179,7 +183,7 @@ static void odb_helper_load_have(struct odb_helper *o)
 		return;
 	o->have_valid = 1;
 
-	if (odb_helper_start(o, &cmd, "have") < 0)
+	if (odb_helper_start(o, &cmd, 0, "have") < 0)
 		return;
 
 	fh = xfdopen(cmd.child.out, "r");
@@ -234,7 +238,7 @@ int odb_helper_get_object(struct odb_helper *o, const unsigned char *sha1,
 	if (!obj)
 		return -1;
 
-	if (odb_helper_start(o, &cmd, "get_git_obj %s", sha1_to_hex(sha1)) < 0)
+	if (odb_helper_start(o, &cmd, 0, "get_git_obj %s", sha1_to_hex(sha1)) < 0)
 		return -1;
 
 	memset(&stream, 0, sizeof(stream));
@@ -321,3 +325,32 @@ int odb_helper_get_object(struct odb_helper *o, const unsigned char *sha1,
 
 	return 0;
 }
+
+int odb_helper_put_object(struct odb_helper *o,
+			  const void *buf, size_t len,
+			  const char *type, unsigned char *sha1)
+{
+	struct odb_helper_cmd cmd;
+
+	if (odb_helper_start(o, &cmd, 1, "put_raw_obj %s %"PRIuMAX" %s",
+			     sha1_to_hex(sha1), (uintmax_t)len, type) < 0)
+		return -1;
+
+	do {
+		int w = xwrite(cmd.child.in, buf, len);
+		if (w < 0) {
+			error("unable to write to odb helper '%s': %s",
+			      o->name, strerror(errno));
+			close(cmd.child.in);
+			close(cmd.child.out);
+			odb_helper_finish(o, &cmd);
+			return -1;
+		}
+		len -= w;
+	} while (len > 0);
+
+	close(cmd.child.in);
+	close(cmd.child.out);
+	odb_helper_finish(o, &cmd);
+	return 0;
+}
diff --git a/odb-helper.h b/odb-helper.h
index 5f28a6e512..0571ba09cb 100644
--- a/odb-helper.h
+++ b/odb-helper.h
@@ -35,5 +35,8 @@ extern int odb_helper_has_object(struct odb_helper *o,
 extern int odb_helper_get_object(struct odb_helper *o,
 				 const unsigned char *sha1,
 				 int fd);
+extern int odb_helper_put_object(struct odb_helper *o,
+				 const void *buf, size_t len,
+				 const char *type, unsigned char *sha1);
 
 #endif /* ODB_HELPER_H */
diff --git a/sha1_file.c b/sha1_file.c
index 4a4f5df5ec..d0155e392f 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -1613,6 +1613,8 @@ int write_sha1_file(const void *buf, unsigned long len, const char *type, unsign
 	 * it out into .git/objects/??/?{38} file.
 	 */
 	write_sha1_file_prepare(buf, len, type, sha1, hdr, &hdrlen);
+	if (!external_odb_put_object(buf, len, type, sha1))
+		return 0;
 	if (freshen_packed_object(sha1) || freshen_loose_object(sha1))
 		return 0;
 	return write_loose_object(sha1, hdr, hdrlen, buf, len, 0);
-- 
2.14.1.576.g3f707d88cd



* [PATCH v6 13/40] external-odb: accept only blobs for now
  2017-09-16  8:06 [PATCH v6 00/40] Add initial experimental external ODB support Christian Couder
                   ` (11 preceding siblings ...)
  2017-09-16  8:07 ` [PATCH v6 12/40] external odb: add 'put_raw_obj' support Christian Couder
@ 2017-09-16  8:07 ` Christian Couder
  2017-09-16  8:07 ` [PATCH v6 14/40] t0400: add test for external odb write support Christian Couder
                   ` (27 subsequent siblings)
  40 siblings, 0 replies; 49+ messages in thread
From: Christian Couder @ 2017-09-16  8:07 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Ben Peart, Jonathan Tan,
	Nguyen Thai Ngoc Duy, Mike Hommey, Lars Schneider, Eric Wong,
	Christian Couder

The mechanism to decide which blobs should be sent to which
external object database will be very simple for now.
If the external odb helper supports any "put_*" instruction,
all new blobs will be sent to it.
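
Concretely, with such a helper configured, only blob writes reach
it, while other object types keep going to the local object
database only:

    echo data | git hash-object -w --stdin   # helper gets a put_raw_obj call
    git commit -m "msg"                      # its commit and tree objects do not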

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
 external-odb.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/external-odb.c b/external-odb.c
index 82fac702e8..a4f8c72e1c 100644
--- a/external-odb.c
+++ b/external-odb.c
@@ -124,6 +124,10 @@ int external_odb_put_object(const void *buf, size_t len,
 {
 	struct odb_helper *o;
 
+	/* For now accept only blobs */
+	if (strcmp(type, "blob"))
+		return 1;
+
 	external_odb_init();
 
 	for (o = helpers; o; o = o->next) {
-- 
2.14.1.576.g3f707d88cd



* [PATCH v6 14/40] t0400: add test for external odb write support
  2017-09-16  8:06 [PATCH v6 00/40] Add initial experimental external ODB support Christian Couder
                   ` (12 preceding siblings ...)
  2017-09-16  8:07 ` [PATCH v6 13/40] external-odb: accept only blobs for now Christian Couder
@ 2017-09-16  8:07 ` Christian Couder
  2017-09-16  8:07 ` [PATCH v6 15/40] Add GIT_NO_EXTERNAL_ODB env variable Christian Couder
                   ` (26 subsequent siblings)
  40 siblings, 0 replies; 49+ messages in thread
From: Christian Couder @ 2017-09-16  8:07 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Ben Peart, Jonathan Tan,
	Nguyen Thai Ngoc Duy, Mike Hommey, Lars Schneider, Eric Wong,
	Christian Couder

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
 t/t0400-external-odb.sh | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/t/t0400-external-odb.sh b/t/t0400-external-odb.sh
index f9e6ea1015..03df030461 100755
--- a/t/t0400-external-odb.sh
+++ b/t/t0400-external-odb.sh
@@ -71,4 +71,12 @@ test_expect_success 'helper can add objects to alt repo' '
 	test "$size" -eq "$alt_size"
 '
 
+test_expect_success 'commit adds objects to alt repo' '
+	test_config odb.magic.scriptCommand "$HELPER" &&
+	test_commit three &&
+	hash3=$(git ls-tree HEAD | grep three.t | cut -f1 | cut -d\  -f3) &&
+	content=$(cd alt-repo && git show "$hash3") &&
+	test "$content" = "three"
+'
+
 test_done
-- 
2.14.1.576.g3f707d88cd



* [PATCH v6 15/40] Add GIT_NO_EXTERNAL_ODB env variable
  2017-09-16  8:06 [PATCH v6 00/40] Add initial experimental external ODB support Christian Couder
                   ` (13 preceding siblings ...)
  2017-09-16  8:07 ` [PATCH v6 14/40] t0400: add test for external odb write support Christian Couder
@ 2017-09-16  8:07 ` Christian Couder
  2017-09-16  8:07 ` [PATCH v6 16/40] Add t0410 to test external ODB transfer Christian Couder
                   ` (25 subsequent siblings)
  40 siblings, 0 replies; 49+ messages in thread
From: Christian Couder @ 2017-09-16  8:07 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Ben Peart, Jonathan Tan,
	Nguyen Thai Ngoc Duy, Mike Hommey, Lars Schneider, Eric Wong,
	Christian Couder

This new environment variable will be used to run Git
commands without involving any external odb mechanism.

This makes it possible for example to create new blobs that
will not be sent to an external odb even if the external odb
supports "put_*" instructions.
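
For example, the odb-helper scripts in the following tests use it
when they create bookkeeping blobs, so that those blobs are written
locally instead of being sent back through the external odb
mechanism:

    echo "$sha1 $size $kind" |
    GIT_NO_EXTERNAL_ODB=1 git hash-object -w -t blob --stdin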

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
 cache.h        | 9 +++++++++
 environment.c  | 4 ++++
 external-odb.c | 6 ++++++
 sha1_file.c    | 3 +++
 4 files changed, 22 insertions(+)

diff --git a/cache.h b/cache.h
index 6c22bd0525..a2bd2090b3 100644
--- a/cache.h
+++ b/cache.h
@@ -429,6 +429,7 @@ static inline enum object_type object_type(unsigned int mode)
 #define CEILING_DIRECTORIES_ENVIRONMENT "GIT_CEILING_DIRECTORIES"
 #define NO_REPLACE_OBJECTS_ENVIRONMENT "GIT_NO_REPLACE_OBJECTS"
 #define GIT_REPLACE_REF_BASE_ENVIRONMENT "GIT_REPLACE_REF_BASE"
+#define NO_EXTERNAL_ODB_ENVIRONMENT "GIT_NO_EXTERNAL_ODB"
 #define GITATTRIBUTES_FILE ".gitattributes"
 #define INFOATTRIBUTES_FILE "info/attributes"
 #define ATTRIBUTE_MACRO_PREFIX "[attr]"
@@ -767,6 +768,14 @@ void reset_shared_repository(void);
 extern int check_replace_refs;
 extern char *git_replace_ref_base;
 
+/*
+ * Do external odbs need to be used this run?  This variable is
+ * initialized to true unless $GIT_NO_EXTERNAL_ODB is set, but it
+ * may be set to false by some commands that do not want external
+ * odbs to be active.
+ */
+extern int use_external_odb;
+
 extern int fsync_object_files;
 extern int core_preload_index;
 extern int core_apply_sparse_checkout;
diff --git a/environment.c b/environment.c
index 3fd4b10845..bbccabef6b 100644
--- a/environment.c
+++ b/environment.c
@@ -48,6 +48,7 @@ const char *excludes_file;
 enum auto_crlf auto_crlf = AUTO_CRLF_FALSE;
 int check_replace_refs = 1;
 char *git_replace_ref_base;
+int use_external_odb = 1;
 enum eol core_eol = EOL_UNSET;
 enum safe_crlf safe_crlf = SAFE_CRLF_WARN;
 unsigned whitespace_rule_cfg = WS_DEFAULT_RULE;
@@ -116,6 +117,7 @@ const char * const local_repo_env[] = {
 	INDEX_ENVIRONMENT,
 	NO_REPLACE_OBJECTS_ENVIRONMENT,
 	GIT_REPLACE_REF_BASE_ENVIRONMENT,
+	NO_EXTERNAL_ODB_ENVIRONMENT,
 	GIT_PREFIX_ENVIRONMENT,
 	GIT_SUPER_PREFIX_ENVIRONMENT,
 	GIT_SHALLOW_FILE_ENVIRONMENT,
@@ -154,6 +156,8 @@ void setup_git_env(void)
 	replace_ref_base = getenv(GIT_REPLACE_REF_BASE_ENVIRONMENT);
 	git_replace_ref_base = xstrdup(replace_ref_base ? replace_ref_base
 							  : "refs/replace/");
+	if (getenv(NO_EXTERNAL_ODB_ENVIRONMENT))
+		use_external_odb = 0;
 	namespace = expand_namespace(getenv(GIT_NAMESPACE_ENVIRONMENT));
 	shallow_file = getenv(GIT_SHALLOW_FILE_ENVIRONMENT);
 	if (shallow_file)
diff --git a/external-odb.c b/external-odb.c
index a4f8c72e1c..52cb448d01 100644
--- a/external-odb.c
+++ b/external-odb.c
@@ -65,6 +65,9 @@ int external_odb_has_object(const unsigned char *sha1)
 {
 	struct odb_helper *o;
 
+	if (!use_external_odb)
+		return 0;
+
 	external_odb_init();
 
 	for (o = helpers; o; o = o->next) {
@@ -124,6 +127,9 @@ int external_odb_put_object(const void *buf, size_t len,
 {
 	struct odb_helper *o;
 
+	if (!use_external_odb)
+		return 1;
+
 	/* For now accept only blobs */
 	if (strcmp(type, "blob"))
 		return 1;
diff --git a/sha1_file.c b/sha1_file.c
index d0155e392f..7b2a0f64fa 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -619,6 +619,9 @@ void prepare_external_alt_odb(void)
 	static int linked_external;
 	const char *path;
 
+	if (!use_external_odb)
+		return;
+
 	if (linked_external)
 		return;
 
-- 
2.14.1.576.g3f707d88cd



* [PATCH v6 16/40] Add t0410 to test external ODB transfer
  2017-09-16  8:06 [PATCH v6 00/40] Add initial experimental external ODB support Christian Couder
                   ` (14 preceding siblings ...)
  2017-09-16  8:07 ` [PATCH v6 15/40] Add GIT_NO_EXTERNAL_ODB env variable Christian Couder
@ 2017-09-16  8:07 ` Christian Couder
  2017-09-16  8:07 ` [PATCH v6 17/40] lib-httpd: pass config file to start_httpd() Christian Couder
                   ` (24 subsequent siblings)
  40 siblings, 0 replies; 49+ messages in thread
From: Christian Couder @ 2017-09-16  8:07 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Ben Peart, Jonathan Tan,
	Nguyen Thai Ngoc Duy, Mike Hommey, Lars Schneider, Eric Wong,
	Christian Couder

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
 t/t0410-transfer-e-odb.sh | 144 ++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 144 insertions(+)
 create mode 100755 t/t0410-transfer-e-odb.sh

diff --git a/t/t0410-transfer-e-odb.sh b/t/t0410-transfer-e-odb.sh
new file mode 100755
index 0000000000..065ec7d759
--- /dev/null
+++ b/t/t0410-transfer-e-odb.sh
@@ -0,0 +1,144 @@
+#!/bin/sh
+
+test_description='basic tests for transferring external ODBs'
+
+. ./test-lib.sh
+
+ORIG_SOURCE="$PWD/.git"
+export ORIG_SOURCE
+
+ALT_SOURCE1="$PWD/alt-repo1/.git"
+export ALT_SOURCE1
+write_script odb-helper1 <<\EOF
+die() {
+	printf >&2 "%s\n" "$@"
+	exit 1
+}
+GIT_DIR=$ALT_SOURCE1; export GIT_DIR
+case "$1" in
+init)
+	echo "capability=get_git_obj"
+	echo "capability=have"
+	;;
+have)
+	git cat-file --batch-check --batch-all-objects |
+	awk '{print $1 " " $3 " " $2}'
+	;;
+get_git_obj)
+	cat "$GIT_DIR"/objects/$(echo $2 | sed 's#..#&/#')
+	;;
+put_raw_obj)
+	sha1="$2"
+	size="$3"
+	kind="$4"
+	written=$(git hash-object -w -t "$kind" --stdin)
+	test "$written" = "$sha1" || die "bad sha1 passed '$sha1' vs written '$written'"
+	ref_hash=$(echo "$sha1 $size $kind" | GIT_DIR=$ORIG_SOURCE GIT_NO_EXTERNAL_ODB=1 git hash-object -w -t blob --stdin) || exit
+	GIT_DIR=$ORIG_SOURCE git update-ref refs/odbs/magic/"$sha1" "$ref_hash"
+	;;
+*)
+	die "unknown command '$1'"
+	;;
+esac
+EOF
+HELPER1="\"$PWD\"/odb-helper1"
+
+OTHER_SOURCE="$PWD/.git"
+export OTHER_SOURCE
+
+ALT_SOURCE2="$PWD/alt-repo2/.git"
+export ALT_SOURCE2
+write_script odb-helper2 <<\EOF
+die() {
+	printf >&2 "%s\n" "$@"
+	exit 1
+}
+GIT_DIR=$ALT_SOURCE2; export GIT_DIR
+case "$1" in
+init)
+	echo "capability=get_git_obj"
+	echo "capability=have"
+	;;
+have)
+	GIT_DIR=$OTHER_SOURCE git for-each-ref --format='%(objectname)' refs/odbs/magic/ | GIT_DIR=$OTHER_SOURCE xargs git show
+	;;
+get_git_obj)
+	OBJ_FILE="$GIT_DIR"/objects/$(echo $2 | sed 's#..#&/#')
+	if ! test -f "$OBJ_FILE"
+	then
+		# "Download" the missing object by copying it from alt-repo1
+		OBJ_DIR=$(echo $2 | sed 's/\(..\).*/\1/')
+		OBJ_BASE=$(basename "$OBJ_FILE")
+		ALT_OBJ_DIR1="$ALT_SOURCE1/objects/$OBJ_DIR"
+		ALT_OBJ_DIR2="$ALT_SOURCE2/objects/$OBJ_DIR"
+		mkdir -p "$ALT_OBJ_DIR2" || die "Could not mkdir '$ALT_OBJ_DIR2'"
+		OBJ_SRC="$ALT_OBJ_DIR1/$OBJ_BASE"
+		cp "$OBJ_SRC" "$ALT_OBJ_DIR2" ||
+		die "Could not cp '$OBJ_SRC' into '$ALT_OBJ_DIR2'"
+	fi
+	cat "$OBJ_FILE" || die "Could not cat '$OBJ_FILE'"
+	;;
+put_raw_obj)
+	sha1="$2"
+	size="$3"
+	kind="$4"
+	written=$(git hash-object -w -t "$kind" --stdin)
+	test "$written" = "$sha1" || die "bad sha1 passed '$sha1' vs written '$written'"
+	ref_hash=$(echo "$sha1 $size $kind" | GIT_DIR=$OTHER_SOURCE GIT_NO_EXTERNAL_ODB=1 git hash-object -w -t blob --stdin) || exit
+	GIT_DIR=$OTHER_SOURCE git update-ref refs/odbs/magic/"$sha1" "$ref_hash"
+	;;
+*)
+	die "unknown command '$1'"
+	;;
+esac
+EOF
+HELPER2="\"$PWD\"/odb-helper2"
+
+test_expect_success 'setup first alternate repo' '
+	git init alt-repo1 &&
+	test_commit zero &&
+	git config odb.magic.scriptCommand "$HELPER1"
+'
+
+test_expect_success 'setup other repo and its alternate repo' '
+	git init other-repo &&
+	git init alt-repo2 &&
+	(cd other-repo &&
+	 git remote add origin .. &&
+	 git pull origin master &&
+	 git checkout master &&
+	 git log)
+'
+
+test_expect_success 'new blobs are put in first object store' '
+	test_commit one &&
+	hash1=$(git ls-tree HEAD | grep one.t | cut -f1 | cut -d\  -f3) &&
+	content=$(cd alt-repo1 && git show "$hash1") &&
+	test "$content" = "one" &&
+	test_commit two &&
+	hash2=$(git ls-tree HEAD | grep two.t | cut -f1 | cut -d\  -f3) &&
+	content=$(cd alt-repo1 && git show "$hash2") &&
+	test "$content" = "two"
+'
+
+test_expect_success 'other repo gets the blobs from object store' '
+	(cd other-repo &&
+	 git fetch origin "refs/odbs/magic/*:refs/odbs/magic/*" &&
+	 test_must_fail git cat-file blob "$hash1" &&
+	 test_must_fail git cat-file blob "$hash2" &&
+	 git config odb.magic.scriptCommand "$HELPER2" &&
+	 git cat-file blob "$hash1" &&
+	 git cat-file blob "$hash2"
+	)
+'
+
+test_expect_success 'other repo gets everything else' '
+	(cd other-repo &&
+	 git fetch origin &&
+	 content=$(git show "$hash1") &&
+	 test "$content" = "one" &&
+	 content=$(git show "$hash2") &&
+	 test "$content" = "two")
+'
+
+test_done
-- 
2.14.1.576.g3f707d88cd



* [PATCH v6 17/40] lib-httpd: pass config file to start_httpd()
  2017-09-16  8:06 [PATCH v6 00/40] Add initial experimental external ODB support Christian Couder
                   ` (15 preceding siblings ...)
  2017-09-16  8:07 ` [PATCH v6 16/40] Add t0410 to test external ODB transfer Christian Couder
@ 2017-09-16  8:07 ` Christian Couder
  2017-09-16  8:07 ` [PATCH v6 18/40] lib-httpd: add upload.sh Christian Couder
                   ` (23 subsequent siblings)
  40 siblings, 0 replies; 49+ messages in thread
From: Christian Couder @ 2017-09-16  8:07 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Ben Peart, Jonathan Tan,
	Nguyen Thai Ngoc Duy, Mike Hommey, Lars Schneider, Eric Wong,
	Christian Couder

This makes it possible to start an apache web server with different
config files.

This will be used in a later patch to pass a config file that makes
apache store external objects.
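
A test that needs an external odb aware apache setup can then do
something like:

    . "$TEST_DIRECTORY"/lib-httpd.sh
    start_httpd apache-e-odb.conf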

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
 t/lib-httpd.sh | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/t/lib-httpd.sh b/t/lib-httpd.sh
index 435a37465a..2e659a8ee2 100644
--- a/t/lib-httpd.sh
+++ b/t/lib-httpd.sh
@@ -171,12 +171,14 @@ prepare_httpd() {
 }
 
 start_httpd() {
+	APACHE_CONF_FILE=${1-apache.conf}
+
 	prepare_httpd >&3 2>&4
 
 	trap 'code=$?; stop_httpd; (exit $code); die' EXIT
 
 	"$LIB_HTTPD_PATH" -d "$HTTPD_ROOT_PATH" \
-		-f "$TEST_PATH/apache.conf" $HTTPD_PARA \
+		-f "$TEST_PATH/$APACHE_CONF_FILE" $HTTPD_PARA \
 		-c "Listen 127.0.0.1:$LIB_HTTPD_PORT" -k start \
 		>&3 2>&4
 	if test $? -ne 0
@@ -191,7 +193,7 @@ stop_httpd() {
 	trap 'die' EXIT
 
 	"$LIB_HTTPD_PATH" -d "$HTTPD_ROOT_PATH" \
-		-f "$TEST_PATH/apache.conf" $HTTPD_PARA -k stop
+		-f "$TEST_PATH/$APACHE_CONF_FILE" $HTTPD_PARA -k stop
 }
 
 test_http_push_nonff () {
-- 
2.14.1.576.g3f707d88cd



* [PATCH v6 18/40] lib-httpd: add upload.sh
  2017-09-16  8:06 [PATCH v6 00/40] Add initial experimental external ODB support Christian Couder
                   ` (16 preceding siblings ...)
  2017-09-16  8:07 ` [PATCH v6 17/40] lib-httpd: pass config file to start_httpd() Christian Couder
@ 2017-09-16  8:07 ` Christian Couder
  2017-09-16  8:07 ` [PATCH v6 19/40] lib-httpd: add list.sh Christian Couder
                   ` (22 subsequent siblings)
  40 siblings, 0 replies; 49+ messages in thread
From: Christian Couder @ 2017-09-16  8:07 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Ben Peart, Jonathan Tan,
	Nguyen Thai Ngoc Duy, Mike Hommey, Lars Schneider, Eric Wong,
	Christian Couder

This cgi will be used to upload objects to, or to delete
objects from, an apache web server.

This way the apache server can work as an external object
database.
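
A client stores or deletes an object by POSTing to this cgi, with
the object's metadata encoded in the query string, for example:

    curl --data-binary @file --include \
        "$HTTPD_URL/upload/?sha1=$sha1&size=$size&type=blob"
    curl --data "delete" --include \
        "$HTTPD_URL/upload/?sha1=$sha1&size=$size&type=blob&delete=1"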

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
 t/lib-httpd.sh        |  1 +
 t/lib-httpd/upload.sh | 45 +++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 46 insertions(+)
 create mode 100644 t/lib-httpd/upload.sh

diff --git a/t/lib-httpd.sh b/t/lib-httpd.sh
index 2e659a8ee2..d80b004549 100644
--- a/t/lib-httpd.sh
+++ b/t/lib-httpd.sh
@@ -132,6 +132,7 @@ prepare_httpd() {
 	cp "$TEST_PATH"/passwd "$HTTPD_ROOT_PATH"
 	install_script broken-smart-http.sh
 	install_script error.sh
+	install_script upload.sh
 
 	ln -s "$LIB_HTTPD_MODULE_PATH" "$HTTPD_ROOT_PATH/modules"
 
diff --git a/t/lib-httpd/upload.sh b/t/lib-httpd/upload.sh
new file mode 100644
index 0000000000..64d3f31c31
--- /dev/null
+++ b/t/lib-httpd/upload.sh
@@ -0,0 +1,45 @@
+#!/bin/sh
+
+# In part from http://codereview.stackexchange.com/questions/79549/bash-cgi-upload-file
+
+FILES_DIR="www/files"
+
+OLDIFS="$IFS"
+IFS='&'
+set -- $QUERY_STRING
+IFS="$OLDIFS"
+
+while test $# -gt 0
+do
+	key=${1%%=*}
+	val=${1#*=}
+
+	case "$key" in
+	"sha1") sha1="$val" ;;
+	"type") type="$val" ;;
+	"size") size="$val" ;;
+	"delete") delete=1 ;;
+	*) echo >&2 "unknown key '$key'" ;;
+	esac
+
+	shift
+done
+
+case "$REQUEST_METHOD" in
+POST)
+	if test "$delete" = "1"
+	then
+		rm -f "$FILES_DIR/$sha1-$size-$type"
+	else
+		mkdir -p "$FILES_DIR"
+		cat >"$FILES_DIR/$sha1-$size-$type"
+	fi
+
+	echo 'Status: 204 No Content'
+	echo
+	;;
+
+*)
+	echo 'Status: 405 Method Not Allowed'
+	echo
+esac
-- 
2.14.1.576.g3f707d88cd



* [PATCH v6 19/40] lib-httpd: add list.sh
  2017-09-16  8:06 [PATCH v6 00/40] Add initial experimental external ODB support Christian Couder
                   ` (17 preceding siblings ...)
  2017-09-16  8:07 ` [PATCH v6 18/40] lib-httpd: add upload.sh Christian Couder
@ 2017-09-16  8:07 ` Christian Couder
  2017-09-16  8:07 ` [PATCH v6 20/40] lib-httpd: add apache-e-odb.conf Christian Couder
                   ` (21 subsequent siblings)
  40 siblings, 0 replies; 49+ messages in thread
From: Christian Couder @ 2017-09-16  8:07 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Ben Peart, Jonathan Tan,
	Nguyen Thai Ngoc Duy, Mike Hommey, Lars Schneider, Eric Wong,
	Christian Couder

This cgi script can list Git objects that have been uploaded as
files to an apache web server. This script can also retrieve
the content of each of these files.

This will help make apache work as an external object database.
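
The cgi is queried with curl, either to list everything that has
been uploaded or to fetch the content of a single object:

    curl "$HTTPD_URL/list/"              # one "<sha1> <size> <type>" per line
    curl "$HTTPD_URL/list/?sha1=$sha1"   # raw content of one object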

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
 t/lib-httpd.sh      |  1 +
 t/lib-httpd/list.sh | 41 +++++++++++++++++++++++++++++++++++++++++
 2 files changed, 42 insertions(+)
 create mode 100644 t/lib-httpd/list.sh

diff --git a/t/lib-httpd.sh b/t/lib-httpd.sh
index d80b004549..f31ea261f5 100644
--- a/t/lib-httpd.sh
+++ b/t/lib-httpd.sh
@@ -133,6 +133,7 @@ prepare_httpd() {
 	install_script broken-smart-http.sh
 	install_script error.sh
 	install_script upload.sh
+	install_script list.sh
 
 	ln -s "$LIB_HTTPD_MODULE_PATH" "$HTTPD_ROOT_PATH/modules"
 
diff --git a/t/lib-httpd/list.sh b/t/lib-httpd/list.sh
new file mode 100644
index 0000000000..b6d6c29a2f
--- /dev/null
+++ b/t/lib-httpd/list.sh
@@ -0,0 +1,41 @@
+#!/bin/sh
+
+FILES_DIR="www/files"
+
+OLDIFS="$IFS"
+IFS='&'
+set -- $QUERY_STRING
+IFS="$OLDIFS"
+
+while test $# -gt 0
+do
+	key=${1%%=*}
+	val=${1#*=}
+
+	case "$key" in
+	"sha1") sha1="$val" ;;
+	*) echo >&2 "unknown key '$key'" ;;
+	esac
+
+	shift
+done
+
+if test -d "$FILES_DIR"
+then
+	if test -z "$sha1"
+	then
+		echo 'Status: 200 OK'
+		echo
+		ls "$FILES_DIR" | tr '-' ' '
+	else
+		if test -f "$FILES_DIR/$sha1"-*
+		then
+			echo 'Status: 200 OK'
+			echo
+			cat "$FILES_DIR/$sha1"-*
+		else
+			echo 'Status: 404 Not Found'
+			echo
+		fi
+	fi
+fi
-- 
2.14.1.576.g3f707d88cd



* [PATCH v6 20/40] lib-httpd: add apache-e-odb.conf
  2017-09-16  8:06 [PATCH v6 00/40] Add initial experimental external ODB support Christian Couder
                   ` (18 preceding siblings ...)
  2017-09-16  8:07 ` [PATCH v6 19/40] lib-httpd: add list.sh Christian Couder
@ 2017-09-16  8:07 ` Christian Couder
  2017-09-16  8:07 ` [PATCH v6 21/40] odb-helper: add odb_helper_get_raw_object() Christian Couder
                   ` (20 subsequent siblings)
  40 siblings, 0 replies; 49+ messages in thread
From: Christian Couder @ 2017-09-16  8:07 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Ben Peart, Jonathan Tan,
	Nguyen Thai Ngoc Duy, Mike Hommey, Lars Schneider, Eric Wong,
	Christian Couder

This is an apache config file to test external object databases.
It uses the upload.sh and list.sh cgi scripts that were added
previously to make apache store external objects.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
 t/lib-httpd/apache-e-odb.conf | 214 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 214 insertions(+)
 create mode 100644 t/lib-httpd/apache-e-odb.conf

diff --git a/t/lib-httpd/apache-e-odb.conf b/t/lib-httpd/apache-e-odb.conf
new file mode 100644
index 0000000000..19a1540c82
--- /dev/null
+++ b/t/lib-httpd/apache-e-odb.conf
@@ -0,0 +1,214 @@
+ServerName dummy
+PidFile httpd.pid
+DocumentRoot www
+LogFormat "%h %l %u %t \"%r\" %>s %b" common
+CustomLog access.log common
+ErrorLog error.log
+<IfModule !mod_log_config.c>
+	LoadModule log_config_module modules/mod_log_config.so
+</IfModule>
+<IfModule !mod_alias.c>
+	LoadModule alias_module modules/mod_alias.so
+</IfModule>
+<IfModule !mod_cgi.c>
+	LoadModule cgi_module modules/mod_cgi.so
+</IfModule>
+<IfModule !mod_env.c>
+	LoadModule env_module modules/mod_env.so
+</IfModule>
+<IfModule !mod_rewrite.c>
+	LoadModule rewrite_module modules/mod_rewrite.so
+</IFModule>
+<IfModule !mod_version.c>
+	LoadModule version_module modules/mod_version.so
+</IfModule>
+<IfModule !mod_headers.c>
+	LoadModule headers_module modules/mod_headers.so
+</IfModule>
+
+<IfVersion < 2.4>
+LockFile accept.lock
+</IfVersion>
+
+<IfVersion < 2.1>
+<IfModule !mod_auth.c>
+	LoadModule auth_module modules/mod_auth.so
+</IfModule>
+</IfVersion>
+
+<IfVersion >= 2.1>
+<IfModule !mod_auth_basic.c>
+	LoadModule auth_basic_module modules/mod_auth_basic.so
+</IfModule>
+<IfModule !mod_authn_file.c>
+	LoadModule authn_file_module modules/mod_authn_file.so
+</IfModule>
+<IfModule !mod_authz_user.c>
+	LoadModule authz_user_module modules/mod_authz_user.so
+</IfModule>
+<IfModule !mod_authz_host.c>
+	LoadModule authz_host_module modules/mod_authz_host.so
+</IfModule>
+</IfVersion>
+
+<IfVersion >= 2.4>
+<IfModule !mod_authn_core.c>
+	LoadModule authn_core_module modules/mod_authn_core.so
+</IfModule>
+<IfModule !mod_authz_core.c>
+	LoadModule authz_core_module modules/mod_authz_core.so
+</IfModule>
+<IfModule !mod_access_compat.c>
+	LoadModule access_compat_module modules/mod_access_compat.so
+</IfModule>
+<IfModule !mod_mpm_prefork.c>
+	LoadModule mpm_prefork_module modules/mod_mpm_prefork.so
+</IfModule>
+<IfModule !mod_unixd.c>
+	LoadModule unixd_module modules/mod_unixd.so
+</IfModule>
+</IfVersion>
+
+PassEnv GIT_VALGRIND
+PassEnv GIT_VALGRIND_OPTIONS
+PassEnv GNUPGHOME
+PassEnv ASAN_OPTIONS
+PassEnv GIT_TRACE
+PassEnv GIT_CONFIG_NOSYSTEM
+
+Alias /dumb/ www/
+Alias /auth/dumb/ www/auth/dumb/
+
+<LocationMatch /smart/>
+	SetEnv GIT_EXEC_PATH ${GIT_EXEC_PATH}
+	SetEnv GIT_HTTP_EXPORT_ALL
+</LocationMatch>
+<LocationMatch /smart_noexport/>
+	SetEnv GIT_EXEC_PATH ${GIT_EXEC_PATH}
+</LocationMatch>
+<LocationMatch /smart_custom_env/>
+	SetEnv GIT_EXEC_PATH ${GIT_EXEC_PATH}
+	SetEnv GIT_HTTP_EXPORT_ALL
+	SetEnv GIT_COMMITTER_NAME "Custom User"
+	SetEnv GIT_COMMITTER_EMAIL custom@example.com
+</LocationMatch>
+<LocationMatch /smart_namespace/>
+	SetEnv GIT_EXEC_PATH ${GIT_EXEC_PATH}
+	SetEnv GIT_HTTP_EXPORT_ALL
+	SetEnv GIT_NAMESPACE ns
+</LocationMatch>
+<LocationMatch /smart_cookies/>
+	SetEnv GIT_EXEC_PATH ${GIT_EXEC_PATH}
+	SetEnv GIT_HTTP_EXPORT_ALL
+	Header set Set-Cookie name=value
+</LocationMatch>
+<LocationMatch /smart_headers/>
+	SetEnv GIT_EXEC_PATH ${GIT_EXEC_PATH}
+	SetEnv GIT_HTTP_EXPORT_ALL
+</LocationMatch>
+ScriptAlias /upload/ upload.sh/
+ScriptAlias /list/ list.sh/
+<Directory ${GIT_EXEC_PATH}>
+	Options FollowSymlinks
+</Directory>
+<Files upload.sh>
+  Options ExecCGI
+</Files>
+<Files list.sh>
+  Options ExecCGI
+</Files>
+<Files ${GIT_EXEC_PATH}/git-http-backend>
+	Options ExecCGI
+</Files>
+
+RewriteEngine on
+RewriteRule ^/smart-redir-perm/(.*)$ /smart/$1 [R=301]
+RewriteRule ^/smart-redir-temp/(.*)$ /smart/$1 [R=302]
+RewriteRule ^/smart-redir-auth/(.*)$ /auth/smart/$1 [R=301]
+RewriteRule ^/smart-redir-limited/(.*)/info/refs$ /smart/$1/info/refs [R=301]
+RewriteRule ^/ftp-redir/(.*)$ ftp://localhost:1000/$1 [R=302]
+
+RewriteRule ^/loop-redir/x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-(.*) /$1 [R=302]
+RewriteRule ^/loop-redir/(.*)$ /loop-redir/x-$1 [R=302]
+
+# Apache 2.2 does not understand <RequireAll>, so we use RewriteCond.
+# And as RewriteCond does not allow testing for non-matches, we match
+# the desired case first (one has abra, two has cadabra), and let it
+# pass by marking the RewriteRule as [L], "last rule, do not process
+# any other matching RewriteRules after this"), and then have another
+# RewriteRule that matches all other cases and lets them fail via '[F]',
+# "fail the request".
+RewriteCond %{HTTP:x-magic-one} =abra
+RewriteCond %{HTTP:x-magic-two} =cadabra
+RewriteRule ^/smart_headers/.* - [L]
+RewriteRule ^/smart_headers/.* - [F]
+
+<IfDefine SSL>
+LoadModule ssl_module modules/mod_ssl.so
+
+SSLCertificateFile httpd.pem
+SSLCertificateKeyFile httpd.pem
+SSLRandomSeed startup file:/dev/urandom 512
+SSLRandomSeed connect file:/dev/urandom 512
+SSLSessionCache none
+SSLMutex file:ssl_mutex
+SSLEngine On
+</IfDefine>
+
+<Location /auth/>
+	AuthType Basic
+	AuthName "git-auth"
+	AuthUserFile passwd
+	Require valid-user
+</Location>
+
+<LocationMatch "^/auth-push/.*/git-receive-pack$">
+	AuthType Basic
+	AuthName "git-auth"
+	AuthUserFile passwd
+	Require valid-user
+</LocationMatch>
+
+<LocationMatch "^/auth-fetch/.*/git-upload-pack$">
+	AuthType Basic
+	AuthName "git-auth"
+	AuthUserFile passwd
+	Require valid-user
+</LocationMatch>
+
+RewriteCond %{QUERY_STRING} service=git-receive-pack [OR]
+RewriteCond %{REQUEST_URI} /git-receive-pack$
+RewriteRule ^/half-auth-complete/ - [E=AUTHREQUIRED:yes]
+
+<Location /half-auth-complete/>
+  Order Deny,Allow
+  Deny from env=AUTHREQUIRED
+
+  AuthType Basic
+  AuthName "Git Access"
+  AuthUserFile passwd
+  Require valid-user
+  Satisfy Any
+</Location>
+
+<IfDefine DAV>
+	LoadModule dav_module modules/mod_dav.so
+	LoadModule dav_fs_module modules/mod_dav_fs.so
+
+	DAVLockDB DAVLock
+	<Location /dumb/>
+		Dav on
+	</Location>
+	<Location /auth/dumb>
+		Dav on
+	</Location>
+</IfDefine>
+
+<IfDefine SVN>
+	LoadModule dav_svn_module modules/mod_dav_svn.so
+
+	<Location /${LIB_HTTPD_SVN}>
+		DAV svn
+		SVNPath "${LIB_HTTPD_SVNPATH}"
+	</Location>
+</IfDefine>
-- 
2.14.1.576.g3f707d88cd



* [PATCH v6 21/40] odb-helper: add odb_helper_get_raw_object()
  2017-09-16  8:06 [PATCH v6 00/40] Add initial experimental external ODB support Christian Couder
                   ` (19 preceding siblings ...)
  2017-09-16  8:07 ` [PATCH v6 20/40] lib-httpd: add apache-e-odb.conf Christian Couder
@ 2017-09-16  8:07 ` Christian Couder
  2017-09-16  8:07 ` [PATCH v6 22/40] pack-objects: don't pack objects in external odbs Christian Couder
                   ` (19 subsequent siblings)
  40 siblings, 0 replies; 49+ messages in thread
From: Christian Couder @ 2017-09-16  8:07 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Ben Peart, Jonathan Tan,
	Nguyen Thai Ngoc Duy, Mike Hommey, Lars Schneider, Eric Wong,
	Christian Couder

The existing odb_helper_get_object() is renamed
odb_helper_get_git_object() and a new odb_helper_get_raw_object()
is introduced to deal with external objects that are not in Git format.
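
With 'get_raw_obj' the helper only has to write the object's raw
content to its standard output; Git itself recreates the
"<type> <size>" header, the zlib compression and the sha1 check.
A raw helper case can thus be as simple as this sketch (reusing
the list cgi from the previous patches):

    get_raw_obj)
        curl "$HTTPD_URL/list/?sha1=$2" ||
        die "curl failed for object '$2'"
        ;;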

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
 odb-helper.c | 113 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 111 insertions(+), 2 deletions(-)

diff --git a/odb-helper.c b/odb-helper.c
index 39d20fdfd7..1f4666b349 100644
--- a/odb-helper.c
+++ b/odb-helper.c
@@ -221,8 +221,107 @@ int odb_helper_has_object(struct odb_helper *o, const unsigned char *sha1)
 	return !!odb_helper_lookup(o, sha1);
 }
 
-int odb_helper_get_object(struct odb_helper *o, const unsigned char *sha1,
-			    int fd)
+static int odb_helper_get_raw_object(struct odb_helper *o,
+				     const unsigned char *sha1,
+				     int fd)
+{
+	struct odb_helper_object *obj;
+	struct odb_helper_cmd cmd;
+	unsigned long total_got = 0;
+
+	char hdr[32];
+	int hdrlen;
+
+	int ret = Z_STREAM_END;
+	unsigned char compressed[4096];
+	git_zstream stream;
+	git_SHA_CTX hash;
+	unsigned char real_sha1[20];
+
+	obj = odb_helper_lookup(o, sha1);
+	if (!obj)
+		return -1;
+
+	if (odb_helper_start(o, &cmd, 0, "get_raw_obj %s", sha1_to_hex(sha1)) < 0)
+		return -1;
+
+	/* Set it up */
+	git_deflate_init(&stream, zlib_compression_level);
+	stream.next_out = compressed;
+	stream.avail_out = sizeof(compressed);
+	git_SHA1_Init(&hash);
+
+	/* First header.. */
+	hdrlen = xsnprintf(hdr, sizeof(hdr), "%s %lu", typename(obj->type), obj->size) + 1;
+	stream.next_in = (unsigned char *)hdr;
+	stream.avail_in = hdrlen;
+	while (git_deflate(&stream, 0) == Z_OK)
+		; /* nothing */
+	git_SHA1_Update(&hash, hdr, hdrlen);
+
+	for (;;) {
+		unsigned char buf[4096];
+		int r;
+
+		r = xread(cmd.child.out, buf, sizeof(buf));
+		if (r < 0) {
+			error("unable to read from odb helper '%s': %s",
+			      o->name, strerror(errno));
+			close(cmd.child.out);
+			odb_helper_finish(o, &cmd);
+			git_deflate_end(&stream);
+			return -1;
+		}
+		if (r == 0)
+			break;
+
+		total_got += r;
+
+		/* Then the data itself.. */
+		stream.next_in = (void *)buf;
+		stream.avail_in = r;
+		do {
+			unsigned char *in0 = stream.next_in;
+			ret = git_deflate(&stream, Z_FINISH);
+			git_SHA1_Update(&hash, in0, stream.next_in - in0);
+			write_or_die(fd, compressed, stream.next_out - compressed);
+			stream.next_out = compressed;
+			stream.avail_out = sizeof(compressed);
+		} while (ret == Z_OK);
+	}
+
+	close(cmd.child.out);
+	if (ret != Z_STREAM_END) {
+		warning("bad zlib data from odb helper '%s' for %s",
+			o->name, sha1_to_hex(sha1));
+		return -1;
+	}
+	ret = git_deflate_end_gently(&stream);
+	if (ret != Z_OK) {
+		warning("deflateEnd on object %s from odb helper '%s' failed (%d)",
+			sha1_to_hex(sha1), o->name, ret);
+		return -1;
+	}
+	git_SHA1_Final(real_sha1, &hash);
+	if (hashcmp(sha1, real_sha1)) {
+		warning("sha1 mismatch from odb helper '%s' for %s (got %s)",
+			o->name, sha1_to_hex(sha1), sha1_to_hex(real_sha1));
+		return -1;
+	}
+	if (odb_helper_finish(o, &cmd))
+		return -1;
+	if (total_got != obj->size) {
+		warning("size mismatch from odb helper '%s' for %s (%lu != %lu)",
+			o->name, sha1_to_hex(sha1), total_got, obj->size);
+		return -1;
+	}
+
+	return 0;
+}
+
+static int odb_helper_get_git_object(struct odb_helper *o,
+				     const unsigned char *sha1,
+				     int fd)
 {
 	struct odb_helper_object *obj;
 	struct odb_helper_cmd cmd;
@@ -326,6 +425,16 @@ int odb_helper_get_object(struct odb_helper *o, const unsigned char *sha1,
 	return 0;
 }
 
+int odb_helper_get_object(struct odb_helper *o,
+			  const unsigned char *sha1,
+			  int fd)
+{
+	if (o->supported_capabilities & ODB_HELPER_CAP_GET_RAW_OBJ)
+		return odb_helper_get_raw_object(o, sha1, fd);
+	else
+		return odb_helper_get_git_object(o, sha1, fd);
+}
+
 int odb_helper_put_object(struct odb_helper *o,
 			  const void *buf, size_t len,
 			  const char *type, unsigned char *sha1)
-- 
2.14.1.576.g3f707d88cd



* [PATCH v6 22/40] pack-objects: don't pack objects in external odbs
  2017-09-16  8:06 [PATCH v6 00/40] Add initial experimental external ODB support Christian Couder
                   ` (20 preceding siblings ...)
  2017-09-16  8:07 ` [PATCH v6 21/40] odb-helper: add odb_helper_get_raw_object() Christian Couder
@ 2017-09-16  8:07 ` Christian Couder
  2017-09-16  8:07 ` [PATCH v6 23/40] Add t0420 to test transfer to HTTP external odb Christian Couder
                   ` (18 subsequent siblings)
  40 siblings, 0 replies; 49+ messages in thread
From: Christian Couder @ 2017-09-16  8:07 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Ben Peart, Jonathan Tan,
	Nguyen Thai Ngoc Duy, Mike Hommey, Lars Schneider, Eric Wong,
	Christian Couder

Objects managed by an external ODB should not be put into
pack files. They should be transferred using other mechanisms
that can be specific to the external odb.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
 builtin/pack-objects.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index a57b4f058d..db5e225d5a 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -26,6 +26,7 @@
 #include "argv-array.h"
 #include "mru.h"
 #include "packfile.h"
+#include "external-odb.h"
 
 static const char *pack_usage[] = {
 	N_("git pack-objects --stdout [<options>...] [< <ref-list> | < <object-list>]"),
@@ -1012,6 +1013,9 @@ static int want_object_in_pack(const unsigned char *sha1,
 			return want;
 	}
 
+	if (external_odb_has_object(sha1))
+		return 0;
+
 	for (entry = packed_git_mru->head; entry; entry = entry->next) {
 		struct packed_git *p = entry->item;
 		off_t offset;
-- 
2.14.1.576.g3f707d88cd



* [PATCH v6 23/40] Add t0420 to test transfer to HTTP external odb
  2017-09-16  8:06 [PATCH v6 00/40] Add initial experimental external ODB support Christian Couder
                   ` (21 preceding siblings ...)
  2017-09-16  8:07 ` [PATCH v6 22/40] pack-objects: don't pack objects in external odbs Christian Couder
@ 2017-09-16  8:07 ` Christian Couder
  2017-09-16  8:07 ` [PATCH v6 24/40] external-odb: add 'get_direct' support Christian Couder
                   ` (17 subsequent siblings)
  40 siblings, 0 replies; 49+ messages in thread
From: Christian Couder @ 2017-09-16  8:07 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Ben Peart, Jonathan Tan,
	Nguyen Thai Ngoc Duy, Mike Hommey, Lars Schneider, Eric Wong,
	Christian Couder

This tests that an apache web server can be used as an
external object database that stores files in their native
format instead of converting them to Git objects.
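
On the server side the objects simply end up as files named
"<sha1>-<size>-<type>" under the files directory used by the cgi
scripts, so the test can check the result with something like:

    ls httpd/www/files    # e.g. <sha1>-4-blob for the blob "one"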

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
 t/t0420-transfer-http-e-odb.sh | 142 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 142 insertions(+)
 create mode 100755 t/t0420-transfer-http-e-odb.sh

diff --git a/t/t0420-transfer-http-e-odb.sh b/t/t0420-transfer-http-e-odb.sh
new file mode 100755
index 0000000000..f84fe950ec
--- /dev/null
+++ b/t/t0420-transfer-http-e-odb.sh
@@ -0,0 +1,142 @@
+#!/bin/sh
+
+test_description='tests for transferring external objects to an HTTPD server'
+
+. ./test-lib.sh
+
+# If we don't specify a port, the current test number will be used
+# which will not work as it is less than 1024, so it can only be used by root.
+LIB_HTTPD_PORT=$(expr ${this_test#t} + 12000)
+
+. "$TEST_DIRECTORY"/lib-httpd.sh
+
+start_httpd apache-e-odb.conf
+
+# odb helper script must see this
+export HTTPD_URL
+
+write_script odb-http-helper <<\EOF
+die() {
+	printf >&2 "%s\n" "$@"
+	exit 1
+}
+echo >&2 "odb-http-helper args:" "$@"
+case "$1" in
+init)
+	echo "capability=get_raw_obj"
+	echo "capability=put_raw_obj"
+	echo "capability=have"
+	;;
+have)
+	list_url="$HTTPD_URL/list/"
+	curl "$list_url" ||
+	die "curl '$list_url' failed"
+	;;
+get_raw_obj)
+	get_url="$HTTPD_URL/list/?sha1=$2"
+	curl "$get_url" ||
+	die "curl '$get_url' failed"
+	;;
+put_raw_obj)
+	sha1="$2"
+	size="$3"
+	kind="$4"
+	upload_url="$HTTPD_URL/upload/?sha1=$sha1&size=$size&type=$kind"
+	curl --data-binary @- --include "$upload_url" >out ||
+	die "curl '$upload_url' failed"
+	ref_hash=$(echo "$sha1 $size $kind" | GIT_NO_EXTERNAL_ODB=1 git hash-object -w -t blob --stdin) || exit
+	git update-ref refs/odbs/magic/"$sha1" "$ref_hash"
+	;;
+*)
+	die "unknown command '$1'"
+	;;
+esac
+EOF
+HELPER="\"$PWD\"/odb-http-helper"
+
+test_expect_success 'setup repo with a root commit and the helper' '
+	test_commit zero &&
+	git config odb.magic.scriptCommand "$HELPER"
+'
+
+test_expect_success 'setup another repo from the first one' '
+	git init other-repo &&
+	(cd other-repo &&
+	 git remote add origin .. &&
+	 git pull origin master &&
+	 git checkout master &&
+	 git log)
+'
+
+UPLOADFILENAME="hello_apache_upload.txt"
+
+UPLOAD_URL="$HTTPD_URL/upload/?sha1=$UPLOADFILENAME&size=123&type=blob"
+
+test_expect_success 'can upload a file' '
+	echo "Hello Apache World!" >hello_to_send.txt &&
+	echo "How are you?" >>hello_to_send.txt &&
+	curl --data-binary @hello_to_send.txt --include "$UPLOAD_URL" >out_upload
+'
+
+LIST_URL="$HTTPD_URL/list/"
+
+test_expect_success 'can list uploaded files' '
+	curl --include "$LIST_URL" >out_list &&
+	grep "$UPLOADFILENAME" out_list
+'
+
+test_expect_success 'can delete uploaded files' '
+	curl --data "delete" --include "$UPLOAD_URL&delete=1" >out_delete &&
+	curl --include "$LIST_URL" >out_list2 &&
+	! grep "$UPLOADFILENAME" out_list2
+'
+
+FILES_DIR="httpd/www/files"
+
+test_expect_success 'new blobs are transfered to the http server' '
+	test_commit one &&
+	hash1=$(git ls-tree HEAD | grep one.t | cut -f1 | cut -d\  -f3) &&
+	echo "$hash1-4-blob" >expected &&
+	ls "$FILES_DIR" >actual &&
+	test_cmp expected actual
+'
+
+test_expect_success 'blobs can be retrieved from the http server' '
+	git cat-file blob "$hash1" &&
+	git log -p >expected
+'
+
+test_expect_success 'update other repo from the first one' '
+	(cd other-repo &&
+	 git fetch origin "refs/odbs/magic/*:refs/odbs/magic/*" &&
+	 test_must_fail git cat-file blob "$hash1" &&
+	 git config odb.magic.scriptCommand "$HELPER" &&
+	 git cat-file blob "$hash1" &&
+	 git pull origin master)
+'
+
+test_expect_success 'local clone from the first repo' '
+	mkdir my-clone &&
+	(cd my-clone &&
+	 git clone .. . &&
+	 git cat-file blob "$hash1")
+'
+
+test_expect_success 'no-local clone from the first repo fails' '
+	mkdir my-other-clone &&
+	(cd my-other-clone &&
+	 test_must_fail git clone --no-local .. .) &&
+	rm -rf my-other-clone
+'
+
+test_expect_success 'no-local clone from the first repo with helper succeeds' '
+	mkdir my-other-clone &&
+	(cd my-other-clone &&
+	 git clone -c odb.magic.scriptCommand="$HELPER" \
+		--no-local .. .) &&
+	rm -rf my-other-clone
+'
+
+stop_httpd
+
+test_done
-- 
2.14.1.576.g3f707d88cd



* [PATCH v6 24/40] external-odb: add 'get_direct' support
  2017-09-16  8:06 [PATCH v6 00/40] Add initial experimental external ODB support Christian Couder
                   ` (22 preceding siblings ...)
  2017-09-16  8:07 ` [PATCH v6 23/40] Add t0420 to test transfer to HTTP external odb Christian Couder
@ 2017-09-16  8:07 ` Christian Couder
  2017-09-16  8:07 ` [PATCH v6 25/40] odb-helper: add 'script_mode' to 'struct odb_helper' Christian Couder
                   ` (16 subsequent siblings)
  40 siblings, 0 replies; 49+ messages in thread
From: Christian Couder @ 2017-09-16  8:07 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Ben Peart, Jonathan Tan,
	Nguyen Thai Ngoc Duy, Mike Hommey, Lars Schneider, Eric Wong,
	Christian Couder

This implements the 'get_direct' capability/instruction that makes
it possible for external odb helper scripts to pass blobs to Git
by writing them directly as loose object files.

It is better to call this a "direct" mode rather than a "fault-in"
mode, because the same kind of mechanism could be used to "put"
objects into an external odb: the helper would read the blobs it
wants to send directly from files, and it would be strange to call
that a fault-in mode too.
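
For illustration, the core of a script mode helper supporting only
this instruction could look roughly like the sketch below; capability
advertisement during "init" is omitted, and "fetch_from_external_odb"
is only a placeholder for whatever actually retrieves the blob
content from the external odb:

    case "$1" in
    get_direct)
            sha1="$2"
            # Retrieve the blob content from the external odb and
            # write it directly into the local object database as a
            # loose object, bypassing the external odb machinery.
            fetch_from_external_odb "$sha1" |
            GIT_NO_EXTERNAL_ODB=1 git hash-object -w -t blob --stdin >/dev/null ||
            exit 1
            ;;
    *)
            echo >&2 "unknown instruction '$1'"
            exit 1
            ;;
    esac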

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
 external-odb.c | 21 ++++++++++++++++++++-
 external-odb.h |  1 +
 odb-helper.c   | 27 +++++++++++++++++++++++++--
 odb-helper.h   |  2 ++
 4 files changed, 48 insertions(+), 3 deletions(-)

diff --git a/external-odb.c b/external-odb.c
index 52cb448d01..31d21bfe04 100644
--- a/external-odb.c
+++ b/external-odb.c
@@ -96,7 +96,8 @@ int external_odb_get_object(const unsigned char *sha1)
 		int ret;
 		int fd;
 
-		if (!odb_helper_has_object(o, sha1))
+		if (!(o->supported_capabilities & ODB_HELPER_CAP_GET_RAW_OBJ) &&
+		    !(o->supported_capabilities & ODB_HELPER_CAP_GET_GIT_OBJ))
 			continue;
 
 		fd = create_object_tmpfile(&tmpfile, path);
@@ -122,6 +123,24 @@ int external_odb_get_object(const unsigned char *sha1)
 	return -1;
 }
 
+int external_odb_get_direct(const unsigned char *sha1)
+{
+	struct odb_helper *o;
+
+	if (!external_odb_has_object(sha1))
+		return -1;
+
+	for (o = helpers; o; o = o->next) {
+		if (!(o->supported_capabilities & ODB_HELPER_CAP_GET_DIRECT))
+			continue;
+		if (odb_helper_get_direct(o, sha1) < 0)
+			continue;
+		return 0;
+	}
+
+	return -1;
+}
+
 int external_odb_put_object(const void *buf, size_t len,
 			    const char *type, unsigned char *sha1)
 {
diff --git a/external-odb.h b/external-odb.h
index d369dfdf6f..1fda08c0fb 100644
--- a/external-odb.h
+++ b/external-odb.h
@@ -4,6 +4,7 @@
 extern const char *external_odb_root(void);
 extern int external_odb_has_object(const unsigned char *sha1);
 extern int external_odb_get_object(const unsigned char *sha1);
+extern int external_odb_get_direct(const unsigned char *sha1);
 extern int external_odb_put_object(const void *buf, size_t len,
 				   const char *type, unsigned char *sha1);
 
diff --git a/odb-helper.c b/odb-helper.c
index 1f4666b349..3d940a3171 100644
--- a/odb-helper.c
+++ b/odb-helper.c
@@ -425,14 +425,37 @@ static int odb_helper_get_git_object(struct odb_helper *o,
 	return 0;
 }
 
+int odb_helper_get_direct(struct odb_helper *o,
+			  const unsigned char *sha1)
+{
+	struct odb_helper_object *obj;
+	struct odb_helper_cmd cmd;
+
+	obj = odb_helper_lookup(o, sha1);
+	if (!obj)
+		return -1;
+
+	if (odb_helper_start(o, &cmd, 0, "get_direct %s", sha1_to_hex(sha1)) < 0)
+		return -1;
+
+	if (odb_helper_finish(o, &cmd))
+		return -1;
+
+	return 0;
+}
+
 int odb_helper_get_object(struct odb_helper *o,
 			  const unsigned char *sha1,
 			  int fd)
 {
+	if (o->supported_capabilities & ODB_HELPER_CAP_GET_GIT_OBJ)
+		return odb_helper_get_git_object(o, sha1, fd);
 	if (o->supported_capabilities & ODB_HELPER_CAP_GET_RAW_OBJ)
 		return odb_helper_get_raw_object(o, sha1, fd);
-	else
-		return odb_helper_get_git_object(o, sha1, fd);
+	if (o->supported_capabilities & ODB_HELPER_CAP_GET_DIRECT)
+		return 0;
+
+	BUG("invalid get capability (capabilities: '%d')", o->supported_capabilities);
 }
 
 int odb_helper_put_object(struct odb_helper *o,
diff --git a/odb-helper.h b/odb-helper.h
index 0571ba09cb..fbb6d333ee 100644
--- a/odb-helper.h
+++ b/odb-helper.h
@@ -35,6 +35,8 @@ extern int odb_helper_has_object(struct odb_helper *o,
 extern int odb_helper_get_object(struct odb_helper *o,
 				 const unsigned char *sha1,
 				 int fd);
+extern int odb_helper_get_direct(struct odb_helper *o,
+				 const unsigned char *sha1);
 extern int odb_helper_put_object(struct odb_helper *o,
 				 const void *buf, size_t len,
 				 const char *type, unsigned char *sha1);
-- 
2.14.1.576.g3f707d88cd


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH v6 25/40] odb-helper: add 'script_mode' to 'struct odb_helper'
  2017-09-16  8:06 [PATCH v6 00/40] Add initial experimental external ODB support Christian Couder
                   ` (23 preceding siblings ...)
  2017-09-16  8:07 ` [PATCH v6 24/40] external-odb: add 'get_direct' support Christian Couder
@ 2017-09-16  8:07 ` Christian Couder
  2017-09-16  8:07 ` [PATCH v6 26/40] odb-helper: add init_object_process() Christian Couder
                   ` (15 subsequent siblings)
  40 siblings, 0 replies; 49+ messages in thread
From: Christian Couder @ 2017-09-16  8:07 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Ben Peart, Jonathan Tan,
	Nguyen Thai Ngoc Duy, Mike Hommey, Lars Schneider, Eric Wong,
	Christian Couder

This prepares for having a long-running odb helper sub-process
handle the communication between Git and an external odb.

We introduce "odb.<name>.subprocesscommand" to make it possible
to define such a sub-process, and we mark the odb helpers
configured this way with the new 'script_mode' field set to 0.

Helpers defined using the existing "odb.<name>.scriptcommand"
are marked with the 'script_mode' field set to 1.

Implementation of the different capabilities/instructions in
the new (sub-)process mode is left for the following commits.
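
For example, a helper named "magic" could be configured in either
mode like this (the helper paths below are only placeholders):

    # script mode: a new helper is launched for every instruction
    git config odb.magic.scriptCommand "/path/to/magic-helper"

    # process mode: a single long-running sub-process is launched
    # and spoken to over packet lines
    git config odb.magic.subprocessCommand "/path/to/magic-helper"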

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
 external-odb.c |  8 +++++++-
 odb-helper.c   | 19 ++++++++++++++-----
 odb-helper.h   |  1 +
 3 files changed, 22 insertions(+), 6 deletions(-)

diff --git a/external-odb.c b/external-odb.c
index 31d21bfe04..ccca67eff5 100644
--- a/external-odb.c
+++ b/external-odb.c
@@ -32,8 +32,14 @@ static int external_odb_config(const char *var, const char *value, void *data)
 
 	o = find_or_create_helper(name, namelen);
 
-	if (!strcmp(subkey, "scriptcommand"))
+	if (!strcmp(subkey, "scriptcommand")) {
+		o->script_mode = 1;
 		return git_config_string(&o->cmd, var, value);
+	}
+	if (!strcmp(subkey, "subprocesscommand")) {
+		o->script_mode = 0;
+		return git_config_string(&o->cmd, var, value);
+	}
 
 	return 0;
 }
diff --git a/odb-helper.c b/odb-helper.c
index 3d940a3171..22728200e3 100644
--- a/odb-helper.c
+++ b/odb-helper.c
@@ -123,6 +123,9 @@ int odb_helper_init(struct odb_helper *o)
 	FILE *fh;
 	struct strbuf line = STRBUF_INIT;
 
+	if (!o->script_mode)
+		return 0;
+
 	if (odb_helper_start(o, &cmd, 0, "init") < 0)
 		return -1;
 
@@ -173,16 +176,12 @@ static int odb_helper_object_cmp(const void *va, const void *vb)
 	return hashcmp(a->sha1, b->sha1);
 }
 
-static void odb_helper_load_have(struct odb_helper *o)
+static void have_object_script(struct odb_helper *o)
 {
 	struct odb_helper_cmd cmd;
 	FILE *fh;
 	struct strbuf line = STRBUF_INIT;
 
-	if (o->have_valid)
-		return;
-	o->have_valid = 1;
-
 	if (odb_helper_start(o, &cmd, 0, "have") < 0)
 		return;
 
@@ -194,6 +193,16 @@ static void odb_helper_load_have(struct odb_helper *o)
 	strbuf_release(&line);
 	fclose(fh);
 	odb_helper_finish(o, &cmd);
+}
+
+static void odb_helper_load_have(struct odb_helper *o)
+{
+	if (o->have_valid)
+		return;
+	o->have_valid = 1;
+
+	if (o->script_mode)
+		have_object_script(o);
 
 	qsort(o->have, o->have_nr, sizeof(*o->have), odb_helper_object_cmp);
 }
diff --git a/odb-helper.h b/odb-helper.h
index fbb6d333ee..ee2b09b182 100644
--- a/odb-helper.h
+++ b/odb-helper.h
@@ -15,6 +15,7 @@ struct odb_helper {
 	const char *name;
 	const char *cmd;
 	unsigned int supported_capabilities;
+	int script_mode;
 
 	struct odb_helper_object {
 		unsigned char sha1[20];
-- 
2.14.1.576.g3f707d88cd


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH v6 26/40] odb-helper: add init_object_process()
  2017-09-16  8:06 [PATCH v6 00/40] Add initial experimental external ODB support Christian Couder
                   ` (24 preceding siblings ...)
  2017-09-16  8:07 ` [PATCH v6 25/40] odb-helper: add 'script_mode' to 'struct odb_helper' Christian Couder
@ 2017-09-16  8:07 ` Christian Couder
  2017-09-16  8:07 ` [PATCH v6 27/40] Add t0450 to test 'get_direct' mechanism Christian Couder
                   ` (14 subsequent siblings)
  40 siblings, 0 replies; 49+ messages in thread
From: Christian Couder @ 2017-09-16  8:07 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Ben Peart, Jonathan Tan,
	Nguyen Thai Ngoc Duy, Mike Hommey, Lars Schneider, Eric Wong,
	Christian Couder, Ben Peart

From: Ben Peart <benpeart@microsoft.com>

This adds the infrastructure to launch and use long-running
sub-processes as external odb helpers.

For now, only the 'init' and 'get_direct' capabilities are
supported with sub-processes.
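
From the helper's point of view, after the "git-read-object"
handshake, the 'init' exchange roughly looks like the following,
where each line stands for one packet line and "<flush>" for a
flush packet:

    # written by Git on the helper's stdin
    command=init
    <flush>
    # written by the helper on its stdout
    status=success
    <flush>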

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
 external-odb.c |  52 ++++---
 odb-helper.c   | 469 +++++++++++++++++++++++++++++++++++++++++++++++++++++++--
 sha1_file.c    |  56 +++++--
 3 files changed, 523 insertions(+), 54 deletions(-)

diff --git a/external-odb.c b/external-odb.c
index ccca67eff5..084cd32e0b 100644
--- a/external-odb.c
+++ b/external-odb.c
@@ -67,32 +67,11 @@ const char *external_odb_root(void)
 	return root;
 }
 
-int external_odb_has_object(const unsigned char *sha1)
-{
-	struct odb_helper *o;
-
-	if (!use_external_odb)
-		return 0;
-
-	external_odb_init();
-
-	for (o = helpers; o; o = o->next) {
-		if (!(o->supported_capabilities & ODB_HELPER_CAP_HAVE))
-			return 1;
-		if (odb_helper_has_object(o, sha1))
-			return 1;
-	}
-	return 0;
-}
-
-int external_odb_get_object(const unsigned char *sha1)
+static int external_odb_do_get_object(const unsigned char *sha1)
 {
 	struct odb_helper *o;
 	const char *path;
 
-	if (!external_odb_has_object(sha1))
-		return -1;
-
 	path = sha1_file_name_alt(external_odb_root(), sha1);
 	safe_create_leading_directories_const(path);
 	prepare_external_alt_odb();
@@ -147,6 +126,35 @@ int external_odb_get_direct(const unsigned char *sha1)
 	return -1;
 }
 
+int external_odb_has_object(const unsigned char *sha1)
+{
+	struct odb_helper *o;
+
+	if (!use_external_odb)
+		return 0;
+
+	external_odb_init();
+
+	for (o = helpers; o; o = o->next) {
+		if (!(o->supported_capabilities & ODB_HELPER_CAP_HAVE)) {
+			if (o->supported_capabilities & ODB_HELPER_CAP_GET_DIRECT)
+				return 1;
+			return !external_odb_do_get_object(sha1);
+		}
+		if (odb_helper_has_object(o, sha1))
+			return 1;
+	}
+	return 0;
+}
+
+int external_odb_get_object(const unsigned char *sha1)
+{
+	if (!external_odb_has_object(sha1))
+		return -1;
+
+	return external_odb_do_get_object(sha1);
+}
+
 int external_odb_put_object(const void *buf, size_t len,
 			    const char *type, unsigned char *sha1)
 {
diff --git a/odb-helper.c b/odb-helper.c
index 22728200e3..3148bcfa15 100644
--- a/odb-helper.c
+++ b/odb-helper.c
@@ -4,6 +4,22 @@
 #include "odb-helper.h"
 #include "run-command.h"
 #include "sha1-lookup.h"
+#include "sub-process.h"
+#include "pkt-line.h"
+#include "sigchain.h"
+
+struct object_process {
+	struct subprocess_entry subprocess;
+	unsigned int supported_capabilities;
+};
+
+static struct hashmap subprocess_map;
+
+static int check_object_process_status(int fd, struct strbuf *status)
+{
+	subprocess_read_status(fd, status);
+	return strcmp(status->buf, "success");
+}
 
 static void parse_capabilities(char *cap_buf,
 			       unsigned int *supported_capabilities,
@@ -39,6 +55,384 @@ static void parse_capabilities(char *cap_buf,
 	string_list_clear(&cap_list, 0);
 }
 
+static int start_object_process_fn(struct subprocess_entry *subprocess)
+{
+	static int versions[] = {1, 0};
+	static struct subprocess_capability capabilities[] = {
+		{ "get_git_obj", ODB_HELPER_CAP_GET_GIT_OBJ },
+		{ "get_raw_obj", ODB_HELPER_CAP_GET_RAW_OBJ },
+		{ "get_direct",  ODB_HELPER_CAP_GET_DIRECT  },
+		{ "put_git_obj", ODB_HELPER_CAP_PUT_GIT_OBJ },
+		{ "put_raw_obj", ODB_HELPER_CAP_PUT_RAW_OBJ },
+		{ "put_direct",  ODB_HELPER_CAP_PUT_DIRECT  },
+		{ "have",        ODB_HELPER_CAP_HAVE },
+		{ NULL, 0 }
+	};
+	struct object_process *entry = (struct object_process *)subprocess;
+	return subprocess_handshake(subprocess, "git-read-object", versions, NULL,
+				    capabilities,
+				    &entry->supported_capabilities);
+}
+
+static struct object_process *launch_object_process(struct odb_helper *o,
+						    unsigned int capability)
+{
+	struct object_process *entry = NULL;
+
+	if (!subprocess_map.tablesize)
+		hashmap_init(&subprocess_map, (hashmap_cmp_fn) cmd2process_cmp, NULL, 0);
+	else
+		entry = (struct object_process *)subprocess_find_entry(&subprocess_map, o->cmd);
+
+	fflush(NULL);
+
+	if (!entry) {
+		entry = xmalloc(sizeof(*entry));
+		entry->supported_capabilities = 0;
+
+		if (subprocess_start(&subprocess_map, &entry->subprocess, o->cmd, start_object_process_fn)) {
+			error("Could not launch process for cmd '%s'", o->cmd);
+			free(entry);
+			return NULL;
+		}
+	}
+
+	o->supported_capabilities = entry->supported_capabilities;
+
+	if (capability && !(capability & entry->supported_capabilities)) {
+		error("The cmd '%s' does not support capability '%d'", o->cmd, capability);
+		return NULL;
+	}
+
+	sigchain_push(SIGPIPE, SIG_IGN);
+
+	return entry;
+}
+
+static int check_object_process_error(int err,
+				      const char *status,
+				      struct object_process *entry,
+				      const char *cmd,
+				      unsigned int capability)
+{
+	sigchain_pop(SIGPIPE);
+
+	if (!err)
+		return 0;
+
+	if (!strcmp(status, "error")) {
+		/* The process signaled a problem with the file. */
+	} else if (!strcmp(status, "notfound")) {
+		/* Object was not found */
+		err = -1;
+	} else if (!strcmp(status, "abort")) {
+		/*
+		 * The process signaled a permanent problem. Don't try to read
+		 * objects with the same command for the lifetime of the current
+		 * Git process.
+		 */
+		if (capability)
+			entry->supported_capabilities &= ~capability;
+	} else {
+		/*
+		 * Something went wrong with the read-object process.
+		 * Force shutdown and restart if needed.
+		 */
+		error("external object process '%s' failed", cmd);
+		subprocess_stop(&subprocess_map, &entry->subprocess);
+		free(entry);
+	}
+
+	return err;
+}
+
+static int send_init_packets(struct object_process *entry,
+			     struct strbuf *status)
+{
+	struct child_process *process = &entry->subprocess.process;
+
+	return packet_write_fmt_gently(process->in, "command=init\n") ||
+		packet_flush_gently(process->in) ||
+		check_object_process_status(process->out, status);
+}
+
+static int init_object_process(struct odb_helper *o)
+{
+	int err;
+	struct strbuf status = STRBUF_INIT;
+	struct object_process *entry = launch_object_process(o, 0);
+	if (!entry)
+		return -1;
+
+	err = send_init_packets(entry, &status);
+
+	return check_object_process_error(err, status.buf, entry,
+					  o->cmd, 0);
+}
+
+static ssize_t read_packetized_raw_object_to_fd(struct odb_helper *o,
+						const unsigned char *sha1,
+						int fd_in, int fd_out)
+{
+	ssize_t total_read = 0;
+	unsigned long total_got = 0;
+	int packet_len;
+
+	char hdr[32];
+	int hdrlen;
+
+	int ret = Z_STREAM_END;
+	unsigned char compressed[4096];
+	git_zstream stream;
+	git_SHA_CTX hash;
+	unsigned char real_sha1[20];
+
+	off_t size;
+	enum object_type type;
+	const char *s;
+	int pkt_size;
+	char *size_buf;
+
+	size_buf = packet_read_line(fd_in, &pkt_size);
+	if (!skip_prefix(size_buf, "size=", &s))
+		return error("odb helper '%s' did not send size of plain object", o->name);
+	size = strtoumax(s, NULL, 10);
+	if (!skip_prefix(packet_read_line(fd_in, NULL), "kind=", &s))
+		return error("odb helper '%s' did not send kind of plain object", o->name);
+	/* Check if the object is not available */
+	if (!strcmp(s, "none"))
+		return -1;
+	type = type_from_string_gently(s, strlen(s), 1);
+	if (type < 0)
+		return error("odb helper '%s' sent bad type '%s'", o->name, s);
+
+	/* Set it up */
+	git_deflate_init(&stream, zlib_compression_level);
+	stream.next_out = compressed;
+	stream.avail_out = sizeof(compressed);
+	git_SHA1_Init(&hash);
+
+	/* First header.. */
+	hdrlen = xsnprintf(hdr, sizeof(hdr), "%s %lu", typename(type), size) + 1;
+	stream.next_in = (unsigned char *)hdr;
+	stream.avail_in = hdrlen;
+	while (git_deflate(&stream, 0) == Z_OK)
+		; /* nothing */
+	git_SHA1_Update(&hash, hdr, hdrlen);
+
+	for (;;) {
+		/* packet_read() writes a '\0' extra byte at the end */
+		char buf[LARGE_PACKET_DATA_MAX + 1];
+
+		packet_len = packet_read(fd_in, NULL, NULL,
+			buf, LARGE_PACKET_DATA_MAX + 1,
+			PACKET_READ_GENTLE_ON_EOF);
+
+		if (packet_len <= 0)
+			break;
+
+		total_got += packet_len;
+
+		/* Then the data itself.. */
+		stream.next_in = (void *)buf;
+		stream.avail_in = packet_len;
+		do {
+			unsigned char *in0 = stream.next_in;
+			ret = git_deflate(&stream, Z_FINISH);
+			git_SHA1_Update(&hash, in0, stream.next_in - in0);
+			write_or_die(fd_out, compressed, stream.next_out - compressed);
+			stream.next_out = compressed;
+			stream.avail_out = sizeof(compressed);
+		} while (ret == Z_OK);
+
+		total_read += packet_len;
+	}
+
+	if (packet_len < 0) {
+		error("unable to read from odb helper '%s': %s",
+		      o->name, strerror(errno));
+		git_deflate_end(&stream);
+		return packet_len;
+	}
+
+	if (ret != Z_STREAM_END) {
+		warning("bad zlib data from odb helper '%s' for %s",
+			o->name, sha1_to_hex(sha1));
+		return -1;
+	}
+
+	ret = git_deflate_end_gently(&stream);
+	if (ret != Z_OK) {
+		warning("deflateEnd on object %s from odb helper '%s' failed (%d)",
+			sha1_to_hex(sha1), o->name, ret);
+		return -1;
+	}
+	git_SHA1_Final(real_sha1, &hash);
+	if (hashcmp(sha1, real_sha1)) {
+		warning("sha1 mismatch from odb helper '%s' for %s (got %s)",
+			o->name, sha1_to_hex(sha1), sha1_to_hex(real_sha1));
+		return -1;
+	}
+	if (total_got != size) {
+		warning("size mismatch from odb helper '%s' for %s (%lu != %lu)",
+			o->name, sha1_to_hex(sha1), total_got, size);
+		return -1;
+	}
+
+	return total_read;
+}
+
+static ssize_t read_packetized_git_object_to_fd(struct odb_helper *o,
+						const unsigned char *sha1,
+						int fd_in, int fd_out)
+{
+	ssize_t total_read = 0;
+	unsigned long total_got = 0;
+	int packet_len;
+	git_zstream stream;
+	int zret = Z_STREAM_END;
+	git_SHA_CTX hash;
+	unsigned char real_sha1[20];
+	struct strbuf header = STRBUF_INIT;
+	unsigned long hdr_size;
+
+	memset(&stream, 0, sizeof(stream));
+	git_inflate_init(&stream);
+	git_SHA1_Init(&hash);
+
+	for (;;) {
+		/* packet_read() writes a '\0' extra byte at the end */
+		char buf[LARGE_PACKET_DATA_MAX + 1];
+
+		packet_len = packet_read(fd_in, NULL, NULL,
+			buf, LARGE_PACKET_DATA_MAX + 1,
+			PACKET_READ_GENTLE_ON_EOF);
+
+		if (packet_len <= 0)
+			break;
+
+		write_or_die(fd_out, buf, packet_len);
+
+		stream.next_in = (unsigned char *)buf;
+		stream.avail_in = packet_len;
+		do {
+			unsigned char inflated[4096];
+			unsigned long got;
+
+			stream.next_out = inflated;
+			stream.avail_out = sizeof(inflated);
+			zret = git_inflate(&stream, Z_SYNC_FLUSH);
+			got = sizeof(inflated) - stream.avail_out;
+
+			git_SHA1_Update(&hash, inflated, got);
+			/* skip header when counting size */
+			if (!total_got) {
+				const unsigned char *p = memchr(inflated, '\0', got);
+				if (p) {
+					unsigned long hdr_last = p - inflated + 1;
+					strbuf_add(&header, inflated, hdr_last);
+					got -= hdr_last;
+				} else {
+					strbuf_add(&header, inflated, got);
+					got = 0;
+				}
+			}
+			total_got += got;
+		} while (stream.avail_in && zret == Z_OK);
+
+		total_read += packet_len;
+	}
+
+	git_inflate_end(&stream);
+
+	if (packet_len < 0)
+		return packet_len;
+
+	git_SHA1_Final(real_sha1, &hash);
+
+	if (zret != Z_STREAM_END) {
+		warning("bad zlib data from odb helper '%s' for %s",
+			o->name, sha1_to_hex(sha1));
+		return -1;
+	}
+	if (hashcmp(real_sha1, sha1)) {
+		warning("sha1 mismatch from odb helper '%s' for %s (got %s)",
+			o->name, sha1_to_hex(sha1), sha1_to_hex(real_sha1));
+		return -1;
+	}
+	if (parse_sha1_header(header.buf, &hdr_size) < 0) {
+		warning("could not parse header from odb helper '%s' for %s",
+			o->name, sha1_to_hex(sha1));
+		return -1;
+	}
+	if (total_got != hdr_size) {
+		warning("size mismatch from odb helper '%s' for %s (%lu != %lu)",
+			o->name, sha1_to_hex(sha1), total_got, hdr_size);
+		return -1;
+	}
+
+	return total_read;
+}
+
+static int send_get_packets(struct odb_helper *o,
+			    struct object_process *entry,
+			    const unsigned char *sha1,
+			    int fd,
+			    unsigned int *cur_cap,
+			    struct strbuf *status)
+{
+	const char *instruction;
+	int err;
+	struct child_process *process = &entry->subprocess.process;
+
+	if (entry->supported_capabilities & ODB_HELPER_CAP_GET_GIT_OBJ) {
+		*cur_cap = ODB_HELPER_CAP_GET_GIT_OBJ;
+		instruction = "get_git_obj";
+	} else if (entry->supported_capabilities & ODB_HELPER_CAP_GET_RAW_OBJ) {
+		*cur_cap = ODB_HELPER_CAP_GET_RAW_OBJ;
+		instruction = "get_raw_obj";
+	} else if (entry->supported_capabilities & ODB_HELPER_CAP_GET_DIRECT) {
+		*cur_cap = ODB_HELPER_CAP_GET_DIRECT;
+		instruction = "get_direct";
+	} else {
+		BUG("No known ODB_HELPER_CAP_GET_XXX capability!");
+	}
+
+	err = packet_write_fmt_gently(process->in, "command=%s\n", instruction);
+	if (err)
+		return err;
+
+	err = packet_write_fmt_gently(process->in, "sha1=%s\n", sha1_to_hex(sha1));
+	if (err)
+		return err;
+
+	err = packet_flush_gently(process->in);
+	if (err)
+		return err;
+
+	if (entry->supported_capabilities & ODB_HELPER_CAP_GET_RAW_OBJ)
+		err = read_packetized_raw_object_to_fd(o, sha1, process->out, fd) < 0;
+	else if (entry->supported_capabilities & ODB_HELPER_CAP_GET_GIT_OBJ)
+		err = read_packetized_git_object_to_fd(o, sha1, process->out, fd) < 0;
+
+	return check_object_process_status(process->out, status);
+}
+
+static int get_object_process(struct odb_helper *o, const unsigned char *sha1, int fd)
+{
+	int err;
+	struct strbuf status = STRBUF_INIT;
+	unsigned int cur_cap = 0;
+	struct object_process *entry = launch_object_process(o, 0);
+	if (!entry)
+		return -1;
+
+	err = send_get_packets(o, entry, sha1, fd, &cur_cap, &status);
+
+	return check_object_process_error(err, status.buf, entry,
+					  o->cmd, cur_cap);
+}
+
 struct odb_helper *odb_helper_new(const char *name, int namelen)
 {
 	struct odb_helper *o;
@@ -117,15 +511,12 @@ static int odb_helper_finish(struct odb_helper *o,
 	return 0;
 }
 
-int odb_helper_init(struct odb_helper *o)
+static int init_object_script(struct odb_helper *o)
 {
 	struct odb_helper_cmd cmd;
 	FILE *fh;
 	struct strbuf line = STRBUF_INIT;
 
-	if (!o->script_mode)
-		return 0;
-
 	if (odb_helper_start(o, &cmd, 0, "init") < 0)
 		return -1;
 
@@ -140,6 +531,21 @@ int odb_helper_init(struct odb_helper *o)
 	return 0;
 }
 
+int odb_helper_init(struct odb_helper *o)
+{
+	int res;
+	uint64_t start = getnanotime();
+
+	if (o->script_mode)
+		res = init_object_script(o);
+	else
+		res = init_object_process(o);
+
+	trace_performance_since(start, "odb_helper_init");
+
+	return res;
+}
+
 static int parse_object_line(struct odb_helper_object *o, const char *line)
 {
 	char *end;
@@ -434,28 +840,42 @@ static int odb_helper_get_git_object(struct odb_helper *o,
 	return 0;
 }
 
-int odb_helper_get_direct(struct odb_helper *o,
-			  const unsigned char *sha1)
+static int get_direct_script(struct odb_helper *o, const unsigned char *sha1)
 {
-	struct odb_helper_object *obj;
 	struct odb_helper_cmd cmd;
 
-	obj = odb_helper_lookup(o, sha1);
-	if (!obj)
-		return -1;
-
 	if (odb_helper_start(o, &cmd, 0, "get_direct %s", sha1_to_hex(sha1)) < 0)
 		return -1;
-
 	if (odb_helper_finish(o, &cmd))
 		return -1;
-
 	return 0;
 }
 
-int odb_helper_get_object(struct odb_helper *o,
-			  const unsigned char *sha1,
-			  int fd)
+int odb_helper_get_direct(struct odb_helper *o,
+			  const unsigned char *sha1)
+{
+	int res;
+	uint64_t start;
+
+	if (o->supported_capabilities & ODB_HELPER_CAP_HAVE) {
+		struct odb_helper_object *obj = odb_helper_lookup(o, sha1);
+		if (!obj)
+			return -1;
+	}
+
+	start = getnanotime();
+
+	if (o->script_mode)
+		res = get_direct_script(o, sha1);
+	else
+		res = get_object_process(o, sha1, -1);
+
+	trace_performance_since(start, "odb_helper_get_direct");
+
+	return res;
+}
+
+static int get_object_script(struct odb_helper *o, const unsigned char *sha1, int fd)
 {
 	if (o->supported_capabilities & ODB_HELPER_CAP_GET_GIT_OBJ)
 		return odb_helper_get_git_object(o, sha1, fd);
@@ -467,6 +887,23 @@ int odb_helper_get_object(struct odb_helper *o,
 	BUG("invalid get capability (capabilities: '%d')", o->supported_capabilities);
 }
 
+int odb_helper_get_object(struct odb_helper *o,
+			  const unsigned char *sha1,
+			  int fd)
+{
+	int res;
+	uint64_t start = getnanotime();
+
+	if (o->script_mode)
+		res = get_object_script(o, sha1, fd);
+	else
+		res = get_object_process(o, sha1, fd);
+
+	trace_performance_since(start, "odb_helper_get_object");
+
+	return res;
+}
+
 int odb_helper_put_object(struct odb_helper *o,
 			  const void *buf, size_t len,
 			  const char *type, unsigned char *sha1)
diff --git a/sha1_file.c b/sha1_file.c
index 7b2a0f64fa..c5b6d89b97 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -675,7 +675,17 @@ int check_and_freshen_file(const char *fn, int freshen)
 
 static int check_and_freshen_local(const unsigned char *sha1, int freshen)
 {
-	return check_and_freshen_file(sha1_file_name(sha1), freshen);
+	int ret;
+	int tried_hook = 0;
+
+retry:
+	ret = check_and_freshen_file(sha1_file_name(sha1), freshen);
+	if (!ret && !tried_hook) {
+		tried_hook = 1;
+		if (!external_odb_get_direct(sha1))
+			goto retry;
+	}
+	return ret;
 }
 
 static int check_and_freshen_nonlocal(const unsigned char *sha1, int freshen)
@@ -1190,20 +1200,11 @@ static int sha1_loose_object_info(const unsigned char *sha1,
 	return (status < 0) ? status : 0;
 }
 
-int sha1_object_info_extended(const unsigned char *sha1, struct object_info *oi, unsigned flags)
+static int find_cached_or_packed(const unsigned char *sha1, struct object_info *oi,
+				 unsigned flags, struct pack_entry *e, int retry)
 {
-	static struct object_info blank_oi = OBJECT_INFO_INIT;
-	struct pack_entry e;
-	int rtype;
-	const unsigned char *real = (flags & OBJECT_INFO_LOOKUP_REPLACE) ?
-				    lookup_replace_object(sha1) :
-				    sha1;
-
-	if (!oi)
-		oi = &blank_oi;
-
 	if (!(flags & OBJECT_INFO_SKIP_CACHED)) {
-		struct cached_object *co = find_cached_object(real);
+		struct cached_object *co = find_cached_object(sha1);
 		if (co) {
 			if (oi->typep)
 				*(oi->typep) = co->type;
@@ -1222,9 +1223,9 @@ int sha1_object_info_extended(const unsigned char *sha1, struct object_info *oi,
 		}
 	}
 
-	if (!find_pack_entry(real, &e)) {
+	if (!find_pack_entry(sha1, e)) {
 		/* Most likely it's a loose object. */
-		if (!sha1_loose_object_info(real, oi, flags))
+		if (!sha1_loose_object_info(sha1, oi, flags))
 			return 0;
 
 		/* Not a loose object; someone else may have just packed it. */
@@ -1232,10 +1233,33 @@ int sha1_object_info_extended(const unsigned char *sha1, struct object_info *oi,
 			return -1;
 		} else {
 			reprepare_packed_git();
-			if (!find_pack_entry(real, &e))
+			if (!find_pack_entry(sha1, e)) {
+				if (retry && !external_odb_get_direct(sha1))
+					return find_cached_or_packed(sha1, oi, flags, e, 0);
 				return -1;
+			}
 		}
 	}
+	return 1;
+}
+
+int sha1_object_info_extended(const unsigned char *sha1, struct object_info *oi, unsigned flags)
+{
+	static struct object_info blank_oi = OBJECT_INFO_INIT;
+	struct pack_entry e;
+	int rtype;
+	enum object_type real_type;
+	int res;
+	const unsigned char *real = (flags & OBJECT_INFO_LOOKUP_REPLACE) ?
+				    lookup_replace_object(sha1) :
+				    sha1;
+
+	if (!oi)
+		oi = &blank_oi;
+
+	res = find_cached_or_packed(real, oi, flags, &e, 1);
+	if (res < 1)
+		return res;
 
 	if (oi == &blank_oi)
 		/*
-- 
2.14.1.576.g3f707d88cd


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH v6 27/40] Add t0450 to test 'get_direct' mechanism
  2017-09-16  8:06 [PATCH v6 00/40] Add initial experimental external ODB support Christian Couder
                   ` (25 preceding siblings ...)
  2017-09-16  8:07 ` [PATCH v6 26/40] odb-helper: add init_object_process() Christian Couder
@ 2017-09-16  8:07 ` Christian Couder
  2017-09-16  8:07 ` [PATCH v6 28/40] Add t0460 to test passing git objects Christian Couder
                   ` (13 subsequent siblings)
  40 siblings, 0 replies; 49+ messages in thread
From: Christian Couder @ 2017-09-16  8:07 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Ben Peart, Jonathan Tan,
	Nguyen Thai Ngoc Duy, Mike Hommey, Lars Schneider, Eric Wong,
	Christian Couder, Ben Peart

From: Ben Peart <benpeart@microsoft.com>

Signed-off-by: Ben Peart <benpeart@microsoft.com>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
 t/t0450-read-object.sh | 28 +++++++++++++++++++++
 t/t0450/read-object    | 68 ++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 96 insertions(+)
 create mode 100755 t/t0450-read-object.sh
 create mode 100755 t/t0450/read-object

diff --git a/t/t0450-read-object.sh b/t/t0450-read-object.sh
new file mode 100755
index 0000000000..6b97305452
--- /dev/null
+++ b/t/t0450-read-object.sh
@@ -0,0 +1,28 @@
+#!/bin/sh
+
+test_description='tests for long running read-object process'
+
+. ./test-lib.sh
+
+PATH="$PATH:$TEST_DIRECTORY/t0450"
+
+test_expect_success 'setup host repo with a root commit' '
+	test_commit zero &&
+	hash1=$(git ls-tree HEAD | grep zero.t | cut -f1 | cut -d\  -f3)
+'
+
+HELPER="read-object"
+
+test_expect_success 'blobs can be retrieved from the host repo' '
+	git init guest-repo &&
+	(cd guest-repo &&
+	 git config odb.magic.subprocessCommand "$HELPER" &&
+	 git cat-file blob "$hash1" >/dev/null)
+'
+
+test_expect_success 'invalid blobs generate errors' '
+	cd guest-repo &&
+	test_must_fail git cat-file blob "invalid"
+'
+
+test_done
diff --git a/t/t0450/read-object b/t/t0450/read-object
new file mode 100755
index 0000000000..cf22e2f581
--- /dev/null
+++ b/t/t0450/read-object
@@ -0,0 +1,68 @@
+#!/usr/bin/perl
+#
+# Example implementation for the Git read-object protocol version 1
+# See Documentation/technical/read-object-protocol.txt
+#
+# Allows you to test the ability for blobs to be pulled from a host git repo
+# "on demand."  Called when git needs a blob it couldn't find locally due to
+# a lazy clone that only cloned the commits and trees.
+#
+# A lazy clone can be simulated via the following commands from the host repo
+# you wish to create a lazy clone of:
+#
+# cd /host_repo
+# git rev-parse HEAD
+# git init /guest_repo
+# git cat-file --batch-check --batch-all-objects | grep -v 'blob' |
+#	cut -d' ' -f1 | git pack-objects /guest_repo/.git/objects/pack/noblobs
+# cd /guest_repo
+# git config core.virtualizeobjects true
+# git reset --hard <sha from rev-parse call above>
+#
+# Please note, this sample is a minimal skeleton. No proper error handling 
+# was implemented.
+#
+
+use 5.008;
+use lib (split(/:/, $ENV{GITPERLLIB}));
+use strict;
+use warnings;
+use Git::Packet;
+
+#
+# Point $DIR to the folder where your host git repo is located so we can pull
+# missing objects from it
+#
+my $DIR = "../.git/";
+
+packet_initialize("git-read-object", 1);
+
+packet_read_and_check_capabilities("get_direct");
+packet_write_capabilities("get_direct");
+
+while (1) {
+	my ($res, $command) = packet_txt_read();
+
+	if ( $res == -1 ) {
+		exit 0;
+	}
+
+	$command =~ s/^command=//;
+
+	if ( $command eq "init" ) {
+		packet_bin_read();
+
+		packet_txt_write("status=success");
+		packet_flush();
+	} elsif ( $command eq "get_direct" ) {
+		my ($sha1) = packet_txt_read() =~ /^sha1=([0-9a-f]{40})$/;
+		packet_bin_read();
+
+		system ('git --git-dir="' . $DIR . '" cat-file blob ' . $sha1 . ' | GIT_NO_EXTERNAL_ODB=1 git hash-object -w --stdin >/dev/null 2>&1');
+
+		packet_txt_write(($?) ? "status=error" : "status=success");
+		packet_flush();
+	} else {
+		die "bad command '$command'";
+	}
+}
-- 
2.14.1.576.g3f707d88cd


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH v6 28/40] Add t0460 to test passing git objects
  2017-09-16  8:06 [PATCH v6 00/40] Add initial experimental external ODB support Christian Couder
                   ` (26 preceding siblings ...)
  2017-09-16  8:07 ` [PATCH v6 27/40] Add t0450 to test 'get_direct' mechanism Christian Couder
@ 2017-09-16  8:07 ` Christian Couder
  2017-09-16  8:07 ` [PATCH v6 29/40] odb-helper: add put_object_process() Christian Couder
                   ` (12 subsequent siblings)
  40 siblings, 0 replies; 49+ messages in thread
From: Christian Couder @ 2017-09-16  8:07 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Ben Peart, Jonathan Tan,
	Nguyen Thai Ngoc Duy, Mike Hommey, Lars Schneider, Eric Wong,
	Christian Couder

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
 t/t0460-read-object-git.sh | 28 +++++++++++++++++
 t/t0460/read-object-git    | 78 ++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 106 insertions(+)
 create mode 100755 t/t0460-read-object-git.sh
 create mode 100755 t/t0460/read-object-git

diff --git a/t/t0460-read-object-git.sh b/t/t0460-read-object-git.sh
new file mode 100755
index 0000000000..2873b445f3
--- /dev/null
+++ b/t/t0460-read-object-git.sh
@@ -0,0 +1,28 @@
+#!/bin/sh
+
+test_description='tests for long running read-object process passing git objects'
+
+. ./test-lib.sh
+
+PATH="$PATH:$TEST_DIRECTORY/t0460"
+
+test_expect_success 'setup host repo with a root commit' '
+	test_commit zero &&
+	hash1=$(git ls-tree HEAD | grep zero.t | cut -f1 | cut -d\  -f3)
+'
+
+HELPER="read-object-git"
+
+test_expect_success 'blobs can be retrieved from the host repo' '
+	git init guest-repo &&
+	(cd guest-repo &&
+	 git config odb.magic.subprocessCommand "$HELPER" &&
+	 git cat-file blob "$hash1" >/dev/null)
+'
+
+test_expect_success 'invalid blobs generate errors' '
+	cd guest-repo &&
+	test_must_fail git cat-file blob "invalid"
+'
+
+test_done
diff --git a/t/t0460/read-object-git b/t/t0460/read-object-git
new file mode 100755
index 0000000000..38529e622e
--- /dev/null
+++ b/t/t0460/read-object-git
@@ -0,0 +1,78 @@
+#!/usr/bin/perl
+#
+# Example implementation for the Git read-object protocol version 1
+# See Documentation/technical/read-object-protocol.txt
+#
+# Allows you to test the ability for blobs to be pulled from a host git repo
+# "on demand."  Called when git needs a blob it couldn't find locally due to
+# a lazy clone that only cloned the commits and trees.
+#
+# A lazy clone can be simulated via the following commands from the host repo
+# you wish to create a lazy clone of:
+#
+# cd /host_repo
+# git rev-parse HEAD
+# git init /guest_repo
+# git cat-file --batch-check --batch-all-objects | grep -v 'blob' |
+#	cut -d' ' -f1 | git pack-objects /guest_repo/.git/objects/pack/noblobs
+# cd /guest_repo
+# git config core.virtualizeobjects true
+# git reset --hard <sha from rev-parse call above>
+#
+# Please note, this sample is a minimal skeleton. No proper error handling 
+# was implemented.
+#
+
+use 5.008;
+use lib (split(/:/, $ENV{GITPERLLIB}));
+use strict;
+use warnings;
+use Git::Packet;
+
+#
+# Point $DIR to the folder where your host git repo is located so we can pull
+# missing objects from it
+#
+my $DIR = "../.git/";
+
+packet_initialize("git-read-object", 1);
+
+packet_read_and_check_capabilities("get_git_obj");
+packet_write_capabilities("get_git_obj");
+
+while (1) {
+	my ($res, $command) = packet_txt_read();
+
+	if ( $res == -1 ) {
+		exit 0;
+	}
+
+	$command =~ s/^command=//;
+
+	if ( $command eq "init" ) {
+		packet_bin_read();
+
+		packet_txt_write("status=success");
+		packet_flush();
+	} elsif ( $command eq "get_git_obj" ) {
+		my ($sha1) = packet_txt_read() =~ /^sha1=([0-9a-f]{40})$/;
+		packet_bin_read();
+
+		my $path = $sha1;
+		$path =~ s{..}{$&/};
+		$path = $DIR . "/objects/" . $path;
+
+		my $contents = do {
+		    local $/;
+		    open my $fh, $path or die "Can't open '$path': $!";
+		    <$fh>
+		};
+
+		packet_bin_write($contents);
+		packet_flush();
+		packet_txt_write("status=success");
+		packet_flush();
+	} else {
+		die "bad command '$command'";
+	}
+}
-- 
2.14.1.576.g3f707d88cd


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH v6 29/40] odb-helper: add put_object_process()
  2017-09-16  8:06 [PATCH v6 00/40] Add initial experimental external ODB support Christian Couder
                   ` (27 preceding siblings ...)
  2017-09-16  8:07 ` [PATCH v6 28/40] Add t0460 to test passing git objects Christian Couder
@ 2017-09-16  8:07 ` Christian Couder
  2017-09-16  8:07 ` [PATCH v6 30/40] Add t0470 to test passing raw objects Christian Couder
                   ` (11 subsequent siblings)
  40 siblings, 0 replies; 49+ messages in thread
From: Christian Couder @ 2017-09-16  8:07 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Ben Peart, Jonathan Tan,
	Nguyen Thai Ngoc Duy, Mike Hommey, Lars Schneider, Eric Wong,
	Christian Couder

This adds the infrastructure to send objects to a sub-process
handling the communication with an external odb.

For now we only handle sending raw blobs using the 'put_raw_obj'
instruction.
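
Concretely, sending one raw blob roughly looks like the following,
where each line stands for one packet line, "<flush>" for a flush
packet, and the sha1 and size values are placeholders:

    # written by Git on the helper's stdin
    command=put_raw_obj
    sha1=<40 hex digit sha1>
    size=<size in bytes>
    kind=blob
    <flush>
    <the raw blob content, split over one or more packets>
    <flush>
    # written by the helper on its stdout
    status=success
    <flush>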

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
 odb-helper.c | 75 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 72 insertions(+), 3 deletions(-)

diff --git a/odb-helper.c b/odb-helper.c
index 3148bcfa15..356f6172d8 100644
--- a/odb-helper.c
+++ b/odb-helper.c
@@ -433,6 +433,58 @@ static int get_object_process(struct odb_helper *o, const unsigned char *sha1, i
 					  o->cmd, cur_cap);
 }
 
+static int send_put_packets(struct object_process *entry,
+			    const unsigned char *sha1,
+			    const void *buf,
+			    size_t len,
+			    struct strbuf *status)
+{
+	struct child_process *process = &entry->subprocess.process;
+	int err = packet_write_fmt_gently(process->in, "command=put_raw_obj\n");
+	if (err)
+		return err;
+
+	err = packet_write_fmt_gently(process->in, "sha1=%s\n", sha1_to_hex(sha1));
+	if (err)
+		return err;
+
+	err = packet_write_fmt_gently(process->in, "size=%"PRIuMAX"\n", len);
+	if (err)
+		return err;
+
+	err = packet_write_fmt_gently(process->in, "kind=blob\n");
+	if (err)
+		return err;
+
+	err = packet_flush_gently(process->in);
+	if (err)
+		return err;
+
+	err = write_packetized_from_buf(buf, len, process->in);
+	if (err)
+		return err;
+
+	return check_object_process_status(process->out, status);
+}
+
+static int put_object_process(struct odb_helper *o,
+			      const void *buf, size_t len,
+			      const char *type, unsigned char *sha1)
+{
+	int err;
+	struct object_process *entry;
+	struct strbuf status = STRBUF_INIT;
+
+	entry = launch_object_process(o, ODB_HELPER_CAP_PUT_RAW_OBJ);
+	if (!entry)
+		return -1;
+
+	err = send_put_packets(entry, sha1, buf, len, &status);
+
+	return check_object_process_error(err, status.buf, entry, o->cmd,
+					  ODB_HELPER_CAP_PUT_RAW_OBJ);
+}
+
 struct odb_helper *odb_helper_new(const char *name, int namelen)
 {
 	struct odb_helper *o;
@@ -904,9 +956,9 @@ int odb_helper_get_object(struct odb_helper *o,
 	return res;
 }
 
-int odb_helper_put_object(struct odb_helper *o,
-			  const void *buf, size_t len,
-			  const char *type, unsigned char *sha1)
+static int put_raw_object_script(struct odb_helper *o,
+				 const void *buf, size_t len,
+				 const char *type, unsigned char *sha1)
 {
 	struct odb_helper_cmd cmd;
 
@@ -932,3 +984,20 @@ int odb_helper_put_object(struct odb_helper *o,
 	odb_helper_finish(o, &cmd);
 	return 0;
 }
+
+int odb_helper_put_object(struct odb_helper *o,
+			  const void *buf, size_t len,
+			  const char *type, unsigned char *sha1)
+{
+	int res;
+	uint64_t start = getnanotime();
+
+	if (o->script_mode)
+		res = put_raw_object_script(o, buf, len, type, sha1);
+	else
+		res = put_object_process(o, buf, len, type, sha1);
+
+	trace_performance_since(start, "odb_helper_put_object");
+
+	return res;
+}
-- 
2.14.1.576.g3f707d88cd


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH v6 30/40] Add t0470 to test passing raw objects
  2017-09-16  8:06 [PATCH v6 00/40] Add initial experimental external ODB support Christian Couder
                   ` (28 preceding siblings ...)
  2017-09-16  8:07 ` [PATCH v6 29/40] odb-helper: add put_object_process() Christian Couder
@ 2017-09-16  8:07 ` Christian Couder
  2017-09-16  8:07 ` [PATCH v6 31/40] odb-helper: add have_object_process() Christian Couder
                   ` (10 subsequent siblings)
  40 siblings, 0 replies; 49+ messages in thread
From: Christian Couder @ 2017-09-16  8:07 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Ben Peart, Jonathan Tan,
	Nguyen Thai Ngoc Duy, Mike Hommey, Lars Schneider, Eric Wong,
	Christian Couder

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
 t/t0470-read-object-http-e-odb.sh | 109 ++++++++++++++++++++++++++++++++++++++
 t/t0470/read-object-plain         |  83 +++++++++++++++++++++++++++++
 2 files changed, 192 insertions(+)
 create mode 100755 t/t0470-read-object-http-e-odb.sh
 create mode 100755 t/t0470/read-object-plain

diff --git a/t/t0470-read-object-http-e-odb.sh b/t/t0470-read-object-http-e-odb.sh
new file mode 100755
index 0000000000..774528c04f
--- /dev/null
+++ b/t/t0470-read-object-http-e-odb.sh
@@ -0,0 +1,109 @@
+#!/bin/sh
+
+test_description='tests for read-object process passing plain objects to an HTTPD server'
+
+. ./test-lib.sh
+
+# If we don't specify a port, the current test number will be used
+# which will not work as it is less than 1024, so it can only be used by root.
+LIB_HTTPD_PORT=$(expr ${this_test#t} + 12000)
+
+. "$TEST_DIRECTORY"/lib-httpd.sh
+
+start_httpd apache-e-odb.conf
+
+PATH="$PATH:$TEST_DIRECTORY/t0470"
+
+# odb helper script must see this
+export HTTPD_URL
+
+HELPER="read-object-plain"
+
+test_expect_success 'setup repo with a root commit' '
+	test_commit zero
+'
+
+test_expect_success 'setup another repo from the first one' '
+	git init other-repo &&
+	(cd other-repo &&
+	 git remote add origin .. &&
+	 git pull origin master &&
+	 git checkout master &&
+	 git log)
+'
+
+test_expect_success 'setup the helper in the root repo' '
+	git config odb.magic.subprocessCommand "$HELPER"
+'
+
+UPLOADFILENAME="hello_apache_upload.txt"
+
+UPLOAD_URL="$HTTPD_URL/upload/?sha1=$UPLOADFILENAME&size=123&type=blob"
+
+test_expect_success 'can upload a file' '
+	echo "Hello Apache World!" >hello_to_send.txt &&
+	echo "How are you?" >>hello_to_send.txt &&
+	curl --data-binary @hello_to_send.txt --include "$UPLOAD_URL" >out_upload
+'
+
+LIST_URL="$HTTPD_URL/list/"
+
+test_expect_success 'can list uploaded files' '
+	curl --include "$LIST_URL" >out_list &&
+	grep "$UPLOADFILENAME" out_list
+'
+
+test_expect_success 'can delete uploaded files' '
+	curl --data "delete" --include "$UPLOAD_URL&delete=1" >out_delete &&
+	curl --include "$LIST_URL" >out_list2 &&
+	! grep "$UPLOADFILENAME" out_list2
+'
+
+FILES_DIR="httpd/www/files"
+
+test_expect_success 'new blobs are transferred to the http server' '
+	test_commit one &&
+	hash1=$(git ls-tree HEAD | grep one.t | cut -f1 | cut -d\  -f3) &&
+	echo "$hash1-4-blob" >expected &&
+	ls "$FILES_DIR" >actual &&
+	test_cmp expected actual
+'
+
+test_expect_success 'blobs can be retrieved from the http server' '
+	git cat-file blob "$hash1" &&
+	git log -p >expected
+'
+
+test_expect_success 'update other repo from the first one' '
+	(cd other-repo &&
+	 git fetch origin "refs/odbs/magic/*:refs/odbs/magic/*" &&
+	 test_must_fail git cat-file blob "$hash1" &&
+	 git config odb.magic.subprocesscommand "$HELPER" &&
+	 git cat-file blob "$hash1" &&
+	 git pull origin master)
+'
+
+test_expect_success 'local clone from the first repo' '
+	mkdir my-clone &&
+	(cd my-clone &&
+	 git clone .. . &&
+	 git cat-file blob "$hash1")
+'
+
+test_expect_success 'no-local clone from the first repo fails' '
+	mkdir my-other-clone &&
+	(cd my-other-clone &&
+	 test_must_fail git clone --no-local .. .) &&
+	rm -rf my-other-clone
+'
+
+test_expect_success 'no-local clone from the first repo with helper succeeds' '
+	mkdir my-other-clone &&
+	(cd my-other-clone &&
+	 git clone -c odb.magic.subprocessCommand="$HELPER" --no-local .. .) &&
+	rm -rf my-other-clone
+'
+
+stop_httpd
+
+test_done
diff --git a/t/t0470/read-object-plain b/t/t0470/read-object-plain
new file mode 100755
index 0000000000..918e7b00b5
--- /dev/null
+++ b/t/t0470/read-object-plain
@@ -0,0 +1,83 @@
+#!/usr/bin/perl
+#
+
+use 5.008;
+use lib (split(/:/, $ENV{GITPERLLIB}));
+use strict;
+use warnings;
+use Git::Packet;
+use LWP::UserAgent;
+use HTTP::Request::Common;
+
+packet_initialize("git-read-object", 1);
+
+packet_read_and_check_capabilities("get_raw_obj", "put_raw_obj");
+packet_write_capabilities("get_raw_obj", "put_raw_obj");
+
+my $http_url = $ENV{HTTPD_URL};
+
+while (1) {
+	my ($res, $command) = packet_txt_read();
+
+	if ( $res == -1 ) {
+		exit 0;
+	}
+
+	$command =~ s/^command=//;
+
+	if ( $command eq "init" ) {
+		packet_bin_read();
+
+		packet_txt_write("status=success");
+		packet_flush();
+	} elsif ( $command eq "get_raw_obj" ) {
+		my ($sha1) = packet_txt_read() =~ /^sha1=([0-9a-f]{40})$/;
+		packet_bin_read();
+
+		my $get_url = $http_url . "/list/?sha1=" . $sha1;
+
+		my $userAgent = LWP::UserAgent->new();
+
+		my $response = $userAgent->get( $get_url );
+
+		if ($response->is_error) {
+		    packet_txt_write("size=0");
+		    packet_txt_write("kind=none");	    
+		    packet_txt_write("status=notfound");
+		} else {
+		    packet_txt_write("size=" . length($response->content));
+		    packet_txt_write("kind=blob");
+		    packet_bin_write($response->content);
+		    packet_flush();
+		    packet_txt_write("status=success");
+		}
+
+		packet_flush();
+	} elsif ( $command eq "put_raw_obj" ) {
+		my ($sha1) = packet_txt_read() =~ /^sha1=([0-9a-f]{40})$/;
+		my ($size) = packet_txt_read() =~ /^size=([0-9]+)$/;
+		my ($kind) = packet_txt_read() =~ /^kind=(\w+)$/;
+		packet_bin_read();
+
+		# We must read the content we are sent and send it to the right url
+		my ($res, $buf) = packet_bin_read();
+		die "bad packet_bin_read res ($res)" unless ($res eq 0);
+		( packet_bin_read() eq ( 1, "" ) ) || die "bad send end";		
+
+		my $upload_url = $http_url . "/upload/?sha1=" . $sha1 . "&size=" . $size . "&type=blob";
+
+		my $userAgent = LWP::UserAgent->new();
+		my $request = POST $upload_url, Content_Type => 'multipart/form-data', Content => $buf;
+
+		my $response = $userAgent->request($request);
+
+		if ($response->is_error) {
+			packet_txt_write("status=failure");
+		} else {
+			packet_txt_write("status=success");
+		}
+		packet_flush();
+	} else {
+		die "bad command '$command'";
+	}
+}
-- 
2.14.1.576.g3f707d88cd


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH v6 31/40] odb-helper: add have_object_process()
  2017-09-16  8:06 [PATCH v6 00/40] Add initial experimental external ODB support Christian Couder
                   ` (29 preceding siblings ...)
  2017-09-16  8:07 ` [PATCH v6 30/40] Add t0470 to test passing raw objects Christian Couder
@ 2017-09-16  8:07 ` Christian Couder
  2017-09-16  8:07 ` [PATCH v6 32/40] Add t0480 to test "have" capability and raw objects Christian Couder
                   ` (9 subsequent siblings)
  40 siblings, 0 replies; 49+ messages in thread
From: Christian Couder @ 2017-09-16  8:07 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Ben Peart, Jonathan Tan,
	Nguyen Thai Ngoc Duy, Mike Hommey, Lars Schneider, Eric Wong,
	Christian Couder

This adds the infrastructure to handle 'have' instructions in
process mode.

The answer from the helper sub-process should be the same as the
output in script mode, that is, lines of the form:

sha1 SPACE size SPACE type NEWLINE
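
For example (the hash and size below are made up):

    1234567890abcdef1234567890abcdef12345678 1234 blob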

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
 odb-helper.c | 73 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 73 insertions(+)

diff --git a/odb-helper.c b/odb-helper.c
index 356f6172d8..7433a4bfc6 100644
--- a/odb-helper.c
+++ b/odb-helper.c
@@ -634,6 +634,71 @@ static int odb_helper_object_cmp(const void *va, const void *vb)
 	return hashcmp(a->sha1, b->sha1);
 }
 
+static int send_have_packets(struct odb_helper *o,
+			     struct object_process *entry,
+			     struct strbuf *status)
+{
+	char *line;
+	int packet_len;
+	int total_got = 0;
+	struct child_process *process = &entry->subprocess.process;
+	int err = packet_write_fmt_gently(process->in, "command=have\n");
+
+	if (err)
+		return err;
+
+	err = packet_flush_gently(process->in);
+	if (err)
+		return err;
+
+	for (;;) {
+		/* packet_read() writes a '\0' extra byte at the end */
+		char buf[LARGE_PACKET_DATA_MAX + 1];
+		char *p = buf;
+		int more;
+
+		packet_len = packet_read(process->out, NULL, NULL,
+			buf, LARGE_PACKET_DATA_MAX + 1,
+			PACKET_READ_GENTLE_ON_EOF);
+
+		if (packet_len <= 0)
+			break;
+
+		total_got += packet_len;
+
+		/* 'have' packets should end with '\n' or '\0' */
+		do {
+			char *eol = strchrnul(p, '\n');
+			more = (*eol == '\n');
+			*eol = '\0';
+			if (add_have_entry(o, p))
+				break;
+			p = eol + 1;
+		} while (more && *p);
+	}
+
+	if (packet_len < 0)
+		return packet_len;
+
+	return check_object_process_status(process->out, status);
+}
+
+static int have_object_process(struct odb_helper *o)
+{
+	int err;
+	struct object_process *entry;
+	struct strbuf status = STRBUF_INIT;
+
+	entry = launch_object_process(o, ODB_HELPER_CAP_HAVE);
+	if (!entry)
+		return -1;
+
+	err = send_have_packets(o, entry, &status);
+
+	return check_object_process_error(err, status.buf, entry, o->cmd,
+					  ODB_HELPER_CAP_HAVE);
+}
+
 static void have_object_script(struct odb_helper *o)
 {
 	struct odb_helper_cmd cmd;
@@ -655,12 +720,20 @@ static void have_object_script(struct odb_helper *o)
 
 static void odb_helper_load_have(struct odb_helper *o)
 {
+	uint64_t start;
+
 	if (o->have_valid)
 		return;
 	o->have_valid = 1;
 
+	start = getnanotime();
+
 	if (o->script_mode)
 		have_object_script(o);
+	else
+		have_object_process(o);
+
+	trace_performance_since(start, "odb_helper_load_have");
 
 	qsort(o->have, o->have_nr, sizeof(*o->have), odb_helper_object_cmp);
 }
-- 
2.14.1.576.g3f707d88cd


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH v6 32/40] Add t0480 to test "have" capability and raw objects
  2017-09-16  8:06 [PATCH v6 00/40] Add initial experimental external ODB support Christian Couder
                   ` (30 preceding siblings ...)
  2017-09-16  8:07 ` [PATCH v6 31/40] odb-helper: add have_object_process() Christian Couder
@ 2017-09-16  8:07 ` Christian Couder
  2017-09-16  8:07 ` [PATCH v6 33/40] external-odb: use 'odb=magic' attribute to mark odb blobs Christian Couder
                   ` (8 subsequent siblings)
  40 siblings, 0 replies; 49+ messages in thread
From: Christian Couder @ 2017-09-16  8:07 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Ben Peart, Jonathan Tan,
	Nguyen Thai Ngoc Duy, Mike Hommey, Lars Schneider, Eric Wong,
	Christian Couder

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
 t/t0480-read-object-have-http-e-odb.sh | 109 +++++++++++++++++++++++++++++++++
 t/t0480/read-object-plain-have         | 103 +++++++++++++++++++++++++++++++
 2 files changed, 212 insertions(+)
 create mode 100755 t/t0480-read-object-have-http-e-odb.sh
 create mode 100755 t/t0480/read-object-plain-have

diff --git a/t/t0480-read-object-have-http-e-odb.sh b/t/t0480-read-object-have-http-e-odb.sh
new file mode 100755
index 0000000000..056a40f2bb
--- /dev/null
+++ b/t/t0480-read-object-have-http-e-odb.sh
@@ -0,0 +1,109 @@
+#!/bin/sh
+
+test_description='tests for read-object process with "have" cap and plain objects'
+
+. ./test-lib.sh
+
+# If we don't specify a port, the current test number will be used
+# which will not work as it is less than 1024, so it can only be used by root.
+LIB_HTTPD_PORT=$(expr ${this_test#t} + 12000)
+
+. "$TEST_DIRECTORY"/lib-httpd.sh
+
+start_httpd apache-e-odb.conf
+
+PATH="$PATH:$TEST_DIRECTORY/t0480"
+
+# odb helper script must see this
+export HTTPD_URL
+
+HELPER="read-object-plain-have"
+
+test_expect_success 'setup repo with a root commit' '
+	test_commit zero
+'
+
+test_expect_success 'setup another repo from the first one' '
+	git init other-repo &&
+	(cd other-repo &&
+	 git remote add origin .. &&
+	 git pull origin master &&
+	 git checkout master &&
+	 git log)
+'
+
+test_expect_success 'setup the helper in the root repo' '
+	git config odb.magic.subprocessCommand "$HELPER"
+'
+
+UPLOADFILENAME="hello_apache_upload.txt"
+
+UPLOAD_URL="$HTTPD_URL/upload/?sha1=$UPLOADFILENAME&size=123&type=blob"
+
+test_expect_success 'can upload a file' '
+	echo "Hello Apache World!" >hello_to_send.txt &&
+	echo "How are you?" >>hello_to_send.txt &&
+	curl --data-binary @hello_to_send.txt --include "$UPLOAD_URL" >out_upload
+'
+
+LIST_URL="$HTTPD_URL/list/"
+
+test_expect_success 'can list uploaded files' '
+	curl --include "$LIST_URL" >out_list &&
+	grep "$UPLOADFILENAME" out_list
+'
+
+test_expect_success 'can delete uploaded files' '
+	curl --data "delete" --include "$UPLOAD_URL&delete=1" >out_delete &&
+	curl --include "$LIST_URL" >out_list2 &&
+	! grep "$UPLOADFILENAME" out_list2
+'
+
+FILES_DIR="httpd/www/files"
+
+test_expect_success 'new blobs are transferred to the http server' '
+	test_commit one &&
+	hash1=$(git ls-tree HEAD | grep one.t | cut -f1 | cut -d\  -f3) &&
+	echo "$hash1-4-blob" >expected &&
+	ls "$FILES_DIR" >actual &&
+	test_cmp expected actual
+'
+
+test_expect_success 'blobs can be retrieved from the http server' '
+	git cat-file blob "$hash1" &&
+	git log -p >expected
+'
+
+test_expect_success 'update other repo from the first one' '
+	(cd other-repo &&
+	 git fetch origin "refs/odbs/magic/*:refs/odbs/magic/*" &&
+	 test_must_fail git cat-file blob "$hash1" &&
+	 git config odb.magic.subprocessCommand "$HELPER" &&
+	 git cat-file blob "$hash1" &&
+	 git pull origin master)
+'
+
+test_expect_success 'local clone from the first repo' '
+	mkdir my-clone &&
+	(cd my-clone &&
+	 git clone .. . &&
+	 git cat-file blob "$hash1")
+'
+
+test_expect_success 'no-local clone from the first repo fails' '
+	mkdir my-other-clone &&
+	(cd my-other-clone &&
+	 test_must_fail git clone --no-local .. .) &&
+	rm -rf my-other-clone
+'
+
+test_expect_success 'no-local clone from the first repo with helper succeeds' '
+	mkdir my-other-clone &&
+	(cd my-other-clone &&
+	 git clone -c odb.magic.subprocessCommand="$HELPER" --no-local .. .) &&
+	rm -rf my-other-clone
+'
+
+stop_httpd
+
+test_done
diff --git a/t/t0480/read-object-plain-have b/t/t0480/read-object-plain-have
new file mode 100755
index 0000000000..d63e327f33
--- /dev/null
+++ b/t/t0480/read-object-plain-have
@@ -0,0 +1,103 @@
+#!/usr/bin/perl
+#
+
+use 5.008;
+use lib (split(/:/, $ENV{GITPERLLIB}));
+use strict;
+use warnings;
+use Git::Packet;
+use LWP::UserAgent;
+use HTTP::Request::Common;
+
+packet_initialize("git-read-object", 1);
+
+packet_read_and_check_capabilities("get_raw_obj", "put_raw_obj", "have");
+packet_write_capabilities("get_raw_obj", "put_raw_obj", "have");
+
+my $http_url = $ENV{HTTPD_URL};
+
+while (1) {
+	my ($res, $command) = packet_txt_read();
+
+	if ( $res == -1 ) {
+		exit 0;
+	}
+
+	$command =~ s/^command=//;
+
+	if ( $command eq "init" ) {
+		packet_bin_read();
+
+		packet_txt_write("status=success");
+		packet_flush();
+	} elsif ( $command eq "have" ) {
+		# read the flush after the command
+		packet_bin_read();
+
+		my $have_url = $http_url . "/list/";
+
+		my $userAgent = LWP::UserAgent->new();
+		my $response = $userAgent->get( $have_url );
+
+		if ($response->is_error) {
+			packet_bin_write("");
+			packet_flush();
+			packet_txt_write("status=failure");
+		} else {
+			packet_bin_write($response->content);
+			packet_flush();
+			packet_txt_write("status=success");
+		}
+		packet_flush();
+	} elsif ( $command eq "get_raw_obj" ) {
+		my ($sha1) = packet_txt_read() =~ /^sha1=([0-9a-f]{40})$/;
+		packet_bin_read();
+
+		my $get_url = $http_url . "/list/?sha1=" . $sha1;
+
+		my $userAgent = LWP::UserAgent->new();
+
+		my $response = $userAgent->get( $get_url );
+
+		if ($response->is_error) {
+			packet_txt_write("size=0");
+			packet_txt_write("kind=none");	    
+			packet_txt_write("status=notfound");
+		} else {
+			packet_txt_write("size=" . length($response->content));
+			packet_txt_write("kind=blob");
+			packet_bin_write($response->content);
+			packet_flush();
+			packet_txt_write("status=success");
+		}
+
+		packet_flush();
+	} elsif ( $command eq "put_raw_obj" ) {
+		my ($sha1) = packet_txt_read() =~ /^sha1=([0-9a-f]{40})$/;
+		my ($size) = packet_txt_read() =~ /^size=([0-9]+)$/;
+		my ($kind) = packet_txt_read() =~ /^kind=(\w+)$/;
+
+		packet_bin_read();
+
+		# We must read the content we are sent and send it to the right url
+		my ($res, $buf) = packet_bin_read();
+		die "bad packet_bin_read res ($res)" unless ($res eq 0);
+		( packet_bin_read() eq ( 1, "" ) ) || die "bad send end";
+
+		my $upload_url = $http_url . "/upload/?sha1=" . $sha1 . "&size=" . $size . "&type=blob";
+
+		my $userAgent = LWP::UserAgent->new();
+		my $request = POST $upload_url, Content_Type => 'multipart/form-data', Content => $buf;
+
+		my $response = $userAgent->request($request);
+
+		if ($response->is_error) {
+			packet_txt_write("status=failure");
+		} else {
+			packet_txt_write("status=success");
+		}
+		packet_flush();
+	} else {
+		die "bad command '$command'";
+	}
+}
-- 
2.14.1.576.g3f707d88cd


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH v6 33/40] external-odb: use 'odb=magic' attribute to mark odb blobs
  2017-09-16  8:06 [PATCH v6 00/40] Add initial experimental external ODB support Christian Couder
                   ` (31 preceding siblings ...)
  2017-09-16  8:07 ` [PATCH v6 32/40] Add t0480 to test "have" capability and raw objects Christian Couder
@ 2017-09-16  8:07 ` Christian Couder
  2017-09-16  8:07 ` [PATCH v6 34/40] Add Documentation/technical/external-odb.txt Christian Couder
                   ` (7 subsequent siblings)
  40 siblings, 0 replies; 49+ messages in thread
From: Christian Couder @ 2017-09-16  8:07 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Ben Peart, Jonathan Tan,
	Nguyen Thai Ngoc Duy, Mike Hommey, Lars Schneider, Eric Wong,
	Christian Couder

To tell which blobs should be sent to the "magic" external odb,
let's require that the blobs be marked using the 'odb=magic'
attribute.
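
For example (an illustrative sketch; the catch-all pattern is the one
used in the tests below, but any path glob would work), an entry like
the following in a .gitattributes file marks every blob as handled by
the "magic" odb:

	* odb=magic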

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
 external-odb.c                         | 22 ++++++++++++++++++++--
 external-odb.h                         |  3 ++-
 sha1_file.c                            | 20 +++++++++++++++-----
 t/t0400-external-odb.sh                |  3 +++
 t/t0410-transfer-e-odb.sh              |  3 +++
 t/t0420-transfer-http-e-odb.sh         |  3 +++
 t/t0470-read-object-http-e-odb.sh      |  3 +++
 t/t0480-read-object-have-http-e-odb.sh |  3 +++
 8 files changed, 52 insertions(+), 8 deletions(-)

diff --git a/external-odb.c b/external-odb.c
index 084cd32e0b..e103514a46 100644
--- a/external-odb.c
+++ b/external-odb.c
@@ -1,6 +1,7 @@
 #include "cache.h"
 #include "external-odb.h"
 #include "odb-helper.h"
+#include "attr.h"
 
 static struct odb_helper *helpers;
 static struct odb_helper **helpers_tail = &helpers;
@@ -155,8 +156,23 @@ int external_odb_get_object(const unsigned char *sha1)
 	return external_odb_do_get_object(sha1);
 }
 
+static int has_odb_attrs(struct odb_helper *o, const char *path)
+{
+	static struct attr_check *check;
+
+	if (!check)
+		check = attr_check_initl("odb", NULL);
+
+	if (!git_check_attr(path, check)) {
+		const char *value = check->items[0].value;
+		return value ? !strcmp(o->name, value) : 0;
+	}
+	return 0;
+}
+
 int external_odb_put_object(const void *buf, size_t len,
-			    const char *type, unsigned char *sha1)
+			    const char *type, unsigned char *sha1,
+			    const char *path)
 {
 	struct odb_helper *o;
 
@@ -164,12 +180,14 @@ int external_odb_put_object(const void *buf, size_t len,
 		return 1;
 
 	/* For now accept only blobs */
-	if (strcmp(type, "blob"))
+	if (!path || strcmp(type, "blob"))
 		return 1;
 
 	external_odb_init();
 
 	for (o = helpers; o; o = o->next) {
+		if (!has_odb_attrs(o, path))
+			continue;
 		int r = odb_helper_put_object(o, buf, len, type, sha1);
 		if (r <= 0)
 			return r;
diff --git a/external-odb.h b/external-odb.h
index 1fda08c0fb..93a3b35a04 100644
--- a/external-odb.h
+++ b/external-odb.h
@@ -6,6 +6,7 @@ extern int external_odb_has_object(const unsigned char *sha1);
 extern int external_odb_get_object(const unsigned char *sha1);
 extern int external_odb_get_direct(const unsigned char *sha1);
 extern int external_odb_put_object(const void *buf, size_t len,
-				   const char *type, unsigned char *sha1);
+				   const char *type, unsigned char *sha1,
+				   const char *path);
 
 #endif /* EXTERNAL_ODB_H */
diff --git a/sha1_file.c b/sha1_file.c
index c5b6d89b97..24fbc28eab 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -1631,7 +1631,9 @@ static int freshen_packed_object(const unsigned char *sha1)
 	return 1;
 }
 
-int write_sha1_file(const void *buf, unsigned long len, const char *type, unsigned char *sha1)
+static int write_sha1_file_with_path(const void *buf, unsigned long len,
+				     const char *type, unsigned char *sha1,
+				     const char *path)
 {
 	char hdr[32];
 	int hdrlen = sizeof(hdr);
@@ -1640,13 +1642,19 @@ int write_sha1_file(const void *buf, unsigned long len, const char *type, unsign
 	 * it out into .git/objects/??/?{38} file.
 	 */
 	write_sha1_file_prepare(buf, len, type, sha1, hdr, &hdrlen);
-	if (!external_odb_put_object(buf, len, type, sha1))
+	if (!external_odb_put_object(buf, len, type, sha1, path))
 		return 0;
 	if (freshen_packed_object(sha1) || freshen_loose_object(sha1))
 		return 0;
 	return write_loose_object(sha1, hdr, hdrlen, buf, len, 0);
 }
 
+int write_sha1_file(const void *buf, unsigned long len,
+		    const char *type, unsigned char *sha1)
+{
+	return write_sha1_file_with_path(buf, len, type, sha1, NULL);
+}
+
 int hash_sha1_file_literally(const void *buf, unsigned long len, const char *type,
 			     struct object_id *oid, unsigned flags)
 {
@@ -1767,7 +1775,8 @@ static int index_mem(unsigned char *sha1, void *buf, size_t size,
 	}
 
 	if (write_object)
-		ret = write_sha1_file(buf, size, typename(type), sha1);
+		ret = write_sha1_file_with_path(buf, size, typename(type),
+						sha1, path);
 	else
 		ret = hash_sha1_file(buf, size, typename(type), sha1);
 	if (re_allocated)
@@ -1789,8 +1798,9 @@ static int index_stream_convert_blob(unsigned char *sha1, int fd,
 				 write_object ? safe_crlf : SAFE_CRLF_FALSE);
 
 	if (write_object)
-		ret = write_sha1_file(sbuf.buf, sbuf.len, typename(OBJ_BLOB),
-				      sha1);
+		ret = write_sha1_file_with_path(sbuf.buf, sbuf.len,
+						typename(OBJ_BLOB),
+						sha1, path);
 	else
 		ret = hash_sha1_file(sbuf.buf, sbuf.len, typename(OBJ_BLOB),
 				     sha1);
diff --git a/t/t0400-external-odb.sh b/t/t0400-external-odb.sh
index 03df030461..492c772076 100755
--- a/t/t0400-external-odb.sh
+++ b/t/t0400-external-odb.sh
@@ -73,6 +73,9 @@ test_expect_success 'helper can add objects to alt repo' '
 
 test_expect_success 'commit adds objects to alt repo' '
 	test_config odb.magic.scriptCommand "$HELPER" &&
+	echo "* odb=magic" >.gitattributes &&
+	GIT_NO_EXTERNAL_ODB=1 git add .gitattributes &&
+	GIT_NO_EXTERNAL_ODB=1 git commit -m "Add .gitattributes" &&
 	test_commit three &&
 	hash3=$(git ls-tree HEAD | grep three.t | cut -f1 | cut -d\  -f3) &&
 	content=$(cd alt-repo && git show "$hash3") &&
diff --git a/t/t0410-transfer-e-odb.sh b/t/t0410-transfer-e-odb.sh
index 065ec7d759..fd3e37918c 100755
--- a/t/t0410-transfer-e-odb.sh
+++ b/t/t0410-transfer-e-odb.sh
@@ -111,6 +111,9 @@ test_expect_success 'setup other repo and its alternate repo' '
 '
 
 test_expect_success 'new blobs are put in first object store' '
+	echo "* odb=magic" >.gitattributes &&
+	GIT_NO_EXTERNAL_ODB=1 git add .gitattributes &&
+	GIT_NO_EXTERNAL_ODB=1 git commit -m "Add .gitattributes" &&
 	test_commit one &&
 	hash1=$(git ls-tree HEAD | grep one.t | cut -f1 | cut -d\  -f3) &&
 	content=$(cd alt-repo1 && git show "$hash1") &&
diff --git a/t/t0420-transfer-http-e-odb.sh b/t/t0420-transfer-http-e-odb.sh
index f84fe950ec..d307af0457 100755
--- a/t/t0420-transfer-http-e-odb.sh
+++ b/t/t0420-transfer-http-e-odb.sh
@@ -94,6 +94,9 @@ test_expect_success 'can delete uploaded files' '
 FILES_DIR="httpd/www/files"
 
 test_expect_success 'new blobs are transfered to the http server' '
+	echo "* odb=magic" >.gitattributes &&
+	GIT_NO_EXTERNAL_ODB=1 git add .gitattributes &&
+	GIT_NO_EXTERNAL_ODB=1 git commit -m "Add .gitattributes" &&
 	test_commit one &&
 	hash1=$(git ls-tree HEAD | grep one.t | cut -f1 | cut -d\  -f3) &&
 	echo "$hash1-4-blob" >expected &&
diff --git a/t/t0470-read-object-http-e-odb.sh b/t/t0470-read-object-http-e-odb.sh
index 774528c04f..d814a43d59 100755
--- a/t/t0470-read-object-http-e-odb.sh
+++ b/t/t0470-read-object-http-e-odb.sh
@@ -62,6 +62,9 @@ test_expect_success 'can delete uploaded files' '
 FILES_DIR="httpd/www/files"
 
 test_expect_success 'new blobs are transfered to the http server' '
+	echo "* odb=magic" >.gitattributes &&
+	GIT_NO_EXTERNAL_ODB=1 git add .gitattributes &&
+	GIT_NO_EXTERNAL_ODB=1 git commit -m "Add .gitattributes" &&
 	test_commit one &&
 	hash1=$(git ls-tree HEAD | grep one.t | cut -f1 | cut -d\  -f3) &&
 	echo "$hash1-4-blob" >expected &&
diff --git a/t/t0480-read-object-have-http-e-odb.sh b/t/t0480-read-object-have-http-e-odb.sh
index 056a40f2bb..fe1fac5ef3 100755
--- a/t/t0480-read-object-have-http-e-odb.sh
+++ b/t/t0480-read-object-have-http-e-odb.sh
@@ -62,6 +62,9 @@ test_expect_success 'can delete uploaded files' '
 FILES_DIR="httpd/www/files"
 
 test_expect_success 'new blobs are transfered to the http server' '
+	echo "* odb=magic" >.gitattributes &&
+	GIT_NO_EXTERNAL_ODB=1 git add .gitattributes &&
+	GIT_NO_EXTERNAL_ODB=1 git commit -m "Add .gitattributes" &&
 	test_commit one &&
 	hash1=$(git ls-tree HEAD | grep one.t | cut -f1 | cut -d\  -f3) &&
 	echo "$hash1-4-blob" >expected &&
-- 
2.14.1.576.g3f707d88cd


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH v6 34/40] Add Documentation/technical/external-odb.txt
  2017-09-16  8:06 [PATCH v6 00/40] Add initial experimental external ODB support Christian Couder
                   ` (32 preceding siblings ...)
  2017-09-16  8:07 ` [PATCH v6 33/40] external-odb: use 'odb=magic' attribute to mark odb blobs Christian Couder
@ 2017-09-16  8:07 ` Christian Couder
  2017-09-16  8:07 ` [PATCH v6 35/40] clone: add 'initial' param to write_remote_refs() Christian Couder
                   ` (6 subsequent siblings)
  40 siblings, 0 replies; 49+ messages in thread
From: Christian Couder @ 2017-09-16  8:07 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Ben Peart, Jonathan Tan,
	Nguyen Thai Ngoc Duy, Mike Hommey, Lars Schneider, Eric Wong,
	Christian Couder

This describes the external odb mechanism's purpose and
how it works.

Helped-by: Ben Peart <benpeart@microsoft.com>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
 Documentation/technical/external-odb.txt | 342 +++++++++++++++++++++++++++++++
 1 file changed, 342 insertions(+)
 create mode 100644 Documentation/technical/external-odb.txt

diff --git a/Documentation/technical/external-odb.txt b/Documentation/technical/external-odb.txt
new file mode 100644
index 0000000000..58ec8a8145
--- /dev/null
+++ b/Documentation/technical/external-odb.txt
@@ -0,0 +1,342 @@
+External ODBs
+^^^^^^^^^^^^^
+
+The External ODB mechanism makes it possible for Git objects (only
+blobs, for now) to be stored in an "external object database"
+(External ODB).
+
+An External ODB can be any object store, as long as there is a helper
+program, called an "odb helper", that can communicate with Git to
+transfer objects to and from the external odb and to retrieve
+information about the objects available in the external odb.
+
+Purpose
+=======
+
+The purpose of this mechanism is to make it possible to handle Git
+objects, especially blobs, in much more flexible ways.
+
+Currently Git can store its objects only in the form of loose objects
+in separate files or packed objects in a pack file. These existing
+object stores cannot be easily optimized for many different kinds of
+content.
+
+So the current stores are not flexible enough for some important use
+cases, like handling really big binary files or handling a really
+large number of files that are fetched only as needed. And it is not
+realistic to expect that Git could fully and natively handle many such
+use cases. Git would need to natively implement different internal
+stores, which would be a huge burden and could lead to re-implementing
+things like HTTP servers, Docker registries or artifact stores that
+already exist outside Git.
+
+Furthermore, many improvements that depend on specific setups could be
+made to the way Git objects are managed, if it were possible to
+customize how the Git objects are handled. For example a restartable
+clone using the bundle mechanism has often been requested, but
+implementing that would go against the strict rules under which Git
+objects are currently handled.
+
+What Git needs is a mechanism that makes it possible to customize, in
+a lot of different ways, how the Git objects are handled, while
+interfering as little as possible with the usual way in which Git
+handles its objects.
+
+Helpers
+=======
+
+ODB helpers are commands that have to be registered using either the
+"odb.<odbname>.subprocessCommand" or the "odb.<odbname>.scriptCommand"
+config variables.
+
+Registering such a command tells Git that an external odb called
+<odbname> exists and that the registered command should be used to
+communicate with it.
+
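+For example, registering a process command for an odb called "magic"
+could look like this in a config file (the helper path is only
+illustrative):
+
+------------------------
+[odb "magic"]
+	subprocessCommand = /usr/local/bin/magic-odb-helper
+------------------------
+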
+The communication happens through instructions that are sent by Git
+and that the commands should answer. If it makes sense, Git can send
+the same instruction to many commands in the order in which they are
+configured.
+
+There are 2 kinds of commands. Commands registered using the
+"odb.<odbname>.subprocessCommand" config variable are called "process
+commands" and the associated mode is called "process mode". Commands
+registered using the "odb.<odbname>.scriptCommand" config variable
+are called "script commands" and the associated mode is called "script
+mode".
+
+Early on, Git commands send an 'init' instruction to the registered
+commands. A capability negotiation will take place during this
+request/response exchange, which will let Git and the helpers know how
+they can further collaborate. The attribute system can also be used to
+tell Git which objects should be handled by which helper.
+
+Process Mode
+============
+
+In process mode the command is started as a single process invocation
+that should last for the entire life of the single Git command that
+started it.
+
+A packet format (pkt-line, see technical/protocol-common.txt) based
+protocol over standard input and standard output is used for
+communication between Git and the helper command.
+
+After the process command is started, Git sends a welcome message
+("git-read-object-client"), a list of supported protocol version
+numbers, and a flush packet. Git expects to read a welcome response
+message ("git-read-object-server"), exactly one protocol version
+number from the previously sent list, and a flush packet. All further
+communication will be based on the selected version.
+
+The remaining protocol description below documents "version=1". Please
+note that "version=42" in the example below does not exist and is only
+there to illustrate how the protocol would look with more than one
+version.
+
+After the version negotiation Git sends a list of all capabilities
+that it supports and a flush packet. Git expects to read a list of
+desired capabilities, which must be a subset of the supported
+capabilities list, and a flush packet as response:
+
+------------------------
+packet: git> git-read-object-client
+packet: git> version=1
+packet: git> version=42
+packet: git> 0000
+packet: git< git-read-object-server
+packet: git< version=1
+packet: git< 0000
+packet: git> capability=get_raw_obj
+packet: git> capability=have
+packet: git> capability=put_raw_obj
+packet: git> capability=not-yet-invented
+packet: git> 0000
+packet: git< capability=get_raw_obj
+packet: git< 0000
+------------------------
+
+Afterwards Git sends a list of "key=value" pairs terminated with a
+flush packet. The list will contain at least the instruction (based on
+the supported capabilities) and the arguments for the
+instruction. Please note that the process must not send any response
+before it has received the final flush packet.
+
+In general any response from the helper should end with a status
+packet. See the documentation of the 'get_*' instructions below for
+examples of status packets.
+
+After the helper has processed an instruction, it is expected to wait
+for the next "key=value" list containing another instruction.
+
+On exit Git will close the pipe to the helper. The helper is then
+expected to detect EOF and exit gracefully on its own. Git will wait
+until the process has stopped.
+
+Script Mode
+===========
+
+In this mode Git launches the script command each time it wants to
+communicate with the helper. There is no welcome message and no
+protocol version in this mode.
+
+The instruction and associated arguments are passed as arguments when
+launching the script command and if needed further information is
+passed between Git and the command through stdin and stdout.
+
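+For example, a script helper might be invoked like this (the helper
+path and sha1 are only illustrative):
+
+------------------------
+/path/to/magic-helper get_git_obj 0a214a649e1b3d5011e14a3dc227753f2bd2be05
+------------------------
+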
+Capabilities/Instructions
+=========================
+
+The following instructions are currently supported by Git:
+
+- init
+- get_git_obj
+- get_raw_obj
+- get_direct
+- put_raw_obj
+- have
+
+The plan is to also support 'put_git_obj' and 'put_direct' soon, for
+consistency with the 'get_*' instructions.
+
+ - 'init'
+
+All the process and script commands must accept the 'init'
+instruction. It should be the first instruction sent to a command. It
+should not be advertised in the capability exchange. Any argument
+should be ignored.
+
+In process mode, after receiving the 'init' instruction and a flush
+packet, the helper should just send a status packet and then a flush
+packet. See the 'get_*' instructions below for examples of status
+packets.
+
+In script mode the command should print on stdout the capabilities
+that it supports if any. This is the only time in script mode when a
+capability exchange happens.
+
+For example a script command could use the following shell code
+snippet to handle the 'init' instruction:
+
+------------------------
+case "$1" in
+init)
+	echo "capability=get_git_obj"
+	echo "capability=put_raw_obj"
+	echo "capability=have"
+	;;
+------------------------
+
+ - 'get_git_obj <sha1>' and 'get_raw_obj <sha1>'
+
+These instructions should have a hexadecimal <sha1> argument to tell
+the helper which object it should send to Git.
+
+In process mode the sha1 argument should be followed by a flush packet
+like this:
+
+------------------------
+packet: git> command=get_git_obj
+packet: git> sha1=0a214a649e1b3d5011e14a3dc227753f2bd2be05
+packet: git> 0000
+------------------------
+
+After reading that, the helper should send the requested object to Git
+in a packet series followed by a flush packet. If the helper does not
+experience problems, then it must send a "success" status like the
+following:
+
+------------------------
+packet: git< status=success
+packet: git< 0000
+------------------------
+
+If the helper cannot, or does not want to, send the requested object
+or any other object for the lifetime of the Git process, then it is
+expected to respond with an "abort" status at any point in the
+protocol:
+
+------------------------
+packet: git< status=abort
+packet: git< 0000
+------------------------
+
+Git neither stops nor restarts the helper when a
+"notfound"/"error"/"abort" status is set. An "error" status means a
+possibly more transient error than an abort. In response to a "get_*"
+instruction, the helper should send a "notfound" status when the
+requested object cannot be found.
+
+If the helper dies during the communication or does not adhere to the
+protocol, then Git will stop and restart it with the next instruction.
+
+In script mode the helper should just send the requested object to Git
+by writing it to stdout and should then exit. The exit code should
+signal to Git whether a problem occurred or not.
+
+The only difference between 'get_git_obj' and 'get_raw_obj' is that in
+case of 'get_git_obj' the requested object should be sent as a Git
+object (that is in the same format as loose object files). In case of
+'get_raw_obj' the object should be sent in its raw format (that is the
+same output as `git cat-file <type> <sha1>`).
+
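+As an illustration, a script helper backed by another Git repository
+could handle these instructions with a shell snippet like the
+following (a minimal sketch, in the spirit of the t0400 test helper;
+it assumes GIT_DIR points to the backing repository):
+
+------------------------
+get_git_obj)
+	# send the loose object file as-is (Git object format)
+	cat "$GIT_DIR"/objects/$(echo $2 | sed 's#..#&/#')
+	;;
+get_raw_obj)
+	# send the raw content (only blobs are supported for now)
+	git cat-file blob "$2"
+	;;
+------------------------
+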
+ - 'get_direct <sha1>'
+
+This instruction is similar to the other 'get_*' instructions except
+that no object should be sent from the helper to Git. Instead the
+helper should directly write the requested object into a loose object
+file in the ".git/objects" directory.
+
+After the helper has sent the "status=success" packet and the
+following flush packet in process mode, or after it has exited in
+script mode, Git will again look up the requested sha1 in its loose
+object files and pack files.
+
+ - 'put_raw_obj <sha1> <size> <type>'
+
+This instruction should be followed by three arguments that tell the
+helper which object it will receive from Git: <sha1>, <size> and
+<type>. The hexadecimal <sha1> argument identifies the object that
+will be sent from Git to the helper. The <type> is the object type
+("blob", "tree", "commit" or "tag") of this object. The <size> is the
+size in bytes of the (decompressed) object content.
+
+In process mode the last argument (the type) should be followed by a
+flush packet.
+
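+A possible exchange (the sha1 and size values are only illustrative;
+the key names follow the test helpers of this series) could look like
+this:
+
+------------------------
+packet: git> command=put_raw_obj
+packet: git> sha1=0a214a649e1b3d5011e14a3dc227753f2bd2be05
+packet: git> size=12
+packet: git> kind=blob
+packet: git> 0000
+------------------------
+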
+After reading the arguments and the flush packet, the helper should
+read the announced object from Git in a packet series followed by a
+flush packet.
+
+If the helper does not experience problems when receiving and storing
+or processing the object, then the helper must send a "success" status
+as described for the 'get_*' instructions.
+
+In script mode the helper should just receive the announced object
+from its standard input. After receiving and processing the object,
+the helper should exit, and its exit code should signal to Git whether
+a problem occurred or not.
+
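+As an illustration, a script helper could handle this instruction with
+a shell snippet like the following (a minimal sketch; the MAGIC_STORE
+directory is hypothetical):
+
+------------------------
+put_raw_obj)
+	# store the raw content received on stdin, keyed by sha1
+	cat >"$MAGIC_STORE/$2"
+	;;
+------------------------
+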
+- 'have'
+
+In process mode this instruction should be followed by a flush
+packet. After receiving this packet the helper should send the sha1,
+size and type, in this order, of all the objects it can provide to Git
+(through a 'get_*' instruction). There should be a space character
+between the sha1 and the size and between the size and the type, and
+then a new line character after the type.
+
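+For example, the payload sent back by the helper could contain lines
+like this (the values are only illustrative):
+
+------------------------
+0a214a649e1b3d5011e14a3dc227753f2bd2be05 1642 blob
+------------------------
+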
+If many packets are needed to send back all this information, the
+split between packets should be made after the new line characters.
+
+If the helper does not experience problems, it must then send a
+"success" status as described for the 'get_*' instructions.
+
+In script mode the helper should send to its standard output the sha1,
+size and type, in this order, of all the objects it can provide to
+Git. There should also be a space character between the sha1 and the
+size and between the size and the type, and then a new line character
+after the type.
+
+After sending this, the script helper should exit, and its exit code
+should signal to Git whether a problem occurred or not.
+
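+As an illustration, a script helper backed by another Git repository
+could implement this with a shell snippet like the following (a
+minimal sketch, in the spirit of the t0400 test helper, which lists
+every object of its backing repository):
+
+------------------------
+have)
+	git cat-file --batch-check --batch-all-objects |
+	awk '{print $1 " " $3 " " $2}'
+	;;
+------------------------
+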
+Order of instructions
+=====================
+
+For the 'get_*' instructions, the regular code that finds objects is
+called before the odb helpers.
+
+For the 'put_*' instructions, the regular code that stores the objects
+is called after the odb helpers.
+
+For now this order is not configurable.
+
+Object caching
+==============
+
+If a helper returns the object data as requested by get_git_obj or
+get_raw_obj, then Git will itself store the object locally in its
+regular object store, so it is redundant for the helper to also store
+or try to store the object in the regular object store.
+
+Admittedly, this partly defeats the goal of enabling specialized
+object handlers to handle large or other "unusual" objects that Git
+normally doesn't deal well with. So in the long run there should be a
+way to make this configurable.
+
+Selecting objects
+=================
+
+To select objects that should be handled by an external odb, one can
+use the git attributes system. For now this only works with blobs,
+and only along with the 'put_raw_obj' instruction.
+
+For example, if one has an external odb called "magic" and has
+registered an associated process command helper that supports the
+'put_raw_obj' instruction, then one can tell Git that all the .jpg
+files should be handled by the "magic" odb using a .gitattributes file
+that contains:
+
+------------------------
+*.jpg           odb=magic
+------------------------
+
-- 
2.14.1.576.g3f707d88cd


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH v6 35/40] clone: add 'initial' param to write_remote_refs()
  2017-09-16  8:06 [PATCH v6 00/40] Add initial experimental external ODB support Christian Couder
                   ` (33 preceding siblings ...)
  2017-09-16  8:07 ` [PATCH v6 34/40] Add Documentation/technical/external-odb.txt Christian Couder
@ 2017-09-16  8:07 ` Christian Couder
  2017-09-16  8:07 ` [PATCH v6 36/40] clone: add --initial-refspec option Christian Couder
                   ` (5 subsequent siblings)
  40 siblings, 0 replies; 49+ messages in thread
From: Christian Couder @ 2017-09-16  8:07 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Ben Peart, Jonathan Tan,
	Nguyen Thai Ngoc Duy, Mike Hommey, Lars Schneider, Eric Wong,
	Christian Couder

We want to make it possible to separate fetching remote refs into
an initial part and a later part. To prepare for that, let's add
an 'initial' boolean parameter to write_remote_refs() to tell this
function if we are performing the initial part or not.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
 builtin/clone.c | 19 +++++++++++++------
 1 file changed, 13 insertions(+), 6 deletions(-)

diff --git a/builtin/clone.c b/builtin/clone.c
index dcd5b878f1..2e5d60521d 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -574,7 +574,7 @@ static struct ref *wanted_peer_refs(const struct ref *refs,
 	return local_refs;
 }
 
-static void write_remote_refs(const struct ref *local_refs)
+static void write_remote_refs(const struct ref *local_refs, int initial)
 {
 	const struct ref *r;
 
@@ -593,8 +593,13 @@ static void write_remote_refs(const struct ref *local_refs)
 			die("%s", err.buf);
 	}
 
-	if (initial_ref_transaction_commit(t, &err))
-		die("%s", err.buf);
+	if (initial) {
+		if (initial_ref_transaction_commit(t, &err))
+			die("%s", err.buf);
+	} else {
+		if (ref_transaction_commit(t, &err))
+			die("%s", err.buf);
+	}
 
 	strbuf_release(&err);
 	ref_transaction_free(t);
@@ -641,7 +646,8 @@ static void update_remote_refs(const struct ref *refs,
 			       const char *branch_top,
 			       const char *msg,
 			       struct transport *transport,
-			       int check_connectivity)
+			       int check_connectivity,
+			       int initial)
 {
 	const struct ref *rm = mapped_refs;
 
@@ -656,7 +662,7 @@ static void update_remote_refs(const struct ref *refs,
 	}
 
 	if (refs) {
-		write_remote_refs(mapped_refs);
+		write_remote_refs(mapped_refs, initial);
 		if (option_single_branch && !option_no_tags)
 			write_followtags(refs, msg);
 	}
@@ -1168,7 +1174,8 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
 		transport_fetch_refs(transport, mapped_refs);
 
 	update_remote_refs(refs, mapped_refs, remote_head_points_at,
-			   branch_top.buf, reflog_msg.buf, transport, !is_local);
+			   branch_top.buf, reflog_msg.buf, transport,
+			   !is_local, 0);
 
 	update_head(our_head_points_at, remote_head, reflog_msg.buf);
 
-- 
2.14.1.576.g3f707d88cd


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH v6 36/40] clone: add --initial-refspec option
  2017-09-16  8:06 [PATCH v6 00/40] Add initial experimental external ODB support Christian Couder
                   ` (34 preceding siblings ...)
  2017-09-16  8:07 ` [PATCH v6 35/40] clone: add 'initial' param to write_remote_refs() Christian Couder
@ 2017-09-16  8:07 ` Christian Couder
  2017-09-16  8:07 ` [PATCH v6 37/40] clone: disable external odb before initial clone Christian Couder
                   ` (4 subsequent siblings)
  40 siblings, 0 replies; 49+ messages in thread
From: Christian Couder @ 2017-09-16  8:07 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Ben Peart, Jonathan Tan,
	Nguyen Thai Ngoc Duy, Mike Hommey, Lars Schneider, Eric Wong,
	Christian Couder

This option makes it possible to split fetching refs when cloning
into two parts: an initial part and a later, normal part.

This way, after the initial part, mechanisms like the external odb
mechanism can be used to prefetch some objects using information
that has been made available during the initial fetch.
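
For example (a sketch; the refspec and helper variable are the ones
used in the tests and documentation of this series):

	git clone -c odb.magic.scriptCommand="$HELPER" \
		--initial-refspec "refs/odbs/magic/*:refs/odbs/magic/*" "$URL"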

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
 builtin/clone.c | 55 ++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 54 insertions(+), 1 deletion(-)

diff --git a/builtin/clone.c b/builtin/clone.c
index 2e5d60521d..57cecd194c 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -57,6 +57,7 @@ static enum transport_family family;
 static struct string_list option_config = STRING_LIST_INIT_NODUP;
 static struct string_list option_required_reference = STRING_LIST_INIT_NODUP;
 static struct string_list option_optional_reference = STRING_LIST_INIT_NODUP;
+static struct string_list option_initial_refspec = STRING_LIST_INIT_NODUP;
 static int option_dissociate;
 static int max_jobs = -1;
 static struct string_list option_recurse_submodules = STRING_LIST_INIT_NODUP;
@@ -107,6 +108,8 @@ static struct option builtin_clone_options[] = {
 			N_("reference repository")),
 	OPT_STRING_LIST(0, "reference-if-able", &option_optional_reference,
 			N_("repo"), N_("reference repository")),
+	OPT_STRING_LIST(0, "initial-refspec", &option_initial_refspec,
+			N_("refspec"), N_("fetch this refspec first")),
 	OPT_BOOL(0, "dissociate", &option_dissociate,
 		 N_("use --reference only while cloning")),
 	OPT_STRING('o', "origin", &option_origin, N_("name"),
@@ -869,6 +872,47 @@ static void dissociate_from_references(void)
 	free(alternates);
 }
 
+static struct refspec *parse_initial_refspecs(void)
+{
+	const char **refspecs;
+	struct refspec *initial_refspecs;
+	struct string_list_item *rs;
+	int i = 0;
+
+	if (!option_initial_refspec.nr)
+		return NULL;
+
+	refspecs = xcalloc(option_initial_refspec.nr, sizeof(const char *));
+
+	for_each_string_list_item(rs, &option_initial_refspec)
+		refspecs[i++] = rs->string;
+
+	initial_refspecs = parse_fetch_refspec(option_initial_refspec.nr, refspecs);
+
+	free(refspecs);
+
+	return initial_refspecs;
+}
+
+static void fetch_initial_refs(struct transport *transport,
+			       const struct ref *refs,
+			       struct refspec *initial_refspecs,
+			       const char *branch_top,
+			       const char *reflog_msg,
+			       int is_local)
+{
+	int i;
+
+	for (i = 0; i < option_initial_refspec.nr; i++) {
+		struct ref *init_refs = NULL;
+		struct ref **tail = &init_refs;
+		get_fetch_map(refs, &initial_refspecs[i], &tail, 0);
+		transport_fetch_refs(transport, init_refs);
+		update_remote_refs(refs, init_refs, NULL, branch_top, reflog_msg,
+				   transport, !is_local, 1);
+	}
+}
+
 int cmd_clone(int argc, const char **argv, const char *prefix)
 {
 	int is_bundle = 0, is_local;
@@ -892,6 +936,9 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
 	struct refspec *refspec;
 	const char *fetch_pattern;
 
+	struct refspec *initial_refspecs;
+	int is_initial;
+
 	packet_trace_identity("clone");
 	argc = parse_options(argc, argv, prefix, builtin_clone_options,
 			     builtin_clone_usage, 0);
@@ -1059,6 +1106,8 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
 	if (option_required_reference.nr || option_optional_reference.nr)
 		setup_reference();
 
+	initial_refspecs = parse_initial_refspecs();
+
 	fetch_pattern = xstrfmt("+%s*:%s*", src_ref_prefix, branch_top.buf);
 	refspec = parse_fetch_refspec(1, &fetch_pattern);
 	free((char *)fetch_pattern);
@@ -1114,6 +1163,9 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
 	refs = transport_get_remote_refs(transport);
 
 	if (refs) {
+		fetch_initial_refs(transport, refs, initial_refspecs,
+				   branch_top.buf, reflog_msg.buf, is_local);
+
 		mapped_refs = wanted_peer_refs(refs, refspec);
 		/*
 		 * transport_get_remote_refs() may return refs with null sha-1
@@ -1173,9 +1225,10 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
 	else if (refs && complete_refs_before_fetch)
 		transport_fetch_refs(transport, mapped_refs);
 
+	is_initial = !refs || option_initial_refspec.nr == 0;
 	update_remote_refs(refs, mapped_refs, remote_head_points_at,
 			   branch_top.buf, reflog_msg.buf, transport,
-			   !is_local, 0);
+			   !is_local, is_initial);
 
 	update_head(our_head_points_at, remote_head, reflog_msg.buf);
 
-- 
2.14.1.576.g3f707d88cd


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH v6 37/40] clone: disable external odb before initial clone
  2017-09-16  8:06 [PATCH v6 00/40] Add initial experimental external ODB support Christian Couder
                   ` (35 preceding siblings ...)
  2017-09-16  8:07 ` [PATCH v6 36/40] clone: add --initial-refspec option Christian Couder
@ 2017-09-16  8:07 ` Christian Couder
  2017-09-16  8:07 ` [PATCH v6 38/40] Add tests for 'clone --initial-refspec' Christian Couder
                   ` (3 subsequent siblings)
  40 siblings, 0 replies; 49+ messages in thread
From: Christian Couder @ 2017-09-16  8:07 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Ben Peart, Jonathan Tan,
	Nguyen Thai Ngoc Duy, Mike Hommey, Lars Schneider, Eric Wong,
	Christian Couder

To make it possible to have the external odb mechanism only kick in
after the initial part of a clone, we should disable it during the
initial part of the clone.

Let's do that by saving and then restoring the value of the
'use_external_odb' global variable.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
 builtin/clone.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/builtin/clone.c b/builtin/clone.c
index 57cecd194c..323b73016e 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -938,6 +938,7 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
 
 	struct refspec *initial_refspecs;
 	int is_initial;
+	int saved_use_external_odb;
 
 	packet_trace_identity("clone");
 	argc = parse_options(argc, argv, prefix, builtin_clone_options,
@@ -1083,6 +1084,10 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
 
 	git_config(git_default_config, NULL);
 
+	/* Temporarily disable external ODB before initial clone */
+	saved_use_external_odb = use_external_odb;
+	use_external_odb = 0;
+
 	if (option_bare) {
 		if (option_mirror)
 			src_ref_prefix = "refs/";
@@ -1166,6 +1171,8 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
 		fetch_initial_refs(transport, refs, initial_refspecs,
 				   branch_top.buf, reflog_msg.buf, is_local);
 
+		use_external_odb = saved_use_external_odb;
+
 		mapped_refs = wanted_peer_refs(refs, refspec);
 		/*
 		 * transport_get_remote_refs() may return refs with null sha-1
@@ -1207,6 +1214,9 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
 					option_branch, option_origin);
 
 		warning(_("You appear to have cloned an empty repository."));
+
+		use_external_odb = saved_use_external_odb;
+
 		mapped_refs = NULL;
 		our_head_points_at = NULL;
 		remote_head_points_at = NULL;
-- 
2.14.1.576.g3f707d88cd


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH v6 38/40] Add tests for 'clone --initial-refspec'
  2017-09-16  8:06 [PATCH v6 00/40] Add initial experimental external ODB support Christian Couder
                   ` (36 preceding siblings ...)
  2017-09-16  8:07 ` [PATCH v6 37/40] clone: disable external odb before initial clone Christian Couder
@ 2017-09-16  8:07 ` Christian Couder
  2017-09-16  8:07 ` [PATCH v6 39/40] Add t0430 to test cloning using bundles Christian Couder
                   ` (2 subsequent siblings)
  40 siblings, 0 replies; 49+ messages in thread
From: Christian Couder @ 2017-09-16  8:07 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Ben Peart, Jonathan Tan,
	Nguyen Thai Ngoc Duy, Mike Hommey, Lars Schneider, Eric Wong,
	Christian Couder

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
 t/t0420-transfer-http-e-odb.sh         |  7 +++++
 t/t0470-read-object-http-e-odb.sh      |  7 +++++
 t/t0480-read-object-have-http-e-odb.sh |  7 +++++
 t/t5616-clone-initial-refspec.sh       | 48 ++++++++++++++++++++++++++++++++++
 4 files changed, 69 insertions(+)
 create mode 100755 t/t5616-clone-initial-refspec.sh

diff --git a/t/t0420-transfer-http-e-odb.sh b/t/t0420-transfer-http-e-odb.sh
index d307af0457..ed833850c3 100755
--- a/t/t0420-transfer-http-e-odb.sh
+++ b/t/t0420-transfer-http-e-odb.sh
@@ -140,6 +140,13 @@ test_expect_success 'no-local clone from the first repo with helper succeeds' '
 	rm -rf my-other-clone
 '
 
+test_expect_success 'no-local initial-refspec clone succeeds' '
+	mkdir my-other-clone &&
+	(cd my-other-clone &&
+	 git -c odb.magic.scriptCommand="$HELPER" \
+		clone --no-local --initial-refspec "refs/odbs/magic/*:refs/odbs/magic/*" .. .)
+'
+
 stop_httpd
 
 test_done
diff --git a/t/t0470-read-object-http-e-odb.sh b/t/t0470-read-object-http-e-odb.sh
index d814a43d59..7355ca4d51 100755
--- a/t/t0470-read-object-http-e-odb.sh
+++ b/t/t0470-read-object-http-e-odb.sh
@@ -107,6 +107,13 @@ test_expect_success 'no-local clone from the first repo with helper succeeds' '
 	rm -rf my-other-clone
 '
 
+test_expect_success 'no-local initial-refspec clone succeeds' '
+	mkdir my-other-clone &&
+	(cd my-other-clone &&
+	 git -c odb.magic.subprocessCommand="$HELPER" \
+		clone --no-local --initial-refspec "refs/odbs/magic/*:refs/odbs/magic/*" .. .)
+'
+
 stop_httpd
 
 test_done
diff --git a/t/t0480-read-object-have-http-e-odb.sh b/t/t0480-read-object-have-http-e-odb.sh
index fe1fac5ef3..c451d269a7 100755
--- a/t/t0480-read-object-have-http-e-odb.sh
+++ b/t/t0480-read-object-have-http-e-odb.sh
@@ -107,6 +107,13 @@ test_expect_success 'no-local clone from the first repo with helper succeeds' '
 	rm -rf my-other-clone
 '
 
+test_expect_success 'no-local initial-refspec clone succeeds' '
+	mkdir my-other-clone &&
+	(cd my-other-clone &&
+	 git -c odb.magic.subprocessCommand="$HELPER" \
+		clone --no-local --initial-refspec "refs/odbs/magic/*:refs/odbs/magic/*" .. .)
+'
+
 stop_httpd
 
 test_done
diff --git a/t/t5616-clone-initial-refspec.sh b/t/t5616-clone-initial-refspec.sh
new file mode 100755
index 0000000000..ccbc27f83f
--- /dev/null
+++ b/t/t5616-clone-initial-refspec.sh
@@ -0,0 +1,48 @@
+#!/bin/sh
+
+test_description='test clone with --initial-refspec option'
+. ./test-lib.sh
+
+
+test_expect_success 'setup regular repo' '
+	# Make two branches, "master" and "side"
+	echo one >file &&
+	git add file &&
+	git commit -m one &&
+	echo two >file &&
+	git commit -a -m two &&
+	git tag two &&
+	echo three >file &&
+	git commit -a -m three &&
+	git checkout -b side &&
+	echo four >file &&
+	git commit -a -m four &&
+	git checkout master
+'
+
+test_expect_success 'add a special ref pointing to a blob' '
+	hash=$(echo "Hello world!" | git hash-object -w -t blob --stdin) &&
+	git update-ref refs/special/hello "$hash"
+'
+
+test_expect_success 'no-local clone from the first repo' '
+	mkdir my-clone &&
+	(cd my-clone &&
+	 git clone --no-local .. . &&
+	 test_must_fail git cat-file blob "$hash") &&
+	rm -rf my-clone
+'
+
+test_expect_success 'no-local clone with --initial-refspec' '
+	mkdir my-clone &&
+	(cd my-clone &&
+	 git clone --no-local --initial-refspec "refs/special/*:refs/special/*" .. . &&
+	 git cat-file blob "$hash" &&
+	 git rev-parse refs/special/hello >actual &&
+	 echo "$hash" >expected &&
+	 test_cmp expected actual) &&
+	rm -rf my-clone
+'
+
+test_done
+
-- 
2.14.1.576.g3f707d88cd


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH v6 39/40] Add t0430 to test cloning using bundles
  2017-09-16  8:06 [PATCH v6 00/40] Add initial experimental external ODB support Christian Couder
                   ` (37 preceding siblings ...)
  2017-09-16  8:07 ` [PATCH v6 38/40] Add tests for 'clone --initial-refspec' Christian Couder
@ 2017-09-16  8:07 ` Christian Couder
  2017-09-16  8:07 ` [PATCH v6 40/40] Doc/external-odb: explain transfering objects and metadata Christian Couder
  2017-10-02 14:18 ` [PATCH v6 00/40] Add initial experimental external ODB support Ben Peart
  40 siblings, 0 replies; 49+ messages in thread
From: Christian Couder @ 2017-09-16  8:07 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Ben Peart, Jonathan Tan,
	Nguyen Thai Ngoc Duy, Mike Hommey, Lars Schneider, Eric Wong,
	Christian Couder

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
 t/t0430-clone-bundle-e-odb.sh | 85 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 85 insertions(+)
 create mode 100755 t/t0430-clone-bundle-e-odb.sh

diff --git a/t/t0430-clone-bundle-e-odb.sh b/t/t0430-clone-bundle-e-odb.sh
new file mode 100755
index 0000000000..ac38ae1be5
--- /dev/null
+++ b/t/t0430-clone-bundle-e-odb.sh
@@ -0,0 +1,85 @@
+#!/bin/sh
+
+test_description='tests for cloning using a bundle through e-odb'
+
+. ./test-lib.sh
+
+# If we don't specify a port, the current test number will be used
+# which will not work as it is less than 1024, so it can only be used by root.
+LIB_HTTPD_PORT=$(expr ${this_test#t} + 12000)
+
+. "$TEST_DIRECTORY"/lib-httpd.sh
+
+start_httpd apache-e-odb.conf
+
+# odb helper script must see this
+export HTTPD_URL
+
+write_script odb-clone-bundle-helper <<\EOF
+die() {
+	printf >&2 "%s\n" "$@"
+	exit 1
+}
+echo >&2 "odb-clone-bundle-helper args:" "$@"
+case "$1" in
+init)
+	ref_hash=$(git rev-parse refs/odbs/magic/bundle) ||
+	die "couldn't find refs/odbs/magic/bundle"
+	GIT_NO_EXTERNAL_ODB=1 git cat-file blob "$ref_hash" >bundle_info ||
+	die "couldn't get blob $ref_hash"
+	bundle_url=$(sed -e 's/bundle url: //' bundle_info)
+	echo >&2 "bundle_url: '$bundle_url'"
+	curl "$bundle_url" -o bundle_file ||
+	die "curl '$bundle_url' failed"
+	GIT_NO_EXTERNAL_ODB=1 git bundle unbundle bundle_file >unbundling_info ||
+	die "unbundling 'bundle_file' failed"
+	;;
+get*)
+	die "odb-clone-bundle-helper '$1' called"
+	;;
+put*)
+	die "odb-clone-bundle-helper '$1' called"
+	;;
+*)
+	die "unknown command '$1'"
+	;;
+esac
+EOF
+HELPER="\"$PWD\"/odb-clone-bundle-helper"
+
+
+test_expect_success 'setup repo with a few commits' '
+	test_commit one &&
+	test_commit two &&
+	test_commit three &&
+	test_commit four
+'
+
+BUNDLE_FILE="file.bundle"
+FILES_DIR="httpd/www/files"
+GET_URL="$HTTPD_URL/files/$BUNDLE_FILE"
+
+test_expect_success 'create a bundle for this repo and check that it can be downloaded' '
+	git bundle create "$BUNDLE_FILE" master &&
+	mkdir "$FILES_DIR" &&
+	cp "$BUNDLE_FILE" "$FILES_DIR/" &&
+	curl "$GET_URL" --output actual &&
+	test_cmp "$BUNDLE_FILE" actual
+'
+
+test_expect_success 'create an e-odb ref for this bundle' '
+	ref_hash=$(echo "bundle url: $GET_URL" | GIT_NO_EXTERNAL_ODB=1 git hash-object -w -t blob --stdin) &&
+	git update-ref refs/odbs/magic/bundle "$ref_hash"
+'
+
+test_expect_success 'clone using the e-odb helper to download and install the bundle' '
+	mkdir my-clone &&
+	(cd my-clone &&
+	 git clone --no-local \
+		-c odb.magic.scriptCommand="$HELPER" \
+		--initial-refspec "refs/odbs/magic/*:refs/odbs/magic/*" .. .)
+'
+
+stop_httpd
+
+test_done
-- 
2.14.1.576.g3f707d88cd


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH v6 40/40] Doc/external-odb: explain transfering objects and metadata
  2017-09-16  8:06 [PATCH v6 00/40] Add initial experimental external ODB support Christian Couder
                   ` (38 preceding siblings ...)
  2017-09-16  8:07 ` [PATCH v6 39/40] Add t0430 to test cloning using bundles Christian Couder
@ 2017-09-16  8:07 ` Christian Couder
  2017-10-02 14:18 ` [PATCH v6 00/40] Add initial experimental external ODB support Ben Peart
  40 siblings, 0 replies; 49+ messages in thread
From: Christian Couder @ 2017-09-16  8:07 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Ben Peart, Jonathan Tan,
	Nguyen Thai Ngoc Duy, Mike Hommey, Lars Schneider, Eric Wong,
	Christian Couder

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
 Documentation/technical/external-odb.txt | 105 +++++++++++++++++++++++++++++++
 1 file changed, 105 insertions(+)

diff --git a/Documentation/technical/external-odb.txt b/Documentation/technical/external-odb.txt
index 58ec8a8145..76dd1e2e6c 100644
--- a/Documentation/technical/external-odb.txt
+++ b/Documentation/technical/external-odb.txt
@@ -340,3 +340,108 @@ can that contains:
 *.jpg           odb=magic
 ------------------------
 
+Transferring objects
+====================
+
+When an external odb helper is configured, the objects managed by the
+external odb are not put in the pack file that is sent (when pushing
+or answering clone and fetch requests), so the receiver should also
+have configured an external odb helper that can get the missing
+objects; otherwise Git will error out complaining about missing
+objects.
+
+This has some drawbacks of course, but at least it makes sure that
+users' and admins' repositories are both properly configured to use a
+common external ODB before they can talk to each other.
+
+Transferring meta information and restartable clone
+===================================================
+
+There are different ways to let the external odb helpers know which
+services they should get the objects from (or put them into). For
+example, the information could be hardcoded into the helpers, or it
+could be computed from configuration information like the url of the
+"origin" remote.
+
+The external odb mechanism itself doesn't really take care of this, so
+helpers are free to do whatever they want.
+
+One interesting possibility though is to have this information as part
+of the repository in special refs, for example refs/odb/magic/*, where
+"magic" is the external odb name.
+
+This would especially make it possible to implement a restartable
+clone using Git bundles (and an external odb helper) like this:
+
+	1) At the very start of the clone, Git would fetch the refs
+	that contain "meta information", for example refs/odb/magic/*
+	(where "magic" is the odb name). These refs would point to
+	some blobs that contain lists of the bundles that are
+	available for fetching by the helper, along with enough
+	information for the helper to fetch them (for example HTTP
+	urls of the bundles).
+
+	2) After this first fetch of the refs/odb/magic/* refs, the
+	helper would be sent the 'init' instruction. At that time it
+	can read all the blobs pointed to by these refs and download
+	the bundles listed in the blobs.
+
+	If something goes wrong when the helper "fetches" a bundle,
+	the helper could force the clone to error out (after maybe
+	retrying), and when the user (or the helper itself) tries
+	again to clone, the helper would restart its bundle "fetch"
+	(using the restartable protocol, for example HTTP).
+
+	When this "fetch" eventually succeeds, then the helper will
+	unbundle what it received, and then give back control to the
+	second regular part of the clone.
+
+	3) This regular part of the clone will then try to fetch the
+	usual refs, but as the unbundling has already updated the
+	content of the usual refs as well as the object stores this
+	fetch will find that everything is up-to-date.
+
+	Or, if everything is not quite up-to-date and there are still
+	things to fetch, another, hopefully much smaller, regular fetch
+	will happen.
+
+As this is an interesting use of the external odb mechanism, the
+`--initial-refspec` option has been implemented in `git clone`. This
+makes it possible to perform all the above steps using a single clone
+command like:
+
+------------------------
+$ git clone -c odb.magic.scriptCommand="$HELPER" \
+  --initial-refspec "refs/odbs/magic/*:refs/odbs/magic/*" "$URL"
+------------------------
+
+But note that the above could also be performed using:
+
+------------------------
+$ git init
+$ git remote add origin "$URL"
+$ git fetch origin "refs/odbs/magic/*:refs/odbs/magic/*"
+$ git config odb.magic.scriptCommand "$HELPER"
+$ git fetch origin
+------------------------
+
+So the `--initial-refspec` option can be seen as just a shortcut to
+simplify external odb helped clones for users.
+
+Also note that this `--initial-refspec` approach could be slower than
+a regular clone, so it is mostly interesting if one wants to fetch a
+large number of objects or many big objects, as for an initial clone
+of a big repo. In this use case a relatively small amount of time
+spent in the initial fetch is an acceptable trade-off if the clone is
+restartable.
+
+In some cases, though, because a `--initial-refspec` clone can reduce
+the resource usage of the Git server, it could even be faster than a
+regular clone.
+
+So admins and users should not blindly use the `--initial-refspec`
+option all the time when an external odb is configured. But using an
+external odb in the first place means that they have specific
+requirements for handling objects, which suggests that the regular way
+to clone might not be very good for their use cases and for the
+objects that are stored in their external ODBs.
-- 
2.14.1.576.g3f707d88cd


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* Re: [PATCH v6 09/40] Add initial external odb support
  2017-09-16  8:07 ` [PATCH v6 09/40] Add initial external odb support Christian Couder
@ 2017-09-19 17:45   ` Jonathan Tan
  2017-09-27 16:46     ` Christian Couder
  0 siblings, 1 reply; 49+ messages in thread
From: Jonathan Tan @ 2017-09-19 17:45 UTC (permalink / raw)
  To: Christian Couder
  Cc: git, Junio C Hamano, Jeff King, Ben Peart, Nguyen Thai Ngoc Duy,
	Mike Hommey, Lars Schneider, Eric Wong, Christian Couder

I wonder if it's better to get a change like this (PATCH v6 09/40 and
any of the previous patches that this depends on) in and then build on
it rather than to review the whole patch set at a time.  This would
reduce ripple effects (of needing to change later patches in a patch set
multiple times unnecessarily) and help collaboration (in that multiple
people can write patches, since the foundation would already be laid).

The same concerns about fsck apply, but that shouldn't be a problem,
since this patch provides the same internal API as mine ("get" function
taking in a single hash, and "have" function taking in a single hash) so
it shouldn't be too difficult to adapt my fsck and gc patches [1]. (I
can do that if necessary.)

[1] https://public-inbox.org/git/20170915134343.3814dc38@twelve2.svl.corp.google.com/

One possible issue (with committing something early) is that later work
(for example, a fast long-running process protocol) will make the
earlier work (for example, here, a simple single-shot protocol)
obsolete, while saddling us with the necessity of maintaining the
earlier one. To that end, if we want to start with the support for a
hook, a better approach might be to only code the fast long-running
process protocol, and put a wrapper script in contrib/ that can wrap a
single-shot process in a long-running process.

And another possible issue is that we design ourselves into a corner.
Thinking about the use cases that I know about (the Android use case and
the Microsoft GVFS use case), I don't think we are doing that - for
Android, this means that large blob metadata needs to be part of the
design (and this patch series does provide for that), and for Microsoft
GVFS, "get" is relatively cheap, so a configuration option to not invoke
"have" first when loading a missing object might be sufficient.

As for the design itself (including fetch and clone), it differs from my
patches (linked above as [1]) in that mine is self-contained (requiring
only an updated Git server and Git client) whereas this, as far as I can
tell, requires an external process and some measure of coordination
between the administrator of the server and the client user (for
example, the client must have the same ODB mechanism as the server, if
not, the server might omit certain blobs that the client does not know
how to fetch).

And I think that my design can be extended to support a use case in
which, for example, blobs corresponding to a certain type of filename
(defined by a glob like in gitattributes) can be excluded during
fetch/clone, much like --blob-max-bytes, and they can be fetched either
through the built-in mechanism or through a custom hook.

For those reasons, I still lean towards my design, but if we do want to
go with this design, here are my comments about this patch...

First of all:
 - You'll probably need to add a repository extension.
 - I get compile errors when I "git am" these onto master. I think
   '#include "config.h"' is needed in some places.

On Sat, 16 Sep 2017 10:07:00 +0200
Christian Couder <christian.couder@gmail.com> wrote:

> The external-odb.{c,h} files contains the functions that are
> called by the rest of Git from "sha1_file.c".
> 
> The odb-helper.{c,h} files contains the functions to
> actually implement communication with the external scripts or
> processes that will manage external git objects.
> 
> For now only script mode is supported, and only the 'have' and
> 'get_git_obj' instructions are supported.

This "have", as I see from this commit, is more like a "list" command in
that it lists all hashes that it knows about, and does not check if a
given hash exists.

> +static struct odb_helper *helpers;
> +static struct odb_helper **helpers_tail = &helpers;

This could be done with the helpers in list.h instead.

> +int external_odb_get_object(const unsigned char *sha1)
> +{
> +	struct odb_helper *o;
> +	const char *path;
> +
> +	if (!external_odb_has_object(sha1))
> +		return -1;
> +
> +	path = sha1_file_name_alt(external_odb_root(), sha1);

If the purpose of making these functions global in the previous patch is
just for temporary names, I don't think it's necessary for them to be
global. Just concatenate the hex SHA1 to external_odb_root()?

>  /* Returns 1 if we have successfully freshened the file, 0 otherwise. */
> @@ -667,7 +684,7 @@ static int check_and_freshen_nonlocal(const unsigned char *sha1, int freshen)
>  		if (check_and_freshen_file(path, freshen))
>  			return 1;
>  	}
> -	return 0;
> +	return external_odb_has_object(sha1);
>  }
>  
>  static int check_and_freshen(const unsigned char *sha1, int freshen)
> @@ -824,6 +841,9 @@ static int stat_sha1_file(const unsigned char *sha1, struct stat *st,
>  			return 0;
>  	}
>  
> +	if (!external_odb_get_object(sha1) && !lstat(*path, st))
> +		return 0;
> +
>  	return -1;
>  }
>  
> @@ -859,7 +879,14 @@ static int open_sha1_file(const unsigned char *sha1, const char **path)
>  	if (fd >= 0)
>  		return fd;
>  
> -	return open_sha1_file_alt(sha1, path);
> +	fd = open_sha1_file_alt(sha1, path);
> +	if (fd >= 0)
> +		return fd;
> +
> +	if (!external_odb_get_object(sha1))
> +		fd = open_sha1_file_alt(sha1, path);
> +
> +	return fd;
>  }

Any reason why you prefer to update the loose object functions than to
update the generic one (sha1_object_info_extended)? My concern with just
updating the loose object functions was that a caller might have
obtained the path by iterating through the loose object dirs, and in
that case we shouldn't query the external ODB for anything.

> +ALT_SOURCE="$PWD/alt-repo/.git"
> +export ALT_SOURCE
> +write_script odb-helper <<\EOF
> +GIT_DIR=$ALT_SOURCE; export GIT_DIR
> +case "$1" in
> +have)
> +	git cat-file --batch-check --batch-all-objects |
> +	awk '{print $1 " " $3 " " $2}'
> +	;;
> +get_git_obj)
> +	cat "$GIT_DIR"/objects/$(echo $2 | sed 's#..#&/#')
> +	;;
> +esac
> +EOF
> +HELPER="\"$PWD\"/odb-helper"

Thanks for the clear test. It is very obvious that "have" returns a list
of objects, and "get_git_obj" returns the compressed loose object with
the Git loose object header included.

> +
> +test_expect_success 'setup alternate repo' '
> +	git init alt-repo &&
> +	(cd alt-repo &&
> +	 test_commit one &&

Probably better written as "test_commit -C alt-repo one".

> +	 test_commit two
> +	) &&
> +	alt_head=`cd alt-repo && git rev-parse HEAD`

I think the style is to use $() and "git -C alt-repo rev-parse HEAD".

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v6 09/40] Add initial external odb support
  2017-09-19 17:45   ` Jonathan Tan
@ 2017-09-27 16:46     ` Christian Couder
  2017-09-29 20:36       ` Jonathan Tan
  0 siblings, 1 reply; 49+ messages in thread
From: Christian Couder @ 2017-09-27 16:46 UTC (permalink / raw)
  To: Jonathan Tan
  Cc: git, Junio C Hamano, Jeff King, Ben Peart, Nguyen Thai Ngoc Duy,
	Mike Hommey, Lars Schneider, Eric Wong, Christian Couder

On Tue, Sep 19, 2017 at 7:45 PM, Jonathan Tan <jonathantanmy@google.com> wrote:
> I wonder if it's better to get a change like this (PATCH v6 09/40 and
> any of the previous patches that this depends on) in and then build on
> it rather than to review the whole patch set at a time.  This would
> reduce ripple effects (of needing to change later patches in a patch set
> multiple times unnecessarily) and help collaboration (in that multiple
> people can write patches, since the foundation would already be laid).

I am ok to split the patch series, but I am not sure that 01/40 to
09/40 is the right range for the first patch series.
I would say that 01/40 to 07/40 is better as it can be seen as a
separate refactoring.

> The same concerns about fsck apply, but that shouldn't be a problem,
> since this patch provides the same internal API as mine ("get" function
> taking in a single hash, and "have" function taking in a single hash) so
> it shouldn't be too difficult to adapt my fsck and gc patches [1]. (I
> can do that if necessary.)
>
> [1] https://public-inbox.org/git/20170915134343.3814dc38@twelve2.svl.corp.google.com/

Great! I would be happy with such an outcome.

> One possible issue (with committing something early) is that later work
> (for example, a fast long-running process protocol) will make the
> earlier work (for example, here, a simple single-shot protocol)
> obsolete, while saddling us with the necessity of maintaining the
> earlier one. To that end, if we want to start with the support for a
> hook, a better approach might be to only code the fast long-running
> process protocol, and put a wrapper script in contrib/ that can wrap a
> single-shot process in a long-running process.

I don't think single-shot processes would be a huge burden, because
the code is simpler, and because for example for filters we already
have single shot and long-running processes and no one complains about
that. It's code that is useful as it makes it much easier for people
to do some things (see the clone bundle example).

In fact in Git development we usually start by first implementing
simpler single-shot solutions, before thinking, when the need arises,
about making things faster. So perhaps an equally valid opinion could
be to first submit only the patches for the single-shot protocol and
later submit the rest of the series when we start getting feedback
about how external odbs are used.

And yeah, I could change the order of the patch series to implement
the long-running processes first and the single-shot process last, so
that it would be possible to first get feedback about the long-running
processes before we decide whether or not to merge the single-shot
stuff, but I don't think that would be the most logical order.

> And another possible issue is that we design ourselves into a corner.
> Thinking about the use cases that I know about (the Android use case and
> the Microsoft GVFS use case), I don't think we are doing that - for
> Android, this means that large blob metadata needs to be part of the
> design (and this patch series does provide for that), and for Microsoft
> GVFS, "get" is relatively cheap, so a configuration option to not invoke
> "have" first when loading a missing object might be sufficient.

If the helper does not advertise the "have" capability, the "have"
instruction will not be sent to the helper, so the current design is
already working for that case.
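
A rough sketch of the gating being described; the field and helper
names here are illustrative, not necessarily the ones used in
odb-helper.c:

  #define ODB_HELPER_CAP_HAVE (1u << 0)

  struct odb_helper_sketch {
      unsigned int supported_capabilities; /* filled during the handshake */
  };

  /* Stand-in for the real per-helper object lookup. */
  extern int helper_lookup(struct odb_helper_sketch *o,
                           const unsigned char *sha1);

  static int helper_has_object(struct odb_helper_sketch *o,
                               const unsigned char *sha1)
  {
      /* Never send "have" to a helper that did not advertise it. */
      if (!(o->supported_capabilities & ODB_HELPER_CAP_HAVE))
          return 0;
      return helper_lookup(o, sha1);
  }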

> As for the design itself (including fetch and clone), it differs from my
> patches (linked above as [1]) in that mine is self-contained (requiring
> only an updated Git server and Git client) whereas this, as far as I can
> tell, requires an external process and some measure of coordination
> between the administrator of the server and the client user (for
> example, the client must have the same ODB mechanism as the server, if
> not, the server might omit certain blobs that the client does not know
> how to fetch).

Yeah, your design is more self-contained, but it doesn't handle as
many use cases.

> And I think that my design can be extended to support a use case in
> which, for example, blobs corresponding to a certain type of filename
> (defined by a glob like in gitattributes) can be excluded during
> fetch/clone, much like --blob-max-bytes, and they can be fetched either
> through the built-in mechanism or through a custom hook.

Sure, we could probably rebuild something equivalent to what I did on
top of your design.
My opinion though is that if we want to eventually get to the same
goal, it is better to first merge something that gets us very close to
the end goal and then add some improvements on top of it.

> For those reasons, I still lean towards my design, but if we do want to
> go with this design, here are my comments about this patch...
>
> First of all:
>  - You'll probably need to add a repository extension.

I am ok to add one if this is needed, but I am not sure it is, as the
repository format does not really change.

>  - I get compile errors when I "git am" these onto master. I think
>    '#include "config.h"' is needed in some places.

It's strange because I get no compile errors even after a "make clean"
from my branch.
Could you show the actual errors?

> On Sat, 16 Sep 2017 10:07:00 +0200
> Christian Couder <christian.couder@gmail.com> wrote:
>
>> The external-odb.{c,h} files contains the functions that are
>> called by the rest of Git from "sha1_file.c".
>>
>> The odb-helper.{c,h} files contains the functions to
>> actually implement communication with the external scripts or
>> processes that will manage external git objects.
>>
>> For now only script mode is supported, and only the 'have' and
>> 'get_git_obj' instructions are supported.
>
> This "have", as I see from this commit, is more like a "list" command in
> that it lists all hashes that it knows about, and does not check if a
> given hash exists.

Yes.

>> +static struct odb_helper *helpers;
>> +static struct odb_helper **helpers_tail = &helpers;
>
> This could be done with the helpers in list.h instead.

Yeah, but list.h is for a doubly-linked list and I am not sure we need that.
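
To make the trade-off concrete, those two lines are the classic
tail-pointer idiom for O(1) appends to a singly-linked list; a
simplified sketch (add_helper() and the trimmed-down struct are
illustrative only):

  #include <stddef.h>

  struct odb_helper {
      const char *name;
      struct odb_helper *next;
  };

  static struct odb_helper *helpers;
  static struct odb_helper **helpers_tail = &helpers;

  static void add_helper(struct odb_helper *o)
  {
      o->next = NULL;
      *helpers_tail = o;        /* link after the current last element */
      helpers_tail = &o->next;  /* remember the new tail slot */
  }

list.h would add the back-pointers of a doubly-linked list, which this
append-only registration never needs.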

>> +int external_odb_get_object(const unsigned char *sha1)
>> +{
>> +     struct odb_helper *o;
>> +     const char *path;
>> +
>> +     if (!external_odb_has_object(sha1))
>> +             return -1;
>> +
>> +     path = sha1_file_name_alt(external_odb_root(), sha1);
>
> If the purpose of making these functions global in the previous patch is
> just for temporary names, I don't think it's necessary for them to be
> global. Just concatenate the hex SHA1 to external_odb_root()?

In my opinion it is cleaner to make a few functions that are needed
global rather than have some custom code, even if the custom code is
small.
(And I guess that Peff who wrote the above agrees.)

>>  /* Returns 1 if we have successfully freshened the file, 0 otherwise. */
>> @@ -667,7 +684,7 @@ static int check_and_freshen_nonlocal(const unsigned char *sha1, int freshen)
>>               if (check_and_freshen_file(path, freshen))
>>                       return 1;
>>       }
>> -     return 0;
>> +     return external_odb_has_object(sha1);
>>  }
>>
>>  static int check_and_freshen(const unsigned char *sha1, int freshen)
>> @@ -824,6 +841,9 @@ static int stat_sha1_file(const unsigned char *sha1, struct stat *st,
>>                       return 0;
>>       }
>>
>> +     if (!external_odb_get_object(sha1) && !lstat(*path, st))
>> +             return 0;
>> +
>>       return -1;
>>  }
>>
>> @@ -859,7 +879,14 @@ static int open_sha1_file(const unsigned char *sha1, const char **path)
>>       if (fd >= 0)
>>               return fd;
>>
>> -     return open_sha1_file_alt(sha1, path);
>> +     fd = open_sha1_file_alt(sha1, path);
>> +     if (fd >= 0)
>> +             return fd;
>> +
>> +     if (!external_odb_get_object(sha1))
>> +             fd = open_sha1_file_alt(sha1, path);
>> +
>> +     return fd;
>>  }
>
> Any reason why you prefer to update the loose object functions than to
> update the generic one (sha1_object_info_extended)? My concern with just
> updating the loose object functions was that a caller might have
> obtained the path by iterating through the loose object dirs, and in
> that case we shouldn't query the external ODB for anything.

You are thinking about fsck or gc?
Otherwise I don't think it would be clean to iterate through loose object dirs.

>> +ALT_SOURCE="$PWD/alt-repo/.git"
>> +export ALT_SOURCE
>> +write_script odb-helper <<\EOF
>> +GIT_DIR=$ALT_SOURCE; export GIT_DIR
>> +case "$1" in
>> +have)
>> +     git cat-file --batch-check --batch-all-objects |
>> +     awk '{print $1 " " $3 " " $2}'
>> +     ;;
>> +get_git_obj)
>> +     cat "$GIT_DIR"/objects/$(echo $2 | sed 's#..#&/#')
>> +     ;;
>> +esac
>> +EOF
>> +HELPER="\"$PWD\"/odb-helper"
>
> Thanks for the clear test. It is very obvious that "have" returns a list
> of objects, and "get_git_obj" returns the compressed loose object with
> the Git loose object header included.

Happy that you like how simple helpers can be with single-shot processes ;-)

>> +test_expect_success 'setup alternate repo' '
>> +     git init alt-repo &&
>> +     (cd alt-repo &&
>> +      test_commit one &&
>
> Probably better written as "test_commit -C alt-repo one".

Ok, I will use that.

>> +      test_commit two
>> +     ) &&
>> +     alt_head=`cd alt-repo && git rev-parse HEAD`
>
> I think the style is to use $() and "git -C alt-repo rev-parse HEAD".

Ok, I will change that.

Thanks,
Christian.


* Re: [PATCH v6 09/40] Add initial external odb support
  2017-09-27 16:46     ` Christian Couder
@ 2017-09-29 20:36       ` Jonathan Tan
  2017-10-02 14:34         ` Ben Peart
  2017-10-03  9:45         ` Christian Couder
  0 siblings, 2 replies; 49+ messages in thread
From: Jonathan Tan @ 2017-09-29 20:36 UTC (permalink / raw)
  To: Christian Couder
  Cc: git, Junio C Hamano, Jeff King, Ben Peart, Nguyen Thai Ngoc Duy,
	Mike Hommey, Lars Schneider, Eric Wong, Christian Couder

On Wed, 27 Sep 2017 18:46:30 +0200
Christian Couder <christian.couder@gmail.com> wrote:

> I am ok to split the patch series, but I am not sure that 01/40 to
> 09/40 is the right range for the first patch series.
> I would say that 01/40 to 07/40 is better as it can be seen as a
> separate refactoring.

I mentioned 09/40 because this is (as far as I can tell) the first one
that introduces a new design.

> I don't think single-shot processes would be a huge burden, because
> the code is simpler, and because for example for filters we already
> have single shot and long-running processes and no one complains about
> that. It's code that is useful as it makes it much easier for people
> to do some things (see the clone bundle example).
> 
> In fact in Git development we usually start to by first implementing
> simpler single-shot solutions, before thinking, when the need arise,
> to make it faster. So a perhaps an equally valid opinion could be to
> first only submit the patches for the single-shot protocol and later
> submit the rest of the series when we start getting feedback about how
> external odbs are used.

My concern is that, as far as I understand about the Microsoft use case,
we already know that we need the faster solution, so the need has
already arisen.

> And yeah I could change the order of the patch series to implement the
> long-running processes first and the single-shot process last, so that
> it could be possible to first get feedback about the long-running
> processes, before we decide to merge or not the single-shot stuff, but
> I don't think it would look like the most logical order.

My thinking was that we would just implement the long-running process
and not implement the single-shot process at all (besides maybe a script
in contrib/). If we are going to do both anyway, I agree that we should
do the single-shot process first.

> > And another possible issue is that we design ourselves into a corner.
> > Thinking about the use cases that I know about (the Android use case and
> > the Microsoft GVFS use case), I don't think we are doing that - for
> > Android, this means that large blob metadata needs to be part of the
> > design (and this patch series does provide for that), and for Microsoft
> > GVFS, "get" is relatively cheap, so a configuration option to not invoke
> > "have" first when loading a missing object might be sufficient.
> 
> If the helper does not advertise the "have" capability, the "have"
> instruction will not be sent to the helper, so the current design is
> already working for that case.

Ah, that's good to know.

> > And I think that my design can be extended to support a use case in
> > which, for example, blobs corresponding to a certain type of filename
> > (defined by a glob like in gitattributes) can be excluded during
> > fetch/clone, much like --blob-max-bytes, and they can be fetched either
> > through the built-in mechanism or through a custom hook.
> 
> Sure, we could probably rebuild something equivalent to what I did on
> top of your design.
> My opinion though is that if we want to eventually get to the same
> goal, it is better to first merge something that get us very close to
> the end goal and then add some improvements on top of it.

I agree - I mentioned that because I personally prefer to review smaller
patch sets at a time, and my patch set already includes a lot of the
same infrastructure needed by yours - for example, the places in the
code to dynamically fetch objects, exclusion of objects when fetching or
cloning, configuring the cloned repo when cloning, fsck, and gc.

> >  - I get compile errors when I "git am" these onto master. I think
> >    '#include "config.h"' is needed in some places.
> 
> It's strange because I get no compile errors even after a "make clean"
> from my branch.
> Could you show the actual errors?

I don't have the error messages with me now, but it was something about
a function being implicitly declared. You will probably get these errors
if you sync past commit e67a57f ("config: create config.h", 2017-06-15).

> > Any reason why you prefer to update the loose object functions than to
> > update the generic one (sha1_object_info_extended)? My concern with just
> > updating the loose object functions was that a caller might have
> > obtained the path by iterating through the loose object dirs, and in
> > that case we shouldn't query the external ODB for anything.
> 
> You are thinking about fsck or gc?
> Otherwise I don't think it would be clean to iterate through loose object dirs.

Yes, fsck and gc (well, prune, I think) do that. I agree that Git
typically doesn't do that (except for exceptional cases like fsck and
gc), but I was thinking about supporting existing code that does that
iteration, not introducing new code that does that.


* Re: [PATCH v6 00/40] Add initial experimental external ODB support
  2017-09-16  8:06 [PATCH v6 00/40] Add initial experimental external ODB support Christian Couder
                   ` (39 preceding siblings ...)
  2017-09-16  8:07 ` [PATCH v6 40/40] Doc/external-odb: explain transfering objects and metadata Christian Couder
@ 2017-10-02 14:18 ` Ben Peart
  2017-10-03  6:32   ` Christian Couder
  40 siblings, 1 reply; 49+ messages in thread
From: Ben Peart @ 2017-10-02 14:18 UTC (permalink / raw)
  To: Christian Couder, git
  Cc: Junio C Hamano, Jeff King, Ben Peart, Jonathan Tan,
	Nguyen Thai Ngoc Duy, Mike Hommey, Lars Schneider, Eric Wong,
	Christian Couder



On 9/16/2017 4:06 AM, Christian Couder wrote:

> Highlevel view of the patches in the series
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> 

This is a massive patch series, and IMO keeping it as a single
monolithic set of patches makes it difficult to review and unwieldy to
make progress on as an "all or nothing" series.

I highly recommend breaking it up into multiple smaller patch series
that can be reviewed and accepted individually.  I have to admit I've
skipped reviewing the last couple of iterations simply because of the
time investment required and the difficulty of separating out the
various pieces.

I think using your division of the patches below is a good place to 
start.  Many of these would be good changes even if they weren't part of 
the larger external ODB effort.

>      - Patch 1/40 is a small code cleanup that I already sent to the
>        mailing list but may be removed in the end due to ongoing work
>        on "git clone".
> 
>      - Patches 02/40 to 07/40 create a "Git/Packet.pm" module by
>        refactoring "t0021/rot13-filter.pl". Functions from this new
>        module will be used later in test scripts. According to Junio's
>        suggestion compared to v5 we now first fully refactor
>        "t0021/rot13-filter.pl" before creating the "Git/Packet.pm"
>        module.
> 

This seems like a very logical thing to do and should be split out so 
that progress can be made independently.

>      - Patches 08/40 to 16/40 create the external ODB infrastructure
>        in external-odb.{c,h} and odb-helper.{c,h} for the script mode.
>        The main changes compared to v5 are the following:
>          - we mark as "extern" functions in *.h files
> 	- we use sha1_pos() instead of sha1_entry_pos()
> 	- we check the size in the header when we 'get' a Git object
> 

This is the heart of the ODB infrastructure series.  I'll respond in 
more detail in the specific patches.

>      - Patches 17/40 to 23/40 improve lib-http to make it possible to
>        use it as an external ODB to test storing blobs in an HTTP
>        server. The "upload.sh" and "list.sh" files are now properly
>        indented and they use %% instead of % in parameter
>        substitutions compared to v5.
> 
>      - Patches 24/40 to 32/40 improve the external ODB infrastructure
>        to support sub-processes and make everything work using
>        them. The main changes compared to v5 are the following:
>          - we mark as "extern" functions in *.h files
> 	- we use the new subprocess_handshake() function
> 	- we check the size in the header when we 'get' a Git object
> 
>      - Patch 33/40 uses attributes to mark blobs that should be handled
>        by an external odb.
> 
>      - Patch 34/40 adds documentation about the external odb
>        mechanism. This patch has been much improved since v5.
> 
>      - Patches 35/40 to 39/40 add the --initial-refspec to git clone
>        along with tests.
> 
>      - Patch 40/40 adds documentation about transfering objects and
>        metadata when using the external odb mechanism. This patch is
>        new since v5.
> 
> Future work
> ~~~~~~~~~~~
> 
> There are still things that could be cleaned or improved. I think I
> may work on:
> 
>    - Integrate changes in recent "read-object-process" work by Ben Peart.
> 
>    - Better test all the combinations of the different modes with and
>      without "have" and "put_*" instructions.
> 
>    - Maybe implement the missing kinds of 'put' ('put_git_obj' and
>      'put_direct'), so that Git could pass either a git object or a
>      plain object, or ask the helper to retrieve it directly from
>      Git's object database.
> 
>    - Add more long running tests and improve tests in general.
> 
> Previous work and discussions
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> 
> (Sorry for the old Gmane links, I hope I will try to replace them with
> public-inbox.org at one point.)
> 
> Peff started to work on this and discuss this some years ago:
> 
> http://thread.gmane.org/gmane.comp.version-control.git/206886/focus=207040
> http://thread.gmane.org/gmane.comp.version-control.git/247171
> http://thread.gmane.org/gmane.comp.version-control.git/202902/focus=203020
> 
> His work, which is not compile-tested any more, is still there:
> 
> https://github.com/peff/git/commits/jk/external-odb-wip
> 
> Initial discussions about this new series are there:
> 
> http://thread.gmane.org/gmane.comp.version-control.git/288151/focus=295160
> 
> Version 1, 2, 3, 4 and 5 of this series are here:
> 
> https://public-inbox.org/git/20160613085546.11784-1-chriscool@tuxfamily.org/
> https://public-inbox.org/git/20160628181933.24620-1-chriscool@tuxfamily.org/
> https://public-inbox.org/git/20161130210420.15982-1-chriscool@tuxfamily.org/
> https://public-inbox.org/git/20170620075523.26961-1-chriscool@tuxfamily.org/
> https://public-inbox.org/git/20170803091926.1755-1-chriscool@tuxfamily.org/
> 
> Some of the discussions related to Ben Peart's work that is used by
> this series are here:
> 
> https://public-inbox.org/git/20170113155253.1644-1-benpeart@microsoft.com/
> https://public-inbox.org/git/20170322165220.5660-1-benpeart@microsoft.com/
> https://public-inbox.org/git/20170714132651.170708-1-benpeart@microsoft.com/
> 
> Links
> ~~~~~
> 
> This patch series is available here:
> 
> https://github.com/chriscool/git/commits/external-odb
> 
> Version 1, 2, 3, 4 and 5 are here:
> 
> https://github.com/chriscool/git/commits/gl-external-odb12
> https://github.com/chriscool/git/commits/gl-external-odb22
> https://github.com/chriscool/git/commits/gl-external-odb61
> https://github.com/chriscool/git/commits/gl-external-odb239
> https://github.com/chriscool/git/commits/gl-external-odb373
> 
> 
> Ben Peart (2):
>    odb-helper: add init_object_process()
>    Add t0450 to test 'get_direct' mechanism
> 
> Christian Couder (38):
>    builtin/clone: get rid of 'value' strbuf
>    t0021/rot13-filter: refactor packet reading functions
>    t0021/rot13-filter: improve 'if .. elsif .. else' style
>    t0021/rot13-filter: improve error message
>    t0021/rot13-filter: add packet_initialize()
>    t0021/rot13-filter: add capability functions
>    Add Git/Packet.pm from parts of t0021/rot13-filter.pl
>    sha1_file: prepare for external odbs
>    Add initial external odb support
>    odb-helper: add odb_helper_init() to send 'init' instruction
>    t0400: add 'put_raw_obj' instruction to odb-helper script
>    external odb: add 'put_raw_obj' support
>    external-odb: accept only blobs for now
>    t0400: add test for external odb write support
>    Add GIT_NO_EXTERNAL_ODB env variable
>    Add t0410 to test external ODB transfer
>    lib-httpd: pass config file to start_httpd()
>    lib-httpd: add upload.sh
>    lib-httpd: add list.sh
>    lib-httpd: add apache-e-odb.conf
>    odb-helper: add odb_helper_get_raw_object()
>    pack-objects: don't pack objects in external odbs
>    Add t0420 to test transfer to HTTP external odb
>    external-odb: add 'get_direct' support
>    odb-helper: add 'script_mode' to 'struct odb_helper'
>    Add t0460 to test passing git objects
>    odb-helper: add put_object_process()
>    Add t0470 to test passing raw objects
>    odb-helper: add have_object_process()
>    Add t0480 to test "have" capability and raw objects
>    external-odb: use 'odb=magic' attribute to mark odb blobs
>    Add Documentation/technical/external-odb.txt
>    clone: add 'initial' param to write_remote_refs()
>    clone: add --initial-refspec option
>    clone: disable external odb before initial clone
>    Add tests for 'clone --initial-refspec'
>    Add t0430 to test cloning using bundles
>    Doc/external-odb: explain transfering objects and metadata
> 
>   Documentation/technical/external-odb.txt |  447 +++++++++++++
>   Makefile                                 |    2 +
>   builtin/clone.c                          |   91 ++-
>   builtin/pack-objects.c                   |    4 +
>   cache.h                                  |   18 +
>   environment.c                            |    4 +
>   external-odb.c                           |  196 ++++++
>   external-odb.h                           |   12 +
>   odb-helper.c                             | 1076 ++++++++++++++++++++++++++++++
>   odb-helper.h                             |   45 ++
>   perl/Git/Packet.pm                       |  118 ++++
>   sha1_file.c                              |  155 +++--
>   t/lib-httpd.sh                           |    8 +-
>   t/lib-httpd/apache-e-odb.conf            |  214 ++++++
>   t/lib-httpd/list.sh                      |   41 ++
>   t/lib-httpd/upload.sh                    |   45 ++
>   t/t0021/rot13-filter.pl                  |  110 +--
>   t/t0400-external-odb.sh                  |   85 +++
>   t/t0410-transfer-e-odb.sh                |  147 ++++
>   t/t0420-transfer-http-e-odb.sh           |  152 +++++
>   t/t0430-clone-bundle-e-odb.sh            |   85 +++
>   t/t0450-read-object.sh                   |   28 +
>   t/t0450/read-object                      |   68 ++
>   t/t0460-read-object-git.sh               |   28 +
>   t/t0460/read-object-git                  |   78 +++
>   t/t0470-read-object-http-e-odb.sh        |  119 ++++
>   t/t0470/read-object-plain                |   83 +++
>   t/t0480-read-object-have-http-e-odb.sh   |  119 ++++
>   t/t0480/read-object-plain-have           |  103 +++
>   t/t5616-clone-initial-refspec.sh         |   48 ++
>   30 files changed, 3588 insertions(+), 141 deletions(-)
>   create mode 100644 Documentation/technical/external-odb.txt
>   create mode 100644 external-odb.c
>   create mode 100644 external-odb.h
>   create mode 100644 odb-helper.c
>   create mode 100644 odb-helper.h
>   create mode 100644 perl/Git/Packet.pm
>   create mode 100644 t/lib-httpd/apache-e-odb.conf
>   create mode 100644 t/lib-httpd/list.sh
>   create mode 100644 t/lib-httpd/upload.sh
>   create mode 100755 t/t0400-external-odb.sh
>   create mode 100755 t/t0410-transfer-e-odb.sh
>   create mode 100755 t/t0420-transfer-http-e-odb.sh
>   create mode 100755 t/t0430-clone-bundle-e-odb.sh
>   create mode 100755 t/t0450-read-object.sh
>   create mode 100755 t/t0450/read-object
>   create mode 100755 t/t0460-read-object-git.sh
>   create mode 100755 t/t0460/read-object-git
>   create mode 100755 t/t0470-read-object-http-e-odb.sh
>   create mode 100755 t/t0470/read-object-plain
>   create mode 100755 t/t0480-read-object-have-http-e-odb.sh
>   create mode 100755 t/t0480/read-object-plain-have
>   create mode 100755 t/t5616-clone-initial-refspec.sh
> 


* Re: [PATCH v6 09/40] Add initial external odb support
  2017-09-29 20:36       ` Jonathan Tan
@ 2017-10-02 14:34         ` Ben Peart
  2017-10-03  9:45         ` Christian Couder
  1 sibling, 0 replies; 49+ messages in thread
From: Ben Peart @ 2017-10-02 14:34 UTC (permalink / raw)
  To: Jonathan Tan, Christian Couder
  Cc: git, Junio C Hamano, Jeff King, Ben Peart, Nguyen Thai Ngoc Duy,
	Mike Hommey, Lars Schneider, Eric Wong, Christian Couder



On 9/29/2017 4:36 PM, Jonathan Tan wrote:
> On Wed, 27 Sep 2017 18:46:30 +0200
> Christian Couder <christian.couder@gmail.com> wrote:
> 
>> I am ok to split the patch series, but I am not sure that 01/40 to
>> 09/40 is the right range for the first patch series.
>> I would say that 01/40 to 07/40 is better as it can be seen as a
>> separate refactoring.
> 
> I mentioned 09/40 because this is (as far as I can tell) the first one
> that introduces a new design.
> 
>> I don't think single-shot processes would be a huge burden, because
>> the code is simpler, and because for example for filters we already
>> have single shot and long-running processes and no one complains about
>> that. It's code that is useful as it makes it much easier for people
>> to do some things (see the clone bundle example).
>>
>> In fact in Git development we usually start to by first implementing
>> simpler single-shot solutions, before thinking, when the need arise,
>> to make it faster. So a perhaps an equally valid opinion could be to
>> first only submit the patches for the single-shot protocol and later
>> submit the rest of the series when we start getting feedback about how
>> external odbs are used.
> 
> My concern is that, as far as I understand about the Microsoft use case,
> we already know that we need the faster solution, so the need has
> already arisen.
> 
>> And yeah I could change the order of the patch series to implement the
>> long-running processes first and the single-shot process last, so that
>> it could be possible to first get feedback about the long-running
>> processes, before we decide to merge or not the single-shot stuff, but
>> I don't think it would look like the most logical order.
> 
> My thinking was that we would just implement the long-running process
> and not implement the single-shot process at all (besides maybe a script
> in contrib/). If we are going to do both anyway, I agree that we should
> do the single-shot process first.
> 

I agree with Jonathan's feedback.  We already know the performance of
single-shot requests is insufficient, as there are scenarios where
there will potentially be many missing objects that need to be
retrieved to complete a git operation (e.g. checkout).  As a result, we
will need the long-running process model so, overall, it will be
simpler to focus entirely on that model and skip the single-shot model.

If the complexity of the process model is considered too high, we can
provide helper code, in both script and native code, that can be used
to reduce the cost/complexity.  I believe we have most of this already
with the existing sub-process.c/h module and the Git/Packet.pm
refactoring you have done earlier in this series.

Providing high-quality working samples of both is another way to reduce
the cost and improve the quality.

>>> And another possible issue is that we design ourselves into a corner.
>>> Thinking about the use cases that I know about (the Android use case and
>>> the Microsoft GVFS use case), I don't think we are doing that - for
>>> Android, this means that large blob metadata needs to be part of the
>>> design (and this patch series does provide for that), and for Microsoft
>>> GVFS, "get" is relatively cheap, so a configuration option to not invoke
>>> "have" first when loading a missing object might be sufficient.
>>
>> If the helper does not advertise the "have" capability, the "have"
>> instruction will not be sent to the helper, so the current design is
>> already working for that case.
> 
> Ah, that's good to know.
> 
>>> And I think that my design can be extended to support a use case in
>>> which, for example, blobs corresponding to a certain type of filename
>>> (defined by a glob like in gitattributes) can be excluded during
>>> fetch/clone, much like --blob-max-bytes, and they can be fetched either
>>> through the built-in mechanism or through a custom hook.
>>
>> Sure, we could probably rebuild something equivalent to what I did on
>> top of your design.
>> My opinion though is that if we want to eventually get to the same
>> goal, it is better to first merge something that get us very close to
>> the end goal and then add some improvements on top of it.
> 
> I agree - I mentioned that because I personally prefer to review smaller
> patch sets at a time, and my patch set already includes a lot of the
> same infrastructure needed by yours - for example, the places in the
> code to dynamically fetch objects, exclusion of objects when fetching or
> cloning, configuring the cloned repo when cloning, fsck, and gc.
> 

I agree here as well.  I think smaller patch sets that we can
review/approve independently will be more effective.

I think Jonathan has a lot of the infrastructure support in his partial
clone series.  I'd like to take that work, add your external ODB work +
Jeff's filtering work, and come up with a solution that combines the
best of all three. :)

>>>   - I get compile errors when I "git am" these onto master. I think
>>>     '#include "config.h"' is needed in some places.
>>
>> It's strange because I get no compile errors even after a "make clean"
>> from my branch.
>> Could you show the actual errors?
> 
> I don't have the error messages with me now, but it was something about
> a function being implicitly declared. You will probably get these errors
> if you sync past commit e67a57f ("config: create config.h", 2017-06-15).
> 
>>> Any reason why you prefer to update the loose object functions than to
>>> update the generic one (sha1_object_info_extended)? My concern with just
>>> updating the loose object functions was that a caller might have
>>> obtained the path by iterating through the loose object dirs, and in
>>> that case we shouldn't query the external ODB for anything.
>>
>> You are thinking about fsck or gc?
>> Otherwise I don't think it would be clean to iterate through loose object dirs.
> 
> Yes, fsck and gc (well, prune, I think) do that. I agree that Git
> typically doesn't do that (except for exceptional cases like fsck and
> gc), but I was thinking about supporting existing code that does that
> iteration, not introducing new code that does that.
> 


* Re: [PATCH v6 00/40] Add initial experimental external ODB support
  2017-10-02 14:18 ` [PATCH v6 00/40] Add initial experimental external ODB support Ben Peart
@ 2017-10-03  6:32   ` Christian Couder
  0 siblings, 0 replies; 49+ messages in thread
From: Christian Couder @ 2017-10-03  6:32 UTC (permalink / raw)
  To: Ben Peart
  Cc: git, Junio C Hamano, Jeff King, Ben Peart, Jonathan Tan,
	Nguyen Thai Ngoc Duy, Mike Hommey, Lars Schneider, Eric Wong,
	Christian Couder

On Mon, Oct 2, 2017 at 4:18 PM, Ben Peart <peartben@gmail.com> wrote:
>
> On 9/16/2017 4:06 AM, Christian Couder wrote:
>
>>      - Patch 1/40 is a small code cleanup that I already sent to the
>>        mailing list but may be removed in the end due to ongoing work
>>        on "git clone".
>>
>>      - Patches 02/40 to 07/40 create a "Git/Packet.pm" module by
>>        refactoring "t0021/rot13-filter.pl". Functions from this new
>>        module will be used later in test scripts. According to Junio's
>>        suggestion compared to v5 we now first fully refactor
>>        "t0021/rot13-filter.pl" before creating the "Git/Packet.pm"
>>        module.
>
> This seems like a very logical thing to do and should be split out so that
> progress can be made independently.

Ok, I will split according to my division of the patches.


* Re: [PATCH v6 09/40] Add initial external odb support
  2017-09-29 20:36       ` Jonathan Tan
  2017-10-02 14:34         ` Ben Peart
@ 2017-10-03  9:45         ` Christian Couder
  2017-10-04  0:15           ` Jonathan Tan
  1 sibling, 1 reply; 49+ messages in thread
From: Christian Couder @ 2017-10-03  9:45 UTC (permalink / raw)
  To: Jonathan Tan
  Cc: git, Junio C Hamano, Jeff King, Ben Peart, Nguyen Thai Ngoc Duy,
	Mike Hommey, Lars Schneider, Eric Wong, Christian Couder

On Fri, Sep 29, 2017 at 10:36 PM, Jonathan Tan <jonathantanmy@google.com> wrote:
> On Wed, 27 Sep 2017 18:46:30 +0200
> Christian Couder <christian.couder@gmail.com> wrote:

>> I don't think single-shot processes would be a huge burden, because
>> the code is simpler, and because for example for filters we already
>> have single shot and long-running processes and no one complains about
>> that. It's code that is useful as it makes it much easier for people
>> to do some things (see the clone bundle example).
>>
>> In fact in Git development we usually start to by first implementing
>> simpler single-shot solutions, before thinking, when the need arise,
>> to make it faster. So a perhaps an equally valid opinion could be to
>> first only submit the patches for the single-shot protocol and later
>> submit the rest of the series when we start getting feedback about how
>> external odbs are used.
>
> My concern is that, as far as I understand about the Microsoft use case,
> we already know that we need the faster solution, so the need has
> already arisen.

Yeah, some people need the faster solution, but my opinion is that
many other people would prefer the single-shot protocol.
If all you want to do is a simple resumable clone using bundles, for
example, then the long-running process solution is very much overkill.

For example, with filters there are people using them to do keyword
expansion (maybe to emulate the way Subversion and CVS substitute
keywords like $Id$, $Author$ and so on). It would be really bad to
deprecate the single-shot filters and tell those people they now have
to use long-running processes because we don't want to maintain the
small amount of code that makes single-shot filters work.

The Microsoft GVFS use case is just one use case that is very far from
what most people need. And my opinion is that many more people could
benefit from the single-shot protocol. For example, many people and
admins could benefit from resumable clones using bundles, and, if I
remove the single-shot protocol, this use case will be unnecessarily
more difficult to implement, in the same way that keyword expansion
would be unnecessarily more difficult to implement if we removed the
single-shot filters.

See the first article in
https://git.github.io/rev_news/2016/03/16/edition-13/ about resumable
clone if you are not convinced that resumable clones are an old and
important problem.

>> And yeah I could change the order of the patch series to implement the
>> long-running processes first and the single-shot process last, so that
>> it could be possible to first get feedback about the long-running
>> processes, before we decide to merge or not the single-shot stuff, but
>> I don't think it would look like the most logical order.
>
> My thinking was that we would just implement the long-running process
> and not implement the single-shot process at all (besides maybe a script
> in contrib/). If we are going to do both anyway, I agree that we should
> do the single-shot process first.

Nice to hear that!

>> > And I think that my design can be extended to support a use case in
>> > which, for example, blobs corresponding to a certain type of filename
>> > (defined by a glob like in gitattributes) can be excluded during
>> > fetch/clone, much like --blob-max-bytes, and they can be fetched either
>> > through the built-in mechanism or through a custom hook.
>>
>> Sure, we could probably rebuild something equivalent to what I did on
>> top of your design.
>> My opinion though is that if we want to eventually get to the same
>> goal, it is better to first merge something that get us very close to
>> the end goal and then add some improvements on top of it.
>
> I agree

So are you ok to rebase your patch series on top of my patch series?

My opinion is that my patch series is trying to get to the end goal,
and succeeding to a very large extent, with as few deep technical
changes as possible, and that it is the right way to approach this
problem for the following reasons:

1) The root problem is that the current object stores (packfiles and
loose object files) are not good ways to store some objects,
especially some blobs.

2) This root problem cannot be dealt with by Git itself without any
help from external programs, because Git cannot realistically
implement many different object stores (like http servers, artifact
stores, etc). So Git must be improved so that it becomes capable of
communicating with external object stores.

3) As the Git protocol uses packfiles to send objects and is not very
flexible, it might be better if external stores can also be used to
transfer objects that are not stored any more in the current object
stores. (As packfiles are not good for storing some objects, they are
probably also not a good format for sending them. Also, as the Git
protocol is not resumable, we might easily be able to implement
resumable clones if we let external stores handle some transfer.)

4) Making it easy and flexible to exchange objects (and maybe meta
information) with the external stores is very important.

5) Protocol changes are more difficult than many other code changes,
so we should care a lot about the protocol between Git and external
stores.

> - I mentioned that because I personally prefer to review smaller
> patch sets at a time,

I am ok with sending small patch sets, and I will send smaller patch
sets about this from now on.

> and my patch set already includes a lot of the
> same infrastructure needed by yours - for example, the places in the
> code to dynamically fetch objects, exclusion of objects when fetching or
> cloning, configuring the cloned repo when cloning, fsck, and gc.

I agree that your patch set already includes some infrastructure that
could be used by my work, and your patch sets are perhaps implementing
some of this infrastructure better than in my work (I haven't taken a
deep look). But I really think that the right approach is to focus
first on designing a flexible protocol between Git and external
stores. Then the infrastructure work should be related to improving or
enabling the flexible protocol and the communication between Git and
external stores.

Doing infrastructure work first, and improving things on top of this
new infrastructure without relying first on a design of the protocol
between Git and external stores, is not the best approach, as I think
we might over-engineer some infrastructure work or base some user
interfaces on the infrastructure work and not on the end goal.

For example, if we improve the current protocol, which is not
necessarily a bad thing in itself, we might forget that for resumable
clones it is much better if we just let external stores and helpers
handle the transfer.

I am not saying that doing infrastructure work is bad or will not in
the end let us reach our goals, but I see it as something that could
distract or mislead us from focusing first on the protocol between Git
and external stores.

>> >  - I get compile errors when I "git am" these onto master. I think
>> >    '#include "config.h"' is needed in some places.
>>
>> It's strange because I get no compile errors even after a "make clean"
>> from my branch.
>> Could you show the actual errors?
>
> I don't have the error messages with me now, but it was something about
> a function being implicitly declared. You will probably get these errors
> if you sync past commit e67a57f ("config: create config.h", 2017-06-15).

I am past this commit and I get no errors.
I rebased on top of ea220ee40c ("The eleventh batch for 2.15").

>> > Any reason why you prefer to update the loose object functions than to
>> > update the generic one (sha1_object_info_extended)? My concern with just
>> > updating the loose object functions was that a caller might have
>> > obtained the path by iterating through the loose object dirs, and in
>> > that case we shouldn't query the external ODB for anything.
>>
>> You are thinking about fsck or gc?
>> Otherwise I don't think it would be clean to iterate through loose object dirs.
>
> Yes, fsck and gc (well, prune, I think) do that. I agree that Git
> typically doesn't do that (except for exceptional cases like fsck and
> gc), but I was thinking about supporting existing code that does that
> iteration, not introducing new code that does that.

I haven't taken a look at how fsck and prune work, and this is still
code that Peff wrote (though a long time ago), so I tend to trust it.
But I will take a look, and if it is indeed better for them, I am ok
with updating sha1_object_info_extended() instead of the loose object
functions.

Thanks.


* Re: [PATCH v6 09/40] Add initial external odb support
  2017-10-03  9:45         ` Christian Couder
@ 2017-10-04  0:15           ` Jonathan Tan
  0 siblings, 0 replies; 49+ messages in thread
From: Jonathan Tan @ 2017-10-04  0:15 UTC (permalink / raw)
  To: Christian Couder
  Cc: git, Junio C Hamano, Jeff King, Ben Peart, Nguyen Thai Ngoc Duy,
	Mike Hommey, Lars Schneider, Eric Wong, Christian Couder

On Tue, Oct 3, 2017 at 2:45 AM, Christian Couder
<christian.couder@gmail.com> wrote:
> Yeah, some people need the faster solution, but my opinion is that
> many other people would prefer the single shot protocol.
> If all you want to do is a simple resumable clone using bundles for
> example, then the long running process solution is very much overkill.
>
> For example with filters there are people using them to do keyword
> expansion (maybe to emulate the way Subversion and CVS substitutes
> keywords like $Id$, $Author$ and so on). It would be really bad to
> deprecate the single shot filters and tell those people they now have
> to use long running processes because we don't want to maintain the
> small code that make single shot filters work.
>
> The Microsoft GVFS use case is just one use case that is very far from
> what most people need. And my opinion is that many more people could
> benefit from the single shot protocol. For example many people and
> admins could benefit from resumable clones using bundles and, if I
> remove the single shot protocol, this use case will be unnecessarily
> more difficult to implement in the same way as keyword expansion would
> be unnecessarily more difficult to implement if we removed the single
> shot filters.

The idea that some users will prefer writing to the single-shot
protocol is reasonable to me, but I think that providing a contrib/
Perl script that wraps something that speaks the single-shot protocol
is sufficient. This results in less C code, and a better separation of
concerns (I prefer 1 exit point and 1 adapter over 2 exit points).

> I agree that your patch set already includes some infrastructure that
> could be used by my work, and your patch sets are perhaps implementing
> some of this infrastructure better than in my work (I haven't taken a
> deep look). But I really think that the right approach is to focus
> first on designing a flexible protocol between Git and external
> stores. Then the infrastructure work should be related to improving or
> enabling the flexible protocol and the communication between Git and
> external stores.
>
> Doing infrastructure work first and improving things on top of this
> new infrastructure without relying first on a design of the protocol
> between Git and external stores is not the best approach as I think we
> might over engineer some infrastructure work or base some user
> interfaces on the infrastructure work and not on the end goal.
>
> For example if we improve the current protocol, which is not
> necessarily a bad thing in itself, we might forget that for resumable
> clone it is much better if we just let external stores and helpers
> handle the transfer.
>
> I am not saying that doing infrastructure work is bad or will not in
> the end let us reach our goals, but I see it as something that is
> potentially distracting, or misleading, from focusing first on the
> protocol between Git and external stores.

I think that the infrastructure really needs to be considered when
designing the protocol. In particular, we had to consider the needs of
the connectivity check in fsck and the repacking in GC when designing
what the promisor remote (or ODB, in this case) needs to tell us and
what, if any, postprocessing needs to be done. In the end, I settled
on tracking which objects came from the promisor remote and which did
not, which works in my design (which I have tried to ensure fits both
our use case and Microsoft's). But that design won't work in what I
understand to be the ODB case, because (at least) (i) you can have
multiple ODBs, and (ii) Git does not have direct access to the objects
stored within the ODBs. So some more design needs to be done.


Thread overview: 49+ messages
2017-09-16  8:06 [PATCH v6 00/40] Add initial experimental external ODB support Christian Couder
2017-09-16  8:06 ` [PATCH v6 01/40] builtin/clone: get rid of 'value' strbuf Christian Couder
2017-09-16  8:06 ` [PATCH v6 02/40] t0021/rot13-filter: refactor packet reading functions Christian Couder
2017-09-16  8:06 ` [PATCH v6 03/40] t0021/rot13-filter: improve 'if .. elsif .. else' style Christian Couder
2017-09-16  8:06 ` [PATCH v6 04/40] t0021/rot13-filter: improve error message Christian Couder
2017-09-16  8:06 ` [PATCH v6 05/40] t0021/rot13-filter: add packet_initialize() Christian Couder
2017-09-16  8:06 ` [PATCH v6 06/40] t0021/rot13-filter: add capability functions Christian Couder
2017-09-16  8:06 ` [PATCH v6 07/40] Add Git/Packet.pm from parts of t0021/rot13-filter.pl Christian Couder
2017-09-16  8:06 ` [PATCH v6 08/40] sha1_file: prepare for external odbs Christian Couder
2017-09-16  8:07 ` [PATCH v6 09/40] Add initial external odb support Christian Couder
2017-09-19 17:45   ` Jonathan Tan
2017-09-27 16:46     ` Christian Couder
2017-09-29 20:36       ` Jonathan Tan
2017-10-02 14:34         ` Ben Peart
2017-10-03  9:45         ` Christian Couder
2017-10-04  0:15           ` Jonathan Tan
2017-09-16  8:07 ` [PATCH v6 10/40] odb-helper: add odb_helper_init() to send 'init' instruction Christian Couder
2017-09-16  8:07 ` [PATCH v6 11/40] t0400: add 'put_raw_obj' instruction to odb-helper script Christian Couder
2017-09-16  8:07 ` [PATCH v6 12/40] external odb: add 'put_raw_obj' support Christian Couder
2017-09-16  8:07 ` [PATCH v6 13/40] external-odb: accept only blobs for now Christian Couder
2017-09-16  8:07 ` [PATCH v6 14/40] t0400: add test for external odb write support Christian Couder
2017-09-16  8:07 ` [PATCH v6 15/40] Add GIT_NO_EXTERNAL_ODB env variable Christian Couder
2017-09-16  8:07 ` [PATCH v6 16/40] Add t0410 to test external ODB transfer Christian Couder
2017-09-16  8:07 ` [PATCH v6 17/40] lib-httpd: pass config file to start_httpd() Christian Couder
2017-09-16  8:07 ` [PATCH v6 18/40] lib-httpd: add upload.sh Christian Couder
2017-09-16  8:07 ` [PATCH v6 19/40] lib-httpd: add list.sh Christian Couder
2017-09-16  8:07 ` [PATCH v6 20/40] lib-httpd: add apache-e-odb.conf Christian Couder
2017-09-16  8:07 ` [PATCH v6 21/40] odb-helper: add odb_helper_get_raw_object() Christian Couder
2017-09-16  8:07 ` [PATCH v6 22/40] pack-objects: don't pack objects in external odbs Christian Couder
2017-09-16  8:07 ` [PATCH v6 23/40] Add t0420 to test transfer to HTTP external odb Christian Couder
2017-09-16  8:07 ` [PATCH v6 24/40] external-odb: add 'get_direct' support Christian Couder
2017-09-16  8:07 ` [PATCH v6 25/40] odb-helper: add 'script_mode' to 'struct odb_helper' Christian Couder
2017-09-16  8:07 ` [PATCH v6 26/40] odb-helper: add init_object_process() Christian Couder
2017-09-16  8:07 ` [PATCH v6 27/40] Add t0450 to test 'get_direct' mechanism Christian Couder
2017-09-16  8:07 ` [PATCH v6 28/40] Add t0460 to test passing git objects Christian Couder
2017-09-16  8:07 ` [PATCH v6 29/40] odb-helper: add put_object_process() Christian Couder
2017-09-16  8:07 ` [PATCH v6 30/40] Add t0470 to test passing raw objects Christian Couder
2017-09-16  8:07 ` [PATCH v6 31/40] odb-helper: add have_object_process() Christian Couder
2017-09-16  8:07 ` [PATCH v6 32/40] Add t0480 to test "have" capability and raw objects Christian Couder
2017-09-16  8:07 ` [PATCH v6 33/40] external-odb: use 'odb=magic' attribute to mark odb blobs Christian Couder
2017-09-16  8:07 ` [PATCH v6 34/40] Add Documentation/technical/external-odb.txt Christian Couder
2017-09-16  8:07 ` [PATCH v6 35/40] clone: add 'initial' param to write_remote_refs() Christian Couder
2017-09-16  8:07 ` [PATCH v6 36/40] clone: add --initial-refspec option Christian Couder
2017-09-16  8:07 ` [PATCH v6 37/40] clone: disable external odb before initial clone Christian Couder
2017-09-16  8:07 ` [PATCH v6 38/40] Add tests for 'clone --initial-refspec' Christian Couder
2017-09-16  8:07 ` [PATCH v6 39/40] Add t0430 to test cloning using bundles Christian Couder
2017-09-16  8:07 ` [PATCH v6 40/40] Doc/external-odb: explain transfering objects and metadata Christian Couder
2017-10-02 14:18 ` [PATCH v6 00/40] Add initial experimental external ODB support Ben Peart
2017-10-03  6:32   ` Christian Couder
