git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Jonathan Tan <jonathantanmy@google.com>
To: Christian Couder <christian.couder@gmail.com>
Cc: git@vger.kernel.org, Junio C Hamano <gitster@pobox.com>,
	Jeff King <peff@peff.net>, Ben Peart <Ben.Peart@microsoft.com>,
	Nguyen Thai Ngoc Duy <pclouds@gmail.com>,
	Mike Hommey <mh@glandium.org>,
	Lars Schneider <larsxschneider@gmail.com>,
	Eric Wong <e@80x24.org>,
	Christian Couder <chriscool@tuxfamily.org>
Subject: Re: [PATCH v6 09/40] Add initial external odb support
Date: Tue, 19 Sep 2017 10:45:53 -0700	[thread overview]
Message-ID: <20170919104553.494d7d3b@twelve2.svl.corp.google.com> (raw)
In-Reply-To: <20170916080731.13925-10-chriscool@tuxfamily.org>

I wonder if it's better to get a change like this (PATCH v6 09/40 and
any of the previous patches that this depends on) in and then build on
it rather than to review the whole patch set at a time.  This would
reduce ripple effects (of needing to change later patches in a patch set
multiple times unnecessarily) and help collaboration (in that multiple
people can write patches, since the foundation would already be laid).

The same concerns about fsck apply, but that shouldn't be a problem,
since this patch provides the same internal API as mine ("get" function
taking in a single hash, and "have" function taking in a single hash) so
it shouldn't be too difficult to adapt my fsck and gc patches [1]. (I
can do that if necessary.)

[1] https://public-inbox.org/git/20170915134343.3814dc38@twelve2.svl.corp.google.com/

One possible issue (with committing something early) is that later work
(for example, a fast long-running process protocol) will make the
earlier work (for example, here, a simple single-shot protocol)
obsolete, while saddling us with the necessity of maintaining the
earlier one. To that end, if we want to start with the support for a
hook, a better approach might be to only code the fast long-running
process protocol, and put a wrapper script in contrib/ that can wrap a
single-shot process in a long-running process.

And another possible issue is that we design ourselves into a corner.
Thinking about the use cases that I know about (the Android use case and
the Microsoft GVFS use case), I don't think we are doing that - for
Android, this means that large blob metadata needs to be part of the
design (and this patch series does provide for that), and for Microsoft
GVFS, "get" is relatively cheap, so a configuration option to not invoke
"have" first when loading a missing object might be sufficient.

As for the design itself (including fetch and clone), it differs from my
patches (linked above as [1]) in that mine is self-contained (requiring
only an updated Git server and Git client) whereas this, as far as I can
tell, requires an external process and some measure of coordination
between the administrator of the server and the client user (for
example, the client must have the same ODB mechanism as the server, if
not, the server might omit certain blobs that the client does not know
how to fetch).

And I think that my design can be extended to support a use case in
which, for example, blobs corresponding to a certain type of filename
(defined by a glob like in gitattributes) can be excluded during
fetch/clone, much like --blob-max-bytes, and they can be fetched either
through the built-in mechanism or through a custom hook.

For those reasons, I still lean towards my design, but if we do want to
go with this design, here are my comments about this patch...

First of all:
 - You'll probably need to add a repository extension.
 - I get compile errors when I "git am" these onto master. I think
   '#include "config.h"' is needed in some places.

On Sat, 16 Sep 2017 10:07:00 +0200
Christian Couder <christian.couder@gmail.com> wrote:

> The external-odb.{c,h} files contains the functions that are
> called by the rest of Git from "sha1_file.c".
> 
> The odb-helper.{c,h} files contains the functions to
> actually implement communication with the external scripts or
> processes that will manage external git objects.
> 
> For now only script mode is supported, and only the 'have' and
> 'get_git_obj' instructions are supported.

This "have", as I see from this commit, is more like a "list" command in
that it lists all hashes that it knows about, and does not check if a
given hash exists.

> +static struct odb_helper *helpers;
> +static struct odb_helper **helpers_tail = &helpers;

This could be done with the helpers in list.h instead.

> +int external_odb_get_object(const unsigned char *sha1)
> +{
> +	struct odb_helper *o;
> +	const char *path;
> +
> +	if (!external_odb_has_object(sha1))
> +		return -1;
> +
> +	path = sha1_file_name_alt(external_odb_root(), sha1);

If the purpose of making these functions global in the previous patch is
just for temporary names, I don't think it's necessary for them to be
global. Just concatenate the hex SHA1 to external_odb_root()?

>  /* Returns 1 if we have successfully freshened the file, 0 otherwise. */
> @@ -667,7 +684,7 @@ static int check_and_freshen_nonlocal(const unsigned char *sha1, int freshen)
>  		if (check_and_freshen_file(path, freshen))
>  			return 1;
>  	}
> -	return 0;
> +	return external_odb_has_object(sha1);
>  }
>  
>  static int check_and_freshen(const unsigned char *sha1, int freshen)
> @@ -824,6 +841,9 @@ static int stat_sha1_file(const unsigned char *sha1, struct stat *st,
>  			return 0;
>  	}
>  
> +	if (!external_odb_get_object(sha1) && !lstat(*path, st))
> +		return 0;
> +
>  	return -1;
>  }
>  
> @@ -859,7 +879,14 @@ static int open_sha1_file(const unsigned char *sha1, const char **path)
>  	if (fd >= 0)
>  		return fd;
>  
> -	return open_sha1_file_alt(sha1, path);
> +	fd = open_sha1_file_alt(sha1, path);
> +	if (fd >= 0)
> +		return fd;
> +
> +	if (!external_odb_get_object(sha1))
> +		fd = open_sha1_file_alt(sha1, path);
> +
> +	return fd;
>  }

Any reason why you prefer to update the loose object functions than to
update the generic one (sha1_object_info_extended)? My concern with just
updating the loose object functions was that a caller might have
obtained the path by iterating through the loose object dirs, and in
that case we shouldn't query the external ODB for anything.

> +ALT_SOURCE="$PWD/alt-repo/.git"
> +export ALT_SOURCE
> +write_script odb-helper <<\EOF
> +GIT_DIR=$ALT_SOURCE; export GIT_DIR
> +case "$1" in
> +have)
> +	git cat-file --batch-check --batch-all-objects |
> +	awk '{print $1 " " $3 " " $2}'
> +	;;
> +get_git_obj)
> +	cat "$GIT_DIR"/objects/$(echo $2 | sed 's#..#&/#')
> +	;;
> +esac
> +EOF
> +HELPER="\"$PWD\"/odb-helper"

Thanks for the clear test. It is very obvious that "have" returns a list
of objects, and "get_git_obj" returns the compressed loose object with
the Git loose object header included.

> +
> +test_expect_success 'setup alternate repo' '
> +	git init alt-repo &&
> +	(cd alt-repo &&
> +	 test_commit one &&

Probably better written as "test_commit -C alt-repo one".

> +	 test_commit two
> +	) &&
> +	alt_head=`cd alt-repo && git rev-parse HEAD`

I think the style is to use $() and "git -C alt-repo rev-parse HEAD".

  reply	other threads:[~2017-09-19 17:46 UTC|newest]

Thread overview: 49+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-09-16  8:06 [PATCH v6 00/40] Add initial experimental external ODB support Christian Couder
2017-09-16  8:06 ` [PATCH v6 01/40] builtin/clone: get rid of 'value' strbuf Christian Couder
2017-09-16  8:06 ` [PATCH v6 02/40] t0021/rot13-filter: refactor packet reading functions Christian Couder
2017-09-16  8:06 ` [PATCH v6 03/40] t0021/rot13-filter: improve 'if .. elsif .. else' style Christian Couder
2017-09-16  8:06 ` [PATCH v6 04/40] t0021/rot13-filter: improve error message Christian Couder
2017-09-16  8:06 ` [PATCH v6 05/40] t0021/rot13-filter: add packet_initialize() Christian Couder
2017-09-16  8:06 ` [PATCH v6 06/40] t0021/rot13-filter: add capability functions Christian Couder
2017-09-16  8:06 ` [PATCH v6 07/40] Add Git/Packet.pm from parts of t0021/rot13-filter.pl Christian Couder
2017-09-16  8:06 ` [PATCH v6 08/40] sha1_file: prepare for external odbs Christian Couder
2017-09-16  8:07 ` [PATCH v6 09/40] Add initial external odb support Christian Couder
2017-09-19 17:45   ` Jonathan Tan [this message]
2017-09-27 16:46     ` Christian Couder
2017-09-29 20:36       ` Jonathan Tan
2017-10-02 14:34         ` Ben Peart
2017-10-03  9:45         ` Christian Couder
2017-10-04  0:15           ` Jonathan Tan
2017-09-16  8:07 ` [PATCH v6 10/40] odb-helper: add odb_helper_init() to send 'init' instruction Christian Couder
2017-09-16  8:07 ` [PATCH v6 11/40] t0400: add 'put_raw_obj' instruction to odb-helper script Christian Couder
2017-09-16  8:07 ` [PATCH v6 12/40] external odb: add 'put_raw_obj' support Christian Couder
2017-09-16  8:07 ` [PATCH v6 13/40] external-odb: accept only blobs for now Christian Couder
2017-09-16  8:07 ` [PATCH v6 14/40] t0400: add test for external odb write support Christian Couder
2017-09-16  8:07 ` [PATCH v6 15/40] Add GIT_NO_EXTERNAL_ODB env variable Christian Couder
2017-09-16  8:07 ` [PATCH v6 16/40] Add t0410 to test external ODB transfer Christian Couder
2017-09-16  8:07 ` [PATCH v6 17/40] lib-httpd: pass config file to start_httpd() Christian Couder
2017-09-16  8:07 ` [PATCH v6 18/40] lib-httpd: add upload.sh Christian Couder
2017-09-16  8:07 ` [PATCH v6 19/40] lib-httpd: add list.sh Christian Couder
2017-09-16  8:07 ` [PATCH v6 20/40] lib-httpd: add apache-e-odb.conf Christian Couder
2017-09-16  8:07 ` [PATCH v6 21/40] odb-helper: add odb_helper_get_raw_object() Christian Couder
2017-09-16  8:07 ` [PATCH v6 22/40] pack-objects: don't pack objects in external odbs Christian Couder
2017-09-16  8:07 ` [PATCH v6 23/40] Add t0420 to test transfer to HTTP external odb Christian Couder
2017-09-16  8:07 ` [PATCH v6 24/40] external-odb: add 'get_direct' support Christian Couder
2017-09-16  8:07 ` [PATCH v6 25/40] odb-helper: add 'script_mode' to 'struct odb_helper' Christian Couder
2017-09-16  8:07 ` [PATCH v6 26/40] odb-helper: add init_object_process() Christian Couder
2017-09-16  8:07 ` [PATCH v6 27/40] Add t0450 to test 'get_direct' mechanism Christian Couder
2017-09-16  8:07 ` [PATCH v6 28/40] Add t0460 to test passing git objects Christian Couder
2017-09-16  8:07 ` [PATCH v6 29/40] odb-helper: add put_object_process() Christian Couder
2017-09-16  8:07 ` [PATCH v6 30/40] Add t0470 to test passing raw objects Christian Couder
2017-09-16  8:07 ` [PATCH v6 31/40] odb-helper: add have_object_process() Christian Couder
2017-09-16  8:07 ` [PATCH v6 32/40] Add t0480 to test "have" capability and raw objects Christian Couder
2017-09-16  8:07 ` [PATCH v6 33/40] external-odb: use 'odb=magic' attribute to mark odb blobs Christian Couder
2017-09-16  8:07 ` [PATCH v6 34/40] Add Documentation/technical/external-odb.txt Christian Couder
2017-09-16  8:07 ` [PATCH v6 35/40] clone: add 'initial' param to write_remote_refs() Christian Couder
2017-09-16  8:07 ` [PATCH v6 36/40] clone: add --initial-refspec option Christian Couder
2017-09-16  8:07 ` [PATCH v6 37/40] clone: disable external odb before initial clone Christian Couder
2017-09-16  8:07 ` [PATCH v6 38/40] Add tests for 'clone --initial-refspec' Christian Couder
2017-09-16  8:07 ` [PATCH v6 39/40] Add t0430 to test cloning using bundles Christian Couder
2017-09-16  8:07 ` [PATCH v6 40/40] Doc/external-odb: explain transfering objects and metadata Christian Couder
2017-10-02 14:18 ` [PATCH v6 00/40] Add initial experimental external ODB support Ben Peart
2017-10-03  6:32   ` Christian Couder

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170919104553.494d7d3b@twelve2.svl.corp.google.com \
    --to=jonathantanmy@google.com \
    --cc=Ben.Peart@microsoft.com \
    --cc=chriscool@tuxfamily.org \
    --cc=christian.couder@gmail.com \
    --cc=e@80x24.org \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=larsxschneider@gmail.com \
    --cc=mh@glandium.org \
    --cc=pclouds@gmail.com \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).