git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Junio C Hamano <gitster@pobox.com>
To: Christian Couder <christian.couder@gmail.com>
Cc: git@vger.kernel.org, Jeff King <peff@peff.net>,
	Nguyen Thai Ngoc Duy <pclouds@gmail.com>,
	Mike Hommey <mh@glandium.org>,
	Lars Schneider <larsxschneider@gmail.com>,
	Eric Wong <e@80x24.org>,
	Christian Couder <chriscool@tuxfamily.org>
Subject: Re: [RFC/PATCH v3 01/16] Add initial external odb support
Date: Wed, 30 Nov 2016 15:30:09 -0800	[thread overview]
Message-ID: <xmqqmvggbl6m.fsf@gitster.mtv.corp.google.com> (raw)
In-Reply-To: <20161130210420.15982-2-chriscool@tuxfamily.org> (Christian Couder's message of "Wed, 30 Nov 2016 22:04:05 +0100")

Christian Couder <christian.couder@gmail.com> writes:

> From: Jeff King <peff@peff.net>
>
> Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
> ---

By the time the series loses RFC prefix, we'd need to see the above
three lines straightened out a bit more, e.g. a real message and a
more proper sign-off sequence.

> +static struct odb_helper *find_or_create_helper(const char *name, int len)
> +{
> +	struct odb_helper *o;
> +
> +	for (o = helpers; o; o = o->next)
> +		if (!strncmp(o->name, name, len) && !o->name[len])
> +			return o;
> +
> +	o = odb_helper_new(name, len);
> +	*helpers_tail = o;
> +	helpers_tail = &o->next;
> +
> +	return o;
> +}
> +
> +static int external_odb_config(const char *var, const char *value, void *data)
> +{
> +	struct odb_helper *o;
> +	const char *key, *dot;
> +
> +	if (!skip_prefix(var, "odb.", &key))
> +		return 0;
> +	dot = strrchr(key, '.');
> +	if (!dot)
> +		return 0;

Is this something Peff wrote long time ago?  I find it surprising
that he would write this without using parse_config_key().

> +struct odb_helper_cmd {
> +	struct argv_array argv;
> +	struct child_process child;
> +};
> +
> +static void prepare_helper_command(struct argv_array *argv, const char *cmd,
> +				   const char *fmt, va_list ap)
> +{
> +	struct strbuf buf = STRBUF_INIT;
> +
> +	strbuf_addstr(&buf, cmd);
> +	strbuf_addch(&buf, ' ');
> +	strbuf_vaddf(&buf, fmt, ap);
> +
> +	argv_array_push(argv, buf.buf);
> +	strbuf_release(&buf);

This concatenates the cmdname (like "get") and its parameters into a
single string (e.g. "get 454cb6bd52a4de614a3633e4f547af03d5c3b640")
and then places the resulting string as the sole element in argv[]
array, which I find somewhat unusual.  It at least deserves a
comment in front of the function that the callers are responsible to
ensure that the result of vaddf(fmt, ap) is properly shell-quoted.
Otherwise it is inviting quoting errors in the future (even if there
is no such errors in the current code and command set, i.e. a 40-hex
object name that "get" subcommand takes happens to require no
special shell-quoting consideration).

> +int odb_helper_fetch_object(struct odb_helper *o, const unsigned char *sha1,
> +			    int fd)
> +{
> +	struct odb_helper_object *obj;
> +	struct odb_helper_cmd cmd;
> +	unsigned long total_got;
> +	git_zstream stream;
> +	int zret = Z_STREAM_END;
> +	git_SHA_CTX hash;
> +	unsigned char real_sha1[20];
> +
> +	obj = odb_helper_lookup(o, sha1);
> +	if (!obj)
> +		return -1;
> +
> +	if (odb_helper_start(o, &cmd, "get %s", sha1_to_hex(sha1)) < 0)
> +		return -1;
> +
> +	memset(&stream, 0, sizeof(stream));
> +	git_inflate_init(&stream);
> +	git_SHA1_Init(&hash);
> +	total_got = 0;
> +
> +	for (;;) {
> +		unsigned char buf[4096];
> +		int r;
> +
> +		r = xread(cmd.child.out, buf, sizeof(buf));
> +		if (r < 0) {
> +			error("unable to read from odb helper '%s': %s",
> +			      o->name, strerror(errno));
> +			close(cmd.child.out);
> +			odb_helper_finish(o, &cmd);
> +			git_inflate_end(&stream);
> +			return -1;
> +		}
> +		if (r == 0)
> +			break;
> +
> +		write_or_die(fd, buf, r);
> +
> +		stream.next_in = buf;
> +		stream.avail_in = r;
> +		do {
> +			unsigned char inflated[4096];
> +			unsigned long got;
> +
> +			stream.next_out = inflated;
> +			stream.avail_out = sizeof(inflated);
> +			zret = git_inflate(&stream, Z_SYNC_FLUSH);
> +			got = sizeof(inflated) - stream.avail_out;
> +
> +			git_SHA1_Update(&hash, inflated, got);
> +			/* skip header when counting size */
> +			if (!total_got) {
> +				const unsigned char *p = memchr(inflated, '\0', got);

Does this assume that a single xread() that can result in a
short-read would not return in the middle of "header", and if so, is
that a safe assumption to make?

> +				if (p)
> +					got -= p - inflated + 1;
> +				else
> +					got = 0;
> +			}
> +			total_got += got;
> +		} while (stream.avail_in && zret == Z_OK);
> +	}
> +
> +	close(cmd.child.out);
> +	git_inflate_end(&stream);
> +	git_SHA1_Final(real_sha1, &hash);
> +	if (odb_helper_finish(o, &cmd))
> +		return -1;
> +	if (zret != Z_STREAM_END) {
> +		warning("bad zlib data from odb helper '%s' for %s",
> +			o->name, sha1_to_hex(sha1));
> +		return -1;
> +	}
> +	if (total_got != obj->size) {
> +		warning("size mismatch from odb helper '%s' for %s (%lu != %lu)",
> +			o->name, sha1_to_hex(sha1), total_got, obj->size);
> +		return -1;
> +	}
> +	if (hashcmp(real_sha1, sha1)) {
> +		warning("sha1 mismatch from odb helper '%s' for %s (got %s)",
> +			o->name, sha1_to_hex(sha1), sha1_to_hex(real_sha1));
> +		return -1;
> +	}
> +
> +	return 0;
> +}

OK.  So we call the external helper with "get" command, expecting
that the helper returns the data as a zlib deflated stream, and
validate that the data matches the expected hash and the expected
size, while also saving the contents of the deflated stream to fd.
Presumably the caller opened a fd to write into a loose object?  I
do not see this code actually validating that the loose object
header, i.e. "<type> <len>\0", matches what the caller wanted to see
in obj->size and obj->type.  Shouldn't there be a check for that,
too?

I am tempted to debate myself if it is a sensible design to require
"get" to return a loose object representation, but cannot decide
without seeing the remainder of the series.  An obvious alternative
is to define the "get" request to interface with us via the raw
contents (not even deflated) and leave the deflating to us, i.e. Git
sitting on the receiving end, which would allow us to choose to
store it differently (e.g. we may want to try streaming it into its
own pack using the streaming.h API, for example).


  reply	other threads:[~2016-11-30 23:30 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-11-30 21:04 [RFC/PATCH v3 00/16] Add initial experimental external ODB support Christian Couder
2016-11-30 21:04 ` [RFC/PATCH v3 01/16] Add initial external odb support Christian Couder
2016-11-30 23:30   ` Junio C Hamano [this message]
2016-11-30 23:37     ` Jeff King
2017-08-03  7:48       ` Christian Couder
2017-08-03  7:46     ` Christian Couder
2017-08-03  8:06       ` Jeff King
2016-11-30 21:04 ` [RFC/PATCH v3 02/16] external odb foreach Christian Couder
2016-11-30 21:04 ` [RFC/PATCH v3 03/16] t0400: use --batch-all-objects to get all objects Christian Couder
2016-11-30 21:04 ` [RFC/PATCH v3 04/16] t0400: add 'put' command to odb-helper script Christian Couder
2016-11-30 21:04 ` [RFC/PATCH v3 05/16] t0400: add test for 'put' command Christian Couder
2016-11-30 21:04 ` [RFC/PATCH v3 06/16] external odb: add write support Christian Couder
2016-11-30 21:04 ` [RFC/PATCH v3 07/16] external-odb: accept only blobs for now Christian Couder
2016-11-30 21:04 ` [RFC/PATCH v3 08/16] t0400: add test for external odb write support Christian Couder
2016-11-30 21:04 ` [RFC/PATCH v3 09/16] Add GIT_NO_EXTERNAL_ODB env variable Christian Couder
2016-11-30 21:04 ` [RFC/PATCH v3 10/16] Add t0410 to test external ODB transfer Christian Couder
2016-11-30 21:04 ` [RFC/PATCH v3 11/16] lib-httpd: pass config file to start_httpd() Christian Couder
2016-11-30 21:04 ` [RFC/PATCH v3 12/16] lib-httpd: add upload.sh Christian Couder
2016-11-30 21:04 ` [RFC/PATCH v3 13/16] lib-httpd: add list.sh Christian Couder
2016-11-30 21:04 ` [RFC/PATCH v3 14/16] lib-httpd: add apache-e-odb.conf Christian Couder
2016-11-30 21:04 ` [RFC/PATCH v3 15/16] odb-helper: add 'store_plain_objects' to 'struct odb_helper' Christian Couder
2016-11-30 21:04 ` [RFC/PATCH v3 16/16] t0420: add test with HTTP external odb Christian Couder
2016-11-30 22:36 ` [RFC/PATCH v3 00/16] Add initial experimental external ODB support Junio C Hamano
2016-12-13 16:40   ` Christian Couder
2016-12-13 20:05     ` Junio C Hamano
2016-12-15  9:56       ` Christian Couder
2016-12-03 18:47 ` Lars Schneider
2016-12-05 13:23   ` Jeff King
2016-12-13 17:20   ` Christian Couder
2016-12-18 13:13     ` Lars Schneider

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=xmqqmvggbl6m.fsf@gitster.mtv.corp.google.com \
    --to=gitster@pobox.com \
    --cc=chriscool@tuxfamily.org \
    --cc=christian.couder@gmail.com \
    --cc=e@80x24.org \
    --cc=git@vger.kernel.org \
    --cc=larsxschneider@gmail.com \
    --cc=mh@glandium.org \
    --cc=pclouds@gmail.com \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).