git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Christian Couder <christian.couder@gmail.com>
To: Junio C Hamano <gitster@pobox.com>
Cc: git <git@vger.kernel.org>, Jeff King <peff@peff.net>,
	Nguyen Thai Ngoc Duy <pclouds@gmail.com>,
	Mike Hommey <mh@glandium.org>,
	Lars Schneider <larsxschneider@gmail.com>,
	Eric Wong <e@80x24.org>,
	Christian Couder <chriscool@tuxfamily.org>
Subject: Re: [RFC/PATCH v3 01/16] Add initial external odb support
Date: Thu, 3 Aug 2017 09:46:38 +0200	[thread overview]
Message-ID: <CAP8UFD0ecFW2Sk0fr3ysAXPERNp1RiBMqZMTjYxgt_mvtY-kaw@mail.gmail.com> (raw)
In-Reply-To: <xmqqmvggbl6m.fsf@gitster.mtv.corp.google.com>

(I realized that I didn't answer this email about the v3 version.
Sorry about this late answer.)

On Thu, Dec 1, 2016 at 12:30 AM, Junio C Hamano <gitster@pobox.com> wrote:
> Christian Couder <christian.couder@gmail.com> writes:
>
>> From: Jeff King <peff@peff.net>
>>
>> Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
>> ---
>
> By the time the series loses RFC prefix, we'd need to see the above
> three lines straightened out a bit more, e.g. a real message and a
> more proper sign-off sequence.

Yeah, it is better in the v5 series I will send really soon now.

>> +static struct odb_helper *find_or_create_helper(const char *name, int len)
>> +{
>> +     struct odb_helper *o;
>> +
>> +     for (o = helpers; o; o = o->next)
>> +             if (!strncmp(o->name, name, len) && !o->name[len])
>> +                     return o;
>> +
>> +     o = odb_helper_new(name, len);
>> +     *helpers_tail = o;
>> +     helpers_tail = &o->next;
>> +
>> +     return o;
>> +}
>> +
>> +static int external_odb_config(const char *var, const char *value, void *data)
>> +{
>> +     struct odb_helper *o;
>> +     const char *key, *dot;
>> +
>> +     if (!skip_prefix(var, "odb.", &key))
>> +             return 0;
>> +     dot = strrchr(key, '.');
>> +     if (!dot)
>> +             return 0;
>
> Is this something Peff wrote long time ago?  I find it surprising
> that he would write this without using parse_config_key().

parse_config_key() is used now.

>> +struct odb_helper_cmd {
>> +     struct argv_array argv;
>> +     struct child_process child;
>> +};
>> +
>> +static void prepare_helper_command(struct argv_array *argv, const char *cmd,
>> +                                const char *fmt, va_list ap)
>> +{
>> +     struct strbuf buf = STRBUF_INIT;
>> +
>> +     strbuf_addstr(&buf, cmd);
>> +     strbuf_addch(&buf, ' ');
>> +     strbuf_vaddf(&buf, fmt, ap);
>> +
>> +     argv_array_push(argv, buf.buf);
>> +     strbuf_release(&buf);
>
> This concatenates the cmdname (like "get") and its parameters into a
> single string (e.g. "get 454cb6bd52a4de614a3633e4f547af03d5c3b640")
> and then places the resulting string as the sole element in argv[]
> array, which I find somewhat unusual.  It at least deserves a
> comment in front of the function that the callers are responsible to
> ensure that the result of vaddf(fmt, ap) is properly shell-quoted.

I added a comment.

> Otherwise it is inviting quoting errors in the future (even if there
> is no such errors in the current code and command set, i.e. a 40-hex
> object name that "get" subcommand takes happens to require no
> special shell-quoting consideration).

Yeah, but I am not sure how this could be done better.

>> +int odb_helper_fetch_object(struct odb_helper *o, const unsigned char *sha1,
>> +                         int fd)
>> +{
>> +     struct odb_helper_object *obj;
>> +     struct odb_helper_cmd cmd;
>> +     unsigned long total_got;
>> +     git_zstream stream;
>> +     int zret = Z_STREAM_END;
>> +     git_SHA_CTX hash;
>> +     unsigned char real_sha1[20];
>> +
>> +     obj = odb_helper_lookup(o, sha1);
>> +     if (!obj)
>> +             return -1;
>> +
>> +     if (odb_helper_start(o, &cmd, "get %s", sha1_to_hex(sha1)) < 0)
>> +             return -1;
>> +
>> +     memset(&stream, 0, sizeof(stream));
>> +     git_inflate_init(&stream);
>> +     git_SHA1_Init(&hash);
>> +     total_got = 0;
>> +
>> +     for (;;) {
>> +             unsigned char buf[4096];
>> +             int r;
>> +
>> +             r = xread(cmd.child.out, buf, sizeof(buf));
>> +             if (r < 0) {
>> +                     error("unable to read from odb helper '%s': %s",
>> +                           o->name, strerror(errno));
>> +                     close(cmd.child.out);
>> +                     odb_helper_finish(o, &cmd);
>> +                     git_inflate_end(&stream);
>> +                     return -1;
>> +             }
>> +             if (r == 0)
>> +                     break;
>> +
>> +             write_or_die(fd, buf, r);
>> +
>> +             stream.next_in = buf;
>> +             stream.avail_in = r;
>> +             do {
>> +                     unsigned char inflated[4096];
>> +                     unsigned long got;
>> +
>> +                     stream.next_out = inflated;
>> +                     stream.avail_out = sizeof(inflated);
>> +                     zret = git_inflate(&stream, Z_SYNC_FLUSH);
>> +                     got = sizeof(inflated) - stream.avail_out;
>> +
>> +                     git_SHA1_Update(&hash, inflated, got);
>> +                     /* skip header when counting size */
>> +                     if (!total_got) {
>> +                             const unsigned char *p = memchr(inflated, '\0', got);
>
> Does this assume that a single xread() that can result in a
> short-read would not return in the middle of "header", and if so, is
> that a safe assumption to make?

I am not sure what would go wrong in case of a short read.
My guess is that as long as we test that p is not NULL below we should be fine.
As Peff wrote this code, he could probably answer much better than me.

>> +                             if (p)
>> +                                     got -= p - inflated + 1;
>> +                             else
>> +                                     got = 0;
>> +                     }
>> +                     total_got += got;
>> +             } while (stream.avail_in && zret == Z_OK);
>> +     }
>> +
>> +     close(cmd.child.out);
>> +     git_inflate_end(&stream);
>> +     git_SHA1_Final(real_sha1, &hash);
>> +     if (odb_helper_finish(o, &cmd))
>> +             return -1;
>> +     if (zret != Z_STREAM_END) {
>> +             warning("bad zlib data from odb helper '%s' for %s",
>> +                     o->name, sha1_to_hex(sha1));
>> +             return -1;
>> +     }
>> +     if (total_got != obj->size) {
>> +             warning("size mismatch from odb helper '%s' for %s (%lu != %lu)",
>> +                     o->name, sha1_to_hex(sha1), total_got, obj->size);
>> +             return -1;
>> +     }
>> +     if (hashcmp(real_sha1, sha1)) {
>> +             warning("sha1 mismatch from odb helper '%s' for %s (got %s)",
>> +                     o->name, sha1_to_hex(sha1), sha1_to_hex(real_sha1));
>> +             return -1;
>> +     }
>> +
>> +     return 0;
>> +}
>
> OK.  So we call the external helper with "get" command, expecting
> that the helper returns the data as a zlib deflated stream, and
> validate that the data matches the expected hash and the expected
> size, while also saving the contents of the deflated stream to fd.
> Presumably the caller opened a fd to write into a loose object?  I
> do not see this code actually validating that the loose object
> header, i.e. "<type> <len>\0", matches what the caller wanted to see
> in obj->size and obj->type.  Shouldn't there be a check for that,
> too?

Yeah, it would be better with a check. It also looks like checking
that must take into account possible short-read that could happen (as
discussed above), so for now I will see if we need to fix the header
reading, before taking a look at this.

> I am tempted to debate myself if it is a sensible design to require
> "get" to return a loose object representation, but cannot decide
> without seeing the remainder of the series.  An obvious alternative
> is to define the "get" request to interface with us via the raw
> contents (not even deflated) and leave the deflating to us, i.e. Git
> sitting on the receiving end, which would allow us to choose to
> store it differently (e.g. we may want to try streaming it into its
> own pack using the streaming.h API, for example).

There is now both a get_raw_obj and a get_git_obj to handle both cases.

  parent reply	other threads:[~2017-08-03  7:46 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-11-30 21:04 [RFC/PATCH v3 00/16] Add initial experimental external ODB support Christian Couder
2016-11-30 21:04 ` [RFC/PATCH v3 01/16] Add initial external odb support Christian Couder
2016-11-30 23:30   ` Junio C Hamano
2016-11-30 23:37     ` Jeff King
2017-08-03  7:48       ` Christian Couder
2017-08-03  7:46     ` Christian Couder [this message]
2017-08-03  8:06       ` Jeff King
2016-11-30 21:04 ` [RFC/PATCH v3 02/16] external odb foreach Christian Couder
2016-11-30 21:04 ` [RFC/PATCH v3 03/16] t0400: use --batch-all-objects to get all objects Christian Couder
2016-11-30 21:04 ` [RFC/PATCH v3 04/16] t0400: add 'put' command to odb-helper script Christian Couder
2016-11-30 21:04 ` [RFC/PATCH v3 05/16] t0400: add test for 'put' command Christian Couder
2016-11-30 21:04 ` [RFC/PATCH v3 06/16] external odb: add write support Christian Couder
2016-11-30 21:04 ` [RFC/PATCH v3 07/16] external-odb: accept only blobs for now Christian Couder
2016-11-30 21:04 ` [RFC/PATCH v3 08/16] t0400: add test for external odb write support Christian Couder
2016-11-30 21:04 ` [RFC/PATCH v3 09/16] Add GIT_NO_EXTERNAL_ODB env variable Christian Couder
2016-11-30 21:04 ` [RFC/PATCH v3 10/16] Add t0410 to test external ODB transfer Christian Couder
2016-11-30 21:04 ` [RFC/PATCH v3 11/16] lib-httpd: pass config file to start_httpd() Christian Couder
2016-11-30 21:04 ` [RFC/PATCH v3 12/16] lib-httpd: add upload.sh Christian Couder
2016-11-30 21:04 ` [RFC/PATCH v3 13/16] lib-httpd: add list.sh Christian Couder
2016-11-30 21:04 ` [RFC/PATCH v3 14/16] lib-httpd: add apache-e-odb.conf Christian Couder
2016-11-30 21:04 ` [RFC/PATCH v3 15/16] odb-helper: add 'store_plain_objects' to 'struct odb_helper' Christian Couder
2016-11-30 21:04 ` [RFC/PATCH v3 16/16] t0420: add test with HTTP external odb Christian Couder
2016-11-30 22:36 ` [RFC/PATCH v3 00/16] Add initial experimental external ODB support Junio C Hamano
2016-12-13 16:40   ` Christian Couder
2016-12-13 20:05     ` Junio C Hamano
2016-12-15  9:56       ` Christian Couder
2016-12-03 18:47 ` Lars Schneider
2016-12-05 13:23   ` Jeff King
2016-12-13 17:20   ` Christian Couder
2016-12-18 13:13     ` Lars Schneider

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAP8UFD0ecFW2Sk0fr3ysAXPERNp1RiBMqZMTjYxgt_mvtY-kaw@mail.gmail.com \
    --to=christian.couder@gmail.com \
    --cc=chriscool@tuxfamily.org \
    --cc=e@80x24.org \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=larsxschneider@gmail.com \
    --cc=mh@glandium.org \
    --cc=pclouds@gmail.com \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).