From: Jeff King <peff@peff.net>
To: Christian Couder <christian.couder@gmail.com>
Cc: Junio C Hamano <gitster@pobox.com>, git <git@vger.kernel.org>,
	Nguyen Thai Ngoc Duy <pclouds@gmail.com>,
	Mike Hommey <mh@glandium.org>,
	Lars Schneider <larsxschneider@gmail.com>,
	Eric Wong <e@80x24.org>,
	Christian Couder <chriscool@tuxfamily.org>
Subject: Re: [RFC/PATCH v3 01/16] Add initial external odb support
Date: Thu, 3 Aug 2017 04:06:00 -0400
Message-ID: <20170803080600.aeaprj5hb6ucrkgy@sigill.intra.peff.net>
In-Reply-To: <CAP8UFD0ecFW2Sk0fr3ysAXPERNp1RiBMqZMTjYxgt_mvtY-kaw@mail.gmail.com>

On Thu, Aug 03, 2017 at 09:46:38AM +0200, Christian Couder wrote:

> >> +static int external_odb_config(const char *var, const char *value, void *data)
> >> +{
> >> +     struct odb_helper *o;
> >> +     const char *key, *dot;
> >> +
> >> +     if (!skip_prefix(var, "odb.", &key))
> >> +             return 0;
> >> +     dot = strrchr(key, '.');
> >> +     if (!dot)
> >> +             return 0;
> >
> > Is this something Peff wrote long time ago?  I find it surprising
> > that he would write this without using parse_config_key().
> 
> parse_config_key() is used now.

Yeah, I think the original was from 2012. We didn't add
parse_config_key() until 2013. Definitely worth using.

> >> +     for (;;) {
> >> +             unsigned char buf[4096];
> >> +             int r;
> >> +
> >> +             r = xread(cmd.child.out, buf, sizeof(buf));
> >> +             if (r < 0) {
> >> +                     error("unable to read from odb helper '%s': %s",
> >> +                           o->name, strerror(errno));
> >> +                     close(cmd.child.out);
> >> +                     odb_helper_finish(o, &cmd);
> >> +                     git_inflate_end(&stream);
> >> +                     return -1;
> >> +             }
> >> +             if (r == 0)
> >> +                     break;
> >> +
> >> +             write_or_die(fd, buf, r);
> >> +
> >> +             stream.next_in = buf;
> >> +             stream.avail_in = r;
> >> +             do {
> >> +                     unsigned char inflated[4096];
> >> +                     unsigned long got;
> >> +
> >> +                     stream.next_out = inflated;
> >> +                     stream.avail_out = sizeof(inflated);
> >> +                     zret = git_inflate(&stream, Z_SYNC_FLUSH);
> >> +                     got = sizeof(inflated) - stream.avail_out;
> >> +
> >> +                     git_SHA1_Update(&hash, inflated, got);
> >> +                     /* skip header when counting size */
> >> +                     if (!total_got) {
> >> +                             const unsigned char *p = memchr(inflated, '\0', got);
> [reconstructed quoted function so we can see the whole thing]
> >> +                             if (p)
> >> +                                     got -= p - inflated + 1;
> >> +                             else
> >> +                                     got = 0;
> >> +                     }
> >> +                     total_got += got;
> >> +             } while (stream.avail_in && zret == Z_OK);
> >> +     }
> >
> > Does this assume that a single xread() that can result in a
> > short-read would not return in the middle of "header", and if so, is
> > that a safe assumption to make?
> 
> I am not sure what would go wrong in case of a short read.
> My guess is that as long as we test that p is not NULL below we should be fine.
> As Peff wrote this code, he could probably answer much better than me.

I think it's OK. The idea is to suck up characters until we hit the
end-of-header NUL. So "total_got" only becomes non-zero once we've seen
that NUL. If we get a short read that stops before the NUL, then
memchr() returns NULL, and we set "got" to 0, and don't advance
total_got at all. When we finally do hit a partial read that contains
the NUL, then "p" will be non-NULL, and we'll reduce "got" by the header
bytes in that chunk (everything up to and including the NUL).
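
To make the short-read case concrete, here is a minimal standalone
sketch of just the counting logic from the quoted loop, with the zlib
and SHA-1 parts left out. The struct and function names
(payload_counter, count_chunk) are made up for illustration and do not
appear in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical state for this sketch; not a struct from the patch. */
struct payload_counter {
	unsigned long total_got; /* payload bytes counted so far */
};

/*
 * Account for one inflated chunk, mirroring the quoted loop: as long
 * as total_got is zero we are still looking for the end-of-header NUL.
 * A chunk without a NUL contributes nothing; the chunk that holds the
 * NUL contributes only the bytes after it.
 */
static void count_chunk(struct payload_counter *c,
			const unsigned char *inflated, unsigned long got)
{
	assert(c != NULL);

	if (!c->total_got) {
		const unsigned char *p = memchr(inflated, '\0', got);
		if (p)
			got -= (unsigned long)(p - inflated) + 1;
		else
			got = 0;
	}
	c->total_got += got;
}
```

Feeding it the chunks "blob 5", then "\0hel", then "lo" leaves total_got
at 5: the first chunk has no NUL and counts as 0, the second counts 3
after dropping the NUL, and the third counts in full. One thing the
sketch makes visible, as far as I can tell: if a chunk ends exactly at
the NUL, total_got stays zero and the next chunk gets scanned for a NUL
again, so a separate "header seen" flag might be more robust.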

All that said, I agree with the bits you both said later that we should
probably be checking the actual content of the header (if we're indeed
going to keep this on-the-wire format -- see below).
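
As a rough idea of what checking the header content could look like,
here is a standalone sketch that parses a loose-object style header of
the form "<type> <decimal size>" followed by a NUL. The function name
parse_object_header and its return convention are assumptions for this
example only; it is not an existing Git function, and it ignores size
overflow and does not validate the type name against known object
types.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not part of Git): parse "<type> <size>\0" at
 * the start of buf.
 *
 * Returns the number of header bytes consumed (including the NUL),
 * 0 if no NUL has been seen yet (caller should buffer more data),
 * or -1 if the header is malformed.
 */
static long parse_object_header(const unsigned char *buf, size_t len,
				char *type, size_t type_len,
				unsigned long *size)
{
	const unsigned char *nul, *sp, *p;
	unsigned long v = 0;
	size_t n;

	assert(buf && type && size && type_len > 0);

	nul = memchr(buf, '\0', len);
	if (!nul)
		return 0; /* header not complete yet */

	sp = memchr(buf, ' ', (size_t)(nul - buf));
	if (!sp || sp == buf)
		return -1; /* missing or empty type */

	n = (size_t)(sp - buf);
	if (n >= type_len)
		return -1; /* type does not fit in caller's buffer */

	if (sp + 1 == nul)
		return -1; /* no size digits */
	for (p = sp + 1; p < nul; p++) {
		if (*p < '0' || *p > '9')
			return -1;
		/* overflow is not checked in this sketch */
		v = v * 10 + (unsigned long)(*p - '0');
	}

	memcpy(type, buf, n);
	type[n] = '\0';
	*size = v;
	return (long)(nul - buf) + 1;
}
```

A caller would accumulate inflated bytes until this returns non-zero,
then compare the parsed size against the number of payload bytes
actually received.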

> > I am tempted to debate myself if it is a sensible design to require
> > "get" to return a loose object representation, but cannot decide
> > without seeing the remainder of the series.  An obvious alternative
> > is to define the "get" request to interface with us via the raw
> > contents (not even deflated) and leave the deflating to us, i.e. Git
> > sitting on the receiving end, which would allow us to choose to
> > store it differently (e.g. we may want to try streaming it into its
> > own pack using the streaming.h API, for example).
> 
> There is now both a get_raw_obj and a get_git_obj to handle both cases.

Yeah, I don't recall why I picked the loose format as the transfer
mechanism. I guess I figured that the objects would be large, so we'd
want them in their own loose objects (not part of a pack that might have
to get rewritten). So storing and sending as a loose-object zlib stream
means we can avoid any recompression and just send it out to disk.

But that was a long time ago. I think these days we prefer to put even
large objects into packfiles. If we wanted something the client could
stream directly into storage, the pack format would probably make more
sense. But it also probably isn't the end of the world to just get raw
contents and then re-zlib them. That's a little extra overhead on the
receiving side, but it makes things nice and simple. And we're already
uncompressing and computing the sha1 to verify the incoming content
anyway.

-Peff
