Date: Thu, 3 Aug 2017 04:06:00 -0400
From: Jeff King
To: Christian Couder
Cc: Junio C Hamano, git, Nguyen Thai Ngoc Duy, Mike Hommey, Lars Schneider, Eric Wong, Christian Couder
Subject: Re: [RFC/PATCH v3 01/16] Add initial external odb support
Message-ID: <20170803080600.aeaprj5hb6ucrkgy@sigill.intra.peff.net>
References: <20161130210420.15982-1-chriscool@tuxfamily.org> <20161130210420.15982-2-chriscool@tuxfamily.org>
X-Mailing-List: git@vger.kernel.org

On Thu, Aug 03, 2017 at 09:46:38AM +0200, Christian Couder wrote:

> >> +static int external_odb_config(const char *var, const char *value, void *data)
> >> +{
> >> +	struct odb_helper *o;
> >> +	const char *key, *dot;
> >> +
> >> +	if (!skip_prefix(var, "odb.", &key))
> >> +		return 0;
> >> +	dot = strrchr(key, '.');
> >> +	if (!dot)
> >> +		return 0;
> >
> > Is this something Peff wrote long time ago? I find it surprising
> > that he would write this without using parse_config_key().
>
> parse_config_key() is used now.

Yeah, I think the original was from 2012. We didn't add
parse_config_key() until 2013. Definitely worth using.

> >> +	for (;;) {
> >> +		unsigned char buf[4096];
> >> +		int r;
> >> +
> >> +		r = xread(cmd.child.out, buf, sizeof(buf));
> >> +		if (r < 0) {
> >> +			error("unable to read from odb helper '%s': %s",
> >> +			      o->name, strerror(errno));
> >> +			close(cmd.child.out);
> >> +			odb_helper_finish(o, &cmd);
> >> +			git_inflate_end(&stream);
> >> +			return -1;
> >> +		}
> >> +		if (r == 0)
> >> +			break;
> >> +
> >> +		write_or_die(fd, buf, r);
> >> +
> >> +		stream.next_in = buf;
> >> +		stream.avail_in = r;
> >> +		do {
> >> +			unsigned char inflated[4096];
> >> +			unsigned long got;
> >> +
> >> +			stream.next_out = inflated;
> >> +			stream.avail_out = sizeof(inflated);
> >> +			zret = git_inflate(&stream, Z_SYNC_FLUSH);
> >> +			got = sizeof(inflated) - stream.avail_out;
> >> +
> >> +			git_SHA1_Update(&hash, inflated, got);
> >> +			/* skip header when counting size */
> >> +			if (!total_got) {
> >> +				const unsigned char *p = memchr(inflated, '\0', got);
> [reconstructed quoted function so we can see the whole thing]
> >> +				if (p)
> >> +					got -= p - inflated + 1;
> >> +				else
> >> +					got = 0;
> >> +			}
> >> +			total_got += got;
> >> +		} while (stream.avail_in && zret == Z_OK);
> >> +	}
> >
> > Does this assume that a single xread() that can result in a
> > short-read would not return in the middle of "header", and if so, is
> > that a safe assumption to make?
>
> I am not sure what would go wrong in case of a short read.
> My guess is that as long as we test that p is not NULL below we should be fine.
> As Peff wrote this code, he could probably answer much better than me.

I think it's OK. The idea is to suck up characters until we hit the
end-of-header NUL. So "total_got" only becomes non-zero once we've seen
that NUL. If we get a short read then that memchr() returns NULL, and we
set "got" to 0, and don't advance total_got at all. When we finally do
hit a partial read that contains the NUL, then "p" will be non-NULL, and
we'll reduce "got" as appropriate.

All that said, I agree with the bits you both said later that we should
probably be checking the actual content of the header (if we're indeed
going to keep this on-the-wire format -- see below).

> > I am tempted to debate myself if it is a sensible design to require
> > "get" to return a loose object representation, but cannot decide
> > without seeing the remainder of the series. An obvious alternative
> > is to define the "get" request to interface with us via the raw
> > contents (not even deflated) and leave the deflating to us, i.e. Git
> > sitting on the receiving end, which would allow us to choose to
> > store it differently (e.g. we may want to try streaming it into its
> > own pack using the streaming.h API, for example).
>
> There is now both a get_raw_obj and a get_git_obj to handle both cases.

Yeah, I don't recall why I picked the loose format as the transfer
mechanism. I guess I figured that the objects would be large, so we'd
want them in their own loose objects (not part of a pack that might have
to get rewritten). So storing and sending as a loose-object zlib stream
means we can avoid any recompression and just send it out to disk.

But that was a long time ago. I think these days we prefer to put even
large objects into packfiles. If we wanted something the client could
stream directly into storage, the pack format would probably make more
sense.

But it also probably isn't the end of the world to just get raw contents
and then re-zlib them. That's a little extra overhead on the receiving
side, but it makes things nice and simple. And we're already
uncompressing and computing the sha1 to verify the incoming content
anyway.

-Peff
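
For reference, the header-skipping logic discussed above (count nothing
until the end-of-header NUL, then count everything after it) can be
sketched as a small standalone helper. This is an illustration only, not
code from the patch: "struct payload_counter" and
"payload_counter_feed()" are hypothetical names. It also swaps in an
explicit "header_done" flag instead of reusing total_got as the marker.
In the quoted loop, a chunk that ends exactly on the NUL appears to leave
total_got at zero, so the next chunk would be scanned for a NUL again;
the flag avoids depending on that.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper state, not part of the quoted patch. */
struct payload_counter {
	int header_done;         /* set once the end-of-header NUL was seen */
	unsigned long total_got; /* payload bytes counted after that NUL */
};

/*
 * Account for one chunk of inflated data. Chunks may split the
 * "<type> <size>\0" header anywhere, as happens after a short read.
 */
static void payload_counter_feed(struct payload_counter *pc,
				 const unsigned char *chunk, size_t len)
{
	assert(pc != NULL);

	if (!pc->header_done) {
		const unsigned char *p = len ? memchr(chunk, '\0', len) : NULL;

		if (!p)
			return; /* still inside the header: count nothing */
		pc->header_done = 1;
		/* drop the header bytes and the NUL itself */
		len -= (size_t)(p - chunk) + 1;
	}
	pc->total_got += len;
}
```

Feeding the chunks "blob", " 5\0he" and "llo" in that order leaves
total_got at 5: the first chunk has no NUL and counts nothing, the
second contributes the two bytes after the NUL, and the third is counted
in full.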