From: Jeff King <peff@peff.net>
To: Shawn Pearce <spearce@spearce.org>
Cc: git <git@vger.kernel.org>
Subject: Re: RFC: Resumable clone based on hybrid "smart" and "dumb" HTTP
Date: Sun, 14 Feb 2016 12:05:25 -0500
Message-ID: <20160214170525.GB10219@sigill.intra.peff.net>
In-Reply-To: <CAJo=hJv-GWZOsv31iekW+AdNazLGQ=XYD=UXMO+RuB15baTsow@mail.gmail.com>

On Sat, Feb 13, 2016 at 06:14:31PM -0800, Shawn Pearce wrote:

> > And with "resumable=<url>", the client does not have to hit the server
> > to do a redirect; it can go straight to the final URL, saving a
> > round-trip.
> 
> It occurred to me today that to make the above ("resumable=<url>") as
> efficient as possible, we should allow HTTP clients to include
> &resumable=1 in the GET /info/refs URL as a hint to the server that if
> it serves a resumable=<url> it can shrink the ref advertisement to 1
> line containing the capabilities.

I'm slightly wary of this. The client advertising "resumable=1" does not
mean "I will definitely do a resumable clone". It means "I support the
resumable keyword, and if you share a resumable URL with me, I _might_
use it, depending on things that are none of your business" (e.g., if
the client does not like the protocol of the server's resumable URL).

It is recoverable by having the client re-contact the server without the
resumable flag, so it could still be a net win if the client will select
the resumable URL a majority of the time.

I'm also not happy about having an HTTP-only feature in the protocol. I
liked Stefan's proposal for the "v2" protocol that would let the two
sides exchange capabilities before the ref advertisement. Then the
client, having seen the server's resumable URL, knows whether or not
to proceed with the advertisement.

> Clients are going to follow the returned <url> to get a bundle header,
> which contains the references. And then incremental fetch from the
> server after downloading the pack. So returning references with the
> resumable URL during a clone is an unnecessary waste of bandwidth.

If the bundle is up to date, the client can skip the follow-up
incremental fetch, as it knows that it has everything needed for the
original ref advertisement it got. Whether that's a net win depends on
how up-to-date the bundles are.

If "C" is the cost to contact the server at all and "A" is the cost of
the advertisement, then a "hit" with this scheme means the overhead is
C+A (we contact the server only once). A "miss" means we have to do the
follow-up fetch anyway, and we pay 2C+2A (paying the advertisement cost
both times). Whereas with your scheme, we pay 2C+A always; two contacts,
but only the second has an advertisement.

So it depends on the relative cost of C and A, and how often we expect
it to kick in.
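
To put rough numbers on that trade-off, here is a minimal sketch in
Python. It is only an illustration of the arithmetic above, not anything
Git implements; the function names and the "hit_rate" parameter (the
fraction of clones where the bundle is already up to date) are
assumptions made for the example.

```python
def full_advert_cost(c: float, a: float, hit_rate: float) -> float:
    """Expected overhead when the first contact carries the full ref
    advertisement: C+A on a hit, 2C+2A on a miss."""
    return hit_rate * (c + a) + (1 - hit_rate) * (2 * c + 2 * a)


def short_advert_cost(c: float, a: float) -> float:
    """Overhead when the first contact carries only the capabilities
    line: always two contacts but a single advertisement, 2C+A."""
    return 2 * c + a


def break_even_hit_rate(c: float, a: float) -> float:
    """Hit rate at which both schemes cost the same.

    Setting p*(C+A) + (1-p)*(2C+2A) equal to 2C+A and solving for p
    gives p = A / (C + A).
    """
    return a / (c + a)


if __name__ == "__main__":
    # Hypothetical units: contact costs 1, advertisement costs 3.
    c, a = 1.0, 3.0
    print("break-even hit rate:", break_even_hit_rate(c, a))
    for p in (0.0, 0.5, 0.75, 1.0):
        print(p, full_advert_cost(c, a, p), short_advert_cost(c, a))
```

Under these assumptions the full advertisement only comes out ahead when
the hit rate exceeds A/(C+A), so the larger "A" is relative to "C", the
more often the bundle has to be up to date for it to pay off.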

In practice, I suspect it's mostly dominated by the cost of the actual
clone objects anyway, but maybe that is different for Gerrit. I hear
refs/changes/ can get pretty big. :)

But if that is the case, then "C" is almost certainly negligible
compared to "A".

> We could also consider allowing resumable=<url> to be relative to the
> repository, so that on HTTP schemes a server could just reply
> resumable=pack-HASH.info or something short and not worry about
> overflowing the capabilities line.

I think that's orthogonal, but yeah, it might be a nice feature for
admins setting up the server side config.
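
As a sketch of what "relative to the repository" could mean on the
client side, the snippet below resolves a resumable value against the
repository URL with Python's standard urllib. The helper name
"resolve_resumable" and the example host are hypothetical, and nothing
here reflects an agreed protocol; it only shows that a short relative
value and a full URL can go through the same code path.

```python
from urllib.parse import urljoin


def resolve_resumable(repo_url: str, resumable: str) -> str:
    """Resolve a resumable=<url> value against the repository URL.

    A relative value such as "pack-HASH.info" is taken relative to the
    repository; an absolute URL is returned unchanged.
    """
    # urljoin replaces the last path segment unless the base ends in
    # "/", so make sure the repository URL is treated as a directory.
    base = repo_url if repo_url.endswith("/") else repo_url + "/"
    return urljoin(base, resumable)


if __name__ == "__main__":
    repo = "https://git.example.com/project.git"  # hypothetical host
    print(resolve_resumable(repo, "pack-HASH.info"))
    print(resolve_resumable(repo, "https://cdn.example.com/pack-HASH.info"))
```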

-Peff
