git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Jeff King <peff@peff.net>
To: Christian Couder <christian.couder@gmail.com>
Cc: Junio C Hamano <gitster@pobox.com>, git <git@vger.kernel.org>
Subject: Re: [PATCH v2 4/4] bundle v3: the beginning
Date: Tue, 7 Jun 2016 16:35:57 -0400	[thread overview]
Message-ID: <20160607203557.GB5726@sigill.intra.peff.net> (raw)
In-Reply-To: <CAP8UFD3mGKTONzh1fxCJAJBrmc=OcCHeRqBQi-xTzHvcnAJ_sw@mail.gmail.com>

On Tue, Jun 07, 2016 at 03:19:46PM +0200, Christian Couder wrote:

> >      But there are lots of cases where the server might want to tell
> >      the client that don't involve bundles at all.
> 
> The idea is also that anytime the server needs to send external ODB
> data to the client, it would ask its own external ODB to prepare a
> kind of bundle with that data and use the bundle v3 mechanism to send
> it.
> That may need the bundle v3 mechanism to be extended, but I don't see
> in which cases it would not work.

Ah, I see we do not have the same underlying mental model.

I think the external odb is purely the _client's_ business. The server
does not have to have an external odb at all, and does not need to know
about the client's. The client is responsible for telling the server
during the git protocol anything it would need to know (like "do not
bother sending objects over 50MB; I can get them elsewhere").

This makes the problem much more complicated, but it is more flexible
and decentralized.

> >        a. The receiving side of a connection (e.g., a fetch client)
> >           somehow has out-of-band access to some objects. How does it
> >           tell the other side "do not bother sending me these objects; I
> >           can get them in another way"?
> 
> I don't see a difference with regular objects that the fetch client
> already has. If it already has some regular objects, a way to tell the
> server "don't bother sending me these objects" is useful already and
> it should be possible to use it to tell the server that there is no
> need to send some objects stored in the external ODB too.

The way to do that with normal objects is by finding shared commit tips,
and assuming the normal git repository property of "if you have X, you
have all of the objects reachable from X".

This whole idea is essentially creating "holes" in that property. You
can enumerate all of the holes, but I am not sure that scales well. We
get a lot of efficiency by communicating only ref tips during the
negotiation, and not individual object names.

> Also something like this is needed for shallow clones and narrow
> clones anyway.

Yes, and I don't think it scales well there, either. A single shallow
cutoff works OK. But if you repeatedly shallow-fetch into a repository,
you end up with a patchwork of disconnected "islands" of history. The
CPU required on the server side to serve those fetch requests is much
greater than what would normally be needed. You can't use things like
reachability bitmaps, and you have to open up the trees for each island
to see which objects the other side actually has.

> >        b. The receiving side of a connection has out-of-band access to
> >           some objects. Some of these will be expensive to get (e.g.,
> >           requiring a large download), and some may be fast (e.g.,
> >           they've already been fetched to a local cache). How do we tell
> >           the sending side not to assume we have cheap access to these
> >           objects (e.g., for use as a delta base)?
> 
> I don't think we need to tell the sending side we have cheap access or
> not to some objects.
> If the objects are managed by the external ODB, it's the external ODB
> on the server and on the client that will manage these objects. They
> should not be used as delta bases.
> Perhaps there is no mechanism to say that some objects (basically all
> external ODB managed objects) should not be used as delta bases, but
> that could be added.

Yes, I agree that _if_ the server can access the list of objects
available in the external odb, this becomes much easier. I'm just not
convinced that level of coupling is a good idea.

Note that the server would also want to take this into account during
repacking, as otherwise you end up with fetches that are very expensive
to serve (you want to send X which is a delta based on Y, but you know
that Y is available via the external odb, and therefore should not be
used as a base. So you have to throw out the delta for X and either send
it whole or compute a new one. That's much more expensive than blitting
the delta from disk, which is what a normal clone would do).

-Peff

  reply	other threads:[~2016-06-07 20:36 UTC|newest]

Thread overview: 37+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-03-01 23:35 [PATCH 1/2] bundle: plug resource leak Junio C Hamano
2016-03-01 23:36 ` [PATCH 2/2] bundle: keep a copy of bundle file name in the in-core bundle header Junio C Hamano
2016-03-02  9:01   ` Jeff King
2016-03-02 18:15     ` Junio C Hamano
2016-03-02 20:32       ` [PATCH v2 0/4] "split bundle" preview Junio C Hamano
2016-03-02 20:32         ` [PATCH v2 1/4] bundle doc: 'verify' is not about verifying the bundle Junio C Hamano
2016-03-02 20:32         ` [PATCH v2 2/4] bundle: plug resource leak Junio C Hamano
2016-03-02 20:32         ` [PATCH v2 3/4] bundle: keep a copy of bundle file name in the in-core bundle header Junio C Hamano
2016-03-02 20:49           ` Jeff King
2016-03-02 20:32         ` [PATCH v2 4/4] bundle v3: the beginning Junio C Hamano
2016-03-03  1:36           ` Duy Nguyen
2016-03-03  2:57             ` Junio C Hamano
2016-03-03  5:15               ` Duy Nguyen
2016-05-20 12:39           ` Christian Couder
2016-05-31 12:43             ` Duy Nguyen
2016-05-31 13:18               ` Christian Couder
2016-06-01 13:37                 ` Duy Nguyen
2016-06-07 14:49                   ` Christian Couder
2016-06-01 14:00                 ` Duy Nguyen
2016-06-07  8:46                   ` Christian Couder
2016-06-07  8:53                     ` Mike Hommey
2016-06-07 10:22                     ` Duy Nguyen
2016-06-07 19:23                     ` Junio C Hamano
2016-06-07 20:23                       ` Jeff King
2016-06-08 10:44                         ` Duy Nguyen
2016-06-08 16:19                           ` Jeff King
2016-06-09  8:53                             ` Duy Nguyen
2016-06-09 17:23                               ` Jeff King
2016-06-08 18:05                         ` Junio C Hamano
2016-06-08 19:00                           ` Jeff King
2016-05-31 22:23               ` Jeff King
2016-05-31 22:31             ` Jeff King
2016-06-07 13:19               ` Christian Couder
2016-06-07 20:35                 ` Jeff King [this message]
2016-03-02  8:54 ` [PATCH 1/2] bundle: plug resource leak Jeff King
2016-03-02  9:00   ` Junio C Hamano
2016-03-02  9:02     ` Jeff King

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20160607203557.GB5726@sigill.intra.peff.net \
    --to=peff@peff.net \
    --cc=christian.couder@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).