git@vger.kernel.org mailing list mirror (one of many)
* submodule support in git-bundle
@ 2018-11-02 16:09 Duy Nguyen
  2018-11-02 17:08 ` Stefan Beller
  0 siblings, 1 reply; 4+ messages in thread
From: Duy Nguyen @ 2018-11-02 16:09 UTC (permalink / raw)
  To: Git Mailing List

I use git-bundle today and it occurs to me that if I want to use it to
transfer part of a history that involves submodule changes, things
aren't pretty. Has anybody given thought on how to do binary history
transfer that contains changes from submodules?

Since .bundle files are basically .pack files, I'm not sure if it's
easy to bundle multiple pack files (one per repo)...
-- 
Duy


* Re: submodule support in git-bundle
  2018-11-02 16:09 submodule support in git-bundle Duy Nguyen
@ 2018-11-02 17:08 ` Stefan Beller
  2018-11-02 18:34   ` Duy Nguyen
  0 siblings, 1 reply; 4+ messages in thread
From: Stefan Beller @ 2018-11-02 17:08 UTC (permalink / raw)
  To: Duy Nguyen; +Cc: git

On Fri, Nov 2, 2018 at 9:10 AM Duy Nguyen <pclouds@gmail.com> wrote:
>
> I use git-bundle today and it occurs to me that if I want to use it to
> transfer part of a history that involves submodule changes, things
> aren't pretty. Has anybody given thought on how to do binary history
> transfer that contains changes from submodules?
>
> Since .bundle files are basically .pack files, I'm not sure if it's
> easy to bundle multiple pack files (one per repo)...

That is a really good discussion starter!

As bundles are modeled after the fetch protocol, I would
redirect the discussion there.

The new fetch protocol could support sending more than
one pack, which could be for both the superproject as
well as the relevant submodule updates (i.e. what is recorded
in the superproject) based on a new capability.

We at Google have given this idea some thought, but from a
different angle: As you may know currently Android uses the
repo tool, which we want to replace with Git's native submodules
eventually. The repo tool tests for each repository to clone if
there is a bundle file for that repository, such that instead of
cloning the repo, the bundle can be downloaded and then
a catch-up fetch can be performed. (This helps the Git servers
as well as the client: the bundle can be hosted on a CDN,
which is faster and cheaper than a git server for us).
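
A minimal sketch of that bundle-then-catch-up flow, using only plain git
commands. This is not how the repo tool is implemented; the helper names
`bootstrap_commands` and `bootstrap` are hypothetical, and the bundle is
assumed to have been downloaded to a local path already.

```python
import subprocess
from typing import List


def bootstrap_commands(bundle_path: str, remote_url: str, dest: str) -> List[List[str]]:
    """Return the git invocations for: clone from a bundle, then catch up.

    Hypothetical helper for illustration only.
    """
    return [
        # Clone from the locally downloaded bundle file instead of the server.
        ["git", "clone", bundle_path, dest],
        # Point 'origin' at the real server; it initially refers to the bundle.
        ["git", "-C", dest, "remote", "set-url", "origin", remote_url],
        # Catch-up fetch: the server only has to send what the bundle lacked.
        ["git", "-C", dest, "fetch", "origin"],
    ]


def bootstrap(bundle_path: str, remote_url: str, dest: str) -> None:
    """Run the commands in order, stopping at the first failure."""
    for argv in bootstrap_commands(bundle_path, remote_url, dest):
        subprocess.run(argv, check=True)
```

The bulk of the history comes from the static bundle file, so the Git
server only serves the comparatively small catch-up fetch.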

So we've given some thought on extending the packfiles in the
fetch protocol to have some redirection to a CDN possible,
i.e. instead of sending bytes as is, you get more or less a "todo"
list, which might be
    (a) take the following bytes as is (current pack format)
    (b) download these other bytes from $THERE
        (possibly with a checksum)
once the stream of bytes is assembled, it will look like a regular
packfile with deltas etc.
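
A rough sketch of how a client might process such a "todo" list. No such
protocol extension exists here; the step encoding, the `assemble` function,
the `fetch` callback and the choice of SHA-256 as checksum are all
assumptions made for illustration.

```python
import hashlib
from typing import Callable, Iterable, Tuple

# A step is either
#   ("inline", data)           -> (a) take these bytes as is
#   ("remote", url, checksum)  -> (b) download bytes from url; checksum is a
#                                 SHA-256 hex digest or None (hypothetical)
Step = Tuple


def assemble(steps: Iterable[Step], fetch: Callable[[str], bytes]) -> bytes:
    """Concatenate inline and downloaded chunks into one byte stream."""
    out = bytearray()
    for step in steps:
        kind = step[0]
        if kind == "inline":
            out += step[1]
        elif kind == "remote":
            _, url, checksum = step
            data = fetch(url)
            # Verify the chunk only when the server supplied a checksum.
            if checksum is not None and hashlib.sha256(data).hexdigest() != checksum:
                raise ValueError(f"checksum mismatch for {url}")
            out += data
        else:
            raise ValueError(f"unknown step kind: {kind!r}")
    return bytes(out)


# Usage with a dict standing in for the CDN:
cdn = {"https://cdn.example.invalid/chunk1": b"BULK-OBJECTS"}
steps = [
    ("inline", b"HEAD-"),
    ("remote", "https://cdn.example.invalid/chunk1",
     hashlib.sha256(b"BULK-OBJECTS").hexdigest()),
    ("inline", b"-TAIL"),
]
stream = assemble(steps, cdn.__getitem__)
```

The consumer of `stream` never sees the chunk boundaries, which matches the
requirement that the assembled bytes look like one regular packfile.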

This offloading-to-CDN (or "mostly resumable clone" in the
sense that the communication with the server is minimal, and
you get most of your data via resumable http range-requests)
sounds completely off-topic, but is one of the requirements
for the repo to submodule migration, hence I came to speak of it.

Did you have other things in mind, on a higher level?
e.g. querying the bundle and creating submodule bundles
based off the superproject bundle? 'git bundle create' could
learn the --recurse-submodules option, which then produces
multiple bundle files without changing the file formats.

Stefan


* Re: submodule support in git-bundle
  2018-11-02 17:08 ` Stefan Beller
@ 2018-11-02 18:34   ` Duy Nguyen
  2018-11-02 19:00     ` Stefan Beller
  0 siblings, 1 reply; 4+ messages in thread
From: Duy Nguyen @ 2018-11-02 18:34 UTC (permalink / raw)
  To: Stefan Beller; +Cc: Git Mailing List

On Fri, Nov 2, 2018 at 6:09 PM Stefan Beller <sbeller@google.com> wrote:
>
> On Fri, Nov 2, 2018 at 9:10 AM Duy Nguyen <pclouds@gmail.com> wrote:
> >
> > I use git-bundle today and it occurs to me that if I want to use it to
> > transfer part of a history that involves submodule changes, things
> > aren't pretty. Has anybody given thought on how to do binary history
> > transfer that contains changes from submodules?
> >
> > Since .bundle files are basically .pack files, I'm not sure if it's
> > easy to bundle multiple pack files (one per repo)...
>
> That is a really good discussion starter!
>
> As bundles are modeled after the fetch protocol, I would
> redirect the discussion there.
>
> The new fetch protocol could support sending more than
> one pack, which could be for both the superproject as
> well as the relevant submodule updates (i.e. what is recorded
> in the superproject) based on a new capability.
>
> We at Google have given this idea some thought, but from a
> different angle: As you may know currently Android uses the
> repo tool, which we want to replace with Git's native submodules
> eventually. The repo tool tests for each repository to clone if
> there is a bundle file for that repository, such that instead of
> cloning the repo, the bundle can be downloaded and then
> a catch-up fetch can be performed. (This helps the Git servers
> as well as the client: the bundle can be hosted on a CDN,
> which is faster and cheaper than a git server for us).
>
> So we've given some thought on extending the packfiles in the
> fetch protocol to have some redirection to a CDN possible,
> i.e. instead of sending bytes as is, you get more or less a "todo"
> list, which might be
>     (a) take the following bytes as is (current pack format)
>     (b) download these other bytes from $THERE
>         (possibly with a checksum)
> once the stream of bytes is assembled, it will look like a regular
> packfile with deltas etc.
>
> This offloading-to-CDN (or "mostly resumable clone" in the
> sense that the communication with the server is minimal, and
> you get most of your data via resumable http range-requests)
> sounds completely off-topic, but is one of the requirements
> for the repo to submodule migration, hence I came to speak of it.

Hm.. so what you're saying is, we could have a pack file that lists
other (real) pack files and for the bundle case they are all in the
same file. And "download from $THERE" in this case is "download at
this file offset"? That might actually work.
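
To make the "download at this file offset" reading concrete, here is a toy
single-file container. The layout (a length-prefixed JSON index followed by
the raw pack bytes) and the names `write_container` and `read_pack` are
invented for this sketch; nothing like it exists in the bundle format.

```python
import json
import struct
from typing import Dict


def write_container(packs: Dict[str, bytes]) -> bytes:
    """Hypothetical layout: 4-byte big-endian index length, JSON index,
    then all pack bytes back to back. Keys are repository paths, with ""
    standing for the superproject."""
    index = {}
    body = bytearray()
    for name, data in packs.items():
        # Offsets are relative to the start of the body, not of the file.
        index[name] = {"offset": len(body), "length": len(data)}
        body += data
    index_bytes = json.dumps(index).encode("utf-8")
    return struct.pack(">I", len(index_bytes)) + index_bytes + bytes(body)


def read_pack(container: bytes, name: str) -> bytes:
    """Follow the 'hyperlink' for one repository: seek to its offset."""
    (index_len,) = struct.unpack_from(">I", container, 0)
    index = json.loads(container[4:4 + index_len])
    entry = index[name]
    start = 4 + index_len + entry["offset"]
    return container[start:start + entry["length"]]


container = write_container({"": b"SUPERPROJECT-PACK", "lib/foo": b"SUBMODULE-PACK"})
```

Here `read_pack(container, "lib/foo")` returns only the bytes recorded for
that submodule, without touching the rest of the file.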

> Did you have other things in mind, on a higher level?
> e.g. querying the bundle and creating submodule bundles
> based off the superproject bundle? 'git bundle create' could
> learn the --recurse-submodules option, which then produces
> multiple bundle files without changing the file formats.

This is probably the simplest way to support submodules. I just
haven't really thought much about it (the problem just came up to me
like 2 hours ago). Two problems with this are convenience (I don't
want to handle multiple files) and submodule info (which pack should
be unbundled on which submodule?). But I suppose if "git bundle"
produces a tarball of these bundle files then you solve both.
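
A small sketch of the tarball idea, assuming each repository's bundle is
already available as bytes. `git bundle create --recurse-submodules` is only
a proposal in this thread, so the sketch does not call git at all; the
member naming, `manifest.json`, and the functions `pack_bundles` and
`unpack_bundles` are hypothetical.

```python
import io
import json
import tarfile
from typing import Dict


def _add_member(tar: tarfile.TarFile, name: str, data: bytes) -> None:
    """Add an in-memory blob to the archive under the given member name."""
    info = tarfile.TarInfo(name=name)
    info.size = len(data)
    tar.addfile(info, io.BytesIO(data))


def pack_bundles(bundles: Dict[str, bytes]) -> bytes:
    """bundles maps a submodule path ("" for the superproject) to bundle bytes."""
    buf = io.BytesIO()
    manifest = {}
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for i, (path, data) in enumerate(bundles.items()):
            # Numbered member names give the archive a natural order.
            member = f"{i:04d}.bundle"
            manifest[member] = path
            _add_member(tar, member, data)
        # The manifest records which bundle belongs to which submodule.
        _add_member(tar, "manifest.json", json.dumps(manifest).encode("utf-8"))
    return buf.getvalue()


def unpack_bundles(blob: bytes) -> Dict[str, bytes]:
    """Recover the path -> bundle bytes mapping from the archive."""
    with tarfile.open(fileobj=io.BytesIO(blob), mode="r") as tar:
        manifest = json.load(tar.extractfile("manifest.json"))
        return {path: tar.extractfile(member).read()
                for member, path in manifest.items()}
```

One archive addresses the convenience problem and the manifest addresses
the submodule-info problem; a real tool would then unbundle each entry in
the matching repository.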

But of course there may be other and better options like what you
described above. If in long term we have "pack with hyperlinks" anyway
for resumable clone and other fancy stuff then reusing the same
mechanism for bundles makes sense, less maintenance burden.
-- 
Duy


* Re: submodule support in git-bundle
  2018-11-02 18:34   ` Duy Nguyen
@ 2018-11-02 19:00     ` Stefan Beller
  0 siblings, 0 replies; 4+ messages in thread
From: Stefan Beller @ 2018-11-02 19:00 UTC (permalink / raw)
  To: Duy Nguyen; +Cc: git

> > This offloading-to-CDN (or "mostly resumable clone" in the
> > sense that the communication with the server is minimal, and
> > you get most of your data via resumable http range-requests)
> > sounds completely off-topic, but is one of the requirements
> > for the repo to submodule migration, hence I came to speak of it.
>
> Hm.. so what you're saying is, we could have a pack file that lists
> other (real) pack files and for the bundle case they are all in the
> same file. And "download from $THERE" in this case is "download at
> this file offset"? That might actually work.

We're conflating 2 things here.
This idea of CDN offloading has nothing to do with submodules, it's
just a general thing to improve the fetch protocol.
And the pointed at file doesn't need to be a "real" packfile, as long
as the bytestream at the end looks like a real packfile. For example
the bytes to get from $THERE would not need to have a pack header
(or if it had, I would ask you to omit the first bytes containing the header)
as I can give the header myself.
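
As a sketch of the header point: a packfile starts with a 12-byte header
(the signature "PACK", a 4-byte version and a 4-byte object count, both
big-endian). The functions below are illustrative names only, and the
sketch ignores the checksum trailer that ends a real packfile.

```python
import struct

PACK_HEADER_LEN = 12  # b"PACK" + 4-byte version + 4-byte object count


def pack_header(num_objects: int, version: int = 2) -> bytes:
    """Build the header the sender can supply itself."""
    return struct.pack(">4sII", b"PACK", version, num_objects)


def strip_header(data: bytes) -> bytes:
    """Drop a leading pack header if the remote bytes carry one.

    Simplification: assumes headerless data never happens to start
    with the bytes b"PACK".
    """
    if data[:4] == b"PACK":
        return data[PACK_HEADER_LEN:]
    return data


def assemble_stream(num_objects: int, remote_bytes: bytes) -> bytes:
    """Our own header followed by the header-free remote object data."""
    return pack_header(num_objects) + strip_header(remote_bytes)
```

Either way the recipient sees exactly one header at the front of the
stream, whether or not the bytes stored at $THERE included their own.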

The idea for submodules is more along the lines of having "just"
multiple pack files in the stream. For the bundle case we would
probably not have redirection to $THERE in there, as it should
be self contained completely (we don't know if the bundle recipient
can access $THERE in a timely manner).


> > Did you have other things in mind, on a higher level?
> > e.g. querying the bundle and creating submodule bundles
> > based off the superproject bundle? 'git bundle create' could
> > learn the --recurse-submodules option, which then produces
> > multiple bundle files without changing the file formats.
>
> This is probably the simplest way to support submodules.

Yep, that sounds simplest, but I think it makes for bad UX.
(Multiple files that need to be kept in some order and applied correctly.)


> I just
> haven't really thought much about it (the problem just came up to me
> like 2 hours ago). Two problems with this are convenience (I don't
> want to handle multiple files) and submodule info (which pack should
> be unbundled on which submodule?). But I suppose if "git bundle"
> produces a tarball of these bundle files then you solve both.

The tarball makes it one file and would naturally provide some
order. It feels iffy, I'd rather have multiple packs in the bundle.

> But of course there may be other and better options like what you
> described above. If in long term we have "pack with hyperlinks" anyway
> for resumable clone and other fancy stuff then reusing the same
> mechanism for bundles makes sense, less maintenance burden.

I think of the hyperlinks in packs as an orthogonal feature, but one that
sits close by in code and implementation, which is why I brought it up.
