git@vger.kernel.org mailing list mirror (one of many)
From: Duy Nguyen <pclouds@gmail.com>
To: Christian Couder <christian.couder@gmail.com>
Cc: Junio C Hamano <gitster@pobox.com>, git <git@vger.kernel.org>,
	Jeff King <peff@peff.net>
Subject: Re: [PATCH v2 4/4] bundle v3: the beginning
Date: Tue, 31 May 2016 19:43:27 +0700
Message-ID: <CACsJy8Dr_Z886Jb-O8gbAv_vzBLicNH6bPPpKwb9HWZTKQ9muw@mail.gmail.com>
In-Reply-To: <CAP8UFD1xqRMFE2Wzntu=XevCyj+acGLEO-cTq1fqn+NMe3x0vg@mail.gmail.com>

On Fri, May 20, 2016 at 7:39 PM, Christian Couder
<christian.couder@gmail.com> wrote:
> I am responding to this 2+ month old email because I am investigating
> adding an alternate object store at the same level as loose and packed
> objects. This alternate object store could be used for large files. I
> am working on this for GitLab. (Yeah, I am working, as a freelancer,
> for both Booking.com and GitLab these days.)

I'm also interested in this from a different angle: narrow clone, which
could potentially skip downloading some large blobs (likely old ones
from the past that nobody will bother with).

> On Wed, Mar 2, 2016 at 9:32 PM, Junio C Hamano <gitster@pobox.com> wrote:
>> The bundle v3 format introduces an ability to have the bundle header
>> (which describes what references in the bundled history can be
>> fetched, and what objects the receiving repository must have in
>> order to unbundle it successfully) in one file, and the bundled pack
>> stream data in a separate file.
>>
>> A v3 bundle file begins with a line with "# v3 git bundle", followed
>> by zero or more "extended header" lines, and an empty line, finally
>> followed by the list of prerequisites and references in the same
>> format as v2 bundle.  If it uses the "split bundle" feature, there
>> is a "data: $URL" extended header line, and nothing follows the list
>> of prerequisites and references.  Also, a "sha1: " extended header
>> line may exist to help validate that the pack stream data matches
>> the bundle header.
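A minimal sketch of reading such a header, assuming extended header lines have the form "key: value" (only "data:" and "sha1:" are named above) and that the prerequisite and reference lines keep the v2 forms "-<oid> <comment>" and "<oid> <refname>". `parse_v3_header` and `BundleHeader` are hypothetical names, not the API in git's bundle.c, and the sketch only handles the text header, not pack data that may follow it in a non-split bundle.

```python
from dataclasses import dataclass, field

V3_SIGNATURE = "# v3 git bundle"


@dataclass
class BundleHeader:
    extended: dict = field(default_factory=dict)       # e.g. {"data": url, "sha1": hex}
    prerequisites: list = field(default_factory=list)  # (oid, comment) pairs
    references: list = field(default_factory=list)     # (oid, refname) pairs

    @property
    def is_split(self) -> bool:
        # A split bundle carries a "data: $URL" extended header line.
        return "data" in self.extended


def parse_v3_header(text: str) -> BundleHeader:
    lines = text.split("\n")
    if not lines or lines[0] != V3_SIGNATURE:
        raise ValueError("not a v3 bundle")
    hdr = BundleHeader()
    i = 1
    # Zero or more extended header lines, terminated by an empty line.
    while i < len(lines) and lines[i] != "":
        key, sep, value = lines[i].partition(": ")
        if not sep:
            raise ValueError(f"malformed extended header: {lines[i]!r}")
        hdr.extended[key] = value
        i += 1
    if i == len(lines):
        raise ValueError("missing empty line after extended headers")
    i += 1  # skip the empty separator line
    # Prerequisites and references in v2 form, up to the next empty line
    # (for a split bundle, nothing follows this list).
    while i < len(lines) and lines[i] != "":
        line = lines[i]
        if line.startswith("-"):
            oid, _, comment = line[1:].partition(" ")
            hdr.prerequisites.append((oid, comment))
        else:
            oid, _, refname = line.partition(" ")
            hdr.references.append((oid, refname))
        i += 1
    return hdr
```

Splitting on the first ": " rather than the first ":" keeps URLs such as "https://..." intact in the "data:" value.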
>>
>> A typical expected use of a split bundle is to help initial clone
>> that involves a huge data transfer, and would go like this:
>>
>>  - Any repository people would clone and fetch from would regularly
>>    be repacked, and it is expected that there would be a packfile
>>    without prerequisites that holds all (or at least most) of the
>>    history of it (call it pack-$name.pack).
>>
>>  - After arranging that packfile to be downloadable over popular
>>    transfer methods used for serving static files (such as HTTP or
>>    HTTPS) that are easily resumable as $URL/pack-$name.pack, a v3
>>    bundle file (call it $name.bndl) can be prepared with an extended
>>    header "data: $URL/pack-$name.pack" to point at the download
>>    location for the packfile, and be served at "$URL/$name.bndl".
>>
>>  - An updated Git client, when trying to "git clone" from such a
>>    repository, may be redirected to "$URL/$name.bndl", which would be
>>    a tiny text file (when the split bundle feature is used).
>>
>>  - The client would then inspect the downloaded $name.bndl, learn
>>    that the corresponding packfile exists at $URL/pack-$name.pack,
>>    and download it as pack-$name.pack, retrying until it succeeds.
>>    This can easily be done with "wget --continue" equivalent over an
>>    unreliable link.  The checksum recorded on the "sha1: " header
>>    line is expected to be used by this downloader (not written yet).
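The client side of that last step could look roughly like this. It is only a sketch: `fetch_range` is a hypothetical stand-in for an HTTP GET with a "Range: bytes=<offset>-" header (the total size taken from "Content-Range"), and it assumes the "sha1: " header holds the hex SHA-1 of the whole pack file, which the description above does not pin down.

```python
import hashlib
import os
from typing import Callable, Tuple

# Hypothetical transport: given a byte offset, return (data starting at that
# offset, total length of the remote file), or raise OSError on failure.
FetchRange = Callable[[int], Tuple[bytes, int]]


def download_with_resume(path: str, fetch_range: FetchRange,
                         max_attempts: int = 10) -> None:
    """Append to `path` until it holds the whole remote file, like `wget --continue`."""
    for _ in range(max_attempts):
        have = os.path.getsize(path) if os.path.exists(path) else 0
        try:
            chunk, total = fetch_range(have)
        except OSError:
            continue  # transient failure: retry from what we already have
        with open(path, "ab") as f:
            f.write(chunk)
        if have + len(chunk) >= total:
            return
    raise RuntimeError(f"gave up downloading {path} after {max_attempts} attempts")


def verify_sha1(path: str, expected_hex: str) -> bool:
    """Compare the file's SHA-1 with the (assumed whole-file) "sha1: " header value."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(1 << 16), b""):
            h.update(block)
    return h.hexdigest() == expected_hex
```

Resuming from the current file size is what makes this tolerant of an unreliable link; the checksum is only checked once the file is complete.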
>
> I wonder if this mechanism could also be used or extended to clone and
> fetch an alternate object database.
>
> In [1], [2] and [3] (and also during the Contributor Summit last
> month), Peff said that he started working on alternate object
> database support a long time ago, and that the hard part is a
> protocol extension to tell remotes that you can access some
> objects in a different way.
>
> If a Git client downloaded a "$name.bndl" v3 bundle file that
> had a "data: $URL/alt-odb-$name.odb" extended header, the Git
> client would just need to download "$URL/alt-odb-$name.odb" and use
> the alternate object database support on this file.

What does this file contain exactly? A list of SHA-1s that can be
retrieved from this remote/alternate odb? I wonder if we could just
use git-replace for this marking. The replacement content could contain
the URI pointing to the alt odb. We could then optionally contact the
alt odb to retrieve the real content, or just show the replacement
(fake) data when the alt odb is out of reach. Transferring git-replace
refs is basically a ref exchange, which may be fine if you don't have a
lot of objects in this alt odb. If you do, well, we need to deal with
lots of refs anyway, and this use case would benefit from that work too.
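To make the git-replace idea a bit more concrete, a rough sketch. The placeholder format ("alt-odb: <uri>" / "oid: <oid>") and the helper names are purely hypothetical; what is real git is the blob object name (SHA-1 of "blob <size>\0" followed by the content, as printed by `git hash-object`) and the convention that a replacement for object X lives at refs/replace/X, which is what `git replace X Y` creates.

```python
import hashlib
from typing import Optional, Tuple


def git_blob_oid(content: bytes) -> str:
    """SHA-1 object name git assigns to a blob with this content."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


def make_placeholder(original_oid: str, alt_odb_uri: str) -> bytes:
    # Hypothetical placeholder format; nothing in git defines it.
    return f"alt-odb: {alt_odb_uri}\noid: {original_oid}\n".encode("utf-8")


def parse_placeholder(content: bytes) -> Optional[Tuple[str, str]]:
    """Return (uri, original_oid) if `content` is such a placeholder, else None."""
    fields = {}
    for line in content.decode("utf-8", errors="replace").splitlines():
        key, sep, value = line.partition(": ")
        if not sep:
            return None
        fields[key] = value
    if set(fields) != {"alt-odb", "oid"}:
        return None
    return fields["alt-odb"], fields["oid"]


def replace_ref_for(original_oid: str, alt_odb_uri: str) -> Tuple[str, str, bytes]:
    """Ref name, placeholder blob oid, and placeholder body standing in for a remote blob."""
    body = make_placeholder(original_oid, alt_odb_uri)
    return f"refs/replace/{original_oid}", git_blob_oid(body), body
```

A reader that finds a placeholder could try the URI and fall back to showing the placeholder text when the alt odb is unreachable; each marked object still costs one ref, which is where the concern about many objects comes in.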

> [3] http://thread.gmane.org/gmane.comp.version-control.git/202902/focus=203020

This points to https://github.com/peff/git/commits/jk/external-odb,
which is dead. Jeff, do you still have it somewhere, or is it not
worth looking at anymore?
-- 
Duy


Thread overview: 37+ messages
2016-03-01 23:35 [PATCH 1/2] bundle: plug resource leak Junio C Hamano
2016-03-01 23:36 ` [PATCH 2/2] bundle: keep a copy of bundle file name in the in-core bundle header Junio C Hamano
2016-03-02  9:01   ` Jeff King
2016-03-02 18:15     ` Junio C Hamano
2016-03-02 20:32       ` [PATCH v2 0/4] "split bundle" preview Junio C Hamano
2016-03-02 20:32         ` [PATCH v2 1/4] bundle doc: 'verify' is not about verifying the bundle Junio C Hamano
2016-03-02 20:32         ` [PATCH v2 2/4] bundle: plug resource leak Junio C Hamano
2016-03-02 20:32         ` [PATCH v2 3/4] bundle: keep a copy of bundle file name in the in-core bundle header Junio C Hamano
2016-03-02 20:49           ` Jeff King
2016-03-02 20:32         ` [PATCH v2 4/4] bundle v3: the beginning Junio C Hamano
2016-03-03  1:36           ` Duy Nguyen
2016-03-03  2:57             ` Junio C Hamano
2016-03-03  5:15               ` Duy Nguyen
2016-05-20 12:39           ` Christian Couder
2016-05-31 12:43             ` Duy Nguyen [this message]
2016-05-31 13:18               ` Christian Couder
2016-06-01 13:37                 ` Duy Nguyen
2016-06-07 14:49                   ` Christian Couder
2016-06-01 14:00                 ` Duy Nguyen
2016-06-07  8:46                   ` Christian Couder
2016-06-07  8:53                     ` Mike Hommey
2016-06-07 10:22                     ` Duy Nguyen
2016-06-07 19:23                     ` Junio C Hamano
2016-06-07 20:23                       ` Jeff King
2016-06-08 10:44                         ` Duy Nguyen
2016-06-08 16:19                           ` Jeff King
2016-06-09  8:53                             ` Duy Nguyen
2016-06-09 17:23                               ` Jeff King
2016-06-08 18:05                         ` Junio C Hamano
2016-06-08 19:00                           ` Jeff King
2016-05-31 22:23               ` Jeff King
2016-05-31 22:31             ` Jeff King
2016-06-07 13:19               ` Christian Couder
2016-06-07 20:35                 ` Jeff King
2016-03-02  8:54 ` [PATCH 1/2] bundle: plug resource leak Jeff King
2016-03-02  9:00   ` Junio C Hamano
2016-03-02  9:02     ` Jeff King
