git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Christian Couder <christian.couder@gmail.com>
To: Jonathan Tan <jonathantanmy@google.com>
Cc: git <git@vger.kernel.org>, Junio C Hamano <gitster@pobox.com>,
	Jeff King <peff@peff.net>, Ben Peart <Ben.Peart@microsoft.com>,
	Nguyen Thai Ngoc Duy <pclouds@gmail.com>,
	Mike Hommey <mh@glandium.org>,
	Lars Schneider <larsxschneider@gmail.com>,
	Eric Wong <e@80x24.org>,
	Christian Couder <chriscool@tuxfamily.org>
Subject: Re: [RFC/PATCH v4 00/49] Add initial experimental external ODB support
Date: Fri, 15 Sep 2017 15:16:53 +0200	[thread overview]
Message-ID: <CAP8UFD1hYHU0Wb+=vd3OK1cKUOKETEH5APGNaXa0W2ZBUUNgFQ@mail.gmail.com> (raw)
In-Reply-To: <20170712120647.6340f75a@twelve2.svl.corp.google.com>

(It looks like I did not reply to this other email yet, sorry about
this late reply.)

On Wed, Jul 12, 2017 at 9:06 PM, Jonathan Tan <jonathantanmy@google.com> wrote:
> On Tue, 20 Jun 2017 09:54:34 +0200
> Christian Couder <christian.couder@gmail.com> wrote:
>
>> Git can store its objects only in the form of loose objects in
>> separate files or packed objects in a pack file.
>>
>> To be able to better handle some kind of objects, for example big
>> blobs, it would be nice if Git could store its objects in other object
>> databases (ODB).
>
> Thanks for this, and sorry for the late reply. It's good to know that
> others are thinking about "missing" objects in repos too.
>
>>   - "have": the helper should respond with the sha1, size and type of
>>     all the objects the external ODB contains, one object per line.
>
> This should work well if we are not caching this "have" information
> locally (that is, if the object store can be accessed with low latency),
> but I am not sure if this will work otherwise.

Yeah, there could be problems related to caching or not caching the
"have" information.
As a repo should not send the blobs that are in an external odb, I
think it could be useful to cache the "have" information.
I plan to take a look and add related tests soon.

> I see that you have
> proposed a local cache-using method later in the e-mail - my comments on
> that are below.
>
>>   - "get <sha1>": the helper should then read from the external ODB
>>     the content of the object corresponding to <sha1> and pass it to
>> Git.
>
> This makes sense - I have some patches [1] that implement this with the
> "fault_in" mechanism described in your e-mail.
>
> [1] https://public-inbox.org/git/cover.1499800530.git.jonathantanmy@google.com/
>
>> * Transfering information
>>
>> To tranfer information about the blobs stored in external ODB, some
>> special refs, called "odb ref", similar as replace refs, are used in
>> the tests of this series, but in general nothing forces the helper to
>> use that mechanism.
>>
>> The external odb helper is responsible for using and creating the refs
>> in refs/odbs/<odbname>/, if it wants to do that. It is free for
>> example to just create one ref, as it is also free to create many
>> refs. Git would just transmit the refs that have been created by this
>> helper, if Git is asked to do so.
>>
>> For now in the tests there is one odb ref per blob, as it is simple
>> and as it is similar to what git-lfs does. Each ref name is
>> refs/odbs/<odbname>/<sha1> where <sha1> is the sha1 of the blob stored
>> in the external odb named <odbname>.
>>
>> These odb refs point to a blob that is stored in the Git
>> repository and contain information about the blob stored in the
>> external odb. This information can be specific to the external odb.
>> The repos can then share this information using commands like:
>>
>> `git fetch origin "refs/odbs/<odbname>/*:refs/odbs/<odbname>/*"`
>>
>> At the end of the current patch series, "git clone" is teached a
>> "--initial-refspec" option, that asks it to first fetch some specified
>> refs. This is used in the tests to fetch the odb refs first.
>>
>> This way only one "git clone" command can setup a repo using the
>> external ODB mechanism as long as the right helper is installed on the
>> machine and as long as the following options are used:
>>
>>   - "--initial-refspec <odbrefspec>" to fetch the odb refspec
>>   - "-c odb.<odbname>.command=<helper>" to configure the helper
>
> A method like this means that information about every object is
> downloaded, regardless of which branches were actually cloned, and
> regardless of what parameters (e.g. max blob size) were used to control
> the objects that were actually cloned.
>
> We could make, say, one "odb ref" per size and branch - for example,
> "refs/odbs/master/0", "refs/odbs/master/1k", "refs/odbs/master/1m", etc.
> - and have the client know which one to download. But this wouldn't
> scale if we introduce different object filters in the clone and fetch
> commands.

Yeah, there are multiple ways to do that.

> I think that it is best to have upload-pack send this information
> together with the packfile, since it knows exactly what objects were
> omitted, and therefore what information the client needs. As discussed
> in a sibling e-mail, clone/fetch already needs to be modified to omit
> objects anyway.

I try to avoid sending this information as I don't think it is
necessary and it simplify things a lot to not have to change the
communication protocol.

      reply	other threads:[~2017-09-15 13:16 UTC|newest]

Thread overview: 64+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-06-20  7:54 [RFC/PATCH v4 00/49] Add initial experimental external ODB support Christian Couder
2017-06-20  7:54 ` [RFC/PATCH v4 01/49] builtin/clone: get rid of 'value' strbuf Christian Couder
2017-06-20  7:54 ` [RFC/PATCH v4 02/49] t0021/rot13-filter: refactor packet reading functions Christian Couder
2017-06-20  7:54 ` [RFC/PATCH v4 03/49] t0021/rot13-filter: improve 'if .. elsif .. else' style Christian Couder
2017-06-20  7:54 ` [RFC/PATCH v4 04/49] Add Git/Packet.pm from parts of t0021/rot13-filter.pl Christian Couder
2017-06-20  7:54 ` [RFC/PATCH v4 05/49] t0021/rot13-filter: use Git/Packet.pm Christian Couder
2017-06-20  7:54 ` [RFC/PATCH v4 06/49] Git/Packet.pm: improve error message Christian Couder
2017-06-20  7:54 ` [RFC/PATCH v4 07/49] Git/Packet.pm: add packet_initialize() Christian Couder
2017-06-23 18:55   ` Ben Peart
2017-06-20  7:54 ` [RFC/PATCH v4 08/49] Git/Packet: add capability functions Christian Couder
2017-06-20  7:54 ` [RFC/PATCH v4 09/49] Add initial external odb support Christian Couder
2017-06-23 19:49   ` Ben Peart
2017-06-20  7:54 ` [RFC/PATCH v4 10/49] external odb foreach Christian Couder
2017-06-20  7:54 ` [RFC/PATCH v4 11/49] t0400: add 'put' command to odb-helper script Christian Couder
2017-06-20  7:54 ` [RFC/PATCH v4 12/49] external odb: add write support Christian Couder
2017-06-20  7:54 ` [RFC/PATCH v4 13/49] external-odb: accept only blobs for now Christian Couder
2017-06-20  7:54 ` [RFC/PATCH v4 14/49] t0400: add test for external odb write support Christian Couder
2017-06-20  7:54 ` [RFC/PATCH v4 15/49] Add GIT_NO_EXTERNAL_ODB env variable Christian Couder
2017-06-20  7:54 ` [RFC/PATCH v4 16/49] Add t0410 to test external ODB transfer Christian Couder
2017-06-20  7:54 ` [RFC/PATCH v4 17/49] lib-httpd: pass config file to start_httpd() Christian Couder
2017-06-20  7:54 ` [RFC/PATCH v4 18/49] lib-httpd: add upload.sh Christian Couder
2017-06-20  7:54 ` [RFC/PATCH v4 19/49] lib-httpd: add list.sh Christian Couder
2017-06-20  7:54 ` [RFC/PATCH v4 20/49] lib-httpd: add apache-e-odb.conf Christian Couder
2017-06-20  7:54 ` [RFC/PATCH v4 21/49] odb-helper: add 'store_plain_objects' to 'struct odb_helper' Christian Couder
2017-06-20  7:54 ` [RFC/PATCH v4 22/49] pack-objects: don't pack objects in external odbs Christian Couder
2017-06-20  7:54 ` [RFC/PATCH v4 23/49] t0420: add test with HTTP external odb Christian Couder
2017-06-20  7:54 ` [RFC/PATCH v4 24/49] odb-helper: start fault in implementation Christian Couder
2017-06-20  7:54 ` [RFC/PATCH v4 25/49] external-odb: add external_odb_fault_in_object() Christian Couder
2017-06-20  7:55 ` [RFC/PATCH v4 26/49] odb-helper: add script_mode Christian Couder
2017-06-20  7:55 ` [RFC/PATCH v4 27/49] Documentation: add read-object-protocol.txt Christian Couder
2017-06-20  7:55 ` [RFC/PATCH v4 28/49] contrib: add long-running-read-object/example.pl Christian Couder
2017-06-20  7:55 ` [RFC/PATCH v4 29/49] Add t0410 to test read object mechanism Christian Couder
2017-06-20  7:55 ` [RFC/PATCH v4 30/49] odb-helper: add read_object_process() Christian Couder
2017-07-10 15:57   ` Ben Peart
2017-06-20  7:55 ` [RFC/PATCH v4 31/49] external-odb: add external_odb_get_capabilities() Christian Couder
2017-06-20  7:55 ` [RFC/PATCH v4 32/49] t04*: add 'get_cap' support to helpers Christian Couder
2017-06-20  7:55 ` [RFC/PATCH v4 33/49] odb-helper: call odb_helper_lookup() with 'have' capability Christian Couder
2017-06-20  7:55 ` [RFC/PATCH v4 34/49] odb-helper: fix odb_helper_fetch_object() for read_object Christian Couder
2017-06-20  7:55 ` [RFC/PATCH v4 35/49] Add t0460 to test passing git objects Christian Couder
2017-06-20  7:55 ` [RFC/PATCH v4 36/49] odb-helper: add read_packetized_git_object_to_fd() Christian Couder
2017-06-20  7:55 ` [RFC/PATCH v4 37/49] odb-helper: add read_packetized_plain_object_to_fd() Christian Couder
2017-06-20  7:55 ` [RFC/PATCH v4 38/49] Add t0470 to test passing plain objects Christian Couder
2017-06-20  7:55 ` [RFC/PATCH v4 39/49] odb-helper: add write_object_process() Christian Couder
2017-06-20  7:55 ` [RFC/PATCH v4 40/49] Add t0480 to test "have" capability and plain objects Christian Couder
2017-06-20  7:55 ` [RFC/PATCH v4 41/49] external-odb: add external_odb_do_fetch_object() Christian Couder
2017-06-20  7:55 ` [RFC/PATCH v4 42/49] odb-helper: advertise 'have' capability Christian Couder
2017-06-20  7:55 ` [RFC/PATCH v4 43/49] odb-helper: advertise 'put' capability Christian Couder
2017-06-20  7:55 ` [RFC/PATCH v4 44/49] odb-helper: add have_object_process() Christian Couder
2017-06-20  7:55 ` [RFC/PATCH v4 45/49] clone: add initial param to write_remote_refs() Christian Couder
2017-06-20  7:55 ` [RFC/PATCH v4 46/49] clone: add --initial-refspec option Christian Couder
2017-06-20  7:55 ` [RFC/PATCH v4 47/49] clone: disable external odb before initial clone Christian Couder
2017-06-20  7:55 ` [RFC/PATCH v4 48/49] Add test for 'clone --initial-refspec' Christian Couder
2017-06-20  7:55 ` [RFC/PATCH v4 49/49] t: add t0430 to test cloning using bundles Christian Couder
2017-06-20 13:48 ` [RFC/PATCH v4 00/49] Add initial experimental external ODB support Christian Couder
2017-06-23 18:24 ` Ben Peart
2017-07-01 19:41   ` Christian Couder
2017-07-01 20:12     ` Christian Couder
2017-07-01 20:33     ` Junio C Hamano
2017-07-02  4:25       ` Christian Couder
2017-07-03 16:56         ` Junio C Hamano
2017-07-06 17:36     ` Ben Peart
2017-09-15 12:56       ` Christian Couder
2017-07-12 19:06 ` Jonathan Tan
2017-09-15 13:16   ` Christian Couder [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAP8UFD1hYHU0Wb+=vd3OK1cKUOKETEH5APGNaXa0W2ZBUUNgFQ@mail.gmail.com' \
    --to=christian.couder@gmail.com \
    --cc=Ben.Peart@microsoft.com \
    --cc=chriscool@tuxfamily.org \
    --cc=e@80x24.org \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=jonathantanmy@google.com \
    --cc=larsxschneider@gmail.com \
    --cc=mh@glandium.org \
    --cc=pclouds@gmail.com \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).