git@vger.kernel.org mailing list mirror (one of many)
From: Jonathan Tan <jonathantanmy@google.com>
To: Christian Couder <christian.couder@gmail.com>
Cc: Stefan Beller <sbeller@google.com>,
	"git@vger.kernel.org" <git@vger.kernel.org>,
	Junio C Hamano <gitster@pobox.com>, Jeff King <peff@peff.net>,
	Ben Peart <Ben.Peart@microsoft.com>,
	Nguyen Thai Ngoc Duy <pclouds@gmail.com>,
	Mike Hommey <mh@glandium.org>,
	Lars Schneider <larsxschneider@gmail.com>,
	Eric Wong <e@80x24.org>,
	Christian Couder <chriscool@tuxfamily.org>
Subject: Re: [PATCH v5 35/40] Add Documentation/technical/external-odb.txt
Date: Fri, 25 Aug 2017 14:23:00 -0700
Message-ID: <20170825142300.01b15d6b@twelve2.svl.corp.google.com>
In-Reply-To: <CAP8UFD1oONnj93UKf=nBzgOQtY2E+ZVvoLGDNGLsZVobfiN90Q@mail.gmail.com>

On Fri, 25 Aug 2017 08:14:08 +0200
Christian Couder <christian.couder@gmail.com> wrote:

> As Git is used by more and more people having different needs, I
> think it is not realistic to expect that we can optimize its object
> storage for all these different needs. So a better strategy is to just
> let them store objects in external stores.
[snip]
> About these many use cases, I gave the "really big binary files"
> example which is why Git LFS exists (and which GitLab is interested in
> better solving), and the "really big number of files that are fetched
> only as needed" example which Microsoft is interested in solving. I
> could also imagine that some people have both big text files and big
> binary files in which case the "core.bigfilethreshold" might not work
> well, or that some people already have blobs in some different stores
> (like HTTP servers, Docker registries, artifact stores, ...) and want
> to fetch them from there as much as possible. 

Thanks for explaining the use cases - this makes sense, especially the
last one which motivates the different modes for the "get" command
(return raw bytes vs populating the Git repository with loose/packed
objects).

> And then letting people
> use different stores can make clones or fetches restartable which
> would solve another problem people have long been complaining about...

This is unrelated to the rest of my e-mail, but out of curiosity, how
would a different store make clones or fetches restartable? Do you mean
that Git would invoke a "fetch" command through the ODB protocol instead
of using its own native protocol?

> >> +Furthermore many improvements that are dependent on specific setups
> >> +could be implemented in the way Git objects are managed if it was
> >> +possible to customize how the Git objects are handled. For example a
> >> +restartable clone using the bundle mechanism has often been requested,
> >> +but implementing that would go against the current strict rules under
> >> +which the Git objects are currently handled.
> >
> > So in this example, you would use today's git-clone to obtain a small version
> > of the repo and then obtain other objects later?
> 
> The problem with explaining how it would work is that the
> --initial-refspec option is added to git clone later in the patch
> series. And there could be changes in the later part of the patch
> series. So I don't want to promise or explain too much here.
> But maybe I could add another patch to better explain that at the end
> of the series.

Such an explanation, in whatever form (patch or e-mail) would be great,
because I'm not sure of the interaction between fetches and the
connectivity check.

The approach I have taken in my own patches [1] is to (1) declare that
if a lazy remote supplies an object, it promises to have everything
referred to by that object, and (2) we thus only need to check the
objects not from the lazy remote. Translated to the ODB world, (1) is
possible in the Microsoft case and is trivial in all the cases where the
ODB provides only blobs (since blobs don't refer to any other object),
and for (2), a "list" command should suffice.
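To make rules (1) and (2) concrete, here is a minimal Python sketch of
such a connectivity check. It is a toy model, not Git's code: object IDs
are plain strings, `local_objects` (a hypothetical name) maps each
locally stored object to the IDs it refers to, and `lazy_oids` (also
hypothetical) stands in for whatever the ODB's "list" command would
return.

```python
from collections import deque
from typing import Dict, Iterable, List, Set


def check_connectivity(
    tips: Iterable[str],
    local_objects: Dict[str, List[str]],
    lazy_oids: Set[str],
) -> List[str]:
    """Return IDs reachable from `tips` that are neither stored locally
    nor vouched for by the lazy remote (an empty list means OK)."""
    missing: List[str] = []
    seen: Set[str] = set()
    queue = deque(tips)
    while queue:
        oid = queue.popleft()
        if oid in seen:
            continue
        seen.add(oid)
        if oid in lazy_oids:
            # Promise (1): the lazy remote has everything this object
            # refers to, so its references are not walked at all.
            continue
        if oid not in local_objects:
            # Not from the lazy remote and not stored here: broken.
            missing.append(oid)
            continue
        # Rule (2): only objects not from the lazy remote are walked.
        queue.extend(local_objects[oid])
    return missing
```

With `local = {"c2": ["t2", "c1"], "t2": ["b2"], "b2": []}`,
`check_connectivity(["c2"], local, {"c1"})` returns an empty list, while
passing an empty `lazy_oids` reports `["c1"]` as missing. Note that the
walk stops at the lazy object without ever needing a list of what lies
behind it, which is the point of the promise.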

One constraint is that we do not want to obtain (from the remote) or
store a separate list of what it has, to avoid the overhead. (I saw the
--initial-refspec approach - that would not work if we want to avoid the
overhead.)

For fetches, we remember the objects obtained from that specific remote
by adding a special file, name to be determined (I used ".imported" in
[1]). (The same method is used to note objects lazily downloaded.) The
repack command understands the difference between these two types of
objects (patches for this are in progress).
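A repack that understands these two types of objects first has to tell
the packs apart. The Python sketch below assumes, hypothetically, that
the marker is a sibling file sharing the pack's basename (for example
`pack-<hash>.imported` next to `pack-<hash>.pack`); the function name
`partition_packs` is made up for illustration, and it only partitions a
directory listing without repacking anything.

```python
from typing import Iterable, List, Tuple


def partition_packs(filenames: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Split pack files into (imported, local) based on the presence of
    a hypothetical '<basename>.imported' marker file beside each pack."""
    names = set(filenames)
    imported: List[str] = []
    local: List[str] = []
    for name in sorted(names):
        if not name.endswith(".pack"):
            continue  # skip .idx files, markers, and anything else
        marker = name[: -len(".pack")] + ".imported"
        if marker in names:
            imported.append(name)  # obtained from the lazy remote
        else:
            local.append(name)  # created or fetched normally
    return imported, local
```

For the listing `["pack-aaa.pack", "pack-aaa.idx", "pack-aaa.imported",
"pack-bbb.pack", "pack-bbb.idx"]` it returns `(["pack-aaa.pack"],
["pack-bbb.pack"])`, so a repack could then handle the two groups
separately.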

I'm not sure if this can be translated to the ODB world. The ODB can
declare a special capability that fetch sends to the server in order to
inform the server that it can exclude certain objects, and fetch can
inform the ODB of the packfiles that it has written, but I'm not sure
how the ODB can "remember" what it has. The ODB could mark such packs
with ".managed" to note that they are managed by that ODB, so Git
shouldn't touch them, but this means (for example) that Git can't GC
them (and it also seems quite contradictory for an ODB to manage Git packfiles).

[1] https://public-inbox.org/git/20170804145113.5ceafafa@twelve2.svl.corp.google.com/

Thread overview: 73+ messages
2017-08-03  9:18 [PATCH v5 00/40] Add initial experimental external ODB support Christian Couder
2017-08-03  9:18 ` [PATCH v5 01/40] builtin/clone: get rid of 'value' strbuf Christian Couder
2017-08-03  9:18 ` [PATCH v5 02/40] t0021/rot13-filter: refactor packet reading functions Christian Couder
2017-08-03  9:18 ` [PATCH v5 03/40] t0021/rot13-filter: improve 'if .. elsif .. else' style Christian Couder
2017-08-03  9:18 ` [PATCH v5 04/40] Add Git/Packet.pm from parts of t0021/rot13-filter.pl Christian Couder
2017-08-03 19:11   ` Junio C Hamano
2017-08-04  6:32     ` Christian Couder
2017-08-03  9:18 ` [PATCH v5 05/40] t0021/rot13-filter: use Git/Packet.pm Christian Couder
2017-08-03  9:18 ` [PATCH v5 06/40] Git/Packet.pm: improve error message Christian Couder
2017-08-03  9:18 ` [PATCH v5 07/40] Git/Packet.pm: add packet_initialize() Christian Couder
2017-08-03  9:18 ` [PATCH v5 08/40] Git/Packet.pm: add capability functions Christian Couder
2017-08-03 19:14   ` Junio C Hamano
2017-08-04 20:34     ` Christian Couder
2017-08-03  9:18 ` [PATCH v5 09/40] sha1_file: prepare for external odbs Christian Couder
2017-08-03  9:18 ` [PATCH v5 10/40] Add initial external odb support Christian Couder
2017-08-03 19:34   ` Junio C Hamano
2017-08-03 20:17     ` Jeff King
2017-09-14 10:14     ` Christian Couder
2017-08-03  9:18 ` [PATCH v5 11/40] odb-helper: add odb_helper_init() to send 'init' instruction Christian Couder
2017-09-10 12:12   ` Lars Schneider
2017-09-14  7:18     ` Christian Couder
2017-08-03  9:18 ` [PATCH v5 12/40] t0400: add 'put_raw_obj' instruction to odb-helper script Christian Couder
2017-09-10 12:12   ` Lars Schneider
2017-09-14  7:09     ` Christian Couder
2017-08-03  9:18 ` [PATCH v5 13/40] external odb: add 'put_raw_obj' support Christian Couder
2017-08-03 19:50   ` Junio C Hamano
2017-09-14  9:17     ` Christian Couder
2017-08-03  9:19 ` [PATCH v5 14/40] external-odb: accept only blobs for now Christian Couder
2017-08-03 19:52   ` Junio C Hamano
2017-09-14  9:59     ` Christian Couder
2017-08-03  9:19 ` [PATCH v5 15/40] t0400: add test for external odb write support Christian Couder
2017-08-03  9:19 ` [PATCH v5 16/40] Add GIT_NO_EXTERNAL_ODB env variable Christian Couder
2017-08-03  9:19 ` [PATCH v5 17/40] Add t0410 to test external ODB transfer Christian Couder
2017-08-03  9:19 ` [PATCH v5 18/40] lib-httpd: pass config file to start_httpd() Christian Couder
2017-08-03  9:19 ` [PATCH v5 19/40] lib-httpd: add upload.sh Christian Couder
2017-08-03 20:07   ` Junio C Hamano
2017-09-14  7:43     ` Christian Couder
2017-08-03  9:19 ` [PATCH v5 20/40] lib-httpd: add list.sh Christian Couder
2017-08-03  9:19 ` [PATCH v5 21/40] lib-httpd: add apache-e-odb.conf Christian Couder
2017-08-03  9:19 ` [PATCH v5 22/40] odb-helper: add odb_helper_get_raw_object() Christian Couder
2017-08-03  9:19 ` [PATCH v5 23/40] pack-objects: don't pack objects in external odbs Christian Couder
2017-08-03  9:19 ` [PATCH v5 24/40] Add t0420 to test transfer to HTTP external odb Christian Couder
2017-08-03  9:19 ` [PATCH v5 25/40] external-odb: add 'get_direct' support Christian Couder
2017-08-03 21:40   ` Junio C Hamano
2017-09-14  8:39     ` Christian Couder
2017-09-14 18:19       ` Jonathan Tan
2017-09-15 11:24         ` Christian Couder
2017-09-15 20:54           ` Jonathan Tan
2017-08-03  9:19 ` [PATCH v5 26/40] odb-helper: add 'script_mode' to 'struct odb_helper' Christian Couder
2017-08-03  9:19 ` [PATCH v5 27/40] odb-helper: add init_object_process() Christian Couder
2017-08-03  9:19 ` [PATCH v5 28/40] Add t0450 to test 'get_direct' mechanism Christian Couder
2017-08-03  9:19 ` [PATCH v5 29/40] Add t0460 to test passing git objects Christian Couder
2017-08-03  9:19 ` [PATCH v5 30/40] odb-helper: add put_object_process() Christian Couder
2017-08-03  9:19 ` [PATCH v5 31/40] Add t0470 to test passing raw objects Christian Couder
2017-08-03  9:19 ` [PATCH v5 32/40] odb-helper: add have_object_process() Christian Couder
2017-08-03  9:19 ` [PATCH v5 33/40] Add t0480 to test "have" capability and raw objects Christian Couder
2017-08-03  9:19 ` [PATCH v5 34/40] external-odb: use 'odb=magic' attribute to mark odb blobs Christian Couder
2017-08-03  9:19 ` [PATCH v5 35/40] Add Documentation/technical/external-odb.txt Christian Couder
2017-08-03 18:38   ` Stefan Beller
2017-08-25  6:14     ` Christian Couder
2017-08-25 21:23       ` Jonathan Tan [this message]
2017-08-29  9:37         ` Christian Couder
2017-08-28 18:59   ` Ben Peart
2017-08-29 15:43     ` Christian Couder
2017-08-30 12:50       ` Ben Peart
2017-08-30 14:15         ` Christian Couder
2017-08-03  9:19 ` [PATCH v5 36/40] clone: add 'initial' param to write_remote_refs() Christian Couder
2017-08-03  9:19 ` [PATCH v5 37/40] clone: add --initial-refspec option Christian Couder
2017-08-03  9:19 ` [PATCH v5 38/40] clone: disable external odb before initial clone Christian Couder
2017-08-03  9:19 ` [PATCH v5 39/40] Add tests for 'clone --initial-refspec' Christian Couder
2017-08-03  9:19 ` [PATCH v5 40/40] Add t0430 to test cloning using bundles Christian Couder
2017-09-10 12:30 ` [PATCH v5 00/40] Add initial experimental external ODB support Lars Schneider
2017-09-14  7:02   ` Christian Couder