From: Jonathan Nieder <email@example.com>
To: Christian Couder <firstname.lastname@example.org>
Cc: Stefan Beller <email@example.com>, git <firstname.lastname@example.org>, Junio C Hamano <email@example.com>, Jeff King <firstname.lastname@example.org>, Ben Peart <Ben.Peart@microsoft.com>, Jonathan Tan <email@example.com>, Duy Nguyen <firstname.lastname@example.org>, Mike Hommey <email@example.com>, Lars Schneider <firstname.lastname@example.org>, Eric Wong <email@example.com>, Christian Couder <firstname.lastname@example.org>, Jeff Hostetler <email@example.com>, Eric Sunshine <firstname.lastname@example.org>, Beat Bolli <email@example.com>
Subject: Re: [PATCH v4 9/9] Documentation/config: add odb.<name>.promisorRemote
Date: Tue, 16 Oct 2018 10:43:04 -0700
Message-ID: <20181016174304.GA221682@aiede.svl.corp.google.com>
In-Reply-To: <CAP8UFD1ia1xWk9pjfTUQ3zD7=dP=8UjKzf=G0ptsz=qRH8_X+Q@mail.gmail.com>

Hi Christian,

On Tue, Sep 25, 2018, Christian Couder wrote:

> In the cover letter there is a "Discussion" section which is about
> this, but I agree that it might not be very clear.
>
> The main issue that this patch series tries to solve is that
> extensions.partialclone config option limits the partial clone and
> promisor features to only one remote. One related issue is that it
> also prevents to have other kind of promisor/partial clone/odb
> remotes. By other kind I mean remotes that would not necessarily be
> git repos, but that could store objects (that's where ODB, for Object
> DataBase, comes from) and could provide those objects to Git through a
> helper (or driver) script or program.

Thanks for this explanation. I took the opportunity to learn a little
more while you were in the Bay Area for the Google Summer of Code
mentor summit, which was very helpful to me.

The broader picture is that this is meant to make Git natively handle
large blobs in a nicer way. The design in this series has a few
components:

 1. Teaching partial clone to attempt to fetch missing objects from
    multiple remotes instead of only one. This is useful because you
    can have a server that is nearby and cheaper to serve from (some
    kind of local cache server) that you query first before falling
    back to the canonical source of objects.

 2. Simplifying the protocol for fetching missing objects so that it
    can be satisfied by a lighter-weight object storage system than a
    full Git server. The ODB helpers introduced in this series are
    meant to speak such a simpler protocol, since they are only used
    for one-off requests for a collection of missing objects instead
    of needing to understand refs, Git's negotiation, etc.

 3. (Possibly, though not in this series) Making the criteria for what
    objects can be missing more aggressive, so that I can "git add" a
    large file and work with it using Git without even having a second
    copy of that object in my local object store.

For (2), I would like to see us improve the remote helper
infrastructure instead of introducing a new ODB helper. Remote helpers
are already permitted to fetch some objects without listing refs, so
perhaps we will want to:

 i. Split listing refs into a separate capability, so that a remote
    helper can advertise that it doesn't support it. (Alternatively,
    the remote could advertise that it has no refs.)

 ii. Use the "long-running process" mechanism to improve how Git
     communicates with a remote helper.

For (1), things get trickier. In an object store from a partial clone
today, we relax the ordinary "closure under reachability" invariant,
but only in a minor way. We'll need to work out how this works with
multiple promisor remotes.

The idea today is that there are two kinds of packs: promisor packs
(from the promisor remote) and non-promisor packs.
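To make the two kinds of packs concrete: a pack counts as a promisor pack when a .promisor file with the same base name sits next to its .pack file. The minimal Python sketch below classifies packs from a listing of file names, for example the entries of .git/objects/pack. The function names classify_packs() and classify_pack_dir() are hypothetical and are not part of Git; the sketch only checks for the marker file and does not read its (currently empty) contents.

```python
from pathlib import Path
from typing import Dict, Iterable


def classify_packs(filenames: Iterable[str]) -> Dict[str, bool]:
    """Map each pack base name (e.g. "pack-1234") to whether it is a promisor pack.

    Hypothetical helper: a pack counts as a promisor pack when a file with
    the same base name and a ".promisor" extension exists next to its
    ".pack" file.
    """
    names = set(filenames)
    packs = {}
    for name in sorted(names):
        if name.endswith(".pack"):
            base = name[: -len(".pack")]
            packs[base] = f"{base}.promisor" in names
    return packs


def classify_pack_dir(pack_dir: str) -> Dict[str, bool]:
    """Classify every pack in a directory such as .git/objects/pack."""
    return classify_packs(entry.name for entry in Path(pack_dir).iterdir())


if __name__ == "__main__":
    listing = ["pack-a.idx", "pack-a.pack", "pack-a.promisor",
               "pack-b.idx", "pack-b.pack"]
    print(classify_packs(listing))  # {'pack-a': True, 'pack-b': False}
```

A .promisor file without a matching .pack file is ignored, since only packs are being classified.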
Promisor packs are allowed to have reachability edges (for example a
tree->blob edge) that point to a missing object, since the promisor
remote has promised that we will be able to access that object on
demand. Non-promisor packs are also allowed to have reachability edges
that point to a missing object, as long as there is a reachability
edge from an object in a promisor pack to the same object (because of
the same promise). See "Handling Missing Objects" in
Documentation/technical/partial-clone.txt for more details.

To prevent older versions of Git from being confused by partial clone
repositories, they use the repositoryFormatVersion mechanism:

	[core]
		repositoryFormatVersion = 1
	[extensions]
		partialClone = ...

If we change the invariant, we will need to use a new extensions.* key
to ensure that versions of Git that are not aware of the new invariant
do not operate on the repository.

A promisor pack is marked by a .promisor file next to the usual .pack
file. Currently the .promisor file is empty. The previous idea was
that once we want more metadata (e.g. for the sake of multiple
promisor remotes), we could write it in that file. For example,
remotes could be associated with a <promisor-id>, and the .promisor
file could indicate which <promisor-id> has promised to serve requests
for objects reachable from objects in this pack.

That would complicate the object access code as well, since currently
we only determine who has promised an object during "git fsck" and
similar operations. During everyday access we do not care which
promisor pack caused the object to be promised, since there is only
one promisor remote to fetch from anyway.

So much for the current setup. For (1), I believe you are proposing to
still have only one effective <promisor-id>, so it doesn't necessarily
require modifying the extensions.* configuration. Instead, the idea is
that when trying to access an object, we would go through a list of
steps in order:

 1. First, check the local object store.
    If it's there, we're done.

 2. Second, try alternates: maybe the object is in one of those!

 3. Now, try promisor remotes, one at a time, in user-configured
    order.

In other words, I think that for (1) all we would need is a new
configuration:

	[object]
		missingObjectRemote = local-cache-remote
		missingObjectRemote = origin

The semantics would be that when trying to access a promised object,
we attempt to fetch from these remotes one at a time, in the order
specified. We could require that the remote named in
extensions.partialClone be one of the listed remotes, without having
to care where it shows up in the list.

That way, we get benefit (1) without having to change the semantics of
extensions.partialClone and without having to care about the order of
sections in the config.

What do you think?

Thanks,
Jonathan
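The lookup order proposed above (local object store, then alternates, then promisor remotes in configured order) can be sketched as a small Python model. This is not Git's implementation: object stores are modeled as sets of object IDs, fetch is a caller-supplied callback standing in for a request to a remote, and lookup(), validate_config() and MissingObjectError are hypothetical names. The object.missingObjectRemote key exists only as a proposal in this message.

```python
from typing import Callable, Iterable, List, Set


class MissingObjectError(Exception):
    """Raised when no configured source can provide the requested object."""


def validate_config(partial_clone_remote: str, missing_object_remotes: List[str]) -> None:
    """Require the extensions.partialClone remote to appear somewhere in the
    proposed object.missingObjectRemote list; its position does not matter."""
    if partial_clone_remote not in missing_object_remotes:
        raise ValueError(
            f"remote {partial_clone_remote!r} from extensions.partialClone "
            "is not listed in object.missingObjectRemote"
        )


def lookup(
    oid: str,
    local: Set[str],
    alternates: Iterable[Set[str]],
    remotes: List[str],
    fetch: Callable[[str, str], bool],
) -> str:
    """Return the name of the source that provided `oid`."""
    # Step 1: the local object store.
    if oid in local:
        return "local"
    # Step 2: alternates, in the order they are listed.
    for index, alternate in enumerate(alternates):
        if oid in alternate:
            return f"alternate {index}"
    # Step 3: promisor remotes, one at a time, in user-configured order.
    for remote in remotes:
        if fetch(remote, oid):
            local.add(oid)  # model the fetched object landing in the local store
            return remote
    raise MissingObjectError(oid)


if __name__ == "__main__":
    remotes = ["local-cache-remote", "origin"]
    validate_config("origin", remotes)
    servers = {"local-cache-remote": {"b"}, "origin": {"b", "c"}}
    local = {"a"}

    def fetch(remote: str, oid: str) -> bool:
        return oid in servers[remote]

    for oid in ["a", "b", "c"]:
        print(oid, "->", lookup(oid, local, [], remotes, fetch))
```

Keeping validate_config() separate from lookup() mirrors the point that the extensions.partialClone remote only has to appear somewhere in the list; lookup() itself just walks the list in order.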