From: Jeff Hostetler <git@jeffhostetler.com>
To: Philip Oakley <philipoakley@iee.org>, Junio C Hamano <gitster@pobox.com>
Cc: git@vger.kernel.org, peff@peff.net, jonathantanmy@google.com,
Jeff Hostetler <jeffhost@microsoft.com>
Subject: Re: [PATCH] partial-clone: design doc
Date: Thu, 14 Dec 2017 15:46:29 -0500
Message-ID: <40a141bb-65e7-f1c5-7ada-f65670141b5e@jeffhostetler.com>
In-Reply-To: <2078863B63F54322A0C9455C8BC98C9D@PhilipOakley>

On 12/13/2017 8:17 AM, Philip Oakley wrote:
> From: "Junio C Hamano" <gitster@pobox.com>
>> "Philip Oakley" <philipoakley@iee.org> writes:
>>
>>>> + These filtered packfiles are incomplete in the traditional sense because
>>>> + they may contain trees that reference blobs that the client does not have.
>>>
>>> Is a comment needed here noting that currently, IIUC, the complete
>>> trees are fetched in the packfiles, it's just the unnecessary blobs
>>> that are omitted?
>>
>> I probably am misreading what you meant to say, but the above
>> statement with "currently" taken literally to mean the system
>> without JeffH's changes, is false.
>
> I meant JeffH's current V6 series, rather than the last Git release.
>
> In one of the previous discussions Jeff had noted that (at that time) his partial design would provide a full set of trees for the selected commits (excluding the trees already available locally), but only a few of the file blobs (based on the filter spec).
>
> So yes, I should have been clearer to avoid talking at cross purposes.

Right, we build upon the existing thin-pack capabilities such that a
fetch following a clone gets a packfile that assumes the client already
has all of the objects in the "edge". So a fetch does not need to
receive trees and blobs that are already present in the edge commits.
What we are adding here is a way to further filter/restrict the set of
objects sent to the client.
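To make the two layers concrete, here is a toy model in C (Git's own language). It treats a fetch as "everything reachable from the want, minus everything reachable from the have (the edge)", and then applies an optional extra filter that drops blobs, loosely in the spirit of a blob:none filter. The graph representation and the names `struct obj`, `mark_reachable` and `select_objects` are hypothetical, and real pack negotiation is smarter than walking the full history of the haves; this is a sketch under those assumptions, not the actual pack-objects logic.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy object graph (hypothetical; not Git's real object structures). */
enum obj_type { OBJ_COMMIT, OBJ_TREE, OBJ_BLOB };

#define MAX_OBJS 16
#define MAX_KIDS 4

struct obj {
	enum obj_type type;
	int kids[MAX_KIDS];	/* indices of directly referenced objects */
	int nkids;
};

/* Mark every object reachable from index i. */
static void mark_reachable(const struct obj *g, int i, bool *seen)
{
	if (seen[i])
		return;
	seen[i] = true;
	for (int k = 0; k < g[i].nkids; k++)
		mark_reachable(g, g[i].kids[k], seen);
}

/*
 * Pick the objects to send: reachable from `want`, not reachable from
 * `have` (the client's edge) and, when omit_blobs is set (a stand-in
 * for a blob-omitting filter), not a blob. Pass have = -1 for a clone.
 * Writes the chosen indices to `out` and returns how many there are.
 */
static int select_objects(const struct obj *g, int n, int want, int have,
			  bool omit_blobs, int *out)
{
	bool wanted[MAX_OBJS] = { false };
	bool edge[MAX_OBJS] = { false };
	int count = 0;

	assert(n <= MAX_OBJS);
	mark_reachable(g, want, wanted);
	if (have >= 0)
		mark_reachable(g, have, edge);
	for (int i = 0; i < n; i++) {
		if (!wanted[i] || edge[i])
			continue;	/* not needed, or client already has it */
		if (omit_blobs && g[i].type == OBJ_BLOB)
			continue;	/* filtered out: may be fetched later */
		out[count++] = i;
	}
	return count;
}
```

In Junio's Documentation/ example, with commit A as the have and commit B as the want, only B, B's root tree and the changed blob are selected (the unchanged Documentation/ tree is in the edge); with omit_blobs set, the changed blob is dropped as well.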
>
>>
>> When the receiver says it has commit A and the sender wants to send
>> a commit B (because the receiver said it does not have it, and it
>> wants it), trees in A are not sent in the pack the sender sends to
>> give objects sufficient to complete B, which the receiver wanted to
>> have, even if B also has those trees. If you fetch from me twice
>> and between that time Documentation/ directory did not change, the
>> second fetch will not have the tree object that corresponds to that
>> hierarchy (and of course no blobs and sub trees inside it).
>
> Though, after the fetch has completed (v2.15 Git), the receiver will have the 'full set of trees and blobs'. In Jeff's design (V6) the receiver would still have a full set of trees, but only a partial set of the blobs. So my viewpoint was not of the pack file but of the receiver's object store after the fetch.

Currently (with our changes) the receiver will have all of the trees
and only some of the blobs. If we later add another filter that can
omit trees, the client will also have trees that are referenced but
missing.
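The "referenced but missing" state can be illustrated with a toy connectivity check in C. It assumes, loosely following the promisor idea discussed in this thread, that a missing object is acceptable only if some object received from the promisor remote references it; any other missing object means corruption. The struct and function names are hypothetical and are not Git's fsck or connectivity code.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of a receiver's object store (hypothetical). */
#define MAX_OBJS 16
#define MAX_KIDS 4

struct local_obj {
	bool present;		/* object data is in the local store */
	bool from_promisor;	/* received from the promisor remote */
	int kids[MAX_KIDS];	/* referenced objects; valid only if present */
	int nkids;
};

/* True if some promisor-supplied object references `id`. */
static bool is_promised(const struct local_obj *s, int n, int id)
{
	for (int i = 0; i < n; i++) {
		if (!s[i].present || !s[i].from_promisor)
			continue;
		for (int k = 0; k < s[i].nkids; k++)
			if (s[i].kids[k] == id)
				return true;
	}
	return false;
}

/*
 * Check every reference held by a present object. Returns the number
 * of distinct missing-but-promised objects, or -1 if a referenced
 * object is missing and nothing promises it (a corrupt repository).
 */
static int check_store(const struct local_obj *s, int n)
{
	bool counted[MAX_OBJS] = { false };
	int promised = 0;

	assert(n <= MAX_OBJS);
	for (int i = 0; i < n; i++) {
		if (!s[i].present)
			continue;
		for (int k = 0; k < s[i].nkids; k++) {
			int child = s[i].kids[k];

			if (s[child].present)
				continue;
			if (!is_promised(s, n, child))
				return -1;
			if (!counted[child]) {
				counted[child] = true;
				promised++;
			}
		}
	}
	return promised;
}
```

The same check accepts a missing tree exactly as it accepts a missing blob, which is what a future tree-omitting filter would need.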
>>
>> So "the complete trees are fetched" is not true. What is true (and
>> what matters more in JeffH's document) is that fetching is done in
>> such a way that objects resulting in the receiving repository are
>> complete in the current system that does not allow promised objects.
>> If some objects resulting in the receiving repository are incomplete,
>> the current system considers that we corrupted the repository.
>>
>> The promise mechanism says that it is fine for the receiving end to
>> lack blobs, trees or commits, as long as the promisor repository
>> tells it that these "missing" objects can be obtained from it later.
>
> True. (though I'm not sure exactly how Jeff decides about commits - I thought they were not part of this optimisation)

I've not talked about commit filtering -- mainly because we already
have such machinery in shallow-clone -- and I did not want to mess
with the haves/wants computations.

But it will work with missing commits. Because of the way object lookup
happens, a missing commit will trigger the fetch-object code just as it
does for missing blobs. The ODB layer doesn't really care what type of
object it is -- just that it is missing and needs to be dynamically fetched.

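A minimal sketch of such a type-agnostic lookup, assuming a hypothetical in-memory store and a stand-in fetch function (neither is Git's actual object-database API): a miss triggers one on-demand fetch, and the object's type is only learned after it arrives, so commits, trees and blobs all take the same path.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy object database (hypothetical; not Git's internal API). */
#define MAX_OBJS 16

enum obj_type { OBJ_COMMIT, OBJ_TREE, OBJ_BLOB };

struct odb {
	bool present[MAX_OBJS];
	enum obj_type type[MAX_OBJS];		/* valid only where present */
	const enum obj_type *remote_type;	/* what the promisor remote holds */
	int fetches;				/* on-demand fetches issued */
};

/* Stand-in for asking the promisor remote for a single object. */
static bool fetch_from_promisor(struct odb *db, int id)
{
	db->fetches++;
	db->type[id] = db->remote_type[id];	/* type learned on arrival */
	db->present[id] = true;			/* assume the remote delivers */
	return true;
}

/*
 * Look up an object. A miss triggers a dynamic fetch without checking
 * what kind of object is expected; the caller gets the type afterwards.
 */
static bool read_object(struct odb *db, int id, enum obj_type *type_out)
{
	assert(id >= 0 && id < MAX_OBJS);
	if (!db->present[id] && !fetch_from_promisor(db, id))
		return false;
	*type_out = db->type[id];
	return true;
}
```

Keeping the fetch inside the lookup path, rather than in type-specific callers, is what lets a missing commit be handled exactly like a missing blob.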
Thanks
Jeff