From: Jeff Hostetler <git@jeffhostetler.com>
To: Jonathan Tan <jonathantanmy@google.com>, git@vger.kernel.org
Cc: peartben@gmail.com, Christian Couder <christian.couder@gmail.com>
Subject: Re: RFC: Design and code of partial clones (now, missing commits and trees OK) (part 2/3)
Date: Thu, 21 Sep 2017 13:59:43 -0400
Message-ID: <5d295ab3-310e-321e-6e88-69484eb9ce8a@jeffhostetler.com>
In-Reply-To: <20170915134343.3814dc38@twelve2.svl.corp.google.com>
(part 2)
Additional overall comments on:
https://github.com/jonathantanmy/git/commits/partialclone2
{} I think it would help to split the blob-max-bytes filtering and the
promisor/promised concepts and discuss them independently.
{} Then we can talk about the promisor/promised functionality
independent of any kind of filter. The net-net is that the client
has missing objects and it doesn't matter what filter criteria
or mechanism caused that to happen.
{} blob-max-bytes is but one such filter we should have. This might
be sufficient if the goal is to replace LFS (where you rarely ever
need any given very large object), dynamically loading them as
needed is sufficient, and the network round-trip isn't too much
of a perf penalty.
{} But if we want to do things like "sparse enlistments", where the
client only needs a small part of the tree using sparse-checkout
(for example, only populating 50,000 files from a tree of 3.5M
files at HEAD), then we need more general filtering.
{} And as I said above, how we choose to filter should be independent
of how the client handles promisor/promised objects.
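
To make the separation concrete, here is a minimal sketch in Python. It is a
toy model, not code from either patch series. The names `Blob`,
`blob_max_bytes`, `sparse_paths`, and `split_by_filter` are hypothetical, as
are the object ids and sizes. The point it illustrates is that the client-side
set of promised objects has the same shape no matter which filter produced it.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Set, Tuple


@dataclass(frozen=True)
class Blob:
    oid: str   # object id (made up for this example)
    path: str  # path at which a traversal reached the blob
    size: int  # size in bytes


# A filter answers one question: should this blob be sent to the client?
BlobFilter = Callable[[Blob], bool]


def blob_max_bytes(limit: int) -> BlobFilter:
    """Size-based filter: keep blobs of at most `limit` bytes."""
    return lambda blob: blob.size <= limit


def sparse_paths(prefixes: Iterable[str]) -> BlobFilter:
    """Path-based filter: keep blobs under any of the given prefixes."""
    wanted = tuple(prefixes)
    return lambda blob: blob.path.startswith(wanted)


def split_by_filter(blobs: List[Blob], keep: BlobFilter) -> Tuple[List[Blob], Set[str]]:
    """Return (blobs to send, oids the client must treat as promised)."""
    sent = [b for b in blobs if keep(b)]
    promised = {b.oid for b in blobs if not keep(b)}
    return sent, promised


BLOBS = [
    Blob("a1", "src/main.c", 4_000),
    Blob("b2", "assets/video.bin", 900_000_000),
    Blob("c3", "docs/guide.txt", 12_000),
]

# Two unrelated filters; the client only ever sees a set of missing oids.
_, promised_by_size = split_by_filter(BLOBS, blob_max_bytes(1_000_000))
_, promised_by_path = split_by_filter(BLOBS, sparse_paths(["src/"]))
print(sorted(promised_by_size))
print(sorted(promised_by_path))
```

Nothing after `split_by_filter` needs to know which filter ran, which is the
property argued for above.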
{} Also, if we rely strictly on dynamic object fetching to fetch missing
objects, we are effectively tethered to the server during operations
(such as checkout) that the user might not think about as requiring
a network connection. And we are forced to keep the same limitations
of LFS in that you can't prefetch and go offline (without actually
checking out to your worktree first). And we can't bulk or parallel
fetch objects.
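
A back-of-the-envelope sketch of the round-trip cost, in Python. Only the
50,000-file figure comes from this message. The 50 ms round-trip time and the
batch size of 10,000 are assumptions picked for illustration, and the model
counts latency only (no transfer time, no parallelism), assuming every one of
those files is missing locally.

```python
import math


def round_trips(n_missing: int, batch_size: int) -> int:
    """Requests needed when each request asks for up to batch_size objects."""
    return math.ceil(n_missing / batch_size)


def latency_ms(n_missing: int, batch_size: int, rtt_ms: int) -> int:
    """Total time spent waiting on round trips, in milliseconds."""
    return round_trips(n_missing, batch_size) * rtt_ms


N_FILES = 50_000  # from the sparse-checkout example in this message
RTT_MS = 50       # assumed network round-trip time
BATCH = 10_000    # assumed number of objects per bulk request

one_at_a_time = latency_ms(N_FILES, 1, RTT_MS)
batched = latency_ms(N_FILES, BATCH, RTT_MS)
print(f"one object per request: {one_at_a_time / 1000:.0f} s of latency")
print(f"bulk requests:          {batched / 1000:.2f} s of latency")
```

Under these assumptions the purely dynamic case spends 2,500 seconds waiting
on the network, against a quarter of a second for the bulk case.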
{} I think it would also help to move the blob-max-bytes calculation out
of pack-objects.c : add_object_entry() [1]. The current code isolates
the computation there so that only pack-objects can do the filtering.
Instead, put it in list-objects.c and traverse_commit_list() so that
pack-objects and rev-list can share it (as Peff suggested [2] in
response to my first patch series in March).
For example, this would let the client have a pre-checkout hook, use
rev-list to compute the set of missing objects needed for that commit,
and pipe that to a command to BULK fetch them from the server BEFORE
starting the actual checkout. This would allow the savvy user to
manually run a prefetch before going offline.
[1] https://github.com/jonathantanmy/git/commit/68e529484169f4800115c5a32e0904c25ad14bd8#diff-a8d2c9cf879e775d748056cfed48440cR1110
[2] https://public-inbox.org/git/20170309073117.g3br5btsfwntcdpe@sigill.intra.peff.net/
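
The pre-checkout prefetch described above could look roughly like the
following Python toy model. It is not a hook script and does not call git. The
object store, `missing_objects`, `bulk_fetch`, and `prefetch_for_checkout` are
all hypothetical stand-ins, and the model assumes that commits and trees are
present locally and only blobs are missing.

```python
# Toy object model: a tree maps entry names to ("tree", oid) or ("blob", oid).
TREES = {
    "t-root": {"src": ("tree", "t-src"), "README": ("blob", "b-readme")},
    "t-src": {"main.c": ("blob", "b-main"), "util.c": ("blob", "b-util")},
}
COMMITS = {"c-head": "t-root"}  # commit oid -> root tree oid


def missing_objects(commit_oid, local_oids):
    """Walk the commit's tree and list blob oids absent from local_oids."""
    missing = []
    stack = [COMMITS[commit_oid]]
    while stack:
        tree_oid = stack.pop()
        for name in sorted(TREES[tree_oid]):
            kind, oid = TREES[tree_oid][name]
            if kind == "tree":
                stack.append(oid)
            elif oid not in local_oids:
                missing.append(oid)
    return missing


def bulk_fetch(oids, fetch_log):
    """Stand-in for ONE request that asks the server for all oids at once."""
    fetch_log.append(list(oids))
    return set(oids)


def prefetch_for_checkout(commit_oid, local_oids, fetch_log):
    """Compute the missing set first, then fetch it in a single request."""
    wanted = missing_objects(commit_oid, local_oids)
    if wanted:
        local_oids |= bulk_fetch(wanted, fetch_log)
    return wanted


local = {"b-readme"}
log = []
print(prefetch_for_checkout("c-head", local, log))  # blobs fetched in bulk
print(prefetch_for_checkout("c-head", local, log))  # nothing left to fetch
```

After the first call every blob for that commit is local, so a later checkout
in this model would not need the network at all.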
{} Keeping the calculation in add_object_entry() also locks us into
size-only filtering and makes it more difficult to add other
filters, because the add_object_entry() code gets called on an
object after the traversal has decided what to do with it. It
would be difficult to add tree-trimming at this level, for example.
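
A small Python model of the difference, under stated assumptions: `TREE`,
`traverse`, and the `enter_tree` callback are hypothetical and are not git's
data structures or API. A filter applied after the walk sees each blob only
once the walk has already reached it, while a filter consulted during the walk
can decline to descend into a subtree.

```python
# Nested dicts are trees; integers are blob sizes in bytes (made-up values).
TREE = {
    "src": {"main.c": 4_000, "lib": {"util.c": 2_000}},
    "assets": {"video.bin": 900_000_000, "raw": {"take1.bin": 700_000_000}},
}


def traverse(tree, enter_tree, visited, prefix=""):
    """Collect (path, size) for blobs, asking enter_tree before descending."""
    blobs = []
    for name, entry in sorted(tree.items()):
        path = prefix + name
        if isinstance(entry, dict):
            if enter_tree(path):
                blobs += traverse(entry, enter_tree, visited, path + "/")
        else:
            visited.append(path)  # record every blob the walk touched
            blobs.append((path, entry))
    return blobs


# Filtering after the walk: every blob is visited, then most are discarded.
seen_post = []
post = [b for b in traverse(TREE, lambda p: True, seen_post)
        if b[0].startswith("src/")]

# Filtering during the walk: the "assets" subtree is never entered.
seen_walk = []
during = traverse(TREE, lambda p: p == "src" or p.startswith("src/"), seen_walk)

print(len(seen_post), len(seen_walk))
print(post == during)
```

Both approaches select the same two blobs here, but only the traversal-time
filter avoids touching the trimmed subtree, which is why tree-trimming fits
more naturally in the traversal than in a later per-object stage.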
{} An early draft of this type of filtering is here [3]. I hope to push
up a revised draft of this shortly.
[3] https://public-inbox.org/git/20170713173459.3559-1-git@jeffhostetler.com/
Thanks,
Jeff