From: Jeff Hostetler <git@jeffhostetler.com>
To: Jonathan Tan <jonathantanmy@google.com>, git@vger.kernel.org
Cc: peartben@gmail.com, Christian Couder <christian.couder@gmail.com>
Subject: Re: RFC: Design and code of partial clones (now, missing commits and trees OK) (part 2/3)
Date: Thu, 21 Sep 2017 13:59:43 -0400
Message-ID: <5d295ab3-310e-321e-6e88-69484eb9ce8a@jeffhostetler.com>
In-Reply-To: <20170915134343.3814dc38@twelve2.svl.corp.google.com>

(part 2)

Additional overall comments on:
https://github.com/jonathantanmy/git/commits/partialclone2

{} I think it would help to split the blob-max-bytes filtering and the
    promisor/promised concepts and discuss them independently.

    {} Then we can talk about the promisor/promised functionality
       independent of any kind of filter.  The net-net is that the client
       has missing objects and it doesn't matter what filter criteria
       or mechanism caused that to happen.

    {} blob-max-bytes is but one such filter we should have.  This might
       be sufficient if the goal is to replace LFS (where you rarely
       need any given very large object), dynamically loading objects
       as needed is acceptable, and the network round-trip isn't
       too much of a perf penalty.

    {} But if we want to do things like "sparse-enlistments", where the
       client only needs a small part of the tree using sparse-checkout
       (for example, only populating 50,000 files from a tree of 3.5M
       files at HEAD), then we need more general filtering.

    {} And as I said above, how we choose to filter should be independent
       of how the client handles promisor/promised objects.


{} Also, if we rely strictly on dynamic object fetching to fetch missing
    objects, we are effectively tethered to the server during operations
    (such as checkout) that the user might not think about as requiring
    a network connection.  And we are forced to keep the same limitations
    of LFS in that you can't prefetch and go offline (without actually
    checking out to your worktree first).  And we can't bulk or parallel
    fetch objects.


{} I think it would also help to move the blob-max-bytes calculation out
    of pack-objects.c : add_object_entry() [1].  The current code isolates
    the computation there so that only pack-objects can do the filtering.

    Instead, put it in list-objects.c and traverse_commit_list() so that
    pack-objects and rev-list can share it (as Peff suggested [2] in
    response to my first patch series in March).

    For example, this would let the client have a pre-checkout hook, use
    rev-list to compute the set of missing objects needed for that commit,
    and pipe that to a command to BULK fetch them from the server BEFORE
    starting the actual checkout.  This would allow the savvy user to
    manually run a prefetch before going offline.

[1] https://github.com/jonathantanmy/git/commit/68e529484169f4800115c5a32e0904c25ad14bd8#diff-a8d2c9cf879e775d748056cfed48440cR1110

[2] https://public-inbox.org/git/20170309073117.g3br5btsfwntcdpe@sigill.intra.peff.net/
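
To make the shared-traversal idea concrete, here is a minimal sketch in
Python rather than git's C.  It is a toy model, not git's actual API: Obj,
walk(), blob_max_bytes() and the object ids are all hypothetical names made
up for illustration.  The point is only that one walk with a pluggable
filter can serve both a pack-objects-like caller on the server and a
rev-list-like caller on the client that lists what is missing.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterator, List, Set, Tuple


@dataclass
class Obj:
    """Toy object record (hypothetical; not git's struct object)."""
    oid: str
    kind: str                      # "commit", "tree", or "blob"
    size: int = 0
    children: List[str] = field(default_factory=list)


# A filter sees each object during the walk and returns True to omit it.
Filter = Callable[[Obj], bool]


def walk(store: Dict[str, Obj], tip: str, omit: Filter) -> Iterator[Tuple[str, str]]:
    """Walk from tip, yielding (oid, status): 'have', 'omitted', or 'missing'."""
    seen: Set[str] = set()
    stack = [tip]
    while stack:
        oid = stack.pop()
        if oid in seen:
            continue
        seen.add(oid)
        obj = store.get(oid)
        if obj is None:            # referenced but not present in this store
            yield oid, "missing"
            continue
        if omit(obj):              # the filter decides during the traversal
            yield oid, "omitted"
            continue
        yield oid, "have"
        stack.extend(reversed(obj.children))


def blob_max_bytes(limit: int) -> Filter:
    """One possible filter: omit blobs larger than limit bytes."""
    return lambda o: o.kind == "blob" and o.size > limit


def no_filter(_: Obj) -> bool:
    return False


server = {
    "c1": Obj("c1", "commit", children=["t1"]),
    "t1": Obj("t1", "tree", children=["b_small", "b_big"]),
    "b_small": Obj("b_small", "blob", size=10),
    "b_big": Obj("b_big", "blob", size=5000),
}

# Server side (pack-objects-like): what goes into the filtered pack.
pack = [oid for oid, st in walk(server, "c1", blob_max_bytes(100)) if st == "have"]

# Client side (rev-list-like): same walk over the partial store lists the
# objects a bulk prefetch would need to request before checkout.
client = {oid: server[oid] for oid in pack}
missing = [oid for oid, st in walk(client, "c1", no_filter) if st == "missing"]
```

In this model the client never needs to know which filter produced the
gap; it only sees that "b_big" is referenced but absent, and that list
could be handed to a single bulk fetch.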


{} Keeping the calculation in add_object_entry() also locks us into
    size-only filtering and makes it more difficult to add other
    filters, because add_object_entry() is called on an object only
    after the traversal has decided what to do with it.  It would be
    difficult to add tree-trimming at this level, for example.
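
As a rough illustration of why tree-trimming wants to live in the
traversal, here is a toy sketch in Python.  TREES, walk_tree() and the
path prefixes are hypothetical and do not correspond to any git data
structure or function.  The walk carries the path with it and decides per
subtree whether to descend at all, which a per-object callback invoked
after the traversal cannot do.

```python
from typing import Dict, List, Tuple

# Toy tree store: tree id -> list of (name, kind, child id).  Illustrative only.
TREES: Dict[str, List[Tuple[str, str, str]]] = {
    "root": [("src", "tree", "t_src"), ("docs", "tree", "t_docs")],
    "t_src": [("main.c", "blob", "b1")],
    "t_docs": [("guide.txt", "blob", "b2"), ("img", "tree", "t_img")],
    "t_img": [("logo.png", "blob", "b3")],
}


def walk_tree(tree_id: str, wanted: List[str], path: str = "") -> List[Tuple[str, str]]:
    """Return (path, oid) pairs under the wanted prefixes, pruning other subtrees."""
    out: List[Tuple[str, str]] = []
    for name, kind, oid in TREES[tree_id]:
        full = path + name
        if kind == "tree":
            sub = full + "/"
            # Descend only if this subtree contains, or lies inside, a wanted prefix.
            if not any(p.startswith(sub) or sub.startswith(p) for p in wanted):
                continue           # pruned: none of its children are ever visited
            out.append((full, oid))
            out.extend(walk_tree(oid, wanted, sub))
        elif any(full.startswith(p) for p in wanted):
            out.append((full, oid))
    return out


kept = walk_tree("root", ["src/"])
```

Here "docs" is dropped at the top level, so "t_docs" and "t_img" are never
opened; a filter that only sees objects one at a time after the walk
would have already paid for enumerating them.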


{} An early draft of this type of filtering is here [3].  I hope to push
    up a revised draft of this shortly.

[3] https://public-inbox.org/git/20170713173459.3559-1-git@jeffhostetler.com/


Thanks,
Jeff

