git@vger.kernel.org mailing list mirror (one of many)
From: Derrick Stolee <stolee@gmail.com>
To: Jonathan Tan <jonathantanmy@google.com>, git@vger.kernel.org
Cc: peff@peff.net
Subject: Re: [RFC PATCH] fetch-pack: lazy fetch using tree:0
Date: Thu, 19 Mar 2020 15:58:33 -0400
Message-ID: <593efac4-75f8-df74-259d-83dd8297aa3f@gmail.com>
In-Reply-To: <20200319174439.230969-1-jonathantanmy@google.com>

On 3/19/2020 1:44 PM, Jonathan Tan wrote:
> Support for partial clones with filtered trees was added in bc5975d24f
> ("list-objects-filter: implement filter tree:0", 2018-10-07), but
> whenever a lazy fetch of a tree is done, besides the tree itself, some
> other objects that it references are also fetched.
> 
> The "blob:none" filter was added to lazy fetches in 4c7f9567ea
> ("fetch-pack: exclude blobs when lazy-fetching trees", 2018-10-04) to
> restrict blobs from being fetched, but it didn't restrict trees.
> ("tree:0", which would restrict all trees as well, wasn't added then
> because "tree:0" was itself new and may not have been supported by Git
> servers, as you can see from the dates of the commits.)
> 
> Now that "tree:0" has been supported in Git for a while, teach lazy
> fetches to use "tree:0" instead of "blob:none".
> 
> (An alternative to doing this is to teach Git a new filter that only
> returns exactly the objects requested, no more - but "tree:0" already
> does that for us for now, hence this patch. If we were to support
> filtering of commits in partial clones later, I think that specifying a
> depth will work to restrict the commits returned, so we won't need an
> additional filter anyway.)
> ---
> This looks like a good change to me - in particular, it makes Git align
> with the (in my opinion, reasonable) mental model that when we lazily
> fetch something, we only fetch that thing. Some issues that I can think
> about:
> 
>  - Some hosts like GitHub support some partial clone filters, but not
>    "tree:0".
>  - I haven't figured out the performance implications yet. If we want a
>    tree, I think that we typically will want some of its subtrees, but
>    not all.
> 
> Any thoughts?

Fetching missing objects one by one matches how the GVFS protocol has
handled these tree misses in the past. There may be many more round
trips, but it avoids transferring excess data, since a missing tree
can likely reach several trees and blobs that the client already has.
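
To make that trade-off concrete, here is a toy model in Python. It is
not Git code: the object graph, the names TREES, reachable() and
walk(), and the assumption that a "blob:none"-style lazy fetch returns
every tree reachable from the requested one are all illustrative
simplifications.

```python
from collections import deque

# Hypothetical object graph: each tree lists the subtrees it references.
# Blobs are omitted because both strategies exclude them.
TREES = {
    "root": ["src", "docs"],
    "src": ["src/lib", "src/cli"],
    "src/lib": [],
    "src/cli": [],
    "docs": [],
}


def reachable(tree, graph):
    """Return the tree and every subtree reachable from it."""
    seen, queue = [], deque([tree])
    while queue:
        current = queue.popleft()
        if current in seen:
            continue
        seen.append(current)
        queue.extend(graph[current])
    return seen


def walk(root, graph, have, fetch_closure):
    """Visit every tree under root, lazily fetching the missing ones.

    fetch_closure=True models a fetch that returns the requested tree plus
    all trees reachable from it; False models a fetch that returns only the
    requested tree. Returns (round_trips, objects_transferred).
    """
    have = set(have)
    round_trips = transferred = 0
    queue = deque([root])
    while queue:
        current = queue.popleft()
        if current not in have:
            sent = reachable(current, graph) if fetch_closure else [current]
            round_trips += 1
            transferred += len(sent)
            have.update(sent)
        queue.extend(graph[current])
    return round_trips, transferred


# Only the root tree is missing; everything below it is already local.
known = {"src", "src/lib", "src/cli", "docs"}
print(walk("root", TREES, known, fetch_closure=True))   # (1, 5)
print(walk("root", TREES, known, fetch_closure=False))  # (1, 1)

# Nothing is local: one large response versus many small ones.
print(walk("root", TREES, set(), fetch_closure=True))   # (1, 5)
print(walk("root", TREES, set(), fetch_closure=False))  # (5, 5)
```

In this model the one-by-one approach wins when most reachable trees
are already present, and costs extra round trips when they are not.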

The real unknown here is how the "boundary" of missing trees is
created. In the GVFS protocol, missing trees happen mostly when our
pre-computed "prefetch pack-files" of commits and trees are behind the
ref tips.

The usage patterns for depth-limited or path-scoped filters are not
quite as established as those for blob-limited filters (which are
better understood because they resemble the behavior of VFS for Git
and Scalar).

The code seems to be doing what you say, but I highly recommend taking
this for a spin on a real repository with a real remote, if possible.
The more numbers we can get showing which situations do better under
one approach or the other, the more confidently this change can be
adopted.
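
One possible way to collect such numbers is sketched below in Python.
The helper names (parse_count_objects, clone_command, measure) are
hypothetical, and so is the method: it runs the same treeless clone
and tree walk with a given git binary (for example, one built with the
patch and one without) and compares the "git count-objects -v" output
before and after the walk. Treating the change in "packs" as a rough
proxy for the number of lazy fetches is an assumption, not something
verified here.

```python
import subprocess


def parse_count_objects(text):
    """Parse `git count-objects -v` output ("key: value" lines) into ints."""
    stats = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        value = value.strip()
        if sep and value.isdigit():  # skips non-numeric lines such as "alternate:"
            stats[key.strip()] = int(value)
    return stats


def clone_command(git, url, dest):
    """argv for a treeless partial clone that skips the initial checkout."""
    return [git, "clone", "--filter=tree:0", "--no-checkout", url, dest]


def count_objects(git, repo):
    """Run `git count-objects -v` in repo and return the parsed numbers."""
    result = subprocess.run(
        [git, "-C", repo, "count-objects", "-v"],
        check=True, capture_output=True, text=True,
    )
    return parse_count_objects(result.stdout)


def measure(git, url, dest):
    """Clone, list every tree entry at HEAD, and report how the stats changed."""
    subprocess.run(clone_command(git, url, dest), check=True)
    before = count_objects(git, dest)
    # Listing HEAD recursively reads every tree, so missing trees get
    # fetched lazily; blob contents are not needed for the listing.
    subprocess.run(
        [git, "-C", dest, "ls-tree", "-r", "HEAD"],
        check=True, stdout=subprocess.DEVNULL,
    )
    after = count_objects(git, dest)
    return {key: after[key] - before.get(key, 0) for key in after}
```

Calling measure() once per git binary against the same remote, with a
fresh destination directory each time, would give two sets of deltas
("in-pack", "packs", "size-pack") to compare.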

Thanks,
-Stolee
