git@vger.kernel.org mailing list mirror (one of many)
From: Derrick Stolee <derrickstolee@github.com>
To: ZheNing Hu via GitGitGadget <gitgitgadget@gmail.com>,
	git@vger.kernel.org
Cc: "Christian Couder" <christian.couder@gmail.com>,
	"Ævar Arnfjörð Bjarmason" <avarab@gmail.com>,
	"Jeff King" <peff@peff.net>,
	"Jeff Hostetler" <jeffhost@microsoft.com>,
	"Junio C Hamano" <gitster@pobox.com>,
	"Johannes Schindelin" <johannes.schindelin@gmx.de>,
	"ZheNing Hu" <adlternative@gmail.com>
Subject: Re: [PATCH 0/3] list-object-filter: introduce depth filter
Date: Thu, 1 Sep 2022 15:24:18 -0400
Message-ID: <a14028be-2fd2-258d-94f5-c010669de8a6@github.com>
In-Reply-To: <pull.1343.git.1662025272.gitgitgadget@gmail.com>

On 9/1/2022 5:41 AM, ZheNing Hu via GitGitGadget wrote:
> This patch let partial clone have the similar capabilities of the shallow
> clone git clone --depth=<depth>.
...
> Now we can use git clone --filter="depth=<depth>" to omit all commits whose
> depth is >= <depth>. By this way, we can have the advantages of both shallow
> clone and partial clone: Limiting the depth of commits, get other objects on
> demand.

I have several concerns about this proposal.

The first is that "depth=X" doesn't mean anything after the first
clone. What will happen when we fetch the remaining objects?

Partial clone is designed to download a subset of objects, but make
the remaining reachable objects downloadable on demand. By dropping
reachable commits, the normal partial clone mechanism would result
in a 'git rev-list' call asking for a missing commit. Would this
inherit the "depth=X" filter but over-download a huge number of
trees and blobs in that commit range? Would it result in downloading
commits one-by-one, and then their root trees (and all reachable objects
from those root trees)?
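
To make the one-by-one scenario concrete, here is a minimal Python
sketch of it. It is a toy model, not Git code: the function names, the
dict-based commit graph, and the convention that the tip has depth 0
(so "depth=1" keeps only the tip) are all assumptions made for
illustration. It only counts how many separate on-demand requests a
full history walk would issue if each missing commit were fetched
individually; it says nothing about trees and blobs.

```python
from collections import deque


def commit_depths(parents, tips):
    """Breadth-first walk: depth = fewest parent hops from any tip."""
    depth = {}
    queue = deque((tip, 0) for tip in tips)
    while queue:
        commit, d = queue.popleft()
        if commit in depth:
            continue  # already reached by a path that is no longer
        depth[commit] = d
        for parent in parents.get(commit, ()):
            queue.append((parent, d + 1))
    return depth


def apply_depth_filter(parents, tips, limit):
    """Split commits into kept (depth < limit) and omitted (depth >= limit)."""
    depth = commit_depths(parents, tips)
    kept = {c for c, d in depth.items() if d < limit}
    return kept, set(depth) - kept


def on_demand_fetches(parents, tips, kept):
    """Count requests if every missing commit is fetched on its own."""
    local = set(kept)
    seen = set()
    fetches = 0
    stack = list(tips)
    while stack:
        commit = stack.pop()
        if commit in seen:
            continue
        seen.add(commit)
        if commit not in local:
            fetches += 1  # one round trip per missing commit
            local.add(commit)
        stack.extend(parents.get(commit, ()))
    return fetches


# Hypothetical linear history: c0 is the tip, c999 is the root.
parents = {f"c{i}": [f"c{i + 1}"] for i in range(999)}
parents["c999"] = []

kept, omitted = apply_depth_filter(parents, ["c0"], limit=1)
print(len(kept), len(omitted), on_demand_fetches(parents, ["c0"], kept))
# prints: 1 999 999
```

In this model a single full history walk after a "depth=1" clone of a
1000-commit linear history costs 999 separate requests, before counting
any root trees fetched along the way.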

Finally, computing the set of objects to send is just as expensive as
if we had a shallow clone (we can't use bitmaps). However, we get the
additional problem where fetches do not have a shallow boundary, so
the server will send deltas based on objects that are not necessarily
present locally, triggering extra requests to resolve those deltas.
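
The delta problem can be pictured with a small Python sketch. This is
an illustrative model only: the (object, delta base) pair
representation and the object names are hypothetical, and real pack
negotiation in Git is considerably more involved. It just lists the
delta bases that are neither in the received pack nor already in the
local object store, i.e. the objects that would need a follow-up
request.

```python
def missing_delta_bases(pack_entries, local_objects):
    """Return delta bases absent from both the pack and the local store.

    pack_entries: iterable of (object_id, base_id); base_id is None for
    objects stored whole rather than as a delta.
    """
    entries = list(pack_entries)
    in_pack = {oid for oid, _ in entries}
    missing = set()
    for _, base in entries:
        if base is None:
            continue  # stored whole, nothing to resolve
        if base not in in_pack and base not in local_objects:
            missing.add(base)  # would trigger an extra request
    return missing


# Hypothetical fetch: the server deltified a new tree against an older
# tree that the depth filter omitted from the initial clone.
pack = [("tree_new", "tree_old"), ("blob_new", None)]
local = {"commit_tip", "tree_tip"}
print(missing_delta_bases(pack, local))
# prints: {'tree_old'}
```

With a shallow boundary the server would know not to pick "tree_old"
as a base; without one, nothing in this model prevents it.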

This fallout remains undocumented and unexplored in this series, but I
doubt the investigation would result in positive outcomes.

> Disadvantages of git clone --depth=<depth> --filter=blob:none: we must call
> git fetch --unshallow to lift the shallow clone restriction, it will
> download all history of current commit.

How does your proposal fix this? Instead of unshallowing, users will
stumble across these objects and trigger huge downloads by accident.
 
> Disadvantages of git clone --filter=blob:none with git sparse-checkout: The
> git client needs to send a lot of missing objects' id to the server, this
> can be very wasteful of network traffic.

Asking for a list of blobs (especially limited to a sparse-checkout) is
much more efficient than what will happen when a user tries to do almost
anything in a repository formed the way you did here.

Thinking about this idea, I don't think it is viable. I would need to
see a lot of work done to test these scenarios closely to believe that
this type of partial clone is a desirable working state.

Thanks,
-Stolee
