From: ZheNing Hu <adlternative@gmail.com>
To: Derrick Stolee <derrickstolee@github.com>
Cc: "ZheNing Hu via GitGitGadget" <gitgitgadget@gmail.com>,
"Git List" <git@vger.kernel.org>,
"Christian Couder" <christian.couder@gmail.com>,
"Ævar Arnfjörð Bjarmason" <avarab@gmail.com>,
"Jeff King" <peff@peff.net>,
"Jeff Hostetler" <jeffhost@microsoft.com>,
"Junio C Hamano" <gitster@pobox.com>,
"Johannes Schindelin" <johannes.schindelin@gmx.de>
Subject: Re: [PATCH 0/3] list-object-filter: introduce depth filter
Date: Sun, 4 Sep 2022 15:27:47 +0800 [thread overview]
Message-ID: <CAOLTT8T5gxC9F2KS--c8L3=-zzET=eRj_jNZxKeKGcoNZipzWw@mail.gmail.com> (raw)
In-Reply-To: <a14028be-2fd2-258d-94f5-c010669de8a6@github.com>
Derrick Stolee <derrickstolee@github.com> 于2022年9月2日周五 03:24写道:
>
> On 9/1/2022 5:41 AM, ZheNing Hu via GitGitGadget wrote:
> > This patch let partial clone have the similar capabilities of the shallow
> > clone git clone --depth=<depth>.
> ...
> > Now we can use git clone --filter="depth=<depth>" to omit all commits whose
> > depth is >= <depth>. By this way, we can have the advantages of both shallow
> > clone and partial clone: Limiting the depth of commits, get other objects on
> > demand.
>
> I have several concerns about this proposal.
>
> The first is that "depth=X" doesn't mean anything after the first
> clone. What will happen when we fetch the remaining objects?
>
According to the current results, yes, it still downloads a large number
of commits.
Do a litte test again:
$ git clone --filter=depth:2 git.git git
Cloning into 'git'...
remote: Enumerating objects: 4311, done.
remote: Counting objects: 100% (4311/4311), done.
remote: Compressing objects: 100% (3788/3788), done.
Just see how many objects...
$ git cat-file --batch-check --batch-all-objects | grep blob | wc -l
warning: This repository uses promisor remotes. Some objects may not be loaded.
4098
$ git cat-file --batch-check --batch-all-objects | grep tree | wc -l
warning: This repository uses promisor remotes. Some objects may not be loaded.
211
$ git cat-file --batch-check --batch-all-objects | grep commit | wc -l
warning: This repository uses promisor remotes. Some objects may not be loaded.
2
$ git checkout HEAD~
Fetch nothing...because depth=2.
$ git checkout HEAD~
remote: Enumerating objects: 198514, done.
remote: Counting objects: 100% (198514/198514), done.
remote: Compressing objects: 100% (68511/68511), done.
remote: Total 198514 (delta 128408), reused 198509 (delta 128406), pack-reused 0
Receiving objects: 100% (198514/198514), 77.07 MiB | 9.58 MiB/s, done.
Resolving deltas: 100% (128408/128408), done.
remote: Enumerating objects: 1, done.
remote: Counting objects: 100% (1/1), done.
remote: Total 1 (delta 0), reused 0 (delta 0), pack-reused 0
Receiving objects: 100% (1/1), 14.35 KiB | 14.35 MiB/s, done.
remote: Enumerating objects: 198014, done.
remote: Counting objects: 100% (198014/198014), done.
remote: Compressing objects: 100% (68362/68362), done.
remote: Total 198014 (delta 128056), reused 198012 (delta 128055), pack-reused 0
Receiving objects: 100% (198014/198014), 76.55 MiB | 14.00 MiB/s, done.
Resolving deltas: 100% (128056/128056), done.
Previous HEAD position was 624a936234 Merge branch 'en/merge-multi-strategies'
HEAD is now at 014a9ea207 Merge branch 'en/t4301-more-merge-tree-tests'
Fetch a lot of objects... (three times!)
$ git cat-file --batch-check --batch-all-objects | grep blob | wc -l
warning: This repository uses promisor remotes. Some objects may not be loaded.
4099
$ git cat-file --batch-check --batch-all-objects | grep tree | wc -l
warning: This repository uses promisor remotes. Some objects may not be loaded.
130712
$ git cat-file --batch-check --batch-all-objects | grep commit | wc -l
warning: This repository uses promisor remotes. Some objects may not be loaded.
67815
It fetched too many Commits and Trees... But Surprisingly, only one
more blob was downloaded.
I admit that this is a very bad action, That's because we
have no commits locally...
Maybe one solution: we can also provide a commit-id parameter
inside the depth filter, like --filter="commit:014a9ea207, depth:1"...
we can clone with blob:none filter to download all trees/commits,
then fetch blobs with this "commit-depth" filter.... even we can
provide a more complex filter: --filter="commit:014a9ea207, depth:1, type=blob"
This may avoid downloading too many unneeded commits and trees...
git fetch --filter="commit:014a9ea207, depth:1, type=blob"
If git fetch have learned this filter, then git checkout or other commands can
use this filter internally heuristically:
e.g.
git checkout HEAD~
if HEAD~ missing | 75% blobs/trees in HEAD~ missing -> use "commit-depth" filter
else -> use blob:none filter
We can even make this commit-depth filter support multiple commits later.
> Partial clone is designed to download a subset of objects, but make
> the remaining reachable objects downloadable on demand. By dropping
> reachable commits, the normal partial clone mechanism would result
> in a 'git rev-list' call asking for a missing commit. Would this
> inherit the "depth=X" but result in a huge amount of over-downloading
> the trees and blobs in that commit range? Would it result in downloading
> commits one-by-one, and then their root trees (and all reachable objects
> from those root trees)?
>
I don't know if it's possible let git rev-list know that commits is missing, and
stop download them. (just like git cat-file --batch --batch-all-objects does)
Similarly, you can let git log or other commands to understand this...
Probably a config var: fetch.skipmissingcommits...
> Finally, computing the set of objects to send is just as expensive as
> if we had a shallow clone (we can't use bitmaps). However, we get the
> additional problem where fetches do not have a shallow boundary, so
> the server will send deltas based on objects that are not necessarily
> present locally, triggering extra requests to resolve those deltas.
>
Agree, I think this maybe a problem, but there is no good solution for it.
> This fallout remains undocumented and unexplored in this series, but I
> doubt the investigation would result in positive outcomes.
>
> > Disadvantages of git clone --depth=<depth> --filter=blob:none: we must call
> > git fetch --unshallow to lift the shallow clone restriction, it will
> > download all history of current commit.
>
> How does your proposal fix this? Instead of unshallowing, users will
> stumble across these objects and trigger huge downloads by accident.
>
As mentioned above, I would expect a commit-depth filter to fix this.
> > Disadvantages of git clone --filter=blob:none with git sparse-checkout: The
> > git client needs to send a lot of missing objects' id to the server, this
> > can be very wasteful of network traffic.
>
> Asking for a list of blobs (especially limited to a sparse-checkout) is
> much more efficient than what will happen when a user tries to do almost
> anything in a repository formed the way you did here.
>
Yes. also as mentioned above, enabling this filter in some specific cases:
e.g. we have the commit but not all trees/blobs in it.
> Thinking about this idea, I don't think it is viable. I would need to
> see a lot of work done to test these scenarios closely to believe that
> this type of partial clone is a desirable working state.
>
Agree.
> Thanks,
> -Stolee
Thanks to these reviews and criticisms, it makes me think more :)
ZheNing Hu
prev parent reply other threads:[~2022-09-04 7:30 UTC|newest]
Thread overview: 12+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-09-01 9:41 [PATCH 0/3] list-object-filter: introduce depth filter ZheNing Hu via GitGitGadget
2022-09-01 9:41 ` [PATCH 1/3] commit-graph: let commit graph respect commit graft ZheNing Hu via GitGitGadget
2022-09-01 19:18 ` Derrick Stolee
2022-09-04 5:57 ` ZheNing Hu
2022-09-01 9:41 ` [PATCH 2/3] list-object-filter: pass traversal_context in filter_init_fn ZheNing Hu via GitGitGadget
2022-09-01 9:41 ` [PATCH 3/3] list-object-filter: introduce depth filter ZheNing Hu via GitGitGadget
2022-09-01 19:24 ` [PATCH 0/3] " Derrick Stolee
2022-09-02 13:48 ` Johannes Schindelin
2022-09-04 9:14 ` ZheNing Hu
2022-09-07 10:18 ` Johannes Schindelin
2022-09-11 10:59 ` ZheNing Hu
2022-09-04 7:27 ` ZheNing Hu [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: http://vger.kernel.org/majordomo-info.html
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to='CAOLTT8T5gxC9F2KS--c8L3=-zzET=eRj_jNZxKeKGcoNZipzWw@mail.gmail.com' \
--to=adlternative@gmail.com \
--cc=avarab@gmail.com \
--cc=christian.couder@gmail.com \
--cc=derrickstolee@github.com \
--cc=git@vger.kernel.org \
--cc=gitgitgadget@gmail.com \
--cc=gitster@pobox.com \
--cc=jeffhost@microsoft.com \
--cc=johannes.schindelin@gmx.de \
--cc=peff@peff.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://80x24.org/mirrors/git.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).