git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: ZheNing Hu <adlternative@gmail.com>
To: Derrick Stolee <derrickstolee@github.com>
Cc: "ZheNing Hu via GitGitGadget" <gitgitgadget@gmail.com>,
	"Git List" <git@vger.kernel.org>,
	"Christian Couder" <christian.couder@gmail.com>,
	"Ævar Arnfjörð Bjarmason" <avarab@gmail.com>,
	"Jeff King" <peff@peff.net>,
	"Jeff Hostetler" <jeffhost@microsoft.com>,
	"Junio C Hamano" <gitster@pobox.com>,
	"Johannes Schindelin" <johannes.schindelin@gmx.de>
Subject: Re: [PATCH 0/3] list-object-filter: introduce depth filter
Date: Sun, 4 Sep 2022 15:27:47 +0800	[thread overview]
Message-ID: <CAOLTT8T5gxC9F2KS--c8L3=-zzET=eRj_jNZxKeKGcoNZipzWw@mail.gmail.com> (raw)
In-Reply-To: <a14028be-2fd2-258d-94f5-c010669de8a6@github.com>

Derrick Stolee <derrickstolee@github.com> 于2022年9月2日周五 03:24写道:
>
> On 9/1/2022 5:41 AM, ZheNing Hu via GitGitGadget wrote:
> > This patch let partial clone have the similar capabilities of the shallow
> > clone git clone --depth=<depth>.
> ...
> > Now we can use git clone --filter="depth=<depth>" to omit all commits whose
> > depth is >= <depth>. By this way, we can have the advantages of both shallow
> > clone and partial clone: Limiting the depth of commits, get other objects on
> > demand.
>
> I have several concerns about this proposal.
>
> The first is that "depth=X" doesn't mean anything after the first
> clone. What will happen when we fetch the remaining objects?
>

According to the current results, yes, it still downloads a large number
of commits.

Do a litte test again:

$ git clone --filter=depth:2 git.git git
Cloning into 'git'...
remote: Enumerating objects: 4311, done.
remote: Counting objects: 100% (4311/4311), done.
remote: Compressing objects: 100% (3788/3788), done.

Just see how many objects...
$ git cat-file --batch-check --batch-all-objects | grep blob | wc -l
warning: This repository uses promisor remotes. Some objects may not be loaded.
    4098
$ git cat-file --batch-check --batch-all-objects | grep tree | wc -l
warning: This repository uses promisor remotes. Some objects may not be loaded.
     211
$ git cat-file --batch-check --batch-all-objects | grep commit | wc -l
warning: This repository uses promisor remotes. Some objects may not be loaded.
       2

$ git checkout HEAD~

Fetch nothing...because depth=2.

$  git checkout HEAD~
remote: Enumerating objects: 198514, done.
remote: Counting objects: 100% (198514/198514), done.
remote: Compressing objects: 100% (68511/68511), done.
remote: Total 198514 (delta 128408), reused 198509 (delta 128406), pack-reused 0
Receiving objects: 100% (198514/198514), 77.07 MiB | 9.58 MiB/s, done.
Resolving deltas: 100% (128408/128408), done.
remote: Enumerating objects: 1, done.
remote: Counting objects: 100% (1/1), done.
remote: Total 1 (delta 0), reused 0 (delta 0), pack-reused 0
Receiving objects: 100% (1/1), 14.35 KiB | 14.35 MiB/s, done.
remote: Enumerating objects: 198014, done.
remote: Counting objects: 100% (198014/198014), done.
remote: Compressing objects: 100% (68362/68362), done.
remote: Total 198014 (delta 128056), reused 198012 (delta 128055), pack-reused 0
Receiving objects: 100% (198014/198014), 76.55 MiB | 14.00 MiB/s, done.
Resolving deltas: 100% (128056/128056), done.
Previous HEAD position was 624a936234 Merge branch 'en/merge-multi-strategies'
HEAD is now at 014a9ea207 Merge branch 'en/t4301-more-merge-tree-tests'

Fetch a lot of objects... (three times!)

$ git cat-file --batch-check --batch-all-objects | grep blob | wc -l
warning: This repository uses promisor remotes. Some objects may not be loaded.
    4099
$ git cat-file --batch-check --batch-all-objects | grep tree | wc -l
warning: This repository uses promisor remotes. Some objects may not be loaded.
    130712
$ git cat-file --batch-check --batch-all-objects | grep commit | wc -l
warning: This repository uses promisor remotes. Some objects may not be loaded.
    67815

It fetched too many Commits and Trees... But Surprisingly, only one
more blob was downloaded.

I admit that this is a very bad action, That's because we
have no commits locally...

Maybe one solution: we can also provide a commit-id parameter
inside the depth filter, like --filter="commit:014a9ea207, depth:1"...
we can clone with blob:none filter to download all trees/commits,
then fetch blobs with this "commit-depth" filter.... even we can
provide a more complex filter: --filter="commit:014a9ea207, depth:1, type=blob"
This may avoid downloading too many unneeded commits and trees...

git fetch --filter="commit:014a9ea207, depth:1, type=blob"

If git fetch have learned this filter, then git checkout or other commands can
 use this filter internally heuristically:

e.g.

git checkout HEAD~
if HEAD~ missing | 75% blobs/trees in HEAD~ missing -> use "commit-depth" filter
else -> use blob:none filter

We can even make this commit-depth filter support multiple commits later.

> Partial clone is designed to download a subset of objects, but make
> the remaining reachable objects downloadable on demand. By dropping
> reachable commits, the normal partial clone mechanism would result
> in a 'git rev-list' call asking for a missing commit. Would this
> inherit the "depth=X" but result in a huge amount of over-downloading
> the trees and blobs in that commit range? Would it result in downloading
> commits one-by-one, and then their root trees (and all reachable objects
> from those root trees)?
>

I don't know if it's possible let git rev-list know that commits is missing, and
stop download them. (just like git cat-file --batch --batch-all-objects does)

Similarly, you can let git log or other commands to understand this...

Probably a config var: fetch.skipmissingcommits...

> Finally, computing the set of objects to send is just as expensive as
> if we had a shallow clone (we can't use bitmaps). However, we get the
> additional problem where fetches do not have a shallow boundary, so
> the server will send deltas based on objects that are not necessarily
> present locally, triggering extra requests to resolve those deltas.
>

Agree, I think this maybe a problem, but there is no good solution for it.

> This fallout remains undocumented and unexplored in this series, but I
> doubt the investigation would result in positive outcomes.
>
> > Disadvantages of git clone --depth=<depth> --filter=blob:none: we must call
> > git fetch --unshallow to lift the shallow clone restriction, it will
> > download all history of current commit.
>
> How does your proposal fix this? Instead of unshallowing, users will
> stumble across these objects and trigger huge downloads by accident.
>

As mentioned above, I would expect a commit-depth filter to fix this.

> > Disadvantages of git clone --filter=blob:none with git sparse-checkout: The
> > git client needs to send a lot of missing objects' id to the server, this
> > can be very wasteful of network traffic.
>
> Asking for a list of blobs (especially limited to a sparse-checkout) is
> much more efficient than what will happen when a user tries to do almost
> anything in a repository formed the way you did here.
>

Yes. also as mentioned above, enabling this filter in some specific cases:
e.g. we have the commit but not all trees/blobs in it.

> Thinking about this idea, I don't think it is viable. I would need to
> see a lot of work done to test these scenarios closely to believe that
> this type of partial clone is a desirable working state.
>

Agree.

> Thanks,
> -Stolee

Thanks to these reviews and criticisms, it makes me think more :)

ZheNing Hu

      parent reply	other threads:[~2022-09-04  7:30 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-09-01  9:41 [PATCH 0/3] list-object-filter: introduce depth filter ZheNing Hu via GitGitGadget
2022-09-01  9:41 ` [PATCH 1/3] commit-graph: let commit graph respect commit graft ZheNing Hu via GitGitGadget
2022-09-01 19:18   ` Derrick Stolee
2022-09-04  5:57     ` ZheNing Hu
2022-09-01  9:41 ` [PATCH 2/3] list-object-filter: pass traversal_context in filter_init_fn ZheNing Hu via GitGitGadget
2022-09-01  9:41 ` [PATCH 3/3] list-object-filter: introduce depth filter ZheNing Hu via GitGitGadget
2022-09-01 19:24 ` [PATCH 0/3] " Derrick Stolee
2022-09-02 13:48   ` Johannes Schindelin
2022-09-04  9:14     ` ZheNing Hu
2022-09-07 10:18       ` Johannes Schindelin
2022-09-11 10:59         ` ZheNing Hu
2022-09-04  7:27   ` ZheNing Hu [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAOLTT8T5gxC9F2KS--c8L3=-zzET=eRj_jNZxKeKGcoNZipzWw@mail.gmail.com' \
    --to=adlternative@gmail.com \
    --cc=avarab@gmail.com \
    --cc=christian.couder@gmail.com \
    --cc=derrickstolee@github.com \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=gitster@pobox.com \
    --cc=jeffhost@microsoft.com \
    --cc=johannes.schindelin@gmx.de \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).