From: Johannes Schindelin <Johannes.Schindelin@gmx.de>
To: ZheNing Hu <adlternative@gmail.com>
Cc: "Derrick Stolee" <derrickstolee@github.com>,
	"ZheNing Hu via GitGitGadget" <gitgitgadget@gmail.com>,
	"Git List" <git@vger.kernel.org>,
	"Christian Couder" <christian.couder@gmail.com>,
	"Ævar Arnfjörð Bjarmason" <avarab@gmail.com>,
	"Jeff King" <peff@peff.net>,
	"Jeff Hostetler" <jeffhost@microsoft.com>,
	"Junio C Hamano" <gitster@pobox.com>
Subject: Re: [PATCH 0/3] list-object-filter: introduce depth filter
Date: Wed, 7 Sep 2022 12:18:34 +0200 (CEST)
Message-ID: <o10o218s-2rq4-9n3p-86np-rn79r7qr2139@tzk.qr>
In-Reply-To: <CAOLTT8S2r1gzyF8YAORuGwian+QwSniAPd8br0xn_P5gPyxpgg@mail.gmail.com>

Hi ZheNing,

On Sun, 4 Sep 2022, ZheNing Hu wrote:

> Johannes Schindelin <Johannes.Schindelin@gmx.de> wrote on Fri, Sep 2, 2022 at 21:48:
>
> > [...]
> > When you have all the commit and tree objects on the local side,
> > you can enumerate all the blob objects you need in one fell swoop, then
> > fetch them in a single network round trip.
> >
> > When you lack tree objects, or worse, commit objects, this is not true.
> > You may very well need to fetch _quite_ a bunch of objects, then inspect
> > them to find out that you need to fetch more tree/commit objects, and then
> > a couple more round trips, before you can enumerate all of the objects you
> > need.
>
> I think this is because the previous design was that you had to fetch
> these missing commits (also trees) and all their ancestors. Maybe we can
> modify git rev-list to make it understand missing commits...

We do have such a modification, and it is called "shallow clone" ;-)

Granted, shallow clones are not a complete solution and turned out to be a
dead end (i.e. that design cannot be extended into anything more useful).
But that approach demonstrates what it would take to implement logic
whereby Git understands that some commit ranges are missing and should not
be fetched automatically.

> > [...] it is hard to think of a way how the design could result in
> > anything but undesirable behavior, both on the client and the server
> > side.
> >
> > We also have to consider that our experience with large repositories
> > demonstrates that tree and commit objects delta pretty well and are
> > virtually never a concern when cloning. It is always the sheer amount
> > of blob objects that is causing poor user experience when performing
> > non-partial clones of large repositories.
>
> Thanks, I think I understand the problem here. By the way, does it make
> sense to download just some of the commits/trees in a big repository
> that has several million commits/trees?

It probably only makes sense if we can come up with a good way to teach
Git to stop downloading so many objects in costly round trips.
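
To illustrate the round-trip concern quoted earlier, here is a toy model
(plain Python, not Git code). It assumes a hypothetical client that, when
tree objects are missing, can only request the trees it has already
learned about, one level per request, and fetches the blobs in one final
batch. The function name and the nested-dict representation are made up
for this sketch.

```python
# Toy model, not Git code: count the requests needed before all blobs of
# one commit can be fetched.

def round_trips(tree, have_trees):
    """`tree` is a nested dict: dict values are subtrees, others are blobs."""
    if have_trees:
        # All trees are local: enumerate every blob, fetch them in one batch.
        return 1
    trips = 0
    blobs_needed = False
    frontier = [tree]  # trees we know about but have not fetched yet
    while frontier:
        trips += 1  # one request fetches every tree in the current frontier
        next_frontier = []
        for t in frontier:
            for child in t.values():
                if isinstance(child, dict):
                    next_frontier.append(child)  # learned of a deeper tree
                else:
                    blobs_needed = True
        frontier = next_frontier
    # One final batch for the blobs discovered along the way.
    return trips + (1 if blobs_needed else 0)


repo = {"README": "blob", "src": {"main.c": "blob", "lib": {"util.c": "blob"}}}
print(round_trips(repo, have_trees=True))
print(round_trips(repo, have_trees=False))
```

In this model the request count grows with the nesting depth of the
directory structure when trees are missing, while it stays at one when
all trees are already local.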

But I wonder whether your scenarios are so different from the ones I
encountered, in that commit and tree objects do _not_ delta well on your
side?

If they _do_ delta well, i.e. if it is comparatively cheap to just fetch
them all in one go, it probably makes more sense to just drop the idea of
fetching only some commit/tree objects but not others in a partial clone,
and always fetch all of 'em.

> > Now, I can be totally wrong in my expectation that there is _no_ scenario
> > where cloning with a "partial depth" would cause anything but poor
> > performance. If I am wrong, then there is value in having this feature,
> > but since it causes undesirable performance in all cases I can think of,
> > it definitely should be guarded behind an opt-in flag.
>
> Well, now I think this depth filter might be a better fit for git fetch.

I disagree here, because I see all the same challenges as I described for
clones missing entire commit ranges.

> If git checkout or another command only needs to inspect a few commits
> and finds that almost all objects (maybe >= 75%) in a commit are not
> local, it can use this depth filter to download them.

If you want a clone that does not show any reasonable commit history
because it does not fetch commit objects on-the-fly, then we already have
such a thing with shallow clones.

The only way to make Git's revision walking logic perform _somewhat_
reasonably would be to teach it to fetch not just a single commit object
when it was asked for, but to somehow pass a desired depth by which to
"unshallow" automatically.

However, such a feature would come with the same undesirable implications
on the server side as shallow clones (fetches into shallow clones are
_really_ expensive on the server side).
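
As a rough sketch of the client-side effect of such an automatic
"unshallow by depth", consider a linear history where no commit is local
yet. The `depth` parameter below is hypothetical and does not correspond
to an existing Git option; the model only counts requests and ignores the
server-side cost entirely.

```python
# Toy model, not Git code: walk a linear history from the tip and count
# how many fetch requests a lazily-fetching revision walk would issue.

def walk_history(length, depth):
    """Commits are numbered 0 (tip) to length - 1 (root); none are local."""
    if depth < 1:
        raise ValueError("depth must be at least 1")
    local = set()
    requests = 0
    for commit in range(length):
        if commit not in local:
            # Missing commit: one request fetches it plus up to
            # depth - 1 of its ancestors.
            requests += 1
            local.update(range(commit, min(commit + depth, length)))
    return requests


print(walk_history(1000, depth=1))   # one request per commit
print(walk_history(1000, depth=50))  # one request per 50 commits
```

With `depth=1` every commit costs a round trip of its own; a larger depth
reduces the number of requests proportionally, but each of those requests
would still be a fetch into an incomplete history.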

Ciao,
Dscho
