Hi ZheNing,

On Sun, 4 Sep 2022, ZheNing Hu wrote:

> Johannes Schindelin wrote on Fri, Sep 2, 2022 at 21:48:
>
> > [...]
> >
> > When you have all the commit and tree objects on the local side, you
> > can enumerate all the blob objects you need in one fell swoop, then
> > fetch them in a single network round trip.
> >
> > When you lack tree objects, or worse, commit objects, this is not
> > true. You may very well need to fetch _quite_ a bunch of objects,
> > then inspect them to find out that you need to fetch more
> > tree/commit objects, and then perform a couple more round trips,
> > before you can enumerate all of the objects you need.
>
> I think this is because the previous design was that you had to fetch
> these missing commits (also trees) and all their ancestors. Maybe we
> can modify git rev-list to make it understand missing commits...

We do have such a modification, and it is called "shallow clone" ;-)

Granted, shallow clones are not a complete solution and turned out to
be a dead end (i.e. that design cannot be extended into anything more
useful). But that approach demonstrates what it would take to implement
a logic whereby Git understands that some commit ranges are missing and
should not be fetched automatically.

> > [...] it is hard to think of a way how the design could result in
> > anything but undesirable behavior, both on the client and the
> > server side.
> >
> > We also have to consider that our experience with large
> > repositories demonstrates that tree and commit objects delta pretty
> > well and are virtually never a concern when cloning. It is always
> > the sheer amount of blob objects that is causing poor user
> > experience when performing non-partial clones of large
> > repositories.
>
> Thanks, I think I understand the problem here. By the way, does it
> make sense to download just some of the commits/trees in some big
> repository which has several million commits/trees?

It probably only makes sense if we can come up with a good idea how to
teach Git the trick to stop downloading so many objects in costly
round trips.

But I wonder whether your scenarios are so different from the ones I
encountered, in that commit and tree objects do _not_ delta well on
your side? If they _do_ delta well, i.e. if it is comparatively cheap
to just fetch them all in one go, it probably makes more sense to drop
the idea of fetching only some commit/tree objects but not others in a
partial clone, and always fetch all of 'em.

> > Now, I can be totally wrong in my expectation that there is _no_
> > scenario where cloning with a "partial depth" would cause anything
> > but poor performance. If I am wrong, then there is value in having
> > this feature, but since it causes undesirable performance in all
> > cases I can think of, it definitely should be guarded behind an
> > opt-in flag.
>
> Well, now I think this depth filter might be a better fit for git
> fetch.

I disagree here, because I see all the same challenges as I described
for clones missing entire commit ranges.

> If git checkout or other commands just need to check a few commits,
> and find that almost all objects (maybe >= 75%) in a commit are not
> local, they can use this depth filter to download them.

If you want a clone that does not show any reasonable commit history
because it does not fetch commit objects on-the-fly, then we already
have such a thing with shallow clones.
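To make that concrete, the existing shallow machinery already offers
exactly this trade-off (the URL is just a placeholder, of course):

    # Fetch only the tip commit (--depth implies --single-branch);
    # older history is cut off at the shallow boundary rather than
    # fetched on demand.
    git clone --depth=1 https://example.com/repo.git

    # Later, deepen the history by 50 more commits in one go...
    git fetch --deepen=50

    # ...or give up on shallowness and fetch the full history.
    git fetch --unshallow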
The only way to make Git's revision walking logic perform _somewhat_
reasonably would be to teach it to fetch not just a single commit
object when it is asked for, but to somehow pass a desired depth by
which to "unshallow" automatically.

However, such a feature would come with the same undesirable
implications on the server side as shallow clones (fetches into
shallow clones are _really_ expensive on the server side).

Ciao,
Dscho
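P.S.: To spell out the "fetch all of 'em in one go" alternative from
above: that is what a blob-less partial clone already gives you today
(placeholder URL again):

    # Fetch the complete commit and tree history up front; only blobs
    # are deferred and fetched on demand (e.g. for the initial
    # checkout), so e.g. `git log` without `-p` works entirely
    # locally.
    git clone --filter=blob:none https://example.com/repo.git

    # A tree-less clone defers trees as well, and that is where the
    # costly extra round trips start to show up.
    git clone --filter=tree:0 https://example.com/repo.git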