git@vger.kernel.org mailing list mirror (one of many)
From: Derrick Stolee <derrickstolee@github.com>
To: Junio C Hamano <gitster@pobox.com>,
	ZheNing Hu via GitGitGadget <gitgitgadget@gmail.com>
Cc: git@vger.kernel.org,
	"Christian Couder" <christian.couder@gmail.com>,
	"Ævar Arnfjörð Bjarmason" <avarab@gmail.com>,
	"Jeff King" <peff@peff.net>,
	"Jeff Hostetler" <jeffhost@microsoft.com>,
	"Derrick Stolee" <dstolee@microsoft.com>,
	"ZheNing Hu" <adlternative@gmail.com>
Subject: Re: [PATCH] [RFC] list-objects-filter: introduce new filter sparse:buffer=<spec>
Date: Tue, 9 Aug 2022 09:37:09 -0400
Message-ID: <46ca40a9-2d9a-3c7c-3272-938003f4967a@github.com>
In-Reply-To: <xmqqczdau2yd.fsf@gitster.g>

On 8/8/2022 12:15 PM, Junio C Hamano wrote:
> "ZheNing Hu via GitGitGadget" <gitgitgadget@gmail.com> writes:
> 
>> From: ZheNing Hu <adlternative@gmail.com>
>>
>> Although we already have `--filter=sparse:oid=<oid>`, which
>> can be used to clone a repository with only the objects that
>> match the filter rules in the file corresponding to <oid> on the
>> git server, it can only read filter rules that have already been
>> recorded on the git server.
> 
> Was the reason why we have "we limit to an object we already have"
> restriction because we didn't want to blindly use a piece of
> uncontrolled arbitrary end-user data here?  Just wondering.

One of the ideas here was to limit the opportunity of sending an
arbitrary set of data over the Git protocol and avoid exactly the
scenario you mention.

Another was that it is incredibly expensive to compute the set of
reachable objects within an arbitrary sparse-checkout definition,
since it requires walking trees (bitmaps do not help here). This
is why (to my knowledge) no Git hosting service currently supports
this mechanism at scale. At minimum, using the stored OID would
allow the host to keep track of these pre-defined sets and do some
precomputing of reachable data using bitmaps to keep clones and
fetches reasonable at all.
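
To make the cost argument concrete, here is a toy sketch of the tree
walk, not Git's implementation. The object IDs, the dict-based object
store, and the function name reachable_in_cone are all hypothetical,
and the matching is a plain directory-prefix test rather than real
sparse-checkout pattern semantics. It shows that deciding whether an
object is wanted depends on the path it was reached through, so every
tree on the way to a requested directory has to be parsed:

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy object store: a tree maps entry name -> (kind, object ID).
Tree = Dict[str, Tuple[str, str]]


def reachable_in_cone(
    trees: Dict[str, Tree], root: str, cones: List[str]
) -> Tuple[Set[str], int]:
    """Return (object IDs needed, number of trees parsed) for one root tree.

    `cones` holds directory prefixes ending in "/", e.g. "src/lib/".
    """
    needed: Set[str] = {root}
    trees_parsed = 0
    stack = [(root, "")]
    while stack:
        tree_id, prefix = stack.pop()
        trees_parsed += 1  # each visited tree must be loaded and parsed
        for name, (kind, oid) in trees[tree_id].items():
            path = prefix + name
            if kind == "tree":
                dir_path = path + "/"
                # Descend when the directory lies inside a cone or leads to one.
                if any(dir_path.startswith(c) or c.startswith(dir_path) for c in cones):
                    needed.add(oid)
                    stack.append((oid, dir_path))
            elif any(path.startswith(c) for c in cones):
                needed.add(oid)
    return needed, trees_parsed


# A single made-up commit's root tree with three subdirectories.
trees = {
    "T0": {"README": ("blob", "B0"), "src": ("tree", "T1"), "docs": ("tree", "T2")},
    "T1": {"lib": ("tree", "T3"), "main.c": ("blob", "B1")},
    "T2": {"guide.txt": ("blob", "B2")},
    "T3": {"util.c": ("blob", "B3")},
}
needed, trees_parsed = reachable_in_cone(trees, "T0", ["src/lib/"])
```

This covers one root tree only. A clone or fetch would have to repeat
the walk for the root tree of every commit being sent, and because the
answer depends on paths rather than on reachability alone, it cannot
simply be read off a reachability bitmap.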

The other side of the issue is that we do not have a good solution
for resolving how to change this filter in the future, in case the
user wants to expand their sparse-checkout definition and update
their partial clone filter.

There used to be a significant issue where a 'git checkout'
would fault in a lot of missing trees because the index needed to
reference the files outside of the sparse-checkout definition. Now
that the sparse index exists, this is less of an impediment, but
it can still cause some pain.

At this moment, I think path-scoped filters have a lot of problems
that need solving before they can be used effectively in the wild.
I would prefer that we solve those problems before making the
feature more complicated. That's a tall ask, since these problems
do not have simple solutions.

Thanks,
-Stolee

Thread overview: 9+ messages
2022-08-08 11:29 [PATCH] [RFC] list-objects-filter: introduce new filter sparse:buffer=<spec> ZheNing Hu via GitGitGadget
2022-08-08 16:15 ` Junio C Hamano
2022-08-09  6:13   ` ZheNing Hu
2022-08-09 13:37   ` Derrick Stolee [this message]
2022-08-10 21:15     ` Jeff King
2022-08-12 15:49       ` ZheNing Hu
2022-08-14  6:54         ` Jeff King
2022-08-12 15:40     ` ZheNing Hu
2022-08-26  5:10     ` ZheNing Hu
