git@vger.kernel.org mailing list mirror (one of many)
From: ZheNing Hu <adlternative@gmail.com>
To: Derrick Stolee <derrickstolee@github.com>
Cc: "Junio C Hamano" <gitster@pobox.com>,
	"ZheNing Hu via GitGitGadget" <gitgitgadget@gmail.com>,
	"Git List" <git@vger.kernel.org>,
	"Christian Couder" <christian.couder@gmail.com>,
	"Ævar Arnfjörð Bjarmason" <avarab@gmail.com>,
	"Jeff King" <peff@peff.net>,
	"Jeff Hostetler" <jeffhost@microsoft.com>,
	"Derrick Stolee" <dstolee@microsoft.com>
Subject: Re: [PATCH] [RFC] list-objects-filter: introduce new filter sparse:buffer=<spec>
Date: Fri, 12 Aug 2022 23:40:27 +0800
Message-ID: <CAOLTT8RVhfzA7RtZHzU+L8XOGhoYr1AEOw4iD0vHb1b84mhtiw@mail.gmail.com>
In-Reply-To: <46ca40a9-2d9a-3c7c-3272-938003f4967a@github.com>

On Tue, Aug 9, 2022 at 21:37, Derrick Stolee <derrickstolee@github.com> wrote:
>
> On 8/8/2022 12:15 PM, Junio C Hamano wrote:
> > "ZheNing Hu via GitGitGadget" <gitgitgadget@gmail.com> writes:
> >
> >> From: ZheNing Hu <adlternative@gmail.com>
> >>
> >> Although we already had a `--filter=sparse:oid=<oid>` which
> >> can be used to clone a repository with limited objects which meet
> >> filter rules in the file corresponding to the <oid> on the git
> >> server. But it can only read filter rules which have been recorded
> >> in the git server before.
> >
> > Was the reason why we have "we limit to an object we already have"
> > restriction because we didn't want to blindly use a piece of
> > uncontrolled arbitrary end-user data here?  Just wondering.
>
> One of the ideas here was to limit the opportunity of sending an
> arbitrary set of data over the Git protocol and avoid exactly the
> scenario you mention.
>
> Another was that it is incredibly expensive to compute the set of
> reachable objects within an arbitrary sparse-checkout definition,
> since it requires walking trees (bitmaps do not help here). This
> is why (to my knowledge) no Git hosting service currently supports
> this mechanism at scale. At minimum, using the stored OID would
> allow the host to keep track of these pre-defined sets and do some
> precomputing of reachable data using bitmaps to keep clones and
> fetches reasonable at all.
>

How about allowing only simpler filter rules?

For example, with https://github.com/derrickstolee/sparse-checkout-example:

User A could use --filter="sparse:buffer=client" to download only the
client/ directory, and user B could use --filter="sparse:buffer=service/list"
to download only service/list.

cat >filterspec <<-EOF
web
service
EOF
User C can use --filter="sparse:buffer=`cat filterspec`" to download
web/ and service/.

cat >filterspec <<-EOF
service
!service/list
EOF
But user D could not use --filter="sparse:buffer=`cat filterspec`" to
download service/ without service/list, because negative patterns such
as !service/list would not be among the allowed rules.

I think many users could benefit from this.
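To make "simpler filter rules" concrete, here is a minimal sketch of the kind of check a server could run before accepting a sparse:buffer spec. It assumes that "simple" means every line is a plain directory path, with no negation and no wildcards; both helper names (is_simple_filterspec, path_in_filterspec) are hypothetical and are not part of Git or of this patch.

```sh
# Hypothetical check, not Git code: accept a filterspec file only if
# every non-empty line is a plain directory path. Negative patterns
# ('!...') and glob characters ('*', '?', '[') are rejected.
is_simple_filterspec () {
	while IFS= read -r line
	do
		case "$line" in
		'') ;;	# ignore blank lines
		'!'*|*'*'*|*'?'*|*'['*)
			return 1 ;;
		esac
	done <"$1"
	return 0
}

# Hypothetical matcher, not Git code: succeed if PATH is one of the
# listed directories or lies underneath one of them.
path_in_filterspec () {
	path=$1
	while IFS= read -r dir
	do
		[ -n "$dir" ] || continue
		case "$path" in
		"$dir"|"$dir"/*)
			return 0 ;;
		esac
	done <"$2"
	return 1
}
```

With the two spec files from the examples above, user C's spec (web, service) passes is_simple_filterspec, while user D's spec is rejected because of its !service/list line. Matching by plain prefix keeps the rule easy to reason about, though it does not by itself remove the cost of walking trees on the server.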

> The other side of the issue is that we do not have a good solution
> for resolving how to change this filter in the future, in case the
> user wants to expand their sparse-checkout definition and update
> their partial clone filter.
>

I don't think we really need to maintain this "partial clone filter"
separately. After the first partial clone, we could simply reuse the
sparse-checkout rules; maybe we should write the filter rules of the
first partial clone to .git/info/sparse-checkout (only when --sparse is
passed to git clone?).
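Part of this can already be approximated on the client with existing commands, minus the server-side filtering: git clone --sparse checks out only the top-level files, and git sparse-checkout set --stdin can read the same directory list that would have been passed as the filter spec. A minimal local sketch, assuming Git 2.25 or later; the repository layout and file names are made up, and the proposed sparse:buffer filter is left out because it exists only in this RFC.

```sh
set -e

work=$(mktemp -d)
src=$work/src
dst=$work/dst

# Build a small source repository with three top-level directories.
git init -q "$src"
mkdir -p "$src/web" "$src/service/list" "$src/client"
echo web >"$src/web/index.html"
echo list >"$src/service/list/main.c"
echo client >"$src/client/app.js"
git -C "$src" add .
git -C "$src" -c user.name=Example -c user.email=example@example.com \
	commit -q -m initial

# The same directory list user C would pass as the filter spec.
printf 'web\nservice\n' >"$work/filterspec"

# Clone with only top-level files checked out, then reuse the spec
# as the sparse-checkout definition.
git clone -q --sparse "$src" "$dst"
git -C "$dst" sparse-checkout set --stdin <"$work/filterspec"

# The working tree of $dst now has web/ and service/, but not client/.
```

A real partial clone would also pass a --filter option to git clone; keeping the filter and the sparse-checkout definition in sync when the user later expands one of them is the open question raised above.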

> There used to be a significant issue where a 'git checkout'
> would fault in a lot of missing trees because the index needed to
> reference the files outside of the sparse-checkout definition. Now
> that the sparse index exists, this is less of an impediment, but
> it can still cause some pain.
>

Agree.

> At this moment, I think path-scoped filters have a lot of problems
> that need solving before they can be used effectively in the wild.
> I would prefer that we solve those problems before making the
> feature more complicated. That's a tall ask, since these problems
> do not have simple solutions.
>

Could you tell me what those problems are? I can start working on them :)

> Thanks,
> -Stolee

Thanks.
ZheNing Hu


Thread overview: 9+ messages
2022-08-08 11:29 [PATCH] [RFC] list-objects-filter: introduce new filter sparse:buffer=<spec> ZheNing Hu via GitGitGadget
2022-08-08 16:15 ` Junio C Hamano
2022-08-09  6:13   ` ZheNing Hu
2022-08-09 13:37   ` Derrick Stolee
2022-08-10 21:15     ` Jeff King
2022-08-12 15:49       ` ZheNing Hu
2022-08-14  6:54         ` Jeff King
2022-08-12 15:40     ` ZheNing Hu [this message]
2022-08-26  5:10     ` ZheNing Hu
