git@vger.kernel.org mailing list mirror (one of many)
* Git and sparse-checkout on large monorepos - hiding irrelevant changes for a sparse-checkout specification?
From: Tao Klerks @ 2020-04-20 21:18 UTC (permalink / raw)
  To: git

Hi,

I posted an "Is this possible?" question on stackoverflow
(https://stackoverflow.com/q/61326025/74296) and was pointed here.

I understand from recent updates that there is increasing built-in
support for large files and large repos, between some of the older
capabilities (sparse checkout in general and shallow clone), and the
newer ones (partial-clone and git-sparse-checkout).

I'm playing with a large repo, and finding some "rough edges" around
large diffs (e.g. 200,000 files "added" in the "initial" commits of
shallow clones).

I was hoping these could be smoothed out when using sparse checkout
(where each user would only see say 30,000 of those 200,000 files),
but can't figure out a way to easily & consistently apply the
.git/info/sparse-checkout specification to tools like git-diff and
git-log (across many users with some semblance of consistency).

Is this something that is or is expected to be supported at some point?

While I'm asking, I have two less-important questions:

1) Are there any plans to support a filter along the lines of "keep
blobs used for commits since date X handy"? I know I can do a shallow
clone, then turn on filtering/promisors, and then unshallow, but then
later fetches don't bring in binaries - a mode that provides this
"full commit history but recent blobs only" might be nice? (I imagine
that's probably non-trivial, because the filters are probably based on
properties of the blobs themselves... but one can dream?)

2) Is there a target date for when git-sparse-checkout will become
non-experimental?

Thanks for any help, my apologies if my questions are too forward.

Best regards,
Tao Klerks


* Re: Git and sparse-checkout on large monorepos - hiding irrelevant changes for a sparse-checkout specification?
From: Elijah Newren @ 2020-04-20 22:22 UTC (permalink / raw)
  To: Tao Klerks; +Cc: Git Mailing List

On Mon, Apr 20, 2020 at 2:21 PM Tao Klerks <tao@klerks.biz> wrote:
>
> Hi,
>
> I posted an "Is this possible?" question on stackoverflow
> (https://stackoverflow.com/q/61326025/74296) and was pointed here.
>
> I understand from recent updates that there is increasing built-in
> support for large files and large repos, between some of the older
> capabilities (sparse checkout in general and shallow clone), and the
> newer ones (partial-clone and git-sparse-checkout).
>
> I'm playing with a large repo, and finding some "rough edges" around
> large diffs (e.g. 200,000 files "added" in the "initial" commits of
> shallow clones).
>
> I was hoping these could be smoothed out when using sparse checkout
> (where each user would only see say 30,000 of those 200,000 files),
> but can't figure out a way to easily & consistently apply the
> .git/info/sparse-checkout specification to tools like git-diff and
> git-log (across many users with some semblance of consistency).
>
> Is this something that is or is expected to be supported at some point?

Yes, we would like to support this at some point.  See
https://lore.kernel.org/git/xmqq7dz938sc.fsf@gitster.c.googlers.com/
and a bunch of other emails from that thread.  You may need to set a
config setting, though (see e.g.
https://lore.kernel.org/git/CABPp-BE6zW0nJSStcVU=_DoDBnPgLqOR8pkTXK3dW11=T01OhA@mail.gmail.com/
from that thread).

Also, there is no plan at all for when this will happen.  You'll note
those links are kind of recent.  These issues have also come up
before, but I'm too lazy to dig up the links to the other threads.
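
Until built-in support exists, one possible workaround is to pass the
sparse-checkout directories to git-log and git-diff as pathspecs.  The
following is only a minimal sketch, assuming a cone-mode sparse checkout
(Git 2.25 or later) and directory names without whitespace or special
characters.  The helper names sparse_log and sparse_diff are hypothetical
and not part of Git.

```shell
#!/bin/sh
# Hypothetical helpers (not part of Git): limit log/diff output to the
# directories of a cone-mode sparse checkout by passing them as pathspecs.
# Assumes directory names without whitespace or special characters, since
# the unquoted $(...) below is split on whitespace.

sparse_log() {
    # In cone mode, `git sparse-checkout list` prints one directory per line.
    git log "$@" -- $(git sparse-checkout list)
}

sparse_diff() {
    git diff "$@" -- $(git sparse-checkout list)
}
```

For example, `sparse_log --oneline` or `sparse_diff --stat HEAD~1 HEAD`.
Two caveats of this sketch: files at the top level of the repository,
which cone mode always checks out, are not covered by the directory
pathspecs, and if the directory list is empty no filtering happens at all.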

> While I'm asking, I have two less-important questions:
>
> 1) Are there any plans to support a filter along the lines of "keep
> blobs used for commits since date X handy"? I know I can do a shallow
> clone, then turn on filtering/promisors, and then unshallow, but then
> later fetches don't bring in binaries - a mode that provides this
> "full commit history but recent blobs only" might be nice? (I imagine
> that's probably non-trivial, because the filters are probably based on
> properties of the blobs themselves... but one can dream?)

Given the context before this in your email, could you clarify what
you are asking?  In particular, are you really asking for all blobs
since date X, or for blobs within your sparse cone (going back to
beginning of history), or blobs within your sparse cone since date X?

I personally don't think doing anything with shallow clones other than
avoiding breaking existing use cases has any value.  So, I'll focus on
partial clones.

I've been trying to win some mindshare for the second of those options
(having the ability to specify sparsity cones to clone/fetch and have
it respect those and only download blobs touching those paths, plus
all commits and maybe all trees), and perhaps the others could be
added on top.  I'm planning to help out with this, after my merge
work, but who knows when that finishes.
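
For comparison, what can already be combined today is a blob-less partial
clone plus a cone-mode sparse checkout.  This is not the path-restricted
clone/fetch described above: the filter below is blob:none, so blobs are
fetched lazily on demand rather than selected by sparsity cone.  A minimal
sketch, assuming Git 2.25 or later and a server that allows filtering; the
function name sparse_partial_clone and its arguments are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (not part of Git): blob-less partial clone whose
# working tree is limited to the given directories (cone mode).
# Usage: sparse_partial_clone <url> <dest> <dir>...

sparse_partial_clone() {
    url=$1
    dest=$2
    shift 2
    # --filter=blob:none downloads commits and trees but defers blobs;
    # --sparse initially checks out only the top-level files.
    git clone --filter=blob:none --sparse "$url" "$dest" &&
    git -C "$dest" sparse-checkout init --cone &&
    # Widen the checkout to the requested directories; the blobs they
    # need are fetched from the promisor remote at this point.
    git -C "$dest" sparse-checkout set "$@"
}
```

For example, `sparse_partial_clone <url> myrepo app lib` would leave only
the top-level files plus app/ and lib/ in the working tree.  Commands that
walk history outside those directories may still trigger on-demand blob
downloads.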

> 2) Is there a target date for when git-sparse-checkout will become
> non-experimental?

We're more feature based than date based.  I was one of the ones
asking that we put that loud this-is-experimental warning in the docs,
and in particular mentioning that other commands (diff, log, grep,
clone, fetch, etc.) could change in the presence of sparse-checkouts
precisely because I want to see some of the above things fixed and
even have some ideas for merge/rebase/cherry-pick in this area.
You're likely to see some commands start gaining support to work
better in a sparse-checkout (e.g. Matheus posted some patches to make
grep better respect those), and more commands slowly gain it over
time.  Once enough have it and we've worked out the known bugs with
sparse-checkouts (we have some significant patches in 'next' that 2.26
users haven't seen yet), then we'll discuss when it's time to remove
the experimental warning.

> Thanks for any help, my apologies if my questions are too forward.

Sorry that the answer amounts to "we don't have that yet", but the
things you are asking for are things we've been discussing and moving
towards.
