From: Taylor Blau <me@ttaylorr.com>
To: git@vger.kernel.org
Subject: [TOPIC 6/8] State of sparsity work
Date: Thu, 29 Sep 2022 15:21:30 -0400
Message-ID: <YzXwOsaoCdBhHsX1@nand.local>
In-Reply-To: <YzXvMRc6X60kjVeY@nand.local>
# State of sparsity developments and future plans (Victoria)
- (Victoria) Integrating commands with the sparse index: making them
compatible with it, so they no longer expand it to a full index before
executing
- Have worked on some more recently
- GSoC student has worked on a handful as well
- Near-term future is about finding commands that need to touch the
index, don't support sparse index, and then make them compatible
- In some cases, that is going to require expanding the index to be
non-sparse, especially if you are touching something outside of the
sparsity cone
- That is somewhat straightforward
- More interesting questions: what is the future of sparsity? Recently,
Elijah pushed a change to make sparse-checkout's cone mode the default
(nice, since it is required by sparse index)
- As we move forward, what should we change the defaults of?
- Sparse index for sparse checkouts in cone mode?
- Scalar as a testing ground for larger features, including sparse index
- Could make sparse index the default in Scalar for cone-mode sparse
checkouts, and then see how it goes
- Or, could just go for it sooner (after we have integrated sparse
index with enough commands)
- A handful of internal, logistical things that would have to happen for
sparse index to become the default. Currently, commands are assumed to
not work with the sparse index.
- Question for everybody: what is a good balance between pushing sparse
index and waiting to introduce it to more users by holding off on
changing the default?
- (Stolee): sparse checkout and submodules became a difficulty when
mentoring their GSoC student.
- (JTan): possible to decouple sparse index and cone-mode sparse
checkouts from each other? This would be easy to test - turn it on,
all of the test suite automatically uses it. (Jrnieder): This sounds ok
for the filesystem, but I don't know how this would work for this
"VFS-backed Git" idea on the spreadsheet. (other things…)
- (Stolee): We need cone mode today because it's the only way to
definitively say that we've reached the boundary. But we can also
expand the idea of "cone" to allow more paths (files instead of
directories) in the cone.
- (Taylor): What do we need to tell subcommands to assume that sparse
index is supported? (Victoria) Gut feeling for the most part.
(Jrnieder): I'd prefer this to happen sooner rather than later. This
is easier for maintainability since we don't have to worry about
commands being in two possible modes of operation. We can break these
incompatible APIs by renaming them to prevent them from being misused
by new commands.
- (Victoria): So just break things that always use the full index?
Sounds ok. (Stolee) This sounds similar to the_index macros, which
we've tried to remove for the most part but we've stopped. Doing this
conversion everywhere sounds extremely difficult - we've done an audit
on this. (Jrnieder): Oh, I just meant renaming the API without
changing semantics. Intentionally break everything.
- (Victoria): We'd need to write new tests for lots of commands because
the existing tests don't actually interact with the "sparse" parts of
the index.
- (Ævar): Is this just a matter of telling `git init` to initialize a
sparse index? (Stolee): No, we need to force the tests to work on
sparse directory entries.
- (Jrnieder): This sounds like a good fit for feature.experimental
- (Victoria): Is sparse index a good Git default instead of just "for
large repos"? I'd think yes. (Jrnieder) Yes, I think any sparse
checkout user would want this. (Stolee): Literally every command that
touches the index has been converted (used for Microsoft Office
monorepo), so it's just a matter of doing this for the whole project.
- (Elijah): I would like partial filters from sparse patterns.
--filter=blob:none doesn't let you disconnect from the server.
- (Jrnieder) The DX of sparse checkout + blob:none has been pretty
good. (Elijah): but you need to stay connected to the server.
(Jrnieder) Ah, thanks for explaining, sorry for the confusion.
(Elijah) It would be great to have "sparse clone"s and have
commands that work just inside of that cone when disconnected. Make
"grep", "log", etc. respect the sparse pattern.
- (Stolee): We've thought about this, but it is very expensive on the
server side and makes bitmaps unusable. Alternatively, we could
start with blob:none and then backfill. That sounds more promising,
but that's not just a plain partial clone.
- (Jonathantanmy): FYI there's a protocol feature that already allows
clones to specify a sparse filter (referencing a blob with sparse
patterns that's present on the server), but I don't know of any
implementation that has this enabled.
- (Jrnieder): Can we delete this? (GitHub folks): We don't like it,
we invented the uploadpackfilter config to disable it :) Is this
just cleanup? (Jrnieder): Yes, and this dead end will stop being a
distraction.
- (Peff) We could already implement most of the backfilling using
current commands, but that might skip over some delta-ing
optimizations. We could have a protocol change to provide the path
as a hint to the server.
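
For readers unfamiliar with the "boundary" idea from the cone-mode
discussion above, here is a rough Python sketch of it. It is an
illustration only, not Git's implementation: the function names
(`overlaps_cone`, `in_cone`, `sparse_index_entries`) and the example
paths are hypothetical. It assumes the usual cone-mode rule that a file
is checked out if it sits at the top level, inside a cone directory, or
directly inside an ancestor of one. Any directory that does not overlap
the cone is wholly outside it, which is what lets a sparse index keep a
single entry for it.

```python
from pathlib import PurePosixPath


def overlaps_cone(directory, cone_dirs):
    """True if `directory` is a cone dir, lies inside one, or is an ancestor of one."""
    d = PurePosixPath(directory)
    for c in map(PurePosixPath, cone_dirs):
        if d == c or c in d.parents or d in c.parents:
            return True
    return False


def in_cone(path, cone_dirs):
    """True if the file at `path` would be checked out (assumed cone-mode rule)."""
    parent = PurePosixPath(path).parent
    # Top-level files are always included; otherwise the file's own
    # directory must overlap the cone.
    return parent == PurePosixPath(".") or overlaps_cone(parent, cone_dirs)


def sparse_index_entries(paths, cone_dirs):
    """Keep in-cone files; collapse everything else into one entry per
    outermost directory that lies wholly outside the cone."""
    entries = set()
    for path in paths:
        if in_cone(path, cone_dirs):
            entries.add(path)
            continue
        parts = PurePosixPath(path).parts[:-1]
        # Walk down from the root until we cross the cone boundary.
        for i in range(1, len(parts) + 1):
            prefix = PurePosixPath(*parts[:i])
            if not overlaps_cone(prefix, cone_dirs):
                entries.add(f"{prefix}/")  # trailing slash marks a directory entry
                break
    return sorted(entries)


# Hypothetical repository contents and cone definition.
paths = [
    "README",
    "docs/api.md",
    "docs/guide/intro.md",
    "src/Makefile",
    "src/lib/a.c",
    "src/lib/util/b.c",
    "src/test/t1.c",
]
for entry in sparse_index_entries(paths, ["src/lib"]):
    print(entry)
```

With the cone set to `src/lib`, the two files under `docs/` collapse
into a single `docs/` entry and `src/test/t1.c` collapses into
`src/test/`, while `src/Makefile` stays as a file entry because `src`
is an ancestor of the cone directory. A command that has to touch
something under `docs/` is the case described above where the index
would need to be expanded.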