git@vger.kernel.org mailing list mirror (one of many)
From: Taylor Blau <me@ttaylorr.com>
To: git@vger.kernel.org
Subject: [TOPIC 6/8] State of sparsity work
Date: Thu, 29 Sep 2022 15:21:30 -0400
Message-ID: <YzXwOsaoCdBhHsX1@nand.local>
In-Reply-To: <YzXvMRc6X60kjVeY@nand.local>

# State of sparsity developments and future plans (Victoria)

- (Victoria) Integrating commands with sparse index, making them
	compatible with sparse indexes, not un-sparsifying the index before
	executing themselves
- Have worked on some more recently
- GSoC student has worked on a handful as well
- Near-term future is about finding commands that need to touch the
	index but don't support sparse index, and then making them compatible
- In some cases, that is going to require expanding the index to be
	non-sparse, especially if you are touching something outside of the
	sparsity cone
	 - That is somewhat straightforward
- More interesting questions: what is the future of sparsity? Recently,
	Elijah pushed a change to make sparse-checkout's cone mode the default
	(nice, since it is required by sparse index)
- As we move forward, what should we change the defaults of?
	 - Sparse index for sparse checkouts in cone mode?
- Scalar as a testing ground for larger features, including sparse index
	 - Could make sparse index the default in Scalar for cone-mode sparse
		 checkouts, and then see how it goes
	 - Or, could just go for it sooner (after we have integrated sparse
		 index with enough commands)
- A handful of internal, logistical things that would have to happen for
	sparse index to become the default. Currently, commands are assumed to
	not work with the sparse index.
- Question for everybody: what is a good balance between pushing sparse
	index now, and holding off on changing the default until more users
	have been introduced to it?
- (Stolee): sparse checkout and submodules became a difficulty when
	mentoring their GSoC student.
- (JTan): possible to decouple sparse index and cone-mode sparse
	checkouts from each other? This would be easy to test - turn it on,
	all of the test suite automatically uses it. (Jrnieder): This sounds ok
	for the filesystem, but I don't know how this would work for this
	"VFS-backed Git" idea on the spreadsheet. (other things…)
- (Stolee): We need cone mode today because it's the only way to
	definitively say that we've reached the boundary. But we can also
	expand the idea of "cone" to allow more paths (files instead of
	directories) in the cone.
- (Taylor): What do we need to tell subcommands to assume that sparse
	index is supported? (Victoria) Gut feeling for the most part.
	(Jrnieder): I'd prefer this to happen sooner rather than later. This
	is easier for maintainability since we don't have to worry about
	commands being in two possible modes of operation. We can break these
	incompatible APIs by renaming them to prevent them from being misused
	by new commands.
- (Victoria): So just break things that always use the full index?
	Sounds ok.  (Stolee) This sounds similar to the_index macros, which
	we've tried to remove for the most part but we've stopped. Doing this
	conversion everywhere sounds extremely difficult - we've done an audit
	on this. (Jrnieder): Oh, I just meant renaming the API without
	changing semantics. Intentionally break everything.
- (Victoria): We'd need to write new tests for lots of commands because
	the existing tests don't actually interact with the "sparse" parts of
	the index.
- (Ævar): Is this just a matter of telling `git init` to initialize a
	sparse index? (Stolee): No, we need to force the tests to work on
	sparse directory entries.
- (Jrnieder): This sounds like a good fit for feature.experimental
- (Victoria): Is sparse index a good git default instead of just "for
	large repos"? I'd think yes. (Jrnieder): Yes, I think any sparse
	checkout user would want this. (Stolee): Literally every command that
	touches the index has been converted (used for Microsoft Office
	monorepo), so it's just a matter of doing this for the whole project.
- (Elijah): I would like partial filters from sparse patterns.
	--filter=blob:none doesn't let you disconnect from the server.
	 - (Jrnieder) The DX of sparse checkout + blob:none has been pretty
		 good.  (Elijah): but you need to stay connected to the server.
		 (Jrnieder) Ah, thanks for explaining, sorry for the confusion
		 (Elijah) It would be great to have "sparse clone"s and have
		 commands that work just inside of that cone when disconnected. Make
		 "grep", "log", etc respect the sparse pattern
	 - (Stolee): We've thought about this, but it is very expensive on the
		 server side and makes bitmaps unusable. Alternatively, we could
		 start with blob:none and then backfill. That sounds more promising,
		 but that's not just a plain partial clone.
	 - (Jonathantanmy): FYI there's a protocol feature that already allows
		 clones to specify a sparse filter (referencing a blob with sparse
		 patterns that's present on the server), but I don't know of any
		 implementation that has this enabled.
	 - (Jrnieder): Can we delete this? (GitHub folks): We don't like it,
		 we invented the uploadpackfilter config to disable it :) Is this
		 just cleanup?  (Jrnieder): Yes, and this dead end will stop being a
		 distraction.
	 - (Peff) We could already implement most of the backfilling using
		 current commands, but that might skip over some delta-ing
		 optimizations. We could have a protocol change to provide the path
		 as a hint to the server.
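
Illustration of the cone "boundary" point above (not from the
discussion itself): a toy Python model, assuming the documented
cone-mode rules, where files in the root are always included, a cone
directory is included recursively, and only the immediate files of
each ancestor of a cone directory are included. The first directory
on a path that is neither a cone directory nor an ancestor of one is
the boundary, and everything beneath it is represented by a single
sparse directory entry. The function name and the sample paths are
hypothetical, and Git's real index uses a different entry ordering
and data layout.

```python
from typing import Iterable, List, Set


def collapse_to_sparse_entries(paths: Iterable[str],
                               cone_dirs: Iterable[str]) -> List[str]:
    """Toy model: map full-index paths to sparse-index-like entries."""
    cone: Set[str] = {d.strip("/") for d in cone_dirs}

    # Ancestors of cone directories: only their immediate files are kept.
    parents: Set[str] = set()
    for d in cone:
        parts = d.split("/")
        for i in range(1, len(parts)):
            parents.add("/".join(parts[:i]))

    entries: List[str] = []
    seen: Set[str] = set()
    for path in sorted(paths):
        parts = path.split("/")
        entry = path  # root-level files fall through unchanged
        for i in range(1, len(parts)):
            prefix = "/".join(parts[:i])
            if prefix in cone:
                break  # inside the cone: keep the file entry as-is
            if prefix not in parents:
                entry = prefix + "/"  # boundary reached: collapse here
                break
        if entry not in seen:
            seen.add(entry)
            entries.append(entry)
    return entries


sample_paths = [
    "README",
    "docs/a.md",
    "docs/img/x.png",
    "src/Makefile",
    "src/lib/a.c",
    "src/lib/deep/b.c",
    "src/tools/t.c",
]
for entry in collapse_to_sparse_entries(sample_paths, ["src/lib"]):
    print(entry)
```

With "src/lib" as the only cone directory, "docs/" and "src/tools/"
each collapse to one entry. A command that needs a path beneath one of
those entries is the case where the index has to be expanded.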

