git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com>
To: git@vger.kernel.org
Cc: newren@gmaill.com, peff@peff.net, me@ttaylorr.com,
	jrnieder@gmail.com, Derrick Stolee <dstolee@microsoft.com>
Subject: [PATCH 00/10] [RFC] In-tree sparse-checkout definitions
Date: Thu, 07 May 2020 13:17:32 +0000	[thread overview]
Message-ID: <pull.627.git.1588857462.gitgitgadget@gmail.com> (raw)

This is based on ds/sparse-allow-empty-working-tree.

BACKGROUND
==========

After discussing the future of sparse-checkout at the Git Contributor
Summit, Minh Tai caught me during the break to introduce a very interesting
idea: in-tree sparse-checkout definitions.

Here is the context: I was talking about how the Office team is building a
tool to update the sparse-checkout definition dynamically as their build
definitions change. They have a very compartmentalized build system, and
certain "projects" depend on each other. To ensure that builds do not break
as dependencies change, the plan was to have this tool run as a hook when
HEAD changes. This is a bit expensive to do, and we were discussing how much
we need this to be automated versus a part of the build process.

This kind of idea would need to be replicated for every team that wants to
use sparse-checkout in this way. That doesn't bode well for wide adoption of
the feature.

IN-TREE SPARSE-CHECKOUT DEFINITIONS
===================================

Minh's idea was simple: have sparse-checkout files in the working directory
and use config to point to them. As these in-tree files update, we can
automatically update the sparse-checkout definition accordingly. Now, the
only thing to do would be to ensure that the sparse-checkout files are
updated when someone updates the build definitions. This requires some extra
build validation, but would not require special tools built on every client.

As soon as I sat down to implement this idea, I discovered some
complications to that basic idea. Each of these sparse-checkout definitions
should fit a "role" on the team. To continue with the Office analogy,
suppose we had a definition for developers working on each of Word,
PowerPoint, and Excel. What happens when someone wants to port a feature
from Excel to Word? That user would want to take the union of the two
sparse-checkout definitions. But what does that mean when we are working
with arbitrary sparse-checkout files?

This leads to the restriction that I built into this feature: we only care
about "cone mode" definitions. These rely on directory-based prefix matches
that are very fast. With this restriction, it is simple to understand what
the "union" operation should do: take all directories that would be included
by either.

Once we restrict to cone mode, there is no reason to continue storing files
using the sparse-checkout file format. The list of patterns is larger than
the input to "git sparse-checkout set" or output from "git sparse-checkout
list" in cone mode, and more prone to user error. Thus, let's be simpler and
use a list of directories to specify the sparse-checkout definition.

This leads to the second issue that complicates things. If a build
dependency changes to a core library, but there are N "roles" that depend on
that library, then the person changing that library also needs to change and
validate N sparse-checkout definitions! This does not scale well. So, let's
take a page from build dependencies and allow these in-tree sparse-checkout
definitions to depend on each other!

The end result is a mechanism that should be flexible enough to future needs
but simple enough to build using existing Git features. The format re-uses
the config file format. This provides good re-use of code as well as being
something easy to extend in the future. In Patch 4, I tried to describe
several alternatives and why this is the best of those options. I'm open to
suggestions here.

OUTLINE
=======

 * Patch 1 is a bugfix that should work on its own. I caught it in tests
   after Patch 5.
   
   
 * Patches 2-7 implement the feature.
   
   
 * Patches 8-9 make the Git build system a bit more robust to missing
   directories.
   
   
 * Patch 10 adds a .sparse directory to Git as well as two in-tree
   sparse-checkout definitions. One is for a bare-bones Linux developer and
   the other adds the compat code for Windows on top.
   
   

As mentioned in patch 10, this allows the following workflow for Git
contributors that want to "eat our own dogfood" with the partial clone and
sparse-checkout features:

 $ git clone --sparse --filter=blob:none https://github.com/git/git$ cd git
$ git sparse-checkout set --in-tree .sparse/base.deps

Then, we have a smaller working directory but can still build and test the
code.

FUTURE DIRECTIONS
=================

There are definitely ways to make this feature more seamless, or more useful
to us.

 * It would be nice to extend git clone to have a string value to 
   --sparse=<file> and possibly have it multi-valued. This would initialize
   the in-tree sparse-checkout definition immediately upon cloning.
   
   
 * I think there are some ways we could reorganize the Git codebase to make
   our "most basic" sparse-checkout file have even fewer files. Mostly, it
   would require modifying the build to ask "do these files exist on-disk?
   if not, then disable this build option." This would also require adding
   test dependencies that also disable some tests when those do not exist.
   
   
 * Currently, if we have an in-tree sparse-checkout definition, then we
   can't add more directories to it manually. If someone runs git
   sparse-checkout set --in-tree <file> followed by git sparse-checkout add
   <dir>, then will disappear as it is overridden by the in-tree definition.
   The only way I can think to get around this would be to add these
   directories through Git config, since we were using the sparse-checkout
   file as the storage of "what directories do I care about?" but it is no
   longer the source of truth in this situation. I'm open to ideas here, and
   it would be nice to have this interaction hardened before moving the
   series out of RFC status.
   
   

Thanks, -Stolee

Derrick Stolee (10):
  unpack-trees: avoid array out-of-bounds error
  sparse-checkout: move code from builtin
  sparse-checkout: move code from unpack-trees.c
  sparse-checkout: allow in-tree definitions
  sparse-checkout: automatically update in-tree definition
  sparse-checkout: use oidset to prevent repeat blobs
  sparse-checkout: define in-tree dependencies
  Makefile: skip git-gui if dir is missing
  Makefile: disable GETTEXT when 'po' is missing
  .sparse: add in-tree sparse-checkout for Git

 .sparse/base.deps                     |  19 ++
 .sparse/windows.deps                  |   3 +
 Documentation/config.txt              |   2 +
 Documentation/config/sparse.txt       |  15 +
 Documentation/git-sparse-checkout.txt |  70 ++++-
 Makefile                              |  19 +-
 builtin/commit.c                      |   4 +-
 builtin/read-tree.c                   |   4 +
 builtin/sparse-checkout.c             | 268 +++--------------
 read-cache.c                          |   8 +-
 sparse-checkout.c                     | 418 ++++++++++++++++++++++++++
 sparse-checkout.h                     |  30 ++
 t/lib-gettext.sh                      |   1 -
 t/t0200-gettext-basic.sh              |   6 +-
 t/t1091-sparse-checkout-builtin.sh    | 143 +++++++++
 t/test-lib.sh                         |   7 +-
 unpack-trees.c                        |  10 +-
 17 files changed, 785 insertions(+), 242 deletions(-)
 create mode 100644 .sparse/base.deps
 create mode 100644 .sparse/windows.deps
 create mode 100644 Documentation/config/sparse.txt
 create mode 100644 sparse-checkout.c
 create mode 100644 sparse-checkout.h


base-commit: ace224ac5fb120e9cae894e31713ab60e91f141f
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-627%2Fderrickstolee%2Fsparse-checkout-in-tree-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-627/derrickstolee/sparse-checkout-in-tree-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/627
-- 
gitgitgadget

             reply	other threads:[~2020-05-07 13:18 UTC|newest]

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-05-07 13:17 Derrick Stolee via GitGitGadget [this message]
2020-05-07 13:17 ` [PATCH 01/10] unpack-trees: avoid array out-of-bounds error Derrick Stolee via GitGitGadget
2020-05-07 22:27   ` Junio C Hamano
2020-05-08 12:19     ` Derrick Stolee
2020-05-08 15:09       ` Junio C Hamano
2020-05-20 16:32     ` Elijah Newren
2020-05-07 13:17 ` [PATCH 02/10] sparse-checkout: move code from builtin Derrick Stolee via GitGitGadget
2020-05-07 13:17 ` [PATCH 03/10] sparse-checkout: move code from unpack-trees.c Derrick Stolee via GitGitGadget
2020-05-07 13:17 ` [PATCH 04/10] sparse-checkout: allow in-tree definitions Derrick Stolee via GitGitGadget
2020-05-07 22:58   ` Junio C Hamano
2020-05-08 15:40     ` Derrick Stolee
2020-05-20 17:52       ` Elijah Newren
2020-06-17 23:07         ` Elijah Newren
2020-06-18  8:18           ` Son Luong Ngoc
2020-05-07 13:17 ` [PATCH 05/10] sparse-checkout: automatically update in-tree definition Derrick Stolee via GitGitGadget
2020-05-20 16:28   ` Elijah Newren
2020-05-07 13:17 ` [PATCH 06/10] sparse-checkout: use oidset to prevent repeat blobs Derrick Stolee via GitGitGadget
2020-05-20 16:40   ` Elijah Newren
2020-05-21  3:49     ` Elijah Newren
2020-05-21 17:54       ` Derrick Stolee
2020-05-07 13:17 ` [PATCH 07/10] sparse-checkout: define in-tree dependencies Derrick Stolee via GitGitGadget
2020-05-20 18:10   ` Elijah Newren
2020-05-30 17:26     ` Elijah Newren
2020-05-07 13:17 ` [PATCH 08/10] Makefile: skip git-gui if dir is missing Derrick Stolee via GitGitGadget
2020-05-07 13:17 ` [PATCH 09/10] Makefile: disable GETTEXT when 'po' " Derrick Stolee via GitGitGadget
2020-05-07 13:17 ` [PATCH 10/10] .sparse: add in-tree sparse-checkout for Git Derrick Stolee via GitGitGadget
2020-05-20 17:38 ` [PATCH 00/10] [RFC] In-tree sparse-checkout definitions Elijah Newren
2020-06-17 23:14 ` Elijah Newren
2020-06-18  1:42   ` Derrick Stolee
2020-06-18  1:59     ` Elijah Newren
2020-06-18  3:01       ` Derrick Stolee
2020-06-18  5:03         ` Elijah Newren

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=pull.627.git.1588857462.gitgitgadget@gmail.com \
    --to=gitgitgadget@gmail.com \
    --cc=dstolee@microsoft.com \
    --cc=git@vger.kernel.org \
    --cc=jrnieder@gmail.com \
    --cc=me@ttaylorr.com \
    --cc=newren@gmaill.com \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).