git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Elijah Newren <newren@gmail.com>
To: Derrick Stolee via GitGitGadget <gitgitgadget@gmail.com>
Cc: "Git Mailing List" <git@vger.kernel.org>,
	"Junio C Hamano" <gitster@pobox.com>,
	"Matheus Tavares Bernardino" <matheus.bernardino@usp.br>,
	"Derrick Stolee" <stolee@gmail.com>,
	"Johannes Schindelin" <Johannes.Schindelin@gmx.de>,
	"Eric Sunshine" <sunshine@sunshineco.com>,
	"René Scharfe" <l.s.r@web.de>,
	"Derrick Stolee" <derrickstolee@github.com>
Subject: Re: [PATCH v5 0/9] Sparse index: delete ignored files outside sparse cone
Date: Tue, 7 Sep 2021 22:30:38 -0700	[thread overview]
Message-ID: <CABPp-BG+VMSL9hXHGA6HdZTNAiZM3wCQEK36-YFr17YU=T921g@mail.gmail.com> (raw)
In-Reply-To: <pull.1009.v5.git.1631065353.gitgitgadget@gmail.com>

On Tue, Sep 7, 2021 at 6:42 PM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> We launched an experimental release [1] of the sparse-index feature to our
> internal users. We immediately discovered a problem due to the isolated way
> in which we tested the sparse index: we were never building the project and
> changing our sparse-checkout definition.
>
> [1] https://github.com/microsoft/git/releases/tag/v2.32.0.vfs.0.102.exp
>
> Users who ran a build in one project and then moved to another still had
> build artifacts in their worktree that lived inside the old directories.
> Since the files are marked by the .gitignore patterns, these files were not
> removed by the 'git sparse-checkout set' command. However, they make the
> sparse-index unusable because every 'git status' command needs to expand the
> sparse-directory entries in order to see if the files are tracked or not.
> This made the first experimental release actually slower for all users
> because of this cost.
>
> The solution we shipped to these customers was to change the way our fork
> handles these ignored files. Specifically, instead of Git completely
> ignoring the files, we changed Git to understand that with cone-mode
> sparse-checkout patterns, the users is asking for entire directories to be
> removed from the worktree. The link [1] included earlier has this change.
>
> I believe that this is a reasonable expectation, though I recognize that it
> might look like breaking the expectations of how .gitignore files work.
>
> Since feedback demonstrated that this is a desired behavior, v2 includes
> this behavior for all "cone mode" repositories.
>
> I'm interested in the community's thoughts about this change, as it seems
> like one that we should make carefully and intentionally.
>
> While the rewrite of the t7519 test seems unrelated, it is required to avoid
> a test failure with this change that deletes files outside of the cone. By
> moving the test into repositories not at $TRASH_DIRECTORY, we gain more
> control over the repository structure.
>
>
> Updates in V5
> =============
>
>  * Updated the locality of a cache_entry pointer.
>
>  * Rephrased a comment.
>
>  * Removed the patch adding a config option.
>
>  * I tried, but failed, to create a scenario where the call to
>    cache_tree_update() causes a test to fail. I still think this is valuable
>    as defensive programming for the reasons mentioned in the patch, which is
>    why I didn't remove them here.
>
>
> Updates in V4
> =============
>
>  * Fixed an issue with the split index.
>
>  * The helper methods are used more consistently.
>
>  * The helper method path_in_cone_mode_sparse_checkout() is introduced.
>
>  * Commit messages are edited for clarity.
>
>  * A new config option is added to disable the behavior being added in this
>    series.
>
>  * I split the commit that involves cache_tree_update(). I have not yet
>    succeeded in creating tests to demonstrate why this is required outside
>    of needing it in the Scalar functional tests, which includes a version of
>    partial clone. I will continue to investigate recreating this scenario in
>    the Git test suite, but I wanted to send this version out to get feedback
>    on the things that have changed.
>
>
> Update in V3
> ============
>
>  * As promised [2], the helper methods are fixed to work with non-cone-mode
>    patterns. A later series will use them to their fullest potential
>    (changing git add, git rm, and git mv when interacting with sparse
>    entries).
>
> [2]
> https://lore.kernel.org/git/bac76c72-955d-1ade-4ecf-778ffc45f297@gmail.com/
>
>
> Updates in V2
> =============
>
>  * This version correctly leaves untracked files alone. If untracked files
>    are found, then the directory is left as-is, in case those ignored files
>    are important to the user's work resolving those untracked files.
>
>  * This behavior is now enabled by core.sparseCheckoutCone=true.
>
>  * To use a sparse index as an in-memory data structure even when
>    index.sparse is disabled, a new patch is included to modify the prototype
>    of convert_to_sparse() to include a flags parameter.
>
>  * A few cleanup patches that I was collecting based on feedback from the
>    experimental release and intending for my next series were necessary for
>    this implementation.
>
>  * Cleaned up the tests (no NEEDSWORK) and the remainders of a previous
>    implementation that used run_subcommand().
>
> Thanks, -Stolee
>
> Derrick Stolee (9):
>   t7519: rewrite sparse index test
>   sparse-index: silently return when not using cone-mode patterns
>   unpack-trees: fix nested sparse-dir search
>   sparse-index: silently return when cache tree fails
>   sparse-index: use WRITE_TREE_MISSING_OK
>   sparse-checkout: create helper methods
>   attr: be careful about sparse directories
>   sparse-index: add SPARSE_INDEX_MEMORY_ONLY flag
>   sparse-checkout: clear tracked sparse dirs
>
>  Documentation/git-sparse-checkout.txt | 10 +++
>  attr.c                                | 15 +++++
>  builtin/add.c                         |  7 +-
>  builtin/sparse-checkout.c             | 94 +++++++++++++++++++++++++++
>  dir.c                                 | 52 +++++++++++++++
>  dir.h                                 |  8 +++
>  read-cache.c                          |  4 +-
>  sparse-index.c                        | 76 ++++++++++++----------
>  sparse-index.h                        |  3 +-
>  t/t1091-sparse-checkout-builtin.sh    | 59 +++++++++++++++++
>  t/t7519-status-fsmonitor.sh           | 38 ++++++-----
>  unpack-trees.c                        |  8 ++-
>  12 files changed, 312 insertions(+), 62 deletions(-)
>
>
> base-commit: 80b8d6c56b8a5f5db1d5c2a0159fd808e8a7fc4f
> Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1009%2Fderrickstolee%2Fsparse-index%2Fignored-files-v5
> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1009/derrickstolee/sparse-index/ignored-files-v5
> Pull-Request: https://github.com/gitgitgadget/git/pull/1009
>
> Range-diff vs v4:
>
>   1:  c407b2cb346 =  1:  c407b2cb346 t7519: rewrite sparse index test
>   2:  8660877ba7a =  2:  8660877ba7a sparse-index: silently return when not using cone-mode patterns
>   5:  acdded0f762 !  3:  edb00d3f9aa unpack-trees: fix nested sparse-dir search
>      @@ unpack-trees.c: static int sparse_dir_matches_path(const struct cache_entry *ce,
>        {
>       - struct cache_entry *ce;
>       + const char *path;
>      -+ struct cache_entry *ce = NULL;
>         int pos = find_cache_pos(info, p->path, p->pathlen);
>         struct unpack_trees_options *o = info->data;
>
>       @@ unpack-trees.c: static struct cache_entry *find_cache_entry(struct traverse_info *info,
>      +   * paths (e.g. "subdir-").
>      +   */
>         while (pos >= 0) {
>      -          ce = o->src_index->cache[pos];
>      +-         ce = o->src_index->cache[pos];
>      ++         struct cache_entry *ce = o->src_index->cache[pos];
>
>       -         if (strncmp(ce->name, p->path, p->pathlen))
>       +         if (!skip_prefix(ce->name, info->traverse_path, &path) ||
>   3:  a669740af9a =  4:  c8620de61eb sparse-index: silently return when cache tree fails
>   4:  b379b8fc61a =  5:  37171612424 sparse-index: use WRITE_TREE_MISSING_OK
>   6:  1958751aa0e =  6:  98b4cae2297 sparse-checkout: create helper methods
>   7:  e496f3cee66 !  7:  6ec3cb2042e attr: be careful about sparse directories
>      @@ attr.c: static struct attr_stack *read_attr_from_index(struct index_state *istat
>       +  * .gitattributes file since it will not matter.
>       +  *
>       +  * In the case of a sparse index, it is critical that we don't go
>      -+  * looking for a .gitattributes file, as the index will expand.
>      ++  * looking for a .gitattributes file, as doing so would cause the
>      ++  * index to expand.
>       +  */
>       + if (!path_in_cone_mode_sparse_checkout(path, istate))
>       +         return NULL;
>   8:  cab9360b1e9 =  8:  d57f48c445c sparse-index: add SPARSE_INDEX_MEMORY_ONLY flag
>   9:  c19d93ec5d7 =  9:  91b53f20109 sparse-checkout: clear tracked sparse dirs
>  10:  8d55a6ba2fd <  -:  ----------- sparse-checkout: add config to disable deleting dirs
>

Thanks for all your diligence working on this, and your patience in
explaining all the details to me in the previous rounds.  There wasn't
much left to address after the last round, and this round addresses
the few loose ends.

Reviewed-by: Elijah Newren <newren@gmail.com>

      parent reply	other threads:[~2021-09-08  5:30 UTC|newest]

Thread overview: 92+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-07-29 17:27 [PATCH 0/2] Sparse index: delete ignored files outside sparse cone Derrick Stolee via GitGitGadget
2021-07-29 17:27 ` [PATCH 1/2] t7519: rewrite sparse index test Derrick Stolee via GitGitGadget
2021-07-29 17:27 ` [PATCH 2/2] sparse-checkout: clear tracked sparse dirs Derrick Stolee via GitGitGadget
2021-07-30 13:52   ` Elijah Newren
2021-08-02 14:34     ` Derrick Stolee
2021-08-02 16:17       ` Elijah Newren
2021-08-05  1:55         ` Derrick Stolee
2021-08-05  3:54           ` Elijah Newren
2021-07-30 13:11 ` [PATCH 0/2] Sparse index: delete ignored files outside sparse cone Elijah Newren
2021-08-10 19:50 ` [PATCH v2 0/8] " Derrick Stolee via GitGitGadget
2021-08-10 19:50   ` [PATCH v2 1/8] t7519: rewrite sparse index test Derrick Stolee via GitGitGadget
2021-08-10 19:50   ` [PATCH v2 2/8] sparse-index: silently return when not using cone-mode patterns Derrick Stolee via GitGitGadget
2021-08-10 19:50   ` [PATCH v2 3/8] sparse-index: silently return when cache tree fails Derrick Stolee via GitGitGadget
2021-08-19 18:24     ` Elijah Newren
2021-08-20 15:04       ` Derrick Stolee
2021-08-10 19:50   ` [PATCH v2 4/8] unpack-trees: fix nested sparse-dir search Derrick Stolee via GitGitGadget
2021-08-10 19:50   ` [PATCH v2 5/8] sparse-checkout: create helper methods Derrick Stolee via GitGitGadget
2021-08-12 17:29     ` Derrick Stolee
2021-08-10 19:50   ` [PATCH v2 6/8] attr: be careful about sparse directories Derrick Stolee via GitGitGadget
2021-08-10 19:50   ` [PATCH v2 7/8] sparse-index: add SPARSE_INDEX_IGNORE_CONFIG flag Derrick Stolee via GitGitGadget
2021-08-10 19:50   ` [PATCH v2 8/8] sparse-checkout: clear tracked sparse dirs Derrick Stolee via GitGitGadget
2021-08-17 13:23   ` [PATCH v3 0/8] Sparse index: delete ignored files outside sparse cone Derrick Stolee via GitGitGadget
2021-08-17 13:23     ` [PATCH v3 1/8] t7519: rewrite sparse index test Derrick Stolee via GitGitGadget
2021-08-19  7:45       ` Johannes Schindelin
2021-08-20 15:09         ` Derrick Stolee
2021-08-20 16:40           ` Eric Sunshine
2021-08-17 13:23     ` [PATCH v3 2/8] sparse-index: silently return when not using cone-mode patterns Derrick Stolee via GitGitGadget
2021-08-17 13:23     ` [PATCH v3 3/8] sparse-index: silently return when cache tree fails Derrick Stolee via GitGitGadget
2021-08-17 13:23     ` [PATCH v3 4/8] unpack-trees: fix nested sparse-dir search Derrick Stolee via GitGitGadget
2021-08-19  8:01       ` Johannes Schindelin
2021-08-20 15:18         ` Derrick Stolee
2021-08-20 19:35           ` René Scharfe
2021-08-20 20:22             ` René Scharfe
2021-08-19 18:29       ` Elijah Newren
2021-08-17 13:23     ` [PATCH v3 5/8] sparse-checkout: create helper methods Derrick Stolee via GitGitGadget
2021-08-19  8:07       ` Johannes Schindelin
2021-08-20 15:30         ` Derrick Stolee
2021-08-17 13:23     ` [PATCH v3 6/8] attr: be careful about sparse directories Derrick Stolee via GitGitGadget
2021-08-19  8:11       ` Johannes Schindelin
2021-08-20 15:36         ` Derrick Stolee
2021-08-19 20:53       ` Elijah Newren
2021-08-20 15:39         ` Derrick Stolee
2021-08-20 16:05           ` Elijah Newren
2021-08-17 13:23     ` [PATCH v3 7/8] sparse-index: add SPARSE_INDEX_IGNORE_CONFIG flag Derrick Stolee via GitGitGadget
2021-08-18 18:59       ` Derrick Stolee
2021-08-17 13:23     ` [PATCH v3 8/8] sparse-checkout: clear tracked sparse dirs Derrick Stolee via GitGitGadget
2021-08-19  8:48       ` Johannes Schindelin
2021-08-20 15:49         ` Derrick Stolee
2021-08-20 16:15           ` Elijah Newren
2021-08-20 15:56         ` Elijah Newren
2021-08-23 20:00           ` Johannes Schindelin
2021-08-17 14:09     ` [PATCH v3 0/8] Sparse index: delete ignored files outside sparse cone Elijah Newren
2021-08-24 21:51     ` [PATCH v4 00/10] " Derrick Stolee via GitGitGadget
2021-08-24 21:51       ` [PATCH v4 01/10] t7519: rewrite sparse index test Derrick Stolee via GitGitGadget
2021-08-24 21:51       ` [PATCH v4 02/10] sparse-index: silently return when not using cone-mode patterns Derrick Stolee via GitGitGadget
2021-08-24 21:51       ` [PATCH v4 03/10] sparse-index: silently return when cache tree fails Derrick Stolee via GitGitGadget
2021-08-24 21:51       ` [PATCH v4 04/10] sparse-index: use WRITE_TREE_MISSING_OK Derrick Stolee via GitGitGadget
2021-08-27 21:33         ` Elijah Newren
2021-08-30 13:19           ` Derrick Stolee
2021-08-30 20:08             ` Elijah Newren
2021-08-24 21:51       ` [PATCH v4 05/10] unpack-trees: fix nested sparse-dir search Derrick Stolee via GitGitGadget
2021-08-24 22:21         ` René Scharfe
2021-08-25  1:09           ` Derrick Stolee
2021-08-24 21:51       ` [PATCH v4 06/10] sparse-checkout: create helper methods Derrick Stolee via GitGitGadget
2021-08-24 21:51       ` [PATCH v4 07/10] attr: be careful about sparse directories Derrick Stolee via GitGitGadget
2021-08-24 21:51       ` [PATCH v4 08/10] sparse-index: add SPARSE_INDEX_MEMORY_ONLY flag Derrick Stolee via GitGitGadget
2021-08-24 21:51       ` [PATCH v4 09/10] sparse-checkout: clear tracked sparse dirs Derrick Stolee via GitGitGadget
2021-08-24 21:51       ` [PATCH v4 10/10] sparse-checkout: add config to disable deleting dirs Derrick Stolee via GitGitGadget
2021-08-27 20:58         ` Elijah Newren
2021-08-30 13:30           ` Derrick Stolee
2021-08-30 20:11             ` Elijah Newren
2021-08-27 21:56       ` [PATCH v4 00/10] Sparse index: delete ignored files outside sparse cone Elijah Newren
2021-08-27 22:01         ` Elijah Newren
2021-08-30 13:34           ` Derrick Stolee
2021-08-30 20:14             ` Elijah Newren
2021-08-30 13:54         ` Derrick Stolee
2021-08-30 20:23           ` Elijah Newren
2021-09-08  1:42       ` [PATCH v5 0/9] " Derrick Stolee via GitGitGadget
2021-09-08  1:42         ` [PATCH v5 1/9] t7519: rewrite sparse index test Derrick Stolee via GitGitGadget
2021-09-08  1:42         ` [PATCH v5 2/9] sparse-index: silently return when not using cone-mode patterns Derrick Stolee via GitGitGadget
2021-09-08  1:42         ` [PATCH v5 3/9] unpack-trees: fix nested sparse-dir search Derrick Stolee via GitGitGadget
2021-09-08  1:42         ` [PATCH v5 4/9] sparse-index: silently return when cache tree fails Derrick Stolee via GitGitGadget
2021-09-08  1:42         ` [PATCH v5 5/9] sparse-index: use WRITE_TREE_MISSING_OK Derrick Stolee via GitGitGadget
2021-09-08  1:42         ` [PATCH v5 6/9] sparse-checkout: create helper methods Derrick Stolee via GitGitGadget
2021-09-08  1:42         ` [PATCH v5 7/9] attr: be careful about sparse directories Derrick Stolee via GitGitGadget
2021-09-08  1:42         ` [PATCH v5 8/9] sparse-index: add SPARSE_INDEX_MEMORY_ONLY flag Derrick Stolee via GitGitGadget
2021-09-08  1:42         ` [PATCH v5 9/9] sparse-checkout: clear tracked sparse dirs Derrick Stolee via GitGitGadget
2021-09-08  5:21         ` [PATCH v5 0/9] Sparse index: delete ignored files outside sparse cone Junio C Hamano
2021-09-08  6:56           ` Junio C Hamano
2021-09-08 11:39             ` Derrick Stolee
2021-09-08 16:11               ` Junio C Hamano
2021-09-08  5:30         ` Elijah Newren [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CABPp-BG+VMSL9hXHGA6HdZTNAiZM3wCQEK36-YFr17YU=T921g@mail.gmail.com' \
    --to=newren@gmail.com \
    --cc=Johannes.Schindelin@gmx.de \
    --cc=derrickstolee@github.com \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=gitster@pobox.com \
    --cc=l.s.r@web.de \
    --cc=matheus.bernardino@usp.br \
    --cc=stolee@gmail.com \
    --cc=sunshine@sunshineco.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).