From: Elijah Newren <newren@gmail.com>
To: Jonathan Nieder <jrnieder@gmail.com>
Cc: "Elijah Newren via GitGitGadget" <gitgitgadget@gmail.com>,
"Git Mailing List" <git@vger.kernel.org>,
"Jonathan Tan" <jonathantanmy@google.com>,
"Jose Lopes" <jabolopes@google.com>,
"Jeff Hostetler" <jeffhostetler@github.com>,
"Derrick Stolee" <derrickstolee@github.com>,
"Johannes Schindelin" <Johannes.Schindelin@gmx.de>,
"Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
Subject: Re: [PATCH v2] repo_read_index: add config to expect files outside sparse patterns
Date: Tue, 22 Feb 2022 19:10:46 -0800 [thread overview]
Message-ID: <CABPp-BHv4VVyPv1v-pgPXf_u=6EpRhcn7Cm551o2HEbdEthAaw@mail.gmail.com> (raw)
In-Reply-To: <YhWbWOd6PeF1RZw1@google.com>
On Tue, Feb 22, 2022 at 6:26 PM Jonathan Nieder <jrnieder@gmail.com> wrote:
>
> From: Elijah Newren <newren@gmail.com>
>
> Typically with sparse checkouts, we expect files outside the sparsity
> patterns to be marked as SKIP_WORKTREE and be missing from the working
> tree. In edge cases, this can be violated and cause confusion, so in
> a sparse checkout, since 11d46a399d ("repo_read_index: clear
> SKIP_WORKTREE bit from files present in worktree", 2022-01-06), Git
> automatically clears the SKIP_WORKTREE bit at read time for entries
> corresponding to files that are present in the working tree.
>
> However, there is a more atypical situation where this situation would
> be expected. A Git-aware virtual file system[1] takes advantage of
> its position as a file system driver to expose all files in the
> working tree, fetch them on demand using partial clone on access, and
> tell Git to pay attention to them on demand by updating the sparse
> checkout pattern on writes. This means that commands like "git
> status" only has to examine files that have potentially been modified,
> whereas commands like "ls" are able to show the entire codebase
> without requiring manual updates to the sparse checkout pattern.
Should that be s/commands/a command/ or else s/only has/only have/?
> Thus since 11d46a399d, Git with such Git-aware virtual file systems
> unsets the SKIP_WORKTREE bit for all files and commands like "git
> status" have to fetch and examine them all.
>
> Introduce a configuration setting sparse.expectFilesOutsideOfPatterns
> to allow limiting the tracked set of files to a small set once again.
> A Git-aware virtual file system or other application that wants to
> maintain files outside of the sparse checkout can set this in a
> repository to instruct Git not to check for the presence of
> SKIP_WORKTREE files. The setting defaults to false, so most users of
> sparse checkout will still get the benefit of an automatically
> updating index to recover from interrupted updates that forget to
Please don't presume that these only come from interrupted updates.
As per the referenced 11d46a399d that started all this:
"""
There are various ways for users to get files to be present in the
working copy despite having the SKIP_WORKTREE bit set for that file in
the index. This may come from:
* various git commands not really supporting the SKIP_WORKTREE bit[1,2]
* users grabbing files from elsewhere and writing them to the worktree
(perhaps even cached in their editor)
* users attempting to "abort" a sparse-checkout operation with a
not-so-early Ctrl+C (updating $GIT_DIR/info/sparse-checkout and the
working tree is not atomic)[3].
"""
> delete some files or unset SKIP_WORKTREE for them.
The only problem 11d46a399d corrects is having the SKIP_WORKTREE being
*set* despite the file being present. So the "recover from...updates
that...unset SKIP_WORKTREE" doesn't make any sense.
> [1] such as the vfsd described in
> https://lore.kernel.org/git/20220207190320.2960362-1-jonathantanmy@google.com/
>
> [jn: fleshed out commit message and documentation, added missing
> include to config.txt, moved to a separate config callback]
>
> Signed-off-by: Elijah Newren <newren@gmail.com>
> Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
> ---
> Some minor updates, but this is basically the same as the patch you
> sent. Thoughts?
...and it looks even more like the v2 I was about to send, so it seems
like we're pretty much on the same page. :-)
Overall, it looks good, and your version has some things that are
nicer than mine, but I did have a couple small notes on the commit
message above and a similar one on the config description below.
> Thanks,
> Jonathan
>
> Documentation/config.txt | 2 ++
> Documentation/config/sparse.txt | 24 ++++++++++++++++++++++++
> cache.h | 1 +
> config.c | 14 ++++++++++++++
> environment.c | 1 +
> sparse-index.c | 3 ++-
> t/t1090-sparse-checkout-scope.sh | 19 +++++++++++++++++++
> 7 files changed, 63 insertions(+), 1 deletion(-)
> create mode 100644 Documentation/config/sparse.txt
>
> diff --git a/Documentation/config.txt b/Documentation/config.txt
> index b168f02dc3d..8628ae2634d 100644
> --- a/Documentation/config.txt
> +++ b/Documentation/config.txt
> @@ -468,6 +468,8 @@ include::config/sequencer.txt[]
>
> include::config/showbranch.txt[]
>
> +include::config/sparse.txt[]
> +
> include::config/splitindex.txt[]
>
> include::config/ssh.txt[]
> diff --git a/Documentation/config/sparse.txt b/Documentation/config/sparse.txt
> new file mode 100644
> index 00000000000..c790c728276
> --- /dev/null
> +++ b/Documentation/config/sparse.txt
> @@ -0,0 +1,24 @@
> +sparse.expectFilesOutsideOfPatterns::
> + Typically with sparse checkouts, files not matching any
> + sparsity patterns are marked as such in the index file and
> + missing from the working tree. Accordingly, Git will
> + ordinarily check whether files that the index indicates are
> + outside of the sparse area are present in the working tree and
> + mark them as present in the index if so. This option can be
> + used to tell Git that such present-but-unmatching files are
> + expected and to stop checking for them.
> ++
> +The default is `false`. Leaving this set to `false` is recommended in
> +most situations because it allows Git to recover from an interrupted
> +operation that updated the working tree without updating the index or
> +vice versa.
Again, please don't claim this is only for recovering from interrupted
operations; there are other cases -- Git commands that write the
working tree but not the index (checkout-index, git apply), and users
mucking around with files, for example.
> ++
> +A Git-based virtual file system (VFS) can turn the usual expectation
> +on its head: files are present in the working copy but do not take
> +up much disk space because their contents are not downloaded until
> +they are accessed. With such a virtual file system layer, most files
> +do not match the sparsity patterns at first, and the VFS layer
> +updates the sparsity patterns to add more files whenever files are
> +written. Setting this to `true` supports such a setup where files are
> +expected to be present outside the sparse area and a separate, robust
> +mechanism is responsible for keeping the sparsity patterns up to date.
> diff --git a/cache.h b/cache.h
> index 281f00ab1b1..b6b8e83ae35 100644
> --- a/cache.h
> +++ b/cache.h
> @@ -1003,6 +1003,7 @@ extern const char *core_fsmonitor;
>
> extern int core_apply_sparse_checkout;
> extern int core_sparse_checkout_cone;
> +extern int sparse_expect_files_outside_of_patterns;
>
> /*
> * Returns the boolean value of $GIT_OPTIONAL_LOCKS (or the default value).
> diff --git a/config.c b/config.c
> index 2bffa8d4a01..9b9ad1500aa 100644
> --- a/config.c
> +++ b/config.c
> @@ -1544,6 +1544,17 @@ static int git_default_core_config(const char *var, const char *value, void *cb)
> return platform_core_config(var, value, cb);
> }
>
> +static int git_default_sparse_config(const char *var, const char *value)
> +{
> + if (!strcmp(var, "sparse.expectfilesoutsideofpatterns")) {
> + sparse_expect_files_outside_of_patterns = git_config_bool(var, value);
> + return 0;
> + }
> +
> + /* Add other config variables here and to Documentation/config/sparse.txt. */
> + return 0;
> +}
> +
> static int git_default_i18n_config(const char *var, const char *value)
> {
> if (!strcmp(var, "i18n.commitencoding"))
> @@ -1675,6 +1686,9 @@ int git_default_config(const char *var, const char *value, void *cb)
> return 0;
> }
>
> + if (starts_with(var, "sparse."))
> + return git_default_sparse_config(var, value);
> +
> /* Add other config variables here and to Documentation/config.txt. */
> return 0;
> }
> diff --git a/environment.c b/environment.c
> index fd0501e77a5..fb55bf61290 100644
> --- a/environment.c
> +++ b/environment.c
> @@ -70,6 +70,7 @@ char *notes_ref_name;
> int grafts_replace_parents = 1;
> int core_apply_sparse_checkout;
> int core_sparse_checkout_cone;
> +int sparse_expect_files_outside_of_patterns;
> int merge_log_config = -1;
> int precomposed_unicode = -1; /* see probe_utf8_pathname_composition() */
> unsigned long pack_size_limit_cfg;
> diff --git a/sparse-index.c b/sparse-index.c
> index eed170cd8f7..daeb5112a18 100644
> --- a/sparse-index.c
> +++ b/sparse-index.c
> @@ -396,7 +396,8 @@ void clear_skip_worktree_from_present_files(struct index_state *istate)
>
> int i;
>
> - if (!core_apply_sparse_checkout)
> + if (!core_apply_sparse_checkout ||
> + sparse_expect_files_outside_of_patterns)
> return;
>
> restart:
> diff --git a/t/t1090-sparse-checkout-scope.sh b/t/t1090-sparse-checkout-scope.sh
> index 3deb4901874..d1833c0f31b 100755
> --- a/t/t1090-sparse-checkout-scope.sh
> +++ b/t/t1090-sparse-checkout-scope.sh
> @@ -52,6 +52,25 @@ test_expect_success 'return to full checkout of main' '
> test "$(cat b)" = "modified"
> '
>
> +test_expect_success 'skip-worktree on files outside sparse patterns' '
> + git sparse-checkout disable &&
> + git sparse-checkout set --no-cone "a*" &&
> + git checkout-index --all --ignore-skip-worktree-bits &&
> +
> + git ls-files -t >output &&
> + ! grep ^S output >actual &&
> + test_must_be_empty actual &&
> +
> + test_config sparse.expectFilesOutsideOfPatterns true &&
> + cat <<-\EOF >expect &&
> + S b
> + S c
> + EOF
> + git ls-files -t >output &&
> + grep ^S output >actual &&
> + test_cmp expect actual
> +'
> +
> test_expect_success 'in partial clone, sparse checkout only fetches needed blobs' '
> test_create_repo server &&
> git clone "file://$(pwd)/server" client &&
> --
> 2.35.1.574.g5d30c73bfb
next prev parent reply other threads:[~2022-02-23 3:11 UTC|newest]
Thread overview: 29+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-02-20 5:05 [PATCH] Provide config option to expect files outside sparse patterns Elijah Newren via GitGitGadget
2022-02-20 19:41 ` Derrick Stolee
2022-02-20 20:16 ` Junio C Hamano
2022-02-22 2:17 ` Elijah Newren
2022-02-22 12:28 ` Johannes Schindelin
2022-02-22 13:43 ` Derrick Stolee
2022-02-21 20:34 ` Johannes Schindelin
2022-02-21 22:53 ` Ævar Arnfjörð Bjarmason
2022-02-22 2:25 ` Elijah Newren
2022-02-22 12:13 ` Johannes Schindelin
2022-02-22 12:57 ` Ævar Arnfjörð Bjarmason
2022-02-22 23:13 ` Jonathan Nieder
2022-02-25 16:39 ` Ævar Arnfjörð Bjarmason
2022-02-22 2:23 ` Elijah Newren
2022-02-22 10:05 ` Ævar Arnfjörð Bjarmason
2022-02-22 12:11 ` Johannes Schindelin
2022-02-22 13:47 ` Derrick Stolee
2022-02-23 2:26 ` [PATCH v2] repo_read_index: add config " Jonathan Nieder
2022-02-23 3:10 ` Elijah Newren [this message]
2022-02-24 5:22 ` [PATCH v3] " Elijah Newren
2022-02-24 18:24 ` Junio C Hamano
2022-02-26 5:58 ` Elijah Newren
2022-02-25 16:33 ` Jonathan Nieder
2022-02-26 6:01 ` Elijah Newren
2022-02-26 6:12 ` [PATCH v4] " Elijah Newren
2022-03-02 4:33 ` [PATCH v5] " Elijah Newren
2022-03-02 7:36 ` Junio C Hamano
2022-03-02 8:01 ` Elijah Newren
2022-03-02 13:37 ` [PATCH v4] " Derrick Stolee
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: http://vger.kernel.org/majordomo-info.html
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to='CABPp-BHv4VVyPv1v-pgPXf_u=6EpRhcn7Cm551o2HEbdEthAaw@mail.gmail.com' \
--to=newren@gmail.com \
--cc=Johannes.Schindelin@gmx.de \
--cc=avarab@gmail.com \
--cc=derrickstolee@github.com \
--cc=git@vger.kernel.org \
--cc=gitgitgadget@gmail.com \
--cc=jabolopes@google.com \
--cc=jeffhostetler@github.com \
--cc=jonathantanmy@google.com \
--cc=jrnieder@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://80x24.org/mirrors/git.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).