git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Elijah Newren <newren@gmail.com>
To: Jonathan Nieder <jrnieder@gmail.com>
Cc: "Elijah Newren via GitGitGadget" <gitgitgadget@gmail.com>,
	"Git Mailing List" <git@vger.kernel.org>,
	"Jonathan Tan" <jonathantanmy@google.com>,
	"Jose Lopes" <jabolopes@google.com>,
	"Jeff Hostetler" <jeffhostetler@github.com>,
	"Derrick Stolee" <derrickstolee@github.com>,
	"Johannes Schindelin" <Johannes.Schindelin@gmx.de>,
	"Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
Subject: Re: [PATCH v2] repo_read_index: add config to expect files outside sparse patterns
Date: Tue, 22 Feb 2022 19:10:46 -0800	[thread overview]
Message-ID: <CABPp-BHv4VVyPv1v-pgPXf_u=6EpRhcn7Cm551o2HEbdEthAaw@mail.gmail.com> (raw)
In-Reply-To: <YhWbWOd6PeF1RZw1@google.com>

On Tue, Feb 22, 2022 at 6:26 PM Jonathan Nieder <jrnieder@gmail.com> wrote:
>
> From: Elijah Newren <newren@gmail.com>
>
> Typically with sparse checkouts, we expect files outside the sparsity
> patterns to be marked as SKIP_WORKTREE and be missing from the working
> tree.  In edge cases, this can be violated and cause confusion, so in
> a sparse checkout, since 11d46a399d ("repo_read_index: clear
> SKIP_WORKTREE bit from files present in worktree", 2022-01-06), Git
> automatically clears the SKIP_WORKTREE bit at read time for entries
> corresponding to files that are present in the working tree.
>
> However, there is a more atypical situation where this situation would
> be expected.  A Git-aware virtual file system[1] takes advantage of
> its position as a file system driver to expose all files in the
> working tree, fetch them on demand using partial clone on access, and
> tell Git to pay attention to them on demand by updating the sparse
> checkout pattern on writes.  This means that commands like "git
> status" only has to examine files that have potentially been modified,
> whereas commands like "ls" are able to show the entire codebase
> without requiring manual updates to the sparse checkout pattern.

Should that be s/commands/a command/ or else s/only has/only have/?

> Thus since 11d46a399d, Git with such Git-aware virtual file systems
> unsets the SKIP_WORKTREE bit for all files and commands like "git
> status" have to fetch and examine them all.
>
> Introduce a configuration setting sparse.expectFilesOutsideOfPatterns
> to allow limiting the tracked set of files to a small set once again.
> A Git-aware virtual file system or other application that wants to
> maintain files outside of the sparse checkout can set this in a
> repository to instruct Git not to check for the presence of
> SKIP_WORKTREE files.  The setting defaults to false, so most users of
> sparse checkout will still get the benefit of an automatically
> updating index to recover from interrupted updates that forget to

Please don't presume that these only come from interrupted updates.
As per the referenced 11d46a399d that started all this:

"""
    There are various ways for users to get files to be present in the
    working copy despite having the SKIP_WORKTREE bit set for that file in
    the index.  This may come from:
      * various git commands not really supporting the SKIP_WORKTREE bit[1,2]
      * users grabbing files from elsewhere and writing them to the worktree
        (perhaps even cached in their editor)
      * users attempting to "abort" a sparse-checkout operation with a
        not-so-early Ctrl+C (updating $GIT_DIR/info/sparse-checkout and the
        working tree is not atomic)[3].
"""

> delete some files or unset SKIP_WORKTREE for them.

The only problem 11d46a399d corrects is having the SKIP_WORKTREE being
*set* despite the file being present.  So the "recover from...updates
that...unset SKIP_WORKTREE" doesn't make any sense.

> [1] such as the vfsd described in
> https://lore.kernel.org/git/20220207190320.2960362-1-jonathantanmy@google.com/
>
> [jn: fleshed out commit message and documentation, added missing
>  include to config.txt, moved to a separate config callback]
>
> Signed-off-by: Elijah Newren <newren@gmail.com>
> Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
> ---
> Some minor updates, but this is basically the same as the patch you
> sent.  Thoughts?

...and it looks even more like the v2 I was about to send, so it seems
like we're pretty much on the same page.  :-)

Overall, it looks good, and your version has some things that are
nicer than mine, but I did have a couple small notes on the commit
message above and a similar one on the config description below.

> Thanks,
> Jonathan
>
>  Documentation/config.txt         |  2 ++
>  Documentation/config/sparse.txt  | 24 ++++++++++++++++++++++++
>  cache.h                          |  1 +
>  config.c                         | 14 ++++++++++++++
>  environment.c                    |  1 +
>  sparse-index.c                   |  3 ++-
>  t/t1090-sparse-checkout-scope.sh | 19 +++++++++++++++++++
>  7 files changed, 63 insertions(+), 1 deletion(-)
>  create mode 100644 Documentation/config/sparse.txt
>
> diff --git a/Documentation/config.txt b/Documentation/config.txt
> index b168f02dc3d..8628ae2634d 100644
> --- a/Documentation/config.txt
> +++ b/Documentation/config.txt
> @@ -468,6 +468,8 @@ include::config/sequencer.txt[]
>
>  include::config/showbranch.txt[]
>
> +include::config/sparse.txt[]
> +
>  include::config/splitindex.txt[]
>
>  include::config/ssh.txt[]
> diff --git a/Documentation/config/sparse.txt b/Documentation/config/sparse.txt
> new file mode 100644
> index 00000000000..c790c728276
> --- /dev/null
> +++ b/Documentation/config/sparse.txt
> @@ -0,0 +1,24 @@
> +sparse.expectFilesOutsideOfPatterns::
> +       Typically with sparse checkouts, files not matching any
> +       sparsity patterns are marked as such in the index file and
> +       missing from the working tree.  Accordingly, Git will
> +       ordinarily check whether files that the index indicates are
> +       outside of the sparse area are present in the working tree and
> +       mark them as present in the index if so.  This option can be
> +       used to tell Git that such present-but-unmatching files are
> +       expected and to stop checking for them.
> ++
> +The default is `false`.  Leaving this set to `false` is recommended in
> +most situations because it allows Git to recover from an interrupted
> +operation that updated the working tree without updating the index or
> +vice versa.

Again, please don't claim this is only for recovering from interrupted
operations; there are other cases -- Git commands that write the
working tree but not the index (checkout-index, git apply), and users
mucking around with files, for example.

> ++
> +A Git-based virtual file system (VFS) can turn the usual expectation
> +on its head: files are present in the working copy but do not take
> +up much disk space because their contents are not downloaded until
> +they are accessed.  With such a virtual file system layer, most files
> +do not match the sparsity patterns at first, and the VFS layer
> +updates the sparsity patterns to add more files whenever files are
> +written.  Setting this to `true` supports such a setup where files are
> +expected to be present outside the sparse area and a separate, robust
> +mechanism is responsible for keeping the sparsity patterns up to date.
> diff --git a/cache.h b/cache.h
> index 281f00ab1b1..b6b8e83ae35 100644
> --- a/cache.h
> +++ b/cache.h
> @@ -1003,6 +1003,7 @@ extern const char *core_fsmonitor;
>
>  extern int core_apply_sparse_checkout;
>  extern int core_sparse_checkout_cone;
> +extern int sparse_expect_files_outside_of_patterns;
>
>  /*
>   * Returns the boolean value of $GIT_OPTIONAL_LOCKS (or the default value).
> diff --git a/config.c b/config.c
> index 2bffa8d4a01..9b9ad1500aa 100644
> --- a/config.c
> +++ b/config.c
> @@ -1544,6 +1544,17 @@ static int git_default_core_config(const char *var, const char *value, void *cb)
>         return platform_core_config(var, value, cb);
>  }
>
> +static int git_default_sparse_config(const char *var, const char *value)
> +{
> +       if (!strcmp(var, "sparse.expectfilesoutsideofpatterns")) {
> +               sparse_expect_files_outside_of_patterns = git_config_bool(var, value);
> +               return 0;
> +       }
> +
> +       /* Add other config variables here and to Documentation/config/sparse.txt. */
> +       return 0;
> +}
> +
>  static int git_default_i18n_config(const char *var, const char *value)
>  {
>         if (!strcmp(var, "i18n.commitencoding"))
> @@ -1675,6 +1686,9 @@ int git_default_config(const char *var, const char *value, void *cb)
>                 return 0;
>         }
>
> +       if (starts_with(var, "sparse."))
> +               return git_default_sparse_config(var, value);
> +
>         /* Add other config variables here and to Documentation/config.txt. */
>         return 0;
>  }
> diff --git a/environment.c b/environment.c
> index fd0501e77a5..fb55bf61290 100644
> --- a/environment.c
> +++ b/environment.c
> @@ -70,6 +70,7 @@ char *notes_ref_name;
>  int grafts_replace_parents = 1;
>  int core_apply_sparse_checkout;
>  int core_sparse_checkout_cone;
> +int sparse_expect_files_outside_of_patterns;
>  int merge_log_config = -1;
>  int precomposed_unicode = -1; /* see probe_utf8_pathname_composition() */
>  unsigned long pack_size_limit_cfg;
> diff --git a/sparse-index.c b/sparse-index.c
> index eed170cd8f7..daeb5112a18 100644
> --- a/sparse-index.c
> +++ b/sparse-index.c
> @@ -396,7 +396,8 @@ void clear_skip_worktree_from_present_files(struct index_state *istate)
>
>         int i;
>
> -       if (!core_apply_sparse_checkout)
> +       if (!core_apply_sparse_checkout ||
> +           sparse_expect_files_outside_of_patterns)
>                 return;
>
>  restart:
> diff --git a/t/t1090-sparse-checkout-scope.sh b/t/t1090-sparse-checkout-scope.sh
> index 3deb4901874..d1833c0f31b 100755
> --- a/t/t1090-sparse-checkout-scope.sh
> +++ b/t/t1090-sparse-checkout-scope.sh
> @@ -52,6 +52,25 @@ test_expect_success 'return to full checkout of main' '
>         test "$(cat b)" = "modified"
>  '
>
> +test_expect_success 'skip-worktree on files outside sparse patterns' '
> +       git sparse-checkout disable &&
> +       git sparse-checkout set --no-cone "a*" &&
> +       git checkout-index --all --ignore-skip-worktree-bits &&
> +
> +       git ls-files -t >output &&
> +       ! grep ^S output >actual &&
> +       test_must_be_empty actual &&
> +
> +       test_config sparse.expectFilesOutsideOfPatterns true &&
> +       cat <<-\EOF >expect &&
> +       S b
> +       S c
> +       EOF
> +       git ls-files -t >output &&
> +       grep ^S output >actual &&
> +       test_cmp expect actual
> +'
> +
>  test_expect_success 'in partial clone, sparse checkout only fetches needed blobs' '
>         test_create_repo server &&
>         git clone "file://$(pwd)/server" client &&
> --
> 2.35.1.574.g5d30c73bfb

  reply	other threads:[~2022-02-23  3:11 UTC|newest]

Thread overview: 29+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-02-20  5:05 [PATCH] Provide config option to expect files outside sparse patterns Elijah Newren via GitGitGadget
2022-02-20 19:41 ` Derrick Stolee
2022-02-20 20:16   ` Junio C Hamano
2022-02-22  2:17   ` Elijah Newren
2022-02-22 12:28     ` Johannes Schindelin
2022-02-22 13:43       ` Derrick Stolee
2022-02-21 20:34 ` Johannes Schindelin
2022-02-21 22:53   ` Ævar Arnfjörð Bjarmason
2022-02-22  2:25     ` Elijah Newren
2022-02-22 12:13       ` Johannes Schindelin
2022-02-22 12:57         ` Ævar Arnfjörð Bjarmason
2022-02-22 23:13           ` Jonathan Nieder
2022-02-25 16:39             ` Ævar Arnfjörð Bjarmason
2022-02-22  2:23   ` Elijah Newren
2022-02-22 10:05     ` Ævar Arnfjörð Bjarmason
2022-02-22 12:11     ` Johannes Schindelin
2022-02-22 13:47     ` Derrick Stolee
2022-02-23  2:26 ` [PATCH v2] repo_read_index: add config " Jonathan Nieder
2022-02-23  3:10   ` Elijah Newren [this message]
2022-02-24  5:22   ` [PATCH v3] " Elijah Newren
2022-02-24 18:24     ` Junio C Hamano
2022-02-26  5:58       ` Elijah Newren
2022-02-25 16:33     ` Jonathan Nieder
2022-02-26  6:01       ` Elijah Newren
2022-02-26  6:12     ` [PATCH v4] " Elijah Newren
2022-03-02  4:33       ` [PATCH v5] " Elijah Newren
2022-03-02  7:36         ` Junio C Hamano
2022-03-02  8:01           ` Elijah Newren
2022-03-02 13:37       ` [PATCH v4] " Derrick Stolee

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CABPp-BHv4VVyPv1v-pgPXf_u=6EpRhcn7Cm551o2HEbdEthAaw@mail.gmail.com' \
    --to=newren@gmail.com \
    --cc=Johannes.Schindelin@gmx.de \
    --cc=avarab@gmail.com \
    --cc=derrickstolee@github.com \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=jabolopes@google.com \
    --cc=jeffhostetler@github.com \
    --cc=jonathantanmy@google.com \
    --cc=jrnieder@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).