git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Derrick Stolee <stolee@gmail.com>
To: Elijah Newren <newren@gmail.com>,
	Derrick Stolee via GitGitGadget <gitgitgadget@gmail.com>
Cc: Git Mailing List <git@vger.kernel.org>,
	Junio C Hamano <gitster@pobox.com>,
	Matheus Tavares Bernardino <matheus.bernardino@usp.br>,
	Derrick Stolee <derrickstolee@github.com>,
	Derrick Stolee <dstolee@microsoft.com>
Subject: Re: [PATCH 2/2] sparse-checkout: clear tracked sparse dirs
Date: Mon, 2 Aug 2021 10:34:28 -0400	[thread overview]
Message-ID: <76639e16-204d-7812-d4c5-56c70e280bed@gmail.com> (raw)
In-Reply-To: <CABPp-BGbRbyCYYS+NcYrC-T4hJf7BCoLE2HsXFM4K51A0wSgcg@mail.gmail.com>

On 7/30/2021 9:52 AM, Elijah Newren wrote:
> On Thu, Jul 29, 2021 at 11:27 AM Derrick Stolee via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
...
>> Leaving these ignored files in the sparse directories makes it
>> impossible to gain performance benefits in the sparse index. When we
>> track into these directories, we need to know if the files are ignored
>> or not, which might depend on the _tracked_ .gitignore file(s) within
>> the sparse directory. This depends on the indexed version of the file,
>> so the sparse directory must be expanded.
> 
> Is this the issue I highlighted at
> https://lore.kernel.org/git/CABPp-BHsiLTtv6AuycRrQ5K6q0=ZTJe0kd7uTUL+UT=nxj66zA@mail.gmail.com/,
> the paragraph "...I thought the point of add_patterns()..." or is
> there more or other things involved here?

This is exactly that point. I'm not sure why I didn't pick up on your
earlier concerns as something to do _immediately_, but for some reason
I thought I could delay it until later.

>> If users depend on ignored files within the sparse directories, then
>> they have created a bad shape in their repository. Regardless, such
>> shapes would create risk that changing the behavior for all cone mode
>> users might be too risky to take on at the moment. Since this data shape
>> makes it impossible to get performance benefits using the sparse index,
>> we limit the change to only be enabled when the sparse index is enabled.
>> Users can opt out of this behavior by disabline the sparse index.
> 
> s/disabline/disabling/, otherwise, I fully agree with this paragraph.

Will fix. Thanks.
 
>> Depending on user feedback or real-world use, we might want to consider
>> expanding the behavior change to all of cone mode. Since we are
>> currently restricting to the sparse index case, we can use the existence
>> of sparse directory entries in the index as indicators of which
>> directories should be removed.
> 
> Let's just expand it to all of cone mode.

I'm open to that. I'll leave this version open for feedback a bit longer
before doing so.

>> +       /*
>> +        * NEEDSWORK: For now, only use this behavior when index.sparse
>> +        * is enabled. We may want this behavior enabled whenever using
>> +        * cone mode patterns.
>> +        */
>> +       prepare_repo_settings(r);
>> +       if (!r->worktree || !r->settings.sparse_index)
>> +               return;
> 
> s/may/do/  :-)

Maybe!
 
>> +
>> +       /*
>> +        * Since we now depend on the sparse index to enable this
>> +        * behavior, use it to our advantage. This process is more
>> +        * complicated without it.
>> +        */
> 
> Ah, the real reason why you limited this change to sparse-index comes out.  :-)
> 
>> +       convert_to_sparse(r->index);
>> +
>> +       strbuf_addstr(&path, r->worktree);
>> +       strbuf_complete(&path, '/');
>> +       pathlen = path.len;
>> +
>> +       for (i = 0; i < r->index->cache_nr; i++) {
>> +               struct cache_entry *ce = r->index->cache[i];
>> +
>> +               /*
>> +                * Is this a sparse directory? If so, then definitely
>> +                * include it. All contained content is outside of the
>> +                * patterns.
> 
> "include"?  I would have used the word "remove", but it gets confusing
> because the question is if we want to include it in our list of things
> to remove or remove it from the working directory.

This comment is a bit stale, sorry. Earlier I was populating a list of
the directories, but in this version I'm just deleting them immediately.

>> +                */
>> +               if (S_ISSPARSEDIR(ce->ce_mode) &&
>> +                   repo_file_exists(r, ce->name)) {
>> +                       strbuf_setlen(&path, pathlen);
>> +                       strbuf_addstr(&path, ce->name);
>> +
>> +                       /*
>> +                        * Removal is "best effort". If something blocks
>> +                        * the deletion, then continue with a warning.
>> +                        */
>> +                       if (remove_dir_recursively(&path, 0))
>> +                               warning(_("failed to remove directory '%s'"), path.buf);
> 
> Um, doesn't this delete untracked files that are not ignored as well
> as the ignored files?  If so, was that intentional?  I'm fully on
> board with removing the gitignore'd files, but I'm worried removing
> other untracked files is dangerous.

I believe that 'git sparse-checkout (set|add|reapply)' will fail before
reaching this method if there are untracked files that could potentially
be removed. I will double-check to ensure this is the case. It is
definitely my intention to protect any untracked, non-ignored files in
these directories by failing the sparse-checkout modification.

> My implementation of this concept (in an external tool) was more along
> the lines of
> 
>   * Get $LIST_OF_NON_SPARSE_DIRECTORIES by walking `git ls-files -t`
> output and finding common fully-sparse directories
>   * git clean -fX $LIST_OF_NON_SPARSE_DIRECTORIES

I initially was running 'git clean -dfx -- <dir> ...' but that also
requires parsing and expanding the index (or being very careful with
the sparse index).

> 
>> +               }
>> +       }
>> +
>> +       strbuf_release(&path);
>> +
>> +       /*
>> +        * This is temporary: the sparse-checkout builtin is not
>> +        * integrated with the sparse-index yet, so we need to keep
>> +        * it full during the process.
>> +        */
>> +       ensure_full_index(r->index);
>> +}
>> +
>>  static int update_working_directory(struct pattern_list *pl)
>>  {
>>         enum update_sparsity_result result;
>> @@ -141,6 +207,8 @@ static int update_working_directory(struct pattern_list *pl)
>>         else
>>                 rollback_lock_file(&lock_file);
>>
>> +       clean_tracked_sparse_directories(r);
>> +
>>         r->index->sparse_checkout_patterns = NULL;
>>         return result;
>>  }
> 
> (Adding a comment here just to separate the two blocks below.)
> 
>> @@ -540,8 +608,11 @@ static int modify_pattern_list(int argc, const char **argv, enum modify_type m)
>>  {
>>         int result;
>>         int changed_config = 0;
>> +       struct pattern_list *old_pl = xcalloc(1, sizeof(*old_pl));
>>         struct pattern_list *pl = xcalloc(1, sizeof(*pl));
>>
>> +       get_sparse_checkout_patterns(old_pl);
>> +
>>         switch (m) {
>>         case ADD:
>>                 if (core_sparse_checkout_cone)
>> @@ -567,7 +638,9 @@ static int modify_pattern_list(int argc, const char **argv, enum modify_type m)
>>                 set_config(MODE_NO_PATTERNS);
>>
>>         clear_pattern_list(pl);
>> +       clear_pattern_list(old_pl);
>>         free(pl);
>> +       free(old_pl);
>>         return result;
>>  }
> 
> You create an old_pl, populate it with get_sparse_checkout_patterns(),
> do nothing with it, then clear and free it?  I'm confused by the above
> two blocks.

Sorry that this is also stale. I should read my own patches more carefully.

I was originally going to focus only on the directories that were "leaving"
the sparse-checkout definition, but that did not catch the 'reapply' case
or cases where the user created the directories by other means.

Will delete these bits.
>> +       git -C repo sparse-checkout reapply &&
>> +       test_path_is_missing repo/folder1 &&
>> +       test_path_is_missing repo/deep/deeper2 &&
>> +       test_path_is_dir repo/obj &&
>> +       test_path_is_file repo/file.o &&
>> +
>> +       git -C repo status --porcelain=v2 >out &&
>> +       test_must_be_empty out &&
>> +
>> +       git -C repo sparse-checkout set deep/deeper2 &&
>> +       test_path_is_missing repo/deep/deeper1 &&
>> +       test_path_is_dir repo/deep/deeper2 &&
> 
> What's the expectation for repo/obj?

I will add

	test_path_is_dir repo/obj

because this ignored directory should not be removed. This is
simulating the build artifacts that might appear in ignored
directories whose parents are in the sparse-checkout definition.

Thanks,
-Stolee

  reply	other threads:[~2021-08-02 14:34 UTC|newest]

Thread overview: 92+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-07-29 17:27 [PATCH 0/2] Sparse index: delete ignored files outside sparse cone Derrick Stolee via GitGitGadget
2021-07-29 17:27 ` [PATCH 1/2] t7519: rewrite sparse index test Derrick Stolee via GitGitGadget
2021-07-29 17:27 ` [PATCH 2/2] sparse-checkout: clear tracked sparse dirs Derrick Stolee via GitGitGadget
2021-07-30 13:52   ` Elijah Newren
2021-08-02 14:34     ` Derrick Stolee [this message]
2021-08-02 16:17       ` Elijah Newren
2021-08-05  1:55         ` Derrick Stolee
2021-08-05  3:54           ` Elijah Newren
2021-07-30 13:11 ` [PATCH 0/2] Sparse index: delete ignored files outside sparse cone Elijah Newren
2021-08-10 19:50 ` [PATCH v2 0/8] " Derrick Stolee via GitGitGadget
2021-08-10 19:50   ` [PATCH v2 1/8] t7519: rewrite sparse index test Derrick Stolee via GitGitGadget
2021-08-10 19:50   ` [PATCH v2 2/8] sparse-index: silently return when not using cone-mode patterns Derrick Stolee via GitGitGadget
2021-08-10 19:50   ` [PATCH v2 3/8] sparse-index: silently return when cache tree fails Derrick Stolee via GitGitGadget
2021-08-19 18:24     ` Elijah Newren
2021-08-20 15:04       ` Derrick Stolee
2021-08-10 19:50   ` [PATCH v2 4/8] unpack-trees: fix nested sparse-dir search Derrick Stolee via GitGitGadget
2021-08-10 19:50   ` [PATCH v2 5/8] sparse-checkout: create helper methods Derrick Stolee via GitGitGadget
2021-08-12 17:29     ` Derrick Stolee
2021-08-10 19:50   ` [PATCH v2 6/8] attr: be careful about sparse directories Derrick Stolee via GitGitGadget
2021-08-10 19:50   ` [PATCH v2 7/8] sparse-index: add SPARSE_INDEX_IGNORE_CONFIG flag Derrick Stolee via GitGitGadget
2021-08-10 19:50   ` [PATCH v2 8/8] sparse-checkout: clear tracked sparse dirs Derrick Stolee via GitGitGadget
2021-08-17 13:23   ` [PATCH v3 0/8] Sparse index: delete ignored files outside sparse cone Derrick Stolee via GitGitGadget
2021-08-17 13:23     ` [PATCH v3 1/8] t7519: rewrite sparse index test Derrick Stolee via GitGitGadget
2021-08-19  7:45       ` Johannes Schindelin
2021-08-20 15:09         ` Derrick Stolee
2021-08-20 16:40           ` Eric Sunshine
2021-08-17 13:23     ` [PATCH v3 2/8] sparse-index: silently return when not using cone-mode patterns Derrick Stolee via GitGitGadget
2021-08-17 13:23     ` [PATCH v3 3/8] sparse-index: silently return when cache tree fails Derrick Stolee via GitGitGadget
2021-08-17 13:23     ` [PATCH v3 4/8] unpack-trees: fix nested sparse-dir search Derrick Stolee via GitGitGadget
2021-08-19  8:01       ` Johannes Schindelin
2021-08-20 15:18         ` Derrick Stolee
2021-08-20 19:35           ` René Scharfe
2021-08-20 20:22             ` René Scharfe
2021-08-19 18:29       ` Elijah Newren
2021-08-17 13:23     ` [PATCH v3 5/8] sparse-checkout: create helper methods Derrick Stolee via GitGitGadget
2021-08-19  8:07       ` Johannes Schindelin
2021-08-20 15:30         ` Derrick Stolee
2021-08-17 13:23     ` [PATCH v3 6/8] attr: be careful about sparse directories Derrick Stolee via GitGitGadget
2021-08-19  8:11       ` Johannes Schindelin
2021-08-20 15:36         ` Derrick Stolee
2021-08-19 20:53       ` Elijah Newren
2021-08-20 15:39         ` Derrick Stolee
2021-08-20 16:05           ` Elijah Newren
2021-08-17 13:23     ` [PATCH v3 7/8] sparse-index: add SPARSE_INDEX_IGNORE_CONFIG flag Derrick Stolee via GitGitGadget
2021-08-18 18:59       ` Derrick Stolee
2021-08-17 13:23     ` [PATCH v3 8/8] sparse-checkout: clear tracked sparse dirs Derrick Stolee via GitGitGadget
2021-08-19  8:48       ` Johannes Schindelin
2021-08-20 15:49         ` Derrick Stolee
2021-08-20 16:15           ` Elijah Newren
2021-08-20 15:56         ` Elijah Newren
2021-08-23 20:00           ` Johannes Schindelin
2021-08-17 14:09     ` [PATCH v3 0/8] Sparse index: delete ignored files outside sparse cone Elijah Newren
2021-08-24 21:51     ` [PATCH v4 00/10] " Derrick Stolee via GitGitGadget
2021-08-24 21:51       ` [PATCH v4 01/10] t7519: rewrite sparse index test Derrick Stolee via GitGitGadget
2021-08-24 21:51       ` [PATCH v4 02/10] sparse-index: silently return when not using cone-mode patterns Derrick Stolee via GitGitGadget
2021-08-24 21:51       ` [PATCH v4 03/10] sparse-index: silently return when cache tree fails Derrick Stolee via GitGitGadget
2021-08-24 21:51       ` [PATCH v4 04/10] sparse-index: use WRITE_TREE_MISSING_OK Derrick Stolee via GitGitGadget
2021-08-27 21:33         ` Elijah Newren
2021-08-30 13:19           ` Derrick Stolee
2021-08-30 20:08             ` Elijah Newren
2021-08-24 21:51       ` [PATCH v4 05/10] unpack-trees: fix nested sparse-dir search Derrick Stolee via GitGitGadget
2021-08-24 22:21         ` René Scharfe
2021-08-25  1:09           ` Derrick Stolee
2021-08-24 21:51       ` [PATCH v4 06/10] sparse-checkout: create helper methods Derrick Stolee via GitGitGadget
2021-08-24 21:51       ` [PATCH v4 07/10] attr: be careful about sparse directories Derrick Stolee via GitGitGadget
2021-08-24 21:51       ` [PATCH v4 08/10] sparse-index: add SPARSE_INDEX_MEMORY_ONLY flag Derrick Stolee via GitGitGadget
2021-08-24 21:51       ` [PATCH v4 09/10] sparse-checkout: clear tracked sparse dirs Derrick Stolee via GitGitGadget
2021-08-24 21:51       ` [PATCH v4 10/10] sparse-checkout: add config to disable deleting dirs Derrick Stolee via GitGitGadget
2021-08-27 20:58         ` Elijah Newren
2021-08-30 13:30           ` Derrick Stolee
2021-08-30 20:11             ` Elijah Newren
2021-08-27 21:56       ` [PATCH v4 00/10] Sparse index: delete ignored files outside sparse cone Elijah Newren
2021-08-27 22:01         ` Elijah Newren
2021-08-30 13:34           ` Derrick Stolee
2021-08-30 20:14             ` Elijah Newren
2021-08-30 13:54         ` Derrick Stolee
2021-08-30 20:23           ` Elijah Newren
2021-09-08  1:42       ` [PATCH v5 0/9] " Derrick Stolee via GitGitGadget
2021-09-08  1:42         ` [PATCH v5 1/9] t7519: rewrite sparse index test Derrick Stolee via GitGitGadget
2021-09-08  1:42         ` [PATCH v5 2/9] sparse-index: silently return when not using cone-mode patterns Derrick Stolee via GitGitGadget
2021-09-08  1:42         ` [PATCH v5 3/9] unpack-trees: fix nested sparse-dir search Derrick Stolee via GitGitGadget
2021-09-08  1:42         ` [PATCH v5 4/9] sparse-index: silently return when cache tree fails Derrick Stolee via GitGitGadget
2021-09-08  1:42         ` [PATCH v5 5/9] sparse-index: use WRITE_TREE_MISSING_OK Derrick Stolee via GitGitGadget
2021-09-08  1:42         ` [PATCH v5 6/9] sparse-checkout: create helper methods Derrick Stolee via GitGitGadget
2021-09-08  1:42         ` [PATCH v5 7/9] attr: be careful about sparse directories Derrick Stolee via GitGitGadget
2021-09-08  1:42         ` [PATCH v5 8/9] sparse-index: add SPARSE_INDEX_MEMORY_ONLY flag Derrick Stolee via GitGitGadget
2021-09-08  1:42         ` [PATCH v5 9/9] sparse-checkout: clear tracked sparse dirs Derrick Stolee via GitGitGadget
2021-09-08  5:21         ` [PATCH v5 0/9] Sparse index: delete ignored files outside sparse cone Junio C Hamano
2021-09-08  6:56           ` Junio C Hamano
2021-09-08 11:39             ` Derrick Stolee
2021-09-08 16:11               ` Junio C Hamano
2021-09-08  5:30         ` Elijah Newren

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=76639e16-204d-7812-d4c5-56c70e280bed@gmail.com \
    --to=stolee@gmail.com \
    --cc=derrickstolee@github.com \
    --cc=dstolee@microsoft.com \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=gitster@pobox.com \
    --cc=matheus.bernardino@usp.br \
    --cc=newren@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).