git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Derrick Stolee <stolee@gmail.com>
To: Junio C Hamano <gitster@pobox.com>,
	Victoria Dye via GitGitGadget <gitgitgadget@gmail.com>
Cc: git@vger.kernel.org, Victoria Dye <vdye@github.com>
Subject: Re: [PATCH v4 2/4] sparse-index: avoid unnecessary cache tree clearing
Date: Fri, 29 Oct 2021 15:00:01 -0400	[thread overview]
Message-ID: <006b1747-fc6b-61e4-7ce9-f0a6687ddd8f@gmail.com> (raw)
In-Reply-To: <xmqqfssjllen.fsf@gitster.g>

On 10/29/2021 2:45 PM, Junio C Hamano wrote:
> "Victoria Dye via GitGitGadget" <gitgitgadget@gmail.com> writes:
> 
>> From: Victoria Dye <vdye@github.com>
>>
>> When converting a full index to sparse, clear and recreate the cache tree
>> only if the cache tree is not fully valid. The convert_to_sparse operation
>> should exit silently if a cache tree update cannot be successfully completed
>> (e.g., due to a conflicted entry state). However, because this failure
>> scenario only occurs when at least a portion of the cache tree is invalid,
>> we can save ourselves the cost of clearing and recreating the cache tree by
>> skipping the check when the cache tree is fully valid.
> 
> I see in cache-tree.c::update_one() this snippet of code.
> 
> 	/*
> 	 * If the first entry of this region is a sparse directory
> 	 * entry corresponding exactly to 'base', then this cache_tree
> 	 * struct is a "leaf" in the data structure, pointing to the
> 	 * tree OID specified in the entry.
> 	 */
> 	if (entries > 0) {
> 		const struct cache_entry *ce = cache[0];
> 
> 		if (S_ISSPARSEDIR(ce->ce_mode) &&
> 		    ce->ce_namelen == baselen &&
> 		    !strncmp(ce->name, base, baselen)) {
> 			it->entry_count = 1;
> 			oidcpy(&it->oid, &ce->oid);
> 			return 1;
> 		}
> 	}
> 
> Sorry for not noticing it earlier, but does this mean that the
> content of a cache-tree changes shape when sparse-ness of the index
> changes?  Is a cache-tree that knows about all of the
> subdirectories, even the ones that are descendants of a directory
> that is represented as a tree-ish entry in the main index array,
> still valid in a sparse index?
> 
> If not, then I do not think of a quick and sure way to ensure that
> the cache-tree is valid when the sparse-ness changes.
> 
> The earlier suggestion was based on my assumption that even when the
> main index array becomes sparse, the cache tree is still populated
> and valid, so that after writing a tree and writing an on-disk index,
> and then reading the on-disk index back (possibly in another process),
> would not have to incur the recomputation cost of the full tree when
> the reading codepath needs to flip the sparseness.
>
> But the above code snippet makes me worried a lot.  A cache-tree
> that used to be valid when the corresponding in-core index array was
> not sparse will become invalid immediately when we decide to make it
> sparse, right?

I think I would argue that the cache-tree is valid in this case. Each
node represents the tree that would be created from that region of the
index if we called 'git commit'. We can reconstruct the contained
subtrees by walking trees if necessary, but that isn't necessary at
this point.

I am writing a blog post about the sparse index, and I go into detail
about how the cache tree is modified. I use the
derrickstolee/trace-flamechart repo [1] as an example, so I will use
it as a concrete discussion of what is happening here.

Here is the cache-tree without sparse-index:

 [ root (left: 0, size: 16) ]
  |
  +---- [ bin/ (left: 2, size: 1) ] 
  |
  +---- [ examples/ (left: 3, size: 11) ]
        |
        +---- [ fetch/ (left: 3, size: 8) ]
        |
        +---- [ maintenance/ (left: 11, size: 3) ]

Then, I run

	git sparse-checkout init --cone --sparse-index
	git sparse-checkout set bin

the cache-tree looks like this:

 [ root (left: 0, size: 6) ]
  |
  +---- [ bin/ (left: 2, size: 1) ] 
  |
  +---- [ examples/ (left: 3, size: 1) ]

The tree OIDs at each of "root", "bin" and "examples" match the trees
for the nodes in the non-sparse cache tree.

By contrast, including any subtrees of "examples" would cause a few
things to happen:

1. The size of the cache-tree would increase (significantly in large
   repos).

2. The contained cache-tree nodes would have size _zero_, since there
   are no cache entries defined within their directories.

I think in this pruned form, the cache-tree is valid and serves all of
the roles it was designed for. We can quickly compute the root tree in
'git commit'. We can navigate the cache-tree in traverse_by_cache_tree()
to accurately describe the trees in the index. These things are tested
in t1092 and the p2000 performance test.

Does this satisfy your concern, or is there a larger expectation of the
cache-tree extension that I am not taking into account?

Thanks,
-Stolee

  reply	other threads:[~2021-10-29 19:00 UTC|newest]

Thread overview: 44+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-10-15 19:54 [PATCH 0/2] sparse-index: expand/collapse based on 'index.sparse' Victoria Dye via GitGitGadget
2021-10-15 19:54 ` [PATCH 1/2] test-read-cache.c: prepare_repo_settings after config init Victoria Dye via GitGitGadget
2021-10-15 19:54 ` [PATCH 2/2] sparse-index: update index read to consider index.sparse config Victoria Dye via GitGitGadget
2021-10-15 21:53   ` Junio C Hamano
2021-10-17  1:21     ` Derrick Stolee
2021-10-17  5:58       ` Junio C Hamano
2021-10-17 19:33         ` Derrick Stolee
2021-10-18  1:15           ` Junio C Hamano
2021-10-18 13:25             ` Derrick Stolee
2021-10-18 14:14               ` Victoria Dye
2021-10-21 13:26                 ` Derrick Stolee
2021-10-18 16:09               ` Junio C Hamano
2021-10-21 20:48 ` [PATCH v2 0/3] sparse-index: expand/collapse based on 'index.sparse' Victoria Dye via GitGitGadget
2021-10-21 20:48   ` [PATCH v2 1/3] test-read-cache.c: prepare_repo_settings after config init Victoria Dye via GitGitGadget
2021-10-21 20:48   ` [PATCH v2 2/3] sparse-index: add ensure_correct_sparsity function Victoria Dye via GitGitGadget
2021-10-21 22:20     ` Junio C Hamano
2021-10-27 17:21       ` Victoria Dye
2021-10-21 20:48   ` [PATCH v2 3/3] sparse-index: update do_read_index to ensure correct sparsity Victoria Dye via GitGitGadget
2021-10-27 18:20   ` [PATCH v3 0/3] sparse-index: expand/collapse based on 'index.sparse' Victoria Dye via GitGitGadget
2021-10-27 18:20     ` [PATCH v3 1/3] test-read-cache.c: prepare_repo_settings after config init Victoria Dye via GitGitGadget
2021-10-27 18:20     ` [PATCH v3 2/3] sparse-index: add ensure_correct_sparsity function Victoria Dye via GitGitGadget
2021-10-27 20:07       ` Derrick Stolee
2021-10-27 21:32         ` Junio C Hamano
2021-10-28  1:24           ` Derrick Stolee
2021-10-29 13:43             ` Victoria Dye
2021-10-27 18:20     ` [PATCH v3 3/3] sparse-index: update do_read_index to ensure correct sparsity Victoria Dye via GitGitGadget
2021-10-27 20:08     ` [PATCH v3 0/3] sparse-index: expand/collapse based on 'index.sparse' Derrick Stolee
2021-10-29 13:51     ` [PATCH v4 0/4] " Victoria Dye via GitGitGadget
2021-10-29 13:51       ` [PATCH v4 1/4] test-read-cache.c: prepare_repo_settings after config init Victoria Dye via GitGitGadget
2021-10-29 13:51       ` [PATCH v4 2/4] sparse-index: avoid unnecessary cache tree clearing Victoria Dye via GitGitGadget
2021-10-29 18:45         ` Junio C Hamano
2021-10-29 19:00           ` Derrick Stolee [this message]
2021-10-29 20:01             ` Junio C Hamano
2021-10-29 13:51       ` [PATCH v4 3/4] sparse-index: add ensure_correct_sparsity function Victoria Dye via GitGitGadget
2021-10-29 13:51       ` [PATCH v4 4/4] sparse-index: update do_read_index to ensure correct sparsity Victoria Dye via GitGitGadget
2021-11-22 17:36         ` Elijah Newren
2021-11-22 18:59           ` Victoria Dye
2021-11-22 17:40       ` [PATCH v4 0/4] sparse-index: expand/collapse based on 'index.sparse' Elijah Newren
2021-11-23  0:20       ` [PATCH v5 " Victoria Dye via GitGitGadget
2021-11-23  0:20         ` [PATCH v5 1/4] test-read-cache.c: prepare_repo_settings after config init Victoria Dye via GitGitGadget
2021-11-23  0:20         ` [PATCH v5 2/4] sparse-index: avoid unnecessary cache tree clearing Victoria Dye via GitGitGadget
2021-11-23  0:20         ` [PATCH v5 3/4] sparse-index: add ensure_correct_sparsity function Victoria Dye via GitGitGadget
2021-11-23  0:20         ` [PATCH v5 4/4] sparse-index: update do_read_index to ensure correct sparsity Victoria Dye via GitGitGadget
2021-11-23 17:21         ` [PATCH v5 0/4] sparse-index: expand/collapse based on 'index.sparse' Elijah Newren

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=006b1747-fc6b-61e4-7ce9-f0a6687ddd8f@gmail.com \
    --to=stolee@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=gitster@pobox.com \
    --cc=vdye@github.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).