From: Derrick Stolee <derrickstolee@github.com>
To: Junio C Hamano <gitster@pobox.com>,
Derrick Stolee via GitGitGadget <gitgitgadget@gmail.com>
Cc: git@vger.kernel.org, vdye@github.com, shaoxuan.yuan02@gmail.com,
Derrick Stolee <dstolee@microsoft.com>
Subject: Re: [PATCH v2 07/10] sparse-index: partially expand directories
Date: Fri, 20 May 2022 14:33:17 -0400 [thread overview]
Message-ID: <8c093377-74f7-0a1e-fbcc-8a1aedfe3fca@github.com> (raw)
In-Reply-To: <xmqqbkvshxvn.fsf@gitster.g>
On 5/20/2022 2:17 PM, Junio C Hamano wrote:> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
>> @@ -231,18 +236,41 @@ static int add_path_to_index(const struct object_id *oid,
>> struct strbuf *base, const char *path,
>> unsigned int mode, void *context)
>> {
>
> OK, this function is a callback of read_tree_at(), whose caller is
> walking the source (=current) index, and while it is copying
> existing non-directory entries to the destination (=ctx->write)
> index, it found a tree in the source in-core index that does not
> match the cone pattern (or we are expanding fully). Our job is to
> expand the given tree as a "subdirectory" in the project into the
> destination index.
>
> We used to just let the calling read_tree_at() fully and recursively
> walk the tree while adding any non-tree entries to the destination.
> Now, we have the pattern list in the context, we go more selective.
>
> The design makes sense.
>
>> + struct modify_index_context *ctx = (struct modify_index_context *)context;
>> struct cache_entry *ce;
>> size_t len = base->len;
>>
>> + if (S_ISDIR(mode)) {
>> + int dtype;
>> + size_t baselen = base->len;
>> + if (!ctx->pl)
>> + return READ_TREE_RECURSIVE;
>
> Fully recursive case. We can do without any match, just like the
> caller (i.e. expand_to_pattern_list) did, when the pattern list is
> NULL.
>
>> + /*
>> + * Have we expanded to a point outside of the sparse-checkout?
>> + */
>> + strbuf_addstr(base, path);
>> + strbuf_add(base, "/-", 2);
>> +
>> + if (path_matches_pattern_list(base->buf, base->len,
>> + NULL, &dtype,
>> + ctx->pl, ctx->write)) {
>
> If that sample path in the directory matches (MATCHED or
> MATCHED_RECURSIVE) the patterns, we recurse into the tree to expand.
>
> Can the function return UNDECIDED here? If so what should happen?
> As written, the code will behave as if it matched, and it is not
> quite clear if that is the behaviour intended by this patch or an
> accident.
UNDECIDED can only be returned when not in cone mode. We are in cone mode
when this helper is used.
>> + strbuf_setlen(base, baselen);
>> + return READ_TREE_RECURSIVE;
>> + }
>
> The caller found a tree at path <path> in the index. We check if
> our patterns match a fictitious path <path> + "/-", which may exist
> if the <path> is a directory and if there is a funny named file or
> directory "-" in it.
>
> Why "-"? Are we trying ot see if "ctx->pl" matches "anything" in
> the directory that is at <path>? Is the assumption here that pl
> only names directories literally without blobs (I presume that is
> the same thing as assuming the cone mode)?
>
> I am trying to see if there is a way that expresses the intention of
> this code more clearly than using an arbitrary path "-" (and trying
> to figure out if I got the intention right in the first place ;-)..
The '+ "/-"' is there to give us an example file path for "<path>" as a
directory. This is specifically because path_matches_pattern_list() checks
the parent directories for matches, expecting the input to be a file name.
This could use a comment. I'm thinking...
/*
* Have we expanded to a point outside of the sparse-checkout?
*
* Artificially pad the path name with a slash "/" to
* indicate it as a directory, and add an arbitrary file
* name ("-") so we can consider base->buf as a file name
* to match against the cone-mode patterns.
*
* If we compared just "path", then we would expand more
* than we should. Since every file at root is always
* included, we would expand every directory at root at
* least one level deep instead of using sparse directory
* entries.
*/
>> + /*
>> + * The path "{base}{path}/" is a sparse directory. Create the correct
>> + * name for inserting the entry into the index.
>> + */
>> + strbuf_setlen(base, base->len - 1);
>
> This removes that phoney "-" while keeping the trailing "/". Just
> like "-" was unclear, understanding this "- 1" requires that the
> reader understands why "/-" was used earlier.
>
> The resulting "base" is used in the newly created cache entry to
> represent the (unexpanded) directory below, and such a cache entry
> is supposed to have a path with a trailing slash, so it makes sense.
>
>> + } else {
>> + strbuf_addstr(base, path);
>
> For any non-tree thing, we use the given path for the cache entry to
> represent it.
>
>> + }
>> +
>> + ce = make_cache_entry(ctx->write, mode, oid, base->buf, 0, 0);
>> ce->ce_flags |= CE_SKIP_WORKTREE | CE_EXTENDED;
>> - set_index_entry(istate, istate->cache_nr++, ce);
>> + set_index_entry(ctx->write, ctx->write->cache_nr++, ce);
>
> And the cache entry (newly created) is added to the destination.
> Unlike what happens in the caller (i.e. expand_to_pattern_list) that
> moves the cache entry taken from the source index to the destination
> index, the caller will discard the cache entry taken from the source
> index after read_tree_at() returns, and we create a new instance for
> ourselves here, even if we _could_ have reused it (by passing it in
> the context structure, for example).
>
>> strbuf_setlen(base, len);
>
> And we restore the length of the path in the base to the original
> before we return.
>
>> return 0;
>
> Returning 0 as opposed to READ_TREE_RECURSIVE here means "we've
> dealt with the entry; don't recurse into subtree even if you called
> us with a tree", which is the right thing to do here, as we did have
> done all we need to here.
>
> OK, except for that "/-" thing, all of the above makes sense to me.
Thanks for your careful read. Hopefully my explanation of the "/-" thing
helps.
Thanks,
-Stolee
next prev parent reply other threads:[~2022-05-20 18:33 UTC|newest]
Thread overview: 55+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-05-16 18:11 [PATCH 0/8] Sparse index: integrate with sparse-checkout Derrick Stolee via GitGitGadget
2022-05-16 18:11 ` [PATCH 1/8] sparse-index: create expand_to_pattern_list() Derrick Stolee via GitGitGadget
2022-05-16 20:36 ` Victoria Dye
2022-05-16 20:49 ` Derrick Stolee
2022-05-16 18:11 ` [PATCH 2/8] sparse-index: introduce partially-sparse indexes Derrick Stolee via GitGitGadget
2022-05-16 18:11 ` [PATCH 3/8] cache-tree: implement cache_tree_find_path() Derrick Stolee via GitGitGadget
2022-05-16 18:11 ` [PATCH 4/8] sparse-checkout: --no-sparse-index needs a full index Derrick Stolee via GitGitGadget
2022-05-16 18:11 ` [PATCH 5/8] sparse-index: partially expand directories Derrick Stolee via GitGitGadget
2022-05-16 20:36 ` Victoria Dye
2022-05-16 18:11 ` [PATCH 6/8] sparse-index: complete partial expansion Derrick Stolee via GitGitGadget
2022-05-16 20:38 ` Victoria Dye
2022-05-17 13:23 ` Derrick Stolee
2022-05-16 18:11 ` [PATCH 7/8] p2000: add test for 'git sparse-checkout [add|set]' Derrick Stolee via GitGitGadget
2022-05-16 18:11 ` [PATCH 8/8] sparse-checkout: integrate with sparse index Derrick Stolee via GitGitGadget
2022-05-16 20:38 ` Victoria Dye
2022-05-17 13:28 ` Derrick Stolee
2022-05-19 17:52 ` [PATCH v2 00/10] Sparse index: integrate with sparse-checkout Derrick Stolee via GitGitGadget
2022-05-19 17:52 ` [PATCH v2 01/10] t1092: refactor 'sparse-index contents' test Derrick Stolee via GitGitGadget
2022-05-19 17:52 ` [PATCH v2 02/10] t1092: stress test 'git sparse-checkout set' Derrick Stolee via GitGitGadget
2022-05-19 17:52 ` [PATCH v2 03/10] sparse-index: create expand_to_pattern_list() Derrick Stolee via GitGitGadget
2022-05-19 19:50 ` Junio C Hamano
2022-05-20 18:01 ` Derrick Stolee
2022-05-19 17:52 ` [PATCH v2 04/10] sparse-index: introduce partially-sparse indexes Derrick Stolee via GitGitGadget
2022-05-19 20:05 ` Junio C Hamano
2022-05-20 18:05 ` Derrick Stolee
2022-05-20 18:23 ` Junio C Hamano
2022-05-19 17:52 ` [PATCH v2 05/10] cache-tree: implement cache_tree_find_path() Derrick Stolee via GitGitGadget
2022-05-19 20:14 ` Junio C Hamano
2022-05-20 18:13 ` Derrick Stolee
2022-05-19 17:52 ` [PATCH v2 06/10] sparse-checkout: --no-sparse-index needs a full index Derrick Stolee via GitGitGadget
2022-05-19 20:19 ` Junio C Hamano
2022-05-19 17:52 ` [PATCH v2 07/10] sparse-index: partially expand directories Derrick Stolee via GitGitGadget
2022-05-20 18:17 ` Junio C Hamano
2022-05-20 18:33 ` Derrick Stolee [this message]
2022-05-19 17:52 ` [PATCH v2 08/10] sparse-index: complete partial expansion Derrick Stolee via GitGitGadget
2022-05-21 7:45 ` Junio C Hamano
2022-05-23 13:13 ` Derrick Stolee
2022-05-23 13:18 ` Derrick Stolee
2022-05-23 18:01 ` Junio C Hamano
2022-05-23 22:48 ` Junio C Hamano
2022-05-25 14:26 ` Derrick Stolee
2022-05-25 16:32 ` Junio C Hamano
2022-05-19 17:52 ` [PATCH v2 09/10] p2000: add test for 'git sparse-checkout [add|set]' Derrick Stolee via GitGitGadget
2022-05-19 17:52 ` [PATCH v2 10/10] sparse-checkout: integrate with sparse index Derrick Stolee via GitGitGadget
2022-05-23 13:48 ` [PATCH v3 00/10] Sparse index: integrate with sparse-checkout Derrick Stolee via GitGitGadget
2022-05-23 13:48 ` [PATCH v3 01/10] t1092: refactor 'sparse-index contents' test Derrick Stolee via GitGitGadget
2022-05-23 13:48 ` [PATCH v3 02/10] t1092: stress test 'git sparse-checkout set' Derrick Stolee via GitGitGadget
2022-05-23 13:48 ` [PATCH v3 03/10] sparse-index: create expand_index() Derrick Stolee via GitGitGadget
2022-05-23 13:48 ` [PATCH v3 04/10] sparse-index: introduce partially-sparse indexes Derrick Stolee via GitGitGadget
2022-05-23 13:48 ` [PATCH v3 05/10] cache-tree: implement cache_tree_find_path() Derrick Stolee via GitGitGadget
2022-05-23 13:48 ` [PATCH v3 06/10] sparse-checkout: --no-sparse-index needs a full index Derrick Stolee via GitGitGadget
2022-05-23 13:48 ` [PATCH v3 07/10] sparse-index: partially expand directories Derrick Stolee via GitGitGadget
2022-05-23 13:48 ` [PATCH v3 08/10] sparse-index: complete partial expansion Derrick Stolee via GitGitGadget
2022-05-23 13:48 ` [PATCH v3 09/10] p2000: add test for 'git sparse-checkout [add|set]' Derrick Stolee via GitGitGadget
2022-05-23 13:48 ` [PATCH v3 10/10] sparse-checkout: integrate with sparse index Derrick Stolee via GitGitGadget
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: http://vger.kernel.org/majordomo-info.html
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=8c093377-74f7-0a1e-fbcc-8a1aedfe3fca@github.com \
--to=derrickstolee@github.com \
--cc=dstolee@microsoft.com \
--cc=git@vger.kernel.org \
--cc=gitgitgadget@gmail.com \
--cc=gitster@pobox.com \
--cc=shaoxuan.yuan02@gmail.com \
--cc=vdye@github.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://80x24.org/mirrors/git.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).