git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Derrick Stolee <derrickstolee@github.com>
To: Junio C Hamano <gitster@pobox.com>,
	Derrick Stolee via GitGitGadget <gitgitgadget@gmail.com>
Cc: git@vger.kernel.org, vdye@github.com, shaoxuan.yuan02@gmail.com,
	Derrick Stolee <dstolee@microsoft.com>
Subject: Re: [PATCH v2 07/10] sparse-index: partially expand directories
Date: Fri, 20 May 2022 14:33:17 -0400	[thread overview]
Message-ID: <8c093377-74f7-0a1e-fbcc-8a1aedfe3fca@github.com> (raw)
In-Reply-To: <xmqqbkvshxvn.fsf@gitster.g>

On 5/20/2022 2:17 PM, Junio C Hamano wrote:> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
> 
>> @@ -231,18 +236,41 @@ static int add_path_to_index(const struct object_id *oid,
>>  			     struct strbuf *base, const char *path,
>>  			     unsigned int mode, void *context)
>>  {
> 
> OK, this function is a callback of read_tree_at(), whose caller is
> walking the source (=current) index, and while it is copying
> existing non-directory entries to the destination (=ctx->write)
> index, it found a tree in the source in-core index that does not
> match the cone pattern (or we are expanding fully).  Our job is to
> expand the given tree as a "subdirectory" in the project into the
> destination index.
> 
> We used to just let the calling read_tree_at() fully and recursively
> walk the tree while adding any non-tree entries to the destination.
> Now, we have the pattern list in the context, we go more selective.
> 
> The design makes sense.
> 
>> +	struct modify_index_context *ctx = (struct modify_index_context *)context;
>>  	struct cache_entry *ce;
>>  	size_t len = base->len;
>>  
>> +	if (S_ISDIR(mode)) {
>> +		int dtype;
>> +		size_t baselen = base->len;
>> +		if (!ctx->pl)
>> +			return READ_TREE_RECURSIVE;
> 
> Fully recursive case.  We can do without any match, just like the
> caller (i.e. expand_to_pattern_list) did, when the pattern list is
> NULL.
> 
>> +		/*
>> +		 * Have we expanded to a point outside of the sparse-checkout?
>> +		 */
>> +		strbuf_addstr(base, path);
>> +		strbuf_add(base, "/-", 2);
>> +
>> +		if (path_matches_pattern_list(base->buf, base->len,
>> +					      NULL, &dtype,
>> +					      ctx->pl, ctx->write)) {
> 
> If that sample path in the directory matches (MATCHED or
> MATCHED_RECURSIVE) the patterns, we recurse into the tree to expand.
> 
> Can the function return UNDECIDED here?  If so what should happen?
> As written, the code will behave as if it matched, and it is not
> quite clear if that is the behaviour intended by this patch or an
> accident.

UNDECIDED can only be returned when not in cone mode. We are in cone mode
when this helper is used.

>> +			strbuf_setlen(base, baselen);
>> +			return READ_TREE_RECURSIVE;
>> +		}
> 
> The caller found a tree at path <path> in the index.  We check if
> our patterns match a fictitious path <path> + "/-", which may exist
> if the <path> is a directory and if there is a funny named file or
> directory "-" in it.
> 
> Why "-"?  Are we trying ot see if "ctx->pl" matches "anything" in
> the directory that is at <path>?  Is the assumption here that pl
> only names directories literally without blobs (I presume that is
> the same thing as assuming the cone mode)?
> 
> I am trying to see if there is a way that expresses the intention of
> this code more clearly than using an arbitrary path "-" (and trying
> to figure out if I got the intention right in the first place ;-)..

The '+ "/-"' is there to give us an example file path for "<path>" as a
directory. This is specifically because path_matches_pattern_list() checks
the parent directories for matches, expecting the input to be a file name.

This could use a comment. I'm thinking...

/*
 * Have we expanded to a point outside of the sparse-checkout?
 *
 * Artificially pad the path name with a slash "/" to
 * indicate it as a directory, and add an arbitrary file
 * name ("-") so we can consider base->buf as a file name
 * to match against the cone-mode patterns.
 *
 * If we compared just "path", then we would expand more
 * than we should. Since every file at root is always
 * included, we would expand every directory at root at
 * least one level deep instead of using sparse directory
 * entries.
 */

>> +		/*
>> +		 * The path "{base}{path}/" is a sparse directory. Create the correct
>> +		 * name for inserting the entry into the index.
>> +		 */
>> +		strbuf_setlen(base, base->len - 1);
> 
> This removes that phoney "-" while keeping the trailing "/".  Just
> like "-" was unclear, understanding this "- 1" requires that the
> reader understands why "/-" was used earlier.
> 
> The resulting "base" is used in the newly created cache entry to
> represent the (unexpanded) directory below, and such a cache entry
> is supposed to have a path with a trailing slash, so it makes sense.
> 
>> +	} else {
>> +		strbuf_addstr(base, path);
> 
> For any non-tree thing, we use the given path for the cache entry to
> represent it.
> 
>> +	}
>> +
>> +	ce = make_cache_entry(ctx->write, mode, oid, base->buf, 0, 0);
>>  	ce->ce_flags |= CE_SKIP_WORKTREE | CE_EXTENDED;
>> -	set_index_entry(istate, istate->cache_nr++, ce);
>> +	set_index_entry(ctx->write, ctx->write->cache_nr++, ce);
> 
> And the cache entry (newly created) is added to the destination.
> Unlike what happens in the caller (i.e. expand_to_pattern_list) that
> moves the cache entry taken from the source index to the destination
> index, the caller will discard the cache entry taken from the source
> index after read_tree_at() returns, and we create a new instance for
> ourselves here, even if we _could_ have reused it (by passing it in
> the context structure, for example).
> 
>>  	strbuf_setlen(base, len);
> 
> And we restore the length of the path in the base to the original
> before we return.
> 
>>  	return 0;
> 
> Returning 0 as opposed to READ_TREE_RECURSIVE here means "we've
> dealt with the entry; don't recurse into subtree even if you called
> us with a tree", which is the right thing to do here, as we did have
> done all we need to here.
> 
> OK, except for that "/-" thing, all of the above makes sense to me.

Thanks for your careful read. Hopefully my explanation of the "/-" thing
helps.

Thanks,
-Stolee

  reply	other threads:[~2022-05-20 18:33 UTC|newest]

Thread overview: 55+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-05-16 18:11 [PATCH 0/8] Sparse index: integrate with sparse-checkout Derrick Stolee via GitGitGadget
2022-05-16 18:11 ` [PATCH 1/8] sparse-index: create expand_to_pattern_list() Derrick Stolee via GitGitGadget
2022-05-16 20:36   ` Victoria Dye
2022-05-16 20:49     ` Derrick Stolee
2022-05-16 18:11 ` [PATCH 2/8] sparse-index: introduce partially-sparse indexes Derrick Stolee via GitGitGadget
2022-05-16 18:11 ` [PATCH 3/8] cache-tree: implement cache_tree_find_path() Derrick Stolee via GitGitGadget
2022-05-16 18:11 ` [PATCH 4/8] sparse-checkout: --no-sparse-index needs a full index Derrick Stolee via GitGitGadget
2022-05-16 18:11 ` [PATCH 5/8] sparse-index: partially expand directories Derrick Stolee via GitGitGadget
2022-05-16 20:36   ` Victoria Dye
2022-05-16 18:11 ` [PATCH 6/8] sparse-index: complete partial expansion Derrick Stolee via GitGitGadget
2022-05-16 20:38   ` Victoria Dye
2022-05-17 13:23     ` Derrick Stolee
2022-05-16 18:11 ` [PATCH 7/8] p2000: add test for 'git sparse-checkout [add|set]' Derrick Stolee via GitGitGadget
2022-05-16 18:11 ` [PATCH 8/8] sparse-checkout: integrate with sparse index Derrick Stolee via GitGitGadget
2022-05-16 20:38   ` Victoria Dye
2022-05-17 13:28     ` Derrick Stolee
2022-05-19 17:52 ` [PATCH v2 00/10] Sparse index: integrate with sparse-checkout Derrick Stolee via GitGitGadget
2022-05-19 17:52   ` [PATCH v2 01/10] t1092: refactor 'sparse-index contents' test Derrick Stolee via GitGitGadget
2022-05-19 17:52   ` [PATCH v2 02/10] t1092: stress test 'git sparse-checkout set' Derrick Stolee via GitGitGadget
2022-05-19 17:52   ` [PATCH v2 03/10] sparse-index: create expand_to_pattern_list() Derrick Stolee via GitGitGadget
2022-05-19 19:50     ` Junio C Hamano
2022-05-20 18:01       ` Derrick Stolee
2022-05-19 17:52   ` [PATCH v2 04/10] sparse-index: introduce partially-sparse indexes Derrick Stolee via GitGitGadget
2022-05-19 20:05     ` Junio C Hamano
2022-05-20 18:05       ` Derrick Stolee
2022-05-20 18:23         ` Junio C Hamano
2022-05-19 17:52   ` [PATCH v2 05/10] cache-tree: implement cache_tree_find_path() Derrick Stolee via GitGitGadget
2022-05-19 20:14     ` Junio C Hamano
2022-05-20 18:13       ` Derrick Stolee
2022-05-19 17:52   ` [PATCH v2 06/10] sparse-checkout: --no-sparse-index needs a full index Derrick Stolee via GitGitGadget
2022-05-19 20:19     ` Junio C Hamano
2022-05-19 17:52   ` [PATCH v2 07/10] sparse-index: partially expand directories Derrick Stolee via GitGitGadget
2022-05-20 18:17     ` Junio C Hamano
2022-05-20 18:33       ` Derrick Stolee [this message]
2022-05-19 17:52   ` [PATCH v2 08/10] sparse-index: complete partial expansion Derrick Stolee via GitGitGadget
2022-05-21  7:45     ` Junio C Hamano
2022-05-23 13:13       ` Derrick Stolee
2022-05-23 13:18         ` Derrick Stolee
2022-05-23 18:01           ` Junio C Hamano
2022-05-23 22:48         ` Junio C Hamano
2022-05-25 14:26           ` Derrick Stolee
2022-05-25 16:32             ` Junio C Hamano
2022-05-19 17:52   ` [PATCH v2 09/10] p2000: add test for 'git sparse-checkout [add|set]' Derrick Stolee via GitGitGadget
2022-05-19 17:52   ` [PATCH v2 10/10] sparse-checkout: integrate with sparse index Derrick Stolee via GitGitGadget
2022-05-23 13:48   ` [PATCH v3 00/10] Sparse index: integrate with sparse-checkout Derrick Stolee via GitGitGadget
2022-05-23 13:48     ` [PATCH v3 01/10] t1092: refactor 'sparse-index contents' test Derrick Stolee via GitGitGadget
2022-05-23 13:48     ` [PATCH v3 02/10] t1092: stress test 'git sparse-checkout set' Derrick Stolee via GitGitGadget
2022-05-23 13:48     ` [PATCH v3 03/10] sparse-index: create expand_index() Derrick Stolee via GitGitGadget
2022-05-23 13:48     ` [PATCH v3 04/10] sparse-index: introduce partially-sparse indexes Derrick Stolee via GitGitGadget
2022-05-23 13:48     ` [PATCH v3 05/10] cache-tree: implement cache_tree_find_path() Derrick Stolee via GitGitGadget
2022-05-23 13:48     ` [PATCH v3 06/10] sparse-checkout: --no-sparse-index needs a full index Derrick Stolee via GitGitGadget
2022-05-23 13:48     ` [PATCH v3 07/10] sparse-index: partially expand directories Derrick Stolee via GitGitGadget
2022-05-23 13:48     ` [PATCH v3 08/10] sparse-index: complete partial expansion Derrick Stolee via GitGitGadget
2022-05-23 13:48     ` [PATCH v3 09/10] p2000: add test for 'git sparse-checkout [add|set]' Derrick Stolee via GitGitGadget
2022-05-23 13:48     ` [PATCH v3 10/10] sparse-checkout: integrate with sparse index Derrick Stolee via GitGitGadget

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=8c093377-74f7-0a1e-fbcc-8a1aedfe3fca@github.com \
    --to=derrickstolee@github.com \
    --cc=dstolee@microsoft.com \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=gitster@pobox.com \
    --cc=shaoxuan.yuan02@gmail.com \
    --cc=vdye@github.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).