git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Junio C Hamano <gitster@pobox.com>
To: "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com>
Cc: git@vger.kernel.org, vdye@github.com, shaoxuan.yuan02@gmail.com,
	Derrick Stolee <derrickstolee@github.com>,
	Derrick Stolee <dstolee@microsoft.com>
Subject: Re: [PATCH v2 07/10] sparse-index: partially expand directories
Date: Fri, 20 May 2022 11:17:32 -0700	[thread overview]
Message-ID: <xmqqbkvshxvn.fsf@gitster.g> (raw)
In-Reply-To: <346c56bf2560c5a89850ef4f8a58fbe17cde10fc.1652982759.git.gitgitgadget@gmail.com> (Derrick Stolee via GitGitGadget's message of "Thu, 19 May 2022 17:52:35 +0000")

"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> @@ -231,18 +236,41 @@ static int add_path_to_index(const struct object_id *oid,
>  			     struct strbuf *base, const char *path,
>  			     unsigned int mode, void *context)
>  {

OK, this function is a callback of read_tree_at(), whose caller is
walking the source (=current) index, and while it is copying
existing non-directory entries to the destination (=ctx->write)
index, it found a tree in the source in-core index that does not
match the cone pattern (or we are expanding fully).  Our job is to
expand the given tree as a "subdirectory" in the project into the
destination index.

We used to just let the calling read_tree_at() fully and recursively
walk the tree while adding any non-tree entries to the destination.
Now, we have the pattern list in the context, we go more selective.

The design makes sense.

> +	struct modify_index_context *ctx = (struct modify_index_context *)context;
>  	struct cache_entry *ce;
>  	size_t len = base->len;
>  
> +	if (S_ISDIR(mode)) {
> +		int dtype;
> +		size_t baselen = base->len;
> +		if (!ctx->pl)
> +			return READ_TREE_RECURSIVE;

Fully recursive case.  We can do without any match, just like the
caller (i.e. expand_to_pattern_list) did, when the pattern list is
NULL.

> +		/*
> +		 * Have we expanded to a point outside of the sparse-checkout?
> +		 */
> +		strbuf_addstr(base, path);
> +		strbuf_add(base, "/-", 2);
> +
> +		if (path_matches_pattern_list(base->buf, base->len,
> +					      NULL, &dtype,
> +					      ctx->pl, ctx->write)) {

If that sample path in the directory matches (MATCHED or
MATCHED_RECURSIVE) the patterns, we recurse into the tree to expand.

Can the function return UNDECIDED here?  If so what should happen?
As written, the code will behave as if it matched, and it is not
quite clear if that is the behaviour intended by this patch or an
accident.

> +			strbuf_setlen(base, baselen);
> +			return READ_TREE_RECURSIVE;
> +		}

The caller found a tree at path <path> in the index.  We check if
our patterns match a fictitious path <path> + "/-", which may exist
if the <path> is a directory and if there is a funny named file or
directory "-" in it.

Why "-"?  Are we trying ot see if "ctx->pl" matches "anything" in
the directory that is at <path>?  Is the assumption here that pl
only names directories literally without blobs (I presume that is
the same thing as assuming the cone mode)?

I am trying to see if there is a way that expresses the intention of
this code more clearly than using an arbitrary path "-" (and trying
to figure out if I got the intention right in the first place ;-)..

> +		/*
> +		 * The path "{base}{path}/" is a sparse directory. Create the correct
> +		 * name for inserting the entry into the index.
> +		 */
> +		strbuf_setlen(base, base->len - 1);

This removes that phoney "-" while keeping the trailing "/".  Just
like "-" was unclear, understanding this "- 1" requires that the
reader understands why "/-" was used earlier.

The resulting "base" is used in the newly created cache entry to
represent the (unexpanded) directory below, and such a cache entry
is supposed to have a path with a trailing slash, so it makes sense.

> +	} else {
> +		strbuf_addstr(base, path);

For any non-tree thing, we use the given path for the cache entry to
represent it.

> +	}
> +
> +	ce = make_cache_entry(ctx->write, mode, oid, base->buf, 0, 0);
>  	ce->ce_flags |= CE_SKIP_WORKTREE | CE_EXTENDED;
> -	set_index_entry(istate, istate->cache_nr++, ce);
> +	set_index_entry(ctx->write, ctx->write->cache_nr++, ce);

And the cache entry (newly created) is added to the destination.
Unlike what happens in the caller (i.e. expand_to_pattern_list) that
moves the cache entry taken from the source index to the destination
index, the caller will discard the cache entry taken from the source
index after read_tree_at() returns, and we create a new instance for
ourselves here, even if we _could_ have reused it (by passing it in
the context structure, for example).

>  	strbuf_setlen(base, len);

And we restore the length of the path in the base to the original
before we return.

>  	return 0;

Returning 0 as opposed to READ_TREE_RECURSIVE here means "we've
dealt with the entry; don't recurse into subtree even if you called
us with a tree", which is the right thing to do here, as we did have
done all we need to here.

OK, except for that "/-" thing, all of the above makes sense to me.

Thanks.

> @@ -254,6 +282,7 @@ void expand_to_pattern_list(struct index_state *istate,
>  	int i;
>  	struct index_state *full;
>  	struct strbuf base = STRBUF_INIT;
> +	struct modify_index_context ctx;
>  
>  	/*
>  	 * If the index is already full, then keep it full. We will convert
> @@ -294,6 +323,9 @@ void expand_to_pattern_list(struct index_state *istate,
>  	full->cache_nr = 0;
>  	ALLOC_ARRAY(full->cache, full->cache_alloc);
>  
> +	ctx.write = full;
> +	ctx.pl = pl;
> +
>  	for (i = 0; i < istate->cache_nr; i++) {
>  		struct cache_entry *ce = istate->cache[i];
>  		struct tree *tree;
> @@ -319,7 +351,7 @@ void expand_to_pattern_list(struct index_state *istate,
>  		strbuf_add(&base, ce->name, strlen(ce->name));
>  
>  		read_tree_at(istate->repo, tree, &base, &ps,
> -			     add_path_to_index, full);
> +			     add_path_to_index, &ctx);
>  
>  		/* free directory entries. full entries are re-used */
>  		discard_cache_entry(ce);

  reply	other threads:[~2022-05-20 18:18 UTC|newest]

Thread overview: 55+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-05-16 18:11 [PATCH 0/8] Sparse index: integrate with sparse-checkout Derrick Stolee via GitGitGadget
2022-05-16 18:11 ` [PATCH 1/8] sparse-index: create expand_to_pattern_list() Derrick Stolee via GitGitGadget
2022-05-16 20:36   ` Victoria Dye
2022-05-16 20:49     ` Derrick Stolee
2022-05-16 18:11 ` [PATCH 2/8] sparse-index: introduce partially-sparse indexes Derrick Stolee via GitGitGadget
2022-05-16 18:11 ` [PATCH 3/8] cache-tree: implement cache_tree_find_path() Derrick Stolee via GitGitGadget
2022-05-16 18:11 ` [PATCH 4/8] sparse-checkout: --no-sparse-index needs a full index Derrick Stolee via GitGitGadget
2022-05-16 18:11 ` [PATCH 5/8] sparse-index: partially expand directories Derrick Stolee via GitGitGadget
2022-05-16 20:36   ` Victoria Dye
2022-05-16 18:11 ` [PATCH 6/8] sparse-index: complete partial expansion Derrick Stolee via GitGitGadget
2022-05-16 20:38   ` Victoria Dye
2022-05-17 13:23     ` Derrick Stolee
2022-05-16 18:11 ` [PATCH 7/8] p2000: add test for 'git sparse-checkout [add|set]' Derrick Stolee via GitGitGadget
2022-05-16 18:11 ` [PATCH 8/8] sparse-checkout: integrate with sparse index Derrick Stolee via GitGitGadget
2022-05-16 20:38   ` Victoria Dye
2022-05-17 13:28     ` Derrick Stolee
2022-05-19 17:52 ` [PATCH v2 00/10] Sparse index: integrate with sparse-checkout Derrick Stolee via GitGitGadget
2022-05-19 17:52   ` [PATCH v2 01/10] t1092: refactor 'sparse-index contents' test Derrick Stolee via GitGitGadget
2022-05-19 17:52   ` [PATCH v2 02/10] t1092: stress test 'git sparse-checkout set' Derrick Stolee via GitGitGadget
2022-05-19 17:52   ` [PATCH v2 03/10] sparse-index: create expand_to_pattern_list() Derrick Stolee via GitGitGadget
2022-05-19 19:50     ` Junio C Hamano
2022-05-20 18:01       ` Derrick Stolee
2022-05-19 17:52   ` [PATCH v2 04/10] sparse-index: introduce partially-sparse indexes Derrick Stolee via GitGitGadget
2022-05-19 20:05     ` Junio C Hamano
2022-05-20 18:05       ` Derrick Stolee
2022-05-20 18:23         ` Junio C Hamano
2022-05-19 17:52   ` [PATCH v2 05/10] cache-tree: implement cache_tree_find_path() Derrick Stolee via GitGitGadget
2022-05-19 20:14     ` Junio C Hamano
2022-05-20 18:13       ` Derrick Stolee
2022-05-19 17:52   ` [PATCH v2 06/10] sparse-checkout: --no-sparse-index needs a full index Derrick Stolee via GitGitGadget
2022-05-19 20:19     ` Junio C Hamano
2022-05-19 17:52   ` [PATCH v2 07/10] sparse-index: partially expand directories Derrick Stolee via GitGitGadget
2022-05-20 18:17     ` Junio C Hamano [this message]
2022-05-20 18:33       ` Derrick Stolee
2022-05-19 17:52   ` [PATCH v2 08/10] sparse-index: complete partial expansion Derrick Stolee via GitGitGadget
2022-05-21  7:45     ` Junio C Hamano
2022-05-23 13:13       ` Derrick Stolee
2022-05-23 13:18         ` Derrick Stolee
2022-05-23 18:01           ` Junio C Hamano
2022-05-23 22:48         ` Junio C Hamano
2022-05-25 14:26           ` Derrick Stolee
2022-05-25 16:32             ` Junio C Hamano
2022-05-19 17:52   ` [PATCH v2 09/10] p2000: add test for 'git sparse-checkout [add|set]' Derrick Stolee via GitGitGadget
2022-05-19 17:52   ` [PATCH v2 10/10] sparse-checkout: integrate with sparse index Derrick Stolee via GitGitGadget
2022-05-23 13:48   ` [PATCH v3 00/10] Sparse index: integrate with sparse-checkout Derrick Stolee via GitGitGadget
2022-05-23 13:48     ` [PATCH v3 01/10] t1092: refactor 'sparse-index contents' test Derrick Stolee via GitGitGadget
2022-05-23 13:48     ` [PATCH v3 02/10] t1092: stress test 'git sparse-checkout set' Derrick Stolee via GitGitGadget
2022-05-23 13:48     ` [PATCH v3 03/10] sparse-index: create expand_index() Derrick Stolee via GitGitGadget
2022-05-23 13:48     ` [PATCH v3 04/10] sparse-index: introduce partially-sparse indexes Derrick Stolee via GitGitGadget
2022-05-23 13:48     ` [PATCH v3 05/10] cache-tree: implement cache_tree_find_path() Derrick Stolee via GitGitGadget
2022-05-23 13:48     ` [PATCH v3 06/10] sparse-checkout: --no-sparse-index needs a full index Derrick Stolee via GitGitGadget
2022-05-23 13:48     ` [PATCH v3 07/10] sparse-index: partially expand directories Derrick Stolee via GitGitGadget
2022-05-23 13:48     ` [PATCH v3 08/10] sparse-index: complete partial expansion Derrick Stolee via GitGitGadget
2022-05-23 13:48     ` [PATCH v3 09/10] p2000: add test for 'git sparse-checkout [add|set]' Derrick Stolee via GitGitGadget
2022-05-23 13:48     ` [PATCH v3 10/10] sparse-checkout: integrate with sparse index Derrick Stolee via GitGitGadget

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=xmqqbkvshxvn.fsf@gitster.g \
    --to=gitster@pobox.com \
    --cc=derrickstolee@github.com \
    --cc=dstolee@microsoft.com \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=shaoxuan.yuan02@gmail.com \
    --cc=vdye@github.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).