git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: "René Scharfe" <l.s.r@web.de>
To: Matheus Tavares <matheus.bernardino@usp.br>, git@vger.kernel.org
Cc: newren@gmail.com, gitster@pobox.com, stolee@gmail.com,
	vdye@github.com, derrickstolee@github.com,
	Derrick Stolee via GitGitGadget <gitgitgadget@gmail.com>,
	Sean Christopherson <seanjc@google.com>
Subject: Re: [PATCH v2] add, rm, mv: fix bug that prevents the update of non-sparse dirs
Date: Tue, 26 Oct 2021 18:22:26 +0200	[thread overview]
Message-ID: <ca1c6a86-23ab-57ae-b1ca-64a9851d72db@web.de> (raw)
In-Reply-To: <5e99c039db0b9644fb21f2ea72a464c67a74ff64.1635191000.git.matheus.bernardino@usp.br>

Am 25.10.21 um 23:07 schrieb Matheus Tavares:
> These three commands recently learned to avoid updating paths outside
> the sparse checkout even if they are missing the SKIP_WORKTREE bit. This
> is done using path_in_sparse_checkout(), which checks whether a given
> path matches the current list of sparsity rules, similar to what
> clear_ce_flags() does when we run "git sparse checkout init" or "git
> sparse-checkout reapply". However, clear_ce_flags() uses a recursive
> approach, applying the match results from parent directories on paths
> that get the UNDECIDED result, whereas path_in_sparse_checkout() only
> attempts to match the full path and immediately considers UNDECIDED as
> NOT_MATCHED. This makes the function miss matches with leading
> directories. For example, if the user has the sparsity patterns "!/a"
> and "b/", add, rm, and mv will fail to update the path "a/b/c" and end
> up displaying a warning about it being outside the sparse checkout even
> though it isn't. This problem only occurs in full pattern mode as the
> pattern matching functions never return UNDECIDED for cone mode.
>
> To fix this, replicate the recursive behavior of clear_ce_flags() in
> path_in_sparse_checkout(), falling back to the parent directory match
> when a path gets the UNDECIDED result. (If this turns out to be too
> expensive in some cases, we may want to later add some form of caching
> to accelerate multiple queries within the same directory. This is not
> implemented in this patch, though.) Also add two tests for each affected
> command (add, rm, and mv) to check that they behave correctly with the
> recursive pattern matching. The first test would previously fail without
> this patch while the second already succeeded. It is added mostly to
> make sure that we are not breaking the existing pattern matching for
> directories that are really sparse, and also as a protection against any
> future regressions.
>
> Two other existing tests had to be changed as well: one test in t3602
> checks that "git rm -r <dir>" won't remove sparse entries, but it didn't
> allow the non-sparse entries inside <dir> to be removed. The other one,
> in t7002, tested that "git mv" would correctly display a warning message
> for sparse paths, but it accidentally expected the message to include
> two non-sparse paths as well.
>
> Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
> ---
>
> Changes since RFC/v1 [1]:
>
> - Inverted the loop direction to start from the full path and go backwards in
>   the parent dirs. This way we can stop early when we find the first
>   non-UNDECIDED match result.
>
> - Simplified the implementation by unifing the code path for cone mode and
>   full pattern mode. Since path_matches_pattern_list() never returns UNDECIDED
>   for cone mode, it will always execute only one iteration of the loop and then
>   find the final answer. There is no need to handle this case in a separate
>   block.
>
> - Inside the loop, made sure to change dtype to DT_DIR when going to parent
>   directories. Without this, the pattern match would fail if we had a path
>   like "a/b/c" and the pattern "b/" (with trailing slash).
>
> - Changed the tests to use trailing slash to make sure they cover the corner
>   case described above.
>
> - Improved commit message.
>
> [1]: https://lore.kernel.org/git/80b5ba61861193daf7132aa64b65fc7dde90dacb.1634866698.git.matheus.bernardino@usp.br
> (The RFC was deep down another thread, so I separated v2 to help
> readers. Please, let me know if that is not a good approach and I will
> avoid it in the future.)
>
>  dir.c                          | 25 +++++++++++++++++------
>  t/t3602-rm-sparse-checkout.sh  | 37 +++++++++++++++++++++++++++++++---
>  t/t3705-add-sparse-checkout.sh | 18 +++++++++++++++++
>  t/t7002-mv-sparse-checkout.sh  | 24 ++++++++++++++++++++--
>  4 files changed, 93 insertions(+), 11 deletions(-)
>
> diff --git a/dir.c b/dir.c
> index a4306ab874..248f72e732 100644
> --- a/dir.c
> +++ b/dir.c
> @@ -1504,8 +1504,9 @@ static int path_in_sparse_checkout_1(const char *path,
>  				     struct index_state *istate,
>  				     int require_cone_mode)
>  {
> -	const char *base;
>  	int dtype = DT_REG;
> +	enum pattern_match_result match = UNDECIDED;
> +	const char *end, *slash;
>
>  	/*
>  	 * We default to accepting a path if there are no patterns or
> @@ -1516,11 +1517,23 @@ static int path_in_sparse_checkout_1(const char *path,
>  	     !istate->sparse_checkout_patterns->use_cone_patterns))
>  		return 1;
>
> -	base = strrchr(path, '/');
> -	return path_matches_pattern_list(path, strlen(path), base ? base + 1 : path,
> -					 &dtype,
> -					 istate->sparse_checkout_patterns,
> -					 istate) > 0;
> +	/*
> +	 * If UNDECIDED, use the match from the parent dir (recursively),
> +	 * or fall back to NOT_MATCHED at the topmost level.
> +	 */
> +	for (end = path + strlen(path); end > path && match == UNDECIDED; end = slash) {
> +
> +		for (slash = end - 1; slash >= path && *slash != '/'; slash--)
> +			; /* do nothing */

slash can end up one less than path.  If path points to the first char
of a string object this would be undefined if I read 6.5.6 of C99
correctly.  (A pointer to the array element just after the last one is
specified as fine as long as it's not dereferenced, but a pointer to
the element before the first one is not mentioned and thus undefined.)

Do you really need the ">=" instead of ">"?

> +
> +		match = path_matches_pattern_list(path, end - path,
> +				slash >= path ? slash + 1 : path, &dtype,
> +				istate->sparse_checkout_patterns, istate);
> +
> +		/* We are going to match the parent dir now */
> +		dtype = DT_DIR;
> +	}
> +	return match > 0;
>  }
>
>  int path_in_sparse_checkout(const char *path,

  parent reply	other threads:[~2021-10-26 16:22 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-10-25 21:07 [PATCH v2] add, rm, mv: fix bug that prevents the update of non-sparse dirs Matheus Tavares
2021-10-26 12:53 ` Derrick Stolee
2021-10-26 22:43   ` Matheus Tavares
2021-10-27 11:35     ` Derrick Stolee
2021-10-26 16:22 ` René Scharfe [this message]
2021-10-26 19:04   ` Derrick Stolee
2021-10-27  0:00     ` Matheus Tavares
2021-10-27 22:13     ` Chris Torek
2021-10-27 17:36 ` Sean Christopherson
2021-10-28 14:21 ` [PATCH v3] " Matheus Tavares
2021-10-28 15:06   ` Derrick Stolee
2021-10-28 15:58     ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ca1c6a86-23ab-57ae-b1ca-64a9851d72db@web.de \
    --to=l.s.r@web.de \
    --cc=derrickstolee@github.com \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=gitster@pobox.com \
    --cc=matheus.bernardino@usp.br \
    --cc=newren@gmail.com \
    --cc=seanjc@google.com \
    --cc=stolee@gmail.com \
    --cc=vdye@github.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).