git@vger.kernel.org mailing list mirror (one of many)
From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: Elijah Newren via GitGitGadget <gitgitgadget@gmail.com>
Cc: git@vger.kernel.org, Victoria Dye <vdye@github.com>,
	Derrick Stolee <stolee@gmail.com>,
	Lessley Dennington <lessleydennington@gmail.com>,
	Elijah Newren <newren@gmail.com>
Subject: Re: [PATCH v2 5/5] Accelerate clear_skip_worktree_from_present_files() by caching
Date: Wed, 16 Feb 2022 10:32:12 +0100
Message-ID: <220216.86fsojup82.gmgdl@evledraar.gmail.com>
In-Reply-To: <05ac964e630a2e72eebaa1818a8807cd7a7d4f7e.1642175983.git.gitgitgadget@gmail.com>


On Fri, Jan 14 2022, Elijah Newren via GitGitGadget wrote:

> From: Elijah Newren <newren@gmail.com>
> [...]
> +static int path_found(const char *path, const char **dirname, size_t *dir_len,
> +		      int *dir_found)
> +{
> +	struct stat st;
> +	char *newdir;
> +	char *tmp;
> +
> +	/*
> +	 * If dirname corresponds to a directory that doesn't exist, and this
> +	 * path starts with dirname, then path can't exist.
> +	 */
> +	if (!*dir_found && !memcmp(path, *dirname, *dir_len))
> +		return 0;
> +
> +	/*
> +	 * If path itself exists, return 1.
> +	 */
> +	if (!lstat(path, &st))
> +		return 1;
> +
> +	/*
> +	 * Otherwise, path does not exist so we'll return 0...but we'll first
> +	 * determine some info about its parent directory so we can avoid
> +	 * lstat calls for future cache entries.
> +	 */
> +	newdir = strrchr(path, '/');
> +	if (!newdir)
> +		return 0; /* Didn't find a parent dir; just return 0 now. */
> +
> +	/*
> +	 * If path starts with directory (which we already lstat'ed and found),
> +	 * then no need to lstat parent directory again.
> +	 */
> +	if (*dir_found && *dirname && memcmp(path, *dirname, *dir_len))
> +		return 0;

I don't feel strongly about this and am just asking, but there was a
discussion on another topic about guarding calls to the mem*() family
when n=0:
https://lore.kernel.org/git/xmqq1r24gsph.fsf@gitster.g/

Is this the same sort of redundancy, where we could lose the "&&
*dirname" part, or is it still important because a "\0" dirname would
have a corresponding non-zero *dir_len?
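
For reference, a minimal sketch of what such a guard does. It assumes the
same types as the quoted hunk ("const char **dirname", so "*dirname" is a
pointer) and the initial state set up in the caller (last_dirname = NULL,
dir_len = 0). The helper name starts_with_cached_dir() is made up for
illustration and is not part of the patch. As far as I know the C standard
still wants valid pointers passed to memcmp() even when n is 0, so a NULL
check and an n=0 check are not quite the same thing:

```c
#include <string.h>

/*
 * Hypothetical stand-in for the guarded comparison in path_found().
 * Returns 1 when "path" starts with the cached directory prefix.
 */
static int starts_with_cached_dir(const char *path, const char **dirname,
				  size_t dir_len)
{
	/* "*dirname" is a "const char *": this tests for NULL, not for "" */
	if (!*dirname)
		return 0;

	/* With dir_len == 0 this compares nothing and reports a match */
	return !memcmp(path, *dirname, dir_len);
}
```

With the cached pointer still NULL the helper returns 0 without calling
memcmp() at all; with an empty-string dirname and a zero length it returns
1, since a zero-length comparison always matches.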

More generally ... (see below)...

> +
> +	/* Free previous dirname, and cache path's dirname */
> +	*dirname = path;
> +	*dir_len = newdir - path + 1;
> +
> +	tmp = xstrndup(path, *dir_len);
> +	*dir_found = !lstat(tmp, &st);

In most other places we're a bit more careful about lstat() error handling, e.g.:
    
    builtin/init-db.c:              if (lstat(path->buf, &st_git)) {
    builtin/init-db.c-                      if (errno != ENOENT)
    builtin/init-db.c-                              die_errno(_("cannot stat '%s'"), path->buf);
    builtin/init-db.c-              }
    
Shouldn't we do the same here and at least error() on return values of
-1 with an accompanying errno that isn't ENOENT?
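
Something along these lines is what I have in mind, as a standalone sketch
rather than a proposed patch. The helper name exists_or_warn() is
hypothetical, it uses plain fprintf() where git itself would use
error_errno(), and treating ENOTDIR like ENOENT is my assumption about
what counts as "missing" here:

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/*
 * Hypothetical helper, not part of the patch: returns 1 if "path"
 * exists and 0 if it does not. Any lstat() failure other than
 * "nothing at this path" is reported on stderr before returning 0.
 */
static int exists_or_warn(const char *path)
{
	struct stat st;

	if (!lstat(path, &st))
		return 1;

	/* ENOENT and ENOTDIR are the expected "missing" cases */
	if (errno != ENOENT && errno != ENOTDIR)
		fprintf(stderr, "error: cannot stat '%s': %s\n",
			path, strerror(errno));
	return 0;
}
```

The caller's result is unchanged for the common cases (present or
missing); the only difference is that an unexpected failure such as EACCES
no longer passes silently as "directory not found".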


> +	free(tmp);
> +
> +	return 0;
> +}
> +
>  void clear_skip_worktree_from_present_files(struct index_state *istate)
>  {
> +	const char *last_dirname = NULL;
> +	size_t dir_len = 0;
> +	int dir_found = 1;
> +
>  	int i;
> +
>  	if (!core_apply_sparse_checkout)
>  		return;
>  
>  restart:
>  	for (i = 0; i < istate->cache_nr; i++) {
>  		struct cache_entry *ce = istate->cache[i];
> -		struct stat st;
>  
> -		if (ce_skip_worktree(ce) && !lstat(ce->name, &st)) {
> +		if (ce_skip_worktree(ce) &&
> +		    path_found(ce->name, &last_dirname, &dir_len, &dir_found)) {

...(continued from above) is the "path is zero-length" case even
reachable? I tried with this on top and ran your tests (and the rest of
t*sparse*.sh) successfully:
	
	diff --git a/sparse-index.c b/sparse-index.c
	index eed170cd8f7..f89c944d8cd 100644
	--- a/sparse-index.c
	+++ b/sparse-index.c
	@@ -403,6 +403,7 @@ void clear_skip_worktree_from_present_files(struct index_state *istate)
	 	for (i = 0; i < istate->cache_nr; i++) {
	 		struct cache_entry *ce = istate->cache[i];
	 
	+		assert(*ce->name);
	 		if (ce_skip_worktree(ce) &&
	 		    path_found(ce->name, &last_dirname, &dir_len, &dir_found)) {
	 			if (S_ISSPARSEDIR(ce->ce_mode)) {

I.e. isn't this undue paranoia about the cache API giving us zero-length
paths?
