From: Elijah Newren <newren@gmail.com>
To: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
Cc: Elijah Newren via GitGitGadget <gitgitgadget@gmail.com>,
Git Mailing List <git@vger.kernel.org>,
Victoria Dye <vdye@github.com>, Derrick Stolee <stolee@gmail.com>,
Lessley Dennington <lessleydennington@gmail.com>
Subject: Re: [PATCH v2 5/5] Accelerate clear_skip_worktree_from_present_files() by caching
Date: Wed, 16 Feb 2022 08:30:42 -0800
Message-ID: <CABPp-BEog_CBEjx3FBGdUAhjwrPPDuP54HWQssAWnGeUnr0cBg@mail.gmail.com>
In-Reply-To: <220216.86fsojup82.gmgdl@evledraar.gmail.com>
On Wed, Feb 16, 2022 at 1:37 AM Ævar Arnfjörð Bjarmason
<avarab@gmail.com> wrote:
>
> On Fri, Jan 14 2022, Elijah Newren via GitGitGadget wrote:
>
> > From: Elijah Newren <newren@gmail.com>
> > [...]
> > +static int path_found(const char *path, const char **dirname, size_t *dir_len,
> > +		      int *dir_found)
> > +{
> > +	struct stat st;
> > +	char *newdir;
> > +	char *tmp;
> > +
> > +	/*
> > +	 * If dirname corresponds to a directory that doesn't exist, and this
> > +	 * path starts with dirname, then path can't exist.
> > +	 */
> > +	if (!*dir_found && !memcmp(path, *dirname, *dir_len))
> > +		return 0;
> > +
> > +	/*
> > +	 * If path itself exists, return 1.
> > +	 */
> > +	if (!lstat(path, &st))
> > +		return 1;
> > +
> > +	/*
> > +	 * Otherwise, path does not exist so we'll return 0...but we'll first
> > +	 * determine some info about its parent directory so we can avoid
> > +	 * lstat calls for future cache entries.
> > +	 */
> > +	newdir = strrchr(path, '/');
> > +	if (!newdir)
> > +		return 0; /* Didn't find a parent dir; just return 0 now. */
> > +
> > +	/*
> > +	 * If path starts with directory (which we already lstat'ed and found),
> > +	 * then no need to lstat parent directory again.
> > +	 */
> > +	if (*dir_found && *dirname && memcmp(path, *dirname, *dir_len))
> > +		return 0;
>
> I really don't care/just asking, but there was a discussion on another
> topic about guarding calls to the mem*() family when n=0:
> https://lore.kernel.org/git/xmqq1r24gsph.fsf@gitster.g/
>
> Is this the same sort of redundancy where we could lose the "&&
> *dirname" part, or is it still important because a "\0" dirname would
> have corresponding non-0 *dir_len?
No, dirname is a char**, not a char*. I need to make sure *dirname is
non-NULL before passing to memcmp or we get segfaults (and *dirname
will be NULL the first time it gets to this line, so the check is
critical).
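As a minimal sketch of that guard in isolation: has_cached_prefix() is a hypothetical helper, not something in the patch, and it uses strncmp() rather than memcmp() so that a path shorter than the cached prefix is never read past its terminator. The point it illustrates is that a NULL cache pointer and an empty string are different cases.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not part of the patch): report whether `path`
 * begins with the cached directory prefix. `dirname` is the value stored
 * behind the char ** cache slot; it stays NULL until a directory has
 * been cached.
 */
static int has_cached_prefix(const char *path, const char *dirname,
			     size_t dir_len)
{
	assert(path != NULL);

	/* NULL means "nothing cached yet"; never pass it to a mem*() or str*() call. */
	if (!dirname)
		return 0;

	/* An empty string is still a valid pointer, so this call is safe. */
	return !strncmp(path, dirname, dir_len);
}
```

With nothing cached (dirname == NULL, dir_len == 0) the helper returns 0 without touching the pointer, whereas an empty-string dirname is a legal argument and trivially matches.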
> More generally ... (see below)...
>
> > +
> > +	/* Free previous dirname, and cache path's dirname */
> > +	*dirname = path;
> > +	*dir_len = newdir - path + 1;
> > +
> > +	tmp = xstrndup(path, *dir_len);
> > +	*dir_found = !lstat(tmp, &st);
>
> In most other places we're a bit more careful about lstat() error handling, e.g.:
>
> builtin/init-db.c: if (lstat(path->buf, &st_git)) {
> builtin/init-db.c- if (errno != ENOENT)
> builtin/init-db.c- die_errno(_("cannot stat '%s'"), path->buf);
> builtin/init-db.c- }
>
> Shouldn't we do the same here and at least error() on return values of
> -1 with an accompanying errno that isn't ENOENT?
If we should do that everywhere, should we have an xlstat in wrapper.[ch]?
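To make the question concrete, here is a rough sketch of what such a wrapper might look like. Everything about it is an assumption: the name xlstat(), the 1/0 return convention, treating ENOTDIR like ENOENT, and exiting with status 128 (imitating die()) are illustrative choices, and the standalone version below uses plain fprintf()/exit() rather than Git's own helpers.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>

/*
 * Hypothetical xlstat(): returns 1 if `path` exists, 0 if it is simply
 * missing, and terminates the program on any other lstat() failure.
 * This is not an existing Git function.
 */
static int xlstat(const char *path, struct stat *st)
{
	if (!lstat(path, st))
		return 1;

	/* A missing path or missing leading directory is an expected outcome. */
	if (errno == ENOENT || errno == ENOTDIR)
		return 0;

	/* Anything else (EACCES, ELOOP, EIO, ...) is reported and treated as fatal. */
	fprintf(stderr, "fatal: cannot lstat '%s': %s\n", path, strerror(errno));
	exit(128);
}
```

A caller in the style of path_found() could then write `*dir_found = xlstat(tmp, &st);` and keep the existing "found or not" logic while no longer silently swallowing unexpected errors.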
> > +	free(tmp);
> > +
> > +	return 0;
> > +}
> > +
> >  void clear_skip_worktree_from_present_files(struct index_state *istate)
> >  {
> > +	const char *last_dirname = NULL;
> > +	size_t dir_len = 0;
> > +	int dir_found = 1;
> > +
> >  	int i;
> > +
> >  	if (!core_apply_sparse_checkout)
> >  		return;
> >
> >  restart:
> >  	for (i = 0; i < istate->cache_nr; i++) {
> >  		struct cache_entry *ce = istate->cache[i];
> > -		struct stat st;
> >
> > -		if (ce_skip_worktree(ce) && !lstat(ce->name, &st)) {
> > +		if (ce_skip_worktree(ce) &&
> > +		    path_found(ce->name, &last_dirname, &dir_len, &dir_found)) {
>
> ...(continued from above) is the "path is zero" part of this even
> reachable? I tried with this on top and ran your tests (and the rest of
> t*sparse*.sh) successfully:
>
> diff --git a/sparse-index.c b/sparse-index.c
> index eed170cd8f7..f89c944d8cd 100644
> --- a/sparse-index.c
> +++ b/sparse-index.c
> @@ -403,6 +403,7 @@ void clear_skip_worktree_from_present_files(struct index_state *istate)
>  	for (i = 0; i < istate->cache_nr; i++) {
>  		struct cache_entry *ce = istate->cache[i];
>
> +		assert(*ce->name);
>  		if (ce_skip_worktree(ce) &&
>  		    path_found(ce->name, &last_dirname, &dir_len, &dir_found)) {
>  			if (S_ISSPARSEDIR(ce->ce_mode)) {
>
> I.e. isn't this undue paranoia about the cache API giving us zero-length
> paths?
Nope, not related at all, for two reasons: the code above was checking
for NULL pointers rather than NUL characters, and the argument I was
checking was last_dirname, not ce->name.
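A small sketch of the two checks side by side may help. The struct and both function names below are hypothetical stand-ins, written only to contrast "is the cached pointer set?" with "is the first character of the name non-NUL?".

```c
#include <stddef.h>

/* Hypothetical stand-in for the one cache_entry field used here. */
struct entry {
	const char *name;
};

/*
 * Mirrors the `*dirname` test in path_found(): dirname is a char **,
 * so dereferencing it once yields a pointer that may be NULL.
 */
static int cache_is_set(const char **dirname)
{
	return *dirname != NULL;
}

/*
 * Mirrors `assert(*ce->name)`: name is a char *, so dereferencing it
 * once yields a character that may be NUL.
 */
static int name_is_nonempty(const struct entry *ce)
{
	return *ce->name != '\0';
}
```

With last_dirname initialized to NULL, cache_is_set(&last_dirname) is 0 on the first loop iteration no matter what ce->name contains; once last_dirname points at any string, even an empty one, it is 1.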