git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: "Elijah Newren via GitGitGadget" <gitgitgadget@gmail.com>
To: git@vger.kernel.org
Cc: Jeff King <peff@peff.net>,
	Johannes Schindelin <Johannes.Schindelin@gmx.de>,
	Derrick Stolee <dstolee@microsoft.com>,
	Kevin.Willford@microsoft.com, Kyle Meyer <kyle@kyleam.com>,
	Jonathan Nieder <jrnieder@gmail.com>,
	Elijah Newren <newren@gmail.com>,
	Elijah Newren <newren@gmail.com>
Subject: [PATCH v3 2/4] dir: treat_leading_path() and read_directory_recursive(), round 2
Date: Thu, 16 Jan 2020 20:21:54 +0000	[thread overview]
Message-ID: <ea95186774762a5e1cf3cb882cff45fc904e3bf6.1579206117.git.gitgitgadget@gmail.com> (raw)
In-Reply-To: <pull.692.v3.git.git.1579206117.gitgitgadget@gmail.com>

From: Elijah Newren <newren@gmail.com>

I was going to title this "dir: more synchronizing of
treat_leading_path() and read_directory_recursive()", a nod to commit
777b42034764 ("dir: synchronize treat_leading_path() and
read_directory_recursive()", 2019-12-19), but the title was too long.

Anyway, first the backstory...

fill_directory() has always had a slightly error-prone interface: it
returns a subset of paths which *might* match the specified pathspec; it
was intended to prune away some paths which didn't match the specified
pathspec and keep at least all the ones that did match it.  Given this
interface, callers were responsible to post-process the results and
check whether each actually matched the pathspec.

builtin/clean.c did this.  It would first prune out duplicates (e.g. if
"dir" was returned as well as all files under "dir/", then it would
simplify this to just "dir"), and after pruning duplicates it would
compare the remaining paths to the specified pathspec(s).  This
post-processing itself could run into problems, though, as noted in
commit 404ebceda01c ("dir: also check directories for matching
pathspecs", 2019-09-17):

    For the case of git-clean and a set of pathspecs of "dir/file" and
    "more", this caused a problem because we'd end up with dir entries
    for both of
      "dir"
      "dir/file"
    Then correct_untracked_entries() would try to helpfully prune
    duplicates for us by removing "dir/file" since it's under "dir",
    leaving us with
      "dir"
    Since the original pathspec only had "dir/file", the only entry left
    doesn't match and leaves nothing to be removed.  (Note that if only
    one pathspec was specified, e.g. only "dir/file", then the
    common_prefix_len optimizations in fill_directory would cause us to
    bypass this problem, making it appear in simple tests that we could
    correctly remove manually specified pathspecs.)

That commit fixed the issue -- when multiple pathspecs were specified --
by making sure fill_directory() wouldn't return both "dir" and
"dir/file" outside the common_prefix_len optimization path.  This is
where it starts to get fun.

In commit b9670c1f5e6b ("dir: fix checks on common prefix directory",
2019-12-19), we noticed that the common_prefix_len wasn't doing
appropriate checks and letting all kinds of stuff through, resulting in
recursing into .git/ directories and other craziness.  So it started
locking down and doing checks on pathnames within that code path.  That
continued with commit 777b42034764 ("dir: synchronize
treat_leading_path() and read_directory_recursive()", 2019-12-19), which
noted the following:

    Our optimization to avoid calling into read_directory_recursive()
    when all pathspecs have a common leading directory mean that we need
    to match the logic that read_directory_recursive() would use if we
    had just called it from the root.  Since it does more than call
    treat_path() we need to copy that same logic.

...and then it more forcefully addressed the issue with this wonderfully
ironic statement:

    Needing to duplicate logic like this means it is guaranteed someone
    will eventually need to make further changes and forget to update
    both locations.  It is tempting to just nuke the leading_directory
    special casing to avoid such bugs and simplify the code, but
    unpack_trees' verify_clean_subdirectory() also calls
    read_directory() and does so with a non-empty leading path, so I'm
    hesitant to try to restructure further.  Add obnoxious warnings to
    treat_leading_path() and read_directory_recursive() to try to warn
    people of such problems.

You would think that with such a strongly worded description, that its
author would have actually ensured that the logic in
treat_leading_path() and read_directory_recursive() did actually match
and that *everything* that was needed had at least been copied over at
the time that this paragraph was written.  But you'd be wrong, I messed
it up by missing part of the logic.

Copy the missing bits to fix the new final test in t7300.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 dir.c            | 4 ++++
 t/t7300-clean.sh | 2 +-
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/dir.c b/dir.c
index 7d255227b1..5d4c92d3aa 100644
--- a/dir.c
+++ b/dir.c
@@ -2383,6 +2383,10 @@ static int treat_leading_path(struct dir_struct *dir,
 		    (dir->flags & DIR_SHOW_IGNORED_TOO ||
 		     do_match_pathspec(istate, pathspec, sb.buf, sb.len,
 				       baselen, NULL, DO_MATCH_LEADING_PATHSPEC) == MATCHED_RECURSIVELY_LEADING_PATHSPEC)) {
+			if (!match_pathspec(istate, pathspec, sb.buf, sb.len,
+					    0 /* prefix */, NULL,
+					    0 /* do NOT special case dirs */))
+				state = path_none;
 			add_path_to_appropriate_result_list(dir, NULL, &cdir,
 							    istate,
 							    &sb, baselen,
diff --git a/t/t7300-clean.sh b/t/t7300-clean.sh
index 782e125c89..cb5e34d94c 100755
--- a/t/t7300-clean.sh
+++ b/t/t7300-clean.sh
@@ -737,7 +737,7 @@ test_expect_success MINGW 'handle clean & core.longpaths = false nicely' '
 	test_i18ngrep "too long" .git/err
 '
 
-test_expect_failure 'clean untracked paths by pathspec' '
+test_expect_success 'clean untracked paths by pathspec' '
 	git init untracked &&
 	mkdir untracked/dir &&
 	echo >untracked/dir/file.txt &&
-- 
gitgitgadget


  parent reply	other threads:[~2020-01-16 20:22 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-01-14 16:32 [PATCH] dir: restructure in a way to avoid passing around a struct dirent Elijah Newren via GitGitGadget
2020-01-14 21:07 ` Johannes Schindelin
2020-01-14 21:13   ` Elijah Newren
2020-01-14 22:03 ` Jeff King
2020-01-15 14:21 ` [PATCH v2] " Elijah Newren via GitGitGadget
2020-01-15 20:21   ` [PATCH] dir: point treat_leading_path() warning to the right place Jeff King
2020-01-15 20:28     ` Elijah Newren
2020-01-16 20:21   ` [PATCH v3 0/4] dir: more fill_directory() fixes Elijah Newren via GitGitGadget
2020-01-16 20:21     ` [PATCH v3 1/4] clean: demonstrate a bug with pathspecs Derrick Stolee via GitGitGadget
2020-01-17 15:20       ` Derrick Stolee
2020-01-17 16:53         ` Elijah Newren
2020-01-16 20:21     ` Elijah Newren via GitGitGadget [this message]
2020-01-17 15:25       ` [PATCH v3 2/4] dir: treat_leading_path() and read_directory_recursive(), round 2 Derrick Stolee
2020-01-16 20:21     ` [PATCH v3 3/4] dir: restructure in a way to avoid passing around a struct dirent Jeff King via GitGitGadget
2020-01-16 20:21     ` [PATCH v3 4/4] dir: point treat_leading_path() warning to the right place Jeff King via GitGitGadget

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ea95186774762a5e1cf3cb882cff45fc904e3bf6.1579206117.git.gitgitgadget@gmail.com \
    --to=gitgitgadget@gmail.com \
    --cc=Johannes.Schindelin@gmx.de \
    --cc=Kevin.Willford@microsoft.com \
    --cc=dstolee@microsoft.com \
    --cc=git@vger.kernel.org \
    --cc=jrnieder@gmail.com \
    --cc=kyle@kyleam.com \
    --cc=newren@gmail.com \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).