git@vger.kernel.org mailing list mirror (one of many)
From: "Elijah Newren via GitGitGadget" <gitgitgadget@gmail.com>
To: git@vger.kernel.org
Cc: "Martin Melka" <martin.melka@gmail.com>,
	"SZEDER Gábor" <szeder.dev@gmail.com>,
	"Samuel Lijin" <sxlijin@gmail.com>,
	"Nguyễn Thái Ngọc Duy" <pclouds@gmail.com>,
	"Derrick Stolee" <stolee@gmail.com>,
	"Elijah Newren" <newren@gmail.com>,
	"Elijah Newren" <newren@gmail.com>
Subject: [PATCH v5 11/12] Fix error-prone fill_directory() API; make it only return matches
Date: Wed, 01 Apr 2020 04:17:45 +0000
Message-ID: <61d9c9d758e90c6afdaa1605b191f367650d18f0.1585714667.git.gitgitgadget@gmail.com>
In-Reply-To: <pull.700.v5.git.git.1585714667.gitgitgadget@gmail.com>

From: Elijah Newren <newren@gmail.com>

Traditionally, the expected calling convention for the dir.c API was:

    fill_directory(&dir, ..., pathspec)
    foreach entry in dir->entries:
        if (dir_path_match(entry, pathspec))
            process_or_display(entry)

This may have made sense once upon a time, because the fill_directory() call
could use cheap checks to avoid doing full pathspec matching, and an external
caller may have wanted to do other post-processing of the results anyway.
However:

    * this structure makes it easy for users of the API to get it wrong

    * this structure actually makes it harder to understand
      fill_directory() and the functions it uses internally.  It has
      tripped me up several times while trying to fix bugs and
      restructure things.

    * relying on post-filtering was already found to produce wrong
      results; pathspec matching had to be added internally for multiple
      cases in order to get the right results (see commits 404ebceda01c
      (dir: also check directories for matching pathspecs, 2019-09-17)
      and 89a1f4aaf765 (dir: if our pathspec might match files under a
      dir, recurse into it, 2019-09-17))

    * it's bad for performance: fill_directory() already has to do lots
      of checks and knows the subset of cases where it still needs to do
      more checks.  Forcing external callers to do full pathspec
      matching means they must re-check _every_ path.

So, add the pathspec matching within the fill_directory() internals, and
remove it from external callers.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 builtin/clean.c    |  6 ------
 builtin/grep.c     |  2 --
 builtin/ls-files.c |  5 +++--
 builtin/stash.c    | 17 +++++------------
 dir.c              |  9 ++++++++-
 wt-status.c        |  6 ++----
 6 files changed, 18 insertions(+), 27 deletions(-)

diff --git a/builtin/clean.c b/builtin/clean.c
index 5abf087e7c4..b189b7b4ea0 100644
--- a/builtin/clean.c
+++ b/builtin/clean.c
@@ -989,12 +989,6 @@ int cmd_clean(int argc, const char **argv, const char *prefix)
 		if (!cache_name_is_other(ent->name, ent->len))
 			continue;
 
-		if (pathspec.nr)
-			matches = dir_path_match(&the_index, ent, &pathspec, 0, NULL);
-
-		if (pathspec.nr && !matches)
-			continue;
-
 		if (lstat(ent->name, &st))
 			die_errno("Cannot lstat '%s'", ent->name);
 
diff --git a/builtin/grep.c b/builtin/grep.c
index 50ce8d94612..f3425102999 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -691,8 +691,6 @@ static int grep_directory(struct grep_opt *opt, const struct pathspec *pathspec,
 
 	fill_directory(&dir, opt->repo->index, pathspec);
 	for (i = 0; i < dir.nr; i++) {
-		if (!dir_path_match(opt->repo->index, dir.entries[i], pathspec, 0, NULL))
-			continue;
 		hit |= grep_file(opt, dir.entries[i]->name);
 		if (hit && opt->status_only)
 			break;
diff --git a/builtin/ls-files.c b/builtin/ls-files.c
index f069a028cea..b87c22ac240 100644
--- a/builtin/ls-files.c
+++ b/builtin/ls-files.c
@@ -128,8 +128,9 @@ static void show_dir_entry(const struct index_state *istate,
 	if (len > ent->len)
 		die("git ls-files: internal error - directory entry not superset of prefix");
 
-	if (!dir_path_match(istate, ent, &pathspec, len, ps_matched))
-		return;
+	/* If ps_matched is non-NULL, figure out which pathspec(s) match. */
+	if (ps_matched)
+		dir_path_match(istate, ent, &pathspec, len, ps_matched);
 
 	fputs(tag, stdout);
 	write_eolinfo(istate, NULL, ent->name);
diff --git a/builtin/stash.c b/builtin/stash.c
index 4ad3adf4ba5..704740b245c 100644
--- a/builtin/stash.c
+++ b/builtin/stash.c
@@ -856,30 +856,23 @@ static int get_untracked_files(const struct pathspec *ps, int include_untracked,
 			       struct strbuf *untracked_files)
 {
 	int i;
-	int max_len;
 	int found = 0;
-	char *seen;
 	struct dir_struct dir;
 
 	memset(&dir, 0, sizeof(dir));
 	if (include_untracked != INCLUDE_ALL_FILES)
 		setup_standard_excludes(&dir);
 
-	seen = xcalloc(ps->nr, 1);
-
-	max_len = fill_directory(&dir, the_repository->index, ps);
+	fill_directory(&dir, the_repository->index, ps);
 	for (i = 0; i < dir.nr; i++) {
 		struct dir_entry *ent = dir.entries[i];
-		if (dir_path_match(&the_index, ent, ps, max_len, seen)) {
-			found++;
-			strbuf_addstr(untracked_files, ent->name);
-			/* NUL-terminate: will be fed to update-index -z */
-			strbuf_addch(untracked_files, '\0');
-		}
+		found++;
+		strbuf_addstr(untracked_files, ent->name);
+		/* NUL-terminate: will be fed to update-index -z */
+		strbuf_addch(untracked_files, '\0');
 		free(ent);
 	}
 
-	free(seen);
 	free(dir.entries);
 	free(dir.ignored);
 	clear_directory(&dir);
diff --git a/dir.c b/dir.c
index a67930dcff6..2de64910401 100644
--- a/dir.c
+++ b/dir.c
@@ -2117,7 +2117,14 @@ static enum path_treatment treat_path(struct dir_struct *dir,
 				       baselen, excluded, pathspec);
 	case DT_REG:
 	case DT_LNK:
-		return excluded ? path_excluded : path_untracked;
+		if (excluded)
+			return path_excluded;
+		if (pathspec &&
+		    !do_match_pathspec(istate, pathspec, path->buf, path->len,
+				       0 /* prefix */, NULL /* seen */,
+				       0 /* flags */))
+			return path_none;
+		return path_untracked;
 	}
 }
 
diff --git a/wt-status.c b/wt-status.c
index cc6f94504d9..98dfa6f73f9 100644
--- a/wt-status.c
+++ b/wt-status.c
@@ -722,16 +722,14 @@ static void wt_status_collect_untracked(struct wt_status *s)
 
 	for (i = 0; i < dir.nr; i++) {
 		struct dir_entry *ent = dir.entries[i];
-		if (index_name_is_other(istate, ent->name, ent->len) &&
-		    dir_path_match(istate, ent, &s->pathspec, 0, NULL))
+		if (index_name_is_other(istate, ent->name, ent->len))
 			string_list_insert(&s->untracked, ent->name);
 		free(ent);
 	}
 
 	for (i = 0; i < dir.ignored_nr; i++) {
 		struct dir_entry *ent = dir.ignored[i];
-		if (index_name_is_other(istate, ent->name, ent->len) &&
-		    dir_path_match(istate, ent, &s->pathspec, 0, NULL))
+		if (index_name_is_other(istate, ent->name, ent->len))
 			string_list_insert(&s->ignored, ent->name);
 		free(ent);
 	}
-- 
gitgitgadget



Thread overview: 76+ messages
2020-01-29 22:03 [PATCH 0/6] Avoid multiple recursive calls for same path in read_directory_recursive() Elijah Newren via GitGitGadget
2020-01-29 22:03 ` [PATCH 1/6] dir: consolidate treat_path() and treat_one_path() Elijah Newren via GitGitGadget
2020-01-29 22:03 ` [PATCH 2/6] dir: fix broken comment Elijah Newren via GitGitGadget
2020-01-29 22:03 ` [PATCH 3/6] dir: fix confusion based on variable tense Elijah Newren via GitGitGadget
2020-01-30 15:20   ` Derrick Stolee
2020-01-31 18:04   ` SZEDER Gábor
2020-01-31 18:17     ` Elijah Newren
2020-01-29 22:03 ` [PATCH 4/6] dir: move setting of nested_repo next to its actual usage Elijah Newren via GitGitGadget
2020-01-30 15:33   ` Derrick Stolee
2020-01-30 15:45     ` Elijah Newren
2020-01-30 16:00       ` Derrick Stolee
2020-01-30 16:10         ` Derrick Stolee
2020-01-30 16:20           ` Elijah Newren
2020-01-30 18:17             ` Derrick Stolee
2020-01-29 22:03 ` [PATCH 5/6] dir: replace exponential algorithm with a linear one Elijah Newren via GitGitGadget
2020-01-30 15:55   ` Derrick Stolee
2020-01-30 17:13     ` Elijah Newren
2020-01-30 17:45       ` Elijah Newren
2020-01-31 17:13   ` SZEDER Gábor
2020-01-31 17:47     ` Elijah Newren
2020-01-29 22:03 ` [PATCH 6/6] t7063: blindly accept diffs Elijah Newren via GitGitGadget
2020-01-31 18:31 ` [PATCH v2 0/6] Avoid multiple recursive calls for same path in read_directory_recursive() Elijah Newren via GitGitGadget
2020-01-31 18:31   ` [PATCH v2 1/6] dir: consolidate treat_path() and treat_one_path() Elijah Newren via GitGitGadget
2020-01-31 18:31   ` [PATCH v2 2/6] dir: fix broken comment Elijah Newren via GitGitGadget
2020-01-31 18:31   ` [PATCH v2 3/6] dir: fix confusion based on variable tense Elijah Newren via GitGitGadget
2020-01-31 18:31   ` [PATCH v2 4/6] dir: refactor treat_directory to clarify control flow Derrick Stolee via GitGitGadget
2020-01-31 18:31   ` [PATCH v2 5/6] dir: replace exponential algorithm with a linear one Elijah Newren via GitGitGadget
2020-01-31 18:31   ` [PATCH v2 6/6] t7063: blindly accept diffs Elijah Newren via GitGitGadget
2020-03-25 19:31   ` [PATCH v3 0/7] Avoid multiple recursive calls for same path in read_directory_recursive() Elijah Newren via GitGitGadget
2020-03-25 19:31     ` [PATCH v3 1/7] t7063: correct broken test expectation Elijah Newren via GitGitGadget
2020-03-26 13:02       ` Derrick Stolee
2020-03-26 21:18         ` Elijah Newren
2020-03-25 19:31     ` [PATCH v3 2/7] dir: fix simple typo in comment Elijah Newren via GitGitGadget
2020-03-25 19:31     ` [PATCH v3 3/7] dir: consolidate treat_path() and treat_one_path() Elijah Newren via GitGitGadget
2020-03-25 19:31     ` [PATCH v3 4/7] dir: fix broken comment Elijah Newren via GitGitGadget
2020-03-25 19:31     ` [PATCH v3 5/7] dir: fix confusion based on variable tense Elijah Newren via GitGitGadget
2020-03-25 19:31     ` [PATCH v3 6/7] dir: refactor treat_directory to clarify control flow Derrick Stolee via GitGitGadget
2020-03-25 19:31     ` [PATCH v3 7/7] dir: replace exponential algorithm with a linear one, fix untracked cache Elijah Newren via GitGitGadget
2020-03-26 13:13       ` Derrick Stolee
2020-03-26 21:27     ` [PATCH v4 0/7] Avoid multiple recursive calls for same path in read_directory_recursive() Elijah Newren via GitGitGadget
2020-03-26 21:27       ` [PATCH v4 1/7] t7063: more thorough status checking Elijah Newren via GitGitGadget
2020-03-27 13:09         ` Derrick Stolee
2020-03-29 18:18           ` Junio C Hamano
2020-03-31 20:15             ` Elijah Newren
2020-03-26 21:27       ` [PATCH v4 2/7] dir: fix simple typo in comment Elijah Newren via GitGitGadget
2020-03-26 21:27       ` [PATCH v4 3/7] dir: consolidate treat_path() and treat_one_path() Elijah Newren via GitGitGadget
2020-03-26 21:27       ` [PATCH v4 4/7] dir: fix broken comment Elijah Newren via GitGitGadget
2020-03-26 21:27       ` [PATCH v4 5/7] dir: fix confusion based on variable tense Elijah Newren via GitGitGadget
2020-03-26 21:27       ` [PATCH v4 6/7] dir: refactor treat_directory to clarify control flow Derrick Stolee via GitGitGadget
2020-03-26 21:27       ` [PATCH v4 7/7] dir: replace exponential algorithm with a linear one Elijah Newren via GitGitGadget
2020-03-27 13:13       ` [PATCH v4 0/7] Avoid multiple recursive calls for same path in read_directory_recursive() Derrick Stolee
2020-03-28 17:33         ` Elijah Newren
2020-03-29 18:20           ` Junio C Hamano
2020-04-01  4:17       ` [PATCH v5 00/12] " Elijah Newren via GitGitGadget
2020-04-01  4:17         ` [PATCH v5 01/12] t7063: more thorough status checking Elijah Newren via GitGitGadget
2020-04-01  4:17         ` [PATCH v5 02/12] t3000: add more testcases testing a variety of ls-files issues Elijah Newren via GitGitGadget
2020-04-01  4:17         ` [PATCH v5 03/12] dir: fix simple typo in comment Elijah Newren via GitGitGadget
2020-04-01  4:17         ` [PATCH v5 04/12] dir: consolidate treat_path() and treat_one_path() Elijah Newren via GitGitGadget
2020-04-01  4:17         ` [PATCH v5 05/12] dir: fix broken comment Elijah Newren via GitGitGadget
2020-04-01  4:17         ` [PATCH v5 06/12] dir: fix confusion based on variable tense Elijah Newren via GitGitGadget
2020-04-01  4:17         ` [PATCH v5 07/12] dir: refactor treat_directory to clarify control flow Derrick Stolee via GitGitGadget
2020-04-01  4:17         ` [PATCH v5 08/12] dir: replace exponential algorithm with a linear one Elijah Newren via GitGitGadget
2020-04-01 13:57           ` Derrick Stolee
2020-04-01 15:59             ` Elijah Newren
2020-04-01  4:17         ` [PATCH v5 09/12] dir: include DIR_KEEP_UNTRACKED_CONTENTS handling in treat_directory() Elijah Newren via GitGitGadget
2020-04-01  4:17         ` [PATCH v5 10/12] dir: replace double pathspec matching with single " Elijah Newren via GitGitGadget
2020-04-01  4:17         ` Elijah Newren via GitGitGadget [this message]
2020-07-19  6:33           ` [PATCH v5 11/12] Fix error-prone fill_directory() API; make it only return matches Andreas Schwab
2020-07-19 12:39             ` Martin Ågren
2020-07-20 15:25               ` Elijah Newren
2020-07-20 18:45                 ` [PATCH] dir: check pathspecs before returning `path_excluded` Martin Ågren
2020-07-20 18:49                   ` Elijah Newren
2020-07-20 18:51                     ` Martin Ågren
2020-07-20 20:25                   ` Junio C Hamano
2020-07-20 18:58                 ` [PATCH v5 11/12] Fix error-prone fill_directory() API; make it only return matches Junio C Hamano
2020-04-01  4:17         ` [PATCH v5 12/12] completion: fix 'git add' on paths under an untracked directory Elijah Newren via GitGitGadget
