git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH 0/4] Clean cleanups
@ 2020-06-11  6:59 Elijah Newren via GitGitGadget
  2020-06-11  6:59 ` [PATCH 1/4] dir: fix a few confusing comments Elijah Newren via GitGitGadget
                   ` (3 more replies)
  0 siblings, 4 replies; 7+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-06-11  6:59 UTC (permalink / raw)
  To: git; +Cc: bmalehorn, Elijah Newren

This fixes the performance regression reported at [1], and also performs a
few other cleanups found while investigating the issue.

[1] 
https://lore.kernel.org/git/CAJB88a23uU2WfB0mnB9NfNbtgmABhNOWNOEMBt7rRVu7uL_C9A@mail.gmail.com/

Elijah Newren (4):
  dir: fix a few confusing comments
  dir, clean: avoid disallowed behavior
  clean: consolidate handling of ignored parameters
  clean: optimize and document cases where we recurse into
    subdirectories

 builtin/clean.c | 49 ++++++++++++++++++++++++++++++++++++++-----------
 dir.c           | 15 ++++++++++-----
 2 files changed, 48 insertions(+), 16 deletions(-)


base-commit: 20514004ddf1a3528de8933bc32f284e175e1012
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-806%2Fnewren%2Fclean-cleanups-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-806/newren/clean-cleanups-v1
Pull-Request: https://github.com/git/git/pull/806
-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH 1/4] dir: fix a few confusing comments
  2020-06-11  6:59 [PATCH 0/4] Clean cleanups Elijah Newren via GitGitGadget
@ 2020-06-11  6:59 ` Elijah Newren via GitGitGadget
  2020-06-11  6:59 ` [PATCH 2/4] dir, clean: avoid disallowed behavior Elijah Newren via GitGitGadget
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 7+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-06-11  6:59 UTC (permalink / raw)
  To: git; +Cc: bmalehorn, Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 dir.c | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/dir.c b/dir.c
index d97e9558489..6fb2f8ecdd7 100644
--- a/dir.c
+++ b/dir.c
@@ -1820,7 +1820,7 @@ static enum path_treatment treat_directory(struct dir_struct *dir,
 	 * to recurse into untracked/ignored directories if either of the
 	 * following bits is set:
 	 *   - DIR_SHOW_IGNORED_TOO (because then we need to determine if
-	 *                           there are ignored directories below)
+	 *                           there are ignored entries below)
 	 *   - DIR_HIDE_EMPTY_DIRECTORIES (because we have to determine if
 	 *                                 the directory is empty)
 	 */
@@ -1838,10 +1838,11 @@ static enum path_treatment treat_directory(struct dir_struct *dir,
 		return path_excluded;
 
 	/*
-	 * If we have we don't want to know the all the paths under an
-	 * untracked or ignored directory, we still need to go into the
-	 * directory to determine if it is empty (because an empty directory
-	 * should be path_none instead of path_excluded or path_untracked).
+	 * Even if we don't want to know all the paths under an untracked or
+	 * ignored directory, we may still need to go into the directory to
+	 * determine if it is empty (because with DIR_HIDE_EMPTY_DIRECTORIES,
+	 * an empty directory should be path_none instead of path_excluded or
+	 * path_untracked).
 	 */
 	check_only = ((dir->flags & DIR_HIDE_EMPTY_DIRECTORIES) &&
 		      !(dir->flags & DIR_SHOW_IGNORED_TOO));
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH 2/4] dir, clean: avoid disallowed behavior
  2020-06-11  6:59 [PATCH 0/4] Clean cleanups Elijah Newren via GitGitGadget
  2020-06-11  6:59 ` [PATCH 1/4] dir: fix a few confusing comments Elijah Newren via GitGitGadget
@ 2020-06-11  6:59 ` Elijah Newren via GitGitGadget
  2020-06-11  6:59 ` [PATCH 3/4] clean: consolidate handling of ignored parameters Elijah Newren via GitGitGadget
  2020-06-11  6:59 ` [PATCH 4/4] clean: optimize and document cases where we recurse into subdirectories Elijah Newren via GitGitGadget
  3 siblings, 0 replies; 7+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-06-11  6:59 UTC (permalink / raw)
  To: git; +Cc: bmalehorn, Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

dir.h documented quite clearly that DIR_SHOW_IGNORED and
DIR_SHOW_IGNORED_TOO are mutually exclusive, with a big comment to this
effect by the definition of both enum values.  However, a command like
   git clean -fx $DIR
would set both values for dir.flags.  I _think_ it happened to work
because:
  * As dir.h points out, DIR_KEEP_UNTRACKED_CONTENTS only takes effect
    if DIR_SHOW_IGNORED_TOO is set.
  * As coded, I believe DIR_SHOW_IGNORED would just happen to take
    precedence over DIR_SHOW_IGNORED_TOO in the code as currently
    constructed.
Which is a long way of saying "we just got lucky".

Fix clean.c to avoid setting these mutually exclusive values at the same
time, and add a check to dir.c that will throw a BUG() to prevent anyone
else from making this mistake.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 builtin/clean.c | 2 +-
 dir.c           | 4 ++++
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/builtin/clean.c b/builtin/clean.c
index 4ca12bc0c0b..3ca940f83a2 100644
--- a/builtin/clean.c
+++ b/builtin/clean.c
@@ -954,7 +954,7 @@ int cmd_clean(int argc, const char **argv, const char *prefix)
 		remove_directories = 1;
 	}
 
-	if (remove_directories)
+	if (remove_directories && !ignored_only)
 		dir.flags |= DIR_SHOW_IGNORED_TOO | DIR_KEEP_UNTRACKED_CONTENTS;
 
 	if (read_cache() < 0)
diff --git a/dir.c b/dir.c
index 6fb2f8ecdd7..a3ad9702d64 100644
--- a/dir.c
+++ b/dir.c
@@ -193,6 +193,10 @@ int fill_directory(struct dir_struct *dir,
 	const char *prefix;
 	size_t prefix_len;
 
+	unsigned exclusive_flags = DIR_SHOW_IGNORED | DIR_SHOW_IGNORED_TOO;
+	if ((dir->flags & exclusive_flags) == exclusive_flags)
+		BUG("DIR_SHOW_IGNORED and DIR_SHOW_IGNORED_TOO are exclusive");
+
 	/*
 	 * Calculate common prefix for the pathspec, and
 	 * use that to optimize the directory walk
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH 3/4] clean: consolidate handling of ignored parameters
  2020-06-11  6:59 [PATCH 0/4] Clean cleanups Elijah Newren via GitGitGadget
  2020-06-11  6:59 ` [PATCH 1/4] dir: fix a few confusing comments Elijah Newren via GitGitGadget
  2020-06-11  6:59 ` [PATCH 2/4] dir, clean: avoid disallowed behavior Elijah Newren via GitGitGadget
@ 2020-06-11  6:59 ` Elijah Newren via GitGitGadget
  2020-06-11  6:59 ` [PATCH 4/4] clean: optimize and document cases where we recurse into subdirectories Elijah Newren via GitGitGadget
  3 siblings, 0 replies; 7+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-06-11  6:59 UTC (permalink / raw)
  To: git; +Cc: bmalehorn, Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

I spent a long time trying to figure out how and whether the code worked
with different values of ignore, ignore_only, and remove_directories.
After lots of time setting up lots of testcases, sifting through lots of
print statements, and walking through the debugger, I finally realized
that one piece of code related to how it was all setup was found in
clean.c rather than dir.c.  Make a change that would have made it easier
for me to do the extra testing by putting this handling in one spot.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 builtin/clean.c | 16 +++++++---------
 1 file changed, 7 insertions(+), 9 deletions(-)

diff --git a/builtin/clean.c b/builtin/clean.c
index 3ca940f83a2..1be437bd5a3 100644
--- a/builtin/clean.c
+++ b/builtin/clean.c
@@ -924,12 +924,6 @@ int cmd_clean(int argc, const char **argv, const char *prefix)
 			     0);
 
 	memset(&dir, 0, sizeof(dir));
-	if (ignored_only)
-		dir.flags |= DIR_SHOW_IGNORED;
-
-	if (ignored && ignored_only)
-		die(_("-x and -X cannot be used together"));
-
 	if (!interactive && !dry_run && !force) {
 		if (config_set)
 			die(_("clean.requireForce set to true and neither -i, -n, nor -f given; "
@@ -946,6 +940,13 @@ int cmd_clean(int argc, const char **argv, const char *prefix)
 
 	dir.flags |= DIR_SHOW_OTHER_DIRECTORIES;
 
+	if (ignored && ignored_only)
+		die(_("-x and -X cannot be used together"));
+	if (!ignored)
+		setup_standard_excludes(&dir);
+	if (ignored_only)
+		dir.flags |= DIR_SHOW_IGNORED;
+
 	if (argc) {
 		/*
 		 * Remaining args implies pathspecs specified, and we should
@@ -960,9 +961,6 @@ int cmd_clean(int argc, const char **argv, const char *prefix)
 	if (read_cache() < 0)
 		die(_("index file corrupt"));
 
-	if (!ignored)
-		setup_standard_excludes(&dir);
-
 	pl = add_pattern_list(&dir, EXC_CMDL, "--exclude option");
 	for (i = 0; i < exclude_list.nr; i++)
 		add_pattern(exclude_list.items[i].string, "", 0, pl, -(i+1));
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH 4/4] clean: optimize and document cases where we recurse into subdirectories
  2020-06-11  6:59 [PATCH 0/4] Clean cleanups Elijah Newren via GitGitGadget
                   ` (2 preceding siblings ...)
  2020-06-11  6:59 ` [PATCH 3/4] clean: consolidate handling of ignored parameters Elijah Newren via GitGitGadget
@ 2020-06-11  6:59 ` Elijah Newren via GitGitGadget
  2020-06-13  0:28   ` Junio C Hamano
  3 siblings, 1 reply; 7+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-06-11  6:59 UTC (permalink / raw)
  To: git; +Cc: bmalehorn, Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

Commit 6b1db43109 ("clean: teach clean -d to preserve ignored paths",
2017-05-23) added the following code block (among others) to git-clean:
    if (remove_directories)
        dir.flags |= DIR_SHOW_IGNORED_TOO | DIR_KEEP_UNTRACKED_CONTENTS;
The reason for these flags is well documented in the commit message, but
isn't obvious just from looking at the code.  Add some explanations to
the code to make it clearer.

Further, it appears git-2.26 did not correctly handle this combination
of flags from git-clean.  With both these flags and without
DIR_SHOW_IGNORED_TOO_MODE_MATCHING set, git is supposed to recurse into
all untracked AND ignored directories.  git-2.26.0 clearly was not doing
that.  I don't know the full reasons for that or whether git < 2.27.0
had additional unknown bugs because of that misbehavior, because I don't
feel it's worth digging into.  As per the huge changes and craziness
documented in commit 8d92fb2927 ("dir: replace exponential algorithm
with a linear one", 2020-04-01), the old algorithm was a mess and was
thrown out.  What I can say is that git-2.27.0 correctly recurses into
untracked AND ignored directories with that combination.

However, in clean's case we don't need to recurse into ignored
directories; that is just a waste of time.  Thus, when git-2.27.0
started correctly handling those flags, we got a performance regression
report.  Rather than relying on other bugs in fill_directory()'s former
logic to provide the behavior of skipping ignored directories, make use
of the DIR_SHOW_IGNORED_TOO_MODE_MATCHING value specifically added in
commit eec0f7f2b7 ("status: add option to show ignored files
differently", 2017-10-30) for this purpose.

Reported-by: Brian Malehorn <bmalehorn@gmail.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
---
 builtin/clean.c | 33 +++++++++++++++++++++++++++++++--
 1 file changed, 31 insertions(+), 2 deletions(-)

diff --git a/builtin/clean.c b/builtin/clean.c
index 1be437bd5a3..5a9c29a558b 100644
--- a/builtin/clean.c
+++ b/builtin/clean.c
@@ -955,8 +955,37 @@ int cmd_clean(int argc, const char **argv, const char *prefix)
 		remove_directories = 1;
 	}
 
-	if (remove_directories && !ignored_only)
-		dir.flags |= DIR_SHOW_IGNORED_TOO | DIR_KEEP_UNTRACKED_CONTENTS;
+	if (remove_directories && !ignored_only) {
+		/*
+		 * We need to know about ignored files too:
+		 *
+		 * If (ignored), then we will delete ignored files as well.
+		 *
+		 * If (!ignored), then even though we not are doing
+		 * anything with ignored files, we need to know about them
+		 * so that we can avoid deleting a directory of untracked
+		 * files that also contains an ignored file within it.
+		 *
+		 * For the (!ignored) case, since we only need to avoid
+		 * deleting ignored files, we can set
+		 * DIR_SHOW_IGNORED_TOO_MODE_MATCHING in order to avoid
+		 * recursing into a directory which is itself ignored.
+		 */
+		dir.flags |= DIR_SHOW_IGNORED_TOO;
+		if (!ignored)
+			dir.flags |= DIR_SHOW_IGNORED_TOO_MODE_MATCHING;
+
+		/*
+		 * Let the fill_directory() machinery know that we aren't
+		 * just recursing to collect the ignored files; we want all
+		 * the untracked ones so that we can delete them.  (Note:
+		 * we could also set DIR_KEEP_UNTRACKED_CONTENTS when
+		 * ignored_only is true, since DIR_KEEP_UNTRACKED_CONTENTS
+		 * only has effect in combination with DIR_SHOW_IGNORED_TOO.  It makes
+		 * the code clearer to exclude it, though.
+		 */
+		dir.flags |= DIR_KEEP_UNTRACKED_CONTENTS;
+	}
 
 	if (read_cache() < 0)
 		die(_("index file corrupt"));
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH 4/4] clean: optimize and document cases where we recurse into subdirectories
  2020-06-11  6:59 ` [PATCH 4/4] clean: optimize and document cases where we recurse into subdirectories Elijah Newren via GitGitGadget
@ 2020-06-13  0:28   ` Junio C Hamano
  2020-06-18  6:19     ` Brian Malehorn
  0 siblings, 1 reply; 7+ messages in thread
From: Junio C Hamano @ 2020-06-13  0:28 UTC (permalink / raw)
  To: Elijah Newren via GitGitGadget; +Cc: git, bmalehorn, Elijah Newren

"Elijah Newren via GitGitGadget" <gitgitgadget@gmail.com> writes:

> ...
> However, in clean's case we don't need to recurse into ignored
> directories; that is just a waste of time.

Nice.

> Rather than relying on other bugs in fill_directory()'s former
> logic to provide the behavior of skipping ignored directories, make use
> of the DIR_SHOW_IGNORED_TOO_MODE_MATCHING value specifically added in
> commit eec0f7f2b7 ("status: add option to show ignored files
> differently", 2017-10-30) for this purpose.

Thanks.


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 4/4] clean: optimize and document cases where we recurse into subdirectories
  2020-06-13  0:28   ` Junio C Hamano
@ 2020-06-18  6:19     ` Brian Malehorn
  0 siblings, 0 replies; 7+ messages in thread
From: Brian Malehorn @ 2020-06-18  6:19 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Elijah Newren via GitGitGadget, Git Mailing List, Elijah Newren

Catching up on this - thanks Elijah for jumping on this bug and fixing
it immediately. I'll let you know if I see similar bugs in the future.

Thanks,
Brian

On Fri, Jun 12, 2020 at 5:28 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> "Elijah Newren via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
> > ...
> > However, in clean's case we don't need to recurse into ignored
> > directories; that is just a waste of time.
>
> Nice.
>
> > Rather than relying on other bugs in fill_directory()'s former
> > logic to provide the behavior of skipping ignored directories, make use
> > of the DIR_SHOW_IGNORED_TOO_MODE_MATCHING value specifically added in
> > commit eec0f7f2b7 ("status: add option to show ignored files
> > differently", 2017-10-30) for this purpose.
>
> Thanks.
>

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2020-06-18  6:20 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-06-11  6:59 [PATCH 0/4] Clean cleanups Elijah Newren via GitGitGadget
2020-06-11  6:59 ` [PATCH 1/4] dir: fix a few confusing comments Elijah Newren via GitGitGadget
2020-06-11  6:59 ` [PATCH 2/4] dir, clean: avoid disallowed behavior Elijah Newren via GitGitGadget
2020-06-11  6:59 ` [PATCH 3/4] clean: consolidate handling of ignored parameters Elijah Newren via GitGitGadget
2020-06-11  6:59 ` [PATCH 4/4] clean: optimize and document cases where we recurse into subdirectories Elijah Newren via GitGitGadget
2020-06-13  0:28   ` Junio C Hamano
2020-06-18  6:19     ` Brian Malehorn

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).