git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH v11 0/5] [GSoC] remove_subtree(): reimplement using iterators
@ 2017-04-26 17:03 Daniel Ferreira
  2017-04-26 17:03 ` [PATCH v11 1/5] dir_iterator: add tests for dir_iterator API Daniel Ferreira
                   ` (4 more replies)
  0 siblings, 5 replies; 6+ messages in thread
From: Daniel Ferreira @ 2017-04-26 17:03 UTC (permalink / raw)
  To: git; +Cc: gitster, sbeller, pclouds, mhagger, Daniel Ferreira

This is the eleventh version of a patch series that implements the GSoC
microproject of converting a recursive call to readdir() to use
dir_iterator.

v1: https://public-inbox.org/git/CAGZ79kZwT-9mHTiOJ5CEjk2wDFkn6+NcogjX0=vjhsAh16ANYg@mail.gmail.com/T/#t
v2: https://public-inbox.org/git/CACsJy8Dxh-QPBBLfaFWPAWUsbA9GVXA7x+mXLjEvYKhk1zOpig@mail.gmail.com/T/#t
v3: https://public-inbox.org/git/CAGZ79kYtpmURSQWPumobA=e3JBFjKhWCdv_LPhKCd71ZRwMovA@mail.gmail.com/T/#t
v4: https://public-inbox.org/git/1490747533-89143-1-git-send-email-bnmvco@gmail.com/T/#e437a63e0c22c00c69b5d92977c9b438ed2b9fd3a
v5: https://public-inbox.org/git/1490844730-47634-1-git-send-email-bnmvco@gmail.com/T/#m2323f15e45de699f2e09364f40a62e17047cf453
v6: https://public-inbox.org/git/1491107726-21504-1-git-send-email-bnmvco@gmail.com/T/#t
v7: https://public-inbox.org/git/1491163388-41255-1-git-send-email-bnmvco@gmail.com/T/#t
v8: https://public-inbox.org/git/a60b2ed6-2b99-b134-05af-7c8492a6949c@alum.mit.edu/T/#t
v9: https://public-inbox.org/git/CAGZ79kaBRS0SFAvrV4mN7-mVk+8QmPKPJMD55zPQ+A14ZzYFYA@mail.gmail.com/T/#me8988b7dd4adbc4ea24946ccb24fc1cf7baf44e3
v10: https://public-inbox.org/git/xmqqk26fahjn.fsf@gitster.mtv.corp.google.com/T/#m3071006ec67457adf69578b37f55b625d0e7fed7

Travis CI build: https://travis-ci.org/theiostream/git/builds/226079792

Okay, in this version I factored in Junio's request for a test rename
to t0066, and most of Michael's suggestions from the last review. I'm
sorry for the delay on this one.

Instead of either removing the iterate root dir feature or return NULL
as its basename, I chose to get the real_path() out of the dir we are
iterating over and get the basename of that, to avoid the "/." or "/.."
issues. I think this is actually less complex than the NULL solution in
terms of code that would end up needing to be written, and I think the
root dir feature is handy to have on dir-iterator.

As for the suggestion to put strerror() on the test, I feared for the
message compatibility across platforms (since we actually check which
errno code we got). If you could give me a guarantee that this is not a
problem and you think it'd be worthy of yet another series, I'm up for
it.

As for Junio's concern for a rebase issue on the test script, the
reason for it was that the same file is used across two tests, so it
seemed unnecessary to recreate the file within each of them.

Daniel Ferreira (5):
  dir_iterator: add tests for dir_iterator API
  remove_subtree(): test removing nested directories
  dir_iterator: refactor dir_iterator_advance
  dir_iterator: rewrite state machine model
  remove_subtree(): reimplement using iterators

 Makefile                        |   1 +
 dir-iterator.c                  | 244 ++++++++++++++++++++++++++++------------
 dir-iterator.h                  |  35 ++++--
 entry.c                         |  42 +++----
 refs/files-backend.c            |  15 ++-
 t/helper/.gitignore             |   1 +
 t/helper/test-dir-iterator.c    |  53 +++++++++
 t/t0066-dir-iterator.sh         | 121 ++++++++++++++++++++
 t/t2000-checkout-cache-clash.sh |  11 ++
 9 files changed, 416 insertions(+), 107 deletions(-)
 create mode 100644 t/helper/test-dir-iterator.c
 create mode 100755 t/t0066-dir-iterator.sh

--
2.7.4 (Apple Git-66)


^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH v11 1/5] dir_iterator: add tests for dir_iterator API
  2017-04-26 17:03 [PATCH v11 0/5] [GSoC] remove_subtree(): reimplement using iterators Daniel Ferreira
@ 2017-04-26 17:03 ` Daniel Ferreira
  2017-04-26 17:03 ` [PATCH v11 2/5] remove_subtree(): test removing nested directories Daniel Ferreira
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: Daniel Ferreira @ 2017-04-26 17:03 UTC (permalink / raw)
  To: git; +Cc: gitster, sbeller, pclouds, mhagger, Daniel Ferreira

Create t/helper/test-dir-iterator.c, which prints relevant information
about a directory tree iterated over with dir_iterator.

Create t/t0066-dir-iterator.sh, which tests that dir_iterator does
iterate through a whole directory tree.

Signed-off-by: Daniel Ferreira <bnmvco@gmail.com>
---
 Makefile                     |  1 +
 t/helper/.gitignore          |  1 +
 t/helper/test-dir-iterator.c | 30 ++++++++++++++++++++++++
 t/t0066-dir-iterator.sh      | 55 ++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 87 insertions(+)
 create mode 100644 t/helper/test-dir-iterator.c
 create mode 100755 t/t0066-dir-iterator.sh

diff --git a/Makefile b/Makefile
index eb1a1a7..a669e43 100644
--- a/Makefile
+++ b/Makefile
@@ -614,6 +614,7 @@ TEST_PROGRAMS_NEED_X += test-ctype
 TEST_PROGRAMS_NEED_X += test-config
 TEST_PROGRAMS_NEED_X += test-date
 TEST_PROGRAMS_NEED_X += test-delta
+TEST_PROGRAMS_NEED_X += test-dir-iterator
 TEST_PROGRAMS_NEED_X += test-dump-cache-tree
 TEST_PROGRAMS_NEED_X += test-dump-split-index
 TEST_PROGRAMS_NEED_X += test-dump-untracked-cache
diff --git a/t/helper/.gitignore b/t/helper/.gitignore
index acd5db1..60adab5 100644
--- a/t/helper/.gitignore
+++ b/t/helper/.gitignore
@@ -3,6 +3,7 @@
 /test-config
 /test-date
 /test-delta
+/test-dir-iterator
 /test-dump-cache-tree
 /test-dump-split-index
 /test-dump-untracked-cache
diff --git a/t/helper/test-dir-iterator.c b/t/helper/test-dir-iterator.c
new file mode 100644
index 0000000..a7d1470
--- /dev/null
+++ b/t/helper/test-dir-iterator.c
@@ -0,0 +1,30 @@
+#include "git-compat-util.h"
+#include "strbuf.h"
+#include "iterator.h"
+#include "dir-iterator.h"
+
+int cmd_main(int argc, const char **argv)
+{
+	struct strbuf path = STRBUF_INIT;
+	struct dir_iterator *diter;
+
+	if (argc < 2)
+		die("BUG: test-dir-iterator needs one argument");
+
+	strbuf_add(&path, argv[1], strlen(argv[1]));
+
+	diter = dir_iterator_begin(path.buf);
+
+	while (dir_iterator_advance(diter) == ITER_OK) {
+		if (S_ISDIR(diter->st.st_mode))
+			printf("[d] ");
+		else if (S_ISREG(diter->st.st_mode))
+			printf("[f] ");
+		else
+			printf("[?] ");
+
+		printf("(%s) [%s] %s\n", diter->relative_path, diter->basename, diter->path.buf);
+	}
+
+	return 0;
+}
diff --git a/t/t0066-dir-iterator.sh b/t/t0066-dir-iterator.sh
new file mode 100755
index 0000000..46e5ce5
--- /dev/null
+++ b/t/t0066-dir-iterator.sh
@@ -0,0 +1,55 @@
+#!/bin/sh
+
+test_description='Test directory iteration.'
+
+. ./test-lib.sh
+
+test_expect_success 'setup' '
+	mkdir -p dir &&
+	mkdir -p dir/a/b/c/ &&
+	>dir/b &&
+	>dir/c &&
+	mkdir -p dir/d/e/d/ &&
+	>dir/a/b/c/d &&
+	>dir/a/e &&
+	>dir/d/e/d/a &&
+
+	mkdir -p dir2/a/b/c/ &&
+	>dir2/a/b/c/d
+'
+
+test_expect_success 'dir-iterator should iterate through all files' '
+	cat >expect-sorted-output <<-\EOF &&
+	[d] (a) [a] ./dir/a
+	[d] (a/b) [b] ./dir/a/b
+	[d] (a/b/c) [c] ./dir/a/b/c
+	[d] (d) [d] ./dir/d
+	[d] (d/e) [e] ./dir/d/e
+	[d] (d/e/d) [d] ./dir/d/e/d
+	[f] (a/b/c/d) [d] ./dir/a/b/c/d
+	[f] (a/e) [e] ./dir/a/e
+	[f] (b) [b] ./dir/b
+	[f] (c) [c] ./dir/c
+	[f] (d/e/d/a) [a] ./dir/d/e/d/a
+	EOF
+
+	test-dir-iterator ./dir >out &&
+	sort <out >./actual-pre-order-sorted-output &&
+
+	test_cmp expect-sorted-output actual-pre-order-sorted-output
+'
+
+test_expect_success 'dir-iterator should list files in the correct order' '
+	cat >expect-pre-order-output <<-\EOF &&
+	[d] (a) [a] ./dir2/a
+	[d] (a/b) [b] ./dir2/a/b
+	[d] (a/b/c) [c] ./dir2/a/b/c
+	[f] (a/b/c/d) [d] ./dir2/a/b/c/d
+	EOF
+
+	test-dir-iterator ./dir2 >actual-pre-order-output &&
+
+	test_cmp expect-pre-order-output actual-pre-order-output
+'
+
+test_done
-- 
2.7.4 (Apple Git-66)


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH v11 2/5] remove_subtree(): test removing nested directories
  2017-04-26 17:03 [PATCH v11 0/5] [GSoC] remove_subtree(): reimplement using iterators Daniel Ferreira
  2017-04-26 17:03 ` [PATCH v11 1/5] dir_iterator: add tests for dir_iterator API Daniel Ferreira
@ 2017-04-26 17:03 ` Daniel Ferreira
  2017-04-26 17:03 ` [PATCH v11 3/5] dir_iterator: refactor dir_iterator_advance Daniel Ferreira
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: Daniel Ferreira @ 2017-04-26 17:03 UTC (permalink / raw)
  To: git; +Cc: gitster, sbeller, pclouds, mhagger, Daniel Ferreira

Test removing a nested directory when an attempt is made to restore the
index to a state where it does not exist. A similar test could be found
previously in t/t2000-checkout-cache-clash.sh, but it did not check for
nested directories, which could allow a faulty implementation of
remove_subtree() pass the tests.

Signed-off-by: Daniel Ferreira <bnmvco@gmail.com>
---
 t/t2000-checkout-cache-clash.sh | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/t/t2000-checkout-cache-clash.sh b/t/t2000-checkout-cache-clash.sh
index de3edb5..ac10ba3 100755
--- a/t/t2000-checkout-cache-clash.sh
+++ b/t/t2000-checkout-cache-clash.sh
@@ -57,4 +57,15 @@ test_expect_success SYMLINKS 'checkout-index -f twice with --prefix' '
 	git checkout-index -a -f --prefix=there/
 '
 
+test_expect_success 'git checkout-index -f should remove nested subtrees' '
+	echo content >path &&
+	git update-index --add path &&
+	rm path &&
+	mkdir -p path/with/nested/paths &&
+	echo content >path/file1 &&
+	echo content >path/with/nested/paths/file2 &&
+	git checkout-index -f -a &&
+	test ! -d path
+'
+
 test_done
-- 
2.7.4 (Apple Git-66)


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH v11 3/5] dir_iterator: refactor dir_iterator_advance
  2017-04-26 17:03 [PATCH v11 0/5] [GSoC] remove_subtree(): reimplement using iterators Daniel Ferreira
  2017-04-26 17:03 ` [PATCH v11 1/5] dir_iterator: add tests for dir_iterator API Daniel Ferreira
  2017-04-26 17:03 ` [PATCH v11 2/5] remove_subtree(): test removing nested directories Daniel Ferreira
@ 2017-04-26 17:03 ` Daniel Ferreira
  2017-04-26 17:03 ` [PATCH v11 4/5] dir_iterator: rewrite state machine model Daniel Ferreira
  2017-04-26 17:03 ` [PATCH v11 5/5] remove_subtree(): reimplement using iterators Daniel Ferreira
  4 siblings, 0 replies; 6+ messages in thread
From: Daniel Ferreira @ 2017-04-26 17:03 UTC (permalink / raw)
  To: git; +Cc: gitster, sbeller, pclouds, mhagger, Daniel Ferreira

Factor out reusable helpers out of dir_iterator_advance(). Make
dir_iterator_advance()'s code more legible and allow some behavior to
be reusable.

Signed-off-by: Daniel Ferreira <bnmvco@gmail.com>
---
 dir-iterator.c | 66 ++++++++++++++++++++++++++++++++++++++--------------------
 1 file changed, 43 insertions(+), 23 deletions(-)

diff --git a/dir-iterator.c b/dir-iterator.c
index 34182a9..d168cb2 100644
--- a/dir-iterator.c
+++ b/dir-iterator.c
@@ -50,6 +50,44 @@ struct dir_iterator_int {
 	struct dir_iterator_level *levels;
 };
 
+static void push_dir_level(struct dir_iterator_int *iter, struct dir_iterator_level *level)
+{
+	level->dir_state = DIR_STATE_RECURSE;
+	ALLOC_GROW(iter->levels, iter->levels_nr + 1,
+		   iter->levels_alloc);
+	level = &iter->levels[iter->levels_nr++];
+	level->initialized = 0;
+}
+
+static int pop_dir_level(struct dir_iterator_int *iter)
+{
+	return --iter->levels_nr;
+}
+
+static int adjust_iterator_data(struct dir_iterator_int *iter,
+		struct dir_iterator_level *level)
+{
+	if (lstat(iter->base.path.buf, &iter->base.st) < 0) {
+		if (errno != ENOENT)
+			warning("error reading path '%s': %s",
+				iter->base.path.buf,
+				strerror(errno));
+		return -1;
+	}
+
+	/*
+	 * We have to set these each time because
+	 * the path strbuf might have been realloc()ed.
+	 */
+	iter->base.relative_path =
+		iter->base.path.buf + iter->levels[0].prefix_len;
+	iter->base.basename =
+		iter->base.path.buf + level->prefix_len;
+	level->dir_state = DIR_STATE_ITER;
+
+	return 0;
+}
+
 int dir_iterator_advance(struct dir_iterator *dir_iterator)
 {
 	struct dir_iterator_int *iter =
@@ -84,11 +122,7 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
 				 * over; now prepare to iterate into
 				 * it.
 				 */
-				level->dir_state = DIR_STATE_RECURSE;
-				ALLOC_GROW(iter->levels, iter->levels_nr + 1,
-					   iter->levels_alloc);
-				level = &iter->levels[iter->levels_nr++];
-				level->initialized = 0;
+				push_dir_level(iter, level);
 				continue;
 			} else {
 				/*
@@ -104,7 +138,7 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
 			 * This level is exhausted (or wasn't opened
 			 * successfully); pop up a level.
 			 */
-			if (--iter->levels_nr == 0)
+			if (pop_dir_level(iter) == 0)
 				return dir_iterator_abort(dir_iterator);
 
 			continue;
@@ -129,7 +163,7 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
 						iter->base.path.buf, strerror(errno));
 
 				level->dir = NULL;
-				if (--iter->levels_nr == 0)
+				if (pop_dir_level(iter) == 0)
 					return dir_iterator_abort(dir_iterator);
 				break;
 			}
@@ -138,23 +172,9 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
 				continue;
 
 			strbuf_addstr(&iter->base.path, de->d_name);
-			if (lstat(iter->base.path.buf, &iter->base.st) < 0) {
-				if (errno != ENOENT)
-					warning("error reading path '%s': %s",
-						iter->base.path.buf,
-						strerror(errno));
-				continue;
-			}
 
-			/*
-			 * We have to set these each time because
-			 * the path strbuf might have been realloc()ed.
-			 */
-			iter->base.relative_path =
-				iter->base.path.buf + iter->levels[0].prefix_len;
-			iter->base.basename =
-				iter->base.path.buf + level->prefix_len;
-			level->dir_state = DIR_STATE_ITER;
+			if (adjust_iterator_data(iter, level))
+				continue;
 
 			return ITER_OK;
 		}
-- 
2.7.4 (Apple Git-66)


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH v11 4/5] dir_iterator: rewrite state machine model
  2017-04-26 17:03 [PATCH v11 0/5] [GSoC] remove_subtree(): reimplement using iterators Daniel Ferreira
                   ` (2 preceding siblings ...)
  2017-04-26 17:03 ` [PATCH v11 3/5] dir_iterator: refactor dir_iterator_advance Daniel Ferreira
@ 2017-04-26 17:03 ` Daniel Ferreira
  2017-04-26 17:03 ` [PATCH v11 5/5] remove_subtree(): reimplement using iterators Daniel Ferreira
  4 siblings, 0 replies; 6+ messages in thread
From: Daniel Ferreira @ 2017-04-26 17:03 UTC (permalink / raw)
  To: git; +Cc: gitster, sbeller, pclouds, mhagger, Daniel Ferreira

Perform a rewrite of dir_iterator_advance(). dir_iterator has
ceased to rely on a combination of level.initialized and level.dir_state
state variables and now only tracks the state with level.dir_state,
which simplifies the iterator mechanism, makes the code easier to follow
and eases additions of new features to the iterator.

Make dir_iterator_begin() attempt to lstat() the path it receives, and
return NULL and an appropriate errno if it fails or if the passed path
was not a directory.

Create an option for the dir_iterator API to iterate over subdirectories
only after having iterated through their contents. This feature was
predicted, although not implemented by 0fe5043 ("dir_iterator: new API
for iterating over a directory tree", 2016-06-18). This is useful for
recursively removing a directory and calling rmdir() on a directory only
after all of its contents have been wiped.

Add an option for dir_iterator to also iterate over the initial
directory (the one passed to dir_iterator_begin()).

Add the "flags" parameter to dir_iterator_create, allowing for the
aforementioned new features to be enabled. The new default behavior
(i.e. flags set to 0) does not iterate over directories. Flag
DIR_ITERATOR_PRE_ORDER_TRAVERSAL iterates over a directory before doing
so over its contents. DIR_ITERATOR_POST_ORDER_TRAVERSAL iterates over a
directory after doing so over its contents. DIR_ITERATOR_LIST_ROOT_DIR
iterates over the initial directory. These flags do not conflict with
each other and may be used simultaneously.

Amend a call to dir_iterator_begin() in refs/files-backend.c to pass
the flags parameter introduced, as well as handle the case in which it
fails to open the directory.

Improve t/t0066-dir-iterator.sh and t/helper/test-dir-iterator.c to
test "post-order" and "iterate-over-root" modes.

Michael Haggerty contributed with the design of the new
dir_iterator_advance() implementation, the code for
t/helper/test-dir-iterator's option parser and numerous reviews that
gradually shaped this code to its current form.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Daniel Ferreira <bnmvco@gmail.com>
---
 dir-iterator.c               | 212 ++++++++++++++++++++++++++++++-------------
 dir-iterator.h               |  35 +++++--
 refs/files-backend.c         |  15 ++-
 t/helper/test-dir-iterator.c |  31 ++++++-
 t/t0066-dir-iterator.sh      | 104 +++++++++++++++++----
 5 files changed, 299 insertions(+), 98 deletions(-)

diff --git a/dir-iterator.c b/dir-iterator.c
index d168cb2..fba8f49 100644
--- a/dir-iterator.c
+++ b/dir-iterator.c
@@ -4,8 +4,6 @@
 #include "dir-iterator.h"
 
 struct dir_iterator_level {
-	int initialized;
-
 	DIR *dir;
 
 	/*
@@ -20,9 +18,15 @@ struct dir_iterator_level {
 	 * iteration and also iterated into):
 	 */
 	enum {
-		DIR_STATE_ITER,
-		DIR_STATE_RECURSE
+		DIR_STATE_PUSHED,
+		DIR_STATE_PRE_ITERATION,
+		DIR_STATE_ITERATING,
+		DIR_STATE_POST_ITERATION,
+		DIR_STATE_EXHAUSTED
 	} dir_state;
+
+	/* The stat structure for the directory this level represents. */
+	struct stat st;
 };
 
 /*
@@ -48,15 +52,23 @@ struct dir_iterator_int {
 	 * that will be included in this iteration.
 	 */
 	struct dir_iterator_level *levels;
+
+	/* Holds the flags passed to dir_iterator_begin(). */
+	unsigned flags;
 };
 
-static void push_dir_level(struct dir_iterator_int *iter, struct dir_iterator_level *level)
+static void push_dir_level(struct dir_iterator_int *iter, struct stat *st)
 {
-	level->dir_state = DIR_STATE_RECURSE;
+	struct dir_iterator_level *level;
+
 	ALLOC_GROW(iter->levels, iter->levels_nr + 1,
 		   iter->levels_alloc);
+
+	/* Push a new level */
 	level = &iter->levels[iter->levels_nr++];
-	level->initialized = 0;
+	level->dir = NULL;
+	level->dir_state = DIR_STATE_PUSHED;
+	level->st = *st;
 }
 
 static int pop_dir_level(struct dir_iterator_int *iter)
@@ -67,7 +79,9 @@ static int pop_dir_level(struct dir_iterator_int *iter)
 static int adjust_iterator_data(struct dir_iterator_int *iter,
 		struct dir_iterator_level *level)
 {
-	if (lstat(iter->base.path.buf, &iter->base.st) < 0) {
+	if (level->dir_state != DIR_STATE_ITERATING) {
+		iter->base.st = level->st;
+	} else if (lstat(iter->base.path.buf, &iter->base.st) < 0) {
 		if (errno != ENOENT)
 			warning("error reading path '%s': %s",
 				iter->base.path.buf,
@@ -76,18 +90,52 @@ static int adjust_iterator_data(struct dir_iterator_int *iter,
 	}
 
 	/*
-	 * We have to set these each time because
-	 * the path strbuf might have been realloc()ed.
+	 * Check if we are dealing with the root directory as an
+	 * item that's being iterated through.
 	 */
-	iter->base.relative_path =
-		iter->base.path.buf + iter->levels[0].prefix_len;
-	iter->base.basename =
-		iter->base.path.buf + level->prefix_len;
-	level->dir_state = DIR_STATE_ITER;
+	if (level->dir_state != DIR_STATE_ITERATING &&
+		iter->levels_nr == 1) {
+		const char *realpath, *last_path_component;
+
+		iter->base.relative_path = ".";
+
+		/*
+		 * To get the root directory basename, we get the absolute path
+		 * to our directory. Then, we get whatever is after the last '/'
+		 * as the basename (or the whole directory if that does not exist,
+		 * which should never happen.
+		 */
+		realpath = real_path(iter->base.path.buf);
+		last_path_component = strrchr(realpath, '/');
+		iter->base.basename = last_path_component ?
+			last_path_component + 1 : realpath;
+	} else {
+		iter->base.relative_path =
+			iter->base.path.buf + iter->levels[0].prefix_len;
+
+		if (S_ISDIR(iter->base.st.st_mode))
+			iter->base.basename =
+				iter->base.path.buf + iter->levels[iter->levels_nr - 2].prefix_len;
+		else
+			iter->base.basename =
+				iter->base.path.buf + level->prefix_len;
+	}
 
 	return 0;
 }
 
+/*
+ * This function uses a state machine with the following states:
+ * - DIR_STATE_PUSHED: the directory has been pushed to the
+ *   iterator traversal tree.
+ * - DIR_STATE_PRE_ITERATION: the directory is not opened with opendir(). The
+ *   dirpath has already been returned if pre-order traversal is set.
+ * - DIR_STATE_ITERATING: the directory is initialized. We are traversing
+ *   through it.
+ * - DIR_STATE_POST_ITERATION: the directory has been iterated through, and has
+ *   been closed.
+ * - DIR_STATE_EXHAUSTED: the directory is closed and ready to be popped.
+ */
 int dir_iterator_advance(struct dir_iterator *dir_iterator)
 {
 	struct dir_iterator_int *iter =
@@ -96,9 +144,26 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
 	while (1) {
 		struct dir_iterator_level *level =
 			&iter->levels[iter->levels_nr - 1];
-		struct dirent *de;
 
-		if (!level->initialized) {
+		if (level->dir_state == DIR_STATE_PUSHED) {
+			level->dir_state = DIR_STATE_PRE_ITERATION;
+
+			/* We may not want the root directory to be iterated over */
+			if ((iter->flags & DIR_ITERATOR_PRE_ORDER_TRAVERSAL) && (
+				iter->levels_nr != 1 ||
+				(iter->flags & DIR_ITERATOR_LIST_ROOT_DIR))) {
+				/*
+				 * This will only error if we fail to lstat() the
+				 * root directory. In this case, we bail.
+				 */
+				if (adjust_iterator_data(iter, level)) {
+					level->dir_state = DIR_STATE_EXHAUSTED;
+					continue;
+				}
+
+				return ITER_OK;
+			}
+		} else if (level->dir_state == DIR_STATE_PRE_ITERATION) {
 			/*
 			 * Note: dir_iterator_begin() ensures that
 			 * path is not the empty string.
@@ -108,64 +173,40 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
 			level->prefix_len = iter->base.path.len;
 
 			level->dir = opendir(iter->base.path.buf);
-			if (!level->dir && errno != ENOENT) {
-				warning("error opening directory %s: %s",
-					iter->base.path.buf, strerror(errno));
-				/* Popping the level is handled below */
-			}
-
-			level->initialized = 1;
-		} else if (S_ISDIR(iter->base.st.st_mode)) {
-			if (level->dir_state == DIR_STATE_ITER) {
+			if (!level->dir) {
 				/*
-				 * The directory was just iterated
-				 * over; now prepare to iterate into
-				 * it.
+				 * This level wasn't opened sucessfully; pretend we
+				 * iterated through it already.
 				 */
-				push_dir_level(iter, level);
+				if (errno != ENOENT)
+					warning("error opening directory %s: %s",
+						iter->base.path.buf, strerror(errno));
+
+				level->dir_state = DIR_STATE_POST_ITERATION;
 				continue;
-			} else {
-				/*
-				 * The directory has already been
-				 * iterated over and iterated into;
-				 * we're done with it.
-				 */
 			}
-		}
-
-		if (!level->dir) {
-			/*
-			 * This level is exhausted (or wasn't opened
-			 * successfully); pop up a level.
-			 */
-			if (pop_dir_level(iter) == 0)
-				return dir_iterator_abort(dir_iterator);
 
-			continue;
-		}
+			level->dir_state = DIR_STATE_ITERATING;
+		} else if (level->dir_state == DIR_STATE_ITERATING) {
+			struct dirent *de;
 
-		/*
-		 * Loop until we find an entry that we can give back
-		 * to the caller:
-		 */
-		while (1) {
 			strbuf_setlen(&iter->base.path, level->prefix_len);
 			errno = 0;
 			de = readdir(level->dir);
 
 			if (!de) {
-				/* This level is exhausted; pop up a level. */
+				/* In case of readdir() error */
 				if (errno) {
 					warning("error reading directory %s: %s",
 						iter->base.path.buf, strerror(errno));
-				} else if (closedir(level->dir))
+				}
+
+				if (closedir(level->dir))
 					warning("error closing directory %s: %s",
 						iter->base.path.buf, strerror(errno));
 
-				level->dir = NULL;
-				if (pop_dir_level(iter) == 0)
-					return dir_iterator_abort(dir_iterator);
-				break;
+				level->dir_state = DIR_STATE_POST_ITERATION;
+				continue;
 			}
 
 			if (is_dot_or_dotdot(de->d_name))
@@ -176,7 +217,37 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
 			if (adjust_iterator_data(iter, level))
 				continue;
 
+			if (S_ISDIR(iter->base.st.st_mode)) {
+				push_dir_level(iter, &iter->base.st);
+				continue;
+			}
+
 			return ITER_OK;
+		} else if (level->dir_state == DIR_STATE_POST_ITERATION) {
+			level->dir_state = DIR_STATE_EXHAUSTED;
+
+			strbuf_setlen(&iter->base.path, level->prefix_len);
+			/*
+			 * Since we are iterating through the dirpath
+			 * after we have gone through it, we still need
+			 * to get rid of the trailing slash we appended.
+			 */
+			strbuf_strip_suffix(&iter->base.path, "/");
+
+			/* We may not want the root directory to be iterated over */
+			if ((iter->flags & DIR_ITERATOR_POST_ORDER_TRAVERSAL) && (
+				iter->levels_nr != 1 ||
+				(iter->flags & DIR_ITERATOR_LIST_ROOT_DIR))) {
+				/*
+				 * In this state, adjust_iterator_data() should never return
+				 * an error.
+				 */
+				adjust_iterator_data(iter, level);
+				return ITER_OK;
+			}
+		} else if (level->dir_state == DIR_STATE_EXHAUSTED) {
+			if (pop_dir_level(iter) == 0)
+				return dir_iterator_abort(dir_iterator);
 		}
 	}
 }
@@ -202,21 +273,36 @@ int dir_iterator_abort(struct dir_iterator *dir_iterator)
 	return ITER_DONE;
 }
 
-struct dir_iterator *dir_iterator_begin(const char *path)
+struct dir_iterator *dir_iterator_begin(const char *path, unsigned flags)
 {
-	struct dir_iterator_int *iter = xcalloc(1, sizeof(*iter));
-	struct dir_iterator *dir_iterator = &iter->base;
+	struct dir_iterator_int *iter;
+	struct dir_iterator *dir_iterator;
+	struct stat st;
 
 	if (!path || !*path)
 		die("BUG: empty path passed to dir_iterator_begin()");
 
+	if (lstat(path, &st) < 0)
+		return NULL;
+
+	if (!S_ISDIR(st.st_mode)) {
+		errno = ENOTDIR;
+		return NULL;
+	}
+
+	iter = xcalloc(1, sizeof(*iter));
+	dir_iterator = &iter->base;
+
+	iter->flags = flags;
+	dir_iterator->st = st;
+
 	strbuf_init(&iter->base.path, PATH_MAX);
 	strbuf_addstr(&iter->base.path, path);
 
 	ALLOC_GROW(iter->levels, 10, iter->levels_alloc);
+	iter->levels_nr = 0;
 
-	iter->levels_nr = 1;
-	iter->levels[0].initialized = 0;
+	push_dir_level(iter, &dir_iterator->st);
 
 	return dir_iterator;
 }
diff --git a/dir-iterator.h b/dir-iterator.h
index 27739e6..e801748 100644
--- a/dir-iterator.h
+++ b/dir-iterator.h
@@ -5,19 +5,17 @@
  * Iterate over a directory tree.
  *
  * Iterate over a directory tree, recursively, including paths of all
- * types and hidden paths. Skip "." and ".." entries and don't follow
- * symlinks except for the original path.
+ * types and hidden paths. Skip "." and ".." entries.
  *
  * Every time dir_iterator_advance() is called, update the members of
  * the dir_iterator structure to reflect the next path in the
  * iteration. The order that paths are iterated over within a
- * directory is undefined, but directory paths are always iterated
- * over before the subdirectory contents.
+ * directory is undefined.
  *
  * A typical iteration looks like this:
  *
  *     int ok;
- *     struct iterator *iter = dir_iterator_begin(path);
+ *     struct iterator *iter = dir_iterator_begin(path, flags);
  *
  *     while ((ok = dir_iterator_advance(iter)) == ITER_OK) {
  *             if (want_to_stop_iteration()) {
@@ -38,6 +36,26 @@
  * dir_iterator_advance() again.
  */
 
+/*
+ * Possible flags for dir_iterator_begin().
+ *
+ * - DIR_ITERATOR_PRE_ORDER_TRAVERSAL: the iterator shall return
+ *   a dirpath it has found before iterating through that directory's
+ * contents.
+ * - DIR_ITERATOR_POST_ORDER_TRAVERSAL: the iterator shall return
+ *   a dirpath it has found after iterating through that directory's
+ *   contents.
+ * - DIR_ITERATOR_LIST_ROOT_DIR: the iterator shall return the dirpath
+ *   of the root directory it is iterating through if either
+ *   DIR_ITERATOR_PRE_ORDER_TRAVERSAL or DIR_ITERATOR_POST_ORDER_TRAVERSAL
+ *   is set.
+ *
+ * All flags can be used in any combination.
+ */
+#define DIR_ITERATOR_PRE_ORDER_TRAVERSAL (1 << 0)
+#define DIR_ITERATOR_POST_ORDER_TRAVERSAL (1 << 1)
+#define DIR_ITERATOR_LIST_ROOT_DIR (1 << 2)
+
 struct dir_iterator {
 	/* The current path: */
 	struct strbuf path;
@@ -57,15 +75,16 @@ struct dir_iterator {
 };
 
 /*
- * Start a directory iteration over path. Return a dir_iterator that
- * holds the internal state of the iteration.
+ * Start a directory iteration over path, with options specified in
+ * 'flags'. Return a dir_iterator that holds the internal state of
+ * the iteration.
  *
  * The iteration includes all paths under path, not including path
  * itself and not including "." or ".." entries.
  *
  * path is the starting directory. An internal copy will be made.
  */
-struct dir_iterator *dir_iterator_begin(const char *path);
+struct dir_iterator *dir_iterator_begin(const char *path, unsigned flags);
 
 /*
  * Advance the iterator to the first or next item and return ITER_OK.
diff --git a/refs/files-backend.c b/refs/files-backend.c
index c9d900f..cb492cb 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -3482,16 +3482,23 @@ static struct ref_iterator_vtable files_reflog_iterator_vtable = {
 
 static struct ref_iterator *files_reflog_iterator_begin(struct ref_store *ref_store)
 {
+	struct dir_iterator *diter;
 	struct files_ref_store *refs =
 		files_downcast(ref_store, REF_STORE_READ,
 			       "reflog_iterator_begin");
-	struct files_reflog_iterator *iter = xcalloc(1, sizeof(*iter));
-	struct ref_iterator *ref_iterator = &iter->base;
+	struct files_reflog_iterator *iter;
+	struct ref_iterator *ref_iterator;
+
 	struct strbuf sb = STRBUF_INIT;
+	files_reflog_path(refs, &sb, NULL);
+	if (!(diter = dir_iterator_begin(sb.buf, 0)))
+		return empty_ref_iterator_begin();
+
+	iter = xcalloc(1, sizeof(*iter));
+	ref_iterator = &iter->base;
 
 	base_ref_iterator_init(ref_iterator, &files_reflog_iterator_vtable);
-	files_reflog_path(refs, &sb, NULL);
-	iter->dir_iterator = dir_iterator_begin(sb.buf);
+	iter->dir_iterator = diter;
 	iter->ref_store = ref_store;
 	strbuf_release(&sb);
 	return ref_iterator;
diff --git a/t/helper/test-dir-iterator.c b/t/helper/test-dir-iterator.c
index a7d1470..3b4948b 100644
--- a/t/helper/test-dir-iterator.c
+++ b/t/helper/test-dir-iterator.c
@@ -5,15 +5,38 @@
 
 int cmd_main(int argc, const char **argv)
 {
+	const char **myargv = argv;
+	int myargc = argc;
+
 	struct strbuf path = STRBUF_INIT;
 	struct dir_iterator *diter;
 
-	if (argc < 2)
-		die("BUG: test-dir-iterator needs one argument");
+	unsigned flag = 0;
+
+	while (--myargc && starts_with(*++myargv, "--")) {
+		if (!strcmp(*myargv, "--pre-order"))
+			flag |= DIR_ITERATOR_PRE_ORDER_TRAVERSAL;
+		else if (!strcmp(*myargv, "--post-order"))
+			flag |= DIR_ITERATOR_POST_ORDER_TRAVERSAL;
+		else if (!strcmp(*myargv, "--list-root-dir"))
+			flag |= DIR_ITERATOR_LIST_ROOT_DIR;
+		else if (!strcmp(*myargv, "--")) {
+			myargc--;
+			myargv++;
+			break;
+		} else
+			die("Unrecognized option: %s", *myargv);
+	}
 
-	strbuf_add(&path, argv[1], strlen(argv[1]));
+	if (myargc != 1)
+		die("expected exactly one non-option argument");
+	strbuf_addstr(&path, *myargv);
 
-	diter = dir_iterator_begin(path.buf);
+	diter = dir_iterator_begin(path.buf, flag);
+	if (diter == NULL) {
+		printf("begin failed: %d\n", errno);
+		return 0;
+	}
 
 	while (dir_iterator_advance(diter) == ITER_OK) {
 		if (S_ISDIR(diter->st.st_mode))
diff --git a/t/t0066-dir-iterator.sh b/t/t0066-dir-iterator.sh
index 46e5ce5..c2a28e7 100755
--- a/t/t0066-dir-iterator.sh
+++ b/t/t0066-dir-iterator.sh
@@ -15,31 +15,42 @@ test_expect_success 'setup' '
 	>dir/d/e/d/a &&
 
 	mkdir -p dir2/a/b/c/ &&
-	>dir2/a/b/c/d
+	>dir2/a/b/c/d &&
+
+	mkdir dir3 &&
+	>file
 '
 
-test_expect_success 'dir-iterator should iterate through all files' '
-	cat >expect-sorted-output <<-\EOF &&
-	[d] (a) [a] ./dir/a
-	[d] (a/b) [b] ./dir/a/b
-	[d] (a/b/c) [c] ./dir/a/b/c
-	[d] (d) [d] ./dir/d
-	[d] (d/e) [e] ./dir/d/e
-	[d] (d/e/d) [d] ./dir/d/e/d
-	[f] (a/b/c/d) [d] ./dir/a/b/c/d
-	[f] (a/e) [e] ./dir/a/e
-	[f] (b) [b] ./dir/b
-	[f] (c) [c] ./dir/c
-	[f] (d/e/d/a) [a] ./dir/d/e/d/a
-	EOF
+cat >expect-sorted-output <<-\EOF &&
+[d] (a) [a] ./dir/a
+[d] (a/b) [b] ./dir/a/b
+[d] (a/b/c) [c] ./dir/a/b/c
+[d] (d) [d] ./dir/d
+[d] (d/e) [e] ./dir/d/e
+[d] (d/e/d) [d] ./dir/d/e/d
+[f] (a/b/c/d) [d] ./dir/a/b/c/d
+[f] (a/e) [e] ./dir/a/e
+[f] (b) [b] ./dir/b
+[f] (c) [c] ./dir/c
+[f] (d/e/d/a) [a] ./dir/d/e/d/a
+EOF
 
-	test-dir-iterator ./dir >out &&
+test_expect_success 'dir-iterator should iterate through all files' '
+	test-dir-iterator --pre-order ./dir >out &&
 	sort <out >./actual-pre-order-sorted-output &&
 
 	test_cmp expect-sorted-output actual-pre-order-sorted-output
 '
 
-test_expect_success 'dir-iterator should list files in the correct order' '
+test_expect_success 'dir-iterator should iterate through all files on post-order mode' '
+	test-dir-iterator --post-order ./dir >out &&
+	sort <out >actual-post-order-sorted-output &&
+
+	test_cmp expect-sorted-output actual-post-order-sorted-output
+'
+
+
+test_expect_success 'dir-iterator should list files properly on pre-order mode' '
 	cat >expect-pre-order-output <<-\EOF &&
 	[d] (a) [a] ./dir2/a
 	[d] (a/b) [b] ./dir2/a/b
@@ -47,9 +58,64 @@ test_expect_success 'dir-iterator should list files in the correct order' '
 	[f] (a/b/c/d) [d] ./dir2/a/b/c/d
 	EOF
 
-	test-dir-iterator ./dir2 >actual-pre-order-output &&
-
+	test-dir-iterator --pre-order ./dir2 >actual-pre-order-output &&
 	test_cmp expect-pre-order-output actual-pre-order-output
 '
 
+test_expect_success 'dir-iterator should list files properly on post-order mode' '
+	cat >expect-post-order-output <<-\EOF &&
+	[f] (a/b/c/d) [d] ./dir2/a/b/c/d
+	[d] (a/b/c) [c] ./dir2/a/b/c
+	[d] (a/b) [b] ./dir2/a/b
+	[d] (a) [a] ./dir2/a
+	EOF
+
+	test-dir-iterator --post-order ./dir2 >actual-post-order-output &&
+	test_cmp expect-post-order-output actual-post-order-output
+'
+
+test_expect_success 'dir-iterator should list files properly on pre-order + post-order + root-dir mode' '
+	cat >expect-pre-order-post-order-root-dir-output <<-\EOF &&
+	[d] (.) [dir2] ./dir2
+	[d] (a) [a] ./dir2/a
+	[d] (a/b) [b] ./dir2/a/b
+	[d] (a/b/c) [c] ./dir2/a/b/c
+	[f] (a/b/c/d) [d] ./dir2/a/b/c/d
+	[d] (a/b/c) [c] ./dir2/a/b/c
+	[d] (a/b) [b] ./dir2/a/b
+	[d] (a) [a] ./dir2/a
+	[d] (.) [dir2] ./dir2
+	EOF
+
+	test-dir-iterator --pre-order --post-order --list-root-dir ./dir2 >actual-pre-order-post-order-root-dir-output &&
+	test_cmp expect-pre-order-post-order-root-dir-output actual-pre-order-post-order-root-dir-output
+'
+
+test_expect_success 'dir-iterator should list root dir properly with relative directory' '
+	cat >expect-root-dir-output <<-\EOF &&
+	[d] (.) [dir3] ./dir3/.
+	EOF
+
+	test-dir-iterator --pre-order --list-root-dir ./dir3/. >actual-root-dir-output &&
+	test_cmp expect-root-dir-output actual-root-dir-output
+'
+
+test_expect_success 'dir-iterator should return ENOENT upon opening non-existing directory' '
+	cat >expect-non-existing-dir-output <<-\EOF &&
+	begin failed: 2
+	EOF
+
+	test-dir-iterator ./dir666 >actual-non-existing-dir-output &&
+	test_cmp expect-non-existing-dir-output actual-non-existing-dir-output
+'
+
+test_expect_success 'dir-iterator should return ENOTDIR upon opening non-directory path' '
+	cat >expect-not-a-directory-output <<-\EOF &&
+	begin failed: 20
+	EOF
+
+	test-dir-iterator ./file >actual-not-a-directory-output &&
+	test_cmp expect-not-a-directory-output actual-not-a-directory-output
+'
+
 test_done
-- 
2.7.4 (Apple Git-66)


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH v11 5/5] remove_subtree(): reimplement using iterators
  2017-04-26 17:03 [PATCH v11 0/5] [GSoC] remove_subtree(): reimplement using iterators Daniel Ferreira
                   ` (3 preceding siblings ...)
  2017-04-26 17:03 ` [PATCH v11 4/5] dir_iterator: rewrite state machine model Daniel Ferreira
@ 2017-04-26 17:03 ` Daniel Ferreira
  4 siblings, 0 replies; 6+ messages in thread
From: Daniel Ferreira @ 2017-04-26 17:03 UTC (permalink / raw)
  To: git; +Cc: gitster, sbeller, pclouds, mhagger, Daniel Ferreira

Use dir_iterator to traverse through remove_subtree()'s directory tree,
avoiding the need for recursive calls to readdir(). Simplify
remove_subtree()'s code.

A conversion similar in purpose was previously done at 46d092a
("for_each_reflog(): reimplement using iterators", 2016-05-21).

Signed-off-by: Daniel Ferreira <bnmvco@gmail.com>
---
 entry.c | 42 ++++++++++++++++--------------------------
 1 file changed, 16 insertions(+), 26 deletions(-)

diff --git a/entry.c b/entry.c
index d2b512d..a939432 100644
--- a/entry.c
+++ b/entry.c
@@ -3,6 +3,8 @@
 #include "dir.h"
 #include "streaming.h"
 #include "submodule.h"
+#include "iterator.h"
+#include "dir-iterator.h"
 
 static void create_directories(const char *path, int path_len,
 			       const struct checkout *state)
@@ -45,33 +47,21 @@ static void create_directories(const char *path, int path_len,
 	free(buf);
 }
 
-static void remove_subtree(struct strbuf *path)
+static void remove_subtree(const char *path)
 {
-	DIR *dir = opendir(path->buf);
-	struct dirent *de;
-	int origlen = path->len;
-
-	if (!dir)
-		die_errno("cannot opendir '%s'", path->buf);
-	while ((de = readdir(dir)) != NULL) {
-		struct stat st;
-
-		if (is_dot_or_dotdot(de->d_name))
-			continue;
-
-		strbuf_addch(path, '/');
-		strbuf_addstr(path, de->d_name);
-		if (lstat(path->buf, &st))
-			die_errno("cannot lstat '%s'", path->buf);
-		if (S_ISDIR(st.st_mode))
-			remove_subtree(path);
-		else if (unlink(path->buf))
-			die_errno("cannot unlink '%s'", path->buf);
-		strbuf_setlen(path, origlen);
+	struct dir_iterator *diter = dir_iterator_begin(path,
+		DIR_ITERATOR_POST_ORDER_TRAVERSAL | DIR_ITERATOR_LIST_ROOT_DIR);
+	if (!diter) {
+		die_errno("cannot remove path '%s'", path);
+	}
+
+	while (dir_iterator_advance(diter) == ITER_OK) {
+		if (S_ISDIR(diter->st.st_mode)) {
+			if (rmdir(diter->path.buf))
+				die_errno("cannot rmdir '%s'", diter->path.buf);
+		} else if (unlink(diter->path.buf))
+			die_errno("cannot unlink '%s'", diter->path.buf);
 	}
-	closedir(dir);
-	if (rmdir(path->buf))
-		die_errno("cannot rmdir '%s'", path->buf);
 }
 
 static int create_file(const char *path, unsigned int mode)
@@ -312,7 +302,7 @@ int checkout_entry(struct cache_entry *ce,
 				return 0;
 			if (!state->force)
 				return error("%s is a directory", path.buf);
-			remove_subtree(&path);
+			remove_subtree(path.buf);
 		} else if (unlink(path.buf))
 			return error_errno("unable to unlink old '%s'", path.buf);
 	} else if (state->not_new)
-- 
2.7.4 (Apple Git-66)


^ permalink raw reply related	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2017-04-26 17:04 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-04-26 17:03 [PATCH v11 0/5] [GSoC] remove_subtree(): reimplement using iterators Daniel Ferreira
2017-04-26 17:03 ` [PATCH v11 1/5] dir_iterator: add tests for dir_iterator API Daniel Ferreira
2017-04-26 17:03 ` [PATCH v11 2/5] remove_subtree(): test removing nested directories Daniel Ferreira
2017-04-26 17:03 ` [PATCH v11 3/5] dir_iterator: refactor dir_iterator_advance Daniel Ferreira
2017-04-26 17:03 ` [PATCH v11 4/5] dir_iterator: rewrite state machine model Daniel Ferreira
2017-04-26 17:03 ` [PATCH v11 5/5] remove_subtree(): reimplement using iterators Daniel Ferreira

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).