git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH v9 0/5] [GSoC] remove_subtree(): reimplement using iterators
@ 2017-04-17 20:51 Daniel Ferreira
  2017-04-17 20:51 ` [PATCH v9 1/5] dir_iterator: add tests for dir_iterator API Daniel Ferreira
                   ` (5 more replies)
  0 siblings, 6 replies; 13+ messages in thread
From: Daniel Ferreira @ 2017-04-17 20:51 UTC (permalink / raw)
  To: git; +Cc: gitster, sbeller, pclouds, mhagger, Daniel Ferreira

This is the ninth version of a patch series that implements the GSoC
microproject of converting a recursive call to readdir() to use
dir_iterator.

v1: https://public-inbox.org/git/CAGZ79kZwT-9mHTiOJ5CEjk2wDFkn6+NcogjX0=vjhsAh16ANYg@mail.gmail.com/T/#t
v2: https://public-inbox.org/git/CACsJy8Dxh-QPBBLfaFWPAWUsbA9GVXA7x+mXLjEvYKhk1zOpig@mail.gmail.com/T/#t
v3: https://public-inbox.org/git/CAGZ79kYtpmURSQWPumobA=e3JBFjKhWCdv_LPhKCd71ZRwMovA@mail.gmail.com/T/#t
v4: https://public-inbox.org/git/1490747533-89143-1-git-send-email-bnmvco@gmail.com/T/#e437a63e0c22c00c69b5d92977c9b438ed2b9fd3a
v5: https://public-inbox.org/git/1490844730-47634-1-git-send-email-bnmvco@gmail.com/T/#m2323f15e45de699f2e09364f40a62e17047cf453
v6: https://public-inbox.org/git/1491107726-21504-1-git-send-email-bnmvco@gmail.com/T/#t
v7: https://public-inbox.org/git/1491163388-41255-1-git-send-email-bnmvco@gmail.com/T/#t
v8: https://public-inbox.org/git/a60b2ed6-2b99-b134-05af-7c8492a6949c@alum.mit.edu/T/#t

Travis CI build: https://travis-ci.org/theiostream/git/builds/222919383

I think this is the closest to a final version we've ever gotten. I
followed all of Michael and Stefan's suggestions on top of v8, and with
Michael's endorsement made dir_iterator_begin() return NULL and set
errno appropriately in case of an error.

On second thought, maybe the extra code complexity required from
dir_iterator_begin()'s callers might be actually an advantage as
dir_iterator grows to tackle more complex dir traversing challenges on
Git. After all, we might want some special behavior depending on what
the given `path` is instead of always considering it valid and later
behaving as if it was an empty directory.

Thanks again for the reviews.

Daniel Ferreira (5):
  dir_iterator: add tests for dir_iterator API
  remove_subtree(): test removing nested directories
  dir_iterator: add helpers to dir_iterator_advance
  dir_iterator: refactor state machine model
  remove_subtree(): reimplement using iterators

 Makefile                        |   1 +
 dir-iterator.c                  | 252 +++++++++++++++++++++++++++++-----------
 dir-iterator.h                  |  35 ++++--
 entry.c                         |  42 +++----
 refs/files-backend.c            |  51 +++++---
 t/helper/.gitignore             |   1 +
 t/helper/test-dir-iterator.c    |  53 +++++++++
 t/t0065-dir-iterator.sh         | 113 ++++++++++++++++++
 t/t2000-checkout-cache-clash.sh |  11 ++
 9 files changed, 435 insertions(+), 124 deletions(-)
 create mode 100644 t/helper/test-dir-iterator.c
 create mode 100755 t/t0065-dir-iterator.sh

--
2.7.4 (Apple Git-66)


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH v9 1/5] dir_iterator: add tests for dir_iterator API
  2017-04-17 20:51 [PATCH v9 0/5] [GSoC] remove_subtree(): reimplement using iterators Daniel Ferreira
@ 2017-04-17 20:51 ` Daniel Ferreira
  2017-04-18  4:12   ` Junio C Hamano
  2017-04-18  4:37   ` Junio C Hamano
  2017-04-17 20:51 ` [PATCH v9 2/5] remove_subtree(): test removing nested directories Daniel Ferreira
                   ` (4 subsequent siblings)
  5 siblings, 2 replies; 13+ messages in thread
From: Daniel Ferreira @ 2017-04-17 20:51 UTC (permalink / raw)
  To: git; +Cc: gitster, sbeller, pclouds, mhagger, Daniel Ferreira

Create t/helper/test-dir-iterator.c, which prints relevant information
about a directory tree iterated over with dir_iterator.

Create t/t0065-dir-iterator.sh, which tests that dir_iterator does
iterate through a whole directory tree.

Signed-off-by: Daniel Ferreira <bnmvco@gmail.com>
---
 Makefile                     |  1 +
 t/helper/.gitignore          |  1 +
 t/helper/test-dir-iterator.c | 30 ++++++++++++++++++++++++
 t/t0065-dir-iterator.sh      | 55 ++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 87 insertions(+)
 create mode 100644 t/helper/test-dir-iterator.c
 create mode 100755 t/t0065-dir-iterator.sh

diff --git a/Makefile b/Makefile
index d595ea3..a5c1ac0 100644
--- a/Makefile
+++ b/Makefile
@@ -614,6 +614,7 @@ TEST_PROGRAMS_NEED_X += test-ctype
 TEST_PROGRAMS_NEED_X += test-config
 TEST_PROGRAMS_NEED_X += test-date
 TEST_PROGRAMS_NEED_X += test-delta
+TEST_PROGRAMS_NEED_X += test-dir-iterator
 TEST_PROGRAMS_NEED_X += test-dump-cache-tree
 TEST_PROGRAMS_NEED_X += test-dump-split-index
 TEST_PROGRAMS_NEED_X += test-dump-untracked-cache
diff --git a/t/helper/.gitignore b/t/helper/.gitignore
index 758ed2e..a7d74d3 100644
--- a/t/helper/.gitignore
+++ b/t/helper/.gitignore
@@ -3,6 +3,7 @@
 /test-config
 /test-date
 /test-delta
+/test-dir-iterator
 /test-dump-cache-tree
 /test-dump-split-index
 /test-dump-untracked-cache
diff --git a/t/helper/test-dir-iterator.c b/t/helper/test-dir-iterator.c
new file mode 100644
index 0000000..a7d1470
--- /dev/null
+++ b/t/helper/test-dir-iterator.c
@@ -0,0 +1,30 @@
+#include "git-compat-util.h"
+#include "strbuf.h"
+#include "iterator.h"
+#include "dir-iterator.h"
+
+int cmd_main(int argc, const char **argv)
+{
+	struct strbuf path = STRBUF_INIT;
+	struct dir_iterator *diter;
+
+	if (argc < 2)
+		die("BUG: test-dir-iterator needs one argument");
+
+	strbuf_add(&path, argv[1], strlen(argv[1]));
+
+	diter = dir_iterator_begin(path.buf);
+
+	while (dir_iterator_advance(diter) == ITER_OK) {
+		if (S_ISDIR(diter->st.st_mode))
+			printf("[d] ");
+		else if (S_ISREG(diter->st.st_mode))
+			printf("[f] ");
+		else
+			printf("[?] ");
+
+		printf("(%s) [%s] %s\n", diter->relative_path, diter->basename, diter->path.buf);
+	}
+
+	return 0;
+}
diff --git a/t/t0065-dir-iterator.sh b/t/t0065-dir-iterator.sh
new file mode 100755
index 0000000..b6d76dd
--- /dev/null
+++ b/t/t0065-dir-iterator.sh
@@ -0,0 +1,55 @@
+#!/bin/sh
+
+test_description='Test directory iteration.'
+
+. ./test-lib.sh
+
+test_expect_success 'setup' '
+	mkdir -p dir &&
+	mkdir -p dir/a/b/c/ &&
+	>dir/b &&
+	>dir/c &&
+	mkdir -p dir/d/e/d/ &&
+	>dir/a/b/c/d &&
+	>dir/a/e &&
+	>dir/d/e/d/a &&
+
+	mkdir -p dir2/a/b/c/ &&
+	>dir2/a/b/c/d
+'
+
+cat >expect-sorted-output <<-\EOF &&
+[d] (a) [a] ./dir/a
+[d] (a/b) [b] ./dir/a/b
+[d] (a/b/c) [c] ./dir/a/b/c
+[d] (d) [d] ./dir/d
+[d] (d/e) [e] ./dir/d/e
+[d] (d/e/d) [d] ./dir/d/e/d
+[f] (a/b/c/d) [d] ./dir/a/b/c/d
+[f] (a/e) [e] ./dir/a/e
+[f] (b) [b] ./dir/b
+[f] (c) [c] ./dir/c
+[f] (d/e/d/a) [a] ./dir/d/e/d/a
+EOF
+
+test_expect_success 'dir-iterator should iterate through all files' '
+	test-dir-iterator ./dir >out &&
+	sort <out >./actual-pre-order-sorted-output &&
+
+	test_cmp expect-sorted-output actual-pre-order-sorted-output
+'
+
+cat >expect-pre-order-output <<-\EOF &&
+[d] (a) [a] ./dir2/a
+[d] (a/b) [b] ./dir2/a/b
+[d] (a/b/c) [c] ./dir2/a/b/c
+[f] (a/b/c/d) [d] ./dir2/a/b/c/d
+EOF
+
+test_expect_success 'dir-iterator should list files in the correct order' '
+	test-dir-iterator ./dir2 >actual-pre-order-output &&
+
+	test_cmp expect-pre-order-output actual-pre-order-output
+'
+
+test_done
-- 
2.7.4 (Apple Git-66)


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v9 2/5] remove_subtree(): test removing nested directories
  2017-04-17 20:51 [PATCH v9 0/5] [GSoC] remove_subtree(): reimplement using iterators Daniel Ferreira
  2017-04-17 20:51 ` [PATCH v9 1/5] dir_iterator: add tests for dir_iterator API Daniel Ferreira
@ 2017-04-17 20:51 ` Daniel Ferreira
  2017-04-17 20:51 ` [PATCH v9 3/5] dir_iterator: add helpers to dir_iterator_advance Daniel Ferreira
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 13+ messages in thread
From: Daniel Ferreira @ 2017-04-17 20:51 UTC (permalink / raw)
  To: git; +Cc: gitster, sbeller, pclouds, mhagger, Daniel Ferreira

Test removing a nested directory when an attempt is made to restore the
index to a state where it does not exist. A similar test could be found
previously in t/t2000-checkout-cache-clash.sh, but it did not check for
nested directories, which could allow a faulty implementation of
remove_subtree() pass the tests.

Signed-off-by: Daniel Ferreira <bnmvco@gmail.com>
---
 t/t2000-checkout-cache-clash.sh | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/t/t2000-checkout-cache-clash.sh b/t/t2000-checkout-cache-clash.sh
index de3edb5..ac10ba3 100755
--- a/t/t2000-checkout-cache-clash.sh
+++ b/t/t2000-checkout-cache-clash.sh
@@ -57,4 +57,15 @@ test_expect_success SYMLINKS 'checkout-index -f twice with --prefix' '
 	git checkout-index -a -f --prefix=there/
 '
 
+test_expect_success 'git checkout-index -f should remove nested subtrees' '
+	echo content >path &&
+	git update-index --add path &&
+	rm path &&
+	mkdir -p path/with/nested/paths &&
+	echo content >path/file1 &&
+	echo content >path/with/nested/paths/file2 &&
+	git checkout-index -f -a &&
+	test ! -d path
+'
+
 test_done
-- 
2.7.4 (Apple Git-66)


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v9 3/5] dir_iterator: add helpers to dir_iterator_advance
  2017-04-17 20:51 [PATCH v9 0/5] [GSoC] remove_subtree(): reimplement using iterators Daniel Ferreira
  2017-04-17 20:51 ` [PATCH v9 1/5] dir_iterator: add tests for dir_iterator API Daniel Ferreira
  2017-04-17 20:51 ` [PATCH v9 2/5] remove_subtree(): test removing nested directories Daniel Ferreira
@ 2017-04-17 20:51 ` Daniel Ferreira
  2017-04-18  4:23   ` Junio C Hamano
  2017-04-17 20:51 ` [PATCH v9 4/5] dir_iterator: refactor state machine model Daniel Ferreira
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 13+ messages in thread
From: Daniel Ferreira @ 2017-04-17 20:51 UTC (permalink / raw)
  To: git; +Cc: gitster, sbeller, pclouds, mhagger, Daniel Ferreira

Create helpers to dir_iterator_advance(). Make dir_iterator_advance()'s
code more legible and allow some behavior to be reusable.

Signed-off-by: Daniel Ferreira <bnmvco@gmail.com>
---
 dir-iterator.c | 65 +++++++++++++++++++++++++++++++++++++---------------------
 1 file changed, 42 insertions(+), 23 deletions(-)

diff --git a/dir-iterator.c b/dir-iterator.c
index 34182a9..9e073a0 100644
--- a/dir-iterator.c
+++ b/dir-iterator.c
@@ -50,6 +50,43 @@ struct dir_iterator_int {
 	struct dir_iterator_level *levels;
 };
 
+static void push_dir_level(struct dir_iterator_int *iter, struct dir_iterator_level *level)
+{
+	level->dir_state = DIR_STATE_RECURSE;
+	ALLOC_GROW(iter->levels, iter->levels_nr + 1,
+		   iter->levels_alloc);
+	level = &iter->levels[iter->levels_nr++];
+	level->initialized = 0;
+}
+
+static int pop_dir_level(struct dir_iterator_int *iter)
+{
+	return --iter->levels_nr;
+}
+
+static int set_iterator_data(struct dir_iterator_int *iter, struct dir_iterator_level *level)
+{
+	if (lstat(iter->base.path.buf, &iter->base.st) < 0) {
+		if (errno != ENOENT)
+			warning("error reading path '%s': %s",
+				iter->base.path.buf,
+				strerror(errno));
+		return -1;
+	}
+
+	/*
+	 * We have to set these each time because
+	 * the path strbuf might have been realloc()ed.
+	 */
+	iter->base.relative_path =
+		iter->base.path.buf + iter->levels[0].prefix_len;
+	iter->base.basename =
+		iter->base.path.buf + level->prefix_len;
+	level->dir_state = DIR_STATE_ITER;
+
+	return 0;
+}
+
 int dir_iterator_advance(struct dir_iterator *dir_iterator)
 {
 	struct dir_iterator_int *iter =
@@ -84,11 +121,7 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
 				 * over; now prepare to iterate into
 				 * it.
 				 */
-				level->dir_state = DIR_STATE_RECURSE;
-				ALLOC_GROW(iter->levels, iter->levels_nr + 1,
-					   iter->levels_alloc);
-				level = &iter->levels[iter->levels_nr++];
-				level->initialized = 0;
+				push_dir_level(iter, level);
 				continue;
 			} else {
 				/*
@@ -104,7 +137,7 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
 			 * This level is exhausted (or wasn't opened
 			 * successfully); pop up a level.
 			 */
-			if (--iter->levels_nr == 0)
+			if (pop_dir_level(iter) == 0)
 				return dir_iterator_abort(dir_iterator);
 
 			continue;
@@ -129,7 +162,7 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
 						iter->base.path.buf, strerror(errno));
 
 				level->dir = NULL;
-				if (--iter->levels_nr == 0)
+				if (pop_dir_level(iter) == 0)
 					return dir_iterator_abort(dir_iterator);
 				break;
 			}
@@ -138,23 +171,9 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
 				continue;
 
 			strbuf_addstr(&iter->base.path, de->d_name);
-			if (lstat(iter->base.path.buf, &iter->base.st) < 0) {
-				if (errno != ENOENT)
-					warning("error reading path '%s': %s",
-						iter->base.path.buf,
-						strerror(errno));
-				continue;
-			}
 
-			/*
-			 * We have to set these each time because
-			 * the path strbuf might have been realloc()ed.
-			 */
-			iter->base.relative_path =
-				iter->base.path.buf + iter->levels[0].prefix_len;
-			iter->base.basename =
-				iter->base.path.buf + level->prefix_len;
-			level->dir_state = DIR_STATE_ITER;
+			if (set_iterator_data(iter, level))
+				continue;
 
 			return ITER_OK;
 		}
-- 
2.7.4 (Apple Git-66)


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v9 4/5] dir_iterator: refactor state machine model
  2017-04-17 20:51 [PATCH v9 0/5] [GSoC] remove_subtree(): reimplement using iterators Daniel Ferreira
                   ` (2 preceding siblings ...)
  2017-04-17 20:51 ` [PATCH v9 3/5] dir_iterator: add helpers to dir_iterator_advance Daniel Ferreira
@ 2017-04-17 20:51 ` Daniel Ferreira
  2017-04-18  4:35   ` Junio C Hamano
  2017-04-18  5:13   ` Junio C Hamano
  2017-04-17 20:51 ` [PATCH v9 5/5] remove_subtree(): reimplement using iterators Daniel Ferreira
  2017-04-18  4:09 ` [PATCH v9 0/5] [GSoC] " Junio C Hamano
  5 siblings, 2 replies; 13+ messages in thread
From: Daniel Ferreira @ 2017-04-17 20:51 UTC (permalink / raw)
  To: git; +Cc: gitster, sbeller, pclouds, mhagger, Daniel Ferreira

Perform major refactor of dir_iterator_advance(). dir_iterator has
ceased to rely on a convoluted state machine mechanism of two loops and
two state variables (level.initialized and level.dir_state). This serves
to ease comprehension of the iterator mechanism and ease addition of new
features to the iterator.

Make dir_iterator_begin() attempt to lstat() the path it receives, and
return NULL and an appropriate errno if it fails or if the passed path
was not a directory.

Create an option for the dir_iterator API to iterate over subdirectories
only after having iterated through their contents. This feature was
predicted, although not implemented by 0fe5043 ("dir_iterator: new API
for iterating over a directory tree", 2016-06-18). This is useful for
recursively removing a directory and calling rmdir() on a directory only
after all of its contents have been wiped.

Add an option for the dir_iterator API to iterate over the root
directory (the one it was initialized with) as well.

Add the "flags" parameter to dir_iterator_create, allowing for the
aforementioned new features to be enabled. The new default behavior
(i.e. flags set to 0) does not iterate over directories. Flag
DIR_ITERATOR_PRE_ORDER_TRAVERSAL iterates over a directory before doing
so over its contents. DIR_ITERATOR_POST_ORDER_TRAVERSAL iterates over a
directory after doing so over its contents. DIR_ITERATOR_LIST_ROOT_DIR
iterates over the root directory. These flags do not conflict with each
other and may be used simultaneously.

Amend a call to dir_iterator_begin() in refs/files-backend.c to pass
the flags parameter introduced, as well as handle the case in which it
fails to open the directory.

Improve t/t0065-dir-iterator.sh and t/helper/test-dir-iterator.c to
test "post-order" and "iterate-over-root" modes.

Signed-off-by: Daniel Ferreira <bnmvco@gmail.com>
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
---
 dir-iterator.c               | 223 ++++++++++++++++++++++++++++++-------------
 dir-iterator.h               |  35 +++++--
 refs/files-backend.c         |  51 ++++++----
 t/helper/test-dir-iterator.c |  31 +++++-
 t/t0065-dir-iterator.sh      |  66 ++++++++++++-
 5 files changed, 305 insertions(+), 101 deletions(-)

diff --git a/dir-iterator.c b/dir-iterator.c
index 9e073a0..9394797 100644
--- a/dir-iterator.c
+++ b/dir-iterator.c
@@ -4,8 +4,6 @@
 #include "dir-iterator.h"
 
 struct dir_iterator_level {
-	int initialized;
-
 	DIR *dir;
 
 	/*
@@ -20,9 +18,15 @@ struct dir_iterator_level {
 	 * iteration and also iterated into):
 	 */
 	enum {
-		DIR_STATE_ITER,
-		DIR_STATE_RECURSE
+		DIR_STATE_PUSHED,
+		DIR_STATE_PRE_ITERATION,
+		DIR_STATE_ITERATING,
+		DIR_STATE_POST_ITERATION,
+		DIR_STATE_EXHAUSTED
 	} dir_state;
+
+	/* The stat structure for the directory this level represents. */
+	struct stat st;
 };
 
 /*
@@ -48,15 +52,23 @@ struct dir_iterator_int {
 	 * that will be included in this iteration.
 	 */
 	struct dir_iterator_level *levels;
+
+	/* Holds the flags passed to dir_iterator_begin(). */
+	unsigned flags;
 };
 
-static void push_dir_level(struct dir_iterator_int *iter, struct dir_iterator_level *level)
+static void push_dir_level(struct dir_iterator_int *iter, struct stat *st)
 {
-	level->dir_state = DIR_STATE_RECURSE;
+	struct dir_iterator_level *level;
+
 	ALLOC_GROW(iter->levels, iter->levels_nr + 1,
 		   iter->levels_alloc);
+
+	/* Push a new level */
 	level = &iter->levels[iter->levels_nr++];
-	level->initialized = 0;
+	level->dir = NULL;
+	level->dir_state = DIR_STATE_PUSHED;
+	level->st = *st;
 }
 
 static int pop_dir_level(struct dir_iterator_int *iter)
@@ -64,9 +76,14 @@ static int pop_dir_level(struct dir_iterator_int *iter)
 	return --iter->levels_nr;
 }
 
-static int set_iterator_data(struct dir_iterator_int *iter, struct dir_iterator_level *level)
+static int set_iterator_data(struct dir_iterator_int *iter,
+		struct dir_iterator_level *level)
 {
-	if (lstat(iter->base.path.buf, &iter->base.st) < 0) {
+	char *last_path_component;
+
+	if (level->dir_state != DIR_STATE_ITERATING) {
+		iter->base.st = level->st;
+	} else if (lstat(iter->base.path.buf, &iter->base.st) < 0) {
 		if (errno != ENOENT)
 			warning("error reading path '%s': %s",
 				iter->base.path.buf,
@@ -75,18 +92,48 @@ static int set_iterator_data(struct dir_iterator_int *iter, struct dir_iterator_
 	}
 
 	/*
-	 * We have to set these each time because
-	 * the path strbuf might have been realloc()ed.
+	 * Check if we are dealing with the root directory as an
+	 * item that's being iterated through.
 	 */
-	iter->base.relative_path =
-		iter->base.path.buf + iter->levels[0].prefix_len;
-	iter->base.basename =
-		iter->base.path.buf + level->prefix_len;
-	level->dir_state = DIR_STATE_ITER;
+	if (level->dir_state != DIR_STATE_ITERATING &&
+		iter->levels_nr == 1) {
+		iter->base.relative_path = ".";
+
+		/*
+		 * If we have a path like './dir', we'll get everything
+		 * after the last slash a basename. If we don't find a
+		 * slash (e.g. 'dir'), we return the whole path.
+		 */
+		last_path_component = strrchr(iter->base.path.buf, '/');
+		iter->base.basename = last_path_component ?
+			last_path_component + 1 : iter->base.path.buf;
+	} else {
+		iter->base.relative_path =
+			iter->base.path.buf + iter->levels[0].prefix_len;
+
+		if (S_ISDIR(iter->base.st.st_mode))
+			iter->base.basename =
+				iter->base.path.buf + iter->levels[iter->levels_nr - 2].prefix_len;
+		else
+			iter->base.basename =
+				iter->base.path.buf + level->prefix_len;
+	}
 
 	return 0;
 }
 
+/*
+ * This function uses a state machine with the following states:
+ * -> DIR_STATE_PUSHED: the directory has been pushed to the
+ * iterator traversal tree.
+ * -> DIR_STATE_PRE_ITERATION: the directory is *NOT* initialized. The
+ * dirpath has already been returned if pre-order traversal is set.
+ * -> DIR_STATE_ITERATING: the directory is initialized. We are traversing
+ * through it.
+ * -> DIR_STATE_POST_ITERATION: the directory has been iterated through.
+ * We are ready to close it.
+ * -> DIR_STATE_EXHAUSTED: the directory is closed and ready to be popped.
+ */
 int dir_iterator_advance(struct dir_iterator *dir_iterator)
 {
 	struct dir_iterator_int *iter =
@@ -95,9 +142,26 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
 	while (1) {
 		struct dir_iterator_level *level =
 			&iter->levels[iter->levels_nr - 1];
-		struct dirent *de;
 
-		if (!level->initialized) {
+		if (level->dir_state == DIR_STATE_PUSHED) {
+			level->dir_state = DIR_STATE_PRE_ITERATION;
+
+			/* We may not want the root directory to be iterated over */
+			if ((iter->flags & DIR_ITERATOR_PRE_ORDER_TRAVERSAL) && (
+				iter->levels_nr != 1 ||
+				(iter->flags & DIR_ITERATOR_LIST_ROOT_DIR))) {
+				/*
+				 * This will only error if we fail to lstat() the
+				 * root directory. In this case, we bail.
+				 */
+				if (set_iterator_data(iter, level)) {
+					level->dir_state = DIR_STATE_EXHAUSTED;
+					continue;
+				}
+
+				return ITER_OK;
+			}
+		} else if (level->dir_state == DIR_STATE_PRE_ITERATION) {
 			/*
 			 * Note: dir_iterator_begin() ensures that
 			 * path is not the empty string.
@@ -107,64 +171,37 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
 			level->prefix_len = iter->base.path.len;
 
 			level->dir = opendir(iter->base.path.buf);
-			if (!level->dir && errno != ENOENT) {
-				warning("error opening directory %s: %s",
-					iter->base.path.buf, strerror(errno));
-				/* Popping the level is handled below */
-			}
-
-			level->initialized = 1;
-		} else if (S_ISDIR(iter->base.st.st_mode)) {
-			if (level->dir_state == DIR_STATE_ITER) {
+			if (!level->dir) {
 				/*
-				 * The directory was just iterated
-				 * over; now prepare to iterate into
-				 * it.
+				 * This level wasn't opened sucessfully; pretend we
+				 * iterated through it already.
 				 */
-				push_dir_level(iter, level);
+				if (errno != ENOENT) {
+					warning("error opening directory %s: %s",
+						iter->base.path.buf, strerror(errno));
+				}
+
+				level->dir_state = DIR_STATE_POST_ITERATION;
 				continue;
-			} else {
-				/*
-				 * The directory has already been
-				 * iterated over and iterated into;
-				 * we're done with it.
-				 */
 			}
-		}
 
-		if (!level->dir) {
-			/*
-			 * This level is exhausted (or wasn't opened
-			 * successfully); pop up a level.
-			 */
-			if (pop_dir_level(iter) == 0)
-				return dir_iterator_abort(dir_iterator);
+			level->dir_state = DIR_STATE_ITERATING;
+		} else if (level->dir_state == DIR_STATE_ITERATING) {
+			struct dirent *de;
 
-			continue;
-		}
-
-		/*
-		 * Loop until we find an entry that we can give back
-		 * to the caller:
-		 */
-		while (1) {
 			strbuf_setlen(&iter->base.path, level->prefix_len);
 			errno = 0;
 			de = readdir(level->dir);
 
 			if (!de) {
-				/* This level is exhausted; pop up a level. */
+				/* In case of readdir() error */
 				if (errno) {
 					warning("error reading directory %s: %s",
 						iter->base.path.buf, strerror(errno));
-				} else if (closedir(level->dir))
-					warning("error closing directory %s: %s",
-						iter->base.path.buf, strerror(errno));
+				}
 
-				level->dir = NULL;
-				if (pop_dir_level(iter) == 0)
-					return dir_iterator_abort(dir_iterator);
-				break;
+				level->dir_state = DIR_STATE_POST_ITERATION;
+				continue;
 			}
 
 			if (is_dot_or_dotdot(de->d_name))
@@ -175,7 +212,41 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
 			if (set_iterator_data(iter, level))
 				continue;
 
+			if (S_ISDIR(iter->base.st.st_mode)) {
+				push_dir_level(iter, &iter->base.st);
+				continue;
+			}
+
 			return ITER_OK;
+		} else if (level->dir_state == DIR_STATE_POST_ITERATION) {
+			if (level->dir != NULL && closedir(level->dir)) {
+				warning("error closing directory %s: %s",
+					iter->base.path.buf, strerror(errno));
+			}
+			level->dir_state = DIR_STATE_EXHAUSTED;
+
+			strbuf_setlen(&iter->base.path, level->prefix_len);
+			/*
+			 * Since we are iterating through the dirpath
+			 * after we have gone through it, we still need
+			 * to get rid of the trailing slash we appended.
+			 */
+			strbuf_strip_suffix(&iter->base.path, "/");
+
+			/* We may not want the root directory to be iterated over */
+			if ((iter->flags & DIR_ITERATOR_POST_ORDER_TRAVERSAL) && (
+				iter->levels_nr != 1 ||
+				(iter->flags & DIR_ITERATOR_LIST_ROOT_DIR))) {
+				/*
+				 * In this state, set_iterator_data() should never return
+				 * an error.
+				 */
+				set_iterator_data(iter, level);
+				return ITER_OK;
+			}
+		} else if (level->dir_state == DIR_STATE_EXHAUSTED) {
+			if (pop_dir_level(iter) == 0)
+				return dir_iterator_abort(dir_iterator);
 		}
 	}
 }
@@ -201,21 +272,41 @@ int dir_iterator_abort(struct dir_iterator *dir_iterator)
 	return ITER_DONE;
 }
 
-struct dir_iterator *dir_iterator_begin(const char *path)
+struct dir_iterator *dir_iterator_begin(const char *path, unsigned flags)
 {
-	struct dir_iterator_int *iter = xcalloc(1, sizeof(*iter));
-	struct dir_iterator *dir_iterator = &iter->base;
+	struct dir_iterator_int *iter;
+	struct dir_iterator *dir_iterator;
+	struct stat st;
 
 	if (!path || !*path)
 		die("BUG: empty path passed to dir_iterator_begin()");
 
+	if (lstat(path, &st) < 0) {
+		if (errno != ENOENT)
+			warning("error reading path '%s': %s",
+					path, strerror(errno));
+
+		return NULL;
+	}
+
+	if (!S_ISDIR(st.st_mode)) {
+		errno = ENOTDIR;
+		return NULL;
+	}
+
+	iter = xcalloc(1, sizeof(*iter));
+	dir_iterator = &iter->base;
+
+	iter->flags = flags;
+	dir_iterator->st = st;
+
 	strbuf_init(&iter->base.path, PATH_MAX);
 	strbuf_addstr(&iter->base.path, path);
 
 	ALLOC_GROW(iter->levels, 10, iter->levels_alloc);
+	iter->levels_nr = 0;
 
-	iter->levels_nr = 1;
-	iter->levels[0].initialized = 0;
+	push_dir_level(iter, &dir_iterator->st);
 
 	return dir_iterator;
 }
diff --git a/dir-iterator.h b/dir-iterator.h
index 27739e6..4d8ab75 100644
--- a/dir-iterator.h
+++ b/dir-iterator.h
@@ -5,19 +5,17 @@
  * Iterate over a directory tree.
  *
  * Iterate over a directory tree, recursively, including paths of all
- * types and hidden paths. Skip "." and ".." entries and don't follow
- * symlinks except for the original path.
+ * types and hidden paths. Skip "." and ".." entries.
  *
  * Every time dir_iterator_advance() is called, update the members of
  * the dir_iterator structure to reflect the next path in the
  * iteration. The order that paths are iterated over within a
- * directory is undefined, but directory paths are always iterated
- * over before the subdirectory contents.
+ * directory is undefined.
  *
  * A typical iteration looks like this:
  *
  *     int ok;
- *     struct iterator *iter = dir_iterator_begin(path);
+ *     struct iterator *iter = dir_iterator_begin(path, flags);
  *
  *     while ((ok = dir_iterator_advance(iter)) == ITER_OK) {
  *             if (want_to_stop_iteration()) {
@@ -38,6 +36,26 @@
  * dir_iterator_advance() again.
  */
 
+/*
+ * Possible flags for dir_iterator_begin().
+ *
+ * -> DIR_ITERATOR_PRE_ORDER_TRAVERSAL: the iterator shall return
+ * a dirpath it has found before iterating through that directory's
+ * contents.
+ * -> DIR_ITERATOR_POST_ORDER_TRAVERSAL: the iterator shall return
+ * a dirpath it has found after iterating through that directory's
+ * contents.
+ * -> DIR_ITERATOR_LIST_ROOT_DIR: the iterator shall return the dirpath
+ * of the root directory it is iterating through if either
+ * DIR_ITERATOR_PRE_ORDER_TRAVERSAL or DIR_ITERATOR_POST_ORDER_TRAVERSAL
+ * is set.
+ *
+ * All flags can be used in any combination.
+ */
+#define DIR_ITERATOR_PRE_ORDER_TRAVERSAL (1 << 0)
+#define DIR_ITERATOR_POST_ORDER_TRAVERSAL (1 << 1)
+#define DIR_ITERATOR_LIST_ROOT_DIR (1 << 2)
+
 struct dir_iterator {
 	/* The current path: */
 	struct strbuf path;
@@ -57,15 +75,16 @@ struct dir_iterator {
 };
 
 /*
- * Start a directory iteration over path. Return a dir_iterator that
- * holds the internal state of the iteration.
+ * Start a directory iteration over path, with options specified in
+ * 'flags'. Return a dir_iterator that holds the internal state of
+ * the iteration.
  *
  * The iteration includes all paths under path, not including path
  * itself and not including "." or ".." entries.
  *
  * path is the starting directory. An internal copy will be made.
  */
-struct dir_iterator *dir_iterator_begin(const char *path);
+struct dir_iterator *dir_iterator_begin(const char *path, unsigned flags);
 
 /*
  * Advance the iterator to the first or next item and return ITER_OK.
diff --git a/refs/files-backend.c b/refs/files-backend.c
index 50188e9..d376e1f 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -3284,29 +3284,42 @@ static int files_reflog_iterator_advance(struct ref_iterator *ref_iterator)
 	struct dir_iterator *diter = iter->dir_iterator;
 	int ok;
 
-	while ((ok = dir_iterator_advance(diter)) == ITER_OK) {
-		int flags;
-
-		if (!S_ISREG(diter->st.st_mode))
-			continue;
-		if (diter->basename[0] == '.')
-			continue;
-		if (ends_with(diter->basename, ".lock"))
-			continue;
+	if (!iter->dir_iterator) {
+		/*
+		 * If our dir_iterator is NULL at this stage, it means we failed
+		 * to open the given path at dir_iterator_begin(), most likely due
+		 * to it not existing.
+		 *
+		 * In this case, we pretend it was an empty directory and just
+		 * iterate through nothing.
+		 */
+		ok = ITER_DONE;
+	} else {
+		while ((ok = dir_iterator_advance(diter)) == ITER_OK) {
+			int flags;
+
+			if (!S_ISREG(diter->st.st_mode))
+				continue;
+			if (diter->basename[0] == '.')
+				continue;
+			if (ends_with(diter->basename, ".lock"))
+				continue;
+
+			if (read_ref_full(diter->relative_path, 0,
+					  iter->oid.hash, &flags)) {
+				error("bad ref for %s", diter->path.buf);
+				continue;
+			}
 
-		if (read_ref_full(diter->relative_path, 0,
-				  iter->oid.hash, &flags)) {
-			error("bad ref for %s", diter->path.buf);
-			continue;
+			iter->base.refname = diter->relative_path;
+			iter->base.oid = &iter->oid;
+			iter->base.flags = flags;
+			return ITER_OK;
 		}
 
-		iter->base.refname = diter->relative_path;
-		iter->base.oid = &iter->oid;
-		iter->base.flags = flags;
-		return ITER_OK;
+		iter->dir_iterator = NULL;
 	}
 
-	iter->dir_iterator = NULL;
 	if (ref_iterator_abort(ref_iterator) == ITER_ERROR)
 		ok = ITER_ERROR;
 	return ok;
@@ -3346,7 +3359,7 @@ static struct ref_iterator *files_reflog_iterator_begin(struct ref_store *ref_st
 	files_downcast(ref_store, 0, "reflog_iterator_begin");
 
 	base_ref_iterator_init(ref_iterator, &files_reflog_iterator_vtable);
-	iter->dir_iterator = dir_iterator_begin(git_path("logs"));
+	iter->dir_iterator = dir_iterator_begin(git_path("logs"), 0);
 	return ref_iterator;
 }
 
diff --git a/t/helper/test-dir-iterator.c b/t/helper/test-dir-iterator.c
index a7d1470..3b4948b 100644
--- a/t/helper/test-dir-iterator.c
+++ b/t/helper/test-dir-iterator.c
@@ -5,15 +5,38 @@
 
 int cmd_main(int argc, const char **argv)
 {
+	const char **myargv = argv;
+	int myargc = argc;
+
 	struct strbuf path = STRBUF_INIT;
 	struct dir_iterator *diter;
 
-	if (argc < 2)
-		die("BUG: test-dir-iterator needs one argument");
+	unsigned flag = 0;
+
+	while (--myargc && starts_with(*++myargv, "--")) {
+		if (!strcmp(*myargv, "--pre-order"))
+			flag |= DIR_ITERATOR_PRE_ORDER_TRAVERSAL;
+		else if (!strcmp(*myargv, "--post-order"))
+			flag |= DIR_ITERATOR_POST_ORDER_TRAVERSAL;
+		else if (!strcmp(*myargv, "--list-root-dir"))
+			flag |= DIR_ITERATOR_LIST_ROOT_DIR;
+		else if (!strcmp(*myargv, "--")) {
+			myargc--;
+			myargv++;
+			break;
+		} else
+			die("Unrecognized option: %s", *myargv);
+	}
 
-	strbuf_add(&path, argv[1], strlen(argv[1]));
+	if (myargc != 1)
+		die("expected exactly one non-option argument");
+	strbuf_addstr(&path, *myargv);
 
-	diter = dir_iterator_begin(path.buf);
+	diter = dir_iterator_begin(path.buf, flag);
+	if (diter == NULL) {
+		printf("begin failed: %d\n", errno);
+		return 0;
+	}
 
 	while (dir_iterator_advance(diter) == ITER_OK) {
 		if (S_ISDIR(diter->st.st_mode))
diff --git a/t/t0065-dir-iterator.sh b/t/t0065-dir-iterator.sh
index b6d76dd..157593e 100755
--- a/t/t0065-dir-iterator.sh
+++ b/t/t0065-dir-iterator.sh
@@ -15,7 +15,9 @@ test_expect_success 'setup' '
 	>dir/d/e/d/a &&
 
 	mkdir -p dir2/a/b/c/ &&
-	>dir2/a/b/c/d
+	>dir2/a/b/c/d &&
+
+	>file
 '
 
 cat >expect-sorted-output <<-\EOF &&
@@ -33,12 +35,19 @@ cat >expect-sorted-output <<-\EOF &&
 EOF
 
 test_expect_success 'dir-iterator should iterate through all files' '
-	test-dir-iterator ./dir >out &&
+	test-dir-iterator --pre-order ./dir >out &&
 	sort <out >./actual-pre-order-sorted-output &&
 
 	test_cmp expect-sorted-output actual-pre-order-sorted-output
 '
 
+test_expect_success 'dir-iterator should iterate through all files on post-order mode' '
+	test-dir-iterator --post-order ./dir >out &&
+	sort <out >actual-post-order-sorted-output &&
+
+	test_cmp expect-sorted-output actual-post-order-sorted-output
+'
+
 cat >expect-pre-order-output <<-\EOF &&
 [d] (a) [a] ./dir2/a
 [d] (a/b) [b] ./dir2/a/b
@@ -46,10 +55,59 @@ cat >expect-pre-order-output <<-\EOF &&
 [f] (a/b/c/d) [d] ./dir2/a/b/c/d
 EOF
 
-test_expect_success 'dir-iterator should list files in the correct order' '
-	test-dir-iterator ./dir2 >actual-pre-order-output &&
+test_expect_success 'dir-iterator should list files properly on pre-order mode' '
+	test-dir-iterator --pre-order ./dir2 >actual-pre-order-output &&
 
 	test_cmp expect-pre-order-output actual-pre-order-output
 '
 
+cat >expect-post-order-output <<-\EOF &&
+[f] (a/b/c/d) [d] ./dir2/a/b/c/d
+[d] (a/b/c) [c] ./dir2/a/b/c
+[d] (a/b) [b] ./dir2/a/b
+[d] (a) [a] ./dir2/a
+EOF
+
+test_expect_success 'dir-iterator should list files properly on post-order mode' '
+	test-dir-iterator --post-order ./dir2 >actual-post-order-output &&
+
+	test_cmp expect-post-order-output actual-post-order-output
+'
+
+cat >expect-pre-order-post-order-root-dir-output <<-\EOF &&
+[d] (.) [dir2] ./dir2
+[d] (a) [a] ./dir2/a
+[d] (a/b) [b] ./dir2/a/b
+[d] (a/b/c) [c] ./dir2/a/b/c
+[f] (a/b/c/d) [d] ./dir2/a/b/c/d
+[d] (a/b/c) [c] ./dir2/a/b/c
+[d] (a/b) [b] ./dir2/a/b
+[d] (a) [a] ./dir2/a
+[d] (.) [dir2] ./dir2
+EOF
+
+test_expect_success 'dir-iterator should list files properly on pre-order + post-order + root-dir mode' '
+	test-dir-iterator --pre-order --post-order --list-root-dir ./dir2 >actual-pre-order-post-order-root-dir-output &&
+
+	test_cmp expect-pre-order-post-order-root-dir-output actual-pre-order-post-order-root-dir-output
+'
+
+cat >expect-non-existing-dir-output <<-\EOF &&
+begin failed: 2
+EOF
+
+test_expect_success 'dir-iterator should return ENOENT upon opening non-existing directory' '
+	test-dir-iterator ./dir3 >actual-non-existing-dir-output &&
+	test_cmp expect-non-existing-dir-output actual-non-existing-dir-output
+'
+
+cat >expect-not-a-directory-output <<-\EOF &&
+begin failed: 20
+EOF
+
+test_expect_success 'dir-iterator should return ENOTDIR upon opening non-directory path' '
+	test-dir-iterator ./file >actual-not-a-directory-output &&
+	test_cmp expect-not-a-directory-output actual-not-a-directory-output
+'
+
 test_done
-- 
2.7.4 (Apple Git-66)


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v9 5/5] remove_subtree(): reimplement using iterators
  2017-04-17 20:51 [PATCH v9 0/5] [GSoC] remove_subtree(): reimplement using iterators Daniel Ferreira
                   ` (3 preceding siblings ...)
  2017-04-17 20:51 ` [PATCH v9 4/5] dir_iterator: refactor state machine model Daniel Ferreira
@ 2017-04-17 20:51 ` Daniel Ferreira
  2017-04-18  4:09 ` [PATCH v9 0/5] [GSoC] " Junio C Hamano
  5 siblings, 0 replies; 13+ messages in thread
From: Daniel Ferreira @ 2017-04-17 20:51 UTC (permalink / raw)
  To: git; +Cc: gitster, sbeller, pclouds, mhagger, Daniel Ferreira

Use dir_iterator to traverse through remove_subtree()'s directory tree,
avoiding the need for recursive calls to readdir(). Simplify
remove_subtree()'s code.

A conversion similar in purpose was previously done at 46d092a
("for_each_reflog(): reimplement using iterators", 2016-05-21).

Signed-off-by: Daniel Ferreira <bnmvco@gmail.com>
---
 entry.c | 42 ++++++++++++++++--------------------------
 1 file changed, 16 insertions(+), 26 deletions(-)

diff --git a/entry.c b/entry.c
index d2b512d..a939432 100644
--- a/entry.c
+++ b/entry.c
@@ -3,6 +3,8 @@
 #include "dir.h"
 #include "streaming.h"
 #include "submodule.h"
+#include "iterator.h"
+#include "dir-iterator.h"
 
 static void create_directories(const char *path, int path_len,
 			       const struct checkout *state)
@@ -45,33 +47,21 @@ static void create_directories(const char *path, int path_len,
 	free(buf);
 }
 
-static void remove_subtree(struct strbuf *path)
+static void remove_subtree(const char *path)
 {
-	DIR *dir = opendir(path->buf);
-	struct dirent *de;
-	int origlen = path->len;
-
-	if (!dir)
-		die_errno("cannot opendir '%s'", path->buf);
-	while ((de = readdir(dir)) != NULL) {
-		struct stat st;
-
-		if (is_dot_or_dotdot(de->d_name))
-			continue;
-
-		strbuf_addch(path, '/');
-		strbuf_addstr(path, de->d_name);
-		if (lstat(path->buf, &st))
-			die_errno("cannot lstat '%s'", path->buf);
-		if (S_ISDIR(st.st_mode))
-			remove_subtree(path);
-		else if (unlink(path->buf))
-			die_errno("cannot unlink '%s'", path->buf);
-		strbuf_setlen(path, origlen);
+	struct dir_iterator *diter = dir_iterator_begin(path,
+		DIR_ITERATOR_POST_ORDER_TRAVERSAL | DIR_ITERATOR_LIST_ROOT_DIR);
+	if (!diter) {
+		die_errno("cannot remove path '%s'", path);
+	}
+
+	while (dir_iterator_advance(diter) == ITER_OK) {
+		if (S_ISDIR(diter->st.st_mode)) {
+			if (rmdir(diter->path.buf))
+				die_errno("cannot rmdir '%s'", diter->path.buf);
+		} else if (unlink(diter->path.buf))
+			die_errno("cannot unlink '%s'", diter->path.buf);
 	}
-	closedir(dir);
-	if (rmdir(path->buf))
-		die_errno("cannot rmdir '%s'", path->buf);
 }
 
 static int create_file(const char *path, unsigned int mode)
@@ -312,7 +302,7 @@ int checkout_entry(struct cache_entry *ce,
 				return 0;
 			if (!state->force)
 				return error("%s is a directory", path.buf);
-			remove_subtree(&path);
+			remove_subtree(path.buf);
 		} else if (unlink(path.buf))
 			return error_errno("unable to unlink old '%s'", path.buf);
 	} else if (state->not_new)
-- 
2.7.4 (Apple Git-66)


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH v9 0/5] [GSoC] remove_subtree(): reimplement using iterators
  2017-04-17 20:51 [PATCH v9 0/5] [GSoC] remove_subtree(): reimplement using iterators Daniel Ferreira
                   ` (4 preceding siblings ...)
  2017-04-17 20:51 ` [PATCH v9 5/5] remove_subtree(): reimplement using iterators Daniel Ferreira
@ 2017-04-18  4:09 ` Junio C Hamano
  2017-04-18 18:45   ` Stefan Beller
  5 siblings, 1 reply; 13+ messages in thread
From: Junio C Hamano @ 2017-04-18  4:09 UTC (permalink / raw)
  To: Daniel Ferreira; +Cc: git, sbeller, pclouds, mhagger

Daniel Ferreira <bnmvco@gmail.com> writes:

> I think this is the closest to a final version we've ever gotten. I
> followed all of Michael and Stefan's suggestions on top of v8, and with
> Michael's endorsement made dir_iterator_begin() return NULL and set
> errno appropriately in case of an error.
>
> On second thought, maybe the extra code complexity required from
> dir_iterator_begin()'s callers might be actually an advantage as
> dir_iterator grows to tackle more complex dir traversing challenges on
> Git. After all, we might want some special behavior depending on what
> the given `path` is instead of always considering it valid and later
> behaving as if it was an empty directory.
>
> Thanks again for the reviews.

I had a bit of trouble with phrasing here and there, but other than
that the series was a pleasant read overall.

Will queue, anticipating "Yeah, this is good as the final version"
comments from reviewers.

Thanks.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v9 1/5] dir_iterator: add tests for dir_iterator API
  2017-04-17 20:51 ` [PATCH v9 1/5] dir_iterator: add tests for dir_iterator API Daniel Ferreira
@ 2017-04-18  4:12   ` Junio C Hamano
  2017-04-18  4:37   ` Junio C Hamano
  1 sibling, 0 replies; 13+ messages in thread
From: Junio C Hamano @ 2017-04-18  4:12 UTC (permalink / raw)
  To: Daniel Ferreira; +Cc: git, sbeller, pclouds, mhagger

Daniel Ferreira <bnmvco@gmail.com> writes:

> +test_expect_success 'setup' '
> +	mkdir -p dir &&
> +	mkdir -p dir/a/b/c/ &&
> +	>dir/b &&
> +	>dir/c &&
> +	mkdir -p dir/d/e/d/ &&
> +	>dir/a/b/c/d &&
> +	>dir/a/e &&
> +	>dir/d/e/d/a &&
> +
> +	mkdir -p dir2/a/b/c/ &&
> +	>dir2/a/b/c/d
> +'
> +
> +cat >expect-sorted-output <<-\EOF &&
> +[d] (a) [a] ./dir/a
> +[d] (a/b) [b] ./dir/a/b
> +[d] (a/b/c) [c] ./dir/a/b/c
> +[d] (d) [d] ./dir/d
> +[d] (d/e) [e] ./dir/d/e
> +[d] (d/e/d) [d] ./dir/d/e/d
> +[f] (a/b/c/d) [d] ./dir/a/b/c/d
> +[f] (a/e) [e] ./dir/a/e
> +[f] (b) [b] ./dir/b
> +[f] (c) [c] ./dir/c
> +[f] (d/e/d/a) [a] ./dir/d/e/d/a
> +EOF
> +
> +test_expect_success 'dir-iterator should iterate through all files' '
> +	test-dir-iterator ./dir >out &&
> +	sort <out >./actual-pre-order-sorted-output &&
> +
> +	test_cmp expect-sorted-output actual-pre-order-sorted-output
> +'

OK, if you can get multiple entries from a single directory, the
order of these entries are not known because they come in readdir()
order, and you'd need to sort the output to see if you got the same
thing.  Sort of makes sense.

> +
> +cat >expect-pre-order-output <<-\EOF &&
> +[d] (a) [a] ./dir2/a
> +[d] (a/b) [b] ./dir2/a/b
> +[d] (a/b/c) [c] ./dir2/a/b/c
> +[f] (a/b/c/d) [d] ./dir2/a/b/c/d
> +EOF
> +
> +test_expect_success 'dir-iterator should list files in the correct order' '
> +	test-dir-iterator ./dir2 >actual-pre-order-output &&
> +
> +	test_cmp expect-pre-order-output actual-pre-order-output
> +'

And this example has only one item per each recursion level, so the
order of output is predictable.  Again, makes sense.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v9 3/5] dir_iterator: add helpers to dir_iterator_advance
  2017-04-17 20:51 ` [PATCH v9 3/5] dir_iterator: add helpers to dir_iterator_advance Daniel Ferreira
@ 2017-04-18  4:23   ` Junio C Hamano
  0 siblings, 0 replies; 13+ messages in thread
From: Junio C Hamano @ 2017-04-18  4:23 UTC (permalink / raw)
  To: Daniel Ferreira; +Cc: git, sbeller, pclouds, mhagger

Daniel Ferreira <bnmvco@gmail.com> writes:

> Create helpers to dir_iterator_advance(). Make dir_iterator_advance()'s
> code more legible and allow some behavior to be reusable.
>
> Signed-off-by: Daniel Ferreira <bnmvco@gmail.com>
> ---

This is the kind of change we typically call "refactoring" ---
factoring out reusable helpers out of existing code, without
changing the behaviour of the code at all.

> +static int set_iterator_data(struct dir_iterator_int *iter, struct dir_iterator_level *level)
> +{

If you are going to fold line in a patch that immediately follows
this one, do so from the get-go here.

I am not happy with the name of the function in that "set data" does
not convey as much information as it spends words.  

Conceptually, this is called once per each new directory entry
readdir() returns, and the most important "iterator data" that is
set to iter structure per loop is to update iter->base.path with
de->d_name; everything else is secondary and derived data from that.
And from that point of view, I have trouble with this function not
getting "de" passed to it, but the most important setting is instead
done by its caller.

If the function were called "adjust_iterator_data()", I wouldn't
find this function so troubling, I guess; after the caller sets the
iter.base.path, all the secondary data that can be derived from it
that are used by the caller of "advance" are set by this function.

> +	if (lstat(iter->base.path.buf, &iter->base.st) < 0) {
> +		if (errno != ENOENT)
> +			warning("error reading path '%s': %s",
> +				iter->base.path.buf,
> +				strerror(errno));
> +		return -1;
> +	}
> +
> +	/*
> +	 * We have to set these each time because
> +	 * the path strbuf might have been realloc()ed.
> +	 */
> +	iter->base.relative_path =
> +		iter->base.path.buf + iter->levels[0].prefix_len;
> +	iter->base.basename =
> +		iter->base.path.buf + level->prefix_len;
> +	level->dir_state = DIR_STATE_ITER;
> +
> +	return 0;
> +}
> +

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v9 4/5] dir_iterator: refactor state machine model
  2017-04-17 20:51 ` [PATCH v9 4/5] dir_iterator: refactor state machine model Daniel Ferreira
@ 2017-04-18  4:35   ` Junio C Hamano
  2017-04-18  5:13   ` Junio C Hamano
  1 sibling, 0 replies; 13+ messages in thread
From: Junio C Hamano @ 2017-04-18  4:35 UTC (permalink / raw)
  To: Daniel Ferreira; +Cc: git, sbeller, pclouds, mhagger

Daniel Ferreira <bnmvco@gmail.com> writes:

> Perform major refactor of dir_iterator_advance(). dir_iterator has
> ceased to rely on a convoluted state machine mechanism of two loops and
> two state variables (level.initialized and level.dir_state). This serves
> to ease comprehension of the iterator mechanism and ease addition of new
> features to the iterator.

I called 3/5 a "refactoring"; on the other hand, I wouldn't call
this refactoring--it is a major rewrite and/or enhancement.

Drop "convoluted...loops and variables"; instead spend the same
number of bits to explain how updated code is easier to follow.

> Add an option for the dir_iterator API to iterate over the root
> directory (the one it was initialized with) as well.

I had trouble understanding what you wanted to say with "iterate
over the root" because the phrase sounded, at least to me, to mean
"opendir the root directory and repeatedly issue readdir() calls,
returning each entry returned by them", and making that optional did
not make any sense to me.  I had to cheat by looking at the test
script to realize that you meant that the root directory that the
iterator is told to iterate over itself is also included in the
result, as opposed to the usual expectation that iterator would just
iterate over the contents of that directory recursively.  Perhaps

    Add an option for the API to also include the initial directory
    in the result.

or something like that?

By the way, some other topic already took t0065, so I moved your
t0065 over to t0066 while queuing.

> +/*
> + * This function uses a state machine with the following states:
> + * -> DIR_STATE_PUSHED: the directory has been pushed to the
> + * iterator traversal tree.
> + * -> DIR_STATE_PRE_ITERATION: the directory is *NOT* initialized. The
> + * dirpath has already been returned if pre-order traversal is set.
> + * -> DIR_STATE_ITERATING: the directory is initialized. We are traversing
> + * through it.
> + * -> DIR_STATE_POST_ITERATION: the directory has been iterated through.
> + * We are ready to close it.
> + * -> DIR_STATE_EXHAUSTED: the directory is closed and ready to be popped.
> + */

These "->" somehow makes it very hard to read; are these meant to be
bullets in a 5-item bulleted list, or do they mean more than that
(e.g. as an arrow, you are trying to say from state PUSHED there is
a transition to PRE_ITERATION, etc.)?

PRE_ITERATION says "*NOT* initialized"; has the directory been
already initialized in PUSHED state?  Instead of saying "*NOT*", it
may make it clearer if you say when it is initialized instead.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v9 1/5] dir_iterator: add tests for dir_iterator API
  2017-04-17 20:51 ` [PATCH v9 1/5] dir_iterator: add tests for dir_iterator API Daniel Ferreira
  2017-04-18  4:12   ` Junio C Hamano
@ 2017-04-18  4:37   ` Junio C Hamano
  1 sibling, 0 replies; 13+ messages in thread
From: Junio C Hamano @ 2017-04-18  4:37 UTC (permalink / raw)
  To: Daniel Ferreira; +Cc: git, sbeller, pclouds, mhagger

Daniel Ferreira <bnmvco@gmail.com> writes:

> +cat >expect-sorted-output <<-\EOF &&
> +[d] (a) [a] ./dir/a
> +[d] (a/b) [b] ./dir/a/b
> +[d] (a/b/c) [c] ./dir/a/b/c
> +[d] (d) [d] ./dir/d
> +[d] (d/e) [e] ./dir/d/e
> +[d] (d/e/d) [d] ./dir/d/e/d
> +[f] (a/b/c/d) [d] ./dir/a/b/c/d
> +[f] (a/e) [e] ./dir/a/e
> +[f] (b) [b] ./dir/b
> +[f] (c) [c] ./dir/c
> +[f] (d/e/d/a) [a] ./dir/d/e/d/a
> +EOF
> +
> +test_expect_success 'dir-iterator should iterate through all files' '
> +	test-dir-iterator ./dir >out &&
> +	sort <out >./actual-pre-order-sorted-output &&
> +
> +	test_cmp expect-sorted-output actual-pre-order-sorted-output
> +'

Modern tests create these test vectors inside test_expect_success
block, like so:

test_expect_success 'dir-iterator should iterate through all files' '
        cat >expect-sorted-output <<-\EOF &&
        [d] (a) [a] ./dir/a
        [d] (a/b) [b] ./dir/a/b
        [d] (a/b/c) [c] ./dir/a/b/c
        [d] (d) [d] ./dir/d
        [d] (d/e) [e] ./dir/d/e
        [d] (d/e/d) [d] ./dir/d/e/d
        [f] (a/b/c/d) [d] ./dir/a/b/c/d
        [f] (a/e) [e] ./dir/a/e
        [f] (b) [b] ./dir/b
        [f] (c) [c] ./dir/c
        [f] (d/e/d/a) [a] ./dir/d/e/d/a
        EOF

	test-dir-iterator ./dir >out &&
	sort <out >./actual-pre-order-sorted-output &&

	test_cmp expect-sorted-output actual-pre-order-sorted-output
'

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v9 4/5] dir_iterator: refactor state machine model
  2017-04-17 20:51 ` [PATCH v9 4/5] dir_iterator: refactor state machine model Daniel Ferreira
  2017-04-18  4:35   ` Junio C Hamano
@ 2017-04-18  5:13   ` Junio C Hamano
  1 sibling, 0 replies; 13+ messages in thread
From: Junio C Hamano @ 2017-04-18  5:13 UTC (permalink / raw)
  To: Daniel Ferreira; +Cc: git, sbeller, pclouds, mhagger

Daniel Ferreira <bnmvco@gmail.com> writes:

> ...
> Signed-off-by: Daniel Ferreira <bnmvco@gmail.com>
> Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>

I had an impression that you took Michael's code snippet and wrote
this patch, and if that is the case, the flow of the patch is from
Michael to you to me, so these lines are out of order.

Describing which part of this patch owes Michael's contribution in
the log message would also not hurt.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v9 0/5] [GSoC] remove_subtree(): reimplement using iterators
  2017-04-18  4:09 ` [PATCH v9 0/5] [GSoC] " Junio C Hamano
@ 2017-04-18 18:45   ` Stefan Beller
  0 siblings, 0 replies; 13+ messages in thread
From: Stefan Beller @ 2017-04-18 18:45 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Daniel Ferreira, git@vger.kernel.org, Duy Nguyen,
	Michael Haggerty

On Mon, Apr 17, 2017 at 9:09 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Daniel Ferreira <bnmvco@gmail.com> writes:
>
>> I think this is the closest to a final version we've ever gotten. I
>> followed all of Michael and Stefan's suggestions on top of v8, and with
>> Michael's endorsement made dir_iterator_begin() return NULL and set
>> errno appropriately in case of an error.
>>
>> On second thought, maybe the extra code complexity required from
>> dir_iterator_begin()'s callers might be actually an advantage as
>> dir_iterator grows to tackle more complex dir traversing challenges on
>> Git. After all, we might want some special behavior depending on what
>> the given `path` is instead of always considering it valid and later
>> behaving as if it was an empty directory.
>>
>> Thanks again for the reviews.
>
> I had a bit of trouble with phrasing here and there, but other than
> that the series was a pleasant read overall.
>
> Will queue, anticipating "Yeah, this is good as the final version"
> comments from reviewers.

yeah, it is.

Thanks,
Stefan

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2017-04-18 18:45 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-04-17 20:51 [PATCH v9 0/5] [GSoC] remove_subtree(): reimplement using iterators Daniel Ferreira
2017-04-17 20:51 ` [PATCH v9 1/5] dir_iterator: add tests for dir_iterator API Daniel Ferreira
2017-04-18  4:12   ` Junio C Hamano
2017-04-18  4:37   ` Junio C Hamano
2017-04-17 20:51 ` [PATCH v9 2/5] remove_subtree(): test removing nested directories Daniel Ferreira
2017-04-17 20:51 ` [PATCH v9 3/5] dir_iterator: add helpers to dir_iterator_advance Daniel Ferreira
2017-04-18  4:23   ` Junio C Hamano
2017-04-17 20:51 ` [PATCH v9 4/5] dir_iterator: refactor state machine model Daniel Ferreira
2017-04-18  4:35   ` Junio C Hamano
2017-04-18  5:13   ` Junio C Hamano
2017-04-17 20:51 ` [PATCH v9 5/5] remove_subtree(): reimplement using iterators Daniel Ferreira
2017-04-18  4:09 ` [PATCH v9 0/5] [GSoC] " Junio C Hamano
2017-04-18 18:45   ` Stefan Beller

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).