git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH v8 0/5] [GSoC] remove_subtree(): reimplement using iterators
@ 2017-04-06  1:39 Daniel Ferreira
  2017-04-06  1:39 ` [PATCH v8 1/5] dir_iterator: add tests for dir_iterator API Daniel Ferreira
                   ` (6 more replies)
  0 siblings, 7 replies; 11+ messages in thread
From: Daniel Ferreira @ 2017-04-06  1:39 UTC (permalink / raw)
  To: git; +Cc: gitster, sbeller, pclouds, mhagger, Daniel Ferreira

This is the seventh version of a patch series that implements the GSoC
microproject of converting a recursive call to readdir() to use dir_iterator.

v1: https://public-inbox.org/git/CAGZ79kZwT-9mHTiOJ5CEjk2wDFkn6+NcogjX0=vjhsAh16ANYg@mail.gmail.com/T/#t
v2: https://public-inbox.org/git/CACsJy8Dxh-QPBBLfaFWPAWUsbA9GVXA7x+mXLjEvYKhk1zOpig@mail.gmail.com/T/#t
v3: https://public-inbox.org/git/CAGZ79kYtpmURSQWPumobA=e3JBFjKhWCdv_LPhKCd71ZRwMovA@mail.gmail.com/T/#t
v4: https://public-inbox.org/git/1490747533-89143-1-git-send-email-bnmvco@gmail.com/T/#e437a63e0c22c00c69b5d92977c9b438ed2b9fd3a
v5: https://public-inbox.org/git/1490844730-47634-1-git-send-email-bnmvco@gmail.com/T/#m2323f15e45de699f2e09364f40a62e17047cf453
v6: https://public-inbox.org/git/1491107726-21504-1-git-send-email-bnmvco@gmail.com/T/#t
v7: https://public-inbox.org/git/1491163388-41255-1-git-send-email-bnmvco@gmail.com/T/#t

Travis CI build: https://travis-ci.org/theiostream/git/jobs/219111182

In this version, I applied pretty much all suggestions Michael and Stefan had
for the patch.

Michael, regarding the patch you sent for parsing the arguments on the
test-dir-iterator helper, I did not squash because it'd generate too many
merge conflicts. I just preferred add your code manually. Let me know how I
can properly credit you for it.

The only "controversial" part of this code is how I ended up caching the result
of lstat() on the dir_iterator_level struct. Having to handle the case of the
root directory ended up making set_iterator_data() way more complicated (having
to handle the case of level->stat not being set by push_dir_level()), as well
as required the introduction of st_is_set member in the level struct. This issue
would be solved if we could lstat() the root dir on dir_iterator_begin() and
possibly return an error there. Regardless, I appreciate other suggestions to
make this less complex.

Daniel Ferreira (5):
  dir_iterator: add tests for dir_iterator API
  remove_subtree(): test removing nested directories
  dir_iterator: add helpers to dir_iterator_advance
  dir_iterator: refactor state machine model
  remove_subtree(): reimplement using iterators

 Makefile                        |   1 +
 dir-iterator.c                  | 247 +++++++++++++++++++++++++++++-----------
 dir-iterator.h                  |  35 ++++--
 entry.c                         |  39 +++----
 refs/files-backend.c            |   2 +-
 t/helper/.gitignore             |   1 +
 t/helper/test-dir-iterator.c    |  48 ++++++++
 t/t0065-dir-iterator.sh         |  93 +++++++++++++++
 t/t2000-checkout-cache-clash.sh |  11 ++
 9 files changed, 373 insertions(+), 104 deletions(-)
 create mode 100644 t/helper/test-dir-iterator.c
 create mode 100755 t/t0065-dir-iterator.sh

--
2.7.4 (Apple Git-66)


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH v8 1/5] dir_iterator: add tests for dir_iterator API
  2017-04-06  1:39 [PATCH v8 0/5] [GSoC] remove_subtree(): reimplement using iterators Daniel Ferreira
@ 2017-04-06  1:39 ` Daniel Ferreira
  2017-04-13  0:42   ` Stefan Beller
  2017-04-06  1:39 ` [PATCH v8 2/5] remove_subtree(): test removing nested directories Daniel Ferreira
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 11+ messages in thread
From: Daniel Ferreira @ 2017-04-06  1:39 UTC (permalink / raw)
  To: git; +Cc: gitster, sbeller, pclouds, mhagger, Daniel Ferreira

Create t/helper/test-dir-iterator.c, which prints relevant information
about a directory tree iterated over with dir_iterator.

Create t/t0065-dir-iterator.sh, which tests that dir_iterator does
iterate through a whole directory tree.

Signed-off-by: Daniel Ferreira <bnmvco@gmail.com>
---
 Makefile                     |  1 +
 t/helper/.gitignore          |  1 +
 t/helper/test-dir-iterator.c | 29 +++++++++++++++++++++++
 t/t0065-dir-iterator.sh      | 55 ++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 86 insertions(+)
 create mode 100644 t/helper/test-dir-iterator.c
 create mode 100755 t/t0065-dir-iterator.sh

diff --git a/Makefile b/Makefile
index 9b36068..41ce9ab 100644
--- a/Makefile
+++ b/Makefile
@@ -614,6 +614,7 @@ TEST_PROGRAMS_NEED_X += test-ctype
 TEST_PROGRAMS_NEED_X += test-config
 TEST_PROGRAMS_NEED_X += test-date
 TEST_PROGRAMS_NEED_X += test-delta
+TEST_PROGRAMS_NEED_X += test-dir-iterator
 TEST_PROGRAMS_NEED_X += test-dump-cache-tree
 TEST_PROGRAMS_NEED_X += test-dump-split-index
 TEST_PROGRAMS_NEED_X += test-dump-untracked-cache
diff --git a/t/helper/.gitignore b/t/helper/.gitignore
index 758ed2e..a7d74d3 100644
--- a/t/helper/.gitignore
+++ b/t/helper/.gitignore
@@ -3,6 +3,7 @@
 /test-config
 /test-date
 /test-delta
+/test-dir-iterator
 /test-dump-cache-tree
 /test-dump-split-index
 /test-dump-untracked-cache
diff --git a/t/helper/test-dir-iterator.c b/t/helper/test-dir-iterator.c
new file mode 100644
index 0000000..0394a13
--- /dev/null
+++ b/t/helper/test-dir-iterator.c
@@ -0,0 +1,29 @@
+#include "git-compat-util.h"
+#include "strbuf.h"
+#include "iterator.h"
+#include "dir-iterator.h"
+
+int cmd_main(int argc, const char **argv) {
+	struct strbuf path = STRBUF_INIT;
+	struct dir_iterator *diter;
+
+	if (argc < 2)
+		die("BUG: test-dir-iterator needs one argument");
+
+	strbuf_add(&path, argv[1], strlen(argv[1]));
+
+	diter = dir_iterator_begin(path.buf);
+
+	while (dir_iterator_advance(diter) == ITER_OK) {
+		if (S_ISDIR(diter->st.st_mode))
+			printf("[d] ");
+		else if (S_ISREG(diter->st.st_mode))
+			printf("[f] ");
+		else
+			printf("[?] ");
+
+		printf("(%s) [%s] %s\n", diter->relative_path, diter->basename, diter->path.buf);
+	}
+
+	return 0;
+}
diff --git a/t/t0065-dir-iterator.sh b/t/t0065-dir-iterator.sh
new file mode 100755
index 0000000..b6d76dd
--- /dev/null
+++ b/t/t0065-dir-iterator.sh
@@ -0,0 +1,55 @@
+#!/bin/sh
+
+test_description='Test directory iteration.'
+
+. ./test-lib.sh
+
+test_expect_success 'setup' '
+	mkdir -p dir &&
+	mkdir -p dir/a/b/c/ &&
+	>dir/b &&
+	>dir/c &&
+	mkdir -p dir/d/e/d/ &&
+	>dir/a/b/c/d &&
+	>dir/a/e &&
+	>dir/d/e/d/a &&
+
+	mkdir -p dir2/a/b/c/ &&
+	>dir2/a/b/c/d
+'
+
+cat >expect-sorted-output <<-\EOF &&
+[d] (a) [a] ./dir/a
+[d] (a/b) [b] ./dir/a/b
+[d] (a/b/c) [c] ./dir/a/b/c
+[d] (d) [d] ./dir/d
+[d] (d/e) [e] ./dir/d/e
+[d] (d/e/d) [d] ./dir/d/e/d
+[f] (a/b/c/d) [d] ./dir/a/b/c/d
+[f] (a/e) [e] ./dir/a/e
+[f] (b) [b] ./dir/b
+[f] (c) [c] ./dir/c
+[f] (d/e/d/a) [a] ./dir/d/e/d/a
+EOF
+
+test_expect_success 'dir-iterator should iterate through all files' '
+	test-dir-iterator ./dir >out &&
+	sort <out >./actual-pre-order-sorted-output &&
+
+	test_cmp expect-sorted-output actual-pre-order-sorted-output
+'
+
+cat >expect-pre-order-output <<-\EOF &&
+[d] (a) [a] ./dir2/a
+[d] (a/b) [b] ./dir2/a/b
+[d] (a/b/c) [c] ./dir2/a/b/c
+[f] (a/b/c/d) [d] ./dir2/a/b/c/d
+EOF
+
+test_expect_success 'dir-iterator should list files in the correct order' '
+	test-dir-iterator ./dir2 >actual-pre-order-output &&
+
+	test_cmp expect-pre-order-output actual-pre-order-output
+'
+
+test_done
-- 
2.7.4 (Apple Git-66)


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v8 2/5] remove_subtree(): test removing nested directories
  2017-04-06  1:39 [PATCH v8 0/5] [GSoC] remove_subtree(): reimplement using iterators Daniel Ferreira
  2017-04-06  1:39 ` [PATCH v8 1/5] dir_iterator: add tests for dir_iterator API Daniel Ferreira
@ 2017-04-06  1:39 ` Daniel Ferreira
  2017-04-06  1:39 ` [PATCH v8 3/5] dir_iterator: add helpers to dir_iterator_advance Daniel Ferreira
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 11+ messages in thread
From: Daniel Ferreira @ 2017-04-06  1:39 UTC (permalink / raw)
  To: git; +Cc: gitster, sbeller, pclouds, mhagger, Daniel Ferreira

Test removing a nested directory when an attempt is made to restore the
index to a state where it does not exist. A similar test could be found
previously in t/t2000-checkout-cache-clash.sh, but it did not check for
nested directories, which could allow a faulty implementation of
remove_subtree() pass the tests.

Signed-off-by: Daniel Ferreira <bnmvco@gmail.com>
---
 t/t2000-checkout-cache-clash.sh | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/t/t2000-checkout-cache-clash.sh b/t/t2000-checkout-cache-clash.sh
index de3edb5..ac10ba3 100755
--- a/t/t2000-checkout-cache-clash.sh
+++ b/t/t2000-checkout-cache-clash.sh
@@ -57,4 +57,15 @@ test_expect_success SYMLINKS 'checkout-index -f twice with --prefix' '
 	git checkout-index -a -f --prefix=there/
 '
 
+test_expect_success 'git checkout-index -f should remove nested subtrees' '
+	echo content >path &&
+	git update-index --add path &&
+	rm path &&
+	mkdir -p path/with/nested/paths &&
+	echo content >path/file1 &&
+	echo content >path/with/nested/paths/file2 &&
+	git checkout-index -f -a &&
+	test ! -d path
+'
+
 test_done
-- 
2.7.4 (Apple Git-66)


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v8 3/5] dir_iterator: add helpers to dir_iterator_advance
  2017-04-06  1:39 [PATCH v8 0/5] [GSoC] remove_subtree(): reimplement using iterators Daniel Ferreira
  2017-04-06  1:39 ` [PATCH v8 1/5] dir_iterator: add tests for dir_iterator API Daniel Ferreira
  2017-04-06  1:39 ` [PATCH v8 2/5] remove_subtree(): test removing nested directories Daniel Ferreira
@ 2017-04-06  1:39 ` Daniel Ferreira
  2017-04-06  1:39 ` [PATCH v8 4/5] dir_iterator: refactor state machine model Daniel Ferreira
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 11+ messages in thread
From: Daniel Ferreira @ 2017-04-06  1:39 UTC (permalink / raw)
  To: git; +Cc: gitster, sbeller, pclouds, mhagger, Daniel Ferreira

Create helpers to dir_iterator_advance(). Make dir_iterator_advance()'s
code more legible and allow some behavior to be reusable.

Signed-off-by: Daniel Ferreira <bnmvco@gmail.com>
---
 dir-iterator.c | 65 +++++++++++++++++++++++++++++++++++++---------------------
 1 file changed, 42 insertions(+), 23 deletions(-)

diff --git a/dir-iterator.c b/dir-iterator.c
index 34182a9..9e073a0 100644
--- a/dir-iterator.c
+++ b/dir-iterator.c
@@ -50,6 +50,43 @@ struct dir_iterator_int {
 	struct dir_iterator_level *levels;
 };
 
+static void push_dir_level(struct dir_iterator_int *iter, struct dir_iterator_level *level)
+{
+	level->dir_state = DIR_STATE_RECURSE;
+	ALLOC_GROW(iter->levels, iter->levels_nr + 1,
+		   iter->levels_alloc);
+	level = &iter->levels[iter->levels_nr++];
+	level->initialized = 0;
+}
+
+static int pop_dir_level(struct dir_iterator_int *iter)
+{
+	return --iter->levels_nr;
+}
+
+static int set_iterator_data(struct dir_iterator_int *iter, struct dir_iterator_level *level)
+{
+	if (lstat(iter->base.path.buf, &iter->base.st) < 0) {
+		if (errno != ENOENT)
+			warning("error reading path '%s': %s",
+				iter->base.path.buf,
+				strerror(errno));
+		return -1;
+	}
+
+	/*
+	 * We have to set these each time because
+	 * the path strbuf might have been realloc()ed.
+	 */
+	iter->base.relative_path =
+		iter->base.path.buf + iter->levels[0].prefix_len;
+	iter->base.basename =
+		iter->base.path.buf + level->prefix_len;
+	level->dir_state = DIR_STATE_ITER;
+
+	return 0;
+}
+
 int dir_iterator_advance(struct dir_iterator *dir_iterator)
 {
 	struct dir_iterator_int *iter =
@@ -84,11 +121,7 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
 				 * over; now prepare to iterate into
 				 * it.
 				 */
-				level->dir_state = DIR_STATE_RECURSE;
-				ALLOC_GROW(iter->levels, iter->levels_nr + 1,
-					   iter->levels_alloc);
-				level = &iter->levels[iter->levels_nr++];
-				level->initialized = 0;
+				push_dir_level(iter, level);
 				continue;
 			} else {
 				/*
@@ -104,7 +137,7 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
 			 * This level is exhausted (or wasn't opened
 			 * successfully); pop up a level.
 			 */
-			if (--iter->levels_nr == 0)
+			if (pop_dir_level(iter) == 0)
 				return dir_iterator_abort(dir_iterator);
 
 			continue;
@@ -129,7 +162,7 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
 						iter->base.path.buf, strerror(errno));
 
 				level->dir = NULL;
-				if (--iter->levels_nr == 0)
+				if (pop_dir_level(iter) == 0)
 					return dir_iterator_abort(dir_iterator);
 				break;
 			}
@@ -138,23 +171,9 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
 				continue;
 
 			strbuf_addstr(&iter->base.path, de->d_name);
-			if (lstat(iter->base.path.buf, &iter->base.st) < 0) {
-				if (errno != ENOENT)
-					warning("error reading path '%s': %s",
-						iter->base.path.buf,
-						strerror(errno));
-				continue;
-			}
 
-			/*
-			 * We have to set these each time because
-			 * the path strbuf might have been realloc()ed.
-			 */
-			iter->base.relative_path =
-				iter->base.path.buf + iter->levels[0].prefix_len;
-			iter->base.basename =
-				iter->base.path.buf + level->prefix_len;
-			level->dir_state = DIR_STATE_ITER;
+			if (set_iterator_data(iter, level))
+				continue;
 
 			return ITER_OK;
 		}
-- 
2.7.4 (Apple Git-66)


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v8 4/5] dir_iterator: refactor state machine model
  2017-04-06  1:39 [PATCH v8 0/5] [GSoC] remove_subtree(): reimplement using iterators Daniel Ferreira
                   ` (2 preceding siblings ...)
  2017-04-06  1:39 ` [PATCH v8 3/5] dir_iterator: add helpers to dir_iterator_advance Daniel Ferreira
@ 2017-04-06  1:39 ` Daniel Ferreira
  2017-04-17  6:52   ` Michael Haggerty
  2017-04-06  1:39 ` [PATCH v8 5/5] remove_subtree(): reimplement using iterators Daniel Ferreira
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 11+ messages in thread
From: Daniel Ferreira @ 2017-04-06  1:39 UTC (permalink / raw)
  To: git; +Cc: gitster, sbeller, pclouds, mhagger, Daniel Ferreira

Perform major refactor of dir_iterator_advance(). dir_iterator has
ceased to rely on a convoluted state machine mechanism of two loops and
two state variables (level.initialized and level.dir_state). This serves
to ease comprehension of the iterator mechanism and ease addition of new
features to the iterator.

Create an option for the dir_iterator API to iterate over subdirectories
only after having iterated through their contents. This feature was
predicted, although not implemented by 0fe5043 ("dir_iterator: new API
for iterating over a directory tree", 2016-06-18). This is useful for
recursively removing a directory and calling rmdir() on a directory only
after all of its contents have been wiped.

Add an option for the dir_iterator API to iterate over the root
directory (the one it was initialized with) as well.

Add the "flags" parameter to dir_iterator_create, allowing for the
aforementioned new features to be enabled. The new default behavior
(i.e. flags set to 0) does not iterate over directories. Flag
DIR_ITERATOR_PRE_ORDER_TRAVERSAL iterates over a directory before doing
so over its contents. DIR_ITERATOR_POST_ORDER_TRAVERSAL iterates over a
directory after doing so over its contents. DIR_ITERATOR_LIST_ROOT_DIR
iterates over the root directory. These flags do not conflict with each
other and may be used simultaneously.

Amend a call to dir_iterator_begin() in refs/files-backend.c to pass
the flags parameter introduced.

Improve t/t0065-dir-iterator.sh and t/helper/test-dir-iterator.c to
test "post-order" and "iterate-over-root" modes.

Signed-off-by: Daniel Ferreira <bnmvco@gmail.com>
---
 dir-iterator.c               | 216 ++++++++++++++++++++++++++++++-------------
 dir-iterator.h               |  35 +++++--
 refs/files-backend.c         |   2 +-
 t/helper/test-dir-iterator.c |  27 +++++-
 t/t0065-dir-iterator.sh      |  44 ++++++++-
 5 files changed, 245 insertions(+), 79 deletions(-)

diff --git a/dir-iterator.c b/dir-iterator.c
index 9e073a0..4c919d1 100644
--- a/dir-iterator.c
+++ b/dir-iterator.c
@@ -4,8 +4,6 @@
 #include "dir-iterator.h"
 
 struct dir_iterator_level {
-	int initialized;
-
 	DIR *dir;
 
 	/*
@@ -20,9 +18,20 @@ struct dir_iterator_level {
 	 * iteration and also iterated into):
 	 */
 	enum {
-		DIR_STATE_ITER,
-		DIR_STATE_RECURSE
+		DIR_STATE_PUSHED,
+		DIR_STATE_PRE_ITERATION,
+		DIR_STATE_ITERATING,
+		DIR_STATE_POST_ITERATION,
+		DIR_STATE_EXHAUSTED
 	} dir_state;
+
+	/*
+	 * The stat structure for the directory this level represents.
+	 * It comes with a st_is_set flag which indicates whether it is,
+	 * in fact, set.
+	 */
+	unsigned st_is_set : 1;
+	struct stat st;
 };
 
 /*
@@ -48,15 +57,23 @@ struct dir_iterator_int {
 	 * that will be included in this iteration.
 	 */
 	struct dir_iterator_level *levels;
+
+	/* Holds the flags passed to dir_iterator_begin(). */
+	unsigned flags;
 };
 
-static void push_dir_level(struct dir_iterator_int *iter, struct dir_iterator_level *level)
+static void push_dir_level(struct dir_iterator_int *iter,
+		struct dir_iterator_level *level, struct stat *st)
 {
-	level->dir_state = DIR_STATE_RECURSE;
 	ALLOC_GROW(iter->levels, iter->levels_nr + 1,
 		   iter->levels_alloc);
+
+	/* Push a new level */
 	level = &iter->levels[iter->levels_nr++];
-	level->initialized = 0;
+	level->dir = NULL;
+	level->dir_state = DIR_STATE_PUSHED;
+	level->st_is_set = 1;
+	level->st = *st;
 }
 
 static int pop_dir_level(struct dir_iterator_int *iter)
@@ -64,29 +81,73 @@ static int pop_dir_level(struct dir_iterator_int *iter)
 	return --iter->levels_nr;
 }
 
-static int set_iterator_data(struct dir_iterator_int *iter, struct dir_iterator_level *level)
+static int set_iterator_data(struct dir_iterator_int *iter,
+		struct dir_iterator_level *level)
 {
-	if (lstat(iter->base.path.buf, &iter->base.st) < 0) {
+	char *last_path_component;
+
+	if (level->dir_state != DIR_STATE_ITERATING && level->st_is_set) {
+		iter->base.st = level->st;
+	} else if (lstat(iter->base.path.buf, &iter->base.st) < 0) {
 		if (errno != ENOENT)
 			warning("error reading path '%s': %s",
 				iter->base.path.buf,
 				strerror(errno));
+		else if (!level->st_is_set) {
+			/*
+			 * When we are iterating over the root directory, we
+			 * are forced to initialize the level stat here, since
+			 * we cannot lstat() on dir_iterator_begin() before pushing
+			 * the root level.
+			 */
+			level->st = iter->base.st;
+			level->st_is_set = 1;
+		}
+
 		return -1;
 	}
 
 	/*
-	 * We have to set these each time because
-	 * the path strbuf might have been realloc()ed.
+	 * Check if we are dealing with the root directory as an
+	 * item that's being iterated through.
 	 */
-	iter->base.relative_path =
-		iter->base.path.buf + iter->levels[0].prefix_len;
-	iter->base.basename =
-		iter->base.path.buf + level->prefix_len;
-	level->dir_state = DIR_STATE_ITER;
+	if (level->dir_state != DIR_STATE_ITERATING &&
+		iter->levels_nr == 1) {
+		iter->base.relative_path = ".";
+
+		/* If we have a path like './dir', we'll get everything
+		 * after the last slash a basename. If we don't find a
+		 * slash (e.g. 'dir'), we return the whole path. */
+		last_path_component = strrchr(iter->base.path.buf, '/');
+		iter->base.basename = last_path_component ?
+			last_path_component + 1 : iter->base.path.buf;
+	} else {
+		iter->base.relative_path =
+			iter->base.path.buf + iter->levels[0].prefix_len;
+
+		if (S_ISDIR(iter->base.st.st_mode))
+			iter->base.basename =
+				iter->base.path.buf + iter->levels[iter->levels_nr - 2].prefix_len;
+		else
+			iter->base.basename =
+				iter->base.path.buf + level->prefix_len;
+	}
 
 	return 0;
 }
 
+/*
+ * This function uses a state machine with the following states:
+ * -> DIR_STATE_PUSHED: the directory has been pushed to the
+ * iterator traversal tree.
+ * -> DIR_STATE_PRE_ITERATION: the directory is *NOT* initialized. The
+ * dirpath has already been returned if pre-order traversal is set.
+ * -> DIR_STATE_ITERATING: the directory is initialized. We are traversing
+ * through it.
+ * -> DIR_STATE_POST_ITERATION: the directory has been iterated through.
+ * We are ready to close it.
+ * -> DIR_STATE_EXHAUSTED: the directory is closed and ready to be popped.
+ */
 int dir_iterator_advance(struct dir_iterator *dir_iterator)
 {
 	struct dir_iterator_int *iter =
@@ -97,7 +158,26 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
 			&iter->levels[iter->levels_nr - 1];
 		struct dirent *de;
 
-		if (!level->initialized) {
+		if (level->dir_state == DIR_STATE_PUSHED) {
+			level->dir_state = DIR_STATE_PRE_ITERATION;
+
+			if (iter->flags & DIR_ITERATOR_PRE_ORDER_TRAVERSAL) {
+				/* We may not want the root directory to be iterated over */
+				if (iter->levels_nr != 1 ||
+					(iter->flags & DIR_ITERATOR_LIST_ROOT_DIR)) {
+					/*
+					 * This will only error if we fail to lstat() the
+					 * root directory. In this case, we bail.
+					 */
+					if (set_iterator_data(iter, level)) {
+						level->dir_state = DIR_STATE_EXHAUSTED;
+						continue;
+					}
+
+					return ITER_OK;
+				}
+			}
+		} else if (level->dir_state == DIR_STATE_PRE_ITERATION) {
 			/*
 			 * Note: dir_iterator_begin() ensures that
 			 * path is not the empty string.
@@ -107,64 +187,35 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
 			level->prefix_len = iter->base.path.len;
 
 			level->dir = opendir(iter->base.path.buf);
-			if (!level->dir && errno != ENOENT) {
-				warning("error opening directory %s: %s",
-					iter->base.path.buf, strerror(errno));
-				/* Popping the level is handled below */
-			}
-
-			level->initialized = 1;
-		} else if (S_ISDIR(iter->base.st.st_mode)) {
-			if (level->dir_state == DIR_STATE_ITER) {
+			if (!level->dir) {
 				/*
-				 * The directory was just iterated
-				 * over; now prepare to iterate into
-				 * it.
+				 * This level wasn't opened sucessfully; pretend we
+				 * iterated through it already.
 				 */
-				push_dir_level(iter, level);
+				if (errno != ENOENT) {
+					warning("error opening directory %s: %s",
+						iter->base.path.buf, strerror(errno));
+				}
+
+				level->dir_state = DIR_STATE_POST_ITERATION;
 				continue;
-			} else {
-				/*
-				 * The directory has already been
-				 * iterated over and iterated into;
-				 * we're done with it.
-				 */
 			}
-		}
 
-		if (!level->dir) {
-			/*
-			 * This level is exhausted (or wasn't opened
-			 * successfully); pop up a level.
-			 */
-			if (pop_dir_level(iter) == 0)
-				return dir_iterator_abort(dir_iterator);
-
-			continue;
-		}
-
-		/*
-		 * Loop until we find an entry that we can give back
-		 * to the caller:
-		 */
-		while (1) {
+			level->dir_state = DIR_STATE_ITERATING;
+		} else if (level->dir_state == DIR_STATE_ITERATING) {
 			strbuf_setlen(&iter->base.path, level->prefix_len);
 			errno = 0;
 			de = readdir(level->dir);
 
 			if (!de) {
-				/* This level is exhausted; pop up a level. */
+				/* In case of readdir() error */
 				if (errno) {
 					warning("error reading directory %s: %s",
 						iter->base.path.buf, strerror(errno));
-				} else if (closedir(level->dir))
-					warning("error closing directory %s: %s",
-						iter->base.path.buf, strerror(errno));
+				}
 
-				level->dir = NULL;
-				if (pop_dir_level(iter) == 0)
-					return dir_iterator_abort(dir_iterator);
-				break;
+				level->dir_state = DIR_STATE_POST_ITERATION;
+				continue;
 			}
 
 			if (is_dot_or_dotdot(de->d_name))
@@ -175,7 +226,42 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
 			if (set_iterator_data(iter, level))
 				continue;
 
+			if (S_ISDIR(iter->base.st.st_mode)) {
+				push_dir_level(iter, level, &iter->base.st);
+				continue;
+			}
+
 			return ITER_OK;
+		} else if (level->dir_state == DIR_STATE_POST_ITERATION) {
+			if (level->dir != NULL && closedir(level->dir)) {
+				warning("error closing directory %s: %s",
+					iter->base.path.buf, strerror(errno));
+			}
+			level->dir_state = DIR_STATE_EXHAUSTED;
+
+			strbuf_setlen(&iter->base.path, level->prefix_len);
+			/*
+			 * Since we are iterating through the dirpath
+			 * after we have gone through it, we still need
+			 * to get rid of the trailing slash we appended.
+			 */
+			strbuf_strip_suffix(&iter->base.path, "/");
+
+			if (iter->flags & DIR_ITERATOR_POST_ORDER_TRAVERSAL) {
+				/* We may not want the root directory to be iterated over */
+				if (iter->levels_nr != 1 ||
+					(iter->flags & DIR_ITERATOR_LIST_ROOT_DIR)) {
+					/*
+					 * In this state, set_iterator_data() should never return
+					 * an error.
+					 */
+					set_iterator_data(iter, level);
+					return ITER_OK;
+				}
+			}
+		} else if (level->dir_state == DIR_STATE_EXHAUSTED) {
+			if (pop_dir_level(iter) == 0)
+				return dir_iterator_abort(dir_iterator);
 		}
 	}
 }
@@ -201,7 +287,7 @@ int dir_iterator_abort(struct dir_iterator *dir_iterator)
 	return ITER_DONE;
 }
 
-struct dir_iterator *dir_iterator_begin(const char *path)
+struct dir_iterator *dir_iterator_begin(const char *path, unsigned flags)
 {
 	struct dir_iterator_int *iter = xcalloc(1, sizeof(*iter));
 	struct dir_iterator *dir_iterator = &iter->base;
@@ -209,13 +295,17 @@ struct dir_iterator *dir_iterator_begin(const char *path)
 	if (!path || !*path)
 		die("BUG: empty path passed to dir_iterator_begin()");
 
+	iter->flags = flags;
+
 	strbuf_init(&iter->base.path, PATH_MAX);
 	strbuf_addstr(&iter->base.path, path);
 
 	ALLOC_GROW(iter->levels, 10, iter->levels_alloc);
 
 	iter->levels_nr = 1;
-	iter->levels[0].initialized = 0;
+	iter->levels[0].dir = NULL;
+	iter->levels[0].dir_state = DIR_STATE_PUSHED;
+	iter->levels[0].st_is_set = 0;
 
 	return dir_iterator;
 }
diff --git a/dir-iterator.h b/dir-iterator.h
index 27739e6..4d8ab75 100644
--- a/dir-iterator.h
+++ b/dir-iterator.h
@@ -5,19 +5,17 @@
  * Iterate over a directory tree.
  *
  * Iterate over a directory tree, recursively, including paths of all
- * types and hidden paths. Skip "." and ".." entries and don't follow
- * symlinks except for the original path.
+ * types and hidden paths. Skip "." and ".." entries.
  *
  * Every time dir_iterator_advance() is called, update the members of
  * the dir_iterator structure to reflect the next path in the
  * iteration. The order that paths are iterated over within a
- * directory is undefined, but directory paths are always iterated
- * over before the subdirectory contents.
+ * directory is undefined.
  *
  * A typical iteration looks like this:
  *
  *     int ok;
- *     struct iterator *iter = dir_iterator_begin(path);
+ *     struct iterator *iter = dir_iterator_begin(path, flags);
  *
  *     while ((ok = dir_iterator_advance(iter)) == ITER_OK) {
  *             if (want_to_stop_iteration()) {
@@ -38,6 +36,26 @@
  * dir_iterator_advance() again.
  */
 
+/*
+ * Possible flags for dir_iterator_begin().
+ *
+ * -> DIR_ITERATOR_PRE_ORDER_TRAVERSAL: the iterator shall return
+ * a dirpath it has found before iterating through that directory's
+ * contents.
+ * -> DIR_ITERATOR_POST_ORDER_TRAVERSAL: the iterator shall return
+ * a dirpath it has found after iterating through that directory's
+ * contents.
+ * -> DIR_ITERATOR_LIST_ROOT_DIR: the iterator shall return the dirpath
+ * of the root directory it is iterating through if either
+ * DIR_ITERATOR_PRE_ORDER_TRAVERSAL or DIR_ITERATOR_POST_ORDER_TRAVERSAL
+ * is set.
+ *
+ * All flags can be used in any combination.
+ */
+#define DIR_ITERATOR_PRE_ORDER_TRAVERSAL (1 << 0)
+#define DIR_ITERATOR_POST_ORDER_TRAVERSAL (1 << 1)
+#define DIR_ITERATOR_LIST_ROOT_DIR (1 << 2)
+
 struct dir_iterator {
 	/* The current path: */
 	struct strbuf path;
@@ -57,15 +75,16 @@ struct dir_iterator {
 };
 
 /*
- * Start a directory iteration over path. Return a dir_iterator that
- * holds the internal state of the iteration.
+ * Start a directory iteration over path, with options specified in
+ * 'flags'. Return a dir_iterator that holds the internal state of
+ * the iteration.
  *
  * The iteration includes all paths under path, not including path
  * itself and not including "." or ".." entries.
  *
  * path is the starting directory. An internal copy will be made.
  */
-struct dir_iterator *dir_iterator_begin(const char *path);
+struct dir_iterator *dir_iterator_begin(const char *path, unsigned flags);
 
 /*
  * Advance the iterator to the first or next item and return ITER_OK.
diff --git a/refs/files-backend.c b/refs/files-backend.c
index 50188e9..c29dc68 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -3346,7 +3346,7 @@ static struct ref_iterator *files_reflog_iterator_begin(struct ref_store *ref_st
 	files_downcast(ref_store, 0, "reflog_iterator_begin");
 
 	base_ref_iterator_init(ref_iterator, &files_reflog_iterator_vtable);
-	iter->dir_iterator = dir_iterator_begin(git_path("logs"));
+	iter->dir_iterator = dir_iterator_begin(git_path("logs"), DIR_ITERATOR_PRE_ORDER_TRAVERSAL);
 	return ref_iterator;
 }
 
diff --git a/t/helper/test-dir-iterator.c b/t/helper/test-dir-iterator.c
index 0394a13..9ac3fa7 100644
--- a/t/helper/test-dir-iterator.c
+++ b/t/helper/test-dir-iterator.c
@@ -4,15 +4,34 @@
 #include "dir-iterator.h"
 
 int cmd_main(int argc, const char **argv) {
+	const char **myargv = argv;
+	int myargc = argc;
+
 	struct strbuf path = STRBUF_INIT;
 	struct dir_iterator *diter;
 
-	if (argc < 2)
-		die("BUG: test-dir-iterator needs one argument");
+	unsigned flag = 0;
+
+	while (--myargc && starts_with(*++myargv, "--")) {
+		if (!strcmp(*myargv, "--pre-order"))
+			flag |= DIR_ITERATOR_PRE_ORDER_TRAVERSAL;
+		else if (!strcmp(*myargv, "--post-order"))
+			flag |= DIR_ITERATOR_POST_ORDER_TRAVERSAL;
+		else if (!strcmp(*myargv, "--list-root-dir"))
+			flag |= DIR_ITERATOR_LIST_ROOT_DIR;
+		else if (!strcmp(*myargv, "--")) {
+			myargc--;
+			myargv++;
+			break;
+		} else
+		   die("Unrecognized option: %s", *myargv);
+	}
 
-	strbuf_add(&path, argv[1], strlen(argv[1]));
+	if (myargc != 1)
+		die("expected exactly one non-option argument");
+	strbuf_addstr(&path, *myargv);
 
-	diter = dir_iterator_begin(path.buf);
+	diter = dir_iterator_begin(path.buf, flag);
 
 	while (dir_iterator_advance(diter) == ITER_OK) {
 		if (S_ISDIR(diter->st.st_mode))
diff --git a/t/t0065-dir-iterator.sh b/t/t0065-dir-iterator.sh
index b6d76dd..458c57e 100755
--- a/t/t0065-dir-iterator.sh
+++ b/t/t0065-dir-iterator.sh
@@ -33,12 +33,19 @@ cat >expect-sorted-output <<-\EOF &&
 EOF
 
 test_expect_success 'dir-iterator should iterate through all files' '
-	test-dir-iterator ./dir >out &&
+	test-dir-iterator --pre-order ./dir >out &&
 	sort <out >./actual-pre-order-sorted-output &&
 
 	test_cmp expect-sorted-output actual-pre-order-sorted-output
 '
 
+test_expect_success 'dir-iterator should iterate through all files on post-order mode' '
+	test-dir-iterator --post-order ./dir >out &&
+	sort <out >actual-post-order-sorted-output &&
+
+	test_cmp expect-sorted-output actual-post-order-sorted-output
+'
+
 cat >expect-pre-order-output <<-\EOF &&
 [d] (a) [a] ./dir2/a
 [d] (a/b) [b] ./dir2/a/b
@@ -46,10 +53,41 @@ cat >expect-pre-order-output <<-\EOF &&
 [f] (a/b/c/d) [d] ./dir2/a/b/c/d
 EOF
 
-test_expect_success 'dir-iterator should list files in the correct order' '
-	test-dir-iterator ./dir2 >actual-pre-order-output &&
+test_expect_success 'dir-iterator should list files properly on pre-order mode' '
+	test-dir-iterator --pre-order ./dir2 >actual-pre-order-output &&
 
 	test_cmp expect-pre-order-output actual-pre-order-output
 '
 
+cat >expect-post-order-output <<-\EOF &&
+[f] (a/b/c/d) [d] ./dir2/a/b/c/d
+[d] (a/b/c) [c] ./dir2/a/b/c
+[d] (a/b) [b] ./dir2/a/b
+[d] (a) [a] ./dir2/a
+EOF
+
+test_expect_success 'dir-iterator should list files properly on post-order mode' '
+	test-dir-iterator --post-order ./dir2 >actual-post-order-output &&
+
+	test_cmp expect-post-order-output actual-post-order-output
+'
+
+cat >expect-pre-order-post-order-root-dir-output <<-\EOF &&
+[d] (.) [dir2] ./dir2
+[d] (a) [a] ./dir2/a
+[d] (a/b) [b] ./dir2/a/b
+[d] (a/b/c) [c] ./dir2/a/b/c
+[f] (a/b/c/d) [d] ./dir2/a/b/c/d
+[d] (a/b/c) [c] ./dir2/a/b/c
+[d] (a/b) [b] ./dir2/a/b
+[d] (a) [a] ./dir2/a
+[d] (.) [dir2] ./dir2
+EOF
+
+test_expect_success 'dir-iterator should list files properly on pre-order + post-order + root-dir mode' '
+	test-dir-iterator --pre-order --post-order --list-root-dir ./dir2 >actual-pre-order-post-order-root-dir-output &&
+
+	test_cmp expect-pre-order-post-order-root-dir-output actual-pre-order-post-order-root-dir-output
+'
+
 test_done
-- 
2.7.4 (Apple Git-66)


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v8 5/5] remove_subtree(): reimplement using iterators
  2017-04-06  1:39 [PATCH v8 0/5] [GSoC] remove_subtree(): reimplement using iterators Daniel Ferreira
                   ` (3 preceding siblings ...)
  2017-04-06  1:39 ` [PATCH v8 4/5] dir_iterator: refactor state machine model Daniel Ferreira
@ 2017-04-06  1:39 ` Daniel Ferreira
  2017-04-13  0:06 ` [PATCH v8 0/5] [GSoC] " Daniel Ferreira (theiostream)
  2017-04-17  6:29 ` Michael Haggerty
  6 siblings, 0 replies; 11+ messages in thread
From: Daniel Ferreira @ 2017-04-06  1:39 UTC (permalink / raw)
  To: git; +Cc: gitster, sbeller, pclouds, mhagger, Daniel Ferreira

Use dir_iterator to traverse through remove_subtree()'s directory tree,
avoiding the need for recursive calls to readdir(). Simplify
remove_subtree()'s code.

A conversion similar in purpose was previously done at 46d092a
("for_each_reflog(): reimplement using iterators", 2016-05-21).

Signed-off-by: Daniel Ferreira <bnmvco@gmail.com>
---
 entry.c | 39 +++++++++++++--------------------------
 1 file changed, 13 insertions(+), 26 deletions(-)

diff --git a/entry.c b/entry.c
index d2b512d..d543ccf 100644
--- a/entry.c
+++ b/entry.c
@@ -3,6 +3,8 @@
 #include "dir.h"
 #include "streaming.h"
 #include "submodule.h"
+#include "iterator.h"
+#include "dir-iterator.h"
 
 static void create_directories(const char *path, int path_len,
 			       const struct checkout *state)
@@ -45,33 +47,18 @@ static void create_directories(const char *path, int path_len,
 	free(buf);
 }
 
-static void remove_subtree(struct strbuf *path)
+static void remove_subtree(const char *path)
 {
-	DIR *dir = opendir(path->buf);
-	struct dirent *de;
-	int origlen = path->len;
-
-	if (!dir)
-		die_errno("cannot opendir '%s'", path->buf);
-	while ((de = readdir(dir)) != NULL) {
-		struct stat st;
-
-		if (is_dot_or_dotdot(de->d_name))
-			continue;
-
-		strbuf_addch(path, '/');
-		strbuf_addstr(path, de->d_name);
-		if (lstat(path->buf, &st))
-			die_errno("cannot lstat '%s'", path->buf);
-		if (S_ISDIR(st.st_mode))
-			remove_subtree(path);
-		else if (unlink(path->buf))
-			die_errno("cannot unlink '%s'", path->buf);
-		strbuf_setlen(path, origlen);
+	struct dir_iterator *diter = dir_iterator_begin(path,
+		DIR_ITERATOR_POST_ORDER_TRAVERSAL | DIR_ITERATOR_LIST_ROOT_DIR);
+
+	while (dir_iterator_advance(diter) == ITER_OK) {
+		if (S_ISDIR(diter->st.st_mode)) {
+			if (rmdir(diter->path.buf))
+				die_errno("cannot rmdir '%s'", diter->path.buf);
+		} else if (unlink(diter->path.buf))
+			die_errno("cannot unlink '%s'", diter->path.buf);
 	}
-	closedir(dir);
-	if (rmdir(path->buf))
-		die_errno("cannot rmdir '%s'", path->buf);
 }
 
 static int create_file(const char *path, unsigned int mode)
@@ -312,7 +299,7 @@ int checkout_entry(struct cache_entry *ce,
 				return 0;
 			if (!state->force)
 				return error("%s is a directory", path.buf);
-			remove_subtree(&path);
+			remove_subtree(path.buf);
 		} else if (unlink(path.buf))
 			return error_errno("unable to unlink old '%s'", path.buf);
 	} else if (state->not_new)
-- 
2.7.4 (Apple Git-66)


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH v8 0/5] [GSoC] remove_subtree(): reimplement using iterators
  2017-04-06  1:39 [PATCH v8 0/5] [GSoC] remove_subtree(): reimplement using iterators Daniel Ferreira
                   ` (4 preceding siblings ...)
  2017-04-06  1:39 ` [PATCH v8 5/5] remove_subtree(): reimplement using iterators Daniel Ferreira
@ 2017-04-13  0:06 ` Daniel Ferreira (theiostream)
  2017-04-13  0:50   ` Stefan Beller
  2017-04-17  6:29 ` Michael Haggerty
  6 siblings, 1 reply; 11+ messages in thread
From: Daniel Ferreira (theiostream) @ 2017-04-13  0:06 UTC (permalink / raw)
  To: Git Mailing List
  Cc: Junio C Hamano, Stefan Beller, Duy Nguyen, Michael Haggerty,
	Daniel Ferreira

Hey there! I'm sorry for bothering you, but any chance you might have
overlooked this patch for a review? (I'm just not familiar with how
long patches usually take to be reviewed, and since I always got an
answer within two days of sending it I wondered if you may have just
not noticed it.)

-- Daniel.

On Wed, Apr 5, 2017 at 10:39 PM, Daniel Ferreira <bnmvco@gmail.com> wrote:
> This is the seventh version of a patch series that implements the GSoC
> microproject of converting a recursive call to readdir() to use dir_iterator.
>
> v1: https://public-inbox.org/git/CAGZ79kZwT-9mHTiOJ5CEjk2wDFkn6+NcogjX0=vjhsAh16ANYg@mail.gmail.com/T/#t
> v2: https://public-inbox.org/git/CACsJy8Dxh-QPBBLfaFWPAWUsbA9GVXA7x+mXLjEvYKhk1zOpig@mail.gmail.com/T/#t
> v3: https://public-inbox.org/git/CAGZ79kYtpmURSQWPumobA=e3JBFjKhWCdv_LPhKCd71ZRwMovA@mail.gmail.com/T/#t
> v4: https://public-inbox.org/git/1490747533-89143-1-git-send-email-bnmvco@gmail.com/T/#e437a63e0c22c00c69b5d92977c9b438ed2b9fd3a
> v5: https://public-inbox.org/git/1490844730-47634-1-git-send-email-bnmvco@gmail.com/T/#m2323f15e45de699f2e09364f40a62e17047cf453
> v6: https://public-inbox.org/git/1491107726-21504-1-git-send-email-bnmvco@gmail.com/T/#t
> v7: https://public-inbox.org/git/1491163388-41255-1-git-send-email-bnmvco@gmail.com/T/#t
>
> Travis CI build: https://travis-ci.org/theiostream/git/jobs/219111182
>
> In this version, I applied pretty much all suggestions Michael and Stefan had
> for the patch.
>
> Michael, regarding the patch you sent for parsing the arguments on the
> test-dir-iterator helper, I did not squash because it'd generate too many
> merge conflicts. I just preferred add your code manually. Let me know how I
> can properly credit you for it.
>
> The only "controversial" part of this code is how I ended up caching the result
> of lstat() on the dir_iterator_level struct. Having to handle the case of the
> root directory ended up making set_iterator_data() way more complicated (having
> to handle the case of level->stat not being set by push_dir_level()), as well
> as required the introduction of st_is_set member in the level struct. This issue
> would be solved if we could lstat() the root dir on dir_iterator_begin() and
> possibly return an error there. Regardless, I appreciate other suggestions to
> make this less complex.
>
> Daniel Ferreira (5):
>   dir_iterator: add tests for dir_iterator API
>   remove_subtree(): test removing nested directories
>   dir_iterator: add helpers to dir_iterator_advance
>   dir_iterator: refactor state machine model
>   remove_subtree(): reimplement using iterators
>
>  Makefile                        |   1 +
>  dir-iterator.c                  | 247 +++++++++++++++++++++++++++++-----------
>  dir-iterator.h                  |  35 ++++--
>  entry.c                         |  39 +++----
>  refs/files-backend.c            |   2 +-
>  t/helper/.gitignore             |   1 +
>  t/helper/test-dir-iterator.c    |  48 ++++++++
>  t/t0065-dir-iterator.sh         |  93 +++++++++++++++
>  t/t2000-checkout-cache-clash.sh |  11 ++
>  9 files changed, 373 insertions(+), 104 deletions(-)
>  create mode 100644 t/helper/test-dir-iterator.c
>  create mode 100755 t/t0065-dir-iterator.sh
>
> --
> 2.7.4 (Apple Git-66)
>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v8 1/5] dir_iterator: add tests for dir_iterator API
  2017-04-06  1:39 ` [PATCH v8 1/5] dir_iterator: add tests for dir_iterator API Daniel Ferreira
@ 2017-04-13  0:42   ` Stefan Beller
  0 siblings, 0 replies; 11+ messages in thread
From: Stefan Beller @ 2017-04-13  0:42 UTC (permalink / raw)
  To: Daniel Ferreira
  Cc: git@vger.kernel.org, Junio C Hamano, Duy Nguyen, Michael Haggerty

On Wed, Apr 5, 2017 at 6:39 PM, Daniel Ferreira <bnmvco@gmail.com> wrote:

> +int cmd_main(int argc, const char **argv) {

Style (minor nit, in case resend is needed):

We only put the { brace on the same line for statements
(if, for, while, switch), not for function declarations.

I tried to find a good rationale for it, this is the most sufficient I
could find
https://www.kernel.org/doc/Documentation/process/coding-style.rst
(section 3):

    Heretic people all over the world have claimed that this inconsistency
    is ...  well ...  inconsistent, but all right-thinking people know that
    (a) K&R are **right** and (b) K&R are right.  Besides, functions are
    special anyway (you can't nest them in C).

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v8 0/5] [GSoC] remove_subtree(): reimplement using iterators
  2017-04-13  0:06 ` [PATCH v8 0/5] [GSoC] " Daniel Ferreira (theiostream)
@ 2017-04-13  0:50   ` Stefan Beller
  0 siblings, 0 replies; 11+ messages in thread
From: Stefan Beller @ 2017-04-13  0:50 UTC (permalink / raw)
  To: Daniel Ferreira (theiostream)
  Cc: Git Mailing List, Junio C Hamano, Duy Nguyen, Michael Haggerty

On Wed, Apr 12, 2017 at 5:06 PM, Daniel Ferreira (theiostream)
<bnmvco@gmail.com> wrote:
> Hey there! I'm sorry for bothering you, but any chance you might have
> overlooked this patch for a review?

Sort of. I read through them and found no issue back then, but did
not reply anticipating someone else would .

>  (I'm just not familiar with how
> long patches usually take to be reviewed, and since I always got an
> answer within two days of sending it I wondered if you may have just
> not noticed it.)

I reviewed it again and found just a very minor nit, no need for a resend
if that is the only nit.

Thanks,
Stefan

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v8 0/5] [GSoC] remove_subtree(): reimplement using iterators
  2017-04-06  1:39 [PATCH v8 0/5] [GSoC] remove_subtree(): reimplement using iterators Daniel Ferreira
                   ` (5 preceding siblings ...)
  2017-04-13  0:06 ` [PATCH v8 0/5] [GSoC] " Daniel Ferreira (theiostream)
@ 2017-04-17  6:29 ` Michael Haggerty
  6 siblings, 0 replies; 11+ messages in thread
From: Michael Haggerty @ 2017-04-17  6:29 UTC (permalink / raw)
  To: Daniel Ferreira, git; +Cc: gitster, sbeller, pclouds

On 04/06/2017 03:39 AM, Daniel Ferreira wrote:
> This is the seventh version of a patch series that implements the GSoC
> microproject of converting a recursive call to readdir() to use dir_iterator.
> 
> v1: https://public-inbox.org/git/CAGZ79kZwT-9mHTiOJ5CEjk2wDFkn6+NcogjX0=vjhsAh16ANYg@mail.gmail.com/T/#t
> v2: https://public-inbox.org/git/CACsJy8Dxh-QPBBLfaFWPAWUsbA9GVXA7x+mXLjEvYKhk1zOpig@mail.gmail.com/T/#t
> v3: https://public-inbox.org/git/CAGZ79kYtpmURSQWPumobA=e3JBFjKhWCdv_LPhKCd71ZRwMovA@mail.gmail.com/T/#t
> v4: https://public-inbox.org/git/1490747533-89143-1-git-send-email-bnmvco@gmail.com/T/#e437a63e0c22c00c69b5d92977c9b438ed2b9fd3a
> v5: https://public-inbox.org/git/1490844730-47634-1-git-send-email-bnmvco@gmail.com/T/#m2323f15e45de699f2e09364f40a62e17047cf453
> v6: https://public-inbox.org/git/1491107726-21504-1-git-send-email-bnmvco@gmail.com/T/#t
> v7: https://public-inbox.org/git/1491163388-41255-1-git-send-email-bnmvco@gmail.com/T/#t
> 
> Travis CI build: https://travis-ci.org/theiostream/git/jobs/219111182
> 
> In this version, I applied pretty much all suggestions Michael and Stefan had
> for the patch.
> 
> Michael, regarding the patch you sent for parsing the arguments on the
> test-dir-iterator helper, I did not squash because it'd generate too many
> merge conflicts. I just preferred add your code manually. Let me know how I
> can properly credit you for it.

It's a small enough bit of code that you don't have to credit me.
However, technically you should add a

    Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>

line to your patch that includes my code, which you are allowed to do
because I included that line in the email where I suggested the change [1].

> The only "controversial" part of this code is how I ended up caching the result
> of lstat() on the dir_iterator_level struct. Having to handle the case of the
> root directory ended up making set_iterator_data() way more complicated (having
> to handle the case of level->stat not being set by push_dir_level()), as well
> as required the introduction of st_is_set member in the level struct. This issue
> would be solved if we could lstat() the root dir on dir_iterator_begin() and
> possibly return an error there. Regardless, I appreciate other suggestions to
> make this less complex.

I agree that this adds a lot of code complexity. But I think your
suggestion to move the initial `lstat()` call to `dir_iterator_begin()`
is a good one. It could return NULL on error (being careful not to leak
memory of course). It should be careful to leave `errno` set
appropriately, which would allow the caller to see why it failed (i.e.,
because the directory didn't exist, or the path was not a directory, or
whatever). In that case, probably `dir_iterator_begin()` could call
`push_dir_level()`, eliminating a bit of repeated code at the same time.

This would also fix an anomaly in the current code: currently, if the
top-level "directory" is not a directory, it is nevertheless reported
back to the caller during `DIR_ITERATOR_PRE_ORDER_TRAVERSAL` and *again*
during `DIR_ITERATOR_POST_ORDER_TRAVERSAL`. Instead,
`dir_iterator_begin()` would report an error and the caller would not be
misled.

> Hey there! I'm sorry for bothering you, but any chance you might have
> overlooked this patch for a review? (I'm just not familiar with how
> long patches usually take to be reviewed, and since I always got an
> answer within two days of sending it I wondered if you may have just
> not noticed it.)

It's pretty common for response times to vary wildly in open source
(Junio being a notable exception). People often work on open source in
their spare time, and disappear and reappear without notice based on
what else is going on in their lives. For this reason, it is really
important to submit changes in small batches as early as possible
(especially in GSoC) and continue working on the next batch while
waiting for review on earlier batches. If you find yourself with nothing
else to work on, maybe spend some time reviewing *other* people's
patches, which is also a valuable contribution.

That being said, if you are waiting for a response from particular
person, after a wait it is OK to write them an email asking if they will
have time to get to it (realizing that the answer might be "no").
Probably such an email should be to the person directly, and not CCed to
the mailing list.

Michael

[1]
http://public-inbox.org/git/bfd5d9a07139dbc6eb1fd1dc82ffb38dbbefee1e.1491286711.git.mhagger@alum.mit.edu/


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v8 4/5] dir_iterator: refactor state machine model
  2017-04-06  1:39 ` [PATCH v8 4/5] dir_iterator: refactor state machine model Daniel Ferreira
@ 2017-04-17  6:52   ` Michael Haggerty
  0 siblings, 0 replies; 11+ messages in thread
From: Michael Haggerty @ 2017-04-17  6:52 UTC (permalink / raw)
  To: Daniel Ferreira, git; +Cc: gitster, sbeller, pclouds

On 04/06/2017 03:39 AM, Daniel Ferreira wrote:
> Perform major refactor of dir_iterator_advance(). dir_iterator has
> ceased to rely on a convoluted state machine mechanism of two loops and
> two state variables (level.initialized and level.dir_state). This serves
> to ease comprehension of the iterator mechanism and ease addition of new
> features to the iterator.
> 
> Create an option for the dir_iterator API to iterate over subdirectories
> only after having iterated through their contents. This feature was
> predicted, although not implemented by 0fe5043 ("dir_iterator: new API
> for iterating over a directory tree", 2016-06-18). This is useful for
> recursively removing a directory and calling rmdir() on a directory only
> after all of its contents have been wiped.
> 
> Add an option for the dir_iterator API to iterate over the root
> directory (the one it was initialized with) as well.
> 
> Add the "flags" parameter to dir_iterator_create, allowing for the
> aforementioned new features to be enabled. The new default behavior
> (i.e. flags set to 0) does not iterate over directories. Flag
> DIR_ITERATOR_PRE_ORDER_TRAVERSAL iterates over a directory before doing
> so over its contents. DIR_ITERATOR_POST_ORDER_TRAVERSAL iterates over a
> directory after doing so over its contents. DIR_ITERATOR_LIST_ROOT_DIR
> iterates over the root directory. These flags do not conflict with each
> other and may be used simultaneously.
> 
> Amend a call to dir_iterator_begin() in refs/files-backend.c to pass
> the flags parameter introduced.
> 
> Improve t/t0065-dir-iterator.sh and t/helper/test-dir-iterator.c to
> test "post-order" and "iterate-over-root" modes.
> 
> Signed-off-by: Daniel Ferreira <bnmvco@gmail.com>
> ---
>  dir-iterator.c               | 216 ++++++++++++++++++++++++++++++-------------
>  dir-iterator.h               |  35 +++++--
>  refs/files-backend.c         |   2 +-
>  t/helper/test-dir-iterator.c |  27 +++++-
>  t/t0065-dir-iterator.sh      |  44 ++++++++-
>  5 files changed, 245 insertions(+), 79 deletions(-)
> 
> diff --git a/dir-iterator.c b/dir-iterator.c
> index 9e073a0..4c919d1 100644
> --- a/dir-iterator.c
> +++ b/dir-iterator.c
> @@ -4,8 +4,6 @@
>  #include "dir-iterator.h"
>  
>  struct dir_iterator_level {
> -	int initialized;
> -
>  	DIR *dir;
>  
>  	/*
> @@ -20,9 +18,20 @@ struct dir_iterator_level {
>  	 * iteration and also iterated into):
>  	 */
>  	enum {
> -		DIR_STATE_ITER,
> -		DIR_STATE_RECURSE
> +		DIR_STATE_PUSHED,
> +		DIR_STATE_PRE_ITERATION,
> +		DIR_STATE_ITERATING,
> +		DIR_STATE_POST_ITERATION,
> +		DIR_STATE_EXHAUSTED
>  	} dir_state;
> +
> +	/*
> +	 * The stat structure for the directory this level represents.
> +	 * It comes with a st_is_set flag which indicates whether it is,
> +	 * in fact, set.
> +	 */
> +	unsigned st_is_set : 1;

In another email I just agreed with you that it would probably be
cleaner to call `lstat()` on the top-level directory from
`dir_iterator_begin()`. But if that weren't an option, rather than
adding a new state variable `st_is_set`, it might be more natural to add
a new value, say `DIR_STATE_INITIALIZING`, to `dir_state`, and do the
special case initialization in the main loop in `dir_iterator_advance()`
rather than in `set_iterator_data()`.

> +	struct stat st;
>  };
>  
>  /*
> @@ -48,15 +57,23 @@ struct dir_iterator_int {
>  	 * that will be included in this iteration.
>  	 */
>  	struct dir_iterator_level *levels;
> +
> +	/* Holds the flags passed to dir_iterator_begin(). */
> +	unsigned flags;
>  };
>  
> -static void push_dir_level(struct dir_iterator_int *iter, struct dir_iterator_level *level)
> +static void push_dir_level(struct dir_iterator_int *iter,
> +		struct dir_iterator_level *level, struct stat *st)

The `level` argument is no longer used.

>  {
> -	level->dir_state = DIR_STATE_RECURSE;
>  	ALLOC_GROW(iter->levels, iter->levels_nr + 1,
>  		   iter->levels_alloc);
> +
> +	/* Push a new level */
>  	level = &iter->levels[iter->levels_nr++];
> -	level->initialized = 0;
> +	level->dir = NULL;
> +	level->dir_state = DIR_STATE_PUSHED;
> +	level->st_is_set = 1;
> +	level->st = *st;
>  }
>  
> [...]
> +/*
> + * This function uses a state machine with the following states:
> + * -> DIR_STATE_PUSHED: the directory has been pushed to the
> + * iterator traversal tree.
> + * -> DIR_STATE_PRE_ITERATION: the directory is *NOT* initialized. The
> + * dirpath has already been returned if pre-order traversal is set.
> + * -> DIR_STATE_ITERATING: the directory is initialized. We are traversing
> + * through it.
> + * -> DIR_STATE_POST_ITERATION: the directory has been iterated through.
> + * We are ready to close it.
> + * -> DIR_STATE_EXHAUSTED: the directory is closed and ready to be popped.
> + */
>  int dir_iterator_advance(struct dir_iterator *dir_iterator)
>  {
>  	struct dir_iterator_int *iter =
> @@ -97,7 +158,26 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
>  			&iter->levels[iter->levels_nr - 1];
>  		struct dirent *de;

`de` is only used in one branch of the `if` statement. Its definition
could be moved there.

>  
> -		if (!level->initialized) {
> +		if (level->dir_state == DIR_STATE_PUSHED) {
> +			level->dir_state = DIR_STATE_PRE_ITERATION;
> +
> +			if (iter->flags & DIR_ITERATOR_PRE_ORDER_TRAVERSAL) {
> +				/* We may not want the root directory to be iterated over */
> +				if (iter->levels_nr != 1 ||
> +					(iter->flags & DIR_ITERATOR_LIST_ROOT_DIR)) {

These two `if` statements can be combined into one:

			/* We may not want the root directory to be iterated over */
			if ((iter->flags & DIR_ITERATOR_PRE_ORDER_TRAVERSAL) &&
			    (iter->levels_nr != 1 ||
			     (iter->flags & DIR_ITERATOR_LIST_ROOT_DIR))) {
				/* ... */


> +					/*
> +					 * This will only error if we fail to lstat() the
> +					 * root directory. In this case, we bail.
> +					 */
> +					if (set_iterator_data(iter, level)) {
> +						level->dir_state = DIR_STATE_EXHAUSTED;
> +						continue;
> +					}
> +
> +					return ITER_OK;
> +				}
> +			}
> +		} else if (level->dir_state == DIR_STATE_PRE_ITERATION) {
>  			/*
>  			 * Note: dir_iterator_begin() ensures that
>  			 * path is not the empty string.
> [...]
> diff --git a/refs/files-backend.c b/refs/files-backend.c
> index 50188e9..c29dc68 100644
> --- a/refs/files-backend.c
> +++ b/refs/files-backend.c
> @@ -3346,7 +3346,7 @@ static struct ref_iterator *files_reflog_iterator_begin(struct ref_store *ref_st
>  	files_downcast(ref_store, 0, "reflog_iterator_begin");
>  
>  	base_ref_iterator_init(ref_iterator, &files_reflog_iterator_vtable);
> -	iter->dir_iterator = dir_iterator_begin(git_path("logs"));
> +	iter->dir_iterator = dir_iterator_begin(git_path("logs"), DIR_ITERATOR_PRE_ORDER_TRAVERSAL);

Actually, this iterator *doesn't* care about directories. If you look up
in `files_reflog_iterator_advance()`, only entries for which
`S_ISREG(diter->st.st_mode)` are processed. So you can set the flag to
0. (You still need to retain the `S_ISREG()` check, though, to exclude
other types of files.)

>  	return ref_iterator;
>  }
>  
> diff --git a/t/helper/test-dir-iterator.c b/t/helper/test-dir-iterator.c
> index 0394a13..9ac3fa7 100644
> --- a/t/helper/test-dir-iterator.c
> +++ b/t/helper/test-dir-iterator.c
> @@ -4,15 +4,34 @@
>  #include "dir-iterator.h"
>  
>  int cmd_main(int argc, const char **argv) {
> +	const char **myargv = argv;
> +	int myargc = argc;
> +
>  	struct strbuf path = STRBUF_INIT;
>  	struct dir_iterator *diter;
>  
> -	if (argc < 2)
> -		die("BUG: test-dir-iterator needs one argument");
> +	unsigned flag = 0;
> +
> +	while (--myargc && starts_with(*++myargv, "--")) {
> +		if (!strcmp(*myargv, "--pre-order"))
> +			flag |= DIR_ITERATOR_PRE_ORDER_TRAVERSAL;
> +		else if (!strcmp(*myargv, "--post-order"))
> +			flag |= DIR_ITERATOR_POST_ORDER_TRAVERSAL;
> +		else if (!strcmp(*myargv, "--list-root-dir"))
> +			flag |= DIR_ITERATOR_LIST_ROOT_DIR;
> +		else if (!strcmp(*myargv, "--")) {
> +			myargc--;
> +			myargv++;
> +			break;
> +		} else
> +		   die("Unrecognized option: %s", *myargv);

The indentation above is wrong.

> +	}
>  
> -	strbuf_add(&path, argv[1], strlen(argv[1]));
> +	if (myargc != 1)
> +		die("expected exactly one non-option argument");
> +	strbuf_addstr(&path, *myargv);
>  
> -	diter = dir_iterator_begin(path.buf);
> +	diter = dir_iterator_begin(path.buf, flag);
>  
>  	while (dir_iterator_advance(diter) == ITER_OK) {
>  		if (S_ISDIR(diter->st.st_mode))
> [...]

Michael


^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2017-04-17  6:53 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-04-06  1:39 [PATCH v8 0/5] [GSoC] remove_subtree(): reimplement using iterators Daniel Ferreira
2017-04-06  1:39 ` [PATCH v8 1/5] dir_iterator: add tests for dir_iterator API Daniel Ferreira
2017-04-13  0:42   ` Stefan Beller
2017-04-06  1:39 ` [PATCH v8 2/5] remove_subtree(): test removing nested directories Daniel Ferreira
2017-04-06  1:39 ` [PATCH v8 3/5] dir_iterator: add helpers to dir_iterator_advance Daniel Ferreira
2017-04-06  1:39 ` [PATCH v8 4/5] dir_iterator: refactor state machine model Daniel Ferreira
2017-04-17  6:52   ` Michael Haggerty
2017-04-06  1:39 ` [PATCH v8 5/5] remove_subtree(): reimplement using iterators Daniel Ferreira
2017-04-13  0:06 ` [PATCH v8 0/5] [GSoC] " Daniel Ferreira (theiostream)
2017-04-13  0:50   ` Stefan Beller
2017-04-17  6:29 ` Michael Haggerty

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).