git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH v10 0/5] [GSoC] remove_subtree(): reimplement using iterators
@ 2017-04-19 13:14 Daniel Ferreira
  2017-04-19 13:14 ` [PATCH v10 1/5] dir_iterator: add tests for dir_iterator API Daniel Ferreira
                   ` (5 more replies)
  0 siblings, 6 replies; 9+ messages in thread
From: Daniel Ferreira @ 2017-04-19 13:14 UTC (permalink / raw)
  To: git; +Cc: gitster, sbeller, pclouds, mhagger, Daniel Ferreira

This is the tenth version of a patch series that implements the GSoC
microproject of converting a recursive call to readdir() to use
dir_iterator.

v1: https://public-inbox.org/git/CAGZ79kZwT-9mHTiOJ5CEjk2wDFkn6+NcogjX0=vjhsAh16ANYg@mail.gmail.com/T/#t
v2: https://public-inbox.org/git/CACsJy8Dxh-QPBBLfaFWPAWUsbA9GVXA7x+mXLjEvYKhk1zOpig@mail.gmail.com/T/#t
v3: https://public-inbox.org/git/CAGZ79kYtpmURSQWPumobA=e3JBFjKhWCdv_LPhKCd71ZRwMovA@mail.gmail.com/T/#t
v4: https://public-inbox.org/git/1490747533-89143-1-git-send-email-bnmvco@gmail.com/T/#e437a63e0c22c00c69b5d92977c9b438ed2b9fd3a
v5: https://public-inbox.org/git/1490844730-47634-1-git-send-email-bnmvco@gmail.com/T/#m2323f15e45de699f2e09364f40a62e17047cf453
v6: https://public-inbox.org/git/1491107726-21504-1-git-send-email-bnmvco@gmail.com/T/#t
v7: https://public-inbox.org/git/1491163388-41255-1-git-send-email-bnmvco@gmail.com/T/#t
v8: https://public-inbox.org/git/a60b2ed6-2b99-b134-05af-7c8492a6949c@alum.mit.edu/T/#t
v9: https://public-inbox.org/git/CAGZ79kaBRS0SFAvrV4mN7-mVk+8QmPKPJMD55zPQ+A14ZzYFYA@mail.gmail.com/T/#me8988b7dd4adbc4ea24946ccb24fc1cf7baf44e3

Travis CI build: https://travis-ci.org/theiostream/git/builds/223542902

I do not know if "queuing" meant I did not have to change this series
any further (specially after Stefan's "ok"), but anyway, this series
applies Junio's corrections back from v9, that were mostly regarding
commit messages or style. I hope I got them right.

The only point I was in doubt was about Michael's signoff. Actually, he
gave it not regarding the snippet for the new dir_iterator_advance()
logic, but to a small piece of actual code he wrote on the new dir
iterator test helper[1]. Thus I don't know whether this would require his
signoff to come before or after mine. Regardless, proper credit has
been given in the commit message, as suggested. In the end, I kept
his before mine, but I suppose that can be adjusted by Junio if
necessary.

I also didn't get whether I myself should have renamed t0065 to t0066
given the other queued patch. I kept it as t0065 since I figured it
would be weird for this patch as a unit to "skip" a number.

Once again, thanks for all the time invested in the reviews for this
patch.

[1]: https://public-inbox.org/git/1491163388-41255-1-git-send-email-bnmvco@gmail.com/T/#m187b9e681e3369862ccc6083bbf6596cd2e19cd4

Daniel Ferreira (5):
  dir_iterator: add tests for dir_iterator API
  remove_subtree(): test removing nested directories
  dir_iterator: refactor dir_iterator_advance
  dir_iterator: rewrite state machine model
  remove_subtree(): reimplement using iterators

 Makefile                        |   1 +
 dir-iterator.c                  | 252 +++++++++++++++++++++++++++++-----------
 dir-iterator.h                  |  35 ++++--
 entry.c                         |  42 +++----
 refs/files-backend.c            |  51 +++++---
 t/helper/.gitignore             |   1 +
 t/helper/test-dir-iterator.c    |  53 +++++++++
 t/t0065-dir-iterator.sh         | 111 ++++++++++++++++++
 t/t2000-checkout-cache-clash.sh |  11 ++
 9 files changed, 433 insertions(+), 124 deletions(-)
 create mode 100644 t/helper/test-dir-iterator.c
 create mode 100755 t/t0065-dir-iterator.sh

--
2.7.4 (Apple Git-66)


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH v10 1/5] dir_iterator: add tests for dir_iterator API
  2017-04-19 13:14 [PATCH v10 0/5] [GSoC] remove_subtree(): reimplement using iterators Daniel Ferreira
@ 2017-04-19 13:14 ` Daniel Ferreira
  2017-04-19 13:14 ` [PATCH v10 2/5] remove_subtree(): test removing nested directories Daniel Ferreira
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 9+ messages in thread
From: Daniel Ferreira @ 2017-04-19 13:14 UTC (permalink / raw)
  To: git; +Cc: gitster, sbeller, pclouds, mhagger, Daniel Ferreira

Create t/helper/test-dir-iterator.c, which prints relevant information
about a directory tree iterated over with dir_iterator.

Create t/t0065-dir-iterator.sh, which tests that dir_iterator does
iterate through a whole directory tree.

Signed-off-by: Daniel Ferreira <bnmvco@gmail.com>
---
 Makefile                     |  1 +
 t/helper/.gitignore          |  1 +
 t/helper/test-dir-iterator.c | 30 ++++++++++++++++++++++++
 t/t0065-dir-iterator.sh      | 55 ++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 87 insertions(+)
 create mode 100644 t/helper/test-dir-iterator.c
 create mode 100755 t/t0065-dir-iterator.sh

diff --git a/Makefile b/Makefile
index d595ea3..a5c1ac0 100644
--- a/Makefile
+++ b/Makefile
@@ -614,6 +614,7 @@ TEST_PROGRAMS_NEED_X += test-ctype
 TEST_PROGRAMS_NEED_X += test-config
 TEST_PROGRAMS_NEED_X += test-date
 TEST_PROGRAMS_NEED_X += test-delta
+TEST_PROGRAMS_NEED_X += test-dir-iterator
 TEST_PROGRAMS_NEED_X += test-dump-cache-tree
 TEST_PROGRAMS_NEED_X += test-dump-split-index
 TEST_PROGRAMS_NEED_X += test-dump-untracked-cache
diff --git a/t/helper/.gitignore b/t/helper/.gitignore
index 758ed2e..a7d74d3 100644
--- a/t/helper/.gitignore
+++ b/t/helper/.gitignore
@@ -3,6 +3,7 @@
 /test-config
 /test-date
 /test-delta
+/test-dir-iterator
 /test-dump-cache-tree
 /test-dump-split-index
 /test-dump-untracked-cache
diff --git a/t/helper/test-dir-iterator.c b/t/helper/test-dir-iterator.c
new file mode 100644
index 0000000..a7d1470
--- /dev/null
+++ b/t/helper/test-dir-iterator.c
@@ -0,0 +1,30 @@
+#include "git-compat-util.h"
+#include "strbuf.h"
+#include "iterator.h"
+#include "dir-iterator.h"
+
+int cmd_main(int argc, const char **argv)
+{
+	struct strbuf path = STRBUF_INIT;
+	struct dir_iterator *diter;
+
+	if (argc < 2)
+		die("BUG: test-dir-iterator needs one argument");
+
+	strbuf_add(&path, argv[1], strlen(argv[1]));
+
+	diter = dir_iterator_begin(path.buf);
+
+	while (dir_iterator_advance(diter) == ITER_OK) {
+		if (S_ISDIR(diter->st.st_mode))
+			printf("[d] ");
+		else if (S_ISREG(diter->st.st_mode))
+			printf("[f] ");
+		else
+			printf("[?] ");
+
+		printf("(%s) [%s] %s\n", diter->relative_path, diter->basename, diter->path.buf);
+	}
+
+	return 0;
+}
diff --git a/t/t0065-dir-iterator.sh b/t/t0065-dir-iterator.sh
new file mode 100755
index 0000000..46e5ce5
--- /dev/null
+++ b/t/t0065-dir-iterator.sh
@@ -0,0 +1,55 @@
+#!/bin/sh
+
+test_description='Test directory iteration.'
+
+. ./test-lib.sh
+
+test_expect_success 'setup' '
+	mkdir -p dir &&
+	mkdir -p dir/a/b/c/ &&
+	>dir/b &&
+	>dir/c &&
+	mkdir -p dir/d/e/d/ &&
+	>dir/a/b/c/d &&
+	>dir/a/e &&
+	>dir/d/e/d/a &&
+
+	mkdir -p dir2/a/b/c/ &&
+	>dir2/a/b/c/d
+'
+
+test_expect_success 'dir-iterator should iterate through all files' '
+	cat >expect-sorted-output <<-\EOF &&
+	[d] (a) [a] ./dir/a
+	[d] (a/b) [b] ./dir/a/b
+	[d] (a/b/c) [c] ./dir/a/b/c
+	[d] (d) [d] ./dir/d
+	[d] (d/e) [e] ./dir/d/e
+	[d] (d/e/d) [d] ./dir/d/e/d
+	[f] (a/b/c/d) [d] ./dir/a/b/c/d
+	[f] (a/e) [e] ./dir/a/e
+	[f] (b) [b] ./dir/b
+	[f] (c) [c] ./dir/c
+	[f] (d/e/d/a) [a] ./dir/d/e/d/a
+	EOF
+
+	test-dir-iterator ./dir >out &&
+	sort <out >./actual-pre-order-sorted-output &&
+
+	test_cmp expect-sorted-output actual-pre-order-sorted-output
+'
+
+test_expect_success 'dir-iterator should list files in the correct order' '
+	cat >expect-pre-order-output <<-\EOF &&
+	[d] (a) [a] ./dir2/a
+	[d] (a/b) [b] ./dir2/a/b
+	[d] (a/b/c) [c] ./dir2/a/b/c
+	[f] (a/b/c/d) [d] ./dir2/a/b/c/d
+	EOF
+
+	test-dir-iterator ./dir2 >actual-pre-order-output &&
+
+	test_cmp expect-pre-order-output actual-pre-order-output
+'
+
+test_done
-- 
2.7.4 (Apple Git-66)


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH v10 2/5] remove_subtree(): test removing nested directories
  2017-04-19 13:14 [PATCH v10 0/5] [GSoC] remove_subtree(): reimplement using iterators Daniel Ferreira
  2017-04-19 13:14 ` [PATCH v10 1/5] dir_iterator: add tests for dir_iterator API Daniel Ferreira
@ 2017-04-19 13:14 ` Daniel Ferreira
  2017-04-19 13:14 ` [PATCH v10 3/5] dir_iterator: refactor dir_iterator_advance Daniel Ferreira
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 9+ messages in thread
From: Daniel Ferreira @ 2017-04-19 13:14 UTC (permalink / raw)
  To: git; +Cc: gitster, sbeller, pclouds, mhagger, Daniel Ferreira

Test removing a nested directory when an attempt is made to restore the
index to a state where it does not exist. A similar test could be found
previously in t/t2000-checkout-cache-clash.sh, but it did not check for
nested directories, which could allow a faulty implementation of
remove_subtree() pass the tests.

Signed-off-by: Daniel Ferreira <bnmvco@gmail.com>
---
 t/t2000-checkout-cache-clash.sh | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/t/t2000-checkout-cache-clash.sh b/t/t2000-checkout-cache-clash.sh
index de3edb5..ac10ba3 100755
--- a/t/t2000-checkout-cache-clash.sh
+++ b/t/t2000-checkout-cache-clash.sh
@@ -57,4 +57,15 @@ test_expect_success SYMLINKS 'checkout-index -f twice with --prefix' '
 	git checkout-index -a -f --prefix=there/
 '
 
+test_expect_success 'git checkout-index -f should remove nested subtrees' '
+	echo content >path &&
+	git update-index --add path &&
+	rm path &&
+	mkdir -p path/with/nested/paths &&
+	echo content >path/file1 &&
+	echo content >path/with/nested/paths/file2 &&
+	git checkout-index -f -a &&
+	test ! -d path
+'
+
 test_done
-- 
2.7.4 (Apple Git-66)


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH v10 3/5] dir_iterator: refactor dir_iterator_advance
  2017-04-19 13:14 [PATCH v10 0/5] [GSoC] remove_subtree(): reimplement using iterators Daniel Ferreira
  2017-04-19 13:14 ` [PATCH v10 1/5] dir_iterator: add tests for dir_iterator API Daniel Ferreira
  2017-04-19 13:14 ` [PATCH v10 2/5] remove_subtree(): test removing nested directories Daniel Ferreira
@ 2017-04-19 13:14 ` Daniel Ferreira
  2017-04-19 13:14 ` [PATCH v10 4/5] dir_iterator: rewrite state machine model Daniel Ferreira
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 9+ messages in thread
From: Daniel Ferreira @ 2017-04-19 13:14 UTC (permalink / raw)
  To: git; +Cc: gitster, sbeller, pclouds, mhagger, Daniel Ferreira

Factor out reusable helpers out of dir_iterator_advance(). Make
dir_iterator_advance()'s code more legible and allow some behavior to
be reusable.

Signed-off-by: Daniel Ferreira <bnmvco@gmail.com>
---
 dir-iterator.c | 66 ++++++++++++++++++++++++++++++++++++++--------------------
 1 file changed, 43 insertions(+), 23 deletions(-)

diff --git a/dir-iterator.c b/dir-iterator.c
index 34182a9..d168cb2 100644
--- a/dir-iterator.c
+++ b/dir-iterator.c
@@ -50,6 +50,44 @@ struct dir_iterator_int {
 	struct dir_iterator_level *levels;
 };
 
+static void push_dir_level(struct dir_iterator_int *iter, struct dir_iterator_level *level)
+{
+	level->dir_state = DIR_STATE_RECURSE;
+	ALLOC_GROW(iter->levels, iter->levels_nr + 1,
+		   iter->levels_alloc);
+	level = &iter->levels[iter->levels_nr++];
+	level->initialized = 0;
+}
+
+static int pop_dir_level(struct dir_iterator_int *iter)
+{
+	return --iter->levels_nr;
+}
+
+static int adjust_iterator_data(struct dir_iterator_int *iter,
+		struct dir_iterator_level *level)
+{
+	if (lstat(iter->base.path.buf, &iter->base.st) < 0) {
+		if (errno != ENOENT)
+			warning("error reading path '%s': %s",
+				iter->base.path.buf,
+				strerror(errno));
+		return -1;
+	}
+
+	/*
+	 * We have to set these each time because
+	 * the path strbuf might have been realloc()ed.
+	 */
+	iter->base.relative_path =
+		iter->base.path.buf + iter->levels[0].prefix_len;
+	iter->base.basename =
+		iter->base.path.buf + level->prefix_len;
+	level->dir_state = DIR_STATE_ITER;
+
+	return 0;
+}
+
 int dir_iterator_advance(struct dir_iterator *dir_iterator)
 {
 	struct dir_iterator_int *iter =
@@ -84,11 +122,7 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
 				 * over; now prepare to iterate into
 				 * it.
 				 */
-				level->dir_state = DIR_STATE_RECURSE;
-				ALLOC_GROW(iter->levels, iter->levels_nr + 1,
-					   iter->levels_alloc);
-				level = &iter->levels[iter->levels_nr++];
-				level->initialized = 0;
+				push_dir_level(iter, level);
 				continue;
 			} else {
 				/*
@@ -104,7 +138,7 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
 			 * This level is exhausted (or wasn't opened
 			 * successfully); pop up a level.
 			 */
-			if (--iter->levels_nr == 0)
+			if (pop_dir_level(iter) == 0)
 				return dir_iterator_abort(dir_iterator);
 
 			continue;
@@ -129,7 +163,7 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
 						iter->base.path.buf, strerror(errno));
 
 				level->dir = NULL;
-				if (--iter->levels_nr == 0)
+				if (pop_dir_level(iter) == 0)
 					return dir_iterator_abort(dir_iterator);
 				break;
 			}
@@ -138,23 +172,9 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
 				continue;
 
 			strbuf_addstr(&iter->base.path, de->d_name);
-			if (lstat(iter->base.path.buf, &iter->base.st) < 0) {
-				if (errno != ENOENT)
-					warning("error reading path '%s': %s",
-						iter->base.path.buf,
-						strerror(errno));
-				continue;
-			}
 
-			/*
-			 * We have to set these each time because
-			 * the path strbuf might have been realloc()ed.
-			 */
-			iter->base.relative_path =
-				iter->base.path.buf + iter->levels[0].prefix_len;
-			iter->base.basename =
-				iter->base.path.buf + level->prefix_len;
-			level->dir_state = DIR_STATE_ITER;
+			if (adjust_iterator_data(iter, level))
+				continue;
 
 			return ITER_OK;
 		}
-- 
2.7.4 (Apple Git-66)


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH v10 4/5] dir_iterator: rewrite state machine model
  2017-04-19 13:14 [PATCH v10 0/5] [GSoC] remove_subtree(): reimplement using iterators Daniel Ferreira
                   ` (2 preceding siblings ...)
  2017-04-19 13:14 ` [PATCH v10 3/5] dir_iterator: refactor dir_iterator_advance Daniel Ferreira
@ 2017-04-19 13:14 ` Daniel Ferreira
  2017-04-20  3:30   ` Junio C Hamano
  2017-04-20  9:36   ` Michael Haggerty
  2017-04-19 13:14 ` [PATCH v10 5/5] remove_subtree(): reimplement using iterators Daniel Ferreira
  2017-04-20  3:23 ` [PATCH v10 0/5] [GSoC] " Junio C Hamano
  5 siblings, 2 replies; 9+ messages in thread
From: Daniel Ferreira @ 2017-04-19 13:14 UTC (permalink / raw)
  To: git; +Cc: gitster, sbeller, pclouds, mhagger, Daniel Ferreira

Perform a rewrite of dir_iterator_advance(). dir_iterator has
ceased to rely on a combination of level.initialized and level.dir_state
state variables and now only tracks the state with level.dir_state,
which simplifies the iterator mechanism, makes the code easier to follow
and eases additions of new features to the iterator.

Make dir_iterator_begin() attempt to lstat() the path it receives, and
return NULL and an appropriate errno if it fails or if the passed path
was not a directory.

Create an option for the dir_iterator API to iterate over subdirectories
only after having iterated through their contents. This feature was
predicted, although not implemented by 0fe5043 ("dir_iterator: new API
for iterating over a directory tree", 2016-06-18). This is useful for
recursively removing a directory and calling rmdir() on a directory only
after all of its contents have been wiped.

Add an option for dir_iterator to also iterate over the initial
directory (the one passed to dir_iterator_begin()).

Add the "flags" parameter to dir_iterator_create, allowing for the
aforementioned new features to be enabled. The new default behavior
(i.e. flags set to 0) does not iterate over directories. Flag
DIR_ITERATOR_PRE_ORDER_TRAVERSAL iterates over a directory before doing
so over its contents. DIR_ITERATOR_POST_ORDER_TRAVERSAL iterates over a
directory after doing so over its contents. DIR_ITERATOR_LIST_ROOT_DIR
iterates over the initial directory. These flags do not conflict with
each other and may be used simultaneously.

Amend a call to dir_iterator_begin() in refs/files-backend.c to pass
the flags parameter introduced, as well as handle the case in which it
fails to open the directory.

Improve t/t0065-dir-iterator.sh and t/helper/test-dir-iterator.c to
test "post-order" and "iterate-over-root" modes.

Michael Haggerty contributed with the design of the new
dir_iterator_advance() implementation, the code for
t/helper/test-dir-iterator's option parser and numerous reviews that
gradually shaped this code to its current form.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Daniel Ferreira <bnmvco@gmail.com>
---
 dir-iterator.c               | 220 ++++++++++++++++++++++++++++++-------------
 dir-iterator.h               |  35 +++++--
 refs/files-backend.c         |  51 ++++++----
 t/helper/test-dir-iterator.c |  31 +++++-
 t/t0065-dir-iterator.sh      |  94 ++++++++++++++----
 5 files changed, 316 insertions(+), 115 deletions(-)

diff --git a/dir-iterator.c b/dir-iterator.c
index d168cb2..3c20e0e 100644
--- a/dir-iterator.c
+++ b/dir-iterator.c
@@ -4,8 +4,6 @@
 #include "dir-iterator.h"
 
 struct dir_iterator_level {
-	int initialized;
-
 	DIR *dir;
 
 	/*
@@ -20,9 +18,15 @@ struct dir_iterator_level {
 	 * iteration and also iterated into):
 	 */
 	enum {
-		DIR_STATE_ITER,
-		DIR_STATE_RECURSE
+		DIR_STATE_PUSHED,
+		DIR_STATE_PRE_ITERATION,
+		DIR_STATE_ITERATING,
+		DIR_STATE_POST_ITERATION,
+		DIR_STATE_EXHAUSTED
 	} dir_state;
+
+	/* The stat structure for the directory this level represents. */
+	struct stat st;
 };
 
 /*
@@ -48,15 +52,23 @@ struct dir_iterator_int {
 	 * that will be included in this iteration.
 	 */
 	struct dir_iterator_level *levels;
+
+	/* Holds the flags passed to dir_iterator_begin(). */
+	unsigned flags;
 };
 
-static void push_dir_level(struct dir_iterator_int *iter, struct dir_iterator_level *level)
+static void push_dir_level(struct dir_iterator_int *iter, struct stat *st)
 {
-	level->dir_state = DIR_STATE_RECURSE;
+	struct dir_iterator_level *level;
+
 	ALLOC_GROW(iter->levels, iter->levels_nr + 1,
 		   iter->levels_alloc);
+
+	/* Push a new level */
 	level = &iter->levels[iter->levels_nr++];
-	level->initialized = 0;
+	level->dir = NULL;
+	level->dir_state = DIR_STATE_PUSHED;
+	level->st = *st;
 }
 
 static int pop_dir_level(struct dir_iterator_int *iter)
@@ -67,7 +79,11 @@ static int pop_dir_level(struct dir_iterator_int *iter)
 static int adjust_iterator_data(struct dir_iterator_int *iter,
 		struct dir_iterator_level *level)
 {
-	if (lstat(iter->base.path.buf, &iter->base.st) < 0) {
+	char *last_path_component;
+
+	if (level->dir_state != DIR_STATE_ITERATING) {
+		iter->base.st = level->st;
+	} else if (lstat(iter->base.path.buf, &iter->base.st) < 0) {
 		if (errno != ENOENT)
 			warning("error reading path '%s': %s",
 				iter->base.path.buf,
@@ -76,18 +92,48 @@ static int adjust_iterator_data(struct dir_iterator_int *iter,
 	}
 
 	/*
-	 * We have to set these each time because
-	 * the path strbuf might have been realloc()ed.
+	 * Check if we are dealing with the root directory as an
+	 * item that's being iterated through.
 	 */
-	iter->base.relative_path =
-		iter->base.path.buf + iter->levels[0].prefix_len;
-	iter->base.basename =
-		iter->base.path.buf + level->prefix_len;
-	level->dir_state = DIR_STATE_ITER;
+	if (level->dir_state != DIR_STATE_ITERATING &&
+		iter->levels_nr == 1) {
+		iter->base.relative_path = ".";
+
+		/*
+		 * If we have a path like './dir', we'll get everything
+		 * after the last slash a basename. If we don't find a
+		 * slash (e.g. 'dir'), we return the whole path.
+		 */
+		last_path_component = strrchr(iter->base.path.buf, '/');
+		iter->base.basename = last_path_component ?
+			last_path_component + 1 : iter->base.path.buf;
+	} else {
+		iter->base.relative_path =
+			iter->base.path.buf + iter->levels[0].prefix_len;
+
+		if (S_ISDIR(iter->base.st.st_mode))
+			iter->base.basename =
+				iter->base.path.buf + iter->levels[iter->levels_nr - 2].prefix_len;
+		else
+			iter->base.basename =
+				iter->base.path.buf + level->prefix_len;
+	}
 
 	return 0;
 }
 
+/*
+ * This function uses a state machine with the following states:
+ * * DIR_STATE_PUSHED: the directory has been pushed to the
+ * iterator traversal tree.
+ * * DIR_STATE_PRE_ITERATION: the directory is not opened with opendir(). The
+ * dirpath has already been returned if pre-order traversal is set.
+ * * DIR_STATE_ITERATING: the directory is initialized. We are traversing
+ * through it.
+ * * DIR_STATE_POST_ITERATION: the directory has been iterated through.
+ * We are ready to close it.
+ * * DIR_STATE_EXHAUSTED: the directory is closed and ready to be popped.
+ */
 int dir_iterator_advance(struct dir_iterator *dir_iterator)
 {
 	struct dir_iterator_int *iter =
@@ -96,9 +142,26 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
 	while (1) {
 		struct dir_iterator_level *level =
 			&iter->levels[iter->levels_nr - 1];
-		struct dirent *de;
 
-		if (!level->initialized) {
+		if (level->dir_state == DIR_STATE_PUSHED) {
+			level->dir_state = DIR_STATE_PRE_ITERATION;
+
+			/* We may not want the root directory to be iterated over */
+			if ((iter->flags & DIR_ITERATOR_PRE_ORDER_TRAVERSAL) && (
+				iter->levels_nr != 1 ||
+				(iter->flags & DIR_ITERATOR_LIST_ROOT_DIR))) {
+				/*
+				 * This will only error if we fail to lstat() the
+				 * root directory. In this case, we bail.
+				 */
+				if (adjust_iterator_data(iter, level)) {
+					level->dir_state = DIR_STATE_EXHAUSTED;
+					continue;
+				}
+
+				return ITER_OK;
+			}
+		} else if (level->dir_state == DIR_STATE_PRE_ITERATION) {
 			/*
 			 * Note: dir_iterator_begin() ensures that
 			 * path is not the empty string.
@@ -108,64 +171,37 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
 			level->prefix_len = iter->base.path.len;
 
 			level->dir = opendir(iter->base.path.buf);
-			if (!level->dir && errno != ENOENT) {
-				warning("error opening directory %s: %s",
-					iter->base.path.buf, strerror(errno));
-				/* Popping the level is handled below */
-			}
-
-			level->initialized = 1;
-		} else if (S_ISDIR(iter->base.st.st_mode)) {
-			if (level->dir_state == DIR_STATE_ITER) {
+			if (!level->dir) {
 				/*
-				 * The directory was just iterated
-				 * over; now prepare to iterate into
-				 * it.
+				 * This level wasn't opened sucessfully; pretend we
+				 * iterated through it already.
 				 */
-				push_dir_level(iter, level);
+				if (errno != ENOENT) {
+					warning("error opening directory %s: %s",
+						iter->base.path.buf, strerror(errno));
+				}
+
+				level->dir_state = DIR_STATE_POST_ITERATION;
 				continue;
-			} else {
-				/*
-				 * The directory has already been
-				 * iterated over and iterated into;
-				 * we're done with it.
-				 */
 			}
-		}
 
-		if (!level->dir) {
-			/*
-			 * This level is exhausted (or wasn't opened
-			 * successfully); pop up a level.
-			 */
-			if (pop_dir_level(iter) == 0)
-				return dir_iterator_abort(dir_iterator);
+			level->dir_state = DIR_STATE_ITERATING;
+		} else if (level->dir_state == DIR_STATE_ITERATING) {
+			struct dirent *de;
 
-			continue;
-		}
-
-		/*
-		 * Loop until we find an entry that we can give back
-		 * to the caller:
-		 */
-		while (1) {
 			strbuf_setlen(&iter->base.path, level->prefix_len);
 			errno = 0;
 			de = readdir(level->dir);
 
 			if (!de) {
-				/* This level is exhausted; pop up a level. */
+				/* In case of readdir() error */
 				if (errno) {
 					warning("error reading directory %s: %s",
 						iter->base.path.buf, strerror(errno));
-				} else if (closedir(level->dir))
-					warning("error closing directory %s: %s",
-						iter->base.path.buf, strerror(errno));
+				}
 
-				level->dir = NULL;
-				if (pop_dir_level(iter) == 0)
-					return dir_iterator_abort(dir_iterator);
-				break;
+				level->dir_state = DIR_STATE_POST_ITERATION;
+				continue;
 			}
 
 			if (is_dot_or_dotdot(de->d_name))
@@ -176,7 +212,41 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
 			if (adjust_iterator_data(iter, level))
 				continue;
 
+			if (S_ISDIR(iter->base.st.st_mode)) {
+				push_dir_level(iter, &iter->base.st);
+				continue;
+			}
+
 			return ITER_OK;
+		} else if (level->dir_state == DIR_STATE_POST_ITERATION) {
+			if (level->dir != NULL && closedir(level->dir)) {
+				warning("error closing directory %s: %s",
+					iter->base.path.buf, strerror(errno));
+			}
+			level->dir_state = DIR_STATE_EXHAUSTED;
+
+			strbuf_setlen(&iter->base.path, level->prefix_len);
+			/*
+			 * Since we are iterating through the dirpath
+			 * after we have gone through it, we still need
+			 * to get rid of the trailing slash we appended.
+			 */
+			strbuf_strip_suffix(&iter->base.path, "/");
+
+			/* We may not want the root directory to be iterated over */
+			if ((iter->flags & DIR_ITERATOR_POST_ORDER_TRAVERSAL) && (
+				iter->levels_nr != 1 ||
+				(iter->flags & DIR_ITERATOR_LIST_ROOT_DIR))) {
+				/*
+				 * In this state, adjust_iterator_data() should never return
+				 * an error.
+				 */
+				adjust_iterator_data(iter, level);
+				return ITER_OK;
+			}
+		} else if (level->dir_state == DIR_STATE_EXHAUSTED) {
+			if (pop_dir_level(iter) == 0)
+				return dir_iterator_abort(dir_iterator);
 		}
 	}
 }
@@ -202,21 +272,41 @@ int dir_iterator_abort(struct dir_iterator *dir_iterator)
 	return ITER_DONE;
 }
 
-struct dir_iterator *dir_iterator_begin(const char *path)
+struct dir_iterator *dir_iterator_begin(const char *path, unsigned flags)
 {
-	struct dir_iterator_int *iter = xcalloc(1, sizeof(*iter));
-	struct dir_iterator *dir_iterator = &iter->base;
+	struct dir_iterator_int *iter;
+	struct dir_iterator *dir_iterator;
+	struct stat st;
 
 	if (!path || !*path)
 		die("BUG: empty path passed to dir_iterator_begin()");
 
+	if (lstat(path, &st) < 0) {
+		if (errno != ENOENT)
+			warning("error reading path '%s': %s",
+					path, strerror(errno));
+
+		return NULL;
+	}
+
+	if (!S_ISDIR(st.st_mode)) {
+		errno = ENOTDIR;
+		return NULL;
+	}
+
+	iter = xcalloc(1, sizeof(*iter));
+	dir_iterator = &iter->base;
+
+	iter->flags = flags;
+	dir_iterator->st = st;
+
 	strbuf_init(&iter->base.path, PATH_MAX);
 	strbuf_addstr(&iter->base.path, path);
 
 	ALLOC_GROW(iter->levels, 10, iter->levels_alloc);
+	iter->levels_nr = 0;
 
-	iter->levels_nr = 1;
-	iter->levels[0].initialized = 0;
+	push_dir_level(iter, &dir_iterator->st);
 
 	return dir_iterator;
 }
diff --git a/dir-iterator.h b/dir-iterator.h
index 27739e6..a21e845 100644
--- a/dir-iterator.h
+++ b/dir-iterator.h
@@ -5,19 +5,17 @@
  * Iterate over a directory tree.
  *
  * Iterate over a directory tree, recursively, including paths of all
- * types and hidden paths. Skip "." and ".." entries and don't follow
- * symlinks except for the original path.
+ * types and hidden paths. Skip "." and ".." entries.
  *
  * Every time dir_iterator_advance() is called, update the members of
  * the dir_iterator structure to reflect the next path in the
  * iteration. The order that paths are iterated over within a
- * directory is undefined, but directory paths are always iterated
- * over before the subdirectory contents.
+ * directory is undefined.
  *
  * A typical iteration looks like this:
  *
  *     int ok;
- *     struct iterator *iter = dir_iterator_begin(path);
+ *     struct iterator *iter = dir_iterator_begin(path, flags);
  *
  *     while ((ok = dir_iterator_advance(iter)) == ITER_OK) {
  *             if (want_to_stop_iteration()) {
@@ -38,6 +36,26 @@
  * dir_iterator_advance() again.
  */
 
+/*
+ * Possible flags for dir_iterator_begin().
+ *
+ * * DIR_ITERATOR_PRE_ORDER_TRAVERSAL: the iterator shall return
+ * a dirpath it has found before iterating through that directory's
+ * contents.
+ * * DIR_ITERATOR_POST_ORDER_TRAVERSAL: the iterator shall return
+ * a dirpath it has found after iterating through that directory's
+ * contents.
+ * * DIR_ITERATOR_LIST_ROOT_DIR: the iterator shall return the dirpath
+ * of the root directory it is iterating through if either
+ * DIR_ITERATOR_PRE_ORDER_TRAVERSAL or DIR_ITERATOR_POST_ORDER_TRAVERSAL
+ * is set.
+ *
+ * All flags can be used in any combination.
+ */
+#define DIR_ITERATOR_PRE_ORDER_TRAVERSAL (1 << 0)
+#define DIR_ITERATOR_POST_ORDER_TRAVERSAL (1 << 1)
+#define DIR_ITERATOR_LIST_ROOT_DIR (1 << 2)
+
 struct dir_iterator {
 	/* The current path: */
 	struct strbuf path;
@@ -57,15 +75,16 @@ struct dir_iterator {
 };
 
 /*
- * Start a directory iteration over path. Return a dir_iterator that
- * holds the internal state of the iteration.
+ * Start a directory iteration over path, with options specified in
+ * 'flags'. Return a dir_iterator that holds the internal state of
+ * the iteration.
  *
  * The iteration includes all paths under path, not including path
  * itself and not including "." or ".." entries.
  *
  * path is the starting directory. An internal copy will be made.
  */
-struct dir_iterator *dir_iterator_begin(const char *path);
+struct dir_iterator *dir_iterator_begin(const char *path, unsigned flags);
 
 /*
  * Advance the iterator to the first or next item and return ITER_OK.
diff --git a/refs/files-backend.c b/refs/files-backend.c
index 50188e9..d376e1f 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -3284,29 +3284,42 @@ static int files_reflog_iterator_advance(struct ref_iterator *ref_iterator)
 	struct dir_iterator *diter = iter->dir_iterator;
 	int ok;
 
-	while ((ok = dir_iterator_advance(diter)) == ITER_OK) {
-		int flags;
-
-		if (!S_ISREG(diter->st.st_mode))
-			continue;
-		if (diter->basename[0] == '.')
-			continue;
-		if (ends_with(diter->basename, ".lock"))
-			continue;
+	if (!iter->dir_iterator) {
+		/*
+		 * If our dir_iterator is NULL at this stage, it means we failed
+		 * to open the given path at dir_iterator_begin(), most likely due
+		 * to it not existing.
+		 *
+		 * In this case, we pretend it was an empty directory and just
+		 * iterate through nothing.
+		 */
+		ok = ITER_DONE;
+	} else {
+		while ((ok = dir_iterator_advance(diter)) == ITER_OK) {
+			int flags;
+
+			if (!S_ISREG(diter->st.st_mode))
+				continue;
+			if (diter->basename[0] == '.')
+				continue;
+			if (ends_with(diter->basename, ".lock"))
+				continue;
+
+			if (read_ref_full(diter->relative_path, 0,
+					  iter->oid.hash, &flags)) {
+				error("bad ref for %s", diter->path.buf);
+				continue;
+			}
 
-		if (read_ref_full(diter->relative_path, 0,
-				  iter->oid.hash, &flags)) {
-			error("bad ref for %s", diter->path.buf);
-			continue;
+			iter->base.refname = diter->relative_path;
+			iter->base.oid = &iter->oid;
+			iter->base.flags = flags;
+			return ITER_OK;
 		}
 
-		iter->base.refname = diter->relative_path;
-		iter->base.oid = &iter->oid;
-		iter->base.flags = flags;
-		return ITER_OK;
+		iter->dir_iterator = NULL;
 	}
 
-	iter->dir_iterator = NULL;
 	if (ref_iterator_abort(ref_iterator) == ITER_ERROR)
 		ok = ITER_ERROR;
 	return ok;
@@ -3346,7 +3359,7 @@ static struct ref_iterator *files_reflog_iterator_begin(struct ref_store *ref_st
 	files_downcast(ref_store, 0, "reflog_iterator_begin");
 
 	base_ref_iterator_init(ref_iterator, &files_reflog_iterator_vtable);
-	iter->dir_iterator = dir_iterator_begin(git_path("logs"));
+	iter->dir_iterator = dir_iterator_begin(git_path("logs"), 0);
 	return ref_iterator;
 }
 
diff --git a/t/helper/test-dir-iterator.c b/t/helper/test-dir-iterator.c
index a7d1470..3b4948b 100644
--- a/t/helper/test-dir-iterator.c
+++ b/t/helper/test-dir-iterator.c
@@ -5,15 +5,38 @@
 
 int cmd_main(int argc, const char **argv)
 {
+	const char **myargv = argv;
+	int myargc = argc;
+
 	struct strbuf path = STRBUF_INIT;
 	struct dir_iterator *diter;
 
-	if (argc < 2)
-		die("BUG: test-dir-iterator needs one argument");
+	unsigned flag = 0;
+
+	while (--myargc && starts_with(*++myargv, "--")) {
+		if (!strcmp(*myargv, "--pre-order"))
+			flag |= DIR_ITERATOR_PRE_ORDER_TRAVERSAL;
+		else if (!strcmp(*myargv, "--post-order"))
+			flag |= DIR_ITERATOR_POST_ORDER_TRAVERSAL;
+		else if (!strcmp(*myargv, "--list-root-dir"))
+			flag |= DIR_ITERATOR_LIST_ROOT_DIR;
+		else if (!strcmp(*myargv, "--")) {
+			myargc--;
+			myargv++;
+			break;
+		} else
+			die("Unrecognized option: %s", *myargv);
+	}
 
-	strbuf_add(&path, argv[1], strlen(argv[1]));
+	if (myargc != 1)
+		die("expected exactly one non-option argument");
+	strbuf_addstr(&path, *myargv);
 
-	diter = dir_iterator_begin(path.buf);
+	diter = dir_iterator_begin(path.buf, flag);
+	if (diter == NULL) {
+		printf("begin failed: %d\n", errno);
+		return 0;
+	}
 
 	while (dir_iterator_advance(diter) == ITER_OK) {
 		if (S_ISDIR(diter->st.st_mode))
diff --git a/t/t0065-dir-iterator.sh b/t/t0065-dir-iterator.sh
index 46e5ce5..4c6632f 100755
--- a/t/t0065-dir-iterator.sh
+++ b/t/t0065-dir-iterator.sh
@@ -15,31 +15,41 @@ test_expect_success 'setup' '
 	>dir/d/e/d/a &&
 
 	mkdir -p dir2/a/b/c/ &&
-	>dir2/a/b/c/d
+	>dir2/a/b/c/d &&
+
+	>file
 '
 
-test_expect_success 'dir-iterator should iterate through all files' '
-	cat >expect-sorted-output <<-\EOF &&
-	[d] (a) [a] ./dir/a
-	[d] (a/b) [b] ./dir/a/b
-	[d] (a/b/c) [c] ./dir/a/b/c
-	[d] (d) [d] ./dir/d
-	[d] (d/e) [e] ./dir/d/e
-	[d] (d/e/d) [d] ./dir/d/e/d
-	[f] (a/b/c/d) [d] ./dir/a/b/c/d
-	[f] (a/e) [e] ./dir/a/e
-	[f] (b) [b] ./dir/b
-	[f] (c) [c] ./dir/c
-	[f] (d/e/d/a) [a] ./dir/d/e/d/a
-	EOF
+cat >expect-sorted-output <<-\EOF &&
+[d] (a) [a] ./dir/a
+[d] (a/b) [b] ./dir/a/b
+[d] (a/b/c) [c] ./dir/a/b/c
+[d] (d) [d] ./dir/d
+[d] (d/e) [e] ./dir/d/e
+[d] (d/e/d) [d] ./dir/d/e/d
+[f] (a/b/c/d) [d] ./dir/a/b/c/d
+[f] (a/e) [e] ./dir/a/e
+[f] (b) [b] ./dir/b
+[f] (c) [c] ./dir/c
+[f] (d/e/d/a) [a] ./dir/d/e/d/a
+EOF
 
-	test-dir-iterator ./dir >out &&
+test_expect_success 'dir-iterator should iterate through all files' '
+	test-dir-iterator --pre-order ./dir >out &&
 	sort <out >./actual-pre-order-sorted-output &&
 
 	test_cmp expect-sorted-output actual-pre-order-sorted-output
 '
 
-test_expect_success 'dir-iterator should list files in the correct order' '
+test_expect_success 'dir-iterator should iterate through all files on post-order mode' '
+	test-dir-iterator --post-order ./dir >out &&
+	sort <out >actual-post-order-sorted-output &&
+
+	test_cmp expect-sorted-output actual-post-order-sorted-output
+'
+
+
+test_expect_success 'dir-iterator should list files properly on pre-order mode' '
 	cat >expect-pre-order-output <<-\EOF &&
 	[d] (a) [a] ./dir2/a
 	[d] (a/b) [b] ./dir2/a/b
@@ -47,9 +57,55 @@ test_expect_success 'dir-iterator should list files in the correct order' '
 	[f] (a/b/c/d) [d] ./dir2/a/b/c/d
 	EOF
 
-	test-dir-iterator ./dir2 >actual-pre-order-output &&
-
+	test-dir-iterator --pre-order ./dir2 >actual-pre-order-output &&
 	test_cmp expect-pre-order-output actual-pre-order-output
 '
 
+test_expect_success 'dir-iterator should list files properly on post-order mode' '
+	cat >expect-post-order-output <<-\EOF &&
+	[f] (a/b/c/d) [d] ./dir2/a/b/c/d
+	[d] (a/b/c) [c] ./dir2/a/b/c
+	[d] (a/b) [b] ./dir2/a/b
+	[d] (a) [a] ./dir2/a
+	EOF
+
+	test-dir-iterator --post-order ./dir2 >actual-post-order-output &&
+	test_cmp expect-post-order-output actual-post-order-output
+'
+
+test_expect_success 'dir-iterator should list files properly on pre-order + post-order + root-dir mode' '
+	cat >expect-pre-order-post-order-root-dir-output <<-\EOF &&
+	[d] (.) [dir2] ./dir2
+	[d] (a) [a] ./dir2/a
+	[d] (a/b) [b] ./dir2/a/b
+	[d] (a/b/c) [c] ./dir2/a/b/c
+	[f] (a/b/c/d) [d] ./dir2/a/b/c/d
+	[d] (a/b/c) [c] ./dir2/a/b/c
+	[d] (a/b) [b] ./dir2/a/b
+	[d] (a) [a] ./dir2/a
+	[d] (.) [dir2] ./dir2
+	EOF
+
+	test-dir-iterator --pre-order --post-order --list-root-dir ./dir2 >actual-pre-order-post-order-root-dir-output &&
+	test_cmp expect-pre-order-post-order-root-dir-output actual-pre-order-post-order-root-dir-output
+'
+
+test_expect_success 'dir-iterator should return ENOENT upon opening non-existing directory' '
+	cat >expect-non-existing-dir-output <<-\EOF &&
+	begin failed: 2
+	EOF
+
+	test-dir-iterator ./dir3 >actual-non-existing-dir-output &&
+	test_cmp expect-non-existing-dir-output actual-non-existing-dir-output
+'
+
+test_expect_success 'dir-iterator should return ENOTDIR upon opening non-directory path' '
+	cat >expect-not-a-directory-output <<-\EOF &&
+	begin failed: 20
+	EOF
+
+	test-dir-iterator ./file >actual-not-a-directory-output &&
+	test_cmp expect-not-a-directory-output actual-not-a-directory-output
+'
+
 test_done
-- 
2.7.4 (Apple Git-66)


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH v10 5/5] remove_subtree(): reimplement using iterators
  2017-04-19 13:14 [PATCH v10 0/5] [GSoC] remove_subtree(): reimplement using iterators Daniel Ferreira
                   ` (3 preceding siblings ...)
  2017-04-19 13:14 ` [PATCH v10 4/5] dir_iterator: rewrite state machine model Daniel Ferreira
@ 2017-04-19 13:14 ` Daniel Ferreira
  2017-04-20  3:23 ` [PATCH v10 0/5] [GSoC] " Junio C Hamano
  5 siblings, 0 replies; 9+ messages in thread
From: Daniel Ferreira @ 2017-04-19 13:14 UTC (permalink / raw)
  To: git; +Cc: gitster, sbeller, pclouds, mhagger, Daniel Ferreira

Use dir_iterator to traverse through remove_subtree()'s directory tree,
avoiding the need for recursive calls to readdir(). Simplify
remove_subtree()'s code.

A conversion similar in purpose was previously done at 46d092a
("for_each_reflog(): reimplement using iterators", 2016-05-21).

Signed-off-by: Daniel Ferreira <bnmvco@gmail.com>
---
 entry.c | 42 ++++++++++++++++--------------------------
 1 file changed, 16 insertions(+), 26 deletions(-)

diff --git a/entry.c b/entry.c
index d2b512d..a939432 100644
--- a/entry.c
+++ b/entry.c
@@ -3,6 +3,8 @@
 #include "dir.h"
 #include "streaming.h"
 #include "submodule.h"
+#include "iterator.h"
+#include "dir-iterator.h"
 
 static void create_directories(const char *path, int path_len,
 			       const struct checkout *state)
@@ -45,33 +47,21 @@ static void create_directories(const char *path, int path_len,
 	free(buf);
 }
 
-static void remove_subtree(struct strbuf *path)
+static void remove_subtree(const char *path)
 {
-	DIR *dir = opendir(path->buf);
-	struct dirent *de;
-	int origlen = path->len;
-
-	if (!dir)
-		die_errno("cannot opendir '%s'", path->buf);
-	while ((de = readdir(dir)) != NULL) {
-		struct stat st;
-
-		if (is_dot_or_dotdot(de->d_name))
-			continue;
-
-		strbuf_addch(path, '/');
-		strbuf_addstr(path, de->d_name);
-		if (lstat(path->buf, &st))
-			die_errno("cannot lstat '%s'", path->buf);
-		if (S_ISDIR(st.st_mode))
-			remove_subtree(path);
-		else if (unlink(path->buf))
-			die_errno("cannot unlink '%s'", path->buf);
-		strbuf_setlen(path, origlen);
+	struct dir_iterator *diter = dir_iterator_begin(path,
+		DIR_ITERATOR_POST_ORDER_TRAVERSAL | DIR_ITERATOR_LIST_ROOT_DIR);
+	if (!diter) {
+		die_errno("cannot remove path '%s'", path);
+	}
+
+	while (dir_iterator_advance(diter) == ITER_OK) {
+		if (S_ISDIR(diter->st.st_mode)) {
+			if (rmdir(diter->path.buf))
+				die_errno("cannot rmdir '%s'", diter->path.buf);
+		} else if (unlink(diter->path.buf))
+			die_errno("cannot unlink '%s'", diter->path.buf);
 	}
-	closedir(dir);
-	if (rmdir(path->buf))
-		die_errno("cannot rmdir '%s'", path->buf);
 }
 
 static int create_file(const char *path, unsigned int mode)
@@ -312,7 +302,7 @@ int checkout_entry(struct cache_entry *ce,
 				return 0;
 			if (!state->force)
 				return error("%s is a directory", path.buf);
-			remove_subtree(&path);
+			remove_subtree(path.buf);
 		} else if (unlink(path.buf))
 			return error_errno("unable to unlink old '%s'", path.buf);
 	} else if (state->not_new)
-- 
2.7.4 (Apple Git-66)


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH v10 0/5] [GSoC] remove_subtree(): reimplement using iterators
  2017-04-19 13:14 [PATCH v10 0/5] [GSoC] remove_subtree(): reimplement using iterators Daniel Ferreira
                   ` (4 preceding siblings ...)
  2017-04-19 13:14 ` [PATCH v10 5/5] remove_subtree(): reimplement using iterators Daniel Ferreira
@ 2017-04-20  3:23 ` Junio C Hamano
  5 siblings, 0 replies; 9+ messages in thread
From: Junio C Hamano @ 2017-04-20  3:23 UTC (permalink / raw)
  To: Daniel Ferreira; +Cc: git, sbeller, pclouds, mhagger

Daniel Ferreira <bnmvco@gmail.com> writes:

> I do not know if "queuing" meant I did not have to change this series
> any further (specially after Stefan's "ok"), but anyway, this series
> applies Junio's corrections back from v9, that were mostly regarding
> commit messages or style. I hope I got them right.

Queuing merely means that I queued the series on its own topic
branch and merged it to 'pu', which is a bit more (but not much
more) permanent place than the list archive.  It does not mean
anything more---specifically, it does not mean that it is now cast
in stone.  It does not mean it will appear in the next release,
either.

If you and others are happy with the state of the commits recorded,
we may then merge the topic to 'next' and then to 'master'.  But in
the meantime, if there are things that you are not exactly happy in
the series (e.g. you found a better way to approach the issue you
tackled, you noticed that you didn't fully address the review
comments, etc.)  you are welcome to further polish your topic by
sending out replacements.

> The only point I was in doubt was about Michael's signoff. Actually, he
> gave it not regarding the snippet for the new dir_iterator_advance()
> logic, but to a small piece of actual code he wrote on the new dir
> iterator test helper[1].

If part of your patch contains his code more or less verbatim, then
it is the right thing to do to have his sign-off.  For that part you
are relaying his patch, so he signs it off first, signaling that he
wrote it and he has the authority to release it to the project, and
then you sign it off, signaling that you have the authority to relay
it to the project (see DCO in Documentation/SubmittingPatches).

> I also didn't get whether I myself should have renamed t0065 to t0066
> given the other queued patch.

I would have expected you to move it over to t0066, as I'd have to
rename it myself otherwise.  There is no "skipping" of a number in
the result, as this series comes later than the one that adds t0065.

Having said that, there is no need to resend the series if the only
change necessary on top of v10 were renaming t0065 to t0066.

Thanks.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v10 4/5] dir_iterator: rewrite state machine model
  2017-04-19 13:14 ` [PATCH v10 4/5] dir_iterator: rewrite state machine model Daniel Ferreira
@ 2017-04-20  3:30   ` Junio C Hamano
  2017-04-20  9:36   ` Michael Haggerty
  1 sibling, 0 replies; 9+ messages in thread
From: Junio C Hamano @ 2017-04-20  3:30 UTC (permalink / raw)
  To: Daniel Ferreira; +Cc: git, sbeller, pclouds, mhagger

Daniel Ferreira <bnmvco@gmail.com> writes:

> diff --git a/t/t0065-dir-iterator.sh b/t/t0065-dir-iterator.sh
> index 46e5ce5..4c6632f 100755
> --- a/t/t0065-dir-iterator.sh
> +++ b/t/t0065-dir-iterator.sh
> @@ -15,31 +15,41 @@ test_expect_success 'setup' '
>  	>dir/d/e/d/a &&
>  
>  	mkdir -p dir2/a/b/c/ &&
> -	>dir2/a/b/c/d
> +	>dir2/a/b/c/d &&
> +
> +	>file
>  '
>  
> -test_expect_success 'dir-iterator should iterate through all files' '
> -	cat >expect-sorted-output <<-\EOF &&
> -	[d] (a) [a] ./dir/a
> -	[d] (a/b) [b] ./dir/a/b
> -	[d] (a/b/c) [c] ./dir/a/b/c
> -	[d] (d) [d] ./dir/d
> -	[d] (d/e) [e] ./dir/d/e
> -	[d] (d/e/d) [d] ./dir/d/e/d
> -	[f] (a/b/c/d) [d] ./dir/a/b/c/d
> -	[f] (a/e) [e] ./dir/a/e
> -	[f] (b) [b] ./dir/b
> -	[f] (c) [c] ./dir/c
> -	[f] (d/e/d/a) [a] ./dir/d/e/d/a
> -	EOF
> +cat >expect-sorted-output <<-\EOF &&
> +[d] (a) [a] ./dir/a
> +[d] (a/b) [b] ./dir/a/b
> +[d] (a/b/c) [c] ./dir/a/b/c
> +[d] (d) [d] ./dir/d
> +[d] (d/e) [e] ./dir/d/e
> +[d] (d/e/d) [d] ./dir/d/e/d
> +[f] (a/b/c/d) [d] ./dir/a/b/c/d
> +[f] (a/e) [e] ./dir/a/e
> +[f] (b) [b] ./dir/b
> +[f] (c) [c] ./dir/c
> +[f] (d/e/d/a) [a] ./dir/d/e/d/a
> +EOF
>  
> -	test-dir-iterator ./dir >out &&

There is something fishy going on around here in this patch, pushing
the code to prepare test vector out of test_expect_success block.  A
mistake in rebasing or something?

If you need to reroll the series to update this part, please rename
the test to t0066 and do remember to update the logmessage of a few
commits that refer to t0065.

Thanks.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v10 4/5] dir_iterator: rewrite state machine model
  2017-04-19 13:14 ` [PATCH v10 4/5] dir_iterator: rewrite state machine model Daniel Ferreira
  2017-04-20  3:30   ` Junio C Hamano
@ 2017-04-20  9:36   ` Michael Haggerty
  1 sibling, 0 replies; 9+ messages in thread
From: Michael Haggerty @ 2017-04-20  9:36 UTC (permalink / raw)
  To: Daniel Ferreira, git; +Cc: gitster, sbeller, pclouds

On 04/19/2017 03:14 PM, Daniel Ferreira wrote:
> Perform a rewrite of dir_iterator_advance(). dir_iterator has
> ceased to rely on a combination of level.initialized and level.dir_state
> state variables and now only tracks the state with level.dir_state,
> which simplifies the iterator mechanism, makes the code easier to follow
> and eases additions of new features to the iterator.
> 
> Make dir_iterator_begin() attempt to lstat() the path it receives, and
> return NULL and an appropriate errno if it fails or if the passed path
> was not a directory.
> 
> Create an option for the dir_iterator API to iterate over subdirectories
> only after having iterated through their contents. This feature was
> predicted, although not implemented by 0fe5043 ("dir_iterator: new API
> for iterating over a directory tree", 2016-06-18). This is useful for
> recursively removing a directory and calling rmdir() on a directory only
> after all of its contents have been wiped.
> 
> Add an option for dir_iterator to also iterate over the initial
> directory (the one passed to dir_iterator_begin()).
> 
> Add the "flags" parameter to dir_iterator_create, allowing for the
> aforementioned new features to be enabled. The new default behavior
> (i.e. flags set to 0) does not iterate over directories. Flag
> DIR_ITERATOR_PRE_ORDER_TRAVERSAL iterates over a directory before doing
> so over its contents. DIR_ITERATOR_POST_ORDER_TRAVERSAL iterates over a
> directory after doing so over its contents. DIR_ITERATOR_LIST_ROOT_DIR
> iterates over the initial directory. These flags do not conflict with
> each other and may be used simultaneously.
> 
> Amend a call to dir_iterator_begin() in refs/files-backend.c to pass
> the flags parameter introduced, as well as handle the case in which it
> fails to open the directory.
> 
> Improve t/t0065-dir-iterator.sh and t/helper/test-dir-iterator.c to
> test "post-order" and "iterate-over-root" modes.
> 
> Michael Haggerty contributed with the design of the new
> dir_iterator_advance() implementation, the code for
> t/helper/test-dir-iterator's option parser and numerous reviews that
> gradually shaped this code to its current form.
> 
> Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
> Signed-off-by: Daniel Ferreira <bnmvco@gmail.com>
> ---
>  dir-iterator.c               | 220 ++++++++++++++++++++++++++++++-------------
>  dir-iterator.h               |  35 +++++--
>  refs/files-backend.c         |  51 ++++++----
>  t/helper/test-dir-iterator.c |  31 +++++-
>  t/t0065-dir-iterator.sh      |  94 ++++++++++++++----
>  5 files changed, 316 insertions(+), 115 deletions(-)
> 
> diff --git a/dir-iterator.c b/dir-iterator.c
> index d168cb2..3c20e0e 100644
> --- a/dir-iterator.c
> +++ b/dir-iterator.c
> @@ -4,8 +4,6 @@
>  #include "dir-iterator.h"
>  
>  struct dir_iterator_level {
> -	int initialized;
> -
>  	DIR *dir;
>  
>  	/*
> @@ -20,9 +18,15 @@ struct dir_iterator_level {
>  	 * iteration and also iterated into):
>  	 */
>  	enum {
> -		DIR_STATE_ITER,
> -		DIR_STATE_RECURSE
> +		DIR_STATE_PUSHED,
> +		DIR_STATE_PRE_ITERATION,
> +		DIR_STATE_ITERATING,
> +		DIR_STATE_POST_ITERATION,
> +		DIR_STATE_EXHAUSTED
>  	} dir_state;
> +
> +	/* The stat structure for the directory this level represents. */
> +	struct stat st;
>  };
>  
>  /*
> @@ -48,15 +52,23 @@ struct dir_iterator_int {
>  	 * that will be included in this iteration.
>  	 */
>  	struct dir_iterator_level *levels;
> +
> +	/* Holds the flags passed to dir_iterator_begin(). */
> +	unsigned flags;
>  };
>  
> -static void push_dir_level(struct dir_iterator_int *iter, struct dir_iterator_level *level)
> +static void push_dir_level(struct dir_iterator_int *iter, struct stat *st)
>  {
> -	level->dir_state = DIR_STATE_RECURSE;
> +	struct dir_iterator_level *level;
> +
>  	ALLOC_GROW(iter->levels, iter->levels_nr + 1,
>  		   iter->levels_alloc);
> +
> +	/* Push a new level */
>  	level = &iter->levels[iter->levels_nr++];
> -	level->initialized = 0;
> +	level->dir = NULL;
> +	level->dir_state = DIR_STATE_PUSHED;
> +	level->st = *st;
>  }
>  
>  static int pop_dir_level(struct dir_iterator_int *iter)
> @@ -67,7 +79,11 @@ static int pop_dir_level(struct dir_iterator_int *iter)
>  static int adjust_iterator_data(struct dir_iterator_int *iter,
>  		struct dir_iterator_level *level)
>  {
> -	if (lstat(iter->base.path.buf, &iter->base.st) < 0) {
> +	char *last_path_component;
> +
> +	if (level->dir_state != DIR_STATE_ITERATING) {
> +		iter->base.st = level->st;
> +	} else if (lstat(iter->base.path.buf, &iter->base.st) < 0) {
>  		if (errno != ENOENT)
>  			warning("error reading path '%s': %s",
>  				iter->base.path.buf,
> @@ -76,18 +92,48 @@ static int adjust_iterator_data(struct dir_iterator_int *iter,
>  	}
>  
>  	/*
> -	 * We have to set these each time because
> -	 * the path strbuf might have been realloc()ed.
> +	 * Check if we are dealing with the root directory as an
> +	 * item that's being iterated through.
>  	 */
> -	iter->base.relative_path =
> -		iter->base.path.buf + iter->levels[0].prefix_len;
> -	iter->base.basename =
> -		iter->base.path.buf + level->prefix_len;
> -	level->dir_state = DIR_STATE_ITER;
> +	if (level->dir_state != DIR_STATE_ITERATING &&
> +		iter->levels_nr == 1) {
> +		iter->base.relative_path = ".";
> +
> +		/*
> +		 * If we have a path like './dir', we'll get everything
> +		 * after the last slash a basename. If we don't find a
> +		 * slash (e.g. 'dir'), we return the whole path.
> +		 */
> +		last_path_component = strrchr(iter->base.path.buf, '/');
> +		iter->base.basename = last_path_component ?
> +			last_path_component + 1 : iter->base.path.buf;

There's kindof a lot of awkward code here that is only needed to support
`DIR_ITERATOR_LIST_ROOT_DIR`. And the basename for the top-level
directory is often not useful. For example, if the starting path is ".."
or "foo/.." or "bar/.", then its basename is set to ".." or ".". It
would be really awkward to find the real basename of such arguments.

This might be the code trying to tell us something. Maybe
`DIR_ITERATOR_LIST_ROOT_DIR` isn't worth it as a feature. When I tried
ripping it out, I saved 40 LOC in `dir_iterator`, at a cost of only two
extra lines in `entry.c`.

Alternatively (given that the basename isn't really useful for the
top-level path), I think it would be acceptable to just set basename to
NULL for the case of the top-level directory (and document that
peculiarity).

> +	} else {
> +		iter->base.relative_path =
> +			iter->base.path.buf + iter->levels[0].prefix_len;
> +
> +		if (S_ISDIR(iter->base.st.st_mode))
> +			iter->base.basename =
> +				iter->base.path.buf + iter->levels[iter->levels_nr - 2].prefix_len;
> +		else
> +			iter->base.basename =
> +				iter->base.path.buf + level->prefix_len;
> +	}
>  
>  	return 0;
>  }
>  
> +/*
> + * This function uses a state machine with the following states:
> + * * DIR_STATE_PUSHED: the directory has been pushed to the
> + * iterator traversal tree.
> + * * DIR_STATE_PRE_ITERATION: the directory is not opened with opendir(). The
> + * dirpath has already been returned if pre-order traversal is set.
> + * * DIR_STATE_ITERATING: the directory is initialized. We are traversing
> + * through it.
> + * * DIR_STATE_POST_ITERATION: the directory has been iterated through.
> + * We are ready to close it.
> + * * DIR_STATE_EXHAUSTED: the directory is closed and ready to be popped.
> + */

A little indentation would make the bulleted list above a lot easier to
read. And it is more common to use "-" for bullets, I guess because they
don't clash so much with the "*" that are part of the comment prefix:

 +/*
 + * This function uses a state machine with the following states:
 + * - DIR_STATE_PUSHED: the directory has been pushed to the
 + *   iterator traversal tree.
 + * - DIR_STATE_PRE_ITERATION: the directory is not opened with
opendir(). The
 + *   dirpath has already been returned if pre-order traversal is set.
 + * - DIR_STATE_ITERATING: the directory is initialized. We are traversing
 + *   through it.
 + * - DIR_STATE_POST_ITERATION: the directory has been iterated through.
 + *   We are ready to close it.
 + * - DIR_STATE_EXHAUSTED: the directory is closed and ready to be popped.
 + */


>  int dir_iterator_advance(struct dir_iterator *dir_iterator)
>  {
>  	struct dir_iterator_int *iter =
> @@ -96,9 +142,26 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
>  	while (1) {
>  		struct dir_iterator_level *level =
>  			&iter->levels[iter->levels_nr - 1];
> -		struct dirent *de;
>  
> -		if (!level->initialized) {
> +		if (level->dir_state == DIR_STATE_PUSHED) {
> +			level->dir_state = DIR_STATE_PRE_ITERATION;
> +
> +			/* We may not want the root directory to be iterated over */
> +			if ((iter->flags & DIR_ITERATOR_PRE_ORDER_TRAVERSAL) && (
> +				iter->levels_nr != 1 ||
> +				(iter->flags & DIR_ITERATOR_LIST_ROOT_DIR))) {
> +				/*
> +				 * This will only error if we fail to lstat() the
> +				 * root directory. In this case, we bail.
> +				 */
> +				if (adjust_iterator_data(iter, level)) {
> +					level->dir_state = DIR_STATE_EXHAUSTED;
> +					continue;
> +				}
> +
> +				return ITER_OK;
> +			}
> +		} else if (level->dir_state == DIR_STATE_PRE_ITERATION) {
>  			/*
>  			 * Note: dir_iterator_begin() ensures that
>  			 * path is not the empty string.
> @@ -108,64 +171,37 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
>  			level->prefix_len = iter->base.path.len;
>  
>  			level->dir = opendir(iter->base.path.buf);
> -			if (!level->dir && errno != ENOENT) {
> -				warning("error opening directory %s: %s",
> -					iter->base.path.buf, strerror(errno));
> -				/* Popping the level is handled below */
> -			}
> -
> -			level->initialized = 1;
> -		} else if (S_ISDIR(iter->base.st.st_mode)) {
> -			if (level->dir_state == DIR_STATE_ITER) {
> +			if (!level->dir) {
>  				/*
> -				 * The directory was just iterated
> -				 * over; now prepare to iterate into
> -				 * it.
> +				 * This level wasn't opened sucessfully; pretend we
> +				 * iterated through it already.
>  				 */
> -				push_dir_level(iter, level);
> +				if (errno != ENOENT) {

We usually don't use braces for blocks consisting of a single statement.

> +					warning("error opening directory %s: %s",
> +						iter->base.path.buf, strerror(errno));
> +				}
> +
> +				level->dir_state = DIR_STATE_POST_ITERATION;
>  				continue;
> -			} else {
> -				/*
> -				 * The directory has already been
> -				 * iterated over and iterated into;
> -				 * we're done with it.
> -				 */
>  			}
> -		}
>  
> -		if (!level->dir) {
> -			/*
> -			 * This level is exhausted (or wasn't opened
> -			 * successfully); pop up a level.
> -			 */
> -			if (pop_dir_level(iter) == 0)
> -				return dir_iterator_abort(dir_iterator);
> +			level->dir_state = DIR_STATE_ITERATING;
> +		} else if (level->dir_state == DIR_STATE_ITERATING) {
> +			struct dirent *de;
>  
> -			continue;
> -		}
> -
> -		/*
> -		 * Loop until we find an entry that we can give back
> -		 * to the caller:
> -		 */
> -		while (1) {
>  			strbuf_setlen(&iter->base.path, level->prefix_len);
>  			errno = 0;
>  			de = readdir(level->dir);
>  
>  			if (!de) {
> -				/* This level is exhausted; pop up a level. */
> +				/* In case of readdir() error */
>  				if (errno) {
>  					warning("error reading directory %s: %s",
>  						iter->base.path.buf, strerror(errno));
> -				} else if (closedir(level->dir))
> -					warning("error closing directory %s: %s",
> -						iter->base.path.buf, strerror(errno));
> +				}
>  
> -				level->dir = NULL;
> -				if (pop_dir_level(iter) == 0)
> -					return dir_iterator_abort(dir_iterator);
> -				break;
> +				level->dir_state = DIR_STATE_POST_ITERATION;
> +				continue;
>  			}
>  
>  			if (is_dot_or_dotdot(de->d_name))
> @@ -176,7 +212,41 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
>  			if (adjust_iterator_data(iter, level))
>  				continue;
>  
> +			if (S_ISDIR(iter->base.st.st_mode)) {
> +				push_dir_level(iter, &iter->base.st);
> +				continue;
> +			}
> +
>  			return ITER_OK;
> +		} else if (level->dir_state == DIR_STATE_POST_ITERATION) {
> +			if (level->dir != NULL && closedir(level->dir)) {
> +				warning("error closing directory %s: %s",
> +					iter->base.path.buf, strerror(errno));
> +			}

The `closedir()` will only be attempted if we arrived at
`DIR_STATE_POST_ITERATION` state from the `DIR_STATE_ITERATING` branch,
right? So why not call `closedir()` up there? Then you wouldn't need the
`level->dir != NULL` check (which, by the way, we spell `if
(level->dir)`; the `!= NULL` is redundant).

> +			level->dir_state = DIR_STATE_EXHAUSTED;
> +
> +			strbuf_setlen(&iter->base.path, level->prefix_len);
> +			/*
> +			 * Since we are iterating through the dirpath
> +			 * after we have gone through it, we still need
> +			 * to get rid of the trailing slash we appended.
> +			 */
> +			strbuf_strip_suffix(&iter->base.path, "/");
> +
> +			/* We may not want the root directory to be iterated over */
> +			if ((iter->flags & DIR_ITERATOR_POST_ORDER_TRAVERSAL) && (
> +				iter->levels_nr != 1 ||
> +				(iter->flags & DIR_ITERATOR_LIST_ROOT_DIR))) {
> +				/*
> +				 * In this state, adjust_iterator_data() should never return
> +				 * an error.
> +				 */
> +				adjust_iterator_data(iter, level);
> +				return ITER_OK;
> +			}
> +		} else if (level->dir_state == DIR_STATE_EXHAUSTED) {
> +			if (pop_dir_level(iter) == 0)
> +				return dir_iterator_abort(dir_iterator);
>  		}
>  	}
>  }
> @@ -202,21 +272,41 @@ int dir_iterator_abort(struct dir_iterator *dir_iterator)
>  	return ITER_DONE;
>  }
>  
> -struct dir_iterator *dir_iterator_begin(const char *path)
> +struct dir_iterator *dir_iterator_begin(const char *path, unsigned flags)
>  {
> -	struct dir_iterator_int *iter = xcalloc(1, sizeof(*iter));
> -	struct dir_iterator *dir_iterator = &iter->base;
> +	struct dir_iterator_int *iter;
> +	struct dir_iterator *dir_iterator;
> +	struct stat st;
>  
>  	if (!path || !*path)
>  		die("BUG: empty path passed to dir_iterator_begin()");
>  
> +	if (lstat(path, &st) < 0) {
> +		if (errno != ENOENT)
> +			warning("error reading path '%s': %s",
> +					path, strerror(errno));

In a case like this, where we are providing an `errno` to the caller, it
is probably better to leave it to the caller to emit a warning/error if
it wants. The caller will have more context and so will be able to
explain the error in a way that the user can better understand. And some
callers might not want to emit an error at all, or base their decision
on `errno`.

(The story is different *during* the iteration, because we currently
don't have a way to return an error to the caller short of aborting the
whole iteration.)

> +		return NULL;
> +	}
> +
> +	if (!S_ISDIR(st.st_mode)) {
> +		errno = ENOTDIR;
> +		return NULL;
> +	}
> +
> +	iter = xcalloc(1, sizeof(*iter));
> +	dir_iterator = &iter->base;
> +
> +	iter->flags = flags;
> +	dir_iterator->st = st;
> +
>  	strbuf_init(&iter->base.path, PATH_MAX);
>  	strbuf_addstr(&iter->base.path, path);
>  
>  	ALLOC_GROW(iter->levels, 10, iter->levels_alloc);
> +	iter->levels_nr = 0;
>  
> -	iter->levels_nr = 1;
> -	iter->levels[0].initialized = 0;
> +	push_dir_level(iter, &dir_iterator->st);
>  
>  	return dir_iterator;
>  }
> diff --git a/dir-iterator.h b/dir-iterator.h
> index 27739e6..a21e845 100644
> --- a/dir-iterator.h
> +++ b/dir-iterator.h
> @@ -5,19 +5,17 @@
>   * Iterate over a directory tree.
>   *
>   * Iterate over a directory tree, recursively, including paths of all
> - * types and hidden paths. Skip "." and ".." entries and don't follow
> - * symlinks except for the original path.
> + * types and hidden paths. Skip "." and ".." entries.
>   *
>   * Every time dir_iterator_advance() is called, update the members of
>   * the dir_iterator structure to reflect the next path in the
>   * iteration. The order that paths are iterated over within a
> - * directory is undefined, but directory paths are always iterated
> - * over before the subdirectory contents.
> + * directory is undefined.
>   *
>   * A typical iteration looks like this:
>   *
>   *     int ok;
> - *     struct iterator *iter = dir_iterator_begin(path);
> + *     struct iterator *iter = dir_iterator_begin(path, flags);
>   *
>   *     while ((ok = dir_iterator_advance(iter)) == ITER_OK) {
>   *             if (want_to_stop_iteration()) {
> @@ -38,6 +36,26 @@
>   * dir_iterator_advance() again.
>   */
>  
> +/*
> + * Possible flags for dir_iterator_begin().
> + *
> + * * DIR_ITERATOR_PRE_ORDER_TRAVERSAL: the iterator shall return
> + * a dirpath it has found before iterating through that directory's
> + * contents.
> + * * DIR_ITERATOR_POST_ORDER_TRAVERSAL: the iterator shall return
> + * a dirpath it has found after iterating through that directory's
> + * contents.
> + * * DIR_ITERATOR_LIST_ROOT_DIR: the iterator shall return the dirpath
> + * of the root directory it is iterating through if either
> + * DIR_ITERATOR_PRE_ORDER_TRAVERSAL or DIR_ITERATOR_POST_ORDER_TRAVERSAL
> + * is set.
> + *
> + * All flags can be used in any combination.
> + */

Please indent this bulleted list, too.

> +#define DIR_ITERATOR_PRE_ORDER_TRAVERSAL (1 << 0)
> +#define DIR_ITERATOR_POST_ORDER_TRAVERSAL (1 << 1)
> +#define DIR_ITERATOR_LIST_ROOT_DIR (1 << 2)
> +
>  struct dir_iterator {
>  	/* The current path: */
>  	struct strbuf path;
> @@ -57,15 +75,16 @@ struct dir_iterator {
>  };
>  
>  /*
> - * Start a directory iteration over path. Return a dir_iterator that
> - * holds the internal state of the iteration.
> + * Start a directory iteration over path, with options specified in
> + * 'flags'. Return a dir_iterator that holds the internal state of
> + * the iteration.
>   *
>   * The iteration includes all paths under path, not including path
>   * itself and not including "." or ".." entries.
>   *
>   * path is the starting directory. An internal copy will be made.
>   */
> -struct dir_iterator *dir_iterator_begin(const char *path);
> +struct dir_iterator *dir_iterator_begin(const char *path, unsigned flags);
>  
>  /*
>   * Advance the iterator to the first or next item and return ITER_OK.
> diff --git a/refs/files-backend.c b/refs/files-backend.c
> index 50188e9..d376e1f 100644
> --- a/refs/files-backend.c
> +++ b/refs/files-backend.c
> @@ -3284,29 +3284,42 @@ static int files_reflog_iterator_advance(struct ref_iterator *ref_iterator)
>  	struct dir_iterator *diter = iter->dir_iterator;
>  	int ok;
>  
> -	while ((ok = dir_iterator_advance(diter)) == ITER_OK) {
> -		int flags;
> -
> -		if (!S_ISREG(diter->st.st_mode))
> -			continue;
> -		if (diter->basename[0] == '.')
> -			continue;
> -		if (ends_with(diter->basename, ".lock"))
> -			continue;
> +	if (!iter->dir_iterator) {
> +		/*
> +		 * If our dir_iterator is NULL at this stage, it means we failed
> +		 * to open the given path at dir_iterator_begin(), most likely due
> +		 * to it not existing.
> +		 *
> +		 * In this case, we pretend it was an empty directory and just
> +		 * iterate through nothing.
> +		 */
> +		ok = ITER_DONE;
> +	} else {
> +		while ((ok = dir_iterator_advance(diter)) == ITER_OK) {
> +			int flags;
> +
> +			if (!S_ISREG(diter->st.st_mode))
> +				continue;
> +			if (diter->basename[0] == '.')
> +				continue;
> +			if (ends_with(diter->basename, ".lock"))
> +				continue;
> +
> +			if (read_ref_full(diter->relative_path, 0,
> +					  iter->oid.hash, &flags)) {
> +				error("bad ref for %s", diter->path.buf);
> +				continue;
> +			}
>  
> -		if (read_ref_full(diter->relative_path, 0,
> -				  iter->oid.hash, &flags)) {
> -			error("bad ref for %s", diter->path.buf);
> -			continue;
> +			iter->base.refname = diter->relative_path;
> +			iter->base.oid = &iter->oid;
> +			iter->base.flags = flags;
> +			return ITER_OK;
>  		}
>  
> -		iter->base.refname = diter->relative_path;
> -		iter->base.oid = &iter->oid;
> -		iter->base.flags = flags;
> -		return ITER_OK;
> +		iter->dir_iterator = NULL;
>  	}
>  
> -	iter->dir_iterator = NULL;
>  	if (ref_iterator_abort(ref_iterator) == ITER_ERROR)
>  		ok = ITER_ERROR;
>  	return ok;
> @@ -3346,7 +3359,7 @@ static struct ref_iterator *files_reflog_iterator_begin(struct ref_store *ref_st
>  	files_downcast(ref_store, 0, "reflog_iterator_begin");
>  
>  	base_ref_iterator_init(ref_iterator, &files_reflog_iterator_vtable);
> -	iter->dir_iterator = dir_iterator_begin(git_path("logs"));
> +	iter->dir_iterator = dir_iterator_begin(git_path("logs"), 0);

An easier way to deal with an error here would be to free `iter` (or not
allocate it in the first place) and return `empty_ref_iterator_begin()`
instead.

>  	return ref_iterator;
>  }
>  
> diff --git a/t/helper/test-dir-iterator.c b/t/helper/test-dir-iterator.c
> index a7d1470..3b4948b 100644
> --- a/t/helper/test-dir-iterator.c
> +++ b/t/helper/test-dir-iterator.c
> @@ -5,15 +5,38 @@
>  
>  int cmd_main(int argc, const char **argv)
>  {
> +	const char **myargv = argv;
> +	int myargc = argc;
> +
>  	struct strbuf path = STRBUF_INIT;
>  	struct dir_iterator *diter;
>  
> -	if (argc < 2)
> -		die("BUG: test-dir-iterator needs one argument");
> +	unsigned flag = 0;
> +
> +	while (--myargc && starts_with(*++myargv, "--")) {
> +		if (!strcmp(*myargv, "--pre-order"))
> +			flag |= DIR_ITERATOR_PRE_ORDER_TRAVERSAL;
> +		else if (!strcmp(*myargv, "--post-order"))
> +			flag |= DIR_ITERATOR_POST_ORDER_TRAVERSAL;
> +		else if (!strcmp(*myargv, "--list-root-dir"))
> +			flag |= DIR_ITERATOR_LIST_ROOT_DIR;
> +		else if (!strcmp(*myargv, "--")) {
> +			myargc--;
> +			myargv++;
> +			break;
> +		} else
> +			die("Unrecognized option: %s", *myargv);
> +	}
>  
> -	strbuf_add(&path, argv[1], strlen(argv[1]));
> +	if (myargc != 1)
> +		die("expected exactly one non-option argument");
> +	strbuf_addstr(&path, *myargv);
>  
> -	diter = dir_iterator_begin(path.buf);
> +	diter = dir_iterator_begin(path.buf, flag);
> +	if (diter == NULL) {
> +		printf("begin failed: %d\n", errno);

It would be more helpful to include `strerror(errno)` rather than
`errno` itself in the error output.

> +		return 0;
> +	}
>  
>  	while (dir_iterator_advance(diter) == ITER_OK) {
>  		if (S_ISDIR(diter->st.st_mode))
> diff --git a/t/t0065-dir-iterator.sh b/t/t0065-dir-iterator.sh
> [...]



^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2017-04-20  9:36 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-04-19 13:14 [PATCH v10 0/5] [GSoC] remove_subtree(): reimplement using iterators Daniel Ferreira
2017-04-19 13:14 ` [PATCH v10 1/5] dir_iterator: add tests for dir_iterator API Daniel Ferreira
2017-04-19 13:14 ` [PATCH v10 2/5] remove_subtree(): test removing nested directories Daniel Ferreira
2017-04-19 13:14 ` [PATCH v10 3/5] dir_iterator: refactor dir_iterator_advance Daniel Ferreira
2017-04-19 13:14 ` [PATCH v10 4/5] dir_iterator: rewrite state machine model Daniel Ferreira
2017-04-20  3:30   ` Junio C Hamano
2017-04-20  9:36   ` Michael Haggerty
2017-04-19 13:14 ` [PATCH v10 5/5] remove_subtree(): reimplement using iterators Daniel Ferreira
2017-04-20  3:23 ` [PATCH v10 0/5] [GSoC] " Junio C Hamano

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).