git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH v4 0/5] [GSoC] remove_subtree(): reimplement using iterators
@ 2017-03-29  0:32 Daniel Ferreira
  2017-03-29  0:32 ` [PATCH v4 1/5] dir_iterator: add helpers to dir_iterator_advance Daniel Ferreira
                   ` (4 more replies)
  0 siblings, 5 replies; 19+ messages in thread
From: Daniel Ferreira @ 2017-03-29  0:32 UTC (permalink / raw)
  To: git; +Cc: gitster, sbeller, pclouds, mhagger, Daniel Ferreira

Hi there,

This is the fourth version of the GSoC microproject of refactoring
remove_subtree() from recursively using readdir() to use dir_iterator. Below
are the threads for other versions:

v1: https://public-inbox.org/git/CAGZ79kZwT-9mHTiOJ5CEjk2wDFkn6+NcogjX0=vjhsAh16ANYg@mail.gmail.com/T/#t
v2: https://public-inbox.org/git/CACsJy8Dxh-QPBBLfaFWPAWUsbA9GVXA7x+mXLjEvYKhk1zOpig@mail.gmail.com/T/#t
v3: https://public-inbox.org/git/CAGZ79kYtpmURSQWPumobA=e3JBFjKhWCdv_LPhKCd71ZRwMovA@mail.gmail.com/T/#t

In this version of the patch, I followed Michael's suggestion
of splitting the commits responsible for adding the new feature to
dir_iterator. His suggestion of using flags instead of an options
struct has also been adopted.

This version also contains a test that is finally able to test the
function decently (not the one Stefan had suggested, which
did not result in a call to remove_subtree). I am just unsure about
its location in t/. I'd appreciate suggestions to put it in a more
decent home.

Daniel Ferreira (5):
  dir_iterator: add helpers to dir_iterator_advance
  dir_iterator: iterate over dir after its contents
  remove_subtree(): reimplement using iterators
  remove_subtree(): test removing nested directories
  files_reflog_iterator: amend use of dir_iterator

 dir-iterator.c                  | 105 +++++++++++++++++++++++++++++++---------
 dir-iterator.h                  |  14 ++++--
 entry.c                         |  31 ++++--------
 refs/files-backend.c            |   2 +-
 t/t2000-checkout-cache-clash.sh |  11 +++++
 5 files changed, 114 insertions(+), 49 deletions(-)

--
2.7.4 (Apple Git-66)


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH v4 1/5] dir_iterator: add helpers to dir_iterator_advance
  2017-03-29  0:32 [PATCH v4 0/5] [GSoC] remove_subtree(): reimplement using iterators Daniel Ferreira
@ 2017-03-29  0:32 ` Daniel Ferreira
  2017-03-29  4:32   ` Michael Haggerty
  2017-03-29  0:32 ` [PATCH v4 2/5] dir_iterator: iterate over dir after its contents Daniel Ferreira
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 19+ messages in thread
From: Daniel Ferreira @ 2017-03-29  0:32 UTC (permalink / raw)
  To: git; +Cc: gitster, sbeller, pclouds, mhagger, Daniel Ferreira

Create inline helpers to dir_iterator_advance(). Make
dir_iterator_advance()'s code more legible and allow some behavior to
be reusable.

Signed-off-by: Daniel Ferreira <bnmvco@gmail.com>
---
 dir-iterator.c | 65 +++++++++++++++++++++++++++++++++++++---------------------
 1 file changed, 42 insertions(+), 23 deletions(-)

diff --git a/dir-iterator.c b/dir-iterator.c
index 34182a9..853c040 100644
--- a/dir-iterator.c
+++ b/dir-iterator.c
@@ -50,6 +50,43 @@ struct dir_iterator_int {
 	struct dir_iterator_level *levels;
 };

+static inline void push_dir_level(struct dir_iterator_int *iter, struct dir_iterator_level *level)
+{
+	level->dir_state = DIR_STATE_RECURSE;
+	ALLOC_GROW(iter->levels, iter->levels_nr + 1,
+		   iter->levels_alloc);
+	level = &iter->levels[iter->levels_nr++];
+	level->initialized = 0;
+}
+
+static inline int pop_dir_level(struct dir_iterator_int *iter, struct dir_iterator_level *level)
+{
+	return --iter->levels_nr;
+}
+
+static inline int set_iterator_data(struct dir_iterator_int *iter, struct dir_iterator_level *level)
+{
+	if (lstat(iter->base.path.buf, &iter->base.st) < 0) {
+		if (errno != ENOENT)
+			warning("error reading path '%s': %s",
+				iter->base.path.buf,
+				strerror(errno));
+		return -1;
+	}
+
+	/*
+	 * We have to set these each time because
+	 * the path strbuf might have been realloc()ed.
+	 */
+	iter->base.relative_path =
+		iter->base.path.buf + iter->levels[0].prefix_len;
+	iter->base.basename =
+		iter->base.path.buf + level->prefix_len;
+	level->dir_state = DIR_STATE_ITER;
+
+	return 0;
+}
+
 int dir_iterator_advance(struct dir_iterator *dir_iterator)
 {
 	struct dir_iterator_int *iter =
@@ -84,11 +121,7 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
 				 * over; now prepare to iterate into
 				 * it.
 				 */
-				level->dir_state = DIR_STATE_RECURSE;
-				ALLOC_GROW(iter->levels, iter->levels_nr + 1,
-					   iter->levels_alloc);
-				level = &iter->levels[iter->levels_nr++];
-				level->initialized = 0;
+				push_dir_level(iter, level);
 				continue;
 			} else {
 				/*
@@ -104,7 +137,7 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
 			 * This level is exhausted (or wasn't opened
 			 * successfully); pop up a level.
 			 */
-			if (--iter->levels_nr == 0)
+			if (pop_dir_level(iter, level) == 0)
 				return dir_iterator_abort(dir_iterator);

 			continue;
@@ -129,7 +162,7 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
 						iter->base.path.buf, strerror(errno));

 				level->dir = NULL;
-				if (--iter->levels_nr == 0)
+				if (pop_dir_level(iter, level) == 0)
 					return dir_iterator_abort(dir_iterator);
 				break;
 			}
@@ -138,23 +171,9 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
 				continue;

 			strbuf_addstr(&iter->base.path, de->d_name);
-			if (lstat(iter->base.path.buf, &iter->base.st) < 0) {
-				if (errno != ENOENT)
-					warning("error reading path '%s': %s",
-						iter->base.path.buf,
-						strerror(errno));
-				continue;
-			}

-			/*
-			 * We have to set these each time because
-			 * the path strbuf might have been realloc()ed.
-			 */
-			iter->base.relative_path =
-				iter->base.path.buf + iter->levels[0].prefix_len;
-			iter->base.basename =
-				iter->base.path.buf + level->prefix_len;
-			level->dir_state = DIR_STATE_ITER;
+			if (set_iterator_data(iter, level))
+				continue;

 			return ITER_OK;
 		}
--
2.7.4 (Apple Git-66)


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v4 2/5] dir_iterator: iterate over dir after its contents
  2017-03-29  0:32 [PATCH v4 0/5] [GSoC] remove_subtree(): reimplement using iterators Daniel Ferreira
  2017-03-29  0:32 ` [PATCH v4 1/5] dir_iterator: add helpers to dir_iterator_advance Daniel Ferreira
@ 2017-03-29  0:32 ` Daniel Ferreira
  2017-03-29  9:56   ` Michael Haggerty
  2017-03-29  0:32 ` [PATCH v4 3/5] remove_subtree(): reimplement using iterators Daniel Ferreira
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 19+ messages in thread
From: Daniel Ferreira @ 2017-03-29  0:32 UTC (permalink / raw)
  To: git; +Cc: gitster, sbeller, pclouds, mhagger, Daniel Ferreira

Create an option for the dir_iterator API to iterate over subdirectories
only after having iterated through their contents. This feature was
predicted, although not implemented by 0fe5043 ("dir_iterator: new API
for iterating over a directory tree", 2016-06-18).

Add the "flags" parameter to dir_iterator_create, allowing for the
aforementioned "depth-first" iteration mode to be enabled. Currently,
the only acceptable flag is DIR_ITERATOR_DEPTH_FIRST.

This is useful for recursively removing a directory and calling rmdir()
on a directory only after all of its contents have been wiped.

Signed-off-by: Daniel Ferreira <bnmvco@gmail.com>
---
 dir-iterator.c | 46 ++++++++++++++++++++++++++++++++++++++++++----
 dir-iterator.h | 14 +++++++++++---
 2 files changed, 53 insertions(+), 7 deletions(-)

diff --git a/dir-iterator.c b/dir-iterator.c
index 853c040..545d333 100644
--- a/dir-iterator.c
+++ b/dir-iterator.c
@@ -48,6 +48,9 @@ struct dir_iterator_int {
 	 * that will be included in this iteration.
 	 */
 	struct dir_iterator_level *levels;
+
+	/* Holds the flags passed to dir_iterator_begin(). */
+	unsigned flags;
 };

 static inline void push_dir_level(struct dir_iterator_int *iter, struct dir_iterator_level *level)
@@ -114,12 +117,14 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
 			}

 			level->initialized = 1;
-		} else if (S_ISDIR(iter->base.st.st_mode)) {
+		} else if (S_ISDIR(iter->base.st.st_mode) &&
+		!iter->flags & DIR_ITERATOR_DEPTH_FIRST) {
 			if (level->dir_state == DIR_STATE_ITER) {
 				/*
 				 * The directory was just iterated
 				 * over; now prepare to iterate into
-				 * it.
+				 * it (unless an option is set for us
+				 * to do otherwise).
 				 */
 				push_dir_level(iter, level);
 				continue;
@@ -153,10 +158,27 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
 			de = readdir(level->dir);

 			if (!de) {
-				/* This level is exhausted; pop up a level. */
+				/* This level is exhausted  */
 				if (errno) {
 					warning("error reading directory %s: %s",
 						iter->base.path.buf, strerror(errno));
+				} else if (iter->flags & DIR_ITERATOR_DEPTH_FIRST) {
+					/* If we are handling dirpaths after their contents,
+					 * we have to iterate over the directory now that we'll
+					 * have finished iterating into it. */
+					level->dir = NULL;
+
+					if (pop_dir_level(iter, level) == 0)
+						return dir_iterator_abort(dir_iterator);
+
+					level = &iter->levels[iter->levels_nr - 1];
+					/* Remove a trailing slash */
+					strbuf_strip_suffix(&iter->base.path, "/");
+
+					if (set_iterator_data(iter, level))
+						continue;
+
+					return ITER_OK;
 				} else if (closedir(level->dir))
 					warning("error closing directory %s: %s",
 						iter->base.path.buf, strerror(errno));
@@ -175,8 +197,22 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
 			if (set_iterator_data(iter, level))
 				continue;

+			/*
+			 * If we want to iterate dirs after files, we shall
+			 * begin looking into them *before* we return the dir
+			 * itself.
+			 */
+			if (S_ISDIR(iter->base.st.st_mode) &&
+			iter->flags & DIR_ITERATOR_DEPTH_FIRST) {
+				push_dir_level(iter, level);
+				goto continue_outer_loop;
+			}
+
 			return ITER_OK;
 		}
+
+continue_outer_loop:
+		;
 	}
 }

@@ -201,7 +237,7 @@ int dir_iterator_abort(struct dir_iterator *dir_iterator)
 	return ITER_DONE;
 }

-struct dir_iterator *dir_iterator_begin(const char *path)
+struct dir_iterator *dir_iterator_begin(const char *path, unsigned flags)
 {
 	struct dir_iterator_int *iter = xcalloc(1, sizeof(*iter));
 	struct dir_iterator *dir_iterator = &iter->base;
@@ -209,6 +245,8 @@ struct dir_iterator *dir_iterator_begin(const char *path)
 	if (!path || !*path)
 		die("BUG: empty path passed to dir_iterator_begin()");

+	iter->flags = flags;
+
 	strbuf_init(&iter->base.path, PATH_MAX);
 	strbuf_addstr(&iter->base.path, path);

diff --git a/dir-iterator.h b/dir-iterator.h
index 27739e6..28ff3df 100644
--- a/dir-iterator.h
+++ b/dir-iterator.h
@@ -38,6 +38,13 @@
  * dir_iterator_advance() again.
  */

+/* Possible flags for dir_iterator_begin().
+ *
+ * DIR_ITERATOR_DEPTH_FIRST: ensures subdirectories and their contents
+ * are iterated through before the containing directory.
+ */
+#define DIR_ITERATOR_DEPTH_FIRST (1 << 1)
+
 struct dir_iterator {
 	/* The current path: */
 	struct strbuf path;
@@ -57,15 +64,16 @@ struct dir_iterator {
 };

 /*
- * Start a directory iteration over path. Return a dir_iterator that
- * holds the internal state of the iteration.
+ * Start a directory iteration over path, with options specified in
+ * 'flags'. Return a dir_iterator that holds the internal state of
+ * the iteration.
  *
  * The iteration includes all paths under path, not including path
  * itself and not including "." or ".." entries.
  *
  * path is the starting directory. An internal copy will be made.
  */
-struct dir_iterator *dir_iterator_begin(const char *path);
+struct dir_iterator *dir_iterator_begin(const char *path, unsigned flags);

 /*
  * Advance the iterator to the first or next item and return ITER_OK.
--
2.7.4 (Apple Git-66)


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v4 3/5] remove_subtree(): reimplement using iterators
  2017-03-29  0:32 [PATCH v4 0/5] [GSoC] remove_subtree(): reimplement using iterators Daniel Ferreira
  2017-03-29  0:32 ` [PATCH v4 1/5] dir_iterator: add helpers to dir_iterator_advance Daniel Ferreira
  2017-03-29  0:32 ` [PATCH v4 2/5] dir_iterator: iterate over dir after its contents Daniel Ferreira
@ 2017-03-29  0:32 ` Daniel Ferreira
  2017-03-29 10:01   ` Michael Haggerty
  2017-03-29  0:32 ` [PATCH v4 4/5] remove_subtree(): test removing nested directories Daniel Ferreira
  2017-03-29  0:32 ` [PATCH v4 5/5] files_reflog_iterator: amend use of dir_iterator Daniel Ferreira
  4 siblings, 1 reply; 19+ messages in thread
From: Daniel Ferreira @ 2017-03-29  0:32 UTC (permalink / raw)
  To: git; +Cc: gitster, sbeller, pclouds, mhagger, Daniel Ferreira

Use dir_iterator to traverse through remove_subtree()'s directory tree,
avoiding the need for recursive calls to readdir(). Simplify
remove_subtree()'s code.

A conversion similar in purpose was previously done at 46d092a
("for_each_reflog(): reimplement using iterators", 2016-05-21).

Signed-off-by: Daniel Ferreira <bnmvco@gmail.com>
---
 entry.c | 31 ++++++++++---------------------
 1 file changed, 10 insertions(+), 21 deletions(-)

diff --git a/entry.c b/entry.c
index c6eea24..bbebd16 100644
--- a/entry.c
+++ b/entry.c
@@ -2,6 +2,8 @@
 #include "blob.h"
 #include "dir.h"
 #include "streaming.h"
+#include "iterator.h"
+#include "dir-iterator.h"

 static void create_directories(const char *path, int path_len,
 			       const struct checkout *state)
@@ -46,29 +48,16 @@ static void create_directories(const char *path, int path_len,

 static void remove_subtree(struct strbuf *path)
 {
-	DIR *dir = opendir(path->buf);
-	struct dirent *de;
-	int origlen = path->len;
-
-	if (!dir)
-		die_errno("cannot opendir '%s'", path->buf);
-	while ((de = readdir(dir)) != NULL) {
-		struct stat st;
-
-		if (is_dot_or_dotdot(de->d_name))
-			continue;
-
-		strbuf_addch(path, '/');
-		strbuf_addstr(path, de->d_name);
-		if (lstat(path->buf, &st))
-			die_errno("cannot lstat '%s'", path->buf);
-		if (S_ISDIR(st.st_mode))
-			remove_subtree(path);
-		else if (unlink(path->buf))
+	struct dir_iterator *diter = dir_iterator_begin(path->buf, DIR_ITERATOR_DEPTH_FIRST);
+
+	while (dir_iterator_advance(diter) == ITER_OK) {
+		if (S_ISDIR(diter->st.st_mode)) {
+			if (rmdir(diter->path.buf))
+				die_errno("cannot rmdir '%s'", path->buf);
+		} else if (unlink(diter->path.buf))
 			die_errno("cannot unlink '%s'", path->buf);
-		strbuf_setlen(path, origlen);
 	}
-	closedir(dir);
+
 	if (rmdir(path->buf))
 		die_errno("cannot rmdir '%s'", path->buf);
 }
--
2.7.4 (Apple Git-66)


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v4 4/5] remove_subtree(): test removing nested directories
  2017-03-29  0:32 [PATCH v4 0/5] [GSoC] remove_subtree(): reimplement using iterators Daniel Ferreira
                   ` (2 preceding siblings ...)
  2017-03-29  0:32 ` [PATCH v4 3/5] remove_subtree(): reimplement using iterators Daniel Ferreira
@ 2017-03-29  0:32 ` Daniel Ferreira
  2017-03-29  0:32 ` [PATCH v4 5/5] files_reflog_iterator: amend use of dir_iterator Daniel Ferreira
  4 siblings, 0 replies; 19+ messages in thread
From: Daniel Ferreira @ 2017-03-29  0:32 UTC (permalink / raw)
  To: git; +Cc: gitster, sbeller, pclouds, mhagger, Daniel Ferreira

Test removing a nested directory when an attempt is made to restore the
index to a state where it does not exist. A similar test could be found
previously in t/t2000-checkout-cache-clash.sh, but it did not check for
nested directories, which could allow a faulty implementation of
remove_subtree() pass the tests.

Signed-off-by: Daniel Ferreira <bnmvco@gmail.com>
---
 t/t2000-checkout-cache-clash.sh | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/t/t2000-checkout-cache-clash.sh b/t/t2000-checkout-cache-clash.sh
index de3edb5..ac10ba3 100755
--- a/t/t2000-checkout-cache-clash.sh
+++ b/t/t2000-checkout-cache-clash.sh
@@ -57,4 +57,15 @@ test_expect_success SYMLINKS 'checkout-index -f twice with --prefix' '
 	git checkout-index -a -f --prefix=there/
 '

+test_expect_success 'git checkout-index -f should remove nested subtrees' '
+	echo content >path &&
+	git update-index --add path &&
+	rm path &&
+	mkdir -p path/with/nested/paths &&
+	echo content >path/file1 &&
+	echo content >path/with/nested/paths/file2 &&
+	git checkout-index -f -a &&
+	test ! -d path
+'
+
 test_done
--
2.7.4 (Apple Git-66)


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v4 5/5] files_reflog_iterator: amend use of dir_iterator
  2017-03-29  0:32 [PATCH v4 0/5] [GSoC] remove_subtree(): reimplement using iterators Daniel Ferreira
                   ` (3 preceding siblings ...)
  2017-03-29  0:32 ` [PATCH v4 4/5] remove_subtree(): test removing nested directories Daniel Ferreira
@ 2017-03-29  0:32 ` Daniel Ferreira
  2017-03-29 10:45   ` Michael Haggerty
  4 siblings, 1 reply; 19+ messages in thread
From: Daniel Ferreira @ 2017-03-29  0:32 UTC (permalink / raw)
  To: git; +Cc: gitster, sbeller, pclouds, mhagger, Daniel Ferreira

Amend a call to dir_iterator_begin() to pass the flags parameter
introduced in 3efb5c0 ("dir_iterator: iterate over dir after its
contents", 2017-28-03).

Signed-off-by: Daniel Ferreira <bnmvco@gmail.com>
---
 refs/files-backend.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/refs/files-backend.c b/refs/files-backend.c
index 50188e9..b4bba74 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -3346,7 +3346,7 @@ static struct ref_iterator *files_reflog_iterator_begin(struct ref_store *ref_st
 	files_downcast(ref_store, 0, "reflog_iterator_begin");

 	base_ref_iterator_init(ref_iterator, &files_reflog_iterator_vtable);
-	iter->dir_iterator = dir_iterator_begin(git_path("logs"));
+	iter->dir_iterator = dir_iterator_begin(git_path("logs"), 0);
 	return ref_iterator;
 }

--
2.7.4 (Apple Git-66)


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH v4 1/5] dir_iterator: add helpers to dir_iterator_advance
  2017-03-29  0:32 ` [PATCH v4 1/5] dir_iterator: add helpers to dir_iterator_advance Daniel Ferreira
@ 2017-03-29  4:32   ` Michael Haggerty
  0 siblings, 0 replies; 19+ messages in thread
From: Michael Haggerty @ 2017-03-29  4:32 UTC (permalink / raw)
  To: Daniel Ferreira, git; +Cc: gitster, sbeller, pclouds

On 03/29/2017 02:32 AM, Daniel Ferreira wrote:
> Create inline helpers to dir_iterator_advance(). Make
> dir_iterator_advance()'s code more legible and allow some behavior to
> be reusable.

Thanks for breaking up the patches. That makes them a lot easier to review.

> Signed-off-by: Daniel Ferreira <bnmvco@gmail.com>
> ---
>  dir-iterator.c | 65 +++++++++++++++++++++++++++++++++++++---------------------
>  1 file changed, 42 insertions(+), 23 deletions(-)
> 
> diff --git a/dir-iterator.c b/dir-iterator.c
> index 34182a9..853c040 100644
> --- a/dir-iterator.c
> +++ b/dir-iterator.c
> @@ -50,6 +50,43 @@ struct dir_iterator_int {
>  	struct dir_iterator_level *levels;
>  };
> 
> +static inline void push_dir_level(struct dir_iterator_int *iter, struct dir_iterator_level *level)
> +{
> +	level->dir_state = DIR_STATE_RECURSE;
> +	ALLOC_GROW(iter->levels, iter->levels_nr + 1,
> +		   iter->levels_alloc);
> +	level = &iter->levels[iter->levels_nr++];
> +	level->initialized = 0;
> +}
> +
> +static inline int pop_dir_level(struct dir_iterator_int *iter, struct dir_iterator_level *level)
> +{
> +	return --iter->levels_nr;
> +}

`pop_dir_level()` doesn't use its `level` argument; it can be removed.

> [...]

Michael


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v4 2/5] dir_iterator: iterate over dir after its contents
  2017-03-29  0:32 ` [PATCH v4 2/5] dir_iterator: iterate over dir after its contents Daniel Ferreira
@ 2017-03-29  9:56   ` Michael Haggerty
  2017-03-29 10:44     ` Michael Haggerty
  2017-03-29 16:46     ` Junio C Hamano
  0 siblings, 2 replies; 19+ messages in thread
From: Michael Haggerty @ 2017-03-29  9:56 UTC (permalink / raw)
  To: Daniel Ferreira, git; +Cc: gitster, sbeller, pclouds

On 03/29/2017 02:32 AM, Daniel Ferreira wrote:
> Create an option for the dir_iterator API to iterate over subdirectories
> only after having iterated through their contents. This feature was
> predicted, although not implemented by 0fe5043 ("dir_iterator: new API
> for iterating over a directory tree", 2016-06-18).

I admire your ambition at taking on this project. Even though the
`dir_iterator` code is fairly short, it is quite intricate and it took
me quite a bit of thought to get it right. Don't be discouraged by how
many iterations it takes to get a change like this accepted.

When reviewing your changes, I realized that
`dir_iterator_level::initialized` and `dir_iterator_level::dir_state`
are somewhat redundant with each other in the pre-existing code. It
would be possible to add a third state, say `DIR_STATE_PUSH`, to the
enum, and use that to replace the state that we now call `!initialized`.

I'm not demanding that change, but it might make the bookkeeping in the
new code a little bit easier to understand, because the logic that
decides what to do based on setting of the new option would either
transition the level from `DIR_STATE_PUSH -> DIR_STATE_ITER ->
DIR_STATE_RECURSE` (for pre-order traversal) or `DIR_STATE_PUSH ->
DIR_STATE_RECURSE -> DIR_STATE_ITER` (for post-order traversal). I think
this could make the state machine clearer and thereby make it easier to
reason about the code.

> Add the "flags" parameter to dir_iterator_create, allowing for the
> aforementioned "depth-first" iteration mode to be enabled. Currently,
> the only acceptable flag is DIR_ITERATOR_DEPTH_FIRST.
> 
> This is useful for recursively removing a directory and calling rmdir()
> on a directory only after all of its contents have been wiped.

This patch changes the signature of `ref_iterator_begin()` without
adjusting the caller in `refs/files-backend.c`. This means that the code
doesn't even compile after this patch is applied.

The Git project insists that the code compile and all tests pass after
each and every commit. Among other things, this keeps the code
"bisectable" using `git bisect`, which is a very useful property.

I see that you adjust the caller later in the patch series, in
"files_reflog_iterator: amend use of dir_iterator". Please squash that
patch into this one.

I also realize that I made a goof in my comments about v3 of this patch
series. Your new option is not choosing between "depth-first" and
"breadth-first". Both types of iteration are depth-first. Really it is
choosing between pre-order and post-order traversal. So I think it would
be better to name the option `DIR_ITERATOR_POST_ORDER`. Sorry about that.

> ---
>  dir-iterator.c | 46 ++++++++++++++++++++++++++++++++++++++++++----
>  dir-iterator.h | 14 +++++++++++---
>  2 files changed, 53 insertions(+), 7 deletions(-)
> 
> diff --git a/dir-iterator.c b/dir-iterator.c
> index 853c040..545d333 100644
> --- a/dir-iterator.c
> +++ b/dir-iterator.c
> @@ -48,6 +48,9 @@ struct dir_iterator_int {
>  	 * that will be included in this iteration.
>  	 */
>  	struct dir_iterator_level *levels;
> +
> +	/* Holds the flags passed to dir_iterator_begin(). */
> +	unsigned flags;
>  };
> 
>  static inline void push_dir_level(struct dir_iterator_int *iter, struct dir_iterator_level *level)
> @@ -114,12 +117,14 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
>  			}
> 
>  			level->initialized = 1;
> -		} else if (S_ISDIR(iter->base.st.st_mode)) {
> +		} else if (S_ISDIR(iter->base.st.st_mode) &&
> +		!iter->flags & DIR_ITERATOR_DEPTH_FIRST) {

You need parentheses on the previous line:

		!(iter->flags & DIR_ITERATOR_DEPTH_FIRST)) {

because otherwise it is interpreted as

		(!iter->flags) & DIR_ITERATOR_DEPTH_FIRST) {

BTW, you should compile with stricter compiler options so that your
compiler warns you about problems like this. This is documented in
`Documentation/CodingGuidelines` (along with lots more useful information):

 - As a Git developer we assume you have a reasonably modern compiler
   and we recommend you to enable the DEVELOPER makefile knob to
   ensure your patch is clear of all compiler warnings we care about,
   by e.g. "echo DEVELOPER=1 >>config.mak".

Also, this line is not indented far enough. It should probably line up
with the inside of the opening parenthesis of the preceding line, like

> +		} else if (S_ISDIR(iter->base.st.st_mode) &&
> +			   !(iter->flags & DIR_ITERATOR_DEPTH_FIRST)) {

But let's dig a little bit deeper. Why don't the tests catch this coding
error? Currently there is only one option, `DIR_ITERATOR_DEPTH_FIRST`,
so if that constant were defined to be 1 as expected, the code would
accidentally work anyway (though it would break if another flag is ever
added). But in fact, you define `DIR_ITERATOR_DEPTH_FIRST` to be 2
(probably unintentionally; see below), so this condition always
evaluates to false!

So what's going on here? Why don't the tests fail?

The only pre-existing user of `dir_iterator` is
`files_reflog_iterator_begin()`, and I think that none of the callers of
that function care whether the iteration is pre-order or post-order. So
maybe they're getting post-order iteration and just don't notice?

But no, that's not it either. For example, the following command, which
iterates over reflogs, gives different answers on your branch vs.
standard git:

    $ git rev-list --all --reflog | sort >expected
    $ ./git rev-list --all --reflog | sort >actual
    $ git diff --no-index --numstat expected actual
    0       1065    expected => actual

(Adding the parentheses as suggested above makes the two outputs agree.)

The disagreement is not a surprise, because there isn't a corresponding
coding error in the code below that returns the directory itself in a
post-order iteration. The net result appears to be that there is no
recursion at all into subdirectories when `DIR_ITERATOR_DEPTH_FIRST` is
set. So due to this bug, we get neither a correct post-order iteration
nor a correct pre-order iteration with the new option.

In summary, this is a trivial bug that is easy to fix, but it points to
an alarming problem: our tests don't detect a pretty serious breakage in
the iteration over reflogs. And nothing is testing directory iteration
itself, and in particular whether the new option is doing what it should.

Probably the easiest way to remedy the latter problem would be to write
a little test helper program that iterates over an arbitrary directory
and prints the names of the files that it finds, with a `--post-order`
option that twiddles the new functionality. Then directory iteration
could be tested pretty easily by creating a few test directories and
files, running the test helper, and checking its output (remembering
that the order of iteration at a *single directory level* is undefined).
I think such a test is needed if we are to be confident that your
changes are correct.

>  			if (level->dir_state == DIR_STATE_ITER) {
>  				/*
>  				 * The directory was just iterated
>  				 * over; now prepare to iterate into
> -				 * it.
> +				 * it (unless an option is set for us
> +				 * to do otherwise).
>  				 */
>  				push_dir_level(iter, level);
>  				continue;
> @@ -153,10 +158,27 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
>  			de = readdir(level->dir);
> 
>  			if (!de) {
> -				/* This level is exhausted; pop up a level. */
> +				/* This level is exhausted  */
>  				if (errno) {
>  					warning("error reading directory %s: %s",
>  						iter->base.path.buf, strerror(errno));
> +				} else if (iter->flags & DIR_ITERATOR_DEPTH_FIRST) {
> +					/* If we are handling dirpaths after their contents,
> +					 * we have to iterate over the directory now that we'll
> +					 * have finished iterating into it. */

The Git project style is to open and terminate multiline comments on
separate lines:

        /*
	 * If we are handling dirpaths after their contents,
	 * we have to iterate over the directory now that we'll
	 * have finished iterating into it.
         */

The same problem appears elsewhere in your patch, too.

> +					level->dir = NULL;

You never call `closedir()` in this branch of the code, so you are
leaking file descriptors. Meanwhile, I don't think there's a reason to
set `dir->level` to `NULL` here, since it's just about to be popped off
the stack.

> +
> +					if (pop_dir_level(iter, level) == 0)
> +						return dir_iterator_abort(dir_iterator);
> +
> +					level = &iter->levels[iter->levels_nr - 1];
> +					/* Remove a trailing slash */
> +					strbuf_strip_suffix(&iter->base.path, "/");

I think that unconditionally stripping trailing slashes prevents this
code from being used to iterate over the root directory, "/". It's
probably not a use case that will be needed, but it would be an
unexpected surprise if anybody decided to do so.

Please either fix this or document the limitation.

> +
> +					if (set_iterator_data(iter, level))
> +						continue;
> +
> +					return ITER_OK;
>  				} else if (closedir(level->dir))
>  					warning("error closing directory %s: %s",
>  						iter->base.path.buf, strerror(errno));
> @@ -175,8 +197,22 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
>  			if (set_iterator_data(iter, level))
>  				continue;
> 
> +			/*
> +			 * If we want to iterate dirs after files, we shall
> +			 * begin looking into them *before* we return the dir
> +			 * itself.
> +			 */
> +			if (S_ISDIR(iter->base.st.st_mode) &&
> +			iter->flags & DIR_ITERATOR_DEPTH_FIRST) {

The line above is also indented incorrectly.

> +				push_dir_level(iter, level);
> +				goto continue_outer_loop;

Instead of `goto`, couldn't you use `break` here?

> +			}
> +
>  			return ITER_OK;
>  		}
> +
> +continue_outer_loop:
> +		;
>  	}
>  }
> 
> @@ -201,7 +237,7 @@ int dir_iterator_abort(struct dir_iterator *dir_iterator)
>  	return ITER_DONE;
>  }
> 
> -struct dir_iterator *dir_iterator_begin(const char *path)
> +struct dir_iterator *dir_iterator_begin(const char *path, unsigned flags)
>  {
>  	struct dir_iterator_int *iter = xcalloc(1, sizeof(*iter));
>  	struct dir_iterator *dir_iterator = &iter->base;
> @@ -209,6 +245,8 @@ struct dir_iterator *dir_iterator_begin(const char *path)
>  	if (!path || !*path)
>  		die("BUG: empty path passed to dir_iterator_begin()");
> 
> +	iter->flags = flags;
> +
>  	strbuf_init(&iter->base.path, PATH_MAX);
>  	strbuf_addstr(&iter->base.path, path);
> 
> diff --git a/dir-iterator.h b/dir-iterator.h
> index 27739e6..28ff3df 100644
> --- a/dir-iterator.h
> +++ b/dir-iterator.h
> @@ -38,6 +38,13 @@
>   * dir_iterator_advance() again.
>   */

The module docstring just above this hunk needs updating, too.

> +/* Possible flags for dir_iterator_begin().
> + *
> + * DIR_ITERATOR_DEPTH_FIRST: ensures subdirectories and their contents
> + * are iterated through before the containing directory.
> + */
> +#define DIR_ITERATOR_DEPTH_FIRST (1 << 1)
> +

Normally the first constant in a bitset gets the value `(1 << 0)`; i.e.,
1. Is there a reason you use 2 here?

> [...]

Michael


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v4 3/5] remove_subtree(): reimplement using iterators
  2017-03-29  0:32 ` [PATCH v4 3/5] remove_subtree(): reimplement using iterators Daniel Ferreira
@ 2017-03-29 10:01   ` Michael Haggerty
  0 siblings, 0 replies; 19+ messages in thread
From: Michael Haggerty @ 2017-03-29 10:01 UTC (permalink / raw)
  To: Daniel Ferreira, git; +Cc: gitster, sbeller, pclouds

On 03/29/2017 02:32 AM, Daniel Ferreira wrote:
> Use dir_iterator to traverse through remove_subtree()'s directory tree,
> avoiding the need for recursive calls to readdir(). Simplify
> remove_subtree()'s code.
> 
> A conversion similar in purpose was previously done at 46d092a
> ("for_each_reflog(): reimplement using iterators", 2016-05-21).
> 
> Signed-off-by: Daniel Ferreira <bnmvco@gmail.com>
> ---
>  entry.c | 31 ++++++++++---------------------
>  1 file changed, 10 insertions(+), 21 deletions(-)
> 
> diff --git a/entry.c b/entry.c
> index c6eea24..bbebd16 100644
> --- a/entry.c
> +++ b/entry.c
> @@ -2,6 +2,8 @@
>  #include "blob.h"
>  #include "dir.h"
>  #include "streaming.h"
> +#include "iterator.h"
> +#include "dir-iterator.h"
> 
>  static void create_directories(const char *path, int path_len,
>  			       const struct checkout *state)
> @@ -46,29 +48,16 @@ static void create_directories(const char *path, int path_len,
> 
>  static void remove_subtree(struct strbuf *path)

The reason that this function took a `strbuf` as argument was that it
used to modify the strbuf while it was working. Now that `dir_iterator`
does all of that work, you could simplify it to take a `const char *`
argument, and change the caller to pass `path.buf`.

>  {
> -	DIR *dir = opendir(path->buf);
> -	struct dirent *de;
> -	int origlen = path->len;
> -
> -	if (!dir)
> -		die_errno("cannot opendir '%s'", path->buf);
> -	while ((de = readdir(dir)) != NULL) {
> -		struct stat st;
> -
> -		if (is_dot_or_dotdot(de->d_name))
> -			continue;
> -
> -		strbuf_addch(path, '/');
> -		strbuf_addstr(path, de->d_name);
> -		if (lstat(path->buf, &st))
> -			die_errno("cannot lstat '%s'", path->buf);
> -		if (S_ISDIR(st.st_mode))
> -			remove_subtree(path);
> -		else if (unlink(path->buf))
> +	struct dir_iterator *diter = dir_iterator_begin(path->buf, DIR_ITERATOR_DEPTH_FIRST);
> +
> +	while (dir_iterator_advance(diter) == ITER_OK) {
> +		if (S_ISDIR(diter->st.st_mode)) {
> +			if (rmdir(diter->path.buf))
> +				die_errno("cannot rmdir '%s'", path->buf);
> +		} else if (unlink(diter->path.buf))
>  			die_errno("cannot unlink '%s'", path->buf);

The two `die_errno()` calls must be changed to use `diter->path.buf`
rather than `path->buf`.

> -		strbuf_setlen(path, origlen);
>  	}
> -	closedir(dir);
> +
>  	if (rmdir(path->buf))
>  		die_errno("cannot rmdir '%s'", path->buf);
>  }

Michael


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v4 2/5] dir_iterator: iterate over dir after its contents
  2017-03-29  9:56   ` Michael Haggerty
@ 2017-03-29 10:44     ` Michael Haggerty
  2017-03-29 16:46     ` Junio C Hamano
  1 sibling, 0 replies; 19+ messages in thread
From: Michael Haggerty @ 2017-03-29 10:44 UTC (permalink / raw)
  To: Daniel Ferreira, git; +Cc: gitster, sbeller, pclouds

On 03/29/2017 11:56 AM, Michael Haggerty wrote:
> On 03/29/2017 02:32 AM, Daniel Ferreira wrote:
>> [...]
> [...]
> The disagreement is not a surprise, because there isn't a corresponding
> coding error in the code below that returns the directory itself in a
> post-order iteration. The net result appears to be that there is no
> recursion at all into subdirectories when `DIR_ITERATOR_DEPTH_FIRST` is
> set. So due to this bug, we get neither a correct post-order iteration
> nor a correct pre-order iteration with the new option.

Correction: the second-to-last sentence should read "there is no
recursion at all into subdirectories when `DIR_ITERATOR_DEPTH_FIRST` is
*NOT* set.

Michael


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v4 5/5] files_reflog_iterator: amend use of dir_iterator
  2017-03-29  0:32 ` [PATCH v4 5/5] files_reflog_iterator: amend use of dir_iterator Daniel Ferreira
@ 2017-03-29 10:45   ` Michael Haggerty
  0 siblings, 0 replies; 19+ messages in thread
From: Michael Haggerty @ 2017-03-29 10:45 UTC (permalink / raw)
  To: Daniel Ferreira, git; +Cc: gitster, sbeller, pclouds

On 03/29/2017 02:32 AM, Daniel Ferreira wrote:
> Amend a call to dir_iterator_begin() to pass the flags parameter
> introduced in 3efb5c0 ("dir_iterator: iterate over dir after its
> contents", 2017-28-03).
> 
> Signed-off-by: Daniel Ferreira <bnmvco@gmail.com>
> ---
>  refs/files-backend.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/refs/files-backend.c b/refs/files-backend.c
> index 50188e9..b4bba74 100644
> --- a/refs/files-backend.c
> +++ b/refs/files-backend.c
> @@ -3346,7 +3346,7 @@ static struct ref_iterator *files_reflog_iterator_begin(struct ref_store *ref_st
>  	files_downcast(ref_store, 0, "reflog_iterator_begin");
> 
>  	base_ref_iterator_init(ref_iterator, &files_reflog_iterator_vtable);
> -	iter->dir_iterator = dir_iterator_begin(git_path("logs"));
> +	iter->dir_iterator = dir_iterator_begin(git_path("logs"), 0);
>  	return ref_iterator;
>  }

As mentioned earlier, this patch should be squashed into patch [2/5].

Michael


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v4 2/5] dir_iterator: iterate over dir after its contents
  2017-03-29  9:56   ` Michael Haggerty
  2017-03-29 10:44     ` Michael Haggerty
@ 2017-03-29 16:46     ` Junio C Hamano
  2017-03-30  4:59       ` Michael Haggerty
  1 sibling, 1 reply; 19+ messages in thread
From: Junio C Hamano @ 2017-03-29 16:46 UTC (permalink / raw)
  To: Michael Haggerty; +Cc: Daniel Ferreira, git, sbeller, pclouds

Michael Haggerty <mhagger@alum.mit.edu> writes:

> I also realize that I made a goof in my comments about v3 of this patch
> series. Your new option is not choosing between "depth-first" and
> "breadth-first". Both types of iteration are depth-first. Really it is
> choosing between pre-order and post-order traversal. So I think it would
> be better to name the option `DIR_ITERATOR_POST_ORDER`. Sorry about that.

That solicits a natural reaction from a bystander.  Would an
IN_ORDER option also be useful?  I am not demanding it to be added
to this series, especially if there is no immediate need, but if we
foresee that it would also make sense for some other callers, we
would at least want to make sure that the code after this addition
of POST_ORDER is in a shape that is easy to add such an option
later.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v4 2/5] dir_iterator: iterate over dir after its contents
  2017-03-29 16:46     ` Junio C Hamano
@ 2017-03-30  4:59       ` Michael Haggerty
  2017-03-30  6:08         ` Junio C Hamano
  0 siblings, 1 reply; 19+ messages in thread
From: Michael Haggerty @ 2017-03-30  4:59 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Daniel Ferreira, git, sbeller, pclouds

On 03/29/2017 06:46 PM, Junio C Hamano wrote:
> Michael Haggerty <mhagger@alum.mit.edu> writes:
> 
>> I also realize that I made a goof in my comments about v3 of this patch
>> series. Your new option is not choosing between "depth-first" and
>> "breadth-first". Both types of iteration are depth-first. Really it is
>> choosing between pre-order and post-order traversal. So I think it would
>> be better to name the option `DIR_ITERATOR_POST_ORDER`. Sorry about that.
> 
> That solicits a natural reaction from a bystander.  Would an
> IN_ORDER option also be useful?  I am not demanding it to be added
> to this series, especially if there is no immediate need, but if we
> foresee that it would also make sense for some other callers, we
> would at least want to make sure that the code after this addition
> of POST_ORDER is in a shape that is easy to add such an option
> later.

I think IN_ORDER really only applies to *binary* trees, not arbitrary
trees like a filesystem.

Michael


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v4 2/5] dir_iterator: iterate over dir after its contents
  2017-03-30  4:59       ` Michael Haggerty
@ 2017-03-30  6:08         ` Junio C Hamano
  2017-03-30  6:39           ` Michael Haggerty
  0 siblings, 1 reply; 19+ messages in thread
From: Junio C Hamano @ 2017-03-30  6:08 UTC (permalink / raw)
  To: Michael Haggerty; +Cc: Daniel Ferreira, git, sbeller, pclouds

Michael Haggerty <mhagger@alum.mit.edu> writes:

> I think IN_ORDER really only applies to *binary* trees, not arbitrary
> trees like a filesystem.

How true.  Even if we were giving a sorted output (and dir-iterator
doesn't and there is no need for it to), dir/ should come before any
of its contents, so for that application we can use pre-order, and
there is no sensible and useful definition of in-order.

Thanks.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v4 2/5] dir_iterator: iterate over dir after its contents
  2017-03-30  6:08         ` Junio C Hamano
@ 2017-03-30  6:39           ` Michael Haggerty
  2017-03-30 11:08             ` Duy Nguyen
  2017-03-30 17:26             ` Junio C Hamano
  0 siblings, 2 replies; 19+ messages in thread
From: Michael Haggerty @ 2017-03-30  6:39 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Daniel Ferreira, git, sbeller, pclouds

On 03/30/2017 08:08 AM, Junio C Hamano wrote:
> Michael Haggerty <mhagger@alum.mit.edu> writes:
> 
>> I think IN_ORDER really only applies to *binary* trees, not arbitrary
>> trees like a filesystem.
> 
> How true.  Even if we were giving a sorted output (and dir-iterator
> doesn't and there is no need for it to), dir/ should come before any
> of its contents, so for that application we can use pre-order, and
> there is no sensible and useful definition of in-order.

Your email got me thinking, though, that there is one generalization of
the concept of PRE_ORDER vs. POST_ORDER that would be both easy to
implement and potentially useful. Namely, flags could include the
following orthogonal options (instead of `DIR_ITERATOR_POST_ORDER)`:

* DIR_ITERATOR_DIRS_BEFORE -- when this is set, directories
  are included in the iteration *before* their contents.

* DIR_ITERATOR_DIRS_AFTER -- when this is set, directories
  are included in the iteration *after* their contents.

Enabling one or the other of these options would select pre-order or
post-order iteration.

Enabling neither of them would mean that directory entries themselves
are not included in the iteration at all, even though recursion would
happen *into* subdirectories. This option would surely be useful to some
caller somewhere (though it's easy for the caller to get the same effect
itself via

	if (S_ISDIR(iter->base.st.st_mode))
		continue;

).

It's even conceivable that enabling *both* options at the same time
would be useful, if the caller want to know when the processing of a
directory is begun and also when it is finished (e.g., because it needs
to load or unload a `.gitignore` file for that directory). If we wanted
to make it easier for the caller figure out whether it is seeing an
"entering directory" event vs. a "leaving directory" event, we could
expose something like the `dir_state` member in the iterator.

While we're blue-skying, a

* DIR_ITERATOR_RECURSE -- recurse into subdirectories

would make the set of possible options complete. If this option is not
set, then the iteration would be over the entries in a single directory
without traversing its subdirectories.

I don't think any of this needs to be implemented now, but maybe keep it
in mind if/when `dir_iterator` gets more users.

Michael


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v4 2/5] dir_iterator: iterate over dir after its contents
  2017-03-30  6:39           ` Michael Haggerty
@ 2017-03-30 11:08             ` Duy Nguyen
  2017-04-02  4:25               ` Daniel Ferreira (theiostream)
  2017-03-30 17:26             ` Junio C Hamano
  1 sibling, 1 reply; 19+ messages in thread
From: Duy Nguyen @ 2017-03-30 11:08 UTC (permalink / raw)
  To: Michael Haggerty
  Cc: Junio C Hamano, Daniel Ferreira, Git Mailing List, Stefan Beller

On Thu, Mar 30, 2017 at 1:39 PM, Michael Haggerty <mhagger@alum.mit.edu> wrote:
> * DIR_ITERATOR_RECURSE -- recurse into subdirectories
>
> would make the set of possible options complete. If this option is not
> set, then the iteration would be over the entries in a single directory
> without traversing its subdirectories.

And would make it possible to use dir-iterator everywhere (except
read_directory_recursiveky). That would be really nice :)
-- 
Duy

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v4 2/5] dir_iterator: iterate over dir after its contents
  2017-03-30  6:39           ` Michael Haggerty
  2017-03-30 11:08             ` Duy Nguyen
@ 2017-03-30 17:26             ` Junio C Hamano
  1 sibling, 0 replies; 19+ messages in thread
From: Junio C Hamano @ 2017-03-30 17:26 UTC (permalink / raw)
  To: Michael Haggerty; +Cc: Daniel Ferreira, git, sbeller, pclouds

Michael Haggerty <mhagger@alum.mit.edu> writes:

> I don't think any of this needs to be implemented now, but maybe keep it
> in mind if/when `dir_iterator` gets more users.

OK.  One thing that was missing in your list was the opposite of "do
not show directories", i.e. "show only directories".  That should
also be easy to do (but we need to figure out a way to encode it in
the flag somehow) if it turns out to be useful later.



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v4 2/5] dir_iterator: iterate over dir after its contents
  2017-03-30 11:08             ` Duy Nguyen
@ 2017-04-02  4:25               ` Daniel Ferreira (theiostream)
  2017-04-05  9:21                 ` Duy Nguyen
  0 siblings, 1 reply; 19+ messages in thread
From: Daniel Ferreira (theiostream) @ 2017-04-02  4:25 UTC (permalink / raw)
  To: Duy Nguyen
  Cc: Michael Haggerty, Junio C Hamano, Git Mailing List, Stefan Beller

Why exactly would it not be applicable to read_directory_recursively()?

On Thu, Mar 30, 2017 at 8:08 AM, Duy Nguyen <pclouds@gmail.com> wrote:
> On Thu, Mar 30, 2017 at 1:39 PM, Michael Haggerty <mhagger@alum.mit.edu> wrote:
>> * DIR_ITERATOR_RECURSE -- recurse into subdirectories
>>
>> would make the set of possible options complete. If this option is not
>> set, then the iteration would be over the entries in a single directory
>> without traversing its subdirectories.
>
> And would make it possible to use dir-iterator everywhere (except
> read_directory_recursiveky). That would be really nice :)
> --
> Duy

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v4 2/5] dir_iterator: iterate over dir after its contents
  2017-04-02  4:25               ` Daniel Ferreira (theiostream)
@ 2017-04-05  9:21                 ` Duy Nguyen
  0 siblings, 0 replies; 19+ messages in thread
From: Duy Nguyen @ 2017-04-05  9:21 UTC (permalink / raw)
  To: Daniel Ferreira (theiostream)
  Cc: Michael Haggerty, Junio C Hamano, Git Mailing List, Stefan Beller

On Sun, Apr 2, 2017 at 11:25 AM, Daniel Ferreira (theiostream)
<bnmvco@gmail.com> wrote:
> Why exactly would it not be applicable to read_directory_recursively()?

Because that function is a beast (and probably should have "beast" in
the function name).

The function is supposed to read .gitignore, index file and whatever
else needed, traverse the directory and return the list of untracked
files, or ignored files. It even has options to return a directory
path, if all entries inside is ignored/untracked, instead of the list
of files inside. On top of that, it has "untracked cache" to skip
traversing directories to speed up.

Using dir_iterator might be possible but we'll need to improve it a
lot. read_directory_recursively() needs really fine control over
traversing: it can decide to not recurse in a subdirectory, it can
decide to ignore the rest of the entries in a directory and go back to
its parent, and in the case of untracked cache, it can switch to
traversing the cache instead of on-disk directories.

It's possible, probably, but it's going to need quite a lot of work
(and care, since I think there are corner cases that have been abused
in real world and we just cannot change those behaviors). Using
iterator interface would be a good improvement to clean up the "on
disk or on cache" directory traversal there though.

>
> On Thu, Mar 30, 2017 at 8:08 AM, Duy Nguyen <pclouds@gmail.com> wrote:
>> On Thu, Mar 30, 2017 at 1:39 PM, Michael Haggerty <mhagger@alum.mit.edu> wrote:
>>> * DIR_ITERATOR_RECURSE -- recurse into subdirectories
>>>
>>> would make the set of possible options complete. If this option is not
>>> set, then the iteration would be over the entries in a single directory
>>> without traversing its subdirectories.
>>
>> And would make it possible to use dir-iterator everywhere (except
>> read_directory_recursiveky). That would be really nice :)
>> --
>> Duy



-- 
Duy

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2017-04-05  9:22 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-03-29  0:32 [PATCH v4 0/5] [GSoC] remove_subtree(): reimplement using iterators Daniel Ferreira
2017-03-29  0:32 ` [PATCH v4 1/5] dir_iterator: add helpers to dir_iterator_advance Daniel Ferreira
2017-03-29  4:32   ` Michael Haggerty
2017-03-29  0:32 ` [PATCH v4 2/5] dir_iterator: iterate over dir after its contents Daniel Ferreira
2017-03-29  9:56   ` Michael Haggerty
2017-03-29 10:44     ` Michael Haggerty
2017-03-29 16:46     ` Junio C Hamano
2017-03-30  4:59       ` Michael Haggerty
2017-03-30  6:08         ` Junio C Hamano
2017-03-30  6:39           ` Michael Haggerty
2017-03-30 11:08             ` Duy Nguyen
2017-04-02  4:25               ` Daniel Ferreira (theiostream)
2017-04-05  9:21                 ` Duy Nguyen
2017-03-30 17:26             ` Junio C Hamano
2017-03-29  0:32 ` [PATCH v4 3/5] remove_subtree(): reimplement using iterators Daniel Ferreira
2017-03-29 10:01   ` Michael Haggerty
2017-03-29  0:32 ` [PATCH v4 4/5] remove_subtree(): test removing nested directories Daniel Ferreira
2017-03-29  0:32 ` [PATCH v4 5/5] files_reflog_iterator: amend use of dir_iterator Daniel Ferreira
2017-03-29 10:45   ` Michael Haggerty

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).