git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH v3 0/2] [GSoC] remove_subtree(): reimplement using iterators
@ 2017-03-25 18:12 Daniel Ferreira
  2017-03-25 18:12 ` [PATCH v3 1/2] [GSoC] dir_iterator: iterate over dir after its contents Daniel Ferreira
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Daniel Ferreira @ 2017-03-25 18:12 UTC (permalink / raw)
  To: git; +Cc: gitster, sbeller, pclouds, mhagger, Daniel Ferreira

This is the third version of the GSoC microproject
of refactoring remove_subtree() from recursively using
readdir() to use dir_iterator. Below are the threads for
other versions:

v1: https://public-inbox.org/git/CAGZ79kZwT-9mHTiOJ5CEjk2wDFkn6+NcogjX0=vjhsAh16ANYg@mail.gmail.com/T/#mae023e7a7d7626f00e0923833c4359f5af493730
v2: https://public-inbox.org/git/CACsJy8Dxh-QPBBLfaFWPAWUsbA9GVXA7x+mXLjEvYKhk1zOpig@mail.gmail.com/T/#t

Duy suggested adding features to dir_iterator might go
beyond the intention of a microproject, but I figured I
might go for it to learn more about the project.

The dir_iterator reimplementation has been tested in a
separate binary I created (and linked with libgit.a) to
reproduce remove_subtree()'s contents. As pointed out in the
last thread, git's tests for this function were unable to
catch a daunting bug I had introduced, and I still haven't
been able to come up with a way to reproduce remove_subtree()
being called. Any help?

Thank you all again for all the reviews.

Daniel Ferreira (2):
  dir_iterator: iterate over dir after its contents
  remove_subtree(): reimplement using iterators

 dir-iterator.c | 100 ++++++++++++++++++++++++++++++++++++++++++++-------------
 dir-iterator.h |   7 ++++
 entry.c        |  32 +++++++-----------
 3 files changed, 95 insertions(+), 44 deletions(-)

--
2.7.4 (Apple Git-66)


^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH v3 1/2] [GSoC] dir_iterator: iterate over dir after its contents
  2017-03-25 18:12 [PATCH v3 0/2] [GSoC] remove_subtree(): reimplement using iterators Daniel Ferreira
@ 2017-03-25 18:12 ` Daniel Ferreira
  2017-03-26 22:40   ` Michael Haggerty
  2017-03-26 22:48   ` Michael Haggerty
  2017-03-25 18:12 ` [PATCH v3 2/2] [GSoC] remove_subtree(): reimplement using iterators Daniel Ferreira
  2017-03-28 17:13 ` [PATCH v3 0/2] " Stefan Beller
  2 siblings, 2 replies; 6+ messages in thread
From: Daniel Ferreira @ 2017-03-25 18:12 UTC (permalink / raw)
  To: git; +Cc: gitster, sbeller, pclouds, mhagger, Daniel Ferreira

Create an option for the dir_iterator API to iterate over a directory
path only after having iterated through its contents. This feature was
predicted, although not implemented by 0fe5043 ("dir_iterator: new API
for iterating over a directory tree", 2016-06-18).

This is useful for recursively removing a directory and calling rmdir()
on a directory only after all of its contents have been wiped.

An "options" member has been added to the dir_iterator struct. It
contains the "iterate_dirs_after_files" flag, that enables the feature
when set to 1. Default behavior continues to be iterating over directory
paths before its contents.

Two inline functions have been added to dir_iterator's code to avoid
code repetition inside dir_iterator_advance() and make code more clear.

No particular functions or wrappers for setting the options struct's
fields have been added to avoid either breaking the current dir_iterator
API or over-engineering an extremely simple option architecture.

Signed-off-by: Daniel Ferreira <bnmvco@gmail.com>
---
 dir-iterator.c | 100 ++++++++++++++++++++++++++++++++++++++++++++-------------
 dir-iterator.h |   7 ++++
 2 files changed, 84 insertions(+), 23 deletions(-)

diff --git a/dir-iterator.c b/dir-iterator.c
index 34182a9..833d56a 100644
--- a/dir-iterator.c
+++ b/dir-iterator.c
@@ -50,6 +50,43 @@ struct dir_iterator_int {
 	struct dir_iterator_level *levels;
 };

+static inline void push_dir_level(struct dir_iterator_int *iter, struct dir_iterator_level *level)
+{
+	level->dir_state = DIR_STATE_RECURSE;
+	ALLOC_GROW(iter->levels, iter->levels_nr + 1,
+		   iter->levels_alloc);
+	level = &iter->levels[iter->levels_nr++];
+	level->initialized = 0;
+}
+
+static inline int pop_dir_level(struct dir_iterator_int *iter, struct dir_iterator_level *level)
+{
+	return --iter->levels_nr;
+}
+
+static inline int set_iterator_data(struct dir_iterator_int *iter, struct dir_iterator_level *level)
+{
+	if (lstat(iter->base.path.buf, &iter->base.st) < 0) {
+		if (errno != ENOENT)
+			warning("error reading path '%s': %s",
+				iter->base.path.buf,
+				strerror(errno));
+		return -1;
+	}
+
+	/*
+	 * We have to set these each time because
+	 * the path strbuf might have been realloc()ed.
+	 */
+	iter->base.relative_path =
+		iter->base.path.buf + iter->levels[0].prefix_len;
+	iter->base.basename =
+		iter->base.path.buf + level->prefix_len;
+	level->dir_state = DIR_STATE_ITER;
+
+	return 0;
+}
+
 int dir_iterator_advance(struct dir_iterator *dir_iterator)
 {
 	struct dir_iterator_int *iter =
@@ -77,18 +114,16 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
 			}

 			level->initialized = 1;
-		} else if (S_ISDIR(iter->base.st.st_mode)) {
+		} else if (S_ISDIR(iter->base.st.st_mode) &&
+		!iter->base.options.iterate_dirs_after_files) {
 			if (level->dir_state == DIR_STATE_ITER) {
 				/*
 				 * The directory was just iterated
 				 * over; now prepare to iterate into
-				 * it.
+				 * it (unless an option is set for us
+				 * to do otherwise).
 				 */
-				level->dir_state = DIR_STATE_RECURSE;
-				ALLOC_GROW(iter->levels, iter->levels_nr + 1,
-					   iter->levels_alloc);
-				level = &iter->levels[iter->levels_nr++];
-				level->initialized = 0;
+				push_dir_level(iter, level);
 				continue;
 			} else {
 				/*
@@ -104,7 +139,7 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
 			 * This level is exhausted (or wasn't opened
 			 * successfully); pop up a level.
 			 */
-			if (--iter->levels_nr == 0)
+			if (pop_dir_level(iter, level) == 0)
 				return dir_iterator_abort(dir_iterator);

 			continue;
@@ -120,16 +155,33 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
 			de = readdir(level->dir);

 			if (!de) {
-				/* This level is exhausted; pop up a level. */
+				/* This level is exhausted  */
 				if (errno) {
 					warning("error reading directory %s: %s",
 						iter->base.path.buf, strerror(errno));
+				} else if (iter->base.options.iterate_dirs_after_files) {
+					/* If we are handling dirpaths after their contents,
+					 * we have to iterate over the directory now that we'll
+					 * have finished iterating into it. */
+					level->dir = NULL;
+
+					if (pop_dir_level(iter, level) == 0)
+						return dir_iterator_abort(dir_iterator);
+
+					level = &iter->levels[iter->levels_nr - 1];
+					/* Remove a trailing slash */
+					strbuf_strip_suffix(&iter->base.path, "/");
+
+					if (set_iterator_data(iter, level))
+						continue;
+
+					return ITER_OK;
 				} else if (closedir(level->dir))
 					warning("error closing directory %s: %s",
 						iter->base.path.buf, strerror(errno));

 				level->dir = NULL;
-				if (--iter->levels_nr == 0)
+				if (pop_dir_level(iter, level) == 0)
 					return dir_iterator_abort(dir_iterator);
 				break;
 			}
@@ -138,26 +190,26 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
 				continue;

 			strbuf_addstr(&iter->base.path, de->d_name);
-			if (lstat(iter->base.path.buf, &iter->base.st) < 0) {
-				if (errno != ENOENT)
-					warning("error reading path '%s': %s",
-						iter->base.path.buf,
-						strerror(errno));
+
+			if (set_iterator_data(iter, level))
 				continue;
-			}

 			/*
-			 * We have to set these each time because
-			 * the path strbuf might have been realloc()ed.
+			 * If we want to iterate dirs after files, we shall
+			 * begin looking into them *before* we return the dir
+			 * itself.
 			 */
-			iter->base.relative_path =
-				iter->base.path.buf + iter->levels[0].prefix_len;
-			iter->base.basename =
-				iter->base.path.buf + level->prefix_len;
-			level->dir_state = DIR_STATE_ITER;
+			if (S_ISDIR(iter->base.st.st_mode) &&
+			iter->base.options.iterate_dirs_after_files) {
+				push_dir_level(iter, level);
+				goto continue_outer_loop;
+			}

 			return ITER_OK;
 		}
+
+continue_outer_loop:
+		;
 	}
 }

@@ -190,6 +242,8 @@ struct dir_iterator *dir_iterator_begin(const char *path)
 	if (!path || !*path)
 		die("BUG: empty path passed to dir_iterator_begin()");

+	iter->base.options.iterate_dirs_after_files = 0;
+
 	strbuf_init(&iter->base.path, PATH_MAX);
 	strbuf_addstr(&iter->base.path, path);

diff --git a/dir-iterator.h b/dir-iterator.h
index 27739e6..4304913 100644
--- a/dir-iterator.h
+++ b/dir-iterator.h
@@ -38,7 +38,14 @@
  * dir_iterator_advance() again.
  */

+struct dir_iterator_options {
+	unsigned iterate_dirs_after_files : 1;
+};
+
 struct dir_iterator {
+	/* Options for dir_iterator */
+	struct dir_iterator_options options;
+
 	/* The current path: */
 	struct strbuf path;

--
2.7.4 (Apple Git-66)


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH v3 2/2] [GSoC] remove_subtree(): reimplement using iterators
  2017-03-25 18:12 [PATCH v3 0/2] [GSoC] remove_subtree(): reimplement using iterators Daniel Ferreira
  2017-03-25 18:12 ` [PATCH v3 1/2] [GSoC] dir_iterator: iterate over dir after its contents Daniel Ferreira
@ 2017-03-25 18:12 ` Daniel Ferreira
  2017-03-28 17:13 ` [PATCH v3 0/2] " Stefan Beller
  2 siblings, 0 replies; 6+ messages in thread
From: Daniel Ferreira @ 2017-03-25 18:12 UTC (permalink / raw)
  To: git; +Cc: gitster, sbeller, pclouds, mhagger, Daniel Ferreira

Use dir_iterator to traverse through remove_subtree()'s directory tree,
avoiding the need for recursive calls to readdir(). Simplify
remove_subtree()'s code.

A conversion similar in purpose was previously done at 46d092a
("for_each_reflog(): reimplement using iterators", 2016-05-21).

Signed-off-by: Daniel Ferreira <bnmvco@gmail.com>
---
 entry.c | 32 +++++++++++---------------------
 1 file changed, 11 insertions(+), 21 deletions(-)

diff --git a/entry.c b/entry.c
index c6eea24..670ffeb 100644
--- a/entry.c
+++ b/entry.c
@@ -2,6 +2,8 @@
 #include "blob.h"
 #include "dir.h"
 #include "streaming.h"
+#include "iterator.h"
+#include "dir-iterator.h"

 static void create_directories(const char *path, int path_len,
 			       const struct checkout *state)
@@ -46,29 +48,17 @@ static void create_directories(const char *path, int path_len,

 static void remove_subtree(struct strbuf *path)
 {
-	DIR *dir = opendir(path->buf);
-	struct dirent *de;
-	int origlen = path->len;
-
-	if (!dir)
-		die_errno("cannot opendir '%s'", path->buf);
-	while ((de = readdir(dir)) != NULL) {
-		struct stat st;
-
-		if (is_dot_or_dotdot(de->d_name))
-			continue;
-
-		strbuf_addch(path, '/');
-		strbuf_addstr(path, de->d_name);
-		if (lstat(path->buf, &st))
-			die_errno("cannot lstat '%s'", path->buf);
-		if (S_ISDIR(st.st_mode))
-			remove_subtree(path);
-		else if (unlink(path->buf))
+	struct dir_iterator *diter = dir_iterator_begin(path->buf);
+	diter->options.iterate_dirs_after_files = 1;
+
+	while (dir_iterator_advance(diter) == ITER_OK) {
+		if (S_ISDIR(diter->st.st_mode)) {
+			if (rmdir(diter->path.buf))
+				die_errno("cannot rmdir '%s'", path->buf);
+		} else if (unlink(diter->path.buf))
 			die_errno("cannot unlink '%s'", path->buf);
-		strbuf_setlen(path, origlen);
 	}
-	closedir(dir);
+
 	if (rmdir(path->buf))
 		die_errno("cannot rmdir '%s'", path->buf);
 }
--
2.7.4 (Apple Git-66)


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH v3 1/2] [GSoC] dir_iterator: iterate over dir after its contents
  2017-03-25 18:12 ` [PATCH v3 1/2] [GSoC] dir_iterator: iterate over dir after its contents Daniel Ferreira
@ 2017-03-26 22:40   ` Michael Haggerty
  2017-03-26 22:48   ` Michael Haggerty
  1 sibling, 0 replies; 6+ messages in thread
From: Michael Haggerty @ 2017-03-26 22:40 UTC (permalink / raw)
  To: Daniel Ferreira, git; +Cc: gitster, sbeller, pclouds

On 03/25/2017 07:12 PM, Daniel Ferreira wrote:
> Create an option for the dir_iterator API to iterate over a directory
> path only after having iterated through its contents. This feature was
> predicted, although not implemented by 0fe5043 ("dir_iterator: new API
> for iterating over a directory tree", 2016-06-18).
> 
> This is useful for recursively removing a directory and calling rmdir()
> on a directory only after all of its contents have been wiped.
> 
> An "options" member has been added to the dir_iterator struct. It
> contains the "iterate_dirs_after_files" flag, that enables the feature
> when set to 1. Default behavior continues to be iterating over directory
> paths before its contents.
> 
> Two inline functions have been added to dir_iterator's code to avoid
> code repetition inside dir_iterator_advance() and make code more clear.
> 
> No particular functions or wrappers for setting the options struct's
> fields have been added to avoid either breaking the current dir_iterator
> API or over-engineering an extremely simple option architecture.

This patch would be easier to read if it were split into two: one
extracting the new functions and changing old code to use them, and a
second adding the new functionality. As one patch, is is hard to see
quickly which changes have what purpose.

I also suggest adding a new `unsigned int flags` parameter to
`dir_iterator_begin`. I think that's more natural, because it doesn't
make sense to change the iteration order during an iteration. It's not
much of a problem to change the API given that all callers are in the
same codebase. If you were to forget to convert any callers (or if a
different in-flight patch series were to add a new caller using the old
call style), the compiler would complain, and the problem would be
obvious and easy to fix.

I didn't actually read the patch carefully yet because I don't have time
this evening to seek out the interesting parts in the long diff.

Michael

> [...]


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v3 1/2] [GSoC] dir_iterator: iterate over dir after its contents
  2017-03-25 18:12 ` [PATCH v3 1/2] [GSoC] dir_iterator: iterate over dir after its contents Daniel Ferreira
  2017-03-26 22:40   ` Michael Haggerty
@ 2017-03-26 22:48   ` Michael Haggerty
  1 sibling, 0 replies; 6+ messages in thread
From: Michael Haggerty @ 2017-03-26 22:48 UTC (permalink / raw)
  To: Daniel Ferreira, git; +Cc: gitster, sbeller, pclouds

On 03/25/2017 07:12 PM, Daniel Ferreira wrote:
> Create an option for the dir_iterator API to iterate over a directory
> path only after having iterated through its contents. This feature was
> predicted, although not implemented by 0fe5043 ("dir_iterator: new API
> for iterating over a directory tree", 2016-06-18).
> 
> This is useful for recursively removing a directory and calling rmdir()
> on a directory only after all of its contents have been wiped.
> 
> An "options" member has been added to the dir_iterator struct. It
> contains the "iterate_dirs_after_files" flag, that enables the feature
> when set to 1. Default behavior continues to be iterating over directory
> paths before its contents.
> 
> Two inline functions have been added to dir_iterator's code to avoid
> code repetition inside dir_iterator_advance() and make code more clear.
> 
> No particular functions or wrappers for setting the options struct's
> fields have been added to avoid either breaking the current dir_iterator
> API or over-engineering an extremely simple option architecture.
> 
> Signed-off-by: Daniel Ferreira <bnmvco@gmail.com>
> ---
>  dir-iterator.c | 100 ++++++++++++++++++++++++++++++++++++++++++++-------------
>  dir-iterator.h |   7 ++++
>  2 files changed, 84 insertions(+), 23 deletions(-)
> 
> [...]
> diff --git a/dir-iterator.h b/dir-iterator.h
> index 27739e6..4304913 100644
> --- a/dir-iterator.h
> +++ b/dir-iterator.h
> @@ -38,7 +38,14 @@
>   * dir_iterator_advance() again.
>   */
> 
> +struct dir_iterator_options {
> +	unsigned iterate_dirs_after_files : 1;
> +};
> +
>  struct dir_iterator {
> +	/* Options for dir_iterator */
> +	struct dir_iterator_options options;
> +
>  	/* The current path: */
>  	struct strbuf path;

Another thing I noticed: the name of this option,
`iterate_dirs_after_files`, is a little bit misleading. If I understand
correctly, it doesn't make the iteration process files before
directories within a single directory; rather, it ensures that
subdirectories and their contents are processed before the containing
directory. Therefore, a better name might be something like "depth_first".

I should mention that I like the overall idea to add this new feature
and use it to simplify `remove_subtree()`.

Michael


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v3 0/2] [GSoC] remove_subtree(): reimplement using iterators
  2017-03-25 18:12 [PATCH v3 0/2] [GSoC] remove_subtree(): reimplement using iterators Daniel Ferreira
  2017-03-25 18:12 ` [PATCH v3 1/2] [GSoC] dir_iterator: iterate over dir after its contents Daniel Ferreira
  2017-03-25 18:12 ` [PATCH v3 2/2] [GSoC] remove_subtree(): reimplement using iterators Daniel Ferreira
@ 2017-03-28 17:13 ` Stefan Beller
  2 siblings, 0 replies; 6+ messages in thread
From: Stefan Beller @ 2017-03-28 17:13 UTC (permalink / raw)
  To: Daniel Ferreira
  Cc: git@vger.kernel.org, Junio C Hamano, Duy Nguyen, Michael Haggerty

On Sat, Mar 25, 2017 at 11:12 AM, Daniel Ferreira <bnmvco@gmail.com> wrote:
> This is the third version of the GSoC microproject
> of refactoring remove_subtree() from recursively using
> readdir() to use dir_iterator. Below are the threads for
> other versions:
>
> v1: https://public-inbox.org/git/CAGZ79kZwT-9mHTiOJ5CEjk2wDFkn6+NcogjX0=vjhsAh16ANYg@mail.gmail.com/T/#mae023e7a7d7626f00e0923833c4359f5af493730
> v2: https://public-inbox.org/git/CACsJy8Dxh-QPBBLfaFWPAWUsbA9GVXA7x+mXLjEvYKhk1zOpig@mail.gmail.com/T/#t
>
> Duy suggested adding features to dir_iterator might go
> beyond the intention of a microproject, but I figured I
> might go for it to learn more about the project.
>
> The dir_iterator reimplementation has been tested in a
> separate binary I created (and linked with libgit.a) to
> reproduce remove_subtree()'s contents. As pointed out in the
> last thread, git's tests for this function were unable to
> catch a daunting bug I had introduced, and I still haven't
> been able to come up with a way to reproduce remove_subtree()
> being called. Any help?
>

I would think a test llike the following would work:

test_expect_success 'remove nested subtrees' '
    test_commit initial &&
    mkdir -p dir/with/nested/dir &&
    echo content >dir/with/nested/dir/file &&
    echo content >dir/file &&
    git add dir/with/nested/dir/file dir/file &&
    git commit -a -m "commit directory structure" &&
    git checkout initial &&
    ! test dir
'

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2017-03-28 17:13 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-03-25 18:12 [PATCH v3 0/2] [GSoC] remove_subtree(): reimplement using iterators Daniel Ferreira
2017-03-25 18:12 ` [PATCH v3 1/2] [GSoC] dir_iterator: iterate over dir after its contents Daniel Ferreira
2017-03-26 22:40   ` Michael Haggerty
2017-03-26 22:48   ` Michael Haggerty
2017-03-25 18:12 ` [PATCH v3 2/2] [GSoC] remove_subtree(): reimplement using iterators Daniel Ferreira
2017-03-28 17:13 ` [PATCH v3 0/2] " Stefan Beller

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).