* [PATCH v3 0/2] [GSoC] remove_subtree(): reimplement using iterators
@ 2017-03-25 18:12 Daniel Ferreira
2017-03-25 18:12 ` [PATCH v3 1/2] [GSoC] dir_iterator: iterate over dir after its contents Daniel Ferreira
` (2 more replies)
0 siblings, 3 replies; 6+ messages in thread
From: Daniel Ferreira @ 2017-03-25 18:12 UTC (permalink / raw)
To: git; +Cc: gitster, sbeller, pclouds, mhagger, Daniel Ferreira
This is the third version of the GSoC microproject
of refactoring remove_subtree() from recursively using
readdir() to use dir_iterator. Below are the threads for
other versions:
v1: https://public-inbox.org/git/CAGZ79kZwT-9mHTiOJ5CEjk2wDFkn6+NcogjX0=vjhsAh16ANYg@mail.gmail.com/T/#mae023e7a7d7626f00e0923833c4359f5af493730
v2: https://public-inbox.org/git/CACsJy8Dxh-QPBBLfaFWPAWUsbA9GVXA7x+mXLjEvYKhk1zOpig@mail.gmail.com/T/#t
Duy suggested adding features to dir_iterator might go
beyond the intention of a microproject, but I figured I
might go for it to learn more about the project.
The dir_iterator reimplementation has been tested in a
separate binary I created (and linked with libgit.a) to
reproduce remove_subtree()'s contents. As pointed out in the
last thread, git's tests for this function were unable to
catch a daunting bug I had introduced, and I still haven't
been able to come up with a way to reproduce remove_subtree()
being called. Any help?
Thank you all again for all the reviews.
Daniel Ferreira (2):
dir_iterator: iterate over dir after its contents
remove_subtree(): reimplement using iterators
dir-iterator.c | 100 ++++++++++++++++++++++++++++++++++++++++++++-------------
dir-iterator.h | 7 ++++
entry.c | 32 +++++++-----------
3 files changed, 95 insertions(+), 44 deletions(-)
--
2.7.4 (Apple Git-66)
^ permalink raw reply [flat|nested] 6+ messages in thread
* [PATCH v3 1/2] [GSoC] dir_iterator: iterate over dir after its contents
2017-03-25 18:12 [PATCH v3 0/2] [GSoC] remove_subtree(): reimplement using iterators Daniel Ferreira
@ 2017-03-25 18:12 ` Daniel Ferreira
2017-03-26 22:40 ` Michael Haggerty
2017-03-26 22:48 ` Michael Haggerty
2017-03-25 18:12 ` [PATCH v3 2/2] [GSoC] remove_subtree(): reimplement using iterators Daniel Ferreira
2017-03-28 17:13 ` [PATCH v3 0/2] " Stefan Beller
2 siblings, 2 replies; 6+ messages in thread
From: Daniel Ferreira @ 2017-03-25 18:12 UTC (permalink / raw)
To: git; +Cc: gitster, sbeller, pclouds, mhagger, Daniel Ferreira
Create an option for the dir_iterator API to iterate over a directory
path only after having iterated through its contents. This feature was
predicted, although not implemented by 0fe5043 ("dir_iterator: new API
for iterating over a directory tree", 2016-06-18).
This is useful for recursively removing a directory and calling rmdir()
on a directory only after all of its contents have been wiped.
An "options" member has been added to the dir_iterator struct. It
contains the "iterate_dirs_after_files" flag, that enables the feature
when set to 1. Default behavior continues to be iterating over directory
paths before its contents.
Two inline functions have been added to dir_iterator's code to avoid
code repetition inside dir_iterator_advance() and make code more clear.
No particular functions or wrappers for setting the options struct's
fields have been added to avoid either breaking the current dir_iterator
API or over-engineering an extremely simple option architecture.
Signed-off-by: Daniel Ferreira <bnmvco@gmail.com>
---
dir-iterator.c | 100 ++++++++++++++++++++++++++++++++++++++++++++-------------
dir-iterator.h | 7 ++++
2 files changed, 84 insertions(+), 23 deletions(-)
diff --git a/dir-iterator.c b/dir-iterator.c
index 34182a9..833d56a 100644
--- a/dir-iterator.c
+++ b/dir-iterator.c
@@ -50,6 +50,43 @@ struct dir_iterator_int {
struct dir_iterator_level *levels;
};
+static inline void push_dir_level(struct dir_iterator_int *iter, struct dir_iterator_level *level)
+{
+ level->dir_state = DIR_STATE_RECURSE;
+ ALLOC_GROW(iter->levels, iter->levels_nr + 1,
+ iter->levels_alloc);
+ level = &iter->levels[iter->levels_nr++];
+ level->initialized = 0;
+}
+
+static inline int pop_dir_level(struct dir_iterator_int *iter, struct dir_iterator_level *level)
+{
+ return --iter->levels_nr;
+}
+
+static inline int set_iterator_data(struct dir_iterator_int *iter, struct dir_iterator_level *level)
+{
+ if (lstat(iter->base.path.buf, &iter->base.st) < 0) {
+ if (errno != ENOENT)
+ warning("error reading path '%s': %s",
+ iter->base.path.buf,
+ strerror(errno));
+ return -1;
+ }
+
+ /*
+ * We have to set these each time because
+ * the path strbuf might have been realloc()ed.
+ */
+ iter->base.relative_path =
+ iter->base.path.buf + iter->levels[0].prefix_len;
+ iter->base.basename =
+ iter->base.path.buf + level->prefix_len;
+ level->dir_state = DIR_STATE_ITER;
+
+ return 0;
+}
+
int dir_iterator_advance(struct dir_iterator *dir_iterator)
{
struct dir_iterator_int *iter =
@@ -77,18 +114,16 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
}
level->initialized = 1;
- } else if (S_ISDIR(iter->base.st.st_mode)) {
+ } else if (S_ISDIR(iter->base.st.st_mode) &&
+ !iter->base.options.iterate_dirs_after_files) {
if (level->dir_state == DIR_STATE_ITER) {
/*
* The directory was just iterated
* over; now prepare to iterate into
- * it.
+ * it (unless an option is set for us
+ * to do otherwise).
*/
- level->dir_state = DIR_STATE_RECURSE;
- ALLOC_GROW(iter->levels, iter->levels_nr + 1,
- iter->levels_alloc);
- level = &iter->levels[iter->levels_nr++];
- level->initialized = 0;
+ push_dir_level(iter, level);
continue;
} else {
/*
@@ -104,7 +139,7 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
* This level is exhausted (or wasn't opened
* successfully); pop up a level.
*/
- if (--iter->levels_nr == 0)
+ if (pop_dir_level(iter, level) == 0)
return dir_iterator_abort(dir_iterator);
continue;
@@ -120,16 +155,33 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
de = readdir(level->dir);
if (!de) {
- /* This level is exhausted; pop up a level. */
+ /* This level is exhausted */
if (errno) {
warning("error reading directory %s: %s",
iter->base.path.buf, strerror(errno));
+ } else if (iter->base.options.iterate_dirs_after_files) {
+ /* If we are handling dirpaths after their contents,
+ * we have to iterate over the directory now that we'll
+ * have finished iterating into it. */
+ level->dir = NULL;
+
+ if (pop_dir_level(iter, level) == 0)
+ return dir_iterator_abort(dir_iterator);
+
+ level = &iter->levels[iter->levels_nr - 1];
+ /* Remove a trailing slash */
+ strbuf_strip_suffix(&iter->base.path, "/");
+
+ if (set_iterator_data(iter, level))
+ continue;
+
+ return ITER_OK;
} else if (closedir(level->dir))
warning("error closing directory %s: %s",
iter->base.path.buf, strerror(errno));
level->dir = NULL;
- if (--iter->levels_nr == 0)
+ if (pop_dir_level(iter, level) == 0)
return dir_iterator_abort(dir_iterator);
break;
}
@@ -138,26 +190,26 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
continue;
strbuf_addstr(&iter->base.path, de->d_name);
- if (lstat(iter->base.path.buf, &iter->base.st) < 0) {
- if (errno != ENOENT)
- warning("error reading path '%s': %s",
- iter->base.path.buf,
- strerror(errno));
+
+ if (set_iterator_data(iter, level))
continue;
- }
/*
- * We have to set these each time because
- * the path strbuf might have been realloc()ed.
+ * If we want to iterate dirs after files, we shall
+ * begin looking into them *before* we return the dir
+ * itself.
*/
- iter->base.relative_path =
- iter->base.path.buf + iter->levels[0].prefix_len;
- iter->base.basename =
- iter->base.path.buf + level->prefix_len;
- level->dir_state = DIR_STATE_ITER;
+ if (S_ISDIR(iter->base.st.st_mode) &&
+ iter->base.options.iterate_dirs_after_files) {
+ push_dir_level(iter, level);
+ goto continue_outer_loop;
+ }
return ITER_OK;
}
+
+continue_outer_loop:
+ ;
}
}
@@ -190,6 +242,8 @@ struct dir_iterator *dir_iterator_begin(const char *path)
if (!path || !*path)
die("BUG: empty path passed to dir_iterator_begin()");
+ iter->base.options.iterate_dirs_after_files = 0;
+
strbuf_init(&iter->base.path, PATH_MAX);
strbuf_addstr(&iter->base.path, path);
diff --git a/dir-iterator.h b/dir-iterator.h
index 27739e6..4304913 100644
--- a/dir-iterator.h
+++ b/dir-iterator.h
@@ -38,7 +38,14 @@
* dir_iterator_advance() again.
*/
+struct dir_iterator_options {
+ unsigned iterate_dirs_after_files : 1;
+};
+
struct dir_iterator {
+ /* Options for dir_iterator */
+ struct dir_iterator_options options;
+
/* The current path: */
struct strbuf path;
--
2.7.4 (Apple Git-66)
^ permalink raw reply related [flat|nested] 6+ messages in thread
* [PATCH v3 2/2] [GSoC] remove_subtree(): reimplement using iterators
2017-03-25 18:12 [PATCH v3 0/2] [GSoC] remove_subtree(): reimplement using iterators Daniel Ferreira
2017-03-25 18:12 ` [PATCH v3 1/2] [GSoC] dir_iterator: iterate over dir after its contents Daniel Ferreira
@ 2017-03-25 18:12 ` Daniel Ferreira
2017-03-28 17:13 ` [PATCH v3 0/2] " Stefan Beller
2 siblings, 0 replies; 6+ messages in thread
From: Daniel Ferreira @ 2017-03-25 18:12 UTC (permalink / raw)
To: git; +Cc: gitster, sbeller, pclouds, mhagger, Daniel Ferreira
Use dir_iterator to traverse through remove_subtree()'s directory tree,
avoiding the need for recursive calls to readdir(). Simplify
remove_subtree()'s code.
A conversion similar in purpose was previously done at 46d092a
("for_each_reflog(): reimplement using iterators", 2016-05-21).
Signed-off-by: Daniel Ferreira <bnmvco@gmail.com>
---
entry.c | 32 +++++++++++---------------------
1 file changed, 11 insertions(+), 21 deletions(-)
diff --git a/entry.c b/entry.c
index c6eea24..670ffeb 100644
--- a/entry.c
+++ b/entry.c
@@ -2,6 +2,8 @@
#include "blob.h"
#include "dir.h"
#include "streaming.h"
+#include "iterator.h"
+#include "dir-iterator.h"
static void create_directories(const char *path, int path_len,
const struct checkout *state)
@@ -46,29 +48,17 @@ static void create_directories(const char *path, int path_len,
static void remove_subtree(struct strbuf *path)
{
- DIR *dir = opendir(path->buf);
- struct dirent *de;
- int origlen = path->len;
-
- if (!dir)
- die_errno("cannot opendir '%s'", path->buf);
- while ((de = readdir(dir)) != NULL) {
- struct stat st;
-
- if (is_dot_or_dotdot(de->d_name))
- continue;
-
- strbuf_addch(path, '/');
- strbuf_addstr(path, de->d_name);
- if (lstat(path->buf, &st))
- die_errno("cannot lstat '%s'", path->buf);
- if (S_ISDIR(st.st_mode))
- remove_subtree(path);
- else if (unlink(path->buf))
+ struct dir_iterator *diter = dir_iterator_begin(path->buf);
+ diter->options.iterate_dirs_after_files = 1;
+
+ while (dir_iterator_advance(diter) == ITER_OK) {
+ if (S_ISDIR(diter->st.st_mode)) {
+ if (rmdir(diter->path.buf))
+ die_errno("cannot rmdir '%s'", path->buf);
+ } else if (unlink(diter->path.buf))
die_errno("cannot unlink '%s'", path->buf);
- strbuf_setlen(path, origlen);
}
- closedir(dir);
+
if (rmdir(path->buf))
die_errno("cannot rmdir '%s'", path->buf);
}
--
2.7.4 (Apple Git-66)
^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [PATCH v3 1/2] [GSoC] dir_iterator: iterate over dir after its contents
2017-03-25 18:12 ` [PATCH v3 1/2] [GSoC] dir_iterator: iterate over dir after its contents Daniel Ferreira
@ 2017-03-26 22:40 ` Michael Haggerty
2017-03-26 22:48 ` Michael Haggerty
1 sibling, 0 replies; 6+ messages in thread
From: Michael Haggerty @ 2017-03-26 22:40 UTC (permalink / raw)
To: Daniel Ferreira, git; +Cc: gitster, sbeller, pclouds
On 03/25/2017 07:12 PM, Daniel Ferreira wrote:
> Create an option for the dir_iterator API to iterate over a directory
> path only after having iterated through its contents. This feature was
> predicted, although not implemented by 0fe5043 ("dir_iterator: new API
> for iterating over a directory tree", 2016-06-18).
>
> This is useful for recursively removing a directory and calling rmdir()
> on a directory only after all of its contents have been wiped.
>
> An "options" member has been added to the dir_iterator struct. It
> contains the "iterate_dirs_after_files" flag, that enables the feature
> when set to 1. Default behavior continues to be iterating over directory
> paths before its contents.
>
> Two inline functions have been added to dir_iterator's code to avoid
> code repetition inside dir_iterator_advance() and make code more clear.
>
> No particular functions or wrappers for setting the options struct's
> fields have been added to avoid either breaking the current dir_iterator
> API or over-engineering an extremely simple option architecture.
This patch would be easier to read if it were split into two: one
extracting the new functions and changing old code to use them, and a
second adding the new functionality. As one patch, is is hard to see
quickly which changes have what purpose.
I also suggest adding a new `unsigned int flags` parameter to
`dir_iterator_begin`. I think that's more natural, because it doesn't
make sense to change the iteration order during an iteration. It's not
much of a problem to change the API given that all callers are in the
same codebase. If you were to forget to convert any callers (or if a
different in-flight patch series were to add a new caller using the old
call style), the compiler would complain, and the problem would be
obvious and easy to fix.
I didn't actually read the patch carefully yet because I don't have time
this evening to seek out the interesting parts in the long diff.
Michael
> [...]
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH v3 1/2] [GSoC] dir_iterator: iterate over dir after its contents
2017-03-25 18:12 ` [PATCH v3 1/2] [GSoC] dir_iterator: iterate over dir after its contents Daniel Ferreira
2017-03-26 22:40 ` Michael Haggerty
@ 2017-03-26 22:48 ` Michael Haggerty
1 sibling, 0 replies; 6+ messages in thread
From: Michael Haggerty @ 2017-03-26 22:48 UTC (permalink / raw)
To: Daniel Ferreira, git; +Cc: gitster, sbeller, pclouds
On 03/25/2017 07:12 PM, Daniel Ferreira wrote:
> Create an option for the dir_iterator API to iterate over a directory
> path only after having iterated through its contents. This feature was
> predicted, although not implemented by 0fe5043 ("dir_iterator: new API
> for iterating over a directory tree", 2016-06-18).
>
> This is useful for recursively removing a directory and calling rmdir()
> on a directory only after all of its contents have been wiped.
>
> An "options" member has been added to the dir_iterator struct. It
> contains the "iterate_dirs_after_files" flag, that enables the feature
> when set to 1. Default behavior continues to be iterating over directory
> paths before its contents.
>
> Two inline functions have been added to dir_iterator's code to avoid
> code repetition inside dir_iterator_advance() and make code more clear.
>
> No particular functions or wrappers for setting the options struct's
> fields have been added to avoid either breaking the current dir_iterator
> API or over-engineering an extremely simple option architecture.
>
> Signed-off-by: Daniel Ferreira <bnmvco@gmail.com>
> ---
> dir-iterator.c | 100 ++++++++++++++++++++++++++++++++++++++++++++-------------
> dir-iterator.h | 7 ++++
> 2 files changed, 84 insertions(+), 23 deletions(-)
>
> [...]
> diff --git a/dir-iterator.h b/dir-iterator.h
> index 27739e6..4304913 100644
> --- a/dir-iterator.h
> +++ b/dir-iterator.h
> @@ -38,7 +38,14 @@
> * dir_iterator_advance() again.
> */
>
> +struct dir_iterator_options {
> + unsigned iterate_dirs_after_files : 1;
> +};
> +
> struct dir_iterator {
> + /* Options for dir_iterator */
> + struct dir_iterator_options options;
> +
> /* The current path: */
> struct strbuf path;
Another thing I noticed: the name of this option,
`iterate_dirs_after_files`, is a little bit misleading. If I understand
correctly, it doesn't make the iteration process files before
directories within a single directory; rather, it ensures that
subdirectories and their contents are processed before the containing
directory. Therefore, a better name might be something like "depth_first".
I should mention that I like the overall idea to add this new feature
and use it to simplify `remove_subtree()`.
Michael
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH v3 0/2] [GSoC] remove_subtree(): reimplement using iterators
2017-03-25 18:12 [PATCH v3 0/2] [GSoC] remove_subtree(): reimplement using iterators Daniel Ferreira
2017-03-25 18:12 ` [PATCH v3 1/2] [GSoC] dir_iterator: iterate over dir after its contents Daniel Ferreira
2017-03-25 18:12 ` [PATCH v3 2/2] [GSoC] remove_subtree(): reimplement using iterators Daniel Ferreira
@ 2017-03-28 17:13 ` Stefan Beller
2 siblings, 0 replies; 6+ messages in thread
From: Stefan Beller @ 2017-03-28 17:13 UTC (permalink / raw)
To: Daniel Ferreira
Cc: git@vger.kernel.org, Junio C Hamano, Duy Nguyen, Michael Haggerty
On Sat, Mar 25, 2017 at 11:12 AM, Daniel Ferreira <bnmvco@gmail.com> wrote:
> This is the third version of the GSoC microproject
> of refactoring remove_subtree() from recursively using
> readdir() to use dir_iterator. Below are the threads for
> other versions:
>
> v1: https://public-inbox.org/git/CAGZ79kZwT-9mHTiOJ5CEjk2wDFkn6+NcogjX0=vjhsAh16ANYg@mail.gmail.com/T/#mae023e7a7d7626f00e0923833c4359f5af493730
> v2: https://public-inbox.org/git/CACsJy8Dxh-QPBBLfaFWPAWUsbA9GVXA7x+mXLjEvYKhk1zOpig@mail.gmail.com/T/#t
>
> Duy suggested adding features to dir_iterator might go
> beyond the intention of a microproject, but I figured I
> might go for it to learn more about the project.
>
> The dir_iterator reimplementation has been tested in a
> separate binary I created (and linked with libgit.a) to
> reproduce remove_subtree()'s contents. As pointed out in the
> last thread, git's tests for this function were unable to
> catch a daunting bug I had introduced, and I still haven't
> been able to come up with a way to reproduce remove_subtree()
> being called. Any help?
>
I would think a test llike the following would work:
test_expect_success 'remove nested subtrees' '
test_commit initial &&
mkdir -p dir/with/nested/dir &&
echo content >dir/with/nested/dir/file &&
echo content >dir/file &&
git add dir/with/nested/dir/file dir/file &&
git commit -a -m "commit directory structure" &&
git checkout initial &&
! test dir
'
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2017-03-28 17:13 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-03-25 18:12 [PATCH v3 0/2] [GSoC] remove_subtree(): reimplement using iterators Daniel Ferreira
2017-03-25 18:12 ` [PATCH v3 1/2] [GSoC] dir_iterator: iterate over dir after its contents Daniel Ferreira
2017-03-26 22:40 ` Michael Haggerty
2017-03-26 22:48 ` Michael Haggerty
2017-03-25 18:12 ` [PATCH v3 2/2] [GSoC] remove_subtree(): reimplement using iterators Daniel Ferreira
2017-03-28 17:13 ` [PATCH v3 0/2] " Stefan Beller
Code repositories for project(s) associated with this public inbox
https://80x24.org/mirrors/git.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).