git@vger.kernel.org list mirror (unofficial, one of many)
* [WIP RFC PATCH v2 0/5] clone: dir iterator refactoring with tests
@ 2019-02-26  5:17 Matheus Tavares
  2019-02-26  5:18 ` [WIP RFC PATCH v2 1/5] dir-iterator: add flags parameter to dir_iterator_begin Matheus Tavares
                   ` (11 more replies)
  0 siblings, 12 replies; 127+ messages in thread
From: Matheus Tavares @ 2019-02-26  5:17 UTC (permalink / raw)
  To: git
  Cc: Thomas Gummerer, Ævar Arnfjörð Bjarmason,
	Christian Couder

Ævar sent v1: https://public-inbox.org/git/20190226002625.13022-1-avarab@gmail.com/

Based on Ævar's work and comments, I refactored my previous series[1]
so that clone's original behaviour regarding symlinks is kept untouched.
This version keeps that behaviour, adds tests for it, changes local
clone to use the dir-iterator API and modifies the handling of hidden
paths (which we agreed seems unintentional). I think it addresses all
the comments and concerns raised in this thread:
https://public-inbox.org/git/CAP8UFD2xrfMHNxcmeYf8G+d53SL26N07FFAoDP+e0h3r-tvKQw@mail.gmail.com/
It also addresses the comments I made on the WIP v1.

[1]: https://public-inbox.org/git/20190223190309.6728-1-matheus.bernardino@usp.br/

Matheus Tavares (4):
  dir-iterator: add flags parameter to dir_iterator_begin
  clone: copy hidden paths at local clone
  clone: extract function from copy_or_link_directory
  clone: use dir-iterator to avoid explicit dir traversal

Ævar Arnfjörð Bjarmason (1):
  clone: test for our behavior on odd objects/* content

 builtin/clone.c            | 70 ++++++++++++++++++++++----------------
 dir-iterator.c             | 28 +++++++++++++--
 dir-iterator.h             | 40 +++++++++++++++++-----
 refs/files-backend.c       |  2 +-
 t/t5604-clone-reference.sh | 69 +++++++++++++++++++++++++++++++++++++
 5 files changed, 168 insertions(+), 41 deletions(-)

-- 
2.20.1



* [WIP RFC PATCH v2 1/5] dir-iterator: add flags parameter to dir_iterator_begin
  2019-02-26  5:17 [WIP RFC PATCH v2 0/5] clone: dir iterator refactoring with tests Matheus Tavares
@ 2019-02-26  5:18 ` Matheus Tavares
  2019-02-26 12:01   ` Duy Nguyen
  2019-02-26  5:18 ` [WIP RFC PATCH v2 2/5] clone: test for our behavior on odd objects/* content Matheus Tavares
                   ` (10 subsequent siblings)
  11 siblings, 1 reply; 127+ messages in thread
From: Matheus Tavares @ 2019-02-26  5:18 UTC (permalink / raw)
  To: git
  Cc: Thomas Gummerer, Ævar Arnfjörð Bjarmason,
	Christian Couder, Michael Haggerty, Ramsay Jones,
	Nguyễn Thái Ngọc Duy, Junio C Hamano

Add the possibility of giving flags to dir_iterator_begin to initialize
a dir-iterator with special options.

Currently possible flags are DIR_ITERATOR_PEDANTIC, which makes
dir_iterator_advance abort immediately in the case of an error while
trying to fetch the next entry; and DIR_ITERATOR_FOLLOW_SYMLINKS, which
makes the iteration follow symlinks to directories and include their
contents in the iteration. These new flags will be used in a subsequent
patch.

Also adjust refs/files-backend.c to the new dir_iterator_begin
signature.

Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
---
 dir-iterator.c       | 28 +++++++++++++++++++++++++---
 dir-iterator.h       | 40 ++++++++++++++++++++++++++++++++--------
 refs/files-backend.c |  2 +-
 3 files changed, 58 insertions(+), 12 deletions(-)
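
For readers skimming the diff, the effect of DIR_ITERATOR_FOLLOW_SYMLINKS
on a single entry can be sketched outside of git roughly as below. This is
only an illustration under stated assumptions: the SKETCH_* macros and the
sketch_* functions are made-up names that mirror the flag values in this
patch, and the demo assumes a writable /tmp on a POSIX system.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical stand-ins that mirror the flag values added by this patch. */
#define SKETCH_PEDANTIC        (1u << 0)
#define SKETCH_FOLLOW_SYMLINKS (1u << 1)

/*
 * Fill *st for path: stat() (resolves symlinks) when the follow flag is
 * set, lstat() (describes the link itself) otherwise.
 * Returns 0 on success, -1 on error with errno set.
 */
int sketch_stat_entry(const char *path, unsigned flags, struct stat *st)
{
	assert(path && st);
	if (flags & SKETCH_FOLLOW_SYMLINKS)
		return stat(path, st);
	return lstat(path, st);
}

/*
 * Demo: create <tmp>/real (a directory) and <tmp>/link -> real, then
 * check how the link is classified with and without the follow flag.
 * Returns 0 if the link is seen as a symlink by default and as a
 * directory when following, non-zero otherwise.
 */
int sketch_symlink_demo(void)
{
	char tmpl[] = "/tmp/dir-iter-sketch-XXXXXX";
	char target[256], lnk[256];
	struct stat st_default, st_follow;
	int ok;

	if (!mkdtemp(tmpl))
		return -1;
	snprintf(target, sizeof(target), "%s/real", tmpl);
	snprintf(lnk, sizeof(lnk), "%s/link", tmpl);
	if (mkdir(target, 0777) || symlink("real", lnk))
		return -1;

	ok = !sketch_stat_entry(lnk, 0, &st_default) &&
	     !sketch_stat_entry(lnk, SKETCH_PEDANTIC | SKETCH_FOLLOW_SYMLINKS,
				&st_follow) &&
	     S_ISLNK(st_default.st_mode) &&
	     S_ISDIR(st_follow.st_mode);

	/* Clean up the scratch files before reporting. */
	unlink(lnk);
	rmdir(target);
	rmdir(tmpl);
	return ok ? 0 : 1;
}
```

The flags are independent bits, so a caller can OR them together and the
iterator can test each one with a bitwise AND, as the hunks below do.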

diff --git a/dir-iterator.c b/dir-iterator.c
index f2dcd82fde..17aca8ea41 100644
--- a/dir-iterator.c
+++ b/dir-iterator.c
@@ -48,12 +48,16 @@ struct dir_iterator_int {
 	 * that will be included in this iteration.
 	 */
 	struct dir_iterator_level *levels;
+
+	/* Combination of flags for this dir-iterator */
+	unsigned flags;
 };
 
 int dir_iterator_advance(struct dir_iterator *dir_iterator)
 {
 	struct dir_iterator_int *iter =
 		(struct dir_iterator_int *)dir_iterator;
+	int ret;
 
 	while (1) {
 		struct dir_iterator_level *level =
@@ -71,6 +75,8 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
 
 			level->dir = opendir(iter->base.path.buf);
 			if (!level->dir && errno != ENOENT) {
+				if (iter->flags & DIR_ITERATOR_PEDANTIC)
+					goto error_out;
 				warning("error opening directory %s: %s",
 					iter->base.path.buf, strerror(errno));
 				/* Popping the level is handled below */
@@ -122,6 +128,8 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
 			if (!de) {
 				/* This level is exhausted; pop up a level. */
 				if (errno) {
+					if (iter->flags & DIR_ITERATOR_PEDANTIC)
+						goto error_out;
 					warning("error reading directory %s: %s",
 						iter->base.path.buf, strerror(errno));
 				} else if (closedir(level->dir))
@@ -138,11 +146,20 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
 				continue;
 
 			strbuf_addstr(&iter->base.path, de->d_name);
-			if (lstat(iter->base.path.buf, &iter->base.st) < 0) {
-				if (errno != ENOENT)
+
+			if (iter->flags & DIR_ITERATOR_FOLLOW_SYMLINKS)
+				ret = stat(iter->base.path.buf, &iter->base.st);
+			else
+				ret = lstat(iter->base.path.buf, &iter->base.st);
+
+			if (ret < 0) {
+				if (errno != ENOENT) {
+					if (iter->flags & DIR_ITERATOR_PEDANTIC)
+						goto error_out;
 					warning("error reading path '%s': %s",
 						iter->base.path.buf,
 						strerror(errno));
+				}
 				continue;
 			}
 
@@ -159,6 +176,10 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
 			return ITER_OK;
 		}
 	}
+
+error_out:
+	dir_iterator_abort(dir_iterator);
+	return ITER_ERROR;
 }
 
 int dir_iterator_abort(struct dir_iterator *dir_iterator)
@@ -182,7 +203,7 @@ int dir_iterator_abort(struct dir_iterator *dir_iterator)
 	return ITER_DONE;
 }
 
-struct dir_iterator *dir_iterator_begin(const char *path)
+struct dir_iterator *dir_iterator_begin(const char *path, unsigned flags)
 {
 	struct dir_iterator_int *iter = xcalloc(1, sizeof(*iter));
 	struct dir_iterator *dir_iterator = &iter->base;
@@ -195,6 +216,7 @@ struct dir_iterator *dir_iterator_begin(const char *path)
 
 	ALLOC_GROW(iter->levels, 10, iter->levels_alloc);
 
+	iter->flags = flags;
 	iter->levels_nr = 1;
 	iter->levels[0].initialized = 0;
 
diff --git a/dir-iterator.h b/dir-iterator.h
index 970793d07a..fe9eb9a04b 100644
--- a/dir-iterator.h
+++ b/dir-iterator.h
@@ -6,9 +6,10 @@
 /*
  * Iterate over a directory tree.
  *
- * Iterate over a directory tree, recursively, including paths of all
- * types and hidden paths. Skip "." and ".." entries and don't follow
- * symlinks except for the original path.
+ * With no flags to modify behaviour, iterate over a directory tree,
+ * recursively, including paths of all types and hidden paths. Skip
+ * "." and ".." entries and don't follow symlinks except for the
+ * original path.
  *
  * Every time dir_iterator_advance() is called, update the members of
  * the dir_iterator structure to reflect the next path in the
@@ -19,7 +20,7 @@
  * A typical iteration looks like this:
  *
  *     int ok;
- *     struct iterator *iter = dir_iterator_begin(path);
+ *     struct iterator *iter = dir_iterator_begin(path, 0);
  *
  *     while ((ok = dir_iterator_advance(iter)) == ITER_OK) {
  *             if (want_to_stop_iteration()) {
@@ -40,6 +41,20 @@
  * dir_iterator_advance() again.
  */
 
+/*
+ * Flags for dir_iterator_begin:
+ *
+ * - DIR_ITERATOR_PEDANTIC: override dir-iterator's default behavior
+ *   in case of an error while trying to fetch the next entry, which is
+ *   to emit a warning and keep going. With this flag, resources are
+ *   freed and ITER_ERROR is returned immediately.
+ *
+ * - DIR_ITERATOR_FOLLOW_SYMLINKS: make dir-iterator follow symlinks to
+ *   directories, i.e., iterate over linked directories' contents.
+ */
+#define DIR_ITERATOR_PEDANTIC (1 << 0)
+#define DIR_ITERATOR_FOLLOW_SYMLINKS (1 << 1)
+
 struct dir_iterator {
 	/* The current path: */
 	struct strbuf path;
@@ -59,15 +74,19 @@ struct dir_iterator {
 };
 
 /*
- * Start a directory iteration over path. Return a dir_iterator that
- * holds the internal state of the iteration.
+ * Start a directory iteration over path with the combination of
+ * options specified by flags. Return a dir_iterator that holds the
+ * internal state of the iteration.
  *
  * The iteration includes all paths under path, not including path
  * itself and not including "." or ".." entries.
  *
- * path is the starting directory. An internal copy will be made.
+ * Parameters are:
+ *  - path is the starting directory. An internal copy will be made.
+ *  - flags is a combination of the possible flags to initialize a
+ *    dir-iterator or 0 for default behaviour.
  */
-struct dir_iterator *dir_iterator_begin(const char *path);
+struct dir_iterator *dir_iterator_begin(const char *path, unsigned flags);
 
 /*
  * Advance the iterator to the first or next item and return ITER_OK.
@@ -76,6 +95,11 @@ struct dir_iterator *dir_iterator_begin(const char *path);
  * dir_iterator and associated resources and return ITER_ERROR. It is
  * a bug to use iterator or call this function again after it has
  * returned ITER_DONE or ITER_ERROR.
+ *
+ * Note that whether dir-iterator will return ITER_ERROR when failing
+ * to fetch the next entry or just emit a warning and try to fetch the
+ * next is defined by the 'pedantic' option at dir-iterator's
+ * initialization.
  */
 int dir_iterator_advance(struct dir_iterator *iterator);
 
diff --git a/refs/files-backend.c b/refs/files-backend.c
index dd8abe9185..c3d3b6c454 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -2143,7 +2143,7 @@ static struct ref_iterator *reflog_iterator_begin(struct ref_store *ref_store,
 
 	base_ref_iterator_init(ref_iterator, &files_reflog_iterator_vtable, 0);
 	strbuf_addf(&sb, "%s/logs", gitdir);
-	iter->dir_iterator = dir_iterator_begin(sb.buf);
+	iter->dir_iterator = dir_iterator_begin(sb.buf, 0);
 	iter->ref_store = ref_store;
 	strbuf_release(&sb);
 
-- 
2.20.1



* [WIP RFC PATCH v2 2/5] clone: test for our behavior on odd objects/* content
  2019-02-26  5:17 [WIP RFC PATCH v2 0/5] clone: dir iterator refactoring with tests Matheus Tavares
  2019-02-26  5:18 ` [WIP RFC PATCH v2 1/5] dir-iterator: add flags parameter to dir_iterator_begin Matheus Tavares
@ 2019-02-26  5:18 ` Matheus Tavares
  2019-02-26  5:18 ` [WIP RFC PATCH v2 3/5] clone: copy hidden paths at local clone Matheus Tavares
                   ` (9 subsequent siblings)
  11 siblings, 0 replies; 127+ messages in thread
From: Matheus Tavares @ 2019-02-26  5:18 UTC (permalink / raw)
  To: git
  Cc: Thomas Gummerer, Ævar Arnfjörð Bjarmason,
	Christian Couder, Junio C Hamano, Alex Riesen

From: Ævar Arnfjörð Bjarmason <avarab@gmail.com>

We've implicitly supported .git/objects/* content of symlinks since
approximately forever, and when we do a copy of the repo we transfer
those over, but aren't very consistent about other random stuff we
find depending on if it's a "hidden" file or not.

Let's add a test for that, which shouldn't read as an endorsement of
what we're doing now, just asserts current behavior.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 t/t5604-clone-reference.sh | 60 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 60 insertions(+)

diff --git a/t/t5604-clone-reference.sh b/t/t5604-clone-reference.sh
index 4320082b1b..6f9c77049e 100755
--- a/t/t5604-clone-reference.sh
+++ b/t/t5604-clone-reference.sh
@@ -221,4 +221,64 @@ test_expect_success 'clone, dissociate from alternates' '
 	( cd C && git fsck )
 '
 
+test_expect_success SHA1,SYMLINKS 'setup repo with manually symlinked objects/*' '
+	git init S &&
+	(
+		cd S &&
+		test_commit A &&
+		git gc &&
+		test_commit B &&
+		(
+			cd .git/objects &&
+			mv 22/3b7836fb19fdf64ba2d3cd6173c6a283141f78 . &&
+			ln -s ../3b7836fb19fdf64ba2d3cd6173c6a283141f78 22/ &&
+			mv 40 forty &&
+			ln -s forty 40 &&
+			mv pack packs &&
+			ln -s packs pack &&
+			>.some-hidden-file &&
+			>some-file &&
+			mkdir .some-hidden-dir &&
+			>.some-hidden-dir/some-file &&
+			>.some-hidden-dir/.some-dot-file &&
+			mkdir some-dir &&
+			>some-dir/some-file &&
+			>some-dir/.some-dot-file
+		)
+	)
+'
+
+test_expect_success SHA1,SYMLINKS 'clone repo with manually symlinked objects/*' '
+	for option in --local --no-hardlinks --shared --dissociate
+	do
+		git clone $option S S$option || return 1 &&
+		git -C S$option fsck || return 1
+	done &&
+	find S-* -type l | sort >actual &&
+	cat >expected <<-EOF &&
+	S--dissociate/.git/objects/22/3b7836fb19fdf64ba2d3cd6173c6a283141f78
+	S--local/.git/objects/22/3b7836fb19fdf64ba2d3cd6173c6a283141f78
+	EOF
+	test_cmp expected actual &&
+	find S-* -name "*some*" | sort >actual &&
+	cat >expected <<-EOF &&
+	S--dissociate/.git/objects/.some-hidden-file
+	S--dissociate/.git/objects/some-dir
+	S--dissociate/.git/objects/some-dir/.some-dot-file
+	S--dissociate/.git/objects/some-dir/some-file
+	S--dissociate/.git/objects/some-file
+	S--local/.git/objects/.some-hidden-file
+	S--local/.git/objects/some-dir
+	S--local/.git/objects/some-dir/.some-dot-file
+	S--local/.git/objects/some-dir/some-file
+	S--local/.git/objects/some-file
+	S--no-hardlinks/.git/objects/.some-hidden-file
+	S--no-hardlinks/.git/objects/some-dir
+	S--no-hardlinks/.git/objects/some-dir/.some-dot-file
+	S--no-hardlinks/.git/objects/some-dir/some-file
+	S--no-hardlinks/.git/objects/some-file
+	EOF
+	test_cmp expected actual
+'
+
 test_done
-- 
2.20.1



* [WIP RFC PATCH v2 3/5] clone: copy hidden paths at local clone
  2019-02-26  5:17 [WIP RFC PATCH v2 0/5] clone: dir iterator refactoring with tests Matheus Tavares
  2019-02-26  5:18 ` [WIP RFC PATCH v2 1/5] dir-iterator: add flags parameter to dir_iterator_begin Matheus Tavares
  2019-02-26  5:18 ` [WIP RFC PATCH v2 2/5] clone: test for our behavior on odd objects/* content Matheus Tavares
@ 2019-02-26  5:18 ` Matheus Tavares
  2019-02-26 12:13   ` Duy Nguyen
  2019-02-26  5:18 ` [WIP RFC PATCH v2 4/5] clone: extract function from copy_or_link_directory Matheus Tavares
                   ` (8 subsequent siblings)
  11 siblings, 1 reply; 127+ messages in thread
From: Matheus Tavares @ 2019-02-26  5:18 UTC (permalink / raw)
  To: git
  Cc: Thomas Gummerer, Ævar Arnfjörð Bjarmason,
	Christian Couder, Junio C Hamano

Make the copy_or_link_directory function no longer skip hidden paths.
This function, used to copy .git/objects, currently skips all hidden
directories but not hidden files, which is odd behaviour. This is
probably unintentional: the intention was likely to skip only '.'
and '..', but it ended up accidentally skipping all directories
starting with '.'. Besides being more natural, the new behaviour is more
permissive to the user.

Also adjust the tests to reflect this change.

Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 builtin/clone.c            | 2 +-
 t/t5604-clone-reference.sh | 9 +++++++++
 2 files changed, 10 insertions(+), 1 deletion(-)
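
To make the behaviour change concrete, here is a small standalone sketch
comparing the old and new directory checks. The sketch_* names are made up
for illustration; git has its own is_dot_or_dotdot() helper, which is
re-implemented here only so the example compiles on its own.

```c
#include <assert.h>
#include <string.h>

/* Old check: recurse only into names without a leading dot. */
int sketch_old_should_recurse(const char *name)
{
	assert(name);
	return name[0] != '.';
}

/* Standalone re-implementation of the "." / ".." test. */
int sketch_is_dot_or_dotdot(const char *name)
{
	assert(name);
	return !strcmp(name, ".") || !strcmp(name, "..");
}

/* New check: recurse into everything except "." and "..". */
int sketch_new_should_recurse(const char *name)
{
	return !sketch_is_dot_or_dotdot(name);
}
```

With these, ".some-hidden-dir" is rejected by the old check and accepted by
the new one, while "." and ".." are rejected by both.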

diff --git a/builtin/clone.c b/builtin/clone.c
index 50bde99618..cae069f03b 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -428,7 +428,7 @@ static void copy_or_link_directory(struct strbuf *src, struct strbuf *dest,
 			continue;
 		}
 		if (S_ISDIR(buf.st_mode)) {
-			if (de->d_name[0] != '.')
+			if (!is_dot_or_dotdot(de->d_name))
 				copy_or_link_directory(src, dest,
 						       src_repo, src_baselen);
 			continue;
diff --git a/t/t5604-clone-reference.sh b/t/t5604-clone-reference.sh
index 6f9c77049e..f1a8e74c44 100755
--- a/t/t5604-clone-reference.sh
+++ b/t/t5604-clone-reference.sh
@@ -262,16 +262,25 @@ test_expect_success SHA1,SYMLINKS 'clone repo with manually symlinked objects/*'
 	test_cmp expected actual &&
 	find S-* -name "*some*" | sort >actual &&
 	cat >expected <<-EOF &&
+	S--dissociate/.git/objects/.some-hidden-dir
+	S--dissociate/.git/objects/.some-hidden-dir/.some-dot-file
+	S--dissociate/.git/objects/.some-hidden-dir/some-file
 	S--dissociate/.git/objects/.some-hidden-file
 	S--dissociate/.git/objects/some-dir
 	S--dissociate/.git/objects/some-dir/.some-dot-file
 	S--dissociate/.git/objects/some-dir/some-file
 	S--dissociate/.git/objects/some-file
+	S--local/.git/objects/.some-hidden-dir
+	S--local/.git/objects/.some-hidden-dir/.some-dot-file
+	S--local/.git/objects/.some-hidden-dir/some-file
 	S--local/.git/objects/.some-hidden-file
 	S--local/.git/objects/some-dir
 	S--local/.git/objects/some-dir/.some-dot-file
 	S--local/.git/objects/some-dir/some-file
 	S--local/.git/objects/some-file
+	S--no-hardlinks/.git/objects/.some-hidden-dir
+	S--no-hardlinks/.git/objects/.some-hidden-dir/.some-dot-file
+	S--no-hardlinks/.git/objects/.some-hidden-dir/some-file
 	S--no-hardlinks/.git/objects/.some-hidden-file
 	S--no-hardlinks/.git/objects/some-dir
 	S--no-hardlinks/.git/objects/some-dir/.some-dot-file
-- 
2.20.1



* [WIP RFC PATCH v2 4/5] clone: extract function from copy_or_link_directory
  2019-02-26  5:17 [WIP RFC PATCH v2 0/5] clone: dir iterator refactoring with tests Matheus Tavares
                   ` (2 preceding siblings ...)
  2019-02-26  5:18 ` [WIP RFC PATCH v2 3/5] clone: copy hidden paths at local clone Matheus Tavares
@ 2019-02-26  5:18 ` Matheus Tavares
  2019-02-26 12:18   ` Duy Nguyen
  2019-02-26  5:18 ` [WIP RFC PATCH v2 5/5] clone: use dir-iterator to avoid explicit dir traversal Matheus Tavares
                   ` (7 subsequent siblings)
  11 siblings, 1 reply; 127+ messages in thread
From: Matheus Tavares @ 2019-02-26  5:18 UTC (permalink / raw)
  To: git
  Cc: Thomas Gummerer, Ævar Arnfjörð Bjarmason,
	Christian Couder, Junio C Hamano

Extract the directory creation code from copy_or_link_directory into its
own function named mkdir_if_missing. This change will help remove
copy_or_link_directory's explicit recursion, which will be done in a
following patch. It also makes the code more readable.

Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
---
 builtin/clone.c | 27 +++++++++++++++++++--------
 1 file changed, 19 insertions(+), 8 deletions(-)
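
The semantics of the extracted helper can be tried outside of git with a
sketch like the one below. It is not the function from this patch: it
returns -1 instead of calling die()/die_errno(), sets errno to ENOTDIR for
the "exists and is not a directory" case (an assumption made for the
sketch), and the sketch_* names and the /tmp scratch directory are
illustrative only.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Create a directory at pathname unless one is already there.
 * Returns 0 if pathname is a directory on return, -1 otherwise
 * (errno set).
 */
int sketch_mkdir_if_missing(const char *pathname, mode_t mode)
{
	struct stat st;

	assert(pathname);
	if (!mkdir(pathname, mode))
		return 0;		/* freshly created */
	if (errno != EEXIST)
		return -1;		/* real failure */
	if (stat(pathname, &st))
		return -1;
	if (!S_ISDIR(st.st_mode)) {
		errno = ENOTDIR;	/* exists, but is not a directory */
		return -1;
	}
	return 0;			/* was already there */
}

/* Exercise the three cases; returns 0 if all behave as described. */
int sketch_mkdir_demo(void)
{
	char tmpl[] = "/tmp/mkdir-sketch-XXXXXX";
	char dir[256], file[256];
	int fd, ok;

	if (!mkdtemp(tmpl))
		return -1;
	snprintf(dir, sizeof(dir), "%s/objects", tmpl);
	snprintf(file, sizeof(file), "%s/plain-file", tmpl);
	fd = open(file, O_CREAT | O_WRONLY, 0666);
	if (fd < 0)
		return -1;
	close(fd);

	ok = sketch_mkdir_if_missing(dir, 0777) == 0 &&	/* created */
	     sketch_mkdir_if_missing(dir, 0777) == 0 &&	/* already exists */
	     sketch_mkdir_if_missing(file, 0777) == -1 &&	/* regular file */
	     errno == ENOTDIR;

	unlink(file);
	rmdir(dir);
	rmdir(tmpl);
	return ok ? 0 : 1;
}
```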

diff --git a/builtin/clone.c b/builtin/clone.c
index cae069f03b..fd580fa98d 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -392,6 +392,24 @@ static void copy_alternates(struct strbuf *src, struct strbuf *dst,
 	fclose(in);
 }
 
+static void mkdir_if_missing(const char *pathname, mode_t mode)
+{
+	/*
+	 * Create a dir at pathname unless there's already one.
+	 */
+	struct stat st;
+
+	if (mkdir(pathname, mode)) {
+		if (errno != EEXIST)
+			die_errno(_("failed to create directory '%s'"),
+				  pathname);
+		else if (stat(pathname, &st))
+			die_errno(_("failed to stat '%s'"), pathname);
+		else if (!S_ISDIR(st.st_mode))
+			die(_("%s exists and is not a directory"), pathname);
+	}
+}
+
 static void copy_or_link_directory(struct strbuf *src, struct strbuf *dest,
 				   const char *src_repo, int src_baselen)
 {
@@ -404,14 +422,7 @@ static void copy_or_link_directory(struct strbuf *src, struct strbuf *dest,
 	if (!dir)
 		die_errno(_("failed to open '%s'"), src->buf);
 
-	if (mkdir(dest->buf, 0777)) {
-		if (errno != EEXIST)
-			die_errno(_("failed to create directory '%s'"), dest->buf);
-		else if (stat(dest->buf, &buf))
-			die_errno(_("failed to stat '%s'"), dest->buf);
-		else if (!S_ISDIR(buf.st_mode))
-			die(_("%s exists and is not a directory"), dest->buf);
-	}
+	mkdir_if_missing(dest->buf, 0777);
 
 	strbuf_addch(src, '/');
 	src_len = src->len;
-- 
2.20.1



* [WIP RFC PATCH v2 5/5] clone: use dir-iterator to avoid explicit dir traversal
  2019-02-26  5:17 [WIP RFC PATCH v2 0/5] clone: dir iterator refactoring with tests Matheus Tavares
                   ` (3 preceding siblings ...)
  2019-02-26  5:18 ` [WIP RFC PATCH v2 4/5] clone: extract function from copy_or_link_directory Matheus Tavares
@ 2019-02-26  5:18 ` Matheus Tavares
  2019-02-26 11:35   ` Ævar Arnfjörð Bjarmason
  2019-02-26 12:32   ` Duy Nguyen
  2019-02-26 11:36 ` [WIP RFC PATCH v2 0/5] clone: dir iterator refactoring with tests Ævar Arnfjörð Bjarmason
                   ` (6 subsequent siblings)
  11 siblings, 2 replies; 127+ messages in thread
From: Matheus Tavares @ 2019-02-26  5:18 UTC (permalink / raw)
  To: git
  Cc: Thomas Gummerer, Ævar Arnfjörð Bjarmason,
	Christian Couder, Nguyễn Thái Ngọc Duy,
	Junio C Hamano

Replace the use of the opendir/readdir/closedir API to traverse
directories recursively in copy_or_link_directory with the dir-iterator
API. This simplifies the code and avoids recursive calls to
copy_or_link_directory.

This also makes copy_or_link_directory call die() in case of an
error on readdir or stat inside dir_iterator_advance. Previously it
would just print a warning for errors on stat and ignore errors on
readdir, which isn't nice because a local git clone could succeed
even though the .git/objects copy didn't fully succeed.

Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
---
I could also make the change described in the last paragraph in a separate
patch before this one, but I would have to undo it in this patch because
dir-iterator already implements it. So, IMHO, it would be just noise
and not worth it.

 builtin/clone.c | 45 +++++++++++++++++++++++----------------------
 1 file changed, 23 insertions(+), 22 deletions(-)
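
The new loop reuses the src and dest buffers by truncating them back to
the base directory and appending iter->relative_path on every iteration.
A rough standalone equivalent with a plain char buffer, assuming a
made-up helper name (sketch_set_path) and no strbuf, could look like this:

```c
#include <assert.h>
#include <string.h>

/*
 * Rewrite buf as "<first base_len bytes of buf><relative>", mimicking
 * the strbuf_setlen() + strbuf_addstr() pair used in the loop.
 * Returns 0 on success, -1 if the result would not fit in size bytes.
 */
int sketch_set_path(char *buf, size_t size, size_t base_len,
		    const char *relative)
{
	size_t rel_len;

	assert(buf && relative);
	rel_len = strlen(relative);
	if (base_len + rel_len + 1 > size)
		return -1;
	/* Overwrite whatever followed the base, including the NUL. */
	memcpy(buf + base_len, relative, rel_len + 1);
	return 0;
}
```

Because the path relative to the objects directory is now available
directly, the "info/alternates" comparison no longer needs the
src_baselen offset, which is why that parameter can be dropped.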

diff --git a/builtin/clone.c b/builtin/clone.c
index fd580fa98d..b23ba64c94 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -23,6 +23,8 @@
 #include "transport.h"
 #include "strbuf.h"
 #include "dir.h"
+#include "dir-iterator.h"
+#include "iterator.h"
 #include "sigchain.h"
 #include "branch.h"
 #include "remote.h"
@@ -411,42 +413,37 @@ static void mkdir_if_missing(const char *pathname, mode_t mode)
 }
 
 static void copy_or_link_directory(struct strbuf *src, struct strbuf *dest,
-				   const char *src_repo, int src_baselen)
+				   const char *src_repo)
 {
-	struct dirent *de;
-	struct stat buf;
 	int src_len, dest_len;
-	DIR *dir;
-
-	dir = opendir(src->buf);
-	if (!dir)
-		die_errno(_("failed to open '%s'"), src->buf);
+	struct dir_iterator *iter;
+	int iter_status;
+	struct stat st;
+	unsigned flags;
 
 	mkdir_if_missing(dest->buf, 0777);
 
+	flags = DIR_ITERATOR_PEDANTIC | DIR_ITERATOR_FOLLOW_SYMLINKS;
+	iter = dir_iterator_begin(src->buf, flags);
+
 	strbuf_addch(src, '/');
 	src_len = src->len;
 	strbuf_addch(dest, '/');
 	dest_len = dest->len;
 
-	while ((de = readdir(dir)) != NULL) {
+	while ((iter_status = dir_iterator_advance(iter)) == ITER_OK) {
 		strbuf_setlen(src, src_len);
-		strbuf_addstr(src, de->d_name);
+		strbuf_addstr(src, iter->relative_path);
 		strbuf_setlen(dest, dest_len);
-		strbuf_addstr(dest, de->d_name);
-		if (stat(src->buf, &buf)) {
-			warning (_("failed to stat %s\n"), src->buf);
-			continue;
-		}
-		if (S_ISDIR(buf.st_mode)) {
-			if (!is_dot_or_dotdot(de->d_name))
-				copy_or_link_directory(src, dest,
-						       src_repo, src_baselen);
+		strbuf_addstr(dest, iter->relative_path);
+
+		if (S_ISDIR(iter->st.st_mode)) {
+			mkdir_if_missing(dest->buf, 0777);
 			continue;
 		}
 
 		/* Files that cannot be copied bit-for-bit... */
-		if (!strcmp(src->buf + src_baselen, "/info/alternates")) {
+		if (!strcmp(iter->relative_path, "info/alternates")) {
 			copy_alternates(src, dest, src_repo);
 			continue;
 		}
@@ -463,7 +460,11 @@ static void copy_or_link_directory(struct strbuf *src, struct strbuf *dest,
 		if (copy_file_with_time(dest->buf, src->buf, 0666))
 			die_errno(_("failed to copy file to '%s'"), dest->buf);
 	}
-	closedir(dir);
+
+	if (iter_status != ITER_DONE) {
+		strbuf_setlen(src, src_len);
+		die(_("failed to iterate over '%s'"), src->buf);
+	}
 }
 
 static void clone_local(const char *src_repo, const char *dest_repo)
@@ -481,7 +482,7 @@ static void clone_local(const char *src_repo, const char *dest_repo)
 		get_common_dir(&dest, dest_repo);
 		strbuf_addstr(&src, "/objects");
 		strbuf_addstr(&dest, "/objects");
-		copy_or_link_directory(&src, &dest, src_repo, src.len);
+		copy_or_link_directory(&src, &dest, src_repo);
 		strbuf_release(&src);
 		strbuf_release(&dest);
 	}
-- 
2.20.1



* Re: [WIP RFC PATCH v2 5/5] clone: use dir-iterator to avoid explicit dir traversal
  2019-02-26  5:18 ` [WIP RFC PATCH v2 5/5] clone: use dir-iterator to avoid explicit dir traversal Matheus Tavares
@ 2019-02-26 11:35   ` Ævar Arnfjörð Bjarmason
  2019-02-26 12:32   ` Duy Nguyen
  1 sibling, 0 replies; 127+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2019-02-26 11:35 UTC (permalink / raw)
  To: Matheus Tavares
  Cc: git, Thomas Gummerer, Christian Couder,
	Nguyễn Thái Ngọc Duy, Junio C Hamano


On Tue, Feb 26 2019, Matheus Tavares wrote:

> +	int iter_status;
> +	struct stat st;
> +	unsigned flags;

If you compile with DEVELOPER=1 you'll get compile-time errors when
variables aren't used. Like "st" here.


* Re: [WIP RFC PATCH v2 0/5] clone: dir iterator refactoring with tests
  2019-02-26  5:17 [WIP RFC PATCH v2 0/5] clone: dir iterator refactoring with tests Matheus Tavares
                   ` (4 preceding siblings ...)
  2019-02-26  5:18 ` [WIP RFC PATCH v2 5/5] clone: use dir-iterator to avoid explicit dir traversal Matheus Tavares
@ 2019-02-26 11:36 ` Ævar Arnfjörð Bjarmason
  2019-02-26 12:20   ` Duy Nguyen
  2019-02-26 12:28 ` [RFC PATCH v3 " Ævar Arnfjörð Bjarmason
                   ` (5 subsequent siblings)
  11 siblings, 1 reply; 127+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2019-02-26 11:36 UTC (permalink / raw)
  To: Matheus Tavares; +Cc: git, Thomas Gummerer, Christian Couder


On Tue, Feb 26 2019, Matheus Tavares wrote:

> Ævar sent v1: https://public-inbox.org/git/20190226002625.13022-1-avarab@gmail.com/

Tip: use --in-reply-to with git format-patch so that the whole discussion
stays in the same thread on the ML.


* Re: [WIP RFC PATCH v2 1/5] dir-iterator: add flags parameter to dir_iterator_begin
  2019-02-26  5:18 ` [WIP RFC PATCH v2 1/5] dir-iterator: add flags parameter to dir_iterator_begin Matheus Tavares
@ 2019-02-26 12:01   ` Duy Nguyen
  2019-02-27 13:59     ` Matheus Tavares Bernardino
  0 siblings, 1 reply; 127+ messages in thread
From: Duy Nguyen @ 2019-02-26 12:01 UTC (permalink / raw)
  To: Matheus Tavares
  Cc: Git Mailing List, Thomas Gummerer,
	Ævar Arnfjörð Bjarmason, Christian Couder,
	Michael Haggerty, Ramsay Jones, Junio C Hamano

On Tue, Feb 26, 2019 at 12:18 PM Matheus Tavares
<matheus.bernardino@usp.br> wrote:
>  int dir_iterator_advance(struct dir_iterator *dir_iterator)
>  {
>         struct dir_iterator_int *iter =
>                 (struct dir_iterator_int *)dir_iterator;
> +       int ret;
>
>         while (1) {
>                 struct dir_iterator_level *level =
> @@ -71,6 +75,8 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
>
>                         level->dir = opendir(iter->base.path.buf);
>                         if (!level->dir && errno != ENOENT) {
> +                               if (iter->flags & DIR_ITERATOR_PEDANTIC)
> +                                       goto error_out;
>                                 warning("error opening directory %s: %s",
>                                         iter->base.path.buf, strerror(errno));
>                                 /* Popping the level is handled below */
> @@ -122,6 +128,8 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
>                         if (!de) {
>                                 /* This level is exhausted; pop up a level. */
>                                 if (errno) {
> +                                       if (iter->flags & DIR_ITERATOR_PEDANTIC)
> +                                               goto error_out;
>                                         warning("error reading directory %s: %s",
>                                                 iter->base.path.buf, strerror(errno));
>                                 } else if (closedir(level->dir))
> @@ -138,11 +146,20 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
>                                 continue;
>
>                         strbuf_addstr(&iter->base.path, de->d_name);
> -                       if (lstat(iter->base.path.buf, &iter->base.st) < 0) {
> -                               if (errno != ENOENT)
> +
> +                       if (iter->flags & DIR_ITERATOR_FOLLOW_SYMLINKS)
> +                               ret = stat(iter->base.path.buf, &iter->base.st);
> +                       else
> +                               ret = lstat(iter->base.path.buf, &iter->base.st);
> +
> +                       if (ret < 0) {
> +                               if (errno != ENOENT) {
> +                                       if (iter->flags & DIR_ITERATOR_PEDANTIC)
> +                                               goto error_out;
>                                         warning("error reading path '%s': %s",
>                                                 iter->base.path.buf,
>                                                 strerror(errno));
> +                               }
>                                 continue;
>                         }
>
> @@ -159,6 +176,10 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
>                         return ITER_OK;
>                 }
>         }
> +
> +error_out:
> +       dir_iterator_abort(dir_iterator);

Should this function call this or leave it to the caller? The
description says "free resources" which to me sounds like something
the caller should be aware of. Although if double dir_iterator_abort()
has no bad side effects then I guess we can even leave it here.

PS. files-backend.c does call dir_iterator_abort() unconditionally.
Which sounds like this double-abort pattern should be dealt with even
if that call site does not use the pedantic flag (it could later on,
who knows; don't leave traps behind).
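
To illustrate what "no bad side effects" on a double abort could mean,
here is one possible pattern, sketched with a made-up iterator type. It
is not git's API: dir_iterator_abort() takes a plain pointer, whereas
this sketch assumes a pointer-to-pointer signature so the caller's
variable can be cleared.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical iterator; not git's struct dir_iterator. */
struct sketch_iter {
	int *levels;
};

struct sketch_iter *sketch_iter_begin(void)
{
	struct sketch_iter *it = calloc(1, sizeof(*it));

	if (it && !(it->levels = calloc(10, sizeof(*it->levels)))) {
		free(it);
		it = NULL;
	}
	return it;
}

/*
 * Free the iterator and clear the caller's pointer, so that a second
 * call on the same variable is a harmless no-op.
 */
void sketch_iter_abort(struct sketch_iter **itp)
{
	assert(itp);
	if (!*itp)
		return;
	free((*itp)->levels);
	free(*itp);
	*itp = NULL;
}
```

The other option is to document that the iterator is already freed once
ITER_ERROR is returned and make every caller skip the abort in that case.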

> +       return ITER_ERROR;
>  }
>
> diff --git a/dir-iterator.h b/dir-iterator.h
> index 970793d07a..fe9eb9a04b 100644
> --- a/dir-iterator.h
> +++ b/dir-iterator.h
> @@ -6,9 +6,10 @@
>  /*
>   * Iterate over a directory tree.
>   *
> - * Iterate over a directory tree, recursively, including paths of all
> - * types and hidden paths. Skip "." and ".." entries and don't follow
> - * symlinks except for the original path.
> + * With no flags to modify behaviour, iterate over a directory tree,

Nit but I think we can just skip "With no flags to modify behavior". It's given.

> + * recursively, including paths of all types and hidden paths. Skip
> + * "." and ".." entries and don't follow symlinks except for the
> + * original path.
>   *
>   * Every time dir_iterator_advance() is called, update the members of
>   * the dir_iterator structure to reflect the next path in the
> @@ -19,7 +20,7 @@
>   * A typical iteration looks like this:
>   *
>   *     int ok;
> - *     struct iterator *iter = dir_iterator_begin(path);
> + *     struct iterator *iter = dir_iterator_begin(path, 0);
>   *
>   *     while ((ok = dir_iterator_advance(iter)) == ITER_OK) {
>   *             if (want_to_stop_iteration()) {
> @@ -40,6 +41,20 @@
>   * dir_iterator_advance() again.
>   */
>
> +/*
> + * Flags for dir_iterator_begin:
> + *
> + * - DIR_ITERATOR_PEDANTIC: override dir-iterator's default behavior
> + *   in case of an error while trying to fetch the next entry, which is
> + *   to emit a warning and keep going. With this flag, resources are
> + *   freed and ITER_ERROR is returned immediately.
> + *
> + * - DIR_ITERATOR_FOLLOW_SYMLINKS: make dir-iterator follow symlinks to
> + *   directories, i.e., iterate over linked directories' contents.

Do we really need this flag? If dir-iterator does not follow symlinks,
the caller _can_ check stat data to detect symlinks and look inside
anyway. So this flag is mostly a convenience, and that is only worth
having _if_ it has more than one call site; convenience for a single
call site is just not worth it.

Or is there something else I'm missing?
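
To make the caller-side alternative concrete, here is a minimal sketch
of what "check stat data and look inside anyway" could look like. The
helper name is_symlink_to_dir() is hypothetical and not part of git; it
only uses plain lstat()/stat().

```c
#include <sys/stat.h>

/*
 * Hypothetical helper (not in git): return 1 if path is a symlink
 * whose target is a directory, 0 if it is anything else, and -1 if
 * lstat() or stat() fails (e.g. missing path or dangling symlink).
 */
int is_symlink_to_dir(const char *path)
{
	struct stat st;

	/* lstat() describes the link itself, not its target */
	if (lstat(path, &st) < 0)
		return -1;
	if (!S_ISLNK(st.st_mode))
		return 0;

	/* stat() follows the link and describes the target */
	if (stat(path, &st) < 0)
		return -1;
	return S_ISDIR(st.st_mode) ? 1 : 0;
}
```

A caller iterating without the flag could call something like this on
each entry and start a nested iteration when it returns 1.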

> + */
> +#define DIR_ITERATOR_PEDANTIC (1 << 0)
> +#define DIR_ITERATOR_FOLLOW_SYMLINKS (1 << 1)
> +
-- 
Duy

* Re: [WIP RFC PATCH v2 3/5] clone: copy hidden paths at local clone
  2019-02-26  5:18 ` [WIP RFC PATCH v2 3/5] clone: copy hidden paths at local clone Matheus Tavares
@ 2019-02-26 12:13   ` Duy Nguyen
  0 siblings, 0 replies; 127+ messages in thread
From: Duy Nguyen @ 2019-02-26 12:13 UTC (permalink / raw)
  To: Matheus Tavares
  Cc: Git Mailing List, Thomas Gummerer,
	Ævar Arnfjörð Bjarmason, Christian Couder,
	Junio C Hamano

On Tue, Feb 26, 2019 at 12:18 PM Matheus Tavares
<matheus.bernardino@usp.br> wrote:
>
> Make the copy_or_link_directory function no longer skip hidden paths.

It's actually only hidden directories because of the S_ISDIR check
right above. Not that it matters much...

> This function, used to copy .git/objects, currently skips all hidden
> directories but not hidden files, which is an odd behaviour. The reason
> for that could be unintentional:

This goes back to the very first version of clone.c in 8434c2f1af
(Build in clone - 2008-04-27). If you look at git-clone.sh back then,
which is the version before the C conversion, it does something like
this

    find objects -depth -print | cpio $cpio_quiet_flag -pumd$l "$GIT_DIR/"

and I'm pretty sure 'find' will not attempt to hide anything. So yes, I
think this was just meant to skip '.' and '..' and accidentally skips
more. From that point of view, it's actually a regression, but nobody
ever bothered to hide anything in the 'objects' directory, so nobody
noticed.
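
The difference between the two checks can be sketched as below. The
function names are made up for illustration, and
is_dot_or_dotdot_sketch() is only a stand-in written on the assumption
that git's is_dot_or_dotdot() matches exactly "." and "..".

```c
#include <string.h>

/* Old condition: descend only if the name does not start with '.' */
int old_check_descends(const char *name)
{
	return name[0] != '.';
}

/* Stand-in for git's is_dot_or_dotdot(); an assumption, not git's code */
int is_dot_or_dotdot_sketch(const char *name)
{
	return !strcmp(name, ".") || !strcmp(name, "..");
}

/* New condition: descend into everything except "." and ".." */
int new_check_descends(const char *name)
{
	return !is_dot_or_dotdot_sketch(name);
}
```

With these, ".some-hidden-dir" is skipped by the old check but visited
by the new one, while "." and ".." are skipped by both.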

> probably the intention was to skip '.'
> and '..' only but it ended up accidentally skipping all directories
> starting with '.'. Besides being more natural, the new behaviour is more
> permissive to the user.
>
> Also adjusted tests to reflect this change.
>
> Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
> ---
>  builtin/clone.c            | 2 +-
>  t/t5604-clone-reference.sh | 9 +++++++++
>  2 files changed, 10 insertions(+), 1 deletion(-)
>
> diff --git a/builtin/clone.c b/builtin/clone.c
> index 50bde99618..cae069f03b 100644
> --- a/builtin/clone.c
> +++ b/builtin/clone.c
> @@ -428,7 +428,7 @@ static void copy_or_link_directory(struct strbuf *src, struct strbuf *dest,
>                         continue;
>                 }
>                 if (S_ISDIR(buf.st_mode)) {
> -                       if (de->d_name[0] != '.')
> +                       if (!is_dot_or_dotdot(de->d_name))
>                                 copy_or_link_directory(src, dest,
>                                                        src_repo, src_baselen);
>                         continue;
> diff --git a/t/t5604-clone-reference.sh b/t/t5604-clone-reference.sh
> index 6f9c77049e..f1a8e74c44 100755
> --- a/t/t5604-clone-reference.sh
> +++ b/t/t5604-clone-reference.sh
> @@ -262,16 +262,25 @@ test_expect_success SHA1,SYMLINKS 'clone repo with manually symlinked objects/*'
>         test_cmp expected actual &&
>         find S-* -name "*some*" | sort >actual &&
>         cat >expected <<-EOF &&
> +       S--dissociate/.git/objects/.some-hidden-dir
> +       S--dissociate/.git/objects/.some-hidden-dir/.some-dot-file
> +       S--dissociate/.git/objects/.some-hidden-dir/some-file
>         S--dissociate/.git/objects/.some-hidden-file
>         S--dissociate/.git/objects/some-dir
>         S--dissociate/.git/objects/some-dir/.some-dot-file
>         S--dissociate/.git/objects/some-dir/some-file
>         S--dissociate/.git/objects/some-file
> +       S--local/.git/objects/.some-hidden-dir
> +       S--local/.git/objects/.some-hidden-dir/.some-dot-file
> +       S--local/.git/objects/.some-hidden-dir/some-file
>         S--local/.git/objects/.some-hidden-file
>         S--local/.git/objects/some-dir
>         S--local/.git/objects/some-dir/.some-dot-file
>         S--local/.git/objects/some-dir/some-file
>         S--local/.git/objects/some-file
> +       S--no-hardlinks/.git/objects/.some-hidden-dir
> +       S--no-hardlinks/.git/objects/.some-hidden-dir/.some-dot-file
> +       S--no-hardlinks/.git/objects/.some-hidden-dir/some-file
>         S--no-hardlinks/.git/objects/.some-hidden-file
>         S--no-hardlinks/.git/objects/some-dir
>         S--no-hardlinks/.git/objects/some-dir/.some-dot-file
> --
> 2.20.1
>


-- 
Duy

* Re: [WIP RFC PATCH v2 4/5] clone: extract function from copy_or_link_directory
  2019-02-26  5:18 ` [WIP RFC PATCH v2 4/5] clone: extract function from copy_or_link_directory Matheus Tavares
@ 2019-02-26 12:18   ` Duy Nguyen
  2019-02-27 17:30     ` Matheus Tavares Bernardino
  0 siblings, 1 reply; 127+ messages in thread
From: Duy Nguyen @ 2019-02-26 12:18 UTC (permalink / raw)
  To: Matheus Tavares
  Cc: Git Mailing List, Thomas Gummerer,
	Ævar Arnfjörð Bjarmason, Christian Couder,
	Junio C Hamano

On Tue, Feb 26, 2019 at 12:18 PM Matheus Tavares
<matheus.bernardino@usp.br> wrote:
>
> Extract dir creation code snippet from copy_or_link_directory to its own
> function named mkdir_if_missing. This change will help removing
> copy_or_link_directory's explicit recursion, which will be done in a
> following patch. Also makes code more readable.
>
> Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
> ---
>  builtin/clone.c | 27 +++++++++++++++++++--------
>  1 file changed, 19 insertions(+), 8 deletions(-)
>
> diff --git a/builtin/clone.c b/builtin/clone.c
> index cae069f03b..fd580fa98d 100644
> --- a/builtin/clone.c
> +++ b/builtin/clone.c
> @@ -392,6 +392,24 @@ static void copy_alternates(struct strbuf *src, struct strbuf *dst,
>         fclose(in);
>  }
>
> +static void mkdir_if_missing(const char *pathname, mode_t mode)
> +{
> +       /*
> +        * Create a dir at pathname unless there's already one.

This confused me for a second because I thought it described the "st"
variable. I think we usually put the description of a function on
top (before the "void mkdir_if.." line). But with such a short
function and a clear name like this, I don't think we need any comments.

> +        */
> +       struct stat st;
> +
> +       if (mkdir(pathname, mode)) {

Good opportunity to unindent this by doing

    if (!mkdir(...
         return;

but it's up to you.
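
Spelled out, the unindented shape could look roughly like this. It is
only a sketch: to stay self-contained it reports errors with fprintf()
and returns -1 where the real function would call die_errno()/die(),
and the name mkdir_if_missing_sketch() is made up.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>

/*
 * Return 0 if pathname is a directory after the call, -1 otherwise.
 * (git's version would die() instead of returning -1.)
 */
int mkdir_if_missing_sketch(const char *pathname, mode_t mode)
{
	struct stat st;

	if (!mkdir(pathname, mode))
		return 0;	/* created it; nothing else to check */

	if (errno != EEXIST) {
		fprintf(stderr, "failed to create directory '%s': %s\n",
			pathname, strerror(errno));
		return -1;
	}
	if (stat(pathname, &st)) {
		fprintf(stderr, "failed to stat '%s': %s\n",
			pathname, strerror(errno));
		return -1;
	}
	if (!S_ISDIR(st.st_mode)) {
		fprintf(stderr, "%s exists and is not a directory\n",
			pathname);
		return -1;
	}
	return 0;
}
```

The early return keeps the error handling at one indentation level,
which is the whole point of the suggestion.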

> +               if (errno != EEXIST)
> +                       die_errno(_("failed to create directory '%s'"),
> +                                 pathname);
> +               else if (stat(pathname, &st))
> +                       die_errno(_("failed to stat '%s'"), pathname);
> +               else if (!S_ISDIR(st.st_mode))
> +                       die(_("%s exists and is not a directory"), pathname);
> +       }
> +}
> +
>  static void copy_or_link_directory(struct strbuf *src, struct strbuf *dest,
>                                    const char *src_repo, int src_baselen)
>  {
> @@ -404,14 +422,7 @@ static void copy_or_link_directory(struct strbuf *src, struct strbuf *dest,
>         if (!dir)
>                 die_errno(_("failed to open '%s'"), src->buf);
>
> -       if (mkdir(dest->buf, 0777)) {
> -               if (errno != EEXIST)
> -                       die_errno(_("failed to create directory '%s'"), dest->buf);
> -               else if (stat(dest->buf, &buf))
> -                       die_errno(_("failed to stat '%s'"), dest->buf);
> -               else if (!S_ISDIR(buf.st_mode))
> -                       die(_("%s exists and is not a directory"), dest->buf);
> -       }
> +       mkdir_if_missing(dest->buf, 0777);
>
>         strbuf_addch(src, '/');
>         src_len = src->len;
> --
> 2.20.1
>


-- 
Duy

* Re: [WIP RFC PATCH v2 0/5] clone: dir iterator refactoring with tests
  2019-02-26 11:36 ` [WIP RFC PATCH v2 0/5] clone: dir iterator refactoring with tests Ævar Arnfjörð Bjarmason
@ 2019-02-26 12:20   ` Duy Nguyen
  0 siblings, 0 replies; 127+ messages in thread
From: Duy Nguyen @ 2019-02-26 12:20 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Matheus Tavares, Git Mailing List, Thomas Gummerer, Christian Couder

On Tue, Feb 26, 2019 at 6:36 PM Ævar Arnfjörð Bjarmason
<avarab@gmail.com> wrote:
>
>
> On Tue, Feb 26 2019, Matheus Tavares wrote:
>
> > Ævar sent v1: https://public-inbox.org/git/20190226002625.13022-1-avarab@gmail.com/
>
> Tip. Use --in-reply-to with git format-patch, then the whole discussion
> goes in the same thread on the ML.

Better tip. Click on v1 link, look for the "reply" link, click it and
copy the whole format-patch command :D
-- 
Duy

* [RFC PATCH v3 0/5] clone: dir iterator refactoring with tests
  2019-02-26  5:17 [WIP RFC PATCH v2 0/5] clone: dir iterator refactoring with tests Matheus Tavares
                   ` (5 preceding siblings ...)
  2019-02-26 11:36 ` [WIP RFC PATCH v2 0/5] clone: dir iterator refactoring with tests Ævar Arnfjörð Bjarmason
@ 2019-02-26 12:28 ` " Ævar Arnfjörð Bjarmason
  2019-02-26 20:56   ` Matheus Tavares Bernardino
  2019-03-22 23:22   ` [GSoC][PATCH v4 0/7] clone: dir-iterator " Matheus Tavares
  2019-02-26 12:28 ` [RFC PATCH v3 1/5] clone: test for our behavior on odd objects/* content Ævar Arnfjörð Bjarmason
                   ` (4 subsequent siblings)
  11 siblings, 2 replies; 127+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2019-02-26 12:28 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Nguyễn Thái Ngọc Duy,
	Matheus Tavares, Thomas Gummerer,
	Ævar Arnfjörð Bjarmason

There's still active review going on for the "v2"[1], in particular
Duy's comments coming in as I write this. This doesn't address any of
that.

What it does do is have a better version of my patch to add tests for
the current behavior. It no longer relies on the SHA1 prereq, and we
can test the dotfiles without needing the SYMLINKS prereq.

I also moved it to the front of the series just to document/make sure
that we start by testing the existing functionality. I ran a full test
suite run for each of these patches and they all pass.

The only other change is getting rid of an unused "struct stat st"
variable which errored out under DEVELOPER=1.

1. https://public-inbox.org/git/20190226051804.10631-1-matheus.bernardino@usp.br/

Matheus Tavares (4):
  dir-iterator: add flags parameter to dir_iterator_begin
  clone: copy hidden paths at local clone
  clone: extract function from copy_or_link_directory
  clone: use dir-iterator to avoid explicit dir traversal

Ævar Arnfjörð Bjarmason (1):
  clone: test for our behavior on odd objects/* content

 builtin/clone.c            |  69 ++++++++++-------
 dir-iterator.c             |  28 ++++++-
 dir-iterator.h             |  40 ++++++++--
 refs/files-backend.c       |   2 +-
 t/t5604-clone-reference.sh | 151 +++++++++++++++++++++++++++++++++++++
 5 files changed, 249 insertions(+), 41 deletions(-)

-- 
2.21.0.rc2.261.ga7da99ff1b


* [RFC PATCH v3 1/5] clone: test for our behavior on odd objects/* content
  2019-02-26  5:17 [WIP RFC PATCH v2 0/5] clone: dir iterator refactoring with tests Matheus Tavares
                   ` (6 preceding siblings ...)
  2019-02-26 12:28 ` [RFC PATCH v3 " Ævar Arnfjörð Bjarmason
@ 2019-02-26 12:28 ` Ævar Arnfjörð Bjarmason
  2019-02-28 21:19   ` Matheus Tavares Bernardino
  2019-02-26 12:28 ` [RFC PATCH v3 2/5] dir-iterator: add flags parameter to dir_iterator_begin Ævar Arnfjörð Bjarmason
                   ` (3 subsequent siblings)
  11 siblings, 1 reply; 127+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2019-02-26 12:28 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Nguyễn Thái Ngọc Duy,
	Matheus Tavares, Thomas Gummerer,
	Ævar Arnfjörð Bjarmason

Add tests for what happens when we locally clone .git/objects
directories where some of the loose objects or packs are symlinked, or
when there are unknown files there.

I'm bending over backwards here to avoid a SHA1 dependency. See [1]
for an earlier and simpler version that hardcoded SHA-1s.

This behavior has been the same for a *long* time, but hasn't been
tested for.

There's a good post-hoc argument to be made for copying over unknown
things, e.g. I'd like a git version that doesn't know about the
commit-graph to copy it under "clone --local" so a newer git version
can make use of it.

But the behavior shown here with symlinks seems pretty
random. E.g. if "pack" is a symlink we end up with two copies of the
contents, and only transfer some symlinks as-is.

In follow-up commits we'll look at changing some of this behavior, but
for now let's just assert it as-is so we'll notice what we'll change
later.

1. https://public-inbox.org/git/20190226002625.13022-5-avarab@gmail.com/

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 t/t5604-clone-reference.sh | 142 +++++++++++++++++++++++++++++++++++++
 1 file changed, 142 insertions(+)

diff --git a/t/t5604-clone-reference.sh b/t/t5604-clone-reference.sh
index 4320082b1b..cb0dc22d14 100755
--- a/t/t5604-clone-reference.sh
+++ b/t/t5604-clone-reference.sh
@@ -221,4 +221,146 @@ test_expect_success 'clone, dissociate from alternates' '
 	( cd C && git fsck )
 '
 
+test_expect_success 'setup repo with garbage in objects/*' '
+	git init S &&
+	(
+		cd S &&
+		test_commit A &&
+
+		cd .git/objects &&
+		>.some-hidden-file &&
+		>some-file &&
+		mkdir .some-hidden-dir &&
+		>.some-hidden-dir/some-file &&
+		>.some-hidden-dir/.some-dot-file &&
+		mkdir some-dir &&
+		>some-dir/some-file &&
+		>some-dir/.some-dot-file
+	)
+'
+
+test_expect_success 'clone a repo with garbage in objects/*' '
+	for option in --local --no-hardlinks --shared --dissociate
+	do
+		git clone $option S S$option || return 1 &&
+		git -C S$option fsck || return 1
+	done &&
+	find S-* -name "*some*" | sort >actual &&
+	cat >expected <<-EOF &&
+	S--dissociate/.git/objects/.some-hidden-file
+	S--dissociate/.git/objects/some-dir
+	S--dissociate/.git/objects/some-dir/.some-dot-file
+	S--dissociate/.git/objects/some-dir/some-file
+	S--dissociate/.git/objects/some-file
+	S--local/.git/objects/.some-hidden-file
+	S--local/.git/objects/some-dir
+	S--local/.git/objects/some-dir/.some-dot-file
+	S--local/.git/objects/some-dir/some-file
+	S--local/.git/objects/some-file
+	S--no-hardlinks/.git/objects/.some-hidden-file
+	S--no-hardlinks/.git/objects/some-dir
+	S--no-hardlinks/.git/objects/some-dir/.some-dot-file
+	S--no-hardlinks/.git/objects/some-dir/some-file
+	S--no-hardlinks/.git/objects/some-file
+	EOF
+	test_cmp expected actual
+'
+
+test_expect_success SYMLINKS 'setup repo with manually symlinked objects/*' '
+	git init T &&
+	(
+		cd T &&
+		test_commit A &&
+		git gc &&
+		(
+			cd .git/objects &&
+			mv pack packs &&
+			ln -s packs pack
+		) &&
+		test_commit B &&
+		(
+			cd .git/objects &&
+			find ?? -type d >loose-dirs &&
+			last_loose=$(tail -n 1 loose-dirs) &&
+			mv $last_loose a-loose-dir &&
+			ln -s a-loose-dir $last_loose &&
+			first_loose=$(head -n 1 loose-dirs) &&
+			(
+				cd $first_loose &&
+				obj=$(ls *) &&
+				mv $obj ../an-object &&
+				ln -s ../an-object $obj
+			) &&
+			find . -type f | sort >../../../T.objects-files.raw &&
+			find . -type l | sort >../../../T.objects-links.raw
+		)
+	) &&
+	git -C T fsck &&
+	git -C T rev-list --all --objects >T.objects
+'
+
+
+test_expect_success SYMLINKS 'clone repo with symlinked objects/*' '
+	for option in --local --no-hardlinks --shared --dissociate
+	do
+		git clone $option T T$option || return 1 &&
+		git -C T$option fsck || return 1 &&
+		git -C T$option rev-list --all --objects >T$option.objects &&
+		test_cmp T.objects T$option.objects &&
+		(
+			cd T$option/.git/objects &&
+			find . -type f | sort >../../../T$option.objects-files.raw &&
+			find . -type l | sort >../../../T$option.objects-links.raw
+		)
+	done &&
+
+	for raw in $(ls T*.raw)
+	do
+		sed -e "s!/..\$!/X!; s!/../!/Y/!; s![0-9a-f]\{38,\}!Z!" <$raw >$raw.de-sha || return 1
+	done &&
+
+	cat >expected-files <<-EOF &&
+	./Y/Z
+	./a-loose-dir/Z
+	./an-object
+	./Y/Z
+	./info/packs
+	./loose-dirs
+	./pack/pack-Z.idx
+	./pack/pack-Z.pack
+	./packs/pack-Z.idx
+	./packs/pack-Z.pack
+	EOF
+	cat >expected-links <<-EOF &&
+	./Y/Z
+	EOF
+	for option in --local --dissociate
+	do
+		test_cmp expected-files T$option.objects-files.raw.de-sha || return 1 &&
+		test_cmp expected-links T$option.objects-links.raw.de-sha || return 1
+	done &&
+
+	cat >expected-files <<-EOF &&
+	./Y/Z
+	./Y/Z
+	./a-loose-dir/Z
+	./an-object
+	./Y/Z
+	./info/packs
+	./loose-dirs
+	./pack/pack-Z.idx
+	./pack/pack-Z.pack
+	./packs/pack-Z.idx
+	./packs/pack-Z.pack
+	EOF
+	test_cmp expected-files T--no-hardlinks.objects-files.raw.de-sha &&
+	test_must_be_empty T--no-hardlinks.objects-links.raw.de-sha &&
+
+	cat >expected-files <<-EOF &&
+	./info/alternates
+	EOF
+	test_cmp expected-files T--shared.objects-files.raw &&
+	test_must_be_empty T--shared.objects-links.raw
+'
+
 test_done
-- 
2.21.0.rc2.261.ga7da99ff1b


* [RFC PATCH v3 2/5] dir-iterator: add flags parameter to dir_iterator_begin
  2019-02-26  5:17 [WIP RFC PATCH v2 0/5] clone: dir iterator refactoring with tests Matheus Tavares
                   ` (7 preceding siblings ...)
  2019-02-26 12:28 ` [RFC PATCH v3 1/5] clone: test for our behavior on odd objects/* content Ævar Arnfjörð Bjarmason
@ 2019-02-26 12:28 ` Ævar Arnfjörð Bjarmason
  2019-02-26 12:28 ` [RFC PATCH v3 3/5] clone: copy hidden paths at local clone Ævar Arnfjörð Bjarmason
                   ` (2 subsequent siblings)
  11 siblings, 0 replies; 127+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2019-02-26 12:28 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Nguyễn Thái Ngọc Duy,
	Matheus Tavares, Thomas Gummerer

From: Matheus Tavares <matheus.bernardino@usp.br>

Add the possibility of giving flags to dir_iterator_begin to initialize
a dir-iterator with special options.

Currently possible flags are DIR_ITERATOR_PEDANTIC, which makes
dir_iterator_advance abort immediately in the case of an error while
trying to fetch the next entry; and DIR_ITERATOR_FOLLOW_SYMLINKS, which
makes the iteration follow symlinks to directories and include their
contents in the iteration. These new flags will be used in a subsequent
patch.
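
As a rough, standalone illustration of how the two flags described
above steer the iteration (this is not the patch's code; the SKETCH_*
names and both helpers are hypothetical):

```c
#include <sys/stat.h>

/* Hypothetical mirror of the two flag bits added by this patch */
#define SKETCH_PEDANTIC        (1 << 0)
#define SKETCH_FOLLOW_SYMLINKS (1 << 1)

/*
 * Fill *st for path: follow symlinks with stat() when the
 * follow-symlinks bit is set, otherwise describe the link itself
 * with lstat(). Returns what the underlying call returns.
 */
int stat_entry_sketch(const char *path, unsigned flags, struct stat *st)
{
	if (flags & SKETCH_FOLLOW_SYMLINKS)
		return stat(path, st);
	return lstat(path, st);
}

/*
 * After an error on one entry: return 1 to warn and keep going (the
 * default), or 0 to stop the whole iteration (pedantic).
 */
int keep_going_after_error(unsigned flags)
{
	return !(flags & SKETCH_PEDANTIC);
}
```

Since the flags are independent bits, a caller can pass either one,
both ORed together, or 0 for the default behaviour.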

Also adjust refs/files-backend.c to the new dir_iterator_begin
signature.

Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
---
 dir-iterator.c       | 28 +++++++++++++++++++++++++---
 dir-iterator.h       | 40 ++++++++++++++++++++++++++++++++--------
 refs/files-backend.c |  2 +-
 3 files changed, 58 insertions(+), 12 deletions(-)

diff --git a/dir-iterator.c b/dir-iterator.c
index f2dcd82fde..17aca8ea41 100644
--- a/dir-iterator.c
+++ b/dir-iterator.c
@@ -48,12 +48,16 @@ struct dir_iterator_int {
 	 * that will be included in this iteration.
 	 */
 	struct dir_iterator_level *levels;
+
+	/* Combination of flags for this dir-iterator */
+	unsigned flags;
 };
 
 int dir_iterator_advance(struct dir_iterator *dir_iterator)
 {
 	struct dir_iterator_int *iter =
 		(struct dir_iterator_int *)dir_iterator;
+	int ret;
 
 	while (1) {
 		struct dir_iterator_level *level =
@@ -71,6 +75,8 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
 
 			level->dir = opendir(iter->base.path.buf);
 			if (!level->dir && errno != ENOENT) {
+				if (iter->flags & DIR_ITERATOR_PEDANTIC)
+					goto error_out;
 				warning("error opening directory %s: %s",
 					iter->base.path.buf, strerror(errno));
 				/* Popping the level is handled below */
@@ -122,6 +128,8 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
 			if (!de) {
 				/* This level is exhausted; pop up a level. */
 				if (errno) {
+					if (iter->flags & DIR_ITERATOR_PEDANTIC)
+						goto error_out;
 					warning("error reading directory %s: %s",
 						iter->base.path.buf, strerror(errno));
 				} else if (closedir(level->dir))
@@ -138,11 +146,20 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
 				continue;
 
 			strbuf_addstr(&iter->base.path, de->d_name);
-			if (lstat(iter->base.path.buf, &iter->base.st) < 0) {
-				if (errno != ENOENT)
+
+			if (iter->flags & DIR_ITERATOR_FOLLOW_SYMLINKS)
+				ret = stat(iter->base.path.buf, &iter->base.st);
+			else
+				ret = lstat(iter->base.path.buf, &iter->base.st);
+
+			if (ret < 0) {
+				if (errno != ENOENT) {
+					if (iter->flags & DIR_ITERATOR_PEDANTIC)
+						goto error_out;
 					warning("error reading path '%s': %s",
 						iter->base.path.buf,
 						strerror(errno));
+				}
 				continue;
 			}
 
@@ -159,6 +176,10 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
 			return ITER_OK;
 		}
 	}
+
+error_out:
+	dir_iterator_abort(dir_iterator);
+	return ITER_ERROR;
 }
 
 int dir_iterator_abort(struct dir_iterator *dir_iterator)
@@ -182,7 +203,7 @@ int dir_iterator_abort(struct dir_iterator *dir_iterator)
 	return ITER_DONE;
 }
 
-struct dir_iterator *dir_iterator_begin(const char *path)
+struct dir_iterator *dir_iterator_begin(const char *path, unsigned flags)
 {
 	struct dir_iterator_int *iter = xcalloc(1, sizeof(*iter));
 	struct dir_iterator *dir_iterator = &iter->base;
@@ -195,6 +216,7 @@ struct dir_iterator *dir_iterator_begin(const char *path)
 
 	ALLOC_GROW(iter->levels, 10, iter->levels_alloc);
 
+	iter->flags = flags;
 	iter->levels_nr = 1;
 	iter->levels[0].initialized = 0;
 
diff --git a/dir-iterator.h b/dir-iterator.h
index 970793d07a..fe9eb9a04b 100644
--- a/dir-iterator.h
+++ b/dir-iterator.h
@@ -6,9 +6,10 @@
 /*
  * Iterate over a directory tree.
  *
- * Iterate over a directory tree, recursively, including paths of all
- * types and hidden paths. Skip "." and ".." entries and don't follow
- * symlinks except for the original path.
+ * With no flags to modify behaviour, iterate over a directory tree,
+ * recursively, including paths of all types and hidden paths. Skip
+ * "." and ".." entries and don't follow symlinks except for the
+ * original path.
  *
  * Every time dir_iterator_advance() is called, update the members of
  * the dir_iterator structure to reflect the next path in the
@@ -19,7 +20,7 @@
  * A typical iteration looks like this:
  *
  *     int ok;
- *     struct iterator *iter = dir_iterator_begin(path);
+ *     struct iterator *iter = dir_iterator_begin(path, 0);
  *
  *     while ((ok = dir_iterator_advance(iter)) == ITER_OK) {
  *             if (want_to_stop_iteration()) {
@@ -40,6 +41,20 @@
  * dir_iterator_advance() again.
  */
 
+/*
+ * Flags for dir_iterator_begin:
+ *
+ * - DIR_ITERATOR_PEDANTIC: override dir-iterator's default behavior
+ *   in case of an error while trying to fetch the next entry, which is
+ *   to emit a warning and keep going. With this flag, resouces are
+ *   freed and ITER_ERROR is return immediately.
+ *
+ * - DIR_ITERATOR_FOLLOW_SYMLINKS: make dir-iterator follow symlinks to
+ *   directories, i.e., iterate over linked directories' contents.
+ */
+#define DIR_ITERATOR_PEDANTIC (1 << 0)
+#define DIR_ITERATOR_FOLLOW_SYMLINKS (1 << 1)
+
 struct dir_iterator {
 	/* The current path: */
 	struct strbuf path;
@@ -59,15 +74,19 @@ struct dir_iterator {
 };
 
 /*
- * Start a directory iteration over path. Return a dir_iterator that
- * holds the internal state of the iteration.
+ * Start a directory iteration over path with the combination of
+ * options specified by flags. Return a dir_iterator that holds the
+ * internal state of the iteration.
  *
  * The iteration includes all paths under path, not including path
  * itself and not including "." or ".." entries.
  *
- * path is the starting directory. An internal copy will be made.
+ * Parameters are:
+ *  - path is the starting directory. An internal copy will be made.
+ *  - flags is a combination of the possible flags to initialize a
+ *    dir-iterator or 0 for default behaviour.
  */
-struct dir_iterator *dir_iterator_begin(const char *path);
+struct dir_iterator *dir_iterator_begin(const char *path, unsigned flags);
 
 /*
  * Advance the iterator to the first or next item and return ITER_OK.
@@ -76,6 +95,11 @@ struct dir_iterator *dir_iterator_begin(const char *path);
  * dir_iterator and associated resources and return ITER_ERROR. It is
  * a bug to use iterator or call this function again after it has
  * returned ITER_DONE or ITER_ERROR.
+ *
+ * Note that whether dir-iterator will return ITER_ERROR when failing
+ * to fetch the next entry or just emit a warning and try to fetch the
+ * next is defined by the 'pedantic' option at dir-iterator's
+ * initialization.
  */
 int dir_iterator_advance(struct dir_iterator *iterator);
 
diff --git a/refs/files-backend.c b/refs/files-backend.c
index dd8abe9185..c3d3b6c454 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -2143,7 +2143,7 @@ static struct ref_iterator *reflog_iterator_begin(struct ref_store *ref_store,
 
 	base_ref_iterator_init(ref_iterator, &files_reflog_iterator_vtable, 0);
 	strbuf_addf(&sb, "%s/logs", gitdir);
-	iter->dir_iterator = dir_iterator_begin(sb.buf);
+	iter->dir_iterator = dir_iterator_begin(sb.buf, 0);
 	iter->ref_store = ref_store;
 	strbuf_release(&sb);
 
-- 
2.21.0.rc2.261.ga7da99ff1b


* [RFC PATCH v3 3/5] clone: copy hidden paths at local clone
  2019-02-26  5:17 [WIP RFC PATCH v2 0/5] clone: dir iterator refactoring with tests Matheus Tavares
                   ` (8 preceding siblings ...)
  2019-02-26 12:28 ` [RFC PATCH v3 2/5] dir-iterator: add flags parameter to dir_iterator_begin Ævar Arnfjörð Bjarmason
@ 2019-02-26 12:28 ` Ævar Arnfjörð Bjarmason
  2019-02-26 12:28 ` [RFC PATCH v3 4/5] clone: extract function from copy_or_link_directory Ævar Arnfjörð Bjarmason
  2019-02-26 12:28 ` [RFC PATCH v3 5/5] clone: use dir-iterator to avoid explicit dir traversal Ævar Arnfjörð Bjarmason
  11 siblings, 0 replies; 127+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2019-02-26 12:28 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Nguyễn Thái Ngọc Duy,
	Matheus Tavares, Thomas Gummerer,
	Ævar Arnfjörð Bjarmason

From: Matheus Tavares <matheus.bernardino@usp.br>

Make the copy_or_link_directory function no longer skip hidden paths.
This function, used to copy .git/objects, currently skips all hidden
directories but not hidden files, which is odd behaviour. This was
probably unintentional: the intention was likely to skip '.' and '..'
only, but the check ended up skipping all directories starting with
'.'. Besides being more natural, the new behaviour is more permissive
to the user.

Also adjusted tests to reflect this change.

Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 builtin/clone.c            | 2 +-
 t/t5604-clone-reference.sh | 9 +++++++++
 2 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/builtin/clone.c b/builtin/clone.c
index 50bde99618..cae069f03b 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -428,7 +428,7 @@ static void copy_or_link_directory(struct strbuf *src, struct strbuf *dest,
 			continue;
 		}
 		if (S_ISDIR(buf.st_mode)) {
-			if (de->d_name[0] != '.')
+			if (!is_dot_or_dotdot(de->d_name))
 				copy_or_link_directory(src, dest,
 						       src_repo, src_baselen);
 			continue;
diff --git a/t/t5604-clone-reference.sh b/t/t5604-clone-reference.sh
index cb0dc22d14..d650f67ca5 100755
--- a/t/t5604-clone-reference.sh
+++ b/t/t5604-clone-reference.sh
@@ -247,16 +247,25 @@ test_expect_success 'clone a repo with garbage in objects/*' '
 	done &&
 	find S-* -name "*some*" | sort >actual &&
 	cat >expected <<-EOF &&
+	S--dissociate/.git/objects/.some-hidden-dir
+	S--dissociate/.git/objects/.some-hidden-dir/.some-dot-file
+	S--dissociate/.git/objects/.some-hidden-dir/some-file
 	S--dissociate/.git/objects/.some-hidden-file
 	S--dissociate/.git/objects/some-dir
 	S--dissociate/.git/objects/some-dir/.some-dot-file
 	S--dissociate/.git/objects/some-dir/some-file
 	S--dissociate/.git/objects/some-file
+	S--local/.git/objects/.some-hidden-dir
+	S--local/.git/objects/.some-hidden-dir/.some-dot-file
+	S--local/.git/objects/.some-hidden-dir/some-file
 	S--local/.git/objects/.some-hidden-file
 	S--local/.git/objects/some-dir
 	S--local/.git/objects/some-dir/.some-dot-file
 	S--local/.git/objects/some-dir/some-file
 	S--local/.git/objects/some-file
+	S--no-hardlinks/.git/objects/.some-hidden-dir
+	S--no-hardlinks/.git/objects/.some-hidden-dir/.some-dot-file
+	S--no-hardlinks/.git/objects/.some-hidden-dir/some-file
 	S--no-hardlinks/.git/objects/.some-hidden-file
 	S--no-hardlinks/.git/objects/some-dir
 	S--no-hardlinks/.git/objects/some-dir/.some-dot-file
-- 
2.21.0.rc2.261.ga7da99ff1b


* [RFC PATCH v3 4/5] clone: extract function from copy_or_link_directory
  2019-02-26  5:17 [WIP RFC PATCH v2 0/5] clone: dir iterator refactoring with tests Matheus Tavares
                   ` (9 preceding siblings ...)
  2019-02-26 12:28 ` [RFC PATCH v3 3/5] clone: copy hidden paths at local clone Ævar Arnfjörð Bjarmason
@ 2019-02-26 12:28 ` Ævar Arnfjörð Bjarmason
  2019-02-26 12:28 ` [RFC PATCH v3 5/5] clone: use dir-iterator to avoid explicit dir traversal Ævar Arnfjörð Bjarmason
  11 siblings, 0 replies; 127+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2019-02-26 12:28 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Nguyễn Thái Ngọc Duy,
	Matheus Tavares, Thomas Gummerer

From: Matheus Tavares <matheus.bernardino@usp.br>

Extract dir creation code snippet from copy_or_link_directory to its own
function named mkdir_if_missing. This change will help remove
copy_or_link_directory's explicit recursion, which will be done in a
following patch. It also makes the code more readable.

Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
---
 builtin/clone.c | 27 +++++++++++++++++++--------
 1 file changed, 19 insertions(+), 8 deletions(-)

diff --git a/builtin/clone.c b/builtin/clone.c
index cae069f03b..fd580fa98d 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -392,6 +392,24 @@ static void copy_alternates(struct strbuf *src, struct strbuf *dst,
 	fclose(in);
 }
 
+static void mkdir_if_missing(const char *pathname, mode_t mode)
+{
+	/*
+	 * Create a dir at pathname unless there's already one.
+	 */
+	struct stat st;
+
+	if (mkdir(pathname, mode)) {
+		if (errno != EEXIST)
+			die_errno(_("failed to create directory '%s'"),
+				  pathname);
+		else if (stat(pathname, &st))
+			die_errno(_("failed to stat '%s'"), pathname);
+		else if (!S_ISDIR(st.st_mode))
+			die(_("%s exists and is not a directory"), pathname);
+	}
+}
+
 static void copy_or_link_directory(struct strbuf *src, struct strbuf *dest,
 				   const char *src_repo, int src_baselen)
 {
@@ -404,14 +422,7 @@ static void copy_or_link_directory(struct strbuf *src, struct strbuf *dest,
 	if (!dir)
 		die_errno(_("failed to open '%s'"), src->buf);
 
-	if (mkdir(dest->buf, 0777)) {
-		if (errno != EEXIST)
-			die_errno(_("failed to create directory '%s'"), dest->buf);
-		else if (stat(dest->buf, &buf))
-			die_errno(_("failed to stat '%s'"), dest->buf);
-		else if (!S_ISDIR(buf.st_mode))
-			die(_("%s exists and is not a directory"), dest->buf);
-	}
+	mkdir_if_missing(dest->buf, 0777);
 
 	strbuf_addch(src, '/');
 	src_len = src->len;
-- 
2.21.0.rc2.261.ga7da99ff1b


* [RFC PATCH v3 5/5] clone: use dir-iterator to avoid explicit dir traversal
  2019-02-26  5:17 [WIP RFC PATCH v2 0/5] clone: dir iterator refactoring with tests Matheus Tavares
                   ` (10 preceding siblings ...)
  2019-02-26 12:28 ` [RFC PATCH v3 4/5] clone: extract function from copy_or_link_directory Ævar Arnfjörð Bjarmason
@ 2019-02-26 12:28 ` Ævar Arnfjörð Bjarmason
  11 siblings, 0 replies; 127+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2019-02-26 12:28 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Nguyễn Thái Ngọc Duy,
	Matheus Tavares, Thomas Gummerer

From: Matheus Tavares <matheus.bernardino@usp.br>

Replace the use of the opendir/readdir/closedir API to traverse
directories recursively, in the copy_or_link_directory function, with
the dir-iterator API. This simplifies the code and avoids recursive
calls to copy_or_link_directory.

This process also makes copy_or_link_directory call die() in case of an
error on readdir or stat, inside dir_iterator_advance. Previously it
would just print a warning for errors on stat and ignore errors on
readdir, which isn't nice because a local git clone could end up
succeeding even though the .git/objects copy didn't fully succeed.

Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
---
 builtin/clone.c | 44 ++++++++++++++++++++++----------------------
 1 file changed, 22 insertions(+), 22 deletions(-)

diff --git a/builtin/clone.c b/builtin/clone.c
index fd580fa98d..6c07648e49 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -23,6 +23,8 @@
 #include "transport.h"
 #include "strbuf.h"
 #include "dir.h"
+#include "dir-iterator.h"
+#include "iterator.h"
 #include "sigchain.h"
 #include "branch.h"
 #include "remote.h"
@@ -411,42 +413,36 @@ static void mkdir_if_missing(const char *pathname, mode_t mode)
 }
 
 static void copy_or_link_directory(struct strbuf *src, struct strbuf *dest,
-				   const char *src_repo, int src_baselen)
+				   const char *src_repo)
 {
-	struct dirent *de;
-	struct stat buf;
 	int src_len, dest_len;
-	DIR *dir;
-
-	dir = opendir(src->buf);
-	if (!dir)
-		die_errno(_("failed to open '%s'"), src->buf);
+	struct dir_iterator *iter;
+	int iter_status;
+	unsigned flags;
 
 	mkdir_if_missing(dest->buf, 0777);
 
+	flags = DIR_ITERATOR_PEDANTIC | DIR_ITERATOR_FOLLOW_SYMLINKS;
+	iter = dir_iterator_begin(src->buf, flags);
+
 	strbuf_addch(src, '/');
 	src_len = src->len;
 	strbuf_addch(dest, '/');
 	dest_len = dest->len;
 
-	while ((de = readdir(dir)) != NULL) {
+	while ((iter_status = dir_iterator_advance(iter)) == ITER_OK) {
 		strbuf_setlen(src, src_len);
-		strbuf_addstr(src, de->d_name);
+		strbuf_addstr(src, iter->relative_path);
 		strbuf_setlen(dest, dest_len);
-		strbuf_addstr(dest, de->d_name);
-		if (stat(src->buf, &buf)) {
-			warning (_("failed to stat %s\n"), src->buf);
-			continue;
-		}
-		if (S_ISDIR(buf.st_mode)) {
-			if (!is_dot_or_dotdot(de->d_name))
-				copy_or_link_directory(src, dest,
-						       src_repo, src_baselen);
+		strbuf_addstr(dest, iter->relative_path);
+
+		if (S_ISDIR(iter->st.st_mode)) {
+			mkdir_if_missing(dest->buf, 0777);
 			continue;
 		}
 
 		/* Files that cannot be copied bit-for-bit... */
-		if (!strcmp(src->buf + src_baselen, "/info/alternates")) {
+		if (!strcmp(iter->relative_path, "info/alternates")) {
 			copy_alternates(src, dest, src_repo);
 			continue;
 		}
@@ -463,7 +459,11 @@ static void copy_or_link_directory(struct strbuf *src, struct strbuf *dest,
 		if (copy_file_with_time(dest->buf, src->buf, 0666))
 			die_errno(_("failed to copy file to '%s'"), dest->buf);
 	}
-	closedir(dir);
+
+	if (iter_status != ITER_DONE) {
+		strbuf_setlen(src, src_len);
+		die(_("failed to iterate over '%s'"), src->buf);
+	}
 }
 
 static void clone_local(const char *src_repo, const char *dest_repo)
@@ -481,7 +481,7 @@ static void clone_local(const char *src_repo, const char *dest_repo)
 		get_common_dir(&dest, dest_repo);
 		strbuf_addstr(&src, "/objects");
 		strbuf_addstr(&dest, "/objects");
-		copy_or_link_directory(&src, &dest, src_repo, src.len);
+		copy_or_link_directory(&src, &dest, src_repo);
 		strbuf_release(&src);
 		strbuf_release(&dest);
 	}
-- 
2.21.0.rc2.261.ga7da99ff1b


^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [WIP RFC PATCH v2 5/5] clone: use dir-iterator to avoid explicit dir traversal
  2019-02-26  5:18 ` [WIP RFC PATCH v2 5/5] clone: use dir-iterator to avoid explicit dir traversal Matheus Tavares
  2019-02-26 11:35   ` Ævar Arnfjörð Bjarmason
@ 2019-02-26 12:32   ` Duy Nguyen
  2019-02-26 12:50     ` Ævar Arnfjörð Bjarmason
  2019-02-27 17:40     ` Matheus Tavares Bernardino
  1 sibling, 2 replies; 127+ messages in thread
From: Duy Nguyen @ 2019-02-26 12:32 UTC (permalink / raw)
  To: Matheus Tavares
  Cc: Git Mailing List, Thomas Gummerer,
	Ævar Arnfjörð Bjarmason, Christian Couder,
	Junio C Hamano

On Tue, Feb 26, 2019 at 12:18 PM Matheus Tavares
<matheus.bernardino@usp.br> wrote:
>
> Replace the use of the opendir/readdir/closedir API to traverse
> directories recursively, in the copy_or_link_directory function, with
> the dir-iterator API. This simplifies the code and avoids recursive
> calls to copy_or_link_directory.
>
> This process also makes copy_or_link_directory call die() in case of an
> error on readdir or stat, inside dir_iterator_advance. Previously it
> would just print a warning for errors on stat and ignore errors on
> readdir, which isn't nice because a local git clone could end up
> succeeding even though the .git/objects copy didn't fully succeed.
>
> Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
> ---
> I can also make the change described in the last paragraph in a separate
> patch before this one, but I would have to undo it in this patch because
> dir-iterator already implements it. So, IMHO, it would be just noise
> and not worth it.
>
>  builtin/clone.c | 45 +++++++++++++++++++++++----------------------
>  1 file changed, 23 insertions(+), 22 deletions(-)
>
> diff --git a/builtin/clone.c b/builtin/clone.c
> index fd580fa98d..b23ba64c94 100644
> --- a/builtin/clone.c
> +++ b/builtin/clone.c
> @@ -23,6 +23,8 @@
>  #include "transport.h"
>  #include "strbuf.h"
>  #include "dir.h"
> +#include "dir-iterator.h"
> +#include "iterator.h"
>  #include "sigchain.h"
>  #include "branch.h"
>  #include "remote.h"
> @@ -411,42 +413,37 @@ static void mkdir_if_missing(const char *pathname, mode_t mode)
>  }
>
>  static void copy_or_link_directory(struct strbuf *src, struct strbuf *dest,
> -                                  const char *src_repo, int src_baselen)
> +                                  const char *src_repo)
>  {
> -       struct dirent *de;
> -       struct stat buf;
>         int src_len, dest_len;
> -       DIR *dir;
> -
> -       dir = opendir(src->buf);
> -       if (!dir)
> -               die_errno(_("failed to open '%s'"), src->buf);
> +       struct dir_iterator *iter;
> +       int iter_status;
> +       struct stat st;
> +       unsigned flags;
>
>         mkdir_if_missing(dest->buf, 0777);
>
> +       flags = DIR_ITERATOR_PEDANTIC | DIR_ITERATOR_FOLLOW_SYMLINKS;
> +       iter = dir_iterator_begin(src->buf, flags);
> +
>         strbuf_addch(src, '/');
>         src_len = src->len;
>         strbuf_addch(dest, '/');
>         dest_len = dest->len;
>
> -       while ((de = readdir(dir)) != NULL) {
> +       while ((iter_status = dir_iterator_advance(iter)) == ITER_OK) {
>                 strbuf_setlen(src, src_len);
> -               strbuf_addstr(src, de->d_name);
> +               strbuf_addstr(src, iter->relative_path);
>                 strbuf_setlen(dest, dest_len);
> -               strbuf_addstr(dest, de->d_name);
> -               if (stat(src->buf, &buf)) {
> -                       warning (_("failed to stat %s\n"), src->buf);
> -                       continue;
> -               }
> -               if (S_ISDIR(buf.st_mode)) {
> -                       if (!is_dot_or_dotdot(de->d_name))
> -                               copy_or_link_directory(src, dest,
> -                                                      src_repo, src_baselen);
> +               strbuf_addstr(dest, iter->relative_path);
> +
> +               if (S_ISDIR(iter->st.st_mode)) {
> +                       mkdir_if_missing(dest->buf, 0777);

I wonder if this mkdir_if_missing is sufficient. What if you have to
create multiple directories?

Let's say on the first advance we hit "a". Then on the second advance
we hit directory "b/b/b/b"; we would need to mkdir recursively, and
something like safe_create_leading_directories() would be a better fit.

I'm not sure if it can happen though. I haven't re-read dir-iterator
code carefully.

>                         continue;
>                 }
>
>                 /* Files that cannot be copied bit-for-bit... */
> -               if (!strcmp(src->buf + src_baselen, "/info/alternates")) {
> +               if (!strcmp(iter->relative_path, "info/alternates")) {

While we're here, this should be fspathcmp to be friendlier to
case-insensitive filesystems. You probably should fix it in a separate
patch though.

>                         copy_alternates(src, dest, src_repo);
>                         continue;
>                 }
> @@ -463,7 +460,11 @@ static void copy_or_link_directory(struct strbuf *src, struct strbuf *dest,
>                 if (copy_file_with_time(dest->buf, src->buf, 0666))
>                         die_errno(_("failed to copy file to '%s'"), dest->buf);
>         }
> -       closedir(dir);
> +
> +       if (iter_status != ITER_DONE) {
> +               strbuf_setlen(src, src_len);
> +               die(_("failed to iterate over '%s'"), src->buf);
> +       }

I think you need to abort the iterator even when it returns ITER_DONE.
At least that's how the first caller in files-backend.c does it.

>  }
>
>  static void clone_local(const char *src_repo, const char *dest_repo)
> @@ -481,7 +482,7 @@ static void clone_local(const char *src_repo, const char *dest_repo)
>                 get_common_dir(&dest, dest_repo);
>                 strbuf_addstr(&src, "/objects");
>                 strbuf_addstr(&dest, "/objects");
> -               copy_or_link_directory(&src, &dest, src_repo, src.len);
> +               copy_or_link_directory(&src, &dest, src_repo);
>                 strbuf_release(&src);
>                 strbuf_release(&dest);
>         }
> --
> 2.20.1
>


-- 
Duy

^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [WIP RFC PATCH v2 5/5] clone: use dir-iterator to avoid explicit dir traversal
  2019-02-26 12:32   ` Duy Nguyen
@ 2019-02-26 12:50     ` Ævar Arnfjörð Bjarmason
  2019-02-27 17:40     ` Matheus Tavares Bernardino
  1 sibling, 0 replies; 127+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2019-02-26 12:50 UTC (permalink / raw)
  To: Duy Nguyen
  Cc: Matheus Tavares, Git Mailing List, Thomas Gummerer,
	Christian Couder, Junio C Hamano


On Tue, Feb 26 2019, Duy Nguyen wrote:

> On Tue, Feb 26, 2019 at 12:18 PM Matheus Tavares
> <matheus.bernardino@usp.br> wrote:
>>
>> Replace the use of the opendir/readdir/closedir API to traverse
>> directories recursively, in the copy_or_link_directory function, with
>> the dir-iterator API. This simplifies the code and avoids recursive
>> calls to copy_or_link_directory.
>>
>> This process also makes copy_or_link_directory call die() in case of an
>> error on readdir or stat, inside dir_iterator_advance. Previously it
>> would just print a warning for errors on stat and ignore errors on
>> readdir, which isn't nice because a local git clone could end up
>> succeeding even though the .git/objects copy didn't fully succeed.
>>
>> Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
>> ---
>> I can also make the change described in the last paragraph in a separate
>> patch before this one, but I would have to undo it in this patch because
>> dir-iterator already implements it. So, IMHO, it would be just noise
>> and not worth it.
>>
>>  builtin/clone.c | 45 +++++++++++++++++++++++----------------------
>>  1 file changed, 23 insertions(+), 22 deletions(-)
>>
>> diff --git a/builtin/clone.c b/builtin/clone.c
>> index fd580fa98d..b23ba64c94 100644
>> --- a/builtin/clone.c
>> +++ b/builtin/clone.c
>> @@ -23,6 +23,8 @@
>>  #include "transport.h"
>>  #include "strbuf.h"
>>  #include "dir.h"
>> +#include "dir-iterator.h"
>> +#include "iterator.h"
>>  #include "sigchain.h"
>>  #include "branch.h"
>>  #include "remote.h"
>> @@ -411,42 +413,37 @@ static void mkdir_if_missing(const char *pathname, mode_t mode)
>>  }
>>
>>  static void copy_or_link_directory(struct strbuf *src, struct strbuf *dest,
>> -                                  const char *src_repo, int src_baselen)
>> +                                  const char *src_repo)
>>  {
>> -       struct dirent *de;
>> -       struct stat buf;
>>         int src_len, dest_len;
>> -       DIR *dir;
>> -
>> -       dir = opendir(src->buf);
>> -       if (!dir)
>> -               die_errno(_("failed to open '%s'"), src->buf);
>> +       struct dir_iterator *iter;
>> +       int iter_status;
>> +       struct stat st;
>> +       unsigned flags;
>>
>>         mkdir_if_missing(dest->buf, 0777);
>>
>> +       flags = DIR_ITERATOR_PEDANTIC | DIR_ITERATOR_FOLLOW_SYMLINKS;
>> +       iter = dir_iterator_begin(src->buf, flags);
>> +
>>         strbuf_addch(src, '/');
>>         src_len = src->len;
>>         strbuf_addch(dest, '/');
>>         dest_len = dest->len;
>>
>> -       while ((de = readdir(dir)) != NULL) {
>> +       while ((iter_status = dir_iterator_advance(iter)) == ITER_OK) {
>>                 strbuf_setlen(src, src_len);
>> -               strbuf_addstr(src, de->d_name);
>> +               strbuf_addstr(src, iter->relative_path);
>>                 strbuf_setlen(dest, dest_len);
>> -               strbuf_addstr(dest, de->d_name);
>> -               if (stat(src->buf, &buf)) {
>> -                       warning (_("failed to stat %s\n"), src->buf);
>> -                       continue;
>> -               }
>> -               if (S_ISDIR(buf.st_mode)) {
>> -                       if (!is_dot_or_dotdot(de->d_name))
>> -                               copy_or_link_directory(src, dest,
>> -                                                      src_repo, src_baselen);
>> +               strbuf_addstr(dest, iter->relative_path);
>> +
>> +               if (S_ISDIR(iter->st.st_mode)) {
>> +                       mkdir_if_missing(dest->buf, 0777);
>
> I wonder if this mkdir_if_missing is sufficient. What if you have to
> create multiple directories?
>
> Let's say on the first advance we hit "a". Then on the second advance
> we hit directory "b/b/b/b"; we would need to mkdir recursively, and
> something like safe_create_leading_directories() would be a better fit.
>
> I'm not sure if it can happen though. I haven't re-read dir-iterator
> code carefully.

This part isn't a problem. It iterates one level at a time. So given a
structure like a/b/c/d/e/f/g/h/i/j/k/some-l you'll find that if you
instrument the loop in clone.c you get:

    dir = a
    dir = a/b
    dir = a/b/c
    dir = a/b/c/d
    dir = a/b/c/d/e
    dir = a/b/c/d/e/f
    dir = a/b/c/d/e/f/g
    dir = a/b/c/d/e/f/g/h
    dir = a/b/c/d/e/f/g/h/i
    dir = a/b/c/d/e/f/g/h/i/j
    dir = a/b/c/d/e/f/g/h/i/j/k
    dir = a/b/c/d/e/f/g/h/i/j/k/some-l

So it's like the old implementation in that way. It readdir()'s and
walks directories one level at a time.
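
To illustrate that ordering outside of git, here is a minimal
standalone sketch. It is not the dir-iterator code: walk_dirs(),
struct seen_dirs and the MAX_* limits are made-up names for this
example, and it assumes a POSIX system. It records each directory
before descending into it, which is why a single mkdir() per directory
is enough on the destination side.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

#define MAX_SEEN 64
#define MAX_PATH_LEN 1024

/* Hypothetical collector: directories in the order they were visited. */
struct seen_dirs {
	char paths[MAX_SEEN][MAX_PATH_LEN];
	int nr;
};

/*
 * Pre-order walk below "root". "rel" is the path relative to "root"
 * ("" for the root itself). Every directory is recorded before anything
 * inside it is visited. Returns 0 on success, -1 on error.
 */
static int walk_dirs(const char *root, const char *rel, struct seen_dirs *out)
{
	char full[MAX_PATH_LEN], child[MAX_PATH_LEN];
	struct dirent *de;
	struct stat st;
	DIR *dir;
	int n, ret = 0;

	assert(root && rel && out);
	n = snprintf(full, sizeof(full), "%s%s%s", root, *rel ? "/" : "", rel);
	if (n < 0 || n >= (int)sizeof(full))
		return -1;
	dir = opendir(full);
	if (!dir)
		return -1;

	while (!ret && (de = readdir(dir)) != NULL) {
		if (!strcmp(de->d_name, ".") || !strcmp(de->d_name, ".."))
			continue;
		/* Path of this entry relative to "root", e.g. "a/b". */
		n = snprintf(child, sizeof(child), "%s%s%s",
			     rel, *rel ? "/" : "", de->d_name);
		if (n < 0 || n >= (int)sizeof(child)) {
			ret = -1;
			break;
		}
		n = snprintf(full, sizeof(full), "%s/%s", root, child);
		if (n < 0 || n >= (int)sizeof(full) || lstat(full, &st) < 0) {
			ret = -1;
			break;
		}
		if (!S_ISDIR(st.st_mode))
			continue;
		if (out->nr == MAX_SEEN) {
			ret = -1;
			break;
		}
		strcpy(out->paths[out->nr++], child); /* parent first... */
		ret = walk_dirs(root, child, out);    /* ...then its children */
	}
	closedir(dir);
	return ret;
}
```

With a tree such as a/b/c under the root, the recorded paths come out
as "a", "a/b", "a/b/c", so every parent already exists by the time one
of its children is reached.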

^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [RFC PATCH v3 0/5] clone: dir iterator refactoring with tests
  2019-02-26 12:28 ` [RFC PATCH v3 " Ævar Arnfjörð Bjarmason
@ 2019-02-26 20:56   ` Matheus Tavares Bernardino
  2019-03-22 23:22   ` [GSoC][PATCH v4 0/7] clone: dir-iterator " Matheus Tavares
  1 sibling, 0 replies; 127+ messages in thread
From: Matheus Tavares Bernardino @ 2019-02-26 20:56 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Junio C Hamano, Nguyễn Thái Ngọc Duy,
	Thomas Gummerer

Hi, Ævar

Thank you for helping me out on this series and, especially, for the tests part.

Now that we've come to a consensus on what the overall series' "shape"
should be, can I refine what's still needed and resubmit it, in the
upcoming days, as a patch set with the test included?

Best,
Matheus Tavares


On Tue, Feb 26, 2019 at 9:28 AM Ævar Arnfjörð Bjarmason
<avarab@gmail.com> wrote:
>
> There's still active review going on for the "v2"[1], in particular
> Duy's comments coming in as I write this. This doesn't address any of
> that.
>
> What it does do is have a better version of my patch to add tests for
> the current behavior. It now doesn't rely on the SHA1 prereq anymore,
> and we can test the dotfiles without needing the SYMLINK prereq.
>
> I also moved it to the front of the series just to document/make sure
> that we start by asserting testing functionality. I ran a full test
> suite run for each of these patches and they all pass.
>
> The only other change is getting rid of an unused "struct stat st"
> variable which errored out under DEVELOPER=1.
>
> 1. https://public-inbox.org/git/20190226051804.10631-1-matheus.bernardino@usp.br/
>
> Matheus Tavares (4):
>   dir-iterator: add flags parameter to dir_iterator_begin
>   clone: copy hidden paths at local clone
>   clone: extract function from copy_or_link_directory
>   clone: use dir-iterator to avoid explicit dir traversal
>
> Ævar Arnfjörð Bjarmason (1):
>   clone: test for our behavior on odd objects/* content
>
>  builtin/clone.c            |  69 ++++++++++-------
>  dir-iterator.c             |  28 ++++++-
>  dir-iterator.h             |  40 ++++++++--
>  refs/files-backend.c       |   2 +-
>  t/t5604-clone-reference.sh | 151 +++++++++++++++++++++++++++++++++++++
>  5 files changed, 249 insertions(+), 41 deletions(-)
>
> --
> 2.21.0.rc2.261.ga7da99ff1b
>

^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [WIP RFC PATCH v2 1/5] dir-iterator: add flags parameter to dir_iterator_begin
  2019-02-26 12:01   ` Duy Nguyen
@ 2019-02-27 13:59     ` Matheus Tavares Bernardino
  0 siblings, 0 replies; 127+ messages in thread
From: Matheus Tavares Bernardino @ 2019-02-27 13:59 UTC (permalink / raw)
  To: Duy Nguyen
  Cc: Git Mailing List, Thomas Gummerer,
	Ævar Arnfjörð Bjarmason, Christian Couder,
	Michael Haggerty, Ramsay Jones, Junio C Hamano

On Tue, Feb 26, 2019 at 9:02 AM Duy Nguyen <pclouds@gmail.com> wrote:
>
> On Tue, Feb 26, 2019 at 12:18 PM Matheus Tavares
> <matheus.bernardino@usp.br> wrote:
> >  int dir_iterator_advance(struct dir_iterator *dir_iterator)
> >  {
> >         struct dir_iterator_int *iter =
> >                 (struct dir_iterator_int *)dir_iterator;
> > +       int ret;
> >
> >         while (1) {
> >                 struct dir_iterator_level *level =
> > @@ -71,6 +75,8 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
> >
> >                         level->dir = opendir(iter->base.path.buf);
> >                         if (!level->dir && errno != ENOENT) {
> > +                               if (iter->flags & DIR_ITERATOR_PEDANTIC)
> > +                                       goto error_out;
> >                                 warning("error opening directory %s: %s",
> >                                         iter->base.path.buf, strerror(errno));
> >                                 /* Popping the level is handled below */
> > @@ -122,6 +128,8 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
> >                         if (!de) {
> >                                 /* This level is exhausted; pop up a level. */
> >                                 if (errno) {
> > +                                       if (iter->flags & DIR_ITERATOR_PEDANTIC)
> > +                                               goto error_out;
> >                                         warning("error reading directory %s: %s",
> >                                                 iter->base.path.buf, strerror(errno));
> >                                 } else if (closedir(level->dir))
> > @@ -138,11 +146,20 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
> >                                 continue;
> >
> >                         strbuf_addstr(&iter->base.path, de->d_name);
> > -                       if (lstat(iter->base.path.buf, &iter->base.st) < 0) {
> > -                               if (errno != ENOENT)
> > +
> > +                       if (iter->flags & DIR_ITERATOR_FOLLOW_SYMLINKS)
> > +                               ret = stat(iter->base.path.buf, &iter->base.st);
> > +                       else
> > +                               ret = lstat(iter->base.path.buf, &iter->base.st);
> > +
> > +                       if (ret < 0) {
> > +                               if (errno != ENOENT) {
> > +                                       if (iter->flags & DIR_ITERATOR_PEDANTIC)
> > +                                               goto error_out;
> >                                         warning("error reading path '%s': %s",
> >                                                 iter->base.path.buf,
> >                                                 strerror(errno));
> > +                               }
> >                                 continue;
> >                         }
> >
> > @@ -159,6 +176,10 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
> >                         return ITER_OK;
> >                 }
> >         }
> > +
> > +error_out:
> > +       dir_iterator_abort(dir_iterator);
>
> Should this function call this or leave it to the caller? The
> description says "free resources" which to me sounds like something
> the caller should be aware of. Although if double dir_iterator_abort()
> has no bad side effects then I guess we can even leave it here.
>

The documentation at dir-iterator.h says that in case of errors (or
iteration exhaustion) at dir_iterator_advance(), the dir-iterator and
associated resources will be freed, so dir_iterator_abort() must be
called here, not left to the user. (Also, the use-case example at the
top of dir-iterator.h confirms this.)

> PS. files-backend.c does call dir_iterator_abort() unconditionally.
> Which sounds like this double-abort pattern should be dealt with even
> if that call site does not use the pedantic flag (it could later on,
> who knows; don't leave traps behind).
>

I have read the code at files-backend.c again but couldn't find where
it could call dir_iterator_abort() after dir_iterator_advance() has
already called it. Could you please point that out to me?

A double abort would certainly lead to a 'double free or corruption'
error, so API users must only call it to stop the iteration early, for
some reason of their own while processing the iterated files, not
because of iteration errors. Also, this behaviour is not specific to
the pedantic flag. When an iteration is complete, for example, a call
to dir_iterator_advance() will free the resources and return ITER_DONE.
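
As a rough illustration of that ownership contract, here is a toy
iterator that follows the same rule. This is only a sketch, not git's
implementation: struct toy_iter, the toy_iter_*() functions, the
TOY_ITER_* codes and the toy_live_iters counter are all invented for
this example. The point is that advance() frees the iterator when it
stops returning OK, and the caller calls abort() only when it leaves
the loop early.

```c
#include <assert.h>
#include <stdlib.h>

enum { TOY_ITER_OK = 0, TOY_ITER_DONE = -1 };

/* Hypothetical iterator over the integers 0..limit-1. */
struct toy_iter {
	int next, limit, value;
};

/* Number of iterators currently allocated (for leak checking only). */
static int toy_live_iters;

static struct toy_iter *toy_iter_begin(int limit)
{
	struct toy_iter *it = malloc(sizeof(*it));

	if (!it)
		return NULL;
	it->next = 0;
	it->limit = limit;
	it->value = -1;
	toy_live_iters++;
	return it;
}

/* Free the iterator; it must not be used (or aborted) again afterwards. */
static int toy_iter_abort(struct toy_iter *it)
{
	assert(it != NULL);
	free(it);
	toy_live_iters--;
	return TOY_ITER_DONE;
}

/* On exhaustion the iterator frees itself before returning. */
static int toy_iter_advance(struct toy_iter *it)
{
	assert(it != NULL);
	if (it->next >= it->limit)
		return toy_iter_abort(it);
	it->value = it->next++;
	return TOY_ITER_OK;
}

/* Sum the values, stopping early if "stop_at" is seen. */
static int sum_until(int limit, int stop_at)
{
	struct toy_iter *it = toy_iter_begin(limit);
	int sum = 0;

	if (!it)
		return -1;
	while (toy_iter_advance(it) == TOY_ITER_OK) {
		if (it->value == stop_at) {
			toy_iter_abort(it); /* early stop: caller frees */
			break;
		}
		sum += it->value;
	}
	/* No abort here: after TOY_ITER_DONE "it" is already freed. */
	return sum;
}
```

Calling toy_iter_abort() again after the loop has run to completion
would free the same pointer twice, which is the double-abort problem
described above.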

> > +       return ITER_ERROR;
> >  }
> >
> > diff --git a/dir-iterator.h b/dir-iterator.h
> > index 970793d07a..fe9eb9a04b 100644
> > --- a/dir-iterator.h
> > +++ b/dir-iterator.h
> > @@ -6,9 +6,10 @@
> >  /*
> >   * Iterate over a directory tree.
> >   *
> > - * Iterate over a directory tree, recursively, including paths of all
> > - * types and hidden paths. Skip "." and ".." entries and don't follow
> > - * symlinks except for the original path.
> > + * With no flags to modify behaviour, iterate over a directory tree,
>
> Nit but I think we can just skip "With no flags to modify behavior". It's given.
>

Ok!

> > + * recursively, including paths of all types and hidden paths. Skip
> > + * "." and ".." entries and don't follow symlinks except for the
> > + * original path.
> >   *
> >   * Every time dir_iterator_advance() is called, update the members of
> >   * the dir_iterator structure to reflect the next path in the
> > @@ -19,7 +20,7 @@
> >   * A typical iteration looks like this:
> >   *
> >   *     int ok;
> > - *     struct iterator *iter = dir_iterator_begin(path);
> > + *     struct iterator *iter = dir_iterator_begin(path, 0);
> >   *
> >   *     while ((ok = dir_iterator_advance(iter)) == ITER_OK) {
> >   *             if (want_to_stop_iteration()) {
> > @@ -40,6 +41,20 @@
> >   * dir_iterator_advance() again.
> >   */
> >
> > +/*
> > + * Flags for dir_iterator_begin:
> > + *
> > + * - DIR_ITERATOR_PEDANTIC: override dir-iterator's default behavior
> > + *   in case of an error while trying to fetch the next entry, which is
> > + *   to emit a warning and keep going. With this flag, resources are
> > + *   freed and ITER_ERROR is returned immediately.
> > + *
> > + * - DIR_ITERATOR_FOLLOW_SYMLINKS: make dir-iterator follow symlinks to
> > + *   directories, i.e., iterate over linked directories' contents.
>
> Do we really need this flag? If dir-iterator does not follow symlinks,
> the caller _can_ check stat data to detect symlinks and look inside
> anyway. So this flag is more about convenience (_if_ it has more than
> one call site, convenience for one call site is just not worth it).
>

I don't think it is just a convenience. If the API user wants to follow
symlinks, a simple check of the stat data to detect symlinks wouldn't
be enough, since the user would want to traverse all the contents of a
symlinked directory and its subdirectories, recursively. Without this
flag, the caller would need to manually detect symlinks to directories
and start a new iteration on them (probably using a new dir-iterator?).
Since that could lead to unnecessarily complex code, and the flag's
implementation is so simple, I think it is worth having.
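
To make the difference concrete, here is a small standalone sketch of
the stat()/lstat() choice that the flag selects. The names
DEMO_FOLLOW_SYMLINKS, is_dir_entry() and demo_setup() are hypothetical
and exist only in this example; it assumes a POSIX system. With
lstat() a symlink to a directory is reported as a link, so a recursive
walk would not descend into it; with stat() it is reported as a
directory.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <sys/stat.h>
#include <unistd.h>

#define DEMO_FOLLOW_SYMLINKS (1 << 0) /* hypothetical flag for this sketch */

/*
 * Return 1 if a walker should treat "path" as a directory to descend
 * into, 0 if not, and -1 if the path cannot be examined.
 */
static int is_dir_entry(const char *path, unsigned flags)
{
	struct stat st;
	int ret;

	assert(path != NULL);
	if (flags & DEMO_FOLLOW_SYMLINKS)
		ret = stat(path, &st);  /* resolves the symlink */
	else
		ret = lstat(path, &st); /* describes the link itself */
	if (ret < 0)
		return -1;
	return S_ISDIR(st.st_mode) ? 1 : 0;
}

/*
 * Create directory "dir" and a symlink "link" pointing at it.
 * Leftovers from an earlier run are tolerated.
 */
static int demo_setup(const char *dir, const char *link)
{
	unlink(link);
	if (mkdir(dir, 0777) && errno != EEXIST)
		return -1;
	return symlink(dir, link);
}
```

Without such a flag, a caller that gets the lstat() answer for a link
would have to notice the symlink itself, stat() the target and start a
second traversal there, repeating that for every nested link.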

> Or is there something else I'm missing?
>
> > + */
> > +#define DIR_ITERATOR_PEDANTIC (1 << 0)
> > +#define DIR_ITERATOR_FOLLOW_SYMLINKS (1 << 1)
> > +
> --
> Duy

^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [WIP RFC PATCH v2 4/5] clone: extract function from copy_or_link_directory
  2019-02-26 12:18   ` Duy Nguyen
@ 2019-02-27 17:30     ` Matheus Tavares Bernardino
  2019-02-27 22:45       ` Thomas Gummerer
  0 siblings, 1 reply; 127+ messages in thread
From: Matheus Tavares Bernardino @ 2019-02-27 17:30 UTC (permalink / raw)
  To: Duy Nguyen
  Cc: Git Mailing List, Thomas Gummerer,
	Ævar Arnfjörð Bjarmason, Christian Couder,
	Junio C Hamano

On Tue, Feb 26, 2019 at 9:18 AM Duy Nguyen <pclouds@gmail.com> wrote:
>
> On Tue, Feb 26, 2019 at 12:18 PM Matheus Tavares
> <matheus.bernardino@usp.br> wrote:
> >
> > Extract dir creation code snippet from copy_or_link_directory to its own
> > function named mkdir_if_missing. This change will help remove
> > copy_or_link_directory's explicit recursion, which will be done in a
> > following patch. It also makes the code more readable.
> >
> > Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
> > ---
> >  builtin/clone.c | 27 +++++++++++++++++++--------
> >  1 file changed, 19 insertions(+), 8 deletions(-)
> >
> > diff --git a/builtin/clone.c b/builtin/clone.c
> > index cae069f03b..fd580fa98d 100644
> > --- a/builtin/clone.c
> > +++ b/builtin/clone.c
> > @@ -392,6 +392,24 @@ static void copy_alternates(struct strbuf *src, struct strbuf *dst,
> >         fclose(in);
> >  }
> >
> > +static void mkdir_if_missing(const char *pathname, mode_t mode)
> > +{
> > +       /*
> > +        * Create a dir at pathname unless there's already one.
>
> This confused me for a second because I thought it described "st"
> variable. I think we usually put the description of the function on
> top (before the "void mkdir_if.." line). But with such a short
> function and clear name like this, I don't think we need any comments.
>

Yes, I also don't like the description being after the function
declaration, but I did this to follow the pattern from other functions
in the same file (e.g. copy_alternates).  Anyway, I do agree with you
that this function doesn't need a description, so I'm removing it for
the next version. Thanks!

> > +        */
> > +       struct stat st;
> > +
> > +       if (mkdir(pathname, mode)) {
>
> Good opportunity to unindent this by doing
>
>     if (!mkdir(...
>          return;
>
> but it's up to you.
>

Ok. But for such a small snippet, is the indentation really a code
smell here? (sorry, I'm still getting used to git's coding guidelines)
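
For comparison, the early-return shape would look roughly like the
sketch below. It is a standalone variant, not the patch itself: the
name mkdir_if_missing_sketch() is made up, and it returns -1 after
printing a message instead of calling git's die_errno()/die(), so that
it compiles outside of git.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

/*
 * Return 0 if "pathname" is a directory once the call finishes,
 * -1 otherwise.
 */
static int mkdir_if_missing_sketch(const char *pathname, mode_t mode)
{
	struct stat st;

	assert(pathname != NULL);
	if (!mkdir(pathname, mode))
		return 0; /* freshly created: nothing else to check */

	/* From here on mkdir() failed; no extra nesting is needed. */
	if (errno != EEXIST) {
		perror("failed to create directory");
		return -1;
	}
	if (stat(pathname, &st)) {
		perror("failed to stat");
		return -1;
	}
	if (!S_ISDIR(st.st_mode)) {
		fprintf(stderr, "%s exists and is not a directory\n", pathname);
		return -1;
	}
	return 0;
}
```

The behaviour is the same either way; the early return only moves the
error handling one indentation level to the left.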

> > +               if (errno != EEXIST)
> > +                       die_errno(_("failed to create directory '%s'"),
> > +                                 pathname);
> > +               else if (stat(pathname, &st))
> > +                       die_errno(_("failed to stat '%s'"), pathname);
> > +               else if (!S_ISDIR(st.st_mode))
> > +                       die(_("%s exists and is not a directory"), pathname);
> > +       }
> > +}
> > +
> >  static void copy_or_link_directory(struct strbuf *src, struct strbuf *dest,
> >                                    const char *src_repo, int src_baselen)
> >  {
> > @@ -404,14 +422,7 @@ static void copy_or_link_directory(struct strbuf *src, struct strbuf *dest,
> >         if (!dir)
> >                 die_errno(_("failed to open '%s'"), src->buf);
> >
> > -       if (mkdir(dest->buf, 0777)) {
> > -               if (errno != EEXIST)
> > -                       die_errno(_("failed to create directory '%s'"), dest->buf);
> > -               else if (stat(dest->buf, &buf))
> > -                       die_errno(_("failed to stat '%s'"), dest->buf);
> > -               else if (!S_ISDIR(buf.st_mode))
> > -                       die(_("%s exists and is not a directory"), dest->buf);
> > -       }
> > +       mkdir_if_missing(dest->buf, 0777);
> >
> >         strbuf_addch(src, '/');
> >         src_len = src->len;
> > --
> > 2.20.1
> >
>
>
> --
> Duy

^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [WIP RFC PATCH v2 5/5] clone: use dir-iterator to avoid explicit dir traversal
  2019-02-26 12:32   ` Duy Nguyen
  2019-02-26 12:50     ` Ævar Arnfjörð Bjarmason
@ 2019-02-27 17:40     ` Matheus Tavares Bernardino
  2019-02-28  7:13       ` Duy Nguyen
  2019-02-28  7:53       ` Ævar Arnfjörð Bjarmason
  1 sibling, 2 replies; 127+ messages in thread
From: Matheus Tavares Bernardino @ 2019-02-27 17:40 UTC (permalink / raw)
  To: Duy Nguyen
  Cc: Git Mailing List, Thomas Gummerer,
	Ævar Arnfjörð Bjarmason, Christian Couder,
	Junio C Hamano

On Tue, Feb 26, 2019 at 9:32 AM Duy Nguyen <pclouds@gmail.com> wrote:
>
> On Tue, Feb 26, 2019 at 12:18 PM Matheus Tavares
> <matheus.bernardino@usp.br> wrote:
> >
> > Replace the use of the opendir/readdir/closedir API to traverse
> > directories recursively, in the copy_or_link_directory function, with
> > the dir-iterator API. This simplifies the code and avoids recursive
> > calls to copy_or_link_directory.
> >
> > This process also makes copy_or_link_directory call die() in case of an
> > error on readdir or stat, inside dir_iterator_advance. Previously it
> > would just print a warning for errors on stat and ignore errors on
> > readdir, which isn't nice because a local git clone could end up
> > succeeding even though the .git/objects copy didn't fully succeed.
> >
> > Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
> > ---
> > I can also make the change described in the last paragraph in a separate
> > patch before this one, but I would have to undo it in this patch because
> > dir-iterator already implements it. So, IMHO, it would be just noise
> > and not worth it.
> >
> >  builtin/clone.c | 45 +++++++++++++++++++++++----------------------
> >  1 file changed, 23 insertions(+), 22 deletions(-)
> >
> > diff --git a/builtin/clone.c b/builtin/clone.c
> > index fd580fa98d..b23ba64c94 100644
> > --- a/builtin/clone.c
> > +++ b/builtin/clone.c
> > @@ -23,6 +23,8 @@
> >  #include "transport.h"
> >  #include "strbuf.h"
> >  #include "dir.h"
> > +#include "dir-iterator.h"
> > +#include "iterator.h"
> >  #include "sigchain.h"
> >  #include "branch.h"
> >  #include "remote.h"
> > @@ -411,42 +413,37 @@ static void mkdir_if_missing(const char *pathname, mode_t mode)
> >  }
> >
> >  static void copy_or_link_directory(struct strbuf *src, struct strbuf *dest,
> > -                                  const char *src_repo, int src_baselen)
> > +                                  const char *src_repo)
> >  {
> > -       struct dirent *de;
> > -       struct stat buf;
> >         int src_len, dest_len;
> > -       DIR *dir;
> > -
> > -       dir = opendir(src->buf);
> > -       if (!dir)
> > -               die_errno(_("failed to open '%s'"), src->buf);
> > +       struct dir_iterator *iter;
> > +       int iter_status;
> > +       struct stat st;
> > +       unsigned flags;
> >
> >         mkdir_if_missing(dest->buf, 0777);
> >
> > +       flags = DIR_ITERATOR_PEDANTIC | DIR_ITERATOR_FOLLOW_SYMLINKS;
> > +       iter = dir_iterator_begin(src->buf, flags);
> > +
> >         strbuf_addch(src, '/');
> >         src_len = src->len;
> >         strbuf_addch(dest, '/');
> >         dest_len = dest->len;
> >
> > -       while ((de = readdir(dir)) != NULL) {
> > +       while ((iter_status = dir_iterator_advance(iter)) == ITER_OK) {
> >                 strbuf_setlen(src, src_len);
> > -               strbuf_addstr(src, de->d_name);
> > +               strbuf_addstr(src, iter->relative_path);
> >                 strbuf_setlen(dest, dest_len);
> > -               strbuf_addstr(dest, de->d_name);
> > -               if (stat(src->buf, &buf)) {
> > -                       warning (_("failed to stat %s\n"), src->buf);
> > -                       continue;
> > -               }
> > -               if (S_ISDIR(buf.st_mode)) {
> > -                       if (!is_dot_or_dotdot(de->d_name))
> > -                               copy_or_link_directory(src, dest,
> > -                                                      src_repo, src_baselen);
> > +               strbuf_addstr(dest, iter->relative_path);
> > +
> > +               if (S_ISDIR(iter->st.st_mode)) {
> > +                       mkdir_if_missing(dest->buf, 0777);
>
> I wonder if this mkdir_if_missing is sufficient. What if you have to
> create multiple directories?
>
> Let's say on the first advance we hit "a". Then on the second advance we hit
> directory "b/b/b/b", we would need to mkdir recursively and something
> like safe_create_leading_directories() would be a better fit.
>
> I'm not sure if it can happen though. I haven't re-read dir-iterator
> code carefully.
>
> >                         continue;
> >                 }
> >
> >                 /* Files that cannot be copied bit-for-bit... */
> > -               if (!strcmp(src->buf + src_baselen, "/info/alternates")) {
> > +               if (!strcmp(iter->relative_path, "info/alternates")) {
>
> While we're here, this should be fspathcmp to be friendlier to
> case-insensitive filesystems. You probably should fix it in a separate
> patch though.
>

Nice! I will make this change in a separate patch in the series. Thanks!

> >                         copy_alternates(src, dest, src_repo);
> >                         continue;
> >                 }
> > @@ -463,7 +460,11 @@ static void copy_or_link_directory(struct strbuf *src, struct strbuf *dest,
> >                 if (copy_file_with_time(dest->buf, src->buf, 0666))
> >                         die_errno(_("failed to copy file to '%s'"), dest->buf);
> >         }
> > -       closedir(dir);
> > +
> > +       if (iter_status != ITER_DONE) {
> > +               strbuf_setlen(src, src_len);
> > +               die(_("failed to iterate over '%s'"), src->buf);
> > +       }
>
> I think you need to abort the iterator even when it returns ITER_DONE.
> At least that's how the first caller in files-backend.c does it.
>

Hm, I don't think so, since dir_iterator_advance() already frees the
resources before returning ITER_DONE. Also, I may be wrong, but it
doesn't seem to me that files-backend.c does it. The function
files_reflog_iterator_advance(), which calls dir_iterator_advance(), even
sets the dir-iterator pointer to NULL as soon as ITER_DONE is
returned.


> >  }
> >
> >  static void clone_local(const char *src_repo, const char *dest_repo)
> > @@ -481,7 +482,7 @@ static void clone_local(const char *src_repo, const char *dest_repo)
> >                 get_common_dir(&dest, dest_repo);
> >                 strbuf_addstr(&src, "/objects");
> >                 strbuf_addstr(&dest, "/objects");
> > -               copy_or_link_directory(&src, &dest, src_repo, src.len);
> > +               copy_or_link_directory(&src, &dest, src_repo);
> >                 strbuf_release(&src);
> >                 strbuf_release(&dest);
> >         }
> > --
> > 2.20.1
> >
>
>
> --
> Duy

^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [WIP RFC PATCH v2 4/5] clone: extract function from copy_or_link_directory
  2019-02-27 17:30     ` Matheus Tavares Bernardino
@ 2019-02-27 22:45       ` Thomas Gummerer
  2019-02-27 22:50         ` Matheus Tavares Bernardino
  0 siblings, 1 reply; 127+ messages in thread
From: Thomas Gummerer @ 2019-02-27 22:45 UTC (permalink / raw)
  To: Matheus Tavares Bernardino
  Cc: Duy Nguyen, Git Mailing List,
	Ævar Arnfjörð Bjarmason, Christian Couder,
	Junio C Hamano

On 02/27, Matheus Tavares Bernardino wrote:
> On Tue, Feb 26, 2019 at 9:18 AM Duy Nguyen <pclouds@gmail.com> wrote:
> >
> > On Tue, Feb 26, 2019 at 12:18 PM Matheus Tavares
> > <matheus.bernardino@usp.br> wrote:
> > > +        */
> > > +       struct stat st;
> > > +
> > > +       if (mkdir(pathname, mode)) {
> >
> > Good opportunity to unindent this by doing
> >
> >     if (!mkdir(...
> >          return;
> >
> > but it's up to you.
> >
> 
> Ok. But being such a small snippet, is the indentation really a code
> smell here? (sorry, I'm still getting used to git's coding guidelines)

I don't think the indentation is too bad here, but I think the
code is slightly easier to read with less indentation, and it's easier
to see what's happening in the success case as well without reading
the whole method.

And since this patch is already refactoring code we could do it here.
I don't think it's a very big deal either way, which is why Duy left
the decision on whether to use the suggestion or not up to you.

^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [WIP RFC PATCH v2 4/5] clone: extract function from copy_or_link_directory
  2019-02-27 22:45       ` Thomas Gummerer
@ 2019-02-27 22:50         ` Matheus Tavares Bernardino
  0 siblings, 0 replies; 127+ messages in thread
From: Matheus Tavares Bernardino @ 2019-02-27 22:50 UTC (permalink / raw)
  To: Thomas Gummerer
  Cc: Duy Nguyen, Git Mailing List,
	Ævar Arnfjörð Bjarmason, Christian Couder,
	Junio C Hamano

On Wed, Feb 27, 2019 at 7:45 PM Thomas Gummerer <t.gummerer@gmail.com> wrote:
>
> On 02/27, Matheus Tavares Bernardino wrote:
> > On Tue, Feb 26, 2019 at 9:18 AM Duy Nguyen <pclouds@gmail.com> wrote:
> > >
> > > On Tue, Feb 26, 2019 at 12:18 PM Matheus Tavares
> > > <matheus.bernardino@usp.br> wrote:
> > > > +        */
> > > > +       struct stat st;
> > > > +
> > > > +       if (mkdir(pathname, mode)) {
> > >
> > > Good opportunity to unindent this by doing
> > >
> > >     if (!mkdir(...
> > >          return;
> > >
> > > but it's up to you.
> > >
> >
> > Ok. But being such a small snippet, is the indentation really a code
> > smell here? (sorry, I'm still getting used to git's coding guidelines)
>
> I don't think the indentation is too bad here, but I think the
> code is slightly easier to read with less indentation, and it's easier
> to see what's happening in the success case as well without reading
> the whole method.
>
> And since this patch is already refactoring code we could do it here.
> I don't think it's a very big deal either way, which is why Duy left
> the decision on whether to use the suggestion or not up to you.

Ok, so I will do it for the next version! I just asked because it was
a good chance for me to learn a bit more about git's code style :)

^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [WIP RFC PATCH v2 5/5] clone: use dir-iterator to avoid explicit dir traversal
  2019-02-27 17:40     ` Matheus Tavares Bernardino
@ 2019-02-28  7:13       ` Duy Nguyen
  2019-02-28  7:53       ` Ævar Arnfjörð Bjarmason
  1 sibling, 0 replies; 127+ messages in thread
From: Duy Nguyen @ 2019-02-28  7:13 UTC (permalink / raw)
  To: Matheus Tavares Bernardino
  Cc: Git Mailing List, Thomas Gummerer,
	Ævar Arnfjörð Bjarmason, Christian Couder,
	Junio C Hamano

On Thu, Feb 28, 2019 at 12:40 AM Matheus Tavares Bernardino
<matheus.bernardino@usp.br> wrote:
> > > @@ -463,7 +460,11 @@ static void copy_or_link_directory(struct strbuf *src, struct strbuf *dest,
> > >                 if (copy_file_with_time(dest->buf, src->buf, 0666))
> > >                         die_errno(_("failed to copy file to '%s'"), dest->buf);
> > >         }
> > > -       closedir(dir);
> > > +
> > > +       if (iter_status != ITER_DONE) {
> > > +               strbuf_setlen(src, src_len);
> > > +               die(_("failed to iterate over '%s'"), src->buf);
> > > +       }
> >
> > I think you need to abort the iterator even when it returns ITER_DONE.
> > At least that's how the first caller in files-backend.c does it.
> >
>
> Hm, I don't think so, since dir_iterator_advance() already frees the
> resources before returning ITER_DONE. Also, I may be wrong, but it
> doesn't seem to me that files-backend.c does it. The function
> files_reflog_iterator_advance(), which calls dir_iterator_advance(), even
> sets the dir-iterator pointer to NULL as soon as ITER_DONE is
> returned.

Arghhh.. I read the ref_iterator_abort and thought it was
dir_iterator_abort! Sorry for the noise, too many iterators.
-- 
Duy

^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [WIP RFC PATCH v2 5/5] clone: use dir-iterator to avoid explicit dir traversal
  2019-02-27 17:40     ` Matheus Tavares Bernardino
  2019-02-28  7:13       ` Duy Nguyen
@ 2019-02-28  7:53       ` Ævar Arnfjörð Bjarmason
  1 sibling, 0 replies; 127+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2019-02-28  7:53 UTC (permalink / raw)
  To: Matheus Tavares Bernardino
  Cc: Duy Nguyen, Git Mailing List, Thomas Gummerer, Christian Couder,
	Junio C Hamano


On Wed, Feb 27 2019, Matheus Tavares Bernardino wrote:

> On Tue, Feb 26, 2019 at 9:32 AM Duy Nguyen <pclouds@gmail.com> wrote:
>>
>> On Tue, Feb 26, 2019 at 12:18 PM Matheus Tavares
>> <matheus.bernardino@usp.br> wrote:
>> >
>> > Replace usage of opendir/readdir/closedir API to traverse directories
>> > recursively, at copy_or_link_directory function, by the dir-iterator
>> > API. This simplifies the code and avoids recursive calls to
>> > copy_or_link_directory.
>> >
>> > This process also makes copy_or_link_directory call die() in case of an
>> > error on readdir or stat, inside dir_iterator_advance. Previously it
>> > would just print a warning for errors on stat and ignore errors on
>> > readdir, which isn't nice because a local git clone could finish
>> > successfully even though the .git/objects copy didn't fully succeed.
>> >
>> > Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
>> > ---
>> > I can also make the change described in the last paragraph in a separate
>> > patch before this one, but I would have to undo it in this patch because
>> > dir-iterator already implements it. So, IMHO, it would be just noise
>> > and not worth it.
>> >
>> >  builtin/clone.c | 45 +++++++++++++++++++++++----------------------
>> >  1 file changed, 23 insertions(+), 22 deletions(-)
>> >
>> > diff --git a/builtin/clone.c b/builtin/clone.c
>> > index fd580fa98d..b23ba64c94 100644
>> > --- a/builtin/clone.c
>> > +++ b/builtin/clone.c
>> > @@ -23,6 +23,8 @@
>> >  #include "transport.h"
>> >  #include "strbuf.h"
>> >  #include "dir.h"
>> > +#include "dir-iterator.h"
>> > +#include "iterator.h"
>> >  #include "sigchain.h"
>> >  #include "branch.h"
>> >  #include "remote.h"
>> > @@ -411,42 +413,37 @@ static void mkdir_if_missing(const char *pathname, mode_t mode)
>> >  }
>> >
>> >  static void copy_or_link_directory(struct strbuf *src, struct strbuf *dest,
>> > -                                  const char *src_repo, int src_baselen)
>> > +                                  const char *src_repo)
>> >  {
>> > -       struct dirent *de;
>> > -       struct stat buf;
>> >         int src_len, dest_len;
>> > -       DIR *dir;
>> > -
>> > -       dir = opendir(src->buf);
>> > -       if (!dir)
>> > -               die_errno(_("failed to open '%s'"), src->buf);
>> > +       struct dir_iterator *iter;
>> > +       int iter_status;
>> > +       struct stat st;
>> > +       unsigned flags;
>> >
>> >         mkdir_if_missing(dest->buf, 0777);
>> >
>> > +       flags = DIR_ITERATOR_PEDANTIC | DIR_ITERATOR_FOLLOW_SYMLINKS;
>> > +       iter = dir_iterator_begin(src->buf, flags);
>> > +
>> >         strbuf_addch(src, '/');
>> >         src_len = src->len;
>> >         strbuf_addch(dest, '/');
>> >         dest_len = dest->len;
>> >
>> > -       while ((de = readdir(dir)) != NULL) {
>> > +       while ((iter_status = dir_iterator_advance(iter)) == ITER_OK) {
>> >                 strbuf_setlen(src, src_len);
>> > -               strbuf_addstr(src, de->d_name);
>> > +               strbuf_addstr(src, iter->relative_path);
>> >                 strbuf_setlen(dest, dest_len);
>> > -               strbuf_addstr(dest, de->d_name);
>> > -               if (stat(src->buf, &buf)) {
>> > -                       warning (_("failed to stat %s\n"), src->buf);
>> > -                       continue;
>> > -               }
>> > -               if (S_ISDIR(buf.st_mode)) {
>> > -                       if (!is_dot_or_dotdot(de->d_name))
>> > -                               copy_or_link_directory(src, dest,
>> > -                                                      src_repo, src_baselen);
>> > +               strbuf_addstr(dest, iter->relative_path);
>> > +
>> > +               if (S_ISDIR(iter->st.st_mode)) {
>> > +                       mkdir_if_missing(dest->buf, 0777);
>>
>> I wonder if this mkdir_if_missing is sufficient. What if you have to
>> create multiple directories?
>>
>> Let's say on the first advance we hit "a". Then on the second advance we hit
>> directory "b/b/b/b", we would need to mkdir recursively and something
>> like safe_create_leading_directories() would be a better fit.
>>
>> I'm not sure if it can happen though. I haven't re-read dir-iterator
>> code carefully.
>>
>> >                         continue;
>> >                 }
>> >
>> >                 /* Files that cannot be copied bit-for-bit... */
>> > -               if (!strcmp(src->buf + src_baselen, "/info/alternates")) {
>> > +               if (!strcmp(iter->relative_path, "info/alternates")) {
>>
>> While we're here, this should be fspathcmp to be friendlier to
>> case-insensitive filesystems. You probably should fix it in a separate
>> patch though.
>>
>
> Nice! I will make this change in a separate patch in the series. Thanks!
>
>> >                         copy_alternates(src, dest, src_repo);
>> >                         continue;
>> >                 }
>> > @@ -463,7 +460,11 @@ static void copy_or_link_directory(struct strbuf *src, struct strbuf *dest,
>> >                 if (copy_file_with_time(dest->buf, src->buf, 0666))
>> >                         die_errno(_("failed to copy file to '%s'"), dest->buf);
>> >         }
>> > -       closedir(dir);
>> > +
>> > +       if (iter_status != ITER_DONE) {
>> > +               strbuf_setlen(src, src_len);
>> > +               die(_("failed to iterate over '%s'"), src->buf);
>> > +       }
>>
>> I think you need to abort the iterator even when it returns ITER_DONE.
>> At least that's how the first caller in files-backend.c does it.
>>
>
> Hm, I don't think so, since dir_iterator_advance() already frees the
> resources before returning ITER_DONE. Also, I may be wrong, but it
> doesn't seem to me that files-backend.c does it. The function
> files_reflog_iterator_advance(), which calls dir_iterator_advance(), even
> sets the dir-iterator pointer to NULL as soon as ITER_DONE is
> returned.

As Duy notes, you're right about this. Just to add: this pattern is
usually something we avoid in the git codebase, i.e. we try not to make
it an error to call the whatever_utility_free() function twice.

See e.g. stop_progress_msg for such an implementation, i.e. we'll check
if it's NULL already and exit early, and maybe use FREE_AND_NULL()
instead of NULL.

It means that for the cost of trivial overhead you don't need to worry
about double freeing or maintaining a "was this freed?" state machine.

Now, whether you want to fix that while you're at it is another matter,
just pointing out that we usually try to avoid this problem entirely...
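
To make that concrete, here is a minimal sketch of a free function that
is safe to call twice. Everything in it is hypothetical: struct walker,
walker_new() and walker_free() are made-up names standing in for
something like an iterator, and FREE_AND_NULL() is redefined locally so
the snippet compiles outside of git. It is not how dir-iterator is
implemented.

```c
#include <assert.h>
#include <stdlib.h>

/* Local stand-in for git's FREE_AND_NULL(): free, then clear the pointer. */
#define FREE_AND_NULL(p) do { free(p); (p) = NULL; } while (0)

/* Hypothetical resource, e.g. some kind of iterator. */
struct walker {
	char *path;
};

static struct walker *walker_new(void)
{
	struct walker *w = calloc(1, sizeof(*w));

	if (w)
		w->path = calloc(1, 16);
	return w;
}

/*
 * Takes a pointer to the caller's pointer so that it can be cleared.
 * Calling this twice, or on an already-NULL pointer, is a no-op.
 */
static void walker_free(struct walker **w_p)
{
	struct walker *w;

	assert(w_p != NULL);
	w = *w_p;
	if (!w)
		return; /* already freed: exit early */
	*w_p = NULL;
	FREE_AND_NULL(w->path);
	free(w);
}
```

A caller can then write walker_free(&w) on every exit path without
tracking whether an earlier step already released the object.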

^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [RFC PATCH v3 1/5] clone: test for our behavior on odd objects/* content
  2019-02-26 12:28 ` [RFC PATCH v3 1/5] clone: test for our behavior on odd objects/* content Ævar Arnfjörð Bjarmason
@ 2019-02-28 21:19   ` Matheus Tavares Bernardino
  2019-03-01 13:49     ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 127+ messages in thread
From: Matheus Tavares Bernardino @ 2019-02-28 21:19 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Junio C Hamano, Nguyễn Thái Ngọc Duy,
	Thomas Gummerer, Christian Couder

Hi, Ævar

I'm finishing the required changes in this series to send a v4, but
when submitting to travis ci, I got some errors on the
t5604-clone-reference test:
https://travis-ci.org/MatheusBernardino/git/builds/500007587

Do you have any idea why?

Best,
Matheus Tavares

On Tue, Feb 26, 2019 at 9:28 AM Ævar Arnfjörð Bjarmason
<avarab@gmail.com> wrote:
>
> Add tests for what happens when we locally clone .git/objects
> directories where some of the loose objects or packs are symlinked, or
> when there are unknown files there.
>
> I'm bending over backwards here to avoid a SHA1 dependency. See [1]
> for an earlier and simpler version that hardcoded SHA-1s.
>
> This behavior has been the same for a *long* time, but hasn't been
> tested for.
>
> There's a good post-hoc argument to be made for copying over unknown
> things, e.g. I'd like a git version that doesn't know about the
> commit-graph to copy it under "clone --local" so a newer git version
> can make use of it.
>
> But the behavior shown here with symlinks seems pretty
> random. E.g. if "pack" is a symlink we end up with two copies of the
> contents, and only transfer some symlinks as-is.
>
> In follow-up commits we'll look at changing some of this behavior, but
> for now let's just assert it as-is so we'll notice what we'll change
> later.
>
> 1. https://public-inbox.org/git/20190226002625.13022-5-avarab@gmail.com/
>
> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
> ---
>  t/t5604-clone-reference.sh | 142 +++++++++++++++++++++++++++++++++++++
>  1 file changed, 142 insertions(+)
>
> diff --git a/t/t5604-clone-reference.sh b/t/t5604-clone-reference.sh
> index 4320082b1b..cb0dc22d14 100755
> --- a/t/t5604-clone-reference.sh
> +++ b/t/t5604-clone-reference.sh
> @@ -221,4 +221,146 @@ test_expect_success 'clone, dissociate from alternates' '
>         ( cd C && git fsck )
>  '
>
> +test_expect_success 'setup repo with garbage in objects/*' '
> +       git init S &&
> +       (
> +               cd S &&
> +               test_commit A &&
> +
> +               cd .git/objects &&
> +               >.some-hidden-file &&
> +               >some-file &&
> +               mkdir .some-hidden-dir &&
> +               >.some-hidden-dir/some-file &&
> +               >.some-hidden-dir/.some-dot-file &&
> +               mkdir some-dir &&
> +               >some-dir/some-file &&
> +               >some-dir/.some-dot-file
> +       )
> +'
> +
> +test_expect_success 'clone a repo with garbage in objects/*' '
> +       for option in --local --no-hardlinks --shared --dissociate
> +       do
> +               git clone $option S S$option || return 1 &&
> +               git -C S$option fsck || return 1
> +       done &&
> +       find S-* -name "*some*" | sort >actual &&
> +       cat >expected <<-EOF &&
> +       S--dissociate/.git/objects/.some-hidden-file
> +       S--dissociate/.git/objects/some-dir
> +       S--dissociate/.git/objects/some-dir/.some-dot-file
> +       S--dissociate/.git/objects/some-dir/some-file
> +       S--dissociate/.git/objects/some-file
> +       S--local/.git/objects/.some-hidden-file
> +       S--local/.git/objects/some-dir
> +       S--local/.git/objects/some-dir/.some-dot-file
> +       S--local/.git/objects/some-dir/some-file
> +       S--local/.git/objects/some-file
> +       S--no-hardlinks/.git/objects/.some-hidden-file
> +       S--no-hardlinks/.git/objects/some-dir
> +       S--no-hardlinks/.git/objects/some-dir/.some-dot-file
> +       S--no-hardlinks/.git/objects/some-dir/some-file
> +       S--no-hardlinks/.git/objects/some-file
> +       EOF
> +       test_cmp expected actual
> +'
> +
> +test_expect_success SYMLINKS 'setup repo with manually symlinked objects/*' '
> +       git init T &&
> +       (
> +               cd T &&
> +               test_commit A &&
> +               git gc &&
> +               (
> +                       cd .git/objects &&
> +                       mv pack packs &&
> +                       ln -s packs pack
> +               ) &&
> +               test_commit B &&
> +               (
> +                       cd .git/objects &&
> +                       find ?? -type d >loose-dirs &&
> +                       last_loose=$(tail -n 1 loose-dirs) &&
> +                       mv $last_loose a-loose-dir &&
> +                       ln -s a-loose-dir $last_loose &&
> +                       first_loose=$(head -n 1 loose-dirs) &&
> +                       (
> +                               cd $first_loose &&
> +                               obj=$(ls *) &&
> +                               mv $obj ../an-object &&
> +                               ln -s ../an-object $obj
> +                       ) &&
> +                       find . -type f | sort >../../../T.objects-files.raw &&
> +                       find . -type l | sort >../../../T.objects-links.raw
> +               )
> +       ) &&
> +       git -C T fsck &&
> +       git -C T rev-list --all --objects >T.objects
> +'
> +
> +
> +test_expect_success SYMLINKS 'clone repo with symlinked objects/*' '
> +       for option in --local --no-hardlinks --shared --dissociate
> +       do
> +               git clone $option T T$option || return 1 &&
> +               git -C T$option fsck || return 1 &&
> +               git -C T$option rev-list --all --objects >T$option.objects &&
> +               test_cmp T.objects T$option.objects &&
> +               (
> +                       cd T$option/.git/objects &&
> +                       find . -type f | sort >../../../T$option.objects-files.raw &&
> +                       find . -type l | sort >../../../T$option.objects-links.raw
> +               )
> +       done &&
> +
> +       for raw in $(ls T*.raw)
> +       do
> +               sed -e "s!/..\$!/X!; s!/../!/Y/!; s![0-9a-f]\{38,\}!Z!" <$raw >$raw.de-sha || return 1
> +       done &&
> +
> +       cat >expected-files <<-EOF &&
> +       ./Y/Z
> +       ./a-loose-dir/Z
> +       ./an-object
> +       ./Y/Z
> +       ./info/packs
> +       ./loose-dirs
> +       ./pack/pack-Z.idx
> +       ./pack/pack-Z.pack
> +       ./packs/pack-Z.idx
> +       ./packs/pack-Z.pack
> +       EOF
> +       cat >expected-links <<-EOF &&
> +       ./Y/Z
> +       EOF
> +       for option in --local --dissociate
> +       do
> +               test_cmp expected-files T$option.objects-files.raw.de-sha || return 1 &&
> +               test_cmp expected-links T$option.objects-links.raw.de-sha || return 1
> +       done &&
> +
> +       cat >expected-files <<-EOF &&
> +       ./Y/Z
> +       ./Y/Z
> +       ./a-loose-dir/Z
> +       ./an-object
> +       ./Y/Z
> +       ./info/packs
> +       ./loose-dirs
> +       ./pack/pack-Z.idx
> +       ./pack/pack-Z.pack
> +       ./packs/pack-Z.idx
> +       ./packs/pack-Z.pack
> +       EOF
> +       test_cmp expected-files T--no-hardlinks.objects-files.raw.de-sha &&
> +       test_must_be_empty T--no-hardlinks.objects-links.raw.de-sha &&
> +
> +       cat >expected-files <<-EOF &&
> +       ./info/alternates
> +       EOF
> +       test_cmp expected-files T--shared.objects-files.raw &&
> +       test_must_be_empty T--shared.objects-links.raw
> +'
> +
>  test_done
> --
> 2.21.0.rc2.261.ga7da99ff1b
>

^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [RFC PATCH v3 1/5] clone: test for our behavior on odd objects/* content
  2019-02-28 21:19   ` Matheus Tavares Bernardino
@ 2019-03-01 13:49     ` Ævar Arnfjörð Bjarmason
  2019-03-13  3:17       ` Matheus Tavares
  0 siblings, 1 reply; 127+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2019-03-01 13:49 UTC (permalink / raw)
  To: Matheus Tavares Bernardino
  Cc: git, Junio C Hamano, Nguyễn Thái Ngọc Duy,
	Thomas Gummerer, Christian Couder


On Thu, Feb 28 2019, Matheus Tavares Bernardino wrote:

> Hi, Ævar
>
> I'm finishing the required changes in this series to send a v4, but
> when submitting to travis ci, I got some errors on the
> t5604-clone-reference test:
> https://travis-ci.org/MatheusBernardino/git/builds/500007587

I don't have access to an OSX box, but could reproduce the failure on
NetBSD.

It's because link() behaves differently there when faced with a
symlink. On GNU/Linux, link()-ing a symlink produces another symlink
like it; on NetBSD (and presumably OSX), it produces a hardlink to the
file the existing symlink points to.
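
A small probe along these lines can show which behavior a given
platform has. This is only a sketch: link_keeps_symlink() and the
link-probe.* file names are invented for illustration, and the caller
is assumed to pass an existing, writable directory.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical probe: inside the existing directory `dir`, create a
 * file, a symlink to it, and then link() the symlink to a new name.
 * Returns 1 if the new name is itself a symlink, 0 if it is a hard
 * link to the symlink's target, and -1 on any error.
 */
static int link_keeps_symlink(const char *dir)
{
	char file[4096], sym[4096], out[4096];
	struct stat st;
	FILE *f;
	int ret = -1;

	assert(dir != NULL);
	snprintf(file, sizeof(file), "%s/link-probe.file", dir);
	snprintf(sym, sizeof(sym), "%s/link-probe.sym", dir);
	snprintf(out, sizeof(out), "%s/link-probe.out", dir);

	/* Remove leftovers from an earlier, interrupted run. */
	unlink(out);
	unlink(sym);
	unlink(file);

	f = fopen(file, "w");
	if (!f)
		return -1;
	fclose(f);

	/* Relative target, resolved against the symlink's own directory. */
	if (symlink("link-probe.file", sym))
		goto cleanup;
	if (link(sym, out))
		goto cleanup;
	/* lstat() so that we inspect the new name itself, not its target. */
	if (lstat(out, &st))
		goto cleanup;
	ret = S_ISLNK(st.st_mode) ? 1 : 0;

cleanup:
	unlink(out);
	unlink(sym);
	unlink(file);
	return ret;
}
```

POSIX leaves it implementation-defined whether link() follows a
symlink, which is why a portable caller has to pick the behavior
explicitly instead of relying on plain link().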

I've pushed out a version of mine here which you might want to pull in:
https://github.com/git/git/compare/master...avar:clone-dir-iterator-3

I.e. this whole thing is silly, but it just preserves the notion that
we're not going to introduce behavior changes while we're refactoring.

So it adds a commit right after the tests I added to detect this case,
and uses symlink() or link() as appropriate instead of link().

There's then a commit at the end you might want to squash in that
reproduces this behavior on top of your iterator refactoring.

Of course the DIR_ITERATOR_FOLLOW_SYMLINKS flag at this point is rather
silly. We're telling it to stat(), and then end up needing both stat()
and lstat() data.

I'm starting to think that this interface, which previously had only one
caller but now has two, exists at the wrong abstraction level. I.e. it
itself needs to call lstat(). It seems sensible to always do that and
leave it to the caller to call stat() if needed, as I believe Duy pointed
out. I also noticed that dir-iterator.h still has a comment to the effect
that it'll call "lstat()", even though we now have a "stat() or
lstat()?" flag.

> On Tue, Feb 26, 2019 at 9:28 AM Ævar Arnfjörð Bjarmason
> <avarab@gmail.com> wrote:
>>
>> Add tests for what happens when we locally clone .git/objects
>> directories where some of the loose objects or packs are symlinked, or
>> when there are unknown files there.
>>
>> I'm bending over backwards here to avoid a SHA1 dependency. See [1]
>> for an earlier and simpler version that hardcoded SHA-1s.
>>
>> This behavior has been the same for a *long* time, but hasn't been
>> tested for.
>>
>> There's a good post-hoc argument to be made for copying over unknown
>> things, e.g. I'd like a git version that doesn't know about the
>> commit-graph to copy it under "clone --local" so a newer git version
>> can make use of it.
>>
>> But the behavior shown here with symlinks seems pretty
>> random. E.g. if "pack" is a symlink we end up with two copies of the
>> contents, and only transfer some symlinks as-is.
>>
>> In follow-up commits we'll look at changing some of this behavior, but
>> for now let's just assert it as-is so we'll notice what we'll change
>> later.
>>
>> 1. https://public-inbox.org/git/20190226002625.13022-5-avarab@gmail.com/
>>
>> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
>> ---
>>  t/t5604-clone-reference.sh | 142 +++++++++++++++++++++++++++++++++++++
>>  1 file changed, 142 insertions(+)
>>
>> diff --git a/t/t5604-clone-reference.sh b/t/t5604-clone-reference.sh
>> index 4320082b1b..cb0dc22d14 100755
>> --- a/t/t5604-clone-reference.sh
>> +++ b/t/t5604-clone-reference.sh
>> @@ -221,4 +221,146 @@ test_expect_success 'clone, dissociate from alternates' '
>>         ( cd C && git fsck )
>>  '
>>
>> +test_expect_success 'setup repo with garbage in objects/*' '
>> +       git init S &&
>> +       (
>> +               cd S &&
>> +               test_commit A &&
>> +
>> +               cd .git/objects &&
>> +               >.some-hidden-file &&
>> +               >some-file &&
>> +               mkdir .some-hidden-dir &&
>> +               >.some-hidden-dir/some-file &&
>> +               >.some-hidden-dir/.some-dot-file &&
>> +               mkdir some-dir &&
>> +               >some-dir/some-file &&
>> +               >some-dir/.some-dot-file
>> +       )
>> +'
>> +
>> +test_expect_success 'clone a repo with garbage in objects/*' '
>> +       for option in --local --no-hardlinks --shared --dissociate
>> +       do
>> +               git clone $option S S$option || return 1 &&
>> +               git -C S$option fsck || return 1
>> +       done &&
>> +       find S-* -name "*some*" | sort >actual &&
>> +       cat >expected <<-EOF &&
>> +       S--dissociate/.git/objects/.some-hidden-file
>> +       S--dissociate/.git/objects/some-dir
>> +       S--dissociate/.git/objects/some-dir/.some-dot-file
>> +       S--dissociate/.git/objects/some-dir/some-file
>> +       S--dissociate/.git/objects/some-file
>> +       S--local/.git/objects/.some-hidden-file
>> +       S--local/.git/objects/some-dir
>> +       S--local/.git/objects/some-dir/.some-dot-file
>> +       S--local/.git/objects/some-dir/some-file
>> +       S--local/.git/objects/some-file
>> +       S--no-hardlinks/.git/objects/.some-hidden-file
>> +       S--no-hardlinks/.git/objects/some-dir
>> +       S--no-hardlinks/.git/objects/some-dir/.some-dot-file
>> +       S--no-hardlinks/.git/objects/some-dir/some-file
>> +       S--no-hardlinks/.git/objects/some-file
>> +       EOF
>> +       test_cmp expected actual
>> +'
>> +
>> +test_expect_success SYMLINKS 'setup repo with manually symlinked objects/*' '
>> +       git init T &&
>> +       (
>> +               cd T &&
>> +               test_commit A &&
>> +               git gc &&
>> +               (
>> +                       cd .git/objects &&
>> +                       mv pack packs &&
>> +                       ln -s packs pack
>> +               ) &&
>> +               test_commit B &&
>> +               (
>> +                       cd .git/objects &&
>> +                       find ?? -type d >loose-dirs &&
>> +                       last_loose=$(tail -n 1 loose-dirs) &&
>> +                       mv $last_loose a-loose-dir &&
>> +                       ln -s a-loose-dir $last_loose &&
>> +                       first_loose=$(head -n 1 loose-dirs) &&
>> +                       (
>> +                               cd $first_loose &&
>> +                               obj=$(ls *) &&
>> +                               mv $obj ../an-object &&
>> +                               ln -s ../an-object $obj
>> +                       ) &&
>> +                       find . -type f | sort >../../../T.objects-files.raw &&
>> +                       find . -type l | sort >../../../T.objects-links.raw
>> +               )
>> +       ) &&
>> +       git -C T fsck &&
>> +       git -C T rev-list --all --objects >T.objects
>> +'
>> +
>> +
>> +test_expect_success SYMLINKS 'clone repo with symlinked objects/*' '
>> +       for option in --local --no-hardlinks --shared --dissociate
>> +       do
>> +               git clone $option T T$option || return 1 &&
>> +               git -C T$option fsck || return 1 &&
>> +               git -C T$option rev-list --all --objects >T$option.objects &&
>> +               test_cmp T.objects T$option.objects &&
>> +               (
>> +                       cd T$option/.git/objects &&
>> +                       find . -type f | sort >../../../T$option.objects-files.raw &&
>> +                       find . -type l | sort >../../../T$option.objects-links.raw
>> +               )
>> +       done &&
>> +
>> +       for raw in $(ls T*.raw)
>> +       do
>> +               sed -e "s!/..\$!/X!; s!/../!/Y/!; s![0-9a-f]\{38,\}!Z!" <$raw >$raw.de-sha || return 1
>> +       done &&
>> +
>> +       cat >expected-files <<-EOF &&
>> +       ./Y/Z
>> +       ./a-loose-dir/Z
>> +       ./an-object
>> +       ./Y/Z
>> +       ./info/packs
>> +       ./loose-dirs
>> +       ./pack/pack-Z.idx
>> +       ./pack/pack-Z.pack
>> +       ./packs/pack-Z.idx
>> +       ./packs/pack-Z.pack
>> +       EOF
>> +       cat >expected-links <<-EOF &&
>> +       ./Y/Z
>> +       EOF
>> +       for option in --local --dissociate
>> +       do
>> +               test_cmp expected-files T$option.objects-files.raw.de-sha || return 1 &&
>> +               test_cmp expected-links T$option.objects-links.raw.de-sha || return 1
>> +       done &&
>> +
>> +       cat >expected-files <<-EOF &&
>> +       ./Y/Z
>> +       ./Y/Z
>> +       ./a-loose-dir/Z
>> +       ./an-object
>> +       ./Y/Z
>> +       ./info/packs
>> +       ./loose-dirs
>> +       ./pack/pack-Z.idx
>> +       ./pack/pack-Z.pack
>> +       ./packs/pack-Z.idx
>> +       ./packs/pack-Z.pack
>> +       EOF
>> +       test_cmp expected-files T--no-hardlinks.objects-files.raw.de-sha &&
>> +       test_must_be_empty T--no-hardlinks.objects-links.raw.de-sha &&
>> +
>> +       cat >expected-files <<-EOF &&
>> +       ./info/alternates
>> +       EOF
>> +       test_cmp expected-files T--shared.objects-files.raw &&
>> +       test_must_be_empty T--shared.objects-links.raw
>> +'
>> +
>>  test_done
>> --
>> 2.21.0.rc2.261.ga7da99ff1b
>>

^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [RFC PATCH v3 1/5] clone: test for our behavior on odd objects/* content
  2019-03-01 13:49     ` Ævar Arnfjörð Bjarmason
@ 2019-03-13  3:17       ` Matheus Tavares
  0 siblings, 0 replies; 127+ messages in thread
From: Matheus Tavares @ 2019-03-13  3:17 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Junio C Hamano, Nguyễn Thái Ngọc Duy,
	Thomas Gummerer, Christian Couder

Hi, Ævar

First of all, I must apologize for my very late reply. I just got back 
from a trip and only now have been able to look again at this series.

On Fri, Mar 1, 2019 at 10:49 AM Ævar Arnfjörð Bjarmason 
<avarab@gmail.com> wrote:
 >
 >
 > On Thu, Feb 28 2019, Matheus Tavares Bernardino wrote:
 >
 > > Hi, Ævar
 > >
 > > I'm finishing the required changes in this series to send a v4, but
 > > when submitting to travis ci, I got some errors on the
 > > t5604-clone-reference test:
 > > https://travis-ci.org/MatheusBernardino/git/builds/500007587
 >
 > I don't have access to an OSX box, but could reproduce the failure on
 > NetBSD.
 >
 > It's because there link() when faced with a symlink behaves
 > differently. On GNU/Linux link()-ing a symlink will produce another
 > symlink like it, on NetBSD (and presumably OSX) doing that will produce
 > a hardlink to the file the existing symlink points to.

Hm, interesting. I installed NetBSD here and played with it a little: it
seems that the inconsistency comes from the fact that link() follows
symlinks on NetBSD but not on Linux. I.e., if you have a file "C"
link()-ed to a file "B" which, in turn, is a symlink to a file "A",
running "ls -li A B C" shows that:
- On Linux, C points to B's inode
- On NetBSD, C points to A's inode
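
The experiment above can also be sketched in C. This is only an
illustration, not code from the series: the function name and the
temporary directory template are made up for the example, and the
result is expected to differ between platforms exactly as described.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper: create A (regular file), B (symlink to A) and
 * C (link() of B) inside a fresh temporary directory, then report what
 * C turned out to be:
 *   1  C is itself a symlink (link() did not follow B)
 *   0  C is a hardlink to A  (link() followed B)
 *  -1  something failed along the way
 */
int link_of_symlink_is_symlink(void)
{
	char dir[] = "/tmp/link-demo-XXXXXX";
	char a[64], b[64], c[64];
	struct stat st_a, st_c;
	FILE *f;
	int ret = -1;

	if (!mkdtemp(dir))
		return -1;
	snprintf(a, sizeof(a), "%s/A", dir);
	snprintf(b, sizeof(b), "%s/B", dir);
	snprintf(c, sizeof(c), "%s/C", dir);

	f = fopen(a, "w");
	if (!f)
		goto cleanup;
	fclose(f);

	/* B -> A (relative target, resolved inside dir), then C = link(B) */
	if (symlink("A", b) || link(b, c))
		goto cleanup;
	/* lstat() so that a symlink at C is reported as a symlink */
	if (lstat(a, &st_a) || lstat(c, &st_c))
		goto cleanup;

	if (S_ISLNK(st_c.st_mode))
		ret = 1;
	else if (st_c.st_ino == st_a.st_ino)
		ret = 0;

cleanup:
	unlink(c);
	unlink(b);
	unlink(a);
	rmdir(dir);
	return ret;
}
```

Going by the observations above, a caller should see 1 on Linux and 0
on NetBSD.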

 > I've pushed out a version of mine here which you might want to pull in:
 > https://github.com/git/git/compare/master...avar:clone-dir-iterator-3
 >
 > I.e. this whole thing is silly, but just preserving the notion that
 > we're not going to introduce behavior changes as we're refactoring.
 >
 > So it adds a commit right after the tests I added to detect this case,
 > and use symlink() or link() as appropriate instead of link().

I think this still modifies the current behaviour a little, since
symlink() will make dest->buf a new symlink with its own inode (pointing
to the same target as src->buf), while the current behaviour, on Linux,
is to make dest->buf a hardlink to src->buf (same inode). In other
words, symlink() creates a new symlink at dest->buf, while link(), on
Linux, creates a hardlink to the given symlink.

 > There's then a commit at the end you might want to squash in that
 > reproduces this behavior on top of your iterator refactoring.
 >
 > Of course the DIR_ITERATOR_FOLLOW_SYMLINKS flag at this point is rather
 > silly. We're telling it to stat(), and then end up needing both stat()
 > and lstat() data.
 >
 > I'm starting to think that this interface which previously only had one
 > caller, but now has two exists at the wrong abstraction level. I.e. it
 > itself needs to call lstat(). Seems sensible to always do that and leave
 > it to the caller to call stat() if they need, as I believe Duy pointed
 > out.

I see what you mean, but if the caller needs to call stat() itself, then
whenever it encounters a symlink to a directory it would have to start a
new directory iteration over the symlinked dir and its subdirectories
(if it wants to follow symlinks). This approach could become a little
messy, IMHO. And just by calling stat() in dir-iterator we already get
the symlinked directories iterated "for free", without having to modify
anything else in the code. So I still think it is a good idea to have
the DIR_ITERATOR_FOLLOW_SYMLINKS flag in dir-iterator, making it call
stat() instead of lstat().
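
To illustrate why stat() gives the traversal "for free", here is a
minimal sketch (hypothetical helper names, not code from the patch): a
recursive walker usually descends only into entries whose mode says
"directory", and for a symlink to a directory only stat() reports that.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Return 1 if path looks like a directory, 0 if not, -1 on error.
 * With follow_symlinks set, stat() is used and symlinks are resolved;
 * otherwise lstat() describes the symlink itself.
 */
int path_is_dir(const char *path, int follow_symlinks)
{
	struct stat st;
	int ret = follow_symlinks ? stat(path, &st) : lstat(path, &st);

	if (ret < 0)
		return -1;
	return S_ISDIR(st.st_mode) ? 1 : 0;
}

/*
 * Hypothetical demo: build "real/" and "link -> real" in a temporary
 * directory and ask path_is_dir() about the symlink.
 */
int symlinked_dir_is_dir(int follow_symlinks)
{
	char dir[] = "/tmp/stat-demo-XXXXXX";
	char real[64], link_path[64];
	int ret = -1;

	if (!mkdtemp(dir))
		return -1;
	snprintf(real, sizeof(real), "%s/real", dir);
	snprintf(link_path, sizeof(link_path), "%s/link", dir);

	if (!mkdir(real, 0700) && !symlink("real", link_path))
		ret = path_is_dir(link_path, follow_symlinks);

	unlink(link_path);
	rmdir(real);
	rmdir(dir);
	return ret;
}
```

With follow_symlinks set the symlink is reported as a directory, so an
iterator that recurses on S_ISDIR() would descend into it; without it,
the entry is just a symlink and is not descended into.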

 > Also noticed that dir-iterator.h still has a comment to the effect
 > that it'll call "lstat()", even though we now have a "stat() or
 > lstat()?" flag.

Thanks for noticing it, I will fix it in v4.

 > > On Tue, Feb 26, 2019 at 9:28 AM Ævar Arnfjörð Bjarmason
 > > <avarab@gmail.com> wrote:
 > >>
 > >> Add tests for what happens when we locally clone .git/objects
 > >> directories where some of the loose objects or packs are symlinked, or
 > >> when when there's unknown files there.
 > >>
 > >> I'm bending over backwards here to avoid a SHA1 dependency. See [1]
 > >> for an earlier and simpler version that hardcoded a SHA-1s.
 > >>
 > >> This behavior has been the same for a *long* time, but hasn't been
 > >> tested for.
 > >>
 > >> There's a good post-hoc argument to be made for copying over unknown
 > >> things, e.g. I'd like a git version that doesn't know about the
 > >> commit-graph to copy it under "clone --local" so a newer git version
 > >> can make use of it.
 > >>
 > >> But the behavior showed where with symlinks seems pretty
 > >> random. E.g. if "pack" is a symlink we end up with two copies of the
 > >> contents, and only transfer some symlinks as-is.
 > >>
 > >> In follow-up commits we'll look at changing some of this behavior, but
 > >> for now let's just assert it as-is so we'll notice what we'll change
 > >> later.
 > >>
 > >> 1. https://public-inbox.org/git/20190226002625.13022-5-avarab@gmail.com/
 > >>
 > >> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
 > >> ---
 > >>  t/t5604-clone-reference.sh | 142 +++++++++++++++++++++++++++++++++++++
 > >>  1 file changed, 142 insertions(+)
 > >>
 > >> diff --git a/t/t5604-clone-reference.sh b/t/t5604-clone-reference.sh
 > >> index 4320082b1b..cb0dc22d14 100755
 > >> --- a/t/t5604-clone-reference.sh
 > >> +++ b/t/t5604-clone-reference.sh
 > >> @@ -221,4 +221,146 @@ test_expect_success 'clone, dissociate from alternates' '
 > >>         ( cd C && git fsck )
 > >>  '
 > >>
 > >> +test_expect_success 'setup repo with garbage in objects/*' '
 > >> +       git init S &&
 > >> +       (
 > >> +               cd S &&
 > >> +               test_commit A &&
 > >> +
 > >> +               cd .git/objects &&
 > >> +               >.some-hidden-file &&
 > >> +               >some-file &&
 > >> +               mkdir .some-hidden-dir &&
 > >> +               >.some-hidden-dir/some-file &&
 > >> +               >.some-hidden-dir/.some-dot-file &&
 > >> +               mkdir some-dir &&
 > >> +               >some-dir/some-file &&
 > >> +               >some-dir/.some-dot-file
 > >> +       )
 > >> +'
 > >> +
 > >> +test_expect_success 'clone a repo with garbage in objects/*' '
 > >> +       for option in --local --no-hardlinks --shared --dissociate
 > >> +       do
 > >> +               git clone $option S S$option || return 1 &&
 > >> +               git -C S$option fsck || return 1
 > >> +       done &&
 > >> +       find S-* -name "*some*" | sort >actual &&
 > >> +       cat >expected <<-EOF &&
 > >> +       S--dissociate/.git/objects/.some-hidden-file
 > >> +       S--dissociate/.git/objects/some-dir
 > >> +       S--dissociate/.git/objects/some-dir/.some-dot-file
 > >> +       S--dissociate/.git/objects/some-dir/some-file
 > >> +       S--dissociate/.git/objects/some-file
 > >> +       S--local/.git/objects/.some-hidden-file
 > >> +       S--local/.git/objects/some-dir
 > >> +       S--local/.git/objects/some-dir/.some-dot-file
 > >> +       S--local/.git/objects/some-dir/some-file
 > >> +       S--local/.git/objects/some-file
 > >> +       S--no-hardlinks/.git/objects/.some-hidden-file
 > >> +       S--no-hardlinks/.git/objects/some-dir
 > >> +       S--no-hardlinks/.git/objects/some-dir/.some-dot-file
 > >> +       S--no-hardlinks/.git/objects/some-dir/some-file
 > >> +       S--no-hardlinks/.git/objects/some-file
 > >> +       EOF
 > >> +       test_cmp expected actual
 > >> +'
 > >> +
 > >> +test_expect_success SYMLINKS 'setup repo with manually symlinked objects/*' '
 > >> +       git init T &&
 > >> +       (
 > >> +               cd T &&
 > >> +               test_commit A &&
 > >> +               git gc &&
 > >> +               (
 > >> +                       cd .git/objects &&
 > >> +                       mv pack packs &&
 > >> +                       ln -s packs pack
 > >> +               ) &&
 > >> +               test_commit B &&
 > >> +               (
 > >> +                       cd .git/objects &&
 > >> +                       find ?? -type d >loose-dirs &&
 > >> +                       last_loose=$(tail -n 1 loose-dirs) &&
 > >> +                       mv $last_loose a-loose-dir &&
 > >> +                       ln -s a-loose-dir $last_loose &&
 > >> +                       first_loose=$(head -n 1 loose-dirs) &&
 > >> +                       (
 > >> +                               cd $first_loose &&
 > >> +                               obj=$(ls *) &&
 > >> +                               mv $obj ../an-object &&
 > >> +                               ln -s ../an-object $obj
 > >> +                       ) &&
 > >> +                       find . -type f | sort >../../../T.objects-files.raw &&
 > >> +                       find . -type l | sort >../../../T.objects-links.raw
 > >> +               )
 > >> +       ) &&
 > >> +       git -C T fsck &&
 > >> +       git -C T rev-list --all --objects >T.objects
 > >> +'
 > >> +
 > >> +
 > >> +test_expect_success SYMLINKS 'clone repo with symlinked objects/*' '
 > >> +       for option in --local --no-hardlinks --shared --dissociate
 > >> +       do
 > >> +               git clone $option T T$option || return 1 &&
 > >> +               git -C T$option fsck || return 1 &&
 > >> +               git -C T$option rev-list --all --objects >T$option.objects &&
 > >> +               test_cmp T.objects T$option.objects &&
 > >> +               (
 > >> +                       cd T$option/.git/objects &&
 > >> +                       find . -type f | sort >../../../T$option.objects-files.raw &&
 > >> +                       find . -type l | sort >../../../T$option.objects-links.raw
 > >> +               )
 > >> +       done &&
 > >> +
 > >> +       for raw in $(ls T*.raw)
 > >> +       do
 > >> +               sed -e "s!/..\$!/X!; s!/../!/Y/!; s![0-9a-f]\{38,\}!Z!" <$raw >$raw.de-sha || return 1
 > >> +       done &&
 > >> +
 > >> +       cat >expected-files <<-EOF &&
 > >> +       ./Y/Z
 > >> +       ./a-loose-dir/Z
 > >> +       ./an-object
 > >> +       ./Y/Z
 > >> +       ./info/packs
 > >> +       ./loose-dirs
 > >> +       ./pack/pack-Z.idx
 > >> +       ./pack/pack-Z.pack
 > >> +       ./packs/pack-Z.idx
 > >> +       ./packs/pack-Z.pack
 > >> +       EOF
 > >> +       cat >expected-links <<-EOF &&
 > >> +       ./Y/Z
 > >> +       EOF
 > >> +       for option in --local --dissociate
 > >> +       do
 > >> +               test_cmp expected-files T$option.objects-files.raw.de-sha || return 1 &&
 > >> +               test_cmp expected-links T$option.objects-links.raw.de-sha || return 1
 > >> +       done &&
 > >> +
 > >> +       cat >expected-files <<-EOF &&
 > >> +       ./Y/Z
 > >> +       ./Y/Z
 > >> +       ./a-loose-dir/Z
 > >> +       ./an-object
 > >> +       ./Y/Z
 > >> +       ./info/packs
 > >> +       ./loose-dirs
 > >> +       ./pack/pack-Z.idx
 > >> +       ./pack/pack-Z.pack
 > >> +       ./packs/pack-Z.idx
 > >> +       ./packs/pack-Z.pack
 > >> +       EOF
 > >> +       test_cmp expected-files T--no-hardlinks.objects-files.raw.de-sha &&
 > >> +       test_must_be_empty T--no-hardlinks.objects-links.raw.de-sha &&
 > >> +
 > >> +       cat >expected-files <<-EOF &&
 > >> +       ./info/alternates
 > >> +       EOF
 > >> +       test_cmp expected-files T--shared.objects-files.raw &&
 > >> +       test_must_be_empty T--shared.objects-links.raw
 > >> +'
 > >> +
 > >>  test_done
 > >> --
 > >> 2.21.0.rc2.261.ga7da99ff1b
 > >>

^ permalink raw reply	[flat|nested] 127+ messages in thread

* [GSoC][PATCH v4 0/7] clone: dir-iterator refactoring with tests
  2019-02-26 12:28 ` [RFC PATCH v3 " Ævar Arnfjörð Bjarmason
  2019-02-26 20:56   ` Matheus Tavares Bernardino
@ 2019-03-22 23:22   ` " Matheus Tavares
  2019-03-22 23:22     ` [GSoC][PATCH v4 1/7] clone: test for our behavior on odd objects/* content Matheus Tavares
                       ` (7 more replies)
  1 sibling, 8 replies; 127+ messages in thread
From: Matheus Tavares @ 2019-03-22 23:22 UTC (permalink / raw)
  To: git
  Cc: Thomas Gummerer, Ævar Arnfjörð Bjarmason,
	Christian Couder, Nguyễn Thái Ngọc Duy,
	kernel-usp

This patchset contains:
- a replacement of the explicit recursive dir iteration at
  copy_or_link_directory with the dir-iterator API;
- some refactoring and behaviour changes at local clone, mainly to
  take care of symlinks and hidden files at .git/objects; and
- tests for these types of files

Changes since v3 include:
- Addressed Duy's and Ævar's comments and suggestions in v2,
  including but not limited to:
  - Add patch to replace strcmp with fspathcmp
  - Code comments refactoring
  - Unindent snippet at mkdir_if_missing
- Made the subtests added to t5604 pass under GIT_TEST_MULTI_PACK_INDEX=1
  and GIT_TEST_COMMIT_GRAPH=1
- Re-implemented patch 2 with linkat(), to be simpler and have a safer
  behaviour when cloning repos with symlinks at .git/objects
- Split first patch's tests into patches 1 and 2, tweaked them a little
  to reflect the changes in the previous item, and replaced some uses of
  the string 'link' with 'symlink' just to avoid confusion with
  'hardlinks', which are also known simply as 'links'.

v3: https://public-inbox.org/git/20190226122829.19178-1-avarab@gmail.com/

Matheus Tavares (6):
  clone: better handle symlinked files at .git/objects/
  dir-iterator: add flags parameter to dir_iterator_begin
  clone: copy hidden paths at local clone
  clone: extract function from copy_or_link_directory
  clone: use dir-iterator to avoid explicit dir traversal
  clone: Replace strcmp by fspathcmp

Ævar Arnfjörð Bjarmason (1):
  clone: test for our behavior on odd objects/* content

 builtin/clone.c            |  72 ++++++++++---------
 dir-iterator.c             |  28 +++++++-
 dir-iterator.h             |  39 +++++++++--
 refs/files-backend.c       |   2 +-
 t/t5604-clone-reference.sh | 137 +++++++++++++++++++++++++++++++++++++
 5 files changed, 236 insertions(+), 42 deletions(-)

-- 
2.20.1


^ permalink raw reply	[flat|nested] 127+ messages in thread

* [GSoC][PATCH v4 1/7] clone: test for our behavior on odd objects/* content
  2019-03-22 23:22   ` [GSoC][PATCH v4 0/7] clone: dir-iterator " Matheus Tavares
@ 2019-03-22 23:22     ` Matheus Tavares
  2019-03-24 18:09       ` Matheus Tavares Bernardino
                         ` (2 more replies)
  2019-03-22 23:22     ` [GSoC][PATCH v4 2/7] clone: better handle symlinked files at .git/objects/ Matheus Tavares
                       ` (6 subsequent siblings)
  7 siblings, 3 replies; 127+ messages in thread
From: Matheus Tavares @ 2019-03-22 23:22 UTC (permalink / raw)
  To: git
  Cc: Thomas Gummerer, Ævar Arnfjörð Bjarmason,
	Christian Couder, Nguyễn Thái Ngọc Duy,
	kernel-usp, Alex Riesen, Junio C Hamano

From: Ævar Arnfjörð Bjarmason <avarab@gmail.com>

Add tests for what happens when we perform a local clone on a repo
containing odd files in the .git/objects directory, such as symlinks to
other dirs, or unknown files.

I'm bending over backwards here to avoid a SHA-1 dependency. See [1]
for an earlier and simpler version that hardcoded SHA-1s.

This behavior has been the same for a *long* time, but hasn't been
tested for.

There's a good post-hoc argument to be made for copying over unknown
things, e.g. I'd like a git version that doesn't know about the
commit-graph to copy it under "clone --local" so a newer git version
can make use of it.

In follow-up commits we'll look at changing some of this behavior, but
for now let's just assert it as-is so we'll notice what we'll change
later.

1. https://public-inbox.org/git/20190226002625.13022-5-avarab@gmail.com/

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
Helped-by: Matheus Tavares <matheus.bernardino@usp.br>
---
 t/t5604-clone-reference.sh | 116 +++++++++++++++++++++++++++++++++++++
 1 file changed, 116 insertions(+)

diff --git a/t/t5604-clone-reference.sh b/t/t5604-clone-reference.sh
index 4320082b1b..708b1a2c66 100755
--- a/t/t5604-clone-reference.sh
+++ b/t/t5604-clone-reference.sh
@@ -221,4 +221,120 @@ test_expect_success 'clone, dissociate from alternates' '
 	( cd C && git fsck )
 '
 
+test_expect_success 'setup repo with garbage in objects/*' '
+	git init S &&
+	(
+		cd S &&
+		test_commit A &&
+
+		cd .git/objects &&
+		>.some-hidden-file &&
+		>some-file &&
+		mkdir .some-hidden-dir &&
+		>.some-hidden-dir/some-file &&
+		>.some-hidden-dir/.some-dot-file &&
+		mkdir some-dir &&
+		>some-dir/some-file &&
+		>some-dir/.some-dot-file
+	)
+'
+
+test_expect_success 'clone a repo with garbage in objects/*' '
+	for option in --local --no-hardlinks --shared --dissociate
+	do
+		git clone $option S S$option || return 1 &&
+		git -C S$option fsck || return 1
+	done &&
+	find S-* -name "*some*" | sort >actual &&
+	cat >expected <<-EOF &&
+	S--dissociate/.git/objects/.some-hidden-file
+	S--dissociate/.git/objects/some-dir
+	S--dissociate/.git/objects/some-dir/.some-dot-file
+	S--dissociate/.git/objects/some-dir/some-file
+	S--dissociate/.git/objects/some-file
+	S--local/.git/objects/.some-hidden-file
+	S--local/.git/objects/some-dir
+	S--local/.git/objects/some-dir/.some-dot-file
+	S--local/.git/objects/some-dir/some-file
+	S--local/.git/objects/some-file
+	S--no-hardlinks/.git/objects/.some-hidden-file
+	S--no-hardlinks/.git/objects/some-dir
+	S--no-hardlinks/.git/objects/some-dir/.some-dot-file
+	S--no-hardlinks/.git/objects/some-dir/some-file
+	S--no-hardlinks/.git/objects/some-file
+	EOF
+	test_cmp expected actual
+'
+
+test_expect_success SYMLINKS 'setup repo with manually symlinked dirs and unknown files at objects/' '
+	git init T &&
+	(
+		cd T &&
+		test_commit A &&
+		git gc &&
+		(
+			cd .git/objects &&
+			mv pack packs &&
+			ln -s packs pack
+		) &&
+		test_commit B &&
+		(
+			cd .git/objects &&
+			find ?? -type d >loose-dirs &&
+			last_loose=$(tail -n 1 loose-dirs) &&
+			rm -f loose-dirs &&
+			mv $last_loose a-loose-dir &&
+			ln -s a-loose-dir $last_loose &&
+			find . -type f | sort >../../../T.objects-files.raw &&
+			echo unknown_content> unknown_file
+		)
+	) &&
+	git -C T fsck &&
+	git -C T rev-list --all --objects >T.objects
+'
+
+
+test_expect_success SYMLINKS 'clone repo with symlinked dirs and unknown files at objects/' '
+	for option in --local --no-hardlinks --shared --dissociate
+	do
+		git clone $option T T$option || return 1 &&
+		git -C T$option fsck || return 1 &&
+		git -C T$option rev-list --all --objects >T$option.objects &&
+		test_cmp T.objects T$option.objects &&
+		(
+			cd T$option/.git/objects &&
+			find . -type f | sort >../../../T$option.objects-files.raw
+		)
+	done &&
+
+	for raw in $(ls T*.raw)
+	do
+		sed -e "s!/..\$!/X!; s!/../!/Y/!; s![0-9a-f]\{38,\}!Z!" \
+		    -e "/multi-pack-index/d" -e "/commit-graph/d" <$raw >$raw.de-sha || return 1
+	done &&
+
+	cat >expected-files <<-EOF &&
+	./Y/Z
+	./Y/Z
+	./a-loose-dir/Z
+	./Y/Z
+	./info/packs
+	./pack/pack-Z.idx
+	./pack/pack-Z.pack
+	./packs/pack-Z.idx
+	./packs/pack-Z.pack
+	./unknown_file
+	EOF
+
+	for option in --local --dissociate --no-hardlinks
+	do
+		test_cmp expected-files T$option.objects-files.raw.de-sha || return 1
+	done &&
+
+	cat >expected-files <<-EOF &&
+	./info/alternates
+	EOF
+	test_cmp expected-files T--shared.objects-files.raw
+'
+
 test_done
-- 
2.20.1


^ permalink raw reply	[flat|nested] 127+ messages in thread

* [GSoC][PATCH v4 2/7] clone: better handle symlinked files at .git/objects/
  2019-03-22 23:22   ` [GSoC][PATCH v4 0/7] clone: dir-iterator " Matheus Tavares
  2019-03-22 23:22     ` [GSoC][PATCH v4 1/7] clone: test for our behavior on odd objects/* content Matheus Tavares
@ 2019-03-22 23:22     ` Matheus Tavares
  2019-03-28 22:10       ` Thomas Gummerer
  2019-03-22 23:22     ` [GSoC][PATCH v4 3/7] dir-iterator: add flags parameter to dir_iterator_begin Matheus Tavares
                       ` (5 subsequent siblings)
  7 siblings, 1 reply; 127+ messages in thread
From: Matheus Tavares @ 2019-03-22 23:22 UTC (permalink / raw)
  To: git
  Cc: Thomas Gummerer, Ævar Arnfjörð Bjarmason,
	Christian Couder, Nguyễn Thái Ngọc Duy,
	kernel-usp, Benoit Pierre, Junio C Hamano

There is currently an odd behaviour when locally cloning a repository
with symlinks at .git/objects: using --no-hardlinks all symlinks are
dereferenced, but without it Git will try to hardlink the files with the
link() function, which has OS-specific behaviour on symlinks. On OSX
and NetBSD, it creates a hardlink to the file pointed to by the symlink,
whilst on GNU/Linux it creates a hardlink to the symlink itself.

On Manjaro GNU/Linux:
    $ touch a
    $ ln -s a b
    $ link b c
    $ ls -li a b c
    155 [...] a
    156 [...] b -> a
    156 [...] c -> a

But on NetBSD:
    $ ls -li a b c
    2609160 [...] a
    2609164 [...] b -> a
    2609160 [...] c

It's not good for the result of a local clone to be OS-dependent, and
since the behaviour on GNU/Linux may result in broken symlinks, let's
re-implement it with linkat() instead of link(), using a flag to always
follow symlinks and make the hardlink point to the target file. With
this, besides standardizing the behaviour, no broken symlinks will be
produced. Also, add tests for symlinked files at .git/objects/.
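
As a standalone illustration of the linkat() call (a sketch only; the
helper name and temporary paths are invented for this example and are
not part of the patch), the following checks that hardlinking a symlink
with AT_SYMLINK_FOLLOW yields a hardlink to the symlink's target:

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper: create A (regular file) and B (symlink to A) in
 * a temporary directory, hardlink B to C with AT_SYMLINK_FOLLOW, and
 * return 1 if C is a non-symlink sharing A's inode, 0 if not, or -1 on
 * error.
 */
int linkat_follow_links_to_target(void)
{
	char dir[] = "/tmp/linkat-demo-XXXXXX";
	char a[64], b[64], c[64];
	struct stat st_a, st_c;
	FILE *f;
	int ret = -1;

	if (!mkdtemp(dir))
		return -1;
	snprintf(a, sizeof(a), "%s/A", dir);
	snprintf(b, sizeof(b), "%s/B", dir);
	snprintf(c, sizeof(c), "%s/C", dir);

	f = fopen(a, "w");
	if (!f)
		goto cleanup;
	fclose(f);

	/* AT_FDCWD on both sides: paths are resolved as with plain link() */
	if (symlink("A", b) ||
	    linkat(AT_FDCWD, b, AT_FDCWD, c, AT_SYMLINK_FOLLOW))
		goto cleanup;
	if (lstat(a, &st_a) || lstat(c, &st_c))
		goto cleanup;

	ret = !S_ISLNK(st_c.st_mode) && st_c.st_ino == st_a.st_ino;

cleanup:
	unlink(c);
	unlink(b);
	unlink(a);
	rmdir(dir);
	return ret;
}
```

Because the flag asks for the symlink to be dereferenced, the result
should no longer depend on how the platform's plain link() treats
symlinks.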

Note: Git won't create symlinks at .git/objects itself, but it's better
to handle this case and be friendly with users who manually create them.

Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Co-authored-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 builtin/clone.c            |  2 +-
 t/t5604-clone-reference.sh | 26 +++++++++++++++++++-------
 2 files changed, 20 insertions(+), 8 deletions(-)

diff --git a/builtin/clone.c b/builtin/clone.c
index 50bde99618..b76f33c635 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -443,7 +443,7 @@ static void copy_or_link_directory(struct strbuf *src, struct strbuf *dest,
 		if (unlink(dest->buf) && errno != ENOENT)
 			die_errno(_("failed to unlink '%s'"), dest->buf);
 		if (!option_no_hardlinks) {
-			if (!link(src->buf, dest->buf))
+			if (!linkat(AT_FDCWD, src->buf, AT_FDCWD, dest->buf, AT_SYMLINK_FOLLOW))
 				continue;
 			if (option_local > 0)
 				die_errno(_("failed to create link '%s'"), dest->buf);
diff --git a/t/t5604-clone-reference.sh b/t/t5604-clone-reference.sh
index 708b1a2c66..76d45f1187 100755
--- a/t/t5604-clone-reference.sh
+++ b/t/t5604-clone-reference.sh
@@ -266,7 +266,7 @@ test_expect_success 'clone a repo with garbage in objects/*' '
 	test_cmp expected actual
 '
 
-test_expect_success SYMLINKS 'setup repo with manually symlinked dirs and unknown files at objects/' '
+test_expect_success SYMLINKS 'setup repo with manually symlinked or unknown files at objects/' '
 	git init T &&
 	(
 		cd T &&
@@ -282,10 +282,18 @@ test_expect_success SYMLINKS 'setup repo with manually symlinked dirs and unknow
 			cd .git/objects &&
 			find ?? -type d >loose-dirs &&
 			last_loose=$(tail -n 1 loose-dirs) &&
-			rm -f loose-dirs &&
 			mv $last_loose a-loose-dir &&
 			ln -s a-loose-dir $last_loose &&
+			first_loose=$(head -n 1 loose-dirs) &&
+			rm -f loose-dirs &&
+			(
+				cd $first_loose &&
+				obj=$(ls *) &&
+				mv $obj ../an-object &&
+				ln -s ../an-object $obj
+			) &&
 			find . -type f | sort >../../../T.objects-files.raw &&
+			find . -type l | sort >../../../T.objects-symlinks.raw &&
 			echo unknown_content> unknown_file
 		)
 	) &&
@@ -294,7 +302,7 @@ test_expect_success SYMLINKS 'setup repo with manually symlinked dirs and unknow
 '
 
 
-test_expect_success SYMLINKS 'clone repo with symlinked dirs and unknown files at objects/' '
+test_expect_success SYMLINKS 'clone repo with symlinked or unknown files at objects/' '
 	for option in --local --no-hardlinks --shared --dissociate
 	do
 		git clone $option T T$option || return 1 &&
@@ -303,7 +311,8 @@ test_expect_success SYMLINKS 'clone repo with symlinked dirs and unknown files a
 		test_cmp T.objects T$option.objects &&
 		(
 			cd T$option/.git/objects &&
-			find . -type f | sort >../../../T$option.objects-files.raw
+			find . -type f | sort >../../../T$option.objects-files.raw &&
+			find . -type l | sort >../../../T$option.objects-symlinks.raw
 		)
 	done &&
 
@@ -317,6 +326,7 @@ test_expect_success SYMLINKS 'clone repo with symlinked dirs and unknown files a
 	./Y/Z
 	./Y/Z
 	./a-loose-dir/Z
+	./an-object
 	./Y/Z
 	./info/packs
 	./pack/pack-Z.idx
@@ -326,15 +336,17 @@ test_expect_success SYMLINKS 'clone repo with symlinked dirs and unknown files a
 	./unknown_file
 	EOF
 
-	for option in --local --dissociate --no-hardlinks
+	for option in --local --no-hardlinks --dissociate
 	do
-		test_cmp expected-files T$option.objects-files.raw.de-sha || return 1
+		test_cmp expected-files T$option.objects-files.raw.de-sha || return 1 &&
+		test_must_be_empty T$option.objects-symlinks.raw.de-sha || return 1
 	done &&
 
 	cat >expected-files <<-EOF &&
 	./info/alternates
 	EOF
-	test_cmp expected-files T--shared.objects-files.raw
+	test_cmp expected-files T--shared.objects-files.raw &&
+	test_must_be_empty T--shared.objects-symlinks.raw
 '
 
 test_done
-- 
2.20.1


^ permalink raw reply	[flat|nested] 127+ messages in thread

* [GSoC][PATCH v4 3/7] dir-iterator: add flags parameter to dir_iterator_begin
  2019-03-22 23:22   ` [GSoC][PATCH v4 0/7] clone: dir-iterator " Matheus Tavares
  2019-03-22 23:22     ` [GSoC][PATCH v4 1/7] clone: test for our behavior on odd objects/* content Matheus Tavares
  2019-03-22 23:22     ` [GSoC][PATCH v4 2/7] clone: better handle symlinked files at .git/objects/ Matheus Tavares
@ 2019-03-22 23:22     ` Matheus Tavares
  2019-03-28 22:19       ` Thomas Gummerer
  2019-03-22 23:22     ` [GSoC][PATCH v4 4/7] clone: copy hidden paths at local clone Matheus Tavares
                       ` (4 subsequent siblings)
  7 siblings, 1 reply; 127+ messages in thread
From: Matheus Tavares @ 2019-03-22 23:22 UTC (permalink / raw)
  To: git
  Cc: Thomas Gummerer, Ævar Arnfjörð Bjarmason,
	Christian Couder, Nguyễn Thái Ngọc Duy,
	kernel-usp, Michael Haggerty, Ramsay Jones, Junio C Hamano

Add the possibility of giving flags to dir_iterator_begin to initialize
a dir-iterator with special options.

Currently possible flags are DIR_ITERATOR_PEDANTIC, which makes
dir_iterator_advance abort immediately in the case of an error while
trying to fetch the next entry; and DIR_ITERATOR_FOLLOW_SYMLINKS, which
makes the iteration follow symlinks to directories and include their
contents in the iteration. These new flags will be used in a subsequent
patch.
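
The intended semantics of the two flags can be sketched outside of Git
as follows. This is not the dir-iterator code: the DEMO_* names and
demo_visit() are hypothetical stand-ins that only mirror the error
policy (warn and continue by default, abort when pedantic, ENOENT
always skipped silently) and the stat()/lstat() choice.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Hypothetical flags, modelled on the ones this patch introduces */
#define DEMO_PEDANTIC        (1 << 0)
#define DEMO_FOLLOW_SYMLINKS (1 << 1)

/*
 * Stat each of the n paths and return how many could be read.
 * Vanished entries (ENOENT) are skipped silently. Any other error
 * produces a warning and is skipped, unless DEMO_PEDANTIC is set, in
 * which case -1 is returned right away.
 */
int demo_visit(const char **paths, size_t n, unsigned flags)
{
	int visited = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		struct stat st;
		int ret;

		if (flags & DEMO_FOLLOW_SYMLINKS)
			ret = stat(paths[i], &st);
		else
			ret = lstat(paths[i], &st);

		if (ret < 0) {
			if (errno != ENOENT) {
				if (flags & DEMO_PEDANTIC)
					return -1;
				fprintf(stderr,
					"warning: error reading path '%s': %s\n",
					paths[i], strerror(errno));
			}
			continue;
		}
		visited++;
	}
	return visited;
}
```

Flags combine with bitwise OR, e.g. DEMO_PEDANTIC | DEMO_FOLLOW_SYMLINKS,
and 0 selects the default behaviour.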

Also adjust refs/files-backend.c to the new dir_iterator_begin
signature.

Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
---
 dir-iterator.c       | 28 +++++++++++++++++++++++++---
 dir-iterator.h       | 39 +++++++++++++++++++++++++++++++++------
 refs/files-backend.c |  2 +-
 3 files changed, 59 insertions(+), 10 deletions(-)

diff --git a/dir-iterator.c b/dir-iterator.c
index f2dcd82fde..17aca8ea41 100644
--- a/dir-iterator.c
+++ b/dir-iterator.c
@@ -48,12 +48,16 @@ struct dir_iterator_int {
 	 * that will be included in this iteration.
 	 */
 	struct dir_iterator_level *levels;
+
+	/* Combination of flags for this dir-iterator */
+	unsigned flags;
 };
 
 int dir_iterator_advance(struct dir_iterator *dir_iterator)
 {
 	struct dir_iterator_int *iter =
 		(struct dir_iterator_int *)dir_iterator;
+	int ret;
 
 	while (1) {
 		struct dir_iterator_level *level =
@@ -71,6 +75,8 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
 
 			level->dir = opendir(iter->base.path.buf);
 			if (!level->dir && errno != ENOENT) {
+				if (iter->flags & DIR_ITERATOR_PEDANTIC)
+					goto error_out;
 				warning("error opening directory %s: %s",
 					iter->base.path.buf, strerror(errno));
 				/* Popping the level is handled below */
@@ -122,6 +128,8 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
 			if (!de) {
 				/* This level is exhausted; pop up a level. */
 				if (errno) {
+					if (iter->flags & DIR_ITERATOR_PEDANTIC)
+						goto error_out;
 					warning("error reading directory %s: %s",
 						iter->base.path.buf, strerror(errno));
 				} else if (closedir(level->dir))
@@ -138,11 +146,20 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
 				continue;
 
 			strbuf_addstr(&iter->base.path, de->d_name);
-			if (lstat(iter->base.path.buf, &iter->base.st) < 0) {
-				if (errno != ENOENT)
+
+			if (iter->flags & DIR_ITERATOR_FOLLOW_SYMLINKS)
+				ret = stat(iter->base.path.buf, &iter->base.st);
+			else
+				ret = lstat(iter->base.path.buf, &iter->base.st);
+
+			if (ret < 0) {
+				if (errno != ENOENT) {
+					if (iter->flags & DIR_ITERATOR_PEDANTIC)
+						goto error_out;
 					warning("error reading path '%s': %s",
 						iter->base.path.buf,
 						strerror(errno));
+				}
 				continue;
 			}
 
@@ -159,6 +176,10 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
 			return ITER_OK;
 		}
 	}
+
+error_out:
+	dir_iterator_abort(dir_iterator);
+	return ITER_ERROR;
 }
 
 int dir_iterator_abort(struct dir_iterator *dir_iterator)
@@ -182,7 +203,7 @@ int dir_iterator_abort(struct dir_iterator *dir_iterator)
 	return ITER_DONE;
 }
 
-struct dir_iterator *dir_iterator_begin(const char *path)
+struct dir_iterator *dir_iterator_begin(const char *path, unsigned flags)
 {
 	struct dir_iterator_int *iter = xcalloc(1, sizeof(*iter));
 	struct dir_iterator *dir_iterator = &iter->base;
@@ -195,6 +216,7 @@ struct dir_iterator *dir_iterator_begin(const char *path)
 
 	ALLOC_GROW(iter->levels, 10, iter->levels_alloc);
 
+	iter->flags = flags;
 	iter->levels_nr = 1;
 	iter->levels[0].initialized = 0;
 
diff --git a/dir-iterator.h b/dir-iterator.h
index 970793d07a..890d5d8dbb 100644
--- a/dir-iterator.h
+++ b/dir-iterator.h
@@ -19,7 +19,7 @@
  * A typical iteration looks like this:
  *
  *     int ok;
- *     struct iterator *iter = dir_iterator_begin(path);
+ *     struct iterator *iter = dir_iterator_begin(path, 0);
  *
  *     while ((ok = dir_iterator_advance(iter)) == ITER_OK) {
  *             if (want_to_stop_iteration()) {
@@ -40,6 +40,20 @@
  * dir_iterator_advance() again.
  */
 
+/*
+ * Flags for dir_iterator_begin:
+ *
+ * - DIR_ITERATOR_PEDANTIC: override dir-iterator's default behavior
+ *   in case of an error while trying to fetch the next entry, which is
+ *   to emit a warning and keep going. With this flag, resources are
+ *   freed and ITER_ERROR is returned immediately.
+ *
+ * - DIR_ITERATOR_FOLLOW_SYMLINKS: make dir-iterator follow symlinks to
+ *   directories, i.e., iterate over linked directories' contents.
+ */
+#define DIR_ITERATOR_PEDANTIC (1 << 0)
+#define DIR_ITERATOR_FOLLOW_SYMLINKS (1 << 1)
+
 struct dir_iterator {
 	/* The current path: */
 	struct strbuf path;
@@ -54,20 +68,28 @@ struct dir_iterator {
 	/* The current basename: */
 	const char *basename;
 
-	/* The result of calling lstat() on path: */
+	/*
+	 * The result of calling lstat() on path, or stat() if the
+	 * DIR_ITERATOR_FOLLOW_SYMLINKS flag was set at
+	 * dir_iterator's initialization.
+	 */
 	struct stat st;
 };
 
 /*
- * Start a directory iteration over path. Return a dir_iterator that
- * holds the internal state of the iteration.
+ * Start a directory iteration over path with the combination of
+ * options specified by flags. Return a dir_iterator that holds the
+ * internal state of the iteration.
  *
  * The iteration includes all paths under path, not including path
  * itself and not including "." or ".." entries.
  *
- * path is the starting directory. An internal copy will be made.
+ * Parameters are:
+ *  - path is the starting directory. An internal copy will be made.
+ *  - flags is a combination of the possible flags to initialize a
+ *    dir-iterator or 0 for default behaviour.
  */
-struct dir_iterator *dir_iterator_begin(const char *path);
+struct dir_iterator *dir_iterator_begin(const char *path, unsigned flags);
 
 /*
  * Advance the iterator to the first or next item and return ITER_OK.
@@ -76,6 +98,11 @@ struct dir_iterator *dir_iterator_begin(const char *path);
  * dir_iterator and associated resources and return ITER_ERROR. It is
  * a bug to use iterator or call this function again after it has
  * returned ITER_DONE or ITER_ERROR.
+ *
+ * Note that, on failure to fetch the next entry, dir-iterator either
+ * returns ITER_ERROR or just emits a warning and tries the following
+ * entry, depending on whether DIR_ITERATOR_PEDANTIC was set at
+ * dir-iterator's initialization.
  */
 int dir_iterator_advance(struct dir_iterator *iterator);
 
diff --git a/refs/files-backend.c b/refs/files-backend.c
index ef053f716c..2ce9783097 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -2143,7 +2143,7 @@ static struct ref_iterator *reflog_iterator_begin(struct ref_store *ref_store,
 
 	base_ref_iterator_init(ref_iterator, &files_reflog_iterator_vtable, 0);
 	strbuf_addf(&sb, "%s/logs", gitdir);
-	iter->dir_iterator = dir_iterator_begin(sb.buf);
+	iter->dir_iterator = dir_iterator_begin(sb.buf, 0);
 	iter->ref_store = ref_store;
 	strbuf_release(&sb);
 
-- 
2.20.1
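
Editorial sketch of the flag handling described in the patch above. This is a minimal, self-contained illustration and not the code from dir-iterator.c: the SKETCH_* names and both helper functions are hypothetical stand-ins. It assumes, as the diff shows, that the "follow symlinks" flag switches between stat() and lstat(), and that the "pedantic" flag turns a non-ENOENT failure into an abort instead of a warning.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <errno.h>
#include <sys/stat.h>

/* Hypothetical stand-ins for the flags added to dir-iterator.h. */
#define SKETCH_PEDANTIC        (1u << 0)
#define SKETCH_FOLLOW_SYMLINKS (1u << 1)

/*
 * Fill *st for path. With SKETCH_FOLLOW_SYMLINKS, stat() resolves a
 * symlink to its target; otherwise lstat() describes the link itself.
 * Returns 0 on success and -1 on failure (errno is set by the call).
 */
static int sketch_stat_entry(const char *path, unsigned flags,
			     struct stat *st)
{
	if (flags & SKETCH_FOLLOW_SYMLINKS)
		return stat(path, st);
	return lstat(path, st);
}

/*
 * Decide what a failed stat means for the iteration:
 * 1 = abort and report an error, 0 = skip the entry and keep going.
 * An entry that vanished (ENOENT) is always skipped.
 */
static int sketch_should_abort(unsigned flags, int err)
{
	if (err == ENOENT)
		return 0;
	return (flags & SKETCH_PEDANTIC) ? 1 : 0;
}
```

Passing 0 as flags reproduces the pre-patch behaviour (lstat() plus warn-and-continue), which is why existing callers such as reflog_iterator_begin only need the extra 0 argument.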



* [GSoC][PATCH v4 4/7] clone: copy hidden paths at local clone
  2019-03-22 23:22   ` [GSoC][PATCH v4 0/7] clone: dir-iterator " Matheus Tavares
                       ` (2 preceding siblings ...)
  2019-03-22 23:22     ` [GSoC][PATCH v4 3/7] dir-iterator: add flags parameter to dir_iterator_begin Matheus Tavares
@ 2019-03-22 23:22     ` Matheus Tavares
  2019-03-22 23:22     ` [GSoC][PATCH v4 5/7] clone: extract function from copy_or_link_directory Matheus Tavares
                       ` (3 subsequent siblings)
  7 siblings, 0 replies; 127+ messages in thread
From: Matheus Tavares @ 2019-03-22 23:22 UTC (permalink / raw)
  To: git
  Cc: Thomas Gummerer, Ævar Arnfjörð Bjarmason,
	Christian Couder, Nguyễn Thái Ngọc Duy,
	kernel-usp, Benoit Pierre, Junio C Hamano

Make the copy_or_link_directory function no longer skip hidden
directories. This function, used to copy .git/objects, currently skips
all hidden directories but not hidden files, which is odd behaviour.
This was probably unintentional: the intention was most likely to skip
only '.' and '..', but the check ended up skipping every directory
whose name starts with '.'. Besides being more natural, the new
behaviour is more permissive to the user.

Also adjust tests to reflect this behaviour change.

Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Co-authored-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 builtin/clone.c            | 2 +-
 t/t5604-clone-reference.sh | 9 +++++++++
 2 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/builtin/clone.c b/builtin/clone.c
index b76f33c635..60c6780c06 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -428,7 +428,7 @@ static void copy_or_link_directory(struct strbuf *src, struct strbuf *dest,
 			continue;
 		}
 		if (S_ISDIR(buf.st_mode)) {
-			if (de->d_name[0] != '.')
+			if (!is_dot_or_dotdot(de->d_name))
 				copy_or_link_directory(src, dest,
 						       src_repo, src_baselen);
 			continue;
diff --git a/t/t5604-clone-reference.sh b/t/t5604-clone-reference.sh
index 76d45f1187..0992baa5ac 100755
--- a/t/t5604-clone-reference.sh
+++ b/t/t5604-clone-reference.sh
@@ -247,16 +247,25 @@ test_expect_success 'clone a repo with garbage in objects/*' '
 	done &&
 	find S-* -name "*some*" | sort >actual &&
 	cat >expected <<-EOF &&
+	S--dissociate/.git/objects/.some-hidden-dir
+	S--dissociate/.git/objects/.some-hidden-dir/.some-dot-file
+	S--dissociate/.git/objects/.some-hidden-dir/some-file
 	S--dissociate/.git/objects/.some-hidden-file
 	S--dissociate/.git/objects/some-dir
 	S--dissociate/.git/objects/some-dir/.some-dot-file
 	S--dissociate/.git/objects/some-dir/some-file
 	S--dissociate/.git/objects/some-file
+	S--local/.git/objects/.some-hidden-dir
+	S--local/.git/objects/.some-hidden-dir/.some-dot-file
+	S--local/.git/objects/.some-hidden-dir/some-file
 	S--local/.git/objects/.some-hidden-file
 	S--local/.git/objects/some-dir
 	S--local/.git/objects/some-dir/.some-dot-file
 	S--local/.git/objects/some-dir/some-file
 	S--local/.git/objects/some-file
+	S--no-hardlinks/.git/objects/.some-hidden-dir
+	S--no-hardlinks/.git/objects/.some-hidden-dir/.some-dot-file
+	S--no-hardlinks/.git/objects/.some-hidden-dir/some-file
 	S--no-hardlinks/.git/objects/.some-hidden-file
 	S--no-hardlinks/.git/objects/some-dir
 	S--no-hardlinks/.git/objects/some-dir/.some-dot-file
-- 
2.20.1
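
Editorial sketch of the behaviour change in the patch above. The one-line diff swaps a "name starts with a dot" test for a "name is exactly . or .." test. The functions below are hypothetical, self-contained stand-ins written for illustration; they are assumed to match the intent of the two conditions in the diff, not copied from Git's headers.

```c
/* Old condition: recurse only when the name does not start with '.'. */
static int sketch_old_should_recurse(const char *name)
{
	return name[0] != '.';
}

/* True only for the two special directory entries "." and "..". */
static int sketch_is_dot_or_dotdot(const char *name)
{
	return name[0] == '.' &&
	       (name[1] == '\0' ||
		(name[1] == '.' && name[2] == '\0'));
}

/* New condition: recurse into everything except "." and "..". */
static int sketch_new_should_recurse(const char *name)
{
	return !sketch_is_dot_or_dotdot(name);
}
```

For a directory named ".some-hidden-dir", the old condition rejects it and the new one accepts it, which is what the added lines in the test's expected output reflect.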



* [GSoC][PATCH v4 5/7] clone: extract function from copy_or_link_directory
  2019-03-22 23:22   ` [GSoC][PATCH v4 0/7] clone: dir-iterator " Matheus Tavares
                       ` (3 preceding siblings ...)
  2019-03-22 23:22     ` [GSoC][PATCH v4 4/7] clone: copy hidden paths at local clone Matheus Tavares
@ 2019-03-22 23:22     ` Matheus Tavares
  2019-03-22 23:22     ` [GSoC][PATCH v4 6/7] clone: use dir-iterator to avoid explicit dir traversal Matheus Tavares
                       ` (2 subsequent siblings)
  7 siblings, 0 replies; 127+ messages in thread
From: Matheus Tavares @ 2019-03-22 23:22 UTC (permalink / raw)
  To: git
  Cc: Thomas Gummerer, Ævar Arnfjörð Bjarmason,
	Christian Couder, Nguyễn Thái Ngọc Duy,
	kernel-usp, Benoit Pierre, Junio C Hamano

Extract the directory creation code from copy_or_link_directory into
its own function, named mkdir_if_missing. This change will help to
remove copy_or_link_directory's explicit recursion, which will be done
in a following patch. It also makes the code more readable.

Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
---
 builtin/clone.c | 24 ++++++++++++++++--------
 1 file changed, 16 insertions(+), 8 deletions(-)

diff --git a/builtin/clone.c b/builtin/clone.c
index 60c6780c06..c17bbf1bfc 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -392,6 +392,21 @@ static void copy_alternates(struct strbuf *src, struct strbuf *dst,
 	fclose(in);
 }
 
+static void mkdir_if_missing(const char *pathname, mode_t mode)
+{
+	struct stat st;
+
+	if (!mkdir(pathname, mode))
+		return;
+
+	if (errno != EEXIST)
+		die_errno(_("failed to create directory '%s'"), pathname);
+	else if (stat(pathname, &st))
+		die_errno(_("failed to stat '%s'"), pathname);
+	else if (!S_ISDIR(st.st_mode))
+		die(_("%s exists and is not a directory"), pathname);
+}
+
 static void copy_or_link_directory(struct strbuf *src, struct strbuf *dest,
 				   const char *src_repo, int src_baselen)
 {
@@ -404,14 +419,7 @@ static void copy_or_link_directory(struct strbuf *src, struct strbuf *dest,
 	if (!dir)
 		die_errno(_("failed to open '%s'"), src->buf);
 
-	if (mkdir(dest->buf, 0777)) {
-		if (errno != EEXIST)
-			die_errno(_("failed to create directory '%s'"), dest->buf);
-		else if (stat(dest->buf, &buf))
-			die_errno(_("failed to stat '%s'"), dest->buf);
-		else if (!S_ISDIR(buf.st_mode))
-			die(_("%s exists and is not a directory"), dest->buf);
-	}
+	mkdir_if_missing(dest->buf, 0777);
 
 	strbuf_addch(src, '/');
 	src_len = src->len;
-- 
2.20.1
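
Editorial sketch of the extracted helper's control flow. The patch's mkdir_if_missing calls die()/die_errno(), which only exist inside Git; the variant below is a hypothetical, standalone version that returns a status code instead, so the three failure cases can be told apart. The SKETCH_* names are assumptions made for this illustration.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <errno.h>
#include <sys/stat.h>
#include <sys/types.h>

enum {
	SKETCH_OK = 0,
	SKETCH_MKDIR_FAILED = -1, /* mkdir() failed for a reason other than EEXIST */
	SKETCH_STAT_FAILED = -2,  /* the path exists but could not be stat()ed */
	SKETCH_NOT_A_DIR = -3     /* the path exists and is not a directory */
};

/*
 * Create pathname as a directory unless a directory already exists
 * there. An existing non-directory is reported as an error.
 */
static int sketch_mkdir_if_missing(const char *pathname, mode_t mode)
{
	struct stat st;

	if (!mkdir(pathname, mode))
		return SKETCH_OK;

	if (errno != EEXIST)
		return SKETCH_MKDIR_FAILED;
	if (stat(pathname, &st))
		return SKETCH_STAT_FAILED;
	if (!S_ISDIR(st.st_mode))
		return SKETCH_NOT_A_DIR;
	return SKETCH_OK;
}
```

Calling it twice on the same path succeeds both times, which is the property the clone code relies on when the destination objects directory already exists.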



* [GSoC][PATCH v4 6/7] clone: use dir-iterator to avoid explicit dir traversal
  2019-03-22 23:22   ` [GSoC][PATCH v4 0/7] clone: dir-iterator " Matheus Tavares
                       ` (4 preceding siblings ...)
  2019-03-22 23:22     ` [GSoC][PATCH v4 5/7] clone: extract function from copy_or_link_directory Matheus Tavares
@ 2019-03-22 23:22     ` Matheus Tavares
  2019-03-22 23:22     ` [GSoC][PATCH v4 7/7] clone: Replace strcmp by fspathcmp Matheus Tavares
  2019-03-30 22:49     ` [GSoC][PATCH v5 0/7] clone: dir-iterator refactoring with tests Matheus Tavares
  7 siblings, 0 replies; 127+ messages in thread
From: Matheus Tavares @ 2019-03-22 23:22 UTC (permalink / raw)
  To: git
  Cc: Thomas Gummerer, Ævar Arnfjörð Bjarmason,
	Christian Couder, Nguyễn Thái Ngọc Duy,
	kernel-usp, Benoit Pierre, Junio C Hamano

Replace the use of the opendir/readdir/closedir API to traverse
directories recursively, in the copy_or_link_directory function, with
the dir-iterator API. This simplifies the code and avoids recursive
calls to copy_or_link_directory.

This process also makes copy_or_link_directory call die() in case of an
error on readdir or stat, inside dir_iterator_advance. Previously it
would just print a warning for errors on stat and ignore errors on
readdir, which isn't nice because a local git clone could end up
succeeding even though the .git/objects copy didn't fully succeed.

Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
---
 builtin/clone.c | 44 ++++++++++++++++++++++----------------------
 1 file changed, 22 insertions(+), 22 deletions(-)

diff --git a/builtin/clone.c b/builtin/clone.c
index c17bbf1bfc..4ee45e7862 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -23,6 +23,8 @@
 #include "transport.h"
 #include "strbuf.h"
 #include "dir.h"
+#include "dir-iterator.h"
+#include "iterator.h"
 #include "sigchain.h"
 #include "branch.h"
 #include "remote.h"
@@ -408,42 +410,36 @@ static void mkdir_if_missing(const char *pathname, mode_t mode)
 }
 
 static void copy_or_link_directory(struct strbuf *src, struct strbuf *dest,
-				   const char *src_repo, int src_baselen)
+				   const char *src_repo)
 {
-	struct dirent *de;
-	struct stat buf;
 	int src_len, dest_len;
-	DIR *dir;
-
-	dir = opendir(src->buf);
-	if (!dir)
-		die_errno(_("failed to open '%s'"), src->buf);
+	struct dir_iterator *iter;
+	int iter_status;
+	unsigned flags;
 
 	mkdir_if_missing(dest->buf, 0777);
 
+	flags = DIR_ITERATOR_PEDANTIC | DIR_ITERATOR_FOLLOW_SYMLINKS;
+	iter = dir_iterator_begin(src->buf, flags);
+
 	strbuf_addch(src, '/');
 	src_len = src->len;
 	strbuf_addch(dest, '/');
 	dest_len = dest->len;
 
-	while ((de = readdir(dir)) != NULL) {
+	while ((iter_status = dir_iterator_advance(iter)) == ITER_OK) {
 		strbuf_setlen(src, src_len);
-		strbuf_addstr(src, de->d_name);
+		strbuf_addstr(src, iter->relative_path);
 		strbuf_setlen(dest, dest_len);
-		strbuf_addstr(dest, de->d_name);
-		if (stat(src->buf, &buf)) {
-			warning (_("failed to stat %s\n"), src->buf);
-			continue;
-		}
-		if (S_ISDIR(buf.st_mode)) {
-			if (!is_dot_or_dotdot(de->d_name))
-				copy_or_link_directory(src, dest,
-						       src_repo, src_baselen);
+		strbuf_addstr(dest, iter->relative_path);
+
+		if (S_ISDIR(iter->st.st_mode)) {
+			mkdir_if_missing(dest->buf, 0777);
 			continue;
 		}
 
 		/* Files that cannot be copied bit-for-bit... */
-		if (!strcmp(src->buf + src_baselen, "/info/alternates")) {
+		if (!strcmp(iter->relative_path, "info/alternates")) {
 			copy_alternates(src, dest, src_repo);
 			continue;
 		}
@@ -460,7 +456,11 @@ static void copy_or_link_directory(struct strbuf *src, struct strbuf *dest,
 		if (copy_file_with_time(dest->buf, src->buf, 0666))
 			die_errno(_("failed to copy file to '%s'"), dest->buf);
 	}
-	closedir(dir);
+
+	if (iter_status != ITER_DONE) {
+		strbuf_setlen(src, src_len);
+		die(_("failed to iterate over '%s'"), src->buf);
+	}
 }
 
 static void clone_local(const char *src_repo, const char *dest_repo)
@@ -478,7 +478,7 @@ static void clone_local(const char *src_repo, const char *dest_repo)
 		get_common_dir(&dest, dest_repo);
 		strbuf_addstr(&src, "/objects");
 		strbuf_addstr(&dest, "/objects");
-		copy_or_link_directory(&src, &dest, src_repo, src.len);
+		copy_or_link_directory(&src, &dest, src_repo);
 		strbuf_release(&src);
 		strbuf_release(&dest);
 	}
-- 
2.20.1
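
Editorial sketch of the path-buffer pattern used in the new loop. On every iteration the patch truncates src and dest back to their base length (strbuf_setlen) and appends the iterator's relative path (strbuf_addstr). The helper below shows the same idea on a plain char buffer; it is a hypothetical stand-in, since strbuf is internal to Git, and it reports an error where a strbuf would simply grow.

```c
#include <stddef.h>
#include <string.h>

/*
 * Overwrite everything after the first base_len bytes of buf with
 * relative, keeping the "base/" prefix intact. Returns 0 on success,
 * or -1 (leaving buf untouched) if the result would not fit.
 */
static int sketch_set_relative(char *buf, size_t bufsize,
			       size_t base_len, const char *relative)
{
	size_t rel_len = strlen(relative);

	if (base_len >= bufsize || rel_len >= bufsize - base_len)
		return -1;
	/* rel_len + 1 copies the terminating NUL as well. */
	memcpy(buf + base_len, relative, rel_len + 1);
	return 0;
}
```

Because the relative path may contain several components (for example "info/alternates"), one flat loop can build every source and destination path without the recursive calls the old code needed.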



* [GSoC][PATCH v4 7/7] clone: Replace strcmp by fspathcmp
  2019-03-22 23:22   ` [GSoC][PATCH v4 0/7] clone: dir-iterator " Matheus Tavares
                       ` (5 preceding siblings ...)
  2019-03-22 23:22     ` [GSoC][PATCH v4 6/7] clone: use dir-iterator to avoid explicit dir traversal Matheus Tavares
@ 2019-03-22 23:22     ` Matheus Tavares
  2019-03-30 22:49     ` [GSoC][PATCH v5 0/7] clone: dir-iterator refactoring with tests Matheus Tavares
  7 siblings, 0 replies; 127+ messages in thread
From: Matheus Tavares @ 2019-03-22 23:22 UTC (permalink / raw)
  To: git
  Cc: Thomas Gummerer, Ævar Arnfjörð Bjarmason,
	Christian Couder, Nguyễn Thái Ngọc Duy,
	kernel-usp, Benoit Pierre, Junio C Hamano

Replace the use of strcmp with fspathcmp in copy_or_link_directory,
which is friendlier to case-insensitive file systems.

Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
Suggested-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 builtin/clone.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/builtin/clone.c b/builtin/clone.c
index 4ee45e7862..763ad5e31f 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -439,7 +439,7 @@ static void copy_or_link_directory(struct strbuf *src, struct strbuf *dest,
 		}
 
 		/* Files that cannot be copied bit-for-bit... */
-		if (!strcmp(iter->relative_path, "info/alternates")) {
+		if (!fspathcmp(iter->relative_path, "info/alternates")) {
 			copy_alternates(src, dest, src_repo);
 			continue;
 		}
-- 
2.20.1
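
Editorial sketch of why the comparison function matters here. In Git, fspathcmp decides between a case-sensitive and a case-insensitive comparison based on repository configuration; the standalone version below takes that decision as an explicit ignore_case parameter, which is an assumption made so the example compiles on its own.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <string.h>
#include <strings.h> /* strcasecmp() */

/*
 * Compare two file system paths. When ignore_case is non-zero the
 * comparison is case-insensitive, as appropriate for a file system
 * that does not distinguish "INFO" from "info".
 */
static int sketch_fspathcmp(const char *a, const char *b, int ignore_case)
{
	return ignore_case ? strcasecmp(a, b) : strcmp(a, b);
}
```

With a plain strcmp, a path reported as "INFO/alternates" on a case-insensitive file system would not match "info/alternates" and the file would be copied bit-for-bit instead of going through copy_alternates.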



* Re: [GSoC][PATCH v4 1/7] clone: test for our behavior on odd objects/* content
  2019-03-22 23:22     ` [GSoC][PATCH v4 1/7] clone: test for our behavior on odd objects/* content Matheus Tavares
@ 2019-03-24 18:09       ` Matheus Tavares Bernardino
  2019-03-24 20:56       ` SZEDER Gábor
  2019-03-28 21:49       ` Thomas Gummerer
  2 siblings, 0 replies; 127+ messages in thread
From: Matheus Tavares Bernardino @ 2019-03-24 18:09 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Thomas Gummerer, Christian Couder,
	Nguyễn Thái Ngọc Duy, Kernel USP, Alex Riesen,
	Junio C Hamano

On Fri, Mar 22, 2019 at 8:22 PM Matheus Tavares
<matheus.bernardino@usp.br> wrote:
>
> From: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
>
> Add tests for what happens when we perform a local clone on a repo
> containing odd files at .git/object directory, such as symlinks to other
> dirs, or unknown files.
>
> I'm bending over backwards here to avoid a SHA1 dependency. See [1]
> for an earlier and simpler version that hardcoded a SHA-1s.
>
> This behavior has been the same for a *long* time, but hasn't been
> tested for.
>
> There's a good post-hoc argument to be made for copying over unknown
> things, e.g. I'd like a git version that doesn't know about the
> commit-graph to copy it under "clone --local" so a newer git version
> can make use of it.
>
> In follow-up commits we'll look at changing some of this behavior, but
> for now let's just assert it as-is so we'll notice what we'll change
> later.
>
> 1. https://public-inbox.org/git/20190226002625.13022-5-avarab@gmail.com/
>
> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
> Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
> Helped-by: Matheus Tavares <matheus.bernardino@usp.br>
> ---
>  t/t5604-clone-reference.sh | 116 +++++++++++++++++++++++++++++++++++++
>  1 file changed, 116 insertions(+)
>
> diff --git a/t/t5604-clone-reference.sh b/t/t5604-clone-reference.sh
> index 4320082b1b..708b1a2c66 100755
> --- a/t/t5604-clone-reference.sh
> +++ b/t/t5604-clone-reference.sh
> @@ -221,4 +221,120 @@ test_expect_success 'clone, dissociate from alternates' '
>         ( cd C && git fsck )
>  '
>
> +test_expect_success 'setup repo with garbage in objects/*' '
> +       git init S &&
> +       (
> +               cd S &&
> +               test_commit A &&
> +
> +               cd .git/objects &&
> +               >.some-hidden-file &&
> +               >some-file &&
> +               mkdir .some-hidden-dir &&
> +               >.some-hidden-dir/some-file &&
> +               >.some-hidden-dir/.some-dot-file &&
> +               mkdir some-dir &&
> +               >some-dir/some-file &&
> +               >some-dir/.some-dot-file
> +       )
> +'
> +
> +test_expect_success 'clone a repo with garbage in objects/*' '
> +       for option in --local --no-hardlinks --shared --dissociate
> +       do
> +               git clone $option S S$option || return 1 &&
> +               git -C S$option fsck || return 1
> +       done &&
> +       find S-* -name "*some*" | sort >actual &&
> +       cat >expected <<-EOF &&
> +       S--dissociate/.git/objects/.some-hidden-file
> +       S--dissociate/.git/objects/some-dir
> +       S--dissociate/.git/objects/some-dir/.some-dot-file
> +       S--dissociate/.git/objects/some-dir/some-file
> +       S--dissociate/.git/objects/some-file
> +       S--local/.git/objects/.some-hidden-file
> +       S--local/.git/objects/some-dir
> +       S--local/.git/objects/some-dir/.some-dot-file
> +       S--local/.git/objects/some-dir/some-file
> +       S--local/.git/objects/some-file
> +       S--no-hardlinks/.git/objects/.some-hidden-file
> +       S--no-hardlinks/.git/objects/some-dir
> +       S--no-hardlinks/.git/objects/some-dir/.some-dot-file
> +       S--no-hardlinks/.git/objects/some-dir/some-file
> +       S--no-hardlinks/.git/objects/some-file
> +       EOF
> +       test_cmp expected actual
> +'
> +
> +test_expect_success SYMLINKS 'setup repo with manually symlinked dirs and unknown files at objects/' '
> +       git init T &&
> +       (
> +               cd T &&
> +               test_commit A &&
> +               git gc &&
> +               (
> +                       cd .git/objects &&
> +                       mv pack packs &&
> +                       ln -s packs pack
> +               ) &&
> +               test_commit B &&
> +               (
> +                       cd .git/objects &&
> +                       find ?? -type d >loose-dirs &&
> +                       last_loose=$(tail -n 1 loose-dirs) &&
> +                       rm -f loose-dirs &&
> +                       mv $last_loose a-loose-dir &&
> +                       ln -s a-loose-dir $last_loose &&
> +                       find . -type f | sort >../../../T.objects-files.raw &&
> +                       echo unknown_content> unknown_file
> +               )
> +       ) &&
> +       git -C T fsck &&
> +       git -C T rev-list --all --objects >T.objects
> +'
> +
> +
> +test_expect_success SYMLINKS 'clone repo with symlinked dirs and unknown files at objects/' '
> +       for option in --local --no-hardlinks --shared --dissociate
> +       do
> +               git clone $option T T$option || return 1 &&
> +               git -C T$option fsck || return 1 &&
> +               git -C T$option rev-list --all --objects >T$option.objects &&
> +               test_cmp T.objects T$option.objects &&
> +               (
> +                       cd T$option/.git/objects &&
> +                       find . -type f | sort >../../../T$option.objects-files.raw
> +               )
> +       done &&
> +
> +       for raw in $(ls T*.raw)
> +       do
> +               sed -e "s!/..\$!/X!; s!/../!/Y/!; s![0-9a-f]\{38,\}!Z!" \
> +                   -e "/multi-pack-index/d" -e "/commit-graph/d" <$raw >$raw.de-sha || return 1

Ævar, maybe I'm missing something here, but do we really need the
first sed command ("s!/..\$!/X!")?


> +       done &&
> +
> +       cat >expected-files <<-EOF &&
> +       ./Y/Z
> +       ./Y/Z
> +       ./a-loose-dir/Z
> +       ./Y/Z
> +       ./info/packs
> +       ./pack/pack-Z.idx
> +       ./pack/pack-Z.pack
> +       ./packs/pack-Z.idx
> +       ./packs/pack-Z.pack
> +       ./unknown_file
> +       EOF
> +
> +       for option in --local --dissociate --no-hardlinks
> +       do
> +               test_cmp expected-files T$option.objects-files.raw.de-sha || return 1
> +       done &&
> +
> +       cat >expected-files <<-EOF &&
> +       ./info/alternates
> +       EOF
> +       test_cmp expected-files T--shared.objects-files.raw
> +'
> +
>  test_done
> --
> 2.20.1
>


* Re: [GSoC][PATCH v4 1/7] clone: test for our behavior on odd objects/* content
  2019-03-22 23:22     ` [GSoC][PATCH v4 1/7] clone: test for our behavior on odd objects/* content Matheus Tavares
  2019-03-24 18:09       ` Matheus Tavares Bernardino
@ 2019-03-24 20:56       ` SZEDER Gábor
  2019-03-26 19:43         ` Matheus Tavares Bernardino
  2019-03-28 21:49       ` Thomas Gummerer
  2 siblings, 1 reply; 127+ messages in thread
From: SZEDER Gábor @ 2019-03-24 20:56 UTC (permalink / raw)
  To: Matheus Tavares
  Cc: git, Thomas Gummerer, Ævar Arnfjörð Bjarmason,
	Christian Couder, Nguyễn Thái Ngọc Duy,
	kernel-usp, Alex Riesen, Junio C Hamano

On Fri, Mar 22, 2019 at 08:22:31PM -0300, Matheus Tavares wrote:
> From: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
> 
> Add tests for what happens when we perform a local clone on a repo
> containing odd files at .git/object directory, such as symlinks to other
> dirs, or unknown files.
> 
> I'm bending over backwards here to avoid a SHA1 dependency. See [1]

s/SHA1/SHA-1/

> for an earlier and simpler version that hardcoded a SHA-1s.

s/SHA-1s/SHA-1/ or s/a SHA-1s/SHA-1s/, depending on what you consider
multiple occurrences of the same SHA-1.

> This behavior has been the same for a *long* time, but hasn't been
> tested for.
> 
> There's a good post-hoc argument to be made for copying over unknown
> things, e.g. I'd like a git version that doesn't know about the
> commit-graph to copy it under "clone --local" so a newer git version
> can make use of it.
> 
> In follow-up commits we'll look at changing some of this behavior, but
> for now let's just assert it as-is so we'll notice what we'll change
> later.
> 
> 1. https://public-inbox.org/git/20190226002625.13022-5-avarab@gmail.com/
> 
> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
> Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
> Helped-by: Matheus Tavares <matheus.bernardino@usp.br>


> +test_expect_success SYMLINKS 'setup repo with manually symlinked dirs and unknown files at objects/' '
> +	git init T &&
> +	(
> +		cd T &&
> +		test_commit A &&
> +		git gc &&
> +		(
> +			cd .git/objects &&
> +			mv pack packs &&
> +			ln -s packs pack
> +		) &&
> +		test_commit B &&
> +		(
> +			cd .git/objects &&
> +			find ?? -type d >loose-dirs &&
> +			last_loose=$(tail -n 1 loose-dirs) &&
> +			rm -f loose-dirs &&
> +			mv $last_loose a-loose-dir &&
> +			ln -s a-loose-dir $last_loose &&
> +			find . -type f | sort >../../../T.objects-files.raw &&
> +			echo unknown_content> unknown_file
> +		)

Please drop these inner subshells.  They are unnecessary, because the
outer subshell alone is sufficient to ensure that the test script
returns to the original directory if one of the commands were to fail.

> +	) &&
> +	git -C T fsck &&
> +	git -C T rev-list --all --objects >T.objects
> +'
> +
> +
> +test_expect_success SYMLINKS 'clone repo with symlinked dirs and unknown files at objects/' '
> +	for option in --local --no-hardlinks --shared --dissociate
> +	do
> +		git clone $option T T$option || return 1 &&
> +		git -C T$option fsck || return 1 &&
> +		git -C T$option rev-list --all --objects >T$option.objects &&
> +		test_cmp T.objects T$option.objects &&
> +		(
> +			cd T$option/.git/objects &&
> +			find . -type f | sort >../../../T$option.objects-files.raw
> +		)

Nit: this might be a bit easier on the eyes when written as

  ( 
        cd T$option/.git/objects &&
        find . -type f
  ) | sort >T$option.objects-files.raw

because it would avoid that '../../../'.

> +	done &&
> +
> +	for raw in $(ls T*.raw)
> +	do
> +		sed -e "s!/..\$!/X!; s!/../!/Y/!; s![0-9a-f]\{38,\}!Z!" \
> +		    -e "/multi-pack-index/d" -e "/commit-graph/d" <$raw >$raw.de-sha || return 1
> +	done &&
> +
> +	cat >expected-files <<-EOF &&
> +	./Y/Z
> +	./Y/Z
> +	./a-loose-dir/Z
> +	./Y/Z
> +	./info/packs
> +	./pack/pack-Z.idx
> +	./pack/pack-Z.pack
> +	./packs/pack-Z.idx
> +	./packs/pack-Z.pack
> +	./unknown_file
> +	EOF
> +
> +	for option in --local --dissociate --no-hardlinks
> +	do
> +		test_cmp expected-files T$option.objects-files.raw.de-sha || return 1
> +	done &&
> +
> +	cat >expected-files <<-EOF &&
> +	./info/alternates
> +	EOF

Perhaps

  echo ./info/alternates >expected-files

> +	test_cmp expected-files T--shared.objects-files.raw
> +'
> +
>  test_done
> -- 
> 2.20.1
> 


* Re: [GSoC][PATCH v4 1/7] clone: test for our behavior on odd objects/* content
  2019-03-24 20:56       ` SZEDER Gábor
@ 2019-03-26 19:43         ` Matheus Tavares Bernardino
  0 siblings, 0 replies; 127+ messages in thread
From: Matheus Tavares Bernardino @ 2019-03-26 19:43 UTC (permalink / raw)
  To: SZEDER Gábor
  Cc: git, Thomas Gummerer, Ævar Arnfjörð Bjarmason,
	Christian Couder, Nguyễn Thái Ngọc Duy,
	Kernel USP, Alex Riesen, Junio C Hamano

On Sun, Mar 24, 2019 at 5:56 PM SZEDER Gábor <szeder.dev@gmail.com> wrote:
>
> On Fri, Mar 22, 2019 at 08:22:31PM -0300, Matheus Tavares wrote:
> > From: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
> >
> > Add tests for what happens when we perform a local clone on a repo
> > containing odd files at .git/object directory, such as symlinks to other
> > dirs, or unknown files.
> >
> > I'm bending over backwards here to avoid a SHA1 dependency. See [1]
>
> s/SHA1/SHA-1/
>

Thanks, nice catch.

> > for an earlier and simpler version that hardcoded a SHA-1s.
>
> s/SHA-1s/SHA-1/ or s/a SHA-1s/SHA-1s/, depending on what you consider
> multiple occurrences of the same SHA-1.
>

Yes, I think it should be just "SHA-1s". Thanks.

> > This behavior has been the same for a *long* time, but hasn't been
> > tested for.
> >
> > There's a good post-hoc argument to be made for copying over unknown
> > things, e.g. I'd like a git version that doesn't know about the
> > commit-graph to copy it under "clone --local" so a newer git version
> > can make use of it.
> >
> > In follow-up commits we'll look at changing some of this behavior, but
> > for now let's just assert it as-is so we'll notice what we'll change
> > later.
> >
> > 1. https://public-inbox.org/git/20190226002625.13022-5-avarab@gmail.com/
> >
> > Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
> > Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
> > Helped-by: Matheus Tavares <matheus.bernardino@usp.br>
>
>
> > +test_expect_success SYMLINKS 'setup repo with manually symlinked dirs and unknown files at objects/' '
> > +     git init T &&
> > +     (
> > +             cd T &&
> > +             test_commit A &&
> > +             git gc &&
> > +             (
> > +                     cd .git/objects &&
> > +                     mv pack packs &&
> > +                     ln -s packs pack
> > +             ) &&
> > +             test_commit B &&
> > +             (
> > +                     cd .git/objects &&
> > +                     find ?? -type d >loose-dirs &&
> > +                     last_loose=$(tail -n 1 loose-dirs) &&
> > +                     rm -f loose-dirs &&
> > +                     mv $last_loose a-loose-dir &&
> > +                     ln -s a-loose-dir $last_loose &&
> > +                     find . -type f | sort >../../../T.objects-files.raw &&
> > +                     echo unknown_content> unknown_file
> > +             )
>
> Please drop these inner subshells.  They are unnecessary, because the
> outer subshell alone is sufficient to ensure that the test script
> returns to the original directory if one of the commands were to fail.

Ok!

> > +     ) &&
> > +     git -C T fsck &&
> > +     git -C T rev-list --all --objects >T.objects
> > +'
> > +
> > +
> > +test_expect_success SYMLINKS 'clone repo with symlinked dirs and unknown files at objects/' '
> > +     for option in --local --no-hardlinks --shared --dissociate
> > +     do
> > +             git clone $option T T$option || return 1 &&
> > +             git -C T$option fsck || return 1 &&
> > +             git -C T$option rev-list --all --objects >T$option.objects &&
> > +             test_cmp T.objects T$option.objects &&
> > +             (
> > +                     cd T$option/.git/objects &&
> > +                     find . -type f | sort >../../../T$option.objects-files.raw
> > +             )
>
> Nit: this might be a bit easier on the eyes when written as
>
>   (
>         cd T$option/.git/objects &&
>         find . -type f
>   ) | sort >T$option.objects-files.raw
>
> because it would avoid that '../../../'.

Sounds good, but in the next patch of this series, another 'find'
statement will be added inside this subshell, so I think that change
is not really possible, unfortunately.

> > +     done &&
> > +
> > +     for raw in $(ls T*.raw)
> > +     do
> > +             sed -e "s!/..\$!/X!; s!/../!/Y/!; s![0-9a-f]\{38,\}!Z!" \
> > +                 -e "/multi-pack-index/d" -e "/commit-graph/d" <$raw >$raw.de-sha || return 1
> > +     done &&
> > +
> > +     cat >expected-files <<-EOF &&
> > +     ./Y/Z
> > +     ./Y/Z
> > +     ./a-loose-dir/Z
> > +     ./Y/Z
> > +     ./info/packs
> > +     ./pack/pack-Z.idx
> > +     ./pack/pack-Z.pack
> > +     ./packs/pack-Z.idx
> > +     ./packs/pack-Z.pack
> > +     ./unknown_file
> > +     EOF
> > +
> > +     for option in --local --dissociate --no-hardlinks
> > +     do
> > +             test_cmp expected-files T$option.objects-files.raw.de-sha || return 1
> > +     done &&
> > +
> > +     cat >expected-files <<-EOF &&
> > +     ./info/alternates
> > +     EOF
>
> Perhaps
>
>   echo ./info/alternates >expected-files

Indeed, much simpler. Thanks.

> > +     test_cmp expected-files T--shared.objects-files.raw
> > +'
> > +
> >  test_done
> > --
> > 2.20.1
> >


* Re: [GSoC][PATCH v4 1/7] clone: test for our behavior on odd objects/* content
  2019-03-22 23:22     ` [GSoC][PATCH v4 1/7] clone: test for our behavior on odd objects/* content Matheus Tavares
  2019-03-24 18:09       ` Matheus Tavares Bernardino
  2019-03-24 20:56       ` SZEDER Gábor
@ 2019-03-28 21:49       ` Thomas Gummerer
  2019-03-29 14:06         ` Matheus Tavares Bernardino
  2 siblings, 1 reply; 127+ messages in thread
From: Thomas Gummerer @ 2019-03-28 21:49 UTC (permalink / raw)
  To: Matheus Tavares
  Cc: git, Ævar Arnfjörð Bjarmason, Christian Couder,
	Nguyễn Thái Ngọc Duy, kernel-usp, Alex Riesen,
	Junio C Hamano

On 03/22, Matheus Tavares wrote:
> From: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
> 
> Add tests for what happens when we perform a local clone on a repo
> containing odd files at .git/objects directory, such as symlinks to other
> dirs, or unknown files.
> 
> I'm bending over backwards here to avoid a SHA1 dependency. See [1]
> for an earlier and simpler version that hardcoded SHA-1s.
> 
> This behavior has been the same for a *long* time, but hasn't been
> tested for.
> 
> There's a good post-hoc argument to be made for copying over unknown
> things, e.g. I'd like a git version that doesn't know about the
> commit-graph to copy it under "clone --local" so a newer git version
> can make use of it.
> 
> In follow-up commits we'll look at changing some of this behavior, but
> for now let's just assert it as-is so we'll notice what we'll change
> later.
> 
> 1. https://public-inbox.org/git/20190226002625.13022-5-avarab@gmail.com/
> 
> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
> Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
> Helped-by: Matheus Tavares <matheus.bernardino@usp.br>

The trailers should usually be in the order in which things happened.
So having Ævar's S-o-b first makes sense, but the Helped-by should come
before your S-o-b, as you made your changes before sending out the
patch series.
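
For this patch, that ordering would presumably look like this (a
sketch built only from the trailers quoted above):

    Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
    Helped-by: Matheus Tavares <matheus.bernardino@usp.br>
    Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>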

When sending someone else's patch in a slightly modified version, it
may also be useful to add which parts you changed, as it was done in
e8dfcace31 ("poll: use GetTickCount64() to avoid wrap-around issues",
2018-10-31) for example.

IIRC, the test that is added in this patch does not work on some
platforms, notably macOS.  That would mean that we would break
bisectability at this patch on some platforms if we were to introduce
it here.  Therefore I think it would be better to squash this patch
into the next one which fixes these inconsistencies.

Note that I can't test this at the moment, so this concern is only
based on previous discussions that I remember.  If that's already
addressed somehow, all the better!

^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [GSoC][PATCH v4 2/7] clone: better handle symlinked files at .git/objects/
  2019-03-22 23:22     ` [GSoC][PATCH v4 2/7] clone: better handle symlinked files at .git/objects/ Matheus Tavares
@ 2019-03-28 22:10       ` Thomas Gummerer
  2019-03-29  8:38         ` Ævar Arnfjörð Bjarmason
                           ` (2 more replies)
  0 siblings, 3 replies; 127+ messages in thread
From: Thomas Gummerer @ 2019-03-28 22:10 UTC (permalink / raw)
  To: Matheus Tavares
  Cc: git, Ævar Arnfjörð Bjarmason, Christian Couder,
	Nguyễn Thái Ngọc Duy, kernel-usp, Benoit Pierre,
	Junio C Hamano, Johannes Schindelin

On 03/22, Matheus Tavares wrote:
> There is currently an odd behaviour when locally cloning a repository
> with symlinks at .git/objects: using --no-hardlinks all symlinks are
> dereferenced but without it Git will try to hardlink the files with the
> link() function, which has an OS-specific behaviour on symlinks. On OSX
> and NetBSD, it creates a hardlink to the file pointed by the symlink
> whilst on GNU/Linux, it creates a hardlink to the symlink itself.
> 
> On Manjaro GNU/Linux:
>     $ touch a
>     $ ln -s a b
>     $ link b c
>     $ ls -li a b c
>     155 [...] a
>     156 [...] b -> a
>     156 [...] c -> a
> 
> But on NetBSD:
>     $ ls -li a b c
>     2609160 [...] a
>     2609164 [...] b -> a
>     2609160 [...] c
> 
> It's not good to have the result of a local clone to be OS-dependent and
> since the behaviour on GNU/Linux may result in broken symlinks, let's
> re-implement it with linkat() instead of link() using a flag to always
> follow symlinks and make the hardlink be to the pointed file. With this,
> besides standardizing the behaviour, no broken symlinks will be
> produced. Also, add tests for symlinked files at .git/objects/.
> 
> Note: Git won't create symlinks at .git/objects itself, but it's better
> to handle this case and be friendly with users who manually create them.
> 
> Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
> Co-authored-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
> ---
>  builtin/clone.c            |  2 +-
>  t/t5604-clone-reference.sh | 26 +++++++++++++++++++-------
>  2 files changed, 20 insertions(+), 8 deletions(-)
> 
> diff --git a/builtin/clone.c b/builtin/clone.c
> index 50bde99618..b76f33c635 100644
> --- a/builtin/clone.c
> +++ b/builtin/clone.c
> @@ -443,7 +443,7 @@ static void copy_or_link_directory(struct strbuf *src, struct strbuf *dest,
>  		if (unlink(dest->buf) && errno != ENOENT)
>  			die_errno(_("failed to unlink '%s'"), dest->buf);
>  		if (!option_no_hardlinks) {
> -			if (!link(src->buf, dest->buf))
> +			if (!linkat(AT_FDCWD, src->buf, AT_FDCWD, dest->buf, AT_SYMLINK_FOLLOW))

This line is starting to get a bit long, might be worth breaking it up
to keep to 80 characters per line.

I notice that we are currently not using 'linkat()' anywhere else in
our codebase.  It looks like it has been introduced in POSIX.1-2008,
which sounds fairly recent by git's standards.  So I wonder if this is
really supported on all platforms that git is being built on.
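
For reference, here is a minimal sketch of the same call wrapped to
stay under 80 columns, plus a self-check that repeats the a/b/c
experiment from the commit message in a scratch directory.  The names
hardlink_follow() and demo_hardlink_follow() are made up for
illustration and are not part of git, and the sketch assumes a
platform that provides POSIX.1-2008 linkat(), which is exactly the
open question here:

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700	/* exposes linkat(), mkdtemp(), lstat() */
#endif
#include <fcntl.h>	/* AT_FDCWD, AT_SYMLINK_FOLLOW */
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper, not part of git: create 'dst' as a hardlink to
 * the file that 'src' resolves to, following 'src' if it is a symlink.
 * Returns 0 on success, -1 with errno set on failure.
 */
static int hardlink_follow(const char *src, const char *dst)
{
	return linkat(AT_FDCWD, src, AT_FDCWD, dst,
		      AT_SYMLINK_FOLLOW);
}

/*
 * Returns 1 if 'c' ends up as a regular file sharing its inode with
 * 'a', 0 if it does not, and -1 if any step of the setup failed.
 */
static int demo_hardlink_follow(void)
{
	char dir[] = "/tmp/linkat-demo-XXXXXX";
	char a[64], b[64], c[64];
	struct stat sa, sc;
	FILE *f;
	int ret = -1;

	if (!mkdtemp(dir))
		return -1;
	snprintf(a, sizeof(a), "%s/a", dir);
	snprintf(b, sizeof(b), "%s/b", dir);
	snprintf(c, sizeof(c), "%s/c", dir);

	f = fopen(a, "w");		/* touch a */
	if (f)
		fclose(f);
	if (f && !symlink("a", b) &&	/* ln -s a b */
	    !hardlink_follow(b, c) &&	/* link b c, but following b */
	    !lstat(a, &sa) && !lstat(c, &sc))
		ret = S_ISREG(sc.st_mode) && sa.st_ino == sc.st_ino;

	/* Best-effort cleanup of the scratch directory. */
	unlink(c);
	unlink(b);
	unlink(a);
	rmdir(dir);
	return ret;
}
```

On a system where linkat() honours AT_SYMLINK_FOLLOW, 'c' should share
its inode with 'a' rather than with the symlink 'b', which matches the
NetBSD listing in the commit message.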

I also wonder what would need to be done on Windows if we were to
introduce this.  I see we define the 'link()' function in
'compat/mingw.c' for that currently, so I guess something similar
would be needed for 'linkat()'.  I added Dscho to Cc for Windows
expertise.

While I agree with the goal of consistency across all platforms here,
I don't know if it's actually worth going through the pain of doing
that, especially for somewhat of an edge case in local clones.

If the test in the previous patch passes on all platforms, I'd be okay
with just calling the behaviour here undefined, especially as git
would never actually create symlinks in the .git/objects directory.

^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [GSoC][PATCH v4 3/7] dir-iterator: add flags parameter to dir_iterator_begin
  2019-03-22 23:22     ` [GSoC][PATCH v4 3/7] dir-iterator: add flags parameter to dir_iterator_begin Matheus Tavares
@ 2019-03-28 22:19       ` Thomas Gummerer
  2019-03-29 13:16         ` Matheus Tavares Bernardino
  0 siblings, 1 reply; 127+ messages in thread
From: Thomas Gummerer @ 2019-03-28 22:19 UTC (permalink / raw)
  To: Matheus Tavares
  Cc: git, Ævar Arnfjörð Bjarmason, Christian Couder,
	Nguyễn Thái Ngọc Duy, kernel-usp,
	Michael Haggerty, Ramsay Jones, Junio C Hamano

On 03/22, Matheus Tavares wrote:
> Add the possibility of giving flags to dir_iterator_begin to initialize
> a dir-iterator with special options.
> 
> Currently possible flags are DIR_ITERATOR_PEDANTIC, which makes
> dir_iterator_advance abort imediatelly in the case of an error while

s/imediatelly/immediately/


^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [GSoC][PATCH v4 2/7] clone: better handle symlinked files at .git/objects/
  2019-03-28 22:10       ` Thomas Gummerer
@ 2019-03-29  8:38         ` Ævar Arnfjörð Bjarmason
  2019-03-29 20:15           ` Thomas Gummerer
  2019-03-29 14:27         ` Matheus Tavares Bernardino
  2019-03-29 15:40         ` Johannes Schindelin
  2 siblings, 1 reply; 127+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2019-03-29  8:38 UTC (permalink / raw)
  To: Thomas Gummerer
  Cc: Matheus Tavares, git, Christian Couder,
	Nguyễn Thái Ngọc Duy, kernel-usp, Benoit Pierre,
	Junio C Hamano, Johannes Schindelin


On Thu, Mar 28 2019, Thomas Gummerer wrote:

> On 03/22, Matheus Tavares wrote:
>> There is currently an odd behaviour when locally cloning a repository
>> with symlinks at .git/objects: using --no-hardlinks all symlinks are
>> dereferenced but without it Git will try to hardlink the files with the
>> link() function, which has an OS-specific behaviour on symlinks. On OSX
>> and NetBSD, it creates a hardlink to the file pointed by the symlink
>> whilst on GNU/Linux, it creates a hardlink to the symlink itself.
>>
>> On Manjaro GNU/Linux:
>>     $ touch a
>>     $ ln -s a b
>>     $ link b c
>>     $ ls -li a b c
>>     155 [...] a
>>     156 [...] b -> a
>>     156 [...] c -> a
>>
>> But on NetBSD:
>>     $ ls -li a b c
>>     2609160 [...] a
>>     2609164 [...] b -> a
>>     2609160 [...] c
>>
>> It's not good to have the result of a local clone to be OS-dependent and
>> since the behaviour on GNU/Linux may result in broken symlinks, let's
>> re-implement it with linkat() instead of link() using a flag to always
>> follow symlinks and make the hardlink be to the pointed file. With this,
>> besides standardizing the behaviour, no broken symlinks will be
>> produced. Also, add tests for symlinked files at .git/objects/.
>>
>> Note: Git won't create symlinks at .git/objects itself, but it's better
>> to handle this case and be friendly with users who manually create them.
>>
>> Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
>> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
>> Co-authored-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
>> ---
>>  builtin/clone.c            |  2 +-
>>  t/t5604-clone-reference.sh | 26 +++++++++++++++++++-------
>>  2 files changed, 20 insertions(+), 8 deletions(-)
>>
>> diff --git a/builtin/clone.c b/builtin/clone.c
>> index 50bde99618..b76f33c635 100644
>> --- a/builtin/clone.c
>> +++ b/builtin/clone.c
>> @@ -443,7 +443,7 @@ static void copy_or_link_directory(struct strbuf *src, struct strbuf *dest,
>>  		if (unlink(dest->buf) && errno != ENOENT)
>>  			die_errno(_("failed to unlink '%s'"), dest->buf);
>>  		if (!option_no_hardlinks) {
>> -			if (!link(src->buf, dest->buf))
>> +			if (!linkat(AT_FDCWD, src->buf, AT_FDCWD, dest->buf, AT_SYMLINK_FOLLOW))
>
> This line is starting to get a bit long, might be worth breaking it up
> to keep to 80 characters per line.
>
> I notice that we are currently not using 'linkat()' anywhere else in
> our codebase.  It looks like it has been introduced in POSIX.1-2008,
> which sounds fairly recent by git's standards.  So I wonder if this is
> really supported on all platforms that git is being built on.
>
> I also wonder what would need to be done on Windows if we were to
> introduce this.  I see we define the 'link()' function in
> 'compat/mingw.c' for that currently, so I guess something similar
> would be needed for 'linkat()'.  I added Dscho to Cc for Windows
> expertise.

For better or worse this particular quest started because I pointed out
(with some WIP patches) that for understanding this change we should
test whatever we did now, to ensure that the refactoring didn't have
unintended side-effects.

But that's a separate question from whether or not we want to keep the
current behavior.

I think the current behavior is clearly insane, so I think we should
change it with some follow-up patches. In particular options like
--dissociate should clearly (in my mind at least) have behavior similar
to "cp -L", and --local should hardlink to the *target* of the symlink,
if anything, at least for objects/{??,pack,info}

I think that changes the portability story with linkat(), since it's not
something we should be planning to keep, just an intermediate step so we
don't have a gigantic patch that both adds tests, refactors and changes
the behavior.

> While I agree with the goal of consistency across all platforms here,
> I don't know if it's actually worth going through the pain of doing
> that, especially for somewhat of an edge case in local clones.

Note that we explicitly clone everything under objects/, including
recursively cloning unknown directories and their files.

So this is not just about how we handle symlinks that we don't expect
now (nothing uses them), but about whether we want to promise that
nothing in objects/ will ever use symlinks. Or more specifically,
whether we accept that if a new version of git starts using them,
something doing local clones might produce a broken copy of such a repo.

Maybe we'll still say "we don't care". Just saying it's a slightly
different question...

> If the test in the previous patch passes on all platforms, I'd be okay
> with just calling the behaviour here undefined, especially as git
> would never actually create symlinks in the .git/objects directory.


^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [GSoC][PATCH v4 3/7] dir-iterator: add flags parameter to dir_iterator_begin
  2019-03-28 22:19       ` Thomas Gummerer
@ 2019-03-29 13:16         ` Matheus Tavares Bernardino
  0 siblings, 0 replies; 127+ messages in thread
From: Matheus Tavares Bernardino @ 2019-03-29 13:16 UTC (permalink / raw)
  To: Thomas Gummerer
  Cc: git, Ævar Arnfjörð Bjarmason, Christian Couder,
	Nguyễn Thái Ngọc Duy, Kernel USP,
	Michael Haggerty, Ramsay Jones, Junio C Hamano

On Thu, Mar 28, 2019 at 7:19 PM Thomas Gummerer <t.gummerer@gmail.com> wrote:
>
> On 03/22, Matheus Tavares wrote:
> > Add the possibility of giving flags to dir_iterator_begin to initialize
> > a dir-iterator with special options.
> >
> > Currently possible flags are DIR_ITERATOR_PEDANTIC, which makes
> > dir_iterator_advance abort imediatelly in the case of an error while
>
> s/imediatelly/immediately/
>

Thanks!

^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [GSoC][PATCH v4 1/7] clone: test for our behavior on odd objects/* content
  2019-03-28 21:49       ` Thomas Gummerer
@ 2019-03-29 14:06         ` Matheus Tavares Bernardino
  2019-03-29 19:31           ` Thomas Gummerer
  0 siblings, 1 reply; 127+ messages in thread
From: Matheus Tavares Bernardino @ 2019-03-29 14:06 UTC (permalink / raw)
  To: Thomas Gummerer
  Cc: git, Ævar Arnfjörð Bjarmason, Christian Couder,
	Nguyễn Thái Ngọc Duy, Kernel USP, Alex Riesen,
	Junio C Hamano

On Thu, Mar 28, 2019 at 6:49 PM Thomas Gummerer <t.gummerer@gmail.com> wrote:
>
> On 03/22, Matheus Tavares wrote:
> > From: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
> >
> > Add tests for what happens when we perform a local clone on a repo
> > containing odd files at .git/objects directory, such as symlinks to other
> > dirs, or unknown files.
> >
> > I'm bending over backwards here to avoid a SHA1 dependency. See [1]
> > for an earlier and simpler version that hardcoded SHA-1s.
> >
> > This behavior has been the same for a *long* time, but hasn't been
> > tested for.
> >
> > There's a good post-hoc argument to be made for copying over unknown
> > things, e.g. I'd like a git version that doesn't know about the
> > commit-graph to copy it under "clone --local" so a newer git version
> > can make use of it.
> >
> > In follow-up commits we'll look at changing some of this behavior, but
> > for now let's just assert it as-is so we'll notice what we'll change
> > later.
> >
> > 1. https://public-inbox.org/git/20190226002625.13022-5-avarab@gmail.com/
> >
> > Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
> > Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
> > Helped-by: Matheus Tavares <matheus.bernardino@usp.br>
>
> The trailers should usually be in the order in which things happened.
> So having Ævar's S-o-b first makes sense, but the Helped-by should come
> before your S-o-b, as you made your changes before sending out the
> patch series.

Ok, thanks for letting me know. I'll fix it.

> When sending someone else's patch in a slightly modified version, it
> may also be useful to add which parts you changed, as it was done in
> e8dfcace31 ("poll: use GetTickCount64() to avoid wrap-around issues",
> 2018-10-31) for example.

Thanks, I didn't know about that! I searched the log and didn't see
many of these on patches with 'Helped-by' tags. Is there a particular
case to use it or not?

> IIRC, the test that is added in this patch does not work on some
> platforms, notably macOS.  That would mean that we would break
> bisectability at this patch on some platforms if we were to introduce
> it here.  Therefore I think it would be better to squash this patch
> into the next one which fixes these inconsistencies.
> Note that I can't test this at the moment, so this concern is only
> based on previous discussions that I remember.  If that's already
> addressed somehow, all the better!

Yes, it is already addressed :) The section of these tests that used
to break on some platforms is now moved to the next patch which also
fixes the platform inconsistencies. Now both patches (this and the
next) work on macOS, NetBSD and GNU/Linux. Also every test and job is
passing at travis-ci, except for the job named "Documentation"[1]. But,
it's weird since these patches don't even touch Documentation/... And
master is failing the same job at my fork as well [2]... Any thoughts
on that?

[1] https://travis-ci.org/MatheusBernardino/git/builds/512713775
[2] https://travis-ci.org/MatheusBernardino/git/builds/513028692

^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [GSoC][PATCH v4 2/7] clone: better handle symlinked files at .git/objects/
  2019-03-28 22:10       ` Thomas Gummerer
  2019-03-29  8:38         ` Ævar Arnfjörð Bjarmason
@ 2019-03-29 14:27         ` Matheus Tavares Bernardino
  2019-03-29 20:05           ` Thomas Gummerer
  2019-03-29 15:40         ` Johannes Schindelin
  2 siblings, 1 reply; 127+ messages in thread
From: Matheus Tavares Bernardino @ 2019-03-29 14:27 UTC (permalink / raw)
  To: Thomas Gummerer
  Cc: git, Ævar Arnfjörð Bjarmason, Christian Couder,
	Nguyễn Thái Ngọc Duy, Kernel USP, Benoit Pierre,
	Junio C Hamano, Johannes Schindelin

On Thu, Mar 28, 2019 at 7:10 PM Thomas Gummerer <t.gummerer@gmail.com> wrote:
>
> On 03/22, Matheus Tavares wrote:
> > There is currently an odd behaviour when locally cloning a repository
> > with symlinks at .git/objects: using --no-hardlinks all symlinks are
> > dereferenced but without it Git will try to hardlink the files with the
> > link() function, which has an OS-specific behaviour on symlinks. On OSX
> > and NetBSD, it creates a hardlink to the file pointed by the symlink
> > whilst on GNU/Linux, it creates a hardlink to the symlink itself.
> >
> > On Manjaro GNU/Linux:
> >     $ touch a
> >     $ ln -s a b
> >     $ link b c
> >     $ ls -li a b c
> >     155 [...] a
> >     156 [...] b -> a
> >     156 [...] c -> a
> >
> > But on NetBSD:
> >     $ ls -li a b c
> >     2609160 [...] a
> >     2609164 [...] b -> a
> >     2609160 [...] c
> >
> > It's not good to have the result of a local clone to be OS-dependent and
> > since the behaviour on GNU/Linux may result in broken symlinks, let's
> > re-implement it with linkat() instead of link() using a flag to always
> > follow symlinks and make the hardlink be to the pointed file. With this,
> > besides standardizing the behaviour, no broken symlinks will be
> > produced. Also, add tests for symlinked files at .git/objects/.
> >
> > Note: Git won't create symlinks at .git/objects itself, but it's better
> > to handle this case and be friendly with users who manually create them.
> >
> > Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
> > Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
> > Co-authored-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
> > ---
> >  builtin/clone.c            |  2 +-
> >  t/t5604-clone-reference.sh | 26 +++++++++++++++++++-------
> >  2 files changed, 20 insertions(+), 8 deletions(-)
> >
> > diff --git a/builtin/clone.c b/builtin/clone.c
> > index 50bde99618..b76f33c635 100644
> > --- a/builtin/clone.c
> > +++ b/builtin/clone.c
> > @@ -443,7 +443,7 @@ static void copy_or_link_directory(struct strbuf *src, struct strbuf *dest,
> >               if (unlink(dest->buf) && errno != ENOENT)
> >                       die_errno(_("failed to unlink '%s'"), dest->buf);
> >               if (!option_no_hardlinks) {
> > -                     if (!link(src->buf, dest->buf))
> > +                     if (!linkat(AT_FDCWD, src->buf, AT_FDCWD, dest->buf, AT_SYMLINK_FOLLOW))
>
> This line is starting to get a bit long, might be worth breaking it up
> to keep to 80 characters per line.
>
> I notice that we are currently not using 'linkat()' anywhere else in
> our codebase.  It looks like it has been introduced in POSIX.1-2008,
> which sounds fairly recent by git's standards.  So I wonder if this is
> really supported on all platforms that git is being built on.
>
> I also wonder what would need to be done on Windows if we were to
> introduce this.  I see we define the 'link()' function in
> 'compat/mingw.c' for that currently, so I guess something similar
> would be needed for 'linkat()'.  I added Dscho to Cc for Windows
> expertise.

Ok, what if instead of using linkat() we use 'realpath(const char
*path, char *resolved_path)', which will resolve any symlinks at
'path' and store the canonical path at 'resolved_path'? Then, we can
still keep using link() but now, with the certainty that all platforms
will have a consistent behaviour? (also, realpath() is POSIX.1-2001)
Would that be a better idea?
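
Roughly, the idea could be sketched as below.  This is only a
standalone illustration, not the actual change to builtin/clone.c:
link_resolved() and demo_link_resolved() are hypothetical names, and
the fixed-size buffer assumes PATH_MAX is defined, which is what the
POSIX.1-2001 form of realpath() needs:

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700	/* exposes realpath(), mkdtemp(), lstat() */
#endif
#include <limits.h>	/* PATH_MAX (assumed to be defined) */
#include <stdio.h>
#include <stdlib.h>	/* realpath(), mkdtemp() */
#include <sys/stat.h>
#include <unistd.h>	/* link(), symlink() */

/*
 * Hypothetical helper, not part of git: resolve every symlink in 'src'
 * first, then hardlink 'dst' to the resulting canonical path.
 * Returns 0 on success, -1 with errno set on failure.
 */
static int link_resolved(const char *src, const char *dst)
{
	char resolved[PATH_MAX];

	if (!realpath(src, resolved))
		return -1;
	return link(resolved, dst);
}

/*
 * Returns 1 if 'c' ends up as a regular file sharing its inode with
 * 'a', 0 if it does not, and -1 if any step of the setup failed.
 */
static int demo_link_resolved(void)
{
	char dir[] = "/tmp/realpath-demo-XXXXXX";
	char a[64], b[64], c[64];
	struct stat sa, sc;
	FILE *f;
	int ret = -1;

	if (!mkdtemp(dir))
		return -1;
	snprintf(a, sizeof(a), "%s/a", dir);
	snprintf(b, sizeof(b), "%s/b", dir);
	snprintf(c, sizeof(c), "%s/c", dir);

	f = fopen(a, "w");		/* touch a */
	if (f)
		fclose(f);
	if (f && !symlink("a", b) &&	/* ln -s a b */
	    !link_resolved(b, c) &&	/* link $(realpath b) c */
	    !lstat(a, &sa) && !lstat(c, &sc))
		ret = S_ISREG(sc.st_mode) && sa.st_ino == sc.st_ino;

	/* Best-effort cleanup of the scratch directory. */
	unlink(c);
	unlink(b);
	unlink(a);
	rmdir(dir);
	return ret;
}
```

One difference from linkat() is that resolving and linking are two
separate steps here, so the symlink could in principle change in
between; for a local clone that is probably acceptable.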

> While I agree with the goal of consistency across all platforms here,
> I don't know if it's actually worth going through the pain of doing
> that, especially for somewhat of an edge case in local clones.
>
> If the test in the previous patch passes on all platforms, I'd be okay
> with just calling the behaviour here undefined, especially as git
> would never actually create symlinks in the .git/objects directory.

^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [GSoC][PATCH v4 2/7] clone: better handle symlinked files at .git/objects/
  2019-03-28 22:10       ` Thomas Gummerer
  2019-03-29  8:38         ` Ævar Arnfjörð Bjarmason
  2019-03-29 14:27         ` Matheus Tavares Bernardino
@ 2019-03-29 15:40         ` Johannes Schindelin
  2 siblings, 0 replies; 127+ messages in thread
From: Johannes Schindelin @ 2019-03-29 15:40 UTC (permalink / raw)
  To: Thomas Gummerer
  Cc: Matheus Tavares, git, Ævar Arnfjörð Bjarmason,
	Christian Couder, Nguyễn Thái Ngọc Duy,
	kernel-usp, Benoit Pierre, Junio C Hamano

Hi Thomas,

On Thu, 28 Mar 2019, Thomas Gummerer wrote:

> On 03/22, Matheus Tavares wrote:
> >
> > diff --git a/builtin/clone.c b/builtin/clone.c
> > index 50bde99618..b76f33c635 100644
> > --- a/builtin/clone.c
> > +++ b/builtin/clone.c
> > @@ -443,7 +443,7 @@ static void copy_or_link_directory(struct strbuf *src, struct strbuf *dest,
> >  		if (unlink(dest->buf) && errno != ENOENT)
> >  			die_errno(_("failed to unlink '%s'"), dest->buf);
> >  		if (!option_no_hardlinks) {
> > -			if (!link(src->buf, dest->buf))
> > +			if (!linkat(AT_FDCWD, src->buf, AT_FDCWD, dest->buf, AT_SYMLINK_FOLLOW))
>
> [...]
>
> I notice that we are currently not using 'linkat()' anywhere else in
> our codebase.  It looks like it has been introduced in POSIX.1-2008,
> which sounds fairly recent by git's standards.  So I wonder if this is
> really supported on all platforms that git is being built on.

I bet you it isn't.

> I also wonder what would need to be done on Windows if we were to
> introduce this.  I see we define the 'link()' function in
> 'compat/mingw.c' for that currently, so I guess something similar
> would be needed for 'linkat()'.  I added Dscho to Cc for Windows
> expertise.

Indeed, `linkat()` would have to be implemented in `compat/mingw.c`. It
would be a bit involved because the last parameter of that function
changes behavior noticeably, but the main difficulty (to determine the
path from a file descriptor) should be overcome using
`HANDLE olddirhandle = _get_osfhandle(olddirfd);` and then calling
`GetFinalPathNameByHandleW(olddirhandle, wbuf, sizeof(wbuf));`.

So yes, this is *not* something I'd do lightly.

The bigger problem will be to continue to support older Unices such as
SunOS and AIX. I highly doubt that they have that function. You should
find out, Matheus.

Ciao,
Johannes

^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [GSoC][PATCH v4 1/7] clone: test for our behavior on odd objects/* content
  2019-03-29 14:06         ` Matheus Tavares Bernardino
@ 2019-03-29 19:31           ` Thomas Gummerer
  2019-03-29 19:42             ` SZEDER Gábor
  2019-03-30  2:49             ` Matheus Tavares Bernardino
  0 siblings, 2 replies; 127+ messages in thread
From: Thomas Gummerer @ 2019-03-29 19:31 UTC (permalink / raw)
  To: Matheus Tavares Bernardino
  Cc: git, Ævar Arnfjörð Bjarmason, Christian Couder,
	Nguyễn Thái Ngọc Duy, Kernel USP, Alex Riesen,
	Junio C Hamano

On 03/29, Matheus Tavares Bernardino wrote:
> On Thu, Mar 28, 2019 at 6:49 PM Thomas Gummerer <t.gummerer@gmail.com> wrote:
> > When sending someone else's patch in a slightly modified version, it
> > may also be useful to add which parts you changed, as it was done in
> > e8dfcace31 ("poll: use GetTickCount64() to avoid wrap-around issues",
> > 2018-10-31) for example.
> 
> Thanks, I didn't know about that! I searched the log and didn't see
> many of these on patches with 'Helped-by' tags. Is there a particular
> case to use it or not?

Helped-by tags are usually used when you want to give someone credit
for help you got on a patch that you originally authored.  It's up to
you at which point of involvement you actually want to add the tag, I
tend to add them whenever someone's input significantly
changes/improves the patch.  I think adding it here might be okay,
it's just less common when sending a patch that someone else authored
originally.

> > IIRC, the test that is added in this patch does not work on some
> > platforms, notably macOS.  That would mean that we would break
> > bisectability at this patch on some platforms if we were to introduce
> > it here.  Therefore I think it would be better to squash this patch
> > into the next one which fixes these inconsistencies.
> > Note that I can't test this at the moment, so this concern is only
> > based on previous discussions that I remember.  If that's already
> > addressed somehow, all the better!
> 
> Yes, it is already addressed :) The section of these tests that used
> to break on some platforms is now moved to the next patch which also
> fixes the platform inconsistencies. Now both patches (this and the
> next) work on macOS, NetBSD and GNU/Linux.

Great!

>                                             Also every test and job is
> passing at travis-ci, except for the job named "Documentation"[1]. But,
> it's weird since these patches don't even touch Documentation/... And
> master is failing the same job at my fork as well [2]... Any thoughts
> on that?

Yeah, this error seems to have nothing to do with your patch series.
Since the last run of travis on master [*1*] at least the asciidoc
package doesn't seem to have changed, so from a first look I don't
quite understand what's going on there.  In any case, I don't think
you need to worry about that for now, as it hasn't been triggered by
your changes (I won't discourage you from looking at why it is failing
and to try and fix that, but I think your time is probably better
spent looking at this patch series and the proposal for GSoC for
now).

*1*: https://travis-ci.org/git/git/builds/508784487

> [1] https://travis-ci.org/MatheusBernardino/git/builds/512713775
> [2] https://travis-ci.org/MatheusBernardino/git/builds/513028692

^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [GSoC][PATCH v4 1/7] clone: test for our behavior on odd objects/* content
  2019-03-29 19:31           ` Thomas Gummerer
@ 2019-03-29 19:42             ` SZEDER Gábor
  2019-03-30  2:49             ` Matheus Tavares Bernardino
  1 sibling, 0 replies; 127+ messages in thread
From: SZEDER Gábor @ 2019-03-29 19:42 UTC (permalink / raw)
  To: Thomas Gummerer
  Cc: Matheus Tavares Bernardino, git,
	Ævar Arnfjörð Bjarmason, Christian Couder,
	Nguyễn Thái Ngọc Duy, Kernel USP, Alex Riesen,
	Junio C Hamano

On Fri, Mar 29, 2019 at 07:31:58PM +0000, Thomas Gummerer wrote:
> >                                             Also every test and job is
> > passing at travis-ci, except for the job named "Documentation"[1]. But,
> > it's weird since these patches don't even touch Documentation/... And
> > master is failing the same job at my fork as well [2]... Any thoughts
> > on that?
> 
> Yeah, this error seems to have nothing to do with your patch series.
> Since the last run of travis on master [*1*] at least the asciidoc
> package doesn't seem to have changed, so from a first look I don't
> quite understand what's going on there.

https://public-inbox.org/git/20190329123520.27549-6-szeder.dev@gmail.com/


^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [GSoC][PATCH v4 2/7] clone: better handle symlinked files at .git/objects/
  2019-03-29 14:27         ` Matheus Tavares Bernardino
@ 2019-03-29 20:05           ` Thomas Gummerer
  2019-03-30  5:32             ` Matheus Tavares Bernardino
  0 siblings, 1 reply; 127+ messages in thread
From: Thomas Gummerer @ 2019-03-29 20:05 UTC (permalink / raw)
  To: Matheus Tavares Bernardino
  Cc: git, Ævar Arnfjörð Bjarmason, Christian Couder,
	Nguyễn Thái Ngọc Duy, Kernel USP, Benoit Pierre,
	Junio C Hamano, Johannes Schindelin

On 03/29, Matheus Tavares Bernardino wrote:
> On Thu, Mar 28, 2019 at 7:10 PM Thomas Gummerer <t.gummerer@gmail.com> wrote:
> > I notice that we are currently not using 'linkat()' anywhere else in
> > our codebase.  It looks like it has been introduced in POSIX.1-2008,
> > which sounds fairly recent by git's standards.  So I wonder if this is
> > really supported on all platforms that git is being built on.
> >
> > I also wonder what would need to be done on Windows if we were to
> > introduce this.  I see we define the 'link()' function in
> > 'compat/mingw.c' for that currently, so I guess something similar
> > would be needed for 'linkat()'.  I added Dscho to Cc for Windows
> > expertise.
> 
> Ok, what if instead of using linkat() we use 'realpath(const char
> *path, char *resolved_path)', which will resolve any symlinks at
> 'path' and store the canonical path at 'resolved_path'? Then, we can
> still keep using link() but now, with the certainty that all platforms
> will have a consistent behaviour? (also, realpath() is POSIX.1-2001)
> Would that be a better idea?

Yeah, I think that is a good idea.  Note that 'realpath()' itself is
not used anywhere in our codebase either, but there is
'strbuf_realpath()', which, judging from its documentation, does
exactly what 'realpath()' would do.  So using 'strbuf_realpath()'
would probably be the right thing to do here.

^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [GSoC][PATCH v4 2/7] clone: better handle symlinked files at .git/objects/
  2019-03-29  8:38         ` Ævar Arnfjörð Bjarmason
@ 2019-03-29 20:15           ` Thomas Gummerer
  0 siblings, 0 replies; 127+ messages in thread
From: Thomas Gummerer @ 2019-03-29 20:15 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Matheus Tavares, git, Christian Couder,
	Nguyễn Thái Ngọc Duy, kernel-usp, Benoit Pierre,
	Junio C Hamano, Johannes Schindelin

On 03/29, Ævar Arnfjörð Bjarmason wrote:
> 
> On Thu, Mar 28 2019, Thomas Gummerer wrote:
> > I notice that we are currently not using 'linkat()' anywhere else in
> > our codebase.  It looks like it has been introduced in POSIX.1-2008,
> > which sounds fairly recent by git's standards.  So I wonder if this is
> > really supported on all platforms that git is being built on.
> >
> > I also wonder what would need to be done on Windows if we were to
> > introduce this.  I see we define the 'link()' function in
> > 'compat/mingw.c' for that currently, so I guess something similar
> > would be needed for 'linkat()'.  I added Dscho to Cc for Windows
> > expertise.
> 
> For better or worse this particular quest started because I pointed out
> (with some WIP patches) that for understanding this change we should
> test whatever we did now, to ensure that the refactoring didn't have
> unintended side-effects.
> 
> But that's a separate question from whether or not we want to keep the
> current behavior.
> 
> I think the current behavior is clearly insane, so I think we should
> change it with some follow-up patches. In particular options like
> --dissociate should clearly (in my mind at least) have behavior similar
> to "cp -L", and --local should hardlink to the *target* of the symlink,
> if anything, at least for objects/{??,pack,info}

Right, I definitely agree with all of that.  Adding tests for the
current behaviour is definitely a good thing if we can do it in a sane
way.  And I also agree that the current behaviour is insane, and
should be fixed, but that may not want to be part of this patch
series.

> I think that changes the portability story with linkat(), since it's not
> something we should be planning to keep, just an intermediate step so we
> don't have a gigantic patch that both adds tests, refactors and changes
> the behavior.

Fair enough, but that also means that this patch series necessarily
has to introduce the changes in behaviour as well as switching clone
to use dir-iterator.  Of course we could say that the switch-over to
using dir-iterator could be done as a separate patch series, but that
seems a bit too much of a change in scope of this series.

Now I think Matheus has actually found a nice solution to this issue
using 'strbuf_realpath()', which gives us the same behaviour as using
'linkat()' in this patch would give us, so this might not be that big
an issue in the end.

^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [GSoC][PATCH v4 1/7] clone: test for our behavior on odd objects/* content
  2019-03-29 19:31           ` Thomas Gummerer
  2019-03-29 19:42             ` SZEDER Gábor
@ 2019-03-30  2:49             ` Matheus Tavares Bernardino
  1 sibling, 0 replies; 127+ messages in thread
From: Matheus Tavares Bernardino @ 2019-03-30  2:49 UTC (permalink / raw)
  To: Thomas Gummerer
  Cc: git, Ævar Arnfjörð Bjarmason, Christian Couder,
	Nguyễn Thái Ngọc Duy, Kernel USP, Alex Riesen,
	Junio C Hamano

On Fri, Mar 29, 2019 at 4:32 PM Thomas Gummerer <t.gummerer@gmail.com> wrote:
>
> On 03/29, Matheus Tavares Bernardino wrote:
> > On Thu, Mar 28, 2019 at 6:49 PM Thomas Gummerer <t.gummerer@gmail.com> wrote:
> > > When sending someone else's patch in a slightly modified version, it
> > > may also be useful to add which parts you changed, as it was done in
> > > e8dfcace31 ("poll: use GetTickCount64() to avoid wrap-around issues",
> > > 2018-10-31) for example.
> >
> > Thanks, I didn't know about that! I searched the log and didn't see
> > many of these on patches with 'Helped-by' tags, is there a particular
> > case to use it or not?
>
> Helped-by tags are usually used when you want to give someone credit
> for help you got on a patch that you originally authored.  It's up to
> you at which point of involvement you actually want to add the tag, I
> tend to add them whenever someone's input significantly
> changes/improves the patch.  I think adding it here might be okay,
> it's just less common when sending a patch that someone else authored
> originally.
>

Ok, got it, thanks!

> > > Iirc, the test that is added in this patch does not work on some
> > > platforms, notably macOS.  That would mean that we would break
> > > bisectability at this patch on some platforms if we were to introduce
> > > it here.  Therefore I think it would be better to squash this patch
> > > into the next one which fixes these inconsistencies.
> > > Note that I can't test this at the moment, so this concern is only
> > > based on previous discussions that I remember.  If that's already
> > > addressed somehow, all the better!
> >
> > Yes, it is already addressed :) The section of these tests that used
> > to break on some platforms is now moved to the next patch which also
> > fixes the platform inconsistencies. Now both patches (this and the
> > next) work on macOS, NetBSD and GNU/Linux.
>
> Great!
>
> >                                             Also every test and job is
> > passing at travis-ci, except for the job named "Documentation"[1]. But,
> > it's weird since these patches don't even touch Documentation/... And
> > master is failing the same job at my fork as well [2]... Any thoughts
> > on that?
>
> Yeah, this error seems to have nothing to do with your patch series.
> Since the last run of travis on master [*1*] at least the asciidoc
> package doesn't seem to have changed, so from a first look I don't
> quite understand what's going on there.  In any case, I don't think
> you need to worry about that for now, as it hasn't been triggered by
> your changes (I won't discourage you from looking at why it is failing
> and to try and fix that, but I think your time is probably better
> spent looking at this patch series and the proposal for GSoC for
> now).
>

Ok, thanks again.

> *1*: https://travis-ci.org/git/git/builds/508784487
>
> > [1] https://travis-ci.org/MatheusBernardino/git/builds/512713775
> > [2] https://travis-ci.org/MatheusBernardino/git/builds/513028692

^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [GSoC][PATCH v4 2/7] clone: better handle symlinked files at .git/objects/
  2019-03-29 20:05           ` Thomas Gummerer
@ 2019-03-30  5:32             ` Matheus Tavares Bernardino
  2019-03-30 19:27               ` Thomas Gummerer
  0 siblings, 1 reply; 127+ messages in thread
From: Matheus Tavares Bernardino @ 2019-03-30  5:32 UTC (permalink / raw)
  To: Thomas Gummerer
  Cc: git, Ævar Arnfjörð Bjarmason, Christian Couder,
	Nguyễn Thái Ngọc Duy, Kernel USP, Benoit Pierre,
	Junio C Hamano, Johannes Schindelin

On Fri, Mar 29, 2019 at 5:05 PM Thomas Gummerer <t.gummerer@gmail.com> wrote:
>
> On 03/29, Matheus Tavares Bernardino wrote:
> > On Thu, Mar 28, 2019 at 7:10 PM Thomas Gummerer <t.gummerer@gmail.com> wrote:
> > > I notice that we are currently not using 'linkat()' anywhere else in
> > > our codebase.  It looks like it has been introduced in POSIX.1-2008,
> > > which sounds fairly recent by git's standards.  So I wonder if this is
> > > really supported on all platforms that git is being built on.
> > >
> > > I also wonder what would need to be done on Windows if we were to
> > > introduce this.  I see we define the 'link()' function in
> > > 'compat/mingw.c' for that currently, so I guess something similar
> > > would be needed for 'linkat()'.  I added Dscho to Cc for Windows
> > > expertise.
> >
> > Ok, what if instead of using linkat() we use 'realpath(const char
> > *path, char *resolved_path)', which will resolve any symlinks at
> > 'path' and store the canonical path at 'resolved_path'? Then, we can
> > still keep using link() but now, with the certainty that all platforms
> > will have a consistent behaviour? (also, realpath() is POSIX.1-2001)
> > Would that be a better idea?
>
> Yeah, I think that is a good idea.  Note that 'realpath()' itself is
> not used anywhere in our codebase either, but there is
> 'strbuf_realpath()', that from reading the function documentation does
> exactly what 'realpath()' would do.  So using 'strbuf_realpath()'
> would probably be the right thing to do here.

Thanks. While I was looking for realpath() at git codebase (before I
saw your email), I got a little confused: Besides strbuf_realpath() I
also found real_path(), real_path_if_valid() and real_pathdup(). All
these last three use strbuf_realpath() but they also initialize the
struct strbuf internally and just return a 'char *', which is much
more convenient in some cases. What seems weird to me is that, whilst
real_pathdup() releases the internally initialized struct strbuf
(leaving just the returned string to be free'd by the user), the other
two don't. So, if struct strbuf changes in the future to have more
dynamically allocated resources, these functions will also have to be
modified. Also, since real_pathdup() can already do what the other two
do, do you know if there is a reason to keep all of them?

One last question: I found some places which don't free the string
returned by, for example, real_path() (e.g., find_worktree() at
worktree.c). Would it be a valid/good patch (or patches) to add free()
calls in these places? (I'm currently trying to get more people here at
USP to contribute to git, and maybe this could be a nice first
contribution for them...)

^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [GSoC][PATCH v4 2/7] clone: better handle symlinked files at .git/objects/
  2019-03-30  5:32             ` Matheus Tavares Bernardino
@ 2019-03-30 19:27               ` Thomas Gummerer
  2019-04-01  3:56                 ` Matheus Tavares Bernardino
  0 siblings, 1 reply; 127+ messages in thread
From: Thomas Gummerer @ 2019-03-30 19:27 UTC (permalink / raw)
  To: Matheus Tavares Bernardino
  Cc: git, Ævar Arnfjörð Bjarmason, Christian Couder,
	Nguyễn Thái Ngọc Duy, Kernel USP, Benoit Pierre,
	Junio C Hamano, Johannes Schindelin

On 03/30, Matheus Tavares Bernardino wrote:
> On Fri, Mar 29, 2019 at 5:05 PM Thomas Gummerer <t.gummerer@gmail.com> wrote:
> >
> > On 03/29, Matheus Tavares Bernardino wrote:
> > > Ok, what if instead of using linkat() we use 'realpath(const char
> > > *path, char *resolved_path)', which will resolve any symlinks at
> > > 'path' and store the canonical path at 'resolved_path'? Then, we can
> > > still keep using link() but now, with the certainty that all platforms
> > > will have a consistent behaviour? (also, realpath() is POSIX.1-2001)
> > > Would that be a better idea?
> >
> > Yeah, I think that is a good idea.  Note that 'realpath()' itself is
> > not used anywhere in our codebase either, but there is
> > 'strbuf_realpath()', that from reading the function documentation does
> > exactly what 'realpath()' would do.  So using 'strbuf_realpath()'
> > would probably be the right thing to do here.
> 
> Thanks. While I was looking for realpath() at git codebase (before I
> saw your email), I got a little confused: Besides strbuf_realpath() I
> also found real_path(), real_path_if_valid() and real_pathdup(). All
> these last three use strbuf_realpath() but they also initialize the
> struct strbuf internally and just return a 'char *', which is much
> more convenient in some cases.

Right, feel free to use whichever is most convenient for you, and
whichever works in the context.

>                            What seems weird to me is that, whilst
> real_pathdup() releases the internally initialized struct strbuf
> (leaving just the returned string to be free'd by the user), the other
> two don't. So, if struct strbuf changes in the future to have more
> dynamically allocated resources, these functions will also have to be
> modified. Also, since real_pathdup() can already do what the other two
> do, do you know if there is a reason to keep all of them?

Right, '*dup()' functions usually leave the return value to be free'd
by the caller.  And while 'real_pathdup()' could do what the others do
already it also takes more effort to use it.  Users don't need to free
the return value from 'real_path()' to avoid a memory leak.  This
alone justifies its existence I think.

> One last question: I found some places which don't free the string
> returned by, for example, real_path() (e.g., find_worktree() at
> worktree.c). Would it be a valid/good patch (or patches) to add free()
> calls in these places? (I'm currently trying to get more people here at
> USP to contribute to git, and maybe this could be a nice first
> contribution for them...)

Trying to plug memory leaks in the codebase is definitely something
that I think is worthy of doing.  Sometimes it's not worth actually
free'ing the memory, for example just before the program exits, in
which case we can use the UNLEAK annotation.  It was introduced in
0e5bba53af ("add UNLEAK annotation for reducing leak false positives",
2017-09-08) if you want more background.

That said, the memory from 'real_path()' should actually not be
free'd.  The strbuf there has a static lifetime, so it is valid until
git exits.  If we were to free the return value of the function we'd
actually free an internal buffer of the strbuf, that is still valid.
So if someone were to use 'real_path()' after that, the memory that
strbuf still thinks it owns would actually have been free'd, which
would result in undefined behaviour, and probably would make git
segfault.
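
To make the ownership difference concrete, here is a minimal sketch of
the two styles. It is not git's code: resolve_static() and
resolve_dup() are hypothetical stand-ins for real_path() and
real_pathdup(), and it uses plain POSIX realpath() with a static char
array instead of a static strbuf. The ownership rule it illustrates is
the same, though: the first result must never be free'd, the second
must be.

```c
#define _XOPEN_SOURCE 700
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical stand-in for a real_path()-style helper: the result
 * lives in a static buffer owned by this function. Callers must NOT
 * free() it, and the next call overwrites it.
 */
static const char *resolve_static(const char *path)
{
	static char buf[PATH_MAX];

	return realpath(path, buf) ? buf : NULL;
}

/*
 * Hypothetical stand-in for a real_pathdup()-style helper: the result
 * is a fresh heap copy that the caller owns and must free().
 */
static char *resolve_dup(const char *path)
{
	char buf[PATH_MAX];

	return realpath(path, buf) ? strdup(buf) : NULL;
}
```

A caller that needs the path to survive a second lookup would use the
'*dup()' variant and free() the result; a caller that only needs it
for a moment can use the static variant and free nothing.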

^ permalink raw reply	[flat|nested] 127+ messages in thread

* [GSoC][PATCH v5 0/7] clone: dir-iterator refactoring with tests
  2019-03-22 23:22   ` [GSoC][PATCH v4 0/7] clone: dir-iterator " Matheus Tavares
                       ` (6 preceding siblings ...)
  2019-03-22 23:22     ` [GSoC][PATCH v4 7/7] clone: Replace strcmp by fspathcmp Matheus Tavares
@ 2019-03-30 22:49     ` Matheus Tavares
  2019-03-30 22:49       ` [GSoC][PATCH v5 1/7] clone: test for our behavior on odd objects/* content Matheus Tavares
                         ` (8 more replies)
  7 siblings, 9 replies; 127+ messages in thread
From: Matheus Tavares @ 2019-03-30 22:49 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git, Thomas Gummerer, Ævar Arnfjörð Bjarmason,
	Christian Couder, Nguyễn Thái Ngọc Duy,
	SZEDER Gábor, kernel-usp

This patchset contains:
- a replacement of explicit recursive dir iteration at
  copy_or_link_directory for the dir-iterator API;
- some refactoring and behaviour changes at local clone, mainly to
  take care of symlinks and hidden files at .git/objects; and
- tests for these types of files

Changes since v4:
- Improved and fixed errors at messages from patches 1, 3, 5, 6 and 7.
- At first patch:
  - Simplified construction, changing a multi-line cat for an echo.
  - Removed unnecessary subshells.
  - Disabled gc.auto, just to make sure we don't get any undesired
    behaviour for this test
  - Removed the first section of a sed command ("s!/..\$!/X!;")
    that converts SHA-1s to fixed strings. No SHA-1 seemed to
    be changed by this section, nor did it seem to be used
    after the command.
- At second patch, removed linkat() usage, which is POSIX.1-2008
  and may not be supported on all platforms git is built on.
  Now the same effect is achieved using real_pathdup() + link().

v4: https://public-inbox.org/git/20190322232237.13293-1-matheus.bernardino@usp.br/

Matheus Tavares (6):
  clone: better handle symlinked files at .git/objects/
  dir-iterator: add flags parameter to dir_iterator_begin
  clone: copy hidden paths at local clone
  clone: extract function from copy_or_link_directory
  clone: use dir-iterator to avoid explicit dir traversal
  clone: replace strcmp by fspathcmp

Ævar Arnfjörð Bjarmason (1):
  clone: test for our behavior on odd objects/* content

 builtin/clone.c            |  75 ++++++++++++---------
 dir-iterator.c             |  28 +++++++-
 dir-iterator.h             |  39 +++++++++--
 refs/files-backend.c       |   2 +-
 t/t5604-clone-reference.sh | 133 +++++++++++++++++++++++++++++++++++++
 5 files changed, 235 insertions(+), 42 deletions(-)

-- 
2.20.1


^ permalink raw reply	[flat|nested] 127+ messages in thread

* [GSoC][PATCH v5 1/7] clone: test for our behavior on odd objects/* content
  2019-03-30 22:49     ` [GSoC][PATCH v5 0/7] clone: dir-iterator refactoring with tests Matheus Tavares
@ 2019-03-30 22:49       ` Matheus Tavares
  2019-03-30 22:49       ` [GSoC][PATCH v5 2/7] clone: better handle symlinked files at .git/objects/ Matheus Tavares
                         ` (7 subsequent siblings)
  8 siblings, 0 replies; 127+ messages in thread
From: Matheus Tavares @ 2019-03-30 22:49 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git, Thomas Gummerer, Ævar Arnfjörð Bjarmason,
	Christian Couder, Nguyễn Thái Ngọc Duy,
	SZEDER Gábor, kernel-usp, Alex Riesen

From: Ævar Arnfjörð Bjarmason <avarab@gmail.com>

Add tests for what happens when we perform a local clone on a repo
containing odd files in the .git/objects directory, such as symlinks to other
dirs, or unknown files.

I'm bending over backwards here to avoid a SHA-1 dependency. See [1]
for an earlier and simpler version that hardcoded SHA-1s.

This behavior has been the same for a *long* time, but hasn't been
tested for.

There's a good post-hoc argument to be made for copying over unknown
things, e.g. I'd like a git version that doesn't know about the
commit-graph to copy it under "clone --local" so a newer git version
can make use of it.

In follow-up commits we'll look at changing some of this behavior, but
for now, let's just assert it as-is so we'll notice what we'll change
later.

1. https://public-inbox.org/git/20190226002625.13022-5-avarab@gmail.com/

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
[matheus.bernardino: improved and split tests in more than one patch]
Helped-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
---
 t/t5604-clone-reference.sh | 111 +++++++++++++++++++++++++++++++++++++
 1 file changed, 111 insertions(+)

diff --git a/t/t5604-clone-reference.sh b/t/t5604-clone-reference.sh
index 4320082b1b..207650cb95 100755
--- a/t/t5604-clone-reference.sh
+++ b/t/t5604-clone-reference.sh
@@ -221,4 +221,115 @@ test_expect_success 'clone, dissociate from alternates' '
 	( cd C && git fsck )
 '
 
+test_expect_success 'setup repo with garbage in objects/*' '
+	git init S &&
+	(
+		cd S &&
+		test_commit A &&
+
+		cd .git/objects &&
+		>.some-hidden-file &&
+		>some-file &&
+		mkdir .some-hidden-dir &&
+		>.some-hidden-dir/some-file &&
+		>.some-hidden-dir/.some-dot-file &&
+		mkdir some-dir &&
+		>some-dir/some-file &&
+		>some-dir/.some-dot-file
+	)
+'
+
+test_expect_success 'clone a repo with garbage in objects/*' '
+	for option in --local --no-hardlinks --shared --dissociate
+	do
+		git clone $option S S$option || return 1 &&
+		git -C S$option fsck || return 1
+	done &&
+	find S-* -name "*some*" | sort >actual &&
+	cat >expected <<-EOF &&
+	S--dissociate/.git/objects/.some-hidden-file
+	S--dissociate/.git/objects/some-dir
+	S--dissociate/.git/objects/some-dir/.some-dot-file
+	S--dissociate/.git/objects/some-dir/some-file
+	S--dissociate/.git/objects/some-file
+	S--local/.git/objects/.some-hidden-file
+	S--local/.git/objects/some-dir
+	S--local/.git/objects/some-dir/.some-dot-file
+	S--local/.git/objects/some-dir/some-file
+	S--local/.git/objects/some-file
+	S--no-hardlinks/.git/objects/.some-hidden-file
+	S--no-hardlinks/.git/objects/some-dir
+	S--no-hardlinks/.git/objects/some-dir/.some-dot-file
+	S--no-hardlinks/.git/objects/some-dir/some-file
+	S--no-hardlinks/.git/objects/some-file
+	EOF
+	test_cmp expected actual
+'
+
+test_expect_success SYMLINKS 'setup repo with manually symlinked dirs and unknown files at objects/' '
+	git init T &&
+	(
+		cd T &&
+		git config gc.auto 0 &&
+		test_commit A &&
+		git gc &&
+		test_commit B &&
+
+		cd .git/objects &&
+		mv pack packs &&
+		ln -s packs pack &&
+		find ?? -type d >loose-dirs &&
+		last_loose=$(tail -n 1 loose-dirs) &&
+		rm -f loose-dirs &&
+		mv $last_loose a-loose-dir &&
+		ln -s a-loose-dir $last_loose &&
+		find . -type f | sort >../../../T.objects-files.raw &&
+		echo unknown_content> unknown_file
+	) &&
+	git -C T fsck &&
+	git -C T rev-list --all --objects >T.objects
+'
+
+
+test_expect_success SYMLINKS 'clone repo with symlinked dirs and unknown files at objects/' '
+	for option in --local --no-hardlinks --shared --dissociate
+	do
+		git clone $option T T$option || return 1 &&
+		git -C T$option fsck || return 1 &&
+		git -C T$option rev-list --all --objects >T$option.objects &&
+		test_cmp T.objects T$option.objects &&
+		(
+			cd T$option/.git/objects &&
+			find . -type f | sort >../../../T$option.objects-files.raw
+		)
+	done &&
+
+	for raw in $(ls T*.raw)
+	do
+		sed -e "s!/../!/Y/!; s![0-9a-f]\{38,\}!Z!" -e "/commit-graph/d" \
+		    -e "/multi-pack-index/d" <$raw >$raw.de-sha || return 1
+	done &&
+
+	cat >expected-files <<-EOF &&
+	./Y/Z
+	./Y/Z
+	./a-loose-dir/Z
+	./Y/Z
+	./info/packs
+	./pack/pack-Z.idx
+	./pack/pack-Z.pack
+	./packs/pack-Z.idx
+	./packs/pack-Z.pack
+	./unknown_file
+	EOF
+
+	for option in --local --dissociate --no-hardlinks
+	do
+		test_cmp expected-files T$option.objects-files.raw.de-sha || return 1
+	done &&
+
+	echo ./info/alternates >expected-files &&
+	test_cmp expected-files T--shared.objects-files.raw
+'
+
 test_done
-- 
2.20.1


^ permalink raw reply	[flat|nested] 127+ messages in thread

* [GSoC][PATCH v5 2/7] clone: better handle symlinked files at .git/objects/
  2019-03-30 22:49     ` [GSoC][PATCH v5 0/7] clone: dir-iterator refactoring with tests Matheus Tavares
  2019-03-30 22:49       ` [GSoC][PATCH v5 1/7] clone: test for our behavior on odd objects/* content Matheus Tavares
@ 2019-03-30 22:49       ` Matheus Tavares
  2019-03-31 17:40         ` Thomas Gummerer
  2019-03-30 22:49       ` [GSoC][PATCH v5 3/7] dir-iterator: add flags parameter to dir_iterator_begin Matheus Tavares
                         ` (6 subsequent siblings)
  8 siblings, 1 reply; 127+ messages in thread
From: Matheus Tavares @ 2019-03-30 22:49 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git, Thomas Gummerer, Ævar Arnfjörð Bjarmason,
	Christian Couder, Nguyễn Thái Ngọc Duy,
	SZEDER Gábor, kernel-usp

There is currently an odd behaviour when locally cloning a repository
with symlinks at .git/objects: using --no-hardlinks all symlinks are
dereferenced but without it, Git will try to hardlink the files with the
link() function, which has an OS-specific behaviour on symlinks. On OSX
and NetBSD, it creates a hardlink to the file pointed to by the symlink
whilst on GNU/Linux, it creates a hardlink to the symlink itself.

On Manjaro GNU/Linux:
    $ touch a
    $ ln -s a b
    $ link b c
    $ ls -li a b c
    155 [...] a
    156 [...] b -> a
    156 [...] c -> a

But on NetBSD:
    $ ls -li a b c
    2609160 [...] a
    2609164 [...] b -> a
    2609160 [...] c

It's not good for the result of a local clone to be OS-dependent and
besides that, the current behaviour on GNU/Linux may result in broken
symlinks. So let's standardize this by making the hardlinks always point
to dereferenced paths, instead of the symlinks themselves. Also, add
tests for symlinked files at .git/objects/.

Note: Git won't create symlinks at .git/objects itself, but it's better
to handle this case and be friendly with users who manually create them.

Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Co-authored-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 builtin/clone.c            |  5 ++++-
 t/t5604-clone-reference.sh | 27 ++++++++++++++++++++-------
 2 files changed, 24 insertions(+), 8 deletions(-)

diff --git a/builtin/clone.c b/builtin/clone.c
index 50bde99618..f975b509f1 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -443,7 +443,10 @@ static void copy_or_link_directory(struct strbuf *src, struct strbuf *dest,
 		if (unlink(dest->buf) && errno != ENOENT)
 			die_errno(_("failed to unlink '%s'"), dest->buf);
 		if (!option_no_hardlinks) {
-			if (!link(src->buf, dest->buf))
+			char *resolved_path = real_pathdup(src->buf, 1);
+			int status = link(resolved_path, dest->buf);
+			free(resolved_path);
+			if (!status)
 				continue;
 			if (option_local > 0)
 				die_errno(_("failed to create link '%s'"), dest->buf);
diff --git a/t/t5604-clone-reference.sh b/t/t5604-clone-reference.sh
index 207650cb95..0800c3853f 100755
--- a/t/t5604-clone-reference.sh
+++ b/t/t5604-clone-reference.sh
@@ -266,7 +266,7 @@ test_expect_success 'clone a repo with garbage in objects/*' '
 	test_cmp expected actual
 '
 
-test_expect_success SYMLINKS 'setup repo with manually symlinked dirs and unknown files at objects/' '
+test_expect_success SYMLINKS 'setup repo with manually symlinked or unknown files at objects/' '
 	git init T &&
 	(
 		cd T &&
@@ -280,10 +280,19 @@ test_expect_success SYMLINKS 'setup repo with manually symlinked dirs and unknow
 		ln -s packs pack &&
 		find ?? -type d >loose-dirs &&
 		last_loose=$(tail -n 1 loose-dirs) &&
-		rm -f loose-dirs &&
 		mv $last_loose a-loose-dir &&
 		ln -s a-loose-dir $last_loose &&
+		first_loose=$(head -n 1 loose-dirs) &&
+		rm -f loose-dirs &&
+
+		cd $first_loose &&
+		obj=$(ls *) &&
+		mv $obj ../an-object &&
+		ln -s ../an-object $obj &&
+
+		cd ../ &&
 		find . -type f | sort >../../../T.objects-files.raw &&
+		find . -type l | sort >../../../T.objects-symlinks.raw &&
 		echo unknown_content> unknown_file
 	) &&
 	git -C T fsck &&
@@ -291,7 +300,7 @@ test_expect_success SYMLINKS 'setup repo with manually symlinked dirs and unknow
 '
 
 
-test_expect_success SYMLINKS 'clone repo with symlinked dirs and unknown files at objects/' '
+test_expect_success SYMLINKS 'clone repo with symlinked or unknown files at objects/' '
 	for option in --local --no-hardlinks --shared --dissociate
 	do
 		git clone $option T T$option || return 1 &&
@@ -300,7 +309,8 @@ test_expect_success SYMLINKS 'clone repo with symlinked dirs and unknown files a
 		test_cmp T.objects T$option.objects &&
 		(
 			cd T$option/.git/objects &&
-			find . -type f | sort >../../../T$option.objects-files.raw
+			find . -type f | sort >../../../T$option.objects-files.raw &&
+			find . -type l | sort >../../../T$option.objects-symlinks.raw
 		)
 	done &&
 
@@ -314,6 +324,7 @@ test_expect_success SYMLINKS 'clone repo with symlinked dirs and unknown files a
 	./Y/Z
 	./Y/Z
 	./a-loose-dir/Z
+	./an-object
 	./Y/Z
 	./info/packs
 	./pack/pack-Z.idx
@@ -323,13 +334,15 @@ test_expect_success SYMLINKS 'clone repo with symlinked dirs and unknown files a
 	./unknown_file
 	EOF
 
-	for option in --local --dissociate --no-hardlinks
+	for option in --local --no-hardlinks --dissociate
 	do
-		test_cmp expected-files T$option.objects-files.raw.de-sha || return 1
+		test_cmp expected-files T$option.objects-files.raw.de-sha || return 1 &&
+		test_must_be_empty T$option.objects-symlinks.raw.de-sha || return 1
 	done &&
 
 	echo ./info/alternates >expected-files &&
-	test_cmp expected-files T--shared.objects-files.raw
+	test_cmp expected-files T--shared.objects-files.raw &&
+	test_must_be_empty T--shared.objects-symlinks.raw
 '
 
 test_done
-- 
2.20.1


^ permalink raw reply	[flat|nested] 127+ messages in thread

* [GSoC][PATCH v5 3/7] dir-iterator: add flags parameter to dir_iterator_begin
  2019-03-30 22:49     ` [GSoC][PATCH v5 0/7] clone: dir-iterator refactoring with tests Matheus Tavares
  2019-03-30 22:49       ` [GSoC][PATCH v5 1/7] clone: test for our behavior on odd objects/* content Matheus Tavares
  2019-03-30 22:49       ` [GSoC][PATCH v5 2/7] clone: better handle symlinked files at .git/objects/ Matheus Tavares
@ 2019-03-30 22:49       ` Matheus Tavares
  2019-03-31 18:12         ` Thomas Gummerer
  2019-03-30 22:49       ` [GSoC][PATCH v5 4/7] clone: copy hidden paths at local clone Matheus Tavares
                         ` (5 subsequent siblings)
  8 siblings, 1 reply; 127+ messages in thread
From: Matheus Tavares @ 2019-03-30 22:49 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git, Thomas Gummerer, Ævar Arnfjörð Bjarmason,
	Christian Couder, Nguyễn Thái Ngọc Duy,
	SZEDER Gábor, kernel-usp, Michael Haggerty, Ramsay Jones

Add the possibility of giving flags to dir_iterator_begin to initialize
a dir-iterator with special options.

Currently possible flags are DIR_ITERATOR_PEDANTIC, which makes
dir_iterator_advance abort immediately in the case of an error while
trying to fetch next entry; and DIR_ITERATOR_FOLLOW_SYMLINKS, which
makes the iteration follow symlinks to directories and include their
contents in the iteration. These new flags will be used in a subsequent
patch.
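
For readers who want the idea without reading the diff, the two flags
can be sketched as a plain bitmask that selects between stat() and
lstat() and decides whether an error is fatal. This is a simplified
illustration only: the SKETCH_* names and both helper functions are
hypothetical and are not part of the dir-iterator API.

```c
#define _XOPEN_SOURCE 700
#include <sys/stat.h>

/* Hypothetical flag names; the patch itself defines DIR_ITERATOR_*. */
#define SKETCH_PEDANTIC        (1u << 0)
#define SKETCH_FOLLOW_SYMLINKS (1u << 1)

/*
 * Fill *st for path. With SKETCH_FOLLOW_SYMLINKS the symlink's target
 * is examined (stat); otherwise the link itself is (lstat).
 * Returns 0 on success and -1 on failure, like stat() and lstat().
 */
static int stat_entry(const char *path, unsigned flags, struct stat *st)
{
	if (flags & SKETCH_FOLLOW_SYMLINKS)
		return stat(path, st);
	return lstat(path, st);
}

/*
 * Decide what to do after a failed read: non-zero means "stop and
 * report an error", zero means "warn and move on to the next entry".
 */
static int should_abort_on_error(unsigned flags)
{
	return (flags & SKETCH_PEDANTIC) != 0;
}
```

Because the flags are independent bits, a caller can combine them
with '|', and passing 0 keeps the previous default behaviour.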

Also adjust refs/files-backend.c to the new dir_iterator_begin
signature.

Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
---
 dir-iterator.c       | 28 +++++++++++++++++++++++++---
 dir-iterator.h       | 39 +++++++++++++++++++++++++++++++++------
 refs/files-backend.c |  2 +-
 3 files changed, 59 insertions(+), 10 deletions(-)

diff --git a/dir-iterator.c b/dir-iterator.c
index f2dcd82fde..17aca8ea41 100644
--- a/dir-iterator.c
+++ b/dir-iterator.c
@@ -48,12 +48,16 @@ struct dir_iterator_int {
 	 * that will be included in this iteration.
 	 */
 	struct dir_iterator_level *levels;
+
+	/* Combination of flags for this dir-iterator */
+	unsigned flags;
 };
 
 int dir_iterator_advance(struct dir_iterator *dir_iterator)
 {
 	struct dir_iterator_int *iter =
 		(struct dir_iterator_int *)dir_iterator;
+	int ret;
 
 	while (1) {
 		struct dir_iterator_level *level =
@@ -71,6 +75,8 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
 
 			level->dir = opendir(iter->base.path.buf);
 			if (!level->dir && errno != ENOENT) {
+				if (iter->flags & DIR_ITERATOR_PEDANTIC)
+					goto error_out;
 				warning("error opening directory %s: %s",
 					iter->base.path.buf, strerror(errno));
 				/* Popping the level is handled below */
@@ -122,6 +128,8 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
 			if (!de) {
 				/* This level is exhausted; pop up a level. */
 				if (errno) {
+					if (iter->flags & DIR_ITERATOR_PEDANTIC)
+						goto error_out;
 					warning("error reading directory %s: %s",
 						iter->base.path.buf, strerror(errno));
 				} else if (closedir(level->dir))
@@ -138,11 +146,20 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
 				continue;
 
 			strbuf_addstr(&iter->base.path, de->d_name);
-			if (lstat(iter->base.path.buf, &iter->base.st) < 0) {
-				if (errno != ENOENT)
+
+			if (iter->flags & DIR_ITERATOR_FOLLOW_SYMLINKS)
+				ret = stat(iter->base.path.buf, &iter->base.st);
+			else
+				ret = lstat(iter->base.path.buf, &iter->base.st);
+
+			if (ret < 0) {
+				if (errno != ENOENT) {
+					if (iter->flags & DIR_ITERATOR_PEDANTIC)
+						goto error_out;
 					warning("error reading path '%s': %s",
 						iter->base.path.buf,
 						strerror(errno));
+				}
 				continue;
 			}
 
@@ -159,6 +176,10 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
 			return ITER_OK;
 		}
 	}
+
+error_out:
+	dir_iterator_abort(dir_iterator);
+	return ITER_ERROR;
 }
 
 int dir_iterator_abort(struct dir_iterator *dir_iterator)
@@ -182,7 +203,7 @@ int dir_iterator_abort(struct dir_iterator *dir_iterator)
 	return ITER_DONE;
 }
 
-struct dir_iterator *dir_iterator_begin(const char *path)
+struct dir_iterator *dir_iterator_begin(const char *path, unsigned flags)
 {
 	struct dir_iterator_int *iter = xcalloc(1, sizeof(*iter));
 	struct dir_iterator *dir_iterator = &iter->base;
@@ -195,6 +216,7 @@ struct dir_iterator *dir_iterator_begin(const char *path)
 
 	ALLOC_GROW(iter->levels, 10, iter->levels_alloc);
 
+	iter->flags = flags;
 	iter->levels_nr = 1;
 	iter->levels[0].initialized = 0;
 
diff --git a/dir-iterator.h b/dir-iterator.h
index 970793d07a..93646c3bea 100644
--- a/dir-iterator.h
+++ b/dir-iterator.h
@@ -19,7 +19,7 @@
  * A typical iteration looks like this:
  *
  *     int ok;
- *     struct iterator *iter = dir_iterator_begin(path);
+ *     struct iterator *iter = dir_iterator_begin(path, 0);
  *
  *     while ((ok = dir_iterator_advance(iter)) == ITER_OK) {
  *             if (want_to_stop_iteration()) {
@@ -40,6 +40,20 @@
  * dir_iterator_advance() again.
  */
 
+/*
+ * Flags for dir_iterator_begin:
+ *
+ * - DIR_ITERATOR_PEDANTIC: override dir-iterator's default behavior
+ *   in case of an error while trying to fetch the next entry, which is
+ *   to emit a warning and keep going. With this flag, resources are
+ *   freed and ITER_ERROR is returned immediately.
+ *
+ * - DIR_ITERATOR_FOLLOW_SYMLINKS: make dir-iterator follow symlinks to
+ *   directories, i.e., iterate over linked directories' contents.
+ */
+#define DIR_ITERATOR_PEDANTIC (1 << 0)
+#define DIR_ITERATOR_FOLLOW_SYMLINKS (1 << 1)
+
 struct dir_iterator {
 	/* The current path: */
 	struct strbuf path;
@@ -54,20 +68,28 @@ struct dir_iterator {
 	/* The current basename: */
 	const char *basename;
 
-	/* The result of calling lstat() on path: */
+	/*
+	 * The result of calling lstat() on path or stat(), if the
+	 * DIR_ITERATOR_FOLLOW_SYMLINKS flag was set at
+	 * dir_iterator's initialization.
+	 */
 	struct stat st;
 };
 
 /*
- * Start a directory iteration over path. Return a dir_iterator that
- * holds the internal state of the iteration.
+ * Start a directory iteration over path with the combination of
+ * options specified by flags. Return a dir_iterator that holds the
+ * internal state of the iteration.
  *
  * The iteration includes all paths under path, not including path
  * itself and not including "." or ".." entries.
  *
- * path is the starting directory. An internal copy will be made.
+ * Parameters are:
+ *  - path is the starting directory. An internal copy will be made.
+ *  - flags is a combination of the possible flags to initialize a
+ *    dir-iterator or 0 for default behaviour.
  */
-struct dir_iterator *dir_iterator_begin(const char *path);
+struct dir_iterator *dir_iterator_begin(const char *path, unsigned flags);
 
 /*
  * Advance the iterator to the first or next item and return ITER_OK.
@@ -76,6 +98,11 @@ struct dir_iterator *dir_iterator_begin(const char *path);
  * dir_iterator and associated resources and return ITER_ERROR. It is
  * a bug to use iterator or call this function again after it has
  * returned ITER_DONE or ITER_ERROR.
+ *
+ * Note that whether dir-iterator will return ITER_ERROR when failing
+ * to fetch the next entry or just emit a warning and try to fetch the
+ * next is defined by the 'pedantic' option at dir-iterator's
+ * initialization.
  */
 int dir_iterator_advance(struct dir_iterator *iterator);
 
diff --git a/refs/files-backend.c b/refs/files-backend.c
index ef053f716c..2ce9783097 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -2143,7 +2143,7 @@ static struct ref_iterator *reflog_iterator_begin(struct ref_store *ref_store,
 
 	base_ref_iterator_init(ref_iterator, &files_reflog_iterator_vtable, 0);
 	strbuf_addf(&sb, "%s/logs", gitdir);
-	iter->dir_iterator = dir_iterator_begin(sb.buf);
+	iter->dir_iterator = dir_iterator_begin(sb.buf, 0);
 	iter->ref_store = ref_store;
 	strbuf_release(&sb);
 
-- 
2.20.1


^ permalink raw reply	[flat|nested] 127+ messages in thread
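
As a rough illustration of how the two flags introduced by the patch
above change per-entry behaviour, here is a minimal, self-contained
sketch. The flag values are copied from the dir-iterator.h hunk;
stat_entry() and should_abort_on_error() are hypothetical helpers made
up for this example and are not part of Git's dir-iterator.

```c
#define _XOPEN_SOURCE 700 /* exposes lstat() under strict -std= modes */
#include <sys/stat.h>

/* Flag values as defined in the dir-iterator.h hunk above. */
#define DIR_ITERATOR_PEDANTIC (1 << 0)
#define DIR_ITERATOR_FOLLOW_SYMLINKS (1 << 1)

/*
 * Hypothetical helper: fill *st for path, following symlinks only when
 * DIR_ITERATOR_FOLLOW_SYMLINKS is set. Returns 0 on success, -1 on error.
 */
static int stat_entry(const char *path, unsigned flags, struct stat *st)
{
	if (flags & DIR_ITERATOR_FOLLOW_SYMLINKS)
		return stat(path, st);
	return lstat(path, st);
}

/*
 * Hypothetical helper: non-zero when an error should end the iteration
 * instead of only being reported as a warning.
 */
static int should_abort_on_error(unsigned flags)
{
	return (flags & DIR_ITERATOR_PEDANTIC) != 0;
}
```

Because the flags are independent bits, a caller can combine them with
a bitwise OR, or pass 0 to keep the default behaviour.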

* [GSoC][PATCH v5 4/7] clone: copy hidden paths at local clone
  2019-03-30 22:49     ` [GSoC][PATCH v5 0/7] clone: dir-iterator refactoring with tests Matheus Tavares
                         ` (2 preceding siblings ...)
  2019-03-30 22:49       ` [GSoC][PATCH v5 3/7] dir-iterator: add flags parameter to dir_iterator_begin Matheus Tavares
@ 2019-03-30 22:49       ` Matheus Tavares
  2019-03-30 22:49       ` [GSoC][PATCH v5 5/7] clone: extract function from copy_or_link_directory Matheus Tavares
                         ` (4 subsequent siblings)
  8 siblings, 0 replies; 127+ messages in thread
From: Matheus Tavares @ 2019-03-30 22:49 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git, Thomas Gummerer, Ævar Arnfjörð Bjarmason,
	Christian Couder, Nguyễn Thái Ngọc Duy,
	SZEDER Gábor, kernel-usp

Make the copy_or_link_directory function no longer skip hidden
directories. This function, used to copy .git/objects, currently skips
all hidden directories but not hidden files, which is odd behaviour.
This was probably unintentional: the intention was likely to skip only
'.' and '..', but the check ended up skipping every directory whose
name starts with '.'. Besides being more natural, the new behaviour is
more permissive to the user.

Also adjust tests to reflect this behaviour change.
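
To make the difference between the two checks concrete, here is a small
sketch. is_dot_or_dotdot_sketch() is a stand-in written for this
example, assuming Git's is_dot_or_dotdot() matches exactly "." and
".."; old_skips_dir() and new_skips_dir() are hypothetical names.

```c
#include <string.h>

/* Stand-in for Git's is_dot_or_dotdot(): true only for "." and "..". */
static int is_dot_or_dotdot_sketch(const char *name)
{
	return !strcmp(name, ".") || !strcmp(name, "..");
}

/* Old check: skips every directory whose name starts with '.'. */
static int old_skips_dir(const char *name)
{
	return name[0] == '.';
}

/* New check: skips only the "." and ".." entries. */
static int new_skips_dir(const char *name)
{
	return is_dot_or_dotdot_sketch(name);
}
```

With these helpers, ".some-hidden-dir" is skipped by the old check but
not by the new one, while "." and ".." are skipped by both.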

Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Co-authored-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 builtin/clone.c            | 2 +-
 t/t5604-clone-reference.sh | 9 +++++++++
 2 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/builtin/clone.c b/builtin/clone.c
index f975b509f1..81e1a39c61 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -428,7 +428,7 @@ static void copy_or_link_directory(struct strbuf *src, struct strbuf *dest,
 			continue;
 		}
 		if (S_ISDIR(buf.st_mode)) {
-			if (de->d_name[0] != '.')
+			if (!is_dot_or_dotdot(de->d_name))
 				copy_or_link_directory(src, dest,
 						       src_repo, src_baselen);
 			continue;
diff --git a/t/t5604-clone-reference.sh b/t/t5604-clone-reference.sh
index 0800c3853f..c3998f2f9e 100755
--- a/t/t5604-clone-reference.sh
+++ b/t/t5604-clone-reference.sh
@@ -247,16 +247,25 @@ test_expect_success 'clone a repo with garbage in objects/*' '
 	done &&
 	find S-* -name "*some*" | sort >actual &&
 	cat >expected <<-EOF &&
+	S--dissociate/.git/objects/.some-hidden-dir
+	S--dissociate/.git/objects/.some-hidden-dir/.some-dot-file
+	S--dissociate/.git/objects/.some-hidden-dir/some-file
 	S--dissociate/.git/objects/.some-hidden-file
 	S--dissociate/.git/objects/some-dir
 	S--dissociate/.git/objects/some-dir/.some-dot-file
 	S--dissociate/.git/objects/some-dir/some-file
 	S--dissociate/.git/objects/some-file
+	S--local/.git/objects/.some-hidden-dir
+	S--local/.git/objects/.some-hidden-dir/.some-dot-file
+	S--local/.git/objects/.some-hidden-dir/some-file
 	S--local/.git/objects/.some-hidden-file
 	S--local/.git/objects/some-dir
 	S--local/.git/objects/some-dir/.some-dot-file
 	S--local/.git/objects/some-dir/some-file
 	S--local/.git/objects/some-file
+	S--no-hardlinks/.git/objects/.some-hidden-dir
+	S--no-hardlinks/.git/objects/.some-hidden-dir/.some-dot-file
+	S--no-hardlinks/.git/objects/.some-hidden-dir/some-file
 	S--no-hardlinks/.git/objects/.some-hidden-file
 	S--no-hardlinks/.git/objects/some-dir
 	S--no-hardlinks/.git/objects/some-dir/.some-dot-file
-- 
2.20.1


^ permalink raw reply	[flat|nested] 127+ messages in thread

* [GSoC][PATCH v5 5/7] clone: extract function from copy_or_link_directory
  2019-03-30 22:49     ` [GSoC][PATCH v5 0/7] clone: dir-iterator refactoring with tests Matheus Tavares
                         ` (3 preceding siblings ...)
  2019-03-30 22:49       ` [GSoC][PATCH v5 4/7] clone: copy hidden paths at local clone Matheus Tavares
@ 2019-03-30 22:49       ` Matheus Tavares
  2019-03-30 22:49       ` [GSoC][PATCH v5 6/7] clone: use dir-iterator to avoid explicit dir traversal Matheus Tavares
                         ` (3 subsequent siblings)
  8 siblings, 0 replies; 127+ messages in thread
From: Matheus Tavares @ 2019-03-30 22:49 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git, Thomas Gummerer, Ævar Arnfjörð Bjarmason,
	Christian Couder, Nguyễn Thái Ngọc Duy,
	SZEDER Gábor, kernel-usp

Extract the directory creation code from copy_or_link_directory into
its own function, named mkdir_if_missing. This change will help to
remove copy_or_link_directory's explicit recursion, which will be done
in a following patch. It also makes the code more readable.
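
For readers who want to try the "create, or accept an existing
directory" logic outside of Git, the following is a sketch under the
assumption that returning an error code is acceptable. Unlike the
patch's mkdir_if_missing(), which calls die_errno()/die(), the
hypothetical mkdir_if_missing_sketch() returns -1 and leaves error
reporting to the caller.

```c
#define _XOPEN_SOURCE 700 /* exposes mkdir()/stat() under strict -std= modes */
#include <errno.h>
#include <sys/stat.h>
#include <sys/types.h>

/*
 * Hypothetical variant: returns 0 if pathname is a directory after the
 * call (newly created or already present), -1 otherwise.
 */
static int mkdir_if_missing_sketch(const char *pathname, mode_t mode)
{
	struct stat st;

	if (!mkdir(pathname, mode))
		return 0;		/* directory was created */
	if (errno != EEXIST)
		return -1;		/* e.g. missing parent, no permission */
	if (stat(pathname, &st))
		return -1;		/* exists, but cannot be inspected */
	if (!S_ISDIR(st.st_mode)) {
		errno = ENOTDIR;	/* exists, but is not a directory */
		return -1;
	}
	return 0;
}
```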

Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
---
 builtin/clone.c | 24 ++++++++++++++++--------
 1 file changed, 16 insertions(+), 8 deletions(-)

diff --git a/builtin/clone.c b/builtin/clone.c
index 81e1a39c61..f348eb02d4 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -392,6 +392,21 @@ static void copy_alternates(struct strbuf *src, struct strbuf *dst,
 	fclose(in);
 }
 
+static void mkdir_if_missing(const char *pathname, mode_t mode)
+{
+	struct stat st;
+
+	if (!mkdir(pathname, mode))
+		return;
+
+	if (errno != EEXIST)
+		die_errno(_("failed to create directory '%s'"), pathname);
+	else if (stat(pathname, &st))
+		die_errno(_("failed to stat '%s'"), pathname);
+	else if (!S_ISDIR(st.st_mode))
+		die(_("%s exists and is not a directory"), pathname);
+}
+
 static void copy_or_link_directory(struct strbuf *src, struct strbuf *dest,
 				   const char *src_repo, int src_baselen)
 {
@@ -404,14 +419,7 @@ static void copy_or_link_directory(struct strbuf *src, struct strbuf *dest,
 	if (!dir)
 		die_errno(_("failed to open '%s'"), src->buf);
 
-	if (mkdir(dest->buf, 0777)) {
-		if (errno != EEXIST)
-			die_errno(_("failed to create directory '%s'"), dest->buf);
-		else if (stat(dest->buf, &buf))
-			die_errno(_("failed to stat '%s'"), dest->buf);
-		else if (!S_ISDIR(buf.st_mode))
-			die(_("%s exists and is not a directory"), dest->buf);
-	}
+	mkdir_if_missing(dest->buf, 0777);
 
 	strbuf_addch(src, '/');
 	src_len = src->len;
-- 
2.20.1


^ permalink raw reply	[flat|nested] 127+ messages in thread

* [GSoC][PATCH v5 6/7] clone: use dir-iterator to avoid explicit dir traversal
  2019-03-30 22:49     ` [GSoC][PATCH v5 0/7] clone: dir-iterator refactoring with tests Matheus Tavares
                         ` (4 preceding siblings ...)
  2019-03-30 22:49       ` [GSoC][PATCH v5 5/7] clone: extract function from copy_or_link_directory Matheus Tavares
@ 2019-03-30 22:49       ` Matheus Tavares
  2019-03-30 22:49       ` [GSoC][PATCH v5 7/7] clone: replace strcmp by fspathcmp Matheus Tavares
                         ` (2 subsequent siblings)
  8 siblings, 0 replies; 127+ messages in thread
From: Matheus Tavares @ 2019-03-30 22:49 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git, Thomas Gummerer, Ævar Arnfjörð Bjarmason,
	Christian Couder, Nguyễn Thái Ngọc Duy,
	SZEDER Gábor, kernel-usp

In copy_or_link_directory, replace the recursive directory traversal
built on the opendir/readdir/closedir API with the dir-iterator API.
This simplifies the code and avoids recursive calls to
copy_or_link_directory.

This change also makes copy_or_link_directory call die() when readdir
or stat fails inside dir_iterator_advance. Previously it would just
print a warning for errors on stat and ignore errors on readdir, which
isn't nice because a local git clone could succeed even though the
.git/objects copy didn't fully succeed.
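
The caller-side pattern this relies on (loop while the iterator reports
"ok", then check that it stopped because it was done and not because of
an error) can be sketched without Git at all. Everything below is
hypothetical: struct fake_iter, the SKETCH_ITER_* codes and
process_all() only imitate the shape of the dir-iterator API.

```c
#include <stddef.h>

/* Status codes for this sketch only; they mirror the ITER_* idea. */
enum { SKETCH_ITER_OK, SKETCH_ITER_DONE, SKETCH_ITER_ERROR };

/* Hypothetical stand-in for a pedantic iterator over a fixed list. */
struct fake_iter {
	const char **paths;		/* entries to hand out */
	size_t nr;			/* number of entries */
	size_t pos;			/* next entry to return */
	size_t fail_at;			/* index that fails; >= nr means never */
	const char *relative_path;	/* current entry, valid after OK */
};

static int fake_iter_advance(struct fake_iter *it)
{
	if (it->pos >= it->nr)
		return SKETCH_ITER_DONE;
	if (it->pos == it->fail_at)
		return SKETCH_ITER_ERROR;
	it->relative_path = it->paths[it->pos++];
	return SKETCH_ITER_OK;
}

/* Returns the number of entries processed, or -1 if iteration failed. */
static int process_all(struct fake_iter *it)
{
	int status, processed = 0;

	while ((status = fake_iter_advance(it)) == SKETCH_ITER_OK)
		processed++;	/* a real caller would copy or link here */

	if (status != SKETCH_ITER_DONE)
		return -1;	/* the patch calls die() at this point */
	return processed;
}
```

The point of the final check is that a partially processed directory is
reported as a failure instead of silently passing as a complete copy.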

Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
---
 builtin/clone.c | 44 ++++++++++++++++++++++----------------------
 1 file changed, 22 insertions(+), 22 deletions(-)

diff --git a/builtin/clone.c b/builtin/clone.c
index f348eb02d4..ebe8d83334 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -23,6 +23,8 @@
 #include "transport.h"
 #include "strbuf.h"
 #include "dir.h"
+#include "dir-iterator.h"
+#include "iterator.h"
 #include "sigchain.h"
 #include "branch.h"
 #include "remote.h"
@@ -408,42 +410,36 @@ static void mkdir_if_missing(const char *pathname, mode_t mode)
 }
 
 static void copy_or_link_directory(struct strbuf *src, struct strbuf *dest,
-				   const char *src_repo, int src_baselen)
+				   const char *src_repo)
 {
-	struct dirent *de;
-	struct stat buf;
 	int src_len, dest_len;
-	DIR *dir;
-
-	dir = opendir(src->buf);
-	if (!dir)
-		die_errno(_("failed to open '%s'"), src->buf);
+	struct dir_iterator *iter;
+	int iter_status;
+	unsigned flags;
 
 	mkdir_if_missing(dest->buf, 0777);
 
+	flags = DIR_ITERATOR_PEDANTIC | DIR_ITERATOR_FOLLOW_SYMLINKS;
+	iter = dir_iterator_begin(src->buf, flags);
+
 	strbuf_addch(src, '/');
 	src_len = src->len;
 	strbuf_addch(dest, '/');
 	dest_len = dest->len;
 
-	while ((de = readdir(dir)) != NULL) {
+	while ((iter_status = dir_iterator_advance(iter)) == ITER_OK) {
 		strbuf_setlen(src, src_len);
-		strbuf_addstr(src, de->d_name);
+		strbuf_addstr(src, iter->relative_path);
 		strbuf_setlen(dest, dest_len);
-		strbuf_addstr(dest, de->d_name);
-		if (stat(src->buf, &buf)) {
-			warning (_("failed to stat %s\n"), src->buf);
-			continue;
-		}
-		if (S_ISDIR(buf.st_mode)) {
-			if (!is_dot_or_dotdot(de->d_name))
-				copy_or_link_directory(src, dest,
-						       src_repo, src_baselen);
+		strbuf_addstr(dest, iter->relative_path);
+
+		if (S_ISDIR(iter->st.st_mode)) {
+			mkdir_if_missing(dest->buf, 0777);
 			continue;
 		}
 
 		/* Files that cannot be copied bit-for-bit... */
-		if (!strcmp(src->buf + src_baselen, "/info/alternates")) {
+		if (!strcmp(iter->relative_path, "info/alternates")) {
 			copy_alternates(src, dest, src_repo);
 			continue;
 		}
@@ -463,7 +459,11 @@ static void copy_or_link_directory(struct strbuf *src, struct strbuf *dest,
 		if (copy_file_with_time(dest->buf, src->buf, 0666))
 			die_errno(_("failed to copy file to '%s'"), dest->buf);
 	}
-	closedir(dir);
+
+	if (iter_status != ITER_DONE) {
+		strbuf_setlen(src, src_len);
+		die(_("failed to iterate over '%s'"), src->buf);
+	}
 }
 
 static void clone_local(const char *src_repo, const char *dest_repo)
@@ -481,7 +481,7 @@ static void clone_local(const char *src_repo, const char *dest_repo)
 		get_common_dir(&dest, dest_repo);
 		strbuf_addstr(&src, "/objects");
 		strbuf_addstr(&dest, "/objects");
-		copy_or_link_directory(&src, &dest, src_repo, src.len);
+		copy_or_link_directory(&src, &dest, src_repo);
 		strbuf_release(&src);
 		strbuf_release(&dest);
 	}
-- 
2.20.1


^ permalink raw reply	[flat|nested] 127+ messages in thread

* [GSoC][PATCH v5 7/7] clone: replace strcmp by fspathcmp
  2019-03-30 22:49     ` [GSoC][PATCH v5 0/7] clone: dir-iterator refactoring with tests Matheus Tavares
                         ` (5 preceding siblings ...)
  2019-03-30 22:49       ` [GSoC][PATCH v5 6/7] clone: use dir-iterator to avoid explicit dir traversal Matheus Tavares
@ 2019-03-30 22:49       ` Matheus Tavares
  2019-03-31 18:16       ` [GSoC][PATCH v5 0/7] clone: dir-iterator refactoring with tests Thomas Gummerer
  2019-05-02 14:48       ` [GSoC][PATCH v6 00/10] " Matheus Tavares
  8 siblings, 0 replies; 127+ messages in thread
From: Matheus Tavares @ 2019-03-30 22:49 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git, Thomas Gummerer, Ævar Arnfjörð Bjarmason,
	Christian Couder, Nguyễn Thái Ngọc Duy,
	SZEDER Gábor, kernel-usp

Replace the use of strcmp by fspathcmp at copy_or_link_directory, which
is more permissive/friendly to case-insensitive file systems.
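
A minimal sketch of why this matters: on a case-insensitive file system
the same file may be reported as "INFO/ALTERNATES". The helper
path_cmp_sketch() below is hypothetical; it assumes the choice between
the two comparisons is driven by a case-insensitivity setting, which is
passed here as an explicit parameter instead of being read from
configuration.

```c
#define _XOPEN_SOURCE 700 /* exposes strcasecmp() under strict -std= modes */
#include <string.h>
#include <strings.h>

/*
 * Hypothetical helper: compare two paths, ignoring ASCII case when
 * ignore_case is non-zero. Returns 0 when the paths are considered equal.
 */
static int path_cmp_sketch(const char *a, const char *b, int ignore_case)
{
	return ignore_case ? strcasecmp(a, b) : strcmp(a, b);
}
```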

Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
Suggested-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 builtin/clone.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/builtin/clone.c b/builtin/clone.c
index ebe8d83334..bf56a01638 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -439,7 +439,7 @@ static void copy_or_link_directory(struct strbuf *src, struct strbuf *dest,
 		}
 
 		/* Files that cannot be copied bit-for-bit... */
-		if (!strcmp(iter->relative_path, "info/alternates")) {
+		if (!fspathcmp(iter->relative_path, "info/alternates")) {
 			copy_alternates(src, dest, src_repo);
 			continue;
 		}
-- 
2.20.1


^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [GSoC][PATCH v5 2/7] clone: better handle symlinked files at .git/objects/
  2019-03-30 22:49       ` [GSoC][PATCH v5 2/7] clone: better handle symlinked files at .git/objects/ Matheus Tavares
@ 2019-03-31 17:40         ` Thomas Gummerer
  2019-04-01  3:59           ` Matheus Tavares Bernardino
  0 siblings, 1 reply; 127+ messages in thread
From: Thomas Gummerer @ 2019-03-31 17:40 UTC (permalink / raw)
  To: Matheus Tavares
  Cc: Junio C Hamano, git, Ævar Arnfjörð Bjarmason,
	Christian Couder, Nguyễn Thái Ngọc Duy,
	SZEDER Gábor, kernel-usp

On 03/30, Matheus Tavares wrote:
> There is currently an odd behaviour when locally cloning a repository
> with symlinks at .git/objects: using --no-hardlinks all symlinks are
> dereferenced but without it, Git will try to hardlink the files with the
> link() function, which has an OS-specific behaviour on symlinks. On OSX
> and NetBSD, it creates a hardlink to the file pointed to by the symlink
> whilst on GNU/Linux, it creates a hardlink to the symlink itself.
> 
> On Manjaro GNU/Linux:
>     $ touch a
>     $ ln -s a b
>     $ link b c
>     $ ls -li a b c
>     155 [...] a
>     156 [...] b -> a
>     156 [...] c -> a
> 
> But on NetBSD:
>     $ ls -li a b c
>     2609160 [...] a
>     2609164 [...] b -> a
>     2609160 [...] c
> 
> It's not good to have the result of a local clone to be OS-dependent and
> besides that, the current behaviour on GNU/Linux may result in broken
> symlinks. So let's standardize this by making the hardlinks always point
> to dereferenced paths, instead of the symlinks themselves. Also, add
> tests for symlinked files at .git/objects/.
> 
> Note: Git won't create symlinks at .git/objects itself, but it's better
> to handle this case and be friendly with users who manually create them.
> 
> Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
> Co-authored-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
> ---
>  builtin/clone.c            |  5 ++++-
>  t/t5604-clone-reference.sh | 27 ++++++++++++++++++++-------
>  2 files changed, 24 insertions(+), 8 deletions(-)
> 
> diff --git a/builtin/clone.c b/builtin/clone.c
> index 50bde99618..f975b509f1 100644
> --- a/builtin/clone.c
> +++ b/builtin/clone.c
> @@ -443,7 +443,10 @@ static void copy_or_link_directory(struct strbuf *src, struct strbuf *dest,
>  		if (unlink(dest->buf) && errno != ENOENT)
>  			die_errno(_("failed to unlink '%s'"), dest->buf);
>  		if (!option_no_hardlinks) {
> -			if (!link(src->buf, dest->buf))
> +			char *resolved_path = real_pathdup(src->buf, 1);
> +			int status = link(resolved_path, dest->buf);
> +			free(resolved_path);
> +			if (!status)

Is there any reason why we can't use 'real_path()' here?  As I
mentioned in [*1*], 'real_path()' doesn't require the callers to free
any memory, so the above could become much simpler, and could just be

+			if (!link(real_path(src->buf), dest->buf))

*1*: <20190330192738.GQ32487@hank.intra.tgummerer.com>
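
The ownership difference between the two styles can be sketched as
follows. These are hypothetical analogues built on POSIX realpath(),
not Git's real_path()/real_pathdup(), which are based on
strbuf_realpath(): resolve_static() returns a pointer into a static
buffer that the caller must not free, while resolve_dup() returns
memory that the caller owns.

```c
#define _XOPEN_SOURCE 700 /* exposes realpath() under strict -std= modes */
#include <limits.h>
#include <stdlib.h>

/*
 * Hypothetical analogue of real_path(): the result lives in a static
 * buffer, is overwritten by the next call, and must not be free()d.
 */
static const char *resolve_static(const char *path)
{
	static char buf[PATH_MAX];
	return realpath(path, buf);
}

/*
 * Hypothetical analogue of real_pathdup(): the result is heap-allocated
 * and the caller is responsible for free()ing it.
 */
static char *resolve_dup(const char *path)
{
	return realpath(path, NULL);
}
```

The static variant allows a one-line call such as the link() suggestion
above, at the cost that the result is only valid until the next call.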

>  				continue;
>  			if (option_local > 0)
>  				die_errno(_("failed to create link '%s'"), dest->buf);

^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [GSoC][PATCH v5 3/7] dir-iterator: add flags parameter to dir_iterator_begin
  2019-03-30 22:49       ` [GSoC][PATCH v5 3/7] dir-iterator: add flags parameter to dir_iterator_begin Matheus Tavares
@ 2019-03-31 18:12         ` Thomas Gummerer
  2019-04-10 20:24           ` Matheus Tavares Bernardino
  0 siblings, 1 reply; 127+ messages in thread
From: Thomas Gummerer @ 2019-03-31 18:12 UTC (permalink / raw)
  To: Matheus Tavares
  Cc: Junio C Hamano, git, Ævar Arnfjörð Bjarmason,
	Christian Couder, Nguyễn Thái Ngọc Duy,
	SZEDER Gábor, kernel-usp, Michael Haggerty, Ramsay Jones

On 03/30, Matheus Tavares wrote:
> Add the possibility of giving flags to dir_iterator_begin to initialize
> a dir-iterator with special options.
> 
> Currently possible flags are DIR_ITERATOR_PEDANTIC, which makes
> dir_iterator_advance abort immediately in the case of an error while
> trying to fetch next entry; and DIR_ITERATOR_FOLLOW_SYMLINKS, which
> makes the iteration follow symlinks to directories and include its
> contents in the iteration. These new flags will be used in a subsequent
> patch.
> 
> Also adjust refs/files-backend.c to the new dir_iterator_begin
> signature.
> 
> Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
> ---
>  dir-iterator.c       | 28 +++++++++++++++++++++++++---
>  dir-iterator.h       | 39 +++++++++++++++++++++++++++++++++------
>  refs/files-backend.c |  2 +-
>  3 files changed, 59 insertions(+), 10 deletions(-)
> 
> diff --git a/dir-iterator.c b/dir-iterator.c
> index f2dcd82fde..17aca8ea41 100644
> --- a/dir-iterator.c
> +++ b/dir-iterator.c
> @@ -48,12 +48,16 @@ struct dir_iterator_int {
>  	 * that will be included in this iteration.
>  	 */
>  	struct dir_iterator_level *levels;
> +
> +	/* Combination of flags for this dir-iterator */
> +	unsigned flags;
>  };
>  
>  int dir_iterator_advance(struct dir_iterator *dir_iterator)
>  {
>  	struct dir_iterator_int *iter =
>  		(struct dir_iterator_int *)dir_iterator;
> +	int ret;

Minor nit: I'd define this variable closer to where it is actually
used, inside the second 'while(1)' loop in this function.  That would
make it clearer that it's only used there and not in other places in
the function as well, which I had first expected when I read this.

>  	while (1) {
>  		struct dir_iterator_level *level =
> @@ -71,6 +75,8 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
>  
>  			level->dir = opendir(iter->base.path.buf);
>  			if (!level->dir && errno != ENOENT) {
> +				if (iter->flags & DIR_ITERATOR_PEDANTIC)
> +					goto error_out;
>  				warning("error opening directory %s: %s",
>  					iter->base.path.buf, strerror(errno));
>  				/* Popping the level is handled below */
> @@ -122,6 +128,8 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
>  			if (!de) {
>  				/* This level is exhausted; pop up a level. */
>  				if (errno) {
> +					if (iter->flags & DIR_ITERATOR_PEDANTIC)
> +						goto error_out;
>  					warning("error reading directory %s: %s",
>  						iter->base.path.buf, strerror(errno));
>  				} else if (closedir(level->dir))
> @@ -138,11 +146,20 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
>  				continue;
>  
>  			strbuf_addstr(&iter->base.path, de->d_name);
> -			if (lstat(iter->base.path.buf, &iter->base.st) < 0) {
> -				if (errno != ENOENT)
> +
> +			if (iter->flags & DIR_ITERATOR_FOLLOW_SYMLINKS)
> +				ret = stat(iter->base.path.buf, &iter->base.st);
> +			else
> +				ret = lstat(iter->base.path.buf, &iter->base.st);
> +
> +			if (ret < 0) {
> +				if (errno != ENOENT) {
> +					if (iter->flags & DIR_ITERATOR_PEDANTIC)
> +						goto error_out;
>  					warning("error reading path '%s': %s",
>  						iter->base.path.buf,
>  						strerror(errno));
> +				}
>  				continue;
>  			}
>  
> @@ -159,6 +176,10 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
>  			return ITER_OK;
>  		}
>  	}
> +
> +error_out:
> +	dir_iterator_abort(dir_iterator);
> +	return ITER_ERROR;
>  }
>  
>  int dir_iterator_abort(struct dir_iterator *dir_iterator)
> @@ -182,7 +203,7 @@ int dir_iterator_abort(struct dir_iterator *dir_iterator)
>  	return ITER_DONE;
>  }
>  
> -struct dir_iterator *dir_iterator_begin(const char *path)
> +struct dir_iterator *dir_iterator_begin(const char *path, unsigned flags)
>  {
>  	struct dir_iterator_int *iter = xcalloc(1, sizeof(*iter));
>  	struct dir_iterator *dir_iterator = &iter->base;
> @@ -195,6 +216,7 @@ struct dir_iterator *dir_iterator_begin(const char *path)
>  
>  	ALLOC_GROW(iter->levels, 10, iter->levels_alloc);
>  
> +	iter->flags = flags;
>  	iter->levels_nr = 1;
>  	iter->levels[0].initialized = 0;
>  
> diff --git a/dir-iterator.h b/dir-iterator.h
> index 970793d07a..93646c3bea 100644
> --- a/dir-iterator.h
> +++ b/dir-iterator.h
> @@ -19,7 +19,7 @@
>   * A typical iteration looks like this:
>   *
>   *     int ok;
> - *     struct iterator *iter = dir_iterator_begin(path);
> + *     struct iterator *iter = dir_iterator_begin(path, 0);

Outside of this context, we already mention error handling when
'ok != ITER_DONE' in this example.  This still can't happen with the
way the dir iterator is used here, but it serves as a reminder if
people are using the DIR_ITERATOR_PEDANTIC flag.  Good.

>   *
>   *     while ((ok = dir_iterator_advance(iter)) == ITER_OK) {
>   *             if (want_to_stop_iteration()) {
> @@ -40,6 +40,20 @@
>   * dir_iterator_advance() again.
>   */
>  
> +/*
> + * Flags for dir_iterator_begin:
> + *
> + * - DIR_ITERATOR_PEDANTIC: override dir-iterator's default behavior
> + *   in case of an error while trying to fetch the next entry, which is
> + *   to emit a warning and keep going. With this flag, resources are
> + *   freed and ITER_ERROR is returned immediately.
> + *
> + * - DIR_ITERATOR_FOLLOW_SYMLINKS: make dir-iterator follow symlinks to
> + *   directories, i.e., iterate over linked directories' contents.
> + */
> +#define DIR_ITERATOR_PEDANTIC (1 << 0)
> +#define DIR_ITERATOR_FOLLOW_SYMLINKS (1 << 1)
> +
>  struct dir_iterator {
>  	/* The current path: */
>  	struct strbuf path;
> @@ -54,20 +68,28 @@ struct dir_iterator {
>  	/* The current basename: */
>  	const char *basename;
>  
> -	/* The result of calling lstat() on path: */
> +	/*
> +	 * The result of calling lstat() on path or stat(), if the
> +	 * DIR_ITERATOR_FOLLOW_SYMLINKS flag was set at
> +	 * dir_iterator's initialization.
> +	 */
>  	struct stat st;
>  };
>  
>  /*
> - * Start a directory iteration over path. Return a dir_iterator that
> - * holds the internal state of the iteration.
> + * Start a directory iteration over path with the combination of
> + * options specified by flags. Return a dir_iterator that holds the
> + * internal state of the iteration.
>   *
>   * The iteration includes all paths under path, not including path
>   * itself and not including "." or ".." entries.
>   *
> - * path is the starting directory. An internal copy will be made.
> + * Parameters are:
> + *  - path is the starting directory. An internal copy will be made.
> + *  - flags is a combination of the possible flags to initialize a
> + *    dir-iterator or 0 for default behaviour.
>   */
> -struct dir_iterator *dir_iterator_begin(const char *path);
> +struct dir_iterator *dir_iterator_begin(const char *path, unsigned flags);
>  
>  /*
>   * Advance the iterator to the first or next item and return ITER_OK.
> @@ -76,6 +98,11 @@ struct dir_iterator *dir_iterator_begin(const char *path);
>   * dir_iterator and associated resources and return ITER_ERROR. It is
>   * a bug to use iterator or call this function again after it has
>   * returned ITER_DONE or ITER_ERROR.
> + *
> + * Note that whether dir-iterator will return ITER_ERROR when failing
> + * to fetch the next entry or just emit a warning and try to fetch the
> + * next is defined by the 'pedantic' option at dir-iterator's
> + * initialization.

I feel like at this point we are repeating documentation that already
exists for the flags.  Should we ever find a reason to return
ITER_ERROR without the pedantic flag, this comment is likely to become
out of date.  I think not adding this note is probably better in this
case.

>   */
>  int dir_iterator_advance(struct dir_iterator *iterator);
>  
> diff --git a/refs/files-backend.c b/refs/files-backend.c
> index ef053f716c..2ce9783097 100644
> --- a/refs/files-backend.c
> +++ b/refs/files-backend.c
> @@ -2143,7 +2143,7 @@ static struct ref_iterator *reflog_iterator_begin(struct ref_store *ref_store,
>  
>  	base_ref_iterator_init(ref_iterator, &files_reflog_iterator_vtable, 0);
>  	strbuf_addf(&sb, "%s/logs", gitdir);
> -	iter->dir_iterator = dir_iterator_begin(sb.buf);
> +	iter->dir_iterator = dir_iterator_begin(sb.buf, 0);
>  	iter->ref_store = ref_store;
>  	strbuf_release(&sb);
>  
> -- 
> 2.20.1
> 

^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [GSoC][PATCH v5 0/7] clone: dir-iterator refactoring with tests
  2019-03-30 22:49     ` [GSoC][PATCH v5 0/7] clone: dir-iterator refactoring with tests Matheus Tavares
                         ` (6 preceding siblings ...)
  2019-03-30 22:49       ` [GSoC][PATCH v5 7/7] clone: replace strcmp by fspathcmp Matheus Tavares
@ 2019-03-31 18:16       ` Thomas Gummerer
  2019-04-01 13:56         ` Matheus Tavares Bernardino
  2019-05-02 14:48       ` [GSoC][PATCH v6 00/10] " Matheus Tavares
  8 siblings, 1 reply; 127+ messages in thread
From: Thomas Gummerer @ 2019-03-31 18:16 UTC (permalink / raw)
  To: Matheus Tavares
  Cc: Junio C Hamano, git, Ævar Arnfjörð Bjarmason,
	Christian Couder, Nguyễn Thái Ngọc Duy,
	SZEDER Gábor, kernel-usp

On 03/30, Matheus Tavares wrote:
> This patchset contains:
> - a replacement of explicit recursive dir iteration at
>   copy_or_link_directory for the dir-iterator API;
> - some refactoring and behaviour changes at local clone, mainly to
>   take care of symlinks and hidden files at .git/objects; and
> - tests for this type of files

Thanks.  I read through the series, and only found a few minor nits.

One note on the cover letter, as I'm not sure I mentioned this before.
But as the series progresses and there are fewer changes in individual
patches, it is useful to include a 'range-diff', so reviewers can
quickly see what changed in the series.  This is especially useful if
they can still remember the last iteration, so they don't necessarily
have to re-read the whole series.

This can be added using the '--range-diff' option in 'git
format-patch'.

> Changes since v4:
> - Improved and fixed errors at messages from patches 1, 3, 5, 6 and 7.
> - At first patch:
>   - Simplified construction, changing a multi-line cat for an echo.
>   - Removed unnecessary subshells.
>   - Disabled gc.auto, just to make sure we don't get any undesired
>     behaviour for this test
>   - Removed the first section of a sed command ("s!/..\$!/X!;")
>     that converts SHA-1s to fixed strings. No SHA-1 seemed to
>     be changed by this section and neither it seemed to be used
>     after the command.
> - At second patch, removed linkat() usage, which is POSIX.1-2008
>   and may not be supported on all platforms where git is built.
>   Now the same effect is achieved using real_pathdup() + link().
> 
> v4: https://public-inbox.org/git/20190322232237.13293-1-matheus.bernardino@usp.br/
> 
> Matheus Tavares (6):
>   clone: better handle symlinked files at .git/objects/
>   dir-iterator: add flags parameter to dir_iterator_begin
>   clone: copy hidden paths at local clone
>   clone: extract function from copy_or_link_directory
>   clone: use dir-iterator to avoid explicit dir traversal
>   clone: replace strcmp by fspathcmp
> 
> Ævar Arnfjörð Bjarmason (1):
>   clone: test for our behavior on odd objects/* content
> 
>  builtin/clone.c            |  75 ++++++++++++---------
>  dir-iterator.c             |  28 +++++++-
>  dir-iterator.h             |  39 +++++++++--
>  refs/files-backend.c       |   2 +-
>  t/t5604-clone-reference.sh | 133 +++++++++++++++++++++++++++++++++++++
>  5 files changed, 235 insertions(+), 42 deletions(-)
> 
> -- 
> 2.20.1
> 

^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [GSoC][PATCH v4 2/7] clone: better handle symlinked files at .git/objects/
  2019-03-30 19:27               ` Thomas Gummerer
@ 2019-04-01  3:56                 ` Matheus Tavares Bernardino
  0 siblings, 0 replies; 127+ messages in thread
From: Matheus Tavares Bernardino @ 2019-04-01  3:56 UTC (permalink / raw)
  To: Thomas Gummerer
  Cc: git, Ævar Arnfjörð Bjarmason, Christian Couder,
	Nguyễn Thái Ngọc Duy, Kernel USP, Benoit Pierre,
	Junio C Hamano, Johannes Schindelin

On Sat, Mar 30, 2019 at 4:27 PM Thomas Gummerer <t.gummerer@gmail.com> wrote:
>
> On 03/30, Matheus Tavares Bernardino wrote:
> > On Fri, Mar 29, 2019 at 5:05 PM Thomas Gummerer <t.gummerer@gmail.com> wrote:
> > >
> > > On 03/29, Matheus Tavares Bernardino wrote:
> > > > Ok, what if instead of using linkat() we use 'realpath(const char
> > > > *path, char *resolved_path)', which will resolve any symlinks at
> > > > 'path' and store the canonical path at 'resolved_path'? Then, we can
> > > > still keep using link() but now, with the certainty that all platforms
> > > > will have a consistent behaviour? (also, realpath() is POSIX.1-2001)
> > > > Would that be a better idea?
> > >
> > > Yeah, I think that is a good idea.  Note that 'realpath()' itself is
> > > not used anywhere in our codebase either, but there is
> > > 'strbuf_realpath()', that from reading the function documentation does
> > > exactly what 'realpath()' would do.  So using 'strbuf_realpath()'
> > > would probably be the right thing to do here.
> >
> > Thanks. While I was looking for realpath() at git codebase (before I
> > saw your email), I got a little confused: Besides strbuf_realpath() I
> > also found real_path(), real_path_if_valid() and real_pathdup(). All
> > these last three use strbuf_realpath() but they also initialize the
> > struct strbuf internally and just return a 'char *', which is much
> > more convenient in some cases.
>
> Right, feel free to use whichever is most convenient for you, and
> whichever works in the context.
>
> >                            What seems weird to me is that, whilst
> > > real_pathdup() releases the internally initialized struct strbuf
> > > (leaving just the returned string to be free'd by the user), the other
> > > two don't. So, if struct strbuf changes in the future to have more
> > > dynamically allocated resources, these functions will also have to be
> > modified. Also, since real_pathdup() can already do what the other two
> > do, do you know if there is a reason to keep all of them?
>
> Right, '*dup()' functions usually leave the return value to be free'd
> by the caller.  And while 'real_pathdup()' could do what the others do
> already it also takes more effort to use it.  Users don't need to free
> the return value from 'real_path()' to avoid a memory leak.  This
> alone justifies its existence I think.
>
> > One last question: I found some places which don't free the string
> > returned by, for example, real_path() (e.g., find_worktree() at
> > worktree.c). Would it be a valid/good patch (or patches) to add free()
> > > calls in these places? (I'm currently trying to get more people here at
> > USP to contribute to git, and maybe this could be a nice first
> > contribution for them...)
>
> Trying to plug memory leaks in the codebase is definitely something
> that I think is worthy of doing.  Sometimes it's not worth actually
> free'ing the memory, for example just before the program exits, in
> which case we can use the UNLEAK annotation.  It was introduced in
> 0e5bba53af ("add UNLEAK annotation for reducing leak false positives",
> 2017-09-08) if you want more background.
>
> That said, the memory from 'real_path()' should actually not be
> free'd.  The strbuf there has a static lifetime, so it is valid until
> git exits.  If we were to free the return value of the function we'd
> actually free an internal buffer of the strbuf, that is still valid.
> So if someone were to use 'real_path()' after that, the memory that
> strbuf still thinks it owns would actually have been free'd, which
> would result in undefined behaviour, and probably would make git
> segfault.
>

Thanks for the great explanation, Thomas. I hadn't noticed that the
strbuf variable inside real_path() is declared as static. I also took
some time, now, to better understand how strbuf functions deal with
the buf attribute (especially how it's realloc'ed) and now I think I
understand it better. Thanks again for the help!
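
To make the ownership point concrete for anyone following along, here
is a minimal standalone sketch. It is not git code: 'struct tiny_buf',
static_path() and path_dup() are hypothetical stand-ins for strbuf,
real_path() and real_pathdup(), and the "path" is just a string join.
It only illustrates why the caller must not free a pointer that a
static buffer still owns.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, much simplified stand-in for struct strbuf. */
struct tiny_buf {
	char *buf;
	size_t alloc;
};

/* Stand-in for real_path(): the result is owned by a static buffer. */
static const char *static_path(const char *dir, const char *name)
{
	static struct tiny_buf sb; /* zero-initialized, lives until exit */
	size_t need = strlen(dir) + strlen(name) + 2;

	if (need > sb.alloc) {
		char *grown = realloc(sb.buf, need);
		if (!grown)
			return NULL;
		sb.buf = grown;
		sb.alloc = need;
	}
	snprintf(sb.buf, sb.alloc, "%s/%s", dir, name);
	return sb.buf; /* free()ing this would leave sb.buf dangling */
}

/* Stand-in for real_pathdup(): the caller owns the returned copy. */
static char *path_dup(const char *dir, const char *name)
{
	const char *p = static_path(dir, name);
	char *copy = p ? malloc(strlen(p) + 1) : NULL;

	if (copy)
		strcpy(copy, p);
	return copy;
}

static void demo_ownership(void)
{
	const char *first = static_path("/repo", "objects");
	char *kept = path_dup("/repo", "objects");
	const char *second = static_path("/repo", "refs");

	assert(first == second);              /* same storage reused */
	assert(!strcmp(first, "/repo/refs")); /* 'first' was overwritten */
	assert(kept && !strcmp(kept, "/repo/objects"));
	free(kept); /* only the dup'ed string is free'd */
}
```

Calling demo_ownership() from a main() exercises both cases: the
static result is reused and overwritten by the next call, while the
duplicated string stays valid until the caller frees it.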

^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [GSoC][PATCH v5 2/7] clone: better handle symlinked files at .git/objects/
  2019-03-31 17:40         ` Thomas Gummerer
@ 2019-04-01  3:59           ` Matheus Tavares Bernardino
  0 siblings, 0 replies; 127+ messages in thread
From: Matheus Tavares Bernardino @ 2019-04-01  3:59 UTC (permalink / raw)
  To: Thomas Gummerer
  Cc: Junio C Hamano, git, Ævar Arnfjörð Bjarmason,
	Christian Couder, Nguyễn Thái Ngọc Duy,
	SZEDER Gábor, Kernel USP

On Sun, Mar 31, 2019 at 2:40 PM Thomas Gummerer <t.gummerer@gmail.com> wrote:
>
> On 03/30, Matheus Tavares wrote:
> > There is currently an odd behaviour when locally cloning a repository
> > with symlinks at .git/objects: using --no-hardlinks all symlinks are
> > dereferenced but without it, Git will try to hardlink the files with the
> > link() function, which has an OS-specific behaviour on symlinks. On OSX
> > and NetBSD, it creates a hardlink to the file pointed to by the symlink
> > whilst on GNU/Linux, it creates a hardlink to the symlink itself.
> >
> > On Manjaro GNU/Linux:
> >     $ touch a
> >     $ ln -s a b
> >     $ link b c
> >     $ ls -li a b c
> >     155 [...] a
> >     156 [...] b -> a
> >     156 [...] c -> a
> >
> > But on NetBSD:
> >     $ ls -li a b c
> >     2609160 [...] a
> >     2609164 [...] b -> a
> >     2609160 [...] c
> >
> > It's not good to have the result of a local clone to be OS-dependent and
> > besides that, the current behaviour on GNU/Linux may result in broken
> > symlinks. So let's standardize this by making the hardlinks always point
> > to dereferenced paths, instead of the symlinks themselves. Also, add
> > tests for symlinked files at .git/objects/.
> >
> > Note: Git won't create symlinks at .git/objects itself, but it's better
> > to handle this case and be friendly with users who manually create them.
> >
> > Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
> > Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
> > Co-authored-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
> > ---
> >  builtin/clone.c            |  5 ++++-
> >  t/t5604-clone-reference.sh | 27 ++++++++++++++++++++-------
> >  2 files changed, 24 insertions(+), 8 deletions(-)
> >
> > diff --git a/builtin/clone.c b/builtin/clone.c
> > index 50bde99618..f975b509f1 100644
> > --- a/builtin/clone.c
> > +++ b/builtin/clone.c
> > @@ -443,7 +443,10 @@ static void copy_or_link_directory(struct strbuf *src, struct strbuf *dest,
> >               if (unlink(dest->buf) && errno != ENOENT)
> >                       die_errno(_("failed to unlink '%s'"), dest->buf);
> >               if (!option_no_hardlinks) {
> > -                     if (!link(src->buf, dest->buf))
> > +                     char *resolved_path = real_pathdup(src->buf, 1);
> > +                     int status = link(resolved_path, dest->buf);
> > +                     free(resolved_path);
> > +                     if (!status)
>
> Is there any reason why we can't use 'real_path()' here?  As I
> mentioned in [*1*], 'real_path()' doesn't require the callers to free
> any memory, so the above could become much simpler, and could just be
>
> +                       if (!link(real_path(src->buf), dest->buf))
>

Yes, you are right. I will change this! I sent this v5 before
carefully reading your previous email and studying the strbuf functions
and real_path(). Now that I have done that, I see that real_path() is
the best option here. Thanks!
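
For readers outside the git tree, the idea in the hunk above (resolve
the source path first, then hardlink) can be sketched standalone. This
is only an illustration under assumptions: it uses POSIX realpath()
with a NULL buffer (POSIX.1-2008) instead of git's real_path(), and
link_dereferenced() and demo_link_dereferenced() are hypothetical
names, not functions from builtin/clone.c.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper (not git code): hard-link 'dest' to the file that
 * 'src' ultimately names, resolving any symlinks first.
 */
static int link_dereferenced(const char *src, const char *dest)
{
	char *resolved = realpath(src, NULL); /* malloc'ed by realpath() */
	int status;

	if (!resolved)
		return -1;
	status = link(resolved, dest);
	free(resolved); /* caller-owned, like real_pathdup()'s result */
	return status;
}

/* Returns 1 if the new link shares the inode of the symlink's target. */
static int demo_link_dereferenced(void)
{
	char dir[] = "/tmp/link-sketch-XXXXXX";
	char a[64], b[64], c[64];
	struct stat st_a, st_c;
	int fd, ok = 0;

	if (!mkdtemp(dir))
		return 0;
	snprintf(a, sizeof(a), "%s/a", dir);
	snprintf(b, sizeof(b), "%s/b", dir);
	snprintf(c, sizeof(c), "%s/c", dir);

	fd = open(a, O_CREAT | O_WRONLY, 0600); /* like 'touch a' */
	if (fd >= 0)
		close(fd);
	/* 'ln -s a b', then link through the resolved path */
	if (fd >= 0 && !symlink("a", b) && !link_dereferenced(b, c) &&
	    !lstat(a, &st_a) && !lstat(c, &st_c))
		ok = S_ISREG(st_c.st_mode) && st_a.st_ino == st_c.st_ino;

	unlink(c);
	unlink(b);
	unlink(a);
	rmdir(dir);
	return ok;
}
```

With the path resolved first, 'c' ends up as a regular file sharing
the inode of 'a', which is the NetBSD result from the commit message,
regardless of how the platform's link() treats symlinks.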

> *1*: <20190330192738.GQ32487@hank.intra.tgummerer.com>
>
> >                               continue;
> >                       if (option_local > 0)
> >                               die_errno(_("failed to create link '%s'"), dest->buf);

^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [GSoC][PATCH v5 0/7] clone: dir-iterator refactoring with tests
  2019-03-31 18:16       ` [GSoC][PATCH v5 0/7] clone: dir-iterator refactoring with tests Thomas Gummerer
@ 2019-04-01 13:56         ` Matheus Tavares Bernardino
  0 siblings, 0 replies; 127+ messages in thread
From: Matheus Tavares Bernardino @ 2019-04-01 13:56 UTC (permalink / raw)
  To: Thomas Gummerer
  Cc: Junio C Hamano, git, Ævar Arnfjörð Bjarmason,
	Christian Couder, Nguyễn Thái Ngọc Duy,
	SZEDER Gábor, Kernel USP

On Sun, Mar 31, 2019 at 3:16 PM Thomas Gummerer <t.gummerer@gmail.com> wrote:
>
> On 03/30, Matheus Tavares wrote:
> > This patchset contains:
> > - a replacement of explicit recursive dir iteration at
> >   copy_or_link_directory for the dir-iterator API;
> > - some refactoring and behaviour changes at local clone, mainly to
> >   take care of symlinks and hidden files at .git/objects; and
> > - tests for these types of files
>
> Thanks.  I read through the series, and only found a few minor nits.
>
> One note on the cover letter, as I'm not sure I mentioned this before.
> But as the series progresses and there are fewer changes in individual
> patches, it is useful to include a 'range-diff', so reviewers can
> quickly see what changed in the series.  This is especially useful if
> they can still remember the last iteration, so they don't necessarily
> have to re-read the whole series.
>
> This can be added using the '--range-diff' option in 'git
> format-patch'.

Thanks! I think you've mentioned it earlier, but I forgot to use it. I
will include it in v6! Thanks for reminding me about it.

> > Changes since v4:
> > - Improved and fixed errors at messages from patches 1, 3, 5, 6 and 7.
> > - At first patch:
> >   - Simplified construction, changing a multi-line cat for an echo.
> >   - Removed unnecessary subshells.
> >   - Disabled gc.auto, just to make sure we don't get any undesired
> >     behaviour for this test
> >   - Removed the first section of a sed command ("s!/..\$!/X!;")
> >     that converts SHA-1s to fixed strings. No SHA-1 seemed to
> >     be changed by this section and neither it seemed to be used
> >     after the command.
> > - At second patch, removed linkat() usage, which is POSIX.1-2008
> >   and may not be supported on all platforms where git is built.
> >   Now the same effect is achieved using real_pathdup() + link().
> >
> > v4: https://public-inbox.org/git/20190322232237.13293-1-matheus.bernardino@usp.br/
> >
> > Matheus Tavares (6):
> >   clone: better handle symlinked files at .git/objects/
> >   dir-iterator: add flags parameter to dir_iterator_begin
> >   clone: copy hidden paths at local clone
> >   clone: extract function from copy_or_link_directory
> >   clone: use dir-iterator to avoid explicit dir traversal
> >   clone: replace strcmp by fspathcmp
> >
> > Ævar Arnfjörð Bjarmason (1):
> >   clone: test for our behavior on odd objects/* content
> >
> >  builtin/clone.c            |  75 ++++++++++++---------
> >  dir-iterator.c             |  28 +++++++-
> >  dir-iterator.h             |  39 +++++++++--
> >  refs/files-backend.c       |   2 +-
> >  t/t5604-clone-reference.sh | 133 +++++++++++++++++++++++++++++++++++++
> >  5 files changed, 235 insertions(+), 42 deletions(-)
> >
> > --
> > 2.20.1
> >

^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [GSoC][PATCH v5 3/7] dir-iterator: add flags parameter to dir_iterator_begin
  2019-03-31 18:12         ` Thomas Gummerer
@ 2019-04-10 20:24           ` Matheus Tavares Bernardino
  2019-04-11 21:09             ` Thomas Gummerer
  0 siblings, 1 reply; 127+ messages in thread
From: Matheus Tavares Bernardino @ 2019-04-10 20:24 UTC (permalink / raw)
  To: Thomas Gummerer
  Cc: Junio C Hamano, git, Ævar Arnfjörð Bjarmason,
	Christian Couder, Nguyễn Thái Ngọc Duy,
	SZEDER Gábor, Kernel USP, Michael Haggerty, Ramsay Jones

Hi, Thomas

Sorry for the late reply, but now that I submitted my GSoC proposal I
can finally come back to this series.

On Sun, Mar 31, 2019 at 3:12 PM Thomas Gummerer <t.gummerer@gmail.com> wrote:
>
> On 03/30, Matheus Tavares wrote:
> > Add the possibility of giving flags to dir_iterator_begin to initialize
> > a dir-iterator with special options.
> >
> > Currently possible flags are DIR_ITERATOR_PEDANTIC, which makes
> > dir_iterator_advance abort immediately in the case of an error while
> > trying to fetch next entry; and DIR_ITERATOR_FOLLOW_SYMLINKS, which
> > makes the iteration follow symlinks to directories and include its
> > contents in the iteration. These new flags will be used in a subsequent
> > patch.
> >
> > Also adjust refs/files-backend.c to the new dir_iterator_begin
> > signature.
> >
> > Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
> > ---
> >  dir-iterator.c       | 28 +++++++++++++++++++++++++---
> >  dir-iterator.h       | 39 +++++++++++++++++++++++++++++++++------
> >  refs/files-backend.c |  2 +-
> >  3 files changed, 59 insertions(+), 10 deletions(-)
> >
> > diff --git a/dir-iterator.c b/dir-iterator.c
> > index f2dcd82fde..17aca8ea41 100644
> > --- a/dir-iterator.c
> > +++ b/dir-iterator.c
> > @@ -48,12 +48,16 @@ struct dir_iterator_int {
> >        * that will be included in this iteration.
> >        */
> >       struct dir_iterator_level *levels;
> > +
> > +     /* Combination of flags for this dir-iterator */
> > +     unsigned flags;
> >  };
> >
> >  int dir_iterator_advance(struct dir_iterator *dir_iterator)
> >  {
> >       struct dir_iterator_int *iter =
> >               (struct dir_iterator_int *)dir_iterator;
> > +     int ret;
>
> Minor nit: I'd define this variable closer to where it is actually
> used, inside the second 'while(1)' loop in this function.  That would
> make it clearer that it's only used there and not in other places in
> the function as well, which I had first expected when I read this.

Right, thanks.

> > diff --git a/dir-iterator.h b/dir-iterator.h
> > index 970793d07a..93646c3bea 100644
> > --- a/dir-iterator.h
> > +++ b/dir-iterator.h
> > @@ -19,7 +19,7 @@
> >   * A typical iteration looks like this:
> >   *
> >   *     int ok;
> > - *     struct iterator *iter = dir_iterator_begin(path);
> > + *     struct iterator *iter = dir_iterator_begin(path, 0);
>
> Outside of this context, we already mention error handling when
> 'ok != ITER_DONE' in this example.  This still can't happen with the
> way the dir iterator is used here, but it serves as a reminder if
> people are using the DIR_ITERATOR_PEDANTIC flag.  Good.

This made me think again about the documentation saying that
dir_iterator_abort() and dir_iterator_advance() may return ITER_ERROR,
but the implementation does not contain these possibilities.
(Besides when the pedantic flag is used). Maybe the idea was to make
API-users implement the check for an ITER_ERROR in case dir-iterator
needs to start returning it in the future.

But do you think such a change in dir-iterator is likely to happen?
Maybe we could just make dir_iterator_abort() be void and remove this
section from documentation. Then, for dir_iterator_advance() users
would only need to check for ITER_ERROR if the pedantic flag was given
at dir-iterator creation...

Also CC-ed Michael in case he has some input

^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [GSoC][PATCH v5 3/7] dir-iterator: add flags parameter to dir_iterator_begin
  2019-04-10 20:24           ` Matheus Tavares Bernardino
@ 2019-04-11 21:09             ` Thomas Gummerer
  2019-04-23 17:07               ` Matheus Tavares Bernardino
  0 siblings, 1 reply; 127+ messages in thread
From: Thomas Gummerer @ 2019-04-11 21:09 UTC (permalink / raw)
  To: Matheus Tavares Bernardino
  Cc: Junio C Hamano, git, Ævar Arnfjörð Bjarmason,
	Christian Couder, Nguyễn Thái Ngọc Duy,
	SZEDER Gábor, Kernel USP, Michael Haggerty, Ramsay Jones

On 04/10, Matheus Tavares Bernardino wrote:
> > > diff --git a/dir-iterator.h b/dir-iterator.h
> > > index 970793d07a..93646c3bea 100644
> > > --- a/dir-iterator.h
> > > +++ b/dir-iterator.h
> > > @@ -19,7 +19,7 @@
> > >   * A typical iteration looks like this:
> > >   *
> > >   *     int ok;
> > > - *     struct iterator *iter = dir_iterator_begin(path);
> > > + *     struct iterator *iter = dir_iterator_begin(path, 0);
> >
> > > Outside of this context, we already mention error handling when
> > > 'ok != ITER_DONE' in this example.  This still can't happen with the
> > way the dir iterator is used here, but it serves as a reminder if
> > people are using the DIR_ITERATOR_PEDANTIC flag.  Good.
> 
> This made me think again about the documentation saying that
> dir_iterator_abort() and dir_iterator_advance() may return ITER_ERROR,
> but the implementation does not contain these possibilities.
> (Besides when the pedantic flag is used). Maybe the idea was to make
> API-users implement the check for an ITER_ERROR in case dir-iterator
> needs to start returning it in the future.

Yeah, I think that was the intention.

> But do you think such a change in dir-iterator is likely to happen?
> Maybe we could just make dir_iterator_abort() be void and remove this
> section from documentation. Then, for dir_iterator_advance() users
> would only need to check for ITER_ERROR if the pedantic flag was given
> at dir-iterator creation...

Dunno.  In a world where we have the pedantic flag, I think only
returning ITER_ERROR if that flag is given might be what we want to
do.  I can't think of a reason why we would want to return ITER_ERROR
without the pedantic flag in that case.

Though I think I would change the example the other way in that case,
and pass DIR_ITERATOR_PEDANTIC to 'dir_iterator_begin()', as it would
be easy to forget error handling otherwise, even when it is
necessary.  I'd rather err on the side of showing too much error
handling, than having people forget it and having users run into some
odd edge cases in the wild that the tests don't cover.

> Also CC-ed Michael in case he has some input

^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [GSoC][PATCH v5 3/7] dir-iterator: add flags parameter to dir_iterator_begin
  2019-04-11 21:09             ` Thomas Gummerer
@ 2019-04-23 17:07               ` Matheus Tavares Bernardino
  2019-04-24 18:36                 ` Thomas Gummerer
  0 siblings, 1 reply; 127+ messages in thread
From: Matheus Tavares Bernardino @ 2019-04-23 17:07 UTC (permalink / raw)
  To: Thomas Gummerer
  Cc: Junio C Hamano, git, Ævar Arnfjörð Bjarmason,
	Christian Couder, Nguyễn Thái Ngọc Duy,
	SZEDER Gábor, Kernel USP, Michael Haggerty, Ramsay Jones

On Thu, Apr 11, 2019 at 6:09 PM Thomas Gummerer <t.gummerer@gmail.com> wrote:
>
> On 04/10, Matheus Tavares Bernardino wrote:
> > > > diff --git a/dir-iterator.h b/dir-iterator.h
> > > > index 970793d07a..93646c3bea 100644
> > > > --- a/dir-iterator.h
> > > > +++ b/dir-iterator.h
> > > > @@ -19,7 +19,7 @@
> > > >   * A typical iteration looks like this:
> > > >   *
> > > >   *     int ok;
> > > > - *     struct iterator *iter = dir_iterator_begin(path);
> > > > + *     struct iterator *iter = dir_iterator_begin(path, 0);
> > >
> > > > Outside of this context, we already mention error handling when
> > > > 'ok != ITER_DONE' in this example.  This still can't happen with the
> > > way the dir iterator is used here, but it serves as a reminder if
> > > people are using the DIR_ITERATOR_PEDANTIC flag.  Good.
> >
> > This made me think again about the documentation saying that
> > dir_iterator_abort() and dir_iterator_advance() may return ITER_ERROR,
> > > but the implementation does not contain these possibilities.
> > (Besides when the pedantic flag is used). Maybe the idea was to make
> > API-users implement the check for an ITER_ERROR in case dir-iterator
> > needs to start returning it in the future.
>
> Yeah, I think that was the intention.
>
> > But do you think such a change in dir-iterator is likely to happen?
> > Maybe we could just make dir_iterator_abort() be void and remove this
> > section from documentation. Then, for dir_iterator_advance() users
> > would only need to check for ITER_ERROR if the pedantic flag was given
> > at dir-iterator creation...
>
> Dunno.  In a world where we have the pedantic flag, I think only
> returning ITER_ERROR if that flag is given might be what we want to
> do.  I can't think of a reason why we would want to return ITER_ERROR
> without the pedantic flag in that case.

Ok. I began doing the change, but got stuck in a specific decision.
What I was trying to do is:

1) Make dir_iterator_advance() return ITER_ERROR only when the
pedantic flag is given;
2) Make dir_iterator_abort() be void.

The first change is trivial. But the second is not so easy: Since the
[only] current API user defines other iterators on top of
dir-iterator, it would require a somewhat big surgery on refs/* to make
this change. Should I proceed and make the changes at refs/* or should
I keep dir_iterator_abort() returning int, although it can never fail?

There's also a third option: The only operation that may fail during
dir_iterator_abort() is closedir(). But even on
dir_iterator_advance(), I'm treating this error as "non-fatal" in the
sense that it's not caught by the pedantic flag (although a warning is
emitted). I did it like this because it doesn't seem like a major
error during dir iteration... But I could change this and make
DIR_ITERATOR_PEDANTIC return ITER_ERROR upon closedir() errors for
both dir-iterator advance() and abort() functions. What do you think?

> Though I think I would change the example the other way in that case,
> and pass DIR_ITERATOR_PEDANTIC to 'dir_iterator_begin()', as it would
> be easy to forget error handling otherwise, even when it is
> necessary.  I'd rather err on the side of showing too much error
> handling, than having people forget it and having users run into some
> odd edge cases in the wild that the tests don't cover.

Yes, I agree.

> > Also CC-ed Michael in case he has some input

^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [GSoC][PATCH v5 3/7] dir-iterator: add flags parameter to dir_iterator_begin
  2019-04-23 17:07               ` Matheus Tavares Bernardino
@ 2019-04-24 18:36                 ` Thomas Gummerer
  2019-04-26  4:13                   ` Matheus Tavares Bernardino
  0 siblings, 1 reply; 127+ messages in thread
From: Thomas Gummerer @ 2019-04-24 18:36 UTC (permalink / raw)
  To: Matheus Tavares Bernardino
  Cc: Junio C Hamano, git, Ævar Arnfjörð Bjarmason,
	Christian Couder, Nguyễn Thái Ngọc Duy,
	SZEDER Gábor, Kernel USP, Michael Haggerty, Ramsay Jones

On 04/23, Matheus Tavares Bernardino wrote:
> On Thu, Apr 11, 2019 at 6:09 PM Thomas Gummerer <t.gummerer@gmail.com> wrote:
> >
> > On 04/10, Matheus Tavares Bernardino wrote:
> > > > > diff --git a/dir-iterator.h b/dir-iterator.h
> > > > > index 970793d07a..93646c3bea 100644
> > > > > --- a/dir-iterator.h
> > > > > +++ b/dir-iterator.h
> > > > > @@ -19,7 +19,7 @@
> > > > >   * A typical iteration looks like this:
> > > > >   *
> > > > >   *     int ok;
> > > > > - *     struct iterator *iter = dir_iterator_begin(path);
> > > > > + *     struct iterator *iter = dir_iterator_begin(path, 0);
> > > >
> > > > > Outside of this context, we already mention error handling when
> > > > > 'ok != ITER_DONE' in this example.  This still can't happen with the
> > > > way the dir iterator is used here, but it serves as a reminder if
> > > > people are using the DIR_ITERATOR_PEDANTIC flag.  Good.
> > >
> > > This made me think again about the documentation saying that
> > > dir_iterator_abort() and dir_iterator_advance() may return ITER_ERROR,
> > > > but the implementation does not contain these possibilities.
> > > (Besides when the pedantic flag is used). Maybe the idea was to make
> > > API-users implement the check for an ITER_ERROR in case dir-iterator
> > > needs to start returning it in the future.
> >
> > Yeah, I think that was the intention.
> >
> > > But do you think such a change in dir-iterator is likely to happen?
> > > Maybe we could just make dir_iterator_abort() be void and remove this
> > > section from documentation. Then, for dir_iterator_advance() users
> > > would only need to check for ITER_ERROR if the pedantic flag was given
> > > at dir-iterator creation...
> >
> > Dunno.  In a world where we have the pedantic flag, I think only
> > returning ITER_ERROR if that flag is given might be what we want to
> > do.  I can't think of a reason why we would want to return ITER_ERROR
> > without the pedantic flag in that case.
> 
> Ok. I began doing the change, but got stuck in a specific decision.
> What I was trying to do is:
> 
> 1) Make dir_iterator_advance() return ITER_ERROR only when the
> pedantic flag is given;
> 2) Make dir_iterator_abort() be void.
> 
> The first change is trivial. But the second is not so easy: Since the
> [only] current API user defines other iterators on top of
> > dir-iterator, it would require a somewhat big surgery on refs/* to make
> this change. Should I proceed and make the changes at refs/* or should
> I keep dir_iterator_abort() returning int, although it can never fail?

Maybe I'm missing something, but wouldn't this change in refs.c be
enough? (Other than actually making dir_iterator_abort not return
anything)

	diff --git a/refs/files-backend.c b/refs/files-backend.c
	index 5848f32ef8..81863c3ee0 100644
	--- a/refs/files-backend.c
	+++ b/refs/files-backend.c
	@@ -2125,13 +2125,12 @@ static int files_reflog_iterator_abort(struct ref_iterator *ref_iterator)
	 {
	        struct files_reflog_iterator *iter =
	                (struct files_reflog_iterator *)ref_iterator;
	-       int ok = ITER_DONE;
	 
	        if (iter->dir_iterator)
	-               ok = dir_iterator_abort(iter->dir_iterator);
	+               dir_iterator_abort(iter->dir_iterator);
	 
	        base_ref_iterator_free(ref_iterator);
	-       return ok;
	+       return ITER_DONE;
	 }
	 
	 static struct ref_iterator_vtable files_reflog_iterator_vtable = {

Currently the only thing calling dir_iterator_abort() is
files_reflog_iterator_abort() from what I can see, and
dir_iterator_abort() always returns ITER_DONE.

That said, I don't know if this is actually worth pursuing.  Having it
return some value and having the caller check that makes it more
future proof, as we won't have to change all the callers in the future
if we want to start returning anything other than ITER_DONE.   Just
leaving it as it is now doesn't actually hurt anybody I think, but may
help in the future.

> There's also a third option: The only operation that may fail during
> dir_iterator_abort() is closedir(). But even on
> dir_iterator_advance(), I'm treating this error as "non-fatal" in the
> sense that it's not caught by the pedantic flag (although a warning is
> emitted). I did it like this because it doesn't seem like a major
> error during dir iteration... But I could change this and make
> DIR_ITERATOR_PEDANTIC return ITER_ERROR upon closedir() errors for
> both dir-iterator advance() and abort() functions. What do you think?

I think this might be the right way to go.  We don't really need an
error from closedir, but at the same time if we are being pedantic,
maybe it should be an error.  I don't have a strong opinion here
either way, other than I think it should probably keep returning an
int.

> > Though I think I would change the example the other way in that case,
> > and pass DIR_ITERATOR_PEDANTIC to 'dir_iterator_begin()', as it would
> > be easy to forget error handling otherwise, even when it is
> > necessary.  I'd rather err on the side of showing too much error
> > handling, than having people forget it and having users run into some
> > odd edge cases in the wild that the tests don't cover.
> 
> Yes, I agree.
> 
> > > Also CC-ed Michael in case he has some input

^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [GSoC][PATCH v5 3/7] dir-iterator: add flags parameter to dir_iterator_begin
  2019-04-24 18:36                 ` Thomas Gummerer
@ 2019-04-26  4:13                   ` Matheus Tavares Bernardino
  0 siblings, 0 replies; 127+ messages in thread
From: Matheus Tavares Bernardino @ 2019-04-26  4:13 UTC (permalink / raw)
  To: Thomas Gummerer
  Cc: Junio C Hamano, git, Ævar Arnfjörð Bjarmason,
	Christian Couder, Nguyễn Thái Ngọc Duy,
	SZEDER Gábor, Kernel USP, Michael Haggerty, Ramsay Jones

On Wed, Apr 24, 2019 at 3:36 PM Thomas Gummerer <t.gummerer@gmail.com> wrote:
>
> On 04/23, Matheus Tavares Bernardino wrote:
> > On Thu, Apr 11, 2019 at 6:09 PM Thomas Gummerer <t.gummerer@gmail.com> wrote:
> > >
> > > On 04/10, Matheus Tavares Bernardino wrote:
> > > > > > diff --git a/dir-iterator.h b/dir-iterator.h
> > > > > > index 970793d07a..93646c3bea 100644
> > > > > > --- a/dir-iterator.h
> > > > > > +++ b/dir-iterator.h
> > > > > > @@ -19,7 +19,7 @@
> > > > > >   * A typical iteration looks like this:
> > > > > >   *
> > > > > >   *     int ok;
> > > > > > - *     struct iterator *iter = dir_iterator_begin(path);
> > > > > > + *     struct iterator *iter = dir_iterator_begin(path, 0);
> > > > >
> > > > > > Outside of this context, we already mention error handling when
> > > > > > 'ok != ITER_DONE' in this example.  This still can't happen with the
> > > > > way the dir iterator is used here, but it serves as a reminder if
> > > > > people are using the DIR_ITERATOR_PEDANTIC flag.  Good.
> > > >
> > > > This made me think again about the documentation saying that
> > > > dir_iterator_abort() and dir_iterator_advance() may return ITER_ERROR,
> > > > > but the implementation does not contain these possibilities.
> > > > (Besides when the pedantic flag is used). Maybe the idea was to make
> > > > API-users implement the check for an ITER_ERROR in case dir-iterator
> > > > needs to start returning it in the future.
> > >
> > > Yeah, I think that was the intention.
> > >
> > > > But do you think such a change in dir-iterator is likely to happen?
> > > > Maybe we could just make dir_iterator_abort() be void and remove this
> > > > section from documentation. Then, for dir_iterator_advance() users
> > > > would only need to check for ITER_ERROR if the pedantic flag was given
> > > > at dir-iterator creation...
> > >
> > > Dunno.  In a world where we have the pedantic flag, I think only
> > > returning ITER_ERROR if that flag is given might be what we want to
> > > do.  I can't think of a reason why we would want to return ITER_ERROR
> > > without the pedantic flag in that case.
> >
> > Ok. I began doing the change, but got stuck in a specific decision.
> > What I was trying to do is:
> >
> > 1) Make dir_iterator_advance() return ITER_ERROR only when the
> > pedantic flag is given;
> > 2) Make dir_iterator_abort() be void.
> >
> > The first change is trivial. But the second is not so easy: Since the
> > [only] current API user defines other iterators on top of
> > > dir-iterator, it would require a somewhat big surgery on refs/* to make
> > this change. Should I proceed and make the changes at refs/* or should
> > I keep dir_iterator_abort() returning int, although it can never fail?
>
> Maybe I'm missing something, but wouldn't this change in refs.c be
> enough? (Other than actually making dir_iterator_abort not return
> anything)
>
>         diff --git a/refs/files-backend.c b/refs/files-backend.c
>         index 5848f32ef8..81863c3ee0 100644
>         --- a/refs/files-backend.c
>         +++ b/refs/files-backend.c
>         @@ -2125,13 +2125,12 @@ static int files_reflog_iterator_abort(struct ref_iterator *ref_iterator)
>          {
>                 struct files_reflog_iterator *iter =
>                         (struct files_reflog_iterator *)ref_iterator;
>         -       int ok = ITER_DONE;
>
>                 if (iter->dir_iterator)
>         -               ok = dir_iterator_abort(iter->dir_iterator);
>         +               dir_iterator_abort(iter->dir_iterator);
>
>                 base_ref_iterator_free(ref_iterator);
>         -       return ok;
>         +       return ITER_DONE;
>          }
>
>          static struct ref_iterator_vtable files_reflog_iterator_vtable = {

Yes, indeed. But I thought that since the reason for making
dir_iterator_abort() be void is that it always returns ITER_DONE, the
same change should be applied to files_reflog_iterator_abort() as it
would fall into the same case. And this, in turn, would require
changes to ref_iterator_abort() and many other functions at
refs/iterator.c and refs/files-backend.c

> Currently the only thing calling dir_iterator_abort() is
> files_reflog_iterator_abort() from what I can see, and
> dir_iterator_abort() always returns ITER_DONE.
>
> That said, I don't know if this is actually worth pursuing.  Having it
> return some value and having the caller check that makes it more
> future proof, as we won't have to change all the callers in the future
> if we want to start returning anything other than ITER_DONE.   Just
> leaving it as it is now doesn't actually hurt anybody I think, but may
> help in the future.

Ok, I understand.

> > There's also a third option: The only operation that may fail during
> > dir_iterator_abort() is closedir(). But even on
> > dir_iterator_advance(), I'm treating this error as "non-fatal" in the
> > sense that it's not caught by the pedantic flag (although a warning is
> > emitted). I did it like this because it doesn't seem like a major
> > error during dir iteration... But I could change this and make
> > DIR_ITERATOR_PEDANTIC return ITER_ERROR upon closedir() errors for
> > both dir-iterator advance() and abort() functions. What do you think?
>
> I think this might be the right way to go.  We don't really need an
> error from closedir, but at the same time if we are being pedantic,
> maybe it should be an error.  I don't have a strong opinion here
> either way, other than I think it should probably keep returning an
> int.

I know I suggested this option, but searching the code base I saw no
other place that checks closedir()'s return besides dir-iterator. So
maybe the best option would be to keep dir_iterator_abort() always
returning ITER_DONE, even upon closedir() errors. Then, I can document
that the pedantic flag only affects dir_iterator_advance() behavior
(and that closedir() errors aren't considered there either).

I got stuck on this for a while, but this option finally seems good to me now...
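
To make that option concrete, here is a minimal standalone sketch. It is
not the actual dir-iterator code: sketch_abort() and SKETCH_ITER_DONE are
hypothetical stand-ins, and the numeric value is an assumption. It only
illustrates an abort that warns on closedir() errors but always reports
"done":

```c
#include <dirent.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for ITER_DONE; the value is an assumption. */
enum { SKETCH_ITER_DONE = -1 };

/*
 * Close every open handle on the stack, from the top down. A failing
 * closedir() only produces a warning; the result is always "done".
 */
int sketch_abort(DIR **dirs, size_t nr)
{
	while (nr > 0) {
		DIR *dir = dirs[--nr];

		if (dir && closedir(dir))
			fprintf(stderr, "warning: error closing directory: %s\n",
				strerror(errno));
		dirs[nr] = NULL;
	}
	return SKETCH_ITER_DONE;
}
```

With this shape, callers never need to branch on the abort result, which
is why a pedantic flag would only matter for the advance step.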

> > > Though I think I would change the example the other way in that case,
> > > and pass DIR_ITERATOR_PEDANTIC to 'dir_iterator_begin()', as it would
> > > be easy to forget error handling otherwise, even when it is
> > > necessary.  I'd rather err on the side of showing too much error
> > > handling, than having people forget it and having users run into some
> > > odd edge cases in the wild that the tests don't cover.
> >
> > Yes, I agree.
> >
> > > > Also CC-ed Michael in case he has some input


* [GSoC][PATCH v6 00/10] clone: dir-iterator refactoring with tests
  2019-03-30 22:49     ` [GSoC][PATCH v5 0/7] clone: dir-iterator refactoring with tests Matheus Tavares
                         ` (7 preceding siblings ...)
  2019-03-31 18:16       ` [GSoC][PATCH v5 0/7] clone: dir-iterator refactoring with tests Thomas Gummerer
@ 2019-05-02 14:48       ` " Matheus Tavares
  2019-05-02 14:48         ` [GSoC][PATCH v6 01/10] clone: test for our behavior on odd objects/* content Matheus Tavares
                           ` (10 more replies)
  8 siblings, 11 replies; 127+ messages in thread
From: Matheus Tavares @ 2019-05-02 14:48 UTC (permalink / raw)
  To: git
  Cc: Thomas Gummerer, Ævar Arnfjörð Bjarmason,
	Christian Couder, Nguyễn Thái Ngọc Duy,
	SZEDER Gábor, kernel-usp

This patchset contains:
- a replacement of explicit recursive dir iteration at
  copy_or_link_directory for the dir-iterator API;
- some refactoring and behaviour changes at local clone, mainly to
  take care of symlinks and hidden files at .git/objects, together
  with tests for these types of files;
- dir-iterator refactoring and feature adding with tests.

Changes since v5:
- Add tests for the dir-iterator API
- Refactor the dir-iterator state machine model, simplifying its
  mechanics to improve readability.
- Change warning() to warning_errno() at dir-iterator.c
- Add a recursive symlinks check for dir_iterator_advance() in order
  to avoid unwanted recursions with DIR_ITERATOR_FOLLOW_SYMLINKS
- Add tests for the dir-iterator flags feature
- Make warnings be emitted both when DIR_ITERATOR_PEDANTIC is
  supplied and when it's not. It contains more relevant information
  on the error, so I thought it should be always printed.
- Make dir_iterator_begin() check if the given argument is a valid
  path to a directory.
- Adjust some minor code style problems and commit messages
- Address Thomas's comments on v5
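
For readers unfamiliar with the idea behind the recursive symlinks check,
here is a generic sketch of loop detection. It is an illustration only
and not necessarily how this series implements it; struct dir_id and
is_ancestor_dir() are hypothetical names. The assumption is that each
directory on the traversal stack is remembered by its (st_dev, st_ino)
pair, and a followed symlink that resolves to one of those pairs would
recurse forever:

```c
#include <stddef.h>
#include <sys/types.h>

/* Identity of a directory already on the traversal stack (hypothetical). */
struct dir_id {
	dev_t dev;
	ino_t ino;
};

/*
 * Return 1 if (dev, ino) matches any of the nr ancestors, meaning that
 * descending into it would revisit a directory we are already inside.
 */
int is_ancestor_dir(const struct dir_id *ancestors, size_t nr,
		    dev_t dev, ino_t ino)
{
	size_t i;

	for (i = 0; i < nr; i++)
		if (ancestors[i].dev == dev && ancestors[i].ino == ino)
			return 1;
	return 0;
}
```

The caller would fill dev and ino from a stat() of the symlink target
before deciding whether to push a new level.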

v5: https://public-inbox.org/git/20190330224907.3277-1-matheus.bernardino@usp.br/
travis build: https://travis-ci.org/MatheusBernardino/git/builds/527176611

Note: I tried to use --range-diff as Thomas suggested but I'm not sure
the output is as desired. Please, let me know if I did something wrong
using it.

Daniel Ferreira (1):
  dir-iterator: add tests for dir-iterator API

Matheus Tavares (8):
  clone: better handle symlinked files at .git/objects/
  dir-iterator: use warning_errno when possible
  dir-iterator: refactor state machine model
  dir-iterator: add flags parameter to dir_iterator_begin
  clone: copy hidden paths at local clone
  clone: extract function from copy_or_link_directory
  clone: use dir-iterator to avoid explicit dir traversal
  clone: replace strcmp by fspathcmp

Ævar Arnfjörð Bjarmason (1):
  clone: test for our behavior on odd objects/* content

 Makefile                     |   1 +
 builtin/clone.c              |  75 +++++----
 dir-iterator.c               | 289 +++++++++++++++++++++--------------
 dir-iterator.h               |  59 +++++--
 refs/files-backend.c         |  17 ++-
 t/helper/test-dir-iterator.c |  58 +++++++
 t/helper/test-tool.c         |   1 +
 t/helper/test-tool.h         |   1 +
 t/t0066-dir-iterator.sh      | 163 ++++++++++++++++++++
 t/t5604-clone-reference.sh   | 133 ++++++++++++++++
 10 files changed, 634 insertions(+), 163 deletions(-)
 create mode 100644 t/helper/test-dir-iterator.c
 create mode 100755 t/t0066-dir-iterator.sh

Range-diff against v5:
 1:  3d422dd4de =  1:  a630b1a129 clone: test for our behavior on odd objects/* content
 2:  35819e6ed1 !  2:  51e06687fc clone: better handle symlinked files at .git/objects/
    @@ -45,10 +45,7 @@
      			die_errno(_("failed to unlink '%s'"), dest->buf);
      		if (!option_no_hardlinks) {
     -			if (!link(src->buf, dest->buf))
    -+			char *resolved_path = real_pathdup(src->buf, 1);
    -+			int status = link(resolved_path, dest->buf);
    -+			free(resolved_path);
    -+			if (!status)
    ++			if (!link(real_path(src->buf), dest->buf))
      				continue;
      			if (option_local > 0)
      				die_errno(_("failed to create link '%s'"), dest->buf);
 3:  2afe3208a4 <  -:  ---------- dir-iterator: add flags parameter to dir_iterator_begin
 -:  ---------- >  3:  c8a860e3a5 dir-iterator: add tests for dir-iterator API
 -:  ---------- >  4:  b975351080 dir-iterator: use warning_errno when possible
 -:  ---------- >  5:  0fdbd1633e dir-iterator: refactor state machine model
 -:  ---------- >  6:  7b2a9ae947 dir-iterator: add flags parameter to dir_iterator_begin
 4:  71d64e6278 =  7:  b9f298cbc6 clone: copy hidden paths at local clone
 5:  35e36756db =  8:  0e7b1e49e2 clone: extract function from copy_or_link_directory
 6:  1bfda87879 !  9:  f726ce2733 clone: use dir-iterator to avoid explicit dir traversal
    @@ -8,10 +8,14 @@
         copy_or_link_directory.
     
         This process also makes copy_or_link_directory call die() in case of an
    -    error on readdir or stat, inside dir_iterator_advance. Previously it
    +    error on readdir or stat inside dir_iterator_advance. Previously it
         would just print a warning for errors on stat and ignore errors on
         readdir, which isn't nice because a local git clone could succeed even
    -    though the .git/objects copy didn't fully succeed.
    +    though the .git/objects copy didn't fully succeed. Also, with the
    +    dir-iterator API, recursive symlinks will be detected and skipped. This
    +    is another behavior improvement, since the current version would
    +    continue to copy the same content over and over until stat() returned an
    +    ELOOP error.
     
         Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
     
    @@ -44,12 +48,15 @@
     -		die_errno(_("failed to open '%s'"), src->buf);
     +	struct dir_iterator *iter;
     +	int iter_status;
    -+	unsigned flags;
    ++	unsigned int flags;
      
      	mkdir_if_missing(dest->buf, 0777);
      
     +	flags = DIR_ITERATOR_PEDANTIC | DIR_ITERATOR_FOLLOW_SYMLINKS;
     +	iter = dir_iterator_begin(src->buf, flags);
    ++
    ++	if (!iter)
    ++		die_errno(_("failed to start iterator over '%s'"), src->buf);
     +
      	strbuf_addch(src, '/');
      	src_len = src->len;
 7:  3861b30108 = 10:  6a57bb3887 clone: replace strcmp by fspathcmp
-- 
2.20.1



* [GSoC][PATCH v6 01/10] clone: test for our behavior on odd objects/* content
  2019-05-02 14:48       ` [GSoC][PATCH v6 00/10] " Matheus Tavares
@ 2019-05-02 14:48         ` Matheus Tavares
  2019-05-02 14:48         ` [GSoC][PATCH v6 02/10] clone: better handle symlinked files at .git/objects/ Matheus Tavares
                           ` (9 subsequent siblings)
  10 siblings, 0 replies; 127+ messages in thread
From: Matheus Tavares @ 2019-05-02 14:48 UTC (permalink / raw)
  To: git
  Cc: Thomas Gummerer, Ævar Arnfjörð Bjarmason,
	Christian Couder, Nguyễn Thái Ngọc Duy,
	SZEDER Gábor, kernel-usp, Junio C Hamano, Alex Riesen

From: Ævar Arnfjörð Bjarmason <avarab@gmail.com>

Add tests for what happens when we perform a local clone on a repo
containing odd files in the .git/objects directory, such as symlinks to
other dirs, or unknown files.

I'm bending over backwards here to avoid a SHA-1 dependency. See [1]
for an earlier and simpler version that hardcoded SHA-1s.

This behavior has been the same for a *long* time, but hasn't been
tested for.

There's a good post-hoc argument to be made for copying over unknown
things, e.g. I'd like a git version that doesn't know about the
commit-graph to copy it under "clone --local" so a newer git version
can make use of it.

In follow-up commits we'll look at changing some of this behavior, but
for now, let's just assert it as-is so we'll notice what we'll change
later.

1. https://public-inbox.org/git/20190226002625.13022-5-avarab@gmail.com/

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
[matheus.bernardino: improved and split tests in more than one patch]
Helped-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
---
 t/t5604-clone-reference.sh | 111 +++++++++++++++++++++++++++++++++++++
 1 file changed, 111 insertions(+)

diff --git a/t/t5604-clone-reference.sh b/t/t5604-clone-reference.sh
index 4320082b1b..207650cb95 100755
--- a/t/t5604-clone-reference.sh
+++ b/t/t5604-clone-reference.sh
@@ -221,4 +221,115 @@ test_expect_success 'clone, dissociate from alternates' '
 	( cd C && git fsck )
 '
 
+test_expect_success 'setup repo with garbage in objects/*' '
+	git init S &&
+	(
+		cd S &&
+		test_commit A &&
+
+		cd .git/objects &&
+		>.some-hidden-file &&
+		>some-file &&
+		mkdir .some-hidden-dir &&
+		>.some-hidden-dir/some-file &&
+		>.some-hidden-dir/.some-dot-file &&
+		mkdir some-dir &&
+		>some-dir/some-file &&
+		>some-dir/.some-dot-file
+	)
+'
+
+test_expect_success 'clone a repo with garbage in objects/*' '
+	for option in --local --no-hardlinks --shared --dissociate
+	do
+		git clone $option S S$option || return 1 &&
+		git -C S$option fsck || return 1
+	done &&
+	find S-* -name "*some*" | sort >actual &&
+	cat >expected <<-EOF &&
+	S--dissociate/.git/objects/.some-hidden-file
+	S--dissociate/.git/objects/some-dir
+	S--dissociate/.git/objects/some-dir/.some-dot-file
+	S--dissociate/.git/objects/some-dir/some-file
+	S--dissociate/.git/objects/some-file
+	S--local/.git/objects/.some-hidden-file
+	S--local/.git/objects/some-dir
+	S--local/.git/objects/some-dir/.some-dot-file
+	S--local/.git/objects/some-dir/some-file
+	S--local/.git/objects/some-file
+	S--no-hardlinks/.git/objects/.some-hidden-file
+	S--no-hardlinks/.git/objects/some-dir
+	S--no-hardlinks/.git/objects/some-dir/.some-dot-file
+	S--no-hardlinks/.git/objects/some-dir/some-file
+	S--no-hardlinks/.git/objects/some-file
+	EOF
+	test_cmp expected actual
+'
+
+test_expect_success SYMLINKS 'setup repo with manually symlinked dirs and unknown files at objects/' '
+	git init T &&
+	(
+		cd T &&
+		git config gc.auto 0 &&
+		test_commit A &&
+		git gc &&
+		test_commit B &&
+
+		cd .git/objects &&
+		mv pack packs &&
+		ln -s packs pack &&
+		find ?? -type d >loose-dirs &&
+		last_loose=$(tail -n 1 loose-dirs) &&
+		rm -f loose-dirs &&
+		mv $last_loose a-loose-dir &&
+		ln -s a-loose-dir $last_loose &&
+		find . -type f | sort >../../../T.objects-files.raw &&
+		echo unknown_content> unknown_file
+	) &&
+	git -C T fsck &&
+	git -C T rev-list --all --objects >T.objects
+'
+
+
+test_expect_success SYMLINKS 'clone repo with symlinked dirs and unknown files at objects/' '
+	for option in --local --no-hardlinks --shared --dissociate
+	do
+		git clone $option T T$option || return 1 &&
+		git -C T$option fsck || return 1 &&
+		git -C T$option rev-list --all --objects >T$option.objects &&
+		test_cmp T.objects T$option.objects &&
+		(
+			cd T$option/.git/objects &&
+			find . -type f | sort >../../../T$option.objects-files.raw
+		)
+	done &&
+
+	for raw in $(ls T*.raw)
+	do
+		sed -e "s!/../!/Y/!; s![0-9a-f]\{38,\}!Z!" -e "/commit-graph/d" \
+		    -e "/multi-pack-index/d" <$raw >$raw.de-sha || return 1
+	done &&
+
+	cat >expected-files <<-EOF &&
+	./Y/Z
+	./Y/Z
+	./a-loose-dir/Z
+	./Y/Z
+	./info/packs
+	./pack/pack-Z.idx
+	./pack/pack-Z.pack
+	./packs/pack-Z.idx
+	./packs/pack-Z.pack
+	./unknown_file
+	EOF
+
+	for option in --local --dissociate --no-hardlinks
+	do
+		test_cmp expected-files T$option.objects-files.raw.de-sha || return 1
+	done &&
+
+	echo ./info/alternates >expected-files &&
+	test_cmp expected-files T--shared.objects-files.raw
+'
+
 test_done
-- 
2.20.1



* [GSoC][PATCH v6 02/10] clone: better handle symlinked files at .git/objects/
  2019-05-02 14:48       ` [GSoC][PATCH v6 00/10] " Matheus Tavares
  2019-05-02 14:48         ` [GSoC][PATCH v6 01/10] clone: test for our behavior on odd objects/* content Matheus Tavares
@ 2019-05-02 14:48         ` Matheus Tavares
  2019-05-02 14:48         ` [GSoC][PATCH v6 03/10] dir-iterator: add tests for dir-iterator API Matheus Tavares
                           ` (8 subsequent siblings)
  10 siblings, 0 replies; 127+ messages in thread
From: Matheus Tavares @ 2019-05-02 14:48 UTC (permalink / raw)
  To: git
  Cc: Thomas Gummerer, Ævar Arnfjörð Bjarmason,
	Christian Couder, Nguyễn Thái Ngọc Duy,
	SZEDER Gábor, kernel-usp, Junio C Hamano, Michael Haggerty

There is currently an odd behaviour when locally cloning a repository
with symlinks at .git/objects: using --no-hardlinks all symlinks are
dereferenced but without it, Git will try to hardlink the files with the
link() function, which has an OS-specific behaviour on symlinks. On OSX
and NetBSD, it creates a hardlink to the file pointed to by the symlink
whilst on GNU/Linux, it creates a hardlink to the symlink itself.

On Manjaro GNU/Linux:
    $ touch a
    $ ln -s a b
    $ link b c
    $ ls -li a b c
    155 [...] a
    156 [...] b -> a
    156 [...] c -> a

But on NetBSD:
    $ ls -li a b c
    2609160 [...] a
    2609164 [...] b -> a
    2609160 [...] c

It's not good for the result of a local clone to be OS-dependent and,
besides that, the current behaviour on GNU/Linux may result in broken
symlinks. So let's standardize this by making the hardlinks always point
to dereferenced paths, instead of the symlinks themselves. Also, add
tests for symlinked files at .git/objects/.

Note: Git won't create symlinks at .git/objects itself, but it's better
to handle this case and be friendly with users who manually create them.

Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Co-authored-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
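
For illustration, the same idea outside of Git can be sketched with plain
POSIX calls. This is a standalone sketch, not the code in this patch: it
uses realpath() where the patch uses Git's real_path(), and
link_dereferenced() and same_file() are hypothetical helper names.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hardlink dest to the file that src ultimately resolves to, so the
 * result does not depend on how link() treats symlinks on this OS.
 * Return 0 on success and -1 (with errno set) on failure.
 */
int link_dereferenced(const char *src, const char *dest)
{
	char *resolved = realpath(src, NULL);	/* malloc()ed by realpath() */
	int ret, saved_errno;

	if (!resolved)
		return -1;
	ret = link(resolved, dest);
	saved_errno = errno;	/* free() may clobber errno */
	free(resolved);
	errno = saved_errno;
	return ret;
}

/*
 * Return 1 if a and b are the same file and neither is a symlink,
 * 0 if not, and -1 if either path cannot be lstat()ed.
 */
int same_file(const char *a, const char *b)
{
	struct stat sa, sb;

	if (lstat(a, &sa) || lstat(b, &sb))
		return -1;
	return !S_ISLNK(sa.st_mode) && !S_ISLNK(sb.st_mode) &&
	       sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino;
}
```

Given a file "a" and a symlink "b" pointing to it,
link_dereferenced("b", "c") should leave "c" sharing an inode with "a",
which matches the NetBSD listing above regardless of the platform.
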
 builtin/clone.c            |  2 +-
 t/t5604-clone-reference.sh | 27 ++++++++++++++++++++-------
 2 files changed, 21 insertions(+), 8 deletions(-)

diff --git a/builtin/clone.c b/builtin/clone.c
index 50bde99618..d1aba3b13f 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -443,7 +443,7 @@ static void copy_or_link_directory(struct strbuf *src, struct strbuf *dest,
 		if (unlink(dest->buf) && errno != ENOENT)
 			die_errno(_("failed to unlink '%s'"), dest->buf);
 		if (!option_no_hardlinks) {
-			if (!link(src->buf, dest->buf))
+			if (!link(real_path(src->buf), dest->buf))
 				continue;
 			if (option_local > 0)
 				die_errno(_("failed to create link '%s'"), dest->buf);
diff --git a/t/t5604-clone-reference.sh b/t/t5604-clone-reference.sh
index 207650cb95..0800c3853f 100755
--- a/t/t5604-clone-reference.sh
+++ b/t/t5604-clone-reference.sh
@@ -266,7 +266,7 @@ test_expect_success 'clone a repo with garbage in objects/*' '
 	test_cmp expected actual
 '
 
-test_expect_success SYMLINKS 'setup repo with manually symlinked dirs and unknown files at objects/' '
+test_expect_success SYMLINKS 'setup repo with manually symlinked or unknown files at objects/' '
 	git init T &&
 	(
 		cd T &&
@@ -280,10 +280,19 @@ test_expect_success SYMLINKS 'setup repo with manually symlinked dirs and unknow
 		ln -s packs pack &&
 		find ?? -type d >loose-dirs &&
 		last_loose=$(tail -n 1 loose-dirs) &&
-		rm -f loose-dirs &&
 		mv $last_loose a-loose-dir &&
 		ln -s a-loose-dir $last_loose &&
+		first_loose=$(head -n 1 loose-dirs) &&
+		rm -f loose-dirs &&
+
+		cd $first_loose &&
+		obj=$(ls *) &&
+		mv $obj ../an-object &&
+		ln -s ../an-object $obj &&
+
+		cd ../ &&
 		find . -type f | sort >../../../T.objects-files.raw &&
+		find . -type l | sort >../../../T.objects-symlinks.raw &&
 		echo unknown_content> unknown_file
 	) &&
 	git -C T fsck &&
@@ -291,7 +300,7 @@ test_expect_success SYMLINKS 'setup repo with manually symlinked dirs and unknow
 '
 
 
-test_expect_success SYMLINKS 'clone repo with symlinked dirs and unknown files at objects/' '
+test_expect_success SYMLINKS 'clone repo with symlinked or unknown files at objects/' '
 	for option in --local --no-hardlinks --shared --dissociate
 	do
 		git clone $option T T$option || return 1 &&
@@ -300,7 +309,8 @@ test_expect_success SYMLINKS 'clone repo with symlinked dirs and unknown files a
 		test_cmp T.objects T$option.objects &&
 		(
 			cd T$option/.git/objects &&
-			find . -type f | sort >../../../T$option.objects-files.raw
+			find . -type f | sort >../../../T$option.objects-files.raw &&
+			find . -type l | sort >../../../T$option.objects-symlinks.raw
 		)
 	done &&
 
@@ -314,6 +324,7 @@ test_expect_success SYMLINKS 'clone repo with symlinked dirs and unknown files a
 	./Y/Z
 	./Y/Z
 	./a-loose-dir/Z
+	./an-object
 	./Y/Z
 	./info/packs
 	./pack/pack-Z.idx
@@ -323,13 +334,15 @@ test_expect_success SYMLINKS 'clone repo with symlinked dirs and unknown files a
 	./unknown_file
 	EOF
 
-	for option in --local --dissociate --no-hardlinks
+	for option in --local --no-hardlinks --dissociate
 	do
-		test_cmp expected-files T$option.objects-files.raw.de-sha || return 1
+		test_cmp expected-files T$option.objects-files.raw.de-sha || return 1 &&
+		test_must_be_empty T$option.objects-symlinks.raw.de-sha || return 1
 	done &&
 
 	echo ./info/alternates >expected-files &&
-	test_cmp expected-files T--shared.objects-files.raw
+	test_cmp expected-files T--shared.objects-files.raw &&
+	test_must_be_empty T--shared.objects-symlinks.raw
 '
 
 test_done
-- 
2.20.1



* [GSoC][PATCH v6 03/10] dir-iterator: add tests for dir-iterator API
  2019-05-02 14:48       ` [GSoC][PATCH v6 00/10] " Matheus Tavares
  2019-05-02 14:48         ` [GSoC][PATCH v6 01/10] clone: test for our behavior on odd objects/* content Matheus Tavares
  2019-05-02 14:48         ` [GSoC][PATCH v6 02/10] clone: better handle symlinked files at .git/objects/ Matheus Tavares
@ 2019-05-02 14:48         ` Matheus Tavares
  2019-05-02 14:48         ` [GSoC][PATCH v6 04/10] dir-iterator: use warning_errno when possible Matheus Tavares
                           ` (7 subsequent siblings)
  10 siblings, 0 replies; 127+ messages in thread
From: Matheus Tavares @ 2019-05-02 14:48 UTC (permalink / raw)
  To: git
  Cc: Thomas Gummerer, Ævar Arnfjörð Bjarmason,
	Christian Couder, Nguyễn Thái Ngọc Duy,
	SZEDER Gábor, kernel-usp, Daniel Ferreira, Junio C Hamano

From: Daniel Ferreira <bnmvco@gmail.com>

Create t/helper/test-dir-iterator.c, which prints relevant information
about a directory tree iterated over with dir-iterator.

Create t/t0066-dir-iterator.sh, which tests that dir-iterator does
iterate through a whole directory tree as expected.

Signed-off-by: Daniel Ferreira <bnmvco@gmail.com>
[matheus.bernardino: update to use test-tool and some minor aesthetics]
Helped-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
---
 Makefile                     |  1 +
 t/helper/test-dir-iterator.c | 33 ++++++++++++++++++++++
 t/helper/test-tool.c         |  1 +
 t/helper/test-tool.h         |  1 +
 t/t0066-dir-iterator.sh      | 55 ++++++++++++++++++++++++++++++++++++
 5 files changed, 91 insertions(+)
 create mode 100644 t/helper/test-dir-iterator.c
 create mode 100755 t/t0066-dir-iterator.sh

diff --git a/Makefile b/Makefile
index 9f1b6e8926..61da7e4f35 100644
--- a/Makefile
+++ b/Makefile
@@ -713,6 +713,7 @@ TEST_BUILTINS_OBJS += test-config.o
 TEST_BUILTINS_OBJS += test-ctype.o
 TEST_BUILTINS_OBJS += test-date.o
 TEST_BUILTINS_OBJS += test-delta.o
+TEST_BUILTINS_OBJS += test-dir-iterator.o
 TEST_BUILTINS_OBJS += test-drop-caches.o
 TEST_BUILTINS_OBJS += test-dump-cache-tree.o
 TEST_BUILTINS_OBJS += test-dump-fsmonitor.o
diff --git a/t/helper/test-dir-iterator.c b/t/helper/test-dir-iterator.c
new file mode 100644
index 0000000000..84f50bed8c
--- /dev/null
+++ b/t/helper/test-dir-iterator.c
@@ -0,0 +1,33 @@
+#include "test-tool.h"
+#include "git-compat-util.h"
+#include "strbuf.h"
+#include "iterator.h"
+#include "dir-iterator.h"
+
+/* Argument is a directory path to iterate over */
+int cmd__dir_iterator(int argc, const char **argv)
+{
+	struct strbuf path = STRBUF_INIT;
+	struct dir_iterator *diter;
+
+	if (argc < 2)
+		die("BUG: test-dir-iterator needs one argument");
+
+	strbuf_add(&path, argv[1], strlen(argv[1]));
+
+	diter = dir_iterator_begin(path.buf);
+
+	while (dir_iterator_advance(diter) == ITER_OK) {
+		if (S_ISDIR(diter->st.st_mode))
+			printf("[d] ");
+		else if (S_ISREG(diter->st.st_mode))
+			printf("[f] ");
+		else
+			printf("[?] ");
+
+		printf("(%s) [%s] %s\n", diter->relative_path, diter->basename,
+		       diter->path.buf);
+	}
+
+	return 0;
+}
diff --git a/t/helper/test-tool.c b/t/helper/test-tool.c
index 53c06932c4..89b3bfcad8 100644
--- a/t/helper/test-tool.c
+++ b/t/helper/test-tool.c
@@ -13,6 +13,7 @@ static struct test_cmd cmds[] = {
 	{ "ctype", cmd__ctype },
 	{ "date", cmd__date },
 	{ "delta", cmd__delta },
+	{ "dir-iterator", cmd__dir_iterator },
 	{ "drop-caches", cmd__drop_caches },
 	{ "dump-cache-tree", cmd__dump_cache_tree },
 	{ "dump-fsmonitor", cmd__dump_fsmonitor },
diff --git a/t/helper/test-tool.h b/t/helper/test-tool.h
index ffab4d19d7..0a831c839c 100644
--- a/t/helper/test-tool.h
+++ b/t/helper/test-tool.h
@@ -9,6 +9,7 @@ int cmd__config(int argc, const char **argv);
 int cmd__ctype(int argc, const char **argv);
 int cmd__date(int argc, const char **argv);
 int cmd__delta(int argc, const char **argv);
+int cmd__dir_iterator(int argc, const char **argv);
 int cmd__drop_caches(int argc, const char **argv);
 int cmd__dump_cache_tree(int argc, const char **argv);
 int cmd__dump_fsmonitor(int argc, const char **argv);
diff --git a/t/t0066-dir-iterator.sh b/t/t0066-dir-iterator.sh
new file mode 100755
index 0000000000..6e06dc038d
--- /dev/null
+++ b/t/t0066-dir-iterator.sh
@@ -0,0 +1,55 @@
+#!/bin/sh
+
+test_description='Test the dir-iterator functionality'
+
+. ./test-lib.sh
+
+test_expect_success 'setup' '
+	mkdir -p dir &&
+	mkdir -p dir/a/b/c/ &&
+	>dir/b &&
+	>dir/c &&
+	mkdir -p dir/d/e/d/ &&
+	>dir/a/b/c/d &&
+	>dir/a/e &&
+	>dir/d/e/d/a &&
+
+	mkdir -p dir2/a/b/c/ &&
+	>dir2/a/b/c/d
+'
+
+test_expect_success 'dir-iterator should iterate through all files' '
+	cat >expected-iteration-sorted-output <<-EOF &&
+	[d] (a) [a] ./dir/a
+	[d] (a/b) [b] ./dir/a/b
+	[d] (a/b/c) [c] ./dir/a/b/c
+	[d] (d) [d] ./dir/d
+	[d] (d/e) [e] ./dir/d/e
+	[d] (d/e/d) [d] ./dir/d/e/d
+	[f] (a/b/c/d) [d] ./dir/a/b/c/d
+	[f] (a/e) [e] ./dir/a/e
+	[f] (b) [b] ./dir/b
+	[f] (c) [c] ./dir/c
+	[f] (d/e/d/a) [a] ./dir/d/e/d/a
+	EOF
+
+	test-tool dir-iterator ./dir >out &&
+	sort <out >./actual-iteration-sorted-output &&
+
+	test_cmp expected-iteration-sorted-output actual-iteration-sorted-output
+'
+
+test_expect_success 'dir-iterator should list files in the correct order' '
+	cat >expected-pre-order-output <<-EOF &&
+	[d] (a) [a] ./dir2/a
+	[d] (a/b) [b] ./dir2/a/b
+	[d] (a/b/c) [c] ./dir2/a/b/c
+	[f] (a/b/c/d) [d] ./dir2/a/b/c/d
+	EOF
+
+	test-tool dir-iterator ./dir2 >actual-pre-order-output &&
+
+	test_cmp expected-pre-order-output actual-pre-order-output
+'
+
+test_done
-- 
2.20.1



* [GSoC][PATCH v6 04/10] dir-iterator: use warning_errno when possible
  2019-05-02 14:48       ` [GSoC][PATCH v6 00/10] " Matheus Tavares
                           ` (2 preceding siblings ...)
  2019-05-02 14:48         ` [GSoC][PATCH v6 03/10] dir-iterator: add tests for dir-iterator API Matheus Tavares
@ 2019-05-02 14:48         ` Matheus Tavares
  2019-05-02 14:48         ` [GSoC][PATCH v6 05/10] dir-iterator: refactor state machine model Matheus Tavares
                           ` (6 subsequent siblings)
  10 siblings, 0 replies; 127+ messages in thread
From: Matheus Tavares @ 2019-05-02 14:48 UTC (permalink / raw)
  To: git
  Cc: Thomas Gummerer, Ævar Arnfjörð Bjarmason,
	Christian Couder, Nguyễn Thái Ngọc Duy,
	SZEDER Gábor, kernel-usp, Junio C Hamano, Michael Haggerty

Replace warning(..., strerror(errno)) with warning_errno(...). This helps
to unify warning display and simplifies the code a bit. Also,
improve warning messages by surrounding paths with quotation marks and
using more meaningful statements.

Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
---
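
As background for the saved_errno dance in dir_iterator_abort() below,
here is a small standalone sketch of a warning_errno()-style formatter.
It is not Git's implementation; format_warning_errno() is a hypothetical
name. The point it shows is that errno has to be captured before any
other call gets a chance to overwrite it:

```c
#include <errno.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Format "<message>: <strerror(errno)>" into out. Return 0 on success
 * and -1 if the result does not fit. errno is left as it was on entry.
 */
int format_warning_errno(char *out, size_t outlen, const char *fmt, ...)
{
	int saved_errno = errno;	/* capture before anything can clobber it */
	va_list ap;
	int n, m;

	va_start(ap, fmt);
	n = vsnprintf(out, outlen, fmt, ap);
	va_end(ap);
	if (n < 0 || (size_t)n >= outlen)
		return -1;

	m = snprintf(out + n, outlen - (size_t)n, ": %s",
		     strerror(saved_errno));
	if (m < 0 || (size_t)m >= outlen - (size_t)n)
		return -1;

	errno = saved_errno;
	return 0;
}
```

A caller that has to do other work between the failing call and the
warning (such as the strbuf_setlen() in the last hunk) needs the same
save-and-restore step on its own side.
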
 dir-iterator.c | 23 ++++++++++++-----------
 1 file changed, 12 insertions(+), 11 deletions(-)

diff --git a/dir-iterator.c b/dir-iterator.c
index f2dcd82fde..0c8880868a 100644
--- a/dir-iterator.c
+++ b/dir-iterator.c
@@ -71,8 +71,8 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
 
 			level->dir = opendir(iter->base.path.buf);
 			if (!level->dir && errno != ENOENT) {
-				warning("error opening directory %s: %s",
-					iter->base.path.buf, strerror(errno));
+				warning_errno("error opening directory '%s'",
+					      iter->base.path.buf);
 				/* Popping the level is handled below */
 			}
 
@@ -122,11 +122,11 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
 			if (!de) {
 				/* This level is exhausted; pop up a level. */
 				if (errno) {
-					warning("error reading directory %s: %s",
-						iter->base.path.buf, strerror(errno));
+					warning_errno("error reading directory '%s'",
+						      iter->base.path.buf);
 				} else if (closedir(level->dir))
-					warning("error closing directory %s: %s",
-						iter->base.path.buf, strerror(errno));
+					warning_errno("error closing directory '%s'",
+						      iter->base.path.buf);
 
 				level->dir = NULL;
 				if (--iter->levels_nr == 0)
@@ -140,9 +140,8 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
 			strbuf_addstr(&iter->base.path, de->d_name);
 			if (lstat(iter->base.path.buf, &iter->base.st) < 0) {
 				if (errno != ENOENT)
-					warning("error reading path '%s': %s",
-						iter->base.path.buf,
-						strerror(errno));
+					warning_errno("failed to stat '%s'",
+						      iter->base.path.buf);
 				continue;
 			}
 
@@ -170,9 +169,11 @@ int dir_iterator_abort(struct dir_iterator *dir_iterator)
 			&iter->levels[iter->levels_nr - 1];
 
 		if (level->dir && closedir(level->dir)) {
+			int saved_errno = errno;
 			strbuf_setlen(&iter->base.path, level->prefix_len);
-			warning("error closing directory %s: %s",
-				iter->base.path.buf, strerror(errno));
+			errno = saved_errno;
+			warning_errno("error closing directory '%s'",
+				      iter->base.path.buf);
 		}
 	}
 
-- 
2.20.1



* [GSoC][PATCH v6 05/10] dir-iterator: refactor state machine model
  2019-05-02 14:48       ` [GSoC][PATCH v6 00/10] " Matheus Tavares
                           ` (3 preceding siblings ...)
  2019-05-02 14:48         ` [GSoC][PATCH v6 04/10] dir-iterator: use warning_errno when possible Matheus Tavares
@ 2019-05-02 14:48         ` Matheus Tavares
  2019-05-02 14:48         ` [GSoC][PATCH v6 06/10] dir-iterator: add flags parameter to dir_iterator_begin Matheus Tavares
                           ` (5 subsequent siblings)
  10 siblings, 0 replies; 127+ messages in thread
From: Matheus Tavares @ 2019-05-02 14:48 UTC (permalink / raw)
  To: git
  Cc: Thomas Gummerer, Ævar Arnfjörð Bjarmason,
	Christian Couder, Nguyễn Thái Ngọc Duy,
	SZEDER Gábor, kernel-usp, Daniel Ferreira, Michael Haggerty,
	Ramsay Jones, Junio C Hamano, Jeff King, Johannes Schindelin

dir_iterator_advance() is a large function with two nested loops. Let's
improve its readability by factoring out three functions and simplifying
its mechanics. The refactored model will no longer depend on
level.initialized and level.dir_state to keep track of the iteration
state and will run in a single loop.

Also, dir_iterator_begin() currently does not check if the given string
represents a valid directory path. Since the refactored model will have
to stat() the given path at initialization, let's also check for this
kind of error and make dir_iterator_begin() return NULL on failure,
with errno set appropriately. Also, add tests for this new behavior.

Improve documentation at dir-iterator.h and code comments at
dir-iterator.c to reflect the changes and eliminate possible
ambiguities.

Finally, adjust refs/files-backend.c to check for now possible
dir_iterator_begin() failures.

Original-patch-by: Daniel Ferreira <bnmvco@gmail.com>
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
---

When dir_iterator_begin() fails at refs/files-backend.c, I used the same
idea Daniel proposed, which is to initialize an empty iterator with
empty_ref_iterator_begin(). Still, I'm not sure whether we shouldn't
abort execution there instead of returning an empty iterator.

dir_iterator_begin() will fail if the given argument is an empty string,
NULL, an invalid path, a non-directory path, or on some other stat()
errors. Maybe, for these kinds of errors, we don't want the users of
refs/files-backend.c to carry on, and should therefore call die() right
there?

(Also, NULL and empty string arguments were considered a bug in the
previous version of dir_iterator_begin)
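
To show the kind of up-front check described above, here is a minimal
standalone sketch. It is not the code in this patch: open_checked_dir()
is a hypothetical name, and reporting EINVAL for NULL or empty input is
my assumption about what "appropriately set" could mean.

```c
#define _XOPEN_SOURCE 700
#include <dirent.h>
#include <errno.h>
#include <stddef.h>
#include <sys/stat.h>

/*
 * Open path only if it names an existing directory. Return NULL with
 * errno set otherwise, so the caller can decide between die(), a
 * warning, or falling back to an empty iteration.
 */
DIR *open_checked_dir(const char *path)
{
	struct stat st;

	if (!path || !*path) {
		errno = EINVAL;		/* assumption: treat as invalid argument */
		return NULL;
	}
	if (stat(path, &st) < 0)
		return NULL;		/* errno was set by stat() */
	if (!S_ISDIR(st.st_mode)) {
		errno = ENOTDIR;
		return NULL;
	}
	return opendir(path);		/* may still fail, e.g. with EACCES */
}
```

A caller that wants the stricter behavior asked about above would check
for NULL and call die_errno() instead of substituting an empty iterator.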

 dir-iterator.c               | 234 ++++++++++++++++++-----------------
 dir-iterator.h               |  15 ++-
 refs/files-backend.c         |  17 ++-
 t/helper/test-dir-iterator.c |   5 +
 t/t0066-dir-iterator.sh      |  13 ++
 5 files changed, 163 insertions(+), 121 deletions(-)

diff --git a/dir-iterator.c b/dir-iterator.c
index 0c8880868a..594fe4d67b 100644
--- a/dir-iterator.c
+++ b/dir-iterator.c
@@ -4,8 +4,6 @@
 #include "dir-iterator.h"
 
 struct dir_iterator_level {
-	int initialized;
-
 	DIR *dir;
 
 	/*
@@ -13,16 +11,6 @@ struct dir_iterator_level {
 	 * (including a trailing '/'):
 	 */
 	size_t prefix_len;
-
-	/*
-	 * The last action that has been taken with the current entry
-	 * (needed for directories, which have to be included in the
-	 * iteration and also iterated into):
-	 */
-	enum {
-		DIR_STATE_ITER,
-		DIR_STATE_RECURSE
-	} dir_state;
 };
 
 /*
@@ -34,9 +22,11 @@ struct dir_iterator_int {
 	struct dir_iterator base;
 
 	/*
-	 * The number of levels currently on the stack. This is always
-	 * at least 1, because when it becomes zero the iteration is
-	 * ended and this struct is freed.
+	 * The number of levels currently on the stack. After the first
+	 * call to dir_iterator_begin(), if it succeeds to open the
+	 * first level's dir, this will always be at least 1. Then,
+	 * when it comes to zero the iteration is ended and this
+	 * struct is freed.
 	 */
 	size_t levels_nr;
 
@@ -50,113 +40,118 @@ struct dir_iterator_int {
 	struct dir_iterator_level *levels;
 };
 
+/*
+ * Push a level in the iter stack and initialize it with information from
+ * the directory pointed by iter->base->path. It is assumed that this
+ * strbuf points to a valid directory path. Return 0 on success and -1
+ * otherwise, leaving the stack unchanged.
+ */
+static int push_level(struct dir_iterator_int *iter)
+{
+	struct dir_iterator_level *level;
+
+	ALLOC_GROW(iter->levels, iter->levels_nr + 1, iter->levels_alloc);
+	level = &iter->levels[iter->levels_nr++];
+
+	if (!is_dir_sep(iter->base.path.buf[iter->base.path.len - 1]))
+		strbuf_addch(&iter->base.path, '/');
+	level->prefix_len = iter->base.path.len;
+
+	level->dir = opendir(iter->base.path.buf);
+	if (!level->dir) {
+		if (errno != ENOENT) {
+			warning_errno("error opening directory '%s'",
+				      iter->base.path.buf);
+		}
+		iter->levels_nr--;
+		return -1;
+	}
+
+	return 0;
+}
+
+/*
+ * Pop the top level on the iter stack, releasing any resources associated
+ * with it. Return the new value of iter->levels_nr.
+ */
+static int pop_level(struct dir_iterator_int *iter)
+{
+	struct dir_iterator_level *level =
+		&iter->levels[iter->levels_nr - 1];
+
+	if (level->dir && closedir(level->dir))
+		warning_errno("error closing directory '%s'",
+			      iter->base.path.buf);
+	level->dir = NULL;
+
+	return --iter->levels_nr;
+}
+
+/*
+ * Populate iter->base with the necessary information on the next iteration
+ * entry, represented by the given dirent de. Return 0 on success and -1
+ * otherwise.
+ */
+static int prepare_next_entry_data(struct dir_iterator_int *iter,
+				   struct dirent *de)
+{
+	strbuf_addstr(&iter->base.path, de->d_name);
+	/*
+	 * We have to reset these because the path strbuf might have
+	 * been realloc()ed at the previous strbuf_addstr().
+	 */
+	iter->base.relative_path = iter->base.path.buf +
+				   iter->levels[0].prefix_len;
+	iter->base.basename = iter->base.path.buf +
+			      iter->levels[iter->levels_nr - 1].prefix_len;
+
+	if (lstat(iter->base.path.buf, &iter->base.st)) {
+		if (errno != ENOENT)
+			warning_errno("failed to stat '%s'", iter->base.path.buf);
+		return -1;
+	}
+
+	return 0;
+}
+
 int dir_iterator_advance(struct dir_iterator *dir_iterator)
 {
 	struct dir_iterator_int *iter =
 		(struct dir_iterator_int *)dir_iterator;
 
+	if (S_ISDIR(iter->base.st.st_mode)) {
+		if (push_level(iter) && iter->levels_nr == 0) {
+			/* Pushing the first level failed */
+			return dir_iterator_abort(dir_iterator);
+		}
+	}
+
+	/* Loop until we find an entry that we can give back to the caller. */
 	while (1) {
+		struct dirent *de;
 		struct dir_iterator_level *level =
 			&iter->levels[iter->levels_nr - 1];
-		struct dirent *de;
 
-		if (!level->initialized) {
-			/*
-			 * Note: dir_iterator_begin() ensures that
-			 * path is not the empty string.
-			 */
-			if (!is_dir_sep(iter->base.path.buf[iter->base.path.len - 1]))
-				strbuf_addch(&iter->base.path, '/');
-			level->prefix_len = iter->base.path.len;
-
-			level->dir = opendir(iter->base.path.buf);
-			if (!level->dir && errno != ENOENT) {
-				warning_errno("error opening directory '%s'",
+		strbuf_setlen(&iter->base.path, level->prefix_len);
+		errno = 0;
+		de = readdir(level->dir);
+
+		if (!de) {
+			if (errno)
+				warning_errno("error reading directory '%s'",
 					      iter->base.path.buf);
-				/* Popping the level is handled below */
-			}
-
-			level->initialized = 1;
-		} else if (S_ISDIR(iter->base.st.st_mode)) {
-			if (level->dir_state == DIR_STATE_ITER) {
-				/*
-				 * The directory was just iterated
-				 * over; now prepare to iterate into
-				 * it.
-				 */
-				level->dir_state = DIR_STATE_RECURSE;
-				ALLOC_GROW(iter->levels, iter->levels_nr + 1,
-					   iter->levels_alloc);
-				level = &iter->levels[iter->levels_nr++];
-				level->initialized = 0;
-				continue;
-			} else {
-				/*
-				 * The directory has already been
-				 * iterated over and iterated into;
-				 * we're done with it.
-				 */
-			}
+			else if (pop_level(iter) == 0)
+				return dir_iterator_abort(dir_iterator);
+			continue;
 		}
 
-		if (!level->dir) {
-			/*
-			 * This level is exhausted (or wasn't opened
-			 * successfully); pop up a level.
-			 */
-			if (--iter->levels_nr == 0)
-				return dir_iterator_abort(dir_iterator);
+		if (is_dot_or_dotdot(de->d_name))
+			continue;
 
+		if (prepare_next_entry_data(iter, de))
 			continue;
-		}
 
-		/*
-		 * Loop until we find an entry that we can give back
-		 * to the caller:
-		 */
-		while (1) {
-			strbuf_setlen(&iter->base.path, level->prefix_len);
-			errno = 0;
-			de = readdir(level->dir);
-
-			if (!de) {
-				/* This level is exhausted; pop up a level. */
-				if (errno) {
-					warning_errno("error reading directory '%s'",
-						      iter->base.path.buf);
-				} else if (closedir(level->dir))
-					warning_errno("error closing directory '%s'",
-						      iter->base.path.buf);
-
-				level->dir = NULL;
-				if (--iter->levels_nr == 0)
-					return dir_iterator_abort(dir_iterator);
-				break;
-			}
-
-			if (is_dot_or_dotdot(de->d_name))
-				continue;
-
-			strbuf_addstr(&iter->base.path, de->d_name);
-			if (lstat(iter->base.path.buf, &iter->base.st) < 0) {
-				if (errno != ENOENT)
-					warning_errno("failed to stat '%s'",
-						      iter->base.path.buf);
-				continue;
-			}
-
-			/*
-			 * We have to set these each time because
-			 * the path strbuf might have been realloc()ed.
-			 */
-			iter->base.relative_path =
-				iter->base.path.buf + iter->levels[0].prefix_len;
-			iter->base.basename =
-				iter->base.path.buf + level->prefix_len;
-			level->dir_state = DIR_STATE_ITER;
-
-			return ITER_OK;
-		}
+		return ITER_OK;
 	}
 }
 
@@ -187,17 +182,32 @@ struct dir_iterator *dir_iterator_begin(const char *path)
 {
 	struct dir_iterator_int *iter = xcalloc(1, sizeof(*iter));
 	struct dir_iterator *dir_iterator = &iter->base;
-
-	if (!path || !*path)
-		BUG("empty path passed to dir_iterator_begin()");
+	int saved_errno;
 
 	strbuf_init(&iter->base.path, PATH_MAX);
 	strbuf_addstr(&iter->base.path, path);
 
 	ALLOC_GROW(iter->levels, 10, iter->levels_alloc);
+	iter->levels_nr = 0;
 
-	iter->levels_nr = 1;
-	iter->levels[0].initialized = 0;
+	/*
+	 * Note: stat already checks for NULL or empty strings and
+	 * inexistent paths.
+	 */
+	if (stat(iter->base.path.buf, &iter->base.st) < 0) {
+		saved_errno = errno;
+		goto error_out;
+	}
+
+	if (!S_ISDIR(iter->base.st.st_mode)) {
+		saved_errno = ENOTDIR;
+		goto error_out;
+	}
 
 	return dir_iterator;
+
+error_out:
+	dir_iterator_abort(dir_iterator);
+	errno = saved_errno;
+	return NULL;
 }
diff --git a/dir-iterator.h b/dir-iterator.h
index 970793d07a..0822821e56 100644
--- a/dir-iterator.h
+++ b/dir-iterator.h
@@ -8,19 +8,23 @@
  *
  * Iterate over a directory tree, recursively, including paths of all
  * types and hidden paths. Skip "." and ".." entries and don't follow
- * symlinks except for the original path.
+ * symlinks except for the original path. Note that the original path
+ * is not included in the iteration.
  *
  * Every time dir_iterator_advance() is called, update the members of
  * the dir_iterator structure to reflect the next path in the
  * iteration. The order that paths are iterated over within a
- * directory is undefined, but directory paths are always iterated
- * over before the subdirectory contents.
+ * directory is undefined, directory paths are always given before
+ * their contents.
  *
  * A typical iteration looks like this:
  *
  *     int ok;
  *     struct iterator *iter = dir_iterator_begin(path);
  *
+ *     if (!iter)
+ *             goto error_handler;
+ *
  *     while ((ok = dir_iterator_advance(iter)) == ITER_OK) {
  *             if (want_to_stop_iteration()) {
  *                     ok = dir_iterator_abort(iter);
@@ -59,8 +63,9 @@ struct dir_iterator {
 };
 
 /*
- * Start a directory iteration over path. Return a dir_iterator that
- * holds the internal state of the iteration.
+ * Start a directory iteration over path. On success, return a
+ * dir_iterator that holds the internal state of the iteration.
+ * In case of failure, return NULL and set errno accordingly.
  *
  * The iteration includes all paths under path, not including path
  * itself and not including "." or ".." entries.
diff --git a/refs/files-backend.c b/refs/files-backend.c
index 63e55e6773..97a54532e3 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -2143,13 +2143,22 @@ static struct ref_iterator_vtable files_reflog_iterator_vtable = {
 static struct ref_iterator *reflog_iterator_begin(struct ref_store *ref_store,
 						  const char *gitdir)
 {
-	struct files_reflog_iterator *iter = xcalloc(1, sizeof(*iter));
-	struct ref_iterator *ref_iterator = &iter->base;
+	struct dir_iterator *diter;
+	struct files_reflog_iterator *iter;
+	struct ref_iterator *ref_iterator;
 	struct strbuf sb = STRBUF_INIT;
 
-	base_ref_iterator_init(ref_iterator, &files_reflog_iterator_vtable, 0);
 	strbuf_addf(&sb, "%s/logs", gitdir);
-	iter->dir_iterator = dir_iterator_begin(sb.buf);
+
+	diter = dir_iterator_begin(sb.buf);
+	if (!diter)
+		return empty_ref_iterator_begin();
+
+	iter = xcalloc(1, sizeof(*iter));
+	ref_iterator = &iter->base;
+
+	base_ref_iterator_init(ref_iterator, &files_reflog_iterator_vtable, 0);
+	iter->dir_iterator = diter;
 	iter->ref_store = ref_store;
 	strbuf_release(&sb);
 
diff --git a/t/helper/test-dir-iterator.c b/t/helper/test-dir-iterator.c
index 84f50bed8c..fab1ff6237 100644
--- a/t/helper/test-dir-iterator.c
+++ b/t/helper/test-dir-iterator.c
@@ -17,6 +17,11 @@ int cmd__dir_iterator(int argc, const char **argv)
 
 	diter = dir_iterator_begin(path.buf);
 
+	if (!diter) {
+		printf("dir_iterator_begin failure: %d\n", errno);
+		exit(EXIT_FAILURE);
+	}
+
 	while (dir_iterator_advance(diter) == ITER_OK) {
 		if (S_ISDIR(diter->st.st_mode))
 			printf("[d] ");
diff --git a/t/t0066-dir-iterator.sh b/t/t0066-dir-iterator.sh
index 6e06dc038d..c739ed7911 100755
--- a/t/t0066-dir-iterator.sh
+++ b/t/t0066-dir-iterator.sh
@@ -52,4 +52,17 @@ test_expect_success 'dir-iterator should list files in the correct order' '
 	test_cmp expected-pre-order-output actual-pre-order-output
 '
 
+test_expect_success 'begin should fail upon inexistent paths' '
+	test_must_fail test-tool dir-iterator ./inexistent-path \
+		>actual-inexistent-path-output &&
+	echo "dir_iterator_begin failure: 2" >expected-inexistent-path-output &&
+	test_cmp expected-inexistent-path-output actual-inexistent-path-output
+'
+
+test_expect_success 'begin should fail upon non directory paths' '
+	test_must_fail test-tool dir-iterator ./dir/b >actual-non-dir-output &&
+	echo "dir_iterator_begin failure: 20" >expected-non-dir-output &&
+	test_cmp expected-non-dir-output actual-non-dir-output
+'
+
 test_done
-- 
2.20.1



* [GSoC][PATCH v6 06/10] dir-iterator: add flags parameter to dir_iterator_begin
  2019-05-02 14:48       ` [GSoC][PATCH v6 00/10] " Matheus Tavares
                           ` (4 preceding siblings ...)
  2019-05-02 14:48         ` [GSoC][PATCH v6 05/10] dir-iterator: refactor state machine model Matheus Tavares
@ 2019-05-02 14:48         ` Matheus Tavares
  2019-05-02 14:48         ` [GSoC][PATCH v6 07/10] clone: copy hidden paths at local clone Matheus Tavares
                           ` (4 subsequent siblings)
  10 siblings, 0 replies; 127+ messages in thread
From: Matheus Tavares @ 2019-05-02 14:48 UTC (permalink / raw)
  To: git
  Cc: Thomas Gummerer, Ævar Arnfjörð Bjarmason,
	Christian Couder, Nguyễn Thái Ngọc Duy,
	SZEDER Gábor, kernel-usp, Michael Haggerty, Daniel Ferreira,
	Ramsay Jones, Junio C Hamano

Add the possibility of giving flags to dir_iterator_begin to initialize
a dir-iterator with special options.

Currently possible flags are:
- DIR_ITERATOR_PEDANTIC, which makes dir_iterator_advance abort
immediately in the case of an error, instead of continuing to look for
the next valid entry;
- DIR_ITERATOR_FOLLOW_SYMLINKS, which makes the iterator follow
symlinks and include linked directories' contents in the iteration.

These new flags will be used in a subsequent patch.

Also add tests for the flags' usage and adjust refs/files-backend.c to
the new dir_iterator_begin signature.

Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
---

refs/files-backend.c currently passes no flags where it calls
dir_iterator_begin(), to keep the behavior it previously had. But as
ITER_ERROR can now only be returned by dir_iterator_advance() when
DIR_ITERATOR_PEDANTIC is used, and as refs/files-backend.c already
checks for ITER_ERROR, should we perhaps use this flag when
initializing an iterator here?

Another uncertainty I had is why we ignore ENOENT in dir-iterator. Is it
so that files may be removed during iteration? If not, maybe we should
consider starting to check for it, as, for example, broken symlinks
will simply be ignored in this current version because ENOENT is
returned when trying to dereference them.
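
The broken-symlink case can be reproduced outside of git with the
sketch below: lstat() on a dangling symlink succeeds, while stat(),
which is what gets used when symlinks are followed, fails with ENOENT.
dangling_link_errno() is a hypothetical helper, and the sketch assumes
that no file named "sketch-missing-target" exists next to the link.

```c
#include <assert.h>
#include <errno.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical probe: create a symlink at link_path whose target is
 * assumed not to exist, stat it, and remove it again. With follow set,
 * use stat(); otherwise use lstat(). Return 0 if the call succeeded,
 * its errno if it failed, or -1 if the symlink could not be created.
 */
static int dangling_link_errno(const char *link_path, int follow)
{
	struct stat st;
	int ret;

	assert(link_path != NULL);

	unlink(link_path); /* ignore failure: the link may not exist yet */
	if (symlink("sketch-missing-target", link_path) < 0)
		return -1;

	if (follow)
		ret = stat(link_path, &st) ? errno : 0;
	else
		ret = lstat(link_path, &st) ? errno : 0;

	unlink(link_path);
	return ret;
}
```

So an iterator that ignores ENOENT only drops such links silently when
it follows symlinks; without following, the link itself is reported.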

 dir-iterator.c               | 82 +++++++++++++++++++++++++------
 dir-iterator.h               | 50 ++++++++++++++-----
 refs/files-backend.c         |  2 +-
 t/helper/test-dir-iterator.c | 34 ++++++++++---
 t/t0066-dir-iterator.sh      | 95 ++++++++++++++++++++++++++++++++++++
 5 files changed, 228 insertions(+), 35 deletions(-)

diff --git a/dir-iterator.c b/dir-iterator.c
index 594fe4d67b..52db87bdc9 100644
--- a/dir-iterator.c
+++ b/dir-iterator.c
@@ -6,6 +6,9 @@
 struct dir_iterator_level {
 	DIR *dir;
 
+	/* The inode number of this level's directory. */
+	ino_t ino;
+
 	/*
 	 * The length of the directory part of path at this level
 	 * (including a trailing '/'):
@@ -38,13 +41,16 @@ struct dir_iterator_int {
 	 * that will be included in this iteration.
 	 */
 	struct dir_iterator_level *levels;
+
+	/* Combination of flags for this dir-iterator */
+	unsigned int flags;
 };
 
 /*
  * Push a level in the iter stack and initialize it with information from
  * the directory pointed by iter->base->path. It is assumed that this
  * strbuf points to a valid directory path. Return 0 on success and -1
- * otherwise, leaving the stack unchanged.
+ * otherwise, setting errno accordingly and leaving the stack unchanged.
  */
 static int push_level(struct dir_iterator_int *iter)
 {
@@ -56,14 +62,17 @@ static int push_level(struct dir_iterator_int *iter)
 	if (!is_dir_sep(iter->base.path.buf[iter->base.path.len - 1]))
 		strbuf_addch(&iter->base.path, '/');
 	level->prefix_len = iter->base.path.len;
+	level->ino = iter->base.st.st_ino;
 
 	level->dir = opendir(iter->base.path.buf);
 	if (!level->dir) {
+		int saved_errno = errno;
 		if (errno != ENOENT) {
 			warning_errno("error opening directory '%s'",
 				      iter->base.path.buf);
 		}
 		iter->levels_nr--;
+		errno = saved_errno;
 		return -1;
 	}
 
@@ -90,11 +99,13 @@ static int pop_level(struct dir_iterator_int *iter)
 /*
  * Populate iter->base with the necessary information on the next iteration
  * entry, represented by the given dirent de. Return 0 on success and -1
- * otherwise.
+ * otherwise, setting errno accordingly.
  */
 static int prepare_next_entry_data(struct dir_iterator_int *iter,
 				   struct dirent *de)
 {
+	int err, saved_errno;
+
 	strbuf_addstr(&iter->base.path, de->d_name);
 	/*
 	 * We have to reset these because the path strbuf might have
@@ -105,12 +116,34 @@ static int prepare_next_entry_data(struct dir_iterator_int *iter,
 	iter->base.basename = iter->base.path.buf +
 			      iter->levels[iter->levels_nr - 1].prefix_len;
 
-	if (lstat(iter->base.path.buf, &iter->base.st)) {
-		if (errno != ENOENT)
-			warning_errno("failed to stat '%s'", iter->base.path.buf);
-		return -1;
-	}
+	if (iter->flags & DIR_ITERATOR_FOLLOW_SYMLINKS)
+		err = stat(iter->base.path.buf, &iter->base.st);
+	else
+		err = lstat(iter->base.path.buf, &iter->base.st);
+
+	saved_errno = errno;
+	if (err && errno != ENOENT)
+		warning_errno("failed to stat '%s'", iter->base.path.buf);
+
+	errno = saved_errno;
+	return err;
+}
+
+/*
+ * Look for a recursive symlink at iter->base.path pointing to any directory on
+ * the previous stack levels. If it is found, return 1. If not, return 0.
+ */
+static int find_recursive_symlinks(struct dir_iterator_int *iter)
+{
+	int i;
+
+	if (!(iter->flags & DIR_ITERATOR_FOLLOW_SYMLINKS) ||
+	    !S_ISDIR(iter->base.st.st_mode))
+		return 0;
 
+	for (i = 0; i < iter->levels_nr; ++i)
+		if (iter->base.st.st_ino == iter->levels[i].ino)
+			return 1;
 	return 0;
 }
 
@@ -119,11 +152,11 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
 	struct dir_iterator_int *iter =
 		(struct dir_iterator_int *)dir_iterator;
 
-	if (S_ISDIR(iter->base.st.st_mode)) {
-		if (push_level(iter) && iter->levels_nr == 0) {
-			/* Pushing the first level failed */
-			return dir_iterator_abort(dir_iterator);
-		}
+	if (S_ISDIR(iter->base.st.st_mode) && push_level(iter)) {
+		if (errno != ENOENT && iter->flags & DIR_ITERATOR_PEDANTIC)
+			goto error_out;
+		if (iter->levels_nr == 0)
+			goto error_out;
 	}
 
 	/* Loop until we find an entry that we can give back to the caller. */
@@ -137,22 +170,38 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
 		de = readdir(level->dir);
 
 		if (!de) {
-			if (errno)
+			if (errno) {
 				warning_errno("error reading directory '%s'",
 					      iter->base.path.buf);
-			else if (pop_level(iter) == 0)
+				if (iter->flags & DIR_ITERATOR_PEDANTIC)
+					goto error_out;
+			} else if (pop_level(iter) == 0) {
 				return dir_iterator_abort(dir_iterator);
+			}
 			continue;
 		}
 
 		if (is_dot_or_dotdot(de->d_name))
 			continue;
 
-		if (prepare_next_entry_data(iter, de))
+		if (prepare_next_entry_data(iter, de)) {
+			if (errno != ENOENT && iter->flags & DIR_ITERATOR_PEDANTIC)
+				goto error_out;
 			continue;
+		}
+
+		if (find_recursive_symlinks(iter)) {
+			warning("ignoring recursive symlink at '%s'",
+				iter->base.path.buf);
+			continue;
+		}
 
 		return ITER_OK;
 	}
+
+error_out:
+	dir_iterator_abort(dir_iterator);
+	return ITER_ERROR;
 }
 
 int dir_iterator_abort(struct dir_iterator *dir_iterator)
@@ -178,7 +227,7 @@ int dir_iterator_abort(struct dir_iterator *dir_iterator)
 	return ITER_DONE;
 }
 
-struct dir_iterator *dir_iterator_begin(const char *path)
+struct dir_iterator *dir_iterator_begin(const char *path, unsigned int flags)
 {
 	struct dir_iterator_int *iter = xcalloc(1, sizeof(*iter));
 	struct dir_iterator *dir_iterator = &iter->base;
@@ -189,6 +238,7 @@ struct dir_iterator *dir_iterator_begin(const char *path)
 
 	ALLOC_GROW(iter->levels, 10, iter->levels_alloc);
 	iter->levels_nr = 0;
+	iter->flags = flags;
 
 	/*
 	 * Note: stat already checks for NULL or empty strings and
diff --git a/dir-iterator.h b/dir-iterator.h
index 0822821e56..28e50dabdb 100644
--- a/dir-iterator.h
+++ b/dir-iterator.h
@@ -20,7 +20,8 @@
  * A typical iteration looks like this:
  *
  *     int ok;
- *     struct iterator *iter = dir_iterator_begin(path);
+ *     unsigned int flags = DIR_ITERATOR_PEDANTIC;
+ *     struct iterator *iter = dir_iterator_begin(path, flags);
  *
  *     if (!iter)
  *             goto error_handler;
@@ -44,6 +45,24 @@
  * dir_iterator_advance() again.
  */
 
+/*
+ * Flags for dir_iterator_begin:
+ *
+ * - DIR_ITERATOR_PEDANTIC: override dir-iterator's default behavior
+ *   in case of an error at dir_iterator_advance(), which is to keep
+ *   looking for a next valid entry. With this flag, resources are freed
+ *   and ITER_ERROR is returned immediately. In both cases, a meaningful
+ *   warning is emitted.
+ *
+ * - DIR_ITERATOR_FOLLOW_SYMLINKS: make dir-iterator follow symlinks.
+ *   i.e., linked directories' contents will be iterated over and
+ *   iter->base.st will contain information on the referred files,
+ *   not the symlinks themselves, which is the default behavior.
+ *   Recursive symlinks are skipped.
+ */
+#define DIR_ITERATOR_PEDANTIC (1 << 0)
+#define DIR_ITERATOR_FOLLOW_SYMLINKS (1 << 1)
+
 struct dir_iterator {
 	/* The current path: */
 	struct strbuf path;
@@ -58,29 +77,38 @@ struct dir_iterator {
 	/* The current basename: */
 	const char *basename;
 
-	/* The result of calling lstat() on path: */
+	/*
+	 * The result of calling lstat() on path; or stat(), if the
+	 * DIR_ITERATOR_FOLLOW_SYMLINKS flag was set at
+	 * dir_iterator's initialization.
+	 */
 	struct stat st;
 };
 
 /*
- * Start a directory iteration over path. On success, return a
- * dir_iterator that holds the internal state of the iteration.
- * In case of failure, return NULL and set errno accordingly.
+ * Start a directory iteration over path with the combination of
+ * options specified by flags. On success, return a dir_iterator
+ * that holds the internal state of the iteration. In case of
+ * failure, return NULL and set errno accordingly.
  *
  * The iteration includes all paths under path, not including path
  * itself and not including "." or ".." entries.
  *
- * path is the starting directory. An internal copy will be made.
+ * Parameters are:
+ *  - path is the starting directory. An internal copy will be made.
+ *  - flags is a combination of the possible flags to initialize a
+ *    dir-iterator or 0 for default behavior.
  */
-struct dir_iterator *dir_iterator_begin(const char *path);
+struct dir_iterator *dir_iterator_begin(const char *path, unsigned int flags);
 
 /*
  * Advance the iterator to the first or next item and return ITER_OK.
  * If the iteration is exhausted, free the dir_iterator and any
- * resources associated with it and return ITER_DONE. On error, free
- * dir_iterator and associated resources and return ITER_ERROR. It is
- * a bug to use iterator or call this function again after it has
- * returned ITER_DONE or ITER_ERROR.
+ * resources associated with it and return ITER_DONE.
+ *
+ * It is a bug to use iterator or call this function again after it
+ * has returned ITER_DONE or ITER_ERROR (which may be returned iff
+ * the DIR_ITERATOR_PEDANTIC flag was set).
  */
 int dir_iterator_advance(struct dir_iterator *iterator);
 
diff --git a/refs/files-backend.c b/refs/files-backend.c
index 97a54532e3..ce78656823 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -2150,7 +2150,7 @@ static struct ref_iterator *reflog_iterator_begin(struct ref_store *ref_store,
 
 	strbuf_addf(&sb, "%s/logs", gitdir);
 
-	diter = dir_iterator_begin(sb.buf);
+	diter = dir_iterator_begin(sb.buf, 0);
 	if (!diter)
 		return empty_ref_iterator_begin();
 
diff --git a/t/helper/test-dir-iterator.c b/t/helper/test-dir-iterator.c
index fab1ff6237..a5b96cb0dc 100644
--- a/t/helper/test-dir-iterator.c
+++ b/t/helper/test-dir-iterator.c
@@ -4,29 +4,44 @@
 #include "iterator.h"
 #include "dir-iterator.h"
 
-/* Argument is a directory path to iterate over */
+/*
+ * usage:
+ * test-tool dir-iterator [--follow-symlinks] [--pedantic] directory_path
+ */
 int cmd__dir_iterator(int argc, const char **argv)
 {
 	struct strbuf path = STRBUF_INIT;
 	struct dir_iterator *diter;
+	unsigned int flags = 0;
+	int iter_status;
+
+	for (++argv, --argc; *argv && starts_with(*argv, "--"); ++argv, --argc) {
+		if (strcmp(*argv, "--follow-symlinks") == 0)
+			flags |= DIR_ITERATOR_FOLLOW_SYMLINKS;
+		else if (strcmp(*argv, "--pedantic") == 0)
+			flags |= DIR_ITERATOR_PEDANTIC;
+		else
+			die("invalid option '%s'", *argv);
+	}
 
-	if (argc < 2)
-		die("BUG: test-dir-iterator needs one argument");
-
-	strbuf_add(&path, argv[1], strlen(argv[1]));
+	if (!*argv || argc != 1)
+		die("dir-iterator needs exactly one non-option argument");
 
-	diter = dir_iterator_begin(path.buf);
+	strbuf_add(&path, *argv, strlen(*argv));
+	diter = dir_iterator_begin(path.buf, flags);
 
 	if (!diter) {
 		printf("dir_iterator_begin failure: %d\n", errno);
 		exit(EXIT_FAILURE);
 	}
 
-	while (dir_iterator_advance(diter) == ITER_OK) {
+	while ((iter_status = dir_iterator_advance(diter)) == ITER_OK) {
 		if (S_ISDIR(diter->st.st_mode))
 			printf("[d] ");
 		else if (S_ISREG(diter->st.st_mode))
 			printf("[f] ");
+		else if (S_ISLNK(diter->st.st_mode))
+			printf("[s] ");
 		else
 			printf("[?] ");
 
@@ -34,5 +49,10 @@ int cmd__dir_iterator(int argc, const char **argv)
 		       diter->path.buf);
 	}
 
+	if (iter_status != ITER_DONE) {
+		printf("dir_iterator_advance failure\n");
+		return 1;
+	}
+
 	return 0;
 }
diff --git a/t/t0066-dir-iterator.sh b/t/t0066-dir-iterator.sh
index c739ed7911..8f996a31fa 100755
--- a/t/t0066-dir-iterator.sh
+++ b/t/t0066-dir-iterator.sh
@@ -65,4 +65,99 @@ test_expect_success 'begin should fail upon non directory paths' '
 	test_cmp expected-non-dir-output actual-non-dir-output
 '
 
+test_expect_success POSIXPERM,SANITY 'advance should not fail on errors by default' '
+	cat >expected-no-permissions-output <<-EOF &&
+	[d] (a) [a] ./dir3/a
+	EOF
+
+	mkdir -p dir3/a &&
+	> dir3/a/b &&
+	chmod 0 dir3/a &&
+
+	test-tool dir-iterator ./dir3 >actual-no-permissions-output &&
+	test_cmp expected-no-permissions-output actual-no-permissions-output &&
+	chmod 755 dir3/a &&
+	rm -rf dir3
+'
+
+test_expect_success POSIXPERM,SANITY 'advance should fail on errors, w/ pedantic flag' '
+	cat >expected-no-permissions-pedantic-output <<-EOF &&
+	[d] (a) [a] ./dir3/a
+	dir_iterator_advance failure
+	EOF
+
+	mkdir -p dir3/a &&
+	> dir3/a/b &&
+	chmod 0 dir3/a &&
+
+	test_must_fail test-tool dir-iterator --pedantic ./dir3 \
+		>actual-no-permissions-pedantic-output &&
+	test_cmp expected-no-permissions-pedantic-output \
+		actual-no-permissions-pedantic-output &&
+	chmod 755 dir3/a &&
+	rm -rf dir3
+'
+
+test_expect_success SYMLINKS 'setup dirs with symlinks' '
+	mkdir -p dir4/a &&
+	mkdir -p dir4/b/c &&
+	>dir4/a/d &&
+	ln -s d dir4/a/e &&
+	ln -s ../b dir4/a/f &&
+
+	mkdir -p dir5/a/b &&
+	mkdir -p dir5/a/c &&
+	ln -s ../c dir5/a/b/d &&
+	ln -s ../ dir5/a/b/e &&
+	ln -s ../../ dir5/a/b/f
+'
+
+test_expect_success SYMLINKS 'dir-iterator should not follow symlinks by default' '
+	cat >expected-no-follow-sorted-output <<-EOF &&
+	[d] (a) [a] ./dir4/a
+	[d] (b) [b] ./dir4/b
+	[d] (b/c) [c] ./dir4/b/c
+	[f] (a/d) [d] ./dir4/a/d
+	[s] (a/e) [e] ./dir4/a/e
+	[s] (a/f) [f] ./dir4/a/f
+	EOF
+
+	test-tool dir-iterator ./dir4 >out &&
+	sort <out >actual-no-follow-sorted-output &&
+
+	test_cmp expected-no-follow-sorted-output actual-no-follow-sorted-output
+'
+
+test_expect_success SYMLINKS 'dir-iterator should follow symlinks w/ follow flag' '
+	cat >expected-follow-sorted-output <<-EOF &&
+	[d] (a) [a] ./dir4/a
+	[d] (a/f) [f] ./dir4/a/f
+	[d] (a/f/c) [c] ./dir4/a/f/c
+	[d] (b) [b] ./dir4/b
+	[d] (b/c) [c] ./dir4/b/c
+	[f] (a/d) [d] ./dir4/a/d
+	[f] (a/e) [e] ./dir4/a/e
+	EOF
+
+	test-tool dir-iterator --follow-symlinks ./dir4 >out &&
+	sort <out >actual-follow-sorted-output &&
+
+	test_cmp expected-follow-sorted-output actual-follow-sorted-output
+'
+
+
+test_expect_success SYMLINKS 'dir-iterator should ignore recursive symlinks w/ follow flag' '
+	cat >expected-rec-symlinks-sorted-output <<-EOF &&
+	[d] (a) [a] ./dir5/a
+	[d] (a/b) [b] ./dir5/a/b
+	[d] (a/b/d) [d] ./dir5/a/b/d
+	[d] (a/c) [c] ./dir5/a/c
+	EOF
+
+	test-tool dir-iterator --follow-symlinks ./dir5 >out &&
+	sort <out >actual-rec-symlinks-sorted-output &&
+
+	test_cmp expected-rec-symlinks-sorted-output actual-rec-symlinks-sorted-output
+'
+
 test_done
-- 
2.20.1



* [GSoC][PATCH v6 07/10] clone: copy hidden paths at local clone
  2019-05-02 14:48       ` [GSoC][PATCH v6 00/10] " Matheus Tavares
                           ` (5 preceding siblings ...)
  2019-05-02 14:48         ` [GSoC][PATCH v6 06/10] dir-iterator: add flags parameter to dir_iterator_begin Matheus Tavares
@ 2019-05-02 14:48         ` Matheus Tavares
  2019-05-02 14:48         ` [GSoC][PATCH v6 08/10] clone: extract function from copy_or_link_directory Matheus Tavares
                           ` (3 subsequent siblings)
  10 siblings, 0 replies; 127+ messages in thread
From: Matheus Tavares @ 2019-05-02 14:48 UTC (permalink / raw)
  To: git
  Cc: Thomas Gummerer, Ævar Arnfjörð Bjarmason,
	Christian Couder, Nguyễn Thái Ngọc Duy,
	SZEDER Gábor, kernel-usp, Junio C Hamano, Michael Haggerty

Make the copy_or_link_directory function no longer skip hidden
directories. This function, used to copy .git/objects, currently skips
all hidden directories but not hidden files, which is odd behaviour.
This is probably unintentional: the intention was likely to skip only
'.' and '..', but the check ended up skipping every directory whose
name starts with '.'. Besides being more natural, the new behaviour is
more permissive to the user.

Also adjust tests to reflect this behaviour change.

Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Co-authored-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
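
The difference between the old and the new check can be seen in the
following stand-alone sketch. Both functions are hypothetical
illustrations; is_dot_or_dotdot_sketch() only mimics what an
is_dot_or_dotdot()-style test is expected to do and is not copied from
git's sources.

```c
#include <assert.h>
#include <stddef.h>

/* Old check: true for every name starting with '.', e.g. ".hidden". */
static int starts_with_dot(const char *name)
{
	assert(name != NULL);
	return name[0] == '.';
}

/* New check (sketch): true only for the names "." and "..". */
static int is_dot_or_dotdot_sketch(const char *name)
{
	assert(name != NULL);
	return name[0] == '.' &&
	       (name[1] == '\0' || (name[1] == '.' && name[2] == '\0'));
}
```

For a directory named ".some-hidden-dir" the old check is true, so the
directory was skipped, while the new check is false and the directory
is copied.
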
 builtin/clone.c            | 2 +-
 t/t5604-clone-reference.sh | 9 +++++++++
 2 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/builtin/clone.c b/builtin/clone.c
index d1aba3b13f..f117a6b206 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -428,7 +428,7 @@ static void copy_or_link_directory(struct strbuf *src, struct strbuf *dest,
 			continue;
 		}
 		if (S_ISDIR(buf.st_mode)) {
-			if (de->d_name[0] != '.')
+			if (!is_dot_or_dotdot(de->d_name))
 				copy_or_link_directory(src, dest,
 						       src_repo, src_baselen);
 			continue;
diff --git a/t/t5604-clone-reference.sh b/t/t5604-clone-reference.sh
index 0800c3853f..c3998f2f9e 100755
--- a/t/t5604-clone-reference.sh
+++ b/t/t5604-clone-reference.sh
@@ -247,16 +247,25 @@ test_expect_success 'clone a repo with garbage in objects/*' '
 	done &&
 	find S-* -name "*some*" | sort >actual &&
 	cat >expected <<-EOF &&
+	S--dissociate/.git/objects/.some-hidden-dir
+	S--dissociate/.git/objects/.some-hidden-dir/.some-dot-file
+	S--dissociate/.git/objects/.some-hidden-dir/some-file
 	S--dissociate/.git/objects/.some-hidden-file
 	S--dissociate/.git/objects/some-dir
 	S--dissociate/.git/objects/some-dir/.some-dot-file
 	S--dissociate/.git/objects/some-dir/some-file
 	S--dissociate/.git/objects/some-file
+	S--local/.git/objects/.some-hidden-dir
+	S--local/.git/objects/.some-hidden-dir/.some-dot-file
+	S--local/.git/objects/.some-hidden-dir/some-file
 	S--local/.git/objects/.some-hidden-file
 	S--local/.git/objects/some-dir
 	S--local/.git/objects/some-dir/.some-dot-file
 	S--local/.git/objects/some-dir/some-file
 	S--local/.git/objects/some-file
+	S--no-hardlinks/.git/objects/.some-hidden-dir
+	S--no-hardlinks/.git/objects/.some-hidden-dir/.some-dot-file
+	S--no-hardlinks/.git/objects/.some-hidden-dir/some-file
 	S--no-hardlinks/.git/objects/.some-hidden-file
 	S--no-hardlinks/.git/objects/some-dir
 	S--no-hardlinks/.git/objects/some-dir/.some-dot-file
-- 
2.20.1



* [GSoC][PATCH v6 08/10] clone: extract function from copy_or_link_directory
  2019-05-02 14:48       ` [GSoC][PATCH v6 00/10] " Matheus Tavares
                           ` (6 preceding siblings ...)
  2019-05-02 14:48         ` [GSoC][PATCH v6 07/10] clone: copy hidden paths at local clone Matheus Tavares
@ 2019-05-02 14:48         ` Matheus Tavares
  2019-05-02 14:48         ` [GSoC][PATCH v6 09/10] clone: use dir-iterator to avoid explicit dir traversal Matheus Tavares
                           ` (2 subsequent siblings)
  10 siblings, 0 replies; 127+ messages in thread
From: Matheus Tavares @ 2019-05-02 14:48 UTC (permalink / raw)
  To: git
  Cc: Thomas Gummerer, Ævar Arnfjörð Bjarmason,
	Christian Couder, Nguyễn Thái Ngọc Duy,
	SZEDER Gábor, kernel-usp, Junio C Hamano, Michael Haggerty

Extract the directory creation snippet from copy_or_link_directory into
its own function named mkdir_if_missing. This change will help to remove
copy_or_link_directory's explicit recursion, which will be done in a
following patch. It also makes the code more readable.

Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
---
 builtin/clone.c | 24 ++++++++++++++++--------
 1 file changed, 16 insertions(+), 8 deletions(-)

diff --git a/builtin/clone.c b/builtin/clone.c
index f117a6b206..1ee6d6050e 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -392,6 +392,21 @@ static void copy_alternates(struct strbuf *src, struct strbuf *dst,
 	fclose(in);
 }
 
+static void mkdir_if_missing(const char *pathname, mode_t mode)
+{
+	struct stat st;
+
+	if (!mkdir(pathname, mode))
+		return;
+
+	if (errno != EEXIST)
+		die_errno(_("failed to create directory '%s'"), pathname);
+	else if (stat(pathname, &st))
+		die_errno(_("failed to stat '%s'"), pathname);
+	else if (!S_ISDIR(st.st_mode))
+		die(_("%s exists and is not a directory"), pathname);
+}
+
 static void copy_or_link_directory(struct strbuf *src, struct strbuf *dest,
 				   const char *src_repo, int src_baselen)
 {
@@ -404,14 +419,7 @@ static void copy_or_link_directory(struct strbuf *src, struct strbuf *dest,
 	if (!dir)
 		die_errno(_("failed to open '%s'"), src->buf);
 
-	if (mkdir(dest->buf, 0777)) {
-		if (errno != EEXIST)
-			die_errno(_("failed to create directory '%s'"), dest->buf);
-		else if (stat(dest->buf, &buf))
-			die_errno(_("failed to stat '%s'"), dest->buf);
-		else if (!S_ISDIR(buf.st_mode))
-			die(_("%s exists and is not a directory"), dest->buf);
-	}
+	mkdir_if_missing(dest->buf, 0777);
 
 	strbuf_addch(src, '/');
 	src_len = src->len;
-- 
2.20.1
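
For readers without the git tree at hand, the idiom that this patch
factors out can be sketched on its own. This is not the patch's code:
the name ensure_dir is hypothetical, and it reports failure through its
return value where mkdir_if_missing() calls die_errno()/die(). Setting
ENOTDIR for the "exists but is not a directory" case is also an
assumption of the sketch.

```c
#include <assert.h>
#include <errno.h>
#include <sys/stat.h>
#include <sys/types.h>

/*
 * Hypothetical standalone variant of mkdir_if_missing(): returns 0 if
 * 'pathname' is a directory once we are done, -1 otherwise (errno set).
 */
static int ensure_dir(const char *pathname, mode_t mode)
{
	struct stat st;

	assert(pathname);

	if (!mkdir(pathname, mode))
		return 0;		/* we created it */
	if (errno != EEXIST)
		return -1;		/* e.g. ENOENT or EACCES */
	if (stat(pathname, &st))
		return -1;		/* exists, but cannot be examined */
	if (!S_ISDIR(st.st_mode)) {
		errno = ENOTDIR;	/* exists, but is not a directory */
		return -1;
	}
	return 0;			/* already a directory; nothing to do */
}
```

Trying mkdir() first and only then looking at EEXIST avoids the
check-then-create race that a stat()-before-mkdir() sequence would have.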



* [GSoC][PATCH v6 09/10] clone: use dir-iterator to avoid explicit dir traversal
  2019-05-02 14:48       ` [GSoC][PATCH v6 00/10] " Matheus Tavares
                           ` (7 preceding siblings ...)
  2019-05-02 14:48         ` [GSoC][PATCH v6 08/10] clone: extract function from copy_or_link_directory Matheus Tavares
@ 2019-05-02 14:48         ` Matheus Tavares
  2019-05-02 14:48         ` [GSoC][PATCH v6 10/10] clone: replace strcmp by fspathcmp Matheus Tavares
  2019-06-18 23:27         ` [GSoC][PATCH v7 00/10] clone: dir-iterator refactoring with tests Matheus Tavares
  10 siblings, 0 replies; 127+ messages in thread
From: Matheus Tavares @ 2019-05-02 14:48 UTC (permalink / raw)
  To: git
  Cc: Thomas Gummerer, Ævar Arnfjörð Bjarmason,
	Christian Couder, Nguyễn Thái Ngọc Duy,
	SZEDER Gábor, kernel-usp, Michael Haggerty, Junio C Hamano

Replace the opendir/readdir/closedir calls that recursively traverse
directories in copy_or_link_directory with the dir-iterator API. This
simplifies the code and avoids recursive calls to
copy_or_link_directory.

This change also makes copy_or_link_directory call die() in case of an
error on readdir or stat inside dir_iterator_advance. Previously it
would just print a warning for errors on stat and ignore errors on
readdir, which isn't nice because a local git clone could succeed even
though the .git/objects copy didn't fully succeed. Also, with the
dir-iterator API, recursive symlinks will be detected and skipped. This
is another behavior improvement, since the current version would
continue to copy the same content over and over until stat() returned an
ELOOP error.

Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
---
 builtin/clone.c | 47 +++++++++++++++++++++++++----------------------
 1 file changed, 25 insertions(+), 22 deletions(-)

diff --git a/builtin/clone.c b/builtin/clone.c
index 1ee6d6050e..f99acd878f 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -23,6 +23,8 @@
 #include "transport.h"
 #include "strbuf.h"
 #include "dir.h"
+#include "dir-iterator.h"
+#include "iterator.h"
 #include "sigchain.h"
 #include "branch.h"
 #include "remote.h"
@@ -408,42 +410,39 @@ static void mkdir_if_missing(const char *pathname, mode_t mode)
 }
 
 static void copy_or_link_directory(struct strbuf *src, struct strbuf *dest,
-				   const char *src_repo, int src_baselen)
+				   const char *src_repo)
 {
-	struct dirent *de;
-	struct stat buf;
 	int src_len, dest_len;
-	DIR *dir;
-
-	dir = opendir(src->buf);
-	if (!dir)
-		die_errno(_("failed to open '%s'"), src->buf);
+	struct dir_iterator *iter;
+	int iter_status;
+	unsigned int flags;
 
 	mkdir_if_missing(dest->buf, 0777);
 
+	flags = DIR_ITERATOR_PEDANTIC | DIR_ITERATOR_FOLLOW_SYMLINKS;
+	iter = dir_iterator_begin(src->buf, flags);
+
+	if (!iter)
+		die_errno(_("failed to start iterator over '%s'"), src->buf);
+
 	strbuf_addch(src, '/');
 	src_len = src->len;
 	strbuf_addch(dest, '/');
 	dest_len = dest->len;
 
-	while ((de = readdir(dir)) != NULL) {
+	while ((iter_status = dir_iterator_advance(iter)) == ITER_OK) {
 		strbuf_setlen(src, src_len);
-		strbuf_addstr(src, de->d_name);
+		strbuf_addstr(src, iter->relative_path);
 		strbuf_setlen(dest, dest_len);
-		strbuf_addstr(dest, de->d_name);
-		if (stat(src->buf, &buf)) {
-			warning (_("failed to stat %s\n"), src->buf);
-			continue;
-		}
-		if (S_ISDIR(buf.st_mode)) {
-			if (!is_dot_or_dotdot(de->d_name))
-				copy_or_link_directory(src, dest,
-						       src_repo, src_baselen);
+		strbuf_addstr(dest, iter->relative_path);
+
+		if (S_ISDIR(iter->st.st_mode)) {
+			mkdir_if_missing(dest->buf, 0777);
 			continue;
 		}
 
 		/* Files that cannot be copied bit-for-bit... */
-		if (!strcmp(src->buf + src_baselen, "/info/alternates")) {
+		if (!strcmp(iter->relative_path, "info/alternates")) {
 			copy_alternates(src, dest, src_repo);
 			continue;
 		}
@@ -460,7 +459,11 @@ static void copy_or_link_directory(struct strbuf *src, struct strbuf *dest,
 		if (copy_file_with_time(dest->buf, src->buf, 0666))
 			die_errno(_("failed to copy file to '%s'"), dest->buf);
 	}
-	closedir(dir);
+
+	if (iter_status != ITER_DONE) {
+		strbuf_setlen(src, src_len);
+		die(_("failed to iterate over '%s'"), src->buf);
+	}
 }
 
 static void clone_local(const char *src_repo, const char *dest_repo)
@@ -478,7 +481,7 @@ static void clone_local(const char *src_repo, const char *dest_repo)
 		get_common_dir(&dest, dest_repo);
 		strbuf_addstr(&src, "/objects");
 		strbuf_addstr(&dest, "/objects");
-		copy_or_link_directory(&src, &dest, src_repo, src.len);
+		copy_or_link_directory(&src, &dest, src_repo);
 		strbuf_release(&src);
 		strbuf_release(&dest);
 	}
-- 
2.20.1



* [GSoC][PATCH v6 10/10] clone: replace strcmp by fspathcmp
  2019-05-02 14:48       ` [GSoC][PATCH v6 00/10] " Matheus Tavares
                           ` (8 preceding siblings ...)
  2019-05-02 14:48         ` [GSoC][PATCH v6 09/10] clone: use dir-iterator to avoid explicit dir traversal Matheus Tavares
@ 2019-05-02 14:48         ` Matheus Tavares
  2019-06-18 23:27         ` [GSoC][PATCH v7 00/10] clone: dir-iterator refactoring with tests Matheus Tavares
  10 siblings, 0 replies; 127+ messages in thread
From: Matheus Tavares @ 2019-05-02 14:48 UTC (permalink / raw)
  To: git
  Cc: Thomas Gummerer, Ævar Arnfjörð Bjarmason,
	Christian Couder, Nguyễn Thái Ngọc Duy,
	SZEDER Gábor, kernel-usp, Junio C Hamano, Michael Haggerty

Replace strcmp with fspathcmp in copy_or_link_directory; fspathcmp is
more permissive/friendly to case-insensitive file systems.

Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
Suggested-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 builtin/clone.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/builtin/clone.c b/builtin/clone.c
index f99acd878f..6e0f194c3b 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -442,7 +442,7 @@ static void copy_or_link_directory(struct strbuf *src, struct strbuf *dest,
 		}
 
 		/* Files that cannot be copied bit-for-bit... */
-		if (!strcmp(iter->relative_path, "info/alternates")) {
+		if (!fspathcmp(iter->relative_path, "info/alternates")) {
 			copy_alternates(src, dest, src_repo);
 			continue;
 		}
-- 
2.20.1
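
To illustrate what the switch buys us, here is a minimal sketch of a
path comparison that honours a case-insensitivity setting. The function
name path_cmp and its ignore_case parameter are hypothetical; git's
fspathcmp() consults its own configuration instead of taking an
argument, so treat this as an approximation of the idea rather than the
real helper.

```c
#include <assert.h>
#include <string.h>
#include <strings.h>	/* strcasecmp() */

/*
 * Hypothetical stand-in for a filesystem-aware path comparison:
 * compare byte-for-byte unless the caller says the filesystem
 * ignores case. Returns 0 when the paths are considered equal.
 */
static int path_cmp(const char *a, const char *b, int ignore_case)
{
	assert(a && b);

	return ignore_case ? strcasecmp(a, b) : strcmp(a, b);
}
```

With ignore_case set, "INFO/Alternates" compares equal to
"info/alternates", so the special-casing of that file would still
trigger on a case-insensitive filesystem.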



* [GSoC][PATCH v7 00/10] clone: dir-iterator refactoring with tests
  2019-05-02 14:48       ` [GSoC][PATCH v6 00/10] " Matheus Tavares
                           ` (9 preceding siblings ...)
  2019-05-02 14:48         ` [GSoC][PATCH v6 10/10] clone: replace strcmp by fspathcmp Matheus Tavares
@ 2019-06-18 23:27         ` Matheus Tavares
  2019-06-18 23:27           ` [GSoC][PATCH v7 01/10] clone: test for our behavior on odd objects/* content Matheus Tavares
                             ` (12 more replies)
  10 siblings, 13 replies; 127+ messages in thread
From: Matheus Tavares @ 2019-06-18 23:27 UTC (permalink / raw)
  To: git
  Cc: Thomas Gummerer, Ævar Arnfjörð Bjarmason,
	Christian Couder, Nguyễn Thái Ngọc Duy,
	SZEDER Gábor, Olga Telezhnaya, kernel-usp

This patchset contains:
- tests for the dir-iterator API;
- a dir-iterator refactoring to make its state machine simpler,
  plus new features with tests;
- a replacement of the explicit recursive dir iteration at
  copy_or_link_directory with the dir-iterator API;
- some refactoring and behavior changes at local clone, mainly to
  take care of symlinks and hidden files at .git/objects, together
  with tests for these types of files.

Changes since v6:
- Rebased with master;
- Added to dir-iterator documentation that ENOENT errors and hence broken
  symlinks are both ignored.

With the changes brought by this patchset, dir_iterator_begin() may now
return NULL (setting errno) when it finds an error. Also, it's possible
to pass a pedantic flag to it so that dir_iterator_advance() returns
immediately on errors. But at refs/files-backend.c, the only user of
the API so far, the flag wasn't used and an empty iterator is
returned in case of errors at dir_iterator_begin(). These actions were
taken in order to keep the files-backend's behavior as close as
possible to the one it previously had. But since it already has guards
for possible errors at dir_iterator_advance(), I'm wondering whether I
should send a follow-up patch making it use the pedantic flag.

Also, should I perhaps call die_errno() on dir_iterator_begin() errors
at files-backend? I mean, we should continue returning an empty
iterator on ENOENT errors since ".git/logs", the dir it iterates over,
may not be present. But we could possibly abort on other errors, just
to be sure...

Any comments on this possible follow-up patch will be highly appreciated.

v6: https://public-inbox.org/git/20190502144829.4394-1-matheus.bernardino@usp.br/
travis build: https://travis-ci.org/matheustavares/git/builds/547451528

Daniel Ferreira (1):
  dir-iterator: add tests for dir-iterator API

Matheus Tavares (8):
  clone: better handle symlinked files at .git/objects/
  dir-iterator: use warning_errno when possible
  dir-iterator: refactor state machine model
  dir-iterator: add flags parameter to dir_iterator_begin
  clone: copy hidden paths at local clone
  clone: extract function from copy_or_link_directory
  clone: use dir-iterator to avoid explicit dir traversal
  clone: replace strcmp by fspathcmp

Ævar Arnfjörð Bjarmason (1):
  clone: test for our behavior on odd objects/* content

 Makefile                     |   1 +
 builtin/clone.c              |  75 +++++----
 dir-iterator.c               | 289 +++++++++++++++++++++--------------
 dir-iterator.h               |  60 ++++++--
 refs/files-backend.c         |  17 ++-
 t/helper/test-dir-iterator.c |  58 +++++++
 t/helper/test-tool.c         |   1 +
 t/helper/test-tool.h         |   1 +
 t/t0066-dir-iterator.sh      | 163 ++++++++++++++++++++
 t/t5604-clone-reference.sh   | 133 ++++++++++++++++
 10 files changed, 635 insertions(+), 163 deletions(-)
 create mode 100644 t/helper/test-dir-iterator.c
 create mode 100755 t/t0066-dir-iterator.sh

-- 
2.22.0
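
The policy floated above for files-backend (an empty iterator on ENOENT,
abort on anything else) can be sketched without the git tree. Everything
here is an assumption made for illustration: opendir() stands in for
dir_iterator_begin(), and the enum and function names are hypothetical.

```c
#include <assert.h>
#include <dirent.h>
#include <errno.h>

enum begin_result {
	BEGIN_OK,	/* directory opened; iterate normally */
	BEGIN_EMPTY,	/* directory is missing; treat as empty */
	BEGIN_FATAL	/* any other error; the caller should abort */
};

/*
 * Hypothetical classification of a failed "begin" step, using
 * opendir() as a stand-in for dir_iterator_begin().
 */
static enum begin_result classify_begin(const char *path)
{
	DIR *dir;

	assert(path);

	dir = opendir(path);
	if (dir) {
		closedir(dir);
		return BEGIN_OK;
	}
	return errno == ENOENT ? BEGIN_EMPTY : BEGIN_FATAL;
}
```

A caller would return an empty iterator for BEGIN_EMPTY (the
".git/logs" case) and call die_errno() for BEGIN_FATAL.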



* [GSoC][PATCH v7 01/10] clone: test for our behavior on odd objects/* content
  2019-06-18 23:27         ` [GSoC][PATCH v7 00/10] clone: dir-iterator refactoring with tests Matheus Tavares
@ 2019-06-18 23:27           ` Matheus Tavares
  2019-06-18 23:27           ` [GSoC][PATCH v7 02/10] clone: better handle symlinked files at .git/objects/ Matheus Tavares
                             ` (11 subsequent siblings)
  12 siblings, 0 replies; 127+ messages in thread
From: Matheus Tavares @ 2019-06-18 23:27 UTC (permalink / raw)
  To: git
  Cc: Thomas Gummerer, Ævar Arnfjörð Bjarmason,
	Christian Couder, Nguyễn Thái Ngọc Duy,
	SZEDER Gábor, Olga Telezhnaya, kernel-usp, Alex Riesen,
	Junio C Hamano

From: Ævar Arnfjörð Bjarmason <avarab@gmail.com>

Add tests for what happens when we perform a local clone on a repo
containing odd files in the .git/objects directory, such as symlinks to
other dirs, or unknown files.

I'm bending over backwards here to avoid a SHA-1 dependency. See [1]
for an earlier and simpler version that hardcoded SHA-1s.

This behavior has been the same for a *long* time, but hasn't been
tested for.

There's a good post-hoc argument to be made for copying over unknown
things, e.g. I'd like a git version that doesn't know about the
commit-graph to copy it under "clone --local" so a newer git version
can make use of it.

In follow-up commits we'll look at changing some of this behavior, but
for now, let's just assert it as-is so we'll notice what we'll change
later.

1. https://public-inbox.org/git/20190226002625.13022-5-avarab@gmail.com/

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
[matheus.bernardino: improved and split tests in more than one patch]
Helped-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
---
 t/t5604-clone-reference.sh | 111 +++++++++++++++++++++++++++++++++++++
 1 file changed, 111 insertions(+)

diff --git a/t/t5604-clone-reference.sh b/t/t5604-clone-reference.sh
index 4320082b1b..207650cb95 100755
--- a/t/t5604-clone-reference.sh
+++ b/t/t5604-clone-reference.sh
@@ -221,4 +221,115 @@ test_expect_success 'clone, dissociate from alternates' '
 	( cd C && git fsck )
 '
 
+test_expect_success 'setup repo with garbage in objects/*' '
+	git init S &&
+	(
+		cd S &&
+		test_commit A &&
+
+		cd .git/objects &&
+		>.some-hidden-file &&
+		>some-file &&
+		mkdir .some-hidden-dir &&
+		>.some-hidden-dir/some-file &&
+		>.some-hidden-dir/.some-dot-file &&
+		mkdir some-dir &&
+		>some-dir/some-file &&
+		>some-dir/.some-dot-file
+	)
+'
+
+test_expect_success 'clone a repo with garbage in objects/*' '
+	for option in --local --no-hardlinks --shared --dissociate
+	do
+		git clone $option S S$option || return 1 &&
+		git -C S$option fsck || return 1
+	done &&
+	find S-* -name "*some*" | sort >actual &&
+	cat >expected <<-EOF &&
+	S--dissociate/.git/objects/.some-hidden-file
+	S--dissociate/.git/objects/some-dir
+	S--dissociate/.git/objects/some-dir/.some-dot-file
+	S--dissociate/.git/objects/some-dir/some-file
+	S--dissociate/.git/objects/some-file
+	S--local/.git/objects/.some-hidden-file
+	S--local/.git/objects/some-dir
+	S--local/.git/objects/some-dir/.some-dot-file
+	S--local/.git/objects/some-dir/some-file
+	S--local/.git/objects/some-file
+	S--no-hardlinks/.git/objects/.some-hidden-file
+	S--no-hardlinks/.git/objects/some-dir
+	S--no-hardlinks/.git/objects/some-dir/.some-dot-file
+	S--no-hardlinks/.git/objects/some-dir/some-file
+	S--no-hardlinks/.git/objects/some-file
+	EOF
+	test_cmp expected actual
+'
+
+test_expect_success SYMLINKS 'setup repo with manually symlinked dirs and unknown files at objects/' '
+	git init T &&
+	(
+		cd T &&
+		git config gc.auto 0 &&
+		test_commit A &&
+		git gc &&
+		test_commit B &&
+
+		cd .git/objects &&
+		mv pack packs &&
+		ln -s packs pack &&
+		find ?? -type d >loose-dirs &&
+		last_loose=$(tail -n 1 loose-dirs) &&
+		rm -f loose-dirs &&
+		mv $last_loose a-loose-dir &&
+		ln -s a-loose-dir $last_loose &&
+		find . -type f | sort >../../../T.objects-files.raw &&
+		echo unknown_content> unknown_file
+	) &&
+	git -C T fsck &&
+	git -C T rev-list --all --objects >T.objects
+'
+
+
+test_expect_success SYMLINKS 'clone repo with symlinked dirs and unknown files at objects/' '
+	for option in --local --no-hardlinks --shared --dissociate
+	do
+		git clone $option T T$option || return 1 &&
+		git -C T$option fsck || return 1 &&
+		git -C T$option rev-list --all --objects >T$option.objects &&
+		test_cmp T.objects T$option.objects &&
+		(
+			cd T$option/.git/objects &&
+			find . -type f | sort >../../../T$option.objects-files.raw
+		)
+	done &&
+
+	for raw in $(ls T*.raw)
+	do
+		sed -e "s!/../!/Y/!; s![0-9a-f]\{38,\}!Z!" -e "/commit-graph/d" \
+		    -e "/multi-pack-index/d" <$raw >$raw.de-sha || return 1
+	done &&
+
+	cat >expected-files <<-EOF &&
+	./Y/Z
+	./Y/Z
+	./a-loose-dir/Z
+	./Y/Z
+	./info/packs
+	./pack/pack-Z.idx
+	./pack/pack-Z.pack
+	./packs/pack-Z.idx
+	./packs/pack-Z.pack
+	./unknown_file
+	EOF
+
+	for option in --local --dissociate --no-hardlinks
+	do
+		test_cmp expected-files T$option.objects-files.raw.de-sha || return 1
+	done &&
+
+	echo ./info/alternates >expected-files &&
+	test_cmp expected-files T--shared.objects-files.raw
+'
+
 test_done
-- 
2.22.0



* [GSoC][PATCH v7 02/10] clone: better handle symlinked files at .git/objects/
  2019-06-18 23:27         ` [GSoC][PATCH v7 00/10] clone: dir-iterator refactoring with tests Matheus Tavares
  2019-06-18 23:27           ` [GSoC][PATCH v7 01/10] clone: test for our behavior on odd objects/* content Matheus Tavares
@ 2019-06-18 23:27           ` Matheus Tavares
  2019-06-18 23:27           ` [GSoC][PATCH v7 03/10] dir-iterator: add tests for dir-iterator API Matheus Tavares
                             ` (10 subsequent siblings)
  12 siblings, 0 replies; 127+ messages in thread
From: Matheus Tavares @ 2019-06-18 23:27 UTC (permalink / raw)
  To: git
  Cc: Thomas Gummerer, Ævar Arnfjörð Bjarmason,
	Christian Couder, Nguyễn Thái Ngọc Duy,
	SZEDER Gábor, Olga Telezhnaya, kernel-usp, Junio C Hamano,
	Jeff King

There is currently an odd behaviour when locally cloning a repository
with symlinks at .git/objects: with --no-hardlinks, all symlinks are
dereferenced, but without it, Git will try to hardlink the files with
the link() function, which has an OS-specific behaviour on symlinks. On
OSX and NetBSD, it creates a hardlink to the file pointed to by the
symlink, whilst on GNU/Linux it creates a hardlink to the symlink
itself.

On Manjaro GNU/Linux:
    $ touch a
    $ ln -s a b
    $ link b c
    $ ls -li a b c
    155 [...] a
    156 [...] b -> a
    156 [...] c -> a

But on NetBSD:
    $ ls -li a b c
    2609160 [...] a
    2609164 [...] b -> a
    2609160 [...] c

It's not good for the result of a local clone to be OS-dependent and,
besides that, the current behaviour on GNU/Linux may result in broken
symlinks. So let's standardize this by making the hardlinks always point
to dereferenced paths, instead of the symlinks themselves. Also, add
tests for symlinked files at .git/objects/.

Note: Git won't create symlinks at .git/objects itself, but it's better
to handle this case and be friendly with users who manually create them.

Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Co-authored-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 builtin/clone.c            |  2 +-
 t/t5604-clone-reference.sh | 27 ++++++++++++++++++++-------
 2 files changed, 21 insertions(+), 8 deletions(-)

diff --git a/builtin/clone.c b/builtin/clone.c
index 5b9ebe9947..4a0a2455a7 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -445,7 +445,7 @@ static void copy_or_link_directory(struct strbuf *src, struct strbuf *dest,
 		if (unlink(dest->buf) && errno != ENOENT)
 			die_errno(_("failed to unlink '%s'"), dest->buf);
 		if (!option_no_hardlinks) {
-			if (!link(src->buf, dest->buf))
+			if (!link(real_path(src->buf), dest->buf))
 				continue;
 			if (option_local > 0)
 				die_errno(_("failed to create link '%s'"), dest->buf);
diff --git a/t/t5604-clone-reference.sh b/t/t5604-clone-reference.sh
index 207650cb95..0800c3853f 100755
--- a/t/t5604-clone-reference.sh
+++ b/t/t5604-clone-reference.sh
@@ -266,7 +266,7 @@ test_expect_success 'clone a repo with garbage in objects/*' '
 	test_cmp expected actual
 '
 
-test_expect_success SYMLINKS 'setup repo with manually symlinked dirs and unknown files at objects/' '
+test_expect_success SYMLINKS 'setup repo with manually symlinked or unknown files at objects/' '
 	git init T &&
 	(
 		cd T &&
@@ -280,10 +280,19 @@ test_expect_success SYMLINKS 'setup repo with manually symlinked dirs and unknow
 		ln -s packs pack &&
 		find ?? -type d >loose-dirs &&
 		last_loose=$(tail -n 1 loose-dirs) &&
-		rm -f loose-dirs &&
 		mv $last_loose a-loose-dir &&
 		ln -s a-loose-dir $last_loose &&
+		first_loose=$(head -n 1 loose-dirs) &&
+		rm -f loose-dirs &&
+
+		cd $first_loose &&
+		obj=$(ls *) &&
+		mv $obj ../an-object &&
+		ln -s ../an-object $obj &&
+
+		cd ../ &&
 		find . -type f | sort >../../../T.objects-files.raw &&
+		find . -type l | sort >../../../T.objects-symlinks.raw &&
 		echo unknown_content> unknown_file
 	) &&
 	git -C T fsck &&
@@ -291,7 +300,7 @@ test_expect_success SYMLINKS 'setup repo with manually symlinked dirs and unknow
 '
 
 
-test_expect_success SYMLINKS 'clone repo with symlinked dirs and unknown files at objects/' '
+test_expect_success SYMLINKS 'clone repo with symlinked or unknown files at objects/' '
 	for option in --local --no-hardlinks --shared --dissociate
 	do
 		git clone $option T T$option || return 1 &&
@@ -300,7 +309,8 @@ test_expect_success SYMLINKS 'clone repo with symlinked dirs and unknown files a
 		test_cmp T.objects T$option.objects &&
 		(
 			cd T$option/.git/objects &&
-			find . -type f | sort >../../../T$option.objects-files.raw
+			find . -type f | sort >../../../T$option.objects-files.raw &&
+			find . -type l | sort >../../../T$option.objects-symlinks.raw
 		)
 	done &&
 
@@ -314,6 +324,7 @@ test_expect_success SYMLINKS 'clone repo with symlinked dirs and unknown files a
 	./Y/Z
 	./Y/Z
 	./a-loose-dir/Z
+	./an-object
 	./Y/Z
 	./info/packs
 	./pack/pack-Z.idx
@@ -323,13 +334,15 @@ test_expect_success SYMLINKS 'clone repo with symlinked dirs and unknown files a
 	./unknown_file
 	EOF
 
-	for option in --local --dissociate --no-hardlinks
+	for option in --local --no-hardlinks --dissociate
 	do
-		test_cmp expected-files T$option.objects-files.raw.de-sha || return 1
+		test_cmp expected-files T$option.objects-files.raw.de-sha || return 1 &&
+		test_must_be_empty T$option.objects-symlinks.raw.de-sha || return 1
 	done &&
 
 	echo ./info/alternates >expected-files &&
-	test_cmp expected-files T--shared.objects-files.raw
+	test_cmp expected-files T--shared.objects-files.raw &&
+	test_must_be_empty T--shared.objects-symlinks.raw
 '
 
 test_done
-- 
2.22.0



* [GSoC][PATCH v7 03/10] dir-iterator: add tests for dir-iterator API
  2019-06-18 23:27         ` [GSoC][PATCH v7 00/10] clone: dir-iterator refactoring with tests Matheus Tavares
  2019-06-18 23:27           ` [GSoC][PATCH v7 01/10] clone: test for our behavior on odd objects/* content Matheus Tavares
  2019-06-18 23:27           ` [GSoC][PATCH v7 02/10] clone: better handle symlinked files at .git/objects/ Matheus Tavares
@ 2019-06-18 23:27           ` Matheus Tavares
  2019-06-18 23:27           ` [GSoC][PATCH v7 04/10] dir-iterator: use warning_errno when possible Matheus Tavares
                             ` (9 subsequent siblings)
  12 siblings, 0 replies; 127+ messages in thread
From: Matheus Tavares @ 2019-06-18 23:27 UTC (permalink / raw)
  To: git
  Cc: Thomas Gummerer, Ævar Arnfjörð Bjarmason,
	Christian Couder, Nguyễn Thái Ngọc Duy,
	SZEDER Gábor, Olga Telezhnaya, kernel-usp, Daniel Ferreira,
	Junio C Hamano

From: Daniel Ferreira <bnmvco@gmail.com>

Create t/helper/test-dir-iterator.c, which prints relevant information
about a directory tree iterated over with dir-iterator.

Create t/t0066-dir-iterator.sh, which tests that dir-iterator does
iterate through a whole directory tree as expected.

Signed-off-by: Daniel Ferreira <bnmvco@gmail.com>
[matheus.bernardino: update to use test-tool and some minor aesthetics]
Helped-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
---
 Makefile                     |  1 +
 t/helper/test-dir-iterator.c | 33 ++++++++++++++++++++++
 t/helper/test-tool.c         |  1 +
 t/helper/test-tool.h         |  1 +
 t/t0066-dir-iterator.sh      | 55 ++++++++++++++++++++++++++++++++++++
 5 files changed, 91 insertions(+)
 create mode 100644 t/helper/test-dir-iterator.c
 create mode 100755 t/t0066-dir-iterator.sh

diff --git a/Makefile b/Makefile
index f58bf14c7b..7e2a44cccc 100644
--- a/Makefile
+++ b/Makefile
@@ -704,6 +704,7 @@ TEST_BUILTINS_OBJS += test-config.o
 TEST_BUILTINS_OBJS += test-ctype.o
 TEST_BUILTINS_OBJS += test-date.o
 TEST_BUILTINS_OBJS += test-delta.o
+TEST_BUILTINS_OBJS += test-dir-iterator.o
 TEST_BUILTINS_OBJS += test-drop-caches.o
 TEST_BUILTINS_OBJS += test-dump-cache-tree.o
 TEST_BUILTINS_OBJS += test-dump-fsmonitor.o
diff --git a/t/helper/test-dir-iterator.c b/t/helper/test-dir-iterator.c
new file mode 100644
index 0000000000..84f50bed8c
--- /dev/null
+++ b/t/helper/test-dir-iterator.c
@@ -0,0 +1,33 @@
+#include "test-tool.h"
+#include "git-compat-util.h"
+#include "strbuf.h"
+#include "iterator.h"
+#include "dir-iterator.h"
+
+/* Argument is a directory path to iterate over */
+int cmd__dir_iterator(int argc, const char **argv)
+{
+	struct strbuf path = STRBUF_INIT;
+	struct dir_iterator *diter;
+
+	if (argc < 2)
+		die("BUG: test-dir-iterator needs one argument");
+
+	strbuf_add(&path, argv[1], strlen(argv[1]));
+
+	diter = dir_iterator_begin(path.buf);
+
+	while (dir_iterator_advance(diter) == ITER_OK) {
+		if (S_ISDIR(diter->st.st_mode))
+			printf("[d] ");
+		else if (S_ISREG(diter->st.st_mode))
+			printf("[f] ");
+		else
+			printf("[?] ");
+
+		printf("(%s) [%s] %s\n", diter->relative_path, diter->basename,
+		       diter->path.buf);
+	}
+
+	return 0;
+}
diff --git a/t/helper/test-tool.c b/t/helper/test-tool.c
index 087a8c0cc9..7bc9bb231e 100644
--- a/t/helper/test-tool.c
+++ b/t/helper/test-tool.c
@@ -19,6 +19,7 @@ static struct test_cmd cmds[] = {
 	{ "ctype", cmd__ctype },
 	{ "date", cmd__date },
 	{ "delta", cmd__delta },
+	{ "dir-iterator", cmd__dir_iterator },
 	{ "drop-caches", cmd__drop_caches },
 	{ "dump-cache-tree", cmd__dump_cache_tree },
 	{ "dump-fsmonitor", cmd__dump_fsmonitor },
diff --git a/t/helper/test-tool.h b/t/helper/test-tool.h
index 7e703f3038..ec0ffbd0cb 100644
--- a/t/helper/test-tool.h
+++ b/t/helper/test-tool.h
@@ -9,6 +9,7 @@ int cmd__config(int argc, const char **argv);
 int cmd__ctype(int argc, const char **argv);
 int cmd__date(int argc, const char **argv);
 int cmd__delta(int argc, const char **argv);
+int cmd__dir_iterator(int argc, const char **argv);
 int cmd__drop_caches(int argc, const char **argv);
 int cmd__dump_cache_tree(int argc, const char **argv);
 int cmd__dump_fsmonitor(int argc, const char **argv);
diff --git a/t/t0066-dir-iterator.sh b/t/t0066-dir-iterator.sh
new file mode 100755
index 0000000000..6e06dc038d
--- /dev/null
+++ b/t/t0066-dir-iterator.sh
@@ -0,0 +1,55 @@
+#!/bin/sh
+
+test_description='Test the dir-iterator functionality'
+
+. ./test-lib.sh
+
+test_expect_success 'setup' '
+	mkdir -p dir &&
+	mkdir -p dir/a/b/c/ &&
+	>dir/b &&
+	>dir/c &&
+	mkdir -p dir/d/e/d/ &&
+	>dir/a/b/c/d &&
+	>dir/a/e &&
+	>dir/d/e/d/a &&
+
+	mkdir -p dir2/a/b/c/ &&
+	>dir2/a/b/c/d
+'
+
+test_expect_success 'dir-iterator should iterate through all files' '
+	cat >expected-iteration-sorted-output <<-EOF &&
+	[d] (a) [a] ./dir/a
+	[d] (a/b) [b] ./dir/a/b
+	[d] (a/b/c) [c] ./dir/a/b/c
+	[d] (d) [d] ./dir/d
+	[d] (d/e) [e] ./dir/d/e
+	[d] (d/e/d) [d] ./dir/d/e/d
+	[f] (a/b/c/d) [d] ./dir/a/b/c/d
+	[f] (a/e) [e] ./dir/a/e
+	[f] (b) [b] ./dir/b
+	[f] (c) [c] ./dir/c
+	[f] (d/e/d/a) [a] ./dir/d/e/d/a
+	EOF
+
+	test-tool dir-iterator ./dir >out &&
+	sort <out >./actual-iteration-sorted-output &&
+
+	test_cmp expected-iteration-sorted-output actual-iteration-sorted-output
+'
+
+test_expect_success 'dir-iterator should list files in the correct order' '
+	cat >expected-pre-order-output <<-EOF &&
+	[d] (a) [a] ./dir2/a
+	[d] (a/b) [b] ./dir2/a/b
+	[d] (a/b/c) [c] ./dir2/a/b/c
+	[f] (a/b/c/d) [d] ./dir2/a/b/c/d
+	EOF
+
+	test-tool dir-iterator ./dir2 >actual-pre-order-output &&
+
+	test_cmp expected-pre-order-output actual-pre-order-output
+'
+
+test_done
-- 
2.22.0



* [GSoC][PATCH v7 04/10] dir-iterator: use warning_errno when possible
  2019-06-18 23:27         ` [GSoC][PATCH v7 00/10] clone: dir-iterator refactoring with tests Matheus Tavares
                             ` (2 preceding siblings ...)
  2019-06-18 23:27           ` [GSoC][PATCH v7 03/10] dir-iterator: add tests for dir-iterator API Matheus Tavares
@ 2019-06-18 23:27           ` Matheus Tavares
  2019-06-18 23:27           ` [GSoC][PATCH v7 05/10] dir-iterator: refactor state machine model Matheus Tavares
                             ` (8 subsequent siblings)
  12 siblings, 0 replies; 127+ messages in thread
From: Matheus Tavares @ 2019-06-18 23:27 UTC (permalink / raw)
  To: git
  Cc: Thomas Gummerer, Ævar Arnfjörð Bjarmason,
	Christian Couder, Nguyễn Thái Ngọc Duy,
	SZEDER Gábor, Olga Telezhnaya, kernel-usp, Junio C Hamano,
	Michael Haggerty

Replace warning(..., strerror(errno)) with warning_errno(...). This
helps to unify the warning display and simplifies the code a bit. Also,
improve warning messages by surrounding paths with quotation marks and
using more meaningful statements.

Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
---
 dir-iterator.c | 23 ++++++++++++-----------
 1 file changed, 12 insertions(+), 11 deletions(-)

diff --git a/dir-iterator.c b/dir-iterator.c
index f2dcd82fde..0c8880868a 100644
--- a/dir-iterator.c
+++ b/dir-iterator.c
@@ -71,8 +71,8 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
 
 			level->dir = opendir(iter->base.path.buf);
 			if (!level->dir && errno != ENOENT) {
-				warning("error opening directory %s: %s",
-					iter->base.path.buf, strerror(errno));
+				warning_errno("error opening directory '%s'",
+					      iter->base.path.buf);
 				/* Popping the level is handled below */
 			}
 
@@ -122,11 +122,11 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
 			if (!de) {
 				/* This level is exhausted; pop up a level. */
 				if (errno) {
-					warning("error reading directory %s: %s",
-						iter->base.path.buf, strerror(errno));
+					warning_errno("error reading directory '%s'",
+						      iter->base.path.buf);
 				} else if (closedir(level->dir))
-					warning("error closing directory %s: %s",
-						iter->base.path.buf, strerror(errno));
+					warning_errno("error closing directory '%s'",
+						      iter->base.path.buf);
 
 				level->dir = NULL;
 				if (--iter->levels_nr == 0)
@@ -140,9 +140,8 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
 			strbuf_addstr(&iter->base.path, de->d_name);
 			if (lstat(iter->base.path.buf, &iter->base.st) < 0) {
 				if (errno != ENOENT)
-					warning("error reading path '%s': %s",
-						iter->base.path.buf,
-						strerror(errno));
+					warning_errno("failed to stat '%s'",
+						      iter->base.path.buf);
 				continue;
 			}
 
@@ -170,9 +169,11 @@ int dir_iterator_abort(struct dir_iterator *dir_iterator)
 			&iter->levels[iter->levels_nr - 1];
 
 		if (level->dir && closedir(level->dir)) {
+			int saved_errno = errno;
 			strbuf_setlen(&iter->base.path, level->prefix_len);
-			warning("error closing directory %s: %s",
-				iter->base.path.buf, strerror(errno));
+			errno = saved_errno;
+			warning_errno("error closing directory '%s'",
+				      iter->base.path.buf);
 		}
 	}
 
-- 
2.22.0
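
The saved_errno handling in the dir_iterator_abort() hunk is easy to
miss, so here is a small sketch of the pattern. All names are
hypothetical, the "cleanup" step is a stand-in for strbuf_setlen() (or
anything else that might touch errno), and snprintf() plus strerror()
stand in for warning_errno(), which reads errno at the time it is
called.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Stand-in for any intermediate step that may overwrite errno. */
static void cleanup_that_clobbers_errno(void)
{
	errno = 0;
}

/*
 * Hypothetical illustration: 'failed_errno' plays the role of the
 * value closedir() left in errno. The message is formatted into 'buf'
 * and the errno value that was reported is returned.
 */
static int warn_after_cleanup(int failed_errno, char *buf, size_t len)
{
	int saved_errno;

	assert(buf && len);

	errno = failed_errno;		/* pretend closedir() just failed */
	saved_errno = errno;		/* remember why it failed */
	cleanup_that_clobbers_errno();	/* errno is unreliable from here */
	errno = saved_errno;		/* restore before reporting */

	snprintf(buf, len, "warning: error closing directory: %s",
		 strerror(errno));
	return saved_errno;
}
```

Without the save/restore pair, the message would describe whatever the
cleanup step left in errno instead of the closedir() failure.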



* [GSoC][PATCH v7 05/10] dir-iterator: refactor state machine model
  2019-06-18 23:27         ` [GSoC][PATCH v7 00/10] clone: dir-iterator refactoring with tests Matheus Tavares
                             ` (3 preceding siblings ...)
  2019-06-18 23:27           ` [GSoC][PATCH v7 04/10] dir-iterator: use warning_errno when possible Matheus Tavares
@ 2019-06-18 23:27           ` Matheus Tavares
  2019-06-18 23:27           ` [GSoC][PATCH v7 06/10] dir-iterator: add flags parameter to dir_iterator_begin Matheus Tavares
                             ` (7 subsequent siblings)
  12 siblings, 0 replies; 127+ messages in thread
From: Matheus Tavares @ 2019-06-18 23:27 UTC (permalink / raw)
  To: git
  Cc: Thomas Gummerer, Ævar Arnfjörð Bjarmason,
	Christian Couder, Nguyễn Thái Ngọc Duy,
	SZEDER Gábor, Olga Telezhnaya, kernel-usp, Daniel Ferreira,
	Jeff King, Johannes Schindelin, Michael Haggerty, Junio C Hamano,
	Ramsay Jones

dir_iterator_advance() is a large function with two nested loops. Let's
improve its readability by factoring out three functions and simplifying
its mechanics. The refactored model no longer depends on
level.initialized and level.dir_state to keep track of the iteration
state, and it runs in a single loop.
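
As a rough illustration of the single-loop model, here is a minimal
sketch of a pre-order walk driven by an explicit stack of open
directories. It is not the code in this patch: demo_walk(), struct
demo_level and the fixed-size limits are hypothetical names chosen for
the example, it uses a callback instead of an iterator, and it pushes a
level as soon as a directory is reported rather than on the next
advance call.

```c
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

#define DEMO_MAX_DEPTH 64	/* hypothetical limit for this sketch */
#define DEMO_PATH_MAX 4096	/* hypothetical limit for this sketch */

struct demo_level {
	DIR *dir;
	size_t prefix_len;	/* path length up to and including '/' */
};

/*
 * Walk root in pre-order with one loop and an explicit stack, calling
 * fn(path, st) for each entry if fn is not NULL. Return the number of
 * entries visited, or -1 if root cannot be opened.
 */
static long demo_walk(const char *root,
		      void (*fn)(const char *path, const struct stat *st))
{
	struct demo_level levels[DEMO_MAX_DEPTH];
	char path[DEMO_PATH_MAX];
	size_t nr = 0;
	long visited = 0;
	int n;

	assert(root && *root);
	n = snprintf(path, sizeof(path), "%s/", root);
	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;
	levels[0].dir = opendir(path);
	if (!levels[0].dir)
		return -1;
	levels[0].prefix_len = (size_t)n;
	nr = 1;

	while (nr > 0) {
		struct demo_level *level = &levels[nr - 1];
		struct dirent *de;
		struct stat st;
		size_t len;

		/* Truncate the path back to this level's directory. */
		path[level->prefix_len] = '\0';
		de = readdir(level->dir);
		if (!de) {
			/* This level is exhausted: pop it. */
			closedir(level->dir);
			nr--;
			continue;
		}
		if (!strcmp(de->d_name, ".") || !strcmp(de->d_name, ".."))
			continue;

		len = level->prefix_len + strlen(de->d_name);
		if (len + 2 > sizeof(path))
			continue;	/* too long for this sketch; skip */
		strcpy(path + level->prefix_len, de->d_name);
		if (lstat(path, &st))
			continue;

		visited++;
		if (fn)
			fn(path, &st);

		/* Push a level so the directory's contents come next. */
		if (S_ISDIR(st.st_mode) && nr < DEMO_MAX_DEPTH) {
			DIR *dir;

			path[len] = '/';
			path[len + 1] = '\0';
			dir = opendir(path);
			if (dir) {
				levels[nr].dir = dir;
				levels[nr].prefix_len = len + 1;
				nr++;
			}
		}
	}
	return visited;
}
```

Because a directory is reported before its level is pushed, directories
always come before their contents, which is the ordering the header
documentation promises.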

Also, dir_iterator_begin() currently does not check whether the given
string represents a valid directory path. Since the refactored model has
to stat() the given path at initialization, let's also check for this
kind of error and make dir_iterator_begin() return NULL on failure,
with errno appropriately set. Add tests for this new behavior.
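
The error convention described above can be sketched as follows. This
is a simplified stand-in, not the patched function: struct demo_iter,
demo_iter_begin() and demo_iter_free() are hypothetical names, and the
real iterator carries more state.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <sys/stat.h>

/* Hypothetical stand-in for the iterator state; not git's struct. */
struct demo_iter {
	struct stat st;
};

/*
 * Return a new demo_iter for path, or NULL with errno set: either the
 * value stat() reported, or ENOTDIR if path is not a directory.
 */
static struct demo_iter *demo_iter_begin(const char *path)
{
	struct demo_iter *iter;
	int saved_errno;

	assert(path != NULL);
	iter = calloc(1, sizeof(*iter));
	if (!iter)
		return NULL;

	if (stat(path, &iter->st) < 0) {
		saved_errno = errno;
		goto error_out;
	}
	if (!S_ISDIR(iter->st.st_mode)) {
		saved_errno = ENOTDIR;
		goto error_out;
	}
	return iter;

error_out:
	/* Cleanup may clobber errno, so restore it afterwards. */
	free(iter);
	errno = saved_errno;
	return NULL;
}

static void demo_iter_free(struct demo_iter *iter)
{
	free(iter);
}
```

A caller would then test the result for NULL and inspect errno, for
example treating ENOENT as "nothing to iterate over".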

Improve documentation at dir-iterator.h and code comments at
dir-iterator.c to reflect the changes and eliminate possible
ambiguities.

Finally, adjust refs/files-backend.c to check for now possible
dir_iterator_begin() failures.

Original-patch-by: Daniel Ferreira <bnmvco@gmail.com>
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
---
 dir-iterator.c               | 234 ++++++++++++++++++-----------------
 dir-iterator.h               |  15 ++-
 refs/files-backend.c         |  17 ++-
 t/helper/test-dir-iterator.c |   5 +
 t/t0066-dir-iterator.sh      |  13 ++
 5 files changed, 163 insertions(+), 121 deletions(-)

diff --git a/dir-iterator.c b/dir-iterator.c
index 0c8880868a..594fe4d67b 100644
--- a/dir-iterator.c
+++ b/dir-iterator.c
@@ -4,8 +4,6 @@
 #include "dir-iterator.h"
 
 struct dir_iterator_level {
-	int initialized;
-
 	DIR *dir;
 
 	/*
@@ -13,16 +11,6 @@ struct dir_iterator_level {
 	 * (including a trailing '/'):
 	 */
 	size_t prefix_len;
-
-	/*
-	 * The last action that has been taken with the current entry
-	 * (needed for directories, which have to be included in the
-	 * iteration and also iterated into):
-	 */
-	enum {
-		DIR_STATE_ITER,
-		DIR_STATE_RECURSE
-	} dir_state;
 };
 
 /*
@@ -34,9 +22,11 @@ struct dir_iterator_int {
 	struct dir_iterator base;
 
 	/*
-	 * The number of levels currently on the stack. This is always
-	 * at least 1, because when it becomes zero the iteration is
-	 * ended and this struct is freed.
+	 * The number of levels currently on the stack. After the first
+	 * call to dir_iterator_begin(), if it succeeds to open the
+	 * first level's dir, this will always be at least 1. Then,
+	 * when it comes to zero the iteration is ended and this
+	 * struct is freed.
 	 */
 	size_t levels_nr;
 
@@ -50,113 +40,118 @@ struct dir_iterator_int {
 	struct dir_iterator_level *levels;
 };
 
+/*
+ * Push a level in the iter stack and initialize it with information from
+ * the directory pointed by iter->base->path. It is assumed that this
+ * strbuf points to a valid directory path. Return 0 on success and -1
+ * otherwise, leaving the stack unchanged.
+ */
+static int push_level(struct dir_iterator_int *iter)
+{
+	struct dir_iterator_level *level;
+
+	ALLOC_GROW(iter->levels, iter->levels_nr + 1, iter->levels_alloc);
+	level = &iter->levels[iter->levels_nr++];
+
+	if (!is_dir_sep(iter->base.path.buf[iter->base.path.len - 1]))
+		strbuf_addch(&iter->base.path, '/');
+	level->prefix_len = iter->base.path.len;
+
+	level->dir = opendir(iter->base.path.buf);
+	if (!level->dir) {
+		if (errno != ENOENT) {
+			warning_errno("error opening directory '%s'",
+				      iter->base.path.buf);
+		}
+		iter->levels_nr--;
+		return -1;
+	}
+
+	return 0;
+}
+
+/*
+ * Pop the top level on the iter stack, releasing any resources associated
+ * with it. Return the new value of iter->levels_nr.
+ */
+static int pop_level(struct dir_iterator_int *iter)
+{
+	struct dir_iterator_level *level =
+		&iter->levels[iter->levels_nr - 1];
+
+	if (level->dir && closedir(level->dir))
+		warning_errno("error closing directory '%s'",
+			      iter->base.path.buf);
+	level->dir = NULL;
+
+	return --iter->levels_nr;
+}
+
+/*
+ * Populate iter->base with the necessary information on the next iteration
+ * entry, represented by the given dirent de. Return 0 on success and -1
+ * otherwise.
+ */
+static int prepare_next_entry_data(struct dir_iterator_int *iter,
+				   struct dirent *de)
+{
+	strbuf_addstr(&iter->base.path, de->d_name);
+	/*
+	 * We have to reset these because the path strbuf might have
+	 * been realloc()ed at the previous strbuf_addstr().
+	 */
+	iter->base.relative_path = iter->base.path.buf +
+				   iter->levels[0].prefix_len;
+	iter->base.basename = iter->base.path.buf +
+			      iter->levels[iter->levels_nr - 1].prefix_len;
+
+	if (lstat(iter->base.path.buf, &iter->base.st)) {
+		if (errno != ENOENT)
+			warning_errno("failed to stat '%s'", iter->base.path.buf);
+		return -1;
+	}
+
+	return 0;
+}
+
 int dir_iterator_advance(struct dir_iterator *dir_iterator)
 {
 	struct dir_iterator_int *iter =
 		(struct dir_iterator_int *)dir_iterator;
 
+	if (S_ISDIR(iter->base.st.st_mode)) {
+		if (push_level(iter) && iter->levels_nr == 0) {
+			/* Pushing the first level failed */
+			return dir_iterator_abort(dir_iterator);
+		}
+	}
+
+	/* Loop until we find an entry that we can give back to the caller. */
 	while (1) {
+		struct dirent *de;
 		struct dir_iterator_level *level =
 			&iter->levels[iter->levels_nr - 1];
-		struct dirent *de;
 
-		if (!level->initialized) {
-			/*
-			 * Note: dir_iterator_begin() ensures that
-			 * path is not the empty string.
-			 */
-			if (!is_dir_sep(iter->base.path.buf[iter->base.path.len - 1]))
-				strbuf_addch(&iter->base.path, '/');
-			level->prefix_len = iter->base.path.len;
-
-			level->dir = opendir(iter->base.path.buf);
-			if (!level->dir && errno != ENOENT) {
-				warning_errno("error opening directory '%s'",
+		strbuf_setlen(&iter->base.path, level->prefix_len);
+		errno = 0;
+		de = readdir(level->dir);
+
+		if (!de) {
+			if (errno)
+				warning_errno("error reading directory '%s'",
 					      iter->base.path.buf);
-				/* Popping the level is handled below */
-			}
-
-			level->initialized = 1;
-		} else if (S_ISDIR(iter->base.st.st_mode)) {
-			if (level->dir_state == DIR_STATE_ITER) {
-				/*
-				 * The directory was just iterated
-				 * over; now prepare to iterate into
-				 * it.
-				 */
-				level->dir_state = DIR_STATE_RECURSE;
-				ALLOC_GROW(iter->levels, iter->levels_nr + 1,
-					   iter->levels_alloc);
-				level = &iter->levels[iter->levels_nr++];
-				level->initialized = 0;
-				continue;
-			} else {
-				/*
-				 * The directory has already been
-				 * iterated over and iterated into;
-				 * we're done with it.
-				 */
-			}
+			else if (pop_level(iter) == 0)
+				return dir_iterator_abort(dir_iterator);
+			continue;
 		}
 
-		if (!level->dir) {
-			/*
-			 * This level is exhausted (or wasn't opened
-			 * successfully); pop up a level.
-			 */
-			if (--iter->levels_nr == 0)
-				return dir_iterator_abort(dir_iterator);
+		if (is_dot_or_dotdot(de->d_name))
+			continue;
 
+		if (prepare_next_entry_data(iter, de))
 			continue;
-		}
 
-		/*
-		 * Loop until we find an entry that we can give back
-		 * to the caller:
-		 */
-		while (1) {
-			strbuf_setlen(&iter->base.path, level->prefix_len);
-			errno = 0;
-			de = readdir(level->dir);
-
-			if (!de) {
-				/* This level is exhausted; pop up a level. */
-				if (errno) {
-					warning_errno("error reading directory '%s'",
-						      iter->base.path.buf);
-				} else if (closedir(level->dir))
-					warning_errno("error closing directory '%s'",
-						      iter->base.path.buf);
-
-				level->dir = NULL;
-				if (--iter->levels_nr == 0)
-					return dir_iterator_abort(dir_iterator);
-				break;
-			}
-
-			if (is_dot_or_dotdot(de->d_name))
-				continue;
-
-			strbuf_addstr(&iter->base.path, de->d_name);
-			if (lstat(iter->base.path.buf, &iter->base.st) < 0) {
-				if (errno != ENOENT)
-					warning_errno("failed to stat '%s'",
-						      iter->base.path.buf);
-				continue;
-			}
-
-			/*
-			 * We have to set these each time because
-			 * the path strbuf might have been realloc()ed.
-			 */
-			iter->base.relative_path =
-				iter->base.path.buf + iter->levels[0].prefix_len;
-			iter->base.basename =
-				iter->base.path.buf + level->prefix_len;
-			level->dir_state = DIR_STATE_ITER;
-
-			return ITER_OK;
-		}
+		return ITER_OK;
 	}
 }
 
@@ -187,17 +182,32 @@ struct dir_iterator *dir_iterator_begin(const char *path)
 {
 	struct dir_iterator_int *iter = xcalloc(1, sizeof(*iter));
 	struct dir_iterator *dir_iterator = &iter->base;
-
-	if (!path || !*path)
-		BUG("empty path passed to dir_iterator_begin()");
+	int saved_errno;
 
 	strbuf_init(&iter->base.path, PATH_MAX);
 	strbuf_addstr(&iter->base.path, path);
 
 	ALLOC_GROW(iter->levels, 10, iter->levels_alloc);
+	iter->levels_nr = 0;
 
-	iter->levels_nr = 1;
-	iter->levels[0].initialized = 0;
+	/*
+	 * Note: stat already checks for NULL or empty strings and
+	 * inexistent paths.
+	 */
+	if (stat(iter->base.path.buf, &iter->base.st) < 0) {
+		saved_errno = errno;
+		goto error_out;
+	}
+
+	if (!S_ISDIR(iter->base.st.st_mode)) {
+		saved_errno = ENOTDIR;
+		goto error_out;
+	}
 
 	return dir_iterator;
+
+error_out:
+	dir_iterator_abort(dir_iterator);
+	errno = saved_errno;
+	return NULL;
 }
diff --git a/dir-iterator.h b/dir-iterator.h
index 970793d07a..0822821e56 100644
--- a/dir-iterator.h
+++ b/dir-iterator.h
@@ -8,19 +8,23 @@
  *
  * Iterate over a directory tree, recursively, including paths of all
  * types and hidden paths. Skip "." and ".." entries and don't follow
- * symlinks except for the original path.
+ * symlinks except for the original path. Note that the original path
+ * is not included in the iteration.
  *
  * Every time dir_iterator_advance() is called, update the members of
  * the dir_iterator structure to reflect the next path in the
  * iteration. The order that paths are iterated over within a
- * directory is undefined, but directory paths are always iterated
- * over before the subdirectory contents.
+ * directory is undefined, but directory paths are always given before
+ * their contents.
  *
  * A typical iteration looks like this:
  *
  *     int ok;
  *     struct iterator *iter = dir_iterator_begin(path);
  *
+ *     if (!iter)
+ *             goto error_handler;
+ *
  *     while ((ok = dir_iterator_advance(iter)) == ITER_OK) {
  *             if (want_to_stop_iteration()) {
  *                     ok = dir_iterator_abort(iter);
@@ -59,8 +63,9 @@ struct dir_iterator {
 };
 
 /*
- * Start a directory iteration over path. Return a dir_iterator that
- * holds the internal state of the iteration.
+ * Start a directory iteration over path. On success, return a
+ * dir_iterator that holds the internal state of the iteration.
+ * In case of failure, return NULL and set errno accordingly.
  *
  * The iteration includes all paths under path, not including path
  * itself and not including "." or ".." entries.
diff --git a/refs/files-backend.c b/refs/files-backend.c
index 63e55e6773..7ed81046d4 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -2143,13 +2143,22 @@ static struct ref_iterator_vtable files_reflog_iterator_vtable = {
 static struct ref_iterator *reflog_iterator_begin(struct ref_store *ref_store,
 						  const char *gitdir)
 {
-	struct files_reflog_iterator *iter = xcalloc(1, sizeof(*iter));
-	struct ref_iterator *ref_iterator = &iter->base;
+	struct dir_iterator *diter;
+	struct files_reflog_iterator *iter;
+	struct ref_iterator *ref_iterator;
 	struct strbuf sb = STRBUF_INIT;
 
-	base_ref_iterator_init(ref_iterator, &files_reflog_iterator_vtable, 0);
 	strbuf_addf(&sb, "%s/logs", gitdir);
-	iter->dir_iterator = dir_iterator_begin(sb.buf);
+
+	diter = dir_iterator_begin(sb.buf);
+	if(!diter)
+		return empty_ref_iterator_begin();
+
+	iter = xcalloc(1, sizeof(*iter));
+	ref_iterator = &iter->base;
+
+	base_ref_iterator_init(ref_iterator, &files_reflog_iterator_vtable, 0);
+	iter->dir_iterator = diter;
 	iter->ref_store = ref_store;
 	strbuf_release(&sb);
 
diff --git a/t/helper/test-dir-iterator.c b/t/helper/test-dir-iterator.c
index 84f50bed8c..fab1ff6237 100644
--- a/t/helper/test-dir-iterator.c
+++ b/t/helper/test-dir-iterator.c
@@ -17,6 +17,11 @@ int cmd__dir_iterator(int argc, const char **argv)
 
 	diter = dir_iterator_begin(path.buf);
 
+	if (!diter) {
+		printf("dir_iterator_begin failure: %d\n", errno);
+		exit(EXIT_FAILURE);
+	}
+
 	while (dir_iterator_advance(diter) == ITER_OK) {
 		if (S_ISDIR(diter->st.st_mode))
 			printf("[d] ");
diff --git a/t/t0066-dir-iterator.sh b/t/t0066-dir-iterator.sh
index 6e06dc038d..c739ed7911 100755
--- a/t/t0066-dir-iterator.sh
+++ b/t/t0066-dir-iterator.sh
@@ -52,4 +52,17 @@ test_expect_success 'dir-iterator should list files in the correct order' '
 	test_cmp expected-pre-order-output actual-pre-order-output
 '
 
+test_expect_success 'begin should fail upon inexistent paths' '
+	test_must_fail test-tool dir-iterator ./inexistent-path \
+		>actual-inexistent-path-output &&
+	echo "dir_iterator_begin failure: 2" >expected-inexistent-path-output &&
+	test_cmp expected-inexistent-path-output actual-inexistent-path-output
+'
+
+test_expect_success 'begin should fail upon non directory paths' '
+	test_must_fail test-tool dir-iterator ./dir/b >actual-non-dir-output &&
+	echo "dir_iterator_begin failure: 20" >expected-non-dir-output &&
+	test_cmp expected-non-dir-output actual-non-dir-output
+'
+
 test_done
-- 
2.22.0



* [GSoC][PATCH v7 06/10] dir-iterator: add flags parameter to dir_iterator_begin
  2019-06-18 23:27         ` [GSoC][PATCH v7 00/10] clone: dir-iterator refactoring with tests Matheus Tavares
                             ` (4 preceding siblings ...)
  2019-06-18 23:27           ` [GSoC][PATCH v7 05/10] dir-iterator: refactor state machine model Matheus Tavares
@ 2019-06-18 23:27           ` Matheus Tavares
  2019-06-25 18:00             ` Junio C Hamano
                               ` (2 more replies)
  2019-06-18 23:27           ` [GSoC][PATCH v7 07/10] clone: copy hidden paths at local clone Matheus Tavares
                             ` (6 subsequent siblings)
  12 siblings, 3 replies; 127+ messages in thread
From: Matheus Tavares @ 2019-06-18 23:27 UTC (permalink / raw)
  To: git
  Cc: Thomas Gummerer, Ævar Arnfjörð Bjarmason,
	Christian Couder, Nguyễn Thái Ngọc Duy,
	SZEDER Gábor, Olga Telezhnaya, kernel-usp, Michael Haggerty,
	Daniel Ferreira, Ramsay Jones, Junio C Hamano

Add the possibility of giving flags to dir_iterator_begin to initialize
a dir-iterator with special options.

Currently possible flags are:
- DIR_ITERATOR_PEDANTIC, which makes dir_iterator_advance abort
immediately in the case of an error, instead of continuing to look for
the next valid entry;
- DIR_ITERATOR_FOLLOW_SYMLINKS, which makes the iterator follow
symlinks and include linked directories' contents in the iteration.
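
To make the two flags more concrete, here is a small sketch of how bit
flags can select between stat() and lstat(), and how a recursive
symlink can be detected by comparing inode numbers against the
directories already on the stack. The DEMO_* macros and demo_*()
helpers are hypothetical names for illustration, not the identifiers
added by this patch.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical flag names mirroring the two flags described above. */
#define DEMO_PEDANTIC        (1u << 0)
#define DEMO_FOLLOW_SYMLINKS (1u << 1)

/* stat() follows symlinks and lstat() does not; pick one by flag. */
static int demo_stat(const char *path, struct stat *st, unsigned int flags)
{
	assert(path && st);
	if (flags & DEMO_FOLLOW_SYMLINKS)
		return stat(path, st);
	return lstat(path, st);
}

/*
 * Return 1 if st describes a directory whose inode number matches one
 * of the nr ancestor directories currently being iterated, else 0.
 */
static int demo_is_recursive(const struct stat *st,
			     const ino_t *ancestors, size_t nr)
{
	size_t i;

	if (!S_ISDIR(st->st_mode))
		return 0;
	for (i = 0; i < nr; i++)
		if (st->st_ino == ancestors[i])
			return 1;
	return 0;
}
```

Flags combine with bitwise OR, e.g. DEMO_PEDANTIC | DEMO_FOLLOW_SYMLINKS.
Note that inode numbers are only unique within one filesystem, so a
stricter variant of this check could also compare st_dev.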

These new flags will be used in a subsequent patch.

Also add tests for the flags' usage and adjust refs/files-backend.c to
the new dir_iterator_begin signature.

Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
---
 dir-iterator.c               | 82 +++++++++++++++++++++++++------
 dir-iterator.h               | 51 ++++++++++++++-----
 refs/files-backend.c         |  2 +-
 t/helper/test-dir-iterator.c | 34 ++++++++++---
 t/t0066-dir-iterator.sh      | 95 ++++++++++++++++++++++++++++++++++++
 5 files changed, 229 insertions(+), 35 deletions(-)

diff --git a/dir-iterator.c b/dir-iterator.c
index 594fe4d67b..52db87bdc9 100644
--- a/dir-iterator.c
+++ b/dir-iterator.c
@@ -6,6 +6,9 @@
 struct dir_iterator_level {
 	DIR *dir;
 
+	/* The inode number of this level's directory. */
+	ino_t ino;
+
 	/*
 	 * The length of the directory part of path at this level
 	 * (including a trailing '/'):
@@ -38,13 +41,16 @@ struct dir_iterator_int {
 	 * that will be included in this iteration.
 	 */
 	struct dir_iterator_level *levels;
+
+	/* Combination of flags for this dir-iterator */
+	unsigned int flags;
 };
 
 /*
  * Push a level in the iter stack and initialize it with information from
  * the directory pointed by iter->base->path. It is assumed that this
  * strbuf points to a valid directory path. Return 0 on success and -1
- * otherwise, leaving the stack unchanged.
+ * otherwise, setting errno accordingly and leaving the stack unchanged.
  */
 static int push_level(struct dir_iterator_int *iter)
 {
@@ -56,14 +62,17 @@ static int push_level(struct dir_iterator_int *iter)
 	if (!is_dir_sep(iter->base.path.buf[iter->base.path.len - 1]))
 		strbuf_addch(&iter->base.path, '/');
 	level->prefix_len = iter->base.path.len;
+	level->ino = iter->base.st.st_ino;
 
 	level->dir = opendir(iter->base.path.buf);
 	if (!level->dir) {
+		int saved_errno = errno;
 		if (errno != ENOENT) {
 			warning_errno("error opening directory '%s'",
 				      iter->base.path.buf);
 		}
 		iter->levels_nr--;
+		errno = saved_errno;
 		return -1;
 	}
 
@@ -90,11 +99,13 @@ static int pop_level(struct dir_iterator_int *iter)
 /*
  * Populate iter->base with the necessary information on the next iteration
  * entry, represented by the given dirent de. Return 0 on success and -1
- * otherwise.
+ * otherwise, setting errno accordingly.
  */
 static int prepare_next_entry_data(struct dir_iterator_int *iter,
 				   struct dirent *de)
 {
+	int err, saved_errno;
+
 	strbuf_addstr(&iter->base.path, de->d_name);
 	/*
 	 * We have to reset these because the path strbuf might have
@@ -105,12 +116,34 @@ static int prepare_next_entry_data(struct dir_iterator_int *iter,
 	iter->base.basename = iter->base.path.buf +
 			      iter->levels[iter->levels_nr - 1].prefix_len;
 
-	if (lstat(iter->base.path.buf, &iter->base.st)) {
-		if (errno != ENOENT)
-			warning_errno("failed to stat '%s'", iter->base.path.buf);
-		return -1;
-	}
+	if (iter->flags & DIR_ITERATOR_FOLLOW_SYMLINKS)
+		err = stat(iter->base.path.buf, &iter->base.st);
+	else
+		err = lstat(iter->base.path.buf, &iter->base.st);
+
+	saved_errno = errno;
+	if (err && errno != ENOENT)
+		warning_errno("failed to stat '%s'", iter->base.path.buf);
+
+	errno = saved_errno;
+	return err;
+}
+
+/*
+ * Look for a recursive symlink at iter->base.path pointing to any directory on
+ * the previous stack levels. If it is found, return 1. If not, return 0.
+ */
+static int find_recursive_symlinks(struct dir_iterator_int *iter)
+{
+	int i;
+
+	if (!(iter->flags & DIR_ITERATOR_FOLLOW_SYMLINKS) ||
+	    !S_ISDIR(iter->base.st.st_mode))
+		return 0;
 
+	for (i = 0; i < iter->levels_nr; ++i)
+		if (iter->base.st.st_ino == iter->levels[i].ino)
+			return 1;
 	return 0;
 }
 
@@ -119,11 +152,11 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
 	struct dir_iterator_int *iter =
 		(struct dir_iterator_int *)dir_iterator;
 
-	if (S_ISDIR(iter->base.st.st_mode)) {
-		if (push_level(iter) && iter->levels_nr == 0) {
-			/* Pushing the first level failed */
-			return dir_iterator_abort(dir_iterator);
-		}
+	if (S_ISDIR(iter->base.st.st_mode) && push_level(iter)) {
+		if (errno != ENOENT && iter->flags & DIR_ITERATOR_PEDANTIC)
+			goto error_out;
+		if (iter->levels_nr == 0)
+			goto error_out;
 	}
 
 	/* Loop until we find an entry that we can give back to the caller. */
@@ -137,22 +170,38 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
 		de = readdir(level->dir);
 
 		if (!de) {
-			if (errno)
+			if (errno) {
 				warning_errno("error reading directory '%s'",
 					      iter->base.path.buf);
-			else if (pop_level(iter) == 0)
+				if (iter->flags & DIR_ITERATOR_PEDANTIC)
+					goto error_out;
+			} else if (pop_level(iter) == 0) {
 				return dir_iterator_abort(dir_iterator);
+			}
 			continue;
 		}
 
 		if (is_dot_or_dotdot(de->d_name))
 			continue;
 
-		if (prepare_next_entry_data(iter, de))
+		if (prepare_next_entry_data(iter, de)) {
+			if (errno != ENOENT && iter->flags & DIR_ITERATOR_PEDANTIC)
+				goto error_out;
 			continue;
+		}
+
+		if (find_recursive_symlinks(iter)) {
+			warning("ignoring recursive symlink at '%s'",
+				iter->base.path.buf);
+			continue;
+		}
 
 		return ITER_OK;
 	}
+
+error_out:
+	dir_iterator_abort(dir_iterator);
+	return ITER_ERROR;
 }
 
 int dir_iterator_abort(struct dir_iterator *dir_iterator)
@@ -178,7 +227,7 @@ int dir_iterator_abort(struct dir_iterator *dir_iterator)
 	return ITER_DONE;
 }
 
-struct dir_iterator *dir_iterator_begin(const char *path)
+struct dir_iterator *dir_iterator_begin(const char *path, unsigned int flags)
 {
 	struct dir_iterator_int *iter = xcalloc(1, sizeof(*iter));
 	struct dir_iterator *dir_iterator = &iter->base;
@@ -189,6 +238,7 @@ struct dir_iterator *dir_iterator_begin(const char *path)
 
 	ALLOC_GROW(iter->levels, 10, iter->levels_alloc);
 	iter->levels_nr = 0;
+	iter->flags = flags;
 
 	/*
 	 * Note: stat already checks for NULL or empty strings and
diff --git a/dir-iterator.h b/dir-iterator.h
index 0822821e56..42751091a5 100644
--- a/dir-iterator.h
+++ b/dir-iterator.h
@@ -20,7 +20,8 @@
  * A typical iteration looks like this:
  *
  *     int ok;
- *     struct iterator *iter = dir_iterator_begin(path);
+ *     unsigned int flags = DIR_ITERATOR_PEDANTIC;
+ *     struct dir_iterator *iter = dir_iterator_begin(path, flags);
  *
  *     if (!iter)
  *             goto error_handler;
@@ -44,6 +45,25 @@
  * dir_iterator_advance() again.
  */
 
+/*
+ * Flags for dir_iterator_begin:
+ *
+ * - DIR_ITERATOR_PEDANTIC: override dir-iterator's default behavior
+ *   in case of an error at dir_iterator_advance(), which is to keep
+ *   looking for a next valid entry. With this flag, resources are freed
+ *   and ITER_ERROR is returned immediately. In both cases, a meaningful
+ *   warning is emitted. Note: ENOENT errors are always ignored so that
+ *   the API users may remove files during iteration.
+ *
+ * - DIR_ITERATOR_FOLLOW_SYMLINKS: make dir-iterator follow symlinks.
+ *   i.e., linked directories' contents will be iterated over and
+ *   iter->base.st will contain information on the referred files,
+ *   not the symlinks themselves, which is the default behavior.
+ *   Recursive symlinks are skipped with a warning and broken symlinks
+ *   are ignored.
+ */
+#define DIR_ITERATOR_PEDANTIC (1 << 0)
+#define DIR_ITERATOR_FOLLOW_SYMLINKS (1 << 1)
+
 struct dir_iterator {
 	/* The current path: */
 	struct strbuf path;
@@ -58,29 +78,38 @@ struct dir_iterator {
 	/* The current basename: */
 	const char *basename;
 
-	/* The result of calling lstat() on path: */
+	/*
+	 * The result of calling lstat() on path; or stat(), if the
+	 * DIR_ITERATOR_FOLLOW_SYMLINKS flag was set at
+	 * dir_iterator's initialization.
+	 */
 	struct stat st;
 };
 
 /*
- * Start a directory iteration over path. On success, return a
- * dir_iterator that holds the internal state of the iteration.
- * In case of failure, return NULL and set errno accordingly.
+ * Start a directory iteration over path with the combination of
+ * options specified by flags. On success, return a dir_iterator
+ * that holds the internal state of the iteration. In case of
+ * failure, return NULL and set errno accordingly.
  *
  * The iteration includes all paths under path, not including path
  * itself and not including "." or ".." entries.
  *
- * path is the starting directory. An internal copy will be made.
+ * Parameters are:
+ *  - path is the starting directory. An internal copy will be made.
+ *  - flags is a combination of the possible flags to initialize a
+ *    dir-iterator or 0 for default behavior.
  */
-struct dir_iterator *dir_iterator_begin(const char *path);
+struct dir_iterator *dir_iterator_begin(const char *path, unsigned int flags);
 
 /*
  * Advance the iterator to the first or next item and return ITER_OK.
  * If the iteration is exhausted, free the dir_iterator and any
- * resources associated with it and return ITER_DONE. On error, free
- * dir_iterator and associated resources and return ITER_ERROR. It is
- * a bug to use iterator or call this function again after it has
- * returned ITER_DONE or ITER_ERROR.
+ * resources associated with it and return ITER_DONE.
+ *
+ * It is a bug to use iterator or call this function again after it
+ * has returned ITER_DONE or ITER_ERROR (which may be returned iff
+ * the DIR_ITERATOR_PEDANTIC flag was set).
  */
 int dir_iterator_advance(struct dir_iterator *iterator);
 
diff --git a/refs/files-backend.c b/refs/files-backend.c
index 7ed81046d4..b1f8f53a09 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -2150,7 +2150,7 @@ static struct ref_iterator *reflog_iterator_begin(struct ref_store *ref_store,
 
 	strbuf_addf(&sb, "%s/logs", gitdir);
 
-	diter = dir_iterator_begin(sb.buf);
+	diter = dir_iterator_begin(sb.buf, 0);
 	if(!diter)
 		return empty_ref_iterator_begin();
 
diff --git a/t/helper/test-dir-iterator.c b/t/helper/test-dir-iterator.c
index fab1ff6237..a5b96cb0dc 100644
--- a/t/helper/test-dir-iterator.c
+++ b/t/helper/test-dir-iterator.c
@@ -4,29 +4,44 @@
 #include "iterator.h"
 #include "dir-iterator.h"
 
-/* Argument is a directory path to iterate over */
+/*
+ * usage:
+ * test-tool dir-iterator [--follow-symlinks] [--pedantic] directory_path
+ */
 int cmd__dir_iterator(int argc, const char **argv)
 {
 	struct strbuf path = STRBUF_INIT;
 	struct dir_iterator *diter;
+	unsigned int flags = 0;
+	int iter_status;
+
+	for (++argv, --argc; *argv && starts_with(*argv, "--"); ++argv, --argc) {
+		if (strcmp(*argv, "--follow-symlinks") == 0)
+			flags |= DIR_ITERATOR_FOLLOW_SYMLINKS;
+		else if (strcmp(*argv, "--pedantic") == 0)
+			flags |= DIR_ITERATOR_PEDANTIC;
+		else
+			die("invalid option '%s'", *argv);
+	}
 
-	if (argc < 2)
-		die("BUG: test-dir-iterator needs one argument");
-
-	strbuf_add(&path, argv[1], strlen(argv[1]));
+	if (!*argv || argc != 1)
+		die("dir-iterator needs exactly one non-option argument");
 
-	diter = dir_iterator_begin(path.buf);
+	strbuf_add(&path, *argv, strlen(*argv));
+	diter = dir_iterator_begin(path.buf, flags);
 
 	if (!diter) {
 		printf("dir_iterator_begin failure: %d\n", errno);
 		exit(EXIT_FAILURE);
 	}
 
-	while (dir_iterator_advance(diter) == ITER_OK) {
+	while ((iter_status = dir_iterator_advance(diter)) == ITER_OK) {
 		if (S_ISDIR(diter->st.st_mode))
 			printf("[d] ");
 		else if (S_ISREG(diter->st.st_mode))
 			printf("[f] ");
+		else if (S_ISLNK(diter->st.st_mode))
+			printf("[s] ");
 		else
 			printf("[?] ");
 
@@ -34,5 +49,10 @@ int cmd__dir_iterator(int argc, const char **argv)
 		       diter->path.buf);
 	}
 
+	if (iter_status != ITER_DONE) {
+		printf("dir_iterator_advance failure\n");
+		return 1;
+	}
+
 	return 0;
 }
diff --git a/t/t0066-dir-iterator.sh b/t/t0066-dir-iterator.sh
index c739ed7911..8f996a31fa 100755
--- a/t/t0066-dir-iterator.sh
+++ b/t/t0066-dir-iterator.sh
@@ -65,4 +65,99 @@ test_expect_success 'begin should fail upon non directory paths' '
 	test_cmp expected-non-dir-output actual-non-dir-output
 '
 
+test_expect_success POSIXPERM,SANITY 'advance should not fail on errors by default' '
+	cat >expected-no-permissions-output <<-EOF &&
+	[d] (a) [a] ./dir3/a
+	EOF
+
+	mkdir -p dir3/a &&
+	> dir3/a/b &&
+	chmod 0 dir3/a &&
+
+	test-tool dir-iterator ./dir3 >actual-no-permissions-output &&
+	test_cmp expected-no-permissions-output actual-no-permissions-output &&
+	chmod 755 dir3/a &&
+	rm -rf dir3
+'
+
+test_expect_success POSIXPERM,SANITY 'advance should fail on errors, w/ pedantic flag' '
+	cat >expected-no-permissions-pedantic-output <<-EOF &&
+	[d] (a) [a] ./dir3/a
+	dir_iterator_advance failure
+	EOF
+
+	mkdir -p dir3/a &&
+	> dir3/a/b &&
+	chmod 0 dir3/a &&
+
+	test_must_fail test-tool dir-iterator --pedantic ./dir3 \
+		>actual-no-permissions-pedantic-output &&
+	test_cmp expected-no-permissions-pedantic-output \
+		actual-no-permissions-pedantic-output &&
+	chmod 755 dir3/a &&
+	rm -rf dir3
+'
+
+test_expect_success SYMLINKS 'setup dirs with symlinks' '
+	mkdir -p dir4/a &&
+	mkdir -p dir4/b/c &&
+	>dir4/a/d &&
+	ln -s d dir4/a/e &&
+	ln -s ../b dir4/a/f &&
+
+	mkdir -p dir5/a/b &&
+	mkdir -p dir5/a/c &&
+	ln -s ../c dir5/a/b/d &&
+	ln -s ../ dir5/a/b/e &&
+	ln -s ../../ dir5/a/b/f
+'
+
+test_expect_success SYMLINKS 'dir-iterator should not follow symlinks by default' '
+	cat >expected-no-follow-sorted-output <<-EOF &&
+	[d] (a) [a] ./dir4/a
+	[d] (b) [b] ./dir4/b
+	[d] (b/c) [c] ./dir4/b/c
+	[f] (a/d) [d] ./dir4/a/d
+	[s] (a/e) [e] ./dir4/a/e
+	[s] (a/f) [f] ./dir4/a/f
+	EOF
+
+	test-tool dir-iterator ./dir4 >out &&
+	sort <out >actual-no-follow-sorted-output &&
+
+	test_cmp expected-no-follow-sorted-output actual-no-follow-sorted-output
+'
+
+test_expect_success SYMLINKS 'dir-iterator should follow symlinks w/ follow flag' '
+	cat >expected-follow-sorted-output <<-EOF &&
+	[d] (a) [a] ./dir4/a
+	[d] (a/f) [f] ./dir4/a/f
+	[d] (a/f/c) [c] ./dir4/a/f/c
+	[d] (b) [b] ./dir4/b
+	[d] (b/c) [c] ./dir4/b/c
+	[f] (a/d) [d] ./dir4/a/d
+	[f] (a/e) [e] ./dir4/a/e
+	EOF
+
+	test-tool dir-iterator --follow-symlinks ./dir4 >out &&
+	sort <out >actual-follow-sorted-output &&
+
+	test_cmp expected-follow-sorted-output actual-follow-sorted-output
+'
+
+
+test_expect_success SYMLINKS 'dir-iterator should ignore recursive symlinks w/ follow flag' '
+	cat >expected-rec-symlinks-sorted-output <<-EOF &&
+	[d] (a) [a] ./dir5/a
+	[d] (a/b) [b] ./dir5/a/b
+	[d] (a/b/d) [d] ./dir5/a/b/d
+	[d] (a/c) [c] ./dir5/a/c
+	EOF
+
+	test-tool dir-iterator --follow-symlinks ./dir5 >out &&
+	sort <out >actual-rec-symlinks-sorted-output &&
+
+	test_cmp expected-rec-symlinks-sorted-output actual-rec-symlinks-sorted-output
+'
+
 test_done
-- 
2.22.0



* [GSoC][PATCH v7 07/10] clone: copy hidden paths at local clone
  2019-06-18 23:27         ` [GSoC][PATCH v7 00/10] clone: dir-iterator refactoring with tests Matheus Tavares
                             ` (5 preceding siblings ...)
  2019-06-18 23:27           ` [GSoC][PATCH v7 06/10] dir-iterator: add flags parameter to dir_iterator_begin Matheus Tavares
@ 2019-06-18 23:27           ` Matheus Tavares
  2019-06-18 23:27           ` [GSoC][PATCH v7 08/10] clone: extract function from copy_or_link_directory Matheus Tavares
                             ` (5 subsequent siblings)
  12 siblings, 0 replies; 127+ messages in thread
From: Matheus Tavares @ 2019-06-18 23:27 UTC (permalink / raw)
  To: git
  Cc: Thomas Gummerer, Ævar Arnfjörð Bjarmason,
	Christian Couder, Nguyễn Thái Ngọc Duy,
	SZEDER Gábor, Olga Telezhnaya, kernel-usp, Jeff King,
	Junio C Hamano

Make the copy_or_link_directory function no longer skip hidden
directories. This function, used to copy .git/objects, currently skips
all hidden directories but not hidden files, which is an odd behaviour.
This was probably unintentional: the intention was likely to skip '.'
and '..' only, but the check ended up skipping all directories starting
with '.'. Besides being more natural, the new behaviour is more
permissive to the user.
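
The difference between the two checks can be seen in isolation with the
sketch below. demo_is_dot_or_dotdot() is a hypothetical
re-implementation written for this example; the patch itself calls
git's existing is_dot_or_dotdot() helper.

```c
#include <assert.h>
#include <string.h>

/* Old check: true for ".", ".." and any other name starting with '.'. */
static int demo_starts_with_dot(const char *name)
{
	assert(name != NULL);
	return name[0] == '.';
}

/* New check: true only for the "." and ".." entries. */
static int demo_is_dot_or_dotdot(const char *name)
{
	assert(name != NULL);
	return !strcmp(name, ".") || !strcmp(name, "..");
}
```

With the old check a directory such as ".some-hidden-dir" is skipped,
while the new check skips only "." and "..".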

Also adjust tests to reflect this behaviour change.

Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Co-authored-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 builtin/clone.c            | 2 +-
 t/t5604-clone-reference.sh | 9 +++++++++
 2 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/builtin/clone.c b/builtin/clone.c
index 4a0a2455a7..9dd083e34d 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -430,7 +430,7 @@ static void copy_or_link_directory(struct strbuf *src, struct strbuf *dest,
 			continue;
 		}
 		if (S_ISDIR(buf.st_mode)) {
-			if (de->d_name[0] != '.')
+			if (!is_dot_or_dotdot(de->d_name))
 				copy_or_link_directory(src, dest,
 						       src_repo, src_baselen);
 			continue;
diff --git a/t/t5604-clone-reference.sh b/t/t5604-clone-reference.sh
index 0800c3853f..c3998f2f9e 100755
--- a/t/t5604-clone-reference.sh
+++ b/t/t5604-clone-reference.sh
@@ -247,16 +247,25 @@ test_expect_success 'clone a repo with garbage in objects/*' '
 	done &&
 	find S-* -name "*some*" | sort >actual &&
 	cat >expected <<-EOF &&
+	S--dissociate/.git/objects/.some-hidden-dir
+	S--dissociate/.git/objects/.some-hidden-dir/.some-dot-file
+	S--dissociate/.git/objects/.some-hidden-dir/some-file
 	S--dissociate/.git/objects/.some-hidden-file
 	S--dissociate/.git/objects/some-dir
 	S--dissociate/.git/objects/some-dir/.some-dot-file
 	S--dissociate/.git/objects/some-dir/some-file
 	S--dissociate/.git/objects/some-file
+	S--local/.git/objects/.some-hidden-dir
+	S--local/.git/objects/.some-hidden-dir/.some-dot-file
+	S--local/.git/objects/.some-hidden-dir/some-file
 	S--local/.git/objects/.some-hidden-file
 	S--local/.git/objects/some-dir
 	S--local/.git/objects/some-dir/.some-dot-file
 	S--local/.git/objects/some-dir/some-file
 	S--local/.git/objects/some-file
+	S--no-hardlinks/.git/objects/.some-hidden-dir
+	S--no-hardlinks/.git/objects/.some-hidden-dir/.some-dot-file
+	S--no-hardlinks/.git/objects/.some-hidden-dir/some-file
 	S--no-hardlinks/.git/objects/.some-hidden-file
 	S--no-hardlinks/.git/objects/some-dir
 	S--no-hardlinks/.git/objects/some-dir/.some-dot-file
-- 
2.22.0



* [GSoC][PATCH v7 08/10] clone: extract function from copy_or_link_directory
  2019-06-18 23:27         ` [GSoC][PATCH v7 00/10] clone: dir-iterator refactoring with tests Matheus Tavares
                             ` (6 preceding siblings ...)
  2019-06-18 23:27           ` [GSoC][PATCH v7 07/10] clone: copy hidden paths at local clone Matheus Tavares
@ 2019-06-18 23:27           ` Matheus Tavares
  2019-06-18 23:27           ` [GSoC][PATCH v7 09/10] clone: use dir-iterator to avoid explicit dir traversal Matheus Tavares
                             ` (4 subsequent siblings)
  12 siblings, 0 replies; 127+ messages in thread
From: Matheus Tavares @ 2019-06-18 23:27 UTC (permalink / raw)
  To: git
  Cc: Thomas Gummerer, Ævar Arnfjörð Bjarmason,
	Christian Couder, Nguyễn Thái Ngọc Duy,
	SZEDER Gábor, Olga Telezhnaya, kernel-usp, Jeff King,
	Junio C Hamano

Extract the directory creation code from copy_or_link_directory into
its own function, mkdir_if_missing. This helps remove
copy_or_link_directory's explicit recursion, which is done in a
following patch, and also makes the code more readable.

Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
---
 builtin/clone.c | 24 ++++++++++++++++--------
 1 file changed, 16 insertions(+), 8 deletions(-)

diff --git a/builtin/clone.c b/builtin/clone.c
index 9dd083e34d..96566c1bab 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -394,6 +394,21 @@ static void copy_alternates(struct strbuf *src, const char *src_repo)
 	fclose(in);
 }
 
+static void mkdir_if_missing(const char *pathname, mode_t mode)
+{
+	struct stat st;
+
+	if (!mkdir(pathname, mode))
+		return;
+
+	if (errno != EEXIST)
+		die_errno(_("failed to create directory '%s'"), pathname);
+	else if (stat(pathname, &st))
+		die_errno(_("failed to stat '%s'"), pathname);
+	else if (!S_ISDIR(st.st_mode))
+		die(_("%s exists and is not a directory"), pathname);
+}
+
 static void copy_or_link_directory(struct strbuf *src, struct strbuf *dest,
 				   const char *src_repo, int src_baselen)
 {
@@ -406,14 +421,7 @@ static void copy_or_link_directory(struct strbuf *src, struct strbuf *dest,
 	if (!dir)
 		die_errno(_("failed to open '%s'"), src->buf);
 
-	if (mkdir(dest->buf, 0777)) {
-		if (errno != EEXIST)
-			die_errno(_("failed to create directory '%s'"), dest->buf);
-		else if (stat(dest->buf, &buf))
-			die_errno(_("failed to stat '%s'"), dest->buf);
-		else if (!S_ISDIR(buf.st_mode))
-			die(_("%s exists and is not a directory"), dest->buf);
-	}
+	mkdir_if_missing(dest->buf, 0777);
 
 	strbuf_addch(src, '/');
 	src_len = src->len;
-- 
2.22.0



* [GSoC][PATCH v7 09/10] clone: use dir-iterator to avoid explicit dir traversal
  2019-06-18 23:27         ` [GSoC][PATCH v7 00/10] clone: dir-iterator refactoring with tests Matheus Tavares
                             ` (7 preceding siblings ...)
  2019-06-18 23:27           ` [GSoC][PATCH v7 08/10] clone: extract function from copy_or_link_directory Matheus Tavares
@ 2019-06-18 23:27           ` Matheus Tavares
  2019-06-18 23:27           ` [GSoC][PATCH v7 10/10] clone: replace strcmp by fspathcmp Matheus Tavares
                             ` (3 subsequent siblings)
  12 siblings, 0 replies; 127+ messages in thread
From: Matheus Tavares @ 2019-06-18 23:27 UTC (permalink / raw)
  To: git
  Cc: Thomas Gummerer, Ævar Arnfjörð Bjarmason,
	Christian Couder, Nguyễn Thái Ngọc Duy,
	SZEDER Gábor, Olga Telezhnaya, kernel-usp, Junio C Hamano,
	Jeff King

In copy_or_link_directory, replace the recursive
opendir/readdir/closedir directory traversal with the dir-iterator
API. This simplifies the code and avoids recursive calls to
copy_or_link_directory.

This process also makes copy_or_link_directory call die() in case of an
error on readdir or stat inside dir_iterator_advance. Previously it
would just print a warning for errors on stat and ignore errors on
readdir, which isn't nice because a local git clone could succeed even
though the .git/objects copy didn't fully succeed. Also, with the
dir-iterator API, recursive symlinks will be detected and skipped. This
is another behavior improvement, since the current version would
continue to copy the same content over and over until stat() returned an
ELOOP error.

Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
---
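For illustration only (not part of the commit): the loop in this patch
rebuilds the source and destination paths on every iteration by
truncating each buffer back to "base/" and appending the iterator's
relative path. Below is a minimal standalone sketch of that
truncate-then-append pattern. struct pathbuf and its two functions are
hypothetical stand-ins for git's strbuf, using a fixed-size buffer for
simplicity.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Tiny fixed-size stand-in for git's strbuf (hypothetical). */
struct pathbuf {
	char buf[4096];
	size_t base_len;	/* length of "base/", kept across iterations */
};

/* Store "base/" once and remember its length. */
static void pathbuf_init(struct pathbuf *pb, const char *base)
{
	int n;

	assert(pb != NULL && base != NULL);
	n = snprintf(pb->buf, sizeof(pb->buf), "%s/", base);
	assert(n > 0 && (size_t)n < sizeof(pb->buf));
	pb->base_len = (size_t)n;
}

/* Cut back to "base/" and append the entry's path relative to the base. */
static const char *pathbuf_set_relative(struct pathbuf *pb,
					const char *relative_path)
{
	size_t room;
	int n;

	assert(pb != NULL && relative_path != NULL);
	room = sizeof(pb->buf) - pb->base_len;
	n = snprintf(pb->buf + pb->base_len, room, "%s", relative_path);
	assert(n >= 0 && (size_t)n < room);
	return pb->buf;
}
```

Because the relative path is measured from the directory being
iterated, the special case can be matched against "info/alternates"
directly, without the src_baselen offset the recursive version needed.
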
 builtin/clone.c | 47 +++++++++++++++++++++++++----------------------
 1 file changed, 25 insertions(+), 22 deletions(-)

diff --git a/builtin/clone.c b/builtin/clone.c
index 96566c1bab..47cb4a2a8e 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -23,6 +23,8 @@
 #include "transport.h"
 #include "strbuf.h"
 #include "dir.h"
+#include "dir-iterator.h"
+#include "iterator.h"
 #include "sigchain.h"
 #include "branch.h"
 #include "remote.h"
@@ -410,42 +412,39 @@ static void mkdir_if_missing(const char *pathname, mode_t mode)
 }
 
 static void copy_or_link_directory(struct strbuf *src, struct strbuf *dest,
-				   const char *src_repo, int src_baselen)
+				   const char *src_repo)
 {
-	struct dirent *de;
-	struct stat buf;
 	int src_len, dest_len;
-	DIR *dir;
-
-	dir = opendir(src->buf);
-	if (!dir)
-		die_errno(_("failed to open '%s'"), src->buf);
+	struct dir_iterator *iter;
+	int iter_status;
+	unsigned int flags;
 
 	mkdir_if_missing(dest->buf, 0777);
 
+	flags = DIR_ITERATOR_PEDANTIC | DIR_ITERATOR_FOLLOW_SYMLINKS;
+	iter = dir_iterator_begin(src->buf, flags);
+
+	if (!iter)
+		die_errno(_("failed to start iterator over '%s'"), src->buf);
+
 	strbuf_addch(src, '/');
 	src_len = src->len;
 	strbuf_addch(dest, '/');
 	dest_len = dest->len;
 
-	while ((de = readdir(dir)) != NULL) {
+	while ((iter_status = dir_iterator_advance(iter)) == ITER_OK) {
 		strbuf_setlen(src, src_len);
-		strbuf_addstr(src, de->d_name);
+		strbuf_addstr(src, iter->relative_path);
 		strbuf_setlen(dest, dest_len);
-		strbuf_addstr(dest, de->d_name);
-		if (stat(src->buf, &buf)) {
-			warning (_("failed to stat %s\n"), src->buf);
-			continue;
-		}
-		if (S_ISDIR(buf.st_mode)) {
-			if (!is_dot_or_dotdot(de->d_name))
-				copy_or_link_directory(src, dest,
-						       src_repo, src_baselen);
+		strbuf_addstr(dest, iter->relative_path);
+
+		if (S_ISDIR(iter->st.st_mode)) {
+			mkdir_if_missing(dest->buf, 0777);
 			continue;
 		}
 
 		/* Files that cannot be copied bit-for-bit... */
-		if (!strcmp(src->buf + src_baselen, "/info/alternates")) {
+		if (!strcmp(iter->relative_path, "info/alternates")) {
 			copy_alternates(src, src_repo);
 			continue;
 		}
@@ -462,7 +461,11 @@ static void copy_or_link_directory(struct strbuf *src, struct strbuf *dest,
 		if (copy_file_with_time(dest->buf, src->buf, 0666))
 			die_errno(_("failed to copy file to '%s'"), dest->buf);
 	}
-	closedir(dir);
+
+	if (iter_status != ITER_DONE) {
+		strbuf_setlen(src, src_len);
+		die(_("failed to iterate over '%s'"), src->buf);
+	}
 }
 
 static void clone_local(const char *src_repo, const char *dest_repo)
@@ -480,7 +483,7 @@ static void clone_local(const char *src_repo, const char *dest_repo)
 		get_common_dir(&dest, dest_repo);
 		strbuf_addstr(&src, "/objects");
 		strbuf_addstr(&dest, "/objects");
-		copy_or_link_directory(&src, &dest, src_repo, src.len);
+		copy_or_link_directory(&src, &dest, src_repo);
 		strbuf_release(&src);
 		strbuf_release(&dest);
 	}
-- 
2.22.0



* [GSoC][PATCH v7 10/10] clone: replace strcmp by fspathcmp
  2019-06-18 23:27         ` [GSoC][PATCH v7 00/10] clone: dir-iterator refactoring with tests Matheus Tavares
                             ` (8 preceding siblings ...)
  2019-06-18 23:27           ` [GSoC][PATCH v7 09/10] clone: use dir-iterator to avoid explicit dir traversal Matheus Tavares
@ 2019-06-18 23:27           ` Matheus Tavares
  2019-06-19  4:36           ` [GSoC][PATCH v7 00/10] clone: dir-iterator refactoring with tests Matheus Tavares Bernardino
                             ` (2 subsequent siblings)
  12 siblings, 0 replies; 127+ messages in thread
From: Matheus Tavares @ 2019-06-18 23:27 UTC (permalink / raw)
  To: git
  Cc: Thomas Gummerer, Ævar Arnfjörð Bjarmason,
	Christian Couder, Nguyễn Thái Ngọc Duy,
	SZEDER Gábor, Olga Telezhnaya, kernel-usp, Jeff King,
	Junio C Hamano

Replace strcmp with fspathcmp in copy_or_link_directory, since
fspathcmp is more permissive/friendly to case-insensitive file systems.

Suggested-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
---
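For illustration only (not part of the commit): a minimal sketch of
what a file-system-aware path comparison does. It assumes that case
sensitivity is passed in as a plain parameter; git's fspathcmp() takes
only the two paths and consults git's own configuration, so
fspathcmp_sketch() here is a hypothetical stand-in.

```c
#include <assert.h>
#include <string.h>
#include <strings.h>	/* strcasecmp() */

/*
 * Compare two paths, ignoring case when the file system does.
 * Returns 0 when the paths are considered equal.
 */
static int fspathcmp_sketch(const char *a, const char *b, int ignore_case)
{
	assert(a != NULL && b != NULL);
	return ignore_case ? strcasecmp(a, b) : strcmp(a, b);
}
```

On a case-insensitive file system "INFO/Alternates" and
"info/alternates" name the same file, so a plain strcmp() would miss
the special case this comparison guards.
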
 builtin/clone.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/builtin/clone.c b/builtin/clone.c
index 47cb4a2a8e..8da696ef30 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -444,7 +444,7 @@ static void copy_or_link_directory(struct strbuf *src, struct strbuf *dest,
 		}
 
 		/* Files that cannot be copied bit-for-bit... */
-		if (!strcmp(iter->relative_path, "info/alternates")) {
+		if (!fspathcmp(iter->relative_path, "info/alternates")) {
 			copy_alternates(src, src_repo);
 			continue;
 		}
-- 
2.22.0



* Re: [GSoC][PATCH v7 00/10] clone: dir-iterator refactoring with tests
  2019-06-18 23:27         ` [GSoC][PATCH v7 00/10] clone: dir-iterator refactoring with tests Matheus Tavares
                             ` (9 preceding siblings ...)
  2019-06-18 23:27           ` [GSoC][PATCH v7 10/10] clone: replace strcmp by fspathcmp Matheus Tavares
@ 2019-06-19  4:36           ` Matheus Tavares Bernardino
  2019-06-20 20:18           ` Junio C Hamano
  2019-07-10 23:58           ` [GSoC][PATCH v8 " Matheus Tavares
  12 siblings, 0 replies; 127+ messages in thread
From: Matheus Tavares Bernardino @ 2019-06-19  4:36 UTC (permalink / raw)
  To: git
  Cc: Thomas Gummerer, Ævar Arnfjörð Bjarmason,
	Christian Couder, Nguyễn Thái Ngọc Duy,
	SZEDER Gábor, Olga Telezhnaya, Kernel USP, Michael Haggerty,
	Daniel Ferreira

On Tue, Jun 18, 2019 at 8:28 PM Matheus Tavares
<matheus.bernardino@usp.br> wrote:
>
> This patchset contains:
> - tests to the dir-iterator API;
> - dir-iterator refactoring to make its state machine simpler
>   and feature adding with tests;
> - a replacement of explicit recursive dir iteration at
>   copy_or_link_directory for the dir-iterator API;
> - some refactoring and behavior changes at local clone, mainly to
>   take care of symlinks and hidden files at .git/objects, together
>   with tests for these types of files.
>
> Changes since v6:
> - Rebased with master;
> - Added to dir-iterator documentation that ENOENT errors and hence broken
>   symlinks are both ignored.
>
> With the changes brought by this patchset, dir_iterator_begin() may now
> return NULL (setting errno) when it finds an error. Also, it's possible
> to pass a pedantic flag to it so that dir_iterator_advance() return
> immediately on errors. But at refs/files-backend.c, the only user of
> the API so far, the flag wasn't used and an empty iterator is
> returned in case of errors at dir_iterator_begin(). These actions were
> taken in order to keep the files-backend's behavior as close as
> possible to the one it previously had. But since it already has guards
> for possible errors at dir_iterator_advance(), I'm wondering whether I
> should send a follow-up patch making it use the pedantic flag.
>
> Also, should I perhaps call die_errno() on dir_iterator_begin() errors
> at files-backend? I mean, we should continue returning an empty
> iterator on ENOENT errors since ".git/logs", the dir it iterates over,
> may not be present. But we could possibly abort on other errors, just
> to be sure...

I got ahead of myself in this last paragraph. ".git/logs" is one of the
dirs that files-backend.c iterates over, but that doesn't mean it's the
only one. This dir, in particular, is iterated when we run
'git rev-list --reflog', for example. And upon ENOENT, the iteration is
expected to end successfully but with no entries.
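
A minimal sketch of that "missing directory means empty iteration"
behavior, written against plain opendir()/readdir() rather than the
dir-iterator API. The function name is hypothetical, and returning -1
for other errors is an assumption standing in for the die_errno() idea
discussed above:

```c
#include <assert.h>
#include <dirent.h>
#include <errno.h>
#include <string.h>

/*
 * Count the entries of `path`, excluding "." and "..".
 * A missing directory (ENOENT) counts as empty and yields 0; any other
 * opendir() failure yields -1 with errno left as opendir() set it.
 */
static long count_entries_or_empty(const char *path)
{
	DIR *dir;
	struct dirent *de;
	long n = 0;

	assert(path != NULL);
	dir = opendir(path);
	if (!dir)
		return errno == ENOENT ? 0 : -1;
	while ((de = readdir(dir)) != NULL)
		if (strcmp(de->d_name, ".") && strcmp(de->d_name, ".."))
			n++;
	closedir(dir);
	return n;
}
```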

(also adding Michael and Daniel to CC, in case they have some input on
these ideas)


* Re: [GSoC][PATCH v7 00/10] clone: dir-iterator refactoring with tests
  2019-06-18 23:27         ` [GSoC][PATCH v7 00/10] clone: dir-iterator refactoring with tests Matheus Tavares
                             ` (10 preceding siblings ...)
  2019-06-19  4:36           ` [GSoC][PATCH v7 00/10] clone: dir-iterator refactoring with tests Matheus Tavares Bernardino
@ 2019-06-20 20:18           ` Junio C Hamano
  2019-06-21 13:41             ` Matheus Tavares Bernardino
  2019-07-10 23:58           ` [GSoC][PATCH v8 " Matheus Tavares
  12 siblings, 1 reply; 127+ messages in thread
From: Junio C Hamano @ 2019-06-20 20:18 UTC (permalink / raw)
  To: Matheus Tavares
  Cc: git, Thomas Gummerer, Ævar Arnfjörð Bjarmason,
	Christian Couder, Nguyễn Thái Ngọc Duy,
	SZEDER Gábor, Olga Telezhnaya, kernel-usp

Matheus Tavares <matheus.bernardino@usp.br> writes:

> Daniel Ferreira (1):
>   dir-iterator: add tests for dir-iterator API
>
> Matheus Tavares (8):
>   clone: better handle symlinked files at .git/objects/
>   dir-iterator: use warning_errno when possible
>   dir-iterator: refactor state machine model
>   dir-iterator: add flags parameter to dir_iterator_begin
>   clone: copy hidden paths at local clone
>   clone: extract function from copy_or_link_directory
>   clone: use dir-iterator to avoid explicit dir traversal
>   clone: replace strcmp by fspathcmp
>
> Ævar Arnfjörð Bjarmason (1):
>   clone: test for our behavior on odd objects/* content
>
>  Makefile                     |   1 +
>  builtin/clone.c              |  75 +++++----
>  dir-iterator.c               | 289 +++++++++++++++++++++--------------
>  dir-iterator.h               |  60 ++++++--
>  refs/files-backend.c         |  17 ++-
>  t/helper/test-dir-iterator.c |  58 +++++++
>  t/helper/test-tool.c         |   1 +
>  t/helper/test-tool.h         |   1 +
>  t/t0066-dir-iterator.sh      | 163 ++++++++++++++++++++
>  t/t5604-clone-reference.sh   | 133 ++++++++++++++++
>  10 files changed, 635 insertions(+), 163 deletions(-)
>  create mode 100644 t/helper/test-dir-iterator.c
>  create mode 100755 t/t0066-dir-iterator.sh

A higher level question is what's the benefit of using dir-iterator
API in the first place.  After subtracting 356 added lines to t/,
it still adds 279 lines while removing only 163 lines, so it is not
like "we have a perfect dir-iterator API that can be applied as-is
but an older code that predates dir-iterator API was still using an
old way, so let's make the latter use the former."




* Re: [GSoC][PATCH v7 00/10] clone: dir-iterator refactoring with tests
  2019-06-20 20:18           ` Junio C Hamano
@ 2019-06-21 13:41             ` Matheus Tavares Bernardino
  0 siblings, 0 replies; 127+ messages in thread
From: Matheus Tavares Bernardino @ 2019-06-21 13:41 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git, Thomas Gummerer, Ævar Arnfjörð Bjarmason,
	Christian Couder, Nguyễn Thái Ngọc Duy,
	SZEDER Gábor, Olga Telezhnaya, Kernel USP

On Thu, Jun 20, 2019 at 5:18 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Matheus Tavares <matheus.bernardino@usp.br> writes:
>
> > Daniel Ferreira (1):
> >   dir-iterator: add tests for dir-iterator API
> >
> > Matheus Tavares (8):
> >   clone: better handle symlinked files at .git/objects/
> >   dir-iterator: use warning_errno when possible
> >   dir-iterator: refactor state machine model
> >   dir-iterator: add flags parameter to dir_iterator_begin
> >   clone: copy hidden paths at local clone
> >   clone: extract function from copy_or_link_directory
> >   clone: use dir-iterator to avoid explicit dir traversal
> >   clone: replace strcmp by fspathcmp
> >
> > Ævar Arnfjörð Bjarmason (1):
> >   clone: test for our behavior on odd objects/* content
> >
> >  Makefile                     |   1 +
> >  builtin/clone.c              |  75 +++++----
> >  dir-iterator.c               | 289 +++++++++++++++++++++--------------
> >  dir-iterator.h               |  60 ++++++--
> >  refs/files-backend.c         |  17 ++-
> >  t/helper/test-dir-iterator.c |  58 +++++++
> >  t/helper/test-tool.c         |   1 +
> >  t/helper/test-tool.h         |   1 +
> >  t/t0066-dir-iterator.sh      | 163 ++++++++++++++++++++
> >  t/t5604-clone-reference.sh   | 133 ++++++++++++++++
> >  10 files changed, 635 insertions(+), 163 deletions(-)
> >  create mode 100644 t/helper/test-dir-iterator.c
> >  create mode 100755 t/t0066-dir-iterator.sh
>
> A higher level question is what's the benefit of using dir-iterator
> API in the first place.  After subtracting 356 added lines to t/,
> it still adds 279 lines while removing only 163 lines, so it is not
> like "we have a perfect dir-iterator API that can be applied as-is
> but an older code that predates dir-iterator API was still using an
> old way, so let's make the latter use the former."
>

Yes, indeed the dir-iterator API didn't nicely fit in clone without
some tweaking. Yet I think most of those line additions were not only
to adjust the API, but also trying to improve both dir-iterator and
local clone (I should have maybe split those changes into other
patchsets, though). For example, these changes make local clone better
handle possible symlinks and hidden files at git dir. And the API
changes should make it easier to apply it as-is in other sections of
the codebase from now on.

As for the benefit of using the API here, I think it mainly lies in
the security it brings: it avoids recursive iteration (even though it
should be shallow in local clone) and handles symlinks more
carefully.


* Re: [GSoC][PATCH v7 06/10] dir-iterator: add flags parameter to dir_iterator_begin
  2019-06-18 23:27           ` [GSoC][PATCH v7 06/10] dir-iterator: add flags parameter to dir_iterator_begin Matheus Tavares
@ 2019-06-25 18:00             ` Junio C Hamano
  2019-06-25 18:11               ` Matheus Tavares Bernardino
  2019-06-26 13:34             ` Johannes Schindelin
  2019-07-03  8:57             ` SZEDER Gábor
  2 siblings, 1 reply; 127+ messages in thread
From: Junio C Hamano @ 2019-06-25 18:00 UTC (permalink / raw)
  To: Matheus Tavares
  Cc: git, Thomas Gummerer, Ævar Arnfjörð Bjarmason,
	Christian Couder, Nguyễn Thái Ngọc Duy,
	SZEDER Gábor, Olga Telezhnaya, kernel-usp, Michael Haggerty,
	Daniel Ferreira, Ramsay Jones

Matheus Tavares <matheus.bernardino@usp.br> writes:

This hunk, which claims to have 25 lines in the postimage ...

> @@ -44,6 +45,25 @@
>   * dir_iterator_advance() again.
>   */
>  
> +/*
> + * Flags for dir_iterator_begin:
> + *
> + * - DIR_ITERATOR_PEDANTIC: override dir-iterator's default behavior
> + *   in case of an error at dir_iterator_advance(), which is to keep
> + *   looking for a next valid entry. With this flag, resources are freed
> + *   and ITER_ERROR is returned immediately. In both cases, a meaningful
> + *   warning is emitted. Note: ENOENT errors are always ignored so that
> + *   the API users may remove files during iteration.
> + *
> + * - DIR_ITERATOR_FOLLOW_SYMLINKS: make dir-iterator follow symlinks.
> + *   i.e., linked directories' contents will be iterated over and
> + *   iter->base.st will contain information on the referred files,
> + *   not the symlinks themselves, which is the default behavior.
> + *   Recursive symlinks are skipped with a warning and broken symlinks
> + *   are ignored.
> + */
> +#define DIR_ITERATOR_PEDANTIC (1 << 0)
> +#define DIR_ITERATOR_FOLLOW_SYMLINKS (1 << 1)
> +
>  struct dir_iterator {
>  	/* The current path: */
>  	struct strbuf path;
> @@ -58,29 +78,38 @@ struct dir_iterator {

... adds 20 lines, making the postimage 26 lines long.

Did you hand edit your patch?  It is OK to do so, as long as you
know what you are doing ;-).  Adjust the length of the postimage on
the @@ ... @@ line to make it consistent with the patch text, and
also make sure a tweak you do here won't make later patches not
apply.
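
The consistency rule can be sketched as a small line counter. This is
only an illustration under the assumption that the hunk body is given
as an array of lines starting with ' ', '+' or '-'; the names are
hypothetical and this is not how git itself validates patches.

```c
#include <assert.h>
#include <stddef.h>

struct hunk_counts {
	int preimage;	/* context + removed lines */
	int postimage;	/* context + added lines */
};

/* Count the pre- and postimage lengths implied by one hunk body. */
static struct hunk_counts count_hunk_lines(const char *const *lines, size_t nr)
{
	struct hunk_counts c = { 0, 0 };
	size_t i;

	assert(lines != NULL || nr == 0);
	for (i = 0; i < nr; i++) {
		switch (lines[i][0]) {
		case '+':
			c.postimage++;
			break;
		case '-':
			c.preimage++;
			break;
		default:	/* context line: present on both sides */
			c.preimage++;
			c.postimage++;
			break;
		}
	}
	return c;
}
```

For the hunk quoted above, 6 context lines plus 20 added lines give a
postimage of 26, so its header would need to read
"@@ -44,6 +45,26 @@" to match the patch text.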


* Re: [GSoC][PATCH v7 06/10] dir-iterator: add flags parameter to dir_iterator_begin
  2019-06-25 18:00             ` Junio C Hamano
@ 2019-06-25 18:11               ` Matheus Tavares Bernardino
  0 siblings, 0 replies; 127+ messages in thread
From: Matheus Tavares Bernardino @ 2019-06-25 18:11 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git, Thomas Gummerer, Ævar Arnfjörð Bjarmason,
	Christian Couder, Nguyễn Thái Ngọc Duy,
	SZEDER Gábor, Olga Telezhnaya, Kernel USP, Michael Haggerty,
	Daniel Ferreira, Ramsay Jones

On Tue, Jun 25, 2019 at 3:00 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Matheus Tavares <matheus.bernardino@usp.br> writes:
>
> This hunk, which claims to have 25 lines in the postimage ...
>
> > @@ -44,6 +45,25 @@
> >   * dir_iterator_advance() again.
> >   */
> >
> > +/*
> > + * Flags for dir_iterator_begin:
> > + *
> > + * - DIR_ITERATOR_PEDANTIC: override dir-iterator's default behavior
> > + *   in case of an error at dir_iterator_advance(), which is to keep
> > + *   looking for a next valid entry. With this flag, resources are freed
> > + *   and ITER_ERROR is returned immediately. In both cases, a meaningful
> > + *   warning is emitted. Note: ENOENT errors are always ignored so that
> > + *   the API users may remove files during iteration.
> > + *
> > + * - DIR_ITERATOR_FOLLOW_SYMLINKS: make dir-iterator follow symlinks.
> > + *   i.e., linked directories' contents will be iterated over and
> > + *   iter->base.st will contain information on the referred files,
> > + *   not the symlinks themselves, which is the default behavior.
> > + *   Recursive symlinks are skipped with a warning and broken symlinks
> > + *   are ignored.
> > + */
> > +#define DIR_ITERATOR_PEDANTIC (1 << 0)
> > +#define DIR_ITERATOR_FOLLOW_SYMLINKS (1 << 1)
> > +
> >  struct dir_iterator {
> >       /* The current path: */
> >       struct strbuf path;
> > @@ -58,29 +78,38 @@ struct dir_iterator {
>
> ... adds 20 lines, making the postimage 26 lines long.
>
> Did you hand edit your patch?  It is OK to do so, as long as you
> know what you are doing ;-).  Adjust the length of the postimage on
> the @@ ... @@ line to make it consistent with the patch text, and
> also make sure a tweak you do here won't make later patches not
> apply.

Oh, I'm sorry for that, I'll be more careful with hand editing next
time. Thanks for the advice. I think this time it won't affect the
later patches, as it was a minor addition to one comment, but should I
perhaps re-send it?


* Re: [GSoC][PATCH v7 06/10] dir-iterator: add flags parameter to dir_iterator_begin
  2019-06-18 23:27           ` [GSoC][PATCH v7 06/10] dir-iterator: add flags parameter to dir_iterator_begin Matheus Tavares
  2019-06-25 18:00             ` Junio C Hamano
@ 2019-06-26 13:34             ` Johannes Schindelin
  2019-06-26 18:04               ` Junio C Hamano
  2019-07-03  8:57             ` SZEDER Gábor
  2 siblings, 1 reply; 127+ messages in thread
From: Johannes Schindelin @ 2019-06-26 13:34 UTC (permalink / raw)
  To: Matheus Tavares
  Cc: git, Thomas Gummerer, Ævar Arnfjörð Bjarmason,
	Christian Couder, Nguyễn Thái Ngọc Duy,
	SZEDER Gábor, Olga Telezhnaya, kernel-usp, Michael Haggerty,
	Daniel Ferreira, Ramsay Jones, Junio C Hamano

Hi Matheus,

On Tue, 18 Jun 2019, Matheus Tavares wrote:

>[...]
> +/*
> + * Look for a recursive symlink at iter->base.path pointing to any directory on
> + * the previous stack levels. If it is found, return 1. If not, return 0.
> + */
> +static int find_recursive_symlinks(struct dir_iterator_int *iter)
> +{
> +	int i;
> +
> +	if (!(iter->flags & DIR_ITERATOR_FOLLOW_SYMLINKS) ||
> +	    !S_ISDIR(iter->base.st.st_mode))
> +		return 0;
>
> +	for (i = 0; i < iter->levels_nr; ++i)
> +		if (iter->base.st.st_ino == iter->levels[i].ino)

This does not work on Windows. Remember, Git relies on (too) many areas
where Linux is strong, and the `lstat()` call is one of them. Therefore,
Git overuses that call.

In the Git for Windows project, we struggled a bit to emulate it in the
best way.

It is pretty expensive, for example, to find out the number of hard
links, the device ID, an equivalent of the inode, etc. Many `lstat()`
calls are really only interested in the `mtime`, though, meaning that we
would waste a ton of time if we tried to be more faithful in our `lstat()`
emulation.

Therefore, we simply assign `0` as inode.

Sure, this violates the POSIX standard, but imagine this: the FAT
filesystem (which is still in use!) does not have _anything_ resembling
inodes.

I fear, therefore, that we will require at least a workaround for the
situation where `st_ino` is always zero.

Ciao,
Johannes


* Re: [GSoC][PATCH v7 06/10] dir-iterator: add flags parameter to dir_iterator_begin
  2019-06-26 13:34             ` Johannes Schindelin
@ 2019-06-26 18:04               ` Junio C Hamano
  2019-06-27  9:20                 ` Duy Nguyen
  2019-06-27 17:23                 ` Matheus Tavares Bernardino
  0 siblings, 2 replies; 127+ messages in thread
From: Junio C Hamano @ 2019-06-26 18:04 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: Matheus Tavares, git, Thomas Gummerer,
	Ævar Arnfjörð Bjarmason, Christian Couder,
	Nguyễn Thái Ngọc Duy, SZEDER Gábor,
	Olga Telezhnaya, kernel-usp, Michael Haggerty, Daniel Ferreira,
	Ramsay Jones

Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

> Hi Matheus,
>
> On Tue, 18 Jun 2019, Matheus Tavares wrote:
>
>>[...]
>> +/*
>> + * Look for a recursive symlink at iter->base.path pointing to any directory on
>> + * the previous stack levels. If it is found, return 1. If not, return 0.
>> + */
>> +static int find_recursive_symlinks(struct dir_iterator_int *iter)
>> +{
>> +	int i;
>> +
>> +	if (!(iter->flags & DIR_ITERATOR_FOLLOW_SYMLINKS) ||
>> +	    !S_ISDIR(iter->base.st.st_mode))
>> +		return 0;
>>
>> +	for (i = 0; i < iter->levels_nr; ++i)
>> +		if (iter->base.st.st_ino == iter->levels[i].ino)
>
> This does not work on Windows. [[ Windows port does not have
> usable st_ino field ]]]

And if you cross mountpoint, st_ino alone does not guarantee
uniqueness; you'd need to combine it with st_dev, I would think,
even on POSIX systems.



* Re: [GSoC][PATCH v7 06/10] dir-iterator: add flags parameter to dir_iterator_begin
  2019-06-26 18:04               ` Junio C Hamano
@ 2019-06-27  9:20                 ` Duy Nguyen
  2019-06-27 17:23                 ` Matheus Tavares Bernardino
  1 sibling, 0 replies; 127+ messages in thread
From: Duy Nguyen @ 2019-06-27  9:20 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Johannes Schindelin, Matheus Tavares, Git Mailing List,
	Thomas Gummerer, Ævar Arnfjörð Bjarmason,
	Christian Couder, SZEDER Gábor, Olga Telezhnaya, kernel-usp,
	Michael Haggerty, Daniel Ferreira, Ramsay Jones

On Thu, Jun 27, 2019 at 1:04 AM Junio C Hamano <gitster@pobox.com> wrote:
>
> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
>
> > Hi Matheus,
> >
> > On Tue, 18 Jun 2019, Matheus Tavares wrote:
> >
> >>[...]
> >> +/*
> >> + * Look for a recursive symlink at iter->base.path pointing to any directory on
> >> + * the previous stack levels. If it is found, return 1. If not, return 0.
> >> + */
> >> +static int find_recursive_symlinks(struct dir_iterator_int *iter)
> >> +{
> >> +    int i;
> >> +
> >> +    if (!(iter->flags & DIR_ITERATOR_FOLLOW_SYMLINKS) ||
> >> +        !S_ISDIR(iter->base.st.st_mode))
> >> +            return 0;
> >>
> >> +    for (i = 0; i < iter->levels_nr; ++i)
> >> +            if (iter->base.st.st_ino == iter->levels[i].ino)
> >
> > This does not work on Windows. [[ Windows port does not have
> > usable st_ino field ]]]
>
> And if you cross mountpoint, st_ino alone does not guarantee
> uniqueness; you'd need to combine it with st_dev, I would think,
> even on POSIX systems.

which should be protected by USE_STDEV. There's other code that
ignores st_ino on Windows in entry.c. Maybe it's time to define
USE_STINO instead of spreading "#if GIT_WINDOWS_NATIVE" more.
-- 
Duy


* Re: [GSoC][PATCH v7 06/10] dir-iterator: add flags parameter to dir_iterator_begin
  2019-06-26 18:04               ` Junio C Hamano
  2019-06-27  9:20                 ` Duy Nguyen
@ 2019-06-27 17:23                 ` Matheus Tavares Bernardino
  2019-06-27 18:48                   ` Johannes Schindelin
  1 sibling, 1 reply; 127+ messages in thread
From: Matheus Tavares Bernardino @ 2019-06-27 17:23 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Johannes Schindelin, git, Thomas Gummerer,
	Ævar Arnfjörð Bjarmason, Christian Couder,
	Nguyễn Thái Ngọc Duy, SZEDER Gábor,
	Olga Telezhnaya, Kernel USP, Michael Haggerty, Daniel Ferreira,
	Ramsay Jones

On Wed, Jun 26, 2019 at 3:04 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
>
> > Hi Matheus,
> >
> > On Tue, 18 Jun 2019, Matheus Tavares wrote:
> >
> >>[...]
> >> +/*
> >> + * Look for a recursive symlink at iter->base.path pointing to any directory on
> >> + * the previous stack levels. If it is found, return 1. If not, return 0.
> >> + */
> >> +static int find_recursive_symlinks(struct dir_iterator_int *iter)
> >> +{
> >> +    int i;
> >> +
> >> +    if (!(iter->flags & DIR_ITERATOR_FOLLOW_SYMLINKS) ||
> >> +        !S_ISDIR(iter->base.st.st_mode))
> >> +            return 0;
> >>
> >> +    for (i = 0; i < iter->levels_nr; ++i)
> >> +            if (iter->base.st.st_ino == iter->levels[i].ino)
> >
> > This does not work on Windows. [[ Windows port does not have
> > usable st_ino field ]]]
>
> And if you cross mountpoint, st_ino alone does not guarantee
> uniqueness; you'd need to combine it with st_dev, I would think,
> even on POSIX systems.

Ok, thanks for letting me know. I'm trying to think of another
approach to test for recursive symlinks that does not rely on inode:
Given any symlink, we could get its real_path() and compare it with
the path of the directory currently being iterated. If the first is a
prefix of the second, then we mark it as a recursive symlink.

What do you think of this idea?
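
For illustration, the prefix comparison described above could look roughly
like the sketch below. It assumes both paths are absolute and have already
been canonicalized (for example by real_path()); the helper name
is_ancestor_or_self() is hypothetical and not part of git. The check only
matches at a path component boundary, so "/a/b" is not treated as a prefix
of "/a/bc".

```c
#include <string.h>

/*
 * Hypothetical helper: return 1 if `target` (the resolved symlink
 * target) is `dir` itself or one of its ancestors, 0 otherwise.
 * Both paths are assumed to be absolute and already canonicalized.
 */
static int is_ancestor_or_self(const char *target, const char *dir)
{
	size_t len = strlen(target);

	/* Ignore trailing slashes, but keep a lone "/" intact. */
	while (len > 1 && target[len - 1] == '/')
		len--;

	if (strncmp(target, dir, len))
		return 0;
	/* The root directory is an ancestor of every absolute path. */
	if (len == 1 && target[0] == '/')
		return 1;
	/* Match only at a component boundary: "/a/b" vs. "/a/bc". */
	return dir[len] == '\0' || dir[len] == '/';
}
```

A symlink whose resolved target passes this check against the directory
being iterated would be skipped instead of followed.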


* Re: [GSoC][PATCH v7 06/10] dir-iterator: add flags parameter to dir_iterator_begin
  2019-06-27 17:23                 ` Matheus Tavares Bernardino
@ 2019-06-27 18:48                   ` Johannes Schindelin
  2019-06-27 19:33                     ` Matheus Tavares Bernardino
  0 siblings, 1 reply; 127+ messages in thread
From: Johannes Schindelin @ 2019-06-27 18:48 UTC (permalink / raw)
  To: Matheus Tavares Bernardino
  Cc: Junio C Hamano, git, Thomas Gummerer,
	Ævar Arnfjörð Bjarmason, Christian Couder,
	Nguyễn Thái Ngọc Duy, SZEDER Gábor,
	Olga Telezhnaya, Kernel USP, Michael Haggerty, Daniel Ferreira,
	Ramsay Jones

Hi Matheus,

On Thu, 27 Jun 2019, Matheus Tavares Bernardino wrote:

> On Wed, Jun 26, 2019 at 3:04 PM Junio C Hamano <gitster@pobox.com> wrote:
> >
> > Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
> >
> > > Hi Matheus,
> > >
> > > On Tue, 18 Jun 2019, Matheus Tavares wrote:
> > >
> > >>[...]
> > >> +/*
> > >> + * Look for a recursive symlink at iter->base.path pointing to any directory on
> > >> + * the previous stack levels. If it is found, return 1. If not, return 0.
> > >> + */
> > >> +static int find_recursive_symlinks(struct dir_iterator_int *iter)
> > >> +{
> > >> +    int i;
> > >> +
> > >> +    if (!(iter->flags & DIR_ITERATOR_FOLLOW_SYMLINKS) ||
> > >> +        !S_ISDIR(iter->base.st.st_mode))
> > >> +            return 0;
> > >>
> > >> +    for (i = 0; i < iter->levels_nr; ++i)
> > >> +            if (iter->base.st.st_ino == iter->levels[i].ino)
> > >
> > > This does not work on Windows. [[ Windows port does not have
> > > > usable st_ino field ]]
> >
> > And if you cross mountpoint, st_ino alone does not guarantee
> > uniqueness; you'd need to combine it with st_dev, I would think,
> > even on POSIX systems.
>
> Ok, thanks for letting me know. I'm trying to think of another
> approach to test for recursive symlinks that does not rely on inode:
> Given any symlink, we could get its real_path() and compare it with
> the path of the directory currently being iterated. If the first is a
> prefix of the second, then we mark it as a recursive symlink.
>
> What do you think of this idea?

I think this would be pretty expensive. Too expensive.

A better method might be to rely on st_ino/st_dev when we can, and just
not bother looking for recursive symlinks when we cannot, like I did in
https://github.com/git-for-windows/git/commit/979b00ccf44ec31cff4686e24adf27474923c33a
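
As a minimal sketch of the st_ino/st_dev comparison (not the code from the
commit above), a check that also survives crossing a mountpoint could
compare both fields for every directory already on the stack. The struct
dir_id and the function is_on_stack() are hypothetical names used only for
illustration; a platform guard for systems without a usable st_ino is
assumed to live at the call site and is not shown.

```c
#include <stddef.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical record identifying one directory on the iteration stack. */
struct dir_id {
	dev_t dev;
	ino_t ino;
};

/*
 * Return 1 if the directory described by `st` is already on the stack,
 * 0 otherwise. The inode number alone is only unique within a single
 * filesystem, so the device number is compared as well.
 */
static int is_on_stack(const struct dir_id *stack, size_t nr,
		       const struct stat *st)
{
	size_t i;

	for (i = 0; i < nr; i++)
		if (stack[i].dev == st->st_dev && stack[i].ino == st->st_ino)
			return 1;
	return 0;
}
```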

Ciao,
Johannes


* Re: [GSoC][PATCH v7 06/10] dir-iterator: add flags parameter to dir_iterator_begin
  2019-06-27 18:48                   ` Johannes Schindelin
@ 2019-06-27 19:33                     ` Matheus Tavares Bernardino
  2019-06-28 12:51                       ` Johannes Schindelin
  0 siblings, 1 reply; 127+ messages in thread
From: Matheus Tavares Bernardino @ 2019-06-27 19:33 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: Junio C Hamano, git, Thomas Gummerer,
	Ævar Arnfjörð Bjarmason, Christian Couder,
	Nguyễn Thái Ngọc Duy, SZEDER Gábor,
	Olga Telezhnaya, Kernel USP, Michael Haggerty, Daniel Ferreira,
	Ramsay Jones

On Thu, Jun 27, 2019 at 3:47 PM Johannes Schindelin
<Johannes.Schindelin@gmx.de> wrote:
>
> Hi Matheus,
>
> On Thu, 27 Jun 2019, Matheus Tavares Bernardino wrote:
>
> > On Wed, Jun 26, 2019 at 3:04 PM Junio C Hamano <gitster@pobox.com> wrote:
> > >
> > > Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
> > >
> > > > Hi Matheus,
> > > >
> > > > On Tue, 18 Jun 2019, Matheus Tavares wrote:
> > > >
> > > >>[...]
> > > >> +/*
> > > >> + * Look for a recursive symlink at iter->base.path pointing to any directory on
> > > >> + * the previous stack levels. If it is found, return 1. If not, return 0.
> > > >> + */
> > > >> +static int find_recursive_symlinks(struct dir_iterator_int *iter)
> > > >> +{
> > > >> +    int i;
> > > >> +
> > > >> +    if (!(iter->flags & DIR_ITERATOR_FOLLOW_SYMLINKS) ||
> > > >> +        !S_ISDIR(iter->base.st.st_mode))
> > > >> +            return 0;
> > > >>
> > > >> +    for (i = 0; i < iter->levels_nr; ++i)
> > > >> +            if (iter->base.st.st_ino == iter->levels[i].ino)
> > > >
> > > > This does not work on Windows. [[ Windows port does not have
> > > > > usable st_ino field ]]
> > >
> > > And if you cross mountpoint, st_ino alone does not guarantee
> > > uniqueness; you'd need to combine it with st_dev, I would think,
> > > even on POSIX systems.
> >
> > Ok, thanks for letting me know. I'm trying to think of another
> > approach to test for recursive symlinks that does not rely on inode:
> > Given any symlink, we could get its real_path() and compare it with
> > the path of the directory currently being iterated. If the first is a
> > prefix of the second, then we mark it as a recursive symlink.
> >
> > What do you think of this idea?
>
> I think this would be pretty expensive. Too expensive.

Hmm, yes unfortunately :(

> A better method might be to rely on st_ino/st_dev when we can, and just
> not bother looking for recursive symlinks when we cannot,

What if we fall back on the path prefix strategy when st_ino is not
available? I mean, if we don't look for recursive symlinks, they would
be iterated over and over until we get an ELOOP error. So I think
using real_path() should be less expensive in this case. (But just as
a fallback to st_ino, of course.)

> like I did in
> https://github.com/git-for-windows/git/commit/979b00ccf44ec31cff4686e24adf27474923c33a

Nice! At dir-iterator.h the documentation says that recursive symlinks
will be ignored. If we don't implement any fallback, should we add
that this is not available on Windows, perhaps?

> Ciao,
> Johannes


* Re: [GSoC][PATCH v7 06/10] dir-iterator: add flags parameter to dir_iterator_begin
  2019-06-27 19:33                     ` Matheus Tavares Bernardino
@ 2019-06-28 12:51                       ` Johannes Schindelin
  2019-06-28 14:16                         ` Matheus Tavares Bernardino
  0 siblings, 1 reply; 127+ messages in thread
From: Johannes Schindelin @ 2019-06-28 12:51 UTC (permalink / raw)
  To: Matheus Tavares Bernardino
  Cc: Junio C Hamano, git, Thomas Gummerer,
	Ævar Arnfjörð Bjarmason, Christian Couder,
	Nguyễn Thái Ngọc Duy, SZEDER Gábor,
	Olga Telezhnaya, Kernel USP, Michael Haggerty, Daniel Ferreira,
	Ramsay Jones

Hi Matheus,

On Thu, 27 Jun 2019, Matheus Tavares Bernardino wrote:

> On Thu, Jun 27, 2019 at 3:47 PM Johannes Schindelin
> <Johannes.Schindelin@gmx.de> wrote:
> >
> > On Thu, 27 Jun 2019, Matheus Tavares Bernardino wrote:
> >
> > > On Wed, Jun 26, 2019 at 3:04 PM Junio C Hamano <gitster@pobox.com> wrote:
> > > >
> > > > Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
> > > >
> > > > > On Tue, 18 Jun 2019, Matheus Tavares wrote:
> > > > >
> > > > >>[...]
> > > > >> +/*
> > > > >> + * Look for a recursive symlink at iter->base.path pointing to any directory on
> > > > >> + * the previous stack levels. If it is found, return 1. If not, return 0.
> > > > >> + */
> > > > >> +static int find_recursive_symlinks(struct dir_iterator_int *iter)
> > > > >> +{
> > > > >> +    int i;
> > > > >> +
> > > > >> +    if (!(iter->flags & DIR_ITERATOR_FOLLOW_SYMLINKS) ||
> > > > >> +        !S_ISDIR(iter->base.st.st_mode))
> > > > >> +            return 0;
> > > > >>
> > > > >> +    for (i = 0; i < iter->levels_nr; ++i)
> > > > >> +            if (iter->base.st.st_ino == iter->levels[i].ino)
> > > > >
> > > > > This does not work on Windows. [[ Windows port does not have
> > > > > > usable st_ino field ]]
> > > >
> > > > And if you cross mountpoint, st_ino alone does not guarantee
> > > > uniqueness; you'd need to combine it with st_dev, I would think,
> > > > even on POSIX systems.
> > >
> > > Ok, thanks for letting me know. I'm trying to think of another
> > > approach to test for recursive symlinks that does not rely on inode:
> > > Given any symlink, we could get its real_path() and compare it with
> > > > the path of the directory currently being iterated. If the first is a
> > > > prefix of the second, then we mark it as a recursive symlink.
> > >
> > > What do you think of this idea?
> >
> > I think this would be pretty expensive. Too expensive.
>
> Hmm, yes unfortunately :(
>
> > A better method might be to rely on st_ino/st_dev when we can, and just
> > not bother looking for recursive symlinks when we cannot,
>
> What if we fall back on the path prefix strategy when st_ino is not
> available? I mean, if we don't look for recursive symlinks, they would
> be iterated over and over until we get an ELOOP error. So I think
> using real_path() should be less expensive in this case. (But just as
> a fallback to st_ino, of course.)
>
> > like I did in
> > https://github.com/git-for-windows/git/commit/979b00ccf44ec31cff4686e24adf27474923c33a
>
> Nice! At dir-iterator.h the documentation says that recursive symlinks
> will be ignored. If we don't implement any fallback, should we add
> that this is not available on Windows, perhaps?

I do not really care, unless it breaks things on Windows that were not
broken before.

You might also want to guard this behind `USE_STDEV` as Duy suggested (and
maybe use the opportunity to correct that constant to `USE_ST_DEV`; I
looked for it and did not find it because of that naming mistake).

Ciao,
Dscho


* Re: [GSoC][PATCH v7 06/10] dir-iterator: add flags parameter to dir_iterator_begin
  2019-06-28 12:51                       ` Johannes Schindelin
@ 2019-06-28 14:16                         ` Matheus Tavares Bernardino
  2019-07-01 12:15                           ` Johannes Schindelin
  0 siblings, 1 reply; 127+ messages in thread
From: Matheus Tavares Bernardino @ 2019-06-28 14:16 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: Junio C Hamano, git, Thomas Gummerer,
	Ævar Arnfjörð Bjarmason, Christian Couder,
	Nguyễn Thái Ngọc Duy, SZEDER Gábor,
	Olga Telezhnaya, Kernel USP, Michael Haggerty, Daniel Ferreira,
	Ramsay Jones

On Fri, Jun 28, 2019 at 9:50 AM Johannes Schindelin
<Johannes.Schindelin@gmx.de> wrote:
>
> Hi Matheus,
>
> On Thu, 27 Jun 2019, Matheus Tavares Bernardino wrote:
>
> > On Thu, Jun 27, 2019 at 3:47 PM Johannes Schindelin
> > <Johannes.Schindelin@gmx.de> wrote:
> > >
> > > On Thu, 27 Jun 2019, Matheus Tavares Bernardino wrote:
> > >
> > > > On Wed, Jun 26, 2019 at 3:04 PM Junio C Hamano <gitster@pobox.com> wrote:
> > > > >
> > > > > Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
> > > > >
> > > > > > On Tue, 18 Jun 2019, Matheus Tavares wrote:
> > > > > >
> > > > > >>[...]
> > > > > >> +/*
> > > > > >> + * Look for a recursive symlink at iter->base.path pointing to any directory on
> > > > > >> + * the previous stack levels. If it is found, return 1. If not, return 0.
> > > > > >> + */
> > > > > >> +static int find_recursive_symlinks(struct dir_iterator_int *iter)
> > > > > >> +{
> > > > > >> +    int i;
> > > > > >> +
> > > > > >> +    if (!(iter->flags & DIR_ITERATOR_FOLLOW_SYMLINKS) ||
> > > > > >> +        !S_ISDIR(iter->base.st.st_mode))
> > > > > >> +            return 0;
> > > > > >>
> > > > > >> +    for (i = 0; i < iter->levels_nr; ++i)
> > > > > >> +            if (iter->base.st.st_ino == iter->levels[i].ino)
> > > > > >
> > > > > > This does not work on Windows. [[ Windows port does not have
> > > > > > > usable st_ino field ]]
> > > > >
> > > > > And if you cross mountpoint, st_ino alone does not guarantee
> > > > > uniqueness; you'd need to combine it with st_dev, I would think,
> > > > > even on POSIX systems.
> > > >
> > > > Ok, thanks for letting me know. I'm trying to think of another
> > > > approach to test for recursive symlinks that does not rely on inode:
> > > > Given any symlink, we could get its real_path() and compare it with
> > > > > the path of the directory currently being iterated. If the first is a
> > > > > prefix of the second, then we mark it as a recursive symlink.
> > > >
> > > > What do you think of this idea?
> > >
> > > I think this would be pretty expensive. Too expensive.
> >
> > Hmm, yes unfortunately :(
> >
> > > A better method might be to rely on st_ino/st_dev when we can, and just
> > > not bother looking for recursive symlinks when we cannot,
> >
> > What if we fall back on the path prefix strategy when st_ino is not
> > available? I mean, if we don't look for recursive symlinks, they would
> > be iterated over and over until we get an ELOOP error. So I think
> > using real_path() should be less expensive in this case. (But just as
> > a fallback to st_ino, of course.)
> >
> > > like I did in
> > > https://github.com/git-for-windows/git/commit/979b00ccf44ec31cff4686e24adf27474923c33a
> >
> > Nice! At dir-iterator.h the documentation says that recursive symlinks
> > will be ignored. If we don't implement any fallback, should we add
> > that this is not available on Windows, perhaps?
>
> I do not really care, unless it breaks things on Windows that were not
> broken before.
>
> You might also want to guard this behind `USE_STDEV` as Duy suggested (and
> maybe use the opportunity to correct that constant to `USE_ST_DEV`; I
> looked for it and did not find it because of that naming mistake).

Ok, just to confirm, what I should do is send your fixup patch with
the USE_STDEV guard addition, right? Also, the USE_STDEV docs say it is
used "from the update-index perspective"; should I make it more
generic, as we're using it for other purposes, or is it OK like this?

Thanks,
Matheus


* Re: [GSoC][PATCH v7 06/10] dir-iterator: add flags parameter to dir_iterator_begin
  2019-06-28 14:16                         ` Matheus Tavares Bernardino
@ 2019-07-01 12:15                           ` Johannes Schindelin
  0 siblings, 0 replies; 127+ messages in thread
From: Johannes Schindelin @ 2019-07-01 12:15 UTC (permalink / raw)
  To: Matheus Tavares Bernardino
  Cc: Junio C Hamano, git, Thomas Gummerer,
	Ævar Arnfjörð Bjarmason, Christian Couder,
	Nguyễn Thái Ngọc Duy, SZEDER Gábor,
	Olga Telezhnaya, Kernel USP, Michael Haggerty, Daniel Ferreira,
	Ramsay Jones

Hi Matheus,

On Fri, 28 Jun 2019, Matheus Tavares Bernardino wrote:

> On Fri, Jun 28, 2019 at 9:50 AM Johannes Schindelin
> <Johannes.Schindelin@gmx.de> wrote:
> >
> > Hi Matheus,
> >
> > On Thu, 27 Jun 2019, Matheus Tavares Bernardino wrote:
> >
> > > On Thu, Jun 27, 2019 at 3:47 PM Johannes Schindelin
> > > <Johannes.Schindelin@gmx.de> wrote:
> > > >
> > > > On Thu, 27 Jun 2019, Matheus Tavares Bernardino wrote:
> > > >
> > > > > On Wed, Jun 26, 2019 at 3:04 PM Junio C Hamano <gitster@pobox.com> wrote:
> > > > > >
> > > > > > Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
> > > > > >
> > > > > > > On Tue, 18 Jun 2019, Matheus Tavares wrote:
> > > > > > >
> > > > > > >>[...]
> > > > > > >> +/*
> > > > > > >> + * Look for a recursive symlink at iter->base.path pointing to any directory on
> > > > > > >> + * the previous stack levels. If it is found, return 1. If not, return 0.
> > > > > > >> + */
> > > > > > >> +static int find_recursive_symlinks(struct dir_iterator_int *iter)
> > > > > > >> +{
> > > > > > >> +    int i;
> > > > > > >> +
> > > > > > >> +    if (!(iter->flags & DIR_ITERATOR_FOLLOW_SYMLINKS) ||
> > > > > > >> +        !S_ISDIR(iter->base.st.st_mode))
> > > > > > >> +            return 0;
> > > > > > >>
> > > > > > >> +    for (i = 0; i < iter->levels_nr; ++i)
> > > > > > >> +            if (iter->base.st.st_ino == iter->levels[i].ino)
> > > > > > >
> > > > > > > This does not work on Windows. [[ Windows port does not have
> > > > > > > > usable st_ino field ]]
> > > > > >
> > > > > > And if you cross mountpoint, st_ino alone does not guarantee
> > > > > > uniqueness; you'd need to combine it with st_dev, I would think,
> > > > > > even on POSIX systems.
> > > > >
> > > > > Ok, thanks for letting me know. I'm trying to think of another
> > > > > approach to test for recursive symlinks that does not rely on inode:
> > > > > Given any symlink, we could get its real_path() and compare it with
> > > > > > the path of the directory currently being iterated. If the first is a
> > > > > > prefix of the second, then we mark it as a recursive symlink.
> > > > >
> > > > > What do you think of this idea?
> > > >
> > > > I think this would be pretty expensive. Too expensive.
> > >
> > > Hmm, yes unfortunately :(
> > >
> > > > A better method might be to rely on st_ino/st_dev when we can, and just
> > > > not bother looking for recursive symlinks when we cannot,
> > >
> > > What if we fall back on the path prefix strategy when st_ino is not
> > > available? I mean, if we don't look for recursive symlinks, they would
> > > be iterated over and over until we get an ELOOP error. So I think
> > > using real_path() should be less expensive in this case. (But just as
> > > a fallback to st_ino, of course.)
> > >
> > > > like I did in
> > > > https://github.com/git-for-windows/git/commit/979b00ccf44ec31cff4686e24adf27474923c33a
> > >
> > > Nice! At dir-iterator.h the documentation says that recursive symlinks
> > > will be ignored. If we don't implement any fallback, should we add
> > > that this is not available on Windows, perhaps?
> >
> > I do not really care, unless it breaks things on Windows that were not
> > broken before.
> >
> > You might also want to guard this behind `USE_STDEV` as Duy suggested (and
> > maybe use the opportunity to correct that constant to `USE_ST_DEV`; I
> > looked for it and did not find it because of that naming mistake).
>
> Ok, just to confirm, what I should do is send your fixup patch with
> the USE_STDEV guard addition, right? Also, the USE_STDEV docs say it is
> used "from the update-index perspective"; should I make it more
> generic, as we're using it for other purposes, or is it OK like this?

I thought Duy had verified that `USE_STDEV` would make sense in this
instance, but I agree with you that the idea of that compile time flag was
not to guard against a missing `st_dev` field, but about trusting it in
the presence of network filesystems.

So no, I revert my vote for using `USE_STDEV`.

Thanks for the sanity check.

Ciao,
Dscho


* Re: [GSoC][PATCH v7 06/10] dir-iterator: add flags parameter to dir_iterator_begin
  2019-06-18 23:27           ` [GSoC][PATCH v7 06/10] dir-iterator: add flags parameter to dir_iterator_begin Matheus Tavares
  2019-06-25 18:00             ` Junio C Hamano
  2019-06-26 13:34             ` Johannes Schindelin
@ 2019-07-03  8:57             ` SZEDER Gábor
  2019-07-08 22:21               ` Matheus Tavares Bernardino
  2 siblings, 1 reply; 127+ messages in thread
From: SZEDER Gábor @ 2019-07-03  8:57 UTC (permalink / raw)
  To: Matheus Tavares
  Cc: git, Thomas Gummerer, Ævar Arnfjörð Bjarmason,
	Christian Couder, Nguyễn Thái Ngọc Duy,
	Olga Telezhnaya, kernel-usp, Michael Haggerty, Daniel Ferreira,
	Ramsay Jones, Junio C Hamano

> diff --git a/t/t0066-dir-iterator.sh b/t/t0066-dir-iterator.sh
> index c739ed7911..8f996a31fa 100755
> --- a/t/t0066-dir-iterator.sh
> +++ b/t/t0066-dir-iterator.sh
> @@ -65,4 +65,99 @@ test_expect_success 'begin should fail upon non directory paths' '
>  	test_cmp expected-non-dir-output actual-non-dir-output
>  '
>  
> +test_expect_success POSIXPERM,SANITY 'advance should not fail on errors by default' '
> +	cat >expected-no-permissions-output <<-EOF &&
> +	[d] (a) [a] ./dir3/a
> +	EOF
> +
> +	mkdir -p dir3/a &&
> +	> dir3/a/b &&

Style nit: drop the space between the redirection operator and the pathname.

> +	chmod 0 dir3/a &&
> +
> +	test-tool dir-iterator ./dir3 >actual-no-permissions-output &&
> +	test_cmp expected-no-permissions-output actual-no-permissions-output &&
> +	chmod 755 dir3/a &&
> +	rm -rf dir3
> +'
> +
> +test_expect_success POSIXPERM,SANITY 'advance should fail on errors, w/ pedantic flag' '
> +	cat >expected-no-permissions-pedantic-output <<-EOF &&
> +	[d] (a) [a] ./dir3/a
> +	dir_iterator_advance failure
> +	EOF
> +
> +	mkdir -p dir3/a &&
> +	> dir3/a/b &&

Likewise.

> +	chmod 0 dir3/a &&
> +
> +	test_must_fail test-tool dir-iterator --pedantic ./dir3 \
> +		>actual-no-permissions-pedantic-output &&
> +	test_cmp expected-no-permissions-pedantic-output \
> +		actual-no-permissions-pedantic-output &&
> +	chmod 755 dir3/a &&
> +	rm -rf dir3
> +'
> +
> +test_expect_success SYMLINKS 'setup dirs with symlinks' '
> +	mkdir -p dir4/a &&
> +	mkdir -p dir4/b/c &&
> +	>dir4/a/d &&
> +	ln -s d dir4/a/e &&
> +	ln -s ../b dir4/a/f &&
> +
> +	mkdir -p dir5/a/b &&
> +	mkdir -p dir5/a/c &&
> +	ln -s ../c dir5/a/b/d &&
> +	ln -s ../ dir5/a/b/e &&
> +	ln -s ../../ dir5/a/b/f
> +'
> +
> +test_expect_success SYMLINKS 'dir-iterator should not follow symlinks by default' '
> +	cat >expected-no-follow-sorted-output <<-EOF &&
> +	[d] (a) [a] ./dir4/a
> +	[d] (b) [b] ./dir4/b
> +	[d] (b/c) [c] ./dir4/b/c
> +	[f] (a/d) [d] ./dir4/a/d
> +	[s] (a/e) [e] ./dir4/a/e
> +	[s] (a/f) [f] ./dir4/a/f
> +	EOF
> +
> +	test-tool dir-iterator ./dir4 >out &&
> +	sort <out >actual-no-follow-sorted-output &&

Unnecessary redirection; 'sort' can open the file on its own.

> +
> +	test_cmp expected-no-follow-sorted-output actual-no-follow-sorted-output
> +'
> +
> +test_expect_success SYMLINKS 'dir-iterator should follow symlinks w/ follow flag' '
> +	cat >expected-follow-sorted-output <<-EOF &&
> +	[d] (a) [a] ./dir4/a
> +	[d] (a/f) [f] ./dir4/a/f
> +	[d] (a/f/c) [c] ./dir4/a/f/c
> +	[d] (b) [b] ./dir4/b
> +	[d] (b/c) [c] ./dir4/b/c
> +	[f] (a/d) [d] ./dir4/a/d
> +	[f] (a/e) [e] ./dir4/a/e
> +	EOF
> +
> +	test-tool dir-iterator --follow-symlinks ./dir4 >out &&
> +	sort <out >actual-follow-sorted-output &&

Likewise.

> +	test_cmp expected-follow-sorted-output actual-follow-sorted-output
> +'
> +
> +
> +test_expect_success SYMLINKS 'dir-iterator should ignore recursive symlinks w/ follow flag' '
> +	cat >expected-rec-symlinks-sorted-output <<-EOF &&
> +	[d] (a) [a] ./dir5/a
> +	[d] (a/b) [b] ./dir5/a/b
> +	[d] (a/b/d) [d] ./dir5/a/b/d
> +	[d] (a/c) [c] ./dir5/a/c
> +	EOF
> +
> +	test-tool dir-iterator --follow-symlinks ./dir5 >out &&
> +	sort <out >actual-rec-symlinks-sorted-output &&

Likewise.

> +	test_cmp expected-rec-symlinks-sorted-output actual-rec-symlinks-sorted-output
> +'
> +
>  test_done
> -- 
> 2.22.0
> 


* Re: [GSoC][PATCH v7 06/10] dir-iterator: add flags parameter to dir_iterator_begin
  2019-07-03  8:57             ` SZEDER Gábor
@ 2019-07-08 22:21               ` Matheus Tavares Bernardino
  0 siblings, 0 replies; 127+ messages in thread
From: Matheus Tavares Bernardino @ 2019-07-08 22:21 UTC (permalink / raw)
  To: SZEDER Gábor
  Cc: git, Thomas Gummerer, Ævar Arnfjörð Bjarmason,
	Christian Couder, Nguyễn Thái Ngọc Duy,
	Olga Telezhnaya, Kernel USP, Michael Haggerty, Daniel Ferreira,
	Ramsay Jones, Junio C Hamano

Thanks for the review. I'll address those issues in v8.

Best,
Matheus


On Wed, Jul 3, 2019 at 5:57 AM SZEDER Gábor <szeder.dev@gmail.com> wrote:
>
> > diff --git a/t/t0066-dir-iterator.sh b/t/t0066-dir-iterator.sh
> > index c739ed7911..8f996a31fa 100755
> > --- a/t/t0066-dir-iterator.sh
> > +++ b/t/t0066-dir-iterator.sh
> > @@ -65,4 +65,99 @@ test_expect_success 'begin should fail upon non directory paths' '
> >       test_cmp expected-non-dir-output actual-non-dir-output
> >  '
> >
> > +test_expect_success POSIXPERM,SANITY 'advance should not fail on errors by default' '
> > +     cat >expected-no-permissions-output <<-EOF &&
> > +     [d] (a) [a] ./dir3/a
> > +     EOF
> > +
> > +     mkdir -p dir3/a &&
> > +     > dir3/a/b &&
>
> Style nit: drop the space between the redirection operator and the pathname.
>
> > +     chmod 0 dir3/a &&
> > +
> > +     test-tool dir-iterator ./dir3 >actual-no-permissions-output &&
> > +     test_cmp expected-no-permissions-output actual-no-permissions-output &&
> > +     chmod 755 dir3/a &&
> > +     rm -rf dir3
> > +'
> > +
> > +test_expect_success POSIXPERM,SANITY 'advance should fail on errors, w/ pedantic flag' '
> > +     cat >expected-no-permissions-pedantic-output <<-EOF &&
> > +     [d] (a) [a] ./dir3/a
> > +     dir_iterator_advance failure
> > +     EOF
> > +
> > +     mkdir -p dir3/a &&
> > +     > dir3/a/b &&
>
> Likewise.
>
> > +     chmod 0 dir3/a &&
> > +
> > +     test_must_fail test-tool dir-iterator --pedantic ./dir3 \
> > +             >actual-no-permissions-pedantic-output &&
> > +     test_cmp expected-no-permissions-pedantic-output \
> > +             actual-no-permissions-pedantic-output &&
> > +     chmod 755 dir3/a &&
> > +     rm -rf dir3
> > +'
> > +
> > +test_expect_success SYMLINKS 'setup dirs with symlinks' '
> > +     mkdir -p dir4/a &&
> > +     mkdir -p dir4/b/c &&
> > +     >dir4/a/d &&
> > +     ln -s d dir4/a/e &&
> > +     ln -s ../b dir4/a/f &&
> > +
> > +     mkdir -p dir5/a/b &&
> > +     mkdir -p dir5/a/c &&
> > +     ln -s ../c dir5/a/b/d &&
> > +     ln -s ../ dir5/a/b/e &&
> > +     ln -s ../../ dir5/a/b/f
> > +'
> > +
> > +test_expect_success SYMLINKS 'dir-iterator should not follow symlinks by default' '
> > +     cat >expected-no-follow-sorted-output <<-EOF &&
> > +     [d] (a) [a] ./dir4/a
> > +     [d] (b) [b] ./dir4/b
> > +     [d] (b/c) [c] ./dir4/b/c
> > +     [f] (a/d) [d] ./dir4/a/d
> > +     [s] (a/e) [e] ./dir4/a/e
> > +     [s] (a/f) [f] ./dir4/a/f
> > +     EOF
> > +
> > +     test-tool dir-iterator ./dir4 >out &&
> > +     sort <out >actual-no-follow-sorted-output &&
>
> Unnecessary redirection; 'sort' can open the file on its own.
>
> > +
> > +     test_cmp expected-no-follow-sorted-output actual-no-follow-sorted-output
> > +'
> > +
> > +test_expect_success SYMLINKS 'dir-iterator should follow symlinks w/ follow flag' '
> > +     cat >expected-follow-sorted-output <<-EOF &&
> > +     [d] (a) [a] ./dir4/a
> > +     [d] (a/f) [f] ./dir4/a/f
> > +     [d] (a/f/c) [c] ./dir4/a/f/c
> > +     [d] (b) [b] ./dir4/b
> > +     [d] (b/c) [c] ./dir4/b/c
> > +     [f] (a/d) [d] ./dir4/a/d
> > +     [f] (a/e) [e] ./dir4/a/e
> > +     EOF
> > +
> > +     test-tool dir-iterator --follow-symlinks ./dir4 >out &&
> > +     sort <out >actual-follow-sorted-output &&
>
> Likewise.
>
> > +     test_cmp expected-follow-sorted-output actual-follow-sorted-output
> > +'
> > +
> > +
> > +test_expect_success SYMLINKS 'dir-iterator should ignore recursive symlinks w/ follow flag' '
> > +     cat >expected-rec-symlinks-sorted-output <<-EOF &&
> > +     [d] (a) [a] ./dir5/a
> > +     [d] (a/b) [b] ./dir5/a/b
> > +     [d] (a/b/d) [d] ./dir5/a/b/d
> > +     [d] (a/c) [c] ./dir5/a/c
> > +     EOF
> > +
> > +     test-tool dir-iterator --follow-symlinks ./dir5 >out &&
> > +     sort <out >actual-rec-symlinks-sorted-output &&
>
> Likewise.
>
> > +     test_cmp expected-rec-symlinks-sorted-output actual-rec-symlinks-sorted-output
> > +'
> > +
> >  test_done
> > --
> > 2.22.0
> >


* [GSoC][PATCH v8 00/10] clone: dir-iterator refactoring with tests
  2019-06-18 23:27         ` [GSoC][PATCH v7 00/10] clone: dir-iterator refactoring with tests Matheus Tavares
                             ` (11 preceding siblings ...)
  2019-06-20 20:18           ` Junio C Hamano
@ 2019-07-10 23:58           ` " Matheus Tavares
  2019-07-10 23:58             ` [GSoC][PATCH v8 01/10] clone: test for our behavior on odd objects/* content Matheus Tavares
                               ` (10 more replies)
  12 siblings, 11 replies; 127+ messages in thread
From: Matheus Tavares @ 2019-07-10 23:58 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git, Thomas Gummerer, Ævar Arnfjörð Bjarmason,
	Christian Couder, Nguyễn Thái Ngọc Duy,
	SZEDER Gábor, Olga Telezhnaya, Johannes Schindelin,
	kernel-usp

This patchset contains:
- tests to the dir-iterator API;
- dir-iterator refactoring to make its state machine simpler
  and feature adding with tests;
- a replacement of explicit recursive dir iteration at
  copy_or_link_directory for the dir-iterator API;
- some refactoring and behavior changes at local clone, mainly to
  take care of symlinks and hidden files at .git/objects, together
  with tests for these types of files.

Changes since v7[1]:
- Applied some style fixes to the tests, as suggested by SZEDER.
- Removed the code to find circular symlinks, as suggested in this[2]
thread. The way it was previously implemented wouldn't work on Windows,
so Dscho suggested that I remove this section until we come up with a
more portable implementation.

[1]: https://public-inbox.org/git/cover.1560898723.git.matheus.bernardino@usp.br/
[2]: https://public-inbox.org/git/nycvar.QRO.7.76.6.1907041136530.44@tvgsbejvaqbjf.bet/
travis build: https://travis-ci.org/matheustavares/git/builds/557047597

Daniel Ferreira (1):
  dir-iterator: add tests for dir-iterator API

Matheus Tavares (8):
  clone: better handle symlinked files at .git/objects/
  dir-iterator: use warning_errno when possible
  dir-iterator: refactor state machine model
  dir-iterator: add flags parameter to dir_iterator_begin
  clone: copy hidden paths at local clone
  clone: extract function from copy_or_link_directory
  clone: use dir-iterator to avoid explicit dir traversal
  clone: replace strcmp by fspathcmp

Ævar Arnfjörð Bjarmason (1):
  clone: test for our behavior on odd objects/* content

 Makefile                     |   1 +
 builtin/clone.c              |  75 +++++-----
 dir-iterator.c               | 263 ++++++++++++++++++++---------------
 dir-iterator.h               |  64 +++++++--
 refs/files-backend.c         |  17 ++-
 t/helper/test-dir-iterator.c |  58 ++++++++
 t/helper/test-tool.c         |   1 +
 t/helper/test-tool.h         |   1 +
 t/t0066-dir-iterator.sh      | 148 ++++++++++++++++++++
 t/t5604-clone-reference.sh   | 133 ++++++++++++++++++
 10 files changed, 597 insertions(+), 164 deletions(-)
 create mode 100644 t/helper/test-dir-iterator.c
 create mode 100755 t/t0066-dir-iterator.sh

Range-diff against v7:
 1:  437b1eb1c7 !  1:  a2016d9d3b clone: test for our behavior on odd objects/* content
    @@ -98,7 +98,7 @@
     +		mv $last_loose a-loose-dir &&
     +		ln -s a-loose-dir $last_loose &&
     +		find . -type f | sort >../../../T.objects-files.raw &&
    -+		echo unknown_content> unknown_file
    ++		echo unknown_content >unknown_file
     +	) &&
     +	git -C T fsck &&
     +	git -C T rev-list --all --objects >T.objects
 2:  108bea2652 !  2:  47a4f9b31c clone: better handle symlinked files at .git/objects/
    @@ -80,7 +80,7 @@
     +		cd ../ &&
      		find . -type f | sort >../../../T.objects-files.raw &&
     +		find . -type l | sort >../../../T.objects-symlinks.raw &&
    - 		echo unknown_content> unknown_file
    + 		echo unknown_content >unknown_file
      	) &&
      	git -C T fsck &&
     @@
 3:  2c0232be6c !  3:  bbce6a601b dir-iterator: add tests for dir-iterator API
    @@ -129,7 +129,7 @@
     +	EOF
     +
     +	test-tool dir-iterator ./dir >out &&
    -+	sort <out >./actual-iteration-sorted-output &&
    ++	sort out >./actual-iteration-sorted-output &&
     +
     +	test_cmp expected-iteration-sorted-output actual-iteration-sorted-output
     +'
 4:  0b76044165 =  4:  0cc5f1f0b4 dir-iterator: use warning_errno when possible
 5:  44c47d579c !  5:  f871b5d3f4 dir-iterator: refactor state machine model
    @@ -340,14 +340,14 @@
       * A typical iteration looks like this:
       *
       *     int ok;
    -  *     struct iterator *iter = dir_iterator_begin(path);
    -  *
    +- *     struct iterator *iter = dir_iterator_begin(path);
    ++ *     struct dir_iterator *iter = dir_iterator_begin(path);
    ++ *
     + *     if (!iter)
     + *             goto error_handler;
    -+ *
    +  *
       *     while ((ok = dir_iterator_advance(iter)) == ITER_OK) {
       *             if (want_to_stop_iteration()) {
    -  *                     ok = dir_iterator_abort(iter);
     @@
      };
      
 6:  86fc04ad0e !  6:  fe838d7eb4 dir-iterator: add flags parameter to dir_iterator_begin
    @@ -22,16 +22,6 @@
      diff --git a/dir-iterator.c b/dir-iterator.c
      --- a/dir-iterator.c
      +++ b/dir-iterator.c
    -@@
    - struct dir_iterator_level {
    - 	DIR *dir;
    - 
    -+	/* The inode number of this level's directory. */
    -+	ino_t ino;
    -+
    - 	/*
    - 	 * The length of the directory part of path at this level
    - 	 * (including a trailing '/'):
     @@
      	 * that will be included in this iteration.
      	 */
    @@ -51,10 +41,6 @@
      static int push_level(struct dir_iterator_int *iter)
      {
     @@
    - 	if (!is_dir_sep(iter->base.path.buf[iter->base.path.len - 1]))
    - 		strbuf_addch(&iter->base.path, '/');
    - 	level->prefix_len = iter->base.path.len;
    -+	level->ino = iter->base.st.st_ino;
      
      	level->dir = opendir(iter->base.path.buf);
      	if (!level->dir) {
    @@ -96,33 +82,17 @@
     +		err = stat(iter->base.path.buf, &iter->base.st);
     +	else
     +		err = lstat(iter->base.path.buf, &iter->base.st);
    -+
    + 
    +-	return 0;
     +	saved_errno = errno;
     +	if (err && errno != ENOENT)
     +		warning_errno("failed to stat '%s'", iter->base.path.buf);
     +
     +	errno = saved_errno;
     +	return err;
    -+}
    -+
    -+/*
    -+ * Look for a recursive symlink at iter->base.path pointing to any directory on
    -+ * the previous stack levels. If it is found, return 1. If not, return 0.
    -+ */
    -+static int find_recursive_symlinks(struct dir_iterator_int *iter)
    -+{
    -+	int i;
    -+
    -+	if (!(iter->flags & DIR_ITERATOR_FOLLOW_SYMLINKS) ||
    -+	    !S_ISDIR(iter->base.st.st_mode))
    -+		return 0;
    - 
    -+	for (i = 0; i < iter->levels_nr; ++i)
    -+		if (iter->base.st.st_ino == iter->levels[i].ino)
    -+			return 1;
    - 	return 0;
      }
      
    + int dir_iterator_advance(struct dir_iterator *dir_iterator)
     @@
      	struct dir_iterator_int *iter =
      		(struct dir_iterator_int *)dir_iterator;
    @@ -165,12 +135,6 @@
     +			if (errno != ENOENT && iter->flags & DIR_ITERATOR_PEDANTIC)
     +				goto error_out;
      			continue;
    -+		}
    -+
    -+		if (find_recursive_symlinks(iter)) {
    -+			warning("ignoring recursive symlink at '%s'",
    -+				iter->base.path.buf);
    -+			continue;
     +		}
      
      		return ITER_OK;
    @@ -207,7 +171,7 @@
       * A typical iteration looks like this:
       *
       *     int ok;
    -- *     struct iterator *iter = dir_iterator_begin(path);
    +- *     struct dir_iterator *iter = dir_iterator_begin(path);
     + *     unsigned int flags = DIR_ITERATOR_PEDANTIC;
     + *     struct dir_iterator *iter = dir_iterator_begin(path, flags);
       *
    @@ -230,9 +194,12 @@
     + * - DIR_ITERATOR_FOLLOW_SYMLINKS: make dir-iterator follow symlinks.
     + *   i.e., linked directories' contents will be iterated over and
     + *   iter->base.st will contain information on the referred files,
    -+ *   not the symlinks themselves, which is the default behavior.
    -+ *   Recursive symlinks are skipped with a warning and broken symlinks
    -+ *   are ignored.
    ++ *   not the symlinks themselves, which is the default behavior. Broken
    ++ *   symlinks are ignored.
    ++ *
    ++ * Warning: circular symlinks are also followed when
    ++ * DIR_ITERATOR_FOLLOW_SYMLINKS is set. The iteration may end up with
    ++ * an ELOOP if they happen and DIR_ITERATOR_PEDANTIC is set.
     + */
     +#define DIR_ITERATOR_PEDANTIC (1 << 0)
     +#define DIR_ITERATOR_FOLLOW_SYMLINKS (1 << 1)
    @@ -383,7 +350,7 @@
     +	EOF
     +
     +	mkdir -p dir3/a &&
    -+	> dir3/a/b &&
    ++	>dir3/a/b &&
     +	chmod 0 dir3/a &&
     +
     +	test-tool dir-iterator ./dir3 >actual-no-permissions-output &&
    @@ -399,7 +366,7 @@
     +	EOF
     +
     +	mkdir -p dir3/a &&
    -+	> dir3/a/b &&
    ++	>dir3/a/b &&
     +	chmod 0 dir3/a &&
     +
     +	test_must_fail test-tool dir-iterator --pedantic ./dir3 \
    @@ -435,7 +402,7 @@
     +	EOF
     +
     +	test-tool dir-iterator ./dir4 >out &&
    -+	sort <out >actual-no-follow-sorted-output &&
    ++	sort out >actual-no-follow-sorted-output &&
     +
     +	test_cmp expected-no-follow-sorted-output actual-no-follow-sorted-output
     +'
    @@ -452,24 +419,9 @@
     +	EOF
     +
     +	test-tool dir-iterator --follow-symlinks ./dir4 >out &&
    -+	sort <out >actual-follow-sorted-output &&
    ++	sort out >actual-follow-sorted-output &&
     +
     +	test_cmp expected-follow-sorted-output actual-follow-sorted-output
     +'
    -+
    -+
    -+test_expect_success SYMLINKS 'dir-iterator should ignore recursive symlinks w/ follow flag' '
    -+	cat >expected-rec-symlinks-sorted-output <<-EOF &&
    -+	[d] (a) [a] ./dir5/a
    -+	[d] (a/b) [b] ./dir5/a/b
    -+	[d] (a/b/d) [d] ./dir5/a/b/d
    -+	[d] (a/c) [c] ./dir5/a/c
    -+	EOF
    -+
    -+	test-tool dir-iterator --follow-symlinks ./dir5 >out &&
    -+	sort <out >actual-rec-symlinks-sorted-output &&
    -+
    -+	test_cmp expected-rec-symlinks-sorted-output actual-rec-symlinks-sorted-output
    -+'
     +
      test_done
 7:  17685057cd =  7:  3da6408e04 clone: copy hidden paths at local clone
 8:  c7f3a8640e =  8:  af7430eb2c clone: extract function from copy_or_link_directory
 9:  7934036d30 !  9:  e8308c7408 clone: use dir-iterator to avoid explicit dir traversal
    @@ -11,11 +11,7 @@
         error on readdir or stat inside dir_iterator_advance. Previously it
         would just print a warning for errors on stat and ignore errors on
         readdir, which isn't nice because a local git clone could succeed even
    -    though the .git/objects copy didn't fully succeed. Also, with the
    -    dir-iterator API, recursive symlinks will be detected and skipped. This
    -    is another behavior improvement, since the current version would
    -    continue to copy the same content over and over until stat() returned an
    -    ELOOP error.
    +    though the .git/objects copy didn't fully succeed.
     
         Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
     
10:  2e25c03c07 = 10:  782ca07eed clone: replace strcmp by fspathcmp
-- 
2.22.0



* [GSoC][PATCH v8 01/10] clone: test for our behavior on odd objects/* content
  2019-07-10 23:58           ` [GSoC][PATCH v8 " Matheus Tavares
@ 2019-07-10 23:58             ` Matheus Tavares
  2019-07-10 23:58             ` [GSoC][PATCH v8 02/10] clone: better handle symlinked files at .git/objects/ Matheus Tavares
                               ` (9 subsequent siblings)
  10 siblings, 0 replies; 127+ messages in thread
From: Matheus Tavares @ 2019-07-10 23:58 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git, Thomas Gummerer, Ævar Arnfjörð Bjarmason,
	Christian Couder, Nguyễn Thái Ngọc Duy,
	SZEDER Gábor, Olga Telezhnaya, Johannes Schindelin,
	kernel-usp, Alex Riesen

From: Ævar Arnfjörð Bjarmason <avarab@gmail.com>

Add tests for what happens when we perform a local clone on a repo
containing odd files in the .git/objects directory, such as symlinks to
other dirs or unknown files.

I'm bending over backwards here to avoid a SHA-1 dependency. See [1]
for an earlier and simpler version that hardcoded SHA-1s.

This behavior has been the same for a *long* time, but hasn't been
tested for.

There's a good post-hoc argument to be made for copying over unknown
things, e.g. I'd like a git version that doesn't know about the
commit-graph to copy it under "clone --local" so a newer git version
can make use of it.

In follow-up commits we'll look at changing some of this behavior, but
for now, let's just assert it as-is so we'll notice what we'll change
later.

1. https://public-inbox.org/git/20190226002625.13022-5-avarab@gmail.com/

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
[matheus.bernardino: improved and split tests in more than one patch]
Helped-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
---
 t/t5604-clone-reference.sh | 111 +++++++++++++++++++++++++++++++++++++
 1 file changed, 111 insertions(+)

diff --git a/t/t5604-clone-reference.sh b/t/t5604-clone-reference.sh
index 4320082b1b..11250cab40 100755
--- a/t/t5604-clone-reference.sh
+++ b/t/t5604-clone-reference.sh
@@ -221,4 +221,115 @@ test_expect_success 'clone, dissociate from alternates' '
 	( cd C && git fsck )
 '
 
+test_expect_success 'setup repo with garbage in objects/*' '
+	git init S &&
+	(
+		cd S &&
+		test_commit A &&
+
+		cd .git/objects &&
+		>.some-hidden-file &&
+		>some-file &&
+		mkdir .some-hidden-dir &&
+		>.some-hidden-dir/some-file &&
+		>.some-hidden-dir/.some-dot-file &&
+		mkdir some-dir &&
+		>some-dir/some-file &&
+		>some-dir/.some-dot-file
+	)
+'
+
+test_expect_success 'clone a repo with garbage in objects/*' '
+	for option in --local --no-hardlinks --shared --dissociate
+	do
+		git clone $option S S$option || return 1 &&
+		git -C S$option fsck || return 1
+	done &&
+	find S-* -name "*some*" | sort >actual &&
+	cat >expected <<-EOF &&
+	S--dissociate/.git/objects/.some-hidden-file
+	S--dissociate/.git/objects/some-dir
+	S--dissociate/.git/objects/some-dir/.some-dot-file
+	S--dissociate/.git/objects/some-dir/some-file
+	S--dissociate/.git/objects/some-file
+	S--local/.git/objects/.some-hidden-file
+	S--local/.git/objects/some-dir
+	S--local/.git/objects/some-dir/.some-dot-file
+	S--local/.git/objects/some-dir/some-file
+	S--local/.git/objects/some-file
+	S--no-hardlinks/.git/objects/.some-hidden-file
+	S--no-hardlinks/.git/objects/some-dir
+	S--no-hardlinks/.git/objects/some-dir/.some-dot-file
+	S--no-hardlinks/.git/objects/some-dir/some-file
+	S--no-hardlinks/.git/objects/some-file
+	EOF
+	test_cmp expected actual
+'
+
+test_expect_success SYMLINKS 'setup repo with manually symlinked dirs and unknown files at objects/' '
+	git init T &&
+	(
+		cd T &&
+		git config gc.auto 0 &&
+		test_commit A &&
+		git gc &&
+		test_commit B &&
+
+		cd .git/objects &&
+		mv pack packs &&
+		ln -s packs pack &&
+		find ?? -type d >loose-dirs &&
+		last_loose=$(tail -n 1 loose-dirs) &&
+		rm -f loose-dirs &&
+		mv $last_loose a-loose-dir &&
+		ln -s a-loose-dir $last_loose &&
+		find . -type f | sort >../../../T.objects-files.raw &&
+		echo unknown_content >unknown_file
+	) &&
+	git -C T fsck &&
+	git -C T rev-list --all --objects >T.objects
+'
+
+
+test_expect_success SYMLINKS 'clone repo with symlinked dirs and unknown files at objects/' '
+	for option in --local --no-hardlinks --shared --dissociate
+	do
+		git clone $option T T$option || return 1 &&
+		git -C T$option fsck || return 1 &&
+		git -C T$option rev-list --all --objects >T$option.objects &&
+		test_cmp T.objects T$option.objects &&
+		(
+			cd T$option/.git/objects &&
+			find . -type f | sort >../../../T$option.objects-files.raw
+		)
+	done &&
+
+	for raw in $(ls T*.raw)
+	do
+		sed -e "s!/../!/Y/!; s![0-9a-f]\{38,\}!Z!" -e "/commit-graph/d" \
+		    -e "/multi-pack-index/d" <$raw >$raw.de-sha || return 1
+	done &&
+
+	cat >expected-files <<-EOF &&
+	./Y/Z
+	./Y/Z
+	./a-loose-dir/Z
+	./Y/Z
+	./info/packs
+	./pack/pack-Z.idx
+	./pack/pack-Z.pack
+	./packs/pack-Z.idx
+	./packs/pack-Z.pack
+	./unknown_file
+	EOF
+
+	for option in --local --dissociate --no-hardlinks
+	do
+		test_cmp expected-files T$option.objects-files.raw.de-sha || return 1
+	done &&
+
+	echo ./info/alternates >expected-files &&
+	test_cmp expected-files T--shared.objects-files.raw
+'
+
 test_done
-- 
2.22.0



* [GSoC][PATCH v8 02/10] clone: better handle symlinked files at .git/objects/
  2019-07-10 23:58           ` [GSoC][PATCH v8 " Matheus Tavares
  2019-07-10 23:58             ` [GSoC][PATCH v8 01/10] clone: test for our behavior on odd objects/* content Matheus Tavares
@ 2019-07-10 23:58             ` Matheus Tavares
  2019-07-10 23:58             ` [GSoC][PATCH v8 03/10] dir-iterator: add tests for dir-iterator API Matheus Tavares
                               ` (8 subsequent siblings)
  10 siblings, 0 replies; 127+ messages in thread
From: Matheus Tavares @ 2019-07-10 23:58 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git, Thomas Gummerer, Ævar Arnfjörð Bjarmason,
	Christian Couder, Nguyễn Thái Ngọc Duy,
	SZEDER Gábor, Olga Telezhnaya, Johannes Schindelin,
	kernel-usp

There is currently an odd behaviour when locally cloning a repository
with symlinks at .git/objects: with --no-hardlinks, all symlinks are
dereferenced, but without it, Git will try to hardlink the files with the
link() function, whose behaviour on symlinks is OS-specific. On OSX
and NetBSD, it creates a hardlink to the file pointed to by the symlink,
whilst on GNU/Linux, it creates a hardlink to the symlink itself.

On Manjaro GNU/Linux:
    $ touch a
    $ ln -s a b
    $ link b c
    $ ls -li a b c
    155 [...] a
    156 [...] b -> a
    156 [...] c -> a

But on NetBSD:
    $ ls -li a b c
    2609160 [...] a
    2609164 [...] b -> a
    2609160 [...] c

It's not good for the result of a local clone to be OS-dependent, and
besides that, the current behaviour on GNU/Linux may result in broken
symlinks. So let's standardize this by making the hardlinks always point
to dereferenced paths, instead of the symlinks themselves. Also, add
tests for symlinked files at .git/objects/.

Note: Git won't create symlinks at .git/objects itself, but it's better
to handle this case and be friendly with users who manually create them.

Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Co-authored-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 builtin/clone.c            |  2 +-
 t/t5604-clone-reference.sh | 27 ++++++++++++++++++++-------
 2 files changed, 21 insertions(+), 8 deletions(-)

diff --git a/builtin/clone.c b/builtin/clone.c
index 5b9ebe9947..4a0a2455a7 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -445,7 +445,7 @@ static void copy_or_link_directory(struct strbuf *src, struct strbuf *dest,
 		if (unlink(dest->buf) && errno != ENOENT)
 			die_errno(_("failed to unlink '%s'"), dest->buf);
 		if (!option_no_hardlinks) {
-			if (!link(src->buf, dest->buf))
+			if (!link(real_path(src->buf), dest->buf))
 				continue;
 			if (option_local > 0)
 				die_errno(_("failed to create link '%s'"), dest->buf);
diff --git a/t/t5604-clone-reference.sh b/t/t5604-clone-reference.sh
index 11250cab40..459ad8a20b 100755
--- a/t/t5604-clone-reference.sh
+++ b/t/t5604-clone-reference.sh
@@ -266,7 +266,7 @@ test_expect_success 'clone a repo with garbage in objects/*' '
 	test_cmp expected actual
 '
 
-test_expect_success SYMLINKS 'setup repo with manually symlinked dirs and unknown files at objects/' '
+test_expect_success SYMLINKS 'setup repo with manually symlinked or unknown files at objects/' '
 	git init T &&
 	(
 		cd T &&
@@ -280,10 +280,19 @@ test_expect_success SYMLINKS 'setup repo with manually symlinked dirs and unknow
 		ln -s packs pack &&
 		find ?? -type d >loose-dirs &&
 		last_loose=$(tail -n 1 loose-dirs) &&
-		rm -f loose-dirs &&
 		mv $last_loose a-loose-dir &&
 		ln -s a-loose-dir $last_loose &&
+		first_loose=$(head -n 1 loose-dirs) &&
+		rm -f loose-dirs &&
+
+		cd $first_loose &&
+		obj=$(ls *) &&
+		mv $obj ../an-object &&
+		ln -s ../an-object $obj &&
+
+		cd ../ &&
 		find . -type f | sort >../../../T.objects-files.raw &&
+		find . -type l | sort >../../../T.objects-symlinks.raw &&
 		echo unknown_content >unknown_file
 	) &&
 	git -C T fsck &&
@@ -291,7 +300,7 @@ test_expect_success SYMLINKS 'setup repo with manually symlinked dirs and unknow
 '
 
 
-test_expect_success SYMLINKS 'clone repo with symlinked dirs and unknown files at objects/' '
+test_expect_success SYMLINKS 'clone repo with symlinked or unknown files at objects/' '
 	for option in --local --no-hardlinks --shared --dissociate
 	do
 		git clone $option T T$option || return 1 &&
@@ -300,7 +309,8 @@ test_expect_success SYMLINKS 'clone repo with symlinked dirs and unknown files a
 		test_cmp T.objects T$option.objects &&
 		(
 			cd T$option/.git/objects &&
-			find . -type f | sort >../../../T$option.objects-files.raw
+			find . -type f | sort >../../../T$option.objects-files.raw &&
+			find . -type l | sort >../../../T$option.objects-symlinks.raw
 		)
 	done &&
 
@@ -314,6 +324,7 @@ test_expect_success SYMLINKS 'clone repo with symlinked dirs and unknown files a
 	./Y/Z
 	./Y/Z
 	./a-loose-dir/Z
+	./an-object
 	./Y/Z
 	./info/packs
 	./pack/pack-Z.idx
@@ -323,13 +334,15 @@ test_expect_success SYMLINKS 'clone repo with symlinked dirs and unknown files a
 	./unknown_file
 	EOF
 
-	for option in --local --dissociate --no-hardlinks
+	for option in --local --no-hardlinks --dissociate
 	do
-		test_cmp expected-files T$option.objects-files.raw.de-sha || return 1
+		test_cmp expected-files T$option.objects-files.raw.de-sha || return 1 &&
+		test_must_be_empty T$option.objects-symlinks.raw.de-sha || return 1
 	done &&
 
 	echo ./info/alternates >expected-files &&
-	test_cmp expected-files T--shared.objects-files.raw
+	test_cmp expected-files T--shared.objects-files.raw &&
+	test_must_be_empty T--shared.objects-symlinks.raw
 '
 
 test_done
-- 
2.22.0



* [GSoC][PATCH v8 03/10] dir-iterator: add tests for dir-iterator API
  2019-07-10 23:58           ` [GSoC][PATCH v8 " Matheus Tavares
  2019-07-10 23:58             ` [GSoC][PATCH v8 01/10] clone: test for our behavior on odd objects/* content Matheus Tavares
  2019-07-10 23:58             ` [GSoC][PATCH v8 02/10] clone: better handle symlinked files at .git/objects/ Matheus Tavares
@ 2019-07-10 23:58             ` Matheus Tavares
  2019-07-10 23:58             ` [GSoC][PATCH v8 04/10] dir-iterator: use warning_errno when possible Matheus Tavares
                               ` (7 subsequent siblings)
  10 siblings, 0 replies; 127+ messages in thread
From: Matheus Tavares @ 2019-07-10 23:58 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git, Thomas Gummerer, Ævar Arnfjörð Bjarmason,
	Christian Couder, Nguyễn Thái Ngọc Duy,
	SZEDER Gábor, Olga Telezhnaya, Johannes Schindelin,
	kernel-usp, Daniel Ferreira

From: Daniel Ferreira <bnmvco@gmail.com>

Create t/helper/test-dir-iterator.c, which prints relevant information
about a directory tree iterated over with dir-iterator.

Create t/t0066-dir-iterator.sh, which tests that dir-iterator does
iterate through a whole directory tree as expected.

Signed-off-by: Daniel Ferreira <bnmvco@gmail.com>
[matheus.bernardino: update to use test-tool and some minor aesthetics]
Helped-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
---
 Makefile                     |  1 +
 t/helper/test-dir-iterator.c | 33 ++++++++++++++++++++++
 t/helper/test-tool.c         |  1 +
 t/helper/test-tool.h         |  1 +
 t/t0066-dir-iterator.sh      | 55 ++++++++++++++++++++++++++++++++++++
 5 files changed, 91 insertions(+)
 create mode 100644 t/helper/test-dir-iterator.c
 create mode 100755 t/t0066-dir-iterator.sh

diff --git a/Makefile b/Makefile
index f58bf14c7b..7e2a44cccc 100644
--- a/Makefile
+++ b/Makefile
@@ -704,6 +704,7 @@ TEST_BUILTINS_OBJS += test-config.o
 TEST_BUILTINS_OBJS += test-ctype.o
 TEST_BUILTINS_OBJS += test-date.o
 TEST_BUILTINS_OBJS += test-delta.o
+TEST_BUILTINS_OBJS += test-dir-iterator.o
 TEST_BUILTINS_OBJS += test-drop-caches.o
 TEST_BUILTINS_OBJS += test-dump-cache-tree.o
 TEST_BUILTINS_OBJS += test-dump-fsmonitor.o
diff --git a/t/helper/test-dir-iterator.c b/t/helper/test-dir-iterator.c
new file mode 100644
index 0000000000..84f50bed8c
--- /dev/null
+++ b/t/helper/test-dir-iterator.c
@@ -0,0 +1,33 @@
+#include "test-tool.h"
+#include "git-compat-util.h"
+#include "strbuf.h"
+#include "iterator.h"
+#include "dir-iterator.h"
+
+/* Argument is a directory path to iterate over */
+int cmd__dir_iterator(int argc, const char **argv)
+{
+	struct strbuf path = STRBUF_INIT;
+	struct dir_iterator *diter;
+
+	if (argc < 2)
+		die("BUG: test-dir-iterator needs one argument");
+
+	strbuf_add(&path, argv[1], strlen(argv[1]));
+
+	diter = dir_iterator_begin(path.buf);
+
+	while (dir_iterator_advance(diter) == ITER_OK) {
+		if (S_ISDIR(diter->st.st_mode))
+			printf("[d] ");
+		else if (S_ISREG(diter->st.st_mode))
+			printf("[f] ");
+		else
+			printf("[?] ");
+
+		printf("(%s) [%s] %s\n", diter->relative_path, diter->basename,
+		       diter->path.buf);
+	}
+
+	return 0;
+}
diff --git a/t/helper/test-tool.c b/t/helper/test-tool.c
index 087a8c0cc9..7bc9bb231e 100644
--- a/t/helper/test-tool.c
+++ b/t/helper/test-tool.c
@@ -19,6 +19,7 @@ static struct test_cmd cmds[] = {
 	{ "ctype", cmd__ctype },
 	{ "date", cmd__date },
 	{ "delta", cmd__delta },
+	{ "dir-iterator", cmd__dir_iterator },
 	{ "drop-caches", cmd__drop_caches },
 	{ "dump-cache-tree", cmd__dump_cache_tree },
 	{ "dump-fsmonitor", cmd__dump_fsmonitor },
diff --git a/t/helper/test-tool.h b/t/helper/test-tool.h
index 7e703f3038..ec0ffbd0cb 100644
--- a/t/helper/test-tool.h
+++ b/t/helper/test-tool.h
@@ -9,6 +9,7 @@ int cmd__config(int argc, const char **argv);
 int cmd__ctype(int argc, const char **argv);
 int cmd__date(int argc, const char **argv);
 int cmd__delta(int argc, const char **argv);
+int cmd__dir_iterator(int argc, const char **argv);
 int cmd__drop_caches(int argc, const char **argv);
 int cmd__dump_cache_tree(int argc, const char **argv);
 int cmd__dump_fsmonitor(int argc, const char **argv);
diff --git a/t/t0066-dir-iterator.sh b/t/t0066-dir-iterator.sh
new file mode 100755
index 0000000000..59bce868f4
--- /dev/null
+++ b/t/t0066-dir-iterator.sh
@@ -0,0 +1,55 @@
+#!/bin/sh
+
+test_description='Test the dir-iterator functionality'
+
+. ./test-lib.sh
+
+test_expect_success 'setup' '
+	mkdir -p dir &&
+	mkdir -p dir/a/b/c/ &&
+	>dir/b &&
+	>dir/c &&
+	mkdir -p dir/d/e/d/ &&
+	>dir/a/b/c/d &&
+	>dir/a/e &&
+	>dir/d/e/d/a &&
+
+	mkdir -p dir2/a/b/c/ &&
+	>dir2/a/b/c/d
+'
+
+test_expect_success 'dir-iterator should iterate through all files' '
+	cat >expected-iteration-sorted-output <<-EOF &&
+	[d] (a) [a] ./dir/a
+	[d] (a/b) [b] ./dir/a/b
+	[d] (a/b/c) [c] ./dir/a/b/c
+	[d] (d) [d] ./dir/d
+	[d] (d/e) [e] ./dir/d/e
+	[d] (d/e/d) [d] ./dir/d/e/d
+	[f] (a/b/c/d) [d] ./dir/a/b/c/d
+	[f] (a/e) [e] ./dir/a/e
+	[f] (b) [b] ./dir/b
+	[f] (c) [c] ./dir/c
+	[f] (d/e/d/a) [a] ./dir/d/e/d/a
+	EOF
+
+	test-tool dir-iterator ./dir >out &&
+	sort out >./actual-iteration-sorted-output &&
+
+	test_cmp expected-iteration-sorted-output actual-iteration-sorted-output
+'
+
+test_expect_success 'dir-iterator should list files in the correct order' '
+	cat >expected-pre-order-output <<-EOF &&
+	[d] (a) [a] ./dir2/a
+	[d] (a/b) [b] ./dir2/a/b
+	[d] (a/b/c) [c] ./dir2/a/b/c
+	[f] (a/b/c/d) [d] ./dir2/a/b/c/d
+	EOF
+
+	test-tool dir-iterator ./dir2 >actual-pre-order-output &&
+
+	test_cmp expected-pre-order-output actual-pre-order-output
+'
+
+test_done
-- 
2.22.0



* [GSoC][PATCH v8 04/10] dir-iterator: use warning_errno when possible
  2019-07-10 23:58           ` [GSoC][PATCH v8 " Matheus Tavares
                               ` (2 preceding siblings ...)
  2019-07-10 23:58             ` [GSoC][PATCH v8 03/10] dir-iterator: add tests for dir-iterator API Matheus Tavares
@ 2019-07-10 23:58             ` Matheus Tavares
  2019-07-10 23:58             ` [GSoC][PATCH v8 05/10] dir-iterator: refactor state machine model Matheus Tavares
                               ` (6 subsequent siblings)
  10 siblings, 0 replies; 127+ messages in thread
From: Matheus Tavares @ 2019-07-10 23:58 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git, Thomas Gummerer, Ævar Arnfjörð Bjarmason,
	Christian Couder, Nguyễn Thái Ngọc Duy,
	SZEDER Gábor, Olga Telezhnaya, Johannes Schindelin,
	kernel-usp, Michael Haggerty

Replace warning(..., strerror(errno)) with warning_errno(...). This helps
to unify warning display and simplifies the code a bit. Also,
improve warning messages by surrounding paths with quotation marks and
using more meaningful statements.
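
One detail worth spelling out: a helper that reads errno only reports the
right error if nothing clobbers errno between the failing call and the
report. That is presumably why the dir_iterator_abort() hunk saves and
restores errno around strbuf_setlen(). The sketch below shows the same
pattern in isolation. format_warning_errno() is a hypothetical stand-in
written for this example; it is not Git's warning_errno() and makes no
claim about how that function is implemented.

```c
#include <assert.h>
#include <errno.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/*
 * Format a message into buf and append ": <strerror(errno)>", using the
 * value errno had on entry. errno is restored before returning, so the
 * caller can still inspect it afterwards.
 * Returns the length written, or -1 if buf is too small.
 */
static int format_warning_errno(char *buf, size_t size, const char *fmt, ...)
{
	int saved_errno = errno; /* later library calls may change errno */
	va_list ap;
	int len, tail;

	assert(buf && fmt && size > 0);

	va_start(ap, fmt);
	len = vsnprintf(buf, size, fmt, ap);
	va_end(ap);
	if (len < 0 || (size_t)len >= size)
		return -1;

	tail = snprintf(buf + len, size - (size_t)len, ": %s",
			strerror(saved_errno));
	errno = saved_errno;
	if (tail < 0 || (size_t)tail >= size - (size_t)len)
		return -1;
	return len + tail;
}
```

A caller would use it right after a failing call, for example
format_warning_errno(buf, sizeof(buf), "failed to stat '%s'", path).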

Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
---
 dir-iterator.c | 23 ++++++++++++-----------
 1 file changed, 12 insertions(+), 11 deletions(-)

diff --git a/dir-iterator.c b/dir-iterator.c
index f2dcd82fde..0c8880868a 100644
--- a/dir-iterator.c
+++ b/dir-iterator.c
@@ -71,8 +71,8 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
 
 			level->dir = opendir(iter->base.path.buf);
 			if (!level->dir && errno != ENOENT) {
-				warning("error opening directory %s: %s",
-					iter->base.path.buf, strerror(errno));
+				warning_errno("error opening directory '%s'",
+					      iter->base.path.buf);
 				/* Popping the level is handled below */
 			}
 
@@ -122,11 +122,11 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
 			if (!de) {
 				/* This level is exhausted; pop up a level. */
 				if (errno) {
-					warning("error reading directory %s: %s",
-						iter->base.path.buf, strerror(errno));
+					warning_errno("error reading directory '%s'",
+						      iter->base.path.buf);
 				} else if (closedir(level->dir))
-					warning("error closing directory %s: %s",
-						iter->base.path.buf, strerror(errno));
+					warning_errno("error closing directory '%s'",
+						      iter->base.path.buf);
 
 				level->dir = NULL;
 				if (--iter->levels_nr == 0)
@@ -140,9 +140,8 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
 			strbuf_addstr(&iter->base.path, de->d_name);
 			if (lstat(iter->base.path.buf, &iter->base.st) < 0) {
 				if (errno != ENOENT)
-					warning("error reading path '%s': %s",
-						iter->base.path.buf,
-						strerror(errno));
+					warning_errno("failed to stat '%s'",
+						      iter->base.path.buf);
 				continue;
 			}
 
@@ -170,9 +169,11 @@ int dir_iterator_abort(struct dir_iterator *dir_iterator)
 			&iter->levels[iter->levels_nr - 1];
 
 		if (level->dir && closedir(level->dir)) {
+			int saved_errno = errno;
 			strbuf_setlen(&iter->base.path, level->prefix_len);
-			warning("error closing directory %s: %s",
-				iter->base.path.buf, strerror(errno));
+			errno = saved_errno;
+			warning_errno("error closing directory '%s'",
+				      iter->base.path.buf);
 		}
 	}
 
-- 
2.22.0



* [GSoC][PATCH v8 05/10] dir-iterator: refactor state machine model
  2019-07-10 23:58           ` [GSoC][PATCH v8 " Matheus Tavares
                               ` (3 preceding siblings ...)
  2019-07-10 23:58             ` [GSoC][PATCH v8 04/10] dir-iterator: use warning_errno when possible Matheus Tavares
@ 2019-07-10 23:58             ` Matheus Tavares
  2019-07-10 23:59             ` [GSoC][PATCH v8 06/10] dir-iterator: add flags parameter to dir_iterator_begin Matheus Tavares
                               ` (5 subsequent siblings)
  10 siblings, 0 replies; 127+ messages in thread
From: Matheus Tavares @ 2019-07-10 23:58 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git, Thomas Gummerer, Ævar Arnfjörð Bjarmason,
	Christian Couder, Nguyễn Thái Ngọc Duy,
	SZEDER Gábor, Olga Telezhnaya, Johannes Schindelin,
	kernel-usp, Daniel Ferreira, Jeff King, Ramsay Jones,
	Michael Haggerty

dir_iterator_advance() is a large function with two nested loops. Let's
improve its readability by factoring out three functions and simplifying
its mechanics. The refactored model will no longer depend on
level.initialized and level.dir_state to keep track of the iteration
state and will run in a single loop.

Also, dir_iterator_begin() currently does not check if the given string
represents a valid directory path. Since the refactored model will have
to stat() the given path at initialization, let's also check for this
kind of error and make dir_iterator_begin() return NULL on failure,
with errno appropriately set. And add tests for this new behavior.

Improve documentation at dir-iterator.h and code comments at
dir-iterator.c to reflect the changes and eliminate possible
ambiguities.

Finally, adjust refs/files-backend.c to check for now possible
dir_iterator_begin() failures.

Original-patch-by: Daniel Ferreira <bnmvco@gmail.com>
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
---
 dir-iterator.c               | 234 ++++++++++++++++++-----------------
 dir-iterator.h               |  17 ++-
 refs/files-backend.c         |  17 ++-
 t/helper/test-dir-iterator.c |   5 +
 t/t0066-dir-iterator.sh      |  13 ++
 5 files changed, 164 insertions(+), 122 deletions(-)

diff --git a/dir-iterator.c b/dir-iterator.c
index 0c8880868a..594fe4d67b 100644
--- a/dir-iterator.c
+++ b/dir-iterator.c
@@ -4,8 +4,6 @@
 #include "dir-iterator.h"
 
 struct dir_iterator_level {
-	int initialized;
-
 	DIR *dir;
 
 	/*
@@ -13,16 +11,6 @@ struct dir_iterator_level {
 	 * (including a trailing '/'):
 	 */
 	size_t prefix_len;
-
-	/*
-	 * The last action that has been taken with the current entry
-	 * (needed for directories, which have to be included in the
-	 * iteration and also iterated into):
-	 */
-	enum {
-		DIR_STATE_ITER,
-		DIR_STATE_RECURSE
-	} dir_state;
 };
 
 /*
@@ -34,9 +22,11 @@ struct dir_iterator_int {
 	struct dir_iterator base;
 
 	/*
-	 * The number of levels currently on the stack. This is always
-	 * at least 1, because when it becomes zero the iteration is
-	 * ended and this struct is freed.
+	 * The number of levels currently on the stack. After the first
+	 * call to dir_iterator_begin(), if it succeeds to open the
+	 * first level's dir, this will always be at least 1. Then,
+	 * when it comes to zero the iteration is ended and this
+	 * struct is freed.
 	 */
 	size_t levels_nr;
 
@@ -50,113 +40,118 @@ struct dir_iterator_int {
 	struct dir_iterator_level *levels;
 };
 
+/*
+ * Push a level in the iter stack and initialize it with information from
+ * the directory pointed by iter->base->path. It is assumed that this
+ * strbuf points to a valid directory path. Return 0 on success and -1
+ * otherwise, leaving the stack unchanged.
+ */
+static int push_level(struct dir_iterator_int *iter)
+{
+	struct dir_iterator_level *level;
+
+	ALLOC_GROW(iter->levels, iter->levels_nr + 1, iter->levels_alloc);
+	level = &iter->levels[iter->levels_nr++];
+
+	if (!is_dir_sep(iter->base.path.buf[iter->base.path.len - 1]))
+		strbuf_addch(&iter->base.path, '/');
+	level->prefix_len = iter->base.path.len;
+
+	level->dir = opendir(iter->base.path.buf);
+	if (!level->dir) {
+		if (errno != ENOENT) {
+			warning_errno("error opening directory '%s'",
+				      iter->base.path.buf);
+		}
+		iter->levels_nr--;
+		return -1;
+	}
+
+	return 0;
+}
+
+/*
+ * Pop the top level on the iter stack, releasing any resources associated
+ * with it. Return the new value of iter->levels_nr.
+ */
+static int pop_level(struct dir_iterator_int *iter)
+{
+	struct dir_iterator_level *level =
+		&iter->levels[iter->levels_nr - 1];
+
+	if (level->dir && closedir(level->dir))
+		warning_errno("error closing directory '%s'",
+			      iter->base.path.buf);
+	level->dir = NULL;
+
+	return --iter->levels_nr;
+}
+
+/*
+ * Populate iter->base with the necessary information on the next iteration
+ * entry, represented by the given dirent de. Return 0 on success and -1
+ * otherwise.
+ */
+static int prepare_next_entry_data(struct dir_iterator_int *iter,
+				   struct dirent *de)
+{
+	strbuf_addstr(&iter->base.path, de->d_name);
+	/*
+	 * We have to reset these because the path strbuf might have
+	 * been realloc()ed at the previous strbuf_addstr().
+	 */
+	iter->base.relative_path = iter->base.path.buf +
+				   iter->levels[0].prefix_len;
+	iter->base.basename = iter->base.path.buf +
+			      iter->levels[iter->levels_nr - 1].prefix_len;
+
+	if (lstat(iter->base.path.buf, &iter->base.st)) {
+		if (errno != ENOENT)
+			warning_errno("failed to stat '%s'", iter->base.path.buf);
+		return -1;
+	}
+
+	return 0;
+}
+
 int dir_iterator_advance(struct dir_iterator *dir_iterator)
 {
 	struct dir_iterator_int *iter =
 		(struct dir_iterator_int *)dir_iterator;
 
+	if (S_ISDIR(iter->base.st.st_mode)) {
+		if (push_level(iter) && iter->levels_nr == 0) {
+			/* Pushing the first level failed */
+			return dir_iterator_abort(dir_iterator);
+		}
+	}
+
+	/* Loop until we find an entry that we can give back to the caller. */
 	while (1) {
+		struct dirent *de;
 		struct dir_iterator_level *level =
 			&iter->levels[iter->levels_nr - 1];
-		struct dirent *de;
 
-		if (!level->initialized) {
-			/*
-			 * Note: dir_iterator_begin() ensures that
-			 * path is not the empty string.
-			 */
-			if (!is_dir_sep(iter->base.path.buf[iter->base.path.len - 1]))
-				strbuf_addch(&iter->base.path, '/');
-			level->prefix_len = iter->base.path.len;
-
-			level->dir = opendir(iter->base.path.buf);
-			if (!level->dir && errno != ENOENT) {
-				warning_errno("error opening directory '%s'",
+		strbuf_setlen(&iter->base.path, level->prefix_len);
+		errno = 0;
+		de = readdir(level->dir);
+
+		if (!de) {
+			if (errno)
+				warning_errno("error reading directory '%s'",
 					      iter->base.path.buf);
-				/* Popping the level is handled below */
-			}
-
-			level->initialized = 1;
-		} else if (S_ISDIR(iter->base.st.st_mode)) {
-			if (level->dir_state == DIR_STATE_ITER) {
-				/*
-				 * The directory was just iterated
-				 * over; now prepare to iterate into
-				 * it.
-				 */
-				level->dir_state = DIR_STATE_RECURSE;
-				ALLOC_GROW(iter->levels, iter->levels_nr + 1,
-					   iter->levels_alloc);
-				level = &iter->levels[iter->levels_nr++];
-				level->initialized = 0;
-				continue;
-			} else {
-				/*
-				 * The directory has already been
-				 * iterated over and iterated into;
-				 * we're done with it.
-				 */
-			}
+			else if (pop_level(iter) == 0)
+				return dir_iterator_abort(dir_iterator);
+			continue;
 		}
 
-		if (!level->dir) {
-			/*
-			 * This level is exhausted (or wasn't opened
-			 * successfully); pop up a level.
-			 */
-			if (--iter->levels_nr == 0)
-				return dir_iterator_abort(dir_iterator);
+		if (is_dot_or_dotdot(de->d_name))
+			continue;
 
+		if (prepare_next_entry_data(iter, de))
 			continue;
-		}
 
-		/*
-		 * Loop until we find an entry that we can give back
-		 * to the caller:
-		 */
-		while (1) {
-			strbuf_setlen(&iter->base.path, level->prefix_len);
-			errno = 0;
-			de = readdir(level->dir);
-
-			if (!de) {
-				/* This level is exhausted; pop up a level. */
-				if (errno) {
-					warning_errno("error reading directory '%s'",
-						      iter->base.path.buf);
-				} else if (closedir(level->dir))
-					warning_errno("error closing directory '%s'",
-						      iter->base.path.buf);
-
-				level->dir = NULL;
-				if (--iter->levels_nr == 0)
-					return dir_iterator_abort(dir_iterator);
-				break;
-			}
-
-			if (is_dot_or_dotdot(de->d_name))
-				continue;
-
-			strbuf_addstr(&iter->base.path, de->d_name);
-			if (lstat(iter->base.path.buf, &iter->base.st) < 0) {
-				if (errno != ENOENT)
-					warning_errno("failed to stat '%s'",
-						      iter->base.path.buf);
-				continue;
-			}
-
-			/*
-			 * We have to set these each time because
-			 * the path strbuf might have been realloc()ed.
-			 */
-			iter->base.relative_path =
-				iter->base.path.buf + iter->levels[0].prefix_len;
-			iter->base.basename =
-				iter->base.path.buf + level->prefix_len;
-			level->dir_state = DIR_STATE_ITER;
-
-			return ITER_OK;
-		}
+		return ITER_OK;
 	}
 }
 
@@ -187,17 +182,32 @@ struct dir_iterator *dir_iterator_begin(const char *path)
 {
 	struct dir_iterator_int *iter = xcalloc(1, sizeof(*iter));
 	struct dir_iterator *dir_iterator = &iter->base;
-
-	if (!path || !*path)
-		BUG("empty path passed to dir_iterator_begin()");
+	int saved_errno;
 
 	strbuf_init(&iter->base.path, PATH_MAX);
 	strbuf_addstr(&iter->base.path, path);
 
 	ALLOC_GROW(iter->levels, 10, iter->levels_alloc);
+	iter->levels_nr = 0;
 
-	iter->levels_nr = 1;
-	iter->levels[0].initialized = 0;
+	/*
+	 * Note: stat already checks for NULL or empty strings and
+	 * inexistent paths.
+	 */
+	if (stat(iter->base.path.buf, &iter->base.st) < 0) {
+		saved_errno = errno;
+		goto error_out;
+	}
+
+	if (!S_ISDIR(iter->base.st.st_mode)) {
+		saved_errno = ENOTDIR;
+		goto error_out;
+	}
 
 	return dir_iterator;
+
+error_out:
+	dir_iterator_abort(dir_iterator);
+	errno = saved_errno;
+	return NULL;
 }
diff --git a/dir-iterator.h b/dir-iterator.h
index 970793d07a..9b4cb7acd2 100644
--- a/dir-iterator.h
+++ b/dir-iterator.h
@@ -8,18 +8,22 @@
  *
  * Iterate over a directory tree, recursively, including paths of all
  * types and hidden paths. Skip "." and ".." entries and don't follow
- * symlinks except for the original path.
+ * symlinks except for the original path. Note that the original path
+ * is not included in the iteration.
  *
  * Every time dir_iterator_advance() is called, update the members of
  * the dir_iterator structure to reflect the next path in the
  * iteration. The order that paths are iterated over within a
- * directory is undefined, but directory paths are always iterated
- * over before the subdirectory contents.
+ * directory is undefined, but directory paths are always given
+ * before their contents.
  *
  * A typical iteration looks like this:
  *
  *     int ok;
- *     struct iterator *iter = dir_iterator_begin(path);
+ *     struct dir_iterator *iter = dir_iterator_begin(path);
+ *
+ *     if (!iter)
+ *             goto error_handler;
  *
  *     while ((ok = dir_iterator_advance(iter)) == ITER_OK) {
  *             if (want_to_stop_iteration()) {
@@ -59,8 +63,9 @@ struct dir_iterator {
 };
 
 /*
- * Start a directory iteration over path. Return a dir_iterator that
- * holds the internal state of the iteration.
+ * Start a directory iteration over path. On success, return a
+ * dir_iterator that holds the internal state of the iteration.
+ * In case of failure, return NULL and set errno accordingly.
  *
  * The iteration includes all paths under path, not including path
  * itself and not including "." or ".." entries.
diff --git a/refs/files-backend.c b/refs/files-backend.c
index 63e55e6773..7ed81046d4 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -2143,13 +2143,22 @@ static struct ref_iterator_vtable files_reflog_iterator_vtable = {
 static struct ref_iterator *reflog_iterator_begin(struct ref_store *ref_store,
 						  const char *gitdir)
 {
-	struct files_reflog_iterator *iter = xcalloc(1, sizeof(*iter));
-	struct ref_iterator *ref_iterator = &iter->base;
+	struct dir_iterator *diter;
+	struct files_reflog_iterator *iter;
+	struct ref_iterator *ref_iterator;
 	struct strbuf sb = STRBUF_INIT;
 
-	base_ref_iterator_init(ref_iterator, &files_reflog_iterator_vtable, 0);
 	strbuf_addf(&sb, "%s/logs", gitdir);
-	iter->dir_iterator = dir_iterator_begin(sb.buf);
+
+	diter = dir_iterator_begin(sb.buf);
+	if(!diter)
+		return empty_ref_iterator_begin();
+
+	iter = xcalloc(1, sizeof(*iter));
+	ref_iterator = &iter->base;
+
+	base_ref_iterator_init(ref_iterator, &files_reflog_iterator_vtable, 0);
+	iter->dir_iterator = diter;
 	iter->ref_store = ref_store;
 	strbuf_release(&sb);
 
diff --git a/t/helper/test-dir-iterator.c b/t/helper/test-dir-iterator.c
index 84f50bed8c..fab1ff6237 100644
--- a/t/helper/test-dir-iterator.c
+++ b/t/helper/test-dir-iterator.c
@@ -17,6 +17,11 @@ int cmd__dir_iterator(int argc, const char **argv)
 
 	diter = dir_iterator_begin(path.buf);
 
+	if (!diter) {
+		printf("dir_iterator_begin failure: %d\n", errno);
+		exit(EXIT_FAILURE);
+	}
+
 	while (dir_iterator_advance(diter) == ITER_OK) {
 		if (S_ISDIR(diter->st.st_mode))
 			printf("[d] ");
diff --git a/t/t0066-dir-iterator.sh b/t/t0066-dir-iterator.sh
index 59bce868f4..cc4b19c34c 100755
--- a/t/t0066-dir-iterator.sh
+++ b/t/t0066-dir-iterator.sh
@@ -52,4 +52,17 @@ test_expect_success 'dir-iterator should list files in the correct order' '
 	test_cmp expected-pre-order-output actual-pre-order-output
 '
 
+test_expect_success 'begin should fail upon inexistent paths' '
+	test_must_fail test-tool dir-iterator ./inexistent-path \
+		>actual-inexistent-path-output &&
+	echo "dir_iterator_begin failure: 2" >expected-inexistent-path-output &&
+	test_cmp expected-inexistent-path-output actual-inexistent-path-output
+'
+
+test_expect_success 'begin should fail upon non directory paths' '
+	test_must_fail test-tool dir-iterator ./dir/b >actual-non-dir-output &&
+	echo "dir_iterator_begin failure: 20" >expected-non-dir-output &&
+	test_cmp expected-non-dir-output actual-non-dir-output
+'
+
 test_done
-- 
2.22.0



* [GSoC][PATCH v8 06/10] dir-iterator: add flags parameter to dir_iterator_begin
  2019-07-10 23:58           ` [GSoC][PATCH v8 " Matheus Tavares
                               ` (4 preceding siblings ...)
  2019-07-10 23:58             ` [GSoC][PATCH v8 05/10] dir-iterator: refactor state machine model Matheus Tavares
@ 2019-07-10 23:59             ` Matheus Tavares
  2019-07-10 23:59             ` [GSoC][PATCH v8 07/10] clone: copy hidden paths at local clone Matheus Tavares
                               ` (4 subsequent siblings)
  10 siblings, 0 replies; 127+ messages in thread
From: Matheus Tavares @ 2019-07-10 23:59 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git, Thomas Gummerer, Ævar Arnfjörð Bjarmason,
	Christian Couder, Nguyễn Thái Ngọc Duy,
	SZEDER Gábor, Olga Telezhnaya, Johannes Schindelin,
	kernel-usp, Michael Haggerty, Ramsay Jones, Daniel Ferreira

Add the possibility of giving flags to dir_iterator_begin to initialize
a dir-iterator with special options.

Currently possible flags are:
- DIR_ITERATOR_PEDANTIC, which makes dir_iterator_advance abort
  immediately in the case of an error, instead of continuing to look
  for the next valid entry;
- DIR_ITERATOR_FOLLOW_SYMLINKS, which makes the iterator follow
  symlinks and include linked directories' contents in the iteration.

These new flags will be used in a subsequent patch.

Also add tests for the flags' usage and adjust refs/files-backend.c to
the new dir_iterator_begin signature.
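
As a rough illustration of the two flags described above, here is a
minimal, self-contained sketch. It is not the code from this patch: the
SKETCH_* names and both helper functions are hypothetical stand-ins. It
only assumes the semantics stated in this message, namely that the
follow-symlinks flag selects stat() over lstat(), and that pedantic mode
aborts on any error other than ENOENT.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <sys/stat.h>

/* Hypothetical stand-ins for the flags described above; not Git's header. */
#define SKETCH_PEDANTIC        (1u << 0)
#define SKETCH_FOLLOW_SYMLINKS (1u << 1)

/*
 * Fill *st for path, following symlinks only when requested.
 * Return 0 on success and -1 with errno set otherwise.
 */
static int sketch_stat_entry(const char *path, unsigned int flags,
			     struct stat *st)
{
	assert(path && st);
	if (flags & SKETCH_FOLLOW_SYMLINKS)
		return stat(path, st);
	return lstat(path, st);
}

/*
 * Decide whether an error with the given errno value should end the
 * iteration: only in pedantic mode, and never for ENOENT.
 */
static int sketch_should_abort(int err, unsigned int flags)
{
	return err != ENOENT && (flags & SKETCH_PEDANTIC) != 0;
}
```

A caller would combine the flags with a bitwise OR, for example
SKETCH_PEDANTIC | SKETCH_FOLLOW_SYMLINKS, and pass 0 for the default
behavior.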

Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
---
 dir-iterator.c               | 56 +++++++++++++++++--------
 dir-iterator.h               | 55 ++++++++++++++++++++-----
 refs/files-backend.c         |  2 +-
 t/helper/test-dir-iterator.c | 34 +++++++++++----
 t/t0066-dir-iterator.sh      | 80 ++++++++++++++++++++++++++++++++++++
 5 files changed, 191 insertions(+), 36 deletions(-)

diff --git a/dir-iterator.c b/dir-iterator.c
index 594fe4d67b..b17e9f970a 100644
--- a/dir-iterator.c
+++ b/dir-iterator.c
@@ -38,13 +38,16 @@ struct dir_iterator_int {
 	 * that will be included in this iteration.
 	 */
 	struct dir_iterator_level *levels;
+
+	/* Combination of flags for this dir-iterator */
+	unsigned int flags;
 };
 
 /*
  * Push a level in the iter stack and initialize it with information from
  * the directory pointed by iter->base->path. It is assumed that this
  * strbuf points to a valid directory path. Return 0 on success and -1
- * otherwise, leaving the stack unchanged.
+ * otherwise, setting errno accordingly and leaving the stack unchanged.
  */
 static int push_level(struct dir_iterator_int *iter)
 {
@@ -59,11 +62,13 @@ static int push_level(struct dir_iterator_int *iter)
 
 	level->dir = opendir(iter->base.path.buf);
 	if (!level->dir) {
+		int saved_errno = errno;
 		if (errno != ENOENT) {
 			warning_errno("error opening directory '%s'",
 				      iter->base.path.buf);
 		}
 		iter->levels_nr--;
+		errno = saved_errno;
 		return -1;
 	}
 
@@ -90,11 +95,13 @@ static int pop_level(struct dir_iterator_int *iter)
 /*
  * Populate iter->base with the necessary information on the next iteration
  * entry, represented by the given dirent de. Return 0 on success and -1
- * otherwise.
+ * otherwise, setting errno accordingly.
  */
 static int prepare_next_entry_data(struct dir_iterator_int *iter,
 				   struct dirent *de)
 {
+	int err, saved_errno;
+
 	strbuf_addstr(&iter->base.path, de->d_name);
 	/*
 	 * We have to reset these because the path strbuf might have
@@ -105,13 +112,17 @@ static int prepare_next_entry_data(struct dir_iterator_int *iter,
 	iter->base.basename = iter->base.path.buf +
 			      iter->levels[iter->levels_nr - 1].prefix_len;
 
-	if (lstat(iter->base.path.buf, &iter->base.st)) {
-		if (errno != ENOENT)
-			warning_errno("failed to stat '%s'", iter->base.path.buf);
-		return -1;
-	}
+	if (iter->flags & DIR_ITERATOR_FOLLOW_SYMLINKS)
+		err = stat(iter->base.path.buf, &iter->base.st);
+	else
+		err = lstat(iter->base.path.buf, &iter->base.st);
 
-	return 0;
+	saved_errno = errno;
+	if (err && errno != ENOENT)
+		warning_errno("failed to stat '%s'", iter->base.path.buf);
+
+	errno = saved_errno;
+	return err;
 }
 
 int dir_iterator_advance(struct dir_iterator *dir_iterator)
@@ -119,11 +130,11 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
 	struct dir_iterator_int *iter =
 		(struct dir_iterator_int *)dir_iterator;
 
-	if (S_ISDIR(iter->base.st.st_mode)) {
-		if (push_level(iter) && iter->levels_nr == 0) {
-			/* Pushing the first level failed */
-			return dir_iterator_abort(dir_iterator);
-		}
+	if (S_ISDIR(iter->base.st.st_mode) && push_level(iter)) {
+		if (errno != ENOENT && iter->flags & DIR_ITERATOR_PEDANTIC)
+			goto error_out;
+		if (iter->levels_nr == 0)
+			goto error_out;
 	}
 
 	/* Loop until we find an entry that we can give back to the caller. */
@@ -137,22 +148,32 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
 		de = readdir(level->dir);
 
 		if (!de) {
-			if (errno)
+			if (errno) {
 				warning_errno("error reading directory '%s'",
 					      iter->base.path.buf);
-			else if (pop_level(iter) == 0)
+				if (iter->flags & DIR_ITERATOR_PEDANTIC)
+					goto error_out;
+			} else if (pop_level(iter) == 0) {
 				return dir_iterator_abort(dir_iterator);
+			}
 			continue;
 		}
 
 		if (is_dot_or_dotdot(de->d_name))
 			continue;
 
-		if (prepare_next_entry_data(iter, de))
+		if (prepare_next_entry_data(iter, de)) {
+			if (errno != ENOENT && iter->flags & DIR_ITERATOR_PEDANTIC)
+				goto error_out;
 			continue;
+		}
 
 		return ITER_OK;
 	}
+
+error_out:
+	dir_iterator_abort(dir_iterator);
+	return ITER_ERROR;
 }
 
 int dir_iterator_abort(struct dir_iterator *dir_iterator)
@@ -178,7 +199,7 @@ int dir_iterator_abort(struct dir_iterator *dir_iterator)
 	return ITER_DONE;
 }
 
-struct dir_iterator *dir_iterator_begin(const char *path)
+struct dir_iterator *dir_iterator_begin(const char *path, unsigned int flags)
 {
 	struct dir_iterator_int *iter = xcalloc(1, sizeof(*iter));
 	struct dir_iterator *dir_iterator = &iter->base;
@@ -189,6 +210,7 @@ struct dir_iterator *dir_iterator_begin(const char *path)
 
 	ALLOC_GROW(iter->levels, 10, iter->levels_alloc);
 	iter->levels_nr = 0;
+	iter->flags = flags;
 
 	/*
 	 * Note: stat already checks for NULL or empty strings and
diff --git a/dir-iterator.h b/dir-iterator.h
index 9b4cb7acd2..08229157c6 100644
--- a/dir-iterator.h
+++ b/dir-iterator.h
@@ -20,7 +20,8 @@
  * A typical iteration looks like this:
  *
  *     int ok;
- *     struct dir_iterator *iter = dir_iterator_begin(path);
+ *     unsigned int flags = DIR_ITERATOR_PEDANTIC;
+ *     struct dir_iterator *iter = dir_iterator_begin(path, flags);
  *
  *     if (!iter)
  *             goto error_handler;
@@ -44,6 +45,29 @@
  * dir_iterator_advance() again.
  */
 
+/*
+ * Flags for dir_iterator_begin:
+ *
+ * - DIR_ITERATOR_PEDANTIC: override dir-iterator's default behavior
+ *   in case of an error at dir_iterator_advance(), which is to keep
+ *   looking for a next valid entry. With this flag, resources are freed
+ *   and ITER_ERROR is returned immediately. In both cases, a meaningful
+ *   warning is emitted. Note: ENOENT errors are always ignored so that
+ *   the API users may remove files during iteration.
+ *
+ * - DIR_ITERATOR_FOLLOW_SYMLINKS: make dir-iterator follow symlinks,
+ *   i.e., linked directories' contents will be iterated over and
+ *   iter->base.st will contain information on the referred files,
+ *   not the symlinks themselves (the latter being the default
+ *   behavior). Broken symlinks are ignored.
+ *
+ * Warning: circular symlinks are also followed when
+ * DIR_ITERATOR_FOLLOW_SYMLINKS is set. The iteration may end up with
+ * an ELOOP if they happen and DIR_ITERATOR_PEDANTIC is set.
+ */
+#define DIR_ITERATOR_PEDANTIC (1 << 0)
+#define DIR_ITERATOR_FOLLOW_SYMLINKS (1 << 1)
+
 struct dir_iterator {
 	/* The current path: */
 	struct strbuf path;
@@ -58,29 +82,38 @@ struct dir_iterator {
 	/* The current basename: */
 	const char *basename;
 
-	/* The result of calling lstat() on path: */
+	/*
+	 * The result of calling lstat() on path; or stat(), if the
+	 * DIR_ITERATOR_FOLLOW_SYMLINKS flag was set at
+	 * dir_iterator's initialization.
+	 */
 	struct stat st;
 };
 
 /*
- * Start a directory iteration over path. On success, return a
- * dir_iterator that holds the internal state of the iteration.
- * In case of failure, return NULL and set errno accordingly.
+ * Start a directory iteration over path with the combination of
+ * options specified by flags. On success, return a dir_iterator
+ * that holds the internal state of the iteration. In case of
+ * failure, return NULL and set errno accordingly.
  *
  * The iteration includes all paths under path, not including path
  * itself and not including "." or ".." entries.
  *
- * path is the starting directory. An internal copy will be made.
+ * Parameters are:
+ *  - path is the starting directory. An internal copy will be made.
+ *  - flags is a combination of the possible flags to initialize a
+ *    dir-iterator or 0 for default behavior.
  */
-struct dir_iterator *dir_iterator_begin(const char *path);
+struct dir_iterator *dir_iterator_begin(const char *path, unsigned int flags);
 
 /*
  * Advance the iterator to the first or next item and return ITER_OK.
  * If the iteration is exhausted, free the dir_iterator and any
- * resources associated with it and return ITER_DONE. On error, free
- * dir_iterator and associated resources and return ITER_ERROR. It is
- * a bug to use iterator or call this function again after it has
- * returned ITER_DONE or ITER_ERROR.
+ * resources associated with it and return ITER_DONE.
+ *
+ * It is a bug to use iterator or call this function again after it
+ * has returned ITER_DONE or ITER_ERROR (which may be returned iff
+ * the DIR_ITERATOR_PEDANTIC flag was set).
  */
 int dir_iterator_advance(struct dir_iterator *iterator);
 
diff --git a/refs/files-backend.c b/refs/files-backend.c
index 7ed81046d4..b1f8f53a09 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -2150,7 +2150,7 @@ static struct ref_iterator *reflog_iterator_begin(struct ref_store *ref_store,
 
 	strbuf_addf(&sb, "%s/logs", gitdir);
 
-	diter = dir_iterator_begin(sb.buf);
+	diter = dir_iterator_begin(sb.buf, 0);
 	if(!diter)
 		return empty_ref_iterator_begin();
 
diff --git a/t/helper/test-dir-iterator.c b/t/helper/test-dir-iterator.c
index fab1ff6237..a5b96cb0dc 100644
--- a/t/helper/test-dir-iterator.c
+++ b/t/helper/test-dir-iterator.c
@@ -4,29 +4,44 @@
 #include "iterator.h"
 #include "dir-iterator.h"
 
-/* Argument is a directory path to iterate over */
+/*
+ * usage:
+ * test-tool dir-iterator [--follow-symlinks] [--pedantic] directory_path
+ */
 int cmd__dir_iterator(int argc, const char **argv)
 {
 	struct strbuf path = STRBUF_INIT;
 	struct dir_iterator *diter;
+	unsigned int flags = 0;
+	int iter_status;
+
+	for (++argv, --argc; *argv && starts_with(*argv, "--"); ++argv, --argc) {
+		if (strcmp(*argv, "--follow-symlinks") == 0)
+			flags |= DIR_ITERATOR_FOLLOW_SYMLINKS;
+		else if (strcmp(*argv, "--pedantic") == 0)
+			flags |= DIR_ITERATOR_PEDANTIC;
+		else
+			die("invalid option '%s'", *argv);
+	}
 
-	if (argc < 2)
-		die("BUG: test-dir-iterator needs one argument");
-
-	strbuf_add(&path, argv[1], strlen(argv[1]));
+	if (!*argv || argc != 1)
+		die("dir-iterator needs exactly one non-option argument");
 
-	diter = dir_iterator_begin(path.buf);
+	strbuf_add(&path, *argv, strlen(*argv));
+	diter = dir_iterator_begin(path.buf, flags);
 
 	if (!diter) {
 		printf("dir_iterator_begin failure: %d\n", errno);
 		exit(EXIT_FAILURE);
 	}
 
-	while (dir_iterator_advance(diter) == ITER_OK) {
+	while ((iter_status = dir_iterator_advance(diter)) == ITER_OK) {
 		if (S_ISDIR(diter->st.st_mode))
 			printf("[d] ");
 		else if (S_ISREG(diter->st.st_mode))
 			printf("[f] ");
+		else if (S_ISLNK(diter->st.st_mode))
+			printf("[s] ");
 		else
 			printf("[?] ");
 
@@ -34,5 +49,10 @@ int cmd__dir_iterator(int argc, const char **argv)
 		       diter->path.buf);
 	}
 
+	if (iter_status != ITER_DONE) {
+		printf("dir_iterator_advance failure\n");
+		return 1;
+	}
+
 	return 0;
 }
diff --git a/t/t0066-dir-iterator.sh b/t/t0066-dir-iterator.sh
index cc4b19c34c..9354d3f1ed 100755
--- a/t/t0066-dir-iterator.sh
+++ b/t/t0066-dir-iterator.sh
@@ -65,4 +65,84 @@ test_expect_success 'begin should fail upon non directory paths' '
 	test_cmp expected-non-dir-output actual-non-dir-output
 '
 
+test_expect_success POSIXPERM,SANITY 'advance should not fail on errors by default' '
+	cat >expected-no-permissions-output <<-EOF &&
+	[d] (a) [a] ./dir3/a
+	EOF
+
+	mkdir -p dir3/a &&
+	>dir3/a/b &&
+	chmod 0 dir3/a &&
+
+	test-tool dir-iterator ./dir3 >actual-no-permissions-output &&
+	test_cmp expected-no-permissions-output actual-no-permissions-output &&
+	chmod 755 dir3/a &&
+	rm -rf dir3
+'
+
+test_expect_success POSIXPERM,SANITY 'advance should fail on errors, w/ pedantic flag' '
+	cat >expected-no-permissions-pedantic-output <<-EOF &&
+	[d] (a) [a] ./dir3/a
+	dir_iterator_advance failure
+	EOF
+
+	mkdir -p dir3/a &&
+	>dir3/a/b &&
+	chmod 0 dir3/a &&
+
+	test_must_fail test-tool dir-iterator --pedantic ./dir3 \
+		>actual-no-permissions-pedantic-output &&
+	test_cmp expected-no-permissions-pedantic-output \
+		actual-no-permissions-pedantic-output &&
+	chmod 755 dir3/a &&
+	rm -rf dir3
+'
+
+test_expect_success SYMLINKS 'setup dirs with symlinks' '
+	mkdir -p dir4/a &&
+	mkdir -p dir4/b/c &&
+	>dir4/a/d &&
+	ln -s d dir4/a/e &&
+	ln -s ../b dir4/a/f &&
+
+	mkdir -p dir5/a/b &&
+	mkdir -p dir5/a/c &&
+	ln -s ../c dir5/a/b/d &&
+	ln -s ../ dir5/a/b/e &&
+	ln -s ../../ dir5/a/b/f
+'
+
+test_expect_success SYMLINKS 'dir-iterator should not follow symlinks by default' '
+	cat >expected-no-follow-sorted-output <<-EOF &&
+	[d] (a) [a] ./dir4/a
+	[d] (b) [b] ./dir4/b
+	[d] (b/c) [c] ./dir4/b/c
+	[f] (a/d) [d] ./dir4/a/d
+	[s] (a/e) [e] ./dir4/a/e
+	[s] (a/f) [f] ./dir4/a/f
+	EOF
+
+	test-tool dir-iterator ./dir4 >out &&
+	sort out >actual-no-follow-sorted-output &&
+
+	test_cmp expected-no-follow-sorted-output actual-no-follow-sorted-output
+'
+
+test_expect_success SYMLINKS 'dir-iterator should follow symlinks w/ follow flag' '
+	cat >expected-follow-sorted-output <<-EOF &&
+	[d] (a) [a] ./dir4/a
+	[d] (a/f) [f] ./dir4/a/f
+	[d] (a/f/c) [c] ./dir4/a/f/c
+	[d] (b) [b] ./dir4/b
+	[d] (b/c) [c] ./dir4/b/c
+	[f] (a/d) [d] ./dir4/a/d
+	[f] (a/e) [e] ./dir4/a/e
+	EOF
+
+	test-tool dir-iterator --follow-symlinks ./dir4 >out &&
+	sort out >actual-follow-sorted-output &&
+
+	test_cmp expected-follow-sorted-output actual-follow-sorted-output
+'
+
 test_done
-- 
2.22.0



* [GSoC][PATCH v8 07/10] clone: copy hidden paths at local clone
  2019-07-10 23:58           ` [GSoC][PATCH v8 " Matheus Tavares
                               ` (5 preceding siblings ...)
  2019-07-10 23:59             ` [GSoC][PATCH v8 06/10] dir-iterator: add flags parameter to dir_iterator_begin Matheus Tavares
@ 2019-07-10 23:59             ` Matheus Tavares
  2019-07-10 23:59             ` [GSoC][PATCH v8 08/10] clone: extract function from copy_or_link_directory Matheus Tavares
                               ` (3 subsequent siblings)
  10 siblings, 0 replies; 127+ messages in thread
From: Matheus Tavares @ 2019-07-10 23:59 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git, Thomas Gummerer, Ævar Arnfjörð Bjarmason,
	Christian Couder, Nguyễn Thái Ngọc Duy,
	SZEDER Gábor, Olga Telezhnaya, Johannes Schindelin,
	kernel-usp

Make the copy_or_link_directory function no longer skip hidden
directories. This function, used to copy .git/objects, currently skips
all hidden directories but not hidden files, which is inconsistent.
This is probably unintentional: the intent was likely to skip only '.'
and '..', but the check ended up skipping every directory whose name
starts with '.'. Besides being more natural, the new behaviour is more
permissive to the user.

Also adjust tests to reflect this behaviour change.
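
To make the difference between the two checks concrete, here is a small
self-contained sketch. Both function names are hypothetical, and
sketch_new_skips() is only an assumed re-implementation of what
is_dot_or_dotdot() tests for, not Git's own helper.

```c
#include <assert.h>
#include <string.h>

/* Old check: skip any directory whose name starts with '.'. */
static int sketch_old_skips(const char *name)
{
	assert(name);
	return name[0] == '.';
}

/* New check: skip only the "." and ".." entries. */
static int sketch_new_skips(const char *name)
{
	assert(name);
	return !strcmp(name, ".") || !strcmp(name, "..");
}
```

With these definitions, a directory named ".some-hidden-dir" is skipped
by the old check but not by the new one, while "." and ".." are skipped
by both.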

Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Co-authored-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 builtin/clone.c            | 2 +-
 t/t5604-clone-reference.sh | 9 +++++++++
 2 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/builtin/clone.c b/builtin/clone.c
index 4a0a2455a7..9dd083e34d 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -430,7 +430,7 @@ static void copy_or_link_directory(struct strbuf *src, struct strbuf *dest,
 			continue;
 		}
 		if (S_ISDIR(buf.st_mode)) {
-			if (de->d_name[0] != '.')
+			if (!is_dot_or_dotdot(de->d_name))
 				copy_or_link_directory(src, dest,
 						       src_repo, src_baselen);
 			continue;
diff --git a/t/t5604-clone-reference.sh b/t/t5604-clone-reference.sh
index 459ad8a20b..4894237ab8 100755
--- a/t/t5604-clone-reference.sh
+++ b/t/t5604-clone-reference.sh
@@ -247,16 +247,25 @@ test_expect_success 'clone a repo with garbage in objects/*' '
 	done &&
 	find S-* -name "*some*" | sort >actual &&
 	cat >expected <<-EOF &&
+	S--dissociate/.git/objects/.some-hidden-dir
+	S--dissociate/.git/objects/.some-hidden-dir/.some-dot-file
+	S--dissociate/.git/objects/.some-hidden-dir/some-file
 	S--dissociate/.git/objects/.some-hidden-file
 	S--dissociate/.git/objects/some-dir
 	S--dissociate/.git/objects/some-dir/.some-dot-file
 	S--dissociate/.git/objects/some-dir/some-file
 	S--dissociate/.git/objects/some-file
+	S--local/.git/objects/.some-hidden-dir
+	S--local/.git/objects/.some-hidden-dir/.some-dot-file
+	S--local/.git/objects/.some-hidden-dir/some-file
 	S--local/.git/objects/.some-hidden-file
 	S--local/.git/objects/some-dir
 	S--local/.git/objects/some-dir/.some-dot-file
 	S--local/.git/objects/some-dir/some-file
 	S--local/.git/objects/some-file
+	S--no-hardlinks/.git/objects/.some-hidden-dir
+	S--no-hardlinks/.git/objects/.some-hidden-dir/.some-dot-file
+	S--no-hardlinks/.git/objects/.some-hidden-dir/some-file
 	S--no-hardlinks/.git/objects/.some-hidden-file
 	S--no-hardlinks/.git/objects/some-dir
 	S--no-hardlinks/.git/objects/some-dir/.some-dot-file
-- 
2.22.0



* [GSoC][PATCH v8 08/10] clone: extract function from copy_or_link_directory
  2019-07-10 23:58           ` [GSoC][PATCH v8 " Matheus Tavares
                               ` (6 preceding siblings ...)
  2019-07-10 23:59             ` [GSoC][PATCH v8 07/10] clone: copy hidden paths at local clone Matheus Tavares
@ 2019-07-10 23:59             ` Matheus Tavares
  2019-07-10 23:59             ` [GSoC][PATCH v8 09/10] clone: use dir-iterator to avoid explicit dir traversal Matheus Tavares
                               ` (2 subsequent siblings)
  10 siblings, 0 replies; 127+ messages in thread
From: Matheus Tavares @ 2019-07-10 23:59 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git, Thomas Gummerer, Ævar Arnfjörð Bjarmason,
	Christian Couder, Nguyễn Thái Ngọc Duy,
	SZEDER Gábor, Olga Telezhnaya, Johannes Schindelin,
	kernel-usp

Extract the directory creation code from copy_or_link_directory into
its own function, mkdir_if_missing. This will help remove
copy_or_link_directory's explicit recursion in a following patch, and
it also makes the code more readable.

Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
---
 builtin/clone.c | 24 ++++++++++++++++--------
 1 file changed, 16 insertions(+), 8 deletions(-)

diff --git a/builtin/clone.c b/builtin/clone.c
index 9dd083e34d..96566c1bab 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -394,6 +394,21 @@ static void copy_alternates(struct strbuf *src, const char *src_repo)
 	fclose(in);
 }
 
+static void mkdir_if_missing(const char *pathname, mode_t mode)
+{
+	struct stat st;
+
+	if (!mkdir(pathname, mode))
+		return;
+
+	if (errno != EEXIST)
+		die_errno(_("failed to create directory '%s'"), pathname);
+	else if (stat(pathname, &st))
+		die_errno(_("failed to stat '%s'"), pathname);
+	else if (!S_ISDIR(st.st_mode))
+		die(_("%s exists and is not a directory"), pathname);
+}
+
 static void copy_or_link_directory(struct strbuf *src, struct strbuf *dest,
 				   const char *src_repo, int src_baselen)
 {
@@ -406,14 +421,7 @@ static void copy_or_link_directory(struct strbuf *src, struct strbuf *dest,
 	if (!dir)
 		die_errno(_("failed to open '%s'"), src->buf);
 
-	if (mkdir(dest->buf, 0777)) {
-		if (errno != EEXIST)
-			die_errno(_("failed to create directory '%s'"), dest->buf);
-		else if (stat(dest->buf, &buf))
-			die_errno(_("failed to stat '%s'"), dest->buf);
-		else if (!S_ISDIR(buf.st_mode))
-			die(_("%s exists and is not a directory"), dest->buf);
-	}
+	mkdir_if_missing(dest->buf, 0777);
 
 	strbuf_addch(src, '/');
 	src_len = src->len;
-- 
2.22.0



* [GSoC][PATCH v8 09/10] clone: use dir-iterator to avoid explicit dir traversal
  2019-07-10 23:58           ` [GSoC][PATCH v8 " Matheus Tavares
                               ` (7 preceding siblings ...)
  2019-07-10 23:59             ` [GSoC][PATCH v8 08/10] clone: extract function from copy_or_link_directory Matheus Tavares
@ 2019-07-10 23:59             ` Matheus Tavares
  2019-07-10 23:59             ` [GSoC][PATCH v8 10/10] clone: replace strcmp by fspathcmp Matheus Tavares
  2019-07-11 11:56             ` [GSoC][PATCH v8 00/10] clone: dir-iterator refactoring with tests Johannes Schindelin
  10 siblings, 0 replies; 127+ messages in thread
From: Matheus Tavares @ 2019-07-10 23:59 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git, Thomas Gummerer, Ævar Arnfjörð Bjarmason,
	Christian Couder, Nguyễn Thái Ngọc Duy,
	SZEDER Gábor, Olga Telezhnaya, Johannes Schindelin,
	kernel-usp, Jeff King

In copy_or_link_directory, replace the recursive directory traversal
built on opendir/readdir/closedir with the dir-iterator API. This
simplifies the code and avoids recursive calls to
copy_or_link_directory.

This change also makes copy_or_link_directory call die() when readdir
or stat fails inside dir_iterator_advance. Previously it would only
print a warning for stat errors and ignore readdir errors, so a local
git clone could appear to succeed even though the .git/objects copy was
incomplete.
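
The path handling in the new loop can be pictured with the following
self-contained sketch, which uses a plain char buffer instead of a
strbuf. The helper names are hypothetical. It only assumes what the diff
below shows: each path is rebuilt by truncating to the base prefix and
appending the iterator's relative path, and "info/alternates" is matched
against that relative path.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Truncate buf to its first prefix_len bytes and append rel.
 * Return 0 on success, -1 if the result would not fit in bufsz bytes.
 */
static int sketch_set_path(char *buf, size_t bufsz, size_t prefix_len,
			   const char *rel)
{
	size_t rel_len;

	assert(buf && rel && prefix_len < bufsz);
	rel_len = strlen(rel);
	if (rel_len >= bufsz - prefix_len)
		return -1;
	memcpy(buf + prefix_len, rel, rel_len + 1); /* copies the NUL too */
	return 0;
}

/* The one entry that the loop treats specially instead of copying. */
static int sketch_is_alternates(const char *rel)
{
	assert(rel);
	return !strcmp(rel, "info/alternates");
}
```

Because the comparison is made on the relative path, the caller no
longer needs to carry a base length around just to find where the
repository-relative part of the path starts.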

Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
---
 builtin/clone.c | 47 +++++++++++++++++++++++++----------------------
 1 file changed, 25 insertions(+), 22 deletions(-)

diff --git a/builtin/clone.c b/builtin/clone.c
index 96566c1bab..47cb4a2a8e 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -23,6 +23,8 @@
 #include "transport.h"
 #include "strbuf.h"
 #include "dir.h"
+#include "dir-iterator.h"
+#include "iterator.h"
 #include "sigchain.h"
 #include "branch.h"
 #include "remote.h"
@@ -410,42 +412,39 @@ static void mkdir_if_missing(const char *pathname, mode_t mode)
 }
 
 static void copy_or_link_directory(struct strbuf *src, struct strbuf *dest,
-				   const char *src_repo, int src_baselen)
+				   const char *src_repo)
 {
-	struct dirent *de;
-	struct stat buf;
 	int src_len, dest_len;
-	DIR *dir;
-
-	dir = opendir(src->buf);
-	if (!dir)
-		die_errno(_("failed to open '%s'"), src->buf);
+	struct dir_iterator *iter;
+	int iter_status;
+	unsigned int flags;
 
 	mkdir_if_missing(dest->buf, 0777);
 
+	flags = DIR_ITERATOR_PEDANTIC | DIR_ITERATOR_FOLLOW_SYMLINKS;
+	iter = dir_iterator_begin(src->buf, flags);
+
+	if (!iter)
+		die_errno(_("failed to start iterator over '%s'"), src->buf);
+
 	strbuf_addch(src, '/');
 	src_len = src->len;
 	strbuf_addch(dest, '/');
 	dest_len = dest->len;
 
-	while ((de = readdir(dir)) != NULL) {
+	while ((iter_status = dir_iterator_advance(iter)) == ITER_OK) {
 		strbuf_setlen(src, src_len);
-		strbuf_addstr(src, de->d_name);
+		strbuf_addstr(src, iter->relative_path);
 		strbuf_setlen(dest, dest_len);
-		strbuf_addstr(dest, de->d_name);
-		if (stat(src->buf, &buf)) {
-			warning (_("failed to stat %s\n"), src->buf);
-			continue;
-		}
-		if (S_ISDIR(buf.st_mode)) {
-			if (!is_dot_or_dotdot(de->d_name))
-				copy_or_link_directory(src, dest,
-						       src_repo, src_baselen);
+		strbuf_addstr(dest, iter->relative_path);
+
+		if (S_ISDIR(iter->st.st_mode)) {
+			mkdir_if_missing(dest->buf, 0777);
 			continue;
 		}
 
 		/* Files that cannot be copied bit-for-bit... */
-		if (!strcmp(src->buf + src_baselen, "/info/alternates")) {
+		if (!strcmp(iter->relative_path, "info/alternates")) {
 			copy_alternates(src, src_repo);
 			continue;
 		}
@@ -462,7 +461,11 @@ static void copy_or_link_directory(struct strbuf *src, struct strbuf *dest,
 		if (copy_file_with_time(dest->buf, src->buf, 0666))
 			die_errno(_("failed to copy file to '%s'"), dest->buf);
 	}
-	closedir(dir);
+
+	if (iter_status != ITER_DONE) {
+		strbuf_setlen(src, src_len);
+		die(_("failed to iterate over '%s'"), src->buf);
+	}
 }
 
 static void clone_local(const char *src_repo, const char *dest_repo)
@@ -480,7 +483,7 @@ static void clone_local(const char *src_repo, const char *dest_repo)
 		get_common_dir(&dest, dest_repo);
 		strbuf_addstr(&src, "/objects");
 		strbuf_addstr(&dest, "/objects");
-		copy_or_link_directory(&src, &dest, src_repo, src.len);
+		copy_or_link_directory(&src, &dest, src_repo);
 		strbuf_release(&src);
 		strbuf_release(&dest);
 	}
-- 
2.22.0
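
For readers without the git tree at hand, the idea behind patch 09/10
can be sketched outside of git. This is not the dir-iterator
implementation; it is a minimal illustration, assuming plain POSIX
calls, of walking a tree with an explicit stack of pending directories
instead of recursion, and of reporting opendir/readdir/stat failures to
the caller instead of skipping them. The names walk_tree, visit_fn,
dir_stack, join_path, push, walk_counts and count_visit are hypothetical
and do not exist in git.

```c
#include <assert.h>
#include <dirent.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>

/* Hypothetical callback: path relative to the root, plus its stat data. */
typedef void (*visit_fn)(const char *relative_path, const struct stat *st, void *data);

struct dir_stack { char **items; size_t nr, alloc; };

/* Return a malloc'd "a/b", or a copy of b when a is empty; NULL on OOM. */
static char *join_path(const char *a, const char *b)
{
	char *out = malloc(strlen(a) + strlen(b) + 2);
	if (out && *a)
		sprintf(out, "%s/%s", a, b);
	else if (out)
		strcpy(out, b);
	return out;
}

/* Push a malloc'd relative path; on success the stack owns it. */
static int push(struct dir_stack *s, char *rel)
{
	if (s->nr == s->alloc) {
		size_t n = s->alloc ? 2 * s->alloc : 16;
		char **p = realloc(s->items, n * sizeof(*p));
		if (!p)
			return -1;
		s->items = p;
		s->alloc = n;
	}
	s->items[s->nr++] = rel;
	return 0;
}

/*
 * Visit everything below `root` without recursion. A directory is always
 * visited before its contents. Returns 0 on success, -1 on the first
 * opendir/readdir/stat (or allocation) failure. stat() follows symlinks
 * and there is no cycle detection in this sketch.
 */
static int walk_tree(const char *root, visit_fn visit, void *data)
{
	struct dir_stack s = { NULL, 0, 0 };
	char *top = join_path("", "");
	int ret = 0;

	assert(root && visit);
	if (!top || push(&s, top)) {
		free(top);
		return -1;
	}
	while (!ret && s.nr) {
		char *dir_rel = s.items[--s.nr];
		char *dir_full = *dir_rel ? join_path(root, dir_rel) : join_path("", root);
		DIR *dir = dir_full ? opendir(dir_full) : NULL;
		struct dirent *de;

		if (!dir)
			ret = -1;
		while (!ret && (errno = 0, de = readdir(dir)) != NULL) {
			char *rel, *full;
			struct stat st;

			if (!strcmp(de->d_name, ".") || !strcmp(de->d_name, ".."))
				continue;
			rel = join_path(dir_rel, de->d_name);
			full = rel ? join_path(root, rel) : NULL;
			if (!full || stat(full, &st)) {
				ret = -1; /* fail hard instead of warn-and-skip */
			} else {
				visit(rel, &st, data);
				if (S_ISDIR(st.st_mode) && !(ret = push(&s, rel)))
					rel = NULL; /* the stack owns it now */
			}
			free(rel);
			free(full);
		}
		if (!ret && errno)
			ret = -1; /* readdir() returned NULL because of an error */
		if (dir)
			closedir(dir);
		free(dir_full);
		free(dir_rel);
	}
	while (s.nr)
		free(s.items[--s.nr]);
	free(s.items);
	return ret;
}

/* Example visitor: count directories and non-directories. */
struct walk_counts { int dirs, files; };

static void count_visit(const char *relative_path, const struct stat *st, void *data)
{
	struct walk_counts *c = data;
	(void)relative_path;
	if (S_ISDIR(st->st_mode))
		c->dirs++;
	else
		c->files++;
}
```

A caller would pass a root such as an objects directory together with a
visitor, and treat a non-zero return as fatal, which mirrors the die()
on iter_status != ITER_DONE in the patch above. Unlike this sketch, the
patch relies on dir_iterator_advance() for the traversal order and the
error handling.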


* [GSoC][PATCH v8 10/10] clone: replace strcmp by fspathcmp
  2019-07-10 23:58           ` [GSoC][PATCH v8 " Matheus Tavares
                               ` (8 preceding siblings ...)
  2019-07-10 23:59             ` [GSoC][PATCH v8 09/10] clone: use dir-iterator to avoid explicit dir traversal Matheus Tavares
@ 2019-07-10 23:59             ` Matheus Tavares
  2019-07-11 11:56             ` [GSoC][PATCH v8 00/10] clone: dir-iterator refactoring with tests Johannes Schindelin
  10 siblings, 0 replies; 127+ messages in thread
From: Matheus Tavares @ 2019-07-10 23:59 UTC
  To: Junio C Hamano
  Cc: git, Thomas Gummerer, Ævar Arnfjörð Bjarmason,
	Christian Couder, Nguyễn Thái Ngọc Duy,
	SZEDER Gábor, Olga Telezhnaya, Johannes Schindelin,
	kernel-usp, Jeff King

Replace strcmp with fspathcmp in copy_or_link_directory(); fspathcmp
is friendlier to case-insensitive file systems.

Suggested-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
---
 builtin/clone.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/builtin/clone.c b/builtin/clone.c
index 47cb4a2a8e..8da696ef30 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -444,7 +444,7 @@ static void copy_or_link_directory(struct strbuf *src, struct strbuf *dest,
 		}
 
 		/* Files that cannot be copied bit-for-bit... */
-		if (!strcmp(iter->relative_path, "info/alternates")) {
+		if (!fspathcmp(iter->relative_path, "info/alternates")) {
 			copy_alternates(src, src_repo);
 			continue;
 		}
-- 
2.22.0
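
To make the effect of patch 10/10 concrete, here is a small standalone
sketch. It assumes, and this should be checked against dir.c, that
fspathcmp() chooses between strcmp() and strcasecmp() depending on
whether git was configured to ignore case. The names
ignore_case_sketch, fspathcmp_sketch and is_alternates_file are
hypothetical stand-ins, not git symbols.

```c
#include <assert.h>
#include <string.h>
#include <strings.h> /* strcasecmp() (POSIX) */

/* Hypothetical stand-in for git's case-insensitivity setting. */
static int ignore_case_sketch;

/* Compare two paths, ignoring case only when the setting asks for it. */
static int fspathcmp_sketch(const char *a, const char *b)
{
	return ignore_case_sketch ? strcasecmp(a, b) : strcmp(a, b);
}

/* The check from copy_or_link_directory(), isolated for illustration. */
static int is_alternates_file(const char *relative_path)
{
	assert(relative_path);
	return !fspathcmp_sketch(relative_path, "info/alternates");
}
```

With ignore_case_sketch set to 0 only the exact spelling
"info/alternates" matches; with it set to 1, a path such as
"INFO/Alternates" matches as well, which is the situation the patch is
meant to cover on case-insensitive file systems.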


* Re: [GSoC][PATCH v8 00/10] clone: dir-iterator refactoring with tests
  2019-07-10 23:58           ` [GSoC][PATCH v8 " Matheus Tavares
                               ` (9 preceding siblings ...)
  2019-07-10 23:59             ` [GSoC][PATCH v8 10/10] clone: replace strcmp by fspathcmp Matheus Tavares
@ 2019-07-11 11:56             ` Johannes Schindelin
  2019-07-11 15:24               ` Matheus Tavares Bernardino
  10 siblings, 1 reply; 127+ messages in thread
From: Johannes Schindelin @ 2019-07-11 11:56 UTC
  To: Matheus Tavares
  Cc: Junio C Hamano, git, Thomas Gummerer,
	Ævar Arnfjörð Bjarmason, Christian Couder,
	Nguyễn Thái Ngọc Duy, SZEDER Gábor,
	Olga Telezhnaya, kernel-usp

Hi Matheus,

On Wed, 10 Jul 2019, Matheus Tavares wrote:

> - a replacement of explicit recursive dir iteration at
>   copy_or_link_directory for the dir-iterator API;

As far as I can see, it was not replaced, but just dropped. Which is
good, as it will most likely address the CI failures.

Thanks,
Dscho

* Re: [GSoC][PATCH v8 00/10] clone: dir-iterator refactoring with tests
  2019-07-11 11:56             ` [GSoC][PATCH v8 00/10] clone: dir-iterator refactoring with tests Johannes Schindelin
@ 2019-07-11 15:24               ` Matheus Tavares Bernardino
  0 siblings, 0 replies; 127+ messages in thread
From: Matheus Tavares Bernardino @ 2019-07-11 15:24 UTC
  To: Johannes Schindelin
  Cc: Junio C Hamano, git, Thomas Gummerer,
	Ævar Arnfjörð Bjarmason, Christian Couder,
	Nguyễn Thái Ngọc Duy, SZEDER Gábor,
	Olga Telezhnaya, Kernel USP

On Thu, Jul 11, 2019 at 8:56 AM Johannes Schindelin
<Johannes.Schindelin@gmx.de> wrote:
>
> Hi Matheus,
>
> On Wed, 10 Jul 2019, Matheus Tavares wrote:
>
> > - a replacement of explicit recursive dir iteration at
> >   copy_or_link_directory for the dir-iterator API;
>
> As far as I can see, it was not replaced, but just dropped. Which is
> good, as it will most likely address the CI failures.

You mean the circular symlink checker, right? Yes, it was dropped. In
that item I was referring to the directory iteration code in
builtin/clone.c (using opendir/readdir), which was replaced by the
dir-iterator API.

> Thanks,
> Dscho

Thread overview: 127+ messages
2019-02-26  5:17 [WIP RFC PATCH v2 0/5] clone: dir iterator refactoring with tests Matheus Tavares
2019-02-26  5:18 ` [WIP RFC PATCH v2 1/5] dir-iterator: add flags parameter to dir_iterator_begin Matheus Tavares
2019-02-26 12:01   ` Duy Nguyen
2019-02-27 13:59     ` Matheus Tavares Bernardino
2019-02-26  5:18 ` [WIP RFC PATCH v2 2/5] clone: test for our behavior on odd objects/* content Matheus Tavares
2019-02-26  5:18 ` [WIP RFC PATCH v2 3/5] clone: copy hidden paths at local clone Matheus Tavares
2019-02-26 12:13   ` Duy Nguyen
2019-02-26  5:18 ` [WIP RFC PATCH v2 4/5] clone: extract function from copy_or_link_directory Matheus Tavares
2019-02-26 12:18   ` Duy Nguyen
2019-02-27 17:30     ` Matheus Tavares Bernardino
2019-02-27 22:45       ` Thomas Gummerer
2019-02-27 22:50         ` Matheus Tavares Bernardino
2019-02-26  5:18 ` [WIP RFC PATCH v2 5/5] clone: use dir-iterator to avoid explicit dir traversal Matheus Tavares
2019-02-26 11:35   ` Ævar Arnfjörð Bjarmason
2019-02-26 12:32   ` Duy Nguyen
2019-02-26 12:50     ` Ævar Arnfjörð Bjarmason
2019-02-27 17:40     ` Matheus Tavares Bernardino
2019-02-28  7:13       ` Duy Nguyen
2019-02-28  7:53       ` Ævar Arnfjörð Bjarmason
2019-02-26 11:36 ` [WIP RFC PATCH v2 0/5] clone: dir iterator refactoring with tests Ævar Arnfjörð Bjarmason
2019-02-26 12:20   ` Duy Nguyen
2019-02-26 12:28 ` [RFC PATCH v3 " Ævar Arnfjörð Bjarmason
2019-02-26 20:56   ` Matheus Tavares Bernardino
2019-03-22 23:22   ` [GSoC][PATCH v4 0/7] clone: dir-iterator " Matheus Tavares
2019-03-22 23:22     ` [GSoC][PATCH v4 1/7] clone: test for our behavior on odd objects/* content Matheus Tavares
2019-03-24 18:09       ` Matheus Tavares Bernardino
2019-03-24 20:56       ` SZEDER Gábor
2019-03-26 19:43         ` Matheus Tavares Bernardino
2019-03-28 21:49       ` Thomas Gummerer
2019-03-29 14:06         ` Matheus Tavares Bernardino
2019-03-29 19:31           ` Thomas Gummerer
2019-03-29 19:42             ` SZEDER Gábor
2019-03-30  2:49             ` Matheus Tavares Bernardino
2019-03-22 23:22     ` [GSoC][PATCH v4 2/7] clone: better handle symlinked files at .git/objects/ Matheus Tavares
2019-03-28 22:10       ` Thomas Gummerer
2019-03-29  8:38         ` Ævar Arnfjörð Bjarmason
2019-03-29 20:15           ` Thomas Gummerer
2019-03-29 14:27         ` Matheus Tavares Bernardino
2019-03-29 20:05           ` Thomas Gummerer
2019-03-30  5:32             ` Matheus Tavares Bernardino
2019-03-30 19:27               ` Thomas Gummerer
2019-04-01  3:56                 ` Matheus Tavares Bernardino
2019-03-29 15:40         ` Johannes Schindelin
2019-03-22 23:22     ` [GSoC][PATCH v4 3/7] dir-iterator: add flags parameter to dir_iterator_begin Matheus Tavares
2019-03-28 22:19       ` Thomas Gummerer
2019-03-29 13:16         ` Matheus Tavares Bernardino
2019-03-22 23:22     ` [GSoC][PATCH v4 4/7] clone: copy hidden paths at local clone Matheus Tavares
2019-03-22 23:22     ` [GSoC][PATCH v4 5/7] clone: extract function from copy_or_link_directory Matheus Tavares
2019-03-22 23:22     ` [GSoC][PATCH v4 6/7] clone: use dir-iterator to avoid explicit dir traversal Matheus Tavares
2019-03-22 23:22     ` [GSoC][PATCH v4 7/7] clone: Replace strcmp by fspathcmp Matheus Tavares
2019-03-30 22:49     ` [GSoC][PATCH v5 0/7] clone: dir-iterator refactoring with tests Matheus Tavares
2019-03-30 22:49       ` [GSoC][PATCH v5 1/7] clone: test for our behavior on odd objects/* content Matheus Tavares
2019-03-30 22:49       ` [GSoC][PATCH v5 2/7] clone: better handle symlinked files at .git/objects/ Matheus Tavares
2019-03-31 17:40         ` Thomas Gummerer
2019-04-01  3:59           ` Matheus Tavares Bernardino
2019-03-30 22:49       ` [GSoC][PATCH v5 3/7] dir-iterator: add flags parameter to dir_iterator_begin Matheus Tavares
2019-03-31 18:12         ` Thomas Gummerer
2019-04-10 20:24           ` Matheus Tavares Bernardino
2019-04-11 21:09             ` Thomas Gummerer
2019-04-23 17:07               ` Matheus Tavares Bernardino
2019-04-24 18:36                 ` Thomas Gummerer
2019-04-26  4:13                   ` Matheus Tavares Bernardino
2019-03-30 22:49       ` [GSoC][PATCH v5 4/7] clone: copy hidden paths at local clone Matheus Tavares
2019-03-30 22:49       ` [GSoC][PATCH v5 5/7] clone: extract function from copy_or_link_directory Matheus Tavares
2019-03-30 22:49       ` [GSoC][PATCH v5 6/7] clone: use dir-iterator to avoid explicit dir traversal Matheus Tavares
2019-03-30 22:49       ` [GSoC][PATCH v5 7/7] clone: replace strcmp by fspathcmp Matheus Tavares
2019-03-31 18:16       ` [GSoC][PATCH v5 0/7] clone: dir-iterator refactoring with tests Thomas Gummerer
2019-04-01 13:56         ` Matheus Tavares Bernardino
2019-05-02 14:48       ` [GSoC][PATCH v6 00/10] " Matheus Tavares
2019-05-02 14:48         ` [GSoC][PATCH v6 01/10] clone: test for our behavior on odd objects/* content Matheus Tavares
2019-05-02 14:48         ` [GSoC][PATCH v6 02/10] clone: better handle symlinked files at .git/objects/ Matheus Tavares
2019-05-02 14:48         ` [GSoC][PATCH v6 03/10] dir-iterator: add tests for dir-iterator API Matheus Tavares
2019-05-02 14:48         ` [GSoC][PATCH v6 04/10] dir-iterator: use warning_errno when possible Matheus Tavares
2019-05-02 14:48         ` [GSoC][PATCH v6 05/10] dir-iterator: refactor state machine model Matheus Tavares
2019-05-02 14:48         ` [GSoC][PATCH v6 06/10] dir-iterator: add flags parameter to dir_iterator_begin Matheus Tavares
2019-05-02 14:48         ` [GSoC][PATCH v6 07/10] clone: copy hidden paths at local clone Matheus Tavares
2019-05-02 14:48         ` [GSoC][PATCH v6 08/10] clone: extract function from copy_or_link_directory Matheus Tavares
2019-05-02 14:48         ` [GSoC][PATCH v6 09/10] clone: use dir-iterator to avoid explicit dir traversal Matheus Tavares
2019-05-02 14:48         ` [GSoC][PATCH v6 10/10] clone: replace strcmp by fspathcmp Matheus Tavares
2019-06-18 23:27         ` [GSoC][PATCH v7 00/10] clone: dir-iterator refactoring with tests Matheus Tavares
2019-06-18 23:27           ` [GSoC][PATCH v7 01/10] clone: test for our behavior on odd objects/* content Matheus Tavares
2019-06-18 23:27           ` [GSoC][PATCH v7 02/10] clone: better handle symlinked files at .git/objects/ Matheus Tavares
2019-06-18 23:27           ` [GSoC][PATCH v7 03/10] dir-iterator: add tests for dir-iterator API Matheus Tavares
2019-06-18 23:27           ` [GSoC][PATCH v7 04/10] dir-iterator: use warning_errno when possible Matheus Tavares
2019-06-18 23:27           ` [GSoC][PATCH v7 05/10] dir-iterator: refactor state machine model Matheus Tavares
2019-06-18 23:27           ` [GSoC][PATCH v7 06/10] dir-iterator: add flags parameter to dir_iterator_begin Matheus Tavares
2019-06-25 18:00             ` Junio C Hamano
2019-06-25 18:11               ` Matheus Tavares Bernardino
2019-06-26 13:34             ` Johannes Schindelin
2019-06-26 18:04               ` Junio C Hamano
2019-06-27  9:20                 ` Duy Nguyen
2019-06-27 17:23                 ` Matheus Tavares Bernardino
2019-06-27 18:48                   ` Johannes Schindelin
2019-06-27 19:33                     ` Matheus Tavares Bernardino
2019-06-28 12:51                       ` Johannes Schindelin
2019-06-28 14:16                         ` Matheus Tavares Bernardino
2019-07-01 12:15                           ` Johannes Schindelin
2019-07-03  8:57             ` SZEDER Gábor
2019-07-08 22:21               ` Matheus Tavares Bernardino
2019-06-18 23:27           ` [GSoC][PATCH v7 07/10] clone: copy hidden paths at local clone Matheus Tavares
2019-06-18 23:27           ` [GSoC][PATCH v7 08/10] clone: extract function from copy_or_link_directory Matheus Tavares
2019-06-18 23:27           ` [GSoC][PATCH v7 09/10] clone: use dir-iterator to avoid explicit dir traversal Matheus Tavares
2019-06-18 23:27           ` [GSoC][PATCH v7 10/10] clone: replace strcmp by fspathcmp Matheus Tavares
2019-06-19  4:36           ` [GSoC][PATCH v7 00/10] clone: dir-iterator refactoring with tests Matheus Tavares Bernardino
2019-06-20 20:18           ` Junio C Hamano
2019-06-21 13:41             ` Matheus Tavares Bernardino
2019-07-10 23:58           ` [GSoC][PATCH v8 " Matheus Tavares
2019-07-10 23:58             ` [GSoC][PATCH v8 01/10] clone: test for our behavior on odd objects/* content Matheus Tavares
2019-07-10 23:58             ` [GSoC][PATCH v8 02/10] clone: better handle symlinked files at .git/objects/ Matheus Tavares
2019-07-10 23:58             ` [GSoC][PATCH v8 03/10] dir-iterator: add tests for dir-iterator API Matheus Tavares
2019-07-10 23:58             ` [GSoC][PATCH v8 04/10] dir-iterator: use warning_errno when possible Matheus Tavares
2019-07-10 23:58             ` [GSoC][PATCH v8 05/10] dir-iterator: refactor state machine model Matheus Tavares
2019-07-10 23:59             ` [GSoC][PATCH v8 06/10] dir-iterator: add flags parameter to dir_iterator_begin Matheus Tavares
2019-07-10 23:59             ` [GSoC][PATCH v8 07/10] clone: copy hidden paths at local clone Matheus Tavares
2019-07-10 23:59             ` [GSoC][PATCH v8 08/10] clone: extract function from copy_or_link_directory Matheus Tavares
2019-07-10 23:59             ` [GSoC][PATCH v8 09/10] clone: use dir-iterator to avoid explicit dir traversal Matheus Tavares
2019-07-10 23:59             ` [GSoC][PATCH v8 10/10] clone: replace strcmp by fspathcmp Matheus Tavares
2019-07-11 11:56             ` [GSoC][PATCH v8 00/10] clone: dir-iterator refactoring with tests Johannes Schindelin
2019-07-11 15:24               ` Matheus Tavares Bernardino
2019-02-26 12:28 ` [RFC PATCH v3 1/5] clone: test for our behavior on odd objects/* content Ævar Arnfjörð Bjarmason
2019-02-28 21:19   ` Matheus Tavares Bernardino
2019-03-01 13:49     ` Ævar Arnfjörð Bjarmason
2019-03-13  3:17       ` Matheus Tavares
2019-02-26 12:28 ` [RFC PATCH v3 2/5] dir-iterator: add flags parameter to dir_iterator_begin Ævar Arnfjörð Bjarmason
2019-02-26 12:28 ` [RFC PATCH v3 3/5] clone: copy hidden paths at local clone Ævar Arnfjörð Bjarmason
2019-02-26 12:28 ` [RFC PATCH v3 4/5] clone: extract function from copy_or_link_directory Ævar Arnfjörð Bjarmason
2019-02-26 12:28 ` [RFC PATCH v3 5/5] clone: use dir-iterator to avoid explicit dir traversal Ævar Arnfjörð Bjarmason

Archives are clonable:
	git clone --mirror https://public-inbox.org/git
	git clone --mirror http://ou63pmih66umazou.onion/git
	git clone --mirror http://czquwvybam4bgbro.onion/git
	git clone --mirror http://hjrcffqmbrq6wope.onion/git

Newsgroups are available over NNTP:
	nntp://news.public-inbox.org/inbox.comp.version-control.git
	nntp://ou63pmih66umazou.onion/inbox.comp.version-control.git
	nntp://czquwvybam4bgbro.onion/inbox.comp.version-control.git
	nntp://hjrcffqmbrq6wope.onion/inbox.comp.version-control.git
	nntp://news.gmane.org/gmane.comp.version-control.git

 note: .onion URLs require Tor: https://www.torproject.org/

AGPL code for this site: git clone https://public-inbox.org/ public-inbox