git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH v2 0/9] Fix GIT_CEILING_DIRECTORIES that contain symlinks
@ 2012-09-29  6:15 Michael Haggerty
  2012-09-29  6:15 ` [PATCH v2 1/9] Introduce new static function real_path_internal() Michael Haggerty
                   ` (8 more replies)
  0 siblings, 9 replies; 17+ messages in thread
From: Michael Haggerty @ 2012-09-29  6:15 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jiang Xin, Lea Wiemann, git, Michael Haggerty

v2 of the series, with the following changes from v1:

Patch 2/9 is new.

Patch 4/9: remove an unnecessary memcpy().  (This change doesn't
affect the tip of the branch because the memcpy() was removed anyway
in the second-to-last patch.)

Patch 8/9: remove a superfluous "len++", and improve the excuses in
the commit message for removing the tests of
longest_ancestor_length().

Thanks to Junio for his helpful comments.

Michael Haggerty (9):
  Introduce new static function real_path_internal()
  real_path_internal(): add comment explaining use of cwd
  Introduce new function real_path_if_valid()
  longest_ancestor_length(): use string_list_split()
  longest_ancestor_length(): explicitly filter list before loop
  longest_ancestor_length(): always add a slash to the end of prefixes
  longest_ancestor_length(): use string_list_longest_prefix()
  longest_ancestor_length(): resolve symlinks before comparing paths
  t1504: stop resolving symlinks in GIT_CEILING_DIRECTORIES

 abspath.c               | 105 ++++++++++++++++++++++++++++++++++++++----------
 cache.h                 |   1 +
 path.c                  |  54 +++++++++++++++----------
 t/t0060-path-utils.sh   |  64 -----------------------------
 t/t1504-ceiling-dirs.sh |  67 +++++++++++++++---------------
 5 files changed, 151 insertions(+), 140 deletions(-)

-- 
1.7.11.3

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH v2 1/9] Introduce new static function real_path_internal()
  2012-09-29  6:15 [PATCH v2 0/9] Fix GIT_CEILING_DIRECTORIES that contain symlinks Michael Haggerty
@ 2012-09-29  6:15 ` Michael Haggerty
  2012-09-29  6:15 ` [PATCH v2 2/9] real_path_internal(): add comment explaining use of cwd Michael Haggerty
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 17+ messages in thread
From: Michael Haggerty @ 2012-09-29  6:15 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jiang Xin, Lea Wiemann, git, Michael Haggerty

It accepts a new parameter, die_on_error.  If die_on_error is false,
it simply cleans up after itself and returns NULL rather than dying.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
---
 abspath.c | 93 ++++++++++++++++++++++++++++++++++++++++++++++++---------------
 1 file changed, 72 insertions(+), 21 deletions(-)

diff --git a/abspath.c b/abspath.c
index 05f2d79..a7ab8e9 100644
--- a/abspath.c
+++ b/abspath.c
@@ -15,15 +15,26 @@ int is_directory(const char *path)
 #define MAXDEPTH 5
 
 /*
- * Use this to get the real path, i.e. resolve links. If you want an
- * absolute path but don't mind links, use absolute_path.
+ * Return the real path (i.e., absolute path, with symlinks resolved
+ * and extra slashes removed) equivalent to the specified path.  (If
+ * you want an absolute path but don't mind links, use
+ * absolute_path().)  The return value is a pointer to a static
+ * buffer.
+ *
+ * The input and all intermediate paths must be shorter than MAX_PATH.
+ * The directory part of path (i.e., everything up to the last
+ * dir_sep) must denote a valid, existing directory, but the last
+ * component need not exist.  If die_on_error is set, then die with an
+ * informative error message if there is a problem.  Otherwise, return
+ * NULL on errors (without generating any output).
  *
  * If path is our buffer, then return path, as it's already what the
  * user wants.
  */
-const char *real_path(const char *path)
+static const char *real_path_internal(const char *path, int die_on_error)
 {
 	static char bufs[2][PATH_MAX + 1], *buf = bufs[0], *next_buf = bufs[1];
+	char *retval = NULL;
 	char cwd[1024] = "";
 	int buf_index = 1;
 
@@ -35,11 +46,19 @@ const char *real_path(const char *path)
 	if (path == buf || path == next_buf)
 		return path;
 
-	if (!*path)
-		die("The empty string is not a valid path");
+	if (!*path) {
+		if (die_on_error)
+			die("The empty string is not a valid path");
+		else
+			goto error_out;
+	}
 
-	if (strlcpy(buf, path, PATH_MAX) >= PATH_MAX)
-		die ("Too long path: %.*s", 60, path);
+	if (strlcpy(buf, path, PATH_MAX) >= PATH_MAX) {
+		if (die_on_error)
+			die("Too long path: %.*s", 60, path);
+		else
+			goto error_out;
+	}
 
 	while (depth--) {
 		if (!is_directory(buf)) {
@@ -54,20 +73,36 @@ const char *real_path(const char *path)
 		}
 
 		if (*buf) {
-			if (!*cwd && !getcwd(cwd, sizeof(cwd)))
-				die_errno ("Could not get current working directory");
+			if (!*cwd && !getcwd(cwd, sizeof(cwd))) {
+				if (die_on_error)
+					die_errno("Could not get current working directory");
+				else
+					goto error_out;
+			}
 
-			if (chdir(buf))
-				die_errno ("Could not switch to '%s'", buf);
+			if (chdir(buf)) {
+				if (die_on_error)
+					die_errno("Could not switch to '%s'", buf);
+				else
+					goto error_out;
+			}
+		}
+		if (!getcwd(buf, PATH_MAX)) {
+			if (die_on_error)
+				die_errno("Could not get current working directory");
+			else
+				goto error_out;
 		}
-		if (!getcwd(buf, PATH_MAX))
-			die_errno ("Could not get current working directory");
 
 		if (last_elem) {
 			size_t len = strlen(buf);
-			if (len + strlen(last_elem) + 2 > PATH_MAX)
-				die ("Too long path name: '%s/%s'",
-						buf, last_elem);
+			if (len + strlen(last_elem) + 2 > PATH_MAX) {
+				if (die_on_error)
+					die("Too long path name: '%s/%s'",
+					    buf, last_elem);
+				else
+					goto error_out;
+			}
 			if (len && !is_dir_sep(buf[len-1]))
 				buf[len++] = '/';
 			strcpy(buf + len, last_elem);
@@ -77,10 +112,18 @@ const char *real_path(const char *path)
 
 		if (!lstat(buf, &st) && S_ISLNK(st.st_mode)) {
 			ssize_t len = readlink(buf, next_buf, PATH_MAX);
-			if (len < 0)
-				die_errno ("Invalid symlink '%s'", buf);
-			if (PATH_MAX <= len)
-				die("symbolic link too long: %s", buf);
+			if (len < 0) {
+				if (die_on_error)
+					die_errno("Invalid symlink '%s'", buf);
+				else
+					goto error_out;
+			}
+			if (PATH_MAX <= len) {
+				if (die_on_error)
+					die("symbolic link too long: %s", buf);
+				else
+					goto error_out;
+			}
 			next_buf[len] = '\0';
 			buf = next_buf;
 			buf_index = 1 - buf_index;
@@ -89,10 +132,18 @@ const char *real_path(const char *path)
 			break;
 	}
 
+	retval = buf;
+error_out:
+	free(last_elem);
 	if (*cwd && chdir(cwd))
 		die_errno ("Could not change back to '%s'", cwd);
 
-	return buf;
+	return retval;
+}
+
+const char *real_path(const char *path)
+{
+	return real_path_internal(path, 1);
 }
 
 static const char *get_pwd_cwd(void)
-- 
1.7.11.3

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v2 2/9] real_path_internal(): add comment explaining use of cwd
  2012-09-29  6:15 [PATCH v2 0/9] Fix GIT_CEILING_DIRECTORIES that contain symlinks Michael Haggerty
  2012-09-29  6:15 ` [PATCH v2 1/9] Introduce new static function real_path_internal() Michael Haggerty
@ 2012-09-29  6:15 ` Michael Haggerty
  2012-09-29  6:15 ` [PATCH v2 3/9] Introduce new function real_path_if_valid() Michael Haggerty
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 17+ messages in thread
From: Michael Haggerty @ 2012-09-29  6:15 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jiang Xin, Lea Wiemann, git, Michael Haggerty


Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
---
 abspath.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/abspath.c b/abspath.c
index a7ab8e9..f8a526f 100644
--- a/abspath.c
+++ b/abspath.c
@@ -35,7 +35,14 @@ static const char *real_path_internal(const char *path, int die_on_error)
 {
 	static char bufs[2][PATH_MAX + 1], *buf = bufs[0], *next_buf = bufs[1];
 	char *retval = NULL;
+
+	/*
+	 * If we have to temporarily chdir(), store the original CWD
+	 * here so that we can chdir() back to it at the end of the
+	 * function:
+	 */
 	char cwd[1024] = "";
+
 	int buf_index = 1;
 
 	int depth = MAXDEPTH;
-- 
1.7.11.3

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v2 3/9] Introduce new function real_path_if_valid()
  2012-09-29  6:15 [PATCH v2 0/9] Fix GIT_CEILING_DIRECTORIES that contain symlinks Michael Haggerty
  2012-09-29  6:15 ` [PATCH v2 1/9] Introduce new static function real_path_internal() Michael Haggerty
  2012-09-29  6:15 ` [PATCH v2 2/9] real_path_internal(): add comment explaining use of cwd Michael Haggerty
@ 2012-09-29  6:15 ` Michael Haggerty
  2012-09-29  6:15 ` [PATCH v2 4/9] longest_ancestor_length(): use string_list_split() Michael Haggerty
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 17+ messages in thread
From: Michael Haggerty @ 2012-09-29  6:15 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jiang Xin, Lea Wiemann, git, Michael Haggerty

The function is like real_path(), except that it returns NULL on error
instead of dying.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
---
 abspath.c | 5 +++++
 cache.h   | 1 +
 2 files changed, 6 insertions(+)

diff --git a/abspath.c b/abspath.c
index f8a526f..40cdc46 100644
--- a/abspath.c
+++ b/abspath.c
@@ -153,6 +153,11 @@ const char *real_path(const char *path)
 	return real_path_internal(path, 1);
 }
 
+const char *real_path_if_valid(const char *path)
+{
+	return real_path_internal(path, 0);
+}
+
 static const char *get_pwd_cwd(void)
 {
 	static char cwd[PATH_MAX + 1];
diff --git a/cache.h b/cache.h
index a58df84..b0d75bc 100644
--- a/cache.h
+++ b/cache.h
@@ -714,6 +714,7 @@ static inline int is_absolute_path(const char *path)
 }
 int is_directory(const char *);
 const char *real_path(const char *path);
+const char *real_path_if_valid(const char *path);
 const char *absolute_path(const char *path);
 const char *relative_path(const char *abs, const char *base);
 int normalize_path_copy(char *dst, const char *src);
-- 
1.7.11.3

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v2 4/9] longest_ancestor_length(): use string_list_split()
  2012-09-29  6:15 [PATCH v2 0/9] Fix GIT_CEILING_DIRECTORIES that contain symlinks Michael Haggerty
                   ` (2 preceding siblings ...)
  2012-09-29  6:15 ` [PATCH v2 3/9] Introduce new function real_path_if_valid() Michael Haggerty
@ 2012-09-29  6:15 ` Michael Haggerty
  2012-09-29  6:15 ` [PATCH v2 5/9] longest_ancestor_length(): explicitly filter list before loop Michael Haggerty
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 17+ messages in thread
From: Michael Haggerty @ 2012-09-29  6:15 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jiang Xin, Lea Wiemann, git, Michael Haggerty


Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
---
 path.c | 18 +++++++++++-------
 1 file changed, 11 insertions(+), 7 deletions(-)

diff --git a/path.c b/path.c
index cbbdf7d..f455e8e 100644
--- a/path.c
+++ b/path.c
@@ -12,6 +12,7 @@
  */
 #include "cache.h"
 #include "strbuf.h"
+#include "string-list.h"
 
 static char bad_path[] = "/bad-path/";
 
@@ -582,20 +583,22 @@ int normalize_path_copy(char *dst, const char *src)
  */
 int longest_ancestor_length(const char *path, const char *prefix_list)
 {
+	struct string_list prefixes = STRING_LIST_INIT_DUP;
 	char buf[PATH_MAX+1];
-	const char *ceil, *colon;
-	int len, max_len = -1;
+	int i, max_len = -1;
 
 	if (prefix_list == NULL || !strcmp(path, "/"))
 		return -1;
 
-	for (colon = ceil = prefix_list; *colon; ceil = colon+1) {
-		for (colon = ceil; *colon && *colon != PATH_SEP; colon++);
-		len = colon - ceil;
+	string_list_split(&prefixes, prefix_list, PATH_SEP, -1);
+
+	for (i = 0; i < prefixes.nr; i++) {
+		const char *ceil = prefixes.items[i].string;
+		int len = strlen(ceil);
+
 		if (len == 0 || len > PATH_MAX || !is_absolute_path(ceil))
 			continue;
-		strlcpy(buf, ceil, len+1);
-		if (normalize_path_copy(buf, buf) < 0)
+		if (normalize_path_copy(buf, ceil) < 0)
 			continue;
 		len = strlen(buf);
 		if (len > 0 && buf[len-1] == '/')
@@ -608,6 +611,7 @@ int longest_ancestor_length(const char *path, const char *prefix_list)
 		}
 	}
 
+	string_list_clear(&prefixes, 0);
 	return max_len;
 }
 
-- 
1.7.11.3

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v2 5/9] longest_ancestor_length(): explicitly filter list before loop
  2012-09-29  6:15 [PATCH v2 0/9] Fix GIT_CEILING_DIRECTORIES that contain symlinks Michael Haggerty
                   ` (3 preceding siblings ...)
  2012-09-29  6:15 ` [PATCH v2 4/9] longest_ancestor_length(): use string_list_split() Michael Haggerty
@ 2012-09-29  6:15 ` Michael Haggerty
  2012-09-29  6:15 ` [PATCH v2 6/9] longest_ancestor_length(): always add a slash to the end of prefixes Michael Haggerty
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 17+ messages in thread
From: Michael Haggerty @ 2012-09-29  6:15 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jiang Xin, Lea Wiemann, git, Michael Haggerty

Separate the step of filtering and normalizing elements of the
prefixes list from the iteration that looks for the longest prefix.
This will help keep the function testable after we not only normalize
the paths, but also convert them into real paths.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
---
 path.c | 30 ++++++++++++++++++++----------
 1 file changed, 20 insertions(+), 10 deletions(-)

diff --git a/path.c b/path.c
index f455e8e..b255a74 100644
--- a/path.c
+++ b/path.c
@@ -568,6 +568,24 @@ int normalize_path_copy(char *dst, const char *src)
 	return 0;
 }
 
+static int normalize_path_callback(struct string_list_item *item, void *cb_data)
+{
+	char buf[PATH_MAX+1];
+	const char *ceil = item->string;
+	int len = strlen(ceil);
+
+	if (len == 0 || len > PATH_MAX || !is_absolute_path(ceil))
+		return 0;
+	if (normalize_path_copy(buf, ceil) < 0)
+		return 0;
+	len = strlen(buf);
+	if (len > 0 && buf[len-1] == '/')
+		buf[--len] = '\0';
+	free(item->string);
+	item->string = xstrdup(buf);
+	return 1;
+}
+
 /*
  * path = Canonical absolute path
  * prefix_list = Colon-separated list of absolute paths
@@ -584,27 +602,19 @@ int normalize_path_copy(char *dst, const char *src)
 int longest_ancestor_length(const char *path, const char *prefix_list)
 {
 	struct string_list prefixes = STRING_LIST_INIT_DUP;
-	char buf[PATH_MAX+1];
 	int i, max_len = -1;
 
 	if (prefix_list == NULL || !strcmp(path, "/"))
 		return -1;
 
 	string_list_split(&prefixes, prefix_list, PATH_SEP, -1);
+	filter_string_list(&prefixes, 0, normalize_path_callback, NULL);
 
 	for (i = 0; i < prefixes.nr; i++) {
 		const char *ceil = prefixes.items[i].string;
 		int len = strlen(ceil);
 
-		if (len == 0 || len > PATH_MAX || !is_absolute_path(ceil))
-			continue;
-		if (normalize_path_copy(buf, ceil) < 0)
-			continue;
-		len = strlen(buf);
-		if (len > 0 && buf[len-1] == '/')
-			buf[--len] = '\0';
-
-		if (!strncmp(path, buf, len) &&
+		if (!strncmp(path, ceil, len) &&
 		    path[len] == '/' &&
 		    len > max_len) {
 			max_len = len;
-- 
1.7.11.3

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v2 6/9] longest_ancestor_length(): always add a slash to the end of prefixes
  2012-09-29  6:15 [PATCH v2 0/9] Fix GIT_CEILING_DIRECTORIES that contain symlinks Michael Haggerty
                   ` (4 preceding siblings ...)
  2012-09-29  6:15 ` [PATCH v2 5/9] longest_ancestor_length(): explicitly filter list before loop Michael Haggerty
@ 2012-09-29  6:15 ` Michael Haggerty
  2012-09-29  6:16 ` [PATCH v2 7/9] longest_ancestor_length(): use string_list_longest_prefix() Michael Haggerty
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 17+ messages in thread
From: Michael Haggerty @ 2012-09-29  6:15 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jiang Xin, Lea Wiemann, git, Michael Haggerty

Previously we stripped off any slashes that were present.  Instead,
add a slash if it is missing.  This removes the need for an extra
check that path has a slash following the prefix and makes the
handling of the root directory more natural, making the way clear to
use string_list_longest_prefix().

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
---
 path.c | 13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/path.c b/path.c
index b255a74..a56c64a 100644
--- a/path.c
+++ b/path.c
@@ -570,7 +570,7 @@ int normalize_path_copy(char *dst, const char *src)
 
 static int normalize_path_callback(struct string_list_item *item, void *cb_data)
 {
-	char buf[PATH_MAX+1];
+	char buf[PATH_MAX+2];
 	const char *ceil = item->string;
 	int len = strlen(ceil);
 
@@ -579,8 +579,10 @@ static int normalize_path_callback(struct string_list_item *item, void *cb_data)
 	if (normalize_path_copy(buf, ceil) < 0)
 		return 0;
 	len = strlen(buf);
-	if (len > 0 && buf[len-1] == '/')
-		buf[--len] = '\0';
+	if (len == 0 || buf[len-1] != '/') {
+		buf[len++] = '/';
+		buf[len++] = '\0';
+	}
 	free(item->string);
 	item->string = xstrdup(buf);
 	return 1;
@@ -615,9 +617,8 @@ int longest_ancestor_length(const char *path, const char *prefix_list)
 		int len = strlen(ceil);
 
 		if (!strncmp(path, ceil, len) &&
-		    path[len] == '/' &&
-		    len > max_len) {
-			max_len = len;
+		    len - 1 > max_len) {
+			max_len = len - 1;
 		}
 	}
 
-- 
1.7.11.3

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v2 7/9] longest_ancestor_length(): use string_list_longest_prefix()
  2012-09-29  6:15 [PATCH v2 0/9] Fix GIT_CEILING_DIRECTORIES that contain symlinks Michael Haggerty
                   ` (5 preceding siblings ...)
  2012-09-29  6:15 ` [PATCH v2 6/9] longest_ancestor_length(): always add a slash to the end of prefixes Michael Haggerty
@ 2012-09-29  6:16 ` Michael Haggerty
  2012-09-29  6:16 ` [PATCH v2 8/9] longest_ancestor_length(): resolve symlinks before comparing paths Michael Haggerty
  2012-09-29  6:16 ` [PATCH v2 9/9] t1504: stop resolving symlinks in GIT_CEILING_DIRECTORIES Michael Haggerty
  8 siblings, 0 replies; 17+ messages in thread
From: Michael Haggerty @ 2012-09-29  6:16 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jiang Xin, Lea Wiemann, git, Michael Haggerty

Use string_list_longest_prefix() in the implementation of
longest_ancestor_length(), instead of an equivalent loop.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
---
 path.c | 15 ++++-----------
 1 file changed, 4 insertions(+), 11 deletions(-)

diff --git a/path.c b/path.c
index a56c64a..b20f2fb 100644
--- a/path.c
+++ b/path.c
@@ -604,23 +604,16 @@ static int normalize_path_callback(struct string_list_item *item, void *cb_data)
 int longest_ancestor_length(const char *path, const char *prefix_list)
 {
 	struct string_list prefixes = STRING_LIST_INIT_DUP;
-	int i, max_len = -1;
+	int max_len;
+	const char *longest_prefix;
 
 	if (prefix_list == NULL || !strcmp(path, "/"))
 		return -1;
 
 	string_list_split(&prefixes, prefix_list, PATH_SEP, -1);
 	filter_string_list(&prefixes, 0, normalize_path_callback, NULL);
-
-	for (i = 0; i < prefixes.nr; i++) {
-		const char *ceil = prefixes.items[i].string;
-		int len = strlen(ceil);
-
-		if (!strncmp(path, ceil, len) &&
-		    len - 1 > max_len) {
-			max_len = len - 1;
-		}
-	}
+	longest_prefix = string_list_longest_prefix(&prefixes, path);
+	max_len = longest_prefix ? strlen(longest_prefix) - 1 : -1;
 
 	string_list_clear(&prefixes, 0);
 	return max_len;
-- 
1.7.11.3

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v2 8/9] longest_ancestor_length(): resolve symlinks before comparing paths
  2012-09-29  6:15 [PATCH v2 0/9] Fix GIT_CEILING_DIRECTORIES that contain symlinks Michael Haggerty
                   ` (6 preceding siblings ...)
  2012-09-29  6:16 ` [PATCH v2 7/9] longest_ancestor_length(): use string_list_longest_prefix() Michael Haggerty
@ 2012-09-29  6:16 ` Michael Haggerty
  2012-09-30  8:00   ` Junio C Hamano
  2012-09-29  6:16 ` [PATCH v2 9/9] t1504: stop resolving symlinks in GIT_CEILING_DIRECTORIES Michael Haggerty
  8 siblings, 1 reply; 17+ messages in thread
From: Michael Haggerty @ 2012-09-29  6:16 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jiang Xin, Lea Wiemann, git, Michael Haggerty

longest_ancestor_length() relies on a textual comparison of directory
parts to find the part of path that overlaps with one of the paths in
prefix_list.  But this doesn't work if any of the prefixes involves a
symbolic link, because the directories will look different even though
they might logically refer to the same directory.  So canonicalize the
paths listed in prefix_list using real_path_if_valid() before trying
to find matches.

path is already in canonical form, so doesn't need to be canonicalized
again.

This fixes some problems with using GIT_CEILING_DIRECTORIES that
contains paths involving symlinks, including t4035 if run with --root
set to a path involving symlinks.

Remove a number of tests of longest_ancestor_length().  It is awkward
to test longest_ancestor_length() now, because its new path
normalization behavior depends on the contents of the whole
filesystem.  On the other hand:

* longest_ancestor_length() is now built of reusable components that
  are themselves tested separately (string_list_split(),
  string_list_longest_prefix(), and real_path_if_valid()), so it
  contains less code that can go wrong.

* longest_ancestor_length() gets some testing (albeit not systematic)
  via the GIT_CEILING_DIRECTORIES tests.

Therefore the work of updating these tests exceeds any expected
benefits.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
---
 path.c                | 18 +++++++++------
 t/t0060-path-utils.sh | 64 ---------------------------------------------------
 2 files changed, 11 insertions(+), 71 deletions(-)

diff --git a/path.c b/path.c
index b20f2fb..40d7360 100644
--- a/path.c
+++ b/path.c
@@ -570,21 +570,25 @@ int normalize_path_copy(char *dst, const char *src)
 
 static int normalize_path_callback(struct string_list_item *item, void *cb_data)
 {
-	char buf[PATH_MAX+2];
+	char *buf;
 	const char *ceil = item->string;
-	int len = strlen(ceil);
+	const char *realpath;
+	int len;
 
-	if (len == 0 || len > PATH_MAX || !is_absolute_path(ceil))
+	if (!*ceil || !is_absolute_path(ceil))
 		return 0;
-	if (normalize_path_copy(buf, ceil) < 0)
+	realpath = real_path_if_valid(ceil);
+	if (!realpath)
 		return 0;
-	len = strlen(buf);
+	len = strlen(realpath);
+	buf = xmalloc(len + 2); /* Leave space for possible trailing slash */
+	strcpy(buf, realpath);
 	if (len == 0 || buf[len-1] != '/') {
 		buf[len++] = '/';
-		buf[len++] = '\0';
+		buf[len] = '\0';
 	}
 	free(item->string);
-	item->string = xstrdup(buf);
+	item->string = buf;
 	return 1;
 }
 
diff --git a/t/t0060-path-utils.sh b/t/t0060-path-utils.sh
index 4ef2345..c97bbf2 100755
--- a/t/t0060-path-utils.sh
+++ b/t/t0060-path-utils.sh
@@ -12,28 +12,6 @@ norm_path() {
 	"test \"\$(test-path-utils normalize_path_copy '$1')\" = '$2'"
 }
 
-# On Windows, we are using MSYS's bash, which mangles the paths.
-# Absolute paths are anchored at the MSYS installation directory,
-# which means that the path / accounts for this many characters:
-rootoff=$(test-path-utils normalize_path_copy / | wc -c)
-# Account for the trailing LF:
-if test $rootoff = 2; then
-	rootoff=	# we are on Unix
-else
-	rootoff=$(($rootoff-1))
-fi
-
-ancestor() {
-	# We do some math with the expected ancestor length.
-	expected=$3
-	if test -n "$rootoff" && test "x$expected" != x-1; then
-		expected=$(($expected+$rootoff))
-	fi
-	test_expect_success "longest ancestor: $1 $2 => $expected" \
-	"actual=\$(test-path-utils longest_ancestor_length '$1' '$2') &&
-	 test \"\$actual\" = '$expected'"
-}
-
 # Absolute path tests must be skipped on Windows because due to path mangling
 # the test program never sees a POSIX-style absolute path
 case $(uname -s) in
@@ -93,48 +71,6 @@ norm_path /d1/s1//../s2/../../d2 /d2 POSIX
 norm_path /d1/.../d2 /d1/.../d2 POSIX
 norm_path /d1/..././../d2 /d1/d2 POSIX
 
-ancestor / "" -1
-ancestor / / -1
-ancestor /foo "" -1
-ancestor /foo : -1
-ancestor /foo ::. -1
-ancestor /foo ::..:: -1
-ancestor /foo / 0
-ancestor /foo /fo -1
-ancestor /foo /foo -1
-ancestor /foo /foo/ -1
-ancestor /foo /bar -1
-ancestor /foo /bar/ -1
-ancestor /foo /foo/bar -1
-ancestor /foo /foo:/bar/ -1
-ancestor /foo /foo/:/bar/ -1
-ancestor /foo /foo::/bar/ -1
-ancestor /foo /:/foo:/bar/ 0
-ancestor /foo /foo:/:/bar/ 0
-ancestor /foo /:/bar/:/foo 0
-ancestor /foo/bar "" -1
-ancestor /foo/bar / 0
-ancestor /foo/bar /fo -1
-ancestor /foo/bar foo -1
-ancestor /foo/bar /foo 4
-ancestor /foo/bar /foo/ 4
-ancestor /foo/bar /foo/ba -1
-ancestor /foo/bar /:/fo 0
-ancestor /foo/bar /foo:/foo/ba 4
-ancestor /foo/bar /bar -1
-ancestor /foo/bar /bar/ -1
-ancestor /foo/bar /fo: -1
-ancestor /foo/bar :/fo -1
-ancestor /foo/bar /foo:/bar/ 4
-ancestor /foo/bar /:/foo:/bar/ 4
-ancestor /foo/bar /foo:/:/bar/ 4
-ancestor /foo/bar /:/bar/:/fo 0
-ancestor /foo/bar /:/bar/ 0
-ancestor /foo/bar .:/foo/. 4
-ancestor /foo/bar .:/foo/.:.: 4
-ancestor /foo/bar /foo/./:.:/bar 4
-ancestor /foo/bar .:/bar -1
-
 test_expect_success 'strip_path_suffix' '
 	test c:/msysgit = $(test-path-utils strip_path_suffix \
 		c:/msysgit/libexec//git-core libexec/git-core)
-- 
1.7.11.3

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v2 9/9] t1504: stop resolving symlinks in GIT_CEILING_DIRECTORIES
  2012-09-29  6:15 [PATCH v2 0/9] Fix GIT_CEILING_DIRECTORIES that contain symlinks Michael Haggerty
                   ` (7 preceding siblings ...)
  2012-09-29  6:16 ` [PATCH v2 8/9] longest_ancestor_length(): resolve symlinks before comparing paths Michael Haggerty
@ 2012-09-29  6:16 ` Michael Haggerty
  8 siblings, 0 replies; 17+ messages in thread
From: Michael Haggerty @ 2012-09-29  6:16 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jiang Xin, Lea Wiemann, git, Michael Haggerty

This test used to explicitly resolve symlinks in the paths derived
from TRASH_DIRECTORY that were written to GIT_CEILING_DIRECTORIES,
because the code handling GIT_CEILING_DIRECTORIES was confused by
symlinks.  This is no longer necessary.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
---
 t/t1504-ceiling-dirs.sh | 67 ++++++++++++++++++++++++-------------------------
 1 file changed, 33 insertions(+), 34 deletions(-)

diff --git a/t/t1504-ceiling-dirs.sh b/t/t1504-ceiling-dirs.sh
index cce87a5..a4a5202 100755
--- a/t/t1504-ceiling-dirs.sh
+++ b/t/t1504-ceiling-dirs.sh
@@ -14,8 +14,7 @@ test_fail() {
 	'
 }
 
-TRASH_ROOT="$PWD"
-ROOT_PARENT=$(dirname "$TRASH_ROOT")
+ROOT_PARENT=$(dirname "$TRASH_DIRECTORY")
 
 
 unset GIT_CEILING_DIRECTORIES
@@ -32,16 +31,16 @@ test_prefix ceil_at_parent ""
 GIT_CEILING_DIRECTORIES="$ROOT_PARENT/"
 test_prefix ceil_at_parent_slash ""
 
-GIT_CEILING_DIRECTORIES="$TRASH_ROOT"
+GIT_CEILING_DIRECTORIES="$TRASH_DIRECTORY"
 test_prefix ceil_at_trash ""
 
-GIT_CEILING_DIRECTORIES="$TRASH_ROOT/"
+GIT_CEILING_DIRECTORIES="$TRASH_DIRECTORY/"
 test_prefix ceil_at_trash_slash ""
 
-GIT_CEILING_DIRECTORIES="$TRASH_ROOT/sub"
+GIT_CEILING_DIRECTORIES="$TRASH_DIRECTORY/sub"
 test_prefix ceil_at_sub ""
 
-GIT_CEILING_DIRECTORIES="$TRASH_ROOT/sub/"
+GIT_CEILING_DIRECTORIES="$TRASH_DIRECTORY/sub/"
 test_prefix ceil_at_sub_slash ""
 
 
@@ -56,55 +55,55 @@ export GIT_CEILING_DIRECTORIES
 GIT_CEILING_DIRECTORIES=""
 test_prefix subdir_ceil_empty "sub/dir/"
 
-GIT_CEILING_DIRECTORIES="$TRASH_ROOT"
+GIT_CEILING_DIRECTORIES="$TRASH_DIRECTORY"
 test_fail subdir_ceil_at_trash
 
-GIT_CEILING_DIRECTORIES="$TRASH_ROOT/"
+GIT_CEILING_DIRECTORIES="$TRASH_DIRECTORY/"
 test_fail subdir_ceil_at_trash_slash
 
-GIT_CEILING_DIRECTORIES="$TRASH_ROOT/sub"
+GIT_CEILING_DIRECTORIES="$TRASH_DIRECTORY/sub"
 test_fail subdir_ceil_at_sub
 
-GIT_CEILING_DIRECTORIES="$TRASH_ROOT/sub/"
+GIT_CEILING_DIRECTORIES="$TRASH_DIRECTORY/sub/"
 test_fail subdir_ceil_at_sub_slash
 
-GIT_CEILING_DIRECTORIES="$TRASH_ROOT/sub/dir"
+GIT_CEILING_DIRECTORIES="$TRASH_DIRECTORY/sub/dir"
 test_prefix subdir_ceil_at_subdir "sub/dir/"
 
-GIT_CEILING_DIRECTORIES="$TRASH_ROOT/sub/dir/"
+GIT_CEILING_DIRECTORIES="$TRASH_DIRECTORY/sub/dir/"
 test_prefix subdir_ceil_at_subdir_slash "sub/dir/"
 
 
-GIT_CEILING_DIRECTORIES="$TRASH_ROOT/su"
+GIT_CEILING_DIRECTORIES="$TRASH_DIRECTORY/su"
 test_prefix subdir_ceil_at_su "sub/dir/"
 
-GIT_CEILING_DIRECTORIES="$TRASH_ROOT/su/"
+GIT_CEILING_DIRECTORIES="$TRASH_DIRECTORY/su/"
 test_prefix subdir_ceil_at_su_slash "sub/dir/"
 
-GIT_CEILING_DIRECTORIES="$TRASH_ROOT/sub/di"
+GIT_CEILING_DIRECTORIES="$TRASH_DIRECTORY/sub/di"
 test_prefix subdir_ceil_at_sub_di "sub/dir/"
 
-GIT_CEILING_DIRECTORIES="$TRASH_ROOT/sub/di"
+GIT_CEILING_DIRECTORIES="$TRASH_DIRECTORY/sub/di"
 test_prefix subdir_ceil_at_sub_di_slash "sub/dir/"
 
-GIT_CEILING_DIRECTORIES="$TRASH_ROOT/subdi"
+GIT_CEILING_DIRECTORIES="$TRASH_DIRECTORY/subdi"
 test_prefix subdir_ceil_at_subdi "sub/dir/"
 
-GIT_CEILING_DIRECTORIES="$TRASH_ROOT/subdi"
+GIT_CEILING_DIRECTORIES="$TRASH_DIRECTORY/subdi"
 test_prefix subdir_ceil_at_subdi_slash "sub/dir/"
 
 
-GIT_CEILING_DIRECTORIES="/foo:$TRASH_ROOT/sub"
+GIT_CEILING_DIRECTORIES="/foo:$TRASH_DIRECTORY/sub"
 test_fail second_of_two
 
-GIT_CEILING_DIRECTORIES="$TRASH_ROOT/sub:/bar"
+GIT_CEILING_DIRECTORIES="$TRASH_DIRECTORY/sub:/bar"
 test_fail first_of_two
 
-GIT_CEILING_DIRECTORIES="/foo:$TRASH_ROOT/sub:/bar"
+GIT_CEILING_DIRECTORIES="/foo:$TRASH_DIRECTORY/sub:/bar"
 test_fail second_of_three
 
 
-GIT_CEILING_DIRECTORIES="$TRASH_ROOT/sub"
+GIT_CEILING_DIRECTORIES="$TRASH_DIRECTORY/sub"
 GIT_DIR=../../.git
 export GIT_DIR
 test_prefix git_dir_specified ""
@@ -123,41 +122,41 @@ export GIT_CEILING_DIRECTORIES
 GIT_CEILING_DIRECTORIES=""
 test_prefix sd_ceil_empty "s/d/"
 
-GIT_CEILING_DIRECTORIES="$TRASH_ROOT"
+GIT_CEILING_DIRECTORIES="$TRASH_DIRECTORY"
 test_fail sd_ceil_at_trash
 
-GIT_CEILING_DIRECTORIES="$TRASH_ROOT/"
+GIT_CEILING_DIRECTORIES="$TRASH_DIRECTORY/"
 test_fail sd_ceil_at_trash_slash
 
-GIT_CEILING_DIRECTORIES="$TRASH_ROOT/s"
+GIT_CEILING_DIRECTORIES="$TRASH_DIRECTORY/s"
 test_fail sd_ceil_at_s
 
-GIT_CEILING_DIRECTORIES="$TRASH_ROOT/s/"
+GIT_CEILING_DIRECTORIES="$TRASH_DIRECTORY/s/"
 test_fail sd_ceil_at_s_slash
 
-GIT_CEILING_DIRECTORIES="$TRASH_ROOT/s/d"
+GIT_CEILING_DIRECTORIES="$TRASH_DIRECTORY/s/d"
 test_prefix sd_ceil_at_sd "s/d/"
 
-GIT_CEILING_DIRECTORIES="$TRASH_ROOT/s/d/"
+GIT_CEILING_DIRECTORIES="$TRASH_DIRECTORY/s/d/"
 test_prefix sd_ceil_at_sd_slash "s/d/"
 
 
-GIT_CEILING_DIRECTORIES="$TRASH_ROOT/su"
+GIT_CEILING_DIRECTORIES="$TRASH_DIRECTORY/su"
 test_prefix sd_ceil_at_su "s/d/"
 
-GIT_CEILING_DIRECTORIES="$TRASH_ROOT/su/"
+GIT_CEILING_DIRECTORIES="$TRASH_DIRECTORY/su/"
 test_prefix sd_ceil_at_su_slash "s/d/"
 
-GIT_CEILING_DIRECTORIES="$TRASH_ROOT/s/di"
+GIT_CEILING_DIRECTORIES="$TRASH_DIRECTORY/s/di"
 test_prefix sd_ceil_at_s_di "s/d/"
 
-GIT_CEILING_DIRECTORIES="$TRASH_ROOT/s/di"
+GIT_CEILING_DIRECTORIES="$TRASH_DIRECTORY/s/di"
 test_prefix sd_ceil_at_s_di_slash "s/d/"
 
-GIT_CEILING_DIRECTORIES="$TRASH_ROOT/sdi"
+GIT_CEILING_DIRECTORIES="$TRASH_DIRECTORY/sdi"
 test_prefix sd_ceil_at_sdi "s/d/"
 
-GIT_CEILING_DIRECTORIES="$TRASH_ROOT/sdi"
+GIT_CEILING_DIRECTORIES="$TRASH_DIRECTORY/sdi"
 test_prefix sd_ceil_at_sdi_slash "s/d/"
 
 
-- 
1.7.11.3

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 8/9] longest_ancestor_length(): resolve symlinks before comparing paths
  2012-09-29  6:16 ` [PATCH v2 8/9] longest_ancestor_length(): resolve symlinks before comparing paths Michael Haggerty
@ 2012-09-30  8:00   ` Junio C Hamano
  2012-10-01  4:51     ` Michael Haggerty
  0 siblings, 1 reply; 17+ messages in thread
From: Junio C Hamano @ 2012-09-30  8:00 UTC (permalink / raw)
  To: Michael Haggerty; +Cc: Jiang Xin, Lea Wiemann, git

Michael Haggerty <mhagger@alum.mit.edu> writes:

> longest_ancestor_length() relies on a textual comparison of directory
> parts to find the part of path that overlaps with one of the paths in
> prefix_list.  But this doesn't work if any of the prefixes involves a
> symbolic link, because the directories will look different even though
> they might logically refer to the same directory.  So canonicalize the
> paths listed in prefix_list using real_path_if_valid() before trying
> to find matches.

I somehow feel that this is making the logic unnecessarily
convoluted by solving the problem at a wrong level.

If longest_ancestor_length() takes a single string and a list of
candidate string prefixes, conceptually it should be usable for any
hierarchy-looking string that uses slashes as hierarchy separator
(e.g. refs that may be stored in packed-refs that you cannot expect
lstat(2) or readlink(2) to give you any sensible results).

The real problem is that the list given from the environment may
contain a path that violates that "it suffices to take the longest
string-prefix taking slashes into account" assumption in such a
generic l-a-l implementation, no?  And this patch solves it by
making l-a-l _less_ generic and forcing it to be aware of the glitch
of its caller (you can either blame clueless user who lies when
setting the GIT_CEILING_DIRECTORIES by including paths contaminated
with symlinks, or blame the calling setup_git_directory_gently_1()
function that does not resolve the symbolic links before calling
this function).

As l-a-l is only used by the "stop at the ceiling" logic, isn't it a
far simpler solution to keep the function work at the string level,
and make the caller fix its input, i.e. the value taken from the
environment variable, before calling it?  That is, grab the value of
GIT_CEILING_DIRECTORIES, split it into a list at PATH_SEP (while
rejecting any non-absolute paths), normalize the elements of that
list by resolving symbolic links, and then pass the cwd[] and the
normalized string list to l-a-l?

The resulting callsite in setup_git_directory_gently_1() would pass
cwd[] and the ceiling_dirs (which now is a string list), all of
whose elements would happen to begin with "/" (or dos drive prefix
followed by the "root" symbol), but l-a-l can be written in such a
way that it does not even require that all the input has to begin at
root, which would later make it usable with things that are not
paths (references, for example, all of which begin with "refs/" and
not "/refs/").

Hrm?

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 8/9] longest_ancestor_length(): resolve symlinks before comparing paths
  2012-09-30  8:00   ` Junio C Hamano
@ 2012-10-01  4:51     ` Michael Haggerty
  2012-10-01  5:30       ` Junio C Hamano
  2012-10-06  8:04       ` Proposed function path_in_directory() [was: Re: [PATCH v2 8/9] longest_ancestor_length(): resolve symlinks before comparing paths] Michael Haggerty
  0 siblings, 2 replies; 17+ messages in thread
From: Michael Haggerty @ 2012-10-01  4:51 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jiang Xin, Lea Wiemann, git

On 09/30/2012 10:00 AM, Junio C Hamano wrote:
> Michael Haggerty <mhagger@alum.mit.edu> writes:
> 
>> longest_ancestor_length() relies on a textual comparison of directory
>> parts to find the part of path that overlaps with one of the paths in
>> prefix_list.  But this doesn't work if any of the prefixes involves a
>> symbolic link, because the directories will look different even though
>> they might logically refer to the same directory.  So canonicalize the
>> paths listed in prefix_list using real_path_if_valid() before trying
>> to find matches.
> 
> I somehow feel that this is making the logic unnecessarily
> convoluted by solving the problem at a wrong level.
> 
> If longest_ancestor_length() takes a single string and a list of
> candidate string prefixes, conceptually it should be usable for any
> hierarchy-looking string that uses slashes as hierarchy separator
> (e.g. refs that may be stored in packed-refs that you cannot expect
> lstat(2) or readlink(2) to give you any sensible results).
> 
> The real problem is that the list given from the environment may
> contain a path that violates that "it suffices to take the longest
> string-prefix taking slashes into account" assumption in such a
> generic l-a-l implementation, no?  And this patch solves it by
> making l-a-l _less_ generic and forcing it to be aware of the glitch
> of its caller (you can either blame clueless user who lies when
> setting the GIT_CEILING_DIRECTORIES by including paths contaminated
> with symlinks, or blame the calling setup_git_directory_gently_1()
> function that does not resolve the symbolic links before calling
> this function).
> 
> As l-a-l is only used by the "stop at the ceiling" logic, isn't it a
> far simpler solution to keep the function work at the string level,
> and make the caller fix its input, i.e. the value taken from the
> environment variable, before calling it?  That is, grab the value of
> GIT_CEILING_DIRECTORIES, split it into a list at PATH_SEP (while
> rejecting any non-absolute paths), normalize the elements of that
> list by resolving symbolic links, and then pass the cwd[] and the
> normalized string list to l-a-l?
> 
> The resulting callsite in setup_git_directory_gently_1() would pass
> cwd[] and the ceiling_dirs (which now is a string list), all of
> whose elements would happen to begin with "/" (or dos drive prefix
> followed by the "root" symbol), but l-a-l can be written in such a
> way that it does not even require that all the input has to begin at
> root, which would later make it usable with things that are not
> paths (references, for example, all of which begin with "refs/" and
> not "/refs/").

I agree that longest_ancestor_length() is not very generic or
interesting anymore.  Nearly all of its "added value" comes from the
normalize_path_callback() helper function.  One possibility would be to
inline it at the one place it is called.

The function string_list_longest_prefix() was my attempt to isolate the
kernel of generic functionality from longest_ancestor_path().  It is
*almost* the function that you are proposing, with the exception that it
does not ensure that the prefix match ends at a "/" boundary.  So
another alternative could be to change this function to respect "/"
boundaries.  (After the change, the function might not belong in the
string-list API anymore.)

However, the semantics of a function that matches prefixes at "/"
boundaries is not entirely obvious.  Suppose we would implement the test
via a function like path_prefixcmp(path, prefix).  I can think of a few
policy questions that would have to be answered:

* Is a trailing slash on the prefix argument required, optional, or
prohibited?  What if the prefix is "/" or "//" or "c:/"?

* Is a trailing slash on the path argument optional/prohibited?  Are "/"
or "//" allowed as path arguments?

* Is a path its own prefix?

    path_prefixcmp("a/b", "a/b") -> true or false?

(For the implementation of longest_ancestor_path(), we would prefer this
to return "false".)  Or does the answer depend on whether the prefix has
a trailing slash?

    path_prefixcmp("a/b", "a/b") -> true?
    path_prefixcmp("a/b", "a/b/") -> false?

Part of the reason that I implemented string_list_longest_prefix()
instead of the function that you suggest is that the behavior of the
former is far more obvious.

I think I would advocate that the prefix has to match the front of the
path exactly (including any trailing slashes) and either

    strlen(prefix) == 0
    or the prefix ended with a '/'
    or the prefix and path are identical
    or the character in path following the matching part is a '/'

This would allow the "is path its own prefix" policy to be decided by
the caller by either including or omitting a trailing slash on the
prefix argument.

Michael

-- 
Michael Haggerty
mhagger@alum.mit.edu
http://softwareswirl.blogspot.com/

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 8/9] longest_ancestor_length(): resolve symlinks before comparing paths
  2012-10-01  4:51     ` Michael Haggerty
@ 2012-10-01  5:30       ` Junio C Hamano
  2012-10-06  8:04       ` Proposed function path_in_directory() [was: Re: [PATCH v2 8/9] longest_ancestor_length(): resolve symlinks before comparing paths] Michael Haggerty
  1 sibling, 0 replies; 17+ messages in thread
From: Junio C Hamano @ 2012-10-01  5:30 UTC (permalink / raw)
  To: Michael Haggerty; +Cc: Jiang Xin, Lea Wiemann, git

Michael Haggerty <mhagger@alum.mit.edu> writes:

> I think I would advocate that the prefix has to match the front of
> the path exactly (including any trailing slashes) and either
>
>     strlen(prefix) == 0
>     or the prefix ended with a '/'
>     or the prefix and path are identical
>     or the character in path following the matching part is a '/'
>
> This would allow the "is path its own prefix" policy to be decided by
> the caller by either including or omitting a trailing slash on the
> prefix argument.

I think that is sensible thing to do.

The primary thing I found questionable was that the function, given
"/net/wink/project/frotz" to check against "/pub:/s" (or "/pub/:/s/"
if you like), will report that "/net/wink/project" directory is the
longest ancestor, when "/s" is a symlink that happens to point at
"/net/wink/project".  It is very counter-intuitive when you view its
two input strings as strings.  By making its sole caller expand the
symbolic links, it would be a lot clearer what is going on to
anybody who follow the codepath.  We have one path obtained from
getcwd() and a set of paths all of which are real paths without
symbolic aliasing, and we check if one among the latter cover
an earlier part of the former.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Proposed function path_in_directory() [was: Re: [PATCH v2 8/9] longest_ancestor_length(): resolve symlinks before comparing paths]
  2012-10-01  4:51     ` Michael Haggerty
  2012-10-01  5:30       ` Junio C Hamano
@ 2012-10-06  8:04       ` Michael Haggerty
  2012-10-08 16:13         ` Proposed function path_in_directory() Junio C Hamano
  1 sibling, 1 reply; 17+ messages in thread
From: Michael Haggerty @ 2012-10-06  8:04 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jiang Xin, Lea Wiemann, git, Johannes Sixt

On 10/01/2012 06:51 AM, Michael Haggerty wrote:
> I think I would advocate that the prefix has to match the front of the
> path exactly (including any trailing slashes) and either
> 
>     strlen(prefix) == 0
>     or the prefix ended with a '/'
>     or the prefix and path are identical
>     or the character in path following the matching part is a '/'
> 
> This would allow the "is path its own prefix" policy to be decided by
> the caller by either including or omitting a trailing slash on the
> prefix argument.

Thinking about this more, I don't think it will work.  As usual, the
special cases around "/" and "//" make things awkward.  I think it is
necessary to have a separate argument to specify whether "path is its
own prefix".

So I am trying to decide how a function path_in_directory() should work,
and would like to get some feedback, especially on the following two points:

1. How should "//" be handled?  I don't really have experience with the
peculiar paths that start with "//", so I'm not sure how they should be
handled (or even if the handling needs to be platform-dependent).  My
working hypothesis is that the inputs should be normalized by the
caller, so if the caller wants "//" to be treated as equivalent to "/"
then the caller should normalize them *before* calling this function.
Conversely, if the caller passes "//" to the function, that implies that
"//" is distinct from "/" and "//" is considered a proper subdirectory
of "/".  See the cases marked with "??????" below.

2. Does there need to be any special related to DOS paths?

> /*
>  * Return true iff path is within dir.  The comparison is textual,
>  * meaning that path and dir should be normalized and either both be
>  * absolute or both be relative to the same directory.  If path and
>  * dir represent the *same* path, then return true iff allow_equal is
>  * true.  Single trailing slashes on either path or dir are ignored,
>  * (except for the special case "//"); i.e., "a/b" and "a/b/" are
>  * treated equivalently, as are "" and "/".  Examples (* means "don't
>  * care"):
>  *
>  * - path_in_directory("a/b", "a", *) -> true
>  * - path_in_directory("a", "a/b", *) -> false
>  * - path_in_directory("ab", "a", *) -> false
>  * - path_in_directory("a/b", "a/b", 0) -> false
>  *   (same if either argument is replaced with "a/b/")
>  * - path_in_directory("a/b", "a/b", 1) -> true
>  *   (same if either argument is replaced with "a/b/")
>  * - path_in_directory(*, "/", 1) -> true
>  * - path_in_directory("/", "/", 0) -> false
>  * - path_in_directory("//", "/", 0) -> true    ??????
>  * - path_in_directory("//", "/", 1) -> true
>  * - path_in_directory("/", "//", 0) -> false
>  * - path_in_directory("/", "//", 1) -> false   ??????
>  * - path_in_directory("/a/b", "//", *) -> false
>  */
> int path_in_directory(const char *path, const char *dir, int allow_equal);

Michael

-- 
Michael Haggerty
mhagger@alum.mit.edu
http://softwareswirl.blogspot.com/

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Proposed function path_in_directory()
  2012-10-06  8:04       ` Proposed function path_in_directory() [was: Re: [PATCH v2 8/9] longest_ancestor_length(): resolve symlinks before comparing paths] Michael Haggerty
@ 2012-10-08 16:13         ` Junio C Hamano
  2012-10-08 18:20           ` Johannes Sixt
  0 siblings, 1 reply; 17+ messages in thread
From: Junio C Hamano @ 2012-10-08 16:13 UTC (permalink / raw)
  To: Michael Haggerty; +Cc: Jiang Xin, Lea Wiemann, git, Johannes Sixt

Michael Haggerty <mhagger@alum.mit.edu> writes:

> On 10/01/2012 06:51 AM, Michael Haggerty wrote:
>> I think I would advocate that the prefix has to match the front of the
>> path exactly (including any trailing slashes) and either
>> 
>>     strlen(prefix) == 0
>>     or the prefix ended with a '/'
>>     or the prefix and path are identical
>>     or the character in path following the matching part is a '/'
>> 
>> This would allow the "is path its own prefix" policy to be decided by
>> the caller by either including or omitting a trailing slash on the
>> prefix argument.
>
> Thinking about this more, I don't think it will work.  As usual, the
> special cases around "/" and "//" make things awkward.  I think it is
> necessary to have a separate argument to specify whether "path is its
> own prefix".
>
> So I am trying to decide how a function path_in_directory() should work,
> and would like to get some feedback, especially on the following two points:
>
> 1. How should "//" be handled?  I don't really have experience with the
> peculiar paths that start with "//", so I'm not sure how they should be
> handled (or even if the handling needs to be platform-dependent).  My
> working hypothesis is that the inputs should be normalized by the
> caller, so if the caller wants "//" to be treated as equivalent to "/"
> then the caller should normalize them *before* calling this function.
> Conversely, if the caller passes "//" to the function, that implies that
> "//" is distinct from "/" and "//" is considered a proper subdirectory
> of "/".  See the cases marked with "??????" below.

I think POSIX essentially says that anything that begins with "//"
is up to the platform, but in practice, the only real-life variant
we see is "//dosshare/path/from/root".  I agree with your "caller
should normalize for the semantics it wants to see".

If our lazy programming creates paths with duplicated slashes, we
should be fixing such codepaths anyway, so assuming that all paths
we create ourselves are free of double-slashes _inside_ a path (i.e.
when concatenating a subdirectory name to a directory name), the
only case we need to worry about is the double-slashes given by the
user (either from the command line, environment, or configuration).
The path normalizing logic removes double-slashes inside a path, and
I think it should do so while keeping the leading slashes, so that we
do not lose distinction between "//dosshare/" and "/dosshare/".

> 2. Does there need to be any special related to DOS paths?

The ceiling computation may need special case for dos.  What does
the getcwd() give us?  Do we learn only the path within the "current
drive" and need to prefix C: (or D: or X:) ourselves if we really
want to tell C:\bin and D:\bin apart?

Assuming that is the case, the ceiling computation would need a
helper function that hides the gory details of prefixing getcwd()
result with drive letter or whatever needed, and another that
normalizes the elements of the environment variables (I presume that
if an element in it without the drive prefix should be normalized to
add the current drive letter to it so that the normalized getcwd()
result can be compared with it).  E.g. if the ceiling list is
"D:/a/b;/trash/" then getcwd() returning "/a" alone does not make it
outside the ceiling due to "D:/a/b"---our current drive must be "D"
for that pattern to kick in.  The unqualified /trash would apply to
any drive.

Does that make sense?

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Proposed function path_in_directory()
  2012-10-08 16:13         ` Proposed function path_in_directory() Junio C Hamano
@ 2012-10-08 18:20           ` Johannes Sixt
  2012-10-08 18:23             ` Junio C Hamano
  0 siblings, 1 reply; 17+ messages in thread
From: Johannes Sixt @ 2012-10-08 18:20 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Michael Haggerty, Jiang Xin, Lea Wiemann, git

Am 08.10.2012 18:13, schrieb Junio C Hamano:
> Michael Haggerty <mhagger@alum.mit.edu> writes:
>> 2. Does there need to be any special related to DOS paths?
> 
> The ceiling computation may need special case for dos.  What does
> the getcwd() give us?  Do we learn only the path within the "current
> drive" and need to prefix C: (or D: or X:) ourselves if we really
> want to tell C:\bin and D:\bin apart?

We don't have to do that. getcwd() returns the drive letter.

> Assuming that is the case, the ceiling computation would need a
> helper function that hides the gory details of prefixing getcwd()
> result with drive letter or whatever needed, and another that
> normalizes the elements of the environment variables (I presume that
> if an element in it without the drive prefix should be normalized to
> add the current drive letter to it so that the normalized getcwd()
> result can be compared with it).  E.g. if the ceiling list is
> "D:/a/b;/trash/" then getcwd() returning "/a" alone does not make it
> outside the ceiling due to "D:/a/b"---our current drive must be "D"
> for that pattern to kick in.  The unqualified /trash would apply to
> any drive.

A component in an path list like GIT_CEILING_DIRECTORIES or PATH that
does not contain the drive letter is a user error. Do not cater for it.

It is more important to take into account that those components can
contain backslash as directory separators, while our mingw_getcwd()
returns forward slashes.

-- Hannes

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Proposed function path_in_directory()
  2012-10-08 18:20           ` Johannes Sixt
@ 2012-10-08 18:23             ` Junio C Hamano
  0 siblings, 0 replies; 17+ messages in thread
From: Junio C Hamano @ 2012-10-08 18:23 UTC (permalink / raw)
  To: Johannes Sixt; +Cc: Michael Haggerty, Jiang Xin, Lea Wiemann, git

Johannes Sixt <j6t@kdbg.org> writes:

> Am 08.10.2012 18:13, schrieb Junio C Hamano:
>> Michael Haggerty <mhagger@alum.mit.edu> writes:
>>> 2. Does there need to be any special related to DOS paths?
>> 
>> The ceiling computation may need special case for dos.  What does
>> the getcwd() give us?  Do we learn only the path within the "current
>> drive" and need to prefix C: (or D: or X:) ourselves if we really
>> want to tell C:\bin and D:\bin apart?
>
> We don't have to do that. getcwd() returns the drive letter.
> ...
> A component in an path list like GIT_CEILING_DIRECTORIES or PATH that
> does not contain the drive letter is a user error. Do not cater for it.

That makes it much simpler and saner.  Thanks.

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2012-10-08 18:23 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-09-29  6:15 [PATCH v2 0/9] Fix GIT_CEILING_DIRECTORIES that contain symlinks Michael Haggerty
2012-09-29  6:15 ` [PATCH v2 1/9] Introduce new static function real_path_internal() Michael Haggerty
2012-09-29  6:15 ` [PATCH v2 2/9] real_path_internal(): add comment explaining use of cwd Michael Haggerty
2012-09-29  6:15 ` [PATCH v2 3/9] Introduce new function real_path_if_valid() Michael Haggerty
2012-09-29  6:15 ` [PATCH v2 4/9] longest_ancestor_length(): use string_list_split() Michael Haggerty
2012-09-29  6:15 ` [PATCH v2 5/9] longest_ancestor_length(): explicitly filter list before loop Michael Haggerty
2012-09-29  6:15 ` [PATCH v2 6/9] longest_ancestor_length(): always add a slash to the end of prefixes Michael Haggerty
2012-09-29  6:16 ` [PATCH v2 7/9] longest_ancestor_length(): use string_list_longest_prefix() Michael Haggerty
2012-09-29  6:16 ` [PATCH v2 8/9] longest_ancestor_length(): resolve symlinks before comparing paths Michael Haggerty
2012-09-30  8:00   ` Junio C Hamano
2012-10-01  4:51     ` Michael Haggerty
2012-10-01  5:30       ` Junio C Hamano
2012-10-06  8:04       ` Proposed function path_in_directory() [was: Re: [PATCH v2 8/9] longest_ancestor_length(): resolve symlinks before comparing paths] Michael Haggerty
2012-10-08 16:13         ` Proposed function path_in_directory() Junio C Hamano
2012-10-08 18:20           ` Johannes Sixt
2012-10-08 18:23             ` Junio C Hamano
2012-09-29  6:16 ` [PATCH v2 9/9] t1504: stop resolving symlinks in GIT_CEILING_DIRECTORIES Michael Haggerty

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).