git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH] making real_path thread-safe
@ 2016-12-05 18:58 Brandon Williams
  2016-12-05 18:58 ` [PATCH] real_path: make " Brandon Williams
  2016-12-08 23:58 ` [PATCH v2 0/4] road to reentrant real_path Brandon Williams
  0 siblings, 2 replies; 83+ messages in thread
From: Brandon Williams @ 2016-12-05 18:58 UTC (permalink / raw)
  To: git; +Cc: Brandon Williams, sbeller, peff, jacob.keller

The goal of this patch is to make the function real_path thread-safe.  To do,
the current logic which uses chdir() to do the symlink resolution needed to be
rewritten to perform the symlink resolution by hand.

I wanted to get the bulk of the work out there for review quickly so
technically this patch doesn't make real_path thread safe yet as it still uses
a static buffer to serve up the resultant path.  On a future iteration I plan
on removing this and pushing ownership of the returned string to the callers.

For more information on why this change is desirable you can see the following
threads:

https://public-inbox.org/git/20161129010538.GA121643@google.com/
https://public-inbox.org/git/1480555714-186183-1-git-send-email-bmwill@google.com/

Brandon Williams (1):
  real_path: make real_path thread-safe

 abspath.c | 183 +++++++++++++++++++++++++++++++++++++++++---------------------
 1 file changed, 122 insertions(+), 61 deletions(-)

-- 
2.8.0.rc3.226.g39d4020


^ permalink raw reply	[flat|nested] 83+ messages in thread

* [PATCH] real_path: make real_path thread-safe
  2016-12-05 18:58 [PATCH] making real_path thread-safe Brandon Williams
@ 2016-12-05 18:58 ` Brandon Williams
  2016-12-05 19:57   ` Stefan Beller
                     ` (2 more replies)
  2016-12-08 23:58 ` [PATCH v2 0/4] road to reentrant real_path Brandon Williams
  1 sibling, 3 replies; 83+ messages in thread
From: Brandon Williams @ 2016-12-05 18:58 UTC (permalink / raw)
  To: git; +Cc: Brandon Williams, sbeller, peff, jacob.keller

The current implementation of real_path uses chdir() in order to resolve
symlinks.  Unfortunately this isn't thread-safe as chdir() affects a
process as a whole and not just an individual thread.  Instead perform
the symlink resolution by hand so that the calls to chdir() can be
removed, making real_path reentrant.

Signed-off-by: Brandon Williams <bmwill@google.com>
---
 abspath.c | 183 +++++++++++++++++++++++++++++++++++++++++---------------------
 1 file changed, 122 insertions(+), 61 deletions(-)

diff --git a/abspath.c b/abspath.c
index 2825de8..6f546e0 100644
--- a/abspath.c
+++ b/abspath.c
@@ -11,8 +11,38 @@ int is_directory(const char *path)
 	return (!stat(path, &st) && S_ISDIR(st.st_mode));
 }
 
+/* removes the last path component from 'path' except if 'path' is root */
+static void strip_last_component(struct strbuf *path)
+{
+	if (path->len > 1) {
+		char *last_slash = find_last_dir_sep(path->buf);
+		strbuf_setlen(path, last_slash - path->buf);
+	}
+}
+
+/* gets the next component in 'remaining' and places it in 'next' */
+static void get_next_component(struct strbuf *next, struct strbuf *remaining)
+{
+	char *start = NULL;
+	char *end = NULL;
+
+	strbuf_reset(next);
+
+	/* look for the next component */
+	/* Skip sequences of multiple path-separators */
+	for (start = remaining->buf; is_dir_sep(*start); start++)
+		/* nothing */;
+	/* Find end of the path component */
+	for (end = start; *end && !is_dir_sep(*end); end++)
+		/* nothing */;
+
+	strbuf_add(next, start, end - start);
+	/* remove the component from 'remaining' */
+	strbuf_remove(remaining, 0, end - remaining->buf);
+}
+
 /* We allow "recursive" symbolic links. Only within reason, though. */
-#define MAXDEPTH 5
+#define MAXSYMLINKS 5
 
 /*
  * Return the real path (i.e., absolute path, with symlinks resolved
@@ -21,7 +51,6 @@ int is_directory(const char *path)
  * absolute_path().)  The return value is a pointer to a static
  * buffer.
  *
- * The input and all intermediate paths must be shorter than MAX_PATH.
  * The directory part of path (i.e., everything up to the last
  * dir_sep) must denote a valid, existing directory, but the last
  * component need not exist.  If die_on_error is set, then die with an
@@ -33,22 +62,16 @@ int is_directory(const char *path)
  */
 static const char *real_path_internal(const char *path, int die_on_error)
 {
-	static struct strbuf sb = STRBUF_INIT;
+	static struct strbuf resolved = STRBUF_INIT;
+	struct strbuf remaining = STRBUF_INIT;
+	struct strbuf next = STRBUF_INIT;
+	struct strbuf symlink = STRBUF_INIT;
 	char *retval = NULL;
-
-	/*
-	 * If we have to temporarily chdir(), store the original CWD
-	 * here so that we can chdir() back to it at the end of the
-	 * function:
-	 */
-	struct strbuf cwd = STRBUF_INIT;
-
-	int depth = MAXDEPTH;
-	char *last_elem = NULL;
+	int num_symlinks = 0;
 	struct stat st;
 
 	/* We've already done it */
-	if (path == sb.buf)
+	if (path == resolved.buf)
 		return path;
 
 	if (!*path) {
@@ -58,74 +81,112 @@ static const char *real_path_internal(const char *path, int die_on_error)
 			goto error_out;
 	}
 
-	strbuf_reset(&sb);
-	strbuf_addstr(&sb, path);
-
-	while (depth--) {
-		if (!is_directory(sb.buf)) {
-			char *last_slash = find_last_dir_sep(sb.buf);
-			if (last_slash) {
-				last_elem = xstrdup(last_slash + 1);
-				strbuf_setlen(&sb, last_slash - sb.buf + 1);
-			} else {
-				last_elem = xmemdupz(sb.buf, sb.len);
-				strbuf_reset(&sb);
-			}
+	strbuf_reset(&resolved);
+
+	if (is_absolute_path(path)) {
+		/* absolute path; start with only root as being resolved */
+		strbuf_addch(&resolved, '/');
+		strbuf_addstr(&remaining, path + 1);
+	} else {
+		/* relative path; can use CWD as the initial resolved path */
+		if (strbuf_getcwd(&resolved)) {
+			if (die_on_error)
+				die_errno("Could not get current working directory");
+			else
+				goto error_out;
+		}
+		strbuf_addstr(&remaining, path);
+	}
+
+	/* Iterate over the remaining path components */
+	while (remaining.len > 0) {
+		get_next_component(&next, &remaining);
+
+		if (next.len == 0) {
+			continue; /* empty component */
+		} else if (next.len == 1 && !strcmp(next.buf, ".")) {
+			continue; /* '.' component */
+		} else if (next.len == 2 && !strcmp(next.buf, "..")) {
+			/* '..' component; strip the last path component */
+			strip_last_component(&resolved);
+			continue;
 		}
 
-		if (sb.len) {
-			if (!cwd.len && strbuf_getcwd(&cwd)) {
+		/* append the next component and resolve resultant path */
+		if (resolved.buf[resolved.len - 1] != '/')
+			strbuf_addch(&resolved, '/');
+		strbuf_addbuf(&resolved, &next);
+
+		if (lstat(resolved.buf, &st)) {
+			/* error out unless this was the last component */
+			if (!(errno == ENOENT && !remaining.len)) {
 				if (die_on_error)
-					die_errno("Could not get current working directory");
+					die_errno("Invalid path '%s'",
+						  resolved.buf);
 				else
 					goto error_out;
 			}
+		} else if (S_ISLNK(st.st_mode)) {
+			ssize_t len;
+			strbuf_reset(&symlink);
 
-			if (chdir(sb.buf)) {
+			if (num_symlinks++ > MAXSYMLINKS) {
 				if (die_on_error)
-					die_errno("Could not switch to '%s'",
-						  sb.buf);
+					die("More than %d nested symlinks "
+					    "on path '%s'", MAXSYMLINKS, path);
 				else
 					goto error_out;
 			}
-		}
-		if (strbuf_getcwd(&sb)) {
-			if (die_on_error)
-				die_errno("Could not get current working directory");
-			else
-				goto error_out;
-		}
 
-		if (last_elem) {
-			if (sb.len && !is_dir_sep(sb.buf[sb.len - 1]))
-				strbuf_addch(&sb, '/');
-			strbuf_addstr(&sb, last_elem);
-			free(last_elem);
-			last_elem = NULL;
-		}
-
-		if (!lstat(sb.buf, &st) && S_ISLNK(st.st_mode)) {
-			struct strbuf next_sb = STRBUF_INIT;
-			ssize_t len = strbuf_readlink(&next_sb, sb.buf, 0);
+			len = strbuf_readlink(&symlink, resolved.buf,
+					      st.st_size);
 			if (len < 0) {
 				if (die_on_error)
 					die_errno("Invalid symlink '%s'",
-						  sb.buf);
+						  resolved.buf);
 				else
 					goto error_out;
 			}
-			strbuf_swap(&sb, &next_sb);
-			strbuf_release(&next_sb);
-		} else
-			break;
+
+			if (is_absolute_path(symlink.buf)) {
+				/*
+				 * absolute symlink
+				 * reset resolved path to root
+				 */
+				strbuf_setlen(&resolved, 1);
+			} else {
+				/*
+				 * relative symlink
+				 * strip off the last component since it will
+				 * be replaced with the contents of the symlink
+				 */
+				strip_last_component(&resolved);
+			}
+
+			/*
+			 * if there are still remaining components to resolve
+			 * then append them to symlink
+			 */
+			if (remaining.len) {
+				strbuf_addch(&symlink, '/');
+				strbuf_addbuf(&symlink, &remaining);
+			}
+
+			/*
+			 * use the symlink as the remaining components that
+			 * need to be resloved
+			 */
+			strbuf_swap(&symlink, &remaining);
+		}
 	}
 
-	retval = sb.buf;
+	retval = resolved.buf;
+
 error_out:
-	free(last_elem);
-	if (cwd.len && chdir(cwd.buf))
-		die_errno("Could not change back to '%s'", cwd.buf);
-	strbuf_release(&cwd);
+	//strbuf_release(&resolved);
+	strbuf_release(&remaining);
+	strbuf_release(&next);
+	strbuf_release(&symlink);
 
 	return retval;
 }
-- 
2.8.0.rc3.226.g39d4020


^ permalink raw reply related	[flat|nested] 83+ messages in thread

* Re: [PATCH] real_path: make real_path thread-safe
  2016-12-05 18:58 ` [PATCH] real_path: make " Brandon Williams
@ 2016-12-05 19:57   ` Stefan Beller
  2016-12-05 20:12     ` Brandon Williams
  2016-12-05 20:14   ` Stefan Beller
  2016-12-06 23:44   ` Junio C Hamano
  2 siblings, 1 reply; 83+ messages in thread
From: Stefan Beller @ 2016-12-05 19:57 UTC (permalink / raw)
  To: Brandon Williams; +Cc: git@vger.kernel.org, Jeff King, Jacob Keller

On Mon, Dec 5, 2016 at 10:58 AM, Brandon Williams <bmwill@google.com> wrote:
> The current implementation of real_path uses chdir() in order to resolve
> symlinks.  Unfortunately this isn't thread-safe as chdir() affects a
> process as a whole and not just an individual thread.  Instead perform
> the symlink resolution by hand so that the calls to chdir() can be
> removed, making real_path reentrant.
>
> Signed-off-by: Brandon Williams <bmwill@google.com>

Thanks for working on this, some comments below:

>
> +/* removes the last path component from 'path' except if 'path' is root */
> +static void strip_last_component(struct strbuf *path)
> +{
> +       if (path->len > 1) {
> +               char *last_slash = find_last_dir_sep(path->buf);

What happens when there is no dir_sep?

> +/* gets the next component in 'remaining' and places it in 'next' */
> +static void get_next_component(struct strbuf *next, struct strbuf *remaining)
> +{

It's more than just getting it, it also chops it off of 'remaining' ?



> +       strbuf_reset(&resolved);
> +
> +       if (is_absolute_path(path)) {
> +               /* absolute path; start with only root as being resolved */
> +               strbuf_addch(&resolved, '/');
> +               strbuf_addstr(&remaining, path + 1);

This is where we would wait for input of Windows savy people.

> +       } else {
> +               /* relative path; can use CWD as the initial resolved path */
> +               if (strbuf_getcwd(&resolved)) {
> +                       if (die_on_error)
> +                               die_errno("Could not get current working directory");

I am looking at xgetcwd, which words it slightly differently.

    if (strbuf_getcwd(&sb))
        die_errno(_("unable to get current working directory"));

Not sure if aligning them would be a good idea?

Going by "git grep die_errno" as well as our Coding guidelines,
we don't want to see capitalized error messages.

> +
> +               if (next.len == 0) {
> +                       continue; /* empty component */

which means we resolve over path//with//double//slashes just fine,
as well as /./ parts. :)

>                 }
>
> -               if (sb.len) {
> -                       if (!cwd.len && strbuf_getcwd(&cwd)) {
> +               /* append the next component and resolve resultant path */

"resultant" indicates you have a math background. :)
But I had to look it up, I guess it is fine that way,
though "resulting" may cause less mental friction
for non native speakers.


> +                       if (!(errno == ENOENT && !remaining.len)) {
>                                 if (die_on_error)
> -                                       die_errno("Could not get current working directory");
> +                                       die_errno("Invalid path '%s'",
> +                                                 resolved.buf);
>                                 else
>                                         goto error_out;
>                         }
> +               } else if (S_ISLNK(st.st_mode)) {

As far as I can tell, we could keep the symlink strbuf
at a smaller scope here? (I was surprised how many strbufs
are declared at the beginning of the function)

> +       //strbuf_release(&resolved);

This is why the cover letter toned down expectations ?
(no // as comment, maybe remove that line?)

Thanks,
Stefan

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH] real_path: make real_path thread-safe
  2016-12-05 19:57   ` Stefan Beller
@ 2016-12-05 20:12     ` Brandon Williams
  2016-12-05 20:38       ` Stefan Beller
  0 siblings, 1 reply; 83+ messages in thread
From: Brandon Williams @ 2016-12-05 20:12 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git@vger.kernel.org, Jeff King, Jacob Keller

On 12/05, Stefan Beller wrote:
> > +/* removes the last path component from 'path' except if 'path' is root */
> > +static void strip_last_component(struct strbuf *path)
> > +{
> > +       if (path->len > 1) {
> > +               char *last_slash = find_last_dir_sep(path->buf);
> 
> What happens when there is no dir_sep?

There should always be a dir_sep since that only gets run if the passed
in path at least contains root '/'

> 
> > +/* gets the next component in 'remaining' and places it in 'next' */
> > +static void get_next_component(struct strbuf *next, struct strbuf *remaining)
> > +{
> 
> It's more than just getting it, it also chops it off of 'remaining' ?

True, I can update the comment to reflect that.

> > +       } else {
> > +               /* relative path; can use CWD as the initial resolved path */
> > +               if (strbuf_getcwd(&resolved)) {
> > +                       if (die_on_error)
> > +                               die_errno("Could not get current working directory");
> 
> I am looking at xgetcwd, which words it slightly differently.
> 
>     if (strbuf_getcwd(&sb))
>         die_errno(_("unable to get current working directory"));
> 
> Not sure if aligning them would be a good idea?
> 
> Going by "git grep die_errno" as well as our Coding guidelines,
> we don't want to see capitalized error messages.

K I can use the other msg.

> >
> > -               if (sb.len) {
> > -                       if (!cwd.len && strbuf_getcwd(&cwd)) {
> > +               /* append the next component and resolve resultant path */
> 
> "resultant" indicates you have a math background. :)
> But I had to look it up, I guess it is fine that way,
> though "resulting" may cause less mental friction
> for non native speakers.
> 
> 
> > +                       if (!(errno == ENOENT && !remaining.len)) {
> >                                 if (die_on_error)
> > -                                       die_errno("Could not get current working directory");
> > +                                       die_errno("Invalid path '%s'",
> > +                                                 resolved.buf);
> >                                 else
> >                                         goto error_out;
> >                         }
> > +               } else if (S_ISLNK(st.st_mode)) {
> 
> As far as I can tell, we could keep the symlink strbuf
> at a smaller scope here? (I was surprised how many strbufs
> are declared at the beginning of the function)

Yeah I can push it down in scope.  There will be a bit more allocation
churn with the smaller scope but multiple symlinks should be rare?
Alternatively the 'next' buffer can be reused...I decided against that
initially due to readability.  And yes, lots of string manipulation
requires lots of strbufs :)

> > +       //strbuf_release(&resolved);
> 
> This is why the cover letter toned down expectations ?
> (no // as comment, maybe remove that line?)

yep.  It will be added back in though once the callers to real_path take
ownership of the memory.

-- 
Brandon Williams

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH] real_path: make real_path thread-safe
  2016-12-05 18:58 ` [PATCH] real_path: make " Brandon Williams
  2016-12-05 19:57   ` Stefan Beller
@ 2016-12-05 20:14   ` Stefan Beller
  2016-12-05 20:16     ` Brandon Williams
  2016-12-06 23:44   ` Junio C Hamano
  2 siblings, 1 reply; 83+ messages in thread
From: Stefan Beller @ 2016-12-05 20:14 UTC (permalink / raw)
  To: Brandon Williams; +Cc: git@vger.kernel.org, Jeff King, Jacob Keller

>  static const char *real_path_internal(const char *path, int die_on_error)
>  {
> -       static struct strbuf sb = STRBUF_INIT;
> +       static struct strbuf resolved = STRBUF_INIT;

Also by having this static here, it is not quite thread safe, yet.

By removing the static here we cannot do the early cheap check as:

>         /* We've already done it */
> -       if (path == sb.buf)
> +       if (path == resolved.buf)
>                 return path;

I wonder how often we run into this case; are there some callers explicitly
relying on real_path_internal being cheap for repeated calls?
(Maybe run the test suite with this early return instrumented? Not sure how
to assess the impact of removing the cheap out return optimization)

The long tail (i.e. the actual functionality) should actually be
faster, I'd imagine
as we do less than with using chdir.

Thanks,
Stefan

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH] real_path: make real_path thread-safe
  2016-12-05 20:14   ` Stefan Beller
@ 2016-12-05 20:16     ` Brandon Williams
  2016-12-08  9:41       ` Duy Nguyen
  0 siblings, 1 reply; 83+ messages in thread
From: Brandon Williams @ 2016-12-05 20:16 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git@vger.kernel.org, Jeff King, Jacob Keller

On 12/05, Stefan Beller wrote:
> >  static const char *real_path_internal(const char *path, int die_on_error)
> >  {
> > -       static struct strbuf sb = STRBUF_INIT;
> > +       static struct strbuf resolved = STRBUF_INIT;
> 
> Also by having this static here, it is not quite thread safe, yet.
> 
> By removing the static here we cannot do the early cheap check as:
> 
> >         /* We've already done it */
> > -       if (path == sb.buf)
> > +       if (path == resolved.buf)
> >                 return path;
> 
> I wonder how often we run into this case; are there some callers explicitly
> relying on real_path_internal being cheap for repeated calls?
> (Maybe run the test suite with this early return instrumented? Not sure how
> to assess the impact of removing the cheap out return optimization)
> 
> The long tail (i.e. the actual functionality) should actually be
> faster, I'd imagine
> as we do less than with using chdir.

Depends on how expensive the chdir calls were.  And I'm working to get
rid of the static buffer.  Just need have the callers own the memory
first.

-- 
Brandon Williams

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH] real_path: make real_path thread-safe
  2016-12-05 20:12     ` Brandon Williams
@ 2016-12-05 20:38       ` Stefan Beller
  0 siblings, 0 replies; 83+ messages in thread
From: Stefan Beller @ 2016-12-05 20:38 UTC (permalink / raw)
  To: Brandon Williams; +Cc: git@vger.kernel.org, Jeff King, Jacob Keller

On Mon, Dec 5, 2016 at 12:12 PM, Brandon Williams <bmwill@google.com> wrote:

>> > +       if (path->len > 1) {
>> > +               char *last_slash = find_last_dir_sep(path->buf);
>>
>> What happens when there is no dir_sep?
>
> There should always be a dir_sep since that only gets run if the passed
> in path at least contains root '/'

Oh, sure, that makes sense. When porting/running this on Windows, does
the assumption still hold?

>>     if (strbuf_getcwd(&sb))
>>         die_errno(_("unable to get current working directory"));
>>
>> Not sure if aligning them would be a good idea?
>>
>> Going by "git grep die_errno" as well as our Coding guidelines,
>> we don't want to see capitalized error messages.
>
> K I can use the other msg.

Well this wasn't a rhetorical question, but I was genuine wondering
if that was worth it.

When having different error messages in different places,
it makes debugging easier, because you have fewer starting points.

But this function is deep down in the stack, such that you would expect
other error messages to also show up , so I dunno.

>> > +               } else if (S_ISLNK(st.st_mode)) {
>>
>> As far as I can tell, we could keep the symlink strbuf
>> at a smaller scope here? (I was surprised how many strbufs
>> are declared at the beginning of the function)
>
> Yeah I can push it down in scope.  There will be a bit more allocation
> churn with the smaller scope but multiple symlinks should be rare?
> Alternatively the 'next' buffer can be reused...I decided against that
> initially due to readability.

I'd second to not reuse 'next'. :)
I guess we could keep the less churn-y version then.

>  And yes, lots of string manipulation
> requires lots of strbufs :)

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH] real_path: make real_path thread-safe
  2016-12-05 18:58 ` [PATCH] real_path: make " Brandon Williams
  2016-12-05 19:57   ` Stefan Beller
  2016-12-05 20:14   ` Stefan Beller
@ 2016-12-06 23:44   ` Junio C Hamano
  2016-12-07  0:10     ` Brandon Williams
  2 siblings, 1 reply; 83+ messages in thread
From: Junio C Hamano @ 2016-12-06 23:44 UTC (permalink / raw)
  To: Brandon Williams; +Cc: git, sbeller, peff, jacob.keller

Brandon Williams <bmwill@google.com> writes:

> +/* removes the last path component from 'path' except if 'path' is root */
> +static void strip_last_component(struct strbuf *path)
> +{
> +	if (path->len > 1) {
> +		char *last_slash = find_last_dir_sep(path->buf);
> +		strbuf_setlen(path, last_slash - path->buf);
> +	}
> +}

You use find_last_dir_sep() which takes care of "Windows uses
backslash" issue.  Is this function expected to be fed something
like "C:\My Files\foo.txt" and more importantly "C:\My Files"?  Or
is that handled by a lot higher level up in the callchain?  I am
reacting the comparison of path->len and 1 here.

Also is the input expected to be normalized?  Are we expected to be
fed something like "/a//b/./c/../d/e" and react sensibly, or is that
handled by a lot higher level up in the callchain?

> +/* gets the next component in 'remaining' and places it in 'next' */
> +static void get_next_component(struct strbuf *next, struct strbuf *remaining)
> +{
> +	char *start = NULL;
> +	char *end = NULL;
> +
> +	strbuf_reset(next);
> +
> +	/* look for the next component */
> +	/* Skip sequences of multiple path-separators */
> +	for (start = remaining->buf; is_dir_sep(*start); start++)
> +		/* nothing */;

Style:
		; /* nothing */

> +	/* Find end of the path component */
> +	for (end = start; *end && !is_dir_sep(*end); end++)
> +		/* nothing */;
> +
> +	strbuf_add(next, start, end - start);

OK, so this was given "///foo/bar" in "remaining" and appended
'foo/' to "next".  I.e. deduping of slashes is handled here.

POSIX cares about treating "//" at the very beginning of the path
specially.  Is that supposed to be handled here, or by a lot higher
level up in the callchain?

> +	/* remove the component from 'remaining' */
> +	strbuf_remove(remaining, 0, end - remaining->buf);
> +}
> +
>  /* We allow "recursive" symbolic links. Only within reason, though. */
> -#define MAXDEPTH 5
> +#define MAXSYMLINKS 5
>  
>  /*
>   * Return the real path (i.e., absolute path, with symlinks resolved
> @@ -21,7 +51,6 @@ int is_directory(const char *path)
>   * absolute_path().)  The return value is a pointer to a static
>   * buffer.
>   *
>   * The directory part of path (i.e., everything up to the last
>   * dir_sep) must denote a valid, existing directory, but the last
>   * component need not exist.  If die_on_error is set, then die with an
> @@ -33,22 +62,16 @@ int is_directory(const char *path)
>   */
>  static const char *real_path_internal(const char *path, int die_on_error)
>  {
> +	static struct strbuf resolved = STRBUF_INIT;

This being 'static' would probably mean that this is not reentrant,
which goes against the title of the patch.

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH] real_path: make real_path thread-safe
  2016-12-06 23:44   ` Junio C Hamano
@ 2016-12-07  0:10     ` Brandon Williams
  2016-12-07  1:12       ` Ramsay Jones
  2016-12-07 20:43       ` Johannes Sixt
  0 siblings, 2 replies; 83+ messages in thread
From: Brandon Williams @ 2016-12-07  0:10 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, sbeller, peff, jacob.keller

On 12/06, Junio C Hamano wrote:
> Brandon Williams <bmwill@google.com> writes:
> 
> > +/* removes the last path component from 'path' except if 'path' is root */
> > +static void strip_last_component(struct strbuf *path)
> > +{
> > +	if (path->len > 1) {
> > +		char *last_slash = find_last_dir_sep(path->buf);
> > +		strbuf_setlen(path, last_slash - path->buf);
> > +	}
> > +}
> 
> You use find_last_dir_sep() which takes care of "Windows uses
> backslash" issue.  Is this function expected to be fed something
> like "C:\My Files\foo.txt" and more importantly "C:\My Files"?  Or
> is that handled by a lot higher level up in the callchain?  I am
> reacting the comparison of path->len and 1 here.

This function should accept both absolute and relative paths, which
means it should probably accept "C:\My Files".  I wasn't thinking about
windows 100% of the time while writing this so I'm hoping that a windows
expert will point things like this out to me :).  So in reality this
check shouldn't be (path->len > 1) but rather some function is_only_root
which would check if the strbuf only contains a string which looks like
root.

> 
> Also is the input expected to be normalized?  Are we expected to be
> fed something like "/a//b/./c/../d/e" and react sensibly, or is that
> handled by a lot higher level up in the callchain?

Yes it should handle paths like that sensibly.

> > +/* gets the next component in 'remaining' and places it in 'next' */
> > +static void get_next_component(struct strbuf *next, struct strbuf *remaining)
> > +{
> > +	char *start = NULL;
> > +	char *end = NULL;
> > +
> > +	strbuf_reset(next);
> > +
> > +	/* look for the next component */
> > +	/* Skip sequences of multiple path-separators */
> > +	for (start = remaining->buf; is_dir_sep(*start); start++)
> > +		/* nothing */;
> 
> Style:
> 		; /* nothing */

k will fix.

> 
> > +	/* Find end of the path component */
> > +	for (end = start; *end && !is_dir_sep(*end); end++)
> > +		/* nothing */;
> > +
> > +	strbuf_add(next, start, end - start);
> 
> OK, so this was given "///foo/bar" in "remaining" and appended
> 'foo/' to "next".  I.e. deduping of slashes is handled here.
> 
> POSIX cares about treating "//" at the very beginning of the path
> specially.  Is that supposed to be handled here, or by a lot higher
> level up in the callchain?

What exactly does "//" mean in this context? (I'm just naive in this
area)

> 
> > +	/* remove the component from 'remaining' */
> > +	strbuf_remove(remaining, 0, end - remaining->buf);
> > +}
> > +
> >  /* We allow "recursive" symbolic links. Only within reason, though. */
> > -#define MAXDEPTH 5
> > +#define MAXSYMLINKS 5
> >  
> >  /*
> >   * Return the real path (i.e., absolute path, with symlinks resolved
> > @@ -21,7 +51,6 @@ int is_directory(const char *path)
> >   * absolute_path().)  The return value is a pointer to a static
> >   * buffer.
> >   *
> >   * The directory part of path (i.e., everything up to the last
> >   * dir_sep) must denote a valid, existing directory, but the last
> >   * component need not exist.  If die_on_error is set, then die with an
> > @@ -33,22 +62,16 @@ int is_directory(const char *path)
> >   */
> >  static const char *real_path_internal(const char *path, int die_on_error)
> >  {
> > +	static struct strbuf resolved = STRBUF_INIT;
> 
> This being 'static' would probably mean that this is not reentrant,
> which goes against the title of the patch.

Yes I mentioned in the cover letter that this was the one last thing
that needed to be changed to make it reentrant.  I wanted to get this
out for review sooner since this is the biggest change (getting rid of
the chdir() calls) and dropping the static here would just require
making the callers own the memory which should hopefully be an easy
refactor.  It just would have taken a bit more time to actually do that
refactor now.  I'm slowly working on it while this is being reviewed and
could be ready to send out when rerolling this patch.  Sorry for having
a slightly misleading patch title :)

-- 
Brandon Williams

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH] real_path: make real_path thread-safe
  2016-12-07  0:10     ` Brandon Williams
@ 2016-12-07  1:12       ` Ramsay Jones
  2016-12-07 20:14         ` Torsten Bögershausen
  2016-12-07 20:43       ` Johannes Sixt
  1 sibling, 1 reply; 83+ messages in thread
From: Ramsay Jones @ 2016-12-07  1:12 UTC (permalink / raw)
  To: Brandon Williams, Junio C Hamano; +Cc: git, sbeller, peff, jacob.keller



On 07/12/16 00:10, Brandon Williams wrote:
> On 12/06, Junio C Hamano wrote:
>> POSIX cares about treating "//" at the very beginning of the path
>> specially.  Is that supposed to be handled here, or by a lot higher
>> level up in the callchain?
> 
> What exactly does "//" mean in this context? (I'm just naive in this
> area)

This refers to a UNC path (ie Universal Naming Convention) which
is used to refer to servers, printers and other 'network resources'.
Although this started (many moons ago) in unix, it isn't used too
much outside of windows networks! (where it is usually denoted by
\\servername\path).

You can see the relics of unix UNC paths if you look at the wording
for basename() in the POSIX standard. Note the special treatment of
the path which 'is exactly "//"', see http://pubs.opengroup.org/onlinepubs/009695399/functions/basename.html

ATB,
Ramsay Jones

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH] real_path: make real_path thread-safe
  2016-12-07  1:12       ` Ramsay Jones
@ 2016-12-07 20:14         ` Torsten Bögershausen
  2016-12-07 20:32           ` Junio C Hamano
  0 siblings, 1 reply; 83+ messages in thread
From: Torsten Bögershausen @ 2016-12-07 20:14 UTC (permalink / raw)
  To: Ramsay Jones
  Cc: Brandon Williams, Junio C Hamano, git, sbeller, peff,
	jacob.keller

On Wed, Dec 07, 2016 at 01:12:25AM +0000, Ramsay Jones wrote:
> 
> 
> On 07/12/16 00:10, Brandon Williams wrote:
> > On 12/06, Junio C Hamano wrote:
> >> POSIX cares about treating "//" at the very beginning of the path
> >> specially.  Is that supposed to be handled here, or by a lot higher
> >> level up in the callchain?
> > 
> > What exactly does "//" mean in this context? (I'm just naive in this
> > area)
> 
> This refers to a UNC path (ie Universal Naming Convention) which
> is used to refer to servers, printers and other 'network resources'.
> Although this started (many moons ago) in unix, it isn't used too
> much outside of windows networks! (where it is usually denoted by
> \\servername\path).
> 
> You can see the relics of unix UNC paths if you look at the wording
> for basename() in the POSIX standard. Note the special treatment of
> the path which 'is exactly "//"', see http://pubs.opengroup.org/onlinepubs/009695399/functions/basename.html
> 
> ATB,
> Ramsay Jones

Please allow one more comment about UNC:
They are used under Windows, and typically wotk together with Git.
One breakage between 2.10 and 2.11 has been observed, saying that
pushing to \\SERVER\SHARE\DIRECTORY does not work any more.

It has been reported under
 "git 2.11.0 error when pushing to remote located on a windows share"
both here and 
here:
https://github.com/git-for-windows/git/issues/979#issuecomment-264816175

I don't have a Windows box at the moment, and I don't know if the
breakage was introduced by changes in real_patyh().

But in any case it seems that e.g.
//SEFVER/SHARE/DIR1/DIR2/..
must be converted into
//SEFVER/SHARE/DIR1

and 
\\SEFVER\SHARE\DIR1\DIR2\..
must be converted into
\\SEFVER\SHARE\DIR1





^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH] real_path: make real_path thread-safe
  2016-12-07 20:14         ` Torsten Bögershausen
@ 2016-12-07 20:32           ` Junio C Hamano
  2016-12-07 22:13             ` Brandon Williams
  0 siblings, 1 reply; 83+ messages in thread
From: Junio C Hamano @ 2016-12-07 20:32 UTC (permalink / raw)
  To: Torsten Bögershausen
  Cc: Ramsay Jones, Brandon Williams, git, sbeller, peff, jacob.keller

Torsten Bögershausen <tboegi@web.de> writes:

> But in any case it seems that e.g.
> //SEFVER/SHARE/DIR1/DIR2/..
> must be converted into
> //SEFVER/SHARE/DIR1
>
> and 
> \\SEFVER\SHARE\DIR1\DIR2\..
> must be converted into
> \\SEFVER\SHARE\DIR1

Additional questions that may be interesting are:

    //A/B/../C		is it //A/C?  is it an error?
    //A/B/../../C/D	is it //C/D?  is it an error?


^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH] real_path: make real_path thread-safe
  2016-12-07  0:10     ` Brandon Williams
  2016-12-07  1:12       ` Ramsay Jones
@ 2016-12-07 20:43       ` Johannes Sixt
  2016-12-07 22:29         ` Brandon Williams
  1 sibling, 1 reply; 83+ messages in thread
From: Johannes Sixt @ 2016-12-07 20:43 UTC (permalink / raw)
  To: Brandon Williams; +Cc: Junio C Hamano, git, sbeller, peff, jacob.keller

Am 07.12.2016 um 01:10 schrieb Brandon Williams:
> This function should accept both absolute and relative paths, which
> means it should probably accept "C:\My Files".  I wasn't thinking about
> windows 100% of the time while writing this so I'm hoping that a windows
> expert will point things like this out to me :).

;)

With this patch, the test suite fails at the very first git init call:

D:\Src\mingw-git\t>sh t0000-basic.sh -v -i -x
fatal: Invalid path '/:': No such file or directory
error: cannot run git init -- have you built things yet?
FATAL: Unexpected exit with code 1

I haven't dug further, yet.

-- Hannes


^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH] real_path: make real_path thread-safe
  2016-12-07 20:32           ` Junio C Hamano
@ 2016-12-07 22:13             ` Brandon Williams
  2016-12-08  7:55               ` Torsten Bögershausen
  0 siblings, 1 reply; 83+ messages in thread
From: Brandon Williams @ 2016-12-07 22:13 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Torsten Bögershausen, Ramsay Jones, git, sbeller, peff,
	jacob.keller

On 12/07, Junio C Hamano wrote:
> Torsten Bögershausen <tboegi@web.de> writes:
> 
> > But in any case it seems that e.g.
> > //SEFVER/SHARE/DIR1/DIR2/..
> > must be converted into
> > //SEFVER/SHARE/DIR1
> >
> > and 
> > \\SEFVER\SHARE\DIR1\DIR2\..
> > must be converted into
> > \\SEFVER\SHARE\DIR1
> 
> Additional questions that may be interesting are:
> 
>     //A/B/../C		is it //A/C?  is it an error?
>     //A/B/../../C/D	is it //C/D?  is it an error?
> 


Also is //.. the same as //?  I would assume so since /.. is /

-- 
Brandon Williams

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH] real_path: make real_path thread-safe
  2016-12-07 20:43       ` Johannes Sixt
@ 2016-12-07 22:29         ` Brandon Williams
  2016-12-08 11:32           ` Johannes Sixt
  0 siblings, 1 reply; 83+ messages in thread
From: Brandon Williams @ 2016-12-07 22:29 UTC (permalink / raw)
  To: Johannes Sixt; +Cc: Junio C Hamano, git, sbeller, peff, jacob.keller

On 12/07, Johannes Sixt wrote:
> Am 07.12.2016 um 01:10 schrieb Brandon Williams:
> >This function should accept both absolute and relative paths, which
> >means it should probably accept "C:\My Files".  I wasn't thinking about
> >windows 100% of the time while writing this so I'm hoping that a windows
> >expert will point things like this out to me :).
> 
> ;)
> 
> With this patch, the test suite fails at the very first git init call:
> 
> D:\Src\mingw-git\t>sh t0000-basic.sh -v -i -x
> fatal: Invalid path '/:': No such file or directory
> error: cannot run git init -- have you built things yet?
> FATAL: Unexpected exit with code 1
> 
> I haven't dug further, yet.
> 
> -- Hannes
> 

Thanks for providing me with the error.  Instead of assuming root is "/"
I'll need to extract what root is from an absolute path.  Aside from
what root looks like, do most other path constructs behave similarly in
unix and windows? (like ".." and "." as examples)

Since I don't really have a windows machine to test things it might be
slightly difficult to get everything correct quickly but hopefully we can
get this working :)

-- 
Brandon Williams

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH] real_path: make real_path thread-safe
  2016-12-07 22:13             ` Brandon Williams
@ 2016-12-08  7:55               ` Torsten Bögershausen
  2016-12-08 18:41                 ` Johannes Sixt
  0 siblings, 1 reply; 83+ messages in thread
From: Torsten Bögershausen @ 2016-12-08  7:55 UTC (permalink / raw)
  To: Brandon Williams
  Cc: Junio C Hamano, Ramsay Jones, git, sbeller, peff, jacob.keller

On Wed, Dec 07, 2016 at 02:13:35PM -0800, Brandon Williams wrote:
> On 12/07, Junio C Hamano wrote:
> > Torsten Bögershausen <tboegi@web.de> writes:
> > 
> > > But in any case it seems that e.g.
> > > //SEFVER/SHARE/DIR1/DIR2/..
> > > must be converted into
> > > //SEFVER/SHARE/DIR1
> > >
> > > and 
> > > \\SEFVER\SHARE\DIR1\DIR2\..
> > > must be converted into
> > > \\SEFVER\SHARE\DIR1
> > 
> > Additional questions that may be interesting are:
> > 
> >     //A/B/../C		is it //A/C?  is it an error?
Yes, at least under Windows.
If I have e.g. a Raspi with SAMBA, I can put a git Repository here: 

//raspi/torsten/projects/git
If I use
git push //raspi/torsten/../junio/projects/git
that should be an error.

> >     //A/B/../../C/D	is it //C/D?  is it an error?
> > 

Same for
git push /raspi/../raspi2/torsten//projects/git


> 
> 
> Also is //.. the same as //?  I would assume so since /.. is /
> 
Under Windows //.. is simply illegal, I would say.
The documentation here
https://msdn.microsoft.com/en-us/library/windows/desktop/aa365247(v=vs.85).aspx

Mentions these 2 examples, what to feed into the WIN32 file API:

a)
\\?\D:\very-long-path

b)
\\server\share\path\file"

c)
"\\?\UNC\server\share\path" 

So whatever we do, the ".." resoltion is only allowed to look at 
"very-long-path" or "path".

Some conversion may be done in mingw.c:
https://github.com/github/git-msysgit/blob/master/compat/mingw.c
So what I understand, '/' in Git are already converted into '\' if needed ?

It seams that we may wnat a function get_start_of_path(uncpath),
which returns:

get_start_of_path_win("//?/D:/very-long-path")         "/very-long-path" 
get_start_of_path_win("//server/share/path/file")      "/path/file"
get_start_of_path_win("//?/UNC/server/share/path")     "/path"
(I don't know if we need the variant with '\', but is would'n hurt):
get_start_of_path_win("\\\\?\\D:\\very-long-path")         "\\very-long-path" 
get_start_of_path_win("\\\\server\\share\\path\\file")      "\\path\\file"
get_start_of_path_win("\\\\?\\UNC\\server\\share\\path")     "\\path"

Then the non-windows version could simply return
get_start_of_path_non_win(something)     something

Does this make sense ?


 



^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH] real_path: make real_path thread-safe
  2016-12-05 20:16     ` Brandon Williams
@ 2016-12-08  9:41       ` Duy Nguyen
  2016-12-08 17:50         ` Brandon Williams
  0 siblings, 1 reply; 83+ messages in thread
From: Duy Nguyen @ 2016-12-08  9:41 UTC (permalink / raw)
  To: Brandon Williams
  Cc: Stefan Beller, git@vger.kernel.org, Jeff King, Jacob Keller

On Tue, Dec 6, 2016 at 3:16 AM, Brandon Williams <bmwill@google.com> wrote:
> On 12/05, Stefan Beller wrote:
>> >  static const char *real_path_internal(const char *path, int die_on_error)
>> >  {
>> > -       static struct strbuf sb = STRBUF_INIT;
>> > +       static struct strbuf resolved = STRBUF_INIT;
>>
>> Also by having this static here, it is not quite thread safe, yet.
>>
>> By removing the static here we cannot do the early cheap check as:
>>
>> >         /* We've already done it */
>> > -       if (path == sb.buf)
>> > +       if (path == resolved.buf)
>> >                 return path;
>>
>> I wonder how often we run into this case; are there some callers explicitly
>> relying on real_path_internal being cheap for repeated calls?
>> (Maybe run the test suite with this early return instrumented? Not sure how
>> to assess the impact of removing the cheap out return optimization)
>>
>> The long tail (i.e. the actual functionality) should actually be
>> faster, I'd imagine
>> as we do less than with using chdir.
>
> Depends on how expensive the chdir calls were.  And I'm working to get
> rid of the static buffer.  Just need have the callers own the memory
> first.

I suggest you turn this real_path_internal into strbuf_real_path. In
other words, it takes a strbuf and writes the result there, allocating
memory if needed.

This function can replace the two strbuf_addstr(..., real_path(..));
we have in setup.c and sha1_file.c. real_path() can own a static
strbuf buffer to retain current behavior. We could also have a new
wrapper real_pathdup() around strbuf_real_path(), which can replace 9
instances of xstrdup(real_path(...)) (and Stefan is adding a few more;
that's what led me back to these mails)
-- 
Duy

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH] real_path: make real_path thread-safe
  2016-12-07 22:29         ` Brandon Williams
@ 2016-12-08 11:32           ` Johannes Sixt
  2016-12-08 16:54             ` Junio C Hamano
  0 siblings, 1 reply; 83+ messages in thread
From: Johannes Sixt @ 2016-12-08 11:32 UTC (permalink / raw)
  To: Brandon Williams; +Cc: Junio C Hamano, git, sbeller, peff, jacob.keller

Am 07.12.2016 um 23:29 schrieb Brandon Williams:
> Instead of assuming root is "/"
> I'll need to extract what root is from an absolute path.  Aside from
> what root looks like, do most other path constructs behave similarly in
> unix and windows? (like ".." and "." as examples)

Yes, .. and . work the same way, except that they cannot appear in the 
\\server\share part. I also think that .. does not cancel these parts.

As long as you use is_absolute_path() and do not simplify path 
components before offset_1st_component(), you should be on the safe side.

> Since I don't really have a windows machine to test things it might be
> slightly difficult to get everything correct quickly but hopefully we can
> get this working :)

I'll lend a hand, of course, as time permits.

-- Hannes


^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH] real_path: make real_path thread-safe
  2016-12-08 11:32           ` Johannes Sixt
@ 2016-12-08 16:54             ` Junio C Hamano
  0 siblings, 0 replies; 83+ messages in thread
From: Junio C Hamano @ 2016-12-08 16:54 UTC (permalink / raw)
  To: Johannes Sixt; +Cc: Brandon Williams, git, sbeller, peff, jacob.keller

Johannes Sixt <j6t@kdbg.org> writes:

> Am 07.12.2016 um 23:29 schrieb Brandon Williams:
>> Instead of assuming root is "/"
>> I'll need to extract what root is from an absolute path.  Aside from
>> what root looks like, do most other path constructs behave similarly in
>> unix and windows? (like ".." and "." as examples)
>
> Yes, .. and . work the same way, except that they cannot appear in the
> \\server\share part. I also think that .. does not cancel these parts.
>
> As long as you use is_absolute_path() and do not simplify path
> components before offset_1st_component(), you should be on the safe
> side.
>
>> Since I don't really have a windows machine to test things it might be
>> slightly difficult to get everything correct quickly but hopefully we can
>> get this working :)
>
> I'll lend a hand, of course, as time permits.

Thanks, as always, for working well together ;-).

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH] real_path: make real_path thread-safe
  2016-12-08  9:41       ` Duy Nguyen
@ 2016-12-08 17:50         ` Brandon Williams
  0 siblings, 0 replies; 83+ messages in thread
From: Brandon Williams @ 2016-12-08 17:50 UTC (permalink / raw)
  To: Duy Nguyen; +Cc: Stefan Beller, git@vger.kernel.org, Jeff King, Jacob Keller

On 12/08, Duy Nguyen wrote:
> On Tue, Dec 6, 2016 at 3:16 AM, Brandon Williams <bmwill@google.com> wrote:
> > On 12/05, Stefan Beller wrote:
> >> >  static const char *real_path_internal(const char *path, int die_on_error)
> >> >  {
> >> > -       static struct strbuf sb = STRBUF_INIT;
> >> > +       static struct strbuf resolved = STRBUF_INIT;
> >>
> >> Also by having this static here, it is not quite thread safe, yet.
> >>
> >> By removing the static here we cannot do the early cheap check as:
> >>
> >> >         /* We've already done it */
> >> > -       if (path == sb.buf)
> >> > +       if (path == resolved.buf)
> >> >                 return path;
> >>
> >> I wonder how often we run into this case; are there some callers explicitly
> >> relying on real_path_internal being cheap for repeated calls?
> >> (Maybe run the test suite with this early return instrumented? Not sure how
> >> to assess the impact of removing the cheap out return optimization)
> >>
> >> The long tail (i.e. the actual functionality) should actually be
> >> faster, I'd imagine
> >> as we do less than with using chdir.
> >
> > Depends on how expensive the chdir calls were.  And I'm working to get
> > rid of the static buffer.  Just need have the callers own the memory
> > first.
> 
> I suggest you turn this real_path_internal into strbuf_real_path. In
> other words, it takes a strbuf and writes the result there, allocating
> memory if needed.
> 
> This function can replace the two strbuf_addstr(..., real_path(..));
> we have in setup.c and sha1_file.c. real_path() can own a static
> strbuf buffer to retain current behavior. We could also have a new
> wrapper real_pathdup() around strbuf_real_path(), which can replace 9
> instances of xstrdup(real_path(...)) (and Stefan is adding a few more;
> that's what led me back to these mails)

Sounds like a plan, thanks for the advice.

-- 
Brandon Williams

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH] real_path: make real_path thread-safe
  2016-12-08  7:55               ` Torsten Bögershausen
@ 2016-12-08 18:41                 ` Johannes Sixt
  2016-12-08 19:02                   ` Brandon Williams
  0 siblings, 1 reply; 83+ messages in thread
From: Johannes Sixt @ 2016-12-08 18:41 UTC (permalink / raw)
  To: Torsten Bögershausen
  Cc: Brandon Williams, Junio C Hamano, Ramsay Jones, git, sbeller,
	peff, jacob.keller

Am 08.12.2016 um 08:55 schrieb Torsten Bögershausen:
> Some conversion may be done in mingw.c:
> https://github.com/github/git-msysgit/blob/master/compat/mingw.c
> So what I understand, '/' in Git are already converted into '\' if needed ?

Only if needed, and there are not many places where this is the case. 
(Actually, I can't find one place where we do.) In particular, typical 
file accesses does not require the conversion.

> It seams that we may wnat a function get_start_of_path(uncpath),
> which returns:
>
> get_start_of_path_win("//?/D:/very-long-path")         "/very-long-path"

We have offset_1st_component().

-- Hannes


^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH] real_path: make real_path thread-safe
  2016-12-08 18:41                 ` Johannes Sixt
@ 2016-12-08 19:02                   ` Brandon Williams
  0 siblings, 0 replies; 83+ messages in thread
From: Brandon Williams @ 2016-12-08 19:02 UTC (permalink / raw)
  To: Johannes Sixt
  Cc: Torsten Bögershausen, Junio C Hamano, Ramsay Jones, git,
	sbeller, peff, jacob.keller

On 12/08, Johannes Sixt wrote:
> Am 08.12.2016 um 08:55 schrieb Torsten Bögershausen:
> >Some conversion may be done in mingw.c:
> >https://github.com/github/git-msysgit/blob/master/compat/mingw.c
> >So what I understand, '/' in Git are already converted into '\' if needed ?
> 
> Only if needed, and there are not many places where this is the
> case. (Actually, I can't find one place where we do.) In particular,
> typical file accesses does not require the conversion.

That's convenient, that way I don't have to worry about using '\' as a
directory separator instead of '/'.

> 
> >It seams that we may wnat a function get_start_of_path(uncpath),
> >which returns:
> >
> >get_start_of_path_win("//?/D:/very-long-path")         "/very-long-path"
> 
> We have offset_1st_component().

Thanks for letting me know this function exists, that makes my job a bit
easier.

-- 
Brandon Williams

^ permalink raw reply	[flat|nested] 83+ messages in thread

* [PATCH v2 0/4] road to reentrant real_path
  2016-12-05 18:58 [PATCH] making real_path thread-safe Brandon Williams
  2016-12-05 18:58 ` [PATCH] real_path: make " Brandon Williams
@ 2016-12-08 23:58 ` Brandon Williams
  2016-12-08 23:58   ` [PATCH v2 1/4] real_path: resolve symlinks by hand Brandon Williams
                     ` (5 more replies)
  1 sibling, 6 replies; 83+ messages in thread
From: Brandon Williams @ 2016-12-08 23:58 UTC (permalink / raw)
  To: git
  Cc: Brandon Williams, sbeller, peff, jacob.keller, gitster, ramsay,
	tboegi, j6t, pclouds

Thanks for all of the comments on v1 of the series.  Hopefully this series
addresses the issues with windows and actually passes the first test :)

Some changes in v2:
* the 1st component of a path should now be handled correctly on windows as well
  as unix.
* Pushed the static strbuf to the callers of real_path_internal
* renamed real_path_internal to strbuf_realpath
* added real_pathdup
* migrated some callers of real_path to real_pathdup and strbuf_realpath

Brandon Williams (4):
  real_path: resolve symlinks by hand
  real_path: convert real_path_internal to strbuf_realpath
  real_path: create real_pathdup
  real_path: have callers use real_pathdup and strbuf_realpath

 abspath.c         | 215 ++++++++++++++++++++++++++++++++++++------------------
 builtin/init-db.c |   6 +-
 cache.h           |   3 +
 environment.c     |   2 +-
 setup.c           |  15 ++--
 sha1_file.c       |   2 +-
 submodule.c       |   2 +-
 transport.c       |   2 +-
 worktree.c        |   2 +-
 9 files changed, 163 insertions(+), 86 deletions(-)

--- interdiff from v1

diff --git a/abspath.c b/abspath.c
index 6f546e0..df37356 100644
--- a/abspath.c
+++ b/abspath.c
@@ -14,13 +14,13 @@ int is_directory(const char *path)
 /* removes the last path component from 'path' except if 'path' is root */
 static void strip_last_component(struct strbuf *path)
 {
-	if (path->len > 1) {
+	if (path->len > offset_1st_component(path->buf)) {
 		char *last_slash = find_last_dir_sep(path->buf);
 		strbuf_setlen(path, last_slash - path->buf);
 	}
 }
 
-/* gets the next component in 'remaining' and places it in 'next' */
+/* get (and remove) the next component in 'remaining' and place it in 'next' */
 static void get_next_component(struct strbuf *next, struct strbuf *remaining)
 {
 	char *start = NULL;
@@ -31,10 +31,10 @@ static void get_next_component(struct strbuf *next, struct strbuf *remaining)
 	/* look for the next component */
 	/* Skip sequences of multiple path-separators */
 	for (start = remaining->buf; is_dir_sep(*start); start++)
-		/* nothing */;
+		; /* nothing */
 	/* Find end of the path component */
 	for (end = start; *end && !is_dir_sep(*end); end++)
-		/* nothing */;
+		; /* nothing */
 
 	strbuf_add(next, start, end - start);
 	/* remove the component from 'remaining' */
@@ -48,21 +48,17 @@ static void get_next_component(struct strbuf *next, struct strbuf *remaining)
  * Return the real path (i.e., absolute path, with symlinks resolved
  * and extra slashes removed) equivalent to the specified path.  (If
  * you want an absolute path but don't mind links, use
- * absolute_path().)  The return value is a pointer to a static
- * buffer.
+ * absolute_path().)  Places the resolved realpath in the provided strbuf.
  *
  * The directory part of path (i.e., everything up to the last
  * dir_sep) must denote a valid, existing directory, but the last
  * component need not exist.  If die_on_error is set, then die with an
  * informative error message if there is a problem.  Otherwise, return
  * NULL on errors (without generating any output).
- *
- * If path is our buffer, then return path, as it's already what the
- * user wants.
  */
-static const char *real_path_internal(const char *path, int die_on_error)
+char *strbuf_realpath(struct strbuf *resolved, const char *path,
+		      int die_on_error)
 {
-	static struct strbuf resolved = STRBUF_INIT;
 	struct strbuf remaining = STRBUF_INIT;
 	struct strbuf next = STRBUF_INIT;
 	struct strbuf symlink = STRBUF_INIT;
@@ -70,10 +66,6 @@ static const char *real_path_internal(const char *path, int die_on_error)
 	int num_symlinks = 0;
 	struct stat st;
 
-	/* We've already done it */
-	if (path == resolved.buf)
-		return path;
-
 	if (!*path) {
 		if (die_on_error)
 			die("The empty string is not a valid path");
@@ -81,17 +73,18 @@ static const char *real_path_internal(const char *path, int die_on_error)
 			goto error_out;
 	}
 
-	strbuf_reset(&resolved);
+	strbuf_reset(resolved);
 
 	if (is_absolute_path(path)) {
 		/* absolute path; start with only root as being resolved */
-		strbuf_addch(&resolved, '/');
-		strbuf_addstr(&remaining, path + 1);
+		int offset = offset_1st_component(path);
+		strbuf_add(resolved, path, offset);
+		strbuf_addstr(&remaining, path + offset);
 	} else {
 		/* relative path; can use CWD as the initial resolved path */
-		if (strbuf_getcwd(&resolved)) {
+		if (strbuf_getcwd(resolved)) {
 			if (die_on_error)
-				die_errno("Could not get current working directory");
+				die_errno("unable to get current working directory");
 			else
 				goto error_out;
 		}
@@ -108,21 +101,21 @@ static const char *real_path_internal(const char *path, int die_on_error)
 			continue; /* '.' component */
 		} else if (next.len == 2 && !strcmp(next.buf, "..")) {
 			/* '..' component; strip the last path component */
-			strip_last_component(&resolved);
+			strip_last_component(resolved);
 			continue;
 		}
 
 		/* append the next component and resolve resultant path */
-		if (resolved.buf[resolved.len - 1] != '/')
-			strbuf_addch(&resolved, '/');
-		strbuf_addbuf(&resolved, &next);
+		if (!is_dir_sep(resolved->buf[resolved->len - 1]))
+			strbuf_addch(resolved, '/');
+		strbuf_addbuf(resolved, &next);
 
-		if (lstat(resolved.buf, &st)) {
+		if (lstat(resolved->buf, &st)) {
 			/* error out unless this was the last component */
 			if (!(errno == ENOENT && !remaining.len)) {
 				if (die_on_error)
 					die_errno("Invalid path '%s'",
-						  resolved.buf);
+						  resolved->buf);
 				else
 					goto error_out;
 			}
@@ -138,29 +131,29 @@ static const char *real_path_internal(const char *path, int die_on_error)
 					goto error_out;
 			}
 
-			len = strbuf_readlink(&symlink, resolved.buf,
+			len = strbuf_readlink(&symlink, resolved->buf,
 					      st.st_size);
 			if (len < 0) {
 				if (die_on_error)
 					die_errno("Invalid symlink '%s'",
-						  resolved.buf);
+						  resolved->buf);
 				else
 					goto error_out;
 			}
 
 			if (is_absolute_path(symlink.buf)) {
-				/*
-				 * absolute symlink
-				 * reset resolved path to root
-				 */
-				strbuf_setlen(&resolved, 1);
+				/* absolute symlink; set resolved to root */
+				int offset = offset_1st_component(symlink.buf);
+				strbuf_reset(resolved);
+				strbuf_add(resolved, symlink.buf, offset);
+				strbuf_remove(&symlink, 0, offset);
 			} else {
 				/*
 				 * relative symlink
 				 * strip off the last component since it will
 				 * be replaced with the contents of the symlink
 				 */
-				strip_last_component(&resolved);
+				strip_last_component(resolved);
 			}
 
 			/*
@@ -180,25 +173,42 @@ static const char *real_path_internal(const char *path, int die_on_error)
 		}
 	}
 
-	retval = resolved.buf;
+	retval = resolved->buf;
 
 error_out:
-	//strbuf_release(&resolved);
 	strbuf_release(&remaining);
 	strbuf_release(&next);
 	strbuf_release(&symlink);
 
+	if (!retval)
+		strbuf_reset(resolved);
+
 	return retval;
 }
 
 const char *real_path(const char *path)
 {
-	return real_path_internal(path, 1);
+	static struct strbuf realpath = STRBUF_INIT;
+	return strbuf_realpath(&realpath, path, 1);
 }
 
 const char *real_path_if_valid(const char *path)
 {
-	return real_path_internal(path, 0);
+	static struct strbuf realpath = STRBUF_INIT;
+	return strbuf_realpath(&realpath, path, 0);
+}
+
+char *real_pathdup(const char *path)
+{
+	struct strbuf realpath = STRBUF_INIT;
+	char *retval = NULL;
+
+	if(strbuf_realpath(&realpath, path, 0))
+		retval = strbuf_detach(&realpath, NULL);
+
+	strbuf_release(&realpath);
+
+	return retval;
 }
 
 /*
diff --git a/builtin/init-db.c b/builtin/init-db.c
index 2399b97..76d68fa 100644
--- a/builtin/init-db.c
+++ b/builtin/init-db.c
@@ -338,7 +338,7 @@ int init_db(const char *git_dir, const char *real_git_dir,
 {
 	int reinit;
 	int exist_ok = flags & INIT_DB_EXIST_OK;
-	char *original_git_dir = xstrdup(real_path(git_dir));
+	char *original_git_dir = real_pathdup(git_dir);
 
 	if (real_git_dir) {
 		struct stat st;
@@ -489,7 +489,7 @@ int cmd_init_db(int argc, const char **argv, const char *prefix)
 	argc = parse_options(argc, argv, prefix, init_db_options, init_db_usage, 0);
 
 	if (real_git_dir && !is_absolute_path(real_git_dir))
-		real_git_dir = xstrdup(real_path(real_git_dir));
+		real_git_dir = real_pathdup(real_git_dir);
 
 	if (argc == 1) {
 		int mkdir_tried = 0;
@@ -560,7 +560,7 @@ int cmd_init_db(int argc, const char **argv, const char *prefix)
 		const char *git_dir_parent = strrchr(git_dir, '/');
 		if (git_dir_parent) {
 			char *rel = xstrndup(git_dir, git_dir_parent - git_dir);
-			git_work_tree_cfg = xstrdup(real_path(rel));
+			git_work_tree_cfg = real_pathdup(rel);
 			free(rel);
 		}
 		if (!git_work_tree_cfg)
diff --git a/cache.h b/cache.h
index a50a61a..e12a5d9 100644
--- a/cache.h
+++ b/cache.h
@@ -1064,8 +1064,11 @@ static inline int is_absolute_path(const char *path)
 	return is_dir_sep(path[0]) || has_dos_drive_prefix(path);
 }
 int is_directory(const char *);
+char *strbuf_realpath(struct strbuf *resolved, const char *path,
+		      int die_on_error);
 const char *real_path(const char *path);
 const char *real_path_if_valid(const char *path);
+char *real_pathdup(const char *path);
 const char *absolute_path(const char *path);
 const char *remove_leading_path(const char *in, const char *prefix);
 const char *relative_path(const char *in, const char *prefix, struct strbuf *sb);
diff --git a/environment.c b/environment.c
index 0935ec6..9b943d2 100644
--- a/environment.c
+++ b/environment.c
@@ -259,7 +259,7 @@ void set_git_work_tree(const char *new_work_tree)
 		return;
 	}
 	git_work_tree_initialized = 1;
-	work_tree = xstrdup(real_path(new_work_tree));
+	work_tree = real_pathdup(new_work_tree);
 }
 
 const char *get_git_work_tree(void)
diff --git a/setup.c b/setup.c
index fe572b8..0d9fdd0 100644
--- a/setup.c
+++ b/setup.c
@@ -254,10 +254,12 @@ int get_common_dir_noenv(struct strbuf *sb, const char *gitdir)
 		if (!is_absolute_path(data.buf))
 			strbuf_addf(&path, "%s/", gitdir);
 		strbuf_addbuf(&path, &data);
-		strbuf_addstr(sb, real_path(path.buf));
+		strbuf_realpath(sb, path.buf, 1);
 		ret = 1;
-	} else
+	} else {
 		strbuf_addstr(sb, gitdir);
+	}
+
 	strbuf_release(&data);
 	strbuf_release(&path);
 	return ret;
@@ -692,7 +694,7 @@ static const char *setup_discovered_git_dir(const char *gitdir,
 	/* --work-tree is set without --git-dir; use discovered one */
 	if (getenv(GIT_WORK_TREE_ENVIRONMENT) || git_work_tree_cfg) {
 		if (offset != cwd->len && !is_absolute_path(gitdir))
-			gitdir = xstrdup(real_path(gitdir));
+			gitdir = real_pathdup(gitdir);
 		if (chdir(cwd->buf))
 			die_errno("Could not come back to cwd");
 		return setup_explicit_git_dir(gitdir, cwd, nongit_ok);
@@ -800,11 +802,12 @@ static int canonicalize_ceiling_entry(struct string_list_item *item,
 		/* Keep entry but do not canonicalize it */
 		return 1;
 	} else {
-		const char *real_path = real_path_if_valid(ceil);
-		if (!real_path)
+		char *real_path = real_pathdup(ceil);
+		if (!real_path) {
 			return 0;
+		}
 		free(item->string);
-		item->string = xstrdup(real_path);
+		item->string = real_path;
 		return 1;
 	}
 }
diff --git a/sha1_file.c b/sha1_file.c
index 9c86d19..6a03cc3 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -291,7 +291,7 @@ static int link_alt_odb_entry(const char *entry, const char *relative_base,
 	struct strbuf pathbuf = STRBUF_INIT;
 
 	if (!is_absolute_path(entry) && relative_base) {
-		strbuf_addstr(&pathbuf, real_path(relative_base));
+		strbuf_realpath(&pathbuf, relative_base, 1);
 		strbuf_addch(&pathbuf, '/');
 	}
 	strbuf_addstr(&pathbuf, entry);
diff --git a/submodule.c b/submodule.c
index 6f7d883..c85ba50 100644
--- a/submodule.c
+++ b/submodule.c
@@ -1227,7 +1227,7 @@ void connect_work_tree_and_git_dir(const char *work_tree, const char *git_dir)
 {
 	struct strbuf file_name = STRBUF_INIT;
 	struct strbuf rel_path = STRBUF_INIT;
-	const char *real_work_tree = xstrdup(real_path(work_tree));
+	const char *real_work_tree = real_pathdup(work_tree);
 
 	/* Update gitfile */
 	strbuf_addf(&file_name, "%s/.git", work_tree);
diff --git a/transport.c b/transport.c
index d57e8de..236c6f6 100644
--- a/transport.c
+++ b/transport.c
@@ -1130,7 +1130,7 @@ static int refs_from_alternate_cb(struct alternate_object_database *e,
 	const struct ref *extra;
 	struct alternate_refs_data *cb = data;
 
-	other = xstrdup(real_path(e->path));
+	other = real_pathdup(e->path);
 	len = strlen(other);
 
 	while (other[len-1] == '/')
diff --git a/worktree.c b/worktree.c
index f7869f8..c90e013 100644
--- a/worktree.c
+++ b/worktree.c
@@ -255,7 +255,7 @@ struct worktree *find_worktree(struct worktree **list,
 		return wt;
 
 	arg = prefix_filename(prefix, strlen(prefix), arg);
-	path = xstrdup(real_path(arg));
+	path = real_pathdup(arg);
 	for (; *list; list++)
 		if (!fspathcmp(path, real_path((*list)->path)))
 			break;

-- 
2.8.0.rc3.226.g39d4020


^ permalink raw reply related	[flat|nested] 83+ messages in thread

* [PATCH v2 1/4] real_path: resolve symlinks by hand
  2016-12-08 23:58 ` [PATCH v2 0/4] road to reentrant real_path Brandon Williams
@ 2016-12-08 23:58   ` Brandon Williams
  2016-12-09  1:49     ` Jacob Keller
  2016-12-09 14:33     ` Johannes Sixt
  2016-12-08 23:58   ` [PATCH v2 2/4] real_path: convert real_path_internal to strbuf_realpath Brandon Williams
                     ` (4 subsequent siblings)
  5 siblings, 2 replies; 83+ messages in thread
From: Brandon Williams @ 2016-12-08 23:58 UTC (permalink / raw)
  To: git
  Cc: Brandon Williams, sbeller, peff, jacob.keller, gitster, ramsay,
	tboegi, j6t, pclouds

The current implementation of real_path uses chdir() in order to resolve
symlinks.  Unfortunately this isn't thread-safe as chdir() affects a
process as a whole and not just an individual thread.  Instead perform
the symlink resolution by hand so that the calls to chdir() can be
removed, making real_path one step closer to being reentrant.

Signed-off-by: Brandon Williams <bmwill@google.com>
---
 abspath.c | 183 +++++++++++++++++++++++++++++++++++++++++---------------------
 1 file changed, 122 insertions(+), 61 deletions(-)

diff --git a/abspath.c b/abspath.c
index 2825de8..92f2a29 100644
--- a/abspath.c
+++ b/abspath.c
@@ -11,8 +11,38 @@ int is_directory(const char *path)
 	return (!stat(path, &st) && S_ISDIR(st.st_mode));
 }
 
+/* removes the last path component from 'path' except if 'path' is root */
+static void strip_last_component(struct strbuf *path)
+{
+	if (path->len > offset_1st_component(path->buf)) {
+		char *last_slash = find_last_dir_sep(path->buf);
+		strbuf_setlen(path, last_slash - path->buf);
+	}
+}
+
+/* get (and remove) the next component in 'remaining' and place it in 'next' */
+static void get_next_component(struct strbuf *next, struct strbuf *remaining)
+{
+	char *start = NULL;
+	char *end = NULL;
+
+	strbuf_reset(next);
+
+	/* look for the next component */
+	/* Skip sequences of multiple path-separators */
+	for (start = remaining->buf; is_dir_sep(*start); start++)
+		; /* nothing */
+	/* Find end of the path component */
+	for (end = start; *end && !is_dir_sep(*end); end++)
+		; /* nothing */
+
+	strbuf_add(next, start, end - start);
+	/* remove the component from 'remaining' */
+	strbuf_remove(remaining, 0, end - remaining->buf);
+}
+
 /* We allow "recursive" symbolic links. Only within reason, though. */
-#define MAXDEPTH 5
+#define MAXSYMLINKS 5
 
 /*
  * Return the real path (i.e., absolute path, with symlinks resolved
@@ -21,7 +51,6 @@ int is_directory(const char *path)
  * absolute_path().)  The return value is a pointer to a static
  * buffer.
  *
- * The input and all intermediate paths must be shorter than MAX_PATH.
  * The directory part of path (i.e., everything up to the last
  * dir_sep) must denote a valid, existing directory, but the last
  * component need not exist.  If die_on_error is set, then die with an
@@ -33,22 +62,16 @@ int is_directory(const char *path)
  */
 static const char *real_path_internal(const char *path, int die_on_error)
 {
-	static struct strbuf sb = STRBUF_INIT;
+	static struct strbuf resolved = STRBUF_INIT;
+	struct strbuf remaining = STRBUF_INIT;
+	struct strbuf next = STRBUF_INIT;
+	struct strbuf symlink = STRBUF_INIT;
 	char *retval = NULL;
-
-	/*
-	 * If we have to temporarily chdir(), store the original CWD
-	 * here so that we can chdir() back to it at the end of the
-	 * function:
-	 */
-	struct strbuf cwd = STRBUF_INIT;
-
-	int depth = MAXDEPTH;
-	char *last_elem = NULL;
+	int num_symlinks = 0;
 	struct stat st;
 
 	/* We've already done it */
-	if (path == sb.buf)
+	if (path == resolved.buf)
 		return path;
 
 	if (!*path) {
@@ -58,74 +81,112 @@ static const char *real_path_internal(const char *path, int die_on_error)
 			goto error_out;
 	}
 
-	strbuf_reset(&sb);
-	strbuf_addstr(&sb, path);
-
-	while (depth--) {
-		if (!is_directory(sb.buf)) {
-			char *last_slash = find_last_dir_sep(sb.buf);
-			if (last_slash) {
-				last_elem = xstrdup(last_slash + 1);
-				strbuf_setlen(&sb, last_slash - sb.buf + 1);
-			} else {
-				last_elem = xmemdupz(sb.buf, sb.len);
-				strbuf_reset(&sb);
-			}
+	strbuf_reset(&resolved);
+
+	if (is_absolute_path(path)) {
+		/* absolute path; start with only root as being resolved */
+		int offset = offset_1st_component(path);
+		strbuf_add(&resolved, path, offset);
+		strbuf_addstr(&remaining, path + offset);
+	} else {
+		/* relative path; can use CWD as the initial resolved path */
+		if (strbuf_getcwd(&resolved)) {
+			if (die_on_error)
+				die_errno("unable to get current working directory");
+			else
+				goto error_out;
+		}
+		strbuf_addstr(&remaining, path);
+	}
+
+	/* Iterate over the remaining path components */
+	while (remaining.len > 0) {
+		get_next_component(&next, &remaining);
+
+		if (next.len == 0) {
+			continue; /* empty component */
+		} else if (next.len == 1 && !strcmp(next.buf, ".")) {
+			continue; /* '.' component */
+		} else if (next.len == 2 && !strcmp(next.buf, "..")) {
+			/* '..' component; strip the last path component */
+			strip_last_component(&resolved);
+			continue;
 		}
 
-		if (sb.len) {
-			if (!cwd.len && strbuf_getcwd(&cwd)) {
+		/* append the next component and resolve resultant path */
+		if (!is_dir_sep(resolved.buf[resolved.len - 1]))
+			strbuf_addch(&resolved, '/');
+		strbuf_addbuf(&resolved, &next);
+
+		if (lstat(resolved.buf, &st)) {
+			/* error out unless this was the last component */
+			if (!(errno == ENOENT && !remaining.len)) {
 				if (die_on_error)
-					die_errno("Could not get current working directory");
+					die_errno("Invalid path '%s'",
+						  resolved.buf);
 				else
 					goto error_out;
 			}
+		} else if (S_ISLNK(st.st_mode)) {
+			ssize_t len;
+			strbuf_reset(&symlink);
 
-			if (chdir(sb.buf)) {
+			if (num_symlinks++ > MAXSYMLINKS) {
 				if (die_on_error)
-					die_errno("Could not switch to '%s'",
-						  sb.buf);
+					die("More than %d nested symlinks "
+					    "on path '%s'", MAXSYMLINKS, path);
 				else
 					goto error_out;
 			}
-		}
-		if (strbuf_getcwd(&sb)) {
-			if (die_on_error)
-				die_errno("Could not get current working directory");
-			else
-				goto error_out;
-		}
-
-		if (last_elem) {
-			if (sb.len && !is_dir_sep(sb.buf[sb.len - 1]))
-				strbuf_addch(&sb, '/');
-			strbuf_addstr(&sb, last_elem);
-			free(last_elem);
-			last_elem = NULL;
-		}
 
-		if (!lstat(sb.buf, &st) && S_ISLNK(st.st_mode)) {
-			struct strbuf next_sb = STRBUF_INIT;
-			ssize_t len = strbuf_readlink(&next_sb, sb.buf, 0);
+			len = strbuf_readlink(&symlink, resolved.buf,
+					      st.st_size);
 			if (len < 0) {
 				if (die_on_error)
 					die_errno("Invalid symlink '%s'",
-						  sb.buf);
+						  resolved.buf);
 				else
 					goto error_out;
 			}
-			strbuf_swap(&sb, &next_sb);
-			strbuf_release(&next_sb);
-		} else
-			break;
+
+			if (is_absolute_path(symlink.buf)) {
+				/* absolute symlink; set resolved to root */
+				int offset = offset_1st_component(symlink.buf);
+				strbuf_reset(&resolved);
+				strbuf_add(&resolved, symlink.buf, offset);
+				strbuf_remove(&symlink, 0, offset);
+			} else {
+				/*
+				 * relative symlink
+				 * strip off the last component since it will
+				 * be replaced with the contents of the symlink
+				 */
+				strip_last_component(&resolved);
+			}
+
+			/*
+			 * if there are still remaining components to resolve
+			 * then append them to symlink
+			 */
+			if (remaining.len) {
+				strbuf_addch(&symlink, '/');
+				strbuf_addbuf(&symlink, &remaining);
+			}
+
+			/*
+			 * use the symlink as the remaining components that
+			 * need to be resloved
+			 */
+			strbuf_swap(&symlink, &remaining);
+		}
 	}
 
-	retval = sb.buf;
+	retval = resolved.buf;
+
 error_out:
-	free(last_elem);
-	if (cwd.len && chdir(cwd.buf))
-		die_errno("Could not change back to '%s'", cwd.buf);
-	strbuf_release(&cwd);
+	strbuf_release(&remaining);
+	strbuf_release(&next);
+	strbuf_release(&symlink);
 
 	return retval;
 }
-- 
2.8.0.rc3.226.g39d4020


^ permalink raw reply related	[flat|nested] 83+ messages in thread

* [PATCH v2 2/4] real_path: convert real_path_internal to strbuf_realpath
  2016-12-08 23:58 ` [PATCH v2 0/4] road to reentrant real_path Brandon Williams
  2016-12-08 23:58   ` [PATCH v2 1/4] real_path: resolve symlinks by hand Brandon Williams
@ 2016-12-08 23:58   ` Brandon Williams
  2016-12-08 23:58   ` [PATCH v2 3/4] real_path: create real_pathdup Brandon Williams
                     ` (3 subsequent siblings)
  5 siblings, 0 replies; 83+ messages in thread
From: Brandon Williams @ 2016-12-08 23:58 UTC (permalink / raw)
  To: git
  Cc: Brandon Williams, sbeller, peff, jacob.keller, gitster, ramsay,
	tboegi, j6t, pclouds

Change the name of real_path_internal to strbuf_realpath.  In addition
push the static strbuf up to its callers and instead take as a
parameter a pointer to a strbuf to use for the final result.

This change makes strbuf_realpath reentrant.

Signed-off-by: Brandon Williams <bmwill@google.com>
---
 abspath.c | 53 +++++++++++++++++++++++++----------------------------
 cache.h   |  2 ++
 2 files changed, 27 insertions(+), 28 deletions(-)

diff --git a/abspath.c b/abspath.c
index 92f2a29..b0d4c1b 100644
--- a/abspath.c
+++ b/abspath.c
@@ -48,21 +48,17 @@ static void get_next_component(struct strbuf *next, struct strbuf *remaining)
  * Return the real path (i.e., absolute path, with symlinks resolved
  * and extra slashes removed) equivalent to the specified path.  (If
  * you want an absolute path but don't mind links, use
- * absolute_path().)  The return value is a pointer to a static
- * buffer.
+ * absolute_path().)  Places the resolved realpath in the provided strbuf.
  *
  * The directory part of path (i.e., everything up to the last
  * dir_sep) must denote a valid, existing directory, but the last
  * component need not exist.  If die_on_error is set, then die with an
  * informative error message if there is a problem.  Otherwise, return
  * NULL on errors (without generating any output).
- *
- * If path is our buffer, then return path, as it's already what the
- * user wants.
  */
-static const char *real_path_internal(const char *path, int die_on_error)
+char *strbuf_realpath(struct strbuf *resolved, const char *path,
+		      int die_on_error)
 {
-	static struct strbuf resolved = STRBUF_INIT;
 	struct strbuf remaining = STRBUF_INIT;
 	struct strbuf next = STRBUF_INIT;
 	struct strbuf symlink = STRBUF_INIT;
@@ -70,10 +66,6 @@ static const char *real_path_internal(const char *path, int die_on_error)
 	int num_symlinks = 0;
 	struct stat st;
 
-	/* We've already done it */
-	if (path == resolved.buf)
-		return path;
-
 	if (!*path) {
 		if (die_on_error)
 			die("The empty string is not a valid path");
@@ -81,16 +73,16 @@ static const char *real_path_internal(const char *path, int die_on_error)
 			goto error_out;
 	}
 
-	strbuf_reset(&resolved);
+	strbuf_reset(resolved);
 
 	if (is_absolute_path(path)) {
 		/* absolute path; start with only root as being resolved */
 		int offset = offset_1st_component(path);
-		strbuf_add(&resolved, path, offset);
+		strbuf_add(resolved, path, offset);
 		strbuf_addstr(&remaining, path + offset);
 	} else {
 		/* relative path; can use CWD as the initial resolved path */
-		if (strbuf_getcwd(&resolved)) {
+		if (strbuf_getcwd(resolved)) {
 			if (die_on_error)
 				die_errno("unable to get current working directory");
 			else
@@ -109,21 +101,21 @@ static const char *real_path_internal(const char *path, int die_on_error)
 			continue; /* '.' component */
 		} else if (next.len == 2 && !strcmp(next.buf, "..")) {
 			/* '..' component; strip the last path component */
-			strip_last_component(&resolved);
+			strip_last_component(resolved);
 			continue;
 		}
 
 		/* append the next component and resolve resultant path */
-		if (!is_dir_sep(resolved.buf[resolved.len - 1]))
-			strbuf_addch(&resolved, '/');
-		strbuf_addbuf(&resolved, &next);
+		if (!is_dir_sep(resolved->buf[resolved->len - 1]))
+			strbuf_addch(resolved, '/');
+		strbuf_addbuf(resolved, &next);
 
-		if (lstat(resolved.buf, &st)) {
+		if (lstat(resolved->buf, &st)) {
 			/* error out unless this was the last component */
 			if (!(errno == ENOENT && !remaining.len)) {
 				if (die_on_error)
 					die_errno("Invalid path '%s'",
-						  resolved.buf);
+						  resolved->buf);
 				else
 					goto error_out;
 			}
@@ -139,12 +131,12 @@ static const char *real_path_internal(const char *path, int die_on_error)
 					goto error_out;
 			}
 
-			len = strbuf_readlink(&symlink, resolved.buf,
+			len = strbuf_readlink(&symlink, resolved->buf,
 					      st.st_size);
 			if (len < 0) {
 				if (die_on_error)
 					die_errno("Invalid symlink '%s'",
-						  resolved.buf);
+						  resolved->buf);
 				else
 					goto error_out;
 			}
@@ -152,8 +144,8 @@ static const char *real_path_internal(const char *path, int die_on_error)
 			if (is_absolute_path(symlink.buf)) {
 				/* absolute symlink; set resolved to root */
 				int offset = offset_1st_component(symlink.buf);
-				strbuf_reset(&resolved);
-				strbuf_add(&resolved, symlink.buf, offset);
+				strbuf_reset(resolved);
+				strbuf_add(resolved, symlink.buf, offset);
 				strbuf_remove(&symlink, 0, offset);
 			} else {
 				/*
@@ -161,7 +153,7 @@ static const char *real_path_internal(const char *path, int die_on_error)
 				 * strip off the last component since it will
 				 * be replaced with the contents of the symlink
 				 */
-				strip_last_component(&resolved);
+				strip_last_component(resolved);
 			}
 
 			/*
@@ -181,24 +173,29 @@ static const char *real_path_internal(const char *path, int die_on_error)
 		}
 	}
 
-	retval = resolved.buf;
+	retval = resolved->buf;
 
 error_out:
 	strbuf_release(&remaining);
 	strbuf_release(&next);
 	strbuf_release(&symlink);
 
+	if (!retval)
+		strbuf_reset(resolved);
+
 	return retval;
 }
 
 const char *real_path(const char *path)
 {
-	return real_path_internal(path, 1);
+	static struct strbuf realpath = STRBUF_INIT;
+	return strbuf_realpath(&realpath, path, 1);
 }
 
 const char *real_path_if_valid(const char *path)
 {
-	return real_path_internal(path, 0);
+	static struct strbuf realpath = STRBUF_INIT;
+	return strbuf_realpath(&realpath, path, 0);
 }
 
 /*
diff --git a/cache.h b/cache.h
index a50a61a..7a81294 100644
--- a/cache.h
+++ b/cache.h
@@ -1064,6 +1064,8 @@ static inline int is_absolute_path(const char *path)
 	return is_dir_sep(path[0]) || has_dos_drive_prefix(path);
 }
 int is_directory(const char *);
+char *strbuf_realpath(struct strbuf *resolved, const char *path,
+		      int die_on_error);
 const char *real_path(const char *path);
 const char *real_path_if_valid(const char *path);
 const char *absolute_path(const char *path);
-- 
2.8.0.rc3.226.g39d4020


^ permalink raw reply related	[flat|nested] 83+ messages in thread

* [PATCH v2 3/4] real_path: create real_pathdup
  2016-12-08 23:58 ` [PATCH v2 0/4] road to reentrant real_path Brandon Williams
  2016-12-08 23:58   ` [PATCH v2 1/4] real_path: resolve symlinks by hand Brandon Williams
  2016-12-08 23:58   ` [PATCH v2 2/4] real_path: convert real_path_internal to strbuf_realpath Brandon Williams
@ 2016-12-08 23:58   ` Brandon Williams
  2016-12-09 14:35     ` Johannes Sixt
  2016-12-08 23:58   ` [PATCH v2 4/4] real_path: have callers use real_pathdup and strbuf_realpath Brandon Williams
                     ` (2 subsequent siblings)
  5 siblings, 1 reply; 83+ messages in thread
From: Brandon Williams @ 2016-12-08 23:58 UTC (permalink / raw)
  To: git
  Cc: Brandon Williams, sbeller, peff, jacob.keller, gitster, ramsay,
	tboegi, j6t, pclouds

Create real_pathdup which returns a caller owned string of the resolved
realpath based on the provide path.

Signed-off-by: Brandon Williams <bmwill@google.com>
---
 abspath.c | 13 +++++++++++++
 cache.h   |  1 +
 2 files changed, 14 insertions(+)

diff --git a/abspath.c b/abspath.c
index b0d4c1b..df37356 100644
--- a/abspath.c
+++ b/abspath.c
@@ -198,6 +198,19 @@ const char *real_path_if_valid(const char *path)
 	return strbuf_realpath(&realpath, path, 0);
 }
 
+char *real_pathdup(const char *path)
+{
+	struct strbuf realpath = STRBUF_INIT;
+	char *retval = NULL;
+
+	if(strbuf_realpath(&realpath, path, 0))
+		retval = strbuf_detach(&realpath, NULL);
+
+	strbuf_release(&realpath);
+
+	return retval;
+}
+
 /*
  * Use this to get an absolute path from a relative one. If you want
  * to resolve links, you should use real_path.
diff --git a/cache.h b/cache.h
index 7a81294..e12a5d9 100644
--- a/cache.h
+++ b/cache.h
@@ -1068,6 +1068,7 @@ char *strbuf_realpath(struct strbuf *resolved, const char *path,
 		      int die_on_error);
 const char *real_path(const char *path);
 const char *real_path_if_valid(const char *path);
+char *real_pathdup(const char *path);
 const char *absolute_path(const char *path);
 const char *remove_leading_path(const char *in, const char *prefix);
 const char *relative_path(const char *in, const char *prefix, struct strbuf *sb);
-- 
2.8.0.rc3.226.g39d4020


^ permalink raw reply related	[flat|nested] 83+ messages in thread

* [PATCH v2 4/4] real_path: have callers use real_pathdup and strbuf_realpath
  2016-12-08 23:58 ` [PATCH v2 0/4] road to reentrant real_path Brandon Williams
                     ` (2 preceding siblings ...)
  2016-12-08 23:58   ` [PATCH v2 3/4] real_path: create real_pathdup Brandon Williams
@ 2016-12-08 23:58   ` Brandon Williams
  2016-12-09 12:33   ` [PATCH v2 0/4] road to reentrant real_path Duy Nguyen
  2016-12-12 18:16   ` [PATCH v3 " Brandon Williams
  5 siblings, 0 replies; 83+ messages in thread
From: Brandon Williams @ 2016-12-08 23:58 UTC (permalink / raw)
  To: git
  Cc: Brandon Williams, sbeller, peff, jacob.keller, gitster, ramsay,
	tboegi, j6t, pclouds

Migrate callers of real_path() who duplicate the retern value to use
real_pathdup or strbuf_realpath.

Signed-off-by: Brandon Williams <bmwill@google.com>
---
 builtin/init-db.c |  6 +++---
 environment.c     |  2 +-
 setup.c           | 15 +++++++++------
 sha1_file.c       |  2 +-
 submodule.c       |  2 +-
 transport.c       |  2 +-
 worktree.c        |  2 +-
 7 files changed, 17 insertions(+), 14 deletions(-)

diff --git a/builtin/init-db.c b/builtin/init-db.c
index 2399b97..76d68fa 100644
--- a/builtin/init-db.c
+++ b/builtin/init-db.c
@@ -338,7 +338,7 @@ int init_db(const char *git_dir, const char *real_git_dir,
 {
 	int reinit;
 	int exist_ok = flags & INIT_DB_EXIST_OK;
-	char *original_git_dir = xstrdup(real_path(git_dir));
+	char *original_git_dir = real_pathdup(git_dir);
 
 	if (real_git_dir) {
 		struct stat st;
@@ -489,7 +489,7 @@ int cmd_init_db(int argc, const char **argv, const char *prefix)
 	argc = parse_options(argc, argv, prefix, init_db_options, init_db_usage, 0);
 
 	if (real_git_dir && !is_absolute_path(real_git_dir))
-		real_git_dir = xstrdup(real_path(real_git_dir));
+		real_git_dir = real_pathdup(real_git_dir);
 
 	if (argc == 1) {
 		int mkdir_tried = 0;
@@ -560,7 +560,7 @@ int cmd_init_db(int argc, const char **argv, const char *prefix)
 		const char *git_dir_parent = strrchr(git_dir, '/');
 		if (git_dir_parent) {
 			char *rel = xstrndup(git_dir, git_dir_parent - git_dir);
-			git_work_tree_cfg = xstrdup(real_path(rel));
+			git_work_tree_cfg = real_pathdup(rel);
 			free(rel);
 		}
 		if (!git_work_tree_cfg)
diff --git a/environment.c b/environment.c
index 0935ec6..9b943d2 100644
--- a/environment.c
+++ b/environment.c
@@ -259,7 +259,7 @@ void set_git_work_tree(const char *new_work_tree)
 		return;
 	}
 	git_work_tree_initialized = 1;
-	work_tree = xstrdup(real_path(new_work_tree));
+	work_tree = real_pathdup(new_work_tree);
 }
 
 const char *get_git_work_tree(void)
diff --git a/setup.c b/setup.c
index fe572b8..0d9fdd0 100644
--- a/setup.c
+++ b/setup.c
@@ -254,10 +254,12 @@ int get_common_dir_noenv(struct strbuf *sb, const char *gitdir)
 		if (!is_absolute_path(data.buf))
 			strbuf_addf(&path, "%s/", gitdir);
 		strbuf_addbuf(&path, &data);
-		strbuf_addstr(sb, real_path(path.buf));
+		strbuf_realpath(sb, path.buf, 1);
 		ret = 1;
-	} else
+	} else {
 		strbuf_addstr(sb, gitdir);
+	}
+
 	strbuf_release(&data);
 	strbuf_release(&path);
 	return ret;
@@ -692,7 +694,7 @@ static const char *setup_discovered_git_dir(const char *gitdir,
 	/* --work-tree is set without --git-dir; use discovered one */
 	if (getenv(GIT_WORK_TREE_ENVIRONMENT) || git_work_tree_cfg) {
 		if (offset != cwd->len && !is_absolute_path(gitdir))
-			gitdir = xstrdup(real_path(gitdir));
+			gitdir = real_pathdup(gitdir);
 		if (chdir(cwd->buf))
 			die_errno("Could not come back to cwd");
 		return setup_explicit_git_dir(gitdir, cwd, nongit_ok);
@@ -800,11 +802,12 @@ static int canonicalize_ceiling_entry(struct string_list_item *item,
 		/* Keep entry but do not canonicalize it */
 		return 1;
 	} else {
-		const char *real_path = real_path_if_valid(ceil);
-		if (!real_path)
+		char *real_path = real_pathdup(ceil);
+		if (!real_path) {
 			return 0;
+		}
 		free(item->string);
-		item->string = xstrdup(real_path);
+		item->string = real_path;
 		return 1;
 	}
 }
diff --git a/sha1_file.c b/sha1_file.c
index 9c86d19..6a03cc3 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -291,7 +291,7 @@ static int link_alt_odb_entry(const char *entry, const char *relative_base,
 	struct strbuf pathbuf = STRBUF_INIT;
 
 	if (!is_absolute_path(entry) && relative_base) {
-		strbuf_addstr(&pathbuf, real_path(relative_base));
+		strbuf_realpath(&pathbuf, relative_base, 1);
 		strbuf_addch(&pathbuf, '/');
 	}
 	strbuf_addstr(&pathbuf, entry);
diff --git a/submodule.c b/submodule.c
index 6f7d883..c85ba50 100644
--- a/submodule.c
+++ b/submodule.c
@@ -1227,7 +1227,7 @@ void connect_work_tree_and_git_dir(const char *work_tree, const char *git_dir)
 {
 	struct strbuf file_name = STRBUF_INIT;
 	struct strbuf rel_path = STRBUF_INIT;
-	const char *real_work_tree = xstrdup(real_path(work_tree));
+	const char *real_work_tree = real_pathdup(work_tree);
 
 	/* Update gitfile */
 	strbuf_addf(&file_name, "%s/.git", work_tree);
diff --git a/transport.c b/transport.c
index d57e8de..236c6f6 100644
--- a/transport.c
+++ b/transport.c
@@ -1130,7 +1130,7 @@ static int refs_from_alternate_cb(struct alternate_object_database *e,
 	const struct ref *extra;
 	struct alternate_refs_data *cb = data;
 
-	other = xstrdup(real_path(e->path));
+	other = real_pathdup(e->path);
 	len = strlen(other);
 
 	while (other[len-1] == '/')
diff --git a/worktree.c b/worktree.c
index f7869f8..c90e013 100644
--- a/worktree.c
+++ b/worktree.c
@@ -255,7 +255,7 @@ struct worktree *find_worktree(struct worktree **list,
 		return wt;
 
 	arg = prefix_filename(prefix, strlen(prefix), arg);
-	path = xstrdup(real_path(arg));
+	path = real_pathdup(arg);
 	for (; *list; list++)
 		if (!fspathcmp(path, real_path((*list)->path)))
 			break;
-- 
2.8.0.rc3.226.g39d4020


^ permalink raw reply related	[flat|nested] 83+ messages in thread

* Re: [PATCH v2 1/4] real_path: resolve symlinks by hand
  2016-12-08 23:58   ` [PATCH v2 1/4] real_path: resolve symlinks by hand Brandon Williams
@ 2016-12-09  1:49     ` Jacob Keller
  2016-12-09 14:33     ` Johannes Sixt
  1 sibling, 0 replies; 83+ messages in thread
From: Jacob Keller @ 2016-12-09  1:49 UTC (permalink / raw)
  To: Brandon Williams
  Cc: Git mailing list, Stefan Beller, Jeff King, Junio C Hamano,
	Ramsay Jones, Torsten Bögershausen, Johannes Sixt,
	Duy Nguyen

On Thu, Dec 8, 2016 at 3:58 PM, Brandon Williams <bmwill@google.com> wrote:
> The current implementation of real_path uses chdir() in order to resolve
> symlinks.  Unfortunately this isn't thread-safe as chdir() affects a
> process as a whole and not just an individual thread.  Instead perform
> the symlink resolution by hand so that the calls to chdir() can be
> removed, making real_path one step closer to being reentrant.
>

Better description for this part of the change. I like the
improvement, as it clearly indicates what this particular patch is
about, and why, but doesn't over-state what we gain here.

The rest of this seems reasonable, though I'm not super familiar with
the code, so I didn't have any comments.

Thanks,
Jake

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v2 0/4] road to reentrant real_path
  2016-12-08 23:58 ` [PATCH v2 0/4] road to reentrant real_path Brandon Williams
                     ` (3 preceding siblings ...)
  2016-12-08 23:58   ` [PATCH v2 4/4] real_path: have callers use real_pathdup and strbuf_realpath Brandon Williams
@ 2016-12-09 12:33   ` Duy Nguyen
  2016-12-09 19:42     ` Brandon Williams
  2016-12-12 18:16   ` [PATCH v3 " Brandon Williams
  5 siblings, 1 reply; 83+ messages in thread
From: Duy Nguyen @ 2016-12-09 12:33 UTC (permalink / raw)
  To: Brandon Williams
  Cc: Git Mailing List, Stefan Beller, Jeff King, Jacob Keller,
	Junio C Hamano, Ramsay Jones, Torsten Bögershausen,
	Johannes Sixt

On Fri, Dec 9, 2016 at 6:58 AM, Brandon Williams <bmwill@google.com> wrote:
> diff --git a/setup.c b/setup.c
> index fe572b8..0d9fdd0 100644
> --- a/setup.c
> +++ b/setup.c
> @@ -254,10 +254,12 @@ int get_common_dir_noenv(struct strbuf *sb, const char *gitdir)
>                 if (!is_absolute_path(data.buf))
>                         strbuf_addf(&path, "%s/", gitdir);
>                 strbuf_addbuf(&path, &data);
> -               strbuf_addstr(sb, real_path(path.buf));
> +               strbuf_realpath(sb, path.buf, 1);

This is not the same because of this hunk in strbuf_realpath()

> @@ -81,17 +73,18 @@ static const char *real_path_internal(const char *path, int die_on_error)
>                         goto error_out;
>         }
>
> -       strbuf_reset(&resolved);
> +       strbuf_reset(resolved);
>
>         if (is_absolute_path(path)) {

But if you you remove that, then all (old) callers of
strbuf_realpath() must do a strbuf_reset() in advance if needed
(probably just real_path does) which sounds reasonable to me. You're
probably want to be careful about the strbuf_reset() at the end of the
function too.

Other than that, I think this diff looks nice.
-- 
Duy

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v2 1/4] real_path: resolve symlinks by hand
  2016-12-08 23:58   ` [PATCH v2 1/4] real_path: resolve symlinks by hand Brandon Williams
  2016-12-09  1:49     ` Jacob Keller
@ 2016-12-09 14:33     ` Johannes Sixt
  2016-12-09 20:04       ` Brandon Williams
  1 sibling, 1 reply; 83+ messages in thread
From: Johannes Sixt @ 2016-12-09 14:33 UTC (permalink / raw)
  To: Brandon Williams
  Cc: git, sbeller, peff, jacob.keller, gitster, ramsay, tboegi,
	pclouds

Am 09.12.2016 um 00:58 schrieb Brandon Williams:
> The current implementation of real_path uses chdir() in order to resolve
> symlinks.  Unfortunately this isn't thread-safe as chdir() affects a
> process as a whole and not just an individual thread.  Instead perform
> the symlink resolution by hand so that the calls to chdir() can be
> removed, making real_path one step closer to being reentrant.
>
> Signed-off-by: Brandon Williams <bmwill@google.com>
> ---
>  abspath.c | 183 +++++++++++++++++++++++++++++++++++++++++---------------------
>  1 file changed, 122 insertions(+), 61 deletions(-)
>
> diff --git a/abspath.c b/abspath.c
> index 2825de8..92f2a29 100644
> --- a/abspath.c
> +++ b/abspath.c
> @@ -11,8 +11,38 @@ int is_directory(const char *path)
>  	return (!stat(path, &st) && S_ISDIR(st.st_mode));
>  }
>
> +/* removes the last path component from 'path' except if 'path' is root */
> +static void strip_last_component(struct strbuf *path)
> +{
> +	if (path->len > offset_1st_component(path->buf)) {
> +		char *last_slash = find_last_dir_sep(path->buf);
> +		strbuf_setlen(path, last_slash - path->buf);
> +	}
> +}

This implementation is not correct because when the input is "/foo", the 
result is "" when it should be "/". Also, can the input be a 
non-normalized path? When the input is "foo///bar", should the result be 
"foo" or would "foo//" be an acceptable result? I think it should be the 
former. find_last_dir_sep() returns the last of the three slashes, not 
the first one. Therefore, I've rewritten the function thusly:

static void strip_last_component(struct strbuf *path)
{
	size_t offset = offset_1st_component(path->buf);
	size_t len = path->len;
	while (offset < len && !is_dir_sep(path->buf[len - 1]))
		len--;
	while (offset < len && is_dir_sep(path->buf[len - 1]))
		len--;
	strbuf_setlen(path, len);
}

> +
> +/* get (and remove) the next component in 'remaining' and place it in 'next' */
> +static void get_next_component(struct strbuf *next, struct strbuf *remaining)
> +{
> +	char *start = NULL;
> +	char *end = NULL;
> +
> +	strbuf_reset(next);
> +
> +	/* look for the next component */
> +	/* Skip sequences of multiple path-separators */
> +	for (start = remaining->buf; is_dir_sep(*start); start++)
> +		; /* nothing */
> +	/* Find end of the path component */
> +	for (end = start; *end && !is_dir_sep(*end); end++)
> +		; /* nothing */
> +
> +	strbuf_add(next, start, end - start);
> +	/* remove the component from 'remaining' */
> +	strbuf_remove(remaining, 0, end - remaining->buf);
> +}

Mental note: This function strips leading slashes and the first path 
component; post-condition: remaining is empty or has leading slashes.

> +
>  /* We allow "recursive" symbolic links. Only within reason, though. */
> -#define MAXDEPTH 5
> +#define MAXSYMLINKS 5
>
>  /*
>   * Return the real path (i.e., absolute path, with symlinks resolved
> @@ -21,7 +51,6 @@ int is_directory(const char *path)
>   * absolute_path().)  The return value is a pointer to a static
>   * buffer.
>   *
> - * The input and all intermediate paths must be shorter than MAX_PATH.
>   * The directory part of path (i.e., everything up to the last
>   * dir_sep) must denote a valid, existing directory, but the last
>   * component need not exist.  If die_on_error is set, then die with an
> @@ -33,22 +62,16 @@ int is_directory(const char *path)
>   */
>  static const char *real_path_internal(const char *path, int die_on_error)
>  {
> -	static struct strbuf sb = STRBUF_INIT;
> +	static struct strbuf resolved = STRBUF_INIT;
> +	struct strbuf remaining = STRBUF_INIT;
> +	struct strbuf next = STRBUF_INIT;
> +	struct strbuf symlink = STRBUF_INIT;
>  	char *retval = NULL;
> -
> -	/*
> -	 * If we have to temporarily chdir(), store the original CWD
> -	 * here so that we can chdir() back to it at the end of the
> -	 * function:
> -	 */
> -	struct strbuf cwd = STRBUF_INIT;
> -
> -	int depth = MAXDEPTH;
> -	char *last_elem = NULL;
> +	int num_symlinks = 0;
>  	struct stat st;
>
>  	/* We've already done it */
> -	if (path == sb.buf)
> +	if (path == resolved.buf)
>  		return path;
>
>  	if (!*path) {
> @@ -58,74 +81,112 @@ static const char *real_path_internal(const char *path, int die_on_error)
>  			goto error_out;
>  	}
>
> -	strbuf_reset(&sb);
> -	strbuf_addstr(&sb, path);
> -
> -	while (depth--) {
> -		if (!is_directory(sb.buf)) {
> -			char *last_slash = find_last_dir_sep(sb.buf);
> -			if (last_slash) {
> -				last_elem = xstrdup(last_slash + 1);
> -				strbuf_setlen(&sb, last_slash - sb.buf + 1);
> -			} else {
> -				last_elem = xmemdupz(sb.buf, sb.len);
> -				strbuf_reset(&sb);
> -			}
> +	strbuf_reset(&resolved);
> +
> +	if (is_absolute_path(path)) {
> +		/* absolute path; start with only root as being resolved */
> +		int offset = offset_1st_component(path);
> +		strbuf_add(&resolved, path, offset);
> +		strbuf_addstr(&remaining, path + offset);

OK.

> +	} else {
> +		/* relative path; can use CWD as the initial resolved path */
> +		if (strbuf_getcwd(&resolved)) {
> +			if (die_on_error)
> +				die_errno("unable to get current working directory");
> +			else
> +				goto error_out;
> +		}
> +		strbuf_addstr(&remaining, path);
> +	}
> +
> +	/* Iterate over the remaining path components */
> +	while (remaining.len > 0) {
> +		get_next_component(&next, &remaining);
> +
> +		if (next.len == 0) {
> +			continue; /* empty component */

Can this happen? I think it can: when 'path' is a root directory, or 
ends with path separators.

> +		} else if (next.len == 1 && !strcmp(next.buf, ".")) {
> +			continue; /* '.' component */

Good.

I don't mind strcmp(), but others might point out that a single 
character comparison should be sufficient.

> +		} else if (next.len == 2 && !strcmp(next.buf, "..")) {
> +			/* '..' component; strip the last path component */
> +			strip_last_component(&resolved);
> +			continue;

Good.

>  		}
>
> -		if (sb.len) {
> -			if (!cwd.len && strbuf_getcwd(&cwd)) {
> +		/* append the next component and resolve resultant path */
> +		if (!is_dir_sep(resolved.buf[resolved.len - 1]))
> +			strbuf_addch(&resolved, '/');

Can we access resolved.buf[resolved.len - 1] here? At this point, 
resolved has been filled with an absolute path or from getcwd(); it 
cannot be empty in the first iteration. In subsequent iterations, it 
cannot be empty as long as strip_last_component() does not make it 
empty. Your original version of strip_last_component() could make it 
empty; my rewrite should not.

Do we have to check for the path separator at all? Typically, resolved 
does not end with a slash, but if we went all the way up to the root, 
there is a trailing slash. Good; the condition is required.

> +		strbuf_addbuf(&resolved, &next);
> +
> +		if (lstat(resolved.buf, &st)) {
> +			/* error out unless this was the last component */
> +			if (!(errno == ENOENT && !remaining.len)) {

Perhaps it was easier to _write_ the condition in this way, but I would 
have an easier time to _read_ it when it is

			if (errno != ENOENT || remaining.len) {

>  				if (die_on_error)
> -					die_errno("Could not get current working directory");
> +					die_errno("Invalid path '%s'",
> +						  resolved.buf);
>  				else
>  					goto error_out;
>  			}
> +		} else if (S_ISLNK(st.st_mode)) {
> +			ssize_t len;
> +			strbuf_reset(&symlink);
>
> -			if (chdir(sb.buf)) {
> +			if (num_symlinks++ > MAXSYMLINKS) {
>  				if (die_on_error)
> -					die_errno("Could not switch to '%s'",
> -						  sb.buf);
> +					die("More than %d nested symlinks "
> +					    "on path '%s'", MAXSYMLINKS, path);
>  				else
>  					goto error_out;
>  			}
> -		}
> -		if (strbuf_getcwd(&sb)) {
> -			if (die_on_error)
> -				die_errno("Could not get current working directory");
> -			else
> -				goto error_out;
> -		}
> -
> -		if (last_elem) {
> -			if (sb.len && !is_dir_sep(sb.buf[sb.len - 1]))
> -				strbuf_addch(&sb, '/');
> -			strbuf_addstr(&sb, last_elem);
> -			free(last_elem);
> -			last_elem = NULL;
> -		}
>
> -		if (!lstat(sb.buf, &st) && S_ISLNK(st.st_mode)) {
> -			struct strbuf next_sb = STRBUF_INIT;
> -			ssize_t len = strbuf_readlink(&next_sb, sb.buf, 0);
> +			len = strbuf_readlink(&symlink, resolved.buf,
> +					      st.st_size);
>  			if (len < 0) {
>  				if (die_on_error)
>  					die_errno("Invalid symlink '%s'",
> -						  sb.buf);
> +						  resolved.buf);
>  				else
>  					goto error_out;
>  			}
> -			strbuf_swap(&sb, &next_sb);
> -			strbuf_release(&next_sb);
> -		} else
> -			break;
> +
> +			if (is_absolute_path(symlink.buf)) {
> +				/* absolute symlink; set resolved to root */
> +				int offset = offset_1st_component(symlink.buf);
> +				strbuf_reset(&resolved);
> +				strbuf_add(&resolved, symlink.buf, offset);
> +				strbuf_remove(&symlink, 0, offset);

Good. I would have expected some kind of "goto repeat;" because when we 
encounter a symlink to an absolute path, most, if not all, work done so 
far is obsoleted. But I haven't thought too deeply about this.

> +			} else {
> +				/*
> +				 * relative symlink
> +				 * strip off the last component since it will
> +				 * be replaced with the contents of the symlink
> +				 */
> +				strip_last_component(&resolved);
> +			}
> +
> +			/*
> +			 * if there are still remaining components to resolve
> +			 * then append them to symlink
> +			 */
> +			if (remaining.len) {
> +				strbuf_addch(&symlink, '/');
> +				strbuf_addbuf(&symlink, &remaining);
> +			}
> +
> +			/*
> +			 * use the symlink as the remaining components that
> +			 * need to be resloved
> +			 */
> +			strbuf_swap(&symlink, &remaining);

Good. This looks very reasonable.

> +		}
>  	}
>
> -	retval = sb.buf;
> +	retval = resolved.buf;
> +
>  error_out:
> -	free(last_elem);
> -	if (cwd.len && chdir(cwd.buf))
> -		die_errno("Could not change back to '%s'", cwd.buf);
> -	strbuf_release(&cwd);
> +	strbuf_release(&remaining);
> +	strbuf_release(&next);
> +	strbuf_release(&symlink);
>
>  	return retval;
>  }
>


^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v2 3/4] real_path: create real_pathdup
  2016-12-08 23:58   ` [PATCH v2 3/4] real_path: create real_pathdup Brandon Williams
@ 2016-12-09 14:35     ` Johannes Sixt
  0 siblings, 0 replies; 83+ messages in thread
From: Johannes Sixt @ 2016-12-09 14:35 UTC (permalink / raw)
  To: Brandon Williams
  Cc: git, sbeller, peff, jacob.keller, gitster, ramsay, tboegi,
	pclouds

Am 09.12.2016 um 00:58 schrieb Brandon Williams:
> +char *real_pathdup(const char *path)
> +{
> +	struct strbuf realpath = STRBUF_INIT;
> +	char *retval = NULL;
> +
> +	if(strbuf_realpath(&realpath, path, 0))

Style nit: blank after if is missing.

-- Hannes


^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v2 0/4] road to reentrant real_path
  2016-12-09 12:33   ` [PATCH v2 0/4] road to reentrant real_path Duy Nguyen
@ 2016-12-09 19:42     ` Brandon Williams
  2016-12-10 11:02       ` Duy Nguyen
  0 siblings, 1 reply; 83+ messages in thread
From: Brandon Williams @ 2016-12-09 19:42 UTC (permalink / raw)
  To: Duy Nguyen
  Cc: Git Mailing List, Stefan Beller, Jeff King, Jacob Keller,
	Junio C Hamano, Ramsay Jones, Torsten Bögershausen,
	Johannes Sixt

On 12/09, Duy Nguyen wrote:
> On Fri, Dec 9, 2016 at 6:58 AM, Brandon Williams <bmwill@google.com> wrote:
> > diff --git a/setup.c b/setup.c
> > index fe572b8..0d9fdd0 100644
> > --- a/setup.c
> > +++ b/setup.c
> > @@ -254,10 +254,12 @@ int get_common_dir_noenv(struct strbuf *sb, const char *gitdir)
> >                 if (!is_absolute_path(data.buf))
> >                         strbuf_addf(&path, "%s/", gitdir);
> >                 strbuf_addbuf(&path, &data);
> > -               strbuf_addstr(sb, real_path(path.buf));
> > +               strbuf_realpath(sb, path.buf, 1);
> 
> This is not the same because of this hunk in strbuf_realpath()

Then perhaps I shouldn't make this change (and just leave it as is)
since the way real_path_internal/strbuf_realpath is written requires
that the strbuf being used for the resolved path only contains the
resolved path (see the lstat(resolved->buf &st) call).  Sidenote it
looks like strbuf_getcwd() also does a reset, though more subtlety,
since it just passes its buffer to getcwd().

> 
> > @@ -81,17 +73,18 @@ static const char *real_path_internal(const char *path, int die_on_error)
> >                         goto error_out;
> >         }
> >
> > -       strbuf_reset(&resolved);
> > +       strbuf_reset(resolved);
> >
> >         if (is_absolute_path(path)) {
> 
> But if you you remove that, then all (old) callers of
> strbuf_realpath() must do a strbuf_reset() in advance if needed
> (probably just real_path does) which sounds reasonable to me. You're
> probably want to be careful about the strbuf_reset() at the end of the
> function too.
> 
> Other than that, I think this diff looks nice.
> -- 
> Duy

-- 
Brandon Williams

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v2 1/4] real_path: resolve symlinks by hand
  2016-12-09 14:33     ` Johannes Sixt
@ 2016-12-09 20:04       ` Brandon Williams
  0 siblings, 0 replies; 83+ messages in thread
From: Brandon Williams @ 2016-12-09 20:04 UTC (permalink / raw)
  To: Johannes Sixt
  Cc: git, sbeller, peff, jacob.keller, gitster, ramsay, tboegi,
	pclouds

On 12/09, Johannes Sixt wrote:
> Am 09.12.2016 um 00:58 schrieb Brandon Williams:
> >The current implementation of real_path uses chdir() in order to resolve
> >symlinks.  Unfortunately this isn't thread-safe as chdir() affects a
> >process as a whole and not just an individual thread.  Instead perform
> >the symlink resolution by hand so that the calls to chdir() can be
> >removed, making real_path one step closer to being reentrant.
> >
> >Signed-off-by: Brandon Williams <bmwill@google.com>
> >---
> > abspath.c | 183 +++++++++++++++++++++++++++++++++++++++++---------------------
> > 1 file changed, 122 insertions(+), 61 deletions(-)
> >
> >diff --git a/abspath.c b/abspath.c
> >index 2825de8..92f2a29 100644
> >--- a/abspath.c
> >+++ b/abspath.c
> >@@ -11,8 +11,38 @@ int is_directory(const char *path)
> > 	return (!stat(path, &st) && S_ISDIR(st.st_mode));
> > }
> >
> >+/* removes the last path component from 'path' except if 'path' is root */
> >+static void strip_last_component(struct strbuf *path)
> >+{
> >+	if (path->len > offset_1st_component(path->buf)) {
> >+		char *last_slash = find_last_dir_sep(path->buf);
> >+		strbuf_setlen(path, last_slash - path->buf);
> >+	}
> >+}
> 
> This implementation is not correct because when the input is "/foo",
> the result is "" when it should be "/". Also, can the input be a
> non-normalized path? When the input is "foo///bar", should the
> result be "foo" or would "foo//" be an acceptable result? I think it
> should be the former. find_last_dir_sep() returns the last of the
> three slashes, not the first one. Therefore, I've rewritten the
> function thusly:
> 
> static void strip_last_component(struct strbuf *path)
> {
> 	size_t offset = offset_1st_component(path->buf);
> 	size_t len = path->len;
> 	while (offset < len && !is_dir_sep(path->buf[len - 1]))
> 		len--;
> 	while (offset < len && is_dir_sep(path->buf[len - 1]))
> 		len--;
> 	strbuf_setlen(path, len);
> }
> 

Thanks for that catch.  So your rewrite takes the offset of the 1st
component and ensures that we don't cut that part off.  It first strips
all non directory separators and then all directory separators.  This
way "/foo////bar" becomes "/foo" and as you pointed out "/foo" would
become "/".  The offset would also take care of windows drive letters
and the like.  Looks good.  Thanks!

> >+		strbuf_addbuf(&resolved, &next);
> >+
> >+		if (lstat(resolved.buf, &st)) {
> >+			/* error out unless this was the last component */
> >+			if (!(errno == ENOENT && !remaining.len)) {
> 
> Perhaps it was easier to _write_ the condition in this way, but I
> would have an easier time to _read_ it when it is
> 
> 			if (errno != ENOENT || remaining.len) {
> 

Yes I did write it out weird, mostly because it made the most sense for
what I was trying to accomplish (add path components must exist, except
for the very last one).  I'm fine applying DeMorgan's since it looks a
little cleaner.

> >+
> >+			if (is_absolute_path(symlink.buf)) {
> >+				/* absolute symlink; set resolved to root */
> >+				int offset = offset_1st_component(symlink.buf);
> >+				strbuf_reset(&resolved);
> >+				strbuf_add(&resolved, symlink.buf, offset);
> >+				strbuf_remove(&symlink, 0, offset);
> 
> Good. I would have expected some kind of "goto repeat;" because when
> we encounter a symlink to an absolute path, most, if not all, work
> done so far is obsoleted. But I haven't thought too deeply about
> this.

It made the most sense to just reuse the same looping condition that
I already had in place.  Resetting the resolved string to be the root
component of the absolute symlink made it easy to "throw away" all the
old work to allow us to start from scratch again.

-- 
Brandon Williams

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v2 0/4] road to reentrant real_path
  2016-12-09 19:42     ` Brandon Williams
@ 2016-12-10 11:02       ` Duy Nguyen
  0 siblings, 0 replies; 83+ messages in thread
From: Duy Nguyen @ 2016-12-10 11:02 UTC (permalink / raw)
  To: Brandon Williams
  Cc: Git Mailing List, Stefan Beller, Jeff King, Jacob Keller,
	Junio C Hamano, Ramsay Jones, Torsten Bögershausen,
	Johannes Sixt

On Sat, Dec 10, 2016 at 2:42 AM, Brandon Williams <bmwill@google.com> wrote:
> On 12/09, Duy Nguyen wrote:
>> On Fri, Dec 9, 2016 at 6:58 AM, Brandon Williams <bmwill@google.com> wrote:
>> > diff --git a/setup.c b/setup.c
>> > index fe572b8..0d9fdd0 100644
>> > --- a/setup.c
>> > +++ b/setup.c
>> > @@ -254,10 +254,12 @@ int get_common_dir_noenv(struct strbuf *sb, const char *gitdir)
>> >                 if (!is_absolute_path(data.buf))
>> >                         strbuf_addf(&path, "%s/", gitdir);
>> >                 strbuf_addbuf(&path, &data);
>> > -               strbuf_addstr(sb, real_path(path.buf));
>> > +               strbuf_realpath(sb, path.buf, 1);
>>
>> This is not the same because of this hunk in strbuf_realpath()
>
> Then perhaps I shouldn't make this change (and just leave it as is)
> since the way real_path_internal/strbuf_realpath is written requires
> that the strbuf being used for the resolved path only contains the
> resolved path (see the lstat(resolved->buf &st) call).  Sidenote it
> looks like strbuf_getcwd() also does a reset, though more subtlety,
> since it just passes its buffer to getcwd().

Yeah that's ok too (I did not see this subtlety when I suggested the change).
-- 
Duy

^ permalink raw reply	[flat|nested] 83+ messages in thread

* [PATCH v3 0/4] road to reentrant real_path
  2016-12-08 23:58 ` [PATCH v2 0/4] road to reentrant real_path Brandon Williams
                     ` (4 preceding siblings ...)
  2016-12-09 12:33   ` [PATCH v2 0/4] road to reentrant real_path Duy Nguyen
@ 2016-12-12 18:16   ` Brandon Williams
  2016-12-12 18:16     ` [PATCH v3 1/4] real_path: resolve symlinks by hand Brandon Williams
                       ` (5 more replies)
  5 siblings, 6 replies; 83+ messages in thread
From: Brandon Williams @ 2016-12-12 18:16 UTC (permalink / raw)
  To: git
  Cc: Brandon Williams, sbeller, peff, jacob.keller, gitster, ramsay,
	tboegi, j6t, pclouds

Changes in v3:

* Rewrite of `strip_last_component()` as the v2 verison didn't properly handle
  inputs like '/foo'.  Thanks to Johannes for pointing this out and suggesting
  a solution.
* Small style changes
* Revert the call in `get_common_dir_noenv()` to maintain proper functionality.

Brandon Williams (4):
  real_path: resolve symlinks by hand
  real_path: convert real_path_internal to strbuf_realpath
  real_path: create real_pathdup
  real_path: have callers use real_pathdup and strbuf_realpath

 abspath.c         | 222 ++++++++++++++++++++++++++++++++++++------------------
 builtin/init-db.c |   6 +-
 cache.h           |   3 +
 environment.c     |   2 +-
 setup.c           |  13 ++--
 sha1_file.c       |   2 +-
 submodule.c       |   2 +-
 transport.c       |   2 +-
 worktree.c        |   2 +-
 9 files changed, 169 insertions(+), 85 deletions(-)

--- interdiff from v2

diff --git a/abspath.c b/abspath.c
index df37356..79ee310 100644
--- a/abspath.c
+++ b/abspath.c
@@ -14,10 +14,17 @@ int is_directory(const char *path)
 /* removes the last path component from 'path' except if 'path' is root */
 static void strip_last_component(struct strbuf *path)
 {
-	if (path->len > offset_1st_component(path->buf)) {
-		char *last_slash = find_last_dir_sep(path->buf);
-		strbuf_setlen(path, last_slash - path->buf);
-	}
+	size_t offset = offset_1st_component(path->buf);
+	size_t len = path->len;
+
+	/* Find start of the last component */
+	while (offset < len && !is_dir_sep(path->buf[len - 1]))
+		len--;
+	/* Skip sequences of multiple path-separators */
+	while (offset < len && is_dir_sep(path->buf[len - 1]))
+		len--;
+
+	strbuf_setlen(path, len);
 }
 
 /* get (and remove) the next component in 'remaining' and place it in 'next' */
@@ -112,7 +119,7 @@ char *strbuf_realpath(struct strbuf *resolved, const char *path,
 
 		if (lstat(resolved->buf, &st)) {
 			/* error out unless this was the last component */
-			if (!(errno == ENOENT && !remaining.len)) {
+			if (errno != ENOENT || remaining.len) {
 				if (die_on_error)
 					die_errno("Invalid path '%s'",
 						  resolved->buf);
@@ -203,7 +210,7 @@ char *real_pathdup(const char *path)
 	struct strbuf realpath = STRBUF_INIT;
 	char *retval = NULL;
 
-	if(strbuf_realpath(&realpath, path, 0))
+	if (strbuf_realpath(&realpath, path, 0))
 		retval = strbuf_detach(&realpath, NULL);
 
 	strbuf_release(&realpath);
diff --git a/setup.c b/setup.c
index 0d9fdd0..1b534a7 100644
--- a/setup.c
+++ b/setup.c
@@ -254,7 +254,7 @@ int get_common_dir_noenv(struct strbuf *sb, const char *gitdir)
 		if (!is_absolute_path(data.buf))
 			strbuf_addf(&path, "%s/", gitdir);
 		strbuf_addbuf(&path, &data);
-		strbuf_realpath(sb, path.buf, 1);
+		strbuf_addstr(sb, real_path(path.buf));
 		ret = 1;
 	} else {
 		strbuf_addstr(sb, gitdir);

-- 
2.8.0.rc3.226.g39d4020


^ permalink raw reply related	[flat|nested] 83+ messages in thread

* [PATCH v3 1/4] real_path: resolve symlinks by hand
  2016-12-12 18:16   ` [PATCH v3 " Brandon Williams
@ 2016-12-12 18:16     ` Brandon Williams
  2016-12-12 22:19       ` Junio C Hamano
  2016-12-12 18:16     ` [PATCH v3 2/4] real_path: convert real_path_internal to strbuf_realpath Brandon Williams
                       ` (4 subsequent siblings)
  5 siblings, 1 reply; 83+ messages in thread
From: Brandon Williams @ 2016-12-12 18:16 UTC (permalink / raw)
  To: git
  Cc: Brandon Williams, sbeller, peff, jacob.keller, gitster, ramsay,
	tboegi, j6t, pclouds

The current implementation of real_path uses chdir() in order to resolve
symlinks.  Unfortunately this isn't thread-safe as chdir() affects a
process as a whole and not just an individual thread.  Instead perform
the symlink resolution by hand so that the calls to chdir() can be
removed, making real_path one step closer to being reentrant.

Signed-off-by: Brandon Williams <bmwill@google.com>
---
 abspath.c | 190 ++++++++++++++++++++++++++++++++++++++++++--------------------
 1 file changed, 129 insertions(+), 61 deletions(-)

diff --git a/abspath.c b/abspath.c
index 2825de8..cafcae0 100644
--- a/abspath.c
+++ b/abspath.c
@@ -11,8 +11,45 @@ int is_directory(const char *path)
 	return (!stat(path, &st) && S_ISDIR(st.st_mode));
 }
 
+/* removes the last path component from 'path' except if 'path' is root */
+static void strip_last_component(struct strbuf *path)
+{
+	size_t offset = offset_1st_component(path->buf);
+	size_t len = path->len;
+
+	/* Find start of the last component */
+	while (offset < len && !is_dir_sep(path->buf[len - 1]))
+		len--;
+	/* Skip sequences of multiple path-separators */
+	while (offset < len && is_dir_sep(path->buf[len - 1]))
+		len--;
+
+	strbuf_setlen(path, len);
+}
+
+/* get (and remove) the next component in 'remaining' and place it in 'next' */
+static void get_next_component(struct strbuf *next, struct strbuf *remaining)
+{
+	char *start = NULL;
+	char *end = NULL;
+
+	strbuf_reset(next);
+
+	/* look for the next component */
+	/* Skip sequences of multiple path-separators */
+	for (start = remaining->buf; is_dir_sep(*start); start++)
+		; /* nothing */
+	/* Find end of the path component */
+	for (end = start; *end && !is_dir_sep(*end); end++)
+		; /* nothing */
+
+	strbuf_add(next, start, end - start);
+	/* remove the component from 'remaining' */
+	strbuf_remove(remaining, 0, end - remaining->buf);
+}
+
 /* We allow "recursive" symbolic links. Only within reason, though. */
-#define MAXDEPTH 5
+#define MAXSYMLINKS 5
 
 /*
  * Return the real path (i.e., absolute path, with symlinks resolved
@@ -21,7 +58,6 @@ int is_directory(const char *path)
  * absolute_path().)  The return value is a pointer to a static
  * buffer.
  *
- * The input and all intermediate paths must be shorter than MAX_PATH.
  * The directory part of path (i.e., everything up to the last
  * dir_sep) must denote a valid, existing directory, but the last
  * component need not exist.  If die_on_error is set, then die with an
@@ -33,22 +69,16 @@ int is_directory(const char *path)
  */
 static const char *real_path_internal(const char *path, int die_on_error)
 {
-	static struct strbuf sb = STRBUF_INIT;
+	static struct strbuf resolved = STRBUF_INIT;
+	struct strbuf remaining = STRBUF_INIT;
+	struct strbuf next = STRBUF_INIT;
+	struct strbuf symlink = STRBUF_INIT;
 	char *retval = NULL;
-
-	/*
-	 * If we have to temporarily chdir(), store the original CWD
-	 * here so that we can chdir() back to it at the end of the
-	 * function:
-	 */
-	struct strbuf cwd = STRBUF_INIT;
-
-	int depth = MAXDEPTH;
-	char *last_elem = NULL;
+	int num_symlinks = 0;
 	struct stat st;
 
 	/* We've already done it */
-	if (path == sb.buf)
+	if (path == resolved.buf)
 		return path;
 
 	if (!*path) {
@@ -58,74 +88,112 @@ static const char *real_path_internal(const char *path, int die_on_error)
 			goto error_out;
 	}
 
-	strbuf_reset(&sb);
-	strbuf_addstr(&sb, path);
-
-	while (depth--) {
-		if (!is_directory(sb.buf)) {
-			char *last_slash = find_last_dir_sep(sb.buf);
-			if (last_slash) {
-				last_elem = xstrdup(last_slash + 1);
-				strbuf_setlen(&sb, last_slash - sb.buf + 1);
-			} else {
-				last_elem = xmemdupz(sb.buf, sb.len);
-				strbuf_reset(&sb);
-			}
+	strbuf_reset(&resolved);
+
+	if (is_absolute_path(path)) {
+		/* absolute path; start with only root as being resolved */
+		int offset = offset_1st_component(path);
+		strbuf_add(&resolved, path, offset);
+		strbuf_addstr(&remaining, path + offset);
+	} else {
+		/* relative path; can use CWD as the initial resolved path */
+		if (strbuf_getcwd(&resolved)) {
+			if (die_on_error)
+				die_errno("unable to get current working directory");
+			else
+				goto error_out;
 		}
+		strbuf_addstr(&remaining, path);
+	}
 
-		if (sb.len) {
-			if (!cwd.len && strbuf_getcwd(&cwd)) {
+	/* Iterate over the remaining path components */
+	while (remaining.len > 0) {
+		get_next_component(&next, &remaining);
+
+		if (next.len == 0) {
+			continue; /* empty component */
+		} else if (next.len == 1 && !strcmp(next.buf, ".")) {
+			continue; /* '.' component */
+		} else if (next.len == 2 && !strcmp(next.buf, "..")) {
+			/* '..' component; strip the last path component */
+			strip_last_component(&resolved);
+			continue;
+		}
+
+		/* append the next component and resolve resultant path */
+		if (!is_dir_sep(resolved.buf[resolved.len - 1]))
+			strbuf_addch(&resolved, '/');
+		strbuf_addbuf(&resolved, &next);
+
+		if (lstat(resolved.buf, &st)) {
+			/* error out unless this was the last component */
+			if (errno != ENOENT || remaining.len) {
 				if (die_on_error)
-					die_errno("Could not get current working directory");
+					die_errno("Invalid path '%s'",
+						  resolved.buf);
 				else
 					goto error_out;
 			}
+		} else if (S_ISLNK(st.st_mode)) {
+			ssize_t len;
+			strbuf_reset(&symlink);
 
-			if (chdir(sb.buf)) {
+			if (num_symlinks++ > MAXSYMLINKS) {
 				if (die_on_error)
-					die_errno("Could not switch to '%s'",
-						  sb.buf);
+					die("More than %d nested symlinks "
+					    "on path '%s'", MAXSYMLINKS, path);
 				else
 					goto error_out;
 			}
-		}
-		if (strbuf_getcwd(&sb)) {
-			if (die_on_error)
-				die_errno("Could not get current working directory");
-			else
-				goto error_out;
-		}
-
-		if (last_elem) {
-			if (sb.len && !is_dir_sep(sb.buf[sb.len - 1]))
-				strbuf_addch(&sb, '/');
-			strbuf_addstr(&sb, last_elem);
-			free(last_elem);
-			last_elem = NULL;
-		}
 
-		if (!lstat(sb.buf, &st) && S_ISLNK(st.st_mode)) {
-			struct strbuf next_sb = STRBUF_INIT;
-			ssize_t len = strbuf_readlink(&next_sb, sb.buf, 0);
+			len = strbuf_readlink(&symlink, resolved.buf,
+					      st.st_size);
 			if (len < 0) {
 				if (die_on_error)
 					die_errno("Invalid symlink '%s'",
-						  sb.buf);
+						  resolved.buf);
 				else
 					goto error_out;
 			}
-			strbuf_swap(&sb, &next_sb);
-			strbuf_release(&next_sb);
-		} else
-			break;
+
+			if (is_absolute_path(symlink.buf)) {
+				/* absolute symlink; set resolved to root */
+				int offset = offset_1st_component(symlink.buf);
+				strbuf_reset(&resolved);
+				strbuf_add(&resolved, symlink.buf, offset);
+				strbuf_remove(&symlink, 0, offset);
+			} else {
+				/*
+				 * relative symlink
+				 * strip off the last component since it will
+				 * be replaced with the contents of the symlink
+				 */
+				strip_last_component(&resolved);
+			}
+
+			/*
+			 * if there are still remaining components to resolve
+			 * then append them to symlink
+			 */
+			if (remaining.len) {
+				strbuf_addch(&symlink, '/');
+				strbuf_addbuf(&symlink, &remaining);
+			}
+
+			/*
+			 * use the symlink as the remaining components that
+			 * need to be resloved
+			 */
+			strbuf_swap(&symlink, &remaining);
+		}
 	}
 
-	retval = sb.buf;
+	retval = resolved.buf;
+
 error_out:
-	free(last_elem);
-	if (cwd.len && chdir(cwd.buf))
-		die_errno("Could not change back to '%s'", cwd.buf);
-	strbuf_release(&cwd);
+	strbuf_release(&remaining);
+	strbuf_release(&next);
+	strbuf_release(&symlink);
 
 	return retval;
 }
-- 
2.8.0.rc3.226.g39d4020


^ permalink raw reply related	[flat|nested] 83+ messages in thread

* [PATCH v3 2/4] real_path: convert real_path_internal to strbuf_realpath
  2016-12-12 18:16   ` [PATCH v3 " Brandon Williams
  2016-12-12 18:16     ` [PATCH v3 1/4] real_path: resolve symlinks by hand Brandon Williams
@ 2016-12-12 18:16     ` Brandon Williams
  2016-12-12 22:20       ` Junio C Hamano
  2016-12-12 18:16     ` [PATCH v3 3/4] real_path: create real_pathdup Brandon Williams
                       ` (3 subsequent siblings)
  5 siblings, 1 reply; 83+ messages in thread
From: Brandon Williams @ 2016-12-12 18:16 UTC (permalink / raw)
  To: git
  Cc: Brandon Williams, sbeller, peff, jacob.keller, gitster, ramsay,
	tboegi, j6t, pclouds

Change the name of real_path_internal to strbuf_realpath.  In addition
push the static strbuf up to its callers and instead take as a
parameter a pointer to a strbuf to use for the final result.

This change makes strbuf_realpath reentrant.

Signed-off-by: Brandon Williams <bmwill@google.com>
---
 abspath.c | 53 +++++++++++++++++++++++++----------------------------
 cache.h   |  2 ++
 2 files changed, 27 insertions(+), 28 deletions(-)

diff --git a/abspath.c b/abspath.c
index cafcae0..8c6c76b 100644
--- a/abspath.c
+++ b/abspath.c
@@ -55,21 +55,17 @@ static void get_next_component(struct strbuf *next, struct strbuf *remaining)
  * Return the real path (i.e., absolute path, with symlinks resolved
  * and extra slashes removed) equivalent to the specified path.  (If
  * you want an absolute path but don't mind links, use
- * absolute_path().)  The return value is a pointer to a static
- * buffer.
+ * absolute_path().)  Places the resolved realpath in the provided strbuf.
  *
  * The directory part of path (i.e., everything up to the last
  * dir_sep) must denote a valid, existing directory, but the last
  * component need not exist.  If die_on_error is set, then die with an
  * informative error message if there is a problem.  Otherwise, return
  * NULL on errors (without generating any output).
- *
- * If path is our buffer, then return path, as it's already what the
- * user wants.
  */
-static const char *real_path_internal(const char *path, int die_on_error)
+char *strbuf_realpath(struct strbuf *resolved, const char *path,
+		      int die_on_error)
 {
-	static struct strbuf resolved = STRBUF_INIT;
 	struct strbuf remaining = STRBUF_INIT;
 	struct strbuf next = STRBUF_INIT;
 	struct strbuf symlink = STRBUF_INIT;
@@ -77,10 +73,6 @@ static const char *real_path_internal(const char *path, int die_on_error)
 	int num_symlinks = 0;
 	struct stat st;
 
-	/* We've already done it */
-	if (path == resolved.buf)
-		return path;
-
 	if (!*path) {
 		if (die_on_error)
 			die("The empty string is not a valid path");
@@ -88,16 +80,16 @@ static const char *real_path_internal(const char *path, int die_on_error)
 			goto error_out;
 	}
 
-	strbuf_reset(&resolved);
+	strbuf_reset(resolved);
 
 	if (is_absolute_path(path)) {
 		/* absolute path; start with only root as being resolved */
 		int offset = offset_1st_component(path);
-		strbuf_add(&resolved, path, offset);
+		strbuf_add(resolved, path, offset);
 		strbuf_addstr(&remaining, path + offset);
 	} else {
 		/* relative path; can use CWD as the initial resolved path */
-		if (strbuf_getcwd(&resolved)) {
+		if (strbuf_getcwd(resolved)) {
 			if (die_on_error)
 				die_errno("unable to get current working directory");
 			else
@@ -116,21 +108,21 @@ static const char *real_path_internal(const char *path, int die_on_error)
 			continue; /* '.' component */
 		} else if (next.len == 2 && !strcmp(next.buf, "..")) {
 			/* '..' component; strip the last path component */
-			strip_last_component(&resolved);
+			strip_last_component(resolved);
 			continue;
 		}
 
 		/* append the next component and resolve resultant path */
-		if (!is_dir_sep(resolved.buf[resolved.len - 1]))
-			strbuf_addch(&resolved, '/');
-		strbuf_addbuf(&resolved, &next);
+		if (!is_dir_sep(resolved->buf[resolved->len - 1]))
+			strbuf_addch(resolved, '/');
+		strbuf_addbuf(resolved, &next);
 
-		if (lstat(resolved.buf, &st)) {
+		if (lstat(resolved->buf, &st)) {
 			/* error out unless this was the last component */
 			if (errno != ENOENT || remaining.len) {
 				if (die_on_error)
 					die_errno("Invalid path '%s'",
-						  resolved.buf);
+						  resolved->buf);
 				else
 					goto error_out;
 			}
@@ -146,12 +138,12 @@ static const char *real_path_internal(const char *path, int die_on_error)
 					goto error_out;
 			}
 
-			len = strbuf_readlink(&symlink, resolved.buf,
+			len = strbuf_readlink(&symlink, resolved->buf,
 					      st.st_size);
 			if (len < 0) {
 				if (die_on_error)
 					die_errno("Invalid symlink '%s'",
-						  resolved.buf);
+						  resolved->buf);
 				else
 					goto error_out;
 			}
@@ -159,8 +151,8 @@ static const char *real_path_internal(const char *path, int die_on_error)
 			if (is_absolute_path(symlink.buf)) {
 				/* absolute symlink; set resolved to root */
 				int offset = offset_1st_component(symlink.buf);
-				strbuf_reset(&resolved);
-				strbuf_add(&resolved, symlink.buf, offset);
+				strbuf_reset(resolved);
+				strbuf_add(resolved, symlink.buf, offset);
 				strbuf_remove(&symlink, 0, offset);
 			} else {
 				/*
@@ -168,7 +160,7 @@ static const char *real_path_internal(const char *path, int die_on_error)
 				 * strip off the last component since it will
 				 * be replaced with the contents of the symlink
 				 */
-				strip_last_component(&resolved);
+				strip_last_component(resolved);
 			}
 
 			/*
@@ -188,24 +180,29 @@ static const char *real_path_internal(const char *path, int die_on_error)
 		}
 	}
 
-	retval = resolved.buf;
+	retval = resolved->buf;
 
 error_out:
 	strbuf_release(&remaining);
 	strbuf_release(&next);
 	strbuf_release(&symlink);
 
+	if (!retval)
+		strbuf_reset(resolved);
+
 	return retval;
 }
 
 const char *real_path(const char *path)
 {
-	return real_path_internal(path, 1);
+	static struct strbuf realpath = STRBUF_INIT;
+	return strbuf_realpath(&realpath, path, 1);
 }
 
 const char *real_path_if_valid(const char *path)
 {
-	return real_path_internal(path, 0);
+	static struct strbuf realpath = STRBUF_INIT;
+	return strbuf_realpath(&realpath, path, 0);
 }
 
 /*
diff --git a/cache.h b/cache.h
index a50a61a..7a81294 100644
--- a/cache.h
+++ b/cache.h
@@ -1064,6 +1064,8 @@ static inline int is_absolute_path(const char *path)
 	return is_dir_sep(path[0]) || has_dos_drive_prefix(path);
 }
 int is_directory(const char *);
+char *strbuf_realpath(struct strbuf *resolved, const char *path,
+		      int die_on_error);
 const char *real_path(const char *path);
 const char *real_path_if_valid(const char *path);
 const char *absolute_path(const char *path);
-- 
2.8.0.rc3.226.g39d4020


^ permalink raw reply related	[flat|nested] 83+ messages in thread

* [PATCH v3 3/4] real_path: create real_pathdup
  2016-12-12 18:16   ` [PATCH v3 " Brandon Williams
  2016-12-12 18:16     ` [PATCH v3 1/4] real_path: resolve symlinks by hand Brandon Williams
  2016-12-12 18:16     ` [PATCH v3 2/4] real_path: convert real_path_internal to strbuf_realpath Brandon Williams
@ 2016-12-12 18:16     ` Brandon Williams
  2016-12-12 22:25       ` Junio C Hamano
  2016-12-12 18:16     ` [PATCH v3 4/4] real_path: have callers use real_pathdup and strbuf_realpath Brandon Williams
                       ` (2 subsequent siblings)
  5 siblings, 1 reply; 83+ messages in thread
From: Brandon Williams @ 2016-12-12 18:16 UTC (permalink / raw)
  To: git
  Cc: Brandon Williams, sbeller, peff, jacob.keller, gitster, ramsay,
	tboegi, j6t, pclouds

Create real_pathdup which returns a caller owned string of the resolved
realpath based on the provide path.

Signed-off-by: Brandon Williams <bmwill@google.com>
---
 abspath.c | 13 +++++++++++++
 cache.h   |  1 +
 2 files changed, 14 insertions(+)

diff --git a/abspath.c b/abspath.c
index 8c6c76b..79ee310 100644
--- a/abspath.c
+++ b/abspath.c
@@ -205,6 +205,19 @@ const char *real_path_if_valid(const char *path)
 	return strbuf_realpath(&realpath, path, 0);
 }
 
+char *real_pathdup(const char *path)
+{
+	struct strbuf realpath = STRBUF_INIT;
+	char *retval = NULL;
+
+	if (strbuf_realpath(&realpath, path, 0))
+		retval = strbuf_detach(&realpath, NULL);
+
+	strbuf_release(&realpath);
+
+	return retval;
+}
+
 /*
  * Use this to get an absolute path from a relative one. If you want
  * to resolve links, you should use real_path.
diff --git a/cache.h b/cache.h
index 7a81294..e12a5d9 100644
--- a/cache.h
+++ b/cache.h
@@ -1068,6 +1068,7 @@ char *strbuf_realpath(struct strbuf *resolved, const char *path,
 		      int die_on_error);
 const char *real_path(const char *path);
 const char *real_path_if_valid(const char *path);
+char *real_pathdup(const char *path);
 const char *absolute_path(const char *path);
 const char *remove_leading_path(const char *in, const char *prefix);
 const char *relative_path(const char *in, const char *prefix, struct strbuf *sb);
-- 
2.8.0.rc3.226.g39d4020


^ permalink raw reply related	[flat|nested] 83+ messages in thread

* [PATCH v3 4/4] real_path: have callers use real_pathdup and strbuf_realpath
  2016-12-12 18:16   ` [PATCH v3 " Brandon Williams
                       ` (2 preceding siblings ...)
  2016-12-12 18:16     ` [PATCH v3 3/4] real_path: create real_pathdup Brandon Williams
@ 2016-12-12 18:16     ` Brandon Williams
  2016-12-12 22:26       ` Junio C Hamano
  2016-12-21 21:51     ` [PATCH bw/realpath-wo-chdir] real_path: canonicalize directory separators in root parts Johannes Sixt
  2017-01-03 19:09     ` [PATCH v4 0/5] road to reentrant real_path Brandon Williams
  5 siblings, 1 reply; 83+ messages in thread
From: Brandon Williams @ 2016-12-12 18:16 UTC (permalink / raw)
  To: git
  Cc: Brandon Williams, sbeller, peff, jacob.keller, gitster, ramsay,
	tboegi, j6t, pclouds

Migrate callers of real_path() who duplicate the retern value to use
real_pathdup or strbuf_realpath.

Signed-off-by: Brandon Williams <bmwill@google.com>
---
 builtin/init-db.c |  6 +++---
 environment.c     |  2 +-
 setup.c           | 13 ++++++++-----
 sha1_file.c       |  2 +-
 submodule.c       |  2 +-
 transport.c       |  2 +-
 worktree.c        |  2 +-
 7 files changed, 16 insertions(+), 13 deletions(-)

diff --git a/builtin/init-db.c b/builtin/init-db.c
index 2399b97..76d68fa 100644
--- a/builtin/init-db.c
+++ b/builtin/init-db.c
@@ -338,7 +338,7 @@ int init_db(const char *git_dir, const char *real_git_dir,
 {
 	int reinit;
 	int exist_ok = flags & INIT_DB_EXIST_OK;
-	char *original_git_dir = xstrdup(real_path(git_dir));
+	char *original_git_dir = real_pathdup(git_dir);
 
 	if (real_git_dir) {
 		struct stat st;
@@ -489,7 +489,7 @@ int cmd_init_db(int argc, const char **argv, const char *prefix)
 	argc = parse_options(argc, argv, prefix, init_db_options, init_db_usage, 0);
 
 	if (real_git_dir && !is_absolute_path(real_git_dir))
-		real_git_dir = xstrdup(real_path(real_git_dir));
+		real_git_dir = real_pathdup(real_git_dir);
 
 	if (argc == 1) {
 		int mkdir_tried = 0;
@@ -560,7 +560,7 @@ int cmd_init_db(int argc, const char **argv, const char *prefix)
 		const char *git_dir_parent = strrchr(git_dir, '/');
 		if (git_dir_parent) {
 			char *rel = xstrndup(git_dir, git_dir_parent - git_dir);
-			git_work_tree_cfg = xstrdup(real_path(rel));
+			git_work_tree_cfg = real_pathdup(rel);
 			free(rel);
 		}
 		if (!git_work_tree_cfg)
diff --git a/environment.c b/environment.c
index 0935ec6..9b943d2 100644
--- a/environment.c
+++ b/environment.c
@@ -259,7 +259,7 @@ void set_git_work_tree(const char *new_work_tree)
 		return;
 	}
 	git_work_tree_initialized = 1;
-	work_tree = xstrdup(real_path(new_work_tree));
+	work_tree = real_pathdup(new_work_tree);
 }
 
 const char *get_git_work_tree(void)
diff --git a/setup.c b/setup.c
index fe572b8..1b534a7 100644
--- a/setup.c
+++ b/setup.c
@@ -256,8 +256,10 @@ int get_common_dir_noenv(struct strbuf *sb, const char *gitdir)
 		strbuf_addbuf(&path, &data);
 		strbuf_addstr(sb, real_path(path.buf));
 		ret = 1;
-	} else
+	} else {
 		strbuf_addstr(sb, gitdir);
+	}
+
 	strbuf_release(&data);
 	strbuf_release(&path);
 	return ret;
@@ -692,7 +694,7 @@ static const char *setup_discovered_git_dir(const char *gitdir,
 	/* --work-tree is set without --git-dir; use discovered one */
 	if (getenv(GIT_WORK_TREE_ENVIRONMENT) || git_work_tree_cfg) {
 		if (offset != cwd->len && !is_absolute_path(gitdir))
-			gitdir = xstrdup(real_path(gitdir));
+			gitdir = real_pathdup(gitdir);
 		if (chdir(cwd->buf))
 			die_errno("Could not come back to cwd");
 		return setup_explicit_git_dir(gitdir, cwd, nongit_ok);
@@ -800,11 +802,12 @@ static int canonicalize_ceiling_entry(struct string_list_item *item,
 		/* Keep entry but do not canonicalize it */
 		return 1;
 	} else {
-		const char *real_path = real_path_if_valid(ceil);
-		if (!real_path)
+		char *real_path = real_pathdup(ceil);
+		if (!real_path) {
 			return 0;
+		}
 		free(item->string);
-		item->string = xstrdup(real_path);
+		item->string = real_path;
 		return 1;
 	}
 }
diff --git a/sha1_file.c b/sha1_file.c
index 9c86d19..6a03cc3 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -291,7 +291,7 @@ static int link_alt_odb_entry(const char *entry, const char *relative_base,
 	struct strbuf pathbuf = STRBUF_INIT;
 
 	if (!is_absolute_path(entry) && relative_base) {
-		strbuf_addstr(&pathbuf, real_path(relative_base));
+		strbuf_realpath(&pathbuf, relative_base, 1);
 		strbuf_addch(&pathbuf, '/');
 	}
 	strbuf_addstr(&pathbuf, entry);
diff --git a/submodule.c b/submodule.c
index 6f7d883..c85ba50 100644
--- a/submodule.c
+++ b/submodule.c
@@ -1227,7 +1227,7 @@ void connect_work_tree_and_git_dir(const char *work_tree, const char *git_dir)
 {
 	struct strbuf file_name = STRBUF_INIT;
 	struct strbuf rel_path = STRBUF_INIT;
-	const char *real_work_tree = xstrdup(real_path(work_tree));
+	const char *real_work_tree = real_pathdup(work_tree);
 
 	/* Update gitfile */
 	strbuf_addf(&file_name, "%s/.git", work_tree);
diff --git a/transport.c b/transport.c
index d57e8de..236c6f6 100644
--- a/transport.c
+++ b/transport.c
@@ -1130,7 +1130,7 @@ static int refs_from_alternate_cb(struct alternate_object_database *e,
 	const struct ref *extra;
 	struct alternate_refs_data *cb = data;
 
-	other = xstrdup(real_path(e->path));
+	other = real_pathdup(e->path);
 	len = strlen(other);
 
 	while (other[len-1] == '/')
diff --git a/worktree.c b/worktree.c
index f7869f8..c90e013 100644
--- a/worktree.c
+++ b/worktree.c
@@ -255,7 +255,7 @@ struct worktree *find_worktree(struct worktree **list,
 		return wt;
 
 	arg = prefix_filename(prefix, strlen(prefix), arg);
-	path = xstrdup(real_path(arg));
+	path = real_pathdup(arg);
 	for (; *list; list++)
 		if (!fspathcmp(path, real_path((*list)->path)))
 			break;
-- 
2.8.0.rc3.226.g39d4020


^ permalink raw reply related	[flat|nested] 83+ messages in thread

* Re: [PATCH v3 1/4] real_path: resolve symlinks by hand
  2016-12-12 18:16     ` [PATCH v3 1/4] real_path: resolve symlinks by hand Brandon Williams
@ 2016-12-12 22:19       ` Junio C Hamano
  2016-12-12 22:50         ` Brandon Williams
  0 siblings, 1 reply; 83+ messages in thread
From: Junio C Hamano @ 2016-12-12 22:19 UTC (permalink / raw)
  To: Brandon Williams
  Cc: git, sbeller, peff, jacob.keller, ramsay, tboegi, j6t, pclouds

Brandon Williams <bmwill@google.com> writes:

> +/* removes the last path component from 'path' except if 'path' is root */
> +static void strip_last_component(struct strbuf *path)
> +{
> +	size_t offset = offset_1st_component(path->buf);
> +	size_t len = path->len;
> +
> +	/* Find start of the last component */
> +	while (offset < len && !is_dir_sep(path->buf[len - 1]))
> +		len--;

If somebody at a higher level in the callchain has already
normalized path, this is not a problem, but this will behave
"unexpectedly" when path ends with a dir_sep byte (or more).

E.g. for input path "foo/bar/", the above loop runs zero times and
then ...

> +	/* Skip sequences of multiple path-separators */
> +	while (offset < len && is_dir_sep(path->buf[len - 1]))
> +		len--;

... the slash at the end is removed, leaving "foo/bar" in path.

> +	strbuf_setlen(path, len);
> +}
> ...
> +/* get (and remove) the next component in 'remaining' and place it in 'next' */
> +static void get_next_component(struct strbuf *next, struct strbuf *remaining)
> +{
> +	char *start = NULL;
> +	char *end = NULL;
> +
> +	strbuf_reset(next);
> +
> +	/* look for the next component */
> +	/* Skip sequences of multiple path-separators */
> +	for (start = remaining->buf; is_dir_sep(*start); start++)
> +		; /* nothing */
> +	/* Find end of the path component */
> +	for (end = start; *end && !is_dir_sep(*end); end++)
> +		; /* nothing */
> +
> +	strbuf_add(next, start, end - start);
> +	/* remove the component from 'remaining' */
> +	strbuf_remove(remaining, 0, end - remaining->buf);
> +}

Unlike the strip_last_component(), I think this one is more
carefully done and avoids getting fooled by //extra//slashes// at
the beginning or at the end, which does help in the correctness of
the loop we see below.

> @@ -58,74 +88,112 @@ static const char *real_path_internal(const char *path, int die_on_error)
>  			goto error_out;
>  	}
>  
> +	strbuf_reset(&resolved);
> +
> +	if (is_absolute_path(path)) {
> +		/* absolute path; start with only root as being resolved */
> +		int offset = offset_1st_component(path);
> +		strbuf_add(&resolved, path, offset);
> +		strbuf_addstr(&remaining, path + offset);
> +	} else {
> +		/* relative path; can use CWD as the initial resolved path */
> +		if (strbuf_getcwd(&resolved)) {
> +			if (die_on_error)
> +				die_errno("unable to get current working directory");
> +			else
> +				goto error_out;
>  		}
> +		strbuf_addstr(&remaining, path);
> +	}
>  
> +	/* Iterate over the remaining path components */
> +	while (remaining.len > 0) {
> +		get_next_component(&next, &remaining);
> +
> +		if (next.len == 0) {
> +			continue; /* empty component */
> +		} else if (next.len == 1 && !strcmp(next.buf, ".")) {
> +			continue; /* '.' component */
> +		} else if (next.len == 2 && !strcmp(next.buf, "..")) {
> +			/* '..' component; strip the last path component */
> +			strip_last_component(&resolved);

Wouldn't this let "resolved" eventually run out of the path
components to strip for a malformed input e.g. "/a/../../b"?

> + ...
> +			/*
> +			 * if there are still remaining components to resolve
> +			 * then append them to symlink
> +			 */
> +			if (remaining.len) {
> +				strbuf_addch(&symlink, '/');

This can add duplicate dir_sep if readlink(2)'ed contents of the
symbolic link already ends with a slash, but I think it (together
with the fact that the code does nothing to normalize what is read
from the symbolic link) probably does not matter, given the way how
get_next_component() is implemented.

> +				strbuf_addbuf(&symlink, &remaining);
> +			}
> +
> +			/*
> +			 * use the symlink as the remaining components that
> +			 * need to be resloved
> +			 */
> +			strbuf_swap(&symlink, &remaining);
> +		}
>  	}
>  
> +	retval = resolved.buf;
> +
>  error_out:
> +	strbuf_release(&remaining);
> +	strbuf_release(&next);
> +	strbuf_release(&symlink);
>  
>  	return retval;
>  }


^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v3 2/4] real_path: convert real_path_internal to strbuf_realpath
  2016-12-12 18:16     ` [PATCH v3 2/4] real_path: convert real_path_internal to strbuf_realpath Brandon Williams
@ 2016-12-12 22:20       ` Junio C Hamano
  0 siblings, 0 replies; 83+ messages in thread
From: Junio C Hamano @ 2016-12-12 22:20 UTC (permalink / raw)
  To: Brandon Williams
  Cc: git, sbeller, peff, jacob.keller, ramsay, tboegi, j6t, pclouds

Brandon Williams <bmwill@google.com> writes:

> Change the name of real_path_internal to strbuf_realpath.  In addition
> push the static strbuf up to its callers and instead take as a
> parameter a pointer to a strbuf to use for the final result.
>
> This change makes strbuf_realpath reentrant.

Yup, this step makes sense.

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v3 3/4] real_path: create real_pathdup
  2016-12-12 18:16     ` [PATCH v3 3/4] real_path: create real_pathdup Brandon Williams
@ 2016-12-12 22:25       ` Junio C Hamano
  0 siblings, 0 replies; 83+ messages in thread
From: Junio C Hamano @ 2016-12-12 22:25 UTC (permalink / raw)
  To: Brandon Williams
  Cc: git, sbeller, peff, jacob.keller, ramsay, tboegi, j6t, pclouds

Brandon Williams <bmwill@google.com> writes:

> Create real_pathdup which returns a caller owned string of the resolved
> realpath based on the provide path.
>
> Signed-off-by: Brandon Williams <bmwill@google.com>
> ---
>  abspath.c | 13 +++++++++++++
>  cache.h   |  1 +
>  2 files changed, 14 insertions(+)
>
> diff --git a/abspath.c b/abspath.c
> index 8c6c76b..79ee310 100644
> --- a/abspath.c
> +++ b/abspath.c
> @@ -205,6 +205,19 @@ const char *real_path_if_valid(const char *path)
>  	return strbuf_realpath(&realpath, path, 0);
>  }
>  
> +char *real_pathdup(const char *path)
> +{
> +	struct strbuf realpath = STRBUF_INIT;
> +	char *retval = NULL;
> +
> +	if (strbuf_realpath(&realpath, path, 0))
> +		retval = strbuf_detach(&realpath, NULL);
> +
> +	strbuf_release(&realpath);
> +
> +	return retval;
> +}

OK.  Taken alone, the above makes a reader wonder if it still makes
sense for strbuf_realpath() to return realpath.buf (or NULL upon
error).  An obvious alternative is to return 0 on success and -1 on
failure.

But other callers of strbuf_realpath() are mimicking the historical
"give a path in 'const char *', receive a path in 'const char *'
that you have to xstrdup() if you want to keep it" interface, and
this interface may be easier for them.  I dunno.

>  /*
>   * Use this to get an absolute path from a relative one. If you want
>   * to resolve links, you should use real_path.
> diff --git a/cache.h b/cache.h
> index 7a81294..e12a5d9 100644
> --- a/cache.h
> +++ b/cache.h
> @@ -1068,6 +1068,7 @@ char *strbuf_realpath(struct strbuf *resolved, const char *path,
>  		      int die_on_error);
>  const char *real_path(const char *path);
>  const char *real_path_if_valid(const char *path);
> +char *real_pathdup(const char *path);
>  const char *absolute_path(const char *path);
>  const char *remove_leading_path(const char *in, const char *prefix);
>  const char *relative_path(const char *in, const char *prefix, struct strbuf *sb);

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v3 4/4] real_path: have callers use real_pathdup and strbuf_realpath
  2016-12-12 18:16     ` [PATCH v3 4/4] real_path: have callers use real_pathdup and strbuf_realpath Brandon Williams
@ 2016-12-12 22:26       ` Junio C Hamano
  2016-12-12 23:47         ` Junio C Hamano
  0 siblings, 1 reply; 83+ messages in thread
From: Junio C Hamano @ 2016-12-12 22:26 UTC (permalink / raw)
  To: Brandon Williams
  Cc: git, sbeller, peff, jacob.keller, ramsay, tboegi, j6t, pclouds

Brandon Williams <bmwill@google.com> writes:

> Migrate callers of real_path() who duplicate the retern value to use
> real_pathdup or strbuf_realpath.

Looks good.

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v3 1/4] real_path: resolve symlinks by hand
  2016-12-12 22:19       ` Junio C Hamano
@ 2016-12-12 22:50         ` Brandon Williams
  2016-12-12 23:32           ` Junio C Hamano
  0 siblings, 1 reply; 83+ messages in thread
From: Brandon Williams @ 2016-12-12 22:50 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git, sbeller, peff, jacob.keller, ramsay, tboegi, j6t, pclouds

On 12/12, Junio C Hamano wrote:
> Brandon Williams <bmwill@google.com> writes:
> 
> > +/* removes the last path component from 'path' except if 'path' is root */
> > +static void strip_last_component(struct strbuf *path)
> > +{
> > +	size_t offset = offset_1st_component(path->buf);
> > +	size_t len = path->len;
> > +
> > +	/* Find start of the last component */
> > +	while (offset < len && !is_dir_sep(path->buf[len - 1]))
> > +		len--;
> 
> If somebody at a higher level in the callchain has already
> normalized path, this is not a problem, but this will behave
> "unexpectedly" when path ends with a dir_sep byte (or more).
> 
> E.g. for input path "foo/bar/", the above loop runs zero times and
> then ...
> 
> > +	/* Skip sequences of multiple path-separators */
> > +	while (offset < len && is_dir_sep(path->buf[len - 1]))
> > +		len--;
> 
> ... the slash at the end is removed, leaving "foo/bar" in path.
> 

The way this is currently used I don't believe this scenario can happen
(since input to this shouldn't have trailing slashes), but if others
begin to use this function then yes, that is an implicit assumption.  I
think it may be an ok assumption though since this is only called on
"resolved" which is the ouput and needs to be normalized to begin with. To
fix this we could simply add the while loop that strips dir_sep at the
beginning as well as at the end, like so:

  /* Skip sequences of multiple path-separators */
  while (offset < len && is_dir_sep(path->buf[len - 1]))
  	len--;
  /* Skip sequences of multiple path-separators */
  while (offset < len && !is_dir_sep(path->buf[len - 1]))
  	len--;
  /* Skip sequences of multiple path-separators */
  while (offset < len && is_dir_sep(path->buf[len - 1]))
  	len--;

> > +	strbuf_setlen(path, len);
> > +}
> > ...
> > +/* get (and remove) the next component in 'remaining' and place it in 'next' */
> > +static void get_next_component(struct strbuf *next, struct strbuf *remaining)
> > +{
> > +	char *start = NULL;
> > +	char *end = NULL;
> > +
> > +	strbuf_reset(next);
> > +
> > +	/* look for the next component */
> > +	/* Skip sequences of multiple path-separators */
> > +	for (start = remaining->buf; is_dir_sep(*start); start++)
> > +		; /* nothing */
> > +	/* Find end of the path component */
> > +	for (end = start; *end && !is_dir_sep(*end); end++)
> > +		; /* nothing */
> > +
> > +	strbuf_add(next, start, end - start);
> > +	/* remove the component from 'remaining' */
> > +	strbuf_remove(remaining, 0, end - remaining->buf);
> > +}
> 
> Unlike the strip_last_component(), I think this one is more
> carefully done and avoids getting fooled by //extra//slashes// at
> the beginning or at the end, which does help in the correctness of
> the loop we see below.
> 
> > @@ -58,74 +88,112 @@ static const char *real_path_internal(const char *path, int die_on_error)
> >  			goto error_out;
> >  	}
> >  
> > +	strbuf_reset(&resolved);
> > +
> > +	if (is_absolute_path(path)) {
> > +		/* absolute path; start with only root as being resolved */
> > +		int offset = offset_1st_component(path);
> > +		strbuf_add(&resolved, path, offset);
> > +		strbuf_addstr(&remaining, path + offset);
> > +	} else {
> > +		/* relative path; can use CWD as the initial resolved path */
> > +		if (strbuf_getcwd(&resolved)) {
> > +			if (die_on_error)
> > +				die_errno("unable to get current working directory");
> > +			else
> > +				goto error_out;
> >  		}
> > +		strbuf_addstr(&remaining, path);
> > +	}
> >  
> > +	/* Iterate over the remaining path components */
> > +	while (remaining.len > 0) {
> > +		get_next_component(&next, &remaining);
> > +
> > +		if (next.len == 0) {
> > +			continue; /* empty component */
> > +		} else if (next.len == 1 && !strcmp(next.buf, ".")) {
> > +			continue; /* '.' component */
> > +		} else if (next.len == 2 && !strcmp(next.buf, "..")) {
> > +			/* '..' component; strip the last path component */
> > +			strip_last_component(&resolved);
> 
> Wouldn't this let "resolved" eventually run out of the path
> components to strip for a malformed input e.g. "/a/../../b"?
> 

As I understand it, that path is correct and would resolve to "/b".  The
strip_last_component function doesn't allow striping the "1st" component
or the "root" component.  So if resolved is "/" and we encounter a ".."
which requires striping of the last component, the result would be "/".

> > + ...
> > +			/*
> > +			 * if there are still remaining components to resolve
> > +			 * then append them to symlink
> > +			 */
> > +			if (remaining.len) {
> > +				strbuf_addch(&symlink, '/');
> 
> This can add duplicate dir_sep if readlink(2)'ed contents of the
> symbolic link already ends with a slash, but I think it (together
> with the fact that the code does nothing to normalize what is read
> from the symbolic link) probably does not matter, given the way how
> get_next_component() is implemented.
> 

Yes, I think the way get_next_component() is written will account for
non-normalized symlink contents.  This way we only have to worry about
normalizing input in one location (maybe two with
strip_last_component()).

> > +				strbuf_addbuf(&symlink, &remaining);
> > +			}
> > +
> > +			/*
> > +			 * use the symlink as the remaining components that
> > +			 * need to be resloved
> > +			 */
> > +			strbuf_swap(&symlink, &remaining);
> > +		}
> >  	}
> >  
> > +	retval = resolved.buf;
> > +
> >  error_out:
> > +	strbuf_release(&remaining);
> > +	strbuf_release(&next);
> > +	strbuf_release(&symlink);
> >  
> >  	return retval;
> >  }
> 

-- 
Brandon Williams

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v3 1/4] real_path: resolve symlinks by hand
  2016-12-12 22:50         ` Brandon Williams
@ 2016-12-12 23:32           ` Junio C Hamano
  0 siblings, 0 replies; 83+ messages in thread
From: Junio C Hamano @ 2016-12-12 23:32 UTC (permalink / raw)
  To: Brandon Williams
  Cc: git, sbeller, peff, jacob.keller, ramsay, tboegi, j6t, pclouds

Brandon Williams <bmwill@google.com> writes:

> On 12/12, Junio C Hamano wrote:
> ...
>> E.g. for input path "foo/bar/", the above loop runs zero times and
>> then ...
>> 
>> > +	/* Skip sequences of multiple path-separators */
>> > +	while (offset < len && is_dir_sep(path->buf[len - 1]))
>> > +		len--;
>> 
>> ... the slash at the end is removed, leaving "foo/bar" in path.
>
> The way this is currently used I don't believe this scenario can happen
> (since input to this shouldn't have trailing slashes), but if others
> begin to use this function then yes, that is an implicit assumption.

OK.

>> > +	/* Iterate over the remaining path components */
>> > +	while (remaining.len > 0) {
>> > +		get_next_component(&next, &remaining);
>> > +
>> > +		if (next.len == 0) {
>> > +			continue; /* empty component */
>> > +		} else if (next.len == 1 && !strcmp(next.buf, ".")) {
>> > +			continue; /* '.' component */
>> > +		} else if (next.len == 2 && !strcmp(next.buf, "..")) {
>> > +			/* '..' component; strip the last path component */
>> > +			strip_last_component(&resolved);
>> 
>> Wouldn't this let "resolved" eventually run out of the path
>> components to strip for a malformed input e.g. "/a/../../b"?
>
> As I understand it, that path is correct and would resolve to "/b".

That is OK on the traditional UNIX end.  

I am not sure if we want to handle the "//host/share/$remaining" in
this codepath, though.  If we do, then this is still an issue (IIRC,
somebody explained that we do not want to step into //host/share/
part when seeing ".."---we'd at least need to leave a NEEDSWORK
comment so that interested parties can later take a look into it.

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v3 4/4] real_path: have callers use real_pathdup and strbuf_realpath
  2016-12-12 22:26       ` Junio C Hamano
@ 2016-12-12 23:47         ` Junio C Hamano
  2016-12-12 23:58           ` Stefan Beller
  0 siblings, 1 reply; 83+ messages in thread
From: Junio C Hamano @ 2016-12-12 23:47 UTC (permalink / raw)
  To: Brandon Williams
  Cc: git, sbeller, peff, jacob.keller, ramsay, tboegi, j6t, pclouds

Junio C Hamano <gitster@pobox.com> writes:

> Brandon Williams <bmwill@google.com> writes:
>
>> Migrate callers of real_path() who duplicate the retern value to use
>> real_pathdup or strbuf_realpath.
>
> Looks good.

This has small non-textual conflicts with Stefan's embed^Wabsorption
topic; please holler if you spot my mismerge.  Thanks.


^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v3 4/4] real_path: have callers use real_pathdup and strbuf_realpath
  2016-12-12 23:47         ` Junio C Hamano
@ 2016-12-12 23:58           ` Stefan Beller
  2016-12-13  1:15             ` Brandon Williams
  0 siblings, 1 reply; 83+ messages in thread
From: Stefan Beller @ 2016-12-12 23:58 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Brandon Williams, git@vger.kernel.org, Jeff King, Jacob Keller,
	Ramsay Jones, Torsten Bögershausen, Johannes Sixt,
	Duy Nguyen

On Mon, Dec 12, 2016 at 3:47 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Junio C Hamano <gitster@pobox.com> writes:
>
>> Brandon Williams <bmwill@google.com> writes:
>>
>>> Migrate callers of real_path() who duplicate the retern value to use
>>> real_pathdup or strbuf_realpath.
>>
>> Looks good.
>
> This has small non-textual conflicts with Stefan's embed^Wabsorption
> topic; please holler if you spot my mismerge.  Thanks.
>

5f6a003727 (Merge branch 'bw/realpath-wo-chdir' into jch)
looks good to me.

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v3 4/4] real_path: have callers use real_pathdup and strbuf_realpath
  2016-12-12 23:58           ` Stefan Beller
@ 2016-12-13  1:15             ` Brandon Williams
  2016-12-13  6:39               ` Junio C Hamano
  0 siblings, 1 reply; 83+ messages in thread
From: Brandon Williams @ 2016-12-13  1:15 UTC (permalink / raw)
  To: Stefan Beller
  Cc: Junio C Hamano, git@vger.kernel.org, Jeff King, Jacob Keller,
	Ramsay Jones, Torsten Bögershausen, Johannes Sixt,
	Duy Nguyen

On 12/12, Stefan Beller wrote:
> On Mon, Dec 12, 2016 at 3:47 PM, Junio C Hamano <gitster@pobox.com> wrote:
> > Junio C Hamano <gitster@pobox.com> writes:
> >
> >> Brandon Williams <bmwill@google.com> writes:
> >>
> >>> Migrate callers of real_path() who duplicate the retern value to use
> >>> real_pathdup or strbuf_realpath.
> >>
> >> Looks good.
> >
> > This has small non-textual conflicts with Stefan's embed^Wabsorption
> > topic; please holler if you spot my mismerge.  Thanks.
> >
> 
> 5f6a003727 (Merge branch 'bw/realpath-wo-chdir' into jch)
> looks good to me.

Looks good to me as well.

-- 
Brandon Williams

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v3 4/4] real_path: have callers use real_pathdup and strbuf_realpath
  2016-12-13  1:15             ` Brandon Williams
@ 2016-12-13  6:39               ` Junio C Hamano
  0 siblings, 0 replies; 83+ messages in thread
From: Junio C Hamano @ 2016-12-13  6:39 UTC (permalink / raw)
  To: Brandon Williams
  Cc: Stefan Beller, git@vger.kernel.org, Jeff King, Jacob Keller,
	Ramsay Jones, Torsten Bögershausen, Johannes Sixt,
	Duy Nguyen

Brandon Williams <bmwill@google.com> writes:

> On 12/12, Stefan Beller wrote:
>> On Mon, Dec 12, 2016 at 3:47 PM, Junio C Hamano <gitster@pobox.com> wrote:
>> > Junio C Hamano <gitster@pobox.com> writes:
>> >
>> >> Brandon Williams <bmwill@google.com> writes:
>> >>
>> >>> Migrate callers of real_path() who duplicate the retern value to use
>> >>> real_pathdup or strbuf_realpath.
>> >>
>> >> Looks good.
>> >
>> > This has small non-textual conflicts with Stefan's embed^Wabsorption
>> > topic; please holler if you spot my mismerge.  Thanks.
>> 
>> 5f6a003727 (Merge branch 'bw/realpath-wo-chdir' into jch)
>> looks good to me.
>
> Looks good to me as well.

Thanks.

^ permalink raw reply	[flat|nested] 83+ messages in thread

* [PATCH bw/realpath-wo-chdir] real_path: canonicalize directory separators in root parts
  2016-12-12 18:16   ` [PATCH v3 " Brandon Williams
                       ` (3 preceding siblings ...)
  2016-12-12 18:16     ` [PATCH v3 4/4] real_path: have callers use real_pathdup and strbuf_realpath Brandon Williams
@ 2016-12-21 21:51     ` Johannes Sixt
  2016-12-21 22:33       ` Brandon Williams
  2017-01-03 19:09     ` [PATCH v4 0/5] road to reentrant real_path Brandon Williams
  5 siblings, 1 reply; 83+ messages in thread
From: Johannes Sixt @ 2016-12-21 21:51 UTC (permalink / raw)
  To: Brandon Williams
  Cc: git, sbeller, peff, jacob.keller, gitster, ramsay, tboegi,
	pclouds

When an absolute path is resolved, resolution begins at the first path
component after the root part. The root part is just copied verbatim,
because it must not be inspected for symbolic links. For POSIX paths,
this is just the initial slash, but on Windows, the root part has the
forms c:\ or \\server\share. We do want to canonicalize the back-slashes
in the root part because these parts are compared to the result of
getcwd(), which does return a fully canonicalized path.

Factor out a helper that splits off the root part, and have it
canonicalize the copied part.

This change was prompted because t1504-ceiling-dirs.sh caught a breakage
in GIT_CEILING_DIRECTORIES handling on Windows.

Signed-off-by: Johannes Sixt <j6t@kdbg.org>
---
 This introduces the second #ifdef GIT_WINDOWS_NATIVE in this file.
 It could be avoided if convert_slashes were defined as a do-nothing
 on POSIX, but that would not help the other occurrence. Therefore,
 I suggest to leave it at this.

 abspath.c | 29 +++++++++++++++++------------
 1 file changed, 17 insertions(+), 12 deletions(-)

diff --git a/abspath.c b/abspath.c
index 79ee310867..1d56f5ed9f 100644
--- a/abspath.c
+++ b/abspath.c
@@ -48,6 +48,19 @@ static void get_next_component(struct strbuf *next, struct strbuf *remaining)
 	strbuf_remove(remaining, 0, end - remaining->buf);
 }
 
+/* copies root part from remaining to resolved, canonicalizing it on the way */
+static void get_root_part(struct strbuf *resolved, struct strbuf *remaining)
+{
+	int offset = offset_1st_component(remaining->buf);
+
+	strbuf_reset(resolved);
+	strbuf_add(resolved, remaining->buf, offset);
+#ifdef GIT_WINDOWS_NATIVE
+	convert_slashes(resolved->buf);
+#endif
+	strbuf_remove(remaining, 0, offset);
+}
+
 /* We allow "recursive" symbolic links. Only within reason, though. */
 #define MAXSYMLINKS 5
 
@@ -80,14 +93,10 @@ char *strbuf_realpath(struct strbuf *resolved, const char *path,
 			goto error_out;
 	}
 
-	strbuf_reset(resolved);
+	strbuf_addstr(&remaining, path);
+	get_root_part(resolved, &remaining);
 
-	if (is_absolute_path(path)) {
-		/* absolute path; start with only root as being resolved */
-		int offset = offset_1st_component(path);
-		strbuf_add(resolved, path, offset);
-		strbuf_addstr(&remaining, path + offset);
-	} else {
+	if (!resolved->len) {
 		/* relative path; can use CWD as the initial resolved path */
 		if (strbuf_getcwd(resolved)) {
 			if (die_on_error)
@@ -95,7 +104,6 @@ char *strbuf_realpath(struct strbuf *resolved, const char *path,
 			else
 				goto error_out;
 		}
-		strbuf_addstr(&remaining, path);
 	}
 
 	/* Iterate over the remaining path components */
@@ -150,10 +158,7 @@ char *strbuf_realpath(struct strbuf *resolved, const char *path,
 
 			if (is_absolute_path(symlink.buf)) {
 				/* absolute symlink; set resolved to root */
-				int offset = offset_1st_component(symlink.buf);
-				strbuf_reset(resolved);
-				strbuf_add(resolved, symlink.buf, offset);
-				strbuf_remove(&symlink, 0, offset);
+				get_root_part(resolved, &symlink);
 			} else {
 				/*
 				 * relative symlink
-- 
2.11.0.79.gf6b77ca


^ permalink raw reply related	[flat|nested] 83+ messages in thread

* Re: [PATCH bw/realpath-wo-chdir] real_path: canonicalize directory separators in root parts
  2016-12-21 21:51     ` [PATCH bw/realpath-wo-chdir] real_path: canonicalize directory separators in root parts Johannes Sixt
@ 2016-12-21 22:33       ` Brandon Williams
  2016-12-22  6:07         ` Johannes Sixt
  0 siblings, 1 reply; 83+ messages in thread
From: Brandon Williams @ 2016-12-21 22:33 UTC (permalink / raw)
  To: Johannes Sixt
  Cc: git, sbeller, peff, jacob.keller, gitster, ramsay, tboegi,
	pclouds

On 12/21, Johannes Sixt wrote:
> When an absolute path is resolved, resolution begins at the first path
> component after the root part. The root part is just copied verbatim,
> because it must not be inspected for symbolic links. For POSIX paths,
> this is just the initial slash, but on Windows, the root part has the
> forms c:\ or \\server\share. We do want to canonicalize the back-slashes
> in the root part because these parts are compared to the result of
> getcwd(), which does return a fully canonicalized path.
> 
> Factor out a helper that splits off the root part, and have it
> canonicalize the copied part.
> 
> This change was prompted because t1504-ceiling-dirs.sh caught a breakage
> in GIT_CEILING_DIRECTORIES handling on Windows.
> 
> Signed-off-by: Johannes Sixt <j6t@kdbg.org>
> ---
>  This introduces the second #ifdef GIT_WINDOWS_NATIVE in this file.
>  It could be avoided if convert_slashes were defined as a do-nothing
>  on POSIX, but that would not help the other occurrence. Therefore,
>  I suggest to leave it at this.
> 
>  abspath.c | 29 +++++++++++++++++------------
>  1 file changed, 17 insertions(+), 12 deletions(-)
> 
> diff --git a/abspath.c b/abspath.c
> index 79ee310867..1d56f5ed9f 100644
> --- a/abspath.c
> +++ b/abspath.c
> @@ -48,6 +48,19 @@ static void get_next_component(struct strbuf *next, struct strbuf *remaining)
>  	strbuf_remove(remaining, 0, end - remaining->buf);
>  }
>  
> +/* copies root part from remaining to resolved, canonicalizing it on the way */
> +static void get_root_part(struct strbuf *resolved, struct strbuf *remaining)
> +{
> +	int offset = offset_1st_component(remaining->buf);
> +
> +	strbuf_reset(resolved);
> +	strbuf_add(resolved, remaining->buf, offset);
> +#ifdef GIT_WINDOWS_NATIVE
> +	convert_slashes(resolved->buf);
> +#endif

So then the only extra cononicalization that is happening here is
converting '\\server\share' to '//server/share'? (or 'c:\' to 'c:/')

-- 
Brandon Williams

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH bw/realpath-wo-chdir] real_path: canonicalize directory separators in root parts
  2016-12-21 22:33       ` Brandon Williams
@ 2016-12-22  6:07         ` Johannes Sixt
  2016-12-22 17:33           ` Brandon Williams
  0 siblings, 1 reply; 83+ messages in thread
From: Johannes Sixt @ 2016-12-22  6:07 UTC (permalink / raw)
  To: Brandon Williams
  Cc: git, sbeller, peff, jacob.keller, gitster, ramsay, tboegi,
	pclouds

Am 21.12.2016 um 23:33 schrieb Brandon Williams:
> On 12/21, Johannes Sixt wrote:
>> +/* copies root part from remaining to resolved, canonicalizing it on the way */
>> +static void get_root_part(struct strbuf *resolved, struct strbuf *remaining)
>> +{
>> +	int offset = offset_1st_component(remaining->buf);
>> +
>> +	strbuf_reset(resolved);
>> +	strbuf_add(resolved, remaining->buf, offset);
>> +#ifdef GIT_WINDOWS_NATIVE
>> +	convert_slashes(resolved->buf);
>> +#endif
>
> So then the only extra cononicalization that is happening here is
> converting '\\server\share' to '//server/share'? (or 'c:\' to 'c:/')

Correct. All other directory separators are canonicalized by the primary 
function, strbuf_realpath.

-- Hannes


^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH bw/realpath-wo-chdir] real_path: canonicalize directory separators in root parts
  2016-12-22  6:07         ` Johannes Sixt
@ 2016-12-22 17:33           ` Brandon Williams
  2016-12-22 18:54             ` Johannes Sixt
  2016-12-22 19:33             ` Junio C Hamano
  0 siblings, 2 replies; 83+ messages in thread
From: Brandon Williams @ 2016-12-22 17:33 UTC (permalink / raw)
  To: Johannes Sixt
  Cc: git, sbeller, peff, jacob.keller, gitster, ramsay, tboegi,
	pclouds

On 12/22, Johannes Sixt wrote:
> Am 21.12.2016 um 23:33 schrieb Brandon Williams:
> >On 12/21, Johannes Sixt wrote:
> >>+/* copies root part from remaining to resolved, canonicalizing it on the way */
> >>+static void get_root_part(struct strbuf *resolved, struct strbuf *remaining)
> >>+{
> >>+	int offset = offset_1st_component(remaining->buf);
> >>+
> >>+	strbuf_reset(resolved);
> >>+	strbuf_add(resolved, remaining->buf, offset);
> >>+#ifdef GIT_WINDOWS_NATIVE
> >>+	convert_slashes(resolved->buf);
> >>+#endif
> >
> >So then the only extra cononicalization that is happening here is
> >converting '\\server\share' to '//server/share'? (or 'c:\' to 'c:/')
> 
> Correct. All other directory separators are canonicalized by the
> primary function, strbuf_realpath.

Sounds good. Logically everything looks good to me.  And I like that
setting 'resolved' to the root of an abs path is pulled out into a
helper function.  It took me a couple extra seconds to realize that
offset_1st_component returns 0 with a relative path, which makes causes
the call to get_root_part to essentially be a noop (ie nothing is
resolved).

Thanks for helping get this to work on windows!

-- 
Brandon Williams

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH bw/realpath-wo-chdir] real_path: canonicalize directory separators in root parts
  2016-12-22 17:33           ` Brandon Williams
@ 2016-12-22 18:54             ` Johannes Sixt
  2016-12-22 19:33             ` Junio C Hamano
  1 sibling, 0 replies; 83+ messages in thread
From: Johannes Sixt @ 2016-12-22 18:54 UTC (permalink / raw)
  To: Brandon Williams
  Cc: git, sbeller, peff, jacob.keller, gitster, ramsay, tboegi,
	pclouds

Am 22.12.2016 um 18:33 schrieb Brandon Williams:
> It took me a couple extra seconds to realize that
> offset_1st_component returns 0 with a relative path, which makes causes
> the call to get_root_part to essentially be a noop (ie nothing is
> resolved).

Yeah, I am still unsure whether it is a good idea to optimize away the 
is_absolute_path() call, because we lose the symmetry to the symlink 
case, where we cannot get rid of the call...

But I think the condition plus comment

    if (!resolved->len) {
        /* relative path; can use CWD as the initial resolved path */

makes things fairly clear.

-- Hannes


^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH bw/realpath-wo-chdir] real_path: canonicalize directory separators in root parts
  2016-12-22 17:33           ` Brandon Williams
  2016-12-22 18:54             ` Johannes Sixt
@ 2016-12-22 19:33             ` Junio C Hamano
  1 sibling, 0 replies; 83+ messages in thread
From: Junio C Hamano @ 2016-12-22 19:33 UTC (permalink / raw)
  To: Brandon Williams
  Cc: Johannes Sixt, git, sbeller, peff, jacob.keller, ramsay, tboegi,
	pclouds

Brandon Williams <bmwill@google.com> writes:

> On 12/22, Johannes Sixt wrote:
>> Am 21.12.2016 um 23:33 schrieb Brandon Williams:
>> >On 12/21, Johannes Sixt wrote:
>> >>+/* copies root part from remaining to resolved, canonicalizing it on the way */
>> >>+static void get_root_part(struct strbuf *resolved, struct strbuf *remaining)
>> >>+{
>> >>+	int offset = offset_1st_component(remaining->buf);
>> >>+
>> >>+	strbuf_reset(resolved);
>> >>+	strbuf_add(resolved, remaining->buf, offset);
>> >>+#ifdef GIT_WINDOWS_NATIVE
>> >>+	convert_slashes(resolved->buf);
>> >>+#endif
>> >
>> >So then the only extra cononicalization that is happening here is
>> >converting '\\server\share' to '//server/share'? (or 'c:\' to 'c:/')
>> 
>> Correct. All other directory separators are canonicalized by the
>> primary function, strbuf_realpath.
>
> Sounds good. Logically everything looks good to me.  And I like that
> setting 'resolved' to the root of an abs path is pulled out into a
> helper function.  It took me a couple extra seconds to realize that
> offset_1st_component returns 0 with a relative path, which makes causes
> the call to get_root_part to essentially be a noop (ie nothing is
> resolved).
>
> Thanks for helping get this to work on windows!

Thanks, both.  

Let's move the topic with this patch to 'next'.  Further
micro-optimization can be done incrementally if desired.




^ permalink raw reply	[flat|nested] 83+ messages in thread

* [PATCH v4 0/5] road to reentrant real_path
  2016-12-12 18:16   ` [PATCH v3 " Brandon Williams
                       ` (4 preceding siblings ...)
  2016-12-21 21:51     ` [PATCH bw/realpath-wo-chdir] real_path: canonicalize directory separators in root parts Johannes Sixt
@ 2017-01-03 19:09     ` Brandon Williams
  2017-01-03 19:09       ` [PATCH v4 1/5] real_path: resolve symlinks by hand Brandon Williams
                         ` (6 more replies)
  5 siblings, 7 replies; 83+ messages in thread
From: Brandon Williams @ 2017-01-03 19:09 UTC (permalink / raw)
  To: git
  Cc: Brandon Williams, sbeller, peff, jacob.keller, gitster, ramsay,
	tboegi, j6t, pclouds, larsxschneider

Only change with v4 is in [1/5] renaming the #define MAXSYMLINKS back to
MAXDEPTH due to a naming conflict brought up by Lars Schneider.

Brandon Williams (4):
  real_path: resolve symlinks by hand
  real_path: convert real_path_internal to strbuf_realpath
  real_path: create real_pathdup
  real_path: have callers use real_pathdup and strbuf_realpath

Johannes Sixt (1):
  real_path: canonicalize directory separators in root parts

 abspath.c         | 225 +++++++++++++++++++++++++++++++++++++-----------------
 builtin/init-db.c |   6 +-
 cache.h           |   3 +
 environment.c     |   2 +-
 setup.c           |  13 ++--
 sha1_file.c       |   2 +-
 submodule.c       |   2 +-
 transport.c       |   2 +-
 worktree.c        |   2 +-
 9 files changed, 173 insertions(+), 84 deletions(-)

--- interdiff with v3

diff --git a/abspath.c b/abspath.c
index 1d56f5ed9..3562d17bf 100644
--- a/abspath.c
+++ b/abspath.c
@@ -62,7 +62,7 @@ static void get_root_part(struct strbuf *resolved, struct strbuf *remaining)
 }
 
 /* We allow "recursive" symbolic links. Only within reason, though. */
-#define MAXSYMLINKS 5
+#define MAXDEPTH 5
 
 /*
  * Return the real path (i.e., absolute path, with symlinks resolved
@@ -138,10 +138,10 @@ char *strbuf_realpath(struct strbuf *resolved, const char *path,
 			ssize_t len;
 			strbuf_reset(&symlink);
 
-			if (num_symlinks++ > MAXSYMLINKS) {
+			if (num_symlinks++ > MAXDEPTH) {
 				if (die_on_error)
 					die("More than %d nested symlinks "
-					    "on path '%s'", MAXSYMLINKS, path);
+					    "on path '%s'", MAXDEPTH, path);
 				else
 					goto error_out;
 			}

-- 
2.11.0.390.gc69c2f50cf-goog


^ permalink raw reply related	[flat|nested] 83+ messages in thread

* [PATCH v4 1/5] real_path: resolve symlinks by hand
  2017-01-03 19:09     ` [PATCH v4 0/5] road to reentrant real_path Brandon Williams
@ 2017-01-03 19:09       ` Brandon Williams
  2017-01-03 19:09       ` [PATCH v4 2/5] real_path: convert real_path_internal to strbuf_realpath Brandon Williams
                         ` (5 subsequent siblings)
  6 siblings, 0 replies; 83+ messages in thread
From: Brandon Williams @ 2017-01-03 19:09 UTC (permalink / raw)
  To: git
  Cc: Brandon Williams, sbeller, peff, jacob.keller, gitster, ramsay,
	tboegi, j6t, pclouds, larsxschneider

The current implementation of real_path uses chdir() in order to resolve
symlinks.  Unfortunately this isn't thread-safe as chdir() affects a
process as a whole and not just an individual thread.  Instead perform
the symlink resolution by hand so that the calls to chdir() can be
removed, making real_path one step closer to being reentrant.

Signed-off-by: Brandon Williams <bmwill@google.com>
---
 abspath.c | 188 ++++++++++++++++++++++++++++++++++++++++++--------------------
 1 file changed, 128 insertions(+), 60 deletions(-)

diff --git a/abspath.c b/abspath.c
index 2825de859..0f34636a8 100644
--- a/abspath.c
+++ b/abspath.c
@@ -11,6 +11,43 @@ int is_directory(const char *path)
 	return (!stat(path, &st) && S_ISDIR(st.st_mode));
 }
 
+/* removes the last path component from 'path' except if 'path' is root */
+static void strip_last_component(struct strbuf *path)
+{
+	size_t offset = offset_1st_component(path->buf);
+	size_t len = path->len;
+
+	/* Find start of the last component */
+	while (offset < len && !is_dir_sep(path->buf[len - 1]))
+		len--;
+	/* Skip sequences of multiple path-separators */
+	while (offset < len && is_dir_sep(path->buf[len - 1]))
+		len--;
+
+	strbuf_setlen(path, len);
+}
+
+/* get (and remove) the next component in 'remaining' and place it in 'next' */
+static void get_next_component(struct strbuf *next, struct strbuf *remaining)
+{
+	char *start = NULL;
+	char *end = NULL;
+
+	strbuf_reset(next);
+
+	/* look for the next component */
+	/* Skip sequences of multiple path-separators */
+	for (start = remaining->buf; is_dir_sep(*start); start++)
+		; /* nothing */
+	/* Find end of the path component */
+	for (end = start; *end && !is_dir_sep(*end); end++)
+		; /* nothing */
+
+	strbuf_add(next, start, end - start);
+	/* remove the component from 'remaining' */
+	strbuf_remove(remaining, 0, end - remaining->buf);
+}
+
 /* We allow "recursive" symbolic links. Only within reason, though. */
 #define MAXDEPTH 5
 
@@ -21,7 +58,6 @@ int is_directory(const char *path)
  * absolute_path().)  The return value is a pointer to a static
  * buffer.
  *
- * The input and all intermediate paths must be shorter than MAX_PATH.
  * The directory part of path (i.e., everything up to the last
  * dir_sep) must denote a valid, existing directory, but the last
  * component need not exist.  If die_on_error is set, then die with an
@@ -33,22 +69,16 @@ int is_directory(const char *path)
  */
 static const char *real_path_internal(const char *path, int die_on_error)
 {
-	static struct strbuf sb = STRBUF_INIT;
+	static struct strbuf resolved = STRBUF_INIT;
+	struct strbuf remaining = STRBUF_INIT;
+	struct strbuf next = STRBUF_INIT;
+	struct strbuf symlink = STRBUF_INIT;
 	char *retval = NULL;
-
-	/*
-	 * If we have to temporarily chdir(), store the original CWD
-	 * here so that we can chdir() back to it at the end of the
-	 * function:
-	 */
-	struct strbuf cwd = STRBUF_INIT;
-
-	int depth = MAXDEPTH;
-	char *last_elem = NULL;
+	int num_symlinks = 0;
 	struct stat st;
 
 	/* We've already done it */
-	if (path == sb.buf)
+	if (path == resolved.buf)
 		return path;
 
 	if (!*path) {
@@ -58,74 +88,112 @@ static const char *real_path_internal(const char *path, int die_on_error)
 			goto error_out;
 	}
 
-	strbuf_reset(&sb);
-	strbuf_addstr(&sb, path);
-
-	while (depth--) {
-		if (!is_directory(sb.buf)) {
-			char *last_slash = find_last_dir_sep(sb.buf);
-			if (last_slash) {
-				last_elem = xstrdup(last_slash + 1);
-				strbuf_setlen(&sb, last_slash - sb.buf + 1);
-			} else {
-				last_elem = xmemdupz(sb.buf, sb.len);
-				strbuf_reset(&sb);
-			}
+	strbuf_reset(&resolved);
+
+	if (is_absolute_path(path)) {
+		/* absolute path; start with only root as being resolved */
+		int offset = offset_1st_component(path);
+		strbuf_add(&resolved, path, offset);
+		strbuf_addstr(&remaining, path + offset);
+	} else {
+		/* relative path; can use CWD as the initial resolved path */
+		if (strbuf_getcwd(&resolved)) {
+			if (die_on_error)
+				die_errno("unable to get current working directory");
+			else
+				goto error_out;
 		}
+		strbuf_addstr(&remaining, path);
+	}
 
-		if (sb.len) {
-			if (!cwd.len && strbuf_getcwd(&cwd)) {
+	/* Iterate over the remaining path components */
+	while (remaining.len > 0) {
+		get_next_component(&next, &remaining);
+
+		if (next.len == 0) {
+			continue; /* empty component */
+		} else if (next.len == 1 && !strcmp(next.buf, ".")) {
+			continue; /* '.' component */
+		} else if (next.len == 2 && !strcmp(next.buf, "..")) {
+			/* '..' component; strip the last path component */
+			strip_last_component(&resolved);
+			continue;
+		}
+
+		/* append the next component and resolve resultant path */
+		if (!is_dir_sep(resolved.buf[resolved.len - 1]))
+			strbuf_addch(&resolved, '/');
+		strbuf_addbuf(&resolved, &next);
+
+		if (lstat(resolved.buf, &st)) {
+			/* error out unless this was the last component */
+			if (errno != ENOENT || remaining.len) {
 				if (die_on_error)
-					die_errno("Could not get current working directory");
+					die_errno("Invalid path '%s'",
+						  resolved.buf);
 				else
 					goto error_out;
 			}
+		} else if (S_ISLNK(st.st_mode)) {
+			ssize_t len;
+			strbuf_reset(&symlink);
 
-			if (chdir(sb.buf)) {
+			if (num_symlinks++ > MAXDEPTH) {
 				if (die_on_error)
-					die_errno("Could not switch to '%s'",
-						  sb.buf);
+					die("More than %d nested symlinks "
+					    "on path '%s'", MAXDEPTH, path);
 				else
 					goto error_out;
 			}
-		}
-		if (strbuf_getcwd(&sb)) {
-			if (die_on_error)
-				die_errno("Could not get current working directory");
-			else
-				goto error_out;
-		}
-
-		if (last_elem) {
-			if (sb.len && !is_dir_sep(sb.buf[sb.len - 1]))
-				strbuf_addch(&sb, '/');
-			strbuf_addstr(&sb, last_elem);
-			free(last_elem);
-			last_elem = NULL;
-		}
 
-		if (!lstat(sb.buf, &st) && S_ISLNK(st.st_mode)) {
-			struct strbuf next_sb = STRBUF_INIT;
-			ssize_t len = strbuf_readlink(&next_sb, sb.buf, 0);
+			len = strbuf_readlink(&symlink, resolved.buf,
+					      st.st_size);
 			if (len < 0) {
 				if (die_on_error)
 					die_errno("Invalid symlink '%s'",
-						  sb.buf);
+						  resolved.buf);
 				else
 					goto error_out;
 			}
-			strbuf_swap(&sb, &next_sb);
-			strbuf_release(&next_sb);
-		} else
-			break;
+
+			if (is_absolute_path(symlink.buf)) {
+				/* absolute symlink; set resolved to root */
+				int offset = offset_1st_component(symlink.buf);
+				strbuf_reset(&resolved);
+				strbuf_add(&resolved, symlink.buf, offset);
+				strbuf_remove(&symlink, 0, offset);
+			} else {
+				/*
+				 * relative symlink
+				 * strip off the last component since it will
+				 * be replaced with the contents of the symlink
+				 */
+				strip_last_component(&resolved);
+			}
+
+			/*
+			 * if there are still remaining components to resolve
+			 * then append them to symlink
+			 */
+			if (remaining.len) {
+				strbuf_addch(&symlink, '/');
+				strbuf_addbuf(&symlink, &remaining);
+			}
+
+			/*
+			 * use the symlink as the remaining components that
+			 * need to be resloved
+			 */
+			strbuf_swap(&symlink, &remaining);
+		}
 	}
 
-	retval = sb.buf;
+	retval = resolved.buf;
+
 error_out:
-	free(last_elem);
-	if (cwd.len && chdir(cwd.buf))
-		die_errno("Could not change back to '%s'", cwd.buf);
-	strbuf_release(&cwd);
+	strbuf_release(&remaining);
+	strbuf_release(&next);
+	strbuf_release(&symlink);
 
 	return retval;
 }
-- 
2.11.0.390.gc69c2f50cf-goog


^ permalink raw reply related	[flat|nested] 83+ messages in thread

* [PATCH v4 2/5] real_path: convert real_path_internal to strbuf_realpath
  2017-01-03 19:09     ` [PATCH v4 0/5] road to reentrant real_path Brandon Williams
  2017-01-03 19:09       ` [PATCH v4 1/5] real_path: resolve symlinks by hand Brandon Williams
@ 2017-01-03 19:09       ` Brandon Williams
  2017-01-03 19:09       ` [PATCH v4 3/5] real_path: create real_pathdup Brandon Williams
                         ` (4 subsequent siblings)
  6 siblings, 0 replies; 83+ messages in thread
From: Brandon Williams @ 2017-01-03 19:09 UTC (permalink / raw)
  To: git
  Cc: Brandon Williams, sbeller, peff, jacob.keller, gitster, ramsay,
	tboegi, j6t, pclouds, larsxschneider

Change the name of real_path_internal to strbuf_realpath.  In addition
push the static strbuf up to its callers and instead take as a
parameter a pointer to a strbuf to use for the final result.

This change makes strbuf_realpath reentrant.

Signed-off-by: Brandon Williams <bmwill@google.com>
---
 abspath.c | 53 +++++++++++++++++++++++++----------------------------
 cache.h   |  2 ++
 2 files changed, 27 insertions(+), 28 deletions(-)

diff --git a/abspath.c b/abspath.c
index 0f34636a8..c3a6acd4d 100644
--- a/abspath.c
+++ b/abspath.c
@@ -55,21 +55,17 @@ static void get_next_component(struct strbuf *next, struct strbuf *remaining)
  * Return the real path (i.e., absolute path, with symlinks resolved
  * and extra slashes removed) equivalent to the specified path.  (If
  * you want an absolute path but don't mind links, use
- * absolute_path().)  The return value is a pointer to a static
- * buffer.
+ * absolute_path().)  Places the resolved realpath in the provided strbuf.
  *
  * The directory part of path (i.e., everything up to the last
  * dir_sep) must denote a valid, existing directory, but the last
  * component need not exist.  If die_on_error is set, then die with an
  * informative error message if there is a problem.  Otherwise, return
  * NULL on errors (without generating any output).
- *
- * If path is our buffer, then return path, as it's already what the
- * user wants.
  */
-static const char *real_path_internal(const char *path, int die_on_error)
+char *strbuf_realpath(struct strbuf *resolved, const char *path,
+		      int die_on_error)
 {
-	static struct strbuf resolved = STRBUF_INIT;
 	struct strbuf remaining = STRBUF_INIT;
 	struct strbuf next = STRBUF_INIT;
 	struct strbuf symlink = STRBUF_INIT;
@@ -77,10 +73,6 @@ static const char *real_path_internal(const char *path, int die_on_error)
 	int num_symlinks = 0;
 	struct stat st;
 
-	/* We've already done it */
-	if (path == resolved.buf)
-		return path;
-
 	if (!*path) {
 		if (die_on_error)
 			die("The empty string is not a valid path");
@@ -88,16 +80,16 @@ static const char *real_path_internal(const char *path, int die_on_error)
 			goto error_out;
 	}
 
-	strbuf_reset(&resolved);
+	strbuf_reset(resolved);
 
 	if (is_absolute_path(path)) {
 		/* absolute path; start with only root as being resolved */
 		int offset = offset_1st_component(path);
-		strbuf_add(&resolved, path, offset);
+		strbuf_add(resolved, path, offset);
 		strbuf_addstr(&remaining, path + offset);
 	} else {
 		/* relative path; can use CWD as the initial resolved path */
-		if (strbuf_getcwd(&resolved)) {
+		if (strbuf_getcwd(resolved)) {
 			if (die_on_error)
 				die_errno("unable to get current working directory");
 			else
@@ -116,21 +108,21 @@ static const char *real_path_internal(const char *path, int die_on_error)
 			continue; /* '.' component */
 		} else if (next.len == 2 && !strcmp(next.buf, "..")) {
 			/* '..' component; strip the last path component */
-			strip_last_component(&resolved);
+			strip_last_component(resolved);
 			continue;
 		}
 
 		/* append the next component and resolve resultant path */
-		if (!is_dir_sep(resolved.buf[resolved.len - 1]))
-			strbuf_addch(&resolved, '/');
-		strbuf_addbuf(&resolved, &next);
+		if (!is_dir_sep(resolved->buf[resolved->len - 1]))
+			strbuf_addch(resolved, '/');
+		strbuf_addbuf(resolved, &next);
 
-		if (lstat(resolved.buf, &st)) {
+		if (lstat(resolved->buf, &st)) {
 			/* error out unless this was the last component */
 			if (errno != ENOENT || remaining.len) {
 				if (die_on_error)
 					die_errno("Invalid path '%s'",
-						  resolved.buf);
+						  resolved->buf);
 				else
 					goto error_out;
 			}
@@ -146,12 +138,12 @@ static const char *real_path_internal(const char *path, int die_on_error)
 					goto error_out;
 			}
 
-			len = strbuf_readlink(&symlink, resolved.buf,
+			len = strbuf_readlink(&symlink, resolved->buf,
 					      st.st_size);
 			if (len < 0) {
 				if (die_on_error)
 					die_errno("Invalid symlink '%s'",
-						  resolved.buf);
+						  resolved->buf);
 				else
 					goto error_out;
 			}
@@ -159,8 +151,8 @@ static const char *real_path_internal(const char *path, int die_on_error)
 			if (is_absolute_path(symlink.buf)) {
 				/* absolute symlink; set resolved to root */
 				int offset = offset_1st_component(symlink.buf);
-				strbuf_reset(&resolved);
-				strbuf_add(&resolved, symlink.buf, offset);
+				strbuf_reset(resolved);
+				strbuf_add(resolved, symlink.buf, offset);
 				strbuf_remove(&symlink, 0, offset);
 			} else {
 				/*
@@ -168,7 +160,7 @@ static const char *real_path_internal(const char *path, int die_on_error)
 				 * strip off the last component since it will
 				 * be replaced with the contents of the symlink
 				 */
-				strip_last_component(&resolved);
+				strip_last_component(resolved);
 			}
 
 			/*
@@ -188,24 +180,29 @@ static const char *real_path_internal(const char *path, int die_on_error)
 		}
 	}
 
-	retval = resolved.buf;
+	retval = resolved->buf;
 
 error_out:
 	strbuf_release(&remaining);
 	strbuf_release(&next);
 	strbuf_release(&symlink);
 
+	if (!retval)
+		strbuf_reset(resolved);
+
 	return retval;
 }
 
 const char *real_path(const char *path)
 {
-	return real_path_internal(path, 1);
+	static struct strbuf realpath = STRBUF_INIT;
+	return strbuf_realpath(&realpath, path, 1);
 }
 
 const char *real_path_if_valid(const char *path)
 {
-	return real_path_internal(path, 0);
+	static struct strbuf realpath = STRBUF_INIT;
+	return strbuf_realpath(&realpath, path, 0);
 }
 
 /*
diff --git a/cache.h b/cache.h
index a50a61a19..7a8129403 100644
--- a/cache.h
+++ b/cache.h
@@ -1064,6 +1064,8 @@ static inline int is_absolute_path(const char *path)
 	return is_dir_sep(path[0]) || has_dos_drive_prefix(path);
 }
 int is_directory(const char *);
+char *strbuf_realpath(struct strbuf *resolved, const char *path,
+		      int die_on_error);
 const char *real_path(const char *path);
 const char *real_path_if_valid(const char *path);
 const char *absolute_path(const char *path);
-- 
2.11.0.390.gc69c2f50cf-goog


^ permalink raw reply related	[flat|nested] 83+ messages in thread

* [PATCH v4 3/5] real_path: create real_pathdup
  2017-01-03 19:09     ` [PATCH v4 0/5] road to reentrant real_path Brandon Williams
  2017-01-03 19:09       ` [PATCH v4 1/5] real_path: resolve symlinks by hand Brandon Williams
  2017-01-03 19:09       ` [PATCH v4 2/5] real_path: convert real_path_internal to strbuf_realpath Brandon Williams
@ 2017-01-03 19:09       ` Brandon Williams
  2017-01-03 19:09       ` [PATCH v4 4/5] real_path: have callers use real_pathdup and strbuf_realpath Brandon Williams
                         ` (3 subsequent siblings)
  6 siblings, 0 replies; 83+ messages in thread
From: Brandon Williams @ 2017-01-03 19:09 UTC (permalink / raw)
  To: git
  Cc: Brandon Williams, sbeller, peff, jacob.keller, gitster, ramsay,
	tboegi, j6t, pclouds, larsxschneider

Create real_pathdup which returns a caller owned string of the resolved
realpath based on the provide path.

Signed-off-by: Brandon Williams <bmwill@google.com>
---
 abspath.c | 13 +++++++++++++
 cache.h   |  1 +
 2 files changed, 14 insertions(+)

diff --git a/abspath.c b/abspath.c
index c3a6acd4d..f4283f465 100644
--- a/abspath.c
+++ b/abspath.c
@@ -205,6 +205,19 @@ const char *real_path_if_valid(const char *path)
 	return strbuf_realpath(&realpath, path, 0);
 }
 
+char *real_pathdup(const char *path)
+{
+	struct strbuf realpath = STRBUF_INIT;
+	char *retval = NULL;
+
+	if (strbuf_realpath(&realpath, path, 0))
+		retval = strbuf_detach(&realpath, NULL);
+
+	strbuf_release(&realpath);
+
+	return retval;
+}
+
 /*
  * Use this to get an absolute path from a relative one. If you want
  * to resolve links, you should use real_path.
diff --git a/cache.h b/cache.h
index 7a8129403..e12a5d912 100644
--- a/cache.h
+++ b/cache.h
@@ -1068,6 +1068,7 @@ char *strbuf_realpath(struct strbuf *resolved, const char *path,
 		      int die_on_error);
 const char *real_path(const char *path);
 const char *real_path_if_valid(const char *path);
+char *real_pathdup(const char *path);
 const char *absolute_path(const char *path);
 const char *remove_leading_path(const char *in, const char *prefix);
 const char *relative_path(const char *in, const char *prefix, struct strbuf *sb);
-- 
2.11.0.390.gc69c2f50cf-goog


^ permalink raw reply related	[flat|nested] 83+ messages in thread

* [PATCH v4 4/5] real_path: have callers use real_pathdup and strbuf_realpath
  2017-01-03 19:09     ` [PATCH v4 0/5] road to reentrant real_path Brandon Williams
                         ` (2 preceding siblings ...)
  2017-01-03 19:09       ` [PATCH v4 3/5] real_path: create real_pathdup Brandon Williams
@ 2017-01-03 19:09       ` Brandon Williams
  2017-01-04  1:07         ` Jacob Keller
  2017-01-03 19:09       ` [PATCH v4 5/5] real_path: canonicalize directory separators in root parts Brandon Williams
                         ` (2 subsequent siblings)
  6 siblings, 1 reply; 83+ messages in thread
From: Brandon Williams @ 2017-01-03 19:09 UTC (permalink / raw)
  To: git
  Cc: Brandon Williams, sbeller, peff, jacob.keller, gitster, ramsay,
	tboegi, j6t, pclouds, larsxschneider

Migrate callers of real_path() who duplicate the retern value to use
real_pathdup or strbuf_realpath.

Signed-off-by: Brandon Williams <bmwill@google.com>
---
 builtin/init-db.c |  6 +++---
 environment.c     |  2 +-
 setup.c           | 13 ++++++++-----
 sha1_file.c       |  2 +-
 submodule.c       |  2 +-
 transport.c       |  2 +-
 worktree.c        |  2 +-
 7 files changed, 16 insertions(+), 13 deletions(-)

diff --git a/builtin/init-db.c b/builtin/init-db.c
index 2399b97d9..76d68fad0 100644
--- a/builtin/init-db.c
+++ b/builtin/init-db.c
@@ -338,7 +338,7 @@ int init_db(const char *git_dir, const char *real_git_dir,
 {
 	int reinit;
 	int exist_ok = flags & INIT_DB_EXIST_OK;
-	char *original_git_dir = xstrdup(real_path(git_dir));
+	char *original_git_dir = real_pathdup(git_dir);
 
 	if (real_git_dir) {
 		struct stat st;
@@ -489,7 +489,7 @@ int cmd_init_db(int argc, const char **argv, const char *prefix)
 	argc = parse_options(argc, argv, prefix, init_db_options, init_db_usage, 0);
 
 	if (real_git_dir && !is_absolute_path(real_git_dir))
-		real_git_dir = xstrdup(real_path(real_git_dir));
+		real_git_dir = real_pathdup(real_git_dir);
 
 	if (argc == 1) {
 		int mkdir_tried = 0;
@@ -560,7 +560,7 @@ int cmd_init_db(int argc, const char **argv, const char *prefix)
 		const char *git_dir_parent = strrchr(git_dir, '/');
 		if (git_dir_parent) {
 			char *rel = xstrndup(git_dir, git_dir_parent - git_dir);
-			git_work_tree_cfg = xstrdup(real_path(rel));
+			git_work_tree_cfg = real_pathdup(rel);
 			free(rel);
 		}
 		if (!git_work_tree_cfg)
diff --git a/environment.c b/environment.c
index 0935ec696..9b943d2d5 100644
--- a/environment.c
+++ b/environment.c
@@ -259,7 +259,7 @@ void set_git_work_tree(const char *new_work_tree)
 		return;
 	}
 	git_work_tree_initialized = 1;
-	work_tree = xstrdup(real_path(new_work_tree));
+	work_tree = real_pathdup(new_work_tree);
 }
 
 const char *get_git_work_tree(void)
diff --git a/setup.c b/setup.c
index fe572b82c..1b534a750 100644
--- a/setup.c
+++ b/setup.c
@@ -256,8 +256,10 @@ int get_common_dir_noenv(struct strbuf *sb, const char *gitdir)
 		strbuf_addbuf(&path, &data);
 		strbuf_addstr(sb, real_path(path.buf));
 		ret = 1;
-	} else
+	} else {
 		strbuf_addstr(sb, gitdir);
+	}
+
 	strbuf_release(&data);
 	strbuf_release(&path);
 	return ret;
@@ -692,7 +694,7 @@ static const char *setup_discovered_git_dir(const char *gitdir,
 	/* --work-tree is set without --git-dir; use discovered one */
 	if (getenv(GIT_WORK_TREE_ENVIRONMENT) || git_work_tree_cfg) {
 		if (offset != cwd->len && !is_absolute_path(gitdir))
-			gitdir = xstrdup(real_path(gitdir));
+			gitdir = real_pathdup(gitdir);
 		if (chdir(cwd->buf))
 			die_errno("Could not come back to cwd");
 		return setup_explicit_git_dir(gitdir, cwd, nongit_ok);
@@ -800,11 +802,12 @@ static int canonicalize_ceiling_entry(struct string_list_item *item,
 		/* Keep entry but do not canonicalize it */
 		return 1;
 	} else {
-		const char *real_path = real_path_if_valid(ceil);
-		if (!real_path)
+		char *real_path = real_pathdup(ceil);
+		if (!real_path) {
 			return 0;
+		}
 		free(item->string);
-		item->string = xstrdup(real_path);
+		item->string = real_path;
 		return 1;
 	}
 }
diff --git a/sha1_file.c b/sha1_file.c
index 9c86d1924..6a03cc393 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -291,7 +291,7 @@ static int link_alt_odb_entry(const char *entry, const char *relative_base,
 	struct strbuf pathbuf = STRBUF_INIT;
 
 	if (!is_absolute_path(entry) && relative_base) {
-		strbuf_addstr(&pathbuf, real_path(relative_base));
+		strbuf_realpath(&pathbuf, relative_base, 1);
 		strbuf_addch(&pathbuf, '/');
 	}
 	strbuf_addstr(&pathbuf, entry);
diff --git a/submodule.c b/submodule.c
index ece17315d..55819ba9c 100644
--- a/submodule.c
+++ b/submodule.c
@@ -1301,7 +1301,7 @@ void connect_work_tree_and_git_dir(const char *work_tree, const char *git_dir)
 {
 	struct strbuf file_name = STRBUF_INIT;
 	struct strbuf rel_path = STRBUF_INIT;
-	const char *real_work_tree = xstrdup(real_path(work_tree));
+	const char *real_work_tree = real_pathdup(work_tree);
 
 	/* Update gitfile */
 	strbuf_addf(&file_name, "%s/.git", work_tree);
diff --git a/transport.c b/transport.c
index 04e5d6623..a3b98f198 100644
--- a/transport.c
+++ b/transport.c
@@ -1146,7 +1146,7 @@ static int refs_from_alternate_cb(struct alternate_object_database *e,
 	const struct ref *extra;
 	struct alternate_refs_data *cb = data;
 
-	other = xstrdup(real_path(e->path));
+	other = real_pathdup(e->path);
 	len = strlen(other);
 
 	while (other[len-1] == '/')
diff --git a/worktree.c b/worktree.c
index eb6121263..edf14daf9 100644
--- a/worktree.c
+++ b/worktree.c
@@ -255,7 +255,7 @@ struct worktree *find_worktree(struct worktree **list,
 		return wt;
 
 	arg = prefix_filename(prefix, strlen(prefix), arg);
-	path = xstrdup(real_path(arg));
+	path = real_pathdup(arg);
 	for (; *list; list++)
 		if (!fspathcmp(path, real_path((*list)->path)))
 			break;
-- 
2.11.0.390.gc69c2f50cf-goog


^ permalink raw reply related	[flat|nested] 83+ messages in thread

* [PATCH v4 5/5] real_path: canonicalize directory separators in root parts
  2017-01-03 19:09     ` [PATCH v4 0/5] road to reentrant real_path Brandon Williams
                         ` (3 preceding siblings ...)
  2017-01-03 19:09       ` [PATCH v4 4/5] real_path: have callers use real_pathdup and strbuf_realpath Brandon Williams
@ 2017-01-03 19:09       ` Brandon Williams
  2017-01-04  0:48       ` [PATCH v4 0/5] road to reentrant real_path Jeff King
  2017-01-04 22:01       ` [PATCH v5 " Brandon Williams
  6 siblings, 0 replies; 83+ messages in thread
From: Brandon Williams @ 2017-01-03 19:09 UTC (permalink / raw)
  To: git
  Cc: Johannes Sixt, sbeller, peff, jacob.keller, gitster, ramsay,
	tboegi, pclouds, larsxschneider

From: Johannes Sixt <j6t@kdbg.org>

When an absolute path is resolved, resolution begins at the first path
component after the root part. The root part is just copied verbatim,
because it must not be inspected for symbolic links. For POSIX paths,
this is just the initial slash, but on Windows, the root part has the
forms c:\ or \\server\share. We do want to canonicalize the back-slashes
in the root part because these parts are compared to the result of
getcwd(), which does return a fully canonicalized path.

Factor out a helper that splits off the root part, and have it
canonicalize the copied part.

This change was prompted because t1504-ceiling-dirs.sh caught a breakage
in GIT_CEILING_DIRECTORIES handling on Windows.

Signed-off-by: Johannes Sixt <j6t@kdbg.org>
---
 abspath.c | 29 +++++++++++++++++------------
 1 file changed, 17 insertions(+), 12 deletions(-)

diff --git a/abspath.c b/abspath.c
index f4283f465..3562d17bf 100644
--- a/abspath.c
+++ b/abspath.c
@@ -48,6 +48,19 @@ static void get_next_component(struct strbuf *next, struct strbuf *remaining)
 	strbuf_remove(remaining, 0, end - remaining->buf);
 }
 
+/* copies root part from remaining to resolved, canonicalizing it on the way */
+static void get_root_part(struct strbuf *resolved, struct strbuf *remaining)
+{
+	int offset = offset_1st_component(remaining->buf);
+
+	strbuf_reset(resolved);
+	strbuf_add(resolved, remaining->buf, offset);
+#ifdef GIT_WINDOWS_NATIVE
+	convert_slashes(resolved->buf);
+#endif
+	strbuf_remove(remaining, 0, offset);
+}
+
 /* We allow "recursive" symbolic links. Only within reason, though. */
 #define MAXDEPTH 5
 
@@ -80,14 +93,10 @@ char *strbuf_realpath(struct strbuf *resolved, const char *path,
 			goto error_out;
 	}
 
-	strbuf_reset(resolved);
+	strbuf_addstr(&remaining, path);
+	get_root_part(resolved, &remaining);
 
-	if (is_absolute_path(path)) {
-		/* absolute path; start with only root as being resolved */
-		int offset = offset_1st_component(path);
-		strbuf_add(resolved, path, offset);
-		strbuf_addstr(&remaining, path + offset);
-	} else {
+	if (!resolved->len) {
 		/* relative path; can use CWD as the initial resolved path */
 		if (strbuf_getcwd(resolved)) {
 			if (die_on_error)
@@ -95,7 +104,6 @@ char *strbuf_realpath(struct strbuf *resolved, const char *path,
 			else
 				goto error_out;
 		}
-		strbuf_addstr(&remaining, path);
 	}
 
 	/* Iterate over the remaining path components */
@@ -150,10 +158,7 @@ char *strbuf_realpath(struct strbuf *resolved, const char *path,
 
 			if (is_absolute_path(symlink.buf)) {
 				/* absolute symlink; set resolved to root */
-				int offset = offset_1st_component(symlink.buf);
-				strbuf_reset(resolved);
-				strbuf_add(resolved, symlink.buf, offset);
-				strbuf_remove(&symlink, 0, offset);
+				get_root_part(resolved, &symlink);
 			} else {
 				/*
 				 * relative symlink
-- 
2.11.0.390.gc69c2f50cf-goog


^ permalink raw reply related	[flat|nested] 83+ messages in thread

* Re: [PATCH v4 0/5] road to reentrant real_path
  2017-01-03 19:09     ` [PATCH v4 0/5] road to reentrant real_path Brandon Williams
                         ` (4 preceding siblings ...)
  2017-01-03 19:09       ` [PATCH v4 5/5] real_path: canonicalize directory separators in root parts Brandon Williams
@ 2017-01-04  0:48       ` Jeff King
  2017-01-04  6:56         ` Torsten Bögershausen
  2017-01-04 22:01       ` [PATCH v5 " Brandon Williams
  6 siblings, 1 reply; 83+ messages in thread
From: Jeff King @ 2017-01-04  0:48 UTC (permalink / raw)
  To: Brandon Williams
  Cc: git, sbeller, jacob.keller, gitster, ramsay, tboegi, j6t, pclouds,
	larsxschneider

On Tue, Jan 03, 2017 at 11:09:18AM -0800, Brandon Williams wrote:

> Only change with v4 is in [1/5] renaming the #define MAXSYMLINKS back to
> MAXDEPTH due to a naming conflict brought up by Lars Schneider.

Hmm. Isn't MAXSYMLINKS basically what you want here, though? It what's
what all other similar functions will be using.

The only problem was that we were redefining the macro. So maybe:

  #ifndef MAXSYMLINKS
  #define MAXSYMLINKS 5
  #endif

would be a good solution?

It looks like the "usual" value is more like 20 or 30 on most systems,
though.  We should probably also set errno to ELOOP when we hit the
limit, which is what other symlink-resolving functions would do.

I'm surprised we didn't hit this on Linux, which has MAXSYMLINKS, too.
We should be picking it up from <sys/param.h>.

-Peff

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v4 4/5] real_path: have callers use real_pathdup and strbuf_realpath
  2017-01-03 19:09       ` [PATCH v4 4/5] real_path: have callers use real_pathdup and strbuf_realpath Brandon Williams
@ 2017-01-04  1:07         ` Jacob Keller
  2017-01-04 18:14           ` Brandon Williams
  0 siblings, 1 reply; 83+ messages in thread
From: Jacob Keller @ 2017-01-04  1:07 UTC (permalink / raw)
  To: Brandon Williams
  Cc: Git mailing list, Stefan Beller, Jeff King, Junio C Hamano,
	Ramsay Jones, Torsten Bögershausen, Johannes Sixt,
	Duy Nguyen, Lars Schneider

On Tue, Jan 3, 2017 at 11:09 AM, Brandon Williams <bmwill@google.com> wrote:
> Migrate callers of real_path() who duplicate the retern value to use
> real_pathdup or strbuf_realpath.

Nit: s/retern/return

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v4 0/5] road to reentrant real_path
  2017-01-04  0:48       ` [PATCH v4 0/5] road to reentrant real_path Jeff King
@ 2017-01-04  6:56         ` Torsten Bögershausen
  2017-01-04  7:01           ` Jeff King
  0 siblings, 1 reply; 83+ messages in thread
From: Torsten Bögershausen @ 2017-01-04  6:56 UTC (permalink / raw)
  To: Jeff King, Brandon Williams
  Cc: git, sbeller, jacob.keller, gitster, ramsay, j6t, pclouds,
	larsxschneider

On 04.01.17 01:48, Jeff King wrote:
> On Tue, Jan 03, 2017 at 11:09:18AM -0800, Brandon Williams wrote:
> 
>> Only change with v4 is in [1/5] renaming the #define MAXSYMLINKS back to
>> MAXDEPTH due to a naming conflict brought up by Lars Schneider.
> 
> Hmm. Isn't MAXSYMLINKS basically what you want here, though? It what's
> what all other similar functions will be using.
> 
> The only problem was that we were redefining the macro. So maybe:
> 
>   #ifndef MAXSYMLINKS
>   #define MAXSYMLINKS 5
>   #endif
> 
> would be a good solution?
Why 5  ? (looking at the  20..30 below)
And why 5 on one system and e.g. on my Mac OS
#define MAXSYMLINKS     32  

Would the same value value for all Git installations on all platforms make sense?
#define GITMAXSYMLINKS 20

> 
> It looks like the "usual" value is more like 20 or 30 on most systems,
> though.  We should probably also set errno to ELOOP when we hit the
> limit, which is what other symlink-resolving functions would do.
> 
> I'm surprised we didn't hit this on Linux, which has MAXSYMLINKS, too.
> We should be picking it up from <sys/param.h>.
> 
> -Peff
> 


^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v4 0/5] road to reentrant real_path
  2017-01-04  6:56         ` Torsten Bögershausen
@ 2017-01-04  7:01           ` Jeff King
  2017-01-04 18:13             ` Brandon Williams
  0 siblings, 1 reply; 83+ messages in thread
From: Jeff King @ 2017-01-04  7:01 UTC (permalink / raw)
  To: Torsten Bögershausen
  Cc: Brandon Williams, git, sbeller, jacob.keller, gitster, ramsay,
	j6t, pclouds, larsxschneider

On Wed, Jan 04, 2017 at 07:56:02AM +0100, Torsten Bögershausen wrote:

> On 04.01.17 01:48, Jeff King wrote:
> > On Tue, Jan 03, 2017 at 11:09:18AM -0800, Brandon Williams wrote:
> > 
> >> Only change with v4 is in [1/5] renaming the #define MAXSYMLINKS back to
> >> MAXDEPTH due to a naming conflict brought up by Lars Schneider.
> > 
> > Hmm. Isn't MAXSYMLINKS basically what you want here, though? It what's
> > what all other similar functions will be using.
> > 
> > The only problem was that we were redefining the macro. So maybe:
> > 
> >   #ifndef MAXSYMLINKS
> >   #define MAXSYMLINKS 5
> >   #endif
> > 
> > would be a good solution?
> Why 5  ? (looking at the  20..30 below)
> And why 5 on one system and e.g. on my Mac OS
> #define MAXSYMLINKS     32  

I mentioned "5" because that is the current value of MAXDEPTH. I do
think it would be reasonable to bump it to something higher.

> Would the same value value for all Git installations on all platforms make sense?
> #define GITMAXSYMLINKS 20

I think it's probably more important to match the rest of the OS, so
that open("foo") and realpath("foo") behave similarly on the same
system. Though I think even that isn't always possible, as the limit is
dynamic on some systems.

I think the idea of the 20-30 range is that it's small enough to catch
an infinite loop quickly, and large enough that nobody will ever hit it
in practice. :)

-Peff

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v4 0/5] road to reentrant real_path
  2017-01-04  7:01           ` Jeff King
@ 2017-01-04 18:13             ` Brandon Williams
  2017-01-04 18:22               ` Stefan Beller
  0 siblings, 1 reply; 83+ messages in thread
From: Brandon Williams @ 2017-01-04 18:13 UTC (permalink / raw)
  To: Jeff King
  Cc: Torsten Bögershausen, git, sbeller, jacob.keller, gitster,
	ramsay, j6t, pclouds, larsxschneider

On 01/04, Jeff King wrote:
> On Wed, Jan 04, 2017 at 07:56:02AM +0100, Torsten Bögershausen wrote:
> 
> > On 04.01.17 01:48, Jeff King wrote:
> > > On Tue, Jan 03, 2017 at 11:09:18AM -0800, Brandon Williams wrote:
> > > 
> > >> Only change with v4 is in [1/5] renaming the #define MAXSYMLINKS back to
> > >> MAXDEPTH due to a naming conflict brought up by Lars Schneider.
> > > 
> > > Hmm. Isn't MAXSYMLINKS basically what you want here, though? It what's
> > > what all other similar functions will be using.
> > > 
> > > The only problem was that we were redefining the macro. So maybe:
> > > 
> > >   #ifndef MAXSYMLINKS
> > >   #define MAXSYMLINKS 5
> > >   #endif
> > > 
> > > would be a good solution?
> > Why 5  ? (looking at the  20..30 below)
> > And why 5 on one system and e.g. on my Mac OS
> > #define MAXSYMLINKS     32  
> 
> I mentioned "5" because that is the current value of MAXDEPTH. I do
> think it would be reasonable to bump it to something higher.
> 
> > Would the same value value for all Git installations on all platforms make sense?
> > #define GITMAXSYMLINKS 20
> 
> I think it's probably more important to match the rest of the OS, so
> that open("foo") and realpath("foo") behave similarly on the same
> system. Though I think even that isn't always possible, as the limit is
> dynamic on some systems.
> 
> I think the idea of the 20-30 range is that it's small enough to catch
> an infinite loop quickly, and large enough that nobody will ever hit it
> in practice. :)

I agree that we should have similar guarantees as the OS provides,
especially if the OS already has MAXSYMLINKS defined.  What then, should
the fall back value be if the OS doesn't have this defined?  5 like we
have done historically, or something around the 20-30 range as Torsten
suggests?

-- 
Brandon Williams

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v4 4/5] real_path: have callers use real_pathdup and strbuf_realpath
  2017-01-04  1:07         ` Jacob Keller
@ 2017-01-04 18:14           ` Brandon Williams
  0 siblings, 0 replies; 83+ messages in thread
From: Brandon Williams @ 2017-01-04 18:14 UTC (permalink / raw)
  To: Jacob Keller
  Cc: Git mailing list, Stefan Beller, Jeff King, Junio C Hamano,
	Ramsay Jones, Torsten Bögershausen, Johannes Sixt,
	Duy Nguyen, Lars Schneider

On 01/03, Jacob Keller wrote:
> On Tue, Jan 3, 2017 at 11:09 AM, Brandon Williams <bmwill@google.com> wrote:
> > Migrate callers of real_path() who duplicate the retern value to use
> > real_pathdup or strbuf_realpath.
> 
> Nit: s/retern/return

Thanks for catching that, I'll fix that in the next reroll.

-- 
Brandon Williams

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v4 0/5] road to reentrant real_path
  2017-01-04 18:13             ` Brandon Williams
@ 2017-01-04 18:22               ` Stefan Beller
  2017-01-04 21:46                 ` Jacob Keller
  0 siblings, 1 reply; 83+ messages in thread
From: Stefan Beller @ 2017-01-04 18:22 UTC (permalink / raw)
  To: Brandon Williams
  Cc: Jeff King, Torsten Bögershausen, git@vger.kernel.org,
	Jacob Keller, Junio C Hamano, Ramsay Jones, Johannes Sixt,
	Duy Nguyen, Lars Schneider

On Wed, Jan 4, 2017 at 10:13 AM, Brandon Williams <bmwill@google.com> wrote:
> On 01/04, Jeff King wrote:
>> On Wed, Jan 04, 2017 at 07:56:02AM +0100, Torsten Bögershausen wrote:
>>
>> > On 04.01.17 01:48, Jeff King wrote:
>> > > On Tue, Jan 03, 2017 at 11:09:18AM -0800, Brandon Williams wrote:
>> > >
>> > >> Only change with v4 is in [1/5] renaming the #define MAXSYMLINKS back to
>> > >> MAXDEPTH due to a naming conflict brought up by Lars Schneider.
>> > >
>> > > Hmm. Isn't MAXSYMLINKS basically what you want here, though? It what's
>> > > what all other similar functions will be using.
>> > >
>> > > The only problem was that we were redefining the macro. So maybe:
>> > >
>> > >   #ifndef MAXSYMLINKS
>> > >   #define MAXSYMLINKS 5
>> > >   #endif
>> > >
>> > > would be a good solution?
>> > Why 5  ? (looking at the  20..30 below)
>> > And why 5 on one system and e.g. on my Mac OS
>> > #define MAXSYMLINKS     32
>>
>> I mentioned "5" because that is the current value of MAXDEPTH. I do
>> think it would be reasonable to bump it to something higher.
>>
>> > Would the same value value for all Git installations on all platforms make sense?
>> > #define GITMAXSYMLINKS 20
>>
>> I think it's probably more important to match the rest of the OS, so
>> that open("foo") and realpath("foo") behave similarly on the same
>> system. Though I think even that isn't always possible, as the limit is
>> dynamic on some systems.
>>
>> I think the idea of the 20-30 range is that it's small enough to catch
>> an infinite loop quickly, and large enough that nobody will ever hit it
>> in practice. :)
>
> I agree that we should have similar guarantees as the OS provides,
> especially if the OS already has MAXSYMLINKS defined.  What then, should
> the fall back value be if the OS doesn't have this defined?  5 like we
> have done historically, or something around the 20-30 range as Torsten
> suggests?

As a fallback I'd rather go for a larger number than too small.
The reason for the existence is just to break an infinite loop
(and report an error? which the current code doesn't quite do,
but your series actually does).

If the number is too large, then it takes a bit longer to generate the error
message, but the error path is no big deal w.r.t. performance, so it's fine
for it taking a bit longer.

If the number is too low, then we may hinder people from getting actual
work done, (i.e. they have to figure out what the problem is and recompile
git), so I'd think a larger number is not harmful. So 32?

>
> --
> Brandon Williams

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v4 0/5] road to reentrant real_path
  2017-01-04 18:22               ` Stefan Beller
@ 2017-01-04 21:46                 ` Jacob Keller
  2017-01-04 21:55                   ` Brandon Williams
  0 siblings, 1 reply; 83+ messages in thread
From: Jacob Keller @ 2017-01-04 21:46 UTC (permalink / raw)
  To: Stefan Beller
  Cc: Brandon Williams, Jeff King, Torsten Bögershausen,
	git@vger.kernel.org, Junio C Hamano, Ramsay Jones, Johannes Sixt,
	Duy Nguyen, Lars Schneider

On Wed, Jan 4, 2017 at 10:22 AM, Stefan Beller <sbeller@google.com> wrote:
> On Wed, Jan 4, 2017 at 10:13 AM, Brandon Williams <bmwill@google.com> wrote:
>> On 01/04, Jeff King wrote:
>>> On Wed, Jan 04, 2017 at 07:56:02AM +0100, Torsten Bögershausen wrote:
>>>
>>> > On 04.01.17 01:48, Jeff King wrote:
>>> > > On Tue, Jan 03, 2017 at 11:09:18AM -0800, Brandon Williams wrote:
>>> > >
>>> > >> Only change with v4 is in [1/5] renaming the #define MAXSYMLINKS back to
>>> > >> MAXDEPTH due to a naming conflict brought up by Lars Schneider.
>>> > >
>>> > > Hmm. Isn't MAXSYMLINKS basically what you want here, though? It what's
>>> > > what all other similar functions will be using.
>>> > >
>>> > > The only problem was that we were redefining the macro. So maybe:
>>> > >
>>> > >   #ifndef MAXSYMLINKS
>>> > >   #define MAXSYMLINKS 5
>>> > >   #endif
>>> > >
>>> > > would be a good solution?
>>> > Why 5  ? (looking at the  20..30 below)
>>> > And why 5 on one system and e.g. on my Mac OS
>>> > #define MAXSYMLINKS     32
>>>
>>> I mentioned "5" because that is the current value of MAXDEPTH. I do
>>> think it would be reasonable to bump it to something higher.
>>>
>>> > Would the same value value for all Git installations on all platforms make sense?
>>> > #define GITMAXSYMLINKS 20
>>>
>>> I think it's probably more important to match the rest of the OS, so
>>> that open("foo") and realpath("foo") behave similarly on the same
>>> system. Though I think even that isn't always possible, as the limit is
>>> dynamic on some systems.
>>>
>>> I think the idea of the 20-30 range is that it's small enough to catch
>>> an infinite loop quickly, and large enough that nobody will ever hit it
>>> in practice. :)
>>
>> I agree that we should have similar guarantees as the OS provides,
>> especially if the OS already has MAXSYMLINKS defined.  What then, should
>> the fall back value be if the OS doesn't have this defined?  5 like we
>> have done historically, or something around the 20-30 range as Torsten
>> suggests?
>
> As a fallback I'd rather go for a larger number than too small.
> The reason for the existence is just to break an infinite loop
> (and report an error? which the current code doesn't quite do,
> but your series actually does).
>
> If the number is too large, then it takes a bit longer to generate the error
> message, but the error path is no big deal w.r.t. performance, so it's fine
> for it taking a bit longer.
>
> If the number is too low, then we may hinder people from getting actual
> work done, (i.e. they have to figure out what the problem is and recompile
> git), so I'd think a larger number is not harmful. So 32?
>

I think I agree as well.

Thanks,
Jake

>>
>> --
>> Brandon Williams

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v4 0/5] road to reentrant real_path
  2017-01-04 21:46                 ` Jacob Keller
@ 2017-01-04 21:55                   ` Brandon Williams
  0 siblings, 0 replies; 83+ messages in thread
From: Brandon Williams @ 2017-01-04 21:55 UTC (permalink / raw)
  To: Jacob Keller
  Cc: Stefan Beller, Jeff King, Torsten Bögershausen,
	git@vger.kernel.org, Junio C Hamano, Ramsay Jones, Johannes Sixt,
	Duy Nguyen, Lars Schneider

On 01/04, Jacob Keller wrote:
> On Wed, Jan 4, 2017 at 10:22 AM, Stefan Beller <sbeller@google.com> wrote:
> > On Wed, Jan 4, 2017 at 10:13 AM, Brandon Williams <bmwill@google.com> wrote:
> >> On 01/04, Jeff King wrote:
> >>> On Wed, Jan 04, 2017 at 07:56:02AM +0100, Torsten Bögershausen wrote:
> >>>
> >>> > On 04.01.17 01:48, Jeff King wrote:
> >>> > > On Tue, Jan 03, 2017 at 11:09:18AM -0800, Brandon Williams wrote:
> >>> > >
> >>> > >> Only change with v4 is in [1/5] renaming the #define MAXSYMLINKS back to
> >>> > >> MAXDEPTH due to a naming conflict brought up by Lars Schneider.
> >>> > >
> >>> > > Hmm. Isn't MAXSYMLINKS basically what you want here, though? It what's
> >>> > > what all other similar functions will be using.
> >>> > >
> >>> > > The only problem was that we were redefining the macro. So maybe:
> >>> > >
> >>> > >   #ifndef MAXSYMLINKS
> >>> > >   #define MAXSYMLINKS 5
> >>> > >   #endif
> >>> > >
> >>> > > would be a good solution?
> >>> > Why 5  ? (looking at the  20..30 below)
> >>> > And why 5 on one system and e.g. on my Mac OS
> >>> > #define MAXSYMLINKS     32
> >>>
> >>> I mentioned "5" because that is the current value of MAXDEPTH. I do
> >>> think it would be reasonable to bump it to something higher.
> >>>
> >>> > Would the same value value for all Git installations on all platforms make sense?
> >>> > #define GITMAXSYMLINKS 20
> >>>
> >>> I think it's probably more important to match the rest of the OS, so
> >>> that open("foo") and realpath("foo") behave similarly on the same
> >>> system. Though I think even that isn't always possible, as the limit is
> >>> dynamic on some systems.
> >>>
> >>> I think the idea of the 20-30 range is that it's small enough to catch
> >>> an infinite loop quickly, and large enough that nobody will ever hit it
> >>> in practice. :)
> >>
> >> I agree that we should have similar guarantees as the OS provides,
> >> especially if the OS already has MAXSYMLINKS defined.  What then, should
> >> the fall back value be if the OS doesn't have this defined?  5 like we
> >> have done historically, or something around the 20-30 range as Torsten
> >> suggests?
> >
> > As a fallback I'd rather go for a larger number than too small.
> > The reason for the existence is just to break an infinite loop
> > (and report an error? which the current code doesn't quite do,
> > but your series actually does).
> >
> > If the number is too large, then it takes a bit longer to generate the error
> > message, but the error path is no big deal w.r.t. performance, so it's fine
> > for it taking a bit longer.
> >
> > If the number is too low, then we may hinder people from getting actual
> > work done, (i.e. they have to figure out what the problem is and recompile
> > git), so I'd think a larger number is not harmful. So 32?
> >
> 
> I think I agree as well.
> 
> Thanks,
> Jake
> 
> >>
> >> --
> >> Brandon Williams

That's two people for 32.  I'll use that as the fallback and resend the
series.

-- 
Brandon Williams

^ permalink raw reply	[flat|nested] 83+ messages in thread

* [PATCH v5 0/5] road to reentrant real_path
  2017-01-03 19:09     ` [PATCH v4 0/5] road to reentrant real_path Brandon Williams
                         ` (5 preceding siblings ...)
  2017-01-04  0:48       ` [PATCH v4 0/5] road to reentrant real_path Jeff King
@ 2017-01-04 22:01       ` Brandon Williams
  2017-01-04 22:01         ` [PATCH v5 1/5] real_path: resolve symlinks by hand Brandon Williams
                           ` (5 more replies)
  6 siblings, 6 replies; 83+ messages in thread
From: Brandon Williams @ 2017-01-04 22:01 UTC (permalink / raw)
  To: git
  Cc: Brandon Williams, sbeller, peff, jacob.keller, gitster, ramsay,
	tboegi, j6t, pclouds, larsxschneider

changes in v5:
* set errno to ELOOP when MAXSYMLINKS is exceded.
* revert to use MAXSYMLINKS instead of MAXDEPTH.
* If the OS hasn't defined MAXSYMLINKS, use a fallback value of 32.

Brandon Williams (4):
  real_path: resolve symlinks by hand
  real_path: convert real_path_internal to strbuf_realpath
  real_path: create real_pathdup
  real_path: have callers use real_pathdup and strbuf_realpath

Johannes Sixt (1):
  real_path: canonicalize directory separators in root parts

 abspath.c         | 231 +++++++++++++++++++++++++++++++++++++-----------------
 builtin/init-db.c |   6 +-
 cache.h           |   3 +
 environment.c     |   2 +-
 setup.c           |  13 +--
 sha1_file.c       |   2 +-
 submodule.c       |   2 +-
 transport.c       |   2 +-
 worktree.c        |   2 +-
 9 files changed, 178 insertions(+), 85 deletions(-)

--- interdiff with v4

diff --git a/abspath.c b/abspath.c
index 3562d17bf..fce40fddc 100644
--- a/abspath.c
+++ b/abspath.c
@@ -62,7 +62,9 @@ static void get_root_part(struct strbuf *resolved, struct strbuf *remaining)
 }
 
 /* We allow "recursive" symbolic links. Only within reason, though. */
-#define MAXDEPTH 5
+#ifndef MAXSYMLINKS
+#define MAXSYMLINKS 32
+#endif
 
 /*
  * Return the real path (i.e., absolute path, with symlinks resolved
@@ -138,10 +140,12 @@ char *strbuf_realpath(struct strbuf *resolved, const char *path,
 			ssize_t len;
 			strbuf_reset(&symlink);
 
-			if (num_symlinks++ > MAXDEPTH) {
+			if (num_symlinks++ > MAXSYMLINKS) {
+				errno = ELOOP;
+
 				if (die_on_error)
 					die("More than %d nested symlinks "
-					    "on path '%s'", MAXDEPTH, path);
+					    "on path '%s'", MAXSYMLINKS, path);
 				else
 					goto error_out;
 			}

-- 
2.11.0.390.gc69c2f50cf-goog


^ permalink raw reply related	[flat|nested] 83+ messages in thread

* [PATCH v5 1/5] real_path: resolve symlinks by hand
  2017-01-04 22:01       ` [PATCH v5 " Brandon Williams
@ 2017-01-04 22:01         ` Brandon Williams
  2017-01-04 22:01         ` [PATCH v5 2/5] real_path: convert real_path_internal to strbuf_realpath Brandon Williams
                           ` (4 subsequent siblings)
  5 siblings, 0 replies; 83+ messages in thread
From: Brandon Williams @ 2017-01-04 22:01 UTC (permalink / raw)
  To: git
  Cc: Brandon Williams, sbeller, peff, jacob.keller, gitster, ramsay,
	tboegi, j6t, pclouds, larsxschneider

The current implementation of real_path uses chdir() in order to resolve
symlinks.  Unfortunately this isn't thread-safe as chdir() affects a
process as a whole and not just an individual thread.  Instead perform
the symlink resolution by hand so that the calls to chdir() can be
removed, making real_path one step closer to being reentrant.

Signed-off-by: Brandon Williams <bmwill@google.com>
---
 abspath.c | 194 ++++++++++++++++++++++++++++++++++++++++++--------------------
 1 file changed, 133 insertions(+), 61 deletions(-)

diff --git a/abspath.c b/abspath.c
index 2825de859..629201e48 100644
--- a/abspath.c
+++ b/abspath.c
@@ -11,8 +11,47 @@ int is_directory(const char *path)
 	return (!stat(path, &st) && S_ISDIR(st.st_mode));
 }
 
+/* removes the last path component from 'path' except if 'path' is root */
+static void strip_last_component(struct strbuf *path)
+{
+	size_t offset = offset_1st_component(path->buf);
+	size_t len = path->len;
+
+	/* Find start of the last component */
+	while (offset < len && !is_dir_sep(path->buf[len - 1]))
+		len--;
+	/* Skip sequences of multiple path-separators */
+	while (offset < len && is_dir_sep(path->buf[len - 1]))
+		len--;
+
+	strbuf_setlen(path, len);
+}
+
+/* get (and remove) the next component in 'remaining' and place it in 'next' */
+static void get_next_component(struct strbuf *next, struct strbuf *remaining)
+{
+	char *start = NULL;
+	char *end = NULL;
+
+	strbuf_reset(next);
+
+	/* look for the next component */
+	/* Skip sequences of multiple path-separators */
+	for (start = remaining->buf; is_dir_sep(*start); start++)
+		; /* nothing */
+	/* Find end of the path component */
+	for (end = start; *end && !is_dir_sep(*end); end++)
+		; /* nothing */
+
+	strbuf_add(next, start, end - start);
+	/* remove the component from 'remaining' */
+	strbuf_remove(remaining, 0, end - remaining->buf);
+}
+
 /* We allow "recursive" symbolic links. Only within reason, though. */
-#define MAXDEPTH 5
+#ifndef MAXSYMLINKS
+#define MAXSYMLINKS 32
+#endif
 
 /*
  * Return the real path (i.e., absolute path, with symlinks resolved
@@ -21,7 +60,6 @@ int is_directory(const char *path)
  * absolute_path().)  The return value is a pointer to a static
  * buffer.
  *
- * The input and all intermediate paths must be shorter than MAX_PATH.
  * The directory part of path (i.e., everything up to the last
  * dir_sep) must denote a valid, existing directory, but the last
  * component need not exist.  If die_on_error is set, then die with an
@@ -33,22 +71,16 @@ int is_directory(const char *path)
  */
 static const char *real_path_internal(const char *path, int die_on_error)
 {
-	static struct strbuf sb = STRBUF_INIT;
+	static struct strbuf resolved = STRBUF_INIT;
+	struct strbuf remaining = STRBUF_INIT;
+	struct strbuf next = STRBUF_INIT;
+	struct strbuf symlink = STRBUF_INIT;
 	char *retval = NULL;
-
-	/*
-	 * If we have to temporarily chdir(), store the original CWD
-	 * here so that we can chdir() back to it at the end of the
-	 * function:
-	 */
-	struct strbuf cwd = STRBUF_INIT;
-
-	int depth = MAXDEPTH;
-	char *last_elem = NULL;
+	int num_symlinks = 0;
 	struct stat st;
 
 	/* We've already done it */
-	if (path == sb.buf)
+	if (path == resolved.buf)
 		return path;
 
 	if (!*path) {
@@ -58,74 +90,114 @@ static const char *real_path_internal(const char *path, int die_on_error)
 			goto error_out;
 	}
 
-	strbuf_reset(&sb);
-	strbuf_addstr(&sb, path);
-
-	while (depth--) {
-		if (!is_directory(sb.buf)) {
-			char *last_slash = find_last_dir_sep(sb.buf);
-			if (last_slash) {
-				last_elem = xstrdup(last_slash + 1);
-				strbuf_setlen(&sb, last_slash - sb.buf + 1);
-			} else {
-				last_elem = xmemdupz(sb.buf, sb.len);
-				strbuf_reset(&sb);
-			}
+	strbuf_reset(&resolved);
+
+	if (is_absolute_path(path)) {
+		/* absolute path; start with only root as being resolved */
+		int offset = offset_1st_component(path);
+		strbuf_add(&resolved, path, offset);
+		strbuf_addstr(&remaining, path + offset);
+	} else {
+		/* relative path; can use CWD as the initial resolved path */
+		if (strbuf_getcwd(&resolved)) {
+			if (die_on_error)
+				die_errno("unable to get current working directory");
+			else
+				goto error_out;
 		}
+		strbuf_addstr(&remaining, path);
+	}
 
-		if (sb.len) {
-			if (!cwd.len && strbuf_getcwd(&cwd)) {
+	/* Iterate over the remaining path components */
+	while (remaining.len > 0) {
+		get_next_component(&next, &remaining);
+
+		if (next.len == 0) {
+			continue; /* empty component */
+		} else if (next.len == 1 && !strcmp(next.buf, ".")) {
+			continue; /* '.' component */
+		} else if (next.len == 2 && !strcmp(next.buf, "..")) {
+			/* '..' component; strip the last path component */
+			strip_last_component(&resolved);
+			continue;
+		}
+
+		/* append the next component and resolve resultant path */
+		if (!is_dir_sep(resolved.buf[resolved.len - 1]))
+			strbuf_addch(&resolved, '/');
+		strbuf_addbuf(&resolved, &next);
+
+		if (lstat(resolved.buf, &st)) {
+			/* error out unless this was the last component */
+			if (errno != ENOENT || remaining.len) {
 				if (die_on_error)
-					die_errno("Could not get current working directory");
+					die_errno("Invalid path '%s'",
+						  resolved.buf);
 				else
 					goto error_out;
 			}
+		} else if (S_ISLNK(st.st_mode)) {
+			ssize_t len;
+			strbuf_reset(&symlink);
+
+			if (num_symlinks++ > MAXSYMLINKS) {
+				errno = ELOOP;
 
-			if (chdir(sb.buf)) {
 				if (die_on_error)
-					die_errno("Could not switch to '%s'",
-						  sb.buf);
+					die("More than %d nested symlinks "
+					    "on path '%s'", MAXSYMLINKS, path);
 				else
 					goto error_out;
 			}
-		}
-		if (strbuf_getcwd(&sb)) {
-			if (die_on_error)
-				die_errno("Could not get current working directory");
-			else
-				goto error_out;
-		}
 
-		if (last_elem) {
-			if (sb.len && !is_dir_sep(sb.buf[sb.len - 1]))
-				strbuf_addch(&sb, '/');
-			strbuf_addstr(&sb, last_elem);
-			free(last_elem);
-			last_elem = NULL;
-		}
-
-		if (!lstat(sb.buf, &st) && S_ISLNK(st.st_mode)) {
-			struct strbuf next_sb = STRBUF_INIT;
-			ssize_t len = strbuf_readlink(&next_sb, sb.buf, 0);
+			len = strbuf_readlink(&symlink, resolved.buf,
+					      st.st_size);
 			if (len < 0) {
 				if (die_on_error)
 					die_errno("Invalid symlink '%s'",
-						  sb.buf);
+						  resolved.buf);
 				else
 					goto error_out;
 			}
-			strbuf_swap(&sb, &next_sb);
-			strbuf_release(&next_sb);
-		} else
-			break;
+
+			if (is_absolute_path(symlink.buf)) {
+				/* absolute symlink; set resolved to root */
+				int offset = offset_1st_component(symlink.buf);
+				strbuf_reset(&resolved);
+				strbuf_add(&resolved, symlink.buf, offset);
+				strbuf_remove(&symlink, 0, offset);
+			} else {
+				/*
+				 * relative symlink
+				 * strip off the last component since it will
+				 * be replaced with the contents of the symlink
+				 */
+				strip_last_component(&resolved);
+			}
+
+			/*
+			 * if there are still remaining components to resolve
+			 * then append them to symlink
+			 */
+			if (remaining.len) {
+				strbuf_addch(&symlink, '/');
+				strbuf_addbuf(&symlink, &remaining);
+			}
+
+			/*
+			 * use the symlink as the remaining components that
+			 * need to be resloved
+			 */
+			strbuf_swap(&symlink, &remaining);
+		}
 	}
 
-	retval = sb.buf;
+	retval = resolved.buf;
+
 error_out:
-	free(last_elem);
-	if (cwd.len && chdir(cwd.buf))
-		die_errno("Could not change back to '%s'", cwd.buf);
-	strbuf_release(&cwd);
+	strbuf_release(&remaining);
+	strbuf_release(&next);
+	strbuf_release(&symlink);
 
 	return retval;
 }
-- 
2.11.0.390.gc69c2f50cf-goog


^ permalink raw reply related	[flat|nested] 83+ messages in thread

* [PATCH v5 2/5] real_path: convert real_path_internal to strbuf_realpath
  2017-01-04 22:01       ` [PATCH v5 " Brandon Williams
  2017-01-04 22:01         ` [PATCH v5 1/5] real_path: resolve symlinks by hand Brandon Williams
@ 2017-01-04 22:01         ` Brandon Williams
  2017-01-04 22:01         ` [PATCH v5 3/5] real_path: create real_pathdup Brandon Williams
                           ` (3 subsequent siblings)
  5 siblings, 0 replies; 83+ messages in thread
From: Brandon Williams @ 2017-01-04 22:01 UTC (permalink / raw)
  To: git
  Cc: Brandon Williams, sbeller, peff, jacob.keller, gitster, ramsay,
	tboegi, j6t, pclouds, larsxschneider

Change the name of real_path_internal to strbuf_realpath.  In addition
push the static strbuf up to its callers and instead take as a
parameter a pointer to a strbuf to use for the final result.

This change makes strbuf_realpath reentrant.

Signed-off-by: Brandon Williams <bmwill@google.com>
---
 abspath.c | 53 +++++++++++++++++++++++++----------------------------
 cache.h   |  2 ++
 2 files changed, 27 insertions(+), 28 deletions(-)

diff --git a/abspath.c b/abspath.c
index 629201e48..a200d4220 100644
--- a/abspath.c
+++ b/abspath.c
@@ -57,21 +57,17 @@ static void get_next_component(struct strbuf *next, struct strbuf *remaining)
  * Return the real path (i.e., absolute path, with symlinks resolved
  * and extra slashes removed) equivalent to the specified path.  (If
  * you want an absolute path but don't mind links, use
- * absolute_path().)  The return value is a pointer to a static
- * buffer.
+ * absolute_path().)  Places the resolved realpath in the provided strbuf.
  *
  * The directory part of path (i.e., everything up to the last
  * dir_sep) must denote a valid, existing directory, but the last
  * component need not exist.  If die_on_error is set, then die with an
  * informative error message if there is a problem.  Otherwise, return
  * NULL on errors (without generating any output).
- *
- * If path is our buffer, then return path, as it's already what the
- * user wants.
  */
-static const char *real_path_internal(const char *path, int die_on_error)
+char *strbuf_realpath(struct strbuf *resolved, const char *path,
+		      int die_on_error)
 {
-	static struct strbuf resolved = STRBUF_INIT;
 	struct strbuf remaining = STRBUF_INIT;
 	struct strbuf next = STRBUF_INIT;
 	struct strbuf symlink = STRBUF_INIT;
@@ -79,10 +75,6 @@ static const char *real_path_internal(const char *path, int die_on_error)
 	int num_symlinks = 0;
 	struct stat st;
 
-	/* We've already done it */
-	if (path == resolved.buf)
-		return path;
-
 	if (!*path) {
 		if (die_on_error)
 			die("The empty string is not a valid path");
@@ -90,16 +82,16 @@ static const char *real_path_internal(const char *path, int die_on_error)
 			goto error_out;
 	}
 
-	strbuf_reset(&resolved);
+	strbuf_reset(resolved);
 
 	if (is_absolute_path(path)) {
 		/* absolute path; start with only root as being resolved */
 		int offset = offset_1st_component(path);
-		strbuf_add(&resolved, path, offset);
+		strbuf_add(resolved, path, offset);
 		strbuf_addstr(&remaining, path + offset);
 	} else {
 		/* relative path; can use CWD as the initial resolved path */
-		if (strbuf_getcwd(&resolved)) {
+		if (strbuf_getcwd(resolved)) {
 			if (die_on_error)
 				die_errno("unable to get current working directory");
 			else
@@ -118,21 +110,21 @@ static const char *real_path_internal(const char *path, int die_on_error)
 			continue; /* '.' component */
 		} else if (next.len == 2 && !strcmp(next.buf, "..")) {
 			/* '..' component; strip the last path component */
-			strip_last_component(&resolved);
+			strip_last_component(resolved);
 			continue;
 		}
 
 		/* append the next component and resolve resultant path */
-		if (!is_dir_sep(resolved.buf[resolved.len - 1]))
-			strbuf_addch(&resolved, '/');
-		strbuf_addbuf(&resolved, &next);
+		if (!is_dir_sep(resolved->buf[resolved->len - 1]))
+			strbuf_addch(resolved, '/');
+		strbuf_addbuf(resolved, &next);
 
-		if (lstat(resolved.buf, &st)) {
+		if (lstat(resolved->buf, &st)) {
 			/* error out unless this was the last component */
 			if (errno != ENOENT || remaining.len) {
 				if (die_on_error)
 					die_errno("Invalid path '%s'",
-						  resolved.buf);
+						  resolved->buf);
 				else
 					goto error_out;
 			}
@@ -150,12 +142,12 @@ static const char *real_path_internal(const char *path, int die_on_error)
 					goto error_out;
 			}
 
-			len = strbuf_readlink(&symlink, resolved.buf,
+			len = strbuf_readlink(&symlink, resolved->buf,
 					      st.st_size);
 			if (len < 0) {
 				if (die_on_error)
 					die_errno("Invalid symlink '%s'",
-						  resolved.buf);
+						  resolved->buf);
 				else
 					goto error_out;
 			}
@@ -163,8 +155,8 @@ static const char *real_path_internal(const char *path, int die_on_error)
 			if (is_absolute_path(symlink.buf)) {
 				/* absolute symlink; set resolved to root */
 				int offset = offset_1st_component(symlink.buf);
-				strbuf_reset(&resolved);
-				strbuf_add(&resolved, symlink.buf, offset);
+				strbuf_reset(resolved);
+				strbuf_add(resolved, symlink.buf, offset);
 				strbuf_remove(&symlink, 0, offset);
 			} else {
 				/*
@@ -172,7 +164,7 @@ static const char *real_path_internal(const char *path, int die_on_error)
 				 * strip off the last component since it will
 				 * be replaced with the contents of the symlink
 				 */
-				strip_last_component(&resolved);
+				strip_last_component(resolved);
 			}
 
 			/*
@@ -192,24 +184,29 @@ static const char *real_path_internal(const char *path, int die_on_error)
 		}
 	}
 
-	retval = resolved.buf;
+	retval = resolved->buf;
 
 error_out:
 	strbuf_release(&remaining);
 	strbuf_release(&next);
 	strbuf_release(&symlink);
 
+	if (!retval)
+		strbuf_reset(resolved);
+
 	return retval;
 }
 
 const char *real_path(const char *path)
 {
-	return real_path_internal(path, 1);
+	static struct strbuf realpath = STRBUF_INIT;
+	return strbuf_realpath(&realpath, path, 1);
 }
 
 const char *real_path_if_valid(const char *path)
 {
-	return real_path_internal(path, 0);
+	static struct strbuf realpath = STRBUF_INIT;
+	return strbuf_realpath(&realpath, path, 0);
 }
 
 /*
diff --git a/cache.h b/cache.h
index a50a61a19..7a8129403 100644
--- a/cache.h
+++ b/cache.h
@@ -1064,6 +1064,8 @@ static inline int is_absolute_path(const char *path)
 	return is_dir_sep(path[0]) || has_dos_drive_prefix(path);
 }
 int is_directory(const char *);
+char *strbuf_realpath(struct strbuf *resolved, const char *path,
+		      int die_on_error);
 const char *real_path(const char *path);
 const char *real_path_if_valid(const char *path);
 const char *absolute_path(const char *path);
-- 
2.11.0.390.gc69c2f50cf-goog


^ permalink raw reply related	[flat|nested] 83+ messages in thread

* [PATCH v5 3/5] real_path: create real_pathdup
  2017-01-04 22:01       ` [PATCH v5 " Brandon Williams
  2017-01-04 22:01         ` [PATCH v5 1/5] real_path: resolve symlinks by hand Brandon Williams
  2017-01-04 22:01         ` [PATCH v5 2/5] real_path: convert real_path_internal to strbuf_realpath Brandon Williams
@ 2017-01-04 22:01         ` Brandon Williams
  2017-01-04 22:01         ` [PATCH v5 4/5] real_path: have callers use real_pathdup and strbuf_realpath Brandon Williams
                           ` (2 subsequent siblings)
  5 siblings, 0 replies; 83+ messages in thread
From: Brandon Williams @ 2017-01-04 22:01 UTC (permalink / raw)
  To: git
  Cc: Brandon Williams, sbeller, peff, jacob.keller, gitster, ramsay,
	tboegi, j6t, pclouds, larsxschneider

Create real_pathdup which returns a caller owned string of the resolved
realpath based on the provide path.

Signed-off-by: Brandon Williams <bmwill@google.com>
---
 abspath.c | 13 +++++++++++++
 cache.h   |  1 +
 2 files changed, 14 insertions(+)

diff --git a/abspath.c b/abspath.c
index a200d4220..72f716f80 100644
--- a/abspath.c
+++ b/abspath.c
@@ -209,6 +209,19 @@ const char *real_path_if_valid(const char *path)
 	return strbuf_realpath(&realpath, path, 0);
 }
 
+char *real_pathdup(const char *path)
+{
+	struct strbuf realpath = STRBUF_INIT;
+	char *retval = NULL;
+
+	if (strbuf_realpath(&realpath, path, 0))
+		retval = strbuf_detach(&realpath, NULL);
+
+	strbuf_release(&realpath);
+
+	return retval;
+}
+
 /*
  * Use this to get an absolute path from a relative one. If you want
  * to resolve links, you should use real_path.
diff --git a/cache.h b/cache.h
index 7a8129403..e12a5d912 100644
--- a/cache.h
+++ b/cache.h
@@ -1068,6 +1068,7 @@ char *strbuf_realpath(struct strbuf *resolved, const char *path,
 		      int die_on_error);
 const char *real_path(const char *path);
 const char *real_path_if_valid(const char *path);
+char *real_pathdup(const char *path);
 const char *absolute_path(const char *path);
 const char *remove_leading_path(const char *in, const char *prefix);
 const char *relative_path(const char *in, const char *prefix, struct strbuf *sb);
-- 
2.11.0.390.gc69c2f50cf-goog


^ permalink raw reply related	[flat|nested] 83+ messages in thread

* [PATCH v5 4/5] real_path: have callers use real_pathdup and strbuf_realpath
  2017-01-04 22:01       ` [PATCH v5 " Brandon Williams
                           ` (2 preceding siblings ...)
  2017-01-04 22:01         ` [PATCH v5 3/5] real_path: create real_pathdup Brandon Williams
@ 2017-01-04 22:01         ` Brandon Williams
  2017-01-04 22:01         ` [PATCH v5 5/5] real_path: canonicalize directory separators in root parts Brandon Williams
  2017-01-08  3:09         ` [PATCH v5 0/5] road to reentrant real_path Junio C Hamano
  5 siblings, 0 replies; 83+ messages in thread
From: Brandon Williams @ 2017-01-04 22:01 UTC (permalink / raw)
  To: git
  Cc: Brandon Williams, sbeller, peff, jacob.keller, gitster, ramsay,
	tboegi, j6t, pclouds, larsxschneider

Migrate callers of real_path() who duplicate the return value to use
real_pathdup or strbuf_realpath.

Signed-off-by: Brandon Williams <bmwill@google.com>
---
 builtin/init-db.c |  6 +++---
 environment.c     |  2 +-
 setup.c           | 13 ++++++++-----
 sha1_file.c       |  2 +-
 submodule.c       |  2 +-
 transport.c       |  2 +-
 worktree.c        |  2 +-
 7 files changed, 16 insertions(+), 13 deletions(-)

diff --git a/builtin/init-db.c b/builtin/init-db.c
index 2399b97d9..76d68fad0 100644
--- a/builtin/init-db.c
+++ b/builtin/init-db.c
@@ -338,7 +338,7 @@ int init_db(const char *git_dir, const char *real_git_dir,
 {
 	int reinit;
 	int exist_ok = flags & INIT_DB_EXIST_OK;
-	char *original_git_dir = xstrdup(real_path(git_dir));
+	char *original_git_dir = real_pathdup(git_dir);
 
 	if (real_git_dir) {
 		struct stat st;
@@ -489,7 +489,7 @@ int cmd_init_db(int argc, const char **argv, const char *prefix)
 	argc = parse_options(argc, argv, prefix, init_db_options, init_db_usage, 0);
 
 	if (real_git_dir && !is_absolute_path(real_git_dir))
-		real_git_dir = xstrdup(real_path(real_git_dir));
+		real_git_dir = real_pathdup(real_git_dir);
 
 	if (argc == 1) {
 		int mkdir_tried = 0;
@@ -560,7 +560,7 @@ int cmd_init_db(int argc, const char **argv, const char *prefix)
 		const char *git_dir_parent = strrchr(git_dir, '/');
 		if (git_dir_parent) {
 			char *rel = xstrndup(git_dir, git_dir_parent - git_dir);
-			git_work_tree_cfg = xstrdup(real_path(rel));
+			git_work_tree_cfg = real_pathdup(rel);
 			free(rel);
 		}
 		if (!git_work_tree_cfg)
diff --git a/environment.c b/environment.c
index 0935ec696..9b943d2d5 100644
--- a/environment.c
+++ b/environment.c
@@ -259,7 +259,7 @@ void set_git_work_tree(const char *new_work_tree)
 		return;
 	}
 	git_work_tree_initialized = 1;
-	work_tree = xstrdup(real_path(new_work_tree));
+	work_tree = real_pathdup(new_work_tree);
 }
 
 const char *get_git_work_tree(void)
diff --git a/setup.c b/setup.c
index fe572b82c..1b534a750 100644
--- a/setup.c
+++ b/setup.c
@@ -256,8 +256,10 @@ int get_common_dir_noenv(struct strbuf *sb, const char *gitdir)
 		strbuf_addbuf(&path, &data);
 		strbuf_addstr(sb, real_path(path.buf));
 		ret = 1;
-	} else
+	} else {
 		strbuf_addstr(sb, gitdir);
+	}
+
 	strbuf_release(&data);
 	strbuf_release(&path);
 	return ret;
@@ -692,7 +694,7 @@ static const char *setup_discovered_git_dir(const char *gitdir,
 	/* --work-tree is set without --git-dir; use discovered one */
 	if (getenv(GIT_WORK_TREE_ENVIRONMENT) || git_work_tree_cfg) {
 		if (offset != cwd->len && !is_absolute_path(gitdir))
-			gitdir = xstrdup(real_path(gitdir));
+			gitdir = real_pathdup(gitdir);
 		if (chdir(cwd->buf))
 			die_errno("Could not come back to cwd");
 		return setup_explicit_git_dir(gitdir, cwd, nongit_ok);
@@ -800,11 +802,12 @@ static int canonicalize_ceiling_entry(struct string_list_item *item,
 		/* Keep entry but do not canonicalize it */
 		return 1;
 	} else {
-		const char *real_path = real_path_if_valid(ceil);
-		if (!real_path)
+		char *real_path = real_pathdup(ceil);
+		if (!real_path) {
 			return 0;
+		}
 		free(item->string);
-		item->string = xstrdup(real_path);
+		item->string = real_path;
 		return 1;
 	}
 }
diff --git a/sha1_file.c b/sha1_file.c
index 9c86d1924..6a03cc393 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -291,7 +291,7 @@ static int link_alt_odb_entry(const char *entry, const char *relative_base,
 	struct strbuf pathbuf = STRBUF_INIT;
 
 	if (!is_absolute_path(entry) && relative_base) {
-		strbuf_addstr(&pathbuf, real_path(relative_base));
+		strbuf_realpath(&pathbuf, relative_base, 1);
 		strbuf_addch(&pathbuf, '/');
 	}
 	strbuf_addstr(&pathbuf, entry);
diff --git a/submodule.c b/submodule.c
index ece17315d..55819ba9c 100644
--- a/submodule.c
+++ b/submodule.c
@@ -1301,7 +1301,7 @@ void connect_work_tree_and_git_dir(const char *work_tree, const char *git_dir)
 {
 	struct strbuf file_name = STRBUF_INIT;
 	struct strbuf rel_path = STRBUF_INIT;
-	const char *real_work_tree = xstrdup(real_path(work_tree));
+	const char *real_work_tree = real_pathdup(work_tree);
 
 	/* Update gitfile */
 	strbuf_addf(&file_name, "%s/.git", work_tree);
diff --git a/transport.c b/transport.c
index 04e5d6623..a3b98f198 100644
--- a/transport.c
+++ b/transport.c
@@ -1146,7 +1146,7 @@ static int refs_from_alternate_cb(struct alternate_object_database *e,
 	const struct ref *extra;
 	struct alternate_refs_data *cb = data;
 
-	other = xstrdup(real_path(e->path));
+	other = real_pathdup(e->path);
 	len = strlen(other);
 
 	while (other[len-1] == '/')
diff --git a/worktree.c b/worktree.c
index eb6121263..edf14daf9 100644
--- a/worktree.c
+++ b/worktree.c
@@ -255,7 +255,7 @@ struct worktree *find_worktree(struct worktree **list,
 		return wt;
 
 	arg = prefix_filename(prefix, strlen(prefix), arg);
-	path = xstrdup(real_path(arg));
+	path = real_pathdup(arg);
 	for (; *list; list++)
 		if (!fspathcmp(path, real_path((*list)->path)))
 			break;
-- 
2.11.0.390.gc69c2f50cf-goog


^ permalink raw reply related	[flat|nested] 83+ messages in thread

* [PATCH v5 5/5] real_path: canonicalize directory separators in root parts
  2017-01-04 22:01       ` [PATCH v5 " Brandon Williams
                           ` (3 preceding siblings ...)
  2017-01-04 22:01         ` [PATCH v5 4/5] real_path: have callers use real_pathdup and strbuf_realpath Brandon Williams
@ 2017-01-04 22:01         ` Brandon Williams
  2017-01-08  3:09         ` [PATCH v5 0/5] road to reentrant real_path Junio C Hamano
  5 siblings, 0 replies; 83+ messages in thread
From: Brandon Williams @ 2017-01-04 22:01 UTC (permalink / raw)
  To: git
  Cc: Johannes Sixt, sbeller, peff, jacob.keller, gitster, ramsay,
	tboegi, pclouds, larsxschneider

From: Johannes Sixt <j6t@kdbg.org>

When an absolute path is resolved, resolution begins at the first path
component after the root part. The root part is just copied verbatim,
because it must not be inspected for symbolic links. For POSIX paths,
this is just the initial slash, but on Windows, the root part has the
forms c:\ or \\server\share. We do want to canonicalize the back-slashes
in the root part because these parts are compared to the result of
getcwd(), which does return a fully canonicalized path.

Factor out a helper that splits off the root part, and have it
canonicalize the copied part.

This change was prompted because t1504-ceiling-dirs.sh caught a breakage
in GIT_CEILING_DIRECTORIES handling on Windows.

Signed-off-by: Johannes Sixt <j6t@kdbg.org>
---
 abspath.c | 29 +++++++++++++++++------------
 1 file changed, 17 insertions(+), 12 deletions(-)

diff --git a/abspath.c b/abspath.c
index 72f716f80..fce40fddc 100644
--- a/abspath.c
+++ b/abspath.c
@@ -48,6 +48,19 @@ static void get_next_component(struct strbuf *next, struct strbuf *remaining)
 	strbuf_remove(remaining, 0, end - remaining->buf);
 }
 
+/* copies root part from remaining to resolved, canonicalizing it on the way */
+static void get_root_part(struct strbuf *resolved, struct strbuf *remaining)
+{
+	int offset = offset_1st_component(remaining->buf);
+
+	strbuf_reset(resolved);
+	strbuf_add(resolved, remaining->buf, offset);
+#ifdef GIT_WINDOWS_NATIVE
+	convert_slashes(resolved->buf);
+#endif
+	strbuf_remove(remaining, 0, offset);
+}
+
 /* We allow "recursive" symbolic links. Only within reason, though. */
 #ifndef MAXSYMLINKS
 #define MAXSYMLINKS 32
@@ -82,14 +95,10 @@ char *strbuf_realpath(struct strbuf *resolved, const char *path,
 			goto error_out;
 	}
 
-	strbuf_reset(resolved);
+	strbuf_addstr(&remaining, path);
+	get_root_part(resolved, &remaining);
 
-	if (is_absolute_path(path)) {
-		/* absolute path; start with only root as being resolved */
-		int offset = offset_1st_component(path);
-		strbuf_add(resolved, path, offset);
-		strbuf_addstr(&remaining, path + offset);
-	} else {
+	if (!resolved->len) {
 		/* relative path; can use CWD as the initial resolved path */
 		if (strbuf_getcwd(resolved)) {
 			if (die_on_error)
@@ -97,7 +106,6 @@ char *strbuf_realpath(struct strbuf *resolved, const char *path,
 			else
 				goto error_out;
 		}
-		strbuf_addstr(&remaining, path);
 	}
 
 	/* Iterate over the remaining path components */
@@ -154,10 +162,7 @@ char *strbuf_realpath(struct strbuf *resolved, const char *path,
 
 			if (is_absolute_path(symlink.buf)) {
 				/* absolute symlink; set resolved to root */
-				int offset = offset_1st_component(symlink.buf);
-				strbuf_reset(resolved);
-				strbuf_add(resolved, symlink.buf, offset);
-				strbuf_remove(&symlink, 0, offset);
+				get_root_part(resolved, &symlink);
 			} else {
 				/*
 				 * relative symlink
-- 
2.11.0.390.gc69c2f50cf-goog


^ permalink raw reply related	[flat|nested] 83+ messages in thread

* Re: [PATCH v5 0/5] road to reentrant real_path
  2017-01-04 22:01       ` [PATCH v5 " Brandon Williams
                           ` (4 preceding siblings ...)
  2017-01-04 22:01         ` [PATCH v5 5/5] real_path: canonicalize directory separators in root parts Brandon Williams
@ 2017-01-08  3:09         ` Junio C Hamano
  2017-01-09 18:04           ` Brandon Williams
  5 siblings, 1 reply; 83+ messages in thread
From: Junio C Hamano @ 2017-01-08  3:09 UTC (permalink / raw)
  To: Brandon Williams
  Cc: git, sbeller, peff, jacob.keller, ramsay, tboegi, j6t, pclouds,
	larsxschneider

Brandon Williams <bmwill@google.com> writes:

> changes in v5:
> * set errno to ELOOP when MAXSYMLINKS is exceded.
> * revert to use MAXSYMLINKS instead of MAXDEPTH.
> * If the OS hasn't defined MAXSYMLINKS, use a fallback value of 32.
>
> Brandon Williams (4):
>   real_path: resolve symlinks by hand
>   real_path: convert real_path_internal to strbuf_realpath
>   real_path: create real_pathdup
>   real_path: have callers use real_pathdup and strbuf_realpath
>
> Johannes Sixt (1):
>   real_path: canonicalize directory separators in root parts
>

How does this relate to the 5-patch real_path: series that has been
on 'next' since last year?

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v5 0/5] road to reentrant real_path
  2017-01-08  3:09         ` [PATCH v5 0/5] road to reentrant real_path Junio C Hamano
@ 2017-01-09 18:04           ` Brandon Williams
  2017-01-09 18:18             ` Junio C Hamano
  0 siblings, 1 reply; 83+ messages in thread
From: Brandon Williams @ 2017-01-09 18:04 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git, sbeller, peff, jacob.keller, ramsay, tboegi, j6t, pclouds,
	larsxschneider

On 01/07, Junio C Hamano wrote:
> Brandon Williams <bmwill@google.com> writes:
> 
> > changes in v5:
> > * set errno to ELOOP when MAXSYMLINKS is exceded.
> > * revert to use MAXSYMLINKS instead of MAXDEPTH.
> > * If the OS hasn't defined MAXSYMLINKS, use a fallback value of 32.
> >
> > Brandon Williams (4):
> >   real_path: resolve symlinks by hand
> >   real_path: convert real_path_internal to strbuf_realpath
> >   real_path: create real_pathdup
> >   real_path: have callers use real_pathdup and strbuf_realpath
> >
> > Johannes Sixt (1):
> >   real_path: canonicalize directory separators in root parts
> >
> 
> How does this relate to the 5-patch real_path: series that has been
> on 'next' since last year?

The only difference should be in the first patch of the series which
handles the #define a bit differently due to the discussion that
happened last week.

Here is the interdiff between v5 and 'origin/bw/realpath-wo-chdir':

diff --git a/abspath.c b/abspath.c
index 1d56f5ed9..fce40fddc 100644
--- a/abspath.c
+++ b/abspath.c
@@ -62,7 +62,9 @@ static void get_root_part(struct strbuf *resolved, struct strbuf *remaining)
 }
 
 /* We allow "recursive" symbolic links. Only within reason, though. */
-#define MAXSYMLINKS 5
+#ifndef MAXSYMLINKS
+#define MAXSYMLINKS 32
+#endif
 
 /*
  * Return the real path (i.e., absolute path, with symlinks resolved
@@ -139,6 +141,8 @@ char *strbuf_realpath(struct strbuf *resolved, const char *path,
 			strbuf_reset(&symlink);
 
 			if (num_symlinks++ > MAXSYMLINKS) {
+				errno = ELOOP;
+
 				if (die_on_error)
 					die("More than %d nested symlinks "
 					    "on path '%s'", MAXSYMLINKS, path);

-- 
Brandon Williams

^ permalink raw reply related	[flat|nested] 83+ messages in thread

* Re: [PATCH v5 0/5] road to reentrant real_path
  2017-01-09 18:04           ` Brandon Williams
@ 2017-01-09 18:18             ` Junio C Hamano
  2017-01-09 18:24               ` Brandon Williams
  2017-01-09 18:50               ` [PATCH 1/2] real_path: prevent redefinition of MAXSYMLINKS Brandon Williams
  0 siblings, 2 replies; 83+ messages in thread
From: Junio C Hamano @ 2017-01-09 18:18 UTC (permalink / raw)
  To: Brandon Williams
  Cc: git, sbeller, peff, jacob.keller, ramsay, tboegi, j6t, pclouds,
	larsxschneider

Brandon Williams <bmwill@google.com> writes:

>> How does this relate to the 5-patch real_path: series that has been
>> on 'next' since last year?
>
> The only difference should be in the first patch of the series which
> handles the #define a bit differently due to the discussion that
> happened last week.
>
> Here is the interdiff between v5 and 'origin/bw/realpath-wo-chdir':

Then can you make that an incremental patch (or two) with its own
log message instead?  It (or they) would look and smell like a
bugfix patch that follows up a change that has already landed.  As
you know, we won't eject and replace patches that are already in
'next' during a development cycle.

Thanks.

>
> diff --git a/abspath.c b/abspath.c
> index 1d56f5ed9..fce40fddc 100644
> --- a/abspath.c
> +++ b/abspath.c
> @@ -62,7 +62,9 @@ static void get_root_part(struct strbuf *resolved, struct strbuf *remaining)
>  }
>  
>  /* We allow "recursive" symbolic links. Only within reason, though. */
> -#define MAXSYMLINKS 5
> +#ifndef MAXSYMLINKS
> +#define MAXSYMLINKS 32
> +#endif
>  
>  /*
>   * Return the real path (i.e., absolute path, with symlinks resolved
> @@ -139,6 +141,8 @@ char *strbuf_realpath(struct strbuf *resolved, const char *path,
>  			strbuf_reset(&symlink);
>  
>  			if (num_symlinks++ > MAXSYMLINKS) {
> +				errno = ELOOP;
> +
>  				if (die_on_error)
>  					die("More than %d nested symlinks "
>  					    "on path '%s'", MAXSYMLINKS, path);

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v5 0/5] road to reentrant real_path
  2017-01-09 18:18             ` Junio C Hamano
@ 2017-01-09 18:24               ` Brandon Williams
  2017-01-09 19:26                 ` Junio C Hamano
  2017-01-09 18:50               ` [PATCH 1/2] real_path: prevent redefinition of MAXSYMLINKS Brandon Williams
  1 sibling, 1 reply; 83+ messages in thread
From: Brandon Williams @ 2017-01-09 18:24 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git, sbeller, peff, jacob.keller, ramsay, tboegi, j6t, pclouds,
	larsxschneider

On 01/09, Junio C Hamano wrote:
> Brandon Williams <bmwill@google.com> writes:
> 
> >> How does this relate to the 5-patch real_path: series that has been
> >> on 'next' since last year?
> >
> > The only difference should be in the first patch of the series which
> > handles the #define a bit differently due to the discussion that
> > happened last week.
> >
> > Here is the interdiff between v5 and 'origin/bw/realpath-wo-chdir':
> 
> Then can you make that an incremental patch (or two) with its own
> log message instead?  It (or they) would look and smell like a
> bugfix patch that follows up a change that has already landed.  As
> you know, we won't eject and replace patches that are already in
> 'next' during a development cycle.
> 
> Thanks.

Yes I'll get right on that.

-- 
Brandon Williams

^ permalink raw reply	[flat|nested] 83+ messages in thread

* [PATCH 1/2] real_path: prevent redefinition of MAXSYMLINKS
  2017-01-09 18:18             ` Junio C Hamano
  2017-01-09 18:24               ` Brandon Williams
@ 2017-01-09 18:50               ` Brandon Williams
  2017-01-09 18:50                 ` [PATCH 2/2] real_path: set errno when max number of symlinks is exceeded Brandon Williams
  1 sibling, 1 reply; 83+ messages in thread
From: Brandon Williams @ 2017-01-09 18:50 UTC (permalink / raw)
  To: git; +Cc: Brandon Williams, peff, gitster, larsxschneider

The macro 'MAXSYMLINKS' is already defined on macOS and Linux in
<sys/param.h>.  If 'MAXSYMLINKS' has already been defined, use the value
defined by the OS otherwise default to a value of 32 which is more
inline with what is allowed by many systems.

Signed-off-by: Brandon Williams <bmwill@google.com>
---
 abspath.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/abspath.c b/abspath.c
index 1d56f5ed9..0393213e5 100644
--- a/abspath.c
+++ b/abspath.c
@@ -62,7 +62,9 @@ static void get_root_part(struct strbuf *resolved, struct strbuf *remaining)
 }
 
 /* We allow "recursive" symbolic links. Only within reason, though. */
-#define MAXSYMLINKS 5
+#ifndef MAXSYMLINKS
+#define MAXSYMLINKS 32
+#endif
 
 /*
  * Return the real path (i.e., absolute path, with symlinks resolved
-- 
2.11.0.390.gc69c2f50cf-goog


^ permalink raw reply related	[flat|nested] 83+ messages in thread

* [PATCH 2/2] real_path: set errno when max number of symlinks is exceeded
  2017-01-09 18:50               ` [PATCH 1/2] real_path: prevent redefinition of MAXSYMLINKS Brandon Williams
@ 2017-01-09 18:50                 ` Brandon Williams
  0 siblings, 0 replies; 83+ messages in thread
From: Brandon Williams @ 2017-01-09 18:50 UTC (permalink / raw)
  To: git; +Cc: Brandon Williams, peff, gitster, larsxschneider

Set errno to ELOOP when the maximum number of symlinks is exceeded, as
would be done by other symlink-resolving functions.

Signed-off-by: Brandon Williams <bmwill@google.com>
---
 abspath.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/abspath.c b/abspath.c
index 0393213e5..fce40fddc 100644
--- a/abspath.c
+++ b/abspath.c
@@ -141,6 +141,8 @@ char *strbuf_realpath(struct strbuf *resolved, const char *path,
 			strbuf_reset(&symlink);
 
 			if (num_symlinks++ > MAXSYMLINKS) {
+				errno = ELOOP;
+
 				if (die_on_error)
 					die("More than %d nested symlinks "
 					    "on path '%s'", MAXSYMLINKS, path);
-- 
2.11.0.390.gc69c2f50cf-goog


^ permalink raw reply related	[flat|nested] 83+ messages in thread

* Re: [PATCH v5 0/5] road to reentrant real_path
  2017-01-09 18:24               ` Brandon Williams
@ 2017-01-09 19:26                 ` Junio C Hamano
  0 siblings, 0 replies; 83+ messages in thread
From: Junio C Hamano @ 2017-01-09 19:26 UTC (permalink / raw)
  To: Brandon Williams
  Cc: git, sbeller, peff, jacob.keller, ramsay, tboegi, j6t, pclouds,
	larsxschneider

Brandon Williams <bmwill@google.com> writes:

> On 01/09, Junio C Hamano wrote:
>> Brandon Williams <bmwill@google.com> writes:
>> 
>> >> How does this relate to the 5-patch real_path: series that has been
>> >> on 'next' since last year?
>> >
>> > The only difference should be in the first patch of the series which
>> > handles the #define a bit differently due to the discussion that
>> > happened last week.
>> >
>> > Here is the interdiff between v5 and 'origin/bw/realpath-wo-chdir':
>> 
>> Then can you make that an incremental patch (or two) with its own
>> log message instead?  It (or they) would look and smell like a
>> bugfix patch that follows up a change that has already landed.  As
>> you know, we won't eject and replace patches that are already in
>> 'next' during a development cycle.
>> 
>> Thanks.
>
> Yes I'll get right on that.

Thanks.  Will queue.

^ permalink raw reply	[flat|nested] 83+ messages in thread

end of thread, other threads:[~2017-01-09 19:26 UTC | newest]

Thread overview: 83+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-12-05 18:58 [PATCH] making real_path thread-safe Brandon Williams
2016-12-05 18:58 ` [PATCH] real_path: make " Brandon Williams
2016-12-05 19:57   ` Stefan Beller
2016-12-05 20:12     ` Brandon Williams
2016-12-05 20:38       ` Stefan Beller
2016-12-05 20:14   ` Stefan Beller
2016-12-05 20:16     ` Brandon Williams
2016-12-08  9:41       ` Duy Nguyen
2016-12-08 17:50         ` Brandon Williams
2016-12-06 23:44   ` Junio C Hamano
2016-12-07  0:10     ` Brandon Williams
2016-12-07  1:12       ` Ramsay Jones
2016-12-07 20:14         ` Torsten Bögershausen
2016-12-07 20:32           ` Junio C Hamano
2016-12-07 22:13             ` Brandon Williams
2016-12-08  7:55               ` Torsten Bögershausen
2016-12-08 18:41                 ` Johannes Sixt
2016-12-08 19:02                   ` Brandon Williams
2016-12-07 20:43       ` Johannes Sixt
2016-12-07 22:29         ` Brandon Williams
2016-12-08 11:32           ` Johannes Sixt
2016-12-08 16:54             ` Junio C Hamano
2016-12-08 23:58 ` [PATCH v2 0/4] road to reentrant real_path Brandon Williams
2016-12-08 23:58   ` [PATCH v2 1/4] real_path: resolve symlinks by hand Brandon Williams
2016-12-09  1:49     ` Jacob Keller
2016-12-09 14:33     ` Johannes Sixt
2016-12-09 20:04       ` Brandon Williams
2016-12-08 23:58   ` [PATCH v2 2/4] real_path: convert real_path_internal to strbuf_realpath Brandon Williams
2016-12-08 23:58   ` [PATCH v2 3/4] real_path: create real_pathdup Brandon Williams
2016-12-09 14:35     ` Johannes Sixt
2016-12-08 23:58   ` [PATCH v2 4/4] real_path: have callers use real_pathdup and strbuf_realpath Brandon Williams
2016-12-09 12:33   ` [PATCH v2 0/4] road to reentrant real_path Duy Nguyen
2016-12-09 19:42     ` Brandon Williams
2016-12-10 11:02       ` Duy Nguyen
2016-12-12 18:16   ` [PATCH v3 " Brandon Williams
2016-12-12 18:16     ` [PATCH v3 1/4] real_path: resolve symlinks by hand Brandon Williams
2016-12-12 22:19       ` Junio C Hamano
2016-12-12 22:50         ` Brandon Williams
2016-12-12 23:32           ` Junio C Hamano
2016-12-12 18:16     ` [PATCH v3 2/4] real_path: convert real_path_internal to strbuf_realpath Brandon Williams
2016-12-12 22:20       ` Junio C Hamano
2016-12-12 18:16     ` [PATCH v3 3/4] real_path: create real_pathdup Brandon Williams
2016-12-12 22:25       ` Junio C Hamano
2016-12-12 18:16     ` [PATCH v3 4/4] real_path: have callers use real_pathdup and strbuf_realpath Brandon Williams
2016-12-12 22:26       ` Junio C Hamano
2016-12-12 23:47         ` Junio C Hamano
2016-12-12 23:58           ` Stefan Beller
2016-12-13  1:15             ` Brandon Williams
2016-12-13  6:39               ` Junio C Hamano
2016-12-21 21:51     ` [PATCH bw/realpath-wo-chdir] real_path: canonicalize directory separators in root parts Johannes Sixt
2016-12-21 22:33       ` Brandon Williams
2016-12-22  6:07         ` Johannes Sixt
2016-12-22 17:33           ` Brandon Williams
2016-12-22 18:54             ` Johannes Sixt
2016-12-22 19:33             ` Junio C Hamano
2017-01-03 19:09     ` [PATCH v4 0/5] road to reentrant real_path Brandon Williams
2017-01-03 19:09       ` [PATCH v4 1/5] real_path: resolve symlinks by hand Brandon Williams
2017-01-03 19:09       ` [PATCH v4 2/5] real_path: convert real_path_internal to strbuf_realpath Brandon Williams
2017-01-03 19:09       ` [PATCH v4 3/5] real_path: create real_pathdup Brandon Williams
2017-01-03 19:09       ` [PATCH v4 4/5] real_path: have callers use real_pathdup and strbuf_realpath Brandon Williams
2017-01-04  1:07         ` Jacob Keller
2017-01-04 18:14           ` Brandon Williams
2017-01-03 19:09       ` [PATCH v4 5/5] real_path: canonicalize directory separators in root parts Brandon Williams
2017-01-04  0:48       ` [PATCH v4 0/5] road to reentrant real_path Jeff King
2017-01-04  6:56         ` Torsten Bögershausen
2017-01-04  7:01           ` Jeff King
2017-01-04 18:13             ` Brandon Williams
2017-01-04 18:22               ` Stefan Beller
2017-01-04 21:46                 ` Jacob Keller
2017-01-04 21:55                   ` Brandon Williams
2017-01-04 22:01       ` [PATCH v5 " Brandon Williams
2017-01-04 22:01         ` [PATCH v5 1/5] real_path: resolve symlinks by hand Brandon Williams
2017-01-04 22:01         ` [PATCH v5 2/5] real_path: convert real_path_internal to strbuf_realpath Brandon Williams
2017-01-04 22:01         ` [PATCH v5 3/5] real_path: create real_pathdup Brandon Williams
2017-01-04 22:01         ` [PATCH v5 4/5] real_path: have callers use real_pathdup and strbuf_realpath Brandon Williams
2017-01-04 22:01         ` [PATCH v5 5/5] real_path: canonicalize directory separators in root parts Brandon Williams
2017-01-08  3:09         ` [PATCH v5 0/5] road to reentrant real_path Junio C Hamano
2017-01-09 18:04           ` Brandon Williams
2017-01-09 18:18             ` Junio C Hamano
2017-01-09 18:24               ` Brandon Williams
2017-01-09 19:26                 ` Junio C Hamano
2017-01-09 18:50               ` [PATCH 1/2] real_path: prevent redefinition of MAXSYMLINKS Brandon Williams
2017-01-09 18:50                 ` [PATCH 2/2] real_path: set errno when max number of symlinks is exceeded Brandon Williams

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).