git@vger.kernel.org list mirror (unofficial, one of many)
 help / color / mirror / code / Atom feed
From: Junio C Hamano <gitster@pobox.com>
To: "brian m. carlson" <sandals@crustytoothpaste.net>
Cc: git@vger.kernel.org,
	"Johannes Schindelin" <Johannes.Schindelin@gmx.de>,
	"SZEDER Gábor" <szeder.dev@gmail.com>
Subject: Re: [PATCH v2 1/2] abspath: add a function to resolve paths with missing components
Date: Fri, 09 Oct 2020 14:10:04 -0700	[thread overview]
Message-ID: <xmqqk0vzrtqr.fsf@gitster.c.googlers.com> (raw)
In-Reply-To: <20201009191511.267461-2-sandals@crustytoothpaste.net> (brian m. carlson's message of "Fri, 9 Oct 2020 19:15:10 +0000")

"brian m. carlson" <sandals@crustytoothpaste.net> writes:

> We'd like to canonicalize paths such that we can preserve any number of
> trailing components that may be missing.

Sorry, but at least to me, the above gives no clue what kind of
operation is desired to be done on paths.  How would one preserve
what does not exist (i.e. are missing)?

Do you mean some leading components in a path point at existing
directories and after some point a component names a directory
that does not exist, so everything after that does not yet exist
until you "mkdir -p" them?

I guess my confusion comes primarily from the fuzziness of the verb
"canonicalize" in the sentence.  We want to handle a/b/../c/d and
there are various combinations of missng and existing directories,
e.g. a/b may not exist or a/b may but a/c may not, etc.  Is that
what is going on?  Makes me wonder if it makes sense to canonicalize
a/b/../c/d into a/c/d when a/b does not exist in the first place,
though.

> Let's add a function to do
> that that calls strbuf_realpath to find the canonical path for the
> portion we do have and then append the missing part.  We adjust
> strip_last_component to return us the component it has stripped and use
> that to help us accumulate the missing part.

OK, so if we have a/b/c/d and know a/b/c/d does not exist on the
filesystem, we start by splitting it to a/b/c and d, see if a/b/c
exists, and if not, do the same recursively to a/b/c to split it
into a/b and c, and prefix the latter to 'd' that we split earlier
(i.e. now we have a/b and c/d), until we have an existing directory
on the first half?

> Note that it is intentional that we invoke strbuf_realpath here,
> repeatedly if necessary, because on Windows that function is replaced
> with a version that uses the proper system semantics for
> canonicalization.  Trying to adjust strbuf_realpath to perform this kind
> of canonicalization with an additional option would fail to work
> properly on Windows.  The present approach is equivalent to
> strbuf_realpath for cases where the path exists, and the only other
> cases where we will use this function the additional overhead of
> multiple invocations is not significant.
>
> Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
> ---
>  abspath.c | 50 ++++++++++++++++++++++++++++++++++++++++++++++----
>  cache.h   |  1 +
>  2 files changed, 47 insertions(+), 4 deletions(-)
>
> diff --git a/abspath.c b/abspath.c
> index 6f15a418bb..092bb33b64 100644
> --- a/abspath.c
> +++ b/abspath.c
> @@ -11,8 +11,12 @@ int is_directory(const char *path)
>  	return (!stat(path, &st) && S_ISDIR(st.st_mode));
>  }
>  
> -/* removes the last path component from 'path' except if 'path' is root */
> -static void strip_last_component(struct strbuf *path)
> +/*
> + * Removes the last path component from 'path' except if 'path' is root.
> + *
> + * If last is not NULL, the last path component is copied to last.
> + */
> +static void strip_last_component(struct strbuf *path, struct strbuf *last)
>  {
>  	size_t offset = offset_1st_component(path->buf);
>  	size_t len = path->len;
> @@ -20,6 +24,10 @@ static void strip_last_component(struct strbuf *path)
>  	/* Find start of the last component */
>  	while (offset < len && !is_dir_sep(path->buf[len - 1]))
>  		len--;
> +
> +	if (last)
> +		strbuf_addstr(last, path->buf + len);
> +
>  	/* Skip sequences of multiple path-separators */
>  	while (offset < len && is_dir_sep(path->buf[len - 1]))
>  		len--;
> @@ -118,7 +126,7 @@ char *strbuf_realpath(struct strbuf *resolved, const char *path,
>  			continue; /* '.' component */
>  		} else if (next.len == 2 && !strcmp(next.buf, "..")) {
>  			/* '..' component; strip the last path component */
> -			strip_last_component(resolved);
> +			strip_last_component(resolved, NULL);
>  			continue;
>  		}
>  
> @@ -169,7 +177,7 @@ char *strbuf_realpath(struct strbuf *resolved, const char *path,
>  				 * strip off the last component since it will
>  				 * be replaced with the contents of the symlink
>  				 */
> -				strip_last_component(resolved);
> +				strip_last_component(resolved, NULL);
>  			}
>  
>  			/*
> @@ -202,6 +210,40 @@ char *strbuf_realpath(struct strbuf *resolved, const char *path,
>  	return retval;
>  }
>  
> +/*
> + * Like strbuf_realpath, but trailing components which do not exist are copied
> + * through.
> + */
> +char *strbuf_realpath_missing(struct strbuf *resolved, const char *path)
> +{
> +	struct strbuf remaining = STRBUF_INIT;
> +	struct strbuf trailing = STRBUF_INIT;
> +	struct strbuf component = STRBUF_INIT;
> +
> +	strbuf_addstr(&remaining, path);
> +
> +	while (remaining.len) {
> +		if (strbuf_realpath(resolved, remaining.buf, 0)) {
> +			strbuf_addbuf(resolved, &trailing);
> +
> +			strbuf_release(&component);
> +			strbuf_release(&remaining);
> +			strbuf_release(&trailing);
> +
> +			return resolved->buf;
> +		}
> +		strip_last_component(&remaining, &component);
> +		strbuf_insertstr(&trailing, 0, "/");
> +		strbuf_insertstr(&trailing, 1, component.buf);

I may be utterly confused, but is this where

    - we started with a/b/c/d, pushed 'd' into trailing and decided
      to redo with a/b/c

    - now we split the a/b/c into a/b and c, and adjusting what is
      in trailing from 'd' to 'c/d'

happens place?  It's a bit sad that we need to repeatedly use
insertstr to prepend in front, instead of appending.

> +		strbuf_reset(&component);
> +	}
> +
> +	strbuf_release(&component);
> +	strbuf_release(&remaining);
> +	strbuf_release(&trailing);
> +	return NULL;
> +}
> +
>  char *real_pathdup(const char *path, int die_on_error)
>  {
>  	struct strbuf realpath = STRBUF_INIT;
> diff --git a/cache.h b/cache.h
> index c0072d43b1..e1e17e108e 100644
> --- a/cache.h
> +++ b/cache.h
> @@ -1320,6 +1320,7 @@ static inline int is_absolute_path(const char *path)
>  int is_directory(const char *);
>  char *strbuf_realpath(struct strbuf *resolved, const char *path,
>  		      int die_on_error);
> +char *strbuf_realpath_missing(struct strbuf *resolved, const char *path);
>  char *real_pathdup(const char *path, int die_on_error);
>  const char *absolute_path(const char *path);
>  char *absolute_pathdup(const char *path);

  reply	other threads:[~2020-10-09 21:10 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-10-09 19:15 [PATCH v2 0/2] rev-parse options for absolute or relative paths brian m. carlson
2020-10-09 19:15 ` [PATCH v2 1/2] abspath: add a function to resolve paths with missing components brian m. carlson
2020-10-09 21:10   ` Junio C Hamano [this message]
2020-10-10  1:10     ` brian m. carlson
2020-11-09 13:57       ` Johannes Schindelin
2020-11-09 13:55   ` Johannes Schindelin
2020-11-16  2:21     ` brian m. carlson
2020-10-09 19:15 ` [PATCH v2 2/2] rev-parse: add option for absolute or relative path formatting brian m. carlson
2020-11-09 14:46   ` Johannes Schindelin
2020-11-16  2:15     ` brian m. carlson
2020-11-04 23:01 ` [PATCH v2 0/2] rev-parse options for absolute or relative paths Emily Shaffer
2020-11-05  3:20   ` brian m. carlson
2020-11-09 13:33 ` Johannes Schindelin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=xmqqk0vzrtqr.fsf@gitster.c.googlers.com \
    --to=gitster@pobox.com \
    --cc=Johannes.Schindelin@gmx.de \
    --cc=git@vger.kernel.org \
    --cc=sandals@crustytoothpaste.net \
    --cc=szeder.dev@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this inbox:

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).