git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Michael Haggerty <mhagger@alum.mit.edu>
To: Junio C Hamano <gitster@pobox.com>
Cc: Jiang Xin <worldhello.net@gmail.com>,
	Lea Wiemann <lewiemann@gmail.com>,
	git@vger.kernel.org
Subject: Re: [PATCH v2 8/9] longest_ancestor_length(): resolve symlinks before comparing paths
Date: Mon, 01 Oct 2012 06:51:28 +0200	[thread overview]
Message-ID: <50692150.8080504@alum.mit.edu> (raw)
In-Reply-To: <7vd314gcti.fsf@alter.siamese.dyndns.org>

On 09/30/2012 10:00 AM, Junio C Hamano wrote:
> Michael Haggerty <mhagger@alum.mit.edu> writes:
> 
>> longest_ancestor_length() relies on a textual comparison of directory
>> parts to find the part of path that overlaps with one of the paths in
>> prefix_list.  But this doesn't work if any of the prefixes involves a
>> symbolic link, because the directories will look different even though
>> they might logically refer to the same directory.  So canonicalize the
>> paths listed in prefix_list using real_path_if_valid() before trying
>> to find matches.
> 
> I somehow feel that this is making the logic unnecessarily
> convoluted by solving the problem at a wrong level.
> 
> If longest_ancestor_length() takes a single string and a list of
> candidate string prefixes, conceptually it should be usable for any
> hierarchy-looking string that uses slashes as hierarchy separator
> (e.g. refs that may be stored in packed-refs that you cannot expect
> lstat(2) or readlink(2) to give you any sensible results).
> 
> The real problem is that the list given from the environment may
> contain a path that violates that "it suffices to take the longest
> string-prefix taking slashes into account" assumption in such a
> generic l-a-l implementation, no?  And this patch solves it by
> making l-a-l _less_ generic and forcing it to be aware of the glitch
> of its caller (you can either blame clueless user who lies when
> setting the GIT_CEILING_DIRECTORIES by including paths contaminated
> with symlinks, or blame the calling setup_git_directory_gently_1()
> function that does not resolve the symbolic links before calling
> this function).
> 
> As l-a-l is only used by the "stop at the ceiling" logic, isn't it a
> far simpler solution to keep the function work at the string level,
> and make the caller fix its input, i.e. the value taken from the
> environment variable, before calling it?  That is, grab the value of
> GIT_CEILING_DIRECTORIES, split it into a list at PATH_SEP (while
> rejecting any non-absolute paths), normalize the elements of that
> list by resolving symbolic links, and then pass the cwd[] and the
> normalized string list to l-a-l?
> 
> The resulting callsite in setup_git_directory_gently_1() would pass
> cwd[] and the ceiling_dirs (which now is a string list), all of
> whose elements would happen to begin with "/" (or dos drive prefix
> followed by the "root" symbol), but l-a-l can be written in such a
> way that it does not even require that all the input has to begin at
> root, which would later make it usable with things that are not
> paths (references, for example, all of which begin with "refs/" and
> not "/refs/").

I agree that longest_ancestor_length() is not very generic or
interesting anymore.  Nearly all of its "added value" comes from the
normalize_path_callback() helper function.  One possibility would be to
inline it at the one place it is called.

The function string_list_longest_prefix() was my attempt to isolate the
kernel of generic functionality from longest_ancestor_path().  It is
*almost* the function that you are proposing, with the exception that it
does not ensure that the prefix match ends at a "/" boundary.  So
another alternative could be to change this function to respect "/"
boundaries.  (After the change, the function might not belong in the
string-list API anymore.)

However, the semantics of a function that matches prefixes at "/"
boundaries is not entirely obvious.  Suppose we would implement the test
via a function like path_prefixcmp(path, prefix).  I can think of a few
policy questions that would have to be answered:

* Is a trailing slash on the prefix argument required, optional, or
prohibited?  What if the prefix is "/" or "//" or "c:/"?

* Is a trailing slash on the path argument optional/prohibited?  Are "/"
or "//" allowed as path arguments?

* Is a path its own prefix?

    path_prefixcmp("a/b", "a/b") -> true or false?

(For the implementation of longest_ancestor_path(), we would prefer this
to return "false".)  Or does the answer depend on whether the prefix has
a trailing slash?

    path_prefixcmp("a/b", "a/b") -> true?
    path_prefixcmp("a/b", "a/b/") -> false?

Part of the reason that I implemented string_list_longest_prefix()
instead of the function that you suggest is that the behavior of the
former is far more obvious.

I think I would advocate that the prefix has to match the front of the
path exactly (including any trailing slashes) and either

    strlen(prefix) == 0
    or the prefix ended with a '/'
    or the prefix and path are identical
    or the character in path following the matching part is a '/'

This would allow the "is path its own prefix" policy to be decided by
the caller by either including or omitting a trailing slash on the
prefix argument.

Michael

-- 
Michael Haggerty
mhagger@alum.mit.edu
http://softwareswirl.blogspot.com/

  reply	other threads:[~2012-10-01  4:51 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-09-29  6:15 [PATCH v2 0/9] Fix GIT_CEILING_DIRECTORIES that contain symlinks Michael Haggerty
2012-09-29  6:15 ` [PATCH v2 1/9] Introduce new static function real_path_internal() Michael Haggerty
2012-09-29  6:15 ` [PATCH v2 2/9] real_path_internal(): add comment explaining use of cwd Michael Haggerty
2012-09-29  6:15 ` [PATCH v2 3/9] Introduce new function real_path_if_valid() Michael Haggerty
2012-09-29  6:15 ` [PATCH v2 4/9] longest_ancestor_length(): use string_list_split() Michael Haggerty
2012-09-29  6:15 ` [PATCH v2 5/9] longest_ancestor_length(): explicitly filter list before loop Michael Haggerty
2012-09-29  6:15 ` [PATCH v2 6/9] longest_ancestor_length(): always add a slash to the end of prefixes Michael Haggerty
2012-09-29  6:16 ` [PATCH v2 7/9] longest_ancestor_length(): use string_list_longest_prefix() Michael Haggerty
2012-09-29  6:16 ` [PATCH v2 8/9] longest_ancestor_length(): resolve symlinks before comparing paths Michael Haggerty
2012-09-30  8:00   ` Junio C Hamano
2012-10-01  4:51     ` Michael Haggerty [this message]
2012-10-01  5:30       ` Junio C Hamano
2012-10-06  8:04       ` Proposed function path_in_directory() [was: Re: [PATCH v2 8/9] longest_ancestor_length(): resolve symlinks before comparing paths] Michael Haggerty
2012-10-08 16:13         ` Proposed function path_in_directory() Junio C Hamano
2012-10-08 18:20           ` Johannes Sixt
2012-10-08 18:23             ` Junio C Hamano
2012-09-29  6:16 ` [PATCH v2 9/9] t1504: stop resolving symlinks in GIT_CEILING_DIRECTORIES Michael Haggerty

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=50692150.8080504@alum.mit.edu \
    --to=mhagger@alum.mit.edu \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=lewiemann@gmail.com \
    --cc=worldhello.net@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).