On Wed, Jul 29, 2015 at 10:42:21AM -0700, Junio C Hamano wrote: > Patrick Steinhardt writes: > > > We fail to guess a sensible directory name for a newly cloned > > repository when the path component of the URL is empty. E.g. > > cloning a repository 'ssh://user:password@example.com/' we create > > a directory 'password@example.com' for the clone. > > > > Fix this by ... > > It is clear that you do not want to have that password in the > resulting directory name from the problem description, but you > started saying "Fix this" without saying what the desired outcome > is. "We want to use only the hostname, e.g. 'example.com', in such > a case instead." or something, perhaps, at the end of the first > paragraph? "Fix this by doing such and such" becomes understandable > only after we know what end result you want to achieve by "doing > such and such". Agreed, will fix with the next iteration. > > ... using parse_connect_url to split host and path > > components and explicitly checking whether we need to fall back > > to the hostname for guessing a directory name. > > I cannot help wonder why this much change (including patches 3 and > 4) is needed. Isn't it just the matter of making this part of the > existing code be aware of '@' in addition to ':'? Actually no, as host and path components need to be treated differently. See below. > > - /* > > - * Find last component, but be prepared that repo could have > > - * the form "remote.example.com:foo.git", i.e. no slash > > - * in the directory part. > > - */ > > - start = end; > > - while (repo < start && !is_dir_sep(start[-1]) && start[-1] != ':') > > - start--; > > Regardless of the issue you are trying to address, we may want to > limit that "be prepared for and careful with ':'" logic in the > existing code to the case where the "last component" does not have > any other component before it. That is: > > http://example.com/foo:bar.git/ > > would be stripped to > > http://example.com/foo:bar > > and then we scan backwards for ':' or '/' and declare that "bar" is > the name of the repository, but we would probably want "foo:bar" > instead (or we may not, as some filesystems do not want to have a > colon in its path components). This case is exactly why I did include patches 3 and 4. We've got two cases that need to be distinguished: 1. we've got a non-empty path component (that is it contains more than just a '/'). In this case we want to take its last part and strip it of things like '.git'. We should only honor ':' as a path separator if it is the first character in the path component, otherwise only honor '/'. 2. we've got an empty path component. In this case we want to inspect the host part. If it is empty we have to error out, otherwise we want to strip it of authentication information (everythin up to and including '@') and port information (everything following the ':'). So both cases are treated entirely different. For your example we'd first split up 'http://example.com/foo:bar.git' into the host 'example.com' and the path '/foo:bar.git'. As 'parse_connect_url()' does exactly what we need, e.g. split up host and path, I think it is only natural to reuse it. But actually you are right, currently I still have the old logic in place that splits on colons in the path component. In my case it would be 'parse_connect_url()'s responsibility to detect if host and path are not separated by '/' but by ':' and thus we'd not run into the problem with 'foo:bar.git'. I'll verify that behavior though and write some tests. Patrick