git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH v4] git-p4: fix faulty paths for case insensitive systems
@ 2015-08-21 17:19 larsxschneider
  2015-08-21 18:01 ` Junio C Hamano
  2015-08-23  0:11 ` Luke Diamand
  0 siblings, 2 replies; 10+ messages in thread
From: larsxschneider @ 2015-08-21 17:19 UTC (permalink / raw
  To: git
  Cc: luke, pw, torarvid, ksaitoh560, tboegi, sunshine, gitster,
	Lars Schneider

From: Lars Schneider <larsxschneider@gmail.com>

PROBLEM:
We run P4 servers on Linux and P4 clients on Windows. For an unknown
reason the file path for a number of files in P4 does not match the
directory path with respect to case sensitivity.

E.g. `p4 files` might return
//depot/path/to/file1
//depot/PATH/to/file2

If you use P4/P4V then these files end up in the same directory, e.g.
//depot/path/to/file1
//depot/path/to/file2

If you use git-p4 then all files not matching the correct file path
(e.g. `file2`) will be ignored.

SOLUTION:
Identify path names that are different with respect to case sensitivity.
If there are any then run `p4 dirs` to build up a dictionary
containing the "correct" cases for each path. It looks like P4
interprets "correct" here as the existing path of the first file in a
directory. The path dictionary is used later on to fix all paths.

This is only applied if the parameter "--ignore-path-case" is passed to
the git-p4 clone command.

Signed-off-by: Lars Schneider <larsxschneider@gmail.com>
---
 git-p4.py                         |  82 ++++++++++++++++++++++++++--
 t/t9821-git-p4-path-variations.sh | 109 ++++++++++++++++++++++++++++++++++++++
 2 files changed, 187 insertions(+), 4 deletions(-)
 create mode 100755 t/t9821-git-p4-path-variations.sh

diff --git a/git-p4.py b/git-p4.py
index 073f87b..61a587b 100755
--- a/git-p4.py
+++ b/git-p4.py
@@ -1931,7 +1931,7 @@ class View(object):
                 (self.client_prefix, clientFile))
         return clientFile[len(self.client_prefix):]
 
-    def update_client_spec_path_cache(self, files):
+    def update_client_spec_path_cache(self, files, fixPathCase = None):
         """ Caching file paths by "p4 where" batch query """
 
         # List depot file paths exclude that already cached
@@ -1950,6 +1950,8 @@ class View(object):
             if "unmap" in res:
                 # it will list all of them, but only one not unmap-ped
                 continue
+            if fixPathCase:
+                res['depotFile'] = fixPathCase(res['depotFile'])
             self.client_spec_path_cache[res['depotFile']] = self.convert_client_path(res["clientFile"])
 
         # not found files or unmap files set to ""
@@ -1987,6 +1989,7 @@ class P4Sync(Command, P4UserMap):
                                      help="Maximum number of changes to import"),
                 optparse.make_option("--changes-block-size", dest="changes_block_size", type="int",
                                      help="Internal block size to use when iteratively calling p4 changes"),
+                optparse.make_option("--ignore-path-case", dest="ignorePathCase", action="store_true"),
                 optparse.make_option("--keep-path", dest="keepRepoPath", action='store_true',
                                      help="Keep entire BRANCH/DIR/SUBDIR prefix during import"),
                 optparse.make_option("--use-client-spec", dest="useClientSpec", action='store_true',
@@ -2017,6 +2020,7 @@ class P4Sync(Command, P4UserMap):
         self.maxChanges = ""
         self.changes_block_size = None
         self.keepRepoPath = False
+        self.ignorePathCase = False
         self.depotPaths = None
         self.p4BranchesInGit = []
         self.cloneExclude = []
@@ -2049,7 +2053,8 @@ class P4Sync(Command, P4UserMap):
         files = []
         fnum = 0
         while commit.has_key("depotFile%s" % fnum):
-            path =  commit["depotFile%s" % fnum]
+            path = commit["depotFile%s" % fnum]
+            path = self.fixPathCase(path)
 
             if [p for p in self.cloneExclude
                 if p4PathStartsWith(path, p)]:
@@ -2113,7 +2118,9 @@ class P4Sync(Command, P4UserMap):
         branches = {}
         fnum = 0
         while commit.has_key("depotFile%s" % fnum):
-            path =  commit["depotFile%s" % fnum]
+            path = commit["depotFile%s" % fnum]
+            path = self.fixPathCase(path)
+
             found = [p for p in self.depotPaths
                      if p4PathStartsWith(path, p)]
             if not found:
@@ -2240,6 +2247,10 @@ class P4Sync(Command, P4UserMap):
             if marshalled["code"] == "error":
                 if "data" in marshalled:
                     err = marshalled["data"].rstrip()
+
+        if "depotFile" in marshalled:
+            marshalled['depotFile'] = self.fixPathCase(marshalled['depotFile'])
+
         if err:
             f = None
             if self.stream_have_file_info:
@@ -2314,6 +2325,7 @@ class P4Sync(Command, P4UserMap):
 
             # do the last chunk
             if self.stream_file.has_key('depotFile'):
+                self.stream_file['depotFile'] = self.fixPathCase(self.stream_file['depotFile'])
                 self.streamOneP4File(self.stream_file, self.stream_contents)
 
     def make_email(self, userid):
@@ -2371,7 +2383,8 @@ class P4Sync(Command, P4UserMap):
                 sys.stderr.write("Ignoring file outside of prefix: %s\n" % f['path'])
 
         if self.clientSpecDirs:
-            self.clientSpecDirs.update_client_spec_path_cache(files)
+            self.clientSpecDirs.update_client_spec_path_cache(
+                files, lambda x: self.fixPathCase(x))
 
         self.gitStream.write("commit %s\n" % branch)
 #        gitStream.write("mark :%s\n" % details["change"])
@@ -2835,6 +2848,62 @@ class P4Sync(Command, P4UserMap):
             print "IO error with git fast-import. Is your git version recent enough?"
             print self.gitError.read()
 
+    def fixPathCase(self, path):
+        if self.caseCorrectedPaths:
+            components = path.split('/')
+            filename = components.pop()
+            dirname = '/'.join(components).lower() + '/'
+            if dirname in self.caseCorrectedPaths:
+                path = self.caseCorrectedPaths[dirname] + filename
+        return path
+
+    def generatePathCaseDict(self, clientPrefix):
+        # Query all files and generate a list of all used paths
+        # e.g. this files list:
+        # //depot/path/to/file1
+        # //depot/PATH/to/file2
+        #
+        # result in this path list:
+        # //depot/
+        # //depot/PATH/
+        # //depot/path/
+        # //depot/PATH/to/
+        # //depot/path/to/
+        p4_paths = set()
+        for f in p4CmdList(["files", clientPrefix + "..."]):
+            components = f["depotFile"].split('/')[0:-1]
+            for i in range(3, len(components)+1):
+                p4_paths.add('/'.join(components[0:i]) + '/')
+        p4_paths = sorted(list(p4_paths), key=len)
+
+        if len(p4_paths) > len(set([p.lower() for p in p4_paths])):
+            print "ATTENTION: File paths with different case variations detected. Fixing may take a while..."
+            found_variations = True
+            while found_variations:
+                for path in p4_paths:
+                    found_variations = False
+                    path_variations = [p for p in p4_paths if p.lower() == path.lower()]
+
+                    if len(path_variations) > 1:
+                        print  "%i different case variations for path '%s' detected." % (len(path_variations), path)
+                        # If we detect path variations (e.g. //depot/path and //depot/PATH) then we query P4 to list
+                        # the subdirectories of the parent (e.g //depot/*). P4 will return these subdirectories with
+                        # the correct case.
+                        parent_path = '/'.join(path.split('/')[0:-2])
+                        case_ok_paths = [p["dir"] + '/' for p in p4CmdList(["dirs", "-D", parent_path + '/*'])]
+
+                        # Replace all known paths with the case corrected path from P4 dirs command
+                        for case_ok_path in case_ok_paths:
+                            pattern = re.compile("^" + re.escape(case_ok_path), re.IGNORECASE)
+                            p4_paths = sorted(list(set([pattern.sub(case_ok_path, p) for p in p4_paths])), key=len)
+
+                        found_variations = True
+                        break
+            return dict((p.lower(), p) for p in p4_paths)
+        else:
+            if self.verbose:
+                print "All file paths have consistent case"
+            return None
 
     def run(self, args):
         self.depotPaths = []
@@ -3006,6 +3075,11 @@ class P4Sync(Command, P4UserMap):
 
         self.depotPaths = newPaths
 
+        if self.ignorePathCase and self.clientSpecDirs:
+            self.caseCorrectedPaths = self.generatePathCaseDict(self.clientSpecDirs.client_prefix)
+        else:
+            self.caseCorrectedPaths = None
+
         # --detect-branches may change this for each branch
         self.branchPrefixes = self.depotPaths
 
diff --git a/t/t9821-git-p4-path-variations.sh b/t/t9821-git-p4-path-variations.sh
new file mode 100755
index 0000000..90515a1
--- /dev/null
+++ b/t/t9821-git-p4-path-variations.sh
@@ -0,0 +1,109 @@
+#!/bin/sh
+
+test_description='Clone repositories with path case variations'
+
+. ./lib-git-p4.sh
+
+test_expect_success 'start p4d with case folding enabled' '
+	start_p4d -C1
+'
+
+test_expect_success 'Create a repo with path case variations' '
+	client_view "//depot/... //client/..." &&
+	(
+		cd "$cli" &&
+
+		mkdir -p One/two &&
+		>One/two/File2.txt &&
+		p4 add One/two/File2.txt &&
+		p4 submit -d "Add file2" &&
+		rm -rf One &&
+
+		mkdir -p one/TWO &&
+		>one/TWO/file1.txt &&
+		p4 add one/TWO/file1.txt &&
+		p4 submit -d "Add file1" &&
+		rm -rf one &&
+
+		mkdir -p one/two &&
+		>one/two/file3.txt &&
+		p4 add one/two/file3.txt &&
+		p4 submit -d "Add file3" &&
+		rm -rf one &&
+
+		mkdir -p outside-spec &&
+		>outside-spec/file4.txt &&
+		p4 add outside-spec/file4.txt &&
+		p4 submit -d "Add file4" &&
+		rm -rf outside-spec
+	)
+'
+
+test_expect_success 'Clone the repo and WITHOUT path fixing' '
+	client_view "//depot/One/... //client/..." &&
+	git p4 clone --use-client-spec --destination="$git" //depot &&
+	test_when_finished cleanup_git &&
+	(
+		cd "$git" &&
+		# This method is used instead of "test -f" to ensure the case is
+		# checked even if the test is executed on case-insensitive file systems.
+		cat >expect <<-\EOF &&
+			two/File2.txt
+		EOF
+		git ls-files >actual &&
+		test_cmp expect actual
+	)
+'
+
+test_expect_success 'Clone the repo WITH path fixing' '
+	client_view "//depot/One/... //client/..." &&
+	git p4 clone --ignore-path-case --use-client-spec --destination="$git" //depot &&
+	test_when_finished cleanup_git &&
+	(
+		cd "$git" &&
+		cat >expect <<-\EOF &&
+			TWO/File2.txt
+			TWO/file1.txt
+			TWO/file3.txt
+		EOF
+		git ls-files >actual &&
+		test_cmp expect actual
+	)
+'
+
+# It looks like P4 determines the path case based on the first file in
+# lexicographical order. Please note the lower case "two" directory for all
+# files triggered through the addition of "File0.txt".
+test_expect_success 'Add a new file and clone the repo WITH path fixing' '
+	client_view "//depot/... //client/..." &&
+	(
+		cd "$cli" &&
+
+		mkdir -p One/two &&
+		>One/two/File0.txt &&
+		p4 add One/two/File0.txt &&
+		p4 submit -d "Add file" &&
+		rm -rf One
+	) &&
+
+	client_view "//depot/One/... //client/..." &&
+	git p4 clone --ignore-path-case --use-client-spec --destination="$git" //depot &&
+	test_when_finished cleanup_git &&
+	(
+		cd "$git" &&
+		cat >expect <<-\EOF &&
+			two/File0.txt
+			two/File2.txt
+			two/file1.txt
+			two/file3.txt
+		EOF
+		git ls-files >actual &&
+		test_cmp expect actual
+	)
+'
+
+test_expect_success 'kill p4d' '
+	kill_p4d
+'
+
+test_done
-- 
1.9.5 (Apple Git-50.3)

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH v4] git-p4: fix faulty paths for case insensitive systems
  2015-08-21 17:19 [PATCH v4] git-p4: fix faulty paths for case insensitive systems larsxschneider
@ 2015-08-21 18:01 ` Junio C Hamano
  2015-08-23 17:10   ` Lars Schneider
  2015-08-23  0:11 ` Luke Diamand
  1 sibling, 1 reply; 10+ messages in thread
From: Junio C Hamano @ 2015-08-21 18:01 UTC (permalink / raw
  To: larsxschneider; +Cc: git, luke, pw, torarvid, ksaitoh560, tboegi, sunshine

larsxschneider@gmail.com writes:

> From: Lars Schneider <larsxschneider@gmail.com>
>
> PROBLEM:
> We run P4 servers on Linux and P4 clients on Windows. For an unknown
> reason the file path for a number of files in P4 does not match the
> directory path with respect to case sensitivity.
>
> E.g. `p4 files` might return
> //depot/path/to/file1
> //depot/PATH/to/file2
>
> If you use P4/P4V then these files end up in the same directory, e.g.
> //depot/path/to/file1
> //depot/path/to/file2
>
> If you use git-p4 then all files not matching the correct file path
> (e.g. `file2`) will be ignored.
>
> SOLUTION:
> Identify path names that are different with respect to case sensitivity.
> If there are any then run `p4 dirs` to build up a dictionary
> containing the "correct" cases for each path. It looks like P4
> interprets "correct" here as the existing path of the first file in a
> directory. The path dictionary is used later on to fix all paths.
>
> This is only applied if the parameter "--ignore-path-case" is passed to
> the git-p4 clone command.
>
>
> Signed-off-by: Lars Schneider <larsxschneider@gmail.com>
> ---

Thanks.  A few comments.

 - Have you checked "git log" on our history and notice how nobody
   says "PROBLEM:" and "SOLUTION:" in capital letters?  Don't try to
   be original in the form; your contributions are already original
   and valuable in the substance ;-)

 - I think I saw v3 yesterday.  It would be nice to see a brief
   description of what has been updated in this version.

I do not do Perforce and am not familiar enough with this code to
judge myself.  Will wait for area experts (you have them CC'ed, which
is good) to comment.

> diff --git a/git-p4.py b/git-p4.py
> index 073f87b..61a587b 100755
> --- a/git-p4.py
> +++ b/git-p4.py
> @@ -1931,7 +1931,7 @@ class View(object):
>                  (self.client_prefix, clientFile))
>          return clientFile[len(self.client_prefix):]
>  
> -    def update_client_spec_path_cache(self, files):
> +    def update_client_spec_path_cache(self, files, fixPathCase = None):

I didn't check the remainder of the file, but I thought it is
customery *not* to have spaces around '=' when the parameter is
written with its default value in Python?

> diff --git a/t/t9821-git-p4-path-variations.sh b/t/t9821-git-p4-path-variations.sh
> ...
> +test_expect_success 'Clone the repo and WITHOUT path fixing' '
> +	client_view "//depot/One/... //client/..." &&
> +	git p4 clone --use-client-spec --destination="$git" //depot &&
> +	test_when_finished cleanup_git &&
> +	(
> +		cd "$git" &&
> +		# This method is used instead of "test -f" to ensure the case is
> +		# checked even if the test is executed on case-insensitive file systems.
> +		cat >expect <<-\EOF &&
> +			two/File2.txt
> +		EOF

I think we usually write the body of the indented here text
(i.e. "<<-") indented to the same level as the command and
delimiter, i.e.

	cat >expect <<-\EOF &&
        body of the here document line 1
        body of the here document line 2
        ...
        EOF

> +		git ls-files >actual &&
> +		test_cmp expect actual
> +	)
> +'

I think you used to check the result with "find .", i.e. what the
filesystem actually recorded.  "ls-files" gives what the index
records (i.e. to be committed if you were to make a new commit from
that index).  They can be different in case on case-insensitive
filesystems, which I think are the ones that are most problematic
with the issue your patch is trying to address.

You are verifying what the set of "canonical" paths should be by
checking ls-files output.  I think that is what you intended to do
(i.e. I am saying "I think the code is more correct than the earlier
round that used find"), but I just am double checking.

> +test_expect_success 'Add a new file and clone the repo WITH path fixing' '
> +	client_view "//depot/... //client/..." &&
> +	(
> +		cd "$cli" &&
> +
> +		mkdir -p One/two &&

A blank after 'cd' only in this one but not others.  A blank line is
a good vehicle to convey that blocks of text above and below it are
logically separate, but use it consistently.  I _think_ this blank
line can go.

> +		>One/two/File0.txt &&
> +		p4 add One/two/File0.txt &&

Thanks.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v4] git-p4: fix faulty paths for case insensitive systems
  2015-08-21 17:19 [PATCH v4] git-p4: fix faulty paths for case insensitive systems larsxschneider
  2015-08-21 18:01 ` Junio C Hamano
@ 2015-08-23  0:11 ` Luke Diamand
  2015-08-24  9:51   ` Lars Schneider
  1 sibling, 1 reply; 10+ messages in thread
From: Luke Diamand @ 2015-08-23  0:11 UTC (permalink / raw
  To: Lars Schneider
  Cc: Git Users, Pete Wyckoff, Tor Arvid Lund, ksaitoh560,
	Torsten Bögershausen, Eric Sunshine, Junio C Hamano

Lars - thanks for persisting with this!

I'm still trying to fully understand what's going on here - can you
point out where I've got it wrong below please!

The server is on Linux, and is case-sensitive. For whatever reason
(probably people committing changes on Windows in the first place)
we've ended up with a P4 repo that has differences in path case that
need to be ignored, with all upper-case paths being mapped to
lower-case.

e.g. the *real* depot might be:

//depot/path/File1
//depot/PATH/File2

git-p4 clone should return this case-folded on Windows (and Linux?)

$GIT_DIR/path/File1
$GIT_DIR/path/File2

(Am I right in thinking that this change folds the directory, but not
the filename?)

I don't really understand why a dictionary is required to do this
though - is there a reason we can't just lowercase all incoming paths?
Which is what the existing code in p4StartWith() is trying to do. That
code lowercases the *entire* path including the filename; this change
seems to leave the filename portion unchanged. I guess though that the
answer may be to do with your finding that P4 makes up the case of
directories based on lexicographic ordering - is that the basic
problem?

For a large repo, this approach is surely going to be really slow.

Additionally, could you update your patch to add some words to
Documentation/git-p4.txt please?

A few other comments inline.

Thanks!
Luke



On 21 August 2015 at 18:19,  <larsxschneider@gmail.com> wrote:
> From: Lars Schneider <larsxschneider@gmail.com>
>
> PROBLEM:
> We run P4 servers on Linux and P4 clients on Windows. For an unknown
> reason the file path for a number of files in P4 does not match the
> directory path with respect to case sensitivity.
>
> E.g. `p4 files` might return
> //depot/path/to/file1
> //depot/PATH/to/file2
>
> If you use P4/P4V then these files end up in the same directory, e.g.
> //depot/path/to/file1
> //depot/path/to/file2
>
> If you use git-p4 then all files not matching the correct file path
> (e.g. `file2`) will be ignored.

I'd like to think that the existing code that checks core.ignorecase
should handle this. I haven't tried it myself though.


>
> SOLUTION:
> Identify path names that are different with respect to case sensitivity.
> If there are any then run `p4 dirs` to build up a dictionary
> containing the "correct" cases for each path. It looks like P4
> interprets "correct" here as the existing path of the first file in a
> directory. The path dictionary is used later on to fix all paths.
>
> This is only applied if the parameter "--ignore-path-case" is passed to
> the git-p4 clone command.
>
> Signed-off-by: Lars Schneider <larsxschneider@gmail.com>
> ---
>  git-p4.py                         |  82 ++++++++++++++++++++++++++--
>  t/t9821-git-p4-path-variations.sh | 109 ++++++++++++++++++++++++++++++++++++++
>  2 files changed, 187 insertions(+), 4 deletions(-)
>  create mode 100755 t/t9821-git-p4-path-variations.sh
>
> diff --git a/git-p4.py b/git-p4.py
> index 073f87b..61a587b 100755
> --- a/git-p4.py
> +++ b/git-p4.py
> @@ -1931,7 +1931,7 @@ class View(object):
>                  (self.client_prefix, clientFile))
>          return clientFile[len(self.client_prefix):]
>
> -    def update_client_spec_path_cache(self, files):
> +    def update_client_spec_path_cache(self, files, fixPathCase = None):
>          """ Caching file paths by "p4 where" batch query """
>
>          # List depot file paths exclude that already cached
> @@ -1950,6 +1950,8 @@ class View(object):
>              if "unmap" in res:
>                  # it will list all of them, but only one not unmap-ped
>                  continue
> +            if fixPathCase:
> +                res['depotFile'] = fixPathCase(res['depotFile'])
>              self.client_spec_path_cache[res['depotFile']] = self.convert_client_path(res["clientFile"])
>
>          # not found files or unmap files set to ""
> @@ -1987,6 +1989,7 @@ class P4Sync(Command, P4UserMap):
>                                       help="Maximum number of changes to import"),
>                  optparse.make_option("--changes-block-size", dest="changes_block_size", type="int",
>                                       help="Internal block size to use when iteratively calling p4 changes"),
> +                optparse.make_option("--ignore-path-case", dest="ignorePathCase", action="store_true"),
>                  optparse.make_option("--keep-path", dest="keepRepoPath", action='store_true',
>                                       help="Keep entire BRANCH/DIR/SUBDIR prefix during import"),
>                  optparse.make_option("--use-client-spec", dest="useClientSpec", action='store_true',
> @@ -2017,6 +2020,7 @@ class P4Sync(Command, P4UserMap):
>          self.maxChanges = ""
>          self.changes_block_size = None
>          self.keepRepoPath = False
> +        self.ignorePathCase = False
>          self.depotPaths = None
>          self.p4BranchesInGit = []
>          self.cloneExclude = []
> @@ -2049,7 +2053,8 @@ class P4Sync(Command, P4UserMap):
>          files = []
>          fnum = 0
>          while commit.has_key("depotFile%s" % fnum):
> -            path =  commit["depotFile%s" % fnum]
> +            path = commit["depotFile%s" % fnum]
> +            path = self.fixPathCase(path)
>
>              if [p for p in self.cloneExclude
>                  if p4PathStartsWith(path, p)]:
> @@ -2113,7 +2118,9 @@ class P4Sync(Command, P4UserMap):
>          branches = {}
>          fnum = 0
>          while commit.has_key("depotFile%s" % fnum):
> -            path =  commit["depotFile%s" % fnum]
> +            path = commit["depotFile%s" % fnum]
> +            path = self.fixPathCase(path)
> +
>              found = [p for p in self.depotPaths
>                       if p4PathStartsWith(path, p)]
>              if not found:
> @@ -2240,6 +2247,10 @@ class P4Sync(Command, P4UserMap):
>              if marshalled["code"] == "error":
>                  if "data" in marshalled:
>                      err = marshalled["data"].rstrip()
> +
> +        if "depotFile" in marshalled:
> +            marshalled['depotFile'] = self.fixPathCase(marshalled['depotFile'])
> +
>          if err:
>              f = None
>              if self.stream_have_file_info:
> @@ -2314,6 +2325,7 @@ class P4Sync(Command, P4UserMap):
>
>              # do the last chunk
>              if self.stream_file.has_key('depotFile'):
> +                self.stream_file['depotFile'] = self.fixPathCase(self.stream_file['depotFile'])
>                  self.streamOneP4File(self.stream_file, self.stream_contents)
>
>      def make_email(self, userid):
> @@ -2371,7 +2383,8 @@ class P4Sync(Command, P4UserMap):
>                  sys.stderr.write("Ignoring file outside of prefix: %s\n" % f['path'])
>
>          if self.clientSpecDirs:
> -            self.clientSpecDirs.update_client_spec_path_cache(files)
> +            self.clientSpecDirs.update_client_spec_path_cache(
> +                files, lambda x: self.fixPathCase(x))
>
>          self.gitStream.write("commit %s\n" % branch)
>  #        gitStream.write("mark :%s\n" % details["change"])
> @@ -2835,6 +2848,62 @@ class P4Sync(Command, P4UserMap):
>              print "IO error with git fast-import. Is your git version recent enough?"
>              print self.gitError.read()
>
> +    def fixPathCase(self, path):
> +        if self.caseCorrectedPaths:
> +            components = path.split('/')
> +            filename = components.pop()
> +            dirname = '/'.join(components).lower() + '/'
> +            if dirname in self.caseCorrectedPaths:
> +                path = self.caseCorrectedPaths[dirname] + filename
> +        return path
> +
> +    def generatePathCaseDict(self, clientPrefix):
> +        # Query all files and generate a list of all used paths
> +        # e.g. this files list:
> +        # //depot/path/to/file1
> +        # //depot/PATH/to/file2
> +        #
> +        # result in this path list:
> +        # //depot/
> +        # //depot/PATH/
> +        # //depot/path/
> +        # //depot/PATH/to/
> +        # //depot/path/to/
> +        p4_paths = set()
> +        for f in p4CmdList(["files", clientPrefix + "..."]):
> +            components = f["depotFile"].split('/')[0:-1]
> +            for i in range(3, len(components)+1):
> +                p4_paths.add('/'.join(components[0:i]) + '/')
> +        p4_paths = sorted(list(p4_paths), key=len)
> +
> +        if len(p4_paths) > len(set([p.lower() for p in p4_paths])):
> +            print "ATTENTION: File paths with different case variations detected. Fixing may take a while..."

Does this really need "ATTENTION" in the output?

> +            found_variations = True
> +            while found_variations:
> +                for path in p4_paths:
> +                    found_variations = False
> +                    path_variations = [p for p in p4_paths if p.lower() == path.lower()]
> +
> +                    if len(path_variations) > 1:
> +                        print  "%i different case variations for path '%s' detected." % (len(path_variations), path)
> +                        # If we detect path variations (e.g. //depot/path and //depot/PATH) then we query P4 to list
> +                        # the subdirectories of the parent (e.g //depot/*). P4 will return these subdirectories with
> +                        # the correct case.
> +                        parent_path = '/'.join(path.split('/')[0:-2])
> +                        case_ok_paths = [p["dir"] + '/' for p in p4CmdList(["dirs", "-D", parent_path + '/*'])]
> +
> +                        # Replace all known paths with the case corrected path from P4 dirs command
> +                        for case_ok_path in case_ok_paths:
> +                            pattern = re.compile("^" + re.escape(case_ok_path), re.IGNORECASE)
> +                            p4_paths = sorted(list(set([pattern.sub(case_ok_path, p) for p in p4_paths])), key=len)

Sorry, I don't quite get what this is doing.

We're generating a list of all of the files in the depot being cloned.
From that we get a set of all of the paths, sorted by length.
And then we seem to repeatedly rewrite that set with a new set where
each element has been replaced using a regular expression in a way
that I don't quite understand....

> +
> +                        found_variations = True
> +                        break
> +            return dict((p.lower(), p) for p in p4_paths)

... and then we just return a dictionary that maps every path to a
lowercase version of itself? So couldn't we have just blindly
lowercased all the paths in the first place?


> +        else:
> +            if self.verbose:
> +                print "All file paths have consistent case"
> +            return None
>
>      def run(self, args):
>          self.depotPaths = []
> @@ -3006,6 +3075,11 @@ class P4Sync(Command, P4UserMap):
>
>          self.depotPaths = newPaths
>
> +        if self.ignorePathCase and self.clientSpecDirs:
> +            self.caseCorrectedPaths = self.generatePathCaseDict(self.clientSpecDirs.client_prefix)
> +        else:
> +            self.caseCorrectedPaths = None
> +
>          # --detect-branches may change this for each branch
>          self.branchPrefixes = self.depotPaths
>
> diff --git a/t/t9821-git-p4-path-variations.sh b/t/t9821-git-p4-path-variations.sh
> new file mode 100755
> index 0000000..90515a1
> --- /dev/null
> +++ b/t/t9821-git-p4-path-variations.sh
> @@ -0,0 +1,109 @@
> +#!/bin/sh
> +
> +test_description='Clone repositories with path case variations'
> +
> +. ./lib-git-p4.sh
> +
> +test_expect_success 'start p4d with case folding enabled' '
> +       start_p4d -C1
> +'
> +
> +test_expect_success 'Create a repo with path case variations' '
> +       client_view "//depot/... //client/..." &&
> +       (
> +               cd "$cli" &&
> +
> +               mkdir -p One/two &&
> +               >One/two/File2.txt &&
> +               p4 add One/two/File2.txt &&
> +               p4 submit -d "Add file2" &&
> +               rm -rf One &&
> +
> +               mkdir -p one/TWO &&
> +               >one/TWO/file1.txt &&
> +               p4 add one/TWO/file1.txt &&
> +               p4 submit -d "Add file1" &&
> +               rm -rf one &&
> +
> +               mkdir -p one/two &&
> +               >one/two/file3.txt &&
> +               p4 add one/two/file3.txt &&
> +               p4 submit -d "Add file3" &&
> +               rm -rf one &&
> +
> +               mkdir -p outside-spec &&
> +               >outside-spec/file4.txt &&
> +               p4 add outside-spec/file4.txt &&
> +               p4 submit -d "Add file4" &&
> +               rm -rf outside-spec
> +       )
> +'
> +
> +test_expect_success 'Clone the repo and WITHOUT path fixing' '

s/Clone the repo and/Clone the repo WITHOUT/



> +       client_view "//depot/One/... //client/..." &&
> +       git p4 clone --use-client-spec --destination="$git" //depot &&
> +       test_when_finished cleanup_git &&
> +       (
> +               cd "$git" &&
> +               # This method is used instead of "test -f" to ensure the case is
> +               # checked even if the test is executed on case-insensitive file systems.
> +               cat >expect <<-\EOF &&
> +                       two/File2.txt
> +               EOF
> +               git ls-files >actual &&
> +               test_cmp expect actual
> +       )
> +'
> +
> +test_expect_success 'Clone the repo WITH path fixing' '
> +       client_view "//depot/One/... //client/..." &&
> +       git p4 clone --ignore-path-case --use-client-spec --destination="$git" //depot &&
> +       test_when_finished cleanup_git &&
> +       (
> +               cd "$git" &&
> +               cat >expect <<-\EOF &&
> +                       TWO/File2.txt
> +                       TWO/file1.txt
> +                       TWO/file3.txt
> +               EOF
> +               git ls-files >actual &&
> +               test_cmp expect actual
> +       )
> +'
> +
> +# It looks like P4 determines the path case based on the first file in
> +# lexicographical order. Please note the lower case "two" directory for all
> +# files triggered through the addition of "File0.txt".

If that's what P4 is doing, then it's *really* nasty!


> +test_expect_success 'Add a new file and clone the repo WITH path fixing' '
> +       client_view "//depot/... //client/..." &&
> +       (
> +               cd "$cli" &&
> +
> +               mkdir -p One/two &&
> +               >One/two/File0.txt &&
> +               p4 add One/two/File0.txt &&
> +               p4 submit -d "Add file" &&
> +               rm -rf One
> +       ) &&
> +
> +       client_view "//depot/One/... //client/..." &&
> +       git p4 clone --ignore-path-case --use-client-spec --destination="$git" //depot &&
> +       test_when_finished cleanup_git &&
> +       (
> +               cd "$git" &&
> +               cat >expect <<-\EOF &&
> +                       two/File0.txt
> +                       two/File2.txt
> +                       two/file1.txt
> +                       two/file3.txt
> +               EOF
> +               git ls-files >actual &&
> +               test_cmp expect actual
> +       )
> +'
> +
> +test_expect_success 'kill p4d' '
> +       kill_p4d
> +'
> +
> +test_done
> --
> 1.9.5 (Apple Git-50.3)
>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v4] git-p4: fix faulty paths for case insensitive systems
  2015-08-21 18:01 ` Junio C Hamano
@ 2015-08-23 17:10   ` Lars Schneider
  2015-08-24  6:33     ` Junio C Hamano
  0 siblings, 1 reply; 10+ messages in thread
From: Lars Schneider @ 2015-08-23 17:10 UTC (permalink / raw
  To: Junio C Hamano; +Cc: git, luke, pw, torarvid, ksaitoh560, tboegi, sunshine


On 21 Aug 2015, at 20:01, Junio C Hamano <gitster@pobox.com> wrote:

> larsxschneider@gmail.com writes:
> 
>> From: Lars Schneider <larsxschneider@gmail.com>
>> 
>> PROBLEM:
>> We run P4 servers on Linux and P4 clients on Windows. For an unknown
>> reason the file path for a number of files in P4 does not match the
>> directory path with respect to case sensitivity.
>> 
>> E.g. `p4 files` might return
>> //depot/path/to/file1
>> //depot/PATH/to/file2
>> 
>> If you use P4/P4V then these files end up in the same directory, e.g.
>> //depot/path/to/file1
>> //depot/path/to/file2
>> 
>> If you use git-p4 then all files not matching the correct file path
>> (e.g. `file2`) will be ignored.
>> 
>> SOLUTION:
>> Identify path names that are different with respect to case sensitivity.
>> If there are any then run `p4 dirs` to build up a dictionary
>> containing the "correct" cases for each path. It looks like P4
>> interprets "correct" here as the existing path of the first file in a
>> directory. The path dictionary is used later on to fix all paths.
>> 
>> This is only applied if the parameter "--ignore-path-case" is passed to
>> the git-p4 clone command.
>> 
>> 
>> Signed-off-by: Lars Schneider <larsxschneider@gmail.com>
>> ---
> 
> Thanks.  A few comments.
> 
> - Have you checked "git log" on our history and notice how nobody
>   says "PROBLEM:" and "SOLUTION:" in capital letters?  Don't try to
>   be original in the form; your contributions are already original
>   and valuable in the substance ;-)
haha ok. I will make them lower case :-)

> 
> - I think I saw v3 yesterday.  It would be nice to see a brief
>   description of what has been updated in this version.
I discovered an optimization. In v3 I fixed the paths of *all* files that are underneath of a given P4 clone path. In v4 I fix only the paths that are visible on the client via client-spec (P4 can perform partial checkouts via “client-views”). I was wondering how to convey this change. Would have been a cover letter for v4 the correct way or should I have made another commit on top of my v3 change?
 
> 
> I do not do Perforce and am not familiar enough with this code to
> judge myself.  Will wait for area experts (you have them CC'ed, which
> is good) to comment.
> 
>> diff --git a/git-p4.py b/git-p4.py
>> index 073f87b..61a587b 100755
>> --- a/git-p4.py
>> +++ b/git-p4.py
>> @@ -1931,7 +1931,7 @@ class View(object):
>>                 (self.client_prefix, clientFile))
>>         return clientFile[len(self.client_prefix):]
>> 
>> -    def update_client_spec_path_cache(self, files):
>> +    def update_client_spec_path_cache(self, files, fixPathCase = None):
> 
> I didn't check the remainder of the file, but I thought it is
> customery *not* to have spaces around '=' when the parameter is
> written with its default value in Python?
Yes, that is PEP-8 style and I will change it accordingly. Unfortunately git-p4.py does not follow a style guide. 
e.g. line 2369: def commit(self, details, files, branch, parent = ""):

More annoyingly (to me at least) is that git-p4 mixes CamelCase with snake_case even within classes/functions. I think I read somewhere that these kind of refactorings are discouraged. I assume that applies here, too?

> 
>> diff --git a/t/t9821-git-p4-path-variations.sh b/t/t9821-git-p4-path-variations.sh
>> ...
>> +test_expect_success 'Clone the repo and WITHOUT path fixing' '
>> +	client_view "//depot/One/... //client/..." &&
>> +	git p4 clone --use-client-spec --destination="$git" //depot &&
>> +	test_when_finished cleanup_git &&
>> +	(
>> +		cd "$git" &&
>> +		# This method is used instead of "test -f" to ensure the case is
>> +		# checked even if the test is executed on case-insensitive file systems.
>> +		cat >expect <<-\EOF &&
>> +			two/File2.txt
>> +		EOF
> 
> I think we usually write the body of the indented here text
> (i.e. "<<-") indented to the same level as the command and
> delimiter, i.e.
> 
> 	cat >expect <<-\EOF &&
>        body of the here document line 1
>        body of the here document line 2
>        ...
>        EOF
ok

> 
>> +		git ls-files >actual &&
>> +		test_cmp expect actual
>> +	)
>> +'
> 
> I think you used to check the result with "find .", i.e. what the
> filesystem actually recorded.  "ls-files" gives what the index
> records (i.e. to be committed if you were to make a new commit from
> that index).  They can be different in case on case-insensitive
> filesystems, which I think are the ones that are most problematic
> with the issue your patch is trying to address.
> 
> You are verifying what the set of "canonical" paths should be by
> checking ls-files output.  I think that is what you intended to do
> (i.e. I am saying "I think the code is more correct than the earlier
> round that used find"), but I just am double checking.
I agree that “ls-files” is better because it reflects what ends up in the Git repository and how it ends up there.

> 
>> +test_expect_success 'Add a new file and clone the repo WITH path fixing' '
>> +	client_view "//depot/... //client/..." &&
>> +	(
>> +		cd "$cli" &&
>> +
>> +		mkdir -p One/two &&
> 
> A blank after 'cd' only in this one but not others.  A blank line is
> a good vehicle to convey that blocks of text above and below it are
> logically separate, but use it consistently.  I _think_ this blank
> line can go.
Agreed.
> 
>> +		>One/two/File0.txt &&
>> +		p4 add One/two/File0.txt &&
> 
> Thanks.

Thanks again for the feedback,
Lars

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v4] git-p4: fix faulty paths for case insensitive systems
  2015-08-23 17:10   ` Lars Schneider
@ 2015-08-24  6:33     ` Junio C Hamano
  2015-08-24  8:22       ` Lars Schneider
  0 siblings, 1 reply; 10+ messages in thread
From: Junio C Hamano @ 2015-08-24  6:33 UTC (permalink / raw
  To: Lars Schneider; +Cc: git, luke, pw, torarvid, ksaitoh560, tboegi, sunshine

Lars Schneider <larsxschneider@gmail.com> writes:

>> - Have you checked "git log" on our history and notice how nobody
>>   says "PROBLEM:" and "SOLUTION:" in capital letters?  Don't try to
>>   be original in the form; your contributions are already original
>>   and valuable in the substance ;-)
> haha ok. I will make them lower case :-)

I cannot tell if you are joking or not, but just in case you are
serious, please check "git log" for recent history again.  We do not
mark our paragraphs with noisy labels like "PROBLEM" and "SOLUTION",
regardless of case.  Typically, our description outlines the current
status (which prepares the reader's mind to understand what you are
going to talk about), highlight what is problematic in that current
status, and then explains what change the patch does and justifies
why it is the right change, in this order.  So those who read your
description can tell PROBLEM and SOLUTION apart without being told
with labels.

>> - I think I saw v3 yesterday.  It would be nice to see a brief
>>   description of what has been updated in this version.
> I discovered an optimization. In v3 I fixed the paths of *all* files
> that are underneath of a given P4 clone path. In v4 I fix only the
> paths that are visible on the client via client-spec (P4 can perform
> partial checkouts via “client-views”). I was wondering how to convey
> this change. Would have been a cover letter for v4 the correct way or
> should I have made another commit on top of my v3 change?

Often people do this with either

 (1) a cover letter for v4, that shows the "git diff" output to go
     from the result of applying v3 to the result of applying v4 to
     the same initial state; or

 (2) a textual description after three-dash line of v4 that explains
     what has changed relative to v3.

The latter is often done when the change between v3 and v4 is small
enough.

> Yes, that is PEP-8 style and I will change it
> accordingly. Unfortunately git-p4.py does not follow a style guide.
> e.g. line 2369: def commit(self, details, files, branch, parent = ""):

OK, just as I suspected.  Then do not worry too much about it for
now, as fixes to existing style violations should be done outside of
this change, perhaps after the dust settles (or if you prefer, you
can do so as a preliminary clean-up patch, that does not change
anything but style, and then build your fix on top of it).

> More annoyingly (to me at least) is that git-p4 mixes CamelCase with
> snake_case even within classes/functions. I think I read somewhere
> that these kind of refactorings are discouraged. I assume that applies
> here, too?

If you are doing something other than style fixes (call that
"meaningful work"), it is strongly discouraged to fix existing style
violations in the same commit.  If you are going to do meaningful
work on an otherwise dormant part of the system (you can judge it by
checking the recent history of the files you are going to touch,
e.g. "git log --no-merges pu -- git-p4.py"), you are encouraged to
first do the style fixes in separate patches as preliminary clean-ups
without changing anything else and then build your meaningful work
on top of it.

What is discouraged is a change that tries to only do style fixes
etc. to parts of the system that are actively being modified by
other people for their meaningful work.

>> You are verifying what the set of "canonical" paths should be by
>> checking ls-files output.  I think that is what you intended to do
>> (i.e. I am saying "I think the code is more correct than the earlier
>> round that used find"), but I just am double checking.
> I agree that “ls-files” is better because it reflects what ends up
> in the Git repository and how it ends up there.

Thanks. I wanted to double-check that the problem you saw was not
about what is left in the filesystem but more about what is recorded
in the Git history.  "ls-files" check is absolutely the right approach
in that case.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v4] git-p4: fix faulty paths for case insensitive systems
  2015-08-24  6:33     ` Junio C Hamano
@ 2015-08-24  8:22       ` Lars Schneider
  0 siblings, 0 replies; 10+ messages in thread
From: Lars Schneider @ 2015-08-24  8:22 UTC (permalink / raw
  To: Junio C Hamano; +Cc: git, luke, pw, torarvid, ksaitoh560, tboegi, sunshine

On 24 Aug 2015, at 08:33, Junio C Hamano <gitster@pobox.com> wrote:

> Lars Schneider <larsxschneider@gmail.com> writes:
> 
>>> - Have you checked "git log" on our history and notice how nobody
>>>  says "PROBLEM:" and "SOLUTION:" in capital letters?  Don't try to
>>>  be original in the form; your contributions are already original
>>>  and valuable in the substance ;-)
>> haha ok. I will make them lower case :-)
> 
> I cannot tell if you are joking or not, but just in case you are
> serious, please check "git log" for recent history again.  We do not
> mark our paragraphs with noisy labels like "PROBLEM" and "SOLUTION",
> regardless of case.  Typically, our description outlines the current
> status (which prepares the reader's mind to understand what you are
> going to talk about), highlight what is problematic in that current
> status, and then explains what change the patch does and justifies
> why it is the right change, in this order.  So those who read your
> description can tell PROBLEM and SOLUTION apart without being told
> with labels.
I wasn’t joking. I got your point and I am going to change it. Sorry for the confusion.

> 
>>> - I think I saw v3 yesterday.  It would be nice to see a brief
>>>  description of what has been updated in this version.
>> I discovered an optimization. In v3 I fixed the paths of *all* files
>> that are underneath of a given P4 clone path. In v4 I fix only the
>> paths that are visible on the client via client-spec (P4 can perform
>> partial checkouts via “client-views”). I was wondering how to convey
>> this change. Would have been a cover letter for v4 the correct way or
>> should I have made another commit on top of my v3 change?
> 
> Often people do this with either
> 
> (1) a cover letter for v4, that shows the "git diff" output to go
>     from the result of applying v3 to the result of applying v4 to
>     the same initial state; or
> 
> (2) a textual description after three-dash line of v4 that explains
>     what has changed relative to v3.
> 
> The latter is often done when the change between v3 and v4 is small
> enough.
Ok. Thanks!

> 
>> Yes, that is PEP-8 style and I will change it
>> accordingly. Unfortunately git-p4.py does not follow a style guide.
>> e.g. line 2369: def commit(self, details, files, branch, parent = ""):
> 
> OK, just as I suspected.  Then do not worry too much about it for
> now, as fixes to existing style violations should be done outside of
> this change, perhaps after the dust settles (or if you prefer, you
> can do so as a preliminary clean-up patch, that does not change
> anything but style, and then build your fix on top of it).
> 
>> More annoyingly (to me at least) is that git-p4 mixes CamelCase with
>> snake_case even within classes/functions. I think I read somewhere
>> that these kind of refactorings are discouraged. I assume that applies
>> here, too?
> 
> If you are doing something other than style fixes (call that
> "meaningful work"), it is strongly discouraged to fix existing style
> violations in the same commit.  If you are going to do meaningful
> work on an otherwise dormant part of the system (you can judge it by
> checking the recent history of the files you are going to touch,
> e.g. "git log --no-merges pu -- git-p4.py"), you are encouraged to
> first do the style fixes in separate patches as preliminary clean-ups
> without changing anything else and then build your meaningful work
> on top of it.
> 
> What is discouraged is a change that tries to only do style fixes
> etc. to parts of the system that are actively being modified by
> other people for their meaningful work.
Ok. Thanks for the explanation.

> 
>>> You are verifying what the set of "canonical" paths should be by
>>> checking ls-files output.  I think that is what you intended to do
>>> (i.e. I am saying "I think the code is more correct than the earlier
>>> round that used find"), but I just am double checking.
>> I agree that “ls-files” is better because it reflects what ends up
>> in the Git repository and how it ends up there.
> 
> Thanks. I wanted to double-check that the problem you saw was not
> about what is left in the filesystem but more about what is recorded
> in the Git history.  "ls-files" check is absolutely the right approach
> in that case.
Cool!

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v4] git-p4: fix faulty paths for case insensitive systems
  2015-08-23  0:11 ` Luke Diamand
@ 2015-08-24  9:51   ` Lars Schneider
  2015-08-24 12:28     ` Luke Diamand
  0 siblings, 1 reply; 10+ messages in thread
From: Lars Schneider @ 2015-08-24  9:51 UTC (permalink / raw
  To: Luke Diamand
  Cc: Git Users, Pete Wyckoff, Tor Arvid Lund, ksaitoh560,
	Torsten Bögershausen, Eric Sunshine, Junio C Hamano


> Lars - thanks for persisting with this!
> 
> I'm still trying to fully understand what's going on here - can you
> point out where I've got it wrong below please!
Your welcome + sure :)


> The server is on Linux, and is case-sensitive. For whatever reason
> (probably people committing changes on Windows in the first place)
We have many P4 servers. Some run on Linux and some on Windows. The Linux ones are executed with “-C1” flag to "Force the server to operate in case-insensitive mode on a normally case-sensitive platform.”. I did that in the test case, too.


> we've ended up with a P4 repo that has differences in path case that
> need to be ignored, with all upper-case paths being mapped to
> lower-case.
You are correct with “P4 repo that has differences in path case”. But it can be any path case variation. Not only all upper-case. I just realized that all my examples and tests show all upper-case. I will fix that. Consider these files in P4:

//depot/Path/File1
//depot/PATH/File2
//depot/pATH/File3

P4 would sync them on a case insensitive filesystem to:

$CLIENT_DIR/Path/File1
$CLIENT_DIR/Path/File2
$CLIENT_DIR/Path/File3

… and this is how they should end up in Git.


> e.g. the *real* depot might be:
> 
> //depot/path/File1
> //depot/PATH/File2
> 
> git-p4 clone should return this case-folded on Windows (and Linux?)
> 
> $GIT_DIR/path/File1
> $GIT_DIR/path/File2
Correct. Although the correct path might be the following too:
$GIT_DIR/PATH/File1
$GIT_DIR/PATH/File2


> (Am I right in thinking that this change folds the directory, but not
> the filename?)
Correct.


> I don't really understand why a dictionary is required to do this
> though - is there a reason we can't just lowercase all incoming paths?
The result is not necessarily all lowercase. Even though the case doesn’t matter as we are talking about case-insensitve filesystems I want to use the case that P4 would pick (see example above).


> Which is what the existing code in p4StartWith() is trying to do. That
> code lowercases the *entire* path including the filename; this change
> seems to leave the filename portion unchanged. I guess though that the
> answer may be to do with your finding that P4 makes up the case of
> directories based on lexicographic ordering - is that the basic
> problem?
Correct.


> For a large repo, this approach is surely going to be really slow.
True. But we use git-p4 as a one time operation to migrate our repos from P4 to Git. Therefore correctness is more important than speed. Plus it is not enabled by default. You need to pass a parameter switch to git-p4.


> Additionally, could you update your patch to add some words to
> Documentation/git-p4.txt please?
I will do that!


> On 21 August 2015 at 18:19,  <larsxschneider@gmail.com> wrote:
>> From: Lars Schneider <larsxschneider@gmail.com>
>> 
>> PROBLEM:
>> We run P4 servers on Linux and P4 clients on Windows. For an unknown
>> reason the file path for a number of files in P4 does not match the
>> directory path with respect to case sensitivity.
>> 
>> E.g. `p4 files` might return
>> //depot/path/to/file1
>> //depot/PATH/to/file2
>> 
>> If you use P4/P4V then these files end up in the same directory, e.g.
>> //depot/path/to/file1
>> //depot/path/to/file2
>> 
>> If you use git-p4 then all files not matching the correct file path
>> (e.g. `file2`) will be ignored.
> 
> I'd like to think that the existing code that checks core.ignorecase
> should handle this. I haven't tried it myself though.
core.ignorecase doesn’t help. I added a test case to prove that.

> 
> 
>> 
>> SOLUTION:
>> Identify path names that are different with respect to case sensitivity.
>> If there are any then run `p4 dirs` to build up a dictionary
>> containing the "correct" cases for each path. It looks like P4
>> interprets "correct" here as the existing path of the first file in a
>> directory. The path dictionary is used later on to fix all paths.
>> 
>> This is only applied if the parameter "--ignore-path-case" is passed to
>> the git-p4 clone command.
>> 
>> Signed-off-by: Lars Schneider <larsxschneider@gmail.com>
>> ---
>> git-p4.py                         |  82 ++++++++++++++++++++++++++--
>> t/t9821-git-p4-path-variations.sh | 109 ++++++++++++++++++++++++++++++++++++++
>> 2 files changed, 187 insertions(+), 4 deletions(-)
>> create mode 100755 t/t9821-git-p4-path-variations.sh
>> 
>> diff --git a/git-p4.py b/git-p4.py
>> index 073f87b..61a587b 100755
>> --- a/git-p4.py
>> +++ b/git-p4.py
>> @@ -1931,7 +1931,7 @@ class View(object):
>>                 (self.client_prefix, clientFile))
>>         return clientFile[len(self.client_prefix):]
>> 
>> -    def update_client_spec_path_cache(self, files):
>> +    def update_client_spec_path_cache(self, files, fixPathCase = None):
>>         """ Caching file paths by "p4 where" batch query """
>> 
>>         # List depot file paths exclude that already cached
>> @@ -1950,6 +1950,8 @@ class View(object):
>>             if "unmap" in res:
>>                 # it will list all of them, but only one not unmap-ped
>>                 continue
>> +            if fixPathCase:
>> +                res['depotFile'] = fixPathCase(res['depotFile'])
>>             self.client_spec_path_cache[res['depotFile']] = self.convert_client_path(res["clientFile"])
>> 
>>         # not found files or unmap files set to ""
>> @@ -1987,6 +1989,7 @@ class P4Sync(Command, P4UserMap):
>>                                      help="Maximum number of changes to import"),
>>                 optparse.make_option("--changes-block-size", dest="changes_block_size", type="int",
>>                                      help="Internal block size to use when iteratively calling p4 changes"),
>> +                optparse.make_option("--ignore-path-case", dest="ignorePathCase", action="store_true"),
>>                 optparse.make_option("--keep-path", dest="keepRepoPath", action='store_true',
>>                                      help="Keep entire BRANCH/DIR/SUBDIR prefix during import"),
>>                 optparse.make_option("--use-client-spec", dest="useClientSpec", action='store_true',
>> @@ -2017,6 +2020,7 @@ class P4Sync(Command, P4UserMap):
>>         self.maxChanges = ""
>>         self.changes_block_size = None
>>         self.keepRepoPath = False
>> +        self.ignorePathCase = False
>>         self.depotPaths = None
>>         self.p4BranchesInGit = []
>>         self.cloneExclude = []
>> @@ -2049,7 +2053,8 @@ class P4Sync(Command, P4UserMap):
>>         files = []
>>         fnum = 0
>>         while commit.has_key("depotFile%s" % fnum):
>> -            path =  commit["depotFile%s" % fnum]
>> +            path = commit["depotFile%s" % fnum]
>> +            path = self.fixPathCase(path)
>> 
>>             if [p for p in self.cloneExclude
>>                 if p4PathStartsWith(path, p)]:
>> @@ -2113,7 +2118,9 @@ class P4Sync(Command, P4UserMap):
>>         branches = {}
>>         fnum = 0
>>         while commit.has_key("depotFile%s" % fnum):
>> -            path =  commit["depotFile%s" % fnum]
>> +            path = commit["depotFile%s" % fnum]
>> +            path = self.fixPathCase(path)
>> +
>>             found = [p for p in self.depotPaths
>>                      if p4PathStartsWith(path, p)]
>>             if not found:
>> @@ -2240,6 +2247,10 @@ class P4Sync(Command, P4UserMap):
>>             if marshalled["code"] == "error":
>>                 if "data" in marshalled:
>>                     err = marshalled["data"].rstrip()
>> +
>> +        if "depotFile" in marshalled:
>> +            marshalled['depotFile'] = self.fixPathCase(marshalled['depotFile'])
>> +
>>         if err:
>>             f = None
>>             if self.stream_have_file_info:
>> @@ -2314,6 +2325,7 @@ class P4Sync(Command, P4UserMap):
>> 
>>             # do the last chunk
>>             if self.stream_file.has_key('depotFile'):
>> +                self.stream_file['depotFile'] = self.fixPathCase(self.stream_file['depotFile'])
>>                 self.streamOneP4File(self.stream_file, self.stream_contents)
>> 
>>     def make_email(self, userid):
>> @@ -2371,7 +2383,8 @@ class P4Sync(Command, P4UserMap):
>>                 sys.stderr.write("Ignoring file outside of prefix: %s\n" % f['path'])
>> 
>>         if self.clientSpecDirs:
>> -            self.clientSpecDirs.update_client_spec_path_cache(files)
>> +            self.clientSpecDirs.update_client_spec_path_cache(
>> +                files, lambda x: self.fixPathCase(x))
>> 
>>         self.gitStream.write("commit %s\n" % branch)
>> #        gitStream.write("mark :%s\n" % details["change"])
>> @@ -2835,6 +2848,62 @@ class P4Sync(Command, P4UserMap):
>>             print "IO error with git fast-import. Is your git version recent enough?"
>>             print self.gitError.read()
>> 
>> +    def fixPathCase(self, path):
>> +        if self.caseCorrectedPaths:
>> +            components = path.split('/')
>> +            filename = components.pop()
>> +            dirname = '/'.join(components).lower() + '/'
>> +            if dirname in self.caseCorrectedPaths:
>> +                path = self.caseCorrectedPaths[dirname] + filename
>> +        return path
>> +
>> +    def generatePathCaseDict(self, clientPrefix):
>> +        # Query all files and generate a list of all used paths
>> +        # e.g. this files list:
>> +        # //depot/path/to/file1
>> +        # //depot/PATH/to/file2
>> +        #
>> +        # result in this path list:
>> +        # //depot/
>> +        # //depot/PATH/
>> +        # //depot/path/
>> +        # //depot/PATH/to/
>> +        # //depot/path/to/
>> +        p4_paths = set()
>> +        for f in p4CmdList(["files", clientPrefix + "..."]):
>> +            components = f["depotFile"].split('/')[0:-1]
>> +            for i in range(3, len(components)+1):
>> +                p4_paths.add('/'.join(components[0:i]) + '/')
>> +        p4_paths = sorted(list(p4_paths), key=len)
>> +
>> +        if len(p4_paths) > len(set([p.lower() for p in p4_paths])):
>> +            print "ATTENTION: File paths with different case variations detected. Fixing may take a while..."
> 
> Does this really need "ATTENTION" in the output?
No :-) I’ll remove it.

> 
>> +            found_variations = True
>> +            while found_variations:
>> +                for path in p4_paths:
>> +                    found_variations = False
>> +                    path_variations = [p for p in p4_paths if p.lower() == path.lower()]
>> +
>> +                    if len(path_variations) > 1:
>> +                        print  "%i different case variations for path '%s' detected." % (len(path_variations), path)
>> +                        # If we detect path variations (e.g. //depot/path and //depot/PATH) then we query P4 to list
>> +                        # the subdirectories of the parent (e.g //depot/*). P4 will return these subdirectories with
>> +                        # the correct case.
>> +                        parent_path = '/'.join(path.split('/')[0:-2])
>> +                        case_ok_paths = [p["dir"] + '/' for p in p4CmdList(["dirs", "-D", parent_path + '/*'])]
>> +
>> +                        # Replace all known paths with the case corrected path from P4 dirs command
>> +                        for case_ok_path in case_ok_paths:
>> +                            pattern = re.compile("^" + re.escape(case_ok_path), re.IGNORECASE)
>> +                            p4_paths = sorted(list(set([pattern.sub(case_ok_path, p) for p in p4_paths])), key=len)
> 
> Sorry, I don't quite get what this is doing.
> 
> We're generating a list of all of the files in the depot being cloned.
Correct. Take this example:

/Path/file1.txt
/path/Something/file2.txt
/pATh/sOmething/file3.txt
/path/sOmething/ELSE/file4.txt
/path/sOmething/Else/file5.txt


> From that we get a set of all of the paths, sorted by length.
Correct. Following the example:

/Path/
/path/Something/
/pATh/sOmething/
/path/sOmething/ELSE/
/Path/sOmething/Else/

> And then we seem to repeatedly rewrite that set with a new set where
> each element has been replaced using a regular expression in a way
> that I don't quite understand….
I look for paths in this list that differ only in case. In the example the first duplicate that I would detect is “/path/something”. I would use the “dirs” command to figure out which case is the correct one (result here: “/Path/Something”). Afterwards I use the regex to replace items in the list that start with a case variation of this of this path in the list:

/Path/
/Path/Something/
/Path/Something/
/Path/Something/ELSE/
/Path/Something/Else/

In the last step I transform the list to a set to remove duplicates

/Path/
/Path/Something/
/Path/Something/ELSE/
/Path/Something/Else/

At this point the algorithm detects another duplicate “/path/something/else”. We start all over and request the correct path with “dirs” and replace all occurrences. We end up with:

/Path/
/Path/Something/
/Path/Something/Else/

> 
>> +
>> +                        found_variations = True
>> +                        break
>> +            return dict((p.lower(), p) for p in p4_paths)
> 
> ... and then we just return a dictionary that maps every path to a
> lowercase version of itself? So couldn't we have just blindly
> lowercased all the paths in the first place?
The dictionary key is lower case. The dictionary value contains the correct case. In the example:

/path/ —> /Path/
/path/something/ —> /Path/Something/ 
/path/something/else/ —> /Path/Something/Else

> 
> 
>> +        else:
>> +            if self.verbose:
>> +                print "All file paths have consistent case"
>> +            return None
>> 
>>     def run(self, args):
>>         self.depotPaths = []
>> @@ -3006,6 +3075,11 @@ class P4Sync(Command, P4UserMap):
>> 
>>         self.depotPaths = newPaths
>> 
>> +        if self.ignorePathCase and self.clientSpecDirs:
>> +            self.caseCorrectedPaths = self.generatePathCaseDict(self.clientSpecDirs.client_prefix)
>> +        else:
>> +            self.caseCorrectedPaths = None
>> +
>>         # --detect-branches may change this for each branch
>>         self.branchPrefixes = self.depotPaths
>> 
>> diff --git a/t/t9821-git-p4-path-variations.sh b/t/t9821-git-p4-path-variations.sh
>> new file mode 100755
>> index 0000000..90515a1
>> --- /dev/null
>> +++ b/t/t9821-git-p4-path-variations.sh
>> @@ -0,0 +1,109 @@
>> +#!/bin/sh
>> +
>> +test_description='Clone repositories with path case variations'
>> +
>> +. ./lib-git-p4.sh
>> +
>> +test_expect_success 'start p4d with case folding enabled' '
>> +       start_p4d -C1
>> +'
>> +
>> +test_expect_success 'Create a repo with path case variations' '
>> +       client_view "//depot/... //client/..." &&
>> +       (
>> +               cd "$cli" &&
>> +
>> +               mkdir -p One/two &&
>> +               >One/two/File2.txt &&
>> +               p4 add One/two/File2.txt &&
>> +               p4 submit -d "Add file2" &&
>> +               rm -rf One &&
>> +
>> +               mkdir -p one/TWO &&
>> +               >one/TWO/file1.txt &&
>> +               p4 add one/TWO/file1.txt &&
>> +               p4 submit -d "Add file1" &&
>> +               rm -rf one &&
>> +
>> +               mkdir -p one/two &&
>> +               >one/two/file3.txt &&
>> +               p4 add one/two/file3.txt &&
>> +               p4 submit -d "Add file3" &&
>> +               rm -rf one &&
>> +
>> +               mkdir -p outside-spec &&
>> +               >outside-spec/file4.txt &&
>> +               p4 add outside-spec/file4.txt &&
>> +               p4 submit -d "Add file4" &&
>> +               rm -rf outside-spec
>> +       )
>> +'
>> +
>> +test_expect_success 'Clone the repo and WITHOUT path fixing' '
> 
> s/Clone the repo and/Clone the repo WITHOUT/
Fixed!

> 
> 
> 
>> +       client_view "//depot/One/... //client/..." &&
>> +       git p4 clone --use-client-spec --destination="$git" //depot &&
>> +       test_when_finished cleanup_git &&
>> +       (
>> +               cd "$git" &&
>> +               # This method is used instead of "test -f" to ensure the case is
>> +               # checked even if the test is executed on case-insensitive file systems.
>> +               cat >expect <<-\EOF &&
>> +                       two/File2.txt
>> +               EOF
>> +               git ls-files >actual &&
>> +               test_cmp expect actual
>> +       )
>> +'
>> +
>> +test_expect_success 'Clone the repo WITH path fixing' '
>> +       client_view "//depot/One/... //client/..." &&
>> +       git p4 clone --ignore-path-case --use-client-spec --destination="$git" //depot &&
>> +       test_when_finished cleanup_git &&
>> +       (
>> +               cd "$git" &&
>> +               cat >expect <<-\EOF &&
>> +                       TWO/File2.txt
>> +                       TWO/file1.txt
>> +                       TWO/file3.txt
>> +               EOF
>> +               git ls-files >actual &&
>> +               test_cmp expect actual
>> +       )
>> +'
>> +
>> +# It looks like P4 determines the path case based on the first file in
>> +# lexicographical order. Please note the lower case "two" directory for all
>> +# files triggered through the addition of "File0.txt".
> 
> If that's what P4 is doing, then it's *really* nasty!
Indeed it is.

> 
> 
>> +test_expect_success 'Add a new file and clone the repo WITH path fixing' '
>> +       client_view "//depot/... //client/..." &&
>> +       (
>> +               cd "$cli" &&
>> +
>> +               mkdir -p One/two &&
>> +               >One/two/File0.txt &&
>> +               p4 add One/two/File0.txt &&
>> +               p4 submit -d "Add file" &&
>> +               rm -rf One
>> +       ) &&
>> +
>> +       client_view "//depot/One/... //client/..." &&
>> +       git p4 clone --ignore-path-case --use-client-spec --destination="$git" //depot &&
>> +       test_when_finished cleanup_git &&
>> +       (
>> +               cd "$git" &&
>> +               cat >expect <<-\EOF &&
>> +                       two/File0.txt
>> +                       two/File2.txt
>> +                       two/file1.txt
>> +                       two/file3.txt
>> +               EOF
>> +               git ls-files >actual &&
>> +               test_cmp expect actual
>> +       )
>> +'
>> +
>> +test_expect_success 'kill p4d' '
>> +       kill_p4d
>> +'
>> +
>> +test_done
>> --
>> 1.9.5 (Apple Git-50.3)

While I was working on the examples for this email reply I realized that the problem is only present for paths given in a client-spec. I added a test case to prove that. That means I don’t need to request *all* paths. I *think* it is sufficient to request the paths mentioned in the client spec which would usually be really fast. Stay tuned for PATCH v5.

Thanks for your review,
Lars

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v4] git-p4: fix faulty paths for case insensitive systems
  2015-08-24  9:51   ` Lars Schneider
@ 2015-08-24 12:28     ` Luke Diamand
  2015-08-24 12:43       ` Lars Schneider
  0 siblings, 1 reply; 10+ messages in thread
From: Luke Diamand @ 2015-08-24 12:28 UTC (permalink / raw
  To: Lars Schneider
  Cc: Git Users, Pete Wyckoff, Tor Arvid Lund, ksaitoh560,
	Torsten Bögershausen, Eric Sunshine, Junio C Hamano

On 24 August 2015 at 10:51, Lars Schneider <larsxschneider@gmail.com> wrote:
>>
>> I'm still trying to fully understand what's going on here - can you
>> point out where I've got it wrong below please!
> Your welcome + sure :)
>

<snip>

>
> While I was working on the examples for this email reply I realized that the problem is only present for paths given in a client-spec. I added a test case to prove that. That means I don’t need to request *all* paths. I *think* it is sufficient to request the paths mentioned in the client spec which would usually be really fast. Stay tuned for PATCH v5.

I've just tried a small experiment with stock unaltered git-p4.

- Started p4d with -C1 option to case-fold.
- Add some files with different cases of directory (Path/File1,
PATH/File2, pATH/File3).
- git-p4 clone.

The result of the clone is separate directories if I do nothing
special (PATH/File1, Path/File2, etc). But if I set core.ignorecase, I
get a single case-folded directory, Path/File1, Path/File2, etc. I'm
still failing to get how that isn't what you need (other than being a
bit obscure to get to the right invocation).

I've put a script that shows this here:

https://github.com/luked99/quick-git-p4-case-folding-test

Thanks!
Luke

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v4] git-p4: fix faulty paths for case insensitive systems
  2015-08-24 12:28     ` Luke Diamand
@ 2015-08-24 12:43       ` Lars Schneider
  2015-08-24 13:49         ` Luke Diamand
  0 siblings, 1 reply; 10+ messages in thread
From: Lars Schneider @ 2015-08-24 12:43 UTC (permalink / raw
  To: Luke Diamand
  Cc: Git Users, Pete Wyckoff, Tor Arvid Lund, ksaitoh560,
	Torsten Bögershausen, Eric Sunshine, Junio C Hamano


> 
> <snip>
> 
>> 
>> While I was working on the examples for this email reply I realized that the problem is only present for paths given in a client-spec. I added a test case to prove that. That means I don’t need to request *all* paths. I *think* it is sufficient to request the paths mentioned in the client spec which would usually be really fast. Stay tuned for PATCH v5.
> 
> I've just tried a small experiment with stock unaltered git-p4.
> 
> - Started p4d with -C1 option to case-fold.
> - Add some files with different cases of directory (Path/File1,
> PATH/File2, pATH/File3).
> - git-p4 clone.
> 
> The result of the clone is separate directories if I do nothing
> special (PATH/File1, Path/File2, etc). But if I set core.ignorecase, I
> get a single case-folded directory, Path/File1, Path/File2, etc. I'm
> still failing to get how that isn't what you need (other than being a
> bit obscure to get to the right invocation).
> 
> I've put a script that shows this here:
> 
> https://github.com/luked99/quick-git-p4-case-folding-test
As mentioned I realized that the problem occurs only if you use client specs. Can you take a look at this test case / run it?
https://github.com/larsxschneider/git/blob/d3a191cb5fb4d8f5f48ca9314c772169d5dbf65b/t/t9821-git-p4-path-variations.sh#L112-L127

Does this make sense to you? If you want to I could also modify “quick-git-p4-case-folding-test” to show the problem.

Cheers,
Lars

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v4] git-p4: fix faulty paths for case insensitive systems
  2015-08-24 12:43       ` Lars Schneider
@ 2015-08-24 13:49         ` Luke Diamand
  0 siblings, 0 replies; 10+ messages in thread
From: Luke Diamand @ 2015-08-24 13:49 UTC (permalink / raw
  To: Lars Schneider
  Cc: Git Users, Pete Wyckoff, Tor Arvid Lund, ksaitoh560,
	Torsten Bögershausen, Eric Sunshine, Junio C Hamano

On 24 August 2015 at 13:43, Lars Schneider <larsxschneider@gmail.com> wrote:
>>
>> https://github.com/luked99/quick-git-p4-case-folding-test
> As mentioned I realized that the problem occurs only if you use client specs. Can you take a look at this test case / run it?
> https://github.com/larsxschneider/git/blob/d3a191cb5fb4d8f5f48ca9314c772169d5dbf65b/t/t9821-git-p4-path-variations.sh#L112-L127
>
> Does this make sense to you? If you want to I could also modify “quick-git-p4-case-folding-test” to show the problem.

If you're able to fix my hacky test to show the problem that would be very kind.

If the problem only shows up when using client specs, is it possible
that the core.ignorecase logic is just missing a code path somewhere?

Glancing through the code, stripRepoPath() could perhaps be the
culprit? If self.useClientSpec is FALSE, it will do the
core.ignorecase trick by calling p4PathStartsWith, but if it is TRUE,
it won't.

Thanks!
Luke

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2015-08-24 13:50 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-08-21 17:19 [PATCH v4] git-p4: fix faulty paths for case insensitive systems larsxschneider
2015-08-21 18:01 ` Junio C Hamano
2015-08-23 17:10   ` Lars Schneider
2015-08-24  6:33     ` Junio C Hamano
2015-08-24  8:22       ` Lars Schneider
2015-08-23  0:11 ` Luke Diamand
2015-08-24  9:51   ` Lars Schneider
2015-08-24 12:28     ` Luke Diamand
2015-08-24 12:43       ` Lars Schneider
2015-08-24 13:49         ` Luke Diamand

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).