git@vger.kernel.org mailing list mirror (one of many)
* [PATCH v1] git-p4: fix git-p4.pathEncoding for removed files
@ 2016-12-18 17:51 larsxschneider
  2016-12-19 21:29 ` Junio C Hamano
  0 siblings, 1 reply; 8+ messages in thread
From: larsxschneider @ 2016-12-18 17:51 UTC
  To: git; +Cc: luke, gitster, Lars Schneider

From: Lars Schneider <larsxschneider@gmail.com>

In a9e38359e3 we taught git-p4 a way to re-encode path names from what
was used in Perforce to UTF-8. This path re-encoding worked properly for
"added" paths. "Removed" paths were not re-encoded and therefore
different from the "added" paths. Consequently, these files were not
removed in a git-p4 cloned Git repository because the path names did not
match.

Fix this by moving the re-encoding to a place that affects "added" and
"removed" paths. Add a test to demonstrate the issue.

Signed-off-by: Lars Schneider <larsxschneider@gmail.com>
---
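A minimal sketch of the mismatch described above. It is written in Python 3 for illustration; git-p4 at this point is Python 2, where `str` is already a byte string. The helper name `encode_with_utf8` and the `replay` function are hypothetical, and a Python `set` stands in for the file list of the cloned Git repository. The ASCII check, the `utf8` default, and the decode/encode with `'replace'` mirror the patch. The configured encoding is passed as a parameter here rather than read from `git-p4.pathEncoding`, and only `UnicodeDecodeError` is caught instead of using a bare `except:`.

```python
def encode_with_utf8(path: bytes, encoding: str = "utf8") -> bytes:
    """Re-encode a Perforce path to UTF-8; pure-ASCII paths are returned as-is.

    `encoding` stands in for the git-p4.pathEncoding setting (default utf8).
    """
    try:
        path.decode("ascii")
    except UnicodeDecodeError:
        path = path.decode(encoding, "replace").encode("utf8", "replace")
    return path


def replay(changes, reencode_deletions: bool, encoding: str) -> set:
    """Apply ("add" | "delete", raw_path) events to a set of paths."""
    tree = set()
    for action, raw in changes:
        if action == "add":
            # Added paths were always re-encoded, before and after the fix.
            tree.add(encode_with_utf8(raw, encoding))
        elif reencode_deletions:
            # After the fix: the deletion uses the same re-encoded name.
            tree.discard(encode_with_utf8(raw, encoding))
        else:
            # Before the fix: the raw bytes never match the re-encoded name.
            tree.discard(raw)
    return tree


# "café.txt" as ISO-8859-1 bytes, the way the test's Perforce depot stores it
latin1_path = "caf\u00e9.txt".encode("iso-8859-1")
changes = [("add", latin1_path), ("delete", latin1_path)]

print(replay(changes, reencode_deletions=False, encoding="iso-8859-1"))
# -> {b'caf\xc3\xa9.txt'}  (the file survives its deletion)
print(replay(changes, reencode_deletions=True, encoding="iso-8859-1"))
# -> set()
```

The second result matches what the new t9822 test expects: after the deletion, `git ls-files` in the clone is empty.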

Notes:
    Base Commit: d1271bddd4 (v2.11.0)
    Diff on Web: https://github.com/git/git/compare/d1271bddd4...larsxschneider:05a82caa69
    Checkout:    git fetch https://github.com/larsxschneider/git git-p4/fix-path-encoding-v1 && git checkout 05a82caa69

 git-p4.py                       | 19 +++++++++----------
 t/t9822-git-p4-path-encoding.sh | 16 ++++++++++++++++
 2 files changed, 25 insertions(+), 10 deletions(-)

diff --git a/git-p4.py b/git-p4.py
index fd5ca52462..8f311cb4e8 100755
--- a/git-p4.py
+++ b/git-p4.py
@@ -2366,6 +2366,15 @@ class P4Sync(Command, P4UserMap):
                     break
 
         path = wildcard_decode(path)
+        try:
+            path.decode('ascii')
+        except:
+            encoding = 'utf8'
+            if gitConfig('git-p4.pathEncoding'):
+                encoding = gitConfig('git-p4.pathEncoding')
+            path = path.decode(encoding, 'replace').encode('utf8', 'replace')
+            if self.verbose:
+                print 'Path with non-ASCII characters detected. Used %s to encode: %s ' % (encoding, path)
         return path
 
     def splitFilesIntoBranches(self, commit):
@@ -2495,16 +2504,6 @@ class P4Sync(Command, P4UserMap):
             text = regexp.sub(r'$\1$', text)
             contents = [ text ]
 
-        try:
-            relPath.decode('ascii')
-        except:
-            encoding = 'utf8'
-            if gitConfig('git-p4.pathEncoding'):
-                encoding = gitConfig('git-p4.pathEncoding')
-            relPath = relPath.decode(encoding, 'replace').encode('utf8', 'replace')
-            if self.verbose:
-                print 'Path with non-ASCII characters detected. Used %s to encode: %s ' % (encoding, relPath)
-
         if self.largeFileSystem:
             (git_mode, contents) = self.largeFileSystem.processContent(git_mode, relPath, contents)
 
diff --git a/t/t9822-git-p4-path-encoding.sh b/t/t9822-git-p4-path-encoding.sh
index 7b83e696a9..c78477c19b 100755
--- a/t/t9822-git-p4-path-encoding.sh
+++ b/t/t9822-git-p4-path-encoding.sh
@@ -51,6 +51,22 @@ test_expect_success 'Clone repo containing iso8859-1 encoded paths with git-p4.p
 	)
 '
 
+test_expect_success 'Delete iso8859-1 encoded paths and clone' '
+	(
+		cd "$cli" &&
+		ISO8859="$(printf "$ISO8859_ESCAPED")" &&
+		p4 delete "$ISO8859" &&
+		p4 submit -d "remove file"
+	) &&
+	git p4 clone --destination="$git" //depot@all &&
+	test_when_finished cleanup_git &&
+	(
+		cd "$git" &&
+		git -c core.quotepath=false ls-files >actual &&
+		test_must_be_empty actual
+	)
+'
+
 test_expect_success 'kill p4d' '
 	kill_p4d
 '
-- 
2.11.0



* Re: [PATCH v1] git-p4: fix git-p4.pathEncoding for removed files
  2016-12-18 17:51 [PATCH v1] git-p4: fix git-p4.pathEncoding for removed files larsxschneider
@ 2016-12-19 21:29 ` Junio C Hamano
  2016-12-20 11:01   ` Luke Diamand
  0 siblings, 1 reply; 8+ messages in thread
From: Junio C Hamano @ 2016-12-19 21:29 UTC
  To: larsxschneider; +Cc: git, luke

larsxschneider@gmail.com writes:

> From: Lars Schneider <larsxschneider@gmail.com>
>
> In a9e38359e3 we taught git-p4 a way to re-encode path names from what
> was used in Perforce to UTF-8. This path re-encoding worked properly for
> "added" paths. "Removed" paths were not re-encoded and therefore
> different from the "added" paths. Consequently, these files were not
> removed in a git-p4 cloned Git repository because the path names did not
> match.
>
> Fix this by moving the re-encoding to a place that affects "added" and
> "removed" paths. Add a test to demonstrate the issue.
>
> Signed-off-by: Lars Schneider <larsxschneider@gmail.com>
> ---

Thanks.

The above description makes me wonder what happens to "modified"
paths, but presumably they are handled in a separate codepath?  Or
does this also cover not just "removed" but also paths with any
change?

Luke, does this look good?



* Re: [PATCH v1] git-p4: fix git-p4.pathEncoding for removed files
  2016-12-19 21:29 ` Junio C Hamano
@ 2016-12-20 11:01   ` Luke Diamand
  2016-12-22 21:23     ` Junio C Hamano
  2017-02-09 15:06     ` [PATCH v2] " Lars Schneider
  0 siblings, 2 replies; 8+ messages in thread
From: Luke Diamand @ 2016-12-20 11:01 UTC
  To: Junio C Hamano; +Cc: Lars Schneider, Git Users

On 19 December 2016 at 21:29, Junio C Hamano <gitster@pobox.com> wrote:
> The above description makes me wonder what happens to "modified"
> paths, but presumably they are handled in a separate codepath?  Or
> does this also cover not just "removed" but also paths with any
> change?
>
> Luke, does this look good?

I'm not totally sure. In the previous version the conversion happened
in streamOneP4File(). There is a counterpart to this,
streamOneP4Deletion() which would seem like the callpoint that needs
to know about this.

The change puts the logic into stripRepoPath() instead, which is
indeed called from both of those functions (good), but is also called
from splitFilesIntoBranches() when self.useClientSpec is set. That
function is only used by the automatic branch detection logic, so
this code might now be broken without our knowing.

Lars, what do you think? Other than the above, the change looks good,
so it may all be fine.

(As an aside, this is the heart of the code that's going to need some
careful rework if/when we ever move to Python3).
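A rough illustration of one Python 3 pitfall in this code, offered as a sketch rather than a proposed change. In Python 3, `str` has no `decode()` method, so the patch's bare `except:` also swallows the resulting `AttributeError`, and the second `decode()` call inside the handler then fails anyway. The function below is a hypothetical, literal transliteration of the patch's logic (minus the verbose print), used only to show that it works on bytes and breaks on `str`:

```python
def encode_with_utf8_py2_style(path, encoding="utf8"):
    # Literal transliteration of the patch's logic, including the bare except.
    try:
        path.decode("ascii")
    except:  # noqa: E722 - mirrors the patch; also catches AttributeError
        path = path.decode(encoding, "replace").encode("utf8", "replace")
    return path


# Byte-string input, as in Python 2: works as intended.
print(encode_with_utf8_py2_style(b"caf\xe9.txt", "iso-8859-1"))  # b'caf\xc3\xa9.txt'

# Text input, as Python 3 code would often pass: fails inside the handler.
try:
    encode_with_utf8_py2_style("caf\u00e9.txt", "iso-8859-1")
except AttributeError as exc:
    print("str input fails:", exc)
```

A port would have to decide at which boundary paths are bytes and where they become text, rather than relying on the bare `except:`.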

Luke


* Re: [PATCH v1] git-p4: fix git-p4.pathEncoding for removed files
  2016-12-20 11:01   ` Luke Diamand
@ 2016-12-22 21:23     ` Junio C Hamano
  2017-02-09 15:06     ` [PATCH v2] " Lars Schneider
  1 sibling, 0 replies; 8+ messages in thread
From: Junio C Hamano @ 2016-12-22 21:23 UTC
  To: Luke Diamand; +Cc: Lars Schneider, Git Users

Luke Diamand <luke@diamand.org> writes:

> The change puts the logic into stripRepoPath() instead, which is
> indeed called from both of those functions (good), but is also called
> from splitFilesIntoBranches() when self.useClientSpec is set. That
> function is only used by the automatic branch detection logic, so
> this code might now be broken without our knowing.
>
> Lars, what do you think? Other than the above, the change looks good,
> so it may all be fine.
>
> (As an aside, this is the heart of the code that's going to need some
> careful rework if/when we ever move to Python3).

Thanks.  

I'll merge this as-is to 'next', expecting that further refinement
can be done incrementally.



* [PATCH v2] git-p4: fix git-p4.pathEncoding for removed files
  2016-12-20 11:01   ` Luke Diamand
  2016-12-22 21:23     ` Junio C Hamano
@ 2017-02-09 15:06     ` Lars Schneider
  2017-02-09 23:39       ` Junio C Hamano
  1 sibling, 1 reply; 8+ messages in thread
From: Lars Schneider @ 2017-02-09 15:06 UTC
  To: git; +Cc: luke, gitster

In a9e38359e3 we taught git-p4 a way to re-encode path names from what
was used in Perforce to UTF-8. This path re-encoding worked properly for
"added" paths. "Removed" paths were not re-encoded and therefore
different from the "added" paths. Consequently, these files were not
removed in a git-p4 cloned Git repository because the path names did not
match.

Fix this by moving the re-encoding to a place that affects "added" and
"removed" paths. Add a test to demonstrate the issue.

Signed-off-by: Lars Schneider <larsxschneider@gmail.com>
---

Hi,

Unfortunately, I forgot to send this v2. I agree with Luke's review and
have moved the re-encoding of the path name into `streamOneP4File` and
`streamOneP4Deletion` explicitly.
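The v2 layout, where one helper is called explicitly from both the file and the deletion code paths, can be sketched roughly as follows. This is a Python 3 toy, not git-p4 itself. The class `ToySync`, its snake_case method names, and the `commands` list standing in for `gitStream` are hypothetical. Prefix stripping is reduced to a plain slice. The recorded `M`/`D` entries are named after git fast-import's filemodify and filedelete commands, but modes and file data are omitted.

```python
class ToySync:
    """Toy model of the v2 call structure: one helper, two call sites."""

    def __init__(self, depot_prefix: bytes, path_encoding: str = "utf8"):
        self.depot_prefix = depot_prefix
        self.path_encoding = path_encoding  # stands in for git-p4.pathEncoding
        self.commands = []  # stands in for the fast-import stream

    def strip_repo_path(self, depot_path: bytes) -> bytes:
        # Simplified: the real stripRepoPath also handles client specs and branches.
        return depot_path[len(self.depot_prefix):]

    def encode_with_utf8(self, path: bytes) -> bytes:
        # Same idea as encodeWithUTF8 in the patch: leave ASCII alone,
        # otherwise decode with the configured encoding and emit UTF-8.
        try:
            path.decode("ascii")
        except UnicodeDecodeError:
            path = path.decode(self.path_encoding, "replace").encode("utf8", "replace")
        return path

    def stream_one_file(self, depot_path: bytes) -> None:
        rel_path = self.encode_with_utf8(self.strip_repo_path(depot_path))
        self.commands.append((b"M", rel_path))

    def stream_one_deletion(self, depot_path: bytes) -> None:
        rel_path = self.encode_with_utf8(self.strip_repo_path(depot_path))
        self.commands.append((b"D", rel_path))


sync = ToySync(b"//depot/", path_encoding="iso-8859-1")
depot_file = "//depot/caf\u00e9.txt".encode("iso-8859-1")
sync.stream_one_file(depot_file)
sync.stream_one_deletion(depot_file)
for command in sync.commands:
    print(command)
```

Both call sites run the same conversion right after stripping the prefix, so the `D` path now matches the `M` path byte for byte. Keeping the call out of `stripRepoPath` also leaves `splitFilesIntoBranches` untouched, which addresses Luke's concern about the branch detection code.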

Discussion:
http://public-inbox.org/git/CAE5ih7-=bD_ZoL5pFYfD2Qvy-XE24V_cgge0XoAvuoTK02EDfg@mail.gmail.com/

Thanks,
Lars


Notes:
    Base Commit: 454cb6bd52 (v2.11.0)
    Diff on Web: https://github.com/larsxschneider/git/commit/75ed3e92e2
    Checkout:    git fetch https://github.com/larsxschneider/git git-p4/fix-path-encoding-v2 && git checkout 75ed3e92e2

    Interdiff (v1..v2):

    diff --git a/git-p4.py b/git-p4.py
    index 8f311cb4e8..dac8b4955d 100755
    --- a/git-p4.py
    +++ b/git-p4.py
    @@ -2366,15 +2366,6 @@ class P4Sync(Command, P4UserMap):
                         break

             path = wildcard_decode(path)
    -        try:
    -            path.decode('ascii')
    -        except:
    -            encoding = 'utf8'
    -            if gitConfig('git-p4.pathEncoding'):
    -                encoding = gitConfig('git-p4.pathEncoding')
    -            path = path.decode(encoding, 'replace').encode('utf8', 'replace')
    -            if self.verbose:
    -                print 'Path with non-ASCII characters detected. Used %s to encode: %s ' % (encoding, path)
             return path

         def splitFilesIntoBranches(self, commit):
    @@ -2427,11 +2418,24 @@ class P4Sync(Command, P4UserMap):
                 self.gitStream.write(d)
             self.gitStream.write('\n')

    +    def encodeWithUTF8(self, path):
    +        try:
    +            path.decode('ascii')
    +        except:
    +            encoding = 'utf8'
    +            if gitConfig('git-p4.pathEncoding'):
    +                encoding = gitConfig('git-p4.pathEncoding')
    +            path = path.decode(encoding, 'replace').encode('utf8', 'replace')
    +            if self.verbose:
    +                print 'Path with non-ASCII characters detected. Used %s to encode: %s ' % (encoding, path)
    +        return path
    +
         # output one file from the P4 stream
         # - helper for streamP4Files

         def streamOneP4File(self, file, contents):
             relPath = self.stripRepoPath(file['depotFile'], self.branchPrefixes)
    +        relPath = self.encodeWithUTF8(relPath)
             if verbose:
                 size = int(self.stream_file['fileSize'])
                 sys.stdout.write('\r%s --> %s (%i MB)\n' % (file['depotFile'], relPath, size/1024/1024))
    @@ -2511,6 +2515,7 @@ class P4Sync(Command, P4UserMap):

         def streamOneP4Deletion(self, file):
             relPath = self.stripRepoPath(file['path'], self.branchPrefixes)
    +        relPath = self.encodeWithUTF8(relPath)
             if verbose:
                 sys.stdout.write("delete %s\n" % relPath)
                 sys.stdout.flush()

 git-p4.py                       | 24 ++++++++++++++----------
 t/t9822-git-p4-path-encoding.sh | 16 ++++++++++++++++
 2 files changed, 30 insertions(+), 10 deletions(-)

diff --git a/git-p4.py b/git-p4.py
index fd5ca52462..dac8b4955d 100755
--- a/git-p4.py
+++ b/git-p4.py
@@ -2418,11 +2418,24 @@ class P4Sync(Command, P4UserMap):
             self.gitStream.write(d)
         self.gitStream.write('\n')

+    def encodeWithUTF8(self, path):
+        try:
+            path.decode('ascii')
+        except:
+            encoding = 'utf8'
+            if gitConfig('git-p4.pathEncoding'):
+                encoding = gitConfig('git-p4.pathEncoding')
+            path = path.decode(encoding, 'replace').encode('utf8', 'replace')
+            if self.verbose:
+                print 'Path with non-ASCII characters detected. Used %s to encode: %s ' % (encoding, path)
+        return path
+
     # output one file from the P4 stream
     # - helper for streamP4Files

     def streamOneP4File(self, file, contents):
         relPath = self.stripRepoPath(file['depotFile'], self.branchPrefixes)
+        relPath = self.encodeWithUTF8(relPath)
         if verbose:
             size = int(self.stream_file['fileSize'])
             sys.stdout.write('\r%s --> %s (%i MB)\n' % (file['depotFile'], relPath, size/1024/1024))
@@ -2495,16 +2508,6 @@ class P4Sync(Command, P4UserMap):
             text = regexp.sub(r'$\1$', text)
             contents = [ text ]

-        try:
-            relPath.decode('ascii')
-        except:
-            encoding = 'utf8'
-            if gitConfig('git-p4.pathEncoding'):
-                encoding = gitConfig('git-p4.pathEncoding')
-            relPath = relPath.decode(encoding, 'replace').encode('utf8', 'replace')
-            if self.verbose:
-                print 'Path with non-ASCII characters detected. Used %s to encode: %s ' % (encoding, relPath)
-
         if self.largeFileSystem:
             (git_mode, contents) = self.largeFileSystem.processContent(git_mode, relPath, contents)

@@ -2512,6 +2515,7 @@ class P4Sync(Command, P4UserMap):

     def streamOneP4Deletion(self, file):
         relPath = self.stripRepoPath(file['path'], self.branchPrefixes)
+        relPath = self.encodeWithUTF8(relPath)
         if verbose:
             sys.stdout.write("delete %s\n" % relPath)
             sys.stdout.flush()
diff --git a/t/t9822-git-p4-path-encoding.sh b/t/t9822-git-p4-path-encoding.sh
index 7b83e696a9..c78477c19b 100755
--- a/t/t9822-git-p4-path-encoding.sh
+++ b/t/t9822-git-p4-path-encoding.sh
@@ -51,6 +51,22 @@ test_expect_success 'Clone repo containing iso8859-1 encoded paths with git-p4.p
 	)
 '

+test_expect_success 'Delete iso8859-1 encoded paths and clone' '
+	(
+		cd "$cli" &&
+		ISO8859="$(printf "$ISO8859_ESCAPED")" &&
+		p4 delete "$ISO8859" &&
+		p4 submit -d "remove file"
+	) &&
+	git p4 clone --destination="$git" //depot@all &&
+	test_when_finished cleanup_git &&
+	(
+		cd "$git" &&
+		git -c core.quotepath=false ls-files >actual &&
+		test_must_be_empty actual
+	)
+'
+
 test_expect_success 'kill p4d' '
 	kill_p4d
 '

base-commit: 454cb6bd52a4de614a3633e4f547af03d5c3b640
--
2.11.0



* Re: [PATCH v2] git-p4: fix git-p4.pathEncoding for removed files
  2017-02-09 15:06     ` [PATCH v2] " Lars Schneider
@ 2017-02-09 23:39       ` Junio C Hamano
  2017-02-10 22:05         ` Luke Diamand
  0 siblings, 1 reply; 8+ messages in thread
From: Junio C Hamano @ 2017-02-09 23:39 UTC
  To: Lars Schneider; +Cc: git, luke

Lars Schneider <larsxschneider@gmail.com> writes:

> Unfortunately, I forgot to send this v2. I agree with Luke's review and
> have moved the re-encoding of the path name into `streamOneP4File` and
> `streamOneP4Deletion` explicitly.
>
> Discussion:
> http://public-inbox.org/git/CAE5ih7-=bD_ZoL5pFYfD2Qvy-XE24V_cgge0XoAvuoTK02EDfg@mail.gmail.com/
>
> Thanks,
> Lars

Thanks.  Will replace, but will not merge to 'next' immediately,
just in case Luke wants to tell me to add his "Reviewed-by:".


* Re: [PATCH v2] git-p4: fix git-p4.pathEncoding for removed files
  2017-02-09 23:39       ` Junio C Hamano
@ 2017-02-10 22:05         ` Luke Diamand
  2017-02-10 22:32           ` Junio C Hamano
  0 siblings, 1 reply; 8+ messages in thread
From: Luke Diamand @ 2017-02-10 22:05 UTC
  To: Junio C Hamano; +Cc: Lars Schneider, Git Users

On 9 February 2017 at 23:39, Junio C Hamano <gitster@pobox.com> wrote:
> Thanks.  Will replace, but will not merge to 'next' immediately,
> just in case Luke wants to tell me to add his "Reviewed-by:".

Yes, this looks good to me now.

Luke


* Re: [PATCH v2] git-p4: fix git-p4.pathEncoding for removed files
  2017-02-10 22:05         ` Luke Diamand
@ 2017-02-10 22:32           ` Junio C Hamano
  0 siblings, 0 replies; 8+ messages in thread
From: Junio C Hamano @ 2017-02-10 22:32 UTC
  To: Luke Diamand; +Cc: Lars Schneider, Git Users

Luke Diamand <luke@diamand.org> writes:

> Yes, this looks good to me now.

Thanks.

