git@vger.kernel.org mailing list mirror (one of many)
From: Yang Zhao <yang.zhao@skyboxlabs.com>
To: git@vger.kernel.org
Cc: Yang Zhao <yang.zhao@skyboxlabs.com>
Subject: [RFC PATCH 4/4] git-p4: use utf-8 encoding for file paths throughout
Date: Wed, 27 Nov 2019 17:28:07 -0800
Message-ID: <20191128012807.3103-5-yang.zhao@skyboxlabs.com>
In-Reply-To: <20191128012807.3103-1-yang.zhao@skyboxlabs.com>

Try to decode file paths in responses from p4 as soon as possible so
that we are working with unicode strings throughout the rest of the flow.
This makes Python 3 a lot happier, since it refuses to mix bytes and str.
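
Under Python 3, bytes and str do not mix, so any path manipulation done on
raw p4 output fails unless every operand has the same type. The sketch below
shows the failure and the decode-early fix. `strip_prefix` is a hypothetical,
simplified stand-in for git-p4's `stripRepoPath`, not its actual code, and the
prefix list is assumed to be str (for example, parsed from the command line).

```python
def strip_prefix(path, prefixes):
    """Hypothetical stand-in for stripRepoPath: drop the first matching prefix."""
    for prefix in prefixes:
        if path.startswith(prefix):
            return path[len(prefix):]
    return path


raw_depot_file = b"//depot/project/src/main.c"  # bytes, as unmarshalled from p4
prefixes = ["//depot/project/"]                 # str (assumed source: command line)

# Mixing bytes and str raises TypeError under Python 3.
try:
    strip_prefix(raw_depot_file, prefixes)
except TypeError as err:
    print("TypeError:", err)

# Decoding as soon as the path arrives keeps every later step in str.
rel_path = strip_prefix(raw_depot_file.decode("utf-8"), prefixes)
print(rel_path)  # src/main.c
```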

Signed-off-by: Yang Zhao <yang.zhao@skyboxlabs.com>
---

This is probably the riskiest patch in the set. It's very likely
that I've neglected to consider certain corner cases in decoding
path data.
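
The `depotFile` branch of the first hunk boils down to: try ASCII, otherwise
fall back to the configured `git-p4.pathEncoding` (UTF-8 by default) with
'replace' error handling. Below is a standalone sketch of that rule, with the
gitConfig lookup replaced by a plain parameter; `decode_depot_path` is a
hypothetical name. It makes one corner case easy to see: bytes that are invalid
in the chosen encoding become U+FFFD, so two distinct depot paths can decode
to the same string.

```python
def decode_depot_path(raw, path_encoding=None):
    """Hypothetical helper mirroring the patch's depotFile handling.

    path_encoding stands in for the git-p4.pathEncoding config value;
    None means "not set", in which case UTF-8 is used.
    """
    try:
        return raw.decode("ascii")
    except UnicodeDecodeError:
        encoding = path_encoding or "utf-8"
        return raw.decode(encoding, "replace")


ascii_path = decode_depot_path(b"//depot/readme.txt")
utf8_path = decode_depot_path("//depot/caf\u00e9.txt".encode("utf-8"))
cp1252_path = decode_depot_path("//depot/caf\u00e9.txt".encode("cp1252"), "cp1252")

# Corner case: Latin-1 bytes decoded as UTF-8 are lossy.
lossy_a = decode_depot_path(b"//depot/\xe9.txt")  # e-acute in Latin-1
lossy_b = decode_depot_path(b"//depot/\xe8.txt")  # e-grave in Latin-1
print(lossy_a == lossy_b)  # True: both become '//depot/\ufffd.txt'
```

Setting `git-p4.pathEncoding` to the encoding the p4 clients actually used
avoids this collision; the ASCII fast path leaves plain paths untouched.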

 git-p4.py | 34 ++++++++++++++++++----------------
 1 file changed, 18 insertions(+), 16 deletions(-)

diff --git a/git-p4.py b/git-p4.py
index 6821d6aafd..bd693e1404 100755
--- a/git-p4.py
+++ b/git-p4.py
@@ -650,11 +650,27 @@ def p4CmdList(cmd, stdin=None, stdin_mode='w+b', cb=None, skip_info=False,
             if use_encoded_streams:
                 # Decode unmarshalled dict to use str keys and values, except for:
                 #   - `data` which may contain arbitrary binary data
-                #   - `depotFile` which may contain non-UTF8 encoded text
+                #   - `depotFile` which may contain non-UTF8 encoded text, and is decoded
+                #     according to git-p4.pathEncoding config
                 decoded_entry = {}
                 for key, value in entry.items():
                     key = key.decode()
-                    decoded_entry[key] = value.decode() if not (key in ['data', 'depotFile'] or isinstance(value, str)) else value
+                    if key == 'data':
+                        pass
+                    elif key == 'depotFile':
+                        try:
+                            value = value.decode('ascii')
+                        except UnicodeDecodeError:
+                            encoding = 'utf-8'
+                            if gitConfig('git-p4.pathEncoding'):
+                                encoding = gitConfig('git-p4.pathEncoding')
+                            value = value.decode(encoding, 'replace')
+                            if verbose:
+                                print('Path with non-ASCII characters detected. Used %s to decode: %s ' % (encoding, value))
+                    elif not isinstance(value, str):
+                        value = value.decode()
+
+                    decoded_entry[key] = value
                 entry = decoded_entry
             if skip_info:
                 if 'code' in entry and entry['code'] == 'info':
@@ -2758,24 +2774,11 @@ def writeToGitStream(self, gitMode, relPath, contents):
             self.gitStream.write(d)
         self.gitStream.write('\n')
 
-    def encodeWithUTF8(self, path):
-        try:
-            path.decode('ascii')
-        except:
-            encoding = 'utf8'
-            if gitConfig('git-p4.pathEncoding'):
-                encoding = gitConfig('git-p4.pathEncoding')
-            path = path.decode(encoding, 'replace').encode('utf8', 'replace')
-            if self.verbose:
-                print('Path with non-ASCII characters detected. Used %s to encode: %s ' % (encoding, path))
-        return path
-
     # output one file from the P4 stream
     # - helper for streamP4Files
 
     def streamOneP4File(self, file, contents):
         relPath = self.stripRepoPath(file['depotFile'], self.branchPrefixes)
-        relPath = self.encodeWithUTF8(relPath)
         if verbose:
             if 'fileSize' in self.stream_file:
                 size = int(self.stream_file['fileSize'])
@@ -2858,7 +2861,6 @@ def streamOneP4File(self, file, contents):
 
     def streamOneP4Deletion(self, file):
         relPath = self.stripRepoPath(file['path'], self.branchPrefixes)
-        relPath = self.encodeWithUTF8(relPath)
         if verbose:
             sys.stdout.write("delete %s\n" % relPath)
             sys.stdout.flush()
-- 
2.24.0.windows.2
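
Dropping `encodeWithUTF8` from `streamOneP4File` and `streamOneP4Deletion`
should not change the bytes that reach git, provided the str path is later
encoded as UTF-8 when it is written to the fast-import stream. That is an
assumption here (it would be the job of patch 2/4). The sketch below ports the
removed helper to a free function over bytes, with the config lookup replaced
by a parameter and the verbose output dropped, and compares it with the
decode-early rule followed by a UTF-8 encode.

```python
def old_encode_with_utf8(path, path_encoding=None):
    """Port of the removed encodeWithUTF8: bytes in, bytes out."""
    try:
        path.decode("ascii")
    except UnicodeDecodeError:
        encoding = path_encoding or "utf8"
        path = path.decode(encoding, "replace").encode("utf8", "replace")
    return path


def new_decode(path, path_encoding=None):
    """The patch's depotFile rule: bytes in, str out."""
    try:
        return path.decode("ascii")
    except UnicodeDecodeError:
        encoding = path_encoding or "utf-8"
        return path.decode(encoding, "replace")


samples = [
    (b"//depot/readme.txt", None),
    ("//depot/caf\u00e9.txt".encode("utf-8"), None),
    ("//depot/caf\u00e9.txt".encode("cp1252"), "cp1252"),
    (b"//depot/\xe9.txt", None),  # invalid UTF-8
]

for raw, enc in samples:
    # Assumed: the str path is written to git encoded as UTF-8.
    same = old_encode_with_utf8(raw, enc) == new_decode(raw, enc).encode("utf-8")
    print(same)  # True for each sample
```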


Thread overview: 7+ messages
2019-11-28  1:28 [RFC PATCH 0/4] git-p4: python 3 compatability Yang Zhao
2019-11-28  1:28 ` [RFC PATCH 1/4] git-p4: decode response from p4 to str for python3 Yang Zhao
2019-11-28  1:28 ` [RFC PATCH 2/4] git-p4: properly encode/decode communication with git for python 3 Yang Zhao
2019-11-28  1:28 ` [RFC PATCH 3/4] git-p4: open .gitp4-usercache.txt in text mode Yang Zhao
2019-11-28  1:28 ` Yang Zhao [this message]
2019-11-28  2:57   ` [RFC PATCH 4/4] git-p4: use utf-8 encoding for file paths throughout Elijah Newren
2019-11-28 12:54 ` [RFC PATCH 0/4] git-p4: python 3 compatability Johannes Schindelin
