git@vger.kernel.org list mirror (unofficial, one of many)
 help / color / Atom feed
* [PATCH 0/1] git-p4.py: Cast byte strings to unicode strings in python3
@ 2019-11-13 21:07 Ben Keene via GitGitGadget
  2019-11-13 21:07 ` [PATCH 1/1] " Ben Keene via GitGitGadget
                   ` (2 more replies)
  0 siblings, 3 replies; 64+ messages in thread
From: Ben Keene via GitGitGadget @ 2019-11-13 21:07 UTC (permalink / raw)
  To: git; +Cc: Ben Keene, Junio C Hamano

commit: git-p4.py: Cast byte strings to unicode strings in python3

I tried to run git-p4 under python3 and it failed with an error that it
could not connect to the P4 server. This is caused by the return values from
the process.popen returning byte strings and the code is failing when it is
comparing these with literal strings which are Unicode in Python 3.

To support this, I added a new function ustring() in the code that
determines if python is natively supporting Unicode (Python 3) or not
(Python 2). 

 * If the python version supports Unicode (Python 3), it will cast the text
   (expected a byte string) to UTF-8. This allows the existing code to match
   literal strings as expected.
   
   
 * If the python version does not natively support Unicode (Python 2) the
   ustring() function does not change the byte string, maintaining current
   behavior.
   
   

There are a few notable methods changed:

 * pipe functions have their output passed through the ustring() function:
   
    * read_pipe_full(c)
    * p4_has_move_command()
   
   
 * p4CmdList has new conditional code to parse the dictionary marshaled from
   the process call. Both the keys and values are converted to Unicode.
   
   
 * gitConfig passes the return value through ustring() so all calls to
   gitConfig return unicode values.
   
   

Signed-off-by: Ben Keene seraphire@gmail.com [seraphire@gmail.com]

Ben Keene (1):
  Cast byte strings to unicode strings in python3

 git-p4.py | 26 ++++++++++++++++++++++++--
 1 file changed, 24 insertions(+), 2 deletions(-)


base-commit: d9f6f3b6195a0ca35642561e530798ad1469bd41
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-463%2Fseraphire%2Fseraphire%2Fp4-python3-unicode-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-463/seraphire/seraphire/p4-python3-unicode-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/463
-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 64+ messages in thread

* [PATCH 1/1] Cast byte strings to unicode strings in python3
  2019-11-13 21:07 [PATCH 0/1] git-p4.py: Cast byte strings to unicode strings in python3 Ben Keene via GitGitGadget
@ 2019-11-13 21:07 ` " Ben Keene via GitGitGadget
  2019-11-14  2:25 ` [PATCH 0/1] git-p4.py: " Junio C Hamano
  2019-11-15 14:39 ` [PATCH v2 0/3] " Ben Keene via GitGitGadget
  2 siblings, 0 replies; 64+ messages in thread
From: Ben Keene via GitGitGadget @ 2019-11-13 21:07 UTC (permalink / raw)
  To: git; +Cc: Ben Keene, Junio C Hamano, Ben Keene

From: Ben Keene <bkeene@partswatch.com>

Signed-off-by: Ben Keene <seraphire@gmail.com>
---
 git-p4.py | 26 ++++++++++++++++++++++++--
 1 file changed, 24 insertions(+), 2 deletions(-)

diff --git a/git-p4.py b/git-p4.py
index 60c73b6a37..6e8b3a26cd 100755
--- a/git-p4.py
+++ b/git-p4.py
@@ -36,12 +36,22 @@
     unicode = str
     bytes = bytes
     basestring = (str,bytes)
+    isunicode = True
+    def ustring(text):
+        """Returns the byte string as a unicode string"""
+        if text == '' or text == b'':
+            return ''
+        return unicode(text, "utf-8")
 else:
     # 'unicode' exists, must be Python 2
     str = str
     unicode = unicode
     bytes = str
     basestring = basestring
+    isunicode = False
+    def ustring(text):
+        """Returns the byte string unchanged"""
+        return text
 
 try:
     from subprocess import CalledProcessError
@@ -196,6 +206,8 @@ def read_pipe_full(c):
     expand = isinstance(c,basestring)
     p = subprocess.Popen(c, stdout=subprocess.PIPE, stderr=subprocess.PIPE, shell=expand)
     (out, err) = p.communicate()
+    out = ustring(out)
+    err = ustring(err)
     return (p.returncode, out, err)
 
 def read_pipe(c, ignore_error=False):
@@ -263,6 +275,7 @@ def p4_has_move_command():
     cmd = p4_build_cmd(["move", "-k", "@from", "@to"])
     p = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
     (out, err) = p.communicate()
+    err = ustring(err)
     # return code will be 1 in either case
     if err.find("Invalid option") >= 0:
         return False
@@ -646,10 +659,18 @@ def p4CmdList(cmd, stdin=None, stdin_mode='w+b', cb=None, skip_info=False,
             if skip_info:
                 if 'code' in entry and entry['code'] == 'info':
                     continue
+                if b'code' in entry and entry[b'code'] == b'info':
+                    continue
             if cb is not None:
                 cb(entry)
             else:
-                result.append(entry)
+                if isunicode:
+                    out = {}
+                    for key, value in entry.items():
+                        out[ustring(key)] = ustring(value)
+                    result.append(out)
+                else:
+                    result.append(entry)
     except EOFError:
         pass
     exitCode = p4.wait()
@@ -792,7 +813,7 @@ def gitConfig(key, typeSpecifier=None):
         cmd += [ key ]
         s = read_pipe(cmd, ignore_error=True)
         _gitConfig[key] = s.strip()
-    return _gitConfig[key]
+    return ustring(_gitConfig[key])
 
 def gitConfigBool(key):
     """Return a bool, using git config --bool.  It is True only if the
@@ -860,6 +881,7 @@ def branch_exists(branch):
     cmd = [ "git", "rev-parse", "--symbolic", "--verify", branch ]
     p = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
     out, _ = p.communicate()
+    out = ustring(out)
     if p.returncode:
         return False
     # expect exactly one line of output: the branch name
-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH 0/1] git-p4.py: Cast byte strings to unicode strings in python3
  2019-11-13 21:07 [PATCH 0/1] git-p4.py: Cast byte strings to unicode strings in python3 Ben Keene via GitGitGadget
  2019-11-13 21:07 ` [PATCH 1/1] " Ben Keene via GitGitGadget
@ 2019-11-14  2:25 ` " Junio C Hamano
  2019-11-14  9:46   ` Luke Diamand
  2019-11-15 14:39 ` [PATCH v2 0/3] " Ben Keene via GitGitGadget
  2 siblings, 1 reply; 64+ messages in thread
From: Junio C Hamano @ 2019-11-14  2:25 UTC (permalink / raw)
  To: Luke Diamand; +Cc: git, Ben Keene, Ben Keene via GitGitGadget

"Ben Keene via GitGitGadget" <gitgitgadget@gmail.com> writes:

> commit: git-p4.py: Cast byte strings to unicode strings in python3

Luke, this patch [*1*] came in my way, but I am hardly an expert on
Py2to3 and know nothing about P4.  Could you take a look at them,
please?

Thanks.


[References]

<0bca930ff82623bbef172b4cb6c36ef8e5c46098.1573679258.git.gitgitgadget@gmail.com>

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH 0/1] git-p4.py: Cast byte strings to unicode strings in python3
  2019-11-14  2:25 ` [PATCH 0/1] git-p4.py: " Junio C Hamano
@ 2019-11-14  9:46   ` Luke Diamand
  0 siblings, 0 replies; 64+ messages in thread
From: Luke Diamand @ 2019-11-14  9:46 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Git Users, Ben Keene, Ben Keene via GitGitGadget

On Thu, 14 Nov 2019 at 02:25, Junio C Hamano <gitster@pobox.com> wrote:
>
> "Ben Keene via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
> > commit: git-p4.py: Cast byte strings to unicode strings in python3
>
> Luke, this patch [*1*] came in my way, but I am hardly an expert on
> Py2to3 and know nothing about P4.  Could you take a look at them,
> please?
>
> Thanks.
>
>
> [References]
>
> <0bca930ff82623bbef172b4cb6c36ef8e5c46098.1573679258.git.gitgitgadget@gmail.com>

I just quickly tried it, and with git-p4 switched to using python3,
the unit tests fail.

$ make -C t T=t98*

But it looks like a reasonable approach, and with the demise of
Python2 fast approaching it would be good to get this fully working!

Luke

^ permalink raw reply	[flat|nested] 64+ messages in thread

* [PATCH v2 0/3] git-p4.py: Cast byte strings to unicode strings in python3
  2019-11-13 21:07 [PATCH 0/1] git-p4.py: Cast byte strings to unicode strings in python3 Ben Keene via GitGitGadget
  2019-11-13 21:07 ` [PATCH 1/1] " Ben Keene via GitGitGadget
  2019-11-14  2:25 ` [PATCH 0/1] git-p4.py: " Junio C Hamano
@ 2019-11-15 14:39 ` " Ben Keene via GitGitGadget
  2019-11-15 14:39   ` [PATCH v2 1/3] " Ben Keene via GitGitGadget
                     ` (3 more replies)
  2 siblings, 4 replies; 64+ messages in thread
From: Ben Keene via GitGitGadget @ 2019-11-15 14:39 UTC (permalink / raw)
  To: git; +Cc: Ben Keene, Junio C Hamano

git-p4.py: Cast byte strings to unicode strings in python3

I tried to run git-p4 under python3 and it failed with an error that it
could not connect to the P4 server. This PR covers updating the git-p4.py
python script to work with unicode strings in python3.

Changes since v1: Commit: (0435d0e) 2019-11-14

The problem was caused by the ustring() function being called on a string
that had already been cast as a unicode string. This second call to
ustring() would fail with an error of "decoding str is not supported"

The following changes were made to fix this:

The call to ustring() in the gitConfig() function is actually unnecessary
because the read_pipe() function returns unicode strings so the call has
been removed.

The ustring() function was given a new conditional test to see if the value
is already a unicode value. If it is, the value will be returned without any
casting.

These two changes should fix the immediate fail. However, I do not have an
environment that I can run the test suite against so I don't know if another
error will be uncovered yet. I'm still working on it.

v1: (Initial Commit)

This is caused by the return values from the process.popen returning byte
strings and the code is failing when it is comparing these with literal
strings which are Unicode in Python 3.

To support this, I added a new function ustring() in the code that
determines if python is natively supporting Unicode (Python 3) or not
(Python 2). 

 * If the python version supports Unicode (Python 3), it will cast the text
   (expected a byte string) to UTF-8. This allows the existing code to match
   literal strings as expected.
   
   
 * If the python version does not natively support Unicode (Python 2) the
   ustring() function does not change the byte string, maintaining current
   behavior.
   
   

There are a few notable methods changed:

 * pipe functions have their output passed through the ustring() function:
   
    * read_pipe_full(c)
    * p4_has_move_command()
   
   
 * p4CmdList has new conditional code to parse the dictionary marshaled from
   the process call. Both the keys and values are converted to Unicode.
   
   
 * gitConfig passes the return value through ustring() so all calls to
   gitConfig return unicode values.
   
   

Signed-off-by: Ben Keene seraphire@gmail.com [seraphire@gmail.com]

Ben Keene (3):
  Cast byte strings to unicode strings in python3
  FIX: cast as unicode fails when a value is already unicode
  FIX: wrap return for read_pipe_lines in ustring() and wrap GitLFS read
    of the pointer file in ustring()

 git-p4.py | 38 ++++++++++++++++++++++++++++++++++++--
 1 file changed, 36 insertions(+), 2 deletions(-)


base-commit: d9f6f3b6195a0ca35642561e530798ad1469bd41
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-463%2Fseraphire%2Fseraphire%2Fp4-python3-unicode-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-463/seraphire/seraphire/p4-python3-unicode-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/463

Range-diff vs v1:

 1:  0bca930ff8 = 1:  0bca930ff8 Cast byte strings to unicode strings in python3
 -:  ---------- > 2:  0435d0e2cb FIX: cast as unicode fails when a value is already unicode
 -:  ---------- > 3:  2288690b94 FIX: wrap return for read_pipe_lines in ustring() and wrap GitLFS read of the pointer file in ustring()

-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 64+ messages in thread

* [PATCH v2 1/3] Cast byte strings to unicode strings in python3
  2019-11-15 14:39 ` [PATCH v2 0/3] " Ben Keene via GitGitGadget
@ 2019-11-15 14:39   ` " Ben Keene via GitGitGadget
  2019-11-15 14:39   ` [PATCH v2 2/3] FIX: cast as unicode fails when a value is already unicode Ben Keene via GitGitGadget
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 64+ messages in thread
From: Ben Keene via GitGitGadget @ 2019-11-15 14:39 UTC (permalink / raw)
  To: git; +Cc: Ben Keene, Junio C Hamano, Ben Keene

From: Ben Keene <bkeene@partswatch.com>

Signed-off-by: Ben Keene <seraphire@gmail.com>
---
 git-p4.py | 26 ++++++++++++++++++++++++--
 1 file changed, 24 insertions(+), 2 deletions(-)

diff --git a/git-p4.py b/git-p4.py
index 60c73b6a37..6e8b3a26cd 100755
--- a/git-p4.py
+++ b/git-p4.py
@@ -36,12 +36,22 @@
     unicode = str
     bytes = bytes
     basestring = (str,bytes)
+    isunicode = True
+    def ustring(text):
+        """Returns the byte string as a unicode string"""
+        if text == '' or text == b'':
+            return ''
+        return unicode(text, "utf-8")
 else:
     # 'unicode' exists, must be Python 2
     str = str
     unicode = unicode
     bytes = str
     basestring = basestring
+    isunicode = False
+    def ustring(text):
+        """Returns the byte string unchanged"""
+        return text
 
 try:
     from subprocess import CalledProcessError
@@ -196,6 +206,8 @@ def read_pipe_full(c):
     expand = isinstance(c,basestring)
     p = subprocess.Popen(c, stdout=subprocess.PIPE, stderr=subprocess.PIPE, shell=expand)
     (out, err) = p.communicate()
+    out = ustring(out)
+    err = ustring(err)
     return (p.returncode, out, err)
 
 def read_pipe(c, ignore_error=False):
@@ -263,6 +275,7 @@ def p4_has_move_command():
     cmd = p4_build_cmd(["move", "-k", "@from", "@to"])
     p = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
     (out, err) = p.communicate()
+    err = ustring(err)
     # return code will be 1 in either case
     if err.find("Invalid option") >= 0:
         return False
@@ -646,10 +659,18 @@ def p4CmdList(cmd, stdin=None, stdin_mode='w+b', cb=None, skip_info=False,
             if skip_info:
                 if 'code' in entry and entry['code'] == 'info':
                     continue
+                if b'code' in entry and entry[b'code'] == b'info':
+                    continue
             if cb is not None:
                 cb(entry)
             else:
-                result.append(entry)
+                if isunicode:
+                    out = {}
+                    for key, value in entry.items():
+                        out[ustring(key)] = ustring(value)
+                    result.append(out)
+                else:
+                    result.append(entry)
     except EOFError:
         pass
     exitCode = p4.wait()
@@ -792,7 +813,7 @@ def gitConfig(key, typeSpecifier=None):
         cmd += [ key ]
         s = read_pipe(cmd, ignore_error=True)
         _gitConfig[key] = s.strip()
-    return _gitConfig[key]
+    return ustring(_gitConfig[key])
 
 def gitConfigBool(key):
     """Return a bool, using git config --bool.  It is True only if the
@@ -860,6 +881,7 @@ def branch_exists(branch):
     cmd = [ "git", "rev-parse", "--symbolic", "--verify", branch ]
     p = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
     out, _ = p.communicate()
+    out = ustring(out)
     if p.returncode:
         return False
     # expect exactly one line of output: the branch name
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 64+ messages in thread

* [PATCH v2 2/3] FIX: cast as unicode fails when a value is already unicode
  2019-11-15 14:39 ` [PATCH v2 0/3] " Ben Keene via GitGitGadget
  2019-11-15 14:39   ` [PATCH v2 1/3] " Ben Keene via GitGitGadget
@ 2019-11-15 14:39   ` Ben Keene via GitGitGadget
  2019-11-15 14:39   ` [PATCH v2 3/3] FIX: wrap return for read_pipe_lines in ustring() and wrap GitLFS read of the pointer file in ustring() Ben Keene via GitGitGadget
  2019-12-02 19:02   ` [PATCH v3 0/1] git-p4.py: Cast byte strings to unicode strings in python3 Ben Keene via GitGitGadget
  3 siblings, 0 replies; 64+ messages in thread
From: Ben Keene via GitGitGadget @ 2019-11-15 14:39 UTC (permalink / raw)
  To: git; +Cc: Ben Keene, Junio C Hamano, Ben Keene

From: Ben Keene <seraphire@gmail.com>

Signed-off-by: Ben Keene <seraphire@gmail.com>
---
 git-p4.py | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/git-p4.py b/git-p4.py
index 6e8b3a26cd..b088095b15 100755
--- a/git-p4.py
+++ b/git-p4.py
@@ -39,6 +39,8 @@
     isunicode = True
     def ustring(text):
         """Returns the byte string as a unicode string"""
+        if isinstance(text, unicode):
+            return text
         if text == '' or text == b'':
             return ''
         return unicode(text, "utf-8")
@@ -813,7 +815,7 @@ def gitConfig(key, typeSpecifier=None):
         cmd += [ key ]
         s = read_pipe(cmd, ignore_error=True)
         _gitConfig[key] = s.strip()
-    return ustring(_gitConfig[key])
+    return _gitConfig[key]
 
 def gitConfigBool(key):
     """Return a bool, using git config --bool.  It is True only if the
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 64+ messages in thread

* [PATCH v2 3/3] FIX: wrap return for read_pipe_lines in ustring() and wrap GitLFS read of the pointer file in ustring()
  2019-11-15 14:39 ` [PATCH v2 0/3] " Ben Keene via GitGitGadget
  2019-11-15 14:39   ` [PATCH v2 1/3] " Ben Keene via GitGitGadget
  2019-11-15 14:39   ` [PATCH v2 2/3] FIX: cast as unicode fails when a value is already unicode Ben Keene via GitGitGadget
@ 2019-11-15 14:39   ` Ben Keene via GitGitGadget
  2019-12-02 19:02   ` [PATCH v3 0/1] git-p4.py: Cast byte strings to unicode strings in python3 Ben Keene via GitGitGadget
  3 siblings, 0 replies; 64+ messages in thread
From: Ben Keene via GitGitGadget @ 2019-11-15 14:39 UTC (permalink / raw)
  To: git; +Cc: Ben Keene, Junio C Hamano, Ben Keene

From: Ben Keene <seraphire@gmail.com>

Signed-off-by: Ben Keene <seraphire@gmail.com>
---
 git-p4.py | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/git-p4.py b/git-p4.py
index b088095b15..83f59ddca5 100755
--- a/git-p4.py
+++ b/git-p4.py
@@ -180,6 +180,11 @@ def die(msg):
         sys.exit(1)
 
 def write_pipe(c, stdin):
+    """Writes stdin to the command's stdin
+    Returns the number of bytes written.
+
+    Be aware - the byte count may change between 
+    Python2 and Python3"""
     if verbose:
         sys.stderr.write('Writing pipe: %s\n' % str(c))
 
@@ -249,6 +254,11 @@ def read_pipe_lines(c):
     val = pipe.readlines()
     if pipe.close() or p.wait():
         die('Command failed: %s' % str(c))
+    # Unicode conversion from str
+    # Iterate and fix in-place to avoid a second list in memory.
+    if isunicode:
+        for i in range(len(val)):
+            val[i] = ustring(val[i])
 
     return val
 
@@ -1268,7 +1278,7 @@ def generatePointer(self, contentFile):
             ['git', 'lfs', 'pointer', '--file=' + contentFile],
             stdout=subprocess.PIPE
         )
-        pointerFile = pointerProcess.stdout.read()
+        pointerFile = ustring(pointerProcess.stdout.read())
         if pointerProcess.wait():
             os.remove(contentFile)
             die('git-lfs pointer command failed. Did you install the extension?')
-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 64+ messages in thread

* [PATCH v3 0/1] git-p4.py: Cast byte strings to unicode strings in python3
  2019-11-15 14:39 ` [PATCH v2 0/3] " Ben Keene via GitGitGadget
                     ` (2 preceding siblings ...)
  2019-11-15 14:39   ` [PATCH v2 3/3] FIX: wrap return for read_pipe_lines in ustring() and wrap GitLFS read of the pointer file in ustring() Ben Keene via GitGitGadget
@ 2019-12-02 19:02   ` Ben Keene via GitGitGadget
  2019-12-02 19:02     ` [PATCH v3 1/1] Python3 support for t9800 tests. Basic P4/Python3 support Ben Keene via GitGitGadget
  2019-12-04 22:29     ` [PATCH v4 00/11] git-p4.py: Cast byte strings to unicode strings in python3 Ben Keene via GitGitGadget
  3 siblings, 2 replies; 64+ messages in thread
From: Ben Keene via GitGitGadget @ 2019-12-02 19:02 UTC (permalink / raw)
  To: git; +Cc: Ben Keene, Junio C Hamano

Issue: The current git-p4.py script does not work with python3.

I have attempted to use the P4 integration built into GIT and I was unable
to get the program to run because I have Python 3.8 installed on my
computer. I was able to get the program to run when I downgraded my python
to version 2.7. However, python 2 is reaching its end of life.

Submission: I am submitting a patch for the git-p4.py script that partially
supports python 3.8. This code was able to pass the basic tests (t9800) when
run against Python3. This provides basic functionality. 

In an attempt to pass the t9822 P4 path-encoding test, a new parameter for
git P4 Clone was introduced. 

--encoding Format-identifier

This will create the GIT repository following the current functionality;
however, before importing the files from P4, it will set the
git-p4.pathEncoding option so any files or paths that are encoded with
non-ASCII/non-UTF-8 formats will import correctly.

Technical details: The script was updated by futurize (
https://python-future.org/futurize.html) to support Py2/Py3 syntax. The few
references to classes in future were reworked so that future would not be
required. The existing code test for Unicode support was extended to
normalize the classes “unicode” and “bytes” to across platforms:

 * ‘unicode’ is an alias for ‘str’ in Py3 and is the unicode class in Py2.
 * ‘bytes’ is bytes in Py3 and an alias for ‘str’ in Py2.

New coercion methods were written for both Python2 and Python3:

 * as_string(text) – In Python3, this encodes a bytes object as a UTF-8
   encoded Unicode string. 
 * as_bytes(text) – In Python3, this decodes a Unicode string to an array of
   bytes.

In Python2, these functions do not change the data since a ‘str’ object
function in both roles as strings and byte arrays. This reduces the
potential impact on backward compatibility with Python 2.

 * to_unicode(text) – ensures that the supplied data is encoded as a UTF-8
   string. This function will encode data in both Python2 and Python3. * 
      path_as_string(path) – This function is an extension function that
      honors the option “git-p4.pathEncoding” to convert a set of bytes or
      characters to UTF-8. If the str/bytes cannot decode as ASCII, it will
      use the encodeWithUTF8() method to convert the custom encoded bytes to
      Unicode in UTF-8.
   
   

Generally speaking, information in the script is converted to Unicode as
early as possible and converted back to a byte array just before passing to
external programs or files. The exception to this rule is P4 Repository file
paths.

Paths are not converted but left as “bytes” so the original file path
encoding can be preserved. This formatting is required for commands that
interact with the P4 file path. When the file path is used by GIT, it is
converted with encodeWithUTF8().

Signed-off-by: Ben Keene seraphire@gmail.com [seraphire@gmail.com]

Ben Keene (1):
  Python3 support for t9800 tests. Basic P4/Python3 support

 git-p4.py | 825 +++++++++++++++++++++++++++++++++++++++++-------------
 1 file changed, 628 insertions(+), 197 deletions(-)


base-commit: d9f6f3b6195a0ca35642561e530798ad1469bd41
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-463%2Fseraphire%2Fseraphire%2Fp4-python3-unicode-v3
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-463/seraphire/seraphire/p4-python3-unicode-v3
Pull-Request: https://github.com/gitgitgadget/git/pull/463

Range-diff vs v2:

 1:  0bca930ff8 < -:  ---------- Cast byte strings to unicode strings in python3
 2:  0435d0e2cb < -:  ---------- FIX: cast as unicode fails when a value is already unicode
 3:  2288690b94 < -:  ---------- FIX: wrap return for read_pipe_lines in ustring() and wrap GitLFS read of the pointer file in ustring()
 -:  ---------- > 1:  02b3843e9f Python3 support for t9800 tests. Basic P4/Python3 support

-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 64+ messages in thread

* [PATCH v3 1/1] Python3 support for t9800 tests. Basic P4/Python3 support
  2019-12-02 19:02   ` [PATCH v3 0/1] git-p4.py: Cast byte strings to unicode strings in python3 Ben Keene via GitGitGadget
@ 2019-12-02 19:02     ` Ben Keene via GitGitGadget
  2019-12-03  0:18       ` Denton Liu
  2019-12-04 22:29     ` [PATCH v4 00/11] git-p4.py: Cast byte strings to unicode strings in python3 Ben Keene via GitGitGadget
  1 sibling, 1 reply; 64+ messages in thread
From: Ben Keene via GitGitGadget @ 2019-12-02 19:02 UTC (permalink / raw)
  To: git; +Cc: Ben Keene, Junio C Hamano, Ben Keene

From: Ben Keene <seraphire@gmail.com>

Signed-off-by: Ben Keene <seraphire@gmail.com>
---
 git-p4.py | 825 +++++++++++++++++++++++++++++++++++++++++-------------
 1 file changed, 628 insertions(+), 197 deletions(-)

diff --git a/git-p4.py b/git-p4.py
index 60c73b6a37..6f82184fe5 100755
--- a/git-p4.py
+++ b/git-p4.py
@@ -26,22 +26,87 @@
 import zlib
 import ctypes
 import errno
+import os.path
+import codecs
+import io
 
 # support basestring in python3
 try:
     unicode = unicode
 except NameError:
     # 'unicode' is undefined, must be Python 3
-    str = str
+    #
+    # For Python3 which is natively unicode, we will use 
+    # unicode for internal information but all P4 Data
+    # will remain in bytes
+    isunicode = True
     unicode = str
     bytes = bytes
-    basestring = (str,bytes)
+
+    def as_string(text):
+        """Return a byte array as a unicode string"""
+        if text == None:
+            return None
+        if isinstance(text, bytes):
+            return unicode(text, "utf-8")
+        else:
+            return text
+
+    def as_bytes(text):
+        """Return a Unicode string as a byte array"""
+        if text == None:
+            return None
+        if isinstance(text, bytes):
+            return text
+        else:
+            return bytes(text, "utf-8")
+
+    def to_unicode(text):
+        """Return a byte array as a unicode string"""
+        return as_string(text)    
+
+    def path_as_string(path):
+        """ Converts a path to the UTF8 encoded string """
+        if isinstance(path, unicode):
+            return path
+        return encodeWithUTF8(path).decode('utf-8')
+    
 else:
     # 'unicode' exists, must be Python 2
-    str = str
+    #
+    # We will treat the data as:
+    #   str   -> str
+    #   bytes -> str
+    # So for Python2 these functions are no-ops
+    # and will leave the data in the ambiguious
+    # string/bytes state
+    isunicode = False
     unicode = unicode
     bytes = str
-    basestring = basestring
+
+    def as_string(text):
+        """ Return text unaltered (for Python3 support) """
+        return text
+
+    def as_bytes(text):
+        """ Return text unaltered (for Python3 support) """
+        return text
+
+    def to_unicode(text):
+        """Return a string as a unicode string"""
+        return text.decode('utf-8')
+    
+    def path_as_string(path):
+        """ Converts a path to the UTF8 encoded bytes """
+        return encodeWithUTF8(path)
+
+
+ 
+# Check for raw_input support
+try:
+    raw_input
+except NameError:
+    raw_input = input
 
 try:
     from subprocess import CalledProcessError
@@ -75,7 +140,11 @@ def p4_build_cmd(cmd):
     location. It means that hooking into the environment, or other configuration
     can be done more easily.
     """
-    real_cmd = ["p4"]
+    # Look for the P4 binary
+    if (platform.system() == "Windows"):
+        real_cmd = ["p4.exe"]    
+    else:
+        real_cmd = ["p4"]
 
     user = gitConfig("git-p4.user")
     if len(user) > 0:
@@ -105,7 +174,7 @@ def p4_build_cmd(cmd):
         # Provide a way to not pass this option by setting git-p4.retries to 0
         real_cmd += ["-r", str(retries)]
 
-    if isinstance(cmd,basestring):
+    if not isinstance(cmd, list):
         real_cmd = ' '.join(real_cmd) + ' ' + cmd
     else:
         real_cmd += cmd
@@ -168,10 +237,11 @@ def die(msg):
         sys.exit(1)
 
 def write_pipe(c, stdin):
+    """Executes the command 'c', passing 'stdin' on the standard input"""
     if verbose:
         sys.stderr.write('Writing pipe: %s\n' % str(c))
 
-    expand = isinstance(c,basestring)
+    expand = not isinstance(c, list)
     p = subprocess.Popen(c, stdin=subprocess.PIPE, shell=expand)
     pipe = p.stdin
     val = pipe.write(stdin)
@@ -179,11 +249,11 @@ def write_pipe(c, stdin):
     if p.wait():
         die('Command failed: %s' % str(c))
 
-    return val
 
 def p4_write_pipe(c, stdin):
+    """ Runs a P4 command 'c', passing 'stdin' data to P4"""
     real_cmd = p4_build_cmd(c)
-    return write_pipe(real_cmd, stdin)
+    write_pipe(real_cmd, stdin)
 
 def read_pipe_full(c):
     """ Read output from  command. Returns a tuple
@@ -193,9 +263,11 @@ def read_pipe_full(c):
     if verbose:
         sys.stderr.write('Reading pipe: %s\n' % str(c))
 
-    expand = isinstance(c,basestring)
+    expand = not isinstance(c, list)
     p = subprocess.Popen(c, stdout=subprocess.PIPE, stderr=subprocess.PIPE, shell=expand)
     (out, err) = p.communicate()
+    out = as_string(out)
+    err = as_string(err)
     return (p.returncode, out, err)
 
 def read_pipe(c, ignore_error=False):
@@ -222,19 +294,31 @@ def read_pipe_text(c):
         return out.rstrip()
 
 def p4_read_pipe(c, ignore_error=False):
+    """ Read output from the P4 command 'c'. Returns the output text on
+        success. On failure, terminates execution, unless
+        ignore_error is True, when it returns an empty string.
+    """
     real_cmd = p4_build_cmd(c)
     return read_pipe(real_cmd, ignore_error)
 
 def read_pipe_lines(c):
+    """ Returns a list of text from executing the command 'c'.
+        The program will die if the command fails to execute.
+    """
     if verbose:
         sys.stderr.write('Reading pipe: %s\n' % str(c))
 
-    expand = isinstance(c, basestring)
+    expand = not isinstance(c, list)
     p = subprocess.Popen(c, stdout=subprocess.PIPE, shell=expand)
     pipe = p.stdout
     val = pipe.readlines()
     if pipe.close() or p.wait():
         die('Command failed: %s' % str(c))
+    # Unicode conversion from byte-string
+    # Iterate and fix in-place to avoid a second list in memory.
+    if isunicode:
+        for i in range(len(val)):
+            val[i] = as_string(val[i])
 
     return val
 
@@ -263,6 +347,8 @@ def p4_has_move_command():
     cmd = p4_build_cmd(["move", "-k", "@from", "@to"])
     p = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
     (out, err) = p.communicate()
+    out=as_string(out)
+    err=as_string(err)
     # return code will be 1 in either case
     if err.find("Invalid option") >= 0:
         return False
@@ -272,7 +358,7 @@ def p4_has_move_command():
     return True
 
 def system(cmd, ignore_error=False):
-    expand = isinstance(cmd,basestring)
+    expand = not isinstance(cmd, list)
     if verbose:
         sys.stderr.write("executing %s\n" % str(cmd))
     retcode = subprocess.call(cmd, shell=expand)
@@ -282,9 +368,10 @@ def system(cmd, ignore_error=False):
     return retcode
 
 def p4_system(cmd):
-    """Specifically invoke p4 as the system command. """
+    """ Specifically invoke p4 as the system command. 
+    """
     real_cmd = p4_build_cmd(cmd)
-    expand = isinstance(real_cmd, basestring)
+    expand = not isinstance(real_cmd, list)
     retcode = subprocess.call(real_cmd, shell=expand)
     if retcode:
         raise CalledProcessError(retcode, real_cmd)
@@ -390,16 +477,20 @@ def p4_last_change():
     return int(results[0]['change'])
 
 def p4_describe(change, shelved=False):
-    """Make sure it returns a valid result by checking for
-       the presence of field "time".  Return a dict of the
-       results."""
+    """ Returns information about the requested P4 change list.
+
+        Data returns is not string encoded (returned as bytes)
+    """
+    # Make sure it returns a valid result by checking for
+    #   the presence of field "time".  Return a dict of the
+    #   results.
 
     cmd = ["describe", "-s"]
     if shelved:
         cmd += ["-S"]
     cmd += [str(change)]
 
-    ds = p4CmdList(cmd, skip_info=True)
+    ds = p4CmdList(cmd, skip_info=True, encode_data=False)
     if len(ds) != 1:
         die("p4 describe -s %d did not return 1 result: %s" % (change, str(ds)))
 
@@ -409,21 +500,31 @@ def p4_describe(change, shelved=False):
         die("p4 describe -s %d exited with %d: %s" % (change, d["p4ExitCode"],
                                                       str(d)))
     if "code" in d:
-        if d["code"] == "error":
+        if d["code"] == b"error":
             die("p4 describe -s %d returned error code: %s" % (change, str(d)))
 
     if "time" not in d:
         die("p4 describe -s %d returned no \"time\": %s" % (change, str(d)))
 
+    # Convert depotFile(X) to be UTF-8 encoded, as this is what GIT
+    # requires. This will also allow us to encode the rest of the text
+    # at the same time to simplify textual processing later.
+    keys=d.keys()
+    for key in keys:
+        if key.startswith('depotFile'):
+            d[key]=d[key] #DepotPath(d[key])
+        elif key == 'path':
+            d[key]=d[key] #DepotPath(d[key])
+        else:
+            d[key] = as_string(d[key])
+
     return d
 
-#
-# Canonicalize the p4 type and return a tuple of the
-# base type, plus any modifiers.  See "p4 help filetypes"
-# for a list and explanation.
-#
 def split_p4_type(p4type):
-
+    """ Canonicalize the p4 type and return a tuple of the
+        base type, plus any modifiers.  See "p4 help filetypes"
+        for a list and explanation.
+    """
     p4_filetypes_historical = {
         "ctempobj": "binary+Sw",
         "ctext": "text+C",
@@ -452,18 +553,16 @@ def split_p4_type(p4type):
         mods = s[1]
     return (base, mods)
 
-#
-# return the raw p4 type of a file (text, text+ko, etc)
-#
 def p4_type(f):
+    """ return the raw p4 type of a file (text, text+ko, etc)
+    """
     results = p4CmdList(["fstat", "-T", "headType", wildcard_encode(f)])
     return results[0]['headType']
 
-#
-# Given a type base and modifier, return a regexp matching
-# the keywords that can be expanded in the file
-#
 def p4_keywords_regexp_for_type(base, type_mods):
+    """ Given a type base and modifier, return a regexp matching
+        the keywords that can be expanded in the file
+    """
     if base in ("text", "unicode", "binary"):
         kwords = None
         if "ko" in type_mods:
@@ -482,12 +581,11 @@ def p4_keywords_regexp_for_type(base, type_mods):
     else:
         return None
 
-#
-# Given a file, return a regexp matching the possible
-# RCS keywords that will be expanded, or None for files
-# with kw expansion turned off.
-#
 def p4_keywords_regexp_for_file(file):
+    """ Given a file, return a regexp matching the possible
+        RCS keywords that will be expanded, or None for files
+        with kw expansion turned off.
+    """
     if not os.path.exists(file):
         return None
     else:
@@ -522,7 +620,7 @@ def getP4OpenedType(file):
 # Return the set of all p4 labels
 def getP4Labels(depotPaths):
     labels = set()
-    if isinstance(depotPaths,basestring):
+    if not isinstance(depotPaths, list):
         depotPaths = [depotPaths]
 
     for l in p4CmdList(["labels"] + ["%s..." % p for p in depotPaths]):
@@ -531,8 +629,8 @@ def getP4Labels(depotPaths):
 
     return labels
 
-# Return the set of all git tags
 def getGitTags():
+    """Return the set of all git tags"""
     gitTags = set()
     for line in read_pipe_lines(["git", "tag"]):
         tag = line.strip()
@@ -565,7 +663,7 @@ def parseDiffTreeEntry(entry):
 
     If the pattern is not matched, None is returned."""
 
-    match = diffTreePattern().next().match(entry)
+    match = next(diffTreePattern()).match(entry)
     if match:
         return {
             'src_mode': match.group(1),
@@ -584,6 +682,38 @@ def isModeExec(mode):
     # otherwise False.
     return mode[-3:] == "755"
 
+def encodeWithUTF8(path, verbose = False):
+    """ Ensure that the path is encoded as a UTF-8 string
+
+        Returns bytes(P3)/str(P2)
+    """
+   
+    if isunicode:
+        try:
+            if isinstance(path, unicode):
+                # It is already unicode, cast it as a bytes
+                # that is encoded as utf-8.
+                return path.encode('utf-8', 'strict')
+            path.decode('ascii', 'strict')
+        except:
+            encoding = 'utf8'
+            if gitConfig('git-p4.pathEncoding'):
+                encoding = gitConfig('git-p4.pathEncoding')
+            path = path.decode(encoding, 'replace').encode('utf8', 'replace')
+            if verbose:
+                print('\nNOTE:Path with non-ASCII characters detected. Used %s to encode: %s ' % (encoding, to_unicode(path)))
+    else:    
+        try:
+            path.decode('ascii')
+        except:
+            encoding = 'utf8'
+            if gitConfig('git-p4.pathEncoding'):
+                encoding = gitConfig('git-p4.pathEncoding')
+            path = path.decode(encoding, 'replace').encode('utf8', 'replace')
+            if verbose:
+                print('Path with non-ASCII characters detected. Used %s to encode: %s ' % (encoding, path))
+    return path
+
 class P4Exception(Exception):
     """ Base class for exceptions from the p4 client """
     def __init__(self, exit_code):
@@ -607,9 +737,25 @@ def isModeExecChanged(src_mode, dst_mode):
     return isModeExec(src_mode) != isModeExec(dst_mode)
 
 def p4CmdList(cmd, stdin=None, stdin_mode='w+b', cb=None, skip_info=False,
-        errors_as_exceptions=False):
+        errors_as_exceptions=False, encode_data=True):
+    """ Executes a P4 command:  'cmd' optionally passing 'stdin' to the command's
+        standard input via a temporary file with 'stdin_mode' mode.
+
+        Output from the command is optionally passed to the callback function 'cb'.
+        If 'cb' is None, the response from the command is parsed into a list
+        of resulting dictionaries. (For each block read from the process pipe.)
+
+        If 'skip_info' is true, information in a block read that has a code type of
+        'info' will be skipped.
 
-    if isinstance(cmd,basestring):
+        If 'errors_as_exceptions' is set to true (the default is false) the error
+        code returned from the execution will generate an exception.
+
+        If 'encode_data' is set to true (the default) the data that is returned 
+        by this function will be passed through the "as_string" function.
+    """
+
+    if not isinstance(cmd, list):
         cmd = "-G " + cmd
         expand = True
     else:
@@ -626,11 +772,11 @@ def p4CmdList(cmd, stdin=None, stdin_mode='w+b', cb=None, skip_info=False,
     stdin_file = None
     if stdin is not None:
         stdin_file = tempfile.TemporaryFile(prefix='p4-stdin', mode=stdin_mode)
-        if isinstance(stdin,basestring):
+        if not isinstance(stdin, list):
             stdin_file.write(stdin)
         else:
             for i in stdin:
-                stdin_file.write(i + '\n')
+                stdin_file.write(as_bytes(i) + b'\n')
         stdin_file.flush()
         stdin_file.seek(0)
 
@@ -644,12 +790,15 @@ def p4CmdList(cmd, stdin=None, stdin_mode='w+b', cb=None, skip_info=False,
         while True:
             entry = marshal.load(p4.stdout)
             if skip_info:
-                if 'code' in entry and entry['code'] == 'info':
+                if b'code' in entry and entry[b'code'] == b'info':
                     continue
             if cb is not None:
                 cb(entry)
             else:
-                result.append(entry)
+                out = {}
+                for key, value in entry.items():
+                    out[as_string(key)] = (as_string(value) if encode_data else value)
+                result.append(out)
     except EOFError:
         pass
     exitCode = p4.wait()
@@ -677,6 +826,7 @@ def p4CmdList(cmd, stdin=None, stdin_mode='w+b', cb=None, skip_info=False,
     return result
 
 def p4Cmd(cmd):
+    """ Executes a P4 command an returns the results in a dictionary"""
     list = p4CmdList(cmd)
     result = {}
     for entry in list:
@@ -772,6 +922,7 @@ def extractSettingsGitLog(log):
     return values
 
 def gitBranchExists(branch):
+    """Checks to see if a given branch exists in the git repo"""
     proc = subprocess.Popen(["git", "rev-parse", branch],
                             stderr=subprocess.PIPE, stdout=subprocess.PIPE);
     return proc.wait() == 0;
@@ -785,20 +936,22 @@ def gitDeleteRef(ref):
 _gitConfig = {}
 
 def gitConfig(key, typeSpecifier=None):
+    """ Return a configuration setting from GIT
+	"""
     if key not in _gitConfig:
         cmd = [ "git", "config" ]
         if typeSpecifier:
             cmd += [ typeSpecifier ]
         cmd += [ key ]
         s = read_pipe(cmd, ignore_error=True)
-        _gitConfig[key] = s.strip()
+        _gitConfig[key] = as_string(s).strip()
     return _gitConfig[key]
 
 def gitConfigBool(key):
-    """Return a bool, using git config --bool.  It is True only if the
-       variable is set to true, and False if set to false or not present
-       in the config."""
-
+    """ Return a bool, using git config --bool.  It is True only if the
+        variable is set to true, and False if set to false or not present
+        in the config.
+    """
     if key not in _gitConfig:
         _gitConfig[key] = gitConfig(key, '--bool') == "true"
     return _gitConfig[key]
@@ -822,6 +975,11 @@ def gitConfigList(key):
             _gitConfig[key] = []
     return _gitConfig[key]
 
+def gitConfigSet(key, value):
+    """ Set the git configuration key 'key' to 'value' for this session
+    """
+    _gitConfig[key] = value
+
 def p4BranchesInGit(branchesAreInRemotes=True):
     """Find all the branches whose names start with "p4/", looking
        in remotes or heads as specified by the argument.  Return
@@ -860,6 +1018,7 @@ def branch_exists(branch):
     cmd = [ "git", "rev-parse", "--symbolic", "--verify", branch ]
     p = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
     out, _ = p.communicate()
+    out = as_string(out)
     if p.returncode:
         return False
     # expect exactly one line of output: the branch name
@@ -869,7 +1028,7 @@ def findUpstreamBranchPoint(head = "HEAD"):
     branches = p4BranchesInGit()
     # map from depot-path to branch name
     branchByDepotPath = {}
-    for branch in branches.keys():
+    for branch in list(branches.keys()):
         tip = branches[branch]
         log = extractLogMessageFromGitCommit(tip)
         settings = extractSettingsGitLog(log)
@@ -940,7 +1099,8 @@ def createOrUpdateBranchesFromOrigin(localRefPrefix = "refs/remotes/p4/", silent
             system("git update-ref %s %s" % (remoteHead, originHead))
 
 def originP4BranchesExist():
-        return gitBranchExists("origin") or gitBranchExists("origin/p4") or gitBranchExists("origin/p4/master")
+    """Checks if origin/p4/master exists"""
+    return gitBranchExists("origin") or gitBranchExists("origin/p4") or gitBranchExists("origin/p4/master")
 
 
 def p4ParseNumericChangeRange(parts):
@@ -1035,7 +1195,7 @@ def p4ChangesForPaths(depotPaths, changeRange, requestedBlockSize):
     changes = sorted(changes)
     return changes
 
-def p4PathStartsWith(path, prefix):
+def p4PathStartsWith(path, prefix, verbose = False):
     # This method tries to remedy a potential mixed-case issue:
     #
     # If UserA adds  //depot/DirA/file1
@@ -1043,9 +1203,22 @@ def p4PathStartsWith(path, prefix):
     #
     # we may or may not have a problem. If you have core.ignorecase=true,
     # we treat DirA and dira as the same directory
+    
+    # Since we have to deal with mixed encodings for p4 file
+    # paths, first perform a simple startswith check, this covers
+    # the case that the formats and path are identical.
+    if as_bytes(path).startswith(as_bytes(prefix)):
+        return True
+    
+    # attempt to convert the prefix and path both to utf8
+    path_utf8 = encodeWithUTF8(path)
+    prefix_utf8 = encodeWithUTF8(prefix)
+
     if gitConfigBool("core.ignorecase"):
-        return path.lower().startswith(prefix.lower())
-    return path.startswith(prefix)
+        # Check if we match byte-per-byte.  
+        
+        return path_utf8.lower().startswith(prefix_utf8.lower())
+    return path_utf8.startswith(prefix_utf8)
 
 def getClientSpec():
     """Look at the p4 client spec, create a View() object that contains
@@ -1063,7 +1236,7 @@ def getClientSpec():
     client_name = entry["Client"]
 
     # just the keys that start with "View"
-    view_keys = [ k for k in entry.keys() if k.startswith("View") ]
+    view_keys = [ k for k in list(entry.keys()) if k.startswith("View") ]
 
     # hold this new View
     view = View(client_name)
@@ -1101,18 +1274,24 @@ def wildcard_decode(path):
     # Cannot have * in a filename in windows; untested as to
     # what p4 would do in such a case.
     if not platform.system() == "Windows":
-        path = path.replace("%2A", "*")
-    path = path.replace("%23", "#") \
-               .replace("%40", "@") \
-               .replace("%25", "%")
+        path = path.replace(b"%2A", b"*")
+    path = path.replace(b"%23", b"#") \
+               .replace(b"%40", b"@") \
+               .replace(b"%25", b"%")
     return path
 
 def wildcard_encode(path):
     # do % first to avoid double-encoding the %s introduced here
-    path = path.replace("%", "%25") \
-               .replace("*", "%2A") \
-               .replace("#", "%23") \
-               .replace("@", "%40")
+    if isinstance(path, unicode):
+        path = path.replace("%", "%25") \
+                   .replace("*", "%2A") \
+                   .replace("#", "%23") \
+                   .replace("@", "%40")
+    else:
+        path = path.replace(b"%", b"%25") \
+                   .replace(b"*", b"%2A") \
+                   .replace(b"#", b"%23") \
+                   .replace(b"@", b"%40")
     return path
 
 def wildcard_present(path):
@@ -1244,7 +1423,7 @@ def generatePointer(self, contentFile):
             ['git', 'lfs', 'pointer', '--file=' + contentFile],
             stdout=subprocess.PIPE
         )
-        pointerFile = pointerProcess.stdout.read()
+        pointerFile = as_string(pointerProcess.stdout.read())
         if pointerProcess.wait():
             os.remove(contentFile)
             die('git-lfs pointer command failed. Did you install the extension?')
@@ -1305,7 +1484,7 @@ def processContent(self, git_mode, relPath, contents):
         else:
             return LargeFileSystem.processContent(self, git_mode, relPath, contents)
 
-class Command:
+class Command(object):
     delete_actions = ( "delete", "move/delete", "purge" )
     add_actions = ( "add", "branch", "move/add" )
 
@@ -1320,7 +1499,7 @@ def ensure_value(self, attr, value):
             setattr(self, attr, value)
         return getattr(self, attr)
 
-class P4UserMap:
+class P4UserMap(object):
     def __init__(self):
         self.userMapFromPerforceServer = False
         self.myP4UserId = None
@@ -1345,10 +1524,14 @@ def p4UserIsMe(self, p4User):
             return True
 
     def getUserCacheFilename(self):
+        """ Returns the filename of the username cache """
         home = os.environ.get("HOME", os.environ.get("USERPROFILE"))
-        return home + "/.gitp4-usercache.txt"
+        return os.path.join(home, ".gitp4-usercache.txt")
 
     def getUserMapFromPerforceServer(self):
+        """ Creates the usercache from the data in P4.
+        """
+        
         if self.userMapFromPerforceServer:
             return
         self.users = {}
@@ -1371,21 +1554,24 @@ def getUserMapFromPerforceServer(self):
                 self.emails[email] = user
 
         s = ''
-        for (key, val) in self.users.items():
+        for (key, val) in list(self.users.items()):
             s += "%s\t%s\n" % (key.expandtabs(1), val.expandtabs(1))
 
-        open(self.getUserCacheFilename(), "wb").write(s)
+        cache = io.open(self.getUserCacheFilename(), "wb")
+        cache.write(as_bytes(s))
+        cache.close()
         self.userMapFromPerforceServer = True
 
     def loadUserMapFromCache(self):
+        """ Reads the P4 username to git email map """
         self.users = {}
         self.userMapFromPerforceServer = False
         try:
-            cache = open(self.getUserCacheFilename(), "rb")
+            cache = io.open(self.getUserCacheFilename(), "rb")
             lines = cache.readlines()
             cache.close()
             for line in lines:
-                entry = line.strip().split("\t")
+                entry = as_string(line).strip().split("\t")
                 self.users[entry[0]] = entry[1]
         except IOError:
             self.getUserMapFromPerforceServer()
@@ -1585,21 +1771,27 @@ def prepareLogMessage(self, template, message, jobs):
         return result
 
     def patchRCSKeywords(self, file, pattern):
-        # Attempt to zap the RCS keywords in a p4 controlled file matching the given pattern
+        """ Attempt to zap the RCS keywords in a p4 
+            controlled file matching the given pattern
+        """
+        bSubLine = as_bytes(r'$\1$')
         (handle, outFileName) = tempfile.mkstemp(dir='.')
         try:
-            outFile = os.fdopen(handle, "w+")
-            inFile = open(file, "r")
-            regexp = re.compile(pattern, re.VERBOSE)
+            outFile = os.fdopen(handle, "w+b")
+            inFile = open(file, "rb")
+            regexp = re.compile(as_bytes(pattern), re.VERBOSE)
             for line in inFile.readlines():
-                line = regexp.sub(r'$\1$', line)
+                line = regexp.sub(bSubLine, line)
                 outFile.write(line)
             inFile.close()
             outFile.close()
+            outFile = None
             # Forcibly overwrite the original file
             os.unlink(file)
             shutil.move(outFileName, file)
         except:
+            if outFile != None:
+                outFile.close()
             # cleanup our temporary file
             os.unlink(outFileName)
             print("Failed to strip RCS keywords in %s" % file)
@@ -1722,14 +1914,14 @@ def prepareSubmitTemplate(self, changelist=None):
                 break
         if not change_entry:
             die('Failed to decode output of p4 change -o')
-        for key, value in change_entry.iteritems():
+        for key, value in list(change_entry.items()):
             if key.startswith('File'):
                 if 'depot-paths' in settings:
                     if not [p for p in settings['depot-paths']
-                            if p4PathStartsWith(value, p)]:
+                            if p4PathStartsWith(value, p, self.verbose)]:
                         continue
                 else:
-                    if not p4PathStartsWith(value, self.depotPath):
+                    if not p4PathStartsWith(value, self.depotPath, self.verbose):
                         continue
                 files_list.append(value)
                 continue
@@ -1779,7 +1971,8 @@ def edit_template(self, template_file):
             return True
 
         while True:
-            response = raw_input("Submit template unchanged. Submit anyway? [y]es, [n]o (skip this patch) ")
+            response = raw_input("Submit template unchanged. Submit anyway? [y]es, [n]o (skip this patch) ").lower() \
+                .strip()[0]
             if response == 'y':
                 return True
             if response == 'n':
@@ -1817,8 +2010,8 @@ def get_diff_description(self, editedFiles, filesToAdd, symlinks):
     def applyCommit(self, id):
         """Apply one commit, return True if it succeeded."""
 
-        print("Applying", read_pipe(["git", "show", "-s",
-                                     "--format=format:%h %s", id]))
+        print(("Applying", read_pipe(["git", "show", "-s",
+                                     "--format=format:%h %s", id])))
 
         (p4User, gitEmail) = self.p4UserForCommit(id)
 
@@ -1939,8 +2132,23 @@ def applyCommit(self, id):
                     # disable the read-only bit on windows.
                     if self.isWindows and file not in editedFiles:
                         os.chmod(file, stat.S_IWRITE)
-                    self.patchRCSKeywords(file, kwfiles[file])
-                    fixed_rcs_keywords = True
+                    
+                    try:
+                        self.patchRCSKeywords(file, kwfiles[file])
+                        fixed_rcs_keywords = True
+                    except:
+                        # We are throwing an exception, undo all open edits
+                        for f in editedFiles:
+                            p4_revert(f)
+                        raise
+            else:
+                # They do not have attemptRCSCleanup set, this might be the fail point
+                # Check to see if the file has RCS keywords and suggest setting the property.
+                for file in editedFiles | filesToDelete:
+                    if p4_keywords_regexp_for_file(file) != None:
+                        print("At least one file in this commit has RCS Keywords that may be causing problems. ")
+                        print("Consider:\ngit config git-p4.attemptRCSCleanup true")
+                        break
 
             if fixed_rcs_keywords:
                 print("Retrying the patch with RCS keywords cleaned up")
@@ -1966,7 +2174,7 @@ def applyCommit(self, id):
             p4_delete(f)
 
         # Set/clear executable bits
-        for f in filesToChangeExecBit.keys():
+        for f in list(filesToChangeExecBit.keys()):
             mode = filesToChangeExecBit[f]
             setP4ExecBit(f, mode)
 
@@ -2003,7 +2211,7 @@ def applyCommit(self, id):
         tmpFile = os.fdopen(handle, "w+b")
         if self.isWindows:
             submitTemplate = submitTemplate.replace("\n", "\r\n")
-        tmpFile.write(submitTemplate)
+        tmpFile.write(as_bytes(submitTemplate))
         tmpFile.close()
 
         if self.prepare_p4_only:
@@ -2053,8 +2261,8 @@ def applyCommit(self, id):
                 message = tmpFile.read()
                 tmpFile.close()
                 if self.isWindows:
-                    message = message.replace("\r\n", "\n")
-                submitTemplate = message[:message.index(separatorLine)]
+                    message = message.replace(b"\r\n", b"\n")
+                submitTemplate = message[:message.index(as_bytes(separatorLine))]
 
                 if update_shelve:
                     p4_write_pipe(['shelve', '-r', '-i'], submitTemplate)
@@ -2164,6 +2372,50 @@ def exportGitTags(self, gitTags):
                 if verbose:
                     print("created p4 label for tag %s" % name)
 
+    def run_hook(self, hook_name, args = []):
+        """ Runs a hook if it is found.
+
+            Returns NONE if the hook does not exist
+            Returns TRUE if the exit code is 0, FALSE for a non-zero exit code.
+        """
+        hook_file = self.find_hook(hook_name)
+        if hook_file == None:
+            if self.verbose:
+                print("Skipping hook: %s" % hook_name)
+            return None
+
+        if self.verbose:
+            print("hooks_path = %s " % hooks_path)
+            print("hook_file = %s " % hook_file)
+
+        # Run the hook
+        # TODO - allow non-list format
+        cmd = [hook_file] + args
+        return subprocess.call(cmd) == 0
+
+    def find_hook(self, hook_name):
+        """ Locates the hook file for the given operating system.
+        """
+        hooks_path = gitConfig("core.hooksPath")
+        if len(hooks_path) <= 0:
+            hooks_path = os.path.join(os.environ.get("GIT_DIR", ".git"), "hooks")
+
+        # Look in the obvious place
+        hook_file = os.path.join(hooks_path, hook_name)
+        if os.path.isfile(hook_file) and os.access(hook_file, os.X_OK):
+            return hook_file
+
+        # if we are windows, we will also allow them to have the hooks have extensions
+        if (platform.system() == "Windows"):
+            for ext in ['.exe', '.bat', 'ps1']:
+                if os.path.isfile(hook_file + ext) and os.access(hook_file + ext, os.X_OK):
+                    return hook_file + ext
+
+        # We didn't find the file
+        return None
+
+
+
     def run(self, args):
         if len(args) == 0:
             self.master = currentGitBranch()
@@ -2219,7 +2471,7 @@ def run(self, args):
             self.clientSpecDirs = getClientSpec()
 
         # Check for the existence of P4 branches
-        branchesDetected = (len(p4BranchesInGit().keys()) > 1)
+        branchesDetected = (len(list(p4BranchesInGit().keys())) > 1)
 
         if self.useClientSpec and not branchesDetected:
             # all files are relative to the client spec
@@ -2314,12 +2566,8 @@ def run(self, args):
             sys.exit("number of commits (%d) must match number of shelved changelist (%d)" %
                      (len(commits), num_shelves))
 
-        hooks_path = gitConfig("core.hooksPath")
-        if len(hooks_path) <= 0:
-            hooks_path = os.path.join(os.environ.get("GIT_DIR", ".git"), "hooks")
-
-        hook_file = os.path.join(hooks_path, "p4-pre-submit")
-        if os.path.isfile(hook_file) and os.access(hook_file, os.X_OK) and subprocess.call([hook_file]) != 0:
+        rtn = self.run_hook("p4-pre-submit")
+        if rtn == False:
             sys.exit(1)
 
         #
@@ -2332,8 +2580,8 @@ def run(self, args):
         last = len(commits) - 1
         for i, commit in enumerate(commits):
             if self.dry_run:
-                print(" ", read_pipe(["git", "show", "-s",
-                                      "--format=format:%h %s", commit]))
+                print((" ", read_pipe(["git", "show", "-s",
+                                      "--format=format:%h %s", commit])))
                 ok = True
             else:
                 ok = self.applyCommit(commit)
@@ -2351,7 +2599,7 @@ def run(self, args):
                         if self.conflict_behavior == "ask":
                             print("What do you want to do?")
                             response = raw_input("[s]kip this commit but apply"
-                                                 " the rest, or [q]uit? ")
+                                                 " the rest, or [q]uit? ").lower().strip()[0]
                             if not response:
                                 continue
                         elif self.conflict_behavior == "skip":
@@ -2403,8 +2651,8 @@ def run(self, args):
                         star = "*"
                     else:
                         star = " "
-                    print(star, read_pipe(["git", "show", "-s",
-                                           "--format=format:%h %s",  c]))
+                    print((star, read_pipe(["git", "show", "-s",
+                                           "--format=format:%h %s",  c])))
                 print("You will have to do 'git p4 sync' and rebase.")
 
         if gitConfigBool("git-p4.exportLabels"):
@@ -2533,6 +2781,7 @@ def cloneExcludeCallback(option, opt_str, value, parser):
     # ("-//depot/A/..." becomes "/depot/A/..." after option parsing)
     parser.values.cloneExclude += ["/" + re.sub(r"\.\.\.$", "", value)]
 
+
 class P4Sync(Command, P4UserMap):
 
     def __init__(self):
@@ -2610,7 +2859,7 @@ def __init__(self):
         self.knownBranches = {}
         self.initialParents = {}
 
-        self.tz = "%+03d%02d" % (- time.timezone / 3600, ((- time.timezone % 3600) / 60))
+        self.tz = "%+03d%02d" % (- time.timezone // 3600, ((- time.timezone % 3600) // 60))
         self.labels = {}
 
     # Force a checkpoint in fast-import and wait for it to finish
@@ -2624,17 +2873,23 @@ def checkpoint(self):
     def isPathWanted(self, path):
         for p in self.cloneExclude:
             if p.endswith("/"):
-                if p4PathStartsWith(path, p):
+                if p4PathStartsWith(path, p, self.verbose):
                     return False
             # "-//depot/file1" without a trailing "/" should only exclude "file1", but not "file111" or "file1_dir/file2"
             elif path.lower() == p.lower():
                 return False
         for p in self.depotPaths:
-            if p4PathStartsWith(path, p):
+            if p4PathStartsWith(path, p, self.verbose):
                 return True
         return False
 
     def extractFilesFromCommit(self, commit, shelved=False, shelved_cl = 0):
+        """ Generates the list of files to be added in this git commit.
+
+            commit     = Unicode[] - data read from the P4 commit
+            shelved    = Bool      - Is the P4 commit flagged as being shelved.
+            shelved_cl = Unicode   - Numeric string with the changelist number.
+        """
         files = []
         fnum = 0
         while "depotFile%s" % fnum in commit:
@@ -2676,7 +2931,7 @@ def stripRepoPath(self, path, prefixes):
             path = self.clientSpecDirs.map_in_client(path)
             if self.detectBranches:
                 for b in self.knownBranches:
-                    if p4PathStartsWith(path, b + "/"):
+                    if p4PathStartsWith(path, b + "/", self.verbose):
                         path = path[len(b)+1:]
 
         elif self.keepRepoPath:
@@ -2684,12 +2939,12 @@ def stripRepoPath(self, path, prefixes):
             # //depot/; just look at first prefix as they all should
             # be in the same depot.
             depot = re.sub("^(//[^/]+/).*", r'\1', prefixes[0])
-            if p4PathStartsWith(path, depot):
+            if p4PathStartsWith(path, depot, self.verbose):
                 path = path[len(depot):]
 
         else:
             for p in prefixes:
-                if p4PathStartsWith(path, p):
+                if p4PathStartsWith(path, p, self.verbose):
                     path = path[len(p):]
                     break
 
@@ -2697,8 +2952,11 @@ def stripRepoPath(self, path, prefixes):
         return path
 
     def splitFilesIntoBranches(self, commit):
-        """Look at each depotFile in the commit to figure out to what
-           branch it belongs."""
+        """ Look at each depotFile in the commit to figure out to what
+            branch it belongs.
+
+            Data in the commit will NOT be encoded
+        """
 
         if self.clientSpecDirs:
             files = self.extractFilesFromCommit(commit)
@@ -2727,10 +2985,10 @@ def splitFilesIntoBranches(self, commit):
             else:
                 relPath = self.stripRepoPath(path, self.depotPaths)
 
-            for branch in self.knownBranches.keys():
+            for branch in list(self.knownBranches.keys()):
                 # add a trailing slash so that a commit into qt/4.2foo
                 # doesn't end up in qt/4.2, e.g.
-                if p4PathStartsWith(relPath, branch + "/"):
+                if p4PathStartsWith(relPath, branch + "/", self.verbose):
                     if branch not in branches:
                         branches[branch] = []
                     branches[branch].append(file)
@@ -2739,36 +2997,34 @@ def splitFilesIntoBranches(self, commit):
         return branches
 
     def writeToGitStream(self, gitMode, relPath, contents):
-        self.gitStream.write('M %s inline %s\n' % (gitMode, relPath))
+        """ Writes the bytes[] 'contents' to the git fast-import
+            with the given 'gitMode' and 'relPath' as the relative
+            path.
+        """
+        self.gitStream.write('M %s inline %s\n' % (gitMode, as_string(relPath)))
         self.gitStream.write('data %d\n' % sum(len(d) for d in contents))
         for d in contents:
-            self.gitStream.write(d)
+            self.gitStreamBytes.write(d)
         self.gitStream.write('\n')
 
-    def encodeWithUTF8(self, path):
-        try:
-            path.decode('ascii')
-        except:
-            encoding = 'utf8'
-            if gitConfig('git-p4.pathEncoding'):
-                encoding = gitConfig('git-p4.pathEncoding')
-            path = path.decode(encoding, 'replace').encode('utf8', 'replace')
-            if self.verbose:
-                print('Path with non-ASCII characters detected. Used %s to encode: %s ' % (encoding, path))
-        return path
-
-    # output one file from the P4 stream
-    # - helper for streamP4Files
-
     def streamOneP4File(self, file, contents):
+        """ output one file from the P4 stream to the git inbound stream.
+            helper for streamP4files.
+
+            contents should be a bytes (bytes) 
+        """
         relPath = self.stripRepoPath(file['depotFile'], self.branchPrefixes)
-        relPath = self.encodeWithUTF8(relPath)
+        relPath = encodeWithUTF8(relPath, self.verbose)
         if verbose:
             if 'fileSize' in self.stream_file:
                 size = int(self.stream_file['fileSize'])
             else:
                 size = 0 # deleted files don't get a fileSize apparently
-            sys.stdout.write('\r%s --> %s (%i MB)\n' % (file['depotFile'], relPath, size/1024/1024))
+            #if isunicode:
+            #    sys.stdout.write('\r%s --> %s (%i MB)\n' % (path_as_string(file['depotFile']), to_unicode(relPath), size//1024//1024))
+            #else:
+            #    sys.stdout.write('\r%s --> %s (%i MB)\n' % (path_as_string(file['depotFile']), relPath, size//1024//1024))
+            sys.stdout.write('\r%s --> %s (%i MB)\n' % (path_as_string(file['depotFile']), as_string(relPath), size//1024//1024))
             sys.stdout.flush()
 
         (type_base, type_mods) = split_p4_type(file["type"])
@@ -2786,7 +3042,7 @@ def streamOneP4File(self, file, contents):
                 # to nothing.  This causes p4 errors when checking out such
                 # a change, and errors here too.  Work around it by ignoring
                 # the bad symlink; hopefully a future change fixes it.
-                print("\nIgnoring empty symlink in %s" % file['depotFile'])
+                print("\nIgnoring empty symlink in %s" % path_as_string(file['depotFile']))
                 return
             elif data[-1] == '\n':
                 contents = [data[:-1]]
@@ -2826,16 +3082,16 @@ def streamOneP4File(self, file, contents):
             # Ideally, someday, this script can learn how to generate
             # appledouble files directly and import those to git, but
             # non-mac machines can never find a use for apple filetype.
-            print("\nIgnoring apple filetype file %s" % file['depotFile'])
+            print("\nIgnoring apple filetype file %s" % path_as_string(file['depotFile']))
             return
 
         # Note that we do not try to de-mangle keywords on utf16 files,
         # even though in theory somebody may want that.
-        pattern = p4_keywords_regexp_for_type(type_base, type_mods)
+        pattern = as_bytes(p4_keywords_regexp_for_type(type_base, type_mods))
         if pattern:
             regexp = re.compile(pattern, re.VERBOSE)
-            text = ''.join(contents)
-            text = regexp.sub(r'$\1$', text)
+            text = b''.join(contents)
+            text = regexp.sub(as_bytes(r'$\1$'), text)
             contents = [ text ]
 
         if self.largeFileSystem:
@@ -2845,7 +3101,7 @@ def streamOneP4File(self, file, contents):
 
     def streamOneP4Deletion(self, file):
         relPath = self.stripRepoPath(file['path'], self.branchPrefixes)
-        relPath = self.encodeWithUTF8(relPath)
+        relPath = encodeWithUTF8(relPath, self.verbose)
         if verbose:
             sys.stdout.write("delete %s\n" % relPath)
             sys.stdout.flush()
@@ -2854,21 +3110,25 @@ def streamOneP4Deletion(self, file):
         if self.largeFileSystem and self.largeFileSystem.isLargeFile(relPath):
             self.largeFileSystem.removeLargeFile(relPath)
 
-    # handle another chunk of streaming data
     def streamP4FilesCb(self, marshalled):
+        """ Callback function for recording P4 chunks of data for streaming 
+            into GIT.
+
+            marshalled data is bytes[] from the caller
+        """
 
         # catch p4 errors and complain
         err = None
-        if "code" in marshalled:
-            if marshalled["code"] == "error":
-                if "data" in marshalled:
-                    err = marshalled["data"].rstrip()
+        if b"code" in marshalled:
+            if marshalled[b"code"] == b"error":
+                if b"data" in marshalled:
+                    err = marshalled[b"data"].rstrip()
 
         if not err and 'fileSize' in self.stream_file:
             required_bytes = int((4 * int(self.stream_file["fileSize"])) - calcDiskFree())
             if required_bytes > 0:
                 err = 'Not enough space left on %s! Free at least %i MB.' % (
-                    os.getcwd(), required_bytes/1024/1024
+                    os.getcwd(), required_bytes//1024//1024
                 )
 
         if err:
@@ -2884,11 +3144,11 @@ def streamP4FilesCb(self, marshalled):
             # ignore errors, but make sure it exits first
             self.importProcess.wait()
             if f:
-                die("Error from p4 print for %s: %s" % (f, err))
+                die("Error from p4 print for %s: %s" % (path_as_string(f), err))
             else:
                 die("Error from p4 print: %s" % err)
 
-        if 'depotFile' in marshalled and self.stream_have_file_info:
+        if b'depotFile' in marshalled and self.stream_have_file_info:
             # start of a new file - output the old one first
             self.streamOneP4File(self.stream_file, self.stream_contents)
             self.stream_file = {}
@@ -2897,14 +3157,17 @@ def streamP4FilesCb(self, marshalled):
 
         # pick up the new file information... for the
         # 'data' field we need to append to our array
-        for k in marshalled.keys():
-            if k == 'data':
+        for k in list(marshalled.keys()):
+            if k == b'data':
                 if 'streamContentSize' not in self.stream_file:
                     self.stream_file['streamContentSize'] = 0
-                self.stream_file['streamContentSize'] += len(marshalled['data'])
-                self.stream_contents.append(marshalled['data'])
+                self.stream_file['streamContentSize'] += len(marshalled[b'data'])
+                self.stream_contents.append(marshalled[b'data'])
             else:
-                self.stream_file[k] = marshalled[k]
+                if k == b'depotFile':
+                    self.stream_file[as_string(k)] = marshalled[k]
+                else:
+                    self.stream_file[as_string(k)] = as_string(marshalled[k])
 
         if (verbose and
             'streamContentSize' in self.stream_file and
@@ -2912,14 +3175,15 @@ def streamP4FilesCb(self, marshalled):
             'depotFile' in self.stream_file):
             size = int(self.stream_file["fileSize"])
             if size > 0:
-                progress = 100*self.stream_file['streamContentSize']/size
-                sys.stdout.write('\r%s %d%% (%i MB)' % (self.stream_file['depotFile'], progress, int(size/1024/1024)))
+                progress = 100.0*self.stream_file['streamContentSize']/size
+                sys.stdout.write('\r%s %4.1f%% (%i MB)' % (path_as_string(self.stream_file['depotFile']), progress, int(size//1024//1024)))
                 sys.stdout.flush()
 
         self.stream_have_file_info = True
 
-    # Stream directly from "p4 files" into "git fast-import"
     def streamP4Files(self, files):
+        """ Stream directly from "p4 files" into "git fast-import" 
+        """
         filesForCommit = []
         filesToRead = []
         filesToDelete = []
@@ -2940,7 +3204,7 @@ def streamP4Files(self, files):
             self.stream_contents = []
             self.stream_have_file_info = False
 
-            # curry self argument
+            # Callback for P4 command to collect file content
             def streamP4FilesCbSelf(entry):
                 self.streamP4FilesCb(entry)
 
@@ -2949,9 +3213,9 @@ def streamP4FilesCbSelf(entry):
                 if 'shelved_cl' in f:
                     # Handle shelved CLs using the "p4 print file@=N" syntax to print
                     # the contents
-                    fileArg = '%s@=%d' % (f['path'], f['shelved_cl'])
+                    fileArg = b'%s@=%d' % (f['path'], as_bytes(f['shelved_cl']))
                 else:
-                    fileArg = '%s#%s' % (f['path'], f['rev'])
+                    fileArg = b'%s#%s' % (f['path'], as_bytes(f['rev']))
 
                 fileArgs.append(fileArg)
 
@@ -2971,7 +3235,7 @@ def make_email(self, userid):
 
     def streamTag(self, gitStream, labelName, labelDetails, commit, epoch):
         """ Stream a p4 tag.
-        commit is either a git commit, or a fast-import mark, ":<p4commit>"
+            commit is either a git commit, or a fast-import mark, ":<p4commit>"
         """
 
         if verbose:
@@ -2994,7 +3258,7 @@ def streamTag(self, gitStream, labelName, labelDetails, commit, epoch):
 
         gitStream.write("tagger %s\n" % tagger)
 
-        print("labelDetails=",labelDetails)
+        print(("labelDetails=",labelDetails))
         if 'Description' in labelDetails:
             description = labelDetails['Description']
         else:
@@ -3016,7 +3280,7 @@ def hasBranchPrefix(self, path):
         if not self.branchPrefixes:
             return True
         hasPrefix = [p for p in self.branchPrefixes
-                        if p4PathStartsWith(path, p)]
+                        if p4PathStartsWith(path, p, self.verbose)]
         if not hasPrefix and self.verbose:
             print('Ignoring file outside of prefix: {0}'.format(path))
         return hasPrefix
@@ -3043,7 +3307,22 @@ def commit(self, details, files, branch, parent = "", allow_empty=False):
                 .format(details['change']))
             return
 
+        # fast-import:
+        #'commit' SP <ref> LF
+	    #mark?
+	    #original-oid?
+	    #('author' (SP <name>)? SP LT <email> GT SP <when> LF)?
+	    #'committer' (SP <name>)? SP LT <email> GT SP <when> LF
+	    #('encoding' SP <encoding>)?
+	    #data
+	    #('from' SP <commit-ish> LF)?
+	    #('merge' SP <commit-ish> LF)*
+	    #(filemodify | filedelete | filecopy | filerename | filedeleteall | notemodify)*
+	    #LF?
+        
+        #'commit' - <ref> is the name of the branch to make the commit on
         self.gitStream.write("commit %s\n" % branch)
+        #'mark' SP :<idnum>
         self.gitStream.write("mark :%s\n" % details["change"])
         self.committedChanges.add(int(details["change"]))
         committer = ""
@@ -3053,19 +3332,29 @@ def commit(self, details, files, branch, parent = "", allow_empty=False):
 
         self.gitStream.write("committer %s\n" % committer)
 
-        self.gitStream.write("data <<EOT\n")
-        self.gitStream.write(details["desc"])
+        # Per https://git-scm.com/docs/git-fast-import
+        # The preferred method for creating the commit message is to supply the 
+        # byte count in the data method and not to use a Delimited format. 
+        # Collect all the text in the commit message into a single string and 
+        # compute the byte count.
+        commitText = details["desc"]
         if len(jobs) > 0:
-            self.gitStream.write("\nJobs: %s" % (' '.join(jobs)))
-
+            commitText += "\nJobs: %s" % (' '.join(jobs))
         if not self.suppress_meta_comment:
-            self.gitStream.write("\n[git-p4: depot-paths = \"%s\": change = %s" %
-                                (','.join(self.branchPrefixes), details["change"]))
-            if len(details['options']) > 0:
-                self.gitStream.write(": options = %s" % details['options'])
-            self.gitStream.write("]\n")
+            # coherce the path to the correct formatting in the branch prefixes as well.
+            dispPaths = []
+            for p in self.branchPrefixes:
+                dispPaths += [path_as_string(p)]
 
-        self.gitStream.write("EOT\n\n")
+            commitText += ("\n[git-p4: depot-paths = \"%s\": change = %s" %
+                                (','.join(dispPaths), details["change"]))
+            if len(details['options']) > 0:
+                commitText += (": options = %s" % details['options'])
+            commitText += "]"
+        commitText += "\n" 
+        self.gitStream.write("data %s\n" % len(as_bytes(commitText)))
+        self.gitStream.write(commitText)
+        self.gitStream.write("\n")
 
         if len(parent) > 0:
             if self.verbose:
@@ -3133,7 +3422,7 @@ def getLabels(self):
             self.labels[newestChange] = [output, revisions]
 
         if self.verbose:
-            print("Label changes: %s" % self.labels.keys())
+            print("Label changes: %s" % list(self.labels.keys()))
 
     # Import p4 labels as git tags. A direct mapping does not
     # exist, so assume that if all the files are at the same revision
@@ -3234,7 +3523,7 @@ def getBranchMapping(self):
                 source = paths[0]
                 destination = paths[1]
                 ## HACK
-                if p4PathStartsWith(source, self.depotPaths[0]) and p4PathStartsWith(destination, self.depotPaths[0]):
+                if p4PathStartsWith(source, self.depotPaths[0], self.verbose) and p4PathStartsWith(destination, self.depotPaths[0], self.verbose):
                     source = source[len(self.depotPaths[0]):-4]
                     destination = destination[len(self.depotPaths[0]):-4]
 
@@ -3276,7 +3565,7 @@ def getBranchMapping(self):
 
     def getBranchMappingFromGitBranches(self):
         branches = p4BranchesInGit(self.importIntoRemotes)
-        for branch in branches.keys():
+        for branch in list(branches.keys()):
             if branch == "master":
                 branch = "main"
             else:
@@ -3388,14 +3677,14 @@ def importChanges(self, changes, origin_revision=0):
             self.updateOptionDict(description)
 
             if not self.silent:
-                sys.stdout.write("\rImporting revision %s (%s%%)" % (change, cnt * 100 / len(changes)))
+                sys.stdout.write("\rImporting revision %s (%4.1f%%)" % (change, cnt * 100 / len(changes)))
                 sys.stdout.flush()
             cnt = cnt + 1
 
             try:
                 if self.detectBranches:
                     branches = self.splitFilesIntoBranches(description)
-                    for branch in branches.keys():
+                    for branch in list(branches.keys()):
                         ## HACK  --hwn
                         branchPrefix = self.depotPaths[0] + branch + "/"
                         self.branchPrefixes = [ branchPrefix ]
@@ -3464,6 +3753,7 @@ def importChanges(self, changes, origin_revision=0):
                 sys.exit(1)
 
     def sync_origin_only(self):
+        """ Ensures that the origin has been synchronized if one is set """
         if self.syncWithOrigin:
             self.hasOrigin = originP4BranchesExist()
             if self.hasOrigin:
@@ -3472,30 +3762,35 @@ def sync_origin_only(self):
                 system("git fetch origin")
 
     def importHeadRevision(self, revision):
-        print("Doing initial import of %s from revision %s into %s" % (' '.join(self.depotPaths), revision, self.branch))
-
+        # Re-encode depot text
+        dispPaths = []
+        utf8Paths = []
+        for p in self.depotPaths:
+            dispPaths += [path_as_string(p)]
+        print("Doing initial import of %s from revision %s into %s" % (' '.join(dispPaths), revision, self.branch))
         details = {}
         details["user"] = "git perforce import user"
-        details["desc"] = ("Initial import of %s from the state at revision %s\n"
-                           % (' '.join(self.depotPaths), revision))
+        details["desc"] = ("Initial import of %s from the state at revision %s\n" %
+                           (' '.join(dispPaths), revision))
         details["change"] = revision
         newestRevision = 0
+        del dispPaths
 
         fileCnt = 0
         fileArgs = ["%s...%s" % (p,revision) for p in self.depotPaths]
 
-        for info in p4CmdList(["files"] + fileArgs):
+        for info in p4CmdList(["files"] + fileArgs, encode_data = False):
 
-            if 'code' in info and info['code'] == 'error':
+            if 'code' in info and info['code'] == b'error':
                 sys.stderr.write("p4 returned an error: %s\n"
-                                 % info['data'])
-                if info['data'].find("must refer to client") >= 0:
+                                 % as_string(info['data']))
+                if info['data'].find(b"must refer to client") >= 0:
                     sys.stderr.write("This particular p4 error is misleading.\n")
                     sys.stderr.write("Perhaps the depot path was misspelled.\n");
                     sys.stderr.write("Depot path:  %s\n" % " ".join(self.depotPaths))
                 sys.exit(1)
             if 'p4ExitCode' in info:
-                sys.stderr.write("p4 exitcode: %s\n" % info['p4ExitCode'])
+                sys.stderr.write("p4 exitcode: %s\n" % as_string(info['p4ExitCode']))
                 sys.exit(1)
 
 
@@ -3508,8 +3803,10 @@ def importHeadRevision(self, revision):
                 #fileCnt = fileCnt + 1
                 continue
 
+            # Save all the file information, howerver do not translate the depotFile name at 
+            # this time. Leave that as bytes since the encoding may vary.
             for prop in ["depotFile", "rev", "action", "type" ]:
-                details["%s%s" % (prop, fileCnt)] = info[prop]
+                details["%s%s" % (prop, fileCnt)] = (info[prop] if prop == "depotFile" else as_string(info[prop]))
 
             fileCnt = fileCnt + 1
 
@@ -3529,13 +3826,18 @@ def importHeadRevision(self, revision):
             print(self.gitError.read())
 
     def openStreams(self):
+        """ Opens the fast import pipes.  Note that the git* streams are wrapped
+            to expect Unicode text.  To send a raw byte Array, use the importProcess
+            underlying port
+        """
         self.importProcess = subprocess.Popen(["git", "fast-import"],
                                               stdin=subprocess.PIPE,
                                               stdout=subprocess.PIPE,
                                               stderr=subprocess.PIPE);
-        self.gitOutput = self.importProcess.stdout
-        self.gitStream = self.importProcess.stdin
-        self.gitError = self.importProcess.stderr
+        self.gitOutput = Py23File(self.importProcess.stdout, verbose = self.verbose)
+        self.gitStream = Py23File(self.importProcess.stdin, verbose = self.verbose)
+        self.gitError = Py23File(self.importProcess.stderr, verbose = self.verbose)
+        self.gitStreamBytes = self.importProcess.stdin
 
     def closeStreams(self):
         self.gitStream.close()
@@ -3584,13 +3886,13 @@ def run(self, args):
                 if short in branches:
                     self.p4BranchesInGit = [ short ]
             else:
-                self.p4BranchesInGit = branches.keys()
+                self.p4BranchesInGit = list(branches.keys())
 
             if len(self.p4BranchesInGit) > 1:
                 if not self.silent:
                     print("Importing from/into multiple branches")
                 self.detectBranches = True
-                for branch in branches.keys():
+                for branch in list(branches.keys()):
                     self.initialParents[self.refPrefix + branch] = \
                         branches[branch]
 
@@ -3870,19 +4172,25 @@ def __init__(self):
                                  help="where to leave result of the clone"),
             optparse.make_option("--bare", dest="cloneBare",
                                  action="store_true", default=False),
+            optparse.make_option("--encoding", dest="setPathEncoding",
+                                 action="store", default=None,
+                                 help="Sets the path encoding for this depot")
         ]
         self.cloneDestination = None
         self.needsGit = False
         self.cloneBare = False
+        self.setPathEncoding = None
 
     def defaultDestination(self, args):
+        """Returns the last path component as the default git 
+        repository directory name"""
         ## TODO: use common prefix of args?
         depotPath = args[0]
         depotDir = re.sub("(@[^@]*)$", "", depotPath)
         depotDir = re.sub("(#[^#]*)$", "", depotDir)
         depotDir = re.sub(r"\.\.\.$", "", depotDir)
         depotDir = re.sub(r"/$", "", depotDir)
-        return os.path.split(depotDir)[1]
+        return depotDir.split('/')[-1]
 
     def run(self, args):
         if len(args) < 1:
@@ -3894,19 +4202,29 @@ def run(self, args):
 
         depotPaths = args
 
+        # If we have an encoding provided, ignore what may already exist
+        # in the registry. This will ensure we show the displayed values
+        # using the correct encoding.
+        if self.setPathEncoding:
+            gitConfigSet("git-p4.pathEncoding", self.setPathEncoding)
+
+        # If more than 1 path element is supplied, the last element
+        # is the clone destination.
         if not self.cloneDestination and len(depotPaths) > 1:
             self.cloneDestination = depotPaths[-1]
             depotPaths = depotPaths[:-1]
 
+        dispPaths = []
         for p in depotPaths:
             if not p.startswith("//"):
                 sys.stderr.write('Depot paths must start with "//": %s\n' % p)
                 return False
+            dispPaths += [path_as_string(p)]
 
         if not self.cloneDestination:
             self.cloneDestination = self.defaultDestination(args)
 
-        print("Importing from %s into %s" % (', '.join(depotPaths), self.cloneDestination))
+        print("Importing from %s into %s" % (', '.join(dispPaths), path_as_string(self.cloneDestination)))
 
         if not os.path.exists(self.cloneDestination):
             os.makedirs(self.cloneDestination)
@@ -3919,6 +4237,13 @@ def run(self, args):
         if retcode:
             raise CalledProcessError(retcode, init_cmd)
 
+        # Set the encoding if it was provided command line
+        if self.setPathEncoding:
+            init_cmd= ["git", "config", "git-p4.pathEncoding", self.setPathEncoding]
+            retcode = subprocess.call(init_cmd)
+            if retcode:
+                raise CalledProcessError(retcode, init_cmd)
+
         if not P4Sync.run(self, depotPaths):
             return False
 
@@ -3974,7 +4299,7 @@ def findLastP4Revision(self, starting_point):
             to find the P4 commit we are based on, and the depot-paths.
         """
 
-        for parent in (range(65535)):
+        for parent in (list(range(65535))):
             log = extractLogMessageFromGitCommit("{0}^{1}".format(starting_point, parent))
             settings = extractSettingsGitLog(log)
             if 'change' in settings:
@@ -4080,6 +4405,107 @@ def run(self, args):
             print("%s <= %s (%s)" % (branch, ",".join(settings["depot-paths"]), settings["change"]))
         return True
 
+class Py23File():
+    """ Python2/3 Unicode File Wrapper 
+    """
+    
+    stream_handle = None
+    verbose       = False
+    debug_handle  = None
+   
+    def __init__(self, stream_handle, verbose = False,
+                 debug_handle = None):
+        """ Create a Python3 compliant Unicode to Byte String
+            Windows compatible wrapper
+
+            stream_handle = the underlying file-like handle
+            verbose       = Boolean if content should be echoed
+            debug_handle  = A file-like handle data is duplicately written to
+        """
+        self.stream_handle = stream_handle
+        self.verbose       = verbose
+        self.debug_handle  = debug_handle
+
+    def write(self, utf8string):
+        """ Writes the utf8 encoded string to the underlying 
+            file stream
+        """
+        self.stream_handle.write(as_bytes(utf8string))
+        if self.verbose:
+            sys.stderr.write("Stream Output: %s" % utf8string)
+            sys.stderr.flush()
+        if self.debug_handle:
+            self.debug_handle.write(as_bytes(utf8string))
+
+    def read(self, size = None):
+        """ Reads int charcters from the underlying stream 
+            and converts it to utf8.
+
+            Be aware, the size value is for reading the underlying
+            bytes so the value may be incorrect. Usage of the size
+            value is discouraged.
+        """
+        if size == None:
+            return as_string(self.stream_handle.read())
+        else:
+            return as_string(self.stream_handle.read(size))
+
+    def readline(self):
+        """ Reads a line from the underlying byte stream 
+            and converts it to utf8
+        """
+        return as_string(self.stream_handle.readline())
+
+    def readlines(self, sizeHint = None):
+        """ Returns a list containing lines from the file converted to unicode.
+
+            sizehint - Optional. If the optional sizehint argument is 
+            present, instead of reading up to EOF, whole lines totalling 
+            approximately sizehint bytes are read.
+        """
+        lines = self.stream_handle.readlines(sizeHint)
+        for i in range(0, len(lines)):
+            lines[i] = as_string(lines[i])
+        return lines
+
+    def close(self):
+        """ Closes the underlying byte stream """
+        self.stream_handle.close()
+
+    def flush(self):
+        """ Flushes the underlying byte stream """
+        self.stream_handle.flush()
+
+class DepotPath():
+    """ Describes a DepotPath or File
+    """
+
+    raw_path = None
+    utf8_path = None
+    bytes_path = None
+
+    def __init__(self, path):
+        """ Creates a new DepotPath with the path encoded
+            with by the P4 repository
+        """
+        raw_path = path
+
+    def raw():
+        """ Returns the path as it was originally found
+            in the P4 repository
+        """
+        return raw_path
+
+    def startswith(self, prefix, start = None, end = None):
+        """ Return True if string starts with the prefix, otherwise 
+            return False. prefix can also be a tuple of prefixes to 
+            look for. With optional start, test string beginning at 
+            that position. With optional end, stop comparing 
+            string at that position.
+        """
+        return raw_path.startswith(prefix, start, end)
+
+
 class HelpFormatter(optparse.IndentedHelpFormatter):
     def __init__(self):
         optparse.IndentedHelpFormatter.__init__(self)
@@ -4113,7 +4539,7 @@ def printUsage(commands):
 
 def main():
     if len(sys.argv[1:]) == 0:
-        printUsage(commands.keys())
+        printUsage(list(commands.keys()))
         sys.exit(2)
 
     cmdName = sys.argv[1]
@@ -4123,7 +4549,7 @@ def main():
     except KeyError:
         print("unknown command %s" % cmdName)
         print("")
-        printUsage(commands.keys())
+        printUsage(list(commands.keys()))
         sys.exit(2)
 
     options = cmd.options
@@ -4140,7 +4566,12 @@ def main():
                                    description = cmd.description,
                                    formatter = HelpFormatter())
 
-    (cmd, args) = parser.parse_args(sys.argv[2:], cmd);
+    try:
+        (cmd, args) = parser.parse_args(sys.argv[2:], cmd);
+    except:
+        parser.print_help()
+        raise
+
     global verbose
     verbose = cmd.verbose
     if cmd.needsGit:
@@ -4155,8 +4586,8 @@ def main():
                         chdir(cdup);
 
         if not isValidGitDir(cmd.gitdir):
-            if isValidGitDir(cmd.gitdir + "/.git"):
-                cmd.gitdir += "/.git"
+            if isValidGitDir(os.path.join(cmd.gitdir, ".git")):
+                cmd.gitdir = os.path.join(cmd.gitdir, ".git")
             else:
                 die("fatal: cannot locate git repository at %s" % cmd.gitdir)
 
-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v3 1/1] Python3 support for t9800 tests. Basic P4/Python3 support
  2019-12-02 19:02     ` [PATCH v3 1/1] Python3 support for t9800 tests. Basic P4/Python3 support Ben Keene via GitGitGadget
@ 2019-12-03  0:18       ` Denton Liu
  2019-12-03 16:03         ` Ben Keene
  0 siblings, 1 reply; 64+ messages in thread
From: Denton Liu @ 2019-12-03  0:18 UTC (permalink / raw)
  To: Ben Keene via GitGitGadget; +Cc: git, Ben Keene, Junio C Hamano

Hi Ben,

Thanks for the contribution!

> Subject: Python3 support for t9800 tests. Basic P4/Python3 support

In git.git, the convention for commit subjects is to use 
"<area>: <summary>". Perhaps something like, "git-p4: support Python 3"?
Although I doubt this patch should remain as is... More below.

On Mon, Dec 02, 2019 at 07:02:16PM +0000, Ben Keene via GitGitGadget wrote:
> From: Ben Keene <seraphire@gmail.com>

It would be nice to have a bit more information about what this patch
does. Could you please fill this in with some more details about the
whats and, more importantly, the _whys_ of your change?

> 
> Signed-off-by: Ben Keene <seraphire@gmail.com>
> ---
>  git-p4.py | 825 +++++++++++++++++++++++++++++++++++++++++-------------
>  1 file changed, 628 insertions(+), 197 deletions(-)

This is a very big change to be done in one patch. Could you please
split this into multiple smaller patches that each do one logical
change? For example, you could have the following series of changes:

	1. git-p4: use p4.exe if on Windows
	2. git-p4: introduce encoding helper functions # this is to
	        introduce the as_string(), as_bytes(), etc. functions
	3. git-p4: start using the encoding helper functions
	...

This was just an example and you don't have to follow those literally. I
just wanted to give you an idea of what I meant.

You can see Documentation/SubmittingPatches#separate-commits for more
information.

Thanks,

Denton

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v3 1/1] Python3 support for t9800 tests. Basic P4/Python3 support
  2019-12-03  0:18       ` Denton Liu
@ 2019-12-03 16:03         ` Ben Keene
  2019-12-04  6:14           ` Denton Liu
  0 siblings, 1 reply; 64+ messages in thread
From: Ben Keene @ 2019-12-03 16:03 UTC (permalink / raw)
  To: Denton Liu, Ben Keene via GitGitGadget; +Cc: git, Junio C Hamano


On 12/2/2019 7:18 PM, Denton Liu wrote:
> Hi Ben,
>
> Thanks for the contribution!
>
>> Subject: Python3 support for t9800 tests. Basic P4/Python3 support
> In git.git, the convention for commit subjects is to use
> "<area>: <summary>". Perhaps something like, "git-p4: support Python 3"?
> Although I doubt this patch should remain as is... More below.
I didn't realize the email message from gitgitgadget was going to be the 
commit message, I thought it was the PR message.  I'll work on changing 
that!
>
> On Mon, Dec 02, 2019 at 07:02:16PM +0000, Ben Keene via GitGitGadget wrote:
>> From: Ben Keene <seraphire@gmail.com>
> It would be nice to have a bit more information about what this patch
> does. Could you please fill this in with some more details about the
> whats and, more importantly, the _whys_ of your change?
Sure, I'll add more detail.
>> Signed-off-by: Ben Keene <seraphire@gmail.com>
>> ---
>>   git-p4.py | 825 +++++++++++++++++++++++++++++++++++++++++-------------
>>   1 file changed, 628 insertions(+), 197 deletions(-)
> This is a very big change to be done in one patch. Could you please
> split this into multiple smaller patches that each do one logical
> change? For example, you could have the following series of changes:
>
> 	1. git-p4: use p4.exe if on Windows
> 	2. git-p4: introduce encoding helper functions # this is to
> 	        introduce the as_string(), as_bytes(), etc. functions
> 	3. git-p4: start using the encoding helper functions
> 	...
>
> This was just an example and you don't have to follow those literally. I
> just wanted to give you an idea of what I meant.
>
> You can see Documentation/SubmittingPatches#separate-commits for more
> information.
>
> Thanks,
>
> Denton
So my last question would be, should I open a different PR on 
gitgitgadget? I can cherry-pick my changes into another branch and 
restart my submission?

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v3 1/1] Python3 support for t9800 tests. Basic P4/Python3 support
  2019-12-03 16:03         ` Ben Keene
@ 2019-12-04  6:14           ` Denton Liu
  0 siblings, 0 replies; 64+ messages in thread
From: Denton Liu @ 2019-12-04  6:14 UTC (permalink / raw)
  To: Ben Keene; +Cc: Ben Keene via GitGitGadget, git, Junio C Hamano

On Tue, Dec 03, 2019 at 11:03:31AM -0500, Ben Keene wrote:
> So my last question would be, should I open a different PR on gitgitgadget?
> I can cherry-pick my changes into another branch and restart my submission?

You can reuse the same PR. Just force-push to overwrite your old commits
and then you'll be able to `/submit` again to send another revision.

^ permalink raw reply	[flat|nested] 64+ messages in thread

* [PATCH v4 00/11] git-p4.py: Cast byte strings to unicode strings in python3
  2019-12-02 19:02   ` [PATCH v3 0/1] git-p4.py: Cast byte strings to unicode strings in python3 Ben Keene via GitGitGadget
  2019-12-02 19:02     ` [PATCH v3 1/1] Python3 support for t9800 tests. Basic P4/Python3 support Ben Keene via GitGitGadget
@ 2019-12-04 22:29     ` Ben Keene via GitGitGadget
  2019-12-04 22:29       ` [PATCH v4 01/11] git-p4: select p4 binary by operating-system Ben Keene via GitGitGadget
                         ` (12 more replies)
  1 sibling, 13 replies; 64+ messages in thread
From: Ben Keene via GitGitGadget @ 2019-12-04 22:29 UTC (permalink / raw)
  To: git; +Cc: Ben Keene, Junio C Hamano

Issue: The current git-p4.py script does not work with python3.

I have attempted to use the P4 integration built into GIT and I was unable
to get the program to run because I have Python 3.8 installed on my
computer. I was able to get the program to run when I downgraded my python
to version 2.7. However, python 2 is reaching its end of life.

Submission: I am submitting a patch for the git-p4.py script that partially 
supports python 3.8. This code was able to pass the basic tests (t9800) when
run against Python3. This provides basic functionality. 

In an attempt to pass the t9822 P4 path-encoding test, a new parameter for
git P4 Clone was introduced. 

--encoding Format-identifier

This will create the GIT repository following the current functionality;
however, before importing the files from P4, it will set the
git-p4.pathEncoding option so any files or paths that are encoded with
non-ASCII/non-UTF-8 formats will import correctly.

Technical details: The script was updated by futurize (
https://python-future.org/futurize.html) to support Py2/Py3 syntax. The few
references to classes in future were reworked so that future would not be
required. The existing code test for Unicode support was extended to
normalize the classes “unicode” and “bytes” to across platforms:

 * ‘unicode’ is an alias for ‘str’ in Py3 and is the unicode class in Py2.
 * ‘bytes’ is bytes in Py3 and an alias for ‘str’ in Py2.

New coercion methods were written for both Python2 and Python3:

 * as_string(text) – In Python3, this encodes a bytes object as a UTF-8
   encoded Unicode string. 
 * as_bytes(text) – In Python3, this decodes a Unicode string to an array of
   bytes.

In Python2, these functions do not change the data since a ‘str’ object
function in both roles as strings and byte arrays. This reduces the
potential impact on backward compatibility with Python 2.

 * to_unicode(text) – ensures that the supplied data is encoded as a UTF-8
   string. This function will encode data in both Python2 and Python3. * 
      path_as_string(path) – This function is an extension function that
      honors the option “git-p4.pathEncoding” to convert a set of bytes or
      characters to UTF-8. If the str/bytes cannot decode as ASCII, it will
      use the encodeWithUTF8() method to convert the custom encoded bytes to
      Unicode in UTF-8.
   
   

Generally speaking, information in the script is converted to Unicode as
early as possible and converted back to a byte array just before passing to
external programs or files. The exception to this rule is P4 Repository file
paths.

Paths are not converted but left as “bytes” so the original file path
encoding can be preserved. This formatting is required for commands that
interact with the P4 file path. When the file path is used by GIT, it is
converted with encodeWithUTF8().

Signed-off-by: Ben Keene seraphire@gmail.com [seraphire@gmail.com]

Ben Keene (11):
  git-p4: select p4 binary by operating-system
  git-p4: change the expansion test from basestring to list
  git-p4: add new helper functions for python3 conversion
  git-p4: python3 syntax changes
  git-p4: Add new functions in preparation of usage
  git-p4: Fix assumed path separators to be more Windows friendly
  git-p4: Add a helper class for stream writing
  git-p4: p4CmdList  - support Unicode encoding
  git-p4: Add usability enhancements
  git-p4: Support python3 for basic P4 clone, sync, and submit
  git-p4: Added --encoding parameter to p4 clone

 Documentation/git-p4.txt        |   5 +
 git-p4.py                       | 690 ++++++++++++++++++++++++--------
 t/t9822-git-p4-path-encoding.sh | 101 +++++
 3 files changed, 629 insertions(+), 167 deletions(-)


base-commit: 228f53135a4a41a37b6be8e4d6e2b6153db4a8ed
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-463%2Fseraphire%2Fseraphire%2Fp4-python3-unicode-v4
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-463/seraphire/seraphire/p4-python3-unicode-v4
Pull-Request: https://github.com/gitgitgadget/git/pull/463

Range-diff vs v3:

  -:  ---------- >  1:  4012426993 git-p4: select p4 binary by operating-system
  -:  ---------- >  2:  0ef2f56b04 git-p4: change the expansion test from basestring to list
  -:  ---------- >  3:  f0e658b984 git-p4: add new helper functions for python3 conversion
  -:  ---------- >  4:  3c41db3e91 git-p4: python3 syntax changes
  -:  ---------- >  5:  1bf7b073b0 git-p4: Add new functions in preparation of usage
  -:  ---------- >  6:  8f5752c127 git-p4: Fix assumed path separators to be more Windows friendly
  -:  ---------- >  7:  10dc059444 git-p4: Add a helper class for stream writing
  -:  ---------- >  8:  e1a424a955 git-p4: p4CmdList  - support Unicode encoding
  -:  ---------- >  9:  4fc49313f0 git-p4: Add usability enhancements
  1:  02b3843e9f ! 10:  04a0aedbaa Python3 support for t9800 tests. Basic P4/Python3 support
     @@ -1,159 +1,60 @@
      Author: Ben Keene <seraphire@gmail.com>
      
     -    Python3 support for t9800 tests. Basic P4/Python3 support
     +    git-p4: Support python3 for basic P4 clone, sync, and submit
     +
     +    Issue: Python 3 is still not properly supported for any use with the git-p4 python code.
     +    Warning - this is a very large atomic commit.  The commit text is also very large.
     +
     +    Change the code such that, with the exception of P4 depot paths and depot files, all text read by git-p4 is cast as a string as soon as possible and converted back to bytes as late as possible, following Python2 to Python3 conversion best practices.
     +
     +    Important: Do not cast the bytes that contain the p4 depot path or p4 depot file name.  These should be left as bytes until used.
     +
     +    These two values should not be converted because the encoding of these values is unknown.  git-p4 supports a configuration value git-p4.pathEncoding that is used by the encodeWithUTF8()  to determine what a UTF8 version of the path and filename should be.  However, since depot path and depot filename need to be sent to P4 in their original encoding, they will be left as byte streams until they are actually used:
     +
     +    * When sent to P4, the bytes are literally passed to the p4 command
     +    * When displayed in text for the user, they should be passed through the path_as_string() function
     +    * When used by GIT they should be passed through the encodeWithUTF8() function
     +
     +    Change all the rest of system calls to cast output (stdin) as_bytes() and input (stdout) as_string().  This retains existing Python 2 support, and adds python 3 support for these functions:
     +    * read_pipe_full
     +    * read_pipe_lines
     +    * p4_has_move_command (used internally)
     +    * gitConfig
     +    * branch_exists
     +    * GitLFS.generatePointer
     +    * applyCommit - template must be read and written to the temporary file as_bytes() since it is created in memory as a string.
     +    * streamOneP4File(file, contents) - wrap calls to the depotFile in path_as_string() for display. The file contents must be retained as bytes, so update the RCS changes to be forced to bytes.
     +    * streamP4Files
     +    * importHeadRevision(revision) - encode the depotPaths for display separate from the text for processing.
     +
     +    Py23File usage -
     +    Change the P4Sync.OpenStreams() function to cast the gitOutput, gitStream, and gitError streams as Py23File() wrapper classes.  This facilitates taking strings in both python 2 and python 3 and casting them to bytes in the wrapper class instead of having to modify each method. Since the fast-import command also expects a raw byte stream for file content, add a new stream handle - gitStreamBytes which is an unwrapped verison of gitStream.
     +
     +    Literal text -
     +    Depending on context, most literal text does not need casting to unicode or bytes as the text is Python dependent - In python 2, the string is implied as 'str' and python 3 the string is implied as 'unicode'. Under these conditions, they match the rest of the operating text, following best practices.  However, when a literal string is used in functions that are dealing with the raw input from and raw ouput to files streams, literal bytes may be required. Additionally, functions that are dealing with P4 depot paths or P4 depot file names are also dealing with bytes and will require the same casting as bytes.  The following functions cast text as byte strings:
     +    * wildcard_decode(path) - the path parameter is a P4 depot and is bytes. Cast all the literals to bytes.
     +    * wildcard_encode(path) - the path parameter is a P4 depot and is bytes. Cast all the literals to bytes.
     +    * streamP4FilesCb(marshalled) - the marshalled data is in bytes. Cast the literals as bytes. When using this data to manipulate self.stream_file, encode all the marshalled data except for the 'depotFile' name.
     +    * streamP4Files
     +
     +    Special behavior:
     +    * p4_describe - encoding is disabled for the depotFile(x) and path elements since these are depot path and depo filenames.
     +    * p4PathStartsWith(path, prefix) - Since P4 depot paths can contain non-UTF-8 encoded strings, change this method to compare paths while supporting the optional encoding.
     +       - First, perform a byte-to-byte check to see if the path and prefix are both identical text.  There is no need to perform encoding conversions if the text is identical.
     +       - If the byte check fails, pass both the path and prefix through encodeWithUTF8() to ensure both paths are using the same encoding. Then perform the test as originally written.
     +    * patchRCSKeywords(file, pattern) - the parameters of file and pattern are both strings. However this function changes the contents of the file itentified by name "file". Treat the content of this file as binary to ensure that python does not accidently change the original encoding. The regular expression is cast as_bytes() and run against the file as_bytes(). The P4 keywords are ASCII strings and cannot span lines so iterating over each line of the file is acceptable.
     +    * writeToGitStream(gitMode, relPath, contents) - Since 'contents' is already bytes data, instead of using the self.gitStream, use the new self.gitStreamBytes - the unwrapped gitStream that does not cast as_bytes() the binary data.
     +    * commit(details, files, branch, parent = "", allow_empty=False) - Changed the encoding for the commit message to the preferred format for fast-import. The number of bytes is sent in the data block instead of using the EOT marker.
     +    * Change the code for handling the user cache to use binary files. Cast text as_bytes() when writing to the cache and as_string() when reading from the cache.  This makes the reading and writing of the cache determinstic in it's encoding. Unlike file paths, P4 encodes the user names in UTF-8 encoding so no additional string encoding is required.
      
          Signed-off-by: Ben Keene <seraphire@gmail.com>
     +    (cherry picked from commit 65ff0c74ebe62a200b4385ecfd4aa618ce091f48)
      
       diff --git a/git-p4.py b/git-p4.py
       --- a/git-p4.py
       +++ b/git-p4.py
      @@
     - import zlib
     - import ctypes
     - import errno
     -+import os.path
     -+import codecs
     -+import io
     - 
     - # support basestring in python3
     - try:
     -     unicode = unicode
     - except NameError:
     -     # 'unicode' is undefined, must be Python 3
     --    str = str
     -+    #
     -+    # For Python3 which is natively unicode, we will use 
     -+    # unicode for internal information but all P4 Data
     -+    # will remain in bytes
     -+    isunicode = True
     -     unicode = str
     -     bytes = bytes
     --    basestring = (str,bytes)
     -+
     -+    def as_string(text):
     -+        """Return a byte array as a unicode string"""
     -+        if text == None:
     -+            return None
     -+        if isinstance(text, bytes):
     -+            return unicode(text, "utf-8")
     -+        else:
     -+            return text
     -+
     -+    def as_bytes(text):
     -+        """Return a Unicode string as a byte array"""
     -+        if text == None:
     -+            return None
     -+        if isinstance(text, bytes):
     -+            return text
     -+        else:
     -+            return bytes(text, "utf-8")
     -+
     -+    def to_unicode(text):
     -+        """Return a byte array as a unicode string"""
     -+        return as_string(text)    
     -+
     -+    def path_as_string(path):
     -+        """ Converts a path to the UTF8 encoded string """
     -+        if isinstance(path, unicode):
     -+            return path
     -+        return encodeWithUTF8(path).decode('utf-8')
     -+    
     - else:
     -     # 'unicode' exists, must be Python 2
     --    str = str
     -+    #
     -+    # We will treat the data as:
     -+    #   str   -> str
     -+    #   bytes -> str
     -+    # So for Python2 these functions are no-ops
     -+    # and will leave the data in the ambiguious
     -+    # string/bytes state
     -+    isunicode = False
     -     unicode = unicode
     -     bytes = str
     --    basestring = basestring
     -+
     -+    def as_string(text):
     -+        """ Return text unaltered (for Python3 support) """
     -+        return text
     -+
     -+    def as_bytes(text):
     -+        """ Return text unaltered (for Python3 support) """
     -+        return text
     -+
     -+    def to_unicode(text):
     -+        """Return a string as a unicode string"""
     -+        return text.decode('utf-8')
     -+    
     -+    def path_as_string(path):
     -+        """ Converts a path to the UTF8 encoded bytes """
     -+        return encodeWithUTF8(path)
     -+
     -+
     -+ 
     -+# Check for raw_input support
     -+try:
     -+    raw_input
     -+except NameError:
     -+    raw_input = input
     - 
     - try:
     -     from subprocess import CalledProcessError
     -@@
     -     location. It means that hooking into the environment, or other configuration
     -     can be done more easily.
     -     """
     --    real_cmd = ["p4"]
     -+    # Look for the P4 binary
     -+    if (platform.system() == "Windows"):
     -+        real_cmd = ["p4.exe"]    
     -+    else:
     -+        real_cmd = ["p4"]
     - 
     -     user = gitConfig("git-p4.user")
     -     if len(user) > 0:
     -@@
     -         # Provide a way to not pass this option by setting git-p4.retries to 0
     -         real_cmd += ["-r", str(retries)]
     - 
     --    if isinstance(cmd,basestring):
     -+    if not isinstance(cmd, list):
     -         real_cmd = ' '.join(real_cmd) + ' ' + cmd
     -     else:
     -         real_cmd += cmd
     -@@
     -         sys.exit(1)
     - 
     - def write_pipe(c, stdin):
     -+    """Executes the command 'c', passing 'stdin' on the standard input"""
     -     if verbose:
     -         sys.stderr.write('Writing pipe: %s\n' % str(c))
     - 
     --    expand = isinstance(c,basestring)
     -+    expand = not isinstance(c, list)
     -     p = subprocess.Popen(c, stdin=subprocess.PIPE, shell=expand)
     -     pipe = p.stdin
     -     val = pipe.write(stdin)
     -@@
     -     if p.wait():
     -         die('Command failed: %s' % str(c))
     - 
     --    return val
     - 
     - def p4_write_pipe(c, stdin):
     -+    """ Runs a P4 command 'c', passing 'stdin' data to P4"""
     -     real_cmd = p4_build_cmd(c)
     --    return write_pipe(real_cmd, stdin)
     -+    write_pipe(real_cmd, stdin)
     - 
     - def read_pipe_full(c):
     -     """ Read output from  command. Returns a tuple
     -@@
     -     if verbose:
     -         sys.stderr.write('Reading pipe: %s\n' % str(c))
     - 
     --    expand = isinstance(c,basestring)
     -+    expand = not isinstance(c, list)
     +     expand = not isinstance(c, list)
           p = subprocess.Popen(c, stdout=subprocess.PIPE, stderr=subprocess.PIPE, shell=expand)
           (out, err) = p.communicate()
      +    out = as_string(out)
     @@ -179,10 +80,7 @@
           if verbose:
               sys.stderr.write('Reading pipe: %s\n' % str(c))
       
     --    expand = isinstance(c, basestring)
     -+    expand = not isinstance(c, list)
     -     p = subprocess.Popen(c, stdout=subprocess.PIPE, shell=expand)
     -     pipe = p.stdout
     +@@
           val = pipe.readlines()
           if pipe.close() or p.wait():
               die('Command failed: %s' % str(c))
     @@ -203,28 +101,6 @@
           # return code will be 1 in either case
           if err.find("Invalid option") >= 0:
               return False
     -@@
     -     return True
     - 
     - def system(cmd, ignore_error=False):
     --    expand = isinstance(cmd,basestring)
     -+    expand = not isinstance(cmd, list)
     -     if verbose:
     -         sys.stderr.write("executing %s\n" % str(cmd))
     -     retcode = subprocess.call(cmd, shell=expand)
     -@@
     -     return retcode
     - 
     - def p4_system(cmd):
     --    """Specifically invoke p4 as the system command. """
     -+    """ Specifically invoke p4 as the system command. 
     -+    """
     -     real_cmd = p4_build_cmd(cmd)
     --    expand = isinstance(real_cmd, basestring)
     -+    expand = not isinstance(real_cmd, list)
     -     retcode = subprocess.call(real_cmd, shell=expand)
     -     if retcode:
     -         raise CalledProcessError(retcode, real_cmd)
      @@
           return int(results[0]['change'])
       
     @@ -234,7 +110,7 @@
      -       results."""
      +    """ Returns information about the requested P4 change list.
      +
     -+        Data returns is not string encoded (returned as bytes)
     ++        Data returned is not string encoded (returned as bytes)
      +    """
      +    # Make sure it returns a valid result by checking for
      +    #   the presence of field "time".  Return a dict of the
     @@ -261,218 +137,29 @@
           if "time" not in d:
               die("p4 describe -s %d returned no \"time\": %s" % (change, str(d)))
       
     -+    # Convert depotFile(X) to be UTF-8 encoded, as this is what GIT
     -+    # requires. This will also allow us to encode the rest of the text
     -+    # at the same time to simplify textual processing later.
     ++    # Do not convert 'depotFile(X)' or 'path' to be UTF-8 encoded, however 
     ++    # cast as_string() the rest of the text. 
      +    keys=d.keys()
      +    for key in keys:
      +        if key.startswith('depotFile'):
     -+            d[key]=d[key] #DepotPath(d[key])
     ++            d[key]=d[key] 
      +        elif key == 'path':
     -+            d[key]=d[key] #DepotPath(d[key])
     ++            d[key]=d[key] 
      +        else:
      +            d[key] = as_string(d[key])
      +
           return d
       
     --#
     --# Canonicalize the p4 type and return a tuple of the
     --# base type, plus any modifiers.  See "p4 help filetypes"
     --# for a list and explanation.
     --#
     - def split_p4_type(p4type):
     --
     -+    """ Canonicalize the p4 type and return a tuple of the
     -+        base type, plus any modifiers.  See "p4 help filetypes"
     -+        for a list and explanation.
     -+    """
     -     p4_filetypes_historical = {
     -         "ctempobj": "binary+Sw",
     -         "ctext": "text+C",
     -@@
     -         mods = s[1]
     -     return (base, mods)
     - 
     --#
     --# return the raw p4 type of a file (text, text+ko, etc)
     --#
     - def p4_type(f):
     -+    """ return the raw p4 type of a file (text, text+ko, etc)
     -+    """
     -     results = p4CmdList(["fstat", "-T", "headType", wildcard_encode(f)])
     -     return results[0]['headType']
     - 
     --#
     --# Given a type base and modifier, return a regexp matching
     --# the keywords that can be expanded in the file
     --#
     - def p4_keywords_regexp_for_type(base, type_mods):
     -+    """ Given a type base and modifier, return a regexp matching
     -+        the keywords that can be expanded in the file
     -+    """
     -     if base in ("text", "unicode", "binary"):
     -         kwords = None
     -         if "ko" in type_mods:
     -@@
     -     else:
     -         return None
     - 
     --#
     --# Given a file, return a regexp matching the possible
     --# RCS keywords that will be expanded, or None for files
     --# with kw expansion turned off.
     --#
     - def p4_keywords_regexp_for_file(file):
     -+    """ Given a file, return a regexp matching the possible
     -+        RCS keywords that will be expanded, or None for files
     -+        with kw expansion turned off.
     -+    """
     -     if not os.path.exists(file):
     -         return None
     -     else:
     -@@
     - # Return the set of all p4 labels
     - def getP4Labels(depotPaths):
     -     labels = set()
     --    if isinstance(depotPaths,basestring):
     -+    if not isinstance(depotPaths, list):
     -         depotPaths = [depotPaths]
     - 
     -     for l in p4CmdList(["labels"] + ["%s..." % p for p in depotPaths]):
     -@@
     - 
     -     return labels
     - 
     --# Return the set of all git tags
     - def getGitTags():
     -+    """Return the set of all git tags"""
     -     gitTags = set()
     -     for line in read_pipe_lines(["git", "tag"]):
     -         tag = line.strip()
     -@@
     - 
     -     If the pattern is not matched, None is returned."""
     - 
     --    match = diffTreePattern().next().match(entry)
     -+    match = next(diffTreePattern()).match(entry)
     -     if match:
     -         return {
     -             'src_mode': match.group(1),
     -@@
     -     # otherwise False.
     -     return mode[-3:] == "755"
     - 
     -+def encodeWithUTF8(path, verbose = False):
     -+    """ Ensure that the path is encoded as a UTF-8 string
     -+
     -+        Returns bytes(P3)/str(P2)
     -+    """
     -+   
     -+    if isunicode:
     -+        try:
     -+            if isinstance(path, unicode):
     -+                # It is already unicode, cast it as a bytes
     -+                # that is encoded as utf-8.
     -+                return path.encode('utf-8', 'strict')
     -+            path.decode('ascii', 'strict')
     -+        except:
     -+            encoding = 'utf8'
     -+            if gitConfig('git-p4.pathEncoding'):
     -+                encoding = gitConfig('git-p4.pathEncoding')
     -+            path = path.decode(encoding, 'replace').encode('utf8', 'replace')
     -+            if verbose:
     -+                print('\nNOTE:Path with non-ASCII characters detected. Used %s to encode: %s ' % (encoding, to_unicode(path)))
     -+    else:    
     -+        try:
     -+            path.decode('ascii')
     -+        except:
     -+            encoding = 'utf8'
     -+            if gitConfig('git-p4.pathEncoding'):
     -+                encoding = gitConfig('git-p4.pathEncoding')
     -+            path = path.decode(encoding, 'replace').encode('utf8', 'replace')
     -+            if verbose:
     -+                print('Path with non-ASCII characters detected. Used %s to encode: %s ' % (encoding, path))
     -+    return path
     -+
     - class P4Exception(Exception):
     -     """ Base class for exceptions from the p4 client """
     -     def __init__(self, exit_code):
     -@@
     -     return isModeExec(src_mode) != isModeExec(dst_mode)
     - 
     - def p4CmdList(cmd, stdin=None, stdin_mode='w+b', cb=None, skip_info=False,
     --        errors_as_exceptions=False):
     -+        errors_as_exceptions=False, encode_data=True):
     -+    """ Executes a P4 command:  'cmd' optionally passing 'stdin' to the command's
     -+        standard input via a temporary file with 'stdin_mode' mode.
     -+
     -+        Output from the command is optionally passed to the callback function 'cb'.
     -+        If 'cb' is None, the response from the command is parsed into a list
     -+        of resulting dictionaries. (For each block read from the process pipe.)
     -+
     -+        If 'skip_info' is true, information in a block read that has a code type of
     -+        'info' will be skipped.
     - 
     --    if isinstance(cmd,basestring):
     -+        If 'errors_as_exceptions' is set to true (the default is false) the error
     -+        code returned from the execution will generate an exception.
     -+
     -+        If 'encode_data' is set to true (the default) the data that is returned 
     -+        by this function will be passed through the "as_string" function.
     -+    """
     -+
     -+    if not isinstance(cmd, list):
     -         cmd = "-G " + cmd
     -         expand = True
     -     else:
     -@@
     -     stdin_file = None
     -     if stdin is not None:
     -         stdin_file = tempfile.TemporaryFile(prefix='p4-stdin', mode=stdin_mode)
     --        if isinstance(stdin,basestring):
     -+        if not isinstance(stdin, list):
     -             stdin_file.write(stdin)
     -         else:
     -             for i in stdin:
     --                stdin_file.write(i + '\n')
     -+                stdin_file.write(as_bytes(i) + b'\n')
     -         stdin_file.flush()
     -         stdin_file.seek(0)
     - 
     -@@
     -         while True:
     -             entry = marshal.load(p4.stdout)
     -             if skip_info:
     --                if 'code' in entry and entry['code'] == 'info':
     -+                if b'code' in entry and entry[b'code'] == b'info':
     -                     continue
     -             if cb is not None:
     -                 cb(entry)
     -             else:
     --                result.append(entry)
     -+                out = {}
     -+                for key, value in entry.items():
     -+                    out[as_string(key)] = (as_string(value) if encode_data else value)
     -+                result.append(out)
     -     except EOFError:
     -         pass
     -     exitCode = p4.wait()
     + #
      @@
           return result
       
       def p4Cmd(cmd):
     -+    """ Executes a P4 command an returns the results in a dictionary"""
     ++    """ Executes a P4 command and returns the results in a dictionary
     ++    """
           list = p4CmdList(cmd)
           result = {}
           for entry in list:
     -@@
     -     return values
     - 
     - def gitBranchExists(branch):
     -+    """Checks to see if a given branch exists in the git repo"""
     -     proc = subprocess.Popen(["git", "rev-parse", branch],
     -                             stderr=subprocess.PIPE, stdout=subprocess.PIPE);
     -     return proc.wait() == 0;
      @@
       _gitConfig = {}
       
     @@ -490,29 +177,6 @@
           return _gitConfig[key]
       
       def gitConfigBool(key):
     --    """Return a bool, using git config --bool.  It is True only if the
     --       variable is set to true, and False if set to false or not present
     --       in the config."""
     --
     -+    """ Return a bool, using git config --bool.  It is True only if the
     -+        variable is set to true, and False if set to false or not present
     -+        in the config.
     -+    """
     -     if key not in _gitConfig:
     -         _gitConfig[key] = gitConfig(key, '--bool') == "true"
     -     return _gitConfig[key]
     -@@
     -             _gitConfig[key] = []
     -     return _gitConfig[key]
     - 
     -+def gitConfigSet(key, value):
     -+    """ Set the git configuration key 'key' to 'value' for this session
     -+    """
     -+    _gitConfig[key] = value
     -+
     - def p4BranchesInGit(branchesAreInRemotes=True):
     -     """Find all the branches whose names start with "p4/", looking
     -        in remotes or heads as specified by the argument.  Return
      @@
           cmd = [ "git", "rev-parse", "--symbolic", "--verify", branch ]
           p = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
     @@ -521,34 +185,6 @@
           if p.returncode:
               return False
           # expect exactly one line of output: the branch name
     -@@
     -     branches = p4BranchesInGit()
     -     # map from depot-path to branch name
     -     branchByDepotPath = {}
     --    for branch in branches.keys():
     -+    for branch in list(branches.keys()):
     -         tip = branches[branch]
     -         log = extractLogMessageFromGitCommit(tip)
     -         settings = extractSettingsGitLog(log)
     -@@
     -             system("git update-ref %s %s" % (remoteHead, originHead))
     - 
     - def originP4BranchesExist():
     --        return gitBranchExists("origin") or gitBranchExists("origin/p4") or gitBranchExists("origin/p4/master")
     -+    """Checks if origin/p4/master exists"""
     -+    return gitBranchExists("origin") or gitBranchExists("origin/p4") or gitBranchExists("origin/p4/master")
     - 
     - 
     - def p4ParseNumericChangeRange(parts):
     -@@
     -     changes = sorted(changes)
     -     return changes
     - 
     --def p4PathStartsWith(path, prefix):
     -+def p4PathStartsWith(path, prefix, verbose = False):
     -     # This method tries to remedy a potential mixed-case issue:
     -     #
     -     # If UserA adds  //depot/DirA/file1
      @@
           #
           # we may or may not have a problem. If you have core.ignorecase=true,
     @@ -574,15 +210,6 @@
       
       def getClientSpec():
           """Look at the p4 client spec, create a View() object that contains
     -@@
     -     client_name = entry["Client"]
     - 
     -     # just the keys that start with "View"
     --    view_keys = [ k for k in entry.keys() if k.startswith("View") ]
     -+    view_keys = [ k for k in list(entry.keys()) if k.startswith("View") ]
     - 
     -     # hold this new View
     -     view = View(client_name)
      @@
           # Cannot have * in a filename in windows; untested as to
           # what p4 would do in such a case.
     @@ -626,45 +253,16 @@
                   os.remove(contentFile)
                   die('git-lfs pointer command failed. Did you install the extension?')
      @@
     -         else:
     -             return LargeFileSystem.processContent(self, git_mode, relPath, contents)
     - 
     --class Command:
     -+class Command(object):
     -     delete_actions = ( "delete", "move/delete", "purge" )
     -     add_actions = ( "add", "branch", "move/add" )
     - 
     -@@
     -             setattr(self, attr, value)
     -         return getattr(self, attr)
     - 
     --class P4UserMap:
     -+class P4UserMap(object):
     -     def __init__(self):
     -         self.userMapFromPerforceServer = False
     -         self.myP4UserId = None
     -@@
     -             return True
     - 
     -     def getUserCacheFilename(self):
     -+        """ Returns the filename of the username cache """
     -         home = os.environ.get("HOME", os.environ.get("USERPROFILE"))
     --        return home + "/.gitp4-usercache.txt"
     -+        return os.path.join(home, ".gitp4-usercache.txt")
     +         return os.path.join(home, ".gitp4-usercache.txt")
       
           def getUserMapFromPerforceServer(self):
      +        """ Creates the usercache from the data in P4.
      +        """
     -+        
               if self.userMapFromPerforceServer:
                   return
               self.users = {}
      @@
     -                 self.emails[email] = user
     - 
     -         s = ''
     --        for (key, val) in self.users.items():
     -+        for (key, val) in list(self.users.items()):
     +         for (key, val) in list(self.users.items()):
                   s += "%s\t%s\n" % (key.expandtabs(1), val.expandtabs(1))
       
      -        open(self.getUserCacheFilename(), "wb").write(s)
     @@ -674,7 +272,8 @@
               self.userMapFromPerforceServer = True
       
           def loadUserMapFromCache(self):
     -+        """ Reads the P4 username to git email map """
     ++        """ Reads the P4 username to git email map 
     ++        """
               self.users = {}
               self.userMapFromPerforceServer = False
               try:
     @@ -721,80 +320,6 @@
                   # cleanup our temporary file
                   os.unlink(outFileName)
                   print("Failed to strip RCS keywords in %s" % file)
     -@@
     -                 break
     -         if not change_entry:
     -             die('Failed to decode output of p4 change -o')
     --        for key, value in change_entry.iteritems():
     -+        for key, value in list(change_entry.items()):
     -             if key.startswith('File'):
     -                 if 'depot-paths' in settings:
     -                     if not [p for p in settings['depot-paths']
     --                            if p4PathStartsWith(value, p)]:
     -+                            if p4PathStartsWith(value, p, self.verbose)]:
     -                         continue
     -                 else:
     --                    if not p4PathStartsWith(value, self.depotPath):
     -+                    if not p4PathStartsWith(value, self.depotPath, self.verbose):
     -                         continue
     -                 files_list.append(value)
     -                 continue
     -@@
     -             return True
     - 
     -         while True:
     --            response = raw_input("Submit template unchanged. Submit anyway? [y]es, [n]o (skip this patch) ")
     -+            response = raw_input("Submit template unchanged. Submit anyway? [y]es, [n]o (skip this patch) ").lower() \
     -+                .strip()[0]
     -             if response == 'y':
     -                 return True
     -             if response == 'n':
     -@@
     -     def applyCommit(self, id):
     -         """Apply one commit, return True if it succeeded."""
     - 
     --        print("Applying", read_pipe(["git", "show", "-s",
     --                                     "--format=format:%h %s", id]))
     -+        print(("Applying", read_pipe(["git", "show", "-s",
     -+                                     "--format=format:%h %s", id])))
     - 
     -         (p4User, gitEmail) = self.p4UserForCommit(id)
     - 
     -@@
     -                     # disable the read-only bit on windows.
     -                     if self.isWindows and file not in editedFiles:
     -                         os.chmod(file, stat.S_IWRITE)
     --                    self.patchRCSKeywords(file, kwfiles[file])
     --                    fixed_rcs_keywords = True
     -+                    
     -+                    try:
     -+                        self.patchRCSKeywords(file, kwfiles[file])
     -+                        fixed_rcs_keywords = True
     -+                    except:
     -+                        # We are throwing an exception, undo all open edits
     -+                        for f in editedFiles:
     -+                            p4_revert(f)
     -+                        raise
     -+            else:
     -+                # They do not have attemptRCSCleanup set, this might be the fail point
     -+                # Check to see if the file has RCS keywords and suggest setting the property.
     -+                for file in editedFiles | filesToDelete:
     -+                    if p4_keywords_regexp_for_file(file) != None:
     -+                        print("At least one file in this commit has RCS Keywords that may be causing problems. ")
     -+                        print("Consider:\ngit config git-p4.attemptRCSCleanup true")
     -+                        break
     - 
     -             if fixed_rcs_keywords:
     -                 print("Retrying the patch with RCS keywords cleaned up")
     -@@
     -             p4_delete(f)
     - 
     -         # Set/clear executable bits
     --        for f in filesToChangeExecBit.keys():
     -+        for f in list(filesToChangeExecBit.keys()):
     -             mode = filesToChangeExecBit[f]
     -             setP4ExecBit(f, mode)
     - 
      @@
               tmpFile = os.fdopen(handle, "w+b")
               if self.isWindows:
     @@ -815,179 +340,6 @@
       
                       if update_shelve:
                           p4_write_pipe(['shelve', '-r', '-i'], submitTemplate)
     -@@
     -                 if verbose:
     -                     print("created p4 label for tag %s" % name)
     - 
     -+    def run_hook(self, hook_name, args = []):
     -+        """ Runs a hook if it is found.
     -+
     -+            Returns NONE if the hook does not exist
     -+            Returns TRUE if the exit code is 0, FALSE for a non-zero exit code.
     -+        """
     -+        hook_file = self.find_hook(hook_name)
     -+        if hook_file == None:
     -+            if self.verbose:
     -+                print("Skipping hook: %s" % hook_name)
     -+            return None
     -+
     -+        if self.verbose:
     -+            print("hooks_path = %s " % hooks_path)
     -+            print("hook_file = %s " % hook_file)
     -+
     -+        # Run the hook
     -+        # TODO - allow non-list format
     -+        cmd = [hook_file] + args
     -+        return subprocess.call(cmd) == 0
     -+
     -+    def find_hook(self, hook_name):
     -+        """ Locates the hook file for the given operating system.
     -+        """
     -+        hooks_path = gitConfig("core.hooksPath")
     -+        if len(hooks_path) <= 0:
     -+            hooks_path = os.path.join(os.environ.get("GIT_DIR", ".git"), "hooks")
     -+
     -+        # Look in the obvious place
     -+        hook_file = os.path.join(hooks_path, hook_name)
     -+        if os.path.isfile(hook_file) and os.access(hook_file, os.X_OK):
     -+            return hook_file
     -+
     -+        # if we are windows, we will also allow them to have the hooks have extensions
     -+        if (platform.system() == "Windows"):
     -+            for ext in ['.exe', '.bat', 'ps1']:
     -+                if os.path.isfile(hook_file + ext) and os.access(hook_file + ext, os.X_OK):
     -+                    return hook_file + ext
     -+
     -+        # We didn't find the file
     -+        return None
     -+
     -+
     -+
     -     def run(self, args):
     -         if len(args) == 0:
     -             self.master = currentGitBranch()
     -@@
     -             self.clientSpecDirs = getClientSpec()
     - 
     -         # Check for the existence of P4 branches
     --        branchesDetected = (len(p4BranchesInGit().keys()) > 1)
     -+        branchesDetected = (len(list(p4BranchesInGit().keys())) > 1)
     - 
     -         if self.useClientSpec and not branchesDetected:
     -             # all files are relative to the client spec
     -@@
     -             sys.exit("number of commits (%d) must match number of shelved changelist (%d)" %
     -                      (len(commits), num_shelves))
     - 
     --        hooks_path = gitConfig("core.hooksPath")
     --        if len(hooks_path) <= 0:
     --            hooks_path = os.path.join(os.environ.get("GIT_DIR", ".git"), "hooks")
     --
     --        hook_file = os.path.join(hooks_path, "p4-pre-submit")
     --        if os.path.isfile(hook_file) and os.access(hook_file, os.X_OK) and subprocess.call([hook_file]) != 0:
     -+        rtn = self.run_hook("p4-pre-submit")
     -+        if rtn == False:
     -             sys.exit(1)
     - 
     -         #
     -@@
     -         last = len(commits) - 1
     -         for i, commit in enumerate(commits):
     -             if self.dry_run:
     --                print(" ", read_pipe(["git", "show", "-s",
     --                                      "--format=format:%h %s", commit]))
     -+                print((" ", read_pipe(["git", "show", "-s",
     -+                                      "--format=format:%h %s", commit])))
     -                 ok = True
     -             else:
     -                 ok = self.applyCommit(commit)
     -@@
     -                         if self.conflict_behavior == "ask":
     -                             print("What do you want to do?")
     -                             response = raw_input("[s]kip this commit but apply"
     --                                                 " the rest, or [q]uit? ")
     -+                                                 " the rest, or [q]uit? ").lower().strip()[0]
     -                             if not response:
     -                                 continue
     -                         elif self.conflict_behavior == "skip":
     -@@
     -                         star = "*"
     -                     else:
     -                         star = " "
     --                    print(star, read_pipe(["git", "show", "-s",
     --                                           "--format=format:%h %s",  c]))
     -+                    print((star, read_pipe(["git", "show", "-s",
     -+                                           "--format=format:%h %s",  c])))
     -                 print("You will have to do 'git p4 sync' and rebase.")
     - 
     -         if gitConfigBool("git-p4.exportLabels"):
     -@@
     -     # ("-//depot/A/..." becomes "/depot/A/..." after option parsing)
     -     parser.values.cloneExclude += ["/" + re.sub(r"\.\.\.$", "", value)]
     - 
     -+
     - class P4Sync(Command, P4UserMap):
     - 
     -     def __init__(self):
     -@@
     -         self.knownBranches = {}
     -         self.initialParents = {}
     - 
     --        self.tz = "%+03d%02d" % (- time.timezone / 3600, ((- time.timezone % 3600) / 60))
     -+        self.tz = "%+03d%02d" % (- time.timezone // 3600, ((- time.timezone % 3600) // 60))
     -         self.labels = {}
     - 
     -     # Force a checkpoint in fast-import and wait for it to finish
     -@@
     -     def isPathWanted(self, path):
     -         for p in self.cloneExclude:
     -             if p.endswith("/"):
     --                if p4PathStartsWith(path, p):
     -+                if p4PathStartsWith(path, p, self.verbose):
     -                     return False
     -             # "-//depot/file1" without a trailing "/" should only exclude "file1", but not "file111" or "file1_dir/file2"
     -             elif path.lower() == p.lower():
     -                 return False
     -         for p in self.depotPaths:
     --            if p4PathStartsWith(path, p):
     -+            if p4PathStartsWith(path, p, self.verbose):
     -                 return True
     -         return False
     - 
     -     def extractFilesFromCommit(self, commit, shelved=False, shelved_cl = 0):
     -+        """ Generates the list of files to be added in this git commit.
     -+
     -+            commit     = Unicode[] - data read from the P4 commit
     -+            shelved    = Bool      - Is the P4 commit flagged as being shelved.
     -+            shelved_cl = Unicode   - Numeric string with the changelist number.
     -+        """
     -         files = []
     -         fnum = 0
     -         while "depotFile%s" % fnum in commit:
     -@@
     -             path = self.clientSpecDirs.map_in_client(path)
     -             if self.detectBranches:
     -                 for b in self.knownBranches:
     --                    if p4PathStartsWith(path, b + "/"):
     -+                    if p4PathStartsWith(path, b + "/", self.verbose):
     -                         path = path[len(b)+1:]
     - 
     -         elif self.keepRepoPath:
     -@@
     -             # //depot/; just look at first prefix as they all should
     -             # be in the same depot.
     -             depot = re.sub("^(//[^/]+/).*", r'\1', prefixes[0])
     --            if p4PathStartsWith(path, depot):
     -+            if p4PathStartsWith(path, depot, self.verbose):
     -                 path = path[len(depot):]
     - 
     -         else:
     -             for p in prefixes:
     --                if p4PathStartsWith(path, p):
     -+                if p4PathStartsWith(path, p, self.verbose):
     -                     path = path[len(p):]
     -                     break
     - 
      @@
               return path
       
     @@ -1002,19 +354,6 @@
       
               if self.clientSpecDirs:
                   files = self.extractFilesFromCommit(commit)
     -@@
     -             else:
     -                 relPath = self.stripRepoPath(path, self.depotPaths)
     - 
     --            for branch in self.knownBranches.keys():
     -+            for branch in list(self.knownBranches.keys()):
     -                 # add a trailing slash so that a commit into qt/4.2foo
     -                 # doesn't end up in qt/4.2, e.g.
     --                if p4PathStartsWith(relPath, branch + "/"):
     -+                if p4PathStartsWith(relPath, branch + "/", self.verbose):
     -                     if branch not in branches:
     -                         branches[branch] = []
     -                     branches[branch].append(file)
      @@
               return branches
       
     @@ -1031,18 +370,6 @@
      +            self.gitStreamBytes.write(d)
               self.gitStream.write('\n')
       
     --    def encodeWithUTF8(self, path):
     --        try:
     --            path.decode('ascii')
     --        except:
     --            encoding = 'utf8'
     --            if gitConfig('git-p4.pathEncoding'):
     --                encoding = gitConfig('git-p4.pathEncoding')
     --            path = path.decode(encoding, 'replace').encode('utf8', 'replace')
     --            if self.verbose:
     --                print('Path with non-ASCII characters detected. Used %s to encode: %s ' % (encoding, path))
     --        return path
     --
      -    # output one file from the P4 stream
      -    # - helper for streamP4Files
      -
     @@ -1053,18 +380,13 @@
      +            contents should be a bytes (bytes) 
      +        """
               relPath = self.stripRepoPath(file['depotFile'], self.branchPrefixes)
     --        relPath = self.encodeWithUTF8(relPath)
     -+        relPath = encodeWithUTF8(relPath, self.verbose)
     +         relPath = encodeWithUTF8(relPath, self.verbose)
               if verbose:
     -             if 'fileSize' in self.stream_file:
     +@@
                       size = int(self.stream_file['fileSize'])
                   else:
                       size = 0 # deleted files don't get a fileSize apparently
     --            sys.stdout.write('\r%s --> %s (%i MB)\n' % (file['depotFile'], relPath, size/1024/1024))
     -+            #if isunicode:
     -+            #    sys.stdout.write('\r%s --> %s (%i MB)\n' % (path_as_string(file['depotFile']), to_unicode(relPath), size//1024//1024))
     -+            #else:
     -+            #    sys.stdout.write('\r%s --> %s (%i MB)\n' % (path_as_string(file['depotFile']), relPath, size//1024//1024))
     +-            sys.stdout.write('\r%s --> %s (%i MB)\n' % (file['depotFile'], relPath, size//1024//1024))
      +            sys.stdout.write('\r%s --> %s (%i MB)\n' % (path_as_string(file['depotFile']), as_string(relPath), size//1024//1024))
                   sys.stdout.flush()
       
     @@ -1100,15 +422,6 @@
       
               if self.largeFileSystem:
      @@
     - 
     -     def streamOneP4Deletion(self, file):
     -         relPath = self.stripRepoPath(file['path'], self.branchPrefixes)
     --        relPath = self.encodeWithUTF8(relPath)
     -+        relPath = encodeWithUTF8(relPath, self.verbose)
     -         if verbose:
     -             sys.stdout.write("delete %s\n" % relPath)
     -             sys.stdout.flush()
     -@@
               if self.largeFileSystem and self.largeFileSystem.isLargeFile(relPath):
                   self.largeFileSystem.removeLargeFile(relPath)
       
     @@ -1133,13 +446,6 @@
       
               if not err and 'fileSize' in self.stream_file:
                   required_bytes = int((4 * int(self.stream_file["fileSize"])) - calcDiskFree())
     -             if required_bytes > 0:
     -                 err = 'Not enough space left on %s! Free at least %i MB.' % (
     --                    os.getcwd(), required_bytes/1024/1024
     -+                    os.getcwd(), required_bytes//1024//1024
     -                 )
     - 
     -         if err:
      @@
                   # ignore errors, but make sure it exits first
                   self.importProcess.wait()
     @@ -1155,12 +461,10 @@
                   self.streamOneP4File(self.stream_file, self.stream_contents)
                   self.stream_file = {}
      @@
     - 
               # pick up the new file information... for the
               # 'data' field we need to append to our array
     --        for k in marshalled.keys():
     +         for k in list(marshalled.keys()):
      -            if k == 'data':
     -+        for k in list(marshalled.keys()):
      +            if k == b'data':
                       if 'streamContentSize' not in self.stream_file:
                           self.stream_file['streamContentSize'] = 0
     @@ -1178,12 +482,10 @@
               if (verbose and
                   'streamContentSize' in self.stream_file and
      @@
     -             'depotFile' in self.stream_file):
                   size = int(self.stream_file["fileSize"])
                   if size > 0:
     --                progress = 100*self.stream_file['streamContentSize']/size
     --                sys.stdout.write('\r%s %d%% (%i MB)' % (self.stream_file['depotFile'], progress, int(size/1024/1024)))
     -+                progress = 100.0*self.stream_file['streamContentSize']/size
     +                 progress = 100.0*self.stream_file['streamContentSize']/size
     +-                sys.stdout.write('\r%s %4.1f%% (%i MB)' % (self.stream_file['depotFile'], progress, int(size//1024//1024)))
      +                sys.stdout.write('\r%s %4.1f%% (%i MB)' % (path_as_string(self.stream_file['depotFile']), progress, int(size//1024//1024)))
                       sys.stdout.flush()
       
     @@ -1227,24 +529,6 @@
       
               if verbose:
      @@
     - 
     -         gitStream.write("tagger %s\n" % tagger)
     - 
     --        print("labelDetails=",labelDetails)
     -+        print(("labelDetails=",labelDetails))
     -         if 'Description' in labelDetails:
     -             description = labelDetails['Description']
     -         else:
     -@@
     -         if not self.branchPrefixes:
     -             return True
     -         hasPrefix = [p for p in self.branchPrefixes
     --                        if p4PathStartsWith(path, p)]
     -+                        if p4PathStartsWith(path, p, self.verbose)]
     -         if not hasPrefix and self.verbose:
     -             print('Ignoring file outside of prefix: {0}'.format(path))
     -         return hasPrefix
     -@@
                       .format(details['change']))
                   return
       
     @@ -1307,58 +591,6 @@
       
               if len(parent) > 0:
                   if self.verbose:
     -@@
     -             self.labels[newestChange] = [output, revisions]
     - 
     -         if self.verbose:
     --            print("Label changes: %s" % self.labels.keys())
     -+            print("Label changes: %s" % list(self.labels.keys()))
     - 
     -     # Import p4 labels as git tags. A direct mapping does not
     -     # exist, so assume that if all the files are at the same revision
     -@@
     -                 source = paths[0]
     -                 destination = paths[1]
     -                 ## HACK
     --                if p4PathStartsWith(source, self.depotPaths[0]) and p4PathStartsWith(destination, self.depotPaths[0]):
     -+                if p4PathStartsWith(source, self.depotPaths[0], self.verbose) and p4PathStartsWith(destination, self.depotPaths[0], self.verbose):
     -                     source = source[len(self.depotPaths[0]):-4]
     -                     destination = destination[len(self.depotPaths[0]):-4]
     - 
     -@@
     - 
     -     def getBranchMappingFromGitBranches(self):
     -         branches = p4BranchesInGit(self.importIntoRemotes)
     --        for branch in branches.keys():
     -+        for branch in list(branches.keys()):
     -             if branch == "master":
     -                 branch = "main"
     -             else:
     -@@
     -             self.updateOptionDict(description)
     - 
     -             if not self.silent:
     --                sys.stdout.write("\rImporting revision %s (%s%%)" % (change, cnt * 100 / len(changes)))
     -+                sys.stdout.write("\rImporting revision %s (%4.1f%%)" % (change, cnt * 100 / len(changes)))
     -                 sys.stdout.flush()
     -             cnt = cnt + 1
     - 
     -             try:
     -                 if self.detectBranches:
     -                     branches = self.splitFilesIntoBranches(description)
     --                    for branch in branches.keys():
     -+                    for branch in list(branches.keys()):
     -                         ## HACK  --hwn
     -                         branchPrefix = self.depotPaths[0] + branch + "/"
     -                         self.branchPrefixes = [ branchPrefix ]
     -@@
     -                 sys.exit(1)
     - 
     -     def sync_origin_only(self):
     -+        """ Ensures that the origin has been synchronized if one is set """
     -         if self.syncWithOrigin:
     -             self.hasOrigin = originP4BranchesExist()
     -             if self.hasOrigin:
      @@
                       system("git fetch origin")
       
     @@ -1439,61 +671,6 @@
           def closeStreams(self):
               self.gitStream.close()
      @@
     -                 if short in branches:
     -                     self.p4BranchesInGit = [ short ]
     -             else:
     --                self.p4BranchesInGit = branches.keys()
     -+                self.p4BranchesInGit = list(branches.keys())
     - 
     -             if len(self.p4BranchesInGit) > 1:
     -                 if not self.silent:
     -                     print("Importing from/into multiple branches")
     -                 self.detectBranches = True
     --                for branch in branches.keys():
     -+                for branch in list(branches.keys()):
     -                     self.initialParents[self.refPrefix + branch] = \
     -                         branches[branch]
     - 
     -@@
     -                                  help="where to leave result of the clone"),
     -             optparse.make_option("--bare", dest="cloneBare",
     -                                  action="store_true", default=False),
     -+            optparse.make_option("--encoding", dest="setPathEncoding",
     -+                                 action="store", default=None,
     -+                                 help="Sets the path encoding for this depot")
     -         ]
     -         self.cloneDestination = None
     -         self.needsGit = False
     -         self.cloneBare = False
     -+        self.setPathEncoding = None
     - 
     -     def defaultDestination(self, args):
     -+        """Returns the last path component as the default git 
     -+        repository directory name"""
     -         ## TODO: use common prefix of args?
     -         depotPath = args[0]
     -         depotDir = re.sub("(@[^@]*)$", "", depotPath)
     -         depotDir = re.sub("(#[^#]*)$", "", depotDir)
     -         depotDir = re.sub(r"\.\.\.$", "", depotDir)
     -         depotDir = re.sub(r"/$", "", depotDir)
     --        return os.path.split(depotDir)[1]
     -+        return depotDir.split('/')[-1]
     - 
     -     def run(self, args):
     -         if len(args) < 1:
     -@@
     - 
     -         depotPaths = args
     - 
     -+        # If we have an encoding provided, ignore what may already exist
     -+        # in the registry. This will ensure we show the displayed values
     -+        # using the correct encoding.
     -+        if self.setPathEncoding:
     -+            gitConfigSet("git-p4.pathEncoding", self.setPathEncoding)
     -+
     -+        # If more than 1 path element is supplied, the last element
     -+        # is the clone destination.
     -         if not self.cloneDestination and len(depotPaths) > 1:
                   self.cloneDestination = depotPaths[-1]
                   depotPaths = depotPaths[:-1]
       
     @@ -1512,177 +689,3 @@
       
               if not os.path.exists(self.cloneDestination):
                   os.makedirs(self.cloneDestination)
     -@@
     -         if retcode:
     -             raise CalledProcessError(retcode, init_cmd)
     - 
     -+        # Set the encoding if it was provided command line
     -+        if self.setPathEncoding:
     -+            init_cmd= ["git", "config", "git-p4.pathEncoding", self.setPathEncoding]
     -+            retcode = subprocess.call(init_cmd)
     -+            if retcode:
     -+                raise CalledProcessError(retcode, init_cmd)
     -+
     -         if not P4Sync.run(self, depotPaths):
     -             return False
     - 
     -@@
     -             to find the P4 commit we are based on, and the depot-paths.
     -         """
     - 
     --        for parent in (range(65535)):
     -+        for parent in (list(range(65535))):
     -             log = extractLogMessageFromGitCommit("{0}^{1}".format(starting_point, parent))
     -             settings = extractSettingsGitLog(log)
     -             if 'change' in settings:
     -@@
     -             print("%s <= %s (%s)" % (branch, ",".join(settings["depot-paths"]), settings["change"]))
     -         return True
     - 
     -+class Py23File():
     -+    """ Python2/3 Unicode File Wrapper 
     -+    """
     -+    
     -+    stream_handle = None
     -+    verbose       = False
     -+    debug_handle  = None
     -+   
     -+    def __init__(self, stream_handle, verbose = False,
     -+                 debug_handle = None):
     -+        """ Create a Python3 compliant Unicode to Byte String
     -+            Windows compatible wrapper
     -+
     -+            stream_handle = the underlying file-like handle
     -+            verbose       = Boolean if content should be echoed
     -+            debug_handle  = A file-like handle data is duplicately written to
     -+        """
     -+        self.stream_handle = stream_handle
     -+        self.verbose       = verbose
     -+        self.debug_handle  = debug_handle
     -+
     -+    def write(self, utf8string):
     -+        """ Writes the utf8 encoded string to the underlying 
     -+            file stream
     -+        """
     -+        self.stream_handle.write(as_bytes(utf8string))
     -+        if self.verbose:
     -+            sys.stderr.write("Stream Output: %s" % utf8string)
     -+            sys.stderr.flush()
     -+        if self.debug_handle:
     -+            self.debug_handle.write(as_bytes(utf8string))
     -+
     -+    def read(self, size = None):
     -+        """ Reads int charcters from the underlying stream 
     -+            and converts it to utf8.
     -+
     -+            Be aware, the size value is for reading the underlying
     -+            bytes so the value may be incorrect. Usage of the size
     -+            value is discouraged.
     -+        """
     -+        if size == None:
     -+            return as_string(self.stream_handle.read())
     -+        else:
     -+            return as_string(self.stream_handle.read(size))
     -+
     -+    def readline(self):
     -+        """ Reads a line from the underlying byte stream 
     -+            and converts it to utf8
     -+        """
     -+        return as_string(self.stream_handle.readline())
     -+
     -+    def readlines(self, sizeHint = None):
     -+        """ Returns a list containing lines from the file converted to unicode.
     -+
     -+            sizehint - Optional. If the optional sizehint argument is 
     -+            present, instead of reading up to EOF, whole lines totalling 
     -+            approximately sizehint bytes are read.
     -+        """
     -+        lines = self.stream_handle.readlines(sizeHint)
     -+        for i in range(0, len(lines)):
     -+            lines[i] = as_string(lines[i])
     -+        return lines
     -+
     -+    def close(self):
     -+        """ Closes the underlying byte stream """
     -+        self.stream_handle.close()
     -+
     -+    def flush(self):
     -+        """ Flushes the underlying byte stream """
     -+        self.stream_handle.flush()
     -+
     -+class DepotPath():
     -+    """ Describes a DepotPath or File
     -+    """
     -+
     -+    raw_path = None
     -+    utf8_path = None
     -+    bytes_path = None
     -+
     -+    def __init__(self, path):
     -+        """ Creates a new DepotPath with the path encoded
     -+            with by the P4 repository
     -+        """
     -+        raw_path = path
     -+
     -+    def raw():
     -+        """ Returns the path as it was originally found
     -+            in the P4 repository
     -+        """
     -+        return raw_path
     -+
     -+    def startswith(self, prefix, start = None, end = None):
     -+        """ Return True if string starts with the prefix, otherwise 
     -+            return False. prefix can also be a tuple of prefixes to 
     -+            look for. With optional start, test string beginning at 
     -+            that position. With optional end, stop comparing 
     -+            string at that position.
     -+        """
     -+        return raw_path.startswith(prefix, start, end)
     -+
     -+
     - class HelpFormatter(optparse.IndentedHelpFormatter):
     -     def __init__(self):
     -         optparse.IndentedHelpFormatter.__init__(self)
     -@@
     - 
     - def main():
     -     if len(sys.argv[1:]) == 0:
     --        printUsage(commands.keys())
     -+        printUsage(list(commands.keys()))
     -         sys.exit(2)
     - 
     -     cmdName = sys.argv[1]
     -@@
     -     except KeyError:
     -         print("unknown command %s" % cmdName)
     -         print("")
     --        printUsage(commands.keys())
     -+        printUsage(list(commands.keys()))
     -         sys.exit(2)
     - 
     -     options = cmd.options
     -@@
     -                                    description = cmd.description,
     -                                    formatter = HelpFormatter())
     - 
     --    (cmd, args) = parser.parse_args(sys.argv[2:], cmd);
     -+    try:
     -+        (cmd, args) = parser.parse_args(sys.argv[2:], cmd);
     -+    except:
     -+        parser.print_help()
     -+        raise
     -+
     -     global verbose
     -     verbose = cmd.verbose
     -     if cmd.needsGit:
     -@@
     -                         chdir(cdup);
     - 
     -         if not isValidGitDir(cmd.gitdir):
     --            if isValidGitDir(cmd.gitdir + "/.git"):
     --                cmd.gitdir += "/.git"
     -+            if isValidGitDir(os.path.join(cmd.gitdir, ".git")):
     -+                cmd.gitdir = os.path.join(cmd.gitdir, ".git")
     -             else:
     -                 die("fatal: cannot locate git repository at %s" % cmd.gitdir)
     - 
  -:  ---------- > 11:  883ef45ca5 git-p4: Added --encoding parameter to p4 clone

-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 64+ messages in thread

* [PATCH v4 01/11] git-p4: select p4 binary by operating-system
  2019-12-04 22:29     ` [PATCH v4 00/11] git-p4.py: Cast byte strings to unicode strings in python3 Ben Keene via GitGitGadget
@ 2019-12-04 22:29       ` Ben Keene via GitGitGadget
  2019-12-05 10:19         ` Denton Liu
  2019-12-04 22:29       ` [PATCH v4 02/11] git-p4: change the expansion test from basestring to list Ben Keene via GitGitGadget
                         ` (11 subsequent siblings)
  12 siblings, 1 reply; 64+ messages in thread
From: Ben Keene via GitGitGadget @ 2019-12-04 22:29 UTC (permalink / raw)
  To: git; +Cc: Ben Keene, Junio C Hamano, Ben Keene

From: Ben Keene <seraphire@gmail.com>

Depending on the version of GIT and Python installed, the perforce program (p4) may not resolve on Windows without the program extension.

Check the operating system (platform.system) and if it is reporting that it is Windows, use the full filename of "p4.exe" instead of "p4"

The original code unconditionally used "p4" as the binary filename.

This change is Python2 and Python3 compatible.

Thanks to: Junio C Hamano <gitster@pobox.com> and  Denton Liu <liu.denton@gmail.com> for patiently explaining proper format for my submissions.

Signed-off-by: Ben Keene <seraphire@gmail.com>
(cherry picked from commit 9a3a5c4e6d29dbef670072a9605c7a82b3729434)
---
 git-p4.py | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/git-p4.py b/git-p4.py
index 60c73b6a37..b2ffbc057b 100755
--- a/git-p4.py
+++ b/git-p4.py
@@ -75,7 +75,11 @@ def p4_build_cmd(cmd):
     location. It means that hooking into the environment, or other configuration
     can be done more easily.
     """
-    real_cmd = ["p4"]
+    # Look for the P4 binary
+    if (platform.system() == "Windows"):
+        real_cmd = ["p4.exe"]    
+    else:
+        real_cmd = ["p4"]
 
     user = gitConfig("git-p4.user")
     if len(user) > 0:
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 64+ messages in thread

* [PATCH v4 02/11] git-p4: change the expansion test from basestring to list
  2019-12-04 22:29     ` [PATCH v4 00/11] git-p4.py: Cast byte strings to unicode strings in python3 Ben Keene via GitGitGadget
  2019-12-04 22:29       ` [PATCH v4 01/11] git-p4: select p4 binary by operating-system Ben Keene via GitGitGadget
@ 2019-12-04 22:29       ` Ben Keene via GitGitGadget
  2019-12-05 10:27         ` Denton Liu
  2019-12-04 22:29       ` [PATCH v4 03/11] git-p4: add new helper functions for python3 conversion Ben Keene via GitGitGadget
                         ` (10 subsequent siblings)
  12 siblings, 1 reply; 64+ messages in thread
From: Ben Keene via GitGitGadget @ 2019-12-04 22:29 UTC (permalink / raw)
  To: git; +Cc: Ben Keene, Junio C Hamano, Ben Keene

From: Ben Keene <seraphire@gmail.com>

Python 3+ handles strings differently than Python 2.7.  Since Python 2 is reaching it's end of life, a series of changes are being submitted to enable python 3.7+ support. The current code fails basic tests under python 3.7.

Change references to basestring in the isinstance tests to use list instead. This prepares the code to remove all references to basestring.

The original code used basestring in a test to determine if a list or literal string was passed into 9 different functions.  This is used to determine if the shell should be evoked when calling subprocess methods.

Signed-off-by: Ben Keene <seraphire@gmail.com>
(cherry picked from commit 5b1b1c145479b5d5fd242122737a3134890409e6)
---
 git-p4.py | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/git-p4.py b/git-p4.py
index b2ffbc057b..0f27996393 100755
--- a/git-p4.py
+++ b/git-p4.py
@@ -109,7 +109,7 @@ def p4_build_cmd(cmd):
         # Provide a way to not pass this option by setting git-p4.retries to 0
         real_cmd += ["-r", str(retries)]
 
-    if isinstance(cmd,basestring):
+    if not isinstance(cmd, list):
         real_cmd = ' '.join(real_cmd) + ' ' + cmd
     else:
         real_cmd += cmd
@@ -175,7 +175,7 @@ def write_pipe(c, stdin):
     if verbose:
         sys.stderr.write('Writing pipe: %s\n' % str(c))
 
-    expand = isinstance(c,basestring)
+    expand = not isinstance(c, list)
     p = subprocess.Popen(c, stdin=subprocess.PIPE, shell=expand)
     pipe = p.stdin
     val = pipe.write(stdin)
@@ -197,7 +197,7 @@ def read_pipe_full(c):
     if verbose:
         sys.stderr.write('Reading pipe: %s\n' % str(c))
 
-    expand = isinstance(c,basestring)
+    expand = not isinstance(c, list)
     p = subprocess.Popen(c, stdout=subprocess.PIPE, stderr=subprocess.PIPE, shell=expand)
     (out, err) = p.communicate()
     return (p.returncode, out, err)
@@ -233,7 +233,7 @@ def read_pipe_lines(c):
     if verbose:
         sys.stderr.write('Reading pipe: %s\n' % str(c))
 
-    expand = isinstance(c, basestring)
+    expand = not isinstance(c, list)
     p = subprocess.Popen(c, stdout=subprocess.PIPE, shell=expand)
     pipe = p.stdout
     val = pipe.readlines()
@@ -276,7 +276,7 @@ def p4_has_move_command():
     return True
 
 def system(cmd, ignore_error=False):
-    expand = isinstance(cmd,basestring)
+    expand = not isinstance(cmd, list)
     if verbose:
         sys.stderr.write("executing %s\n" % str(cmd))
     retcode = subprocess.call(cmd, shell=expand)
@@ -288,7 +288,7 @@ def system(cmd, ignore_error=False):
 def p4_system(cmd):
     """Specifically invoke p4 as the system command. """
     real_cmd = p4_build_cmd(cmd)
-    expand = isinstance(real_cmd, basestring)
+    expand = not isinstance(real_cmd, list)
     retcode = subprocess.call(real_cmd, shell=expand)
     if retcode:
         raise CalledProcessError(retcode, real_cmd)
@@ -526,7 +526,7 @@ def getP4OpenedType(file):
 # Return the set of all p4 labels
 def getP4Labels(depotPaths):
     labels = set()
-    if isinstance(depotPaths,basestring):
+    if not isinstance(depotPaths, list):
         depotPaths = [depotPaths]
 
     for l in p4CmdList(["labels"] + ["%s..." % p for p in depotPaths]):
@@ -613,7 +613,7 @@ def isModeExecChanged(src_mode, dst_mode):
 def p4CmdList(cmd, stdin=None, stdin_mode='w+b', cb=None, skip_info=False,
         errors_as_exceptions=False):
 
-    if isinstance(cmd,basestring):
+    if not isinstance(cmd, list):
         cmd = "-G " + cmd
         expand = True
     else:
@@ -630,7 +630,7 @@ def p4CmdList(cmd, stdin=None, stdin_mode='w+b', cb=None, skip_info=False,
     stdin_file = None
     if stdin is not None:
         stdin_file = tempfile.TemporaryFile(prefix='p4-stdin', mode=stdin_mode)
-        if isinstance(stdin,basestring):
+        if not isinstance(stdin, list):
             stdin_file.write(stdin)
         else:
             for i in stdin:
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 64+ messages in thread

* [PATCH v4 03/11] git-p4: add new helper functions for python3 conversion
  2019-12-04 22:29     ` [PATCH v4 00/11] git-p4.py: Cast byte strings to unicode strings in python3 Ben Keene via GitGitGadget
  2019-12-04 22:29       ` [PATCH v4 01/11] git-p4: select p4 binary by operating-system Ben Keene via GitGitGadget
  2019-12-04 22:29       ` [PATCH v4 02/11] git-p4: change the expansion test from basestring to list Ben Keene via GitGitGadget
@ 2019-12-04 22:29       ` Ben Keene via GitGitGadget
  2019-12-05 10:40         ` Denton Liu
  2019-12-04 22:29       ` [PATCH v4 04/11] git-p4: python3 syntax changes Ben Keene via GitGitGadget
                         ` (9 subsequent siblings)
  12 siblings, 1 reply; 64+ messages in thread
From: Ben Keene via GitGitGadget @ 2019-12-04 22:29 UTC (permalink / raw)
  To: git; +Cc: Ben Keene, Junio C Hamano, Ben Keene

From: Ben Keene <seraphire@gmail.com>

Python 3+ handles strings differently than Python 2.7.  Since Python 2 is reaching it's end of life, a series of changes are being submitted to enable python 3.7+ support. The current code fails basic tests under python 3.7.

Change the existing unicode test add new support functions for python2-python3 support.

Define the following variables:
- isunicode - a boolean variable that states if the version of python natively supports unicode (true) or not (false). This is true for Python3 and false for Python2.
- unicode - a type alias for the datatype that holds a unicode string.  It is assigned to a str under python 3 and the unicode type for Python2.
- bytes - a type alias for an array of bytes.  It is assigned the native bytes type for Python3 and str for Python2.

Add the following new functions:

- as_string(text) - A new function that will convert a byte array to a unicode (UTF-8) string under python 3.  Under python 2, this returns the string unchanged.
- as_bytes(text) - A new function that will convert a unicode string to a byte array under python 3.  Under python 2, this returns the string unchanged.
- to_unicode(text) - Converts a text string as Unicode(UTF-8) on both Python2 and Python3.

Add a new function alias raw_input:
If raw_input does not exist (it was renamed to input in python 3) alias input as raw_input.

The AS_STRING and AS_BYTES functions allow for modifying the code with a minimal amount of impact on Python2 support.  When a string is expected, the as_string() will be used to convert "cast" the incoming "bytes" to a string type. Conversely as_bytes() will be used to convert a "string" to a "byte array" type. Since Python2 overloads the datatype 'str' to serve both purposes, the Python2 versions of these function do not change the data, since the str functions as both a byte array and a string.

basestring is removed since its only references are found in tests that were changed in the previous change list.

Signed-off-by: Ben Keene <seraphire@gmail.com>
(cherry picked from commit 7921aeb3136b07643c1a503c2d9d8b5ada620356)
---
 git-p4.py | 70 +++++++++++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 66 insertions(+), 4 deletions(-)

diff --git a/git-p4.py b/git-p4.py
index 0f27996393..93dfd0920a 100755
--- a/git-p4.py
+++ b/git-p4.py
@@ -32,16 +32,78 @@
     unicode = unicode
 except NameError:
     # 'unicode' is undefined, must be Python 3
-    str = str
+    #
+    # For Python3 which is natively unicode, we will use 
+    # unicode for internal information but all P4 Data
+    # will remain in bytes
+    isunicode = True
     unicode = str
     bytes = bytes
-    basestring = (str,bytes)
+
+    def as_string(text):
+        """Return a byte array as a unicode string"""
+        if text == None:
+            return None
+        if isinstance(text, bytes):
+            return unicode(text, "utf-8")
+        else:
+            return text
+
+    def as_bytes(text):
+        """Return a Unicode string as a byte array"""
+        if text == None:
+            return None
+        if isinstance(text, bytes):
+            return text
+        else:
+            return bytes(text, "utf-8")
+
+    def to_unicode(text):
+        """Return a byte array as a unicode string"""
+        return as_string(text)    
+
+    def path_as_string(path):
+        """ Converts a path to the UTF8 encoded string """
+        if isinstance(path, unicode):
+            return path
+        return encodeWithUTF8(path).decode('utf-8')
+    
 else:
     # 'unicode' exists, must be Python 2
-    str = str
+    #
+    # We will treat the data as:
+    #   str   -> str
+    #   bytes -> str
+    # So for Python2 these functions are no-ops
+    # and will leave the data in the ambiguious
+    # string/bytes state
+    isunicode = False
     unicode = unicode
     bytes = str
-    basestring = basestring
+
+    def as_string(text):
+        """ Return text unaltered (for Python3 support) """
+        return text
+
+    def as_bytes(text):
+        """ Return text unaltered (for Python3 support) """
+        return text
+
+    def to_unicode(text):
+        """Return a string as a unicode string"""
+        return text.decode('utf-8')
+    
+    def path_as_string(path):
+        """ Converts a path to the UTF8 encoded bytes """
+        return encodeWithUTF8(path)
+
+
+ 
+# Check for raw_input support
+try:
+    raw_input
+except NameError:
+    raw_input = input
 
 try:
     from subprocess import CalledProcessError
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 64+ messages in thread

* [PATCH v4 04/11] git-p4: python3 syntax changes
  2019-12-04 22:29     ` [PATCH v4 00/11] git-p4.py: Cast byte strings to unicode strings in python3 Ben Keene via GitGitGadget
                         ` (2 preceding siblings ...)
  2019-12-04 22:29       ` [PATCH v4 03/11] git-p4: add new helper functions for python3 conversion Ben Keene via GitGitGadget
@ 2019-12-04 22:29       ` Ben Keene via GitGitGadget
  2019-12-05 11:02         ` Denton Liu
  2019-12-04 22:29       ` [PATCH v4 05/11] git-p4: Add new functions in preparation of usage Ben Keene via GitGitGadget
                         ` (8 subsequent siblings)
  12 siblings, 1 reply; 64+ messages in thread
From: Ben Keene via GitGitGadget @ 2019-12-04 22:29 UTC (permalink / raw)
  To: git; +Cc: Ben Keene, Junio C Hamano, Ben Keene

From: Ben Keene <seraphire@gmail.com>

Python 3+ handles strings differently than Python 2.7.  Since Python 2 is reaching it's end of life, a series of changes are being submitted to enable python 3.7+ support. The current code fails basic tests under python 3.7.

There are a number of translations suggested by modernize/futureize that should be taken to fix numerous non-string specific issues.

Change references to the X.next() iterator to the function next(X) which is compatible with both Python2 and Python3.

Change references to X.keys() to list(X.keys()) to return a list that can be iterated in both Python2 and Python3.

Add the literal text (object) to the end of class definitions to be consistent with Python3 class definition.

Change integer divison to use "//" instead of "/"  Under Both python2 and python3 // will return a floor()ed result which matches existing functionality.

Change the format string for displaying decimal values from %d to %4.1f% when displaying a progress.  This avoids displaying long repeating decimals in user displayed text.

Signed-off-by: Ben Keene <seraphire@gmail.com>
(cherry picked from commit bde6b83296aa9b3e7a584c5ce2b571c7287d8f9f)
---
 git-p4.py | 55 +++++++++++++++++++++++++++++--------------------------
 1 file changed, 29 insertions(+), 26 deletions(-)

diff --git a/git-p4.py b/git-p4.py
index 93dfd0920a..b283ef1029 100755
--- a/git-p4.py
+++ b/git-p4.py
@@ -26,6 +26,9 @@
 import zlib
 import ctypes
 import errno
+import os.path
+import codecs
+import io
 
 # support basestring in python3
 try:
@@ -631,7 +634,7 @@ def parseDiffTreeEntry(entry):
 
     If the pattern is not matched, None is returned."""
 
-    match = diffTreePattern().next().match(entry)
+    match = next(diffTreePattern()).match(entry)
     if match:
         return {
             'src_mode': match.group(1),
@@ -935,7 +938,7 @@ def findUpstreamBranchPoint(head = "HEAD"):
     branches = p4BranchesInGit()
     # map from depot-path to branch name
     branchByDepotPath = {}
-    for branch in branches.keys():
+    for branch in list(branches.keys()):
         tip = branches[branch]
         log = extractLogMessageFromGitCommit(tip)
         settings = extractSettingsGitLog(log)
@@ -1129,7 +1132,7 @@ def getClientSpec():
     client_name = entry["Client"]
 
     # just the keys that start with "View"
-    view_keys = [ k for k in entry.keys() if k.startswith("View") ]
+    view_keys = [ k for k in list(entry.keys()) if k.startswith("View") ]
 
     # hold this new View
     view = View(client_name)
@@ -1371,7 +1374,7 @@ def processContent(self, git_mode, relPath, contents):
         else:
             return LargeFileSystem.processContent(self, git_mode, relPath, contents)
 
-class Command:
+class Command(object):
     delete_actions = ( "delete", "move/delete", "purge" )
     add_actions = ( "add", "branch", "move/add" )
 
@@ -1386,7 +1389,7 @@ def ensure_value(self, attr, value):
             setattr(self, attr, value)
         return getattr(self, attr)
 
-class P4UserMap:
+class P4UserMap(object):
     def __init__(self):
         self.userMapFromPerforceServer = False
         self.myP4UserId = None
@@ -1437,7 +1440,7 @@ def getUserMapFromPerforceServer(self):
                 self.emails[email] = user
 
         s = ''
-        for (key, val) in self.users.items():
+        for (key, val) in list(self.users.items()):
             s += "%s\t%s\n" % (key.expandtabs(1), val.expandtabs(1))
 
         open(self.getUserCacheFilename(), "wb").write(s)
@@ -1788,7 +1791,7 @@ def prepareSubmitTemplate(self, changelist=None):
                 break
         if not change_entry:
             die('Failed to decode output of p4 change -o')
-        for key, value in change_entry.iteritems():
+        for key, value in list(change_entry.items()):
             if key.startswith('File'):
                 if 'depot-paths' in settings:
                     if not [p for p in settings['depot-paths']
@@ -2032,7 +2035,7 @@ def applyCommit(self, id):
             p4_delete(f)
 
         # Set/clear executable bits
-        for f in filesToChangeExecBit.keys():
+        for f in list(filesToChangeExecBit.keys()):
             mode = filesToChangeExecBit[f]
             setP4ExecBit(f, mode)
 
@@ -2285,7 +2288,7 @@ def run(self, args):
             self.clientSpecDirs = getClientSpec()
 
         # Check for the existence of P4 branches
-        branchesDetected = (len(p4BranchesInGit().keys()) > 1)
+        branchesDetected = (len(list(p4BranchesInGit().keys())) > 1)
 
         if self.useClientSpec and not branchesDetected:
             # all files are relative to the client spec
@@ -2676,7 +2679,7 @@ def __init__(self):
         self.knownBranches = {}
         self.initialParents = {}
 
-        self.tz = "%+03d%02d" % (- time.timezone / 3600, ((- time.timezone % 3600) / 60))
+        self.tz = "%+03d%02d" % (- time.timezone // 3600, ((- time.timezone % 3600) // 60))
         self.labels = {}
 
     # Force a checkpoint in fast-import and wait for it to finish
@@ -2793,7 +2796,7 @@ def splitFilesIntoBranches(self, commit):
             else:
                 relPath = self.stripRepoPath(path, self.depotPaths)
 
-            for branch in self.knownBranches.keys():
+            for branch in list(self.knownBranches.keys()):
                 # add a trailing slash so that a commit into qt/4.2foo
                 # doesn't end up in qt/4.2, e.g.
                 if p4PathStartsWith(relPath, branch + "/"):
@@ -2834,7 +2837,7 @@ def streamOneP4File(self, file, contents):
                 size = int(self.stream_file['fileSize'])
             else:
                 size = 0 # deleted files don't get a fileSize apparently
-            sys.stdout.write('\r%s --> %s (%i MB)\n' % (file['depotFile'], relPath, size/1024/1024))
+            sys.stdout.write('\r%s --> %s (%i MB)\n' % (file['depotFile'], relPath, size//1024//1024))
             sys.stdout.flush()
 
         (type_base, type_mods) = split_p4_type(file["type"])
@@ -2934,7 +2937,7 @@ def streamP4FilesCb(self, marshalled):
             required_bytes = int((4 * int(self.stream_file["fileSize"])) - calcDiskFree())
             if required_bytes > 0:
                 err = 'Not enough space left on %s! Free at least %i MB.' % (
-                    os.getcwd(), required_bytes/1024/1024
+                    os.getcwd(), required_bytes//1024//1024
                 )
 
         if err:
@@ -2963,7 +2966,7 @@ def streamP4FilesCb(self, marshalled):
 
         # pick up the new file information... for the
         # 'data' field we need to append to our array
-        for k in marshalled.keys():
+        for k in list(marshalled.keys()):
             if k == 'data':
                 if 'streamContentSize' not in self.stream_file:
                     self.stream_file['streamContentSize'] = 0
@@ -2978,8 +2981,8 @@ def streamP4FilesCb(self, marshalled):
             'depotFile' in self.stream_file):
             size = int(self.stream_file["fileSize"])
             if size > 0:
-                progress = 100*self.stream_file['streamContentSize']/size
-                sys.stdout.write('\r%s %d%% (%i MB)' % (self.stream_file['depotFile'], progress, int(size/1024/1024)))
+                progress = 100.0*self.stream_file['streamContentSize']/size
+                sys.stdout.write('\r%s %4.1f%% (%i MB)' % (self.stream_file['depotFile'], progress, int(size//1024//1024)))
                 sys.stdout.flush()
 
         self.stream_have_file_info = True
@@ -3060,7 +3063,7 @@ def streamTag(self, gitStream, labelName, labelDetails, commit, epoch):
 
         gitStream.write("tagger %s\n" % tagger)
 
-        print("labelDetails=",labelDetails)
+        print(("labelDetails=",labelDetails))
         if 'Description' in labelDetails:
             description = labelDetails['Description']
         else:
@@ -3199,7 +3202,7 @@ def getLabels(self):
             self.labels[newestChange] = [output, revisions]
 
         if self.verbose:
-            print("Label changes: %s" % self.labels.keys())
+            print("Label changes: %s" % list(self.labels.keys()))
 
     # Import p4 labels as git tags. A direct mapping does not
     # exist, so assume that if all the files are at the same revision
@@ -3342,7 +3345,7 @@ def getBranchMapping(self):
 
     def getBranchMappingFromGitBranches(self):
         branches = p4BranchesInGit(self.importIntoRemotes)
-        for branch in branches.keys():
+        for branch in list(branches.keys()):
             if branch == "master":
                 branch = "main"
             else:
@@ -3454,14 +3457,14 @@ def importChanges(self, changes, origin_revision=0):
             self.updateOptionDict(description)
 
             if not self.silent:
-                sys.stdout.write("\rImporting revision %s (%s%%)" % (change, cnt * 100 / len(changes)))
+                sys.stdout.write("\rImporting revision %s (%4.1f%%)" % (change, cnt * 100 / len(changes)))
                 sys.stdout.flush()
             cnt = cnt + 1
 
             try:
                 if self.detectBranches:
                     branches = self.splitFilesIntoBranches(description)
-                    for branch in branches.keys():
+                    for branch in list(branches.keys()):
                         ## HACK  --hwn
                         branchPrefix = self.depotPaths[0] + branch + "/"
                         self.branchPrefixes = [ branchPrefix ]
@@ -3650,13 +3653,13 @@ def run(self, args):
                 if short in branches:
                     self.p4BranchesInGit = [ short ]
             else:
-                self.p4BranchesInGit = branches.keys()
+                self.p4BranchesInGit = list(branches.keys())
 
             if len(self.p4BranchesInGit) > 1:
                 if not self.silent:
                     print("Importing from/into multiple branches")
                 self.detectBranches = True
-                for branch in branches.keys():
+                for branch in list(branches.keys()):
                     self.initialParents[self.refPrefix + branch] = \
                         branches[branch]
 
@@ -4040,7 +4043,7 @@ def findLastP4Revision(self, starting_point):
             to find the P4 commit we are based on, and the depot-paths.
         """
 
-        for parent in (range(65535)):
+        for parent in (list(range(65535))):
             log = extractLogMessageFromGitCommit("{0}^{1}".format(starting_point, parent))
             settings = extractSettingsGitLog(log)
             if 'change' in settings:
@@ -4179,7 +4182,7 @@ def printUsage(commands):
 
 def main():
     if len(sys.argv[1:]) == 0:
-        printUsage(commands.keys())
+        printUsage(list(commands.keys()))
         sys.exit(2)
 
     cmdName = sys.argv[1]
@@ -4189,7 +4192,7 @@ def main():
     except KeyError:
         print("unknown command %s" % cmdName)
         print("")
-        printUsage(commands.keys())
+        printUsage(list(commands.keys()))
         sys.exit(2)
 
     options = cmd.options
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 64+ messages in thread

* [PATCH v4 05/11] git-p4: Add new functions in preparation of usage
  2019-12-04 22:29     ` [PATCH v4 00/11] git-p4.py: Cast byte strings to unicode strings in python3 Ben Keene via GitGitGadget
                         ` (3 preceding siblings ...)
  2019-12-04 22:29       ` [PATCH v4 04/11] git-p4: python3 syntax changes Ben Keene via GitGitGadget
@ 2019-12-04 22:29       ` Ben Keene via GitGitGadget
  2019-12-05 10:50         ` Denton Liu
  2019-12-04 22:29       ` [PATCH v4 06/11] git-p4: Fix assumed path separators to be more Windows friendly Ben Keene via GitGitGadget
                         ` (7 subsequent siblings)
  12 siblings, 1 reply; 64+ messages in thread
From: Ben Keene via GitGitGadget @ 2019-12-04 22:29 UTC (permalink / raw)
  To: git; +Cc: Ben Keene, Junio C Hamano, Ben Keene

From: Ben Keene <seraphire@gmail.com>

This changelist is an intermediate submission for migrating the P4 support from Python2 to Python3. The code needs access to the encodeWithUTF8() for support of non-UTF8 filenames in the clone class as well as the sync class.

Move the function encodeWithUTF8() from the P4Sync class to a stand-alone function.  This will allow other classes to use this function without instanciating the P4Sync class. Change the self.verbose reference to an optional method parameter. Update the existing references to this function to pass the self.verbose since it is no longer available on "self" since the function is no longer contained on the P4Sync class.

Modify the functions write_pipe() and p4_write_pipe() to remove the return value.  The return value for both functions is the number of bytes, but the meaning is lost under python3 since the count does not match the number of characters that may have been encoded.  Additionally, the return value was never used, so this is removed to avoid future ambiguity.

Add a new method gitConfigSet(). This method will set a value in the git configuration cache list.

Signed-off-by: Ben Keene <seraphire@gmail.com>
(cherry picked from commit affe888f432bb6833df78962e8671fccdf76c47a)
---
 git-p4.py | 60 ++++++++++++++++++++++++++++++++++++++++---------------
 1 file changed, 44 insertions(+), 16 deletions(-)

diff --git a/git-p4.py b/git-p4.py
index b283ef1029..2659531c2e 100755
--- a/git-p4.py
+++ b/git-p4.py
@@ -237,6 +237,8 @@ def die(msg):
         sys.exit(1)
 
 def write_pipe(c, stdin):
+    """ Executes the command 'c', passing 'stdin' on the standard input
+    """
     if verbose:
         sys.stderr.write('Writing pipe: %s\n' % str(c))
 
@@ -248,11 +250,12 @@ def write_pipe(c, stdin):
     if p.wait():
         die('Command failed: %s' % str(c))
 
-    return val
 
 def p4_write_pipe(c, stdin):
+    """ Runs a P4 command 'c', passing 'stdin' data to P4
+    """
     real_cmd = p4_build_cmd(c)
-    return write_pipe(real_cmd, stdin)
+    write_pipe(real_cmd, stdin)
 
 def read_pipe_full(c):
     """ Read output from  command. Returns a tuple
@@ -653,6 +656,38 @@ def isModeExec(mode):
     # otherwise False.
     return mode[-3:] == "755"
 
+def encodeWithUTF8(path, verbose = False):
+    """ Ensure that the path is encoded as a UTF-8 string
+
+        Returns bytes(P3)/str(P2)
+    """
+   
+    if isunicode:
+        try:
+            if isinstance(path, unicode):
+                # It is already unicode, cast it as a bytes
+                # that is encoded as utf-8.
+                return path.encode('utf-8', 'strict')
+            path.decode('ascii', 'strict')
+        except:
+            encoding = 'utf8'
+            if gitConfig('git-p4.pathEncoding'):
+                encoding = gitConfig('git-p4.pathEncoding')
+            path = path.decode(encoding, 'replace').encode('utf8', 'replace')
+            if verbose:
+                print('\nNOTE:Path with non-ASCII characters detected. Used %s to encode: %s ' % (encoding, to_unicode(path)))
+    else:    
+        try:
+            path.decode('ascii')
+        except:
+            encoding = 'utf8'
+            if gitConfig('git-p4.pathEncoding'):
+                encoding = gitConfig('git-p4.pathEncoding')
+            path = path.decode(encoding, 'replace').encode('utf8', 'replace')
+            if verbose:
+                print('Path with non-ASCII characters detected. Used %s to encode: %s ' % (encoding, path))
+    return path
+
 class P4Exception(Exception):
     """ Base class for exceptions from the p4 client """
     def __init__(self, exit_code):
@@ -891,6 +926,11 @@ def gitConfigList(key):
             _gitConfig[key] = []
     return _gitConfig[key]
 
+def gitConfigSet(key, value):
+    """ Set the git configuration key 'key' to 'value' for this session
+    """
+    _gitConfig[key] = value
+
 def p4BranchesInGit(branchesAreInRemotes=True):
     """Find all the branches whose names start with "p4/", looking
        in remotes or heads as specified by the argument.  Return
@@ -2814,24 +2854,12 @@ def writeToGitStream(self, gitMode, relPath, contents):
             self.gitStream.write(d)
         self.gitStream.write('\n')
 
-    def encodeWithUTF8(self, path):
-        try:
-            path.decode('ascii')
-        except:
-            encoding = 'utf8'
-            if gitConfig('git-p4.pathEncoding'):
-                encoding = gitConfig('git-p4.pathEncoding')
-            path = path.decode(encoding, 'replace').encode('utf8', 'replace')
-            if self.verbose:
-                print('Path with non-ASCII characters detected. Used %s to encode: %s ' % (encoding, path))
-        return path
-
     # output one file from the P4 stream
     # - helper for streamP4Files
 
     def streamOneP4File(self, file, contents):
         relPath = self.stripRepoPath(file['depotFile'], self.branchPrefixes)
-        relPath = self.encodeWithUTF8(relPath)
+        relPath = encodeWithUTF8(relPath, self.verbose)
         if verbose:
             if 'fileSize' in self.stream_file:
                 size = int(self.stream_file['fileSize'])
@@ -2914,7 +2942,7 @@ def streamOneP4File(self, file, contents):
 
     def streamOneP4Deletion(self, file):
         relPath = self.stripRepoPath(file['path'], self.branchPrefixes)
-        relPath = self.encodeWithUTF8(relPath)
+        relPath = encodeWithUTF8(relPath, self.verbose)
         if verbose:
             sys.stdout.write("delete %s\n" % relPath)
             sys.stdout.flush()
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 64+ messages in thread

* [PATCH v4 06/11] git-p4: Fix assumed path separators to be more Windows friendly
  2019-12-04 22:29     ` [PATCH v4 00/11] git-p4.py: Cast byte strings to unicode strings in python3 Ben Keene via GitGitGadget
                         ` (4 preceding siblings ...)
  2019-12-04 22:29       ` [PATCH v4 05/11] git-p4: Add new functions in preparation of usage Ben Keene via GitGitGadget
@ 2019-12-04 22:29       ` Ben Keene via GitGitGadget
  2019-12-05 13:38         ` Junio C Hamano
  2019-12-04 22:29       ` [PATCH v4 07/11] git-p4: Add a helper class for stream writing Ben Keene via GitGitGadget
                         ` (6 subsequent siblings)
  12 siblings, 1 reply; 64+ messages in thread
From: Ben Keene via GitGitGadget @ 2019-12-04 22:29 UTC (permalink / raw)
  To: git; +Cc: Ben Keene, Junio C Hamano, Ben Keene

From: Ben Keene <seraphire@gmail.com>

When a computer is configured to use Git for windows and Python for windows, and not a Unix subsystem like cygwin or WSL, the directory separator changes and causes git-p4 to fail to properly determine paths.

Fix 3 path separator errors:

1. getUserCacheFilename should not use string concatenation. Change this code to use os.path.join to build an OS tolerant path.
2. defaultDestiantion used the OS.path.split to split depot paths.  This is incorrect on windows. Change the code to split on a forward slash(/) instead since depot paths use this character regardless  of the operating system.
3. The call to isvalidGitDir() in the main code also used a literal forward slash. Change the cose to use os.path.join to correctly format the path for the operating system.

These three changes allow the suggested windows configuration to properly locate files while retaining the existing behavior on non-windows operating systems.

Signed-off-by: Ben Keene <seraphire@gmail.com>
(cherry picked from commit a5b45c12c3861638a933b05a1ffee0c83978dcb2)
---
 git-p4.py | 13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/git-p4.py b/git-p4.py
index 2659531c2e..7ac8cb42ef 100755
--- a/git-p4.py
+++ b/git-p4.py
@@ -1454,8 +1454,10 @@ def p4UserIsMe(self, p4User):
             return True
 
     def getUserCacheFilename(self):
+        """ Returns the filename of the username cache 
+	    """
         home = os.environ.get("HOME", os.environ.get("USERPROFILE"))
-        return home + "/.gitp4-usercache.txt"
+        return os.path.join(home, ".gitp4-usercache.txt")
 
     def getUserMapFromPerforceServer(self):
         if self.userMapFromPerforceServer:
@@ -3973,13 +3975,16 @@ def __init__(self):
         self.cloneBare = False
 
     def defaultDestination(self, args):
+        """ Returns the last path component as the default git 
+            repository directory name
+        """
         ## TODO: use common prefix of args?
         depotPath = args[0]
         depotDir = re.sub("(@[^@]*)$", "", depotPath)
         depotDir = re.sub("(#[^#]*)$", "", depotDir)
         depotDir = re.sub(r"\.\.\.$", "", depotDir)
         depotDir = re.sub(r"/$", "", depotDir)
-        return os.path.split(depotDir)[1]
+        return depotDir.split('/')[-1]
 
     def run(self, args):
         if len(args) < 1:
@@ -4252,8 +4257,8 @@ def main():
                         chdir(cdup);
 
         if not isValidGitDir(cmd.gitdir):
-            if isValidGitDir(cmd.gitdir + "/.git"):
-                cmd.gitdir += "/.git"
+            if isValidGitDir(os.path.join(cmd.gitdir, ".git")):
+                cmd.gitdir = os.path.join(cmd.gitdir, ".git")
             else:
                 die("fatal: cannot locate git repository at %s" % cmd.gitdir)
 
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 64+ messages in thread

* [PATCH v4 07/11] git-p4: Add a helper class for stream writing
  2019-12-04 22:29     ` [PATCH v4 00/11] git-p4.py: Cast byte strings to unicode strings in python3 Ben Keene via GitGitGadget
                         ` (5 preceding siblings ...)
  2019-12-04 22:29       ` [PATCH v4 06/11] git-p4: Fix assumed path separators to be more Windows friendly Ben Keene via GitGitGadget
@ 2019-12-04 22:29       ` Ben Keene via GitGitGadget
  2019-12-05 13:42         ` Junio C Hamano
  2019-12-04 22:29       ` [PATCH v4 08/11] git-p4: p4CmdList - support Unicode encoding Ben Keene via GitGitGadget
                         ` (5 subsequent siblings)
  12 siblings, 1 reply; 64+ messages in thread
From: Ben Keene via GitGitGadget @ 2019-12-04 22:29 UTC (permalink / raw)
  To: git; +Cc: Ben Keene, Junio C Hamano, Ben Keene

From: Ben Keene <seraphire@gmail.com>

This is a transtional commit that does not change current behvior.  It adds a new class Py23File.

Following the Python recommendation of keeping text as unicode internally and only converting to and from bytes on input and output, this class provides an interface for the methods used for reading and writing files and file like streams.

Create a class that wraps the input and output functions used by the git-p4.py code for reading and writing to standard file handles.

The methods of this class should take a Unicode string for writing and return unicode strings in reads.  This class should be a drop-in for existing file like streams

The following methods should be coded for supporting existing read/write calls:
* write - this should write a Unicode string to the underlying stream
* read - this should read from the underlying stream and cast the bytes as a unicode string
* readline - this should read one line of text from the underlying stream and cast it as a unicode string
* readline - this should read a number of lines, optionally hinted, and cast each line as a unicode string

The expression "cast as a unicode string" is used because the code should use the AS_BYTES() and AS_UNICODE() functions instead of cohercing the data to actual unicode strings or bytes.  This allows python 2 code to continue to use the internal "str" data type instead of converting the data back and forth to actual unicode strings. This retains current python2 support while python3 support may be incomplete.

Signed-off-by: Ben Keene <seraphire@gmail.com>
(cherry picked from commit 12919111fbaa3e4c0c4c2fdd4f79744cc683d860)
---
 git-p4.py | 66 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 66 insertions(+)

diff --git a/git-p4.py b/git-p4.py
index 7ac8cb42ef..0da640be93 100755
--- a/git-p4.py
+++ b/git-p4.py
@@ -4182,6 +4182,72 @@ def run(self, args):
             print("%s <= %s (%s)" % (branch, ",".join(settings["depot-paths"]), settings["change"]))
         return True
 
+class Py23File():
+    """ Python2/3 Unicode File Wrapper 
+    """
+    
+    stream_handle = None
+    verbose       = False
+    debug_handle  = None
+   
+    def __init__(self, stream_handle, verbose = False):
+        """ Create a Python3 compliant Unicode to Byte String
+            Windows compatible wrapper
+
+            stream_handle = the underlying file-like handle
+            verbose       = Boolean if content should be echoed
+        """
+        self.stream_handle = stream_handle
+        self.verbose       = verbose
+
+    def write(self, utf8string):
+        """ Writes the utf8 encoded string to the underlying 
+            file stream
+        """
+        self.stream_handle.write(as_bytes(utf8string))
+        if self.verbose:
+            sys.stderr.write("Stream Output: %s" % utf8string)
+            sys.stderr.flush()
+
+    def read(self, size = None):
+        """ Reads int charcters from the underlying stream 
+            and converts it to utf8.
+
+            Be aware, the size value is for reading the underlying
+            bytes so the value may be incorrect. Usage of the size
+            value is discouraged.
+        """
+        if size == None:
+            return as_string(self.stream_handle.read())
+        else:
+            return as_string(self.stream_handle.read(size))
+
+    def readline(self):
+        """ Reads a line from the underlying byte stream 
+            and converts it to utf8
+        """
+        return as_string(self.stream_handle.readline())
+
+    def readlines(self, sizeHint = None):
+        """ Returns a list containing lines from the file converted to unicode.
+
+            sizehint - Optional. If the optional sizehint argument is 
+            present, instead of reading up to EOF, whole lines totalling 
+            approximately sizehint bytes are read.
+        """
+        lines = self.stream_handle.readlines(sizeHint)
+        for i in range(0, len(lines)):
+            lines[i] = as_string(lines[i])
+        return lines
+
+    def close(self):
+        """ Closes the underlying byte stream """
+        self.stream_handle.close()
+
+    def flush(self):
+        """ Flushes the underlying byte stream """
+        self.stream_handle.flush()
+
 class HelpFormatter(optparse.IndentedHelpFormatter):
     def __init__(self):
         optparse.IndentedHelpFormatter.__init__(self)
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 64+ messages in thread

* [PATCH v4 08/11] git-p4: p4CmdList  - support Unicode encoding
  2019-12-04 22:29     ` [PATCH v4 00/11] git-p4.py: Cast byte strings to unicode strings in python3 Ben Keene via GitGitGadget
                         ` (6 preceding siblings ...)
  2019-12-04 22:29       ` [PATCH v4 07/11] git-p4: Add a helper class for stream writing Ben Keene via GitGitGadget
@ 2019-12-04 22:29       ` Ben Keene via GitGitGadget
  2019-12-05 13:55         ` Junio C Hamano
  2019-12-04 22:29       ` [PATCH v4 09/11] git-p4: Add usability enhancements Ben Keene via GitGitGadget
                         ` (4 subsequent siblings)
  12 siblings, 1 reply; 64+ messages in thread
From: Ben Keene via GitGitGadget @ 2019-12-04 22:29 UTC (permalink / raw)
  To: git; +Cc: Ben Keene, Junio C Hamano, Ben Keene

From: Ben Keene <seraphire@gmail.com>

The p4CmdList is a commonly used function in the git-p4 code. It is used to execute a command in P4 and return the results of the call in a list.

Change this code to take a new optional parameter, encode_data that will optionally convert the data AS_STRING() that isto be returned by the function.

Change the code so that the key will always be encoded AS_STRING()

Data that is passed for standard input (stdin) should be AS_BYTES() to ensure unicode text that is supplied will be written out as bytes.

Additionally, change literal text prior to conversion to be literal bytes.

Signed-off-by: Ben Keene <seraphire@gmail.com>
(cherry picked from commit 88306ac269186cbd0f6dc6cfd366b50b28ee4886)
---
 git-p4.py | 27 +++++++++++++++++++++++----
 1 file changed, 23 insertions(+), 4 deletions(-)

diff --git a/git-p4.py b/git-p4.py
index 0da640be93..f7c0ef0c53 100755
--- a/git-p4.py
+++ b/git-p4.py
@@ -711,7 +711,23 @@ def isModeExecChanged(src_mode, dst_mode):
     return isModeExec(src_mode) != isModeExec(dst_mode)
 
 def p4CmdList(cmd, stdin=None, stdin_mode='w+b', cb=None, skip_info=False,
-        errors_as_exceptions=False):
+        errors_as_exceptions=False, encode_data=True):
+    """ Executes a P4 command:  'cmd' optionally passing 'stdin' to the command's
+        standard input via a temporary file with 'stdin_mode' mode.
+
+        Output from the command is optionally passed to the callback function 'cb'.
+        If 'cb' is None, the response from the command is parsed into a list
+        of resulting dictionaries. (For each block read from the process pipe.)
+
+        If 'skip_info' is true, information in a block read that has a code type of
+        'info' will be skipped.
+
+        If 'errors_as_exceptions' is set to true (the default is false) the error
+        code returned from the execution will generate an exception.
+
+        If 'encode_data' is set to true (the default) the data that is returned 
+        by this function will be passed through the "as_string" function.
+    """
 
     if not isinstance(cmd, list):
         cmd = "-G " + cmd
@@ -734,7 +750,7 @@ def p4CmdList(cmd, stdin=None, stdin_mode='w+b', cb=None, skip_info=False,
             stdin_file.write(stdin)
         else:
             for i in stdin:
-                stdin_file.write(i + '\n')
+                stdin_file.write(as_bytes(i) + b'\n')
         stdin_file.flush()
         stdin_file.seek(0)
 
@@ -748,12 +764,15 @@ def p4CmdList(cmd, stdin=None, stdin_mode='w+b', cb=None, skip_info=False,
         while True:
             entry = marshal.load(p4.stdout)
             if skip_info:
-                if 'code' in entry and entry['code'] == 'info':
+                if b'code' in entry and entry[b'code'] == b'info':
                     continue
             if cb is not None:
                 cb(entry)
             else:
-                result.append(entry)
+                out = {}
+                for key, value in entry.items():
+                    out[as_string(key)] = (as_string(value) if encode_data else value)
+                result.append(out)
     except EOFError:
         pass
     exitCode = p4.wait()
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 64+ messages in thread

* [PATCH v4 09/11] git-p4: Add usability enhancements
  2019-12-04 22:29     ` [PATCH v4 00/11] git-p4.py: Cast byte strings to unicode strings in python3 Ben Keene via GitGitGadget
                         ` (7 preceding siblings ...)
  2019-12-04 22:29       ` [PATCH v4 08/11] git-p4: p4CmdList - support Unicode encoding Ben Keene via GitGitGadget
@ 2019-12-04 22:29       ` Ben Keene via GitGitGadget
  2019-12-05 14:04         ` Junio C Hamano
  2019-12-04 22:29       ` [PATCH v4 10/11] git-p4: Support python3 for basic P4 clone, sync, and submit Ben Keene via GitGitGadget
                         ` (3 subsequent siblings)
  12 siblings, 1 reply; 64+ messages in thread
From: Ben Keene via GitGitGadget @ 2019-12-04 22:29 UTC (permalink / raw)
  To: git; +Cc: Ben Keene, Junio C Hamano, Ben Keene

From: Ben Keene <seraphire@gmail.com>

Issue: when prompting the user with raw_input, the tests are not forgiving of user input.  For example, on the first query asks for a yes/no response. If the user enters the full word "yes" or "no" the test will fail. Additionally, offer the suggestion of setting git-p4.attemptRCSCleanup when applying a commit fails because of RCS keywords. Both of these changes are usability enhancement suggestions.

Change the code prompting the user for input to sanitize the user input before checking the response by asking the response as a lower case string, trimming leading/trailing spaces, and returning the first character.

Change the applyCommit() method that when applying a commit fails becasue of the P4 RCS Keywords, the user should consider setting git-p4.attemptRCSCleanup.

Signed-off-by: Ben Keene <seraphire@gmail.com>
(cherry picked from commit 1fab571664f5b6ad4ef321199f52615a32a9f8c7)
---
 git-p4.py | 31 ++++++++++++++++++++++++++-----
 1 file changed, 26 insertions(+), 5 deletions(-)

diff --git a/git-p4.py b/git-p4.py
index f7c0ef0c53..f13e4645a3 100755
--- a/git-p4.py
+++ b/git-p4.py
@@ -1909,7 +1909,8 @@ def edit_template(self, template_file):
             return True
 
         while True:
-            response = raw_input("Submit template unchanged. Submit anyway? [y]es, [n]o (skip this patch) ")
+            response = raw_input("Submit template unchanged. Submit anyway? [y]es, [n]o (skip this patch) ").lower() \
+                .strip()[0]
             if response == 'y':
                 return True
             if response == 'n':
@@ -2069,8 +2070,23 @@ def applyCommit(self, id):
                     # disable the read-only bit on windows.
                     if self.isWindows and file not in editedFiles:
                         os.chmod(file, stat.S_IWRITE)
-                    self.patchRCSKeywords(file, kwfiles[file])
-                    fixed_rcs_keywords = True
+                    
+                    try:
+                        self.patchRCSKeywords(file, kwfiles[file])
+                        fixed_rcs_keywords = True
+                    except:
+                        # We are throwing an exception, undo all open edits
+                        for f in editedFiles:
+                            p4_revert(f)
+                        raise
+            else:
+                # They do not have attemptRCSCleanup set, this might be the fail point
+                # Check to see if the file has RCS keywords and suggest setting the property.
+                for file in editedFiles | filesToDelete:
+                    if p4_keywords_regexp_for_file(file) != None:
+                        print("At least one file in this commit has RCS Keywords that may be causing problems. ")
+                        print("Consider:\ngit config git-p4.attemptRCSCleanup true")
+                        break
 
             if fixed_rcs_keywords:
                 print("Retrying the patch with RCS keywords cleaned up")
@@ -2481,7 +2497,7 @@ def run(self, args):
                         if self.conflict_behavior == "ask":
                             print("What do you want to do?")
                             response = raw_input("[s]kip this commit but apply"
-                                                 " the rest, or [q]uit? ")
+                                                 " the rest, or [q]uit? ").lower().strip()[0]
                             if not response:
                                 continue
                         elif self.conflict_behavior == "skip":
@@ -4327,7 +4343,12 @@ def main():
                                    description = cmd.description,
                                    formatter = HelpFormatter())
 
-    (cmd, args) = parser.parse_args(sys.argv[2:], cmd);
+    try:
+        (cmd, args) = parser.parse_args(sys.argv[2:], cmd);
+    except:
+        parser.print_help()
+        raise
+
     global verbose
     verbose = cmd.verbose
     if cmd.needsGit:
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 64+ messages in thread

* [PATCH v4 10/11] git-p4: Support python3 for basic P4 clone, sync, and submit
  2019-12-04 22:29     ` [PATCH v4 00/11] git-p4.py: Cast byte strings to unicode strings in python3 Ben Keene via GitGitGadget
                         ` (8 preceding siblings ...)
  2019-12-04 22:29       ` [PATCH v4 09/11] git-p4: Add usability enhancements Ben Keene via GitGitGadget
@ 2019-12-04 22:29       ` Ben Keene via GitGitGadget
  2019-12-04 22:29       ` [PATCH v4 11/11] git-p4: Added --encoding parameter to p4 clone Ben Keene via GitGitGadget
                         ` (2 subsequent siblings)
  12 siblings, 0 replies; 64+ messages in thread
From: Ben Keene via GitGitGadget @ 2019-12-04 22:29 UTC (permalink / raw)
  To: git; +Cc: Ben Keene, Junio C Hamano, Ben Keene

From: Ben Keene <seraphire@gmail.com>

Issue: Python 3 is still not properly supported for any use with the git-p4 python code.
Warning - this is a very large atomic commit.  The commit text is also very large.

Change the code such that, with the exception of P4 depot paths and depot files, all text read by git-p4 is cast as a string as soon as possible and converted back to bytes as late as possible, following Python2 to Python3 conversion best practices.

Important: Do not cast the bytes that contain the p4 depot path or p4 depot file name.  These should be left as bytes until used.

These two values should not be converted because the encoding of these values is unknown.  git-p4 supports a configuration value git-p4.pathEncoding that is used by the encodeWithUTF8()  to determine what a UTF8 version of the path and filename should be.  However, since depot path and depot filename need to be sent to P4 in their original encoding, they will be left as byte streams until they are actually used:

* When sent to P4, the bytes are literally passed to the p4 command
* When displayed in text for the user, they should be passed through the path_as_string() function
* When used by GIT they should be passed through the encodeWithUTF8() function

Change all the rest of system calls to cast output (stdin) as_bytes() and input (stdout) as_string().  This retains existing Python 2 support, and adds python 3 support for these functions:
* read_pipe_full
* read_pipe_lines
* p4_has_move_command (used internally)
* gitConfig
* branch_exists
* GitLFS.generatePointer
* applyCommit - template must be read and written to the temporary file as_bytes() since it is created in memory as a string.
* streamOneP4File(file, contents) - wrap calls to the depotFile in path_as_string() for display. The file contents must be retained as bytes, so update the RCS changes to be forced to bytes.
* streamP4Files
* importHeadRevision(revision) - encode the depotPaths for display separate from the text for processing.

Py23File usage -
Change the P4Sync.OpenStreams() function to cast the gitOutput, gitStream, and gitError streams as Py23File() wrapper classes.  This facilitates taking strings in both python 2 and python 3 and casting them to bytes in the wrapper class instead of having to modify each method. Since the fast-import command also expects a raw byte stream for file content, add a new stream handle - gitStreamBytes which is an unwrapped verison of gitStream.

Literal text -
Depending on context, most literal text does not need casting to unicode or bytes as the text is Python dependent - In python 2, the string is implied as 'str' and python 3 the string is implied as 'unicode'. Under these conditions, they match the rest of the operating text, following best practices.  However, when a literal string is used in functions that are dealing with the raw input from and raw ouput to files streams, literal bytes may be required. Additionally, functions that are dealing with P4 depot paths or P4 depot file names are also dealing with bytes and will require the same casting as bytes.  The following functions cast text as byte strings:
* wildcard_decode(path) - the path parameter is a P4 depot and is bytes. Cast all the literals to bytes.
* wildcard_encode(path) - the path parameter is a P4 depot and is bytes. Cast all the literals to bytes.
* streamP4FilesCb(marshalled) - the marshalled data is in bytes. Cast the literals as bytes. When using this data to manipulate self.stream_file, encode all the marshalled data except for the 'depotFile' name.
* streamP4Files

Special behavior:
* p4_describe - encoding is disabled for the depotFile(x) and path elements since these are depot path and depo filenames.
* p4PathStartsWith(path, prefix) - Since P4 depot paths can contain non-UTF-8 encoded strings, change this method to compare paths while supporting the optional encoding.
   - First, perform a byte-to-byte check to see if the path and prefix are both identical text.  There is no need to perform encoding conversions if the text is identical.
   - If the byte check fails, pass both the path and prefix through encodeWithUTF8() to ensure both paths are using the same encoding. Then perform the test as originally written.
* patchRCSKeywords(file, pattern) - the parameters of file and pattern are both strings. However this function changes the contents of the file itentified by name "file". Treat the content of this file as binary to ensure that python does not accidently change the original encoding. The regular expression is cast as_bytes() and run against the file as_bytes(). The P4 keywords are ASCII strings and cannot span lines so iterating over each line of the file is acceptable.
* writeToGitStream(gitMode, relPath, contents) - Since 'contents' is already bytes data, instead of using the self.gitStream, use the new self.gitStreamBytes - the unwrapped gitStream that does not cast as_bytes() the binary data.
* commit(details, files, branch, parent = "", allow_empty=False) - Changed the encoding for the commit message to the preferred format for fast-import. The number of bytes is sent in the data block instead of using the EOT marker.
* Change the code for handling the user cache to use binary files. Cast text as_bytes() when writing to the cache and as_string() when reading from the cache.  This makes the reading and writing of the cache determinstic in it's encoding. Unlike file paths, P4 encodes the user names in UTF-8 encoding so no additional string encoding is required.

Signed-off-by: Ben Keene <seraphire@gmail.com>
(cherry picked from commit 65ff0c74ebe62a200b4385ecfd4aa618ce091f48)
---
 git-p4.py | 287 ++++++++++++++++++++++++++++++++++++++----------------
 1 file changed, 205 insertions(+), 82 deletions(-)

diff --git a/git-p4.py b/git-p4.py
index f13e4645a3..05db2ec657 100755
--- a/git-p4.py
+++ b/git-p4.py
@@ -268,6 +268,8 @@ def read_pipe_full(c):
     expand = not isinstance(c, list)
     p = subprocess.Popen(c, stdout=subprocess.PIPE, stderr=subprocess.PIPE, shell=expand)
     (out, err) = p.communicate()
+    out = as_string(out)
+    err = as_string(err)
     return (p.returncode, out, err)
 
 def read_pipe(c, ignore_error=False):
@@ -294,10 +296,17 @@ def read_pipe_text(c):
         return out.rstrip()
 
 def p4_read_pipe(c, ignore_error=False):
+    """ Read output from the P4 command 'c'. Returns the output text on
+        success. On failure, terminates execution, unless
+        ignore_error is True, when it returns an empty string.
+    """
     real_cmd = p4_build_cmd(c)
     return read_pipe(real_cmd, ignore_error)
 
 def read_pipe_lines(c):
+    """ Returns a list of text from executing the command 'c'.
+        The program will die if the command fails to execute.
+    """
     if verbose:
         sys.stderr.write('Reading pipe: %s\n' % str(c))
 
@@ -307,6 +316,11 @@ def read_pipe_lines(c):
     val = pipe.readlines()
     if pipe.close() or p.wait():
         die('Command failed: %s' % str(c))
+    # Unicode conversion from byte-string
+    # Iterate and fix in-place to avoid a second list in memory.
+    if isunicode:
+        for i in range(len(val)):
+            val[i] = as_string(val[i])
 
     return val
 
@@ -335,6 +349,8 @@ def p4_has_move_command():
     cmd = p4_build_cmd(["move", "-k", "@from", "@to"])
     p = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
     (out, err) = p.communicate()
+    out=as_string(out)
+    err=as_string(err)
     # return code will be 1 in either case
     if err.find("Invalid option") >= 0:
         return False
@@ -462,16 +478,20 @@ def p4_last_change():
     return int(results[0]['change'])
 
 def p4_describe(change, shelved=False):
-    """Make sure it returns a valid result by checking for
-       the presence of field "time".  Return a dict of the
-       results."""
+    """ Returns information about the requested P4 change list.
+
+        Data returned is not string encoded (returned as bytes)
+    """
+    # Make sure it returns a valid result by checking for
+    #   the presence of field "time".  Return a dict of the
+    #   results.
 
     cmd = ["describe", "-s"]
     if shelved:
         cmd += ["-S"]
     cmd += [str(change)]
 
-    ds = p4CmdList(cmd, skip_info=True)
+    ds = p4CmdList(cmd, skip_info=True, encode_data=False)
     if len(ds) != 1:
         die("p4 describe -s %d did not return 1 result: %s" % (change, str(ds)))
 
@@ -481,12 +501,23 @@ def p4_describe(change, shelved=False):
         die("p4 describe -s %d exited with %d: %s" % (change, d["p4ExitCode"],
                                                       str(d)))
     if "code" in d:
-        if d["code"] == "error":
+        if d["code"] == b"error":
             die("p4 describe -s %d returned error code: %s" % (change, str(d)))
 
     if "time" not in d:
         die("p4 describe -s %d returned no \"time\": %s" % (change, str(d)))
 
+    # Do not convert 'depotFile(X)' or 'path' to be UTF-8 encoded, however 
+    # cast as_string() the rest of the text. 
+    keys=d.keys()
+    for key in keys:
+        if key.startswith('depotFile'):
+            d[key]=d[key] 
+        elif key == 'path':
+            d[key]=d[key] 
+        else:
+            d[key] = as_string(d[key])
+
     return d
 
 #
@@ -800,6 +831,8 @@ def p4CmdList(cmd, stdin=None, stdin_mode='w+b', cb=None, skip_info=False,
     return result
 
 def p4Cmd(cmd):
+    """ Executes a P4 command and returns the results in a dictionary
+    """
     list = p4CmdList(cmd)
     result = {}
     for entry in list:
@@ -908,13 +941,15 @@ def gitDeleteRef(ref):
 _gitConfig = {}
 
 def gitConfig(key, typeSpecifier=None):
+    """ Return a configuration setting from GIT
+	"""
     if key not in _gitConfig:
         cmd = [ "git", "config" ]
         if typeSpecifier:
             cmd += [ typeSpecifier ]
         cmd += [ key ]
         s = read_pipe(cmd, ignore_error=True)
-        _gitConfig[key] = s.strip()
+        _gitConfig[key] = as_string(s).strip()
     return _gitConfig[key]
 
 def gitConfigBool(key):
@@ -988,6 +1023,7 @@ def branch_exists(branch):
     cmd = [ "git", "rev-parse", "--symbolic", "--verify", branch ]
     p = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
     out, _ = p.communicate()
+    out = as_string(out)
     if p.returncode:
         return False
     # expect exactly one line of output: the branch name
@@ -1171,9 +1207,22 @@ def p4PathStartsWith(path, prefix):
     #
     # we may or may not have a problem. If you have core.ignorecase=true,
     # we treat DirA and dira as the same directory
+    
+    # Since we have to deal with mixed encodings for p4 file
+    # paths, first perform a simple startswith check, this covers
+    # the case that the formats and path are identical.
+    if as_bytes(path).startswith(as_bytes(prefix)):
+        return True
+    
+    # attempt to convert the prefix and path both to utf8
+    path_utf8 = encodeWithUTF8(path)
+    prefix_utf8 = encodeWithUTF8(prefix)
+
     if gitConfigBool("core.ignorecase"):
-        return path.lower().startswith(prefix.lower())
-    return path.startswith(prefix)
+        # Check if we match byte-per-byte.  
+        
+        return path_utf8.lower().startswith(prefix_utf8.lower())
+    return path_utf8.startswith(prefix_utf8)
 
 def getClientSpec():
     """Look at the p4 client spec, create a View() object that contains
@@ -1229,18 +1278,24 @@ def wildcard_decode(path):
     # Cannot have * in a filename in windows; untested as to
     # what p4 would do in such a case.
     if not platform.system() == "Windows":
-        path = path.replace("%2A", "*")
-    path = path.replace("%23", "#") \
-               .replace("%40", "@") \
-               .replace("%25", "%")
+        path = path.replace(b"%2A", b"*")
+    path = path.replace(b"%23", b"#") \
+               .replace(b"%40", b"@") \
+               .replace(b"%25", b"%")
     return path
 
 def wildcard_encode(path):
     # do % first to avoid double-encoding the %s introduced here
-    path = path.replace("%", "%25") \
-               .replace("*", "%2A") \
-               .replace("#", "%23") \
-               .replace("@", "%40")
+    if isinstance(path, unicode):
+        path = path.replace("%", "%25") \
+                   .replace("*", "%2A") \
+                   .replace("#", "%23") \
+                   .replace("@", "%40")
+    else:
+        path = path.replace(b"%", b"%25") \
+                   .replace(b"*", b"%2A") \
+                   .replace(b"#", b"%23") \
+                   .replace(b"@", b"%40")
     return path
 
 def wildcard_present(path):
@@ -1372,7 +1427,7 @@ def generatePointer(self, contentFile):
             ['git', 'lfs', 'pointer', '--file=' + contentFile],
             stdout=subprocess.PIPE
         )
-        pointerFile = pointerProcess.stdout.read()
+        pointerFile = as_string(pointerProcess.stdout.read())
         if pointerProcess.wait():
             os.remove(contentFile)
             die('git-lfs pointer command failed. Did you install the extension?')
@@ -1479,6 +1534,8 @@ def getUserCacheFilename(self):
         return os.path.join(home, ".gitp4-usercache.txt")
 
     def getUserMapFromPerforceServer(self):
+        """ Creates the usercache from the data in P4.
+        """
         if self.userMapFromPerforceServer:
             return
         self.users = {}
@@ -1504,18 +1561,22 @@ def getUserMapFromPerforceServer(self):
         for (key, val) in list(self.users.items()):
             s += "%s\t%s\n" % (key.expandtabs(1), val.expandtabs(1))
 
-        open(self.getUserCacheFilename(), "wb").write(s)
+        cache = io.open(self.getUserCacheFilename(), "wb")
+        cache.write(as_bytes(s))
+        cache.close()
         self.userMapFromPerforceServer = True
 
     def loadUserMapFromCache(self):
+        """ Reads the P4 username to git email map 
+        """
         self.users = {}
         self.userMapFromPerforceServer = False
         try:
-            cache = open(self.getUserCacheFilename(), "rb")
+            cache = io.open(self.getUserCacheFilename(), "rb")
             lines = cache.readlines()
             cache.close()
             for line in lines:
-                entry = line.strip().split("\t")
+                entry = as_string(line).strip().split("\t")
                 self.users[entry[0]] = entry[1]
         except IOError:
             self.getUserMapFromPerforceServer()
@@ -1715,21 +1776,27 @@ def prepareLogMessage(self, template, message, jobs):
         return result
 
     def patchRCSKeywords(self, file, pattern):
-        # Attempt to zap the RCS keywords in a p4 controlled file matching the given pattern
+        """ Attempt to zap the RCS keywords in a p4 
+            controlled file matching the given pattern
+        """
+        bSubLine = as_bytes(r'$\1$')
         (handle, outFileName) = tempfile.mkstemp(dir='.')
         try:
-            outFile = os.fdopen(handle, "w+")
-            inFile = open(file, "r")
-            regexp = re.compile(pattern, re.VERBOSE)
+            outFile = os.fdopen(handle, "w+b")
+            inFile = open(file, "rb")
+            regexp = re.compile(as_bytes(pattern), re.VERBOSE)
             for line in inFile.readlines():
-                line = regexp.sub(r'$\1$', line)
+                line = regexp.sub(bSubLine, line)
                 outFile.write(line)
             inFile.close()
             outFile.close()
+            outFile = None
             # Forcibly overwrite the original file
             os.unlink(file)
             shutil.move(outFileName, file)
         except:
+            if outFile != None:
+                outFile.close()
             # cleanup our temporary file
             os.unlink(outFileName)
             print("Failed to strip RCS keywords in %s" % file)
@@ -2149,7 +2216,7 @@ def applyCommit(self, id):
         tmpFile = os.fdopen(handle, "w+b")
         if self.isWindows:
             submitTemplate = submitTemplate.replace("\n", "\r\n")
-        tmpFile.write(submitTemplate)
+        tmpFile.write(as_bytes(submitTemplate))
         tmpFile.close()
 
         if self.prepare_p4_only:
@@ -2199,8 +2266,8 @@ def applyCommit(self, id):
                 message = tmpFile.read()
                 tmpFile.close()
                 if self.isWindows:
-                    message = message.replace("\r\n", "\n")
-                submitTemplate = message[:message.index(separatorLine)]
+                    message = message.replace(b"\r\n", b"\n")
+                submitTemplate = message[:message.index(as_bytes(separatorLine))]
 
                 if update_shelve:
                     p4_write_pipe(['shelve', '-r', '-i'], submitTemplate)
@@ -2843,8 +2910,11 @@ def stripRepoPath(self, path, prefixes):
         return path
 
     def splitFilesIntoBranches(self, commit):
-        """Look at each depotFile in the commit to figure out to what
-           branch it belongs."""
+        """ Look at each depotFile in the commit to figure out to what
+            branch it belongs.
+
+            Data in the commit will NOT be encoded
+        """
 
         if self.clientSpecDirs:
             files = self.extractFilesFromCommit(commit)
@@ -2885,16 +2955,22 @@ def splitFilesIntoBranches(self, commit):
         return branches
 
     def writeToGitStream(self, gitMode, relPath, contents):
-        self.gitStream.write('M %s inline %s\n' % (gitMode, relPath))
+        """ Writes the bytes[] 'contents' to the git fast-import
+            with the given 'gitMode' and 'relPath' as the relative
+            path.
+        """
+        self.gitStream.write('M %s inline %s\n' % (gitMode, as_string(relPath)))
         self.gitStream.write('data %d\n' % sum(len(d) for d in contents))
         for d in contents:
-            self.gitStream.write(d)
+            self.gitStreamBytes.write(d)
         self.gitStream.write('\n')
 
-    # output one file from the P4 stream
-    # - helper for streamP4Files
-
     def streamOneP4File(self, file, contents):
+        """ output one file from the P4 stream to the git inbound stream.
+            helper for streamP4files.
+
+            contents should be a bytes (bytes) 
+        """
         relPath = self.stripRepoPath(file['depotFile'], self.branchPrefixes)
         relPath = encodeWithUTF8(relPath, self.verbose)
         if verbose:
@@ -2902,7 +2978,7 @@ def streamOneP4File(self, file, contents):
                 size = int(self.stream_file['fileSize'])
             else:
                 size = 0 # deleted files don't get a fileSize apparently
-            sys.stdout.write('\r%s --> %s (%i MB)\n' % (file['depotFile'], relPath, size//1024//1024))
+            sys.stdout.write('\r%s --> %s (%i MB)\n' % (path_as_string(file['depotFile']), as_string(relPath), size//1024//1024))
             sys.stdout.flush()
 
         (type_base, type_mods) = split_p4_type(file["type"])
@@ -2920,7 +2996,7 @@ def streamOneP4File(self, file, contents):
                 # to nothing.  This causes p4 errors when checking out such
                 # a change, and errors here too.  Work around it by ignoring
                 # the bad symlink; hopefully a future change fixes it.
-                print("\nIgnoring empty symlink in %s" % file['depotFile'])
+                print("\nIgnoring empty symlink in %s" % path_as_string(file['depotFile']))
                 return
             elif data[-1] == '\n':
                 contents = [data[:-1]]
@@ -2960,16 +3036,16 @@ def streamOneP4File(self, file, contents):
             # Ideally, someday, this script can learn how to generate
             # appledouble files directly and import those to git, but
             # non-mac machines can never find a use for apple filetype.
-            print("\nIgnoring apple filetype file %s" % file['depotFile'])
+            print("\nIgnoring apple filetype file %s" % path_as_string(file['depotFile']))
             return
 
         # Note that we do not try to de-mangle keywords on utf16 files,
         # even though in theory somebody may want that.
-        pattern = p4_keywords_regexp_for_type(type_base, type_mods)
+        pattern = as_bytes(p4_keywords_regexp_for_type(type_base, type_mods))
         if pattern:
             regexp = re.compile(pattern, re.VERBOSE)
-            text = ''.join(contents)
-            text = regexp.sub(r'$\1$', text)
+            text = b''.join(contents)
+            text = regexp.sub(as_bytes(r'$\1$'), text)
             contents = [ text ]
 
         if self.largeFileSystem:
@@ -2988,15 +3064,19 @@ def streamOneP4Deletion(self, file):
         if self.largeFileSystem and self.largeFileSystem.isLargeFile(relPath):
             self.largeFileSystem.removeLargeFile(relPath)
 
-    # handle another chunk of streaming data
     def streamP4FilesCb(self, marshalled):
+        """ Callback function for recording P4 chunks of data for streaming 
+            into GIT.
+
+            marshalled data is bytes[] from the caller
+        """
 
         # catch p4 errors and complain
         err = None
-        if "code" in marshalled:
-            if marshalled["code"] == "error":
-                if "data" in marshalled:
-                    err = marshalled["data"].rstrip()
+        if b"code" in marshalled:
+            if marshalled[b"code"] == b"error":
+                if b"data" in marshalled:
+                    err = marshalled[b"data"].rstrip()
 
         if not err and 'fileSize' in self.stream_file:
             required_bytes = int((4 * int(self.stream_file["fileSize"])) - calcDiskFree())
@@ -3018,11 +3098,11 @@ def streamP4FilesCb(self, marshalled):
             # ignore errors, but make sure it exits first
             self.importProcess.wait()
             if f:
-                die("Error from p4 print for %s: %s" % (f, err))
+                die("Error from p4 print for %s: %s" % (path_as_string(f), err))
             else:
                 die("Error from p4 print: %s" % err)
 
-        if 'depotFile' in marshalled and self.stream_have_file_info:
+        if b'depotFile' in marshalled and self.stream_have_file_info:
             # start of a new file - output the old one first
             self.streamOneP4File(self.stream_file, self.stream_contents)
             self.stream_file = {}
@@ -3032,13 +3112,16 @@ def streamP4FilesCb(self, marshalled):
         # pick up the new file information... for the
         # 'data' field we need to append to our array
         for k in list(marshalled.keys()):
-            if k == 'data':
+            if k == b'data':
                 if 'streamContentSize' not in self.stream_file:
                     self.stream_file['streamContentSize'] = 0
-                self.stream_file['streamContentSize'] += len(marshalled['data'])
-                self.stream_contents.append(marshalled['data'])
+                self.stream_file['streamContentSize'] += len(marshalled[b'data'])
+                self.stream_contents.append(marshalled[b'data'])
             else:
-                self.stream_file[k] = marshalled[k]
+                if k == b'depotFile':
+                    self.stream_file[as_string(k)] = marshalled[k]
+                else:
+                    self.stream_file[as_string(k)] = as_string(marshalled[k])
 
         if (verbose and
             'streamContentSize' in self.stream_file and
@@ -3047,13 +3130,14 @@ def streamP4FilesCb(self, marshalled):
             size = int(self.stream_file["fileSize"])
             if size > 0:
                 progress = 100.0*self.stream_file['streamContentSize']/size
-                sys.stdout.write('\r%s %4.1f%% (%i MB)' % (self.stream_file['depotFile'], progress, int(size//1024//1024)))
+                sys.stdout.write('\r%s %4.1f%% (%i MB)' % (path_as_string(self.stream_file['depotFile']), progress, int(size//1024//1024)))
                 sys.stdout.flush()
 
         self.stream_have_file_info = True
 
-    # Stream directly from "p4 files" into "git fast-import"
     def streamP4Files(self, files):
+        """ Stream directly from "p4 files" into "git fast-import" 
+        """
         filesForCommit = []
         filesToRead = []
         filesToDelete = []
@@ -3074,7 +3158,7 @@ def streamP4Files(self, files):
             self.stream_contents = []
             self.stream_have_file_info = False
 
-            # curry self argument
+            # Callback for P4 command to collect file content
             def streamP4FilesCbSelf(entry):
                 self.streamP4FilesCb(entry)
 
@@ -3083,9 +3167,9 @@ def streamP4FilesCbSelf(entry):
                 if 'shelved_cl' in f:
                     # Handle shelved CLs using the "p4 print file@=N" syntax to print
                     # the contents
-                    fileArg = '%s@=%d' % (f['path'], f['shelved_cl'])
+                    fileArg = b'%s@=%d' % (f['path'], as_bytes(f['shelved_cl']))
                 else:
-                    fileArg = '%s#%s' % (f['path'], f['rev'])
+                    fileArg = b'%s#%s' % (f['path'], as_bytes(f['rev']))
 
                 fileArgs.append(fileArg)
 
@@ -3105,7 +3189,7 @@ def make_email(self, userid):
 
     def streamTag(self, gitStream, labelName, labelDetails, commit, epoch):
         """ Stream a p4 tag.
-        commit is either a git commit, or a fast-import mark, ":<p4commit>"
+            commit is either a git commit, or a fast-import mark, ":<p4commit>"
         """
 
         if verbose:
@@ -3177,7 +3261,22 @@ def commit(self, details, files, branch, parent = "", allow_empty=False):
                 .format(details['change']))
             return
 
+        # fast-import:
+        #'commit' SP <ref> LF
+	    #mark?
+	    #original-oid?
+	    #('author' (SP <name>)? SP LT <email> GT SP <when> LF)?
+	    #'committer' (SP <name>)? SP LT <email> GT SP <when> LF
+	    #('encoding' SP <encoding>)?
+	    #data
+	    #('from' SP <commit-ish> LF)?
+	    #('merge' SP <commit-ish> LF)*
+	    #(filemodify | filedelete | filecopy | filerename | filedeleteall | notemodify)*
+	    #LF?
+        
+        #'commit' - <ref> is the name of the branch to make the commit on
         self.gitStream.write("commit %s\n" % branch)
+        #'mark' SP :<idnum>
         self.gitStream.write("mark :%s\n" % details["change"])
         self.committedChanges.add(int(details["change"]))
         committer = ""
@@ -3187,19 +3286,29 @@ def commit(self, details, files, branch, parent = "", allow_empty=False):
 
         self.gitStream.write("committer %s\n" % committer)
 
-        self.gitStream.write("data <<EOT\n")
-        self.gitStream.write(details["desc"])
+        # Per https://git-scm.com/docs/git-fast-import
+        # The preferred method for creating the commit message is to supply the 
+        # byte count in the data method and not to use a Delimited format. 
+        # Collect all the text in the commit message into a single string and 
+        # compute the byte count.
+        commitText = details["desc"]
         if len(jobs) > 0:
-            self.gitStream.write("\nJobs: %s" % (' '.join(jobs)))
-
+            commitText += "\nJobs: %s" % (' '.join(jobs))
         if not self.suppress_meta_comment:
-            self.gitStream.write("\n[git-p4: depot-paths = \"%s\": change = %s" %
-                                (','.join(self.branchPrefixes), details["change"]))
-            if len(details['options']) > 0:
-                self.gitStream.write(": options = %s" % details['options'])
-            self.gitStream.write("]\n")
+            # coherce the path to the correct formatting in the branch prefixes as well.
+            dispPaths = []
+            for p in self.branchPrefixes:
+                dispPaths += [path_as_string(p)]
 
-        self.gitStream.write("EOT\n\n")
+            commitText += ("\n[git-p4: depot-paths = \"%s\": change = %s" %
+                                (','.join(dispPaths), details["change"]))
+            if len(details['options']) > 0:
+                commitText += (": options = %s" % details['options'])
+            commitText += "]"
+        commitText += "\n" 
+        self.gitStream.write("data %s\n" % len(as_bytes(commitText)))
+        self.gitStream.write(commitText)
+        self.gitStream.write("\n")
 
         if len(parent) > 0:
             if self.verbose:
@@ -3606,30 +3715,35 @@ def sync_origin_only(self):
                 system("git fetch origin")
 
     def importHeadRevision(self, revision):
-        print("Doing initial import of %s from revision %s into %s" % (' '.join(self.depotPaths), revision, self.branch))
-
+        # Re-encode depot text
+        dispPaths = []
+        utf8Paths = []
+        for p in self.depotPaths:
+            dispPaths += [path_as_string(p)]
+        print("Doing initial import of %s from revision %s into %s" % (' '.join(dispPaths), revision, self.branch))
         details = {}
         details["user"] = "git perforce import user"
-        details["desc"] = ("Initial import of %s from the state at revision %s\n"
-                           % (' '.join(self.depotPaths), revision))
+        details["desc"] = ("Initial import of %s from the state at revision %s\n" %
+                           (' '.join(dispPaths), revision))
         details["change"] = revision
         newestRevision = 0
+        del dispPaths
 
         fileCnt = 0
         fileArgs = ["%s...%s" % (p,revision) for p in self.depotPaths]
 
-        for info in p4CmdList(["files"] + fileArgs):
+        for info in p4CmdList(["files"] + fileArgs, encode_data = False):
 
-            if 'code' in info and info['code'] == 'error':
+            if 'code' in info and info['code'] == b'error':
                 sys.stderr.write("p4 returned an error: %s\n"
-                                 % info['data'])
-                if info['data'].find("must refer to client") >= 0:
+                                 % as_string(info['data']))
+                if info['data'].find(b"must refer to client") >= 0:
                     sys.stderr.write("This particular p4 error is misleading.\n")
                     sys.stderr.write("Perhaps the depot path was misspelled.\n");
                     sys.stderr.write("Depot path:  %s\n" % " ".join(self.depotPaths))
                 sys.exit(1)
             if 'p4ExitCode' in info:
-                sys.stderr.write("p4 exitcode: %s\n" % info['p4ExitCode'])
+                sys.stderr.write("p4 exitcode: %s\n" % as_string(info['p4ExitCode']))
                 sys.exit(1)
 
 
@@ -3642,8 +3756,10 @@ def importHeadRevision(self, revision):
                 #fileCnt = fileCnt + 1
                 continue
 
+            # Save all the file information, howerver do not translate the depotFile name at 
+            # this time. Leave that as bytes since the encoding may vary.
             for prop in ["depotFile", "rev", "action", "type" ]:
-                details["%s%s" % (prop, fileCnt)] = info[prop]
+                details["%s%s" % (prop, fileCnt)] = (info[prop] if prop == "depotFile" else as_string(info[prop]))
 
             fileCnt = fileCnt + 1
 
@@ -3663,13 +3779,18 @@ def importHeadRevision(self, revision):
             print(self.gitError.read())
 
     def openStreams(self):
+        """ Opens the fast import pipes.  Note that the git* streams are wrapped
+            to expect Unicode text.  To send a raw byte Array, use the importProcess
+            underlying port
+        """
         self.importProcess = subprocess.Popen(["git", "fast-import"],
                                               stdin=subprocess.PIPE,
                                               stdout=subprocess.PIPE,
                                               stderr=subprocess.PIPE);
-        self.gitOutput = self.importProcess.stdout
-        self.gitStream = self.importProcess.stdin
-        self.gitError = self.importProcess.stderr
+        self.gitOutput = Py23File(self.importProcess.stdout, verbose = self.verbose)
+        self.gitStream = Py23File(self.importProcess.stdin, verbose = self.verbose)
+        self.gitError = Py23File(self.importProcess.stderr, verbose = self.verbose)
+        self.gitStreamBytes = self.importProcess.stdin
 
     def closeStreams(self):
         self.gitStream.close()
@@ -4035,15 +4156,17 @@ def run(self, args):
             self.cloneDestination = depotPaths[-1]
             depotPaths = depotPaths[:-1]
 
+        dispPaths = []
         for p in depotPaths:
             if not p.startswith("//"):
                 sys.stderr.write('Depot paths must start with "//": %s\n' % p)
                 return False
+            dispPaths += [path_as_string(p)]
 
         if not self.cloneDestination:
             self.cloneDestination = self.defaultDestination(args)
 
-        print("Importing from %s into %s" % (', '.join(depotPaths), self.cloneDestination))
+        print("Importing from %s into %s" % (', '.join(dispPaths), path_as_string(self.cloneDestination)))
 
         if not os.path.exists(self.cloneDestination):
             os.makedirs(self.cloneDestination)
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 64+ messages in thread

* [PATCH v4 11/11] git-p4: Added --encoding parameter to p4 clone
  2019-12-04 22:29     ` [PATCH v4 00/11] git-p4.py: Cast byte strings to unicode strings in python3 Ben Keene via GitGitGadget
                         ` (9 preceding siblings ...)
  2019-12-04 22:29       ` [PATCH v4 10/11] git-p4: Support python3 for basic P4 clone, sync, and submit Ben Keene via GitGitGadget
@ 2019-12-04 22:29       ` Ben Keene via GitGitGadget
  2019-12-05  9:54       ` [PATCH v4 00/11] git-p4.py: Cast byte strings to unicode strings in python3 Luke Diamand
  2019-12-07 17:47       ` [PATCH v5 00/15] " Ben Keene via GitGitGadget
  12 siblings, 0 replies; 64+ messages in thread
From: Ben Keene via GitGitGadget @ 2019-12-04 22:29 UTC (permalink / raw)
  To: git; +Cc: Ben Keene, Junio C Hamano, Ben Keene

From: Ben Keene <seraphire@gmail.com>

The test t9822 did not have any tests that had encoded a directory name in ISO8859-1.

Additionally, to make it easier for the user to clone new repositories with a non-UTF-8 encoded path in P4, add a new parameter to p4clone "--encoding" that sets the

Add new tests that use ISO8859-1 encoded text in both the directory and file names.  git-p4.pathEncoding.

Update the View class in the git-p4 code to properly cast text as_string() except for depot path and filenames.

Update the documentation to include the new command line parameter for p4clone

Signed-off-by: Ben Keene <seraphire@gmail.com>
(cherry picked from commit e26f6309d60c6c1615320d4a9071935e23efe6fb)
---
 Documentation/git-p4.txt        |   5 ++
 git-p4.py                       |  61 +++++++++++++------
 t/t9822-git-p4-path-encoding.sh | 101 ++++++++++++++++++++++++++++++++
 3 files changed, 149 insertions(+), 18 deletions(-)

diff --git a/Documentation/git-p4.txt b/Documentation/git-p4.txt
index 3494a1db3e..f54af3c917 100644
--- a/Documentation/git-p4.txt
+++ b/Documentation/git-p4.txt
@@ -305,6 +305,11 @@ options described above.
 --bare::
 	Perform a bare clone.  See linkgit:git-clone[1].
 
+--encoding <encoding>::
+    Optionally sets the git-p4.pathEncoding configuration value in 
+	the newly created Git repository before files are synchronized 
+	from P4. See git-p4.pathEncoding for more information.
+
 Submit options
 ~~~~~~~~~~~~~~
 These options can be used to modify 'git p4 submit' behavior.
diff --git a/git-p4.py b/git-p4.py
index 05db2ec657..1f2e43430a 100755
--- a/git-p4.py
+++ b/git-p4.py
@@ -1228,7 +1228,7 @@ def getClientSpec():
     """Look at the p4 client spec, create a View() object that contains
        all the mappings, and return it."""
 
-    specList = p4CmdList("client -o")
+    specList = p4CmdList("client -o", encode_data=False)
     if len(specList) != 1:
         die('Output from "client -o" is %d lines, expecting 1' %
             len(specList))
@@ -1237,7 +1237,7 @@ def getClientSpec():
     entry = specList[0]
 
     # the //client/ name
-    client_name = entry["Client"]
+    client_name = as_string(entry["Client"])
 
     # just the keys that start with "View"
     view_keys = [ k for k in list(entry.keys()) if k.startswith("View") ]
@@ -2637,19 +2637,25 @@ def run(self, args):
         return True
 
 class View(object):
-    """Represent a p4 view ("p4 help views"), and map files in a
-       repo according to the view."""
+    """ Represent a p4 view ("p4 help views"), and map files in a
+        repo according to the view.
+    """
 
     def __init__(self, client_name):
         self.mappings = []
-        self.client_prefix = "//%s/" % client_name
+        # the client prefix is saved in bytes as it is used for comparison
+        # against server data.
+        self.client_prefix = as_bytes("//%s/" % client_name)
         # cache results of "p4 where" to lookup client file locations
         self.client_spec_path_cache = {}
 
     def append(self, view_line):
-        """Parse a view line, splitting it into depot and client
-           sides.  Append to self.mappings, preserving order.  This
-           is only needed for tag creation."""
+        """ Parse a view line, splitting it into depot and client
+            sides.  Append to self.mappings, preserving order.  This
+            is only needed for tag creation.
+
+            view_line should be in bytes (depot path encoding)
+        """
 
         # Split the view line into exactly two words.  P4 enforces
         # structure on these lines that simplifies this quite a bit.
@@ -2662,28 +2668,28 @@ def append(self, view_line):
         # The line is already white-space stripped.
         # The two words are separated by a single space.
         #
-        if view_line[0] == '"':
+        if view_line[0] == b'"':
             # First word is double quoted.  Find its end.
-            close_quote_index = view_line.find('"', 1)
+            close_quote_index = view_line.find(b'"', 1)
             if close_quote_index <= 0:
-                die("No first-word closing quote found: %s" % view_line)
+                die("No first-word closing quote found: %s" % path_as_string(view_line))
             depot_side = view_line[1:close_quote_index]
             # skip closing quote and space
             rhs_index = close_quote_index + 1 + 1
         else:
-            space_index = view_line.find(" ")
+            space_index = view_line.find(b" ")
             if space_index <= 0:
-                die("No word-splitting space found: %s" % view_line)
+                die("No word-splitting space found: %s" % path_as_string(view_line))
             depot_side = view_line[0:space_index]
             rhs_index = space_index + 1
 
         # prefix + means overlay on previous mapping
-        if depot_side.startswith("+"):
+        if depot_side.startswith(b"+"):
             depot_side = depot_side[1:]
 
         # prefix - means exclude this path, leave out of mappings
         exclude = False
-        if depot_side.startswith("-"):
+        if depot_side.startswith(b"-"):
             exclude = True
             depot_side = depot_side[1:]
 
@@ -2694,7 +2700,7 @@ def convert_client_path(self, clientFile):
         # chop off //client/ part to make it relative
         if not clientFile.startswith(self.client_prefix):
             die("No prefix '%s' on clientFile '%s'" %
-                (self.client_prefix, clientFile))
+                (as_string(self.client_prefix)), path_as_string(clientFile))
         return clientFile[len(self.client_prefix):]
 
     def update_client_spec_path_cache(self, files):
@@ -2706,9 +2712,9 @@ def update_client_spec_path_cache(self, files):
         if len(fileArgs) == 0:
             return  # All files in cache
 
-        where_result = p4CmdList(["-x", "-", "where"], stdin=fileArgs)
+        where_result = p4CmdList(["-x", "-", "where"], stdin=fileArgs, encode_data=False)
         for res in where_result:
-            if "code" in res and res["code"] == "error":
+            if "code" in res and res["code"] == b"error":
                 # assume error is "... file(s) not in client view"
                 continue
             if "clientFile" not in res:
@@ -4125,10 +4131,14 @@ def __init__(self):
                                  help="where to leave result of the clone"),
             optparse.make_option("--bare", dest="cloneBare",
                                  action="store_true", default=False),
+            optparse.make_option("--encoding", dest="setPathEncoding",
+                                 action="store", default=None,
+                                 help="Sets the path encoding for this depot")
         ]
         self.cloneDestination = None
         self.needsGit = False
         self.cloneBare = False
+        self.setPathEncoding = None
 
     def defaultDestination(self, args):
         """ Returns the last path component as the default git 
@@ -4152,6 +4162,14 @@ def run(self, args):
 
         depotPaths = args
 
+        # If we have an encoding provided, ignore what may already exist
+        # in the registry. This will ensure we show the displayed values
+        # using the correct encoding.
+        if self.setPathEncoding:
+            gitConfigSet("git-p4.pathEncoding", self.setPathEncoding)
+
+        # If more than 1 path element is supplied, the last element
+        # is the clone destination.
         if not self.cloneDestination and len(depotPaths) > 1:
             self.cloneDestination = depotPaths[-1]
             depotPaths = depotPaths[:-1]
@@ -4179,6 +4197,13 @@ def run(self, args):
         if retcode:
             raise CalledProcessError(retcode, init_cmd)
 
+        # Set the encoding if it was provided command line
+        if self.setPathEncoding:
+            init_cmd= ["git", "config", "git-p4.pathEncoding", self.setPathEncoding]
+            retcode = subprocess.call(init_cmd)
+            if retcode:
+                raise CalledProcessError(retcode, init_cmd)
+
         if not P4Sync.run(self, depotPaths):
             return False
 
diff --git a/t/t9822-git-p4-path-encoding.sh b/t/t9822-git-p4-path-encoding.sh
index 572d395498..cf8a15b2e4 100755
--- a/t/t9822-git-p4-path-encoding.sh
+++ b/t/t9822-git-p4-path-encoding.sh
@@ -4,9 +4,20 @@ test_description='Clone repositories with non ASCII paths'
 
 . ./lib-git-p4.sh
 
+# lowercase filename
+# UTF8    - HEX:   a-\xc3\xa4_o-\xc3\xb6_u-\xc3\xbc
+#         - octal: a-\303\244_o-\303\266_u-\303\274
+# ISO8859 - HEX:   a-\xe4_o-\xf6_u-\xfc
 UTF8_ESCAPED="a-\303\244_o-\303\266_u-\303\274.txt"
 ISO8859_ESCAPED="a-\344_o-\366_u-\374.txt"
 
+# lowercase directory
+# UTF8    - HEX:   dir_a-\xc3\xa4_o-\xc3\xb6_u-\xc3\xbc
+# ISO8859 - HEX:   dir_a-\xe4_o-\xf6_u-\xfc
+DIR_UTF8_ESCAPED="dir_a-\303\244_o-\303\266_u-\303\274"
+DIR_ISO8859_ESCAPED="dir_a-\344_o-\366_u-\374"
+
+
 ISO8859="$(printf "$ISO8859_ESCAPED")" &&
 echo content123 >"$ISO8859" &&
 rm "$ISO8859" || {
@@ -58,6 +69,22 @@ test_expect_success 'Clone repo containing iso8859-1 encoded paths with git-p4.p
 	)
 '
 
+test_expect_success 'Clone repo containing iso8859-1 encoded paths with using --encoding parameter' '
+	test_when_finished cleanup_git &&
+	(
+		git p4 clone --encoding iso8859 --destination="$git" //depot &&
+		cd "$git" &&
+		UTF8="$(printf "$UTF8_ESCAPED")" &&
+		echo "$UTF8" >expect &&
+		git -c core.quotepath=false ls-files >actual &&
+		test_cmp expect actual &&
+
+		echo content123 >expect &&
+		cat "$UTF8" >actual &&
+		test_cmp expect actual
+	)
+'
+
 test_expect_success 'Delete iso8859-1 encoded paths and clone' '
 	(
 		cd "$cli" &&
@@ -74,4 +101,78 @@ test_expect_success 'Delete iso8859-1 encoded paths and clone' '
 	)
 '
 
+# These tests will create a directory with ISO8859-1 characters in both the 
+# directory and the path.  Since it is possible to clone a path instead of using
+# the whole client-spec.  Check both versions:  client-spec and with a direct
+# path using --encoding
+test_expect_success 'Create a repo containing iso8859-1 encoded directory and filename' '
+	(
+		DIR_ISO8859="$(printf "$DIR_ISO8859_ESCAPED")" &&
+		ISO8859="$(printf "$ISO8859_ESCAPED")" &&
+		cd "$cli" &&
+		mkdir "$DIR_ISO8859" && 
+		cd "$DIR_ISO8859" &&
+		echo content123 >"$ISO8859" &&
+		p4 add "$ISO8859" &&
+		p4 submit -d "test commit (encoded directory)"
+	)
+'
+
+test_expect_success 'Clone repo containing iso8859-1 encoded depot path and files with git-p4.pathEncoding' '
+	test_when_finished cleanup_git &&
+	(
+		DIR_ISO8859="$(printf "$DIR_ISO8859_ESCAPED")" &&
+		DIR_UTF8="$(printf "$DIR_UTF8_ESCAPED")" &&
+		cd "$git" &&
+		git init . &&
+		git config git-p4.pathEncoding iso8859-1 &&
+		git p4 clone --use-client-spec --destination="$git" "//depot/$DIR_ISO8859" &&
+		cd "$DIR_UTF8" &&
+		UTF8="$(printf "$UTF8_ESCAPED")" &&
+		echo "$UTF8" >expect &&
+		git -c core.quotepath=false ls-files >actual &&
+		test_cmp expect actual &&
+
+		echo content123 >expect &&
+		cat "$UTF8" >actual &&
+		test_cmp expect actual
+	)
+'
+
+test_expect_success 'Clone repo containing iso8859-1 encoded depot path and files with git-p4.pathEncoding, without --use-client-spec' '
+	test_when_finished cleanup_git &&
+	(
+		DIR_ISO8859="$(printf "$DIR_ISO8859_ESCAPED")" &&
+		cd "$git" &&
+		git init . &&
+		git config git-p4.pathEncoding iso8859-1 &&
+		git p4 clone --destination="$git" "//depot/$DIR_ISO8859" &&
+		UTF8="$(printf "$UTF8_ESCAPED")" &&
+		echo "$UTF8" >expect &&
+		git -c core.quotepath=false ls-files >actual &&
+		test_cmp expect actual &&
+
+		echo content123 >expect &&
+		cat "$UTF8" >actual &&
+		test_cmp expect actual
+	)
+'
+
+test_expect_success 'Clone repo containing iso8859-1 encoded depot path and files with using --encoding parameter' '
+	test_when_finished cleanup_git &&
+	(
+		DIR_ISO8859="$(printf "$DIR_ISO8859_ESCAPED")" &&
+		git p4 clone --encoding iso8859 --destination="$git" "//depot/$DIR_ISO8859" &&
+		cd "$git" &&
+		UTF8="$(printf "$UTF8_ESCAPED")" &&
+		echo "$UTF8" >expect &&
+		git -c core.quotepath=false ls-files >actual &&
+		test_cmp expect actual &&
+
+		echo content123 >expect &&
+		cat "$UTF8" >actual &&
+		test_cmp expect actual
+	)
+'
+
 test_done
-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v4 00/11] git-p4.py: Cast byte strings to unicode strings in python3
  2019-12-04 22:29     ` [PATCH v4 00/11] git-p4.py: Cast byte strings to unicode strings in python3 Ben Keene via GitGitGadget
                         ` (10 preceding siblings ...)
  2019-12-04 22:29       ` [PATCH v4 11/11] git-p4: Added --encoding parameter to p4 clone Ben Keene via GitGitGadget
@ 2019-12-05  9:54       ` Luke Diamand
  2019-12-05 16:16         ` Ben Keene
  2019-12-07 17:47       ` [PATCH v5 00/15] " Ben Keene via GitGitGadget
  12 siblings, 1 reply; 64+ messages in thread
From: Luke Diamand @ 2019-12-05  9:54 UTC (permalink / raw)
  To: Ben Keene via GitGitGadget; +Cc: Git Users, Ben Keene, Junio C Hamano

On Wed, 4 Dec 2019 at 22:29, Ben Keene via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> Issue: The current git-p4.py script does not work with python3.
>
> I have attempted to use the P4 integration built into GIT and I was unable
> to get the program to run because I have Python 3.8 installed on my
> computer. I was able to get the program to run when I downgraded my python
> to version 2.7. However, python 2 is reaching its end of life.
>
> Submission: I am submitting a patch for the git-p4.py script that partially
> supports python 3.8. This code was able to pass the basic tests (t9800) when
> run against Python3. This provides basic functionality.
>
> In an attempt to pass the t9822 P4 path-encoding test, a new parameter for
> git P4 Clone was introduced.
>
> --encoding Format-identifier
>
> This will create the GIT repository following the current functionality;
> however, before importing the files from P4, it will set the
> git-p4.pathEncoding option so any files or paths that are encoded with
> non-ASCII/non-UTF-8 formats will import correctly.
>
> Technical details: The script was updated by futurize (
> https://python-future.org/futurize.html) to support Py2/Py3 syntax. The few
> references to classes in future were reworked so that future would not be
> required. The existing code test for Unicode support was extended to
> normalize the classes “unicode” and “bytes” to across platforms:
>
>  * ‘unicode’ is an alias for ‘str’ in Py3 and is the unicode class in Py2.
>  * ‘bytes’ is bytes in Py3 and an alias for ‘str’ in Py2.
>
> New coercion methods were written for both Python2 and Python3:
>
>  * as_string(text) – In Python3, this encodes a bytes object as a UTF-8
>    encoded Unicode string.
>  * as_bytes(text) – In Python3, this decodes a Unicode string to an array of
>    bytes.
>
> In Python2, these functions do not change the data since a ‘str’ object
> function in both roles as strings and byte arrays. This reduces the
> potential impact on backward compatibility with Python 2.
>
>  * to_unicode(text) – ensures that the supplied data is encoded as a UTF-8
>    string. This function will encode data in both Python2 and Python3. *
>       path_as_string(path) – This function is an extension function that
>       honors the option “git-p4.pathEncoding” to convert a set of bytes or
>       characters to UTF-8. If the str/bytes cannot decode as ASCII, it will
>       use the encodeWithUTF8() method to convert the custom encoded bytes to
>       Unicode in UTF-8.
>
>
>
> Generally speaking, information in the script is converted to Unicode as
> early as possible and converted back to a byte array just before passing to
> external programs or files. The exception to this rule is P4 Repository file
> paths.
>
> Paths are not converted but left as “bytes” so the original file path
> encoding can be preserved. This formatting is required for commands that
> interact with the P4 file path. When the file path is used by GIT, it is
> converted with encodeWithUTF8().
>

Almost all the tests pass now - nice!

(There's one test that fails for me, t9830-git-p4-symlink-dir.sh).

Nitpicking:

- There are some bits of trailing whitespace around - can you strip
those out? You can use "git diff --check".
- Also I think the convention for git commits is that they be limited
to 72 (?) characters.
- In 10dc commit message, s/behvior/behavior
- Maybe submit 4fc4 as a separate patch series? It doesn't seem
directly related to your python3 changes.
- s/howerver/however/

The comment at line 3261 (showing the fast-import syntax) has wonky
indentation, and needs a space after the '#'.

This code looked like we're duplicating stuff:

+    if isinstance(path, unicode):
+        path = path.replace("%", "%25") \
+                   .replace("*", "%2A") \
+                   .replace("#", "%23") \
+                   .replace("@", "%40")
+    else:
+        path = path.replace(b"%", b"%25") \
+                   .replace(b"*", b"%2A") \
+                   .replace(b"#", b"%23") \
+                   .replace(b"@", b"%40")

I wonder if we can have a helper to do this?

In patchRCSKeywords() you've added code to cleanup outFile. But I
wonder if we could just use a 'finally' block, or a contextexpr ("with
blah as outFile:")

I don't know if it's worth doing now that you've got it going, but at
one point I tried simplifying code like this:

   path_as_string(file['depotFile'])
and
   marshalled[b'data']

by using a dictionary with overloaded operators which would do the
bytes/string conversion automatically. However, your approach isn't
actually _that_ invasive, so maybe this is not necessary.

Looks good though, thanks!
Luke






> Signed-off-by: Ben Keene seraphire@gmail.com [seraphire@gmail.com]
>
> Ben Keene (11):
>   git-p4: select p4 binary by operating-system
>   git-p4: change the expansion test from basestring to list
>   git-p4: add new helper functions for python3 conversion
>   git-p4: python3 syntax changes
>   git-p4: Add new functions in preparation of usage
>   git-p4: Fix assumed path separators to be more Windows friendly
>   git-p4: Add a helper class for stream writing
>   git-p4: p4CmdList  - support Unicode encoding
>   git-p4: Add usability enhancements
>   git-p4: Support python3 for basic P4 clone, sync, and submit
>   git-p4: Added --encoding parameter to p4 clone
>
>  Documentation/git-p4.txt        |   5 +
>  git-p4.py                       | 690 ++++++++++++++++++++++++--------
>  t/t9822-git-p4-path-encoding.sh | 101 +++++
>  3 files changed, 629 insertions(+), 167 deletions(-)
>
>
> base-commit: 228f53135a4a41a37b6be8e4d6e2b6153db4a8ed
> Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-463%2Fseraphire%2Fseraphire%2Fp4-python3-unicode-v4
> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-463/seraphire/seraphire/p4-python3-unicode-v4
> Pull-Request: https://github.com/gitgitgadget/git/pull/463
>
> Range-diff vs v3:
>
>   -:  ---------- >  1:  4012426993 git-p4: select p4 binary by operating-system
>   -:  ---------- >  2:  0ef2f56b04 git-p4: change the expansion test from basestring to list
>   -:  ---------- >  3:  f0e658b984 git-p4: add new helper functions for python3 conversion
>   -:  ---------- >  4:  3c41db3e91 git-p4: python3 syntax changes
>   -:  ---------- >  5:  1bf7b073b0 git-p4: Add new functions in preparation of usage
>   -:  ---------- >  6:  8f5752c127 git-p4: Fix assumed path separators to be more Windows friendly
>   -:  ---------- >  7:  10dc059444 git-p4: Add a helper class for stream writing
>   -:  ---------- >  8:  e1a424a955 git-p4: p4CmdList  - support Unicode encoding
>   -:  ---------- >  9:  4fc49313f0 git-p4: Add usability enhancements
>   1:  02b3843e9f ! 10:  04a0aedbaa Python3 support for t9800 tests. Basic P4/Python3 support
>      @@ -1,159 +1,60 @@
>       Author: Ben Keene <seraphire@gmail.com>
>
>      -    Python3 support for t9800 tests. Basic P4/Python3 support
>      +    git-p4: Support python3 for basic P4 clone, sync, and submit
>      +
>      +    Issue: Python 3 is still not properly supported for any use with the git-p4 python code.
>      +    Warning - this is a very large atomic commit.  The commit text is also very large.
>      +
>      +    Change the code such that, with the exception of P4 depot paths and depot files, all text read by git-p4 is cast as a string as soon as possible and converted back to bytes as late as possible, following Python2 to Python3 conversion best practices.
>      +
>      +    Important: Do not cast the bytes that contain the p4 depot path or p4 depot file name.  These should be left as bytes until used.
>      +
>      +    These two values should not be converted because the encoding of these values is unknown.  git-p4 supports a configuration value git-p4.pathEncoding that is used by the encodeWithUTF8()  to determine what a UTF8 version of the path and filename should be.  However, since depot path and depot filename need to be sent to P4 in their original encoding, they will be left as byte streams until they are actually used:
>      +
>      +    * When sent to P4, the bytes are literally passed to the p4 command
>      +    * When displayed in text for the user, they should be passed through the path_as_string() function
>      +    * When used by GIT they should be passed through the encodeWithUTF8() function
>      +
>      +    Change all the rest of system calls to cast output (stdin) as_bytes() and input (stdout) as_string().  This retains existing Python 2 support, and adds python 3 support for these functions:
>      +    * read_pipe_full
>      +    * read_pipe_lines
>      +    * p4_has_move_command (used internally)
>      +    * gitConfig
>      +    * branch_exists
>      +    * GitLFS.generatePointer
>      +    * applyCommit - template must be read and written to the temporary file as_bytes() since it is created in memory as a string.
>      +    * streamOneP4File(file, contents) - wrap calls to the depotFile in path_as_string() for display. The file contents must be retained as bytes, so update the RCS changes to be forced to bytes.
>      +    * streamP4Files
>      +    * importHeadRevision(revision) - encode the depotPaths for display separate from the text for processing.
>      +
>      +    Py23File usage -
>      +    Change the P4Sync.OpenStreams() function to cast the gitOutput, gitStream, and gitError streams as Py23File() wrapper classes.  This facilitates taking strings in both python 2 and python 3 and casting them to bytes in the wrapper class instead of having to modify each method. Since the fast-import command also expects a raw byte stream for file content, add a new stream handle - gitStreamBytes which is an unwrapped verison of gitStream.
>      +
>      +    Literal text -
>      +    Depending on context, most literal text does not need casting to unicode or bytes as the text is Python dependent - In python 2, the string is implied as 'str' and python 3 the string is implied as 'unicode'. Under these conditions, they match the rest of the operating text, following best practices.  However, when a literal string is used in functions that are dealing with the raw input from and raw ouput to files streams, literal bytes may be required. Additionally, functions that are dealing with P4 depot paths or P4 depot file names are also dealing with bytes and will require the same casting as bytes.  The following functions cast text as byte strings:
>      +    * wildcard_decode(path) - the path parameter is a P4 depot and is bytes. Cast all the literals to bytes.
>      +    * wildcard_encode(path) - the path parameter is a P4 depot and is bytes. Cast all the literals to bytes.
>      +    * streamP4FilesCb(marshalled) - the marshalled data is in bytes. Cast the literals as bytes. When using this data to manipulate self.stream_file, encode all the marshalled data except for the 'depotFile' name.
>      +    * streamP4Files
>      +
>      +    Special behavior:
>      +    * p4_describe - encoding is disabled for the depotFile(x) and path elements since these are depot path and depo filenames.
>      +    * p4PathStartsWith(path, prefix) - Since P4 depot paths can contain non-UTF-8 encoded strings, change this method to compare paths while supporting the optional encoding.
>      +       - First, perform a byte-to-byte check to see if the path and prefix are both identical text.  There is no need to perform encoding conversions if the text is identical.
>      +       - If the byte check fails, pass both the path and prefix through encodeWithUTF8() to ensure both paths are using the same encoding. Then perform the test as originally written.
>      +    * patchRCSKeywords(file, pattern) - the parameters of file and pattern are both strings. However this function changes the contents of the file itentified by name "file". Treat the content of this file as binary to ensure that python does not accidently change the original encoding. The regular expression is cast as_bytes() and run against the file as_bytes(). The P4 keywords are ASCII strings and cannot span lines so iterating over each line of the file is acceptable.
>      +    * writeToGitStream(gitMode, relPath, contents) - Since 'contents' is already bytes data, instead of using the self.gitStream, use the new self.gitStreamBytes - the unwrapped gitStream that does not cast as_bytes() the binary data.
>      +    * commit(details, files, branch, parent = "", allow_empty=False) - Changed the encoding for the commit message to the preferred format for fast-import. The number of bytes is sent in the data block instead of using the EOT marker.
>      +    * Change the code for handling the user cache to use binary files. Cast text as_bytes() when writing to the cache and as_string() when reading from the cache.  This makes the reading and writing of the cache determinstic in it's encoding. Unlike file paths, P4 encodes the user names in UTF-8 encoding so no additional string encoding is required.
>
>           Signed-off-by: Ben Keene <seraphire@gmail.com>
>      +    (cherry picked from commit 65ff0c74ebe62a200b4385ecfd4aa618ce091f48)
>
>        diff --git a/git-p4.py b/git-p4.py
>        --- a/git-p4.py
>        +++ b/git-p4.py
>       @@
>      - import zlib
>      - import ctypes
>      - import errno
>      -+import os.path
>      -+import codecs
>      -+import io
>      -
>      - # support basestring in python3
>      - try:
>      -     unicode = unicode
>      - except NameError:
>      -     # 'unicode' is undefined, must be Python 3
>      --    str = str
>      -+    #
>      -+    # For Python3 which is natively unicode, we will use
>      -+    # unicode for internal information but all P4 Data
>      -+    # will remain in bytes
>      -+    isunicode = True
>      -     unicode = str
>      -     bytes = bytes
>      --    basestring = (str,bytes)
>      -+
>      -+    def as_string(text):
>      -+        """Return a byte array as a unicode string"""
>      -+        if text == None:
>      -+            return None
>      -+        if isinstance(text, bytes):
>      -+            return unicode(text, "utf-8")
>      -+        else:
>      -+            return text
>      -+
>      -+    def as_bytes(text):
>      -+        """Return a Unicode string as a byte array"""
>      -+        if text == None:
>      -+            return None
>      -+        if isinstance(text, bytes):
>      -+            return text
>      -+        else:
>      -+            return bytes(text, "utf-8")
>      -+
>      -+    def to_unicode(text):
>      -+        """Return a byte array as a unicode string"""
>      -+        return as_string(text)
>      -+
>      -+    def path_as_string(path):
>      -+        """ Converts a path to the UTF8 encoded string """
>      -+        if isinstance(path, unicode):
>      -+            return path
>      -+        return encodeWithUTF8(path).decode('utf-8')
>      -+
>      - else:
>      -     # 'unicode' exists, must be Python 2
>      --    str = str
>      -+    #
>      -+    # We will treat the data as:
>      -+    #   str   -> str
>      -+    #   bytes -> str
>      -+    # So for Python2 these functions are no-ops
>      -+    # and will leave the data in the ambiguious
>      -+    # string/bytes state
>      -+    isunicode = False
>      -     unicode = unicode
>      -     bytes = str
>      --    basestring = basestring
>      -+
>      -+    def as_string(text):
>      -+        """ Return text unaltered (for Python3 support) """
>      -+        return text
>      -+
>      -+    def as_bytes(text):
>      -+        """ Return text unaltered (for Python3 support) """
>      -+        return text
>      -+
>      -+    def to_unicode(text):
>      -+        """Return a string as a unicode string"""
>      -+        return text.decode('utf-8')
>      -+
>      -+    def path_as_string(path):
>      -+        """ Converts a path to the UTF8 encoded bytes """
>      -+        return encodeWithUTF8(path)
>      -+
>      -+
>      -+
>      -+# Check for raw_input support
>      -+try:
>      -+    raw_input
>      -+except NameError:
>      -+    raw_input = input
>      -
>      - try:
>      -     from subprocess import CalledProcessError
>      -@@
>      -     location. It means that hooking into the environment, or other configuration
>      -     can be done more easily.
>      -     """
>      --    real_cmd = ["p4"]
>      -+    # Look for the P4 binary
>      -+    if (platform.system() == "Windows"):
>      -+        real_cmd = ["p4.exe"]
>      -+    else:
>      -+        real_cmd = ["p4"]
>      -
>      -     user = gitConfig("git-p4.user")
>      -     if len(user) > 0:
>      -@@
>      -         # Provide a way to not pass this option by setting git-p4.retries to 0
>      -         real_cmd += ["-r", str(retries)]
>      -
>      --    if isinstance(cmd,basestring):
>      -+    if not isinstance(cmd, list):
>      -         real_cmd = ' '.join(real_cmd) + ' ' + cmd
>      -     else:
>      -         real_cmd += cmd
>      -@@
>      -         sys.exit(1)
>      -
>      - def write_pipe(c, stdin):
>      -+    """Executes the command 'c', passing 'stdin' on the standard input"""
>      -     if verbose:
>      -         sys.stderr.write('Writing pipe: %s\n' % str(c))
>      -
>      --    expand = isinstance(c,basestring)
>      -+    expand = not isinstance(c, list)
>      -     p = subprocess.Popen(c, stdin=subprocess.PIPE, shell=expand)
>      -     pipe = p.stdin
>      -     val = pipe.write(stdin)
>      -@@
>      -     if p.wait():
>      -         die('Command failed: %s' % str(c))
>      -
>      --    return val
>      -
>      - def p4_write_pipe(c, stdin):
>      -+    """ Runs a P4 command 'c', passing 'stdin' data to P4"""
>      -     real_cmd = p4_build_cmd(c)
>      --    return write_pipe(real_cmd, stdin)
>      -+    write_pipe(real_cmd, stdin)
>      -
>      - def read_pipe_full(c):
>      -     """ Read output from  command. Returns a tuple
>      -@@
>      -     if verbose:
>      -         sys.stderr.write('Reading pipe: %s\n' % str(c))
>      -
>      --    expand = isinstance(c,basestring)
>      -+    expand = not isinstance(c, list)
>      +     expand = not isinstance(c, list)
>            p = subprocess.Popen(c, stdout=subprocess.PIPE, stderr=subprocess.PIPE, shell=expand)
>            (out, err) = p.communicate()
>       +    out = as_string(out)
>      @@ -179,10 +80,7 @@
>            if verbose:
>                sys.stderr.write('Reading pipe: %s\n' % str(c))
>
>      --    expand = isinstance(c, basestring)
>      -+    expand = not isinstance(c, list)
>      -     p = subprocess.Popen(c, stdout=subprocess.PIPE, shell=expand)
>      -     pipe = p.stdout
>      +@@
>            val = pipe.readlines()
>            if pipe.close() or p.wait():
>                die('Command failed: %s' % str(c))
>      @@ -203,28 +101,6 @@
>            # return code will be 1 in either case
>            if err.find("Invalid option") >= 0:
>                return False
>      -@@
>      -     return True
>      -
>      - def system(cmd, ignore_error=False):
>      --    expand = isinstance(cmd,basestring)
>      -+    expand = not isinstance(cmd, list)
>      -     if verbose:
>      -         sys.stderr.write("executing %s\n" % str(cmd))
>      -     retcode = subprocess.call(cmd, shell=expand)
>      -@@
>      -     return retcode
>      -
>      - def p4_system(cmd):
>      --    """Specifically invoke p4 as the system command. """
>      -+    """ Specifically invoke p4 as the system command.
>      -+    """
>      -     real_cmd = p4_build_cmd(cmd)
>      --    expand = isinstance(real_cmd, basestring)
>      -+    expand = not isinstance(real_cmd, list)
>      -     retcode = subprocess.call(real_cmd, shell=expand)
>      -     if retcode:
>      -         raise CalledProcessError(retcode, real_cmd)
>       @@
>            return int(results[0]['change'])
>
>      @@ -234,7 +110,7 @@
>       -       results."""
>       +    """ Returns information about the requested P4 change list.
>       +
>      -+        Data returns is not string encoded (returned as bytes)
>      ++        Data returned is not string encoded (returned as bytes)
>       +    """
>       +    # Make sure it returns a valid result by checking for
>       +    #   the presence of field "time".  Return a dict of the
>      @@ -261,218 +137,29 @@
>            if "time" not in d:
>                die("p4 describe -s %d returned no \"time\": %s" % (change, str(d)))
>
>      -+    # Convert depotFile(X) to be UTF-8 encoded, as this is what GIT
>      -+    # requires. This will also allow us to encode the rest of the text
>      -+    # at the same time to simplify textual processing later.
>      ++    # Do not convert 'depotFile(X)' or 'path' to be UTF-8 encoded, however
>      ++    # cast as_string() the rest of the text.
>       +    keys=d.keys()
>       +    for key in keys:
>       +        if key.startswith('depotFile'):
>      -+            d[key]=d[key] #DepotPath(d[key])
>      ++            d[key]=d[key]
>       +        elif key == 'path':
>      -+            d[key]=d[key] #DepotPath(d[key])
>      ++            d[key]=d[key]
>       +        else:
>       +            d[key] = as_string(d[key])
>       +
>            return d
>
>      --#
>      --# Canonicalize the p4 type and return a tuple of the
>      --# base type, plus any modifiers.  See "p4 help filetypes"
>      --# for a list and explanation.
>      --#
>      - def split_p4_type(p4type):
>      --
>      -+    """ Canonicalize the p4 type and return a tuple of the
>      -+        base type, plus any modifiers.  See "p4 help filetypes"
>      -+        for a list and explanation.
>      -+    """
>      -     p4_filetypes_historical = {
>      -         "ctempobj": "binary+Sw",
>      -         "ctext": "text+C",
>      -@@
>      -         mods = s[1]
>      -     return (base, mods)
>      -
>      --#
>      --# return the raw p4 type of a file (text, text+ko, etc)
>      --#
>      - def p4_type(f):
>      -+    """ return the raw p4 type of a file (text, text+ko, etc)
>      -+    """
>      -     results = p4CmdList(["fstat", "-T", "headType", wildcard_encode(f)])
>      -     return results[0]['headType']
>      -
>      --#
>      --# Given a type base and modifier, return a regexp matching
>      --# the keywords that can be expanded in the file
>      --#
>      - def p4_keywords_regexp_for_type(base, type_mods):
>      -+    """ Given a type base and modifier, return a regexp matching
>      -+        the keywords that can be expanded in the file
>      -+    """
>      -     if base in ("text", "unicode", "binary"):
>      -         kwords = None
>      -         if "ko" in type_mods:
>      -@@
>      -     else:
>      -         return None
>      -
>      --#
>      --# Given a file, return a regexp matching the possible
>      --# RCS keywords that will be expanded, or None for files
>      --# with kw expansion turned off.
>      --#
>      - def p4_keywords_regexp_for_file(file):
>      -+    """ Given a file, return a regexp matching the possible
>      -+        RCS keywords that will be expanded, or None for files
>      -+        with kw expansion turned off.
>      -+    """
>      -     if not os.path.exists(file):
>      -         return None
>      -     else:
>      -@@
>      - # Return the set of all p4 labels
>      - def getP4Labels(depotPaths):
>      -     labels = set()
>      --    if isinstance(depotPaths,basestring):
>      -+    if not isinstance(depotPaths, list):
>      -         depotPaths = [depotPaths]
>      -
>      -     for l in p4CmdList(["labels"] + ["%s..." % p for p in depotPaths]):
>      -@@
>      -
>      -     return labels
>      -
>      --# Return the set of all git tags
>      - def getGitTags():
>      -+    """Return the set of all git tags"""
>      -     gitTags = set()
>      -     for line in read_pipe_lines(["git", "tag"]):
>      -         tag = line.strip()
>      -@@
>      -
>      -     If the pattern is not matched, None is returned."""
>      -
>      --    match = diffTreePattern().next().match(entry)
>      -+    match = next(diffTreePattern()).match(entry)
>      -     if match:
>      -         return {
>      -             'src_mode': match.group(1),
>      -@@
>      -     # otherwise False.
>      -     return mode[-3:] == "755"
>      -
>      -+def encodeWithUTF8(path, verbose = False):
>      -+    """ Ensure that the path is encoded as a UTF-8 string
>      -+
>      -+        Returns bytes(P3)/str(P2)
>      -+    """
>      -+
>      -+    if isunicode:
>      -+        try:
>      -+            if isinstance(path, unicode):
>      -+                # It is already unicode, cast it as a bytes
>      -+                # that is encoded as utf-8.
>      -+                return path.encode('utf-8', 'strict')
>      -+            path.decode('ascii', 'strict')
>      -+        except:
>      -+            encoding = 'utf8'
>      -+            if gitConfig('git-p4.pathEncoding'):
>      -+                encoding = gitConfig('git-p4.pathEncoding')
>      -+            path = path.decode(encoding, 'replace').encode('utf8', 'replace')
>      -+            if verbose:
>      -+                print('\nNOTE:Path with non-ASCII characters detected. Used %s to encode: %s ' % (encoding, to_unicode(path)))
>      -+    else:
>      -+        try:
>      -+            path.decode('ascii')
>      -+        except:
>      -+            encoding = 'utf8'
>      -+            if gitConfig('git-p4.pathEncoding'):
>      -+                encoding = gitConfig('git-p4.pathEncoding')
>      -+            path = path.decode(encoding, 'replace').encode('utf8', 'replace')
>      -+            if verbose:
>      -+                print('Path with non-ASCII characters detected. Used %s to encode: %s ' % (encoding, path))
>      -+    return path
>      -+
>      - class P4Exception(Exception):
>      -     """ Base class for exceptions from the p4 client """
>      -     def __init__(self, exit_code):
>      -@@
>      -     return isModeExec(src_mode) != isModeExec(dst_mode)
>      -
>      - def p4CmdList(cmd, stdin=None, stdin_mode='w+b', cb=None, skip_info=False,
>      --        errors_as_exceptions=False):
>      -+        errors_as_exceptions=False, encode_data=True):
>      -+    """ Executes a P4 command:  'cmd' optionally passing 'stdin' to the command's
>      -+        standard input via a temporary file with 'stdin_mode' mode.
>      -+
>      -+        Output from the command is optionally passed to the callback function 'cb'.
>      -+        If 'cb' is None, the response from the command is parsed into a list
>      -+        of resulting dictionaries. (For each block read from the process pipe.)
>      -+
>      -+        If 'skip_info' is true, information in a block read that has a code type of
>      -+        'info' will be skipped.
>      -
>      --    if isinstance(cmd,basestring):
>      -+        If 'errors_as_exceptions' is set to true (the default is false) the error
>      -+        code returned from the execution will generate an exception.
>      -+
>      -+        If 'encode_data' is set to true (the default) the data that is returned
>      -+        by this function will be passed through the "as_string" function.
>      -+    """
>      -+
>      -+    if not isinstance(cmd, list):
>      -         cmd = "-G " + cmd
>      -         expand = True
>      -     else:
>      -@@
>      -     stdin_file = None
>      -     if stdin is not None:
>      -         stdin_file = tempfile.TemporaryFile(prefix='p4-stdin', mode=stdin_mode)
>      --        if isinstance(stdin,basestring):
>      -+        if not isinstance(stdin, list):
>      -             stdin_file.write(stdin)
>      -         else:
>      -             for i in stdin:
>      --                stdin_file.write(i + '\n')
>      -+                stdin_file.write(as_bytes(i) + b'\n')
>      -         stdin_file.flush()
>      -         stdin_file.seek(0)
>      -
>      -@@
>      -         while True:
>      -             entry = marshal.load(p4.stdout)
>      -             if skip_info:
>      --                if 'code' in entry and entry['code'] == 'info':
>      -+                if b'code' in entry and entry[b'code'] == b'info':
>      -                     continue
>      -             if cb is not None:
>      -                 cb(entry)
>      -             else:
>      --                result.append(entry)
>      -+                out = {}
>      -+                for key, value in entry.items():
>      -+                    out[as_string(key)] = (as_string(value) if encode_data else value)
>      -+                result.append(out)
>      -     except EOFError:
>      -         pass
>      -     exitCode = p4.wait()
>      + #
>       @@
>            return result
>
>        def p4Cmd(cmd):
>      -+    """ Executes a P4 command an returns the results in a dictionary"""
>      ++    """ Executes a P4 command and returns the results in a dictionary
>      ++    """
>            list = p4CmdList(cmd)
>            result = {}
>            for entry in list:
>      -@@
>      -     return values
>      -
>      - def gitBranchExists(branch):
>      -+    """Checks to see if a given branch exists in the git repo"""
>      -     proc = subprocess.Popen(["git", "rev-parse", branch],
>      -                             stderr=subprocess.PIPE, stdout=subprocess.PIPE);
>      -     return proc.wait() == 0;
>       @@
>        _gitConfig = {}
>
>      @@ -490,29 +177,6 @@
>            return _gitConfig[key]
>
>        def gitConfigBool(key):
>      --    """Return a bool, using git config --bool.  It is True only if the
>      --       variable is set to true, and False if set to false or not present
>      --       in the config."""
>      --
>      -+    """ Return a bool, using git config --bool.  It is True only if the
>      -+        variable is set to true, and False if set to false or not present
>      -+        in the config.
>      -+    """
>      -     if key not in _gitConfig:
>      -         _gitConfig[key] = gitConfig(key, '--bool') == "true"
>      -     return _gitConfig[key]
>      -@@
>      -             _gitConfig[key] = []
>      -     return _gitConfig[key]
>      -
>      -+def gitConfigSet(key, value):
>      -+    """ Set the git configuration key 'key' to 'value' for this session
>      -+    """
>      -+    _gitConfig[key] = value
>      -+
>      - def p4BranchesInGit(branchesAreInRemotes=True):
>      -     """Find all the branches whose names start with "p4/", looking
>      -        in remotes or heads as specified by the argument.  Return
>       @@
>            cmd = [ "git", "rev-parse", "--symbolic", "--verify", branch ]
>            p = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
>      @@ -521,34 +185,6 @@
>            if p.returncode:
>                return False
>            # expect exactly one line of output: the branch name
>      -@@
>      -     branches = p4BranchesInGit()
>      -     # map from depot-path to branch name
>      -     branchByDepotPath = {}
>      --    for branch in branches.keys():
>      -+    for branch in list(branches.keys()):
>      -         tip = branches[branch]
>      -         log = extractLogMessageFromGitCommit(tip)
>      -         settings = extractSettingsGitLog(log)
>      -@@
>      -             system("git update-ref %s %s" % (remoteHead, originHead))
>      -
>      - def originP4BranchesExist():
>      --        return gitBranchExists("origin") or gitBranchExists("origin/p4") or gitBranchExists("origin/p4/master")
>      -+    """Checks if origin/p4/master exists"""
>      -+    return gitBranchExists("origin") or gitBranchExists("origin/p4") or gitBranchExists("origin/p4/master")
>      -
>      -
>      - def p4ParseNumericChangeRange(parts):
>      -@@
>      -     changes = sorted(changes)
>      -     return changes
>      -
>      --def p4PathStartsWith(path, prefix):
>      -+def p4PathStartsWith(path, prefix, verbose = False):
>      -     # This method tries to remedy a potential mixed-case issue:
>      -     #
>      -     # If UserA adds  //depot/DirA/file1
>       @@
>            #
>            # we may or may not have a problem. If you have core.ignorecase=true,
>      @@ -574,15 +210,6 @@
>
>        def getClientSpec():
>            """Look at the p4 client spec, create a View() object that contains
>      -@@
>      -     client_name = entry["Client"]
>      -
>      -     # just the keys that start with "View"
>      --    view_keys = [ k for k in entry.keys() if k.startswith("View") ]
>      -+    view_keys = [ k for k in list(entry.keys()) if k.startswith("View") ]
>      -
>      -     # hold this new View
>      -     view = View(client_name)
>       @@
>            # Cannot have * in a filename in windows; untested as to
>            # what p4 would do in such a case.
>      @@ -626,45 +253,16 @@
>                    os.remove(contentFile)
>                    die('git-lfs pointer command failed. Did you install the extension?')
>       @@
>      -         else:
>      -             return LargeFileSystem.processContent(self, git_mode, relPath, contents)
>      -
>      --class Command:
>      -+class Command(object):
>      -     delete_actions = ( "delete", "move/delete", "purge" )
>      -     add_actions = ( "add", "branch", "move/add" )
>      -
>      -@@
>      -             setattr(self, attr, value)
>      -         return getattr(self, attr)
>      -
>      --class P4UserMap:
>      -+class P4UserMap(object):
>      -     def __init__(self):
>      -         self.userMapFromPerforceServer = False
>      -         self.myP4UserId = None
>      -@@
>      -             return True
>      -
>      -     def getUserCacheFilename(self):
>      -+        """ Returns the filename of the username cache """
>      -         home = os.environ.get("HOME", os.environ.get("USERPROFILE"))
>      --        return home + "/.gitp4-usercache.txt"
>      -+        return os.path.join(home, ".gitp4-usercache.txt")
>      +         return os.path.join(home, ".gitp4-usercache.txt")
>
>            def getUserMapFromPerforceServer(self):
>       +        """ Creates the usercache from the data in P4.
>       +        """
>      -+
>                if self.userMapFromPerforceServer:
>                    return
>                self.users = {}
>       @@
>      -                 self.emails[email] = user
>      -
>      -         s = ''
>      --        for (key, val) in self.users.items():
>      -+        for (key, val) in list(self.users.items()):
>      +         for (key, val) in list(self.users.items()):
>                    s += "%s\t%s\n" % (key.expandtabs(1), val.expandtabs(1))
>
>       -        open(self.getUserCacheFilename(), "wb").write(s)
>      @@ -674,7 +272,8 @@
>                self.userMapFromPerforceServer = True
>
>            def loadUserMapFromCache(self):
>      -+        """ Reads the P4 username to git email map """
>      ++        """ Reads the P4 username to git email map
>      ++        """
>                self.users = {}
>                self.userMapFromPerforceServer = False
>                try:
>      @@ -721,80 +320,6 @@
>                    # cleanup our temporary file
>                    os.unlink(outFileName)
>                    print("Failed to strip RCS keywords in %s" % file)
>      -@@
>      -                 break
>      -         if not change_entry:
>      -             die('Failed to decode output of p4 change -o')
>      --        for key, value in change_entry.iteritems():
>      -+        for key, value in list(change_entry.items()):
>      -             if key.startswith('File'):
>      -                 if 'depot-paths' in settings:
>      -                     if not [p for p in settings['depot-paths']
>      --                            if p4PathStartsWith(value, p)]:
>      -+                            if p4PathStartsWith(value, p, self.verbose)]:
>      -                         continue
>      -                 else:
>      --                    if not p4PathStartsWith(value, self.depotPath):
>      -+                    if not p4PathStartsWith(value, self.depotPath, self.verbose):
>      -                         continue
>      -                 files_list.append(value)
>      -                 continue
>      -@@
>      -             return True
>      -
>      -         while True:
>      --            response = raw_input("Submit template unchanged. Submit anyway? [y]es, [n]o (skip this patch) ")
>      -+            response = raw_input("Submit template unchanged. Submit anyway? [y]es, [n]o (skip this patch) ").lower() \
>      -+                .strip()[0]
>      -             if response == 'y':
>      -                 return True
>      -             if response == 'n':
>      -@@
>      -     def applyCommit(self, id):
>      -         """Apply one commit, return True if it succeeded."""
>      -
>      --        print("Applying", read_pipe(["git", "show", "-s",
>      --                                     "--format=format:%h %s", id]))
>      -+        print(("Applying", read_pipe(["git", "show", "-s",
>      -+                                     "--format=format:%h %s", id])))
>      -
>      -         (p4User, gitEmail) = self.p4UserForCommit(id)
>      -
>      -@@
>      -                     # disable the read-only bit on windows.
>      -                     if self.isWindows and file not in editedFiles:
>      -                         os.chmod(file, stat.S_IWRITE)
>      --                    self.patchRCSKeywords(file, kwfiles[file])
>      --                    fixed_rcs_keywords = True
>      -+
>      -+                    try:
>      -+                        self.patchRCSKeywords(file, kwfiles[file])
>      -+                        fixed_rcs_keywords = True
>      -+                    except:
>      -+                        # We are throwing an exception, undo all open edits
>      -+                        for f in editedFiles:
>      -+                            p4_revert(f)
>      -+                        raise
>      -+            else:
>      -+                # They do not have attemptRCSCleanup set, this might be the fail point
>      -+                # Check to see if the file has RCS keywords and suggest setting the property.
>      -+                for file in editedFiles | filesToDelete:
>      -+                    if p4_keywords_regexp_for_file(file) != None:
>      -+                        print("At least one file in this commit has RCS Keywords that may be causing problems. ")
>      -+                        print("Consider:\ngit config git-p4.attemptRCSCleanup true")
>      -+                        break
>      -
>      -             if fixed_rcs_keywords:
>      -                 print("Retrying the patch with RCS keywords cleaned up")
>      -@@
>      -             p4_delete(f)
>      -
>      -         # Set/clear executable bits
>      --        for f in filesToChangeExecBit.keys():
>      -+        for f in list(filesToChangeExecBit.keys()):
>      -             mode = filesToChangeExecBit[f]
>      -             setP4ExecBit(f, mode)
>      -
>       @@
>                tmpFile = os.fdopen(handle, "w+b")
>                if self.isWindows:
>      @@ -815,179 +340,6 @@
>
>                        if update_shelve:
>                            p4_write_pipe(['shelve', '-r', '-i'], submitTemplate)
>      -@@
>      -                 if verbose:
>      -                     print("created p4 label for tag %s" % name)
>      -
>      -+    def run_hook(self, hook_name, args = []):
>      -+        """ Runs a hook if it is found.
>      -+
>      -+            Returns NONE if the hook does not exist
>      -+            Returns TRUE if the exit code is 0, FALSE for a non-zero exit code.
>      -+        """
>      -+        hook_file = self.find_hook(hook_name)
>      -+        if hook_file == None:
>      -+            if self.verbose:
>      -+                print("Skipping hook: %s" % hook_name)
>      -+            return None
>      -+
>      -+        if self.verbose:
>      -+            print("hooks_path = %s " % hooks_path)
>      -+            print("hook_file = %s " % hook_file)
>      -+
>      -+        # Run the hook
>      -+        # TODO - allow non-list format
>      -+        cmd = [hook_file] + args
>      -+        return subprocess.call(cmd) == 0
>      -+
>      -+    def find_hook(self, hook_name):
>      -+        """ Locates the hook file for the given operating system.
>      -+        """
>      -+        hooks_path = gitConfig("core.hooksPath")
>      -+        if len(hooks_path) <= 0:
>      -+            hooks_path = os.path.join(os.environ.get("GIT_DIR", ".git"), "hooks")
>      -+
>      -+        # Look in the obvious place
>      -+        hook_file = os.path.join(hooks_path, hook_name)
>      -+        if os.path.isfile(hook_file) and os.access(hook_file, os.X_OK):
>      -+            return hook_file
>      -+
>      -+        # if we are windows, we will also allow them to have the hooks have extensions
>      -+        if (platform.system() == "Windows"):
>      -+            for ext in ['.exe', '.bat', 'ps1']:
>      -+                if os.path.isfile(hook_file + ext) and os.access(hook_file + ext, os.X_OK):
>      -+                    return hook_file + ext
>      -+
>      -+        # We didn't find the file
>      -+        return None
>      -+
>      -+
>      -+
>      -     def run(self, args):
>      -         if len(args) == 0:
>      -             self.master = currentGitBranch()
>      -@@
>      -             self.clientSpecDirs = getClientSpec()
>      -
>      -         # Check for the existence of P4 branches
>      --        branchesDetected = (len(p4BranchesInGit().keys()) > 1)
>      -+        branchesDetected = (len(list(p4BranchesInGit().keys())) > 1)
>      -
>      -         if self.useClientSpec and not branchesDetected:
>      -             # all files are relative to the client spec
>      -@@
>      -             sys.exit("number of commits (%d) must match number of shelved changelist (%d)" %
>      -                      (len(commits), num_shelves))
>      -
>      --        hooks_path = gitConfig("core.hooksPath")
>      --        if len(hooks_path) <= 0:
>      --            hooks_path = os.path.join(os.environ.get("GIT_DIR", ".git"), "hooks")
>      --
>      --        hook_file = os.path.join(hooks_path, "p4-pre-submit")
>      --        if os.path.isfile(hook_file) and os.access(hook_file, os.X_OK) and subprocess.call([hook_file]) != 0:
>      -+        rtn = self.run_hook("p4-pre-submit")
>      -+        if rtn == False:
>      -             sys.exit(1)
>      -
>      -         #
>      -@@
>      -         last = len(commits) - 1
>      -         for i, commit in enumerate(commits):
>      -             if self.dry_run:
>      --                print(" ", read_pipe(["git", "show", "-s",
>      --                                      "--format=format:%h %s", commit]))
>      -+                print((" ", read_pipe(["git", "show", "-s",
>      -+                                      "--format=format:%h %s", commit])))
>      -                 ok = True
>      -             else:
>      -                 ok = self.applyCommit(commit)
>      -@@
>      -                         if self.conflict_behavior == "ask":
>      -                             print("What do you want to do?")
>      -                             response = raw_input("[s]kip this commit but apply"
>      --                                                 " the rest, or [q]uit? ")
>      -+                                                 " the rest, or [q]uit? ").lower().strip()[0]
>      -                             if not response:
>      -                                 continue
>      -                         elif self.conflict_behavior == "skip":
>      -@@
>      -                         star = "*"
>      -                     else:
>      -                         star = " "
>      --                    print(star, read_pipe(["git", "show", "-s",
>      --                                           "--format=format:%h %s",  c]))
>      -+                    print((star, read_pipe(["git", "show", "-s",
>      -+                                           "--format=format:%h %s",  c])))
>      -                 print("You will have to do 'git p4 sync' and rebase.")
>      -
>      -         if gitConfigBool("git-p4.exportLabels"):
>      -@@
>      -     # ("-//depot/A/..." becomes "/depot/A/..." after option parsing)
>      -     parser.values.cloneExclude += ["/" + re.sub(r"\.\.\.$", "", value)]
>      -
>      -+
>      - class P4Sync(Command, P4UserMap):
>      -
>      -     def __init__(self):
>      -@@
>      -         self.knownBranches = {}
>      -         self.initialParents = {}
>      -
>      --        self.tz = "%+03d%02d" % (- time.timezone / 3600, ((- time.timezone % 3600) / 60))
>      -+        self.tz = "%+03d%02d" % (- time.timezone // 3600, ((- time.timezone % 3600) // 60))
>      -         self.labels = {}
>      -
>      -     # Force a checkpoint in fast-import and wait for it to finish
>      -@@
>      -     def isPathWanted(self, path):
>      -         for p in self.cloneExclude:
>      -             if p.endswith("/"):
>      --                if p4PathStartsWith(path, p):
>      -+                if p4PathStartsWith(path, p, self.verbose):
>      -                     return False
>      -             # "-//depot/file1" without a trailing "/" should only exclude "file1", but not "file111" or "file1_dir/file2"
>      -             elif path.lower() == p.lower():
>      -                 return False
>      -         for p in self.depotPaths:
>      --            if p4PathStartsWith(path, p):
>      -+            if p4PathStartsWith(path, p, self.verbose):
>      -                 return True
>      -         return False
>      -
>      -     def extractFilesFromCommit(self, commit, shelved=False, shelved_cl = 0):
>      -+        """ Generates the list of files to be added in this git commit.
>      -+
>      -+            commit     = Unicode[] - data read from the P4 commit
>      -+            shelved    = Bool      - Is the P4 commit flagged as being shelved.
>      -+            shelved_cl = Unicode   - Numeric string with the changelist number.
>      -+        """
>      -         files = []
>      -         fnum = 0
>      -         while "depotFile%s" % fnum in commit:
>      -@@
>      -             path = self.clientSpecDirs.map_in_client(path)
>      -             if self.detectBranches:
>      -                 for b in self.knownBranches:
>      --                    if p4PathStartsWith(path, b + "/"):
>      -+                    if p4PathStartsWith(path, b + "/", self.verbose):
>      -                         path = path[len(b)+1:]
>      -
>      -         elif self.keepRepoPath:
>      -@@
>      -             # //depot/; just look at first prefix as they all should
>      -             # be in the same depot.
>      -             depot = re.sub("^(//[^/]+/).*", r'\1', prefixes[0])
>      --            if p4PathStartsWith(path, depot):
>      -+            if p4PathStartsWith(path, depot, self.verbose):
>      -                 path = path[len(depot):]
>      -
>      -         else:
>      -             for p in prefixes:
>      --                if p4PathStartsWith(path, p):
>      -+                if p4PathStartsWith(path, p, self.verbose):
>      -                     path = path[len(p):]
>      -                     break
>      -
>       @@
>                return path
>
>      @@ -1002,19 +354,6 @@
>
>                if self.clientSpecDirs:
>                    files = self.extractFilesFromCommit(commit)
>      -@@
>      -             else:
>      -                 relPath = self.stripRepoPath(path, self.depotPaths)
>      -
>      --            for branch in self.knownBranches.keys():
>      -+            for branch in list(self.knownBranches.keys()):
>      -                 # add a trailing slash so that a commit into qt/4.2foo
>      -                 # doesn't end up in qt/4.2, e.g.
>      --                if p4PathStartsWith(relPath, branch + "/"):
>      -+                if p4PathStartsWith(relPath, branch + "/", self.verbose):
>      -                     if branch not in branches:
>      -                         branches[branch] = []
>      -                     branches[branch].append(file)
>       @@
>                return branches
>
>      @@ -1031,18 +370,6 @@
>       +            self.gitStreamBytes.write(d)
>                self.gitStream.write('\n')
>
>      --    def encodeWithUTF8(self, path):
>      --        try:
>      --            path.decode('ascii')
>      --        except:
>      --            encoding = 'utf8'
>      --            if gitConfig('git-p4.pathEncoding'):
>      --                encoding = gitConfig('git-p4.pathEncoding')
>      --            path = path.decode(encoding, 'replace').encode('utf8', 'replace')
>      --            if self.verbose:
>      --                print('Path with non-ASCII characters detected. Used %s to encode: %s ' % (encoding, path))
>      --        return path
>      --
>       -    # output one file from the P4 stream
>       -    # - helper for streamP4Files
>       -
>      @@ -1053,18 +380,13 @@
>       +            contents should be a bytes (bytes)
>       +        """
>                relPath = self.stripRepoPath(file['depotFile'], self.branchPrefixes)
>      --        relPath = self.encodeWithUTF8(relPath)
>      -+        relPath = encodeWithUTF8(relPath, self.verbose)
>      +         relPath = encodeWithUTF8(relPath, self.verbose)
>                if verbose:
>      -             if 'fileSize' in self.stream_file:
>      +@@
>                        size = int(self.stream_file['fileSize'])
>                    else:
>                        size = 0 # deleted files don't get a fileSize apparently
>      --            sys.stdout.write('\r%s --> %s (%i MB)\n' % (file['depotFile'], relPath, size/1024/1024))
>      -+            #if isunicode:
>      -+            #    sys.stdout.write('\r%s --> %s (%i MB)\n' % (path_as_string(file['depotFile']), to_unicode(relPath), size//1024//1024))
>      -+            #else:
>      -+            #    sys.stdout.write('\r%s --> %s (%i MB)\n' % (path_as_string(file['depotFile']), relPath, size//1024//1024))
>      +-            sys.stdout.write('\r%s --> %s (%i MB)\n' % (file['depotFile'], relPath, size//1024//1024))
>       +            sys.stdout.write('\r%s --> %s (%i MB)\n' % (path_as_string(file['depotFile']), as_string(relPath), size//1024//1024))
>                    sys.stdout.flush()
>
>      @@ -1100,15 +422,6 @@
>
>                if self.largeFileSystem:
>       @@
>      -
>      -     def streamOneP4Deletion(self, file):
>      -         relPath = self.stripRepoPath(file['path'], self.branchPrefixes)
>      --        relPath = self.encodeWithUTF8(relPath)
>      -+        relPath = encodeWithUTF8(relPath, self.verbose)
>      -         if verbose:
>      -             sys.stdout.write("delete %s\n" % relPath)
>      -             sys.stdout.flush()
>      -@@
>                if self.largeFileSystem and self.largeFileSystem.isLargeFile(relPath):
>                    self.largeFileSystem.removeLargeFile(relPath)
>
>      @@ -1133,13 +446,6 @@
>
>                if not err and 'fileSize' in self.stream_file:
>                    required_bytes = int((4 * int(self.stream_file["fileSize"])) - calcDiskFree())
>      -             if required_bytes > 0:
>      -                 err = 'Not enough space left on %s! Free at least %i MB.' % (
>      --                    os.getcwd(), required_bytes/1024/1024
>      -+                    os.getcwd(), required_bytes//1024//1024
>      -                 )
>      -
>      -         if err:
>       @@
>                    # ignore errors, but make sure it exits first
>                    self.importProcess.wait()
>      @@ -1155,12 +461,10 @@
>                    self.streamOneP4File(self.stream_file, self.stream_contents)
>                    self.stream_file = {}
>       @@
>      -
>                # pick up the new file information... for the
>                # 'data' field we need to append to our array
>      --        for k in marshalled.keys():
>      +         for k in list(marshalled.keys()):
>       -            if k == 'data':
>      -+        for k in list(marshalled.keys()):
>       +            if k == b'data':
>                        if 'streamContentSize' not in self.stream_file:
>                            self.stream_file['streamContentSize'] = 0
>      @@ -1178,12 +482,10 @@
>                if (verbose and
>                    'streamContentSize' in self.stream_file and
>       @@
>      -             'depotFile' in self.stream_file):
>                    size = int(self.stream_file["fileSize"])
>                    if size > 0:
>      --                progress = 100*self.stream_file['streamContentSize']/size
>      --                sys.stdout.write('\r%s %d%% (%i MB)' % (self.stream_file['depotFile'], progress, int(size/1024/1024)))
>      -+                progress = 100.0*self.stream_file['streamContentSize']/size
>      +                 progress = 100.0*self.stream_file['streamContentSize']/size
>      +-                sys.stdout.write('\r%s %4.1f%% (%i MB)' % (self.stream_file['depotFile'], progress, int(size//1024//1024)))
>       +                sys.stdout.write('\r%s %4.1f%% (%i MB)' % (path_as_string(self.stream_file['depotFile']), progress, int(size//1024//1024)))
>                        sys.stdout.flush()
>
>      @@ -1227,24 +529,6 @@
>
>                if verbose:
>       @@
>      -
>      -         gitStream.write("tagger %s\n" % tagger)
>      -
>      --        print("labelDetails=",labelDetails)
>      -+        print(("labelDetails=",labelDetails))
>      -         if 'Description' in labelDetails:
>      -             description = labelDetails['Description']
>      -         else:
>      -@@
>      -         if not self.branchPrefixes:
>      -             return True
>      -         hasPrefix = [p for p in self.branchPrefixes
>      --                        if p4PathStartsWith(path, p)]
>      -+                        if p4PathStartsWith(path, p, self.verbose)]
>      -         if not hasPrefix and self.verbose:
>      -             print('Ignoring file outside of prefix: {0}'.format(path))
>      -         return hasPrefix
>      -@@
>                        .format(details['change']))
>                    return
>
>      @@ -1307,58 +591,6 @@
>
>                if len(parent) > 0:
>                    if self.verbose:
>      -@@
>      -             self.labels[newestChange] = [output, revisions]
>      -
>      -         if self.verbose:
>      --            print("Label changes: %s" % self.labels.keys())
>      -+            print("Label changes: %s" % list(self.labels.keys()))
>      -
>      -     # Import p4 labels as git tags. A direct mapping does not
>      -     # exist, so assume that if all the files are at the same revision
>      -@@
>      -                 source = paths[0]
>      -                 destination = paths[1]
>      -                 ## HACK
>      --                if p4PathStartsWith(source, self.depotPaths[0]) and p4PathStartsWith(destination, self.depotPaths[0]):
>      -+                if p4PathStartsWith(source, self.depotPaths[0], self.verbose) and p4PathStartsWith(destination, self.depotPaths[0], self.verbose):
>      -                     source = source[len(self.depotPaths[0]):-4]
>      -                     destination = destination[len(self.depotPaths[0]):-4]
>      -
>      -@@
>      -
>      -     def getBranchMappingFromGitBranches(self):
>      -         branches = p4BranchesInGit(self.importIntoRemotes)
>      --        for branch in branches.keys():
>      -+        for branch in list(branches.keys()):
>      -             if branch == "master":
>      -                 branch = "main"
>      -             else:
>      -@@
>      -             self.updateOptionDict(description)
>      -
>      -             if not self.silent:
>      --                sys.stdout.write("\rImporting revision %s (%s%%)" % (change, cnt * 100 / len(changes)))
>      -+                sys.stdout.write("\rImporting revision %s (%4.1f%%)" % (change, cnt * 100 / len(changes)))
>      -                 sys.stdout.flush()
>      -             cnt = cnt + 1
>      -
>      -             try:
>      -                 if self.detectBranches:
>      -                     branches = self.splitFilesIntoBranches(description)
>      --                    for branch in branches.keys():
>      -+                    for branch in list(branches.keys()):
>      -                         ## HACK  --hwn
>      -                         branchPrefix = self.depotPaths[0] + branch + "/"
>      -                         self.branchPrefixes = [ branchPrefix ]
>      -@@
>      -                 sys.exit(1)
>      -
>      -     def sync_origin_only(self):
>      -+        """ Ensures that the origin has been synchronized if one is set """
>      -         if self.syncWithOrigin:
>      -             self.hasOrigin = originP4BranchesExist()
>      -             if self.hasOrigin:
>       @@
>                        system("git fetch origin")
>
>      @@ -1439,61 +671,6 @@
>            def closeStreams(self):
>                self.gitStream.close()
>       @@
>      -                 if short in branches:
>      -                     self.p4BranchesInGit = [ short ]
>      -             else:
>      --                self.p4BranchesInGit = branches.keys()
>      -+                self.p4BranchesInGit = list(branches.keys())
>      -
>      -             if len(self.p4BranchesInGit) > 1:
>      -                 if not self.silent:
>      -                     print("Importing from/into multiple branches")
>      -                 self.detectBranches = True
>      --                for branch in branches.keys():
>      -+                for branch in list(branches.keys()):
>      -                     self.initialParents[self.refPrefix + branch] = \
>      -                         branches[branch]
>      -
>      -@@
>      -                                  help="where to leave result of the clone"),
>      -             optparse.make_option("--bare", dest="cloneBare",
>      -                                  action="store_true", default=False),
>      -+            optparse.make_option("--encoding", dest="setPathEncoding",
>      -+                                 action="store", default=None,
>      -+                                 help="Sets the path encoding for this depot")
>      -         ]
>      -         self.cloneDestination = None
>      -         self.needsGit = False
>      -         self.cloneBare = False
>      -+        self.setPathEncoding = None
>      -
>      -     def defaultDestination(self, args):
>      -+        """Returns the last path component as the default git
>      -+        repository directory name"""
>      -         ## TODO: use common prefix of args?
>      -         depotPath = args[0]
>      -         depotDir = re.sub("(@[^@]*)$", "", depotPath)
>      -         depotDir = re.sub("(#[^#]*)$", "", depotDir)
>      -         depotDir = re.sub(r"\.\.\.$", "", depotDir)
>      -         depotDir = re.sub(r"/$", "", depotDir)
>      --        return os.path.split(depotDir)[1]
>      -+        return depotDir.split('/')[-1]
>      -
>      -     def run(self, args):
>      -         if len(args) < 1:
>      -@@
>      -
>      -         depotPaths = args
>      -
>      -+        # If we have an encoding provided, ignore what may already exist
>      -+        # in the registry. This will ensure we show the displayed values
>      -+        # using the correct encoding.
>      -+        if self.setPathEncoding:
>      -+            gitConfigSet("git-p4.pathEncoding", self.setPathEncoding)
>      -+
>      -+        # If more than 1 path element is supplied, the last element
>      -+        # is the clone destination.
>      -         if not self.cloneDestination and len(depotPaths) > 1:
>                    self.cloneDestination = depotPaths[-1]
>                    depotPaths = depotPaths[:-1]
>
>      @@ -1512,177 +689,3 @@
>
>                if not os.path.exists(self.cloneDestination):
>                    os.makedirs(self.cloneDestination)
>      -@@
>      -         if retcode:
>      -             raise CalledProcessError(retcode, init_cmd)
>      -
>      -+        # Set the encoding if it was provided command line
>      -+        if self.setPathEncoding:
>      -+            init_cmd= ["git", "config", "git-p4.pathEncoding", self.setPathEncoding]
>      -+            retcode = subprocess.call(init_cmd)
>      -+            if retcode:
>      -+                raise CalledProcessError(retcode, init_cmd)
>      -+
>      -         if not P4Sync.run(self, depotPaths):
>      -             return False
>      -
>      -@@
>      -             to find the P4 commit we are based on, and the depot-paths.
>      -         """
>      -
>      --        for parent in (range(65535)):
>      -+        for parent in (list(range(65535))):
>      -             log = extractLogMessageFromGitCommit("{0}^{1}".format(starting_point, parent))
>      -             settings = extractSettingsGitLog(log)
>      -             if 'change' in settings:
>      -@@
>      -             print("%s <= %s (%s)" % (branch, ",".join(settings["depot-paths"]), settings["change"]))
>      -         return True
>      -
>      -+class Py23File():
>      -+    """ Python2/3 Unicode File Wrapper
>      -+    """
>      -+
>      -+    stream_handle = None
>      -+    verbose       = False
>      -+    debug_handle  = None
>      -+
>      -+    def __init__(self, stream_handle, verbose = False,
>      -+                 debug_handle = None):
>      -+        """ Create a Python3 compliant Unicode to Byte String
>      -+            Windows compatible wrapper
>      -+
>      -+            stream_handle = the underlying file-like handle
>      -+            verbose       = Boolean if content should be echoed
>      -+            debug_handle  = A file-like handle data is duplicately written to
>      -+        """
>      -+        self.stream_handle = stream_handle
>      -+        self.verbose       = verbose
>      -+        self.debug_handle  = debug_handle
>      -+
>      -+    def write(self, utf8string):
>      -+        """ Writes the utf8 encoded string to the underlying
>      -+            file stream
>      -+        """
>      -+        self.stream_handle.write(as_bytes(utf8string))
>      -+        if self.verbose:
>      -+            sys.stderr.write("Stream Output: %s" % utf8string)
>      -+            sys.stderr.flush()
>      -+        if self.debug_handle:
>      -+            self.debug_handle.write(as_bytes(utf8string))
>      -+
>      -+    def read(self, size = None):
>      -+        """ Reads int charcters from the underlying stream
>      -+            and converts it to utf8.
>      -+
>      -+            Be aware, the size value is for reading the underlying
>      -+            bytes so the value may be incorrect. Usage of the size
>      -+            value is discouraged.
>      -+        """
>      -+        if size == None:
>      -+            return as_string(self.stream_handle.read())
>      -+        else:
>      -+            return as_string(self.stream_handle.read(size))
>      -+
>      -+    def readline(self):
>      -+        """ Reads a line from the underlying byte stream
>      -+            and converts it to utf8
>      -+        """
>      -+        return as_string(self.stream_handle.readline())
>      -+
>      -+    def readlines(self, sizeHint = None):
>      -+        """ Returns a list containing lines from the file converted to unicode.
>      -+
>      -+            sizehint - Optional. If the optional sizehint argument is
>      -+            present, instead of reading up to EOF, whole lines totalling
>      -+            approximately sizehint bytes are read.
>      -+        """
>      -+        lines = self.stream_handle.readlines(sizeHint)
>      -+        for i in range(0, len(lines)):
>      -+            lines[i] = as_string(lines[i])
>      -+        return lines
>      -+
>      -+    def close(self):
>      -+        """ Closes the underlying byte stream """
>      -+        self.stream_handle.close()
>      -+
>      -+    def flush(self):
>      -+        """ Flushes the underlying byte stream """
>      -+        self.stream_handle.flush()
>      -+
>      -+class DepotPath():
>      -+    """ Describes a DepotPath or File
>      -+    """
>      -+
>      -+    raw_path = None
>      -+    utf8_path = None
>      -+    bytes_path = None
>      -+
>      -+    def __init__(self, path):
>      -+        """ Creates a new DepotPath with the path encoded
>      -+            with by the P4 repository
>      -+        """
>      -+        raw_path = path
>      -+
>      -+    def raw():
>      -+        """ Returns the path as it was originally found
>      -+            in the P4 repository
>      -+        """
>      -+        return raw_path
>      -+
>      -+    def startswith(self, prefix, start = None, end = None):
>      -+        """ Return True if string starts with the prefix, otherwise
>      -+            return False. prefix can also be a tuple of prefixes to
>      -+            look for. With optional start, test string beginning at
>      -+            that position. With optional end, stop comparing
>      -+            string at that position.
>      -+        """
>      -+        return raw_path.startswith(prefix, start, end)
>      -+
>      -+
>      - class HelpFormatter(optparse.IndentedHelpFormatter):
>      -     def __init__(self):
>      -         optparse.IndentedHelpFormatter.__init__(self)
>      -@@
>      -
>      - def main():
>      -     if len(sys.argv[1:]) == 0:
>      --        printUsage(commands.keys())
>      -+        printUsage(list(commands.keys()))
>      -         sys.exit(2)
>      -
>      -     cmdName = sys.argv[1]
>      -@@
>      -     except KeyError:
>      -         print("unknown command %s" % cmdName)
>      -         print("")
>      --        printUsage(commands.keys())
>      -+        printUsage(list(commands.keys()))
>      -         sys.exit(2)
>      -
>      -     options = cmd.options
>      -@@
>      -                                    description = cmd.description,
>      -                                    formatter = HelpFormatter())
>      -
>      --    (cmd, args) = parser.parse_args(sys.argv[2:], cmd);
>      -+    try:
>      -+        (cmd, args) = parser.parse_args(sys.argv[2:], cmd);
>      -+    except:
>      -+        parser.print_help()
>      -+        raise
>      -+
>      -     global verbose
>      -     verbose = cmd.verbose
>      -     if cmd.needsGit:
>      -@@
>      -                         chdir(cdup);
>      -
>      -         if not isValidGitDir(cmd.gitdir):
>      --            if isValidGitDir(cmd.gitdir + "/.git"):
>      --                cmd.gitdir += "/.git"
>      -+            if isValidGitDir(os.path.join(cmd.gitdir, ".git")):
>      -+                cmd.gitdir = os.path.join(cmd.gitdir, ".git")
>      -             else:
>      -                 die("fatal: cannot locate git repository at %s" % cmd.gitdir)
>      -
>   -:  ---------- > 11:  883ef45ca5 git-p4: Added --encoding parameter to p4 clone
>
> --
> gitgitgadget

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v4 01/11] git-p4: select p4 binary by operating-system
  2019-12-04 22:29       ` [PATCH v4 01/11] git-p4: select p4 binary by operating-system Ben Keene via GitGitGadget
@ 2019-12-05 10:19         ` Denton Liu
  2019-12-05 16:32           ` Ben Keene
  0 siblings, 1 reply; 64+ messages in thread
From: Denton Liu @ 2019-12-05 10:19 UTC (permalink / raw)
  To: Ben Keene via GitGitGadget; +Cc: git, Ben Keene, Junio C Hamano

Hi Ben,

First of all, as a note to you and possibly others, I don't have much
(read: any) experience with git-p4. I do have experience with Python and
how git.git generally does things so I'll be reviewing from that
perspective.

On Wed, Dec 04, 2019 at 10:29:27PM +0000, Ben Keene via GitGitGadget wrote:
> From: Ben Keene <seraphire@gmail.com>
> 
> Depending on the version of GIT and Python installed, the perforce program (p4) may not resolve on Windows without the program extension.

Nit: "GIT" should be written as "Git" when referring to the whole
project and "git" when referring to the command. Never in all-caps.

Also, please wrap your paragraphs at 72 characters. I'll say it once
here but it applies to your whole series.

> 
> Check the operating system (platform.system) and if it is reporting that it is Windows, use the full filename of "p4.exe" instead of "p4"
> 
> The original code unconditionally used "p4" as the binary filename.

As a rule of thumb, we want to state the problem first before we state
what we did (and why). I'd move this paragraph up.

> 
> This change is Python2 and Python3 compatible.
> 
> Thanks to: Junio C Hamano <gitster@pobox.com> and  Denton Liu <liu.denton@gmail.com> for patiently explaining proper format for my submissions.

I appreciate the credit but I don't think it's necessary. At _most_, you
could include the

	Helped-by: Junio C Hamano <gitster@pobox.com>
	Helped-by: Denton Liu <liu.denton@gmail.com>

tags before your signoff but I don't think we've done anything to
warrant it.

> 
> Signed-off-by: Ben Keene <seraphire@gmail.com>
> (cherry picked from commit 9a3a5c4e6d29dbef670072a9605c7a82b3729434)

You should remove this line in all of your commits. The referenced
commit isn't public so the information isn't very useful. Also, try to
not include anything after your signoff so if this hypothetically were
useful information, you'd include it before your signoff.

If it's information that's ephemerally useful for current reviewers but
not for future readers of your commit in the log message, you can
include it after the three hyphens...

> ---
like this and it won't be included as part of the log message.

>  git-p4.py | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/git-p4.py b/git-p4.py
> index 60c73b6a37..b2ffbc057b 100755
> --- a/git-p4.py
> +++ b/git-p4.py
> @@ -75,7 +75,11 @@ def p4_build_cmd(cmd):
>      location. It means that hooking into the environment, or other configuration
>      can be done more easily.
>      """
> -    real_cmd = ["p4"]
> +    # Look for the P4 binary

I don't think this comment is necessary as the code itself is pretty
self-explanatory.

> +    if (platform.system() == "Windows"):
> +        real_cmd = ["p4.exe"]    

You have trailing whitespace here. Try to run `git diff --check` before
committing to ensure that you have no whitespace errors.

Thanks,

Denton

> +    else:
> +        real_cmd = ["p4"]
>  
>      user = gitConfig("git-p4.user")
>      if len(user) > 0:
> -- 
> gitgitgadget
> 

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v4 02/11] git-p4: change the expansion test from basestring to list
  2019-12-04 22:29       ` [PATCH v4 02/11] git-p4: change the expansion test from basestring to list Ben Keene via GitGitGadget
@ 2019-12-05 10:27         ` Denton Liu
  2019-12-05 17:05           ` Ben Keene
  0 siblings, 1 reply; 64+ messages in thread
From: Denton Liu @ 2019-12-05 10:27 UTC (permalink / raw)
  To: Ben Keene via GitGitGadget; +Cc: git, Ben Keene, Junio C Hamano

Hi Ben,

On Wed, Dec 04, 2019 at 10:29:28PM +0000, Ben Keene via GitGitGadget wrote:
> From: Ben Keene <seraphire@gmail.com>
> 
> Python 3+ handles strings differently than Python 2.7.

Do you mean Python 3?

> Since Python 2 is reaching it's end of life, a series of changes are being submitted to enable python 3.7+ support. The current code fails basic tests under python 3.7.

Python 3.5 doesn't reach EOL until Q4 2020[1]. We should be testing
these changes under 3.5 to ensure that we're not accidentally
introducing stuff that's not backwards compatible.

> 
> Change references to basestring in the isinstance tests to use list instead. This prepares the code to remove all references to basestring.
> 
> The original code used basestring in a test to determine if a list or literal string was passed into 9 different functions.  This is used to determine if the shell should be evoked when calling subprocess methods.

Once again, I'd swap the above two paragraphs. Problem then solution.

Also, did you mean "invoked" instead of "evoked"?

> 
> Signed-off-by: Ben Keene <seraphire@gmail.com>
> (cherry picked from commit 5b1b1c145479b5d5fd242122737a3134890409e6)
> ---
>  git-p4.py | 18 +++++++++---------
>  1 file changed, 9 insertions(+), 9 deletions(-)

The patch itself looks good, though.

[1]: https://devguide.python.org/#branchstatus

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v4 03/11] git-p4: add new helper functions for python3 conversion
  2019-12-04 22:29       ` [PATCH v4 03/11] git-p4: add new helper functions for python3 conversion Ben Keene via GitGitGadget
@ 2019-12-05 10:40         ` Denton Liu
  2019-12-05 18:42           ` Ben Keene
  0 siblings, 1 reply; 64+ messages in thread
From: Denton Liu @ 2019-12-05 10:40 UTC (permalink / raw)
  To: Ben Keene via GitGitGadget; +Cc: git, Ben Keene, Junio C Hamano

On Wed, Dec 04, 2019 at 10:29:29PM +0000, Ben Keene via GitGitGadget wrote:
> From: Ben Keene <seraphire@gmail.com>
> 
> Python 3+ handles strings differently than Python 2.7.  Since Python 2 is reaching it's end of life, a series of changes are being submitted to enable python 3.7+ support. The current code fails basic tests under python 3.7.
> 
> Change the existing unicode test add new support functions for python2-python3 support.
> 
> Define the following variables:
> - isunicode - a boolean variable that states if the version of python natively supports unicode (true) or not (false). This is true for Python3 and false for Python2.
> - unicode - a type alias for the datatype that holds a unicode string.  It is assigned to a str under python 3 and the unicode type for Python2.
> - bytes - a type alias for an array of bytes.  It is assigned the native bytes type for Python3 and str for Python2.
> 
> Add the following new functions:
> 
> - as_string(text) - A new function that will convert a byte array to a unicode (UTF-8) string under python 3.  Under python 2, this returns the string unchanged.
> - as_bytes(text) - A new function that will convert a unicode string to a byte array under python 3.  Under python 2, this returns the string unchanged.
> - to_unicode(text) - Converts a text string as Unicode(UTF-8) on both Python2 and Python3.
> 
> Add a new function alias raw_input:
> If raw_input does not exist (it was renamed to input in python 3) alias input as raw_input.
> 
> The AS_STRING and AS_BYTES functions allow for modifying the code with a minimal amount of impact on Python2 support.  When a string is expected, the as_string() will be used to convert "cast" the incoming "bytes" to a string type. Conversely as_bytes() will be used to convert a "string" to a "byte array" type. Since Python2 overloads the datatype 'str' to serve both purposes, the Python2 versions of these function do not change the data, since the str functions as both a byte array and a string.

How come AS_STRING and AS_BYTES are all-caps here?

> 
> basestring is removed since its only references are found in tests that were changed in the previous change list.
> 
> Signed-off-by: Ben Keene <seraphire@gmail.com>
> (cherry picked from commit 7921aeb3136b07643c1a503c2d9d8b5ada620356)
> ---
>  git-p4.py | 70 +++++++++++++++++++++++++++++++++++++++++++++++++++----
>  1 file changed, 66 insertions(+), 4 deletions(-)
> 
> diff --git a/git-p4.py b/git-p4.py
> index 0f27996393..93dfd0920a 100755
> --- a/git-p4.py
> +++ b/git-p4.py
> @@ -32,16 +32,78 @@
>      unicode = unicode
>  except NameError:
>      # 'unicode' is undefined, must be Python 3
> -    str = str
> +    #
> +    # For Python3 which is natively unicode, we will use 
> +    # unicode for internal information but all P4 Data
> +    # will remain in bytes
> +    isunicode = True
>      unicode = str
>      bytes = bytes
> -    basestring = (str,bytes)
> +
> +    def as_string(text):
> +        """Return a byte array as a unicode string"""
> +        if text == None:

Nit: use `text is None` instead. Actually, any time you're checking an
object to see if it's None, you should use `is` instead of `==` since
there's usually only one None reference.

> +            return None
> +        if isinstance(text, bytes):
> +            return unicode(text, "utf-8")
> +        else:
> +            return text
> +
> +    def as_bytes(text):
> +        """Return a Unicode string as a byte array"""
> +        if text == None:
> +            return None
> +        if isinstance(text, bytes):
> +            return text
> +        else:
> +            return bytes(text, "utf-8")
> +
> +    def to_unicode(text):
> +        """Return a byte array as a unicode string"""
> +        return as_string(text)    
> +
> +    def path_as_string(path):
> +        """ Converts a path to the UTF8 encoded string """
> +        if isinstance(path, unicode):
> +            return path
> +        return encodeWithUTF8(path).decode('utf-8')
> +    

Trailing whitespace.

>  else:
>      # 'unicode' exists, must be Python 2
> -    str = str
> +    #
> +    # We will treat the data as:
> +    #   str   -> str
> +    #   bytes -> str
> +    # So for Python2 these functions are no-ops
> +    # and will leave the data in the ambiguious
> +    # string/bytes state
> +    isunicode = False
>      unicode = unicode
>      bytes = str
> -    basestring = basestring
> +
> +    def as_string(text):
> +        """ Return text unaltered (for Python3 support) """

I didn't mention this in earlier emails but it's been bothering me a
lot: is there any reason why you write it as "Python3" vs. "Python 3"
sometimes (and Python2 as well)? If there's no difference, then we
should probably stick to one variant in both the commit messages and in
the code. (I prefer the spaced variant.)

> +        return text
> +
> +    def as_bytes(text):
> +        """ Return text unaltered (for Python3 support) """
> +        return text
> +
> +    def to_unicode(text):
> +        """Return a string as a unicode string"""
> +        return text.decode('utf-8')
> +    

Trailing whitespace.

> +    def path_as_string(path):
> +        """ Converts a path to the UTF8 encoded bytes """
> +        return encodeWithUTF8(path)
> +
> +
> + 

Trailing whitespace.

> +# Check for raw_input support
> +try:
> +    raw_input
> +except NameError:
> +    raw_input = input
>  
>  try:
>      from subprocess import CalledProcessError
> -- 
> gitgitgadget
> 

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v4 05/11] git-p4: Add new functions in preparation of usage
  2019-12-04 22:29       ` [PATCH v4 05/11] git-p4: Add new functions in preparation of usage Ben Keene via GitGitGadget
@ 2019-12-05 10:50         ` Denton Liu
  2019-12-05 19:23           ` Ben Keene
  0 siblings, 1 reply; 64+ messages in thread
From: Denton Liu @ 2019-12-05 10:50 UTC (permalink / raw)
  To: Ben Keene via GitGitGadget; +Cc: git, Ben Keene, Junio C Hamano

> Subject: git-p4: Add new functions in preparation of usage

Nit: as a convention, you should lowercase the letter after the colon in
the subject. As in "git-p4: add new functions..."

This applies for other patches as well.

On Wed, Dec 04, 2019 at 10:29:31PM +0000, Ben Keene via GitGitGadget wrote:
> From: Ben Keene <seraphire@gmail.com>
> 
> This changelist is an intermediate submission for migrating the P4 support from Python2 to Python3. The code needs access to the encodeWithUTF8() for support of non-UTF8 filenames in the clone class as well as the sync class.
> 
> Move the function encodeWithUTF8() from the P4Sync class to a stand-alone function.  This will allow other classes to use this function without instanciating the P4Sync class. Change the self.verbose reference to an optional method parameter. Update the existing references to this function to pass the self.verbose since it is no longer available on "self" since the function is no longer contained on the P4Sync class.

Hmmm, so does the patch before this not actually work since
encodeWithUTF8() isn't defined yet? When you reroll this series, you
should swap the order of the patches since the previous patch depends on
this one, not the other way around.

> 
> Modify the functions write_pipe() and p4_write_pipe() to remove the return value.  The return value for both functions is the number of bytes, but the meaning is lost under python3 since the count does not match the number of characters that may have been encoded.  Additionally, the return value was never used, so this is removed to avoid future ambiguity.
> 
> Add a new method gitConfigSet(). This method will set a value in the git configuration cache list.
> 
> Signed-off-by: Ben Keene <seraphire@gmail.com>
> (cherry picked from commit affe888f432bb6833df78962e8671fccdf76c47a)
> ---
>  git-p4.py | 60 ++++++++++++++++++++++++++++++++++++++++---------------
>  1 file changed, 44 insertions(+), 16 deletions(-)
> 
> diff --git a/git-p4.py b/git-p4.py
> index b283ef1029..2659531c2e 100755
> --- a/git-p4.py
> +++ b/git-p4.py
> @@ -237,6 +237,8 @@ def die(msg):
>          sys.exit(1)
>  
>  def write_pipe(c, stdin):
> +    """ Executes the command 'c', passing 'stdin' on the standard input
> +    """
>      if verbose:
>          sys.stderr.write('Writing pipe: %s\n' % str(c))
>  
> @@ -248,11 +250,12 @@ def write_pipe(c, stdin):
>      if p.wait():
>          die('Command failed: %s' % str(c))
>  
> -    return val
>  
>  def p4_write_pipe(c, stdin):
> +    """ Runs a P4 command 'c', passing 'stdin' data to P4
> +    """
>      real_cmd = p4_build_cmd(c)
> -    return write_pipe(real_cmd, stdin)
> +    write_pipe(real_cmd, stdin)
>  
>  def read_pipe_full(c):
>      """ Read output from  command. Returns a tuple
> @@ -653,6 +656,38 @@ def isModeExec(mode):
>      # otherwise False.
>      return mode[-3:] == "755"
>  
> +def encodeWithUTF8(path, verbose = False):

Nit: no spaces surrounding `=` in default args.

> +    """ Ensure that the path is encoded as a UTF-8 string
> +
> +        Returns bytes(P3)/str(P2)
> +    """
> +   

Trailing whitespace.

> +    if isunicode:
> +        try:
> +            if isinstance(path, unicode):
> +                # It is already unicode, cast it as a bytes
> +                # that is encoded as utf-8.
> +                return path.encode('utf-8', 'strict')
> +            path.decode('ascii', 'strict')
> +        except:
> +            encoding = 'utf8'
> +            if gitConfig('git-p4.pathEncoding'):
> +                encoding = gitConfig('git-p4.pathEncoding')
> +            path = path.decode(encoding, 'replace').encode('utf8', 'replace')
> +            if verbose:
> +                print('\nNOTE:Path with non-ASCII characters detected. Used %s to encode: %s ' % (encoding, to_unicode(path)))
> +    else:    

Trailing whitespace.

> +        try:
> +            path.decode('ascii')
> +        except:
> +            encoding = 'utf8'
> +            if gitConfig('git-p4.pathEncoding'):
> +                encoding = gitConfig('git-p4.pathEncoding')
> +            path = path.decode(encoding, 'replace').encode('utf8', 'replace')
> +            if verbose:
> +                print('Path with non-ASCII characters detected. Used %s to encode: %s ' % (encoding, path))
> +    return path
> +
>  class P4Exception(Exception):
>      """ Base class for exceptions from the p4 client """
>      def __init__(self, exit_code):
> @@ -891,6 +926,11 @@ def gitConfigList(key):
>              _gitConfig[key] = []
>      return _gitConfig[key]
>  
> +def gitConfigSet(key, value):
> +    """ Set the git configuration key 'key' to 'value' for this session
> +    """
> +    _gitConfig[key] = value
> +
>  def p4BranchesInGit(branchesAreInRemotes=True):
>      """Find all the branches whose names start with "p4/", looking
>         in remotes or heads as specified by the argument.  Return
> @@ -2814,24 +2854,12 @@ def writeToGitStream(self, gitMode, relPath, contents):
>              self.gitStream.write(d)
>          self.gitStream.write('\n')
>  
> -    def encodeWithUTF8(self, path):
> -        try:
> -            path.decode('ascii')
> -        except:
> -            encoding = 'utf8'
> -            if gitConfig('git-p4.pathEncoding'):
> -                encoding = gitConfig('git-p4.pathEncoding')
> -            path = path.decode(encoding, 'replace').encode('utf8', 'replace')
> -            if self.verbose:
> -                print('Path with non-ASCII characters detected. Used %s to encode: %s ' % (encoding, path))
> -        return path
> -
>      # output one file from the P4 stream
>      # - helper for streamP4Files
>  
>      def streamOneP4File(self, file, contents):
>          relPath = self.stripRepoPath(file['depotFile'], self.branchPrefixes)
> -        relPath = self.encodeWithUTF8(relPath)
> +        relPath = encodeWithUTF8(relPath, self.verbose)
>          if verbose:
>              if 'fileSize' in self.stream_file:
>                  size = int(self.stream_file['fileSize'])
> @@ -2914,7 +2942,7 @@ def streamOneP4File(self, file, contents):
>  
>      def streamOneP4Deletion(self, file):
>          relPath = self.stripRepoPath(file['path'], self.branchPrefixes)
> -        relPath = self.encodeWithUTF8(relPath)
> +        relPath = encodeWithUTF8(relPath, self.verbose)
>          if verbose:
>              sys.stdout.write("delete %s\n" % relPath)
>              sys.stdout.flush()
> -- 
> gitgitgadget
> 

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v4 04/11] git-p4: python3 syntax changes
  2019-12-04 22:29       ` [PATCH v4 04/11] git-p4: python3 syntax changes Ben Keene via GitGitGadget
@ 2019-12-05 11:02         ` Denton Liu
  0 siblings, 0 replies; 64+ messages in thread
From: Denton Liu @ 2019-12-05 11:02 UTC (permalink / raw)
  To: Ben Keene via GitGitGadget; +Cc: git, Ben Keene, Junio C Hamano

On Wed, Dec 04, 2019 at 10:29:30PM +0000, Ben Keene via GitGitGadget wrote:
> From: Ben Keene <seraphire@gmail.com>
> 
> Python 3+ handles strings differently than Python 2.7.  Since Python 2 is reaching it's end of life, a series of changes are being submitted to enable python 3.7+ support. The current code fails basic tests under python 3.7.
> 
> There are a number of translations suggested by modernize/futureize that should be taken to fix numerous non-string specific issues.
> 
> Change references to the X.next() iterator to the function next(X) which is compatible with both Python2 and Python3.
> 
> Change references to X.keys() to list(X.keys()) to return a list that can be iterated in both Python2 and Python3.

I don't think this is necessary. From what I can tell, using the
key-view of the dict objects is fine since we're always doing so in a
read-only manner.

> 
> Add the literal text (object) to the end of class definitions to be consistent with Python3 class definition.

Since we're going to be dropping Python 2 soon, do we need this? I get
that we'd be mixing old-style with new-style classes in Python 2 vs
Python 3 but it's not like we do anything with the classess related to
type() or isinstance().

Anyway, I'm going to stop here since it's way past my bedtime. I hope
that my suggestions so far have been helpful.

> 
> Change integer divison to use "//" instead of "/"  Under Both python2 and python3 // will return a floor()ed result which matches existing functionality.
> 
> Change the format string for displaying decimal values from %d to %4.1f% when displaying a progress.  This avoids displaying long repeating decimals in user displayed text.
> 
> Signed-off-by: Ben Keene <seraphire@gmail.com>
> (cherry picked from commit bde6b83296aa9b3e7a584c5ce2b571c7287d8f9f)

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v4 06/11] git-p4: Fix assumed path separators to be more Windows friendly
  2019-12-04 22:29       ` [PATCH v4 06/11] git-p4: Fix assumed path separators to be more Windows friendly Ben Keene via GitGitGadget
@ 2019-12-05 13:38         ` Junio C Hamano
  2019-12-05 19:37           ` Ben Keene
  0 siblings, 1 reply; 64+ messages in thread
From: Junio C Hamano @ 2019-12-05 13:38 UTC (permalink / raw)
  To: Ben Keene via GitGitGadget; +Cc: git, Ben Keene

"Ben Keene via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Ben Keene <seraphire@gmail.com>
>
> When a computer is configured to use Git for windows and Python for windows, and not a Unix subsystem like cygwin or WSL, the directory separator changes and causes git-p4 to fail to properly determine paths.
>
> Fix 3 path separator errors:
>
> 1. getUserCacheFilename should not use string concatenation. Change this code to use os.path.join to build an OS tolerant path.
> 2. defaultDestiantion used the OS.path.split to split depot paths.  This is incorrect on windows. Change the code to split on a forward slash(/) instead since depot paths use this character regardless  of the operating system.
> 3. The call to isvalidGitDir() in the main code also used a literal forward slash. Change the cose to use os.path.join to correctly format the path for the operating system.

s/isvalid/isValid/;
s/cose/code/; 

Also please wrap your lines at around 72 columns (that will let
reviewers quote what you write (which adds "> " prefix and consumes
2 more columns), and would allow us a handful of exchanges (each
round adding ">" prefix to consume 1 more column) before bumping
into the right edge of the terminal at 80 columns.

> These three changes allow the suggested windows configuration to properly locate files while retaining the existing behavior on non-windows operating systems.
>
> Signed-off-by: Ben Keene <seraphire@gmail.com>
> (cherry picked from commit a5b45c12c3861638a933b05a1ffee0c83978dcb2)

As Denton mentioned, general public do not care if you "cherry
picked" it from your earlier unpublished work.  Remove it.

Aside from these small nits, the proposed log message for this step
is quite cleanly done and easily readable.  All the decisions are
clearly written and agreeable.  Nicely done.

> ---
>  git-p4.py | 13 +++++++++----
>  1 file changed, 9 insertions(+), 4 deletions(-)
>
> diff --git a/git-p4.py b/git-p4.py
> index 2659531c2e..7ac8cb42ef 100755
> --- a/git-p4.py
> +++ b/git-p4.py
> @@ -1454,8 +1454,10 @@ def p4UserIsMe(self, p4User):
>              return True
>  
>      def getUserCacheFilename(self):
> +        """ Returns the filename of the username cache 
> +	    """

Inconsistent use of spaces and a tab I see on these two lines.
Intended?

>          home = os.environ.get("HOME", os.environ.get("USERPROFILE"))
> -        return home + "/.gitp4-usercache.txt"
> +        return os.path.join(home, ".gitp4-usercache.txt")
>  
>      def getUserMapFromPerforceServer(self):
>          if self.userMapFromPerforceServer:
> @@ -3973,13 +3975,16 @@ def __init__(self):
>          self.cloneBare = False
>  
>      def defaultDestination(self, args):
> +        """ Returns the last path component as the default git 
> +            repository directory name
> +        """
>          ## TODO: use common prefix of args?
>          depotPath = args[0]
>          depotDir = re.sub("(@[^@]*)$", "", depotPath)
>          depotDir = re.sub("(#[^#]*)$", "", depotDir)
>          depotDir = re.sub(r"\.\.\.$", "", depotDir)
>          depotDir = re.sub(r"/$", "", depotDir)
> -        return os.path.split(depotDir)[1]
> +        return depotDir.split('/')[-1]
>  
>      def run(self, args):
>          if len(args) < 1:
> @@ -4252,8 +4257,8 @@ def main():
>                          chdir(cdup);
>  
>          if not isValidGitDir(cmd.gitdir):
> -            if isValidGitDir(cmd.gitdir + "/.git"):
> -                cmd.gitdir += "/.git"
> +            if isValidGitDir(os.path.join(cmd.gitdir, ".git")):
> +                cmd.gitdir = os.path.join(cmd.gitdir, ".git")
>              else:
>                  die("fatal: cannot locate git repository at %s" % cmd.gitdir)

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v4 07/11] git-p4: Add a helper class for stream writing
  2019-12-04 22:29       ` [PATCH v4 07/11] git-p4: Add a helper class for stream writing Ben Keene via GitGitGadget
@ 2019-12-05 13:42         ` Junio C Hamano
  2019-12-05 19:52           ` Ben Keene
  0 siblings, 1 reply; 64+ messages in thread
From: Junio C Hamano @ 2019-12-05 13:42 UTC (permalink / raw)
  To: Ben Keene via GitGitGadget; +Cc: git, Ben Keene

"Ben Keene via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Ben Keene <seraphire@gmail.com>
>
> This is a transtional commit that does not change current behvior.  It adds a new class Py23File.

Perhaps s/transitional/preparatory/?  It does not change the
behaviour because nobody uses the class yet, if I understand
correctly.  Which is fine.

It is kind of surprising that each project needs to reinvent and
maintain a wrapper class like this one, as what the new class does
smells quite generic.

> Following the Python recommendation of keeping text as unicode internally and only converting to and from bytes on input and output, this class provides an interface for the methods used for reading and writing files and file like streams.
>
> Create a class that wraps the input and output functions used by the git-p4.py code for reading and writing to standard file handles.
>
> The methods of this class should take a Unicode string for writing and return unicode strings in reads.  This class should be a drop-in for existing file like streams
>
> The following methods should be coded for supporting existing read/write calls:
> * write - this should write a Unicode string to the underlying stream
> * read - this should read from the underlying stream and cast the bytes as a unicode string
> * readline - this should read one line of text from the underlying stream and cast it as a unicode string
> * readline - this should read a number of lines, optionally hinted, and cast each line as a unicode string
>
> The expression "cast as a unicode string" is used because the code should use the AS_BYTES() and AS_UNICODE() functions instead of cohercing the data to actual unicode strings or bytes.  This allows python 2 code to continue to use the internal "str" data type instead of converting the data back and forth to actual unicode strings. This retains current python2 support while python3 support may be incomplete.
>
> Signed-off-by: Ben Keene <seraphire@gmail.com>
> (cherry picked from commit 12919111fbaa3e4c0c4c2fdd4f79744cc683d860)
> ---
>  git-p4.py | 66 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 66 insertions(+)
>
> diff --git a/git-p4.py b/git-p4.py
> index 7ac8cb42ef..0da640be93 100755
> --- a/git-p4.py
> +++ b/git-p4.py
> @@ -4182,6 +4182,72 @@ def run(self, args):
>              print("%s <= %s (%s)" % (branch, ",".join(settings["depot-paths"]), settings["change"]))
>          return True
>  
> +class Py23File():
> +    """ Python2/3 Unicode File Wrapper 
> +    """
> +    
> +    stream_handle = None
> +    verbose       = False
> +    debug_handle  = None
> +   
> +    def __init__(self, stream_handle, verbose = False):
> +        """ Create a Python3 compliant Unicode to Byte String
> +            Windows compatible wrapper
> +
> +            stream_handle = the underlying file-like handle
> +            verbose       = Boolean if content should be echoed
> +        """
> +        self.stream_handle = stream_handle
> +        self.verbose       = verbose
> +
> +    def write(self, utf8string):
> +        """ Writes the utf8 encoded string to the underlying 
> +            file stream
> +        """
> +        self.stream_handle.write(as_bytes(utf8string))
> +        if self.verbose:
> +            sys.stderr.write("Stream Output: %s" % utf8string)
> +            sys.stderr.flush()
> +
> +    def read(self, size = None):
> +        """ Reads int charcters from the underlying stream 
> +            and converts it to utf8.
> +
> +            Be aware, the size value is for reading the underlying
> +            bytes so the value may be incorrect. Usage of the size
> +            value is discouraged.
> +        """
> +        if size == None:
> +            return as_string(self.stream_handle.read())
> +        else:
> +            return as_string(self.stream_handle.read(size))
> +
> +    def readline(self):
> +        """ Reads a line from the underlying byte stream 
> +            and converts it to utf8
> +        """
> +        return as_string(self.stream_handle.readline())
> +
> +    def readlines(self, sizeHint = None):
> +        """ Returns a list containing lines from the file converted to unicode.
> +
> +            sizehint - Optional. If the optional sizehint argument is 
> +            present, instead of reading up to EOF, whole lines totalling 
> +            approximately sizehint bytes are read.
> +        """
> +        lines = self.stream_handle.readlines(sizeHint)
> +        for i in range(0, len(lines)):
> +            lines[i] = as_string(lines[i])
> +        return lines
> +
> +    def close(self):
> +        """ Closes the underlying byte stream """
> +        self.stream_handle.close()
> +
> +    def flush(self):
> +        """ Flushes the underlying byte stream """
> +        self.stream_handle.flush()
> +
>  class HelpFormatter(optparse.IndentedHelpFormatter):
>      def __init__(self):
>          optparse.IndentedHelpFormatter.__init__(self)

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v4 08/11] git-p4: p4CmdList  - support Unicode encoding
  2019-12-04 22:29       ` [PATCH v4 08/11] git-p4: p4CmdList - support Unicode encoding Ben Keene via GitGitGadget
@ 2019-12-05 13:55         ` Junio C Hamano
  2019-12-05 20:23           ` Ben Keene
  0 siblings, 1 reply; 64+ messages in thread
From: Junio C Hamano @ 2019-12-05 13:55 UTC (permalink / raw)
  To: Ben Keene via GitGitGadget; +Cc: git, Ben Keene

"Ben Keene via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Ben Keene <seraphire@gmail.com>
>
> The p4CmdList is a commonly used function in the git-p4 code. It is used to execute a command in P4 and return the results of the call in a list.

Somewhere in the midway of the series, the log message starts using
all-caps AS_STRING and AS_BYTES to describe some specific things,
and it would help readers if the first one of these steps explain
what they mean (I am guessing AS_STRING is an unicode object in both
Python 2 and 3, and AS_BYTES is a plain vanilla string in Python 2,
or something like that?).

> Change this code to take a new optional parameter, encode_data that will optionally convert the data AS_STRING() that isto be returned by the function.

s/isto/is to/;

This sentence is a bit hard to read.

This change does not make the function optionally convert the input
we feed to the p4 command---it only changes the values in the
command output.  But the readers cannot tell that easily until
reading to the very end of the sentence, i.e. "returned by the
function", as written.

We probably want to be a bit more explicit to say what gets
converted; perhaps renaming the parameter to encode_cmd_output may
help.

> Change the code so that the key will always be encoded AS_STRING()

s/key/key of the returned hash/ or something to clarify what key you
are talking about.

> Data that is passed for standard input (stdin) should be AS_BYTES() to ensure unicode text that is supplied will be written out as bytes.

"Data that is passed to the standard input stream of the p4 process"
to clarify whose standard input you are talking about (iow, "git p4"
also has and it may use its standard input, but this function does
not muck with it).


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v4 09/11] git-p4: Add usability enhancements
  2019-12-04 22:29       ` [PATCH v4 09/11] git-p4: Add usability enhancements Ben Keene via GitGitGadget
@ 2019-12-05 14:04         ` Junio C Hamano
  2019-12-05 15:40           ` Ben Keene
  0 siblings, 1 reply; 64+ messages in thread
From: Junio C Hamano @ 2019-12-05 14:04 UTC (permalink / raw)
  To: Ben Keene via GitGitGadget; +Cc: git, Ben Keene

"Ben Keene via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Ben Keene <seraphire@gmail.com>
>
> Issue: when prompting the user with raw_input, the tests are not forgiving of user input.  For example, on the first query asks for a yes/no response. If the user enters the full word "yes" or "no" the test will fail. Additionally, offer the suggestion of setting git-p4.attemptRCSCleanup when applying a commit fails because of RCS keywords. Both of these changes are usability enhancement suggestions.

Drop "Issue: " and upcase "when" that follows.  The rest of the
paragraph reads a lot better without it as a human friendly
description.

"are usability enhancement suggestions"???  Leaves readers wonder
who suggested them, or you are suggesting but are willing the change
to be dropped, or what.  Be a bit more assertive if you want to say
that you believe these two would improve usability.

> Change the code prompting the user for input to sanitize the user input before checking the response by asking the response as a lower case string, trimming leading/trailing spaces, and returning the first character.
>
> Change the applyCommit() method that when applying a commit fails becasue of the P4 RCS Keywords, the user should consider setting git-p4.attemptRCSCleanup.

s/becasue/because/;

I have a feeling that these two may be worth doing but are totally
separate issues, deserving two separate commits.  Is there a good
reason why these two must go hand-in-hand?


> Signed-off-by: Ben Keene <seraphire@gmail.com>
> (cherry picked from commit 1fab571664f5b6ad4ef321199f52615a32a9f8c7)
> ---
>  git-p4.py | 31 ++++++++++++++++++++++++++-----
>  1 file changed, 26 insertions(+), 5 deletions(-)
>
> diff --git a/git-p4.py b/git-p4.py
> index f7c0ef0c53..f13e4645a3 100755
> --- a/git-p4.py
> +++ b/git-p4.py
> @@ -1909,7 +1909,8 @@ def edit_template(self, template_file):
>              return True
>  
>          while True:
> -            response = raw_input("Submit template unchanged. Submit anyway? [y]es, [n]o (skip this patch) ")
> +            response = raw_input("Submit template unchanged. Submit anyway? [y]es, [n]o (skip this patch) ").lower() \
> +                .strip()[0]

You could have saved the patch by doing

	+	.lower().strip()[0]

instead, no?

I wonder if it would be better to write a thin wrapper around raw_input()
that does the "downcase and take the first meaningful letter" thing
for you and call it prompt() or something like that.

> @@ -4327,7 +4343,12 @@ def main():
>                                     description = cmd.description,
>                                     formatter = HelpFormatter())
>  
> -    (cmd, args) = parser.parse_args(sys.argv[2:], cmd);
> +    try:
> +        (cmd, args) = parser.parse_args(sys.argv[2:], cmd);
> +    except:
> +        parser.print_help()
> +        raise
> +

This change may be a good idea to give help text when the command
line parsing fails, but a good change deserves to be explained.  I
do not think I saw any mention of it in the proposed log message,
though.

>      global verbose
>      verbose = cmd.verbose
>      if cmd.needsGit:

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v4 09/11] git-p4: Add usability enhancements
  2019-12-05 14:04         ` Junio C Hamano
@ 2019-12-05 15:40           ` Ben Keene
  0 siblings, 0 replies; 64+ messages in thread
From: Ben Keene @ 2019-12-05 15:40 UTC (permalink / raw)
  To: Junio C Hamano, Ben Keene via GitGitGadget; +Cc: git


On 12/5/2019 9:04 AM, Junio C Hamano wrote:
> "Ben Keene via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
>> From: Ben Keene <seraphire@gmail.com>
>>
>> Issue: when prompting the user with raw_input, the tests are not forgiving of user input.  For example, on the first query asks for a yes/no response. If the user enters the full word "yes" or "no" the test will fail. Additionally, offer the suggestion of setting git-p4.attemptRCSCleanup when applying a commit fails because of RCS keywords. Both of these changes are usability enhancement suggestions.
> Drop "Issue: " and upcase "when" that follows.  The rest of the
> paragraph reads a lot better without it as a human friendly
> description.
>
> "are usability enhancement suggestions"???  Leaves readers wonder
> who suggested them, or you are suggesting but are willing the change
> to be dropped, or what.  Be a bit more assertive if you want to say
> that you believe these two would improve usability.
Thank you and I reworked my submissions. I'm moving them to a separate 
PR and will split the commit into 3 separate commits.
>> Change the code prompting the user for input to sanitize the user input before checking the response by asking the response as a lower case string, trimming leading/trailing spaces, and returning the first character.
>>
>> Change the applyCommit() method that when applying a commit fails becasue of the P4 RCS Keywords, the user should consider setting git-p4.attemptRCSCleanup.
> s/becasue/because/;
>
> I have a feeling that these two may be worth doing but are totally
> separate issues, deserving two separate commits.  Is there a good
> reason why these two must go hand-in-hand?
>
Good idea, and I split them out.
>> Signed-off-by: Ben Keene <seraphire@gmail.com>
>> (cherry picked from commit 1fab571664f5b6ad4ef321199f52615a32a9f8c7)
>> ---
>>   git-p4.py | 31 ++++++++++++++++++++++++++-----
>>   1 file changed, 26 insertions(+), 5 deletions(-)
>>
>> diff --git a/git-p4.py b/git-p4.py
>> index f7c0ef0c53..f13e4645a3 100755
>> --- a/git-p4.py
>> +++ b/git-p4.py
>> @@ -1909,7 +1909,8 @@ def edit_template(self, template_file):
>>               return True
>>   
>>           while True:
>> -            response = raw_input("Submit template unchanged. Submit anyway? [y]es, [n]o (skip this patch) ")
>> +            response = raw_input("Submit template unchanged. Submit anyway? [y]es, [n]o (skip this patch) ").lower() \
>> +                .strip()[0]
> You could have saved the patch by doing
>
> 	+	.lower().strip()[0]
>
> instead, no?
>
> I wonder if it would be better to write a thin wrapper around raw_input()
> that does the "downcase and take the first meaningful letter" thing
> for you and call it prompt() or something like that.
I created a new function prompt() as you suggested.
>> @@ -4327,7 +4343,12 @@ def main():
>>                                      description = cmd.description,
>>                                      formatter = HelpFormatter())
>>   
>> -    (cmd, args) = parser.parse_args(sys.argv[2:], cmd);
>> +    try:
>> +        (cmd, args) = parser.parse_args(sys.argv[2:], cmd);
>> +    except:
>> +        parser.print_help()
>> +        raise
>> +
> This change may be a good idea to give help text when the command
> line parsing fails, but a good change deserves to be explained.  I
> do not think I saw any mention of it in the proposed log message,
> though.

Yes, you're right.  I split this out into a separate commit as well and 
gave it a place or prominence.

>>       global verbose
>>       verbose = cmd.verbose
>>       if cmd.needsGit:

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v4 00/11] git-p4.py: Cast byte strings to unicode strings in python3
  2019-12-05  9:54       ` [PATCH v4 00/11] git-p4.py: Cast byte strings to unicode strings in python3 Luke Diamand
@ 2019-12-05 16:16         ` Ben Keene
  2019-12-05 18:51           ` Denton Liu
  0 siblings, 1 reply; 64+ messages in thread
From: Ben Keene @ 2019-12-05 16:16 UTC (permalink / raw)
  To: Luke Diamand, Ben Keene via GitGitGadget; +Cc: Git Users, Junio C Hamano


On 12/5/2019 4:54 AM, Luke Diamand wrote:
> On Wed, 4 Dec 2019 at 22:29, Ben Keene via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
>> Issue: The current git-p4.py script does not work with python3.
>>
>> I have attempted to use the P4 integration built into GIT and I was unable
>> to get the program to run because I have Python 3.8 installed on my
>> computer. I was able to get the program to run when I downgraded my python
>> to version 2.7. However, python 2 is reaching its end of life.
>>
>> Submission: I am submitting a patch for the git-p4.py script that partially
>> supports python 3.8. This code was able to pass the basic tests (t9800) when
>> run against Python3. This provides basic functionality.
>>
>> In an attempt to pass the t9822 P4 path-encoding test, a new parameter for
>> git P4 Clone was introduced.
>>
>> --encoding Format-identifier
>>
>> This will create the GIT repository following the current functionality;
>> however, before importing the files from P4, it will set the
>> git-p4.pathEncoding option so any files or paths that are encoded with
>> non-ASCII/non-UTF-8 formats will import correctly.
>>
>> Technical details: The script was updated by futurize (
>> https://python-future.org/futurize.html) to support Py2/Py3 syntax. The few
>> references to classes in future were reworked so that future would not be
>> required. The existing code test for Unicode support was extended to
>> normalize the classes “unicode” and “bytes” to across platforms:
>>
>>   * ‘unicode’ is an alias for ‘str’ in Py3 and is the unicode class in Py2.
>>   * ‘bytes’ is bytes in Py3 and an alias for ‘str’ in Py2.
>>
>> New coercion methods were written for both Python2 and Python3:
>>
>>   * as_string(text) – In Python3, this encodes a bytes object as a UTF-8
>>     encoded Unicode string.
>>   * as_bytes(text) – In Python3, this decodes a Unicode string to an array of
>>     bytes.
>>
>> In Python2, these functions do not change the data since a ‘str’ object
>> function in both roles as strings and byte arrays. This reduces the
>> potential impact on backward compatibility with Python 2.
>>
>>   * to_unicode(text) – ensures that the supplied data is encoded as a UTF-8
>>     string. This function will encode data in both Python2 and Python3. *
>>        path_as_string(path) – This function is an extension function that
>>        honors the option “git-p4.pathEncoding” to convert a set of bytes or
>>        characters to UTF-8. If the str/bytes cannot decode as ASCII, it will
>>        use the encodeWithUTF8() method to convert the custom encoded bytes to
>>        Unicode in UTF-8.
>>
>>
>>
>> Generally speaking, information in the script is converted to Unicode as
>> early as possible and converted back to a byte array just before passing to
>> external programs or files. The exception to this rule is P4 Repository file
>> paths.
>>
>> Paths are not converted but left as “bytes” so the original file path
>> encoding can be preserved. This formatting is required for commands that
>> interact with the P4 file path. When the file path is used by GIT, it is
>> converted with encodeWithUTF8().
>>
> Almost all the tests pass now - nice!
>
> (There's one test that fails for me, t9830-git-p4-symlink-dir.sh).


Which version of Python are running the failing test against?  I run it 
against Python 2.7 and it passes the test. I don't expect all Python 3.x 
tests to pass yet, just t9800.


>
> Nitpicking:
>
> - There are some bits of trailing whitespace around - can you strip
> those out? You can use "git diff --check".


Is there a way that I can find out which branches I need to remove white 
space from now that they have been committed?


> - Also I think the convention for git commits is that they be limited
> to 72 (?) characters.


I'm going through all my commits and fixing them.


> - In 10dc commit message, s/behvior/behavior
> - Maybe submit 4fc4 as a separate patch series? It doesn't seem
> directly related to your python3 changes.


I moved the enhancements to https://github.com/git/git/pull/675


> - s/howerver/however/
>
> The comment at line 3261 (showing the fast-import syntax) has wonky
> indentation, and needs a space after the '#'.
>
> This code looked like we're duplicating stuff:
>
> +    if isinstance(path, unicode):
> +        path = path.replace("%", "%25") \
> +                   .replace("*", "%2A") \
> +                   .replace("#", "%23") \
> +                   .replace("@", "%40")
> +    else:
> +        path = path.replace(b"%", b"%25") \
> +                   .replace(b"*", b"%2A") \
> +                   .replace(b"#", b"%23") \
> +                   .replace(b"@", b"%40")
>
> I wonder if we can have a helper to do this?

I was just looking at this code block, and at this time, I'm not sure if 
the text coming in will be Unicode or bytes, so I'm hesitant to change 
it until more of the code is converted, but I understand about the 
duplication.


>
> In patchRCSKeywords() you've added code to cleanup outFile. But I
> wonder if we could just use a 'finally' block, or a contextexpr ("with
> blah as outFile:")
>
> I don't know if it's worth doing now that you've got it going, but at
> one point I tried simplifying code like this:
>
>     path_as_string(file['depotFile'])
> and
>     marshalled[b'data']
>
> by using a dictionary with overloaded operators which would do the
> bytes/string conversion automatically. However, your approach isn't
> actually _that_ invasive, so maybe this is not necessary.
>
> Looks good though, thanks!
> Luke
>
I toyed with making a class object that would hold the path data and 
have methods to cast to bytes and encodeWithUTF8() and Unicode versions, 
but it quickly got out of hand.


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v4 01/11] git-p4: select p4 binary by operating-system
  2019-12-05 10:19         ` Denton Liu
@ 2019-12-05 16:32           ` Ben Keene
  0 siblings, 0 replies; 64+ messages in thread
From: Ben Keene @ 2019-12-05 16:32 UTC (permalink / raw)
  To: Denton Liu, Ben Keene via GitGitGadget; +Cc: git, Junio C Hamano


On 12/5/2019 5:19 AM, Denton Liu wrote:
> Hi Ben,
>
> First of all, as a note to you and possibly others, I don't have much
> (read: any) experience with git-p4. I do have experience with Python and
> how git.git generally does things so I'll be reviewing from that
> perspective.
>
> On Wed, Dec 04, 2019 at 10:29:27PM +0000, Ben Keene via GitGitGadget wrote:
>> From: Ben Keene <seraphire@gmail.com>
>>
>> Depending on the version of GIT and Python installed, the perforce program (p4) may not resolve on Windows without the program extension.
> Nit: "GIT" should be written as "Git" when referring to the whole
> project and "git" when referring to the command. Never in all-caps.
>
> Also, please wrap your paragraphs at 72 characters. I'll say it once
> here but it applies to your whole series.


Got it. I'll update all my commit messages to fit within this space.  I 
didn't realize
they didn't word wrap properly. (I'm using a GUI tool to manage this.)


>> Check the operating system (platform.system) and if it is reporting that it is Windows, use the full filename of "p4.exe" instead of "p4"
>>
>> The original code unconditionally used "p4" as the binary filename.
> As a rule of thumb, we want to state the problem first before we state
> what we did (and why). I'd move this paragraph up.
>
>> This change is Python2 and Python3 compatible.
>>
>> Thanks to: Junio C Hamano <gitster@pobox.com> and  Denton Liu <liu.denton@gmail.com> for patiently explaining proper format for my submissions.
> I appreciate the credit but I don't think it's necessary. At _most_, you
> could include the
>
> 	Helped-by: Junio C Hamano <gitster@pobox.com>
> 	Helped-by: Denton Liu <liu.denton@gmail.com>
>
> tags before your signoff but I don't think we've done anything to
> warrant it.


Thank you, I'll keep that in mind for the next submission!


>> Signed-off-by: Ben Keene <seraphire@gmail.com>
>> (cherry picked from commit 9a3a5c4e6d29dbef670072a9605c7a82b3729434)
> You should remove this line in all of your commits. The referenced
> commit isn't public so the information isn't very useful. Also, try to
> not include anything after your signoff so if this hypothetically were
> useful information, you'd include it before your signoff.
>
> If it's information that's ephemerally useful for current reviewers but
> not for future readers of your commit in the log message, you can
> include it after the three hyphens...


I'll look to pull these out before I update my submission.


>> ---
> like this and it won't be included as part of the log message.
>
>>   git-p4.py | 6 +++++-
>>   1 file changed, 5 insertions(+), 1 deletion(-)
>>
>> diff --git a/git-p4.py b/git-p4.py
>> index 60c73b6a37..b2ffbc057b 100755
>> --- a/git-p4.py
>> +++ b/git-p4.py
>> @@ -75,7 +75,11 @@ def p4_build_cmd(cmd):
>>       location. It means that hooking into the environment, or other configuration
>>       can be done more easily.
>>       """
>> -    real_cmd = ["p4"]
>> +    # Look for the P4 binary
> I don't think this comment is necessary as the code itself is pretty
> self-explanatory.
>
>> +    if (platform.system() == "Windows"):
>> +        real_cmd = ["p4.exe"]
> You have trailing whitespace here. Try to run `git diff --check` before
> committing to ensure that you have no whitespace errors.
>
> Thanks,
>
> Denton
>
>> +    else:
>> +        real_cmd = ["p4"]
>>   
>>       user = gitConfig("git-p4.user")
>>       if len(user) > 0:
>> -- 
>> gitgitgadget
>>

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v4 02/11] git-p4: change the expansion test from basestring to list
  2019-12-05 10:27         ` Denton Liu
@ 2019-12-05 17:05           ` Ben Keene
  0 siblings, 0 replies; 64+ messages in thread
From: Ben Keene @ 2019-12-05 17:05 UTC (permalink / raw)
  To: Denton Liu, Ben Keene via GitGitGadget; +Cc: git, Junio C Hamano


On 12/5/2019 5:27 AM, Denton Liu wrote:
> Hi Ben,
>
> On Wed, Dec 04, 2019 at 10:29:28PM +0000, Ben Keene via GitGitGadget wrote:
>> From: Ben Keene <seraphire@gmail.com>
>>
>> Python 3+ handles strings differently than Python 2.7.
> Do you mean Python 3?
>
>> Since Python 2 is reaching it's end of life, a series of changes are being submitted to enable python 3.7+ support. The current code fails basic tests under python 3.7.
> Python 3.5 doesn't reach EOL until Q4 2020[1]. We should be testing
> these changes under 3.5 to ensure that we're not accidentally
> introducing stuff that's not backwards compatible.


I changed my commit text to say support for version 3.5 (which is 
actually the version I am running the test with).


>> Change references to basestring in the isinstance tests to use list instead. This prepares the code to remove all references to basestring.
>>
>> The original code used basestring in a test to determine if a list or literal string was passed into 9 different functions.  This is used to determine if the shell should be evoked when calling subprocess methods.
> Once again, I'd swap the above two paragraphs. Problem then solution.
>
> Also, did you mean "invoked" instead of "evoked"?


Changed.  And yes, I meant 'invoked'. I wasn't trying to make my code 
feel anything!


>> Signed-off-by: Ben Keene <seraphire@gmail.com>
>> (cherry picked from commit 5b1b1c145479b5d5fd242122737a3134890409e6)
>> ---
>>   git-p4.py | 18 +++++++++---------
>>   1 file changed, 9 insertions(+), 9 deletions(-)
> The patch itself looks good, though.
>
> [1]: https://devguide.python.org/#branchstatus

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v4 03/11] git-p4: add new helper functions for python3 conversion
  2019-12-05 10:40         ` Denton Liu
@ 2019-12-05 18:42           ` Ben Keene
  0 siblings, 0 replies; 64+ messages in thread
From: Ben Keene @ 2019-12-05 18:42 UTC (permalink / raw)
  To: Denton Liu, Ben Keene via GitGitGadget; +Cc: git, Junio C Hamano


On 12/5/2019 5:40 AM, Denton Liu wrote:
> On Wed, Dec 04, 2019 at 10:29:29PM +0000, Ben Keene via GitGitGadget wrote:
>> From: Ben Keene <seraphire@gmail.com>
>>
>> Python 3+ handles strings differently than Python 2.7.  Since Python 2 is reaching it's end of life, a series of changes are being submitted to enable python 3.7+ support. The current code fails basic tests under python 3.7.
>>
>> Change the existing unicode test add new support functions for python2-python3 support.
>>
>> Define the following variables:
>> - isunicode - a boolean variable that states if the version of python natively supports unicode (true) or not (false). This is true for Python3 and false for Python2.
>> - unicode - a type alias for the datatype that holds a unicode string.  It is assigned to a str under python 3 and the unicode type for Python2.
>> - bytes - a type alias for an array of bytes.  It is assigned the native bytes type for Python3 and str for Python2.
>>
>> Add the following new functions:
>>
>> - as_string(text) - A new function that will convert a byte array to a unicode (UTF-8) string under python 3.  Under python 2, this returns the string unchanged.
>> - as_bytes(text) - A new function that will convert a unicode string to a byte array under python 3.  Under python 2, this returns the string unchanged.
>> - to_unicode(text) - Converts a text string as Unicode(UTF-8) on both Python2 and Python3.
>>
>> Add a new function alias raw_input:
>> If raw_input does not exist (it was renamed to input in python 3) alias input as raw_input.
>>
>> The AS_STRING and AS_BYTES functions allow for modifying the code with a minimal amount of impact on Python2 support.  When a string is expected, the as_string() will be used to convert "cast" the incoming "bytes" to a string type. Conversely as_bytes() will be used to convert a "string" to a "byte array" type. Since Python2 overloads the datatype 'str' to serve both purposes, the Python2 versions of these function do not change the data, since the str functions as both a byte array and a string.
> How come AS_STRING and AS_BYTES are all-caps here?


I changed them.  I used all caps to designate that they are code string. 
I changed them to as_string() and as_bytes()


>
>> basestring is removed since its only references are found in tests that were changed in the previous change list.
>>
>> Signed-off-by: Ben Keene <seraphire@gmail.com>
>> (cherry picked from commit 7921aeb3136b07643c1a503c2d9d8b5ada620356)
>> ---
>>   git-p4.py | 70 +++++++++++++++++++++++++++++++++++++++++++++++++++----
>>   1 file changed, 66 insertions(+), 4 deletions(-)
>>
>> diff --git a/git-p4.py b/git-p4.py
>> index 0f27996393..93dfd0920a 100755
>> --- a/git-p4.py
>> +++ b/git-p4.py
>> @@ -32,16 +32,78 @@
>>       unicode = unicode
>>   except NameError:
>>       # 'unicode' is undefined, must be Python 3
>> -    str = str
>> +    #
>> +    # For Python3 which is natively unicode, we will use
>> +    # unicode for internal information but all P4 Data
>> +    # will remain in bytes
>> +    isunicode = True
>>       unicode = str
>>       bytes = bytes
>> -    basestring = (str,bytes)
>> +
>> +    def as_string(text):
>> +        """Return a byte array as a unicode string"""
>> +        if text == None:
> Nit: use `text is None` instead. Actually, any time you're checking an
> object to see if it's None, you should use `is` instead of `==` since
> there's usually only one None reference.

I changed this in this commit and will attempt to fix this in all the 
following commits as well.


>
>> +            return None
>> +        if isinstance(text, bytes):
>> +            return unicode(text, "utf-8")
>> +        else:
>> +            return text
>> +
>> +    def as_bytes(text):
>> +        """Return a Unicode string as a byte array"""
>> +        if text == None:
>> +            return None
>> +        if isinstance(text, bytes):
>> +            return text
>> +        else:
>> +            return bytes(text, "utf-8")
>> +
>> +    def to_unicode(text):
>> +        """Return a byte array as a unicode string"""
>> +        return as_string(text)
>> +
>> +    def path_as_string(path):
>> +        """ Converts a path to the UTF8 encoded string """
>> +        if isinstance(path, unicode):
>> +            return path
>> +        return encodeWithUTF8(path).decode('utf-8')
>> +
> Trailing whitespace.
>
>>   else:
>>       # 'unicode' exists, must be Python 2
>> -    str = str
>> +    #
>> +    # We will treat the data as:
>> +    #   str   -> str
>> +    #   bytes -> str
>> +    # So for Python2 these functions are no-ops
>> +    # and will leave the data in the ambiguious
>> +    # string/bytes state
>> +    isunicode = False
>>       unicode = unicode
>>       bytes = str
>> -    basestring = basestring
>> +
>> +    def as_string(text):
>> +        """ Return text unaltered (for Python3 support) """
> I didn't mention this in earlier emails but it's been bothering me a
> lot: is there any reason why you write it as "Python3" vs. "Python 3"
> sometimes (and Python2 as well)? If there's no difference, then we
> should probably stick to one variant in both the commit messages and in
> the code. (I prefer the spaced variant.)


The difference was sloppy typing.  Like the "is None" and trailing white 
spaces, I'll work on fixing these.


>> +        return text
>> +
>> +    def as_bytes(text):
>> +        """ Return text unaltered (for Python3 support) """
>> +        return text
>> +
>> +    def to_unicode(text):
>> +        """Return a string as a unicode string"""
>> +        return text.decode('utf-8')
>> +
> Trailing whitespace.
>
>> +    def path_as_string(path):
>> +        """ Converts a path to the UTF8 encoded bytes """
>> +        return encodeWithUTF8(path)
>> +
>> +
>> +
> Trailing whitespace.
>
>> +# Check for raw_input support
>> +try:
>> +    raw_input
>> +except NameError:
>> +    raw_input = input
>>   
>>   try:
>>       from subprocess import CalledProcessError
>> -- 
>> gitgitgadget
>>

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v4 00/11] git-p4.py: Cast byte strings to unicode strings in python3
  2019-12-05 16:16         ` Ben Keene
@ 2019-12-05 18:51           ` Denton Liu
  2019-12-05 20:47             ` Ben Keene
  0 siblings, 1 reply; 64+ messages in thread
From: Denton Liu @ 2019-12-05 18:51 UTC (permalink / raw)
  To: Ben Keene
  Cc: Luke Diamand, Ben Keene via GitGitGadget, Git Users, Junio C Hamano

On Thu, Dec 05, 2019 at 11:16:27AM -0500, Ben Keene wrote:
> 
> On 12/5/2019 4:54 AM, Luke Diamand wrote:
> > On Wed, 4 Dec 2019 at 22:29, Ben Keene via GitGitGadget
> > - There are some bits of trailing whitespace around - can you strip
> > those out? You can use "git diff --check".
> 
> 
> Is there a way that I can find out which branches I need to remove white
> space from now that they have been committed?

I'm assuming you mean commits? You can run

	git log --check master..

and git will highlight the whitespace errors.

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v4 05/11] git-p4: Add new functions in preparation of usage
  2019-12-05 10:50         ` Denton Liu
@ 2019-12-05 19:23           ` Ben Keene
  0 siblings, 0 replies; 64+ messages in thread
From: Ben Keene @ 2019-12-05 19:23 UTC (permalink / raw)
  To: Denton Liu, Ben Keene via GitGitGadget; +Cc: git, Junio C Hamano


On 12/5/2019 5:50 AM, Denton Liu wrote:
>> Subject: git-p4: Add new functions in preparation of usage
> Nit: as a convention, you should lowercase the letter after the colon in
> the subject. As in "git-p4: add new functions..."
>
> This applies for other patches as well.


Got it.  Changing all leading characters to lower case.


>
> On Wed, Dec 04, 2019 at 10:29:31PM +0000, Ben Keene via GitGitGadget wrote:
>> From: Ben Keene <seraphire@gmail.com>
>>
>> This changelist is an intermediate submission for migrating the P4 support from Python2 to Python3. The code needs access to the encodeWithUTF8() for support of non-UTF8 filenames in the clone class as well as the sync class.
>>
>> Move the function encodeWithUTF8() from the P4Sync class to a stand-alone function.  This will allow other classes to use this function without instanciating the P4Sync class. Change the self.verbose reference to an optional method parameter. Update the existing references to this function to pass the self.verbose since it is no longer available on "self" since the function is no longer contained on the P4Sync class.
> Hmmm, so does the patch before this not actually work since
> encodeWithUTF8() isn't defined yet? When you reroll this series, you
> should swap the order of the patches since the previous patch depends on
> this one, not the other way around.

Good catch.  That's correct, the encodeWithUTF8() should be first.  I 
moved that commit earlier in the chain and actually split it up from the 
changes to write_pipe and gitConfigSet() so the text will be easier to see.


>> Modify the functions write_pipe() and p4_write_pipe() to remove the return value.  The return value for both functions is the number of bytes, but the meaning is lost under python3 since the count does not match the number of characters that may have been encoded.  Additionally, the return value was never used, so this is removed to avoid future ambiguity.
>>
>> Add a new method gitConfigSet(). This method will set a value in the git configuration cache list.
>>
>> Signed-off-by: Ben Keene <seraphire@gmail.com>
>> (cherry picked from commit affe888f432bb6833df78962e8671fccdf76c47a)
>> ---
>>   git-p4.py | 60 ++++++++++++++++++++++++++++++++++++++++---------------
>>   1 file changed, 44 insertions(+), 16 deletions(-)
>>
>> diff --git a/git-p4.py b/git-p4.py
>> index b283ef1029..2659531c2e 100755
>> --- a/git-p4.py
>> +++ b/git-p4.py
>> @@ -237,6 +237,8 @@ def die(msg):
>>           sys.exit(1)
>>   
>>   def write_pipe(c, stdin):
>> +    """ Executes the command 'c', passing 'stdin' on the standard input
>> +    """
>>       if verbose:
>>           sys.stderr.write('Writing pipe: %s\n' % str(c))
>>   
>> @@ -248,11 +250,12 @@ def write_pipe(c, stdin):
>>       if p.wait():
>>           die('Command failed: %s' % str(c))
>>   
>> -    return val
>>   
>>   def p4_write_pipe(c, stdin):
>> +    """ Runs a P4 command 'c', passing 'stdin' data to P4
>> +    """
>>       real_cmd = p4_build_cmd(c)
>> -    return write_pipe(real_cmd, stdin)
>> +    write_pipe(real_cmd, stdin)
>>   
>>   def read_pipe_full(c):
>>       """ Read output from  command. Returns a tuple
>> @@ -653,6 +656,38 @@ def isModeExec(mode):
>>       # otherwise False.
>>       return mode[-3:] == "755"
>>   
>> +def encodeWithUTF8(path, verbose = False):
> Nit: no spaces surrounding `=` in default args.


Fixed


>> +    """ Ensure that the path is encoded as a UTF-8 string
>> +
>> +        Returns bytes(P3)/str(P2)
>> +    """
>> +
> Trailing whitespace.
>
>> +    if isunicode:
>> +        try:
>> +            if isinstance(path, unicode):
>> +                # It is already unicode, cast it as a bytes
>> +                # that is encoded as utf-8.
>> +                return path.encode('utf-8', 'strict')
>> +            path.decode('ascii', 'strict')
>> +        except:
>> +            encoding = 'utf8'
>> +            if gitConfig('git-p4.pathEncoding'):
>> +                encoding = gitConfig('git-p4.pathEncoding')
>> +            path = path.decode(encoding, 'replace').encode('utf8', 'replace')
>> +            if verbose:
>> +                print('\nNOTE:Path with non-ASCII characters detected. Used %s to encode: %s ' % (encoding, to_unicode(path)))
>> +    else:
> Trailing whitespace.
>
>> +        try:
>> +            path.decode('ascii')
>> +        except:
>> +            encoding = 'utf8'
>> +            if gitConfig('git-p4.pathEncoding'):
>> +                encoding = gitConfig('git-p4.pathEncoding')
>> +            path = path.decode(encoding, 'replace').encode('utf8', 'replace')
>> +            if verbose:
>> +                print('Path with non-ASCII characters detected. Used %s to encode: %s ' % (encoding, path))
>> +    return path
>> +
>>   class P4Exception(Exception):
>>       """ Base class for exceptions from the p4 client """
>>       def __init__(self, exit_code):
>> @@ -891,6 +926,11 @@ def gitConfigList(key):
>>               _gitConfig[key] = []
>>       return _gitConfig[key]
>>   
>> +def gitConfigSet(key, value):
>> +    """ Set the git configuration key 'key' to 'value' for this session
>> +    """
>> +    _gitConfig[key] = value
>> +
>>   def p4BranchesInGit(branchesAreInRemotes=True):
>>       """Find all the branches whose names start with "p4/", looking
>>          in remotes or heads as specified by the argument.  Return
>> @@ -2814,24 +2854,12 @@ def writeToGitStream(self, gitMode, relPath, contents):
>>               self.gitStream.write(d)
>>           self.gitStream.write('\n')
>>   
>> -    def encodeWithUTF8(self, path):
>> -        try:
>> -            path.decode('ascii')
>> -        except:
>> -            encoding = 'utf8'
>> -            if gitConfig('git-p4.pathEncoding'):
>> -                encoding = gitConfig('git-p4.pathEncoding')
>> -            path = path.decode(encoding, 'replace').encode('utf8', 'replace')
>> -            if self.verbose:
>> -                print('Path with non-ASCII characters detected. Used %s to encode: %s ' % (encoding, path))
>> -        return path
>> -
>>       # output one file from the P4 stream
>>       # - helper for streamP4Files
>>   
>>       def streamOneP4File(self, file, contents):
>>           relPath = self.stripRepoPath(file['depotFile'], self.branchPrefixes)
>> -        relPath = self.encodeWithUTF8(relPath)
>> +        relPath = encodeWithUTF8(relPath, self.verbose)
>>           if verbose:
>>               if 'fileSize' in self.stream_file:
>>                   size = int(self.stream_file['fileSize'])
>> @@ -2914,7 +2942,7 @@ def streamOneP4File(self, file, contents):
>>   
>>       def streamOneP4Deletion(self, file):
>>           relPath = self.stripRepoPath(file['path'], self.branchPrefixes)
>> -        relPath = self.encodeWithUTF8(relPath)
>> +        relPath = encodeWithUTF8(relPath, self.verbose)
>>           if verbose:
>>               sys.stdout.write("delete %s\n" % relPath)
>>               sys.stdout.flush()
>> -- 
>> gitgitgadget
>>

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v4 06/11] git-p4: Fix assumed path separators to be more Windows friendly
  2019-12-05 13:38         ` Junio C Hamano
@ 2019-12-05 19:37           ` Ben Keene
  0 siblings, 0 replies; 64+ messages in thread
From: Ben Keene @ 2019-12-05 19:37 UTC (permalink / raw)
  To: Junio C Hamano, Ben Keene via GitGitGadget; +Cc: git


On 12/5/2019 8:38 AM, Junio C Hamano wrote:
> "Ben Keene via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
>> From: Ben Keene <seraphire@gmail.com>
>>
>> When a computer is configured to use Git for windows and Python for windows, and not a Unix subsystem like cygwin or WSL, the directory separator changes and causes git-p4 to fail to properly determine paths.
>>
>> Fix 3 path separator errors:
>>
>> 1. getUserCacheFilename should not use string concatenation. Change this code to use os.path.join to build an OS tolerant path.
>> 2. defaultDestiantion used the OS.path.split to split depot paths.  This is incorrect on windows. Change the code to split on a forward slash(/) instead since depot paths use this character regardless  of the operating system.
>> 3. The call to isvalidGitDir() in the main code also used a literal forward slash. Change the cose to use os.path.join to correctly format the path for the operating system.
> s/isvalid/isValid/;
> s/cose/code/;
>
> Also please wrap your lines at around 72 columns (that will let
> reviewers quote what you write (which adds "> " prefix and consumes
> 2 more columns), and would allow us a handful of exchanges (each
> round adding ">" prefix to consume 1 more column) before bumping
> into the right edge of the terminal at 80 columns.
>
>> These three changes allow the suggested windows configuration to properly locate files while retaining the existing behavior on non-windows operating systems.
>>
>> Signed-off-by: Ben Keene <seraphire@gmail.com>
>> (cherry picked from commit a5b45c12c3861638a933b05a1ffee0c83978dcb2)
> As Denton mentioned, general public do not care if you "cherry
> picked" it from your earlier unpublished work.  Remove it.
>
> Aside from these small nits, the proposed log message for this step
> is quite cleanly done and easily readable.  All the decisions are
> clearly written and agreeable.  Nicely done.


Thank you. I've been working through all the commits and updating them.


>> ---
>>   git-p4.py | 13 +++++++++----
>>   1 file changed, 9 insertions(+), 4 deletions(-)
>>
>> diff --git a/git-p4.py b/git-p4.py
>> index 2659531c2e..7ac8cb42ef 100755
>> --- a/git-p4.py
>> +++ b/git-p4.py
>> @@ -1454,8 +1454,10 @@ def p4UserIsMe(self, p4User):
>>               return True
>>   
>>       def getUserCacheFilename(self):
>> +        """ Returns the filename of the username cache
>> +	    """
> Inconsistent use of spaces and a tab I see on these two lines.
> Intended?

Good catch! It should have been spaces.  Corrected.


>
>>           home = os.environ.get("HOME", os.environ.get("USERPROFILE"))
>> -        return home + "/.gitp4-usercache.txt"
>> +        return os.path.join(home, ".gitp4-usercache.txt")
>>   
>>       def getUserMapFromPerforceServer(self):
>>           if self.userMapFromPerforceServer:
>> @@ -3973,13 +3975,16 @@ def __init__(self):
>>           self.cloneBare = False
>>   
>>       def defaultDestination(self, args):
>> +        """ Returns the last path component as the default git
>> +            repository directory name
>> +        """
>>           ## TODO: use common prefix of args?
>>           depotPath = args[0]
>>           depotDir = re.sub("(@[^@]*)$", "", depotPath)
>>           depotDir = re.sub("(#[^#]*)$", "", depotDir)
>>           depotDir = re.sub(r"\.\.\.$", "", depotDir)
>>           depotDir = re.sub(r"/$", "", depotDir)
>> -        return os.path.split(depotDir)[1]
>> +        return depotDir.split('/')[-1]
>>   
>>       def run(self, args):
>>           if len(args) < 1:
>> @@ -4252,8 +4257,8 @@ def main():
>>                           chdir(cdup);
>>   
>>           if not isValidGitDir(cmd.gitdir):
>> -            if isValidGitDir(cmd.gitdir + "/.git"):
>> -                cmd.gitdir += "/.git"
>> +            if isValidGitDir(os.path.join(cmd.gitdir, ".git")):
>> +                cmd.gitdir = os.path.join(cmd.gitdir, ".git")
>>               else:
>>                   die("fatal: cannot locate git repository at %s" % cmd.gitdir)

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v4 07/11] git-p4: Add a helper class for stream writing
  2019-12-05 13:42         ` Junio C Hamano
@ 2019-12-05 19:52           ` Ben Keene
  0 siblings, 0 replies; 64+ messages in thread
From: Ben Keene @ 2019-12-05 19:52 UTC (permalink / raw)
  To: Junio C Hamano, Ben Keene via GitGitGadget; +Cc: git


On 12/5/2019 8:42 AM, Junio C Hamano wrote:
> "Ben Keene via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
>> From: Ben Keene <seraphire@gmail.com>
>>
>> This is a transtional commit that does not change current behvior.  It adds a new class Py23File.
> Perhaps s/transitional/preparatory/?  It does not change the
> behaviour because nobody uses the class yet, if I understand
> correctly.  Which is fine.
>
> It is kind of surprising that each project needs to reinvent and
> maintain a wrapper class like this one, as what the new class does
> smells quite generic.

It is a rather generic class.  My intention was to avoid adding
any additional dependencies so a small class that only implements
the few methods we need seemed safest.

I cleaned up this commit message as well.

>> Following the Python recommendation of keeping text as unicode internally and only converting to and from bytes on input and output, this class provides an interface for the methods used for reading and writing files and file like streams.
>>
>> Create a class that wraps the input and output functions used by the git-p4.py code for reading and writing to standard file handles.
>>
>> The methods of this class should take a Unicode string for writing and return unicode strings in reads.  This class should be a drop-in for existing file like streams
>>
>> The following methods should be coded for supporting existing read/write calls:
>> * write - this should write a Unicode string to the underlying stream
>> * read - this should read from the underlying stream and cast the bytes as a unicode string
>> * readline - this should read one line of text from the underlying stream and cast it as a unicode string
>> * readline - this should read a number of lines, optionally hinted, and cast each line as a unicode string
>>
>> The expression "cast as a unicode string" is used because the code should use the AS_BYTES() and AS_UNICODE() functions instead of cohercing the data to actual unicode strings or bytes.  This allows python 2 code to continue to use the internal "str" data type instead of converting the data back and forth to actual unicode strings. This retains current python2 support while python3 support may be incomplete.
>>
>> Signed-off-by: Ben Keene <seraphire@gmail.com>
>> (cherry picked from commit 12919111fbaa3e4c0c4c2fdd4f79744cc683d860)
>> ---
>>   git-p4.py | 66 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>   1 file changed, 66 insertions(+)
>>
>> diff --git a/git-p4.py b/git-p4.py
>> index 7ac8cb42ef..0da640be93 100755
>> --- a/git-p4.py
>> +++ b/git-p4.py
>> @@ -4182,6 +4182,72 @@ def run(self, args):
>>               print("%s <= %s (%s)" % (branch, ",".join(settings["depot-paths"]), settings["change"]))
>>           return True
>>   
>> +class Py23File():
>> +    """ Python2/3 Unicode File Wrapper
>> +    """
>> +
>> +    stream_handle = None
>> +    verbose       = False
>> +    debug_handle  = None
>> +
>> +    def __init__(self, stream_handle, verbose = False):
>> +        """ Create a Python3 compliant Unicode to Byte String
>> +            Windows compatible wrapper
>> +
>> +            stream_handle = the underlying file-like handle
>> +            verbose       = Boolean if content should be echoed
>> +        """
>> +        self.stream_handle = stream_handle
>> +        self.verbose       = verbose
>> +
>> +    def write(self, utf8string):
>> +        """ Writes the utf8 encoded string to the underlying
>> +            file stream
>> +        """
>> +        self.stream_handle.write(as_bytes(utf8string))
>> +        if self.verbose:
>> +            sys.stderr.write("Stream Output: %s" % utf8string)
>> +            sys.stderr.flush()
>> +
>> +    def read(self, size = None):
>> +        """ Reads int charcters from the underlying stream
>> +            and converts it to utf8.
>> +
>> +            Be aware, the size value is for reading the underlying
>> +            bytes so the value may be incorrect. Usage of the size
>> +            value is discouraged.
>> +        """
>> +        if size == None:
>> +            return as_string(self.stream_handle.read())
>> +        else:
>> +            return as_string(self.stream_handle.read(size))
>> +
>> +    def readline(self):
>> +        """ Reads a line from the underlying byte stream
>> +            and converts it to utf8
>> +        """
>> +        return as_string(self.stream_handle.readline())
>> +
>> +    def readlines(self, sizeHint = None):
>> +        """ Returns a list containing lines from the file converted to unicode.
>> +
>> +            sizehint - Optional. If the optional sizehint argument is
>> +            present, instead of reading up to EOF, whole lines totalling
>> +            approximately sizehint bytes are read.
>> +        """
>> +        lines = self.stream_handle.readlines(sizeHint)
>> +        for i in range(0, len(lines)):
>> +            lines[i] = as_string(lines[i])
>> +        return lines
>> +
>> +    def close(self):
>> +        """ Closes the underlying byte stream """
>> +        self.stream_handle.close()
>> +
>> +    def flush(self):
>> +        """ Flushes the underlying byte stream """
>> +        self.stream_handle.flush()
>> +
>>   class HelpFormatter(optparse.IndentedHelpFormatter):
>>       def __init__(self):
>>           optparse.IndentedHelpFormatter.__init__(self)

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v4 08/11] git-p4: p4CmdList - support Unicode encoding
  2019-12-05 13:55         ` Junio C Hamano
@ 2019-12-05 20:23           ` Ben Keene
  0 siblings, 0 replies; 64+ messages in thread
From: Ben Keene @ 2019-12-05 20:23 UTC (permalink / raw)
  To: Junio C Hamano, Ben Keene via GitGitGadget; +Cc: git


On 12/5/2019 8:55 AM, Junio C Hamano wrote:
> "Ben Keene via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
>> From: Ben Keene <seraphire@gmail.com>
>>
>> The p4CmdList is a commonly used function in the git-p4 code. It is used to execute a command in P4 and return the results of the call in a list.
> Somewhere in the midway of the series, the log message starts using
> all-caps AS_STRING and AS_BYTES to describe some specific things,
> and it would help readers if the first one of these steps explain
> what they mean (I am guessing AS_STRING is an unicode object in both
> Python 2 and 3, and AS_BYTES is a plain vanilla string in Python 2,
> or something like that?).

I rewrote almost the entire commit message. Hopefully this will clarify 
the code.

>> Change this code to take a new optional parameter, encode_data that will optionally convert the data AS_STRING() that isto be returned by the function.
> s/isto/is to/;
>
> This sentence is a bit hard to read.
>
> This change does not make the function optionally convert the input
> we feed to the p4 command---it only changes the values in the
> command output.  But the readers cannot tell that easily until
> reading to the very end of the sentence, i.e. "returned by the
> function", as written.
>
> We probably want to be a bit more explicit to say what gets
> converted; perhaps renaming the parameter to encode_cmd_output may
> help.


I renamed the parameter as suggested.


>> Change the code so that the key will always be encoded AS_STRING()
> s/key/key of the returned hash/ or something to clarify what key you
> are talking about.
>
>> Data that is passed for standard input (stdin) should be AS_BYTES() to ensure unicode text that is supplied will be written out as bytes.
> "Data that is passed to the standard input stream of the p4 process"
> to clarify whose standard input you are talking about (iow, "git p4"
> also has and it may use its standard input, but this function does
> not muck with it).
>

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v4 00/11] git-p4.py: Cast byte strings to unicode strings in python3
  2019-12-05 18:51           ` Denton Liu
@ 2019-12-05 20:47             ` Ben Keene
  0 siblings, 0 replies; 64+ messages in thread
From: Ben Keene @ 2019-12-05 20:47 UTC (permalink / raw)
  To: Denton Liu; +Cc: Luke Diamand, Git Users, Junio C Hamano


On 12/5/2019 1:51 PM, Denton Liu wrote:
> On Thu, Dec 05, 2019 at 11:16:27AM -0500, Ben Keene wrote:
>> On 12/5/2019 4:54 AM, Luke Diamand wrote:
>>> On Wed, 4 Dec 2019 at 22:29, Ben Keene via GitGitGadget
>>> - There are some bits of trailing whitespace around - can you strip
>>> those out? You can use "git diff --check".
>>
>> Is there a way that I can find out which branches I need to remove white
>> space from now that they have been committed?
> I'm assuming you mean commits? You can run
>
> 	git log --check master..
>
> and git will highlight the whitespace errors.
Yes, that's exactly what I meant.  Thank you.

^ permalink raw reply	[flat|nested] 64+ messages in thread

* [PATCH v5 00/15] git-p4.py: Cast byte strings to unicode strings in python3
  2019-12-04 22:29     ` [PATCH v4 00/11] git-p4.py: Cast byte strings to unicode strings in python3 Ben Keene via GitGitGadget
                         ` (11 preceding siblings ...)
  2019-12-05  9:54       ` [PATCH v4 00/11] git-p4.py: Cast byte strings to unicode strings in python3 Luke Diamand
@ 2019-12-07 17:47       ` " Ben Keene via GitGitGadget
  2019-12-07 17:47         ` [PATCH v5 01/15] t/gitweb-lib.sh: drop confusing quotes Jeff King via GitGitGadget
                           ` (15 more replies)
  12 siblings, 16 replies; 64+ messages in thread
From: Ben Keene via GitGitGadget @ 2019-12-07 17:47 UTC (permalink / raw)
  To: git; +Cc: Ben Keene, Junio C Hamano

Issue: The current git-p4.py script does not work with python3.

I have attempted to use the P4 integration built into GIT and I was unable
to get the program to run because I have Python 3.8 installed on my
computer. I was able to get the program to run when I downgraded my python
to version 2.7. However, python 2 is reaching its end of life.

Submission: I am submitting a patch for the git-p4.py script that partially 
supports python 3.8. This code was able to pass the basic tests (t9800) when
run against Python3. This provides basic functionality. 

In an attempt to pass the t9822 P4 path-encoding test, a new parameter for
git P4 Clone was introduced. 

--encoding Format-identifier

This will create the GIT repository following the current functionality;
however, before importing the files from P4, it will set the
git-p4.pathEncoding option so any files or paths that are encoded with
non-ASCII/non-UTF-8 formats will import correctly.

Technical details: The script was updated by futurize (
https://python-future.org/futurize.html) to support Py2/Py3 syntax. The few
references to classes in future were reworked so that future would not be
required. The existing code test for Unicode support was extended to
normalize the classes “unicode” and “bytes” to across platforms:

 * ‘unicode’ is an alias for ‘str’ in Py3 and is the unicode class in Py2.
 * ‘bytes’ is bytes in Py3 and an alias for ‘str’ in Py2.

New coercion methods were written for both Python2 and Python3:

 * as_string(text) – In Python3, this encodes a bytes object as a UTF-8
   encoded Unicode string. 
 * as_bytes(text) – In Python3, this decodes a Unicode string to an array of
   bytes.

In Python2, these functions do not change the data since a ‘str’ object
function in both roles as strings and byte arrays. This reduces the
potential impact on backward compatibility with Python 2.

 * to_unicode(text) – ensures that the supplied data is encoded as a UTF-8
   string. This function will encode data in both Python2 and Python3. * 
      path_as_string(path) – This function is an extension function that
      honors the option “git-p4.pathEncoding” to convert a set of bytes or
      characters to UTF-8. If the str/bytes cannot decode as ASCII, it will
      use the encodeWithUTF8() method to convert the custom encoded bytes to
      Unicode in UTF-8.
   
   

Generally speaking, information in the script is converted to Unicode as
early as possible and converted back to a byte array just before passing to
external programs or files. The exception to this rule is P4 Repository file
paths.

Paths are not converted but left as “bytes” so the original file path
encoding can be preserved. This formatting is required for commands that
interact with the P4 file path. When the file path is used by GIT, it is
converted with encodeWithUTF8().

Signed-off-by: Ben Keene seraphire@gmail.com [seraphire@gmail.com]

Ben Keene (13):
  git-p4: select P4 binary by operating-system
  git-p4: change the expansion test from basestring to list
  git-p4: promote encodeWithUTF8() to a global function
  git-p4: remove p4_write_pipe() and write_pipe() return values
  git-p4: add new support function gitConfigSet()
  git-p4: add casting helper functions for python 3 conversion
  git-p4: python 3 syntax changes
  git-p4: fix assumed path separators to be more Windows friendly
  git-p4: add Py23File() - helper class for stream writing
  git-p4: p4CmdList - support Unicode encoding
  git-p4: support Python 3 for basic P4 clone, sync, and submit (t9800)
  git-p4: added --encoding parameter to p4 clone
  git-p4: Add depot manipulation functions

Jeff King (2):
  t/gitweb-lib.sh: drop confusing quotes
  t/gitweb-lib.sh: set $REQUEST_URI

 Documentation/git-p4.txt        |   5 +
 git-p4.py                       | 768 +++++++++++++++++++++++++-------
 t/t9822-git-p4-path-encoding.sh | 101 +++++
 3 files changed, 706 insertions(+), 168 deletions(-)


base-commit: 083378cc35c4dbcc607e4cdd24a5fca440163d17
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-463%2Fseraphire%2Fseraphire%2Fp4-python3-unicode-v5
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-463/seraphire/seraphire/p4-python3-unicode-v5
Pull-Request: https://github.com/gitgitgadget/git/pull/463

Range-diff vs v4:

  -:  ---------- >  1:  bb7f8f0a0a t/gitweb-lib.sh: drop confusing quotes
  -:  ---------- >  2:  a7a4c5a2aa t/gitweb-lib.sh: set $REQUEST_URI
  1:  4012426993 !  3:  e425ccc10f git-p4: select p4 binary by operating-system
     @@ -1,19 +1,18 @@
      Author: Ben Keene <seraphire@gmail.com>
      
     -    git-p4: select p4 binary by operating-system
     -
     -    Depending on the version of GIT and Python installed, the perforce program (p4) may not resolve on Windows without the program extension.
     -
     -    Check the operating system (platform.system) and if it is reporting that it is Windows, use the full filename of "p4.exe" instead of "p4"
     +    git-p4: select P4 binary by operating-system
      
          The original code unconditionally used "p4" as the binary filename.
      
     -    This change is Python2 and Python3 compatible.
     +    Depending on the version of Git and Python installed, the perforce
     +    program (p4) may not resolve on Windows without the program extension.
     +
     +    Check the operating system (platform.system) and if it is reporting that
     +    it is Windows, use the full filename of "p4.exe" instead of "p4"
      
     -    Thanks to: Junio C Hamano <gitster@pobox.com> and  Denton Liu <liu.denton@gmail.com> for patiently explaining proper format for my submissions.
     +    This change is Python 2 and Python 3 compatible.
      
          Signed-off-by: Ben Keene <seraphire@gmail.com>
     -    (cherry picked from commit 9a3a5c4e6d29dbef670072a9605c7a82b3729434)
      
       diff --git a/git-p4.py b/git-p4.py
       --- a/git-p4.py
     @@ -23,9 +22,8 @@
           can be done more easily.
           """
      -    real_cmd = ["p4"]
     -+    # Look for the P4 binary
      +    if (platform.system() == "Windows"):
     -+        real_cmd = ["p4.exe"]    
     ++        real_cmd = ["p4.exe"]
      +    else:
      +        real_cmd = ["p4"]
       
  2:  0ef2f56b04 !  4:  7170aface2 git-p4: change the expansion test from basestring to list
     @@ -2,14 +2,20 @@
      
          git-p4: change the expansion test from basestring to list
      
     -    Python 3+ handles strings differently than Python 2.7.  Since Python 2 is reaching it's end of life, a series of changes are being submitted to enable python 3.7+ support. The current code fails basic tests under python 3.7.
     +    Python 3 handles strings differently than Python 2.7.  Since Python 2
     +    is reaching it's end of life, a series of changes are being submitted to
     +    enable python 3.5 and following support. The current code fails basic
     +    tests under python 3.5.
      
     -    Change references to basestring in the isinstance tests to use list instead. This prepares the code to remove all references to basestring.
     +    The original code used 'basestring' in a test to determine if a list or
     +    literal string was passed into 9 different functions.  This is used to
     +    determine if the shell should be invoked when calling subprocess
     +    methods.
      
     -    The original code used basestring in a test to determine if a list or literal string was passed into 9 different functions.  This is used to determine if the shell should be evoked when calling subprocess methods.
     +    Change references to 'basestring' in the isinstance tests to use 'list'
     +    instead. This prepares the code to remove all references to basestring.
      
          Signed-off-by: Ben Keene <seraphire@gmail.com>
     -    (cherry picked from commit 5b1b1c145479b5d5fd242122737a3134890409e6)
      
       diff --git a/git-p4.py b/git-p4.py
       --- a/git-p4.py
  5:  1bf7b073b0 !  5:  11d7703e41 git-p4: Add new functions in preparation of usage
     @@ -1,55 +1,53 @@
      Author: Ben Keene <seraphire@gmail.com>
      
     -    git-p4: Add new functions in preparation of usage
     +    git-p4: promote encodeWithUTF8() to a global function
      
     -    This changelist is an intermediate submission for migrating the P4 support from Python2 to Python3. The code needs access to the encodeWithUTF8() for support of non-UTF8 filenames in the clone class as well as the sync class.
     +    This changelist is an intermediate submission for migrating the P4
     +    support from Python 2 to Python 3. The code needs access to the
     +    encodeWithUTF8() for support of non-UTF8 filenames in the clone class as
     +    well as the sync class.
      
     -    Move the function encodeWithUTF8() from the P4Sync class to a stand-alone function.  This will allow other classes to use this function without instanciating the P4Sync class. Change the self.verbose reference to an optional method parameter. Update the existing references to this function to pass the self.verbose since it is no longer available on "self" since the function is no longer contained on the P4Sync class.
     -
     -    Modify the functions write_pipe() and p4_write_pipe() to remove the return value.  The return value for both functions is the number of bytes, but the meaning is lost under python3 since the count does not match the number of characters that may have been encoded.  Additionally, the return value was never used, so this is removed to avoid future ambiguity.
     -
     -    Add a new method gitConfigSet(). This method will set a value in the git configuration cache list.
     +    Move the function encodeWithUTF8() from the P4Sync class to a
     +    stand-alone function.  This will allow other classes to use this
     +    function without instanciating the P4Sync class. Change the self.verbose
     +    reference to an optional method parameter. Update the existing
     +    references to this function to pass the self.verbose since it is no
     +    longer available on "self" since the function is no longer contained on
     +    the P4Sync class.
      
          Signed-off-by: Ben Keene <seraphire@gmail.com>
     -    (cherry picked from commit affe888f432bb6833df78962e8671fccdf76c47a)
      
       diff --git a/git-p4.py b/git-p4.py
       --- a/git-p4.py
       +++ b/git-p4.py
      @@
     -         sys.exit(1)
     - 
     - def write_pipe(c, stdin):
     -+    """ Executes the command 'c', passing 'stdin' on the standard input
     -+    """
     -     if verbose:
     -         sys.stderr.write('Writing pipe: %s\n' % str(c))
     + import ctypes
     + import errno
       
     +-# support basestring in python3
     ++# support basestring in Python 3
     + try:
     +     unicode = unicode
     + except NameError:
      @@
     -     if p.wait():
     -         die('Command failed: %s' % str(c))
     - 
     --    return val
     - 
     - def p4_write_pipe(c, stdin):
     -+    """ Runs a P4 command 'c', passing 'stdin' data to P4
     -+    """
     -     real_cmd = p4_build_cmd(c)
     --    return write_pipe(real_cmd, stdin)
     -+    write_pipe(real_cmd, stdin)
     - 
     - def read_pipe_full(c):
     -     """ Read output from  command. Returns a tuple
     + try:
     +     from subprocess import CalledProcessError
     + except ImportError:
     +-    # from python2.7:subprocess.py
     ++    # from Python 2.7:subprocess.py
     +     # Exception classes used by this module.
     +     class CalledProcessError(Exception):
     +         """This exception is raised when a process run by check_call() returns
      @@
           # otherwise False.
           return mode[-3:] == "755"
       
     -+def encodeWithUTF8(path, verbose = False):
     ++def encodeWithUTF8(path, verbose=False):
      +    """ Ensure that the path is encoded as a UTF-8 string
      +
      +        Returns bytes(P3)/str(P2)
      +    """
     -+   
     ++
      +    if isunicode:
      +        try:
      +            if isinstance(path, unicode):
     @@ -64,7 +62,7 @@
      +            path = path.decode(encoding, 'replace').encode('utf8', 'replace')
      +            if verbose:
      +                print('\nNOTE:Path with non-ASCII characters detected. Used %s to encode: %s ' % (encoding, to_unicode(path)))
     -+    else:    
     ++    else:
      +        try:
      +            path.decode('ascii')
      +        except:
     @@ -79,18 +77,6 @@
       class P4Exception(Exception):
           """ Base class for exceptions from the p4 client """
           def __init__(self, exit_code):
     -@@
     -             _gitConfig[key] = []
     -     return _gitConfig[key]
     - 
     -+def gitConfigSet(key, value):
     -+    """ Set the git configuration key 'key' to 'value' for this session
     -+    """
     -+    _gitConfig[key] = value
     -+
     - def p4BranchesInGit(branchesAreInRemotes=True):
     -     """Find all the branches whose names start with "p4/", looking
     -        in remotes or heads as specified by the argument.  Return
      @@
                   self.gitStream.write(d)
               self.gitStream.write('\n')
  9:  4fc49313f0 !  6:  e28fe095b4 git-p4: Add usability enhancements
     @@ -1,75 +1,44 @@
      Author: Ben Keene <seraphire@gmail.com>
      
     -    git-p4: Add usability enhancements
     +    git-p4: remove p4_write_pipe() and write_pipe() return values
      
     -    Issue: when prompting the user with raw_input, the tests are not forgiving of user input.  For example, on the first query asks for a yes/no response. If the user enters the full word "yes" or "no" the test will fail. Additionally, offer the suggestion of setting git-p4.attemptRCSCleanup when applying a commit fails because of RCS keywords. Both of these changes are usability enhancement suggestions.
     +    The git-p4 functions write_pipe() and p4_write_pipe() originally
     +    return the number of bytes returned from the system call. However,
     +    this is a misleading value when this function is run by Python 3.
      
     -    Change the code prompting the user for input to sanitize the user input before checking the response by asking the response as a lower case string, trimming leading/trailing spaces, and returning the first character.
     -
     -    Change the applyCommit() method that when applying a commit fails becasue of the P4 RCS Keywords, the user should consider setting git-p4.attemptRCSCleanup.
     +    Modify the functions write_pipe() and p4_write_pipe() to remove the
     +    return value.  The return value for both functions is the number of
     +    bytes, but the meaning is lost under python3 since the count does not
     +    match the number of characters that may have been encoded.
     +    Additionally, the return value was never used, so this is removed to
     +    avoid future ambiguity.
      
          Signed-off-by: Ben Keene <seraphire@gmail.com>
     -    (cherry picked from commit 1fab571664f5b6ad4ef321199f52615a32a9f8c7)
      
       diff --git a/git-p4.py b/git-p4.py
       --- a/git-p4.py
       +++ b/git-p4.py
      @@
     -             return True
     +         sys.exit(1)
       
     -         while True:
     --            response = raw_input("Submit template unchanged. Submit anyway? [y]es, [n]o (skip this patch) ")
     -+            response = raw_input("Submit template unchanged. Submit anyway? [y]es, [n]o (skip this patch) ").lower() \
     -+                .strip()[0]
     -             if response == 'y':
     -                 return True
     -             if response == 'n':
     -@@
     -                     # disable the read-only bit on windows.
     -                     if self.isWindows and file not in editedFiles:
     -                         os.chmod(file, stat.S_IWRITE)
     --                    self.patchRCSKeywords(file, kwfiles[file])
     --                    fixed_rcs_keywords = True
     -+                    
     -+                    try:
     -+                        self.patchRCSKeywords(file, kwfiles[file])
     -+                        fixed_rcs_keywords = True
     -+                    except:
     -+                        # We are throwing an exception, undo all open edits
     -+                        for f in editedFiles:
     -+                            p4_revert(f)
     -+                        raise
     -+            else:
     -+                # They do not have attemptRCSCleanup set, this might be the fail point
     -+                # Check to see if the file has RCS keywords and suggest setting the property.
     -+                for file in editedFiles | filesToDelete:
     -+                    if p4_keywords_regexp_for_file(file) != None:
     -+                        print("At least one file in this commit has RCS Keywords that may be causing problems. ")
     -+                        print("Consider:\ngit config git-p4.attemptRCSCleanup true")
     -+                        break
     + def write_pipe(c, stdin):
     ++    """ Executes the command 'c', passing 'stdin' on the standard input
     ++    """
     +     if verbose:
     +         sys.stderr.write('Writing pipe: %s\n' % str(c))
       
     -             if fixed_rcs_keywords:
     -                 print("Retrying the patch with RCS keywords cleaned up")
     -@@
     -                         if self.conflict_behavior == "ask":
     -                             print("What do you want to do?")
     -                             response = raw_input("[s]kip this commit but apply"
     --                                                 " the rest, or [q]uit? ")
     -+                                                 " the rest, or [q]uit? ").lower().strip()[0]
     -                             if not response:
     -                                 continue
     -                         elif self.conflict_behavior == "skip":
      @@
     -                                    description = cmd.description,
     -                                    formatter = HelpFormatter())
     +     if p.wait():
     +         die('Command failed: %s' % str(c))
     + 
     +-    return val
     + 
     + def p4_write_pipe(c, stdin):
     ++    """ Runs a P4 command 'c', passing 'stdin' data to P4
     ++    """
     +     real_cmd = p4_build_cmd(c)
     +-    return write_pipe(real_cmd, stdin)
     ++    write_pipe(real_cmd, stdin)
       
     --    (cmd, args) = parser.parse_args(sys.argv[2:], cmd);
     -+    try:
     -+        (cmd, args) = parser.parse_args(sys.argv[2:], cmd);
     -+    except:
     -+        parser.print_help()
     -+        raise
     -+
     -     global verbose
     -     verbose = cmd.verbose
     -     if cmd.needsGit:
     + def read_pipe_full(c):
     +     """ Read output from  command. Returns a tuple
  -:  ---------- >  7:  bc7009541b git-p4: add new support function gitConfigSet()
  3:  f0e658b984 !  8:  1e677781d2 git-p4: add new helper functions for python3 conversion
     @@ -1,31 +1,54 @@
      Author: Ben Keene <seraphire@gmail.com>
      
     -    git-p4: add new helper functions for python3 conversion
     +    git-p4: add casting helper functions for python 3 conversion
      
     -    Python 3+ handles strings differently than Python 2.7.  Since Python 2 is reaching it's end of life, a series of changes are being submitted to enable python 3.7+ support. The current code fails basic tests under python 3.7.
     +    Python 3 handles strings differently than Python 2.7.  Since Python 2
     +    is reaching it's end of life, a series of changes are being submitted to
     +    enable python 3.5 and following support. The current code fails basic
     +    tests under python 3.5.
      
     -    Change the existing unicode test add new support functions for python2-python3 support.
     +    Change the existing unicode test add new support functions for
     +    Python 2 - Python 3 support.
      
          Define the following variables:
     -    - isunicode - a boolean variable that states if the version of python natively supports unicode (true) or not (false). This is true for Python3 and false for Python2.
     -    - unicode - a type alias for the datatype that holds a unicode string.  It is assigned to a str under python 3 and the unicode type for Python2.
     -    - bytes - a type alias for an array of bytes.  It is assigned the native bytes type for Python3 and str for Python2.
     +    - isunicode - a boolean variable that states if the version of python
     +                  natively supports unicode (true) or not (false). This is
     +                  true for Python 3 and false for Python 2.
     +    - unicode   - a type alias for the datatype that holds a unicode string.
     +                  It is assigned to a str under Python 3 and the unicode
     +                  type for Python 2.
     +    - bytes     - a type alias for an array of bytes.  It is assigned the
     +                  native bytes type for Python 3 and str for Python 2.
      
          Add the following new functions:
      
     -    - as_string(text) - A new function that will convert a byte array to a unicode (UTF-8) string under python 3.  Under python 2, this returns the string unchanged.
     -    - as_bytes(text) - A new function that will convert a unicode string to a byte array under python 3.  Under python 2, this returns the string unchanged.
     -    - to_unicode(text) - Converts a text string as Unicode(UTF-8) on both Python2 and Python3.
     +    - as_string(text)  - A new function that will convert a byte array to a
     +                         unicode (UTF-8) string under Python 3.  Under
     +                         Python 2, this returns the string unchanged.
     +    - as_bytes(text)   - A new function that will convert a unicode string
     +                         to a byte array under Python 3.  Under Python 2,
     +                         this returns the string unchanged.
     +    - to_unicode(text) - Converts a text string as Unicode(UTF-8) on both
     +                         Python 2 and Python 3.
      
          Add a new function alias raw_input:
     -    If raw_input does not exist (it was renamed to input in python 3) alias input as raw_input.
     +    If raw_input does not exist (it was renamed to input in Python 3) alias
     +    input as raw_input.
      
     -    The AS_STRING and AS_BYTES functions allow for modifying the code with a minimal amount of impact on Python2 support.  When a string is expected, the as_string() will be used to convert "cast" the incoming "bytes" to a string type. Conversely as_bytes() will be used to convert a "string" to a "byte array" type. Since Python2 overloads the datatype 'str' to serve both purposes, the Python2 versions of these function do not change the data, since the str functions as both a byte array and a string.
     +    The as_string() and as_bytes() functions allow for modifying the code
     +    with a minimal amount of impact on Python 2 support. When a string is
     +    expected, the as_string() will be used to "cast" the incoming "bytes"
     +    to a string type.
      
     -    basestring is removed since its only references are found in tests that were changed in the previous change list.
     +    Conversely as_bytes() will be used to cast a "string" to a "byte array"
     +    type. Since Python 2 overloads the datatype 'str' to serve both purposes,
     +    the Python 2 versions of these function do not change the data. This
     +    reduces the regression impact of these code changes.
     +
     +    'basestring' is removed since its only references are found in tests
     +    that were changed in modified in previous commits.
      
          Signed-off-by: Ben Keene <seraphire@gmail.com>
     -    (cherry picked from commit 7921aeb3136b07643c1a503c2d9d8b5ada620356)
      
       diff --git a/git-p4.py b/git-p4.py
       --- a/git-p4.py
     @@ -36,7 +59,7 @@
           # 'unicode' is undefined, must be Python 3
      -    str = str
      +    #
     -+    # For Python3 which is natively unicode, we will use 
     ++    # For Python 3 which is natively unicode, we will use
      +    # unicode for internal information but all P4 Data
      +    # will remain in bytes
      +    isunicode = True
     @@ -45,8 +68,9 @@
      -    basestring = (str,bytes)
      +
      +    def as_string(text):
     -+        """Return a byte array as a unicode string"""
     -+        if text == None:
     ++        """ Return a byte array as a unicode string
     ++        """
     ++        if text is None:
      +            return None
      +        if isinstance(text, bytes):
      +            return unicode(text, "utf-8")
     @@ -54,8 +78,9 @@
      +            return text
      +
      +    def as_bytes(text):
     -+        """Return a Unicode string as a byte array"""
     -+        if text == None:
     ++        """ Return a Unicode string as a byte array
     ++        """
     ++        if text is None:
      +            return None
      +        if isinstance(text, bytes):
      +            return text
     @@ -63,15 +88,17 @@
      +            return bytes(text, "utf-8")
      +
      +    def to_unicode(text):
     -+        """Return a byte array as a unicode string"""
     -+        return as_string(text)    
     ++        """ Return a byte array as a unicode string
     ++        """
     ++        return as_string(text)
      +
      +    def path_as_string(path):
     -+        """ Converts a path to the UTF8 encoded string """
     ++        """ Converts a path to the UTF8 encoded string
     ++        """
      +        if isinstance(path, unicode):
      +            return path
      +        return encodeWithUTF8(path).decode('utf-8')
     -+    
     ++
       else:
           # 'unicode' exists, must be Python 2
      -    str = str
     @@ -79,7 +106,7 @@
      +    # We will treat the data as:
      +    #   str   -> str
      +    #   bytes -> str
     -+    # So for Python2 these functions are no-ops
     ++    # So for Python 2 these functions are no-ops
      +    # and will leave the data in the ambiguious
      +    # string/bytes state
      +    isunicode = False
     @@ -88,23 +115,25 @@
      -    basestring = basestring
      +
      +    def as_string(text):
     -+        """ Return text unaltered (for Python3 support) """
     ++        """ Return text unaltered (for Python 3 support)
     ++        """
      +        return text
      +
      +    def as_bytes(text):
     -+        """ Return text unaltered (for Python3 support) """
     ++        """ Return text unaltered (for Python 3 support)
     ++        """
      +        return text
      +
      +    def to_unicode(text):
     -+        """Return a string as a unicode string"""
     ++        """ Return a string as a unicode string
     ++        """
      +        return text.decode('utf-8')
     -+    
     ++
      +    def path_as_string(path):
     -+        """ Converts a path to the UTF8 encoded bytes """
     ++        """ Converts a path to the UTF8 encoded bytes
     ++        """
      +        return encodeWithUTF8(path)
      +
     -+
     -+ 
      +# Check for raw_input support
      +try:
      +    raw_input
     @@ -113,3 +142,21 @@
       
       try:
           from subprocess import CalledProcessError
     +@@
     +             if data[:space] == depotPath:
     +                 output = entry
     +                 break
     +-    if output == None:
     ++    if output is None:
     +         return ""
     +     if output["code"] == "error":
     +         return ""
     +@@
     +     global verbose
     +     verbose = cmd.verbose
     +     if cmd.needsGit:
     +-        if cmd.gitdir == None:
     ++        if cmd.gitdir is None:
     +             cmd.gitdir = os.path.abspath(".git")
     +             if not isValidGitDir(cmd.gitdir):
     +                 # "rev-parse --git-dir" without arguments will try $PWD/.git
  4:  3c41db3e91 !  9:  a221eb8bb6 git-p4: python3 syntax changes
     @@ -1,23 +1,33 @@
      Author: Ben Keene <seraphire@gmail.com>
      
     -    git-p4: python3 syntax changes
     +    git-p4: python 3 syntax changes
      
     -    Python 3+ handles strings differently than Python 2.7.  Since Python 2 is reaching it's end of life, a series of changes are being submitted to enable python 3.7+ support. The current code fails basic tests under python 3.7.
     +    Python 3 handles strings differently than Python 2.7.  Since Python 2
     +    is reaching it's end of life, a series of changes are being submitted to
     +    enable python 3.5 and following support. The current code fails basic
     +    tests under python 3.5.
      
     -    There are a number of translations suggested by modernize/futureize that should be taken to fix numerous non-string specific issues.
     +    There are a number of translations suggested by modernize/futureize that
     +    should be taken to fix numerous non-string specific issues.
      
     -    Change references to the X.next() iterator to the function next(X) which is compatible with both Python2 and Python3.
     +    Change references to the X.next() iterator to the function next(X) which
     +    is compatible with both Python2 and Python3.
      
     -    Change references to X.keys() to list(X.keys()) to return a list that can be iterated in both Python2 and Python3.
     +    Change references to X.keys() to list(X.keys()) to return a list that
     +    can be iterated in both Python2 and Python3.
      
     -    Add the literal text (object) to the end of class definitions to be consistent with Python3 class definition.
     +    Add the literal text (object) to the end of class definitions to be
     +    consistent with Python3 class definition.
      
     -    Change integer divison to use "//" instead of "/"  Under Both python2 and python3 // will return a floor()ed result which matches existing functionality.
     +    Change integer divison to use "//" instead of "/"  Under Both Python 2
     +    and Python 3 // will return a floor()ed result which matches existing
     +    functionality.
      
     -    Change the format string for displaying decimal values from %d to %4.1f% when displaying a progress.  This avoids displaying long repeating decimals in user displayed text.
     +    Change the format string for displaying decimal values from %d to %4.1f%
     +    when displaying a progress.  This avoids displaying long repeating
     +    decimals in user displayed text.
      
          Signed-off-by: Ben Keene <seraphire@gmail.com>
     -    (cherry picked from commit bde6b83296aa9b3e7a584c5ce2b571c7287d8f9f)
      
       diff --git a/git-p4.py b/git-p4.py
       --- a/git-p4.py
     @@ -30,7 +40,7 @@
      +import codecs
      +import io
       
     - # support basestring in python3
     + # support basestring in Python 3
       try:
      @@
       
  6:  8f5752c127 ! 10:  b962cce8cd git-p4: Fix assumed path separators to be more Windows friendly
     @@ -1,19 +1,30 @@
      Author: Ben Keene <seraphire@gmail.com>
      
     -    git-p4: Fix assumed path separators to be more Windows friendly
     +    git-p4: fix assumed path separators to be more Windows friendly
      
     -    When a computer is configured to use Git for windows and Python for windows, and not a Unix subsystem like cygwin or WSL, the directory separator changes and causes git-p4 to fail to properly determine paths.
     +    When a computer is configured to use Git for windows and Python for
     +    windows, and not a Unix subsystem like cygwin or WSL, the directory
     +    separator changes and causes git-p4 to fail to properly determine paths.
      
          Fix 3 path separator errors:
      
     -    1. getUserCacheFilename should not use string concatenation. Change this code to use os.path.join to build an OS tolerant path.
     -    2. defaultDestiantion used the OS.path.split to split depot paths.  This is incorrect on windows. Change the code to split on a forward slash(/) instead since depot paths use this character regardless  of the operating system.
     -    3. The call to isvalidGitDir() in the main code also used a literal forward slash. Change the cose to use os.path.join to correctly format the path for the operating system.
     +    1. getUserCacheFilename() - should not use string concatenation. Change
     +       this code to use os.path.join to build an OS tolerant path.
      
     -    These three changes allow the suggested windows configuration to properly locate files while retaining the existing behavior on non-windows operating systems.
     +    2. defaultDestiantion used the OS.path.split to split depot paths.  This
     +       is incorrect on windows. Change the code to split on a forward
     +       slash(/) instead since depot paths use this character regardless  of
     +       the operating system.
     +
     +    3. The call to isValidGitDir() in the main code also used a literal
     +       forward slash. Change the code to use os.path.join to correctly
     +       format the path for the operating system.
     +
     +    These three changes allow the suggested windows configuration to
     +    properly locate files while retaining the existing behavior on
     +    non-windows operating systems.
      
          Signed-off-by: Ben Keene <seraphire@gmail.com>
     -    (cherry picked from commit a5b45c12c3861638a933b05a1ffee0c83978dcb2)
      
       diff --git a/git-p4.py b/git-p4.py
       --- a/git-p4.py
     @@ -22,8 +33,8 @@
                   return True
       
           def getUserCacheFilename(self):
     -+        """ Returns the filename of the username cache 
     -+	    """
     ++        """ Returns the filename of the username cache
     ++        """
               home = os.environ.get("HOME", os.environ.get("USERPROFILE"))
      -        return home + "/.gitp4-usercache.txt"
      +        return os.path.join(home, ".gitp4-usercache.txt")
     @@ -34,7 +45,7 @@
               self.cloneBare = False
       
           def defaultDestination(self, args):
     -+        """ Returns the last path component as the default git 
     ++        """ Returns the last path component as the default git
      +            repository directory name
      +        """
               ## TODO: use common prefix of args?
  7:  10dc059444 ! 11:  d22ada1614 git-p4: Add a helper class for stream writing
     @@ -1,25 +1,43 @@
      Author: Ben Keene <seraphire@gmail.com>
      
     -    git-p4: Add a helper class for stream writing
     +    git-p4: add Py23File() - helper class for stream writing
      
     -    This is a transtional commit that does not change current behvior.  It adds a new class Py23File.
     +    This is a preparatory commit that does not change current behavior.
     +    It adds a new class Py23File.
      
     -    Following the Python recommendation of keeping text as unicode internally and only converting to and from bytes on input and output, this class provides an interface for the methods used for reading and writing files and file like streams.
     +    Following the Python recommendation of keeping text as unicode
     +    internally and only converting to and from bytes on input and output,
     +    this class provides an interface for the methods used for reading and
     +    writing files and file like streams.
      
     -    Create a class that wraps the input and output functions used by the git-p4.py code for reading and writing to standard file handles.
     +    A new class was implemented to avoid requiring additional dependencies.
      
     -    The methods of this class should take a Unicode string for writing and return unicode strings in reads.  This class should be a drop-in for existing file like streams
     +    Create a class that wraps the input and output functions used by the
     +    git-p4.py code for reading and writing to standard file handles.
      
     -    The following methods should be coded for supporting existing read/write calls:
     -    * write - this should write a Unicode string to the underlying stream
     -    * read - this should read from the underlying stream and cast the bytes as a unicode string
     -    * readline - this should read one line of text from the underlying stream and cast it as a unicode string
     -    * readline - this should read a number of lines, optionally hinted, and cast each line as a unicode string
     +    The methods of this class should take a Unicode string for writing and
     +    return unicode strings in reads.  This class should be a drop-in for
     +    existing file like streams
      
     -    The expression "cast as a unicode string" is used because the code should use the AS_BYTES() and AS_UNICODE() functions instead of cohercing the data to actual unicode strings or bytes.  This allows python 2 code to continue to use the internal "str" data type instead of converting the data back and forth to actual unicode strings. This retains current python2 support while python3 support may be incomplete.
     +    The following methods should be coded for supporting existing read/write
     +    calls:
     +      * write - this should write a Unicode string to the underlying stream
     +      * read  - this should read from the underlying stream and cast the
     +                bytes as a unicode string
     +      * readline - this should read one line of text from the underlying
     +                stream and cast it as a unicode string
     +      * readline - this should read a number of lines, optionally hinted,
     +                and cast each line as a unicode string
     +
     +    The expression "cast as a unicode string" is used because the code
     +    should use the as_bytes() and as_string() functions instead of
     +    cohercing the data to actual unicode strings or bytes.  This allows
     +    Python 2 code to continue to use the internal "str" data type instead
     +    of converting the data back and forth to actual unicode strings. This
     +    retains current Python 2 support while Python 3 support may be
     +    incomplete.
      
          Signed-off-by: Ben Keene <seraphire@gmail.com>
     -    (cherry picked from commit 12919111fbaa3e4c0c4c2fdd4f79744cc683d860)
      
       diff --git a/git-p4.py b/git-p4.py
       --- a/git-p4.py
     @@ -29,13 +47,13 @@
               return True
       
      +class Py23File():
     -+    """ Python2/3 Unicode File Wrapper 
     ++    """ Python2/3 Unicode File Wrapper
      +    """
     -+    
     ++
      +    stream_handle = None
      +    verbose       = False
      +    debug_handle  = None
     -+   
     ++
      +    def __init__(self, stream_handle, verbose = False):
      +        """ Create a Python3 compliant Unicode to Byte String
      +            Windows compatible wrapper
     @@ -47,7 +65,7 @@
      +        self.verbose       = verbose
      +
      +    def write(self, utf8string):
     -+        """ Writes the utf8 encoded string to the underlying 
     ++        """ Writes the utf8 encoded string to the underlying
      +            file stream
      +        """
      +        self.stream_handle.write(as_bytes(utf8string))
     @@ -56,7 +74,7 @@
      +            sys.stderr.flush()
      +
      +    def read(self, size = None):
     -+        """ Reads int charcters from the underlying stream 
     ++        """ Reads int charcters from the underlying stream
      +            and converts it to utf8.
      +
      +            Be aware, the size value is for reading the underlying
     @@ -69,7 +87,7 @@
      +            return as_string(self.stream_handle.read(size))
      +
      +    def readline(self):
     -+        """ Reads a line from the underlying byte stream 
     ++        """ Reads a line from the underlying byte stream
      +            and converts it to utf8
      +        """
      +        return as_string(self.stream_handle.readline())
     @@ -77,8 +95,8 @@
      +    def readlines(self, sizeHint = None):
      +        """ Returns a list containing lines from the file converted to unicode.
      +
     -+            sizehint - Optional. If the optional sizehint argument is 
     -+            present, instead of reading up to EOF, whole lines totalling 
     ++            sizehint - Optional. If the optional sizehint argument is
     ++            present, instead of reading up to EOF, whole lines totalling
      +            approximately sizehint bytes are read.
      +        """
      +        lines = self.stream_handle.readlines(sizeHint)
  8:  e1a424a955 ! 12:  e97ac0af8a git-p4: p4CmdList  - support Unicode encoding
     @@ -1,19 +1,55 @@
      Author: Ben Keene <seraphire@gmail.com>
      
     -    git-p4: p4CmdList  - support Unicode encoding
     +    git-p4: p4CmdList - support Unicode encoding
      
     -    The p4CmdList is a commonly used function in the git-p4 code. It is used to execute a command in P4 and return the results of the call in a list.
     +    The p4CmdList is a commonly used function in the git-p4 code. It is used
     +    to execute a command in P4 and return the results of the call in a list.
      
     -    Change this code to take a new optional parameter, encode_data that will optionally convert the data AS_STRING() that isto be returned by the function.
     +    The problem is that p4CmdList takes bytes as the parameter data and
     +    returns bytes in the return list.
      
     -    Change the code so that the key will always be encoded AS_STRING()
     +    Add a new optional parameter to the signature, encode_cmd_output, that
     +    determines if the dictionary values returned in the function output are
     +    treated as bytes or as strings.
      
     -    Data that is passed for standard input (stdin) should be AS_BYTES() to ensure unicode text that is supplied will be written out as bytes.
     +    Change the code to conditionally pass the output data through the
     +    as_string() function when encode_cmd_output is true. Otherwise the
     +    function should return the data as bytes.
      
     -    Additionally, change literal text prior to conversion to be literal bytes.
     +    Change the code so that regardless of the setting of encode_cmd_output,
     +    the dictionary keys in the return value will always be encoded with
     +    as_string().
     +
     +    as_string(bytes) is a method defined in this project that treats the
     +    byte data as a string. The word "string" is used because the meaning
     +    varies depending on the version of Python:
     +
     +      - Python 2: The "bytes" are returned as "str", functionally a No-op.
     +      - Python 3: The "bytes" are returned as a Unicode string.
     +
     +    The p4CmdList function returns a list of dictionaries that contain
     +    the result of p4 command. If the callback (cb) is defined, the
     +    standard output of the p4 command is redirected.
     +
     +    Data that is passed to the standard input of the P4 process should be
     +    as_bytes() to avoid conversion unicode encoding errors.
     +
     +    as_bytes(text) is a method defined in this project that treats the text
     +    data as a string that should be converted to a byte array (bytes). The
     +    behavior of this function depends on the version of python:
     +
     +      - Python 2: The "text" is returned as "str", functionally a No-op.
     +      - Python 3: The "text" is treated as a UTF-8 encoded Unicode string
     +            and is decoded to bytes.
     +
     +    Additionally, change literal text prior to conversion to be literal
     +    bytes for the code that is evaluating the standard output from the
     +    p4 call.
     +
     +    Add encode_cmd_output to the p4Cmd since this is a helper function that
     +    wraps the behavior of p4CmdList.
      
          Signed-off-by: Ben Keene <seraphire@gmail.com>
     -    (cherry picked from commit 88306ac269186cbd0f6dc6cfd366b50b28ee4886)
      
       diff --git a/git-p4.py b/git-p4.py
       --- a/git-p4.py
     @@ -23,7 +59,7 @@
       
       def p4CmdList(cmd, stdin=None, stdin_mode='w+b', cb=None, skip_info=False,
      -        errors_as_exceptions=False):
     -+        errors_as_exceptions=False, encode_data=True):
     ++        errors_as_exceptions=False, encode_cmd_output=True):
      +    """ Executes a P4 command:  'cmd' optionally passing 'stdin' to the command's
      +        standard input via a temporary file with 'stdin_mode' mode.
      +
     @@ -37,7 +73,7 @@
      +        If 'errors_as_exceptions' is set to true (the default is false) the error
      +        code returned from the execution will generate an exception.
      +
     -+        If 'encode_data' is set to true (the default) the data that is returned 
     ++        If 'encode_cmd_output' is set to true (the default) the data that is returned
      +        by this function will be passed through the "as_string" function.
      +    """
       
     @@ -65,8 +101,38 @@
      -                result.append(entry)
      +                out = {}
      +                for key, value in entry.items():
     -+                    out[as_string(key)] = (as_string(value) if encode_data else value)
     ++                    out[as_string(key)] = (as_string(value) if encode_cmd_output else value)
      +                result.append(out)
           except EOFError:
               pass
           exitCode = p4.wait()
     +@@
     + 
     +     return result
     + 
     +-def p4Cmd(cmd):
     +-    list = p4CmdList(cmd)
     ++def p4Cmd(cmd, encode_cmd_output=True):
     ++    """Executes a P4 command and returns the results in a dictionary"""
     ++    list = p4CmdList(cmd, encode_cmd_output=encode_cmd_output)
     +     result = {}
     +     for entry in list:
     +         result.update(entry)
     +@@
     +     """Look at the p4 client spec, create a View() object that contains
     +        all the mappings, and return it."""
     + 
     +-    specList = p4CmdList("client -o")
     ++    specList = p4CmdList("client -o", encode_cmd_output=False)
     +     if len(specList) != 1:
     +         die('Output from "client -o" is %d lines, expecting 1' %
     +             len(specList))
     +@@
     +         if len(fileArgs) == 0:
     +             return  # All files in cache
     + 
     +-        where_result = p4CmdList(["-x", "-", "where"], stdin=fileArgs)
     ++        where_result = p4CmdList(["-x", "-", "where"], stdin=fileArgs, encode_cmd_output=False)
     +         for res in where_result:
     +             if "code" in res and res["code"] == "error":
     +                 # assume error is "... file(s) not in client view"
 10:  04a0aedbaa ! 13:  e7bb92bcd6 git-p4: Support python3 for basic P4 clone, sync, and submit
     @@ -1,54 +1,130 @@
      Author: Ben Keene <seraphire@gmail.com>
      
     -    git-p4: Support python3 for basic P4 clone, sync, and submit
     +    git-p4: support Python 3 for basic P4 clone, sync, and submit (t9800)
      
     -    Issue: Python 3 is still not properly supported for any use with the git-p4 python code.
     -    Warning - this is a very large atomic commit.  The commit text is also very large.
     +    NOTE: Python 3 is still not properly supported for any use with the
     +    git-p4 python code.
      
     -    Change the code such that, with the exception of P4 depot paths and depot files, all text read by git-p4 is cast as a string as soon as possible and converted back to bytes as late as possible, following Python2 to Python3 conversion best practices.
     +    Warning - this is a very large atomic commit.  The commit text is also
     +    very large.
      
     -    Important: Do not cast the bytes that contain the p4 depot path or p4 depot file name.  These should be left as bytes until used.
     +    Change the code such that, with the exception of P4 depot paths and
     +    depot files, all text read by git-p4 is cast as a string as soon as
     +    possible and converted back to bytes as late as possible, following
     +    Python 2 to Python 3 conversion best practices.
      
     -    These two values should not be converted because the encoding of these values is unknown.  git-p4 supports a configuration value git-p4.pathEncoding that is used by the encodeWithUTF8()  to determine what a UTF8 version of the path and filename should be.  However, since depot path and depot filename need to be sent to P4 in their original encoding, they will be left as byte streams until they are actually used:
     +    Important: Do not cast the bytes that contain the p4 depot path or p4
     +    depot file name.  These should be left as bytes until used.
      
     -    * When sent to P4, the bytes are literally passed to the p4 command
     -    * When displayed in text for the user, they should be passed through the path_as_string() function
     -    * When used by GIT they should be passed through the encodeWithUTF8() function
     +    These two values should not be converted because the encoding of these
     +    values is unknown.  git-p4 supports a configuration value
     +    git-p4.pathEncoding that is used by the encodeWithUTF8() to determine
     +    what a UTF8 version of the path and filename should be. However, since
     +    depot path and depot filename need to be sent to P4 in their original
     +    encoding, they will be left as byte streams until they are actually
     +    used:
      
     -    Change all the rest of system calls to cast output (stdin) as_bytes() and input (stdout) as_string().  This retains existing Python 2 support, and adds python 3 support for these functions:
     -    * read_pipe_full
     -    * read_pipe_lines
     -    * p4_has_move_command (used internally)
     -    * gitConfig
     -    * branch_exists
     -    * GitLFS.generatePointer
     -    * applyCommit - template must be read and written to the temporary file as_bytes() since it is created in memory as a string.
     -    * streamOneP4File(file, contents) - wrap calls to the depotFile in path_as_string() for display. The file contents must be retained as bytes, so update the RCS changes to be forced to bytes.
     -    * streamP4Files
     -    * importHeadRevision(revision) - encode the depotPaths for display separate from the text for processing.
     +      * When sent to P4, the bytes are literally passed to the p4 command
     +      * When displayed in text for the user, they should be passed through
     +        the path_as_string() function
     +      * When used by GIT they should be passed through the encodeWithUTF8()
     +        function
     +
     +    Change all the rest of system calls to cast output from system calls
     +    (stdin) as_bytes() and input (stdout) as_string().  This retains
     +    existing Python 2 support, and adds python 3 support for these
     +    functions:
     +
     +     * read_pipe_full(c)
     +     * read_pipe_lines(c)
     +     * p4_has_move_command() - used internally
     +     * gitConfig(key, typeSpecifier=None)
     +     * branch_exists(branch)
     +     * GitLFS.generatePointer(cloneDestination, contentFile)
     +     * P4Submit.applyCommit(id) - template must be read and written to the
     +           temporary file as_bytes() since it is created in memory as a
     +           string.
     +     * P4Sync.streamOneP4File(file, contents) - wrap calls to the depotFile
     +           in path_as_string() for display. The file contents must be
     +           retained as bytes, so update the RCS changes to be forced to
     +           bytes.
     +     * P4Sync.streamP4Files(marshalled)
     +     * P4Sync.importHeadRevision(revision) - encode the depotPaths for
     +           display separate from the text for processing.
      
          Py23File usage -
     -    Change the P4Sync.OpenStreams() function to cast the gitOutput, gitStream, and gitError streams as Py23File() wrapper classes.  This facilitates taking strings in both python 2 and python 3 and casting them to bytes in the wrapper class instead of having to modify each method. Since the fast-import command also expects a raw byte stream for file content, add a new stream handle - gitStreamBytes which is an unwrapped verison of gitStream.
     +
     +    Change the P4Sync.OpenStreams() function to cast the gitOutput,
     +    gitStream, and gitError streams as Py23File() wrapper classes.
     +    This facilitates taking strings in both python 2 and python 3 and
     +    casting them to bytes in the wrapper class instead of having to modify
     +    each method. Since the fast-import command also expects a raw byte
     +    stream for file content, add a new stream handle - gitStreamBytes which
     +    is an unwrapped verison of gitStream.
      
          Literal text -
     -    Depending on context, most literal text does not need casting to unicode or bytes as the text is Python dependent - In python 2, the string is implied as 'str' and python 3 the string is implied as 'unicode'. Under these conditions, they match the rest of the operating text, following best practices.  However, when a literal string is used in functions that are dealing with the raw input from and raw ouput to files streams, literal bytes may be required. Additionally, functions that are dealing with P4 depot paths or P4 depot file names are also dealing with bytes and will require the same casting as bytes.  The following functions cast text as byte strings:
     -    * wildcard_decode(path) - the path parameter is a P4 depot and is bytes. Cast all the literals to bytes.
     -    * wildcard_encode(path) - the path parameter is a P4 depot and is bytes. Cast all the literals to bytes.
     -    * streamP4FilesCb(marshalled) - the marshalled data is in bytes. Cast the literals as bytes. When using this data to manipulate self.stream_file, encode all the marshalled data except for the 'depotFile' name.
     -    * streamP4Files
     +    Depending on context, most literal text does not need casting to unicode
     +    or bytes as the text is Python dependent - In Python 2, the string is
     +    implied as 'str' and python 3 the string is implied as 'unicode'. Under
     +    these conditions, they match the rest of the operating text, following
     +    best practices.  However, when a literal string is used in functions
     +    that are dealing with the raw input from and raw ouput to files streams,
     +    literal bytes may be required. Additionally, functions that are dealing
     +    with P4 depot paths or P4 depot file names are also dealing with bytes
     +    and will require the same casting as bytes.  The following functions
     +    cast text as byte strings:
     +
     +     * wildcard_decode(path) - the path parameter is a P4 depot and is
     +           bytes. Cast all the literals to bytes.
     +     * wildcard_encode(path) - the path parameter is a P4 depot and is
     +           bytes. Cast all the literals to bytes.
     +     * P4Sync.streamP4FilesCb(marshalled) - the marshalled data is in bytes.
     +           Cast the literals as bytes. When using this data to manipulate
     +           self.stream_file, encode all the marshalled data except for the
     +           'depotFile' name.
     +     * P4Sync.streamP4Files(marshalled)
      
          Special behavior:
     -    * p4_describe - encoding is disabled for the depotFile(x) and path elements since these are depot path and depo filenames.
     -    * p4PathStartsWith(path, prefix) - Since P4 depot paths can contain non-UTF-8 encoded strings, change this method to compare paths while supporting the optional encoding.
     -       - First, perform a byte-to-byte check to see if the path and prefix are both identical text.  There is no need to perform encoding conversions if the text is identical.
     -       - If the byte check fails, pass both the path and prefix through encodeWithUTF8() to ensure both paths are using the same encoding. Then perform the test as originally written.
     -    * patchRCSKeywords(file, pattern) - the parameters of file and pattern are both strings. However this function changes the contents of the file itentified by name "file". Treat the content of this file as binary to ensure that python does not accidently change the original encoding. The regular expression is cast as_bytes() and run against the file as_bytes(). The P4 keywords are ASCII strings and cannot span lines so iterating over each line of the file is acceptable.
     -    * writeToGitStream(gitMode, relPath, contents) - Since 'contents' is already bytes data, instead of using the self.gitStream, use the new self.gitStreamBytes - the unwrapped gitStream that does not cast as_bytes() the binary data.
     -    * commit(details, files, branch, parent = "", allow_empty=False) - Changed the encoding for the commit message to the preferred format for fast-import. The number of bytes is sent in the data block instead of using the EOT marker.
     -    * Change the code for handling the user cache to use binary files. Cast text as_bytes() when writing to the cache and as_string() when reading from the cache.  This makes the reading and writing of the cache determinstic in it's encoding. Unlike file paths, P4 encodes the user names in UTF-8 encoding so no additional string encoding is required.
     +
     +     * p4_describep4_describe(change, shelved=False) - encoding is disabled
     +           for the depotFile(x) and path elements since these are depot path
     +           and depo filenames.
     +     * p4PathStartsWith(path, prefix) - Since P4 depot paths can contain
     +           non-UTF-8 encoded strings, change this method to compare paths
     +           while supporting the optional encoding.
     +
     +            - First, perform a byte-to-byte check to see if the path and
     +                  prefix are both identical text.  There is no need to
     +                  perform encoding conversions if the text is identical.
     +            - If the byte check fails, pass both the path and prefix through
     +                  encodeWithUTF8() to ensure both paths are using the same
     +                  encoding. Then perform the test as originally written.
     +
     +     * P4Submit.patchRCSKeywords(file, pattern) - the parameters of file and
     +           pattern are both strings. However this function changes the
     +           contents of the file itentified by name "file". Treat the content
     +           of this file as binary to ensure that python does not accidently
     +           change the original encoding. The regular expression is cast
     +           as_bytes() and run against the file as_bytes(). The P4 keywords
     +           are ASCII strings and cannot span lines so iterating over each
     +           line of the file is acceptable.
     +     * P4Sync.writeToGitStream(gitMode, relPath, contents) - Since
     +           'contents' is already bytes data, instead of using the
     +           self.gitStream, use the new self.gitStreamBytes - the unwrapped
     +           gitStream that does not cast as_bytes() the binary data.
     +     * P4Sync.commit(details, files, branch, parent = "", allow_empty=False)
     +           Changed the encoding for the commit message to the preferred
     +           format for fast-import. The number of bytes is sent in the data
     +           block instead of using the EOT marker.
     +
     +     * Change the code for handling the user cache to use binary files.
     +           Cast text as_bytes() when writing to the cache and as_string()
     +           when reading from the cache.  This makes the reading and writing
     +           of the cache determinstic in it's encoding. Unlike file paths,
     +           P4 encodes the user names in UTF-8 encoding so no additional
     +           string encoding is required.
      
          Signed-off-by: Ben Keene <seraphire@gmail.com>
     -    (cherry picked from commit 65ff0c74ebe62a200b4385ecfd4aa618ce091f48)
      
       diff --git a/git-p4.py b/git-p4.py
       --- a/git-p4.py
     @@ -122,7 +198,7 @@
           cmd += [str(change)]
       
      -    ds = p4CmdList(cmd, skip_info=True)
     -+    ds = p4CmdList(cmd, skip_info=True, encode_data=False)
     ++    ds = p4CmdList(cmd, skip_info=True, encode_cmd_output=False)
           if len(ds) != 1:
               die("p4 describe -s %d did not return 1 result: %s" % (change, str(ds)))
       
     @@ -137,29 +213,20 @@
           if "time" not in d:
               die("p4 describe -s %d returned no \"time\": %s" % (change, str(d)))
       
     -+    # Do not convert 'depotFile(X)' or 'path' to be UTF-8 encoded, however 
     -+    # cast as_string() the rest of the text. 
     ++    # Do not convert 'depotFile(X)' or 'path' to be UTF-8 encoded, however
     ++    # cast as_string() the rest of the text.
      +    keys=d.keys()
      +    for key in keys:
      +        if key.startswith('depotFile'):
     -+            d[key]=d[key] 
     ++            d[key]=d[key]
      +        elif key == 'path':
     -+            d[key]=d[key] 
     ++            d[key]=d[key]
      +        else:
      +            d[key] = as_string(d[key])
      +
           return d
       
       #
     -@@
     -     return result
     - 
     - def p4Cmd(cmd):
     -+    """ Executes a P4 command and returns the results in a dictionary
     -+    """
     -     list = p4CmdList(cmd)
     -     result = {}
     -     for entry in list:
      @@
       _gitConfig = {}
       
     @@ -189,13 +256,13 @@
           #
           # we may or may not have a problem. If you have core.ignorecase=true,
           # we treat DirA and dira as the same directory
     -+    
     ++
      +    # Since we have to deal with mixed encodings for p4 file
      +    # paths, first perform a simple startswith check, this covers
      +    # the case that the formats and path are identical.
      +    if as_bytes(path).startswith(as_bytes(prefix)):
      +        return True
     -+    
     ++
      +    # attempt to convert the prefix and path both to utf8
      +    path_utf8 = encodeWithUTF8(path)
      +    prefix_utf8 = encodeWithUTF8(prefix)
     @@ -203,8 +270,8 @@
           if gitConfigBool("core.ignorecase"):
      -        return path.lower().startswith(prefix.lower())
      -    return path.startswith(prefix)
     -+        # Check if we match byte-per-byte.  
     -+        
     ++        # Check if we match byte-per-byte.
     ++
      +        return path_utf8.lower().startswith(prefix_utf8.lower())
      +    return path_utf8.startswith(prefix_utf8)
       
     @@ -272,7 +339,7 @@
               self.userMapFromPerforceServer = True
       
           def loadUserMapFromCache(self):
     -+        """ Reads the P4 username to git email map 
     ++        """ Reads the P4 username to git email map
      +        """
               self.users = {}
               self.userMapFromPerforceServer = False
     @@ -292,7 +359,7 @@
       
           def patchRCSKeywords(self, file, pattern):
      -        # Attempt to zap the RCS keywords in a p4 controlled file matching the given pattern
     -+        """ Attempt to zap the RCS keywords in a p4 
     ++        """ Attempt to zap the RCS keywords in a p4
      +            controlled file matching the given pattern
      +        """
      +        bSubLine = as_bytes(r'$\1$')
     @@ -377,7 +444,7 @@
      +        """ output one file from the P4 stream to the git inbound stream.
      +            helper for streamP4files.
      +
     -+            contents should be a bytes (bytes) 
     ++            contents should be a bytes (bytes)
      +        """
               relPath = self.stripRepoPath(file['depotFile'], self.branchPrefixes)
               relPath = encodeWithUTF8(relPath, self.verbose)
     @@ -427,7 +494,7 @@
       
      -    # handle another chunk of streaming data
           def streamP4FilesCb(self, marshalled):
     -+        """ Callback function for recording P4 chunks of data for streaming 
     ++        """ Callback function for recording P4 chunks of data for streaming
      +            into GIT.
      +
      +            marshalled data is bytes[] from the caller
     @@ -493,7 +560,7 @@
       
      -    # Stream directly from "p4 files" into "git fast-import"
           def streamP4Files(self, files):
     -+        """ Stream directly from "p4 files" into "git fast-import" 
     ++        """ Stream directly from "p4 files" into "git fast-import"
      +        """
               filesForCommit = []
               filesToRead = []
     @@ -544,7 +611,7 @@
      +	    #('merge' SP <commit-ish> LF)*
      +	    #(filemodify | filedelete | filecopy | filerename | filedeleteall | notemodify)*
      +	    #LF?
     -+        
     ++
      +        #'commit' - <ref> is the name of the branch to make the commit on
               self.gitStream.write("commit %s\n" % branch)
      +        #'mark' SP :<idnum>
     @@ -558,9 +625,9 @@
      -        self.gitStream.write("data <<EOT\n")
      -        self.gitStream.write(details["desc"])
      +        # Per https://git-scm.com/docs/git-fast-import
     -+        # The preferred method for creating the commit message is to supply the 
     -+        # byte count in the data method and not to use a Delimited format. 
     -+        # Collect all the text in the commit message into a single string and 
     ++        # The preferred method for creating the commit message is to supply the
     ++        # byte count in the data method and not to use a Delimited format.
     ++        # Collect all the text in the commit message into a single string and
      +        # compute the byte count.
      +        commitText = details["desc"]
               if len(jobs) > 0:
     @@ -584,7 +651,7 @@
      +            if len(details['options']) > 0:
      +                commitText += (": options = %s" % details['options'])
      +            commitText += "]"
     -+        commitText += "\n" 
     ++        commitText += "\n"
      +        self.gitStream.write("data %s\n" % len(as_bytes(commitText)))
      +        self.gitStream.write(commitText)
      +        self.gitStream.write("\n")
     @@ -617,7 +684,7 @@
               fileArgs = ["%s...%s" % (p,revision) for p in self.depotPaths]
       
      -        for info in p4CmdList(["files"] + fileArgs):
     -+        for info in p4CmdList(["files"] + fileArgs, encode_data = False):
     ++        for info in p4CmdList(["files"] + fileArgs, encode_cmd_output=False):
       
      -            if 'code' in info and info['code'] == 'error':
      +            if 'code' in info and info['code'] == b'error':
     @@ -640,7 +707,7 @@
                       #fileCnt = fileCnt + 1
                       continue
       
     -+            # Save all the file information, howerver do not translate the depotFile name at 
     ++            # Save all the file information, howerver do not translate the depotFile name at
      +            # this time. Leave that as bytes since the encoding may vary.
                   for prop in ["depotFile", "rev", "action", "type" ]:
      -                details["%s%s" % (prop, fileCnt)] = info[prop]
 11:  883ef45ca5 ! 14:  25ad3e23a3 git-p4: Added --encoding parameter to p4 clone
     @@ -1,19 +1,24 @@
      Author: Ben Keene <seraphire@gmail.com>
      
     -    git-p4: Added --encoding parameter to p4 clone
     +    git-p4: added --encoding parameter to p4 clone
      
     -    The test t9822 did not have any tests that had encoded a directory name in ISO8859-1.
     +    The test t9822 did not have any tests that had encoded a directory name
     +    in ISO8859-1.
      
     -    Additionally, to make it easier for the user to clone new repositories with a non-UTF-8 encoded path in P4, add a new parameter to p4clone "--encoding" that sets the
     +    Additionally, to make it easier for the user to clone new repositories
     +    with a non-UTF-8 encoded path in P4, add a new parameter to p4clone
     +    "--encoding" that sets the
      
     -    Add new tests that use ISO8859-1 encoded text in both the directory and file names.  git-p4.pathEncoding.
     +    Add new tests that use ISO8859-1 encoded text in both the directory and
     +    file names.  git-p4.pathEncoding.
      
     -    Update the View class in the git-p4 code to properly cast text as_string() except for depot path and filenames.
     +    Update the View class in the git-p4 code to properly cast text
     +    as_string() except for depot path and filenames.
      
     -    Update the documentation to include the new command line parameter for p4clone
     +    Update the documentation to include the new command line parameter for
     +    p4clone
      
          Signed-off-by: Ben Keene <seraphire@gmail.com>
     -    (cherry picked from commit e26f6309d60c6c1615320d4a9071935e23efe6fb)
      
       diff --git a/Documentation/git-p4.txt b/Documentation/git-p4.txt
       --- a/Documentation/git-p4.txt
     @@ -23,8 +28,8 @@
       	Perform a bare clone.  See linkgit:git-clone[1].
       
      +--encoding <encoding>::
     -+    Optionally sets the git-p4.pathEncoding configuration value in 
     -+	the newly created Git repository before files are synchronized 
     ++    Optionally sets the git-p4.pathEncoding configuration value in
     ++	the newly created Git repository before files are synchronized
      +	from P4. See git-p4.pathEncoding for more information.
      +
       Submit options
     @@ -34,15 +39,6 @@
       diff --git a/git-p4.py b/git-p4.py
       --- a/git-p4.py
       +++ b/git-p4.py
     -@@
     -     """Look at the p4 client spec, create a View() object that contains
     -        all the mappings, and return it."""
     - 
     --    specList = p4CmdList("client -o")
     -+    specList = p4CmdList("client -o", encode_data=False)
     -     if len(specList) != 1:
     -         die('Output from "client -o" is %d lines, expecting 1' %
     -             len(specList))
      @@
           entry = specList[0]
       
     @@ -130,11 +126,8 @@
       
           def update_client_spec_path_cache(self, files):
      @@
     -         if len(fileArgs) == 0:
     -             return  # All files in cache
       
     --        where_result = p4CmdList(["-x", "-", "where"], stdin=fileArgs)
     -+        where_result = p4CmdList(["-x", "-", "where"], stdin=fileArgs, encode_data=False)
     +         where_result = p4CmdList(["-x", "-", "where"], stdin=fileArgs, encode_cmd_output=False)
               for res in where_result:
      -            if "code" in res and res["code"] == "error":
      +            if "code" in res and res["code"] == b"error":
     @@ -155,7 +148,7 @@
      +        self.setPathEncoding = None
       
           def defaultDestination(self, args):
     -         """ Returns the last path component as the default git 
     +         """ Returns the last path component as the default git
      @@
       
               depotPaths = args
     @@ -246,7 +239,7 @@
      +		DIR_ISO8859="$(printf "$DIR_ISO8859_ESCAPED")" &&
      +		ISO8859="$(printf "$ISO8859_ESCAPED")" &&
      +		cd "$cli" &&
     -+		mkdir "$DIR_ISO8859" && 
     ++		mkdir "$DIR_ISO8859" &&
      +		cd "$DIR_ISO8859" &&
      +		echo content123 >"$ISO8859" &&
      +		p4 add "$ISO8859" &&
  -:  ---------- > 15:  445dbc59f0 git-p4: Add depot manipulation functions

-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 64+ messages in thread

* [PATCH v5 01/15] t/gitweb-lib.sh: drop confusing quotes
  2019-12-07 17:47       ` [PATCH v5 00/15] " Ben Keene via GitGitGadget
@ 2019-12-07 17:47         ` Jeff King via GitGitGadget
  2019-12-07 17:47         ` [PATCH v5 02/15] t/gitweb-lib.sh: set $REQUEST_URI Jeff King via GitGitGadget
                           ` (14 subsequent siblings)
  15 siblings, 0 replies; 64+ messages in thread
From: Jeff King via GitGitGadget @ 2019-12-07 17:47 UTC (permalink / raw)
  To: git; +Cc: Ben Keene, Junio C Hamano, Jeff King

From: Jeff King <peff@peff.net>

Some variables assignments in gitweb_run() look like this:

  FOO=""$1""

The extra quotes aren't doing anything. Each set opens and closes an
empty string, and $1 is actually outside of any double-quotes (which is
OK, because variable assignment does not do whitespace splitting on the
expanded value).

Let's drop them, as they're simply confusing.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 t/gitweb-lib.sh | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/t/gitweb-lib.sh b/t/gitweb-lib.sh
index 1f32ca66ea..b8455d1182 100644
--- a/t/gitweb-lib.sh
+++ b/t/gitweb-lib.sh
@@ -60,7 +60,10 @@ gitweb_run () {
 	REQUEST_METHOD='GET'
 	QUERY_STRING=$1
 	PATH_INFO=$2
+<<<<<<< HEAD
 	REQUEST_URI=/gitweb.cgi$PATH_INFO
+=======
+>>>>>>> t/gitweb-lib.sh: drop confusing quotes
 	export GATEWAY_INTERFACE HTTP_ACCEPT REQUEST_METHOD \
 		QUERY_STRING PATH_INFO REQUEST_URI
 
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 64+ messages in thread

* [PATCH v5 02/15] t/gitweb-lib.sh: set $REQUEST_URI
  2019-12-07 17:47       ` [PATCH v5 00/15] " Ben Keene via GitGitGadget
  2019-12-07 17:47         ` [PATCH v5 01/15] t/gitweb-lib.sh: drop confusing quotes Jeff King via GitGitGadget
@ 2019-12-07 17:47         ` Jeff King via GitGitGadget
  2019-12-07 17:47         ` [PATCH v5 03/15] git-p4: select P4 binary by operating-system Ben Keene via GitGitGadget
                           ` (13 subsequent siblings)
  15 siblings, 0 replies; 64+ messages in thread
From: Jeff King via GitGitGadget @ 2019-12-07 17:47 UTC (permalink / raw)
  To: git; +Cc: Ben Keene, Junio C Hamano, Jeff King

From: Jeff King <peff@peff.net>

In a real webserver's CGI call, gitweb.cgi would typically see
$REQUEST_URI set. This variable does impact how we display our URL in
the resulting page, so let's try to make our test as realistic as
possible (we can just use the $PATH_INFO our caller passed in, if any).

This doesn't change the outcome of any tests, but it will help us add
some new tests in a future patch.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 t/gitweb-lib.sh | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/t/gitweb-lib.sh b/t/gitweb-lib.sh
index b8455d1182..1f32ca66ea 100644
--- a/t/gitweb-lib.sh
+++ b/t/gitweb-lib.sh
@@ -60,10 +60,7 @@ gitweb_run () {
 	REQUEST_METHOD='GET'
 	QUERY_STRING=$1
 	PATH_INFO=$2
-<<<<<<< HEAD
 	REQUEST_URI=/gitweb.cgi$PATH_INFO
-=======
->>>>>>> t/gitweb-lib.sh: drop confusing quotes
 	export GATEWAY_INTERFACE HTTP_ACCEPT REQUEST_METHOD \
 		QUERY_STRING PATH_INFO REQUEST_URI
 
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 64+ messages in thread

* [PATCH v5 03/15] git-p4: select P4 binary by operating-system
  2019-12-07 17:47       ` [PATCH v5 00/15] " Ben Keene via GitGitGadget
  2019-12-07 17:47         ` [PATCH v5 01/15] t/gitweb-lib.sh: drop confusing quotes Jeff King via GitGitGadget
  2019-12-07 17:47         ` [PATCH v5 02/15] t/gitweb-lib.sh: set $REQUEST_URI Jeff King via GitGitGadget
@ 2019-12-07 17:47         ` Ben Keene via GitGitGadget
  2019-12-07 17:47         ` [PATCH v5 04/15] git-p4: change the expansion test from basestring to list Ben Keene via GitGitGadget
                           ` (12 subsequent siblings)
  15 siblings, 0 replies; 64+ messages in thread
From: Ben Keene via GitGitGadget @ 2019-12-07 17:47 UTC (permalink / raw)
  To: git; +Cc: Ben Keene, Junio C Hamano, Ben Keene

From: Ben Keene <seraphire@gmail.com>

The original code unconditionally used "p4" as the binary filename.

Depending on the version of Git and Python installed, the perforce
program (p4) may not resolve on Windows without the program extension.

Check the operating system (platform.system) and if it is reporting that
it is Windows, use the full filename of "p4.exe" instead of "p4"

This change is Python 2 and Python 3 compatible.

Signed-off-by: Ben Keene <seraphire@gmail.com>
---
 git-p4.py | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/git-p4.py b/git-p4.py
index 60c73b6a37..65e926758c 100755
--- a/git-p4.py
+++ b/git-p4.py
@@ -75,7 +75,10 @@ def p4_build_cmd(cmd):
     location. It means that hooking into the environment, or other configuration
     can be done more easily.
     """
-    real_cmd = ["p4"]
+    if (platform.system() == "Windows"):
+        real_cmd = ["p4.exe"]
+    else:
+        real_cmd = ["p4"]
 
     user = gitConfig("git-p4.user")
     if len(user) > 0:
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 64+ messages in thread

* [PATCH v5 04/15] git-p4: change the expansion test from basestring to list
  2019-12-07 17:47       ` [PATCH v5 00/15] " Ben Keene via GitGitGadget
                           ` (2 preceding siblings ...)
  2019-12-07 17:47         ` [PATCH v5 03/15] git-p4: select P4 binary by operating-system Ben Keene via GitGitGadget
@ 2019-12-07 17:47         ` Ben Keene via GitGitGadget
  2019-12-07 17:47         ` [PATCH v5 05/15] git-p4: promote encodeWithUTF8() to a global function Ben Keene via GitGitGadget
                           ` (11 subsequent siblings)
  15 siblings, 0 replies; 64+ messages in thread
From: Ben Keene via GitGitGadget @ 2019-12-07 17:47 UTC (permalink / raw)
  To: git; +Cc: Ben Keene, Junio C Hamano, Ben Keene

From: Ben Keene <seraphire@gmail.com>

Python 3 handles strings differently than Python 2.7.  Since Python 2
is reaching it's end of life, a series of changes are being submitted to
enable python 3.5 and following support. The current code fails basic
tests under python 3.5.

The original code used 'basestring' in a test to determine if a list or
literal string was passed into 9 different functions.  This is used to
determine if the shell should be invoked when calling subprocess
methods.

Change references to 'basestring' in the isinstance tests to use 'list'
instead. This prepares the code to remove all references to basestring.

Signed-off-by: Ben Keene <seraphire@gmail.com>
---
 git-p4.py | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/git-p4.py b/git-p4.py
index 65e926758c..3153186df0 100755
--- a/git-p4.py
+++ b/git-p4.py
@@ -108,7 +108,7 @@ def p4_build_cmd(cmd):
         # Provide a way to not pass this option by setting git-p4.retries to 0
         real_cmd += ["-r", str(retries)]
 
-    if isinstance(cmd,basestring):
+    if not isinstance(cmd, list):
         real_cmd = ' '.join(real_cmd) + ' ' + cmd
     else:
         real_cmd += cmd
@@ -174,7 +174,7 @@ def write_pipe(c, stdin):
     if verbose:
         sys.stderr.write('Writing pipe: %s\n' % str(c))
 
-    expand = isinstance(c,basestring)
+    expand = not isinstance(c, list)
     p = subprocess.Popen(c, stdin=subprocess.PIPE, shell=expand)
     pipe = p.stdin
     val = pipe.write(stdin)
@@ -196,7 +196,7 @@ def read_pipe_full(c):
     if verbose:
         sys.stderr.write('Reading pipe: %s\n' % str(c))
 
-    expand = isinstance(c,basestring)
+    expand = not isinstance(c, list)
     p = subprocess.Popen(c, stdout=subprocess.PIPE, stderr=subprocess.PIPE, shell=expand)
     (out, err) = p.communicate()
     return (p.returncode, out, err)
@@ -232,7 +232,7 @@ def read_pipe_lines(c):
     if verbose:
         sys.stderr.write('Reading pipe: %s\n' % str(c))
 
-    expand = isinstance(c, basestring)
+    expand = not isinstance(c, list)
     p = subprocess.Popen(c, stdout=subprocess.PIPE, shell=expand)
     pipe = p.stdout
     val = pipe.readlines()
@@ -275,7 +275,7 @@ def p4_has_move_command():
     return True
 
 def system(cmd, ignore_error=False):
-    expand = isinstance(cmd,basestring)
+    expand = not isinstance(cmd, list)
     if verbose:
         sys.stderr.write("executing %s\n" % str(cmd))
     retcode = subprocess.call(cmd, shell=expand)
@@ -287,7 +287,7 @@ def system(cmd, ignore_error=False):
 def p4_system(cmd):
     """Specifically invoke p4 as the system command. """
     real_cmd = p4_build_cmd(cmd)
-    expand = isinstance(real_cmd, basestring)
+    expand = not isinstance(real_cmd, list)
     retcode = subprocess.call(real_cmd, shell=expand)
     if retcode:
         raise CalledProcessError(retcode, real_cmd)
@@ -525,7 +525,7 @@ def getP4OpenedType(file):
 # Return the set of all p4 labels
 def getP4Labels(depotPaths):
     labels = set()
-    if isinstance(depotPaths,basestring):
+    if not isinstance(depotPaths, list):
         depotPaths = [depotPaths]
 
     for l in p4CmdList(["labels"] + ["%s..." % p for p in depotPaths]):
@@ -612,7 +612,7 @@ def isModeExecChanged(src_mode, dst_mode):
 def p4CmdList(cmd, stdin=None, stdin_mode='w+b', cb=None, skip_info=False,
         errors_as_exceptions=False):
 
-    if isinstance(cmd,basestring):
+    if not isinstance(cmd, list):
         cmd = "-G " + cmd
         expand = True
     else:
@@ -629,7 +629,7 @@ def p4CmdList(cmd, stdin=None, stdin_mode='w+b', cb=None, skip_info=False,
     stdin_file = None
     if stdin is not None:
         stdin_file = tempfile.TemporaryFile(prefix='p4-stdin', mode=stdin_mode)
-        if isinstance(stdin,basestring):
+        if not isinstance(stdin, list):
             stdin_file.write(stdin)
         else:
             for i in stdin:
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 64+ messages in thread

* [PATCH v5 05/15] git-p4: promote encodeWithUTF8() to a global function
  2019-12-07 17:47       ` [PATCH v5 00/15] " Ben Keene via GitGitGadget
                           ` (3 preceding siblings ...)
  2019-12-07 17:47         ` [PATCH v5 04/15] git-p4: change the expansion test from basestring to list Ben Keene via GitGitGadget
@ 2019-12-07 17:47         ` Ben Keene via GitGitGadget
  2019-12-07 17:47         ` [PATCH v5 06/15] git-p4: remove p4_write_pipe() and write_pipe() return values Ben Keene via GitGitGadget
                           ` (10 subsequent siblings)
  15 siblings, 0 replies; 64+ messages in thread
From: Ben Keene via GitGitGadget @ 2019-12-07 17:47 UTC (permalink / raw)
  To: git; +Cc: Ben Keene, Junio C Hamano, Ben Keene

From: Ben Keene <seraphire@gmail.com>

This changelist is an intermediate submission for migrating the P4
support from Python 2 to Python 3. The code needs access to the
encodeWithUTF8() for support of non-UTF8 filenames in the clone class as
well as the sync class.

Move the function encodeWithUTF8() from the P4Sync class to a
stand-alone function.  This will allow other classes to use this
function without instanciating the P4Sync class. Change the self.verbose
reference to an optional method parameter. Update the existing
references to this function to pass the self.verbose since it is no
longer available on "self" since the function is no longer contained on
the P4Sync class.

Signed-off-by: Ben Keene <seraphire@gmail.com>
---
 git-p4.py | 52 ++++++++++++++++++++++++++++++++++++----------------
 1 file changed, 36 insertions(+), 16 deletions(-)

diff --git a/git-p4.py b/git-p4.py
index 3153186df0..cc6c490e2c 100755
--- a/git-p4.py
+++ b/git-p4.py
@@ -27,7 +27,7 @@
 import ctypes
 import errno
 
-# support basestring in python3
+# support basestring in Python 3
 try:
     unicode = unicode
 except NameError:
@@ -46,7 +46,7 @@
 try:
     from subprocess import CalledProcessError
 except ImportError:
-    # from python2.7:subprocess.py
+    # from Python 2.7:subprocess.py
     # Exception classes used by this module.
     class CalledProcessError(Exception):
         """This exception is raised when a process run by check_call() returns
@@ -587,6 +587,38 @@ def isModeExec(mode):
     # otherwise False.
     return mode[-3:] == "755"
 
+def encodeWithUTF8(path, verbose=False):
+    """ Ensure that the path is encoded as a UTF-8 string
+
+        Returns bytes(P3)/str(P2)
+    """
+
+    if isunicode:
+        try:
+            if isinstance(path, unicode):
+                # It is already unicode, cast it as a bytes
+                # that is encoded as utf-8.
+                return path.encode('utf-8', 'strict')
+            path.decode('ascii', 'strict')
+        except:
+            encoding = 'utf8'
+            if gitConfig('git-p4.pathEncoding'):
+                encoding = gitConfig('git-p4.pathEncoding')
+            path = path.decode(encoding, 'replace').encode('utf8', 'replace')
+            if verbose:
+                print('\nNOTE:Path with non-ASCII characters detected. Used %s to encode: %s ' % (encoding, to_unicode(path)))
+    else:
+        try:
+            path.decode('ascii')
+        except:
+            encoding = 'utf8'
+            if gitConfig('git-p4.pathEncoding'):
+                encoding = gitConfig('git-p4.pathEncoding')
+            path = path.decode(encoding, 'replace').encode('utf8', 'replace')
+            if verbose:
+                print('Path with non-ASCII characters detected. Used %s to encode: %s ' % (encoding, path))
+    return path
+
 class P4Exception(Exception):
     """ Base class for exceptions from the p4 client """
     def __init__(self, exit_code):
@@ -2748,24 +2780,12 @@ def writeToGitStream(self, gitMode, relPath, contents):
             self.gitStream.write(d)
         self.gitStream.write('\n')
 
-    def encodeWithUTF8(self, path):
-        try:
-            path.decode('ascii')
-        except:
-            encoding = 'utf8'
-            if gitConfig('git-p4.pathEncoding'):
-                encoding = gitConfig('git-p4.pathEncoding')
-            path = path.decode(encoding, 'replace').encode('utf8', 'replace')
-            if self.verbose:
-                print('Path with non-ASCII characters detected. Used %s to encode: %s ' % (encoding, path))
-        return path
-
     # output one file from the P4 stream
     # - helper for streamP4Files
 
     def streamOneP4File(self, file, contents):
         relPath = self.stripRepoPath(file['depotFile'], self.branchPrefixes)
-        relPath = self.encodeWithUTF8(relPath)
+        relPath = encodeWithUTF8(relPath, self.verbose)
         if verbose:
             if 'fileSize' in self.stream_file:
                 size = int(self.stream_file['fileSize'])
@@ -2848,7 +2868,7 @@ def streamOneP4File(self, file, contents):
 
     def streamOneP4Deletion(self, file):
         relPath = self.stripRepoPath(file['path'], self.branchPrefixes)
-        relPath = self.encodeWithUTF8(relPath)
+        relPath = encodeWithUTF8(relPath, self.verbose)
         if verbose:
             sys.stdout.write("delete %s\n" % relPath)
             sys.stdout.flush()
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 64+ messages in thread

* [PATCH v5 06/15] git-p4: remove p4_write_pipe() and write_pipe() return values
  2019-12-07 17:47       ` [PATCH v5 00/15] " Ben Keene via GitGitGadget
                           ` (4 preceding siblings ...)
  2019-12-07 17:47         ` [PATCH v5 05/15] git-p4: promote encodeWithUTF8() to a global function Ben Keene via GitGitGadget
@ 2019-12-07 17:47         ` Ben Keene via GitGitGadget
  2019-12-07 17:47         ` [PATCH v5 07/15] git-p4: add new support function gitConfigSet() Ben Keene via GitGitGadget
                           ` (9 subsequent siblings)
  15 siblings, 0 replies; 64+ messages in thread
From: Ben Keene via GitGitGadget @ 2019-12-07 17:47 UTC (permalink / raw)
  To: git; +Cc: Ben Keene, Junio C Hamano, Ben Keene

From: Ben Keene <seraphire@gmail.com>

The git-p4 functions write_pipe() and p4_write_pipe() originally
return the number of bytes returned from the system call. However,
this is a misleading value when this function is run by Python 3.

Modify the functions write_pipe() and p4_write_pipe() to remove the
return value.  The return value for both functions is the number of
bytes, but the meaning is lost under python3 since the count does not
match the number of characters that may have been encoded.
Additionally, the return value was never used, so this is removed to
avoid future ambiguity.

Signed-off-by: Ben Keene <seraphire@gmail.com>
---
 git-p4.py | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/git-p4.py b/git-p4.py
index cc6c490e2c..e7c24817ad 100755
--- a/git-p4.py
+++ b/git-p4.py
@@ -171,6 +171,8 @@ def die(msg):
         sys.exit(1)
 
 def write_pipe(c, stdin):
+    """ Executes the command 'c', passing 'stdin' on the standard input
+    """
     if verbose:
         sys.stderr.write('Writing pipe: %s\n' % str(c))
 
@@ -182,11 +184,12 @@ def write_pipe(c, stdin):
     if p.wait():
         die('Command failed: %s' % str(c))
 
-    return val
 
 def p4_write_pipe(c, stdin):
+    """ Runs a P4 command 'c', passing 'stdin' data to P4
+    """
     real_cmd = p4_build_cmd(c)
-    return write_pipe(real_cmd, stdin)
+    write_pipe(real_cmd, stdin)
 
 def read_pipe_full(c):
     """ Read output from  command. Returns a tuple
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 64+ messages in thread

* [PATCH v5 07/15] git-p4: add new support function gitConfigSet()
  2019-12-07 17:47       ` [PATCH v5 00/15] " Ben Keene via GitGitGadget
                           ` (5 preceding siblings ...)
  2019-12-07 17:47         ` [PATCH v5 06/15] git-p4: remove p4_write_pipe() and write_pipe() return values Ben Keene via GitGitGadget
@ 2019-12-07 17:47         ` Ben Keene via GitGitGadget
  2019-12-07 17:47         ` [PATCH v5 08/15] git-p4: add casting helper functions for python 3 conversion Ben Keene via GitGitGadget
                           ` (8 subsequent siblings)
  15 siblings, 0 replies; 64+ messages in thread
From: Ben Keene via GitGitGadget @ 2019-12-07 17:47 UTC (permalink / raw)
  To: git; +Cc: Ben Keene, Junio C Hamano, Ben Keene

From: Ben Keene <seraphire@gmail.com>

Add a new method gitConfigSet(). This method will set a value in the git
configuration cache list.

Signed-off-by: Ben Keene <seraphire@gmail.com>
---
 git-p4.py | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/git-p4.py b/git-p4.py
index e7c24817ad..e020958083 100755
--- a/git-p4.py
+++ b/git-p4.py
@@ -860,6 +860,11 @@ def gitConfigList(key):
             _gitConfig[key] = []
     return _gitConfig[key]
 
+def gitConfigSet(key, value):
+    """ Set the git configuration key 'key' to 'value' for this session
+    """
+    _gitConfig[key] = value
+
 def p4BranchesInGit(branchesAreInRemotes=True):
     """Find all the branches whose names start with "p4/", looking
        in remotes or heads as specified by the argument.  Return
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 64+ messages in thread

* [PATCH v5 08/15] git-p4: add casting helper functions for python 3 conversion
  2019-12-07 17:47       ` [PATCH v5 00/15] " Ben Keene via GitGitGadget
                           ` (6 preceding siblings ...)
  2019-12-07 17:47         ` [PATCH v5 07/15] git-p4: add new support function gitConfigSet() Ben Keene via GitGitGadget
@ 2019-12-07 17:47         ` Ben Keene via GitGitGadget
  2019-12-07 17:47         ` [PATCH v5 09/15] git-p4: python 3 syntax changes Ben Keene via GitGitGadget
                           ` (7 subsequent siblings)
  15 siblings, 0 replies; 64+ messages in thread
From: Ben Keene via GitGitGadget @ 2019-12-07 17:47 UTC (permalink / raw)
  To: git; +Cc: Ben Keene, Junio C Hamano, Ben Keene

From: Ben Keene <seraphire@gmail.com>

Python 3 handles strings differently than Python 2.7.  Since Python 2
is reaching it's end of life, a series of changes are being submitted to
enable python 3.5 and following support. The current code fails basic
tests under python 3.5.

Change the existing unicode test add new support functions for
Python 2 - Python 3 support.

Define the following variables:
- isunicode - a boolean variable that states if the version of python
              natively supports unicode (true) or not (false). This is
              true for Python 3 and false for Python 2.
- unicode   - a type alias for the datatype that holds a unicode string.
              It is assigned to a str under Python 3 and the unicode
              type for Python 2.
- bytes     - a type alias for an array of bytes.  It is assigned the
              native bytes type for Python 3 and str for Python 2.

Add the following new functions:

- as_string(text)  - A new function that will convert a byte array to a
                     unicode (UTF-8) string under Python 3.  Under
                     Python 2, this returns the string unchanged.
- as_bytes(text)   - A new function that will convert a unicode string
                     to a byte array under Python 3.  Under Python 2,
                     this returns the string unchanged.
- to_unicode(text) - Converts a text string as Unicode(UTF-8) on both
                     Python 2 and Python 3.

Add a new function alias raw_input:
If raw_input does not exist (it was renamed to input in Python 3) alias
input as raw_input.

The as_string() and as_bytes() functions allow for modifying the code
with a minimal amount of impact on Python 2 support. When a string is
expected, the as_string() will be used to "cast" the incoming "bytes"
to a string type.

Conversely as_bytes() will be used to cast a "string" to a "byte array"
type. Since Python 2 overloads the datatype 'str' to serve both purposes,
the Python 2 versions of these function do not change the data. This
reduces the regression impact of these code changes.

'basestring' is removed since its only references are found in tests
that were changed in modified in previous commits.

Signed-off-by: Ben Keene <seraphire@gmail.com>
---
 git-p4.py | 80 ++++++++++++++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 74 insertions(+), 6 deletions(-)

diff --git a/git-p4.py b/git-p4.py
index e020958083..e6f7513384 100755
--- a/git-p4.py
+++ b/git-p4.py
@@ -32,16 +32,84 @@
     unicode = unicode
 except NameError:
     # 'unicode' is undefined, must be Python 3
-    str = str
+    #
+    # For Python 3 which is natively unicode, we will use
+    # unicode for internal information but all P4 Data
+    # will remain in bytes
+    isunicode = True
     unicode = str
     bytes = bytes
-    basestring = (str,bytes)
+
+    def as_string(text):
+        """ Return a byte array as a unicode string
+        """
+        if text is None:
+            return None
+        if isinstance(text, bytes):
+            return unicode(text, "utf-8")
+        else:
+            return text
+
+    def as_bytes(text):
+        """ Return a Unicode string as a byte array
+        """
+        if text is None:
+            return None
+        if isinstance(text, bytes):
+            return text
+        else:
+            return bytes(text, "utf-8")
+
+    def to_unicode(text):
+        """ Return a byte array as a unicode string
+        """
+        return as_string(text)
+
+    def path_as_string(path):
+        """ Converts a path to the UTF8 encoded string
+        """
+        if isinstance(path, unicode):
+            return path
+        return encodeWithUTF8(path).decode('utf-8')
+
 else:
     # 'unicode' exists, must be Python 2
-    str = str
+    #
+    # We will treat the data as:
+    #   str   -> str
+    #   bytes -> str
+    # So for Python 2 these functions are no-ops
+    # and will leave the data in the ambiguious
+    # string/bytes state
+    isunicode = False
     unicode = unicode
     bytes = str
-    basestring = basestring
+
+    def as_string(text):
+        """ Return text unaltered (for Python 3 support)
+        """
+        return text
+
+    def as_bytes(text):
+        """ Return text unaltered (for Python 3 support)
+        """
+        return text
+
+    def to_unicode(text):
+        """ Return a string as a unicode string
+        """
+        return text.decode('utf-8')
+
+    def path_as_string(path):
+        """ Converts a path to the UTF8 encoded bytes
+        """
+        return encodeWithUTF8(path)
+
+# Check for raw_input support
+try:
+    raw_input
+except NameError:
+    raw_input = input
 
 try:
     from subprocess import CalledProcessError
@@ -740,7 +808,7 @@ def p4Where(depotPath):
             if data[:space] == depotPath:
                 output = entry
                 break
-    if output == None:
+    if output is None:
         return ""
     if output["code"] == "error":
         return ""
@@ -4175,7 +4243,7 @@ def main():
     global verbose
     verbose = cmd.verbose
     if cmd.needsGit:
-        if cmd.gitdir == None:
+        if cmd.gitdir is None:
             cmd.gitdir = os.path.abspath(".git")
             if not isValidGitDir(cmd.gitdir):
                 # "rev-parse --git-dir" without arguments will try $PWD/.git
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 64+ messages in thread

* [PATCH v5 09/15] git-p4: python 3 syntax changes
  2019-12-07 17:47       ` [PATCH v5 00/15] " Ben Keene via GitGitGadget
                           ` (7 preceding siblings ...)
  2019-12-07 17:47         ` [PATCH v5 08/15] git-p4: add casting helper functions for python 3 conversion Ben Keene via GitGitGadget
@ 2019-12-07 17:47         ` Ben Keene via GitGitGadget
  2019-12-07 17:47         ` [PATCH v5 10/15] git-p4: fix assumed path separators to be more Windows friendly Ben Keene via GitGitGadget
                           ` (6 subsequent siblings)
  15 siblings, 0 replies; 64+ messages in thread
From: Ben Keene via GitGitGadget @ 2019-12-07 17:47 UTC (permalink / raw)
  To: git; +Cc: Ben Keene, Junio C Hamano, Ben Keene

From: Ben Keene <seraphire@gmail.com>

Python 3 handles strings differently than Python 2.7.  Since Python 2
is reaching it's end of life, a series of changes are being submitted to
enable python 3.5 and following support. The current code fails basic
tests under python 3.5.

There are a number of translations suggested by modernize/futureize that
should be taken to fix numerous non-string specific issues.

Change references to the X.next() iterator to the function next(X) which
is compatible with both Python2 and Python3.

Change references to X.keys() to list(X.keys()) to return a list that
can be iterated in both Python2 and Python3.

Add the literal text (object) to the end of class definitions to be
consistent with Python3 class definition.

Change integer divison to use "//" instead of "/"  Under Both Python 2
and Python 3 // will return a floor()ed result which matches existing
functionality.

Change the format string for displaying decimal values from %d to %4.1f%
when displaying a progress.  This avoids displaying long repeating
decimals in user displayed text.

Signed-off-by: Ben Keene <seraphire@gmail.com>
---
 git-p4.py | 55 +++++++++++++++++++++++++++++--------------------------
 1 file changed, 29 insertions(+), 26 deletions(-)

diff --git a/git-p4.py b/git-p4.py
index e6f7513384..fc6c9406c2 100755
--- a/git-p4.py
+++ b/git-p4.py
@@ -26,6 +26,9 @@
 import zlib
 import ctypes
 import errno
+import os.path
+import codecs
+import io
 
 # support basestring in Python 3
 try:
@@ -639,7 +642,7 @@ def parseDiffTreeEntry(entry):
 
     If the pattern is not matched, None is returned."""
 
-    match = diffTreePattern().next().match(entry)
+    match = next(diffTreePattern()).match(entry)
     if match:
         return {
             'src_mode': match.group(1),
@@ -980,7 +983,7 @@ def findUpstreamBranchPoint(head = "HEAD"):
     branches = p4BranchesInGit()
     # map from depot-path to branch name
     branchByDepotPath = {}
-    for branch in branches.keys():
+    for branch in list(branches.keys()):
         tip = branches[branch]
         log = extractLogMessageFromGitCommit(tip)
         settings = extractSettingsGitLog(log)
@@ -1174,7 +1177,7 @@ def getClientSpec():
     client_name = entry["Client"]
 
     # just the keys that start with "View"
-    view_keys = [ k for k in entry.keys() if k.startswith("View") ]
+    view_keys = [ k for k in list(entry.keys()) if k.startswith("View") ]
 
     # hold this new View
     view = View(client_name)
@@ -1416,7 +1419,7 @@ def processContent(self, git_mode, relPath, contents):
         else:
             return LargeFileSystem.processContent(self, git_mode, relPath, contents)
 
-class Command:
+class Command(object):
     delete_actions = ( "delete", "move/delete", "purge" )
     add_actions = ( "add", "branch", "move/add" )
 
@@ -1431,7 +1434,7 @@ def ensure_value(self, attr, value):
             setattr(self, attr, value)
         return getattr(self, attr)
 
-class P4UserMap:
+class P4UserMap(object):
     def __init__(self):
         self.userMapFromPerforceServer = False
         self.myP4UserId = None
@@ -1482,7 +1485,7 @@ def getUserMapFromPerforceServer(self):
                 self.emails[email] = user
 
         s = ''
-        for (key, val) in self.users.items():
+        for (key, val) in list(self.users.items()):
             s += "%s\t%s\n" % (key.expandtabs(1), val.expandtabs(1))
 
         open(self.getUserCacheFilename(), "wb").write(s)
@@ -1833,7 +1836,7 @@ def prepareSubmitTemplate(self, changelist=None):
                 break
         if not change_entry:
             die('Failed to decode output of p4 change -o')
-        for key, value in change_entry.iteritems():
+        for key, value in list(change_entry.items()):
             if key.startswith('File'):
                 if 'depot-paths' in settings:
                     if not [p for p in settings['depot-paths']
@@ -2077,7 +2080,7 @@ def applyCommit(self, id):
             p4_delete(f)
 
         # Set/clear executable bits
-        for f in filesToChangeExecBit.keys():
+        for f in list(filesToChangeExecBit.keys()):
             mode = filesToChangeExecBit[f]
             setP4ExecBit(f, mode)
 
@@ -2330,7 +2333,7 @@ def run(self, args):
             self.clientSpecDirs = getClientSpec()
 
         # Check for the existence of P4 branches
-        branchesDetected = (len(p4BranchesInGit().keys()) > 1)
+        branchesDetected = (len(list(p4BranchesInGit().keys())) > 1)
 
         if self.useClientSpec and not branchesDetected:
             # all files are relative to the client spec
@@ -2721,7 +2724,7 @@ def __init__(self):
         self.knownBranches = {}
         self.initialParents = {}
 
-        self.tz = "%+03d%02d" % (- time.timezone / 3600, ((- time.timezone % 3600) / 60))
+        self.tz = "%+03d%02d" % (- time.timezone // 3600, ((- time.timezone % 3600) // 60))
         self.labels = {}
 
     # Force a checkpoint in fast-import and wait for it to finish
@@ -2838,7 +2841,7 @@ def splitFilesIntoBranches(self, commit):
             else:
                 relPath = self.stripRepoPath(path, self.depotPaths)
 
-            for branch in self.knownBranches.keys():
+            for branch in list(self.knownBranches.keys()):
                 # add a trailing slash so that a commit into qt/4.2foo
                 # doesn't end up in qt/4.2, e.g.
                 if p4PathStartsWith(relPath, branch + "/"):
@@ -2867,7 +2870,7 @@ def streamOneP4File(self, file, contents):
                 size = int(self.stream_file['fileSize'])
             else:
                 size = 0 # deleted files don't get a fileSize apparently
-            sys.stdout.write('\r%s --> %s (%i MB)\n' % (file['depotFile'], relPath, size/1024/1024))
+            sys.stdout.write('\r%s --> %s (%i MB)\n' % (file['depotFile'], relPath, size//1024//1024))
             sys.stdout.flush()
 
         (type_base, type_mods) = split_p4_type(file["type"])
@@ -2967,7 +2970,7 @@ def streamP4FilesCb(self, marshalled):
             required_bytes = int((4 * int(self.stream_file["fileSize"])) - calcDiskFree())
             if required_bytes > 0:
                 err = 'Not enough space left on %s! Free at least %i MB.' % (
-                    os.getcwd(), required_bytes/1024/1024
+                    os.getcwd(), required_bytes//1024//1024
                 )
 
         if err:
@@ -2996,7 +2999,7 @@ def streamP4FilesCb(self, marshalled):
 
         # pick up the new file information... for the
         # 'data' field we need to append to our array
-        for k in marshalled.keys():
+        for k in list(marshalled.keys()):
             if k == 'data':
                 if 'streamContentSize' not in self.stream_file:
                     self.stream_file['streamContentSize'] = 0
@@ -3011,8 +3014,8 @@ def streamP4FilesCb(self, marshalled):
             'depotFile' in self.stream_file):
             size = int(self.stream_file["fileSize"])
             if size > 0:
-                progress = 100*self.stream_file['streamContentSize']/size
-                sys.stdout.write('\r%s %d%% (%i MB)' % (self.stream_file['depotFile'], progress, int(size/1024/1024)))
+                progress = 100.0*self.stream_file['streamContentSize']/size
+                sys.stdout.write('\r%s %4.1f%% (%i MB)' % (self.stream_file['depotFile'], progress, int(size//1024//1024)))
                 sys.stdout.flush()
 
         self.stream_have_file_info = True
@@ -3093,7 +3096,7 @@ def streamTag(self, gitStream, labelName, labelDetails, commit, epoch):
 
         gitStream.write("tagger %s\n" % tagger)
 
-        print("labelDetails=",labelDetails)
+        print(("labelDetails=",labelDetails))
         if 'Description' in labelDetails:
             description = labelDetails['Description']
         else:
@@ -3232,7 +3235,7 @@ def getLabels(self):
             self.labels[newestChange] = [output, revisions]
 
         if self.verbose:
-            print("Label changes: %s" % self.labels.keys())
+            print("Label changes: %s" % list(self.labels.keys()))
 
     # Import p4 labels as git tags. A direct mapping does not
     # exist, so assume that if all the files are at the same revision
@@ -3375,7 +3378,7 @@ def getBranchMapping(self):
 
     def getBranchMappingFromGitBranches(self):
         branches = p4BranchesInGit(self.importIntoRemotes)
-        for branch in branches.keys():
+        for branch in list(branches.keys()):
             if branch == "master":
                 branch = "main"
             else:
@@ -3487,14 +3490,14 @@ def importChanges(self, changes, origin_revision=0):
             self.updateOptionDict(description)
 
             if not self.silent:
-                sys.stdout.write("\rImporting revision %s (%s%%)" % (change, cnt * 100 / len(changes)))
+                sys.stdout.write("\rImporting revision %s (%4.1f%%)" % (change, cnt * 100 / len(changes)))
                 sys.stdout.flush()
             cnt = cnt + 1
 
             try:
                 if self.detectBranches:
                     branches = self.splitFilesIntoBranches(description)
-                    for branch in branches.keys():
+                    for branch in list(branches.keys()):
                         ## HACK  --hwn
                         branchPrefix = self.depotPaths[0] + branch + "/"
                         self.branchPrefixes = [ branchPrefix ]
@@ -3683,13 +3686,13 @@ def run(self, args):
                 if short in branches:
                     self.p4BranchesInGit = [ short ]
             else:
-                self.p4BranchesInGit = branches.keys()
+                self.p4BranchesInGit = list(branches.keys())
 
             if len(self.p4BranchesInGit) > 1:
                 if not self.silent:
                     print("Importing from/into multiple branches")
                 self.detectBranches = True
-                for branch in branches.keys():
+                for branch in list(branches.keys()):
                     self.initialParents[self.refPrefix + branch] = \
                         branches[branch]
 
@@ -4073,7 +4076,7 @@ def findLastP4Revision(self, starting_point):
             to find the P4 commit we are based on, and the depot-paths.
         """
 
-        for parent in (range(65535)):
+        for parent in (list(range(65535))):
             log = extractLogMessageFromGitCommit("{0}^{1}".format(starting_point, parent))
             settings = extractSettingsGitLog(log)
             if 'change' in settings:
@@ -4212,7 +4215,7 @@ def printUsage(commands):
 
 def main():
     if len(sys.argv[1:]) == 0:
-        printUsage(commands.keys())
+        printUsage(list(commands.keys()))
         sys.exit(2)
 
     cmdName = sys.argv[1]
@@ -4222,7 +4225,7 @@ def main():
     except KeyError:
         print("unknown command %s" % cmdName)
         print("")
-        printUsage(commands.keys())
+        printUsage(list(commands.keys()))
         sys.exit(2)
 
     options = cmd.options
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 64+ messages in thread

* [PATCH v5 10/15] git-p4: fix assumed path separators to be more Windows friendly
  2019-12-07 17:47       ` [PATCH v5 00/15] " Ben Keene via GitGitGadget
                           ` (8 preceding siblings ...)
  2019-12-07 17:47         ` [PATCH v5 09/15] git-p4: python 3 syntax changes Ben Keene via GitGitGadget
@ 2019-12-07 17:47         ` Ben Keene via GitGitGadget
  2019-12-07 17:47         ` [PATCH v5 11/15] git-p4: add Py23File() - helper class for stream writing Ben Keene via GitGitGadget
                           ` (5 subsequent siblings)
  15 siblings, 0 replies; 64+ messages in thread
From: Ben Keene via GitGitGadget @ 2019-12-07 17:47 UTC (permalink / raw)
  To: git; +Cc: Ben Keene, Junio C Hamano, Ben Keene

From: Ben Keene <seraphire@gmail.com>

When a computer is configured to use Git for windows and Python for
windows, and not a Unix subsystem like cygwin or WSL, the directory
separator changes and causes git-p4 to fail to properly determine paths.

Fix 3 path separator errors:

1. getUserCacheFilename() - should not use string concatenation. Change
   this code to use os.path.join to build an OS tolerant path.

2. defaultDestiantion used the OS.path.split to split depot paths.  This
   is incorrect on windows. Change the code to split on a forward
   slash(/) instead since depot paths use this character regardless  of
   the operating system.

3. The call to isValidGitDir() in the main code also used a literal
   forward slash. Change the code to use os.path.join to correctly
   format the path for the operating system.

These three changes allow the suggested windows configuration to
properly locate files while retaining the existing behavior on
non-windows operating systems.

Signed-off-by: Ben Keene <seraphire@gmail.com>
---
 git-p4.py | 13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/git-p4.py b/git-p4.py
index fc6c9406c2..1838045078 100755
--- a/git-p4.py
+++ b/git-p4.py
@@ -1459,8 +1459,10 @@ def p4UserIsMe(self, p4User):
             return True
 
     def getUserCacheFilename(self):
+        """ Returns the filename of the username cache
+        """
         home = os.environ.get("HOME", os.environ.get("USERPROFILE"))
-        return home + "/.gitp4-usercache.txt"
+        return os.path.join(home, ".gitp4-usercache.txt")
 
     def getUserMapFromPerforceServer(self):
         if self.userMapFromPerforceServer:
@@ -3978,13 +3980,16 @@ def __init__(self):
         self.cloneBare = False
 
     def defaultDestination(self, args):
+        """ Returns the last path component as the default git
+            repository directory name
+        """
         ## TODO: use common prefix of args?
         depotPath = args[0]
         depotDir = re.sub("(@[^@]*)$", "", depotPath)
         depotDir = re.sub("(#[^#]*)$", "", depotDir)
         depotDir = re.sub(r"\.\.\.$", "", depotDir)
         depotDir = re.sub(r"/$", "", depotDir)
-        return os.path.split(depotDir)[1]
+        return depotDir.split('/')[-1]
 
     def run(self, args):
         if len(args) < 1:
@@ -4257,8 +4262,8 @@ def main():
                         chdir(cdup);
 
         if not isValidGitDir(cmd.gitdir):
-            if isValidGitDir(cmd.gitdir + "/.git"):
-                cmd.gitdir += "/.git"
+            if isValidGitDir(os.path.join(cmd.gitdir, ".git")):
+                cmd.gitdir = os.path.join(cmd.gitdir, ".git")
             else:
                 die("fatal: cannot locate git repository at %s" % cmd.gitdir)
 
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 64+ messages in thread

* [PATCH v5 11/15] git-p4: add Py23File() - helper class for stream writing
  2019-12-07 17:47       ` [PATCH v5 00/15] " Ben Keene via GitGitGadget
                           ` (9 preceding siblings ...)
  2019-12-07 17:47         ` [PATCH v5 10/15] git-p4: fix assumed path separators to be more Windows friendly Ben Keene via GitGitGadget
@ 2019-12-07 17:47         ` Ben Keene via GitGitGadget
  2019-12-07 17:47         ` [PATCH v5 12/15] git-p4: p4CmdList - support Unicode encoding Ben Keene via GitGitGadget
                           ` (4 subsequent siblings)
  15 siblings, 0 replies; 64+ messages in thread
From: Ben Keene via GitGitGadget @ 2019-12-07 17:47 UTC (permalink / raw)
  To: git; +Cc: Ben Keene, Junio C Hamano, Ben Keene

From: Ben Keene <seraphire@gmail.com>

This is a preparatory commit that does not change current behavior.
It adds a new class Py23File.

Following the Python recommendation of keeping text as unicode
internally and only converting to and from bytes on input and output,
this class provides an interface for the methods used for reading and
writing files and file like streams.

A new class was implemented to avoid requiring additional dependencies.

Create a class that wraps the input and output functions used by the
git-p4.py code for reading and writing to standard file handles.

The methods of this class should take a Unicode string for writing and
return unicode strings in reads.  This class should be a drop-in for
existing file like streams

The following methods should be coded for supporting existing read/write
calls:
  * write - this should write a Unicode string to the underlying stream
  * read  - this should read from the underlying stream and cast the
            bytes as a unicode string
  * readline - this should read one line of text from the underlying
            stream and cast it as a unicode string
  * readline - this should read a number of lines, optionally hinted,
            and cast each line as a unicode string

The expression "cast as a unicode string" is used because the code
should use the as_bytes() and as_string() functions instead of
cohercing the data to actual unicode strings or bytes.  This allows
Python 2 code to continue to use the internal "str" data type instead
of converting the data back and forth to actual unicode strings. This
retains current Python 2 support while Python 3 support may be
incomplete.

Signed-off-by: Ben Keene <seraphire@gmail.com>
---
 git-p4.py | 66 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 66 insertions(+)

diff --git a/git-p4.py b/git-p4.py
index 1838045078..03829f796d 100755
--- a/git-p4.py
+++ b/git-p4.py
@@ -4187,6 +4187,72 @@ def run(self, args):
             print("%s <= %s (%s)" % (branch, ",".join(settings["depot-paths"]), settings["change"]))
         return True
 
+class Py23File():
+    """ Python2/3 Unicode File Wrapper
+    """
+
+    stream_handle = None
+    verbose       = False
+    debug_handle  = None
+
+    def __init__(self, stream_handle, verbose = False):
+        """ Create a Python3 compliant Unicode to Byte String
+            Windows compatible wrapper
+
+            stream_handle = the underlying file-like handle
+            verbose       = Boolean if content should be echoed
+        """
+        self.stream_handle = stream_handle
+        self.verbose       = verbose
+
+    def write(self, utf8string):
+        """ Writes the utf8 encoded string to the underlying
+            file stream
+        """
+        self.stream_handle.write(as_bytes(utf8string))
+        if self.verbose:
+            sys.stderr.write("Stream Output: %s" % utf8string)
+            sys.stderr.flush()
+
+    def read(self, size = None):
+        """ Reads int charcters from the underlying stream
+            and converts it to utf8.
+
+            Be aware, the size value is for reading the underlying
+            bytes so the value may be incorrect. Usage of the size
+            value is discouraged.
+        """
+        if size == None:
+            return as_string(self.stream_handle.read())
+        else:
+            return as_string(self.stream_handle.read(size))
+
+    def readline(self):
+        """ Reads a line from the underlying byte stream
+            and converts it to utf8
+        """
+        return as_string(self.stream_handle.readline())
+
+    def readlines(self, sizeHint = None):
+        """ Returns a list containing lines from the file converted to unicode.
+
+            sizehint - Optional. If the optional sizehint argument is
+            present, instead of reading up to EOF, whole lines totalling
+            approximately sizehint bytes are read.
+        """
+        lines = self.stream_handle.readlines(sizeHint)
+        for i in range(0, len(lines)):
+            lines[i] = as_string(lines[i])
+        return lines
+
+    def close(self):
+        """ Closes the underlying byte stream """
+        self.stream_handle.close()
+
+    def flush(self):
+        """ Flushes the underlying byte stream """
+        self.stream_handle.flush()
+
 class HelpFormatter(optparse.IndentedHelpFormatter):
     def __init__(self):
         optparse.IndentedHelpFormatter.__init__(self)
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 64+ messages in thread

* [PATCH v5 12/15] git-p4: p4CmdList - support Unicode encoding
  2019-12-07 17:47       ` [PATCH v5 00/15] " Ben Keene via GitGitGadget
                           ` (10 preceding siblings ...)
  2019-12-07 17:47         ` [PATCH v5 11/15] git-p4: add Py23File() - helper class for stream writing Ben Keene via GitGitGadget
@ 2019-12-07 17:47         ` Ben Keene via GitGitGadget
  2019-12-07 17:47         ` [PATCH v5 13/15] git-p4: support Python 3 for basic P4 clone, sync, and submit (t9800) Ben Keene via GitGitGadget
                           ` (3 subsequent siblings)
  15 siblings, 0 replies; 64+ messages in thread
From: Ben Keene via GitGitGadget @ 2019-12-07 17:47 UTC (permalink / raw)
  To: git; +Cc: Ben Keene, Junio C Hamano, Ben Keene

From: Ben Keene <seraphire@gmail.com>

The p4CmdList is a commonly used function in the git-p4 code. It is used
to execute a command in P4 and return the results of the call in a list.

The problem is that p4CmdList takes bytes as the parameter data and
returns bytes in the return list.

Add a new optional parameter to the signature, encode_cmd_output, that
determines if the dictionary values returned in the function output are
treated as bytes or as strings.

Change the code to conditionally pass the output data through the
as_string() function when encode_cmd_output is true. Otherwise the
function should return the data as bytes.

Change the code so that regardless of the setting of encode_cmd_output,
the dictionary keys in the return value will always be encoded with
as_string().

as_string(bytes) is a method defined in this project that treats the
byte data as a string. The word "string" is used because the meaning
varies depending on the version of Python:

  - Python 2: The "bytes" are returned as "str", functionally a No-op.
  - Python 3: The "bytes" are returned as a Unicode string.

The p4CmdList function returns a list of dictionaries that contain
the result of p4 command. If the callback (cb) is defined, the
standard output of the p4 command is redirected.

Data that is passed to the standard input of the P4 process should be
as_bytes() to avoid conversion unicode encoding errors.

as_bytes(text) is a method defined in this project that treats the text
data as a string that should be converted to a byte array (bytes). The
behavior of this function depends on the version of python:

  - Python 2: The "text" is returned as "str", functionally a No-op.
  - Python 3: The "text" is treated as a UTF-8 encoded Unicode string
        and is decoded to bytes.

Additionally, change literal text prior to conversion to be literal
bytes for the code that is evaluating the standard output from the
p4 call.

Add encode_cmd_output to the p4Cmd since this is a helper function that
wraps the behavior of p4CmdList.

Signed-off-by: Ben Keene <seraphire@gmail.com>
---
 git-p4.py | 36 ++++++++++++++++++++++++++++--------
 1 file changed, 28 insertions(+), 8 deletions(-)

diff --git a/git-p4.py b/git-p4.py
index 03829f796d..e8f31339e4 100755
--- a/git-p4.py
+++ b/git-p4.py
@@ -716,7 +716,23 @@ def isModeExecChanged(src_mode, dst_mode):
     return isModeExec(src_mode) != isModeExec(dst_mode)
 
 def p4CmdList(cmd, stdin=None, stdin_mode='w+b', cb=None, skip_info=False,
-        errors_as_exceptions=False):
+        errors_as_exceptions=False, encode_cmd_output=True):
+    """ Executes a P4 command:  'cmd' optionally passing 'stdin' to the command's
+        standard input via a temporary file with 'stdin_mode' mode.
+
+        Output from the command is optionally passed to the callback function 'cb'.
+        If 'cb' is None, the response from the command is parsed into a list
+        of resulting dictionaries. (For each block read from the process pipe.)
+
+        If 'skip_info' is true, information in a block read that has a code type of
+        'info' will be skipped.
+
+        If 'errors_as_exceptions' is set to true (the default is false) the error
+        code returned from the execution will generate an exception.
+
+        If 'encode_cmd_output' is set to true (the default) the data that is returned
+        by this function will be passed through the "as_string" function.
+    """
 
     if not isinstance(cmd, list):
         cmd = "-G " + cmd
@@ -739,7 +755,7 @@ def p4CmdList(cmd, stdin=None, stdin_mode='w+b', cb=None, skip_info=False,
             stdin_file.write(stdin)
         else:
             for i in stdin:
-                stdin_file.write(i + '\n')
+                stdin_file.write(as_bytes(i) + b'\n')
         stdin_file.flush()
         stdin_file.seek(0)
 
@@ -753,12 +769,15 @@ def p4CmdList(cmd, stdin=None, stdin_mode='w+b', cb=None, skip_info=False,
         while True:
             entry = marshal.load(p4.stdout)
             if skip_info:
-                if 'code' in entry and entry['code'] == 'info':
+                if b'code' in entry and entry[b'code'] == b'info':
                     continue
             if cb is not None:
                 cb(entry)
             else:
-                result.append(entry)
+                out = {}
+                for key, value in entry.items():
+                    out[as_string(key)] = (as_string(value) if encode_cmd_output else value)
+                result.append(out)
     except EOFError:
         pass
     exitCode = p4.wait()
@@ -785,8 +804,9 @@ def p4CmdList(cmd, stdin=None, stdin_mode='w+b', cb=None, skip_info=False,
 
     return result
 
-def p4Cmd(cmd):
-    list = p4CmdList(cmd)
+def p4Cmd(cmd, encode_cmd_output=True):
+    """Executes a P4 command and returns the results in a dictionary"""
+    list = p4CmdList(cmd, encode_cmd_output=encode_cmd_output)
     result = {}
     for entry in list:
         result.update(entry)
@@ -1165,7 +1185,7 @@ def getClientSpec():
     """Look at the p4 client spec, create a View() object that contains
        all the mappings, and return it."""
 
-    specList = p4CmdList("client -o")
+    specList = p4CmdList("client -o", encode_cmd_output=False)
     if len(specList) != 1:
         die('Output from "client -o" is %d lines, expecting 1' %
             len(specList))
@@ -2609,7 +2629,7 @@ def update_client_spec_path_cache(self, files):
         if len(fileArgs) == 0:
             return  # All files in cache
 
-        where_result = p4CmdList(["-x", "-", "where"], stdin=fileArgs)
+        where_result = p4CmdList(["-x", "-", "where"], stdin=fileArgs, encode_cmd_output=False)
         for res in where_result:
             if "code" in res and res["code"] == "error":
                 # assume error is "... file(s) not in client view"
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 64+ messages in thread

* [PATCH v5 13/15] git-p4: support Python 3 for basic P4 clone, sync, and submit (t9800)
  2019-12-07 17:47       ` [PATCH v5 00/15] " Ben Keene via GitGitGadget
                           ` (11 preceding siblings ...)
  2019-12-07 17:47         ` [PATCH v5 12/15] git-p4: p4CmdList - support Unicode encoding Ben Keene via GitGitGadget
@ 2019-12-07 17:47         ` Ben Keene via GitGitGadget
  2019-12-07 17:47         ` [PATCH v5 14/15] git-p4: added --encoding parameter to p4 clone Ben Keene via GitGitGadget
                           ` (2 subsequent siblings)
  15 siblings, 0 replies; 64+ messages in thread
From: Ben Keene via GitGitGadget @ 2019-12-07 17:47 UTC (permalink / raw)
  To: git; +Cc: Ben Keene, Junio C Hamano, Ben Keene

From: Ben Keene <seraphire@gmail.com>

NOTE: Python 3 is still not properly supported for any use with the
git-p4 python code.

Warning - this is a very large atomic commit.  The commit text is also
very large.

Change the code such that, with the exception of P4 depot paths and
depot files, all text read by git-p4 is cast as a string as soon as
possible and converted back to bytes as late as possible, following
Python 2 to Python 3 conversion best practices.

Important: Do not cast the bytes that contain the p4 depot path or p4
depot file name.  These should be left as bytes until used.

These two values should not be converted because the encoding of these
values is unknown.  git-p4 supports a configuration value
git-p4.pathEncoding that is used by the encodeWithUTF8() to determine
what a UTF8 version of the path and filename should be. However, since
depot path and depot filename need to be sent to P4 in their original
encoding, they will be left as byte streams until they are actually
used:

  * When sent to P4, the bytes are literally passed to the p4 command
  * When displayed in text for the user, they should be passed through
    the path_as_string() function
  * When used by GIT they should be passed through the encodeWithUTF8()
    function

Change all the rest of system calls to cast output from system calls
(stdin) as_bytes() and input (stdout) as_string().  This retains
existing Python 2 support, and adds python 3 support for these
functions:

 * read_pipe_full(c)
 * read_pipe_lines(c)
 * p4_has_move_command() - used internally
 * gitConfig(key, typeSpecifier=None)
 * branch_exists(branch)
 * GitLFS.generatePointer(cloneDestination, contentFile)
 * P4Submit.applyCommit(id) - template must be read and written to the
       temporary file as_bytes() since it is created in memory as a
       string.
 * P4Sync.streamOneP4File(file, contents) - wrap calls to the depotFile
       in path_as_string() for display. The file contents must be
       retained as bytes, so update the RCS changes to be forced to
       bytes.
 * P4Sync.streamP4Files(marshalled)
 * P4Sync.importHeadRevision(revision) - encode the depotPaths for
       display separate from the text for processing.

Py23File usage -

Change the P4Sync.OpenStreams() function to cast the gitOutput,
gitStream, and gitError streams as Py23File() wrapper classes.
This facilitates taking strings in both python 2 and python 3 and
casting them to bytes in the wrapper class instead of having to modify
each method. Since the fast-import command also expects a raw byte
stream for file content, add a new stream handle - gitStreamBytes which
is an unwrapped verison of gitStream.

Literal text -
Depending on context, most literal text does not need casting to unicode
or bytes as the text is Python dependent - In Python 2, the string is
implied as 'str' and python 3 the string is implied as 'unicode'. Under
these conditions, they match the rest of the operating text, following
best practices.  However, when a literal string is used in functions
that are dealing with the raw input from and raw ouput to files streams,
literal bytes may be required. Additionally, functions that are dealing
with P4 depot paths or P4 depot file names are also dealing with bytes
and will require the same casting as bytes.  The following functions
cast text as byte strings:

 * wildcard_decode(path) - the path parameter is a P4 depot and is
       bytes. Cast all the literals to bytes.
 * wildcard_encode(path) - the path parameter is a P4 depot and is
       bytes. Cast all the literals to bytes.
 * P4Sync.streamP4FilesCb(marshalled) - the marshalled data is in bytes.
       Cast the literals as bytes. When using this data to manipulate
       self.stream_file, encode all the marshalled data except for the
       'depotFile' name.
 * P4Sync.streamP4Files(marshalled)

Special behavior:

 * p4_describep4_describe(change, shelved=False) - encoding is disabled
       for the depotFile(x) and path elements since these are depot path
       and depo filenames.
 * p4PathStartsWith(path, prefix) - Since P4 depot paths can contain
       non-UTF-8 encoded strings, change this method to compare paths
       while supporting the optional encoding.

        - First, perform a byte-to-byte check to see if the path and
              prefix are both identical text.  There is no need to
              perform encoding conversions if the text is identical.
        - If the byte check fails, pass both the path and prefix through
              encodeWithUTF8() to ensure both paths are using the same
              encoding. Then perform the test as originally written.

 * P4Submit.patchRCSKeywords(file, pattern) - the parameters of file and
       pattern are both strings. However this function changes the
       contents of the file itentified by name "file". Treat the content
       of this file as binary to ensure that python does not accidently
       change the original encoding. The regular expression is cast
       as_bytes() and run against the file as_bytes(). The P4 keywords
       are ASCII strings and cannot span lines so iterating over each
       line of the file is acceptable.
 * P4Sync.writeToGitStream(gitMode, relPath, contents) - Since
       'contents' is already bytes data, instead of using the
       self.gitStream, use the new self.gitStreamBytes - the unwrapped
       gitStream that does not cast as_bytes() the binary data.
 * P4Sync.commit(details, files, branch, parent = "", allow_empty=False)
       Changed the encoding for the commit message to the preferred
       format for fast-import. The number of bytes is sent in the data
       block instead of using the EOT marker.

 * Change the code for handling the user cache to use binary files.
       Cast text as_bytes() when writing to the cache and as_string()
       when reading from the cache.  This makes the reading and writing
       of the cache determinstic in it's encoding. Unlike file paths,
       P4 encodes the user names in UTF-8 encoding so no additional
       string encoding is required.

Signed-off-by: Ben Keene <seraphire@gmail.com>
---
 git-p4.py | 285 ++++++++++++++++++++++++++++++++++++++----------------
 1 file changed, 203 insertions(+), 82 deletions(-)

diff --git a/git-p4.py b/git-p4.py
index e8f31339e4..9cf4e94e28 100755
--- a/git-p4.py
+++ b/git-p4.py
@@ -273,6 +273,8 @@ def read_pipe_full(c):
     expand = not isinstance(c, list)
     p = subprocess.Popen(c, stdout=subprocess.PIPE, stderr=subprocess.PIPE, shell=expand)
     (out, err) = p.communicate()
+    out = as_string(out)
+    err = as_string(err)
     return (p.returncode, out, err)
 
 def read_pipe(c, ignore_error=False):
@@ -299,10 +301,17 @@ def read_pipe_text(c):
         return out.rstrip()
 
 def p4_read_pipe(c, ignore_error=False):
+    """ Read output from the P4 command 'c'. Returns the output text on
+        success. On failure, terminates execution, unless
+        ignore_error is True, when it returns an empty string.
+    """
     real_cmd = p4_build_cmd(c)
     return read_pipe(real_cmd, ignore_error)
 
 def read_pipe_lines(c):
+    """ Returns a list of text from executing the command 'c'.
+        The program will die if the command fails to execute.
+    """
     if verbose:
         sys.stderr.write('Reading pipe: %s\n' % str(c))
 
@@ -312,6 +321,11 @@ def read_pipe_lines(c):
     val = pipe.readlines()
     if pipe.close() or p.wait():
         die('Command failed: %s' % str(c))
+    # Unicode conversion from byte-string
+    # Iterate and fix in-place to avoid a second list in memory.
+    if isunicode:
+        for i in range(len(val)):
+            val[i] = as_string(val[i])
 
     return val
 
@@ -340,6 +354,8 @@ def p4_has_move_command():
     cmd = p4_build_cmd(["move", "-k", "@from", "@to"])
     p = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
     (out, err) = p.communicate()
+    out=as_string(out)
+    err=as_string(err)
     # return code will be 1 in either case
     if err.find("Invalid option") >= 0:
         return False
@@ -467,16 +483,20 @@ def p4_last_change():
     return int(results[0]['change'])
 
 def p4_describe(change, shelved=False):
-    """Make sure it returns a valid result by checking for
-       the presence of field "time".  Return a dict of the
-       results."""
+    """ Returns information about the requested P4 change list.
+
+        Data returned is not string encoded (returned as bytes)
+    """
+    # Make sure it returns a valid result by checking for
+    #   the presence of field "time".  Return a dict of the
+    #   results.
 
     cmd = ["describe", "-s"]
     if shelved:
         cmd += ["-S"]
     cmd += [str(change)]
 
-    ds = p4CmdList(cmd, skip_info=True)
+    ds = p4CmdList(cmd, skip_info=True, encode_cmd_output=False)
     if len(ds) != 1:
         die("p4 describe -s %d did not return 1 result: %s" % (change, str(ds)))
 
@@ -486,12 +506,23 @@ def p4_describe(change, shelved=False):
         die("p4 describe -s %d exited with %d: %s" % (change, d["p4ExitCode"],
                                                       str(d)))
     if "code" in d:
-        if d["code"] == "error":
+        if d["code"] == b"error":
             die("p4 describe -s %d returned error code: %s" % (change, str(d)))
 
     if "time" not in d:
         die("p4 describe -s %d returned no \"time\": %s" % (change, str(d)))
 
+    # Do not convert 'depotFile(X)' or 'path' to be UTF-8 encoded, however
+    # cast as_string() the rest of the text.
+    keys=d.keys()
+    for key in keys:
+        if key.startswith('depotFile'):
+            d[key]=d[key]
+        elif key == 'path':
+            d[key]=d[key]
+        else:
+            d[key] = as_string(d[key])
+
     return d
 
 #
@@ -914,13 +945,15 @@ def gitDeleteRef(ref):
 _gitConfig = {}
 
 def gitConfig(key, typeSpecifier=None):
+    """ Return a configuration setting from GIT
+	"""
     if key not in _gitConfig:
         cmd = [ "git", "config" ]
         if typeSpecifier:
             cmd += [ typeSpecifier ]
         cmd += [ key ]
         s = read_pipe(cmd, ignore_error=True)
-        _gitConfig[key] = s.strip()
+        _gitConfig[key] = as_string(s).strip()
     return _gitConfig[key]
 
 def gitConfigBool(key):
@@ -994,6 +1027,7 @@ def branch_exists(branch):
     cmd = [ "git", "rev-parse", "--symbolic", "--verify", branch ]
     p = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
     out, _ = p.communicate()
+    out = as_string(out)
     if p.returncode:
         return False
     # expect exactly one line of output: the branch name
@@ -1177,9 +1211,22 @@ def p4PathStartsWith(path, prefix):
     #
     # we may or may not have a problem. If you have core.ignorecase=true,
     # we treat DirA and dira as the same directory
+
+    # Since we have to deal with mixed encodings for p4 file
+    # paths, first perform a simple startswith check, this covers
+    # the case that the formats and path are identical.
+    if as_bytes(path).startswith(as_bytes(prefix)):
+        return True
+
+    # attempt to convert the prefix and path both to utf8
+    path_utf8 = encodeWithUTF8(path)
+    prefix_utf8 = encodeWithUTF8(prefix)
+
     if gitConfigBool("core.ignorecase"):
-        return path.lower().startswith(prefix.lower())
-    return path.startswith(prefix)
+        # Check if we match byte-per-byte.
+
+        return path_utf8.lower().startswith(prefix_utf8.lower())
+    return path_utf8.startswith(prefix_utf8)
 
 def getClientSpec():
     """Look at the p4 client spec, create a View() object that contains
@@ -1235,18 +1282,24 @@ def wildcard_decode(path):
     # Cannot have * in a filename in windows; untested as to
     # what p4 would do in such a case.
     if not platform.system() == "Windows":
-        path = path.replace("%2A", "*")
-    path = path.replace("%23", "#") \
-               .replace("%40", "@") \
-               .replace("%25", "%")
+        path = path.replace(b"%2A", b"*")
+    path = path.replace(b"%23", b"#") \
+               .replace(b"%40", b"@") \
+               .replace(b"%25", b"%")
     return path
 
 def wildcard_encode(path):
     # do % first to avoid double-encoding the %s introduced here
-    path = path.replace("%", "%25") \
-               .replace("*", "%2A") \
-               .replace("#", "%23") \
-               .replace("@", "%40")
+    if isinstance(path, unicode):
+        path = path.replace("%", "%25") \
+                   .replace("*", "%2A") \
+                   .replace("#", "%23") \
+                   .replace("@", "%40")
+    else:
+        path = path.replace(b"%", b"%25") \
+                   .replace(b"*", b"%2A") \
+                   .replace(b"#", b"%23") \
+                   .replace(b"@", b"%40")
     return path
 
 def wildcard_present(path):
@@ -1378,7 +1431,7 @@ def generatePointer(self, contentFile):
             ['git', 'lfs', 'pointer', '--file=' + contentFile],
             stdout=subprocess.PIPE
         )
-        pointerFile = pointerProcess.stdout.read()
+        pointerFile = as_string(pointerProcess.stdout.read())
         if pointerProcess.wait():
             os.remove(contentFile)
             die('git-lfs pointer command failed. Did you install the extension?')
@@ -1485,6 +1538,8 @@ def getUserCacheFilename(self):
         return os.path.join(home, ".gitp4-usercache.txt")
 
     def getUserMapFromPerforceServer(self):
+        """ Creates the usercache from the data in P4.
+        """
         if self.userMapFromPerforceServer:
             return
         self.users = {}
@@ -1510,18 +1565,22 @@ def getUserMapFromPerforceServer(self):
         for (key, val) in list(self.users.items()):
             s += "%s\t%s\n" % (key.expandtabs(1), val.expandtabs(1))
 
-        open(self.getUserCacheFilename(), "wb").write(s)
+        cache = io.open(self.getUserCacheFilename(), "wb")
+        cache.write(as_bytes(s))
+        cache.close()
         self.userMapFromPerforceServer = True
 
     def loadUserMapFromCache(self):
+        """ Reads the P4 username to git email map
+        """
         self.users = {}
         self.userMapFromPerforceServer = False
         try:
-            cache = open(self.getUserCacheFilename(), "rb")
+            cache = io.open(self.getUserCacheFilename(), "rb")
             lines = cache.readlines()
             cache.close()
             for line in lines:
-                entry = line.strip().split("\t")
+                entry = as_string(line).strip().split("\t")
                 self.users[entry[0]] = entry[1]
         except IOError:
             self.getUserMapFromPerforceServer()
@@ -1721,21 +1780,27 @@ def prepareLogMessage(self, template, message, jobs):
         return result
 
     def patchRCSKeywords(self, file, pattern):
-        # Attempt to zap the RCS keywords in a p4 controlled file matching the given pattern
+        """ Attempt to zap the RCS keywords in a p4
+            controlled file matching the given pattern
+        """
+        bSubLine = as_bytes(r'$\1$')
         (handle, outFileName) = tempfile.mkstemp(dir='.')
         try:
-            outFile = os.fdopen(handle, "w+")
-            inFile = open(file, "r")
-            regexp = re.compile(pattern, re.VERBOSE)
+            outFile = os.fdopen(handle, "w+b")
+            inFile = open(file, "rb")
+            regexp = re.compile(as_bytes(pattern), re.VERBOSE)
             for line in inFile.readlines():
-                line = regexp.sub(r'$\1$', line)
+                line = regexp.sub(bSubLine, line)
                 outFile.write(line)
             inFile.close()
             outFile.close()
+            outFile = None
             # Forcibly overwrite the original file
             os.unlink(file)
             shutil.move(outFileName, file)
         except:
+            if outFile != None:
+                outFile.close()
             # cleanup our temporary file
             os.unlink(outFileName)
             print("Failed to strip RCS keywords in %s" % file)
@@ -2139,7 +2204,7 @@ def applyCommit(self, id):
         tmpFile = os.fdopen(handle, "w+b")
         if self.isWindows:
             submitTemplate = submitTemplate.replace("\n", "\r\n")
-        tmpFile.write(submitTemplate)
+        tmpFile.write(as_bytes(submitTemplate))
         tmpFile.close()
 
         if self.prepare_p4_only:
@@ -2189,8 +2254,8 @@ def applyCommit(self, id):
                 message = tmpFile.read()
                 tmpFile.close()
                 if self.isWindows:
-                    message = message.replace("\r\n", "\n")
-                submitTemplate = message[:message.index(separatorLine)]
+                    message = message.replace(b"\r\n", b"\n")
+                submitTemplate = message[:message.index(as_bytes(separatorLine))]
 
                 if update_shelve:
                     p4_write_pipe(['shelve', '-r', '-i'], submitTemplate)
@@ -2833,8 +2898,11 @@ def stripRepoPath(self, path, prefixes):
         return path
 
     def splitFilesIntoBranches(self, commit):
-        """Look at each depotFile in the commit to figure out to what
-           branch it belongs."""
+        """ Look at each depotFile in the commit to figure out to what
+            branch it belongs.
+
+            Data in the commit will NOT be encoded
+        """
 
         if self.clientSpecDirs:
             files = self.extractFilesFromCommit(commit)
@@ -2875,16 +2943,22 @@ def splitFilesIntoBranches(self, commit):
         return branches
 
     def writeToGitStream(self, gitMode, relPath, contents):
-        self.gitStream.write('M %s inline %s\n' % (gitMode, relPath))
+        """ Writes the bytes[] 'contents' to the git fast-import
+            with the given 'gitMode' and 'relPath' as the relative
+            path.
+        """
+        self.gitStream.write('M %s inline %s\n' % (gitMode, as_string(relPath)))
         self.gitStream.write('data %d\n' % sum(len(d) for d in contents))
         for d in contents:
-            self.gitStream.write(d)
+            self.gitStreamBytes.write(d)
         self.gitStream.write('\n')
 
-    # output one file from the P4 stream
-    # - helper for streamP4Files
-
     def streamOneP4File(self, file, contents):
+        """ output one file from the P4 stream to the git inbound stream.
+            helper for streamP4files.
+
+            contents should be a bytes (bytes)
+        """
         relPath = self.stripRepoPath(file['depotFile'], self.branchPrefixes)
         relPath = encodeWithUTF8(relPath, self.verbose)
         if verbose:
@@ -2892,7 +2966,7 @@ def streamOneP4File(self, file, contents):
                 size = int(self.stream_file['fileSize'])
             else:
                 size = 0 # deleted files don't get a fileSize apparently
-            sys.stdout.write('\r%s --> %s (%i MB)\n' % (file['depotFile'], relPath, size//1024//1024))
+            sys.stdout.write('\r%s --> %s (%i MB)\n' % (path_as_string(file['depotFile']), as_string(relPath), size//1024//1024))
             sys.stdout.flush()
 
         (type_base, type_mods) = split_p4_type(file["type"])
@@ -2910,7 +2984,7 @@ def streamOneP4File(self, file, contents):
                 # to nothing.  This causes p4 errors when checking out such
                 # a change, and errors here too.  Work around it by ignoring
                 # the bad symlink; hopefully a future change fixes it.
-                print("\nIgnoring empty symlink in %s" % file['depotFile'])
+                print("\nIgnoring empty symlink in %s" % path_as_string(file['depotFile']))
                 return
             elif data[-1] == '\n':
                 contents = [data[:-1]]
@@ -2950,16 +3024,16 @@ def streamOneP4File(self, file, contents):
             # Ideally, someday, this script can learn how to generate
             # appledouble files directly and import those to git, but
             # non-mac machines can never find a use for apple filetype.
-            print("\nIgnoring apple filetype file %s" % file['depotFile'])
+            print("\nIgnoring apple filetype file %s" % path_as_string(file['depotFile']))
             return
 
         # Note that we do not try to de-mangle keywords on utf16 files,
         # even though in theory somebody may want that.
-        pattern = p4_keywords_regexp_for_type(type_base, type_mods)
+        pattern = as_bytes(p4_keywords_regexp_for_type(type_base, type_mods))
         if pattern:
             regexp = re.compile(pattern, re.VERBOSE)
-            text = ''.join(contents)
-            text = regexp.sub(r'$\1$', text)
+            text = b''.join(contents)
+            text = regexp.sub(as_bytes(r'$\1$'), text)
             contents = [ text ]
 
         if self.largeFileSystem:
@@ -2978,15 +3052,19 @@ def streamOneP4Deletion(self, file):
         if self.largeFileSystem and self.largeFileSystem.isLargeFile(relPath):
             self.largeFileSystem.removeLargeFile(relPath)
 
-    # handle another chunk of streaming data
     def streamP4FilesCb(self, marshalled):
+        """ Callback function for recording P4 chunks of data for streaming
+            into GIT.
+
+            marshalled data is bytes[] from the caller
+        """
 
         # catch p4 errors and complain
         err = None
-        if "code" in marshalled:
-            if marshalled["code"] == "error":
-                if "data" in marshalled:
-                    err = marshalled["data"].rstrip()
+        if b"code" in marshalled:
+            if marshalled[b"code"] == b"error":
+                if b"data" in marshalled:
+                    err = marshalled[b"data"].rstrip()
 
         if not err and 'fileSize' in self.stream_file:
             required_bytes = int((4 * int(self.stream_file["fileSize"])) - calcDiskFree())
@@ -3008,11 +3086,11 @@ def streamP4FilesCb(self, marshalled):
             # ignore errors, but make sure it exits first
             self.importProcess.wait()
             if f:
-                die("Error from p4 print for %s: %s" % (f, err))
+                die("Error from p4 print for %s: %s" % (path_as_string(f), err))
             else:
                 die("Error from p4 print: %s" % err)
 
-        if 'depotFile' in marshalled and self.stream_have_file_info:
+        if b'depotFile' in marshalled and self.stream_have_file_info:
             # start of a new file - output the old one first
             self.streamOneP4File(self.stream_file, self.stream_contents)
             self.stream_file = {}
@@ -3022,13 +3100,16 @@ def streamP4FilesCb(self, marshalled):
         # pick up the new file information... for the
         # 'data' field we need to append to our array
         for k in list(marshalled.keys()):
-            if k == 'data':
+            if k == b'data':
                 if 'streamContentSize' not in self.stream_file:
                     self.stream_file['streamContentSize'] = 0
-                self.stream_file['streamContentSize'] += len(marshalled['data'])
-                self.stream_contents.append(marshalled['data'])
+                self.stream_file['streamContentSize'] += len(marshalled[b'data'])
+                self.stream_contents.append(marshalled[b'data'])
             else:
-                self.stream_file[k] = marshalled[k]
+                if k == b'depotFile':
+                    self.stream_file[as_string(k)] = marshalled[k]
+                else:
+                    self.stream_file[as_string(k)] = as_string(marshalled[k])
 
         if (verbose and
             'streamContentSize' in self.stream_file and
@@ -3037,13 +3118,14 @@ def streamP4FilesCb(self, marshalled):
             size = int(self.stream_file["fileSize"])
             if size > 0:
                 progress = 100.0*self.stream_file['streamContentSize']/size
-                sys.stdout.write('\r%s %4.1f%% (%i MB)' % (self.stream_file['depotFile'], progress, int(size//1024//1024)))
+                sys.stdout.write('\r%s %4.1f%% (%i MB)' % (path_as_string(self.stream_file['depotFile']), progress, int(size//1024//1024)))
                 sys.stdout.flush()
 
         self.stream_have_file_info = True
 
-    # Stream directly from "p4 files" into "git fast-import"
     def streamP4Files(self, files):
+        """ Stream directly from "p4 files" into "git fast-import"
+        """
         filesForCommit = []
         filesToRead = []
         filesToDelete = []
@@ -3064,7 +3146,7 @@ def streamP4Files(self, files):
             self.stream_contents = []
             self.stream_have_file_info = False
 
-            # curry self argument
+            # Callback for P4 command to collect file content
             def streamP4FilesCbSelf(entry):
                 self.streamP4FilesCb(entry)
 
@@ -3073,9 +3155,9 @@ def streamP4FilesCbSelf(entry):
                 if 'shelved_cl' in f:
                     # Handle shelved CLs using the "p4 print file@=N" syntax to print
                     # the contents
-                    fileArg = '%s@=%d' % (f['path'], f['shelved_cl'])
+                    fileArg = b'%s@=%d' % (f['path'], as_bytes(f['shelved_cl']))
                 else:
-                    fileArg = '%s#%s' % (f['path'], f['rev'])
+                    fileArg = b'%s#%s' % (f['path'], as_bytes(f['rev']))
 
                 fileArgs.append(fileArg)
 
@@ -3095,7 +3177,7 @@ def make_email(self, userid):
 
     def streamTag(self, gitStream, labelName, labelDetails, commit, epoch):
         """ Stream a p4 tag.
-        commit is either a git commit, or a fast-import mark, ":<p4commit>"
+            commit is either a git commit, or a fast-import mark, ":<p4commit>"
         """
 
         if verbose:
@@ -3167,7 +3249,22 @@ def commit(self, details, files, branch, parent = "", allow_empty=False):
                 .format(details['change']))
             return
 
+        # fast-import:
+        #'commit' SP <ref> LF
+	    #mark?
+	    #original-oid?
+	    #('author' (SP <name>)? SP LT <email> GT SP <when> LF)?
+	    #'committer' (SP <name>)? SP LT <email> GT SP <when> LF
+	    #('encoding' SP <encoding>)?
+	    #data
+	    #('from' SP <commit-ish> LF)?
+	    #('merge' SP <commit-ish> LF)*
+	    #(filemodify | filedelete | filecopy | filerename | filedeleteall | notemodify)*
+	    #LF?
+
+        #'commit' - <ref> is the name of the branch to make the commit on
         self.gitStream.write("commit %s\n" % branch)
+        #'mark' SP :<idnum>
         self.gitStream.write("mark :%s\n" % details["change"])
         self.committedChanges.add(int(details["change"]))
         committer = ""
@@ -3177,19 +3274,29 @@ def commit(self, details, files, branch, parent = "", allow_empty=False):
 
         self.gitStream.write("committer %s\n" % committer)
 
-        self.gitStream.write("data <<EOT\n")
-        self.gitStream.write(details["desc"])
+        # Per https://git-scm.com/docs/git-fast-import
+        # The preferred method for creating the commit message is to supply the
+        # byte count in the data method and not to use a Delimited format.
+        # Collect all the text in the commit message into a single string and
+        # compute the byte count.
+        commitText = details["desc"]
         if len(jobs) > 0:
-            self.gitStream.write("\nJobs: %s" % (' '.join(jobs)))
-
+            commitText += "\nJobs: %s" % (' '.join(jobs))
         if not self.suppress_meta_comment:
-            self.gitStream.write("\n[git-p4: depot-paths = \"%s\": change = %s" %
-                                (','.join(self.branchPrefixes), details["change"]))
-            if len(details['options']) > 0:
-                self.gitStream.write(": options = %s" % details['options'])
-            self.gitStream.write("]\n")
+            # coherce the path to the correct formatting in the branch prefixes as well.
+            dispPaths = []
+            for p in self.branchPrefixes:
+                dispPaths += [path_as_string(p)]
 
-        self.gitStream.write("EOT\n\n")
+            commitText += ("\n[git-p4: depot-paths = \"%s\": change = %s" %
+                                (','.join(dispPaths), details["change"]))
+            if len(details['options']) > 0:
+                commitText += (": options = %s" % details['options'])
+            commitText += "]"
+        commitText += "\n"
+        self.gitStream.write("data %s\n" % len(as_bytes(commitText)))
+        self.gitStream.write(commitText)
+        self.gitStream.write("\n")
 
         if len(parent) > 0:
             if self.verbose:
@@ -3596,30 +3703,35 @@ def sync_origin_only(self):
                 system("git fetch origin")
 
     def importHeadRevision(self, revision):
-        print("Doing initial import of %s from revision %s into %s" % (' '.join(self.depotPaths), revision, self.branch))
-
+        # Re-encode depot text
+        dispPaths = []
+        utf8Paths = []
+        for p in self.depotPaths:
+            dispPaths += [path_as_string(p)]
+        print("Doing initial import of %s from revision %s into %s" % (' '.join(dispPaths), revision, self.branch))
         details = {}
         details["user"] = "git perforce import user"
-        details["desc"] = ("Initial import of %s from the state at revision %s\n"
-                           % (' '.join(self.depotPaths), revision))
+        details["desc"] = ("Initial import of %s from the state at revision %s\n" %
+                           (' '.join(dispPaths), revision))
         details["change"] = revision
         newestRevision = 0
+        del dispPaths
 
         fileCnt = 0
         fileArgs = ["%s...%s" % (p,revision) for p in self.depotPaths]
 
-        for info in p4CmdList(["files"] + fileArgs):
+        for info in p4CmdList(["files"] + fileArgs, encode_cmd_output=False):
 
-            if 'code' in info and info['code'] == 'error':
+            if 'code' in info and info['code'] == b'error':
                 sys.stderr.write("p4 returned an error: %s\n"
-                                 % info['data'])
-                if info['data'].find("must refer to client") >= 0:
+                                 % as_string(info['data']))
+                if info['data'].find(b"must refer to client") >= 0:
                     sys.stderr.write("This particular p4 error is misleading.\n")
                     sys.stderr.write("Perhaps the depot path was misspelled.\n");
                     sys.stderr.write("Depot path:  %s\n" % " ".join(self.depotPaths))
                 sys.exit(1)
             if 'p4ExitCode' in info:
-                sys.stderr.write("p4 exitcode: %s\n" % info['p4ExitCode'])
+                sys.stderr.write("p4 exitcode: %s\n" % as_string(info['p4ExitCode']))
                 sys.exit(1)
 
 
@@ -3632,8 +3744,10 @@ def importHeadRevision(self, revision):
                 #fileCnt = fileCnt + 1
                 continue
 
+            # Save all the file information, howerver do not translate the depotFile name at
+            # this time. Leave that as bytes since the encoding may vary.
             for prop in ["depotFile", "rev", "action", "type" ]:
-                details["%s%s" % (prop, fileCnt)] = info[prop]
+                details["%s%s" % (prop, fileCnt)] = (info[prop] if prop == "depotFile" else as_string(info[prop]))
 
             fileCnt = fileCnt + 1
 
@@ -3653,13 +3767,18 @@ def importHeadRevision(self, revision):
             print(self.gitError.read())
 
     def openStreams(self):
+        """ Opens the fast import pipes.  Note that the git* streams are wrapped
+            to expect Unicode text.  To send a raw byte Array, use the importProcess
+            underlying port
+        """
         self.importProcess = subprocess.Popen(["git", "fast-import"],
                                               stdin=subprocess.PIPE,
                                               stdout=subprocess.PIPE,
                                               stderr=subprocess.PIPE);
-        self.gitOutput = self.importProcess.stdout
-        self.gitStream = self.importProcess.stdin
-        self.gitError = self.importProcess.stderr
+        self.gitOutput = Py23File(self.importProcess.stdout, verbose = self.verbose)
+        self.gitStream = Py23File(self.importProcess.stdin, verbose = self.verbose)
+        self.gitError = Py23File(self.importProcess.stderr, verbose = self.verbose)
+        self.gitStreamBytes = self.importProcess.stdin
 
     def closeStreams(self):
         self.gitStream.close()
@@ -4025,15 +4144,17 @@ def run(self, args):
             self.cloneDestination = depotPaths[-1]
             depotPaths = depotPaths[:-1]
 
+        dispPaths = []
         for p in depotPaths:
             if not p.startswith("//"):
                 sys.stderr.write('Depot paths must start with "//": %s\n' % p)
                 return False
+            dispPaths += [path_as_string(p)]
 
         if not self.cloneDestination:
             self.cloneDestination = self.defaultDestination(args)
 
-        print("Importing from %s into %s" % (', '.join(depotPaths), self.cloneDestination))
+        print("Importing from %s into %s" % (', '.join(dispPaths), path_as_string(self.cloneDestination)))
 
         if not os.path.exists(self.cloneDestination):
             os.makedirs(self.cloneDestination)
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 64+ messages in thread

* [PATCH v5 14/15] git-p4: added --encoding parameter to p4 clone
  2019-12-07 17:47       ` [PATCH v5 00/15] " Ben Keene via GitGitGadget
                           ` (12 preceding siblings ...)
  2019-12-07 17:47         ` [PATCH v5 13/15] git-p4: support Python 3 for basic P4 clone, sync, and submit (t9800) Ben Keene via GitGitGadget
@ 2019-12-07 17:47         ` Ben Keene via GitGitGadget
  2019-12-07 17:47         ` [PATCH v5 15/15] git-p4: Add depot manipulation functions Ben Keene via GitGitGadget
  2019-12-07 19:47         ` [PATCH v5 00/15] git-p4.py: Cast byte strings to unicode strings in python3 Jeff King
  15 siblings, 0 replies; 64+ messages in thread
From: Ben Keene via GitGitGadget @ 2019-12-07 17:47 UTC (permalink / raw)
  To: git; +Cc: Ben Keene, Junio C Hamano, Ben Keene

From: Ben Keene <seraphire@gmail.com>

The test t9822 did not have any tests that had encoded a directory name
in ISO8859-1.

Additionally, to make it easier for the user to clone new repositories
with a non-UTF-8 encoded path in P4, add a new parameter to p4clone
"--encoding" that sets the

Add new tests that use ISO8859-1 encoded text in both the directory and
file names.  git-p4.pathEncoding.

Update the View class in the git-p4 code to properly cast text
as_string() except for depot path and filenames.

Update the documentation to include the new command line parameter for
p4clone

Signed-off-by: Ben Keene <seraphire@gmail.com>
---
 Documentation/git-p4.txt        |   5 ++
 git-p4.py                       |  57 +++++++++++++-----
 t/t9822-git-p4-path-encoding.sh | 101 ++++++++++++++++++++++++++++++++
 3 files changed, 147 insertions(+), 16 deletions(-)

diff --git a/Documentation/git-p4.txt b/Documentation/git-p4.txt
index 3494a1db3e..8fb844fc49 100644
--- a/Documentation/git-p4.txt
+++ b/Documentation/git-p4.txt
@@ -305,6 +305,11 @@ options described above.
 --bare::
 	Perform a bare clone.  See linkgit:git-clone[1].
 
+--encoding <encoding>::
+    Optionally sets the git-p4.pathEncoding configuration value in
+	the newly created Git repository before files are synchronized
+	from P4. See git-p4.pathEncoding for more information.
+
 Submit options
 ~~~~~~~~~~~~~~
 These options can be used to modify 'git p4 submit' behavior.
diff --git a/git-p4.py b/git-p4.py
index 9cf4e94e28..16f29aae41 100755
--- a/git-p4.py
+++ b/git-p4.py
@@ -1241,7 +1241,7 @@ def getClientSpec():
     entry = specList[0]
 
     # the //client/ name
-    client_name = entry["Client"]
+    client_name = as_string(entry["Client"])
 
     # just the keys that start with "View"
     view_keys = [ k for k in list(entry.keys()) if k.startswith("View") ]
@@ -2625,19 +2625,25 @@ def run(self, args):
         return True
 
 class View(object):
-    """Represent a p4 view ("p4 help views"), and map files in a
-       repo according to the view."""
+    """ Represent a p4 view ("p4 help views"), and map files in a
+        repo according to the view.
+    """
 
     def __init__(self, client_name):
         self.mappings = []
-        self.client_prefix = "//%s/" % client_name
+        # the client prefix is saved in bytes as it is used for comparison
+        # against server data.
+        self.client_prefix = as_bytes("//%s/" % client_name)
         # cache results of "p4 where" to lookup client file locations
         self.client_spec_path_cache = {}
 
     def append(self, view_line):
-        """Parse a view line, splitting it into depot and client
-           sides.  Append to self.mappings, preserving order.  This
-           is only needed for tag creation."""
+        """ Parse a view line, splitting it into depot and client
+            sides.  Append to self.mappings, preserving order.  This
+            is only needed for tag creation.
+
+            view_line should be in bytes (depot path encoding)
+        """
 
         # Split the view line into exactly two words.  P4 enforces
         # structure on these lines that simplifies this quite a bit.
@@ -2650,28 +2656,28 @@ def append(self, view_line):
         # The line is already white-space stripped.
         # The two words are separated by a single space.
         #
-        if view_line[0] == '"':
+        if view_line[0] == b'"':
             # First word is double quoted.  Find its end.
-            close_quote_index = view_line.find('"', 1)
+            close_quote_index = view_line.find(b'"', 1)
             if close_quote_index <= 0:
-                die("No first-word closing quote found: %s" % view_line)
+                die("No first-word closing quote found: %s" % path_as_string(view_line))
             depot_side = view_line[1:close_quote_index]
             # skip closing quote and space
             rhs_index = close_quote_index + 1 + 1
         else:
-            space_index = view_line.find(" ")
+            space_index = view_line.find(b" ")
             if space_index <= 0:
-                die("No word-splitting space found: %s" % view_line)
+                die("No word-splitting space found: %s" % path_as_string(view_line))
             depot_side = view_line[0:space_index]
             rhs_index = space_index + 1
 
         # prefix + means overlay on previous mapping
-        if depot_side.startswith("+"):
+        if depot_side.startswith(b"+"):
             depot_side = depot_side[1:]
 
         # prefix - means exclude this path, leave out of mappings
         exclude = False
-        if depot_side.startswith("-"):
+        if depot_side.startswith(b"-"):
             exclude = True
             depot_side = depot_side[1:]
 
@@ -2682,7 +2688,7 @@ def convert_client_path(self, clientFile):
         # chop off //client/ part to make it relative
         if not clientFile.startswith(self.client_prefix):
             die("No prefix '%s' on clientFile '%s'" %
-                (self.client_prefix, clientFile))
+                (as_string(self.client_prefix)), path_as_string(clientFile))
         return clientFile[len(self.client_prefix):]
 
     def update_client_spec_path_cache(self, files):
@@ -2696,7 +2702,7 @@ def update_client_spec_path_cache(self, files):
 
         where_result = p4CmdList(["-x", "-", "where"], stdin=fileArgs, encode_cmd_output=False)
         for res in where_result:
-            if "code" in res and res["code"] == "error":
+            if "code" in res and res["code"] == b"error":
                 # assume error is "... file(s) not in client view"
                 continue
             if "clientFile" not in res:
@@ -4113,10 +4119,14 @@ def __init__(self):
                                  help="where to leave result of the clone"),
             optparse.make_option("--bare", dest="cloneBare",
                                  action="store_true", default=False),
+            optparse.make_option("--encoding", dest="setPathEncoding",
+                                 action="store", default=None,
+                                 help="Sets the path encoding for this depot")
         ]
         self.cloneDestination = None
         self.needsGit = False
         self.cloneBare = False
+        self.setPathEncoding = None
 
     def defaultDestination(self, args):
         """ Returns the last path component as the default git
@@ -4140,6 +4150,14 @@ def run(self, args):
 
         depotPaths = args
 
+        # If we have an encoding provided, ignore what may already exist
+        # in the registry. This will ensure we show the displayed values
+        # using the correct encoding.
+        if self.setPathEncoding:
+            gitConfigSet("git-p4.pathEncoding", self.setPathEncoding)
+
+        # If more than 1 path element is supplied, the last element
+        # is the clone destination.
         if not self.cloneDestination and len(depotPaths) > 1:
             self.cloneDestination = depotPaths[-1]
             depotPaths = depotPaths[:-1]
@@ -4167,6 +4185,13 @@ def run(self, args):
         if retcode:
             raise CalledProcessError(retcode, init_cmd)
 
+        # Set the encoding if it was provided command line
+        if self.setPathEncoding:
+            init_cmd= ["git", "config", "git-p4.pathEncoding", self.setPathEncoding]
+            retcode = subprocess.call(init_cmd)
+            if retcode:
+                raise CalledProcessError(retcode, init_cmd)
+
         if not P4Sync.run(self, depotPaths):
             return False
 
diff --git a/t/t9822-git-p4-path-encoding.sh b/t/t9822-git-p4-path-encoding.sh
index 572d395498..8d3fe6c5d1 100755
--- a/t/t9822-git-p4-path-encoding.sh
+++ b/t/t9822-git-p4-path-encoding.sh
@@ -4,9 +4,20 @@ test_description='Clone repositories with non ASCII paths'
 
 . ./lib-git-p4.sh
 
+# lowercase filename
+# UTF8    - HEX:   a-\xc3\xa4_o-\xc3\xb6_u-\xc3\xbc
+#         - octal: a-\303\244_o-\303\266_u-\303\274
+# ISO8859 - HEX:   a-\xe4_o-\xf6_u-\xfc
 UTF8_ESCAPED="a-\303\244_o-\303\266_u-\303\274.txt"
 ISO8859_ESCAPED="a-\344_o-\366_u-\374.txt"
 
+# lowercase directory
+# UTF8    - HEX:   dir_a-\xc3\xa4_o-\xc3\xb6_u-\xc3\xbc
+# ISO8859 - HEX:   dir_a-\xe4_o-\xf6_u-\xfc
+DIR_UTF8_ESCAPED="dir_a-\303\244_o-\303\266_u-\303\274"
+DIR_ISO8859_ESCAPED="dir_a-\344_o-\366_u-\374"
+
+
 ISO8859="$(printf "$ISO8859_ESCAPED")" &&
 echo content123 >"$ISO8859" &&
 rm "$ISO8859" || {
@@ -58,6 +69,22 @@ test_expect_success 'Clone repo containing iso8859-1 encoded paths with git-p4.p
 	)
 '
 
+test_expect_success 'Clone repo containing iso8859-1 encoded paths with using --encoding parameter' '
+	test_when_finished cleanup_git &&
+	(
+		git p4 clone --encoding iso8859 --destination="$git" //depot &&
+		cd "$git" &&
+		UTF8="$(printf "$UTF8_ESCAPED")" &&
+		echo "$UTF8" >expect &&
+		git -c core.quotepath=false ls-files >actual &&
+		test_cmp expect actual &&
+
+		echo content123 >expect &&
+		cat "$UTF8" >actual &&
+		test_cmp expect actual
+	)
+'
+
 test_expect_success 'Delete iso8859-1 encoded paths and clone' '
 	(
 		cd "$cli" &&
@@ -74,4 +101,78 @@ test_expect_success 'Delete iso8859-1 encoded paths and clone' '
 	)
 '
 
+# These tests will create a directory with ISO8859-1 characters in both the 
+# directory and the path.  Since it is possible to clone a path instead of using
+# the whole client-spec.  Check both versions:  client-spec and with a direct
+# path using --encoding
+test_expect_success 'Create a repo containing iso8859-1 encoded directory and filename' '
+	(
+		DIR_ISO8859="$(printf "$DIR_ISO8859_ESCAPED")" &&
+		ISO8859="$(printf "$ISO8859_ESCAPED")" &&
+		cd "$cli" &&
+		mkdir "$DIR_ISO8859" &&
+		cd "$DIR_ISO8859" &&
+		echo content123 >"$ISO8859" &&
+		p4 add "$ISO8859" &&
+		p4 submit -d "test commit (encoded directory)"
+	)
+'
+
+test_expect_success 'Clone repo containing iso8859-1 encoded depot path and files with git-p4.pathEncoding' '
+	test_when_finished cleanup_git &&
+	(
+		DIR_ISO8859="$(printf "$DIR_ISO8859_ESCAPED")" &&
+		DIR_UTF8="$(printf "$DIR_UTF8_ESCAPED")" &&
+		cd "$git" &&
+		git init . &&
+		git config git-p4.pathEncoding iso8859-1 &&
+		git p4 clone --use-client-spec --destination="$git" "//depot/$DIR_ISO8859" &&
+		cd "$DIR_UTF8" &&
+		UTF8="$(printf "$UTF8_ESCAPED")" &&
+		echo "$UTF8" >expect &&
+		git -c core.quotepath=false ls-files >actual &&
+		test_cmp expect actual &&
+
+		echo content123 >expect &&
+		cat "$UTF8" >actual &&
+		test_cmp expect actual
+	)
+'
+
+test_expect_success 'Clone repo containing iso8859-1 encoded depot path and files with git-p4.pathEncoding, without --use-client-spec' '
+	test_when_finished cleanup_git &&
+	(
+		DIR_ISO8859="$(printf "$DIR_ISO8859_ESCAPED")" &&
+		cd "$git" &&
+		git init . &&
+		git config git-p4.pathEncoding iso8859-1 &&
+		git p4 clone --destination="$git" "//depot/$DIR_ISO8859" &&
+		UTF8="$(printf "$UTF8_ESCAPED")" &&
+		echo "$UTF8" >expect &&
+		git -c core.quotepath=false ls-files >actual &&
+		test_cmp expect actual &&
+
+		echo content123 >expect &&
+		cat "$UTF8" >actual &&
+		test_cmp expect actual
+	)
+'
+
+test_expect_success 'Clone repo containing iso8859-1 encoded depot path and files with using --encoding parameter' '
+	test_when_finished cleanup_git &&
+	(
+		DIR_ISO8859="$(printf "$DIR_ISO8859_ESCAPED")" &&
+		git p4 clone --encoding iso8859 --destination="$git" "//depot/$DIR_ISO8859" &&
+		cd "$git" &&
+		UTF8="$(printf "$UTF8_ESCAPED")" &&
+		echo "$UTF8" >expect &&
+		git -c core.quotepath=false ls-files >actual &&
+		test_cmp expect actual &&
+
+		echo content123 >expect &&
+		cat "$UTF8" >actual &&
+		test_cmp expect actual
+	)
+'
+
 test_done
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 64+ messages in thread

* [PATCH v5 15/15] git-p4: Add depot manipulation functions
  2019-12-07 17:47       ` [PATCH v5 00/15] " Ben Keene via GitGitGadget
                           ` (13 preceding siblings ...)
  2019-12-07 17:47         ` [PATCH v5 14/15] git-p4: added --encoding parameter to p4 clone Ben Keene via GitGitGadget
@ 2019-12-07 17:47         ` Ben Keene via GitGitGadget
  2019-12-07 19:47         ` [PATCH v5 00/15] git-p4.py: Cast byte strings to unicode strings in python3 Jeff King
  15 siblings, 0 replies; 64+ messages in thread
From: Ben Keene via GitGitGadget @ 2019-12-07 17:47 UTC (permalink / raw)
  To: git; +Cc: Ben Keene, Junio C Hamano, Ben Keene

From: Ben Keene <seraphire@gmail.com>

Since the Depot paths and filenames are encoded according to P4, we need
to track them in bytes but also have to decode them with different
encodings (either ASCII or the encoding configured in pathEncoding,
which defaults to UTF-8)

Add the following functions to support future code conversion actions.

 * depot_count_depth         - counts the number of directories in the
       path
 * depot_remove_leading_path - removes (n) directories from the front
       of the depot path.
 * depot_Remove_p4_wildcard  - removes "/..." from the end of the path
 * depot_encode_utf8         - converts the path from the native
       encoding to utf8 encoding.  Returns (depot_path, did_decode)
 * depot_encode_restore      - restores the original encoding of the
       path.

Signed-off-by: Ben Keene <seraphire@gmail.com>

---
This code block could use review for the depot_encode_* functions.

Should this code return an absolute Unicode string or a byte array.
---
 git-p4.py | 93 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 93 insertions(+)

diff --git a/git-p4.py b/git-p4.py
index 16f29aae41..f82f05632c 100755
--- a/git-p4.py
+++ b/git-p4.py
@@ -724,6 +724,99 @@ def encodeWithUTF8(path, verbose=False):
                 print('Path with non-ASCII characters detected. Used %s to encode: %s ' % (encoding, path))
     return path
 
+
+def depot_count_depth(depot_path):
+    """Counts the number of directories found
+    in the depot_path. Paths will be decoded 
+    with encodeWithUTF8 to ensure that depot
+    encoding is repected.
+
+    Example:
+        //depot         = 1
+        //depot/        = 1
+        //depot/dir     = 2
+    """
+    depot_path=encodeWithUTF8(depot_path)
+    if not depot_path.endswith(b"/"):
+        depot_path+=b"/"
+    return depot_path.count(b"/") - 2
+
+def depot_remove_leading_path(depot_path, depth):
+    """Remove depth number of directories from 
+    the beginning of the depot_path. This will
+    be returned in the original encoding.
+    The leading "//" does not count as a directory
+    and will be automatically stripped.
+
+    depot_path should be in bytes
+
+    Example:
+    Given a depot_path of: //depot/main/file.txt
+    depth: 0        - depot/main/file.txt
+    depth: 1        - main/file.txt
+    depth: 2        - file.txt
+    depth: 3        - (empty string)
+    """
+
+    # First, decode the path
+    [depot_path, did_decode] = depot_encode_utf8(depot_path)
+
+    #remove leading //
+    if depot_path.startswith(b"//"):
+        depot_path=depot_path[2:]
+    if depth != 0:
+        segments=depot_path.split(b"/")
+        segments=segments[depth:]
+        depot_path=b"/".join(segments)
+
+    if did_decode:
+        depot_path = depot_encode_restore(depot_path)
+
+    return depot_path
+
+def depot_remove_p4_wildcard(depot_path):
+    """Removes the "/..." from the end of depot
+    path.
+
+    depot_path must be bytes. Bytes are returned.
+    """
+    # First, decode the path
+    [path, did_decode] = depot_encode_utf8(depot_path)
+    
+    if not path.endswith(b"/..."):
+        return depot_path
+    path=path[:-4]
+
+    if did_decode:
+        path = depot_encode_restore(path)
+
+    return path
+
+def depot_encode_utf8(depot_path):
+    """conditionally encodes depot_path
+    in utf8 using the defined pathEncoding.
+
+    Returns a (depot_path, was_encoded)"""
+    did_decode=False
+    encoding = 'utf8'
+    try:
+        depot_path.decode('ascii', 'strict')
+    except:
+        if gitConfig('git-p4.pathEncoding'):
+            encoding = gitConfig('git-p4.pathEncoding')
+        depot_path = depot_path.decode(encoding, 'replace').encode('utf8', 'replace')
+        did_decode=True
+    return [depot_path, did_decode]
+
+def depot_encode_restore(encoded_depot_path):
+    """Recodes an encoded_depot_path 
+    from utf8 back to the configured 
+    pathEncoding"""
+    encoding = 'utf8'
+    if gitConfig('git-p4.pathEncoding'):
+        encoding = gitConfig('git-p4.pathEncoding')
+    return encoded_depot_path.decode('utf8', 'replace').encode(encoding, 'replace')
+
 class P4Exception(Exception):
     """ Base class for exceptions from the p4 client """
     def __init__(self, exit_code):
-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v5 00/15] git-p4.py: Cast byte strings to unicode strings in python3
  2019-12-07 17:47       ` [PATCH v5 00/15] " Ben Keene via GitGitGadget
                           ` (14 preceding siblings ...)
  2019-12-07 17:47         ` [PATCH v5 15/15] git-p4: Add depot manipulation functions Ben Keene via GitGitGadget
@ 2019-12-07 19:47         ` Jeff King
  2019-12-07 21:27           ` Ben Keene
  15 siblings, 1 reply; 64+ messages in thread
From: Jeff King @ 2019-12-07 19:47 UTC (permalink / raw)
  To: Ben Keene via GitGitGadget; +Cc: git, Ben Keene, Junio C Hamano

On Sat, Dec 07, 2019 at 05:47:28PM +0000, Ben Keene via GitGitGadget wrote:

> Ben Keene (13):
>   git-p4: select P4 binary by operating-system
>   git-p4: change the expansion test from basestring to list
>   git-p4: promote encodeWithUTF8() to a global function
>   git-p4: remove p4_write_pipe() and write_pipe() return values
>   git-p4: add new support function gitConfigSet()
>   git-p4: add casting helper functions for python 3 conversion
>   git-p4: python 3 syntax changes
>   git-p4: fix assumed path separators to be more Windows friendly
>   git-p4: add Py23File() - helper class for stream writing
>   git-p4: p4CmdList - support Unicode encoding
>   git-p4: support Python 3 for basic P4 clone, sync, and submit (t9800)
>   git-p4: added --encoding parameter to p4 clone
>   git-p4: Add depot manipulation functions
> 
> Jeff King (2):
>   t/gitweb-lib.sh: drop confusing quotes
>   t/gitweb-lib.sh: set $REQUEST_URI

Hmm, looks like rebasing leftovers. :) I think we can probably drop
these first two?

-Peff

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v5 00/15] git-p4.py: Cast byte strings to unicode strings in python3
  2019-12-07 19:47         ` [PATCH v5 00/15] git-p4.py: Cast byte strings to unicode strings in python3 Jeff King
@ 2019-12-07 21:27           ` Ben Keene
  0 siblings, 0 replies; 64+ messages in thread
From: Ben Keene @ 2019-12-07 21:27 UTC (permalink / raw)
  To: Jeff King, Ben Keene via GitGitGadget; +Cc: git, Junio C Hamano

Yes indeed!

I hadn't pulled before I attempted the rebase, and got bit.  Yes those 
shouldn't be there!

On 12/7/2019 2:47 PM, Jeff King wrote:
> On Sat, Dec 07, 2019 at 05:47:28PM +0000, Ben Keene via GitGitGadget wrote:
>
>> Ben Keene (13):
>>    git-p4: select P4 binary by operating-system
>>    git-p4: change the expansion test from basestring to list
>>    git-p4: promote encodeWithUTF8() to a global function
>>    git-p4: remove p4_write_pipe() and write_pipe() return values
>>    git-p4: add new support function gitConfigSet()
>>    git-p4: add casting helper functions for python 3 conversion
>>    git-p4: python 3 syntax changes
>>    git-p4: fix assumed path separators to be more Windows friendly
>>    git-p4: add Py23File() - helper class for stream writing
>>    git-p4: p4CmdList - support Unicode encoding
>>    git-p4: support Python 3 for basic P4 clone, sync, and submit (t9800)
>>    git-p4: added --encoding parameter to p4 clone
>>    git-p4: Add depot manipulation functions
>>
>> Jeff King (2):
>>    t/gitweb-lib.sh: drop confusing quotes
>>    t/gitweb-lib.sh: set $REQUEST_URI
> Hmm, looks like rebasing leftovers. :) I think we can probably drop
> these first two?
>
> -Peff

^ permalink raw reply	[flat|nested] 64+ messages in thread

end of thread, back to index

Thread overview: 64+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-11-13 21:07 [PATCH 0/1] git-p4.py: Cast byte strings to unicode strings in python3 Ben Keene via GitGitGadget
2019-11-13 21:07 ` [PATCH 1/1] " Ben Keene via GitGitGadget
2019-11-14  2:25 ` [PATCH 0/1] git-p4.py: " Junio C Hamano
2019-11-14  9:46   ` Luke Diamand
2019-11-15 14:39 ` [PATCH v2 0/3] " Ben Keene via GitGitGadget
2019-11-15 14:39   ` [PATCH v2 1/3] " Ben Keene via GitGitGadget
2019-11-15 14:39   ` [PATCH v2 2/3] FIX: cast as unicode fails when a value is already unicode Ben Keene via GitGitGadget
2019-11-15 14:39   ` [PATCH v2 3/3] FIX: wrap return for read_pipe_lines in ustring() and wrap GitLFS read of the pointer file in ustring() Ben Keene via GitGitGadget
2019-12-02 19:02   ` [PATCH v3 0/1] git-p4.py: Cast byte strings to unicode strings in python3 Ben Keene via GitGitGadget
2019-12-02 19:02     ` [PATCH v3 1/1] Python3 support for t9800 tests. Basic P4/Python3 support Ben Keene via GitGitGadget
2019-12-03  0:18       ` Denton Liu
2019-12-03 16:03         ` Ben Keene
2019-12-04  6:14           ` Denton Liu
2019-12-04 22:29     ` [PATCH v4 00/11] git-p4.py: Cast byte strings to unicode strings in python3 Ben Keene via GitGitGadget
2019-12-04 22:29       ` [PATCH v4 01/11] git-p4: select p4 binary by operating-system Ben Keene via GitGitGadget
2019-12-05 10:19         ` Denton Liu
2019-12-05 16:32           ` Ben Keene
2019-12-04 22:29       ` [PATCH v4 02/11] git-p4: change the expansion test from basestring to list Ben Keene via GitGitGadget
2019-12-05 10:27         ` Denton Liu
2019-12-05 17:05           ` Ben Keene
2019-12-04 22:29       ` [PATCH v4 03/11] git-p4: add new helper functions for python3 conversion Ben Keene via GitGitGadget
2019-12-05 10:40         ` Denton Liu
2019-12-05 18:42           ` Ben Keene
2019-12-04 22:29       ` [PATCH v4 04/11] git-p4: python3 syntax changes Ben Keene via GitGitGadget
2019-12-05 11:02         ` Denton Liu
2019-12-04 22:29       ` [PATCH v4 05/11] git-p4: Add new functions in preparation of usage Ben Keene via GitGitGadget
2019-12-05 10:50         ` Denton Liu
2019-12-05 19:23           ` Ben Keene
2019-12-04 22:29       ` [PATCH v4 06/11] git-p4: Fix assumed path separators to be more Windows friendly Ben Keene via GitGitGadget
2019-12-05 13:38         ` Junio C Hamano
2019-12-05 19:37           ` Ben Keene
2019-12-04 22:29       ` [PATCH v4 07/11] git-p4: Add a helper class for stream writing Ben Keene via GitGitGadget
2019-12-05 13:42         ` Junio C Hamano
2019-12-05 19:52           ` Ben Keene
2019-12-04 22:29       ` [PATCH v4 08/11] git-p4: p4CmdList - support Unicode encoding Ben Keene via GitGitGadget
2019-12-05 13:55         ` Junio C Hamano
2019-12-05 20:23           ` Ben Keene
2019-12-04 22:29       ` [PATCH v4 09/11] git-p4: Add usability enhancements Ben Keene via GitGitGadget
2019-12-05 14:04         ` Junio C Hamano
2019-12-05 15:40           ` Ben Keene
2019-12-04 22:29       ` [PATCH v4 10/11] git-p4: Support python3 for basic P4 clone, sync, and submit Ben Keene via GitGitGadget
2019-12-04 22:29       ` [PATCH v4 11/11] git-p4: Added --encoding parameter to p4 clone Ben Keene via GitGitGadget
2019-12-05  9:54       ` [PATCH v4 00/11] git-p4.py: Cast byte strings to unicode strings in python3 Luke Diamand
2019-12-05 16:16         ` Ben Keene
2019-12-05 18:51           ` Denton Liu
2019-12-05 20:47             ` Ben Keene
2019-12-07 17:47       ` [PATCH v5 00/15] " Ben Keene via GitGitGadget
2019-12-07 17:47         ` [PATCH v5 01/15] t/gitweb-lib.sh: drop confusing quotes Jeff King via GitGitGadget
2019-12-07 17:47         ` [PATCH v5 02/15] t/gitweb-lib.sh: set $REQUEST_URI Jeff King via GitGitGadget
2019-12-07 17:47         ` [PATCH v5 03/15] git-p4: select P4 binary by operating-system Ben Keene via GitGitGadget
2019-12-07 17:47         ` [PATCH v5 04/15] git-p4: change the expansion test from basestring to list Ben Keene via GitGitGadget
2019-12-07 17:47         ` [PATCH v5 05/15] git-p4: promote encodeWithUTF8() to a global function Ben Keene via GitGitGadget
2019-12-07 17:47         ` [PATCH v5 06/15] git-p4: remove p4_write_pipe() and write_pipe() return values Ben Keene via GitGitGadget
2019-12-07 17:47         ` [PATCH v5 07/15] git-p4: add new support function gitConfigSet() Ben Keene via GitGitGadget
2019-12-07 17:47         ` [PATCH v5 08/15] git-p4: add casting helper functions for python 3 conversion Ben Keene via GitGitGadget
2019-12-07 17:47         ` [PATCH v5 09/15] git-p4: python 3 syntax changes Ben Keene via GitGitGadget
2019-12-07 17:47         ` [PATCH v5 10/15] git-p4: fix assumed path separators to be more Windows friendly Ben Keene via GitGitGadget
2019-12-07 17:47         ` [PATCH v5 11/15] git-p4: add Py23File() - helper class for stream writing Ben Keene via GitGitGadget
2019-12-07 17:47         ` [PATCH v5 12/15] git-p4: p4CmdList - support Unicode encoding Ben Keene via GitGitGadget
2019-12-07 17:47         ` [PATCH v5 13/15] git-p4: support Python 3 for basic P4 clone, sync, and submit (t9800) Ben Keene via GitGitGadget
2019-12-07 17:47         ` [PATCH v5 14/15] git-p4: added --encoding parameter to p4 clone Ben Keene via GitGitGadget
2019-12-07 17:47         ` [PATCH v5 15/15] git-p4: Add depot manipulation functions Ben Keene via GitGitGadget
2019-12-07 19:47         ` [PATCH v5 00/15] git-p4.py: Cast byte strings to unicode strings in python3 Jeff King
2019-12-07 21:27           ` Ben Keene

git@vger.kernel.org list mirror (unofficial, one of many)

Archives are clonable:
	git clone --mirror https://public-inbox.org/git
	git clone --mirror http://ou63pmih66umazou.onion/git
	git clone --mirror http://czquwvybam4bgbro.onion/git
	git clone --mirror http://hjrcffqmbrq6wope.onion/git

Example config snippet for mirrors

Newsgroups are available over NNTP:
	nntp://news.public-inbox.org/inbox.comp.version-control.git
	nntp://ou63pmih66umazou.onion/inbox.comp.version-control.git
	nntp://czquwvybam4bgbro.onion/inbox.comp.version-control.git
	nntp://hjrcffqmbrq6wope.onion/inbox.comp.version-control.git
	nntp://news.gmane.org/gmane.comp.version-control.git

 note: .onion URLs require Tor: https://www.torproject.org/

AGPL code for this site: git clone https://public-inbox.org/public-inbox.git