git@vger.kernel.org mailing list mirror (one of many)
From: Ben Keene <seraphire@gmail.com>
To: Denton Liu <liu.denton@gmail.com>,
	Ben Keene via GitGitGadget <gitgitgadget@gmail.com>
Cc: git@vger.kernel.org, Junio C Hamano <gitster@pobox.com>
Subject: Re: [PATCH v4 03/11] git-p4: add new helper functions for python3 conversion
Date: Thu, 5 Dec 2019 13:42:07 -0500
Message-ID: <c6969495-912d-3364-9876-b7cb6a7a3e04@gmail.com>
In-Reply-To: <20191205104056.GA1192079@generichostname>


On 12/5/2019 5:40 AM, Denton Liu wrote:
> On Wed, Dec 04, 2019 at 10:29:29PM +0000, Ben Keene via GitGitGadget wrote:
>> From: Ben Keene <seraphire@gmail.com>
>>
>> Python 3+ handles strings differently than Python 2.7.  Since Python 2 is reaching its end of life, a series of changes is being submitted to enable Python 3.7+ support. The current code fails basic tests under Python 3.7.
>>
>> Change the existing unicode test and add new support functions for python2-python3 support.
>>
>> Define the following variables:
>> - isunicode - a boolean variable that states if the version of python natively supports unicode (true) or not (false). This is true for Python3 and false for Python2.
>> - unicode - a type alias for the datatype that holds a unicode string.  It is assigned to a str under python 3 and the unicode type for Python2.
>> - bytes - a type alias for an array of bytes.  It is assigned the native bytes type for Python3 and str for Python2.
>>
>> Add the following new functions:
>>
>> - as_string(text) - A new function that will convert a byte array to a unicode (UTF-8) string under python 3.  Under python 2, this returns the string unchanged.
>> - as_bytes(text) - A new function that will convert a unicode string to a byte array under python 3.  Under python 2, this returns the string unchanged.
>> - to_unicode(text) - Converts a text string to Unicode (UTF-8) on both Python2 and Python3.
>>
>> Add a new function alias raw_input:
>> If raw_input does not exist (it was renamed to input in python 3) alias input as raw_input.
>>
>> The AS_STRING and AS_BYTES functions allow for modifying the code with a minimal amount of impact on Python2 support.  When a string is expected, as_string() will be used to convert ("cast") the incoming "bytes" to a string type. Conversely, as_bytes() will be used to convert a "string" to a "byte array" type. Since Python2 overloads the datatype 'str' to serve both purposes, the Python2 versions of these functions leave the data unchanged: a str serves as both a byte array and a string.
> How come AS_STRING and AS_BYTES are all-caps here?


I changed them. I had used all caps to mark them as code identifiers;
they are now as_string() and as_bytes().
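
For anyone following along, here is a minimal sketch of how the Python 3
branch of as_string() and as_bytes() behaves once the `is None` change
below is applied. It assumes Python 3 and uses bytes.decode() and
str.encode(), which are equivalent to the patch's unicode(text, "utf-8")
and bytes(text, "utf-8") calls there; the sample bytes are made up.

```python
# Sketch of the Python 3 branch only (assumption: running under Python 3).
# On Python 2 the patch makes both helpers return their input unchanged.

def as_string(text):
    """Return a byte array as a unicode string (UTF-8 decoded)."""
    if text is None:
        return None
    if isinstance(text, bytes):
        return text.decode("utf-8")
    return text  # already a str: pass it through untouched


def as_bytes(text):
    """Return a unicode string as a byte array (UTF-8 encoded)."""
    if text is None:
        return None
    if isinstance(text, bytes):
        return text  # already bytes: pass it through untouched
    return text.encode("utf-8")


raw = b"caf\xc3\xa9"         # hypothetical UTF-8 bytes, e.g. p4 output
s = as_string(raw)
print(s)                      # café
print(as_bytes(s) == raw)     # True: the round trip is lossless
print(as_string(s) is s)      # True: decoded text is returned as-is
```

Because both helpers accept either type, call sites can wrap values
without first checking whether they were already converted.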


>
>> basestring is removed since its only references are found in tests that were changed in the previous change list.
>>
>> Signed-off-by: Ben Keene <seraphire@gmail.com>
>> (cherry picked from commit 7921aeb3136b07643c1a503c2d9d8b5ada620356)
>> ---
>>   git-p4.py | 70 +++++++++++++++++++++++++++++++++++++++++++++++++++----
>>   1 file changed, 66 insertions(+), 4 deletions(-)
>>
>> diff --git a/git-p4.py b/git-p4.py
>> index 0f27996393..93dfd0920a 100755
>> --- a/git-p4.py
>> +++ b/git-p4.py
>> @@ -32,16 +32,78 @@
>>       unicode = unicode
>>   except NameError:
>>       # 'unicode' is undefined, must be Python 3
>> -    str = str
>> +    #
>> +    # For Python3 which is natively unicode, we will use
>> +    # unicode for internal information but all P4 Data
>> +    # will remain in bytes
>> +    isunicode = True
>>       unicode = str
>>       bytes = bytes
>> -    basestring = (str,bytes)
>> +
>> +    def as_string(text):
>> +        """Return a byte array as a unicode string"""
>> +        if text == None:
> Nit: use `text is None` instead. Actually, any time you're checking an
> object to see if it's None, you should use `is` instead of `==` since
> there's usually only one None reference.

I changed this in this commit and will attempt to fix this in all the 
following commits as well.
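
To illustrate why `is` is the safer check: `==` is dispatched to the
left operand's __eq__(), which a class can override, while `is` compares
object identity and cannot be overridden. The class below is
hypothetical and exists only to show the difference.

```python
class AlwaysEqual:
    """Hypothetical class whose __eq__ claims equality with everything."""

    def __eq__(self, other):
        return True


obj = AlwaysEqual()
print(obj == None)  # True: the overridden __eq__ answers, wrongly
print(obj is None)  # False: identity is checked, so None is detected correctly
```

PEP 8 recommends the same thing: comparisons to singletons such as None
should use `is` or `is not`.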


>
>> +            return None
>> +        if isinstance(text, bytes):
>> +            return unicode(text, "utf-8")
>> +        else:
>> +            return text
>> +
>> +    def as_bytes(text):
>> +        """Return a Unicode string as a byte array"""
>> +        if text == None:
>> +            return None
>> +        if isinstance(text, bytes):
>> +            return text
>> +        else:
>> +            return bytes(text, "utf-8")
>> +
>> +    def to_unicode(text):
>> +        """Return a byte array as a unicode string"""
>> +        return as_string(text)
>> +
>> +    def path_as_string(path):
>> +        """ Converts a path to the UTF8 encoded string """
>> +        if isinstance(path, unicode):
>> +            return path
>> +        return encodeWithUTF8(path).decode('utf-8')
>> +
> Trailing whitespace.
>
>>   else:
>>       # 'unicode' exists, must be Python 2
>> -    str = str
>> +    #
>> +    # We will treat the data as:
>> +    #   str   -> str
>> +    #   bytes -> str
>> +    # So for Python2 these functions are no-ops
>> +    # and will leave the data in the ambiguous
>> +    # string/bytes state
>> +    isunicode = False
>>       unicode = unicode
>>       bytes = str
>> -    basestring = basestring
>> +
>> +    def as_string(text):
>> +        """ Return text unaltered (for Python3 support) """
> I didn't mention this in earlier emails but it's been bothering me a
> lot: is there any reason why you write it as "Python3" vs. "Python 3"
> sometimes (and Python2 as well)? If there's no difference, then we
> should probably stick to one variant in both the commit messages and in
> the code. (I prefer the spaced variant.)


The difference was just sloppy typing. As with the "is None" checks and
the trailing whitespace, I'll fix these.


>> +        return text
>> +
>> +    def as_bytes(text):
>> +        """ Return text unaltered (for Python3 support) """
>> +        return text
>> +
>> +    def to_unicode(text):
>> +        """Return a string as a unicode string"""
>> +        return text.decode('utf-8')
>> +
> Trailing whitespace.
>
>> +    def path_as_string(path):
>> +        """ Converts a path to the UTF8 encoded bytes """
>> +        return encodeWithUTF8(path)
>> +
>> +
>> +
> Trailing whitespace.
>
>> +# Check for raw_input support
>> +try:
>> +    raw_input
>> +except NameError:
>> +    raw_input = input
>>   
>>   try:
>>       from subprocess import CalledProcessError
>> -- 
>> gitgitgadget
>>
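
The version check and the raw_input fallback in the patch follow the
same pattern: probe a name that only exists on Python 2 and let the
NameError on Python 3 select the other branch. A standalone sketch of
that pattern (an illustration, not the patch itself):

```python
import sys

try:
    unicode  # built-in only on Python 2
    isunicode = False
except NameError:
    # 'unicode' is undefined, so this is Python 3
    isunicode = True

try:
    raw_input  # Python 2 name for reading a line from stdin
except NameError:
    # Python 3 renamed raw_input() to input()
    raw_input = input

print(isunicode == (sys.version_info[0] >= 3))  # True
```

Probing for the name, rather than comparing version numbers, keeps each
fallback tied directly to the feature it replaces.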
