From: Ben Keene <seraphire@gmail.com>
To: Denton Liu <liu.denton@gmail.com>,
Ben Keene via GitGitGadget <gitgitgadget@gmail.com>
Cc: git@vger.kernel.org, Junio C Hamano <gitster@pobox.com>
Subject: Re: [PATCH v4 03/11] git-p4: add new helper functions for python3 conversion
Date: Thu, 5 Dec 2019 13:42:07 -0500 [thread overview]
Message-ID: <c6969495-912d-3364-9876-b7cb6a7a3e04@gmail.com> (raw)
In-Reply-To: <20191205104056.GA1192079@generichostname>
On 12/5/2019 5:40 AM, Denton Liu wrote:
> On Wed, Dec 04, 2019 at 10:29:29PM +0000, Ben Keene via GitGitGadget wrote:
>> From: Ben Keene <seraphire@gmail.com>
>>
>> Python 3+ handles strings differently than Python 2.7. Since Python 2 is reaching it's end of life, a series of changes are being submitted to enable python 3.7+ support. The current code fails basic tests under python 3.7.
>>
>> Change the existing unicode test add new support functions for python2-python3 support.
>>
>> Define the following variables:
>> - isunicode - a boolean variable that states if the version of python natively supports unicode (true) or not (false). This is true for Python3 and false for Python2.
>> - unicode - a type alias for the datatype that holds a unicode string. It is assigned to a str under python 3 and the unicode type for Python2.
>> - bytes - a type alias for an array of bytes. It is assigned the native bytes type for Python3 and str for Python2.
>>
>> Add the following new functions:
>>
>> - as_string(text) - A new function that will convert a byte array to a unicode (UTF-8) string under python 3. Under python 2, this returns the string unchanged.
>> - as_bytes(text) - A new function that will convert a unicode string to a byte array under python 3. Under python 2, this returns the string unchanged.
>> - to_unicode(text) - Converts a text string as Unicode(UTF-8) on both Python2 and Python3.
>>
>> Add a new function alias raw_input:
>> If raw_input does not exist (it was renamed to input in python 3) alias input as raw_input.
>>
>> The AS_STRING and AS_BYTES functions allow for modifying the code with a minimal amount of impact on Python2 support. When a string is expected, the as_string() will be used to convert "cast" the incoming "bytes" to a string type. Conversely as_bytes() will be used to convert a "string" to a "byte array" type. Since Python2 overloads the datatype 'str' to serve both purposes, the Python2 versions of these function do not change the data, since the str functions as both a byte array and a string.
> How come AS_STRING and AS_BYTES are all-caps here?
I changed them. I used all caps to designate that they are code string.
I changed them to as_string() and as_bytes()
>
>> basestring is removed since its only references are found in tests that were changed in the previous change list.
>>
>> Signed-off-by: Ben Keene <seraphire@gmail.com>
>> (cherry picked from commit 7921aeb3136b07643c1a503c2d9d8b5ada620356)
>> ---
>> git-p4.py | 70 +++++++++++++++++++++++++++++++++++++++++++++++++++----
>> 1 file changed, 66 insertions(+), 4 deletions(-)
>>
>> diff --git a/git-p4.py b/git-p4.py
>> index 0f27996393..93dfd0920a 100755
>> --- a/git-p4.py
>> +++ b/git-p4.py
>> @@ -32,16 +32,78 @@
>> unicode = unicode
>> except NameError:
>> # 'unicode' is undefined, must be Python 3
>> - str = str
>> + #
>> + # For Python3 which is natively unicode, we will use
>> + # unicode for internal information but all P4 Data
>> + # will remain in bytes
>> + isunicode = True
>> unicode = str
>> bytes = bytes
>> - basestring = (str,bytes)
>> +
>> + def as_string(text):
>> + """Return a byte array as a unicode string"""
>> + if text == None:
> Nit: use `text is None` instead. Actually, any time you're checking an
> object to see if it's None, you should use `is` instead of `==` since
> there's usually only one None reference.
I changed this in this commit and will attempt to fix this in all the
following commits as well.
>
>> + return None
>> + if isinstance(text, bytes):
>> + return unicode(text, "utf-8")
>> + else:
>> + return text
>> +
>> + def as_bytes(text):
>> + """Return a Unicode string as a byte array"""
>> + if text == None:
>> + return None
>> + if isinstance(text, bytes):
>> + return text
>> + else:
>> + return bytes(text, "utf-8")
>> +
>> + def to_unicode(text):
>> + """Return a byte array as a unicode string"""
>> + return as_string(text)
>> +
>> + def path_as_string(path):
>> + """ Converts a path to the UTF8 encoded string """
>> + if isinstance(path, unicode):
>> + return path
>> + return encodeWithUTF8(path).decode('utf-8')
>> +
> Trailing whitespace.
>
>> else:
>> # 'unicode' exists, must be Python 2
>> - str = str
>> + #
>> + # We will treat the data as:
>> + # str -> str
>> + # bytes -> str
>> + # So for Python2 these functions are no-ops
>> + # and will leave the data in the ambiguious
>> + # string/bytes state
>> + isunicode = False
>> unicode = unicode
>> bytes = str
>> - basestring = basestring
>> +
>> + def as_string(text):
>> + """ Return text unaltered (for Python3 support) """
> I didn't mention this in earlier emails but it's been bothering me a
> lot: is there any reason why you write it as "Python3" vs. "Python 3"
> sometimes (and Python2 as well)? If there's no difference, then we
> should probably stick to one variant in both the commit messages and in
> the code. (I prefer the spaced variant.)
The difference was sloppy typing. Like the "is None" and trailing white
spaces, I'll work on fixing these.
>> + return text
>> +
>> + def as_bytes(text):
>> + """ Return text unaltered (for Python3 support) """
>> + return text
>> +
>> + def to_unicode(text):
>> + """Return a string as a unicode string"""
>> + return text.decode('utf-8')
>> +
> Trailing whitespace.
>
>> + def path_as_string(path):
>> + """ Converts a path to the UTF8 encoded bytes """
>> + return encodeWithUTF8(path)
>> +
>> +
>> +
> Trailing whitespace.
>
>> +# Check for raw_input support
>> +try:
>> + raw_input
>> +except NameError:
>> + raw_input = input
>>
>> try:
>> from subprocess import CalledProcessError
>> --
>> gitgitgadget
>>
next prev parent reply other threads:[~2019-12-05 18:42 UTC|newest]
Thread overview: 77+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-11-13 21:07 [PATCH 0/1] git-p4.py: Cast byte strings to unicode strings in python3 Ben Keene via GitGitGadget
2019-11-13 21:07 ` [PATCH 1/1] " Ben Keene via GitGitGadget
2019-11-14 2:25 ` [PATCH 0/1] git-p4.py: " Junio C Hamano
2019-11-14 9:46 ` Luke Diamand
2019-11-15 14:39 ` [PATCH v2 0/3] " Ben Keene via GitGitGadget
2019-11-15 14:39 ` [PATCH v2 1/3] " Ben Keene via GitGitGadget
2019-11-15 14:39 ` [PATCH v2 2/3] FIX: cast as unicode fails when a value is already unicode Ben Keene via GitGitGadget
2019-11-15 14:39 ` [PATCH v2 3/3] FIX: wrap return for read_pipe_lines in ustring() and wrap GitLFS read of the pointer file in ustring() Ben Keene via GitGitGadget
2019-12-02 19:02 ` [PATCH v3 0/1] git-p4.py: Cast byte strings to unicode strings in python3 Ben Keene via GitGitGadget
2019-12-02 19:02 ` [PATCH v3 1/1] Python3 support for t9800 tests. Basic P4/Python3 support Ben Keene via GitGitGadget
2019-12-03 0:18 ` Denton Liu
2019-12-03 16:03 ` Ben Keene
2019-12-04 6:14 ` Denton Liu
2019-12-04 22:29 ` [PATCH v4 00/11] git-p4.py: Cast byte strings to unicode strings in python3 Ben Keene via GitGitGadget
2019-12-04 22:29 ` [PATCH v4 01/11] git-p4: select p4 binary by operating-system Ben Keene via GitGitGadget
2019-12-05 10:19 ` Denton Liu
2019-12-05 16:32 ` Ben Keene
2019-12-04 22:29 ` [PATCH v4 02/11] git-p4: change the expansion test from basestring to list Ben Keene via GitGitGadget
2019-12-05 10:27 ` Denton Liu
2019-12-05 17:05 ` Ben Keene
2019-12-04 22:29 ` [PATCH v4 03/11] git-p4: add new helper functions for python3 conversion Ben Keene via GitGitGadget
2019-12-05 10:40 ` Denton Liu
2019-12-05 18:42 ` Ben Keene [this message]
2019-12-04 22:29 ` [PATCH v4 04/11] git-p4: python3 syntax changes Ben Keene via GitGitGadget
2019-12-05 11:02 ` Denton Liu
2019-12-04 22:29 ` [PATCH v4 05/11] git-p4: Add new functions in preparation of usage Ben Keene via GitGitGadget
2019-12-05 10:50 ` Denton Liu
2019-12-05 19:23 ` Ben Keene
2019-12-04 22:29 ` [PATCH v4 06/11] git-p4: Fix assumed path separators to be more Windows friendly Ben Keene via GitGitGadget
2019-12-05 13:38 ` Junio C Hamano
2019-12-05 19:37 ` Ben Keene
2019-12-04 22:29 ` [PATCH v4 07/11] git-p4: Add a helper class for stream writing Ben Keene via GitGitGadget
2019-12-05 13:42 ` Junio C Hamano
2019-12-05 19:52 ` Ben Keene
2019-12-04 22:29 ` [PATCH v4 08/11] git-p4: p4CmdList - support Unicode encoding Ben Keene via GitGitGadget
2019-12-05 13:55 ` Junio C Hamano
2019-12-05 20:23 ` Ben Keene
2019-12-04 22:29 ` [PATCH v4 09/11] git-p4: Add usability enhancements Ben Keene via GitGitGadget
2019-12-05 14:04 ` Junio C Hamano
2019-12-05 15:40 ` Ben Keene
2019-12-04 22:29 ` [PATCH v4 10/11] git-p4: Support python3 for basic P4 clone, sync, and submit Ben Keene via GitGitGadget
2019-12-04 22:29 ` [PATCH v4 11/11] git-p4: Added --encoding parameter to p4 clone Ben Keene via GitGitGadget
2019-12-05 9:54 ` [PATCH v4 00/11] git-p4.py: Cast byte strings to unicode strings in python3 Luke Diamand
2019-12-05 16:16 ` Ben Keene
2019-12-05 18:51 ` Denton Liu
2019-12-05 20:47 ` Ben Keene
2019-12-07 17:47 ` [PATCH v5 00/15] " Ben Keene via GitGitGadget
2019-12-07 17:47 ` [PATCH v5 01/15] t/gitweb-lib.sh: drop confusing quotes Jeff King via GitGitGadget
2019-12-07 17:47 ` [PATCH v5 02/15] t/gitweb-lib.sh: set $REQUEST_URI Jeff King via GitGitGadget
2019-12-07 17:47 ` [PATCH v5 03/15] git-p4: select P4 binary by operating-system Ben Keene via GitGitGadget
2019-12-09 19:47 ` Junio C Hamano
2019-12-07 17:47 ` [PATCH v5 04/15] git-p4: change the expansion test from basestring to list Ben Keene via GitGitGadget
2019-12-09 20:25 ` Junio C Hamano
2019-12-13 14:40 ` Ben Keene
2019-12-07 17:47 ` [PATCH v5 05/15] git-p4: promote encodeWithUTF8() to a global function Ben Keene via GitGitGadget
2019-12-11 16:39 ` Junio C Hamano
2019-12-07 17:47 ` [PATCH v5 06/15] git-p4: remove p4_write_pipe() and write_pipe() return values Ben Keene via GitGitGadget
2019-12-07 17:47 ` [PATCH v5 07/15] git-p4: add new support function gitConfigSet() Ben Keene via GitGitGadget
2019-12-11 17:11 ` Junio C Hamano
2019-12-07 17:47 ` [PATCH v5 08/15] git-p4: add casting helper functions for python 3 conversion Ben Keene via GitGitGadget
2019-12-07 17:47 ` [PATCH v5 09/15] git-p4: python 3 syntax changes Ben Keene via GitGitGadget
2019-12-07 17:47 ` [PATCH v5 10/15] git-p4: fix assumed path separators to be more Windows friendly Ben Keene via GitGitGadget
2019-12-07 17:47 ` [PATCH v5 11/15] git-p4: add Py23File() - helper class for stream writing Ben Keene via GitGitGadget
2019-12-07 17:47 ` [PATCH v5 12/15] git-p4: p4CmdList - support Unicode encoding Ben Keene via GitGitGadget
2019-12-07 17:47 ` [PATCH v5 13/15] git-p4: support Python 3 for basic P4 clone, sync, and submit (t9800) Ben Keene via GitGitGadget
2019-12-07 17:47 ` [PATCH v5 14/15] git-p4: added --encoding parameter to p4 clone Ben Keene via GitGitGadget
2019-12-07 17:47 ` [PATCH v5 15/15] git-p4: Add depot manipulation functions Ben Keene via GitGitGadget
2019-12-07 19:47 ` [PATCH v5 00/15] git-p4.py: Cast byte strings to unicode strings in python3 Jeff King
2019-12-07 21:27 ` Ben Keene
2019-12-11 16:54 ` Junio C Hamano
2019-12-11 17:13 ` Denton Liu
2019-12-11 17:57 ` Junio C Hamano
2019-12-11 20:19 ` Luke Diamand
2019-12-11 21:46 ` Junio C Hamano
2019-12-11 22:30 ` Yang Zhao
2019-12-12 14:13 ` Ben Keene
2019-12-13 19:42 ` [PATCH v5 00/15] git-p4.py: Cast byte strings to unicode strings in python3 - Code Review Ben Keene
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: http://vger.kernel.org/majordomo-info.html
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=c6969495-912d-3364-9876-b7cb6a7a3e04@gmail.com \
--to=seraphire@gmail.com \
--cc=git@vger.kernel.org \
--cc=gitgitgadget@gmail.com \
--cc=gitster@pobox.com \
--cc=liu.denton@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://80x24.org/mirrors/git.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).