git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Ben Keene <seraphire@gmail.com>
To: Luke Diamand <luke@diamand.org>,
	Ben Keene via GitGitGadget <gitgitgadget@gmail.com>
Cc: Git Users <git@vger.kernel.org>, Junio C Hamano <gitster@pobox.com>
Subject: Re: [PATCH v4 00/11] git-p4.py: Cast byte strings to unicode strings in python3
Date: Thu, 5 Dec 2019 11:16:27 -0500	[thread overview]
Message-ID: <be2a6839-aa73-dbf8-de19-823d3ae5265a@gmail.com> (raw)
In-Reply-To: <CAE5ih7-6EbEM4z5BtY87=82H_tLypiOPq4WY5mm3190QExTZWQ@mail.gmail.com>


On 12/5/2019 4:54 AM, Luke Diamand wrote:
> On Wed, 4 Dec 2019 at 22:29, Ben Keene via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
>> Issue: The current git-p4.py script does not work with python3.
>>
>> I have attempted to use the P4 integration built into GIT and I was unable
>> to get the program to run because I have Python 3.8 installed on my
>> computer. I was able to get the program to run when I downgraded my python
>> to version 2.7. However, python 2 is reaching its end of life.
>>
>> Submission: I am submitting a patch for the git-p4.py script that partially
>> supports python 3.8. This code was able to pass the basic tests (t9800) when
>> run against Python3. This provides basic functionality.
>>
>> In an attempt to pass the t9822 P4 path-encoding test, a new parameter for
>> git P4 Clone was introduced.
>>
>> --encoding Format-identifier
>>
>> This will create the GIT repository following the current functionality;
>> however, before importing the files from P4, it will set the
>> git-p4.pathEncoding option so any files or paths that are encoded with
>> non-ASCII/non-UTF-8 formats will import correctly.
>>
>> Technical details: The script was updated by futurize (
>> https://python-future.org/futurize.html) to support Py2/Py3 syntax. The few
>> references to classes in future were reworked so that future would not be
>> required. The existing code test for Unicode support was extended to
>> normalize the classes “unicode” and “bytes” to across platforms:
>>
>>   * ‘unicode’ is an alias for ‘str’ in Py3 and is the unicode class in Py2.
>>   * ‘bytes’ is bytes in Py3 and an alias for ‘str’ in Py2.
>>
>> New coercion methods were written for both Python2 and Python3:
>>
>>   * as_string(text) – In Python3, this encodes a bytes object as a UTF-8
>>     encoded Unicode string.
>>   * as_bytes(text) – In Python3, this decodes a Unicode string to an array of
>>     bytes.
>>
>> In Python2, these functions do not change the data since a ‘str’ object
>> function in both roles as strings and byte arrays. This reduces the
>> potential impact on backward compatibility with Python 2.
>>
>>   * to_unicode(text) – ensures that the supplied data is encoded as a UTF-8
>>     string. This function will encode data in both Python2 and Python3. *
>>        path_as_string(path) – This function is an extension function that
>>        honors the option “git-p4.pathEncoding” to convert a set of bytes or
>>        characters to UTF-8. If the str/bytes cannot decode as ASCII, it will
>>        use the encodeWithUTF8() method to convert the custom encoded bytes to
>>        Unicode in UTF-8.
>>
>>
>>
>> Generally speaking, information in the script is converted to Unicode as
>> early as possible and converted back to a byte array just before passing to
>> external programs or files. The exception to this rule is P4 Repository file
>> paths.
>>
>> Paths are not converted but left as “bytes” so the original file path
>> encoding can be preserved. This formatting is required for commands that
>> interact with the P4 file path. When the file path is used by GIT, it is
>> converted with encodeWithUTF8().
>>
> Almost all the tests pass now - nice!
>
> (There's one test that fails for me, t9830-git-p4-symlink-dir.sh).


Which version of Python are running the failing test against?  I run it 
against Python 2.7 and it passes the test. I don't expect all Python 3.x 
tests to pass yet, just t9800.


>
> Nitpicking:
>
> - There are some bits of trailing whitespace around - can you strip
> those out? You can use "git diff --check".


Is there a way that I can find out which branches I need to remove white 
space from now that they have been committed?


> - Also I think the convention for git commits is that they be limited
> to 72 (?) characters.


I'm going through all my commits and fixing them.


> - In 10dc commit message, s/behvior/behavior
> - Maybe submit 4fc4 as a separate patch series? It doesn't seem
> directly related to your python3 changes.


I moved the enhancements to https://github.com/git/git/pull/675


> - s/howerver/however/
>
> The comment at line 3261 (showing the fast-import syntax) has wonky
> indentation, and needs a space after the '#'.
>
> This code looked like we're duplicating stuff:
>
> +    if isinstance(path, unicode):
> +        path = path.replace("%", "%25") \
> +                   .replace("*", "%2A") \
> +                   .replace("#", "%23") \
> +                   .replace("@", "%40")
> +    else:
> +        path = path.replace(b"%", b"%25") \
> +                   .replace(b"*", b"%2A") \
> +                   .replace(b"#", b"%23") \
> +                   .replace(b"@", b"%40")
>
> I wonder if we can have a helper to do this?

I was just looking at this code block, and at this time, I'm not sure if 
the text coming in will be Unicode or bytes, so I'm hesitant to change 
it until more of the code is converted, but I understand about the 
duplication.


>
> In patchRCSKeywords() you've added code to cleanup outFile. But I
> wonder if we could just use a 'finally' block, or a contextexpr ("with
> blah as outFile:")
>
> I don't know if it's worth doing now that you've got it going, but at
> one point I tried simplifying code like this:
>
>     path_as_string(file['depotFile'])
> and
>     marshalled[b'data']
>
> by using a dictionary with overloaded operators which would do the
> bytes/string conversion automatically. However, your approach isn't
> actually _that_ invasive, so maybe this is not necessary.
>
> Looks good though, thanks!
> Luke
>
I toyed with making a class object that would hold the path data and 
have methods to cast to bytes and encodeWithUTF8() and Unicode versions, 
but it quickly got out of hand.


  reply	other threads:[~2019-12-05 16:16 UTC|newest]

Thread overview: 77+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-11-13 21:07 [PATCH 0/1] git-p4.py: Cast byte strings to unicode strings in python3 Ben Keene via GitGitGadget
2019-11-13 21:07 ` [PATCH 1/1] " Ben Keene via GitGitGadget
2019-11-14  2:25 ` [PATCH 0/1] git-p4.py: " Junio C Hamano
2019-11-14  9:46   ` Luke Diamand
2019-11-15 14:39 ` [PATCH v2 0/3] " Ben Keene via GitGitGadget
2019-11-15 14:39   ` [PATCH v2 1/3] " Ben Keene via GitGitGadget
2019-11-15 14:39   ` [PATCH v2 2/3] FIX: cast as unicode fails when a value is already unicode Ben Keene via GitGitGadget
2019-11-15 14:39   ` [PATCH v2 3/3] FIX: wrap return for read_pipe_lines in ustring() and wrap GitLFS read of the pointer file in ustring() Ben Keene via GitGitGadget
2019-12-02 19:02   ` [PATCH v3 0/1] git-p4.py: Cast byte strings to unicode strings in python3 Ben Keene via GitGitGadget
2019-12-02 19:02     ` [PATCH v3 1/1] Python3 support for t9800 tests. Basic P4/Python3 support Ben Keene via GitGitGadget
2019-12-03  0:18       ` Denton Liu
2019-12-03 16:03         ` Ben Keene
2019-12-04  6:14           ` Denton Liu
2019-12-04 22:29     ` [PATCH v4 00/11] git-p4.py: Cast byte strings to unicode strings in python3 Ben Keene via GitGitGadget
2019-12-04 22:29       ` [PATCH v4 01/11] git-p4: select p4 binary by operating-system Ben Keene via GitGitGadget
2019-12-05 10:19         ` Denton Liu
2019-12-05 16:32           ` Ben Keene
2019-12-04 22:29       ` [PATCH v4 02/11] git-p4: change the expansion test from basestring to list Ben Keene via GitGitGadget
2019-12-05 10:27         ` Denton Liu
2019-12-05 17:05           ` Ben Keene
2019-12-04 22:29       ` [PATCH v4 03/11] git-p4: add new helper functions for python3 conversion Ben Keene via GitGitGadget
2019-12-05 10:40         ` Denton Liu
2019-12-05 18:42           ` Ben Keene
2019-12-04 22:29       ` [PATCH v4 04/11] git-p4: python3 syntax changes Ben Keene via GitGitGadget
2019-12-05 11:02         ` Denton Liu
2019-12-04 22:29       ` [PATCH v4 05/11] git-p4: Add new functions in preparation of usage Ben Keene via GitGitGadget
2019-12-05 10:50         ` Denton Liu
2019-12-05 19:23           ` Ben Keene
2019-12-04 22:29       ` [PATCH v4 06/11] git-p4: Fix assumed path separators to be more Windows friendly Ben Keene via GitGitGadget
2019-12-05 13:38         ` Junio C Hamano
2019-12-05 19:37           ` Ben Keene
2019-12-04 22:29       ` [PATCH v4 07/11] git-p4: Add a helper class for stream writing Ben Keene via GitGitGadget
2019-12-05 13:42         ` Junio C Hamano
2019-12-05 19:52           ` Ben Keene
2019-12-04 22:29       ` [PATCH v4 08/11] git-p4: p4CmdList - support Unicode encoding Ben Keene via GitGitGadget
2019-12-05 13:55         ` Junio C Hamano
2019-12-05 20:23           ` Ben Keene
2019-12-04 22:29       ` [PATCH v4 09/11] git-p4: Add usability enhancements Ben Keene via GitGitGadget
2019-12-05 14:04         ` Junio C Hamano
2019-12-05 15:40           ` Ben Keene
2019-12-04 22:29       ` [PATCH v4 10/11] git-p4: Support python3 for basic P4 clone, sync, and submit Ben Keene via GitGitGadget
2019-12-04 22:29       ` [PATCH v4 11/11] git-p4: Added --encoding parameter to p4 clone Ben Keene via GitGitGadget
2019-12-05  9:54       ` [PATCH v4 00/11] git-p4.py: Cast byte strings to unicode strings in python3 Luke Diamand
2019-12-05 16:16         ` Ben Keene [this message]
2019-12-05 18:51           ` Denton Liu
2019-12-05 20:47             ` Ben Keene
2019-12-07 17:47       ` [PATCH v5 00/15] " Ben Keene via GitGitGadget
2019-12-07 17:47         ` [PATCH v5 01/15] t/gitweb-lib.sh: drop confusing quotes Jeff King via GitGitGadget
2019-12-07 17:47         ` [PATCH v5 02/15] t/gitweb-lib.sh: set $REQUEST_URI Jeff King via GitGitGadget
2019-12-07 17:47         ` [PATCH v5 03/15] git-p4: select P4 binary by operating-system Ben Keene via GitGitGadget
2019-12-09 19:47           ` Junio C Hamano
2019-12-07 17:47         ` [PATCH v5 04/15] git-p4: change the expansion test from basestring to list Ben Keene via GitGitGadget
2019-12-09 20:25           ` Junio C Hamano
2019-12-13 14:40             ` Ben Keene
2019-12-07 17:47         ` [PATCH v5 05/15] git-p4: promote encodeWithUTF8() to a global function Ben Keene via GitGitGadget
2019-12-11 16:39           ` Junio C Hamano
2019-12-07 17:47         ` [PATCH v5 06/15] git-p4: remove p4_write_pipe() and write_pipe() return values Ben Keene via GitGitGadget
2019-12-07 17:47         ` [PATCH v5 07/15] git-p4: add new support function gitConfigSet() Ben Keene via GitGitGadget
2019-12-11 17:11           ` Junio C Hamano
2019-12-07 17:47         ` [PATCH v5 08/15] git-p4: add casting helper functions for python 3 conversion Ben Keene via GitGitGadget
2019-12-07 17:47         ` [PATCH v5 09/15] git-p4: python 3 syntax changes Ben Keene via GitGitGadget
2019-12-07 17:47         ` [PATCH v5 10/15] git-p4: fix assumed path separators to be more Windows friendly Ben Keene via GitGitGadget
2019-12-07 17:47         ` [PATCH v5 11/15] git-p4: add Py23File() - helper class for stream writing Ben Keene via GitGitGadget
2019-12-07 17:47         ` [PATCH v5 12/15] git-p4: p4CmdList - support Unicode encoding Ben Keene via GitGitGadget
2019-12-07 17:47         ` [PATCH v5 13/15] git-p4: support Python 3 for basic P4 clone, sync, and submit (t9800) Ben Keene via GitGitGadget
2019-12-07 17:47         ` [PATCH v5 14/15] git-p4: added --encoding parameter to p4 clone Ben Keene via GitGitGadget
2019-12-07 17:47         ` [PATCH v5 15/15] git-p4: Add depot manipulation functions Ben Keene via GitGitGadget
2019-12-07 19:47         ` [PATCH v5 00/15] git-p4.py: Cast byte strings to unicode strings in python3 Jeff King
2019-12-07 21:27           ` Ben Keene
2019-12-11 16:54             ` Junio C Hamano
2019-12-11 17:13               ` Denton Liu
2019-12-11 17:57                 ` Junio C Hamano
2019-12-11 20:19                   ` Luke Diamand
2019-12-11 21:46                     ` Junio C Hamano
2019-12-11 22:30                       ` Yang Zhao
2019-12-12 14:13                         ` Ben Keene
2019-12-13 19:42                           ` [PATCH v5 00/15] git-p4.py: Cast byte strings to unicode strings in python3 - Code Review Ben Keene

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=be2a6839-aa73-dbf8-de19-823d3ae5265a@gmail.com \
    --to=seraphire@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=gitster@pobox.com \
    --cc=luke@diamand.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).