git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: Joel Holdsworth <jholdsworth@nvidia.com>
Cc: "git@vger.kernel.org" <git@vger.kernel.org>,
	Tzadik Vanderhoof <tzadik.vanderhoof@gmail.com>
Subject: Re: [PATCH 0/6] Transition git-p4.py to support Python 3 only
Date: Fri, 10 Dec 2021 12:30:01 +0100	[thread overview]
Message-ID: <211210.86h7bgd6wj.gmgdl@evledraar.gmail.com> (raw)
In-Reply-To: <BN8PR12MB3361388476E57E097DEBF9F7C8719@BN8PR12MB3361.namprd12.prod.outlook.com>


On Fri, Dec 10 2021, Joel Holdsworth wrote:

>> The commit messages could just really use some extra hand-holding and
>> explanation, and a clear split-out of things related to the version bump v.s.
>> things not needed for that, or unrelated refactorings.
>
> Yes, I am getting this message loud and clear. I will resubmit with more detailed commit messages today.

Thanks...

> To explain the story here: I started using git-p4 as part of my
> work-flow, and I expect to need it for several years to come. As I
> began to use it, I found that a series of bugs - mostly related to
> character encoding. In fixing these, I found that some of the troubles
> were specific to Python 3 - or rather Python 2's less strict approach
> to distinguishing between byte sequences and textual strings allowed
> the script to proceed Python 2 even though what it was doing was in
> fact invalid, and was potentially corrupting data.
>
> A common problem that users are encountering is that the script
> attempts to decode incoming textual byte-streams into UTF-8
> strings. On Python 3 this fails with an exception if the data contains
> invalid UTF-8 codes. For text files created in Windows, CP1252 Smart
> Quote characters: 0x93 and 0x94 are seen fairly frequently. These
> codes are invalid in UTF-8, so if the script encounters any file or
> file name containing them, it will fail with an exception.
>
> Tzadik Vanderhoof submitted a patch attempting to fix some of these issues in April 2021:
> https://lore.kernel.org/git/20210429073905.837-1-tzadik.vanderhoof@gmail.com/
>
> My two comments about this patch are that 1. It doesn't fix my issue, and 2. Even with the proposed fallbackEncoding option it still leaves git-p4 broken by default.
>
> A fallbackEncoding option may still be necessary, but I found that most of the issues I encountered could be side-stepped by simply avoiding decoding incoming data into UTF-8 in the first place.
>
> Keeping a clean separation between encoded and decoded text is much
> easier to do in Python 3. If Python 2 support must be maintained, this
> will require careful testing of separate code-paths both platforms
> which I don't regard as reasonable given that Python 2 is thoroughly
> deprecated. Therefore, this first patch-set focusses primarily on
> removing Python 2 support.

This all makes perfect sense to me (having never used git-p4). Having
this sort of explanation be part of the relevant commit message would be
great :)

I haven't worked extensively with Python myself, but I've understood
that its Unicode support was a big pain before v3 as you describe, which
is just the sort of thing that would justify a version prereq bump, even
if it's a bit painful to some users on older systems (if even that,
maybe everyone's upgraded already...)

> Furthermore, because I expect to be using git-p4 in my daily work-flow
> for some time to come, I am interested in investing some effort into
> improving it. There are bugs, unreliable behaviour, user-hostile
> behaviour, as well as code that would benefit from clean-up and
> modernisation. In submitting these patches, I am trying to get a read
> on to what extent such efforts would be accepted by the Git
> maintainers.

I don't think there's any reason we wouldn't accept these sorts of
changes.

The comment from me (and others I see) is purely on the topic of making
them easier to review, i.e. splitting out "this is for a version
upgrade" v.s. "this is just better Python style" or whatever.

> Is it preferable that patch-sets have a tight focus on a single topic?
> I am already dividing up my full patch collection. I can divide it
> further if requested. I am happy to do this, I was just worried that
> it just might make longer to get all my patches through review.

Yeah this project really prefers to do it that way. A good example is
this recent 19-part series:
https://lore.kernel.org/git/20211210095757.gu7w4n2rqulx2dvg@fs/T/#m5d9e8180551907578d56cdd6cd6244b9df6b59d5

This would probably be 1-3 patches, or even 1 patch in some other
projects, but especially with repetitive formatting changes I think it's
fair to say that we prefer for them to be split up closer to that,
i.e. one commit with some %s -> {} formatting change explaining why
(probably just style, preferenc) etc.

There's also the option of splitting things into different patch
series. I'd say if you e.g. have one series of "we're dropping python 2
support" and another "here's nice formatting changes" it would be nice
to split those into two different patch serieses.

But that's always a matter of taste & how easy it is. If they
extensively textually conflict it's often worth it to just stack them
together, or if they changes are all small/easy enough to review some
"while we're at it..." changes are generally fine.

Finally, for a re-submission it's also nice to find people who've worked
on the relevant code (with some fuzzing for "is this person likely to
still be active in the project?") and CC them on the series, or in this
case people who've made recent changes to git-p4.py.

>> Some of these changes also just seem to be entirely unrelated refactorings,
>> e.g. 6/6 where you're changing a multi-line commented regexp into
>> something that's a dense one-liner. Does Python 3 not support the
>> equivalent of Perl's /x, or is something else going on here?
>
> I will improve the commit message to explain the changes being made here.
>
> The regexp is matching RCS keywords:
> https://www.perforce.com/manuals/p4guide/Content/P4Guide/filetypes.rcs.html
> - $File$, $Author$, $Author$ etc., a very simple match. We could keep
> it multi-line, though this seems overkill to me.

Sure, my preference in Perl would be to split it, but I'm never going to
be maintaining git-p4.py, so... :)

I.e. I think it's perfectly fair to roll it into some general "this
improves readability" changes, just as long as they're clearly labeled
as such.

> The main significance of this change that previously git-p4 would
> compile one of these two regexes for every single file processed. This
> patch just pre-compiles the two regexes (now binary regexes, not utf-8
> regexes) at the start of the script.

Makes sens, and another thing that would be perfect for pretty much
copy/pasting as-is to a re-rolled commit's message :)

  reply	other threads:[~2021-12-10 11:41 UTC|newest]

Thread overview: 31+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-12-09 20:10 [PATCH 0/6] Transition git-p4.py to support Python 3 only Joel Holdsworth
2021-12-09 20:10 ` [PATCH 1/6] git-p4: Always pass cmd arguments to subprocess as a python lists Joel Holdsworth
2021-12-09 22:42   ` Junio C Hamano
2021-12-09 20:10 ` [PATCH 2/6] git-p4: Don't print shell commands as " Joel Holdsworth
2021-12-09 20:10 ` [PATCH 3/6] git-p4: Removed support for Python 2 Joel Holdsworth
2021-12-09 22:44   ` Junio C Hamano
2021-12-09 23:07     ` rsbecker
2021-12-10  3:25   ` David Aguilar
2021-12-10 10:44     ` Joel Holdsworth
2021-12-09 20:10 ` [PATCH 4/6] git-p4: Decode byte strings before printing Joel Holdsworth
2021-12-09 22:47   ` Junio C Hamano
2021-12-10  8:40     ` Fabian Stelzer
2021-12-10 10:48       ` Joel Holdsworth
2021-12-10 10:41     ` Joel Holdsworth
2021-12-09 20:10 ` [PATCH 5/6] git-p4: Eliminate decode_stream and encode_stream Joel Holdsworth
2021-12-09 20:10 ` [PATCH 6/6] git-p4: Resolve RCS keywords in binary Joel Holdsworth
2021-12-10  7:57   ` Luke Diamand
2021-12-10 10:51     ` Joel Holdsworth
2021-12-10  0:48 ` [PATCH 0/6] Transition git-p4.py to support Python 3 only Ævar Arnfjörð Bjarmason
2021-12-10 10:37   ` Joel Holdsworth
2021-12-10 11:30     ` Ævar Arnfjörð Bjarmason [this message]
2021-12-10 21:34   ` Junio C Hamano
2021-12-10 21:53     ` rsbecker
2021-12-11 21:00     ` Elijah Newren
2021-12-12  8:55       ` Luke Diamand
2021-12-10  7:53 ` Luke Diamand
2021-12-10 10:54   ` Joel Holdsworth
2021-12-11  9:58     ` Luke Diamand
2021-12-13 13:47       ` Joel Holdsworth
2021-12-13 19:29         ` Junio C Hamano
2021-12-13 19:58           ` Joel Holdsworth

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=211210.86h7bgd6wj.gmgdl@evledraar.gmail.com \
    --to=avarab@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=jholdsworth@nvidia.com \
    --cc=tzadik.vanderhoof@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).