From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: Jeff King <peff@peff.net>
Cc: Git Mailing List <git@vger.kernel.org>,
	Junio C Hamano <gitster@pobox.com>,
	Thomas Rast <tr@thomasrast.ch>
Subject: Re: [PATCH] t/perf: correctly align non-ASCII descriptions in output
Date: Fri, 21 Apr 2017 23:28:42 +0200
Message-ID: <CACBZZX4oMFkZ93YxXrByh-jCK-eVxNBj+UgF77zm5Pq1mzf+WQ@mail.gmail.com>
In-Reply-To: <20170421204154.c5mvmnccxkxdm5aw@sigill.intra.peff.net>

On Fri, Apr 21, 2017 at 10:41 PM, Jeff King <peff@peff.net> wrote:
> On Fri, Apr 21, 2017 at 07:44:28PM +0000, Ævar Arnfjörð Bjarmason wrote:
>
>> Change the test descriptions from being treated as binary blobs by
>> perl to being treated as UTF-8. This ensures that e.g. a test
>> description like "æ" is counted as 1 character, not 2.
>>
>> I have WIP performance tests for non-ASCII grep patterns on another
>> topic that are affected by this.
>
> Makes sense. As this is purely about test titles in our project,
> choosing utf8 as the only encoding is quite sensible.

*Nod*
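
To make the alignment problem concrete, here is a small sketch. It is not code from aggregate.perl; the 4-column width and the variable names are made up for illustration. It pads the same description once as raw bytes and once as decoded characters:

```perl
use strict;
use warnings;

# "æ" as read from a handle without a layer: two UTF-8 bytes.
my $raw = "\xC3\xA6";

# The same description after decoding: one character.
my $decoded = $raw;
utf8::decode($decoded);

# Pad both to a 4-column field, the way a report column would be.
my $cell_raw     = sprintf "%-4s", $raw;
my $cell_decoded = sprintf "%-4s", $decoded;

# Count the padding spaces each one received.
my $pad_raw     = ($cell_raw     =~ tr/ //);
my $pad_decoded = ($cell_decoded =~ tr/ //);

print "padding: raw=$pad_raw decoded=$pad_decoded\n";
```

This prints "padding: raw=2 decoded=3": the byte string is treated as already two columns wide, so it gets one space too few and the following column shifts left by one on a UTF-8 terminal.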

>> diff --git a/t/perf/aggregate.perl b/t/perf/aggregate.perl
>> index 924b19dab4..1dbc85b214 100755
>> --- a/t/perf/aggregate.perl
>> +++ b/t/perf/aggregate.perl
>> @@ -88,6 +88,7 @@ for my $t (@tests) {
>>  sub read_descr {
>>       my $name = shift;
>>       open my $fh, "<", $name or return "<error reading description>";
>> +     binmode $fh, ":utf8" or die "PANIC on binmode: $!";
>
> I thought there was some "use" flag we could set to just make all of our
> handles utf8. But all I could come up with was stuff like PERLIO and
> "perl -C". Using binmode isn't too bad, though (I think you could
> just do it as part of the open, too, but I'm not sure if antique
> versions of perl support that).

[Debugging perl encoding issues is one of the many perks of my day job]

Using binmode like this is about as straightforward as you can get.
The first occurrence (the one in read_descr) could equivalently be
replaced by:

    utf8::decode(my $line = <$fh>);

But better just to mark the handle as utf8. There's a fancier way to
do it as part of the three-arg-open syntax, but I couldn't remember
whether all the perl versions we support have it.
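
For reference, a rough sketch of the three ways of getting decoded text discussed here. It is not part of the patch: it reads from an in-memory scalar instead of a real description file, first_line() is a hypothetical helper, and it assumes a perl with PerlIO layers:

```perl
use strict;
use warnings;

# "æ" plus newline, as the bytes a description file would hold on disk.
my $bytes = "\xC3\xA6\n";

# Read the first line of the in-memory "file" through the given open mode.
sub first_line {
    my ($mode) = @_;
    open my $fh, $mode, \$bytes or die "open: $!";
    my $line = <$fh>;
    close $fh;
    chomp $line;
    return $line;
}

my $raw     = first_line("<");                  # no layer: raw bytes
my $layered = first_line("<:encoding(UTF-8)");  # layer given in the open

# Same bytes, decoded by hand after reading.
my $manual = $raw;
utf8::decode($manual);

printf "raw=%d layered=%d manual=%d\n",
    length $raw, length $layered, length $manual;
```

This prints "raw=2 layered=1 manual=1". The ":encoding(UTF-8)" layer also validates the input, which the plain ":utf8" layer does not, so the two are close but not identical.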

About the "use" flag, you're probably thinking of the confusingly
named "use utf8", but that's to set your source code to utf8, not your
handles, e.g.:

$ perl -CA -MDevel::Peek -wE 'use utf8; my $日本語 = shift; Dump $日本語' æ
SV = PV(0x12cc090) at 0x12cded8
  REFCNT = 1
  FLAGS = (PADMY,POK,pPOK,UTF8)
  PV = 0x12de460 "\303\246"\0 [UTF8 "\x{e6}"]
  CUR = 2
  LEN = 16

As you can see, people got a bit overexcited about Unicode in the 90s.
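
A short sketch of the distinction, assuming the script itself is saved as UTF-8 (the variable names are only for illustration). "use utf8" decodes the literal in the source, while a handle opened without a layer still hands back bytes:

```perl
use strict;
use warnings;
use utf8;    # the source file is UTF-8; this says nothing about handles

my $literal = "æ";            # decoded by the parser: one character

my $on_disk = "\xC3\xA6";     # the two bytes a file containing "æ" holds
open my $fh, "<", \$on_disk or die "open: $!";
my $read = <$fh>;             # no layer on the handle: still two bytes
close $fh;

printf "literal=%d read=%d\n", length $literal, length $read;
```

This prints "literal=1 read=2", which is why the handle still needs binmode or an open layer even under "use utf8".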

