git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH] t/perf: correctly align non-ASCII descriptions in output
@ 2017-04-21 19:44 Ævar Arnfjörð Bjarmason
  2017-04-21 20:41 ` Jeff King
  0 siblings, 1 reply; 6+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2017-04-21 19:44 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Thomas Rast,
	Ævar Arnfjörð Bjarmason

Change the test descriptions from being treated as binary blobs by
perl to being treated as UTF-8. This ensures that e.g. a test
description like "æ" is counted as 1 character, not 2.

I have WIP performance tests for non-ASCII grep patterns on another
topic that are affected by this.

Now instead of:

    $ ./run p0000-perf-lib-sanity.sh
    [...]
    0000.4: export a weird var                                    0.00(0.00+0.00)
    0000.5: éḿíẗ ńöń-ÁŚĆÍÍ ćḧáŕáćẗéŕś   0.00(0.00+0.00)
    0000.7: important variables available in subshells            0.00(0.00+0.00)
    [...]

We emit:

    [...]
    0000.4: export a weird var                                 0.00(0.00+0.00)
    0000.5: éḿíẗ ńöń-ÁŚĆÍÍ ćḧáŕáćẗéŕś                          0.00(0.00+0.00)
    0000.7: important variables available in subshells         0.00(0.00+0.00)
    [...]

Fixes code originally added in 342e9ef2d9 ("Introduce a performance
testing framework", 2012-02-17).

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 t/perf/aggregate.perl           | 3 +++
 t/perf/p0000-perf-lib-sanity.sh | 2 ++
 2 files changed, 5 insertions(+)

diff --git a/t/perf/aggregate.perl b/t/perf/aggregate.perl
index 924b19dab4..1dbc85b214 100755
--- a/t/perf/aggregate.perl
+++ b/t/perf/aggregate.perl
@@ -88,6 +88,7 @@ for my $t (@tests) {
 sub read_descr {
 	my $name = shift;
 	open my $fh, "<", $name or return "<error reading description>";
+	binmode $fh, ":utf8" or die "PANIC on binmode: $!";
 	my $line = <$fh>;
 	close $fh or die "cannot close $name";
 	chomp $line;
@@ -147,6 +148,8 @@ for my $t (@subtests) {
 my $totalwidth = 3*@dirs+$descrlen;
 $totalwidth += $_ for (@colwidth);
 
+binmode STDOUT, ":utf8" or die "PANIC on binmode: $!";
+
 printf "%-${descrlen}s", "Test";
 for my $i (0..$#dirs) {
 	my $d = $dirs[$i];
diff --git a/t/perf/p0000-perf-lib-sanity.sh b/t/perf/p0000-perf-lib-sanity.sh
index cf8e1efce7..002c21e52a 100755
--- a/t/perf/p0000-perf-lib-sanity.sh
+++ b/t/perf/p0000-perf-lib-sanity.sh
@@ -33,6 +33,8 @@ test_perf 'export a weird var' '
 	test_export bar
 '
 
+test_perf 'éḿíẗ ńöń-ÁŚĆÍÍ ćḧáŕáćẗéŕś' 'true'
+
 test_expect_success 'test_export works with weird vars' '
 	echo "$bar" &&
 	test "$bar" = "weird # variable"
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH] t/perf: correctly align non-ASCII descriptions in output
  2017-04-21 19:44 [PATCH] t/perf: correctly align non-ASCII descriptions in output Ævar Arnfjörð Bjarmason
@ 2017-04-21 20:41 ` Jeff King
  2017-04-21 21:28   ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 6+ messages in thread
From: Jeff King @ 2017-04-21 20:41 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: git, Junio C Hamano, Thomas Rast

On Fri, Apr 21, 2017 at 07:44:28PM +0000, Ævar Arnfjörð Bjarmason wrote:

> Change the test descriptions from being treated as binary blobs by
> perl to being treated as UTF-8. This ensures that e.g. a test
> description like "æ" is counted as 1 character, not 2.
> 
> I have WIP performance tests for non-ASCII grep patterns on another
> topic that are affected by this.

Makes sense. As this is purely about test titles in our project,
choosing utf8 as the only encoding is quite sensible.

> diff --git a/t/perf/aggregate.perl b/t/perf/aggregate.perl
> index 924b19dab4..1dbc85b214 100755
> --- a/t/perf/aggregate.perl
> +++ b/t/perf/aggregate.perl
> @@ -88,6 +88,7 @@ for my $t (@tests) {
>  sub read_descr {
>  	my $name = shift;
>  	open my $fh, "<", $name or return "<error reading description>";
> +	binmode $fh, ":utf8" or die "PANIC on binmode: $!";

I thought there was some "use" flag we could set to just make all of our
handles utf8. But all I could come up with was stuff like PERLIO and
"perl -C". Using binmode isn't too bad, though (I think you could
just do it as part of the open, too, but I'm not sure if antique
versions of perl support that).

-Peff

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] t/perf: correctly align non-ASCII descriptions in output
  2017-04-21 20:41 ` Jeff King
@ 2017-04-21 21:28   ` Ævar Arnfjörð Bjarmason
  2017-04-21 21:35     ` Jeff King
  0 siblings, 1 reply; 6+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2017-04-21 21:28 UTC (permalink / raw)
  To: Jeff King; +Cc: Git Mailing List, Junio C Hamano, Thomas Rast

On Fri, Apr 21, 2017 at 10:41 PM, Jeff King <peff@peff.net> wrote:
> On Fri, Apr 21, 2017 at 07:44:28PM +0000, Ævar Arnfjörð Bjarmason wrote:
>
>> Change the test descriptions from being treated as binary blobs by
>> perl to being treated as UTF-8. This ensures that e.g. a test
>> description like "æ" is counted as 1 character, not 2.
>>
>> I have WIP performance tests for non-ASCII grep patterns on another
>> topic that are affected by this.
>
> Makes sense. As this is purely about test titles in our project,
> choosing utf8 as the only encoding is quite sensible.

*Nod*

>> diff --git a/t/perf/aggregate.perl b/t/perf/aggregate.perl
>> index 924b19dab4..1dbc85b214 100755
>> --- a/t/perf/aggregate.perl
>> +++ b/t/perf/aggregate.perl
>> @@ -88,6 +88,7 @@ for my $t (@tests) {
>>  sub read_descr {
>>       my $name = shift;
>>       open my $fh, "<", $name or return "<error reading description>";
>> +     binmode $fh, ":utf8" or die "PANIC on binmode: $!";
>
> I thought there was some "use" flag we could set to just make all of our
> handles utf8. But all I could come up with was stuff like PERLIO and
> "perl -C". Using binmode isn't too bad, though (I think you could
> just do it as part of the open, too, but I'm not sure if antique
> versions of perl support that).

[Debugging perl encoding issues is one of the many perks of my dayjob]

Using binmode like this is about as straightforward as you can get,
the former occurrence could be equivalently replaced by:

    utf8::decode(my $line = <$fh>);

But better just to mark the handle as utf8. There's a fancier way to
do it as part of the three-arg-open syntax, but I couldn't remember
whether all the perl versions we support have it.

About the "use" flag, you're probably thinking of the confusingly
named "use utf8", but that's to set your source code to utf8, not your
handles, e.g.:

$ perl -CA -MDevel::Peek -wE 'use utf8; my $日本語 = shift; Dump $日本語' æ
SV = PV(0x12cc090) at 0x12cded8
  REFCNT = 1
  FLAGS = (PADMY,POK,pPOK,UTF8)
  PV = 0x12de460 "\303\246"\0 [UTF8 "\x{e6}"]
  CUR = 2
  LEN = 16

As you can see people got a bit overexcited about Unicode in the 90s.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] t/perf: correctly align non-ASCII descriptions in output
  2017-04-21 21:28   ` Ævar Arnfjörð Bjarmason
@ 2017-04-21 21:35     ` Jeff King
  2017-04-21 22:02       ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 6+ messages in thread
From: Jeff King @ 2017-04-21 21:35 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Git Mailing List, Junio C Hamano, Thomas Rast

On Fri, Apr 21, 2017 at 11:28:42PM +0200, Ævar Arnfjörð Bjarmason wrote:

> > I thought there was some "use" flag we could set to just make all of our
> > handles utf8. But all I could come up with was stuff like PERLIO and
> > "perl -C". Using binmode isn't too bad, though (I think you could
> > just do it as part of the open, too, but I'm not sure if antique
> > versions of perl support that).
> 
> [Debugging perl encoding issues is one of the many perks of my dayjob]
> 
> Using binmode like this is about as straightforward as you can get,
> the former occurrence could be equivalently replaced by:
> 
>     utf8::decode(my $line = <$fh>);
> 
> But better just to mark the handle as utf8. There's a fancier way to
> do it as part of the three-arg-open syntax, but I couldn't remember
> whether all the perl versions we support have it.

Yeah, I agree marking the handle is better. binmode is pretty
straightforward, but we'd have to remember to manually set it if we add
any other handles. That's probably not a big deal in this particular
script, though, which is pretty short.

> About the "use" flag, you're probably thinking of the confusingly
> named "use utf8", but that's to set your source code to utf8, not your
> handles, e.g.:
> 
> $ perl -CA -MDevel::Peek -wE 'use utf8; my $日本語 = shift; Dump $日本語' æ
> SV = PV(0x12cc090) at 0x12cded8
>   REFCNT = 1
>   FLAGS = (PADMY,POK,pPOK,UTF8)
>   PV = 0x12de460 "\303\246"\0 [UTF8 "\x{e6}"]
>   CUR = 2
>   LEN = 16
> 
> As you can see people got a bit overexcited about Unicode in the 90s.

Yeah, I know "use utf8" doesn't work for that, but I was thinking there
was some other trick. Digging...ah, here it is:

  use open ':encoding(utf8)'

No clue how portable that is. For such a small script it may be better
to just stick with vanilla binmode().

-Peff

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] t/perf: correctly align non-ASCII descriptions in output
  2017-04-21 21:35     ` Jeff King
@ 2017-04-21 22:02       ` Ævar Arnfjörð Bjarmason
  2017-04-21 22:05         ` Jeff King
  0 siblings, 1 reply; 6+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2017-04-21 22:02 UTC (permalink / raw)
  To: Jeff King; +Cc: Git Mailing List, Junio C Hamano, Thomas Rast

On Fri, Apr 21, 2017 at 11:35 PM, Jeff King <peff@peff.net> wrote:
> On Fri, Apr 21, 2017 at 11:28:42PM +0200, Ævar Arnfjörð Bjarmason wrote:
>
>> > I thought there was some "use" flag we could set to just make all of our
>> > handles utf8. But all I could come up with was stuff like PERLIO and
>> > "perl -C". Using binmode isn't too bad, though (I think you could
>> > just do it as part of the open, too, but I'm not sure if antique
>> > versions of perl support that).
>>
>> [Debugging perl encoding issues is one of the many perks of my dayjob]
>>
>> Using binmode like this is about as straightforward as you can get,
>> the former occurrence could be equivalently replaced by:
>>
>>     utf8::decode(my $line = <$fh>);
>>
>> But better just to mark the handle as utf8. There's a fancier way to
>> do it as part of the three-arg-open syntax, but I couldn't remember
>> whether all the perl versions we support have it.
>
> Yeah, I agree marking the handle is better. binmode is pretty
> straightforward, but we'd have to remember to manually set it if we add
> any other handles. That's probably not a big deal in this particular
> script, though, which is pretty short.
>
>> About the "use" flag, you're probably thinking of the confusingly
>> named "use utf8", but that's to set your source code to utf8, not your
>> handles, e.g.:
>>
>> $ perl -CA -MDevel::Peek -wE 'use utf8; my $日本語 = shift; Dump $日本語' æ
>> SV = PV(0x12cc090) at 0x12cded8
>>   REFCNT = 1
>>   FLAGS = (PADMY,POK,pPOK,UTF8)
>>   PV = 0x12de460 "\303\246"\0 [UTF8 "\x{e6}"]
>>   CUR = 2
>>   LEN = 16
>>
>> As you can see people got a bit overexcited about Unicode in the 90s.
>
> Yeah, I know "use utf8" doesn't work for that, but I was thinking there
> was some other trick. Digging...ah, here it is:
>
>   use open ':encoding(utf8)'
>
> No clue how portable that is. For such a small script it may be better
> to just stick with vanilla binmode().

Yeah that would work, but doesn't work on 5.8.0, which is the lowest
version we support.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] t/perf: correctly align non-ASCII descriptions in output
  2017-04-21 22:02       ` Ævar Arnfjörð Bjarmason
@ 2017-04-21 22:05         ` Jeff King
  0 siblings, 0 replies; 6+ messages in thread
From: Jeff King @ 2017-04-21 22:05 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Git Mailing List, Junio C Hamano, Thomas Rast

On Sat, Apr 22, 2017 at 12:02:02AM +0200, Ævar Arnfjörð Bjarmason wrote:

> > Yeah, I know "use utf8" doesn't work for that, but I was thinking there
> > was some other trick. Digging...ah, here it is:
> >
> >   use open ':encoding(utf8)'
> >
> > No clue how portable that is. For such a small script it may be better
> > to just stick with vanilla binmode().
> 
> Yeah that would work, but doesn't work on 5.8.0, which is the lowest
> version we support.

Thanks for looking that up.

For the benefit of the maintainer, I'll summarize: Ævar's original patch
is fine as-is.

-Peff

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2017-04-21 22:05 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-04-21 19:44 [PATCH] t/perf: correctly align non-ASCII descriptions in output Ævar Arnfjörð Bjarmason
2017-04-21 20:41 ` Jeff King
2017-04-21 21:28   ` Ævar Arnfjörð Bjarmason
2017-04-21 21:35     ` Jeff King
2017-04-21 22:02       ` Ævar Arnfjörð Bjarmason
2017-04-21 22:05         ` Jeff King

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).