* gitweb: HTML output is not always encoded in UTF-8 when using --fastcgi.
From: Julien Moutinho @ 2018-09-08 17:15 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano

Description
===========
An old (2011) description of the problem is here:
https://stackoverflow.com/questions/7285215/nginx-fastcgi-utf-8-encoding-output-iso-8859-1-instead-of-utf8#answer-18149487

Basically, gitweb's HTML output is not always encoded in UTF-8
when using --fastcgi.


Reproduction
============
gitweb v2.18.0
perl   v5.28.0

| echo Système >test.git/description

According to the 2011 problem report,
the problem only appears when using gitweb.cgi --fastcgi,
not when gitweb.cgi is spawned by fcgiwrap.

Apparently, the problem only shows when every character of the text
can be converted to ISO-8859-1: as soon as one character cannot,
the output is encoded in UTF-8 (not sure by what),
which made this bug harder to spot.
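
A minimal sketch of what I suspect happens, using only core Perl and not
FCGI itself: a character string whose code points all fit in one byte can be
downgraded to ISO-8859-1 bytes, while a string containing any wider character
cannot, so its UTF-8 form is written instead. The helper name bytes_written
is hypothetical, and whether FCGI takes exactly this path is an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper: return the bytes a byte-oriented handle would
# plausibly emit for a character string. Latin-1 when every code point
# fits in one byte, the UTF-8 encoding otherwise.
sub bytes_written {
    my ($text) = @_;
    my $copy = $text;
    if (utf8::downgrade($copy, 1)) {    # 1 = return false instead of dying
        return $copy;                   # one byte per character
    }
    utf8::encode($copy);                # fall back to UTF-8 octets
    return $copy;
}

my $latin1_only = "Syst\x{E8}me";           # "Système"
my $with_wide   = "Syst\x{E8}me \x{20AC}";  # same text plus a euro sign

# Dump both results as hex to see which encoding each one ended up in.
for my $text ($latin1_only, $with_wide) {
    my $bytes = bytes_written($text);
    printf "%2d bytes: %s\n", length $bytes,
        join ' ', map { sprintf '%02X', ord } split //, $bytes;
}
```

In the first string the è comes out as the single byte E8, in the second
as the pair C3 A8, which would explain why adding one non-Latin-1 character
hides the bug.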


Explanation
===========
According to Christian Hansen (chansen), the problem is that:
> FCGI streams are implemented using the older stream API,
> TIEHANDLE. Applying PerlIO layers using binmode() has no effect.
https://stackoverflow.com/questions/5005104/how-to-force-fastcgi-to-encode-form-data-as-utf-8-as-cgi-pm-has-option#answer-7097698
https://perldoc.perl.org/perltie.html

Indeed:
> FCGI.pm isn't Unicode aware,
> only characters within the range 0x00-0xFF are supported.
https://metacpan.org/pod/FCGI#LIMITATIONS

But, as stated in gitweb's to_utf8():
> gitweb writes out in utf-8
> thanks to "binmode STDOUT, ':utf8'" at beginning
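
To illustrate why binmode() does not help on a tied handle, here is a small
sketch with a hypothetical CaptureHandle class standing in for FCGI::Stream.
Its no-op BINMODE is an assumption made for the demonstration; the point is
only that PRINT receives the character string as-is, with no PerlIO layer
applied in between.

```perl
use strict;
use warnings;

package CaptureHandle;    # hypothetical stand-in for a tied stream class

sub TIEHANDLE { my ($class) = @_; return bless { chunks => [] }, $class }

# Record whatever print() hands over, untouched.
sub PRINT { my ($self, @args) = @_; push @{ $self->{chunks} }, @args; return 1 }

# Accept the binmode() call but apply no layer (assumed behaviour).
sub BINMODE { return 1 }

package main;

my $obj = tie *OUT, 'CaptureHandle';
binmode OUT, ':utf8';         # dispatched to BINMODE, no encoding layer added
print OUT "Syst\x{E8}me";     # dispatched to PRINT

my $received = $obj->{chunks}[0];
printf "PRINT received %d characters\n", length $received;
```

If the :utf8 layer had been applied, 8 octets would have been received;
here PRINT still gets the 7 original characters and has to deal with
the encoding itself.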


Correction
==========
Christian Hansen suggested that:
"The proper solution would be to encode your data before outputting it,
but if thats not an option I can offer this hotpatch:"

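| # Resolve the encoder once and keep a reference to the original PRINT.
| my $enc = Encode::find_encoding('UTF-8');
| my $org = \&FCGI::Stream::PRINT;
| no warnings 'redefine';
| local *FCGI::Stream::PRINT = sub {
|     my @OUTPUT = @_;
|     # $_[0] is the tied stream object; encode only the data arguments.
|     for (my $i = 1; $i < @_; $i++) {
|         $OUTPUT[$i] = $enc->encode($_[$i], Encode::FB_CROAK|Encode::LEAVE_SRC);
|     }
|     @_ = @OUTPUT;
|     goto $org;   # tail-call the original PRINT with the encoded arguments
| };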

As a quick workaround this hotpatch can even be put in $GITWEB_CONFIG
by removing the `local` before `*FCGI::Stream::PRINT`.
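
For comparison, the "proper solution" quoted above (encoding the data before
outputting it) might look like the following sketch. The helper print_utf8 is
hypothetical and not part of gitweb; an in-memory handle is used in place of
the FastCGI stream so that the result can be inspected.

```perl
use strict;
use warnings;
use Encode ();

my $utf8 = Encode::find_encoding('UTF-8');

# Hypothetical helper: encode every chunk to UTF-8 octets, then print them.
# LEAVE_SRC keeps the caller's strings unmodified, FB_CROAK dies on bad input.
sub print_utf8 {
    my ($fh, @chunks) = @_;
    return print {$fh}
        map { $utf8->encode($_, Encode::FB_CROAK | Encode::LEAVE_SRC) } @chunks;
}

# Write to an in-memory buffer instead of a real output stream.
open my $fh, '>', \my $buffer or die "cannot open in-memory handle: $!";
print_utf8($fh, "Syst\x{E8}me\n");
close $fh;

printf "%d bytes written\n", length $buffer;
```

The handle only ever sees octets this way, so it no longer matters whether
it understands PerlIO layers.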


* Re: gitweb: HTML output is not always encoded in UTF-8 when using --fastcgi.
From: Julien Moutinho @ 2018-09-08 18:14 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano

On Sat, 08 Sep 2018 at 19:15:32 +0200, Julien Moutinho wrote:
> As a quick workaround this hotpatch can even be put in $GITWEB_CONFIG
> by removing the `local` before `*FCGI::Stream::PRINT`.

It turns out to require more care than that:
with $per_request_config enabled, $GITWEB_CONFIG is reloaded at each request,
so FCGI::Stream::PRINT gets wrapped again every time
and the output ends up encoded more than once.
Guarding the redefinition with $first_request seems to work(around):
| if ($first_request) {
|         my $enc = Encode::find_encoding('UTF-8');
|         my $org = \&FCGI::Stream::PRINT;
|         no warnings 'redefine';
|         *FCGI::Stream::PRINT = sub {
|             my @OUTPUT = @_;
|             for (my $i = 1; $i < @_; $i++) {
|                 $OUTPUT[$i] = $enc->encode($_[$i], Encode::FB_CROAK|Encode::LEAVE_SRC);
|             }
|             @_ = @OUTPUT;
|             goto $org;
|         };
| }
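
The double encoding can be reproduced without FCGI. In this sketch,
Demo::Stream is a hypothetical stand-in for FCGI::Stream and
install_wrapper() plays the role of the config file being evaluated;
calling it a second time simulates a reload without the $first_request guard.

```perl
use strict;
use warnings;
use Encode ();

package Demo::Stream;    # hypothetical stand-in for FCGI::Stream
our @SENT;
sub PRINT { my ($self, @chunks) = @_; push @SENT, @chunks; return 1 }

package main;

# Wrap Demo::Stream::PRINT so that it encodes its data arguments to UTF-8.
# Each call wraps whatever PRINT currently is, including an earlier wrapper.
sub install_wrapper {
    my $enc = Encode::find_encoding('UTF-8');
    my $org = \&Demo::Stream::PRINT;
    no warnings 'redefine';
    *Demo::Stream::PRINT = sub {
        my ($self, @chunks) = @_;
        @_ = ($self,
              map { $enc->encode($_, Encode::FB_CROAK | Encode::LEAVE_SRC) } @chunks);
        goto $org;
    };
    return;
}

install_wrapper();
Demo::Stream->PRINT("\x{E8}");
my $once = length $Demo::Stream::SENT[-1];

install_wrapper();    # simulates the config being reloaded
Demo::Stream->PRINT("\x{E8}");
my $twice = length $Demo::Stream::SENT[-1];

print "wrapped once: $once bytes, wrapped twice: $twice bytes\n";
```

With a single wrapper the è becomes the two octets C3 A8; with two stacked
wrappers those octets are encoded again into four, which is the mess
the guard avoids.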
