From: Julien Moutinho <julm+git@autogeree.net>
To: git@vger.kernel.org
Cc: Junio C Hamano <gitster@pobox.com>
Subject: gitweb: HTML output is not always encoded in UTF-8 when using --fastcgi.
Date: Sat, 8 Sep 2018 19:15:32 +0200
Message-ID: <20180908171532.uyz7i5oy6w6otp2r@localhost>
Description
===========
An old (2011) description of the problem is here:
https://stackoverflow.com/questions/7285215/nginx-fastcgi-utf-8-encoding-output-iso-8859-1-instead-of-utf8#answer-18149487
Basically, gitweb's HTML output is not always encoded in UTF-8
when using --fastcgi.
Reproduction
============
gitweb v2.18.0
perl v5.28.0
| echo Système >test.git/description
According to the 2011 problem report,
the problem only appears when running gitweb.cgi --fastcgi,
not when gitweb.cgi is spawned by fcgiwrap.
Apparently, it also only appears when every character of the text
can be converted to ISO-8859-1. As soon as the text contains one
character that cannot, the output is UTF-8 encoded after all
(not sure by what), which made this bug harder to spot.
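The switch between ISO-8859-1 and UTF-8 output described above may come from the same rule that Perl's built-in print applies to any handle without an encoding layer. If every character of a string fits in 0x00-0xFF, print writes one byte per character. Otherwise it writes the string's internal UTF-8 bytes and emits a "Wide character" warning.

The minimal sketch below demonstrates that rule using an in-memory filehandle as a stand-in for the FCGI stream. Two points are assumptions: that FCGI.pm's PRINT behaves exactly the same way, and the helper raw_bytes_of(), which is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of gitweb): print $text to an in-memory
# filehandle that has no :utf8 layer and return the bytes written.
sub raw_bytes_of {
    my ($text) = @_;
    open my $fh, '>', \my $buf or die "cannot open in-memory handle: $!";
    {
        no warnings 'utf8';    # silence the "Wide character in print" warning
        print {$fh} $text;
    }
    close $fh or die "close failed: $!";
    return $buf;
}

# Every character fits in 0x00-0xFF: "è" comes out as the Latin-1 byte E8.
my $latin = raw_bytes_of("Syst\x{e8}me");
# U+2192 does not fit: the whole string comes out as UTF-8, "è" included.
my $wide  = raw_bytes_of("Syst\x{e8}me \x{2192}");

printf "latin: %vX\n", $latin;    # 53.79.73.74.E8.6D.65
printf "wide:  %vX\n", $wide;     # 53.79.73.74.C3.A8.6D.65.20.E2.86.92
```

Only the second string happens to produce valid UTF-8. This is why a page containing at least one such character looks correct.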
Explanation
===========
According to Christian Hansen (chansen), the problem is that:
> FCGI streams are implemented using the older stream API,
> TIEHANDLE. Applying PerlIO layers using binmode() has no effect.
https://stackoverflow.com/questions/5005104/how-to-force-fastcgi-to-encode-form-data-as-utf-8-as-cgi-pm-has-option#answer-7097698
https://perldoc.perl.org/perltie.html
Indeed:
> FCGI.pm isn't Unicode aware,
> only characters within the range 0x00-0xFF are supported.
https://metacpan.org/pod/FCGI#LIMITATIONS
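But, as stated in gitweb's to_utf8():
> gitweb writes out in utf-8
> thanks to "binmode STDOUT, ':utf8'" at beginning
With --fastcgi, STDOUT is an FCGI stream, so that binmode() has no effect
and gitweb's decoded strings reach FCGI.pm without being encoded.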
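The following sketch shows why binmode() cannot help here. It ties a handle to RecordingHandle, a hypothetical class that stands in for FCGI::Stream. This is an assumption: the real class also writes to the FastCGI connection.

Calling binmode() on a tied handle only invokes the class's BINMODE method, so no :utf8 layer is ever pushed. PRINT therefore receives Perl's character strings unchanged:

```perl
use strict;
use warnings;

# Hypothetical stand-in for FCGI::Stream: a TIEHANDLE class whose PRINT
# just records the strings it receives.
package RecordingHandle;
sub TIEHANDLE { my ($class) = @_; return bless { got => [] }, $class }
sub PRINT     { my $self = shift; push @{ $self->{got} }, @_; return 1 }
sub BINMODE   { return 1 }    # invoked by binmode(); pushes no PerlIO layer

package main;

tie *OUT, 'RecordingHandle';
binmode OUT, ':utf8';         # dispatched to BINMODE, so it changes nothing
print OUT "Syst\x{e8}me";

my $got = (tied *OUT)->{got}[0];
# PRINT got the character string as-is: 7 characters, with "è" still the
# single character U+00E8 rather than the UTF-8 byte pair C3 A8.
printf "length %d, 5th char U+%04X\n", length $got, ord substr($got, 4, 1);
```

The tied PRINT alone decides which bytes those characters become on the wire.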
Correction
==========
Christian Hansen suggested that:
"The proper solution would be to encode your data before outputting it,
but if that's not an option I can offer this hotpatch:"
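| use Encode ();  # load Encode so the FB_CROAK/LEAVE_SRC constants are known
| my $enc = Encode::find_encoding('UTF-8');
| my $org = \&FCGI::Stream::PRINT;
| no warnings 'redefine';
| local *FCGI::Stream::PRINT = sub {
|     my @OUTPUT = @_;
|     # $_[0] is the stream object: encode every string after it to UTF-8
|     # octets, croaking on bad input. LEAVE_SRC stops encode() from
|     # modifying the caller's strings, which @_ aliases.
|     for (my $i = 1; $i < @_; $i++) {
|         $OUTPUT[$i] = $enc->encode($_[$i], Encode::FB_CROAK|Encode::LEAVE_SRC);
|     }
|     @_ = @OUTPUT;
|     goto &$org;  # tail-call the original PRINT with the encoded @_
| };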
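As a quick workaround, this hotpatch can even be put in $GITWEB_CONFIG
by removing the `local` before `*FCGI::Stream::PRINT`.
Otherwise, `local` would restore the original PRINT
as soon as the configuration file has been evaluated.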
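The proper solution mentioned above is to encode data before outputting it. It could look like the following sketch. Two parts are illustrative only: print_utf8() is a hypothetical helper, not an existing gitweb function, and an in-memory filehandle stands in for STDOUT.

Because the strings are turned into UTF-8 octets first, every value handed to the handle lies in the 0x00-0xFF range that FCGI.pm supports. The result therefore no longer depends on binmode():

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper (not in gitweb): encode each character string to
# UTF-8 octets before printing, so the handle only ever sees bytes.
sub print_utf8 {
    my ($fh, @strings) = @_;
    return print {$fh} map { Encode::encode('UTF-8', $_) } @strings;
}

open my $out, '>', \my $buf or die "cannot open in-memory handle: $!";
print_utf8($out, "Syst\x{e8}me \x{2192}");
close $out or die "close failed: $!";

printf "%vX\n", $buf;    # 53.79.73.74.C3.A8.6D.65.20.E2.86.92
```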