git@vger.kernel.org mailing list mirror (one of many)
From: Junio C Hamano <gitster@pobox.com>
To: Shin Kojima <shin@kojima.org>
Cc: git@vger.kernel.org, Christopher Wilson <cwilson@cdwilson.us>,
	Jakub Narebski <jnareb@gmail.com>
Subject: Re: [PATCH] gitweb: apply fallback encoding before highlight
Date: Mon, 02 May 2016 10:49:41 -0700
Message-ID: <xmqqbn4ouz7u.fsf@gitster.mtv.corp.google.com>
In-Reply-To: <1461151948-38583-1-git-send-email-shin@kojima.org> (Shin Kojima's message of "Wed, 20 Apr 2016 20:32:28 +0900")

Shin Kojima <shin@kojima.org> writes:

> Some multi-byte character encodings (such as Shift_JIS and GBK) have
> characters whose final byte is an ASCII '\' (0x5c), and they
> will be displayed as garbled characters even if $fallback_encoding is
> correct.  This is because the `highlight` command always expects
> UTF-8 encoded strings from STDIN.
>
>     $ echo 'my $v = "申";' | highlight --syntax perl | w3m -T text/html -dump
>     my $v = "申";
>
>     $ echo 'my $v = "申";' | iconv -f UTF-8 -t Shift_JIS | highlight \
>         --syntax perl | iconv -f Shift_JIS -t UTF-8 | w3m -T text/html -dump
>
>     iconv: (stdin):9:135: cannot convert
>     my $v = "
>
> This patch converts git blob contents to UTF-8 before highlighting,
> in the same manner as the `to_utf8` subroutine.
> ---

The Perl one-liner invoked from the script felt a bit too dense
for my taste, but other than that I have no complaints about what
the patched code does.

Jakub, does it look good to you, too?

Please sign off your patch (see Documentation/SubmittingPatches).

Thanks.


>  gitweb/gitweb.perl | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
> index 05d7910..2fddf75 100755
> --- a/gitweb/gitweb.perl
> +++ b/gitweb/gitweb.perl
> @@ -3935,6 +3935,9 @@ sub run_highlighter {
>  
>  	close $fd;
>  	open $fd, quote_command(git_cmd(), "cat-file", "blob", $hash)." | ".
> +	          quote_command($^X, '-CO', '-MEncode=decode,FB_DEFAULT', '-pse',
> +	            '$_ = decode($fe, $_, FB_DEFAULT) if !utf8::decode($_);',
> +	            '--', "-fe=$fallback_encoding")." | ".
>  	          quote_command($highlight_bin).
>  	          " --replace-tabs=8 --fragment --syntax $syntax |"
>  		or die_error(500, "Couldn't open file or run syntax highlighter");
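
For readers who find the one-liner hard to parse, the logic it adds
to the pipeline can be sketched roughly as below.  This is only an
illustration in Python, not the patch itself: the function name
to_utf8_text is hypothetical, and errors="replace" is assumed to be a
rough stand-in for Encode's FB_DEFAULT substitution behaviour.

```python
# Illustrative sketch only; gitweb does this in Perl, not Python.

def to_utf8_text(raw: bytes, fallback_encoding: str) -> str:
    """Decode blob bytes as UTF-8, else with the fallback encoding."""
    try:
        # Mirrors the utf8::decode($_) attempt in the one-liner.
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Assumption: errors="replace" approximates FB_DEFAULT, which
        # substitutes a replacement character for undecodable input.
        return raw.decode(fallback_encoding, errors="replace")


# The character from the commit message ends in an ASCII backslash
# (0x5c) when encoded as Shift_JIS, and is not valid UTF-8.
sjis_bytes = "申".encode("shift_jis")
assert sjis_bytes[-1] == 0x5C

# After the fallback decode, the text can be re-emitted as UTF-8,
# which is what `highlight` expects on its standard input.
utf8_bytes = to_utf8_text(sjis_bytes, "shift_jis").encode("utf-8")
```

Input that is already valid UTF-8 takes the first branch and passes
through unchanged, so the fallback only applies to blobs that fail to
decode as UTF-8.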


Thread overview: 8+ messages
2016-04-20 11:32 [PATCH] gitweb: apply fallback encoding before highlight Shin Kojima
2016-05-02 17:49 ` Junio C Hamano [this message]
2016-05-02 18:12   ` Jakub Narębski
2016-05-03 13:00   ` [PATCH v2] " Shin Kojima
2016-05-03 18:33     ` Junio C Hamano
2016-05-04  8:34       ` Shin Kojima
2016-05-04 19:34         ` Junio C Hamano
2016-05-05 10:22           ` Shin Kojima
