user/dev discussion of public-inbox itself
From: Eric Wong <e@80x24.org>
To: meta@public-inbox.org
Subject: [PATCH 1/6] viewvcs: cleanup utf8 handling
Date: Tue,  5 Feb 2019 11:10:48 +0000
Message-ID: <20190205111053.7155-2-e@80x24.org>
In-Reply-To: <20190205111053.7155-1-e@80x24.org>

Favor in-place utf8::decode over Encode's ->decode method, since
it's a bit faster without the method dispatch overhead; and don't
care about validity just yet.
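A minimal sketch of the difference, using a small hand-made byte string rather than a real git blob. The Encode object's ->decode returns a new character string. utf8::decode rewrites its argument in place, and on malformed input it returns false and leaves the bytes untouched. That last case is the validity question this patch leaves for later.

```perl
use strict;
use warnings;
use Encode qw(find_encoding);

my $enc_utf8 = find_encoding('UTF-8');

# "caf\xc3\xa9" is the UTF-8 byte sequence for "café"
my $bytes = "caf\xc3\xa9";

# old approach: method call on an Encode object, returns a new string
my $via_encode = $enc_utf8->decode($bytes);

# new approach: decode in place, no method dispatch
my $via_utf8 = $bytes;
my $ok = utf8::decode($via_utf8);

# both yield the same 4-character string; prints "4 4 1"
printf "%d %d %d\n", length($via_encode), length($via_utf8), $ok ? 1 : 0;

# malformed input (a lone Latin-1 byte): utf8::decode returns false
# and leaves the 4 bytes as they were; prints "0 4"
my $bad = "caf\xe9";
my $bad_ok = utf8::decode($bad);
printf "%d %d\n", $bad_ok ? 1 : 0, length($bad);
```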

HlMod->do_hl itself should return "utf8" strings since other
parts of our code can use it; it's not the job of ViewVCS to
post-process HlMod output.
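The do_hl contract above can be sketched as a wrapper that decodes once before returning. Both fake_generate and do_hl_sketch are hypothetical. fake_generate stands in for the SWIG generateString call and does no real highlighting. It is assumed to return UTF-8 octets without Perl's UTF-8 flag. do_hl_sketch only illustrates the idea and is not the actual HlMod code.

```perl
use strict;
use warnings;

# hypothetical stand-in for the SWIG binding: returns raw UTF-8 octets
sub fake_generate {
	my ($src) = @_;
	my $octets = $src;
	utf8::encode($octets); # characters -> UTF-8 bytes, as a C library would emit
	return "<pre>$octets</pre>";
}

# hypothetical do_hl-style wrapper: decode once, return a scalar ref
sub do_hl_sketch {
	my ($str) = @_; # reference to the source text
	my $out = fake_generate($$str);
	utf8::decode($out); # mark result as characters (only if valid UTF-8)
	\$out;
}

my $src = "caf\x{e9}"; # 4 characters
my $ref = do_hl_sketch(\$src);

# the caller uses the result directly, no second decode; prints 15
print length($$ref), "\n";
```

Because the wrapper returns characters, a caller like ViewVCS can pass $$ref straight on to escaping or output code without decoding it again.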
---
 lib/PublicInbox/HlMod.pm   | 7 ++++++-
 lib/PublicInbox/ViewVCS.pm | 6 ++----
 t/hl_mod.t                 | 1 +
 3 files changed, 9 insertions(+), 5 deletions(-)

diff --git a/lib/PublicInbox/HlMod.pm b/lib/PublicInbox/HlMod.pm
index 237ffac..decfd71 100644
--- a/lib/PublicInbox/HlMod.pm
+++ b/lib/PublicInbox/HlMod.pm
@@ -107,7 +107,12 @@ sub do_hl {
 		$g->setEncoding('utf-8');
 		$g;
 	};
-	\($gen->generateString($$str))
+
+	# we assume $$str is valid UTF-8, but the SWIG binding doesn't
+	# know that, so ensure it's marked as UTF-8 even if it isn't...
+	my $out = $gen->generateString($$str);
+	utf8::decode($out);
+	\$out;
 }
 
 # SWIG instances aren't reference-counted, but $self is;
diff --git a/lib/PublicInbox/ViewVCS.pm b/lib/PublicInbox/ViewVCS.pm
index d67b5eb..acdd822 100644
--- a/lib/PublicInbox/ViewVCS.pm
+++ b/lib/PublicInbox/ViewVCS.pm
@@ -16,7 +16,6 @@
 package PublicInbox::ViewVCS;
 use strict;
 use warnings;
-use Encode qw(find_encoding);
 use PublicInbox::SolverGit;
 use PublicInbox::WwwStream;
 use PublicInbox::Linkify;
@@ -33,7 +32,6 @@ END { $hl = undef };
 
 my %QP_MAP = ( A => 'oid_a', B => 'oid_b', a => 'path_a', b => 'path_b' );
 my $max_size = 1024 * 1024; # TODO: configurable
-my $enc_utf8 = find_encoding('UTF-8');
 my $BIN_DETECT = 8000; # same as git
 
 sub html_page ($$$) {
@@ -122,14 +120,14 @@ sub solve_result {
 		return html_page($ctx, 200, \$log);
 	}
 
-	$$blob = $enc_utf8->decode($$blob);
+	# TODO: detect + convert to ensure validity
+	utf8::decode($$blob);
 	my $nl = ($$blob =~ tr/\n/\n/);
 	my $pad = length($nl);
 
 	$l->linkify_1($$blob);
 	my $ok = $hl->do_hl($blob, $path) if $hl;
 	if ($ok) {
-		$$ok = $enc_utf8->decode($$ok);
 		src_escape($$ok);
 		$blob = $ok;
 	} else {
diff --git a/t/hl_mod.t b/t/hl_mod.t
index 80f8890..c402f1f 100644
--- a/t/hl_mod.t
+++ b/t/hl_mod.t
@@ -19,6 +19,7 @@ my $orig = $str;
 {
 	my $ref = $hls->do_hl(\$str, 'foo.perl');
 	is(ref($ref), 'SCALAR', 'got a scalar reference back');
+	ok(utf8::valid($$ref), 'resulting string is utf8::valid');
 	like($$ref, qr/I can see you!/, 'we can see ourselves in output');
 	like($$ref, qr/&amp;&amp;/, 'escaped');
 
-- 
EW


Thread overview: 7+ messages
2019-02-05 11:10 [PATCH 0/6] highlighting cleanups + help update Eric Wong
2019-02-05 11:10 ` [PATCH 1/6] viewvcs: cleanup utf8 handling Eric Wong [this message]
2019-02-05 11:10 ` [PATCH 2/6] hlmod: hoist out do_hl_lang sub Eric Wong
2019-02-05 11:10 ` [PATCH 3/6] hlmod: make into a singleton Eric Wong
2019-02-05 11:10 ` [PATCH 4/6] hlmod: do_hl* performs src_escape immediately Eric Wong
2019-02-05 11:10 ` [PATCH 5/6] hlmod: support "```$LANG" blocks in text Eric Wong
2019-02-05 11:10 ` [PATCH 6/6] wwwtext: inline sample CSS and use highlight Eric Wong

Archives are clonable:
	git clone --mirror https://public-inbox.org/meta
	git clone --mirror http://czquwvybam4bgbro.onion/meta
	git clone --mirror http://hjrcffqmbrq6wope.onion/meta
	git clone --mirror http://ou63pmih66umazou.onion/meta

Newsgroups are available over NNTP:
	nntp://news.public-inbox.org/inbox.comp.mail.public-inbox.meta
	nntp://ou63pmih66umazou.onion/inbox.comp.mail.public-inbox.meta
	nntp://czquwvybam4bgbro.onion/inbox.comp.mail.public-inbox.meta
	nntp://hjrcffqmbrq6wope.onion/inbox.comp.mail.public-inbox.meta
	nntp://news.gmane.org/gmane.mail.public-inbox.general

 note: .onion URLs require Tor: https://www.torproject.org/

AGPL code for this site: git clone https://public-inbox.org/ public-inbox