user/dev discussion of public-inbox itself
 help / color / Atom feed
* [PATCH 00/22] HTML display cleanups, fixes, speedups
@ 2020-01-25  4:44 Eric Wong
  2020-01-25  4:44 ` [PATCH 01/22] www*stream: favor \&close instead of *close Eric Wong
                   ` (21 more replies)
  0 siblings, 22 replies; 23+ messages in thread
From: Eric Wong @ 2020-01-25  4:44 UTC (permalink / raw)
  To: meta

There's a lot more to do, but there's a couple of fixes for diff
viewing and href generation for solver.  ViewDiff.pm is
significantly easier-to-read and follow <span> tags for.

Eric Wong (22):
  www*stream: favor \&close instead of *close
  www: use "skel" terminology consistently
  view: improve readability around walk_thread
  searchview: keep $noop sub private to the package
  view: reduce parameters for html_footer
  view: thread_skel: drop constant tpfx parameter
  view: simplify duplicate Message-ID handling
  wwwstream: discard single-use $ctx fields after use
  view: start performing buffering into {obuf}
  t/plack.t: modernize and unindent
  init: use Import::run_die instead of system()
  tests: move the majority of t/view.t into t/plack.t
  xt/perf-msgview: switch to multipart_text_as_html
  view: inline and eliminate msg_html
  linkify: compile $LINK_RE once
  linkify: move to_html over from ViewDiff
  searchidx: skip filenames on "diff --git ..."
  searchidx: don't assume "a/" and "b/" as prefixes
  viewdiff: add "b=" param with non-standard diff prefix
  viewdiff: add "b=" param when missing "diff --git" line
  viewdiff: use autovivification for long_path hash
  viewdiff: rewrite and simplify

 lib/PublicInbox/Linkify.pm       |   4 +-
 lib/PublicInbox/SearchIdx.pm     |  12 +-
 lib/PublicInbox/SearchView.pm    |  14 +-
 lib/PublicInbox/View.pm          | 214 +++++++--------
 lib/PublicInbox/ViewDiff.pm      | 283 ++++++++++---------
 lib/PublicInbox/ViewVCS.pm       |  10 +-
 lib/PublicInbox/WwwAtomStream.pm |  10 +-
 lib/PublicInbox/WwwListing.pm    |   3 +-
 lib/PublicInbox/WwwStream.pm     |  21 +-
 script/public-inbox-init         |  10 +-
 t/mid.t                          |   8 +-
 t/plack.t                        | 456 ++++++++++++++++++-------------
 t/view.t                         | 207 ++------------
 xt/perf-msgview.t                |  10 +-
 xt/solver.t                      |   2 +-
 15 files changed, 564 insertions(+), 700 deletions(-)

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH 01/22] www*stream: favor \&close instead of *close
  2020-01-25  4:44 [PATCH 00/22] HTML display cleanups, fixes, speedups Eric Wong
@ 2020-01-25  4:44 ` Eric Wong
  2020-01-25  4:44 ` [PATCH 02/22] www: use "skel" terminology consistently Eric Wong
                   ` (20 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: Eric Wong @ 2020-01-25  4:44 UTC (permalink / raw)
  To: meta

Be explicit that we're making a code reference, and not
a reference to a scalar, array, hash, or IO...
---
 lib/PublicInbox/WwwAtomStream.pm | 2 +-
 lib/PublicInbox/WwwStream.pm     | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/lib/PublicInbox/WwwAtomStream.pm b/lib/PublicInbox/WwwAtomStream.pm
index 9ec1383d..cb7ffe35 100644
--- a/lib/PublicInbox/WwwAtomStream.pm
+++ b/lib/PublicInbox/WwwAtomStream.pm
@@ -23,7 +23,7 @@ sub new {
 	my ($class, $ctx, $cb) = @_;
 	$ctx->{emit_header} = 1;
 	$ctx->{feed_base_url} = $ctx->{-inbox}->base_url($ctx->{env});
-	bless { cb => $cb || *close, ctx => $ctx }, $class;
+	bless { cb => $cb || \&close, ctx => $ctx }, $class;
 }
 
 sub response {
diff --git a/lib/PublicInbox/WwwStream.pm b/lib/PublicInbox/WwwStream.pm
index a724d069..ef5897b2 100644
--- a/lib/PublicInbox/WwwStream.pm
+++ b/lib/PublicInbox/WwwStream.pm
@@ -23,7 +23,7 @@ sub new {
 	chop $base_url; # no trailing slash for clone
 	bless {
 		nr => 0,
-		cb => $cb || *close,
+		cb => $cb || \&close,
 		ctx => $ctx,
 		base_url => $base_url,
 	}, $class;

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH 02/22] www: use "skel" terminology consistently
  2020-01-25  4:44 [PATCH 00/22] HTML display cleanups, fixes, speedups Eric Wong
  2020-01-25  4:44 ` [PATCH 01/22] www*stream: favor \&close instead of *close Eric Wong
@ 2020-01-25  4:44 ` Eric Wong
  2020-01-25  4:44 ` [PATCH 03/22] view: improve readability around walk_thread Eric Wong
                   ` (19 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: Eric Wong @ 2020-01-25  4:44 UTC (permalink / raw)
  To: meta

This saves us a few comments and confusion.   Yes, it's a
destination so "dst" can be appropriate, but we may be using
that term elsewhere.
---
 lib/PublicInbox/SearchView.pm |  4 +--
 lib/PublicInbox/View.pm       | 46 +++++++++++++++++------------------
 2 files changed, 25 insertions(+), 25 deletions(-)

diff --git a/lib/PublicInbox/SearchView.pm b/lib/PublicInbox/SearchView.pm
index 0d2e71fd..584b88ed 100644
--- a/lib/PublicInbox/SearchView.pm
+++ b/lib/PublicInbox/SearchView.pm
@@ -277,7 +277,7 @@ sub mset_thread {
 	$ctx->{-upfx} = '';
 	$ctx->{anchor_idx} = 1;
 	$ctx->{cur_level} = 0;
-	$ctx->{dst} = \$skel;
+	$ctx->{skel} = \$skel;
 	$ctx->{mapping} = {};
 	$ctx->{searchview} = 1;
 	$ctx->{prev_attr} = '';
@@ -303,7 +303,7 @@ sub mset_thread_i {
 		return PublicInbox::View::index_entry($smsg, $ctx,
 							scalar @$msgs);
 	}
-	my ($skel) = delete @$ctx{qw(dst msgs)};
+	my ($skel) = delete @$ctx{qw(skel msgs)};
 	$$skel .= "\n</pre>";
 }
 
diff --git a/lib/PublicInbox/View.pm b/lib/PublicInbox/View.pm
index d88b34da..66b9c92c 100644
--- a/lib/PublicInbox/View.pm
+++ b/lib/PublicInbox/View.pm
@@ -412,7 +412,7 @@ sub thread_index_entry {
 
 sub stream_thread_i { # PublicInbox::WwwStream::getline callback
 	my ($nr, $ctx) = @_;
-	return unless exists($ctx->{dst});
+	return unless exists($ctx->{skel});
 	my $q = $ctx->{-queue};
 	while (@$q) {
 		my $level = shift @$q;
@@ -425,7 +425,7 @@ sub stream_thread_i { # PublicInbox::WwwStream::getline callback
 			return ghost_index_entry($ctx, $level, $node);
 		}
 	}
-	join('', thread_adj_level($ctx, 0)) . ${delete $ctx->{dst}}; # skel
+	join('', thread_adj_level($ctx, 0)) . ${delete $ctx->{skel}};
 }
 
 sub stream_thread ($$) {
@@ -449,6 +449,7 @@ sub stream_thread ($$) {
 	PublicInbox::WwwStream->response($ctx, 200, \&stream_thread_i);
 }
 
+# /$INBOX/$MESSAGE_ID/t/
 sub thread_html {
 	my ($ctx) = @_;
 	my $mid = $ctx->{mid};
@@ -465,7 +466,7 @@ sub thread_html {
 	$skel .= "-- links below jump to the message on this page --\n";
 	$ctx->{-upfx} = '../../';
 	$ctx->{cur_level} = 0;
-	$ctx->{dst} = \$skel;
+	$ctx->{skel} = \$skel;
 	$ctx->{prev_attr} = '';
 	$ctx->{prev_level} = 0;
 	$ctx->{root_anchor} = anchor_for($mid);
@@ -501,7 +502,7 @@ sub thread_html_i { # PublicInbox::WwwStream::getline callback
 		$ctx->{-inbox}->smsg_mime($smsg) or next;
 		return index_entry($smsg, $ctx, scalar @$msgs);
 	}
-	my ($skel) = delete @$ctx{qw(dst msgs)};
+	my ($skel) = delete @$ctx{qw(skel msgs)};
 	$$skel;
 }
 
@@ -722,7 +723,7 @@ sub _msg_html_prepare {
 }
 
 sub thread_skel {
-	my ($dst, $ctx, $hdr, $tpfx) = @_;
+	my ($skel, $ctx, $hdr, $tpfx) = @_;
 	my $mid = mids($hdr)->[0];
 	my $ibx = $ctx->{-inbox};
 	my ($nr, $msgs) = $ibx->over->get_thread($mid);
@@ -732,21 +733,21 @@ sub thread_skel {
 			qq(<a\nhref="${tpfx}t.atom">Atom feed</a>);
 
 	my $parent = in_reply_to($hdr);
-	$$dst .= "\n<b>Thread overview: </b>";
+	$$skel .= "\n<b>Thread overview: </b>";
 	if ($nr <= 1) {
 		if (defined $parent) {
-			$$dst .= "$expand\n ";
-			$$dst .= ghost_parent("$tpfx../", $parent) . "\n";
+			$$skel .= "$expand\n ";
+			$$skel .= ghost_parent("$tpfx../", $parent) . "\n";
 		} else {
-			$$dst .= "[no followups] $expand\n";
+			$$skel .= "[no followups] $expand\n";
 		}
 		$ctx->{next_msg} = undef;
 		$ctx->{parent_msg} = $parent;
 		return;
 	}
 
-	$$dst .= "$nr+ messages / $expand";
-	$$dst .= qq!  <a\nhref="#b">top</a>\n!;
+	$$skel .= "$nr+ messages / $expand";
+	$$skel .= qq!  <a\nhref="#b">top</a>\n!;
 
 	# nb: mutt only shows the first Subject in the index pane
 	# when multiple Subject: headers are present, so we follow suit:
@@ -756,7 +757,7 @@ sub thread_skel {
 	$ctx->{cur} = $mid;
 	$ctx->{prev_attr} = '';
 	$ctx->{prev_level} = 0;
-	$ctx->{dst} = $dst;
+	$ctx->{skel} = $skel;
 
 	# reduce hash lookups in skel_dump
 	$ctx->{-obfs_ibx} = $ibx->{obfuscate} ? $ibx : undef;
@@ -804,11 +805,11 @@ sub html_footer {
 	my $ibx = $ctx->{-inbox} if $ctx;
 	my $upfx = '../';
 	my $tpfx = '';
-	my $idx = $standalone ? " <a\nhref=\"$upfx\">index</a>" : '';
+	my $skel = $standalone ? " <a\nhref=\"$upfx\">index</a>" : '';
 	my $irt = '';
-	if ($idx && $ibx->over) {
-		$idx .= "\n";
-		thread_skel(\$idx, $ctx, $hdr, $tpfx);
+	if ($skel && $ibx->over) {
+		$skel .= "\n";
+		thread_skel(\$skel, $ctx, $hdr, $tpfx);
 		my ($next, $prev);
 		my $parent = '       ';
 		$next = $prev = '    ';
@@ -843,7 +844,7 @@ sub html_footer {
 	}
 	$rhref ||= '#R';
 	$irt .= qq(<a\nhref="$rhref">reply</a>);
-	$irt .= $idx;
+	$irt .= $skel;
 }
 
 sub linkify_ref_no_over {
@@ -952,12 +953,12 @@ sub skel_dump {
 	my ($ctx, $level, $node) = @_;
 	my $smsg = $node->{smsg} or return _skel_ghost($ctx, $level, $node);
 
-	my $dst = $ctx->{dst};
+	my $skel = $ctx->{skel};
 	my $cur = $ctx->{cur};
 	my $mid = $smsg->{mid};
 
 	if ($level == 0 && $ctx->{skel_dump_roots}++) {
-		$$dst .= delete $ctx->{sl_note} || '';
+		$$skel .= delete($ctx->{sl_note}) || '';
 	}
 
 	my $f = ascii_html($smsg->from_name);
@@ -986,7 +987,7 @@ sub skel_dump {
 	if ($cur) {
 		if ($cur eq $mid) {
 			delete $ctx->{cur};
-			$$dst .= "<b>$d<a\nid=r\nhref=\"#t\">".
+			$$skel .= "<b>$d<a\nid=r\nhref=\"#t\">".
 				 "$attr [this message]</a></b>\n";
 			return 1;
 		} else {
@@ -1026,7 +1027,7 @@ sub skel_dump {
 	} else {
 		$m = $ctx->{-upfx}.mid_escape($mid).'/';
 	}
-	$$dst .=  $d . "<a\nhref=\"$m\"$id>" . $end;
+	$$skel .=  $d . "<a\nhref=\"$m\"$id>" . $end;
 	1;
 }
 
@@ -1051,8 +1052,7 @@ sub _skel_ghost {
 	} else {
 		$d .= qq{&lt;<a\nhref="$href">$html</a>&gt;\n};
 	}
-	my $dst = $ctx->{dst};
-	$$dst .= $d;
+	${$ctx->{skel}} .= $d;
 	1;
 }
 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH 03/22] view: improve readability around walk_thread
  2020-01-25  4:44 [PATCH 00/22] HTML display cleanups, fixes, speedups Eric Wong
  2020-01-25  4:44 ` [PATCH 01/22] www*stream: favor \&close instead of *close Eric Wong
  2020-01-25  4:44 ` [PATCH 02/22] www: use "skel" terminology consistently Eric Wong
@ 2020-01-25  4:44 ` Eric Wong
  2020-01-25  4:44 ` [PATCH 04/22] searchview: keep $noop sub private to the package Eric Wong
                   ` (18 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: Eric Wong @ 2020-01-25  4:44 UTC (permalink / raw)
  To: meta

Pass \&coderefs explicitly to walk_thread, and add some
prototypes + comments to describe what goes on.
---
 lib/PublicInbox/SearchView.pm |  2 +-
 lib/PublicInbox/View.pm       | 19 ++++++++++---------
 2 files changed, 11 insertions(+), 10 deletions(-)

diff --git a/lib/PublicInbox/SearchView.pm b/lib/PublicInbox/SearchView.pm
index 584b88ed..94a55b8d 100644
--- a/lib/PublicInbox/SearchView.pm
+++ b/lib/PublicInbox/SearchView.pm
@@ -287,7 +287,7 @@ sub mset_thread {
 	# reduce hash lookups in skel_dump
 	$ctx->{-obfuscate} = $ctx->{-inbox}->{obfuscate};
 	PublicInbox::View::walk_thread($rootset, $ctx,
-		*PublicInbox::View::pre_thread);
+		\&PublicInbox::View::pre_thread);
 
 	@$msgs = reverse @$msgs if $r;
 	$ctx->{msgs} = $msgs;
diff --git a/lib/PublicInbox/View.pm b/lib/PublicInbox/View.pm
index 66b9c92c..0a6903f1 100644
--- a/lib/PublicInbox/View.pm
+++ b/lib/PublicInbox/View.pm
@@ -385,7 +385,8 @@ sub _th_index_lite {
 	$rv .= $pad ."<a\nhref=#r$id>$s_s, $s_c; $ctx->{s_nr}</a>\n";
 }
 
-sub walk_thread {
+# non-recursive thread walker
+sub walk_thread ($$$) {
 	my ($rootset, $ctx, $cb) = @_;
 	my @q = map { (0, $_, -1) } @$rootset;
 	while (@q) {
@@ -398,7 +399,7 @@ sub walk_thread {
 	}
 }
 
-sub pre_thread  {
+sub pre_thread  { # walk_thread callback
 	my ($ctx, $level, $node, $idx) = @_;
 	$ctx->{mapping}->{$node->{id}} = [ '', $node, $idx, $level ];
 	skel_dump($ctx, $level, $node);
@@ -478,7 +479,7 @@ sub thread_html {
 
 	# reduce hash lookups in pre_thread->skel_dump
 	$ctx->{-obfs_ibx} = $ibx->{obfuscate} ? $ibx : undef;
-	walk_thread($rootset, $ctx, *pre_thread);
+	walk_thread($rootset, $ctx, \&pre_thread);
 
 	$skel .= '</pre>';
 	return stream_thread($rootset, $ctx) unless $ctx->{flat};
@@ -761,7 +762,7 @@ sub thread_skel {
 
 	# reduce hash lookups in skel_dump
 	$ctx->{-obfs_ibx} = $ibx->{obfuscate} ? $ibx : undef;
-	walk_thread(thread_results($ctx, $msgs), $ctx, *skel_dump);
+	walk_thread(thread_results($ctx, $msgs), $ctx, \&skel_dump);
 
 	$ctx->{parent_msg} = $parent;
 }
@@ -912,7 +913,7 @@ sub thread_results {
 	if (defined($mid) && scalar(@$rootset) > 1) {
 		$ctx->{root_idx} = -1;
 		my $nr = scalar @$msgs;
-		walk_thread($rootset, $ctx, *find_mid_root);
+		walk_thread($rootset, $ctx, \&find_mid_root);
 		my $idx = $ctx->{found_mid_at};
 		if (defined($idx) && $idx != 0) {
 			my $tip = splice(@$rootset, $idx, 1);
@@ -949,7 +950,7 @@ sub dedupe_subject {
 	$omit;
 }
 
-sub skel_dump {
+sub skel_dump { # walk_thread callback
 	my ($ctx, $level, $node) = @_;
 	my $smsg = $node->{smsg} or return _skel_ghost($ctx, $level, $node);
 
@@ -1065,7 +1066,7 @@ sub sort_ds {
 
 # accumulate recent topics if search is supported
 # returns 200 if done, 404 if not
-sub acc_topic {
+sub acc_topic { # walk_thread callback
 	my ($ctx, $level, $node) = @_;
 	my $mid = $node->{id};
 	my $x = $node->{smsg} || $ctx->{-inbox}->smsg_by_mid($mid);
@@ -1231,9 +1232,9 @@ sub index_topics {
 	my ($ctx) = @_;
 	my $msgs = paginate_recent($ctx, 200); # 200 is our window
 	if (@$msgs) {
-		walk_thread(thread_results($ctx, $msgs), $ctx, *acc_topic);
+		walk_thread(thread_results($ctx, $msgs), $ctx, \&acc_topic);
 	}
-	PublicInbox::WwwStream->response($ctx, dump_topics($ctx), *index_nav);
+	PublicInbox::WwwStream->response($ctx, dump_topics($ctx), \&index_nav);
 }
 
 sub thread_adj_level {

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH 04/22] searchview: keep $noop sub private to the package
  2020-01-25  4:44 [PATCH 00/22] HTML display cleanups, fixes, speedups Eric Wong
                   ` (2 preceding siblings ...)
  2020-01-25  4:44 ` [PATCH 03/22] view: improve readability around walk_thread Eric Wong
@ 2020-01-25  4:44 ` Eric Wong
  2020-01-25  4:44 ` [PATCH 05/22] view: reduce parameters for html_footer Eric Wong
                   ` (17 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: Eric Wong @ 2020-01-25  4:44 UTC (permalink / raw)
  To: meta

It'll always be used as a callback, so there's no point in
giving it a name to be called non-anonymously.  Making
assigments to it is slightly faster since there's no need
to repeatedly do a lookup by name.
---
 lib/PublicInbox/SearchView.pm | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/lib/PublicInbox/SearchView.pm b/lib/PublicInbox/SearchView.pm
index 94a55b8d..97233069 100644
--- a/lib/PublicInbox/SearchView.pm
+++ b/lib/PublicInbox/SearchView.pm
@@ -14,7 +14,7 @@ use PublicInbox::SearchThread;
 our $LIM = 200;
 my %rmap_inc;
 
-sub noop {}
+my $noop = sub {};
 
 sub mbox_results {
 	my ($ctx) = @_;
@@ -59,7 +59,7 @@ retry:
 	if ($err) {
 		$code = 400;
 		$ctx->{-html_tip} = '<pre>'.err_txt($ctx, $err).'</pre><hr>';
-		$cb = *noop;
+		$cb = $noop;
 	} elsif ($total == 0) {
 		if (defined($ctx->{-uxs_retried})) {
 			# undo retry damage:
@@ -71,7 +71,7 @@ retry:
 		}
 		$code = 404;
 		$ctx->{-html_tip} = "<pre>\n[No results found]</pre><hr>";
-		$cb = *noop;
+		$cb = $noop;
 	} else {
 		return adump($_[0], $mset, $q, $ctx) if $x eq 'A';
 
@@ -122,7 +122,7 @@ sub mset_summary {
 		$$res .= "$pfx  - by $f @ $date UTC [$pct%]\n\n";
 	}
 	$$res .= search_nav_bot($mset, $q);
-	*noop;
+	$noop;
 }
 
 # shorten "/full/path/to/Foo/Bar.pm" to "Foo/Bar.pm" so error

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH 05/22] view: reduce parameters for html_footer
  2020-01-25  4:44 [PATCH 00/22] HTML display cleanups, fixes, speedups Eric Wong
                   ` (3 preceding siblings ...)
  2020-01-25  4:44 ` [PATCH 04/22] searchview: keep $noop sub private to the package Eric Wong
@ 2020-01-25  4:44 ` Eric Wong
  2020-01-25  4:44 ` [PATCH 06/22] view: thread_skel: drop constant tpfx parameter Eric Wong
                   ` (16 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: Eric Wong @ 2020-01-25  4:44 UTC (permalink / raw)
  To: meta

Put more logic into html_footer and less in its only caller so
we can control the buffering and string creation.
---
 lib/PublicInbox/View.pm | 35 +++++++++++++++--------------------
 1 file changed, 15 insertions(+), 20 deletions(-)

diff --git a/lib/PublicInbox/View.pm b/lib/PublicInbox/View.pm
index 0a6903f1..02dd86da 100644
--- a/lib/PublicInbox/View.pm
+++ b/lib/PublicInbox/View.pm
@@ -40,11 +40,7 @@ sub msg_html_i {
 		# fake an EOF if generating the footer fails;
 		# we want to at least show the message if something
 		# here crashes:
-		eval {
-			my $hdr = delete($ctx->{hdr});
-			'<pre>' . html_footer($hdr, 1, $ctx) .
-			'</pre>' . msg_reply($ctx, $hdr)
-		};
+		eval { html_footer($ctx) };
 	} else {
 		undef
 	}
@@ -113,7 +109,7 @@ sub msg_html_more {
 }
 
 # /$INBOX/$MESSAGE_ID/#R
-sub msg_reply {
+sub msg_reply ($$) {
 	my ($ctx, $hdr) = @_;
 	my $se_url =
 	 'https://kernel.org/pub/software/scm/git/docs/git-send-email.html';
@@ -800,17 +796,17 @@ sub _parent_headers {
 	$rv;
 }
 
+# returns a string buffer via ->getline
 sub html_footer {
-	my ($hdr, $standalone, $ctx, $rhref) = @_;
-
-	my $ibx = $ctx->{-inbox} if $ctx;
+	my ($ctx) = @_;
+	my $ibx = $ctx->{-inbox};
+	my $hdr = delete $ctx->{hdr};
 	my $upfx = '../';
-	my $tpfx = '';
-	my $skel = $standalone ? " <a\nhref=\"$upfx\">index</a>" : '';
-	my $irt = '';
-	if ($skel && $ibx->over) {
+	my $skel = " <a\nhref=\"$upfx\">index</a>";
+	my $rv = '<pre>';
+	if ($ibx->over) {
 		$skel .= "\n";
-		thread_skel(\$skel, $ctx, $hdr, $tpfx);
+		thread_skel(\$skel, $ctx, $hdr, '');
 		my ($next, $prev);
 		my $parent = '       ';
 		$next = $prev = '    ';
@@ -839,13 +835,12 @@ sub html_footer {
 		} elsif ($u) { # unlikely
 			$parent = " <a\nhref=\"$u\"\nrel=prev>parent</a>";
 		}
-		$irt = "$next $prev$parent ";
-	} else {
-		$irt = '';
+		$rv .= "$next $prev$parent ";
 	}
-	$rhref ||= '#R';
-	$irt .= qq(<a\nhref="$rhref">reply</a>);
-	$irt .= $skel;
+	$rv .= qq(<a\nhref="#R">reply</a>);
+	$rv .= $skel;
+	$rv .= '</pre>';
+	$rv .= msg_reply($ctx, $hdr);
 }
 
 sub linkify_ref_no_over {

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH 06/22] view: thread_skel: drop constant tpfx parameter
  2020-01-25  4:44 [PATCH 00/22] HTML display cleanups, fixes, speedups Eric Wong
                   ` (4 preceding siblings ...)
  2020-01-25  4:44 ` [PATCH 05/22] view: reduce parameters for html_footer Eric Wong
@ 2020-01-25  4:44 ` Eric Wong
  2020-01-25  4:44 ` [PATCH 07/22] view: simplify duplicate Message-ID handling Eric Wong
                   ` (15 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: Eric Wong @ 2020-01-25  4:44 UTC (permalink / raw)
  To: meta

It hasn't changed in a few years.  Now we can rely on constant
folding to avoid extraneous ops to the $skel buffer.
---
 lib/PublicInbox/View.pm | 28 +++++++++++++++-------------
 1 file changed, 15 insertions(+), 13 deletions(-)

diff --git a/lib/PublicInbox/View.pm b/lib/PublicInbox/View.pm
index 02dd86da..8bfa9bec 100644
--- a/lib/PublicInbox/View.pm
+++ b/lib/PublicInbox/View.pm
@@ -719,32 +719,34 @@ sub _msg_html_prepare {
 	$rv .= "\n";
 }
 
-sub thread_skel {
-	my ($skel, $ctx, $hdr, $tpfx) = @_;
+sub SKEL_EXPAND () {
+	qq(expand[<a\nhref="T/#u">flat</a>) .
+		qq(|<a\nhref="t/#u">nested</a>]  ) .
+		qq(<a\nhref="t.mbox.gz">mbox.gz</a>  ) .
+		qq(<a\nhref="t.atom">Atom feed</a>);
+}
+
+sub thread_skel ($$$) {
+	my ($skel, $ctx, $hdr) = @_;
 	my $mid = mids($hdr)->[0];
 	my $ibx = $ctx->{-inbox};
 	my ($nr, $msgs) = $ibx->over->get_thread($mid);
-	my $expand = qq(expand[<a\nhref="${tpfx}T/#u">flat</a>) .
-	                qq(|<a\nhref="${tpfx}t/#u">nested</a>]  ) .
-			qq(<a\nhref="${tpfx}t.mbox.gz">mbox.gz</a>  ) .
-			qq(<a\nhref="${tpfx}t.atom">Atom feed</a>);
-
 	my $parent = in_reply_to($hdr);
 	$$skel .= "\n<b>Thread overview: </b>";
 	if ($nr <= 1) {
 		if (defined $parent) {
-			$$skel .= "$expand\n ";
-			$$skel .= ghost_parent("$tpfx../", $parent) . "\n";
+			$$skel .= SKEL_EXPAND."\n ";
+			$$skel .= ghost_parent('../', $parent) . "\n";
 		} else {
-			$$skel .= "[no followups] $expand\n";
+			$$skel .= '[no followups] '.SKEL_EXPAND."\n";
 		}
 		$ctx->{next_msg} = undef;
 		$ctx->{parent_msg} = $parent;
 		return;
 	}
 
-	$$skel .= "$nr+ messages / $expand";
-	$$skel .= qq!  <a\nhref="#b">top</a>\n!;
+	$$skel .= $nr;
+	$$skel .= '+ messages / '.SKEL_EXPAND.qq!  <a\nhref="#b">top</a>\n!;
 
 	# nb: mutt only shows the first Subject in the index pane
 	# when multiple Subject: headers are present, so we follow suit:
@@ -806,7 +808,7 @@ sub html_footer {
 	my $rv = '<pre>';
 	if ($ibx->over) {
 		$skel .= "\n";
-		thread_skel(\$skel, $ctx, $hdr, '');
+		thread_skel(\$skel, $ctx, $hdr);
 		my ($next, $prev);
 		my $parent = '       ';
 		$next = $prev = '    ';

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH 07/22] view: simplify duplicate Message-ID handling
  2020-01-25  4:44 [PATCH 00/22] HTML display cleanups, fixes, speedups Eric Wong
                   ` (5 preceding siblings ...)
  2020-01-25  4:44 ` [PATCH 06/22] view: thread_skel: drop constant tpfx parameter Eric Wong
@ 2020-01-25  4:44 ` Eric Wong
  2020-01-25  4:44 ` [PATCH 08/22] wwwstream: discard single-use $ctx fields after use Eric Wong
                   ` (14 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: Eric Wong @ 2020-01-25  4:44 UTC (permalink / raw)
  To: meta

It's an uncommon code path, no need to make it more complex
than it needs to be by having extra sub parameters.
---
 lib/PublicInbox/View.pm | 55 +++++++++++++++++------------------------
 1 file changed, 22 insertions(+), 33 deletions(-)

diff --git a/lib/PublicInbox/View.pm b/lib/PublicInbox/View.pm
index 8bfa9bec..11767dda 100644
--- a/lib/PublicInbox/View.pm
+++ b/lib/PublicInbox/View.pm
@@ -33,9 +33,10 @@ sub msg_html_i {
 		$ctx->{tip} .
 			multipart_text_as_html(delete $ctx->{mime}, $upfx,
 						$ctx) . '</pre><hr>'
-	} elsif ($more && @$more) {
+	} elsif ($more) {
 		++$ctx->{end_nr};
-		msg_html_more($ctx, $more, $nr);
+		# fake an EOF if {more} retrieval fails fails;
+		eval { msg_html_more($ctx, $nr) };
 	} elsif ($nr == $ctx->{end_nr}) {
 		# fake an EOF if generating the footer fails;
 		# we want to at least show the message if something
@@ -49,12 +50,11 @@ sub msg_html_i {
 # public functions: (unstable)
 
 sub msg_html {
-	my ($ctx, $mime, $more, $smsg) = @_;
+	my ($ctx, $mime, $smsg) = @_;
 	my $ibx = $ctx->{-inbox};
 	$ctx->{-obfs_ibx} = $ibx->{obfuscate} ? $ibx : undef;
 	my $hdr = $ctx->{hdr} = $mime->header_obj;
-	$ctx->{tip} = _msg_html_prepare($hdr, $ctx, $more, 0);
-	$ctx->{more} = $more;
+	$ctx->{tip} = _msg_html_prepare($hdr, $ctx, 0);
 	$ctx->{end_nr} = 2;
 	$ctx->{smsg} = $smsg;
 	$ctx->{mime} = $mime;
@@ -65,7 +65,7 @@ sub msg_page {
 	my ($ctx) = @_;
 	my $mid = $ctx->{mid};
 	my $ibx = $ctx->{-inbox};
-	my ($first, $more);
+	my ($first);
 	my $smsg;
 	if (my $over = $ibx->over) {
 		my ($id, $prev);
@@ -73,39 +73,28 @@ sub msg_page {
 		$first = $ibx->msg_by_smsg($smsg) if $smsg;
 		if ($first) {
 			my $next = $over->next_by_mid($mid, \$id, \$prev);
-			$more = [ $id, $prev, $next ] if $next;
+			$ctx->{more} = [ $id, $prev, $next ] if $next;
 		}
 		return unless $first;
 	} else {
 		$first = $ibx->msg_by_mid($mid) or return;
 	}
-	msg_html($ctx, PublicInbox::MIME->new($first), $more, $smsg);
+	msg_html($ctx, PublicInbox::MIME->new($first), $smsg);
 }
 
 sub msg_html_more {
-	my ($ctx, $more, $nr) = @_;
-	my $str = eval {
-		my ($id, $prev, $smsg) = @$more;
-		my $mid = $ctx->{mid};
-		my $ibx = $ctx->{-inbox};
-		$smsg = $ibx->smsg_mime($smsg);
-		my $next = $ibx->over->next_by_mid($mid, \$id, \$prev);
-		@$more = $next ? ($id, $prev, $next) : ();
-		if ($smsg) {
-			my $upfx = '../' . mid_escape($smsg->mid) . '/';
-			my $mime = delete $smsg->{mime};
-			_msg_html_prepare($mime->header_obj, $ctx, $more, $nr) .
-				multipart_text_as_html($mime, $upfx, $ctx) .
-				'</pre><hr>'
-		} else {
-			'';
-		}
-	};
-	if ($@) {
-		warn "Error lookup up additional messages: $@\n";
-		$str = '<pre>Error looking up additional messages</pre>';
-	}
-	$str;
+	my ($ctx, $nr) = @_;
+	my ($id, $prev, $smsg) = @{$ctx->{more}};
+	my $ibx = $ctx->{-inbox};
+	$smsg = $ibx->smsg_mime($smsg);
+	my $next = $ibx->over->next_by_mid($ctx->{mid}, \$id, \$prev);
+	$ctx->{more} = $next ? [ $id, $prev, $next ] : undef;
+	return '' unless $smsg;
+	my $upfx = '../' . mid_escape($smsg->mid) . '/';
+	my $mime = delete $smsg->{mime};
+	_msg_html_prepare($mime->header_obj, $ctx, $nr) .
+		multipart_text_as_html($mime, $upfx, $ctx) .
+		'</pre><hr>'
 }
 
 # /$INBOX/$MESSAGE_ID/#R
@@ -638,14 +627,14 @@ sub add_text_body { # callback for msg_iter
 }
 
 sub _msg_html_prepare {
-	my ($hdr, $ctx, $more, $nr) = @_;
+	my ($hdr, $ctx, $nr) = @_;
 	my $atom = '';
 	my $over = $ctx->{-inbox}->over;
 	my $obfs_ibx = $ctx->{-obfs_ibx};
 	my $rv = '';
 	my $mids = mids_for_index($hdr);
 	if ($nr == 0) {
-		if ($more) {
+		if ($ctx->{more}) {
 			$rv .=
 "<pre>WARNING: multiple messages have this Message-ID\n</pre>";
 		}

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH 08/22] wwwstream: discard single-use $ctx fields after use
  2020-01-25  4:44 [PATCH 00/22] HTML display cleanups, fixes, speedups Eric Wong
                   ` (6 preceding siblings ...)
  2020-01-25  4:44 ` [PATCH 07/22] view: simplify duplicate Message-ID handling Eric Wong
@ 2020-01-25  4:44 ` Eric Wong
  2020-01-25  4:44 ` [PATCH 09/22] view: start performing buffering into {obuf} Eric Wong
                   ` (13 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: Eric Wong @ 2020-01-25  4:44 UTC (permalink / raw)
  To: meta

This should make it clear that we only use these elements
once and can discard them.  While we're in the area, avoid
escaping '"' by using qq() instead of "" to quote strings
requiring interpolation.
---
 lib/PublicInbox/WwwStream.pm | 19 +++++++------------
 1 file changed, 7 insertions(+), 12 deletions(-)

diff --git a/lib/PublicInbox/WwwStream.pm b/lib/PublicInbox/WwwStream.pm
index ef5897b2..a4ba1fff 100644
--- a/lib/PublicInbox/WwwStream.pm
+++ b/lib/PublicInbox/WwwStream.pm
@@ -40,25 +40,20 @@ sub _html_top ($) {
 	my $ctx = $self->{ctx};
 	my $ibx = $ctx->{-inbox};
 	my $desc = ascii_html($ibx->description);
-	my $title = $ctx->{-title_html} || $desc;
+	my $title = delete($ctx->{-title_html}) // $desc;
 	my $upfx = $ctx->{-upfx} || '';
 	my $help = $upfx.'_/text/help';
 	my $color = $upfx.'_/text/color';
 	my $atom = $ctx->{-atom} || $upfx.'new.atom';
-	my $tip = $ctx->{-html_tip} || '';
 	my $top = "<b>$desc</b>";
 	my $links = "<a\nhref=\"$help\">help</a> / ".
 			"<a\nhref=\"$color\">color</a> / ".
 			"<a\nhref=\"$atom\">Atom feed</a>";
 	if ($ibx->search) {
-		my $q_val = $ctx->{-q_value_html};
-		if (defined $q_val && $q_val ne '') {
-			$q_val = qq(\nvalue="$q_val");
-		} else {
-			$q_val = '';
-		}
+		my $q_val = delete($ctx->{-q_value_html}) // '';
+		$q_val = qq(\nvalue="$q_val") if $q_val ne '';
 		# XXX gross, for SearchView.pm
-		my $extra = $ctx->{-extra_form_html} || '';
+		my $extra = delete($ctx->{-extra_form_html}) // '';
 		my $action = $upfx eq '' ? './' : $upfx;
 		$top = qq{<form\naction="$action"><pre>$top} .
 			  qq{\n<input\nname=q\ntype=text$q_val />} .
@@ -70,10 +65,10 @@ sub _html_top ($) {
 		$top = '<pre>' . $top . "\n" . $links . '</pre>';
 	}
 	"<html><head><title>$title</title>" .
-		"<link\nrel=alternate\ntitle=\"Atom feed\"\n".
-		"href=\"$atom\"\ntype=\"application/atom+xml\"/>" .
+		qq(<link\nrel=alternate\ntitle="Atom feed"\n).
+		qq(href="$atom"\ntype="application/atom+xml"/>) .
 	        $ctx->{www}->style($upfx) .
-		"</head><body>". $top . $tip;
+		'</head><body>'. $top . (delete($ctx->{-html_tip}) // '');
 }
 
 sub code_footer ($) {

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH 09/22] view: start performing buffering into {obuf}
  2020-01-25  4:44 [PATCH 00/22] HTML display cleanups, fixes, speedups Eric Wong
                   ` (7 preceding siblings ...)
  2020-01-25  4:44 ` [PATCH 08/22] wwwstream: discard single-use $ctx fields after use Eric Wong
@ 2020-01-25  4:44 ` Eric Wong
  2020-01-25  4:44 ` [PATCH 10/22] t/plack.t: modernize and unindent Eric Wong
                   ` (12 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: Eric Wong @ 2020-01-25  4:44 UTC (permalink / raw)
  To: meta

Get rid of the confusingly named {rv} and {tip} fields
and unify them into {obuf} for readability.

{obuf} usage may be expanded to more areas in the future.  This
will eventually make it easier for us to experiment with
alternative buffering schemes.
---
 lib/PublicInbox/View.pm          | 27 ++++++++++++---------------
 lib/PublicInbox/ViewVCS.pm       |  4 ++--
 lib/PublicInbox/WwwAtomStream.pm |  8 +++++---
 3 files changed, 19 insertions(+), 20 deletions(-)

diff --git a/lib/PublicInbox/View.pm b/lib/PublicInbox/View.pm
index 11767dda..9f340132 100644
--- a/lib/PublicInbox/View.pm
+++ b/lib/PublicInbox/View.pm
@@ -30,9 +30,8 @@ sub msg_html_i {
 	if ($nr == 1) {
 		# $more cannot be true w/o $smsg being defined:
 		my $upfx = $more ? '../'.mid_escape($ctx->{smsg}->mid).'/' : '';
-		$ctx->{tip} .
-			multipart_text_as_html(delete $ctx->{mime}, $upfx,
-						$ctx) . '</pre><hr>'
+		multipart_text_as_html(delete $ctx->{mime}, $upfx, $ctx);
+		${delete $ctx->{obuf}} .= '</pre><hr>';
 	} elsif ($more) {
 		++$ctx->{end_nr};
 		# fake an EOF if {more} retrieval fails fails;
@@ -54,7 +53,7 @@ sub msg_html {
 	my $ibx = $ctx->{-inbox};
 	$ctx->{-obfs_ibx} = $ibx->{obfuscate} ? $ibx : undef;
 	my $hdr = $ctx->{hdr} = $mime->header_obj;
-	$ctx->{tip} = _msg_html_prepare($hdr, $ctx, 0);
+	_msg_html_prepare_obuf($hdr, $ctx, 0);
 	$ctx->{end_nr} = 2;
 	$ctx->{smsg} = $smsg;
 	$ctx->{mime} = $mime;
@@ -92,9 +91,9 @@ sub msg_html_more {
 	return '' unless $smsg;
 	my $upfx = '../' . mid_escape($smsg->mid) . '/';
 	my $mime = delete $smsg->{mime};
-	_msg_html_prepare($mime->header_obj, $ctx, $nr) .
-		multipart_text_as_html($mime, $upfx, $ctx) .
-		'</pre><hr>'
+	_msg_html_prepare_obuf($mime->header_obj, $ctx, $nr);
+	multipart_text_as_html($mime, $upfx, $ctx);
+	${delete $ctx->{obuf}} .= '</pre><hr>';
 }
 
 # /$INBOX/$MESSAGE_ID/#R
@@ -259,9 +258,9 @@ sub index_entry {
 
 	# scan through all parts, looking for displayable text
 	$ctx->{mhref} = $mhref;
-	$ctx->{rv} = \$rv;
+	$ctx->{obuf} = \$rv;
 	msg_iter($mime, \&add_text_body, $ctx, 1);
-	delete $ctx->{rv};
+	delete $ctx->{obuf};
 
 	# add the footer
 	$rv .= "\n<a\nhref=#$id_m\nid=e$id>^</a> ".
@@ -495,11 +494,9 @@ sub thread_html_i { # PublicInbox::WwwStream::getline callback
 sub multipart_text_as_html {
 	my (undef, $mhref, $ctx) = @_; # $mime = $_[0]
 	$ctx->{mhref} = $mhref;
-	$ctx->{rv} = \(my $rv = '');
 
 	# scan through all parts, looking for displayable text
 	msg_iter($_[0], \&add_text_body, $ctx, 1);
-	${delete $ctx->{rv}};
 }
 
 sub flush_quote {
@@ -538,7 +535,7 @@ sub attach_link ($$$$;$) {
 	} else {
 		$sfn = 'a.bin';
 	}
-	my $rv = $ctx->{rv};
+	my $rv = $ctx->{obuf};
 	$$rv .= qq($nl<a\nhref="$ctx->{mhref}$idx-$sfn">);
 	if ($err) {
 		$$rv .= "[-- Warning: decoded text below may be mangled --]\n";
@@ -601,7 +598,7 @@ sub add_text_body { # callback for msg_iter
 	# split off quoted and unquoted blocks:
 	my @sections = split(/((?:^>[^\n]*\n)+)/sm, $s);
 	$s = '';
-	my $rv = $ctx->{rv};
+	my $rv = $ctx->{obuf};
 	if (defined($fn) || $depth > 0 || $err) {
 		# badly-encoded message with $err? tell the world about it!
 		attach_link($ctx, $ct, $p, $fn, $err);
@@ -626,9 +623,8 @@ sub add_text_body { # callback for msg_iter
 	obfuscate_addrs($ibx, $$rv) if $ibx->{obfuscate};
 }
 
-sub _msg_html_prepare {
+sub _msg_html_prepare_obuf {
 	my ($hdr, $ctx, $nr) = @_;
-	my $atom = '';
 	my $over = $ctx->{-inbox}->over;
 	my $obfs_ibx = $ctx->{-obfs_ibx};
 	my $rv = '';
@@ -706,6 +702,7 @@ sub _msg_html_prepare {
 	}
 	$rv .= _parent_headers($hdr, $over);
 	$rv .= "\n";
+	$ctx->{obuf} = \$rv;
 }
 
 sub SKEL_EXPAND () {
diff --git a/lib/PublicInbox/ViewVCS.pm b/lib/PublicInbox/ViewVCS.pm
index ead8c2b4..fdca70cb 100644
--- a/lib/PublicInbox/ViewVCS.pm
+++ b/lib/PublicInbox/ViewVCS.pm
@@ -33,14 +33,14 @@ my $BIN_DETECT = 8000; # same as git
 
 sub html_i { # WwwStream::getline callback
 	my ($nr, $ctx) =  @_;
-	$nr == 1 ? ${delete $ctx->{rv}} : undef;
+	$nr == 1 ? ${delete $ctx->{obuf}} : undef;
 }
 
 sub html_page ($$$) {
 	my ($ctx, $code, $strref) = @_;
 	my $wcb = delete $ctx->{-wcb};
 	$ctx->{-upfx} = '../../'; # from "/$INBOX/$OID/s/"
-	$ctx->{rv} = $strref;
+	$ctx->{obuf} = $strref;
 	my $res = PublicInbox::WwwStream->response($ctx, $code, \&html_i);
 	$wcb ? $wcb->($res) : $res;
 }
diff --git a/lib/PublicInbox/WwwAtomStream.pm b/lib/PublicInbox/WwwAtomStream.pm
index cb7ffe35..d142a469 100644
--- a/lib/PublicInbox/WwwAtomStream.pm
+++ b/lib/PublicInbox/WwwAtomStream.pm
@@ -140,9 +140,11 @@ sub feed_entry {
 		"<id>$uuid</id>$irt" .
 		qq{<content\ntype="xhtml">} .
 		qq{<div\nxmlns="http://www.w3.org/1999/xhtml">} .
-		qq(<pre\nstyle="white-space:pre-wrap">) .
-		PublicInbox::View::multipart_text_as_html($mime, $href, $ctx) .
-		'</pre></div></content></entry>';
+		qq(<pre\nstyle="white-space:pre-wrap">);
+	$ctx->{obuf} = \$s;
+	PublicInbox::View::multipart_text_as_html($mime, $href, $ctx);
+	delete $ctx->{obuf};
+	$s .= '</pre></div></content></entry>';
 }
 
 sub feed_updated {

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH 10/22] t/plack.t: modernize and unindent
  2020-01-25  4:44 [PATCH 00/22] HTML display cleanups, fixes, speedups Eric Wong
                   ` (8 preceding siblings ...)
  2020-01-25  4:44 ` [PATCH 09/22] view: start performing buffering into {obuf} Eric Wong
@ 2020-01-25  4:44 ` Eric Wong
  2020-01-25  4:44 ` [PATCH 11/22] init: use Import::run_die instead of system() Eric Wong
                   ` (11 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: Eric Wong @ 2020-01-25  4:44 UTC (permalink / raw)
  To: meta

This test will be expanded, and we can take advantage of
run_script to simplify our internal API use.
---
 t/plack.t | 393 ++++++++++++++++++++++++++----------------------------
 1 file changed, 191 insertions(+), 202 deletions(-)

diff --git a/t/plack.t b/t/plack.t
index ec45b02c..443831a1 100644
--- a/t/plack.t
+++ b/t/plack.t
@@ -8,7 +8,7 @@ use PublicInbox::TestCommon;
 my $psgi = "./examples/public-inbox.psgi";
 my ($tmpdir, $for_destroy) = tmpdir();
 my $pi_config = "$tmpdir/config";
-my $maindir = "$tmpdir/main.git";
+my $inboxdir = "$tmpdir/main.git";
 my $addr = 'test-public@example.com';
 my @mods = qw(HTTP::Request::Common Plack::Test URI::Escape);
 require_mods(@mods);
@@ -17,25 +17,22 @@ use_ok 'PublicInbox::Git';
 my @ls;
 
 foreach my $mod (@mods) { use_ok $mod; }
+local $ENV{PI_CONFIG} = $pi_config;
+ok(-f $psgi, "psgi example file found");
+my $pfx = 'http://example.com/test';
+ok(run_script(['-init', 'test', $inboxdir, "$pfx/", $addr]),
+	'initialized repo');
+PublicInbox::Import::run_die([qw(git config -f), $pi_config,
+	'publicinbox.test.newsgroup', 'inbox.test']);
+open my $fh, '>', "$inboxdir/description" or die "open: $!\n";
+print $fh "test for public-inbox\n";
+close $fh or die "close: $!\n";
+my $app = require $psgi;
+my $git = PublicInbox::Git->new($inboxdir);
+my $im = PublicInbox::Import->new($git, 'test', $addr);
+# ensure successful message delivery
 {
-	ok(-f $psgi, "psgi example file found");
-	is(0, system(qw(git init -q --bare), $maindir), "git init (main)");
-	open my $fh, '>', "$maindir/description" or die "open: $!\n";
-	print $fh "test for public-inbox\n";
-	close $fh or die "close: $!\n";
-	open $fh, '>>', $pi_config or die;
-	print $fh <<EOF or die;
-[publicinbox "test"]
-	address = $addr
-	inboxdir = $maindir
-	url = http://example.com/test/
-	newsgroup = inbox.test
-EOF
-	close $fh or die;
-
-	# ensure successful message delivery
-	{
-		my $mime = Email::MIME->new(<<EOF);
+	my $mime = Email::MIME->new(<<EOF);
 From: Me <me\@example.com>
 To: You <you\@example.com>
 Cc: $addr
@@ -45,210 +42,202 @@ Date: Fri, 02 Oct 1993 00:00:00 +0000
 
 zzzzzz
 EOF
-		my $git = PublicInbox::Git->new($maindir);
-		my $im = PublicInbox::Import->new($git, 'test', $addr);
-		$im->add($mime);
-		$im->done;
-		my $rev = $git->qx(qw(rev-list HEAD));
-		like($rev, qr/\A[a-f0-9]{40}/, "good revision committed");
-		@ls = $git->qx(qw(ls-tree -r --name-only HEAD));
-		chomp @ls;
-	}
-	my $app = eval {
-		local $ENV{PI_CONFIG} = $pi_config;
-		require $psgi;
-	};
-
-	test_psgi($app, sub {
-		my ($cb) = @_;
-		foreach my $u (qw(robots.txt favicon.ico .well-known/foo)) {
-			my $res = $cb->(GET("http://example.com/$u"));
-			is($res->code, 404, "$u is missing");
-		}
-	});
-
-	# redirect with newsgroup
-	test_psgi($app, sub {
-		my ($cb) = @_;
-		my $from = 'http://example.com/inbox.test';
-		my $to = 'http://example.com/test/';
-		my $res = $cb->(GET($from));
-		is($res->code, 301, 'newsgroup name is permanent redirect');
-		is($to, $res->header('Location'), 'redirect location matches');
-		$from .= '/';
-		is($res->code, 301, 'newsgroup name/ is permanent redirect');
-		is($to, $res->header('Location'), 'redirect location matches');
-	});
-
-	# redirect with trailing /
-	test_psgi($app, sub {
-		my ($cb) = @_;
-		my $from = 'http://example.com/test';
-		my $to = "$from/";
-		my $res = $cb->(GET($from));
-		is(301, $res->code, 'is permanent redirect');
-		is($to, $res->header('Location'),
-			'redirect location matches with trailing slash');
-	});
+	$im->add($mime);
+	$im->done;
+	my $rev = $git->qx(qw(rev-list HEAD));
+	like($rev, qr/\A[a-f0-9]{40}/, "good revision committed");
+	@ls = $git->qx(qw(ls-tree -r --name-only HEAD));
+	chomp @ls;
+}
 
-	my $pfx = 'http://example.com/test';
-	foreach my $t (qw(t T)) {
-		test_psgi($app, sub {
-			my ($cb) = @_;
-			my $u = $pfx . "/blah\@example.com/$t";
-			my $res = $cb->(GET($u));
-			is(301, $res->code, "redirect for missing /");
-			my $location = $res->header('Location');
-			like($location, qr!/\Q$t\E/#u\z!,
-				'redirected with missing /');
-		});
-	}
-	foreach my $t (qw(f)) {
-		test_psgi($app, sub {
-			my ($cb) = @_;
-			my $u = $pfx . "/blah\@example.com/$t";
-			my $res = $cb->(GET($u));
-			is(301, $res->code, "redirect for legacy /f");
-			my $location = $res->header('Location');
-			like($location, qr!/blah\@example\.com/\z!,
-				'redirected with missing /');
-		});
+test_psgi($app, sub {
+	my ($cb) = @_;
+	foreach my $u (qw(robots.txt favicon.ico .well-known/foo)) {
+		my $res = $cb->(GET("http://example.com/$u"));
+		is($res->code, 404, "$u is missing");
 	}
-
-	test_psgi($app, sub {
-		my ($cb) = @_;
-		my $atomurl = 'http://example.com/test/new.atom';
-		my $res = $cb->(GET('http://example.com/test/new.html'));
-		is(200, $res->code, 'success response received');
-		like($res->content, qr!href="new\.atom"!,
-			'atom URL generated');
-		like($res->content, qr!href="blah\@example\.com/"!,
-			'index generated');
-		like($res->content, qr!1993-10-02!, 'date set');
-	});
-
+});
+
+# redirect with newsgroup
+test_psgi($app, sub {
+	my ($cb) = @_;
+	my $from = 'http://example.com/inbox.test';
+	my $to = 'http://example.com/test/';
+	my $res = $cb->(GET($from));
+	is($res->code, 301, 'newsgroup name is permanent redirect');
+	is($to, $res->header('Location'), 'redirect location matches');
+	$from .= '/';
+	is($res->code, 301, 'newsgroup name/ is permanent redirect');
+	is($to, $res->header('Location'), 'redirect location matches');
+});
+
+# redirect with trailing /
+test_psgi($app, sub {
+	my ($cb) = @_;
+	my $from = 'http://example.com/test';
+	my $to = "$from/";
+	my $res = $cb->(GET($from));
+	is(301, $res->code, 'is permanent redirect');
+	is($to, $res->header('Location'),
+		'redirect location matches with trailing slash');
+});
+
+foreach my $t (qw(t T)) {
 	test_psgi($app, sub {
 		my ($cb) = @_;
-		my $res = $cb->(GET($pfx . '/atom.xml'));
-		is(200, $res->code, 'success response received for atom');
-		my $body = $res->content;
-		like($body, qr!link\s+href="\Q$pfx\E/blah\@example\.com/"!s,
-			'atom feed generated correct URL');
-		like($body, qr/<title>test for public-inbox/,
-			"set title in XML feed");
-		like($body, qr/zzzzzz/, 'body included');
+		my $u = $pfx . "/blah\@example.com/$t";
+		my $res = $cb->(GET($u));
+		is(301, $res->code, "redirect for missing /");
+		my $location = $res->header('Location');
+		like($location, qr!/\Q$t\E/#u\z!,
+			'redirected with missing /');
 	});
-
+}
+foreach my $t (qw(f)) {
 	test_psgi($app, sub {
 		my ($cb) = @_;
-		my $path = '/blah@example.com/';
-		my $res = $cb->(GET($pfx . $path));
-		is(200, $res->code, "success for $path");
-		like($res->content, qr!<title>hihi - Me</title>!,
-			"HTML returned");
-
-		$path .= 'f/';
-		$res = $cb->(GET($pfx . $path));
-		is(301, $res->code, "redirect for $path");
+		my $u = $pfx . "/blah\@example.com/$t";
+		my $res = $cb->(GET($u));
+		is(301, $res->code, "redirect for legacy /f");
 		my $location = $res->header('Location');
 		like($location, qr!/blah\@example\.com/\z!,
-			'/$MESSAGE_ID/f/ redirected to /$MESSAGE_ID/');
-	});
-
-	test_psgi($app, sub {
-		my ($cb) = @_;
-		my $res = $cb->(GET($pfx . '/blah@example.com/raw'));
-		is(200, $res->code, 'success response received for /*/raw');
-		like($res->content, qr!^From !sm, "mbox returned");
+			'redirected with missing /');
 	});
+}
 
-	# legacy redirects
-	foreach my $t (qw(m f)) {
-		test_psgi($app, sub {
-			my ($cb) = @_;
-			my $res = $cb->(GET($pfx . "/$t/blah\@example.com.txt"));
-			is(301, $res->code, "redirect for old $t .txt link");
-			my $location = $res->header('Location');
-			like($location, qr!/blah\@example\.com/raw\z!,
-				".txt redirected to /raw");
-		});
-	}
-
-	my %umap = (
-		'm' => '',
-		'f' => '',
-		't' => 't/',
-	);
-	while (my ($t, $e) = each %umap) {
-		test_psgi($app, sub {
-			my ($cb) = @_;
-			my $res = $cb->(GET($pfx . "/$t/blah\@example.com.html"));
-			is(301, $res->code, "redirect for old $t .html link");
-			my $location = $res->header('Location');
-			like($location,
-				qr!/blah\@example\.com/$e(?:#u)?\z!,
-				".html redirected to new location");
-		});
-	}
-	foreach my $sfx (qw(mbox mbox.gz)) {
-		test_psgi($app, sub {
-			my ($cb) = @_;
-			my $res = $cb->(GET($pfx . "/t/blah\@example.com.$sfx"));
-			is(301, $res->code, 'redirect for old thread link');
-			my $location = $res->header('Location');
-			like($location,
-			     qr!/blah\@example\.com/t\.mbox(?:\.gz)?\z!,
-			     "$sfx redirected to /mbox.gz");
-		});
-	}
+test_psgi($app, sub {
+	my ($cb) = @_;
+	my $atomurl = 'http://example.com/test/new.atom';
+	my $res = $cb->(GET('http://example.com/test/new.html'));
+	is(200, $res->code, 'success response received');
+	like($res->content, qr!href="new\.atom"!,
+		'atom URL generated');
+	like($res->content, qr!href="blah\@example\.com/"!,
+		'index generated');
+	like($res->content, qr!1993-10-02!, 'date set');
+});
+
+test_psgi($app, sub {
+	my ($cb) = @_;
+	my $res = $cb->(GET($pfx . '/atom.xml'));
+	is(200, $res->code, 'success response received for atom');
+	my $body = $res->content;
+	like($body, qr!link\s+href="\Q$pfx\E/blah\@example\.com/"!s,
+		'atom feed generated correct URL');
+	like($body, qr/<title>test for public-inbox/,
+		"set title in XML feed");
+	like($body, qr/zzzzzz/, 'body included');
+});
+
+test_psgi($app, sub {
+	my ($cb) = @_;
+	my $path = '/blah@example.com/';
+	my $res = $cb->(GET($pfx . $path));
+	is(200, $res->code, "success for $path");
+	like($res->content, qr!<title>hihi - Me</title>!,
+		"HTML returned");
+
+	$path .= 'f/';
+	$res = $cb->(GET($pfx . $path));
+	is(301, $res->code, "redirect for $path");
+	my $location = $res->header('Location');
+	like($location, qr!/blah\@example\.com/\z!,
+		'/$MESSAGE_ID/f/ redirected to /$MESSAGE_ID/');
+});
+
+test_psgi($app, sub {
+	my ($cb) = @_;
+	my $res = $cb->(GET($pfx . '/blah@example.com/raw'));
+	is(200, $res->code, 'success response received for /*/raw');
+	like($res->content, qr!^From !sm, "mbox returned");
+});
+
+# legacy redirects
+foreach my $t (qw(m f)) {
 	test_psgi($app, sub {
 		my ($cb) = @_;
-		# for a while, we used to support /$INBOX/$X40/
-		# when we "compressed" long Message-IDs to SHA-1
-		# Now we're stuck supporting them forever :<
-		foreach my $path (@ls) {
-			$path =~ tr!/!!d;
-			my $from = "http://example.com/test/$path/";
-			my $res = $cb->(GET($from));
-			is(301, $res->code, 'is permanent redirect');
-			like($res->header('Location'),
-				qr!/test/blah\@example\.com/!,
-				'redirect from x40 MIDs works');
-		}
+		my $res = $cb->(GET($pfx . "/$t/blah\@example.com.txt"));
+		is(301, $res->code, "redirect for old $t .txt link");
+		my $location = $res->header('Location');
+		like($location, qr!/blah\@example\.com/raw\z!,
+			".txt redirected to /raw");
 	});
+}
 
-	# dumb HTTP clone/fetch support
+my %umap = (
+	'm' => '',
+	'f' => '',
+	't' => 't/',
+);
+while (my ($t, $e) = each %umap) {
 	test_psgi($app, sub {
 		my ($cb) = @_;
-		my $path = '/test/info/refs';
-		my $req = HTTP::Request->new('GET' => $path);
-		my $res = $cb->($req);
-		is(200, $res->code, 'refs readable');
-		my $orig = $res->content;
-
-		$req->header('Range', 'bytes=5-10');
-		$res = $cb->($req);
-		is(206, $res->code, 'got partial response');
-		is($res->content, substr($orig, 5, 6), 'partial body OK');
-
-		$req->header('Range', 'bytes=5-');
-		$res = $cb->($req);
-		is(206, $res->code, 'got partial another response');
-		is($res->content, substr($orig, 5), 'partial body OK past end');
+		my $res = $cb->(GET($pfx . "/$t/blah\@example.com.html"));
+		is(301, $res->code, "redirect for old $t .html link");
+		my $location = $res->header('Location');
+		like($location,
+			qr!/blah\@example\.com/$e(?:#u)?\z!,
+			".html redirected to new location");
 	});
-
-	# things which should fail
+}
+foreach my $sfx (qw(mbox mbox.gz)) {
 	test_psgi($app, sub {
 		my ($cb) = @_;
-
-		my $res = $cb->(PUT('/'));
-		is(405, $res->code, 'no PUT to / allowed');
-		$res = $cb->(PUT('/test/'));
-		is(405, $res->code, 'no PUT /$INBOX allowed');
-
-		# TODO
-		# $res = $cb->(GET('/'));
+		my $res = $cb->(GET($pfx . "/t/blah\@example.com.$sfx"));
+		is(301, $res->code, 'redirect for old thread link');
+		my $location = $res->header('Location');
+		like($location,
+		     qr!/blah\@example\.com/t\.mbox(?:\.gz)?\z!,
+		     "$sfx redirected to /mbox.gz");
 	});
 }
+test_psgi($app, sub {
+	my ($cb) = @_;
+	# for a while, we used to support /$INBOX/$X40/
+	# when we "compressed" long Message-IDs to SHA-1
+	# Now we're stuck supporting them forever :<
+	foreach my $path (@ls) {
+		$path =~ tr!/!!d;
+		my $from = "http://example.com/test/$path/";
+		my $res = $cb->(GET($from));
+		is(301, $res->code, 'is permanent redirect');
+		like($res->header('Location'),
+			qr!/test/blah\@example\.com/!,
+			'redirect from x40 MIDs works');
+	}
+});
+
+# dumb HTTP clone/fetch support
+test_psgi($app, sub {
+	my ($cb) = @_;
+	my $path = '/test/info/refs';
+	my $req = HTTP::Request->new('GET' => $path);
+	my $res = $cb->($req);
+	is(200, $res->code, 'refs readable');
+	my $orig = $res->content;
+
+	$req->header('Range', 'bytes=5-10');
+	$res = $cb->($req);
+	is(206, $res->code, 'got partial response');
+	is($res->content, substr($orig, 5, 6), 'partial body OK');
+
+	$req->header('Range', 'bytes=5-');
+	$res = $cb->($req);
+	is(206, $res->code, 'got partial another response');
+	is($res->content, substr($orig, 5), 'partial body OK past end');
+});
+
+# things which should fail
+test_psgi($app, sub {
+	my ($cb) = @_;
+
+	my $res = $cb->(PUT('/'));
+	is(405, $res->code, 'no PUT to / allowed');
+	$res = $cb->(PUT('/test/'));
+	is(405, $res->code, 'no PUT /$INBOX allowed');
+
+	# TODO
+	# $res = $cb->(GET('/'));
+});
 
 done_testing();

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH 11/22] init: use Import::run_die instead of system()
  2020-01-25  4:44 [PATCH 00/22] HTML display cleanups, fixes, speedups Eric Wong
                   ` (9 preceding siblings ...)
  2020-01-25  4:44 ` [PATCH 10/22] t/plack.t: modernize and unindent Eric Wong
@ 2020-01-25  4:44 ` Eric Wong
  2020-01-25  4:45 ` [PATCH 12/22] tests: move the majority of t/view.t into t/plack.t Eric Wong
                   ` (10 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: Eric Wong @ 2020-01-25  4:44 UTC (permalink / raw)
  To: meta

We already load PublicInbox::Import via
PublicInbox::InboxWritable, so it's not an extra module
to load.  This can give us a slight speedup in tests.
---
 script/public-inbox-init | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/script/public-inbox-init b/script/public-inbox-init
index 34c93b47..6221871e 100755
--- a/script/public-inbox-init
+++ b/script/public-inbox-init
@@ -16,6 +16,7 @@ use PublicInbox::Admin;
 PublicInbox::Admin::require_or_die('-base');
 use PublicInbox::Config;
 use PublicInbox::InboxWritable;
+use PublicInbox::Import;
 use File::Temp qw/tempfile/;
 use PublicInbox::Lock;
 use File::Basename qw/dirname/;
@@ -23,7 +24,6 @@ use File::Path qw/mkpath/;
 use Fcntl qw(:DEFAULT);
 use Cwd qw/abs_path/;
 
-sub x { system(@_) and die join(' ', @_). " failed: $?\n" }
 my $version = undef;
 my $indexlevel = undef;
 my $skip_epoch;
@@ -151,13 +151,13 @@ umask(0077) if defined $perm;
 
 foreach my $addr (@address) {
 	next if $seen{lc($addr)};
-	x(@x, "--add", "$pfx.address", $addr);
+	PublicInbox::Import::run_die([@x, "--add", "$pfx.address", $addr]);
 }
-x(@x, "$pfx.url", $http_url);
-x(@x, "$pfx.inboxdir", $inboxdir);
+PublicInbox::Import::run_die([@x, "$pfx.url", $http_url]);
+PublicInbox::Import::run_die([@x, "$pfx.inboxdir", $inboxdir]);
 
 if (defined($indexlevel)) {
-	x(@x, "$pfx.indexlevel", $indexlevel);
+	PublicInbox::Import::run_die([@x, "$pfx.indexlevel", $indexlevel]);
 }
 
 # needed for git prior to v2.1.0

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH 12/22] tests: move the majority of t/view.t into t/plack.t
  2020-01-25  4:44 [PATCH 00/22] HTML display cleanups, fixes, speedups Eric Wong
                   ` (10 preceding siblings ...)
  2020-01-25  4:44 ` [PATCH 11/22] init: use Import::run_die instead of system() Eric Wong
@ 2020-01-25  4:45 ` Eric Wong
  2020-01-25  4:45 ` [PATCH 13/22] xt/perf-msgview: switch to multipart_text_as_html Eric Wong
                   ` (9 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: Eric Wong @ 2020-01-25  4:45 UTC (permalink / raw)
  To: meta

And some more into t/mid.t.  PublicInbox::View::msg_html may
change internally, so lets rely on the stable PSGI interface
to test it, rather than a test which reaches deep into the
internals.
---
 t/mid.t   |   8 ++-
 t/plack.t |  81 ++++++++++++++++++++-
 t/view.t  | 207 ++++++------------------------------------------------
 3 files changed, 106 insertions(+), 190 deletions(-)

diff --git a/t/mid.t b/t/mid.t
index ecac04de..7005101f 100644
--- a/t/mid.t
+++ b/t/mid.t
@@ -2,12 +2,18 @@
 # License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
 use strict;
 use Test::More;
-use PublicInbox::MID qw(mid_escape mids references mids_for_index);
+use PublicInbox::MID qw(mid_escape mids references mids_for_index id_compress);
 
 is(mid_escape('foo!@(bar)'), 'foo!@(bar)');
 is(mid_escape('foo%!@(bar)'), 'foo%25!@(bar)');
 is(mid_escape('foo%!@(bar)'), 'foo%25!@(bar)');
 
+# n.b: this is probably invalid since we dropped CGI for PSGI:
+like(id_compress('foo%bar@wtf'), qr/\A[a-f0-9]{40}\z/,
+	"percent always converted to sha1 to workaround buggy httpds");
+
+is(id_compress('foobar-wtf'), 'foobar-wtf', 'regular ID not compressed');
+
 {
 	use Email::MIME;
 	my $mime = Email::MIME->create;
diff --git a/t/plack.t b/t/plack.t
index 443831a1..a9a053ed 100644
--- a/t/plack.t
+++ b/t/plack.t
@@ -40,6 +40,7 @@ Message-Id: <blah\@example.com>
 Subject: hihi
 Date: Fri, 02 Oct 1993 00:00:00 +0000
 
+> quoted text
 zzzzzz
 EOF
 	$im->add($mime);
@@ -48,6 +49,67 @@ EOF
 	like($rev, qr/\A[a-f0-9]{40}/, "good revision committed");
 	@ls = $git->qx(qw(ls-tree -r --name-only HEAD));
 	chomp @ls;
+
+	# multipart with two text bodies
+	my %attr_text = (attributes => { content_type => 'text/plain' });
+	my $parts = [
+		Email::MIME->create(%attr_text, body => 'hi'),
+		Email::MIME->create(%attr_text, body => 'bye')
+	];
+	$mime = Email::MIME->create(
+		header_str => [
+			From => 'a@example.com',
+			Subject => 'blargh',
+			'Message-ID' => '<multipart@example.com>',
+			'In-Reply-To' => '<irp@example.com>'
+		],
+		parts => $parts,
+	);
+	$im->add($mime);
+
+	# multipart with attached patch + filename
+	$parts = [ Email::MIME->create(%attr_text, body => 'hi, see attached'),
+		Email::MIME->create(
+			attributes => {
+					content_type => 'text/plain',
+					filename => "foo&.patch",
+			},
+			body => "--- a/file\n+++ b/file\n" .
+				"@@ -49, 7 +49,34 @@\n"
+			)
+	];
+	$mime = Email::MIME->create(
+		header_str => [
+			From => 'a@example.com',
+			Subject => '[PATCH] asdf',
+			'Message-ID' => '<patch@example.com>'
+		],
+		parts => $parts
+	);
+	$im->add($mime);
+
+	# multipart collapsed to single quoted-printable text/plain
+	$parts = [
+		Email::MIME->create(
+			attributes => {
+				content_type => 'text/plain',
+				encoding => 'quoted-printable'
+			},
+			body => 'hi = bye',
+		)
+	];
+	$mime = Email::MIME->create(
+		header_str => [
+			From => 'qp@example.com',
+			Subject => 'QP',
+			'Message-ID' => '<qp@example.com>',
+			],
+		parts => $parts,
+	);
+	like($mime->body_raw, qr/hi =3D bye=/, 'our test used QP correctly');
+	$im->add($mime);
+
+	$im->done;
 }
 
 test_psgi($app, sub {
@@ -134,8 +196,10 @@ test_psgi($app, sub {
 	my $path = '/blah@example.com/';
 	my $res = $cb->(GET($pfx . $path));
 	is(200, $res->code, "success for $path");
-	like($res->content, qr!<title>hihi - Me</title>!,
-		"HTML returned");
+	my $html = $res->content;
+	like($html, qr!<title>hihi - Me</title>!, 'HTML returned');
+	like($html, qr!<a\nhref="raw"!s, 'raw link present');
+	like($html, qr!&gt; quoted text!s, 'quoted text inline');
 
 	$path .= 'f/';
 	$res = $cb->(GET($pfx . $path));
@@ -143,6 +207,19 @@ test_psgi($app, sub {
 	my $location = $res->header('Location');
 	like($location, qr!/blah\@example\.com/\z!,
 		'/$MESSAGE_ID/f/ redirected to /$MESSAGE_ID/');
+
+	$res = $cb->(GET($pfx . '/multipart@example.com/'));
+	like($res->content,
+		qr/hi\n.*-- Attachment #2.*\nbye\n/s, 'multipart split');
+
+	$res = $cb->(GET($pfx . '/patch@example.com/'));
+	$html = $res->content;
+	like($html, qr!see attached!, 'original body');
+	like($html, qr!.*Attachment #2: foo&(?:amp|#38);\.patch --!,
+		'parts split with filename');
+
+	$res = $cb->(GET($pfx . '/qp@example.com/'));
+	like($res->content, qr/\bhi = bye\b/, "HTML output decoded QP");
 });
 
 test_psgi($app, sub {
diff --git a/t/view.t b/t/view.t
index 38c12fcc..462667f1 100644
--- a/t/view.t
+++ b/t/view.t
@@ -1,198 +1,31 @@
-# Copyright (C) 2013-2019 all contributors <meta@public-inbox.org>
+# Copyright (C) 2013-2020 all contributors <meta@public-inbox.org>
 # License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
 use strict;
-use warnings;
 use Test::More;
 use PublicInbox::TestCommon;
-use Email::MIME;
 require_mods('Plack::Util');
 use_ok 'PublicInbox::View';
-use_ok 'PublicInbox::Config';
 
-# FIXME: make this test less fragile
-my $ctx = {
-	env => { HTTP_HOST => 'example.com', 'psgi.url_scheme' => 'http' },
-	-inbox => Plack::Util::inline_object(
-		name => 'test',
-		over => sub { undef },
-		search => sub { undef },
-		base_url => sub { 'http://example.com/' },
-		cloneurl => sub {[]},
-		nntp_url => sub {[]},
-		max_git_epoch => sub { undef },
-		description => sub { '' }),
-	www => Plack::Util::inline_object(style => sub { '' }),
-};
-$ctx->{-inbox}->{-primary_address} = 'test@example.com';
+# this only tests View.pm internals which are subject to change,
+# see t/plack.t for tests against the PSGI interface.
 
-sub msg_html ($$) {
-	my ($ctx, $mime) = @_;
-
-	my $s = '';
-	my $r = PublicInbox::View::msg_html($ctx, $mime);
-	my $body = $r->[2];
-	while (defined(my $buf = $body->getline)) {
-		$s .= $buf;
-	}
-	$body->close;
-	$s;
-}
-
-# plain text
-{
-	my $body = <<EOF;
-So and so wrote:
-> keep this inline
-
-OK
-
-> Long and wordy reply goes here and it is split across multiple lines.
-> We generate links to a separate full page where quoted-text is inline.
-> This is
->
-> Currently 12 lines
-> See MAX_INLINE_QUOTED
-> See MAX_INLINE_QUOTED
-> See MAX_INLINE_QUOTED
-> See MAX_INLINE_QUOTED
-> See MAX_INLINE_QUOTED
-> See MAX_INLINE_QUOTED
-> See MAX_INLINE_QUOTED
-> See MAX_INLINE_QUOTED
-
-hello world
-EOF
-	my $s = Email::Simple->create(
-		header => [
-			From => 'a@example.com',
-			To => 'b@example.com',
-			'Content-Type' => 'text/plain',
-			'Message-ID' => '<hello@example.com>',
-			Subject => 'this is a subject',
-		],
-		body => $body,
-	)->as_string;
-	my $mime = Email::MIME->new($s);
-	my $html = msg_html($ctx, $mime);
-
-	# ghetto tests
-	like($html, qr!<a\nhref="raw"!s, "raw link present");
-	like($html, qr/hello world\b/, "body present");
-	like($html, qr/&gt; keep this inline/, "short quoted text is inline");
-}
-
-# multipart crap
-{
-	my $parts = [
-		Email::MIME->create(
-			attributes => { content_type => 'text/plain', },
-			body => 'hi',
-		),
-		Email::MIME->create(
-			attributes => { content_type => 'text/plain', },
-			body => 'bye',
-		)
-	];
-	my $mime = Email::MIME->create(
-		header_str => [
-			From => 'a@example.com',
-			Subject => 'blargh',
-			'Message-ID' => '<blah@example.com>',
-			'In-Reply-To' => '<irp@example.com>',
-			],
-		parts => $parts,
-	);
-
-	my $html = msg_html($ctx, $mime);
-	like($html, qr/hi\n.*-- Attachment #2.*\nbye\n/s, "multipart split");
-}
-
-# multipart email with attached patch
-{
-	my $parts = [
-		Email::MIME->create(
-			attributes => { content_type => 'text/plain', },
-			body => 'hi, see attached patch',
-		),
-		Email::MIME->create(
-			attributes => {
-				content_type => 'text/plain',
-				filename => "foo&.patch",
-			},
-			body => "--- a/file\n+++ b/file\n" .
-			        "@@ -49, 7 +49,34 @@\n",
-		),
-	];
-	my $mime = Email::MIME->create(
-		header_str => [
-			From => 'a@example.com',
-			Subject => '[PATCH] asdf',
-			'Message-ID' => '<patch@example.com>',
-			],
-		parts => $parts,
-	);
-
-	my $html = msg_html($ctx, $mime);
-	like($html, qr!.*Attachment #2: foo&(?:amp|#38);\.patch --!,
-		"parts split with filename");
-}
-
-# multipart collapsed to single quoted-printable text/plain
-{
-	my $parts = [
-		Email::MIME->create(
-			attributes => {
-				content_type => 'text/plain',
-				encoding => 'quoted-printable',
-			},
-			body => 'hi = bye',
-		)
-	];
-	my $mime = Email::MIME->create(
-		header_str => [
-			From => 'qp@example.com',
-			Subject => 'QP',
-			'Message-ID' => '<qp@example.com>',
-			],
-		parts => $parts,
-	);
-
-	my $orig = $mime->body_raw;
-	my $html = msg_html($ctx, $mime);
-	like($orig, qr/hi =3D bye=/, "our test used QP correctly");
-	like($html, qr/\bhi = bye\b/, "HTML output decoded QP");
-}
-
-{
-	use PublicInbox::MID qw/id_compress/;
-
-	# n.b: this is probably invalid since we dropped CGI for PSGI:
-	like(id_compress('foo%bar@wtf'), qr/\A[a-f0-9]{40}\z/,
-		"percent always converted to sha1 to workaround buggy httpds");
-
-	is(id_compress('foobar-wtf'), 'foobar-wtf',
-		'regular ID not compressed');
-}
-
-{
-	my $cols = PublicInbox::View::COLS();
-	my @addr;
-	until (length(join(', ', @addr)) > ($cols * 2)) {
-		push @addr, '"l, f" <a@a>';
-		my $n = int(rand(20)) + 1;
-		push @addr, ('x'x$n).'@x';
-	}
-	my $orig = join(', ', @addr);
-	my $res = PublicInbox::View::fold_addresses($orig.'');
-	isnt($res, $orig, 'folded result');
-	unlike($res, qr/l,\n\tf/s, '"last, first" no broken');
-	my @nospc = ($res, $orig);
-	s/\s+//g for @nospc;
-	is($nospc[0], $nospc[1], 'no addresses lost in translation');
-	my $tws = PublicInbox::View::fold_addresses($orig.' ');
-	# (Email::Simple drops leading whitespace, but not trailing)
-	$tws =~ s/ \z//;
-	is($tws, $res, 'not thrown off by trailing whitespace');
+my $cols = PublicInbox::View::COLS();
+my @addr;
+until (length(join(', ', @addr)) > ($cols * 2)) {
+	push @addr, '"l, f" <a@a>';
+	my $n = int(rand(20)) + 1;
+	push @addr, ('x'x$n).'@x';
 }
+my $orig = join(', ', @addr);
+my $res = PublicInbox::View::fold_addresses($orig.'');
+isnt($res, $orig, 'folded result');
+unlike($res, qr/l,\n\tf/s, '"last, first" no broken');
+my @nospc = ($res, $orig);
+s/\s+//g for @nospc;
+is($nospc[0], $nospc[1], 'no addresses lost in translation');
+my $tws = PublicInbox::View::fold_addresses($orig.' ');
+# (Email::Simple drops leading whitespace, but not trailing)
+$tws =~ s/ \z//;
+is($tws, $res, 'not thrown off by trailing whitespace');
 
 done_testing();

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH 13/22] xt/perf-msgview: switch to multipart_text_as_html
  2020-01-25  4:44 [PATCH 00/22] HTML display cleanups, fixes, speedups Eric Wong
                   ` (11 preceding siblings ...)
  2020-01-25  4:45 ` [PATCH 12/22] tests: move the majority of t/view.t into t/plack.t Eric Wong
@ 2020-01-25  4:45 ` Eric Wong
  2020-01-25  4:45 ` [PATCH 14/22] view: inline and eliminate msg_html Eric Wong
                   ` (8 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: Eric Wong @ 2020-01-25  4:45 UTC (permalink / raw)
  To: meta

It's a more widely-used (but still internal) API which will
probably last longer than msg_html.  It also reaches deeper into
the stack and avoids the overhead of ->getline via PSGI, so it's
faster and gives a more accurate measurement of lower-level parts.
---
 xt/perf-msgview.t | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/xt/perf-msgview.t b/xt/perf-msgview.t
index 8c9037ee..9375977a 100644
--- a/xt/perf-msgview.t
+++ b/xt/perf-msgview.t
@@ -35,18 +35,18 @@ my $ctx = {
 my ($str, $mime, $res, $cmt, $type);
 my $n = 0;
 my $t = timeit(1, sub {
+	my $obuf = '';
+	$ctx->{obuf} = \$obuf;
 	while (<$fh>) {
 		($cmt, $type) = split / /;
 		next if $type ne 'blob';
 		++$n;
 		$str = $git->cat_file($cmt);
 		$mime = PublicInbox::MIME->new($str);
-		$res = PublicInbox::View::msg_html($ctx, $mime);
-		$res = $res->[2];
-		while (defined($res->getline)) {}
-		$res->close;
+		PublicInbox::View::multipart_text_as_html($mime, '../', $ctx);
+		$obuf = '';
 	}
 });
-diag 'msg_html took '.timestr($t)." for $n messages";
+diag 'multipart_text_as_html took '.timestr($t)." for $n messages";
 ok 1;
 done_testing();

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH 14/22] view: inline and eliminate msg_html
  2020-01-25  4:44 [PATCH 00/22] HTML display cleanups, fixes, speedups Eric Wong
                   ` (12 preceding siblings ...)
  2020-01-25  4:45 ` [PATCH 13/22] xt/perf-msgview: switch to multipart_text_as_html Eric Wong
@ 2020-01-25  4:45 ` Eric Wong
  2020-01-25  4:45 ` [PATCH 15/22] linkify: compile $LINK_RE once Eric Wong
                   ` (7 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: Eric Wong @ 2020-01-25  4:45 UTC (permalink / raw)
  To: meta

No need to keep the old sub around, anymore.  Rename auxiliary
subs to "msg_page_*" instead of the "html" version.
---
 lib/PublicInbox/View.pm | 30 ++++++++++++------------------
 1 file changed, 12 insertions(+), 18 deletions(-)

diff --git a/lib/PublicInbox/View.pm b/lib/PublicInbox/View.pm
index 9f340132..54a49de6 100644
--- a/lib/PublicInbox/View.pm
+++ b/lib/PublicInbox/View.pm
@@ -24,7 +24,7 @@ use constant INDENT => '  ';
 use constant TCHILD => '` ';
 sub th_pfx ($) { $_[0] == 0 ? '' : TCHILD };
 
-sub msg_html_i {
+sub msg_page_i {
 	my ($nr, $ctx) = @_;
 	my $more = $ctx->{more};
 	if ($nr == 1) {
@@ -35,7 +35,7 @@ sub msg_html_i {
 	} elsif ($more) {
 		++$ctx->{end_nr};
 		# fake an EOF if {more} retrieval fails fails;
-		eval { msg_html_more($ctx, $nr) };
+		eval { msg_page_more($ctx, $nr) };
 	} elsif ($nr == $ctx->{end_nr}) {
 		# fake an EOF if generating the footer fails;
 		# we want to at least show the message if something
@@ -48,18 +48,6 @@ sub msg_html_i {
 
 # public functions: (unstable)
 
-sub msg_html {
-	my ($ctx, $mime, $smsg) = @_;
-	my $ibx = $ctx->{-inbox};
-	$ctx->{-obfs_ibx} = $ibx->{obfuscate} ? $ibx : undef;
-	my $hdr = $ctx->{hdr} = $mime->header_obj;
-	_msg_html_prepare_obuf($hdr, $ctx, 0);
-	$ctx->{end_nr} = 2;
-	$ctx->{smsg} = $smsg;
-	$ctx->{mime} = $mime;
-	PublicInbox::WwwStream->response($ctx, 200, \&msg_html_i);
-}
-
 sub msg_page {
 	my ($ctx) = @_;
 	my $mid = $ctx->{mid};
@@ -78,10 +66,16 @@ sub msg_page {
 	} else {
 		$first = $ibx->msg_by_mid($mid) or return;
 	}
-	msg_html($ctx, PublicInbox::MIME->new($first), $smsg);
+	my $mime = $ctx->{mime} = PublicInbox::MIME->new($first);
+	$ctx->{-obfs_ibx} = $ibx->{obfuscate} ? $ibx : undef;
+	my $hdr = $ctx->{hdr} = $mime->header_obj;
+	_msg_page_prepare_obuf($hdr, $ctx, 0);
+	$ctx->{end_nr} = 2;
+	$ctx->{smsg} = $smsg;
+	PublicInbox::WwwStream->response($ctx, 200, \&msg_page_i);
 }
 
-sub msg_html_more {
+sub msg_page_more {
 	my ($ctx, $nr) = @_;
 	my ($id, $prev, $smsg) = @{$ctx->{more}};
 	my $ibx = $ctx->{-inbox};
@@ -91,7 +85,7 @@ sub msg_html_more {
 	return '' unless $smsg;
 	my $upfx = '../' . mid_escape($smsg->mid) . '/';
 	my $mime = delete $smsg->{mime};
-	_msg_html_prepare_obuf($mime->header_obj, $ctx, $nr);
+	_msg_page_prepare_obuf($mime->header_obj, $ctx, $nr);
 	multipart_text_as_html($mime, $upfx, $ctx);
 	${delete $ctx->{obuf}} .= '</pre><hr>';
 }
@@ -623,7 +617,7 @@ sub add_text_body { # callback for msg_iter
 	obfuscate_addrs($ibx, $$rv) if $ibx->{obfuscate};
 }
 
-sub _msg_html_prepare_obuf {
+sub _msg_page_prepare_obuf {
 	my ($hdr, $ctx, $nr) = @_;
 	my $over = $ctx->{-inbox}->over;
 	my $obfs_ibx = $ctx->{-obfs_ibx};

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH 15/22] linkify: compile $LINK_RE once
  2020-01-25  4:44 [PATCH 00/22] HTML display cleanups, fixes, speedups Eric Wong
                   ` (13 preceding siblings ...)
  2020-01-25  4:45 ` [PATCH 14/22] view: inline and eliminate msg_html Eric Wong
@ 2020-01-25  4:45 ` Eric Wong
  2020-01-25  4:45 ` [PATCH 16/22] linkify: move to_html over from ViewDiff Eric Wong
                   ` (6 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: Eric Wong @ 2020-01-25  4:45 UTC (permalink / raw)
  To: meta

This gives a 3-4% performance improvement in xt/perf-msgview.t
with a mirror of https://public-inbox.org/meta/
---
 lib/PublicInbox/Linkify.pm | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/PublicInbox/Linkify.pm b/lib/PublicInbox/Linkify.pm
index af9be3ff..77b94f56 100644
--- a/lib/PublicInbox/Linkify.pm
+++ b/lib/PublicInbox/Linkify.pm
@@ -70,7 +70,7 @@ sub linkify_1 {
 
 		$_[0]->{$key} = $url;
 		$beg . 'PI-LINK-'. $key . $end;
-	^ge;
+	^geo;
 	$_[1];
 }
 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH 16/22] linkify: move to_html over from ViewDiff
  2020-01-25  4:44 [PATCH 00/22] HTML display cleanups, fixes, speedups Eric Wong
                   ` (14 preceding siblings ...)
  2020-01-25  4:45 ` [PATCH 15/22] linkify: compile $LINK_RE once Eric Wong
@ 2020-01-25  4:45 ` Eric Wong
  2020-01-25  4:45 ` [PATCH 17/22] searchidx: skip filenames on "diff --git ..." Eric Wong
                   ` (5 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: Eric Wong @ 2020-01-25  4:45 UTC (permalink / raw)
  To: meta

We use the same idiom in many places for doing two-step
linkification and HTML escaping.  Get rid of an outdated
comment in flush_quote while we're at it.
---
 lib/PublicInbox/Linkify.pm    |  2 ++
 lib/PublicInbox/View.pm       |  8 ++------
 lib/PublicInbox/ViewDiff.pm   | 31 +++++++++++++------------------
 lib/PublicInbox/ViewVCS.pm    |  6 ++----
 lib/PublicInbox/WwwListing.pm |  3 +--
 5 files changed, 20 insertions(+), 30 deletions(-)

diff --git a/lib/PublicInbox/Linkify.pm b/lib/PublicInbox/Linkify.pm
index 77b94f56..199457af 100644
--- a/lib/PublicInbox/Linkify.pm
+++ b/lib/PublicInbox/Linkify.pm
@@ -120,4 +120,6 @@ sub linkify_mids {
 	!ge;
 }
 
+sub to_html { linkify_2($_[0], ascii_html(linkify_1(@_))) }
+
 1;
diff --git a/lib/PublicInbox/View.pm b/lib/PublicInbox/View.pm
index 54a49de6..f613afb5 100644
--- a/lib/PublicInbox/View.pm
+++ b/lib/PublicInbox/View.pm
@@ -496,13 +496,10 @@ sub multipart_text_as_html {
 sub flush_quote {
 	my ($s, $l, $quot) = @_;
 
-	# show everything in the full version with anchor from
-	# short version (see above)
-	my $rv = $l->linkify_1($$quot);
+	my $rv = $l->to_html($$quot);
 
 	# we use a <span> here to allow users to specify their own
 	# color for quoted text
-	$rv = $l->linkify_2(ascii_html($rv));
 	$$quot = undef;
 	$$s .= qq(<span\nclass="q">) . $rv . '</span>'
 }
@@ -608,8 +605,7 @@ sub add_text_body { # callback for msg_iter
 			flush_diff($rv, $ctx, $l);
 		} else {
 			# regular lines, OK
-			$l->linkify_1($cur);
-			$$rv .= $l->linkify_2(ascii_html($cur));
+			$$rv .= $l->to_html($cur);
 			$cur = undef;
 		}
 	}
diff --git a/lib/PublicInbox/ViewDiff.pm b/lib/PublicInbox/ViewDiff.pm
index 10d71004..be0c0452 100644
--- a/lib/PublicInbox/ViewDiff.pm
+++ b/lib/PublicInbox/ViewDiff.pm
@@ -44,11 +44,6 @@ my $PATH_B = '"?b/.+|/dev/null';
 # cf. git diff.c :: get_compact_summary
 my $DIFFSTAT_COMMENT = qr/\((?:new|gone|(?:(?:new|mode) [\+\-][lx]))\)/;
 
-sub to_html ($$) {
-	$_[0]->linkify_1($_[1]);
-	$_[0]->linkify_2(ascii_html($_[1]));
-}
-
 # link to line numbers in blobs
 sub diff_hunk ($$$$) {
 	my ($dctx, $spfx, $ca, $cb) = @_;
@@ -107,7 +102,7 @@ sub anchor0 ($$$$$) {
 		my $spaces = ($orig =~ s/( +)\z//) ? $1 : '';
 		$$dst .= " <a\nid=i$attr\nhref=#$attr>" .
 			ascii_html($orig) . '</a>' . $spaces .
-			to_html($linkify, $rest);
+			$linkify->to_html($rest);
 		return 1;
 	}
 	undef;
@@ -116,7 +111,7 @@ sub anchor0 ($$$$$) {
 sub anchor1 ($$$$$) {
 	my ($dst, $ctx, $linkify, $pb, $s) = @_;
 	my $attr = to_attr($ctx->{-apfx}.$pb) or return;
-	my $line = to_html($linkify, $s);
+	my $line = $linkify->to_html($s);
 
 	my $ok = delete $ctx->{-anchors}->{$attr};
 
@@ -158,7 +153,7 @@ sub flush_diff ($$$) {
 			} elsif ($state2class[$state]) {
 				to_state($dst, $state, DSTATE_CTX);
 			}
-			$$dst .= to_html($linkify, $s);
+			$$dst .= $linkify->to_html($s);
 		} elsif ($s =~ /^-- $/) { # email signature begins
 			$state == DSTATE_INIT or
 				to_state($dst, $state, DSTATE_INIT);
@@ -178,53 +173,53 @@ sub flush_diff ($$$) {
 					uri_escape_utf8($pa, UNSAFE);
 			}
 			anchor1($dst, $ctx, $linkify, $pb, $s) and next;
-			$$dst .= to_html($linkify, $s);
+			$$dst .= $linkify->to_html($s);
 		} elsif ($s =~ s/^(index $OID_NULL\.\.)($OID_BLOB)\b//o) {
 			$$dst .= $1 . oid($dctx, $spfx, $2);
 			$dctx = { Q => '' };
-			$$dst .= to_html($linkify, $s) ;
+			$$dst .= $linkify->to_html($s) ;
 		} elsif ($s =~ s/^index ($OID_BLOB)(\.\.$OID_NULL)\b//o) {
 			$$dst .= 'index ' . oid($dctx, $spfx, $1) . $2;
 			$dctx = { Q => '' };
-			$$dst .= to_html($linkify, $s);
+			$$dst .= $linkify->to_html($s);
 		} elsif ($s =~ /^index ($OID_BLOB)\.\.($OID_BLOB)/o) {
 			$dctx->{oid_a} = $1;
 			$dctx->{oid_b} = $2;
-			$$dst .= to_html($linkify, $s);
+			$$dst .= $linkify->to_html($s);
 		} elsif ($s =~ s/^@@ (\S+) (\S+) @@//) {
 			$$dst .= '</span>' if $state2class[$state];
 			$$dst .= qq(<span\nclass="hunk">);
 			$$dst .= diff_hunk($dctx, $spfx, $1, $2);
 			$$dst .= '</span>';
 			$state = DSTATE_CTX;
-			$$dst .= to_html($linkify, $s);
+			$$dst .= $linkify->to_html($s);
 		} elsif ($s =~ m!^--- (?:$PATH_A)!o ||
 		         $s =~ m!^\+{3} (?:$PATH_B)!o)  {
 			# color only (no oid link) if missing dctx->{oid_*}
 			$state <= DSTATE_STAT and
 				to_state($dst, $state, DSTATE_HEAD);
-			$$dst .= to_html($linkify, $s);
+			$$dst .= $linkify->to_html($s);
 		} elsif ($s =~ /^\+/) {
 			if ($state != DSTATE_ADD && $state > DSTATE_STAT) {
 				to_state($dst, $state, DSTATE_ADD);
 			}
-			$$dst .= to_html($linkify, $s);
+			$$dst .= $linkify->to_html($s);
 		} elsif ($s =~ /^-/) {
 			if ($state != DSTATE_DEL && $state > DSTATE_STAT) {
 				to_state($dst, $state, DSTATE_DEL);
 			}
-			$$dst .= to_html($linkify, $s);
+			$$dst .= $linkify->to_html($s);
 		# ignore the following lines in headers:
 		} elsif ($s =~ /^(?:dis)similarity index/ ||
 			 $s =~ /^(?:old|new) mode/ ||
 			 $s =~ /^(?:deleted|new) file mode/ ||
 			 $s =~ /^(?:copy|rename) (?:from|to) / ||
 			 $s =~ /^(?:dis)?similarity index /) {
-			$$dst .= to_html($linkify, $s);
+			$$dst .= $linkify->to_html($s);
 		} else {
 			$state <= DSTATE_STAT or
 				to_state($dst, $state, DSTATE_INIT);
-			$$dst .= to_html($linkify, $s);
+			$$dst .= $linkify->to_html($s);
 		}
 	}
 	@$diff = ();
diff --git a/lib/PublicInbox/ViewVCS.pm b/lib/PublicInbox/ViewVCS.pm
index fdca70cb..77b3ca28 100644
--- a/lib/PublicInbox/ViewVCS.pm
+++ b/lib/PublicInbox/ViewVCS.pm
@@ -90,8 +90,7 @@ sub show_other_result ($$) {
 	}
 	my $l = PublicInbox::Linkify->new;
 	utf8::decode($$bref);
-	$l->linkify_1($$bref);
-	$$bref = '<pre>'. $l->linkify_2(ascii_html($$bref));
+	$$bref = '<pre>'. $l->to_html($$bref);
 	$$bref .= '</pre><hr>' . $$logref;
 	html_page($ctx, 200, $bref);
 }
@@ -125,9 +124,8 @@ sub solve_result {
 
 	my $ref = ref($res);
 	my $l = PublicInbox::Linkify->new;
-	$l->linkify_1($log);
 	$log = '<pre>debug log:</pre><hr /><pre>' .
-		$l->linkify_2(ascii_html($log)) . '</pre>';
+		$l->to_html($log) . '</pre>';
 
 	$res or return html_page($ctx, 404, \$log);
 	$ref eq 'ARRAY' or return html_page($ctx, 500, \$log);
diff --git a/lib/PublicInbox/WwwListing.pm b/lib/PublicInbox/WwwListing.pm
index 8d610037..37ffffc1 100644
--- a/lib/PublicInbox/WwwListing.pm
+++ b/lib/PublicInbox/WwwListing.pm
@@ -111,8 +111,7 @@ sub html ($$) {
 
 		my $tmp = join("\n", map { ibx_entry(@$_, $env) } @$list);
 		my $l = PublicInbox::Linkify->new;
-		$l->linkify_1($tmp);
-		$out = '<pre>'.$l->linkify_2(ascii_html($tmp)).'</pre><hr>';
+		$out = '<pre>'.$l->to_html($tmp).'</pre><hr>';
 	}
 	$out = "<html><head><title>$title</title></head><body>" . $out;
 	$out .= '<pre>'. PublicInbox::WwwStream::code_footer($env) .

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH 17/22] searchidx: skip filenames on "diff --git ..."
  2020-01-25  4:44 [PATCH 00/22] HTML display cleanups, fixes, speedups Eric Wong
                   ` (15 preceding siblings ...)
  2020-01-25  4:45 ` [PATCH 16/22] linkify: move to_html over from ViewDiff Eric Wong
@ 2020-01-25  4:45 ` Eric Wong
  2020-01-25  4:45 ` [PATCH 18/22] searchidx: don't assume "a/" and "b/" as prefixes Eric Wong
                   ` (4 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: Eric Wong @ 2020-01-25  4:45 UTC (permalink / raw)
  To: meta

We already capture filenames on the lines beginning
with "---" and "+++", so it's redundant work to capture
filenames from "diff --git ..." lines.
---
 lib/PublicInbox/SearchIdx.pm | 8 ++------
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/lib/PublicInbox/SearchIdx.pm b/lib/PublicInbox/SearchIdx.pm
index cb554912..6ade147f 100644
--- a/lib/PublicInbox/SearchIdx.pm
+++ b/lib/PublicInbox/SearchIdx.pm
@@ -209,12 +209,8 @@ sub index_diff ($$$) {
 			index_diff_inc($self, $_, 'XDFCTX', $xnq);
 		} elsif (/^-- $/) { # email signature begins
 			$in_diff = undef;
-		} elsif (m!^diff --git ("?a/.+) ("?b/.+)\z!) {
-			my ($fa, $fb) = ($1, $2);
-			my $fn = (split('/', git_unquote($fa), 2))[1];
-			$seen{$fn}++ or index_diff_inc($self, $fn, 'XDFN', $xnq);
-			$fn = (split('/', git_unquote($fb), 2))[1];
-			$seen{$fn}++ or index_diff_inc($self, $fn, 'XDFN', $xnq);
+		} elsif (m!^diff --git "?[^/]+/.+ "?[^/]+/.+\z!) {
+			# wait until "---" and "+++" to capture filenames
 			$in_diff = 1;
 		# traditional diff:
 		} elsif (m/^diff -(.+) (\S+) (\S+)$/) {

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH 18/22] searchidx: don't assume "a/" and "b/" as prefixes
  2020-01-25  4:44 [PATCH 00/22] HTML display cleanups, fixes, speedups Eric Wong
                   ` (16 preceding siblings ...)
  2020-01-25  4:45 ` [PATCH 17/22] searchidx: skip filenames on "diff --git ..." Eric Wong
@ 2020-01-25  4:45 ` Eric Wong
  2020-01-25  4:45 ` [PATCH 19/22] viewdiff: add "b=" param with non-standard diff prefix Eric Wong
                   ` (3 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: Eric Wong @ 2020-01-25  4:45 UTC (permalink / raw)
  To: meta

Some people use "--{src,dst}-prefix=", try to deal with those
since git-apply can handle them when called by solver.
---
 lib/PublicInbox/SearchIdx.pm | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/lib/PublicInbox/SearchIdx.pm b/lib/PublicInbox/SearchIdx.pm
index 6ade147f..2204134b 100644
--- a/lib/PublicInbox/SearchIdx.pm
+++ b/lib/PublicInbox/SearchIdx.pm
@@ -220,12 +220,12 @@ sub index_diff ($$$) {
 			next unless $opt =~ /[uU]/;
 			$in_diff = index_old_diff_fn($self, \%seen, $fa, $fb,
 							$xnq);
-		} elsif (m!^--- ("?a/.+)!) {
+		} elsif (m!^--- ("?[^/]+/.+)!) {
 			my $fn = $1;
 			$fn = (split('/', git_unquote($fn), 2))[1];
 			$seen{$fn}++ or index_diff_inc($self, $fn, 'XDFN', $xnq);
 			$in_diff = 1;
-		} elsif (m!^\+\+\+ ("?b/.+)!)  {
+		} elsif (m!^\+\+\+ ("?[^/]+/.+)!)  {
 			my $fn = $1;
 			$fn = (split('/', git_unquote($fn), 2))[1];
 			$seen{$fn}++ or index_diff_inc($self, $fn, 'XDFN', $xnq);

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH 19/22] viewdiff: add "b=" param with non-standard diff prefix
  2020-01-25  4:44 [PATCH 00/22] HTML display cleanups, fixes, speedups Eric Wong
                   ` (17 preceding siblings ...)
  2020-01-25  4:45 ` [PATCH 18/22] searchidx: don't assume "a/" and "b/" as prefixes Eric Wong
@ 2020-01-25  4:45 ` Eric Wong
  2020-01-25  4:45 ` [PATCH 20/22] viewdiff: add "b=" param when missing "diff --git" line Eric Wong
                   ` (2 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: Eric Wong @ 2020-01-25  4:45 UTC (permalink / raw)
  To: meta

<20180228012207.GB251290@aiede.svl.corp.google.com> (posted to
git@vger) uses "i" and "w" prefixes instead of the standard "a"
and "b" prefixes, ensure we emit a "b=$FILENAME" param for the
solver endpoint to improve search accuracy, syntax highlighting,
and information density in the URL itself.
---
 lib/PublicInbox/ViewDiff.pm | 9 ++++-----
 xt/solver.t                 | 2 +-
 2 files changed, 5 insertions(+), 6 deletions(-)

diff --git a/lib/PublicInbox/ViewDiff.pm b/lib/PublicInbox/ViewDiff.pm
index be0c0452..6a60d0dc 100644
--- a/lib/PublicInbox/ViewDiff.pm
+++ b/lib/PublicInbox/ViewDiff.pm
@@ -38,8 +38,7 @@ sub UNSAFE () { "^A-Za-z0-9\-\._~/" }
 
 my $OID_NULL = '0{7,40}';
 my $OID_BLOB = '[a-f0-9]{7,40}';
-my $PATH_A = '"?a/.+|/dev/null';
-my $PATH_B = '"?b/.+|/dev/null';
+my $PATH_X = '"?[^/]+/.+|/dev/null';
 
 # cf. git diff.c :: get_compact_summary
 my $DIFFSTAT_COMMENT = qr/\((?:new|gone|(?:(?:new|mode) [\+\-][lx]))\)/;
@@ -158,7 +157,7 @@ sub flush_diff ($$$) {
 			$state == DSTATE_INIT or
 				to_state($dst, $state, DSTATE_INIT);
 			$$dst .= $s;
-		} elsif ($s =~ m!^diff --git ($PATH_A) ($PATH_B)$!o) {
+		} elsif ($s =~ m!^diff --git ($PATH_X) ($PATH_X)$!o) {
 			my ($pa, $pb) = ($1, $2);
 			if ($state != DSTATE_HEAD) {
 				to_state($dst, $state, DSTATE_HEAD);
@@ -193,8 +192,8 @@ sub flush_diff ($$$) {
 			$$dst .= '</span>';
 			$state = DSTATE_CTX;
 			$$dst .= $linkify->to_html($s);
-		} elsif ($s =~ m!^--- (?:$PATH_A)!o ||
-		         $s =~ m!^\+{3} (?:$PATH_B)!o)  {
+		} elsif ($s =~ m!^--- (?:$PATH_X)!o ||
+		         $s =~ m!^\+{3} (?:$PATH_X)!o)  {
 			# color only (no oid link) if missing dctx->{oid_*}
 			$state <= DSTATE_STAT and
 				to_state($dst, $state, DSTATE_HEAD);
diff --git a/xt/solver.t b/xt/solver.t
index 5307e120..d2206b28 100644
--- a/xt/solver.t
+++ b/xt/solver.t
@@ -27,7 +27,7 @@ my $todo = {
 		'776fa90f7f/s/?b=contrib/git-jump/git-jump',
 		'5cd8845/s/?b=submodule.c',
 		'81c1164ae5/s/?b=builtin/log.c',
-		'6aa8857a11/s/?b=protocol.c', # TODO: i/, w/ instead of a/ b/
+		'6aa8857a11/s/?b=protocol.c',
 		'96f1c7f/s/', # TODO: b=contrib/completion/git-completion.bash
 		'b76f2c0/s/?b=po/zh_CN.po',
 	],

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH 20/22] viewdiff: add "b=" param when missing "diff --git" line
  2020-01-25  4:44 [PATCH 00/22] HTML display cleanups, fixes, speedups Eric Wong
                   ` (18 preceding siblings ...)
  2020-01-25  4:45 ` [PATCH 19/22] viewdiff: add "b=" param with non-standard diff prefix Eric Wong
@ 2020-01-25  4:45 ` Eric Wong
  2020-01-25  4:45 ` [PATCH 21/22] viewdiff: use autovivification for long_path hash Eric Wong
  2020-01-25  4:45 ` [PATCH 22/22] viewdiff: rewrite and simplify Eric Wong
  21 siblings, 0 replies; 23+ messages in thread
From: Eric Wong @ 2020-01-25  4:45 UTC (permalink / raw)
  To: meta

<2841d2de-32ad-eae8-6039-9251a40bb00e@tngtech.com> as posted to
git@vger contained an otherwise valid diff without a "diff
--git" line.  Generate a "b=" parameter in that case using the
"+++" line instead of the "diff --git" line.  SearchIdx.pm no
longer uses the "diff --git" line for filename information,
either.
---
 lib/PublicInbox/ViewDiff.pm | 31 +++++++++++++++++++++++++++++--
 1 file changed, 29 insertions(+), 2 deletions(-)

diff --git a/lib/PublicInbox/ViewDiff.pm b/lib/PublicInbox/ViewDiff.pm
index 6a60d0dc..ff7d85f5 100644
--- a/lib/PublicInbox/ViewDiff.pm
+++ b/lib/PublicInbox/ViewDiff.pm
@@ -133,6 +133,17 @@ sub anchor1 ($$$$$) {
 	undef
 }
 
+sub missing_diff_git_line ($$) {
+	my ($dctx, $pb) = @_;
+	# missing "diff --git ..."
+	$dctx->{path_b} = $pb;
+	$dctx->{Q} = '?b='.uri_escape_utf8($pb, UNSAFE);
+	my $pa = $dctx->{path_a};
+	if (defined($pa) && $pa ne $pb) {
+		$dctx->{Q} .= '&amp;a='. uri_escape_utf8($pa, UNSAFE);
+	}
+}
+
 sub flush_diff ($$$) {
 	my ($dst, $ctx, $linkify) = @_;
 	my $diff = $ctx->{-diff};
@@ -192,8 +203,24 @@ sub flush_diff ($$$) {
 			$$dst .= '</span>';
 			$state = DSTATE_CTX;
 			$$dst .= $linkify->to_html($s);
-		} elsif ($s =~ m!^--- (?:$PATH_X)!o ||
-		         $s =~ m!^\+{3} (?:$PATH_X)!o)  {
+		} elsif ($s =~ m!^--- ($PATH_X)!o) {
+			my $pa = $1;
+			$pa = (split('/', git_unquote($pa), 2))[1];
+			if (($dctx->{path_a} // '') ne $pa) {
+				# missing "diff --git ..." ?
+				$dctx->{path_a} = $pa;
+			}
+			# color only (no oid link) if missing dctx->{oid_*}
+			$state <= DSTATE_STAT and
+				to_state($dst, $state, DSTATE_HEAD);
+			$$dst .= $linkify->to_html($s);
+		} elsif ($s =~ m!^\+{3} ($PATH_X)!o) {
+			my $pb = $1;
+			$pb = (split('/', git_unquote($pb), 2))[1];
+			if (($dctx->{path_b} // '') ne $pb) {
+				missing_diff_git_line($dctx, $pb);
+			}
+
 			# color only (no oid link) if missing dctx->{oid_*}
 			$state <= DSTATE_STAT and
 				to_state($dst, $state, DSTATE_HEAD);

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH 21/22] viewdiff: use autovivification for long_path hash
  2020-01-25  4:44 [PATCH 00/22] HTML display cleanups, fixes, speedups Eric Wong
                   ` (19 preceding siblings ...)
  2020-01-25  4:45 ` [PATCH 20/22] viewdiff: add "b=" param when missing "diff --git" line Eric Wong
@ 2020-01-25  4:45 ` Eric Wong
  2020-01-25  4:45 ` [PATCH 22/22] viewdiff: rewrite and simplify Eric Wong
  21 siblings, 0 replies; 23+ messages in thread
From: Eric Wong @ 2020-01-25  4:45 UTC (permalink / raw)
  To: meta

No sense in wasting code to do something the interpreter
already does for us.
---
 lib/PublicInbox/ViewDiff.pm | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/lib/PublicInbox/ViewDiff.pm b/lib/PublicInbox/ViewDiff.pm
index ff7d85f5..ece95f4c 100644
--- a/lib/PublicInbox/ViewDiff.pm
+++ b/lib/PublicInbox/ViewDiff.pm
@@ -92,8 +92,7 @@ sub anchor0 ($$$$$) {
 
 	# long filenames will require us to walk backwards in anchor1
 	if ($fn =~ s!\A\.\.\./?!!) {
-		my $lp = $ctx->{-long_path} ||= {};
-		$lp->{$fn} = qr/\Q$fn\E\z/s;
+		$ctx->{-long_path}->{$fn} = qr/\Q$fn\E\z/s;
 	}
 
 	if (my $attr = to_attr($ctx->{-apfx}.$fn)) {

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH 22/22] viewdiff: rewrite and simplify
  2020-01-25  4:44 [PATCH 00/22] HTML display cleanups, fixes, speedups Eric Wong
                   ` (20 preceding siblings ...)
  2020-01-25  4:45 ` [PATCH 21/22] viewdiff: use autovivification for long_path hash Eric Wong
@ 2020-01-25  4:45 ` Eric Wong
  21 siblings, 0 replies; 23+ messages in thread
From: Eric Wong @ 2020-01-25  4:45 UTC (permalink / raw)
  To: meta

Instead of going line-by-line, use split() with a giant regexp
to capture groups of contiguous lines.  This offloads state
management to the regexp itself and makes it FAR easier to
keep track of <span> and </span> pairings.

Performance seems roughly on par after this change for the
meta@public-inbox archives.  It seems a tiny bit faster for
git@vger with xt/perf-msgview.t, likely due to the longer
messages and larger contiguous groups of lines having the same
prefix (or no prefix at all) and drastically reduces the number
of subroutine calls and Perl ops executed.
---
 lib/PublicInbox/View.pm     |   8 +-
 lib/PublicInbox/ViewDiff.pm | 297 +++++++++++++++++-------------------
 2 files changed, 139 insertions(+), 166 deletions(-)

diff --git a/lib/PublicInbox/View.pm b/lib/PublicInbox/View.pm
index f613afb5..9a5e3997 100644
--- a/lib/PublicInbox/View.pm
+++ b/lib/PublicInbox/View.pm
@@ -562,7 +562,7 @@ sub add_text_body { # callback for msg_iter
 		$idx[0] = $upfx . $idx[0] if $upfx ne '';
 		$ctx->{-apfx} = join('/', @idx);
 		$ctx->{-anchors} = {}; # attr => filename
-		$ctx->{-diff} = $diff = [];
+		$diff = 1;
 		delete $ctx->{-long_path};
 		my $spfx;
 		if ($ibx->{-repo_objs}) {
@@ -595,14 +595,12 @@ sub add_text_body { # callback for msg_iter
 		attach_link($ctx, $ct, $p, $fn, $err);
 		$$rv .= "\n";
 	}
-	my $l = PublicInbox::Linkify->new;
+	my $l = $ctx->{-linkify} //= PublicInbox::Linkify->new;
 	foreach my $cur (@sections) {
 		if ($cur =~ /\A>/) {
 			flush_quote($rv, $l, \$cur);
 		} elsif ($diff) {
-			@$diff = split(/^/m, $cur);
-			$cur = undef;
-			flush_diff($rv, $ctx, $l);
+			flush_diff($rv, $ctx, \$cur);
 		} else {
 			# regular lines, OK
 			$$rv .= $l->to_html($cur);
diff --git a/lib/PublicInbox/ViewDiff.pm b/lib/PublicInbox/ViewDiff.pm
index ece95f4c..8c37c139 100644
--- a/lib/PublicInbox/ViewDiff.pm
+++ b/lib/PublicInbox/ViewDiff.pm
@@ -7,6 +7,7 @@
 # (or reconstruct) blobs.
 
 package PublicInbox::ViewDiff;
+use 5.010_001;
 use strict;
 use warnings;
 use base qw(Exporter);
@@ -15,52 +16,31 @@ use URI::Escape qw(uri_escape_utf8);
 use PublicInbox::Hval qw(ascii_html to_attr);
 use PublicInbox::Git qw(git_unquote);
 
-# keep track of state so we can avoid redundant HTML tags for
-# identically-classed lines
-sub DSTATE_INIT () { 0 }
-sub DSTATE_STAT () { 1 }
-sub DSTATE_HEAD () { 2 } # /^diff --git /, /^index /, /^--- /, /^\+\+\+ /
-sub DSTATE_CTX () { 3 } # /^ /
-sub DSTATE_ADD () { 4 } # /^\+/
-sub DSTATE_DEL () { 5 } # /^\-/
-
-# maps the DSTATE_* to CSS class names compatible with what cgit uses:
-my @state2class = (
-	'', # init
-	'', # stat
-	'head',
-	'', # ctx
-	'add',
-	'del'
-);
-
 sub UNSAFE () { "^A-Za-z0-9\-\._~/" }
 
 my $OID_NULL = '0{7,40}';
 my $OID_BLOB = '[a-f0-9]{7,40}';
-my $PATH_X = '"?[^/]+/.+|/dev/null';
 
 # cf. git diff.c :: get_compact_summary
 my $DIFFSTAT_COMMENT = qr/\((?:new|gone|(?:(?:new|mode) [\+\-][lx]))\)/;
 
 # link to line numbers in blobs
 sub diff_hunk ($$$$) {
-	my ($dctx, $spfx, $ca, $cb) = @_;
-	my $oid_a = $dctx->{oid_a};
-	my $oid_b = $dctx->{oid_b};
-
-	(defined($spfx) && defined($oid_a) && defined($oid_b)) or
-		return "@@ $ca $cb @@";
+	my ($dst, $dctx, $ca, $cb) = @_;
+	my ($oid_a, $oid_b, $spfx) = @$dctx{qw(oid_a oid_b spfx)};
 
-	my ($n) = ($ca =~ /^-([0-9]+)/);
-	$n = defined($n) ? do { ++$n; "#n$n" } : '';
+	if (defined($spfx) && defined($oid_a) && defined($oid_b)) {
+		my ($n) = ($ca =~ /^-([0-9]+)/);
+		$n = defined($n) ? do { ++$n; "#n$n" } : '';
 
-	my $rv = qq(@@ <a\nhref="$spfx$oid_a/s/$dctx->{Q}$n">$ca</a>);
+		$$dst .= qq(@@ <a\nhref="$spfx$oid_a/s/$dctx->{Q}$n">$ca</a>);
 
-	($n) = ($cb =~ /^\+([0-9]+)/);
-	$n = defined($n) ? do { ++$n; "#n$n" } : '';
-
-	$rv .= qq( <a\nhref="$spfx$oid_b/s/$dctx->{Q}$n">$cb</a> @@);
+		($n) = ($cb =~ /^\+([0-9]+)/);
+		$n = defined($n) ? do { ++$n; "#n$n" } : '';
+		$$dst .= qq( <a\nhref="$spfx$oid_b/s/$dctx->{Q}$n">$cb</a> @@);
+	} else {
+		$$dst .= "@@ $ca $cb @@";
+	}
 }
 
 sub oid ($$$) {
@@ -68,16 +48,9 @@ sub oid ($$$) {
 	defined($spfx) ? qq(<a\nhref="$spfx$oid/s/$dctx->{Q}">$oid</a>) : $oid;
 }
 
-sub to_state ($$$) {
-	my ($dst, $state, $new_state) = @_;
-	$$dst .= '</span>' if $state2class[$state];
-	$_[1] = $new_state;
-	my $class = $state2class[$new_state] or return;
-	$$dst .= qq(<span\nclass="$class">);
-}
-
-sub anchor0 ($$$$$) {
-	my ($dst, $ctx, $linkify, $fn, $rest) = @_;
+# returns true if diffstat anchor written, false otherwise
+sub anchor0 ($$$$) {
+	my ($dst, $ctx, $fn, $rest) = @_;
 
 	my $orig = $fn;
 
@@ -100,16 +73,16 @@ sub anchor0 ($$$$$) {
 		my $spaces = ($orig =~ s/( +)\z//) ? $1 : '';
 		$$dst .= " <a\nid=i$attr\nhref=#$attr>" .
 			ascii_html($orig) . '</a>' . $spaces .
-			$linkify->to_html($rest);
+			$ctx->{-linkify}->to_html($rest);
 		return 1;
 	}
 	undef;
 }
 
-sub anchor1 ($$$$$) {
-	my ($dst, $ctx, $linkify, $pb, $s) = @_;
+# returns "diff --git" anchor destination, undef otherwise
+sub anchor1 ($$) {
+	my ($ctx, $pb) = @_;
 	my $attr = to_attr($ctx->{-apfx}.$pb) or return;
-	my $line = $linkify->to_html($s);
 
 	my $ok = delete $ctx->{-anchors}->{$attr};
 
@@ -125,131 +98,133 @@ sub anchor1 ($$$$$) {
 			last;
 		}
 	}
-	if ($ok && $line =~ s/^diff //) {
-		$$dst .= "<a\nhref=#i$attr\nid=$attr>diff</a> ".$line;
-		return 1;
-	}
-	undef
+	$ok ? "<a\nhref=#i$attr\nid=$attr>diff</a> --git" : undef
 }
 
-sub missing_diff_git_line ($$) {
-	my ($dctx, $pb) = @_;
-	# missing "diff --git ..."
-	$dctx->{path_b} = $pb;
-	$dctx->{Q} = '?b='.uri_escape_utf8($pb, UNSAFE);
-	my $pa = $dctx->{path_a};
-	if (defined($pa) && $pa ne $pb) {
-		$dctx->{Q} .= '&amp;a='. uri_escape_utf8($pa, UNSAFE);
-	}
-}
-
-sub flush_diff ($$$) {
-	my ($dst, $ctx, $linkify) = @_;
-	my $diff = $ctx->{-diff};
+sub diff_header ($$$$) {
+	my ($dst, $x, $ctx, $top) = @_;
+	my (undef, undef, $pa, $pb) = splice(@$top, 0, 4); # ignore oid_{a,b}
 	my $spfx = $ctx->{-spfx};
-	my $state = DSTATE_INIT;
-	my $dctx = { Q => '' }; # {}, keys: oid_a, oid_b, path_a, path_b
-
-	foreach my $s (@$diff) {
-		if ($s =~ /^---$/) {
-			to_state($dst, $state, DSTATE_STAT);
-			$$dst .= $s;
-		} elsif ($s =~ /^ / || ($s =~ /^$/ && $state >= DSTATE_CTX)) {
-			# works for common cases, but not weird/long filenames
-			if ($state == DSTATE_STAT &&
-					$s =~ /^ (.+)( +\| .*\z)/s) {
-				anchor0($dst, $ctx, $linkify, $1, $2) and next;
-			} elsif ($state2class[$state]) {
-				to_state($dst, $state, DSTATE_CTX);
-			}
-			$$dst .= $linkify->to_html($s);
-		} elsif ($s =~ /^-- $/) { # email signature begins
-			$state == DSTATE_INIT or
-				to_state($dst, $state, DSTATE_INIT);
-			$$dst .= $s;
-		} elsif ($s =~ m!^diff --git ($PATH_X) ($PATH_X)$!o) {
-			my ($pa, $pb) = ($1, $2);
-			if ($state != DSTATE_HEAD) {
-				to_state($dst, $state, DSTATE_HEAD);
-			}
-			$pa = (split('/', git_unquote($pa), 2))[1];
+	my $dctx = { spfx => $spfx };
+	if ($pa eq $pb && $pb ne '/dev/null') {
+		$pa = $pb = (split('/', git_unquote($pb), 2))[1];
+		$dctx->{Q} = "?b=".uri_escape_utf8($pb, UNSAFE);
+	} else {
+		my @q;
+		if ($pb ne '/dev/null') {
 			$pb = (split('/', git_unquote($pb), 2))[1];
-			$dctx = {
-				Q => "?b=".uri_escape_utf8($pb, UNSAFE),
-			};
-			if ($pa ne $pb) {
-				$dctx->{Q} .= '&amp;a='.
-					uri_escape_utf8($pa, UNSAFE);
-			}
-			anchor1($dst, $ctx, $linkify, $pb, $s) and next;
-			$$dst .= $linkify->to_html($s);
-		} elsif ($s =~ s/^(index $OID_NULL\.\.)($OID_BLOB)\b//o) {
-			$$dst .= $1 . oid($dctx, $spfx, $2);
-			$dctx = { Q => '' };
-			$$dst .= $linkify->to_html($s) ;
-		} elsif ($s =~ s/^index ($OID_BLOB)(\.\.$OID_NULL)\b//o) {
-			$$dst .= 'index ' . oid($dctx, $spfx, $1) . $2;
-			$dctx = { Q => '' };
-			$$dst .= $linkify->to_html($s);
-		} elsif ($s =~ /^index ($OID_BLOB)\.\.($OID_BLOB)/o) {
-			$dctx->{oid_a} = $1;
-			$dctx->{oid_b} = $2;
-			$$dst .= $linkify->to_html($s);
-		} elsif ($s =~ s/^@@ (\S+) (\S+) @@//) {
-			$$dst .= '</span>' if $state2class[$state];
-			$$dst .= qq(<span\nclass="hunk">);
-			$$dst .= diff_hunk($dctx, $spfx, $1, $2);
-			$$dst .= '</span>';
-			$state = DSTATE_CTX;
-			$$dst .= $linkify->to_html($s);
-		} elsif ($s =~ m!^--- ($PATH_X)!o) {
-			my $pa = $1;
+			push @q, 'b='.uri_escape_utf8($pb, UNSAFE);
+		}
+		if ($pa ne '/dev/null') {
 			$pa = (split('/', git_unquote($pa), 2))[1];
-			if (($dctx->{path_a} // '') ne $pa) {
-				# missing "diff --git ..." ?
-				$dctx->{path_a} = $pa;
-			}
-			# color only (no oid link) if missing dctx->{oid_*}
-			$state <= DSTATE_STAT and
-				to_state($dst, $state, DSTATE_HEAD);
-			$$dst .= $linkify->to_html($s);
-		} elsif ($s =~ m!^\+{3} ($PATH_X)!o) {
-			my $pb = $1;
-			$pb = (split('/', git_unquote($pb), 2))[1];
-			if (($dctx->{path_b} // '') ne $pb) {
-				missing_diff_git_line($dctx, $pb);
-			}
+			push @q, 'a='.uri_escape_utf8($pa, UNSAFE);
+		}
+		$dctx->{Q} = '?'.join('&amp;', @q);
+	}
 
-			# color only (no oid link) if missing dctx->{oid_*}
-			$state <= DSTATE_STAT and
-				to_state($dst, $state, DSTATE_HEAD);
-			$$dst .= $linkify->to_html($s);
-		} elsif ($s =~ /^\+/) {
-			if ($state != DSTATE_ADD && $state > DSTATE_STAT) {
-				to_state($dst, $state, DSTATE_ADD);
+	# linkify early and all at once, since we know the following
+	# subst ops on $$x won't need further escaping:
+	$$x = $ctx->{-linkify}->to_html($$x);
+
+	# no need to capture oid_a and oid_b on add/delete,
+	# we just linkify OIDs directly via s///e in conditional
+	if (($$x =~ s/^(index $OID_NULL\.\.)($OID_BLOB)\b/
+			$1 . oid($dctx, $spfx, $2)/emos) ||
+		($$x =~ s/^index ($OID_BLOB)(\.\.$OID_NULL)\b/
+			'index ' . oid($dctx, $spfx, $1) . $2/emos)) {
+	} elsif ($$x =~ /^index ($OID_BLOB)\.\.($OID_BLOB)/mos) {
+		# modification-only, not add/delete:
+		# linkify hunk headers later using oid_a and oid_b
+		@$dctx{qw(oid_a oid_b)} = ($1, $2);
+	} else {
+		warn "BUG? <$$x> had no ^index line";
+	}
+	$$x =~ s!^diff --git!anchor1($ctx, $pb) // 'diff --git'!emos;
+	$$dst .= qq(<span\nclass="head">);
+	$$dst .= $$x;
+	$$dst .= '</span>';
+	$dctx;
+}
+
+sub diff_before_or_after ($$$) {
+	my ($dst, $ctx, $x) = @_;
+	my $linkify = $ctx->{-linkify};
+	for my $y (split(/(^---\r?\n)/sm, $$x)) {
+		if ($y =~ /\A---\r?\n\z/s) {
+			$$dst .= "---\n"; # all HTML is "\r\n" => "\n"
+		} elsif ($y =~ /^ [0-9]+ files? changed, /sm) {
+			# ok, looks like a diffstat, go line-by-line:
+			for my $l (split(/^/m, $y)) {
+				if ($l =~ /^ (.+)( +\| .*\z)/s) {
+					anchor0($dst, $ctx, $1, $2) and next;
+				}
+				$$dst .= $linkify->to_html($l);
 			}
-			$$dst .= $linkify->to_html($s);
-		} elsif ($s =~ /^-/) {
-			if ($state != DSTATE_DEL && $state > DSTATE_STAT) {
-				to_state($dst, $state, DSTATE_DEL);
+		} else { # commit message, notes, etc
+			$$dst .= $linkify->to_html($y);
+		}
+	}
+}
+
+sub flush_diff ($$$) {
+	my ($dst, $ctx, $cur) = @_;
+	state $LF = qr!\r?\n!;
+	state $ANY = qr![^\r\n]!;
+	state $FN = qr!(?:"?[^/\n]+/[^\r\n]+|/dev/null)!;
+
+	my @top = split(/(
+		(?:	# begin header stuff, don't capture filenames, here,
+			# but instead wait for the --- and +++ lines.
+			(?:^diff\x20--git\x20$FN\x20$FN$LF)
+
+			# old mode || new mode || copy|rename|deleted|...
+			(?:^[a-z]$ANY+$LF)*
+		)? # end of optional stuff, everything below is required
+		^index\x20($OID_BLOB)\.\.($OID_BLOB)$ANY*$LF
+		^---\x20($FN)$LF
+		^\+{3}\x20($FN)$LF)/smxo, $$cur);
+	$$cur = undef;
+
+	my $linkify = $ctx->{-linkify};
+	my $dctx; # {}, keys: Q, oid_a, oid_b
+
+	while (defined(my $x = shift @top)) {
+		if (scalar(@top) >= 4 &&
+				$top[1] =~ /\A$OID_BLOB\z/os &&
+				$top[0] =~ /\A$OID_BLOB\z/os) {
+			$dctx = diff_header($dst, \$x, $ctx, \@top);
+		} elsif ($dctx) {
+			my $after = '';
+			for my $s (split(/((?:(?:^\+[^\n]*\n)+)|
+					(?:(?:^-[^\n]*\n)+)|
+					(?:^@@ [^\n]+\n))/xsm, $x)) {
+				if (!defined($dctx)) {
+					$after .= $s;
+				} elsif ($s =~ s/\A@@ (\S+) (\S+) @@//) {
+					$$dst .= qq(<span\nclass="hunk">);
+					diff_hunk($dst, $dctx, $1, $2);
+					$$dst .= $linkify->to_html($s);
+					$$dst .= '</span>';
+				} elsif ($s =~ /\A\+/) {
+					$$dst .= qq(<span\nclass="add">);
+					$$dst .= $linkify->to_html($s);
+					$$dst .= '</span>';
+				} elsif ($s =~ /\A-- $/sm) { # email sig starts
+					$dctx = undef;
+					$after .= $s;
+				} elsif ($s =~ /\A-/) {
+					$$dst .= qq(<span\nclass="del">);
+					$$dst .= $linkify->to_html($s);
+					$$dst .= '</span>';
+				} else {
+					$$dst .= $linkify->to_html($s);
+				}
 			}
-			$$dst .= $linkify->to_html($s);
-		# ignore the following lines in headers:
-		} elsif ($s =~ /^(?:dis)similarity index/ ||
-			 $s =~ /^(?:old|new) mode/ ||
-			 $s =~ /^(?:deleted|new) file mode/ ||
-			 $s =~ /^(?:copy|rename) (?:from|to) / ||
-			 $s =~ /^(?:dis)?similarity index /) {
-			$$dst .= $linkify->to_html($s);
+			diff_before_or_after($dst, $ctx, \$after) unless $dctx;
 		} else {
-			$state <= DSTATE_STAT or
-				to_state($dst, $state, DSTATE_INIT);
-			$$dst .= $linkify->to_html($s);
+			diff_before_or_after($dst, $ctx, \$x);
 		}
 	}
-	@$diff = ();
-	$$dst .= '</span>' if $state2class[$state];
-	undef;
 }
 
 1;

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, back to index

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-01-25  4:44 [PATCH 00/22] HTML display cleanups, fixes, speedups Eric Wong
2020-01-25  4:44 ` [PATCH 01/22] www*stream: favor \&close instead of *close Eric Wong
2020-01-25  4:44 ` [PATCH 02/22] www: use "skel" terminology consistently Eric Wong
2020-01-25  4:44 ` [PATCH 03/22] view: improve readability around walk_thread Eric Wong
2020-01-25  4:44 ` [PATCH 04/22] searchview: keep $noop sub private to the package Eric Wong
2020-01-25  4:44 ` [PATCH 05/22] view: reduce parameters for html_footer Eric Wong
2020-01-25  4:44 ` [PATCH 06/22] view: thread_skel: drop constant tpfx parameter Eric Wong
2020-01-25  4:44 ` [PATCH 07/22] view: simplify duplicate Message-ID handling Eric Wong
2020-01-25  4:44 ` [PATCH 08/22] wwwstream: discard single-use $ctx fields after use Eric Wong
2020-01-25  4:44 ` [PATCH 09/22] view: start performing buffering into {obuf} Eric Wong
2020-01-25  4:44 ` [PATCH 10/22] t/plack.t: modernize and unindent Eric Wong
2020-01-25  4:44 ` [PATCH 11/22] init: use Import::run_die instead of system() Eric Wong
2020-01-25  4:45 ` [PATCH 12/22] tests: move the majority of t/view.t into t/plack.t Eric Wong
2020-01-25  4:45 ` [PATCH 13/22] xt/perf-msgview: switch to multipart_text_as_html Eric Wong
2020-01-25  4:45 ` [PATCH 14/22] view: inline and eliminate msg_html Eric Wong
2020-01-25  4:45 ` [PATCH 15/22] linkify: compile $LINK_RE once Eric Wong
2020-01-25  4:45 ` [PATCH 16/22] linkify: move to_html over from ViewDiff Eric Wong
2020-01-25  4:45 ` [PATCH 17/22] searchidx: skip filenames on "diff --git ..." Eric Wong
2020-01-25  4:45 ` [PATCH 18/22] searchidx: don't assume "a/" and "b/" as prefixes Eric Wong
2020-01-25  4:45 ` [PATCH 19/22] viewdiff: add "b=" param with non-standard diff prefix Eric Wong
2020-01-25  4:45 ` [PATCH 20/22] viewdiff: add "b=" param when missing "diff --git" line Eric Wong
2020-01-25  4:45 ` [PATCH 21/22] viewdiff: use autovivification for long_path hash Eric Wong
2020-01-25  4:45 ` [PATCH 22/22] viewdiff: rewrite and simplify Eric Wong

user/dev discussion of public-inbox itself

Archives are clonable:
	git clone --mirror https://public-inbox.org/meta
	git clone --mirror http://czquwvybam4bgbro.onion/meta
	git clone --mirror http://hjrcffqmbrq6wope.onion/meta
	git clone --mirror http://ou63pmih66umazou.onion/meta

Example config snippet for mirrors

Newsgroups are available over NNTP:
	nntp://news.public-inbox.org/inbox.comp.mail.public-inbox.meta
	nntp://ou63pmih66umazou.onion/inbox.comp.mail.public-inbox.meta
	nntp://czquwvybam4bgbro.onion/inbox.comp.mail.public-inbox.meta
	nntp://hjrcffqmbrq6wope.onion/inbox.comp.mail.public-inbox.meta
	nntp://news.gmane.io/gmane.mail.public-inbox.general

 note: .onion URLs require Tor: https://www.torproject.org/

AGPL code for this site: git clone https://public-inbox.org/public-inbox.git