user/dev discussion of public-inbox itself
 help / color / Atom feed
* [PATCH 0/3] avoid redundant CRLF handling
@ 2020-02-24  7:33 Eric Wong
  2020-02-24  7:33 ` [PATCH 1/3] hval: ascii_html: drop CRLF => LF conversion Eric Wong
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: Eric Wong @ 2020-02-24  7:33 UTC (permalink / raw)
  To: meta

Redundant ops waste cycles and make the code more difficult to
follow.  And 3/3 is an overdue cleanup which can also serve as
an impromptu test for solver...

Eric Wong (3):
  hval: ascii_html: drop CRLF => LF conversion
  viewdiff: remove optional CR handling
  examples/nginx_proxy: convert CRLF to LF

 examples/nginx_proxy        | 48 ++++++++++++++++++-------------------
 lib/PublicInbox/Hval.pm     |  1 -
 lib/PublicInbox/ViewDiff.pm | 11 +++++----
 lib/PublicInbox/ViewVCS.pm  |  2 +-
 t/plack.t                   | 27 +++++++++++++++++++++
 5 files changed, 58 insertions(+), 31 deletions(-)


^ permalink raw reply	[flat|nested] 4+ messages in thread

* [PATCH 1/3] hval: ascii_html: drop CRLF => LF conversion
  2020-02-24  7:33 [PATCH 0/3] avoid redundant CRLF handling Eric Wong
@ 2020-02-24  7:33 ` Eric Wong
  2020-02-24  7:33 ` [PATCH 2/3] viewdiff: remove optional CR handling Eric Wong
  2020-02-24  7:33 ` [PATCH 3/3] examples/nginx_proxy: convert CRLF to LF Eric Wong
  2 siblings, 0 replies; 4+ messages in thread
From: Eric Wong @ 2020-02-24  7:33 UTC (permalink / raw)
  To: meta

Instead, we add CRLF conversion to the only remaining place
which needs it, ViewVCS.  This save many redundant ops in in
many places.

The only other place where this mattered was in
View::add_text_body, but we already started doing CRLF
conversions when we added diff parsing and link generation for
ViewVCS.  Otherwise, all other places we used this was for
header viewing and Email::MIME doesn't preserve CRLF in headers.
---
 lib/PublicInbox/Hval.pm    |  1 -
 lib/PublicInbox/ViewVCS.pm |  2 +-
 t/plack.t                  | 27 +++++++++++++++++++++++++++
 3 files changed, 28 insertions(+), 2 deletions(-)

diff --git a/lib/PublicInbox/Hval.pm b/lib/PublicInbox/Hval.pm
index 5f7ab513..79005d21 100644
--- a/lib/PublicInbox/Hval.pm
+++ b/lib/PublicInbox/Hval.pm
@@ -55,7 +55,6 @@ sub src_escape ($) {
 
 sub ascii_html {
 	my ($s) = @_;
-	$s =~ s/\r\n/\n/sg; # fixup bad line endings
 	$s =~ s/([<>&'"\x7f\x00-\x1f])/$xhtml_map{$1}/sge;
 	$enc_ascii->encode($s, Encode::HTMLCREF);
 }
diff --git a/lib/PublicInbox/ViewVCS.pm b/lib/PublicInbox/ViewVCS.pm
index 1379bd58..2f8e1c4f 100644
--- a/lib/PublicInbox/ViewVCS.pm
+++ b/lib/PublicInbox/ViewVCS.pm
@@ -164,7 +164,7 @@ sub solve_result {
 
 	# TODO: detect + convert to ensure validity
 	utf8::decode($$blob);
-	my $nl = ($$blob =~ tr/\n/\n/);
+	my $nl = ($$blob =~ s/\r?\n/\n/sg);
 	my $pad = length($nl);
 
 	$l->linkify_1($$blob);
diff --git a/t/plack.t b/t/plack.t
index fe767aed..ea318792 100644
--- a/t/plack.t
+++ b/t/plack.t
@@ -109,6 +109,23 @@ EOF
 	like($mime->body_raw, qr/hi =3D bye=/, 'our test used QP correctly');
 	$im->add($mime);
 
+	my $crlf = <<EOF;
+From: Me
+  <me\@example.com>
+To: $addr
+Message-Id: <crlf\@example.com>
+Subject: carriage
+  return
+  in
+  long
+  subject
+Date: Fri, 02 Oct 1993 00:00:00 +0000
+
+:(
+EOF
+	$crlf =~ s/\n/\r\n/sg;
+	$im->add(Email::MIME->new($crlf));
+
 	$im->done;
 }
 
@@ -120,6 +137,16 @@ test_psgi($app, sub {
 	}
 });
 
+test_psgi($app, sub {
+	my ($cb) = @_;
+	my $res = $cb->(GET('http://example.com/test/crlf@example.com/'));
+	is($res->code, 200, 'retrieved CRLF as HTML');
+	unlike($res->content, qr/\r/, 'no CR in HTML');
+	$res = $cb->(GET('http://example.com/test/crlf@example.com/raw'));
+	is($res->code, 200, 'retrieved CRLF raw');
+	like($res->content, qr/\r/, 'CR preserved in raw message');
+});
+
 # redirect with newsgroup
 test_psgi($app, sub {
 	my ($cb) = @_;

^ permalink raw reply	[flat|nested] 4+ messages in thread

* [PATCH 2/3] viewdiff: remove optional CR handling
  2020-02-24  7:33 [PATCH 0/3] avoid redundant CRLF handling Eric Wong
  2020-02-24  7:33 ` [PATCH 1/3] hval: ascii_html: drop CRLF => LF conversion Eric Wong
@ 2020-02-24  7:33 ` Eric Wong
  2020-02-24  7:33 ` [PATCH 3/3] examples/nginx_proxy: convert CRLF to LF Eric Wong
  2 siblings, 0 replies; 4+ messages in thread
From: Eric Wong @ 2020-02-24  7:33 UTC (permalink / raw)
  To: meta

The only caller of `flush_diff' is `add_text_body', and that
already did CRLF conversion on the text part.  The regexps in
SolverGit still need to preserve CR, however, since that
actually applies patches (instead of rendering them), and we
need to preserve CRLF patches for CRLF files.
---
 lib/PublicInbox/ViewDiff.pm | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/lib/PublicInbox/ViewDiff.pm b/lib/PublicInbox/ViewDiff.pm
index 604b1f23..0f5c0e4e 100644
--- a/lib/PublicInbox/ViewDiff.pm
+++ b/lib/PublicInbox/ViewDiff.pm
@@ -150,8 +150,8 @@ sub diff_header ($$$$) {
 sub diff_before_or_after ($$$) {
 	my ($dst, $ctx, $x) = @_;
 	my $linkify = $ctx->{-linkify};
-	for my $y (split(/(^---\r?\n)/sm, $$x)) {
-		if ($y =~ /\A---\r?\n\z/s) {
+	for my $y (split(/(^---\n)/sm, $$x)) {
+		if ($y =~ /\A---\n\z/s) {
 			$$dst .= "---\n"; # all HTML is "\r\n" => "\n"
 		} elsif ($y =~ /^ [0-9]+ files? changed, /sm) {
 			# ok, looks like a diffstat, go line-by-line:
@@ -167,11 +167,12 @@ sub diff_before_or_after ($$$) {
 	}
 }
 
+# callers must do CRLF => LF conversion before calling this
 sub flush_diff ($$$) {
 	my ($dst, $ctx, $cur) = @_;
-	state $LF = qr!\r?\n!;
-	state $ANY = qr![^\r\n]!;
-	state $FN = qr!(?:"?[^/\n]+/[^\r\n]+|/dev/null)!;
+	state $LF = qr!\n!;
+	state $ANY = qr![^\n]!;
+	state $FN = qr!(?:"?[^/\n]+/[^\n]+|/dev/null)!;
 
 	my @top = split(/(
 		(?:	# begin header stuff, don't capture filenames, here,

^ permalink raw reply	[flat|nested] 4+ messages in thread

* [PATCH 3/3] examples/nginx_proxy: convert CRLF to LF
  2020-02-24  7:33 [PATCH 0/3] avoid redundant CRLF handling Eric Wong
  2020-02-24  7:33 ` [PATCH 1/3] hval: ascii_html: drop CRLF => LF conversion Eric Wong
  2020-02-24  7:33 ` [PATCH 2/3] viewdiff: remove optional CR handling Eric Wong
@ 2020-02-24  7:33 ` Eric Wong
  2 siblings, 0 replies; 4+ messages in thread
From: Eric Wong @ 2020-02-24  7:33 UTC (permalink / raw)
  To: meta

It was the only file in our tree which had CRLF line endings,
so make it consistent with the rest.
---
 examples/nginx_proxy | 48 ++++++++++++++++++++++----------------------
 1 file changed, 24 insertions(+), 24 deletions(-)

diff --git a/examples/nginx_proxy b/examples/nginx_proxy
index 38e60643..d8d1e6df 100644
--- a/examples/nginx_proxy
+++ b/examples/nginx_proxy
@@ -1,24 +1,24 @@
-# Example NGINX configuration to proxy-pass requests\r
-# to public-inbox-httpd or to a standalone PSGI/Plack server.\r
-# The daemon is assumed to be running locally on port 8001.\r
-# Adjust ssl certificate paths if you use any, or remove\r
-# the ssl configuration directives if you don't.\r
-server {\r
-	server_name _;\r
-	listen 80;\r
-\r
-	access_log /var/log/nginx/public-inbox-httpd_access.log;\r
-	error_log /var/log/nginx/public-inbox-httpd_error.log;\r
-\r
-	location ~* ^/(.*)$ {\r
-		proxy_set_header    HOST $host;\r
-		proxy_set_header    X-Real-IP $remote_addr;\r
-		proxy_set_header    X-Forwarded-Proto $scheme;\r
-		proxy_pass          http://127.0.0.1:8001$request_uri;\r
-	}\r
-\r
-	listen 443 ssl;\r
-	ssl_certificate /path/to/certificate.pem;\r
-	ssl_certificate_key /path/to/certificate_key.pem;\r
-}\r
-\r
+# Example NGINX configuration to proxy-pass requests
+# to public-inbox-httpd or to a standalone PSGI/Plack server.
+# The daemon is assumed to be running locally on port 8001.
+# Adjust ssl certificate paths if you use any, or remove
+# the ssl configuration directives if you don't.
+server {
+	server_name _;
+	listen 80;
+
+	access_log /var/log/nginx/public-inbox-httpd_access.log;
+	error_log /var/log/nginx/public-inbox-httpd_error.log;
+
+	location ~* ^/(.*)$ {
+		proxy_set_header    HOST $host;
+		proxy_set_header    X-Real-IP $remote_addr;
+		proxy_set_header    X-Forwarded-Proto $scheme;
+		proxy_pass          http://127.0.0.1:8001$request_uri;
+	}
+
+	listen 443 ssl;
+	ssl_certificate /path/to/certificate.pem;
+	ssl_certificate_key /path/to/certificate_key.pem;
+}
+

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, back to index

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-02-24  7:33 [PATCH 0/3] avoid redundant CRLF handling Eric Wong
2020-02-24  7:33 ` [PATCH 1/3] hval: ascii_html: drop CRLF => LF conversion Eric Wong
2020-02-24  7:33 ` [PATCH 2/3] viewdiff: remove optional CR handling Eric Wong
2020-02-24  7:33 ` [PATCH 3/3] examples/nginx_proxy: convert CRLF to LF Eric Wong

user/dev discussion of public-inbox itself

Archives are clonable:
	git clone --mirror http://public-inbox.org/meta
	git clone --mirror http://czquwvybam4bgbro.onion/meta
	git clone --mirror http://hjrcffqmbrq6wope.onion/meta
	git clone --mirror http://ou63pmih66umazou.onion/meta

Example config snippet for mirrors

Newsgroups are available over NNTP:
	nntp://news.public-inbox.org/inbox.comp.mail.public-inbox.meta
	nntp://ou63pmih66umazou.onion/inbox.comp.mail.public-inbox.meta
	nntp://czquwvybam4bgbro.onion/inbox.comp.mail.public-inbox.meta
	nntp://hjrcffqmbrq6wope.onion/inbox.comp.mail.public-inbox.meta
	nntp://news.gmane.io/gmane.mail.public-inbox.general

 note: .onion URLs require Tor: https://www.torproject.org/

AGPL code for this site: git clone https://public-inbox.org/public-inbox.git