user/dev discussion of public-inbox itself
From: Eric Wong <e@80x24.org>
To: Leah Neukirchen <leah@vuxu.org>
Cc: meta@public-inbox.org
Subject: [PATCH] www: use undecoded paths for Message-ID extraction
Date: Wed, 13 Jun 2018 22:43:56 +0000
Message-ID: <20180613224356.jz7abxkyg4i3tlf5@dcvr>
In-Reply-To: <20180613214055.2nudcx5e7w2y4q73@dcvr>

> Leah Neukirchen <leah@vuxu.org> wrote:
> > During testing, we also found another thing when obscure characters
> > are used in Message-IDs, esp. / and ?.
> > 
> > E.g. using a Message-ID of <F1WYEAZPOF.3LOD2T7ZHY9I1@localdomain/raw/T>
> > will create a corrupt link.  Some more "ideas" are at
> > https://inbox.vuxu.org/pi-test/

I guess I'm spoiled by Rack where PATH_INFO is undecoded :x
However, REQUEST_URI (the undecoded request URI) is specified in
the PSGI specs(*).
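
To make the difference concrete, here is a minimal sketch (not part of
the patch) of why a decoded path is ambiguous.  The inbox name
"pi-test", the simplified routing pattern and the unescape_sketch()
helper are assumptions for illustration only; the helper stands in for
URI::Escape::uri_unescape so the snippet needs nothing beyond core Perl.

```perl
use strict;
use warnings;

# Hypothetical stand-in for URI::Escape::uri_unescape, kept inline so
# the sketch only needs core Perl.
sub unescape_sketch {
	my ($s) = @_;
	$s =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
	$s;
}

# Message-ID from the report, with '/' escaped as %2F in the URL:
my $request_uri = '/pi-test/F1WYEAZPOF.3LOD2T7ZHY9I1@localdomain%2Fraw%2FT/';

# Roughly what a PSGI server hands over as PATH_INFO (already decoded):
my $path_info = unescape_sketch($request_uri);

# Simplified "/$INBOX/$MESSAGE_ID/" route; an assumption, not the real
# $INBOX_RE/$MID_RE from WWW.pm:
my $route = qr!\A/([^/]+)/([^/]+)/\z!;

my ($inbox_raw, $mid_raw) = ($request_uri =~ $route);
my ($inbox_dec, $mid_dec) = ($path_info =~ $route);

# Undecoded path: the Message-ID is one path component, and it can be
# unescaped after routing has picked it out.
print "undecoded: ",
	(defined $mid_raw ? unescape_sketch($mid_raw) : '(no match)'), "\n";

# Decoded path: the extra '/' characters split the Message-ID into
# several components, so this route no longer matches.
print "decoded:   ", (defined $mid_dec ? $mid_dec : '(no match)'), "\n";
```

With the undecoded path, routing happens first and unescaping second,
which is the order the patch below switches to.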

Very lightly tested, but this seems to work; additions to the
test suite will be necessary...

------8<----
Subject: [PATCH] www: use undecoded paths for Message-ID extraction

In PSGI, PATH_INFO contains URI-decoded paths, which cause
problems when Message-IDs contain characters that are also used
for routing (e.g. '/' and '?').  Instead, extract the undecoded
path from REQUEST_URI and use that.

Reported-by: Leah Neukirchen <leah@vuxu.org>
  https://public-inbox.org/meta/8736xsb5s5.fsf@vuxu.org/
---
 lib/PublicInbox/WWW.pm | 40 ++++++++++++++++++++++++++++------------
 t/cgi.t                |  2 ++
 2 files changed, 30 insertions(+), 12 deletions(-)

diff --git a/lib/PublicInbox/WWW.pm b/lib/PublicInbox/WWW.pm
index 24e24f1..c1c3926 100644
--- a/lib/PublicInbox/WWW.pm
+++ b/lib/PublicInbox/WWW.pm
@@ -36,6 +36,17 @@ sub run {
 	PublicInbox::WWW->new->call($req->env);
 }
 
+my %path_re_cache;
+
+sub path_re ($) {
+	my $sn = $_[0]->{SCRIPT_NAME};
+	$path_re_cache{$sn} ||= do {
+		$sn = '/'.$sn unless index($sn, '/') == 0;
+		$sn =~ s!/\z!!;
+		qr!\A(?:https?://[^/]+)?\Q$sn\E(/[^\?\#]+)!;
+	};
+}
+
 sub call {
 	my ($self, $env) = @_;
 	my $ctx = { env => $env, www => $self };
@@ -50,7 +61,8 @@ sub call {
 	} split(/[&;]+/, $env->{QUERY_STRING});
 	$ctx->{qp} = \%qp;
 
-	my $path_info = $env->{PATH_INFO};
+	# not using $env->{PATH_INFO} here since that's already decoded
+	my ($path_info) = ($env->{REQUEST_URI} =~ path_re($env));
 	my $method = $env->{REQUEST_METHOD};
 
 	if ($method eq 'POST') {
@@ -91,13 +103,13 @@ sub call {
 		invalid_inbox_mid($ctx, $1, $2) || get_attach($ctx, $idx, $fn);
 	# in case people leave off the trailing slash:
 	} elsif ($path_info =~ m!$INBOX_RE/$MID_RE/(T|t)\z!o) {
-		my ($inbox, $mid, $suffix) = ($1, $2, $3);
+		my ($inbox, $mid_ue, $suffix) = ($1, $2, $3);
 		$suffix .= $suffix =~ /\A[tT]\z/ ? '/#u' : '/';
-		r301($ctx, $inbox, $mid, $suffix);
+		r301($ctx, $inbox, $mid_ue, $suffix);
 
 	} elsif ($path_info =~ m!$INBOX_RE/$MID_RE/R/?\z!o) {
-		my ($inbox, $mid) = ($1, $2);
-		r301($ctx, $inbox, $mid, '#R');
+		my ($inbox, $mid_ue) = ($1, $2);
+		r301($ctx, $inbox, $mid_ue, '#R');
 
 	} elsif ($path_info =~ m!$INBOX_RE/$MID_RE/f/?\z!o) {
 		r301($ctx, $1, $2);
@@ -164,11 +176,11 @@ sub invalid_inbox ($$) {
 
 # returns undef if valid, array ref response if invalid
 sub invalid_inbox_mid {
-	my ($ctx, $inbox, $mid) = @_;
+	my ($ctx, $inbox, $mid_ue) = @_;
 	my $ret = invalid_inbox($ctx, $inbox);
 	return $ret if $ret;
 
-	$ctx->{mid} = $mid;
+	my $mid = $ctx->{mid} = uri_unescape($mid_ue);
 	my $ibx = $ctx->{-inbox};
 	if ($mid =~ m!\A([a-f0-9]{2})([a-f0-9]{38})\z!) {
 		my ($x2, $x38) = ($1, $2);
@@ -177,7 +189,7 @@ sub invalid_inbox_mid {
 		require Email::Simple;
 		my $s = Email::Simple->new($str);
 		$mid = PublicInbox::MID::mid_clean($s->header('Message-ID'));
-		return r301($ctx, $inbox, $mid);
+		return r301($ctx, $inbox, mid_escape($mid));
 	}
 	undef;
 }
@@ -352,7 +364,7 @@ sub legacy_redirects {
 }
 
 sub r301 {
-	my ($ctx, $inbox, $mid, $suffix) = @_;
+	my ($ctx, $inbox, $mid_ue, $suffix) = @_;
 	my $obj = $ctx->{-inbox};
 	unless ($obj) {
 		my $r404 = invalid_inbox($ctx, $inbox);
@@ -361,7 +373,11 @@ sub r301 {
 	}
 	my $url = $obj->base_url($ctx->{env});
 	my $qs = $ctx->{env}->{QUERY_STRING};
-	$url .= (mid_escape($mid) . '/') if (defined $mid);
+	if (defined $mid_ue) {
+		# common, and much nicer as '@' than '%40':
+		$mid_ue =~ s/%40/@/g;
+		$url .= $mid_ue . '/';
+	}
 	$url .= $suffix if (defined $suffix);
 	$url .= "?$qs" if $qs ne '';
 
@@ -371,9 +387,9 @@ sub r301 {
 }
 
 sub msg_page {
-	my ($ctx, $inbox, $mid, $e) = @_;
+	my ($ctx, $inbox, $mid_ue, $e) = @_;
 	my $ret;
-	$ret = invalid_inbox_mid($ctx, $inbox, $mid) and return $ret;
+	$ret = invalid_inbox_mid($ctx, $inbox, $mid_ue) and return $ret;
 	'' eq $e and return get_mid_html($ctx);
 	'T/' eq $e and return get_thread($ctx, 1);
 	't/' eq $e and return get_thread($ctx);
diff --git a/t/cgi.t b/t/cgi.t
index bd92ca3..2e2476d 100644
--- a/t/cgi.t
+++ b/t/cgi.t
@@ -225,6 +225,8 @@ sub cgi_run {
 	my %env = (
 		PATH_INFO => $_[0],
 		QUERY_STRING => $_[1] || "",
+		SCRIPT_NAME => '',
+		REQUEST_URI => $_[0] . ($_[1] ? "?$_[1]" : ''),
 		REQUEST_METHOD => $_[2] || "GET",
 		GATEWAY_INTERFACE => 'CGI/1.1',
 		HTTP_ACCEPT => '*/*',
-- 
(*) git clone https://github.com/plack/psgi-specs.git
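
For anyone who wants to poke at the REQUEST_URI extraction outside of a
PSGI server, the following is a rough standalone sketch of what
path_re() in the patch does.  path_re_sketch(), the example URIs and
the host name are made up for illustration; the cache from the patch is
left out to keep it short, and '@' is backslashed in the substitution
purely as a precaution.

```perl
use strict;
use warnings;

# Same regexp construction as path_re() in the patch, minus the cache.
# Takes SCRIPT_NAME directly instead of a PSGI env hash ref.
sub path_re_sketch {
	my ($script_name) = @_;
	my $sn = $script_name;
	$sn = '/'.$sn unless index($sn, '/') == 0; # ensure a leading slash
	$sn =~ s!/\z!!;                            # drop a trailing slash
	# optional scheme+host, then the mount point, then capture the
	# path up to (not including) any query string or fragment
	qr!\A(?:https?://[^/]+)?\Q$sn\E(/[^\?\#]+)!;
}

# [ SCRIPT_NAME, REQUEST_URI ] pairs (hypothetical values)
my @cases = (
	[ '', '/pi-test/a%2Fb@example/T/?x=1' ],
	[ '/archive', 'http://example.com/archive/pi-test/m%40h/' ],
);

my @paths;
for my $c (@cases) {
	my ($script_name, $request_uri) = @$c;
	my ($path) = ($request_uri =~ path_re_sketch($script_name));
	push @paths, $path;
	print "$path\n"; # still percent-encoded, query string removed
}

# As in r301(): show '@' instead of '%40' in redirect URLs, leaving
# every other escape untouched.
(my $mid_ue = 'm%40h') =~ s/%40/\@/g;
print "$mid_ue\n";
```

The captured path keeps escapes such as %2F intact, so the routing
regexps only ever see a '/' that was really a path separator.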

Thread overview: 16+ messages
2018-06-09 17:06 Some points on public-inbox Leah Neukirchen
2018-06-12 10:09 ` Eric Wong
2018-06-12 11:31   ` Leah Neukirchen
2018-06-13  2:07     ` [PATCH] Makefile.PL: do not depend on git Eric Wong
2018-06-13 14:26       ` Leah Neukirchen
2018-06-13 21:04         ` Eric Wong
2018-06-13 21:20           ` Leah Neukirchen
2018-06-13 21:40     ` Some points on public-inbox Eric Wong
2018-06-13 22:43       ` Eric Wong [this message]
2018-06-26  7:46         ` [PATCH] additional tests for bad Message-IDs in URLs Eric Wong
2018-06-12 13:19   ` Some points on public-inbox Leah Neukirchen
2019-01-05  8:39     ` Eric Wong
2018-06-12 17:05   ` Konstantin Ryabitsev
2018-06-13  1:57     ` Eric Wong
2019-04-18  8:25   ` [RFC] www: support listing of inboxes Eric Wong
2019-05-05 23:36     ` Eric Wong
