user/dev discussion of public-inbox itself
* [PATCH 0/1] Fix broken clone URLs due to SCRIPT_NAME getting reset
@ 2019-09-24  4:10 edef
  2019-09-24  4:10 ` [PATCH 1/1] wwwstream: copy $ctx->{env} in new edef
  2019-09-26  3:03 ` [PATCH 0/1] Fix broken clone URLs due to SCRIPT_NAME getting reset Eric Wong
  0 siblings, 2 replies; 4+ messages in thread
From: edef @ 2019-09-24  4:10 UTC
  To: meta; +Cc: hi, edef

We're trying to get public-inbox working with a PSGI file that mounts
it to a subdirectory. This seems like it's intended to be a supported
use case, with stuff paying attention to SCRIPT_NAME and all when
generating URLs.

However, Plack::App::URLMap seems determined to reset SCRIPT_NAME
before getline gets called:

    my $orig_path_info   = $env->{PATH_INFO};
    my $orig_script_name = $env->{SCRIPT_NAME};

    $env->{PATH_INFO}  = $path;
    $env->{SCRIPT_NAME} = $script_name . $location;
    return $self->response_cb($app->($env), sub {
        $env->{PATH_INFO} = $orig_path_info;
        $env->{SCRIPT_NAME} = $orig_script_name;
    });

I'm not sure whether public-inbox or Plack is in the wrong here, but
the timing works out poorly. By the time
PublicInbox::WwwStream::_html_end gets invoked, SCRIPT_NAME is blank,
and the wrong URLs get generated.
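
The timing problem can be reproduced without Plack installed. Below is a minimal simulation, not Plack's code: `url_map` is a hypothetical stand-in for the set/call/restore sequence quoted above, and `LazyBody` is a hypothetical body class that, like WwwStream, only reads `SCRIPT_NAME` when the server calls `getline`. Both names are assumptions made for illustration.

```perl
use strict;
use warnings;

package LazyBody;

# Hypothetical body class: like WwwStream, it keeps a reference to $env
# and only reads SCRIPT_NAME once the server calls getline.
sub new {
	my ($class, $env) = @_;
	return bless { env => $env, done => 0 }, $class;
}

sub getline {
	my ($self) = @_;
	return if $self->{done}++;    # one line, then EOF (undef)
	return "clone URL path: $self->{env}->{SCRIPT_NAME}/test\n";
}

package main;

# Hypothetical stand-in for Plack::App::URLMap: set SCRIPT_NAME, call the
# app, then restore the original value as soon as the app returns.
sub url_map {
	my ($env, $location, $app) = @_;
	my $orig_script_name = $env->{SCRIPT_NAME};
	$env->{SCRIPT_NAME} = $orig_script_name . $location;
	my $res = $app->($env);
	$env->{SCRIPT_NAME} = $orig_script_name;    # restored before streaming
	return $res;
}

my $env = { SCRIPT_NAME => '' };
my $res = url_map($env, '/a', sub { [ 200, [], LazyBody->new($_[0]) ] });

# The "server" drains the body only after url_map has returned.
my $out = '';
while (defined(my $line = $res->[2]->getline)) {
	$out .= $line;
}
print $out;    # prints "clone URL path: /test" (the /a prefix is lost)
```

In this simulation the restore runs as soon as the app hands back its response array, so anything `getline` derives from `$env` sees the unmounted `SCRIPT_NAME`.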

Copying env seems to fix it, and that's what the attached patch does.
I'm pretty sure this is the wrong approach, but it seems to work.

edef (1):
  wwwstream: copy $ctx->{env} in new

 lib/PublicInbox/WwwStream.pm | 4 ++++
 1 file changed, 4 insertions(+)

base-commit: 55283284757af5f5d8f63fd17d53340e4dea34fb
-- 
git-series 0.9.1

* [PATCH 1/1] wwwstream: copy $ctx->{env} in new
From: edef @ 2019-09-24  4:10 UTC
  To: meta; +Cc: hi, edef

Plack::App::URLMap wipes out SCRIPT_NAME after we return,
and _html_end needs it for generating correct URLs.
---
 lib/PublicInbox/WwwStream.pm | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/lib/PublicInbox/WwwStream.pm b/lib/PublicInbox/WwwStream.pm
index e0823c8..6bca095 100644
--- a/lib/PublicInbox/WwwStream.pm
+++ b/lib/PublicInbox/WwwStream.pm
@@ -19,6 +19,10 @@ sub close {}
 
 sub new {
 	my ($class, $ctx, $cb) = @_;
+
+	my %env = %{$ctx->{env}}; # full hash copy
+	$ctx->{env} = \%env;
+
 	bless { nr => 0, cb => $cb || *close, ctx => $ctx }, $class;
 }
 
-- 
git-series 0.9.1

* Re: [PATCH 0/1] Fix broken clone URLs due to SCRIPT_NAME getting reset
From: Eric Wong @ 2019-09-26  3:03 UTC
  To: edef; +Cc: meta, hi

edef <edef@edef.eu> wrote:
> We're trying to get public-inbox working with a PSGI file that mounts
> it to a subdirectory. This seems like it's intended to be a supported
> use case, with stuff paying attention to SCRIPT_NAME and all when
> generating URLs.
> 
> However, Plack::App::URLMap seems determined to reset SCRIPT_NAME
> before getline gets called:
> 
>     my $orig_path_info   = $env->{PATH_INFO};
>     my $orig_script_name = $env->{SCRIPT_NAME};
> 
>     $env->{PATH_INFO}  = $path;
>     $env->{SCRIPT_NAME} = $script_name . $location;
>     return $self->response_cb($app->($env), sub {
>         $env->{PATH_INFO} = $orig_path_info;
>         $env->{SCRIPT_NAME} = $orig_script_name;
>     });

Sounds like a familiar problem to me :x

> I'm not sure whether public-inbox or Plack is in the wrong here, but
> the timing works out poorly. By the time
> PublicInbox::WwwStream::_html_end gets invoked, SCRIPT_NAME is blank,
> and the wrong URLs get generated.
> 
> Copying env seems to fix it, and that's what the attached patch does.
> I'm pretty sure this is the wrong approach, but it seems to work.

Yeah, it's a big hash, and there's no need to copy the whole thing.

I gotta run now, but I think the patch below will work for you
by precalculating base_url up front.   Can you confirm?  Thanks.

Also, I suspect the mbox Archived-At headers could be wrong
and need a similar change...  Maybe Atom feeds, too.

diff --git a/lib/PublicInbox/WwwStream.pm b/lib/PublicInbox/WwwStream.pm
index e0823c8d..b240c071 100644
--- a/lib/PublicInbox/WwwStream.pm
+++ b/lib/PublicInbox/WwwStream.pm
@@ -19,7 +19,17 @@ sub close {}
 
 sub new {
 	my ($class, $ctx, $cb) = @_;
-	bless { nr => 0, cb => $cb || *close, ctx => $ctx }, $class;
+
+	my $env = $ctx->{env};
+	my $ibx = $ctx->{-inbox};
+	my $base_url = $ibx->base_url($env);
+	chop $base_url; # no trailing slash for clone
+	bless {
+		nr => 0,
+		cb => $cb || *close,
+		ctx => $ctx,
+		base_url => $base_url,
+	}, $class;
 }
 
 sub response {
@@ -83,8 +93,7 @@ sub _html_end {
 	my $desc = ascii_html($ibx->description);
 
 	my (%seen, @urls);
-	my $http = $ibx->base_url($ctx->{env});
-	chop $http; # no trailing slash for clone
+	my $http = $self->{base_url};
 	my $max = $ibx->max_git_epoch;
 	my $dir = (split(m!/!, $http))[-1];
 	if (defined($max)) { # v2
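
The patch above computes the URL once in `new`, while `SCRIPT_NAME` still carries the mount prefix, and `_html_end` later reads only `$self->{base_url}`. The following is a minimal sketch of that pattern, not the actual public-inbox code: `sketch_base_url` is a hypothetical stand-in for `$ibx->base_url($env)` (assumed here to build `scheme://host` + `SCRIPT_NAME` + `/$inbox/`), and `EagerBody` is a hypothetical body class.

```perl
use strict;
use warnings;

# Hypothetical stand-in for $ibx->base_url($env): assumed to build
# "scheme://host" . SCRIPT_NAME . "/$inbox/" from a PSGI env.
sub sketch_base_url {
	my ($env, $inbox) = @_;
	my $host = $env->{HTTP_HOST}
		// "$env->{SERVER_NAME}:$env->{SERVER_PORT}";
	my $url = "$env->{'psgi.url_scheme'}://$host$env->{SCRIPT_NAME}";
	$url .= '/' unless $url =~ m!/\z!;
	return "$url$inbox/";
}

package EagerBody;

# Hypothetical body class: computes the clone URL in new(), while the
# mount prefix is still in SCRIPT_NAME, and never reads $env again.
sub new {
	my ($class, $env) = @_;
	my $base_url = main::sketch_base_url($env, 'test');
	chop $base_url;    # no trailing slash for clone
	return bless { base_url => $base_url, done => 0 }, $class;
}

sub getline {
	my ($self) = @_;
	return if $self->{done}++;
	return "git clone --mirror $self->{base_url}\n";
}

package main;

my $env = { 'psgi.url_scheme' => 'http', HTTP_HOST => 'localhost',
	SCRIPT_NAME => '' };
my $orig = $env->{SCRIPT_NAME};
$env->{SCRIPT_NAME} = '/a';         # mounted: URLMap sets the prefix
my $body = EagerBody->new($env);    # app builds its body here
$env->{SCRIPT_NAME} = $orig;        # URLMap restores it afterwards

my $out = '';
while (defined(my $line = $body->getline)) {
	$out .= $line;
}
print $out;    # prints "git clone --mirror http://localhost/a/test"
```

Only one scalar is kept per response, which is the point of the reply above: there is no need to copy the whole `$env` hash as the first patch did.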

* [PATCH] www: fix absolute URLs when mounted under a subdir
From: Eric Wong @ 2019-10-01  7:13 UTC
  To: edef; +Cc: meta, hi

Eric Wong <e@80x24.org> wrote:
> Also, I suspect the mbox Archived-At headers could be wrong
> and need a similar change...  Maybe Atom feeds, too.

Yup, mboxrd code needed changing.  Atom feeds already had full
URLs (and tests), so I added some test cases to t/psgi_mount.t
and fixed the remaining cases.

Just pushed this out to master:

---------8<-----------
Subject: [PATCH] www: fix absolute URLs when mounted under a subdir

While we avoid generating absolute URLs in most cases, our
"git clone" instructions and URL headers in mboxrd files
contain full URLs.

So do the same thing we do for WwwAtomStream and pre-generate
the full URL before Plack::App::URLMap changes $env->{PATH_INFO}
and $env->{SCRIPT_NAME} back to their original values.

Reported-by: edef <edef@edef.eu>
Link: https://public-inbox.org/meta/cover.0f97c47bb88db8b875be7497289d8fedd3b11991.1569296942.git-series.edef@edef.eu/
---
 lib/PublicInbox/Mbox.pm      |  5 ++++-
 lib/PublicInbox/WwwStream.pm | 13 +++++++++---
 t/psgi_mount.t               | 38 ++++++++++++++++++++++++++++++++++--
 3 files changed, 50 insertions(+), 6 deletions(-)

diff --git a/lib/PublicInbox/Mbox.pm b/lib/PublicInbox/Mbox.pm
index 6d902e6c..67b671f5 100644
--- a/lib/PublicInbox/Mbox.pm
+++ b/lib/PublicInbox/Mbox.pm
@@ -60,10 +60,12 @@ sub getline {
 
 sub close {} # noop
 
+# /$INBOX/$MESSAGE_ID/raw
 sub emit_raw {
 	my ($ctx) = @_;
 	my $mid = $ctx->{mid};
 	my $ibx = $ctx->{-inbox};
+	$ctx->{base_url} = $ibx->base_url($ctx->{env});
 	my ($mref, $more, $id, $prev, $next);
 	if (my $over = $ibx->over) {
 		my $smsg = $over->next_by_mid($mid, \$id, \$prev) or return;
@@ -97,7 +99,7 @@ sub msg_hdr ($$;$) {
 		$header_obj->header_set($d);
 	}
 	my $ibx = $ctx->{-inbox};
-	my $base = $ibx->base_url($ctx->{env});
+	my $base = $ctx->{base_url};
 	$mid = $ctx->{mid} unless defined $mid;
 	$mid = mid_escape($mid);
 	my @append = (
@@ -246,6 +248,7 @@ use PublicInbox::Hval qw/to_filename/;
 sub new {
 	my ($class, $ctx, $cb) = @_;
 	my $buf = '';
+	$ctx->{base_url} = $ctx->{-inbox}->base_url($ctx->{env});
 	bless {
 		buf => \$buf,
 		gz => IO::Compress::Gzip->new(\$buf, Time => 0),
diff --git a/lib/PublicInbox/WwwStream.pm b/lib/PublicInbox/WwwStream.pm
index 7399b0ad..f5338c39 100644
--- a/lib/PublicInbox/WwwStream.pm
+++ b/lib/PublicInbox/WwwStream.pm
@@ -19,7 +19,15 @@ sub close {}
 
 sub new {
 	my ($class, $ctx, $cb) = @_;
-	bless { nr => 0, cb => $cb || *close, ctx => $ctx }, $class;
+
+	my $base_url = $ctx->{-inbox}->base_url($ctx->{env});
+	chop $base_url; # no trailing slash for clone
+	bless {
+		nr => 0,
+		cb => $cb || *close,
+		ctx => $ctx,
+		base_url => $base_url,
+	}, $class;
 }
 
 sub response {
@@ -83,8 +91,7 @@ sub _html_end {
 	my $desc = ascii_html($ibx->description);
 
 	my (%seen, @urls);
-	my $http = $ibx->base_url($ctx->{env});
-	chop $http; # no trailing slash for clone
+	my $http = $self->{base_url};
 	my $max = $ibx->max_git_epoch;
 	my $dir = (split(m!/!, $http))[-1];
 	if (defined($max)) { # v2
diff --git a/t/psgi_mount.t b/t/psgi_mount.t
index 05dbd736..8da2bc89 100644
--- a/t/psgi_mount.t
+++ b/t/psgi_mount.t
@@ -60,11 +60,24 @@ test_psgi($app, sub {
 	unlike($res->content, qr!\b\Qhttp://[^/]+/test/\E!,
 		'No URLs which are not mount-aware');
 
-	# redirects
+	$res = $cb->(GET('/a/test/new.html'));
+	like($res->content, qr!git clone --mirror http://[^/]+/a/test\b!,
+		'clone URL in new.html is mount-aware');
+
 	$res = $cb->(GET('/a/test/blah%40example.com/'));
 	is($res->code, 200, 'OK with URLMap mount');
+	like($res->content, qr!git clone --mirror http://[^/]+/a/test\b!,
+		'clone URL in /$INBOX/$MESSAGE_ID/ is mount-aware');
+
 	$res = $cb->(GET('/a/test/blah%40example.com/raw'));
 	is($res->code, 200, 'OK with URLMap mount');
+	like($res->content, qr!^List-Archive: <http://[^/]+/a/test/>!m,
+		'List-Archive set in /raw mboxrd');
+	like($res->content,
+		qr!^Archived-At: <http://[^/]+/a/test/blah\@example\.com/>!m,
+		'Archived-At set in /raw mboxrd');
+
+	# redirects
 	$res = $cb->(GET('/a/test/m/blah%40example.com.html'));
 	is($res->header('Location'),
 		'http://localhost/a/test/blah@example.com/',
@@ -72,7 +85,28 @@ test_psgi($app, sub {
 
 	$res = $cb->(GET('/test/blah%40example.com/'));
 	is($res->code, 404, 'intentional 404 with URLMap mount');
-
 });
 
+SKIP: {
+	my @mods = qw(DBI DBD::SQLite Search::Xapian IO::Uncompress::Gunzip);
+	foreach my $mod (@mods) {
+		eval "require $mod" or skip "$mod not available: $@", 2;
+	}
+	my $ibx = $config->lookup_name('test');
+	PublicInbox::SearchIdx->new($ibx, 1)->index_sync;
+	test_psgi($app, sub {
+		my ($cb) = @_;
+		my $res = $cb->(GET('/a/test/blah@example.com/t.mbox.gz'));
+		my $gz = $res->content;
+		my $raw;
+		IO::Uncompress::Gunzip::gunzip(\$gz => \$raw);
+		like($raw, qr!^List-Archive: <http://[^/]+/a/test/>!m,
+			'List-Archive set in /t.mbox.gz mboxrd');
+		like($raw,
+			qr!^Archived-At:\x20
+				<http://[^/]+/a/test/blah\@example\.com/>!mx,
+			'Archived-At set in /t.mbox.gz mboxrd');
+	});
+}
+
 done_testing();
-- 
EW
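
The Mbox.pm hunks apply the same idea to mboxrd output: `emit_raw` and the gzip stream's `new` store `$ctx->{base_url}` up front, and `msg_hdr` reads it later instead of calling `base_url` on `$ctx->{env}`. Below is a hypothetical sketch (`sketch_archive_headers` is not a public-inbox function) of the two header lines that t/psgi_mount.t checks, assuming the Message-ID is already escaped; the real code runs it through `mid_escape`.

```perl
use strict;
use warnings;

# Hypothetical sketch: build the archive headers from a base URL captured
# earlier, so a later reset of $ctx->{env}{SCRIPT_NAME} cannot affect them.
# Assumption: $mid is already escaped (the real code calls mid_escape).
sub sketch_archive_headers {
	my ($ctx, $mid) = @_;
	my $base = $ctx->{base_url};    # e.g. "http://localhost/a/test/"
	return ("List-Archive: <$base>", "Archived-At: <$base$mid/>");
}

my $ctx = { env => { SCRIPT_NAME => '/a' } };
$ctx->{base_url} = 'http://localhost/a/test/';    # captured while mounted
$ctx->{env}->{SCRIPT_NAME} = '';                  # URLMap restores it later

my @hdr = sketch_archive_headers($ctx, 'blah@example.com');
print "$_\n" for @hdr;
```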


Code repositories for project(s) associated with this public inbox

	https://80x24.org/public-inbox.git
