user/dev discussion of public-inbox itself
* [PATCH 0/7] improved thread views and 404 reductions
@ 2015-09-02  6:59 Eric Wong
  2015-09-02  6:59 ` [PATCH 1/7] view: close possible race condition in thread view Eric Wong
                   ` (6 more replies)
  0 siblings, 7 replies; 8+ messages in thread
From: Eric Wong @ 2015-09-02  6:59 UTC (permalink / raw)
  To: meta

The thread HTML view may now be flat (chronological, newest
first) to make active threads easier to follow.  We also make
unknown Message-IDs more usable by no longer running SHA-1 on
them, since a compressed ID cannot be looked up elsewhere.

The new external Message-ID finder is also handy for cross-posts,
and could later link to multiple external sources such as
mid.gmane.org and other places.

Eric Wong (7):
      view: close possible race condition in thread view
      view: optional flat view for recent messages
      view: account for missing In-Reply-To header
      view: simplify parent anchoring code
      view: pre-anchor entries for flat view
      view: avoid links to unknown compressed Message-IDs
      implement external Message-ID finder

 lib/PublicInbox/ExtMsg.pm |  92 ++++++++++++++++++++++++
 lib/PublicInbox/Hval.pm   |   4 +-
 lib/PublicInbox/View.pm   | 180 ++++++++++++++++++++++++++++++----------------
 lib/PublicInbox/WWW.pm    |  18 +++--
 public-inbox.cgi          |   1 +
 5 files changed, 226 insertions(+), 69 deletions(-)



* [PATCH 1/7] view: close possible race condition in thread view
  2015-09-02  6:59 [PATCH 0/7] improved thread views and 404 reductions Eric Wong
@ 2015-09-02  6:59 ` Eric Wong
  2015-09-02  6:59 ` [PATCH 2/7] view: optional flat view for recent messages Eric Wong
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: Eric Wong @ 2015-09-02  6:59 UTC (permalink / raw)
  To: meta

The Xapian index and git HEAD can be out of sync, so a message
which existed when we did the search may no longer be accessible
by the time we render it.  Defer sending the 200 response header
until the first message is actually loaded, and return a 404 if
none of them can be.
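A minimal sketch of the deferred-header idea in plain Perl, assuming a
PSGI-style responder that returns a writer when given a status and
headers.  The `Writer` package, `emit_thread` and the hash standing in
for git are hypothetical, not public-inbox's API:

```perl
use strict;
use warnings;

# Hypothetical writer standing in for a PSGI streaming writer.
package Writer;
sub new   { my ($class, $out) = @_; return bless { out => $out }, $class }
sub write { my ($self, $buf) = @_; push @{ $self->{out} }, $buf; return }
sub close { my ($self) = @_; push @{ $self->{out} }, '<EOF>'; return }

package main;

# Write every message that can still be loaded; only send the 200
# header once the first one is known to exist.
sub emit_thread {
    my ($responder, $store, @mids) = @_;
    my $cb = $responder;    # stays the responder until a message loads
    for my $mid (@mids) {
        my $body = $store->{$mid};
        next unless defined $body;    # indexed, but gone from the store
        if ($cb == $responder) {      # first readable message
            $cb = $responder->([200, ['Content-Type' => 'text/html']]);
        }
        $cb->write($body);
    }
    # nothing could be loaded: the index was stale, so report a 404
    if ($cb == $responder) {
        $responder->([404, ['Content-Type' => 'text/plain']]);
        return;
    }
    $cb->close;
    return;
}
```

The callback is only replaced once a message loads, so comparing it
with the original tells us whether anything was written, much like the
`$orig_cb eq $cb` check in the patch.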
---
 lib/PublicInbox/View.pm | 29 ++++++++++++++++++-----------
 1 file changed, 18 insertions(+), 11 deletions(-)

diff --git a/lib/PublicInbox/View.pm b/lib/PublicInbox/View.pm
index 6aa199e..1eb12a9 100644
--- a/lib/PublicInbox/View.pm
+++ b/lib/PublicInbox/View.pm
@@ -144,7 +144,7 @@ sub emit_thread_html {
 	my $msgs = load_results($res);
 	my $nr = scalar @$msgs;
 	return missing_thread($cb) if $nr == 0;
-	my $fh = $cb->([200,['Content-Type'=>'text/html; charset=UTF-8']]);
+	my $orig_cb = $cb;
 	my $th = thread_results($msgs);
 	my $state = {
 		ctx => $ctx,
@@ -155,18 +155,23 @@ sub emit_thread_html {
 	{
 		require PublicInbox::GitCatFile;
 		my $git = PublicInbox::GitCatFile->new($ctx->{git_dir});
-		thread_entry($fh, $git, $state, $_, 0) for $th->rootset;
+		thread_entry(\$cb, $git, $state, $_, 0) for $th->rootset;
 	}
 	Email::Address->purge_cache;
+
+	# there could be a race due to a message being deleted in git
+	# but still being in the Xapian index:
+	return missing_thread($cb) if ($orig_cb eq $cb);
+
 	my $final_anchor = $state->{anchor_idx};
 	my $next = "<a\nid=\"s$final_anchor\">";
 	$next .= $final_anchor == 1 ? 'only message in' : 'end of';
 	$next .= " thread</a>, back to <a\nhref=\"../../\">index</a>\n";
 	$next .= "download thread: <a\nhref=\"../t.mbox.gz\">mbox.gz</a>";
 	$next .= " / follow: <a\nhref=\"../t.atom\">Atom feed</a>\n\n";
-	$fh->write("<hr />" . PRE_WRAP . $next . $foot .
+	$cb->write("<hr />" . PRE_WRAP . $next . $foot .
 		   "</pre></body></html>");
-	$fh->close;
+	$cb->close;
 }
 
 sub index_walk {
@@ -536,14 +541,16 @@ sub anchor_for {
 }
 
 sub thread_html_head {
-	my ($mime) = @_;
+	my ($cb, $mime) = @_;
+	$$cb = $$cb->([200, ['Content-Type'=> 'text/html; charset=UTF-8']]);
+
 	my $s = PublicInbox::Hval->new_oneline($mime->header('Subject'));
 	$s = $s->as_html;
-	"<html><head><title>$s</title></head><body>";
+	$$cb->write("<html><head><title>$s</title></head><body>");
 }
 
 sub thread_entry {
-	my ($fh, $git, $state, $node, $level) = @_;
+	my ($cb, $git, $state, $node, $level) = @_;
 	return unless $node;
 	if (my $mime = $node->message) {
 
@@ -552,13 +559,13 @@ sub thread_entry {
 		$mime = eval { Email::MIME->new($git->cat_file("HEAD:$path")) };
 		if ($mime) {
 			if ($state->{anchor_idx} == 0) {
-				$fh->write(thread_html_head($mime));
+				thread_html_head($cb, $mime);
 			}
-			index_entry($fh, $mime, $level, $state);
+			index_entry($$cb, $mime, $level, $state);
 		}
 	}
-	thread_entry($fh, $git, $state, $node->child, $level + 1);
-	thread_entry($fh, $git, $state, $node->next, $level);
+	thread_entry($cb, $git, $state, $node->child, $level + 1);
+	thread_entry($cb, $git, $state, $node->next, $level);
 }
 
 sub load_results {
-- 
EW



* [PATCH 2/7] view: optional flat view for recent messages
  2015-09-02  6:59 [PATCH 0/7] improved thread views and 404 reductions Eric Wong
  2015-09-02  6:59 ` [PATCH 1/7] view: close possible race condition in thread view Eric Wong
@ 2015-09-02  6:59 ` Eric Wong
  2015-09-02  6:59 ` [PATCH 3/7] view: account for missing In-Reply-To header Eric Wong
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: Eric Wong @ 2015-09-02  6:59 UTC (permalink / raw)
  To: meta

For still-active threads, it will likely be easier to follow
them chronologically, especially if we have links to parent
messages.
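A rough sketch contrasting the two orderings, assuming each message is
a hash with hypothetical `mid`, `irt` (parent) and `ts` (timestamp)
fields.  The flat order sorts newest first, following the cover
letter's description; the threaded order is a depth-first walk, roughly
what thread_entry does over the thread tree:

```perl
use strict;
use warnings;

# Depth-first walk: each message is followed by its replies.
sub _walk {
    my ($kids, $m, $level, $out) = @_;
    push @$out, [$m->{mid}, $level];
    _walk($kids, $_, $level + 1, $out) for @{ $kids->{$m->{mid}} || [] };
    return;
}

sub threaded_order {
    my (@msgs) = @_;
    my %have;
    $have{$_->{mid}} = 1 for @msgs;
    my (%kids, @roots);
    for my $m (sort { $a->{ts} <=> $b->{ts} } @msgs) {
        if (defined $m->{irt} && $have{$m->{irt}}) {
            push @{ $kids{$m->{irt}} }, $m;
        } else {
            push @roots, $m;    # parent unknown or not on this page
        }
    }
    my @out;
    _walk(\%kids, $_, 0, \@out) for @roots;
    return @out;
}

# Flat: purely by time, newest first, with no nesting.
sub flat_order {
    my (@msgs) = @_;
    return map { [$_->{mid}, 0] } sort { $b->{ts} <=> $a->{ts} } @msgs;
}
```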
---
 lib/PublicInbox/View.pm | 67 ++++++++++++++++++++++++++++++++++---------------
 lib/PublicInbox/WWW.pm  |  3 +++
 2 files changed, 50 insertions(+), 20 deletions(-)

diff --git a/lib/PublicInbox/View.pm b/lib/PublicInbox/View.pm
index 1eb12a9..a3df319 100644
--- a/lib/PublicInbox/View.pm
+++ b/lib/PublicInbox/View.pm
@@ -81,7 +81,8 @@ sub index_entry {
 		$anchor = $seen->{$anchor_idx};
 	}
 	if ($srch) {
-		$subj = "<a\nhref=\"${path}$href/t/#u\">$subj</a>";
+		my $t = $ctx->{flat} ? 'T' : 't';
+		$subj = "<a\nhref=\"${path}$href/$t/#u\">$subj</a>";
 	}
 	if ($root_anchor && $root_anchor eq $id) {
 		$subj = "<u\nid=\"u\">$subj</u>";
@@ -103,7 +104,9 @@ sub index_entry {
 
 	my ($fhref, $more_ref);
 	my $mhref = "${path}$href/";
-	if ($level > 0) {
+
+	# show full messages at level == 0 in threaded view
+	if ($level > 0 || ($ctx->{flat} && $root_anchor ne $id)) {
 		$fhref = "${path}$href/f/";
 		$more_ref = \$more;
 	}
@@ -126,6 +129,15 @@ sub index_entry {
 		}
 		$rv .= " <a\nhref=\"$anchor\">parent</a>";
 	}
+	if ($srch) {
+		if ($ctx->{flat}) {
+			$rv .= " [<a\nhref=\"${path}$href/t/#u\">threaded</a>" .
+				"|<b>flat</b>]";
+		} else {
+			$rv .= " [<b>threaded</b>|" .
+				"<a\nhref=\"${path}$href/T/#u\">flat</a>]";
+		}
+	}
 
 	$fh->write($rv .= '</pre></td></tr></table>');
 }
@@ -144,19 +156,24 @@ sub emit_thread_html {
 	my $msgs = load_results($res);
 	my $nr = scalar @$msgs;
 	return missing_thread($cb) if $nr == 0;
+	my $flat = $ctx->{flat};
 	my $orig_cb = $cb;
-	my $th = thread_results($msgs);
 	my $state = {
 		ctx => $ctx,
 		seen => {},
 		root_anchor => anchor_for($mid),
 		anchor_idx => 0,
 	};
-	{
-		require PublicInbox::GitCatFile;
-		my $git = PublicInbox::GitCatFile->new($ctx->{git_dir});
+
+	require PublicInbox::GitCatFile;
+	my $git = PublicInbox::GitCatFile->new($ctx->{git_dir});
+	if ($flat) {
+		__thread_entry(\$cb, $git, $state, $_, 0) for (@$msgs);
+	} else {
+		my $th = thread_results($msgs);
 		thread_entry(\$cb, $git, $state, $_, 0) for $th->rootset;
 	}
+	$git = undef;
 	Email::Address->purge_cache;
 
 	# there could be a race due to a message being deleted in git
@@ -166,10 +183,15 @@ sub emit_thread_html {
 	my $final_anchor = $state->{anchor_idx};
 	my $next = "<a\nid=\"s$final_anchor\">";
 	$next .= $final_anchor == 1 ? 'only message in' : 'end of';
-	$next .= " thread</a>, back to <a\nhref=\"../../\">index</a>\n";
-	$next .= "download thread: <a\nhref=\"../t.mbox.gz\">mbox.gz</a>";
-	$next .= " / follow: <a\nhref=\"../t.atom\">Atom feed</a>\n\n";
-	$cb->write("<hr />" . PRE_WRAP . $next . $foot .
+	$next .= " thread</a>, back to <a\nhref=\"../../\">index</a>";
+	if ($flat) {
+		$next .= " [<a\nhref=\"../t/#u\">threaded</a>|<b>flat</b>]";
+	} else {
+		$next .= " [<b>threaded</b>|<a\nhref=\"../T/#u\">flat</a>]";
+	}
+	$next .= "\ndownload thread: <a\nhref=\"../t.mbox.gz\">mbox.gz</a>";
+	$next .= " / follow: <a\nhref=\"../t.atom\">Atom feed</a>";
+	$cb->write("<hr />" . PRE_WRAP . $next . "\n\n". $foot .
 		   "</pre></body></html>");
 	$cb->close;
 }
@@ -549,20 +571,25 @@ sub thread_html_head {
 	$$cb->write("<html><head><title>$s</title></head><body>");
 }
 
+sub __thread_entry {
+	my ($cb, $git, $state, $mime, $level) = @_;
+
+	# lazy load the full message from mini_mime:
+	my $path = mid2path(mid_clean($mime->header('Message-ID')));
+	$mime = eval { Email::MIME->new($git->cat_file("HEAD:$path")) };
+	if ($mime) {
+		if ($state->{anchor_idx} == 0) {
+			thread_html_head($cb, $mime);
+		}
+		index_entry($$cb, $mime, $level, $state);
+	}
+}
+
 sub thread_entry {
 	my ($cb, $git, $state, $node, $level) = @_;
 	return unless $node;
 	if (my $mime = $node->message) {
-
-		# lazy load the full message from mini_mime:
-		my $path = mid2path(mid_clean($mime->header('Message-ID')));
-		$mime = eval { Email::MIME->new($git->cat_file("HEAD:$path")) };
-		if ($mime) {
-			if ($state->{anchor_idx} == 0) {
-				thread_html_head($cb, $mime);
-			}
-			index_entry($$cb, $mime, $level, $state);
-		}
+		__thread_entry($cb, $git, $state, $mime, $level);
 	}
 	thread_entry($cb, $git, $state, $node->child, $level + 1);
 	thread_entry($cb, $git, $state, $node->next, $level);
diff --git a/lib/PublicInbox/WWW.pm b/lib/PublicInbox/WWW.pm
index d666a1b..9ae7f7b 100644
--- a/lib/PublicInbox/WWW.pm
+++ b/lib/PublicInbox/WWW.pm
@@ -46,6 +46,9 @@ sub run {
 		invalid_list_mid(\%ctx, $1, $2) || get_thread_mbox(\%ctx, $sfx);
 	} elsif ($path_info =~ m!$LISTNAME_RE/$MID_RE/t\.atom\z!o) {
 		invalid_list_mid(\%ctx, $1, $2) || get_thread_atom(\%ctx);
+	} elsif ($path_info =~ m!$LISTNAME_RE/$MID_RE/T/\z!o) {
+		$ctx{flat} = 1;
+		invalid_list_mid(\%ctx, $1, $2) || get_thread(\%ctx);
 
 	# single-message pages
 	} elsif ($path_info =~ m!$LISTNAME_RE/$MID_RE/\z!o) {
-- 
EW



* [PATCH 3/7] view: account for missing In-Reply-To header
  2015-09-02  6:59 [PATCH 0/7] improved thread views and 404 reductions Eric Wong
  2015-09-02  6:59 ` [PATCH 1/7] view: close possible race condition in thread view Eric Wong
  2015-09-02  6:59 ` [PATCH 2/7] view: optional flat view for recent messages Eric Wong
@ 2015-09-02  6:59 ` Eric Wong
  2015-09-02  6:59 ` [PATCH 4/7] view: simplify parent anchoring code Eric Wong
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: Eric Wong @ 2015-09-02  6:59 UTC (permalink / raw)
  To: meta

Some mail clients do not generate In-Reply-To headers,
but do generate a proper References header, so fall back
to the last Message-ID in References when In-Reply-To is
missing.

This matches the behavior of Mail::Thread as well
as our SearchIdx code for linking threads in the Xapian DB.
---
 lib/PublicInbox/View.pm | 22 ++++++++++++++++++----
 1 file changed, 18 insertions(+), 4 deletions(-)

diff --git a/lib/PublicInbox/View.pm b/lib/PublicInbox/View.pm
index a3df319..d213124 100644
--- a/lib/PublicInbox/View.pm
+++ b/lib/PublicInbox/View.pm
@@ -46,6 +46,19 @@ sub feed_entry {
 	PRE_WRAP . multipart_text_as_html($mime, $full_pfx) . '</pre>';
 }
 
+sub in_reply_to {
+	my ($header_obj) = @_;
+	my $irt = $header_obj->header('In-Reply-To');
+
+	return mid_clean($irt) if (defined $irt);
+
+	my $refs = $header_obj->header('References');
+	if ($refs && $refs =~ /<([^>]+)>\s*\z/s) {
+		return $1;
+	}
+	undef;
+}
+
 # this is already inside a <pre>
 sub index_entry {
 	my ($fh, $mime, $level, $state) = @_;
@@ -74,7 +87,8 @@ sub index_entry {
 	my $root_anchor = $state->{root_anchor};
 	my $path = $root_anchor ? '../../' : '';
 	my $href = $mid->as_href;
-	my $irt = $header_obj->header('In-Reply-To');
+	my $irt = in_reply_to($header_obj);
+
 	my ($anchor_idx, $anchor);
 	if (defined $irt) {
 		$anchor_idx = anchor_for($irt);
@@ -463,7 +477,7 @@ sub _parent_headers_nosrch {
 	my ($header_obj) = @_;
 	my $rv = '';
 
-	my $irt = $header_obj->header('In-Reply-To');
+	my $irt = in_reply_to($header_obj);
 	if (defined $irt) {
 		my $v = PublicInbox::Hval->new_msgid($irt);
 		my $html = $v->as_html;
@@ -476,7 +490,7 @@ sub _parent_headers_nosrch {
 	if ($refs) {
 		# avoid redundant URLs wasting bandwidth
 		my %seen;
-		$seen{mid_clean($irt)} = 1 if defined $irt;
+		$seen{$irt} = 1 if defined $irt;
 		my @refs;
 		my @raw_refs = ($refs =~ /<([^>]+)>/g);
 		foreach my $ref (@raw_refs) {
@@ -526,7 +540,7 @@ sub html_footer {
 	my $idx = $standalone ? " <a\nhref=\"$upfx\">index</a>" : '';
 	if ($idx && $srch) {
 		my $next = thread_inline(\$idx, $ctx, $mime, $full_pfx);
-		$irt = $mime->header('In-Reply-To');
+		$irt = in_reply_to($mime->header_obj);
 		if (defined $irt) {
 			$irt = PublicInbox::Hval->new_msgid($irt);
 			$irt = $irt->as_href;
-- 
EW



* [PATCH 4/7] view: simplify parent anchoring code
  2015-09-02  6:59 [PATCH 0/7] improved thread views and 404 reductions Eric Wong
                   ` (2 preceding siblings ...)
  2015-09-02  6:59 ` [PATCH 3/7] view: account for missing In-Reply-To header Eric Wong
@ 2015-09-02  6:59 ` Eric Wong
  2015-09-02  6:59 ` [PATCH 5/7] view: pre-anchor entries for flat view Eric Wong
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: Eric Wong @ 2015-09-02  6:59 UTC (permalink / raw)
  To: meta

This will make it easier for the next commit to pre-populate
the `$seen' hash for linking within the flat view of a thread.
---
 lib/PublicInbox/View.pm | 15 +++++----------
 1 file changed, 5 insertions(+), 10 deletions(-)

diff --git a/lib/PublicInbox/View.pm b/lib/PublicInbox/View.pm
index d213124..0331b62 100644
--- a/lib/PublicInbox/View.pm
+++ b/lib/PublicInbox/View.pm
@@ -74,7 +74,7 @@ sub index_entry {
 	my $mid_raw = $header_obj->header('Message-ID');
 	my $id = anchor_for($mid_raw);
 	my $seen = $state->{seen};
-	$seen->{$id} = "#$id"; # save the anchor for later
+	$seen->{$id} = "#$id"; # save the anchor for children, later
 
 	my $mid = PublicInbox::Hval->new_msgid($mid_raw);
 	my $from = PublicInbox::Hval->new_oneline($mime->header('From'))->raw;
@@ -88,12 +88,8 @@ sub index_entry {
 	my $path = $root_anchor ? '../../' : '';
 	my $href = $mid->as_href;
 	my $irt = in_reply_to($header_obj);
+	my $parent_anchor = $seen->{anchor_for($irt)} if defined $irt;
 
-	my ($anchor_idx, $anchor);
-	if (defined $irt) {
-		$anchor_idx = anchor_for($irt);
-		$anchor = $seen->{$anchor_idx};
-	}
 	if ($srch) {
 		my $t = $ctx->{flat} ? 'T' : 't';
 		$subj = "<a\nhref=\"${path}$href/$t/#u\">$subj</a>";
@@ -135,13 +131,12 @@ sub index_entry {
 	$rv .= html_footer($mime, 0, undef, $ctx);
 
 	if (defined $irt) {
-		unless (defined $anchor) {
+		unless (defined $parent_anchor) {
 			my $v = PublicInbox::Hval->new_msgid($irt);
 			$v = $v->as_href;
-			$anchor = "${path}$v/";
-			$seen->{$anchor_idx} = $anchor;
+			$parent_anchor = "${path}$v/";
 		}
-		$rv .= " <a\nhref=\"$anchor\">parent</a>";
+		$rv .= " <a\nhref=\"$parent_anchor\">parent</a>";
 	}
 	if ($srch) {
 		if ($ctx->{flat}) {
-- 
EW



* [PATCH 5/7] view: pre-anchor entries for flat view
  2015-09-02  6:59 [PATCH 0/7] improved thread views and 404 reductions Eric Wong
                   ` (3 preceding siblings ...)
  2015-09-02  6:59 ` [PATCH 4/7] view: simplify parent anchoring code Eric Wong
@ 2015-09-02  6:59 ` Eric Wong
  2015-09-02  6:59 ` [PATCH 6/7] view: avoid links to unknown compressed Message-IDs Eric Wong
  2015-09-02  6:59 ` [PATCH 7/7] implement external Message-ID finder Eric Wong
  6 siblings, 0 replies; 8+ messages in thread
From: Eric Wong @ 2015-09-02  6:59 UTC (permalink / raw)
  To: meta

This will allow users to navigate the flat view without making extra
HTTP requests.
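A sketch of the two-pass approach, assuming hypothetical message hashes
with `mid` and `irt` fields and a simplified `anchor_for()` that
differs from View.pm's.  Because the flat view can list a reply before
its parent, every anchor is recorded before any parent link is made:

```perl
use strict;
use warnings;

# Hypothetical anchor scheme, not the one View.pm uses.
sub anchor_for {
    my ($mid) = @_;
    (my $id = $mid) =~ s/\W/_/g;
    return "m$id";
}

sub parent_links {
    my (@msgs) = @_;
    my %seen;
    # pass 1: every message on this page gets an in-page anchor
    $seen{ anchor_for($_->{mid}) } = '#' . anchor_for($_->{mid}) for @msgs;
    # pass 2: link parents in-page when possible, else to their own page
    my @links;
    for my $m (@msgs) {
        next unless defined $m->{irt};
        my $href = $seen{ anchor_for($m->{irt}) } // "../$m->{irt}/";
        push @links, [$m->{mid}, $href];
    }
    return @links;
}
```

Real code would also escape the fallback link before putting it in
HTML.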
---
 lib/PublicInbox/View.pm | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/lib/PublicInbox/View.pm b/lib/PublicInbox/View.pm
index 0331b62..98fc133 100644
--- a/lib/PublicInbox/View.pm
+++ b/lib/PublicInbox/View.pm
@@ -167,9 +167,10 @@ sub emit_thread_html {
 	return missing_thread($cb) if $nr == 0;
 	my $flat = $ctx->{flat};
 	my $orig_cb = $cb;
+	my $seen = {};
 	my $state = {
 		ctx => $ctx,
-		seen => {},
+		seen => $seen,
 		root_anchor => anchor_for($mid),
 		anchor_idx => 0,
 	};
@@ -177,6 +178,7 @@ sub emit_thread_html {
 	require PublicInbox::GitCatFile;
 	my $git = PublicInbox::GitCatFile->new($ctx->{git_dir});
 	if ($flat) {
+		pre_anchor_entry($seen, $_) for (@$msgs);
 		__thread_entry(\$cb, $git, $state, $_, 0) for (@$msgs);
 	} else {
 		my $th = thread_results($msgs);
@@ -580,6 +582,12 @@ sub thread_html_head {
 	$$cb->write("<html><head><title>$s</title></head><body>");
 }
 
+sub pre_anchor_entry {
+	my ($seen, $mime) = @_;
+	my $id = anchor_for($mime->header('Message-ID'));
+	$seen->{$id} = "#$id"; # save the anchor for children, later
+}
+
 sub __thread_entry {
 	my ($cb, $git, $state, $mime, $level) = @_;
 
-- 
EW



* [PATCH 6/7] view: avoid links to unknown compressed Message-IDs
  2015-09-02  6:59 [PATCH 0/7] improved thread views and 404 reductions Eric Wong
                   ` (4 preceding siblings ...)
  2015-09-02  6:59 ` [PATCH 5/7] view: pre-anchor entries for flat view Eric Wong
@ 2015-09-02  6:59 ` Eric Wong
  2015-09-02  6:59 ` [PATCH 7/7] implement external Message-ID finder Eric Wong
  6 siblings, 0 replies; 8+ messages in thread
From: Eric Wong @ 2015-09-02  6:59 UTC (permalink / raw)
  To: meta

Compressed Message-IDs are irreversible and may not be recognized
by other sites, so avoid compressing Message-IDs we do not know
about.  This gives users a chance of finding the message in other
archives by doing a Message-ID lookup.
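A sketch of why compression loses information, assuming a hypothetical
rule where IDs longer than 40 bytes are replaced by their SHA-1 hex
digest (the real mid_compress() rule may differ).  The optional flag
mirrors the `$no_compress` argument this patch adds to new_msgid():

```perl
use strict;
use warnings;
use Digest::SHA qw(sha1_hex);

# Hypothetical rule: long IDs become a one-way SHA-1 digest.
sub mid_compress {
    my ($mid) = @_;
    return length($mid) > 40 ? sha1_hex($mid) : $mid;
}

# Unknown IDs stay readable so other archives can look them up.
sub mid_for_link {
    my ($mid, $no_compress) = @_;
    return $no_compress ? $mid : mid_compress($mid);
}
```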
---
 lib/PublicInbox/Hval.pm |  4 ++--
 lib/PublicInbox/View.pm | 33 +++++++++++++++++++--------------
 2 files changed, 21 insertions(+), 16 deletions(-)

diff --git a/lib/PublicInbox/Hval.pm b/lib/PublicInbox/Hval.pm
index 21efe40..0445e57 100644
--- a/lib/PublicInbox/Hval.pm
+++ b/lib/PublicInbox/Hval.pm
@@ -25,9 +25,9 @@ sub new {
 }
 
 sub new_msgid {
-	my ($class, $msgid) = @_;
+	my ($class, $msgid, $no_compress) = @_;
 	$msgid = mid_clean($msgid);
-	$class->new($msgid, mid_compress($msgid));
+	$class->new($msgid, $no_compress ? $msgid : mid_compress($msgid));
 }
 
 sub new_oneline {
diff --git a/lib/PublicInbox/View.pm b/lib/PublicInbox/View.pm
index 98fc133..1528a87 100644
--- a/lib/PublicInbox/View.pm
+++ b/lib/PublicInbox/View.pm
@@ -132,7 +132,7 @@ sub index_entry {
 
 	if (defined $irt) {
 		unless (defined $parent_anchor) {
-			my $v = PublicInbox::Hval->new_msgid($irt);
+			my $v = PublicInbox::Hval->new_msgid($irt, 1);
 			$v = $v->as_href;
 			$parent_anchor = "${path}$v/";
 		}
@@ -452,22 +452,25 @@ sub thread_inline {
 
 	if ($nr <= 1) {
 		$$dst .= "\n[no followups, yet]\n";
-		return;
+		return (undef, in_reply_to($cur));
 	}
 	my $upfx = $full_pfx ? '' : '../';
 
 	$$dst .= "\n\n~$nr messages in thread: ".
 		 "(<a\nhref=\"${upfx}t/#u\">expand</a>)\n";
 	my $subj = $srch->subject_path($cur->header('Subject'));
+	my $parent = in_reply_to($cur);
 	my $state = {
 		seen => { $subj => 1 },
 		srch => $srch,
 		cur => $mid,
+		parent_cmp => $parent ? mid_compress($parent) : '',
+		parent => $parent,
 	};
 	for (thread_results(load_results($res))->rootset) {
 		inline_dump($dst, $state, $upfx, $_, 0);
 	}
-	$state->{next_msg};
+	($state->{next_msg}, $state->{parent});
 }
 
 sub _parent_headers_nosrch {
@@ -476,7 +479,7 @@ sub _parent_headers_nosrch {
 
 	my $irt = in_reply_to($header_obj);
 	if (defined $irt) {
-		my $v = PublicInbox::Hval->new_msgid($irt);
+		my $v = PublicInbox::Hval->new_msgid($irt, 1);
 		my $html = $v->as_html;
 		my $href = $v->as_href;
 		$rv .= "In-Reply-To: &lt;";
@@ -493,7 +496,7 @@ sub _parent_headers_nosrch {
 		foreach my $ref (@raw_refs) {
 			next if $seen{$ref};
 			$seen{$ref} = 1;
-			push @refs, linkify_ref($ref);
+			push @refs, linkify_ref_nosrch($ref);
 		}
 
 		if (@refs) {
@@ -536,12 +539,11 @@ sub html_footer {
 	my $upfx = $full_pfx ? '../' : '../../';
 	my $idx = $standalone ? " <a\nhref=\"$upfx\">index</a>" : '';
 	if ($idx && $srch) {
-		my $next = thread_inline(\$idx, $ctx, $mime, $full_pfx);
-		$irt = in_reply_to($mime->header_obj);
-		if (defined $irt) {
-			$irt = PublicInbox::Hval->new_msgid($irt);
-			$irt = $irt->as_href;
-			$irt = "<a\nhref=\"$upfx$irt/\">parent</a> ";
+		my ($next, $p) = thread_inline(\$idx, $ctx, $mime, $full_pfx);
+		if (defined $p) {
+			$p = PublicInbox::Hval->new_oneline($p);
+			$p = $p->as_href;
+			$irt = "<a\nhref=\"$upfx$p/\">parent</a> ";
 		} else {
 			$irt = ' ' x length('parent ');
 		}
@@ -557,8 +559,8 @@ sub html_footer {
 	"$irt<a\nhref=\"" . ascii_html($href) . '">reply</a>' . $idx;
 }
 
-sub linkify_ref {
-	my $v = PublicInbox::Hval->new_msgid($_[0]);
+sub linkify_ref_nosrch {
+	my $v = PublicInbox::Hval->new_msgid($_[0], 1);
 	my $html = $v->as_html;
 	my $href = $v->as_href;
 	"&lt;<a\nhref=\"../$href/\">$html</a>&gt;";
@@ -699,8 +701,11 @@ sub _inline_header {
 sub inline_dump {
 	my ($dst, $state, $upfx, $node, $level) = @_;
 	return unless $node;
-	return if $state->{stopped};
 	if (my $mime = $node->message) {
+		my $mid = mid_clean($mime->header('Message-ID'));
+		if ($mid eq $state->{parent_cmp}) {
+			$state->{parent} = $mid;
+		}
 		_inline_header($dst, $state, $upfx, $mime, $level);
 	}
 	inline_dump($dst, $state, $upfx, $node->child, $level+1);
-- 
EW



* [PATCH 7/7] implement external Message-ID finder
  2015-09-02  6:59 [PATCH 0/7] improved thread views and 404 reductions Eric Wong
                   ` (5 preceding siblings ...)
  2015-09-02  6:59 ` [PATCH 6/7] view: avoid links to unknown compressed Message-IDs Eric Wong
@ 2015-09-02  6:59 ` Eric Wong
  6 siblings, 0 replies; 8+ messages in thread
From: Eric Wong @ 2015-09-02  6:59 UTC (permalink / raw)
  To: meta

Currently, this looks at other public-inbox configurations
served in the same process.  In the future, it will generate
links to other Message-ID lookup endpoints.
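A simplified sketch of the config scan in ExtMsg::ext_msg, assuming a
hypothetical `$lookup` callback in place of the Xapian query and the
`git cat-file -t` fallback; it also skips the URI escaping r302() does:

```perl
use strict;
use warnings;

# $lookup->($git_dir, $mid) is hypothetical: true if the repo has $mid.
sub find_ext_url {
    my ($config, $listname, $mid, $lookup) = @_;
    for my $k (sort keys %$config) {
        $k =~ /\Apublicinbox\.([A-Za-z0-9-]+)\.url\z/ or next;
        my $list = $1;
        next if $list eq $listname;    # already searched this one
        my $git_dir = $config->{"publicinbox.$list.mainrepo"} or next;
        (my $url = $config->{$k}) =~ s!/+\z!!;
        return "$url/$mid/" if $lookup->($git_dir, $mid);
    }
    return undef;    # caller answers with a 404
}
```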
---
 lib/PublicInbox/ExtMsg.pm | 92 +++++++++++++++++++++++++++++++++++++++++++++++
 lib/PublicInbox/View.pm   | 14 ++++----
 lib/PublicInbox/WWW.pm    | 15 +++++---
 public-inbox.cgi          |  1 +
 4 files changed, 110 insertions(+), 12 deletions(-)
 create mode 100644 lib/PublicInbox/ExtMsg.pm

diff --git a/lib/PublicInbox/ExtMsg.pm b/lib/PublicInbox/ExtMsg.pm
new file mode 100644
index 0000000..1c0887c
--- /dev/null
+++ b/lib/PublicInbox/ExtMsg.pm
@@ -0,0 +1,92 @@
+# Copyright (C) 2015 all contributors <meta@public-inbox.org>
+# License: AGPLv3 or later (https://www.gnu.org/licenses/agpl-3.0.txt)
+package PublicInbox::ExtMsg;
+use strict;
+use warnings;
+use URI::Escape qw(uri_escape_utf8);
+use PublicInbox::Hval;
+use PublicInbox::MID qw/mid_compress mid2path/;
+
+sub ext_msg {
+	my ($ctx) = @_;
+	my $pi_config = $ctx->{pi_config};
+	my $listname = $ctx->{listname};
+	my $mid = $ctx->{mid};
+	my $cmid = mid_compress($mid);
+
+	eval { require PublicInbox::Search };
+	my $have_xap = $@ ? 0 : 1;
+	my @nox;
+
+	foreach my $k (keys %$pi_config) {
+		$k =~ /\Apublicinbox\.([A-Z0-9a-z-]+)\.url\z/ or next;
+		my $list = $1;
+		next if $list eq $listname;
+
+		my $git_dir = $pi_config->{"publicinbox.$list.mainrepo"};
+		defined $git_dir or next;
+
+		my $url = $pi_config->{"publicinbox.$list.url"};
+		defined $url or next;
+
+		$url =~ s!/+\z!!;
+
+		# try to find the URL with Xapian to avoid forking
+		if ($have_xap) {
+			my $doc_id = eval {
+				my $s = PublicInbox::Search->new($git_dir);
+				$s->find_unique_doc_id('mid', $cmid);
+			};
+			if ($@) {
+				# xapian not configured for this repo
+			} else {
+				# maybe we found it!
+				return r302($url, $cmid) if (defined $doc_id);
+
+				# no point in trying the fork fallback if we
+				# know Xapian is up-to-date but missing the
+				# message in the current repo
+				next;
+			}
+		}
+
+		# queue up for forking after we've tried Xapian on all of them
+		push @nox, { git_dir => $git_dir, url => $url };
+	}
+
+	# Xapian not installed or configured for some repos
+	my $path = "HEAD:" . mid2path($cmid);
+
+	foreach my $n (@nox) {
+		my @cmd = ('git', "--git-dir=$n->{git_dir}", 'cat-file',
+			   '-t', $path);
+		my $pid = open my $fh, '-|';
+		defined $pid or die "fork failed: $!\n";
+
+		if ($pid == 0) {
+			open STDERR, '>', '/dev/null'; # ignore errors
+			exec @cmd or die "exec failed: $!\n";
+		} else {
+			my $type = eval { local $/; <$fh> };
+			close $fh;
+			if ($? == 0 && $type eq "blob\n") {
+				return r302($n->{url}, $cmid);
+			}
+		}
+	}
+
+	# Fall back to external repos
+
+	[404, ['Content-Type'=>'text/plain'], ['Not found']];
+}
+
+# Redirect to another public-inbox which is mapped by $pi_config
+sub r302 {
+	my ($url, $mid) = @_;
+	$url .= '/' . uri_escape_utf8($mid) . '/';
+	[ 302,
+	  [ 'Location' => $url, 'Content-Type' => 'text/plain' ],
+	  [ "Redirecting to\n$url\n" ] ]
+}
+
+1;
diff --git a/lib/PublicInbox/View.pm b/lib/PublicInbox/View.pm
index 1528a87..e18895f 100644
--- a/lib/PublicInbox/View.pm
+++ b/lib/PublicInbox/View.pm
@@ -164,7 +164,7 @@ sub emit_thread_html {
 	my $res = $srch->get_thread($mid);
 	my $msgs = load_results($res);
 	my $nr = scalar @$msgs;
-	return missing_thread($cb) if $nr == 0;
+	return missing_thread($cb, $ctx) if $nr == 0;
 	my $flat = $ctx->{flat};
 	my $orig_cb = $cb;
 	my $seen = {};
@@ -189,7 +189,7 @@ sub emit_thread_html {
 
 	# there could be a race due to a message being deleted in git
 	# but still being in the Xapian index:
-	return missing_thread($cb) if ($orig_cb eq $cb);
+	return missing_thread($cb, $ctx) if ($orig_cb eq $cb);
 
 	my $final_anchor = $state->{anchor_idx};
 	my $next = "<a\nid=\"s$final_anchor\">";
@@ -637,12 +637,10 @@ sub thread_results {
 }
 
 sub missing_thread {
-	my ($cb) = @_;
-	my $title = 'Thread does not exist';
-	$cb->([404, ['Content-Type' => 'text/html']])->write(<<EOF);
-<html><head><title>$title</title></head><body><pre>$title
-<a href="../../">Return to index</a></pre></body></html>
-EOF
+	my ($cb, $ctx) = @_;
+	require PublicInbox::ExtMsg;
+
+	$cb->(PublicInbox::ExtMsg::ext_msg($ctx))
 }
 
 sub _msg_date {
diff --git a/lib/PublicInbox/WWW.pm b/lib/PublicInbox/WWW.pm
index 9ae7f7b..16fd16a 100644
--- a/lib/PublicInbox/WWW.pm
+++ b/lib/PublicInbox/WWW.pm
@@ -88,7 +88,14 @@ sub preload {
 
 # private functions below
 
-sub r404 { r(404, 'Not Found') }
+sub r404 {
+	my ($ctx) = @_;
+	if ($ctx && $ctx->{mid}) {
+		require PublicInbox::ExtMsg;
+		return PublicInbox::ExtMsg::ext_msg($ctx);
+	}
+	r(404, 'Not Found');
+}
 
 # simple response for errors
 sub r { [ $_[0], ['Content-Type' => 'text/plain'], [ join(' ', @_, "\n") ] ] }
@@ -151,7 +158,7 @@ sub mid2blob {
 # /$LISTNAME/$MESSAGE_ID/raw                    -> raw mbox
 sub get_mid_txt {
 	my ($ctx) = @_;
-	my $x = mid2blob($ctx) or return r404();
+	my $x = mid2blob($ctx) or return r404($ctx);
 	require PublicInbox::Mbox;
 	PublicInbox::Mbox::emit1($x);
 }
@@ -159,7 +166,7 @@ sub get_mid_txt {
 # /$LISTNAME/$MESSAGE_ID/                   -> HTML content (short quotes)
 sub get_mid_html {
 	my ($ctx) = @_;
-	my $x = mid2blob($ctx) or return r404();
+	my $x = mid2blob($ctx) or return r404($ctx);
 
 	require PublicInbox::View;
 	my $foot = footer($ctx);
@@ -173,7 +180,7 @@ sub get_mid_html {
 # /$LISTNAME/$MESSAGE_ID/f/                   -> HTML content (fullquotes)
 sub get_full_html {
 	my ($ctx) = @_;
-	my $x = mid2blob($ctx) or return r404();
+	my $x = mid2blob($ctx) or return r404($ctx);
 
 	require PublicInbox::View;
 	my $foot = footer($ctx);
diff --git a/public-inbox.cgi b/public-inbox.cgi
index 75d510c..1fcc04f 100755
--- a/public-inbox.cgi
+++ b/public-inbox.cgi
@@ -18,6 +18,7 @@ BEGIN {
 	%HTTP_CODES = (
 		200 => 'OK',
 		301 => 'Moved Permanently',
+		302 => 'Found',
 		404 => 'Not Found',
 		405 => 'Method Not Allowed',
 		501 => 'Not Implemented',
-- 
EW



end of thread, other threads:[~2015-09-02  6:59 UTC | newest]

Thread overview: 8+ messages
2015-09-02  6:59 [PATCH 0/7] improved thread views and 404 reductions Eric Wong
2015-09-02  6:59 ` [PATCH 1/7] view: close possible race condition in thread view Eric Wong
2015-09-02  6:59 ` [PATCH 2/7] view: optional flat view for recent messages Eric Wong
2015-09-02  6:59 ` [PATCH 3/7] view: account for missing In-Reply-To header Eric Wong
2015-09-02  6:59 ` [PATCH 4/7] view: simplify parent anchoring code Eric Wong
2015-09-02  6:59 ` [PATCH 5/7] view: pre-anchor entries for flat view Eric Wong
2015-09-02  6:59 ` [PATCH 6/7] view: avoid links to unknown compressed Message-IDs Eric Wong
2015-09-02  6:59 ` [PATCH 7/7] implement external Message-ID finder Eric Wong

Code repositories for project(s) associated with this public inbox

	https://80x24.org/public-inbox.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).