user/dev discussion of public-inbox itself
* [PATCH 1/2] thread: prevent hidden threads in /$INBOX/ landing page
  @ 2018-04-25  8:52  5% ` Eric Wong
  0 siblings, 0 replies; 4+ results
From: Eric Wong @ 2018-04-25  8:52 UTC (permalink / raw)
  To: meta

In retrospect, the loop prevention done by our indexer is not
always sufficient, since it can encounter improperly sorted
or incomplete References headers.

This bug was triggered by multiple bracketed Message-IDs in an
In-Reply-To: header (not References), where the Message-IDs were
in non-chronological order because somebody tried to reply to
different leaves of a thread with a single message.

So we must check for descendants before blindly trying to
use the last reference as the parent.
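
A minimal sketch of the guard described above, assuming a
simplified container class. The `Node` package and the
`attach_checked` helper are hypothetical and only mirror the
shape of has_descendent/add_child in the diff below; they are not
the actual PublicInbox::SearchThread::Msg code.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a thread container (illustration only).
package Node;

sub new {
	my ($class, $id) = @_;
	bless { id => $id, parent => undef, children => {} }, $class;
}

# Walk upward from $child through its parents; true if $self is
# $child itself or one of its ancestors.
sub has_descendent {
	my ($self, $child) = @_;
	my %seen; # stops the walk if a cycle already exists
	while ($child) {
		return 1 if $self == $child || $seen{$child}++;
		$child = $child->{parent};
	}
	0;
}

# Re-parent $child under $self, detaching it from any old parent.
sub add_child {
	my ($self, $child) = @_;
	if (my $old = $child->{parent}) {
		delete $old->{children}->{$child->{id}};
	}
	$self->{children}->{$child->{id}} = $child;
	$child->{parent} = $self;
}

package main;

# Attach $this under $prev only when that cannot create a loop.
sub attach_checked {
	my ($prev, $this) = @_;
	return 0 unless defined $prev;
	return 0 if $this->has_descendent($prev); # would loop
	$prev->add_child($this);
	1;
}

my ($n1, $n2) = (Node->new('1'), Node->new('2'));
my $first = attach_checked($n1, $n2);  # 2 becomes a child of 1
my $second = attach_checked($n2, $n1); # refused: 1 is an ancestor of 2
```

Without the check, the second call would make each node the
parent of the other, and any walk over {parent} would never
terminate.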

Fixes: c6a8fdf71e2c336f ("thread: last Reference always wins")
---
 lib/PublicInbox/SearchThread.pm |  4 +-
 t/thread-cycle.t                | 88 ++++++++++++++++++++++++++++++-----------
 2 files changed, 68 insertions(+), 24 deletions(-)

diff --git a/lib/PublicInbox/SearchThread.pm b/lib/PublicInbox/SearchThread.pm
index 1d250b4..450a06f 100644
--- a/lib/PublicInbox/SearchThread.pm
+++ b/lib/PublicInbox/SearchThread.pm
@@ -76,7 +76,9 @@ sub _add_message ($$) {
 
 	# C. Set the parent of this message to be the last element in
 	# References.
-	$prev->add_child($this) if defined $prev;
+	if (defined $prev && !$this->has_descendent($prev)) { # would loop
+		$prev->add_child($this);
+	}
 }
 
 package PublicInbox::SearchThread::Msg;
diff --git a/t/thread-cycle.t b/t/thread-cycle.t
index 7d85909..9e915a1 100644
--- a/t/thread-cycle.t
+++ b/t/thread-cycle.t
@@ -11,18 +11,25 @@ my $mt = eval {
 	$Mail::Thread::nosubject = 1;
 	$Mail::Thread::noprune = 1;
 };
-my @check;
-my @msgs = map {
-	my $msg = $_;
-	$msg->{references} =~ s/\s+/ /sg if $msg->{references};
-	my $simple = Email::Simple->create(header => [
-		'Message-Id' => "<$msg->{mid}>",
-		'References' => $msg->{references},
-	]);
-	push @check, $simple;
-	bless $msg, 'PublicInbox::SearchMsg'
-} (
 
+sub make_objs {
+	my @simples;
+	my $n = 0;
+	my @msgs = map {
+		my $msg = $_;
+		$msg->{ds} ||= ++$n;
+		$msg->{references} =~ s/\s+/ /sg if $msg->{references};
+		my $simple = Email::Simple->create(header => [
+			'Message-ID' => "<$msg->{mid}>",
+			'References' => $msg->{references},
+		]);
+		push @simples, $simple;
+		bless $msg, 'PublicInbox::SearchMsg'
+	} @_;
+	(\@simples, \@msgs);
+}
+
+my ($simples, $smsgs) = make_objs(
 # data from t/testbox-6 in Mail::Thread 2.55:
 	{ mid => '20021124145312.GA1759@nlin.net' },
 	{ mid => 'slrnau448m.7l4.markj+0111@cloaked.freeserve.co.uk',
@@ -50,23 +57,42 @@ my @msgs = map {
 	}
 );
 
-my $st = thread_to_s(\@msgs);
+my $st = thread_to_s($smsgs);
 
 SKIP: {
 	skip 'Mail::Thread missing', 1 unless $mt;
-	$mt = Mail::Thread->new(@check);
-	$mt->thread;
-	$mt->order(sub { sort { $a->messageid cmp $b->messageid } @_ });
-	my $check = '';
+	check_mt($st, $simples, 'Mail::Thread output matches');
+}
 
-	my @q = map { (0, $_) } $mt->rootset;
-	while (@q) {
-		my $level = shift @q;
-		my $node = shift @q or next;
-		$check .= (" "x$level) . $node->messageid . "\n";
-		unshift @q, $level + 1, $node->child, $level, $node->next;
+my @backwards = (
+	{ mid => 1, references => '<2> <3> <4>' },
+	{ mid => 4, references => '<2> <3>' },
+	{ mid => 5, references => '<6> <7> <8> <3> <2>' },
+	{ mid => 9, references => '<6> <3>' },
+	{ mid => 10, references => '<8> <7> <6>' },
+	{ mid => 2, references => '<6> <7> <8> <3>' },
+	{ mid => 3, references => '<6> <7> <8>' },
+	{ mid => 6, references => '<8> <7>' },
+	{ mid => 7, references => '<8>' },
+	{ mid => 8, references => '' }
+);
+
+($simples, $smsgs) = make_objs(@backwards);
+my $backward = thread_to_s($smsgs);
+SKIP: {
+	skip 'Mail::Thread missing', 1 unless $mt;
+	check_mt($backward, $simples, 'matches Mail::Thread backwards');
+}
+($simples, $smsgs) = make_objs(reverse @backwards);
+my $forward = thread_to_s($smsgs);
+if ('Mail::Thread sorts by Date') {
+	SKIP: {
+		skip 'Mail::Thread missing', 1 unless $mt;
+		check_mt($forward, $simples, 'matches Mail::Thread forwards');
 	}
-	is($check, $st, 'Mail::Thread output matches');
+}
+unless ('sorting by Date') {
+	is("\n".$backward, "\n".$forward, 'forward and backward matches');
 }
 
 done_testing();
@@ -86,3 +112,19 @@ sub thread_to_s {
 	}
 	$st;
 }
+
+sub check_mt {
+	my ($st, $simples, $msg) = @_;
+	my $mt = Mail::Thread->new(@$simples);
+	$mt->thread;
+	$mt->order(sub { sort { $a->messageid cmp $b->messageid } @_ });
+	my $check = '';
+	my @q = map { (0, $_) } $mt->rootset;
+	while (@q) {
+		my $level = shift @q;
+		my $node = shift @q or next;
+		$check .= (" "x$level) . $node->messageid . "\n";
+		unshift @q, $level + 1, $node->child, $level, $node->next;
+	}
+	is("\n".$check, "\n".$st, $msg);
+}
-- 
EW



* [PATCH 6/7] thread: last Reference always wins
  2016-12-10  3:42  6% [PATCH 0/7] message threading fixes for WWW UI Eric Wong
@ 2016-12-10  3:43  7% ` Eric Wong
  2016-12-10  3:43  5% ` [PATCH 7/7] search: always sort thread results in ascending time order Eric Wong
  1 sibling, 0 replies; 4+ results
From: Eric Wong @ 2016-12-10  3:43 UTC (permalink / raw)
  To: meta

Since we use SearchMsg from Xapian data, we can be
assured we do not get a self-referential {references}
field.

However, we may need to be more careful when checking
has_descendent for loops, as blindly calling add_child
could open us up to that possibility...
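
As a rough illustration of "last Reference wins": the sketch
below extracts Message-IDs with the same regexp used in the
foreach loop of the diff and picks the final one as the parent.
The `last_ref` helper is a hypothetical name, not a function in
SearchThread.pm.

```perl
use strict;
use warnings;

# Return the last Message-ID found in a References-style string,
# or undef when there is none.  (Hypothetical helper.)
sub last_ref {
	my ($refs) = @_;
	return undef unless defined $refs;
	my @ids = ($refs =~ m/<([^>]+)>/g);
	return @ids ? $ids[-1] : undef;
}

my $parent = last_ref('<a@example> <b@example>   <c@example>');
my $none = last_ref('');
```

Earlier entries in the list only matter for filling in gaps left
by missing messages; the direct parent is taken from the end.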
---
 lib/PublicInbox/SearchThread.pm | 13 ++++++++-----
 t/thread-cycle.t                |  8 --------
 2 files changed, 8 insertions(+), 13 deletions(-)

diff --git a/lib/PublicInbox/SearchThread.pm b/lib/PublicInbox/SearchThread.pm
index ee35f0d..601a84b 100644
--- a/lib/PublicInbox/SearchThread.pm
+++ b/lib/PublicInbox/SearchThread.pm
@@ -52,6 +52,10 @@ sub _add_message ($$) {
 	# B. For each element in the message's References field:
 	defined(my $refs = $smsg->{references}) or return;
 
+	# This loop exists to help fill in gaps left from missing
+	# messages.  It is not needed in a perfect world where
+	# everything is perfectly referenced, only the last ref
+	# matters.
 	my $prev;
 	foreach my $ref ($refs =~ m/<([^>]+)>/g) {
 		# Find a Container object for the given Message-ID
@@ -74,10 +78,8 @@ sub _add_message ($$) {
 	}
 
 	# C. Set the parent of this message to be the last element in
-	# References...
-	if ($prev && !$this->has_descendent($prev)) { # would loop
-		$prev->add_child($this)
-	}
+	# References.
+	$prev->add_child($this) if defined $prev;
 }
 
 sub order {
@@ -127,8 +129,9 @@ sub add_child {
 
 sub has_descendent {
 	my ($self, $child) = @_;
+	my %seen; # loop prevention XXX may not be necessary
 	while ($child) {
-		return 1 if $self == $child;
+		return 1 if $self == $child || $seen{$child}++;
 		$child = $child->{parent};
 	}
 	0;
diff --git a/t/thread-cycle.t b/t/thread-cycle.t
index 0e1ecfe..b084449 100644
--- a/t/thread-cycle.t
+++ b/t/thread-cycle.t
@@ -70,14 +70,6 @@ SKIP: {
 	is($check, $st, 'Mail::Thread output matches');
 }
 
-@msgs = map { bless $_, 'PublicInbox::SearchMsg' } (
-	{ mid => 'a@b' },
-	{ mid => 'b@c', references => '<a@b> <b@c>' },
-	{ mid => 'd@e', references => '<d@e>' },
-);
-
-is(thread_to_s(\@msgs), "a\@b\n b\@c\nd\@e\n", 'ok with self-references');
-
 done_testing();
 
 sub thread_to_s {
-- 
EW



* [PATCH 7/7] search: always sort thread results in ascending time order
  2016-12-10  3:42  6% [PATCH 0/7] message threading fixes for WWW UI Eric Wong
  2016-12-10  3:43  7% ` [PATCH 6/7] thread: last Reference always wins Eric Wong
@ 2016-12-10  3:43  5% ` Eric Wong
  1 sibling, 0 replies; 4+ results
From: Eric Wong @ 2016-12-10  3:43 UTC (permalink / raw)
  To: meta

This makes life easier for the threading algorithm, as we can
use the implied ordering of timestamps to avoid temporary ghosts
and the resulting container vivification.

This would've also allowed us to hide the bug (in most cases)
fixed by the patch titled "thread: last Reference always wins",
in case that needs to be reverted due to infinite looping.
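
A small sketch of why ascending order helps, under the
assumption that a placeholder ("ghost") container has to be
created whenever a message references an ID that has not been
seen yet. The {ts} field and `count_ghosts` are hypothetical
names for this illustration.

```perl
use strict;
use warnings;

# Count the distinct IDs that would need a temporary ghost
# container during a single pass over @msgs in the given order.
sub count_ghosts {
	my (@msgs) = @_;
	my (%seen, %ghost);
	foreach my $m (@msgs) {
		foreach my $ref ($m->{references} =~ m/<([^>]+)>/g) {
			$ghost{$ref} = 1 unless $seen{$ref};
		}
		$seen{$m->{mid}} = 1;
	}
	scalar keys %ghost;
}

my @unsorted = (
	{ mid => 'c', ts => 3, references => '<a> <b>' },
	{ mid => 'b', ts => 2, references => '<a>' },
	{ mid => 'a', ts => 1, references => '' },
);
my @asc = sort { $a->{ts} <=> $b->{ts} } @unsorted;

my $ghosts_unsorted = count_ghosts(@unsorted); # 2 temporary ghosts
my $ghosts_asc = count_ghosts(@asc);           # 0
```

When parents are always processed before their replies, every
referenced container already exists by the time it is needed.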
---
 lib/PublicInbox/Mbox.pm   | 2 +-
 lib/PublicInbox/Search.pm | 5 +++++
 lib/PublicInbox/View.pm   | 2 +-
 3 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/lib/PublicInbox/Mbox.pm b/lib/PublicInbox/Mbox.pm
index fd623f6..2565ea5 100644
--- a/lib/PublicInbox/Mbox.pm
+++ b/lib/PublicInbox/Mbox.pm
@@ -115,7 +115,7 @@ sub new {
 		cb => $cb,
 		ctx => $ctx,
 		msgs => [],
-		opts => { asc => 1, offset => 0 },
+		opts => { offset => 0 },
 	}, $class;
 }
 
diff --git a/lib/PublicInbox/Search.pm b/lib/PublicInbox/Search.pm
index 8da30c1..5e6bfc6 100644
--- a/lib/PublicInbox/Search.pm
+++ b/lib/PublicInbox/Search.pm
@@ -158,6 +158,11 @@ sub get_thread {
 	}
 	$opts ||= {};
 	$opts->{limit} ||= 1000;
+
+	# always sort threads by timestamp, this makes life easier
+	# for the threading algorithm (in SearchThread.pm)
+	$opts->{asc} = 1;
+
 	_do_enquire($self, $qtid, $opts);
 }
 
diff --git a/lib/PublicInbox/View.pm b/lib/PublicInbox/View.pm
index c2e1ae7..ec5f7e0 100644
--- a/lib/PublicInbox/View.pm
+++ b/lib/PublicInbox/View.pm
@@ -327,7 +327,7 @@ sub stream_thread ($$) {
 sub thread_html {
 	my ($ctx) = @_;
 	my $mid = $ctx->{mid};
-	my $sres = $ctx->{srch}->get_thread($mid, { asc => 1 });
+	my $sres = $ctx->{srch}->get_thread($mid);
 	my $msgs = load_results($sres);
 	my $nr = $sres->{total};
 	return missing_thread($ctx) if $nr == 0;
-- 
EW



* [PATCH 0/7] message threading fixes for WWW UI
@ 2016-12-10  3:42  6% Eric Wong
  2016-12-10  3:43  7% ` [PATCH 6/7] thread: last Reference always wins Eric Wong
  2016-12-10  3:43  5% ` [PATCH 7/7] search: always sort thread results in ascending time order Eric Wong
  0 siblings, 2 replies; 4+ results
From: Eric Wong @ 2016-12-10  3:42 UTC (permalink / raw)
  To: meta

This series improves thread handling in several oddball
cases.

In the Xapian search indexing phase, the In-Reply-To header
is always considered the last (direct) parent of a message.
This is necessary in cases where a MUA specifies References
in an invalid order.  This is also what our View.pm display
has done for generating "reply" links.
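
One way to picture the rule, as a sketch only: move the
In-Reply-To Message-ID to the end of the References list so it
becomes the direct parent. The `effective_refs` helper is
hypothetical and is not the SearchIdx.pm implementation; it also
assumes that only the first bracketed ID in In-Reply-To is used.

```perl
use strict;
use warnings;

# Return the list of referenced IDs with the In-Reply-To ID (if any)
# placed last, so it is treated as the direct parent.
sub effective_refs {
	my ($references, $irt) = @_;
	my @refs = defined $references ?
		($references =~ m/<([^>]+)>/g) : ();
	my ($parent) = defined $irt ? ($irt =~ m/<([^>]+)>/) : ();
	if (defined $parent) {
		@refs = grep { $_ ne $parent } @refs; # drop duplicates
		push @refs, $parent;
	}
	@refs;
}

# References lists <a> before <b>, but In-Reply-To names <a>:
my @fixed = effective_refs('<c> <a> <b>', '<a>'); # c, b, a
my @plain = effective_refs('<c> <a> <b>', undef); # unchanged order
```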

Not many repos are affected by this, but
"public-inbox-index --reindex" will make those consistent
(there is no incompatible Xapian DB version bump).

We will now prune ghosts without children before display, as
they are sometimes the result of buggy (or malicious) MUAs
inserting spaces or otherwise mangling References: headers.
Ghosts with valid children remain shown, as they are likely to
be legitimate (but lost) messages.
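
The pruning rule can be sketched as a simple filter, assuming
that a ghost is a container without an {smsg} and that
{children} is an array ref. Both field layouts and the `visible`
helper are assumptions made for this example, not the View.pm
code.

```perl
use strict;
use warnings;

# Keep real messages, and keep ghosts only if they have children.
sub visible {
	my (@nodes) = @_;
	grep { defined $_->{smsg} || @{$_->{children}} } @nodes;
}

my $reply = { id => 'reply', smsg => {}, children => [] };
my @nodes = (
	{ id => 'real', smsg => {}, children => [] },
	{ id => 'lost-parent', smsg => undef, children => [ $reply ] },
	{ id => 'mangled', smsg => undef, children => [] },
);
my @shown = map { $_->{id} } visible(@nodes);
```

Here 'mangled' stands for a ghost produced by a broken
References: header and is dropped, while 'lost-parent' stays
because a real reply hangs off it.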

Deploying over the next few hours, .onions first!

  Currently reindexing git@vger mirror:

    http://czquwvybam4bgbro.onion/meta

  Up next:

    http://hjrcffqmbrq6wope.onion/meta

  Last: (also public-inbox.org)

    http://ou63pmih66umazou.onion/meta


Eric Wong (7):
  search: favor In-Reply-To over last References iff IRT exists
  view: favor SearchMsg for In-Reply-To over Email::MIME
  thread: fix comment describing its existence
  view: reduce indentation for skeleton generation
  view: skip ghosts with no direct children
  thread: last Reference always wins
  search: always sort thread results in ascending time order

 lib/PublicInbox/Mbox.pm         |  2 +-
 lib/PublicInbox/Search.pm       |  5 ++++
 lib/PublicInbox/SearchIdx.pm    | 22 ++++++++++++---
 lib/PublicInbox/SearchThread.pm | 30 ++++++++++++++------
 lib/PublicInbox/View.pm         | 61 +++++++++++++++++++++--------------------
 t/thread-cycle.t                |  8 ------
 6 files changed, 76 insertions(+), 52 deletions(-)

-- 
EW


2016-12-10  3:42  6% [PATCH 0/7] message threading fixes for WWW UI Eric Wong
2016-12-10  3:43  7% ` [PATCH 6/7] thread: last Reference always wins Eric Wong
2016-12-10  3:43  5% ` [PATCH 7/7] search: always sort thread results in ascending time order Eric Wong
2018-04-25  8:52     [PATCH 0/2] threading fixes Eric Wong
2018-04-25  8:52  5% ` [PATCH 1/2] thread: prevent hidden threads in /$INBOX/ landing page Eric Wong

Code repositories for project(s) associated with this public inbox

	https://80x24.org/public-inbox.git
