path: root/lib/PublicInbox/SearchThread.pm
Date Commit message
2019-12-20 searchthread: fix usage of user-supplied parameter
Instead of only passing an Inbox object, we'll pass the $ctx reference as PublicInbox::SearchView::mset_thread did. So although mset_thread was wrong, we now make its usage of SearchThread::thread correct and update other callers to favor the new style of passing the entire $ctx (with ->{-inbox}) instead of just the Inbox object. This makes the thread skeleton at the bottom of the search page show subjects of messages, but unfortunately it links to non-existent #anchors. The next commit will fix that. While we're at it, favor "\&foo" over "*foo" since the former makes the code reference (aka "function pointer") obvious so it won't be confused for other things named "foo" in that scope (e.g. $foo/@foo/%foo).
2019-01-08 view: more culling for search threads
{mapping} overhead is now down to ~1.3M at the end of a giant thread from hell.
2018-04-25 thread: sort incoming messages by Date
Improve the display by finding any parent when we see out-of-order References. This prevents us from having two roots in the test case like Mail::Thread does.
2018-04-25 thread: prevent hidden threads in /$INBOX/ landing page
In retrospect, the loop prevention done by our indexer is not always sufficient since it can have improperly sorted or incomplete References headers. This bug was triggered by multiple bracketed Message-IDs in an In-Reply-To: header (not References) where the Message-IDs were in non-chronological order when somebody tried to reply to different leaves of a thread with a single message. So we must check for descendants before blindly trying to use the last one. Fixes: c6a8fdf71e2c336f ("thread: last Reference always wins")
2018-03-29 search: get rid of most lookup_* subroutines
Too many similar functions doing the same basic thing was redundant and misleading, especially since Message-ID is no longer treated as a truly unique identifier. For displaying threads in the HTML, this makes it clear that we favor the primary Message-ID mapped to an NNTP article number if a message cannot be found.
2017-10-03 search: try to fill in ghosts when generating thread skeleton
Since we attempt to fill in threads by Subject, our thread skeletons can cross actual thread IDs, leading to the possibility of false ghosts showing up in the skeleton. Try to fill in the ghosts as well as possible by performing a message lookup.
2017-02-11 handle repeated References and In-Reply-To headers
It seems possible for git-send-email(1) to generate repeated instances of References and In-Reply-To headers, as evidenced in: https://public-inbox.org/git/20161111124541.8216-17-vascomalmeida@sapo.pt/raw This causes a mismatch between how our search indexer threads and how our HTML view handles threading. In the future, View.pm will use the smsg-parsed {references} field and avoid redoing Email::MIME header parsing. We will still need to figure out a way to deal with messages with repeated Message-IDs, at some point, too.
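One way to picture the normalization this calls for, sketched in Python for illustration rather than in the project's Perl: gather the bracketed Message-IDs from every References and In-Reply-To value and keep only the first occurrence of each. The name `merge_references` and the regular expression are assumptions made for this sketch, not code from public-inbox.

```python
import re

# Hypothetical pattern for a bracketed Message-ID; an assumption for this sketch.
MID_RE = re.compile(r"<([^<>\s]+)>")


def merge_references(references_values, in_reply_to_values):
    """Return unique Message-IDs in first-seen order.

    Each argument is a list of raw header values, one entry per header
    instance, so repeated headers simply show up as extra list items.
    IDs that appear only in In-Reply-To are appended after those from
    References.
    """
    seen = set()
    merged = []
    for value in list(references_values) + list(in_reply_to_values):
        for mid in MID_RE.findall(value):
            if mid not in seen:
                seen.add(mid)
                merged.append(mid)
    return merged
```

With a duplicated References header, `merge_references(["<a@x> <b@x>", "<a@x> <b@x>"], ["<b@x>"])` yields each ID once, so an indexer and a view that both start from such a merged list would at least agree on the ancestry.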
2016-12-21 searchthread: simplify API and remove needless OO
This simplifies callers to prevent errors and avoids needless object-orientation in favor of a single procedure call to handle threading and ordering.
2016-12-21 searchthread: update comment about loop prevention
It definitely is necessary to prevent looping with the %seen hash.
2016-12-10 thread: last Reference always wins
Since we use SearchMsg from Xapian data, we can be assured we do not get a self-referential {references} field. However, we may need to be more careful when checking has_descendent for loops, as blindly calling add_child could open us up to that possibility...
2016-12-10 view: skip ghosts with no direct children
Otherwise, a malicious or broken client could populate the thread skeleton with invalid References. We only care about ghosts which messages correctly refer to, not totally bogus ones which may be the result of long line or token truncation + wrapping in MUA headers.
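The filtering idea can be sketched as follows, in Python for illustration rather than the project's Perl. The `Node` class, the use of `msg is None` to mark a ghost, and the bottom-up pruning order are all assumptions made for this sketch; the actual view code may decide differently which ghosts to keep.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    mid: str
    msg: Optional[dict] = None  # None marks a ghost (referenced, never seen)
    children: List["Node"] = field(default_factory=list)


def prune_ghosts(nodes):
    """Drop ghosts that have no children, working from the leaves upward.

    Because children are pruned first, a ghost whose only children were
    themselves childless ghosts is dropped as well.
    """
    kept = []
    for node in nodes:
        node.children = prune_ghosts(node.children)
        if node.msg is None and not node.children:
            continue  # bogus reference: nothing real hangs off it
        kept.append(node)
    return kept
```

A ghost that a real message refers to survives because that message sits in its `children` list; a ghost created from a truncated or invented Message-ID has nothing below it and disappears from the skeleton.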
2016-12-10 thread: fix comment describing its existence
Mail::Thread is UNavailable on many distros, meaning ordinary users will have to rely on CPAN, a Perl-specific packaging tool.
2016-10-14 thread: reinstates stable ordering when ghosts are present
This reverts commit 3c9dd6619f825f0515e7e4afa1bd55c99c1a68d3 ("thread: fix sorting without topmost") and reinstates the "topmost" routine for sorting purposes.
2016-10-13 thread: fix parent/child relationships
The ordering change in add_child is critical if $self == $parent as the {children} hash was lost before this change. has_descendent can be simplified by walking upwards from the child instead of downwards from the parent. This fixes threading regressions introduced in commit 30100c46326e2eac275e0af13116636701d2537e ("thread: use hash + array instead of hand-rolled linked list").
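A minimal sketch of both points, in Python for illustration rather than the project's Perl. The method names mirror those in the commit message, but the bodies, the `Node` class, and the `seen` guard are assumptions, not the actual SearchThread code.

```python
class Node:
    def __init__(self, mid):
        self.mid = mid
        self.parent = None
        self.children = {}  # mid -> Node

    def has_descendent(self, other):
        """Walk upward from `other` until we reach self or run out of parents.

        Note that a node counts as its own descendant here. The `seen`
        set stops the walk if the parent links ever form a cycle.
        """
        seen = set()
        node = other
        while node is not None and id(node) not in seen:
            if node is self:
                return True
            seen.add(id(node))
            node = node.parent
        return False

    def add_child(self, child):
        if self is child:
            raise ValueError("a node cannot be its own child")
        # Detach from the old parent BEFORE attaching. If the old parent
        # is self, doing it the other way round would delete the entry
        # that was just added.
        old = child.parent
        if old is not None:
            old.children.pop(child.mid, None)
        self.children[child.mid] = child
        child.parent = self
```

Walking upward costs one step per ancestor, whereas searching downward from the parent may have to visit every node in the subtree.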
2016-10-13 thread: reduce indentation level
This should reduce differences from the original Mail::Thread code and hopefully make things easier-to-follow.
2016-10-05 thread: remove weaken dependency
We have to walk through all the messages after threading anyways to build the rootset, so we can just delete all the parent references at that point.
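The single pass described here might look like the following, sketched in Python for illustration rather than the project's Perl. The dict-based node layout and the name `build_rootset` are assumptions. In Perl the point is to break parent/child reference cycles without weak references; Python's garbage collector handles cycles on its own, so this only shows the shape of the loop.

```python
def build_rootset(nodes):
    """Collect parentless nodes and drop every parent link in one pass.

    `nodes` is an iterable of dicts that may carry a "parent" key.
    After this returns, the structure is reachable only top-down via
    "children", so no parent/child cycle remains.
    """
    roots = []
    for node in nodes:
        if node.get("parent") is None:
            roots.append(node)
        node.pop("parent", None)  # the link is no longer needed
    return roots
```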
2016-10-05 thread: use hash + array instead of hand-rolled linked list
This starts to show noticeable performance improvements when attempting to thread over 400 messages, but the improvement may not be measurable with fewer. However, the resulting code is much shorter and (IMHO) much easier to understand.
2016-10-05 thread: inline and remove recurse_down logic
We no longer recurse, and it's too hard to come up with a new name for a sub we will only use once.
2016-10-05 thread: order_children no longer cares about depth
We never use the depth anywhere in this sub.
2016-10-05 thread: avoid incrementing undefined value
It is pointless to increment when setting a true value is simpler as there is no need to read before writing.
2016-10-05 thread: remove iterate_down
Unnecessary subs and complexity. This was hiding the fact that $before is never used.
2016-10-05 thread: simplify
Single use subroutines actually make the code more complex in this case, and there's never a {seen} field in $self.
2016-10-05 thread: remove rootset accessor method
It doesn't buy us much and copying to a new array is slower; but probably not measurable in real-world use.
2016-10-05 thread: remove Email::Abstract wrapping
This roughly doubles performance due to the reduction in object creation and abstraction layers.
2016-10-05 thread: remove accessor usage in internals
This improves top-level index generation performance by 3-4%.
2016-10-05 thread: pass array refs instead of entire arrays
Copying large arrays is expensive, so avoid it. This reduces /$INBOX/ time by around 1%.
2016-10-05 thread: remove Mail::Thread dependency
Introduce our own SearchThread class for threading messages. This should allow us to specialize and optimize away objects in future commits.