Date | Commit message |
|
This allows us to consistently enforce the same Message-ID
extraction rules everywhere and makes it easier for us to
make changes in the future.
Update scripts/ssoma-replay, as well, but don't rely on
PublicInbox::* modules in that since it's legacy and
public-inbox was never a dependency of ssoma.
|
|
Since the introduction of over.sqlite3, SearchMsg is not tied to
our search functionality in any way, so stop confusing ourselves
and future hackers by just calling it "PublicInbox::Smsg".
Add a missing "use" in ExtMsg while we're at it.
|
|
We don't need to hold onto the Email::MIME object across
multiple WwwResponse->getline calls; instead, we can stuff
the rendered HTML of the first (and hopefully only) message
of the buffer into ctx->{-html_tip}.
|
|
The object-oriented Hval API turned out to be less useful and
more clunky than I envisioned years ago, so get rid of it.
We'll no longer strip trailing whitespace from From: headers in
the HTML display, but I doubt anybody cares.
|
|
We need to escape ampersands (and some other characters for href
attributes), so introduce a `mid_href' sub to do just that.
'<', '>' and '"' were always escaped, so there's no risk of tag
or attribute injection, but creative Message-IDs could cause
confusion for some parsers and generate invalid URLs.
Start getting rid of the bloated, over-engineered OO Hval API
while we're at it; I only noticed this bug because I started
killing off Hval->new* callers.
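
As a rough illustration of the two-step escaping described above (in
Python rather than the project's Perl), a Message-ID can first be
percent-encoded for the URL and then HTML-escaped for the attribute.
The name `mid_href_sketch' and the set of characters left unencoded
are assumptions made for this sketch, not the actual `mid_href' sub.

```python
from html import escape
from urllib.parse import quote


def mid_href_sketch(mid):
    """Hypothetical helper: make a Message-ID safe inside href="..."."""
    # Step 1: percent-encode characters that are unsafe in a URL path.
    # The safe set below is an assumption, not the project's actual list.
    url = quote(mid, safe="@.-_~")
    # Step 2: HTML-escape the result for use in an attribute value.
    return escape(url, quote=True)


print(mid_href_sketch("a&b@example.com"))
```

Encoding the ampersand as %26 keeps it from being read as the start
of a character reference by HTML parsers.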
|
|
No need to use the over-engineered Hval OO API when the subject
is already normalized and there are no trailing spaces because of
normalization.
|
|
We already pre-populate the hashref when loading $smsg
(PublicInbox::SearchMsg) objects out of over.sqlite3 or Xapian,
so making expensive method calls isn't necessary in those cases.
We only need to use the method calls when SQLite or Xapian are
not available or are being populated (such as during indexing).
|
|
Avoid needlessly normalizing the subject when dumping, since
it's pushed into the @$topic array during accumulation in
normalized form.
We can also safely treat $smsg as a hashref and avoid
calling "->ds" as a method since we know we've got that
loaded via Over||Search and won't have to use Email::MIME
header lookup methods.
|
|
We use `$top' in other places, so rename it to `$top_subj'
consistently for `$subj' and `$prev_subj' comparisons down
the function.
|
|
While multi-Subject messages are unfortunate, try not to
generate confusing/invalid HTML with multiple elements
having the same HTML id attribute.
|
|
No point in passing something on stack only to stash it
into the $ctx which holds most other parameters used for
rendering the HTML.
|
|
I didn't wait until September to do it, this year!
|
|
Instead of going line-by-line, use split() with a giant regexp
to capture groups of contiguous lines. This offloads state
management to the regexp itself and makes it FAR easier to
keep track of <span> and </span> pairings.
Performance seems roughly on par after this change for the
meta@public-inbox archives. It seems a tiny bit faster for
git@vger with xt/perf-msgview.t, likely due to the longer
messages and larger contiguous groups of lines having the same
prefix (or no prefix at all), which drastically reduces the number
of subroutine calls and Perl ops executed.
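
A minimal sketch of the idea, in Python rather than Perl: splitting
on a pattern wrapped in a capture group keeps each run of contiguous
quoted lines as one chunk, so every chunk gets exactly one
<span>...</span> pair. The pattern, the "q" class name and
`render_sketch' are assumptions for illustration only; the real
regexp handles more cases.

```python
import html
import re

# One capture group matching a run of contiguous lines starting with ">".
# Because the pattern is captured, re.split() keeps the matched runs.
QUOTED_RUN = re.compile(r"((?:^>[^\n]*\n)+)", re.M)


def render_sketch(body):
    """Wrap each run of quoted lines in a single span."""
    out = []
    # With one capture group, odd indexes hold the captured quoted runs.
    for i, chunk in enumerate(QUOTED_RUN.split(body)):
        if not chunk:
            continue
        text = html.escape(chunk, quote=False)
        out.append('<span class="q">%s</span>' % text if i % 2 else text)
    return "".join(out)


print(render_sketch("hi\n> a\n> b\nbye\n"))
```

No per-line state is needed: the regexp decides where a quoted group
starts and ends, and the loop only sees whole groups.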
|
|
We use the same idiom in many places for doing two-step
linkification and HTML escaping. Get rid of an outdated
comment in flush_quote while we're at it.
|
|
No need to keep the old sub around, anymore. Rename auxiliary
subs to "msg_page_*" instead of the "html" version.
|
|
Get rid of the confusingly named {rv} and {tip} fields
and unify them into {obuf} for readability.
{obuf} usage may be expanded to more areas in the future. This
will eventually make it easier for us to experiment with
alternative buffering schemes.
|
|
It's an uncommon code path, no need to make it more complex
than it needs to be by having extra sub parameters.
|
|
It hasn't changed in a few years. Now we can rely on constant
folding to avoid extraneous ops to the $skel buffer.
|
|
Put more logic into html_footer and less in its only caller so
we can control the buffering and string creation.
|
|
Pass \&coderefs explicitly to walk_thread, and add some
prototypes + comments to describe what goes on.
|
|
This saves us a few comments and confusion. Yes, it's a
destination so "dst" can be appropriate, but we may be using
that term elsewhere.
|
|
While both can be correct, the former seems more common,
is shorter, and is also consistent with the spelling found
in the AGPL-3.0 text.
|
|
We're often iterating through messages while writing to another
buffer in our WWW interface, causing memory usage to multiply.
Since we know we won't need to keep the MIME object around in
some cases, we can tell msg_iter to clobber the on-stack
variable while it operates on subparts of multipart messages.
With xt/mem-msgview.t switched to multipart from the previous
commit, this shows a 13 MB memory reduction on that test.
|
|
The POSIX module is always loaded, so import `strftime' into the
namespace so we can use it and take advantage of compile-time
arg checking. While we're at it, update and reorder caller
functions to use prototypes, too.
|
|
This allows us to do some compile-time checking and fills in a
missing "use" in PublicInbox::NewsWWW, allowing it to be used
standalone and independently of PublicInbox::WWW.
|
|
In rare cases where Message-IDs get reused, we do not want to
hold onto the large Email::MIME objects in memory after showing
the first message. So discard each message as soon as we're
done using it so we can save memory for the next message.
The new and expensive xt/mem-msgview.t test shows a nearly 14MB
reduction for two ~7MB messages. run_script() also gets
upgraded to make it easier to pass large inputs via IO GLOBs.
|
|
No need to waste several kilobytes creating an anonymous sub for
every invocation of msg_iter.
|
|
We don't need to return a closure or have a separate hash
for sorting threads by relevance. Instead, we can stuff
the relevance {pct} into the SearchMsg object itself and
use that.
Note: upon reviewing this code, the sort-by-relevance seems
bogus as it only considers the relevance of the topmost message.
Instead, it would make more sense to the user to sort by the
highest relevance of all messages in that particular thread.
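
The alternative suggested in the note could look roughly like this
Python sketch. The dict layout ("pct", "children") and both function
names are hypothetical; this is not what the code does today.

```python
def thread_relevance(node):
    """Highest pct found in this message or any of its replies."""
    best = node.get("pct", 0)
    for child in node.get("children", []):
        best = max(best, thread_relevance(child))
    return best


def sort_threads(roots):
    """Order threads by their most relevant message, best first."""
    return sorted(roots, key=thread_relevance, reverse=True)


roots = [
    {"pct": 90, "children": []},
    {"pct": 10, "children": [{"pct": 99}]},
]
print([thread_relevance(r) for r in sort_threads(roots)])
```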
|
|
We can pass everything we need into the WWW $ctx to avoid
allocating kilobytes of memory for an anonymous sub for every
$MESSAGE_ID/t/ request.
|
|
Stash 5 local variables into the WWW $ctx hash table instead of
allocating several kilobytes for an anonymous sub.
|
|
WwwStream already passes the WWW $ctx to the callback sub, so we
don't need to create a new sub every call to capture local variables
for the callback.
|
|
Displaying "100%" wastes a precious column. Show "99%" instead
since there's little practical difference and <xapian/mset.h>
states:
Note that these generally aren't percentages of anything meaningful
(unless you use a custom weighting formula where they are!)
And we're not using a custom weighting formula.
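
One way to read this change, sketched in Python under the assumption
that the value is simply capped before display (`display_pct' is a
hypothetical name):

```python
def display_pct(pct):
    """Format a relevance value in at most two digits plus '%'."""
    # Cap at 99 so the column never needs a third digit.
    return "%2d%%" % min(int(pct), 99)


print(display_pct(100), display_pct(7))
```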
|
|
This displays the Xapian ->get_percent value in the skeleton to
improve scanning of relevancy; irrelevant results do not display
that.
This fixes broken #anchor links introduced in the previous
commit; irrelevant messages now link to the /$INBOX/$MESSAGE_ID page.
|
|
Instead of only passing an Inbox object, we'll pass the $ctx
reference as PublicInbox::SearchView::mset_thread did.
So although mset_thread was wrong, we now make its usage
of SearchThread::thread correct and update other callers to
favor the new style of passing the entire $ctx (with ->{-inbox})
instead of just the Inbox object.
This makes the thread skeleton at the bottom of the search
page show subjects of messages, but unfortunately link to
non-existent #anchors. The next commit will fix that.
While we're at it, favor "\&foo" over "*foo" since the former
makes the code reference (aka "function pointer") obvious so it
won't be confused for other things named "foo" in that
scope (e.g. $foo/@foo/%foo).
|
|
Since we index X-Alt-Message-ID (because we need to placate some
NNTP clients), we now display it as well, since that Message-ID
could be the X-Alt-Message-ID that the reader is actually
interested in.
|
|
And use it for the per-message permalink display.
|
|
"refer" is not the correct term here, since that would mean
multiple messages have the current message in the "References:"
header, and that's a normal occurrence.
Instead, we need to warn the reader that the given message
itself has multiple Message-IDs.
|
|
Browsers may underline '<' and '>' in links, which may be
confused with '≤' and '≥'. So have the Message-ID header
display follow what we do with In-Reply-To headers and move the
"<" and ">" outside of <a> in the HTML.
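
Sketched in Python for illustration (`mid_link' and its arguments
are hypothetical names), the angle brackets are emitted as escaped
text around the link instead of inside it:

```python
from html import escape


def mid_link(mid, href):
    """Render <mid> with only the Message-ID itself inside the link."""
    return "&lt;<a\nhref=\"%s\">%s</a>&gt;" % (
        escape(href, quote=True),
        escape(mid, quote=False),
    )


print(mid_link("20200101@example.com", "../20200101@example.com/"))
```

Any underline the browser draws now covers just the Message-ID.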
|
|
Mail headers can contain multiple headers of any type, so ensure
we don't hide any information we're getting in the per-message
permalink views.
This means it's possible to have multiple From, Date, To, Cc,
Subject, and In-Reply-To headers displayed.
The thread indices are a special case, I guess, since we run
out of space on the line if the headers are too long and tools like
mutt only show the first one.
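
For readers unfamiliar with repeated headers, this Python sketch
(using the standard email package, not the Email::MIME code this
project relies on) shows how asking for a single value hides the
rest. `all_headers' is a hypothetical helper.

```python
from email import message_from_string


def all_headers(raw, name):
    """Return every value of a header, not just the first."""
    msg = message_from_string(raw)
    return msg.get_all(name, [])


raw = "Subject: first\nSubject: second\nFrom: a@example.com\n\nbody\n"
print(all_headers(raw, "Subject"))
```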
|
|
|
|
Passing digits to `timegm' which it does not understand would
be a waste of time.
|
|
We don't want to emit funky URLs which can be lost in
translation or cause problems with non-Unicode-aware
clients.
Then, don't accept non-ASCII filenames in URLs, since
a manually-generated URL/filename in attachment downloads
could be used for Unicode homographs to confuse folks who
download the attachment.
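
A minimal Python sketch of the kind of check described above. The
allowed character set, the fallback name and `url_filename' itself
are all assumptions for illustration, not the rules this commit
implements.

```python
import re

# Assumption: only a conservative ASCII subset is allowed in URL filenames.
ASCII_FILENAME = re.compile(r"[A-Za-z0-9][A-Za-z0-9._+-]*")


def url_filename(name, fallback="attachment.bin"):
    """Return name if it is plain, URL-friendly ASCII, else fallback."""
    if name and ASCII_FILENAME.fullmatch(name):
        return name
    return fallback


# "\u0430" is CYRILLIC SMALL LETTER A, a homograph of ASCII "a".
print(url_filename("patch.diff"), url_filename("p\u0430tch.diff"))
```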
|
|
We don't need to rely on Xapian search functionality for the
majority of the WWW code, even. subject_normalized is moved to
SearchMsg, where it (probably) makes more sense, anyways.
|
|
Try to appear grammatically correct and state:
"only message in thread" when there's only one known (to us)
message in the thread.
|
|
Empty subjects ("") and undefined Subjects: are now both
displayed as "(no subject)" for now.
|
|
'$inbox' is more human-readable, so that is for the more
human-readable name in most cases. Making our variable naming
more consistent should make the code easier-to-review and
harder to screw up.
|
|
We were relying on Danga::Socket using the "bytes" pragma,
previously. Nowadays, the "bytes" pragma is not recommended in
general, but bytes::length remains acceptable for getting the
byte-size of a scalar.
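
The distinction between character length and byte size matters for
anything written to a socket. A Python analogue, shown only to
illustrate the concept (Perl's bytes::length is the actual call
discussed above; `byte_length' is a hypothetical name and UTF-8 is
assumed):

```python
def byte_length(s):
    """Size of s in bytes once encoded as UTF-8."""
    return len(s.encode("utf-8"))


s = "a\u2265b"  # "a", GREATER-THAN OR EQUAL TO, "b"
print(len(s), byte_length(s))
```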
|
|
This is best-effort, but works well-enough in practice for
projects which use shell-friendly filenames as well as the
long path names for some Linux kernel selftests.
|
|
Perl "split" can capture and group in the regexp itself,
so rely on that to shorten our code.
Comparing the /T/ HTML output of a thread from hell (on LKML with
1356 messages) reveals no difference in the rendered result.
Only the HTML source differs in newline placement before/after
the closing </span>
This allows a minor speedup on my X32 Thinkpad @ 1.6GHz with
the aforementioned LKML thread from hell:
before: 3.67s
after: 3.55s
|
|
We use absolute URLs in the Atom feeds (to ease
syndication/mirroring), so hunk headers need to point to the
solver URLs.
|