Date | Commit message (Collapse) |
|
Dikshunarees R gude!
|
|
We shouldn't rerun the address obfuscator on data we've
already run through. Instead, run through the unescaped
text part and substitute the UTF-8 "\x{2022}" substitution
before it hits HTML escaping
Fixes: 9bdd81dc16ba6511 ("view: msg_iter calls add_body_text directly")
|
|
Using `undef EXPR' like a function call actually frees the heap
memory associated with the scalar, whereas `$sv = undef' or
`$sv = ""' will hold the buffer around until $sv goes out
of scope.
The `sv_set_undef' documentation in the perlapi(1) manpage
explicitly states this:
The perl equivalent is "$sv = undef;". Note that it doesn't
free any string buffer, unlike "undef $sv".
And I've confirmed by reading Dump() output from Devel::Peek.
We'll also inline the old index_body sub in SearchIdx.pm to make
the scope of the scalar more obvious.
This change saves several hundred kB RSS on both -index and
-httpd when hitting large emails with thousands of lines.
|
|
No point in having an extra sub for a short, commonly
called function in the same file.
|
|
We're slowly moving towards doing all of our output buffering
into a single buffer, so passing that around on the stack as
a dedicated parameter is confusing.
|
|
While rare in practice (even by spammers), A single "0" could
theoretically be the entire contents of a Subject line. So
use the Perl 5.10+ defined-or operator to improve correctness
of subject deduplication.
|
|
We depend on Perl 5.10 features in other places. Shorten the
lifetime of the `$desc' scalar while we're at it.
|
|
Clarify that we're assuming the text is UTF-8, since users
may have no idea how it's mangled.
|
|
These seem mostly harmless since Perl will just truncate the
match and start a new one on a newline boundary in our case.
The only downside is we'd end up with redundant <span> tags in
HTML.
Limiting the number of line matched ourselves with `{1,$NUM}'
doesn't seem prudent since lines vary in length, so we continue
to defer the job of limiting matches to the Perl regexp engine.
I've noticed this warning in practice on 100K+ line patches to
locale data.
|
|
There may be no topics for a given timestamp range,
so don't attempt to treat `undef' as an arrayref.
|
|
This allows us to consistently enforce the same Message-ID
extraction rules everywhere and makes it easier for us to
make changes in the future.
Update scripts/ssoma-replay, as well, but don't rely on
PublicInbox::* modules in that since it's legacy and
public-inbox was never a dependency of ssoma.
|
|
Since the introduction of over.sqlite3, SearchMsg is not tied to
our search functionality in any way, so stop confusing ourselves
and future hackers by just calling it "PublicInbox::Smsg".
Add a missing "use" in ExtMsg while we're at it.
|
|
We don't need to hold onto the Email::MIME object across
multiple WwwResponse->getline calls, instead we can stuff
the rendered HTML of the first (and hopefully only) message
of the buffer into ctx->{-html_tip}.
|
|
The object-oriented Hval API turned out to be less useful and
more clunky than I envisioned years ago, so get rid of it.
We'll no longer strip trailing whitespace from From: headers in
the HTML display, but I doubt anybody cares.
|
|
We need to escape ampersands (and some other characters for href
attributes), so introduce a `mid_href' sub to do just that.
'<', '>' and '"' were always escaped, so there's no risk of tag
or attribute injection, but creative Message-IDs could cause
confusion for some parsers and generate invalid URLs.
Start getting rid of the bloated, over-engineered OO Hval API
while we're at it, I only noticed this bug because I started
killing off Hval->new* callers.
|
|
No need to use the over-engineered Hval OO API when the subject
is already normalized and there's no trailing spaces because of
normalization.
|
|
We already pre-populate the hashref when loading $smsg
(PublicInbox::SearchMsg) objects out of over.sqlite3 or Xapian,
so making expensive method calls isn't necessary in those cases.
We only need to use the method calls when SQLite or Xapian are
not available or are being populated (such as during indexing).
|
|
Avoid needlessly normalizing the subject when dumping, since
it's pushed into the @$topic array during accumulation in
normalized form.
We can also safely treat $smsg as a hashref and avoid
calling "->ds" as a method since we know we've got that
loaded via Over||Search and won't have to use Email::MIME
header lookup methods.
|
|
We use `$top' in other places, so name it to `$top_subj'
consistently for `$subj' and `$prev_subj' comparisons down
the function.
|
|
While multi-Subject messages are unfortunate, try not to
generate confusing/invalid HTML with multiple elements
having the same HTML id attribute.
|
|
No point in passing something on stack only to stash it
into the $ctx which holds most other parameters used for
rendering the HTML.
|
|
I didn't wait until September to do it, this year!
|
|
Instead of going line-by-line, use split() with a giant regexp
to capture groups of contiguous lines. This offloads state
management to the regexp itself and makes it FAR easier to
keep track of <span> and </span> pairings.
Performance seems roughly on par after this change for the
meta@public-inbox archives. It seems a tiny bit faster for
git@vger with xt/perf-msgview.t, likely due to the longer
messages and larger contiguous groups of lines having the same
prefix (or no prefix at all) and drastically reduces the number
of subroutine calls and Perl ops executed.
|
|
We use the same idiom in many places for doing two-step
linkification and HTML escaping. Get rid of an outdated
comment in flush_quote while we're at it.
|
|
No need to keep the old sub around, anymore. Rename auxiliary
subs to "msg_page_*" instead of the "html" version.
|
|
Get rid of the confusingly named {rv} and {tip} fields
and unify them into {obuf} for readability.
{obuf} usage may be expanded to more areas in the future. This
will eventually make it easier for us to experiment with
alternative buffering schemes.
|
|
It's an uncommon code path, no need to make it more complex
than it needs to be by having extra sub parameters.
|
|
It hasn't changed in a few years. Now we can rely on constant
folding to avoid extraneous ops to the $skel buffer.
|
|
Put more logic into html_footer and less in its only caller so
we can control the buffering and string creation.
|
|
Pass \&coderefs explicitly to walk_thread, and add some
prototypes + comments to describe what goes on.
|
|
This saves us a few comments and confusion. Yes, it's a
destination so "dst" can be appropriate, but we may be using
that term elsewhere.
|
|
While both can be correct, the former seems more common,
is shorter, and is also consistent with the spelling found
in the AGPL-3.0 text.
|
|
We're often iterating through messages while writing to another
buffer in our WWW interface, causing memory usage to multiply.
Since we know we won't need to keep the MIME object around in
some cases, and can tell msg_iter to clobber the on-stack
variable while it operates on subparts of multipart messages.
With xt/mem-msgview.t switched to multipart from the previous
commit, this shows a 13 MB memory reduction on that test.
|
|
The POSIX module is always loaded, so import `strftime' into the
namespace so we can use it and take advantage of compile-time
arg checking. While we're at it, update and reorder caller
functions to use prototypes, too.
|
|
This allows to do some compile-time checking and fills in a
missing "use" in PublicInbox::NewsWWW, allowing it to be used
standalone and independently of PublicInbox::WWW
|
|
In rare cases where Message-IDs get reused, we do not want to
hold onto the large Email::MIME objects in memory after showing
the first message. So discard each message as soon as we're
done using it so we can save memory for the next message.
The new and expensive xt/mem-msgview.t test shows a nearly 14MB
reduction for two ~7MB messages. run_script() also gets
upgraded to make it easier to pass large inputs via IO GLOBs.
|
|
No need to waste several kilobytes creating an anonymous sub for
every invocation of msg_iter.
|
|
We don't need to return a closure or have a separate hash
for sorting threads by relevance. Instead, we can stuff
the relevance {pct} into the SearchMsg object itself and
use that.
Note: upon reviewing this code, the sort-by-relevance seems
bogus as it only considers the relevance of the topmost message.
Instead, it would make more sense to the user to sort by the
highest relevance of all messages in that particular thread.
|
|
We can pass everything we need into the WWW $ctx to avoid
allocating kilobytes of memory for an anonymous sub for every
$MESSAGE_ID/t/ request.
|
|
Stash 5 local variables into the WWW $ctx hash table instead of
allocating several kilobytes for an anonymous sub.
|
|
WwwStream already passes the WWW $ctx to the callback sub, so we
don't need to create a new sub every call to capture local variables
for the callback.
|
|
Displaying "100%" wastes a precious column. Show "99%" instead
since there's little practical difference and <xapian/mset.h>
states:
Note that these generally aren't percentages of anything meaningful
(unless you use a custom weighting formula where they are!)
And we're not using a custom weighting formula.
|
|
The displays the Xapian ->get_percent value in the skeleton to
improve scanning of relevancy; irrelevant results do not display
that.
This fixes broken #anchor links introduced in the previous
commit, irrelevant messages now link to the /$INBOX/$MESSAGE_ID page.
|
|
Instead of only passing an Inbox object, we'll pass the $ctx
reference as PublicInbox::SearchView::mset_thread did.
So although mset_thread was wrong, we now make it's usage
of SearchThread::thread correct and update other callers to
favor the new style of passing the entire $ctx (with ->{-inbox})
instead of just the Inbox object.
This makes the thread skeleton at the bottom of the search
page to show subjects of messages, but unfortunately links to
non-existent #anchors. The next commit will fix that.
While we're at it, favor "\&foo" over "*foo" since the former
makes the code reference (aka "function pointer) obvious so it
won't be confused for other things named "foo" in that
scope (e.g. $foo/@foo/%foo).
|
|
Since we index X-Alt-Message-ID (because we need to placate some
NNTP clients), we now display it as well, since that Message-ID
could be the X-Alt-Message-ID that the reader is actually
interested in.
|
|
And use it for the per-message permalink display.
|
|
"refer" is not the correct term, here; since that would mean
multiple messages have the current message in the "References:"
header, and that's a normal occurence.
Instead, we need to warn the reader that the given message
itself has multiple Message-IDs.
|
|
Browsers may underline '<' and '>' in links, which may be
confused with '≤' and '≥'. So have the Message-ID header
display follow what we do with In-Reply-To headers and move the
"<" and ">" outside of <a> in the HTML.
|
|
Mail headers can contain multiple headers of any type, so ensure
we don't hide any information we're getting in the per-message
permalink views.
This means it's possible to have multiple From, Date, To, Cc,
Subject, and In-Reply-To headers displayed.
The thread indices are a special case, I guess, since we run
out of space on the line if the headers too long and tools like
mutt only show the first one.
|
|
|