Date | Commit message |
|
This simplifies the code a bit and reduces the translation
overhead for looking directly at data from tools shipped
with Xapian.
While we're at it, fix thread-all.t :)
|
|
Apparently it never actually got used, and the world seems
fine without it, so we can drop it.
While we're at it, consider removing our subject_path
usage from existence, too. We are not using fancy subject-line
based URLs, here.
|
|
This is faster, smaller, and more straightforward to me with
fewer layers of indirection.
|
|
We only need strftime to be locale-independent when generating
dates for email and HTTP headers. Purely numeric dates can
use strftime for ease-of-readability.
|
|
Instead, only preload the ->mid field for threading,
as we only need ->thread and ->path once in Search->get_thread
(but we will need the ->mid field repeatedly).
This more than doubles View->load_results performance,
according to thread-all, on an inbox with over 300K messages.
|
|
We only generate the ->date once in NNTP, so creating
the hash entry is a waste.
|
|
strftime is locale-dependent, which can cause surprising
failures for some users.
|
|
This hasn't been needed since our Email::Abstract removal
for message threading.
|
|
This roughly doubles performance due to the reduction in
object creation and abstraction layers.
|
|
Doing git tree lookups based on the SHA-1 of the Message-ID
is expensive as trees get larger; instead, use the SHA-1
object ID directly. This drastically reduces the amount
of time spent in the "git cat-file --batch" process for
fetching the /$INBOX/all.mbox.gz endpoint on the ~800MB
git@vger.kernel.org mirror.
This retains backwards compatibility and allows existing
indices to be transparently upgraded without performance
degradation.
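
The difference between the two lookup styles can be illustrated by the request lines written to "git cat-file --batch". This is a Python sketch for illustration only, not the project's Perl code; the helper names are hypothetical, and the two-character directory split of the hashed Message-ID is an assumed tree layout.

```python
import hashlib


def tree_request(mid, ref="HEAD"):
    """Request line that makes git walk the tree to resolve a path.

    ASSUMPTION: the path is the hex SHA-1 of the Message-ID, split
    after the first two characters.
    """
    digest = hashlib.sha1(mid.encode("utf-8")).hexdigest()
    return "%s:%s/%s\n" % (ref, digest[:2], digest[2:])


def blob_request(oid):
    """Request line naming the blob by object ID, with no tree walk."""
    return oid + "\n"
```

Either line can be written to the stdin of a running "git cat-file --batch" process; only the first form requires git to resolve a path inside a large tree.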
|
|
Address::names is sufficient to handle what from_name did.
|
|
This should help avoid having too many fake top-level
messages in the topic view since we only have a partial
window for threading results.
|
|
No need to duplicate the string when transforming it;
learned from studying SpamAssassin 3.4.1.
|
|
We cannot have strftime using the local timezone for %z.
This fixes output when a server is not running UTC.
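
One way to keep a header date independent of both the locale and the server timezone is to build it from gmtime fields with fixed English names. This is a Python sketch for illustration only, not the project's Perl code; `rfc2822_utc` is a hypothetical helper name.

```python
import time

# Fixed English names, so the output never depends on the process locale.
DAYS = ("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")
MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def rfc2822_utc(ts):
    """Format a Unix timestamp as an email-style date, always in UTC."""
    t = time.gmtime(ts)  # gmtime ignores the server's local timezone
    return "%s, %02d %s %04d %02d:%02d:%02d +0000" % (
        DAYS[t.tm_wday], t.tm_mday, MONTHS[t.tm_mon - 1], t.tm_year,
        t.tm_hour, t.tm_min, t.tm_sec)
```

Because the offset is hard-coded as +0000 and the fields come from gmtime, the result is the same on a server that is not running UTC.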
|
|
git has stricter requirements for ident names (no '<>')
which Email::Address allows.
Even in 1.908, Email::Address also has an incomplete fix for
CVE-2015-7686 with a DoS-able regexp for comments. Since we
don't care for or need all the RFC compliance of Email::Address,
avoiding it entirely may be preferable.
Email::Address will still be installed as a requirement for
Email::MIME, but it is only used by
Email::MIME::header_str_set, which we do not use.
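
The ident restriction mentioned above (no '<>' in names) could be handled with a small filter such as the following. This is a Python sketch for illustration only, not the project's Perl code; the function name and the "unknown" fallback are hypothetical.

```python
def git_ident_name(name, fallback="unknown"):
    """Drop characters that are not usable in a git ident name."""
    # Angle brackets delimit the email part of an ident; newlines would
    # break the ident line. Remove them, then collapse whitespace runs.
    cleaned = "".join(ch for ch in name if ch not in "<>\r\n")
    cleaned = " ".join(cleaned.split())
    # ASSUMPTION: an empty result is replaced by a placeholder name.
    return cleaned or fallback
```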
|
|
Noticed when using a long URL in the subject.
|
|
Hard tabs *may* be searchable, so preserve them since they do
not take up any more space than a normal space. However, CR
(carriage return) is worthless and likely a sign of a buggy mail
(or spam) client anyways.
|
|
Message-IDs should not be MIME encoded, but in case they are,
use the raw form for compatibility with ssoma and possibly
other tools. This prevents a potential problem where a
malicious client could confuse our storage layer into indexing
incorrect contents.
|
|
Not sure how, but this should've always been AGPL-3.0+ like
the rest of the code, not GPL-3.0+
|
|
Hopefully this gives new hackers a better overview of
how the components relate to each other.
|
|
The document data of a search message already contains a good chunk
of the information needed to respond to OVER/XOVER commands quickly.
Expand on that and use the document data to implement OVER/XOVER
quickly.
This adds a dependency on Xapian being available for nntpd usage,
but is probably alright since nntpd is esoteric enough that anybody
willing to run nntpd will also want search functionality offered
by Xapian.
This also speeds up XHDR/HDR with the To: and Cc: headers and
:bytes/:lines article metadata used by some clients for header
displays and marking messages as read/unread.
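
For reference, an OVER/XOVER response line is a tab-separated record whose default fields are defined by RFC 3977. The following Python sketch (illustration only, not the project's Perl code) builds such a line from already-loaded metadata; the dictionary keys and the `over_line` name are hypothetical.

```python
import re

# Default overview field order after the article number (RFC 3977).
# The dictionary keys themselves are hypothetical names for this sketch.
OVERVIEW_FIELDS = ("subject", "from", "date", "message_id",
                   "references", "bytes", "lines")


def over_line(num, doc):
    """Build one overview line from pre-extracted message metadata."""
    cols = [str(num)]
    for key in OVERVIEW_FIELDS:
        value = str(doc.get(key, ""))
        # Tabs and line breaks inside a field would corrupt the record,
        # so replace each run of them with a single space.
        cols.append(re.sub(r"[\t\r\n]+", " ", value))
    return "\t".join(cols)
```

Since every field comes from stored metadata, no message body has to be read or parsed to answer the command.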
|
|
Using Xapian allows us to implement XROVER without forking
new processes.
|
|
In the future, it should be possible to use this:
git ls-files | UPDATE_COPYRIGHT_HOLDER='all contributors' \
UPDATE_COPYRIGHT_USE_INTERVALS=2 \
xargs /path/to/gnulib/build-aux/update-copyright
|
|
Spaces may be added when using header_str with Email::MIME->create,
so use the normal "header" parameter when setting Message-IDs
and References.
|
|
We'll continue to compress long Message-IDs in URLs (which we know
about), but we will store entire Message-IDs in the Xapian database
to facilitate ease-of-lookups in external databases.
|
|
Redundant document data increases our database size; pull the
smsg->mid off the unique term, the smsg->ts off the value, and
only generate the formatted display date off smsg->ts.
|
|
A document may have many terms, so this hurts performance
if we blindly iterate. Unfortunately, we can't rely on the
order of the termlist just yet, either, so we must repeatedly
restart the search for now until we're ready to bump schema
versions.
|
|
We ought to summarize subjects to avoid exploding
line lengths in the web interface.
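
A subject summary of this kind could be as simple as truncating at a word boundary. This is a Python sketch for illustration only, not the project's Perl code; the function name, the 46-character default, and the " ..." suffix are all assumptions.

```python
def summarize_subject(subject, limit=46):
    """Shorten a subject line for display, cutting at a word boundary."""
    subject = " ".join(subject.split())  # collapse whitespace runs
    if len(subject) <= limit:
        return subject
    # Drop the trailing partial word; a subject with no space inside
    # the limit (such as a long URL) is cut at the limit itself.
    cut = subject[:limit].rsplit(" ", 1)[0]
    return cut + " ..."
```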
|
|
Consistently name mid_* functions as verbs.
|
|
We need proper ordering of References to thread messages
correctly. We would lose this order if we load the terms
from the database, so set it directly in the document data.
Do not bother with a separate In-Reply-To, since Mail::Thread
just merges the IRT into References. This bumps our schema
version once again.
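
The merge of In-Reply-To into an ordered References list can be sketched as follows. This is a simplified Python illustration, not Mail::Thread's actual algorithm or the project's Perl code; `ordered_references` is a hypothetical name.

```python
import re

# Message-IDs appear between angle brackets in both headers.
MID_RE = re.compile(r"<([^<>\s]+)>")


def ordered_references(references, in_reply_to=""):
    """Return Message-IDs in header order, with In-Reply-To merged in."""
    seen = set()
    ordered = []
    # References come first (oldest ancestor to direct parent); an
    # In-Reply-To ID is appended only if it is not already listed.
    for mid in MID_RE.findall(references) + MID_RE.findall(in_reply_to):
        if mid not in seen:
            seen.add(mid)
            ordered.append(mid)
    return ordered
```

A list preserves the header order, which an unordered set of index terms cannot.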
|
|
Ghosts have no document data in them.
Perhaps we should just rely on terms for Message-ID
and avoid storing that in the document data...
|
|
This can be used to quickly scan for replies in a message without
displaying an entire thread.
|
|
This will relieve callers of the need to decode the data
we store internally in Xapian.
|
|
This shall allow us to search for replies/threads more easily.
|