Date | Commit message |
|
Hopefully this gives new hackers a better overview of
how the components relate to each other.
|
|
We use it as a general compressor for identifiers such as
subject paths, so using the "mid_" prefix probably is not
appropriate.
|
|
The document data of a search message already contains a good chunk
of the information needed to respond to OVER/XOVER commands quickly.
Expand on that and use the document data to implement OVER/XOVER
quickly.
This adds a dependency on Xapian being available for nntpd usage,
but that is probably alright since nntpd is esoteric enough that anybody
willing to run nntpd will also want search functionality offered
by Xapian.
This also speeds up XHDR/HDR with the To: and Cc: headers and
:bytes/:lines article metadata used by some clients for header
displays and marking messages as read/unread.
|
|
We probably won't be supporting this in the public API
|
|
Implementing NEWNEWS, XHDR, XOVER efficiently will require
additional caching on top of msgmap.
This seems to work with lynx and slrnpull, haven't tried clients.
DO NOT run in production, yet, denial-of-service vulnerabilities
await!
|
|
DBI + DBD::SQLite has much better handling of prefix lookups
than Xapian. While we're at it, avoid linking blatantly wrong
Message-IDs to external services.
|
|
In the future, it should be possible to use this:
git ls-files | UPDATE_COPYRIGHT_HOLDER='all contributors' \
UPDATE_COPYRIGHT_USE_INTERVALS=2 \
xargs /path/to/gnulib/build-aux/update-copyright
|
|
In case a URL gets truncated (as is common with long URLs),
we can rely on Xapian for partial matches and bring the user
to their destination.
|
|
We should not need to use QueryParser for internal queries,
but rather for external ones.
We'll also be exposing searching Message-IDs with the "mid:" prefix
for broken mids on some servers, and enabling partial searching
with 'm' to help with URL truncations.
Since thread IDs may be volatile, they cannot be exposed to the
public, so there's no reason to expose them to the query parser,
either.
Also, add 's:' as an alternative probabilistic prefix to 'subject'
as it is shorter.
|
|
Perhaps this can be optionally enabled in the future for smaller
sites.
|
|
Might as well give relevance some weight if the timestamp is tied.
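
A minimal sketch of the tie-breaking idea, in Python rather than the project's own code. The tuple layout (timestamp, relevance, Message-ID) and the sample hits are hypothetical; the source only says that relevance should carry weight when timestamps are tied.

```python
# Hypothetical search hits: (timestamp, relevance, message_id).
hits = [
    (1440000000, 0.40, "a@example.com"),
    (1440000500, 0.10, "b@example.com"),
    (1440000000, 0.90, "c@example.com"),
]

def sort_hits(hits):
    """Newest first; among equal timestamps, higher relevance first."""
    # Sorting on the (timestamp, relevance) pair in reverse makes the
    # relevance score matter only when two timestamps are identical.
    return sorted(hits, key=lambda h: (h[0], h[1]), reverse=True)

ordered = sort_hits(hits)
```

Here the two hits sharing a timestamp are ordered by relevance, while the newer hit still comes first despite its low score.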
|
|
This hopefully makes it easier to find things without resorting
to proprietary external services.
|
|
We'll continue to compress long Message-IDs in URLs (which we know
about), but we will store entire Message-IDs in the Xapian database
to ease lookups in external databases.
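
A rough sketch of what compressing long identifiers for URLs could look like. The SHA-1 digest and the 40-character threshold are assumptions for illustration; the message above does not say which scheme or limit is used, and `compress_id` is a hypothetical name.

```python
import hashlib

MAX_LEN = 40  # assumed threshold, not taken from the source

def compress_id(identifier, max_len=MAX_LEN):
    """Return short identifiers unchanged; hash long ones to a fixed length."""
    if len(identifier) <= max_len:
        return identifier
    # A hex SHA-1 digest is always 40 characters, so the URL form has a
    # bounded length while the full Message-ID can be stored elsewhere.
    return hashlib.sha1(identifier.encode("utf-8")).hexdigest()
```

The compressed form is one-way, which is one reason to keep the entire Message-ID in the database for lookups.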
|
|
Like revision control history, older stuff is less relevant,
so favor newer stuff, first.
|
|
This makes dumping recent topics easier, hopefully.
|
|
Redundant document data increases our database size, so pull the
smsg->mid off the unique term, the smsg->ts off the value, and
only generate the formatted display date off smsg->ts.
|
|
We no longer need them, as we can rely on index-time thread
resolution and thread merging. This allows us to index less
data and hopefully increase efficiency.
|
|
Perl does not currently optimize for this.
ref (from p5p):
http://mid.gmane.org/D5C27970-9176-4C7A-8B99-7D78360E67A2@pobox.com
|
|
Our search query already filters out ghost messages,
so it's wasteful to have type information loaded.
|
|
We ought to summarize subjects to avoid exploding
line lengths in the web interface.
|
|
Consistently name mid_* functions as verbs.
|
|
Many of our internal search queries do not care about relevance,
but are used for proper thread displays.
|
|
Most of our special query functions require exact matches, so none
of the flags we normally use are necessary for query parsing.
|
|
This makes organization easier and reduces the amount of code
loaded for a PSGI, mod_perl or CGI instance.
|
|
This is hopefully less ambiguous, as the word "count" confused
me, too.
|
|
Since mbox is usually downloaded, support fetching infinitely large
responses via streaming.
|
|
We need proper ordering of References to thread messages
correctly. We would lose this order if we load the terms
from the database, so set it directly in the document data.
Do not bother with a separate In-Reply-To, since Mail::Thread
just merges the IRT into References. This bumps our schema
version once again.
|
|
This is for consistency with ssoma. I doubt it makes
a difference in practice, but in case somebody decides
any of the Message-ID-containing headers should have
strange characters, we'll decode and attempt to thread
them. This isn't an attack vector, just a way to
make messages thread improperly, which is pointless...
|
|
This should allow us to sync the index to a temporary head
to update the Xapian index before we update the real HEAD
index.
|
|
In "index: simplify main landing page if search-enabled",
subject normalization went a little farther to drop trailing
'.' characters, so we will need to re-index.
|
|
We do not need ghost messages in any of our thread views
|
|
Email::MIME should handle everything for us and make things
work nicely with Xapian (assuming I understand how encoding
works in Perl).
While we're at it, reduce temporary strings and arrays by
using destructive operations and clobbering parts as we
iterate through them.
|
|
We can display /t/$MESSAGE_ID.html easily with a Xapian search
index, so rely on it instead of trying to display messages inline.
|
|
The following two commits affect indexing behavior, so
change the schema version to avoid compatibility problems
or missing messages:
search: common Subject: normalization for Re: prefixes
search: avoid creating ghosts for circular References
|
|
This makes it easier to reconfigure for non-English users
|
|
Drop German ("Aw:") support since it is non-standard and
not supported by Mail::Thread. Non-English prefixes are also
more likely to conflict with the "subsection:" prefixes common in
Free Software development, where English is the common language.
Anyway, we don't filter "Vs: " (Finnish) or "Sv: "
(Norwegian, Swedish, Danish, Icelandic), either.
ref:
https://en.wikipedia.org/wiki/RE_(e-mail)#Abbreviations_in_other_languages
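
A small sketch of Re:-only normalization, written in Python for illustration and not taken from the project. The function name and the exact regular expression are assumptions; the only behavior drawn from the message above is that "Re:" is stripped while "Aw:", "Vs:" and "Sv:" are left alone.

```python
import re

# Matches one leading "Re:" (any case) plus surrounding whitespace.
_RE_PREFIX = re.compile(r"^\s*re:\s*", re.IGNORECASE)

def normalize_subject(subject):
    """Strip every leading "Re:" prefix; leave all other prefixes untouched."""
    previous = None
    # Repeat until a pass removes nothing, so stacked prefixes
    # such as "Re: RE: re:" are all dropped.
    while previous != subject:
        previous = subject
        subject = _RE_PREFIX.sub("", subject, count=1)
    return subject.strip()
```

Because the pattern is anchored at the start, a subsystem prefix such as "search: " is never mistaken for a reply marker, and "Aw: hello" passes through unchanged.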
|
|
Some mail software incorrectly creates circular references
and causes us to create ghosts before the actual mail doc
is created.
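
One way to picture the guard, as a hedged Python sketch. It assumes "circular" means a message listing its own Message-ID (or the same ID twice) in References; the function name and that interpretation are not confirmed by the source.

```python
def clean_references(mid, references):
    """Drop the message's own ID and repeated IDs, keeping first-seen order."""
    seen = {mid}  # seeding with our own ID filters self-references
    cleaned = []
    for ref in references:
        if ref in seen:
            continue
        seen.add(ref)
        cleaned.append(ref)
    return cleaned
```

With the self-reference filtered out up front, no placeholder would need to be created for a message that is about to be indexed anyway.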
|
|
There's no need to make a transaction for each message when doing
incremental indexing against a git repository. While we're at it,
simplify the interface for callers, too, and do not auto-create
the Xapian database if it was not explicitly enabled.
|
|
commit 0fea7793b22efd2596983283947ee43687e0cfac
("mid: compress Message-IDs with '%' in them")
requires re-indexing of repositories with '%' in Message-IDs :<
|
|
Otherwise we'll be wasting space in our index for long
subjects.
|
|
This should be less error-prone in case somebody tries to screw with
us and our thread_id mechanism or somehow waste our resources.
Unfortunately Mail::Thread isn't smart enough for this, yet, so we
may need to downgrade to Email::Simple objects as a workaround.
Or simply not worry about the display so much if somebody is
intentionally trying to make it thread badly/incorrectly.
|
|
Replies are only direct replies, but followups could be any message
further down the thread. The latter is more useful.
|
|
Quick-and-dirty wiring up of Subject: paths.
This may prove more memorable and easier to share than
/t/$MESSAGE_ID.html links, but less strict.
This changes our schema version to 1, since we now
use lower-case subject paths.
|
|
SearchMsg calls it with the full module path anyways.
|
|
This will relieve callers of the need to decode the data
we store internally in Xapian
|
|
Quit repeating ourselves and use a common MID module
instead.
|
|
We need to make the indexer executable and installable
while we're at it.
|
|
This shall allow us to search for replies/threads more easily.
|