Date | Commit message |
|
We should not need to use QueryParser for internal queries,
but rather for external ones.
We'll also be exposing searching Message-IDs with the "mid:" prefix
for broken mids on some servers, and enabling partial searching
with 'm' to help with URL truncations.
Since thread IDs may be volatile and cannot be exposed to the
public, there's no reason to expose them to the query parser,
either.
Also, add 's:' as an alternative probabilistic prefix to 'subject'
as it is shorter.
|
|
Perhaps this can be optionally enabled in the future for smaller
sites.
|
|
Might as well give relevance some weight if the timestamp is tied.
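
A minimal sketch of that tie-break in Python, not the project's actual
code: hits are ordered newest first, and relevance only decides between
hits whose timestamps are equal. The `Hit` type and its fields are
hypothetical names used for illustration.

```python
from typing import NamedTuple

class Hit(NamedTuple):
    mid: str          # Message-ID (hypothetical field name)
    ts: int           # timestamp, seconds since the epoch
    relevance: float  # higher means more relevant

def sort_hits(hits):
    # Newest first; relevance only matters when timestamps are tied.
    return sorted(hits, key=lambda h: (-h.ts, -h.relevance))
```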
|
|
This hopefully makes it easier to find things without resorting
to proprietary external services.
|
|
We'll continue to compress long Message-IDs in URLs (which we know
about), but we will store entire Message-IDs in the Xapian database
to facilitate ease-of-lookups in external databases.
|
|
Like revision control history, older stuff is less relevant,
so favor newer stuff, first.
|
|
This makes dumping recent topics easier, hopefully.
|
|
Redundant document data increases our database size, so pull the
smsg->mid off the unique term, the smsg->ts off the value, and
only generate the formatted display date off smsg->ts.
|
|
We no longer need them, as we can rely on index-time thread
resolution and thread merging. This allows us to index less
data and hopefully increase efficiency.
|
|
Perl does not currently optimize for this.
ref (from p5p):
http://mid.gmane.org/D5C27970-9176-4C7A-8B99-7D78360E67A2@pobox.com
|
|
Our search query already filters out ghost messages,
so it's wasteful to have type information loaded.
|
|
We ought to summarize subjects to avoid exploding
line lengths in the web interface.
|
|
Consistently name mid_* functions as verbs.
|
|
Many of our internal search queries do not care about relevance,
but are used for proper thread displays.
|
|
Most of our special query functions require exact matches, so none
of the flags we normally use are necessary for query parsing.
|
|
This makes organization easier and reduces the amount of code
loaded for a PSGI, mod_perl or CGI instance.
|
|
This is hopefully less ambiguous, as the word "count" confused
me, too.
|
|
Since mbox is usually downloaded, support fetching infinitely large
responses via streaming.
|
|
We need the References in their proper order to thread messages
correctly. We would lose this order if we load the terms
from the database, so set it directly in the document data.
Do not bother with a separate In-Reply-To, since Mail::Thread
just merges the IRT into References. This bumps our schema
version once again.
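
A rough Python sketch of the idea, under the assumption that merging
means appending the In-Reply-To IDs after the References IDs and dropping
duplicates while keeping first-seen order. `ordered_references` and
`MID_RE` are hypothetical names, not part of the indexer.

```python
import re

# Matches the content between angle brackets, e.g. <id@example.com>
MID_RE = re.compile(r"<([^<>\s]+)>")

def ordered_references(references, in_reply_to=""):
    """Return Message-IDs in header order, In-Reply-To merged in last."""
    seen = set()
    ordered = []
    for mid in MID_RE.findall(references) + MID_RE.findall(in_reply_to):
        if mid not in seen:  # keep only the first occurrence
            seen.add(mid)
            ordered.append(mid)
    return ordered
```

Storing the resulting list as one ordered string keeps the order that an
unordered set of index terms would lose.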
|
|
This is for consistency with ssoma. I doubt it makes
a difference in practice, but in case somebody decides
any of the Message-ID-containing headers should have
strange characters, we'll decode and attempt to thread
them. This isn't an attack vector, just a way to
make messages thread improperly which is pointless...
|
|
This should allow us to sync the index to a temporary head
to update the Xapian index before we update the real HEAD
index.
|
|
In "index: simplify main landing page if search-enabled",
subject normalization went a little farther to drop trailing
'.' characters, so we will need to re-index.
|
|
We do not need ghost messages in any of our thread views
|
|
Email::MIME should handle everything for us and make things
work nicely with Xapian (assuming I understand how encoding
works in Perl).
While we're at it, reduce temporary strings and arrays by
using destructive operations and clobbering parts as we
iterate through them.
|
|
We can display /t/$MESSAGE_ID.html easily with a Xapian search
index, so rely on it instead of trying to display messages inline.
|
|
The following two commits affect indexing behavior, so
change the schema version to avoid compatibility problems
or missing messages:
search: common Subject: normalization for Re: prefixes
search: avoid creating ghosts for circular References
|
|
This makes it easier to reconfigure for non-English users
|
|
Drop German ("Aw:") support since it is non-standard and not
supported by Mail::Thread. Non-English prefixes are also more
likely to conflict with the "subsection:" prefixes common in Free
Software development, where English is the common language.
Anyways we don't filter "Vs: " (Finnish) or "Sv: "
(Norwegian, Swedish, Danish, Icelandic), either.
ref:
https://en.wikipedia.org/wiki/RE_(e-mail)#Abbreviations_in_other_languages
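
For illustration only, a Python sketch of normalization that strips
English "Re:" prefixes and leaves everything else alone, so that
"subsection:" style prefixes survive. The function name and the exact
regular expression are assumptions, not the project's implementation.

```python
import re

# One or more leading "Re:" prefixes, case-insensitive, anchored at the start
RE_PREFIX = re.compile(r"^(?:re:\s*)+", re.IGNORECASE)

def normalize_subject(subject):
    # Only "Re:" is stripped; "Aw:", "Sv:", "Vs:" and the like are kept.
    return RE_PREFIX.sub("", subject.strip())
```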
|
|
Some mail software incorrectly creates circular references
and causes us to create ghosts before the actual mail doc
is created.
|
|
There's no need to make a transaction for each message when doing
incremental indexing against a git repository. While we're at it,
simplify the interface for callers, too, and do not auto-create
the Xapian database if it was not explicitly enabled.
|
|
commit 0fea7793b22efd2596983283947ee43687e0cfac
("mid: compress Message-IDs with '%' in them")
requires re-indexing of repositories with '%' in Message-IDs :<
|
|
Otherwise we'll be wasting space in our index for long
subjects.
|
|
This should be less error-prone in case somebody tries to screw with
us and our thread_id mechanism or somehow waste our resources.
Unfortunately Mail::Thread isn't smart enough for this, yet, so we
may need to downgrade to Email::Simple objects as a workaround.
Or simply not worry about the display so much if somebody is
intentionally trying to make it thread badly/incorrectly.
|
|
Replies are only direct replies, but followups could be any message
further down the thread. The latter is more useful.
|
|
Quick-and-dirty wiring up of Subject: paths.
This may prove more memorable and easier to share than
/t/$MESSAGE_ID.html links, but less strict.
This changes our schema version to 1, since we now
use lower-case subject paths.
|
|
SearchMsg calls it with the full module path anyways.
|
|
This will relieve callers of the need to decode the data
we store internally in Xapian
|
|
Quit repeating ourselves and use a common MID module
instead.
|
|
We need to make the indexer executable and installable
while we're at it.
|
|
This shall allow us to search for replies/threads more easily.
|