path: root/lib/PublicInbox/Search.pm
Date Commit message
2015-08-25 search: only sort by relevance if requested
Many of our internal search queries do not care about relevance, but is used for proper thread displays.
2015-08-22 search: consistently pass options and flags
Most of our special query functions require exact matches, so none of the flags we normally use are necessary for query parsing.
2015-08-22 search: split search indexing to a separate file
This makes organization easier and reduces the amount of code loaded for a PSGI, mod_perl or CGI instance.
2015-08-21 search: s/count/total/ for results
This is hopefully less ambiguous, as the word "count" confused me, too.
2015-08-21 mbox: stream entire thread, regardless of size
Since mbox is usually downloaded, support fetching infinitely large responses via streaming.
2015-08-20 search: preserve References: order in document data
We need proper ordering of References to thread messages correctly. We would lose this order if we load the terms from the database, so set it directly in document data. Do not bother with a separate In-Reply-To, since Mail::Thread just merges the IRT into References. This bumps our schema version once again.
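As an illustration only (this is not the project's Perl code), the idea of keeping References in header order and folding In-Reply-To into the same list could be sketched in Python as follows. The function name `ordered_references` and the choice to append the In-Reply-To ID last and drop repeated IDs are assumptions made for this sketch, not behavior confirmed by the commit.

```python
import re

# Hypothetical helper: extracts Message-IDs such as <id@host> from a header value.
_MID = re.compile(r"<([^<>\s]+)>")

def ordered_references(references, in_reply_to=None):
    """Return Message-IDs in header order, with In-Reply-To merged in last.

    Duplicates are dropped while keeping the first occurrence, so the
    resulting list still reflects the order the sender wrote.
    """
    mids = _MID.findall(references or "")
    if in_reply_to:
        mids += _MID.findall(in_reply_to)
    seen = set()
    ordered = []
    for mid in mids:
        if mid not in seen:
            seen.add(mid)
            ordered.append(mid)
    return ordered
```

Storing a list like this verbatim, rather than rebuilding it from an unordered set of index terms, is what keeps the ancestor chain intact for threading.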
2015-08-20 avoid using header_raw for Message-ID retrieval
This is for consistency with ssoma. I doubt it makes a difference in practice, but in case somebody decides any of the Message-ID-containing headers should have strange characters, we'll decode and attempt to thread them. This isn't an attack vector, just a way to make messages thread improperly which is pointless...
2015-08-20 search: index_sync allows specifying alternate HEAD
This should allow us to sync the index to a temporary head to update the Xapian index before we update the real HEAD index.
2015-08-20 search: bump schema version to 5 for subject_path
In "index: simplify main landing page if search-enabled", subject normalization went a little farther to drop trailing '.' characters, so we will need to re-index.
2015-08-20 search: reject ghosts in all cases
We do not need ghost messages in any of our thread views
2015-08-20 search: avoid needless decode
Email::MIME should handle everything for us and make things work nicely with Xapian (assuming I understand how encoding works in Perl). While we're at it, reduce temporary strings and arrays by using destructive operations and clobbering parts as we iterate through them.
2015-08-20 index: simplify main landing page if search-enabled
We can display /t/$MESSAGE_ID.html easily with a Xapian search index, so rely on it instead of trying to display messages inline.
2015-08-18 search: bump SCHEMA_VERSION to 4
The following two commits affect indexing behavior, so change the schema version to avoid compatibility problems or missing messages:
  search: common Subject: normalization for Re: prefixes
  search: avoid creating ghosts for circular References
2015-08-18 search: expose $PublicInbox::Search::LANG variable
This makes it easier to reconfigure for non-English users
2015-08-18 search: common Subject: normalization for Re: prefixes
Drop German ("Aw:") support since it's non-standard and is not supported by Mail::Thread and non-English prefixes are more likely to conflict with prefixes used in Free Software development where ("subsection:") prefixes are common and English is the common language. Anyways we don't filter "Vs: " (Finnish) or "Sv: " (Norwegian, Swedish, Danish, Icelandic), either. ref: https://en.wikipedia.org/wiki/RE_(e-mail)#Abbreviations_in_other_languages
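A minimal Python sketch of the kind of prefix handling described here, assuming only leading "Re:" prefixes (in any letter case, possibly repeated) are stripped and everything else is left alone. The name `strip_re_prefixes` is hypothetical, and the project's actual Perl normalization may differ in its details.

```python
import re

# Matches a single leading "Re:" in any letter case, with optional whitespace.
_RE_PREFIX = re.compile(r"^\s*re:\s*", re.IGNORECASE)

def strip_re_prefixes(subject):
    """Remove every leading "Re:" prefix; leave other prefixes untouched."""
    prev = None
    while prev != subject:
        prev = subject
        subject = _RE_PREFIX.sub("", subject, count=1)
    return subject.strip()
```

Because only a leading "Re:" is matched, a "subsection:" prefix such as "search:" survives, as do non-English reply prefixes like "Aw:".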
2015-08-18 search: avoid creating ghosts for circular References
Some mail software incorrectly creates circular references and causes us to create ghosts before the actual mail doc is created.
2015-08-17 search: simplify indexing operation
There's no need to make a transaction for each message when doing incremental indexing against a git repository. While we're at it, simplify the interface for callers, too and do not auto-create the Xapian database if it was not explicitly enabled.
2015-08-17 search: bump schema version for '%' compression change
commit 0fea7793b22efd2596983283947ee43687e0cfac ("mid: compress Message-IDs with '%' in them") requires re-indexing of repositories with '%' in Message-IDs :<
2015-08-17 search: apply mid_compression to subject paths, too
Otherwise we'll be wasting space in our index for long subjects.
2015-08-17 search: use raw headers without MIME decoding
This should be less error-prone in case somebody tries to screw with us and our thread_id mechanism or somehow waste our resources. Unfortunately Mail::Thread isn't smart enough for this, yet, so we may need to downgrade to Email::Simple objects as a workaround. Or simply not worry about the display so much if somebody is intentionally trying to make it thread badly/incorrectly.
2015-08-17 terminology: replies => followups
Replies are only direct replies, but followups could be any message further down the thread. The latter is more useful.
2015-08-16 implement /s/$SUBJECT_PATH.html lookups
Quick-and-dirty wiring up of Subject: paths. This may prove more memorable and easier to share than /t/$MESSAGE_ID.html links, but less strict. This changes our schema version to 1, since we now use lower-case subject paths.
2015-08-16 search: remove unnecessary xpfx export
SearchMsg calls it with the full module path anyways.
2015-08-15 search: make search results more OO
This will relieve callers of the need to decode the data we store internally in Xapian
2015-08-15 extract redundant Message-ID handling code
Quit repeating ourselves and use a common MID module instead.
2015-08-15 search: implement index_sync to fixup indexer
We need to make the indexer executable and installable while we're at it.
2015-08-13 initial search backend implementation
This shall allow us to search for replies/threads more easily.