public-inbox.git - an "archives first" approach to mailing lists

Date	Commit message
2015-11-20	various internal documentation updates
	Hopefully this gives new hackers a better overview of how the components relate to each other.
2015-10-02	rename mid_compress to id_compress
	We use it as a general compressor for identifiers such as subject paths, so using the "mid_" prefix probably is not appropriate.
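	The commit does not spell out the algorithm, so the following is only a sketch of an identifier compressor of this kind. It assumes that short identifiers made of URL-safe characters pass through unchanged and that everything else is replaced by its SHA-1 hex digest. The 40-character limit, the safe-character set and the `force` flag are assumptions for illustration.

```python
import hashlib
import re

# Hypothetical limit and "safe" character class, chosen for illustration only.
ID_MAX = 40
_UNSAFE = re.compile(r"[^\w\-+$.@]")


def id_compress(ident: str, force: bool = False) -> str:
    """Return ident unchanged if short and URL-safe, else its SHA-1 hex digest."""
    if not force and len(ident) <= ID_MAX and not _UNSAFE.search(ident):
        return ident
    # A fixed-length digest keeps long Message-IDs or subject paths bounded.
    return hashlib.sha1(ident.encode("utf-8")).hexdigest()
```

	With these assumptions, id_compress("20150920.abc@example.com") is returned unchanged, while a 41-character identifier becomes a 40-character hex digest.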
2015-09-30	nntp: implement OVER/XOVER summary in search document
	The document data of a search message already contains a good chunk of the information needed to respond to OVER/XOVER commands quickly. Expand on that and use the document data to implement OVER/XOVER quickly. This adds a dependency on Xapian being available for nntpd usage, but is probably alright since nntpd is esoteric enough that anybody willing to run nntpd will also want search functionality offered by Xapian. This also speeds up XHDR/HDR with the To: and Cc: headers and :bytes/:lines article metadata used by some clients for header displays and marking messages as read/unread.
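	A rough sketch of turning stored document data into an overview line. It assumes the fields needed for the RFC 3977 default overview format (Subject, From, Date, Message-ID, References, :bytes, :lines) are already available from the search document. The `DocData` record and its field names are hypothetical, not the actual search document layout.

```python
import re
from dataclasses import dataclass


@dataclass
class DocData:
    """Hypothetical stand-in for the per-message data kept in a search document."""
    num: int          # article number within the group
    subject: str
    from_: str
    date: str
    mid: str          # Message-ID stored without angle brackets
    references: str
    nbytes: int
    nlines: int


_WS = re.compile(r"[\t\r\n]+")


def _clean(value: str) -> str:
    # Overview fields are TAB-separated, so embedded tabs and line breaks
    # must not survive; collapse each run of them to a single space.
    return _WS.sub(" ", value)


def xover_line(d: DocData) -> str:
    """Format one OVER/XOVER line in the RFC 3977 default field order."""
    fields = [d.subject, d.from_, d.date, f"<{d.mid}>", d.references]
    return "\t".join([str(d.num)] + [_clean(f) for f in fields]
                     + [str(d.nbytes), str(d.nlines)])
```

	Because everything comes from the document data, no message blob has to be read from git to answer OVER/XOVER.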
2015-09-30	search: remove get_subject_path
	We probably won't be supporting this in the public API.
2015-09-18	read-only NNTP server
	Implementing NEWNEWS, XHDR, and XOVER efficiently will require additional caching on top of msgmap. This seems to work with lynx and slrnpull; I haven't tried other clients. DO NOT run in production yet; denial-of-service vulnerabilities await!
2015-09-15	extmsg: wire up to use msgmap for prefixes
	DBI + DBD::SQLite has much better handling of prefix lookups than Xapian. While we're at it, avoid linking blatantly wrong Message-IDs to external services.
2015-09-06	update copyright headers and email addresses
	In the future, it should be possible to use this:
	  git ls-files | UPDATE_COPYRIGHT_HOLDER='all contributors' \
	    UPDATE_COPYRIGHT_USE_INTERVALS=2 \
	    xargs /path/to/gnulib/build-aux/update-copyright
2015-09-05	extmsg: fall back to partial Message-ID matching
	In case a URL gets truncated (as is common with long URLs), we can rely on Xapian for partial matches and bring the user to their destination.
2015-09-05	search: tweak parsing for internal queries
	We should not need to use QueryParser for internal queries, only for external ones. We'll also be exposing Message-ID searches with the "mid:" prefix (for broken mids on some servers), and enabling partial searching with 'm' to help with URL truncations. Since thread IDs may be volatile, they cannot be exposed to the public, so there's no reason to expose them to the query parser, either. Also, add 's:' as a shorter alternative probabilistic prefix to 'subject'.
2015-09-05	search: note why we do not support FLAG_PURE_NOT
	Perhaps this can be optionally enabled in the future for smaller sites.
2015-09-05	search: use relevance as secondary sort by default
	Might as well give relevance some weight if the timestamp is tied.
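	The ordering this produces can be sketched in plain Python, assuming timestamps are sorted in descending order (newest first) and relevance only breaks ties. In Xapian itself this kind of ordering is requested through the Enquire sort settings (e.g. set_sort_by_value_then_relevance); the `Hit` record below is hypothetical.

```python
from typing import List, NamedTuple


class Hit(NamedTuple):
    """Hypothetical search hit: Message-ID, Unix timestamp and relevance score."""
    mid: str
    ts: int
    relevance: float


def sort_hits(hits: List[Hit]) -> List[Hit]:
    """Newest first; ties on timestamp are broken by higher relevance."""
    # Negating both keys lets a single ascending sort give descending order.
    return sorted(hits, key=lambda h: (-h.ts, -h.relevance))
```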
2015-09-05	view: preliminary HTML search interface
	This hopefully makes it easier to find things without resorting to proprietary external services.
2015-09-03	search: disable Message-ID compression in Xapian
	We'll continue to compress long Message-IDs in URLs (which we know about), but we will store entire Message-IDs in the Xapian database to ease lookups in external databases.
2015-09-01	search: show newest results first
	As with revision control history, older stuff is less relevant, so favor newer stuff first.
2015-09-01	search: allow querying all mail with ''
	This makes dumping recent topics easier, hopefully.
2015-09-01	search: reduce redundant doc data
	Redundant document data increases our database size, so pull smsg->mid from the unique term and smsg->ts from the value, and only generate the formatted display date from smsg->ts.
2015-08-30	search: do not index references and inreplyto terms
	We no longer need them, as we can rely on index-time thread resolution and thread merging. This allows us to index less data and hopefully increase efficiency.
2015-08-29	avoid length in boolean context
	Perl does not currently optimize for this. ref (from p5p): http://mid.gmane.org/D5C27970-9176-4C7A-8B99-7D78360E67A2@pobox.com
2015-08-28	search: do not load type into metadata
	Our search query already filters out ghost messages, so it's wasteful to have type information loaded.
2015-08-25	search: implement subject summarization
	We ought to summarize subjects to avoid exploding line lengths in the web interface.
2015-08-25	mid: mid_compressed => mid_compress
	Consistently name mid_* functions as verbs.
2015-08-25	search: only sort by relevance if requested
	Many of our internal search queries, such as those used for proper thread displays, do not care about relevance.
2015-08-22	search: consistently pass options and flags
	Most of our special query functions require exact matches, so none of the flags we normally use are necessary for query parsing.
2015-08-22	search: split search indexing to a separate file
	This makes organization easier and reduces the amount of code loaded for a PSGI, mod_perl or CGI instance.
2015-08-21	search: s/count/total/ for results
	This is hopefully less ambiguous, as the word "count" confused me, too.
2015-08-21	mbox: stream entire thread, regardless of size
	Since mbox is usually downloaded, support fetching infinitely large responses via streaming.
2015-08-20	search: preserve References: order in document data
	We need proper ordering of References to thread messages correctly. We would lose this order if we loaded the terms from the database, so store it directly in the document data. Do not bother with a separate In-Reply-To, since Mail::Thread just merges the IRT into References. This bumps our schema version once again.
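	A minimal sketch of extracting References in header order and folding In-Reply-To into it, assuming plain <...> Message-ID tokens. The function name is hypothetical; dropping the message's own Message-ID is an extra precaution against self-referencing mail, not something this commit describes.

```python
import re
from typing import List

_MSGID = re.compile(r"<([^<>\s]+)>")


def thread_refs(references: str, in_reply_to: str = "", own_mid: str = "") -> List[str]:
    """Return referenced Message-IDs in header order, oldest ancestor first.

    In-Reply-To is appended only when References does not already name it,
    so it ends up as the nearest parent. Duplicates and self-references are
    dropped.
    """
    seen = set()
    refs = []
    for mid in _MSGID.findall(references) + _MSGID.findall(in_reply_to):
        if mid == own_mid or mid in seen:
            continue
        seen.add(mid)
        refs.append(mid)
    return refs
```

	Storing this list as-is in the document data keeps the order intact, whereas terms read back from the index would not preserve it.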
2015-08-20	avoid using header_raw for Message-ID retrieval
	This is for consistency with ssoma. I doubt it makes a difference in practice, but in case somebody decides any of the Message-ID-containing headers should have strange characters, we'll decode and attempt to thread them. This isn't an attack vector, just a way to make messages thread improperly which is pointless...
2015-08-20	search: index_sync allows specifying alternate HEAD
	This should allow us to sync the index to a temporary head to update the Xapian index before we update the real HEAD index.
2015-08-20	search: bump schema version to 5 for subject_path
	In "index: simplify main landing page if search-enabled", subject normalization went a little farther to drop trailing '.' characters, so we will need to re-index.
2015-08-20	search: reject ghosts in all cases
	We do not need ghost messages in any of our thread views.
2015-08-20	search: avoid needless decode
	Email::MIME should handle everything for us and make things work nicely with Xapian (assuming I understand how encoding works in Perl). While we're at it, reduce temporary strings and arrays by using destructive operations and clobbering parts as we iterate through them.
2015-08-20	index: simplify main landing page if search-enabled
	We can display /t/$MESSAGE_ID.html easily with a Xapian search index, so rely on it instead of trying to display messages inline.
2015-08-18	search: bump SCHEMA_VERSION to 4
	The following two commits affect indexing behavior, so change the schema version to avoid compatibility problems or missing messages:
	  search: common Subject: normalization for Re: prefixes
	  search: avoid creating ghosts for circular References
2015-08-18	search: expose $PublicInbox::Search::LANG variable
	This makes it easier to reconfigure for non-English users.
2015-08-18	search: common Subject: normalization for Re: prefixes
	Drop German ("Aw:") support, since it's non-standard and is not supported by Mail::Thread. Non-English prefixes are also more likely to conflict with prefixes used in Free Software development, where "subsection:" prefixes are common and English is the common language. Anyway, we don't filter "Vs: " (Finnish) or "Sv: " (Norwegian, Swedish, Danish, Icelandic), either. ref: https://en.wikipedia.org/wiki/RE_(e-mail)#Abbreviations_in_other_languages
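	A minimal sketch of this normalization, assuming only repeated, case-insensitive "Re:" prefixes are stripped. The regular expression is an illustration, not the project's actual pattern.

```python
import re

# Only the English "Re:" prefix (any case, possibly repeated) is stripped;
# "Aw:", "Sv:", "Vs:" and project prefixes such as "search:" are left alone.
_RE_PREFIX = re.compile(r"\A(?:re:\s*)+", re.IGNORECASE)


def normalize_subject(subject: str) -> str:
    """Strip leading Re: prefixes and surrounding whitespace."""
    s = _RE_PREFIX.sub("", subject.strip())
    return s.strip()
```

	Requiring the colon directly after "re" keeps subjects such as "Regarding: x" intact.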
2015-08-18	search: avoid creating ghosts for circular References
	Some mail software incorrectly creates circular references and causes us to create ghosts before the actual mail doc is created.
2015-08-17	search: simplify indexing operation
	There's no need to make a transaction for each message when doing incremental indexing against a git repository. While we're at it, simplify the interface for callers, too, and do not auto-create the Xapian database if it was not explicitly enabled.
2015-08-17	search: bump schema version for '%' compression change
	commit 0fea7793b22efd2596983283947ee43687e0cfac ("mid: compress Message-IDs with '%' in them") requires re-indexing of repositories with '%' in Message-IDs :<
2015-08-17	search: apply mid_compression to subject paths, too
	Otherwise we'll be wasting space in our index for long subjects.
2015-08-17	search: use raw headers without MIME decoding
	This should be less error-prone in case somebody tries to screw with us and our thread_id mechanism or somehow waste our resources. Unfortunately Mail::Thread isn't smart enough for this, yet, so we may need to downgrade to Email::Simple objects as a workaround. Or simply not worry about the display so much if somebody is intentionally trying to make it thread badly/incorrectly.
2015-08-17	terminology: replies => followups
	Replies are only direct replies, but followups could be any message further down the thread. The latter is more useful.
2015-08-16	implement /s/$SUBJECT_PATH.html lookups
	Quick-and-dirty wiring up of Subject: paths. These may prove more memorable and easier to share than /t/$MESSAGE_ID.html links, but are less strict. This changes our schema version to 1, since we now use lower-case subject paths.
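	A hypothetical sketch of deriving a lower-case subject path from an already-normalized subject. The character rules (runs of non-word characters become '-', leading and trailing '-' are trimmed) are assumptions for illustration, not the project's actual mapping.

```python
import re


def subject_path(normalized_subject: str) -> str:
    """Map a normalized subject to a lower-case, URL-friendly path component."""
    s = normalized_subject.lower()
    # Collapse each run of non-word characters (spaces, colons, dots) to '-'.
    s = re.sub(r"\W+", "-", s)
    return s.strip("-")
```

	For example, subject_path("Hello World") gives "hello-world", which would be served as /s/hello-world.html.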
2015-08-16	search: remove unnecessary xpfx export
	SearchMsg calls it with the full module path anyway.
2015-08-15	search: make search results more OO
	This will relieve callers of the need to decode the data we store internally in Xapian.
2015-08-15	extract redundant Message-ID handling code
	Quit repeating ourselves and use a common MID module instead.
2015-08-15	search: implement index_sync to fixup indexer
	We need to make the indexer executable and installable while we're at it.
2015-08-13	initial search backend implementation
	This shall allow us to search for replies/threads more easily.