public-inbox.git - an "archives first" approach to mailing lists

Date	Commit message
2017-02-10	search: remove unnecessary abstractions and functionality
	This simplifies the code a bit and reduces the translation overhead for looking directly at data from tools shipped with Xapian. While we're at it, fix thread-all.t :)
2017-01-07	search: remove subject_summary
	Apparently it never actually got used, and the world seems fine without it, so we can drop it. While we're at it, consider removing our subject_path usage from existence, too. We are not using fancy subject-line based URLs, here.
2017-01-07	searchmsg: favor direct hash access over accessor methods
	This is faster, smaller, and more straightforward to me with fewer layers of indirection.
2017-01-07	remove incorrect comment about strftime + locales
	We only need strftime to be locale-independent when generating dates for email and HTTP headers. Purely numeric dates can use strftime for ease-of-readability.
2016-12-20	searchmsg: remove ensure_metadata
	Instead, only preload the ->mid field for threading, as we only need ->thread and ->path once in Search->get_thread (but we will need the ->mid field repeatedly). This more than doubles View->load_results performance according to thread-all on an inbox with over 300K messages.
2016-12-17	searchmsg: do not memoize {date} field
	We only generate the ->date once in NNTP, so creating the hash entry is a waste.
2016-12-17	searchmsg: remove locale-dependency for ->date
	strftime is locale-dependent, which can cause surprising failures for some users.
2016-12-13	searchmsg: remove unused EPOCH_822 constant
	This hasn't been needed since our Email::Abstract removal for message threading.
2016-10-05	thread: remove Email::Abstract wrapping
	This roughly doubles performance due to the reduction in object creation and abstraction layers.
2016-08-04	searchmsg: add git object ID to doc_data
	Doing git tree lookups based on the SHA-1 of the Message-ID is expensive as trees get larger; instead, use the SHA-1 object ID directly. This drastically reduces the amount of time spent in the "git cat-file --batch" process for fetching the /$INBOX/all.mbox.gz endpoint on the ~800MB git@vger.kernel.org mirror. This retains backwards compatibility and allows existing indices to be transparently upgraded without performance degradation.
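A minimal sketch of the path-based lookup this commit moves away from, written in Python rather than the project's Perl. The two-character directory plus 38-character file layout and the name `mid_to_path` are assumptions for illustration, not taken from this log.

```python
import hashlib

def mid_to_path(mid):
    # Assumed layout: SHA-1 hex of the Message-ID, split as "xx/<38 hex chars>".
    # Resolving such a path means walking git trees, which grows more
    # expensive as the trees get larger.
    digest = hashlib.sha1(mid.encode("utf-8")).hexdigest()
    return digest[:2] + "/" + digest[2:]

# Hypothetical Message-ID, used only to show the shape of the result.
path = mid_to_path("20160804-example@example.com")
```

With the blob's object ID stored in the document data, that ID can be passed to "git cat-file --batch" as-is, so no tree walk is needed.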
2016-06-25	address: remove Address::from_name
	Address::names is sufficient to handle what from_name did.
2016-06-20	www: improve topic view by scanning for ghosts
	This should help avoid having too many fake top-level messages in the topic view since we only have a partial window for threading results.
2016-05-30	use utf8::{encode,decode} for in-place transforms
	No need to duplicate the string when transforming it; learned from studying SpamAssassin 3.4.1.
2016-05-29	searchmsg: all timestamps stored in Xapian are UTC
	We cannot have strftime using the local timezone for %z. This fixes output when a server is not running UTC.
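A minimal sketch of the idea, in Python rather than the project's Perl: format a stored UTC epoch timestamp without consulting the server's timezone or locale. The function name `utc_date_header` is hypothetical.

```python
from email.utils import formatdate

def utc_date_header(ts):
    # localtime=False formats the epoch timestamp in UTC no matter what
    # timezone the server runs in; formatdate also uses fixed English
    # day and month names, so the result does not depend on the locale.
    return formatdate(ts, localtime=False)

epoch_header = utc_date_header(0)
```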
2016-05-25	remove Email::Address dependency
	git has stricter requirements for ident names (no '<>') which Email::Address allows. Even in 1.908, Email::Address also has an incomplete fix for CVE-2015-7686 with a DoS-able regexp for comments. Since we don't care for or need all the RFC compliance of Email::Address, avoiding it entirely may be preferable. Email::Address will still be installed as a requirement for Email::MIME, but it is only used by Email::MIME::header_str_set, which we do not use.
2016-04-30	searchmsg: ensure long subject lines are not broken
	Noticed when using a long URL in the subject.
2016-03-12	searchmsg: preserve hard tabs, but drop CR (\r)
	Hard tabs may be searchable, so preserve them since they do not take up any more space than a normal space. However, CR (carriage return) is worthless and likely a sign of a buggy mail (or spam) client anyways.
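A small Python sketch of the normalization described above (the project itself is Perl, and the function name `strip_cr` is hypothetical): carriage returns are removed while hard tabs pass through untouched.

```python
def strip_cr(text):
    # Remove every CR; tabs and all other characters are left as-is.
    return text.replace("\r", "")

cleaned = strip_cr("col1\tcol2\r\nnext line\r\n")
```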
2016-03-03	use raw header for Message-ID
	Message-IDs should not be MIME encoded, but in case they are, use the raw form for compatibility with ssoma and possibly other tools. This prevents a potential problem where a malicious client could confuse our storage layer into indexing incorrect contents.
2016-02-28	searchmsg: update + fix license header
	Not sure how, but this should've always been AGPL-3.0+ like the rest of the code, not GPL-3.0+
2015-11-20	various internal documentation updates
	Hopefully this gives new hackers a better overview of how the components relate to each other.
2015-09-30	nntp: implement OVER/XOVER summary in search document
	The document data of a search message already contains a good chunk of the information needed to respond to OVER/XOVER commands quickly. Expand on that and use the document data to implement OVER/XOVER quickly. This adds a dependency on Xapian being available for nntpd usage, but is probably alright since nntpd is esoteric enough that anybody willing to run nntpd will also want search functionality offered by Xapian. This also speeds up XHDR/HDR with the To: and Cc: headers and :bytes/:lines article metadata used by some clients for header displays and marking messages as read/unread.
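As a rough illustration, in Python rather than the project's Perl, an OVER/XOVER response line can be assembled from fields already held in the document data. The dictionary keys and the name `overview_line` are assumptions; the field order follows the default overview format in RFC 3977.

```python
OVERVIEW_FIELDS = ("subject", "from", "date", "message-id",
                   "references", "bytes", "lines")

def overview_line(article_number, doc):
    def clean(value):
        # Fields are tab-separated, so tabs, CRs and newlines inside a
        # field are flattened before joining.
        text = str(value).replace("\r", "")
        return text.replace("\t", " ").replace("\n", " ")
    parts = [str(article_number)]
    parts += [clean(doc.get(name, "")) for name in OVERVIEW_FIELDS]
    return "\t".join(parts)

# Hypothetical document data for one message.
line = overview_line(1, {
    "subject": "hello\tworld",
    "from": "A U Thor <a@example.com>",
    "date": "Wed, 30 Sep 2015 00:00:00 +0000",
    "message-id": "<1@example.com>",
    "bytes": 1234,
    "lines": 10,
})
```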
2015-09-19	nntp: implement XROVER, speed up XHDR for some cases
	Using Xapian allows us to implement XROVER without forking new processes.
2015-09-06	update copyright headers and email addresses
	In the future, it should be possible to use this: git ls-files | UPDATE_COPYRIGHT_HOLDER='all contributors' UPDATE_COPYRIGHT_USE_INTERVALS=2 xargs /path/to/gnulib/build-aux/update-copyright
2015-09-04	SearchMsg: avoid encoding Message-IDs
	Spaces may be added when using header_str with Email::MIME->create, so use the normal "header" parameter when setting Message-IDs and References.
2015-09-03	search: disable Message-ID compression in Xapian
	We'll continue to compress long Message-IDs in URLs (which we know about), but we will store entire Message-IDs in the Xapian database to facilitate ease-of-lookups in external databases.
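A hedged sketch of what compressing only for URLs might look like, in Python rather than the project's Perl. The 40-character threshold, the SHA-1 choice, and the helper name are assumptions for illustration; the log does not state them.

```python
import hashlib

MID_MAX = 40  # assumed threshold, not taken from this log

def mid_compress_for_url(mid):
    # Short Message-IDs are used verbatim; long ones are replaced by a
    # fixed-length hex digest so URLs stay a manageable length.
    if len(mid) <= MID_MAX:
        return mid
    return hashlib.sha1(mid.encode("utf-8")).hexdigest()

short = mid_compress_for_url("1@example.com")
long_id = mid_compress_for_url("x" * 100 + "@example.com")
```

The search database would keep the full Message-ID, so external tools can look messages up without knowing the compression scheme.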
2015-09-01	search: reduce redundant doc data
	Redundant document data increases our database size; pull the smsg->mid off the unique term, the smsg->ts off the value, and only generate the formatted display date off smsg->ts.
2015-08-28	search: do not iterate through entire termlist
	A document may have many terms, so this hurts performance if we blindly iterate. Unfortunately, we can't rely on the order of the termlist just yet, either, so we must repeatedly restart the search for now until we're ready to bump schema versions.
2015-08-25	search: implement subject summarization
	We ought to summarize subjects to avoid exploding line lengths in the web interface.
2015-08-25	mid: mid_compressed => mid_compress
	Consistently name mid_* functions as verbs.
2015-08-20	search: preserve References: order in document data
	We need proper ordering of References to thread messages correctly. We would lose this order if we load the terms from the database, so set it directly in document data. Do not bother with a separate In-Reply-To, since Mail::Thread just merges the IRT into References. This bumps our schema version once again.
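A minimal Python sketch of the idea (the project is Perl, and `ordered_references` is a hypothetical name): keep the Message-IDs in header order as a list, folding In-Reply-To in at the end when it is not already present.

```python
import re

def ordered_references(references, in_reply_to=""):
    seen = set()
    ordered = []
    # Scan References first, then In-Reply-To, so the header order is
    # preserved and duplicates are dropped on their later occurrence.
    for mid in re.findall(r"<([^<>\s]+)>", references + " " + in_reply_to):
        if mid not in seen:
            seen.add(mid)
            ordered.append(mid)
    return ordered

refs = ordered_references("<a@example.com> <b@example.com>", "<c@example.com>")
```

A list keeps the order that threading depends on, whereas an unordered collection of terms does not.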
2015-08-16	SearchMsg: ensure metadata for ghost messages mid
	Ghosts have no document data in them. Perhaps we should just rely on terms for Message-ID and avoid storing that in the document data...
2015-08-15	view: display replies in per-message view
	This can be used to quickly scan for replies in a message without displaying an entire thread.
2015-08-15	search: make search results more OO
	This will relieve callers of the need to decode the data we store internally in Xapian.
2015-08-13	initial search backend implementation
	This shall allow us to search for replies/threads more easily.