public-inbox.git - an "archives first" approach to mailing lists

Date	Commit message
2018-04-02	www: rework query responses to avoid COUNT in SQLite
	In many cases, we do not care about the total number of messages. It's a rather expensive operation in SQLite (Xapian only provides an estimate). For LKML, this brings top-level /$INBOX/ loading time from ~375ms to around 60ms on my system. Days ago, this operation was taking 800-900ms(!) for me before introducing the SQLite overview DB.
2018-03-29	mbox: avoid extracting Message-ID for linkification
	We can avoid a small amount of overhead and use the "preferred" Message-ID based on what is in the SearchMsg object.
2018-03-29	www: remove unnecessary ghost checks
	We do not need to care about ghosts at multiple call sites; they cannot have a {blob} field and we've stored the blob field in Xapian since SCHEMA_VERSION=13.
2018-03-27	view: permalink (per-message) view shows multiple messages
	This needs tests and further refinement, but current tests pass.
2018-03-23	www: $MESSAGE_ID/raw endpoint supports "duplicates"
	Since v2 supports duplicate messages, we need to support looking up different messages with the same Message-Id. Fortunately, our "raw" endpoint has always been mboxrd, so users won't need to change their parsing tools.
2018-02-07	update copyrights for 2018
	Using update-copyrights from gnulib. While we're at it, use the SPDX identifier for AGPL-3.0+ to ease mechanical processing.
2017-12-01	search: allow downloading search results as mbox
	Allowing download of all search results as a gzipped mboxrd file can be convenient for some users.
2017-10-04	mbox: support inline filename via Content-Disposition header
	This is hopefully more sensical than "raw" files from resulting downloads.
2017-06-23	mbox: show application/mbox for obfuscated inboxes
	Sigh, yet another place to handle obfuscation for misguided people who expect it. Maybe this will do something to prevent spammers from getting addresses, while still allowing the "curl $URL | git am" use case to work.
2016-12-10	search: always sort thread results in ascending time order
	This makes life easier for the threading algorithm, as we can use the implied ordering of timestamps to avoid temporary ghosts and the resulting container vivification. This would've also allowed us to hide the bug (in most cases) fixed by the patch titled "thread: last Reference always wins", in case that needs to be reverted due to infinite looping.
2016-08-14	www: do not unnecessarily escape some chars in paths
	Based on reading RFC 3986, it seems '@', ':', '!', '$', '&', "'", '(', ')', '*', '+', ',', ';', '=' are all allowed in path-absolute where we have the Message-ID. In any case, it seems '@' is fairly common in path components nowadays and too common in Message-IDs.
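The escaping rule above can be sketched in Python (this is an illustration, not the project's Perl code). The helper name `mid_to_path` and the sample Message-IDs are hypothetical; `urllib.parse.quote` and its `safe` parameter are standard library. The safe set is the list of characters from the commit message, and "/" is deliberately left out so it is still escaped.

```python
from urllib.parse import quote

# Characters the commit message lists as allowed in a path segment
# (RFC 3986 sub-delims plus ':' and '@'). '/' is intentionally absent.
PATH_SAFE = "@:!$&'()*+,;="

def mid_to_path(message_id: str) -> str:
    """Hypothetical helper: percent-encode a Message-ID for use in a URL path."""
    return quote(message_id, safe=PATH_SAFE)

if __name__ == "__main__":
    print(mid_to_path("20160814012345.GA1234@example.com"))
    print(mid_to_path("odd/id with space@example.com"))
```

With this safe set, '@' stays readable in the URL, while characters such as "/" and spaces are still percent-encoded.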
2016-08-06	mbox: be fair to other HTTP clients
	At least for public-inbox-httpd, this allows us to avoid having a client monopolize one event loop tick of the server for too long. It hurts throughput for the /all.mbox.gz endpoint, but I doubt anybody cares and the latency improvement for other clients would be appreciated. We already do the same fairness thing for HTML pages.
2016-08-04	searchmsg: add git object ID to doc_data
	Doing git tree lookups based on the SHA-1 of the Message-ID is expensive as trees get larger; instead, use the SHA-1 object ID directly. This drastically reduces the amount of time spent in the "git cat-file --batch" process for fetching the /$INBOX/all.mbox.gz endpoint on the ~800MB git@vger.kernel.org mirror. This retains backwards compatibility and allows existing indices to be transparently upgraded without performance degradation.
2016-07-09	cleanup some unnecessary use/requires
	Hopefully this can reduce memory overhead for people that use one-shot CGI.
2016-07-02	inbox: base_url method takes PSGI env hashref instead
	This is lighter and we can work further towards eliminating our Plack::Request dependency entirely.
2016-06-24	mbox: reduce small packets for gzipped mboxes
	We want to avoid sending 10 or 20-byte gzip headers as separate TCP packets to reduce syscalls and avoid wasting bandwidth.
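One way to avoid emitting a tiny gzip header on its own is to buffer compressor output until it reaches a minimum size. The Python sketch below illustrates that idea only; it is not the project's Perl implementation, and the function name `gzip_chunks` and the 8192-byte threshold are assumptions.

```python
import zlib

def gzip_chunks(messages, min_chunk=8192):
    """Yield gzip output in chunks of at least min_chunk bytes (except the last).

    `messages` is any iterable of bytes objects. The gzip header that the
    compressor produces up front is held in the buffer and sent together
    with the first batch of compressed data instead of on its own.
    """
    z = zlib.compressobj(wbits=31)  # 31 = 16 + 15: gzip framing
    buf = bytearray()
    for msg in messages:
        buf += z.compress(msg)
        if len(buf) >= min_chunk:
            yield bytes(buf)
            buf.clear()
    buf += z.flush()  # trailer (CRC32 + length) and any pending data
    if buf:
        yield bytes(buf)

if __name__ == "__main__":
    msgs = [b"From mboxrd@z Thu Jan  1 00:00:00 1970\n\nhello\n\n"] * 3
    for chunk in gzip_chunks(msgs):
        print(len(chunk))
```

Each yielded chunk would map to one write to the client, so small inputs result in a single write rather than a header-only write followed by the data.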
2016-06-20	feed: various object-orientation cleanups
	Favor Inbox objects as our primary source of truth to simplify our code. This increases our coupling with PSGI to make it easier to write tests in the future. A lot of this code was originally designed to be usable standalone without PSGI or CGI at all; but that might increase development effort.
2016-06-20	mbox: avoid write dependency for streaming
	Prefer to return strings instead, so Content-Length can be calculated for caching and such.
2016-06-20	mbox: remove feed dependency
	We do not need feed options there (or anywhere, hopefully).
2016-06-19	mbox: set gzip timestamp to the Unix epoch
	This allows consistency between different invocations from roughly the same period and is no worse for caching than any of our existing HTML and Atom feeds. We cannot set the timestamp to the end date since messages may be added to the repository while we are iterating (and this streaming mechanism will pick them up).
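The effect of pinning the gzip timestamp can be shown with a short Python sketch (an illustration using Python's standard `gzip` module, not the project's Perl code). The function name `gzip_epoch` is hypothetical; `mtime=0` sets the MTIME header field to the Unix epoch, so identical input gives byte-identical output no matter when it runs.

```python
import gzip
import io

def gzip_epoch(data: bytes) -> bytes:
    """Compress data with the gzip MTIME header field fixed at 0 (Unix epoch)."""
    buf = io.BytesIO()
    with gzip.GzipFile(fileobj=buf, mode="wb", mtime=0) as gz:
        gz.write(data)
    return buf.getvalue()

if __name__ == "__main__":
    a = gzip_epoch(b"From mboxrd@z Thu Jan  1 00:00:00 1970\n\nhi\n\n")
    b = gzip_epoch(b"From mboxrd@z Thu Jan  1 00:00:00 1970\n\nhi\n\n")
    print(a == b)         # identical bytes across invocations
    print(a[4:8].hex())   # MTIME field of the gzip header
```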
2016-05-21	mbox: switch generation over to pull model
	This allows us to easily provide gigantic inboxes with proper backpressure handling for slow clients. It also eliminates public-inbox-httpd and Danga::Socket-specific knowledge from this class, making it easier to follow for those used to generic PSGI applications.
2016-05-15	mbox: support /$INBOX/all.mbox.gz endpoint
	Allows easily downloading the entire archive without special tools. In any case, it's not yet advertised via HTML until we can test it better. It'll also support range queries in the future to avoid wasting bandwidth.
2016-05-15	mbox: consistent header order when decompressed
	This should make validating the output easier when testing between different servers.
2016-05-06	mbox: sort messages by ascending date
	This allows messages to be read in chronological order when read without a mail client (e.g. with "zcat t.mbox.gz | less").
2016-04-12	mbox: do not clobber existing archive headers in WWW
	When serving archives, it's more robust to keep existing archive links in case one server goes down.
2016-04-11	mbox: unconditionally add trailing newline
	This may be necessary for compatibility with non-mboxrd aware parsers which expect "\nFrom " for everything but the first record.
2015-12-22	rename 'GitCatFile' package to 'Git'
	We'll be using it for more than just cat-file. Adding a `popen' API for internal use allows us to save a bunch of code in other places.
2015-11-20	various internal documentation updates
	Hopefully this gives new hackers a better overview of how the components relate to each other.
2015-10-04	mbox: generate Archived-At, List-Post, List-Archive headers
	Downloaded mboxen can be archived/stored indefinitely; try to make it easy for future archaeologists to find the online archive location.
2015-10-04	mbox: kill Bytes meta-header, too
	It may be present in messages imported from NNTP.
2015-09-30	remove unnecessary fields usage
	It doesn't actually give performance improvements unless we use types with "my", but we don't do that. We'll only continue using fields with Danga::Socket-derived classes where they're required.
2015-09-06	update copyright headers and email addresses
	In the future, it should be possible to use this: git ls-files | UPDATE_COPYRIGHT_HOLDER='all contributors' UPDATE_COPYRIGHT_USE_INTERVALS=2 xargs /path/to/gnulib/build-aux/update-copyright
2015-09-03	get rid of Message-ID compression entirely
	Provide a fallback for legacy SHA-1 messages, but do not advertise shorter URLs anymore for data portability concerns. This fixes a regression introduced in commit 81a9c1b476987d845b340ab9013d26cf4487cb9a ("search: disable Message-ID compression in Xapian") which ended up breaking thread-related endpoints for large Message-IDs, as lookups on the SHA-1 message no longer worked.
2015-08-26	mbox: close file handle for single mbox
	This doesn't seem needed for actual server use, but Plack tests complain about it
2015-08-25	mid: mid_compressed => mid_compress
	Consistently name mid_* functions as verbs.
2015-08-23	cleanup calls to header_obj
	Dereference header_obj only once when performance may be critical, or simplify our code by calling "header" directly on the Email::{Simple,MIME} object if not.
2015-08-23	mbox: clarify our use of the mboxrd variant
	Commenting it in the From: line seems appropriate and reduces compatibility problems in case a MUA cannot handle trailing comments after the timestamp.
2015-08-23	mbox: use mboxrd quoting rules
	This redundantly quotes ">From" lines to prevent losing information, as described by qmail.
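For readers unfamiliar with mboxrd, the quoting rule can be sketched in Python as below (an illustration, not the project's Perl code; the function names are hypothetical). Every line that starts with zero or more ">" characters followed by "From " gains one extra ">" on output. Reading reverses this by stripping exactly one ">", which is what makes the quoting lossless.

```python
import re

# A line needs quoting if it is "From " preceded by any number of '>'.
_ESCAPE = re.compile(rb"^(>*From )", re.MULTILINE)
# On read, remove exactly one leading '>' from such lines.
_UNESCAPE = re.compile(rb"^>(>*From )", re.MULTILINE)

def mboxrd_escape(body: bytes) -> bytes:
    """Quote a message body for storage in an mboxrd file."""
    return _ESCAPE.sub(rb">\1", body)

def mboxrd_unescape(body: bytes) -> bytes:
    """Reverse mboxrd_escape, recovering the original body."""
    return _UNESCAPE.sub(rb"\1", body)

if __name__ == "__main__":
    original = b"From the start\n>From a quote\nNot From here\n"
    quoted = mboxrd_escape(original)
    print(quoted.decode())
    print(mboxrd_unescape(quoted) == original)
```

The older mboxo convention only quotes bare "From " lines, so a body that already contained ">From " cannot be told apart from a quoted one after a round trip.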
2015-08-23	.txt links return an mbox instead
	This improves compatibility and allows individual messages to be concatenated into an existing mbox without further modifications. "git format-patch" does something similar (but does not do "From " line escaping(!))
2015-08-22	mbox: support uncompressed mbox
	Some folks may want to view the mbox inline as a string of raw text, when guessing URLs. Let them do this...
2015-08-22	stream HTML views as much as possible
	This should allow progressive rendering on the client and reduce memory usage on the server. Unfortunately XML::Atom::SimpleFeed does not yet support streaming, so we may not use it in the future.
2015-08-21	mbox: drop unnecessary imports
	These are no longer necessary.
2015-08-21	switch to gzipped mboxes
	Mboxes may be huge, so only support downloading gzipped mboxes to save bandwidth and to get free checksumming. Streaming output means we should not be wasting too much memory on this unless the chosen server sucks.
2015-08-21	mbox: stream entire thread, regardless of size
	Since mbox is usually downloaded, support fetching infinitely large responses via streaming.
2015-08-21	support dumping thread as an mbox
	Some folks may not want to download and install Perl code like ssoma, so allow downloading an mbox containing the entire thread.