public-inbox.git - an "archives first" approach to mailing lists

Date	Commit message
2016-10-05	thread: remove Mail::Thread dependency
	Introduce our own SearchThread class for threading messages. This should allow us to specialize and optimize away objects in future commits.
2016-10-05	view: remove "subject dummy" references
	We will not care for inexact threading by subject or pruning.
2016-09-13	help: document new search prefixes
	Support (and document) 'a:' after all, as "mairix -h" uses it, so this should reduce the learning curve for mairix users.
2016-09-09	nntp: cleanup: move use statements out of sub scope
	This clarifies the code somewhat, and we don't care to lazy-load in NNTP.pm anyways since this is only used for a long-lived daemon.
2016-09-09	TODO: updates for done items
	The existing string -> number date range Xapian query is good enough, and having too much flexibility is probably bad for caching (as well as increasing our attack surface, because parsing queries is tricky). Tags-as-skiplists are probably not worth the effort given Xapian, and we may have to import old messages after-the-fact, anyways, and message delivery for mirrors is never orderly. Other items are all done and need to be maintained (like the search engine docs for the mairix-compatibility features that just got pushed out).
2016-09-09	t/httpd-unix: warn about connection failure
	Output $! for diagnostic purposes since I've noticed this on two slow machines, today (and seemingly, never prior).
2016-09-09	search: index attachment filenames
	And while we're at it, ensure searching inside displayable attachment bodies works.
2016-09-09	search: match the behavior of WWW for indexing text
	The basic rule is that if it is displayable via our WWW interface, it should be indexable text for Xapian search.
2016-09-09	search: avoid mindlessly calling body_set
	It's not worth entering a complex codepath in Email::MIME to save some (probably immeasurable amount of) memory, here. We've already stopped doing this in our WWW code a while back, too. If we really cared enough about it, we'd prioritize work on a streaming replacement for Email::MIME.
2016-09-09	search: fix compatibility with Debian wheezy
	Specifying the "d:" field only worked for NumberValueRangeProcessor in older versions of Xapian, such as the one in Debian wheezy (libsearch-xapian-perl=1.2.10.0-1). This slipped through since I rarely use wheezy anymore, and perhaps nobody else does, either. Perhaps wheezy support may be dropped soon. Unfortunately, this requires a schema version bump.
2016-09-09	search: increase term positions for each quoted hunk
	We pay a storage cost for storing positional information in Xapian, so make good use of it by attempting to preserve it for (hopefully) better search results.
2016-09-09	search: match quote detection behavior of view
	This is stricter than the mutt quote_regexp default ("^([ \t]*[\|>:}#])+" on Debian jessie), but matches what we have in View.pm. I prefer the stricter quote detection since it is less ambiguous and less likely to hide/obscure important details.
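A minimal sketch of the difference, in Python rather than the project's Perl. The mutt pattern is the one quoted above; the strict rule (only lines beginning with ">" count as quoted) is an assumption made for illustration and is not copied from View.pm.

```python
import re

# mutt's default quote_regexp on Debian jessie, as quoted in the commit message
MUTT_QUOTE = re.compile(r"^([ \t]*[\|>:}#])+")

# Assumption for illustration: a stricter rule where only a leading ">" marks a quote
STRICT_QUOTE = re.compile(r"^>")


def is_quoted(line, pattern):
    """Return True if the line is treated as quoted text under the given pattern."""
    return pattern.match(line) is not None


sample = ["> quoted reply", "# a shell comment", "  : indented colon", "plain text"]
mutt_hits = [line for line in sample if is_quoted(line, MUTT_QUOTE)]
strict_hits = [line for line in sample if is_quoted(line, STRICT_QUOTE)]
```

Under the looser pattern, the shell comment and the indented colon line are both classified as quotes, which is the kind of ambiguity the stricter rule avoids.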
2016-09-09	search: fix space regressions from recent changes
	As of Xapian 1.0.4 (from 2007) it is possible to use Search::Xapian::QueryParser::add_prefix multiple times with the same user field name but different term prefixes. This brings my current git@vger mirror from 6.5GB to 2.1GB (both sizes are after xapian-compact).
2016-09-09	search: more granular message body searching
	"bs:" and "b:" are adapted from mairix(1). We will also support searching explicitly for quoted vs non-quoted text via "q:" and "nq:" prefixes since sometimes readers will not care for quoted text. In the future, we will support parsing diffs (perhaps when repobrowse integration is complete). Note: this roughly doubles the size of the Xapian database due to the additional information, so this change may not be worth it.
2016-09-09	search: drop longer subject: prefix for search
	We only document the "s:" anyways. While the long name is more descriptive, the ambiguity makes agnostic caching (by Varnish or similar) slightly harder and longer URLs are more likely to be accidentally truncated when shared.
2016-09-09	search: allow searching user fields (To/Cc/From)
	Sometimes it can be useful to search based on who the message was sent to, sent by, or Cc:-ed. Of course, headers can be faked, but they usually are not... Anyways this mostly matches the behavior of mairix(1).
2016-09-08	import: run "git gc --auto" when done
	We need to prevent excessive repository growth for public-inbox-watch and public-inbox-mda users.
2016-09-08	import: hoist out common run_die subroutine
	We will be reusing this in the next commit, too.
2016-09-08	doc: document PERL_INLINE_DIRECTORY usage
	For now, we will document this since it allows better performance without the burden of extensions. Perhaps one day far in the future Perl can natively support vfork(2) AND that version of Perl will be widely available, but I suspect that day is at least a decade away, if not two: https://rt.perl.org/Ticket/Display.html?id=128227
2016-09-08	import: hoist out _check_path function
	This reduces duplication, slightly. We may be using it yet again in a to-be-introduced function (or we may not introduce it).
2016-09-08	view: handle missing Content-Type in message
	Email::MIME internally assumes "text/plain" for messages missing a Content-Type, but does not expose that in the Email::MIME::content_type API method. We must assume it ourselves to avoid uninitialized value warnings for the rare (nowadays) MUAs which do not set it.
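The same idea sketched with Python's standard email package instead of Email::MIME. The helper name content_type_or_default is hypothetical; it only illustrates supplying "text/plain" when the header is absent.

```python
from email import message_from_string


def content_type_or_default(msg, default="text/plain"):
    """Return the media type from the Content-Type header, or a default if missing."""
    raw = msg.get("Content-Type")
    if raw is None:
        # No header at all: assume the default ourselves instead of passing None on
        return default
    # Drop parameters such as "; charset=UTF-8" and normalize case
    return raw.split(";", 1)[0].strip().lower()


bare = message_from_string("From: a@example.com\n\nhello\n")
typed = message_from_string("Content-Type: text/html; charset=UTF-8\n\n<p>hi</p>\n")
```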
2016-09-07	doc: flesh out public-inbox-index documentation
	And include it into the build + website
2016-09-07	doc: new docs for user-level commands
	Hopefully more folks can download and run public-inbox, nowadays.
2016-09-02	config: use "publicinboxlimiter" prefix
	Just having "limiter" in the prefix may confuse it with something else. Use the full prefix to avoid this confusion.
2016-09-02	init: enable pack bitmaps by default
	We want to encourage users to serve repositories. So enable bitmaps by default so performance suffers less with smart HTTP.
2016-09-01	watch: use "publicinboxwatch" namespace
	We'll keep supporting "publicinboxlearn" indefinitely, but "publicinboxwatch" is probably more appropriate at the moment. Noticed while writing documentation.
2016-08-31	doc: set release and section properly for manpages
	This will be important as we will have more of them.
2016-08-31	txt2pre: allow overriding title via env
	This will allow reasonable titles to be generated for manpages.
2016-08-31	txt2pre: use public-inbox internal APIs
	Since this is bundled with the source, we might as well use internal APIs to avoid having duplicate code (and bugs :P)
2016-08-23	www: give tor2web some exposure, too
	Not everybody can run Tor; hopefully more can use Tor2web even if it compromises their privacy. This should help make the system more resilient for users unable to use Tor.
2016-08-21	doc: avoid conflicting with MakeMaker variable names
	We want the pod2man(1) executable for handling certain options. Also, use the correct year while we're at it :P
2016-08-21	avoid spaces after shell redirection operators
	This makes us closer to git.git style (though I'm not quite sure why we do this...)
2016-08-21	doc: mda: remove vestigial pandoc comment
	We use perlpod nowadays since it's Perl, like our code base.
2016-08-21	README: add link to source code mirrors
	Centralization sucks, so we mirror everything.
2016-08-18	searchview: link to internal help text
	The internal help text links to the Xapian query parser documentation anyways, but also provides information on which prefixes exist.
2016-08-18	www: implement generic help text
	Begin documenting some basic help functionality. I may tweak the anchor names of the various HTML endpoints to be more consistent with each other (old ones will be supported for a short while), so I'm not documenting those, for now. This may become part of a builtin key-value store for basic texts, but this probably shouldn't become a wiki engine, either.
2016-08-18	linkify: be stricter about matching RFC 3986
	We're not to-the-letter about percent-encoding, but we should allow all the characters. This is mainly so we can effectively use the link to some Wikipedia pages with parentheses in them: https://en.wikipedia.org/wiki/Atom_(standard) https://en.wikipedia.org/wiki/Git_(software)
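A rough Python sketch of a link matcher that accepts the RFC 3986 unreserved, gen-delims, and sub-delims characters, so parentheses survive. The pattern is an assumption for illustration and not the regex used in the project's linkify code; note that it would also swallow a closing parenthesis that merely follows a URL in running text.

```python
import re

# Characters allowed somewhere in a URI per RFC 3986: unreserved, gen-delims,
# sub-delims, plus "%" for percent-encoded octets. Hypothetical pattern.
URL_RE = re.compile(r"https?://[A-Za-z0-9\-._~:/?#\[\]@!$&'()*+,;=%]+")


def find_links(text):
    """Return every http(s) URL found in the text, parentheses included."""
    return URL_RE.findall(text)


links = find_links("See https://en.wikipedia.org/wiki/Git_(software) today")
```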
2016-08-18	view: try assuming UTF-8 for bogus charsets
	For some reason, Alpine will set X-UNKNOWN for valid UTF-8. Since we favor UTF-8 HTML anyways, try forcing Email::MIME to handle text/plain as UTF-8 which might show up better. At least this change renders <alpine.DEB.2.20.1608131214070.4924@virtualbox> properly by showing "•" (•) instead of "â ¢" (â¢) Reported-by: Thomas Ferris Nicolaisen <tfnico@gmail.com>
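A minimal sketch of the fallback in Python, not the Email::MIME code path itself. The function name decode_body is hypothetical, and substituting undecodable bytes with a replacement character is an assumption of this sketch.

```python
def decode_body(raw, charset):
    """Decode bytes with the declared charset, falling back to UTF-8 if it is unknown."""
    try:
        return raw.decode(charset)
    except LookupError:
        # Unknown charset name such as "X-UNKNOWN": try UTF-8 instead of failing
        return raw.decode("utf-8", errors="replace")


bullet = "\u2022".encode("utf-8")          # the three bytes E2 80 A2
fixed = decode_body(bullet, "X-UNKNOWN")   # decoded as UTF-8
mangled = bullet.decode("latin-1")         # what a single-byte decoding yields
```

Decoding the same three bytes with a single-byte charset yields three separate characters, which is the mangled rendering described above.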
2016-08-18	view: try to display bogus charsets for text/plain
	Alpine seems to set charset=X-UNKNOWN for valid UTF-8 text, which causes Email::MIME::body_str to fail as X-UNKNOWN is not a valid encoding. So, blindly display the body as plain-text but warn users about possibly mangled text. Reported-by: Thomas Ferris Nicolaisen <tfnico@gmail.com>
2016-08-18	view: attach_link uses string concatenation
	There is no point in using an array to join on an empty string (my original intention was probably to join on "\n"). This is only preparation for the next change to show a warning in the attachment link.
2016-08-16	search: add YYYYMMDD search range via "d:" prefix
	This is similar to mairix in that it uses a "d:" prefix; but only takes YYYYMMDD, for now. Using custom date/time parsers via Perl will be much more work: nntp://news.gmane.org/20151005222157.GE5880@survex.com Anyhow, this ought to be more human-friendly than searching by Unix timestamps, but it requires reindexing to take advantage of.
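Why a plain string range works for this format: fixed-width YYYYMMDD stamps sort lexicographically in the same order as the dates they represent. A small Python sketch follows; using UTC for the conversion is an assumption of this example, and the helper names are hypothetical.

```python
from datetime import datetime, timezone


def yyyymmdd(unix_ts):
    """Format a Unix timestamp as a YYYYMMDD string (UTC assumed for this sketch)."""
    return datetime.fromtimestamp(unix_ts, tz=timezone.utc).strftime("%Y%m%d")


def in_range(stamp, start, end):
    """Inclusive range check using ordinary string comparison."""
    return start <= stamp <= end


stamp = yyyymmdd(1471305600)  # midnight UTC on 2016-08-16
hit = in_range(stamp, "20160801", "20160831")
```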
2016-08-16	search: drop pointless range processors for Unix timestamp
	The Unix timestamp isn't meaningful for users searching, we will start indexing the YYYYMMDD date stamp which may use StringValueRangeProcessor, instead.
2016-08-16	HACKING: minor updates and add to the website
	Also, at least add one of the Tor mirrors (the rest will be discoverable through the mirrors themselves).
2016-08-15	import: use common address parsing to drop unnecessary quotes
	Not sure why or how I missed this before; but the common address parsing routine we have should be more correct. Add a test to ensure excessively quoted names don't make it through, either.
2016-08-14	TODO: updates based on git@vger mirror experience
	Plenty more to do!
2016-08-14	www: do not double-clean Message-IDs from internal DBs
	Ensure we usually strip one level of '<>' from Message-IDs, since our internal SQLite, Xapian, and SHA-1 storage all assume that. Realistically, we screw up if somebody has '<<' or '>>', but those are screwed up mail clients and we can deal with it another time. Currently, this means some messages with '>>' in References or Message-Id are not handled correctly, yet, but we match the behavior of Mail::Thread in keeping the extra '>'.
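A small Python sketch of stripping exactly one level of angle brackets, and of what goes wrong when the cleaning is applied twice. The function name strip_one_level is hypothetical and not the project's routine.

```python
def strip_one_level(mid):
    """Remove a single enclosing pair of '<' and '>' from a Message-ID, if present."""
    mid = mid.strip()
    if len(mid) >= 2 and mid.startswith("<") and mid.endswith(">"):
        return mid[1:-1]
    return mid


once = strip_one_level("<<odd@example.com>>")   # keeps the extra brackets
twice = strip_one_level(once)                   # double-cleaning loses them
```

A value read back from an internal DB has already been cleaned once, so cleaning it again changes IDs that legitimately contain an extra bracket pair and they no longer match what was stored.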
2016-08-14	www: do not unnecessarily escape some chars in paths
	Based on reading RFC 3986, it seems '@', ':', '!', '$', '&', "'", '(', ')', '*', '+', ',', ';', '=' are all allowed in path-absolute where we have the Message-ID. In any case, it seems '@' is fairly common in path components nowadays and too common in Message-IDs.
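For illustration, the same character set can be passed as the safe argument to Python's urllib.parse.quote. This is a sketch of the escaping rule, not the project's Perl helper; escaping "/" inside a Message-ID is an assumption of this example.

```python
from urllib.parse import quote

# Characters RFC 3986 allows in a path segment beyond the unreserved set:
# ":" and "@" plus the sub-delims
PCHAR_SAFE = "@:!$&'()*+,;="


def escape_mid_for_path(mid):
    """Percent-encode a Message-ID for use as one path segment, leaving pchar as-is."""
    return quote(mid, safe=PCHAR_SAFE)


plain = escape_mid_for_path("20160814.1234@example.com")
slashed = escape_mid_for_path("a/b@example.com")
```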
2016-08-14	www: ensure XML validity for some odd ASCII chars
	I've seen 0x1b (\e) in at least one message and some other possibly non-printable chars. In any case, make sure they're valid XML with us-ascii encoding as far as xmlstarlet(1) thinks so.
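One way to guarantee XML-safe output is to replace every character outside the XML 1.0 Char production. The Python sketch below illustrates that approach; the replacement character "?" and the function name are assumptions, and the commit itself may handle these characters differently.

```python
import re

# Anything outside the XML 1.0 "Char" production: tab, LF, CR, and the
# ranges U+0020-U+D7FF, U+E000-U+FFFD, U+10000-U+10FFFF are the only valid ones
INVALID_XML_CHARS = re.compile(
    "[^\x09\x0a\x0d\x20-\ud7ff\ue000-\ufffd\U00010000-\U0010ffff]"
)


def xml_safe(text, replacement="?"):
    """Replace characters that may not appear in an XML 1.0 document."""
    return INVALID_XML_CHARS.sub(replacement, text)


cleaned = xml_safe("color \x1b[31mred\x1b[0m")
```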
2016-08-14	mid: no wide characters for sha1_hex
	Apparently there are some really screwed up In-Reply-To fields out there.
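The underlying issue is that SHA-1 operates on bytes, so a string holding wide (non-Latin-1) characters has to be encoded first. A Python sketch of the idea follows; encoding as UTF-8 is an assumption of this example, and mid_sha1 is a hypothetical name.

```python
import hashlib


def mid_sha1(mid):
    """Hash a Message-ID string, encoding it as UTF-8 bytes before digesting."""
    return hashlib.sha1(mid.encode("utf-8")).hexdigest()


digest = mid_sha1("\u2022odd-reply@example.com")
```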
2016-08-14	search: gracefully handle lookup_message failure
	We can't blindly assume a ghost even exists in the DB, as the rules can change internally for some corner-case Message-IDs.