public-inbox.git - an "archives first" approach to mailing lists

Date	Commit message
2016-12-17	feed: support publicinbox.<name>.feedmax
	This allows users to customize by using smaller or larger Atom feeds than the default value of 25 entries.
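	A minimal Python sketch of how a per-inbox setting with a default might be resolved, assuming the settings were already read as "git config -l" style key=value lines. The helper names and the fallback to 25 for missing or invalid values are assumptions, not public-inbox's Perl implementation.

```python
# Hypothetical sketch: resolve publicinbox.<name>.feedmax from "key=value"
# lines (as printed by `git config -l`), defaulting to 25 Atom entries.
DEFAULT_FEEDMAX = 25


def parse_config_list(text):
    """Parse "key=value" lines into a dict; later values override earlier ones."""
    cfg = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            cfg[key] = value
    return cfg


def feedmax(cfg, name):
    """Return the Atom entry limit for inbox `name`.

    Treating missing, non-numeric, or non-positive values as the default
    is an assumption of this sketch.
    """
    raw = cfg.get(f"publicinbox.{name}.feedmax")
    try:
        value = int(raw)
    except (TypeError, ValueError):  # int(None) raises TypeError
        return DEFAULT_FEEDMAX
    return value if value > 0 else DEFAULT_FEEDMAX
```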
2016-12-14	t/thread-cycle: no need for Xapian to run this test
	We don't actually use anything from SearchMsg, just the class name.
2016-12-13	nntp: add test case for the "DATE" command
	We may not always use strftime and may implement caching. But for now, just add a test.
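	For reference, RFC 3977 defines the DATE reply as response code 111 followed by the server's current UTC time as yyyymmddhhmmss. A minimal strftime-based Python sketch (the function name is hypothetical) that a test can check against a fixed timestamp:

```python
import time


def nntp_date_response(now=None):
    """Format an NNTP DATE reply: "111 " plus UTC time as yyyymmddhhmmss.

    `now` is seconds since the epoch; None means the current time.
    """
    return "111 " + time.strftime("%Y%m%d%H%M%S", time.gmtime(now))
```

	Passing a fixed value such as 0 (the epoch) makes the output deterministic, which is what a test needs.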
2016-12-12	init: preserve permissions of existing config file
	This matches git-config(1) behavior and the implied user intent when it comes to programmatically editing files.
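	One common way to rewrite a file while keeping its mode is to write a temporary file in the same directory, copy the original permission bits onto it, and rename it over the original. A Python sketch of that technique (the helper name is hypothetical; this is not how public-inbox-init itself is written):

```python
import os
import stat
import tempfile


def rewrite_preserving_mode(path, data):
    """Replace the contents of `path` with `data`, keeping its permission bits."""
    mode = stat.S_IMODE(os.stat(path).st_mode) if os.path.exists(path) else None
    # Create the temporary file next to the target so os.replace() is an
    # atomic rename within one filesystem on POSIX.
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(os.path.abspath(path)))
    try:
        with os.fdopen(fd, "w") as f:
            f.write(data)
        if mode is not None:
            os.chmod(tmp, mode)  # copy the original permissions
        os.replace(tmp, path)
    except BaseException:
        os.unlink(tmp)
        raise
```

	A brand-new file keeps the 0600 mode that tempfile.mkstemp gives it; honoring the umask instead is left out of this sketch.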
2016-12-10	thread: last Reference always wins
	Since we use SearchMsg from Xapian data, we can be assured we do not get a self-referential {references} field. However, we may need to be more careful when checking has_descendent for loops, as blindly calling add_child could open us up to that possibility...
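	A Python sketch of "last Reference wins" threading with a loop check before each link. Container, set_parent, and thread are hypothetical names; this is not public-inbox's SearchThread code. It also drops self-references defensively, even though Xapian-derived data should not contain them.

```python
class Container:
    """A node in a thread tree (hypothetical; not public-inbox's SearchThread)."""

    def __init__(self, mid):
        self.mid = mid
        self.parent = None
        self.children = []

    def has_descendant(self, other):
        """Return True if `other` appears anywhere below this node."""
        stack = list(self.children)
        while stack:
            node = stack.pop()
            if node is other:
                return True
            stack.extend(node.children)
        return False


def set_parent(child, parent):
    """Move `child` under `parent`, refusing links that would form a loop."""
    if child.parent is parent:
        return True
    if parent is child or child.has_descendant(parent):
        return False
    if child.parent is not None:
        child.parent.children.remove(child)
    child.parent = parent
    parent.children.append(child)
    return True


def thread(messages):
    """Thread (message_id, references) pairs; returns {message_id: Container}."""
    table = {}

    def get(mid):
        if mid not in table:
            table[mid] = Container(mid)
        return table[mid]

    for mid, refs in messages:
        node = get(mid)
        refs = [r for r in refs if r != mid]  # drop self-references
        prev = None
        for ref in refs:
            cur = get(ref)
            # link the References chain, but never override an existing parent
            if prev is not None and cur.parent is None:
                set_parent(cur, prev)
            prev = cur
        if prev is not None:
            set_parent(node, prev)  # the last Reference always wins
    return table
```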
2016-12-06	linkify: implement Markdown link compatibility (again)
	Although unescaped parentheses in URLs are technically allowed, they are uncommon. However, Markdown-like syntaxes are unfortunately common for URLs, so we might as well support them. This fixes parentheses detection at sentence endings, as seen in practice in emails.
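	One common way to get both behaviors, shown as a Python sketch rather than public-inbox's Perl linkify code: match every character RFC 3986 allows (including parentheses), then trim trailing sentence punctuation and any closing parenthesis that has no matching opening one inside the URL. The set of trailing punctuation characters is an assumption.

```python
import re

# Characters RFC 3986 permits in a URI (unreserved, reserved, and "%"),
# so "(" and ")" are part of a candidate URL.
URL_RE = re.compile(r"https?://[A-Za-z0-9\-._~:/?#\[\]@!$&'()*+,;=%]+")

# Punctuation that usually ends a sentence rather than a URL (an assumption).
TRAILING = ".,;:!?'"


def trim_url(url):
    """Strip trailing punctuation and unbalanced closing parentheses."""
    while url:
        last = url[-1]
        if last in TRAILING:
            url = url[:-1]
        elif last == ")" and url.count(")") > url.count("("):
            url = url[:-1]
        else:
            break
    return url


def find_urls(text):
    """Return the URLs found in `text`, trimmed as described above."""
    return [trim_url(m.group(0)) for m in URL_RE.finditer(text)]
```

	Balanced parentheses such as Atom_(standard) survive, while the ")" closing a Markdown link or a parenthetical sentence is dropped.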
2016-12-06	Revert "linkify: implement Markdown link compatibility"
	This reverts commit 130d0c4e33c5c73dc69e270fc698735d49e0f159.
2016-12-06	linkify: implement Markdown link compatibility
	Although unescaped parentheses in URLs are technically allowed, they are uncommon. However, Markdown-like syntaxes are unfortunately common for URLs, so we might as well support them.
2016-12-03	atom: switch to getline/close for response bodies
	This will let us stream larger Atom document bodies without wasting too much memory, and reduce the number of round-trip requests needed to get necessary information. Hopefully clients are using streaming (SAX) parsers, too. This is the final transition in the core public-inbox code to allow migrating to a "pull"-based body streaming scheme, which allows an HTTP server to respond appropriately to backpressure from slow clients.
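	PSGI allows a response body to be an object with getline and close methods, which the server calls to pull one chunk at a time. A rough Python analog of that pull model (AtomBody and drain are hypothetical names, and the XML is heavily abbreviated, not a valid Atom document):

```python
from xml.sax.saxutils import escape


class AtomBody:
    """Pull-based body: each getline() returns the next chunk, None at the end.

    The entries are abbreviated; a real Atom entry needs id, updated, etc.
    """

    def __init__(self, titles):
        self._titles = iter(titles)
        self._state = "head"
        self.closed = False

    def getline(self):
        if self._state == "head":
            self._state = "entries"
            return '<?xml version="1.0" encoding="us-ascii"?>\n<feed xmlns="http://www.w3.org/2005/Atom">'
        if self._state == "entries":
            title = next(self._titles, None)
            if title is not None:
                return f"<entry><title>{escape(title)}</title></entry>"
            self._state = "done"
            return "</feed>"
        return None

    def close(self):
        self.closed = True


def drain(body, write):
    """Server side: pull chunks one at a time.

    A real server would only call getline() again once the client socket
    is writable, which is how slow clients exert backpressure.
    """
    try:
        while True:
            chunk = body.getline()
            if chunk is None:
                break
            write(chunk)
    finally:
        body.close()
```

	Only one entry is materialized per getline() call, so memory use stays flat regardless of feed length.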
2016-11-26	avoid IO::File for anonymous temporary files
	We do not need to import IO::File into the main programs since Perl 5.8+ supports literal "undef" for generating anonymous temporary file handles.
2016-10-05	t/thread-cycle: test self-referential messages
	Some broken (or malicious) mailers may include a message's own generated Message-ID in its References header, so be prepared for it.
2016-10-05	thread: use hash + array instead of hand-rolled linked list
	This starts to show noticeable performance improvements when attempting to thread over 400 messages, but the improvement may not be measurable with fewer. However, the resulting code is much shorter and (IMHO) much easier to understand.
2016-10-05	thread: remove Email::Abstract wrapping
	This roughly doubles performance due to the reduction in object creation and abstraction layers.
2016-10-05	thread: remove Mail::Thread dependency
	Introduce our own SearchThread class for threading messages. This should allow us to specialize and optimize away objects in future commits.
2016-09-09	t/httpd-unix: warn about connection failure
	Output $! for diagnostic purposes, since I've noticed this failure on two slow machines today (and seemingly never before).
2016-09-09	search: index attachment filenames
	And while we're at it, ensure searching inside displayable attachment bodies works.
2016-09-09	search: more granular message body searching
	"bs:" and "b:" are adapted from mairix(1) We will also support searching explicitly for quoted vs non-quoted text via "q:" and "nq:" prefixes since sometimes readers will not care for quoted text. In the future, we will support parsing diffs (perhaps when repobrowse integration is complete). Note: this roughly doubles the size of the Xapian database due to the additional information; so this change may not be worth it.
2016-09-09	search: drop longer subject: prefix for search
	We only document "s:" anyway. While the long name is more descriptive, the ambiguity makes agnostic caching (by Varnish or similar) slightly harder, and longer URLs are more likely to be accidentally truncated when shared.
2016-09-09	search: allow searching user fields (To/Cc/From)
	Sometimes it can be useful to search based on who the message was sent to, who sent it, or who was Cc:-ed. Of course, headers can be faked, but they usually are not... Anyway, this mostly matches the behavior of mairix(1).
2016-08-18	www: implement generic help text
	Begin documenting some basic help functionality. I may tweak the anchor names of the various HTML endpoints to be more consistent with each other (old ones will be supported for a short while), so I'm not documenting those, for now. This may become part of a builtin key-value store for basic texts, but this probably shouldn't become a wiki engine, either.
2016-08-18	linkify: be stricter about matching RFC 3986
	We're not to-the-letter about percent-encoding, but we should allow all the characters RFC 3986 does. This is mainly so we can correctly link to Wikipedia pages with parentheses in their URLs, such as: https://en.wikipedia.org/wiki/Atom_(standard) https://en.wikipedia.org/wiki/Git_(software)
2016-08-16	search: add YYYYMMDD search range via "d:" prefix
	This is similar to mairix in that it uses a "d:" prefix, but it only takes YYYYMMDD for now. Using custom date/time parsers via Perl will be much more work (see nntp://news.gmane.org/20151005222157.GE5880@survex.com). Anyhow, this ought to be more human-friendly than searching by Unix timestamps, but taking advantage of it requires reindexing.
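	Fixed-width YYYYMMDD values have a handy property: string order equals chronological order, so a range check needs no date arithmetic. A Python sketch assuming a "d:START..END" syntax with optional open ends (the ".." form is the usual Xapian range convention, but the exact syntax and helper names here are assumptions):

```python
from datetime import datetime


def parse_d_range(term):
    """Split a "d:START..END" term into (start, end) YYYYMMDD strings.

    The ".." separator and open-ended bounds are assumptions of this sketch.
    """
    if not term.startswith("d:"):
        raise ValueError(f"not a d: term: {term!r}")
    start, sep, end = term[2:].partition("..")
    if not sep:
        raise ValueError(f"expected START..END: {term!r}")
    for bound in (start, end):
        if bound:
            if len(bound) != 8 or not bound.isdigit():
                raise ValueError(f"expected YYYYMMDD: {bound!r}")
            datetime.strptime(bound, "%Y%m%d")  # rejects e.g. month 13
    return start, end


def date_matches(yyyymmdd, term):
    """Fixed-width YYYYMMDD strings sort chronologically, so compare as strings."""
    start, end = parse_d_range(term)
    return (not start or yyyymmdd >= start) and (not end or yyyymmdd <= end)
```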
2016-08-15	import: use common address parsing to drop unnecessary quotes
	Not sure why or how I missed this before; but the common address parsing routine we have should be more correct. Add a test to ensure excessively quoted names don't make it through, either.
2016-08-14	www: do not unnecessarily escape some chars in paths
	Based on reading RFC 3986, it seems '@', ':', '!', '$', '&', "'", '(', ')', '*', '+', ',', ';', and '=' are all allowed in path-absolute, which is where we have the Message-ID. In any case, '@' seems fairly common in path components nowadays, and it is too common in Message-IDs to be worth escaping.
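	In RFC 3986, a path segment character ("pchar") is an unreserved character, a percent-encoding, a sub-delim, ':' or '@'. A Python sketch using urllib.parse.quote to escape a Message-ID with exactly that set left alone (the helper name is hypothetical):

```python
from urllib.parse import quote

# RFC 3986: pchar = unreserved / pct-encoded / sub-delims / ":" / "@"
# quote() never escapes unreserved characters (letters, digits, "-._~"),
# so only ":", "@", and the sub-delims need to be marked safe. "/" is left
# out so a Message-ID stays within a single path segment.
PCHAR_SAFE = ":@!$&'()*+,;="


def mid_path_segment(mid):
    """Escape a Message-ID for use as one path segment (hypothetical helper)."""
    return quote(mid, safe=PCHAR_SAFE)
```

	Note that "%" is still escaped, since a literal percent sign would otherwise be read as the start of a percent-encoding.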
2016-08-12	www: allow including links to NNTP sites in HTML footer
	Improve the discoverability of NNTP endpoints for users who still know what NNTP is.
	==> ~/.public-inbox/config <==
	; aliases for the locally-run nntpd can be specified in
	; the "publicinbox" section:
	[publicinbox]
		nntpserver = nntp://ou63pmih66umazou.onion/
		nntpserver = news.public-inbox.org
		; NNTPS is not supported natively, yet,
		; but one can use haproxy or similar
		; nntpserver = nntps://news.public-inbox.invalid/
	; mirrors for specific inboxes may be specified either as full
	; NNTP (or NNTPS) URLs, or with the server name only if the
	; newsgroup name is specified for a local NNTP server
	[publicinbox "git"]
		...
		newsgroup = inbox.a.b.c
		nntpmirror = nntp://czquwvybam4bgbro.onion/
		nntpmirror = hjrcffqmbrq6wope.onion
		; there may be a mirror on a different server with a
		; different name:
		nntpmirror = nntp://news.example.com/differently.named.group
	; (And I really need to write manpages for all this...)
2016-08-12	config: do not nest multi-value altid arrays
	Oops. We will inevitably need to support multiple altids for a public-inbox one day.
2016-08-11	search: support alt-ID for mapping legacy serial numbers
	For some existing mailing list archives, messages are identified by serial number (such as NNTP article numbers in gmane). Those links may become inaccessible (as is the current case for gmane), so ensure users can still search based on old serial numbers.
	Now, I run the following periodically to get article numbers from gmane (while news.gmane.org remains):
		NNTPSERVER=news.gmane.org
		export NNTPSERVER
		GROUP=gmane.comp.version-control.git
		perl -I lib scripts/xhdr-num2mid $GROUP --msgmap=/path/to/gmane.sqlite3
	(I might integrate this further with public-inbox-* scripts one day.)
	My ~/.public-inbox/config has an added "altid" snippet and now looks like this:
		[publicinbox "git"]
			address = git@vger.kernel.org
			mainrepo = /path/to/git.vger.git
			newsgroup = inbox.comp.version-control.git
			; relative pathnames expand to $mainrepo/public-inbox/$file
			altid = serial:gmane:file=gmane.sqlite3
	I also run "public-inbox-index --reindex /path/to/git.vger.git" periodically.
	This ought to allow searching for "gmane:12345" to work for Xapian-enabled instances.
	Disclaimer: while public-inbox supports NNTP and stable article serial numbers, use of those for public links is discouraged since it encourages centralization.
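	A Python sketch of the lookup side. It parses an altid value of the "serial:gmane:file=gmane.sqlite3" form, expands a relative path to $mainrepo/public-inbox/$file, and resolves a "gmane:12345" query. The table layout msgmap(num, mid) and every helper name are assumptions, not public-inbox's actual schema or code.

```python
import os
import sqlite3


def parse_altid(spec):
    """Parse an altid value of the form "serial:PREFIX:file=PATH"."""
    kind, prefix, rest = spec.split(":", 2)
    if kind != "serial" or not rest.startswith("file="):
        raise ValueError(f"unsupported altid: {spec!r}")
    return prefix, rest[len("file="):]


def altid_path(mainrepo, path):
    """Relative paths expand to $mainrepo/public-inbox/$file."""
    if os.path.isabs(path):
        return path
    return os.path.join(mainrepo, "public-inbox", path)


def lookup_mid(db, query, prefix):
    """Map "PREFIX:NUMBER" to a Message-ID, or None if unknown.

    Assumes a table msgmap(num INTEGER PRIMARY KEY, mid TEXT); the real
    schema may differ.
    """
    qprefix, sep, num = query.partition(":")
    if not sep or qprefix != prefix or not num.isdigit():
        return None
    row = db.execute("SELECT mid FROM msgmap WHERE num = ?", (int(num),)).fetchone()
    return row[0] if row else None
```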
2016-08-09	searchidx: release Xapian FDs before spawning git log
	Because some FDs lack FD_CLOEXEC, they would otherwise be inherited by the spawned process; releasing them first allows us to release and re-acquire Xapian locks.
2016-08-05	http: do not allow bad getline+close responses to kill us
	PSGI applications (like our WWW :P) can fail unpredictably, but let's try to avoid bringing the entire process down when this happens.
2016-07-30	t/config_limiter: fix check for identical Git object
	If we completely undef an object, a newly created object may end up with the same scalar address as the original even though they are different. So keep the same object around and only force creation of the same reference. Tested on Perl 5.14.2 on Debian 7.x wheezy.
2016-07-26	learn: fix uninitialized variable
	Oops :x
2016-07-26	mda: fix address matching in address lists
	This is common when multiple participants are in a thread.
2016-07-09	www: add configurable limiters
	Currently only for git-http-backend use, this allows limiting the number of spawned processes per-inbox or by group, if there are multiple large inboxes amidst a sea of small ones.
	For example, a "big" limiter could be used for big inboxes and shared between multiple repos:
		[limiter "big"]
			max = 4
		[publicinbox "git"]
			address = git@vger.kernel.org
			mainrepo = /path/to/git.git
			; shared limiter with giant:
			httpbackendmax = big
		[publicinbox "giant"]
			address = giant@project.org
			mainrepo = /path/to/giant.git
			; shared limiter with git:
			httpbackendmax = big
		; This is a tiny inbox; use the default limiter with 32 slots:
		[publicinbox "meta"]
			address = meta@public-inbox.org
			mainrepo = /path/to/meta.git
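	A Python sketch of how the configuration above might map to shared limiters, with one non-blocking semaphore per named limiter. Limiter and build_limiters are hypothetical names, and having all inboxes without httpbackendmax share a single default limiter is an assumption; a real server would queue a request that cannot get a slot rather than simply refuse it.

```python
import threading

DEFAULT_MAX = 32  # default slot count mentioned in the commit message


class Limiter:
    """Caps the number of concurrently spawned processes (hypothetical sketch)."""

    def __init__(self, max_procs):
        self.max = max_procs
        self._slots = threading.BoundedSemaphore(max_procs)

    def try_acquire(self):
        """Take a slot without blocking; False means the caller must wait."""
        return self._slots.acquire(blocking=False)

    def release(self):
        self._slots.release()


def build_limiters(limiter_max, inbox_limiter):
    """Map inbox name -> Limiter.

    limiter_max:   {"big": 4}                   from [limiter "NAME"] max = N
    inbox_limiter: {"git": "big", "meta": None} from httpbackendmax = NAME
    Inboxes without httpbackendmax share one default limiter (an assumption).
    """
    default = Limiter(DEFAULT_MAX)
    named = {name: Limiter(n) for name, n in limiter_max.items()}
    return {
        inbox: named[name] if name else default
        for inbox, name in inbox_limiter.items()
    }
```

	Because "git" and "giant" map to the same Limiter object, four concurrent git-http-backend processes across both inboxes exhaust it, while "meta" still has its own pool.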
2016-07-09	qspawn: allow configurable limiters
	And bump the default limit to 32 so we match git-daemon behavior. This shall allow us to configure different levels of concurrency for different repositories and prevent clones of giant repos from stalling service to small repos.
2016-07-07	www: remove old footer generation code and normalize new.html
	We now generate all of our HTML using WwwStream which forces us to have consistent headers and footers in the HTML itself. This also makes the search-capable vs search-less installs go to the new.html endpoint to maintain consistency (in case an admin decides to enable Xapian).
2016-07-07	t/git-http-backend: check BSD::Resource availability
	We should not fail tests when this is not available.
2016-07-06	address: attempt to handle comments somewhat
	They're uncommon, fortunately, but we make no attempt to handle nested comments (which would open us up to things like CVE-2015-7686) or use the comment in place of a missing name.
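	A Python sketch of removing only simple, non-nested comments with a single regular expression (the helper name is hypothetical). Like the commit, it does not fall back to the comment text as a name; it also ignores quoted strings, so a quoted name containing parentheses would be mangled.

```python
import re

# One non-nested comment such as "(Work)"; nested comments are deliberately
# not handled, and [^()]* cannot span an inner "(" or ")".
COMMENT_RE = re.compile(r"\s*\([^()]*\)")


def strip_comments(address):
    """Remove simple parenthesized comments from an address (sketch only)."""
    return COMMENT_RE.sub("", address).strip()
```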
2016-07-02	nntp: respect 3 minute idle time for shutdown
	This avoids breaking clients on graceful shutdown since NNTP responses should usually be quick.
2016-07-02	www: remove Plack::Request dependency entirely
	Lighter and ever-so-slightly faster! Most importantly, this won't do non-obvious stuff behind our backs like trying to parse a POST request body for a query string param.
2016-07-02	inbox: base_url method takes PSGI env hashref instead
	This is lighter and we can work further towards eliminating our Plack::Request dependency entirely.
2016-07-01	address: filter out domain from address-as-name idents
	It seems common for address entries to end up as:
		"foo@example" <foo@example>
	Avoid needlessly displaying the domain name in that case.
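	A minimal Python sketch of the idea: when the display name is itself an address, drop everything from "@" on. The helper name and the regular expression are assumptions, not public-inbox's Address code.

```python
import re


def display_name(name):
    """If a "name" is really an address, drop the @domain part (sketch only)."""
    name = name.strip().strip('"')
    return re.sub(r"@\S+", "", name)
```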
2016-07-01	t/watch_maildir: quiet down spam check warning
	Probably better than bloating our own API with configurable warning streams and such...
2016-06-30	www_stream: add response wrapper sub
	This encapsulates an entire PSGI response array, hopefully making it easier to generate responses and avoid typos when setting the Content-Type.
2016-06-25	view: safer and optional quoting for --in-reply-to arg
	Angle brackets around the --in-reply-to= arg for git send-email have been optional since git v1.5.3.2, so strip them and make the command-line argument easier to type.
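	A Python sketch of the same idea, using shlex.quote so that shell quoting appears only when the Message-ID actually needs it (the helper name is hypothetical; public-inbox's view code is Perl):

```python
import shlex


def in_reply_to_arg(mid):
    """Build a git send-email --in-reply-to= argument without angle brackets."""
    mid = mid.strip()
    if mid.startswith("<") and mid.endswith(">"):
        mid = mid[1:-1]
    # shlex.quote() returns the string unchanged if it has no shell-special
    # characters, and wraps it in single quotes otherwise.
    return "--in-reply-to=" + shlex.quote(mid)
```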
2016-06-25	address: remove Address::from_name
	Address::names is sufficient to handle what from_name did.
2016-06-25	address: beef up the module with name list extraction
	We may remove from_name in the future. ...And disallow quotes in email addresses. Technically I believe they're allowed, but they're definitely uncommon and unlikely to show up in legitimate mail.
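	A Python sketch of name-list extraction built on the standard library's email.utils.getaddresses, so it is not public-inbox's own Address module. Falling back to the local part when a name is missing or merely repeats the address is an assumption.

```python
from email.utils import getaddresses


def names(header_value):
    """Return display names from an address-list header such as To: or Cc:."""
    result = []
    for name, addr in getaddresses([header_value]):
        if not name and not addr:
            continue  # skip empty or unparseable entries
        if not name or name == addr:
            name = addr.split("@", 1)[0]  # fall back to the local part
        result.append(name)
    return result
```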
2016-06-24	watch_maildir: implement optional spam checking
	Mailing lists I watch and mirror may not have the best spam filtering, and an extra layer should not hurt.
2016-06-24	document Filesys::Notify::Simple dependency
	And improve documentation for existing dependencies, too.
2016-06-24	split out spamcheck/spamc to its own module.
	This should hopefully make it easier to try other anti-spam systems (or none at all) in the future.
2016-06-20	inbox: move field population logic to initializer
	Inboxes are normally created by Config, but having the population logic in Inbox should make it easier to mock for testing.