public-inbox.git - an "archives first" approach to mailing lists

Date	Commit message
2016-09-09	t/httpd-unix: warn about connection failure
	Output $! for diagnostic purposes since I've noticed this on two slow machines, today (and seemingly, never prior).
2016-09-09	search: index attachment filenames
	And while we're at it, ensure searching inside displayable attachment bodies works.
2016-09-09	search: more granular message body searching
	"bs:" and "b:" are adapted from mairix(1). We will also support searching explicitly for quoted vs non-quoted text via "q:" and "nq:" prefixes since sometimes readers will not care for quoted text. In the future, we will support parsing diffs (perhaps when repobrowse integration is complete). Note: this roughly doubles the size of the Xapian database due to the additional information, so this change may not be worth it.
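	To illustrate the quoted vs non-quoted split behind "q:" and "nq:", here is a minimal sketch. It assumes a line counts as quoted when it starts with ">" (after optional whitespace); the function name split_quoted and that rule are hypothetical, not the indexer's actual code.

```python
def split_quoted(body):
    """Split a plain-text message body into (quoted, non_quoted) line lists.

    Assumption: a line is "quoted" if it starts with '>' after optional
    leading whitespace. Real-world quoting styles vary more than this.
    """
    quoted, non_quoted = [], []
    for line in body.splitlines():
        if line.lstrip().startswith(">"):
            quoted.append(line)
        else:
            non_quoted.append(line)
    return quoted, non_quoted


body = "> old text\nnew reply\n>> older text\n"
quoted, non_quoted = split_quoted(body)
```

	Each list could then be indexed under its own prefix, which is also why the database grows: the same text ends up indexed more than once.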
2016-09-09	search: drop longer subject: prefix for search
	We only document the "s:" anyways. While the long name is more descriptive, the ambiguity makes agnostic caching (by Varnish or similar) slightly harder and longer URLs are more likely to be accidentally truncated when shared.
2016-09-09	search: allow searching user fields (To/Cc/From)
	Sometimes it can be useful to search based on who the message was sent to, sent by, or Cc:-ed. Of course, headers can be faked, but they usually are not... Anyways this mostly matches the behavior of mairix(1).
2016-08-18	www: implement generic help text
	Begin documenting some basic help functionality. I may tweak the anchor names of the various HTML endpoints to be more consistent with each other (old ones will be supported for a short while), so I'm not documenting those, for now. This may become part of a builtin key-value store for basic texts, but this probably shouldn't become a wiki engine, either.
2016-08-18	linkify: be stricter about matching RFC 3986
	We're not to-the-letter about percent-encoding, but we should allow all the characters. This is mainly so we can effectively use the link to some Wikipedia pages with parentheses in them: https://en.wikipedia.org/wiki/Atom_(standard) https://en.wikipedia.org/wiki/Git_(software)
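	A rough sketch of a link matcher that accepts the characters RFC 3986 allows, including parentheses. The pattern name URL_RE is hypothetical and this is not the project's actual regexp; trailing sentence punctuation is deliberately not handled here.

```python
import re

# Character class: RFC 3986 unreserved + sub-delims, plus ':' '@' '/' '?'
# '#' and '%' (for percent-encoded octets). Assumption: http(s) only.
URL_RE = re.compile(r"""https?://[A-Za-z0-9\-._~!$&'()*+,;=:@/?#%]+""")

text = "see https://en.wikipedia.org/wiki/Git_(software) for details"
links = URL_RE.findall(text)
```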
2016-08-16	search: add YYYYMMDD search range via "d:" prefix
	This is similar to mairix in that it uses a "d:" prefix; but only takes YYYYMMDD, for now. Using custom date/time parsers via Perl will be much more work: nntp://news.gmane.org/20151005222157.GE5880@survex.com Anyhow, this ought to be more human-friendly than searching by Unix timestamps, but it requires reindexing to take advantage of.
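	A minimal sketch of parsing such a term. The "START..END" range separator and open-ended ranges are assumptions for illustration (the message above only states the "d:" prefix and the YYYYMMDD format), and both function names are hypothetical.

```python
from datetime import datetime, timezone


def parse_ymd(s):
    """Parse a strict 8-digit YYYYMMDD string into a UTC datetime."""
    if len(s) != 8 or not s.isdigit():
        raise ValueError("expected YYYYMMDD, got %r" % s)
    return datetime.strptime(s, "%Y%m%d").replace(tzinfo=timezone.utc)


def parse_date_range(term):
    """Parse 'd:START..END' into (start, end); an empty side becomes None."""
    if not term.startswith("d:"):
        raise ValueError("not a d: term: %r" % term)
    start, sep, end = term[2:].partition("..")
    if not sep:
        raise ValueError("expected d:START..END")
    return (parse_ymd(start) if start else None,
            parse_ymd(end) if end else None)


start, end = parse_date_range("d:20160801..20160816")
```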
2016-08-15	import: use common address parsing to drop unnecessary quotes
	Not sure why or how I missed this before; but the common address parsing routine we have should be more correct. Add a test to ensure excessively quoted names don't make it through, either.
2016-08-14	www: do not unnecessarily escape some chars in paths
	Based on reading RFC 3986, it seems '@', ':', '!', '$', '&', "'", '(', ')', '*', '+', ',', ';', '=' are all allowed in path-absolute where we have the Message-ID. In any case, it seems '@' is fairly common in path components nowadays and too common in Message-IDs.
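	A short sketch of escaping a Message-ID for a path segment while leaving the characters listed above alone. It uses Python's urllib.parse.quote rather than the project's own escaping routine; mid_to_path and PATH_SAFE are hypothetical names.

```python
from urllib.parse import quote

# Characters RFC 3986 allows unescaped in a path segment in addition to
# the unreserved set: the sub-delims plus ':' and '@'.
PATH_SAFE = "@:!$&'()*+,;="


def mid_to_path(message_id):
    """Percent-encode a Message-ID for use as a single URL path segment."""
    return quote(message_id, safe=PATH_SAFE)


plain = mid_to_path("20160814@example.com")
slashed = mid_to_path("a/b@example.com")
```

	Note that '/' is still escaped, since it would otherwise split the Message-ID across path segments.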
2016-08-12	www: allow including links to NNTP sites in HTML footer
	Improve the discoverability of NNTP endpoints for users who still know what NNTP is.
	==> ~/.public-inbox/config <==
	; aliases for the locally-run nntpd can be specified in
	; the "publicinbox" section:
	[publicinbox]
		nntpserver = nntp://ou63pmih66umazou.onion/
		nntpserver = news.public-inbox.org
		; NNTPS is not supported natively, yet,
		; but one can use haproxy or similar
		; nntpserver = nntps://news.public-inbox.invalid/
	; mirrors for specific inboxes may be specified either as full
	; NNTP (or NNTPS) URLs, or with the server name only if the
	; newsgroup name is specified for a local NNTP server
	[publicinbox "git"]
		...
		newsgroup = inbox.a.b.c
		nntpmirror = nntp://czquwvybam4bgbro.onion/
		nntpmirror = hjrcffqmbrq6wope.onion
		; there may be a mirror on a different server with a
		; different name:
		nntpmirror = nntp://news.example.com/differently.named.group
	; (And I really need to write manpages for all this...)
2016-08-12	config: do not nest multi-value altid arrays
	Oops. We will inevitably need to support multiple altids for a public-inbox one day.
2016-08-11	search: support alt-ID for mapping legacy serial numbers
	For some existing mailing list archives, messages are identified by serial number (such as NNTP article numbers in gmane). Those links may become inaccessible (as is the current case for gmane), so ensure users can still search based on old serial numbers.
	Now, I run the following periodically to get article numbers from gmane (while news.gmane.org remains):
		NNTPSERVER=news.gmane.org
		export NNTPSERVER
		GROUP=gmane.comp.version-control.git
		perl -I lib scripts/xhdr-num2mid $GROUP --msgmap=/path/to/gmane.sqlite3
	(I might integrate this further with public-inbox-* scripts one day).
	My ~/.public-inbox/config has an added "altid" snippet which now looks like this:
		[publicinbox "git"]
			address = git@vger.kernel.org
			mainrepo = /path/to/git.vger.git
			newsgroup = inbox.comp.version-control.git
			; relative pathnames expand to $mainrepo/public-inbox/$file
			altid = serial:gmane:file=gmane.sqlite3
	And run "public-inbox-index --reindex /path/to/git.vger.git" periodically. This ought to allow searching for "gmane:12345" to work for Xapian-enabled instances.
	Disclaimer: while public-inbox supports NNTP and stable article serial numbers, use of those for public links is discouraged since it encourages centralization.
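	To make the "altid" value easier to read, here is a sketch of how "serial:gmane:file=gmane.sqlite3" breaks down. It only handles the single "file=" option shown in this entry; parse_altid is a hypothetical helper, not the project's config parser.

```python
import posixpath


def parse_altid(spec, mainrepo):
    """Split 'TYPE:PREFIX:file=PATH' into its parts.

    Relative paths expand to $mainrepo/public-inbox/$file, as the
    config comment in this entry states. Anything beyond a single
    'file=' option is outside the scope of this sketch.
    """
    kind, prefix, option = spec.split(":", 2)
    key, _, value = option.partition("=")
    if kind != "serial" or key != "file" or not value:
        raise ValueError("unsupported altid spec: %r" % spec)
    if not posixpath.isabs(value):
        value = posixpath.join(mainrepo, "public-inbox", value)
    return {"type": kind, "prefix": prefix, "file": value}


altid = parse_altid("serial:gmane:file=gmane.sqlite3",
                    "/path/to/git.vger.git")
```

	The prefix ("gmane" here) is what makes a query such as "gmane:12345" possible.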
2016-08-09	searchidx: release Xapian FDs before spawning git log
	This will allow us to release and re-acquire Xapian locks due to the lack of FD_CLOEXEC on some FDs.
2016-08-05	http: do not allow bad getline+close responses to kill us
	PSGI applications (like our WWW :P) can fail unpredictably, but let's try to avoid bringing the entire process down when this happens.
2016-07-30	t/config_limiter: fix check for identical Git object
	If we completely undef an object, it is likely possible to have the same scalar address as the original object even if they are different. So keep the same object around and only force creation of the same reference. Tested on Perl 5.14.2 on Debian 7.x wheezy.
2016-07-26	learn: fix uninitialized variable
	Oops :x
2016-07-26	mda: fix address matching in address lists
	This is common when multiple participants are in a thread.
2016-07-09	www: add configurable limiters
	Currently only for git-http-backend use, this allows limiting the number of spawned processes per-inbox or by group, if there are multiple large inboxes amidst a sea of small ones.
	For example, a "big" repo limiter, shared between multiple repos, could be used for big inboxes:
		[limiter "big"]
			max = 4
		[publicinbox "git"]
			address = git@vger.kernel.org
			mainrepo = /path/to/git.git
			; shared limiter with giant:
			httpbackendmax = big
		[publicinbox "giant"]
			address = giant@project.org
			mainrepo = /path/to/giant.git
			; shared limiter with git:
			httpbackendmax = big
		; This is a tiny inbox, use the default limiter with 32 slots:
		[publicinbox "meta"]
			address = meta@public-inbox.org
			mainrepo = /path/to/meta.git
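	The shared-limiter idea in this entry can be sketched as a slot counter with a wait queue. This is an illustration only: the Limiter class and its start/finish methods are hypothetical, and a plain callable stands in for spawning a git-http-backend process.

```python
from collections import deque


class Limiter:
    """Allow at most max_slots jobs to run at once; queue the rest."""

    def __init__(self, max_slots=32):
        self.max = max_slots
        self.running = 0
        self.queue = deque()

    def start(self, job):
        """Run job now if a slot is free, else queue it."""
        if self.running < self.max:
            self.running += 1
            job()
            return True
        self.queue.append(job)
        return False

    def finish(self):
        """Release a slot and start the next queued job, if any."""
        self.running -= 1
        if self.queue:
            self.running += 1
            self.queue.popleft()()


# "git" and "giant" share one limiter; "meta" gets its own default.
big = Limiter(max_slots=4)
limiters = {"git": big, "giant": big, "meta": Limiter()}
```

	Because "git" and "giant" point at the same object, a burst of clones against either one consumes the same four slots, while "meta" is unaffected.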
2016-07-09	qspawn: allow configurable limiters
	And bump the default limit to 32 so we match git-daemon behavior. This shall allow us to configure different levels of concurrency for different repositories and prevent clones of giant repos from stalling service to small repos.
2016-07-07	www: remove old footer generation code and normalize new.html
	We now generate all of our HTML using WwwStream which forces us to have consistent headers and footers in the HTML itself. This also makes the search-capable vs search-less installs go to the new.html endpoint to maintain consistency (in case an admin decides to enable Xapian).
2016-07-07	t/git-http-backend: check BSD::Resource availability
	We should not fail tests when this is not available.
2016-07-06	address: attempt to handle comments somewhat
	They're uncommon, fortunately, but we make no attempt to handle nested comments (which would open us up to things like CVE-2015-7686) or use the comment in place of a missing name.
2016-07-02	nntp: respect 3 minute idle time for shutdown
	This avoids breaking clients on graceful shutdown since NNTP responses should usually be quick.
2016-07-02	www: remove Plack::Request dependency entirely
	Lighter and ever-so-slightly faster! Most importantly, this won't do non-obvious stuff behind our backs like trying to parse a POST request body for a query string param.
2016-07-02	inbox: base_url method takes PSGI env hashref instead
	This is lighter and we can work further towards eliminating our Plack::Request dependency entirely.
2016-07-01	address: filter out domain from address-as-name idents
	It seems common for address entries to end up as: "foo@example" <foo@example> Avoid needlessly displaying the domain name in that case.
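	A small sketch of the idea, using Python's email.utils.parseaddr instead of the project's own address module. The helper name display_name is hypothetical.

```python
from email.utils import parseaddr


def display_name(header_value):
    """Return a display name, dropping the domain from address-as-name."""
    name, addr = parseaddr(header_value)
    # When the "name" is itself an address, keep only the local part.
    if "@" in name:
        name = name.split("@", 1)[0]
    return name


short = display_name('"foo@example" <foo@example>')
normal = display_name("Foo Bar <foo@example.com>")
```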
2016-07-01	t/watch_maildir: quiet down spam check warning
	Probably better than bloating our own API with configurable warning streams and such...
2016-06-30	www_stream: add response wrapper sub
	This encapsulates an entire PSGI response array, hopefully making it easier to generate responses and avoid typos when setting the Content-Type.
2016-06-25	view: safer and optional quoting for --in-reply-to arg
	Angle brackets around the --in-reply-to= arg for git send-email have been optional since git v1.5.3.2, so strip them and make the command-line argument easier to type.
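	A sketch of what "safer and optional quoting" could look like: strip the angle brackets and shell-quote only when the Message-ID needs it. It relies on Python's shlex.quote; the function name in_reply_to_arg is hypothetical and the actual quoting rules in the view code may differ.

```python
import shlex


def in_reply_to_arg(message_id):
    """Build a copy-pasteable --in-reply-to= argument for git send-email."""
    mid = message_id.strip()
    if mid.startswith("<") and mid.endswith(">"):
        mid = mid[1:-1]
    # shlex.quote() leaves shell-safe strings untouched and wraps the
    # rest in single quotes.
    return "--in-reply-to=" + shlex.quote(mid)


simple = in_reply_to_arg("<20160625@example.com>")
risky = in_reply_to_arg("<a$b@example.com>")
```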
2016-06-25	address: remove Address::from_name
	Address::names is sufficient to handle what from_name did.
2016-06-25	address: beef up the module with name list extraction
	We may remove from_name in the future. ...And disallow quotes in email addresses. Technically I believe they're allowed, but they're definitely uncommon and unlikely to show up in legitimate mail.
2016-06-24	watch_maildir: implement optional spam checking
	Mailing lists I watch and mirror may not have the best spam filtering, and an extra layer should not hurt.
2016-06-24	document Filesys::Notify::Simple dependency
	And improve documentation for existing dependencies, too.
2016-06-24	split out spamcheck/spamc to its own module.
	This should hopefully make it easier to try other anti-spam systems (or none at all) in the future.
2016-06-20	inbox: move field population logic to initializer
	Inboxes are normally created by Config, but having the population logic in Inbox should make it easier to mock for testing.
2016-06-20	feed: various object-orientation cleanups
	Favor Inbox objects as our primary source of truth to simplify our code. This increases our coupling with PSGI to make it easier to write tests in the future. A lot of this code was originally designed to be usable standalone without PSGI or CGI at all; but that might increase development effort.
2016-06-19	watch_maildir: tighten up path checks
	Only mark seen messages as spam, otherwise it could be too aggressive and cause problems or over training. We wouldn't want a wayward FIFO ruining our day, either :)
2016-06-19	watch_maildir: spam removal support
	We can support spam removal by watching a special "spam" Maildir, too. We can run public-inbox-learn as a separate step, and that command will be improved to support auto-learning, too.
2016-06-18	watch_maildir: add scan test
	This should be portable despite the intended use of this directory being non-portable.
2016-06-18	spawn: try to keep signals blocked in spawned child
	While we only want to stop our daemons and gracefully destroy subprocesses, it is common for 'Ctrl-C' from a terminal to kill the entire pgroup. Killing an entire pgroup nukes subprocesses like git-upload-pack and breaks graceful shutdown on long clones. Make a best effort to ensure git-upload-pack processes are not broken when somebody signals an entire process group. Followup-to: commit 37bf2db81bbbe114d7fc5a00e30d3d5a6fa74de5 ("doc: systemd examples should only kill one process")
2016-06-18	view: introduce WwwStream interface
	This will allow us to commonalize HTML generation in the future and is the start of moving existing HTML generation to a "pull" streaming model (from the existing "push" one). Using the getline/close pull model is superior to the existing $fh->write streaming as it allows us to throttle response generation based on backpressure from slow clients.
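	The pull model can be sketched as follows: the server asks the body for the next chunk only when it is ready to send one, then calls close. PSGI allows a response body object with getline and close methods; the LineStream and serve names here are hypothetical and this is not the WwwStream code.

```python
class LineStream:
    """Response body that produces chunks lazily, one per getline()."""

    def __init__(self, chunks):
        self._iter = iter(chunks)
        self.closed = False

    def getline(self):
        """Return the next chunk, or None once the body is exhausted."""
        return next(self._iter, None)

    def close(self):
        self.closed = True


def serve(body, write):
    """Server side: pull a chunk only when we are ready to write it."""
    try:
        while True:
            chunk = body.getline()
            if chunk is None:
                break
            write(chunk)  # a real server would wait for the socket here
    finally:
        body.close()


def page():
    # Generator: nothing is rendered until the server asks for it.
    yield "<html><head></head><body>"
    yield "<pre>message body</pre>"
    yield "</body></html>"


sent = []
stream = LineStream(page())
serve(stream, sent.append)
```

	With the push model, the application calls $fh->write whenever it likes and output can pile up in memory for a slow client. Here the server controls the pace.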
2016-06-17	remove dependency on IPC::Run
	We no longer depend on it for the core code, and tests are optional for users. Hopefully this makes it easier to install.
2016-06-15	inbox: allow undef return value for base_url
	It should be possible to serve the contents of a public-inbox over NNTP but not HTTP.
2016-06-15	mda: hook up new filter functionality
	This removes the Email::Filter dependency as well as the signature-breaking scrubber code. We now prefer to reject unacceptable messages and grudgingly (and blindly) mirror messages we're not the primary endpoint for.
2016-06-15	emergency: implement new emergency Maildir delivery
	This is transactional and hopefully safer in case we hit SIGSEGV or SIGKILL during processing, as the tmp/ copy will remain on the FS even if DESTROY/END handlers are not called.
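	A sketch of a two-step Maildir delivery along these lines: write and fsync under tmp/ first, then move into new/ as a separate commit step. If the process dies in between, the tmp/ copy is still on disk. The prepare/commit names, the file naming scheme and the use of rename are assumptions for illustration, not the module's actual implementation.

```python
import os
import time
import uuid


def prepare(maildir, data):
    """Write data to maildir/tmp/ and flush it to disk; return the path."""
    for sub in ("tmp", "new", "cur"):
        os.makedirs(os.path.join(maildir, sub), exist_ok=True)
    # Hypothetical unique name: timestamp, PID and a random suffix.
    name = "%d.%d.%s" % (time.time(), os.getpid(), uuid.uuid4().hex)
    tmp = os.path.join(maildir, "tmp", name)
    with open(tmp, "xb") as fh:  # "x": fail rather than overwrite
        fh.write(data)
        fh.flush()
        os.fsync(fh.fileno())
    return tmp


def commit(maildir, tmp):
    """Move a prepared message from tmp/ to new/ in one atomic step."""
    new = os.path.join(maildir, "new", os.path.basename(tmp))
    os.rename(tmp, new)  # atomic within one filesystem on POSIX
    return new
```

	A caller would run prepare before doing any risky processing and commit (or delete the tmp/ file) only once the outcome is known.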
2016-06-15	filter: begin work on a new filter API
	This filter API should be independent of Email::Filter and hopefully less intrusive to long running processes.
2016-06-15	mda: precheck no longer depends on Email::Filter
	Email::Filter doesn't offer any functionality we need, here; and our dependency on Email::Filter will gradually be removed since it (and Email::LocalDelivery) seem abandoned and we can have more-fine-grained control by rolling our own Maildir delivery which can work transactionally.
2016-06-15	t/mda: use only Maildir for testing
	Remove mbox tests since mbox is unreliable due to raciness and incompatible implementations. We will drop support for mbox emergency destinations, soon.
2016-06-15	t/mda.t: remove senseless use of Email::Filter
	Totally unnecessary...