public-inbox.git - an "archives first" approach to mailing lists

Date	Commit message
2017-03-14	view: escape HTML description name
	Otherwise funky filenames can cause HTML injection vulnerabilities (hope you have JavaScript disabled!)
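	A minimal sketch of the kind of escaping described above, in Python rather than the project's Perl. `render_description` is a hypothetical helper for illustration, not the project's API:

```python
import html

def render_description(name):
    """Return an HTML fragment with the untrusted name escaped."""
    # html.escape converts &, < and >; quote=True also covers " and '.
    return "<b>" + html.escape(name, quote=True) + "</b>"

hostile = "<script>alert(1)</script>.txt"
safe = render_description(hostile)
```

	The hostile filename ends up as inert text instead of markup.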
2017-02-14	www: do not unescape PATH_INFO twice
	PSGI specs already require PATH_INFO to be unescaped; so our tests were wrong, too.
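	To see why a second unescape is harmful, here is a small Python sketch (the URL and Message-ID are made up). It assumes the server has already percent-decoded the request path once, as PSGI requires for PATH_INFO:

```python
from urllib.parse import unquote

# A Message-ID that literally contains "%25", as sent on the wire.
raw_path = "/inbox/100%2525@example.com/"

path_info = unquote(raw_path)   # what the server hands to the application
double = unquote(path_info)     # the bug: decoding a second time
```

	After one pass the literal "%25" in the Message-ID survives; the second pass silently turns it into "%" and the lookup would miss.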
2017-02-12	t/mime: quiet warnings for old versions of Email::Simple
	This is fixed in the newest versions of Email::Simple, but not the version in Debian jessie (2.203).
2017-01-26	add filter for Subject: tags
	Some mailing lists add annoying tags into the Subject line which discourages readers from doing proper mail organization on the client side. They also waste precious screen space and attention span. Remove them from our archives to reduce clutter.
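	A rough Python sketch of such a filter. The function name and the idea of passing the tags as a list are assumptions for illustration; the actual filter in the tree may work differently:

```python
import re

def strip_subject_tags(subject, tags):
    """Remove each configured [tag] from a Subject line, case-insensitively."""
    for tag in tags:
        pattern = r"\[" + re.escape(tag) + r"\]\s*"
        subject = re.sub(pattern, "", subject, flags=re.IGNORECASE)
    return subject.strip()

cleaned = strip_subject_tags("Re: [foo] [PATCH] fix bug", ["foo"])
```

	Only the listed tags are dropped, so meaningful markers such as [PATCH] stay in place.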
2017-01-18	mime: avoid SUPER usage in Email::MIME subclass
	We must call Email::Simple methods directly in our monkey patch for Email::MIME to call the intended method. Using SUPER in our subclass would instead hit a different, unintended method in Email::MIME. Reported-by: Junio C Hamano <gitster@pobox.com> <xmqq4m0wb43w.fsf@gitster.mtv.corp.google.com>
2017-01-10	introduce PublicInbox::MIME wrapper class
	This should fix problems with multipart messages where text/plain parts lack a header. cf. git clone --mirror https://github.com/rjbs/Email-MIME.git refs/pull/28/head In the future, we may still introduce a streaming interface to reduce memory usage on large emails.
2017-01-07	search: remove subject_summary
	Apparently it never actually got used, and the world seems fine without it, so we can drop it. While we're at it, consider removing our subject_path usage from existence, too. We are not using fancy subject-line based URLs, here.
2017-01-07	config: allow per-inbox nntpserver
	This allows certain inboxes to override the global nntpserver (perhaps under a different domain).
2017-01-07	inbox: eliminate weaken usage entirely
	We can do a better job initializing the data structure so we no longer need to rely on weak references to cleanup when we ditch the config on reload.
2017-01-07	config: always use namespaced "publicinboxlimiter"
	I'm not sure if we'll ever support sharing a config file with other tools, but maybe we will, and "limiter" is too generic.
2016-12-21	searchthread: simplify API and remove needless OO
	This simplifies callers to prevent errors and avoids needless object-orientation in favor of a single procedure call to handle threading and ordering.
2016-12-20	searchmsg: remove ensure_metadata
	Instead, only preload the ->mid field for threading, as we only need ->thread and ->path once in Search->get_thread (but we will need the ->mid field repeatedly). This more than doubles View->load_results performance according to thread-all on an inbox with over 300K messages.
2016-12-20	tests: add thread-all testing for benchmarking
	I'll be using this to improve message threading performance.
2016-12-17	t/config.t: fix feedmax default
	Oops :x
2016-12-17	feed: support publicinbox.<name>.feedmax
	This allows users to customize by using smaller or larger Atom feeds than the default value of 25 entries.
2016-12-14	t/thread-cycle: no need for Xapian to run this test
	We don't actually use anything from SearchMsg, just the class name.
2016-12-13	nntp: add test case for the "DATE" command
	We may not always use strftime and may implement caching. But for now, just add a test.
2016-12-12	init: preserve permissions of existing config file
	This matches git-config(1) behavior, and implied user intent when it comes to programmatically editing files.
2016-12-10	thread: last Reference always wins
	Since we use SearchMsg from Xapian data, we can be assured we do not get a self-referential {references} field. However, we may need to be more careful when checking has_descendent for loops, as blindly calling add_child could open us up to that possibility...
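	The loop concern can be sketched in Python as follows. This is not the SearchThread code; `thread` and `has_descendant` here are hypothetical helpers that only illustrate "last Reference wins" plus a descendant check before linking:

```python
def has_descendant(children, node, target):
    """Return True if target is reachable from node through child links."""
    stack, seen = [node], set()
    while stack:
        cur = stack.pop()
        if cur == target:
            return True
        if cur in seen:
            continue
        seen.add(cur)
        stack.extend(children.get(cur, []))
    return False

def thread(messages):
    """messages: iterable of (mid, [references]); returns a child -> parent map."""
    parent, children = {}, {}
    for mid, refs in messages:
        refs = [r for r in refs if r != mid]   # drop self-references
        if not refs:
            continue
        candidate = refs[-1]                   # last Reference wins
        # Refuse a link that would make mid its own ancestor.
        if has_descendant(children, mid, candidate):
            continue
        parent[mid] = candidate
        children.setdefault(candidate, []).append(mid)
    return parent

links = thread([("a", []), ("b", ["a"]), ("c", ["a", "b"]), ("a", ["c"])])
```

	The final entry tries to hang "a" under its own descendant "c" and is skipped, so no cycle forms.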
2016-12-06	linkify: implement Markdown link compatibility (again)
	Although unescaped parentheses in URLs are technically allowed, they are uncommon. However, Markdown-like syntaxes are unfortunately common for URLs, so we might as well support them. This fixes parentheses detection at sentence endings, as seen in practice on emails.
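	One way to handle both cases is to trim a trailing ")" only while the parentheses inside the URL are unbalanced. The Python sketch below illustrates that idea under my own assumptions (the regex and `find_urls` are hypothetical, not the Linkify code):

```python
import re

# Deliberately loose: anything up to whitespace, quotes or angle brackets.
URL_RE = re.compile(r"https?://[^\s<>\"]+")
TRAILING = ".,;:!?"

def find_urls(text):
    urls = []
    for match in URL_RE.finditer(text):
        url = match.group(0).rstrip(TRAILING)
        # Drop closing parentheses that have no opening partner in the URL.
        while url.endswith(")") and url.count("(") < url.count(")"):
            url = url[:-1].rstrip(TRAILING)
        urls.append(url)
    return urls

text = ("see [Atom](https://en.wikipedia.org/wiki/Atom_(standard)). "
        "Also (https://example.com/).")
found = find_urls(text)
```

	Balanced parentheses that belong to the URL are kept, while the Markdown or sentence-ending ")" is dropped.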
2016-12-06	Revert "linkify: implement Markdown link compatibility"
	This reverts commit 130d0c4e33c5c73dc69e270fc698735d49e0f159.
2016-12-06	linkify: implement Markdown link compatibility
	Although unescaped parentheses in URLs are technically allowed, they are uncommon. However, Markdown-like syntaxes are unfortunately common for URLs, so we might as well support them.
2016-12-03	atom: switch to getline/close for response bodies
	This will let us stream larger Atom document bodies without wasting too much memory and reduce the amount of round-trip requests needed to get necessary information. Hopefully clients are using streaming (SAX) parsers, too. This is the final transition in the core public-inbox code to allow migrating to a "pull"-based body streaming scheme which allows an HTTP server to respond appropriately to backpressure from slow clients.
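	The shape of a getline/close body can be sketched in Python. The class and the `serve` loop are hypothetical stand-ins for a PSGI body object and server, meant only to show the "pull" direction of control:

```python
class ChunkedBody:
    """Pull-style body: the server asks for one chunk at a time."""

    def __init__(self, entries):
        self._iter = iter(entries)
        self.closed = False

    def getline(self):
        # Return the next chunk, or None once the body is exhausted.
        if self.closed:
            return None
        return next(self._iter, None)

    def close(self):
        self.closed = True
        self._iter = iter(())

def serve(body, write):
    """Hypothetical server loop: pull a chunk only when ready to write it."""
    try:
        while True:
            chunk = body.getline()
            if chunk is None:
                break
            write(chunk)
    finally:
        body.close()

out = []
body = ChunkedBody("<entry>%d</entry>" % i for i in range(3))
serve(body, out.append)
```

	Because entries are generated lazily, only one chunk is held in memory at a time, and a slow client simply delays the next getline call.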
2016-11-26	avoid IO::File for anonymous temporary files
	We do not need to import IO::File into the main programs since Perl 5.8+ supports literal "undef" for generating anonymous temporary file handles.
2016-10-05	t/thread-cycle: test self-referential messages
	Some broken (or malicious) mailers may include a message's own generated Message-ID in its References header, so be prepared for it.
2016-10-05	thread: use hash + array instead of hand-rolled linked list
	This starts to show noticeable performance improvements when attempting to thread over 400 messages; but the improvement may not be measurable with fewer. However, the resulting code is much shorter and (IMHO) much easier to understand.
2016-10-05	thread: remove Email::Abstract wrapping
	This roughly doubles performance due to the reduction in object creation and abstraction layers.
2016-10-05	thread: remove Mail::Thread dependency
	Introduce our own SearchThread class for threading messages. This should allow us to specialize and optimize away objects in future commits.
2016-09-09	t/httpd-unix: warn about connection failure
	Output $! for diagnostic purposes since I've noticed this on two slow machines, today (and seemingly, never prior).
2016-09-09	search: index attachment filenames
	And while we're at it, ensure searching inside displayable attachment bodies works.
2016-09-09	search: more granular message body searching
	"bs:" and "b:" are adapted from mairix(1) We will also support searching explicitly for quoted vs non-quoted text via "q:" and "nq:" prefixes since sometimes readers will not care for quoted text. In the future, we will support parsing diffs (perhaps when repobrowse integration is complete). Note: this roughly doubles the size of the Xapian database due to the additional information; so this change may not be worth it.
2016-09-09	search: drop longer subject: prefix for search
	We only document the "s:" anyways. While the long name is more descriptive, the ambiguity makes agnostic caching (by Varnish or similar) slightly harder and longer URLs are more likely to be accidentally truncated when shared.
2016-09-09	search: allow searching user fields (To/Cc/From)
	Sometimes it can be useful to search based on who the message was sent to, sent by, or Cc:-ed. Of course, headers can be faked, but they usually are not... Anyways this mostly matches the behavior of mairix(1).
2016-08-18	www: implement generic help text
	Begin documenting some basic help functionality. I may tweak the anchor names of the various HTML endpoints to be more consistent with each other (old ones will be supported for a short while), so I'm not documenting those, for now. This may become part of a builtin key-value store for basic texts, but this probably shouldn't become a wiki engine, either.
2016-08-18	linkify: be stricter about matching RFC 3986
	We're not to-the-letter about percent-encoding, but we should allow all the characters. This is mainly so we can effectively use the link to some Wikipedia pages with parentheses in them: https://en.wikipedia.org/wiki/Atom_(standard) https://en.wikipedia.org/wiki/Git_(software)
2016-08-16	search: add YYYYMMDD search range via "d:" prefix
	This is similar to mairix in that it uses a "d:" prefix; but only takes YYYYMMDD, for now. Using custom date/time parsers via Perl will be much more work: nntp://news.gmane.org/20151005222157.GE5880@survex.com Anyhow, this ought to be more human-friendly than searching by Unix timestamps, but it requires reindexing to take advantage of.
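	As an illustration of what a YYYYMMDD range has to become before it can be compared against message timestamps, here is a Python sketch. The "d:START..END" spelling and the `parse_d_range` helper are assumptions on my part, not taken from the commit:

```python
from datetime import datetime, timezone

def parse_d_range(term):
    """Parse 'd:YYYYMMDD..YYYYMMDD' into (start, end) Unix timestamps in UTC."""
    if not term.startswith("d:"):
        raise ValueError("not a d: term: %r" % term)
    start_s, sep, end_s = term[2:].partition("..")
    if not sep:
        raise ValueError("expected START..END in %r" % term)

    def to_ts(day):
        parsed = datetime.strptime(day, "%Y%m%d")
        return int(parsed.replace(tzinfo=timezone.utc).timestamp())

    return to_ts(start_s), to_ts(end_s)

start, end = parse_d_range("d:20160101..20160102")
```

	Both ends are taken as midnight UTC, so the range above spans exactly one day.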
2016-08-15	import: use common address parsing to drop unnecessary quotes
	Not sure why or how I missed this before; but the common address parsing routine we have should be more correct. Add a test to ensure excessively quoted names don't make it through, either.
2016-08-14	www: do not unnecessarily escape some chars in paths
	Based on reading RFC 3986, it seems '@', ':', '!', '$', '&', "'", '(', ')', '*', '+', ',', ';', '=' are all allowed in path-absolute where we have the Message-ID. In any case, it seems '@' is fairly common in path components nowadays and too common in Message-IDs.
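	The same policy in Python, as a sketch (`mid_to_path` is a hypothetical name): characters RFC 3986 allows in a path segment are left alone, and everything else outside the unreserved set, including "/", is percent-encoded.

```python
from urllib.parse import quote

# ':' and '@' plus the RFC 3986 sub-delims, all legal in a path segment.
PATH_SAFE = "@:!$&'()*+,;="

def mid_to_path(mid):
    """Percent-encode a Message-ID for use as one URL path component."""
    return quote(mid, safe=PATH_SAFE)

plain = mid_to_path("20160814.GA1@example.com")
odd = mid_to_path("a/b c@x")
```

	Typical Message-IDs pass through unchanged, which keeps URLs short and readable.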
2016-08-12	www: allow including links to NNTP sites in HTML footer
	Improve the discoverability of NNTP endpoints for users who still know what NNTP is.
	==> ~/.public-inbox/config <==
	; aliases for the locally-run nntpd can be specified in
	; the "publicinbox" section:
	[publicinbox]
		nntpserver = nntp://ou63pmih66umazou.onion/
		nntpserver = news.public-inbox.org
		; NNTPS is not supported natively, yet,
		; but one can use haproxy or similar
		; nntpserver = nntps://news.public-inbox.invalid/
	; mirrors for specific inboxes may be specified either as full
	; NNTP (or NNTPS) URLs, or with the server name only if the
	; newsgroup name is specified for a local NNTP server
	[publicinbox "git"]
		...
		newsgroup = inbox.a.b.c
		nntpmirror = nntp://czquwvybam4bgbro.onion/
		nntpmirror = hjrcffqmbrq6wope.onion
		; there may be a mirror on a different server with a
		; different name:
		nntpmirror = nntp://news.example.com/differently.named.group
	; (And I really need to write manpages for all this...)
2016-08-12	config: do not nest multi-value altid arrays
	Oops. We will inevitably need to support multiple altids for a public-inbox one day.
2016-08-11	search: support alt-ID for mapping legacy serial numbers
	For some existing mailing list archives, messages are identified by serial number (such as NNTP article numbers in gmane). Those links may become inaccessible (as is the current case for gmane), so ensure users can still search based on old serial numbers.
	Now, I run the following periodically to get article numbers from gmane (while news.gmane.org remains):
		NNTPSERVER=news.gmane.org
		export NNTPSERVER
		GROUP=gmane.comp.version-control.git
		perl -I lib scripts/xhdr-num2mid $GROUP --msgmap=/path/to/gmane.sqlite3
	(I might integrate this further with public-inbox-* scripts one day).
	My ~/.public-inbox/config has an added "altid" snippet which now looks like this:
		[publicinbox "git"]
			address = git@vger.kernel.org
			mainrepo = /path/to/git.vger.git
			newsgroup = inbox.comp.version-control.git
			; relative pathnames expand to $mainrepo/public-inbox/$file
			altid = serial:gmane:file=gmane.sqlite3
	And run "public-inbox-index --reindex /path/to/git.vger.git" periodically. This ought to allow searching for "gmane:12345" to work for Xapian-enabled instances.
	Disclaimer: while public-inbox supports NNTP and stable article serial numbers, use of those for public links is discouraged since it encourages centralization.
2016-08-09	searchidx: release Xapian FDs before spawning git log
	This will allow us to release and re-acquire Xapian locks due to the lack of FD_CLOEXEC on some FDs.
2016-08-05	http: do not allow bad getline+close responses to kill us
	PSGI applications (like our WWW :P) can fail unpredictably, but let's try to avoid bringing the entire process down when this happens.
2016-07-30	t/config_limiter: fix check for identical Git object
	If we completely undef an object, it is likely possible to have the same scalar address as the original object even if they are different. So keep the same object around and only force creation of the same reference. Tested on Perl 5.14.2 on Debian 7.x wheezy.
2016-07-26	learn: fix uninitialized variable
	Oops :x
2016-07-26	mda: fix address matching in address lists
	This is common when multiple participants are in a thread.
2016-07-09	www: add configurable limiters
	Currently only for git-http-backend use, this allows limiting the number of spawned processes per-inbox or by group, if there are multiple large inboxes amidst a sea of small ones. For example, a "big" repo limiter could be used for big inboxes, which would be shared between multiple repos:
	[limiter "big"]
		max = 4
	[publicinbox "git"]
		address = git@vger.kernel.org
		mainrepo = /path/to/git.git
		; shared limiter with giant:
		httpbackendmax = big
	[publicinbox "giant"]
		address = giant@project.org
		mainrepo = /path/to/giant.git
		; shared limiter with git:
		httpbackendmax = big
	; This is a tiny inbox, use the default limiter with 32 slots:
	[publicinbox "meta"]
		address = meta@public-inbox.org
		mainrepo = /path/to/meta.git
2016-07-09	qspawn: allow configurable limiters
	And bump the default limit to 32 so we match git-daemon behavior. This shall allow us to configure different levels of concurrency for different repositories and prevent clones of giant repos from stalling service to small repos.
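	The limiter idea can be sketched in a few lines of Python. This `Limiter` class is a hypothetical model of slot accounting, not the Qspawn implementation; only the default of 32 comes from the commit message:

```python
from collections import deque

class Limiter:
    """Allow at most max_running jobs at once; queue the rest in FIFO order."""

    def __init__(self, max_running=32):
        self.max = max_running
        self.running = 0
        self.queue = deque()

    def submit(self, start):
        """Run start() now if a slot is free, otherwise queue it."""
        if self.running < self.max:
            self.running += 1
            start()
        else:
            self.queue.append(start)

    def finish(self):
        """Call when a job exits; hand its slot to the next waiter, if any."""
        if self.queue:
            self.queue.popleft()()
        else:
            self.running -= 1

big = Limiter(max_running=2)      # shared by several large repos
started = []
for repo in ["git", "giant", "git"]:
    big.submit(lambda r=repo: started.append(r))
first_wave = list(started)
big.finish()                      # one clone exits, the queued one starts
```

	Small repos using a separate limiter are never stuck behind the queued clones of the big ones.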
2016-07-07	www: remove old footer generation code and normalize new.html
	We now generate all of our HTML using WwwStream which forces us to have consistent headers and footers in the HTML itself. This also makes the search-capable vs search-less installs go to the new.html endpoint to maintain consistency (in case an admin decides to enable Xapian).
2016-07-07	t/git-http-backend: check BSD::Resource availability
	We should not fail tests when this is not available.