public-inbox.git - an "archives first" approach to mailing lists

Date	Commit message
2020-02-04	over: simplify read-only vs read-write checking
	No need to call ref() and do a string comparison. Add some extra tests using the {ReadOnly} attribute in DBI.pm.
2019-09-09	run update-copyrights from gnulib for 2019

2019-01-08	over: cull unneeded fields for get_thread
	On a certain ugly /$INBOX/$MESSAGE_ID/T/ endpoint with 1000 messages in the thread, this cuts memory usage from 2.5M to 1.9M (which still isn't great, but it's a start).
2018-08-05	view: distinguish strict and loose thread matches
	The "loose" (Subject:-based) thread matching yields too many hits for some common subjects (e.g. "[GIT] Networking" on LKML) and causes thread skeletons to not show the current messages. Favor strict matches in the query and only add loose matches if there's space.

	While working on this, I noticed the backwards --reindex walk breaks `tid' on v1 repositories, at least. That bug was hidden by the Subject: match logic and not discovered until now. It will be fixed separately.

	Reported-by: Konstantin Ryabitsev <konstantin@linuxfoundation.org>
2018-06-19	Tweak over.sqlite3 queries for sqlite < 3.8
	The query planner in sqlite3 < 3.8 is not very clever, so when it sees num mentioned in the query filter, it decides not to use the fast idx_ts index and goes for the much slower autoindex. CentOS-7 still has sqlite-3.7, so loading the http landing page of a very large archive (LKML) was taking over 18 seconds, as opposed to milliseconds on a system with sqlite-3.8 and above:

	    $ time sqlite3 -line over.sqlite3 'SELECT ts,ds,ddd FROM over \
	      WHERE num > 0 ORDER BY ts DESC LIMIT 1000;' > /dev/null

	    real    0m19.610s
	    user    0m17.805s
	    sys     0m1.805s

	    $ sqlite3 -line over.sqlite3 'EXPLAIN QUERY PLAN SELECT ts,ds,ddd \
	      FROM over WHERE num > 0 ORDER BY ts DESC LIMIT 1000;'
	    selectid = 0
	       order = 0
	        from = 0
	      detail = SEARCH TABLE over USING INDEX sqlite_autoindex_over_1 (num>?) (~250000 rows)

	However, if we slightly tweak the query per SQLite recommendations [1] by adding + to the num filter, we force it to use the correct index and see much faster performance:

	    $ time sqlite3 -line over.sqlite3 'SELECT ts,ds,ddd FROM over \
	      WHERE +num > 0 ORDER BY ts DESC LIMIT 1000;' > /dev/null

	    real    0m0.007s
	    user    0m0.005s
	    sys     0m0.002s

	    $ sqlite3 -line over.sqlite3 'EXPLAIN QUERY PLAN SELECT ts,ds,ddd \
	      FROM over WHERE +num > 0 ORDER BY ts DESC LIMIT 1000;'
	    selectid = 0
	       order = 0
	        from = 0
	      detail = SCAN TABLE over USING INDEX idx_ts (~1464303 rows)

	This appears to be the only place where this is needed in order to avoid running into this issue. As far as I can tell, this change has no impact on systems running newer sqlite3 (>= 3.8).

	.. [1] https://sqlite.org/optoverview.html#disqualifying_where_clause_terms_using_unary_

	Signed-off-by: Konstantin Ryabitsev <konstantin@linuxfoundation.org>
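	The unary + trick can be tried out without a real archive. The following is a minimal sketch using Python's sqlite3 module and an in-memory database. The schema is a simplified assumption (only the columns named in the queries above, with a UNIQUE constraint on num to get an automatic index), and plan_details is a hypothetical helper. The plan text printed depends on the SQLite version in use, so the sketch only relies on the + prefix disqualifying the num term from index use.

```python
import sqlite3


def plan_details(conn, sql):
    """Return the 'detail' column (last column) of EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return [row[-1] for row in rows]


conn = sqlite3.connect(":memory:")
# Assumed, simplified schema; the real over.sqlite3 has more columns.
conn.execute(
    'CREATE TABLE "over" ('
    "num INTEGER NOT NULL, ts INTEGER, ds INTEGER, ddd BLOB, UNIQUE (num))"
)
conn.execute('CREATE INDEX idx_ts ON "over" (ts)')
conn.executemany(
    'INSERT INTO "over" (num, ts, ds, ddd) VALUES (?, ?, ?, ?)',
    [(n, 5000 - n, n, b"") for n in range(1, 2001)],  # unique timestamps
)

PLAIN = 'SELECT ts,ds,ddd FROM "over" WHERE num > 0 ORDER BY ts DESC LIMIT 1000'
# "+num" turns the column into an expression, so no index on num can be used.
FORCED = 'SELECT ts,ds,ddd FROM "over" WHERE +num > 0 ORDER BY ts DESC LIMIT 1000'

plain_plan = plan_details(conn, PLAIN)
forced_plan = plan_details(conn, FORCED)
print("without +:", plain_plan)
print("with +:   ", forced_plan)
```

	Both queries return the same rows; only the planner's choice of index can differ.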
2018-04-18	ensure SQLite and Xapian files respect core.sharedRepository
	We can't have files with permissions inconsistent with what's in git objects.
2018-04-07	over: avoid excessive SELECT
	No need to read what we don't need into the Perl process. Fix some broken capitalization while we're at it.
2018-04-07	psgi: ensure /$INBOX/$MESSAGE_ID/T/ endpoint is chronological
	We only need to call get_thread beyond 1000 messages for fetching entire mboxes. It's probably too much for the HTML display otherwise.
2018-04-06	www: favor reading more from SQLite, and less from Xapian
	Favor simpler internal APIs this time around; this cuts a fair amount of code out and takes another step towards removing Xapian as a dependency for v2 repos.
2018-04-03	mbox: remove remaining OFFSET usage in SQLite
	We can use id_batch in the common case to speed up full mbox retrievals. Gigantic msets are still a problem, but will be fixed in future commits.
2018-04-03	view: avoid offset during pagination
	OFFSET in SQLite gets painful to deal with. Instead, rely on timestamps (from Received:) for pagination. This also sets us up for more precise Date searching in case we want it.
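	Timestamp-based pagination of this kind is often called keyset pagination. The sketch below shows the general idea with Python's sqlite3 module; it is not the code from this commit. The table layout, the (ts, num) cursor used to break ties between equal timestamps, and the fetch_page and iter_pages helpers are all assumptions made for illustration.

```python
import sqlite3


def fetch_page(conn, limit, before=None):
    """Return up to `limit` (num, ts) rows, newest first.

    `before` is a (ts, num) cursor taken from the last row of the
    previous page; only strictly older rows are returned.
    """
    if before is None:
        sql = 'SELECT num, ts FROM "over" ORDER BY ts DESC, num DESC LIMIT ?'
        args = (limit,)
    else:
        ts, num = before
        sql = (
            'SELECT num, ts FROM "over" '
            "WHERE ts < ? OR (ts = ? AND num < ?) "
            "ORDER BY ts DESC, num DESC LIMIT ?"
        )
        args = (ts, ts, num, limit)
    return conn.execute(sql, args).fetchall()


def iter_pages(conn, limit):
    """Yield successive pages without ever using OFFSET."""
    cursor = None
    while True:
        page = fetch_page(conn, limit, cursor)
        if not page:
            return
        yield page
        last_num, last_ts = page[-1]
        cursor = (last_ts, last_num)


conn = sqlite3.connect(":memory:")
# Assumed, simplified schema for the demo.
conn.execute('CREATE TABLE "over" (num INTEGER PRIMARY KEY, ts INTEGER)')
conn.execute('CREATE INDEX idx_ts ON "over" (ts)')
# 25 messages; integer division creates some duplicate timestamps on purpose.
conn.executemany(
    'INSERT INTO "over" (num, ts) VALUES (?, ?)',
    [(n, 1000 + n // 3) for n in range(1, 26)],
)

for page in iter_pages(conn, 10):
    print(len(page), "rows, oldest on this page:", page[-1])
```

	Each page costs one indexed lookup from the cursor position, whereas OFFSET has to step over every skipped row first.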
2018-04-03	nntp: make XOVER, XHDR, OVER, HDR and NEWNEWS faster
	While SQLite is faster than Xapian for some queries we use, it sucks at handling OFFSET. Fortunately, we do not need offsets when retrieving sorted results and can bake it into the query. For inbox.comp.version-control.git (v1 Xapian), XOVER and XHDR are over 20x faster.
2018-04-02	over: speedup get_thread by avoiding JOIN
	JOIN operations on SQLite can be disastrously slow. This cuts the load time of per-message pages (which show the thread overview at the bottom) from over 800ms to ~60ms. In comparison, the v1 code took around 70-80ms using Xapian on my machine.
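	The commit message does not show the queries involved, so the following is only a sketch of the general pattern: replacing a self-JOIN with two simple lookups done from the application. The schema (num, tid, ts) and both helper functions are assumptions for illustration, written with Python's sqlite3 module, and the sketch makes no claim about timings.

```python
import sqlite3


def thread_with_join(conn, num):
    """Fetch all message numbers in the thread of `num` using a self-JOIN."""
    return conn.execute(
        'SELECT b.num FROM "over" a JOIN "over" b ON b.tid = a.tid '
        "WHERE a.num = ? ORDER BY b.ts, b.num",
        (num,),
    ).fetchall()


def thread_two_step(conn, num):
    """Same result without a JOIN: look up the thread id, then select by it."""
    row = conn.execute('SELECT tid FROM "over" WHERE num = ?', (num,)).fetchone()
    if row is None:
        return []  # unknown message number
    return conn.execute(
        'SELECT num FROM "over" WHERE tid = ? ORDER BY ts, num',
        (row[0],),
    ).fetchall()


conn = sqlite3.connect(":memory:")
# Assumed, simplified schema for the demo.
conn.execute('CREATE TABLE "over" (num INTEGER PRIMARY KEY, tid INTEGER, ts INTEGER)')
conn.execute('CREATE INDEX idx_tid ON "over" (tid)')
# 30 messages spread over 3 threads.
conn.executemany(
    'INSERT INTO "over" (num, tid, ts) VALUES (?, ?, ?)',
    [(n, n % 3, 1000 + n) for n in range(1, 31)],
)

print(thread_two_step(conn, 4))
```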
2018-04-02	www: rework query responses to avoid COUNT in SQLite
	In many cases, we do not care about the total number of messages. It's a rather expensive operation in SQLite (Xapian only provides an estimate). For LKML, this brings top-level /$INBOX/ loading time from ~375ms to around 60ms on my system. Days ago, this operation was taking 800-900ms(!) for me before introducing the SQLite overview DB.
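	One common way to paginate without COUNT is to ask for one row more than the page size and use the extra row only as a "there is more" signal. The commit message does not say which approach was taken here, so this is a sketch under that assumption, using Python's sqlite3 module with a made-up schema and a hypothetical recent helper.

```python
import sqlite3


def recent(conn, limit):
    """Return (rows, has_more) for the newest `limit` messages, without COUNT."""
    rows = conn.execute(
        'SELECT num, ts FROM "over" ORDER BY ts DESC, num DESC LIMIT ?',
        (limit + 1,),  # one extra row tells us whether another page exists
    ).fetchall()
    has_more = len(rows) > limit
    return rows[:limit], has_more


conn = sqlite3.connect(":memory:")
# Assumed, simplified schema for the demo.
conn.execute('CREATE TABLE "over" (num INTEGER PRIMARY KEY, ts INTEGER)')
conn.execute('CREATE INDEX idx_ts ON "over" (ts)')
conn.executemany(
    'INSERT INTO "over" (num, ts) VALUES (?, ?)',
    [(n, 1000 + n) for n in range(1, 26)],  # 25 messages
)

rows, has_more = recent(conn, 10)
print(len(rows), has_more)
```

	The query reads at most limit + 1 rows regardless of how large the inbox is.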
2018-04-02	replace Xapian skeleton with SQLite overview DB
	This ought to provide better performance and scalability which is less dependent on inbox size. Xapian does not seem optimized for some queries used by the WWW homepage, Atom feeds, XOVER and NEWNEWS NNTP commands. This can actually make Xapian optional for NNTP usage, and allow more functionality to work without Xapian installed. Indexing performance was extremely bad at first, but DBI::Profile helped me optimize away problematic queries.