public-inbox.git - an "archives first" approach to mailing lists

Date	Commit message
2019-06-01	git: drop the deleted err_c file
	No reason to leave that (usually) empty file open after killing off "cat-file --batch-check". This wasn't an unbounded leak, though, as respawning the --batch-check process would've clobbered the old err_c file.
2019-06-01	git: unconditional expiry
	A constant stream of traffic to either httpd/nntpd would mean git-cat-file processes never expire. Things can go bad after a full repack, as a full repack will unlink old pack indices and git-cat-file does not currently detect unlinked files. We could do something complicated by recursively stat-ing objects/pack of every git directory and alternate; but that's probably not worth the trouble compared to occasionally restarting the cat-file process. So simplify the code and let httpd/nntpd expire them periodically, since spawning a "git-cat-file --batch" process isn't too expensive. We already spawn for every request which hits git-http-backend, cgit, and git-apply. In the future, we may optionally support the Git::Raw module to avoid IPC; but we must remain careful to not leave lingering FDs open to unlinked files after repack.
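The expiry policy described above can be sketched roughly as follows. This is an illustration only, not the project's Perl code: `ExpiringWorker`, `spawn`, and `expire` are hypothetical names, and `expire()` is assumed to be called from a periodic timer owned by the daemon.

```python
class ExpiringWorker:
    """Hypothetical holder for a long-lived helper process,
    such as a `git cat-file --batch` child."""

    def __init__(self, spawn):
        self._spawn = spawn    # callable returning an object with .close()
        self._worker = None

    def get(self):
        # Lazily (re)spawn; using the worker does NOT postpone expiry.
        if self._worker is None:
            self._worker = self._spawn()
        return self._worker

    def expire(self):
        # Unconditional: drop the worker no matter how recently it was
        # used, so it cannot keep FDs open to files unlinked by a repack.
        if self._worker is not None:
            self._worker.close()
            self._worker = None
```

The trade-off is an occasional respawn on the next request in exchange for never having to detect stale pack files.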
2019-05-31	TODO: add item for optional Cache::FastMmap
	Taking one step out of setting up a performant deployment could make setup and administration easier (at the cost of installing an extra-but-common XS module). This can also be useful for the day NNTP servers see hug-of-death events.
2019-05-31	viewdiff: avoid repeat variable expansion
	This is worth a 1-2% speedup in t/perf-msgview.t rendering 2620 messages currently in https://public-inbox.org/meta/
2019-05-30	Merge remote-tracking branch 'origin/v2-noop-speedup'
	* origin/v2-noop-speedup:
	  v2writable: short-circuit is_ancestor check on equality
	  v2writable: avoid mm_tmp creation without regen
	  v2writable: hoist out index_epoch sub
	  v2writable: split off unindex_range mapping
2019-05-30	doc/hosted: drop some links and clarify wording
	I don't have time to check and train spam for all these projects. Spam filtering is especially difficult on ruby-core: it enters via Redmine, so it doesn't have a distinct Received: chain, and also gets mixed with non-spam bug-report text, throwing off Bayes training. And I'm not sure if those mirrors did anybody any good, even; so let's not say it's a "service" to anybody :P The actual mirrors remain up, for now, but who knows... I care about decentralization too much to ask anybody to trust me to keep anything up :P
2019-05-30	v2writable: short-circuit is_ancestor check on equality
	We don't need to use git to check ancestry if object IDs match on a string comparison. This saves 100ms or so and brings the ~0.5s no-op time on lore.kernel.org/lkml down to ~0.4s.
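A minimal sketch of the short-circuit, written in Python for illustration rather than the project's Perl. The function name and arguments are assumptions; `git merge-base --is-ancestor` itself is a real git command that exits 0 when the first commit is an ancestor of the second.

```python
import subprocess

def is_ancestor(git_dir, cur, tip):
    """Return True if commit `cur` is an ancestor of (or equal to) `tip`."""
    # Identical object IDs: skip the process spawn entirely.
    if cur == tip:
        return True
    # Otherwise ask git; exit status 0 means "is an ancestor".
    rc = subprocess.run(
        ["git", f"--git-dir={git_dir}",
         "merge-base", "--is-ancestor", cur, tip],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    ).returncode
    return rc == 0
```

In the no-op mirror case every epoch tip equals the last indexed commit, so no git process is spawned at all.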
2019-05-30	v2writable: avoid mm_tmp creation without regen
	Creating mm_tmp is an expensive operation with large inboxes and can be avoided if there are no new messages to process. Since git-fetch(1) currently lacks an --exit-code option(*), mirrors will run `public-inbox-index' unconditionally after fetch, which is an expensive op if it needs to duplicate a large SQLite DB. This speeds up the mirror case of:
	  git --git-dir=git/$EPOCH.git fetch && public-inbox-index
	This reduces the no-op `public-inbox-index' time from over 8s to ~0.5s on a (currently) 7-epoch clone of https://lore.kernel.org/lkml/ on my system.
	(*) WIP --exit-code for git-fetch: https://public-inbox.org/git/87ftphw7mv.fsf@evledraar.gmail.com/
2019-05-30	v2writable: hoist out index_epoch sub
	This will make future changes easier-to-follow.
2019-05-30	v2writable: split off unindex_range mapping
	It'll make it easier to detect if we have anything to unindex and run git-log on, at all.
2019-05-29	searchidx: store indexlevel=medium as metadata
	And use it from Admin. It's easy to tell what indexlevel=basic is from unconfigured inboxes, but distinguishing between 'medium' and 'full' would require stat()-ing position.* files which is fragile and Xapian-implementation-dependent. So use the metadata facility of Xapian and store it in the main partition so Admin tools can deal better with unconfigured inboxes copied using generic tools like cp(1) or rsync(1).
2019-05-29	index: remove warning on unconfigured inboxes
	It's annoying for people using "git fetch && public-inbox-index" as one user; and running -httpd/-nntpd as a different user (where users see different config files).
2019-05-29	Merge branch 'v2-idx-progress'
	* v2-idx-progress:
	  v2writable: show progress updates for index_sync
	  index: support --verbose option
	  v2writable: move index_sync options to sync state
	  v2writable: use prototypes for internal subs
	  v2writable: localize unindex-range.$EPOCH to $sync state
	  v2writable: move {ranges} into $sync state
	  v2writable: move {regen} into $sync state
	  v2writable: move {reindex} field to $sync state
	  v2writable: sync: move delete markers into $sync state
	  v2writable: introduce $sync state and put mm_tmp in it
2019-05-29	Makefile.PL: enable prove warnings
	We already "use warnings" everywhere, but could miss some spots. This ought to cover that, and Perl module authors are usually consistent enough about avoiding warnings that they won't clutter our test outputs.
2019-05-29	Makefile.PL: allow `N' variable to be set in local config.mak
	This can be useful for limiting test resource use without having to remember to set the variable on the command-line.
2019-05-29	v2writable: show progress updates for index_sync
	We can show progress whenever we commit changes to the FS.
2019-05-29	index: support --verbose option
	It doesn't implement progress of batches, yet, but it wires up the parsing of the command-line while preserving output compatibility. This output is NOT meant to be stable.
2019-05-29	v2writable: move index_sync options to sync state
	And use singular `opt' to be consistent with the common name of 'getopt'.
2019-05-29	v2writable: use prototypes for internal subs
	Hopefully this improves maintainability by allowing Perl to do some arg checking for us.
2019-05-29	v2writable: localize unindex-range.$EPOCH to $sync state
	We don't need to stuff that into $self (V2Writable) which can be longer-lived than a ->index_sync invocation.
2019-05-29	v2writable: move {ranges} into $sync state
	Yet another temporary variable with no use outside of index_sync.
2019-05-29	v2writable: move {regen} into $sync state
	regen is always enabled for index_sync nowadays (and has been for a while). Rename `index_prepare' to `sync_prepare' to show it's for ->index_sync; and not the online indexing we do for ->add.
2019-05-29	v2writable: move {reindex} field to $sync state
	reindexing info is not used outside of the index_sync code path.
2019-05-29	v2writable: sync: move delete markers into $sync state
	Another small step to reduce parameters passed to reindex_oid.
2019-05-29	v2writable: introduce $sync state and put mm_tmp in it
	A first step towards making the v2 index_sync code easier-to-follow. More fields to follow...
2019-05-27	v2: fix reindex skipping NNTP article numbers
	`public-inbox-index --reindex' could cause NNTP article number gaps to form when it also has to deal with new, never-before-seen commits in mirrors running off `git fetch'. Fix this by running two distinct invocations of ->index_sync; once to only reindex old commits, and a second time to index new commits. This does not appear to be a problem on v1 at the moment, but I'll need more time to analyze this.
2019-05-27	searchidx: fix obvious typo
	We can't pass an empty string to `git merge-base --is-ancestor'. AFAIK, this did NOT present issues in the current test suite.
2019-05-27	t/v1reindex.t: fix typo in setting `indexlevel'
	It did not cause a test failure because the default fallback is `indexlevel=full'
2019-05-26	viewvcs: keep temporary Solver dir for large streams
	Streaming large blobs can take multiple iterations of the event loop in our -httpd; so we must not let the File::Temp::Dir result go out-of-scope when streaming large blobs created from patches.
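The lifetime issue can be illustrated with a small Python sketch (not the project's Perl code). `stream_blob` is a hypothetical helper: the generator keeps its own reference to the temporary directory, so the directory survives across many event-loop iterations and is removed only once streaming ends.

```python
import os
import tempfile

def stream_blob(tmpdir, name, chunk_size=8192):
    """Yield a file from `tmpdir` (a tempfile.TemporaryDirectory) in chunks.

    The generator frame holds `tmpdir`, so the directory stays alive
    even after the caller drops its own reference.
    """
    try:
        with open(os.path.join(tmpdir.name, name), "rb") as fh:
            while True:
                chunk = fh.read(chunk_size)
                if not chunk:
                    break
                yield chunk
    finally:
        # Runs when the stream is exhausted or closed early.
        tmpdir.cleanup()
```

A caller can create the directory, hand it to `stream_blob`, and return immediately; the server then pulls chunks at its own pace.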
2019-05-25	v2writable: fix assertions around reindexing
	Fix a misspelling and ensure line context is printed by `die' by leaving out the final '\n'. Also, `delete' was pointless.
2019-05-25	contrib/css: mark as CC0 (public domain)
	No reason to copyright colour schemes :P
2019-05-25	v2writable: drop unused $last_commits var
	Apparently it's never been used and we write to msgmap directly.
2019-05-25	t/indexlevels: fix indexlevel of ro_mirror
	Don't hard-code "basic", since we already ran -init with the intended indexlevel.
2019-05-25	msgmap: remove double negative
	I have never not found double negatives to be confusing...
2019-05-24	TODO: more stuff: bundles, synonyms, dogfooding
	git bundles could/should make self-hosting easier. Being able to configure synonym (and spelling) lists would make some searches more useful. Might as well dogfood kernel stuff, too, given the overlap and history between this project, git and the Linux kernel. Would be interesting to have *BSD folks throw their hat in the ring, too. Building/testing userspace stuff is often the most time-consuming, but necessary to ensure future compatibility.
2019-05-24	MANIFEST: add extman.perl
	Oops :x
2019-05-24	doc: add URLs for Xapian manpages
	Since we go through the effort of hosting these manpages, link to them.
2019-05-24	doc: xcpdb: add switch documentation
	In particular, the '--compact' switch is really useful since it works without holding the inbox-wide lock for minutes at a time on giant inboxes (inboxes where copies can take dozens, if not hundreds of minutes).
2019-05-24	doc: generate manpages for some Xapian commands
	They're nowhere to be found on Xapian.org, and links to external services are either too long (for manpages.debian.org) or have privacy-invasive tracking JS on them.
2019-05-24	doc: sync .txt mtime to .pod mtime
	Otherwise timestamps for .html files get screwed up, too; and that hurts caching.
2019-05-24	doc: don't barf on missing `git set-file-times'
	It's not critical, but it's nice to have for cache-friendliness (otherwise I would not have written it :P) I guess I should follow up on getting it into 'git contrib/': https://public-inbox.org/git/20100702033709.GA6818@burratino/
2019-05-24	doc: daemon: fix manpage section for nginx
	The nginx manpage is in section 8.
2019-05-24	doc: index: fix miscapitalization of "SQLite"
	Oops :x
2019-05-24	search: don't log all warnings on retry_reopen
	Some users (or bots :P) can trigger horrible queries which the caller can choose to either log or ignore. This prevents horrible queries from ExtMsg from logging confusing "ref: " messages when $@ is not a Perl reference.
2019-05-23	doc: various updates to reflect current state
	-index documentation avoids redundant v1 information and refers readers to appropriate v1/v2 manpages. Search::Xapian can also be optional, now, as only the PSGI search interface uses it. Favor "INBOX_DIR" where appropriate, since "REPO_DIR" can be confused for code repos which we also support. XAPIAN_FLUSH_THRESHOLD is documented for all relevant bulk commands.
2019-05-23	xapcmd: do not reset %SIG until last Xtmpdir is done
	To properly handle compact tmpdir cleanup in single process situations, we need to carefully account for Xtmpdir not being a singleton and ensure we don't clobber signal handlers which belong to other Xtmpdirs.
2019-05-23	xcpdb|compact: support --jobs/-j flag like gmake(1)
	We don't have to be tied to the number of partitions in case we made a bad choice at initialization. This doesn't affect reindexing, but the copying phase is already intensive. And optimize away the extra process when we only have a single job which won't parallelize. The wording for the (v2) reindexing phase could be improved, later. I also plan to allow repartitioning of existing Xapian DBs.
2019-05-23	xapcmd: cleanup on interrupted xcpdb "--compact"
	We should not have leftover junk on interrupted invocations.
2019-05-23	xcpdb|compact: support some xapian-compact switches
	Allow users to specify the --blocksize <B>, --no-full, --fuller options for xapian-compact(1) for fine-tuning compact behavior for low-traffic/inactive inboxes. We won't support --multipass, since it doesn't seem compatible with our requirement to use --no-renumber. We also won't support --single-file, since it only seems intended for totally dead inboxes; and it doesn't seem worth the support overhead when "totally dead" turns out to be a misdiagnosis.
2019-05-23	compact: reuse infrastructure from xcpdb
	Since -xcpdb is a superset of -compact, we can reuse much of that code used for driving compact. For compact (only), this is slightly less memory efficient since it requires an extra process per-partition, but we get to prefix the output with the partition name for more readable output.