|Date||Commit message (Collapse)|
e226f18934eb7291 modified the lei-q manpage so that each variant of an
option gets a dedicated =item to make L</--xyz> look nicer and to
follow the Perl core documentation. Do the same for the other
Note that this still leaves the variants of an option grouped in one
scenario: when a list of options without descriptions is presented as
a pointer to another location. Splitting the variants in that case
would make it harder for the reader to tell what the distinct options
v2 onions are insecure, deprecated and going away. v3 names are
unfortunately longer and more difficult to remember, but should
be more resistant to attack than v2 ones.
Using "make update-copyrights" after setting GNULIB_PATH in my
In most cases, this ensures users will only have to opt-in to
using -extindex once and won't have to issue extra commands
to keep external indices up-to-date when using
Since we support arbitrary numbers of external indices for
ease-of-development, we'll support repeating "-E"
("--update-extindex=") in case users want to test changes in
I should've dropped "PENDING" notes before the 1.6 release;
they're dropped now, and a note is added to remind my future
self to drop them before 1.7.
And change the documentation reference in -tuning to
point to the -index manpage while we're at it.
I've learned a thing or three about btrfs in the past few
weeks and remembered some old HDD things, too.
The Xapian MultiDatabase problem will need to be addressed
Since we no longer read document data from Xapian, allow users
to opt-out of storing it.
This breaks compatibility with previous releases of
public-inbox, but gives us a ~1.5% space savings on Xapian
storage (and associated I/O and page cache pressure reduction).
For -index, this is a convenient way to quickly index all
inboxes after a grok-pull. Might as well support it for
rarely used commands like -compact and -xcpdb, too.
Move away from hard-to-read alllowercase naming and favor
snake_case or separated-by-dashes.
We'll keep `--indexlevel' as-is for now, since it's been around
for several releases; but we'll support `--index-level' in the
CLI and update our documentation in a few months.
We'll also clarify that publicInbox.indexMaxSize is only
intended for -index, and not -watch or -mda.
We parse other options, too, not just --max-size
With LKML on an HDD, a giant --batch-size of 500m ends up being
pretty useful. I was able to index LKML in ~16 hours on a
system that had other activity on it. The big downside was it
was eating up over 5g of RAM :x.
We'll also fix up a duplicated indexBatchSize section, fix
formatting around global vs per-inbox indexSequentialShard,
and ensure section 5 manpages are linked correctly.
Eventually, commonly-used commands run by the user will all
support --help / -? for user-friendliness. The changes from
up-front `use' to lazy `require' speed up `--help' by 3x or so.
We'll continue supporting `--no-sync' even if its yet-to-make it
it into a release, but the term `sync' is overloaded in our
codebase which may be confusing to new hackers and users.
None of our our code nor dependencies issue the sync(2) syscall,
either, only fsync(2) and fdatasync(2).
This gives better page cache utilization for Xapian indexing on
slow storage by improving locality for random I/O activity on
the Xapian DB.
Instead of doing a single-pass to index both SQLite and Xapian;
this indexes them separately. The first pass is identical to
indexlevel=basic: it indexes both over.sqlite3 and msgmap.sqlite3.
Subsequent passes only operate on a single Xapian shard for
documents belonging to that shard. Given enough shards, each
individual shard can be made small enough to fit into the kernel
page cache and avoid HDD seeks for read activity.
Doing rough tests with a busy system with a 7200 RPM HDD with ext4,
full indexing of LKML (9 epochs) goes from ~80 hours (-j0) to
~30 hours (-j8) with 16GB RAM with 7 shards configured and fsync(2)
disabled (--no-sync) and `--batch-size=10m'.
This allows us to speed up indexing operations to SQLite
Unfortunately, it doesn't affect operations using
`xapian-compact' and the compactor API, since that doesn't seem
to support Xapian::DB_NO_SYNC, yet.
Older versions of public-inbox < 1.3.0 had subtly
different semantics around threading in some corner
cases. This switch (when combined with --reindex)
allows us to fix them by regenerating associations.
grok-pull is still painful with serialization on an old USB 2.0
HDD, but at least it can finish with flock(1) and disabling
parallelization. While parallel "git fetch" doesn't seem so
bad, slow seeks are exacerbated by parallel reads in Xapian.
That means some updates can take days instead of hours. The
same updates take only seconds or minutes on an SSD.
Update release notes with some features in the 1.6 timeline.
We'll note the version availability of some command-line
options, it may help users who are reading the latest
documentation online but running older versions.
On powerful systems, having this option is preferable to
XAPIAN_FLUSH_THRESHOLD due to lock granularity and contention
with other processes (-learn, -mda, -watch).
Setting XAPIAN_FLUSH_THRESHOLD can cause -learn, -mda, and
-watch to get stuck until an epoch is completely processed.
As an established project (:P), it's important to document when
new features appear in manpages. Users may be reading new
documentation online which doesn't reflect an older version they
In normal mail paths, we can rely on MTAs being configured with
reasonable limits in the -watch and -mda mail injection paths.
However, the MTA is bypassed in a git-only delivery path, a BOFH
could inject a large message and DoS users attempting to mirror
This doesn't protect unindexed WWW interfaces from Email::MIME
memory explosions on v1 inboxes. Probably nobody cares about
unindexed WWW interfaces anymore, especially now that Xapian is
optional for indexing.
It's more convenient to specify `-c' / `--compact' on the
command-line when reindexing than it is to invoke
This is especially convenient in low-space situations when
public-inbox-index is operating on multiple inboxes
sequentially, as compaction can happen immediately after
indexing each inbox, instead of waiting until all inboxes are
Since v2 inboxes contain multiple git repositories, avoid the
use of the word "repository" when referring to inboxes as a
whole in most places.
I didn't wait until September to do it, this year!
It's likely a user will be low on space after running --reindex,
so recommend the use of public-inbox-compact afterwards.
And add a few more notes about using public-inbox-compact to
clarify it's for inboxes-only (and not any old Xapian DBs) that
using xapian-compact(1) directly is error-prone and likely to
edit: unlink temporary file when done
v2writable: replace: kill git processes before reindexing
edit: drop unwanted headers before noop check
edit|purge: improve output on rewrites
edit: new tool to perform edits
doc: document the --prune option for -index
admin: expose ->config
AdminEdit: move editability checks from -purge
admin: beef up resolve_inboxes to handle purge options
purge: start moving common options to AdminEdit module
admin: remove warning arg for unconfigured inboxes
v2writable: implement ->replace call
import: switch to "replace_oids" interface for purge
import: extract_author_info becomes extract_commit_info
v2writable: consolidate overview and indexing call
We've had it around for a while, but I forgot to document it :x
-index documentation avoid redundant v1 information and refers
readers to apropriate v1/v2 manpages. Search::Xapian can also
be optional, now, as only the PSGI search interface uses it.
Favor "INBOX_DIR" where appropriate, since "REPO_DIR" can be
confused for code repos which we also support.
XAPIAN_FLUSH_THRESHOLD is documented for all relevant
No functional changes, yet, but this makes future changes
Using update-copyrights from gnulib
While we're at it, use the SPDX identifier for AGPL-3.0+ to
ease mechanical processing.
And include it into the build + website
Hopefully more folks can download and run public-inbox,