path: root/Documentation
Date Commit message
2020-12-12 doc: v2-format: drop repeated word
2020-12-12 doc: add public-inbox-extindex-format(5) manpage
The CLI tool still needs usability work, and "misc" is still in flux, but the core message indexing part is stable (since it's stolen from v2 :P).
2020-12-09 treewide: replace {-inbox} with {ibx} for consistency
{ibx} is shorter and is the most prevalent abbreviation in indexing and IMAP code, and the `$ibx' local variable is already prevalent throughout. In general, the codebase favors removal of vowels in variable and field names to denote non-references (because references are "lighter" than non-references). So update WWW and Filter users to use the same code since it reduces confusion and may allow easier code sharing.
2020-11-05 doc/standards: add RFCs for URL schemes
We linkify these in the WWW UI, and will support them in other places. These URL schemes may end up being stored in external/detached indices for indexing non-git-based mail stores.
2020-11-04 nntp: delimit Newsgroup: header with commas
...instead of spaces. This is specified in RFC 5536 3.1.4. Include references to RFC 1036, 5536 and 5537 in our docs while we're at it. Reported-by: Andrey Melnikov <temnota.am@gmail.com> Link: https://public-inbox.org/meta/CA+PODjpUN5Q4gBFQhAzUNuMasVEdmp9f=8Uo0Ej0mFumdSwi4w@mail.gmail.com/
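As a minimal sketch of the comma-delimited form described above (written in Python for illustration, not taken from the project's Perl code), the header value can be built by joining group names with commas. The function name and the group names are hypothetical.

```python
def format_newsgroups(groups):
    """Build a Newsgroups header line, comma-delimited per RFC 5536 3.1.4.

    Hypothetical helper for illustration only; it is not public-inbox code.
    """
    # Drop surrounding whitespace and empty entries before joining.
    names = [g.strip() for g in groups if g.strip()]
    return "Newsgroups: " + ",".join(names)


# Hypothetical group names, chosen only to show the delimiter.
header = format_newsgroups(["inbox.example.meta", "inbox.example.test"])
print(header)
```

The point is only the delimiter: a comma separates the names, not a space.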
2020-09-20 doc: post-1.6 updates, start 1.7
I should've dropped "PENDING" notes before the 1.6 release; they're dropped now, and a note is added to remind my future self to drop them before 1.7.
2020-09-19 gcf2: wire up read-only daemons and rm -gcf2 script
It seems easiest to have a singleton Gcf2Client client object per daemon worker for all inboxes to use. This reduces overall FD usage from pipes. The `public-inbox-gcf2' command + manpage are gone and a `$^X' one-liner is used, instead. This saves inodes for internal commands and hopefully makes it easier to avoid mismatched PERL5LIB include paths (as noticed during development :x). We'll also make the existing cat-file process management infrastructure more resilient to BOFHs on process killing sprees (or in case our libgit2-based code fails on us). (Rare) PublicInbox::WWW PSGI users NOT using public-inbox-httpd won't automatically benefit from this change, and extra configuration will be required (to be documented later).
2020-09-19 add gcf2 client and executable script
This should be able to replace multiple `git cat-file' for blob retrieval, but adjustments may be needed.
2020-09-18 doc: txt2pre: more manpage URLs
We host our own -imapd manpage, and we started using a few more git commands (fast-import for ages). We'll also need to link to manpages.debian.org and live with long URLs for a few non-standard manpages in software we reference.
2020-09-18 doc: flow: include -imapd
It's another read-only daemon, and it may see more usage than -nntpd as more users have IMAP support than NNTP.
2020-09-16 public-inbox 1.6.0 v1.6.0
2020-09-14 doc: TODO and release notes updates ahead of 1.6
Some more things have happened... And drop some items which are too expensive to support, such as automatic mirroring.
2020-09-14 doc: Add piem to list of clients
2020-09-02 doc: remove B<> (bold) markup from the remaining POD
B<> decreases readability of the POD source and is of dubious usefulness in the man page.
2020-09-02 watch: add --help/-h support
And avoid unnecessary POD markup in the man page.
2020-09-02 mda+learn: add --help / -h support
"use Getopt::Long" doesn't seem too slow on a hot page cache, and it's probably used frequently enough to be in cache. We'll also start reducing the amount of markup in the .pod and favoring verbatim text in documentation for readability in source form, since the bold text seems excessive.
2020-09-02 edit+purge: support `--help' and `-h' like other commands
And while we're at it, note edit is *destructive* to encourage reading the fine manual.
2020-08-31 doc: expand on indexBatchSize regarding fragmentation
And change the documentation reference in -tuning to point to the -index manpage while we're at it.
2020-08-28 www: improve navigation around contemporary threads
Sometimes it's useful to quickly get to threads and messages which are contemporaries of the current thread/message being focused on. This hopefully improves navigation by: a) making the top line (where $INBOX_DIR/description is shown) a link to the latest topics in search results and per-thread/per-message views; b) providing a link to contemporaries ("~YYYY-MM-DD") around the thread overview skeleton area for per-thread and per-message views.
2020-08-28 doc: watch: expand on NNTP and IMAP-specific knobs
There are a few more, but maybe they're too esoteric to be worth documenting at the moment (batch sizes, timeouts, etc).
2020-08-27 doc: move watch config docs to -watch manpage
The -config manpage is a bit long and the -watch stuff is isolated from the rest of it while we start documenting NNTP and IMAP support. I'm not entirely happy with the way IMAP and NNTP are configured, but it's still good enough for small setups. This also fixes a long-standing misplaced comment about `publicinboxwatch.spamcheck' affecting all configured inboxes; that comment was actually for `publicinboxwatch.watchspam'. We'll omit documenting NNTP for `watchspam' for now, given the lack of \Seen flags in NNTP, and I'm not sure if it's even useful. There may not be any newsgroups for sharing confirmed spam, either...
2020-08-27 doc: speling fickses
2020-08-27 doc: document graceful shutdown signals
Same as the read-only daemons.
2020-08-26 doc: 1.6.0 release notes update
A few more things happened, here.
2020-08-26 doc: add some more tuning notes
I've learned a thing or three about btrfs in the past few weeks and remembered some old HDD things, too. The Xapian MultiDatabase problem will need to be addressed for 1.7...
2020-08-23 searchidx: index THREADID in Xapian
This is the `tid' column from over.sqlite3; and will be used for IMAP and JMAP search (among other things).
2020-08-20 init+index: support --skip-docdata for Xapian
Since we no longer read document data from Xapian, allow users to opt-out of storing it. This breaks compatibility with previous releases of public-inbox, but gives us a ~1.5% space savings on Xapian storage (and associated I/O and page cache pressure reduction).
2020-08-20 init: drop -N alias for --skip-artnum
It may be too easily confused for --newsgroup or --ng. This is too rarely used and never made it into a release, so it should be fine.
2020-08-20 init: support --newsgroup option
We can reduce the need to edit the config file for NNTP group names this way.
2020-08-20 doc: note -compact and -xcpdb are rarely used
Slowly improving the learning curve...
2020-08-16 doc: add public-inbox-tuning(7) manpage
Determining storage device speed and latencies doesn't seem portable or even possible with the wide variety of storage layers in use. This means we need to write a tuning document and hope users read and improve on it :P
2020-08-14 index|compact|xcpdb: support --all switch
For -index, this is a convenient way to quickly index all inboxes after a grok-pull. Might as well support it for rarely used commands like -compact and -xcpdb, too.
2020-08-13 xcpdb: wire up new index options and --help
--sequential-shard also disables the copy parallelism (--jobs), so it can be useful on systems which are unable to handle parallel random I/O but where many shards are still wanted. There was a missing "use strict", too, which is fixed.
2020-08-10 convert: support new -index options
Converting v1 inboxes to v2 can be a painful experience on HDD. Some of the new options in the CLI or config file make it less painful.
2020-08-10 index: cleanup internal variables
Move away from hard-to-read alllowercase naming and favor snake_case or separated-by-dashes. We'll keep `--indexlevel' as-is for now, since it's been around for several releases; but we'll support `--index-level' in the CLI and update our documentation in a few months. We'll also clarify that publicInbox.indexMaxSize is only intended for -index, and not -watch or -mda.
2020-08-10 admin: use a generic variable name
We parse other options, too, not just --max-size
2020-08-10 doc: add some notes around -xcpdb / -edit / -purge
These rarely-used commands have some caveats that needed expanding on.
2020-08-10 doc: index: more notes about latest changes
With LKML on an HDD, a giant --batch-size of 500m ends up being pretty useful. I was able to index LKML in ~16 hours on a system that had other activity on it. The big downside was it was eating up over 5g of RAM :x. We'll also fix up a duplicated indexBatchSize section, fix formatting around global vs per-inbox indexSequentialShard, and ensure section 5 manpages are linked correctly.
2020-08-07 index: add built-in --help / -?
Eventually, commonly-used commands run by the user will all support --help / -? for user-friendliness. The changes from up-front `use' to lazy `require' speed up `--help' by 3x or so.
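The project itself is Perl, where `use' loads a module at compile time and `require' loads it at run time. As a rough Python analogue of that idea (a sketch only, with a hypothetical command name and usage text), a heavy import can be deferred until after the `--help' check, so that printing help never pays for it:

```python
def main(argv):
    """Return help text quickly; load heavier modules only for real work."""
    if "--help" in argv or "-?" in argv:
        # Hypothetical usage string; no heavy module has been loaded yet.
        return "usage: example-index [--help] INBOX_DIR"

    # Deferred import: only reached when actual work is requested.
    import sqlite3

    conn = sqlite3.connect(":memory:")
    try:
        (version,) = conn.execute("SELECT sqlite_version()").fetchone()
    finally:
        conn.close()
    return f"indexing with SQLite {version}"


help_text = main(["--help"])
print(help_text)
```

How much time this saves depends on what the deferred modules cost to load; the 3x figure above is the author's measurement for the Perl code, not something this sketch reproduces.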
2020-08-07 index+xcpdb: rename `--no-sync' to `--no-fsync'
We'll continue supporting `--no-sync' even though it has yet to make it into a release, but the term `sync' is overloaded in our codebase, which may be confusing to new hackers and users. Neither our code nor our dependencies issue the sync(2) syscall, either; only fsync(2) and fdatasync(2).
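To illustrate the distinction between the three syscalls named above, here is a small Python sketch (not project code; the helper name and file name are hypothetical). `os.fsync` and `os.fdatasync` flush a single open file, whereas `os.sync` asks the kernel to flush everything:

```python
import os
import tempfile


def write_durably(path, data, metadata_too=True):
    """Write data, then flush only this one file to stable storage."""
    with open(path, "wb") as fh:
        fh.write(data)
        fh.flush()  # move Python's userspace buffer into the kernel
        if metadata_too or not hasattr(os, "fdatasync"):
            os.fsync(fh.fileno())      # fsync(2): file data and metadata
        else:
            os.fdatasync(fh.fileno())  # fdatasync(2): data, plus only the
                                       # metadata needed to read it back
    # os.sync() would instead request a flush of all filesystems (sync(2)),
    # which is the call the commit message says is never issued.


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "example.db")  # hypothetical file name
    write_durably(target, b"hello", metadata_too=False)
    size = os.path.getsize(target)
```

Skipping these per-file flushes is what trades durability for indexing speed.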
2020-08-07 index: v2: --sequential-shard option
This gives better page cache utilization for Xapian indexing on slow storage by improving locality for random I/O activity on the Xapian DB. Instead of doing a single pass to index both SQLite and Xapian, this indexes them separately. The first pass is identical to indexlevel=basic: it indexes both over.sqlite3 and msgmap.sqlite3. Subsequent passes only operate on a single Xapian shard for documents belonging to that shard. Given enough shards, each individual shard can be made small enough to fit into the kernel page cache and avoid HDD seeks for read activity. Doing rough tests with a busy system with a 7200 RPM HDD with ext4, full indexing of LKML (9 epochs) goes from ~80 hours (-j0) to ~30 hours (-j8) with 16GB RAM with 7 shards configured and fsync(2) disabled (--no-sync) and `--batch-size=10m'.
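The pass structure described above can be sketched roughly as follows. This is a Python illustration, not the project's implementation: the function and pass names are hypothetical, and assigning a document to a shard by article number modulo the shard count is an assumption made for the example.

```python
from collections import defaultdict


def sequential_shard_plan(article_numbers, nshards):
    """List the passes a sequential-shard style indexer would make."""
    nums = sorted(article_numbers)

    # Pass 1: SQLite only (over.sqlite3 and msgmap.sqlite3), all documents.
    passes = [("sqlite", nums)]

    # ASSUMPTION: a document belongs to shard (article number % nshards).
    by_shard = defaultdict(list)
    for num in nums:
        by_shard[num % nshards].append(num)

    # Later passes: one Xapian shard at a time, so only that shard's
    # working set competes for the page cache during the pass.
    for shard in range(nshards):
        passes.append((f"xapian-shard-{shard}", by_shard[shard]))
    return passes


plan = sequential_shard_plan(range(1, 11), nshards=3)
for name, nums in plan:
    print(name, nums)
```

With more shards, each per-shard pass touches a smaller database, which is what lets the working set fit in the page cache.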
2020-07-25 index+xcpdb: support --no-sync flag
This allows us to speed up indexing operations to SQLite and Xapian. Unfortunately, it doesn't affect operations using `xapian-compact' and the compactor API, since that doesn't seem to support Xapian::DB_NO_SYNC, yet.
2020-07-25 index: support --rethread switch to fix old indices
Older versions of public-inbox < 1.3.0 had subtly different semantics around threading in some corner cases. This switch (when combined with --reindex) allows us to fix them by regenerating associations.
2020-07-17 doc: add some recommendations around slow HDDs
grok-pull is still painful with serialization on an old USB 2.0 HDD, but at least it can finish with flock(1) and disabling parallelization. While parallel "git fetch" doesn't seem so bad, slow seeks are exacerbated by parallel reads in Xapian. That means some updates can take days instead of hours. The same updates take only seconds or minutes on an SSD.
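The serialization that flock(1) provides can be sketched in Python with `fcntl.flock`. This is only an illustration of the locking idea on a Unix-like system: the helper name, the lock file path, and the stand-in command are hypothetical, and no real mirroring tool is invoked.

```python
import fcntl
import os
import subprocess
import sys
import tempfile


def run_serialized(lock_path, argv):
    """Run argv while holding an exclusive lock, like `flock LOCKFILE CMD`."""
    with open(lock_path, "w") as lock_fh:
        # Blocks until any other holder of the lock has finished.
        fcntl.flock(lock_fh, fcntl.LOCK_EX)
        try:
            return subprocess.run(argv, check=False).returncode
        finally:
            fcntl.flock(lock_fh, fcntl.LOCK_UN)


# Hypothetical lock file and a stand-in command for a slow update job.
lock_file = os.path.join(tempfile.gettempdir(), "slow-hdd-example.lock")
status = run_serialized(
    lock_file,
    [sys.executable, "-c", "print('pretend this is a mirror update')"],
)
```

Jobs wrapped this way run one at a time, so the disk serves a single stream of reads instead of seeking between several.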
2020-07-14 doc: release notes and version info updates
Update release notes with some features in the 1.6 timeline. We'll note the version availability of some command-line options; it may help users who are reading the latest documentation online but running older versions.
2020-07-10 doc: standards: link IMAP capabilities and response codes
We'll be implementing some search/threading extensions in IMAP and providing analogues over HTTP via JMAP.
2020-07-06 doc/technical/whyperl: note Perl 7 announcement
Right now[1] the Perl upstream plan is to maintain 5 compatibility in Perl 7 for at least 5 years[1], and perhaps drop it when Perl 8 comes along. That said, distros may pick it and maintain 5 on their own given the vast amounts of perfectly good legacy code out there. [1] http://nntp.perl.org/group/perl.perl5.porters/257817 [2] http://nntp.perl.org/group/perl.perl5.porters/257565
2020-07-06 doc/technical/whyperl: reword bit around installed docs
I originally proposed this rewording to address Leah's comment but forgot to squash it in :x Link: https://public-inbox.org/meta/20200408221741.GA10142@dcvr/ Cc: Leah Neukirchen <leah@vuxu.org>
2020-07-06 doc: daemon: update documentation around Inline::C
`~/.cache/public-inbox/inline-c' is supported nowadays for convenience, but Inline::C usage will remain opt-in.
2020-07-06 view: simplify eml_entry callers further
This simplifies the primary callers of eml_entry while only making mknews.perl worse.