Date | Commit message |
|
We linkify these in the WWW UI, and will support them in other
places. These URL schemes may end up being stored in
external/detached indices for indexing non-git-based mail
stores.
|
|
...instead of spaces. This is specified in RFC 5536 3.1.4.
Include references to RFC 1036, 5536 and 5537 in our docs while
we're at it.
Reported-by: Andrey Melnikov <temnota.am@gmail.com>
Link: https://public-inbox.org/meta/CA+PODjpUN5Q4gBFQhAzUNuMasVEdmp9f=8Uo0Ej0mFumdSwi4w@mail.gmail.com/
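
The subject line of this commit is truncated above, so the following is
only a sketch under the assumption that the change concerns how several
newsgroup names are separated in a Newsgroups header. RFC 5536 section
3.1.4 defines that header as a comma-separated list. The function names
and group names here are hypothetical and are not taken from the
public-inbox code:

```python
def format_newsgroups(names):
    """Join newsgroup names into a Newsgroups header value using commas."""
    cleaned = [n.strip() for n in names if n.strip()]
    return ",".join(cleaned)


def parse_newsgroups(value):
    """Split a Newsgroups header value on commas, dropping stray whitespace."""
    return [part.strip() for part in value.split(",") if part.strip()]


if __name__ == "__main__":
    # Hypothetical group names, for illustration only.
    print("Newsgroups: " + format_newsgroups(["example.group.one", "example.group.two"]))
```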
|
|
I should've dropped "PENDING" notes before the 1.6 release;
they're dropped now, and a note is added to remind my future
self to drop them before 1.7.
|
|
It seems easiest to have a singleton Gcf2Client object per
daemon worker for all inboxes to use. This reduces overall
FD usage from pipes.
The `public-inbox-gcf2' command + manpage are gone and a `$^X'
one-liner is used, instead. This saves inodes for internal
commands and hopefully makes it easier to avoid mismatched
PERL5LIB include paths (as noticed during development :x).
We'll also make the existing cat-file process management
infrastructure more resilient to BOFHs on process killing
sprees (or in case our libgit2-based code fails on us).
(Rare) PublicInbox::WWW PSGI users NOT using public-inbox-httpd
won't automatically benefit from this change, and extra
configuration will be required (to be documented later).
|
|
This should be able to replace multiple `git cat-file' for blob
retrieval, but adjustments may be needed.
|
|
We host our own -imapd manpage, and we started using a few more
git commands (fast-import for ages). We'll also need to link to
manpages.debian.org and live with long URLs for a few
non-standard manpages in software we reference.
|
|
It's another read-only daemon, and it may see more usage than
-nntpd as more users have IMAP support than NNTP.
|
|
|
|
Some more things have happened...
And drop some items which are too expensive to support,
such as automatic mirroring.
|
|
|
|
B<> decreases readability of the POD source and is of dubious
usefulness in the man page.
|
|
And avoid unnecessary POD markup in the man page.
|
|
"use Getopt::Long" doesn't seem too slow on a hot page cache,
and it's probably used frequently enough to be in cache.
We'll also start reducing the amount of markup in the .pod and
favoring verbatim text in documentation for readability in
source form, since the bold text seems excessive.
|
|
And while we're at it, note edit is *destructive* to encourage
reading the fine manual.
|
|
And change the documentation reference in -tuning to
point to the -index manpage while we're at it.
|
|
Sometimes it's useful to quickly get to threads and messages
which are contemporaries of the current thread/message being
focused on. This hopefully improves navigation by making:
a) the top line (where $INBOX_DIR/description is shown)
a link to the latest topics in search results and
per-thread/per-message views.
b) a link to contemporaries ("~YYYY-MM-DD") available
around the thread overview skeleton area for per-thread
and per-message views
|
|
There are a few more, but maybe they're too esoteric
to be worth documenting at the moment (batch sizes, timeouts, etc).
|
|
The -config manpage is a bit long and the -watch stuff is
isolated from the rest of it while we start documenting NNTP and
IMAP support.
I'm not entirely happy with the way IMAP and NNTP are
configured, but it's still good enough for small setups.
This also fixes a long-standing misplaced comment about
`publicinboxwatch.spamcheck' affecting all configured inboxes;
that comment was actually for `publicinboxwatch.watchspam'.
We'll omit documenting NNTP for `watchspam', for now, given the
lack of \Seen flags in NNTP and I'm not sure if it's even
useful. There may not be any newsgroups for sharing confirmed
spam, either...
|
|
|
|
Same as the read-only daemons.
|
|
A few more things happened, here.
|
|
I've learned a thing or three about btrfs in the past few
weeks and remembered some old HDD things, too.
The Xapian MultiDatabase problem will need to be addressed
for 1.7...
|
|
This is the `tid' column from over.sqlite3, and will be used for
IMAP and JMAP search (among other things).
|
|
Since we no longer read document data from Xapian, allow users
to opt-out of storing it.
This breaks compatibility with previous releases of
public-inbox, but gives us a ~1.5% space savings on Xapian
storage (and associated I/O and page cache pressure reduction).
|
|
It may be too easily confused with --newsgroup or --ng. This is
too rarely used and never made it into a release, so it should
be fine.
|
|
We can reduce the need to edit the config file for NNTP group names
this way.
|
|
Slowly improving the learning curve...
|
|
Determining storage device speed and latencies doesn't
seem portable or even possible with the wide variety
of storage layers in use.
This means we need to write a tuning document and hope
users read and improve on it :P
|
|
For -index, this is a convenient way to quickly index all
inboxes after a grok-pull. Might as well support it for
rarely used commands like -compact and -xcpdb, too.
|
|
--sequential-shard also disables the copy parallelism (--jobs),
so it can be useful for systems unable to handle parallel random
I/O but still want many shards.
There was a missing "use strict", too, which is fixed.
|
|
Converting v1 inboxes to v2 can be a painful experience
on HDD. Some of the new options in the CLI or config
file make it less painful.
|
|
Move away from hard-to-read alllowercase naming and favor
snake_case or separated-by-dashes.
We'll keep `--indexlevel' as-is for now, since it's been around
for several releases; but we'll support `--index-level' in the
CLI and update our documentation in a few months.
We'll also clarify that publicInbox.indexMaxSize is only
intended for -index, and not -watch or -mda.
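
Accepting both spellings of an option during a transition can be
sketched as follows. This is a Python analogue using argparse, not the
Perl code in public-inbox. The program name is hypothetical, and the
list of accepted level names and the default are assumptions made for
this example:

```python
import argparse


def build_parser():
    """Build a parser that accepts the old and the new option spelling."""
    parser = argparse.ArgumentParser(prog="example-index")  # hypothetical name
    parser.add_argument(
        "--index-level", "--indexlevel",   # new spelling first, old one kept
        dest="index_level",
        choices=["basic", "medium", "full"],  # assumed set of level names
        default="full",                       # assumed default
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args.index_level)
```

Both spellings store into the same destination, so callers only need
to look at one attribute.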
|
|
We parse other options, too, not just --max-size
|
|
These rarely-used commands have some caveats that needed
expanding on.
|
|
With LKML on an HDD, a giant --batch-size of 500m ends up being
pretty useful. I was able to index LKML in ~16 hours on a
system that had other activity on it. The big downside was it
was eating up over 5g of RAM :x.
We'll also fix up a duplicated indexBatchSize section, fix
formatting around global vs per-inbox indexSequentialShard,
and ensure section 5 manpages are linked correctly.
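
Values such as `500m' carry a size suffix. A minimal sketch of parsing
one into a byte count follows. It assumes git-config-style suffixes
where k, m and g are powers of 1024. The function name is hypothetical
and this is not the parser public-inbox uses:

```python
def parse_size(text):
    """Convert a string like '500m' or '4096' into a number of bytes."""
    units = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}  # assumed base-1024
    s = text.strip().lower()
    if not s:
        raise ValueError("empty size")
    if s[-1] in units:
        # Numeric part times the unit multiplier.
        return int(s[:-1]) * units[s[-1]]
    return int(s)


if __name__ == "__main__":
    print(parse_size("500m"))
```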
|
|
Eventually, commonly-used commands run by the user will all
support --help / -? for user-friendliness. The changes from
up-front `use' to lazy `require' speed up `--help' by 3x or so.
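
The same idea, deferring expensive module loads until they are known
to be needed, can be sketched in Python. This is only an analogue of
moving from `use' to `require'. The usage text and the choice of `json'
as a stand-in for a heavy dependency are assumptions made for
illustration:

```python
import sys

USAGE = "usage: example-tool [--help] ARG...\n"  # hypothetical usage text


def main(argv):
    """Print usage quickly; load heavier modules only when doing real work."""
    if "--help" in argv or "-?" in argv:
        sys.stdout.write(USAGE)
        return 0
    # Deferred import: this cost is only paid when we are not showing help.
    import json  # stand-in for an expensive dependency
    sys.stdout.write(json.dumps({"args": list(argv)}) + "\n")
    return 0


if __name__ == "__main__":
    sys.exit(main(sys.argv[1:]))
```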
|
|
We'll continue supporting `--no-sync' even if it has yet to make it
into a release, but the term `sync' is overloaded in our
codebase, which may be confusing to new hackers and users.
None of our code or dependencies issue the sync(2) syscall,
either, only fsync(2) and fdatasync(2).
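
To illustrate the distinction, here is a minimal sketch of a write
path that flushes a single file with fsync(2) and lets the caller skip
that step. The function and its `durable' parameter are hypothetical
and do not correspond to any public-inbox API:

```python
import os


def write_file(path, data, durable=True):
    """Write bytes to path, optionally forcing them to stable storage."""
    with open(path, "wb") as fh:
        fh.write(data)
        fh.flush()  # push Python's userspace buffer to the kernel
        if durable:
            # fsync(2) only affects this one file descriptor,
            # unlike sync(2) which targets all filesystems.
            os.fsync(fh.fileno())
```

Skipping the fsync call trades durability after a crash or power loss
for speed, which is the trade-off such a switch exposes.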
|
|
This gives better page cache utilization for Xapian indexing on
slow storage by improving locality for random I/O activity on
the Xapian DB.
Instead of doing a single pass to index both SQLite and Xapian,
this indexes them separately. The first pass is identical to
indexlevel=basic: it indexes both over.sqlite3 and msgmap.sqlite3.
Subsequent passes only operate on a single Xapian shard for
documents belonging to that shard. Given enough shards, each
individual shard can be made small enough to fit into the kernel
page cache and avoid HDD seeks for read activity.
Doing rough tests with a busy system with a 7200 RPM HDD with ext4,
full indexing of LKML (9 epochs) goes from ~80 hours (-j0) to
~30 hours (-j8) with 16GB RAM with 7 shards configured and fsync(2)
disabled (--no-sync) and `--batch-size=10m'.
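
The pass structure described above can be sketched as follows. This is
a simplified model, not the actual indexer: the callbacks are
hypothetical, and assigning a document to a shard by taking its number
modulo the shard count is an assumption made for this example:

```python
def sequential_shard_passes(doc_nums, nshards):
    """Yield (shard, docs) pairs so only one shard is touched at a time."""
    for shard in range(nshards):
        # Assumed mapping: document number modulo the number of shards.
        yield shard, [n for n in doc_nums if n % nshards == shard]


def index_all(doc_nums, nshards, index_sqlite, index_xapian_shard):
    """Run one SQLite-only pass, then one Xapian pass per shard."""
    for n in doc_nums:
        index_sqlite(n)  # first pass: SQLite only
    for shard, docs in sequential_shard_passes(doc_nums, nshards):
        for n in docs:
            index_xapian_shard(shard, n)  # later passes: one shard each
```

Because each later pass writes to a single shard, the working set per
pass is bounded by the size of that shard instead of the whole index.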
|
|
This allows us to speed up indexing operations to SQLite
and Xapian.
Unfortunately, it doesn't affect operations using
`xapian-compact' and the compactor API, since that doesn't seem
to support Xapian::DB_NO_SYNC, yet.
|
|
Older versions of public-inbox < 1.3.0 had subtly
different semantics around threading in some corner
cases. This switch (when combined with --reindex)
allows us to fix them by regenerating associations.
|
|
grok-pull is still painful with serialization on an old USB 2.0
HDD, but at least it can finish with flock(1) and disabling
parallelization. While parallel "git fetch" doesn't seem so
bad, slow seeks are exacerbated by parallel reads in Xapian.
That means some updates can take days instead of hours. The
same updates take only seconds or minutes on an SSD.
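
Serializing jobs behind a lock file, in the spirit of flock(1), can be
sketched like this. The function name, lock path and command are
hypothetical, and the sketch is Unix-only since it relies on the fcntl
module:

```python
import fcntl
import subprocess


def run_serialized(lock_path, cmd):
    """Run cmd while holding an exclusive lock, so runs never overlap."""
    with open(lock_path, "a") as lock_fh:
        # Blocks until any other holder of the lock has released it.
        fcntl.flock(lock_fh, fcntl.LOCK_EX)
        try:
            return subprocess.run(cmd, check=False).returncode
        finally:
            fcntl.flock(lock_fh, fcntl.LOCK_UN)
```

Every job that opens the same lock file waits its turn, which keeps
seek-heavy work from running in parallel on a slow disk.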
|
|
Update release notes with some features in the 1.6 timeline.
We'll note the version availability of some command-line
options; it may help users who are reading the latest
documentation online but running older versions.
|
|
We'll be implementing some search/threading extensions in
IMAP and providing analogues over HTTP via JMAP.
|
|
Right now[1] the Perl upstream plan is to maintain 5 compatibility
in Perl 7 for at least 5 years[2], and perhaps drop it when Perl 8
comes along. That said, distros may pick it up and maintain 5 on their
own given the vast amounts of perfectly good legacy code out there.
[1] http://nntp.perl.org/group/perl.perl5.porters/257817
[2] http://nntp.perl.org/group/perl.perl5.porters/257565
|
|
I originally proposed this rewording to address Leah's comment
but forgot to squash it in :x
Link: https://public-inbox.org/meta/20200408221741.GA10142@dcvr/
Cc: Leah Neukirchen <leah@vuxu.org>
|
|
`~/.cache/public-inbox/inline-c' is supported nowadays
for convenience, but Inline::C usage will remain opt-in.
|
|
This simplifies the primary callers of eml_entry while only making
mknews.perl worse.
|
|
We no longer favor getline+close for streaming PSGI responses
when using public-inbox-httpd. We still support it for other
PSGI servers, though.
|
|
We can save stack space and simplify subroutine calls, here.
|
|
This will make it easier to support asynchronous blob
retrievals. The `$ctx->{nr}' counter is no longer implicitly
supplied since many users didn't care for it, so stack overhead
is slightly reduced.
|