Date Commit message
2021-08-11treewide: use *nix-specific dirname regexps
None of our code elsewhere accounts for non-*nix pathnames and it's not worth our time to start. So stop wasting CPU cycles giving the illusion that we'd care about non-*nix pathnames.
2021-08-09lei_xsearch: improve Xapian open failure messages
Displaying $! can help users diagnose resource limit problems such as EMFILE/ENFILE/ENOMEM. $@ is currently useful for XS Search::Xapian and perhaps future versions of the Xapian.pm SWIG bindings.
2021-08-08searchidx: die on Xapian load errors
Xapian bindings may not be installed or may be out-of-date w.r.t. the Perl version, so improve the visibility of errors in those cases. Clean up and drop some redundant checks while we're at it. Cc: "Toke Høiland-Jørgensen" <toke@toke.dk> Link: https://public-inbox.org/meta/87k0ky5mbd.fsf@toke.dk/
2021-08-08tests: fix test failures when Xapian is missing
We still support usage without Xapian, so ensure our tests work when Xapian bindings are missing.
2021-08-08httpd: set psgi.url_scheme to 'https' for TLS listeners
For users using the native TLS functionality of -httpd (instead of using nginx + Plack::Middleware::ReverseProxy), psgi.url_scheme=http was wrong and would lead to improper redirects.
2021-08-06li2wrap: avoid double-close on Linux::Inotify2 <2.3
LI2Wrap was not working as expected due to the missing bless to override ->DESTROY. This bug showed up in a message check in t/lei-q-remote-import.t. Fixes: 7fc6e30aeab9925b ("lei: close inotify FD in forked child")
2021-08-05lei export-kw: workaround race in updating Maildir locations
Inotify updates may simultaneously remove or update the location of a message, so ensure we at least have knowledge of the new location if the old one cannot be updated.
2021-08-04extindex: fix boost with partial runs
Boost relies on knowledge of all inboxes in a given config file to work properly. So while we support indexing a subset of inboxes, we must still account for boost in inboxes we're not indexing. So split internal inbox groups into "known" and "active", where previously we only cared for inboxes which were being actively indexed. Furthermore, boost checks need to be applied when a message arrives in different inboxes across multiple invocations. Reported-by: Konstantin Ryabitsev <konstantin@linuxfoundation.org> Link: https://public-inbox.org/meta/20210802204058.vscbxs5q7xyolyu2@nitro.local/
2021-08-04extindex: do not over-account for cross-posted messages
Cross-posted messages don't result in massive writes to the Xapian DBs like a completely unseen message would, so stop accounting for their size. This ought to improve performance for heavily cross-posted setups, but --commit-interval still has an effect.
2021-08-04lei: close inotify FD in forked child
Linux::Inotify2 2.3+ includes an ->fh method to give us the ability to safely close an FD without hitting EBADF (and automatically use FD_CLOEXEC). We'll still need a new wrapper class (LI2Wrap) to handle it for users of old versions, though. Link: http://lists.schmorp.de/pipermail/perl/2021q3/thread.html
2021-07-31extindex: -xcpdb and -compact support
Since extindex uses Xapian shards in a similar way to v2 inboxes, we'll support -xcpdb (reshard+upgrade) and -compact all the same to give admins tuning+upgrade options.
2021-07-31admin: index_inbox: drop unnecessary check
No callers pass an unblessed pathname to index_inbox, only Inbox object refs.
2021-07-28listener: maximize listen(2) backlog
This helps avoid errors from script/lei dying on ECONNRESET when a single lei-daemon is serving all tests when run via "make check-run". Instead of using some arbitrary limit, use INT_MAX and let the kernel clamp it (both Linux and FreeBSD do). There's no need to call listen() in LEI.pm, either, since Listener->new takes care of it.
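The backlog idea above can be sketched in Python (the project itself is Perl, so this is only an analogue, not the actual Listener code). The helper name `make_listener` and the loopback TCP address are assumptions made for illustration; the point is simply that the caller asks for INT_MAX and leaves the clamping to the kernel.

```python
import socket

INT_MAX = 2**31 - 1  # largest value of a 32-bit C int


def make_listener(host="127.0.0.1", port=0):
    """Return a listening TCP socket that requests the largest possible backlog.

    Hypothetical helper: the kernel is expected to clamp the requested
    backlog to its own limit, so no arbitrary number is chosen here.
    """
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind((host, port))   # port 0 lets the OS pick a free port
    srv.listen(INT_MAX)      # request INT_MAX; the kernel decides the real cap
    return srv


if __name__ == "__main__":
    srv = make_listener()
    print("listening on", srv.getsockname())
    srv.close()
```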
2021-07-28lei: die on ECONNRESET
ECONNRESET should be rare on a private local socket, and if we hit it, it's because we're hitting the listen() limit.
2021-07-28treewide: s/sequential_shard/sequential-shard/g
The underscore variant was never documented and maintaining the difference between the command-line and internal hash is not worth it.
2021-07-25extindex: support --jobs/-j properly on creation for shard count
This wasn't wired up properly, but Xapian appears to suffer from I/O amplification problems as DB shards get larger: https://lists.xapian.org/pipermail/xapian-discuss/2019-February/009727.html <23640.32170.703368.841021@y.dockes.com> Of course, we shouldn't have too many shards, either, because performance problems with too many shards were the entire reason extindex was created: https://lists.xapian.org/pipermail/xapian-discuss/2020-August/009823.html <20200826064728.GA32239@dcvr>
2021-07-25doc: lei-{p2q,rediff}: note implicit --stdin
lei actually uses implicit --stdin everywhere, but I think these patch-related commands are the most common use of them.
2021-07-25t/lei-watch.t: improve test reliability
On single CPU (and overloaded SMP) systems, we can't rely on inotify in lei-daemon firing before a "lei note-event done" client hits it. So force in a single tick() to ensure the scheduler can yield to lei-daemon and see the inotify wakeup before "lei note-event done" to commit the write.
2021-07-25init: support git <2.30 for "-c KEY=VALUE" args
It turns out `--fixed-value' is a relatively new git-config(1) feature in git 2.30+ (December 2020). So use the quotemeta perlop for now since it seems compatible-enough for POSIX ERE used by git.
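The escaping approach can be illustrated with Python's `re.escape`, which plays roughly the role that Perl's quotemeta plays in the commit. This is a sketch only: Python's `re` dialect is not POSIX ERE, and the helper name `literal_pattern` and the `^...$` anchoring are assumptions, not something the commit states.

```python
import re


def literal_pattern(value):
    """Return an anchored regex source that matches `value` literally.

    Hypothetical helper: regex metacharacters in `value` are escaped so
    they lose their special meaning.
    """
    return "^" + re.escape(value) + "$"


pat = literal_pattern("a.b*c")
# The escaped pattern matches the exact string ...
assert re.search(pat, "a.b*c")
# ... but not strings that the unescaped regex "a.b*c" would have matched.
assert not re.search(pat, "aXbbc")
```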
2021-07-25lei_mail_sync: locations_for API uses oidbin for comparisons
Favor oidbin use internally to reduce internal memory traffic.
2021-07-25lei_inspect: fix typo
Not sure how this wasn't caught, earlier...
2021-07-25lei_search: favor binary OID comparisons
Reduce memory traffic and code, too.
2021-07-25extsearchidx: favor binary comparison in common case
We'll use 20-byte SHA-1 comparisons instead of 40-byte hex representations for a minor reduction in memory traffic.
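The size difference between the two representations is easy to see in a short Python sketch (an analogue only; the project code is Perl). The function names here are made up for illustration. The blob hashing follows git's usual "blob <length>\0" header convention.

```python
import hashlib


def git_blob_oid_bin(data):
    """Return the 20-byte binary SHA-1 object ID git assigns to a blob."""
    header = b"blob " + str(len(data)).encode("ascii") + b"\0"
    return hashlib.sha1(header + data).digest()


def same_oid(oidbin_a, oidbin_b):
    """Compare two object IDs in binary form (20 bytes each, not 40 hex chars)."""
    return oidbin_a == oidbin_b


oidbin = git_blob_oid_bin(b"")
oidhex = oidbin.hex()

assert len(oidbin) == 20   # binary form
assert len(oidhex) == 40   # hex form is twice as long
assert same_oid(oidbin, bytes.fromhex(oidhex))
```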
2021-07-25extsearchidx: use more appropriate max for dedupe
The over.msgid table may contain ghost Message-IDs and also Message-IDs of deleted spam messages, so over->max isn't a good approximation of dedupe progress.
2021-07-25extindex: improve comment around git->async_wait_all
I found myself tempted to remove this, but it appears impossible due to odd messages which have multiple Message-IDs.
2021-07-25extindex: support --dedupe[=MSGID]
Sometimes I just want to dedupe a single Message-ID to test something, and this lets me do it. This patch appears to do what it's supposed to. But it also appears to be finding duplicates that were previously missed. That's a good thing, but I wish I understood what seems to be fixed :x I'm not sure why the previous ExtSearchIdx.pm (blob 357312b8) was causing messages to be missed, even, and why this patch seems to fix it... And it's not infinite looping, either. Anyways, before this patch, "-extindex --dedupe" was taking ~5 min to no-op every message (after the initial full --dedupe run which took over a day to run). No-op --dedupes now take just under 2 hours to scan every single cross-posted message for a no-op dedupe. The initial dedupe took nearly 44 hours on my system for <https://yhbt.net/lore/all/> due to SATA-2 TLC SSD latency on 3 gigantic Xapian shards. Running --dedupe with this change seems to prevent /BUG\?.*?not deduplicated properly/ stderr messages from being triggered by View.pm. Current versions of -extindex do not seem susceptible to introducing duplicates.
2021-07-25lei rm-watch: new command to support removing watches
Pretty trivial since it just invokes "git-config". It's mainly intended to make shell completion easier.
2021-07-25lei: avoid SQLite COUNT() for dedupe
SQLite COUNT() is a slow operation that does a full table scan with no conditions. There's no need for it, since lei dedupe only needs to know if it's empty or not to decide between new/ and cur/ for Maildir outputs.
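An emptiness check that avoids the full scan might look like the following Python sketch using the standard `sqlite3` module. The table name `seen` and the helper `is_empty` are hypothetical and not taken from lei's schema; the technique is just asking for at most one row instead of counting all of them.

```python
import sqlite3


def is_empty(conn, table):
    """Return True if `table` has no rows, without counting all of them.

    `table` is interpolated directly, so only pass trusted identifiers.
    """
    row = conn.execute(f"SELECT 1 FROM {table} LIMIT 1").fetchone()
    return row is None


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE seen (oid BLOB PRIMARY KEY)")  # hypothetical table
assert is_empty(conn, "seen")

conn.execute("INSERT INTO seen (oid) VALUES (?)", (b"\x00" * 20,))
assert not is_empty(conn, "seen")
```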
2021-07-25t/lei*: check error messages on failures
I just hit an unreproducible failure in t/lei-p2q.t and lacked $lei_err information to diagnose it. Hopefully this helps track down odd failures in the future.
2021-07-22t/solver_git: use like() to improve error reporting
I hit a test failure here, but haven't been able to reproduce it...
2021-07-22lei: auto-refresh watches in config, cancel missing
This makes behavior less surprising on restarts as we no longer lose state on restarts, so there's no need to manually run "lei add-watch" to re-enable watches. This also allows us to transparently handle changes if somebody edits the lei config file directly or via git-config(1).
2021-07-22lei: start implementing inotify Maildir support
This allows lei to automatically note keyword (message flag) changes made to a Maildir and propagate them into lei/store:
    lei add-watch --state=tag-ro /path/to/Maildir
This doesn't persist across restarts, yet. In the future, it will be applied automatically to "lei q" output Maildirs by default (with an option to disable it). State values of tag-rw, index-<ro|rw>, import-<ro|rw> will all be supported for Maildir. This represents a fairly major internal change that's fairly intrusive, but the whole daemon-oriented design was to facilitate being able to automatically monitor (and propagate) Maildir/IMAP flag changes.
2021-07-22init: allow arbitrary key-values via -c KEY=VALUE
This won't blindly append identical key=values, but allows specifying multiple, different key=value pairs as long as the values are different.
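The "don't append identical key=values" rule can be sketched as below. This is a Python illustration of the described behavior, not the actual -init code; the helper name, the in-memory dict, and the example key `section.key` are all assumptions.

```python
def add_config_value(cfg, key, value):
    """Append `value` under `key` unless that exact key=value pair exists.

    `cfg` maps each key to a list of values (hypothetical in-memory model
    of a multi-valued config). Returns True if the value was added.
    """
    values = cfg.setdefault(key, [])
    if value in values:
        return False  # identical key=value: do not append again
    values.append(value)
    return True


cfg = {}
assert add_config_value(cfg, "section.key", "a")      # first value is stored
assert not add_config_value(cfg, "section.key", "a")  # duplicate is skipped
assert add_config_value(cfg, "section.key", "b")      # different value is kept
assert cfg == {"section.key": ["a", "b"]}
```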
2021-07-22extsearch: support publicinbox.*.boost parameter
This behaves identically to the lei external "boost" parameter in prioritizing raw messages for extindex. Relying exclusively on the config file order doesn't work well for mirrors since it's impossible to guarantee config file ordering via grokmirror hooks. Config file ordering remains the default if boost is unconfigured, or in case of ties. Note: I chose the name "boost" rather than "priority" or "rank" since I always get confused by whether higher or lower numbers take precedence when it comes to kernel scheduling. "weight" is also a part of Xapian API terminology, which we currently do not expose to configuration (but may in the future).
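The tie-breaking rule can be sketched with a stable sort. This Python snippet is an illustration only and makes two assumptions the commit message does not spell out: that a higher boost is preferred, and that an unconfigured boost counts as 0. The inbox names are made up.

```python
def order_by_boost(inboxes):
    """Order inboxes by descending boost, keeping config order for ties.

    `inboxes` is a list of (name, boost) tuples in config-file order.
    sorted() is stable, so entries with equal boost keep their input order.
    """
    return sorted(inboxes, key=lambda inbox: -inbox[1])


config_order = [("a", 0), ("b", 5), ("c", 0), ("d", 5)]
ordered = [name for name, _ in order_by_boost(config_order)]
assert ordered == ["b", "d", "a", "c"]
```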
2021-07-20httpd: fix SIGHUP by invalidating cache on reload
Since we require separate PublicInbox::HTTPD instances for each listen socket address (in order to support {SERVER_<NAME|PORT>} for PSGI env), the old cache needed to be invalidated on rare app refreshes. SIGHUP has always been broken in -httpd (but not -imapd or -nntpd) due to this cache. Update the daemon documentation and 5.10.1-ize some bits while we're in the area.
2021-07-18config: s/_one_val/get_1/ for public use
We'll be using this in lei for watch configs.
2021-07-08extindex: dedupe: reduce SQLite contention and dirty data
Complex queries cause SQLite to block readers for longer than their retry period. For dedupe, it was also preventing us from making good use of checkpoints due to the query time. With many deduplications, checkpoints are necessary to maintain system health due to having too much data piled up.
2021-07-08extsearchidx: ignore Eml warnings across the board
There's nothing we can do about misformatted emails and headers we get from untrusted sources. They're too noisy and those messages already exist in public-inboxes, anyways, so just keep things quiet so we can spot real problems more easily.
2021-07-06extindex: --gc: avoid SQLite lock conflict on shard cleanup
Xapian shard cleanup only requires read-only access to over.sqlite3, so avoid opening it with read-write access since create_tables will hit lock conflicts on "INSERT OR IGNORE" statements.
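Opening a SQLite file strictly read-only can be sketched in Python with a `mode=ro` URI. This shows the general technique rather than the project's Perl code; the helper name `open_readonly` is an assumption. A connection opened this way cannot execute write statements such as "INSERT OR IGNORE".

```python
import sqlite3
from pathlib import Path


def open_readonly(path):
    """Open an existing SQLite database file in read-only mode."""
    uri = Path(path).resolve().as_uri() + "?mode=ro"
    return sqlite3.connect(uri, uri=True)


def can_write(conn):
    """Return True if a write statement succeeds on `conn` (demo helper)."""
    try:
        conn.execute("CREATE TABLE IF NOT EXISTS probe (x INTEGER)")
        return True
    except sqlite3.OperationalError:
        return False
```

Reads through such a handle work as usual, while any attempted write raises `sqlite3.OperationalError`.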
2021-07-06extindex: implement --dedupe to fix old extindices
This is intended to fix older indices that had deduplication bugs for matching content. It'll also make dealing with future changes to ContentHash easier since that's never guaranteed stable. It also supports --dry-run to print changes only without making them.
2021-07-06eml: relax warn_ignore regexps for current Email::Address::XS
These seem needed with the data I'm currently working on, but I haven't changed my version of Email::Address::XS since my last Debian stable upgrade (to buster).
2021-07-05lei: drop workers on EOF from clients
Sometimes a user will be bored waiting for a command to finish, so ensure we drop disconnect workers in this case.
2021-07-03lei import: increase flags search batch size, display progress
IMAP flag-only synchronization doesn't fetch entire messages, so we can safely bump the batch size to 10000 times whatever a user specified for full messages (and only when they specified one). Since I sometimes wonder why nothing happens for several seconds after starting "lei import $URL", we'll also show some progress during the flag synchronization phase.
2021-07-03lei inspect: help+completion for --dir option
It's the most generic name I could find for it since it can mean so many things...
2021-07-03extsearchidx: extra assertions for deduplication flow
I haven't found any bugs from this (still looking for missed deduplication bugs), and it's a bit shorter and more likely to catch future bugs. Clean up an unnecessary ->{mid} array copy while we're at it, too.
2021-07-01lei inspect: support "mid:" (and "m:") prefix
Using this to track down deduplication failures in -extindex...
2021-07-01lei inspect: support automatic pager in output
All commands which output non-trivial amounts of data to the terminal should support this.
2021-07-01extsearchidx: lock before writing multi-pack-index
This avoids errors from git in case -extindex gets invoked in parallel.
2021-06-30extsearchidx: symlink .rev and .bitmap files into ALL.git
It's possible for these to exist and git can (or may eventually) take advantage of them to speed up functionality which affects us.
2021-06-30searchidx: default BATCH_BYTES to 8MB on 64-bit systems
This default seems closer to reasonable on 64-bit systems, which are the norm these days. 32-bit systems gain 48K so it's an even 1 MB, but we need to keep 32-bit systems from using too much since there are still some ancient systems out there with small inboxes.
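The arithmetic can be sketched as follows. Two assumptions are made here and are not confirmed by the commit message: "MB" is read as a binary megabyte (1024 * 1024 bytes), and the earlier default is inferred to have been 1,000,000 bytes, since that is what makes the "gain 48K" figure work out. The function name is hypothetical and the code is Python rather than the project's Perl.

```python
import struct

MIB = 1024 * 1024


def default_batch_bytes(pointer_bytes=struct.calcsize("P")):
    """Pick a default batch size from the platform's pointer width.

    Assumption: 8-byte pointers mean a 64-bit system.
    """
    return 8 * MIB if pointer_bytes >= 8 else 1 * MIB


assert default_batch_bytes(8) == 8388608   # 64-bit default
assert default_batch_bytes(4) == 1048576   # 32-bit default, an even 1 MiB
# Assumed old default of 1,000,000 bytes: the 32-bit gain is roughly 48K.
assert default_batch_bytes(4) - 1_000_000 == 48576
```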