about summary refs log tree commit homepage
path: root/t
DateCommit message (Collapse)
2021-09-03lei: dump errors to syslog, and not to CLI
Dumping errors from the previous run can often get lost, so just spew to syslog since it's a standard place to put errors that don't make it to a client. Note: we don't rely on $SIG{__WARN__} since some of the Net:: stuff will write directly to STDERR (as will external processes).
2021-09-02tests: "make check-run" favors reliability over speed
Sharing a single lei-daemon across multiple processes still exhibits reliability problems, and reliably checking lei-daemon's inotify internals seems impossible without. Even without lei-daemon sharing, "make check-run" is a few seconds faster than "make check" for me.
2021-09-02t/lei-auto-watch: improve test reliability
On slower systems, even a 100ms delay may not be enough; so loop and retry in hopes of an early exit for faster systems.
2021-09-02lei: propagate keyword changes from lei/store
This works with existing inotify/EVFILT_VNODE functionality to propagate changes made from one Maildir to another Maildir. I chose the lei/store worker process to handle this since propagating changes back into lei-daemon on a massive scale could lead to dead-locking while both processes are attempting to write to each other. Eliminating IPC overhead is a nice side effect, but could hurt performance if Maildirs are slow. The code for "lei export-kw" is significantly revamped to match the new code used in the "lei/store" daemon. It should be more correct w.r.t. corner-cases and stale entries, but perhaps better tests need to be written. squashed: t/lei-auto-watch: increase delay for FreeBSD kevent My FreeBSD VM seems to need longer for this test than inotify under Linux, likely because the kevent support code needs to be more complicated.
2021-09-02lei_mail_sync: do not use transactions
For lei-index to work in parallel with MUA access and upcoming inotify-based updates, mail_sync.sqlite3 needs to always be up-to-date to read-only worker processes (ahead of everything else). So rely on the default auto-commit behavior and hope SQLite WAL can reduce some of the overheads involved with writes.
2021-09-01extindex: --gc removes messages from over, too
While messages from removed inboxes were removed from Xapian search, --gc failed to remove messages from over.sqlite3 entirely. They no longer show up in the topic summary view. Reported-by: Konstantin Ryabitsev <konstantin@linuxfoundation.org> Link: https://public-inbox.org/20210830201723.dehoul4y6gpqf2cp@nitro.local/
2021-08-31lei_mail_sync: set_src uses binary OIDs
Another step towards moving more of our internals to use binary OIDs to avoid needless conversions before hitting disk.
2021-08-31t/lei-watch: avoid race between glob + readlink
Open file handles in lei-daemon may be unstable so we need to account for readlink() returning undef.
2021-08-30www: move mirror instructions to /text/
This makes the mirroring and code retrieval instructions less obstructive. Relying on WwwText means we only use our Linkify module to make hrefs of full URLs; making relative and shortened hrefs off-limits; hopefully this isn't too much of a problem. coderepo information remains duplicated on every page since (IMHO) coderepos are an important feature; but nobody besides me has ever bothered to configure coderepos, so I suppose it's fine... Suggested-by: Konstantin Ryabitsev <konstantin@linuxfoundation.org> Link: https://public-inbox.org/meta/20210826132747.6gxuwnhftyf7c6hp@nitro.local/
2021-08-28www_listing: fix odd "locate inbox" cases
Searching inboxes with an empty query no longer gives 500 errors due to Xapian. Also, improve the error message when no inboxes match, since saying no inboxes exist yet is wrong.
2021-08-28www_listing: show ->ALL at top of HTML listing
It's a special case and we can show it in the HTML display without affecting manifest.js.gz generation.
2021-08-28move ->ids_after from mm to over
Since we favor ->over in WWW and IMAP, move this method to ->over to reduce open files in common cases. This fixes the /$EXTINDEX_NAME/all.mbox.gz endpoint for extindex entries (which may get expensive...).
2021-08-28www_text: fix example config snippet for extindex
extindex doesn't use the same config stuff as normal "publicinbox" entries, so we'll need a separate function for them.
2021-08-28get rid of unnecessary bytes::length usage
The only place where we could return wide characters with -httpd was the raw $INBOX_DIR/description text, which is now converted to octets. All daemon (HTTP/NNTP/IMAP) sockets are opened in binary mode, so length() and bytes::length() are equivalent on reads. For socket writes, any non-octet data would warn about wide characters and we are strict in warnings with test_httpd. All gzipped buffers are also octets, as is PublicInbox::Eml->body, and anything from PerlIO objects ("git cat-file --batch" output, filesystems), so bytes::length was unnecessary in all those places.
2021-08-19lei q: make --save the default
Since "lei up" is more often useful than not and incurs neglible overhead; enable --save by default and allow --no-save to work. This also fixes a long-standing when overwriting --output destinations with saved searches: dedupe data from previous searches are reset and no longer influences the new (changed) search, so results no longer go missing if two sequential invocations of "lei q --save" point to the same --output.
2021-08-17view: remove mbox.gz and Atom from topic view
This declutters the topic view since these links seem rarely used. Atom and mbox.gz links probably make most sense when users have read the HTML and decide the topic is worth following or downloading. Reported-by: Konstantin Ryabitsev <konstantin@linuxfoundation.org> Link: https://public-inbox.org/meta/20210816154444.sj3ks2sikq3x2ywx@nitro.local/
2021-08-16t/run.perl: fix "make check-run" on FreeBSD 11.x
Persistent lei-daemon still leads to ECONNRESET client errors on FreeBSD, and maxing out the kern.ipc.soacceptqueue sysctl (as documented in the FreeBSD listen(2) manpage) doesn't seem to help. "make check-run" is still 4-5s faster than "make check" on my FreeBSD VM even after this change, so it's still a worthwhile improvement.
2021-08-11lei: attempt to canonicalize away "/../" pathnames
As documented, File::Spec->canonpath does not canonicalize "/../". While we want to do our best to preserve symlinks in pathnames, leaving "/../" can mislead our inotify|kqueue usage.
2021-08-11lei_saved_search: canonicalized relative save paths
Storing relative paths with '..' in them can be expensive to resolve when running 'lei up', so prefer storing canonicalized absolute paths. We only do this for paths with '..' in them, though, since this can lose symlink info.
2021-08-11treewide: use *nix-specific dirname regexps
None of our code elsewhere accounts for non-*nix pathnames and it's not worth our time to start. So stop wasting CPU cycles giving the illusion that we'd care about non-*nix pathnames.
2021-08-08tests: fix test failures when Xapian is missing
We still support usage without Xapian, so ensure our tests work when Xapian bindings are missing
2021-08-08httpd: set psgi.url_scheme to 'https' for TLS listeners
For users using the native TLS functionality of -httpd (instead of using nginx + Plack::Middleware::ReverseProxy), psgi.url_scheme=http was wrong and would lead to improper redirects.
2021-08-04extindex: fix boost with partial runs
Boost relies on knowledge of all inboxes in a given config file to work properly. So while we support indexing a subset of inboxes, we must still account for boost in inboxes we're not indexing. So split internal inbox groups into "known" and "active", where previously we only cared for inboxes which were being actively indexed. Furthermore, boost checks need to be applied when a message arrives in different inboxes across multiple invocations. Reported-by: Konstantin Ryabitsev <konstantin@linuxfoundation.org> Link: https://public-inbox.org/meta/20210802204058.vscbxs5q7xyolyu2@nitro.local/
2021-07-31extindex: -xcpdb and -compact support
Since extindex uses Xapian shards in a similar way to v2 inboxes, we'll support -xcpdb (reshard+upgrade) and -compact all the same to give admins tuning+upgrade options.
2021-07-25extindex: support --jobs/-j properly on creation for shard count
This wasn't wired up properly, but Xapian appears to suffer from I/O amplification problems as DB shards get larger: https://lists.xapian.org/pipermail/xapian-discuss/2019-February/009727.html <23640.32170.703368.841021@y.dockes.com> Of course, we shouldn't have too many shards, either; because performance problems with too many shards was the entire reason extindex was created: https://lists.xapian.org/pipermail/xapian-discuss/2020-August/009823.html <20200826064728.GA32239@dcvr>
2021-07-25t/lei-watch.t: improve test reliability
On single CPU (and overloaded SMP) systems, we can't rely on inotify in lei-daemon firing before a "lei note-event done" client hits it. So force in a single tick() to ensure the scheduler can yield to lei-daemon and see the inotify wakeup before "lei note-event done" to commit the write.
2021-07-25lei_mail_sync: locations_for API uses oidbin for comparisons
Favor oidbin use internally to reduce internal memory traffic.
2021-07-25lei rm-watch: new command to support removing watches
Pretty trivial since it just invokes "git-config". It's mainly intended to make shell completion easier.
2021-07-25t/lei*: check error messages on failures
I just hit an unreproducible failure in t/lei-p2q.t and lacked $lei_err information to diagnose it. Hopefully this helps track down odd failures in the future.
2021-07-22t/solver_git: use like() to improve error reporting
I hit a test failure here, but haven't been able to reproduce it...
2021-07-22lei: auto-refresh watches in config, cancel missing
This makes behavior less surprising on restarts as we no longer lose state on restarts, so there's no need to manually run "lei add-watch" to re-enable watches. This also allows us to transparently handle changes if somebody edits the lei config file directly or via git-config(1).
2021-07-22lei: start implementing inotify Maildir support
This allows lei to automatically note keyword (message flag) changes made to a Maildir and propagate it into lei/store: lei add-watch --state=tag-ro /path/to/Maildir This doesn't persist across restarts, yet. In the future, it will be applied automatically to "lei q" output Maildirs by default (with an option to disable it). State values of tag-rw, index-<ro|rw>, import-<ro|rw> will all be supported for Maildir. This represents a fairly major internal change that's fairly intrusive, but the whole daemon-oriented design was to facilitate being able to automatically monitor (and propagate) Maildir/IMAP flag changes.
2021-07-22init: allow arbitrary key-values via -c KEY=VALUE
This won't blindly append identical key=values, but allows specifying multiple, different key=value pairs as long as the values are different.
2021-07-22extsearch: support publicinbox.*.boost parameter
This behaves identically the lei external "boost" parameter in prioritizing raw messages for extindex. Relying exclusively on the config file order doesn't work well for mirrors since it's impossible to guarantee config file ordering via grokmirror hooks. Config file ordering remains the default if boost is unconfigured, or in case of ties. Note: I chose the name "boost" rather than "priority" or "rank" since I always get confused by whether higher or lower numbers take precedence when it comes to kernel scheduling. "weight" is also a part of Xapian API terminology, which we currently do not expose to configuration (but may in the future).
2021-07-20httpd: fix SIGHUP by invalidating cache on reload
Since we require separate PublicInbox::HTTPD instances for each listen socket address (in order to support {SERVER_<NAME|PORT>} for PSGI env), the old cache needed to be invalidated on rare app refreshes. SIGHUP has always been broken in -httpd (but not -imapd or -nntpd) due to this cache. Update the daemon documentation and 5.10.1-ize some bits while we're in the area.
2021-07-06extindex: implement --dedupe to fix old extindices
This is intended to fix older indices that had deduplication bugs for matching content. It'll also make dealing with future changes to ContentHash easier since that's never guaranteed stable. It also supports --dry-run to print changes only without making them.
2021-06-29www: fix manifest.js.gz for default publicInbox.grokManifest
ManifestJsGz->response was not invoking the new "url_filter" method properly. Furthermore, fix url_filter for returning 404 responses. Reported-by: Kyle Meyer <kyle@kyleam.com> Link: https://public-inbox.org/meta/87fsx3128a.fsf@kyleam.com/ Fixes: 520be116e8a686cb ("www_listing: start updating for pagination + search")
2021-06-23www: do not warn on blank query parameters
Sometimes users (or bots) may lead queries with '&' and trigger uninitialized variable warnings, just ignore them and give consumers a $ctx->{qp}->{''} entry. While we're in the area, pass a regexp rather than scalar string to the `split' perlop to prevent Perl from recompiling the regexp on every call.
2021-06-23search: make xap_terms easier-to-use and use it more
This allows us to simplify callers throughout, and exceptions are can no longer be silently hidden. MiscSearch now uses xap_terms for looking up eidx_key terms for a code reduction. We also simplify LeiStore->_msg_kw for runtime use by moving the MsetIterator handling into t/lei_store.t test case.
2021-06-22t/hl_mod: accept "make" or "makefile" for Makefile
Version 4.0 of highlight has renamed the "make" language to "makefile". So just check the string starts with "make", to handle both 3.x and 4.x. I tested that public-inbox does actually work with highlight 4 -- it can highlight my Makefile fine. :)
2021-06-18t/sigfd: add diagnostic for occasional FreeBSD failure
Not 100% sure what's going on, here...
2021-06-14lei index+import: reject keywords from R/O IMAP
Since users can't set IMAP flags in read-only IMAP folders, we won't clobber local flags when importing from IMAP. This also enables the local_blob fallback used for lei-index to be used for index deduplication.
2021-06-14lei_input: allow keywords when importing 1 file from Maildir
This will eventually be useful for supporting inotify watches on Maildir. It will also allow users to script their own FS watchers more easily.
2021-06-13t/lei-import-http: quiet unnecessary diag message
Leftover while writing the test.
2021-06-12lei ls-mail-source: list IMAP folders and NNTP groups
While other tools can provide the same functionality, having integration with git-credential is convenient, here. Caching and completion will be implemented separately.
2021-06-09lei edit-search: fix and add a (weak) test
This broke recently and lacked an automated test, so rely on EDITOR=cat to ensure we have some coverage. Fixes: d2670108f71b1eff ("pkt_op: make pkt_do an OO method")
2021-06-08lei import: speed up repeated Maildir imports
On a 4-core CPU, this speeds up "lei import" on a largish Maildir inbox with 75K messages from ~8 minutes down to ~40s. Parallelizing alone did not bring any improvement and may even hurt performance slightly, depending on CPU availability. However, creating the index on the "fid" and "name" columns in blob2name yields us the same speedup we got. Parallelizing IMAP makes more sense due to the fact most IMAP stores are non-local and subject to network latency. Followup-to: bdecd7ed8e0dcf0b45491b947cd737ba8cfe38a3 ("lei import: speed up kw updates for old IMAP messages")
2021-05-30lei: support implicit stdin by default
This adds implicit stdin suppport for p2q and lcat, while rm and rediff no longer need explicit support for it.
2021-05-30lei lcat: allow IMAP folder URLs w/o UIDVALIDITY
Requiring UIDVALIDITY on the command-line is of course unreasonable.
2021-05-30lei import|lcat: improve+fix single message IMAP support
lcat can now dump the memoized contents of entire IMAP folders, not just a single UID. It's now parallelized and pipelined for multiple lei2mail workers. Furthemore, various forms of JSON output work consistently with blob-only output, now. While working on this, I noticed NetReader was passing UID URLs to imap_each callbacks, which was causing mail_sync.sqlite3 to store UIDs in `folders' and clearly wrong so it's now fixed.