Date       Commit message
2021-06-29 www: fix manifest.js.gz for default publicInbox.grokManifest
ManifestJsGz->response was not invoking the new "url_filter" method properly. Furthermore, fix url_filter when returning 404 responses.
Reported-by: Kyle Meyer <kyle@kyleam.com>
Link: https://public-inbox.org/meta/87fsx3128a.fsf@kyleam.com/
Fixes: 520be116e8a686cb ("www_listing: start updating for pagination + search")
2021-06-27 extindex: maintain pack symlinks and use "git multi-pack-index"
This is a fair amount of complexity, but it speeds up "git cat-file --batch" startup by 3-4% with 50K packfiles and a hot kernel cache. The result appears extremely sensitive to the RAM available to the kernel page cache on my SATA 2 SSD. Faster storage and more RAM can bring loading pack. 2.60s vs. 2.69s were the best cases on my workstation with and without the multi-pack-index, respectively; however, times varied widely (even into minutes) with more activity on my workstation. Getting sub-minute times requires a git patch to speed up alt_odb_usable(): <https://lore.kernel.org/20210624005806.12079-1-e@80x24.org/> Otherwise, prepare to wait several minutes.
2021-06-24 www_listing: fix manifest.js.gz generation with extindex "all"
WwwListing and ManifestJsGz may be too different nowadays to be worth sharing code between them. Update some comments and note that we still need better tests :x
Fixes: 520be116e8a686cb ("www_listing: start updating for pagination + search")
2021-06-24 v2writable: avoid spawning "git hash-object"
We have git_sha() nowadays that's used everywhere, so avoid process spawning overhead for "git hash-object".
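Why no subprocess is needed: a git blob ID is simply the SHA-1 of a short header followed by the file contents. The Python sketch below only illustrates that idea and is not public-inbox's Perl git_sha(); the function name git_blob_sha1 is hypothetical, and it assumes a SHA-1 (not SHA-256) repository.

```python
import hashlib


def git_blob_sha1(data: bytes) -> str:
    """Return the blob ID that "git hash-object" would print for data.

    Git hashes the header b"blob <size>" plus a NUL byte, followed by
    the raw contents, so no external process is required.
    """
    header = b"blob %d\x00" % len(data)
    return hashlib.sha1(header + data).hexdigest()


# Same result as: printf 'hello\n' | git hash-object --stdin
print(git_blob_sha1(b"hello\n"))  # ce013625030ba8dba906f756967f9e9ca394464a
```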
2021-06-24 doc: tuning: add a note about Linux sys.vm.max_map_count
git tends to die when mmap(2) fails due to this limit, so let users know about it. Perhaps git could fall back gracefully.
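A quick way to see where a Linux host stands is to read the limit from procfs. This is a minimal sketch, not part of public-inbox: the helper names are hypothetical, and the "about two mappings per packfile" estimate is a rough assumption (an .idx plus at least one .pack window), not a documented git figure.

```python
from pathlib import Path
from typing import Optional

MAX_MAP_COUNT = "/proc/sys/vm/max_map_count"  # Linux-only procfs knob


def read_max_map_count(path: str = MAX_MAP_COUNT) -> Optional[int]:
    """Return the per-process mmap(2) limit, or None if it can't be read."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None  # not Linux, no procfs, or unexpected contents


def likely_too_low(n_packs: int, limit: Optional[int], maps_per_pack: int = 2) -> bool:
    """Hypothetical heuristic: assume each packfile may need about
    maps_per_pack mappings (its .idx plus at least one .pack window)."""
    if limit is None:
        return False  # unknown limit; nothing useful to report
    return n_packs * maps_per_pack >= limit


limit = read_max_map_count()
print("vm.max_map_count:", limit)
if likely_too_low(50_000, limit):
    print("consider raising it, e.g.: sysctl -w vm.max_map_count=<N>")
```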
2021-06-24 favor git(1) rather than libgit2 for ExtSearch
While both git and libgit2 take around 16 minutes to load 100K alternates, there's already a proposed patch to make git faster: <https://lore.kernel.org/git/20210624005806.12079-1-e@80x24.org/> It's also easier to patch and install git locally, since the git.git build system defaults to prefix=$HOME, and dealing with dynamic linking with libgit2 is more difficult for end users relying on Inline::C. libgit2 remains in use for the non-ALL.git case, but maybe it's not necessary (libgit2 is significantly slower than git in Debian 10 due to SHA-1 collision checking).
2021-06-23 www: do not warn on blank query parameters
Sometimes users (or bots) may lead queries with '&' and trigger uninitialized variable warnings; just ignore them and give consumers a $ctx->{qp}->{''} entry. While we're in the area, pass a regexp rather than a scalar string to the `split' perlop to prevent Perl from recompiling the regexp on every call.
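The same tolerance for blank parameters can be sketched in Python. This is an illustration under assumptions, not the PublicInbox::WWW code: parse_query is a hypothetical name, only '&' is treated as a separator, and later duplicate keys overwrite earlier ones. Python's re module already caches compiled patterns, so precompiling here mirrors the Perl change rather than being strictly necessary.

```python
import re
from urllib.parse import unquote_plus

# Compiled once at import time, mirroring the idea of handing Perl's
# split a precompiled regexp.
AMP = re.compile(r"&")


def parse_query(qs: str) -> dict:
    """Parse a query string, tolerating blank parameters.

    A leading or doubled '&' yields an empty pair, which is stored under
    the '' key instead of raising or warning.
    """
    qp = {}
    for pair in AMP.split(qs):
        key, _, val = pair.partition("=")
        qp[unquote_plus(key)] = unquote_plus(val)  # later duplicates win
    return qp


print(parse_query("&q=foo+bar&o=25"))  # {'': '', 'q': 'foo bar', 'o': '25'}
```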
2021-06-23 www_listing: start updating for pagination + search
When dealing with thousands of inboxes, displaying all of them on a single page isn't going to work. So steal some pagination and search results code from the message search to generate some basic HTML output that looks good in w3m.
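Offset-based pagination over a long inbox list can be sketched as below. The page size of 200, the Page type, and the offset-based prev/next links are hypothetical choices for illustration; the commit does not say how WwwListing parameterizes its pages.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class Page:
    items: List
    offset: int
    prev_offset: Optional[int]  # None on the first page
    next_offset: Optional[int]  # None on the last page


def paginate(entries: Sequence, offset: int = 0, per_page: int = 200) -> Page:
    """Return one page of entries plus offsets for prev/next links."""
    offset = max(0, min(offset, len(entries)))
    items = list(entries[offset:offset + per_page])
    prev_offset = max(0, offset - per_page) if offset > 0 else None
    next_offset = offset + per_page if offset + per_page < len(entries) else None
    return Page(items, offset, prev_offset, next_offset)


inboxes = [f"inbox-{i}" for i in range(450)]
page = paginate(inboxes, offset=400)
print(len(page.items), page.prev_offset, page.next_offset)  # 50 200 None
```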
2021-06-23 search: make xap_terms easier-to-use and use it more
This allows us to simplify callers throughout, and exceptions can no longer be silently hidden. MiscSearch now uses xap_terms for looking up eidx_key terms, reducing code. We also simplify LeiStore->_msg_kw for runtime use by moving the MsetIterator handling into the t/lei_store.t test case.
2021-06-22 t/hl_mod: accept "make" or "makefile" for Makefile
Version 4.0 of highlight has renamed the "make" language to "makefile". So just check the string starts with "make", to handle both 3.x and 4.x. I tested that public-inbox does actually work with highlight 4 -- it can highlight my Makefile fine. :)
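The prefix check described above is tiny; here it is in Python next to a stricter alternative that only accepts the two known spellings. Both function names are hypothetical, and the real test is Perl.

```python
import re


def is_makefile_lang(name: str) -> bool:
    """Accept highlight 3.x's "make" and 4.x's "makefile" language names."""
    return name.startswith("make")


def is_makefile_lang_strict(name: str) -> bool:
    """Stricter variant: only the two known spellings match."""
    return re.fullmatch(r"make(?:file)?", name) is not None


for lang in ("make", "makefile", "perl"):
    print(lang, is_makefile_lang(lang), is_makefile_lang_strict(lang))
```

The prefix check is more forgiving of future renames; the strict form catches unexpected names instead of silently accepting them.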
2021-06-22 lei: use open() perlop for -C (chdir)
This is for consistency with the open() at initial accept, in case we hit a code path which expects Perl directory handles rather than "file handles". Both work with the chdir() perlop (fchdir(2), in our case).
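For comparison, changing directory through an open directory handle (fchdir(2)) looks like this in Python on Unix-like systems. It is an analogue of the Perl change, not lei's code; chdir_via_fd is a hypothetical name.

```python
import os
import tempfile


def chdir_via_fd(path: str) -> None:
    """Open path as a directory and chdir into it with fchdir(2)."""
    fd = os.open(path, os.O_RDONLY | os.O_DIRECTORY)  # Unix-only flag
    try:
        os.fchdir(fd)
    finally:
        os.close(fd)  # the cwd stays put after the handle is closed


# Usage: hop into a temporary directory, then restore the old cwd.
old_cwd = os.getcwd()
with tempfile.TemporaryDirectory() as tmp:
    chdir_via_fd(tmp)
    print(os.getcwd() == os.path.realpath(tmp))  # True
    os.chdir(old_cwd)
```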
2021-06-20 lei import: help + completion for --new-only
I've found it's very helpful for large IMAP folders.
2021-06-20 lei sucks: don't warn or error out on missing dependencies
%INC can hold undef. This can be hit on a Linux machine missing Linux::Inotify2: loading PublicInbox::KQNotify is attempted, and since PublicInbox/KQNotify.pm always exists, the failure to load IO::KQueue leaves an `undef' entry in %INC:
    $ perl -MData::Dumper -I lib \
        -E 'eval { require PublicInbox::KQNotify }; say Dumper(\%INC)'
2021-06-20 view: extra check for redundant messages in HTML view
There appear to be some cases of duplicates appearing due to -extindex. I haven't nailed down the cause yet, but this should make things easier for readers using the PSGI HTML interface in the meantime. The raw mboxrd remains undeduplicated for now, and the correct fix/workaround would be some fsck-like mode for public-inbox-extindex.
2021-06-20 scripts: add syscall-list tool for development
We'll be supporting inotify directly as we do with epoll, so Linux users won't have to deal with XS, extra DSOs, or installing the Linux::Inotify2 (and common::sense) modules.
2021-06-18 t/sigfd: add diagnostic for occasional FreeBSD failure
Not 100% sure what's going on, here...
2021-06-18 lei/store: do not put NULL into over.num column
Simplify oid2docid and filter out undefined docids in ->add_eml instead. This avoids SQLite "datatype mismatch" errors in OverIdx->add_over.
Fixes: d1052f03ea85d4af ("lei/store: cull redundant docids based on blob OID")
2021-06-17 lei/store: cull redundant docids based on blob OID
I'm not sure how this happened (only once for me in March), but it should not happen... In any case, we'll operate on the lowest numbered docid and cull redundant index entries when lei/store is open for read-write. This also fixes the normal lei/store removal path to clean up the xref3 table (since it's not done automatically for public-facing -eidx due to the multi-list nature of it).
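The "keep the lowest docid, cull the rest" rule can be sketched as a pure function over (OID, docid) pairs. This is an illustration, not lei/store's Perl; cull_redundant is a hypothetical name, and the step that skips undefined (None) docids mirrors the later "do not put NULL into over.num" fix rather than anything stated in this commit.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple


def cull_redundant(rows: Iterable[Tuple[str, Optional[int]]]):
    """Group docids by blob OID; keep the lowest, report the rest.

    Returns (keep, cull): keep maps OID -> lowest docid, cull maps
    OID -> sorted list of redundant docids to delete.
    """
    by_oid: Dict[str, List[int]] = defaultdict(list)
    for oid, docid in rows:
        if docid is None:  # skip undefined docids rather than storing NULL
            continue
        by_oid[oid].append(docid)
    keep: Dict[str, int] = {}
    cull: Dict[str, List[int]] = {}
    for oid, docids in by_oid.items():
        docids.sort()
        keep[oid] = docids[0]
        if len(docids) > 1:
            cull[oid] = docids[1:]
    return keep, cull


keep, cull = cull_redundant([("aa", 5), ("aa", 2), ("bb", 7), ("aa", 9), ("cc", None)])
print(keep)  # {'aa': 2, 'bb': 7}
print(cull)  # {'aa': [5, 9]}
```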
2021-06-17 lei_input: prefix bare Maildir paths w/ "maildir:"
This will simplify upcoming code for watches.
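One way to recognize a bare Maildir path is to look for its cur/, new/ and tmp/ subdirectories. The sketch below is hypothetical: the helper names are invented, and converting the path to an absolute one is an assumption about how the prefixed form is stored, not something the commit states.

```python
import os
import tempfile


def is_maildir(path: str) -> bool:
    """A directory with cur/, new/ and tmp/ subdirectories looks like a Maildir."""
    return all(os.path.isdir(os.path.join(path, sub)) for sub in ("cur", "new", "tmp"))


def canonical_input(arg: str) -> str:
    """Prefix bare Maildir paths with "maildir:"; leave everything else alone."""
    if not arg.startswith("maildir:") and is_maildir(arg):
        return "maildir:" + os.path.abspath(arg)
    return arg


with tempfile.TemporaryDirectory() as tmp:
    for sub in ("cur", "new", "tmp"):
        os.mkdir(os.path.join(tmp, sub))
    print(canonical_input(tmp) == "maildir:" + os.path.abspath(tmp))  # True
print(canonical_input("imaps://example.com/INBOX"))  # unchanged
```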
2021-06-17 lei inspect: learn "num:" and "docid:" prefixes
"num:" is useful for inspecting Inbox-ish directories, while "docid:" can be used for any Xapian DB (not just stuff managed by our code).
2021-06-14 lei index+import: reject keywords from R/O IMAP
Since users can't set IMAP flags in read-only IMAP folders, we won't clobber local flags when importing from IMAP. This also enables the local_blob fallback used for lei-index to be used for index deduplication.
2021-06-14 lei_input: allow keywords when importing 1 file from Maildir
This will eventually be useful for supporting inotify watches on Maildir. It will also allow users to script their own FS watchers more easily.
2021-06-13 net_reader: canonicalize URL args on add_url
This fixes cases when users specify an IMAP or NNTP URL with standard port numbers explicitly. In other words, this allows users to use "lei ls-mail-source nntps://public-inbox.org:563/" and "lei ls-mail-source imaps://public-inbox.org:993/" without hitting "BUG:" errors.
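URL canonicalization of this kind typically lowercases the scheme and host and drops an explicit port when it equals the scheme's default. The Python sketch below illustrates that idea; it is not NetReader's code, canonicalize_url is a hypothetical name, and the port table only covers the four schemes mentioned here (their IANA-registered defaults).

```python
from urllib.parse import urlsplit, urlunsplit

# IANA-registered default ports for IMAP and NNTP, plain and TLS
DEFAULT_PORTS = {"imap": 143, "imaps": 993, "nntp": 119, "nntps": 563}


def canonicalize_url(url: str) -> str:
    """Lowercase scheme and host, and drop an explicit default port."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    userinfo, at, _ = parts.netloc.rpartition("@")  # keep user[:pass]@ as-is
    host = parts.hostname or ""  # urlsplit lowercases this
    if ":" in host:
        host = f"[{host}]"  # re-bracket IPv6 literals
    netloc = f"{userinfo}{at}{host}"
    port = parts.port
    if port is not None and port != DEFAULT_PORTS.get(scheme):
        netloc += f":{port}"
    return urlunsplit((scheme, netloc, parts.path, parts.query, parts.fragment))


print(canonicalize_url("nntps://public-inbox.org:563/"))  # nntps://public-inbox.org/
print(canonicalize_url("imaps://public-inbox.org:993/"))  # imaps://public-inbox.org/
```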
2021-06-13 lei import: use url_folder_cache for completion
And fix "lei index" completion while we're at it.
2021-06-13 t/lei-import-http: quiet unnecessary diag message
Leftover while writing the test.
2021-06-13 lei ls-mail-source: write through to URL folder cache
We'll be able to use this for shell completion for lei import, lcat, tag, etc. This also adds --url support for scripting purposes.
2021-06-13 lei: stop pager early on exit
This is necessary when using "ls-mail-source" on an unreachable IMAP server.
2021-06-12 lei ls-mail-source: list IMAP folders and NNTP groups
While other tools can provide the same functionality, having integration with git-credential is convenient, here. Caching and completion will be implemented separately.
2021-06-10 lei tag: less confusing warning about unimported messages
"unimported" is more meaningful than "missing", here. And instead of having every worker spew about unimported messages, we'll accumulate them and only print one warning line. This necessitated altering ->DESTROY behavior and persisting the client socket within the $lei object itself, not just the PktOp consumer object.
2021-06-10 lei import: support --new-only for IMAP
Taking ~40s to synchronize a ~75K message IMAP folder is still a lot of time, so support an option to only touch new messages. This is similar to the "offlineimap -q" (quick) or "mbsync --new" switches, but lei already accepts "-q" as a shortcut for --quiet. "--new" could work, but "--new-only" might be more descriptive (or "--only-new"?), since the default also fetches new messages.
v2: warn for non-IMAP sources; I'm not sure it's worth it for Maildir or other sources, yet. It will also make sense for MH and JMAP once we support them.
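In IMAP terms, a "new messages only" pass usually means remembering the highest UID seen (valid only while UIDVALIDITY is unchanged) and requesting "last+1:*". The sketch below shows that general technique; it is not lei's implementation, and the helper names and the (uidvalidity, last_uid) cache are hypothetical. Per RFC 3501, a range like "n:*" always includes the mailbox's highest UID even when n exceeds it, so results still need filtering.

```python
from typing import Iterable, List, Optional, Tuple


def resume_point(saved: Optional[Tuple[int, int]], uidvalidity: int) -> Optional[int]:
    """Return the last UID seen, or None if a full sync is needed.

    saved is a hypothetical (uidvalidity, last_uid) pair from a local cache;
    a changed UIDVALIDITY means previously seen UIDs are meaningless.
    """
    if saved is None or saved[0] != uidvalidity:
        return None
    return saved[1]


def uid_fetch_range(last_uid: Optional[int]) -> str:
    """UID set for a UID FETCH/SEARCH: everything, or only newer UIDs."""
    return "1:*" if last_uid is None else f"{last_uid + 1}:*"


def only_new(uids: Iterable[int], last_uid: Optional[int]) -> List[int]:
    """Filter server results: "n:*" still matches the highest UID even
    when n is larger than every UID in the mailbox."""
    if last_uid is None:
        return sorted(uids)
    return sorted(u for u in uids if u > last_uid)


last = resume_point((1234, 41), uidvalidity=1234)
print(uid_fetch_range(last))  # 42:*
print(only_new([41], last))   # [] (no new mail; the server echoed UID 41)
```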
2021-06-09 lei prune-mail-sync: new command to prune invalid sync data
This will be invoked automatically by "lei import" eventually, but it may make sense to expose as a separate command.
2021-06-09 lei_mail_sync: hoist out --all handling from export-kw
We'll be reusing it in other commands, too.
2021-06-09 lei tag: parallelize Maildir access
Since Maildir isn't guaranteed to have any sort of order, we can parallelize inputs here. On a 4-core system, this reduced one of my tag invocations from 5.5s to 1.4s.
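Because Maildir has no inherent order, its files can be fanned out to a worker pool and the results consumed in any order. lei spreads this across its WQ workers; the Python sketch below uses a thread pool only to stay short and self-contained (for CPU-heavy work in CPython, a ProcessPoolExecutor would be the closer analogue). The helper names are hypothetical.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterator, List, Optional, TypeVar

T = TypeVar("T")


def maildir_files(path: str) -> Iterator[str]:
    """Yield message paths under new/ and cur/ (no particular order)."""
    for sub in ("new", "cur"):
        d = os.path.join(path, sub)
        if not os.path.isdir(d):
            continue
        for name in os.listdir(d):
            if not name.startswith("."):  # skip dotfiles
                yield os.path.join(d, name)


def parallel_each(path: str, work: Callable[[str], T], jobs: Optional[int] = None) -> List[T]:
    """Run work() on every message file using a pool of jobs workers.

    Reasonable only because Maildir imposes no ordering on its messages.
    """
    jobs = jobs or os.cpu_count() or 1
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        return list(pool.map(work, maildir_files(path)))


with tempfile.TemporaryDirectory() as md:
    for sub in ("cur", "new", "tmp"):
        os.mkdir(os.path.join(md, sub))
    for i in range(3):
        with open(os.path.join(md, "new", f"msg{i}"), "w") as fh:
            fh.write("x" * (i + 1))
    print(sorted(parallel_each(md, os.path.getsize, jobs=2)))  # [1, 2, 3]
```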
2021-06-09 mdir_reader: maildir_each_file: pass flags, skip Trash
This is a slight behavior change for "lei q": Trashed (but not-yet-expunged) messages no longer get unlinked when --output is used without --augment.
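Maildir stores flags in the filename after ":2,", using single letters such as S (seen), R (replied), F (flagged) and T (trashed). The sketch below extracts those flags and skips trashed entries. It illustrates the convention, not PublicInbox::MdirReader itself; both helper names are hypothetical.

```python
import os
from typing import Iterable, Iterator, Tuple


def maildir_flags(filename: str) -> str:
    """Return the flag letters after ":2," in a Maildir filename, or ""."""
    base = os.path.basename(filename)
    _, sep, flags = base.rpartition(":2,")
    return flags if sep else ""


def each_untrashed(names: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (name, flags), skipping messages carrying the T (Trashed) flag."""
    for name in names:
        flags = maildir_flags(name)
        if "T" in flags:
            continue
        yield name, flags


names = ["cur/1.a.host:2,S", "cur/2.b.host:2,ST", "new/3.c.host"]
print(list(each_untrashed(names)))
# [('cur/1.a.host:2,S', 'S'), ('new/3.c.host', '')]
```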
2021-06-09 inbox_writable: fix import_maildir
I'm not sure if anybody uses this, but it exists. It'll likely be dropped in the future.
Fixes: fa3f0cbcd1af5008 ("use MdirReader in -watch and InboxWritable")
2021-06-09 lei/store: do eidx_init before creating R/W lms dbh
Sharing lms->{dbh} with eidx shards appears to be the cause of the "Issuing rollback() due to DESTROY without explicit disconnect() of DBD::SQLite::db handle" messages I've been seeing from "lei up".
2021-06-09 lei edit-search: fix and add a (weak) test
This broke recently and lacked an automated test, so rely on EDITOR=cat to ensure we have some coverage.
Fixes: d2670108f71b1eff ("pkt_op: make pkt_do an OO method")
2021-06-09 lei pmdir: fix nproc for <= 4 CPUs
I forgot my FreeBSD VM actually has 8 cores, and tweaked the nproc detection on that machine before finalizing commit 10b523eb017162240b1ac3647f8dcbbf2be348a7 ("lei import: speed up repeated Maildir imports").
Fixes: 10b523eb01716224 ("lei import: speed up repeated Maildir imports")
2021-06-08 lei import: speed up repeated Maildir imports
On a 4-core CPU, this speeds up "lei import" on a largish Maildir inbox with 75K messages from ~8 minutes down to ~40s. Parallelizing alone did not bring any improvement and may even hurt performance slightly, depending on CPU availability. However, creating the index on the "fid" and "name" columns in blob2name yields the same speedup we got. Parallelizing IMAP makes more sense because most IMAP stores are non-local and subject to network latency.
Followup-to: bdecd7ed8e0dcf0b45491b947cd737ba8cfe38a3 ("lei import: speed up kw updates for old IMAP messages")
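The effect of a composite index on (fid, name) can be checked with SQLite's EXPLAIN QUERY PLAN: without it, each lookup scans the whole table; with it, SQLite searches the index. The schema below is a hypothetical, simplified stand-in (only the fid and name columns come from the text), and idx_fid_name is an invented index name.

```python
import sqlite3


def query_plan(db: sqlite3.Connection, sql: str, params=()) -> str:
    """Return SQLite's EXPLAIN QUERY PLAN detail text for sql."""
    rows = db.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(str(row[-1]) for row in rows)  # detail is the last column


db = sqlite3.connect(":memory:")
# Hypothetical, simplified schema; only fid and name come from the text.
db.execute("CREATE TABLE blob2name (oidbin BLOB NOT NULL, fid INTEGER NOT NULL, name TEXT NOT NULL)")
LOOKUP = "SELECT oidbin FROM blob2name WHERE fid = ? AND name = ?"

before = query_plan(db, LOOKUP, (1, "1622400000.123.host:2,S"))
db.execute("CREATE INDEX IF NOT EXISTS idx_fid_name ON blob2name(fid, name)")
after = query_plan(db, LOOKUP, (1, "1622400000.123.host:2,S"))

print(before)  # a SCAN of the whole table
print(after)   # a SEARCH using idx_fid_name
```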
2021-06-08 lei: generalize auxiliary WQ handling
op_wait_event is now more lei-specific since we no longer have to care about oneshot and use a synchronous loop. {ikw} (import-keywords) started a trend, but LeiPmdir (parallel Maildir) is an upcoming WQ class that will follow this idea. Eventually, {l2m} usage may be updated to follow this, too.
2021-06-08 lei: safety fix for multiple WQ classes
For commands utilizing multiple workers, this simple change generalizes the persistence mechanism and prevents lei->dclose from causing script/lei to exit if there are still in-flight workers. This ought to prevent read-after-write consistency problems that occasionally manifest in scripts (e.g. test cases) but usually go unnoticed in normal use.
2021-06-08 lei/store: checkpoint commits mail_sync.sqlite3
We mainly rely on ->done with lei/store, but moving to ->checkpoint probably makes sense. Note: over, msgmap, and mail_sync all have slightly different transaction behavior; perhaps they can be unified in the future.
2021-06-06 lei: don't drop WQ workers on normal exit
This is dangerous and causes race conditions on commands which utilize multiple workqueues.
2021-06-06 INSTALL: note about lei metadata storage
Since lei is for personal mailboxes, I don't think lei needs to keep keyword and label changes in history. And fix a minor wording problem ("or" => "nor") while we're at it.
2021-06-04 pkt_op: make pkt_do an OO method
This will make it easier to use internally, e.g. for managing Maildir and IMAP IDLE watches.
2021-06-03 pkt_op: remove blocking I/O support
Since lei-daemon is guaranteed to be running, there's no need to keep blocking I/O support around (and we can get it back via git if we need it).
Followup-to: 1d6e1f9a6a66a42d ("lei: require Socket::MsgHdr or Inline::C, drop oneshot")
2021-06-03 lei import: speed up kw updates for old IMAP messages
On a 4-core CPU, this speeds up "lei import" on a largish IMAP inbox with 75K messages from ~21 minutes down to 40s. Parallelizing with the new LeiImportKw WQ worker class gives a near-linear speedup and brought the runtime down to ~5:40. The new idx_fid_uid index on the "fid" and "uid" columns of blob2num in mail_sync.sqlite3 brought us the final speedup. An additional index on over.sqlite3#xref3(oidbin) did not help, since idx_nntp already exists and speeds up the new ->oidbin_exists internal API. I initially experimented with a separate "lei import-kw" command but decided against it, since it's useless outside of IMAP+JMAP and would require extra cognitive overhead for both users and hackers. So LeiImportKw is just a WQ worker used by "lei import" and not its own user-visible command.
v2: fix ikw_done_wait arg handling (ugh, confusing API :x)
2021-06-02 lei export-kw: do not write directly to mail_sync.sqlite3
Only the lei/store process should be writing to files/DBs in lei/store.
2021-06-02 lei: remove "forget" (old name for "rm")
"rm" is probably the better name for it, since it matches "public-inbox-learn rm".
2021-06-01 lei_mail_sync: more debug info for uncommitted txn
I'm not actually sure if I hit an uncommitted transaction just now; it doesn't seem like it.