about summary refs log tree commit homepage
path: root/MANIFEST
DateCommit message (Collapse)
2021-02-11doc: add lei-import(1)
2021-02-10net_reader: new package split from -watch
We'll be using some of this for IMAP and NNTP support in lei, too. More will need to be done to improve code sharing and reusability, soon, but this is a start.
2021-02-10use MdirReader in -watch and InboxWritable
MdirReader now handles files in "$MAILDIR/new" properly and is stricter about what it accepts. eml_from_path is also made robust against FIFOs while eliminating TOCTOU races with between stat(2) and open(2) calls.
2021-02-10lei: split out MdirReader package, lazy-require earlier
We'll do more requires in the top-level lei-daemon process to save work in workers. We can also work towards aborting on user errors in lei-daemon rather than worker processes. "lei import -f mbox*" is finally tested inside t/lei_to_mail.t
2021-02-08lei import: support Maildirs
It seems to be working trivially, though I'm probably going to split out Maildir reading into a separate package rather than using LeiToMail.
2021-02-07lei help: split out into separate file
We'll reword and improve formatting with non-breaking spaces ("\xa0") which is only replaced with SP after wrapping. Some terminology is shortened (e.g. "URL_OR_PATHNAME" => "LOCATION") to improve formatting. This also enables completion for -h/--help and lets us prioritize favored switch names while attempting to satisfy users relying on muscle memory from other tools.
2021-02-07lei: add-external --mirror support
This can be useful for users who want to clone and mirror an existing public-inbox. This doesn't have update support, yet, so users will need to run "git fetch && public-inbox-index" for now.
2021-02-07tests: split out lei-daemon.t from lei.t
This makes it easier for hackers to find daemon-specific tests and forces us to always test both daemon and oneshot mode.
2021-02-07t/tests: split out setup_public_inboxes sub
We'll probably use this in many more existing places and likely change non-lei tests to use it.
2021-02-07t/lei-externals: split out into separate test
This is still overloaded with "lei q" stuff, but that's somewhat inevitable.
2021-02-07tests: add test_lei wrapper, split out t/lei-import.t
This will make it easier to maintain and test lei going forward, we need to be testing against existing read-only daemons. We'll also save ourselves some boilerplate by exporting all the Test::More methods directly in TestCommon We'll start using this by splitting out the latest "lei import" tests into its own file.
2021-02-05lei import: initial implementation
Only tested with .eml files so far, but Maildir + IMAP will be supported.
2021-02-04lei q: support reading queries from stdin
This will be useful on shared machines when a user doesn't want search queries visible to other users looking at the ps(1) output or similar.
2021-02-03lei: switch to use SEQPACKET socketpair instead of pipe
This will allow us to use larger messages and do progress reporting to accumulate in the main daemon.
2021-02-01sharedkv: use lock_for_scope_fast
This allows us to avoid repeated open() and close() syscalls and speeds up the new xt/stress-sharedkv.t maintainer test by roughly 7%.
2021-02-01ipc: switch wq to use the event loop
This will let us to maximize the capability of our asynchronous git API. This lets us avoid relying on EOF to notify lei2mail workers; thus giving us the option of running fewer lei_xsearch worker processes in parallel than local sources. I tried using a synchronous git API; and even with libgit2 in the same process to avoid the IPC cost failed to match the throughput afforded by this change. This is because libgit2 is built (at least on Debian) with the SHA-1 collision code enabled and ubc_check stuff was dominating my profiles.
2021-02-01doc: add lei-overview(7)
[ew: s/mboxrd/mboxcl2/ since that's what mutt uses]
2021-02-01doc: start manpages for lei commands
Add manpages for lei and the currently implemented subcommands. The included options and their descriptions follow to a large degree the --help output, dropping some options that are not currently wired up.
2021-01-26doc: start working on public-inbox-extindex(1) manpage
It's barely started, but I started writing this weeks ago, but I'm still unsure about some behavioral/usability things and hoping work on lei(1) can flush them out.
2021-01-25doc: re-add missing 1.6 release notes
I missed these during the merge :x
2021-01-22lei: forget-external support with canonicalization
For proper matching, we'll do a better job canonicalizing URLs and path names for matching. Of course, users may edit the file outside of lei, so ensure we try both the canonicalized and as-is form provided by the user. I also don't think we'll need to store externals info in MiscIdx; just the config file is fine.
2021-01-18lei: q: results output to Maildir and mbox* working
All the augment and deduplication stuff seems to be working based on unit tests. OpPipe is a nice general addition that will probably make future state machines easier.
2021-01-15lei: q: lock stdout on overview output
Most writes to stdout aren't atomic and we need locking to prevent workers from interleaving and corrupting JSON output. The one case stdout won't require locking is if it's pointed to a regular file with O_APPEND; as POSIX O_APPEND semantics guarantees atomicity.
2021-01-14lei: test SIGPIPE, stop xsearch workers on client abort
The new test ensures consistency between oneshot and client/daemon users. Cancelling an in-progress result now also stops xsearch workers to avoid wasted CPU and I/O. Note the lei->atfork_child_wq usage changes, it is to workaround a bug in Perl 5: http://nntp.perl.org/group/perl.perl5.porters/258784 <CAHhgV8hPbcmkzWizp6Vijw921M5BOXixj4+zTh3nRS9vRBYk8w@mail.gmail.com> This switches the internal protocol to use SOCK_SEQPACKET AF_UNIX sockets to prevent merging messages from the daemon to client to run pager and kill/exit the client script.
2021-01-12lei: query: restore JSON output overview
This internal API is better suited for fork-friendliness (but locking + dedupe still needs to be re-added). Normal "json" is the default, though stream-friendly "concatjson" and "jsonl" (AKA "ndjson" AKA "ldjson") all seem working (though tests aren't working, yet). For normal "json", the biggest downside is the necessity of a trailing "null" element at the end of the array because of parallel processes, since (AFAIK) regular JSON doesn't allow trailing commas, unlike JavaScript.
2021-01-12lei_xsearch: transfer 4 FDs internally, drop IO::FDPass
It's easier to make the code more generic by transferring all four FDs (std(in|out|err) + socket) instead of omitting stdin. We'll be reading from stdin on some imports, and possibly outputting to stdout, so omitting stdin now would needlessly complicate things. The differences with IO::FDPass "1" code paths and the "4" code paths used by Inline::C and Socket::MsgHdr are far too much to support and test at the moment.
2021-01-12cmd_ipc: send FDs with buffer payload
For another step in in syscall reduction, we'll support transferring 3 FDs and a buffer with a single sendmsg/recvmsg syscall using Socket::MsgHdr if available. Beyond script/lei itself, this will be used for internal IPC between search backends (perhaps with SOCK_SEQPACKET). There's a chance this could make it to the public-facing daemons, too. This adds an optional dependency on the Socket::MsgHdr package, available as libsocket-msghdr-perl on Debian-based distros (but not CentOS 7.x and FreeBSD 11.x, at least). Our Inline::C version in PublicInbox::Spawn remains the last choice for script/lei due to the high startup time, and IO::FDPass remains supported for non-Debian distros. Since the socket name prefix changes from 3 to 4, we'll also take this opportunity to make the argv+env buffer transfer less error-prone by relying on argc instead of designated delimiters.
2021-01-12lei query + pagination sorta working
Parallelism and interactivity with pager + SIGPIPE needs work; but results are shown and phrase search works without shell users having to apply Xapian quoting rules on top of standard shell quoting.
2021-01-01lei: rename "extinbox" => "external"
The words "extinbox" and "extindex" are too close and easy to confuse with the other. Rename "extinbox" to "external", since these could be IMAP, JMAP or other non-public-inbox search APIs. Link: https://public-inbox.org/meta/20201226112649.GB6226@dcvr/
2021-01-01ipc: generic IPC dispatch based on Storable
I intend to use this with LeiStore when importing from multiple slow sources at once (e.g. curl, IMAP, etc). This is because over.sqlite3 can only have a single writer, and we'll have several slow readers running in parallel. Watch and SearchIdxShard should also be able to use this code in the future, but this will be proven with LeiStore, first.
2021-01-01lei: implement various deduplication strategies
For writing mboxes and Maildirs, users may wish to use stricter or looser deduplication strategies. This gives them more control.
2021-01-01mboxreader: new class for reading various mbox formats
This is only lightly-tested against stuff LeiToMail generates and will need real-world tests to validate.
2021-01-01sharedkv: fork()-friendly key-value store
This is intended for maintaining Maildir states, mbox message deduplication, but may be useful for other purposes...
2021-01-01lei_to_mail: initial implementation for writing mbox formats
No Maildir, support, yet, but it'll come.
2020-12-31public-inbox 1.6.1 - minor bugfix release v1.6.1
2020-12-31Merge remote-tracking branch 'origin/master' into lorelei
* origin/master: (58 commits) ds: flatten + reuse @events, epoll_wait style fixes ds: simplify EventLoop implementation check defined return value for localized slurp errors import: check for git->qx errors, clearer return values git: qx: avoid extra "local" for scalar context case search: remove {mset} option for ->mset method search: remove pointless {relevance} setting miscsearch: take reopen from Search and use it extsearch: unconditionally reopen on access extindex: allow using --all without EXTINDEX_DIR extindex: add undocumented --no-scan switch extindex: enable autoflush on STDOUT/STDERR extindex: various --watch signal handling fixes extindex: --watch for inotify-based updates eml: fix undefined vars on <Perl 5.28 t/config: test --get-urlmatch for git <2.26 default to CORE::warn in $SIG{__WARN__} handlers inbox: name variable for values loop iterator inboxidle: avoid needless syscalls on refresh inboxidle: clue users into resolving ENOSPC from inotify ...
2020-12-31lei_xsearch: cross-(inbox|extindex) search
While a single extindex combines multiple inboxes into a single search index, extindex still requires up-front indexing on items which can be searched. XSearch has no on-disk footprint itself and uses Xapian DBs of existing publicinbox and extindex ("extinbox") exclusively. XSearch still suffers from the multi-shard Xapian scalability problems which led to the creation of extindex, but I expect the number of shards to remain relatively low. I envision users hosting public-inbox instances on their workstations will only have two extindex combined by this, one read-only extindex for serving public archives, and one read-write extindex managed by LeiStore for private mail.
2020-12-26over: ensure old, merged {tid} is really gone
We must use the result of link_refs() since it can trigger merge_threads() and invalidate $old_tid. In case merge_threads() isn't triggered, link_refs() will return $old_tid anyways. When rethreading and allocating new {tid}, we also must update the row where the now-expired {tid} came from to ensure only the new {tid} is seen when reindexing subsequent messages in history. Otherwise, every subsequently reindexed+rethreaded message could end up getting a new {tid}. Reported-by: Kyle Meyer <kyle@kyleam.com> Link: https://public-inbox.org/meta/87360nlc44.fsf@kyleam.com/ (cherry picked from commit 9356ec0cc5afc95a8fd398ddf898942ef0acdb74)
2020-12-26doc: post-1.6 updates, start 1.7
I should've dropped "PENDING" notes before the 1.6 release; they're dropped now, and a note is added to remind my future self to drop them before 1.7. (cherry picked from commit 3b5d3d1910f1db526a488142c01f42db5255ac72)
2020-12-23xt: add create-many-inboxes helper test
I've been using something like this to mock out thousands of inboxes for testing.
2020-12-19lei: extinbox: start implementing in config file
They need to be indexed by MiscIdx, but MiscIdx still needs more work to support faster config loading when dealing with ~100K data sources.
2020-12-19build: add lei.sh + "make symlink-install" target
This could've been done ages ago, but I rarely invoked public-inbox-* commands from an interactive terminal like I would with lei.
2020-12-19lei: start working on bash completion
Much work still needs to be done, but that goes for this entire project :P
2020-12-19on_destroy: generic localized END
This is a localized version of the process-wide END{}, but runs at the end of variable scope. A subroutine ref and arguments may be passed, which allows us to avoid anonymous subs and problems they cause. It's similar to `defer' or `ensure' in other languages; Perl can rely on deterministic destructors due to refcounting.
2020-12-19rename LeiDaemon package to PublicInbox::LEI
"LEI" is an acronym, and ALL CAPS is consistent with existing PublicInbox::{IMAP,HTTP,NNTP,WWW} naming for top-level modules, 3 of 4 old ones which deal directly with sockets and requests.
2020-12-19t/lei-oneshot: standalone oneshot (non-socket) test
We can use the same "local $ENV{FOO}" hack we do with t/nntpd-v2.t to test the oneshot code path without imposing an extra script in the users' $PATH.
2020-12-19lei_store: local storage for Local Email Interface
Still unstable, this builds off the equally unstable extindex :P This will be used for caching/memoization of traditional mail stores (IMAP, Maildir, etc) while providing indexing via Xapian, along with compression, and checksumming from git. Most notably, this adds the ability to add/remove per-message keywords (draft, seen, flagged, answered) as described in the JMAP specification (RFC 8621 section 4.1.1). We'll use `.' (a single period) as an $eidx_key since it's an invalid {inboxdir} or {newsgroup} name.
2020-12-19lei: FD-passing and IPC basics
The start of lei, a Local Email Interface. It'll support a daemon via FD passing to avoid startup time penalties if IO::FDPass is installed, but fall back to a slow one-shot mode if not. Compared to traditional socket daemon, FD passing should allow us to eventually do stuff like run "git show" and still have proper terminal support for pager and color.
2020-12-12doc: add public-inbox-extindex-format(5) manpage
The CLI tool still needs usability work, and "misc" is still in flux, but the core message indexing part is stable (since it's stolen from v2 :P).
2020-12-05isearch: emulate per-inbox search with ->ALL
Using "eidx_key:" boolean prefix to limit results to a given inbox, we can use ->ALL to emulate and replace per-Inbox xap15/[0-9] search indices. With this change, the presence of "extindex.all.topdir" in the $PI_CONFIG will cause the WWW code to use that extindex and ignore per-inbox Xapian DBs in xap15/[0-9]. Unfortunately IMAP search still requires old per-inbox indices, for now. Mapping extindex Xapian docids to per-Inbox UIDs and vice-versa is proving tricky. Fortunately, IMAP search is rarely used and optional. The RFCs don't specify expensive phrase search, either, so `indexlevel=medium' can be used in per-inbox Xapian indices to save space. For primarily WWW (and future JMAP) users; this should result in significant disk space, FD, and page cache footprint savings for large instances with many inboxes and many cross-posted messages.