Date Commit message
2021-01-01 update copyrights for 2021
Using "make update-copyrights" after setting GNULIB_PATH in my config.mak
2021-01-01 Makefile.PL: add update-copyrights target
It might save me a few cycles every year to not have to scroll through git history to see how it's run.
2021-01-01 on_destroy: support PID owner guard
Since we'll be forking for Xapian indexing and maybe other places, having a simple guard in place to ensure OnDestroy doesn't unexpectedly unlink files or similar is a safer option.
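The guard described above can be sketched in a few lines. This is only an illustration in Python, not the PublicInbox::OnDestroy code itself; the class layout, the `owner_pid` parameter and the `demo` helper are assumptions made for the example.

```python
import os


class OnDestroy:
    """Run a cleanup callback at most once, and only in the owning process."""

    def __init__(self, callback, owner_pid=None):
        self.callback = callback
        # Hypothetical parameter: defaults to the PID that created the guard.
        self.owner_pid = os.getpid() if owner_pid is None else owner_pid

    def fire(self):
        """Invoke the callback unless we are running in a forked child."""
        callback, self.callback = self.callback, None
        if callback is None or os.getpid() != self.owner_pid:
            return False
        callback()
        return True

    def __del__(self):
        self.fire()


def demo(path):
    """Fork once; the child must not unlink the file, the parent may."""
    guard = OnDestroy(lambda: os.unlink(path))
    pid = os.fork()
    if pid == 0:
        guard.fire()  # no-op: the child does not own the guard
        os._exit(0)
    os.waitpid(pid, 0)
    survived_child = os.path.exists(path)
    guard.fire()  # the owner performs the cleanup
    return survived_child, os.path.exists(path)
```

Calling `demo()` on a scratch file shows the file surviving the child's attempt and disappearing only when the owning process fires the guard.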
2021-01-01 ds: clobber $in_loop first at reset
This may help ensure DESTROY callbacks will see in_loop before the others.
2021-01-01 avoid calling waitpid from children in DESTROY
Objects with DESTROY callbacks get propagated to children, so we must be careful to not invoke waitpid from children on their sibling processes. Only parents (and their parents...) can reap child processes.
2021-01-01 lei: avoid Spawn package when starting daemon
Spawn was designed to speed up process spawning inside long-lived daemons with largish memory usage. It does not help for short-lived scripts which only exist to start and connect to a daemon. This change actually speeds up initial lei startup from ~190ms to ~140ms(!). Normal usage once the daemon is running is unaffected, at <20ms for help text. While we're in the area, simplify Cwd error message generation, too.
2021-01-01 syscall: SFD_NONBLOCK can be a constant, again
Since Perl exposes O_NONBLOCK as a constant, we can safely make SFD_NONBLOCK a constant, too. This is not the case for SFD_CLOEXEC, since O_CLOEXEC is not exposed by Perl despite being used internally in the interpreter.
2021-01-01 use PublicInbox::DS for dwaitpid
This simplifies our code and provides a more consistent API for error handling. PublicInbox::DS can be loaded nowadays on all *BSDs and Linux distros easily without extra packages to install. The downside is possibly increased startup time, but it's probably not as big a problem with lei being a daemon (and -mda possibly following suit).
2021-01-01 t/ipc.t: test for references via `die'
We'll probably start using references as exceptions in some places for more exact matching.
2021-01-01 searchidxshard: call DS->Reset at worker start
The daemon for the local email interface will be inside the DS->EventLoop. -watch currently doesn't trigger this bug since it doesn't enable parallelism, but it may in the future.
2021-01-01 lei_to_mail: open FIFOs O_WRONLY so we block
Opening a FIFO with O_RDWR always succeeds on Linux, which causes the cat(1) process invoked by t/lei_to_mail.t to get stuck. Furthermore, O_APPEND makes no sense on FIFOs and perhaps there's some kernel out there which will reject it.
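The difference between the two open modes can be observed directly. The following is a small Python sketch, assuming Linux FIFO semantics as described in the message; the function name is made up for the example, and O_NONBLOCK is used only so the write-only open reports the missing reader (ENXIO) instead of blocking.

```python
import errno
import os
import tempfile


def fifo_open_behaviour():
    """Return (rdwr_succeeded, wronly_needs_reader) for a FIFO with no reader."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "fifo")
        os.mkfifo(path)

        # O_RDWR: succeeds immediately on Linux, since we count as our own reader.
        fd = os.open(path, os.O_RDWR)
        os.close(fd)
        rdwr_succeeded = True

        # O_WRONLY: needs a reader on the other end. A blocking open would
        # wait here; with O_NONBLOCK the kernel reports ENXIO instead.
        try:
            fd = os.open(path, os.O_WRONLY | os.O_NONBLOCK)
            os.close(fd)
            wronly_needs_reader = False
        except OSError as e:
            if e.errno != errno.ENXIO:
                raise
            wronly_needs_reader = True
        return rdwr_succeeded, wronly_needs_reader
```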
2021-01-01 gcf2client: reap process on DESTROY
We don't want to leave it for the Xapcmd waitpid(-1, ...) call to hit.
2021-01-01 t/run: avoid uninitialized var on incomplete test
Diagnosing an occasional FIFO failure in t/lei_to_mail.t...
2021-01-01 init: remove embedded UnlinkMe package
PublicInbox::OnDestroy can do the same thing
2021-01-01 spawn: move run_die here from PublicInbox::Import
It seems like a more logical place for it, but we'll favor the newly-added xsys_e() in tests for BAIL_OUT use.
2021-01-01 lei: add --mfolder as an --output alias
This will be helpful for mairix users.
2021-01-01 lei_to_mail: unlink mboxes if not augmenting
This matches mairix(1) behavior and may be safer if there are concurrent readers on the existing mbox, especially since we don't currently implement mbox locking (nor does mairix).
2021-01-01 ipc: use shutdown(2), base atfork* callback
shutdown(2) on a socket can be preferable if there are multiple forked processes writing to a single worker and we really want to shut things down ASAP. It may also be good to provide an ipc_worker_exit method which subclasses can override if needed for graceful shutdown. But we won't need equivalents to atexit(3) since we can rely on DESTROY handlers given this is Perl5.
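Why shutdown(2) helps with multiple forked writers can be shown with a socketpair: close(2) in one process does not deliver EOF while another process still holds a copy of the descriptor, whereas shutdown(2) acts on the socket itself. This is a Python sketch of that general behaviour, not the PublicInbox::IPC code; `demo_shutdown` and the synchronization pipe are assumptions for the example.

```python
import os
import socket


def demo_shutdown():
    """Reader sees EOF after shutdown() even though a child keeps the writer open."""
    reader, writer = socket.socketpair()
    sync_r, sync_w = os.pipe()  # keeps the child alive until the parent is done
    pid = os.fork()
    if pid == 0:
        # The child inherits `writer` and deliberately leaves it open.
        os.close(sync_w)
        reader.close()
        os.read(sync_r, 1)  # returns once the parent closes sync_w
        os._exit(0)
    os.close(sync_r)

    writer.sendall(b"last message")
    # shutdown(2) affects the socket across every process sharing it.
    writer.shutdown(socket.SHUT_WR)

    chunks = []
    while True:
        buf = reader.recv(4096)
        if not buf:  # EOF, despite the child's open copy of `writer`
            break
        chunks.append(buf)

    os.close(sync_w)
    os.waitpid(pid, 0)
    reader.close()
    writer.close()
    return b"".join(chunks)
```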
2021-01-01 lei_store: handle messages without Message-ID at all
For personal mail, unsent draft messages are a common source of messages without Message-IDs.
2021-01-01 mid: hoist out mids_in sub
We'll be using it for Resent-Message-ID with lei, and possibly other places.
2021-01-01 mid: use defined-or with `push' for uniqueness check
As shown recently in commit a05445fb400108e60ede7d377cf3b26a0392eb24 ("config: config_fh_parse: micro-optimize"), relying on the return value of `push' and the defined-or operator can avoid modifying the hash value scalar with an increment.
2021-01-01 lei: rename "extinbox" => "external"
The words "extinbox" and "extindex" are too close and easy to confuse with the other. Rename "extinbox" to "external", since these could be IMAP, JMAP or other non-public-inbox search APIs. Link: https://public-inbox.org/meta/20201226112649.GB6226@dcvr/
2021-01-01 lei_store: add ->set_eml, ->add_eml can return smsg
Add a ->set_eml method which can be a useful fire-and-forget way of either adding new files to store OR setting keywords on them. When seeing brand-new messages, add_eml can afford to return more information in the smsg instead of just the OID.
2021-01-01 ipc: support Sereal
Some testing will be needed to see if it's worth the code and maintenance overhead, but it seems easy-enough to get working.
2021-01-01 ipc: generic IPC dispatch based on Storable
I intend to use this with LeiStore when importing from multiple slow sources at once (e.g. curl, IMAP, etc). This is because over.sqlite3 can only have a single writer, and we'll have several slow readers running in parallel. Watch and SearchIdxShard should also be able to use this code in the future, but this will be proven with LeiStore, first.
2021-01-01 lei_to_mail: support Maildir, fix+test --augment
Maildir should be plenty fine for short-lived output folders.
2021-01-01 lei_to_mail: support for non-seekable outputs
Users may wish to pipe output to "git am", "spamc", or similar, so we need to support those cases and not bail out on lseek(2) or ftruncate(2) failures.
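Tolerating pipes usually comes down to treating ESPIPE from lseek(2) as "this is a stream" rather than as a fatal error. Below is a minimal Python sketch of that idea; the helper names and the truncate-if-seekable policy are assumptions for illustration and are not taken from LeiToMail.

```python
import errno
import os


def is_seekable(fd):
    """Return True if lseek(2) works on fd, False for pipes, FIFOs and sockets."""
    try:
        os.lseek(fd, 0, os.SEEK_CUR)
        return True
    except OSError as e:
        if e.errno == errno.ESPIPE:
            return False
        raise  # any other failure is a real error


def prepare_output(fd):
    """Truncate a seekable output; leave a pipe alone instead of bailing out."""
    if is_seekable(fd):
        os.ftruncate(fd, 0)
        os.lseek(fd, 0, os.SEEK_SET)
        return "truncated"
    return "streaming"
```

A regular file gets truncated and rewound, while the write end of a pipe is simply written to as-is.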
2021-01-01 lei_to_mail: lazy-require LeiDedupe
LeiDedupe requires SQLite, so we may want to be able to test writing mail without DBI or SQLite down the line.
2021-01-01 lei: implement various deduplication strategies
For writing mboxes and Maildirs, users may wish to use stricter or looser deduplication strategies. This gives them more control.
2021-01-01 lei_to_mail: start --augment, dedupe, bz2 and xz
--augment will match the mairix(1) option of the same name to augment existing search results. We'll need to implement deduplication for a better user experience. mutt ships with compressed mbox support for bz2 and xz, at least, so we'll support those out-of-the-box.
2021-01-01 mboxreader: new class for reading various mbox formats
This is only lightly-tested against stuff LeiToMail generates and will need real-world tests to validate.
2021-01-01 lei_to_mail: start atomic and compressed mbox writing
We'll allow using multiple workers to write to a single mbox (which could be compressed). This can be done safely with O_APPEND + syswrite for uncompressed files, and using a lock when piping to pigz/gzip/bzip2/xz.
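The uncompressed case can be sketched as follows: every worker shares one O_APPEND descriptor and emits each message with a single write(2), so the kernel picks the end-of-file offset at write time and records from different workers do not overwrite each other. This is a Python illustration of the general technique on a local filesystem, not the LeiToMail implementation; `run_workers` and the record format are made up for the example.

```python
import os


def run_workers(path, nworkers=4, per_worker=50):
    """Fork workers which all append whole records to one shared file."""
    fd = os.open(path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o644)
    pids = []
    for w in range(nworkers):
        pid = os.fork()
        if pid == 0:
            status = 0
            try:
                for i in range(per_worker):
                    record = b"worker=%d msg=%d\n" % (w, i)
                    # One write(2) per record: never split a message
                    # across several calls.
                    if os.write(fd, record) != len(record):
                        status = 1
                        break
            except OSError:
                status = 1
            os._exit(status)  # skip the parent's cleanup handlers
        pids.append(pid)
    os.close(fd)
    ok = True
    for pid in pids:
        _, wstatus = os.waitpid(pid, 0)
        ok = ok and wstatus == 0
    return ok
```

Compressed output cannot rely on this, since the compressor reads from a pipe; that is where the lock mentioned above comes in.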
2021-01-01 sharedkv: split out index_values
In most cases, we won't need to index by value, so don't waste cycles or space on it.
2021-01-01 sharedkv: fork()-friendly key-value store
This is intended for maintaining Maildir states and mbox message deduplication, but may be useful for other purposes...
2021-01-01 lei_to_mail: initial implementation for writing mbox formats
No Maildir support yet, but it'll come.
2021-01-01 revert "lei_store: use per-machine refname as git HEAD"
In retrospect, per-machine HEADs were a bad idea because users of removable storage would be thrown off when moving storage between different machines. This is only a partial revert; the Import::init_bare change to support alternate head names still exists because we may use it for other reasons.
2021-01-01 lei_store: use per-machine refname as git HEAD
It may be helpful to identify the source of messages and perhaps avoid conflicting history. On the other hand, this may be a terrible idea for users who move portable storage (e.g. USB sticks) across computers...
2021-01-01 import: respect init.defaultBranch
This matches git v2.28.0+ behavior in case users prefer a different name.
2021-01-01 Merge remote-tracking branch 'origin/lei' into eidx
* origin/lei: (28 commits)
  lei: rename proposed "query" command to "q", add JSON output
  lei_xsearch: cross-(inbox|extindex) search
  lei: extinbox: start implementing in config file
  lei: revise output routines
  lei: support for -$DIGIT and -$SIG CLI switches
  build: add lei.sh + "make symlink-install" target
  lei: start working on bash completion
  lei: drop $SIG{__DIE__}, add oneshot fallbacks
  lei: restore default __DIE__ handler for event loop
  on_destroy: generic localized END
  lei_store: keyword extraction from mbox and Maildir
  lei_store: relax GIT_COMMITTER_IDENT check
  lei: micro-optimize startup time
  lei: rename $client => $self and bless
  lei: help: show actual paths being operated on
  lei: support pass-through for `lei config'
  rename LeiDaemon package to PublicInbox::LEI
  search: simplify initialization, add ->xdb_shards_flat
  lei_store: simplify git_epoch_max, slightly
  lei: support `daemon-env' for modifying long-lived env
  ...
2021-01-01 Merge tag 'v1.6.1' into eidx
public-inbox 1.6.1 - minor bugfix release
* tag 'v1.6.1': (31 commits)
  public-inbox 1.6.1 - minor bugfix release
  import: drop X-Status in addition to Status
  eml: fix undefined vars on <Perl 5.28
  t/config: test --get-urlmatch for git <2.26
  inboxidle: avoid needless syscalls on refresh
  inboxidle: clue users into resolving ENOSPC from inotify
  inbox: name variable for values loop iterator
  public-inbox-v[12]-format.pod: make lexgrog happy
  manifest.js.gz: fix per-inbox /$INBOX/manifest.js.gz
  Fix manpage section of perl module documentation
  t/psgi_v2: ignore warnings on missing P::M::ReverseProxy
  daemon: support --daemonize without Net::Server::Daemonize
  doc: v2-format: drop repeated word
  over: ensure old, merged {tid} is really gone
  wwwattach: prevent deep-linking via Referer match
  t/eml.t: workaround newer Email::MIME* behavior
  nntp: attempt RFC 5536 3.1.5-conformant Path: headers
  nntp: delimit Newsgroup: header with commas
  tls: epollbit: account for miscellaneous OpenSSL errors
  scripts/dupe-finder: restore $dbh variable
  ...
2020-12-31 public-inbox 1.6.1 - minor bugfix release (tag: v1.6.1)
2020-12-31 Merge remote-tracking branch 'origin/master' into lorelei
* origin/master: (58 commits)
  ds: flatten + reuse @events, epoll_wait style fixes
  ds: simplify EventLoop implementation
  check defined return value for localized slurp errors
  import: check for git->qx errors, clearer return values
  git: qx: avoid extra "local" for scalar context case
  search: remove {mset} option for ->mset method
  search: remove pointless {relevance} setting
  miscsearch: take reopen from Search and use it
  extsearch: unconditionally reopen on access
  extindex: allow using --all without EXTINDEX_DIR
  extindex: add undocumented --no-scan switch
  extindex: enable autoflush on STDOUT/STDERR
  extindex: various --watch signal handling fixes
  extindex: --watch for inotify-based updates
  eml: fix undefined vars on <Perl 5.28
  t/config: test --get-urlmatch for git <2.26
  default to CORE::warn in $SIG{__WARN__} handlers
  inbox: name variable for values loop iterator
  inboxidle: avoid needless syscalls on refresh
  inboxidle: clue users into resolving ENOSPC from inotify
  ...
2020-12-31 lei: rename proposed "query" command to "q", add JSON output
Using "query" as a verb may be confusing when we'll also refer to them as nouns with the "<ls|rm|mv>-query" sub commands. "query" is also many characters to type without tab-completion on what I expect to be one of the most commonly used sub-commands. Furthermore, "q" is also the common query parameter name used by our PSGI interface, as is the case with several major web search engines; so there's an element of familiarity there. The name "search" was disregarded because "show" could be a commonly used lei sub-command, too, and typing "se" for tab-completion may be slow since two-handed typists on QWERTY keyboards won't be able to use alternating hands. "f" or "find" could be a possibility here, too; but we're currently using the term "forget" as a weaker version of "remove" or "rm", though "ignore" could be substituted for "forget", perhaps... Kyle Meyer noted the lack of (proposed) JSON output support so that's been added to the proposed UI.
2020-12-31 lei_xsearch: cross-(inbox|extindex) search
While a single extindex combines multiple inboxes into a single search index, extindex still requires up-front indexing on items which can be searched. XSearch has no on-disk footprint itself and uses Xapian DBs of existing publicinbox and extindex ("extinbox") exclusively. XSearch still suffers from the multi-shard Xapian scalability problems which led to the creation of extindex, but I expect the number of shards to remain relatively low. I envision users hosting public-inbox instances on their workstations will only have two extindex combined by this, one read-only extindex for serving public archives, and one read-write extindex managed by LeiStore for private mail.
2020-12-31 import: drop X-Status in addition to Status
It's actually supported by mutt, dovecot[1], and likely some other software to augment the Status: header. While dovecot doesn't expose X-Status to clients, mutt will write 'A' (answered) and 'F' to X-Status (but not T (draft)). So we'll drop it like we do Status since it's not suitable for public mail, but sticking it in an @UNWANTED_HEADERS array will allow us to configure an override if needed. [1] https://doc.dovecot.org/configuration_manual/mail_location/mbox/
2020-12-28 ds: flatten + reuse @events, epoll_wait style fixes
Consistently returning the equivalent of pollfd.revents in a portable manner was never worth the effort for us, as we use the same ->event_step callback regardless of POLLIN/POLLOUT/POLLHUP. This being Perl, @events knows its size and we don't have to return a maximum index for the caller to iterate on. We can also avoid redundant integer coercion ("+0") since we ensure everything is an IV in other places. Finally, vec() is preferable to ("\0" x $size) for resizing buffers because it only needs to write the extended portion and not overwrite the entire buffer.
2020-12-28 ds: simplify EventLoop implementation
More importantly, make it easier to find the sub by avoiding runtime manipulation of subroutine names. There's no point in avoiding a potential call to _InitPoller in EventLoop since entering EventLoop is rare. On the contrary, PublicInbox::DS->new is called often and this change to avoid entering _InitPoller there may have more benefits (which may still be unmeasurable).
2020-12-28 check defined return value for localized slurp errors
Reading from regular files (even on STDIN) can fail when dealing with flakey storage.
2020-12-28 import: check for git->qx errors, clearer return values
Those git commands can fail and git->qx will set $? when it fails. There's no need for the extra indirection of the @ret array, either. Improve git->qx coverage to check for $? while we're at it.
2020-12-28 git: qx: avoid extra "local" for scalar context case
We can use the ternary operator to avoid an early return here.