path: root/lib
Date Commit message
2021-09-21 lei inspect: convert to WQ worker
Xapian and SQLite access can be slow when a DB is large and/or on high-latency storage.
2021-09-20 gcf2: fix loading at runtime
We need to waitpid synchronously on pkg-config to use $?. When loading Gcf2 inside the event loop, the implicit dwaitpid done by PublicInbox::ProcessPipe would not call waitpid in time to zero $?. This was causing one of my -httpd instances to occasionally fall back to git(1) instead of using Gcf2. This was noted in:
Link: https://public-inbox.org/meta/20210914085322.25517-1-e@80x24.org/
2021-09-19 net_reader: NNTP: remove article numbers from mail_sync folders
NNTP article numbers are stored separately from folder names in mail_sync.sqlite3. Recovering from this is optional; the worst case is wasting bandwidth refetching some messages. To (optionally) recover from this, use:

    lei forget-mail-sync $URL_WITH_ARTNUMS

Some articles will be refetched on the next import, but duplicate data won't be indexed in Xapian.
2021-09-19 net_reader: disallow imap.fetchBatchSize=0
A batch size of zero is nonsensical and causes infinite loops.
2021-09-19 lei config --edit: use controlling terminal
As with "lei edit-search", "lei config --edit" may spawn an interactive editor which works best from the terminal running script/lei. So implement LeiConfig as a superclass of LeiEditSearch so the two commands can share the same verification hooks and retry logic.
2021-09-19 net_reader: no STARTTLS for IMAP localhost or onions
At least not by default, to match existing NNTP behavior. Tor .onions are already encrypted, and there's no point in encrypting traffic on localhost outside of testing.
2021-09-19 watch: use net_reader->mic_new wrapper for SOCKS+TLS
This brings -watch up to feature parity with lei with SOCKS support.
2021-09-19 net_reader: fix single NNTP article fetch, test ranges
While NNTP ranges were already working, fetching a single message was broken. We'll also simplify the code a bit and ensure incremental synchronization is ignored when ranges are specified.
2021-09-19 lei ls-mail-source: pretty JSON support
As with other commands, we enable pretty JSON by default if stdout is a terminal or if --pretty is specified. While the ->pretty JSON output has excessive vertical whitespace, too many lines is preferable to having everything on one line.
2021-09-19 lei ls-mail-source: use "high"/"low" for NNTP
"hwm" and "lwm" may not be obvious abbreviations for the (high|low) water mark descriptions used by RFC 3977. "high" and "low" should be obvious to anyone.
2021-09-19 lei: clamp internal worker processes to 4
"All" my CPUs is only 4, but it's probably ridiculous for somebody with a 16-core system to have 16 processes for accessing SQLite DBs. We do the same thing in Pmdir for parallel Maildir access (and V2Writable).
2021-09-19 ipc: drop dynamic WQ process counts
In retrospect, I don't think it's needed; and trying to wire up a user interface for lei to manage process counts doesn't seem worthwhile. It could be resurrected for public-facing daemon use in the future, but that's what version control systems are for. This also lets us automatically avoid setting up broadcast sockets.
Followup-to: 7b7939d47b336fb7 ("lei: lock worker counts")
2021-09-19 lei_xsearch: drop Data::Dumper use
We're not using Data::Dumper for JSON output.
2021-09-19 lei: simplify sto_done_request
With the switch from pipes to sockets for lei-daemon => lei/store IPC, we can send the script/lei client socket to the lei/store process and rely on reference counting in both Perl and the kernel to persist the script/lei.
2021-09-19 lei/store: use SOCK_SEQPACKET rather than pipe
This has several advantages:
* no need to use ipc.lock to protect a pipe for non-atomic writes
* ability to pass FDs. In another commit, this will let us simplify lei->sto_done_request and pass newly-created sockets to lei/store directly.
Disadvantages:
- an extra pipe is required for rare messages over several hundred KB; this is probably a non-issue, though
The performance delta is unknown, but I expect shards (which remain pipes) to be the primary bottleneck IPC-wise for lei/store.
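The two advantages listed above can be illustrated outside of Perl. The following is only a sketch in Python (3.9 or later, on a Unix-like system that supports AF_UNIX SOCK_SEQPACKET), not the lei/store code; the function name and payloads are made up for illustration. It shows that each send arrives as one whole message, and that a file descriptor can ride along with a message.

```python
import os
import socket


def seqpacket_demo():
    """Show message boundaries and FD passing over SOCK_SEQPACKET."""
    a, b = socket.socketpair(socket.AF_UNIX, socket.SOCK_SEQPACKET)
    r, w = os.pipe()  # a throwaway pipe whose read end we hand over
    try:
        # Two sends stay two messages, even with a large recv buffer.
        a.send(b"first")
        a.send(b"second")
        first = b.recv(4096)
        second = b.recv(4096)

        # Attach the pipe's read end to a message (SCM_RIGHTS underneath).
        socket.send_fds(a, [b"fd attached"], [r])
        msg, fds, _flags, _addr = socket.recv_fds(b, 4096, 1)

        # The received descriptor refers to the same pipe.
        os.write(w, b"hello")
        via_passed_fd = os.read(fds[0], 5)
        os.close(fds[0])
        return first, second, msg, via_passed_fd
    finally:
        os.close(r)
        os.close(w)
        a.close()
        b.close()
```

Because every message is delivered whole, no lock is needed to keep concurrent writers from interleaving partial writes, which is the property the first bullet relies on.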
2021-09-19 ipc: allow disabling broadcast for wq_workers
Since some lei worker classes only use a single worker, there's no sense in having broadcast for those cases.
2021-09-19 ipc: wq_do: support synchronous waits and responses
This brings the wq_* SOCK_SEQPACKET API functionality on par with the ipc_do (pipe-based) API.
2021-09-19 net_reader: quote URL properly for Tor .onion hint
The semicolon in ';AUTH=ANONYMOUS' requires quoting in Bourne shell.
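To see why the quoting matters, here is a small sketch using Python's shlex module rather than the Perl code in net_reader. The URL is a hypothetical example; only the ';AUTH=ANONYMOUS' part comes from the commit message.

```python
import shlex

# Hypothetical URL; an unquoted ';' would end the shell command early.
url = "imap://;AUTH=ANONYMOUS@example.onion/INBOX"

# shlex.quote wraps the string in single quotes when it contains
# characters that are special to a Bourne-style shell.
quoted = shlex.quote(url)

# Splitting the way a POSIX shell would gives the URL back as one word.
words = shlex.split("lei-hint " + quoted)
```

Without the quotes, a shell would treat everything after the semicolon as a separate command.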
2021-09-18 lei up: automatically use dt: for remote externals
Since we can't use maxuid for remote externals, automatically maintaining the last time we got results and appending a dt: range to the query will prevent HTTP(S) responses from getting too big. We could be using "rt:", but no stable release of public-inbox supports it, yet, so we'll use dt:, instead.

By default, there's a two day fudge factor to account for MTA downtime and delays, which is hopefully enough. The fudge factor may be changed per-invocation with the --remote-fudge-factor=INTERVAL option.

Since different externals can have different message transport routes, "lastresult" entries are stored on a per-external basis.
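The arithmetic behind the fudge factor can be sketched as follows. This is an illustration in Python, not the lei implementation: the function name is hypothetical, and the open-ended "dt:YYYYMMDDhhmmss.." form is an assumption about how the range might be written.

```python
from datetime import datetime, timedelta, timezone


def dt_range(lastresult_epoch, fudge=timedelta(days=2)):
    """Build an open-ended dt: range starting `fudge` before lastresult.

    lastresult_epoch is seconds since the Epoch for one external;
    the exact query syntax produced here is an assumption.
    """
    start = datetime.fromtimestamp(lastresult_epoch, timezone.utc) - fudge
    return "dt:" + start.strftime("%Y%m%d%H%M%S") + ".."


# 1609632000 is 2021-01-03 00:00:00 UTC, so two days earlier is 2021-01-01.
example = dt_range(1609632000)
```

Each external would keep its own lastresult value, so the range is computed once per external rather than once per query.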
2021-09-18 net_reader: set SO_KEEPALIVE on all Net::NNTP sockets
SO_KEEPALIVE can prevent stuck processes and is safe to enable unconditionally on all TCP sockets (like git, and the rest of public-inbox does). Verified via strace on both NNTP and NNTPS with and without nntp.proxy=socks5h://...
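For readers unfamiliar with the option, this is what enabling it looks like at the socket level. The sketch is Python, not the Net::NNTP change itself, and the helper name is made up.

```python
import socket


def new_keepalive_socket():
    """Create a TCP socket with SO_KEEPALIVE enabled before connecting."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Ask the kernel to probe idle connections so a dead peer is
    # eventually detected instead of leaving the process stuck.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    return sock
```

The same getsockopt call can confirm the setting, much as strace would show the setsockopt call being made.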
2021-09-18 net_reader: support imaps:// w/ socks5h:// proxy
While non-TLS IMAP worked perfectly with IO::Socket::Socks and Mail::IMAPClient, we need to wrap the IO::Socket::Socks object with IO::Socket::SSL before handing it to Mail::IMAPClient.
2021-09-18 net_reader: detect IMAP failures earlier
A Mail::IMAPClient object may be returned even on connection failure, so use IsConnected to check for it. This ensures git-credential will no longer prompt for passwords when there's no connection.
2021-09-18 net_reader: tie SocksDebug to {imap,nntp}.Debug
I think tying IO::Socket::Socks debugging to existing debug switches is enough, and there's no need to introduce a separate socks.Debug parameter.
2021-09-18 ds: support add unique timers
A common pattern we use is to arm a timer once and prevent it from being armed until it fires. We'll be using it more to do polling for saved searches and imports.
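The arm-once pattern described above might look like the following. This is a generic Python sketch with hypothetical names, not the PublicInbox::DS API; time is passed in explicitly so the behavior is easy to follow.

```python
import heapq
import itertools


class UniqueTimers:
    """Timers keyed by name; a key cannot be re-armed until it fires."""

    def __init__(self):
        self._armed = set()            # keys with a pending timer
        self._heap = []                # (deadline, seq, key, callback)
        self._seq = itertools.count()  # tie-breaker for equal deadlines

    def add_unique(self, key, now, delay, callback):
        """Arm a timer for key; return False if one is already pending."""
        if key in self._armed:
            return False
        self._armed.add(key)
        heapq.heappush(self._heap, (now + delay, next(self._seq), key, callback))
        return True

    def run_due(self, now):
        """Fire every timer whose deadline has passed; return the count."""
        fired = 0
        while self._heap and self._heap[0][0] <= now:
            _deadline, _seq, key, callback = heapq.heappop(self._heap)
            self._armed.discard(key)  # disarm first so callback may re-arm
            callback()
            fired += 1
        return fired
```

A poller can call add_unique on every event without worrying about stacking up duplicate timers; only the first call before the timer fires has any effect.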
2021-09-18 lei_mail_sync: set nodatacow on btrfs
As with other SQLite3 databases, copy-on-write with files experiencing random writes leads to write amplification and low performance.
2021-09-18 lei_mail_sync: rely on flock(2), avoid IPC
Since 44917fdd24a8bec1 ("lei_mail_sync: do not use transactions"), relying on lei/store to serialize access was a pointless endeavor. Rely on flock(2) to serialize multiple writers since (in my experience) it's the easiest way to deal with parallel writers when using SQLite. This allows us to simplify existing callers while speeding up 'lei refresh-mail-sync --all=local' by 5% or so.
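Serializing SQLite writers with flock(2) can be sketched like this. It is a Python illustration under assumptions, not the lei_mail_sync code: the lock file name, table, and helper are all hypothetical.

```python
import fcntl
import os
import sqlite3
import tempfile


def locked_write(db_path, lock_path, sql, params=()):
    """Run one write statement while holding an exclusive flock."""
    with open(lock_path, "a") as lock_fh:
        fcntl.flock(lock_fh, fcntl.LOCK_EX)  # blocks until other writers finish
        try:
            dbh = sqlite3.connect(db_path)
            try:
                with dbh:  # commits on success, rolls back on error
                    dbh.execute(sql, params)
            finally:
                dbh.close()
        finally:
            fcntl.flock(lock_fh, fcntl.LOCK_UN)


# Demo against a throwaway database; file and table names are made up.
tmpdir = tempfile.mkdtemp()
db_path = os.path.join(tmpdir, "demo.sqlite3")
lock_path = db_path + ".flock"
locked_write(db_path, lock_path, "CREATE TABLE IF NOT EXISTS folders (name TEXT)")
locked_write(db_path, lock_path, "INSERT INTO folders (name) VALUES (?)", ("demo",))
```

Each process takes the lock itself, so no central process has to relay writes on behalf of the others.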
2021-09-18 lei: lock worker counts
It doesn't seem worthwhile to change worker counts dynamically on a per-command-basis with lei, and I don't know how such an interface would even work...
2021-09-17 git_http_backend: forward HTTP_GIT_PROTOCOL in request headers
It looks like git-http-backend(1) will support HTTP_GIT_PROTOCOL, soon, and we won't have to add GIT_PROTOCOL support to support newer versions of the git protocol, either.
Link: https://public-inbox.org/git/YTiXEEEs36NCEr9S@coredump.intra.peff.net/
2021-09-17 fetch: ignore non-writable epoch dirs
This will eventually be useful for maintaining partial mirrors. In keeping with the original public-inbox-fetch philosophy, there are no additional config files to manage: the user merely needs to remove write permissions to an $N.git directory to prevent it from being updated. Re-enabling updates just requires restoring write permission.
2021-09-17 search: fix rt: w/ approxidate when TZ != UTC
While git respects a user's local timezone and returns seconds-since-the-Epoch, we were unnecessarily and incorrectly calling gmtime+strftime on its result. So skip calling gmtime+strftime when the strftime format is "%s", and just feed the output time from git directly to Xapian.

This is mainly for lei, which will likely run in a variety of timezones. While we're at it, add a recommendation to use TZ=UTC in public-inbox-httpd, in case there are (misguided :P) sysadmins who set a non-UTC TZ.
2021-09-17 lei refresh-mail-sync: drop old IMAP folder info
Like with Maildir, IMAP folders can be deleted entirely. Ensure they can be eliminated, but don't be fooled into removing them if they're temporarily unreachable.
2021-09-17 lei refresh-mail-sync: implicitly remove missing folders
There's no point in keeping mail_sync.sqlite3 entries around if the folder is gone. We do keep saved-search configs around, however, since somebody may decide to blow away a search and start over.
2021-09-17 lei refresh-mail-sync: drop unused {verify} code path
That option was never wired up, and probably not needed...
2021-09-17 lei refresh-mail-sync: remove "gone" notices
Those stderr messages are not useful at all, and harmful with the noise they cause.
2021-09-17 lei_mail_sync: don't hold statement handle into callback
This can cause readers and writers to conflict since the implicit transaction from SELECT in a LeiRefreshMailSync worker would block the LeiStore process.
2021-09-17 lei refresh-mail-sync: replace prune-mail-sync
Merely pruning mail synchronization information was insufficient for Maildir: renames are common in Maildir and we need to detect them after-the-fact when lei-daemon isn't running. Running this command could make "lei index" far more useful...

v2: close R/O mail_sync.sqlite3 dbh before fork
Keeping the DB file handle open across fork can cause bad things to happen even if we don't use it since sqlite3 itself still knows about it (but doesn't know Perl code doesn't know about it).
2021-09-16 lei_pmdir: do not attempt to trigger network auth
Since some commands access both Maildirs and IMAP/NNTP servers at the same time, LeiPmdir may see the same lei->{auth} and lei->{net} objects as the sibling LeiInput-based workers. Delete those at fork and do not attempt to do authentication in those cases, since "net_merge_continue" will not be a registered op and would cause PktOp to fail even if authentication /can/ work from a LeiPmdir worker.
2021-09-16 net_reader: load IO::Socket::Socks in all workers
This was previously undetected since SOCKS is mainly used for read-only (single worker) tasks, and worker[0] always loaded the module. However, "lei refresh-mail-sync" can bounce reads to any worker, so we need to ensure worker[1..Inf] load it, too.
2021-09-16 lei: git_oid: replace git_blob_id
We'll be using binary SHA-1 and SHA-256 in-memory since that's what mail_sync.sqlite3 stores.
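For reference, git derives a blob's object ID by hashing a short header followed by the content. The sketch below is Python with a hypothetical helper name and is not the lei code; it returns the raw digest bytes (20 for SHA-1, 32 for SHA-256) rather than a hex string.

```python
import hashlib


def git_blob_oid(data, algo="sha1"):
    """Return the binary git object ID for blob content `data`."""
    h = hashlib.new(algo)
    # git hashes "blob <size>\0" followed by the raw content.
    h.update(b"blob %d\0" % len(data))
    h.update(data)
    return h.digest()


empty_sha1 = git_blob_oid(b"")
empty_sha256 = git_blob_oid(b"", "sha256")
```

Keeping the binary form in memory avoids converting to and from hex whenever the value is compared with what is stored on disk.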
2021-09-16 imapd: sort LIST response
While RFC 3501 doesn't require LIST responses be sorted, it makes reading protocol dumps easier and we memoize it once per-refresh, so it shouldn't be too expensive even with thousands of folders.
2021-09-16 lei ls-mail-source: sort IMAP folder names
Otherwise, public-inbox-imapd will emit mailboxes in random order (as IMAP servers do not need to guarantee any sort of ordering). We'll take into account numeric slice numbers generated by -imapd if they exist, so slice "80" doesn't show up next to "8".
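The numeric-aware ordering can be sketched as a sort key. This is Python for illustration only; the mailbox names and the assumption that a slice is a trailing ".N" component are hypothetical, not taken from the lei source.

```python
import re


def slice_key(name):
    """Sort key: group by base name, then order slices numerically."""
    m = re.fullmatch(r"(.*)\.(\d+)", name)
    if m:
        return (m.group(1), int(m.group(2)))
    return (name, -1)  # names without a numeric slice sort first


folders = ["inbox.test.80", "inbox.test.9", "inbox.test.8", "inbox.other.0"]
ordered = sorted(folders, key=slice_key)
```

A plain string sort would place "inbox.test.80" between "inbox.test.8" and "inbox.test.9"; comparing the slice as an integer keeps it after both.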
2021-09-16 net_reader: emit .onion help for potential Tor users
We can't easily use torsocks, here, so try to be helpful when it comes to proxy support.
2021-09-16 www_stream: note existence of IMAP and NNTP URLs
The "mirror" link may not clue users into the existence of NNTP and IMAP servers, so add a note about them (but don't list them, in case there are dozens of URLs :>).
2021-09-16 www: support publicinbox.imapserver
This allows PublicInbox::WWW hosts to advertise the existence of IMAP servers in addition to NNTP servers.
2021-09-16 inbox: streamline ->nntp_url
We no longer waste a precious hash slot for a per-Inbox {nntpserver} if it's only configured globally for all inboxes.
2021-09-15 fetch|clone|--mirror: shorten paths for progress output
The full pathname for "curl -o ..." was too noisy and confusing. Reduce confusion by adding the ".tmp" suffix and relying on "-C". We'll also avoid displaying "-C" in run_reap() and rely on "--git-dir=" with "git fetch" to display progress for users.
2021-09-15 clone|fetch|--mirror: add convenience $INBOX_DIR/Makefile
Since the beginning of time, I've been dropping Makefiles in $INBOX_DIR (and above hierarchies) to organize groups of commands. make(1) is widely available in various flavors and a familiar tool for our target audience. It is easy to run in the right directory, typically has built-in shell completion, and doesn't silently ignore errors by default like Bourne shell.
2021-09-15 fetch: support --exit-code switch
As noted in the new manpage entry, this is useful for avoiding public-inbox-index invocations when there's nothing to update. We use 127 to match "grok-pull", and also because it doesn't conflict with any of the current curl(1) exit codes.
2021-09-15 lei_saved_search: fix prefix for IMAP folders w/ slash
We failed to account for IMAP mailboxes containing `/' characters when creating saved search files for them.
Reported-by: Konstantin Ryabitsev <konstantin@linuxfoundation.org>
Link: https://public-inbox.org/meta/20210915123347.knr4qpaei73tjc5q@meerkat.local/
2021-09-15 multi_git: hoist out common epoch/alternates handling
IMHO, this greatly improves code sharing and organization between v2, extindex, and lei/store. Common git-related logic for these is lightly refactored and easier to reason about.

The impetus for this big change was to ensure inboxes created+managed by public-inbox-{clone,fetch} could have alternates and configs set up properly without depending on SQLite (via V2Writable). This change does that while making old code shorter and better factored.