Date | Commit message |
|
Xapian and SQLite access can be slow when a DB is large and/or
on high-latency storage.
|
|
We need to waitpid synchronously on pkg-config to use $?.
When loading Gcf2 inside the event loop, implicit dwaitpid
done by PublicInbox::ProcessPipe would not call waitpid in
time to zero $?. This was causing one of my -httpd to
occasionally fall back to git(1) instead of using Gcf2.
This was noted in:
Link: https://public-inbox.org/meta/20210914085322.25517-1-e@80x24.org/
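The general rule (wait for the child before trusting its exit status) can be sketched outside of Perl. This is a minimal Python illustration, not the PublicInbox code: `run_and_check` is a hypothetical helper, and the Python interpreter stands in for pkg-config.

```python
import subprocess
import sys

def run_and_check(cmd):
    """Run cmd, block until it exits, and return (ok, stdout)."""
    # subprocess.run() waits for the child before returning, so
    # returncode is always valid by the time we inspect it.
    proc = subprocess.run(cmd, stdout=subprocess.PIPE, text=True)
    return proc.returncode == 0, proc.stdout.strip()

# Stand-ins for a succeeding and a failing pkg-config invocation:
ok, out = run_and_check([sys.executable, "-c", "print('-lgit2')"])
failed_ok, _ = run_and_check([sys.executable, "-c", "raise SystemExit(1)"])
```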
|
|
NNTP article numbers are stored separately from folder names
in mail_sync.sqlite3.
Recovering from this is optional; the worst case is wasting
bandwidth refetching some messages. To (optionally) recover
from this, use:
lei forget-mail-sync $URL_WITH_ARTNUMS
Some articles will be refetched on the next import, but
duplicate data won't be indexed in Xapian.
|
|
A batch size of zero is nonsensical and causes infinite loops.
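A minimal sketch of why zero is dangerous, in Python rather than the project's Perl; `batches` is a hypothetical helper. The loop cursor advances by the batch size, so a size of zero would never make progress.

```python
def batches(items, batch_size):
    """Yield successive slices of items, each at most batch_size long."""
    if batch_size < 1:
        raise ValueError("batch size must be at least 1")
    start = 0
    while start < len(items):
        yield items[start:start + batch_size]
        start += batch_size  # would never advance if batch_size were 0
```

Rejecting the value up front turns a silent hang into an immediate error.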
|
|
As with "lei edit-search", "lei config --edit" may
spawn an interactive editor which works best from
the terminal running script/lei.
So implement LeiConfig as a superclass of LeiEditSearch
so the two commands can share the same verification
hooks and retry logic.
|
|
At least not by default, to match existing NNTP behavior.
Tor .onions are already encrypted, and there's no point
in encrypting traffic on localhost outside of testing.
|
|
This brings -watch up to feature parity with lei with
SOCKS support.
|
|
While NNTP ranges were already working, fetching a single message
was broken. We'll also simplify the code a bit and ensure
incremental synchronization is ignored when ranges are
specified.
|
|
As with other commands, we enable pretty JSON by default if
stdout is a terminal or if --pretty is specified. While the
->pretty JSON output has excessive vertical whitespace, too many
lines is preferable to having everything on one line.
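The selection rule described above could look roughly like this in Python; `dump_json` and its parameters are hypothetical names, not lei's interface.

```python
import json
import sys

def dump_json(obj, pretty=False, out=sys.stdout):
    """Write obj as JSON; indent when asked to or when out is a terminal."""
    if pretty or out.isatty():
        text = json.dumps(obj, indent=2, sort_keys=True)
    else:
        # Compact form for pipes and files
        text = json.dumps(obj, separators=(",", ":"), sort_keys=True)
    out.write(text + "\n")
```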
|
|
"hwm" and "lwm" may not be obvious abbreviations for the
(high|low) water mark descriptions used by RFC 3977.
"high" and "low" should be obvious to anyone.
|
|
"All" my CPUs is only 4, but it's probably ridiculous for
somebody with a 16-core system to have 16 processes for
accessing SQLite DBs.
We do the same thing in Pmdir for parallel Maildir access
(and V2Writable).
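As a sketch of the idea in Python: the cap of 4 is an assumption read from the message above, and `worker_count` is a hypothetical helper rather than anything in the codebase.

```python
import os

MAX_WORKERS = 4  # assumed cap, not taken from the actual code

def worker_count(ncpu=None, cap=MAX_WORKERS):
    """Use every CPU on small machines, but never more than cap workers."""
    if ncpu is None:
        ncpu = os.cpu_count() or 1
    return max(1, min(ncpu, cap))
```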
|
|
In retrospect, I don't think it's needed, and trying to wire up
a user interface for lei to manage process counts doesn't seem
worthwhile. It could be resurrected for public-facing daemon
use in the future, but that's what version control systems are for.
This also lets us automatically avoid setting up broadcast
sockets.
Followup-to: 7b7939d47b336fb7 ("lei: lock worker counts")
|
|
We're not using Data::Dumper for JSON output.
|
|
With the switch from pipes to sockets for lei-daemon =>
lei/store IPC, we can send the script/lei client socket to the
lei/store process and rely on reference counting in both Perl
and the kernel to persist the script/lei.
|
|
This has several advantages:
* no need to use ipc.lock to protect a pipe for non-atomic writes
* ability to pass FDs. In another commit, this will let us
simplify lei->sto_done_request and pass newly-created
sockets to lei/store directly.
disadvantages:
- an extra pipe is required for rare messages over several
hundred KB; this is probably a non-issue, though
The performance delta is unknown, but I expect shards
(which remain pipes) to be the primary bottleneck IPC-wise
for lei/store.
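The FD-passing ability can be demonstrated with a short Python sketch. It assumes Linux (for AF_UNIX SOCK_SEQPACKET) and Python 3.9+ (for `socket.send_fds`/`socket.recv_fds`). The socket pair and pipe are stand-ins for the lei-daemon, lei/store and client endpoints, not the real IPC code.

```python
import os
import socket

# A connected pair of Unix sockets stands in for lei-daemon <=> lei/store.
daemon, store = socket.socketpair(socket.AF_UNIX, socket.SOCK_SEQPACKET)

# A pipe stands in for the client connection we want to hand over.
rfd, wfd = os.pipe()

# One small record is sent atomically, with the FD attached to it.
socket.send_fds(daemon, [b"done"], [wfd])
os.close(wfd)  # the receiver now holds its own reference

msg, fds, _flags, _addr = socket.recv_fds(store, 4096, 1)
os.write(fds[0], b"hello")  # the received FD refers to the same pipe
os.close(fds[0])
received = os.read(rfd, 5)

os.close(rfd)
daemon.close()
store.close()
```

The sender can close its copy right after sending because the kernel keeps the underlying file open for as long as any process holds a reference to it.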
|
|
Since some lei worker classes only use a single worker,
there's no sense in having broadcast for those cases.
|
|
This brings the wq_* SOCK_SEQPACKET API functionality
on par with the ipc_do (pipe-based) API.
|
|
The semicolon in ';AUTH=ANONYMOUS' requires quoting in Bourne shell.
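To illustrate the quoting, here is a small Python sketch using `shlex`; the host and folder in the URL are made up.

```python
import shlex

url = "imaps://;AUTH=ANONYMOUS@example.com/INBOX"  # hypothetical server

# Unquoted, a Bourne shell would end the command at ';' and try to run
# the remainder of the URL as a second command.
quoted = shlex.quote(url)
command = "lei import " + quoted
```

`shlex.quote` wraps the URL in single quotes, which is also what one would type by hand.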
|
|
Since we can't use maxuid for remote externals, automatically
maintaining the last time we got results and appending a dt:
range to the query will prevent HTTP(S) responses from getting
too big.
We could be using "rt:", but no stable release of public-inbox
supports it, yet, so we'll use dt:, instead.
By default, there's a two day fudge factor to account for MTA
downtime and delays, which is hopefully enough. The fudge
factor may be changed per-invocation with the
--remote-fudge-factor=INTERVAL option.
Since different externals can have different message transport
routes, "lastresult" entries are stored on a per-external basis.
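A rough Python sketch of the range computation. The exact way lei combines the query with the range is not shown in this message, so the `AND` join, the open-ended `dt:START..` form and the `append_dt_range` name are all assumptions.

```python
from datetime import datetime, timedelta, timezone

FUDGE = timedelta(days=2)  # default fudge factor described above

def append_dt_range(query, lastresult, fudge=FUDGE):
    """Limit query to messages newer than (lastresult - fudge).

    lastresult is the epoch time of the last result for one external,
    or None if that external has never returned anything.
    """
    if lastresult is None:
        return query
    start = datetime.fromtimestamp(lastresult, tz=timezone.utc) - fudge
    return f"({query}) AND dt:{start:%Y%m%d%H%M%S}.."
```

Since "lastresult" is per-external, a caller would keep one such timestamp for each remote URL.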
|
|
SO_KEEPALIVE can prevent stuck processes and is safe to enable
unconditionally on all TCP sockets (like git, and the rest of
public-inbox does). Verified via strace on both NNTP and NNTPS
with and without nntp.proxy=socks5h://...
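For reference, enabling the option is a single setsockopt(2) call. A minimal Python sketch, where `enable_keepalive` is a hypothetical helper:

```python
import socket

def enable_keepalive(sock):
    """Ask the kernel to probe idle connections so dead peers are noticed."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    return sock

sock = enable_keepalive(socket.socket(socket.AF_INET, socket.SOCK_STREAM))
keepalive_on = sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0
sock.close()
```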
|
|
While non-TLS IMAP worked perfectly with IO::Socket::Socks
and Mail::IMAPClient, we need to wrap the IO::Socket::Socks
object with IO::Socket::SSL before handing it to
Mail::IMAPClient.
|
|
A Mail::IMAPClient object may be returned even on connection
failure, so use IsConnected to check for it. This ensures
git-credential will no longer prompt for passwords when there's
no connection.
|
|
I think tying IO::Socket::Socks debugging to existing debug
switches is enough, and there's no need to introduce a separate
socks.Debug parameter.
|
|
A common pattern we use is to arm a timer once and prevent
it from being armed until it fires. We'll be using it more
to do polling for saved searches and imports.
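The pattern can be sketched as follows in Python. `OneShotTimer` and the `schedule(delay, fn)` hook are hypothetical and stand in for whatever the event loop provides.

```python
class OneShotTimer:
    """Arm at most one pending timer; re-arming is a no-op until it fires."""

    def __init__(self, schedule, callback):
        self._schedule = schedule  # hypothetical event-loop hook
        self._callback = callback
        self._armed = False

    def arm(self, delay):
        if self._armed:
            return False  # already pending, do not stack timers
        self._armed = True
        self._schedule(delay, self._fire)
        return True

    def _fire(self):
        self._armed = False  # allow the next arm() before running
        self._callback()
```

Callers may call `arm()` as often as they like; only the first call between two firings schedules anything.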
|
|
As with other SQLite3 databases, copy-on-write with
files experiencing random writes leads to write amplification
and low performance.
|
|
Since 44917fdd24a8bec1 ("lei_mail_sync: do not use transactions"),
relying on lei/store to serialize access was a pointless endeavor.
Rely on flock(2) to serialize multiple writers since (in my
experience) it's the easiest way to deal with parallel writers
when using SQLite. This allows us to simplify existing callers
while speeding up 'lei refresh-mail-sync --all=local' by 5% or
so.
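A minimal sketch of the flock(2) approach in Python (Unix-only). The lock file name, the `seen` table and both helpers are made up for illustration and do not reflect the mail_sync.sqlite3 schema.

```python
import fcntl
import sqlite3
from contextlib import contextmanager

@contextmanager
def locked(lock_path):
    """Hold an exclusive flock(2) on lock_path for the duration of the block."""
    with open(lock_path, "a") as lock_fh:
        fcntl.flock(lock_fh, fcntl.LOCK_EX)  # blocks until we own the lock
        try:
            yield
        finally:
            fcntl.flock(lock_fh, fcntl.LOCK_UN)

def record(db_path, folder, uid):
    """Insert one row while holding the lock, so writers never overlap."""
    with locked(db_path + ".lock"):
        con = sqlite3.connect(db_path)
        try:
            con.execute(
                "CREATE TABLE IF NOT EXISTS seen (folder TEXT, uid INTEGER)")
            con.execute("INSERT INTO seen VALUES (?, ?)", (folder, uid))
            con.commit()
        finally:
            con.close()
```

Each writer blocks in `flock` instead of hitting SQLITE_BUSY, so no central process has to serialize access.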
|
|
It doesn't seem worthwhile to change worker counts dynamically
on a per-command-basis with lei, and I don't know how such an
interface would even work...
|
|
It looks like git-http-backend(1) will support
HTTP_GIT_PROTOCOL, soon, and we won't have to add GIT_PROTOCOL
support to support newer versions of the git protocol, either.
Link: https://public-inbox.org/git/YTiXEEEs36NCEr9S@coredump.intra.peff.net/
|
|
This will eventually be useful for maintaining partial mirrors.
In keeping with the original public-inbox-fetch philosophy,
there are no additional config files to manage:
the user merely needs to remove write permissions to an $N.git
directory to prevent it from being updated.
Re-enabling updates just requires restoring write permission.
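One way such a check could look, sketched in Python. `updatable_epochs` is a hypothetical helper, and testing the owner-write bit is an assumption about how "write permission" is detected.

```python
import os
import stat

def updatable_epochs(parent_dir):
    """Return the *.git directories under parent_dir that are still writable."""
    found = []
    for name in sorted(os.listdir(parent_dir)):
        path = os.path.join(parent_dir, name)
        if not (name.endswith(".git") and os.path.isdir(path)):
            continue
        # A cleared owner-write bit means "leave this one alone"
        if os.stat(path).st_mode & stat.S_IWUSR:
            found.append(name)
    return found
```

With this, `chmod a-w 0.git` freezes an epoch and `chmod u+w 0.git` resumes updates, with no config file involved.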
|
|
While git respects a user's local timezone and returns
seconds-since-the-Epoch, we were unnecessarily and incorrectly
calling gmtime+strftime on its result. So skip calling
gmtime+strftime when the strftime format is "%s", and just feed
the output time from git directly to Xapian.
This is mainly for lei, which will likely run in a variety of
timezones. While we're at it, add a recommendation to use
TZ=UTC in public-inbox-httpd, in case there are (misguided :P)
sysadmins who set a non-UTC TZ.
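The skew is easy to reproduce. This Python sketch (Unix-only, because of `time.tzset`) mimics what formatting gmtime() output with "%s" amounts to on systems where "%s" goes through mktime(3); the TZ value and timestamp are arbitrary examples.

```python
import os
import time

def naive_epoch(git_time):
    # Break the time down as UTC, then convert it back as if it were
    # local time: the round trip only works when local time is UTC.
    return int(time.mktime(time.gmtime(git_time)))

os.environ["TZ"] = "EST5"  # POSIX TZ string: fixed UTC-5, no DST
time.tzset()

git_time = 1631664000  # seconds since the Epoch, as reported by git
skew = naive_epoch(git_time) - git_time  # 18000 seconds (5 hours) here
```

Passing git's value through untouched avoids the round trip altogether, which is why the result no longer depends on TZ.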
|
|
Like with Maildir, IMAP folders can be deleted entirely.
Ensure they can be eliminated, but don't be fooled into
removing them if they're temporarily unreachable.
|
|
There's no point in keeping mail_sync.sqlite3 entries around
if the folder is gone. We do keep saved-search configs around,
however, since somebody may decide to blow away a search and
start over.
|
|
That option was never wired up, and probably not needed...
|
|
Those stderr messages are not useful at all, and harmful with
the noise they cause.
|
|
This can cause readers and writers to conflict since the
implicit transaction from SELECT in a LeiRefreshMailSync
worker would block the LeiStore process.
|
|
Merely pruning mail synchronization information was
insufficient for Maildir: renames are common in Maildir
and we need to detect them after-the-fact when lei-daemon
isn't running.
Running this command could make "lei index" far more
useful...
v2: close R/O mail_sync.sqlite3 dbh before fork
Keeping the DB file handle open across fork can cause bad things
to happen even if we don't use it since sqlite3 itself still knows
about it (but doesn't know Perl code doesn't know about it).
|
|
Since some commands access both Maildirs and IMAP/NNTP servers
at the same time, LeiPmdir may see the same lei->{auth} and
lei->{net} objects as the sibling LeiInput-based workers.
Delete those at fork and do not attempt to do authentication in
those cases, since "net_merge_continue" will not be a registered
op and cause PktOp to fail even if authentication /can/ work
from a LeiPmdir worker.
|
|
This was previously undetected since SOCKS is mainly used for
read-only (single worker) tasks, and worker[0] always loaded
the module. However, "lei refresh-mail-sync" can bounce reads
to any worker, so we need to ensure worker[1..Inf] load it, too.
|
|
We'll be using binary SHA-1 and SHA-256 in-memory since that's
what mail_sync.sqlite3 stores.
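For a sense of the sizes involved, here is a small Python sketch comparing binary and hex digests. The hashed bytes are an arbitrary example, not what mail_sync.sqlite3 actually hashes.

```python
import hashlib

blob = b"example message\n"  # arbitrary input for illustration

sha1_bin = hashlib.sha1(blob).digest()      # 20 raw bytes
sha256_bin = hashlib.sha256(blob).digest()  # 32 raw bytes
sha1_hex = sha1_bin.hex()                   # 40 hex characters
```

Keeping the binary form in memory halves the storage per digest and avoids converting on every database lookup.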
|
|
While RFC 3501 doesn't require LIST responses be sorted,
it makes reading protocol dumps easier and we memoize it
once per-refresh, so it shouldn't be too expensive even
with thousands of folders.
|
|
Otherwise, public-inbox-imapd will emit mailboxes in random
order (as IMAP servers do not need to guarantee any sort of
ordering). We'll take into account numeric slice numbers
generated by -imapd if they exist, so slice "80" doesn't show up
next to "8".
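A sketch of slice-aware sorting in Python; the mailbox names and the `mailbox_sort_key` helper are hypothetical, and it assumes the slice number is the last dot-separated component.

```python
def mailbox_sort_key(name):
    """Sort by name, comparing a trailing numeric slice as an integer."""
    base, sep, tail = name.rpartition(".")
    if sep and tail.isdecimal():
        return (base, int(tail))
    return (name, -1)  # no slice number: plain name ordering

names = ["inbox.test.80", "inbox.test.8", "inbox.test.9",
         "inbox.other.0", "INBOX"]
ordered = sorted(names, key=mailbox_sort_key)
```

A plain string sort would put "inbox.test.80" between "inbox.test.8" and "inbox.test.9"; the integer comparison keeps the slices in numeric order.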
|
|
We can't easily use torsocks, here, so try to be helpful
when it comes to proxy support.
|
|
The "mirror" link may not clue users into the existence of
NNTP and IMAP servers, so add a note about them (but don't
list them, in case there are dozens of URLs :>).
|
|
This allows PublicInbox::WWW hosts to advertise the existence of
IMAP servers in addition to NNTP servers.
|
|
We no longer waste a precious hash slot for a per-Inbox
{nntpserver} if it's only configured globally for all inboxes.
|
|
The full pathname for "curl -o ..." was too noisy and confusing.
Reduce confusion by adding the ".tmp" suffix and relying on
"-C". We'll also avoid displaying "-C" in run_reap() and
rely on "--git-dir=" with "git fetch" to display progress for
users.
|
|
Since the beginning of time, I've been dropping Makefiles
in $INBOX_DIR (and above hierarchies) to organize groups
of commands.
make(1) is widely available in various flavors and a familiar
tool for our target audience. It is easy to run in the right
directory, typically has built-in shell completion, and doesn't
silently ignore errors by default like Bourne shell.
|
|
As noted in the new manpage entry, this is useful for avoiding
public-inbox-index invocations when there's nothing to update.
We use 127 to match "grok-pull", and also because it doesn't
conflict with any of the current curl(1) exit codes.
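A wrapper script might act on that status roughly as follows. This Python sketch uses hypothetical names and only assumes that 0 means "something was fetched" and 127 means "nothing to update".

```python
NOTHING_TO_UPDATE = 127  # exit status described above

def should_index(fetch_status):
    """Decide whether an indexing run is needed after a fetch."""
    if fetch_status == 0:
        return True
    if fetch_status == NOTHING_TO_UPDATE:
        return False
    # Anything else is a real failure and should not be ignored
    raise RuntimeError(f"fetch failed with status {fetch_status}")
```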
|
|
We failed to account for IMAP mailboxes containing `/'
characters when creating saved search files for them.
Reported-by: Konstantin Ryabitsev <konstantin@linuxfoundation.org>
Link: https://public-inbox.org/meta/20210915123347.knr4qpaei73tjc5q@meerkat.local/
|
|
IMHO, this greatly improves code sharing and organization
between v2, extindex, and lei/store. Common git-related
logic for these is lightly-refactored and easier to reason
about.
The impetus for this big change was to ensure inboxes
created+managed by public-inbox-{clone,fetch} could have
alternates and configs setup properly without depending on
SQLite (via V2Writable). This change does that while
making old code shorter and better factored.
|