Date | Commit message (Collapse) |
|
Our pure-Perl (PublicInbox::AddressPP) fallback is closer to the
preferred Email::Address::XS (EAX) behavior than Mail::Address
is for ->name support. EAX tends to be overkill with good spam
filtering, and using our own fallback means life is easier for
users with neither C/XS build tools nor a pre-built EAX package.
|
|
Aside from our prior import bugs (fixed in a0c07cba0e5d8b6a
(mda: drop leading "From " lines again, 2016-06-26)), we'll
always have to be dealing with mutt piping messages to us and
`git format-patch' output. So just share the regexp so we
can use it everywhere.
In may be desirable to allow importing messages with a leading
"From " line for FUSE, even.
Additionally, some instances of this regexp needlessly added
optional `\r?' (CR) checks ahead of the `\n' (LF) element; but
they're pointless anyways since [^\n]* is enough to exclude all
non-LF bytes.
|
|
This enables the t/pop3d.t test to pass when Parse::RecDescent
is not available.
|
|
The $oid arg for Git->cat_async is defined on async_abort using
the original request, so use undefined $type to distinguish that
case in caller-supplied callbacks. async_abort isn't common, of
course, but sometimes git subprocesses can die unexpectedly.
|
|
This fixes the display of raw (non-RFC 2047) names and subjects
in HTML message views.
SMTPUTF8 (RFC 6531) allows raw UTF-8 in headers without RFC 2047
encoding, so let Perl handle it as a character sequence for the
rest of our consumers. Thus, the old special case in
PublicInbox::Smsg->populate is no longer necessary and gone.
The one regression notice so far (and fixed here) is compressed
IMAP envelope responses still needs raw bytes since the zlib
wrapper is designed for octets, not Perl UTF-8 chars. Thus we
reverse utf8::decode with utf8::encode in PublicInbox::IMAP::_esc.
->header_set also forces encoding to bytes, since all existing
callers would either be dealing with ->header_raw results or
be RFC-2047-encoded anyways.
Reindexing is not necessary with this change due to the prior
PublicInbox::Smsg->populate special case.
Reported-by: Konstantin Ryabitsev <konstantin@linuxfoundation.org>
Link: https://public-inbox.org/meta/20221124153715.3nenjpjzj43vqxr2@meerkat.local/
|
|
We can assign arrays directly without `[]' creating an extra
immortal pad allocation.
|
|
warn/carp usage is unavoidable given Perl itself and standard
libraries, so just rely on localized $SIG{__WARN__} from
60d262483a4d6ddf (daemon: use per-listener SIG{__WARN__} callbacks, 2022-08-08)
for all error reporting.
While we're in the area, make some of the error handling more
consistent between IMAP/NNTP/POP3.
|
|
This allows "-l $ADDRESS?err=/path/to/err.log to isolate normal
warn() (and carp()) messages for a particular listen address to
track down errors more easily.
|
|
...by deprioritizing clients using a username + password.
As IMAP provides AUTH=ANONYMOUS for designating anonymous
access, we'll rely on it as a heuristic for favoring "good"
clients. Clients using a username + password seem to (more
often than not) be malicious and looking for info which doesn't
belong in public inboxes.
This copies the technique used by WWW + -httpd to deprioritize
expensive mbox.gz downloads.
|
|
Looking at IMAP traffic on public-inbox.org, it seems there is a
fair amount of traffic coming from malicious clients assuming
the IMAP server is compromised and searching for private
information. Since AUTH=ANONYMOUS clients are more likely to
be legitimate clients looking for publicly-archived mail,
give them priority.
|
|
We can reduce ops and temporary objects here by folding the
stringification into the `for' loop and push directly into the
{mailboxlist} array; relying on autovivification to turn it into
a noop for the initial population.
|
|
ConfigIter was still too slow despite being fair. The addition of
ART_MIN in ALL->misc means it can be used as a startup/reload cache
for -imapd, too.
This results in a ~3x faster startup for -imapd with 50K inboxes.
|
|
This allows new TLS certificates to be loaded for new clients
without having to timeout nor drop existing clients with
established connections made with the old certs. This should
benefit users with admins who expire certificates frequently (as
encouraged by Let's Encrypt).
|
|
->zflush is already for GzipFilter in PublicInbox::WWW,
while we use DEFLATE for NNTP and IMAP. This ought to
make the code easier-to-follow.
|
|
Their code was nearly identical to begin with, so save some
memory in -netd and disk space for all of our tarball/distro
users, at least.
And I seem to have used multiple inheritance successfully, here,
maybe...
|
|
It's not actually used by our POP3 code at the moment,
but it may be soon to reduce memory usage when loading
50K smsg objects into memory.
|
|
It's the same subroutine everywhere.
|
|
More deduplication, and POP3 never needed it.
|
|
We can share some common code between IMAP, NNTP, and POP3
without too much trouble, so cut down our LoC.
|
|
This only affects the rarely-used STATUS command, our message
count was consistely zero due to misusing ->imap_exists.
Noticed while implementing POP3 server.
|
|
Noticed while reviewing pieces for POP3.
|
|
As stated in the previous change, conditional hash assignments
which trigger other hash assignments seem problematic, at times.
So replace:
$h->{k} //= do { $h->{x} = ...; $val };
$h->{k} // do {
$h->{x} = ...;
$hk->{k} = $val
};
"||=" is affected the same way, and some instances of "||=" are
replaced with "//=" or "// do {", now.
|
|
Caching the value doesn't seem necessary from a performance
perspective, and it adds a caveat for read-only users which
may lead to bugs in future code.
|
|
It's needlessly complex and O(n), so it doesn't scale well to a
high number of clients nor is it easy-to-scale with the data
structures available to us in pure Perl.
In any case, I see no evidence of either -imapd nor -nntpd
experiencing high connection loads on public-facing sites.
-httpd has never had its own timer-based expiration, either.
Fwiw, public-inbox.org itself has been running a public-facing
HTTP/HTTPS server with no userspace idle client expiration for
the past 8 years or with no ill effect. Clients can come and go
as they wish, and SO_KEEPALIVE takes care of truly broken
connections if they're gone for ~2 hours.
Internet connections drop all time, so it should be harmless to
drop connections w/o warning since both NNTP and IMAP protocols
have well-defined semantics for determining if a message was
truncated (as does HTTP/1.1+).
|
|
add_uniq_timer seems sufficient, and we'll drop the last
user of ::later (IMAP) and switch to unique timers.
|
|
While RFC 3501 doesn't require LIST responses be sorted,
it makes reading protocol dumps easier and we memoize it
once per-refresh, so it shouldn't be too expensive even
with thousands of folders.
|
|
While the WWW front-end can gracefully handle ->mm and ->over
disappearing (in most cases), IMAP+NNTP front-ends are completely
dependent on these and failed mysteriously when they go missing
after startup.
These will hopefully make issues like what Konstantin
encountered more obvious:
Link: https://public-inbox.org/meta/20210824204855.ejspej4z7r2rpu63@nitro.local/
|
|
While both git and libgit2 take around 16 minutes to load 100K
alternates there's already a proposed patch to make git faster:
<https://lore.kernel.org/git/20210624005806.12079-1-e@80x24.org/>
It's also easier to patch and install git locally since the
git.git build system defaults to prefix=$HOME and dealing with
dynamic linking with libgit2 is more difficult for end users
relying on Inline::C.
libgit2 remains in use for the non-ALL.git case, but maybe it's
not necessary (libgit2 is significantly slower than git in
Debian 10 due to SHA-1 collision checking).
|
|
None of the Content-Type attributes are long-lived
(and unlikely to be memory intensive). While these
callsites won't trigger $DB::args segfaults via
confess or longmess, it'll make future code audits
easier.
cf. commit 0795b0906cc81f40
("ds: guard against stack-not-refcounted quirk of Perl 5")
|
|
It seems only triggered by bots trying to steal information.
|
|
Using "make update-copyrights" after setting GNULIB_PATH in my
config.mak
|
|
The ->mset method always returns a Xapian mset nowadays, so
naming a parameter {mset} is too confusing. As it does with
MiscSearch, setting the {relevance} parameter to -1 now sorts by
ascending docid order. -2 is now supported for descending
docid order, too, since it may be useful for lei users.
|
|
Avoid confusing hackers since this conflicts with a method name
provided by (Search::)Xapian::QueryParser.
|
|
Since IMAP search (either with Isearch or traditional per-Inbox
search) only returns UIDs, we can safely set the limit to the
UID slice size(*). With isearch, we can also trust the Xapian
result to fit any docid range we specify.
Limiting Xapian results to 1000 was making ->ALL docid <=>
per-Inbox UID impossible since results could overlap between
ranges unpredictably.
Finally, we can map the ->ALL docids into per-Inbox UIDs and
show them to the client in the UID order of the Inbox, not the
docid order of the ->ALL extindex.
This also lets us get rid of the "uid:" query parser prefix
and use the Xapian::Query API directly to reduce our search
prefix footprint.
For mbox.gz downloads in WWW, we'll also make a best effort to
preserve the order from the Inbox, not the order of extindex;
though it's possible large result sets can have non-overlapping
windows.
(*) by definition, UID slice size is a "safe" value which
shouldn't OOM either the server or clients.
|
|
Apparently they happen (triggered by my -imapd instance), so
bail out by closing the underlying socket rather than stopping
the event loop and daemon process.
|
|
This ought to save a few cycles if a client disconnects while
in the middle of a (UID) FETCH. This avoids:
Can't call method "git" on an undefined value at .../PublicInbox/IMAP.pm
errors in stderr.
|
|
It seems easiest to have a singleton Gcf2Client client object
per daemon worker for all inboxes to use. This reduces overall
FD usage from pipes.
The `public-inbox-gcf2' command + manpage are gone and a `$^X'
one-liner is used, instead. This saves inodes for internal
commands and hopefully makes it easier to avoid mismatched
PERL5LIB include paths (as noticed during development :x).
We'll also make the existing cat-file process management
infrastructure more resilient to BOFHs on process killing
sprees (or in case our libgit2-based code fails on us).
(Rare) PublicInbox::WWW PSGI users NOT using public-inbox-httpd
won't automatically benefit from this change, and extra
configuration will be required (to be documented later).
|
|
This was triggered by blindly trying to FETCH an MSN (not
"UID FETCH") on an empty dummy inbox. It's harmless, and
probably triggered by a wayward client or misbehaving bot.
|
|
We switched to Parse::RecDescent during development and left
some dead code behind.
|
|
Nearly all of the search uses in the production code rely on
a Xapian mset iterator being returned (instead of an array
of $smsg objects). So default to returning the mset and move
the burden of smsg array conversion into the test cases.
|
|
We can avoid importing mdocid() in several places by using
this method, simplifying callers.
|
|
No need to have awkward globrefs for this.
|
|
We can keep the git process more active by sending another
request to it while fetch_run_ops() is running. This
parallelization speeds up mutt's initial FETCH for headers by
around ~35%(!).
|
|
Non-slice mailboxes never have messages themselves,
so we must not assume a message exists when sending
untagged EXISTS messages.
|
|
Since -edit and -purge should be rare and TOCTOU around them
rarer still; missing {blobs} could be indicative of a real bug
elsewhere. Warn on them.
And I somehow ended up with 3 different field names for Inbox
objects. Perhaps they'll be made consistent in the future.
|
|
Since the removal of pseudo-hash support in Perl 5.10, the
"fields" module no longer provides the space or speed benefits
it did in 5.8. It also does not allow for compile-time checks,
only run-time checks.
To me, the extra developer overhead in maintaining "use fields"
args has become a hassle. None of our non-DS-related code uses
fields.pm, nor do any of our current dependencies. In fact,
Danga::Socket (which DS was originally forked from) and its
subclasses are the only fields.pm users I've ever encountered in
the wild. Removing fields may make our code more approachable
to other Perl hackers.
So stop using fields.pm and locked hashes, but continue to
document what fields do for non-trivial classes.
|
|
We need to rely on num_highwater for UIDNEXT since the
highest `num' stored in over.sqlite3 may be rolled back
if the most recent messages were spam.
We also need to load the uo2m immediately on EXAMINE to ensure
EXISTS responses are always consistent with regard to future
updates.
|
|
Clients which are NOT in an IDLE state still need to be
notified of message existence. Unlike the EXPUNGE response,
untagged EXISTS responses seem to be allowed at any time
according to RFC 3501.
We'll also perform uo2m_extend on the NOOP command, since
NOOP is the recommended command for message polling.
|
|
While this circular reference was carefully managed to not leak
memory; it was still triggering a warning at -imapd/-nntpd
shutdown due to the EPOLL_CTL_DEL op failing after the $Epoll FD
gets closed.
So remove the circular reference by providing a ref to `undef',
instead.
|
|
There's no need to loop when the first iteration guarantees
a `return'.
|