Date | Commit message |
|
This allows -fetch to work out of the box when using the
grokmirror 2.x default of "_grokmirror".
|
|
Encode::FB_CROAK leaks memory in old versions of Encode:
<https://rt.cpan.org/Public/Bug/Display.html?id=139622>
Since I expect there are still many users on old systems and old
Perls, we can use "$SIG{__WARN__} = \&croak" here with
Encode::FB_WARN to emulate Encode::FB_CROAK behavior.
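
As a rough illustration of the same idea outside Perl, the Python sketch below promotes warnings to exceptions for the duration of one call. The names `strict_call` and `lenient_decode` are hypothetical, and this is only an analogy to the $SIG{__WARN__} approach, not the code from this commit.

```python
import warnings

def strict_call(func, *args):
    """Run func, turning any warning it emits into an exception."""
    with warnings.catch_warnings():
        warnings.simplefilter("error")  # every warning now raises
        return func(*args)

def lenient_decode(raw):
    # Hypothetical decoder: warns on bad bytes instead of raising
    text = raw.decode("utf-8", errors="replace")
    if "\ufffd" in text:
        warnings.warn("malformed UTF-8 input", UnicodeWarning)
    return text
```

The filter change is scoped to the `with` block, so callers elsewhere keep the lenient behavior.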
|
|
We'll be reusing this in more places. While we're at it, allow
it to tail all run_script() users, including lei() in TestCommon.
|
|
This ought to give us more CoW savings and fragmentation
avoidance in -httpd.
|
|
This gives context as to where warnings are coming from.
|
|
Large chunks of our codebase and 3rd-party dependencies do not
use ->{psgi.errors}, so trying to standardize on it was a
fruitless endeavor. Since warn() and carp() are standard
mechanisms within Perl, just use them instead and simplify a
bunch of existing code.
|
|
warn() is easier to augment with context information, and
frankly unavoidable in the presence of 3rd-party libraries
we don't control.
|
|
AFAIK I've never hit these messages, but I might be glad
if I ever do.
|
|
Eml->warn_ignore_cb itself returns a callback, so creating a
reference to it was wrong when assigning it to $SIG{__WARN__}.
Fixes: 176cd51f9aa81b74 ("daemon: quiet down Eml-related warnings")
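
The class of bug can be sketched in Python, assuming a hypothetical factory with the same name: installing the factory itself as the handler means each message only builds a new handler that is thrown away, so nothing is ever filtered or reported.

```python
def warn_ignore_cb(patterns=("noisy",)):
    """Hypothetical factory: returns a handler that drops matching messages."""
    kept = []

    def handler(msg):
        if not any(p in msg for p in patterns):
            kept.append(msg)

    handler.kept = kept  # exposed only so the sketch can be inspected
    return handler

wrong = warn_ignore_cb    # the factory itself: calling it handles nothing
right = warn_ignore_cb()  # the handler the factory returns

right("noisy header")
right("real problem")
```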
|
|
This helps users make sense of which saved searches some
warnings were coming from.
Since I often create and discard externals, some warnings
from saved searches were confusing to me without output context:
"`$FOO' is unknown"
"$FOO not indexed by Xapian"
|
|
This covers v1 inboxes, as well. We also guard the execution
since "PRAGMA optimize" was only introduced in SQLite 3.18.0
(2017-03-30)
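
A minimal sketch of such a guard, written in Python against the stdlib sqlite3 module rather than the project's Perl; `optimize_and_close` is a hypothetical helper name.

```python
import sqlite3

def optimize_and_close(conn):
    """Run "PRAGMA optimize" if the linked SQLite supports it, then close."""
    # sqlite_version_info reports the SQLite library version, not the module's
    ran = sqlite3.sqlite_version_info >= (3, 18, 0)
    if ran:
        conn.execute("PRAGMA optimize")
    conn.close()
    return ran
```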
|
|
|
|
This prevents unnecessary message renumbering and I/O.
Without this change, there is a small window for long-running
WWW streaming requests to miss a message that was unref-ed
before reindexing. If we expose an "All Mail" mailbox via
IMAP/JMAP, this will save client traffic.
|
|
This allows IMAP mirrors to keep UIDVALIDITY synchronized (and
"LIST ACTIVE.TIMES" in NNTP). "lei add-external --mirror" will
automatically set it, as will the combination of
public-inbox-clone + public-inbox-index.
This avoids the need for extra endpoints or config entries,
at least...
|
|
The original Msgmap->new API was v1-specific and not necessary.
The ->new_file API now supports an $ibx object being passed to
it, simplifying -no_fsync use. It will also make an upcoming
change easier...
|
|
The cost of opening a Xapian DB (even with shards) isn't high,
so save some FDs and just close it. We hit Xapian far less than
over.sqlite3 and we discard the MSet ASAP even when streaming
large responses.
This simplifies our code a bit and hopefully helps reduce
fragmentation by increasing mortality of late allocations.
|
|
We still need to account for msgmap being open all the time
and not having separate read-only vs. read-write packages.
|
|
msgmap is not performance-critical enough to justify doing our
own prepared statement caching. Just rely on the functionality
of DBI here so future changes will be easier.
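
For comparison, Python's sqlite3 driver also keeps its own per-connection statement cache, so a sketch of "let the driver cache" looks like the following. The table layout is a simplified assumption, not the real msgmap schema, and the helper names are hypothetical.

```python
import sqlite3

# cached_statements sizes the driver's internal prepared-statement cache
conn = sqlite3.connect(":memory:", cached_statements=64)
conn.execute("CREATE TABLE msgmap (num INTEGER PRIMARY KEY, mid TEXT UNIQUE)")

def mid_insert(conn, mid):
    # Same SQL text on every call, so the driver can reuse the statement
    cur = conn.execute("INSERT OR IGNORE INTO msgmap (mid) VALUES (?)", (mid,))
    return cur.lastrowid if cur.rowcount else None

def num_for(conn, mid):
    row = conn.execute("SELECT num FROM msgmap WHERE mid = ?", (mid,)).fetchone()
    return row[0] if row else None
```

No statement handles are stored by the caller, which keeps later schema or query changes local to one function.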
There are also minor style changes to avoid dirtying cache lines
with refcount bumps: we repeat hash lookups rather than attempting
to store the values as locals.
|
|
"<0>" could be a valid Message-ID, maybe...
|
|
Xapian::QueryParser is attached to the Xapian::Database,
so holding onto the QueryParser was preventing us from
releasing DB handles if a query was performed.
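
The ownership problem can be modeled in a few lines of Python. Both classes are hypothetical stand-ins, not the Xapian bindings, and the immediate release relies on CPython's reference counting.

```python
import weakref

class Database:     # hypothetical stand-in for a DB handle
    pass

class QueryParser:  # hypothetical stand-in; keeps a strong reference
    def __init__(self, db):
        self.db = db

db = Database()
db_alive = weakref.ref(db)  # observe the handle without owning it
qp = QueryParser(db)

del db
held = db_alive() is not None  # the parser still pins the handle
del qp
released = db_alive() is None  # freed once the parser goes away
```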
|
|
Email::Address::XS is quite noisy and there's nothing we can
really do about messages we're serving from read-only daemons.
|
|
We're moving towards relying on "perl -w" for warnings and
v5.12 for strict.
|
|
It may not exist due to periodic cleanup to avoid excessive FD use.
|
|
When unref-ing a blob from xref3, make sure the "preferred"
smsg->{blob} doesn't point to the blob we just unrefed. This
is necessary because we periodically checkpoint our extindex
process to allow -watch and -mda processes to run.
This also gets rid of a lot of redundant code for ->remove_xref3,
since it's all handled in ExtSearchIdx, now.
|
|
We need to ensure a message is consistently removed from eidxq,
over and Xapian in all cases. Removing from eidxq saves users
from some noisy error messages.
|
|
We can use the same logic for --gc and --reindex and
'd' log entries.
They're similar enough and the actual need to unref should
be fairly rare. We could go a lot faster if we didn't show
progress for --gc and --reindex, actually.
|
|
We also have the idea of active inboxes, so "active shards"
ought to make the purpose of the data structure more obvious.
|
|
As recommended by SQLite documentation[1]:
To achieve the best long-term query performance without the need
to do a detailed engineering analysis of the application schema
and SQL, it is recommended that applications run "PRAGMA optimize"
(with no arguments) just before closing each database connection.
Hopefully that works for our use cases and can make things
faster for us.
[1] https://www.sqlite.org/pragma.html#pragma_optimize
|
|
This required some tweaking of xref3 indices in over.sqlite3,
but the end result is it brings no-op "--reindex --fast --all"
checks down to roughly 20 minutes (from 30-40 minutes) on
lore/all.
This is faster because a bunch of small SQLite queries are still
slower en masse than a bunch of perlops. Despite the lack of IPC
overhead, crossing .so boundaries and repeating lookups over
btrees is still slower than doing the same with Perl hash tables.
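
The trade-off can be sketched in Python: one table scan fills an in-memory hash table, and later membership checks never touch SQLite. The two-column table is a simplified assumption about xref3, and `load_xref` and `is_indexed` are hypothetical names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE xref3 (docid INTEGER, oidbin BLOB)")
conn.executemany("INSERT INTO xref3 VALUES (?, ?)",
                 [(1, b"\x01" * 20), (2, b"\x02" * 20), (2, b"\x03" * 20)])

def load_xref(conn):
    """One scan instead of one small query per object ID."""
    seen = {}
    for docid, oid in conn.execute("SELECT docid, oidbin FROM xref3"):
        seen.setdefault(oid, set()).add(docid)
    return seen

seen = load_xref(conn)

def is_indexed(oid):
    # Pure hash lookup; no statement execution or btree walk
    return oid in seen
```

The cost is holding every key in RAM for the duration of the check.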
|
|
Otherwise, it gets too noisy and we repeat some work
when we do an actual sync, since the last_commit info
will be out-of-date.
|
|
This is slightly more meaningful since the file is already
in ~/.local/share/lei/store, so "lei_store" was redundant
(and the "XXXX" are random characters replaced by File::Temp).
|
|
We were deleting ghost entries. This was usually harmless since
other messages could fill-in-the-blanks, but could cause
misthreading in odd cases where a big chunk of a thread is
missing and the latest messages only referenced ghosts.
We'll also save some cycles when scanning Xapian shards since
docids won't be <= 0.
|
|
Don't bother decoding the 20-byte SHA-1 to a 40-byte hex value
since we don't read it, anyways. We can also use the on-stack
ibx->eidx_key value instead of dispatching the method again.
|
|
Avoiding repeated SQL statements brings --gc down to 2-3 minutes
from around 10. We'll also add some checkpoints around over and
xref3 cleanups.
|
|
We'll set nodatacow when detecting existing but empty
files, and also their directories in more cases (for
auxiliary -wal, -journal, -shm files). Hopefully
this keeps performance reasonable on CoW FSes.
|
|
It's more consistent with TAP output and hopefully puts
users at ease in case they don't understand the meaning
of a message.
|
|
Just in case it fails when there are many parallel invocations.
|
|
This mode only checks history for missed/stale messages
and doesn't attempt to reindex messages which are already
indexed.
|
|
We'll also save a few LoC when generating it. $smsg objects can
linger a while when rendering large threads, so saving a few
bytes here can add up to several hundred KB saved.
I noticed this while chasing the ref cycle leak in commit
b28e74c9dc0a (www: fix ref cycle from threading w/ extindex, 2021-10-03).
While there's no longer a leak, releasing memory earlier can
allow it to be reused sooner and reduce both memory traffic and
memory pressure.
|
|
By using syswrite to populate env->{psgi.input}. The substr()
call in IO::Handle->write will trigger Perl's target/scratchpad and
result in a permanent allocation. Since this is a cold path,
that allocation is pointless, and syswrite() can already write a
substring.
Allowing Perl to cache a large allocation in a cold path only
results in fragmentation and wasted RAM.
write(2) on a regular file won't result in short writes
unless the FS quotas or free space limits are hit, or the buffer
is close to overflowing (e.g. the 0x7ffff000-byte Linux limit).
Since our HTTP server will never buffer that much in RAM,
there's no need to retry syswrite nor rely on the retrying
implicit in IO::Handle->write and the "print" perlop.
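
A Python analogue of writing a substring without first copying it: a memoryview slice is handed straight to a single os.write call. The helper name and sample buffer are hypothetical, and like the commit this sketch does not retry on short writes.

```python
import os
import tempfile

def write_slice(fd, buf, offset, length):
    """Write buf[offset:offset+length] without building a temporary copy."""
    view = memoryview(buf)[offset:offset + length]  # zero-copy window
    return os.write(fd, view)  # one unbuffered write(2) call

buf = b"HEADERbody-bytes"
with tempfile.TemporaryFile() as tmp:
    n = write_slice(tmp.fileno(), buf, 6, 10)
    os.lseek(tmp.fileno(), 0, os.SEEK_SET)
    data = os.read(tmp.fileno(), 64)
```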
|
|
We can release the raw body buffer once we've obtained a copy of
the decoded buffer. This reduces memory pressure ahead of some
expensive diff processing.
|
|
Some of these scalar buffers may be large patches, so try
to keep them as short-lived as possible to reduce memory
pressure.
|
|
We'll be supporting pipelining in a future commit, since
Tor is too slow and increasing batch size can use too much
memory.
|
|
This should help us catch BUG: errors (and then some) in
-extindex and other read-write code paths. Only read-only
daemons should warn on async callback failures, since those
aren't capable of causing data loss.
|
|
We need to abort both check-only and cat requests when
aborting, since we'll be aborting more aggressively
in read-write paths.
|
|
Some code paths may use maximum size checks, so ensure
any checks are waited on, too.
|
|
This may fix some extindex problems and should get rid of
the "Can't bless non-reference value" errors.
|
|
The use of `substr' here as an argument to `print' was causing Perl
to internally cache its target buffer. Since `syswrite()'
already offers a buffer offset arg and length limits, just use
`syswrite' directly. We were using autoflush anyways, so the
lack of buffering was of no concern performance-wise.
The target buffer could get to roughly 10MB under some loads,
but it was usually a cold path and using memory which cannot be
released nor reused in other places.
Note: IO::Handle::write uses `substr' internally, too;
so nothing would be gained using IO::Handle::write.
|
|
The regexp in split_quotes relies on the presence of a
final "\n", so add it wherever we need to instead of
making it the responsibility of every caller.
This probably doesn't matter in practice since every
email seems to have a "\n" as the final byte (due to
the way SMTP works), but maybe there's some odd ones
that'll get imported via lei.
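
A sketch of the pattern in Python: the splitter normalizes its own input before applying a regexp that only matches newline-terminated lines. The regexp is an illustrative assumption, not the one used by the real split_quotes.

```python
import re

# Matches a run of consecutive "> ..." lines; each line must end in "\n"
QUOTE_RUN = re.compile(r"((?:^>[^\n]*\n)+)", re.M)

def split_quotes(text):
    """Split a message body into alternating quoted and unquoted chunks."""
    if not text.endswith("\n"):
        text += "\n"  # the regexp depends on a final newline
    return [part for part in QUOTE_RUN.split(text) if part]
```

Without the normalization step, a final quoted line lacking "\n" would be returned as part of the preceding unquoted chunk.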
|
|
This should bring us closer to the "Base subject" definition in
IMAP ORDEREDSUBJECT (RFC 5256 2.1). Larger changes may cause
some breakage (until --reindex). But for now, a reindex will
prevent non-ASCII subjects from being normalized to the
same fuzzy "thread" in the thread view.
|