Date | Commit message (Collapse) |
|
Some yak-shaving while I try to track down other bugs...
|
|
The original Msgmap->new API was v1-specific and not necessary.
The ->new_file API now supports an $ibx object being passed to
it, simplify -no_fsync use. It will also make an upcoming
change easier...
|
|
The cost of opening a Xapian DB (even with shards) isn't high,
so save some FDs and just close it. We hit Xapian far less than
over.sqlite3 and we discard the MSet ASAP even when streaming
large responses.
This simplifies our code a bit and hopefully helps reduce
fragmentation by increasing mortality of late allocations.
|
|
Having git processes outlive DB handles is likely to hurt
from a fragmentation perspective if the DB handle needs to
be recreated immediately due to a git->cat_async callback.
So only unref DB handles when we're sure there's no live
git users left, otherwise check the inodes.
We'll also avoid needless localization checks in git->cleanup
and make the return value more obvious since the pid fields are
unconditionally deleted nowadays.
|
|
It was probably incorrect to use from max_git_epoch, and it's
small enough to inline into do_cleanup. We'll also eliminate
the unnecessary deletion of {-altid_map} while we're in the
area, since we no longer cache/memoize that.
Fixes: 7e5cea05f061e757 ("inbox: rewrite cleanup to be more aggressive")
|
|
This caused config->repo_objs to not fill in {-repo_objs}
properly before starting solver.
Reported-by: Kyle Meyer <kyle@kyleam.com>
Link: https://public-inbox.org/meta/87o88cqobd.fsf@kyleam.com/
Fixes: 63d7b8ceee55a34 ("daemons: revamp periodic cleanup task")
|
|
cloneurl, description, and base_url are no longer memoized. The
non-$env form of base_url is rare in WWW, and is fast enough to
not require memoization.
cloneurl and description are now expired during cleanup,
allowing admins to change these files without restarting
(or SIGHUP).
-altid_map is no longer cached nor memoized at all, since the
endpoint(s) which hit it seem rarely accessed.
nntp_url and imap_url are now cached (instead of memoized) in
case an inbox is unvisited for a long time. They remain cached
since the truthiness check gets called in every per-inbox HTML
page, which can potentially be expensive.
|
|
Avoid relying on a giant cleanup hash and instead use the new
DS->add_uniq_timer API to amortize the pause times associated
with having to cleanup many inboxes. We can also use smaller
intervals for this, as well.
We now discard SQLite DB handles at cleanup. Each of these can
use several megabytes of memory, which adds up with
hundreds/thousands of inboxes. Since per-inbox access intervals
are unpredictable and opening an SQLite handle is relatively
inexpensive, release memory more aggressively to avoid the heap
having to hit swap.
|
|
This saves us some memory for the hash slot in the common case
the `cloneurl' file doesn't exist.
|
|
`undef' entries still take up a slot in the hash table, and
cause the `exists' check to false-positive in ->cleanup_shards.
This should fully fix the (innocuous) messages introduced in
commit 63d7b8ce (daemons: revamp periodic cleanup task, 2021-09-23)
|
|
Neither Inboxes nor ExtSearch objects were retrying correctly
when there are live git processes, but the inboxes were getting
rescanned for search or other reasons. Ensure the scan retries
eventually if there's live processes.
We also need to update the cleanup task to detect Xapian shard
count changes, since Xapian ->reopen is enough to detect any
other Xapian changes. Otherwise, we just issue an inexpensive
->reopen call and let Xapian check whether there's anything
worth reopening.
This also lets us eliminate the Devel::Peek dependency.
|
|
Check for unlinked mmap-ed files via /proc/$PID/maps every 60s
or so.
ExtSearch (extindex) is compatible-enough with Inbox objects to
be wired into the old per-inbox code, but the startup cost is
projected to be much higher down the line when there's >30K
inboxes, so we scan /proc/$PID/maps for deleted files before
unlinking. With old Inbox objects, it was (and is) simpler to
just kill processes w/o checking due to the low startup cost
(and non-portability of checking).
Reported-by: Konstantin Ryabitsev <konstantin@linuxfoundation.org>
Link: https://public-inbox.org/meta/20210921144754.gulkneuulzo27qbw@meerkat.local/
|
|
A few dozen bytes saved here can add up when we have thousands
of inboxes. It also makes Data::Dumper debug output a bit cleaner.
|
|
This allows PublicInbox::WWW hosts to advertise the existence of
IMAP servers in addition to NNTP servers.
|
|
We no longer waste a precious hash slot for a per-Inbox
{nntpserver} if it's only configured globally for all inboxes.
|
|
This may fix problems with the "all" link disappearing.
Link: https://public-inbox.org/meta/CAMwyc-Tw=v5yT1U1U66GSwwTK8OJXv8_YDu-=oXbZO3tHSnYWw@mail.gmail.com/
|
|
While the WWW front-end can gracefully handle ->mm and ->over
disappearing (in most cases), IMAP+NNTP front-ends are completely
dependent on these and failed mysteriously when they go missing
after startup.
These will hopefully make issues like what Konstantin
encountered more obvious:
Link: https://public-inbox.org/meta/20210824204855.ejspej4z7r2rpu63@nitro.local/
|
|
Sometimes a mailed patch is generated with non-ideal output,
(lacking context, noisy whitespace changes, etc.), or a user
wants to use the same external diff viewer they've configured
git to use.
Since we have SolverGit to regenerate arbitrary blobs from
patches; this new command allows us to regenerate a diff with
different options using the blobs SolverGit gives us.
The amount of git-diff(1) options is mind numbing, so it's
likely I missed some favorites or botched the getopt spec
translation.
This also fixes Inbox::base_url to check psgi.url_scheme
before attempting to generate URLs and avoid uninitialized
variable warnings. Oddly, the "lei blob" tests did not
trigger these uninitialized warnings.
Note: this will automatically import+index the message(s)
it's regenerating, because solver relies on being able
to lookup pre/postimage OIDs and read blobs.
|
|
This will be necessary for "mailboxIds" as described in RFCs 8620 and
8621 (for JMAP). We may implement "MAILBOXID" in RFC 8474 for IMAP,
as well.
|
|
The features we use for SharedKV could probably be implemented
with GDBM_File or SDBM_File, but that doesn't seem worth it at
the moment since we depend on SQLite elsewhere.
|
|
Using "make update-copyrights" after setting GNULIB_PATH in my
config.mak
|
|
Reading from regular files (even on STDIN) can fail
when dealing with flakey storage.
|
|
->on_inbox_unlock callbacks could clobber $_, and this seems to
fix a problem with -extindex --watch failing to index some
inboxes after SIGHUP reload.
|
|
The original comment hasn't been true since
PublicInbox::Git->modified was changed to use cat_async blob
responses. In any case, manifest.js.gz generation already
cleans up per-epoch git processes used for ->modified.
|
|
Our read-only code won't need to know the version until an inbox
is accessed. This is a small step towards eliminating many
stat() calls on read-only daemon startup.
|
|
Perl readdir detects list context and can return an array
suitable for the grep op. From there, we can rely on
substr to remove the ".git" suffix and integerize the value
to save a few bytes before letting List::Util::max return
the value.
This is how we detect Xapian shards nowadays, too, and
we'll also use defined-or (//) to simplify the return
value there.
We'll also simplify InboxWritable->git_dir_latest,
remove some callers, and consider removing it entirely.
|
|
While totally unindexed inboxes are rare, we still support
them for v1 and may hit code which calls this method. Just
return `undef' when ->mm access fails.
|
|
There's no need to have extra code in the Inbox package for this
or to waste dozens of bytes for every Inbox object which uses
the default value.
This makes our code more flexible w.r.t Inbox-like ExtSearch
objects and fixes uninitialized value warnings with ->ALL.
|
|
{pi_config} may be confused with the documented `PI_CONFIG'
environment variable, and we'll favor vowel-removal to be
consistent with our usage of object references.
The `pi_' prefix may stay in some places, for now; since a
separate namespace may come into this codebase for local/private
client-tooling.
For InboxIdle, we'll also remove an invalid comment about
holding a reference to the PublicInbox::Config object, too.
|
|
Using "eidx_key:" boolean prefix to limit results to a given
inbox, we can use ->ALL to emulate and replace per-Inbox
xap15/[0-9] search indices.
With this change, the presence of "extindex.all.topdir" in the
$PI_CONFIG will cause the WWW code to use that extindex and
ignore per-inbox Xapian DBs in xap15/[0-9].
Unfortunately IMAP search still requires old per-inbox indices,
for now. Mapping extindex Xapian docids to per-Inbox UIDs and
vice-versa is proving tricky. Fortunately, IMAP search is
rarely used and optional. The RFCs don't specify expensive
phrase search, either, so `indexlevel=medium' can be used in
per-inbox Xapian indices to save space.
For primarily WWW (and future JMAP) users; this should result in
significant disk space, FD, and page cache footprint savings for
large instances with many inboxes and many cross-posted
messages.
|
|
Stop leaking WWW/PSGI-specific logic into classes like
PublicInbox::Inbox, which is used universally.
We'll also decouple $ibx->over from $ibx->search and just deal
with duplicate the code inside ->over to reduce argument
complexity in ->search.
This is also a step in moving away from using {psgi.errors}
to ease code sharing between IMAP, NNTP, and command-line
interfaces. Perl's built-in `warn' and `local $SIG{__WARN__}'
provides all the flexibility we need to control warning output
and should be universally understood by Perl hackers who may
be unfamiliar with PSGI.
|
|
For a mirror of lore.kernel.org with >140 inboxes, this speeds
up manifest.js.gz generation from ~1s to 40ms on my HW. This
is still unacceptable when dealing with thousands of inboxes,
but gets us closer to where we need to be.
|
|
If $epoch is supplied to this method, there's already epochs and
an extra method call for ->version is a pointless waste of CPU
cycles.
|
|
We can slightly reduce the amount of version-specific logic,
here.
|
|
This lets us pretend an ExtSearch object is an Inbox object
in most of the existing WWW code.
|
|
We'll probably still need synchronous message retrieval
in a few places (tests, at least).
|
|
This will make it easier to deal with ExtSearchIdx, which
won't have msgmap.
|
|
"*foo" is ambiguous in that it may refer to a bareword file handle;
so we'll use it where we can without triggering warnings.
PublicInbox::TestCommon::run_script_exit required dropping the
prototype, however. We'll also future-proof by dropping "use
warnings" in Cgit.pm and use the less-ambiguous "//=" in Inbox.pm
while we're in the area.
|
|
Only inbox accesses the read-only {over}, now, instead of going
through ->search. This simplifies our object graph and avoids
potentially redundant FDs and DB handles pointing to the same
over.sqlite3 file.
|
|
`->connect' is confused with the perlfunc for the `connect(2)'
syscall, and also `DBI->connect'. Since SQLite doesn't use
sockets, the word "connect" needlessly confuses me. Give
it a short name to match the field name we use for it, which
also matches the variable name used by the DBI(3pm) and
DBD::SQLite(3pm) manpages.
|
|
While this is unlikely to be a problem in current practice,
keeping Xapian DBs open for long responses can interfere with
free space recovery after -compact.
In the future, it will interfere with inbox search grouping
and lead to unexpected results.
|
|
SQLite already knows the filename internally, so avoid having it
as a long-lived Perl SV to save some bytes when there's many
inboxes and open DBs.
|
|
While it's even less common to experience a replaced
msgmap.sqlite3 file, BOFHs may do the darndest things. This is
another step towards reducing the number of needless wakeups
we need to do in long-lived read-only daemons.
|
|
Otherwise, we may never know what went wrong.
|
|
For v1 inboxes (and possibly v2 in the future, for VACUUM),
public-inbox-compact replaces over.sqlite3 with a new file.
This currently doesn't need an extra inotify watch descriptor
(or FD for kevent) at the moment, so it can coexist nicely for
systems w/o IO::KQueue or Linux::Inotify2.
|
|
Trying to avoid a circular reference by relying on $ibx object
here makes no sense, since skipping GitCatAsync::close will
result in an FD leak, anyways. So keep GitAsyncCat contained to
git-only operations, since we'll be using it for Solver in the
distant feature.
|
|
This ought to improve overall performance with multiple clients.
Single client performance suffers a tiny bit due to extra
syscall overhead from epoll.
This also makes the existing async interface easier-to-use,
since calling cat_async_begin is no longer required.
|
|
This will be used to implement IMAP IDLE, first.
Eventually, it may be used to trigger other things:
* incremental internal updates for manifest.js.gz
* restart `git cat-file' processes on pack index unlink
* IMAP IDLE-like long-polling HTTP endpoint
And maybe more things we haven't thought of, yet.
It uses Linux::Inotify2 or IO::KQueue depending on what packages
are installed and what the kernel supports. It falls back to
nanosecond-aware Time::HiRes::stat() (available with Perl 5.10.0+)
on systems lacking Linux::Inotify2 and IO::KQueue.
In the future, a pure Perl alternative to Linux::Inotify2 may be
supplied for users of architectures we already support signalfd
and epoll on.
v2 changes:
- avoid O_TRUNC on lock file
- change ctime on Linux systems w/o inotify
- fix naming of comments and fields
|
|
To further simplify callers and avoid embarrasing memory
explosions[1], we can finally eliminate this method in
favor of smsg_eml.
[1] commit 7d02b9e64455831d3bda20cd2e64e0c15dc07df5
("view: stop storing all MIME objects on large threads")
fixed a huge memory blowup.
|
|
None of our current callers care about the size of the blob
we're retrieving, so stop wasting stack space and code for
it.
|