Date | Commit message (Collapse) |
|
We no longer vivify the intermediate $ibx->{-hide} hashref,
instead we use $ibx->{-hide_$KEY} directly. This avoids
an intermediate hashref and extra hash table lookups.
|
|
We no longer trigger git cleanups from the Inbox package since
`git cat-file' users have their own cleanup to support git
coderepos not associated with any inbox.
This change means we unconditionally expire SQLite and Xapian
FDs and some internal caches regardless of git activity. The
old logic was irrelevant to Gcf2 (libgit2) users anyways since
we couldn't determine whether or not an inbox was active based
on {inflight} git requests, and upcoming changes will make it
inaccurate for all extindex/cindex users as well.
Opening SQLite and Xapian DBs is fairly cheap; so it's a small
price to pay to reduce memory use and fragmentation.
|
|
The IO package seems like a better home for I/O subs than the
Git package. We lose the 60 second read timeout for `git
cat-file --batch-*' processes since it's probably not necessary
given how reliable the code has proven and things would fall
over hard in other ways if the storage device were completely
hosed.
|
|
It's possible to have many coderepos with no inbox association
that never see git->cleanup. So instead of tying git->cleanup
to inboxes, ensure it gets armed when ->watch_async is called
(since it's only called in our -netd or -httpd servers).
|
|
It's slightly better organized this way, especially since
`publicinboxLimiter' has its own user-facing config section
and knobs. I may use it in LeiMirror and CodeSearchIdx for
process management.
|
|
This will make it easier to switch in the far future while
making callers easier-to-read (and more callers will be added).
Anyways, Perl 5.26 is a long time away for enterprise users;
but isolating compatibility code away can improve readability
of code we actually care about in the meantime.
|
|
I'm not sure how to best make a UI for one coderepo to many
inboxes/extindices, yet; but at least allow a simple 1:1
mapping, for now. This ensures /$CODEREPO/$OID/s/ can work
as effectively as /$INBOX/$OID/s/ when looking for emails
associated with a git commit.
|
|
We'll be using these functions for serving coderepos natively
without cgit.
|
|
It's a needless wrapper, nowadays. Originally, ->over was added
on experimental basis to optimize for /$INBOX/ where Xapian
->search is slower on gigantic (LKML-sized) inboxes.
Nowadays with extindex, ->over is here to stay given NNTP and
IMAP both benefit from it. So reduce the interpreter stack
overhead and just access ->over directly.
lxs->recent was never used outside of tests, anyways.
And while we're in the area, avoid needlessly bumping the
refcount of $ctx->{ibx} in View::paginate_recent.
|
|
Showing "../../foo.git" looks awkward and isn't conducive to
users who want to "git clone" a URL.
|
|
curl only supports "pop3://" and "pop3s://", despite RFC 2384
existing for "pop://". AFAIK, there's no RFCs for "pop3://"
and "pop3s://", but please let us know if there are.
In any case, real-world cases like curl are more relevant.
|
|
This will be used to speed up NNTP group listings and IMAP startup
with thousands of inboxes.
|
|
Hopefully it makes sense to new users deploying or using POP3...
|
|
Old account expiry has not been implemented, but it seems to
work well with both mpop(1) and getmail(1). The strictness of
mpop was particularly helpful in ironing out bugs in our
implementation of (dreaded) message sequence numbers.
"EXPIRE 0" (RFC 2449) can theoretically save numerous "DELE"
commands, but that's untested by real-world clients. mpop
supports PIPELINING which is effective in hiding latency,
and the core networking functionality is already well-tested
from our NNTP and IMAP implementations.
Configuration requires "publicinbox.pop3state" to point to
a directory writable by the otherwise read-only daemon.
See public-inbox-pop3d(1) manpage for more usage details.
|
|
As stated in the previous change, conditional hash assignments
which trigger other hash assignments seem problematic, at times.
So replace:
$h->{k} //= do { $h->{x} = ...; $val };
$h->{k} // do {
$h->{x} = ...;
$hk->{k} = $val
};
"||=" is affected the same way, and some instances of "||=" are
replaced with "//=" or "// do {", now.
|
|
Some yak-shaving while I try to track down other bugs...
|
|
The original Msgmap->new API was v1-specific and not necessary.
The ->new_file API now supports an $ibx object being passed to
it, simplify -no_fsync use. It will also make an upcoming
change easier...
|
|
The cost of opening a Xapian DB (even with shards) isn't high,
so save some FDs and just close it. We hit Xapian far less than
over.sqlite3 and we discard the MSet ASAP even when streaming
large responses.
This simplifies our code a bit and hopefully helps reduce
fragmentation by increasing mortality of late allocations.
|
|
Having git processes outlive DB handles is likely to hurt
from a fragmentation perspective if the DB handle needs to
be recreated immediately due to a git->cat_async callback.
So only unref DB handles when we're sure there's no live
git users left, otherwise check the inodes.
We'll also avoid needless localization checks in git->cleanup
and make the return value more obvious since the pid fields are
unconditionally deleted nowadays.
|
|
It was probably incorrect to use from max_git_epoch, and it's
small enough to inline into do_cleanup. We'll also eliminate
the unnecessary deletion of {-altid_map} while we're in the
area, since we no longer cache/memoize that.
Fixes: 7e5cea05f061e757 ("inbox: rewrite cleanup to be more aggressive")
|
|
This caused config->repo_objs to not fill in {-repo_objs}
properly before starting solver.
Reported-by: Kyle Meyer <kyle@kyleam.com>
Link: https://public-inbox.org/meta/87o88cqobd.fsf@kyleam.com/
Fixes: 63d7b8ceee55a34 ("daemons: revamp periodic cleanup task")
|
|
cloneurl, description, and base_url are no longer memoized. The
non-$env form of base_url is rare in WWW, and is fast enough to
not require memoization.
cloneurl and description are now expired during cleanup,
allowing admins to change these files without restarting
(or SIGHUP).
-altid_map is no longer cached nor memoized at all, since the
endpoint(s) which hit it seem rarely accessed.
nntp_url and imap_url are now cached (instead of memoized) in
case an inbox is unvisited for a long time. They remain cached
since the truthiness check gets called in every per-inbox HTML
page, which can potentially be expensive.
|
|
Avoid relying on a giant cleanup hash and instead use the new
DS->add_uniq_timer API to amortize the pause times associated
with having to cleanup many inboxes. We can also use smaller
intervals for this, as well.
We now discard SQLite DB handles at cleanup. Each of these can
use several megabytes of memory, which adds up with
hundreds/thousands of inboxes. Since per-inbox access intervals
are unpredictable and opening an SQLite handle is relatively
inexpensive, release memory more aggressively to avoid the heap
having to hit swap.
|
|
This saves us some memory for the hash slot in the common case
the `cloneurl' file doesn't exist.
|
|
`undef' entries still take up a slot in the hash table, and
cause the `exists' check to false-positive in ->cleanup_shards.
This should fully fix the (innocuous) messages introduced in
commit 63d7b8ce (daemons: revamp periodic cleanup task, 2021-09-23)
|
|
Neither Inboxes nor ExtSearch objects were retrying correctly
when there are live git processes, but the inboxes were getting
rescanned for search or other reasons. Ensure the scan retries
eventually if there's live processes.
We also need to update the cleanup task to detect Xapian shard
count changes, since Xapian ->reopen is enough to detect any
other Xapian changes. Otherwise, we just issue an inexpensive
->reopen call and let Xapian check whether there's anything
worth reopening.
This also lets us eliminate the Devel::Peek dependency.
|
|
Check for unlinked mmap-ed files via /proc/$PID/maps every 60s
or so.
ExtSearch (extindex) is compatible-enough with Inbox objects to
be wired into the old per-inbox code, but the startup cost is
projected to be much higher down the line when there's >30K
inboxes, so we scan /proc/$PID/maps for deleted files before
unlinking. With old Inbox objects, it was (and is) simpler to
just kill processes w/o checking due to the low startup cost
(and non-portability of checking).
Reported-by: Konstantin Ryabitsev <konstantin@linuxfoundation.org>
Link: https://public-inbox.org/meta/20210921144754.gulkneuulzo27qbw@meerkat.local/
|
|
A few dozen bytes saved here can add up when we have thousands
of inboxes. It also makes Data::Dumper debug output a bit cleaner.
|
|
This allows PublicInbox::WWW hosts to advertise the existence of
IMAP servers in addition to NNTP servers.
|
|
We no longer waste a precious hash slot for a per-Inbox
{nntpserver} if it's only configured globally for all inboxes.
|
|
This may fix problems with the "all" link disappearing.
Link: https://public-inbox.org/meta/CAMwyc-Tw=v5yT1U1U66GSwwTK8OJXv8_YDu-=oXbZO3tHSnYWw@mail.gmail.com/
|
|
While the WWW front-end can gracefully handle ->mm and ->over
disappearing (in most cases), IMAP+NNTP front-ends are completely
dependent on these and failed mysteriously when they go missing
after startup.
These will hopefully make issues like what Konstantin
encountered more obvious:
Link: https://public-inbox.org/meta/20210824204855.ejspej4z7r2rpu63@nitro.local/
|
|
Sometimes a mailed patch is generated with non-ideal output,
(lacking context, noisy whitespace changes, etc.), or a user
wants to use the same external diff viewer they've configured
git to use.
Since we have SolverGit to regenerate arbitrary blobs from
patches; this new command allows us to regenerate a diff with
different options using the blobs SolverGit gives us.
The amount of git-diff(1) options is mind numbing, so it's
likely I missed some favorites or botched the getopt spec
translation.
This also fixes Inbox::base_url to check psgi.url_scheme
before attempting to generate URLs and avoid uninitialized
variable warnings. Oddly, the "lei blob" tests did not
trigger these uninitialized warnings.
Note: this will automatically import+index the message(s)
it's regenerating, because solver relies on being able
to lookup pre/postimage OIDs and read blobs.
|
|
This will be necessary for "mailboxIds" as described in RFCs 8620 and
8621 (for JMAP). We may implement "MAILBOXID" in RFC 8474 for IMAP,
as well.
|
|
The features we use for SharedKV could probably be implemented
with GDBM_File or SDBM_File, but that doesn't seem worth it at
the moment since we depend on SQLite elsewhere.
|
|
Using "make update-copyrights" after setting GNULIB_PATH in my
config.mak
|
|
Reading from regular files (even on STDIN) can fail
when dealing with flakey storage.
|
|
->on_inbox_unlock callbacks could clobber $_, and this seems to
fix a problem with -extindex --watch failing to index some
inboxes after SIGHUP reload.
|
|
The original comment hasn't been true since
PublicInbox::Git->modified was changed to use cat_async blob
responses. In any case, manifest.js.gz generation already
cleans up per-epoch git processes used for ->modified.
|
|
Our read-only code won't need to know the version until an inbox
is accessed. This is a small step towards eliminating many
stat() calls on read-only daemon startup.
|
|
Perl readdir detects list context and can return an array
suitable for the grep op. From there, we can rely on
substr to remove the ".git" suffix and integerize the value
to save a few bytes before letting List::Util::max return
the value.
This is how we detect Xapian shards nowadays, too, and
we'll also use defined-or (//) to simplify the return
value there.
We'll also simplify InboxWritable->git_dir_latest,
remove some callers, and consider removing it entirely.
|
|
While totally unindexed inboxes are rare, we still support
them for v1 and may hit code which calls this method. Just
return `undef' when ->mm access fails.
|
|
There's no need to have extra code in the Inbox package for this
or to waste dozens of bytes for every Inbox object which uses
the default value.
This makes our code more flexible w.r.t Inbox-like ExtSearch
objects and fixes uninitialized value warnings with ->ALL.
|
|
{pi_config} may be confused with the documented `PI_CONFIG'
environment variable, and we'll favor vowel-removal to be
consistent with our usage of object references.
The `pi_' prefix may stay in some places, for now; since a
separate namespace may come into this codebase for local/private
client-tooling.
For InboxIdle, we'll also remove an invalid comment about
holding a reference to the PublicInbox::Config object, too.
|
|
Using "eidx_key:" boolean prefix to limit results to a given
inbox, we can use ->ALL to emulate and replace per-Inbox
xap15/[0-9] search indices.
With this change, the presence of "extindex.all.topdir" in the
$PI_CONFIG will cause the WWW code to use that extindex and
ignore per-inbox Xapian DBs in xap15/[0-9].
Unfortunately IMAP search still requires old per-inbox indices,
for now. Mapping extindex Xapian docids to per-Inbox UIDs and
vice-versa is proving tricky. Fortunately, IMAP search is
rarely used and optional. The RFCs don't specify expensive
phrase search, either, so `indexlevel=medium' can be used in
per-inbox Xapian indices to save space.
For primarily WWW (and future JMAP) users; this should result in
significant disk space, FD, and page cache footprint savings for
large instances with many inboxes and many cross-posted
messages.
|
|
Stop leaking WWW/PSGI-specific logic into classes like
PublicInbox::Inbox, which is used universally.
We'll also decouple $ibx->over from $ibx->search and just deal
with duplicate the code inside ->over to reduce argument
complexity in ->search.
This is also a step in moving away from using {psgi.errors}
to ease code sharing between IMAP, NNTP, and command-line
interfaces. Perl's built-in `warn' and `local $SIG{__WARN__}'
provides all the flexibility we need to control warning output
and should be universally understood by Perl hackers who may
be unfamiliar with PSGI.
|
|
For a mirror of lore.kernel.org with >140 inboxes, this speeds
up manifest.js.gz generation from ~1s to 40ms on my HW. This
is still unacceptable when dealing with thousands of inboxes,
but gets us closer to where we need to be.
|
|
If $epoch is supplied to this method, there's already epochs and
an extra method call for ->version is a pointless waste of CPU
cycles.
|
|
We can slightly reduce the amount of version-specific logic,
here.
|
|
This lets us pretend an ExtSearch object is an Inbox object
in most of the existing WWW code.
|