about summary refs log tree commit homepage
path: root/lib/PublicInbox/Inbox.pm
DateCommit message (Collapse)
2023-11-30inbox: shrink data structures for publicinbox.*.hide
We no longer vivify the intermediate $ibx->{-hide} hashref, instead we use $ibx->{-hide_$KEY} directly. This avoids an intermediate hashref and extra hash table lookups.
2023-11-30inbox: expire resources more aggressively
We no longer trigger git cleanups from the Inbox package since `git cat-file' users have their own cleanup to support git coderepos not associated with any inbox. This change means we unconditionally expire SQLite and Xapian FDs and some internal caches regardless of git activity. The old logic was irrelevant to Gcf2 (libgit2) users anyways since we couldn't determine whether or not an inbox was active based on {inflight} git requests, and upcoming changes will make it inaccurate for all extindex/cindex users as well. Opening SQLite and Xapian DBs is fairly cheap; so it's a small price to pay to reduce memory use and fragmentation.
2023-11-03move read_all, try_cat, and poll_in to PublicInbox::IO
The IO package seems like a better home for I/O subs than the Git package. We lose the 60 second read timeout for `git cat-file --batch-*' processes since it's probably not necessary given how reliable the code has proven and things would fall over hard in other ways if the storage device were completely hosed.
2023-10-26git: cleanup un-associated coderepo processes
It's possible to have many coderepos with no inbox association that never see git->cleanup. So instead of tying git->cleanup to inboxes, ensure it gets armed when ->watch_async is called (since it's only called in our -netd or -httpd servers).
2023-10-25limiter: split out from qspawn
It's slightly better organized this way, especially since `publicinboxLimiter' has its own user-facing config section and knobs. I may use it in LeiMirror and CodeSearchIdx for process management.
2023-06-09add compat package for List::Util::uniqstr
This will make it easier to switch in the far future while making callers easier-to-read (and more callers will be added). Anyways, Perl 5.26 is a long time away for enterprise users; but isolating compatibility code away can improve readability of code we actually care about in the meantime.
2022-10-09www_coderepo: allow searching one extindex|inbox
I'm not sure how to best make a UI for one coderepo to many inboxes/extindices, yet; but at least allow a simple 1:1 mapping, for now. This ensures /$CODEREPO/$OID/s/ can work as effectively as /$INBOX/$OID/s/ when looking for emails associated with a git commit.
2022-10-05git: move cloneurl + description reading here
We'll be using these functions for serving coderepos natively without cgit.
2022-08-29treewide: ditch inbox->recent method
It's a needless wrapper, nowadays. Originally, ->over was added on experimental basis to optimize for /$INBOX/ where Xapian ->search is slower on gigantic (LKML-sized) inboxes. Nowadays with extindex, ->over is here to stay given NNTP and IMAP both benefit from it. So reduce the interpreter stack overhead and just access ->over directly. lxs->recent was never used outside of tests, anyways. And while we're in the area, avoid needlessly bumping the refcount of $ctx->{ibx} in View::paginate_recent.
2022-08-20www: use absolute URLs for coderepo URLs
Showing "../../foo.git" looks awkward and isn't conducive to users who want to "git clone" a URL.
2022-08-11www: inbox: favor "pop3://" over "pop://"
curl only supports "pop3://" and "pop3s://", despite RFC 2384 existing for "pop://". AFAIK, there's no RFCs for "pop3://" and "pop3s://", but please let us know if there are. In any case, real-world cases like curl are more relevant.
2022-08-04miscidx: index inbox min/max article numbers
This will be used to speed up NNTP group listings and IMAP startup with thousands of inboxes.
2022-07-30doc|www: flesh out POP3 documentation for servers and users
Hopefully it makes sense to new users deploying or using POP3...
2022-07-20public-inbox-pop3d - a mostly read-only POP3 server
Old account expiry has not been implemented, but it seems to work well with both mpop(1) and getmail(1). The strictness of mpop was particularly helpful in ironing out bugs in our implementation of (dreaded) message sequence numbers. "EXPIRE 0" (RFC 2449) can theoretically save numerous "DELE" commands, but that's untested by real-world clients. mpop supports PIPELINING which is effective in hiding latency, and the core networking functionality is already well-tested from our NNTP and IMAP implementations. Configuration requires "publicinbox.pop3state" to point to a directory writable by the otherwise read-only daemon. See public-inbox-pop3d(1) manpage for more usage details.
2021-11-01treewide: kill problematic "$h->{k} //= do {" assignments
As stated in the previous change, conditional hash assignments which trigger other hash assignments seem problematic, at times. So replace: $h->{k} //= do { $h->{x} = ...; $val }; $h->{k} // do { $h->{x} = ...; $hk->{k} = $val }; "||=" is affected the same way, and some instances of "||=" are replaced with "//=" or "// do {", now.
2021-10-16inbox + search: use 5.10.1 and do some golfing
Some yak-shaving while I try to track down other bugs...
2021-10-12msgmap: ->new_file to supports $ibx arg, drop ->new
The original Msgmap->new API was v1-specific and not necessary. The ->new_file API now supports an $ibx object being passed to it, simplify -no_fsync use. It will also make an upcoming change easier...
2021-10-12daemon: unconditionally close Xapian shards on cleanup
The cost of opening a Xapian DB (even with shards) isn't high, so save some FDs and just close it. We hit Xapian far less than over.sqlite3 and we discard the MSet ASAP even when streaming large responses. This simplifies our code a bit and hopefully helps reduce fragmentation by increasing mortality of late allocations.
2021-10-01inbox: keep DB handles if git processes are live
Having git processes outlive DB handles is likely to hurt from a fragmentation perspective if the DB handle needs to be recreated immediately due to a git->cat_async callback. So only unref DB handles when we're sure there's no live git users left, otherwise check the inodes. We'll also avoid needless localization checks in git->cleanup and make the return value more obvious since the pid fields are unconditionally deleted nowadays.
2021-10-01inbox: inline and eliminate git_cleanup
It was probably incorrect to use from max_git_epoch, and it's small enough to inline into do_cleanup. We'll also eliminate the unnecessary deletion of {-altid_map} while we're in the area, since we no longer cache/memoize that. Fixes: 7e5cea05f061e757 ("inbox: rewrite cleanup to be more aggressive")
2021-09-29inbox: do not vivify {-repo_objs} during cleanup
This caused config->repo_objs to not fill in {-repo_objs} properly before starting solver. Reported-by: Kyle Meyer <kyle@kyleam.com> Link: https://public-inbox.org/meta/87o88cqobd.fsf@kyleam.com/ Fixes: 63d7b8ceee55a34 ("daemons: revamp periodic cleanup task")
2021-09-29inbox: drop memoization/preload, cleanup expires caches
cloneurl, description, and base_url are no longer memoized. The non-$env form of base_url is rare in WWW, and is fast enough to not require memoization. cloneurl and description are now expired during cleanup, allowing admins to change these files without restarting (or SIGHUP). -altid_map is no longer cached nor memoized at all, since the endpoint(s) which hit it seem rarely accessed. nntp_url and imap_url are now cached (instead of memoized) in case an inbox is unvisited for a long time. They remain cached since the truthiness check gets called in every per-inbox HTML page, which can potentially be expensive.
2021-09-29inbox: rewrite cleanup to be more aggressive
Avoid relying on a giant cleanup hash and instead use the new DS->add_uniq_timer API to amortize the pause times associated with having to cleanup many inboxes. We can also use smaller intervals for this, as well. We now discard SQLite DB handles at cleanup. Each of these can use several megabytes of memory, which adds up with hundreds/thousands of inboxes. Since per-inbox access intervals are unpredictable and opening an SQLite handle is relatively inexpensive, release memory more aggressively to avoid the heap having to hit swap.
2021-09-26inbox: cloneurl: avoid undef to hash table value
This saves us some memory for the hash slot in the common case the `cloneurl' file doesn't exist.
2021-09-26search: avoid setting undef hashtable entries
`undef' entries still take up a slot in the hash table, and cause the `exists' check to false-positive in ->cleanup_shards. This should fully fix the (innocuous) messages introduced in commit 63d7b8ce (daemons: revamp periodic cleanup task, 2021-09-23)
2021-09-23daemons: revamp periodic cleanup task
Neither Inboxes nor ExtSearch objects were retrying correctly when there are live git processes, but the inboxes were getting rescanned for search or other reasons. Ensure the scan retries eventually if there's live processes. We also need to update the cleanup task to detect Xapian shard count changes, since Xapian ->reopen is enough to detect any other Xapian changes. Otherwise, we just issue an inexpensive ->reopen call and let Xapian check whether there's anything worth reopening. This also lets us eliminate the Devel::Peek dependency.
2021-09-23gcf2 + extsearch: check for unlinked files on Linux
Check for unlinked mmap-ed files via /proc/$PID/maps every 60s or so. ExtSearch (extindex) is compatible-enough with Inbox objects to be wired into the old per-inbox code, but the startup cost is projected to be much higher down the line when there's >30K inboxes, so we scan /proc/$PID/maps for deleted files before unlinking. With old Inbox objects, it was (and is) simpler to just kill processes w/o checking due to the low startup cost (and non-portability of checking). Reported-by: Konstantin Ryabitsev <konstantin@linuxfoundation.org> Link: https://public-inbox.org/meta/20210921144754.gulkneuulzo27qbw@meerkat.local/
2021-09-22inbox: do not waste hash slot on httpbackend_limiter
A few dozen bytes saved here can add up when we have thousands of inboxes. It also makes Data::Dumper debug output a bit cleaner.
2021-09-16www: support publicinbox.imapserver
This allows PublicInbox::WWW hosts to advertise the existence of IMAP servers in addition to NNTP servers.
2021-09-16inbox: streamline ->nntp_url
We no longer waste a precious hash slot for a per-Inbox {nntpserver} if it's only configured globally for all inboxes.
2021-08-28www: avoid potential auto-vivification on ibx->{url}
This may fix problems with the "all" link disappearing. Link: https://public-inbox.org/meta/CAMwyc-Tw=v5yT1U1U66GSwwTK8OJXv8_YDu-=oXbZO3tHSnYWw@mail.gmail.com/
2021-08-25imap+nntp: die loudly if ->mm or ->over disappear
While the WWW front-end can gracefully handle ->mm and ->over disappearing (in most cases), IMAP+NNTP front-ends are completely dependent on these and failed mysteriously when they go missing after startup. These will hopefully make issues like what Konstantin encountered more obvious: Link: https://public-inbox.org/meta/20210824204855.ejspej4z7r2rpu63@nitro.local/
2021-05-05lei rediff: regenerate diffs from stdin
Sometimes a mailed patch is generated with non-ideal output, (lacking context, noisy whitespace changes, etc.), or a user wants to use the same external diff viewer they've configured git to use. Since we have SolverGit to regenerate arbitrary blobs from patches; this new command allows us to regenerate a diff with different options using the blobs SolverGit gives us. The amount of git-diff(1) options is mind numbing, so it's likely I missed some favorites or botched the getopt spec translation. This also fixes Inbox::base_url to check psgi.url_scheme before attempting to generate URLs and avoid uninitialized variable warnings. Oddly, the "lei blob" tests did not trigger these uninitialized warnings. Note: this will automatically import+index the message(s) it's regenerating, because solver relies on being able to lookup pre/postimage OIDs and read blobs.
2021-03-02inbox: ->mailboxid accessor
This will be necessary for "mailboxIds" as described in RFCs 8620 and 8621 (for JMAP). We may implement "MAILBOXID" in RFC 8474 for IMAP, as well.
2021-02-04tests: guard against missing DBD::SQLite
The features we use for SharedKV could probably be implemented with GDBM_File or SDBM_File, but that doesn't seem worth it at the moment since we depend on SQLite elsewhere.
2021-01-01update copyrights for 2021
Using "make update-copyrights" after setting GNULIB_PATH in my config.mak
2020-12-28check defined return value for localized slurp errors
Reading from regular files (even on STDIN) can fail when dealing with flakey storage.
2020-12-26inbox: name variable for values loop iterator
->on_inbox_unlock callbacks could clobber $_, and this seems to fix a problem with -extindex --watch failing to index some inboxes after SIGHUP reload.
2020-12-23inbox: git_epoch: correct false comment
The original comment hasn't been true since PublicInbox::Git->modified was changed to use cat_async blob responses. In any case, manifest.js.gz generation already cleans up per-epoch git processes used for ->modified.
2020-12-21inbox: delay ->version detection
Our read-only code won't need to know the version until an inbox is accessed. This is a small step towards eliminating many stat() calls on read-only daemon startup.
2020-12-17inbox: simplify v2 epoch counting
Perl readdir detects list context and can return an array suitable for the grep op. From there, we can rely on substr to remove the ".git" suffix and integerize the value to save a few bytes before letting List::Util::max return the value. This is how we detect Xapian shards nowadays, too, and we'll also use defined-or (//) to simplify the return value there. We'll also simplify InboxWritable->git_dir_latest, remove some callers, and consider removing it entirely.
2020-12-17inbox: ->uidvalidity returns undef w/o ->mm
While totally unindexed inboxes are rare, we still support them for v1 and may hit code which calls this method. Just return `undef' when ->mm access fails.
2020-12-14PublicInbox::Feed owns `feedmax' default value
There's no need to have extra code in the Inbox package for this or to waste dozens of bytes for every Inbox object which uses the default value. This makes our code more flexible w.r.t Inbox-like ExtSearch objects and fixes uninitialized value warnings with ->ALL.
2020-12-09rename {pi_config} fields to {pi_cfg}
{pi_config} may be confused with the documented `PI_CONFIG' environment variable, and we'll favor vowel-removal to be consistent with our usage of object references. The `pi_' prefix may stay in some places, for now; since a separate namespace may come into this codebase for local/private client-tooling. For InboxIdle, we'll also remove an invalid comment about holding a reference to the PublicInbox::Config object, too.
2020-12-05isearch: emulate per-inbox search with ->ALL
Using "eidx_key:" boolean prefix to limit results to a given inbox, we can use ->ALL to emulate and replace per-Inbox xap15/[0-9] search indices. With this change, the presence of "extindex.all.topdir" in the $PI_CONFIG will cause the WWW code to use that extindex and ignore per-inbox Xapian DBs in xap15/[0-9]. Unfortunately IMAP search still requires old per-inbox indices, for now. Mapping extindex Xapian docids to per-Inbox UIDs and vice-versa is proving tricky. Fortunately, IMAP search is rarely used and optional. The RFCs don't specify expensive phrase search, either, so `indexlevel=medium' can be used in per-inbox Xapian indices to save space. For primarily WWW (and future JMAP) users; this should result in significant disk space, FD, and page cache footprint savings for large instances with many inboxes and many cross-posted messages.
2020-12-05inbox: simplify ->search and callers
Stop leaking WWW/PSGI-specific logic into classes like PublicInbox::Inbox, which is used universally. We'll also decouple $ibx->over from $ibx->search and just deal with duplicate the code inside ->over to reduce argument complexity in ->search. This is also a step in moving away from using {psgi.errors} to ease code sharing between IMAP, NNTP, and command-line interfaces. Perl's built-in `warn' and `local $SIG{__WARN__}' provides all the flexibility we need to control warning output and should be universally understood by Perl hackers who may be unfamiliar with PSGI.
2020-11-24manifest: support faster generation via [extindex "all"]
For a mirror of lore.kernel.org with >140 inboxes, this speeds up manifest.js.gz generation from ~1s to 40ms on my HW. This is still unacceptable when dealing with thousands of inboxes, but gets us closer to where we need to be.
2020-11-24inbox: git_epoch: remove ->version check
If $epoch is supplied to this method, there's already epochs and an extra method call for ->version is a pointless waste of CPU cycles.
2020-11-24manifest: use ibx->git_epoch method for v2
We can slightly reduce the amount of version-specific logic, here.
2020-11-07extsearch: wire up remaining Inbox-like methods for WWW
This lets us pretend an ExtSearch object is an Inbox object in most of the existing WWW code.