path: root/lib/PublicInbox/Inbox.pm
Date Commit message
2020-07-14 over+msgmap: do not store filename after DBI->connect
SQLite already knows the filename internally, so avoid having it as a long-lived Perl SV to save some bytes when there's many inboxes and open DBs.
2020-07-14 nntpd+imapd: detect unlinked msgmap
While it's even less common to experience a replaced msgmap.sqlite3 file, BOFHs may do the darndest things. This is another step towards reducing the number of needless wakeups we need to do in long-lived read-only daemons.
2020-06-28 inbox: warn on ->on_inbox_unlock exception
Otherwise, we may never know what went wrong.
2020-06-13 nntpd+imapd: detect replaced over.sqlite3
For v1 inboxes (and possibly v2 in the future, for VACUUM), public-inbox-compact replaces over.sqlite3 with a new file. Detecting this doesn't currently need an extra inotify watch descriptor (or FD for kevent), so it can coexist nicely with systems w/o IO::KQueue or Linux::Inotify2.
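One way to notice a replaced or unlinked database file without any extra watch descriptor is to compare the device and inode of the already-open file with whatever currently sits at the path. The following is a minimal sketch of that idea in Python (used here purely for illustration; the project itself is Perl). The class name `ReplaceDetector` is hypothetical, the approach is an assumption rather than the code from these commits, and POSIX semantics are assumed.

```python
import os


class ReplaceDetector:
    """Hypothetical helper: remember the identity of an open file and
    compare it with whatever currently lives at the same path."""

    def __init__(self, path):
        self.path = path
        # Keep the descriptor open, as a long-lived reader would.
        self.fd = os.open(path, os.O_RDONLY)
        st = os.fstat(self.fd)
        self.ident = (st.st_dev, st.st_ino)

    def status(self):
        """Return 'same', 'replaced', or 'unlinked' for the path."""
        try:
            st = os.stat(self.path)
        except FileNotFoundError:
            return "unlinked"
        if (st.st_dev, st.st_ino) == self.ident:
            return "same"
        return "replaced"

    def close(self):
        os.close(self.fd)
```

Because the old file stays open, its inode cannot be reused by the replacement, so a differing (device, inode) pair is a reliable sign that the reader should reopen the path.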
2020-06-13 git: move async_cat reference to PublicInbox::Git
Trying to avoid a circular reference by relying on $ibx object here makes no sense, since skipping GitAsyncCat::close will result in an FD leak, anyways. So keep GitAsyncCat contained to git-only operations, since we'll be using it for Solver in the distant future.
2020-06-13 imap: use git-cat-file asynchronously
This ought to improve overall performance with multiple clients. Single client performance suffers a tiny bit due to extra syscall overhead from epoll. This also makes the existing async interface easier-to-use, since calling cat_async_begin is no longer required.
2020-06-13 inboxidle: new class to detect inbox changes
This will be used to implement IMAP IDLE, first. Eventually, it may be used to trigger other things:

* incremental internal updates for manifest.js.gz
* restart `git cat-file' processes on pack index unlink
* IMAP IDLE-like long-polling HTTP endpoint

And maybe more things we haven't thought of, yet.

It uses Linux::Inotify2 or IO::KQueue depending on what packages are installed and what the kernel supports. It falls back to nanosecond-aware Time::HiRes::stat() (available with Perl 5.10.0+) on systems lacking Linux::Inotify2 and IO::KQueue. In the future, a pure Perl alternative to Linux::Inotify2 may be supplied for users of architectures we already support signalfd and epoll on.

v2 changes:
- avoid O_TRUNC on lock file
- change ctime on Linux systems w/o inotify
- fix naming of comments and fields
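The stat()-based fallback described above can be sketched as a small poller that compares nanosecond timestamps between calls. This is an illustrative Python sketch, not the InboxIdle implementation: `snapshot` and `StatPoller` are hypothetical names, and which paths to watch (for example a lock file) is left to the caller.

```python
import os


def snapshot(path):
    """Return (ctime_ns, mtime_ns, size) for path, or None if it is missing."""
    try:
        st = os.stat(path)
    except FileNotFoundError:
        return None
    return (st.st_ctime_ns, st.st_mtime_ns, st.st_size)


class StatPoller:
    """Hypothetical fallback for systems without inotify or kqueue."""

    def __init__(self, paths):
        # Record the initial state of every watched path.
        self.seen = {p: snapshot(p) for p in paths}

    def poll(self):
        """Return the paths whose snapshot changed since the last poll."""
        changed = []
        for path, old in self.seen.items():
            new = snapshot(path)
            if new != old:
                self.seen[path] = new
                changed.append(path)
        return changed
```

Nanosecond resolution matters here because two writes within the same second would be indistinguishable with whole-second timestamps.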
2020-06-03 www: remove smsg_mime API and adjust callers
To further simplify callers and avoid embarrassing memory explosions[1], we can finally eliminate this method in favor of smsg_eml. [1] commit 7d02b9e64455831d3bda20cd2e64e0c15dc07df5 ("view: stop storing all MIME objects on large threads") fixed a huge memory blowup.
2020-06-03 inbox: msg_by_*: remove $(size)ref args
None of our current callers care about the size of the blob we're retrieving, so stop wasting stack space and code for it.
2020-06-03 inbox: introduce smsg_eml method
The goal of this is to eventually remove the $smsg->{mime} field, which is easy to misuse and causes memory explosions that necessitated fixes like commit 7d02b9e64455831d ("view: stop storing all MIME objects on large threads").
2020-05-29 treat $INBOX_DIR/description and gitweb.owner as UTF-8
gitweb does the same with $GIT_DIR/description and gitweb.owner. Allowing UTF-8 description should not cause problems when used in responses to the NNTP "LIST NEWSGROUPS" request, either, since RFC 3977 section 7.6.6 recommends the description be UTF-8 (but does not require it). Link: https://public-inbox.org/meta/20200528151216.l7vmnmrs4ojw372g@sourcephile.fr/
2020-05-15 PublicInbox::Inbox.pm: Default unset address to a one element array
PublicInbox::Config.pm::_fill() assumes that address is an array. Therefore when handling an unset address use an array containing a single string, instead of a single string. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2020-05-09 replace most uses of PublicInbox::MIME with Eml
PublicInbox::Eml has enough functionality to replace the Email::MIME-based PublicInbox::MIME.
2020-04-19 reduce scope of mbox From_ line removal
It's unnecessary overhead for anything which does Email::MIME parsing. It was never done for v2 indexing, even though v1->v2 conversions did NOT remove those From_ lines. There was never a need to remove From_ lines in the v1 SearchIdx paths, either. Hitting a /$INBOX_URL/$MSGID/T/ endpoint with an 18 message thread reveals a ~0.5% speed improvement. This will become more apparent when we have a faster MIME parser.
2020-04-19 inbox: replace `eval {}' with `do {}' where appropriate
-Git->new and -Limiter->new will never fail unless there's an OOM, so using `eval' is incorrect.
2020-04-19 inbox: don't memoize missing description|cloneurl
It's probably common to have inboxes initially setup without these files properly configured, so don't memoize at that stage.
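The idea of memoizing only once the file actually exists can be sketched as follows. This is a Python illustration under stated assumptions, not the Inbox.pm code: the class name `InboxSketch`, the whitespace collapsing, and the placeholder string returned for a missing file are all assumptions made for the example.

```python
import os

# Assumed placeholder text; the real module may use a different string.
MISSING = "($INBOX_DIR/description missing)"


class InboxSketch:
    """Hypothetical stand-in for an inbox object with a cached description."""

    def __init__(self, inboxdir):
        self.inboxdir = inboxdir
        self._description = None  # filled in only after a successful read

    def description(self):
        if self._description is not None:
            return self._description
        path = os.path.join(self.inboxdir, "description")
        try:
            with open(path, encoding="utf-8") as fh:
                # Collapse runs of whitespace (including newlines).
                text = " ".join(fh.read().split())
        except FileNotFoundError:
            text = ""
        if text:
            self._description = text  # memoize real content only
            return text
        return MISSING  # not cached, so a later call retries the read
```

With this shape, an admin who adds the description file after the daemon has started sees it picked up on the next call, without a restart.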
2020-03-26 inbox: altid_map becomes a method
We want to be able to preload that, as well as to access it in WwwText for a config comment in the config example.
2020-03-22 rename PublicInbox::SearchMsg => PublicInbox::Smsg
Since the introduction of over.sqlite3, SearchMsg is not tied to our search functionality in any way, so stop confusing ourselves and future hackers by just calling it "PublicInbox::Smsg". Add a missing "use" in ExtMsg while we're at it.
2020-02-06 treewide: run update-copyrights from gnulib for 2019
I didn't wait until September to do it, this year!
2020-02-04 inbox: remove TODO item for msg_by_path
It's an old function which only gets called by inboxes w/o SQLite indices.
2020-02-04 inbox: simplify ->description and ->cloneurl
We can use "//=" from Perl 5.10 to simplify the logic for these methods. The use of chomp() in ->cloneurl was also unnecessary since split(/\s+/s,...) already removes newlines.
2020-01-27 inbox: add ->version method
This allows us to simplify version checking by avoiding "//" or "||" operators sprinkled around.
2020-01-24 inbox: simplify filtering for duplicate NNTP URLs
And add a note to remind ourselves to use List::Util::uniq when it becomes common.
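For readers unfamiliar with List::Util::uniq, it drops repeated values while keeping the first occurrence of each in its original position. A rough Python equivalent, shown only to illustrate the behaviour (the function name `uniq` and the example URLs are made up for this sketch):

```python
def uniq(items):
    """Drop duplicates while keeping the first occurrence of each item."""
    # dict preserves insertion order, so the keys come back in
    # first-seen order with repeats removed.
    return list(dict.fromkeys(items))


urls = [
    "nntp://news.example.com/inbox.test",
    "nntp://mirror.example.org/inbox.test",
    "nntp://news.example.com/inbox.test",
]
unique_urls = uniq(urls)
```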
2020-01-13 ds: add an in_loop() function for Inbox.pm use
Inbox.pm accessing the $in_loop variable directly raises warnings when Inbox is loaded without DS.
2020-01-11 inbox: use PublicInbox::Git::host_prefix_url for base_url
Better not to duplicate the same logic across different classes. Also, our git wrapper class is a strange place for host_prefix_url, but it needs to be usable for coderepos, so it's there, for now...
2020-01-02 config: support multi-value inbox.*.*url
Since the beginning of this project, we've implicitly supported inboxes with multiple URLs by relying on the Host: header sent by the client ($env->{HTTP_HOST}). We now offer the option to explicitly configure multiple URLs for every inbox along with the ability to do a best-effort match for matching hostnames.
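One possible reading of "best-effort match" is: prefer a configured URL whose host matches the client's Host: header, and otherwise fall back to the first configured URL. The Python sketch below illustrates only that reading. The function name `pick_url`, the fallback rule, and the example URLs are assumptions, not the logic actually used by the project.

```python
from urllib.parse import urlsplit


def pick_url(urls, http_host):
    """Return the configured URL that best matches the Host: header."""
    if not urls:
        return None
    if http_host:
        want = http_host.lower()
        for url in urls:
            # netloc includes the port, mirroring what a Host: header carries.
            if urlsplit(url).netloc.lower() == want:
                return url
    # Assumed fallback: no match, so use the first configured URL.
    return urls[0]


configured = ["https://a.example/test/", "http://b.example:8080/test/"]
```

For example, `pick_url(configured, "B.example:8080")` selects the second entry, while an unknown host falls back to the first.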
2019-12-15 inbox: fix periodic git process cleanup
We need to use $PublicInbox::DS::in_loop instead of ::running(). The latter is not valid for systems with signalfd or kqueue and is now gone, completely. Not needing periodic cleanups at all to deal with unlinked pack indices will be a tougher task...
2019-12-14 ds: move EvCleanup code into DS
EvCleanup only existed since Danga::Socket was a separate component, and cleanup code belongs with the event loop.
2019-10-16 config: support "inboxdir" in addition to "mainrepo"
"mainrepo" was a bad name and artifact from the early days when I intended for there to be a "spamrepo" (now just the ENV{PI_EMERGENCY} Maildir). With v2, "mainrepo" can be especially confusing, since v2 needs at least two git repositories (epoch + all.git) to function and we shouldn't confuse users by having them point to a git repository for v2. Much of our documentation already references "INBOX_DIR" for command-line arguments, so use "inboxdir" as the git-config(1)-friendly variant for that. "mainrepo" remains supported indefinitely for compatibility. Users may need to revert to old versions, or may be referring to old documentation and must not be forced to change config files to account for this change. So if you're using "mainrepo" today, I do NOT recommend changing it right away because other bugs can lurk. Link: https://public-inbox.org/meta/874l0ice8v.fsf@alyssa.is/
2019-09-09 run update-copyrights from gnulib for 2019
2019-06-14 rename reference to git epochs as "partitions"
Try to remain consistent with our own documentation regarding v2 git "epochs", first.
2019-06-14 search: require PublicInbox::Inbox ref here
No sense in supporting multiple methods of initialization for an internal class.
2019-06-04 require ASCII digits for local FS items
In case some BOFH decides to randomly create directories using non-ASCII digits all over the place.
2019-06-04 inbox: require ASCII digits for feedmax var
Don't waste more cycles than necessary if somebody decides to put non-ASCII digits in their ~/.public-inbox/config
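The distinction between "digits" and "ASCII digits" exists in more than one regex engine: a generic digit class can also match digits from other scripts, while an explicit `[0-9]` range cannot. The Python sketch below illustrates the stricter check for a feedmax-style setting. The function name `parse_feedmax` and the default of 25 are assumptions made for the example.

```python
import re

# Explicit range: only the ten ASCII digits, never other Unicode digits.
ASCII_UINT = re.compile(r"[0-9]+")


def parse_feedmax(value, default=25):
    """Return value as a positive int, or default if it is not ASCII digits."""
    if isinstance(value, str) and ASCII_UINT.fullmatch(value):
        n = int(value)
        if n > 0:
            return n
    return default


# Arabic-Indic digits for "10": accepted by \d, rejected by [0-9].
non_ascii = "\u0661\u0660"
loose_match = re.fullmatch(r"\d+", non_ascii) is not None
strict_value = parse_feedmax(non_ascii)
```

Here `loose_match` is true because `\d` matches Unicode decimal digits in Python string patterns, whereas `parse_feedmax` falls back to the default for the same input.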
2019-06-01 git: unconditional expiry
A constant stream of traffic to either httpd/nntpd would mean git-cat-file processes never expire. Things can go bad after a full repack, as a full repack will unlink old pack indices and git-cat-file does not currently detect unlinked files. We could do something complicated by recursively stat-ing objects/pack of every git directory and alternate; but that's probably not worth the trouble compared to occasionally restarting the cat-file process. So simplify the code and let httpd/nntpd expire them periodically, since spawning a "git-cat-file --batch" process isn't too expensive. We already spawn for every request which hits git-http-backend, cgit, and git-apply. In the future, we may optionally support the Git::Raw module to avoid IPC; but we must remain careful to not leave lingering FDs open to unlinked files after repack.
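The policy described above, expiring a helper process by age alone so that steady traffic cannot keep it alive forever, can be sketched like this. It is a Python illustration with hypothetical names (`ExpiringHandle`, `spawn`, `close`, `max_age`), not the project's code; the caller supplies the functions that start and stop the real process.

```python
import time


class ExpiringHandle:
    """Hypothetical wrapper: a lazily created resource that is closed once
    it reaches max_age seconds, no matter how recently it was used."""

    def __init__(self, spawn, close, max_age=60.0, clock=time.monotonic):
        self._spawn = spawn
        self._close = close
        self._max_age = max_age
        self._clock = clock
        self._handle = None
        self._born = 0.0

    def get(self):
        """Return the live handle, spawning one if needed.
        Using the handle deliberately does NOT reset its age."""
        if self._handle is None:
            self._handle = self._spawn()
            self._born = self._clock()
        return self._handle

    def cleanup(self):
        """Call periodically; closes the handle once it is old enough."""
        if self._handle is None:
            return False
        if self._clock() - self._born < self._max_age:
            return False
        self._close(self._handle)
        self._handle = None
        return True
```

The trade-off is an occasional respawn in exchange for never holding descriptors to files that a repack has since unlinked.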
2019-05-23 doc: various updates to reflect current state
-index documentation avoids redundant v1 information and refers readers to appropriate v1/v2 manpages. Search::Xapian can also be optional, now, as only the PSGI search interface uses it. Favor "INBOX_DIR" where appropriate, since "REPO_DIR" can be confused for code repos which we also support. XAPIAN_FLUSH_THRESHOLD is documented for all relevant bulk commands.
2019-05-21 Merge remote-tracking branch 'origin/xap-optional' into master
* origin/xap-optional:
  admin: improve warnings and errors for missing modules
  searchidx: do not create empty Xapian partitions for basic
  lazy load Xapian and make it optional for v2
  www: use Inbox->over where appropriate
  nntp: use Inbox->over directly
  inbox: add ->over method to ease access
2019-05-15 remove hard Devel::Peek dependency and lazy load for daemons
It's only useful for a corner case in long-running daemons when an admin decides to compact or vacuum a Xapian or SQLite DB. As a result, other scripts should run slightly faster. For instance, this saves about 80ms (2.710s => 2.630s) in t/mda.t on my remote workstation. While we're at it, make sure EvCleanup is properly require'd in Daemon.pm and HTTP.pm and document our use of Devel::Peek.
2019-05-15 inbox: remove POSIX strftime import
We no longer need it since Inbox->recent hits the overview DB instead of Xapian.
2019-05-15 lazy load Xapian and make it optional for v2
More tests work without Search::Xapian, now. Usability issues still need to be fixed.
2019-05-15 www: use Inbox->over where appropriate
We don't need to rely on Xapian search functionality for the majority of the WWW code, even. subject_normalized is moved to SearchMsg, where it (probably) makes more sense, anyways.
2019-05-15 inbox: add ->over method to ease access
One small step towards making installing Xapian optional for v2 and providing more WWW and NNTP functionality without it.
2019-04-19 www: support listing of inboxes
We will still return a 404 by default to '/' for compatibility with users of Plack::App::Cascade or similar. Inboxes are sorted by modification times to help users detect activity (similar to the /$INBOX/ topic view).

New configuration options:

* publicinbox.wwwlisting - configure the listing type
* publicinbox.<name>.hide - hide a particular inbox from the listing

See changes to public-inbox-config.pod for full descriptions of the new options.

Requested-by: Leah Neukirchen <leah@vuxu.org> https://public-inbox.org/meta/871sdfzy80.fsf@gmail.com/
2019-04-18 inbox: add `modified' sub
For inboxes with SQLite enabled (all v2, and probably most v1), we can use the overview DB to get the timestamp of the latest message. It's faster than scanning git branches for commit times, but not always the same.
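Getting the newest timestamp from an SQLite overview table is a single aggregate query. The sketch below uses Python's sqlite3 module purely as an illustration: the table name `over`, the `ts` column, and the function name `modified` are assumptions for this example and are not taken from the commit.

```python
import sqlite3


def modified(conn):
    """Return the newest message timestamp, or None if the table is empty."""
    # MAX() over an empty table yields a single row containing NULL.
    row = conn.execute("SELECT MAX(ts) FROM over").fetchone()
    return row[0]


# Build a tiny in-memory table with the assumed schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE over (num INTEGER PRIMARY KEY, ts INTEGER)")
empty_result = modified(conn)
conn.executemany(
    "INSERT INTO over (num, ts) VALUES (?, ?)",
    [(1, 1555555555), (2, 1555559999), (3, 1555551111)],
)
latest = modified(conn)
```

The result can differ from git commit times because the stored timestamp is taken from the message rather than from when the commit was made, which fits the "not always the same" caveat above.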
2019-01-31 inbox: drop psgi.url_scheme requirement from base_url
This will make it easier to make command-line tools from SolverGit.
2019-01-31 inbox: perform cleanup of Git objects for coderepos
Otherwise, long-running but idle git processes may keep unlinked packs around indefinitely and waste disk space.
2019-01-08 view: more culling for search threads
{mapping} overhead is now down to ~1.3M at the end of a giant thread from hell.
2019-01-02 inbox: keep Danga::Socket optional
We can't run cleanup stuff without Danga::Socket.
2018-04-18 extmsg: remove expensive git path checks
Searching across different inboxes is expensive without SQLite (or Xapian) installed, so avoid doing expensive tree lookups in git. Since SQLite is required for Xapian support anyways, we won't need to check Xapian, either. Sites without SQLite installed will simply 404 if somebody requests a message which isn't in the current inbox.
2018-04-06 www: favor reading more from SQLite, and less from Xapian
Favor simpler internal APIs this time around; this cuts a fair amount of code out and takes another step towards removing Xapian as a dependency for v2 repos.