about summary refs log tree commit homepage
path: root/lib/PublicInbox/Inbox.pm
DateCommit message (Collapse)
2020-02-06treewide: run update-copyrights from gnulib for 2019
I didn't wait until September to do it, this year!
2020-02-04inbox: remove TODO item for msg_by_path
It's an old function which only gets called by inboxes w/o SQLite indices.
2020-02-04inbox: simplify ->description and ->cloneurl
We can use "//=" from Perl 5.10 to simplify the logic for these methods. The use of chomp() in ->cloneurl was also unnecessary since split(/\s+/s,...) already removes newlines.
2020-01-27inbox: add ->version method
This allows us to simplify version checking by avoiding "//" or "||" operators sprinkled around.
2020-01-24inbox: simplify filtering for duplicate NNTP URLs
And add a note to remind ourselves to use List::Util::uniq when it becomes common.
2020-01-13ds: add an in_loop() function for Inbox.pm use
Inbox.pm accessing the $in_loop variable directly raises warnings when Inbox is loaded without DS.
2020-01-11inbox: use PublicInbox::Git::host_prefix_url for base_url
Better not to duplicate the same logic across different classes. Also, our git wrapper class is a strange place for host_prefix_url, but it needs to be usable for coderepos, so it's there, for now...
2020-01-02config: support multi-value inbox.*.*url
Since the beginning of this project, we've implicitly supported inboxes with multiple URLs by relying on the Host: header sent by the client ($env->{HTTP_HOST}). We now offer the option to explicitly configure multiple URLs for every inbox along with the ability to do a best-effort match for matching hostnames.
2019-12-15inbox: fix periodic git process cleanup
We need to use $PublicInbox::DS::in_loop instead of ::running(). The latter is not valid for systems with signalfd or kqueue and is now gone, completely. Not needing periodic cleanups at all to deal with unlinked pack indices will be a tougher task...
2019-12-14ds: move EvCleanup code into DS
EvCleanup only existed since Danga::Socket was a separate component, and cleanup code belongs with the event loop.
2019-10-16config: support "inboxdir" in addition to "mainrepo"
"mainrepo" ws a bad name and artifact from the early days when I intended for there to be a "spamrepo" (now just the ENV{PI_EMERGENCY} Maildir). With v2, "mainrepo" can be especially confusing, since v2 needs at least two git repositories (epoch + all.git) to function and we shouldn't confuse users by having them point to a git repository for v2. Much of our documentation already references "INBOX_DIR" for command-line arguments, so use "inboxdir" as the git-config(1)-friendly variant for that. "mainrepo" remains supported indefinitely for compatibility. Users may need to revert to old versions, or may be referring to old documentation and must not be forced to change config files to account for this change. So if you're using "mainrepo" today, I do NOT recommend changing it right away because other bugs can lurk. Link: https://public-inbox.org/meta/874l0ice8v.fsf@alyssa.is/
2019-09-09run update-copyrights from gnulib for 2019
2019-06-14rename reference to git epochs as "partitions"
Try to remain consistent with our own documentation regarding v2 git "epochs", first.
2019-06-14search: require PublicInbox::Inbox ref here
No sense in supporting multiple methods of initialization for an internal class.
2019-06-04require ASCII digits for local FS items
In case some BOFH decides to randomly create directories using non-ASCII digits all over the place.
2019-06-04inbox: require ASCII digits for feedmax var
Don't waste more cycles than necessary if somebody decides to put non-ASCII digits in their ~/.public-inbox/config
2019-06-01git: unconditional expiry
A constant stream of traffic to either httpd/nntpd would mean git-cat-file processes never expire. Things can go bad after a full repack, as a full repack will unlink old pack indices and git-cat-file does not currently detect unlinked files. We could do something complicated by recursively stat-ing objects/pack of every git directory and alternate; but that's probably not worth the trouble compared to occasionally restarting the cat-file process. So simplify the code and let httpd/nntpd expire them periodically, since spawning a "git-cat-file --batch" process isn't too expensive. We already spawn for every request which hits git-http-backend, cgit, and git-apply. In the future, we may optionally support the Git::Raw module to avoid IPC; but we must remain careful to not leave lingering FDs open to unlinked files after repack.
2019-05-23doc: various updates to reflect current state
-index documentation avoid redundant v1 information and refers readers to apropriate v1/v2 manpages. Search::Xapian can also be optional, now, as only the PSGI search interface uses it. Favor "INBOX_DIR" where appropriate, since "REPO_DIR" can be confused for code repos which we also support. XAPIAN_FLUSH_THRESHOLD is documented for all relevant bulk commands.
2019-05-21Merge remote-tracking branch 'origin/xap-optional' into master
* origin/xap-optional: admin: improve warnings and errors for missing modules searchidx: do not create empty Xapian partitions for basic lazy load Xapian and make it optional for v2 www: use Inbox->over where appropriate nntp: use Inbox->over directly inbox: add ->over method to ease access
2019-05-15remove hard Devel::Peek dependency and lazy load for daemons
It's only useful for a corner case in long-running daemons when an admin decides to compact or vacuum a Xapian or SQLite DB. As a result, other scripts should run slightly faster. For instance, this saves about 80ms (2.710s => 2.630s) in t/mda.t on my remote workstation. While we're at it, make sure EvCleanup is properly require'd in Daemon.pm and HTTP.pm and document our use of Devel::Peek.
2019-05-15inbox: remove POSIX strftime import
We no longer need it since Inbox->recent hits the overview DB instead of Xapian
2019-05-15lazy load Xapian and make it optional for v2
More tests work without Search::Xapian, now. Usability issues still need to be fixed
2019-05-15www: use Inbox->over where appropriate
We don't need to rely on Xapian search functionality for the majority of the WWW code, even. subject_normalized is moved to SearchMsg, where it (probably) makes more sense, anyways.
2019-05-15inbox: add ->over method to ease access
One small step towards making installing Xapian optional for v2 and providing more WWW and NNTP functionality without it.
2019-04-19www: support listing of inboxes
We will still return a 404 by default to '/' for compatibility with users of Plack::App::Cascade or similar. Inboxes are sorted by modification times to help users detect activity (similar to the /$INBOX/ topic view). New configuration options: * publicinbox.wwwlisting - configure the listing type * publicinbox.<name>.hide - hide a particular inbox from the listing See changes to public-inbox-config.pod for full descriptions of the new options. Requested-by: Leah Neukirchen <leah@vuxu.org> https://public-inbox.org/meta/871sdfzy80.fsf@gmail.com/
2019-04-18inbox: add `modified' sub
For inboxes with SQLite enabled (all v2, and probably most v1); we can use the overview DB to get the timestamp of the latest message. It's faster than scanning git branches for commit times, but not always the same.
2019-01-31inbox: drop psgi.url_scheme requirement from base_url
This will make it easier to make command-line tools from SolverGit.
2019-01-31inbox: perform cleanup of Git objects for coderepos
Otherwise, long-running but idle git processes may keep unlinked packs around indefinitely and waste disk space.
2019-01-08view: more culling for search threads
{mapping} overhead is now down to ~1.3M at the end of a giant thread from hell.
2019-01-02inbox: keep Danga::Socket optional
We can't run cleanup stuff without Danga::Socket.
2018-04-18extmsg: remove expensive git path checks
Searching across different inboxes is expensive without SQLite (or Xapian) installed, so avoid doing expensive tree lookups in git. Since SQLite is required for Xapian support anyways, we won't need to check Xapian, either. Sites without SQLite installed will simply 404 if somebody requests a message which isn't in the current inbox.
2018-04-06www: favor reading more from SQLite, and less from Xapian
Favor simpler internal APIs this time around, this cuts a fair amount of code out and takes another step towards removing Xapian as a dependency for v2 repos.
2018-04-03view: avoid offset during pagination
OFFSET in SQLite gets painful to deal with. Instead, rely on timestamps (from Received:) for pagination. This also sets us up for more precise Date searching in case we want it.
2018-04-02replace Xapian skeleton with SQLite overview DB
This ought to provide better performance and scalability which is less dependent on inbox size. Xapian does not seem optimized for some queries used by the WWW homepage, Atom feeds, XOVER and NEWNEWS NNTP commands. This can actually make Xapian optional for NNTP usage, and allow more functionality to work without Xapian installed. Indexing performance was extremely bad at first, but DBI::Profile helped me optimize away problematic queries.
2018-03-30feed: optimize query for feeds, too
This is a smaller improvement than the landing /$INBOX/ page because full message bodies are shown; but still saves around 100ms for my system with LKML.
2018-03-30view: speed up homepage loading time with date clamp
This saves over 400ms on my system with the full LKML with over 2.8 million messages.
2018-03-29www: cleanup expensive fallback for legacy URLs
Back in the day, we compressed long Message-IDs to SHA-1 hexdigests for the URL. This now redirects to a 301 in the hopes we can remove these checks some day to reduce overhead.
2018-03-29search: get rid of most lookup_* subroutines
Too many similar functions doing the same basic thing was redundant and misleading, especially since Message-ID is no longer treated as a truly unique identifier. For displaying threads in the HTML, this makes it clear that we favor the primary Message-ID mapped to an NNTP article number if a message cannot be found.
2018-03-29lookup by Message-ID favors the "primary" one
The Message-ID mapped to an NNTP article number is stronger, so we will favor that for attachment lookups.
2018-03-29www: remove unnecessary ghost checks
We do not need to care about ghosts at multiple call sites; they cannot have a {blob} field and we've stored the blob field in Xapian since SCHEMA_VERSION=13.
2018-03-27www: support cloning individual v2 git partitions
This will require multiple client invocations, but should reduce load on the server and make it easier for readers to only clone the latest data. Unfortunately, supporting a cloneurl file for externally-hosted repos will be more difficult as we cannot easily know if the clones use v1 or v2 repositories, or how many git partitions they have.
2018-03-27view: depend on SearchMsg for Message-ID
Since we need to handle messages with multiple and duplicate Message-ID headers, our thread skeleton display must account for that. Since we have a "preferred" Message-ID in case of conflicts, use it as the UUID in an Atom feed so readers do not get confused by conflicts.
2018-03-22fix syntax warnings
I keep forgetting to run "make syntax"
2018-03-22msgmap: add tmp_clone to create an anonymous copy
This will be used to keep track of Message-ID <-> NNTP Article numbers to prevent article number reuse when reindexing.
2018-03-02evcleanup: disable outside of daemon
We'll be using these in a more OO manner for V2Writable (which doesn't use Danga::Socket), so lets not unnecessarily register cleanup handlers intended for network daemons.
2018-02-28v2/ui: retry DB reopens in a few more places
Relying more on Xapian requires retrying reopens in more places to ensure it does not fall down and show errors to the user.
2018-02-28v2/ui: some hacky things to get the PSGI UI to show up
Fortunately, Xapian multiple database support makes things easier but we still need to handle the skeleton DB separately.
2018-02-20v2: support Xapian + SQLite indexing
This is too slow, currently. Working with only 2017 LKML archives: git-only: ~1 minute git + SQLite: ~12 minutes git+Xapian+SQlite: ~45 minutes So yes, it looks like we'll need to parallelize Xapian indexing, at least.
2018-02-07update copyrights for 2018
Using update-copyrights from gnulib While we're at it, use the SPDX identifier for AGPL-3.0+ to ease mechanical processing.
2017-01-11inbox: reinstate periodic cleanup of Xapian and SQLite objects
We may need to do this even more aggressively, since the Xapian database does not always give the latest results. This time, we'll do it without relying on weak references, and instead check refcounts.