path: root/lib/PublicInbox/WwwListing.pm
Date        Commit message
2023-11-30  inbox: shrink data structures for publicinbox.*.hide
We no longer vivify the intermediate $ibx->{-hide} hashref; instead, we use $ibx->{-hide_$KEY} directly. This saves one hashref per inbox and the extra hash table lookups that came with it.
2023-11-30  www_listing: support publicInbox.nameIsUrl
This is a convenient (and slightly memory-saving) alternative to specifying a `publicinbox.*.url' entry for every single inbox when using publicinbox.wwwListing.
2022-12-15  www_listing: drop "sort options + mbox downloads" bit
The sort options and mbox downloads only apply to individual inbox search endpoints, and they make no sense for the listing of inboxes themselves.
2022-09-10  www_listing: switch to `print $zfh'
Again, ->deflate (and thus ->zmore) calls are relatively expensive compared to `print' ops using PerlIO::scalar behind-the-scenes. While I can likely optimize the `join' away here, too, that will happen in a future commit.
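
The trade-off described above can be sketched roughly as follows. This is not the code from the commit; the row contents and variable names other than $zfh are made up for illustration. Many small `print' calls go to an in-memory filehandle (PerlIO::scalar), and the buffer is then compressed with a single ->deflate call via Compress::Raw::Zlib.

```perl
use strict;
use warnings;
use Compress::Raw::Zlib qw(Z_OK);

# Hypothetical HTML fragments standing in for listing rows.
my @rows = map { "<a href=\"inbox$_/\">inbox$_</a>\n" } 1..1000;

# Buffer the many small writes in an in-memory filehandle (PerlIO::scalar).
open(my $zfh, '>', \(my $buf = '')) or die "open: $!";
print $zfh '<pre>', @rows, '</pre>';
close($zfh) or die "close: $!";

# Compress the whole buffer with one ->deflate call plus ->flush.
my ($gz, $err) = Compress::Raw::Zlib::Deflate->new(
	-WindowBits => 15 + 16, # 15 + 16 requests a gzip header and trailer
	-AppendOutput => 1,     # append to $out instead of overwriting it
);
$err == Z_OK or die "Deflate->new failed: $err";
my $out = '';
$gz->deflate($buf, $out) == Z_OK or die 'deflate failed';
$gz->flush($out) == Z_OK or die 'flush failed';

printf "%d bytes of HTML, %d bytes gzipped\n", length($buf), length($out);
```

The design choice being illustrated is only the batching: one compression call per response chunk instead of one per small string.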
2022-09-10  www_listing: avoid unnecessary work for common cases
We need to branch for non-empty `q=' parameters anyways, but `q=' is usually empty/unset. While we're in the area, `chomp' reads `$/' while `chop' is simpler. Furthermore, we can shave a few bytes off the form HTML by omitting spaces before `/>' and placing `\n' to wrap long lines before attribute names.
2022-09-10  www_listing: consolidate some ->zmore dispatches
`.' concatenation is still faster for small strings, but passing an array to ->zmore is more efficient for large search results and full listings.
2021-09-26  www_listing: support /all/ search as a 302 redirect
This allows users to search /all/ from the top-level WwwListing without extra manual steps, although extra network round trips are still incurred. No vertical whitespace is added, and there are no clumsy radio buttons or menus to deal with. Users only have to use a different <input type=submit /> button. I forgot how to do this until I realized we already do something similar with multiple submit buttons for threaded vs non-threaded mboxrd.gz downloads.
Link: https://public-inbox.org/meta/20210827120845.29682-1-e@80x24.org/
2021-08-31  www_listing: add note about mirroring information
Perhaps this can be expanded to include grokmirror information in the future. For now, just give a hint about the "mirror" link for each inbox.
2021-08-30  www: move mirror instructions to /text/
This makes the mirroring and code retrieval instructions less obstructive. Relying on WwwText means we only use our Linkify module to make hrefs of full URLs; making relative and shortened hrefs off-limits; hopefully this isn't too much of a problem. coderepo information remains duplicated on every page since (IMHO) coderepos are an important feature; but nobody besides me has ever bothered to configure coderepos, so I suppose it's fine...
Suggested-by: Konstantin Ryabitsev <konstantin@linuxfoundation.org>
Link: https://public-inbox.org/meta/20210826132747.6gxuwnhftyf7c6hp@nitro.local/
2021-08-28  www: avoid potential auto-vivification on ibx->{url}
This may fix problems with the "all" link disappearing.
Link: https://public-inbox.org/meta/CAMwyc-Tw=v5yT1U1U66GSwwTK8OJXv8_YDu-=oXbZO3tHSnYWw@mail.gmail.com/
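
For readers unfamiliar with the pitfall, here is a minimal sketch of how auto-vivification can occur in general. It is not taken from the commit itself, and the hashrefs below are hypothetical stand-ins for an inbox object. Reading through a missing key with a nested dereference creates that key as a side effect, so later `exists' or truthiness checks see a different result.

```perl
use strict;
use warnings;

# Hypothetical inbox-like hashref with no {url} configured.
my $ibx = { name => 'example' };

# Even a plain read through the missing key creates $ibx->{url} = [].
my $first = $ibx->{url}->[0];
my $vivified = exists $ibx->{url} ? 1 : 0;

# A guarded read leaves the hash untouched.
my $ibx2 = { name => 'example' };
my $url = $ibx2->{url} ? $ibx2->{url}->[0] : undef;
my $untouched = exists $ibx2->{url} ? 0 : 1;

print "vivified=$vivified untouched=$untouched\n"; # vivified=1 untouched=1
```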
2021-08-28  www_listing: fix odd "locate inbox" cases
Searching inboxes with an empty query no longer gives 500 errors due to Xapian. Also, improve the error message when no inboxes match, since saying no inboxes exist yet is wrong.
2021-08-28  www_listing: show ->ALL at top of HTML listing
It's a special case and we can show it in the HTML display without affecting manifest.js.gz generation.
2021-08-28  get rid of unnecessary bytes::length usage
The only place where we could return wide characters with -httpd was the raw $INBOX_DIR/description text, which is now converted to octets. All daemon (HTTP/NNTP/IMAP) sockets are opened in binary mode, so length() and bytes::length() are equivalent on reads. For socket writes, any non-octet data would warn about wide characters and we are strict in warnings with test_httpd. All gzipped buffers are also octets, as is PublicInbox::Eml->body, and anything from PerlIO objects ("git cat-file --batch" output, filesystems), so bytes::length was unnecessary in all those places.
2021-08-26  wwwlisting: support global CSS in HTML view
Since CSS can be overridden by a static webserver on a per-inbox basis, we need a similar pattern to deal with the instance-wide WwwListing HTML. "/+/" probably won't conflict with any current nor future public inbox names. I don't think it'll cause problems with common linkifiers or URL extractors, either (and it's unlikely anybody would want to share URLs of just the CSS in a plain text(-like) format).
2021-06-29  www: fix manifest.js.gz for default publicInbox.grokManifest
ManifestJsGz->response was not invoking the new "url_filter" method properly. Furthermore, fix url_filter for returning 404 responses.
Reported-by: Kyle Meyer <kyle@kyleam.com>
Link: https://public-inbox.org/meta/87fsx3128a.fsf@kyleam.com/
Fixes: 520be116e8a686cb ("www_listing: start updating for pagination + search")
2021-06-24  www_listing: fix manifest.js.gz generation with extindex "all"
WwwListing and ManifestJsGz may be too different nowadays to be worth the code sharing between them. Update some comments and note we still need better tests :x
Fixes: 520be116e8a686cb ("www_listing: start updating for pagination + search")
2021-06-23  www_listing: start updating for pagination + search
When dealing with thousands of inboxes, displaying all of them on a single page isn't going to work. So steal some pagination and search results code from the message search to generate some basic HTML output that looks good in w3m.
2021-02-24  www: use PublicInbox::WwwStream
This prevents the following problem logged to the webserver's error log:

  E: Undefined subroutine &PublicInbox::WwwStream::code_footer called at /usr/share/perl5/PublicInbox/WwwListing.pm line 102.
  in PublicInbox::ConfigIter=ARRAY(0x557aea68b1a8)::each_section at /usr/share/perl5/PublicInbox/ConfigIter.pm line 37.

Fixes: 7a3946ef122e ("www: support listing of inboxes")
2021-01-01  update copyrights for 2021
Using "make update-copyrights" after setting GNULIB_PATH in my config.mak
2020-12-28  miscsearch: take reopen from Search and use it
As with ExtSearch, MiscSearch lacks a janky cleanup timer of PublicInbox::Inbox objects, leading to info about inboxes/newsgroups going stale. Fortunately, we don't use MiscSearch very heavily, yet. In the future, we may be able to detect new inboxes without having to SIGHUP or restart daemons using MiscSearch.
2020-12-09  rename {pi_config} fields to {pi_cfg}
{pi_config} may be confused with the documented `PI_CONFIG' environment variable, and we'll favor vowel-removal to be consistent with our usage of object references. The `pi_' prefix may stay in some places, for now; since a separate namespace may come into this codebase for local/private client-tooling. For InboxIdle, we'll also remove an invalid comment about holding a reference to the PublicInbox::Config object, too.
2020-09-10  wwwlisting: avoid hogging event loop
By using the just-introduced ConfigIter class. And make ManifestJsGz a subclass of it to reduce duplication.
2020-09-10  config: flatten each_inbox and iterate_start args
In Perl, we can simplify callers by passing a single array all the way down the stack instead of a single array ref which needs to be expanded every call.
2020-09-10  www: manifest.js.gz generation no longer hogs event loop
It's still as slow as before with hundreds/thousands of inboxes, but at least it's fair. Future changes will allow it to be cached and memoized with persistent HTTP servers.
2020-09-10  use "\&" where possible when referring to subroutines
"*foo" is ambiguous in that it may refer to a bareword file handle, so we'll use "\&foo" where we can without triggering warnings. PublicInbox::TestCommon::run_script_exit required dropping the prototype, however. We'll also future-proof by dropping "use warnings" in Cgit.pm and use the less-ambiguous "//=" in Inbox.pm while we're in the area.
2020-08-28  www: improve navigation around contemporary threads
Sometimes it's useful to quickly get to threads and messages which are contemporaries of the current thread/message being focused on. This hopefully improves navigation by:

a) making the top line (where $INBOX_DIR/description is shown) a link to the latest topics in search results and per-thread/per-message views

b) providing a link to contemporaries ("~YYYY-MM-DD") around the thread overview skeleton area for per-thread and per-message views
2020-07-30  wwwlisting: fix grep call for match=domain filtering
The grep call in list_match_domain_i returns true for all inboxes, even ones without a URL that matches the regular expression, because the qr value passed to grep is not surrounded by slashes. Add them.
Fixes: 1988d730c0088e8b (config: support multi-value inbox.*.*url)
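
The class of bug described above can be reproduced in a few lines. This is an illustrative sketch, not the code from list_match_domain_i; the URLs and the pattern are hypothetical. With grep(EXPR, LIST), a bare qr// object is evaluated as an ordinary value, which is always true, so nothing is filtered out.

```perl
use strict;
use warnings;

# Hypothetical URLs and pattern, for illustration only.
my @urls = ('https://example.com/a/', 'https://example.org/b/');
my $re = qr!\A(?:https?:)?//example\.com/!i;

# Buggy form: $re itself is the test expression; a qr// object is a
# true value, so every element is kept.
my @buggy = grep($re, @urls);

# Fixed form: /$re/ matches the pattern against each element ($_).
my @fixed = grep(/$re/, @urls);

print scalar(@buggy), " vs ", scalar(@fixed), "\n"; # prints "2 vs 1"
```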
2020-07-06  www: start making gzipfilter the parent response class
Virtually all of our responses are going to be gzipped, anyways. This will allow us to utilize zlib as a buffering layer and share common code for async blob retrieval responses. To streamline this and allow GzipFilter to be a parent class, we'll replace the NoopFilter with a similar CompressNoop class which emulates the two Compress::Raw::Zlib::Deflate methods we use. This drops a bunch of redundant code and will hopefully make upcoming WwwStream changes easier to reason about.
2020-07-06  {gzip,noop}filter: ->zmore returns undef, always
This simplifies callers, as witnessed by the change to WwwListing. It adds overhead to NoopFilter, but NoopFilter should see little use as nearly all HTTP clients request gzip.
2020-07-06  wwwlisting: use GzipFilter for HTML
The changes to GzipFilter here may be beneficial for building HTML and XML responses in other places, too.
2020-06-03  wwwlisting: utf8::decode before undef
Assisted by commit a73957b5b05f2a00f7a85353b1658b6d8cde05ae ("testcommon: speed up wait_for_tail() on GNU/Linux")
Fixes: 846161e3d1207d59 ("treat $INBOX_DIR/description and gitweb.owner as UTF-8")
2020-05-29  treat $INBOX_DIR/description and gitweb.owner as UTF-8
gitweb does the same with $GIT_DIR/description and gitweb.owner. Allowing UTF-8 description should not cause problems when used in responses to the NNTP "LIST NEWSGROUPS" request, either, since RFC 3977 section 7.6.6 recommends the description be UTF-8 (but does not require it).
Link: https://public-inbox.org/meta/20200528151216.l7vmnmrs4ojw372g@sourcephile.fr/
2020-04-22  make zlib-related modules a hard dependency
This allows us to simplify some of our existing code and make future changes easier. I doubt anybody goes through the trouble to have a Perl installation without zlib support. The zlib source code is even bundled with Perl since 5.9.3 for systems without existing zlib development headers and libraries. Of course, zlib is also a requirement of git, too; and we're not going to stop using git :) [squashed: "wwwaltid: use gzipfilter up front"]
2020-03-21  wwwlisting: use first successfully loaded JSON module
And not the last... I only noticed this since JSON::PP::Boolean was spewing redefinition warnings via overload.pm
Fixes: 8fb8fc52420ef669 ("wwwlisting: avoid lazy loading JSON module")
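
A "first successfully loaded" fallback loop might look like the sketch below. The candidate list and the $json_class variable are assumptions made for illustration and are not taken from the commit; the point is the `last' that stops the loop once one module loads, so later candidates are never loaded on top of it.

```perl
use strict;
use warnings;

# Assumed candidate list, in order of preference.
my $json_class;
for my $mod (qw(JSON::MaybeXS JSON JSON::PP)) {
	(my $file = "$mod.pm") =~ s!::!/!g; # JSON::PP -> JSON/PP.pm
	eval { require $file; 1 } or next;  # not installed, try the next one
	$json_class = $mod;
	last; # stop at the first success instead of loading the rest
}
defined $json_class or die 'no JSON module available';
print "using $json_class\n";
```

JSON::PP ships with Perl since 5.14, so on such installations the loop finds at least one candidate.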
2020-03-20  wwwlisting: avoid lazy loading JSON module
We already lazy-load WwwListing for the CGI script, and hiding another layer of lazy-loading makes it difficult to do WWW->preload. We want long-lived processes to do all long-lived allocations up front to avoid fragmentation in the allocator, but we'll still support short-lived processes by lazy-loading individual modules in the PublicInbox::* namespace. Mixing up allocation lifetimes (e.g. doing immortal allocations while a large amount of space is taken by short-lived objects) will cause fragmentation in any allocator which favors large contiguous regions for performance reasons. This includes any malloc implementation which relies on sbrk() for the primary heap, including glibc malloc.
2020-03-20  wwwlisting: favor "use" over require
"use" is also evaluated earlier than "require", so it is favorable for compile-only checking.
2020-02-06  treewide: run update-copyrights from gnulib for 2019
I didn't wait until September to do it, this year!
2020-01-27  linkify: move to_html over from ViewDiff
We use the same idiom in many places for doing two-step linkification and HTML escaping. Get rid of an outdated comment in flush_quote while we're at it.
2020-01-11  spawn (and thus popen_rd) die on failure
Most spawn and popen_rd callers die on failure to spawn, anyways, and some are missing checks entirely. This saves us a bunch of verbose error-checking code in callers. This also makes popen_rd more consistent, since it already dies on pipe creation failures.
2020-01-06  hval: export prurl and add prototype
This allows us to do some compile-time checking and fills in a missing "use" in PublicInbox::NewsWWW, allowing it to be used standalone and independently of PublicInbox::WWW
2020-01-02  config: support multi-value inbox.*.*url
Since the beginning of this project, we've implicitly supported inboxes with multiple URLs by relying on the Host: header sent by the client ($env->{HTTP_HOST}). We now offer the option to explicitly configure multiple URLs for every inbox along with the ability to do a best-effort match for matching hostnames.
2020-01-02  wwwlisting: show configured "infourl" properly
git's config file keys lack underscores, but my mind is wired for underscores :x. Fix the whitespace around the info URL while we're at it, so that it shows up right under the inbox description.
2019-12-27  config: each_inbox: pass user arg to callback
Another place where we can replace anonymous subs with named subs by passing a user-supplied arg.
2019-12-26  wwwlisting: do not rely on $? after ProcessPipe::CLOSE
ProcessPipe::CLOSE won't reliably set $? inside the event loop if waitpid(..., WNOHANG) isn't successful. So use a blocking waitpid() call, here, and hope "git show-ref" exits promptly since we've already drained its stdout.
2019-12-24  remove "no warnings 'once'" in a few places
We can use "use" to get the namespace into the "BEGIN" phase of the interpreter. While we're at it, use \&coderef syntax explicitly instead of globbing everything.
2019-10-30  wwwlisting: fix spelling and clarify sub location
Spell "Schwartzian" correctly, and clarify the location of "modified" since we have multiple subs named "modified"
2019-10-16  config: support "inboxdir" in addition to "mainrepo"
"mainrepo" was a bad name and an artifact from the early days when I intended for there to be a "spamrepo" (now just the ENV{PI_EMERGENCY} Maildir). With v2, "mainrepo" can be especially confusing, since v2 needs at least two git repositories (epoch + all.git) to function and we shouldn't confuse users by having them point to a git repository for v2. Much of our documentation already references "INBOX_DIR" for command-line arguments, so use "inboxdir" as the git-config(1)-friendly variant for that. "mainrepo" remains supported indefinitely for compatibility. Users may need to revert to old versions, or may be referring to old documentation and must not be forced to change config files to account for this change. So if you're using "mainrepo" today, I do NOT recommend changing it right away because other bugs can lurk.
Link: https://public-inbox.org/meta/874l0ice8v.fsf@alyssa.is/
2019-06-23  manifest: v2 epoch descriptions based on inbox->description
The default $GIT_DIR/description (provided by git.git templates) isn't very useful for v2 epochs, so use the inbox description and suffix it with the epoch number if it's otherwise unnamed.
Requested-by: Konstantin Ryabitsev <konstantin@linuxfoundation.org>
https://public-inbox.org/meta/20190620190017.GA27175@chatter.i7.local/
2019-06-14  rename reference to git epochs as "partitions"
Try to remain consistent with our own documentation regarding v2 git "epochs", first.
2019-06-09  www: support $INBOX/git/$EPOCH.git for v2 cloning
And use it in manifest.js. To ease maintaining mirrors with grokmirror(1), we can accept a "git/" directory prefix before the epoch, and ".git" suffix after the epoch number. We maintain compatibility with "$INBOX/$EPOCH" cloning, of course, and it's still easier-to-type on the command-line.