about summary refs log tree commit homepage
path: root/lib/PublicInbox/WwwStream.pm
DateCommit message (Collapse)
2021-10-25gzip_filter: delay async wcb call
This will let us modify the response header later to set a proper charset for Content-Type when displaying raw messages. Cc: Thomas Weißschuh <thomas@t-8ch.de>
2021-09-16www_stream: note existence of IMAP and NNTP URLs
The "mirror" link may not clue users into the existence of NNTP and IMAP servers, so add a note about them (but don't list them, in case there are dozens of URLs :>).
2021-08-31www_stream: extra link to mirroring information in the footer
This may be redundant with the "mirror" link at the top right, but maybe people will miss one. Properly capitalize the "Code repositories" text while we're at it. Link: https://public-inbox.org/20210828175827.rgzwqbn7brl56oej@nitro.local/ Cc: Konstantin Ryabitsev <konstantin@linuxfoundation.org>
2021-08-30www_stream: description header links to top $INBOX_URL
Making the inbox description link back to the most recent per-inbox topics from text/ and $OID/s/ URLs seems useful, rather than keeping the description up there. Followup-to: 6c853f5256f3a324 ("www: improve navigation around contemporary threads")
2021-08-30www: move mirror instructions to /text/
This makes the mirroring and code retrieval instructions less obstructive. Relying on WwwText means we only use our Linkify module to make hrefs of full URLs; making relative and shortened hrefs off-limits; hopefully this isn't too much of a problem. coderepo information remains duplicated on every page since (IMHO) coderepos are an important feature; but nobody besides me has ever bothered to configure coderepos, so I suppose it's fine... Suggested-by: Konstantin Ryabitsev <konstantin@linuxfoundation.org> Link: https://public-inbox.org/meta/20210826132747.6gxuwnhftyf7c6hp@nitro.local/
2021-08-28www: avoid incorrect instructions for extindex
There's no way to clone an extindex, since there's no git storage associated with them. So attempt to link to the HTML listing of public-inboxes, instead.
2021-08-28www_stream: sh-friendly .onion URLs wrapping
The long v3 .onion URL was causing havoc on small mobile displays, so extract "hostname" into a variable which can still used as a Bourne shell snippet. While we're at it, include "torsocks" in the git command used for .onion URLs since that's the (near)-universal wrapper for Tor-ifying things (like git) which are dynamically linked to libc. Cc: Konstantin Ryabitsev <konstantin@linuxfoundation.org> Link: https://public-inbox.org/meta/20210816163654.c6gfzuezhji4l6s7@nitro.local/
2021-08-28get rid of unnecessary bytes::length usage
The only place where we could return wide characters with -httpd was the raw $INBOX_DIR/description text, which is now converted to octets. All daemon (HTTP/NNTP/IMAP) sockets are opened in binary mode, so length() and bytes::length() are equivalent on reads. For socket writes, any non-octet data would warn about wide characters and we are strict in warnings with test_httpd. All gzipped buffers are also octets, as is PublicInbox::Eml->body, and anything from PerlIO objects ("git cat-file --batch" output, filesystems), so bytes::length was unnecessary in all those places.
2021-05-04treewide: update to v3 Tor onions
v2 onions are insecure, deprecated and going away. v3 names are unfortunately longer and more difficult to remember, but should be more resistant to attack than v2 ones.
2021-03-17www: improve visibility of coderepos
By adding "+code" next to "mirror" at the top next to the search box. Instead of showing "/path/to/$FOO", showing "$FOO.git" makes it more obvious we're talking about a git repo, here, instead of some random directory.
2021-03-17config: lazy-load coderepos, support extindex
Extsearch objects are duck-types of Inbox objects, and are capable of supporting code repos all the same.
2021-03-17www_stream: add trailing slash for help and color links
This saves clients a redirect
2021-01-01update copyrights for 2021
Using "make update-copyrights" after setting GNULIB_PATH in my config.mak
2020-12-22wwwstream: show relative coderepo URLs correctly
Trying to link "foo.git" relative to the current URL usually does not provide correct results, so prefix it by going into the parent directory if an absolute (or protocol-relative) URL is not supplied.
2020-12-22support multiple CODE_URLs
public-inbox.org will expire in a few years, so ensure Tor .onions can be known before then.
2020-12-20wwwstream: linkify coderepo URLs
It seems like a good idea to get more cgit visibility.
2020-12-10www+nntp: deal with lack of addresses for ->ALL
Since extindex is an amalgamation of several inboxes, discerning an appropriate address for List-Post: would be expensive and most likely unnecessary. Some legacy/historical inboxes may have no active address, either, so don't attempt to set the List-Post header if no addresses are configured.
2020-12-09rename {pi_config} fields to {pi_cfg}
{pi_config} may be confused with the documented `PI_CONFIG' environment variable, and we'll favor vowel-removal to be consistent with our usage of object references. The `pi_' prefix may stay in some places, for now; since a separate namespace may come into this codebase for local/private client-tooling. For InboxIdle, we'll also remove an invalid comment about holding a reference to the PublicInbox::Config object, too.
2020-12-09treewide: replace {-inbox} with {ibx} for consistency
{ibx} is shorter and is the most prevalent abbreviation in indexing and IMAP code, and the `$ibx' local variable is already prevalent throughout. In general, the codebase favors removal of vowels in variable and field names to denote non-references (because references are "lighter" than non-references). So update WWW and Filter users to use the same code since it reduces confusion and may allow easier code sharing.
2020-12-05isearch: emulate per-inbox search with ->ALL
Using "eidx_key:" boolean prefix to limit results to a given inbox, we can use ->ALL to emulate and replace per-Inbox xap15/[0-9] search indices. With this change, the presence of "extindex.all.topdir" in the $PI_CONFIG will cause the WWW code to use that extindex and ignore per-inbox Xapian DBs in xap15/[0-9]. Unfortunately IMAP search still requires old per-inbox indices, for now. Mapping extindex Xapian docids to per-Inbox UIDs and vice-versa is proving tricky. Fortunately, IMAP search is rarely used and optional. The RFCs don't specify expensive phrase search, either, so `indexlevel=medium' can be used in per-inbox Xapian indices to save space. For primarily WWW (and future JMAP) users; this should result in significant disk space, FD, and page cache footprint savings for large instances with many inboxes and many cross-posted messages.
2020-09-16wwwstream: link to cgit URLs for coderepo
Hopefully this reduces the ambiguity between code for the project(s) using public-inbox and the code for public-inbox itself.
2020-09-10wwwstream: show init + index instructions for -V1, too
This should've always been there. I'm not sure how widely spread 1.0 and earlier releases were, but we'll keep documenting the version requirement.
2020-09-09wwwstream: fix "Atom feed" link
Oops, I wanted to stop escaping double-quotes with `qq()' but used `q()' instead :x Fixes: 2f61828fcb727e51 ("www: make mirror instructions more prominent")
2020-09-09www: make mirror instructions more prominent
In order to fight the misconception that public-inboxes are centralized, anchor "#mirror" to the clone instructions and place an emphasis on "mirror", not just cloning. While we're at it, better describe multi-epoch -V2 inboxes, since some users do not seem to realize epochs consist of different data.
2020-08-28www: improve navigation around contemporary threads
Sometimes it's useful to quickly get to threads and messages which are contemporaries of the current thread/message being focused on. This hopefully improves navigation by making: a) the top line (where $INBOX_DIR/description) is shown a link to the latest topics in search results and per-thread/per-message views. b) providing a link to contemporaries ("~YYYY-MM-DD") at around the thread overview skeleton area for per-thread and per-message views
2020-08-01www: rework async_* to use method table
Although the ->async_next method does not take $self as a receiver, but rather a PublicInbox::HTTP object, we may still retrieve it to be called with the HTTP object via UNIVERSAL->can.
2020-07-06www: update internal docs
We no longer favor getline+close for streaming PSGI responses when using public-inbox-httpd. We still support it for other PSGI servers, though.
2020-07-06wwwstream: eliminate ::response, use html_oneshot
All of our streaming responses use ::aresponse, now, and our synchronous responses use html_oneshot. So there's no need for the old WwwStream::response.
2020-07-06view: make /$INBOX/$MSGID/ permalink async
This will allow -httpd to handle other requusts if waiting on an HDD seek or git to decode a blob.
2020-07-06wwwstream: subclass off GzipFilter
This makes WwwStream closer to MboxGz and WwwAtomStream and will eventually allow us to follow the same patterns.
2020-07-06wwwstream: use parent.pm and no warnings
parent.pm is leaner than base and we'll rely on `-w' for warnings during development.
2020-07-06wwwstream: reduce blob fetch paths for ->getline
This will make it easier to support asynchronous blob retrievals. The `$ctx->{nr}' counter is no longer implicitly supplied since many users didn't care for it, so stack overhead is slightly reduced.
2020-07-06wwwstream: reduce object graph depth
Like with WwwAtomStream and MboxGz, we can bless the existing $ctx object directly to avoid allocating a new hashref. We'll also switch from "->" to "::" to reduce stack utilization.
2020-07-06gzipfilter: replace Compress::Raw::Deflate usages
The new ->zmore and ->zflush APIs make it possible to replace existing verbose usages of Compress::Raw::Deflate and simplify buffering logic for streaming large gzipped data. One potentially user visible change is we now break the mbox.gz response on zlib failures, instead of silently continuing onto the next message. zlib only seems to fail on OOM, which should be rare; so it's ideal we drop the connection anyways.
2020-07-06www*stream: gzip ->getline responses
Our most common endpoints deserve to be gzipped.
2020-07-06wwwstream: oneshot: perform gzip without middleware
Plack::Middleware::Deflater forces us to use a memory-intensive closure. Instead, work towards building compressed strings in memory to reduce the overhead of buffering large HTML output.
2020-03-30wwwstream::oneshot => html_oneshot
And use Exporter to make our life easier, since WwwAltId was using a non-existent PublicInbox::WwwResponse namespace in error paths which doesn't get noticed by `perl -c' or exercised by tests on normal systems. Fixes: 6512b1245ebc6fe3 ("www: add endpoint to retrieve altid dumps")
2020-03-25wwwstream: oneshot sets content-length
PublicInbox::HTTP will chunk, otherwise, and that's extra overhead which isn't needed.
2020-03-25extmsg: use WwwResponse::oneshot
No reason to use the ->getline interface for small responses.
2020-03-25wwwstream: introduce oneshot API to avoid ->getline
The ->getline API is only useful for limiting memory use when streaming responses containing multiple emails or log messages. However it's unnecessary complexity and overhead for callers (PublicInbox::HTTP) when there's only a single message.
2020-02-06treewide: run update-copyrights from gnulib for 2019
I didn't wait until September to do it, this year!
2020-01-27wwwstream: discard single-use $ctx fields after use
This should make it clear that we only use these elements once and can discard them. While we're in the area, avoid escaping '"' by using qq() instead of "" to quote strings requiring interpolation.
2020-01-27www*stream: favor \&close instead of *close
Be explicit that we're making a code reference, and not a reference to a scalar, array, hash, or IO...
2020-01-24wwwstream: shorten cloneurl uniquification
Another place where List::Scalar::uniq doesn't make sense, but there's a small op reduction to be had anyways.
2020-01-06hval: export prurl and add prototype
This allows to do some compile-time checking and fills in a missing "use" in PublicInbox::NewsWWW, allowing it to be used standalone and independently of PublicInbox::WWW
2019-10-17Merge remote-tracking branch 'origin/inboxdir'
* origin/inboxdir: config: remove redundant inboxdir check config: support "inboxdir" in addition to "mainrepo" examples/grok-pull.post_update_hook: use "inbox_dir"
2019-10-17doc: avoid [<directory>] arg for git-clone(1)
While it is possible to host source code from the root of a URL using git-http-backend(1), the lack of pathname in the URL can also be confusing to users. So just add the path name of the project into the URL itself so users can invoke "git clone" with one command-line argument instead of two. Of course, previously documented URLs continue to work as normal.
2019-10-16config: support "inboxdir" in addition to "mainrepo"
"mainrepo" ws a bad name and artifact from the early days when I intended for there to be a "spamrepo" (now just the ENV{PI_EMERGENCY} Maildir). With v2, "mainrepo" can be especially confusing, since v2 needs at least two git repositories (epoch + all.git) to function and we shouldn't confuse users by having them point to a git repository for v2. Much of our documentation already references "INBOX_DIR" for command-line arguments, so use "inboxdir" as the git-config(1)-friendly variant for that. "mainrepo" remains supported indefinitely for compatibility. Users may need to revert to old versions, or may be referring to old documentation and must not be forced to change config files to account for this change. So if you're using "mainrepo" today, I do NOT recommend changing it right away because other bugs can lurk. Link: https://public-inbox.org/meta/874l0ice8v.fsf@alyssa.is/
2019-10-01www: fix absolute URLs when mounted under a subdir
While we avoid generating absolute URLs in most cases, our "git clone" instructions and URL headers in mboxrd files contain full URLs. So do the same thing we do for WwwAtomStream and pre-generate the full URL before Plack::App::URLMap changes $env->{PATH_INFO} and $env->{SCRIPT_NAME} back to their original values. Reported-by: edef <edef@edef.eu> Link: https://public-inbox.org/meta/cover.0f97c47bb88db8b875be7497289d8fedd3b11991.1569296942.git-series.edef@edef.eu/
2019-09-27wwwtext: support $INBOX_URL/_/text/config/raw
This returns a git-config(1)-compatible file to make it easier to get started on mirroring an existing public-inbox. Omitting the "raw" from the URL works, as well, but I'm not sure if it's very useful.