Date | Commit message |
|
This ought to give us more CoW savings and fragmentation
avoidance in -httpd.
|
|
cloneurl, description, and base_url are no longer memoized. The
non-$env form of base_url is rare in WWW, and is fast enough to
not require memoization.
cloneurl and description are now expired during cleanup,
allowing admins to change these files without restarting
(or SIGHUP).
->altid_map is no longer cached or memoized at all, since the
endpoint(s) which hit it seem rarely accessed.
nntp_url and imap_url are now cached (instead of memoized) in
case an inbox is unvisited for a long time. They remain cached
since the truthiness check gets called in every per-inbox HTML
page, which can potentially be expensive.
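
The difference between memoizing a value forever and caching it until
the next cleanup can be sketched as below. This is a hypothetical
Python illustration, not the project's Perl code: the `Inbox` class,
its `cleanup` method, and the file layout are assumptions made for the
example.

```python
import os
import tempfile


class Inbox:
    """Hypothetical stand-in for an inbox object (not the real class)."""

    def __init__(self, inbox_dir):
        self.inbox_dir = inbox_dir
        self._cache = {}  # expired by cleanup(), unlike a permanent memo

    def description(self):
        # Cached: read the file once, then reuse it until cleanup() runs.
        if "description" not in self._cache:
            path = os.path.join(self.inbox_dir, "description")
            with open(path, encoding="utf-8") as fh:
                self._cache["description"] = fh.read().strip()
        return self._cache["description"]

    def cleanup(self):
        # Dropping the cache lets an edited file show up without a restart.
        self._cache.clear()


def demo():
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "description")
        with open(path, "w", encoding="utf-8") as fh:
            fh.write("old text\n")
        ibx = Inbox(d)
        first = ibx.description()
        with open(path, "w", encoding="utf-8") as fh:
            fh.write("new text\n")
        stale = ibx.description()  # still served from the cache
        ibx.cleanup()
        fresh = ibx.description()  # re-read after expiry
        return first, stale, fresh
```

With a permanent memo, the third read would still return the old
text; expiring the cache during cleanup is what lets the edit show up.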
|
|
While each git blob request is treated fairly w.r.t. other git
blob requests, responses triggering thousands of git blob
requests can still noticeably increase latency for
less-expensive responses.
Move large mbox results and the nasty all.mbox endpoint to
a low priority queue which only fires once per event loop
iteration. This reduces the response time of short HTTP
responses while many gigantic mboxes are being downloaded
simultaneously, but still maximizes use of available I/O
when there are no inexpensive HTTP responses happening.
This only affects PublicInbox::WWW users who use
public-inbox-httpd, not generic PSGI servers.
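
A toy model of the scheduling idea, written in Python purely for
illustration (the real event loop is Perl, and `ToyLoop`, `tick`, and
`big_mbox` are made-up names): cheap work is drained completely on
every iteration, while expensive work advances by at most one step per
iteration.

```python
from collections import deque


class ToyLoop:
    def __init__(self):
        self.normal = deque()  # cheap callables, run as soon as possible
        self.low = deque()     # generators standing in for huge responses
        self.log = []

    def tick(self):
        """Run one loop iteration."""
        # Inexpensive responses: run everything that is ready.
        while self.normal:
            self.log.append(self.normal.popleft()())
        # Expensive responses: a single step, then back of the queue.
        if self.low:
            job = self.low.popleft()
            chunk = next(job, None)
            if chunk is not None:
                self.log.append(chunk)
                self.low.append(job)


def big_mbox(name, chunks):
    # Hypothetical large download, produced one chunk at a time.
    for i in range(chunks):
        yield f"{name}:{i}"


def demo():
    loop = ToyLoop()
    loop.low.append(big_mbox("all.mbox", 3))
    loop.normal.append(lambda: "page-1")
    loop.tick()
    loop.normal.extend([lambda: "page-2", lambda: "page-3"])
    loop.tick()
    return loop.log
```

When `normal` stays empty, every iteration still advances a
low-priority job, so idle I/O capacity is not wasted.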
|
|
The only place where we could return wide characters with -httpd
was the raw $INBOX_DIR/description text, which is now converted
to octets.
All daemon (HTTP/NNTP/IMAP) sockets are opened in binary mode,
so length() and bytes::length() are equivalent on reads. For
socket writes, any non-octet data would warn about wide characters
and we are strict in warnings with test_httpd.
All gzipped buffers are also octets, as is PublicInbox::Eml->body,
and anything from PerlIO objects ("git cat-file --batch" output,
filesystems), so bytes::length was unnecessary in all those places.
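
Why character length and octet length only agree once text has been
converted to octets can be shown with a small Python sketch (an
analogy only; `content_length` is a hypothetical helper, and the
UTF-8 encoding is an assumption of the example):

```python
def content_length(body):
    """Length in octets of a response body given as str or bytes."""
    if isinstance(body, str):
        body = body.encode("utf-8")  # wide characters become octets here
    return len(body)


description = "caf\u00e9 list"  # contains one non-ASCII character
```

Here `len(description)` counts characters while
`content_length(description)` counts octets, and the two differ by
one; for data that is already octets, both measures are the same.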
|
|
Since CSS can be overridden by a static webserver on a per-inbox
basis, we need a similar pattern to deal with the instance-wide
WwwListing HTML. "/+/" probably won't conflict with any current
or future public inbox names.
I don't think it'll cause problems with common linkifiers or URL
extractors, either (and it's unlikely anybody would want to
share URLs of just the CSS in a plain text(-like) format).
|
|
Sometimes users (or bots) may lead queries with '&' and
trigger uninitialized variable warnings; just ignore them
and give consumers a $ctx->{qp}->{''} entry.
While we're in the area, pass a regexp rather than scalar string
to the `split' perlop to prevent Perl from recompiling the
regexp on every call.
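
A rough Python analogue of tolerant query parsing with a precompiled
separator pattern follows. It is a sketch under assumptions:
`parse_query` and the `[&;]+` separator set are invented for the
example, and Python's `re` module handles pattern compilation
differently from Perl's `split`.

```python
import re
from urllib.parse import unquote_plus

# Compiled once at import time rather than on every call.
_SPLIT_PAIRS = re.compile(r"[&;]+")


def parse_query(query_string):
    """Return a dict of query parameters, tolerating stray separators."""
    params = {}
    for pair in _SPLIT_PAIRS.split(query_string):
        # A leading '&' yields an empty pair, stored under the '' key
        # instead of raising or warning.
        key, _, value = pair.partition("=")
        params[unquote_plus(key)] = unquote_plus(value)
    return params
```

For example, `parse_query("&q=a+b")` produces an entry for the empty
key alongside the `q` parameter.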
|
|
Don't attempt to return HTTP 300 via Extmsg on it,
since whoever uses /raw is likely piping it to some
other command.
|
|
Extsearch objects are duck-types of Inbox objects, and
are capable of supporting code repos all the same.
|
|
Using "make update-copyrights" after setting GNULIB_PATH in my
config.mak
|
|
/$INBOX/manifest.js.gz should not attempt to match every inbox
in the domain (or every inbox); that is for /manifest.js.gz
(without a /$INBOX prefix).
Fixes: f303b4add8ea1883 ("wwwlisting: avoid hogging event loop")
|
|
{pi_config} may be confused with the documented `PI_CONFIG'
environment variable, and we'll favor vowel-removal to be
consistent with our usage of object references.
The `pi_' prefix may stay in some places for now, since a
separate namespace may come into this codebase for local/private
client-tooling.
For InboxIdle, we'll also remove an invalid comment about
holding a reference to the PublicInbox::Config object.
|
|
{ibx} is shorter and is the most prevalent abbreviation
in indexing and IMAP code, and the `$ibx' local variable
is already prevalent throughout.
In general, the codebase favors removal of vowels in variable
and field names to denote references (because references are
"lighter" than non-references).
So update WWW and Filter users to use the same code since
it reduces confusion and may allow easier code sharing.
|
|
Using "eidx_key:" boolean prefix to limit results to a given
inbox, we can use ->ALL to emulate and replace per-Inbox
xap15/[0-9] search indices.
With this change, the presence of "extindex.all.topdir" in the
$PI_CONFIG will cause the WWW code to use that extindex and
ignore per-inbox Xapian DBs in xap15/[0-9].
Unfortunately IMAP search still requires old per-inbox indices,
for now. Mapping extindex Xapian docids to per-Inbox UIDs and
vice-versa is proving tricky. Fortunately, IMAP search is
rarely used and optional. The RFCs don't specify expensive
phrase search, either, so `indexlevel=medium' can be used in
per-inbox Xapian indices to save space.
For primarily WWW (and future JMAP) users, this should result in
significant disk space, FD, and page cache footprint savings for
large instances with many inboxes and many cross-posted
messages.
|
|
This lets us pretend an ExtSearch object is an Inbox object
in most of the existing WWW code.
|
|
This will help with eventual git SHA-256 transitions.
|
|
By using the just-introduced ConfigIter class.
And make ManifestJsGz a subclass of it to reduce duplication.
|
|
It's still as slow as before with hundreds/thousands of inboxes,
but at least it's fair. Future changes will allow it to be
cached and memoized with persistent HTTP servers.
|
|
This means we need to filter out "" from query parameters.
While we're at it, update comments for the WWW endpoint.
|
|
It'll give us a nicer HTML header and footer.
|
|
PublicInbox::Eml has enough functionality to replace the
Email::MIME-based PublicInbox::MIME.
|
|
Since PublicInbox::Eml doesn't parse MIME subparts
up front, it can replace most uses of Email::Simple
without performance penalty.
This will eventually allow us to lower overall internal
API footprint by not having to keep the MIME vs Simple
distinction.
|
|
Encode lazy-loads encodings on an as-needed basis. This is
great for short-lived programs, but leads to fragmentation in
long-lived daemons where immortal allocations can get
interleaved with short-lived, per-request allocations.
Since we have no idea which encodings will be needed when
there's a constant flow of incoming mail, just preload
everything available at startup.
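
The same preload-at-startup idea can be sketched in Python, whose
`encodings` package also loads codec modules lazily on first lookup.
This is only an analogy to the Perl Encode behaviour described above;
`preload_codecs` is a hypothetical helper.

```python
import codecs
import encodings.aliases


def preload_codecs():
    """Look up every known codec once so later requests load nothing."""
    loaded = []
    for name in sorted(set(encodings.aliases.aliases.values())):
        try:
            codecs.lookup(name)  # imports the codec module if needed
        except LookupError:
            continue  # codec not available on this platform
        loaded.append(name)
    return loaded
```

A long-lived daemon would call this once before accepting
connections, so the codec tables are allocated before any per-request
allocations happen.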
|
|
Seeing the example config linkified, some users may inevitably
try to follow it in a browser with a GET request. Provide
a helpful message to inform users to use POST instead of
attempting to treat /$INBOX/$ALTID.sql.gz as a Message-Id.
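
A minimal sketch of rejecting GET with a helpful message, written as
a Python WSGI callable for illustration. The 405 status, the `Allow`
header, and the message text are assumptions; the commit message does
not say which status code or wording is actually used.

```python
def altid_dump_app(environ, start_response):
    """Toy handler for a hypothetical /$INBOX/$ALTID.sql.gz route."""
    if environ.get("REQUEST_METHOD") != "POST":
        # Explain the problem instead of treating the path as a
        # Message-Id lookup.
        body = b"This endpoint requires a POST request.\n"
        start_response("405 Method Not Allowed", [
            ("Allow", "POST"),
            ("Content-Type", "text/plain"),
            ("Content-Length", str(len(body))),
        ])
        return [body]
    body = b"(dump would be streamed here)\n"  # placeholder payload
    start_response("200 OK", [
        ("Content-Type", "application/octet-stream"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```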
|
|
We want to be able to preload that, as well as to access it
in WwwText for a config comment in the config example.
|
|
This ensures all our indexed data, including data from altid
searches (e.g. "gmane:$ARTNUM") is retrievable.
It uses a "POST" request to avoid wasting cycles when invoked by
crawlers, since it could potentially be several megabytes of
data not indexable by search engines.
|
|
Doing immortal allocations late can cause those allocations
to end up in places where they fragment the heap. So do more
things up front for long-lived daemons.
|
|
We'll also avoid explicitly loading standard library modules
like POSIX and Digest::SHA, here; instead we load our own
modules and let those load whatever non-PublicInbox:: modules
they need.
|
|
I didn't wait until September to do it, this year!
|
|
Instead of serving $INBOX_DIR/all.git/description, since
$INBOX_DIR/all.git/description is not described in the
default message when it's missing.
|
|
We want to match "GET" and "HEAD" exactly, not requests which
start with "GET" or end with "HEAD". This doesn't seem like
a real problem for public-inboxes which are actually public
data anyways.
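
The kind of anchoring mistake described here can be illustrated in
Python. The loose pattern below is an assumed reconstruction of the
problem (the original regexp is not shown in the commit message), and
`is_read_only` is a hypothetical helper.

```python
import re

# Alternation binds loosely: this means "^GET" OR "HEAD$".
LOOSE = re.compile(r"^GET|HEAD$")
# Grouping the alternatives anchors both ends for both methods.
STRICT = re.compile(r"\A(?:GET|HEAD)\Z")


def is_read_only(method, pattern=STRICT):
    """True if the request method matches the given pattern."""
    return pattern.search(method) is not None
```

With `LOOSE`, made-up methods such as `GETX` or `XHEAD` are accepted;
with `STRICT`, only the two exact method names are.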
|
|
It's now possible to use WwwStatic as a standalone PSGI
app to serve static files and recreate the award-winning
web design of https://public-inbox.org/ :>
|
|
Remove redundant "r" functions for generating short error
responses. These responses will no longer be cached by clients,
which is probably a good thing since most errors ought to be
transient, anyways. This also fixes error responses for our
cgit wrapper when static files are missing.
|
|
It'll be easier to reuse in future code.
|
|
cgit users won't need Plack::Util, here.
|
|
This hasn't been used since commit 48b21cb662c1e17b7 in 2016:
("declare Inbox object for reusability")
|
|
|
|
Try to remain consistent with our own documentation regarding
v2 git "epochs", first.
|
|
And use it in manifest.js.
To ease maintaining mirrors with grokmirror(1), we can accept
a "git/" directory prefix before the epoch, and ".git" suffix
after the epoch number.
We maintain compatibility with "$INBOX/$EPOCH" cloning, of
course, and it's still easier-to-type on the command-line.
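
The accepted path forms can be sketched with a single pattern. This is
a Python illustration, not the project's routing code; `EPOCH_PATH`
and `epoch_from_path` are made-up names, and treating the prefix and
suffix as independently optional is an assumption.

```python
import re

# Optional "git/" prefix and optional ".git" suffix around the epoch.
EPOCH_PATH = re.compile(r"\A(?:git/)?([0-9]+)(?:\.git)?\Z")


def epoch_from_path(path):
    """Return the epoch number for a path below the inbox, or None."""
    m = EPOCH_PATH.match(path)
    return int(m.group(1)) if m else None
```

Both `git/1.git` (the grokmirror-friendly form) and the shorter `1`
resolve to the same epoch number.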
|
|
I can imagine myself just wanting to clone a single v2 inbox
and all its epochs without thinking about include/exclude
rules in a grokmirror config file.
|
|
Support on-demand generation of "/manifest.js.gz" for inboxes.
By default, this matches inboxes with URLs matching the given
request hostname.
This makes it easier to create full mirrors of several inboxes
without needing to configure static file serving.
cf. https://git.kernel.org/pub/scm/utils/grokmirror/grokmirror.git
|
|
Allowing admins to set non-ASCII CSS filenames could
cause unnecessary problems for clients and proxies.
|
|
We do not support many mboxrd download range specifications at
the moment, but parsing non-ASCII characters isn't planned.
This makes no difference aside from being able to return 404
slightly earlier than we would've in the past.
|
|
Don't inadvertently serve git repos containing non-ASCII
digit characters.
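
Python's `re` module shows the same pitfall: on text patterns, `\d`
matches any Unicode decimal digit unless told otherwise. The sketch
below is an analogy rather than the project's code, and the pattern
names are invented for the example.

```python
import re

UNICODE_DIGITS = re.compile(r"\A\d+\Z")   # any Unicode decimal digit
ASCII_DIGITS = re.compile(r"\A[0-9]+\Z")  # ASCII 0-9 only

arabic_three = "\u0663"  # ARABIC-INDIC DIGIT THREE
```

`UNICODE_DIGITS` accepts `arabic_three`, while `ASCII_DIGITS` rejects
it and still accepts ordinary names such as `42`. Passing `re.ASCII`
when compiling the `\d` pattern would have the same effect as
spelling out `[0-9]`.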
|
|
Our Hval::to_filename sub has always been strict about emitting
ASCII-only characters for ViewVCS "raw" links.
However, somebody could manually generate a filename with
non-ASCII words for somebody else to download (we have no
cheap and fast way of mapping filenames back to blobs for
validation).
|
|
We don't want to emit funky URLs which can be lost in
translation or cause problems with non-Unicode-aware
clients.
Then, don't accept non-ASCII filenames in URLs, since
a manually-generated URL/filename in attachment downloads
could be used for Unicode homographs to confuse folks who
download the attachment.
|
|
* origin/xap-optional:
admin: improve warnings and errors for missing modules
searchidx: do not create empty Xapian partitions for basic
lazy load Xapian and make it optional for v2
www: use Inbox->over where appropriate
nntp: use Inbox->over directly
inbox: add ->over method to ease access
|
|
This allows searching for terms with "+" in them properly.
|
|
More tests work without Search::Xapian, now.
Usability issues still need to be fixed.
|
|
We don't need to rely on Xapian search functionality for the
majority of the WWW code, even. subject_normalized is moved to
SearchMsg, where it (probably) makes more sense, anyways.
|
|
We will still return a 404 by default to '/' for compatibility
with users of Plack::App::Cascade or similar. Inboxes are
sorted by modification times to help users detect activity
(similar to the /$INBOX/ topic view).
New configuration options:
* publicinbox.wwwlisting - configure the listing type
* publicinbox.<name>.hide - hide a particular inbox from the listing
See changes to public-inbox-config.pod for full descriptions
of the new options.
Requested-by: Leah Neukirchen <leah@vuxu.org>
https://public-inbox.org/meta/871sdfzy80.fsf@gmail.com/
|