Date | Commit message (Collapse) |
|
This is a major step in solving the problem of having to
manually associate hundreds/thousands of coderepos with
hundreds/thousands of public-inboxes to power solver
(and more).
|
|
This seems like a easy (but WWW-specific) way to get recently
created and recently active topics as suggested by Konstantin.
To do this with Xapian will require a new columns and
reindexing; and I'm not sure if the current lei handling of
search results by dumping results to a format readable by common
MUAs would work well with this. A new TUI may be required...
Suggested-by: Konstantin Ryabitsev <konstantin@linuxfoundation.org>
Link: https://public-inbox.org/meta/20231107-skilled-cobra-of-swiftness-a6ff26@meerkat/
|
|
The IO package seems like a better home for I/O subs than the
Git package. We lose the 60 second read timeout for `git
cat-file --batch-*' processes since it's probably not necessary
given how reliable the code has proven and things would fall
over hard in other ways if the storage device were completely
hosed.
|
|
IMHO, this is not worth importing here since it's only called at
startup.
Fixes: 19b791f4894e (use read_all in more places to improve safety)
|
|
`readline' ops may not detect errors on partial reads.
This saves us some code to reduce cognitive overhead for
readers. We'll also support reusing a destination buffers so it
can work more nicely with existing code.
|
|
Only allow single-character query keys to prevent clients from
wasting memory in Perl's hash tables. We'll also perform the
utf8::decode and tr/+/ / calls once on the whole query string at
once to reduce op calls.
This also avoids creating an empty hash in the common case
when the QUERY_STRING is empty and instead relies on
auto-vivification of Perl.
|
|
This allows filtering the contents of any existing thread using
a search query. It uses the existing THREADID column in Xapian
so we can internally add a Xapian OP_FILTER to the results.
This new functionality is orthogonal to the existing `t=1'
parameter which gives mairix-style thread expansion. It doesn't
make sense to use `t=1' with this functionality, but it's not
disallowed, either.
The indentation change in Over->next_by_mid is to ensure
DBI->prepare_cached can share across both ->next_by_mid
and ->mid2tid.
I also noticed the existing regex for `POST /$INBOX/?x=m&q=' was
allowing extra characters. With an added \z, it's now as strict
was originally intended and AFAIK nothing was generating invalid
URLs for it
Reported-by: Konstantin Ryabitsev <konstantin@linuxfoundation.org>
Link: https://public-inbox.org/meta/aaniyhk7wfm4e6m5mbukcrhevzoc6ftctyrfwvmz4fkykwwtlj@mverfng6ytas/T/
|
|
To ensure users aren't abusing the ability to reuse Message-IDs,
provide a convenient front-end to `lei mail-diff' from WWW.
Most of the time it's just list-appended signatures, so I expect
this to be useful for /all/ users.
|
|
For backwards-compatibility, this defaults to `first'. When set
to `fallback', PublicInbox::WwwCoderepo is favored and cgit is
only used as a fallback. Eventually, `rewrite' will also be
supported to rewrite cgit URLs to WwwCoderepo ones.
Of course, WwwCoderepo is still missing search and other key
features, but that's being worked on...
|
|
The PublicInbox::Cgit wrapper will return a sub-ref for most
responses, so ensure we don't try to treat it as an array-ref.
|
|
This will allow it to easily map a single coderepo to multiple
inboxes (or multiple coderepos to any number of inboxes).
For now, this is just a summary, but $REPO/$OID/s/ support
will be added, along with archive downloads.
Indexing of coderepos will probably be supported via -extindex,
only.
|
|
Another step towards making our internal APIs more writev-like
and reducing the copies needed for `join' or `.=' concatenation.
|
|
The $r404 variable is unset if we have a valid inbox, but no
coderepos configured for that inbox, thus we must `r(404)'
explicitly.
|
|
`+' already seemed to works for IMAP mailboxes and NNTP newsgroup
names and git-config doesn't complain, either. So allow it as
the path components of WWW URLs so projects like `libstdc++' can
use it.
Reported-by: Mark Wielaard <mark@klomp.org>
Tested-by: Mark Wielaard <mark@klomp.org>
Link: https://public-inbox.org/meta/YwKnFCvganW7ErXU@wildebeest.org/
|
|
Apparently some browsers can set a Referer: header which fails
to match. I'm not certain why, but making "$schema://$HOST_PORT"
matches case-insensitive seems more correct regardless.
In case that doesn't work, we'll also allow bypassing deep-link
prevention via a POST form button.
Reported-by: Vlastimil Babka <vbabka@suse.cz>
Link: https://public-inbox.org/meta/93ebfbd1-9924-481c-4edc-9b232d1e995c@suse.cz/
|
|
As stated in the previous change, conditional hash assignments
which trigger other hash assignments seem problematic, at times.
So replace:
$h->{k} //= do { $h->{x} = ...; $val };
$h->{k} // do {
$h->{x} = ...;
$hk->{k} = $val
};
"||=" is affected the same way, and some instances of "||=" are
replaced with "//=" or "// do {", now.
|
|
This ought to give us more CoW savings and fragmentation
avoidance in -httpd.
|
|
cloneurl, description, and base_url are no longer memoized. The
non-$env form of base_url is rare in WWW, and is fast enough to
not require memoization.
cloneurl and description are now expired during cleanup,
allowing admins to change these files without restarting
(or SIGHUP).
-altid_map is no longer cached nor memoized at all, since the
endpoint(s) which hit it seem rarely accessed.
nntp_url and imap_url are now cached (instead of memoized) in
case an inbox is unvisited for a long time. They remain cached
since the truthiness check gets called in every per-inbox HTML
page, which can potentially be expensive.
|
|
While each git blob request is treated fairly w.r.t other git
blob requests, responses triggering thousands of git blob
requests can still noticeably increase latency for
less-expensive responses.
Move large mbox results and the nasty all.mbox endpoint to
a low priority queue which only fires once per-event loop
iteration. This reduces the response time of short HTTP
responses while many gigantic mboxes are being downloaded
simultaneously, but still maximizes use of available I/O
when there's no inexpensive HTTP responses happening.
This only affects PublicInbox::WWW users who use
public-inbox-httpd, not generic PSGI servers.
|
|
The only place where we could return wide characters with -httpd
was the raw $INBOX_DIR/description text, which is now converted
to octets.
All daemon (HTTP/NNTP/IMAP) sockets are opened in binary mode,
so length() and bytes::length() are equivalent on reads. For
socket writes, any non-octet data would warn about wide characters
and we are strict in warnings with test_httpd.
All gzipped buffers are also octets, as is PublicInbox::Eml->body,
and anything from PerlIO objects ("git cat-file --batch" output,
filesystems), so bytes::length was unnecessary in all those places.
|
|
Since CSS can be overridden by a static webserver on a per-inbox
basis, we need a similar pattern to deal with the instance-wide
WwwListing HTML. "/+/" probably won't conflict with any current
nor future public inbox names.
I don't think it'll cause problems with common linkifiers or URL
extractors, either (and it's unlikely anybody would want to
share URLs of just the CSS in a plain text(-like) format).
|
|
Sometimes users (or bots) may lead queries with '&' and
trigger uninitialized variable warnings, just ignore them
and give consumers a $ctx->{qp}->{''} entry.
While we're in the area, pass a regexp rather than scalar string
to the `split' perlop to prevent Perl from recompiling the
regexp on every call.
|
|
Don't attempt to return HTTP 300 via Extmsg on it,
since whoever uses /raw is likely piping it to some
other command.
|
|
Extsearch objects are duck-types of Inbox objects, and
are capable of supporting code repos all the same.
|
|
Using "make update-copyrights" after setting GNULIB_PATH in my
config.mak
|
|
/$INBOX/manifest.js.gz should not attempt to match every inbox
in the domain (or every inbox); that is for /manifest.js.gz
(without a /$INBOX prefix).
Fixes: f303b4add8ea1883 ("wwwlisting: avoid hogging event loop")
|
|
{pi_config} may be confused with the documented `PI_CONFIG'
environment variable, and we'll favor vowel-removal to be
consistent with our usage of object references.
The `pi_' prefix may stay in some places, for now; since a
separate namespace may come into this codebase for local/private
client-tooling.
For InboxIdle, we'll also remove an invalid comment about
holding a reference to the PublicInbox::Config object, too.
|
|
{ibx} is shorter and is the most prevalent abbreviation
in indexing and IMAP code, and the `$ibx' local variable
is already prevalent throughout.
In general, the codebase favors removal of vowels in variable
and field names to denote non-references (because references are
"lighter" than non-references).
So update WWW and Filter users to use the same code since
it reduces confusion and may allow easier code sharing.
|
|
Using "eidx_key:" boolean prefix to limit results to a given
inbox, we can use ->ALL to emulate and replace per-Inbox
xap15/[0-9] search indices.
With this change, the presence of "extindex.all.topdir" in the
$PI_CONFIG will cause the WWW code to use that extindex and
ignore per-inbox Xapian DBs in xap15/[0-9].
Unfortunately IMAP search still requires old per-inbox indices,
for now. Mapping extindex Xapian docids to per-Inbox UIDs and
vice-versa is proving tricky. Fortunately, IMAP search is
rarely used and optional. The RFCs don't specify expensive
phrase search, either, so `indexlevel=medium' can be used in
per-inbox Xapian indices to save space.
For primarily WWW (and future JMAP) users; this should result in
significant disk space, FD, and page cache footprint savings for
large instances with many inboxes and many cross-posted
messages.
|
|
This lets us pretend an ExtSearch object is an Inbox object
in most of the existing WWW code.
|
|
This will help with eventual git SHA-256 transitions.
|
|
By using the just-introduced ConfigIter class.
And make ManifestJsGz a subclass of it to reduce duplication.
|
|
It's still as slow as before with hundreds/thousands of inboxes,
but at least it's fair. Future changes will allow it to be
cached and memoized with persistent HTTP servers.
|
|
This means we need to filter out "" from query parameters.
While we're at it, update comments for the WWW endpoint.
|
|
It'll give us a nicer HTML header and footer.
|
|
PublicInbox::Eml has enough functionality to replace the
Email::MIME-based PublicInbox::MIME.
|
|
Since PublicInbox::Eml doesn't parse MIME subparts
up front, it can replace most uses of Email::Simple
without performance penalty.
This will eventually allow us to lower overall internal
API footprint by not having to keep the MIME vs Simple
distinction.
|
|
Encode lazy-loads encodings on an as-needed basis. This is
great for short-lived programs, but leads to fragmentation in
long-lived daemons where immortal allocations can get
interleaved with short-lived, per-request allocations.
Since we have no idea which encodings will be needed when
there's a constant flow of incoming mail, just preload
everything available at startup.
|
|
Seeing the example config linkified, some users may inevitably
try to following it in a browser with a GET request. Provide
a helpful message to inform users to use POST instead of
attempting to treat /$INBOX/$ALTID.sql.gz as a Message-Id.
|
|
We want to be able to preload that, as well as to access it
in WwwText for a config comment in the config example.
|
|
This ensures all our indexed data, including data from altid
searches (e.g. "gmane:$ARTNUM") is retrievable.
It uses a "POST" request to avoid wasting cycles when invoked by
crawlers, since it could potentially be several megabytes of
data not indexable by search engines.
|
|
Doing immortal allocations late can cause those allocations
to end up in places where it fragments the heap. So do more
things up front for long-lived daemons.
|
|
We'll also avoid explicitly loading standard library modules
like POSIX and Digest::SHA, here; instead we load our own
modules and let those load whatever non-PublicInbox:: modules
they need.
|
|
I didn't wait until September to do it, this year!
|
|
Instead of serving $INBOX_DIR/all.git/description, since
$INBOX_DIR/all.git/description is not described in the
default message when it's missing.
|
|
We want to match "GET" and "HEAD" exactly, not requests which
start with "GET" or end with "HEAD". This doesn't seem like
a real problem for public-inboxes which are actually public
data anyways.
|
|
It's now possible to use WwwStatic as a standalone PSGI
app to serve static files and recreate the award-winning
web design of https://public-inbox.org/ :>
|
|
Remove redundant "r" functions for generating short error
responses. These responses will no longer be cached by clients,
which is probably a good thing since most errors ought to be
transient, anyways. This also fixes error responses for our
cgit wrapper when static files are missing.
|
|
It'll be easier to reuse in future code.
|
|
cgit users won't need Plack::Util, here.
|