Date | Commit message |
|
We used ->header_obj in the past as an optimization with
Email::MIME. That optimization is no longer necessary
with PublicInbox::Eml.
This doesn't make any functional difference even if we were to
go back to Email::MIME. However, it reduces the amount of code
we have and slightly reduces allocations with PublicInbox::Eml.
|
|
Although the ->async_next method does not take $self as
a receiver, but rather a PublicInbox::HTTP object, we may
still retrieve it to be called with the HTTP object via
UNIVERSAL->can.
|
|
As in Import, we'll fall back to Sender: if From: is missing,
and use the primary_address of the inboxes to indicate the total
absence of those fields.
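
The fallback order described above can be sketched as follows. This
is a minimal illustration in Python rather than the project's Perl;
the function name `author_address` and its `primary_address`
parameter are hypothetical and not taken from the public-inbox code.

```python
from email import message_from_string
from email.utils import parseaddr


def author_address(raw_message, primary_address):
    """Pick an author address: From:, then Sender:, then the inbox address.

    Hypothetical helper; it only illustrates the fallback order.
    """
    msg = message_from_string(raw_message)
    for field in ("From", "Sender"):
        value = msg.get(field)
        if value:
            _name, addr = parseaddr(value)
            if addr:
                return addr
    # Neither header was usable: the inbox's own address marks their absence.
    return primary_address
```

A message with neither header thus yields the inbox's primary address,
which makes the absence of both fields visible to the reader.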
|
|
We no longer favor getline+close for streaming PSGI responses
when using public-inbox-httpd. We still support it for other
PSGI servers, though.
|
|
Virtually all of our responses are going to be gzipped, anyways.
This will allow us to utilize zlib as a buffering layer and
share common code for async blob retrieval responses.
To streamline this and allow GzipFilter to be a parent class,
we'll replace the NoopFilter with a similar CompressNoop class
which emulates the two Compress::Raw::Zlib::Deflate methods we
use.
This drops a bunch of redundant code and will hopefully make
upcoming WwwStream changes easier to reason about.
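
The idea of a no-op class standing in for a compressor can be sketched
in Python. This is an analogue only: Python's zlib compression objects
expose `compress` and `flush`, and those two names are used here as an
assumption, not as the Perl methods that CompressNoop emulates.

```python
import zlib


class CompressNoop:
    """Pass-through object with the same two methods as a zlib compressor."""

    def compress(self, data):
        return data

    def flush(self):
        return b""


def render(chunks, compressor):
    """Feed every chunk through the compressor, then flush the remainder."""
    out = [compressor.compress(chunk) for chunk in chunks]
    out.append(compressor.flush())
    return b"".join(out)


# The caller is identical whether or not the client accepts gzip.
plain = render([b"hello ", b"world"], CompressNoop())
gzipped = render([b"hello ", b"world"], zlib.compressobj(wbits=31))
```

Because both objects answer the same two calls, the streaming code needs
no branch for the uncompressed case.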
|
|
This allows -httpd to handle other requests while waiting
for git to retrieve and decode blobs. We'll also break
apart t/psgi_v2.t further to ensure tests run against
-httpd in addition to generic PSGI testing.
Using xt/httpd-async-stream.t to test against clones of meta@public-inbox.org
shows a 10-12% performance improvement with the following env:
TEST_JOBS=1000 TEST_CURL_OPT=--compressed TEST_ENDPOINT=new.atom
|
|
No need to deepen our object graph, here.
|
|
stat(2) on the inboxdir is unlikely to be correct, now that
msgmap truncates its journal (rather than unlinking it).
|
|
We always return Z (UTC) times, anyways, so we'll always
use gmtime() on the seconds-after-the-epoch.
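
As a small illustration in Python (not the project's Perl), formatting
seconds-after-the-epoch through gmtime() yields a timestamp that can
safely carry the Z suffix. The function name `utc_timestamp` is
hypothetical.

```python
import time


def utc_timestamp(epoch_seconds):
    """Format seconds-after-the-epoch as a UTC timestamp with a Z suffix."""
    # gmtime() never applies the local timezone, so "Z" is always accurate.
    return time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime(epoch_seconds))
```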
|
|
Our most common endpoints deserve to be gzipped.
|
|
It's no longer necessary to populate the smsg->{mid} field now
that ->smsg_eml calls smsg->populate in rare cases where the
smsg did not originate from SQLite.
|
|
We can simplify WwwAtomStream callbacks by performing ->smsg_eml
calls in the `feed_entry' sub itself. This simplifies callers,
by reducing the number of places which can load an Eml object
into memory.
|
|
There's no need to pollute the cross-package $ctx with it.
|
|
We need to escape ampersands (and some other characters for href
attributes), so introduce a `mid_href' sub to do just that.
'<', '>' and '"' were always escaped, so there's no risk of tag
or attribute injection, but creative Message-IDs could cause
confusion for some parsers and generate invalid URLs.
Start getting rid of the bloated, over-engineered OO Hval API
while we're at it; I only noticed this bug because I started
killing off Hval->new* callers.
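
The distinction between escaping for display and escaping for an href
can be sketched in Python. This is not the project's `mid_href'
implementation; the set of characters left unescaped (only "@" beyond
the URL-unreserved set) is an assumption made for this example.

```python
from html import escape
from urllib.parse import quote


def mid_href(mid):
    """Percent-encode a Message-ID for use inside an href attribute."""
    # '&', '<', '>', '"' and '/' all become %XX, so the URL stays valid.
    return quote(mid, safe="@")


def mid_link(mid):
    """Build an anchor: URL-escaped in the href, HTML-escaped in the text."""
    return '<a href="%s/">%s</a>' % (mid_href(mid), escape(mid))
```

An ampersand in a Message-ID ends up as %26 in the URL and as &amp;
in the visible text, so neither context is left ambiguous.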
|
|
No point in passing something on stack only to stash it
into the $ctx which holds most other parameters used for
rendering the HTML.
|
|
I didn't wait until September to do it, this year!
|
|
Get rid of the confusingly named {rv} and {tip} fields
and unify them into {obuf} for readability.
{obuf} usage may be expanded to more areas in the future. This
will eventually make it easier for us to experiment with
alternative buffering schemes.
|
|
Be explicit that we're making a code reference, and not
a reference to a scalar, array, hash, or IO...
|
|
We're often iterating through messages while writing to another
buffer in our WWW interface, causing memory usage to multiply.
Since we know we won't need to keep the MIME object around in
some cases, we can tell msg_iter to clobber the on-stack
variable while it operates on subparts of multipart messages.
With xt/mem-msgview.t switched to multipart from the previous
commit, this shows a 13 MB memory reduction on that test.
|
|
There's a bunch of leftover "require" and "use" statements we no
longer need and can get rid of, along with some excessive
imports via "use".
IO::Handle usage isn't always obvious, so add comments
describing why a package loads it. Along the same lines,
document the tmpdir support as the reason we depend on
File::Temp 0.19, even though every Perl 5.10.1+ user has it.
While we're at it, favor "use" over "require", since it gives
us extra compile-time checking.
|
|
WwwStream already passes the WWW $ctx to the user-supplied
callback, and it's a trivial change for WwwAtomStream to do
the same. Callers in Feed.pm can now take advantage of that
to save a few kilobytes of memory on every response.
|
|
"mainrepo" was a bad name and an artifact from the early days when I
intended for there to be a "spamrepo" (now just the
ENV{PI_EMERGENCY} Maildir). With v2, "mainrepo" can be
especially confusing, since v2 needs at least two git
repositories (epoch + all.git) to function and we shouldn't
confuse users by having them point to a git repository for v2.
Much of our documentation already references "INBOX_DIR" for
command-line arguments, so use "inboxdir" as the
git-config(1)-friendly variant for that.
"mainrepo" remains supported indefinitely for compatibility.
Users may need to revert to old versions, or may be referring
to old documentation and must not be forced to change config
files to account for this change.
So if you're using "mainrepo" today, I do NOT recommend changing
it right away because other bugs can lurk.
Link: https://public-inbox.org/meta/874l0ice8v.fsf@alyssa.is/
|
|
We were emitting the same "<id>mailto:name@domain</id>" tag
for every feed (but not per-feed entry). This could cause
feed readers to mistake the top (news.atom) feed for other
feeds (search results, or per-thread feeds).
This is technically a breaking change for people relying on
per-thread or per-query feeds, but the only alternative is
to remain broken for anybody trying to follow multiple feeds
off the same inbox.
|
|
|
|
diffstat <-> ^diff anchors work within the same attachment or
message while in HTML views which display multiple messages.
|
|
Hopefully this helps people familiarize themselves with
the source code.
|
|
Since we need to handle messages with multiple and duplicate
Message-ID headers, our thread skeleton display must account
for that.
Since we have a "preferred" Message-ID in case of conflicts,
use it as the UUID in an Atom feed so readers do not get
confused by conflicts.
|
|
The first Received: header is believable since it typically
hits the user's mail server and can be treated as relatively
trustworthy. We still show the Date: in per-message (permalink)
views, which may expose users who have incorrect Date:
headers, but all the ISO YYYY-MM-DD dates we display will
match what we see.
|
|
Using update-copyrights from gnulib.
While we're at it, use the SPDX identifier for AGPL-3.0+ to
ease mechanical processing.
|
|
This can allow streaming parsers (SAX) to work a little more
efficiently as they can handle/discard all the metadata before
the big content.
|
|
Oops, we must not discard the timezone when parsing dates
for the Atom stream.
|
|
We only need strftime to be locale-independent when generating
dates for email and HTTP headers. Purely numeric dates can
use strftime for ease-of-readability.
|
|
This will allow certain feed readers to render a message thread
as described in <https://www.jwz.org/doc/threading.html>.
Feed readers with knowledge of RFC 4685 are unknown to us at
this time, but perhaps this will encourage future implementations.
Existing feed readers I've tested (newsbeuter, feed2imap) seem
to ignore these tags gracefully without degradation.
|
|
This will let us stream larger Atom document bodies without
wasting too much memory and reduce the number of round-trip
requests needed to get necessary information.
Hopefully clients are using streaming (SAX) parsers, too.
This is the final transition in the core public-inbox
code to allow migrating to a "pull"-based body streaming
scheme which allows an HTTP server to respond appropriately
to backpressure from slow clients.
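
A "pull"-based body can be sketched in Python as an object the server
asks for one chunk at a time. The `getline`/`close` pair mirrors the
PSGI streaming body interface; the `PullBody` class, the `serve` loop
and the entry markup are hypothetical and exist only for this
illustration.

```python
from html import escape


class PullBody:
    """Response body that yields one Atom entry per getline() call."""

    def __init__(self, entries):
        self._iter = iter(entries)
        self.closed = False

    def getline(self):
        # Nothing is rendered until the server asks for the next chunk.
        if self.closed:
            return None
        entry = next(self._iter, None)
        if entry is None:
            return None
        return ("<entry>%s</entry>\n" % escape(entry)).encode()

    def close(self):
        self.closed = True


def serve(body, write):
    """Toy server loop: pull a chunk only when ready to write it."""
    try:
        while True:
            chunk = body.getline()
            if chunk is None:
                break
            write(chunk)
    finally:
        body.close()


written = []
body = PullBody(["first", "second"])
serve(body, written.append)
```

A real server would call getline() only once the client socket is
writable again, so a slow client never forces the whole document into
memory.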
|