Date | Commit message |
|
In many cases, we do not care about the total number of
messages. It's a rather expensive operation in SQLite
(Xapian only provides an estimate).
For LKML, this brings top-level /$INBOX/ loading time from
~375ms to around 60ms on my system. Days ago, this operation
was taking 800-900ms(!) for me before introducing the SQLite
overview DB.
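
One way to page through an overview table without ever asking for the total is sketched below. This is only an illustration in Python, not the actual public-inbox code: the `over` table, its `num` and `ts` columns, and the `recent` helper are all hypothetical names, and the "fetch one extra row" trick is an assumption about how a caller might detect further pages.

```python
import sqlite3

def recent(db: sqlite3.Connection, limit: int):
    """Return (rows, has_more) for the newest messages, without COUNT(*).

    Hypothetical schema: over(num INTEGER, ts INTEGER).
    """
    # Ask for one row more than needed; its presence tells us whether
    # another page exists, so no full-table count is required.
    rows = db.execute(
        "SELECT num, ts FROM over ORDER BY ts DESC LIMIT ?", (limit + 1,)
    ).fetchall()
    return rows[:limit], len(rows) > limit

def make_demo_db(n: int) -> sqlite3.Connection:
    """Build a small in-memory table for experimenting."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE over (num INTEGER PRIMARY KEY, ts INTEGER)")
    db.executemany("INSERT INTO over VALUES (?, ?)",
                   [(i, 1000 + i) for i in range(n)])
    return db
```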
|
|
We can avoid a small amount of overhead and use the "preferred"
Message-ID based on what is in the SearchMsg object.
|
|
We do not need to care about ghosts at multiple call sites; they
cannot have a {blob} field and we've stored the blob field in
Xapian since SCHEMA_VERSION=13.
|
|
This needs tests and further refinement, but current tests pass.
|
|
Since v2 supports duplicate messages, we need to support
looking up different messages with the same Message-Id.
Fortunately, our "raw" endpoint has always been mboxrd,
so users won't need to change their parsing tools.
|
|
Using update-copyrights from gnulib
While we're at it, use the SPDX identifier for AGPL-3.0+ to
ease mechanical processing.
|
|
Allowing downloading of all search results as a gzipped mboxrd
file can be convenient for some users.
|
|
This is hopefully more sensical than "raw" files from
resulting downloads.
|
|
Sigh, yet another place to handle obfuscation for misguided
people who expect it. Maybe this will do something to prevent
spammers from getting addresses, while still allowing the
"curl $URL | git am" use case to work.
|
|
This makes life easier for the threading algorithm, as we can
use the implied ordering of timestamps to avoid temporary ghosts
and resulting container vivification.
This would've also allowed us to hide the bug (in most cases)
fixed by the patch titled "thread: last Reference always wins",
in case that needs to be reverted due to infinite looping.
|
|
Based on reading RFC 3986, it seems '@', ':', '!', '$', '&',
"'", '(', ')', '*', '+', ',', ';', '=' are all allowed
in path-absolute where we have the Message-ID.
In any case, it seems '@' is fairly common in path components
nowadays and too common in Message-IDs.
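
The character set above can be tried out with a short Python sketch. It is not the project's code: `PCHAR_EXTRA` and `mid_to_path_segment` are hypothetical names, and `urllib.parse.quote` is used only as a convenient stand-in for whatever escaping routine the server uses.

```python
from urllib.parse import quote

# Characters RFC 3986 permits in a path segment ("pchar") beyond the
# unreserved set: the sub-delims plus ":" and "@".
PCHAR_EXTRA = "@:!$&'()*+,;="

def mid_to_path_segment(mid: str) -> str:
    """Percent-encode a Message-ID for use as one URL path component.

    "/" is deliberately NOT in the safe set, so it is escaped and cannot
    be mistaken for a path separator.
    """
    return quote(mid, safe=PCHAR_EXTRA)
```

With this, a typical Message-ID such as `20180401-foo@example.com` passes through unchanged, while `/`, `?` and `%` are still escaped.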
|
|
At least for public-inbox-httpd, this allows us to avoid having
a client monopolize one event loop tick of the server for too
long. It hurts throughput for the /all.mbox.gz endpoint, but I
doubt anybody cares and the latency improvement for other
clients would be appreciated.
We already do the same fairness thing for HTML pages.
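
The fairness idea can be illustrated with a small Python asyncio sketch. This is an analogy only: public-inbox-httpd is not built on asyncio, and `stream` and `demo` are hypothetical names. Each producer hands over one chunk and then yields, so a short response is not stuck behind a long one.

```python
import asyncio

async def stream(name, chunks, log):
    """Emit one chunk per event loop turn, then yield control."""
    for i, _chunk in enumerate(chunks):
        log.append((name, i))      # stand-in for writing the chunk to a socket
        await asyncio.sleep(0)     # give other ready tasks a turn

async def demo():
    """Run a long download and a short page on the same event loop."""
    log = []
    await asyncio.gather(
        stream("all.mbox.gz", [b"m1", b"m2", b"m3"], log),
        stream("small-page", [b"html"], log),
    )
    return log
```

Running `asyncio.run(demo())` shows the short response being served before the long one has finished, at the cost of extra loop turns for the long one.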
|
|
Doing git tree lookups based on the SHA-1 of the Message-ID
is expensive as trees get larger; instead, use the SHA-1
object ID directly. This drastically reduces the amount
of time spent in the "git cat-file --batch" process for
fetching the /$INBOX/all.mbox.gz endpoint on the ~800MB
git@vger.kernel.org mirror.
This retains backwards compatibility and allows existing
indices to be transparently upgraded without performance
degradation.
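
The difference between the two lookups can be sketched as the request lines fed to "git cat-file --batch". The Python below is illustrative only: the helper names are hypothetical, and the two-character directory plus 38-character file name layout for the Message-ID hash is an assumption about the repository layout.

```python
import hashlib

def path_request(mid: str, ref: str = "HEAD") -> bytes:
    """Request line that makes git walk the tree to find the blob.

    Assumed layout: SHA-1 hex of the Message-ID, split as xx/xxxx...
    """
    h = hashlib.sha1(mid.encode()).hexdigest()
    return f"{ref}:{h[:2]}/{h[2:]}\n".encode()

def oid_request(blob_oid: str) -> bytes:
    """Request line naming the blob directly, so no tree lookup is needed."""
    return blob_oid.encode() + b"\n"
```

Either line can be written to the stdin of a running "git cat-file --batch" process; the second form skips the tree traversal, which is where the savings described above would come from.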
|
|
Hopefully this can reduce memory overhead for people that
use one-shot CGI.
|
|
This is lighter and we can work further towards eliminating
our Plack::Request dependency entirely.
|
|
We want to avoid sending 10 or 20-byte gzip headers as
separate TCP packets to reduce syscalls and avoid wasting
bandwidth.
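
A generic way to keep a tiny header from going out alone is to coalesce small chunks before writing. The Python sketch below shows that idea only; `coalesce` and its `min_size` threshold are hypothetical and not taken from the project.

```python
from typing import Iterable, Iterator

def coalesce(chunks: Iterable[bytes], min_size: int = 4096) -> Iterator[bytes]:
    """Merge small chunks so each write carries at least min_size bytes.

    The final chunk may be shorter, since nothing more will follow it.
    """
    buf = bytearray()
    for chunk in chunks:
        buf += chunk
        if len(buf) >= min_size:
            yield bytes(buf)
            buf.clear()
    if buf:
        yield bytes(buf)
```

Wrapped around a compressor's output, a 10-byte gzip header is held back and sent together with the first compressed data instead of in its own write.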
|
|
Favor Inbox objects as our primary source of truth to simplify
our code. This increases our coupling with PSGI to make it
easier to write tests in the future.
A lot of this code was originally designed to be usable
standalone without PSGI or CGI at all; but that might increase
development effort.
|
|
Prefer to return strings instead, so Content-Length can be
calculated for caching and such.
|
|
We do not need feed options there (or anywhere, hopefully).
|
|
This allows consistency between different invocations from
roughly the same period and is no worse for caching than any of
our existing HTML and Atom feeds.
We cannot set the timestamp to the end date since messages
may be added to the repository while we are iterating
(and this streaming mechanism will pick them up).
|
|
This allows us to easily provide gigantic inboxes
with proper backpressure handling for slow clients.
It also eliminates public-inbox-httpd and Danga::Socket-specific
knowledge from this class, making it easier to follow for
those used to generic PSGI applications.
|
|
Allows easily downloading the entire archive without
special tools. In any case, it's not yet advertised via
HTML until we can test it better. It'll also support range
queries in the future to avoid wasting bandwidth.
|
|
This should make validating the output easier
when testing between different servers.
|
|
This allows messages to be read in chronological order when
read without a mail client (e.g. with "zcat t.mbox.gz | less").
|
|
When serving archives, it's more robust to keep existing
archive links if one server goes down.
|
|
This may be necessary for compatibility with non-mboxrd aware
parsers which expect "\nFrom " for everything but the first
record.
|
|
We'll be using it for more than just cat-file.
Adding a `popen' API for internal use allows us to save a bunch
of code in other places.
|
|
Hopefully this gives new hackers a better overview of
how the components relate to each other.
|
|
Downloaded mboxen can be archived/stored indefinitely, try to
make it easy for future archaeologists to find the online
archive location.
|
|
It may be present in messages imported from NNTP.
|
|
It doesn't actually give performance improvements unless we
use types with "my", but we don't do that. We'll only continue
using fields with Danga::Socket-derived classes where they're
required.
|
|
In the future, it should be possible to use this:
git ls-files | UPDATE_COPYRIGHT_HOLDER='all contributors' \
UPDATE_COPYRIGHT_USE_INTERVALS=2 \
xargs /path/to/gnulib/build-aux/update-copyright
|
|
Provide a fallback for legacy SHA-1 messages, but do not
advertise shorter URLs anymore for data portability concerns.
This fixes a regression introduced in
commit 81a9c1b476987d845b340ab9013d26cf4487cb9a
("search: disable Message-ID compression in Xapian")
which ended up breaking thread-related endpoints for
large Message-IDs, as lookups on the SHA-1 message no longer
worked.
|
|
This doesn't seem needed for actual server use, but Plack tests
complain about it.
|
|
Consistently name mid_* functions as verbs.
|
|
Dereference header_obj only once when performance may be
critical, or simplify our code by calling "header" directly on
the Email::{Simple,MIME} object if not.
|
|
Commenting it in the From: line seems appropriate and
reduces compatibility problems in case a MUA cannot handle
trailing comments after the timestamp.
|
|
This redundantly quotes ">From" to prevent losing information,
as described by qmail.
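
The quoting rule is the mboxrd one: any line consisting of zero or more ">" followed by "From " gets one more ">" prepended, which makes the transformation reversible. A minimal Python sketch follows; the function names are hypothetical and this is not the project's Perl code.

```python
import re

# A line that is "From " preceded by zero or more ">" characters.
_FROM_LINE = re.compile(rb"^(>*From )", re.MULTILINE)
_QUOTED_FROM_LINE = re.compile(rb"^>(>*From )", re.MULTILINE)

def mboxrd_escape(body: bytes) -> bytes:
    """Prepend one ">" to every "From " line, including already quoted ones."""
    return _FROM_LINE.sub(rb">\1", body)

def mboxrd_unescape(body: bytes) -> bytes:
    """Strip exactly one ">" from every quoted "From " line."""
    return _QUOTED_FROM_LINE.sub(rb"\1", body)
```

Because an existing ">From " line becomes ">>From ", unescaping restores the original text exactly, which plain mboxo quoting cannot guarantee.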
|
|
This improves compatibility and allows individual messages
to be concatenated into an existing mbox without further
modifications. "git format-patch" does something similar
(but does not do "From " line escaping(!))
|
|
Some folks may want to view the mbox inline as a string of raw text,
when guessing URLs. Let them do this...
|
|
This should allow progressive rendering on the client and reduce
memory usage on the server. Unfortunately XML::Atom::SimpleFeed
does not yet support streaming, so we may not use it in the
future.
|
|
These are not necessary anymore.
|
|
Mboxes may be huge, so only support downloading gzipped mboxes
to save bandwidth and to get free checksumming.
Streaming output means we should not be wasting too much memory
on this unless the chosen server sucks.
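
A streaming gzip encoder only needs to hold the compressor state, not the whole mbox. The Python sketch below illustrates this with the standard `zlib` module; `gzip_stream` is a hypothetical name and the code is not taken from the project.

```python
import zlib
from typing import Iterable, Iterator

def gzip_stream(chunks: Iterable[bytes]) -> Iterator[bytes]:
    """Compress an iterable of byte chunks into a gzip stream, lazily."""
    # wbits=31 selects the gzip container, whose trailer carries a CRC-32
    # and the uncompressed length (the "free checksumming").
    z = zlib.compressobj(wbits=31)
    for chunk in chunks:
        out = z.compress(chunk)
        if out:              # the compressor may buffer and return nothing yet
            yield out
    yield z.flush()          # emit remaining data plus the gzip trailer
```

A server can pass the resulting iterator straight to the client; memory use stays bounded regardless of how many messages are produced.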
|
|
Since mbox is usually downloaded, support fetching infinitely large
responses via streaming.
|
|
Some folks may not want to download and install Perl code like
ssoma, so allow downloading an mbox containing the entire
thread.
|