Date | Commit message |
|
InboxWritable should only set $v2w->{parallel} if the $parallel
flag is defined to 0 or 1. We want indexing a new inbox to
utilize SMP, just like --reindex.
-index once again allows -j0/--jobs=0 to force single-process
use, and we'll be ensuring that works in tests to maintain
performance on small systems.
Fixes: 61a2fff5b34a3e32 ("admin: move index_inbox over")
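The rule above can be sketched as follows. This is only an illustration: `set_parallel' is a hypothetical helper and `$v2w' is a plain hash standing in for the real writer object.

```perl
use strict;
use warnings;

# Hypothetical helper: only override {parallel} when the caller
# passed an explicit 0 or 1; undef leaves the writer's own default
# untouched so it can still use SMP.
sub set_parallel {
	my ($v2w, $parallel) = @_;
	$v2w->{parallel} = $parallel if defined $parallel;
	$v2w;
}

my $default = set_parallel({ parallel => 1 }, undef); # default kept
my $single = set_parallel({ parallel => 1 }, 0); # e.g. -j0
print "default=$default->{parallel} single=$single->{parallel}\n";
```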
|
|
We'll continue to favor simpler data models that can be
used directly rather than wasting time and memory with
accessor APIs.
The ->from, ->to, ->cc, ->mid, ->subject, ->references methods can
all be trivially replaced by hash lookups since all their values
are stored in doc_data. Most remaining callers of those methods
were test cases, anyways.
->from_name is only used in the PSGI code, so we can just
use ->psgi_cull to take care of populating the {from_name}
field.
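A minimal sketch of the difference, using a hypothetical hash-based object rather than the real PublicInbox::Smsg class (the package name and values are made up):

```perl
use strict;
use warnings;

# Hypothetical stand-in for a message summary object; the field
# names follow the commit message, the values are invented.
my $smsg = bless {
	subject => 'hello world',
	from => 'Alice <alice@example.com>',
	mid => '20200601@example.com',
}, 'My::Smsg';

# An accessor call such as $smsg->subject is no longer needed;
# with the data stored directly in the hash, a lookup is enough.
my $subject = $smsg->{subject};
my $mid = $smsg->{mid};
print "$subject <$mid>\n";
```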
|
|
To further simplify callers and avoid embarrassing memory
explosions[1], we can finally eliminate this method in
favor of smsg_eml.
[1] commit 7d02b9e64455831d3bda20cd2e64e0c15dc07df5
("view: stop storing all MIME objects on large threads")
fixed a huge memory blowup.
|
|
This will eventually replace the __hdr() calling methods and
eradicate {mime} usage from Smsg. For now, we can eliminate
PublicInbox::Smsg->new since most callers already rely on an
open `bless' to avoid the old {mime} arg.
|
|
gitweb does the same with $GIT_DIR/description and gitweb.owner.
Allowing UTF-8 description should not cause problems when used
in responses to the NNTP "LIST NEWSGROUPS" request, either,
since RFC 3977 section 7.6.6 recommends the description be UTF-8
(but does not require it).
Link: https://public-inbox.org/meta/20200528151216.l7vmnmrs4ojw372g@sourcephile.fr/
|
|
There is obviously a typo here, so fix it and add a test
case to guard against future regressions.
Fixes: 74a3206babe0572a ("mda: support multiple List-ID matches")
|
|
Offering links to download 0-byte files is useless. We could
waste memory by preserving $eml->{bdy} during iteration, but
offering attachments of type "multipart" is not very useful,
as users are usually interested in decoded attachments or
the entire raw message.
Fixes: e60231148eb604a3 ("descend into message/(rfc822|news|global) parts")
|
|
This test may still run against ancient versions of Email::MIME
for comparisons.
|
|
I missed this instance of file slurping into an Email::MIME-like
object the other week when tearing Email::MIME usage out.
|
|
To avoid confusing future readers and users, recommend
PublicInbox::Eml in our Import POD and refer to PublicInbox::Eml
comments at the top of PublicInbox::MIME.
mime_load() is now confined to t/eml.t, since we won't be using
it anywhere else in our tests.
|
|
Email::MIME never supported this properly, but there are real
instances of forwarded messages as message/rfc822 attachments.
message/news is a legacy thing which we'll see in archives, and
message/global appears to be the new thing.
gmime also supports message/rfc2822, so we'll support it anyways
despite lacking other evidence of its existence.
Existing attachments remain downloadable as a whole message,
but individual attachments of subparts are now downloadable
and can be displayed in HTML, too.
Furthermore, ensure Xapian can now search for common headers
inside those messages as well as the message bodies.
|
|
We'll be adding support to descend into message/rfc822 (and
legacy message/news) attachments. First, we must ensure
existing message/rfc822 attachments can be downloaded and remain
downloadable in future commits.
|
|
The old name may be confused with "Content-ID" as described in
RFC 2392, so use an alternate name to avoid confusing future
readers.
|
|
We don't have to worry about compatibility with old
installations of Email::MIME::ContentType any longer,
so save some space.
|
|
Although the lazy loading changes were correct, the code
was still using PublicInbox::MIME as a fixed class. Use
the `$cls' variable from the loop.
Favor ->subparts over ->parts, too, since ->parts is
discouraged by the Email::MIME manpage and not implemented for
Eml.
|
|
They're still part of our internal API at this point, but
reusing the same names as those used by postfix makes sense for
now to reduce cognitive overheads of learning new things.
There's no "mime_parts_limit", but the name is consistent
with "mime_nesting_limit".
|
|
While our header processing is more efficient than
Email::*::Header, capping the maximum size for a `m//g' match
still limits memory growth on a header we care for.
Use the same limit as postfix (header_size_limit=102400), since
messages fetched via git/HTTP/NNTP/etc can bypass MTA limits.
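As a rough sketch of the idea (not the actual PublicInbox::Eml code), a header lookup can restrict its `m//g' scan to the first 102400 bytes. The helper name `header_values' and the sample data are assumptions for illustration.

```perl
use strict;
use warnings;

# Same value as postfix's header_size_limit, per the commit message.
my $HEADER_SIZE_LIMIT = 102400;

# Hypothetical helper: collect all values of one header, scanning
# only the first $HEADER_SIZE_LIMIT bytes of the raw header block.
sub header_values {
	my ($hdr_ref, $name) = @_;
	my $capped = substr($$hdr_ref, 0, $HEADER_SIZE_LIMIT);
	my @vals = ($capped =~ /^\Q$name\E:[ \t]*([^\n]*)/gmi);
	@vals;
}

# An oversized junk header pushes the last Cc past the limit.
my $hdr = "To: a\@example.com\nCc: b\@example.com\n" .
	'X-Junk: ' . ('x' x 200000) . "\n" .
	"Cc: late\@example.com\n";
my @cc = header_values(\$hdr, 'Cc');
print scalar(@cc), " Cc value(s) seen\n";
```

Only the Cc header inside the capped region is returned; anything past the limit is ignored rather than matched.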
|
|
We no longer load or use Email::MIME outside of comparison
tests.
|
|
Since we're getting rid of Email::MIME, get rid of
Email::MIME::ContentType, too, since we may introduce
speedups down the line specific to our codebase.
|
|
PublicInbox::Eml has enough functionality to replace the
Email::MIME-based PublicInbox::MIME.
|
|
Email::MIME eats memory, wastes time parsing out all the
headers, and some problems can't be fixed without breaking
compatibility for other projects which depend on it.
Informal benchmarks show a ~2x improvement in general
stats gathering scripts and ~10% improvement in HTML
view rendering.
We also don't need the ability to create MIME messages, just
parse them and maybe drop an attachment.
While this isn't the zero-copy or streaming MIME parser of my
dreams, it's still an improvement in that it doesn't keep a
scalar copy of the raw body around along with subparts. It also
doesn't parse subparts up front, so it can also replace our uses
of Email::Simple.
|
|
This doesn't make any difference for most multipart
messages (or any single part messages). However,
this starts having space savings when parts start
nesting.
It also slightly simplifies callers.
|
|
We'll support both probabilistic matches via `l:' and boolean
matches via `lid:' for exact matches, similar to how both `m:'
and `mid:' are supported. Only text inside angle brackets (`<'
and `>') is supported, since I'm not sure if there's value in
searching on the optional phrases (which would require decoding
with ->header_str instead of ->header_raw).
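For illustration, extracting only the bracketed part of raw List-Id values might look like the sketch below. `list_ids' is a hypothetical helper, not the indexer's real code.

```perl
use strict;
use warnings;

# Hypothetical helper: return the identifiers found between `<' and
# `>' in raw List-Id values, ignoring any optional phrase before them.
sub list_ids {
	my (@raw) = @_;
	my @ids;
	for my $v (@raw) {
		push @ids, $1 while $v =~ /<[ \t]*(.+?)[ \t]*>/g;
	}
	@ids;
}

my @ids = list_ids('Example discussion <list.example.org>');
print "@ids\n";
```

Because the phrase ("Example discussion") is skipped entirely, no MIME header decoding is needed for this step.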
|
|
Perl 5.10.1 would warn about implicit assignment to @_ by
split(). So favor the documented method of using `tr'
to count lines.
Fixes: b5ddcb3352ef31ae ("index: support --compact / -c on command-line")
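A small sketch of the `tr' idiom, using a made-up buffer:

```perl
use strict;
use warnings;

my $buf = "one\ntwo\nthree\n";

# tr/// with an empty replacement list and no /d modifier leaves
# the string untouched and returns the number of matched characters.
my $nr_lines = ($buf =~ tr/\n//);

# The pattern this avoids is split in scalar context, e.g.:
#   my $nr = scalar(split(/\n/, $buf));
print "$nr_lines\n";
```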
|
|
Current versions of Perl don't warn when vec() is given `undef'
as its first arg, but Perl 5.10.1 does, at least.
Fixes: c7b4cbdadf3116a0 ("t/httpd-corner: improve reliability and diagnostics")
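One way to stay quiet on older perls is to initialize the bit vector before touching it. This is only a sketch, and the descriptor number is an arbitrary assumption:

```perl
use strict;
use warnings;

# Start from an empty string rather than undef, so vec() never
# sees an undefined first argument.
my $rvec = '';
my $fd = 3; # hypothetical file descriptor number
vec($rvec, $fd, 1) = 1;
print "bit set\n" if vec($rvec, $fd, 1);
```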
|
|
It's likely we'll replace Email::Simple using our Email::MIME
alternative/replacement, as well. So reduce the API surface we
interact with and make it easier to swap implementations.
|
|
mime_from_path is designed to fail gracefully in busy Maildirs
whereas mime_load was made for loading files from a work tree.
|
|
Replace them with .eml files generated with the help of
Email::MIME, but without some extraneous and unnecessary
headers, and strip mime_load down to just loading files.
This will give us more freedom to experiment with other mail
libraries which may be more correct, better maintained, use
less memory and/or be faster than Email::MIME.
|
|
We'll use this to create, memoize, and reuse .eml files. This
will be used to reduce (and eventually eliminate) our dependency
on Email::MIME in tests.
|
|
Totally pointless to create an object only to convert
it back to a raw string for -mda input.
|
|
Instead, favor PublicInbox::MIME->new for non-attachment emails.
We may support alternatives to Email::MIME down the line.
We'll still keep Email::MIME->create to deal with attachments,
for now, but there's also a fair amount of test duplication
we should eliminate, later.
|
|
PublicInbox::MIME only supports ->new, and is only different
from Email::MIME for old versions of Email::MIME. In the
future, PublicInbox::MIME may not be a subclass of Email::MIME
at all.
|
|
I don't think this has been useful since we stopped
supporting ssoma in this test.
|
|
We need to detect FS errors and bail out on the test
if we can't open a file -nntpd was just writing to.
|
|
Since the advent of run_script(), we can rely on it to simplify
our test code. Changes like this will let us evolve the
internal API more easily while preserving stable CLI interfaces,
especially since we test the v2 path by default, now.
|
|
The `xqx' sub requires an absolute path for optional
commands.
Fixes: 6e07def560b211d9 ("testcommon: spawn-aware system() and qx[] workalikes")
|
|
In normal mail paths, we can rely on MTAs being configured with
reasonable limits in the -watch and -mda mail injection paths.
However, since the MTA is bypassed in a git-only delivery path, a
BOFH could inject a large message and DoS users attempting to mirror
a public-inbox.
This doesn't protect unindexed WWW interfaces from Email::MIME
memory explosions on v1 inboxes. Probably nobody cares about
unindexed WWW interfaces anymore, especially now that Xapian is
optional for indexing.
|
|
Barely noticeable on Linux, but this gives a 1-2% speedup
on a FreeBSD 11.3 VM and lets us use built-in redirects
rather than relying on /bin/sh.
|
|
We use BSD::Resource in other places, so there's no sense
in avoiding it, here.
|
|
Allowing ->init_bare to be used as a method saves some
keystrokes, and we can save a little bit of time on systems with
our vfork(2)-enabled spawn().
This also sets us up for future improvements where we can
avoid spawning a process at all.
|
|
The watchheader key supports only a single value. Supporting multiple
watchheader values was mentioned in discussion [1] of 8d3e3bd8 (doc:
explain publicinbox.<name>.watchheader, 2019-10-09), and it wasn't
clear if there was a need.
One scenario in which matching multiple headers would be convenient is
when someone wants to set up public-inbox archives for some small
projects but does _not_ want to run mailing lists for them, instead
allowing others to follow the project by any of the pull mechanisms.
Using a common underlying address, an address alias for each project
is configured via a third-party email provider, with messages for each
alias being exposed as a separate public-inbox archive. In this
setup, messages for an inbox cannot be selected by a List-ID header
but can be identified by the inbox's address in either the To or Cc
header.
To support such a use case, update the watchheader handling to
consider multiple values, accepting a message if it matches any value.
While selecting a message based on matching _any_ rather than _all_
values is motivated by the above scenario, it's worth noting that the
"any" behavior is consistent with how multiple listid config values
are handled.
[1] https://public-inbox.org/meta/20191010085118.r3amey4cayazfycb@dcvr/
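The "any" semantics can be sketched like this. The data layout and the `watch_match' helper are assumptions for illustration and do not reflect the actual -watch internals.

```perl
use strict;
use warnings;

# Hypothetical parsed form of several watchheader entries: the same
# alias address is looked for in both To and Cc.
my $addr = 'proj-a@example.com';
my @watch = (
	[ 'To', qr/\Q$addr\E/i ],
	[ 'Cc', qr/\Q$addr\E/i ],
);

# Hypothetical header table for one incoming message.
my %hdr = (
	To => [ 'someone@example.net' ],
	Cc => [ 'Project A <proj-a@example.com>' ],
);

# Accept the message as soon as any configured header matches.
sub watch_match {
	my ($hdr, $watch) = @_;
	for my $w (@$watch) {
		my ($name, $re) = @$w;
		for my $v (@{$hdr->{$name} // []}) {
			return 1 if $v =~ $re;
		}
	}
	0;
}

print watch_match(\%hdr, \@watch) ? "accept\n" : "skip\n";
```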
|
|
|
|
It's unnecessary overhead for anything which does Email::MIME
parsing. It was never done for v2 indexing, even though v1->v2
conversions did NOT remove those From_ lines. There was never a
need to remove From_ lines in the v1 SearchIdx paths, either.
Hitting a /$INBOX_URL/$MSGID/T/ endpoint with an 18 message
thread reveals a ~0.5% speed improvement. This will become
more apparent when we have a faster MIME parser.
|
|
I did not know to use the return value of `do' back in the day.
There's probably no practical difference in these cases, but
`eval' is overkill for these uses and may hide actual errors.
We can get rid of a few redundant `scalar' ops and pass scalar
refs to Email::MIME->new to avoid copies in a few more places,
too.
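A sketch of using the return value of `do' for slurping, with a temporary file created only to keep the example self-contained:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write a small sample file to read back.
my ($tmpfh, $path) = tempfile(UNLINK => 1);
print $tmpfh "Subject: hi\n\nbody\n";
close $tmpfh or die "close: $!";

# `do' returns the value of its last statement, so the slurped
# contents can be assigned directly, and errors are not hidden
# the way they could be inside an `eval'.
my $raw = do {
	open my $fh, '<', $path or die "open $path: $!";
	local $/; # slurp mode
	<$fh>;
};
print length($raw), " bytes\n";
```

A reference such as `\$raw' can then be handed to a parser that accepts scalar refs, avoiding another copy of the buffer.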
|
|
It's probably common to have inboxes initially setup without
these files properly configured, so don't memoize at that stage.
|
|
There's nothing Maildir-specific about the function, so
`maildir_path_load' was a bad name. So give it a more
appropriate name and use it in our tests.
This saves us some code and inconsistency by reusing an
existing internal library routine in more places. We can drop
the "From_" line in some of our (formerly) mbox sample files.
|
|
We can rid ourselves of a layer of indirection by subclassing
PublicInbox::Smsg instead of using a container object to hold
each $smsg. Furthermore, the `{id}' vs. `{mid}' field name
confusion is eliminated.
This reduces the size of the $rootset passed to walk_thread by
around 15%, which is over 50K of memory when rendering a /$INBOX/
landing page.
|
|
Some of these tests just don't seem reliable enough with the
way we or Perl do portable signal handling.
|
|
The graceful-shutdown-on-PUT test is unreliable because we can't
rely on a FIFO as we do with the GET tests. So increase the
delay to 100ms since that seems enough on my system even with
CONFIG_HZ=100.
Add a timeout and backtrace to the $check_self sub to help with
further diagnostics while we're at it, too.
It would be nice if there were a portable syscall tracing
mechanism we could attach to the -httpd process to make the test
more deterministic...
|
|
I've observed FreeBSD 11.2 read(2) having one of three
behaviors after a failed write(2) on a socket:
1) returning number of bytes read
2) failing with ECONNRESET
3) returning with EOF
1) is the most common, and I've only seen 1) on Linux. It may
be possible to use SO_LINGER or shutdown(2) to ensure 1) always
happens, but SO_LINGER behavior seems inconsistent across OSes,
especially with non-blocking sockets.
Since these tests are corner-cases where we're dealing with
broken/malicious clients, let's continue spending the least
amount of syscalls protecting ourselves in the daemon and
instead make the client-side test code tolerate more socket
implementations.
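Client-side test code that tolerates all three outcomes could be shaped roughly like this. `read_outcome' is a hypothetical helper and not the code used in t/httpd-corner.t.

```perl
use strict;
use warnings;
use Errno qw(ECONNRESET);

# Hypothetical helper: classify what sysread did so a test can
# accept any of the three behaviors described above.
sub read_outcome {
	my ($sock) = @_;
	my $r = sysread($sock, my $buf, 4096);
	return 'data' if $r; # 1) some bytes were read
	return 'eof' if defined $r; # 3) zero bytes: EOF
	return 'reset' if $! == ECONNRESET; # 2) connection reset
	die "unexpected read error: $!";
}
```

A test would then assert that the outcome is one of `data', `eof' or `reset' instead of requiring a single behavior.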
|