Date | Commit message |
|
InboxWritable should only set $v2w->{parallel} if the $parallel
flag is defined to 0 or 1. We want indexing a new inbox to
utilize SMP, just like --reindex.
-index once again allows -j0/--jobs=0 to force single-process
use, and we'll be ensuring that works in tests to maintain
performance on small systems.
Fixes: 61a2fff5b34a3e32 ("admin: move index_inbox over")
|
|
We forcibly stop git-log here, so erroring out on git-log close
failures is wrong since it sees SIGPIPE. Noticed while
reindexing a large v1 inbox for IMAP changes.
Fixes: b32b47fb12a3043d ("index: "git log" failures are fatal")
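
A minimal sketch of the situation, in Python rather than the project's
Perl. It assumes a POSIX system with `yes' on the PATH, standing in for a
long-running git-log. The reader stops early, the child dies from SIGPIPE,
and the close status must be treated as expected rather than fatal. The
function names are hypothetical and not taken from the codebase.

```python
import signal
import subprocess


def read_first_line(cmd):
    """Read one line from cmd's stdout, then stop the child by closing the pipe."""
    # Popen restores SIGPIPE to its default action in the child
    # (restore_signals=True by default), so a write to the closed
    # pipe kills the child instead of raising an error inside it.
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE)
    first = proc.stdout.readline()
    proc.stdout.close()  # forcibly stop the producer
    returncode = proc.wait()
    return first, returncode


def close_status_ok(returncode):
    """Treat death by SIGPIPE as success when we stopped the child on purpose."""
    return returncode in (0, -signal.SIGPIPE)
```

A caller that only wants the first line would check
close_status_ok(returncode) instead of failing on any non-zero status.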
|
|
It's no longer necessary to populate the smsg->{mid} field now
that ->smsg_eml calls smsg->populate in rare cases where the
smsg did not originate from SQLite.
|
|
We'll continue to favor simpler data models that can be
used directly rather than wasting time and memory with
accessor APIs.
The ->from, ->to, ->cc, ->mid, ->subject, ->references methods can
all be trivially replaced by hash lookups since all their values
are stored in doc_data. Most remaining callers of those methods
were test cases, anyways.
->from_name is only used in the PSGI code, so we can just
use ->psgi_cull to take care of populating the {from_name}
field.
|
|
They're stored directly in Xapian and SQLite document data.
NNTP accesses those fields directly to avoid method invocation
overhead so there's no reason to waste several kilobytes for
each sub.
|
|
We'll let $smsg->populate take care of everything all at once
without hanging onto the header object for too long.
|
|
PublicInbox::Smsg::date remains the only exception which
requires a subroutine call here, so we'll have a branch
just for that.
|
|
To further simplify callers and avoid embarrassing memory
explosions[1], we can finally eliminate this method in
favor of smsg_eml.
[1] commit 7d02b9e64455831d3bda20cd2e64e0c15dc07df5
("view: stop storing all MIME objects on large threads")
fixed a huge memory blowup.
|
|
None of our current callers care about the size of the blob
we're retrieving, so stop wasting stack space and code for
it.
|
|
We'll just use `bless' like most current PublicInbox::Smsg callers.
|
|
This will eventually replace the __hdr() calling methods and
eradicate {mime} usage from Smsg. For now, we can eliminate
PublicInbox::Smsg->new since most callers already rely on an
open `bless' to avoid the old {mime} arg.
|
|
First, prefer the leaner "parent" module over the heavy "base"
module to establish ISA relationships, since "base" is only
needed for "fields".
The "//" and "//=" operators allow us to simplify our code and fix
minor bugs where a value of "0" was disallowed. Yes, we'll
allow "0" as an email address, too, since some twisted BOFH
could theoretically use it as a local user name.
Going forward, we'll also be avoiding "use warnings" and
instead relying on `-w' in the shebang.
|
|
No point in attempting to print the value of an undefined
variable if there's a bug. Fortunately, (AFAIK) we've never hit
that bug check :>
|
|
We can simplify WwwAtomStream callbacks by performing ->smsg_eml
calls in the `feed_entry' sub itself. This simplifies callers,
by reducing the number of places which can load an Eml object
into memory.
|
|
The goal of this is to eventually remove the $smsg->{mime} field
which is easy to misuse and can cause memory explosions that
necessitated fixes like commit 7d02b9e64455831d
("view: stop storing all MIME objects on large threads").
|
|
Assisted by commit a73957b5b05f2a00f7a85353b1658b6d8cde05ae
("testcommon: speed up wait_for_tail() on GNU/Linux")
Fixes: 846161e3d1207d59 ("treat $INBOX_DIR/description and gitweb.owner as UTF-8")
|
|
Somewhat recent versions of GNU tail(1) use inotify(7) on Linux;
so don't penalize hackers using TAIL='tail -F' to run their tests
with extra delays.
Ironically, we still need to busy loop on /proc/$TAIL_PID/{fd,fdinfo}
since inotify doesn't seem to support procfs.
|
|
gitweb does the same with $GIT_DIR/description and gitweb.owner.
Allowing UTF-8 description should not cause problems when used
in responses to the NNTP "LIST NEWSGROUPS" request, either,
since RFC 3977 section 7.6.6 recommends the description be UTF-8
(but does not require it).
Link: https://public-inbox.org/meta/20200528151216.l7vmnmrs4ojw372g@sourcephile.fr/
|
|
I found myself wanting to remove a message from all inboxes
while working on a test case in another branch. I figure this
could also be useful for globally removing messages which are in
the grey area or too big for spamc.
|
|
There is obviously a typo here, so fix it and add a test
case to guard against future regressions.
Fixes: 74a3206babe0572a ("mda: support multiple List-ID matches")
|
|
This prevents $TMPDIR from being littered with *-journal files
after running the test suite.
This shouldn't cause excessive memory use since $v2w->{mm_tmp}
doesn't see big transactions. There's no need to worry about
data loss, here, either, since this is just a temporary clone
we've even disabled fsync on.
Fixes: 78888d36fb80889f ("msgmap: use TRUNCATE for journal_mode, for now")
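
A minimal sketch of the idea, in Python's sqlite3 module rather than the
project's Perl. It assumes the temporary DB switches to an in-memory
rollback journal (journal_mode = MEMORY); the message above does not name
the mode, so treat that as a guess. The table layout and function names
are hypothetical.

```python
import os
import sqlite3
import tempfile


def open_tmp_db(path):
    """Open a throwaway SQLite DB that leaves no on-disk rollback journal."""
    conn = sqlite3.connect(path)
    # Assumption: MEMORY keeps the rollback journal in RAM instead of
    # a "<path>-journal" file next to the database.
    mode = conn.execute("PRAGMA journal_mode = MEMORY").fetchone()[0]
    # Durability does not matter for a temporary clone, so skip fsync.
    conn.execute("PRAGMA synchronous = OFF")
    return conn, mode


def demo():
    """Run one small transaction and list any leftover *-journal files."""
    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, "mm_tmp.sqlite3")
        conn, mode = open_tmp_db(path)
        conn.execute("CREATE TABLE msgmap (num INTEGER PRIMARY KEY, mid TEXT)")
        conn.execute("INSERT INTO msgmap (mid) VALUES (?)", ("a@example.com",))
        conn.commit()
        leftovers = [f for f in os.listdir(tmpdir) if f.endswith("-journal")]
        conn.close()
        return mode, leftovers
```

The journal then costs memory in proportion to the transaction size, which
is why this only suits DBs that never see big transactions.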
|
|
Offering links to download 0-byte files is useless. We could
waste memory by preserving $eml->{bdy} during iteration, but
offering attachments of type "multipart" is not very useful,
as users are usually interested in decoded attachments or
the entire raw message.
Fixes: e60231148eb604a3 ("descend into message/(rfc822|news|global) parts")
|
|
We don't need to load Xapian until we have a directory
which looks like a shard, otherwise we're wasting cycles
and memory when running short-lived processes.
|
|
This test may still run against ancient versions of Email::MIME
for comparisons.
|
|
Older versions of Inline (e.g. 0.53 in CentOS 7) did not accept
the `directory' parameter, so use conditional assignment to set
a default value on $ENV{PERL_INLINE_DIRECTORY}, instead.
|
|
These aren't really supported and will probably be replaced with
better tools, but PublicInbox::Eml should be readily available
to anybody who already has our source tree.
|
|
The <EXPR> perlop, `readline', and `read' functions will all
retry on EINTR, so there's no need to retry and loop ourselves
with `sysread'.
|
|
I missed this instance of file slurping into an Email::MIME-like
object the other week when tearing Email::MIME usage out.
|
|
Upon rereading the code, it wasn't immediately obvious to
me why we didn't check for errors with `close($w)' instead
of relying on `undef'. So add a comment for the benefit of
future readers.
|
|
In our inbox-writing code paths, ->getline as an OO method may
be confused with the various definitions of `getline' used by
the PSGI interface. It's also easier to do: "perldoc -f readline"
than to figure out which class "->getline" belongs to (IO::Handle)
and look up documentation for that.
->print is less confusing than the "readline" vs "getline"
mismatch, but we can still make it clear we're using a real
file handle and not a mock interface.
Finally, functions are a bit faster than their OO counterparts.
|
|
On powerful systems, having this option is preferable to
XAPIAN_FLUSH_THRESHOLD due to lock granularity and contention
with other processes (-learn, -mda, -watch).
Setting XAPIAN_FLUSH_THRESHOLD can cause -learn, -mda, and
-watch to get stuck until an epoch is completely processed.
|
|
`--reindex' involves chomping down lots of mail, so it benefits
from parallelization just like the initial indexing. It's
also a bit surprising to specify `--jobs/-j' without parallel
processes, so ensure we turn on parallelization there, too.
We can simplify initialization here, as well, since neither
`eval' nor `V2Writable->new' should be in this code.
|
|
To avoid confusing future readers and users, recommend
PublicInbox::Eml in our Import POD and refer to PublicInbox::Eml
comments at the top of PublicInbox::MIME.
mime_load() is confined to t/eml.t, since we won't be using
it anywhere else in our tests.
|
|
Email::MIME never supported this properly, but there are real
instances of forwarded messages as message/rfc822 attachments.
message/news is a legacy thing which we'll see in archives, and
message/global appears to be the new thing.
gmime also supports message/rfc2822, so we'll support it anyways
despite lacking other evidence of its existence.
Existing attachments remain downloadable as a whole message,
but individual attachments of subparts are now downloadable
and can be displayed in HTML, too.
Furthermore, ensure Xapian can now search for common headers
inside those messages as well as the message bodies.
|
|
We'll be adding support to descend into message/rfc822 (and
legacy message/news) attachments. First, we must ensure
existing message/rfc822 attachments can be downloaded and remain
downloadable in future commits.
|
|
However, we'll always have a newline before subsequent
attachment links after the first.
For the initial part of a multipart message, this regression
appeared in 1.5.0, but the display was overly clumped in prior
releases, too.
Fixes: 453dee4881a9c764 ("msg_iter: pass $idx as a scalar, not array")
|
|
PublicInbox::Config.pm::_fill() assumes that address is an array.
Therefore when handling an unset address use an array containing a
single string, instead of a single string.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
|
|
It avoids I/O on the directory itself, which could prolong
the lifetime of the storage device.
|
|
This ought to prevent cargo-culting the cache_size PRAGMA
into smaller SQLite DBs we might use.
|
|
The old name may be confused with "Content-ID" as described in
RFC 2392, so use an alternate name to avoid confusing future
readers.
|
|
This allows maintainers to easily check limits against the
contents of existing inboxes. This script covers most of
the new limits enforced by PublicInbox::Eml.
Usage is similar to most xt/*.t scripts:
GIANT_INBOX_DIR=/path/to/inbox prove -bvw xt/eml_check_limits.t
Setting `TEST_CLASS=PublicInbox::MIME' allows us to check
performance and memory use against the old subclass of
Email::MIME.
|
|
Despite several memory reductions and pure Perl performance
improvements, Inline::C spawn() still gives us a noticeable
performance boost.
More user-oriented command-line programs are likely coming,
setting PERL_INLINE_DIRECTORY is annoying to users, and so is
poor performance. So allow users to opt in to using our
Inline::C code once by creating a `~/.cache/public-inbox/inline-c'
directory.
XDG_CACHE_HOME is respected to override the location of ~/.cache
independent of HOME, according to
https://specifications.freedesktop.org/basedir-spec/0.6/ar01s03.html
v2: use "/nonexistent" if HOME is undefined, since that's
the home of the "nobody" user on both FreeBSD and Debian.
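
The lookup order described above can be sketched as follows, in Python
rather than the project's Perl. The function names are hypothetical;
only the directory layout and the "/nonexistent" fallback come from the
text. An empty XDG_CACHE_HOME is treated as unset, following the basedir
spec.

```python
import os
import posixpath


def inline_c_candidate(env):
    """Return the path that would enable Inline::C if it exists."""
    cache_home = env.get("XDG_CACHE_HOME")
    if not cache_home:
        # Unset or empty: fall back to $HOME/.cache, using
        # "/nonexistent" when HOME itself is undefined.
        home = env.get("HOME")
        if home is None:
            home = "/nonexistent"
        cache_home = posixpath.join(home, ".cache")
    return posixpath.join(cache_home, "public-inbox", "inline-c")


def inline_c_enabled(env=None):
    """Opt-in check: the user creates the directory once to enable it."""
    env = os.environ if env is None else env
    return os.path.isdir(inline_c_candidate(env))
```

Nothing is created on the user's behalf; the directory's existence is the
opt-in signal.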
|
|
We don't have to worry about compatibility with old
installations of Email::MIME::ContentType any longer,
so save some space.
|
|
Although the lazy loading changes were correct, the code
was still using PublicInbox::MIME as a fixed class. Use
the `$cls' variable from the loop.
Favor ->subparts over ->parts, too, since ->parts is
discouraged by the Email::MIME manpage and not implemented for
Eml.
|
|
And just treat it as a non-fatal nag when checking the rest of the
codebase. Calling it "check-manifest" as a `make' target
preserves the old behavior, which causes the check to fail
if a file were added to the worktree without changing the
MANIFEST.
|
|
They're still part of our internal API at this point, but
reusing the same names as those used by postfix makes sense for
now to reduce cognitive overheads of learning new things.
There's no "mime_parts_limit", but the name is consistent
with "mime_nesting_limit".
|
|
While our header processing is more efficient than
Email::*::Header, capping the maximum size for a `m//g' match
still limits memory growth on a header we care for.
Use the same limit as postfix (header_size_limit=102400), since
messages fetched via git/HTTP/NNTP/etc can bypass MTA limits.
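
A minimal sketch of capping input before a global match, in Python rather
than the project's Perl. The 102400 figure comes from the text above; the
function name, the regex, and the choice to truncate the raw header block
are illustrative assumptions, not the actual implementation.

```python
import re

HEADER_SIZE_LIMIT = 102400  # same value as postfix header_size_limit


def header_values(raw_header, name, limit=HEADER_SIZE_LIMIT):
    """Return values of header `name`, scanning at most `limit` bytes."""
    capped = raw_header[:limit]  # bound the input seen by the regex
    pattern = re.compile(
        # field name, colon, then the value plus any folded
        # continuation lines (lines starting with space or tab)
        rb"^" + re.escape(name) + rb":[ \t]*(.*(?:\n[ \t].*)*)",
        re.IGNORECASE | re.MULTILINE,
    )
    return [m.group(1) for m in pattern.finditer(capped)]
```

Anything past the limit is never matched, so an oversized header fetched
over git/HTTP/NNTP cannot grow the match results without bound.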
|
|
I'm not sure it's necessary, since "mid:" is similarly
undocumented. Also, "t:", "c:", "f:" don't offer boolean
analogues for exact matches on To/Cc/From headers, despite
having similar tokens as List-Id inside angle brackets.
|