Date | Commit message (Collapse) |
|
I didn't wait until September to do it, this year!
|
|
It's only used by us in public-inbox-watch, and maybe not
for long. It's in most installations because Plack pulls it
in though, but Plack is no longer required.
|
|
Most spawn and popen_rd callers die on failure to spawn,
anyways, and some are missing checks entirely. This saves
us a bunch of verbose error-checking code in callers.
This also makes popen_rd more consistent, since it already
dies on pipe creation failures.
|
|
There's a bunch of leftover "require" and "use" statements we no
longer need and can get rid of, along with some excessive
imports via "use".
IO::Handle usage isn't always obvious, so add comments
describing why a package loads it. Along the same lines,
document the tmpdir support as the reason we depend on
File::Temp 0.19, even though every Perl 5.10.1+ user has it.
While we're at it, favor "use" over "require", since it gives
us extra compile-time checking.
|
|
And update callers to use it, as it makes the code a bit cleaner.
Probably irrelevant, but it should be faster, too, as
"perl -I lib -w -MO=Deparse $FILE" shows REJECT() calls are
constant-folded.
|
|
This is distributed with Perl 5.10.1 and onwards, so it should
not be an installation burden for any users. I'm planning to
move away from tempdir() entirely and use File::Temp->newdir to
remove dependencies on END{} blocks.
|
|
InboxWritable::maildir_path_load exists and we may support
it for use with standalone scripts.
|
|
This also adds watchheader tests for -watch, which we never
had before :x
|
|
|
|
Given most folks have multiple mail accounts, there's no reason
we can't support multiple Maildirs.
|
|
We can drop some unnecessary imports now that we switched
to InboxWritable.
|
|
Knowing which message failed a spam check is tough when I have
many Maildirs and don't have a search indexing tool set up for
spam mail.
|
|
Clearly the AltId stuff was never tested for v2. Ensure
this tricky filter (which reuses Msgmap to avoid introducing
new serial numbers) doesn't trigger deadlocks in SQLite due
to opening a DB for writing multiple times.
I went through several iterations of this change before
going with this one, which is the least intrusive I could
find.
|
|
Remove redundant slashes while we're at it.
|
|
Unused since commit 6c2caa791bd5fbf5c4edb1a4a2c1807e527348a7
("watchmaildir: support v2 repositories")
|
|
Not sure what I was smoking when I originally wrote this code.
cf. https://public-inbox.org/meta/874li887mp.fsf@vuxu.org/
|
|
No need to reach into PublicInbox::Config internals and iterate
through the hashref by hand
|
|
This reuses some of the configuration from -watch, but remains
independent since some configurations will use -watch for some
inboxes and -mda for others.
The default remains "spamc" for -mda users so nothing changes
without explicit configuration.
Per-inbox configurations may also be supported in the future.
|
|
I suppose it's a bug or inconsistency that altid is write-only
and their deletions do not get reflected. But for now, we
do not set it when training spam so there's no window where
an invalid NNTP article number shows up.
This should solve the problem where there are massive gaps
in messages caused by spam training for ruby groups:
https://public-inbox.org/meta/20180307093754.GA27748@dcvr/
|
|
This will make it easier to as well as supporting future
Filter API users. It allows simplifying our ad-hoc
import_vger_from_mbox script.
|
|
Reduce the places where we have duplicate logic for discarding
unwanted headers.
|
|
This code will be shared with future mass-import tools.
|
|
While parallel processes improve import speed for initial
imports, they are probably not necessary for daily mail imports
via WatchMaildir and certainly not for public-inbox-init. This
saves some memory for daily use and even helps improve readability
of some subroutines by showing which methods they call remotely.
|
|
Unfortunately this gives up some minor performance tweaks we
made to avoid reforking import processes.
|
|
This allows us to share code for generating Message-IDs
between v1 and v2 repos.
For v1, this introduces a slight incompatibility in message
removal iff the original message lacked a Message-ID AND
the training request came from a message which did not
pass through the public-inbox:
The workaround for this would be to reuse the bad message from
the archive itself.
|
|
This can probably be moved to Import for code reuse.
|
|
This will allow WatchMaildir to use ->barrier operations instead
of reaching inside for nchg. This also ensures dumb HTTP
clients can see changes to V2 repos immediately.
|
|
It works around some bugs in older Email::MIME which we'll
find useful.
|
|
Hostnames can contain '-' and this allows public-inbox-watch(1)
to work on machines which generate Maildir files with '-' in
them.
|
|
Using update-copyrights from gnulib
While we're at it, use the SPDX identifier for AGPL-3.0+ to
ease mechanical processing.
|
|
This makes it easy to identify the reason for message removals.
|
|
We must not trigger future activity when initializing
a -watch shutdown.
|
|
We should make changes visible sooner, even during
lengthy scans.
|
|
This should be more reliable and safer as it'll ensure
existing fast-import instances are shut down properly.
|
|
We need to ensure new messages are being processed
fairly during full rescans, so have the ->scan subroutine
yield and reschedule itself. Additionally, having a
long-running task inside the signal handler is dangerous
and subject to reentrancy bugs.
Due to the limitations of the Filesys::Notify::Simple interface,
we cannot rely on multiplexing I/O interfaces (select, IO::Poll,
Danga::Socket, etc...) for this. Forking a separate process
was considered, but it is more expensive for a mostly-idle
process.
So, we use a variant of the "self-pipe trick" via inotify (or
whatever Filesys::Notify::Simple gives us). Instead of writing
to our own pipe, we write to a file in our own temporary
directory watched by Filesys::Notify::Simple to trigger events
in signal handlers.
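
The idea above can be illustrated with a minimal Python sketch. It is not
the project's Perl code: PollingWatcher is a hypothetical stand-in for
whatever directory watcher is in use, and trigger() and install() are
illustrative names. The signal handler only appends a byte to a file in a
private wake-up directory, and the main loop learns about it through the
same watcher it already uses for Maildirs.

```python
import os
import signal
import tempfile


class PollingWatcher:
    """Hypothetical stand-in for a directory watcher; reports changed paths."""

    def __init__(self, dirs):
        self.dirs = list(dirs)
        self.seen = self._snapshot()

    def _snapshot(self):
        # Record (mtime, size) for every entry in the watched directories.
        snap = {}
        for d in self.dirs:
            with os.scandir(d) as it:
                for entry in it:
                    st = entry.stat()
                    snap[entry.path] = (st.st_mtime_ns, st.st_size)
        return snap

    def poll(self):
        # Return paths that are new or whose (mtime, size) changed.
        now = self._snapshot()
        changed = sorted(p for p, sig in now.items() if self.seen.get(p) != sig)
        self.seen = now
        return changed


def trigger(wake_dir, name):
    # The "write to our own pipe" step, done on a watched file instead.
    # Appending changes the size, so the change is visible even when the
    # filesystem has coarse timestamps.
    path = os.path.join(wake_dir, name)
    with open(path, "ab") as fh:
        fh.write(b"\0")
    return path


def install(signum, wake_dir, name):
    # Keep the handler tiny: it only wakes the watcher; the long-running
    # work happens later in the main loop.
    signal.signal(signum, lambda *_: trigger(wake_dir, name))
```

A main loop would watch the wake-up directory alongside the Maildirs and
treat a change to the wake-up file as a request to start (or continue) a
scan, rather than doing the scan inside the handler.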
|
|
Otherwise the old watcher may run indefinitely
|
|
The RubyLang filter is strict about what messages it rejects, so
the spam learning path will not auto-train or remove messages
missing X-Mail-Count headers.
|
|
Unfortunately, it appears we have to reject this and instead add
support for filtering at View time(*), due to DKIM signatures in
messages from ruby-lang.org.
(*) which may not be worth it
|
|
It should be helpful to know what error happened.
|
|
Dovecot uses 'a'..'z' (lowercase) to designate keywords
in Maildir flags. This was preventing certain messages
from being marked as spam.
https://wiki2.dovecot.org/MailboxFormat/Maildir
|
|
We'll want to allow some degree of configuration for
various mailing lists.
|
|
We don't want to be triggering OOM or swapping on weaker
systems when we have dozens of inboxes as potential targets.
|
|
This should fix problems with multipart messages where
text/plain parts lack a header.
cf. git clone --mirror https://github.com/rjbs/Email-MIME.git
refs/pull/28/head
In the future, we may still introduce a streaming
interface to reduce memory usage on large emails.
|
|
If a message is spam in one mailbox, it is spam in all others a
particular user/group will care about.
|
|
We'll keep supporting "publicinboxlearn" indefinitely,
but "publicinboxwatch" is probably more appropriate
at the moment.
Noticed while writing documentation.
|
|
We need to pass the Inbox object to SearchIdx to get altid
mappings properly for incremental imports.
TODO: use the Inbox object in more places where it makes sense
to do so.
|
|
It would be nice to know about spamcheck failures.
|
|
Trashed messages and drafts are probably not intended for
importing, so do not import them. Dovecot uses extra flags via
lowercase letters, so we must support those (as that's the
server I use).
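
A rough Python sketch of the flag handling described above, not the
project's Perl code. The function names are hypothetical, and treating a
filename without a ":2," info section as flagless and importable is an
assumption made for illustration.

```python
import re

# ":2," introduces the Maildir flags. Uppercase letters are the standard
# flags (T = trashed, D = draft, S = seen, ...); lowercase a..z are the
# keyword flags Dovecot adds.
_INFO = re.compile(r":2,([A-Za-z]*)\Z")


def maildir_flags(filename):
    """Return the flag letters of a Maildir filename, or "" if none."""
    m = _INFO.search(filename)
    return m.group(1) if m else ""


def should_import(filename):
    """Skip trashed messages and drafts; lowercase keywords are accepted."""
    flags = maildir_flags(filename)
    return "T" not in flags and "D" not in flags
```

Accepting lowercase letters in the pattern is what keeps messages carrying
Dovecot keywords from being ignored by the flag check.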
|
|
Mailing lists I watch and mirror may not have the best spam
filtering, and an extra layer should not hurt.
|
|
We do not actually do spam checking, here; but will
do spam checking before adding a message in the future.
|