path: root/script
Date Commit message
2018-02-07 update copyrights for 2018
Using update-copyrights from gnulib. While we're at it, use the SPDX identifier for AGPL-3.0+ to ease mechanical processing.
2017-11-16 learn: use "spam" as subject for removal commits (part #2)
We need to use the correct subject when doing global scanning, too. In fact, the per-recipient spam training path is entirely redundant at this point.
2017-11-16 learn: use "spam" as subject for removal commits
Sometimes an email is an innocent removal "rm" for a misdirected, off-topic post, while most removed messages are "spam". Allow anybody to look at history and easily distinguish the reason for removing the message.
2017-06-26 watch: use "self-inotify-tempfile trick" for quit
This should be more reliable and safer as it'll ensure existing fast-import instances are shut down properly.
2017-06-26 watch: improve fairness during full rescans
We need to ensure new messages are being processed fairly during full rescans, so have the ->scan subroutine yield and reschedule itself. Additionally, having a long-running task inside the signal handler is dangerous and subject to reentrancy bugs. Due to the limitations of the Filesys::Notify::Simple interface, we cannot rely on multiplexing I/O interfaces (select, IO::Poll, Danga::Socket, etc...) for this. Forking a separate process was considered, but it is more expensive for a mostly-idle process. So, we use a variant of the "self-pipe trick" via inotify (or whatever Filesys::Notify::Simple gives us). Instead of writing to our own pipe, we write to a file in our own temporary directory watched by Filesys::Notify::Simple to trigger events in signal handlers.
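For readers unfamiliar with the classic "self-pipe trick" that this commit adapts, here is a minimal Python sketch of the original form. It is not the project's Perl code: the commit writes to a file in a watched temporary directory instead of a pipe, and the names `on_signal`, `install` and `wait_for_wakeup` are hypothetical.

```python
import os
import select
import signal

# Both ends are non-blocking so the handler can never stall.
r, w = os.pipe()
os.set_blocking(r, False)
os.set_blocking(w, False)


def on_signal(signum, frame):
    """Do as little as possible in the handler: just wake the main loop."""
    try:
        os.write(w, b"\0")
    except BlockingIOError:
        pass  # pipe is full, so a wakeup is already pending


def install(signum=signal.SIGUSR1):
    """Register the handler (must be called from the main thread)."""
    signal.signal(signum, on_signal)


def wait_for_wakeup(timeout):
    """Return True if a signal arrived within `timeout` seconds."""
    ready, _, _ = select.select([r], [], [], timeout)
    if not ready:
        return False
    try:
        while os.read(r, 4096):  # drain every pending wakeup byte
            pass
    except BlockingIOError:
        pass  # pipe is empty again
    return True
```

The long-running work then happens in the main loop after `wait_for_wakeup` returns, outside the signal handler, which is the reentrancy concern the commit message describes.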
2017-06-26 watch: ensure HUP causes the scanner to be reloaded
Otherwise the old watcher may run indefinitely.
2017-04-05 learn: scan all inboxes when learning spam
This matches the behavior of the -watch daemon since 6d534038285ddd760709ba76ea007f9108200097 ("watch: watchspam affects all configured inboxes")
2017-01-19 learn: implement "rm" only functionality
Do not consider this interface stable, but I just needed a way to remove mis-imported multipart messages so public-inbox-watch could pick them up again from my Maildir.
2017-01-10 introduce PublicInbox::MIME wrapper class
This should fix problems with multipart messages where text/plain parts lack a header.
cf. git clone --mirror https://github.com/rjbs/Email-MIME.git refs/pull/28/head
In the future, we may still introduce a streaming interface to reduce memory usage on large emails.
2016-12-12 init: preserve permissions of existing config file
This matches git-config(1) behavior, and implied user intent when it comes to programmatically editing files.
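One common way to get this behavior when a file is rewritten through a temporary file is to copy the mode bits before renaming. The Python sketch below illustrates that idea under stated assumptions; it is not the code used by public-inbox-init, and `rewrite_preserving_mode` is a hypothetical name.

```python
import os
import stat
import tempfile


def rewrite_preserving_mode(path: str, new_text: str) -> None:
    """Atomically replace `path`, keeping the old file's permission bits."""
    directory = os.path.dirname(os.path.abspath(path))
    # mkstemp creates the file with mode 0600 in the same directory,
    # so the final rename stays on one filesystem.
    fd, tmp = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w") as fh:
            fh.write(new_text)
        try:
            os.chmod(tmp, stat.S_IMODE(os.stat(path).st_mode))
        except FileNotFoundError:
            pass  # no existing file: keep the restrictive default
        os.replace(tmp, path)
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise
```

Without the `chmod` step, a config file the user had made group-readable would silently become private (or the reverse) after every edit.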
2016-11-18 index: allow indexing before configuration
One may build the initial index on a powerful host and transfer it to a weaker one for incremental indexing. Thus there is no requirement to have a configured public-inbox for building the index unless a user needs altid support or some such.
2016-09-02 init: enable pack bitmaps by default
We want to encourage users to serve repositories. So enable bitmaps by default so performance suffers less with smart HTTP.
2016-08-12 public-inbox-watch: support reloading config with SIGHUP
This can be useful for adding new lists, as restarting is expensive (but still non-lossy).
2016-08-11 search: support alt-ID for mapping legacy serial numbers
For some existing mailing list archives, messages are identified by serial number (such as NNTP article numbers in gmane). Those links may become inaccessible (as is the current case for gmane), so ensure users can still search based on old serial numbers.

Now, I run the following periodically to get article numbers from gmane (while news.gmane.org remains):

    NNTPSERVER=news.gmane.org
    export NNTPSERVER
    GROUP=gmane.comp.version-control.git
    perl -I lib scripts/xhdr-num2mid $GROUP --msgmap=/path/to/gmane.sqlite3

(I might integrate this further with public-inbox-* scripts one day.)

My ~/.public-inbox/config has an added "altid" snippet which now looks like this:

    [publicinbox "git"]
        address = git@vger.kernel.org
        mainrepo = /path/to/git.vger.git
        newsgroup = inbox.comp.version-control.git
        ; relative pathnames expand to $mainrepo/public-inbox/$file
        altid = serial:gmane:file=gmane.sqlite3

And run "public-inbox-index --reindex /path/to/git.vger.git" periodically. This ought to allow searching for "gmane:12345" to work for Xapian-enabled instances.

Disclaimer: while public-inbox supports NNTP and stable article serial numbers, use of those for public links is discouraged since it encourages centralization.
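The altid value in the config above appears to have the shape type:prefix:file=name, with relative file names resolved under $mainrepo/public-inbox/. The Python sketch below parses only that one shape. It is an illustration based on the commit message alone, not the project's Perl parser; `parse_altid` is a hypothetical name, and any parameter other than `file` is rejected because the message does not describe others.

```python
import posixpath


def parse_altid(spec: str, mainrepo: str) -> dict:
    """Split 'type:prefix:file=name' and resolve a relative file name."""
    kind, prefix, param = spec.split(":", 2)
    key, value = param.split("=", 1)
    if key != "file":
        # Assumption: only the 'file' parameter shown in the commit is handled.
        raise ValueError("unsupported altid parameter: " + key)
    if not posixpath.isabs(value):
        # Relative pathnames expand to $mainrepo/public-inbox/$file.
        value = posixpath.join(mainrepo, "public-inbox", value)
    return {"type": kind, "prefix": prefix, "file": value}
```

With the example config, the `gmane` prefix is what makes a search term such as "gmane:12345" meaningful.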
2016-07-31 search: support reindexing existing search indices
This should make tweaking the way we search more efficient by allowing us to avoid destroying the index every time we want to change something. We also give priority to incremental indexing via public-inbox-{watch,mda} and have manual invocations of public-inbox-index perform batch updates while releasing ssoma.lock.
2016-07-26 mda: always call Import::done, even on dupes
We don't want to leave fast_import_crash_* dumps around on duplicates.
2016-07-26 learn: fix uninitialized variable
Oops :x
2016-07-03 examples: remove X-Forwarded-For mentions
We don't need to care about client IPs anywhere.
2016-07-02 www: remove Plack::Request dependency entirely
Lighter and ever-so-slightly faster! Most importantly, this won't do non-obvious stuff behind our backs like trying to parse a POST request body for a query string param.
2016-06-26 mda: drop leading "From " lines again
Oops... While we're at it, drop blank lines before the "From ", too, since it could happen.
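As a rough illustration of the cleanup described here, the Python sketch below strips any leading blank lines and mbox-style "From " separator lines before the real headers. It is an assumption about the intended behavior, not the project's Perl code, and `strip_leading_from` is a hypothetical name.

```python
def strip_leading_from(raw: bytes) -> bytes:
    """Drop blank lines and mbox "From " lines that precede the headers."""
    lines = raw.splitlines(keepends=True)
    i = 0
    # Only the start of the message is examined; "From " lines inside
    # the body are left alone because the loop stops at the first header.
    while i < len(lines) and (
        lines[i].strip() == b"" or lines[i].startswith(b"From ")
    ):
        i += 1
    return b"".join(lines[i:])
```

Note that the "From:" header itself is untouched, since the check requires a space directly after "From".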
2016-06-24 split out spamcheck/spamc to its own module.
This should hopefully make it easier to try other anti-spam systems (or none at all) in the future.
2016-06-21 spawn: improve error checking for fork failures
fork failures are unfortunately common when Xapian has gigabytes and gigabytes mmapped.
2016-06-17 import: auto-update index when done
This prevents multiple update processes from stepping over each other while called under the lock, and also allows the new -watch process to update the index iff indexing was desired.
2016-06-17 mda: support loading arbitrary filters
Give users some rope to do their own filtering.
2016-06-17 watch: introduce watch directive
This will allow users to run importers off existing mail accounts where they may not have access to run -mda. Currently, we only support Maildirs, but IMAP ought to be doable.
2016-06-15 mda: hook up new filter functionality
This removes the Email::Filter dependency as well as the signature-breaking scrubber code. We now prefer to reject unacceptable messages and grudgingly (and blindly) mirror messages we're not the primary endpoint for.
2016-06-15 mda: precheck no longer depends on Email::Filter
Email::Filter doesn't offer any functionality we need, here; and our dependency on Email::Filter will gradually be removed since it (and Email::LocalDelivery) seem abandoned and we can have more-fine-grained control by rolling our own Maildir delivery which can work transactionally.
2016-06-15 learn: remove IPC::Run dependency
We'll be relying on our spawn implementation, for now; since it'll be consistent with the rest of our code and can optionally take advantage of vfork.
2016-06-15 drop dependency on File::Path::Expand
We still pull it in via Email::LocalDelivery, but that dependency will go away, soon.
2016-05-30 script/*{mda,learn}: no strict params for Email::MIME::ContentType
User input is imperfect, do not pollute our mail logs with warnings we cannot fix. This is documented in the Email::MIME::ContentType manpage so it should remain supported.
2016-05-25 remove Email::Address dependency
git has stricter requirements for ident names (no '<>') which Email::Address allows. Even in 1.908, Email::Address also has an incomplete fix for CVE-2015-7686 with a DoS-able regexp for comments. Since we don't care for or need all the RFC compliance of Email::Address, avoiding it entirely may be preferable. Email::Address will still be installed as a requirement for Email::MIME, but it is only used by Email::MIME::header_str_set, which we do not use.
2016-05-23 http: chunk in the server, not middleware
Since PSGI does not require Transfer-Encoding: chunked or Content-Length, we cannot expect random apps we host to chunk their responses. Thus, to improve interoperability, chunk at the HTTP layer like other PSGI servers do. I'm choosing a more syscall-intensive method (via multiple send(...MSG_MORE)) for now to reduce copy + packet overhead.
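For context, HTTP/1.1 chunked framing wraps each piece of the body in a hexadecimal length line and ends with a zero-length chunk. The Python sketch below shows only that framing; it is not the server's Perl implementation, it does not model the send(...MSG_MORE) calls mentioned above, and `encode_chunked` is a hypothetical name.

```python
from typing import Iterable, Iterator


def encode_chunked(parts: Iterable[bytes]) -> Iterator[bytes]:
    """Frame each body part as an HTTP/1.1 chunk, then emit the last-chunk."""
    for part in parts:
        if part:  # a zero-length chunk would end the body early
            yield b"%x\r\n" % len(part) + part + b"\r\n"
    yield b"0\r\n\r\n"  # last-chunk with no trailer fields
```

Because the framing is applied by the server, a hosted app can stream pieces of unknown total length without setting Content-Length itself.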
2016-05-16 declare Inbox object for reusability
From the beginning, we've avoided objects here in favor of faster startup time; but it may not be worth it since a persistent httpd/nntpd is faster and -mda isn't hit as often.
2016-05-14 rename most instances of "list" to "inbox"
A public-inbox is NOT necessarily a mailing list, but it could serve as an input point for zero, one, or infinite mailing lists :D
2016-04-28 import: run git-update-server-info when done
We should update $GIT_DIR/info/refs for dumb HTTP clients whenever we make changes to the repository. The best place to update is immediately after making commits. This fixes a bug where public-inbox-learn did not properly update $GIT_DIR/info/refs after inserting or removing messages.
2016-04-25 remove ssoma dependency
By converting to using our git-fast-import-based Import module. This should allow us to be more easily installed.
2016-04-25 split out NNTPD and HTTPD* modules
Hopefully this modularizes things a little and allows us to work on a combined super server to save RAM.
2016-04-09 public-inbox-learn: drop leading "From " line from mboxes
It can confuse Email::MIME if we have it.
2016-03-31 httpd: remove reference to callback during close
Avoid wasting memory and the risk of potential reference cycles by dropping the callback ASAP.
2016-03-05 httpd: remove unnecessary eval
We have per-middleware evals to deal with them being missing; no need to put an eval around the whole thing and use an extra level of indentation.
2016-03-03 daemon: introduce host_with_port for identifying sockets
This allows us to share more code between daemons and avoids having to make additional syscalls for preparing REMOTE_HOST and REMOTE_PORT in the PSGI env in -httpd. This will also make supporting HTTP (and NNTP) over Unix sockets easier in a future commit.
2016-03-03 daemon: avoid polluting the main package
We've distilled the daemon code into one public function ("run"), so avoid polluting the main namespace and just have users prefix with the full package name for this rarely-used class.
2016-03-01 httpd: remove unneeded err and out fields from class
Vestigial pieces from the nntpd code which aren't needed because the psgi env already has the "psgi.errors" key.
2016-03-01 httpd: document pi-httpd.async as totally unstable
We'll have to use it some more before deciding it is a public interface. I do hope for it to be a usable public interface one day for other users.
2016-02-29 fixup Plack-related requires
We do not need to load Plack::Request outside of WWW anymore.
2016-02-29 favor procedural calls for most private functions
This makes for better compile-time checking and also helps document which calls are private for HTTP and NNTP. While we're at it, use IO::Handle::* functions procedurally, too, since we know we're working with native glob handles.
2016-02-28 httpd: allow running if ReverseProxy is missing
Not everybody will be running this behind a ReverseProxy; but it's probably the likely configuration. Anyways, warn about this and also about Deflater being missing.
2016-02-27 move executables to script/ directory
This seems to match more closely with what is expected of Perl packages based on how blib is used. Hopefully makes the top-level source tree less cluttered and things easier-to-find.