public-inbox.git - an "archives first" approach to mailing lists

Date	Commit message
2024-01-30	watch: support incremental updates from MH
	The good news (compared to lei) is we only have to worry about imports and don't care about the filename nor keywords, so it's immune to .mh_sequences writing inconsistencies across MH implementations and sequence number packing.
	We still assume the writer will write the mail file with one of:
	* rename(2) to create the final sequence number filename
	* a single write(2) if not relying on rename(2)
	mlmmj and mutt satisfy these requirements. Python's Lib/mailbox.py may, I'm not sure...
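The first writer requirement (rename(2) into the final sequence number filename) can be sketched as follows. This is an illustrative Python sketch, not code from public-inbox; the function name `deliver_atomically` and the `,tmp.` prefix are hypothetical, and the sketch assumes the temporary file lives in the same directory so that the rename stays on one filesystem.

```python
import os
import tempfile


def deliver_atomically(mh_dir: str, seq: int, raw: bytes) -> str:
    """Write `raw` to a temporary file, then rename(2) it to the MH sequence number.

    A watcher that only imports numeric filenames never sees a
    partially written message, because the final name appears in a
    single atomic step.
    """
    final = os.path.join(mh_dir, str(seq))
    # Hypothetical non-numeric prefix so the temp file is not mistaken for a message.
    fd, tmp = tempfile.mkstemp(prefix=",tmp.", dir=mh_dir)
    try:
        with os.fdopen(fd, "wb") as fh:
            fh.write(raw)
        # Atomic on POSIX when source and destination share a filesystem.
        os.rename(tmp, final)
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise
    return final
```

Note that os.rename() silently replaces an existing destination on POSIX, so a real MH writer would also need its own way of picking an unused sequence number.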
2023-12-29	pure Perl inotify support
	This is a step towards improving the out-of-the-box experience in achieving notifications without XS, extra downloads, and .so loading + runtime mmap overhead. This also fixes loongarch support of all Linux syscalls due to a bad regexp :x All the reachable Linux architectures listed at <https://portal.cfarm.net/machines/list/> should be supported. At the moment, there appears to be no reachable sparc* Linux machines available to cfarm users. Fixes: b0e5093aa3572a86 (syscall: add support for riscv64, 2022-08-11)
2023-11-22	watch: support `watch=false' to negate watchspam
	For users hosting read-only mirrors (via clone|fetch) and feeding inboxes via -watch
2023-11-11	mda|learn|watch: support dropUniqueUnsubscribe config
	List-Unsubscribe headers with unique identifiers (such as those generated by our examples/unsubscribe.milter) should not end up in public archives. Add a new config knob to strip List-Unsubscribe headers if they have the `List-Unsubscribe-Post: List-Unsubscribe=One-Click' header. Unfortunately, this breaks DKIM signatures if the signature covers either of these List-Unsubscribe* headers. However, breaking DKIM is the lesser evil compared to any archive reader being able to stop archival by an independent archivist. As much as I would like this to be the default, it probably affects few users at the moment since very few mailing lists use unique identifiers in List-Unsubscribe (but that number has grown, recently).
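The rule described above can be sketched with Python's standard email package. This is an illustration only, not the project's Perl implementation: the function name `drop_unique_unsubscribe` is hypothetical, and both the case-insensitive comparison and the removal of the List-Unsubscribe-Post header itself are assumptions of this sketch.

```python
from email import message_from_bytes
from email.message import Message


def drop_unique_unsubscribe(msg: Message) -> bool:
    """Strip List-Unsubscribe headers when the one-click marker is present.

    Returns True if headers were removed, False if the message was
    left untouched.
    """
    post = msg.get("List-Unsubscribe-Post", "")
    # Assumption: compare case-insensitively, ignoring surrounding whitespace.
    if str(post).strip().lower() != "list-unsubscribe=one-click":
        return False
    # `del` removes every occurrence of a header and is a no-op if absent.
    del msg["List-Unsubscribe"]
    del msg["List-Unsubscribe-Post"]  # assumption: drop the marker header too
    return True
```

As the commit message notes, any DKIM signature covering these headers will no longer verify after the message is modified this way.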
2023-09-24	config: drop scalar ref support from internal API
	It's a needless branch to maintain exclusively for our tests. The `git config -l' output isn't pleasant to write in tests, anyways. So just use heredocs to write git configs in their native format rather than emulate the output of `git config -l'. This does make the test suite do more work with temporary files and process invocations, but it doesn't seem very measurable when testing on tmpfs (TMPDIR=/dev/shm). We'll make a minor improvement to TestCommon::tmpdir by allowing it to return a single value (which I suspect we can rely on in more places since File::Temp::Dir overloads stringification).
2023-09-11	favor poll(2) for most daemons
	public-inbox-watch, lei-daemon, the master process of public-inbox-(netd|httpd|imapd|nntpd|pop3d), and the (mostly) Perl implementation of XapHelper do not have many FDs to watch so epoll|kqueue end up being overkill. Of course, *BSDs already have separate kqueue FDs emulating signalfd and/or inotify, even. In other words, only the worker processes of public-inbox-(netd|httpd|imapd|nntpd|pop3d) are expected to see C10K (or C100K) types of traffic where epoll|kqueue shine. Perhaps lei could benefit from epoll/kqueue on some virtual users IMAP/JMAP system one day; as could -watch with many IMAP IDLE folders; but we'll probably add a knob if/when it comes to that.
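For a process with only a handful of descriptors, a plain poll(2) call needs no persistent kernel object to set up or tear down. The sketch below shows that pattern in Python; it is not the project's Perl event loop, and the helper name `wait_readable` is hypothetical.

```python
import os
import select


def wait_readable(fds, timeout_ms):
    """Return the subset of `fds` that is readable within `timeout_ms`.

    With few descriptors, passing the whole set to poll(2) on each
    call is cheap, and there is no extra epoll or kqueue descriptor
    to manage.
    """
    poller = select.poll()
    for fd in fds:
        poller.register(fd, select.POLLIN)
    return [fd for fd, events in poller.poll(timeout_ms)
            if events & select.POLLIN]
```

The cost of poll(2) grows with the number of registered descriptors on every call, which is why the commit keeps epoll and kqueue for the worker processes that may hold thousands of client sockets.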
2023-04-28	t/watch_maildir: eliminate extra LF from cat-file requests
	This allows us to eliminate the workaround of respawning `git cat-file', too :x
2023-03-25	ds: @post_loop_do replaces SetPostLoopCallback
	This allows us to avoid repeatedly using memory-intensive anonymous subs in CodeSearchIdx where the callback is assigned frequently. Anonymous subs are known to leak memory in old Perls (e.g. 5.16.3 in enterprise distros) and are still expensive in newer Perls. So favor the (\&subroutine, @args) form which allows us to eliminate anonymous subs going forward.
	Only CodeSearchIdx takes advantage of the new API at the moment, since it's the biggest repeat user of post-loop callback changes. Getting rid of the subroutine and relying on a global `our' variable also has two advantages:
	1) Perl warnings can detect typos at compile-time, whereas the (now gone) method could only detect errors at run-time.
	2) `our' variable assignment can be `local'-ized to a scope
2021-10-24	t/watch_maildir: support non-master default branch

2021-10-01	ds: simplify signalfd use
	Since signalfd is often combined with our event loop, give it a convenient API and reduce the code duplication required to use it. EventLoop is replaced with ::event_loop to allow consistent parameter passing and avoid needlessly passing the package name on stack. We also avoid exporting SFD_NONBLOCK since it's the only flag we support. There's no sense in having the memory overhead of a constant function when it's in cold code.
2021-01-01	update copyrights for 2021
	Using "make update-copyrights" after setting GNULIB_PATH in my config.mak
2020-12-09	rename {pi_config} fields to {pi_cfg}
	{pi_config} may be confused with the documented `PI_CONFIG' environment variable, and we'll favor vowel-removal to be consistent with our usage of object references. The `pi_' prefix may stay in some places, for now; since a separate namespace may come into this codebase for local/private client-tooling. For InboxIdle, we'll also remove an invalid comment about holding a reference to the PublicInbox::Config object, too.
2020-09-01	rename WatchMaildir => Watch
	This is no longer limited to Maildirs now that IMAP and NNTP support exist; so give it a shorter name.
2020-06-28	watch: remove {mdir} array
	Since we store all watched directory names as keys in %mdmap, there should be no need to keep an array of those directories around. t/watch_maildir*.t required changes to remove trained spam. Once we've trained something as spam, there shouldn't be a need to rescan it.
2020-06-28	watch: use signalfd for Maildir watching
	We can get rid of the janky wannabe self-using-a-directory-instead-of-pipe thing we needed to workaround Filesys::Notify::Simple being blocking. For existing Maildir users, this should be more robust and immune to missed wakeups for signalfd and kqueue-enabled systems; as well as being immune to BOFHs clearing $TMPDIR and preventing notifications from firing. The IMAP IDLE code still uses normal Perl signals, so it's still vulnerable to missed wakeups. That will be addressed in future commits.
2020-06-28	watch: remove Filesys::Notify::Simple dependency
	Since we already use inotify and EVFILT_VNODE (kqueue) in -imapd, we might as well use them directly in -watch, too. This will allow public-inbox-watch to use PublicInbox::DS for timers to watch newsgroups/mailboxes and have saner signal handling in future commits.
2020-06-28	watchmaildir: fix check for spam vs ham inbox conflicts
	The old check was ineffective since we process the spam folder config before ham inboxes; and would only fail when attempting to treat the scalar "watchspam" string as an array ref.
2020-05-09	remove most internal Email::MIME usage
	We no longer load or use Email::MIME outside of comparison tests.
2020-04-22	t/*.t: reduce dependency on Email::MIME APIs
	Instead, favor PublicInbox::MIME->new for non-attachment emails. We may support alternatives to Email::MIME down the line. We'll still keep Email::MIME->create to deal with attachments, for now, but there's also a fair amount of test duplication we should eliminate, later.
2020-04-20	import: init_bare: allow use as method, use in tests
	Allowing ->init_bare to be used as a method saves some keystrokes, and we can save a little bit of time on systems with our vfork(2)-enabled spawn(). This also sets us up for future improvements where we can avoid spawning a process at all.
2020-02-06	treewide: run update-copyrights from gnulib for 2019
	I didn't wait until September to do it, this year!
2019-12-24	testcommon: add require_mods method and use it
	This cuts down on lines of code in individual test cases and fixes some misnamed error messages by using "$0" consistently. This will also provide us with a method of swapping out dependencies which provide equivalent functionality (e.g. "Xapian" SWIG can replace "Search::Xapian" XS bindings).
2019-12-19	tests: move t/common.perl to PublicInbox::TestCommon
	We want to be able to use run_script with *.t files, so t/common.perl putting subs into the top-level "main" namespace won't work. Instead, make it a module which uses Exporter like other libraries.
2019-11-24	tests: use File::Temp->newdir instead of tempdir()
	We'll also introduce a tmpdir() API to give tempdirs consistent names.
2019-11-24	tests: use strict everywhere
	The "strict" pragma makes code easier to debug, and we had undeclared variables as a result in t/watch_maildir_v2.t. So use it everywhere to be consistent with the rest of our code.
2019-10-16	config: support "inboxdir" in addition to "mainrepo"
	"mainrepo" ws a bad name and artifact from the early days when I intended for there to be a "spamrepo" (now just the ENV{PI_EMERGENCY} Maildir). With v2, "mainrepo" can be especially confusing, since v2 needs at least two git repositories (epoch + all.git) to function and we shouldn't confuse users by having them point to a git repository for v2. Much of our documentation already references "INBOX_DIR" for command-line arguments, so use "inboxdir" as the git-config(1)-friendly variant for that. "mainrepo" remains supported indefinitely for compatibility. Users may need to revert to old versions, or may be referring to old documentation and must not be forced to change config files to account for this change. So if you're using "mainrepo" today, I do NOT recommend changing it right away because other bugs can lurk. Link: https://public-inbox.org/meta/874l0ice8v.fsf@alyssa.is/
2019-10-15	config: we always have {-section_order}
	Rewrite a bunch of tests to use ordered input (emulating "git config -l" output) so we can always walk sections in the order they were given in the config file.
2019-09-09	run update-copyrights from gnulib for 2019

2019-01-05	watchmaildir: normalize Maildir pathnames consistently
	Remove redundant slashes while we're at it.
2018-03-19	t/watch_maildir: note the reason for FIFO creation
	I had to dig through commit history for this and we should better document our tests (along with everything else).
2018-02-07	update copyrights for 2018
	Using update-copyrights from gnulib. While we're at it, use the SPDX identifier for AGPL-3.0+ to ease mechanical processing.
2017-06-26	watch: improve fairness during full rescans
	We need to ensure new messages are being processed fairly during full rescans, so have the ->scan subroutine yield and reschedule itself. Additionally, having a long-running task inside the signal handler is dangerous and subject to reentrancy bugs. Due to the limitations of the Filesys::Notify::Simple interface, we cannot rely on multiplexing I/O interfaces (select, IO::Poll, Danga::Socket, etc...) for this. Forking a separate process was considered, but it is more expensive for a mostly-idle process. So, we use a variant of the "self-pipe trick" via inotify (or whatever Filesys::Notify::Simple gives us). Instead of writing to our own pipe, we write to a file in our own temporary directory watched by Filesys::Notify::Simple to trigger events in signal handlers.
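For reference, the classic self-pipe trick that this commit adapts looks roughly like the Python sketch below. It is not the commit's code: the commit writes to a file in a watched temporary directory instead of a pipe, and the helper names `make_self_pipe` and `wait_for_wakeup` are hypothetical.

```python
import os
import select
import signal


def make_self_pipe(signum):
    """Install a handler for `signum` that only writes one byte to a pipe."""
    r, w = os.pipe()
    os.set_blocking(r, False)
    os.set_blocking(w, False)

    def handler(_signum, _frame):
        try:
            os.write(w, b"\0")
        except BlockingIOError:
            pass  # pipe is full, so a wakeup is already pending

    signal.signal(signum, handler)
    return r, w


def wait_for_wakeup(r, timeout):
    """Block until the handler has fired or `timeout` seconds pass."""
    ready, _, _ = select.select([r], [], [], timeout)
    if not ready:
        return False
    try:
        while os.read(r, 4096):
            pass  # drain so several signals coalesce into one wakeup
    except BlockingIOError:
        pass  # nothing left to read
    return True
```

The handler does almost nothing, so the long-running work (such as a rescan) happens in the main loop after wait_for_wakeup() returns, outside signal context.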
2016-07-01	t/watch_maildir: quiet down spam check warning
	Probably better than bloating our own API with configurable warning streams and such...
2016-06-24	watch_maildir: implement optional spam checking
	Mailing lists I watch and mirror may not have the best spam filtering, and an extra layer should not hurt.
2016-06-24	document Filesys::Notify::Simple dependency
	And improve documentation for existing dependencies, too.
2016-06-19	watch_maildir: tighten up path checks
	Only mark seen messages as spam, otherwise it could be too aggressive and cause problems or over training. We wouldn't want a wayward FIFO ruining our day, either :)
2016-06-19	watch_maildir: spam removal support
	We can support spam removal by watching a special "spam" Maildir, too. We can run public-inbox-learn as a separate step, and that command will be improved to support auto-learning, too.
2016-06-18	watch_maildir: add scan test
	This should be portable despite the intended use of this directory being non-portable.