public-inbox.git - an "archives first" approach to mailing lists

Date	Commit message
2020-05-12	rename "ContentId" to "ContentHash"
	The old name may be confused with "Content-ID" as described in RFC 2392, so use an alternate name to avoid confusing future readers.
2020-05-09	replace most uses of PublicInbox::MIME with Eml
	PublicInbox::Eml has enough functionality to replace the Email::MIME-based PublicInbox::MIME.
2020-04-22	make zlib-related modules a hard dependency
	This allows us to simplify some of our existing code and make future changes easier. I doubt anybody goes through the trouble to have a Perl installation without zlib support. The zlib source code is even bundled with Perl since 5.9.3 for systems without existing zlib development headers and libraries. Of course, zlib is also a requirement of git, too; and we're not going to stop using git :) [squashed: "wwwaltid: use gzipfilter up front"]
2020-04-21	index: support --max-size / publicinbox.indexMaxSize
	In normal mail paths, we can rely on MTAs being configured with reasonable limits in the -watch and -mda mail injection paths. However, the MTA is bypassed in a git-only delivery path, so a BOFH could inject a large message and DoS users attempting to mirror a public-inbox. This doesn't protect unindexed WWW interfaces from Email::MIME memory explosions on v1 inboxes. Probably nobody cares about unindexed WWW interfaces anymore, especially now that Xapian is optional for indexing.
2020-04-20	drop needless `eval {}' around Config->new
	It hasn't been needed since commit 089cca37fa036411 ("config: ignore missing config files"). And we actually want to propagate errors when we can't start new processes or if git(1) is missing.
2020-04-19	favor `do {}' over `eval {}' for localized slurp
	I did not know to use the return value of `do' back in the day. There's probably no practical difference in these cases, but `eval' is overkill for these uses and may hide actual errors. We can get rid of a few redundant `scalar' ops and pass scalar refs to Email::MIME->new to avoid copies in a few more places, too.
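A minimal sketch of the idiom described above, not the project's actual code: `do` returns the value of its last statement, so a localized `$/` inside the block is enough to slurp a file without `eval`. The helper name `slurp_file` is hypothetical.

```perl
use strict;
use warnings;
use File::Temp ();

# Hypothetical helper, for illustration only
sub slurp_file {
	my ($path) = @_;
	open(my $fh, '<', $path) or die "open $path: $!";
	# `local $/` is undone as soon as the do-block exits
	my $buf = do { local $/; <$fh> };
	close($fh) or die "close $path: $!";
	$buf;
}

# Write a small sample file, then read it back in one go
my $tmp = File::Temp->new;
print $tmp "line 1\nline 2\n" or die "print: $!";
$tmp->flush or die "flush: $!";
my $str = slurp_file($tmp->filename);
print length($str), " bytes\n";
```

Unlike `eval {}`, the `do {}` block does not trap exceptions, so a failure inside it still propagates to the caller.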
2020-04-19	inboxwritable: mime_from_path: reuse in more places
	There's nothing Maildir-specific about the function, so `maildir_path_load' was a bad name. So give it a more appropriate name and use it in our tests. This saves us some code and avoids inconsistency by reusing an existing internal library routine in more places. We can drop the "From_" line in some of our (formerly) mbox sample files.
2020-03-29	index: support --compact / -c on command-line
	It's more convenient to specify `-c' / `--compact' on the command-line when reindexing than it is to invoke public-inbox-compact(1) separately. This is especially convenient in low-space situations when public-inbox-index is operating on multiple inboxes sequentially, as compaction can happen immediately after indexing each inbox, instead of waiting until all inboxes are indexed.
2020-02-23	doc: improve wording of "inbox" vs "repository"
	Since v2 inboxes contain multiple git repositories, avoid the use of the word "repository" when referring to inboxes as a whole in most places.
2020-02-08	convert: preserve indexlevel on conversions
	We don't want to blow up users' storage too badly when converting v1 to v2 or break because they don't have Xapian bindings installed.
2020-02-06	treewide: run update-copyrights from gnulib for 2019
	I didn't wait until September to do it, this year!
2020-02-02	convert: fix --no-index switch
	The (currently undocumented) "--no-index" flag did not trigger the V2Writable->done call necessary to make the import successful. Fixes: eea47b676127bcdb ("convert: preserve highwater mark from v1 msgmap")
2020-02-02	convert: shift @ARGV explicitly
	Relying on implicit "@_" for shift fails with TestCommon::_run_sub iff GetOptions modifies @ARGV.
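A small sketch of the underlying Perl behavior, as I read the message above: a bare `shift` reads `@ARGV` at file scope but `@_` inside a sub, so a script body that gets wrapped in a sub changes meaning unless the array is named. The sub `script_body` and the paths are hypothetical stand-ins.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a script body wrapped in a sub
sub script_body {
	# bare `shift` inside a sub reads @_, NOT @ARGV
	my $implicit = shift;
	# naming the array is unambiguous in either context
	my $explicit = shift @ARGV;
	($implicit, $explicit);
}

@ARGV = ('/path/to/v1', '/path/to/v2');
my ($implicit, $explicit) = script_body();	# called with an empty @_
print 'implicit=', (defined($implicit) ? $implicit : 'undef'), "\n";
print "explicit=$explicit\n";
```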
2020-02-02	convert: remove unused variables capturing :from
	Looking at git history, they were never used.
2020-01-31	convert: preserve highwater mark from v1 msgmap
	If we're reusing the msgmap from a v1 inbox, we also need to ensure the highwater mark doesn't get doubled in the v1->v2 conversion by internally triggering the equivalent of "--reindex" on a fresh v2 inbox. This was needed to convert an indexed v1 inbox which featured messages with multiple Message-IDs in it. Fresh, unindexed clones of v1 inboxes would not have been affected by this.
2020-01-27	init: use Import::run_die instead of system()
	We already load PublicInbox::Import via PublicInbox::InboxWritable, so it's not an extra module to load. This can give us a slight speedup in tests.
2020-01-27	inbox: add ->version method
	This allows us to simplify version checking by avoiding "//" or "||" operators sprinkled around.
2020-01-11	make Plack optional for non-WWW and non-httpd users
	Some users just want to run -mda, -watch, and/or -nntpd. Let them run just those without forcing them to pull in a bunch of dependencies.
2020-01-06	treewide: "require" + "use" cleanup and docs
	There's a bunch of leftover "require" and "use" statements we no longer need and can get rid of, along with some excessive imports via "use". IO::Handle usage isn't always obvious, so add comments describing why a package loads it. Along the same lines, document the tmpdir support as the reason we depend on File::Temp 0.19, even though every Perl 5.10.1+ user has it. While we're at it, favor "use" over "require", since it gives us extra compile-time checking.
2020-01-01	filter/base: export REJECT as a constant
	And update callers to use it, as it makes the code a bit cleaner. Probably irrelevant, but it should be faster, too, as "perl -I lib -w -MO=Deparse $FILE" shows REJECT() calls are constant-folded.
2019-12-24	remove "no warnings 'once'" in a few places
	We can use "use" to get the namespace into the "BEGIN" phase of the interpreter. While we're at it, use \&coderef syntax explicitly instead of globbing everything.
2019-11-24	check for File::Temp 0.19 for ->newdir method
	This is distributed with Perl 5.10.1 and onwards, so it should not be an installation burden for any users. I'm planning to move away from tempdir() entirely and use File::Temp->newdir to remove dependencies on END{} blocks.
2019-11-16	inboxwritable: add ->cleanup method
	We've been using this in -edit, and will be using it in some more scripts and tests to optimize for run_mode=2 with run_script. Keeping this in the *Writable modules since I don't see it being useful for the WWW and NNTP read-only interfaces which use PublicInbox::Inbox.
2019-11-16	learn: pass global variables into subs
	Avoid 'Variable "%s" will not stay shared' warnings when the contents of this script are eval'ed into a sub.
2019-11-16	mda: pass global variables into subs
	Avoid 'Variable "%s" will not stay shared' warnings when the contents of this script are eval'ed into a sub.
2019-11-16	init: pass global variables into subs
	Avoid 'Variable "%s" will not stay shared' warnings when the contents of this script are eval'ed into a sub. We also need to rely on ->DESTROY instead of END{} to unlink the lock file on sub exit.
2019-11-16	index: pass global variables into subs
	Avoid 'Variable "%s" will not stay shared' warnings when the contents of this script are eval'ed into a sub.
2019-11-16	admin: get rid of singleton $CFG var
	PublicInbox::Admin::config() just adds an extra layer of indirection which we barely rely on. So get rid of this global variable and make it easier to run tests in the future without relying on global state.
2019-11-16	edit: use OO API of File::Temp to shorten lifetime
	Instead of relying on END{} blocks, rely on ->DESTROY so the temporary files go out-of-scope and system resources get released, sooner.
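A minimal sketch of the scope-based cleanup described above, assuming only the documented File::Temp OO interface; the template name is made up for illustration.

```perl
use strict;
use warnings;
use File::Temp 0.19 ();	# ->newdir was added in 0.19

my ($path, $in_scope, $after_scope);
{
	# the object owns the directory; no END{} block is involved
	my $tmpdir = File::Temp->newdir('example-XXXXXX', TMPDIR => 1);
	$path = $tmpdir->dirname;
	$in_scope = -d $path ? 1 : 0;
}	# $tmpdir goes out of scope here and its DESTROY removes the directory
$after_scope = -d $path ? 1 : 0;
print "in scope: $in_scope, after scope: $after_scope\n";
```

Because cleanup is tied to the object's lifetime, the directory disappears as soon as the last reference is dropped rather than at interpreter exit.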
2019-11-16	edit: pass global variables into subs
	Avoid 'Variable "%s" will not stay shared' warnings when the contents of this script are eval'ed into a sub.
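A rough sketch of how this warning arises and why passing the variable as an argument avoids it. All sub names here are hypothetical, and the string eval only imitates a test harness wrapping a script in a sub. A file-scoped `my` variable becomes a lexical of the wrapping sub, and a named inner sub can only capture its first instance.

```perl
use strict;
use warnings;

my @warnings;
$SIG{__WARN__} = sub { push @warnings, $_[0] };

# The script body wrapped in a sub: the named inner sub refers to
# a lexical of the enclosing sub, which triggers the warning
eval <<'EOF';
sub wrapped_bad {
	my $verbose = shift;
	sub show_bad { $verbose }
	show_bad();
}
EOF
die $@ if $@;

# Passing the variable as an argument avoids the capture entirely
sub show_ok { my ($verbose) = @_; $verbose }
sub wrapped_ok { my $verbose = shift; show_ok($verbose) }

my $nr = grep(/will not stay shared/, @warnings);
print "warnings seen: $nr\n";
print "wrapped_ok: ", wrapped_ok(1), ' ', wrapped_ok(2), "\n";
```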
2019-11-14	convert: remove duplicated GetOptions() call
	We only need to parse the command-line once.
2019-11-14	inboxwritable: drop {-importer} cyclic reference
	InboxWritable caching the result of ->importer leads to a circular reference, since the returned (V2Writable|Import) object holds onto the calling InboxWritable object. With public-inbox-watch, this leads to a memory leak if a user is reloading via SIGHUP after a message is imported (it would only become noticeable with SIGHUPs after every message imported). I would not expect anybody to notice this in real-world usage. I only noticed this since I was making -xcpdb suitable for long-lived process use (e.g. "mod_perl style") and a flock remained unreleased on v1 inboxes after resharding. WatchMaildir (used by -watch) already handles caching of the importer object itself, and all of our other real-world uses of ->importer are short-lived or designed for batch scripts, so there's no need to cache the importer result internally.
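A simplified sketch of the cycle described above, using hypothetical `Hypo::` classes in place of InboxWritable and V2Writable/Import. With the cached variant, the two objects reference each other and neither is destroyed at scope exit; with the uncached variant, ownership stays with the caller.

```perl
use strict;
use warnings;

my $destroyed = 0;

package Hypo::Writable {	# hypothetical stand-in for InboxWritable
	sub new { bless {}, shift }
	# caching creates a cycle: writable -> importer -> writable
	sub importer_cached { $_[0]->{-importer} //= Hypo::Importer->new($_[0]) }
	# returning a fresh object leaves ownership with the caller
	sub importer { Hypo::Importer->new($_[0]) }
	sub DESTROY { $destroyed++ }
}

package Hypo::Importer {	# hypothetical stand-in for V2Writable / Import
	sub new { my ($class, $ibx) = @_; bless { ibx => $ibx }, $class }
}

package main;

{ my $ibx = Hypo::Writable->new; my $im = $ibx->importer_cached; }
my $after_cached = $destroyed;	# the cycle keeps the object alive

{ my $ibx = Hypo::Writable->new; my $im = $ibx->importer; }
my $after_uncached = $destroyed;	# refcount reaches zero at scope exit

print "destroyed after cached: $after_cached, after uncached: $after_uncached\n";
```

Anything released in DESTROY, such as a held flock, stays held for as long as the cycle exists.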
2019-11-08	edit: check for write errors writing "From_" line
	We need to check every print to a regular file for errors, because storage devices inevitably fail.
2019-11-08	edit: propagate correct editor exit code
	exit($?) is never correct, since ($? >> 8) is needed to extract the correct exit code, as other information (e.g. the terminating signal) is encoded in $? in addition to the exit code.
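A short sketch of decoding the wait status, following the layout documented in perlvar; the helper name `decode_status` is hypothetical. A child that exits with code 3 yields a raw `$?` of 768, which is why passing `$?` straight to `exit` gives the wrong result.

```perl
use strict;
use warnings;

# Split a wait status such as $? into exit code and signal
sub decode_status {
	my ($status) = @_;
	my $signal = $status & 127;	# terminating signal, if any
	my $code = $status >> 8;	# what the child passed to exit()
	($code, $signal);
}

# $^X is the running perl; the child exits with code 3
system($^X, '-e', 'exit 3');
my $raw = $?;
my ($code, $signal) = decode_status($raw);
print "raw=$raw code=$code signal=$signal\n";
```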
2019-10-30	mda: support multiple List-ID matches
	While it's not RFC2919-conformant, mail software can theoretically set multiple List-ID headers. Deliver to all inboxes which match a given List-ID since that's likely the intent. Cc: Eric W. Biederman <ebiederm@xmission.com> Link: https://public-inbox.org/meta/87pniltscf.fsf@x220.int.ebiederm.org/
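A rough illustration of delivering to every inbox that matches any List-ID header. This is not the -mda implementation: the mapping, the inbox names, and the `inboxes_for` helper are all hypothetical, and folded (multi-line) headers are not handled.

```perl
use strict;
use warnings;

# Hypothetical mapping of list identifiers to inbox names
my %inbox_by_listid = (
	'a.example.com' => 'inbox-a',
	'b.example.com' => 'inbox-b',
);

# Return every configured inbox matching any List-ID header
sub inboxes_for {
	my ($raw_header) = @_;
	my (@dst, %seen);
	# one match per List-ID header line; the identifier sits inside <>
	while ($raw_header =~ /^List-ID:[^\n<]*<([^>\n]+)>/gim) {
		my $name = $inbox_by_listid{lc $1};
		next unless defined $name;	# unknown lists are skipped
		push @dst, $name unless $seen{$name}++;
	}
	@dst;
}

my $hdr = "From: u\@example.com\n" .
	"List-Id: A list <a.example.com>\n" .
	"List-ID: <B.example.com>\n" .
	"List-Id: <unknown.example.com>\n";
my @dst = inboxes_for($hdr);
print join(',', @dst), "\n";
```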
2019-10-30	mda: prepare for multiple destinations
	Multiple List-ID headers will be supported in the next commit.
2019-10-30	inboxwritable: add assert_usable_dir sub
	And use it for mda, since "0" could be a usable directory if somebody insists on using relative paths...
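A tiny sketch of why a plain truthiness test is not enough here: the string "0" is false in Perl, yet it can name a directory. The helper `usable_dir` is a hypothetical stand-in and not the signature of assert_usable_dir.

```perl
use strict;
use warnings;

# Hypothetical check: accept any defined, non-empty string, including "0"
sub usable_dir {
	my ($dir) = @_;
	defined($dir) && $dir ne '' ? 1 : 0;
}

for my $dir ('0', '', undef, 'inbox.git') {
	my $shown = defined($dir) ? "'$dir'" : 'undef';
	printf "%-12s truthy=%d usable=%d\n",
		$shown, ($dir ? 1 : 0), usable_dir($dir);
}
```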
2019-10-30	mda: skip MIME parsing if spam
	We don't want to waste cycles parsing the message for MIME bits if it's spam.
2019-10-30	mda: hoist out mda_filter_adjust
	It makes it easier to document that the default -mda behavior is stricter than normal, including "public-inbox-learn ham".
2019-10-30	mda: hoist out List-ID handling and reuse in -learn
	It's now possible to inject false-positive ham into an inbox the same way -mda does via List-ID.
2019-10-30	learn: hoist out remove_or_add subroutine
	We'll be reusing it for List-ID processing in the next commit.
2019-10-30	learn: GIT_COMMITTER_<NAME|EMAIL> may be "" or "0"
	Users may be zeroes or blanks.
2019-10-30	learn: update usage statement
	Use <foo|bar> since that seems to be the favored notation for required command args (taking a hint from git(1) manpage). While we're at it, remove the space after '<' for the redirect to match git.git coding style.
2019-10-30	learn: only map recipient list on "ham" or "rm"
	It's assumed that "spam" can end up anywhere due to Bcc:, so we need to scan every single inbox. However, "rm" is usually more targeted and "ham" obviously only belongs in some inboxes.
2019-10-30	learn: support multiple To/Cc headers
	It's possible to specify these headers multiple times, and PublicInbox::MDA->precheck takes that into account, so -learn should, too.
2019-10-17	Merge remote-tracking branch 'origin/inboxdir'
	* origin/inboxdir:
	  config: remove redundant inboxdir check
	  config: support "inboxdir" in addition to "mainrepo"
	  examples/grok-pull.post_update_hook: use "inbox_dir"
2019-10-16	config: support "inboxdir" in addition to "mainrepo"
	"mainrepo" was a bad name and an artifact from the early days when I intended for there to be a "spamrepo" (now just the ENV{PI_EMERGENCY} Maildir). With v2, "mainrepo" can be especially confusing, since v2 needs at least two git repositories (epoch + all.git) to function and we shouldn't confuse users by having them point to a git repository for v2. Much of our documentation already references "INBOX_DIR" for command-line arguments, so use "inboxdir" as the git-config(1)-friendly variant for that. "mainrepo" remains supported indefinitely for compatibility. Users may need to revert to old versions, or may be referring to old documentation and must not be forced to change config files to account for this change. So if you're using "mainrepo" today, I do NOT recommend changing it right away because other bugs can lurk. Link: https://public-inbox.org/meta/874l0ice8v.fsf@alyssa.is/
2019-10-16	mda: support --no-precheck option
	Since -mda now supports List-ID to better support mirroring of existing mailing lists, it probably makes sense to support disabling the precheck function to provide more accurate (though potentially spammier) mirrors of lists.
2019-10-15	mda, watch: wire up List-ID header support
	This also adds watchheader tests for -watch, which we never had before :x
2019-10-05	init: implement locking
	First, we use flock(2) to wait on parallel public-inbox-init(1) invocations while we make multiple changes using git-config(1). This flock allows -init processes to wait on each other if using reasonable POSIX filesystems. Then, we also need a git-config(1)-compatible lock to prevent user-invoked git-config(1) processes from clobbering our changes while we're holding the flock.
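A minimal sketch of the first half of this scheme only (the flock), assuming a platform where Perl's `flock` maps to native flock(2) so that two separately opened handles on the same file conflict. The helper `with_flock` and the lock file name are hypothetical; the git-config(1)-compatible lock is not shown.

```perl
use strict;
use warnings;
use Fcntl qw(:flock);
use File::Temp 0.19 ();

# Run $cb while holding an exclusive flock on $lock_path
sub with_flock {
	my ($lock_path, $cb) = @_;
	open(my $fh, '>>', $lock_path) or die "open $lock_path: $!";
	flock($fh, LOCK_EX) or die "flock $lock_path: $!";	# blocks until free
	my @ret = eval { $cb->() };
	my $err = $@;
	close($fh) or die "close $lock_path: $!";	# closing releases the lock
	die $err if $err;
	@ret;
}

my $dir = File::Temp->newdir;
my $lock = "$dir/init.flock";
my ($inside) = with_flock($lock, sub {
	# a non-blocking attempt via a second handle should fail while held
	open(my $fh2, '>>', $lock) or die "open $lock: $!";
	flock($fh2, LOCK_EX | LOCK_NB) ? 'unlocked' : 'locked';
});

# once with_flock returns, the lock is available again
open(my $fh3, '>>', $lock) or die "open $lock: $!";
my $outside = flock($fh3, LOCK_EX | LOCK_NB) ? 'unlocked' : 'locked';
print "inside: $inside, outside: $outside\n";
```

A blocking `LOCK_EX` is what lets parallel invocations queue up behind each other instead of failing outright.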