public-inbox.git - an "archives first" approach to mailing lists

Date	Commit message (Collapse)
2020-03-22	rename PublicInbox::SearchMsg => PublicInbox::Smsg
	Since the introduction of over.sqlite3, SearchMsg is not tied to our search functionality in any way, so stop confusing ourselves and future hackers by just calling it "PublicInbox::Smsg". Add a missing "use" in ExtMsg while we're at it.
2020-03-22	index: use git commit times on missing Date/Received
	When indexing messages without Date: and/or Received: headers, fall back to using timestamps originally recorded by git in the commit object. This allows git mirrors to preserve the import datestamp and timestamp of a message according to what was fed into git, instead of blindly falling back to the current time.
2020-03-21	t/msgtime: skip test if timezone isn't UTC
	Date::Parse falls back to using the local timezone when it's missing from an email, so only test in a reasonable TZ (UTC) for server software.
2020-03-21	t/www_listing: avoid 'once' warnings
	We reach into the WwwListing package directly to retrieve that JSON encoder/decoder object, and we can't rely on `use' since WwwListing loading may fail if Plack is missing.
2020-03-20	wwwlisting: avoid lazy loading JSON module
	We already lazy-load WwwListing for the CGI script, and hiding another layer of lazy-loading makes things difficult to do WWW->preload. We want long-lived processes to do all long-lived allocations up front to avoid fragmentation in the allocator, but we'll still support short-lived processes by lazy-loading individual modules in the PublicInbox::* namespace. Mixing up allocation lifetimes (e.g. doing immortal allocations while a large amount of space is taken by short-lived objects) will cause fragmentation in any allocator which favors large contiguous regions for performance reasons. This includes any malloc implementation which relies on sbrk() for the primary heap, including glibc malloc.
2020-03-19	http: fix RFC conformance w.r.t. message length
	We need to favor "Transfer-Encoding: chunked" over the value of the Content-Length header. We should also reject bogus, duplicate and/or unreasonable values for both these, since they can trigger unexpected behavior when combined with other HTTP parsers in proxies such as varnish, nginx, haproxy, etc... See RFC 7230 (and RFC 2616) for more details: https://tools.ietf.org/html/rfc7230 https://www.rfc-editor.org/errata_search.php?rfc=7230
2020-03-07	searchmsg: allow lines (and bytes) to be zero
	We will occasionally see legit messages with zero lines, be sure we index that count for NNTP clients. I'm not sure about bytes being zero (aside from purged messages), but we should've dealt with that earlier up the stack.
2020-03-01	msgtime: assume +0000 if TZ missing when using Date::Parse
	Some old emails don't have timezone offsets, since our Date::Parse code path takes a liberal interpretation of dates, fallback to using "+0000" as the timezone offset since it's closer to the actual date of the message than whatever the current date is. Reported-by: Leah Neukirchen <leah@vuxu.org> Link: https://public-inbox.org/meta/87h7zfemur.fsf@vuxu.org/ Fixes: ae80a3fdb53d7014 ("MsgTime.pm: Use strptime to compute the time zone")
2020-03-01	import: drop '<' and '>' characters in addresses
	Some strange "From:" lines will cause Email::Address::XS to leave '<' (and presumably '>') in the address which git-fast-import won't accept even if quoted. Workaround this problem by deleting '<' and '>' the same way we delete them for the ident name. Reported-by: Leah Neukirchen <leah@vuxu.org> Link: https://public-inbox.org/meta/87h7zfemur.fsf@vuxu.org/
2020-02-24	v2writable: make remove return-compatible w/ Import::remove
	Import::remove is a documented interface, and the return value of the V2Writable work-alike should try to be compatible with what Import implements.
2020-02-24	hval: ascii_html: drop CRLF => LF conversion
	Instead, we add CRLF conversion to the only remaining place which needs it, ViewVCS. This save many redundant ops in in many places. The only other place where this mattered was in View::add_text_body, but we already started doing CRLF conversions when we added diff parsing and link generation for ViewVCS. Otherwise, all other places we used this was for header viewing and Email::MIME doesn't preserve CRLF in headers.
2020-02-16	view: escape ampersand in Message-IDs
	We need to escape ampersands (and some other characters for href attributes), so introduce a `mid_href' sub to do just that. '<', '>' and '"' were always escaped, so there's no risk of tag or attribute injection, but creative Message-IDs could cause confusion for some parsers and generate invalid URLs. Start getting rid of the bloated, over-engineered OO Hval API while we're at it, I only noticed this bug because I started killing off Hval->new* callers.
2020-02-15	t/msg_iter: test for X-UNKNOWN charset from Alpine
	A long overdue test for behavior established in 2016. Fixes: 1b28cc7f00a866cb ("view: try assuming UTF-8 for bogus charsets")
2020-02-08	t/multi-mid: skip properly w/o DBD::SQLite
	SearchIdx always requires DBD::SQLite, so only require it after we've passed `require_mods(qw(DBD::SQLite))'.
2020-02-07	tests: switch to XML::TreePP for testing Atom feeds
	XML::Feed pulls in a lot of dependencies, some of which XS. That makes testing with blead or any non-OS-supplied Perl installations more time consuming and more difficult because of the need to have development headers and libraries for libexpat1 or libxml2. Performance from libexpat1 or libxml2 for our small tests cases isn't relevant, either, and the pure Perl XML::TreePP seems up to the task. It's also available in CentOS 7.x, FreeBSD 11.x, and Debian, at least.
2020-02-07	syscall: support Linux x32 ABI
	The x32 ABI allows users to take advantage of the extra registers on x86-64 without the bloat of 64-bit pointers and longs. This ought to be significant since Perl was designed when 32-bit was prevalent; and the common structs for ops, hashes, scalars, and arrays use longs (SSize_t/Size_t) for things which should never need 64-bits when processing emails. Debian's x32 port seems to work quite nicely under a chroot on an amd64 Linux system. All tests pass under x32, now.
2020-02-06	treewide: run update-copyrights from gnulib for 2019
	I didn't wait until September to do it, this year!
2020-02-06	t/multi-mid: don't access ~/.public-inbox/config
	It can cause unpredictable behavior and also slow things down. Followup-to: e4d3be19612b2082 ("t: localize the PI_CONFIG env")
2020-02-04	over: simplify read-only vs read-write checking
	No need to call ref() and do a string comparison. Add some extra tests using the {ReadOnly} attribute in DBI.pm.
2020-02-04	www: serve $INBOX_DIR/description as $INBOX_URL/description
	Instead of serving $INBOX_DIR/all.git/description, since $INBOX_DIR/all.git/description is not described in the default message when it's missing.
2020-02-04	www: stricter regexp for 405 errors
	We want to match "GET" and "HEAD" exactly, not requests which start with "GET" or end with "HEAD". This doesn't seem like a real problem for public-inboxes which are actually public data anyways.
2020-02-02	convert: fix --no-index switch
	The (currently undocumented) "--no-index" flag did not trigger the V2Writable->done call necessary to make the import successful. Fixes: eea47b676127bcdb ("convert: preserve highwater mark from v1 msgmap")
2020-02-02	v2writable: nproc_shards: subtract 1 from given value
	This is to be consistent with the `nproc(1)' code path. It also quiets down a warning from Admin when "-j $JOBS" is specified, since the master process (which distributes work to shards and handles OverIdx and Msgmap) is considered a job on its own.
2020-02-02	t/multi-mid.t: extra test for -convert highwater mark
	This is derived from a real-world test case where I encounterd multiple Message-IDs in a v1 inbox causing regen problems. Fixes: eea47b676127bcdb ("convert: preserve highwater mark from v1 msgmap")
2020-01-31	convert: preserve highwater mark from v1 msgmap
	If we're reusing the msgmap from a v1 inbox, we also need to ensure the highwater mark doesn't get doubled in the v1->v2 conversion by internally triggering the equivalent of "--reindex" on a fresh v2 inbox. This was needed to convert an indexed v1 inbox which featured messages with multiple Message-IDs in it. Fresh, unindexed clones of v1 inboxes would not have been affected by this.
2020-01-31	mboxgz: ensure gzipped mboxes always have filenames
	Lets always have Content-Disposition for files intended to be downloaded for consumption by non-browsers, such as pigz, zcat, "git am". This is also to be consistent with the non-gzipped mbox $MESSAGE_ID/raw endpoint.
2020-01-31	t/psgi_search: test for subject-free messages
	Apparently I fixed this bug a while back in commit f94c3a195a25a31d0215cd175938008fca473378 but did not write tests.
2020-01-28	v2writable: newest epochs go first in alternates
	New epochs are the most likely to have loose objects. git won't be able to take advantage of pack indices and needs to scan every alternate for the loose object via open/openat syscalls. Those syscalls will add up some day when we've got hundreds or thousands of epochs.
2020-01-28	t/v2reindex.t: 5.10.1 glob compatibility
	I'm not sure when `for (<"quoted string/glob/*">)' became supported, and maybe it was inadvertant, but it fails with Perl 5.10.1. Just use the glob() function to be explicit.
2020-01-28	t/hl_mod: document IO::Handle for autoflush
	We don't need IO::File for this test, but IO::Handle is needed for ->autoflush with Perl <5.14. Note: I haven't tested highlight.pm under 5.10.1 since it's a weird dependency which isn't easy to install w/o distro support.
2020-01-28	avoid relying on IO::Handle/IO::File autoload
	Perl 5.14+ gained the ability to autoload IO::File (and IO::Handle) on missing methods, so relying on this breaks under 5.10.1. There's no reason to load IO::File or IO::Handle when built-in perlops work fine and are even a hair faster.
2020-01-28	daemon: provide TCP_DEFER_ACCEPT for Perl <5.14
	Socket::TCP_DEFER_ACCEPT() did not appear in the Socket module distributed with Perl until 5.14, despite it being available since Linux 2.4.
2020-01-27	tests: move the majority of t/view.t into t/plack.t
	And some more into t/mid.t. PublicInbox::View::msg_html may change internally, so lets rely on the stable PSGI interface to test it, rather than a test which reaches deep into the internals.
2020-01-27	t/plack.t: modernize and unindent
	This test will be expanded, and we can take advantage of run_script to simplify our internal API use.
2020-01-23	hval: from_attr: move to unit test
	We don't call from_attr anywhere outside of tests, so don't bloat normal processes with it.
2020-01-23	hval: to_attr: support wide characters
	We need to escape wide characters when making attribute names from filename-looking things in diffstats.
2020-01-13	ds: add_timer: rename from AddTimer, remove a parameter
	The class parameter is pointless, especially for an internal sub which only has one external caller in a test. Add a sub prototype while we're at it to get some compile time checking.
2020-01-13	use popen_rd for bidirectional pipes
	popen_rd accepts arbitrary redirects, so we can reuse its code to setup the pipe end we want to read, saving each caller a few lines of code compared to calling pipe+spawn.
2020-01-13	t/solver_git: avoid uninitialized warnings in hostname generation
	Outside of tests, this is only relevant for non-PSGI use, which may happen someday... Fixes: cb1c874520153f5c ("inbox: use PublicInbox::Git::host_prefix_url for base_url")
2020-01-11	make Plack optional for non-WWW and non-httpd users
	Some users just want to run -mda, -watch, and/or -nntpd. Let them run just those without forcing them to pull in a bunch of dependencies.
2020-01-11	spawn (and thus popen_rd) die on failure
	Most spawn and popen_rd callers die on failure to spawn, anyways, and some are missing checks entirely. This saves us a bunch of verbose error-checking code in callers. This also makes popen_rd more consistent, since it already dies on pipe creation failures.
2020-01-11	git: remove ->commit_title method
	We haven't used it in SolverGit, yet, and I'll be reworking it to work with ->cat_async, instead.
2020-01-11	inbox: use PublicInbox::Git::host_prefix_url for base_url
	Better not to duplicate the same logic across different classes. Also, our git wrapper class is a strange place for host_prefix_url, but it needs to be usable for coderepos, so it's there, for now...
2020-01-06	treewide: "require" + "use" cleanup and docs
	There's a bunch of leftover "require" and "use" statements we no longer need and can get rid of, along with some excessive imports via "use". IO::Handle usage isn't always obvious, so add comments describing why a package loads it. Along the same lines, document the tmpdir support as the reason we depend on File::Temp 0.19, even though every Perl 5.10.1+ user has it. While we're at it, favor "use" over "require", since it it gives us extra compile-time checking.
2020-01-06	t/nntp.t: fix parse_time test for non-GMT local time
	Yes, there's actually other timezones!
2020-01-05	tests: remove some "git config" calls after "git init"
	Creating a hash and iterating through it just to run "git config" is ugly and slow. Just write out the text file in a human-friendly way since the git-config file format is stable and won't break randomly.
2020-01-05	search: remove lookup_article
	It was no longer used outside of tests, so don't penalize regular users with the extra function. Just inline it for t/search.t.
2020-01-04	tests: fix running without SQLite or Xapian
	PublicInbox::Search always loads DBD::SQLite, so we can't blindly "use" it in t/xcpdb-reshard.t. We also need to account for that in TestCommon.
2020-01-02	config: support multi-value inbox..url
	Since the beginning of this project, we've implicitly supported inboxes with multiple URLs by relying on the Host: header sent by the client ($env->{HTTP_HOST}). We now offer the option to explicitly configure multiple URLs for every inbox along with the ability to do a best-effort match for matching hostnames.
2020-01-01	wwwstatic: add directory listing + index.html support
	It's now possible to use WwwStatic as a standalone PSGI app to serve static files and recreate the award-winning web design of https://public-inbox.org/ :>