Age | Commit message | Author | Files | Lines
2020-02-15 | t/msg_iter: test for X-UNKNOWN charset from Alpine | Eric Wong | 3 | -0/+42
A long overdue test for behavior established in 2016. Fixes: 1b28cc7f00a866cb ("view: try assuming UTF-8 for bogus charsets")
2020-02-09 | doc: update v1.3.0.eml with actual headers, start v1.4.0 | Eric Wong | 3 | -1/+16
Bigger changes coming :>
2020-02-10 | public-inbox 1.3.0 v1.3.0 | Eric Wong | 2 | -7/+8
2020-02-08 | t/multi-mid: skip properly w/o DBD::SQLite | Eric Wong | 1 | -1/+1
SearchIdx always requires DBD::SQLite, so only require it after we've passed `require_mods(qw(DBD::SQLite))'.
2020-02-08 | convert: preserve indexlevel on conversions | Eric Wong | 1 | -0/+9
We don't want to blow up users' storage too badly when converting v1 to v2, or break because they don't have Xapian bindings installed.
2020-02-08 | doc: more 1.3.0 release notes updates | Eric Wong | 1 | -1/+3
2020-02-08 | doc: mark some TODO items as done | Eric Wong | 3 | -6/+4
NNTP TLS and COMPRESS support and cgit spawning from the WWW interface were implemented last year. Given the lack of a syscall number stability guarantee on OpenBSD and FreeBSD, I don't think supporting a pure-Perl kevent is feasible. Inline::C may still be an option since IO::KQueue is abandoned, though, as it is for some Linux-only syscalls and maybe some POSIX ones not covered by POSIX.pm.
2020-02-08 | doc: update copyright for standards.perl | Eric Wong | 1 | -1/+1
It was missing "(C)", so gnulib update-copyright missed it.
2020-02-07 | tests: switch to XML::TreePP for testing Atom feeds | Eric Wong | 5 | -30/+37
XML::Feed pulls in a lot of dependencies, some of which are XS. That makes testing with blead or any non-OS-supplied Perl installation more time-consuming and more difficult because of the need to have development headers and libraries for libexpat1 or libxml2. Performance from libexpat1 or libxml2 for our small test cases isn't relevant, either, and the pure Perl XML::TreePP seems up to the task. It's also available in CentOS 7.x, FreeBSD 11.x, and Debian, at least.
2020-02-07 | syscall: support Linux x32 ABI | Eric Wong | 3 | -2/+28
The x32 ABI allows users to take advantage of the extra registers on x86-64 without the bloat of 64-bit pointers and longs. This ought to be significant since Perl was designed when 32-bit was prevalent, and the common structs for ops, hashes, scalars, and arrays use longs (SSize_t/Size_t) for things which should never need 64 bits when processing emails. Debian's x32 port seems to work quite nicely under a chroot on an amd64 Linux system. All tests pass under x32, now.
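As a rough illustration of why an x32 port needs its own syscall table: on Linux, x32 reuses most x86-64 syscall numbers but sets an extra bit (`__X32_SYSCALL_BIT`, 0x40000000) on them. The sketch below uses the epoll calls purely as an example; which syscalls the commit actually covers is not stated in this log, and the helper name is hypothetical.

```python
# __X32_SYSCALL_BIT as defined in the Linux UAPI headers
X32_SYSCALL_BIT = 0x40000000

# x86-64 syscall numbers for the epoll family (shared with x32 before
# the x32 bit is applied); chosen only as an example for this sketch
X86_64_NUMBERS = {
    "epoll_create": 213,
    "epoll_wait": 232,
    "epoll_ctl": 233,
}


def x32_number(name):
    """Return the x32 syscall number for a name in X86_64_NUMBERS."""
    return X86_64_NUMBERS[name] | X32_SYSCALL_BIT
```

A caller that invokes syscalls by number has to pick the right table for the running ABI, which is presumably the kind of selection the commit adds.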
2020-02-06 | treewide: run update-copyrights from gnulib for 2019 | Eric Wong | 234 | -234/+234
I didn't wait until September to do it, this year!
2020-02-06 | MANIFEST: add flow.{ge,txt} | Eric Wong | 1 | -0/+2
Oops :x
2020-02-06 | doc: v1: add a reference to git-filter-repo(1), too | Eric Wong | 2 | -1/+4
The git-filter-branch(1) manpage itself recommends git-filter-repo nowadays, due to performance and safety problems.
2020-02-06 | doc: txt2pre: auto-linkify manpage references | Eric Wong | 4 | -7/+88
This can be more convenient for people browsing HTML docs remotely or locally.
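A minimal sketch of what auto-linkifying manpage references could look like, assuming references take the form `name(section)`. The base URL, the `name.section.html` layout, and the function name are all hypothetical; the real txt2pre rules are not shown in this log.

```python
import html
import re

# name(N) where N is a single-digit manual section, e.g. git-filter-repo(1)
MAN_RE = re.compile(r"\b([A-Za-z0-9][A-Za-z0-9_.-]*)\(([1-9])\)")


def linkify_manpages(text, base="https://example.org/man/"):
    """Escape text for HTML, then wrap manpage references in links."""
    escaped = html.escape(text)

    def repl(m):
        # Hypothetical URL layout: <base><name>.<section>.html
        href = f"{base}{m.group(1)}.{m.group(2)}.html"
        return f'<a href="{href}">{m.group(0)}</a>'

    return MAN_RE.sub(repl, escaped)
```

Escaping before substitution keeps the generated anchors from being escaped themselves, and the name pattern contains no characters that `html.escape` rewrites.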
2020-02-06 | doc: remove .x/ subdirectory for Xapian manpages | Eric Wong | 3 | -7/+4
There's no need to keep Xapian manpage renderings in a separate subdirectory, after all. Eliminating this difference between the local FS and URL path will allow relative URLs to the Xapian manpages in our local HTML documentation to work smoothly, since there was never any ".x/" path component for files served from public-inbox.org.
2020-02-06 | doc: add data flow diagram using Graph::Easy | Eric Wong | 4 | -1/+77
Maybe this can make it easier for new and potential users to understand what's going on.
2020-02-06 | t/multi-mid: don't access ~/.public-inbox/config | Eric Wong | 1 | -2/+2
It can cause unpredictable behavior and also slow things down. Followup-to: e4d3be19612b2082 ("t: localize the PI_CONFIG env")
2020-02-04 | doc: recommend -compact after --reindex | Eric Wong | 2 | -3/+5
It's likely a user will be low on space after running --reindex, so recommend the use of public-inbox-compact afterwards. Also add a few more notes about public-inbox-compact to clarify that it's for inboxes only (and not any old Xapian DBs), and that using xapian-compact(1) directly is error-prone and likely to break things.
2020-02-04 | over: simplify read-only vs read-write checking | Eric Wong | 3 | -6/+10
No need to call ref() and do a string comparison. Add some extra tests using the {ReadOnly} attribute in DBI.pm.
2020-02-04 | inbox: remove TODO item for msg_by_path | Eric Wong | 1 | -1/+1
It's an old function which only gets called by inboxes w/o SQLite indices.
2020-02-04 | inbox: simplify ->description and ->cloneurl | Eric Wong | 1 | -15/+9
We can use "//=" from Perl 5.10 to simplify the logic for these methods. The use of chomp() in ->cloneurl was also unnecessary since split(/\s+/s,...) already removes newlines.
2020-02-04 | www: serve $INBOX_DIR/description as $INBOX_URL/description | Eric Wong | 3 | -0/+16
Serve this instead of $INBOX_DIR/all.git/description, since $INBOX_DIR/all.git/description is not mentioned in the default message shown when the description is missing.
2020-02-04 | www: stricter regexp for 405 errors | Eric Wong | 2 | -1/+6
We want to match "GET" and "HEAD" exactly, not requests which start with "GET" or end with "HEAD". This doesn't seem like a real problem for public-inboxes which are actually public data anyways.
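The difference between a loose and an exact method match can be sketched as follows. Both patterns are hypothetical stand-ins; the actual regexps from the commit are not shown in this log.

```python
import re

# Loose: matches "GET" or "HEAD" anywhere in the string
LOOSE = re.compile(r"GET|HEAD")
# Strict: the whole string must be exactly "GET" or "HEAD"
STRICT = re.compile(r"\A(?:GET|HEAD)\Z")


def is_allowed_loose(method):
    return LOOSE.search(method) is not None


def is_allowed_strict(method):
    return STRICT.match(method) is not None


def status_for(method):
    """Return 405 for anything other than an exact GET or HEAD."""
    return 200 if is_allowed_strict(method) else 405
```

With the loose pattern, made-up methods such as "GETX" or "AHEAD" slip through; the anchored pattern rejects them with a 405.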
2020-02-04 | doc: spellling fixes for manpages | Eric Wong | 2 | -5/+5
The wording for publicinbox.nntpserver was awkward, too, and I took this as an opportunity to hopefully clarify it and favor "hostname" for Internet addresses, because we already use "address" to mean "email address" in the config.
2020-02-02 | spawn: actually die on (vfork|fork) failures | Eric Wong | 1 | -1/+2
Commit 9f5a583694396f84 ("spawn (and thus popen_rd) die on failure") was incomplete in that it only removed error checking for spawn failures for non-(vfork|fork) calls, but the actual (vfork|fork) PID result could still be undef. Fixes: 9f5a583694396f84 ("spawn (and thus popen_rd) die on failure")
2020-02-02 | v2writable: more ways to detect online CPU count | Eric Wong | 1 | -3/+19
OpenBSD and FreeBSD support `getconf NPROCESSORS_ONLN` (no leading underscore). They may also have GNU nproc installed as "gnproc". We may also encounter Linux systems w/o GNU coreutils, but able to use `getconf _NPROCESSORS_ONLN` (with leading underscore).
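A minimal sketch of the fallback chain described above, trying each command in turn and using the first one that prints a positive integer. The probing order, the timeout, and the function names are assumptions of this sketch, not taken from the commit.

```python
import os
import re
import shutil
import subprocess

# Candidate commands named in the commit message; the order is an assumption
CANDIDATES = [
    ["nproc"],                         # GNU coreutils
    ["gnproc"],                        # GNU nproc under its BSD package name
    ["getconf", "_NPROCESSORS_ONLN"],  # Linux (leading underscore)
    ["getconf", "NPROCESSORS_ONLN"],   # OpenBSD/FreeBSD (no underscore)
]


def parse_count(text):
    """Return a positive int parsed from command output, or None."""
    m = re.fullmatch(r"[0-9]+", text.strip())
    if m and int(m.group(0)) > 0:
        return int(m.group(0))
    return None


def online_cpus(default=1):
    for cmd in CANDIDATES:
        if shutil.which(cmd[0]) is None:
            continue  # command not installed, try the next one
        try:
            out = subprocess.run(
                cmd, capture_output=True, text=True, timeout=5
            ).stdout
        except (OSError, subprocess.SubprocessError):
            continue
        n = parse_count(out)
        if n is not None:
            return n
    # Last resort: ask the interpreter, then fall back to the default
    return os.cpu_count() or default
```

Validating the output matters because `getconf` prints an error or "undefined" for names it does not know, which must not be mistaken for a count.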
2020-02-02 | doc: -convert: document switches | Eric Wong | 1 | -2/+39
These switches have always been there, but were not documented until now.
2020-02-02 | convert: fix --no-index switch | Eric Wong | 2 | -4/+5
The (currently undocumented) "--no-index" flag did not trigger the V2Writable->done call necessary to make the import successful. Fixes: eea47b676127bcdb ("convert: preserve highwater mark from v1 msgmap")
2020-02-02 | convert: shift @ARGV explicitly | Eric Wong | 1 | -2/+2
Relying on implicit "@_" for shift fails with TestCommon::_run_sub iff GetOptions modifies @ARGV.
2020-02-02 | searchidxshard: rely on autoflush instead of ->flush | Eric Wong | 1 | -2/+2
It reduces the number of ops and simplifies the code, slightly. Add a missing IO::Handle import while we're at it, to be explicit about which methods we use.
2020-02-02 | convert: remove unused variables capturing :from | Eric Wong | 1 | -6/+0
Looking at git history, they were never used.
2020-02-02 | v2writable: do not clobber {shards} or {parallel} if unset | Eric Wong | 1 | -2/+5
The $jobs parameter in `public-inbox-convert' is passed to V2Writable->init_inbox as `undef' by default, causing parallelization to be disabled. Instead, leave the underlying {parallel} flag untouched if $shards is undef and do not clobber the default shard count. This allows us to take advantage of multicore systems when running public-inbox-convert with no command-line switches.
2020-02-02 | v2writable: nproc_shards: subtract 1 from given value | Eric Wong | 2 | -8/+3
This is to be consistent with the `nproc(1)' code path. It also quiets down a warning from Admin when "-j $JOBS" is specified, since the master process (which distributes work to shards and handles OverIdx and Msgmap) is considered a job on its own.
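The arithmetic can be sketched in a few lines: one of the requested jobs is reserved for the master process, and the rest become shards. The function name and the clamp to at least one shard are assumptions of this sketch.

```python
def shards_for_jobs(jobs):
    """Map a "-j JOBS" value to a shard count.

    One job is reserved for the master process, which distributes work
    to the shards. Clamping to at least one shard is an assumption.
    """
    if jobs < 1:
        raise ValueError("jobs must be a positive integer")
    return max(1, jobs - 1)
```

Under these assumptions, "-j 4" would yield three shards plus the master process, for four jobs in total.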
2020-02-02 | t/multi-mid.t: extra test for -convert highwater mark | Eric Wong | 2 | -0/+62
This is derived from a real-world test case where I encountered multiple Message-IDs in a v1 inbox causing regen problems. Fixes: eea47b676127bcdb ("convert: preserve highwater mark from v1 msgmap")
2020-02-01 | doc: more 1.3.0 release notes updates | Eric Wong | 1 | -6/+22
Some updates with recent bugfixes and a few wording/formatting improvements.
2020-02-01 | config: assume multiple cgit URLs, too | Eric Wong | 2 | -9/+17
Since we support inboxes with multiple URLs and multiple infourls to reduce reliance on SPOFs, we'll do the same with cgit URLs.
2020-02-01 | solver: join multiple URLs with "||" | Eric Wong | 1 | -2/+3
It seems to make sense to the target audience that any of the URLs displayed could work.
2020-02-01 | wwwtext: give "url" examples in sample config | Eric Wong | 1 | -0/+2
inbox.$NAME.url is a common parameter and set by public-inbox-init(1), so ensure we have lines for it and emphasize it can be multi-value for .onion hidden services or otherwise mirrored and available under multiple URLs.
2020-02-01 | wwwtext: show multiple infourl values properly | Eric Wong | 1 | -2/+2
This is now an array, so ensure it's shown properly in the sample config, instead of "ARRAY(0xI8BADBEEF)" or similar. Fixes: 1988d730c0088e8b "config: support multi-value inbox.*.*url"
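The general shape of the fix, emitting one line per value instead of stringifying the container, can be sketched as below. This is a loose Python analogue, not the Perl change itself; the tab-indented `key = value` layout and the function name are assumptions.

```python
def config_lines(key, value):
    """Render a config key as one "key = value" line per value.

    Accepts either a single string or a list/tuple of strings, so that
    multi-value keys never get printed as the container itself.
    """
    values = value if isinstance(value, (list, tuple)) else [value]
    return [f"\t{key} = {v}" for v in values]
```

A multi-value key then shows up as repeated lines in the sample config, one per configured URL.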
2020-01-31 | convert: preserve highwater mark from v1 msgmap | Eric Wong | 2 | -3/+21
If we're reusing the msgmap from a v1 inbox, we also need to ensure the highwater mark doesn't get doubled in the v1->v2 conversion by internally triggering the equivalent of "--reindex" on a fresh v2 inbox. This was needed to convert an indexed v1 inbox which featured messages with multiple Message-IDs in it. Fresh, unindexed clones of v1 inboxes would not have been affected by this.
2020-01-31 | mboxgz: ensure gzipped mboxes always have filenames | Eric Wong | 2 | -6/+10
Let's always have Content-Disposition for files intended to be downloaded for consumption by non-browsers, such as pigz, zcat, "git am". This is also to be consistent with the non-gzipped mbox $MESSAGE_ID/raw endpoint.
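A minimal sketch of always attaching a filename to a gzipped mbox response. The sanitizing rule, the "no-subject" fallback, the "inline" disposition type, and the function name are all assumptions of this sketch rather than details taken from the commit.

```python
import re


def mbox_gz_headers(basename):
    """Build response headers that always carry a download filename."""
    # Replace runs of characters that are awkward in filenames with "-"
    safe = re.sub(r"[^A-Za-z0-9._-]+", "-", basename).strip("-")
    if not safe:
        safe = "no-subject"  # hypothetical fallback for empty names
    return [
        ("Content-Type", "application/gzip"),
        ("Content-Disposition", f"inline; filename={safe}.mbox.gz"),
    ]
```

Tools that save the response to disk then get a usable name even when the source string is empty or full of punctuation.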
2020-01-31 | t/psgi_search: test for subject-free messages | Eric Wong | 1 | -2/+31
Apparently I fixed this bug a while back in commit f94c3a195a25a31d0215cd175938008fca473378 but did not write tests.
2020-01-28 | v2writable: newest epochs go first in alternates | Eric Wong | 2 | -11/+55
New epochs are the most likely to have loose objects. git won't be able to take advantage of pack indices and needs to scan every alternate for the loose object via open/openat syscalls. Those syscalls will add up some day when we've got hundreds or thousands of epochs.
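The ordering itself can be sketched as a numeric, descending sort of epoch directories. The `N.git` naming and the `../../git/N.git/objects` relative path are assumptions about the v2 layout, and the function name is hypothetical.

```python
import re

# Epoch directories are assumed to be named 0.git, 1.git, 2.git, ...
EPOCH_RE = re.compile(r"\A([0-9]+)\.git\Z")


def alternates_lines(epoch_dirs):
    """Return alternates entries with the highest-numbered epoch first."""
    epochs = []
    for name in epoch_dirs:
        m = EPOCH_RE.match(name)
        if m:
            epochs.append(int(m.group(1)))
    # Sort numerically so that epoch 10 comes before 9, which a plain
    # string sort would get wrong
    epochs.sort(reverse=True)
    return [f"../../git/{n}.git/objects" for n in epochs]
```

Since git checks alternates in file order for loose objects, putting the newest epoch first means the most likely location is tried before the older ones.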
2020-01-28 | INSTALL: fix Linux::Inotify2 package name | Eric Wong | 1 | -1/+1
The "2" is important, since "Linux::Inotify" without the "2" is not available from Debian 9/10 or CentOS 7.x and seems unmaintained.
2020-01-28 | t/v2reindex.t: 5.10.1 glob compatibility | Eric Wong | 1 | -3/+3
I'm not sure when `for (<"quoted string/glob/*">)' became supported, and maybe it was inadvertent, but it fails with Perl 5.10.1. Just use the glob() function to be explicit.
2020-01-28 | t/hl_mod: document IO::Handle for autoflush | Eric Wong | 1 | -0/+1
We don't need IO::File for this test, but IO::Handle is needed for ->autoflush with Perl <5.14. Note: I haven't tested highlight.pm under 5.10.1 since it's a weird dependency which isn't easy to install w/o distro support.
2020-01-28 | avoid relying on IO::Handle/IO::File autoload | Eric Wong | 3 | -6/+5
Perl 5.14+ gained the ability to autoload IO::File (and IO::Handle) on missing methods, so relying on this breaks under 5.10.1. There's no reason to load IO::File or IO::Handle when built-in perlops work fine and are even a hair faster.
2020-01-28 | daemon: provide TCP_DEFER_ACCEPT for Perl <5.14 | Eric Wong | 5 | -9/+11
Socket::TCP_DEFER_ACCEPT() did not appear in the Socket module distributed with Perl until 5.14, despite it being available since Linux 2.4.
2020-01-27 | viewdiff: rewrite and simplify | Eric Wong | 2 | -166/+139
Instead of going line-by-line, use split() with a giant regexp to capture groups of contiguous lines. This offloads state management to the regexp itself and makes it FAR easier to keep track of <span> and </span> pairings. Performance seems roughly on par after this change for the meta@public-inbox archives. It seems a tiny bit faster for git@vger with xt/perf-msgview.t, likely because its longer messages have larger contiguous groups of lines with the same prefix (or no prefix at all), where this change drastically reduces the number of subroutine calls and Perl ops executed.
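The technique can be sketched with a much smaller regexp than the real one: splitting on a capturing group returns the matched runs along with the text between them, so each run of same-prefix lines arrives as one chunk. The pattern, the CSS class names, and the function names are hypothetical, and this sketch deliberately ignores diff headers such as "+++" and "---".

```python
import html
import re

# One capture group matching a run of contiguous "+" lines or "-" lines
RUN_RE = re.compile(r"((?:^\+.*\n)+|(?:^-.*\n)+)", re.MULTILINE)


def group_diff_lines(text):
    """Split diff text into (kind, chunk) groups of contiguous lines."""
    if text and not text.endswith("\n"):
        text += "\n"  # the pattern assumes newline-terminated lines
    groups = []
    for chunk in RUN_RE.split(text):
        if not chunk:
            continue  # split() yields "" between adjacent matches
        kind = {"+": "add", "-": "del"}.get(chunk[0], "ctx")
        groups.append((kind, chunk))
    return groups


def to_html(text):
    """Wrap each added/deleted run in exactly one <span> pair."""
    out = []
    for kind, chunk in group_diff_lines(text):
        body = html.escape(chunk)
        if kind == "ctx":
            out.append(body)
        else:
            out.append(f'<span class="{kind}">{body}</span>')
    return "".join(out)
```

Because every chunk is opened and closed in the same loop iteration, there is no cross-line state to track for the span pairing.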
2020-01-27 | viewdiff: use autovivification for long_path hash | Eric Wong | 1 | -2/+1
No sense in wasting code to do something the interpreter already does for us.