2020-02-06 MANIFEST: add flow.{ge,txt}
Oops :x
2020-02-06 doc: v1: add a reference to git-filter-repo(1), too
The git-filter-branch(1) manpage itself recommends git-filter-repo nowadays, due to performance and safety problems.
2020-02-06 doc: txt2pre: auto-linkify manpage references
This can be more convenient for people browsing HTML docs remotely or locally.
2020-02-06 doc: remove .x/ subdirectory for Xapian manpages
There's no need to keep Xapian manpage renderings in a separate subdirectory, after all. Eliminating this difference between the local FS and URL path will allow relative URLs to the Xapian manpages in our local HTML documentation to work smoothly, since there was never any ".x/" path component for files served from public-inbox.org.
2020-02-06 doc: add data flow diagram using Graph::Easy
Maybe this can make it easier for new and potential users to understand what's going on.
2020-02-06 t/multi-mid: don't access ~/.public-inbox/config
It can cause unpredictable behavior and also slow things down. Followup-to: e4d3be19612b2082 ("t: localize the PI_CONFIG env")
2020-02-04 doc: recommend -compact after --reindex
It's likely a user will be low on space after running --reindex, so recommend the use of public-inbox-compact afterwards. And add a few more notes about using public-inbox-compact to clarify that it's for inboxes only (and not any old Xapian DBs), and that using xapian-compact(1) directly is error-prone and likely to break things.
2020-02-04 over: simplify read-only vs read-write checking
No need to call ref() and do a string comparison. Add some extra tests using the {ReadOnly} attribute in DBI.pm.
2020-02-04 inbox: remove TODO item for msg_by_path
It's an old function which only gets called by inboxes w/o SQLite indices.
2020-02-04 inbox: simplify ->description and ->cloneurl
We can use "//=" from Perl 5.10 to simplify the logic for these methods. The use of chomp() in ->cloneurl was also unnecessary since split(/\s+/s,...) already removes newlines.
2020-02-04 www: serve $INBOX_DIR/description as $INBOX_URL/description
This replaces serving $INBOX_DIR/all.git/description, since that path is not mentioned in the default message shown when the description is missing.
2020-02-04 www: stricter regexp for 405 errors
We want to match "GET" and "HEAD" exactly, not requests which start with "GET" or end with "HEAD". This doesn't seem like a real problem for public-inboxes which are actually public data anyways.
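The difference between anchored and unanchored matching can be shown with a minimal Python sketch. The project itself is Perl; the pattern names and the sample method strings here are assumptions made for illustration, not the project's code.

```python
import re

# Hypothetical patterns, for illustration only.
LOOSE = re.compile(r"GET|HEAD")            # matches anywhere in the string
STRICT = re.compile(r"\A(?:GET|HEAD)\Z")   # matches the whole string only


def is_allowed(method, pattern):
    """Return True if the request method is accepted by the pattern."""
    return pattern.search(method) is not None


for m in ["GET", "HEAD", "GETX", "XHEAD", "POST"]:
    print(m, is_allowed(m, LOOSE), is_allowed(m, STRICT))
```

With the loose pattern, "GETX" and "XHEAD" are accepted; with the anchored one, only the exact strings "GET" and "HEAD" pass.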
2020-02-04 doc: spelling fixes for manpages
The wording for publicinbox.nntpserver was awkward, too, and I took this as an opportunity to hopefully clarify it and favor "hostname" for Internet addresses, because we already use "address" to mean "email address" in the config.
2020-02-02 spawn: actually die on (vfork|fork) failures
Commit 9f5a583694396f84 ("spawn (and thus popen_rd) die on failure") was incomplete in that it only removed error checking for spawn failures for non-(vfork|fork) calls, but the actual (vfork|fork) PID result could still be undef. Fixes: 9f5a583694396f84 ("spawn (and thus popen_rd) die on failure")
2020-02-02 v2writable: more ways to detect online CPU count
OpenBSD and FreeBSD support `getconf NPROCESSORS_ONLN` (no leading underscore). They may also have GNU nproc installed as "gnproc". We may also encounter Linux systems w/o GNU coreutils, but able to use `getconf _NPROCESSORS_ONLN` (with leading underscore).
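A fallback chain over the commands named above might look like the following Python sketch. It is not the project's Perl code: the order in which the commands are tried, the function names, and the default of 1 are all assumptions.

```python
import subprocess

# Candidate commands, tried in order; this ordering is an assumption.
CANDIDATES = [
    ["nproc"],
    ["gnproc"],
    ["getconf", "_NPROCESSORS_ONLN"],
    ["getconf", "NPROCESSORS_ONLN"],
]


def run_command(argv):
    """Run argv and return its stdout as text, or None on any failure."""
    try:
        out = subprocess.run(argv, capture_output=True, text=True, timeout=5)
    except (OSError, subprocess.SubprocessError):
        return None
    return out.stdout if out.returncode == 0 else None


def online_cpus(runner=run_command, default=1):
    """Return the first positive integer reported by a candidate command."""
    for argv in CANDIDATES:
        out = runner(argv)
        if out is None:
            continue
        text = out.strip()
        if text.isdigit() and int(text) > 0:
            return int(text)
    return default


print(online_cpus())
```

The `runner` parameter exists only so the lookup can be exercised without depending on which tools happen to be installed.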
2020-02-02 doc: -convert: document switches
These switches have always been there, but were not documented until now.
2020-02-02 convert: fix --no-index switch
The (currently undocumented) "--no-index" flag did not trigger the V2Writable->done call necessary to make the import successful. Fixes: eea47b676127bcdb ("convert: preserve highwater mark from v1 msgmap")
2020-02-02 convert: shift @ARGV explicitly
Relying on implicit "@_" for shift fails with TestCommon::_run_sub iff GetOptions modifies @ARGV.
2020-02-02 searchidxshard: rely on autoflush instead of ->flush
It reduces the number of ops and simplifies the code, slightly. Add a missing IO::Handle import while we're at it, to be explicit about which methods we use.
2020-02-02 convert: remove unused variables capturing :from
Looking at git history, they were never used.
2020-02-02 v2writable: do not clobber {shards} or {parallel} if unset
The $jobs parameter in `public-inbox-convert' is passed to V2Writable->init_inbox as `undef' by default, causing parallelization to be disabled. Instead, leave the underlying {parallel} flag untouched if $shards is undef and do not clobber the default shard count. This allows us to take advantage of multicore systems when running public-inbox-convert with no command-line switches.
2020-02-02 v2writable: nproc_shards: subtract 1 from given value
This is to be consistent with the `nproc(1)' code path. It also quiets down a warning from Admin when "-j $JOBS" is specified, since the master process (which distributes work to shards and handles OverIdx and Msgmap) is considered a job on its own.
2020-02-02 t/multi-mid.t: extra test for -convert highwater mark
This is derived from a real-world test case where I encountered multiple Message-IDs in a v1 inbox causing regen problems. Fixes: eea47b676127bcdb ("convert: preserve highwater mark from v1 msgmap")
2020-02-01 doc: more 1.3.0 release notes updates
Some updates with recent bugfixes and a few wording/formatting improvements.
2020-02-01 config: assume multiple cgit URLs, too
Since we support inboxes with multiple URLs and multiple infourls to reduce reliance on SPOFs, we'll do the same with cgit URLs.
2020-02-01 solver: join multiple URLs with "||"
It seems to make sense to the target audience that any of the URLs displayed could work.
2020-02-01 wwwtext: give "url" examples in sample config
inbox.$NAME.url is a common parameter and set by public-inbox-init(1), so ensure we have lines for it and emphasize it can be multi-value for .onion hidden services or otherwise mirrored and available under multiple URLs.
2020-02-01 wwwtext: show multiple infourl values properly
This is now an array, so ensure it's shown properly in the sample config, instead of "ARRAY(0xI8BADBEEF)" or similar. Fixes: 1988d730c0088e8b ("config: support multi-value inbox.*.*url")
2020-01-31 convert: preserve highwater mark from v1 msgmap
If we're reusing the msgmap from a v1 inbox, we also need to ensure the highwater mark doesn't get doubled in the v1->v2 conversion by internally triggering the equivalent of "--reindex" on a fresh v2 inbox. This was needed to convert an indexed v1 inbox which featured messages with multiple Message-IDs in it. Fresh, unindexed clones of v1 inboxes would not have been affected by this.
2020-01-31 mboxgz: ensure gzipped mboxes always have filenames
Let's always have Content-Disposition for files intended to be downloaded for consumption by non-browsers, such as pigz, zcat, "git am". This is also to be consistent with the non-gzipped mbox $MESSAGE_ID/raw endpoint.
2020-01-31 t/psgi_search: test for subject-free messages
Apparently I fixed this bug a while back in commit f94c3a195a25a31d0215cd175938008fca473378 but did not write tests.
2020-01-28 v2writable: newest epochs go first in alternates
New epochs are the most likely to have loose objects. git won't be able to take advantage of pack indices and needs to scan every alternate for the loose object via open/openat syscalls. Those syscalls will add up some day when we've got hundreds or thousands of epochs.
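As a rough illustration of the ordering, the following Python sketch emits alternates entries with the highest-numbered epoch first. The relative path layout and the function name are assumptions for this sketch, not taken from the project's Perl code.

```python
def alternates_lines(epochs):
    """Return alternates entries with the highest-numbered epoch first."""
    # The relative path layout below is an assumption for this sketch.
    ordered = sorted(epochs, reverse=True)  # numeric sort, so 10 precedes 2
    return ["../../git/%d.git/objects" % n for n in ordered]


print("\n".join(alternates_lines([0, 1, 2, 10])))
```

Sorting numerically rather than lexically matters once there are ten or more epochs, since a string sort would place "10" before "2".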
2020-01-28 INSTALL: fix Linux::Inotify2 package name
The "2" is important, since "Linux::Inotify" without the "2" is not available from Debian 9/10 or CentOS 7.x and seems unmaintained.
2020-01-28 t/v2reindex.t: 5.10.1 glob compatibility
I'm not sure when `for (<"quoted string/glob/*">)' became supported, and maybe it was inadvertent, but it fails with Perl 5.10.1. Just use the glob() function to be explicit.
2020-01-28 t/hl_mod: document IO::Handle for autoflush
We don't need IO::File for this test, but IO::Handle is needed for ->autoflush with Perl <5.14. Note: I haven't tested highlight.pm under 5.10.1 since it's a weird dependency which isn't easy to install w/o distro support.
2020-01-28 avoid relying on IO::Handle/IO::File autoload
Perl 5.14+ gained the ability to autoload IO::File (and IO::Handle) on missing methods, so relying on this breaks under 5.10.1. There's no reason to load IO::File or IO::Handle when built-in perlops work fine and are even a hair faster.
2020-01-28 daemon: provide TCP_DEFER_ACCEPT for Perl <5.14
Socket::TCP_DEFER_ACCEPT() did not appear in the Socket module distributed with Perl until 5.14, despite it being available since Linux 2.4.
2020-01-27 viewdiff: rewrite and simplify
Instead of going line-by-line, use split() with a giant regexp to capture groups of contiguous lines. This offloads state management to the regexp itself and makes it FAR easier to keep track of <span> and </span> pairings. Performance seems roughly on par after this change for the meta@public-inbox archives. It seems a tiny bit faster for git@vger with xt/perf-msgview.t, likely because the longer messages and larger contiguous groups of lines sharing the same prefix (or no prefix at all) drastically reduce the number of subroutine calls and Perl ops executed.
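The split-with-capture idea can be sketched in Python as follows. This is a much-simplified illustration and not the project's Perl regexp: it only distinguishes added and removed lines, and the CSS class names and function name are assumptions.

```python
import html
import re

# One capture group per run of contiguous lines sharing a prefix.
RUNS = re.compile(r"((?:^\+[^\n]*\n)+|(?:^-[^\n]*\n)+)", re.M)


def diff_to_html(text):
    """Wrap runs of added/removed lines in <span> pairs; escape everything."""
    out = []
    # With one capture group, split() alternates: unmatched text at even
    # indices, captured runs at odd indices.
    for i, chunk in enumerate(RUNS.split(text)):
        if not chunk:
            continue
        escaped = html.escape(chunk)
        if i % 2 == 1 and chunk.startswith("+"):
            out.append('<span class="add">' + escaped + "</span>")
        elif i % 2 == 1:
            out.append('<span class="del">' + escaped + "</span>")
        else:
            out.append(escaped)
    return "".join(out)


print(diff_to_html("ctx\n+a\n+b <x>\n-c\nend\n"))
```

Because each run is opened and closed in the same branch, every `<span>` has its `</span>` by construction, and one iteration handles a whole run instead of one line.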
2020-01-27 viewdiff: use autovivification for long_path hash
No sense in wasting code to do something the interpreter already does for us.
2020-01-27 viewdiff: add "b=" param when missing "diff --git" line
<2841d2de-32ad-eae8-6039-9251a40bb00e@tngtech.com> as posted to git@vger contained an otherwise valid diff without a "diff --git" line. Generate a "b=" parameter in that case using the "+++" line instead of the "diff --git" line. SearchIdx.pm no longer uses the "diff --git" line for filename information, either.
2020-01-27 viewdiff: add "b=" param with non-standard diff prefix
<20180228012207.GB251290@aiede.svl.corp.google.com> (posted to git@vger) uses "i" and "w" prefixes instead of the standard "a" and "b" prefixes. Ensure we emit a "b=$FILENAME" param for the solver endpoint to improve search accuracy, syntax highlighting, and information density in the URL itself.
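One way to derive such a parameter from the "+++" line regardless of the prefix is sketched below in Python. The regexp, the URL-escaping step, and the function name are assumptions for illustration; the project's Perl code may differ, and this sketch does not handle filenames containing whitespace or diffs generated with no prefix at all.

```python
import re
from urllib.parse import quote

# Strip exactly one leading path component, whatever the prefix is.
PLUS_LINE = re.compile(r"^\+\+\+ [^/\s]+/(\S+)", re.M)


def b_param(diff_text):
    """Return a 'b=...' query parameter from the first '+++' line, or None."""
    m = PLUS_LINE.search(diff_text)
    if m is None:
        return None
    return "b=" + quote(m.group(1), safe="/")


print(b_param("--- i/foo/bar.c\n+++ w/foo/bar.c\n"))
print(b_param("--- a/x.txt\n+++ b/x.txt\n"))
```

A "+++ /dev/null" line (a deleted file) has no prefix component before the first slash, so the pattern does not match and no parameter is produced.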
2020-01-27 searchidx: don't assume "a/" and "b/" as prefixes
Some people use "--{src,dst}-prefix="; try to deal with those, since git-apply can handle them when called by solver.
2020-01-27 searchidx: skip filenames on "diff --git ..."
We already capture filenames on the lines beginning with "---" and "+++", so it's redundant work to capture filenames from "diff --git ..." lines.
2020-01-27 linkify: move to_html over from ViewDiff
We use the same idiom in many places for doing two-step linkification and HTML escaping. Get rid of an outdated comment in flush_quote while we're at it.
2020-01-27 linkify: compile $LINK_RE once
This gives a 3-4% performance improvement in xt/perf-msgview.t with a mirror of https://public-inbox.org/meta/
2020-01-27 view: inline and eliminate msg_html
No need to keep the old sub around, anymore. Rename auxiliary subs to "msg_page_*" instead of the "html" version.
2020-01-27 xt/perf-msgview: switch to multipart_text_as_html
It's a more widely-used (but still internal) API which will probably last longer than msg_html. It also reaches deeper into the stack and avoids the overhead of ->getline via PSGI, so it's faster and gives a more accurate measurement of lower-level parts.
2020-01-27 tests: move the majority of t/view.t into t/plack.t
And some more into t/mid.t. PublicInbox::View::msg_html may change internally, so let's rely on the stable PSGI interface to test it, rather than a test which reaches deep into the internals.
2020-01-27 init: use Import::run_die instead of system()
We already load PublicInbox::Import via PublicInbox::InboxWritable, so it's not an extra module to load. This can give us a slight speedup in tests.
2020-01-27 t/plack.t: modernize and unindent
This test will be expanded, and we can take advantage of run_script to simplify our internal API use.