public-inbox.git - an "archives first" approach to mailing lists

Date	Commit message
2020-02-02	doc: -convert: document switches
	These switches have always been there, but were not documented until now.
2020-02-02	convert: fix --no-index switch
	The (currently undocumented) "--no-index" flag did not trigger the V2Writable->done call necessary to make the import successful. Fixes: eea47b676127bcdb ("convert: preserve highwater mark from v1 msgmap")
2020-02-02	convert: shift @ARGV explicitly
	Relying on implicit "@_" for shift fails with TestCommon::_run_sub iff GetOptions modifies @ARGV.
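As a minimal sketch of the pitfall described above (the sub name `main_body` and the path are hypothetical, not taken from the commit): a bare `shift` reads `@ARGV` at file scope but reads `@_` inside a subroutine, so a script body wrapped in a sub by a test harness silently changes meaning unless `@ARGV` is named explicitly.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a script body that a test harness wraps in a sub.
sub main_body {
    my $implicit = shift;        # inside a sub, a bare shift reads @_
    my $explicit = shift @ARGV;  # the explicit form always reads @ARGV
    return ($implicit, $explicit);
}

@ARGV = ('/path/to/v1-inbox');   # illustrative argument only
my ($implicit, $explicit) = main_body();  # called with no arguments

print 'implicit: ', (defined $implicit ? $implicit : '(undef)'), "\n";
print "explicit: $explicit\n";
```

Here the implicit form yields undef because `@_` is empty, while the explicit form picks up the command-line argument.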
2020-02-02	searchidxshard: rely on autoflush instead of ->flush
	It reduces the number of ops and simplifies the code, slightly. Add a missing IO::Handle import while we're at it, to be explicit about which methods we use.
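A small sketch of the idea, not the actual SearchIdxShard code: once `autoflush` is enabled on a handle, each `print` is written out immediately, so separate `->flush` calls become unnecessary. The pipe and the message text are illustrative assumptions.

```perl
use strict;
use warnings;
use IO::Handle;  # explicit import: provides ->autoflush on Perl < 5.14

pipe(my $r, my $w) or die "pipe: $!";
$w->autoflush(1);  # every print on $w is written out right away

print $w "shard 0 ready\n" or die "print: $!";

# Without autoflush (or an explicit flush), the data would still sit in
# $w's buffer and this read would block.
my $line = <$r>;
print $line;
```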
2020-02-02	convert: remove unused variables capturing :from
	Looking at git history, they were never used.
2020-02-02	v2writable: do not clobber {shards} or {parallel} if unset
	The $jobs parameter in `public-inbox-convert' is passed to V2Writable->init_inbox as `undef' by default, causing parallelization to be disabled. Instead, leave the underlying {parallel} flag untouched if $shards is undef and do not clobber the default shard count. This allows us to take advantage of multicore systems when running public-inbox-convert with no command-line switches.
2020-02-02	v2writable: nproc_shards: subtract 1 from given value
	This is to be consistent with the `nproc(1)' code path. It also quiets down a warning from Admin when "-j $JOBS" is specified, since the master process (which distributes work to shards and handles OverIdx and Msgmap) is considered a job on its own.
2020-02-02	t/multi-mid.t: extra test for -convert highwater mark
	This is derived from a real-world test case where I encountered multiple Message-IDs in a v1 inbox causing regen problems. Fixes: eea47b676127bcdb ("convert: preserve highwater mark from v1 msgmap")
2020-02-01	doc: more 1.3.0 release notes updates
	Some updates with recent bugfixes and a few wording/formatting improvements.
2020-02-01	config: assume multiple cgit URLs, too
	Since we support inboxes with multiple URLs and multiple infourls to reduce reliance on SPOFs, we'll do the same with cgit URLs.
2020-02-01	solver: join multiple URLs with "||"
	It seems to make sense to the target audience that any of the URLs displayed could work.
2020-02-01	wwwtext: give "url" examples in sample config
	inbox.$NAME.url is a common parameter and set by public-inbox-init(1), so ensure we have lines for it and emphasize it can be multi-value for .onion hidden services or otherwise mirrored and available under multiple URLs.
2020-02-01	wwwtext: show multiple infourl values properly
	This is now an array, so ensure it's shown properly in the sample config, instead of "ARRAY(0xI8BADBEEF)" or similar. Fixes: 1988d730c0088e8b "config: support multi-value inbox..url"
2020-01-31	convert: preserve highwater mark from v1 msgmap
	If we're reusing the msgmap from a v1 inbox, we also need to ensure the highwater mark doesn't get doubled in the v1->v2 conversion by internally triggering the equivalent of "--reindex" on a fresh v2 inbox. This was needed to convert an indexed v1 inbox which featured messages with multiple Message-IDs in it. Fresh, unindexed clones of v1 inboxes would not have been affected by this.
2020-01-31	mboxgz: ensure gzipped mboxes always have filenames
	Let's always have Content-Disposition for files intended to be downloaded for consumption by non-browsers, such as pigz, zcat, "git am". This is also to be consistent with the non-gzipped mbox $MESSAGE_ID/raw endpoint.
2020-01-31	t/psgi_search: test for subject-free messages
	Apparently I fixed this bug a while back in commit f94c3a195a25a31d0215cd175938008fca473378 but did not write tests.
2020-01-28	v2writable: newest epochs go first in alternates
	New epochs are the most likely to have loose objects. git won't be able to take advantage of pack indices and needs to scan every alternate for the loose object via open/openat syscalls. Those syscalls will add up some day when we've got hundreds or thousands of epochs.
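The ordering can be sketched as a descending numeric sort over epoch numbers. The epoch list and the relative path layout below are assumptions for illustration, not lines taken from V2Writable.

```perl
use strict;
use warnings;

# Hypothetical epoch numbers found on disk, in arbitrary order.
my @epochs = (0, 2, 10, 1);

# Newest (highest-numbered) epoch first, so git probes the alternate
# most likely to hold loose objects before any of the older ones.
# The "../../git/N.git/objects" layout is assumed for this sketch.
my @alternates = map { "../../git/$_.git/objects" }
                 sort { $b <=> $a } @epochs;

print "$_\n" for @alternates;
```

A numeric comparison matters here: a plain string sort would place epoch 10 between 1 and 2.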
2020-01-28	INSTALL: fix Linux::Inotify2 package name
	The "2" is important, since "Linux::Inotify" without the "2" is not available from Debian 9/10 or CentOS 7.x and seems unmaintained.
2020-01-28	t/v2reindex.t: 5.10.1 glob compatibility
	I'm not sure when `for (<"quoted string/glob/*">)' became supported, and maybe it was inadvertent, but it fails with Perl 5.10.1. Just use the glob() function to be explicit.
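A minimal sketch of the explicit form, using a temporary directory and made-up file names rather than anything from t/v2reindex.t:

```perl
use strict;
use warnings;
use File::Basename qw(basename);
use File::Temp qw(tempdir);

my $dir = tempdir(CLEANUP => 1);
for my $name (qw(a.txt b.txt)) {
    open(my $fh, '>', "$dir/$name") or die "open: $!";
    close($fh) or die "close: $!";
}

# Calling glob() with an interpolated pattern is explicit and does not
# depend on how a particular Perl release parses <"...">.
my @found = glob("$dir/*.txt");
@found = sort map { basename($_) } @found;

print "$_\n" for @found;
```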
2020-01-28	t/hl_mod: document IO::Handle for autoflush
	We don't need IO::File for this test, but IO::Handle is needed for ->autoflush with Perl <5.14. Note: I haven't tested highlight.pm under 5.10.1 since it's a weird dependency which isn't easy to install w/o distro support.
2020-01-28	avoid relying on IO::Handle/IO::File autoload
	Perl 5.14+ gained the ability to autoload IO::File (and IO::Handle) on missing methods, so relying on this breaks under 5.10.1. There's no reason to load IO::File or IO::Handle when built-in perlops work fine and are even a hair faster.
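For illustration only (this is not code from the commit), the built-in operators do the same work as the method calls without needing IO::Handle or IO::File to be loaded at all:

```perl
use strict;
use warnings;

my $buf = '';
open(my $fh, '>', \$buf) or die "open: $!";  # in-memory handle for the demo

print $fh "hello\n" or die "print: $!";  # instead of $fh->print("hello\n")
close($fh) or die "close: $!";           # instead of $fh->close

print $buf;
```

On Perl 5.10.1 the method forms would fail with a missing-method error unless the module had been loaded explicitly; the operator forms work on every version.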
2020-01-28	daemon: provide TCP_DEFER_ACCEPT for Perl <5.14
	Socket::TCP_DEFER_ACCEPT() did not appear in the Socket module distributed with Perl until 5.14, despite it being available since Linux 2.4.
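One way such a fallback could look, as a sketch rather than the code in the daemon: try the Socket constant first, and fall back to the Linux value (9, from the Linux TCP headers) when the installed Socket module does not provide it. The variable name is an assumption.

```perl
use strict;
use warnings;
use Socket ();

# Older Socket releases lack this constant; eval traps the failure.
my $TCP_DEFER_ACCEPT = eval { Socket::TCP_DEFER_ACCEPT() };

# Fall back to the value Linux has used since 2.4, on Linux only.
$TCP_DEFER_ACCEPT = 9 if !defined($TCP_DEFER_ACCEPT) && $^O eq 'linux';

print 'TCP_DEFER_ACCEPT: ',
      (defined $TCP_DEFER_ACCEPT ? $TCP_DEFER_ACCEPT : 'unavailable'), "\n";
```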
2020-01-27	viewdiff: rewrite and simplify
	Instead of going line-by-line, use split() with a giant regexp to capture groups of contiguous lines. This offloads state management to the regexp itself and makes it FAR easier to keep track of <span> and </span> pairings. Performance seems roughly on par after this change for the meta@public-inbox archives. It seems a tiny bit faster for git@vger with xt/perf-msgview.t, likely due to the longer messages and larger contiguous groups of lines having the same prefix (or no prefix at all), which drastically reduces the number of subroutine calls and Perl ops executed.
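The technique can be sketched with a much smaller pattern than the one ViewDiff uses. The regexp and sample diff below are simplified assumptions: split() with a capturing group returns the separators too, so each run of "+" lines or "-" lines arrives as a single chunk.

```perl
use strict;
use warnings;

my $diff = "+a\n+b\n-c\n d\n e\n";

# One capture group around the alternation: each match (a run of added
# lines or a run of removed lines) is returned as its own field, and
# everything between matches (context lines) is returned as well.
my @chunks = grep { length }
    split(/((?:^\+[^\n]*\n)+|(?:^-[^\n]*\n)+)/m, $diff);

for my $chunk (@chunks) {
    my $kind = $chunk =~ /\A\+/ ? 'add' : $chunk =~ /\A-/ ? 'del' : 'ctx';
    my $nr_lines = ($chunk =~ tr/\n//);
    print "$kind: $nr_lines line(s)\n";
}
```

Each chunk can then be wrapped in one opening and one closing tag, which is what makes the pairings easy to track.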
2020-01-27	viewdiff: use autovivification for long_path hash
	No sense in wasting code to do something the interpreter already does for us.
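As a generic illustration of autovivification (the hash name and paths are made up, not the long_path code itself): dereferencing a missing hash slot as an array creates the array reference automatically, so no explicit initialization is needed.

```perl
use strict;
use warnings;

my %files_by_dir;
for my $path (qw(lib/View.pm lib/ViewDiff.pm t/view.t)) {
    my ($dir, $file) = split(m{/}, $path, 2);
    # No "$files_by_dir{$dir} ||= []" needed: the push creates the
    # array reference on first use.
    push @{$files_by_dir{$dir}}, $file;
}

for my $dir (sort keys %files_by_dir) {
    print "$dir: @{$files_by_dir{$dir}}\n";
}
```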
2020-01-27	viewdiff: add "b=" param when missing "diff --git" line
	<2841d2de-32ad-eae8-6039-9251a40bb00e@tngtech.com> as posted to git@vger contained an otherwise valid diff without a "diff --git" line. Generate a "b=" parameter in that case using the "+++" line instead of the "diff --git" line. SearchIdx.pm no longer uses the "diff --git" line for filename information, either.
2020-01-27	viewdiff: add "b=" param with non-standard diff prefix
	<20180228012207.GB251290@aiede.svl.corp.google.com> (posted to git@vger) uses "i" and "w" prefixes instead of the standard "a" and "b" prefixes, ensure we emit a "b=$FILENAME" param for the solver endpoint to improve search accuracy, syntax highlighting, and information density in the URL itself.
2020-01-27	searchidx: don't assume "a/" and "b/" as prefixes
	Some people use "--{src,dst}-prefix=", try to deal with those since git-apply can handle them when called by solver.
2020-01-27	searchidx: skip filenames on "diff --git ..."
	We already capture filenames on the lines beginning with "---" and "+++", so it's redundant work to capture filenames from "diff --git ..." lines.
2020-01-27	linkify: move to_html over from ViewDiff
	We use the same idiom in many places for doing two-step linkification and HTML escaping. Get rid of an outdated comment in flush_quote while we're at it.
2020-01-27	linkify: compile $LINK_RE once
	This gives a 3-4% performance improvement in xt/perf-msgview.t with a mirror of https://public-inbox.org/meta/
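A sketch of the general pattern, with a deliberately simplified URL regexp that is not the real $LINK_RE: compiling once with qr// at file scope lets every later call reuse the compiled pattern.

```perl
use strict;
use warnings;

# Compiled once when the file is loaded; simplified, hypothetical pattern.
my $LINK_RE = qr{\bhttps?://[^\s<>]+};

sub count_links {
    my ($text) = @_;
    # The empty-list assignment forces list context, so $n is the
    # number of matches.
    my $n = () = $text =~ /$LINK_RE/g;
    return $n;
}

my $n = count_links('see https://public-inbox.org/meta/ and http://example.com/x');
print "$n\n";
```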
2020-01-27	view: inline and eliminate msg_html
	No need to keep the old sub around, anymore. Rename auxiliary subs to "msg_page_*" instead of the "html" version.
2020-01-27	xt/perf-msgview: switch to multipart_text_as_html
	It's a more widely-used (but still internal) API which will probably last longer than msg_html. It also reaches deeper into the stack and avoids the overhead of ->getline via PSGI, so it's faster and gives a more accurate measurement of lower-level parts.
2020-01-27	tests: move the majority of t/view.t into t/plack.t
	And some more into t/mid.t. PublicInbox::View::msg_html may change internally, so let's rely on the stable PSGI interface to test it, rather than a test which reaches deep into the internals.
2020-01-27	init: use Import::run_die instead of system()
	We already load PublicInbox::Import via PublicInbox::InboxWritable, so it's not an extra module to load. This can give us a slight speedup in tests.
2020-01-27	t/plack.t: modernize and unindent
	This test will be expanded, and we can take advantage of run_script to simplify our internal API use.
2020-01-27	view: start performing buffering into {obuf}
	Get rid of the confusingly named {rv} and {tip} fields and unify them into {obuf} for readability. {obuf} usage may be expanded to more areas in the future. This will eventually make it easier for us to experiment with alternative buffering schemes.
2020-01-27	wwwstream: discard single-use $ctx fields after use
	This should make it clear that we only use these elements once and can discard them. While we're in the area, avoid escaping '"' by using qq() instead of "" to quote strings requiring interpolation.
2020-01-27	view: simplify duplicate Message-ID handling
	It's an uncommon code path, no need to make it more complex than it needs to be by having extra sub parameters.
2020-01-27	view: thread_skel: drop constant tpfx parameter
	It hasn't changed in a few years. Now we can rely on constant folding to avoid extraneous ops to the $skel buffer.
2020-01-27	view: reduce parameters for html_footer
	Put more logic into html_footer and less in its only caller so we can control the buffering and string creation.
2020-01-27	searchview: keep $noop sub private to the package
	It'll always be used as a callback, so there's no point in giving it a name to be called non-anonymously. Making assignments to it is slightly faster since there's no need to repeatedly do a lookup by name.
2020-01-27	view: improve readability around walk_thread
	Pass \&coderefs explicitly to walk_thread, and add some prototypes + comments to describe what goes on.
2020-01-27	www: use "skel" terminology consistently
	This saves us a few comments and confusion. Yes, it's a destination so "dst" can be appropriate, but we may be using that term elsewhere.
2020-01-27	wwwstream: favor \&close instead of close
	Be explicit that we're making a code reference, and not a reference to a scalar, array, hash, or IO...
2020-01-27	xapcmd: increase scope of lock
	The old lock scope was only sufficient for protecting against concurrent modifications from the common -mda, -watch, or -learn writers. It was not sufficient for protecting against parallel -compact or -xcpdb invocations from eager admins. Most of the time this only leads to confusing and misleading warning messages, but parallel xcpdb --reshard could lead to errors.
2020-01-27	search: {version} => {ibx_ver}
	This avoids confusing human readers with the Xapian schema version. We also want to make it obvious this is the version of the inbox we're indexing; these are Search or SearchIdx objects, not Inbox objects.
2020-01-27	inbox: add ->version method
	This allows us to simplify version checking by avoiding "//" or "||" operators sprinkled around.
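A sketch of how such an accessor centralizes the default. The package name `Example::Inbox` and the default of 1 are assumptions for illustration; the real method lives in PublicInbox::Inbox and may differ.

```perl
use strict;
use warnings;

package Example::Inbox;  # hypothetical class for this sketch

sub new {
    my ($class, %opt) = @_;
    my $self = { %opt };
    return bless $self, $class;
}

# Callers no longer need "$ibx->{version} // 1" at every use site.
sub version { $_[0]->{version} // 1 }

package main;

my $v1 = Example::Inbox->new;
my $v2 = Example::Inbox->new(version => 2);
print $v1->version, ' ', $v2->version, "\n";
```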
2020-01-27	switch to sysseek + sysread for serving static files
	The "perlio" layer doesn't do read(2) syscalls over 8192 bytes at the moment, and binmode($fh, ':unix') leaks[1]. So use sysseek and sysread for now, since I can't see retaining compatibility with PerlIO::scalar being worth the trouble. [1] http://nntp.perl.org/group/perl.perl5.porters/256918
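A minimal sketch of reading a byte range with unbuffered calls, assuming a hypothetical helper named `read_range`; it is not the code used for serving static files. sysread may return fewer bytes than requested, so the loop continues until the range is filled or EOF is reached.

```perl
use strict;
use warnings;
use Fcntl qw(SEEK_SET);
use File::Temp qw(tempfile);

# Hypothetical helper: read up to $len bytes starting at offset $off.
sub read_range {
    my ($fh, $off, $len) = @_;
    defined(sysseek($fh, $off, SEEK_SET)) or die "sysseek: $!";
    my $buf = '';
    while ($len > 0) {
        # Append at the current end of $buf; short reads are normal.
        my $r = sysread($fh, $buf, $len, length($buf));
        die "sysread: $!" unless defined $r;
        last if $r == 0;  # EOF
        $len -= $r;
    }
    return $buf;
}

my ($tmp, $path) = tempfile(UNLINK => 1);
print $tmp '0123456789' or die "print: $!";
close($tmp) or die "close: $!";

open(my $fh, '<', $path) or die "open: $!";
my $part = read_range($fh, 3, 4);
print "$part\n";
```

Mixing sysread with buffered reads on the same handle is unsafe, so a handle served this way should only ever be read through the unbuffered calls.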
2020-01-25	s/news.gmane.org/news.gmane.io/
	gmane still has a NNTP server, so update links to point to it. cf. https://lars.ingebrigtsen.no/2020/01/06/whatever-happened-to-news-gmane-org/
2020-01-25	wwwstatic: wire up buffer bypass for -httpd
	This prevents public-inbox-httpd from buffering ->getline results from a static file into another temporary file when writing to slow clients. Instead we inject the static file ref with offsets and length directly into the {wbuf} queue. It took me a while to decide to go this route, some rejected ideas: 1. Using Plack::Util::set_io_path and having PublicInbox::HTTP serve the result directly. This is compatible with what some other PSGI servers do using sendfile. However, neither Starman nor Twiggy currently use sendfile for partial responses. 2. Parsing the Content-Range response header for offsets and lengths to use with set_io_path for partial responses. These rejected ideas required increasing the complexity of HTTP response writing in PublicInbox::HTTP in the common, non-static file cases. Instead, we made minor changes to the colder write buffering path of PublicInbox::DS and leave the hot paths untouched. We still support generic PSGI servers via ->getline. However, since we don't know the characteristics of other PSGI servers, we no longer do a 64K initial read in an attempt to negotiate a larger TCP window.