Date | Commit message |
|
These switches have always been there, but were not
documented until now.
|
|
The (currently undocumented) "--no-index" flag did not trigger
the V2Writable->done call necessary to make the import
successful.
Fixes: eea47b676127bcdb ("convert: preserve highwater mark from v1 msgmap")
|
|
Relying on implicit "@_" for shift fails with
TestCommon::_run_sub iff GetOptions modifies @ARGV.
|
|
It reduces the number of ops and simplifies the code, slightly.
Add a missing IO::Handle import while we're at it, to be
explicit about which methods we use.
|
|
Looking at git history, they were never used.
|
|
The $jobs parameter in `public-inbox-convert' is passed to
V2Writable->init_inbox as `undef' by default, causing
parallelization to be disabled.
Instead, leave the underlying {parallel} flag untouched if
$shards is undef and do not clobber the default shard count.
This allows us to take advantage of multicore systems when
running public-inbox-convert with no command-line switches.
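
A minimal sketch of the "leave the default alone when undef" idea, written in Python rather than the project's Perl. The Writer class, DEFAULT_SHARDS value, and the meaning given to a shard count of 0 are assumptions for illustration, not the actual V2Writable code:

```python
DEFAULT_SHARDS = 3  # hypothetical default, not taken from the project


class Writer:
    """Hypothetical stand-in for a writer object with a parallel flag."""

    def __init__(self):
        self.parallel = True
        self.shards = DEFAULT_SHARDS

    def init_inbox(self, shards=None):
        # None (Perl undef) means "caller expressed no preference":
        # keep both the parallel flag and the default shard count.
        if shards is None:
            return self
        if shards == 0:
            # Assumed meaning: an explicit 0 disables parallelization.
            self.parallel = False
        elif shards > 0:
            self.shards = shards
        return self
```

With this shape, a caller which passes nothing keeps parallelization enabled, and only an explicit value overrides the defaults.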
|
|
This is to be consistent with the `nproc(1)' code path. It also
quiets down a warning from Admin when "-j $JOBS" is specified,
since the master process (which distributes work to shards and
handles OverIdx and Msgmap) is considered a job on its own.
|
|
This is derived from a real-world test case where I encountered
multiple Message-IDs in a v1 inbox causing regen problems.
Fixes: eea47b676127bcdb ("convert: preserve highwater mark from v1 msgmap")
|
|
Some updates with recent bugfixes and a few wording/formatting
improvements.
|
|
Since we support inboxes with multiple URLs and multiple
infourls to reduce reliance on SPOFs, we'll do the same with
cgit URLs.
|
|
It seems to make sense to the target audience that any of
the URLs displayed could work.
|
|
inbox.$NAME.url is a common parameter and set by
public-inbox-init(1), so ensure we have lines for it and
emphasize it can be multi-value for .onion hidden services or
otherwise mirrored and available under multiple URLs.
|
|
This is now an array, so ensure it's shown properly in the
sample config, instead of "ARRAY(0xI8BADBEEF)" or similar.
Fixes: 1988d730c0088e8b "config: support multi-value inbox.*.*url"
|
|
If we're reusing the msgmap from a v1 inbox, we also need to
ensure the highwater mark doesn't get doubled in the v1->v2
conversion by internally triggering the equivalent of
"--reindex" on a fresh v2 inbox.
This was needed to convert an indexed v1 inbox which featured
messages with multiple Message-IDs in it. Fresh, unindexed
clones of v1 inboxes would not have been affected by this.
|
|
Let's always have Content-Disposition for files intended
to be downloaded for consumption by non-browsers, such
as pigz, zcat, "git am".
This is also to be consistent with the non-gzipped mbox
$MESSAGE_ID/raw endpoint.
|
|
Apparently I fixed this bug a while back in commit
f94c3a195a25a31d0215cd175938008fca473378 but did
not write tests.
|
|
New epochs are the most likely to have loose objects. git won't
be able to take advantage of pack indices and needs to scan
every alternate for the loose object via open/openat syscalls.
Those syscalls will add up some day when we've got hundreds or
thousands of epochs.
|
|
The "2" is important, since "Linux::Inotify" without the "2"
is not available from Debian 9/10 or CentOS 7.x and seems
unmaintained.
|
|
I'm not sure when `for (<"quoted string/glob/*">)' became
supported, and maybe it was inadvertent, but it fails
with Perl 5.10.1. Just use the glob() function to be
explicit.
|
|
We don't need IO::File for this test, but IO::Handle
is needed for ->autoflush with Perl <5.14.
Note: I haven't tested highlight.pm under 5.10.1 since
it's a weird dependency which isn't easy to install w/o
distro support.
|
|
Perl 5.14+ gained the ability to autoload IO::File
(and IO::Handle) on missing methods, so relying on
this breaks under 5.10.1.
There's no reason to load IO::File or IO::Handle
when built-in perlops work fine and are even a hair
faster.
|
|
Socket::TCP_DEFER_ACCEPT() did not appear in the Socket module
distributed with Perl until 5.14, despite it being available
since Linux 2.4.
|
|
Instead of going line-by-line, use split() with a giant regexp
to capture groups of contiguous lines. This offloads state
management to the regexp itself and makes it FAR easier to
keep track of <span> and </span> pairings.
Performance seems roughly on par after this change for the
meta@public-inbox archives. It seems a tiny bit faster for
git@vger with xt/perf-msgview.t, likely due to the longer
messages and larger contiguous groups of lines having the same
prefix (or no prefix at all), which drastically reduces the
number of subroutine calls and Perl ops executed.
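
The grouping technique can be sketched as follows. This is a Python illustration, not the Perl in the tree; the regexp only recognizes ">" quote prefixes and the function name is hypothetical:

```python
import re

# One capture group matching a run of contiguous lines starting with ">".
# The last line of a run may lack a trailing newline.
QUOTED_RUN = re.compile(r'((?:^>[^\n]*\n?)+)', re.MULTILINE)


def group_lines(body):
    """Return (is_quoted, text) chunks, one per contiguous group of lines."""
    chunks = []
    # With a single capture group, re.split alternates between
    # non-matching text (even indices) and captured runs (odd indices).
    for i, part in enumerate(QUOTED_RUN.split(body)):
        if part:
            chunks.append((i % 2 == 1, part))
    return chunks
```

Each chunk can then be wrapped in one opening and one closing tag, so there is no per-line state to track.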
|
|
No sense in wasting code to do something the interpreter
already does for us.
|
|
<2841d2de-32ad-eae8-6039-9251a40bb00e@tngtech.com> as posted to
git@vger contained an otherwise valid diff without a "diff
--git" line. Generate a "b=" parameter in that case using the
"+++" line instead of the "diff --git" line. SearchIdx.pm no
longer uses the "diff --git" line for filename information,
either.
|
|
<20180228012207.GB251290@aiede.svl.corp.google.com> (posted to
git@vger) uses "i" and "w" prefixes instead of the standard "a"
and "b" prefixes. Ensure we emit a "b=$FILENAME" param for the
solver endpoint to improve search accuracy, syntax highlighting,
and information density in the URL itself.
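
A rough Python sketch of pulling a filename out of a "+++" line regardless of which one-component prefix was used. The function name is hypothetical, and it assumes the prefix is a single path component ("b/", "w/", ...), which does not hold for every possible --dst-prefix value:

```python
def filename_from_plus_line(line):
    """Return the path from a '+++ <prefix>/<path>' line, or None."""
    if not line.startswith('+++ '):
        return None
    # Drop the line ending and any tab-separated timestamp that follows
    # the path in some non-git diffs.
    path = line[4:].rstrip('\n').split('\t', 1)[0]
    if path == '/dev/null':
        return None  # file was deleted, there is no destination name
    # Assumption: the prefix is exactly one path component.
    _prefix, sep, rest = path.partition('/')
    return rest if sep else path
```

The result would then be URL-escaped before being used as the "b=" value.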
|
|
Some people use "--{src,dst}-prefix="; try to deal with those
since git-apply can handle them when called by solver.
|
|
We already capture filenames on the lines beginning
with "---" and "+++", so it's redundant work to capture
filenames from "diff --git ..." lines.
|
|
We use the same idiom in many places for doing two-step
linkification and HTML escaping. Get rid of an outdated
comment in flush_quote while we're at it.
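
For readers unfamiliar with the idiom, here is a Python sketch of two-step linkification around HTML escaping. The URL regexp, the NUL-delimited placeholders, and the function name are assumptions for illustration, not what the Perl code does verbatim:

```python
import html
import re

URL_RE = re.compile(r'https?://[^\s<>"]+')


def linkify_and_escape(text):
    """Escape text for HTML while turning URLs into anchors."""
    links = {}

    def stash(m):
        # Step 1: swap each URL for a placeholder that escaping won't touch.
        # Assumes the input has no NUL bytes of its own.
        key = '\x00%d\x00' % len(links)
        links[key] = m.group(0)
        return key

    tmp = html.escape(URL_RE.sub(stash, text), quote=True)

    def restore(m):
        # Step 2: replace placeholders with anchors built from escaped URLs.
        url = html.escape(links[m.group(0)], quote=True)
        return '<a href="%s">%s</a>' % (url, url)

    return re.sub('\x00\\d+\x00', restore, tmp)
```

Doing it in two steps keeps the anchor markup from being escaped along with the surrounding text.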
|
|
This gives a 3-4% performance improvement in xt/perf-msgview.t
with a mirror of https://public-inbox.org/meta/
|
|
No need to keep the old sub around anymore. Rename auxiliary
subs to "msg_page_*" instead of the "html" version.
|
|
It's a more widely-used (but still internal) API which will
probably last longer than msg_html. It also reaches deeper into
the stack and avoids the overhead of ->getline via PSGI, so it's
faster and gives a more accurate measurement of lower-level parts.
|
|
And some more into t/mid.t. PublicInbox::View::msg_html may
change internally, so let's rely on the stable PSGI interface
to test it, rather than a test which reaches deep into the
internals.
|
|
We already load PublicInbox::Import via
PublicInbox::InboxWritable, so it's not an extra module
to load. This can give us a slight speedup in tests.
|
|
This test will be expanded, and we can take advantage of
run_script to simplify our internal API use.
|
|
Get rid of the confusingly named {rv} and {tip} fields
and unify them into {obuf} for readability.
{obuf} usage may be expanded to more areas in the future. This
will eventually make it easier for us to experiment with
alternative buffering schemes.
|
|
This should make it clear that we only use these elements
once and can discard them. While we're in the area, avoid
escaping '"' by using qq() instead of "" to quote strings
requiring interpolation.
|
|
It's an uncommon code path, no need to make it more complex
than it needs to be by having extra sub parameters.
|
|
It hasn't changed in a few years. Now we can rely on constant
folding to avoid extraneous ops to the $skel buffer.
|
|
Put more logic into html_footer and less in its only caller so
we can control the buffering and string creation.
|
|
It'll always be used as a callback, so there's no point in
giving it a name to be called non-anonymously. Making
assignments to it is slightly faster since there's no need
to repeatedly do a lookup by name.
|
|
Pass \&coderefs explicitly to walk_thread, and add some
prototypes + comments to describe what goes on.
|
|
This saves us a few comments and confusion. Yes, it's a
destination so "dst" can be appropriate, but we may be using
that term elsewhere.
|
|
Be explicit that we're making a code reference, and not
a reference to a scalar, array, hash, or IO...
|
|
The old lock scope was only sufficient for protecting against
concurrent modifications from the common -mda, -watch, or -learn
writers.
It was not sufficient for protecting against parallel -compact
or -xcpdb invocations from eager admins. Most of the time this
only leads to confusing and misleading warning messages, but
parallel xcpdb --reshard could lead to errors.
|
|
We don't confuse human readers with the Xapian schema version.
We also want to make it obvious this is the version of the inbox
we're indexing; these are Search or SearchIdx objects, not Inbox
objects.
|
|
This allows us to simplify version checking by avoiding
"//" or "||" operators sprinkled around.
|
|
The "perlio" layer doesn't do read(2) syscalls over 8192 bytes
at the moment, and binmode($fh, ':unix') leaks[1]. So use
sysseek and sysread for now, since I can't see retaining
compatibility with PerlIO::scalar being worth the trouble.
[1] http://nntp.perl.org/group/perl.perl5.porters/256918
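
The sysseek + sysread pattern maps onto lseek(2) and read(2). A Python sketch of the same idea (not the Perl in this commit; the function name and the 64K default are assumptions):

```python
import os


def read_slice(path, offset, length, bufsize=65536):
    """Read up to `length` bytes at `offset` using unbuffered syscalls."""
    fd = os.open(path, os.O_RDONLY)
    try:
        os.lseek(fd, offset, os.SEEK_SET)
        chunks = []
        while length > 0:
            # os.read issues one read(2) of up to the requested size,
            # bypassing any userspace buffering layer.
            buf = os.read(fd, min(bufsize, length))
            if not buf:
                break  # EOF before `length` bytes were available
            chunks.append(buf)
            length -= len(buf)
        return b''.join(chunks)
    finally:
        os.close(fd)
```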
|
|
gmane still has an NNTP server, so update links to point to it.
cf. https://lars.ingebrigtsen.no/2020/01/06/whatever-happened-to-news-gmane-org/
|
|
This prevents public-inbox-httpd from buffering ->getline
results from a static file into another temporary file when
writing to slow clients. Instead we inject the static file
ref with offsets and length directly into the {wbuf} queue.
It took me a while to decide to go this route. Some
rejected ideas:
1. Using Plack::Util::set_io_path and having PublicInbox::HTTP
serve the result directly. This is compatible with what
some other PSGI servers do using sendfile. However, neither
Starman nor Twiggy currently uses sendfile for partial responses.
2. Parsing the Content-Range response header for offsets and
lengths to use with set_io_path for partial responses.
These rejected ideas required increasing the complexity of HTTP
response writing in PublicInbox::HTTP in the common, non-static
file cases. Instead, we made minor changes to the colder write
buffering path of PublicInbox::DS and leave the hot paths
untouched.
We still support generic PSGI servers via ->getline. However,
since we don't know the characteristics of other PSGI servers,
we no longer do a 64K initial read in an attempt to negotiate a
larger TCP window.
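
For context on the offsets and lengths involved in partial responses, here is a Python sketch which turns a single-range HTTP "Range" request header into an (offset, length) pair for a file of known size. It is illustrative only and is not the code in PublicInbox::HTTP or PublicInbox::DS; the function name is hypothetical and multi-range requests are not handled:

```python
import re


def range_to_offset_len(range_header, size):
    """Map 'bytes=N-M', 'bytes=N-' or 'bytes=-N' to (offset, length)."""
    m = re.fullmatch(r'bytes=(\d*)-(\d*)', range_header.strip())
    if not m:
        return None
    first, last = m.groups()
    if first == '' and last == '':
        return None
    if first == '':
        # Suffix range: the final N bytes of the file.
        n = min(int(last), size)
        if n == 0:
            return None
        return (size - n, n)
    beg = int(first)
    if beg >= size:
        return None  # unsatisfiable, a server would answer 416
    # The last-byte position is inclusive and is clamped to the file size.
    end = min(int(last), size - 1) if last else size - 1
    if end < beg:
        return None
    return (beg, end - beg + 1)
```

An (offset, length) pair like this is the kind of value that could be queued alongside a file reference instead of copying the data into a temporary buffer.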
|