Date | Commit message |
|
Commit 9f5a583694396f84 ("spawn (and thus popen_rd) die on failure")
was incomplete in that it only removed error checking for spawn
failures for non-(vfork|fork) calls, but the actual (vfork|fork)
PID result could still be undef.
Fixes: 9f5a583694396f84 ("spawn (and thus popen_rd) die on failure")
|
|
OpenBSD and FreeBSD support `getconf NPROCESSORS_ONLN` (no
leading underscore). They may also have GNU nproc installed as
"gnproc".
We may also encounter Linux systems w/o GNU coreutils, but able
to use `getconf _NPROCESSORS_ONLN` (with leading underscore).
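
The fallback order described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the sub name `detect_nproc`, the order of the candidates, and the final fallback of 1 are all assumptions.

```perl
use strict;
use warnings;

# Hypothetical helper: try each candidate command and return the first
# positive integer printed, falling back to 1 (an assumed default).
sub detect_nproc {
    my @candidates = (
        'nproc',                      # GNU coreutils
        'gnproc',                     # GNU nproc as installed on some BSDs
        'getconf _NPROCESSORS_ONLN',  # Linux without GNU coreutils
        'getconf NPROCESSORS_ONLN',   # OpenBSD and FreeBSD
    );
    for my $cmd (@candidates) {
        # stderr is discarded so missing commands stay quiet
        my $out = qx($cmd 2>/dev/null);
        next unless defined($out) && $out =~ /\A(\d+)\s*\z/;
        my $n = $1 + 0;
        return $n if $n > 0;
    }
    return 1;
}

my $nproc = detect_nproc();
print "detected $nproc CPU(s)\n";
```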
|
|
It reduces the number of ops and simplifies the code, slightly.
Add a missing IO::Handle import while we're at it, to be
explicit about which methods we use.
|
|
The $jobs parameter in `public-inbox-convert' is passed to
V2Writable->init_inbox as `undef' by default, causing
parallelization to be disabled.
Instead, leave the underlying {parallel} flag untouched if
$shards is undef and do not clobber the default shard count.
This allows us to take advantage of multicore systems when
running public-inbox-convert with no command-line switches.
|
|
This is to be consistent with the `nproc(1)' code path. It also
quiets down a warning from Admin when "-j $JOBS" is specified,
since the master process (which distributes work to shards and
handles OverIdx and Msgmap) is considered a job on its own.
|
|
Since we support inboxes with multiple URLs and multiple
infourls to reduce reliance on SPOFs, we'll do the same with
cgit URLs.
|
|
It seems to make sense to the target audience that any of
the URLs displayed could work.
|
|
inbox.$NAME.url is a common parameter and set by
public-inbox-init(1), so ensure we have lines for it and
emphasize it can be multi-value for .onion hidden services or
otherwise mirrored and available under multiple URLs.
|
|
This is now an array, so ensure it's shown properly in the
sample config, instead of "ARRAY(0xI8BADBEEF)" or similar.
Fixes: 1988d730c0088e8b "config: support multi-value inbox.*.*url"
|
|
Let's always have Content-Disposition for files intended
to be downloaded for consumption by non-browsers, such
as pigz, zcat, "git am".
This is also to be consistent with the non-gzipped mbox
$MESSAGE_ID/raw endpoint.
|
|
New epochs are the most likely to have loose objects. git won't
be able to take advantage of pack indices and needs to scan
every alternate for the loose object via open/openat syscalls.
Those syscalls will add up some day when we've got hundreds or
thousands of epochs.
|
|
Perl 5.14+ gained the ability to autoload IO::File
(and IO::Handle) on missing methods, so relying on
this breaks under 5.10.1.
There's no reason to load IO::File or IO::Handle
when built-in perlops work fine and are even a hair
faster.
|
|
Socket::TCP_DEFER_ACCEPT() did not appear in the Socket module
distributed with Perl until 5.14, despite it being available
since Linux 2.4.
|
|
Instead of going line-by-line, use split() with a giant regexp
to capture groups of contiguous lines. This offloads state
management to the regexp itself and makes it FAR easier to
keep track of <span> and </span> pairings.
Performance seems roughly on par after this change for the
meta@public-inbox archives. It seems a tiny bit faster for
git@vger with xt/perf-msgview.t, likely because the longer
messages there have larger contiguous groups of lines sharing the
same prefix (or no prefix at all), which drastically reduces the
number of subroutine calls and Perl ops executed.
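
As a rough illustration of the technique (not the regexp actually used in the commit), split() with a capturing group returns the captured separators alongside the text between them, so contiguous quoted lines come back as a single element. The sub name `quote_groups` and the simplified ">" prefix pattern are assumptions, and every line is assumed to end with a newline.

```perl
use strict;
use warnings;

# Hypothetical helper: break a message body into groups of contiguous
# quoted (">"-prefixed) lines and groups of everything else.
sub quote_groups {
    my ($body) = @_;
    # The capture group makes split() return the quoted runs themselves;
    # grep drops the empty leading field produced when the body starts
    # with a quoted line.
    return grep { length($_) } split(/^((?:>[^\n]*\n)+)/m, $body);
}

my @groups = quote_groups("hello\n> a\n> b\nworld\n");
printf("%d groups\n", scalar(@groups));
```

Each group can then be wrapped in one <span>...</span> pair, instead of tracking open and close tags line by line.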
|
|
No sense in wasting code to do something the interpreter
already does for us.
|
|
<2841d2de-32ad-eae8-6039-9251a40bb00e@tngtech.com> as posted to
git@vger contained an otherwise valid diff without a "diff
--git" line. Generate a "b=" parameter in that case using the
"+++" line instead of the "diff --git" line. SearchIdx.pm no
longer uses the "diff --git" line for filename information,
either.
|
|
<20180228012207.GB251290@aiede.svl.corp.google.com> (posted to
git@vger) uses "i" and "w" prefixes instead of the standard "a"
and "b" prefixes. Ensure we emit a "b=$FILENAME" param for the
solver endpoint to improve search accuracy, syntax highlighting,
and information density in the URL itself.
|
|
Some people use "--{src,dst}-prefix="; try to deal with those
since git-apply can handle them when called by solver.
|
|
We already capture filenames on the lines beginning
with "---" and "+++", so it's redundant work to capture
filenames from "diff --git ..." lines.
|
|
We use the same idiom in many places for doing two-step
linkification and HTML escaping. Get rid of an outdated
comment in flush_quote while we're at it.
|
|
This gives a 3-4% performance improvement in xt/perf-msgview.t
with a mirror of https://public-inbox.org/meta/
|
|
No need to keep the old sub around, anymore. Rename auxiliary
subs to "msg_page_*" instead of the "html" version.
|
|
Get rid of the confusingly named {rv} and {tip} fields
and unify them into {obuf} for readability.
{obuf} usage may be expanded to more areas in the future. This
will eventually make it easier for us to experiment with
alternative buffering schemes.
|
|
This should make it clear that we only use these elements
once and can discard them. While we're in the area, avoid
escaping '"' by using qq() instead of "" to quote strings
requiring interpolation.
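
A small example of the quoting change, with made-up variable names and markup: both forms interpolate identically, but qq() needs no backslashes around the attribute value.

```perl
use strict;
use warnings;

my ($href, $title) = ('a.html', 'first');

# Double quotes force us to escape every '"' inside the string...
my $escaped = "<a href=\"$href\">$title</a>";

# ...while qq() interpolates the same way without the escapes.
my $plain = qq(<a href="$href">$title</a>);

print "identical\n" if $escaped eq $plain;
```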
|
|
It's an uncommon code path, no need to make it more complex
than it needs to be by having extra sub parameters.
|
|
It hasn't changed in a few years. Now we can rely on constant
folding to avoid extraneous ops to the $skel buffer.
|
|
Put more logic into html_footer and less in its only caller so
we can control the buffering and string creation.
|
|
It'll always be used as a callback, so there's no point in
giving it a name to be called non-anonymously. Making
assignments to it is slightly faster since there's no need
to repeatedly do a lookup by name.
|
|
Pass \&coderefs explicitly to walk_thread, and add some
prototypes + comments to describe what goes on.
|
|
This saves us a few comments and confusion. Yes, it's a
destination so "dst" can be appropriate, but we may be using
that term elsewhere.
|
|
Be explicit that we're making a code reference, and not
a reference to a scalar, array, hash, or IO...
|
|
The old lock scope was only sufficient for protecting against
concurrent modifications from the common -mda, -watch, or -learn
writers.
It was not sufficient for protecting against parallel -compact
or -xcpdb invocations from eager admins. Most of the time this
only leads to confusing and misleading warning messages, but
parallel xcpdb --reshard could lead to errors.
|
|
We don't confuse human readers with the Xapian schema version.
We also want to make it obvious this is the version of the inbox
we're indexing; these are Search or SearchIdx objects, not Inbox
objects.
|
|
This allows us to simplify version checking by avoiding
"//" or "||" operators sprinkled around.
|
|
The "perlio" layer doesn't do read(2) syscalls over 8192 bytes
at the moment, and binmode($fh, ':unix') leaks[1]. So use
sysseek and sysread for now, since I can't see retaining
compatibility with PerlIO::scalar being worth the trouble.
[1] http://nntp.perl.org/group/perl.perl5.porters/256918
|
|
gmane still has an NNTP server, so update links to point to it.
cf. https://lars.ingebrigtsen.no/2020/01/06/whatever-happened-to-news-gmane-org/
|
|
This prevents public-inbox-httpd from buffering ->getline
results from a static file into another temporary file when
writing to slow clients. Instead we inject the static file
ref with offsets and length directly into the {wbuf} queue.
It took me a while to decide to go this route, some
rejected ideas:
1. Using Plack::Util::set_io_path and having PublicInbox::HTTP
serve the result directly. This is compatible with what
some other PSGI servers do using sendfile. However, neither
Starman nor Twiggy currently uses sendfile for partial responses.
2. Parsing the Content-Range response header for offsets and
lengths to use with set_io_path for partial responses.
These rejected ideas required increasing the complexity of HTTP
response writing in PublicInbox::HTTP in the common, non-static
file cases. Instead, we made minor changes to the colder write
buffering path of PublicInbox::DS and leave the hot paths
untouched.
We still support generic PSGI servers via ->getline. However,
since we don't know the characteristics of other PSGI servers,
we no longer do a 64K initial read in an attempt to negotiate a
larger TCP window.
|
|
We want to be able to inject existing file handles + offsets and
even lengths into this in the future, without going through the
->getline interface[1].
We also switch to using a 64K buffer size since we can safely
discard whatever got truncated on write and full writes can help
negotiate a larger TCP window for high-latency, high-bandwidth
links.
While we're at it, make it obvious that we're using O_APPEND for
our tmpfile() interface so we can seek freely for reading while
the writer always prints to the end of the file.
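
The O_APPEND behavior can be sketched as below. This is only an illustration, not the project's tmpfile() implementation: the sub name `append_tmpfile`, the file name, and the use of File::Temp for the directory are assumptions.

```perl
use strict;
use warnings;
use Fcntl qw(O_RDWR O_CREAT O_EXCL O_APPEND SEEK_SET);
use File::Temp qw(tempdir);

# Hypothetical helper: create an anonymous temporary file opened with
# O_APPEND, so every write lands at the end regardless of seek position.
sub append_tmpfile {
    my $dir = tempdir(CLEANUP => 1);
    my $path = "$dir/wbuf";
    sysopen(my $fh, $path, O_RDWR|O_CREAT|O_EXCL|O_APPEND, 0600)
        or die "sysopen: $!";
    unlink($path) or die "unlink: $!";
    return $fh;
}

my $fh = append_tmpfile();
defined(syswrite($fh, 'abc')) or die "syswrite: $!";

# Seek back to the start for reading; the next write still appends.
defined(sysseek($fh, 0, SEEK_SET)) or die "sysseek: $!";
defined(syswrite($fh, 'def')) or die "syswrite: $!";

defined(sysseek($fh, 0, SEEK_SET)) or die "sysseek: $!";
defined(sysread($fh, my $all, 6)) or die "sysread: $!";
print "$all\n";
```

Without O_APPEND, the second write would have overwritten the first three bytes after the seek.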
[1] the getline interface for serving static files may result
in us buffering on-FS data into another temporary file,
which is a waste.
|
|
The PSGI server needs to account for ->getline failing
due to disk failures or truncated files, anyways. So
just die() ourselves and let the PSGI server log and
drop the client.
|
|
While there is no known actual leak due to reference cycles,
here, eliminating a potential source of leaks is helpful.
|
|
While both can be correct, the former seems more common,
is shorter, and is also consistent with the spelling found
in the AGPL-3.0 text.
|
|
We can't pass empty strings to `to_filename' without
triggering warnings, and `to_filename' on an empty string
makes no sense.
|
|
OverIdx::parse_references already skips duplicate
References (which we use in SearchThread for rendering).
So there's no reason for our content deduplication logic
to care if a Message-Id in the Reference header is mentioned
twice.
|
|
Another place where List::Util::uniq doesn't make sense,
but there's a small op reduction to be had anyways.
|
|
We won't be able to use List::Util::uniq here, but we can still
shorten our logic and make it more consistent with the rest of
our code which does similar things.
|
|
And add a note to remind ourselves to use List::Util::uniq
when it becomes common.
|
|
We can cut down on the number of operations required
using "grep" instead of "foreach".
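
For illustration, the usual grep-based way to drop duplicates while preserving order looks like this. The sub name `uniq_in_order` and the sample Message-IDs are made up; this is not the code touched by the commit.

```perl
use strict;
use warnings;

# Hypothetical helper: keep the first occurrence of each element.
# grep replaces an explicit foreach loop with push.
sub uniq_in_order {
    my %seen;
    return grep { !$seen{$_}++ } @_;
}

my @mids = uniq_in_order(qw(a@example b@example a@example));
print join(' ', @mids), "\n";
```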
|
|
This use of map {} is a common idiom as we no longer consider
the Message-ID as part of the digest.
|
|
We don't call from_attr anywhere outside of tests, so don't
bloat normal processes with it.
|
|
We need to escape wide characters when making attribute names from
filename-looking things in diffstats.
|