Date | Commit message (Collapse) |
|
Wrap the entire solver command chain with a dedicated limiter.
The normal limiter is designed for longer-lived commands or ones
which serve a single HTTP request (e.g. git-http-backend or
cgit) and not effective for short memory + CPU intensive commands
used for solver.
Each overall solver request is both memory + CPU intensive: it
spawns several short-lived git processes(*) in addition to a
longer-lived `git cat-file --batch' process.
Thus running parallel solvers from a single -netd/-httpd worker
(which have their own parallelization) results in excessive
parallelism that is both memory and CPU-bound (not network-bound)
and cascade into slowdowns for handling simpler memory/CPU-bound
requests. Parallel solvers were also responsible for the
increased lifetime and frequency of zombies since the event loop
was too saturated to reap them.
We'll also return 503 on excessive solver queueing, since these
require an FD for the client HTTP(S) socket to be held onto.
(*) git (update-index|apply|ls-files) are all run by solver and
short-lived
|
|
This is a major step in solving the problem of having to
manually associate hundreds/thousands of coderepos with
hundreds/thousands of public-inboxes to power solver
(and more).
|
|
We don't want hundreds of git cat-file processes for coderepos
lingering around.
|
|
This is pretty convenient way to create files for diff
generation in both WWW and lei. The test suite should also be
able to take advantage of it.
|
|
Callers that want to requeue can call PublicInbox::DS::requeue
directly and not go through the convoluted argument handling
via PublicInbox::HTTPD::Async->new.
|
|
This will make it easier to switch in the far future while
making callers easier-to-read (and more callers will be added).
Anyways, Perl 5.26 is a long time away for enterprise users;
but isolating compatibility code away can improve readability
of code we actually care about in the meantime.
|
|
Too many patches don't apply (due to coderepos being a PITA to
associate) and interested admins can check for 404s to diagnose
them, anyways. This reduces the noise in syslog/stderr for
public-facing daemons.
|
|
It was a harmless negation, I must've pasted a line from a diff
and forgotten to chop off the first character :x
Fixes: 6f5b238bae5c "solver: early make hints detection more robust"
|
|
This happens quite often on my systems due to scrapers,
unfortunately.
|
|
Just reusing ViewVCS::show, since encoding refname and pathnames
into things just makes things slower.
|
|
Messages in /all/ can get duplicated at times due to
list-appended signatures or buggy/malicious clients.
They'll all show up based on /$INBOX/$MSGID/,
so deduplicate the URLs to avoid noise.
|
|
We're considering duplicate patches from cross-posted lists
identical, so don't double-count them when displaying the
"applying [X/Y]" message since (successful) duplicates get
skipped.
|
|
At least enough to get 66 patches applied to handle
/lore/all/34d644a519c/s/?b=target/arm/helper-mve.h
properly.
I noticed this bug due to a:
E: BUG: extra files in index: <100644 e91f526a1a83edb2b56798388a355b1c3729b4bd 0#011target/arm/translate-mve.c>
line in my syslog
However, the $TOTAL in "applying [X/$TOTAL]" in the debug log
is seems off...
|
|
Hints fields can change, so we'll use a simple boolean rather
than checking a static count. We'll also short-circuit out
reliably regardless of hints when a full OID is given.
|
|
"lei blob" doesn't currently need it at all in some cases, and
the next commit will allow viewvcs to share tmpdirs to show
commits as HTML.
|
|
First off, avoid potential circular references (via {qx_arg}) by
dropping the {-qsp} field from $ctx and SolverGit objects.
Instead, we only share a reference to an optional error buffer
string {qsp_err}.
We'll also attempt to call qspawn.wcb if qx_cb fails, and warn
in more places w/o checking for $env since we now rely on warn()
instead of $env->{'psgi.errors'}.
This makes error handling simpler and safer in future callers.
|
|
git deprecated core.fsyncObjectFiles in favor of core.fsync
with 2.36.0+, while GIT_TEST_FSYNC was added in 2.35.0. So
use the environment variable since it's been supported slightly
longer than the new configuration knob.
|
|
Tested manually on a newish project I'm working on.
|
|
Some of these scalar buffers may be large patches, so try
to keep them as short-lived as possible to reduce memory
pressure.
|
|
While both git and libgit2 take around 16 minutes to load 100K
alternates there's already a proposed patch to make git faster:
<https://lore.kernel.org/git/20210624005806.12079-1-e@80x24.org/>
It's also easier to patch and install git locally since the
git.git build system defaults to prefix=$HOME and dealing with
dynamic linking with libgit2 is more difficult for end users
relying on Inline::C.
libgit2 remains in use for the non-ALL.git case, but maybe it's
not necessary (libgit2 is significantly slower than git in
Debian 10 due to SHA-1 collision checking).
|
|
File::Temp only requires four 'X' characters (unlike mkstemp(3),
which requires six). So only so only give it 4 to avoid an
80-column violation and maybe save metadata space on FSes.
|
|
Using "make update-copyrights" after setting GNULIB_PATH in my
config.mak
|
|
Using "eidx_key:" boolean prefix to limit results to a given
inbox, we can use ->ALL to emulate and replace per-Inbox
xap15/[0-9] search indices.
With this change, the presence of "extindex.all.topdir" in the
$PI_CONFIG will cause the WWW code to use that extindex and
ignore per-inbox Xapian DBs in xap15/[0-9].
Unfortunately IMAP search still requires old per-inbox indices,
for now. Mapping extindex Xapian docids to per-Inbox UIDs and
vice-versa is proving tricky. Fortunately, IMAP search is
rarely used and optional. The RFCs don't specify expensive
phrase search, either, so `indexlevel=medium' can be used in
per-inbox Xapian indices to save space.
For primarily WWW (and future JMAP) users; this should result in
significant disk space, FD, and page cache footprint savings for
large instances with many inboxes and many cross-posted
messages.
|
|
While Perl implements tail recursion via `goto' which allows
avoiding warnings on deep recursion. It doesn't (as of 5.28)
optimize the speed of such dispatches, though it may reduce
ephemeral memory usage.
Make the code less alien to hackers coming from other languages
by using normal subroutine dispatch. It's actually slightly
faster in micro benchmarks due to the complexity of `goto &NAME'.
|
|
Like the rest of the WWW code, public-inbox-httpd now uses
git_async_cat to retrieve blobs without blocking the event loop.
This improves fairness when git blobs are on slow storage and
allows us to take better advantage of SMP systems.
|
|
To avoid hogging the event loop in public-inbox-httpd when
many candidate messages match, we'll separate the steps to
ensure fairness on slow storage.
|
|
With public-inbox-httpd, this mitigates the effect of slow git
blob storage with multiple coderepos configured for an inbox.
It's still synchronous for now (and may need to remain that way
for ->last_check_err), but no longer monopolizes the event loop
when checking multiple coderepos.
We don't yet support multi-inbox scanning, yet; but this also
prepares us for a future where we do.
We'll also support >=40 char blob OIDs in preparation for future
git SHA-256 support, too.
|
|
With Perl upstream preparing to deprecate things, we'll move
towards only enabling warnings during development via shebang
and stop enabling them via "use".
We'll also favor "use v5.10.1" over the Perl 5.6-compatible "use
5.010_001", since our code base never worked on 5.6.
Finally, were also importing SEEK_SET without using it, just use it
for readability since we can't avoid loading Fcntl in other
places and it'll get constant-folded, anyways.
|
|
Nearly all of the search uses in the production code rely on
a Xapian mset iterator being returned (instead of an array
of $smsg objects). So default to returning the mset and move
the burden of smsg array conversion into the test cases.
|
|
The resulting OID ("oid_b") is a required arg and part of
$env->{PATH_INFO}, instead; so it's never part of an optional
query parameter.
|
|
The goal of this is to eventually remove the $smsg->{mime} field
which is easy-to-misuse and cause memory explosions which
necessitated fixes like commit 7d02b9e64455831d
("view: stop storing all MIME objects on large threads").
|
|
The reliance on Email::MIME->subparts is a tad inefficient with
a work-in-progress module to replace Email::MIME. So move
towards using ->each_part as a class-specific iterator which can
take advantage of more class-specific optimizations in the
yet-to-be-revealed PublicInbox::Eml and PublicInbox::Gmime
classes.
The msg_iter() sub remains for compatibility with existing
3rd-party scripts/modules which use our small public Perl API
and Email::MIME.
|
|
Since the introduction of over.sqlite3, SearchMsg is not tied to
our search functionality in any way, so stop confusing ourselves
and future hackers by just calling it "PublicInbox::Smsg".
Add a missing "use" in ExtMsg while we're at it.
|
|
We want WWW->preload to get as many immortal allocations done
as possible, and the `state' feature from Perl 5.10 prevents that.
|
|
I didn't wait until September to do it, this year!
|
|
It seems to make sense to the target audience that any of
the URLs displayed could work.
|
|
While both can be correct, the former seems more common,
is shorter, and is also consistent with the spelling found
in the AGPL-3.0 text.
|
|
This avoids uninitialized variable warnings when viewing
newly-created files.
|
|
We're often iterating through messages while writing to another
buffer in our WWW interface, causing memory usage to multiply.
Since we know we won't need to keep the MIME object around in
some cases, and can tell msg_iter to clobber the on-stack
variable while it operates on subparts of multipart messages.
With xt/mem-msgview.t switched to multipart from the previous
commit, this shows a 13 MB memory reduction on that test.
|
|
There's a bunch of leftover "require" and "use" statements we no
longer need and can get rid of, along with some excessive
imports via "use".
IO::Handle usage isn't always obvious, so add comments
describing why a package loads it. Along the same lines,
document the tmpdir support as the reason we depend on
File::Temp 0.19, even though every Perl 5.10.1+ user has it.
While we're at it, favor "use" over "require", since it it gives
us extra compile-time checking.
|
|
While filenames are escaped, the actual diff contents may
contain an unescaped "\r" carriage return byte not in front
of the "\n" line feed. So just allow "\r" to appear in the
middle of a line.
|
|
Initialize the $di hashref at use to make it more obvious it's
a local variable. We can also use the :utf8 IO layer via
open+print to save ourselves the trouble of converting the UTF-8
patch to an octet stream.
|
|
This is needed to work with patches with many renames,
such as what makes "git/eebf7a8/s/?b=t%2Ftest-lib.sh"
|
|
solver can spawn multiple processes per HTTP request, but
"git apply" failures are needlessly noisy due to corrupt
patches. We also don't want to silence "git ls-files"
or "git update-index" errors using $env->{'qspawn.quiet'},
either, so this granularity is needed.
Admins can check for 500 errors in access logs to detect
(and reproduce) solver failures, anyways, so there's no
need to log every time "git apply" rejects a corrupt patch.
|
|
Rewrite the patch extraction loop using a single regexp which
accounts for missing "diff --git ..." lines and is capable of
extracting pathnames off the "+++ b/foo" line.
This fixes the solving of blob "96f1c7f" off
<2841d2de-32ad-eae8-6039-9251a40bb00e@tngtech.com>
in git@vger archives.
v2:
* Fix regressions in git@vger archives:
- git/776fa90f7f/s/?b=contrib/git-jump/git-jump
(fallback to "old mode" properly)
- git/5cd8845/s/?b=submodule.c
(no leading space in context)
* use "state" in a Perl <5.28.0-compatible way
|
|
Sometimes a patch is corrupted and resent to create the same
OID. We need to account for that case and actually move onto
the next patch instead of blindly trying "git ls-files" to get
nothing out of it.
|
|
This simplifies our admin module a bit and allows solver to be
used with v1 inboxes using git versions prior to v1.8.5 (but
still >= git v1.8.0).
|
|
We can save callers the trouble of {-hold} and {-dev_null}
refs as well as the trouble of calling fileno().
|
|
This allows us to get rid of the requirement to capture
on-stack variables with an anonymous sub, as illustrated
with the update to viewvcs to take advantage of this.
v2: fix error handling for missing OIDs
|
|
And remove the last anonymous sub in SolverGit itself.
|