Date | Commit message |
|
Write barriers can take a long time to finish, especially when
commands are issued in parallel. So handle them asynchronously
without blocking lei-daemon, making EOFpipe a little more
flexible by supporting arguments to the callback function.
This is another step towards improving parallel use of lei.
|
|
Schedule a timer to stop shard workers and the git-cat-file
process after a `barrier' command. This allows us to save some
memory again when the lei-daemon is idle but preserves the fork
overhead reduction when issuing many commands in parallel or in
quick succession.
|
|
barrier (synchronous checkpoint) is better than ->done with
parallel lei commands being issued (via '&' or different
terminals), since repeatedly stopping and restarting processes
doesn't play nicely with expensive tasks like `lei reindex'.
This introduces a slight regression in maintaining more
processes (and thus resource use) when lei is idle, but that'll
be fixed in the next commit.
|
|
Since data going to git is the most important, always ensure
data is written to git before attempting to write anything to
SQLite or Xapian.
|
|
Noticed while working on other things...
Fixes: 299aac294ec3 (lei: do label/keyword parsing in optparse, 2023-10-02)
|
|
We shouldn't attempt to reap a process again after it's been
reaped asynchronously in the SIGCHLD handler. Noticed while
working on changes to get lei/store to use checkpointing.
|
|
This should improve `lei blob' and `lei rediff' functionality
for folks relying on `lei index' and allows future work to
improve parallelism via checkpointing in lei/store.
|
|
This adds support for the "POST /$INBOX/$MSGID/?x=m&q=..." endpoint
added last year to support per-thread searches:
764035c83 (www: support POST /$INBOX/$MSGID/?x=m&q=, 2023-03-30)
This only works against instances of public-inbox since 764035c83,
but unfortunately there hasn't been a release since then.
|
|
Noticed while trying to make other reliability improvements to
lei...
|
|
By reducing internal event loop iterations, this brings 300+
inboxes down from ~32ms to ~27ms. It should also be more consistent
on servers with busy event loops since all the Xapian DB traffic
happens at once, theoretically improving cache utilization.
|
|
This fixes compile errors on platforms we can't explicitly
support from pure Perl due to the lack of syscall stability
guarantees by the OS developers.
Reported-by: Gaelan Steele <gbs@canishe.com>
Tested-by: Gaelan Steele <gbs@canishe.com>
|
|
There are still some places where on_destroy isn't suitable.
This gets rid of getpid() calls in most of those cases to
reduce syscall costs and clean up syscall trace output.
|
|
getpid() isn't cached by glibc nowadays and system calls are
more expensive due to CPU vulnerability mitigations. To
ensure we switch to the new semantics properly, introduce
a new `on_destroy' function to simplify callers.
Furthermore, OnDestroy correctness is often tied to the
process which creates it, so make the new API default to
being guarded against running in subprocesses.
For cases which require running in all children, a new
PublicInbox::OnDestroy::all call is provided.
|
|
PID guards for OnDestroy will be the default in an upcoming
change. In the meantime, LeiMirror was the only user and
didn't actually need it.
|
|
While PublicInbox::Config is responsible for some instances of
setting $git->{nick}, more PublicInbox::Git objects may be
created from loading the cindex and we should do our best to
reuse that memory, too.
Followup-to: 84ed7ec1c887 (dedupe inbox names, coderepo nicks + git dirs, 2024-03-04)
|
|
With my current mirror of lore + gko, this saves over 300K
allocations and brings the allocation count in this area down
to under 5K. The reduction in AV refs saves around 45MB RAM
according to measurements done live via Devel::Mwrap.
|
|
Wrap the entire solver command chain with a dedicated limiter.
The normal limiter is designed for longer-lived commands or ones
which serve a single HTTP request (e.g. git-http-backend or
cgit) and is not effective for the short memory + CPU intensive
commands used by solver.
Each overall solver request is both memory + CPU intensive: it
spawns several short-lived git processes(*) in addition to a
longer-lived `git cat-file --batch' process.
Thus running parallel solvers from a single -netd/-httpd worker
(which have their own parallelization) results in excessive
parallelism that is both memory and CPU-bound (not network-bound)
and cascades into slowdowns for handling simpler memory/CPU-bound
requests. Parallel solvers were also responsible for the
increased lifetime and frequency of zombies since the event loop
was too saturated to reap them.
We'll also return 503 on excessive solver queueing, since these
require an FD for the client HTTP(S) socket to be held onto.
(*) git (update-index|apply|ls-files) are all run by solver and
short-lived
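A minimal sketch of the limiter idea, in Python rather than the project's Perl. The class name, method names and limits are hypothetical; it only illustrates capping concurrency, queueing the overflow, and answering 503 once the queue itself is full.

```python
from collections import deque


class SolverLimiter:
    """Hypothetical limiter: at most max_running jobs, at most max_queued waiting."""

    def __init__(self, max_running=1, max_queued=2):
        self.max_running = max_running
        self.max_queued = max_queued
        self.running = 0
        self.queue = deque()

    def submit(self, start):
        """start is a zero-argument callable that begins one solver request.

        Returns "started", "queued", or 503 when too many requests are waiting.
        """
        if self.running < self.max_running:
            self.running += 1
            start()
            return "started"
        if len(self.queue) >= self.max_queued:
            return 503  # refuse instead of holding yet another client socket
        self.queue.append(start)
        return "queued"

    def finish(self):
        """Call when a running request completes; starts the next queued one."""
        self.running -= 1
        if self.queue:
            self.running += 1
            self.queue.popleft()()
```

Rejecting early keeps the number of client sockets held open bounded by max_running + max_queued.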
|
|
Fortunately, this only affects `--multi-accept=' users, with
`--multi-accept=-1' users getting infinite loops.
I noticed this when EMFILE was reached on my setup, but any
error should cause us to give up accept(2) (at least
temporarily) and allow work for other items in the event loop to
be processed.
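The intended loop behavior can be sketched as follows (Python, not the actual Perl code). `accept_batch`, `make_fake_accept` and treating -1 as "no fixed limit" are assumptions made for illustration only.

```python
import errno


def accept_batch(accept_fn, multi_accept):
    """Accept up to multi_accept connections; -1 means no fixed limit.

    accept_fn is a hypothetical stand-in for a non-blocking accept(2) wrapper.
    Any OSError (EAGAIN, EMFILE, ...) ends the batch so the caller can return
    to its event loop instead of retrying forever.
    """
    accepted = []
    while multi_accept < 0 or len(accepted) < multi_accept:
        try:
            accepted.append(accept_fn())
        except OSError:
            break  # give up for now; the event loop will call us again later
    return accepted


def make_fake_accept(ok_count):
    """Build a fake accept_fn which fails with EMFILE after ok_count successes."""
    state = {"n": 0}

    def fake_accept():
        if state["n"] >= ok_count:
            raise OSError(errno.EMFILE, "Too many open files")
        state["n"] += 1
        return "conn%d" % state["n"]

    return fake_accept
```

Breaking out on every error (not just EAGAIN) is what prevents the unlimited case from spinning.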
|
|
We must chomp the newline in the branch name if it's set.
Reported-by: Rob Herring <robh@kernel.org>
Link: https://public-inbox.org/meta/CAL_JsqK7P4gjLPyvzxNEcYmxT4j6Ah5f3Pz1RqDHxmysTg3aEg@mail.gmail.com/
Fixes: 73830410e4336b77 (treewide: use run_qx where appropriate, 2023-10-27)
|
|
This allows accurate reporting of the error location and can be
made to dump a Perl backtrace via PERL5OPT='-MCarp=verbose'.
Noticed while tracking down fast-import failures.
Link: https://public-inbox.org/meta/CAL_JsqK7P4gjLPyvzxNEcYmxT4j6Ah5f3Pz1RqDHxmysTg3aEg@mail.gmail.com/
|
|
Noticed while tracking down fast-import crash bug report.
Link: https://public-inbox.org/meta/CAL_JsqK7P4gjLPyvzxNEcYmxT4j6Ah5f3Pz1RqDHxmysTg3aEg@mail.gmail.com/
|
|
Inbox names, coderepo nicks, git_dir values are used heavily
as hash keys by the read-only coderepo WWW pieces.
Relying on CoW for mutable scalars on newer Perl doesn't work
well since CoW for those scalars is limited to 256 CoW references
and we blow past that number when mapping thousands of coderepos
and inboxes to each other. Instead, make the hash key up-front
and get the resulting string to point directly to the pointer
used by the hash key.
|
|
It's not really relevant at the moment, but a sufficiently
smart implementation could eventually save some memory here.
Perl already optimizes in-place sort (@x = sort @x), so there's
precedent for a potential future where a Perl implementation
could generally optimize in-place operations for non-builtin
subroutines, too.
|
|
Repeatedly allocating an anonymous sub is an expensive operation
and a potential source of leaks in older Perl. Instead,
`local'-ize a global and use a permanent sub to work around the
old Encode 2.87..3.12 leak.
|
|
We are just using the odd ref+deref (`${\...}') syntax and
don't need to calculate line numbers ourselves, nowadays.
|
|
While fast build times from -O0 are critical to my sanity when
actively working on C++, the files installed via package
managers or `make install' aren't likely to change frequently.
In that case, expensive -O2 optimizations make sense since the
10-20s saved from a single large --join more than covers the
cost of waiting on g++ to optimize.
|
|
If publicinbox.cgitrc is set in the config file, we'll ensure
cgit sees it as CGIT_CONFIG since the configured
publicinbox.cgitrc knob may not be the default path the cgit.cgi
binary was configured to use.
Furthermore, we'll respect CGIT_CONFIG in the environment if
publicinbox.cgitrc is unset in the config file at -httpd/-netd
startup.
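The precedence described above could look roughly like this (a Python sketch; `cgit_env` and its parameters are hypothetical names, not the project's API):

```python
import os


def cgit_env(cgitrc, base_env=None):
    """Return the environment to spawn cgit with.

    cgitrc is the publicinbox.cgitrc value from the config file, or None if
    unset. When set, it overrides any inherited CGIT_CONFIG; when unset, an
    inherited CGIT_CONFIG (if any) is passed through untouched.
    """
    env = dict(os.environ if base_env is None else base_env)
    if cgitrc is not None:
        env["CGIT_CONFIG"] = cgitrc
    return env
```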
|
|
The "patch is too large to show" text is now broken by an <hr>
to prevent it from being confused as part of a commit message
(or having somebody intentionally insert that text in a commit
message to confuse readers). A previously missing </pre> is also
needed before the <hr> tag for the related commit search form.
|
|
Similar to commit cbe2548c91859dfb923548ea85d8531b90d53dc3
(www_coderepo: use OnDestroy to render summary view,
2023-04-09), we can rely on OnDestroy and Qspawn to run
dependencies in a structured way and with some extra parallelism
for SMP users.
Perl (as opposed to POSIX sh) allows us to easily avoid
expensive patch generation for large root commits, and also avoid
needless `git patch-id' invocations for patches which are too
big to show.
Avoiding patch-id alone saved nearly 2s from the linux.git root
commit[1] with patch generation enabled and brought response
times down to ~6s (still slow). Avoiding patch generation for
root commits brings it down to a few hundred milliseconds on a
public-facing server (nobody wants a 355MB patch rendered as
HTML, right?).
[1] torvalds/linux.git 1da177e4c3f41524e886b7f1b8a0c1fc7321cac2
|
|
SIGPIPE (13) can be quite common with unreliable connections
and impatient clients, so just ignore it.
|
|
Štěpán Němec <stepnem@smrk.net> wrote:
> Eric Wong wrote:
> > Subject: [PATCH] view: decode In-Reply-To comments added by Gnus
> Or just "some MUAs"? Who knows who else...
Yeah, I wouldn't be surprised if there were more...
---8<---
Subject: [PATCH] view: decode In-Reply-To comments added by some MUAs
Emacs-based MUAs (e.g. Gnus and rmail) can do it, and maybe
some others, too. I noticed it in
<https://yhbt.net/lore/git/xmqqr0ho9oi9.fsf@gitster.g/>
while scanning for something else.
|
|
Setting $SIG{__WARN__} at the top-level no longer has any effect
since we localize $SIG{__WARN__} when entering ->event_step on
a per-listener basis.
Fixes: 60d262483a4d (daemon: use per-listener SIG{__WARN__} callbacks, 2022-08-08)
|
|
The packaged Perl on OpenBSD i386 supports 64-bit file offsets
but not 64-bit integers for 'q' and 'Q' with `pack'.
Since servers aren't likely to require lock files larger than
2 GB (we'd need an inbox with >2 billion messages), we can
work around the Perl build limitation with explicit padding.
File::FcntlLock isn't packaged for OpenBSD <= 7.4 (but should be
in future releases), but I can test i386 OpenBSD on an extremely
slow VM.
Big endian support can be done, too, but I have no idea if
there's 32-bit BE users around nowadays...
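To illustrate the padding trick (using Python's struct module, not Perl's `pack', and assuming little-endian byte order): a small non-negative 64-bit field has the same bytes as a 32-bit integer followed by four zero bytes. The helper name is hypothetical and the real struct flock layout is not shown here.

```python
import struct


def pack_off64_le(value):
    """Pack a small non-negative offset as a 64-bit little-endian field
    using only a 32-bit integer plus four explicit zero pad bytes."""
    if not 0 <= value < 2**31:
        raise ValueError("offset does not fit in 31 bits")
    return struct.pack("<l4x", value)
```

The range check is why this only suits offsets below 2 GB; larger values would need the real 64-bit format.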
|
|
MH sequence numbers can be analogous to IMAP UIDs and NNTP
article numbers (or more like IMAP MSNs with clients which
pack). In any case, sort them numerically by default to avoid
surprising users who treat NNTP spools and mlmmj archives as MH
folders. This gives more coherent git history and resulting
NNTP/IMAP numbering when round-tripping MH -> v2 -> (NNTP|IMAP) -> MH.
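A quick illustration of why numeric order matters (Python sketch; `sort_mh_names` is a hypothetical helper, and treating only all-digit filenames as messages is an assumption):

```python
def sort_mh_names(names):
    """Sort MH message filenames (decimal sequence numbers) numerically,
    ignoring anything which is not purely ASCII digits."""
    msgs = [n for n in names if n.isascii() and n.isdigit()]
    return sorted(msgs, key=int)


names = ["10", "9", "100", "1", ".mh_sequences"]
lexical = sorted(n for n in names if n.isdigit())
numeric = sort_mh_names(names)
print(lexical)  # ['1', '10', '100', '9']
print(numeric)  # ['1', '9', '10', '100']
```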
|
|
We don't need multiple `use PublicInbox::IO' statements to
import a subroutine.
|
|
LeiToMail can't sort v2 output, but sorting MH input (and
NNTP spool + mlmmj archives) numerically makes sense.
|
|
I can't reproduce this in t/lei-sigpipe.t with GIANT_INBOX_DIR.
In real-world usage, having a large `lei q -f text ...' output
piped to a pager and killing the pager prematurely could
trigger:
non-fatal error from PublicInbox::LeiToMail $?=256
messages in my terminal. This is because $self->{lei} was
becoming undefined during the process cleanup of
git_to_mail. So flip the cleanup logic around and
unconditionally check for Git::cleanup state to bail out
early.
With this change, the `non-fatal error ...' message no longer
appears when I stop reading results early.
|
|
BSD::Resource isn't packaged for Alpine (as of 3.19), but we
also have optional Inline::C support and already rely on calling
setrlimit(2) directly from the Inline::C version of pi_fork_exec.
|
|
The good news (compared to lei) is we only have to worry about
imports and don't care about the filename nor keywords, so it's
immune to .mh_sequences writing inconsistencies across MH
implementations and sequence number packing.
We still assume the writer will write the mail file with one of:
* rename(2) to create the final sequence number filename
* a single write(2) if not relying on rename(2)
mlmmj and mutt satisfy these requirements. Python's Lib/mailbox.py
may, I'm not sure...
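For reference, the rename(2)-based delivery assumed above can be sketched like this (Python; `deliver_mh` and the `,tmp-' prefix are hypothetical, chosen so the temporary name is never a valid sequence number):

```python
import os
import tempfile


def deliver_mh(mh_dir, seq, data):
    """Write data to a temporary name, then rename(2) it to the final
    sequence-number filename so readers never see a partial message."""
    final = os.path.join(mh_dir, str(seq))
    fd, tmp = tempfile.mkstemp(prefix=",tmp-", dir=mh_dir)
    try:
        with os.fdopen(fd, "wb") as fh:
            fh.write(data)
        # Note: os.rename() replaces an existing destination on POSIX.
        os.rename(tmp, final)
    except OSError:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise
    return final
```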
|
|
While syscall symbols (e.g. SYS_*) have changed on us in FreeBSD
during the history of Sys::Syscall and this project and did bite
us in some cases, the actual numbers don't get recycled for new
syscalls. We're also fortunate that sendmsg and recvmsg syscalls
and associated msghdr and cmsg structs predate the BSD forks and
are compatible across all the BSDs I've tried.
OpenBSD routes Perl `syscall' through libc[1], while NetBSD[2] + FreeBSD[3]
document procedures for maintaining backwards compatibility.
It looks like Dragonfly follows FreeBSD, here.
Tested on i386 OpenBSD, and amd64 {Free,Net,Open,Dragonfly}BSD
This enables *BSD users to use lei, -cindex and future SCM_RIGHTS-only
features without needing Inline::C.
[1] https://cvsweb.openbsd.org/src/gnu/usr.bin/perl/gen_syscall_emulator.pl
[2] https://www.netbsd.org/docs/internals/en/chap-processes.html#syscall_versioning
[3] https://wiki.freebsd.org/AddingSyscalls#Backward_compatibily
|
|
Sys::Syscall needs separate patches anyways (if it ever gets
updated), and having a mix of indentation styles in our codebase
gets confusing. We'll also update cfarm-related comments for
the current URL.
|
|
This makes the new endpoints easier to find. The navigation is
still at the bottom of the page since I figured having it at the
top is too cluttered for users on small terminals.
|
|
We can rely on SQLite to map `MAX(ds)' to `ds' rather than
doing it in Perl, reducing the size of our Perl optree at the
(smaller) expense of SQLite bytecode.
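One way to read this is a column alias in the query itself. A sketch using Python's sqlite3 (the `msgs' table and its columns are made up for the example, and the alias approach is an assumption about what the change does):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.row_factory = sqlite3.Row
con.execute("CREATE TABLE msgs (num INTEGER PRIMARY KEY, ds INTEGER)")
con.executemany(
    "INSERT INTO msgs (num, ds) VALUES (?, ?)",
    [(1, 100), (2, 300), (3, 200)],
)

# The alias makes the result column come back as `ds', so the caller does
# not have to rename a `MAX(ds)' key after fetching the row.
row = con.execute("SELECT MAX(ds) AS ds FROM msgs").fetchone()
newest_ds = row["ds"]
```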
|
|
This can make it easier to find deeply-nested repositories on my
mirror of git.kernel.org. It's not perfect, since projects like
Linux use several completely different basenames (e.g. linux.git
vs vfs.git vs net.git), but it can still help find significant
matches further up a tree.
I don't expect glob characters to conflict with actual git
repositories used by reasonable people, but direct (non-glob)
hits are still tried first.
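The lookup order can be sketched as below (Python with fnmatch; `find_coderepos` is a hypothetical name and the real matching rules may differ):

```python
from fnmatch import fnmatchcase


def find_coderepos(request_path, repos):
    """Return matching repo names: a direct (non-glob) hit wins outright,
    otherwise request_path is treated as a glob pattern over repo names."""
    if request_path in repos:
        return [request_path]
    return sorted(name for name in repos if fnmatchcase(name, request_path))
```

Trying the direct hit first means a repository whose name happens to contain glob characters stays reachable under its literal name.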
|
|
Noticed while adding wildcard support to WwwCoderepo...
|
|
We don't need 404s for non-existent coderepos creating fake
(and invalid) entries. I noticed this while working on
subsequent changes to support globbing in URLs.
|
|
We moved to PublicInbox::Eml a while back and have no plans
to go back to using Email::MIME, so don't tempt users and
packagers to waste disk space on Email::MIME.
|
|
Users may accidentally or unknowingly write `mbox' and not know
we support 4 incompatible mbox variants.
|
|
Showing absolutely nothing when hitting a server requiring
authentication is a very bad user experience. While we're
at it, use Net::Cmd->message in more places where we experience
failure, too.
|
|
Clearly this was never tested until now, as passwords being
retrieved by git-credential got completely ignored and unused.
This enables users to connect to NNTP(S) servers requiring a
password.
|