From: Eric Wong <e@80x24.org>
To: Konstantin Ryabitsev <konstantin@linuxfoundation.org>
Cc: meta@public-inbox.org
Subject: Re: trying to figure out 100% CPU usage in nntpd...
Date: Tue, 10 Sep 2019 18:12:24 +0000
Message-ID: <20190910181224.urhyoo6av7mhjs67@dcvr>
In-Reply-To: <20190910083820.GA8018@pure.paranoia.local>
Konstantin Ryabitsev <konstantin@linuxfoundation.org> wrote:
> On Mon, Sep 09, 2019 at 05:53:41PM +0000, Eric Wong wrote:
> > Konstantin Ryabitsev <konstantin@linuxfoundation.org> wrote:
> > > There also was a weird problem a couple of days ago where one of the
> > > httpd daemons started returning "Internal Server Error" to all requests.
> > > Restarting public-inbox-httpd fixed the problem, but I am not sure how I
> > > would troubleshoot the causes if it happens next time -- is there a way
> > > to enable error logging for the httpd daemon?
> >
> > That's a new one... I haven't seen any problems from -httpd myself
> > in ages. So -httpd could not handle requests at all?
> >
> > The daemons already spit errors to stderr, which typically ends up
> > in syslog via systemd. So that's the first place to look
> > (also "systemctl status $SERVICE"); anything in there?
>
> Bah, I hadn't even considered looking in syslog. I blame jet lag. Well,
> the explanation is simple, then:
>
> Sep 7 22:22:35 lore.kernel.org public-inbox-httpd: pipe failed: Too many open files at /usr/local/share/perl5/PublicInbox/Git.pm line 210.
> Sep 7 22:22:36 lore.kernel.org public-inbox-httpd: pipe: Too many open files
> Sep 7 22:23:01 lore.kernel.org public-inbox-httpd: pipe: Too many open files
> Sep 7 22:23:02 lore.kernel.org public-inbox-httpd: pipe: Too many open files
> Sep 7 22:23:02 lore.kernel.org public-inbox-httpd: pipe: Too many open files
>
> I see that there's already a LimitNOFILE=30000 -- is exhausting 30,000
> open files a reasonable sort of thing for a system as busy as
> lore.kernel.org, or should we look for a leak? Currently:
>
> publici+ 13125 1 0 52297 25096 0 Sep09 ? 00:00:00 /usr/bin/perl -w /usr/local/bin/public-inbox-httpd -W4 -1 /var/log/public-inbox/httpd.out.log
> publici+ 13128 13125 1 200056 591376 4 Sep09 ? 00:21:06 /usr/bin/perl -w /usr/local/bin/public-inbox-httpd -W4 -1 /var/log/public-inbox/httpd.out.log
> publici+ 13129 13125 1 117789 262000 2 Sep09 ? 00:16:09 /usr/bin/perl -w /usr/local/bin/public-inbox-httpd -W4 -1 /var/log/public-inbox/httpd.out.log
> publici+ 13130 13125 0 103569 204772 9 Sep09 ? 00:08:06 /usr/bin/perl -w /usr/local/bin/public-inbox-httpd -W4 -1 /var/log/public-inbox/httpd.out.log
> publici+ 13131 13125 0 101510 199268 1 Sep09 ? 00:03:33 /usr/bin/perl -w /usr/local/bin/public-inbox-httpd -W4 -1 /var/log/public-inbox/httpd.out.log
The overall limit of open files could be higher (it's really only
limited by RAM), but the balance of pipes/unlinked/other (sockets)
seems way off.  I wonder if it's some bad interaction with hitting
pipe-user-pages-soft (see bottom), too...
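Fwiw, raising that limit is just a systemd drop-in away.  Something
like this (sketch; the unit name and file paths are guesses for your
setup, adjust to taste):

```shell
# Sketch: raise the FD limit via a systemd drop-in (run as root).
# Assumes the unit is named public-inbox-httpd.service.
mkdir -p /etc/systemd/system/public-inbox-httpd.service.d
cat >/etc/systemd/system/public-inbox-httpd.service.d/limits.conf <<'EOF'
[Service]
LimitNOFILE=100000
EOF
systemctl daemon-reload
systemctl restart public-inbox-httpd
```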
> # ls /proc/13125/fd/ | wc -l
> 9
> # ls /proc/13128/fd/ | wc -l
> 17776
> # ls /proc/13129/fd/ | wc -l
> 3937
> # ls /proc/13130/fd/ | wc -l
> 648
> # ls /proc/13131/fd/ | wc -l
> 68
OK, so there is a correlation between CPU time and open FDs.
The poor load distribution (based on CPU time) between processes
sucks, but that's an unrelated problem [1].
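A quick way to see what those FDs actually are (rough sketch; needs
root to inspect another user's processes):

```shell
# Count a process's open FDs by type (pipe / socket / anon_inode /
# regular file).  Substitute the busy worker's PID from the ps
# output above.
pid=13128
for fd in /proc/"$pid"/fd/*; do
  readlink "$fd"
done | sed -e 's/:\[.*//' -e 's,^/.*,file,' | sort | uniq -c | sort -rn
```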
> 17776 since Sep 09 seems... fairly excessive. Most of them are pipes:
>
> # lsof -p 13128 | grep pipe | wc -l
> 14741
>
> a bunch of them are to (deleted) entries in /tmp:
>
> # lsof -p 13128 | grep deleted | wc -l
> 3015
>
> (e.g.):
>
> public-in 13128 publicinbox 45u REG 259,4 984 4200094 /tmp/PerlIO_yd43mN (deleted)
> public-in 13128 publicinbox 46r FIFO 0,9 0t0 21818751 pipe
> public-in 13128 publicinbox 47u REG 259,4 984 4200096 /tmp/PerlIO_siY3Pb (deleted)
> public-in 13128 publicinbox 48r FIFO 0,9 0t0 21818758 pipe
> public-in 13128 publicinbox 49u REG 259,4 984 4200100 /tmp/PerlIO_WyIaaC (deleted)
>
> It does seem like there's perhaps a leak somewhere?
Probably. Not seeing any of that on my (smaller) instances;
but those -httpd haven't been restarted in weeks/months.
The "PerlIO_" prefix is created from open(..., '+>', undef), so
it has to be one of:

1. POST bodies for git http-backend
2. git cat-file --batch-check stderr
3. experimental ViewVCS|SolverGit, which isn't configured on lore :)
= git-http-backend
Looking at the size (984), the PerlIO_* prefix, and the proximity of
the FD numbers, I think those lines above are git-http-backend POST
bodies.
Any git-http-backend stuck from people fetching/cloning?
If so, fd=0 of those should correspond to the inodes of those
unlinked PerlIO files.
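Something like this (rough sketch, as root) would confirm that
correspondence:

```shell
# For each git-http-backend process, print the inode behind fd=0 so
# it can be matched against the inode column of the unlinked
# /tmp/PerlIO_* entries in the lsof output above (e.g. 4200094).
for pid in $(pgrep -f git-http-backend); do
  printf '%s fd=0 inode=%s %s\n' "$pid" \
    "$(stat -L -c %i "/proc/$pid/fd/0")" \
    "$(readlink "/proc/$pid/fd/0")"
done
```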
This is -httpd writing to varnish, still, right?
Anything from varnish error logs?
3K git-http-backend processes would be a lot; the Qspawn mechanism
limits them to 32 processes by default (same as git-daemon),
so it shouldn't hit RLIMIT_NPROC.  Qspawn also won't create
the pipe until right before spawning git-http-backend.
The unlinked file is created once the POST request hits.
And 3K git-http-backend processes would only account for a 1:1
ratio of pipes to unlinked files, so that still leaves over 10K
pipes unaccounted for...
= git cat-file
There can be two cat-file processes (--batch and --batch-check)
for every inbox in every -httpd worker.
Every cat-file process uses 2 pipes (public-inbox closes the
ends it hands to git), and the --batch-check ones use an
additional unlinked regular file for stderr.
So: 80 inboxes * 2 cat-file * 2 pipes is only 320 pipes per
-httpd worker.  And I'm not sure whether 160 cat-file processes
would cause RLIMIT_NPROC problems; probably not...
Fwiw, while the new DS code also uses temporary files for output
buffering, those have the "wbuf-" prefix and not PerlIO_* since
'open(FH, "+>>", undef)' didn't work until recently:
https://rt.perl.org/Ticket/Display.html?id=134221
I've been meaning to expand dtas-graph to include temporary file
and socket connectivity, too, but it's already pretty useful for
graphing pipe connectivity between processes:

https://80x24.org/dtas-all/b243086/s/?b=perl/dtas-graph

It can also help find unconnected pipes.
= /proc/sys/fs/pipe-user-pages-soft
I also just remembered this knob, which you could hit from just
cat-file, and I seem to remember some bugs in older kernels when
pipe limits are hit.
For -httpd with 80 inboxes:

  160 pipes * 4 workers * 16 pages = 10240 pages for --batch
  160 pipes * 4 workers *  1 page  =   640 pages for --batch-check

That's already 10880 pages for cat-file alone, despite
F_SETPIPE_SZ-ing the --batch-check pipes down to a single page
(the default is 16 pages, 65536 bytes).
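The arithmetic, for the record (160 pipes = 80 inboxes * 2 pipes per
cat-file process):

```shell
# Pipe page budget for cat-file alone: 80 inboxes, 4 -httpd workers,
# default 16-page pipes for --batch, 1-page pipes for --batch-check.
inboxes=80 workers=4
batch=$(( inboxes * 2 * workers * 16 ))       # pages for --batch pipes
batch_check=$(( inboxes * 2 * workers * 1 ))  # pages for --batch-check pipes
echo "$batch + $batch_check = $(( batch + batch_check )) pages"
cat /proc/sys/fs/pipe-user-pages-soft         # the soft per-user limit
```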
And there are still pipes for nntpd (not sure how many workers
there are), plus various internal self-pipe stuff (single-page),
some of which could be made eventfd...
/proc/sys/fs/pipe-user-pages-soft seems to default to 16384
regardless of system RAM, and I've needed to increase it in other
places for different projects.
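Raising it is a one-liner (as root; 65536 here is just an example
value, pick a number to taste):

```shell
# Sketch: persistently raise the soft per-user pipe page limit.
echo 'fs.pipe-user-pages-soft = 65536' >/etc/sysctl.d/90-pipe-pages.conf
sysctl -w fs.pipe-user-pages-soft=65536
```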
For anybody using SQLite, I could see about supporting a single
cat-file process in a repo where every inbox is an alternate of
a main one...
[1] maybe a multiaccept knob can improve load distribution:
https://public-inbox.org/meta/20190513025610.30378-1-e@80x24.org/