From: Eric Wong <email@example.com>
To: Konstantin Ryabitsev <firstname.lastname@example.org>
Cc: email@example.com
Subject: Re: how's memory usage on public-inbox-httpd?
Date: Thu, 6 Jun 2019 20:37:52 +0000
Message-ID: <20190606203752.7wpdla5ynemjlshs@dcvr>
In-Reply-To: <20190606190455.GA17362@chatter.i7.local>

Konstantin Ryabitsev <firstname.lastname@example.org> wrote:
> Hello:
>
> This is an old-ish discussion, but we finally had a chance to run the httpd
> daemon for a long time without restarting it to add more lists, and the
> memory usage on it is actually surprising:

Thanks for getting back to this.

> $ ps -eF | grep public-inbox
> publici+ 17741 1 0 52667 24836 8 May24 ? 00:00:00 /usr/bin/perl -w /usr/local/bin/public-inbox-nntpd -1 /var/log/public-inbox/nntpd.out.log
> publici+ 17744 17741 0 69739 90288 9 May24 ? 00:38:43 /usr/bin/perl -w /usr/local/bin/public-inbox-nntpd -1 /var/log/public-inbox/nntpd.out.log
> publici+ 18273 1 0 52599 23832 9 May24 ? 00:00:00 /usr/bin/perl -w /usr/local/bin/public-inbox-httpd -1 /var/log/public-inbox/httpd.out.log
> publici+ 18275 18273 4 5016115 19713872 10 May24 ? 13:59:13 /usr/bin/perl -w /usr/local/bin/public-inbox-httpd -1 /var/log/public-inbox/httpd.out.log
>
> You'll notice that process 18275 has been running since May 24 and takes up
> 19GB in RSS. This is a 16-core 64-GB system, so it's not necessarily super
> alarming, but seems large. :)

Yes, it's large and ugly :<  I don't even have 19GB and even
90MB RSS worries me.

Do you have commit 7d02b9e64455831d3bda20cd2e64e0c15dc07df5?
("view: stop storing all MIME objects on large threads")
That was most significant.

Also, it looks like you've yet to configure the wacky
coderepo+solver stuff, so that's not a culprit...

Otherwise it's probably a combination of several things...

httpd and nntpd both support streaming, arbitrarily large
endpoints (all.mbox.gz, and /T/, /t/, /t.mbox.gz threads with
thousands of messages, giant NNTP BODY/ARTICLE ranges).

All those endpoints should detect backpressure from a slow
client (varnish/nginx in your case) using the ->getline method.

gzip (for compressed mbox) also uses a truckload of memory and I
would like to add options to control zlib window sizes to reduce
memory use (at the cost of less compression). nginx has these
options, too, but they're not documented AFAIK.

For the case of varnish/nginx or whatever's in front of it not
keeping up... the old design choice of Danga::Socket (now
inherited by PublicInbox::DS) made it buffer slow client data to
RAM, which doesn't make sense to me... I prefer buffering to
the FS (similar to nginx/varnish) to avoid malloc fragmentation
and also to avoid delaying the extra kernel-to-user copy if
using sendfile.

By default, glibc malloc is really averse to releasing memory
back to the OS, too. It's fast in benchmarks that way (until
the system starts swapping and slowdowns cascade to failure).

I'm also unsure about malloc fragmentation behavior at such
sizes and how it hurts locality. So my preference is to avoid
putting big objects into heap and let the kernel/FS deal with
big buffers.

httpd/nntpd both try to avoid buffering at all with the
backpressure handling based on ->getline; but sometimes it's not
effective enough because some big chunks still end up in heap.

In any case, you can safely SIGQUIT the individual worker and
it'll restart gracefully w/o dropping active connections.

Also, are you only using the default of -W/--worker-process=1
on a 16-core machine? Just checked public-inbox-httpd(8), the
-W switch is documented :) You can use SIGTTIN/TTOU to
increase or decrease workers w/o restarting, too.

nntpd would have the same problem if people used it more; but at
the moment it doesn't do gzip.
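
To put rough numbers on the zlib window-size tradeoff mentioned above, here is a minimal sketch in Python (not public-inbox code, and not a proposed option set). It relies on the per-stream estimate documented in zlib's zconf.h, (1 << (windowBits+2)) + (1 << (memLevel+9)) bytes for deflate. The function names and the test payload are made up for illustration.

```python
import random
import zlib


def deflate_state_bytes(wbits: int, mem_level: int) -> int:
    """Approximate per-stream deflate memory, per the formula in zconf.h."""
    return (1 << (wbits + 2)) + (1 << (mem_level + 9))


def gzip_bytes(data: bytes, wbits: int = 15, mem_level: int = 8) -> bytes:
    """Compress data as one gzip stream with the given window and memLevel."""
    # Adding 16 to wbits asks zlib to emit a gzip header and trailer.
    co = zlib.compressobj(6, zlib.DEFLATED, wbits + 16, mem_level)
    return co.compress(data) + co.flush()


if __name__ == "__main__":
    # Hypothetical payload: a 16 KiB pseudo-random block repeated, so the
    # repeats only compress when the window can reach the previous copy.
    rng = random.Random(0)
    block = bytes(rng.getrandbits(8) for _ in range(16 * 1024))
    data = block * 8
    for wbits, mem_level in ((15, 8), (12, 4), (10, 1)):
        out = gzip_bytes(data, wbits, mem_level)
        print(wbits, mem_level, deflate_state_bytes(wbits, mem_level), len(out))
```

With the defaults (windowBits=15, memLevel=8) the estimate is 256 KiB per compression stream, which adds up with many simultaneous gzipped responses. Smaller settings cut that sharply, but zlib can no longer find matches that lie further back than the window.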
I'm happy to see it's at least gotten some traffic :)

> Is that normal, and if not, what can I do to help troubleshoot where it's
> all going?

There are definitely some problems with big threads, giant
messages and gzip overhead. I was looking into a few big
threads earlier this year but forgot the Message-IDs :x

Do you have any stats on the number of simultaneous connections
public-inbox-httpd/nginx/varnish handles (and logging of that
info at peak)? (perhaps running "ss -tan" periodically)(*)

Are you using the Plack::Middleware::Deflater endpoint in PSGI?
Removing it and doing gzip in varnish/nginx may be a little
faster since it can utilize multiple cores, but at higher IPC
cost. I've gotten rid of the annoying warning for that
middleware install as a result...

But gzipped mboxes have the same problem, though, so adding zlib
window-size options would be necessary...

So I think supporting buffer-to-FS behavior in ::DS along with
gzip options should alleviate much of the memory use. But
immediately you can increase worker process counts to distribute
the load between cores a bit...

I've also tried nicing down nginx/varnish so they're prioritized
by the kernel and don't bottleneck -httpd. Makes sense to me in
theory but I was also making a lot of changes around the same
time to reduce httpd memory use.

Limiting HTTP endpoint response size isn't a real option to
protect the server, IMHO, because NNTP requires supporting giant
responses anyways.

(*) I did "raindrops" with Ruby+C back in the day but haven't
    really looked at it in ages, and I don't think the IPv6
    counting was accurate <https://raindrops-demo.bogomips.org/>
    That's -httpd on :280
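
One way to turn periodic "ss -tan" runs into connection stats is to count socket states per listening port. The following is a rough sketch, assuming the usual column layout of ss (State, Recv-Q, Send-Q, Local Address:Port, Peer Address:Port). The function name, the sample output, and port 8080 are invented for illustration and do not come from any real server.

```python
from collections import Counter


def count_tcp_states(ss_output: str, port: int) -> Counter:
    """Count TCP socket states for sockets whose local port matches."""
    counts = Counter()
    for line in ss_output.splitlines():
        fields = line.split()
        # Skip the header row and anything too short to be a socket line.
        if len(fields) < 5 or fields[0] == "State":
            continue
        local = fields[3]
        # rpartition also copes with IPv6 forms such as [::1]:8080.
        if local.rpartition(":")[2] == str(port):
            counts[fields[0]] += 1
    return counts


# Hypothetical sample in the shape printed by "ss -tan".
SAMPLE = """\
State      Recv-Q Send-Q Local Address:Port   Peer Address:Port
LISTEN     0      128    127.0.0.1:8080       0.0.0.0:*
ESTAB      0      0      127.0.0.1:8080       127.0.0.1:51234
ESTAB      0      4096   127.0.0.1:8080       127.0.0.1:51236
TIME-WAIT  0      0      127.0.0.1:8080       127.0.0.1:51240
ESTAB      0      0      127.0.0.1:22         127.0.0.1:40000
"""

if __name__ == "__main__":
    print(count_tcp_states(SAMPLE, 8080))
```

In practice the text would come from something like subprocess.run(["ss", "-tan"], capture_output=True, text=True).stdout on a timer, with the counts logged next to a timestamp so the peak can be read off later.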