From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on dcvr.yhbt.net
X-Spam-Level:
X-Spam-ASN:
X-Spam-Status: No, score=-4.0 required=3.0 tests=ALL_TRUSTED,AWL,BAYES_00
	shortcircuit=no autolearn=ham autolearn_force=no version=3.4.2
Received: from localhost (dcvr.yhbt.net [127.0.0.1])
	by dcvr.yhbt.net (Postfix) with ESMTP id C67EA1F462;
	Thu, 6 Jun 2019 20:37:52 +0000 (UTC)
Date: Thu, 6 Jun 2019 20:37:52 +0000
From: Eric Wong
To: Konstantin Ryabitsev
Cc: meta@public-inbox.org
Subject: Re: how's memory usage on public-inbox-httpd?
Message-ID: <20190606203752.7wpdla5ynemjlshs@dcvr>
References: <20181201194429.d5aldesjkb56il5c@dcvr>
 <20190606190455.GA17362@chatter.i7.local>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
In-Reply-To: <20190606190455.GA17362@chatter.i7.local>
List-Id:

Konstantin Ryabitsev wrote:
> Hello:
>
> This is an old-ish discussion, but we finally had a chance to run the
> httpd daemon for a long time without restarting it to add more lists,
> and the memory usage on it is actually surprising:

Thanks for getting back to this.

> $ ps -eF | grep public-inbox
> publici+ 17741     1  0  52667    24836  8 May24 ? 00:00:00 /usr/bin/perl -w /usr/local/bin/public-inbox-nntpd -1 /var/log/public-inbox/nntpd.out.log
> publici+ 17744 17741  0  69739    90288  9 May24 ? 00:38:43 /usr/bin/perl -w /usr/local/bin/public-inbox-nntpd -1 /var/log/public-inbox/nntpd.out.log
> publici+ 18273     1  0  52599    23832  9 May24 ? 00:00:00 /usr/bin/perl -w /usr/local/bin/public-inbox-httpd -1 /var/log/public-inbox/httpd.out.log
> publici+ 18275 18273  4 5016115 19713872 10 May24 ? 13:59:13 /usr/bin/perl -w /usr/local/bin/public-inbox-httpd -1 /var/log/public-inbox/httpd.out.log
>
> You'll notice that process 18275 has been running since May 24 and
> takes up 19GB in RSS. This is a 16-core 64-GB system, so it's not
> necessarily super alarming, but seems large. :)

Yes, it's large and ugly :<  I don't even have 19GB, and even 90MB RSS
worries me.

Do you have commit 7d02b9e64455831d3bda20cd2e64e0c15dc07df5?
("view: stop storing all MIME objects on large threads")
That was the most significant one.

Also, it looks like you've yet to configure the wacky coderepo+solver
stuff, so that's not a culprit...

Otherwise it's probably a combination of several things...

httpd and nntpd both support streaming, arbitrarily large endpoints
(all.mbox.gz, and /T/, /t/, /t.mbox.gz threads with thousands of
messages, giant NNTP BODY/ARTICLE ranges).  All those endpoints should
detect backpressure from a slow client (varnish/nginx in your case)
using the ->getline method.

gzip (for compressed mbox) also uses a truckload of memory and I would
like to add options to control zlib window sizes to reduce memory use
(at the cost of less compression).  nginx has these options, too, but
they're not documented AFAIK.

For the case of varnish/nginx (or whatever's in front of it) not
keeping up... the old design choice of Danga::Socket (now inherited by
PublicInbox::DS) made it buffer slow client data to RAM, which doesn't
make sense to me.  I prefer buffering to the FS (similar to
nginx/varnish) to avoid malloc fragmentation and also to avoid delaying
the extra kernel-to-user copy if using sendfile.

By default, glibc malloc is really averse to releasing memory back to
the OS, too.  It's fast in benchmarks that way (until the system starts
swapping and slowdowns cascade to failure).  I'm also unsure about
malloc fragmentation behavior at such sizes and how it hurts locality.
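
To make the ->getline point above concrete, here's a rough sketch (not
actual public-inbox code; the package name, chunk size, and file path
are made up) of the kind of PSGI body object that behavior relies on.
The server calls getline repeatedly and only asks for the next chunk
after the previous one is written out to the client, which is where
the backpressure comes from:

  package ChunkedBody;            # illustration only
  use strict;
  use warnings;

  sub new {
      my ($class, $fh) = @_;      # $fh: handle to a possibly-huge mbox
      bless { fh => $fh }, $class;
  }

  # called repeatedly by the PSGI server; returning undef ends the body
  sub getline {
      my ($self) = @_;
      read($self->{fh}, my $buf, 8192) or return undef;
      $buf;
  }

  sub close { CORE::close($_[0]->{fh}) }

  # usage in a PSGI app: the response streams chunk-by-chunk instead of
  # being slurped into the heap
  my $app = sub {
      open my $fh, '<', '/tmp/huge.mbox' or die $!;
      [ 200, [ 'Content-Type' => 'application/mbox' ],
        ChunkedBody->new($fh) ];
  };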
So my preference is to avoid putting big objects into the heap and let
the kernel/FS deal with big buffers.

httpd/nntpd both try to avoid buffering at all with the backpressure
handling based on ->getline; but sometimes it's not effective enough
because some big chunks still end up in the heap.

In any case, you can safely SIGQUIT the individual worker and it'll
restart gracefully w/o dropping active connections.

Also, are you only using the default of -W/--worker-processes=1 on a
16-core machine?  I just checked: the -W switch is documented in
public-inbox-httpd(8) :)  You can use SIGTTIN/SIGTTOU to increase or
decrease the number of workers w/o restarting, too.

nntpd would have the same problem if people used it more; but at the
moment it doesn't do gzip.  I'm happy to see it's at least gotten some
traffic :)

> Is that normal, and if not, what can I do to help troubleshoot where
> it's all going?

There are definitely some problems with big threads, giant messages
and gzip overhead.  I was looking into a few big threads earlier this
year but forgot the Message-IDs :x

Do you have any stats on the number of simultaneous connections
public-inbox-httpd/nginx/varnish handles (and logging of that info at
peak)?  (perhaps running "ss -tan" periodically) (*)

Are you using the Plack::Middleware::Deflater endpoint in PSGI?
Removing it and doing gzip in varnish/nginx may be a little faster
since it can utilize multiple cores, but at a higher IPC cost.  I've
gotten rid of the annoying warning to install that middleware as a
result...  But gzipped mboxes have the same problem, though, so adding
zlib window-size options would be necessary (rough sketch of those
knobs at the end of this mail)...

So I think supporting buffer-to-FS behavior in ::DS along with gzip
options should alleviate much of the memory use.  But immediately, you
can increase the worker process count to distribute the load between
cores a bit...

I've also tried nicing down nginx/varnish so they're prioritized by
the kernel and don't bottleneck -httpd.  That makes sense to me in
theory, but I was also making a lot of changes around the same time to
reduce httpd memory use.

Limiting HTTP endpoint response size isn't a real option to protect
the server, IMHO, because NNTP requires supporting giant responses
anyways.

(*) I did "raindrops" with Ruby+C back in the day but haven't really
    looked at it in ages, and I don't think the IPv6 counting was
    accurate.  That's -httpd on :280
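
Concretely, the zlib window-size knobs I keep referring to are the
ones zlib itself (and Compress::Raw::Zlib) already exposes; below is a
rough illustration of the memory/ratio trade-off.  This isn't code we
ship, and the particular numbers are just examples:

  use strict;
  use warnings;
  use Compress::Raw::Zlib;   # zlib constants (Z_OK, ...) are exported

  # a smaller window and memLevel shrink the per-stream deflate state
  # considerably, at the cost of a worse compression ratio
  my ($gz, $err) = Compress::Raw::Zlib::Deflate->new(
      -WindowBits   => 16 + 9,  # +16 = gzip wrapper; 2**9 window vs. 2**15 default
      -MemLevel     => 2,       # default is 8; lower trades ratio for RAM
      -AppendOutput => 1,
  );
  die "deflate init failed: $err" unless $gz;

  my $in  = "From example\n" x 1000;   # stand-in for mbox text
  my $out = '';
  $gz->deflate($in, $out) == Z_OK or die 'deflate failed';
  $gz->flush($out);                    # Z_FINISH by default
  printf "%d gzipped bytes\n", length $out;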