From: Eric Wong
To: Konstantin Ryabitsev
Cc: meta@public-inbox.org
Subject: Re: how's memory usage on public-inbox-httpd?
Date: Sun, 9 Jun 2019 08:39:18 +0000
Message-ID: <20190609083918.gfr2kurah7f2hysx@dcvr>
In-Reply-To: <20190606221009.y4fe2e2rervvq3z4@dcvr>
References: <20181201194429.d5aldesjkb56il5c@dcvr>
	<20190606190455.GA17362@chatter.i7.local>
	<20190606203752.7wpdla5ynemjlshs@dcvr>
	<20190606214509.GA4087@chatter.i7.local>
	<20190606221009.y4fe2e2rervvq3z4@dcvr>

Eric Wong wrote:
> Without concurrent connections; I can't see that happening
> unless there's a single message which is gigabytes in size.  I'm
> already irked that Email::MIME requires slurping entire emails
> into memory; but it should not be using more than one
> Email::MIME object in memory-at-a-time for a single client.

Giant multipart messages do a lot of damage.  Maybe concurrent
clients hitting the same endpoints will do more damage.

Largest I see in LKML is 7204747 bytes (which is frightening).
That bloats to 21626795 bytes when parsed by Email::MIME.

I thought it was bad enough that all Perl mail modules seem to
require slurping 7M into memory...

-------8<--------
use strict;
use warnings;
require Email::MIME;
use bytes ();
use Devel::Size qw(total_size);

# slurp the entire message in one read (stdin assumed here)
my $in = do { local $/; <STDIN> };
print 'string: ', total_size($in),
	' actual: ', bytes::length($in), "\n";
print 'MIME: ', total_size(Email::MIME->new(\$in)), "\n";
-------8<--------

That shows (on amd64):

	string: 7204819 actual: 7204747
	MIME: 21626795

Maybe you have bigger messages outside of LKML.
This prints all objects >1MB in a git dir:

	git cat-file --buffer --batch-check --batch-all-objects \
		--unordered | awk '$3 > 1048576 { print }'

And I also remember you're supporting non-vger lists where HTML
mail is allowed, so that can't be good for memory use at all :<

Streaming MIME handling has been on the TODO for a while,
at least...

|* streaming Email::MIME replacement: currently we generate many
|  allocations/strings for headers we never look at and slurp
|  entire message bodies into memory.
|  (this is pie-in-the-sky territory...)
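
The awk filter in the git one-liner above keys on the third field
of the default --batch-check output (object name, type, size).
Below is a minimal sketch of the same filter as a reusable shell
function, with the results sorted largest first.  The name
`big_objects` and the sample lines are made up for illustration;
neither comes from git or public-inbox.

```shell
# Hypothetical helper: keep only objects whose size (3rd field) exceeds
# a byte threshold (default 1MB), largest first.  Input is expected in
# the default --batch-check format: <object-name> <type> <size>
big_objects() {
	awk -v min="${1:-1048576}" '$3 > min { print }' | sort -k3,3nr
}

# Made-up sample lines standing in for real `git cat-file` output:
sample='aaaa blob 7204747
bbbb blob 512
cccc tree 2097152'

# Prints the "aaaa" line, then the "cccc" line; "bbbb" is filtered out.
printf '%s\n' "$sample" | big_objects

# Against a real repository (not run here), it would be fed like this:
#   git cat-file --buffer --batch-check --batch-all-objects \
#   	--unordered | big_objects
```

Passing a different threshold as the first argument (for example
`big_objects 5242880`) narrows the list to objects over 5MB.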