From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on dcvr.yhbt.net X-Spam-Level: X-Spam-Status: No, score=-4.0 required=3.0 tests=ALL_TRUSTED,BAYES_00 shortcircuit=no autolearn=ham autolearn_force=no version=3.4.2 Received: from localhost (dcvr.yhbt.net [127.0.0.1]) by dcvr.yhbt.net (Postfix) with ESMTP id 105151F990; Fri, 7 Aug 2020 13:13:10 +0000 (UTC) Date: Fri, 7 Aug 2020 13:13:09 +0000 From: Eric Wong To: meta@public-inbox.org Subject: Re: [PATCH 1/5] v2writable: fix batch size accounting Message-ID: <20200807131309.GA3710@dcvr> References: <20200807105218.16843-1-e@yhbt.net> <20200807105218.16843-2-e@yhbt.net> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline In-Reply-To: <20200807105218.16843-2-e@yhbt.net> List-Id: Eric Wong wrote: > We need to account for whether shard parallelization is > enabled or not, since users of parallelization are expected > to have more RAM. > --- > lib/PublicInbox/V2Writable.pm | 10 ++++++++-- > 1 file changed, 8 insertions(+), 2 deletions(-) > > diff --git a/lib/PublicInbox/V2Writable.pm b/lib/PublicInbox/V2Writable.pm > index a029fe4c..03320b9c 100644 > --- a/lib/PublicInbox/V2Writable.pm > +++ b/lib/PublicInbox/V2Writable.pm > @@ -152,6 +152,12 @@ sub add { > $self->{ibx}->with_umask(\&_add, $self, $eml, $check_cb); > } > > +sub batch_bytes ($) { > + my ($self) = @_; > + $self->{parallel} ? $PublicInbox::SearchIdx::BATCH_BYTES > + : $PublicInbox::SearchIdx::BATCH_BYTES * $self->{shards}; > +} Oops, that was backwards :x Will squash this in: diff --git a/lib/PublicInbox/V2Writable.pm b/lib/PublicInbox/V2Writable.pm index 03320b9c0..f7a318e5b 100644 --- a/lib/PublicInbox/V2Writable.pm +++ b/lib/PublicInbox/V2Writable.pm @@ -154,8 +154,8 @@ sub add { sub batch_bytes ($) { my ($self) = @_; - $self->{parallel} ? $PublicInbox::SearchIdx::BATCH_BYTES - : $PublicInbox::SearchIdx::BATCH_BYTES * $self->{shards}; + ($self->{parallel} ? $self->{shards} : 1) * + $PublicInbox::SearchIdx::BATCH_BYTES; } # indexes a message, returns true if checkpointing is needed > sub do_idx ($$$$) { > my ($self, $msgref, $mime, $smsg) = @_; > @@ -160,7 +166,7 @@ sub do_idx ($$$$) { > my $idx = idx_shard($self, $smsg->{num} % $self->{shards}); > $idx->index_raw($msgref, $mime, $smsg); > my $n = $self->{transact_bytes} += $smsg->{raw_bytes}; > - $n >= ($PublicInbox::SearchIdx::BATCH_BYTES * $self->{shards}); > + $n >= batch_bytes($self); > } ...Because the old code always assumed parallel shards (even with --jobs=0). > sub _add { > @@ -1195,7 +1201,7 @@ sub index_xap_step ($$$;$) { > my $ibx = $self->{ibx}; > my $all = $ibx->git; > my $over = $ibx->over; > - my $batch_bytes = $PublicInbox::SearchIdx::BATCH_BYTES; > + my $batch_bytes = batch_bytes($self); > $step //= $self->{shards}; > my $end = $sync->{art_end}; > if (my $pr = $sync->{-opt}->{-progress}) { And the new index_xap_step was not designed for parallel operation, initially.