From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Wed, 27 Feb 2019 20:25:36 +0000
From: Eric Wong
To: Konstantin Ryabitsev
Cc: meta@public-inbox.org
Subject: [PATCH] v2writable: fix epoch rollover on incremental imports
Message-ID: <20190227202536.uvlvggnt4bwtgx5h@dcvr>
References: <20190212191116.GA8720@chatter.qube.local>
 <20190212192715.mtncyagjzusi3vtp@dcvr>
 <20190227002204.GA17917@dcvr>
 <20190227132604.GA24441@pure.paranoia.local>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
In-Reply-To: <20190227132604.GA24441@pure.paranoia.local>

Konstantin Ryabitsev wrote:
> On Wed, Feb 27, 2019 at 12:22:04AM +0000, Eric Wong wrote:
> > Eric Wong wrote:
> > > Konstantin Ryabitsev wrote:
> > > > Eric:
> > > >
> > > > I noticed today that the LKML shard 6 has grown over 1.1 GB, which is the
> > > > size of other shards (0-5). I'm wondering if it will roll over to shard 7
> > > > automatically, or if there are other steps that need to be undertaken.
> > >
> > > It only counts bytes in *.pack files; so you might need to repack
> > > (or wait for gc to run via --auto).
> >
> > Btw, have you checked this? I've been wondering if 7 will show up, too.
>
> Yeah, I've repacked it, but we still haven't rolled over to 7. This is
> the latest on the server:
>
> $ git count-objects -v
> count: 18
> size: 88
> in-pack: 1036749
> packs: 1
> size-pack: 1169104
> prune-packable: 0
> garbage: 0
> size-garbage: 0
> $ ls -al objects/pack/pack-7d2041260250f79f5d2396f38959560e013c8d26.pack
> -r--r--r--. 1 archiver archiver 1168133236 Feb 27 13:10 objects/pack/pack-7d2041260250f79f5d2396f38959560e013c8d26.pack
>
> I'm looking at the code and I'm not entirely sure what PACKING_FACTOR
> is:
>
> my $PACKING_FACTOR = 0.4;
> ...
> rotate_bytes => int((1024 * 1024 * 1024) / $PACKING_FACTOR),
>
> Wouldn't that give us 2.7GB?
> (1024*1024*1024)/0.4 = 2,684,354,560

Yes, we do all the calculations using the estimated unpacked size.
So the estimate is that 2.7GB unpacked (and uncompressed) is roughly
1GB packed.

> It's possible I'm not following the logic right. It looks to be the same
> code that properly sharded things on the initial import, so I'm not
> sure.

Almost; the problem was that the initial import never saw an existing
git repo with data in it. The incremental -mda/-watch path failed
to take into account the unpacked size of the existing data.

This fixes it:
---------8<-----------
Subject: [PATCH] v2writable: fix epoch rollover on incremental imports

All of our internal epoch rollover calculations are done using
the estimated unpacked (and uncompressed) size of the repo.
The importer instance needs to check that unpacked size before
selecting an epoch when an epoch already has packed data.

This bug did not impact the initial mass imports since we only
initialize the Import instance once-per-epoch and did not need
to take existing epochs into account.
Tested manually with -mda on a local clone of LKML.

Reported-by: Konstantin Ryabitsev
---
 lib/PublicInbox/V2Writable.pm | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/lib/PublicInbox/V2Writable.pm b/lib/PublicInbox/V2Writable.pm
index 1f17fe2..b1d8095 100644
--- a/lib/PublicInbox/V2Writable.pm
+++ b/lib/PublicInbox/V2Writable.pm
@@ -588,7 +588,9 @@ sub importer {
 	if (defined $latest) {
 		my $git = PublicInbox::Git->new($latest);
 		my $packed_bytes = $git->packed_bytes;
-		if ($packed_bytes >= $self->{rotate_bytes}) {
+		my $unpacked_bytes = $packed_bytes / $PACKING_FACTOR;
+
+		if ($unpacked_bytes >= $self->{rotate_bytes}) {
 			$epoch = $max + 1;
 		} else {
 			$self->{epoch_max} = $max;
-- 
EW
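For anyone wanting to double-check the numbers, here is a minimal standalone Perl sketch of the old and fixed comparisons, plugging in the pack size from the ls -al output above. It is not the actual V2Writable.pm code: $PACKING_FACTOR and the rotate_bytes formula mirror the lines quoted in the thread, while needs_rollover() is a hypothetical helper standing in for the check inside importer().

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Same constant and threshold formula as quoted from V2Writable.pm above.
my $PACKING_FACTOR = 0.4;
my $rotate_bytes = int((1024 * 1024 * 1024) / $PACKING_FACTOR);

# Hypothetical helper (not in public-inbox): decide whether a new epoch
# is needed given the packed size of the latest epoch.
sub needs_rollover {
	my ($packed_bytes) = @_;
	# rotate_bytes is an estimated *unpacked* size, so convert the
	# packed size to the same unit before comparing (this is the fix).
	my $unpacked_bytes = $packed_bytes / $PACKING_FACTOR;
	return $unpacked_bytes >= $rotate_bytes;
}

# Size of the single pack in LKML epoch 6, from the ls -al output.
my $packed = 1168133236;

# Old check: packed bytes compared directly against an unpacked threshold.
my $old = $packed >= $rotate_bytes ? 1 : 0;

# Fixed check: estimated unpacked bytes against the same threshold.
my $new = needs_rollover($packed) ? 1 : 0;

printf "rotate_bytes=%d old=%d new=%d\n", $rotate_bytes, $old, $new;
```

Run with perl, this prints "rotate_bytes=2684354560 old=0 new=1": about 1.17GB packed is estimated at roughly 2.9GB unpacked, which crosses the 2.7GB threshold only once the packing factor is applied, so epoch 7 gets created.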