user/dev discussion of public-inbox itself
From: Eric Wong <e@80x24.org>
To: Konstantin Ryabitsev <konstantin@linuxfoundation.org>
Cc: meta@public-inbox.org
Subject: [PATCH] v2writable: fix epoch rollover on incremental imports
Date: Wed, 27 Feb 2019 20:25:36 +0000
Message-ID: <20190227202536.uvlvggnt4bwtgx5h@dcvr>
In-Reply-To: <20190227132604.GA24441@pure.paranoia.local>

Konstantin Ryabitsev <konstantin@linuxfoundation.org> wrote:
> On Wed, Feb 27, 2019 at 12:22:04AM +0000, Eric Wong wrote:
> > Eric Wong <e@80x24.org> wrote:
> > > Konstantin Ryabitsev <konstantin@linuxfoundation.org> wrote:
> > > > Eric:
> > > > 
> > > > I noticed today that the LKML shard 6 has grown over 1.1 GB, which is the
> > > > size of other shards (0-5). I'm wondering if it will roll over to shard 7
> > > > automatically, or if there are other steps that need to be undertaken.
> > > 
> > > It only counts bytes in *.pack files; so you might need to repack
> > > (or wait for gc to run via --auto).
> > 
> > Btw, have you checked this?  I've been wondering if 7 will show up, too.
> 
> Yeah, I've repacked it, but we still haven't rolled over to 7. This is
> the latest on the server:
> 
> $ git count-objects -v
> count: 18
> size: 88
> in-pack: 1036749
> packs: 1
> size-pack: 1169104
> prune-packable: 0
> garbage: 0
> size-garbage: 0
> $ ls -al objects/pack/pack-7d2041260250f79f5d2396f38959560e013c8d26.pack
> -r--r--r--. 1 archiver archiver 1168133236 Feb 27 13:10 objects/pack/pack-7d2041260250f79f5d2396f38959560e013c8d26.pack
> 
> I'm looking at the code and I'm not entirely sure what PACKING_FACTOR
> is:
> 
> my $PACKING_FACTOR = 0.4;
> ...
> rotate_bytes => int((1024 * 1024 * 1024) / $PACKING_FACTOR),
> 
> Wouldn't that give us 2.7GB?
> (1024*1024*1024)/0.4 = 2,684,354,560

Yes, we do all the calculations using the estimated unpacked
size.  So the estimate is that 2.7GB unpacked (and uncompressed)
is roughly 1GB packed.
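
To make the arithmetic concrete, here is a rough sketch in Python
(not the project's Perl) of how the threshold relates to a packed
size.  The constant names mirror the ones quoted above; the helper
function is hypothetical and exists only for illustration.

```python
# Illustrative sketch only; public-inbox itself is written in Perl.
PACKING_FACTOR = 0.4  # estimated ratio of packed size to unpacked size

# Rollover threshold, expressed in estimated *unpacked* bytes.
ROTATE_BYTES = int((1024 * 1024 * 1024) / PACKING_FACTOR)


def estimated_unpacked_bytes(packed_bytes):
    """Hypothetical helper: scale a packed size up to its unpacked estimate."""
    return packed_bytes / PACKING_FACTOR


print(ROTATE_BYTES)  # 2684354560
# A pack of exactly 1 GiB lands right on the threshold.
print(estimated_unpacked_bytes(1024 ** 3) >= ROTATE_BYTES)
```

In other words, the 2.7GB figure is the same 1GB target, just
expressed in unpacked terms.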

> It's possible I'm not following the logic right. It looks to be the same
> code that properly sharded things on the initial import, so I'm not
> sure.

Almost; the problem was that the initial import never saw an
existing git repo with data in it.  The incremental -mda/-watch
path failed to take into account the unpacked size of the
existing data.
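
The unit mismatch can be sketched as follows, again in Python
rather than the project's Perl.  The two function names are
hypothetical; the pack size is the one from the ls output quoted
above.

```python
# Illustrative sketch only; function names are hypothetical.
PACKING_FACTOR = 0.4
ROTATE_BYTES = int((1024 * 1024 * 1024) / PACKING_FACTOR)  # unpacked estimate


def should_rotate_packed(packed_bytes):
    # Compares a packed size against a threshold in unpacked bytes,
    # so a ~1.1GB pack never reaches the ~2.7GB limit.
    return packed_bytes >= ROTATE_BYTES


def should_rotate_unpacked(packed_bytes):
    # Converts to the estimated unpacked size before comparing,
    # so both sides of the comparison use the same unit.
    return packed_bytes / PACKING_FACTOR >= ROTATE_BYTES


lkml_pack_bytes = 1168133236  # size of the single pack in shard 6
print(should_rotate_packed(lkml_pack_bytes))    # False
print(should_rotate_unpacked(lkml_pack_bytes))  # True
```

With the packed size compared directly, shard 6 would have had to
grow to roughly 2.7GB packed before rolling over.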

This fixes it:
---------8<-----------
Subject: [PATCH] v2writable: fix epoch rollover on incremental imports

All of our internal epoch rollover calculations are done using
the estimated unpacked (and uncompressed) size of the repo.  The
importer instance needs to check that unpacked size before
selecting an epoch when an epoch already has packed data.

This bug did not impact the initial mass imports since we only
initialize the Import instance once-per-epoch and did not need
to take existing epochs into account.

Tested manually with -mda on a local clone of LKML.

Reported-by: Konstantin Ryabitsev <konstantin@linuxfoundation.org>
---
 lib/PublicInbox/V2Writable.pm | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/lib/PublicInbox/V2Writable.pm b/lib/PublicInbox/V2Writable.pm
index 1f17fe2..b1d8095 100644
--- a/lib/PublicInbox/V2Writable.pm
+++ b/lib/PublicInbox/V2Writable.pm
@@ -588,7 +588,9 @@ sub importer {
 	if (defined $latest) {
 		my $git = PublicInbox::Git->new($latest);
 		my $packed_bytes = $git->packed_bytes;
-		if ($packed_bytes >= $self->{rotate_bytes}) {
+		my $unpacked_bytes = $packed_bytes / $PACKING_FACTOR;
+
+		if ($unpacked_bytes >= $self->{rotate_bytes}) {
 			$epoch = $max + 1;
 		} else {
 			$self->{epoch_max} = $max;
-- 
EW

Thread overview: 6+ messages
2019-02-12 19:11 V2 shard roll-over Konstantin Ryabitsev
2019-02-12 19:27 ` Eric Wong
2019-02-27  0:22   ` Eric Wong
2019-02-27 13:26     ` Konstantin Ryabitsev
2019-02-27 20:25       ` Eric Wong [this message]
2019-02-27 23:34         ` [PATCH] v2writable: fix epoch rollover on incremental imports Konstantin Ryabitsev
