user/dev discussion of public-inbox itself
* V2 shard roll-over
@ 2019-02-12 19:11 Konstantin Ryabitsev
  2019-02-12 19:27 ` Eric Wong
  0 siblings, 1 reply; 6+ messages in thread
From: Konstantin Ryabitsev @ 2019-02-12 19:11 UTC
  To: meta

Eric:

I noticed today that the LKML shard 6 has grown over 1.1 GB, which is 
the size of other shards (0-5). I'm wondering if it will roll over to 
shard 7 automatically, or if there are other steps that need to be 
undertaken.

Best,
-K

* Re: V2 shard roll-over
  2019-02-12 19:11 V2 shard roll-over Konstantin Ryabitsev
@ 2019-02-12 19:27 ` Eric Wong
  2019-02-27  0:22   ` Eric Wong
  0 siblings, 1 reply; 6+ messages in thread
From: Eric Wong @ 2019-02-12 19:27 UTC
  To: Konstantin Ryabitsev; +Cc: meta

Konstantin Ryabitsev <konstantin@linuxfoundation.org> wrote:
> Eric:
> 
> I noticed today that the LKML shard 6 has grown over 1.1 GB, which is the
> size of other shards (0-5). I'm wondering if it will roll over to shard 7
> automatically, or if there are other steps that need to be undertaken.

It only counts bytes in *.pack files; so you might need to repack
(or wait for gc to run via --auto).
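
A rough sketch of what "only counts bytes in *.pack files" means in
practice, assuming the count amounts to summing the on-disk sizes of
objects/pack/*.pack inside the epoch's git directory. The function name
and the example path are hypothetical and not part of public-inbox;
loose objects are deliberately ignored, which is why a repack may be
needed before the number moves:

```python
from pathlib import Path


def packed_bytes(git_dir):
    """Sum the sizes of all *.pack files under git_dir/objects/pack.

    Loose objects and *.idx files are not counted, so the total only
    grows after a repack (or an automatic gc) has written a pack.
    """
    pack_dir = Path(git_dir) / "objects" / "pack"
    return sum(p.stat().st_size for p in pack_dir.glob("*.pack"))


if __name__ == "__main__":
    # Hypothetical location of the latest epoch; adjust to the real inbox.
    print(packed_bytes("/path/to/lkml/git/6.git"))
```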

You can monitor the rollover via stderr with the following to be
sure:

diff --git a/lib/PublicInbox/V2Writable.pm b/lib/PublicInbox/V2Writable.pm
index 1f17fe2..0fd8c60 100644
--- a/lib/PublicInbox/V2Writable.pm
+++ b/lib/PublicInbox/V2Writable.pm
@@ -588,6 +588,9 @@ sub importer {
 	if (defined $latest) {
 		my $git = PublicInbox::Git->new($latest);
 		my $packed_bytes = $git->packed_bytes;
+
+		print STDERR "packed_bytes=$packed_bytes ",
+			"rotate_bytes=$self->{rotate_bytes}\n";
 		if ($packed_bytes >= $self->{rotate_bytes}) {
 			$epoch = $max + 1;
 		} else {

* Re: V2 shard roll-over
  2019-02-12 19:27 ` Eric Wong
@ 2019-02-27  0:22   ` Eric Wong
  2019-02-27 13:26     ` Konstantin Ryabitsev
  0 siblings, 1 reply; 6+ messages in thread
From: Eric Wong @ 2019-02-27  0:22 UTC
  To: Konstantin Ryabitsev; +Cc: meta

Eric Wong <e@80x24.org> wrote:
> Konstantin Ryabitsev <konstantin@linuxfoundation.org> wrote:
> > Eric:
> > 
> > I noticed today that the LKML shard 6 has grown over 1.1 GB, which is the
> > size of other shards (0-5). I'm wondering if it will roll over to shard 7
> > automatically, or if there are other steps that need to be undertaken.
> 
> It only counts bytes in *.pack files; so you might need to repack
> (or wait for gc to run via --auto).

Btw, have you checked this?  I've been wondering if 7 will show up, too.

> You can monitor the rollover via stderr with the following to be
> sure:
> 
> diff --git a/lib/PublicInbox/V2Writable.pm b/lib/PublicInbox/V2Writable.pm
> index 1f17fe2..0fd8c60 100644
> --- a/lib/PublicInbox/V2Writable.pm
> +++ b/lib/PublicInbox/V2Writable.pm
> @@ -588,6 +588,9 @@ sub importer {
>  	if (defined $latest) {
>  		my $git = PublicInbox::Git->new($latest);
>  		my $packed_bytes = $git->packed_bytes;
> +
> +		print STDERR "packed_bytes=$packed_bytes ",
> +			"rotate_bytes=$self->{rotate_bytes}\n";
>  		if ($packed_bytes >= $self->{rotate_bytes}) {
>  			$epoch = $max + 1;
>  		} else {

* Re: V2 shard roll-over
  2019-02-27  0:22   ` Eric Wong
@ 2019-02-27 13:26     ` Konstantin Ryabitsev
  2019-02-27 20:25       ` [PATCH] v2writable: fix epoch rollover on incremental imports Eric Wong
  0 siblings, 1 reply; 6+ messages in thread
From: Konstantin Ryabitsev @ 2019-02-27 13:26 UTC
  To: Eric Wong; +Cc: meta

On Wed, Feb 27, 2019 at 12:22:04AM +0000, Eric Wong wrote:
> Eric Wong <e@80x24.org> wrote:
> > Konstantin Ryabitsev <konstantin@linuxfoundation.org> wrote:
> > > Eric:
> > > 
> > > I noticed today that the LKML shard 6 has grown over 1.1 GB, which is the
> > > size of other shards (0-5). I'm wondering if it will roll over to shard 7
> > > automatically, or if there are other steps that need to be undertaken.
> > 
> > It only counts bytes in *.pack files; so you might need to repack
> > (or wait for gc to run via --auto).
> 
> Btw, have you checked this?  I've been wondering if 7 will show up, too.

Yeah, I've repacked it, but we still haven't rolled over to 7. This is
the latest on the server:

$ git count-objects -v
count: 18
size: 88
in-pack: 1036749
packs: 1
size-pack: 1169104
prune-packable: 0
garbage: 0
size-garbage: 0
$ ls -al objects/pack/pack-7d2041260250f79f5d2396f38959560e013c8d26.pack
-r--r--r--. 1 archiver archiver 1168133236 Feb 27 13:10 objects/pack/pack-7d2041260250f79f5d2396f38959560e013c8d26.pack

I'm looking at the code and I'm not entirely sure what PACKING_FACTOR
is:

my $PACKING_FACTOR = 0.4;
...
rotate_bytes => int((1024 * 1024 * 1024) / $PACKING_FACTOR),

Wouldn't that give us 2.7GB?
(1024*1024*1024)/0.4 = 2,684,354,560

It's possible I'm not following the logic right. It looks to be the same
code that properly sharded things on the initial import, so I'm not
sure.

-K

* [PATCH] v2writable: fix epoch rollover on incremental imports
  2019-02-27 13:26     ` Konstantin Ryabitsev
@ 2019-02-27 20:25       ` Eric Wong
  2019-02-27 23:34         ` Konstantin Ryabitsev
  0 siblings, 1 reply; 6+ messages in thread
From: Eric Wong @ 2019-02-27 20:25 UTC
  To: Konstantin Ryabitsev; +Cc: meta

Konstantin Ryabitsev <konstantin@linuxfoundation.org> wrote:
> On Wed, Feb 27, 2019 at 12:22:04AM +0000, Eric Wong wrote:
> > Eric Wong <e@80x24.org> wrote:
> > > Konstantin Ryabitsev <konstantin@linuxfoundation.org> wrote:
> > > > Eric:
> > > > 
> > > > I noticed today that the LKML shard 6 has grown over 1.1 GB, which is the
> > > > size of other shards (0-5). I'm wondering if it will roll over to shard 7
> > > > automatically, or if there are other steps that need to be undertaken.
> > > 
> > > It only counts bytes in *.pack files; so you might need to repack
> > > (or wait for gc to run via --auto).
> > 
> > Btw, have you checked this?  I've been wondering if 7 will show up, too.
> 
> Yeah, I've repacked it, but we still haven't rolled over to 7. This is
> the latest on the server:
> 
> $ git count-objects -v
> count: 18
> size: 88
> in-pack: 1036749
> packs: 1
> size-pack: 1169104
> prune-packable: 0
> garbage: 0
> size-garbage: 0
> $ ls -al objects/pack/pack-7d2041260250f79f5d2396f38959560e013c8d26.pack
> -r--r--r--. 1 archiver archiver 1168133236 Feb 27 13:10 objects/pack/pack-7d2041260250f79f5d2396f38959560e013c8d26.pack
> 
> I'm looking at the code and I'm not entirely sure what PACKING_FACTOR
> is:
> 
> my $PACKING_FACTOR = 0.4;
> ...
> rotate_bytes => int((1024 * 1024 * 1024) / $PACKING_FACTOR),
> 
> Wouldn't that give us 2.7GB?
> (1024*1024*1024)/0.4 = 2,684,354,560

Yes, we do all the calculations using the estimated unpacked
size.  So the estimate is that 2.7GB unpacked (and uncompressed)
is roughly 1GB packed.

> It's possible I'm not following the logic right. It looks to be the same
> code that properly sharded things on the initial import, so I'm not
> sure.

Almost, the problem was the initial import never saw an existing
git repo with data in it.  The incremental -mda/-watch path
failed to take into account the unpacked size of the existing
data.

This fixes it:
---------8<-----------
Subject: [PATCH] v2writable: fix epoch rollover on incremental imports

All of our internal epoch rollover calculations are done using
the estimated unpacked (and uncompressed) size of the repo.  The
importer instance needs to check that unpacked size before
selecting an epoch when an epoch already has packed data.

This bug did not impact the initial mass imports since we only
initialize the Import instance once-per-epoch and did not need
to take existing epochs into account.

Tested manually with -mda on a local clone of LKML

Reported-by: Konstantin Ryabitsev <konstantin@linuxfoundation.org>
---
 lib/PublicInbox/V2Writable.pm | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/lib/PublicInbox/V2Writable.pm b/lib/PublicInbox/V2Writable.pm
index 1f17fe2..b1d8095 100644
--- a/lib/PublicInbox/V2Writable.pm
+++ b/lib/PublicInbox/V2Writable.pm
@@ -588,7 +588,9 @@ sub importer {
 	if (defined $latest) {
 		my $git = PublicInbox::Git->new($latest);
 		my $packed_bytes = $git->packed_bytes;
-		if ($packed_bytes >= $self->{rotate_bytes}) {
+		my $unpacked_bytes = $packed_bytes / $PACKING_FACTOR;
+
+		if ($unpacked_bytes >= $self->{rotate_bytes}) {
 			$epoch = $max + 1;
 		} else {
 			$self->{epoch_max} = $max;
-- 
EW
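
The effect of the one-line change can be checked against the numbers
reported earlier in this thread (a 1168133236-byte pack, a
PACKING_FACTOR of 0.4). This is an illustrative Python sketch, not the
Perl code itself; the two function names are hypothetical and only
mirror the comparison before and after the patch:

```python
PACKING_FACTOR = 0.4
# Same formula as in V2Writable.pm: the threshold is expressed in
# estimated *unpacked* bytes (about 2.7GB for roughly 1GB packed).
ROTATE_BYTES = int((1024 * 1024 * 1024) / PACKING_FACTOR)


def should_rotate_before_fix(packed_bytes):
    # Old check: packed size compared directly to an unpacked threshold,
    # so a ~1.1GB pack never reaches the ~2.7GB limit.
    return packed_bytes >= ROTATE_BYTES


def should_rotate_after_fix(packed_bytes):
    # New check: scale the packed size up to its estimated unpacked size
    # before comparing, so both sides use the same unit.
    unpacked_bytes = packed_bytes / PACKING_FACTOR
    return unpacked_bytes >= ROTATE_BYTES


if __name__ == "__main__":
    reported_pack = 1168133236  # size of the shard 6 pack from `ls -al`
    print("rotate_bytes:", ROTATE_BYTES)
    print("before fix:", should_rotate_before_fix(reported_pack))
    print("after fix:", should_rotate_after_fix(reported_pack))
```

With the old comparison the pack would have had to grow past the full
unpacked threshold on disk before a new epoch was selected; with the
scaled comparison the reported pack is already over the limit.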

* Re: [PATCH] v2writable: fix epoch rollover on incremental imports
  2019-02-27 20:25       ` [PATCH] v2writable: fix epoch rollover on incremental imports Eric Wong
@ 2019-02-27 23:34         ` Konstantin Ryabitsev
  0 siblings, 0 replies; 6+ messages in thread
From: Konstantin Ryabitsev @ 2019-02-27 23:34 UTC
  To: Eric Wong; +Cc: meta

On Wed, Feb 27, 2019 at 08:25:36PM +0000, Eric Wong wrote:
>All of our internal epoch rollover calculations are done using
>the estimated unpacked (and uncompressed) size of the repo.  The
>importer instance needs to check that unpacked size before
>selecting an epoch when an epoch already has packed data.
>
>This bug did not impact the initial mass imports since we only
>initialize the Import instance once-per-epoch and did not need
>to take existing epochs into account.
>
>Tested manually with -mda on a local clone of LKML

Ding, this got us shard 7. :)

Thanks, Eric!

-K

Code repositories for project(s) associated with this public inbox

	https://80x24.org/public-inbox.git