user/dev discussion of public-inbox itself
* [PATCH 00/20] v2: use consistent terminology
@ 2019-06-15  8:46 Eric Wong
  2019-06-15  8:46 ` [PATCH 01/20] doc: rename our Xapian "partitions" to "shards" Eric Wong
                   ` (19 more replies)
  0 siblings, 20 replies; 21+ messages in thread
From: Eric Wong @ 2019-06-15  8:46 UTC
  To: meta

Modern Xapian docs and code refer to multi-DB uses as "shards";
so replace the term "partition" with the term "shard".  This
way, we'll minimize the learning curve for people new to
public-inbox or Xapian.

We also still had a few places where we referred to git epochs
as "partitions", so fix those up and be consistent with the
majority of our own docs and code.
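
To make the two terms concrete, here is a minimal sketch that tells a
git "epoch" from a Xapian "shard" by its path inside a v2 inbox, based
on the layout in public-inbox-v2-format(5).  The helper name
classify_v2_path and the schema version 15 in the sample paths are
assumptions for illustration; neither comes from this series.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of public-inbox): classify a path
# relative to a v2 inbox directory.
#   git/$EPOCH.git             => git epoch
#   xap$SCHEMA_VERSION/$SHARD  => Xapian shard
sub classify_v2_path {
	my ($path) = @_;
	return "epoch $1" if $path =~ m!\Agit/([0-9]+)\.git\z!;
	return "shard $2" if $path =~ m!\Axap([0-9]+)/([0-9]+)\z!;
	'other';
}

# "xap15" is an arbitrary schema version chosen for the example
print "$_ => ", classify_v2_path($_), "\n"
	for qw(git/0.git xap15/3 all.git msgmap.sqlite3);
```

Epochs divide the git history by time; shards divide the Xapian index
for parallelism.  The two counts are independent of each other.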

Eric Wong (20):
  doc: rename our Xapian "partitions" to "shards"
  v2writable: update comments regarding xcpdb --reshard
  admin|xapcmd: user-facing messages say "shard"
  rename reference to git epochs as "partitions"
  searchidxpart: start using "shard" in user-visible places
  v2writable: count_partitions => count_shards
  v2writable: rename {partitions} field to {shards}
  tests: change messages to use "shard" instead of partition
  inboxwritable: s/partitions/shards/ in local var
  v2: rename SearchIdxPart => SearchIdxShard
  xapcmd: update comments referencing "partitions"
  search*: rename {partition} => {shard}
  v2writable: avoid "part" in internal subs and fields
  v2writable: rename local vars to match Xapian terminology
  adminedit: "part" => "shard" for local variables
  v2writable: use "epoch" consistently when referring to git repos
  search: use "shard" for local variable
  xapcmd: favor 'shard' over 'part' in local variables
  t/xcpdb-reshard: use 'shard' term in local variables
  comments: replace "partition" with "shard"

 Documentation/public-inbox-v2-format.pod      |  10 +-
 Documentation/public-inbox-xcpdb.pod          |  11 +-
 MANIFEST                                      |   2 +-
 lib/PublicInbox/Admin.pm                      |   4 +-
 lib/PublicInbox/AdminEdit.pm                  |  12 +-
 lib/PublicInbox/Inbox.pm                      |  18 +--
 lib/PublicInbox/InboxWritable.pm              |   4 +-
 lib/PublicInbox/Search.pm                     |  14 +--
 lib/PublicInbox/SearchIdx.pm                  |  19 +--
 .../{SearchIdxPart.pm => SearchIdxShard.pm}   |  30 ++---
 lib/PublicInbox/V2Writable.pm                 | 109 +++++++++---------
 lib/PublicInbox/WWW.pm                        |  12 +-
 lib/PublicInbox/WwwListing.pm                 |   2 +-
 lib/PublicInbox/WwwStream.pm                  |  12 +-
 lib/PublicInbox/Xapcmd.pm                     |  96 +++++++--------
 t/indexlevels-mirror.t                        |   2 +-
 t/psgi_v2.t                                   |   2 +-
 t/v2writable.t                                |   4 +-
 t/view.t                                      |   2 +-
 t/xcpdb-reshard.t                             |  14 +--
 20 files changed, 188 insertions(+), 191 deletions(-)
 rename lib/PublicInbox/{SearchIdxPart.pm => SearchIdxShard.pm} (78%)

-- 
EW



* [PATCH 01/20] doc: rename our Xapian "partitions" to "shards"
  2019-06-15  8:46 [PATCH 00/20] v2: use consistent terminology Eric Wong
@ 2019-06-15  8:46 ` Eric Wong
  2019-06-15  8:46 ` [PATCH 02/20] v2writable: update comments regarding xcpdb --reshard Eric Wong
                   ` (18 subsequent siblings)
  19 siblings, 0 replies; 21+ messages in thread
From: Eric Wong @ 2019-06-15  8:46 UTC
  To: meta

For consistency with Xapian documentation (in the "master"
branch).
---
 Documentation/public-inbox-v2-format.pod | 10 +++++-----
 Documentation/public-inbox-xcpdb.pod     | 11 +++++------
 2 files changed, 10 insertions(+), 11 deletions(-)

diff --git a/Documentation/public-inbox-v2-format.pod b/Documentation/public-inbox-v2-format.pod
index bdfe7ab..28d3550 100644
--- a/Documentation/public-inbox-v2-format.pod
+++ b/Documentation/public-inbox-v2-format.pod
@@ -16,7 +16,7 @@ Message-IDs.
 The key change in v2 is the inbox is no longer a bare git
 repository, but a directory with two or more git repositories.
 v2 divides git repositories by time "epochs" and Xapian
-databases for parallelism by "partitions".
+databases for parallelism by "shards".
 
 =head2 INBOX OVERVIEW AND DEFINITIONS
 
@@ -28,7 +28,7 @@ foo/ # assuming "foo" is the name of the list
 - inbox.lock                 # lock file (flock) to protect global state
 - git/$EPOCH.git             # normal git repositories
 - all.git                    # empty git repo, alternates to git/$EPOCH.git
-- xap$SCHEMA_VERSION/$PART   # per-partition Xapian DB
+- xap$SCHEMA_VERSION/$SHARD  # per-shard Xapian DB
 - xap$SCHEMA_VERSION/over.sqlite3 # OVER-view DB for NNTP and threading
 - msgmap.sqlite3             # same the v1 msgmap
 
@@ -95,16 +95,16 @@ are documented at:
 
 L<https://public-inbox.org/meta/20180209205140.GA11047@dcvr/>
 
-=head2 XAPIAN PARTITIONS
+=head2 XAPIAN SHARDS
 
 Another second scalability problem in v1 was the inability to
 utilize multiple CPU cores for Xapian indexing.  This is
-addressed by using partitions in Xapian to perform import
+addressed by using shards in Xapian to perform import
 indexing in parallel.
 
 As with git alternates, Xapian natively supports a read-only
 interface which transparently abstracts away the knowledge of
-multiple partitions.  This allows us to simplify our read-only
+multiple shards.  This allows us to simplify our read-only
 code paths.
 
 The performance of the storage device is now the bottleneck on
diff --git a/Documentation/public-inbox-xcpdb.pod b/Documentation/public-inbox-xcpdb.pod
index fd8770a..a13c4ef 100644
--- a/Documentation/public-inbox-xcpdb.pod
+++ b/Documentation/public-inbox-xcpdb.pod
@@ -21,7 +21,7 @@ L<public-inbox-watch(1)> or L<public-inbox-mda(1)>.
 =item --compact
 
 In addition to performing the copy operation, run L<xapian-compact(1)>
-on each Xapian partition after copying but before finalizing it.
+on each Xapian shard after copying but before finalizing it.
 Compared to the cost of copying a Xapian database, compacting a
 Xapian database takes only around 5% of the time required to copy.
 
@@ -32,14 +32,13 @@ the compaction to take hours at-a-time.
 
 =item --reshard=N / -R N
 
-Repartition the Xapian database on a L<v2|public-inbox-v2-format(5)>
-inbox to C<N> partitions.  Since L<xapian-compact(1)> is not suitable
-for merging, users can rely on this switch to repartition the
+Reshard the Xapian database on a L<v2|public-inbox-v2-format(5)>
+inbox to C<N> shards.  Since L<xapian-compact(1)> is not suitable
+for merging, users can rely on this switch to reshard the
 existing Xapian database(s) to any positive value of C<N>.
 
 This is useful in case the Xapian DB was created with too few or
-too many partitions given the capabilities of the current
-hardware.
+too many shards given the capabilities of the current hardware.
 
 =item --blocksize / --no-full / --fuller
 
-- 
EW



* [PATCH 02/20] v2writable: update comments regarding xcpdb --reshard
  2019-06-15  8:46 [PATCH 00/20] v2: use consistent terminology Eric Wong
  2019-06-15  8:46 ` [PATCH 01/20] doc: rename our Xapian "partitions" to "shards" Eric Wong
@ 2019-06-15  8:46 ` Eric Wong
  2019-06-15  8:46 ` [PATCH 03/20] admin|xapcmd: user-facing messages say "shard" Eric Wong
                   ` (17 subsequent siblings)
  19 siblings, 0 replies; 21+ messages in thread
From: Eric Wong @ 2019-06-15  8:46 UTC
  To: meta

Using compact to change shard count was abandoned during
the v2 development phase.
---
 lib/PublicInbox/V2Writable.pm | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/lib/PublicInbox/V2Writable.pm b/lib/PublicInbox/V2Writable.pm
index 76e61e8..db905f9 100644
--- a/lib/PublicInbox/V2Writable.pm
+++ b/lib/PublicInbox/V2Writable.pm
@@ -58,8 +58,8 @@ sub count_partitions ($) {
 	my $xpfx = $self->{xpfx};
 
 	# always load existing partitions in case core count changes:
-	# Also, partition count may change while -watch is running
-	# due to -compact
+	# Also, shard count may change while -watch is running
+	# due to "xcpdb --reshard"
 	if (-d $xpfx) {
 		foreach my $part (<$xpfx/*>) {
 			-d $part && $part =~ m!/[0-9]+\z! or next;
@@ -288,7 +288,7 @@ sub idx_init {
 		$self->lock_acquire unless ($opt && $opt->{-skip_lock});
 		$over->create;
 
-		# -compact can change partition count while -watch is idle
+		# xcpdb can change shard count while -watch is idle
 		my $nparts = count_partitions($self);
 		if ($nparts && $nparts != $self->{partitions}) {
 			$self->{partitions} = $nparts;
-- 
EW
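
The comments updated above sit in the sub that counts existing shards
on disk, so a rough standalone sketch of that check may help readers.
It assumes, as the context lines suggest, that a shard is any
subdirectory of the Xapian prefix whose name is all digits.
count_shard_dirs is a hypothetical name, and readdir is used here
instead of the glob in V2Writable purely to keep the example
independent of the temporary path.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical stand-in for the shard count in V2Writable: count the
# subdirectories of $xpfx whose names consist only of digits.
sub count_shard_dirs {
	my ($xpfx) = @_;
	my $n = 0;
	return $n unless -d $xpfx;
	opendir(my $dh, $xpfx) or die "opendir $xpfx: $!\n";
	while (defined(my $ent = readdir($dh))) {
		$ent =~ /\A[0-9]+\z/ && -d "$xpfx/$ent" or next;
		$n++;
	}
	closedir $dh;
	$n;
}

# demo: three numbered shard directories plus one unrelated directory
my $xpfx = tempdir(CLEANUP => 1);
mkdir "$xpfx/$_" or die "mkdir: $!\n" for (0..2, 'tmp');
print count_shard_dirs($xpfx), "\n";
```

Counting on every idx_init, rather than caching the number, is what
lets a long-running -watch process notice a change made by
"xcpdb --reshard".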



* [PATCH 03/20] admin|xapcmd: user-facing messages say "shard"
  2019-06-15  8:46 [PATCH 00/20] v2: use consistent terminology Eric Wong
  2019-06-15  8:46 ` [PATCH 01/20] doc: rename our Xapian "partitions" to "shards" Eric Wong
  2019-06-15  8:46 ` [PATCH 02/20] v2writable: update comments regarding xcpdb --reshard Eric Wong
@ 2019-06-15  8:46 ` Eric Wong
  2019-06-15  8:47 ` [PATCH 04/20] rename reference to git epochs as "partitions" Eric Wong
                   ` (16 subsequent siblings)
  19 siblings, 0 replies; 21+ messages in thread
From: Eric Wong @ 2019-06-15  8:46 UTC
  To: meta

We're slowly getting rid of the word "partition" when it
comes to Xapian, to remain consistent with Xapian docs.
---
 lib/PublicInbox/Admin.pm  |  2 +-
 lib/PublicInbox/Xapcmd.pm | 10 +++++-----
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/lib/PublicInbox/Admin.pm b/lib/PublicInbox/Admin.pm
index 8a2f204..5549b85 100644
--- a/lib/PublicInbox/Admin.pm
+++ b/lib/PublicInbox/Admin.pm
@@ -207,7 +207,7 @@ sub index_inbox {
 				my $n = $v2w->{partitions};
 				if ($jobs != ($n + 1)) {
 					warn
-"Unable to respect --jobs=$jobs, inbox was created with $n partitions\n";
+"Unable to respect --jobs=$jobs, inbox was created with $n shards\n";
 				}
 			}
 		}
diff --git a/lib/PublicInbox/Xapcmd.pm b/lib/PublicInbox/Xapcmd.pm
index e1c6fe3..e303da9 100644
--- a/lib/PublicInbox/Xapcmd.pm
+++ b/lib/PublicInbox/Xapcmd.pm
@@ -68,11 +68,11 @@ sub commit_changes ($$$) {
 			my $n = $im->count_partitions;
 			if (defined $new_parts && $n != $new_parts) {
 				die
-"BUG: counted $n partitions after repartioning to $new_parts";
+"BUG: counted $n shards after resharding to $new_parts";
 			}
 			my $prev = $im->{partitions};
 			if ($pr && $prev != $n) {
-				$pr->("partition count changed: $prev => $n\n");
+				$pr->("shard count changed: $prev => $n\n");
 				$im->{partitions} = $n;
 			}
 		}
@@ -177,7 +177,7 @@ sub run {
 	}
 
 	# we want temporary directories to be as deep as possible,
-	# so v2 partitions can keep "xap$SCHEMA_VERSION" on a separate FS.
+	# so v2 shards can keep "xap$SCHEMA_VERSION" on a separate FS.
 	if ($v == 1) {
 		if (defined $new_parts) {
 			warn
@@ -355,9 +355,9 @@ sub cpdb ($$) {
 	if (ref($old) eq 'ARRAY') {
 		($cur_part) = ($new =~ m!xap[0-9]+/([0-9]+)\b!);
 		defined $cur_part or
-			die "BUG: could not extract partition # from $new";
+			die "BUG: could not extract shard # from $new";
 		$new_parts = $opt->{reshard};
-		defined $new_parts or die 'BUG: got array src w/o --partition';
+		defined $new_parts or die 'BUG: got array src w/o --reshard';
 
 		# repartitioning, M:N copy means have full read access
 		foreach (@$old) {
-- 
EW
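
For context on the "could not extract shard #" message, the sketch
below applies the same regex that cpdb uses in the hunk above to two
sample paths.  The wrapper name shard_num_from_path, the /srv/inbox
prefix and the schema version 15 are assumptions made up for the
example.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the regex from cpdb: return the shard
# number embedded in a "xap$SCHEMA_VERSION/$SHARD" path, or undef.
sub shard_num_from_path {
	my ($path) = @_;
	my ($n) = ($path =~ m!xap[0-9]+/([0-9]+)\b!);
	$n;
}

# sample paths are illustrative only
for my $path ('/srv/inbox/xap15/2', '/srv/inbox/xap15/over.sqlite3') {
	my $n = shard_num_from_path($path);
	print "$path => ",
		(defined $n ? "shard $n" : 'no shard number'), "\n";
}
```

The first path yields shard 2; the second has no numeric component
after the xap directory, which is the case cpdb treats as a bug.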



* [PATCH 04/20] rename reference to git epochs as "partitions"
  2019-06-15  8:46 [PATCH 00/20] v2: use consistent terminology Eric Wong
                   ` (2 preceding siblings ...)
  2019-06-15  8:46 ` [PATCH 03/20] admin|xapcmd: user-facing messages say "shard" Eric Wong
@ 2019-06-15  8:47 ` Eric Wong
  2019-06-15  8:47 ` [PATCH 05/20] searchidxpart: start using "shard" in user-visible places Eric Wong
                   ` (15 subsequent siblings)
  19 siblings, 0 replies; 21+ messages in thread
From: Eric Wong @ 2019-06-15  8:47 UTC
  To: meta

Try to remain consistent with our own documentation regarding
v2 git "epochs", first.
---
 lib/PublicInbox/Inbox.pm      | 18 +++++++++---------
 lib/PublicInbox/WWW.pm        | 12 ++++++------
 lib/PublicInbox/WwwListing.pm |  2 +-
 lib/PublicInbox/WwwStream.pm  | 12 ++++++------
 t/view.t                      |  2 +-
 5 files changed, 23 insertions(+), 23 deletions(-)

diff --git a/lib/PublicInbox/Inbox.pm b/lib/PublicInbox/Inbox.pm
index 10f716c..c0eb640 100644
--- a/lib/PublicInbox/Inbox.pm
+++ b/lib/PublicInbox/Inbox.pm
@@ -125,11 +125,11 @@ sub new {
 	bless $opts, $class;
 }
 
-sub git_part {
-	my ($self, $part) = @_;
+sub git_epoch {
+	my ($self, $epoch) = @_;
 	($self->{version} || 1) == 2 or return;
-	$self->{"$part.git"} ||= eval {
-		my $git_dir = "$self->{mainrepo}/git/$part.git";
+	$self->{"$epoch.git"} ||= eval {
+		my $git_dir = "$self->{mainrepo}/git/$epoch.git";
 		my $g = PublicInbox::Git->new($git_dir);
 		$g->{-httpbackend_limiter} = $self->{-httpbackend_limiter};
 		# no cleanup needed, we never cat-file off this, only clone
@@ -149,13 +149,13 @@ sub git {
 	};
 }
 
-sub max_git_part {
+sub max_git_epoch {
 	my ($self) = @_;
 	my $v = $self->{version};
 	return unless defined($v) && $v == 2;
-	my $part = $self->{-max_git_part};
+	my $cur = $self->{-max_git_epoch};
 	my $changed = git($self)->alternates_changed;
-	if (!defined($part) || $changed) {
+	if (!defined($cur) || $changed) {
 		$self->git->cleanup if $changed;
 		my $gits = "$self->{mainrepo}/git";
 		if (opendir my $dh, $gits) {
@@ -164,12 +164,12 @@ sub max_git_part {
 				$git_dir =~ m!\A([0-9]+)\.git\z! or next;
 				$max = $1 if $1 > $max;
 			}
-			$part = $self->{-max_git_part} = $max if $max >= 0;
+			$cur = $self->{-max_git_epoch} = $max if $max >= 0;
 		} else {
 			warn "opendir $gits failed: $!\n";
 		}
 	}
-	$part;
+	$cur;
 }
 
 sub mm {
diff --git a/lib/PublicInbox/WWW.pm b/lib/PublicInbox/WWW.pm
index e468263..9021cb5 100644
--- a/lib/PublicInbox/WWW.pm
+++ b/lib/PublicInbox/WWW.pm
@@ -76,9 +76,9 @@ sub call {
 	if ($method eq 'POST') {
 		if ($path_info =~ m!$INBOX_RE/(?:(?:git/)?([0-9]+)(?:\.git)?/)?
 					(git-upload-pack)\z!x) {
-			my ($part, $path) = ($2, $3);
+			my ($epoch, $path) = ($2, $3);
 			return invalid_inbox($ctx, $1) ||
-				serve_git($ctx, $part, $path);
+				serve_git($ctx, $epoch, $path);
 		} elsif ($path_info =~ m!$INBOX_RE/!o) {
 			return invalid_inbox($ctx, $1) || mbox_results($ctx);
 		}
@@ -100,8 +100,8 @@ sub call {
 		invalid_inbox($ctx, $1) || get_new($ctx);
 	} elsif ($path_info =~ m!$INBOX_RE/(?:(?:git/)?([0-9]+)(?:\.git)?/)?
 				($PublicInbox::GitHTTPBackend::ANY)\z!ox) {
-		my ($part, $path) = ($2, $3);
-		invalid_inbox($ctx, $1) || serve_git($ctx, $part, $path);
+		my ($epoch, $path) = ($2, $3);
+		invalid_inbox($ctx, $1) || serve_git($ctx, $epoch, $path);
 	} elsif ($path_info =~ m!$INBOX_RE/([a-zA-Z0-9_\-]+).mbox\.gz\z!o) {
 		serve_mbox_range($ctx, $1, $2);
 	} elsif ($path_info =~ m!$INBOX_RE/$MID_RE/$END_RE\z!o) {
@@ -437,10 +437,10 @@ sub msg_page {
 }
 
 sub serve_git {
-	my ($ctx, $part, $path) = @_;
+	my ($ctx, $epoch, $path) = @_;
 	my $env = $ctx->{env};
 	my $ibx = $ctx->{-inbox};
-	my $git = defined $part ? $ibx->git_part($part) : $ibx->git;
+	my $git = defined $epoch ? $ibx->git_epoch($epoch) : $ibx->git;
 	$git ? PublicInbox::GitHTTPBackend::serve($env, $git, $path) : r404();
 }
 
diff --git a/lib/PublicInbox/WwwListing.pm b/lib/PublicInbox/WwwListing.pm
index e2724cc..e052bbf 100644
--- a/lib/PublicInbox/WwwListing.pm
+++ b/lib/PublicInbox/WwwListing.pm
@@ -190,7 +190,7 @@ sub js ($$) {
 
 	my $manifest = { -abs2urlpath => {}, -mtime => 0 };
 	for my $ibx (@$list) {
-		if (defined(my $max = $ibx->max_git_part)) {
+		if (defined(my $max = $ibx->max_git_epoch)) {
 			for my $epoch (0..$max) {
 				manifest_add($manifest, $ibx, $epoch);
 			}
diff --git a/lib/PublicInbox/WwwStream.pm b/lib/PublicInbox/WwwStream.pm
index f6c5049..082e5ec 100644
--- a/lib/PublicInbox/WwwStream.pm
+++ b/lib/PublicInbox/WwwStream.pm
@@ -85,11 +85,11 @@ sub _html_end {
 	my (%seen, @urls);
 	my $http = $ibx->base_url($ctx->{env});
 	chop $http; # no trailing slash for clone
-	my $part = $ibx->max_git_part;
+	my $max = $ibx->max_git_epoch;
 	my $dir = (split(m!/!, $http))[-1];
-	if (defined($part)) { # v2
+	if (defined($max)) { # v2
 		$seen{$http} = 1;
-		for my $i (0..$part) {
+		for my $i (0..$max) {
 			# old parts my be deleted:
 			-d "$ibx->{mainrepo}/git/$i.git" or next;
 			my $url = "$http/$i";
@@ -101,7 +101,7 @@ sub _html_end {
 		push @urls, $http;
 	}
 
-	# FIXME: partitioning in can be different in other repositories,
+	# FIXME: epoch splits can be different in other repositories,
 	# use the "cloneurl" file as-is for now:
 	foreach my $u (@{$ibx->cloneurl}) {
 		next if $seen{$u};
@@ -109,13 +109,13 @@ sub _html_end {
 		push @urls, $u =~ /\Ahttps?:/ ? qq(<a\nhref="$u">$u</a>) : $u;
 	}
 
-	if (defined($part) || scalar(@urls) > 1) {
+	if (defined($max) || scalar(@urls) > 1) {
 		$urls .= "\n" .
 			join("\n", map { "\tgit clone --mirror $_" } @urls);
 	} else {
 		$urls .= " git clone --mirror $urls[0]";
 	}
-	if (defined $part) {
+	if (defined $max) {
 		my $addrs = $ibx->{address};
 		$addrs = join(' ', @$addrs) if ref($addrs) eq 'ARRAY';
 		$urls .=  <<EOF
diff --git a/t/view.t b/t/view.t
index 0782954..d93be6f 100644
--- a/t/view.t
+++ b/t/view.t
@@ -18,7 +18,7 @@ my $ctx = {
 		base_url => sub { 'http://example.com/' },
 		cloneurl => sub {[]},
 		nntp_url => sub {[]},
-		max_git_part => sub { undef },
+		max_git_epoch => sub { undef },
 		description => sub { '' }),
 	www => Plack::Util::inline_object(style => sub { '' }),
 };
-- 
EW
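
As a rough illustration of what max_git_epoch looks for, the sketch
below scans a directory for entries named N.git and returns the
highest N.  It is a simplified stand-in under stated assumptions:
max_epoch_in is a hypothetical name, and the caching and alternates
checks in Inbox.pm are left out.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper: return the highest N for which "N.git" is
# listed in $gits, or undef if there are no epochs or the directory
# cannot be opened.
sub max_epoch_in {
	my ($gits) = @_;
	opendir(my $dh, $gits) or return;
	my $max = -1;
	while (defined(my $ent = readdir($dh))) {
		$ent =~ m!\A([0-9]+)\.git\z! or next;
		$max = $1 if $1 > $max;
	}
	closedir $dh;
	$max >= 0 ? $max : undef;
}

# demo: epochs 0, 1 and 10 plus an entry that must be ignored
my $gits = tempdir(CLEANUP => 1);
mkdir "$gits/$_" or die "mkdir: $!\n" for qw(0.git 1.git 10.git notes);
print max_epoch_in($gits), "\n";
```

The comparison is numeric, so 10.git outranks 1.git even though it
sorts earlier as a string.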



* [PATCH 05/20] searchidxpart: start using "shard" in user-visible places
  2019-06-15  8:46 [PATCH 00/20] v2: use consistent terminology Eric Wong
                   ` (3 preceding siblings ...)
  2019-06-15  8:47 ` [PATCH 04/20] rename reference to git epochs as "partitions" Eric Wong
@ 2019-06-15  8:47 ` Eric Wong
  2019-06-15  8:47 ` [PATCH 06/20] v2writable: count_partitions => count_shards Eric Wong
                   ` (14 subsequent siblings)
  19 siblings, 0 replies; 21+ messages in thread
From: Eric Wong @ 2019-06-15  8:47 UTC
  To: meta

We'll name our process title with "shard" instead, and
update a few error messages and comments to match.
---
 lib/PublicInbox/SearchIdxPart.pm | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/lib/PublicInbox/SearchIdxPart.pm b/lib/PublicInbox/SearchIdxPart.pm
index 51d81a0..77fb7d9 100644
--- a/lib/PublicInbox/SearchIdxPart.pm
+++ b/lib/PublicInbox/SearchIdxPart.pm
@@ -1,8 +1,8 @@
 # Copyright (C) 2018 all contributors <meta@public-inbox.org>
 # License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
 
-# used to interface with a single Xapian partition in V2 repos.
-# See L<public-inbox-v2-format(5)> for more info on how we partition Xapian
+# used to interface with a single Xapian shard in V2 repos.
+# See L<public-inbox-v2-format(5)> for more info on how we shard Xapian
 package PublicInbox::SearchIdxPart;
 use strict;
 use warnings;
@@ -47,7 +47,7 @@ sub spawn_worker {
 
 sub partition_worker_loop ($$$$) {
 	my ($self, $r, $part, $bnote) = @_;
-	$0 = "pi-v2-partition[$part]";
+	$0 = "pi-v2-shard[$part]";
 	my $current_info = '';
 	my $warn_cb = $SIG{__WARN__} || sub { print STDERR @_ };
 	local $SIG{__WARN__} = sub {
@@ -89,7 +89,7 @@ sub index_raw {
 	my ($self, $bytes, $msgref, $artnum, $oid, $mid0, $mime) = @_;
 	if (my $w = $self->{w}) {
 		print $w "$bytes $artnum $oid $mid0\n", $$msgref or die
-			"failed to write partition $!\n";
+			"failed to write shard $!\n";
 		$w->flush or die "failed to flush: $!\n";
 	} else {
 		$$msgref = undef;
-- 
EW



* [PATCH 06/20] v2writable: count_partitions => count_shards
  2019-06-15  8:46 [PATCH 00/20] v2: use consistent terminology Eric Wong
                   ` (4 preceding siblings ...)
  2019-06-15  8:47 ` [PATCH 05/20] searchidxpart: start using "shard" in user-visible places Eric Wong
@ 2019-06-15  8:47 ` Eric Wong
  2019-06-15  8:47 ` [PATCH 07/20] v2writable: rename {partitions} field to {shards} Eric Wong
                   ` (13 subsequent siblings)
  19 siblings, 0 replies; 21+ messages in thread
From: Eric Wong @ 2019-06-15  8:47 UTC
  To: meta

Another step towards becoming consistent with Xapian terminology.
---
 lib/PublicInbox/V2Writable.pm | 6 +++---
 lib/PublicInbox/Xapcmd.pm     | 4 ++--
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/lib/PublicInbox/V2Writable.pm b/lib/PublicInbox/V2Writable.pm
index db905f9..03e6e95 100644
--- a/lib/PublicInbox/V2Writable.pm
+++ b/lib/PublicInbox/V2Writable.pm
@@ -52,7 +52,7 @@ sub nproc_parts ($) {
 	$n < 1 ? 1 : $n;
 }
 
-sub count_partitions ($) {
+sub count_shards ($) {
 	my ($self) = @_;
 	my $nparts = 0;
 	my $xpfx = $self->{xpfx};
@@ -103,7 +103,7 @@ sub new {
 		rotate_bytes => int((1024 * 1024 * 1024) / $PACKING_FACTOR),
 		last_commit => [], # git repo -> commit
 	};
-	$self->{partitions} = count_partitions($self) || nproc_parts($creat);
+	$self->{partitions} = count_shards($self) || nproc_parts($creat);
 	bless $self, $class;
 }
 
@@ -289,7 +289,7 @@ sub idx_init {
 		$over->create;
 
 		# xcpdb can change shard count while -watch is idle
-		my $nparts = count_partitions($self);
+		my $nparts = count_shards($self);
 		if ($nparts && $nparts != $self->{partitions}) {
 			$self->{partitions} = $nparts;
 		}
diff --git a/lib/PublicInbox/Xapcmd.pm b/lib/PublicInbox/Xapcmd.pm
index e303da9..89bacc5 100644
--- a/lib/PublicInbox/Xapcmd.pm
+++ b/lib/PublicInbox/Xapcmd.pm
@@ -63,9 +63,9 @@ sub commit_changes ($$$) {
 	if (!$opt->{-coarse_lock}) {
 		$opt->{-skip_lock} = 1;
 
-		if ($im->can('count_partitions')) {
+		if ($im->can('count_shards')) {
 			my $pr = $opt->{-progress};
-			my $n = $im->count_partitions;
+			my $n = $im->count_shards;
 			if (defined $new_parts && $n != $new_parts) {
 				die
 "BUG: counted $n shards after resharding to $new_parts";
-- 
EW



* [PATCH 07/20] v2writable: rename {partitions} field to {shards}
  2019-06-15  8:46 [PATCH 00/20] v2: use consistent terminology Eric Wong
                   ` (5 preceding siblings ...)
  2019-06-15  8:47 ` [PATCH 06/20] v2writable: count_partitions => count_shards Eric Wong
@ 2019-06-15  8:47 ` Eric Wong
  2019-06-15  8:47 ` [PATCH 08/20] tests: change messages to use "shard" instead of partition Eric Wong
                   ` (12 subsequent siblings)
  19 siblings, 0 replies; 21+ messages in thread
From: Eric Wong @ 2019-06-15  8:47 UTC
  To: meta

Our internal data structure should be consistent with Xapian
terminology.
---
 lib/PublicInbox/Admin.pm      |  2 +-
 lib/PublicInbox/V2Writable.pm | 10 +++++-----
 lib/PublicInbox/Xapcmd.pm     |  4 ++--
 t/v2writable.t                |  4 ++--
 4 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/lib/PublicInbox/Admin.pm b/lib/PublicInbox/Admin.pm
index 5549b85..29388ad 100644
--- a/lib/PublicInbox/Admin.pm
+++ b/lib/PublicInbox/Admin.pm
@@ -204,7 +204,7 @@ sub index_inbox {
 			if ($jobs == 0) {
 				$v2w->{parallel} = 0;
 			} else {
-				my $n = $v2w->{partitions};
+				my $n = $v2w->{shards};
 				if ($jobs != ($n + 1)) {
 					warn
 "Unable to respect --jobs=$jobs, inbox was created with $n shards\n";
diff --git a/lib/PublicInbox/V2Writable.pm b/lib/PublicInbox/V2Writable.pm
index 03e6e95..aa13aa8 100644
--- a/lib/PublicInbox/V2Writable.pm
+++ b/lib/PublicInbox/V2Writable.pm
@@ -103,7 +103,7 @@ sub new {
 		rotate_bytes => int((1024 * 1024 * 1024) / $PACKING_FACTOR),
 		last_commit => [], # git repo -> commit
 	};
-	$self->{partitions} = count_shards($self) || nproc_parts($creat);
+	$self->{shards} = count_shards($self) || nproc_parts($creat);
 	bless $self, $class;
 }
 
@@ -134,7 +134,7 @@ sub add {
 sub do_idx ($$$$$$$) {
 	my ($self, $msgref, $mime, $len, $num, $oid, $mid0) = @_;
 	$self->{over}->add_overview($mime, $len, $num, $oid, $mid0);
-	my $npart = $self->{partitions};
+	my $npart = $self->{shards};
 	my $part = $num % $npart;
 	my $idx = idx_part($self, $part);
 	$idx->index_raw($len, $msgref, $num, $oid, $mid0, $mime);
@@ -290,12 +290,12 @@ sub idx_init {
 
 		# xcpdb can change shard count while -watch is idle
 		my $nparts = count_shards($self);
-		if ($nparts && $nparts != $self->{partitions}) {
-			$self->{partitions} = $nparts;
+		if ($nparts && $nparts != $self->{shards}) {
+			$self->{shards} = $nparts;
 		}
 
 		# need to create all parts before initializing msgmap FD
-		my $max = $self->{partitions} - 1;
+		my $max = $self->{shards} - 1;
 
 		# idx_parts must be visible to all forked processes
 		my $idx = $self->{idx_parts} = [];
diff --git a/lib/PublicInbox/Xapcmd.pm b/lib/PublicInbox/Xapcmd.pm
index 89bacc5..322d827 100644
--- a/lib/PublicInbox/Xapcmd.pm
+++ b/lib/PublicInbox/Xapcmd.pm
@@ -70,10 +70,10 @@ sub commit_changes ($$$) {
 				die
 "BUG: counted $n shards after resharding to $new_parts";
 			}
-			my $prev = $im->{partitions};
+			my $prev = $im->{shards};
 			if ($pr && $prev != $n) {
 				$pr->("shard count changed: $prev => $n\n");
-				$im->{partitions} = $n;
+				$im->{shards} = $n;
 			}
 		}
 
diff --git a/t/v2writable.t b/t/v2writable.t
index b0f88d2..88df2d6 100644
--- a/t/v2writable.t
+++ b/t/v2writable.t
@@ -34,7 +34,7 @@ my $mime = PublicInbox::MIME->create(
 );
 
 my $im = PublicInbox::V2Writable->new($ibx, {nproc => 1});
-is($im->{partitions}, 1, 'one partition when forced');
+is($im->{shards}, 1, 'one shard when forced');
 ok($im->add($mime), 'ordinary message added');
 foreach my $f ("$mainrepo/msgmap.sqlite3",
 		glob("$mainrepo/xap*/*"),
@@ -199,7 +199,7 @@ EOF
 	my @before = $git0->qx(@log, qw(--pretty=oneline));
 	my $before = $git0->qx(@log, qw(--pretty=raw --raw -r));
 	$im = PublicInbox::V2Writable->new($ibx, {nproc => 2});
-	is($im->{partitions}, 1, 'detected single partition from previous');
+	is($im->{shards}, 1, 'detected single shard from previous');
 	my $smsg = $im->remove($mime, 'test removal');
 	$im->done;
 	my @after = $git0->qx(@log, qw(--pretty=oneline));
-- 
EW
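
The {shards} field renamed above drives how do_idx spreads messages
across Xapian shards: the article number modulo the shard count.  A
minimal sketch of that mapping follows; shard_for is a hypothetical
name, since do_idx computes the value inline.

```perl
use strict;
use warnings;

# Hypothetical helper: do_idx computes the same value inline as
# "$num % $npart", where $npart is $self->{shards}.
sub shard_for {
	my ($num, $nshards) = @_;
	$num % $nshards;
}

my $nshards = 3;
for my $num (1..6) {
	printf "article %d => shard %d\n", $num, shard_for($num, $nshards);
}
```

Because the mapping depends on the shard count, changing the count
means existing documents have to be redistributed, which is the job of
"xcpdb --reshard".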



* [PATCH 08/20] tests: change messages to use "shard" instead of partition
  2019-06-15  8:46 [PATCH 00/20] v2: use consistent terminology Eric Wong
                   ` (6 preceding siblings ...)
  2019-06-15  8:47 ` [PATCH 07/20] v2writable: rename {partitions} field to {shards} Eric Wong
@ 2019-06-15  8:47 ` Eric Wong
  2019-06-15  8:47 ` [PATCH 09/20] inboxwritable: s/partitions/shards/ in local var Eric Wong
                   ` (11 subsequent siblings)
  19 siblings, 0 replies; 21+ messages in thread
From: Eric Wong @ 2019-06-15  8:47 UTC
  To: meta

Another potentially user-facing piece made consistent with
Xapian terminology.
---
 t/indexlevels-mirror.t | 2 +-
 t/psgi_v2.t            | 2 +-
 t/xcpdb-reshard.t      | 6 +++---
 3 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/t/indexlevels-mirror.t b/t/indexlevels-mirror.t
index 3597494..b685da1 100644
--- a/t/indexlevels-mirror.t
+++ b/t/indexlevels-mirror.t
@@ -138,7 +138,7 @@ sub import_index_incremental {
 
 	if ($v == 2 && $level eq 'basic') {
 		is_deeply([glob("$ibx->{mainrepo}/xap*/?/")], [],
-			 'no Xapian partition directories for v2 basic');
+			 'no Xapian shard directories for v2 basic');
 	}
 	if ($level ne 'basic') {
 		($nr, $msgs) = $ro_mirror->search->reopen->query('m:m@2');
diff --git a/t/psgi_v2.t b/t/psgi_v2.t
index 5c358cd..3601068 100644
--- a/t/psgi_v2.t
+++ b/t/psgi_v2.t
@@ -205,7 +205,7 @@ test_psgi(sub { $www->call(@_) }, sub {
 	$res = $cb->(GET('/v2test/0.git/info/refs'));
 	is($res->code, 200, 'got info refs for dumb clones w/ .git suffix');
 	$res = $cb->(GET('/v2test/info/refs'));
-	is($res->code, 404, 'unpartitioned git URL fails');
+	is($res->code, 404, 'v2 git URL w/o shard fails');
 
 	# ensure conflicted attachments can be resolved
 	foreach my $body (qw(old new)) {
diff --git a/t/xcpdb-reshard.t b/t/xcpdb-reshard.t
index ce552f5..bf56404 100644
--- a/t/xcpdb-reshard.t
+++ b/t/xcpdb-reshard.t
@@ -48,14 +48,14 @@ is(scalar(@parts), $nproc, 'got expected parts');
 my $orig = $ibx->over->query_xover(1, $ndoc);
 my %nums = map {; "$_->{num}" => 1 } @$orig;
 
-# ensure we can go up or down in partitions, or stay the same:
+# ensure we can go up or down in shards, or stay the same:
 for my $R (qw(2 4 1 3 3)) {
 	delete $ibx->{search}; # release old handles
 	is(system(@xcpdb, "-R$R", $ibx->{mainrepo}), 0, "xcpdb -R$R");
 	my @new_parts = grep(m!/\d+\z!, glob("$ibx->{mainrepo}/xap*/*"));
-	is(scalar(@new_parts), $R, 'repartitioned to two parts');
+	is(scalar(@new_parts), $R, 'resharded to two parts');
 	my $msgs = $ibx->search->query('s:this');
-	is(scalar(@$msgs), $ndoc, 'got expected docs after repartitioning');
+	is(scalar(@$msgs), $ndoc, 'got expected docs after resharding');
 	my %by_mid = map {; "$_->{mid}" => $_ } @$msgs;
 	ok($by_mid{"m$_\@example.com"}, "$_ exists") for (1..$ndoc);
 
-- 
EW



* [PATCH 09/20] inboxwritable: s/partitions/shards/ in local var
  2019-06-15  8:46 [PATCH 00/20] v2: use consistent terminology Eric Wong
                   ` (7 preceding siblings ...)
  2019-06-15  8:47 ` [PATCH 08/20] tests: change messages to use "shard" instead of partition Eric Wong
@ 2019-06-15  8:47 ` Eric Wong
  2019-06-15  8:47 ` [PATCH 10/20] v2: rename SearchIdxPart => SearchIdxShard Eric Wong
                   ` (10 subsequent siblings)
  19 siblings, 0 replies; 21+ messages in thread
From: Eric Wong @ 2019-06-15  8:47 UTC
  To: meta

More work towards being consistent with Xapian's own terminology.
---
 lib/PublicInbox/InboxWritable.pm | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/lib/PublicInbox/InboxWritable.pm b/lib/PublicInbox/InboxWritable.pm
index 116f423..f00141d 100644
--- a/lib/PublicInbox/InboxWritable.pm
+++ b/lib/PublicInbox/InboxWritable.pm
@@ -31,7 +31,7 @@ sub new {
 }
 
 sub init_inbox {
-	my ($self, $partitions, $skip_epoch, $skip_artnum) = @_;
+	my ($self, $shards, $skip_epoch, $skip_artnum) = @_;
 	# TODO: honor skip_artnum
 	my $v = $self->{version} || 1;
 	if ($v == 1) {
@@ -39,7 +39,7 @@ sub init_inbox {
 		PublicInbox::Import::init_bare($dir);
 	} else {
 		my $v2w = importer($self);
-		$v2w->init_inbox($partitions, $skip_epoch, $skip_artnum);
+		$v2w->init_inbox($shards, $skip_epoch, $skip_artnum);
 	}
 }
 
-- 
EW



* [PATCH 10/20] v2: rename SearchIdxPart => SearchIdxShard
  2019-06-15  8:46 [PATCH 00/20] v2: use consistent terminology Eric Wong
                   ` (8 preceding siblings ...)
  2019-06-15  8:47 ` [PATCH 09/20] inboxwritable: s/partitions/shards/ in local var Eric Wong
@ 2019-06-15  8:47 ` Eric Wong
  2019-06-15  8:47 ` [PATCH 11/20] xapcmd: update comments referencing "partitions" Eric Wong
                   ` (9 subsequent siblings)
  19 siblings, 0 replies; 21+ messages in thread
From: Eric Wong @ 2019-06-15  8:47 UTC
  To: meta

Another step towards keeping our file and package names
consistent with Xapian terminology.
---
 MANIFEST                                      |  2 +-
 .../{SearchIdxPart.pm => SearchIdxShard.pm}   | 24 +++++++++----------
 lib/PublicInbox/V2Writable.pm                 |  4 ++--
 3 files changed, 15 insertions(+), 15 deletions(-)
 rename lib/PublicInbox/{SearchIdxPart.pm => SearchIdxShard.pm} (83%)

diff --git a/MANIFEST b/MANIFEST
index 3f0a79a..c769397 100644
--- a/MANIFEST
+++ b/MANIFEST
@@ -119,7 +119,7 @@ lib/PublicInbox/Reply.pm
 lib/PublicInbox/SaPlugin/ListMirror.pm
 lib/PublicInbox/Search.pm
 lib/PublicInbox/SearchIdx.pm
-lib/PublicInbox/SearchIdxPart.pm
+lib/PublicInbox/SearchIdxShard.pm
 lib/PublicInbox/SearchMsg.pm
 lib/PublicInbox/SearchThread.pm
 lib/PublicInbox/SearchView.pm
diff --git a/lib/PublicInbox/SearchIdxPart.pm b/lib/PublicInbox/SearchIdxShard.pm
similarity index 83%
rename from lib/PublicInbox/SearchIdxPart.pm
rename to lib/PublicInbox/SearchIdxShard.pm
index 77fb7d9..15ec657 100644
--- a/lib/PublicInbox/SearchIdxPart.pm
+++ b/lib/PublicInbox/SearchIdxShard.pm
@@ -3,23 +3,23 @@
 
 # used to interface with a single Xapian shard in V2 repos.
 # See L<public-inbox-v2-format(5)> for more info on how we shard Xapian
-package PublicInbox::SearchIdxPart;
+package PublicInbox::SearchIdxShard;
 use strict;
 use warnings;
 use base qw(PublicInbox::SearchIdx);
 
 sub new {
-	my ($class, $v2writable, $part) = @_;
-	my $self = $class->SUPER::new($v2writable->{-inbox}, 1, $part);
+	my ($class, $v2writable, $shard) = @_;
+	my $self = $class->SUPER::new($v2writable->{-inbox}, 1, $shard);
 	# create the DB before forking:
 	$self->_xdb_acquire;
 	$self->_xdb_release;
-	$self->spawn_worker($v2writable, $part) if $v2writable->{parallel};
+	$self->spawn_worker($v2writable, $shard) if $v2writable->{parallel};
 	$self;
 }
 
 sub spawn_worker {
-	my ($self, $v2writable, $part) = @_;
+	my ($self, $v2writable, $shard) = @_;
 	my ($r, $w);
 	pipe($r, $w) or die "pipe failed: $!\n";
 	binmode $r, ':raw';
@@ -35,8 +35,8 @@ sub spawn_worker {
 		# speeds V2Writable batch imports across 8 cores by nearly 20%
 		fcntl($r, 1031, 1048576) if $^O eq 'linux';
 
-		eval { partition_worker_loop($self, $r, $part, $bnote) };
-		die "worker $part died: $@\n" if $@;
+		eval { shard_worker_loop($self, $r, $shard, $bnote) };
+		die "worker $shard died: $@\n" if $@;
 		die "unexpected MM $self->{mm}" if $self->{mm};
 		exit;
 	}
@@ -45,14 +45,14 @@ sub spawn_worker {
 	close $r or die "failed to close: $!";
 }
 
-sub partition_worker_loop ($$$$) {
-	my ($self, $r, $part, $bnote) = @_;
-	$0 = "pi-v2-shard[$part]";
+sub shard_worker_loop ($$$$) {
+	my ($self, $r, $shard, $bnote) = @_;
+	$0 = "pi-v2-shard[$shard]";
 	my $current_info = '';
 	my $warn_cb = $SIG{__WARN__} || sub { print STDERR @_ };
 	local $SIG{__WARN__} = sub {
 		chomp $current_info;
-		$warn_cb->("[$part] $current_info: ", @_);
+		$warn_cb->("[$shard] $current_info: ", @_);
 	};
 	$self->begin_txn_lazy;
 	while (my $line = $r->getline) {
@@ -64,7 +64,7 @@ sub partition_worker_loop ($$$$) {
 		} elsif ($line eq "barrier\n") {
 			$self->commit_txn_lazy;
 			# no need to lock < 512 bytes is atomic under POSIX
-			print $bnote "barrier $part\n" or
+			print $bnote "barrier $shard\n" or
 					die "write failed for barrier $!\n";
 		} elsif ($line =~ /\AD ([a-f0-9]{40,}) (.+)\n\z/s) {
 			my ($oid, $mid) = ($1, $2);
diff --git a/lib/PublicInbox/V2Writable.pm b/lib/PublicInbox/V2Writable.pm
index aa13aa8..cc9ebfe 100644
--- a/lib/PublicInbox/V2Writable.pm
+++ b/lib/PublicInbox/V2Writable.pm
@@ -7,7 +7,7 @@ package PublicInbox::V2Writable;
 use strict;
 use warnings;
 use base qw(PublicInbox::Lock);
-use PublicInbox::SearchIdxPart;
+use PublicInbox::SearchIdxShard;
 use PublicInbox::MIME;
 use PublicInbox::Git;
 use PublicInbox::Import;
@@ -300,7 +300,7 @@ sub idx_init {
 		# idx_parts must be visible to all forked processes
 		my $idx = $self->{idx_parts} = [];
 		for my $i (0..$max) {
-			push @$idx, PublicInbox::SearchIdxPart->new($self, $i);
+			push @$idx, PublicInbox::SearchIdxShard->new($self, $i);
 		}
 
 		# Now that all subprocesses are up, we can open the FDs
-- 
EW


* [PATCH 11/20] xapcmd: update comments referencing "partitions"
  2019-06-15  8:46 [PATCH 00/20] v2: use consistent terminology Eric Wong
                   ` (9 preceding siblings ...)
  2019-06-15  8:47 ` [PATCH 10/20] v2: rename SearchIdxPart => SearchIdxShard Eric Wong
@ 2019-06-15  8:47 ` Eric Wong
  2019-06-15  8:47 ` [PATCH 12/20] search*: rename {partition} => {shard} Eric Wong
                   ` (8 subsequent siblings)
  19 siblings, 0 replies; 21+ messages in thread
From: Eric Wong @ 2019-06-15  8:47 UTC (permalink / raw)
  To: meta

Update comments that still say "partition" so we don't confuse
future readers of our code.
---
 lib/PublicInbox/Xapcmd.pm | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/lib/PublicInbox/Xapcmd.pm b/lib/PublicInbox/Xapcmd.pm
index 322d827..5e4ac87 100644
--- a/lib/PublicInbox/Xapcmd.pm
+++ b/lib/PublicInbox/Xapcmd.pm
@@ -40,7 +40,7 @@ sub commit_changes ($$$) {
 			$over = undef;
 		}
 
-		if (!defined($new)) { # culled partition
+		if (!defined($new)) { # culled shard
 			push @old_part, $old;
 			next;
 		}
@@ -359,7 +359,7 @@ sub cpdb ($$) {
 		$new_parts = $opt->{reshard};
 		defined $new_parts or die 'BUG: got array src w/o --reshard';
 
-		# repartitioning, M:N copy means have full read access
+		# resharding, M:N copy means have full read access
 		foreach (@$old) {
 			if ($src) {
 				my $sub = Search::Xapian::Database->new($_);
@@ -397,7 +397,7 @@ sub cpdb ($$) {
 			my $lc = $src->get_metadata('last_commit');
 			$dst->set_metadata('last_commit', $lc) if $lc;
 
-			# only the first xapian partition (0) gets 'indexlevel'
+			# only the first xapian shard (0) gets 'indexlevel'
 			if ($new =~ m!(?:xapian[0-9]+|xap[0-9]+/0)\b!) {
 				my $l = $src->get_metadata('indexlevel');
 				if ($l eq 'medium') {
@@ -407,7 +407,7 @@ sub cpdb ($$) {
 			if ($pr_data) {
 				my $tot = $src->get_doccount;
 
-				# we can only estimate when repartitioning,
+				# we can only estimate when resharding,
 				# because removed spam causes slight imbalance
 				my $est = '';
 				if (defined $cur_part && $new_parts > 1) {
@@ -459,7 +459,7 @@ sub new {
 	# http://www.tldp.org/LDP/abs/html/exitcodes.html
 	$SIG{INT} = sub { exit(130) };
 	$SIG{HUP} = $SIG{PIPE} = $SIG{TERM} = sub { exit(1) };
-	my $self = bless {}, $_[0]; # old partition => new (tmp) partition
+	my $self = bless {}, $_[0]; # old shard => new (WIP) shard
 	$owner{"$self"} = $$;
 	$self;
 }
@@ -481,7 +481,7 @@ sub DESTROY {
 	my $owner_pid = delete $owner{"$self"} or return;
 	return if $owner_pid != $$;
 	foreach my $new (values %$self) {
-		defined $new or next; # may be undef if repartitioning
+		defined $new or next; # may be undef if resharding
 		remove_tree($new) unless -d "$new/old";
 	}
 	done($self);
-- 
EW


* [PATCH 12/20] search*: rename {partition} => {shard}
  2019-06-15  8:46 [PATCH 00/20] v2: use consistent terminology Eric Wong
                   ` (10 preceding siblings ...)
  2019-06-15  8:47 ` [PATCH 11/20] xapcmd: update comments referencing "partitions" Eric Wong
@ 2019-06-15  8:47 ` Eric Wong
  2019-06-15  8:47 ` [PATCH 13/20] v2writable: avoid "part" in internal subs and fields Eric Wong
                   ` (7 subsequent siblings)
  19 siblings, 0 replies; 21+ messages in thread
From: Eric Wong @ 2019-06-15  8:47 UTC (permalink / raw)
  To: meta

Another step towards keeping our internal data structures
consistent with Xapian naming.
---
 lib/PublicInbox/Search.pm    |  6 +++---
 lib/PublicInbox/SearchIdx.pm | 17 +++++++++--------
 2 files changed, 12 insertions(+), 11 deletions(-)

diff --git a/lib/PublicInbox/Search.pm b/lib/PublicInbox/Search.pm
index 098c97c..45431ec 100644
--- a/lib/PublicInbox/Search.pm
+++ b/lib/PublicInbox/Search.pm
@@ -131,9 +131,9 @@ sub xdir ($;$) {
 		my $dir = "$self->{mainrepo}/xap" . SCHEMA_VERSION;
 		return $dir if $rdonly;
 
-		my $part = $self->{partition};
-		defined $part or die "partition not given";
-		$dir .= "/$part";
+		my $shard = $self->{shard};
+		defined $shard or die "shard not given";
+		$dir .= "/$shard";
 	}
 }
 
diff --git a/lib/PublicInbox/SearchIdx.pm b/lib/PublicInbox/SearchIdx.pm
index a088ce7..58b2337 100644
--- a/lib/PublicInbox/SearchIdx.pm
+++ b/lib/PublicInbox/SearchIdx.pm
@@ -29,7 +29,7 @@ use constant {
 my $xapianlevels = qr/\A(?:full|medium)\z/;
 
 sub new {
-	my ($class, $ibx, $creat, $part) = @_;
+	my ($class, $ibx, $creat, $shard) = @_;
 	ref $ibx or die "BUG: expected PublicInbox::Inbox object: $ibx";
 	my $levels = qr/\A(?:full|medium|basic)\z/;
 	my $mainrepo = $ibx->{mainrepo};
@@ -62,9 +62,9 @@ sub new {
 		my $dir = $self->xdir;
 		$self->{over} = PublicInbox::OverIdx->new("$dir/over.sqlite3");
 	} elsif ($version == 2) {
-		defined $part or die "partition is required for v2\n";
-		# partition is a number
-		$self->{partition} = $part;
+		defined $shard or die "shard is required for v2\n";
+		# shard is a number
+		$self->{shard} = $shard;
 		$self->{lock_path} = undef;
 	} else {
 		die "unsupported inbox version=$version\n";
@@ -102,8 +102,8 @@ sub _xdb_acquire {
 		$self->lock_acquire;
 
 		# don't create empty Xapian directories if we don't need Xapian
-		my $is_part = defined($self->{partition});
-		if (!$is_part || ($is_part && need_xapian($self))) {
+		my $is_shard = defined($self->{shard});
+		if (!$is_shard || ($is_shard && need_xapian($self))) {
 			File::Path::mkpath($dir);
 		}
 	}
@@ -824,9 +824,10 @@ sub commit_txn_lazy {
 	$self->{-inbox}->with_umask(sub {
 		if (my $xdb = $self->{xdb}) {
 
-			# store 'indexlevel=medium' in v2 part=0 and v1 (only part)
+			# store 'indexlevel=medium' in v2 shard=0 and
+			# v1 (only one shard)
 			# This metadata is read by Admin::detect_indexlevel:
-			if (!$self->{partition} # undef or 0, not >0
+			if (!$self->{shard} # undef or 0, not >0
 			    && $self->{indexlevel} eq 'medium') {
 				$xdb->set_metadata('indexlevel', 'medium');
 			}
-- 
EW


* [PATCH 13/20] v2writable: avoid "part" in internal subs and fields
  2019-06-15  8:46 [PATCH 00/20] v2: use consistent terminology Eric Wong
                   ` (11 preceding siblings ...)
  2019-06-15  8:47 ` [PATCH 12/20] search*: rename {partition} => {shard} Eric Wong
@ 2019-06-15  8:47 ` Eric Wong
  2019-06-15  8:47 ` [PATCH 14/20] v2writable: rename local vars to match Xapian terminology Eric Wong
                   ` (6 subsequent siblings)
  19 siblings, 0 replies; 21+ messages in thread
From: Eric Wong @ 2019-06-15  8:47 UTC (permalink / raw)
  To: meta

We'll be using the term "shard" from now on to be consistent
with Xapian terminology.
---
 lib/PublicInbox/V2Writable.pm | 28 ++++++++++++++--------------
 1 file changed, 14 insertions(+), 14 deletions(-)

diff --git a/lib/PublicInbox/V2Writable.pm b/lib/PublicInbox/V2Writable.pm
index cc9ebfe..a231390 100644
--- a/lib/PublicInbox/V2Writable.pm
+++ b/lib/PublicInbox/V2Writable.pm
@@ -28,10 +28,10 @@ my $PACKING_FACTOR = 0.4;
 # waste of FDs and space.  It can also lead to excessive IO latency
 # and slow things down.  Users on NVME or other fast storage can
 # use the NPROC env or switches in our script/public-inbox-* programs
-# to increase Xapian partitions.
+# to increase Xapian shards
 our $NPROC_MAX_DEFAULT = 4;
 
-sub nproc_parts ($) {
+sub nproc_shards ($) {
 	my ($creat_opt) = @_;
 	if (ref($creat_opt) eq 'HASH') {
 		if (defined(my $n = $creat_opt->{nproc})) {
@@ -103,7 +103,7 @@ sub new {
 		rotate_bytes => int((1024 * 1024 * 1024) / $PACKING_FACTOR),
 		last_commit => [], # git repo -> commit
 	};
-	$self->{shards} = count_shards($self) || nproc_parts($creat);
+	$self->{shards} = count_shards($self) || nproc_shards($creat);
 	bless $self, $class;
 }
 
@@ -136,7 +136,7 @@ sub do_idx ($$$$$$$) {
 	$self->{over}->add_overview($mime, $len, $num, $oid, $mid0);
 	my $npart = $self->{shards};
 	my $part = $num % $npart;
-	my $idx = idx_part($self, $part);
+	my $idx = idx_shard($self, $part);
 	$idx->index_raw($len, $msgref, $num, $oid, $mid0, $mime);
 	my $n = $self->{transact_bytes} += $len;
 	$n >= (PublicInbox::SearchIdx::BATCH_BYTES * $npart);
@@ -252,15 +252,15 @@ sub num_for_harder {
 	$num;
 }
 
-sub idx_part {
+sub idx_shard {
 	my ($self, $part) = @_;
-	$self->{idx_parts}->[$part];
+	$self->{idx_shards}->[$part];
 }
 
 # idempotent
 sub idx_init {
 	my ($self, $opt) = @_;
-	return if $self->{idx_parts};
+	return if $self->{idx_shards};
 	my $ibx = $self->{-inbox};
 
 	# do not leak read-only FDs to child processes, we only have these
@@ -297,8 +297,8 @@ sub idx_init {
 		# need to create all parts before initializing msgmap FD
 		my $max = $self->{shards} - 1;
 
-		# idx_parts must be visible to all forked processes
-		my $idx = $self->{idx_parts} = [];
+		# idx_shards must be visible to all forked processes
+		my $idx = $self->{idx_shards} = [];
 		for my $i (0..$max) {
 			push @$idx, PublicInbox::SearchIdxShard->new($self, $i);
 		}
@@ -370,7 +370,7 @@ sub rewrite_internal ($$;$$$) {
 	}
 	my $over = $self->{over};
 	my $cids = content_ids($old_mime);
-	my $parts = $self->{idx_parts};
+	my $parts = $self->{idx_shards};
 	my $removed;
 	my $mids = mids($old_mime->header_obj);
 
@@ -605,7 +605,7 @@ sub checkpoint ($;$) {
 			$im->checkpoint;
 		}
 	}
-	my $parts = $self->{idx_parts};
+	my $parts = $self->{idx_shards};
 	if ($parts) {
 		my $dbh = $self->{mm}->{dbh};
 
@@ -652,7 +652,7 @@ sub done {
 	checkpoint($self);
 	my $mm = delete $self->{mm};
 	$mm->{dbh}->commit if $mm;
-	my $parts = delete $self->{idx_parts};
+	my $parts = delete $self->{idx_shards};
 	if ($parts) {
 		$_->remote_close for @$parts;
 	}
@@ -827,7 +827,7 @@ sub atfork_child {
 	my ($self) = @_;
 	my $fh = delete $self->{reindex_pipe};
 	close $fh if $fh;
-	if (my $parts = $self->{idx_parts}) {
+	if (my $parts = $self->{idx_shards}) {
 		$_->atfork_child foreach @$parts;
 	}
 	if (my $im = $self->{im}) {
@@ -1051,7 +1051,7 @@ sub sync_prepare ($$$) {
 
 sub unindex_oid_remote ($$$) {
 	my ($self, $oid, $mid) = @_;
-	$_->remote_remove($oid, $mid) foreach @{$self->{idx_parts}};
+	$_->remote_remove($oid, $mid) foreach @{$self->{idx_shards}};
 	$self->{over}->remove_oid($oid, $mid);
 }
 
-- 
EW


* [PATCH 14/20] v2writable: rename local vars to match Xapian terminology
  2019-06-15  8:46 [PATCH 00/20] v2: use consistent terminology Eric Wong
                   ` (12 preceding siblings ...)
  2019-06-15  8:47 ` [PATCH 13/20] v2writable: avoid "part" in internal subs and fields Eric Wong
@ 2019-06-15  8:47 ` Eric Wong
  2019-06-15  8:47 ` [PATCH 15/20] adminedit: "part" => "shard" for local variables Eric Wong
                   ` (5 subsequent siblings)
  19 siblings, 0 replies; 21+ messages in thread
From: Eric Wong @ 2019-06-15  8:47 UTC (permalink / raw)
  To: meta

---
 lib/PublicInbox/V2Writable.pm | 53 +++++++++++++++++------------------
 1 file changed, 25 insertions(+), 28 deletions(-)

diff --git a/lib/PublicInbox/V2Writable.pm b/lib/PublicInbox/V2Writable.pm
index a231390..502824c 100644
--- a/lib/PublicInbox/V2Writable.pm
+++ b/lib/PublicInbox/V2Writable.pm
@@ -54,22 +54,22 @@ sub nproc_shards ($) {
 
 sub count_shards ($) {
 	my ($self) = @_;
-	my $nparts = 0;
+	my $n = 0;
 	my $xpfx = $self->{xpfx};
 
 	# always load existing partitions in case core count changes:
 	# Also, shard count may change while -watch is running
 	# due to "xcpdb --reshard"
 	if (-d $xpfx) {
-		foreach my $part (<$xpfx/*>) {
-			-d $part && $part =~ m!/[0-9]+\z! or next;
+		foreach my $shard (<$xpfx/*>) {
+			-d $shard && $shard =~ m!/[0-9]+\z! or next;
 			eval {
-				Search::Xapian::Database->new($part)->close;
-				$nparts++;
+				Search::Xapian::Database->new($shard)->close;
+				$n++;
 			};
 		}
 	}
-	$nparts;
+	$n;
 }
 
 sub new {
@@ -134,12 +134,10 @@ sub add {
 sub do_idx ($$$$$$$) {
 	my ($self, $msgref, $mime, $len, $num, $oid, $mid0) = @_;
 	$self->{over}->add_overview($mime, $len, $num, $oid, $mid0);
-	my $npart = $self->{shards};
-	my $part = $num % $npart;
-	my $idx = idx_shard($self, $part);
+	my $idx = idx_shard($self, $num % $self->{shards});
 	$idx->index_raw($len, $msgref, $num, $oid, $mid0, $mime);
 	my $n = $self->{transact_bytes} += $len;
-	$n >= (PublicInbox::SearchIdx::BATCH_BYTES * $npart);
+	$n >= (PublicInbox::SearchIdx::BATCH_BYTES * $self->{shards});
 }
 
 sub _add {
@@ -253,8 +251,8 @@ sub num_for_harder {
 }
 
 sub idx_shard {
-	my ($self, $part) = @_;
-	$self->{idx_shards}->[$part];
+	my ($self, $shard_i) = @_;
+	$self->{idx_shards}->[$shard_i];
 }
 
 # idempotent
@@ -289,9 +287,9 @@ sub idx_init {
 		$over->create;
 
 		# xcpdb can change shard count while -watch is idle
-		my $nparts = count_shards($self);
-		if ($nparts && $nparts != $self->{shards}) {
-			$self->{shards} = $nparts;
+		my $nshards = count_shards($self);
+		if ($nshards && $nshards != $self->{shards}) {
+			$self->{shards} = $nshards;
 		}
 
 		# need to create all parts before initializing msgmap FD
@@ -370,7 +368,6 @@ sub rewrite_internal ($$;$$$) {
 	}
 	my $over = $self->{over};
 	my $cids = content_ids($old_mime);
-	my $parts = $self->{idx_shards};
 	my $removed;
 	my $mids = mids($old_mime->header_obj);
 
@@ -590,7 +587,7 @@ sub barrier_wait {
 	while (scalar keys %$barrier) {
 		defined(my $l = $r->getline) or die "EOF on barrier_wait: $!";
 		$l =~ /\Abarrier (\d+)/ or die "bad line on barrier_wait: $l";
-		delete $barrier->{$1} or die "bad part[$1] on barrier wait";
+		delete $barrier->{$1} or die "bad shard[$1] on barrier wait";
 	}
 }
 
@@ -605,8 +602,8 @@ sub checkpoint ($;$) {
 			$im->checkpoint;
 		}
 	}
-	my $parts = $self->{idx_shards};
-	if ($parts) {
+	my $shards = $self->{idx_shards};
+	if ($shards) {
 		my $dbh = $self->{mm}->{dbh};
 
 		# SQLite msgmap data is second in importance
@@ -617,15 +614,15 @@ sub checkpoint ($;$) {
 
 		# Now deal with Xapian
 		if ($wait) {
-			my $barrier = $self->barrier_init(scalar @$parts);
+			my $barrier = $self->barrier_init(scalar @$shards);
 
 			# each partition needs to issue a barrier command
-			$_->remote_barrier for @$parts;
+			$_->remote_barrier for @$shards;
 
 			# wait for each Xapian partition
 			$self->barrier_wait($barrier);
 		} else {
-			$_->remote_commit for @$parts;
+			$_->remote_commit for @$shards;
 		}
 
 		# last_commit is special, don't commit these until
@@ -652,14 +649,14 @@ sub done {
 	checkpoint($self);
 	my $mm = delete $self->{mm};
 	$mm->{dbh}->commit if $mm;
-	my $parts = delete $self->{idx_shards};
-	if ($parts) {
-		$_->remote_close for @$parts;
+	my $shards = delete $self->{idx_shards};
+	if ($shards) {
+		$_->remote_close for @$shards;
 	}
 	$self->{over}->disconnect;
 	delete $self->{bnote};
 	$self->{transact_bytes} = 0;
-	$self->lock_release if $parts;
+	$self->lock_release if $shards;
 	$self->{-inbox}->git->cleanup;
 }
 
@@ -827,8 +824,8 @@ sub atfork_child {
 	my ($self) = @_;
 	my $fh = delete $self->{reindex_pipe};
 	close $fh if $fh;
-	if (my $parts = $self->{idx_shards}) {
-		$_->atfork_child foreach @$parts;
+	if (my $shards = $self->{idx_shards}) {
+		$_->atfork_child foreach @$shards;
 	}
 	if (my $im = $self->{im}) {
 		$im->atfork_child;
-- 
EW


* [PATCH 15/20] adminedit: "part" => "shard" for local variables
  2019-06-15  8:46 [PATCH 00/20] v2: use consistent terminology Eric Wong
                   ` (13 preceding siblings ...)
  2019-06-15  8:47 ` [PATCH 14/20] v2writable: rename local vars to match Xapian terminology Eric Wong
@ 2019-06-15  8:47 ` Eric Wong
  2019-06-15  8:47 ` [PATCH 16/20] v2writable: use "epoch" consistently when referring to git repos Eric Wong
                   ` (4 subsequent siblings)
  19 siblings, 0 replies; 21+ messages in thread
From: Eric Wong @ 2019-06-15  8:47 UTC (permalink / raw)
  To: meta

---
 lib/PublicInbox/AdminEdit.pm | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/lib/PublicInbox/AdminEdit.pm b/lib/PublicInbox/AdminEdit.pm
index 169feba..2e2a862 100644
--- a/lib/PublicInbox/AdminEdit.pm
+++ b/lib/PublicInbox/AdminEdit.pm
@@ -29,15 +29,15 @@ sub check_editable ($) {
 
 		# $ibx->{search} is populated by $ibx->over call
 		my $xdir_ro = $ibx->{search}->xdir(1);
-		my $npart = 0;
-		foreach my $part (<$xdir_ro/*>) {
-			if (-d $part && $part =~ m!/[0-9]+\z!) {
+		my $nshard = 0;
+		foreach my $shard (<$xdir_ro/*>) {
+			if (-d $shard && $shard =~ m!/[0-9]+\z!) {
 				my $bytes = 0;
-				$bytes += -s $_ foreach glob("$part/*");
-				$npart++ if $bytes;
+				$bytes += -s $_ foreach glob("$shard/*");
+				$nshard++ if $bytes;
 			}
 		}
-		if ($npart) {
+		if ($nshard) {
 			PublicInbox::Admin::require_or_die('-search');
 		} else {
 			# somebody could "rm -r" all the Xapian directories;
-- 
EW


* [PATCH 16/20] v2writable: use "epoch" consistently when referring to git repos
  2019-06-15  8:46 [PATCH 00/20] v2: use consistent terminology Eric Wong
                   ` (14 preceding siblings ...)
  2019-06-15  8:47 ` [PATCH 15/20] adminedit: "part" => "shard" for local variables Eric Wong
@ 2019-06-15  8:47 ` Eric Wong
  2019-06-15  8:47 ` [PATCH 17/20] search: use "shard" for local variable Eric Wong
                   ` (3 subsequent siblings)
  19 siblings, 0 replies; 21+ messages in thread
From: Eric Wong @ 2019-06-15  8:47 UTC (permalink / raw)
  To: meta

Be consistent with our own terminology and use "epoch" for
[0-9]+\.git repos.  The term "partition" is going away entirely.
---
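To make the two terms concrete, here is a small sketch (not code from
this series) that tells a git epoch from a Xapian shard by its path.
It assumes the v2 layout visible in the code: epochs live under
"git/N.git" and shards under "xapV/N".  The classify_v2_path helper
and the example paths are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: classify a path inside a v2 inbox as a git
# "epoch" (git/N.git) or a Xapian "shard" (xapV/N).
sub classify_v2_path {
	my ($path) = @_;
	return "epoch $1" if $path =~ m!/git/([0-9]+)\.git\z!;
	return "shard $1" if $path =~ m!/xap[0-9]+/([0-9]+)\z!;
	return 'other';
}

# Example paths are made up; "15" stands in for the schema version.
for my $path (qw(/srv/inbox/git/0.git /srv/inbox/xap15/3
		/srv/inbox/msgmap.sqlite3)) {
	print "$path => ", classify_v2_path($path), "\n";
}
```
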
 lib/PublicInbox/V2Writable.pm | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/lib/PublicInbox/V2Writable.pm b/lib/PublicInbox/V2Writable.pm
index 502824c..7a89093 100644
--- a/lib/PublicInbox/V2Writable.pm
+++ b/lib/PublicInbox/V2Writable.pm
@@ -556,7 +556,7 @@ W: $list
 	$rewritten->{rewrites};
 }
 
-sub last_commit_part ($$;$) {
+sub last_epoch_commit ($$;$) {
 	my ($self, $i, $cmt) = @_;
 	my $v = PublicInbox::Search::SCHEMA_VERSION();
 	$self->{mm}->last_commit_xap($v, $i, $cmt);
@@ -569,7 +569,7 @@ sub set_last_commits ($) {
 	foreach my $i (0..$epoch_max) {
 		defined(my $cmt = $last_commit->[$i]) or next;
 		$last_commit->[$i] = undef;
-		last_commit_part($self, $i, $cmt);
+		last_epoch_commit($self, $i, $cmt);
 	}
 }
 
@@ -927,13 +927,13 @@ sub reindex_oid ($$$$) {
 # only update last_commit for $i on reindex iff newer than current
 sub update_last_commit ($$$$) {
 	my ($self, $git, $i, $cmt) = @_;
-	my $last = last_commit_part($self, $i);
+	my $last = last_epoch_commit($self, $i);
 	if (defined $last && is_ancestor($git, $last, $cmt)) {
 		my @cmd = (qw(rev-list --count), "$last..$cmt");
 		chomp(my $n = $git->qx(@cmd));
 		return if $n ne '' && $n == 0;
 	}
-	last_commit_part($self, $i, $cmt);
+	last_epoch_commit($self, $i, $cmt);
 }
 
 sub git_dir_n ($$) { "$_[0]->{-inbox}->{mainrepo}/git/$_[1].git" }
@@ -942,7 +942,7 @@ sub last_commits ($$) {
 	my ($self, $epoch_max) = @_;
 	my $heads = [];
 	for (my $i = $epoch_max; $i >= 0; $i--) {
-		$heads->[$i] = last_commit_part($self, $i);
+		$heads->[$i] = last_epoch_commit($self, $i);
 	}
 	$heads;
 }
@@ -1013,7 +1013,7 @@ sub sync_prepare ($$$) {
 	for (my $i = $epoch_max; $i >= 0; $i--) {
 		die 'BUG: already indexing!' if $self->{reindex_pipe};
 		my $git_dir = git_dir_n($self, $i);
-		-d $git_dir or next; # missing parts are fine
+		-d $git_dir or next; # missing epochs are fine
 		my $git = PublicInbox::Git->new($git_dir);
 		if ($reindex_heads) {
 			$head = $reindex_heads->[$i] or next;
@@ -1123,7 +1123,7 @@ sub index_epoch ($$$) {
 
 	my $git_dir = git_dir_n($self, $i);
 	die 'BUG: already reindexing!' if $self->{reindex_pipe};
-	-d $git_dir or return; # missing parts are fine
+	-d $git_dir or return; # missing epochs are fine
 	fill_alternates($self, $i);
 	my $git = PublicInbox::Git->new($git_dir);
 	if (my $unindex_range = delete $sync->{unindex_range}->{$i}) {
-- 
EW


* [PATCH 17/20] search: use "shard" for local variable
  2019-06-15  8:46 [PATCH 00/20] v2: use consistent terminology Eric Wong
                   ` (15 preceding siblings ...)
  2019-06-15  8:47 ` [PATCH 16/20] v2writable: use "epoch" consistently when referring to git repos Eric Wong
@ 2019-06-15  8:47 ` Eric Wong
  2019-06-15  8:47 ` [PATCH 18/20] xapcmd: favor 'shard' over 'part' in local variables Eric Wong
                   ` (2 subsequent siblings)
  19 siblings, 0 replies; 21+ messages in thread
From: Eric Wong @ 2019-06-15  8:47 UTC (permalink / raw)
  To: meta

Another small step towards terminology consistency with Xapian.
---
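The loop touched here finds shards by globbing the Xapian directory
and keeping only subdirectories with purely numeric names.  Here is a
standalone sketch of that filter, assuming nothing beyond core Perl.
The shard_dirs helper and the temporary directory contents are
hypothetical and only illustrate which entries are kept.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper: return the numeric-named subdirectories of
# $xdir, using the same test as the loop in _xdb.
sub shard_dirs {
	my ($xdir) = @_;
	grep { -d $_ && m!/[0-9]+\z! } glob("$xdir/*");
}

# Build a throwaway layout: two shards, one non-numeric work
# directory and one regular file, then list what survives the filter.
my $xdir = tempdir(CLEANUP => 1);
mkdir "$xdir/$_" or die "mkdir: $!" for (0, 1, '0-wipXXXX');
open my $fh, '>', "$xdir/over.sqlite3" or die "open: $!";
close $fh or die "close: $!";
print "$_\n" for shard_dirs($xdir);
```

Only the "0" and "1" directories are listed; the work directory and
the SQLite file are skipped.
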
 lib/PublicInbox/Search.pm | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/lib/PublicInbox/Search.pm b/lib/PublicInbox/Search.pm
index 45431ec..60fc861 100644
--- a/lib/PublicInbox/Search.pm
+++ b/lib/PublicInbox/Search.pm
@@ -143,15 +143,15 @@ sub _xdb ($) {
 	my ($xdb, $slow_phrase);
 	my $qpf = \($self->{qp_flags} ||= $QP_FLAGS);
 	if ($self->{version} >= 2) {
-		foreach my $part (<$dir/*>) {
-			-d $part && $part =~ m!/[0-9]+\z! or next;
-			my $sub = Search::Xapian::Database->new($part);
+		foreach my $shard (<$dir/*>) {
+			-d $shard && $shard =~ m!/[0-9]+\z! or next;
+			my $sub = Search::Xapian::Database->new($shard);
 			if ($xdb) {
 				$xdb->add_database($sub);
 			} else {
 				$xdb = $sub;
 			}
-			$slow_phrase ||= -f "$part/iamchert";
+			$slow_phrase ||= -f "$shard/iamchert";
 		}
 	} else {
 		$slow_phrase = -f "$dir/iamchert";
-- 
EW


* [PATCH 18/20] xapcmd: favor 'shard' over 'part' in local variables
  2019-06-15  8:46 [PATCH 00/20] v2: use consistent terminology Eric Wong
                   ` (16 preceding siblings ...)
  2019-06-15  8:47 ` [PATCH 17/20] search: use "shard" for local variable Eric Wong
@ 2019-06-15  8:47 ` Eric Wong
  2019-06-15  8:47 ` [PATCH 19/20] t/xcpdb-reshard: use 'shard' term " Eric Wong
  2019-06-15  8:47 ` [PATCH 20/20] comments: replace "partition" with "shard" Eric Wong
  19 siblings, 0 replies; 21+ messages in thread
From: Eric Wong @ 2019-06-15  8:47 UTC (permalink / raw)
  To: meta

Yet another step towards keeping our naming consistent with Xapian
terminology.
---
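With the variables renamed, the resharding rule in cpdb_loop reads
more directly: a document is copied into the current shard only when
its docid modulo the new shard count equals that shard's index.  A
minimal sketch of the rule follows; it is not the Xapcmd.pm code
itself.  The docids_for_shard helper and the sample docids are
hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper using the same modulo test as cpdb_loop:
# keep only the docids that belong to shard $cur_shard when
# splitting into $reshard shards.
sub docids_for_shard {
	my ($cur_shard, $reshard, @docids) = @_;
	grep { $_ % $reshard == $cur_shard } @docids;
}

# docids stay as-is (they match NNTP article numbers), so the
# sketch only selects them and never renumbers them.
my @docids = (1..10);
for my $cur_shard (0..2) {
	my @keep = docids_for_shard($cur_shard, 3, @docids);
	print "shard $cur_shard: @keep\n";
}
```

Every docid lands in exactly one destination shard, which is why each
old shard can be iterated independently for every new shard.
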
 lib/PublicInbox/Xapcmd.pm | 70 +++++++++++++++++++--------------------
 1 file changed, 35 insertions(+), 35 deletions(-)

diff --git a/lib/PublicInbox/Xapcmd.pm b/lib/PublicInbox/Xapcmd.pm
index 5e4ac87..819d782 100644
--- a/lib/PublicInbox/Xapcmd.pm
+++ b/lib/PublicInbox/Xapcmd.pm
@@ -17,13 +17,13 @@ our @COMPACT_OPT = qw(jobs|j=i quiet|q blocksize|b=s no-full|n fuller|F);
 
 sub commit_changes ($$$) {
 	my ($ibx, $tmp, $opt) = @_;
-	my $new_parts = $opt->{reshard};
+	my $reshard = $opt->{reshard};
 	my $reindex = $opt->{reindex};
 	my $im = $ibx->importer(0);
 	$im->lock_acquire if !$opt->{-coarse_lock};
 
 	$SIG{INT} or die 'BUG: $SIG{INT} not handled';
-	my @old_part;
+	my @old_shard;
 
 	while (my ($old, $new) = each %$tmp) {
 		my @st = stat($old);
@@ -41,7 +41,7 @@ sub commit_changes ($$$) {
 		}
 
 		if (!defined($new)) { # culled shard
-			push @old_part, $old;
+			push @old_shard, $old;
 			next;
 		}
 
@@ -58,7 +58,7 @@ sub commit_changes ($$$) {
 				die "failed to remove $prev: $!\n";
 		}
 	}
-	remove_tree(@old_part);
+	remove_tree(@old_shard);
 	$tmp->done;
 	if (!$opt->{-coarse_lock}) {
 		$opt->{-skip_lock} = 1;
@@ -66,9 +66,9 @@ sub commit_changes ($$$) {
 		if ($im->can('count_shards')) {
 			my $pr = $opt->{-progress};
 			my $n = $im->count_shards;
-			if (defined $new_parts && $n != $new_parts) {
+			if (defined $reshard && $n != $reshard) {
 				die
-"BUG: counted $n shards after resharding to $new_parts";
+"BUG: counted $n shards after resharding to $reshard";
 			}
 			my $prev = $im->{shards};
 			if ($pr && $prev != $n) {
@@ -171,17 +171,17 @@ sub run {
 	my $tmp = PublicInbox::Xtmpdirs->new;
 	my $v = $ibx->{version} ||= 1;
 	my @q;
-	my $new_parts = $opt->{reshard};
-	if (defined $new_parts && $new_parts <= 0) {
+	my $reshard = $opt->{reshard};
+	if (defined $reshard && $reshard <= 0) {
 		die "--reshard must be a positive number\n";
 	}
 
 	# we want temporary directories to be as deep as possible,
 	# so v2 shards can keep "xap$SCHEMA_VERSION" on a separate FS.
 	if ($v == 1) {
-		if (defined $new_parts) {
+		if (defined $reshard) {
 			warn
-"--reshard=$new_parts ignored for v1 $ibx->{mainrepo}\n";
+"--reshard=$reshard ignored for v1 $ibx->{mainrepo}\n";
 		}
 		my $old_parent = dirname($old);
 		same_fs_or_die($old_parent, $old);
@@ -191,28 +191,28 @@ sub run {
 		push @q, [ $old, $wip ];
 	} else {
 		opendir my $dh, $old or die "Failed to opendir $old: $!\n";
-		my @old_parts;
+		my @old_shards;
 		while (defined(my $dn = readdir($dh))) {
 			if ($dn =~ /\A[0-9]+\z/) {
-				push @old_parts, $dn;
+				push @old_shards, $dn;
 			} elsif ($dn eq '.' || $dn eq '..') {
 			} elsif ($dn =~ /\Aover\.sqlite3/) {
 			} else {
 				warn "W: skipping unknown dir: $old/$dn\n"
 			}
 		}
-		die "No Xapian parts found in $old\n" unless @old_parts;
+		die "No Xapian shards found in $old\n" unless @old_shards;
 
-		my ($src, $max_part);
-		if (!defined($new_parts) || $new_parts == scalar(@old_parts)) {
+		my ($src, $max_shard);
+		if (!defined($reshard) || $reshard == scalar(@old_shards)) {
 			# 1:1 copy
-			$max_part = scalar(@old_parts) - 1;
+			$max_shard = scalar(@old_shards) - 1;
 		} else {
 			# M:N copy
-			$max_part = $new_parts - 1;
-			$src = [ map { "$old/$_" } @old_parts ];
+			$max_shard = $reshard - 1;
+			$src = [ map { "$old/$_" } @old_shards ];
 		}
-		foreach my $dn (0..$max_part) {
+		foreach my $dn (0..$max_shard) {
 			my $tmpl = "$dn-XXXXXXXX";
 			my $wip = tempdir($tmpl, DIR => $old);
 			same_fs_or_die($old, $wip);
@@ -220,7 +220,7 @@ sub run {
 			push @q, [ $src // $cur , $wip ];
 			$tmp->{$cur} = $wip;
 		}
-		# mark old parts to be unlinked
+		# mark old shards to be unlinked
 		if ($src) {
 			$tmp->{$_} ||= undef for @$src;
 		}
@@ -305,7 +305,7 @@ sub compact ($$) {
 }
 
 sub cpdb_loop ($$$;$$) {
-	my ($src, $dst, $pr_data, $cur_part, $new_parts) = @_;
+	my ($src, $dst, $pr_data, $cur_shard, $reshard) = @_;
 	my ($pr, $fmt, $nr, $pfx);
 	if ($pr_data) {
 		$pr = $pr_data->{pr};
@@ -326,9 +326,9 @@ sub cpdb_loop ($$$;$$) {
 		eval {
 			for (; $it != $end; $it++) {
 				my $docid = $it->get_docid;
-				if (defined $new_parts) {
-					my $dst_part = $docid % $new_parts;
-					next if $dst_part != $cur_part;
+				if (defined $reshard) {
+					my $dst_shard = $docid % $reshard;
+					next if $dst_shard != $cur_shard;
 				}
 				my $doc = $src->get_document($docid);
 				$dst->replace_document($docid, $doc);
@@ -350,14 +350,14 @@ sub cpdb_loop ($$$;$$) {
 sub cpdb ($$) {
 	my ($args, $opt) = @_;
 	my ($old, $new) = @$args;
-	my ($src, $cur_part);
-	my $new_parts;
+	my ($src, $cur_shard);
+	my $reshard;
 	if (ref($old) eq 'ARRAY') {
-		($cur_part) = ($new =~ m!xap[0-9]+/([0-9]+)\b!);
-		defined $cur_part or
+		($cur_shard) = ($new =~ m!xap[0-9]+/([0-9]+)\b!);
+		defined $cur_shard or
 			die "BUG: could not extract shard # from $new";
-		$new_parts = $opt->{reshard};
-		defined $new_parts or die 'BUG: got array src w/o --reshard';
+		$reshard = $opt->{reshard};
+		defined $reshard or die 'BUG: got array src w/o --reshard';
 
 		# resharding, M:N copy means have full read access
 		foreach (@$old) {
@@ -410,8 +410,8 @@ sub cpdb ($$) {
 				# we can only estimate when resharding,
 				# because removed spam causes slight imbalance
 				my $est = '';
-				if (defined $cur_part && $new_parts > 1) {
-					$tot = int($tot/$new_parts);
+				if (defined $cur_shard && $reshard > 1) {
+					$tot = int($tot/$reshard);
 					$est = 'around ';
 				}
 				my $fmt = "$pfx % ".length($tot)."u/$tot\n";
@@ -422,15 +422,15 @@ sub cpdb ($$) {
 		};
 	} while (cpdb_retryable($src, $pfx));
 
-	if (defined $new_parts) {
+	if (defined $reshard) {
 		# we rely on document IDs matching NNTP article number,
-		# so we can't have the combined DB support rewriting
+		# so we can't have the Xapian sharding DB support rewriting
 		# document IDs.  Thus we iterate through each shard
 		# individually.
 		$src = undef;
 		foreach (@$old) {
 			my $old = Search::Xapian::Database->new($_);
-			cpdb_loop($old, $dst, $pr_data, $cur_part, $new_parts);
+			cpdb_loop($old, $dst, $pr_data, $cur_shard, $reshard);
 		}
 	} else {
 		cpdb_loop($src, $dst, $pr_data);
-- 
EW
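
For readers following the resharding logic in cpdb_loop above, here is a
minimal standalone sketch of the "$docid % $reshard" rule. It only
illustrates how document IDs map to destination shards; assign_shards
is a hypothetical helper, not part of public-inbox, and no Xapian
database is involved.

```perl
use strict;
use warnings;

# Hypothetical helper (not in public-inbox): group docids by their
# destination shard, using the same modulo rule as cpdb_loop.
sub assign_shards {
	my ($reshard, @docids) = @_;
	# one (possibly empty) bucket per destination shard
	my @shards = map {; [] } 1..$reshard;
	push @{$shards[$_ % $reshard]}, $_ for @docids;
	\@shards;
}

my $shards = assign_shards(3, 1..10);
for my $i (0..$#$shards) {
	print "shard $i: @{$shards->[$i]}\n";
}
```

With ten docids and three shards, shard 0 gets 3 6 9, shard 1 gets
1 4 7 10 and shard 2 gets 2 5 8. Docids are preserved rather than
renumbered, which matches the comment about document IDs matching
NNTP article numbers.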



* [PATCH 19/20] t/xcpdb-reshard: use 'shard' term in local variables
  2019-06-15  8:46 [PATCH 00/20] v2: use consistent terminology Eric Wong
                   ` (17 preceding siblings ...)
  2019-06-15  8:47 ` [PATCH 18/20] xapcmd: favor 'shard' over 'part' in local variables Eric Wong
@ 2019-06-15  8:47 ` Eric Wong
  2019-06-15  8:47 ` [PATCH 20/20] comments: replace "partition" with "shard" Eric Wong
  19 siblings, 0 replies; 21+ messages in thread
From: Eric Wong @ 2019-06-15  8:47 UTC (permalink / raw)
  To: meta

Another step in maintaining consistency with Xapian docs.
---
 t/xcpdb-reshard.t | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/t/xcpdb-reshard.t b/t/xcpdb-reshard.t
index bf56404..d921e12 100644
--- a/t/xcpdb-reshard.t
+++ b/t/xcpdb-reshard.t
@@ -43,8 +43,8 @@ for my $i (1..$ndoc) {
 	ok($im->add($mime), "message $i added");
 }
 $im->done;
-my @parts = grep(m!/\d+\z!, glob("$ibx->{mainrepo}/xap*/*"));
-is(scalar(@parts), $nproc, 'got expected parts');
+my @shards = grep(m!/\d+\z!, glob("$ibx->{mainrepo}/xap*/*"));
+is(scalar(@shards), $nproc, 'got expected shards');
 my $orig = $ibx->over->query_xover(1, $ndoc);
 my %nums = map {; "$_->{num}" => 1 } @$orig;
 
@@ -52,8 +52,8 @@ my %nums = map {; "$_->{num}" => 1 } @$orig;
 for my $R (qw(2 4 1 3 3)) {
 	delete $ibx->{search}; # release old handles
 	is(system(@xcpdb, "-R$R", $ibx->{mainrepo}), 0, "xcpdb -R$R");
-	my @new_parts = grep(m!/\d+\z!, glob("$ibx->{mainrepo}/xap*/*"));
-	is(scalar(@new_parts), $R, 'resharded to two parts');
+	my @new_shards = grep(m!/\d+\z!, glob("$ibx->{mainrepo}/xap*/*"));
+	is(scalar(@new_shards), $R, 'resharded to two shards');
 	my $msgs = $ibx->search->query('s:this');
 	is(scalar(@$msgs), $ndoc, 'got expected docs after resharding');
 	my %by_mid = map {; "$_->{mid}" => $_ } @$msgs;
@@ -64,7 +64,7 @@ for my $R (qw(2 4 1 3 3)) {
 	# ensure docids in Xapian match NNTP article numbers
 	my $tot = 0;
 	my %tmp = %nums;
-	foreach my $d (@new_parts) {
+	foreach my $d (@new_shards) {
 		my $xdb = Search::Xapian::Database->new($d);
 		$tot += $xdb->get_doccount;
 		my $it = $xdb->postlist_begin('');
-- 
EW
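
The invariant this test checks ("docids in Xapian match NNTP article
numbers") can be sketched without Xapian. The sketch assumes each
shard is represented as a plain array ref of docids; check_shards is a
hypothetical helper, not part of the test suite.

```perl
use strict;
use warnings;

# Hypothetical helper: given the expected docids and a list of shards
# (array refs of docids), return the docids missing from every shard
# and the docids seen that were unexpected or duplicated.
sub check_shards {
	my ($expect, @shards) = @_;
	my %tmp = map {; $_ => 1 } @$expect;
	my @extra;
	for my $shard (@shards) {
		for my $docid (@$shard) {
			# delete returns the old value (1) on first sight only
			delete $tmp{$docid} or push @extra, $docid;
		}
	}
	my @missing = sort { $a <=> $b } keys %tmp;
	(\@missing, \@extra);
}

my ($missing, $extra) = check_shards([1..6], [2, 4, 6], [1, 3, 5]);
print "missing: @$missing; extra: @$extra\n";
```

Both lists come back empty when every expected docid lives in exactly
one shard, which is the state expected after a successful reshard.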



* [PATCH 20/20] comments: replace "partition" with "shard"
  2019-06-15  8:46 [PATCH 00/20] v2: use consistent terminology Eric Wong
                   ` (18 preceding siblings ...)
  2019-06-15  8:47 ` [PATCH 19/20] t/xcpdb-reshard: use 'shard' term " Eric Wong
@ 2019-06-15  8:47 ` Eric Wong
  19 siblings, 0 replies; 21+ messages in thread
From: Eric Wong @ 2019-06-15  8:47 UTC (permalink / raw)
  To: meta

Now that the code matches Xapian terminology, ensure
our comments match, too.
---
 lib/PublicInbox/SearchIdx.pm  |  2 +-
 lib/PublicInbox/V2Writable.pm | 12 ++++++------
 2 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/lib/PublicInbox/SearchIdx.pm b/lib/PublicInbox/SearchIdx.pm
index 58b2337..665f673 100644
--- a/lib/PublicInbox/SearchIdx.pm
+++ b/lib/PublicInbox/SearchIdx.pm
@@ -797,7 +797,7 @@ sub remote_close {
 sub remote_remove {
 	my ($self, $oid, $mid) = @_;
 	if (my $w = $self->{w}) {
-		# triggers remove_by_oid in a partition
+		# triggers remove_by_oid in a shard
 		print $w "D $oid $mid\n" or die "failed to write remove $!";
 	} else {
 		$self->begin_txn_lazy;
diff --git a/lib/PublicInbox/V2Writable.pm b/lib/PublicInbox/V2Writable.pm
index 7a89093..2b3ffa6 100644
--- a/lib/PublicInbox/V2Writable.pm
+++ b/lib/PublicInbox/V2Writable.pm
@@ -24,7 +24,7 @@ use IO::Handle;
 my $PACKING_FACTOR = 0.4;
 
 # SATA storage lags behind what CPUs are capable of, so relying on
-# nproc(1) can be misleading and having extra Xapian partions is a
+# nproc(1) can be misleading and having extra Xapian shards is a
 # waste of FDs and space.  It can also lead to excessive IO latency
 # and slow things down.  Users on NVME or other fast storage can
 # use the NPROC env or switches in our script/public-inbox-* programs
@@ -57,7 +57,7 @@ sub count_shards ($) {
 	my $n = 0;
 	my $xpfx = $self->{xpfx};
 
-	# always load existing partitions in case core count changes:
+	# always load existing shards in case core count changes:
 	# Also, shard count may change while -watch is running
 	# due to "xcpdb --reshard"
 	if (-d $xpfx) {
@@ -292,7 +292,7 @@ sub idx_init {
 			$self->{shards} = $nshards;
 		}
 
-		# need to create all parts before initializing msgmap FD
+		# need to create all shards before initializing msgmap FD
 		my $max = $self->{shards} - 1;
 
 		# idx_shards must be visible to all forked processes
@@ -616,17 +616,17 @@ sub checkpoint ($;$) {
 		if ($wait) {
 			my $barrier = $self->barrier_init(scalar @$shards);
 
-			# each partition needs to issue a barrier command
+			# each shard needs to issue a barrier command
 			$_->remote_barrier for @$shards;
 
-			# wait for each Xapian partition
+			# wait for each Xapian shard
 			$self->barrier_wait($barrier);
 		} else {
 			$_->remote_commit for @$shards;
 		}
 
 		# last_commit is special, don't commit these until
-		# remote partitions are done:
+		# remote shards are done:
 		$dbh->begin_work;
 		set_last_commits($self);
 		$dbh->commit;
-- 
EW




Code repositories for project(s) associated with this public inbox

	https://80x24.org/public-inbox.git
