user/dev discussion of public-inbox itself
* [PATCH 0/7] index + extindex interaction improvements
@ 2020-12-25 10:21 Eric Wong
  2020-12-25 10:21 ` [PATCH 1/7] index: disable --fast-noop on --reindex Eric Wong
                   ` (7 more replies)
  0 siblings, 8 replies; 13+ messages in thread
From: Eric Wong @ 2020-12-25 10:21 UTC
  To: meta

Some things which make -index less painful when auto-updating
external indices.

"public-inbox-extindex --all" itself is still painfully slow
with 50K inboxes, but I think that can only be used once for
initialization and -index can be relied on for all incremental
updates.


Eric Wong (7):
  index: disable --fast-noop on --reindex
  extsearchidx: delay SQLite availability checks
  extsearchidx: close DB handles after use if FD constrained
  index: do not attach inbox to extindex unless updated
  index: fix --no-fsync flag propagation to extindex
  v2writable: don't verify tip if reindexing
  index: filter out indexlevel=basic from extindex

 lib/PublicInbox/Admin.pm        |  1 +
 lib/PublicInbox/ExtSearchIdx.pm | 96 +++++++++++++++++++++------------
 lib/PublicInbox/SearchIdx.pm    |  2 +
 lib/PublicInbox/V2Writable.pm   | 36 +++++++++----
 script/public-inbox-index       | 27 ++++++----
 5 files changed, 109 insertions(+), 53 deletions(-)



* [PATCH 1/7] index: disable --fast-noop on --reindex
  2020-12-25 10:21 [PATCH 0/7] index + extindex interaction improvements Eric Wong
@ 2020-12-25 10:21 ` Eric Wong
  2020-12-25 10:21 ` [PATCH 2/7] extsearchidx: delay SQLite availability checks Eric Wong
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 13+ messages in thread
From: Eric Wong @ 2020-12-25 10:21 UTC
  To: meta

These options make no sense when used together; just inform the
user and move on, since it's probably harmless to continue.
---
 script/public-inbox-index | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/script/public-inbox-index b/script/public-inbox-index
index 91afac88..87893ef1 100755
--- a/script/public-inbox-index
+++ b/script/public-inbox-index
@@ -49,6 +49,9 @@ die "--jobs must be >= 0\n" if defined $opt->{jobs} && $opt->{jobs} < 0;
 if ($opt->{xapian_only} && !$opt->{reindex}) {
 	die "--xapian-only requires --reindex\n";
 }
+if ($opt->{reindex} && delete($opt->{'fast-noop'})) {
+	warn "--fast-noop ignored with --reindex\n";
+}
 
 # require lazily to speed up --help
 require PublicInbox::Admin;
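
The check added above relies on delete() returning the removed value,
so the warning only fires when --fast-noop was actually given.  A
minimal standalone sketch of that idiom follows; drop_conflicts() is
a hypothetical helper for illustration, not part of public-inbox:

```perl
use strict;
use warnings;

# Hypothetical helper: drop options that make no sense together and
# return the messages the caller should show to the user.
sub drop_conflicts {
	my ($opt) = @_;
	my @msgs;
	# delete() returns the removed value, so the message is only
	# queued when 'fast-noop' was set to something true
	if ($opt->{reindex} && delete($opt->{'fast-noop'})) {
		push @msgs, "--fast-noop ignored with --reindex\n";
	}
	@msgs;
}

my $opt = { reindex => 1, 'fast-noop' => 1 };
my @msgs = drop_conflicts($opt);
warn @msgs if @msgs;
```

After the call, $opt no longer carries 'fast-noop', so later code does
not need to re-check the combination.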


* [PATCH 2/7] extsearchidx: delay SQLite availability checks
  2020-12-25 10:21 [PATCH 0/7] index + extindex interaction improvements Eric Wong
  2020-12-25 10:21 ` [PATCH 1/7] index: disable --fast-noop on --reindex Eric Wong
@ 2020-12-25 10:21 ` Eric Wong
  2020-12-25 10:21 ` [PATCH 3/7] extsearchidx: close DB handles after use if FD constrained Eric Wong
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 13+ messages in thread
From: Eric Wong @ 2020-12-25 10:21 UTC
  To: meta

This will make attach_inbox faster for no-op calls.  It also
helps us avoid races in case msgmap or over.sqlite3 gets
unlinked while -extindex is running.
---
 lib/PublicInbox/ExtSearchIdx.pm | 57 ++++++++++++++++-----------------
 1 file changed, 28 insertions(+), 29 deletions(-)

diff --git a/lib/PublicInbox/ExtSearchIdx.pm b/lib/PublicInbox/ExtSearchIdx.pm
index c43a6c5e..386e1cee 100644
--- a/lib/PublicInbox/ExtSearchIdx.pm
+++ b/lib/PublicInbox/ExtSearchIdx.pm
@@ -61,23 +61,7 @@ sub new {
 
 sub attach_inbox {
 	my ($self, $ibx) = @_;
-	my $ekey = $ibx->eidx_key;
-	my $misc = $self->{misc};
-	if ($misc && $misc->inbox_data($ibx)) { # all good if already indexed
-	} else {
-		my @sqlite = ($ibx->over, $ibx->mm);
-		my $uidvalidity = $ibx->uidvalidity;
-		$ibx->{mm} = $ibx->{over} = undef;
-		if (scalar(@sqlite) != 2) {
-			warn "W: skipping $ekey (unindexed)\n";
-			return;
-		}
-		if (!defined($uidvalidity)) {
-			warn "W: skipping $ekey (no UIDVALIDITY)\n";
-			return;
-		}
-	}
-	$self->{ibx_map}->{$ekey} //= do {
+	$self->{ibx_map}->{$ibx->eidx_key} //= do {
 		push @{$self->{ibx_list}}, $ibx;
 		$ibx;
 	}
@@ -281,29 +265,36 @@ sub last_commits {
 	$heads;
 }
 
+sub _ibx_index_reject ($) {
+	my ($ibx) = @_;
+	$ibx->mm // return 'unindexed, no msgmap.sqlite3';
+	$ibx->uidvalidity // return 'no UIDVALIDITY';
+	$ibx->over // return 'unindexed, no over.sqlite3';
+	undef;
+}
+
 sub _sync_inbox ($$$) {
 	my ($self, $sync, $ibx) = @_;
+	my $ekey = $ibx->eidx_key;
+	if (defined(my $err = _ibx_index_reject($ibx))) {
+		return "W: skipping $ekey ($err)";
+	}
 	$sync->{ibx} = $ibx;
 	$sync->{nr} = \(my $nr = 0);
 	my $v = $ibx->version;
-	my $ekey = $ibx->eidx_key;
 	if ($v == 2) {
 		$sync->{epoch_max} = $ibx->max_git_epoch // return;
 		sync_prepare($self, $sync); # or return # TODO: once MiscIdx is stable
 	} elsif ($v == 1) {
 		my $uv = $ibx->uidvalidity;
 		my $lc = $self->{oidx}->eidx_meta("lc-v1:$ekey//$uv");
-		my $head = $ibx->mm->last_commit;
-		unless (defined $head) {
-			warn "E: $ibx->{inboxdir} is not indexed\n";
-			return;
-		}
+		my $head = $ibx->mm->last_commit //
+			return "E: $ibx->{inboxdir} is not indexed";
 		my $stk = prepare_stack($sync, $lc ? "$lc..$head" : $head);
 		my $unit = { stack => $stk, git => $ibx->git };
 		push @{$sync->{todo}}, $unit;
 	} else {
-		warn "E: $ekey unsupported inbox version (v$v)\n";
-		return;
+		return "E: $ekey unsupported inbox version (v$v)";
 	}
 	for my $unit (@{delete($sync->{todo}) // []}) {
 		last if $sync->{quit};
@@ -311,6 +302,7 @@ sub _sync_inbox ($$$) {
 	}
 	$self->{midx}->index_ibx($ibx) unless $sync->{quit};
 	$ibx->git->cleanup; # done with this inbox, now
+	undef;
 }
 
 sub gc_unref_doc ($$$$) {
@@ -787,9 +779,14 @@ DELETE FROM xref3 WHERE ibx_id = ? AND xnum = ? AND oidbin = ?
 
 sub _reindex_inbox ($$$) {
 	my ($self, $sync, $ibx) = @_;
-	local $self->{current_info} = $ibx->eidx_key;
-	_reindex_check_unseen($self, $sync, $ibx);
-	_reindex_check_stale($self, $sync, $ibx) unless $sync->{quit};
+	my $ekey = $ibx->eidx_key;
+	local $self->{current_info} = $ekey;
+	if (defined(my $err = _ibx_index_reject($ibx))) {
+		warn "W: cannot reindex $ekey ($err)\n";
+	} else {
+		_reindex_check_unseen($self, $sync, $ibx);
+		_reindex_check_stale($self, $sync, $ibx) unless $sync->{quit};
+	}
 	delete @$ibx{qw(over mm search git)}; # won't need these for a bit
 }
 
@@ -847,7 +844,9 @@ sub eidx_sync { # main entry point
 	# don't use $_ here, it'll get clobbered by reindex_checkpoint
 	for my $ibx (@{$self->{ibx_list}}) {
 		last if $sync->{quit};
-		_sync_inbox($self, $sync, $ibx);
+		my $err = _sync_inbox($self, $sync, $ibx);
+		delete @$ibx{qw(mm over)};
+		warn $err, "\n" if defined($err);
 	}
 	$self->{oidx}->rethread_done($opt) unless $sync->{quit};
 	eidxq_process($self, $sync) unless $sync->{quit};
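
The patch above moves the availability checks into a helper that
returns a reason string (or undef), and lets the caller decide how to
report it.  A rough sketch of that "defined-or return" pattern, using
plain hashes as hypothetical stand-ins for inbox objects:

```perl
use strict;
use warnings;

# Hypothetical stand-in for an inbox: each key holds a defined value
# when the backing file is usable, and is missing/undef otherwise.
sub index_reject {
	my ($ibx) = @_;
	$ibx->{mm} // return 'unindexed, no msgmap.sqlite3';
	$ibx->{uidvalidity} // return 'no UIDVALIDITY';
	$ibx->{over} // return 'unindexed, no over.sqlite3';
	undef; # nothing wrong, caller may proceed
}

# Returns an error string instead of warning directly, so the caller
# can clean up before reporting.
sub sync_one {
	my ($name, $ibx) = @_;
	if (defined(my $err = index_reject($ibx))) {
		return "W: skipping $name ($err)";
	}
	undef; # real work would happen here
}

for my $pair ([ good => { mm => 1, uidvalidity => 42, over => 1 } ],
		[ bad => { mm => 1, uidvalidity => 42 } ]) {
	my $err = sync_one(@$pair);
	warn $err, "\n" if defined($err);
}
```

Only the "bad" entry produces a warning here, since it lacks {over}.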


* [PATCH 3/7] extsearchidx: close DB handles after use if FD constrained
  2020-12-25 10:21 [PATCH 0/7] index + extindex interaction improvements Eric Wong
  2020-12-25 10:21 ` [PATCH 1/7] index: disable --fast-noop on --reindex Eric Wong
  2020-12-25 10:21 ` [PATCH 2/7] extsearchidx: delay SQLite availability checks Eric Wong
@ 2020-12-25 10:21 ` Eric Wong
  2020-12-25 10:21 ` [PATCH 4/7] index: do not attach inbox to extindex unless updated Eric Wong
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 13+ messages in thread
From: Eric Wong @ 2020-12-25 10:21 UTC
  To: meta

Most distros ship with low RLIMIT_NOFILE limits and surprises
may lurk for admins who configure many inboxes.  Keep FD usage
under control to avoid EMFILE errors at inopportune times during
reindex.

From what I can tell, this is the only place where extindex can
have unpredictable FD growth when there are thousands of inboxes,
and it's in an extremely rare code path.
---
 lib/PublicInbox/ExtSearchIdx.pm | 37 ++++++++++++++++++++++++++++++---
 1 file changed, 34 insertions(+), 3 deletions(-)

diff --git a/lib/PublicInbox/ExtSearchIdx.pm b/lib/PublicInbox/ExtSearchIdx.pm
index 386e1cee..3f197973 100644
--- a/lib/PublicInbox/ExtSearchIdx.pm
+++ b/lib/PublicInbox/ExtSearchIdx.pm
@@ -393,6 +393,32 @@ sub _ibx_for ($$$) {
 	$self->{ibx_list}->[$pos] // die "BUG: ibx for $smsg->{blob} not mapped"
 }
 
+sub _fd_constrained ($) {
+	my ($self) = @_;
+	$self->{-fd_constrained} //= do {
+		my $soft;
+		if (eval { require BSD::Resource; 1 }) {
+			my $NOFILE = BSD::Resource::RLIMIT_NOFILE();
+			($soft, undef) = BSD::Resource::getrlimit($NOFILE);
+		} else {
+			chomp($soft = `sh -c 'ulimit -n'`);
+		}
+		if (defined($soft)) {
+			my $want = scalar(@{$self->{ibx_list}}) + 64; # estimate
+			my $ret = $want > $soft;
+			if ($ret) {
+				warn <<EOF;
+RLIMIT_NOFILE=$soft insufficient (want: $want), will close DB handles early
+EOF
+			}
+			$ret;
+		} else {
+			warn "Unable to determine RLIMIT_NOFILE: $@\n";
+			1;
+		}
+	};
+}
+
 sub _reindex_finalize ($$$) {
 	my ($req, $smsg, $eml) = @_;
 	my $sync = $req->{sync};
@@ -429,11 +455,16 @@ sub _reindex_finalize ($$$) {
 		my $x = pop(@$ary) // die "BUG: #$docid {by_chash} empty";
 		$x->{num} = delete($x->{xnum}) // die '{xnum} unset';
 		$ibx = _ibx_for($self, $sync, $x);
-		my $e = $ibx->over->get_art($x->{num});
-		$e->{blob} eq $x->{blob} or die <<EOF;
+		if (my $over = $ibx->over) {
+			my $e = $over->get_art($x->{num});
+			$e->{blob} eq $x->{blob} or die <<EOF;
 $x->{blob} != $e->{blob} (${\$ibx->eidx_key}:$e->{num});
 EOF
-		push @todo, $ibx, $e;
+			push @todo, $ibx, $e;
+			$over->dbh_close if _fd_constrained($self);
+		} else {
+			die "$ibx->{inboxdir}: over.sqlite3 unusable: $!\n";
+		}
 	}
 	undef $by_chash;
 	while (my ($ibx, $e) = splice(@todo, 0, 2)) {
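
The FD check above can be tried outside of public-inbox.  The sketch
below is only an approximation of the patch: soft_nofile() and
fd_constrained() are hypothetical names, the "+ 64" headroom is the
same rough estimate the patch uses, and treating "unlimited" or a
negative value as "no effective limit" is an assumption of this
example:

```perl
use strict;
use warnings;

# Return the soft RLIMIT_NOFILE, 'unlimited', or undef if unknown.
sub soft_nofile {
	if (eval { require BSD::Resource; 1 }) {
		my ($soft) = BSD::Resource::getrlimit(
				BSD::Resource::RLIMIT_NOFILE());
		return $soft; # numeric
	}
	# fall back to the shell builtin if BSD::Resource is missing
	my $out = `sh -c 'ulimit -n'` // '';
	chomp($out);
	return $out if $out =~ /\A[0-9]+\z/;
	return 'unlimited' if $out eq 'unlimited';
	undef;
}

# True if keeping one DB handle per inbox open may exceed the limit.
sub fd_constrained {
	my ($nr_inboxes, $soft) = @_;
	return 1 unless defined($soft); # unknown limit: be conservative
	return 0 if $soft eq 'unlimited' || $soft < 0;
	my $want = $nr_inboxes + 64; # estimate
	$want > $soft ? 1 : 0;
}

printf "constrained with 50000 inboxes: %d\n",
	fd_constrained(50_000, soft_nofile());
```

Keeping the limit lookup separate from the comparison makes the
comparison easy to exercise with made-up limits.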


* [PATCH 4/7] index: do not attach inbox to extindex unless updated
  2020-12-25 10:21 [PATCH 0/7] index + extindex interaction improvements Eric Wong
                   ` (2 preceding siblings ...)
  2020-12-25 10:21 ` [PATCH 3/7] extsearchidx: close DB handles after use if FD constrained Eric Wong
@ 2020-12-25 10:21 ` Eric Wong
  2020-12-25 10:21 ` [PATCH 5/7] index: fix --no-fsync flag propagation to extindex Eric Wong
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 13+ messages in thread
From: Eric Wong @ 2020-12-25 10:21 UTC
  To: meta

We'll count the number of log changes (regardless of index or
unindex) and only attach inboxes to ExtSearchIdx objects when
they get new work.  We'll also reduce lock bouncing and only
update external indices after all per-inbox indexing is done.

This also updates existing v2 indexing/unindexing callers
to be more consistent and ensures unindex log entries update
per-inbox last commit information.
---
 lib/PublicInbox/Admin.pm      |  1 +
 lib/PublicInbox/SearchIdx.pm  |  2 ++
 lib/PublicInbox/V2Writable.pm | 26 +++++++++++++++++++-------
 script/public-inbox-index     | 23 ++++++++++++++---------
 4 files changed, 36 insertions(+), 16 deletions(-)

diff --git a/lib/PublicInbox/Admin.pm b/lib/PublicInbox/Admin.pm
index 9a86d206..b468108e 100644
--- a/lib/PublicInbox/Admin.pm
+++ b/lib/PublicInbox/Admin.pm
@@ -271,6 +271,7 @@ EOM
 		$idx = PublicInbox::SearchIdx->new($ibx, 1);
 	}
 	$idx->index_sync($opt);
+	$idx->{nidx} // 0; # returns number processed
 }
 
 sub progress_prepare ($) {
diff --git a/lib/PublicInbox/SearchIdx.pm b/lib/PublicInbox/SearchIdx.pm
index c8e309fc..b3361e05 100644
--- a/lib/PublicInbox/SearchIdx.pm
+++ b/lib/PublicInbox/SearchIdx.pm
@@ -615,6 +615,7 @@ sub index_both { # git->cat_async callback
 	$smsg->{num} = index_mm($self, $eml, $oid, $sync) or
 		die "E: could not generate NNTP article number for $oid";
 	add_message($self, $eml, $smsg, $sync);
+	++$self->{nidx};
 	my $cur_cmt = $sync->{cur_cmt} // die 'BUG: {cur_cmt} missing';
 	${$sync->{latest_cmt}} = $cur_cmt;
 }
@@ -629,6 +630,7 @@ sub unindex_both { # git->cat_async callback
 	if (defined(my $cur_cmt = $sync->{cur_cmt})) {
 		${$sync->{latest_cmt}} = $cur_cmt;
 	}
+	++$self->{nidx};
 }
 
 sub with_umask {
diff --git a/lib/PublicInbox/V2Writable.pm b/lib/PublicInbox/V2Writable.pm
index 2b849ddf..ca52874b 100644
--- a/lib/PublicInbox/V2Writable.pm
+++ b/lib/PublicInbox/V2Writable.pm
@@ -891,12 +891,22 @@ sub reindex_checkpoint ($$) {
 	$mm_tmp->atfork_parent if $mm_tmp;
 }
 
+sub index_finalize ($$) {
+	my ($arg, $index) = @_;
+	++$arg->{self}->{nidx};
+	if (defined(my $cur = $arg->{cur_cmt})) {
+		${$arg->{latest_cmt}} = $cur;
+	} elsif ($index) {
+		die 'BUG: {cur_cmt} missing';
+	} # else { unindexing @leftovers doesn't set {cur_cmt}
+}
+
 sub index_oid { # cat_async callback
 	my ($bref, $oid, $type, $size, $arg) = @_;
-	return if is_bad_blob($oid, $type, $size, $arg->{oid});
+	is_bad_blob($oid, $type, $size, $arg->{oid}) and
+		return index_finalize($arg, 1); # size == 0 purged returns here
 	my $self = $arg->{self};
 	local $self->{current_info} = "$self->{current_info} $oid";
-	return if $size == 0; # purged
 	my ($num, $mid0);
 	my $eml = PublicInbox::Eml->new($$bref);
 	my $mids = mids($eml);
@@ -967,7 +977,7 @@ sub index_oid { # cat_async callback
 	if (do_idx($self, $bref, $eml, $smsg)) {
 		${$arg->{need_checkpoint}} = 1;
 	}
-	${$arg->{latest_cmt}} = $arg->{cur_cmt} // die 'BUG: {cur_cmt} missing';
+	index_finalize($arg, 1);
 }
 
 # only update last_commit for $i on reindex iff newer than current
@@ -1157,11 +1167,12 @@ sub unindex_oid_aux ($$$) {
 }
 
 sub unindex_oid ($$;$) { # git->cat_async callback
-	my ($bref, $oid, $type, $size, $sync) = @_;
-	return if is_bad_blob($oid, $type, $size, $sync->{oid});
-	my $self = $sync->{self};
+	my ($bref, $oid, $type, $size, $arg) = @_;
+	is_bad_blob($oid, $type, $size, $arg->{oid}) and
+		return index_finalize($arg, 0);
+	my $self = $arg->{self};
 	local $self->{current_info} = "$self->{current_info} $oid";
-	my $unindexed = $sync->{in_unindex} ? $sync->{unindexed} : undef;
+	my $unindexed = $arg->{in_unindex} ? $arg->{unindexed} : undef;
 	my $mm = $self->{mm};
 	my $mids = mids(PublicInbox::Eml->new($bref));
 	undef $$bref;
@@ -1186,6 +1197,7 @@ sub unindex_oid ($$;$) { # git->cat_async callback
 		}
 		unindex_oid_aux($self, $oid, $mid);
 	}
+	index_finalize($arg, 0);
 }
 
 sub git { $_[0]->{ibx}->git }
diff --git a/script/public-inbox-index b/script/public-inbox-index
index 87893ef1..a17bf615 100755
--- a/script/public-inbox-index
+++ b/script/public-inbox-index
@@ -63,7 +63,7 @@ my @ibxs = PublicInbox::Admin::resolve_inboxes(\@ARGV, $opt, $cfg);
 PublicInbox::Admin::require_or_die('-index');
 unless (@ibxs) { print STDERR $help; exit 1 }
 
-my (@eidx_dir, %eidx_seen);
+my (@eidx, %eidx_seen);
 my $update_extindex = $opt->{'update-extindex'};
 if (!scalar(@$update_extindex) && (my $ALL = $cfg->ALL)) {
 	# extindex and normal inboxes may have different owners
@@ -84,7 +84,8 @@ for my $ei_name (@$update_extindex) {
 	} else {
 		die "extindex `$ei_name' not configured or found\n";
 	}
-	$eidx_seen{$topdir} //= push(@eidx_dir, $topdir);
+	$eidx_seen{$topdir} //=
+		push(@eidx, PublicInbox::ExtSearchIdx->new($topdir));
 }
 my $mods = {};
 my @eidx_unconfigured;
@@ -95,7 +96,7 @@ foreach my $ibx (@ibxs) {
 	$ibx->{indexlevel} //= $opt->{indexlevel} // ($opt->{xapian_only} ?
 			'full' : $detected);
 	PublicInbox::Admin::scan_ibx_modules($mods, $ibx);
-	if (@eidx_dir && $ibx->{-unconfigured}) {
+	if (@eidx && $ibx->{-unconfigured}) {
 		push @eidx_unconfigured, "  $ibx->{inboxdir}\n";
 	}
 }
@@ -128,18 +129,22 @@ publicInbox.$ibx->{name}.indexSequentialShard not boolean
 EOL
 		$ibx_opt = { %$opt, sequential_shard => $v };
 	}
-	PublicInbox::Admin::index_inbox($ibx, undef, $ibx_opt);
+	my $nidx = PublicInbox::Admin::index_inbox($ibx, undef, $ibx_opt);
 	last if $ibx_opt->{quit};
 	if (my $copt = $opt->{compact_opt}) {
 		local $copt->{jobs} = 0 if $ibx_opt->{sequential_shard};
 		PublicInbox::Xapcmd::run($ibx, 'compact', $copt);
 	}
-	next if $ibx->{-unconfigured};
 	last if $ibx_opt->{quit};
-	for my $dir (@eidx_dir) {
-		my $eidx = PublicInbox::ExtSearchIdx->new($dir);
+	next if $ibx->{-unconfigured} || !$nidx;
+	for my $eidx (@eidx) {
 		$eidx->attach_inbox($ibx);
-		$eidx->eidx_sync($ibx_opt);
-		last if $ibx_opt->{quit};
 	}
 }
+$opt->{-no_fsync} = 1 if !$opt->{fsync};
+my $pr = $opt->{-progress};
+for my $eidx (@eidx) {
+	$pr->("indexing $eidx->{topdir} ...\n") if $pr;
+	$eidx->eidx_sync($opt);
+	last if $opt->{quit};
+}
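
The control flow introduced above has two phases: index every inbox
and attach only those which reported work, then sync each external
index once at the end.  A simplified sketch, where the callbacks and
the {attached} list are hypothetical stand-ins for index_inbox,
attach_inbox and eidx_sync:

```perl
use strict;
use warnings;

# $index_cb returns the number of log entries processed for an inbox;
# each extindex here is just a hash with an {attached} array.
sub index_then_sync {
	my ($inboxes, $eidx, $index_cb, $sync_cb) = @_;
	for my $ibx (@$inboxes) {
		my $nidx = $index_cb->($ibx);
		next unless $nidx; # nothing new, leave extindex alone
		push @{$_->{attached}}, $ibx for @$eidx;
	}
	# one pass per external index after all per-inbox work is done
	$sync_cb->($_) for @$eidx;
}

my %work = (a => 3, b => 0, c => 1);
my @eidx = ({ name => 'all', attached => [] });
my @synced;
index_then_sync([qw(a b c)], \@eidx, sub { $work{$_[0]} },
	sub { push @synced, $_[0]->{name} });
print "attached: @{$eidx[0]->{attached}}\n";
```

Inbox "b" had nothing to index, so it never gets attached, and the
external index is synced once rather than once per inbox.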


* [PATCH 5/7] index: fix --no-fsync flag propagation to extindex
  2020-12-25 10:21 [PATCH 0/7] index + extindex interaction improvements Eric Wong
                   ` (3 preceding siblings ...)
  2020-12-25 10:21 ` [PATCH 4/7] index: do not attach inbox to extindex unless updated Eric Wong
@ 2020-12-25 10:21 ` Eric Wong
  2020-12-25 10:21 ` [PATCH 6/7] v2writable: don't verify tip if reindexing Eric Wong
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 13+ messages in thread
From: Eric Wong @ 2020-12-25 10:21 UTC
  To: meta

Negation in flag names is confusing, but trying to deviate from
the DB_NO_SYNC name used by Xapian is also confusing.
---
 lib/PublicInbox/ExtSearchIdx.pm | 2 +-
 script/public-inbox-index       | 3 +--
 2 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/lib/PublicInbox/ExtSearchIdx.pm b/lib/PublicInbox/ExtSearchIdx.pm
index 3f197973..e7fdae48 100644
--- a/lib/PublicInbox/ExtSearchIdx.pm
+++ b/lib/PublicInbox/ExtSearchIdx.pm
@@ -54,7 +54,7 @@ sub new {
 	}, __PACKAGE__;
 	$self->{shards} = $self->count_shards || nproc_shards($opt->{creat});
 	my $oidx = PublicInbox::OverIdx->new("$self->{xpfx}/over.sqlite3");
-	$oidx->{-no_fsync} = 1 if $opt->{-no_fsync};
+	$self->{-no_fsync} = $oidx->{-no_fsync} = 1 if !$opt->{fsync};
 	$self->{oidx} = $oidx;
 	$self
 }
diff --git a/script/public-inbox-index b/script/public-inbox-index
index a17bf615..c68f9224 100755
--- a/script/public-inbox-index
+++ b/script/public-inbox-index
@@ -85,7 +85,7 @@ for my $ei_name (@$update_extindex) {
 		die "extindex `$ei_name' not configured or found\n";
 	}
 	$eidx_seen{$topdir} //=
-		push(@eidx, PublicInbox::ExtSearchIdx->new($topdir));
+		push(@eidx, PublicInbox::ExtSearchIdx->new($topdir, $opt));
 }
 my $mods = {};
 my @eidx_unconfigured;
@@ -141,7 +141,6 @@ EOL
 		$eidx->attach_inbox($ibx);
 	}
 }
-$opt->{-no_fsync} = 1 if !$opt->{fsync};
 my $pr = $opt->{-progress};
 for my $eidx (@eidx) {
 	$pr->("indexing $eidx->{topdir} ...\n") if $pr;
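
To illustrate the flag negation discussed above: the user-facing
option is positive ("fsync", negatable on the command line), while
the internal key is negative (-no_fsync).  The sketch below assumes
fsync defaults to on and uses Getopt::Long's "!" suffix for the
negatable switch; parse_fsync() is a hypothetical helper, and the
real option spec in public-inbox-index may differ:

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Derive the internal -no_fsync key from the user-facing fsync
# option in exactly one place.
sub parse_fsync {
	my (@argv) = @_;
	my $opt = { fsync => 1 }; # assumed default for this sketch
	GetOptionsFromArray(\@argv, $opt, 'fsync!') or die "bad usage\n";
	my $internal = {};
	$internal->{-no_fsync} = 1 if !$opt->{fsync};
	$internal;
}

my $res = parse_fsync('--no-fsync');
print "fsync disabled\n" if $res->{-no_fsync};
```

Doing the translation once, where the object is constructed, avoids
the situation fixed above where one caller set the negative key and
another never saw it.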


* [PATCH 6/7] v2writable: don't verify tip if reindexing
  2020-12-25 10:21 [PATCH 0/7] index + extindex interaction improvements Eric Wong
                   ` (4 preceding siblings ...)
  2020-12-25 10:21 ` [PATCH 5/7] index: fix --no-fsync flag propagation to extindex Eric Wong
@ 2020-12-25 10:21 ` Eric Wong
  2020-12-25 10:21 ` [PATCH 7/7] index: filter out indexlevel=basic from extindex Eric Wong
  2020-12-25 10:39 ` [PATCH 0/7] index + extindex interaction improvements Eric Wong
  7 siblings, 0 replies; 13+ messages in thread
From: Eric Wong @ 2020-12-25 10:21 UTC
  To: meta

We only rely on git-rev-parse to resolve symbolic names ("HEAD")
to a SHA-* git commit ID.  We'll assume any git commit IDs we
get from SQLite DBs are valid and let "git-log" fail if they
aren't.
---
 lib/PublicInbox/V2Writable.pm | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/lib/PublicInbox/V2Writable.pm b/lib/PublicInbox/V2Writable.pm
index ca52874b..f20b5c7f 100644
--- a/lib/PublicInbox/V2Writable.pm
+++ b/lib/PublicInbox/V2Writable.pm
@@ -1104,12 +1104,14 @@ sub sync_prepare ($$) {
 		-d $git_dir or next; # missing epochs are fine
 		my $git = PublicInbox::Git->new($git_dir);
 		my $unit = { git => $git, epoch => $i };
+		my $tip;
 		if ($reindex_heads) {
-			$head = $reindex_heads->[$i] or next;
+			$tip = $head = $reindex_heads->[$i] or next;
+		} else {
+			$tip = $git->qx(qw(rev-parse -q --verify), $head);
+			next if $?; # new repo
+			chomp $tip;
 		}
-		chomp(my $tip = $git->qx(qw(rev-parse -q --verify), $head));
-		next if $?; # new repo
-
 		my $range = log_range($sync, $unit, $tip) or next;
 		# can't use 'rev-list --count' if we use --diff-filter
 		$pr->("$pfx $i.git counting $range ... ") if $pr;
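
The tip selection above boils down to: trust a stored commit ID as-is,
and only ask git to resolve a symbolic name otherwise.  A sketch under
the assumption that the resolver is passed in as a callback ($resolve
stands in for "git rev-parse -q --verify" and returns undef for a new
repo; pick_tip() is a hypothetical name):

```perl
use strict;
use warnings;

sub pick_tip {
	my ($head, $reindex_head, $resolve) = @_;
	# commit IDs we stored ourselves are trusted, no subprocess
	return $reindex_head if defined $reindex_head;
	my $tip = $resolve->($head) // return; # new repo, nothing to do
	chomp $tip;
	$tip;
}

my %refs = (HEAD => ('a' x 40) . "\n");
my $resolve = sub { $refs{$_[0]} };
print pick_tip('HEAD', undef, $resolve), "\n";   # resolved via callback
print pick_tip('HEAD', 'b' x 40, $resolve), "\n"; # stored ID wins
```

If a stored ID turns out to be bogus, the later log walk is left to
fail on its own, as the commit message describes.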


* [PATCH 7/7] index: filter out indexlevel=basic from extindex
  2020-12-25 10:21 [PATCH 0/7] index + extindex interaction improvements Eric Wong
                   ` (5 preceding siblings ...)
  2020-12-25 10:21 ` [PATCH 6/7] v2writable: don't verify tip if reindexing Eric Wong
@ 2020-12-25 10:21 ` Eric Wong
  2020-12-25 10:39 ` [PATCH 0/7] index + extindex interaction improvements Eric Wong
  7 siblings, 0 replies; 13+ messages in thread
From: Eric Wong @ 2020-12-25 10:21 UTC
  To: meta

extindex users will likely want to use indexlevel=basic for
per-inbox indices; however, extindex itself doesn't support the
basic index level (yet?).  Let's ensure we don't trip up extindex
users who specify "-L basic" on the -index command-line.
---
 script/public-inbox-index | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/script/public-inbox-index b/script/public-inbox-index
index c68f9224..0fdfddc0 100755
--- a/script/public-inbox-index
+++ b/script/public-inbox-index
@@ -84,8 +84,10 @@ for my $ei_name (@$update_extindex) {
 	} else {
 		die "extindex `$ei_name' not configured or found\n";
 	}
+	my $o = { %$opt };
+	delete $o->{indexlevel} if ($o->{indexlevel}//'') eq 'basic';
 	$eidx_seen{$topdir} //=
-		push(@eidx, PublicInbox::ExtSearchIdx->new($topdir, $opt));
+		push(@eidx, PublicInbox::ExtSearchIdx->new($topdir, $o));
 }
 my $mods = {};
 my @eidx_unconfigured;
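
The filtering above works on a shallow copy, so the caller's options
still carry "basic" for per-inbox indexing.  A small sketch of the
same idea; extindex_opt() is a hypothetical wrapper for illustration:

```perl
use strict;
use warnings;

# Return a copy of $opt suitable for an external index, leaving the
# caller's hash untouched.
sub extindex_opt {
	my ($opt) = @_;
	my $o = { %$opt }; # shallow copy
	delete $o->{indexlevel} if ($o->{indexlevel} // '') eq 'basic';
	$o;
}

my $opt = { indexlevel => 'basic', jobs => 2 };
my $o = extindex_opt($opt);
print "per-inbox level: $opt->{indexlevel}\n";
print "extindex level: ", $o->{indexlevel} // '(default)', "\n";
```

The "// ''" guard keeps the string comparison quiet under "use
warnings" when no index level was given at all.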


* Re: [PATCH 0/7] index + extindex interaction improvements
  2020-12-25 10:21 [PATCH 0/7] index + extindex interaction improvements Eric Wong
                   ` (6 preceding siblings ...)
  2020-12-25 10:21 ` [PATCH 7/7] index: filter out indexlevel=basic from extindex Eric Wong
@ 2020-12-25 10:39 ` Eric Wong
  2020-12-26  1:44   ` [PATCH 0/3] extindex --watch support Eric Wong
  7 siblings, 1 reply; 13+ messages in thread
From: Eric Wong @ 2020-12-25 10:39 UTC
  To: meta

Eric Wong <e@80x24.org> wrote:
> Some things which make -index less painful when auto-updating
> external indices.
> 
> "public-inbox-extindex --all" itself is still painfully slow
> with 50K inboxes, but I think that can only be used once for
> initialization and -index can be relied on for all incremental
> updates.

I've been wondering if --watch mode (using inotify) might be a
good idea for indexing in mirrors, too (or with -mda).

This would be especially helpful for those who want to keep
extindex directories owned by a different user than the one
who writes to inboxes.


* [PATCH 0/3] extindex --watch support
  2020-12-25 10:39 ` [PATCH 0/7] index + extindex interaction improvements Eric Wong
@ 2020-12-26  1:44   ` Eric Wong
  2020-12-26  1:44     ` [PATCH 1/3] default to CORE::warn in $SIG{__WARN__} handlers Eric Wong
                       ` (2 more replies)
  0 siblings, 3 replies; 13+ messages in thread
From: Eric Wong @ 2020-12-26  1:44 UTC
  To: meta

"public-inbox-extindex --watch --all" is nice, and now maybe
public-inbox-watch or -mda won't /need/ updating to support
extindex.

1/3 is me still learning Perl, brown paper bag on 3/3

Eric Wong (3):
  default to CORE::warn in $SIG{__WARN__} handlers
  extindex: --watch for inotify-based updates
  init: use the return value of rel2abs_collapsed

 lib/PublicInbox/Admin.pm         |   2 +-
 lib/PublicInbox/ExtSearchIdx.pm  | 128 ++++++++++++++++++++++++++++---
 lib/PublicInbox/InboxIdle.pm     |   8 +-
 lib/PublicInbox/InboxWritable.pm |   2 +-
 lib/PublicInbox/OverIdx.pm       |   8 +-
 lib/PublicInbox/V2Writable.pm    |   2 +-
 lib/PublicInbox/Watch.pm         |   6 +-
 script/public-inbox-extindex     |  19 ++++-
 script/public-inbox-init         |   2 +-
 9 files changed, 153 insertions(+), 24 deletions(-)


* [PATCH 1/3] default to CORE::warn in $SIG{__WARN__} handlers
  2020-12-26  1:44   ` [PATCH 0/3] extindex --watch support Eric Wong
@ 2020-12-26  1:44     ` Eric Wong
  2020-12-26  1:44     ` [PATCH 2/3] extindex: --watch for inotify-based updates Eric Wong
  2020-12-26  1:44     ` [PATCH 3/3] init: use the return value of rel2abs_collapsed Eric Wong
  2 siblings, 0 replies; 13+ messages in thread
From: Eric Wong @ 2020-12-26  1:44 UTC
  To: meta

As with CORE::die and $SIG{__DIE__}, it turns out CORE::warn is
safe to use inside $SIG{__WARN__} handlers without triggering
infinite recursion.  So fall back to reusing CORE::warn instead
of creating a new sub.
---
 lib/PublicInbox/Admin.pm         | 2 +-
 lib/PublicInbox/ExtSearchIdx.pm  | 2 +-
 lib/PublicInbox/InboxWritable.pm | 2 +-
 lib/PublicInbox/Watch.pm         | 6 +++---
 4 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/lib/PublicInbox/Admin.pm b/lib/PublicInbox/Admin.pm
index b468108e..d414e4e2 100644
--- a/lib/PublicInbox/Admin.pm
+++ b/lib/PublicInbox/Admin.pm
@@ -241,7 +241,7 @@ sub index_inbox {
 	}
 	local %SIG = %SIG;
 	setup_signals(\&index_terminate, $ibx);
-	my $warn_cb = $SIG{__WARN__} // sub { print STDERR @_ };
+	my $warn_cb = $SIG{__WARN__} // \&CORE::warn;
 	my $idx = { current_info => $ibx->{inboxdir} };
 	my $warn_ignore = PublicInbox::InboxWritable->can('warn_ignore');
 	local $SIG{__WARN__} = sub {
diff --git a/lib/PublicInbox/ExtSearchIdx.pm b/lib/PublicInbox/ExtSearchIdx.pm
index e7fdae48..64ebf6db 100644
--- a/lib/PublicInbox/ExtSearchIdx.pm
+++ b/lib/PublicInbox/ExtSearchIdx.pm
@@ -841,7 +841,7 @@ sub eidx_reindex {
 sub eidx_sync { # main entry point
 	my ($self, $opt) = @_;
 
-	my $warn_cb = $SIG{__WARN__} || sub { print STDERR @_ };
+	my $warn_cb = $SIG{__WARN__} || \&CORE::warn;
 	local $self->{current_info} = '';
 	local $SIG{__WARN__} = sub {
 		$warn_cb->($self->{current_info}, ': ', @_);
diff --git a/lib/PublicInbox/InboxWritable.pm b/lib/PublicInbox/InboxWritable.pm
index 31eb3f15..b1d5caf5 100644
--- a/lib/PublicInbox/InboxWritable.pm
+++ b/lib/PublicInbox/InboxWritable.pm
@@ -292,7 +292,7 @@ sub warn_ignore {
 
 # this expects to be RHS in this assignment: "local $SIG{__WARN__} = ..."
 sub warn_ignore_cb {
-	my $cb = $SIG{__WARN__} // sub { print STDERR @_ };
+	my $cb = $SIG{__WARN__} // \&CORE::warn;
 	sub {
 		return if warn_ignore(@_);
 		$cb->(@_);
diff --git a/lib/PublicInbox/Watch.pm b/lib/PublicInbox/Watch.pm
index e1246096..bc296e01 100644
--- a/lib/PublicInbox/Watch.pm
+++ b/lib/PublicInbox/Watch.pm
@@ -217,7 +217,7 @@ sub _try_path {
 		warn "unmappable dir: $1\n";
 		return;
 	}
-	my $warn_cb = $SIG{__WARN__} || sub { print STDERR @_ };
+	my $warn_cb = $SIG{__WARN__} || \&CORE::warn;
 	local $SIG{__WARN__} = sub {
 		my $pfx = ($_[0] // '') =~ /^([A-Z]: )/g ? $1 : '';
 		$warn_cb->($pfx, "path: $path\n", @_);
@@ -467,7 +467,7 @@ sub imap_fetch_all ($$$) {
 	my $key = $req;
 	$key =~ s/\.PEEK//;
 	my ($uids, $batch);
-	my $warn_cb = $SIG{__WARN__} || sub { print STDERR @_ };
+	my $warn_cb = $SIG{__WARN__} || \&CORE::warn;
 	local $SIG{__WARN__} = sub {
 		my $pfx = ($_[0] // '') =~ /^([A-Z]: )/g ? $1 : '';
 		$batch //= '?';
@@ -929,7 +929,7 @@ sub nntp_fetch_all ($$$) {
 	$beg = $l_art + 1;
 
 	warn "I: $url fetching ARTICLE $beg..$end\n";
-	my $warn_cb = $SIG{__WARN__} || sub { print STDERR @_ };
+	my $warn_cb = $SIG{__WARN__} || \&CORE::warn;
 	my ($err, $art);
 	local $SIG{__WARN__} = sub {
 		my $pfx = ($_[0] // '') =~ /^([A-Z]: )/g ? $1 : '';
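
The handler chaining used throughout this patch can be reduced to a
few lines.  The sketch below assumes Perl 5.16 or later (as far as I
know, that is when \&CORE::warn became referenceable), and
with_warn_prefix() is a hypothetical helper, not a public-inbox API:

```perl
use v5.16; # for subs in the CORE:: namespace
use strict;
use warnings;

# Run $code with every warning prefixed by $pfx, chaining to whatever
# handler was already installed, or to the builtin warn if none.
sub with_warn_prefix {
	my ($pfx, $code) = @_;
	my $warn_cb = $SIG{__WARN__} // \&CORE::warn;
	local $SIG{__WARN__} = sub { $warn_cb->($pfx, ': ', @_) };
	$code->();
}

# An outer handler which collects warnings instead of printing them
my @seen;
$SIG{__WARN__} = sub { push @seen, join('', @_) };
with_warn_prefix('inbox-a', sub { warn "something odd\n" });
print @seen;
```

Because the inner handler is installed with "local", the previous
handler is restored as soon as with_warn_prefix() returns.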


* [PATCH 2/3] extindex: --watch for inotify-based updates
  2020-12-26  1:44   ` [PATCH 0/3] extindex --watch support Eric Wong
  2020-12-26  1:44     ` [PATCH 1/3] default to CORE::warn in $SIG{__WARN__} handlers Eric Wong
@ 2020-12-26  1:44     ` Eric Wong
  2020-12-26  1:44     ` [PATCH 3/3] init: use the return value of rel2abs_collapsed Eric Wong
  2 siblings, 0 replies; 13+ messages in thread
From: Eric Wong @ 2020-12-26  1:44 UTC
  To: meta

This reuses existing InboxIdle infrastructure to update external
indices based on per-inbox updates.  This is an alternative to
auto-updating external indices via the -index command and also
works with existing uses of -mda and public-inbox-watch.

Using inotify (or EVFILT_VNODE) allows watching thousands of
inboxes without having to scan every single one at every
invocation.

This is especially beneficial in cases where an external index
is not writable by the users writing to per-inbox indices.
---
 lib/PublicInbox/ExtSearchIdx.pm | 126 ++++++++++++++++++++++++++++++--
 lib/PublicInbox/InboxIdle.pm    |   8 +-
 lib/PublicInbox/OverIdx.pm      |   8 +-
 lib/PublicInbox/V2Writable.pm   |   2 +-
 script/public-inbox-extindex    |  19 ++++-
 5 files changed, 146 insertions(+), 17 deletions(-)

diff --git a/lib/PublicInbox/ExtSearchIdx.pm b/lib/PublicInbox/ExtSearchIdx.pm
index 64ebf6db..53ff2ca1 100644
--- a/lib/PublicInbox/ExtSearchIdx.pm
+++ b/lib/PublicInbox/ExtSearchIdx.pm
@@ -630,7 +630,7 @@ sub eidxq_process ($$) { # for reindexing
 	my $dbh = $self->{oidx}->dbh;
 	my $tot = $dbh->selectrow_array('SELECT COUNT(*) FROM eidxq') or return;
 	${$sync->{nr}} = 0;
-	$sync->{-regen_fmt} = "%u/$tot\n";
+	local $sync->{-regen_fmt} = "%u/$tot\n";
 	my $pr = $sync->{-opt}->{-progress};
 	if ($pr) {
 		my $min = $dbh->selectrow_array('SELECT MIN(docid) FROM eidxq');
@@ -709,7 +709,8 @@ sub _reindex_check_unseen ($$$) {
 	my $msgs;
 	my $pr = $sync->{-opt}->{-progress};
 	my $ekey = $ibx->eidx_key;
-	$sync->{-regen_fmt} = "$ekey checking unseen %u/".$ibx->over->max."\n";
+	local $sync->{-regen_fmt} =
+			"$ekey checking unseen %u/".$ibx->over->max."\n";
 	${$sync->{nr}} = 0;
 
 	while (scalar(@{$msgs = $ibx->over->query_xover($beg, $end)})) {
@@ -752,7 +753,7 @@ sub _reindex_check_stale ($$$) {
 	my $pr = $sync->{-opt}->{-progress};
 	my $fetching;
 	my $ekey = $ibx->eidx_key;
-	$sync->{-regen_fmt} =
+	local $sync->{-regen_fmt} =
 			"$ekey check stale/missing %u/".$ibx->over->max."\n";
 	${$sync->{nr}} = 0;
 	do {
@@ -838,6 +839,13 @@ sub eidx_reindex {
 	eidxq_process($self, $sync) unless $sync->{quit};
 }
 
+sub sync_inbox {
+	my ($self, $sync, $ibx) = @_;
+	my $err = _sync_inbox($self, $sync, $ibx);
+	delete @$ibx{qw(mm over)};
+	warn $err, "\n" if defined($err);
+}
+
 sub eidx_sync { # main entry point
 	my ($self, $opt) = @_;
 
@@ -868,22 +876,21 @@ sub eidx_sync { # main entry point
 		$ibx->{-ibx_id} //= $self->{oidx}->ibx_id($ibx->eidx_key);
 	}
 	if (delete($opt->{reindex})) {
-		$sync->{checkpoint_unlocks} = 1;
+		local $sync->{checkpoint_unlocks} = 1;
 		eidx_reindex($self, $sync);
 	}
 
 	# don't use $_ here, it'll get clobbered by reindex_checkpoint
 	for my $ibx (@{$self->{ibx_list}}) {
 		last if $sync->{quit};
-		my $err = _sync_inbox($self, $sync, $ibx);
-		delete @$ibx{qw(mm over)};
-		warn $err, "\n" if defined($err);
+		sync_inbox($self, $sync, $ibx);
 	}
 	$self->{oidx}->rethread_done($opt) unless $sync->{quit};
 	eidxq_process($self, $sync) unless $sync->{quit};
 
 	eidxq_release($self);
-	PublicInbox::V2Writable::done($self);
+	done($self);
+	$sync; # for eidx_watch
 }
 
 sub update_last_commit { # overrides V2Writable
@@ -970,6 +977,109 @@ sub idx_init { # similar to V2Writable
 	$self->{midx}->begin_txn;
 }
 
+sub _watch_commit { # PublicInbox::DS::add_timer callback
+	my ($self) = @_;
+	delete $self->{-commit_timer};
+	eidxq_process($self, $self->{-watch_sync});
+	eidxq_release($self);
+	delete local $self->{-watch_sync}->{-regen_fmt};
+	reindex_checkpoint($self, $self->{-watch_sync});
+
+	# call event_step => done unless commit_timer is armed
+	PublicInbox::DS::requeue($self);
+}
+
+sub on_inbox_unlock { # called by PublicInbox::InboxIdle
+	my ($self, $ibx) = @_;
+	my $opt = $self->{-watch_sync}->{-opt};
+	my $pr = $opt->{-progress};
+	my $ekey = $ibx->eidx_key;
+	local $0 = "sync $ekey";
+	$pr->("indexing $ekey\n") if $pr;
+	$self->idx_init($opt);
+	sync_inbox($self, $self->{-watch_sync}, $ibx);
+	$self->{-commit_timer} //= PublicInbox::DS::add_timer(
+					$opt->{'commit-interval'} // 10,
+					\&_watch_commit, $self);
+}
+
+sub eidx_reload { # -extindex --watch SIGHUP handler
+	my ($self, $idler) = @_;
+	if ($self->{cfg}) {
+		my $pr = $self->{-watch_sync}->{-opt}->{-progress};
+		$pr->('reloading ...') if $pr;
+		@{$self->{ibx_list}} = ();
+		%{$self->{ibx_map}} = ();
+		delete $self->{-watch_sync}->{id2pos};
+		my $cfg = PublicInbox::Config->new;
+		attach_config($self, $cfg);
+		$idler->refresh($cfg);
+		$pr->(" done\n") if $pr;
+	} else {
+		warn "reload not supported without --all\n";
+	}
+}
+
+sub eidx_resync_start ($) { # -extindex --watch SIGUSR1 handler
+	my ($self) = @_;
+	$self->{-resync_queue} //= [ @{$self->{ibx_list}} ];
+	PublicInbox::DS::requeue($self); # trigger our ->event_step
+}
+
+sub event_step { # PublicInbox::DS::requeue callback
+	my ($self) = @_;
+	if (my $resync_queue = $self->{-resync_queue}) {
+		if (my $ibx = shift(@$resync_queue)) {
+			on_inbox_unlock($self, $ibx);
+			PublicInbox::DS::requeue($self);
+		} else {
+			delete $self->{-resync_queue};
+			_watch_commit($self);
+		}
+	} else {
+		done($self) unless $self->{-commit_timer};
+	}
+}
+
+sub eidx_watch { # public-inbox-extindex --watch main loop
+	my ($self, $opt) = @_;
+	require PublicInbox::InboxIdle;
+	require PublicInbox::DS;
+	require PublicInbox::Syscall;
+	require PublicInbox::Sigfd;
+	my $idler = PublicInbox::InboxIdle->new($self->{cfg});
+	if (!$self->{cfg}) {
+		$idler->watch_inbox($_) for @{$self->{ibx_list}};
+	}
+	$_->subscribe_unlock(__PACKAGE__, $self) for @{$self->{ibx_list}};
+	my $sync = eidx_sync($self, $opt); # initial sync
+	return if $sync->{quit};
+	my $oldset = PublicInbox::Sigfd::block_signals();
+	local $self->{current_info} = '';
+	my $cb = $SIG{__WARN__} || \&CORE::warn;
+	local $SIG{__WARN__} = sub { $cb->($self->{current_info}, ': ', @_) };
+	my $sig = {
+		HUP => sub { eidx_reload($self, $idler) },
+		USR1 => sub { eidx_resync_start($self) },
+		TSTP => sub { kill('STOP', $$) },
+	};
+	my $quit = PublicInbox::SearchIdx::quit_cb($sync);
+	$sig->{QUIT} = $sig->{INT} = $sig->{TERM} = $quit;
+	my $sigfd = PublicInbox::Sigfd->new($sig,
+					$PublicInbox::Syscall::SFD_NONBLOCK);
+	local %SIG = (%SIG, %$sig) if !$sigfd;
+	local $self->{-watch_sync} = $sync; # for ->on_inbox_unlock
+	if (!$sigfd) {
+		# wake up every second to accept signals if we don't
+		# have signalfd or IO::KQueue:
+		PublicInbox::Sigfd::sig_setmask($oldset);
+		PublicInbox::DS->SetLoopTimeout(1000);
+	}
+	PublicInbox::DS->SetPostLoopCallback(sub { !$sync->{quit} });
+	PublicInbox::DS->EventLoop; # calls InboxIdle->event_step
+	done($self);
+}
+
 no warnings 'once';
 *done = \&PublicInbox::V2Writable::done;
 *with_umask = \&PublicInbox::InboxWritable::with_umask;
diff --git a/lib/PublicInbox/InboxIdle.pm b/lib/PublicInbox/InboxIdle.pm
index f1cbc012..34606186 100644
--- a/lib/PublicInbox/InboxIdle.pm
+++ b/lib/PublicInbox/InboxIdle.pm
@@ -49,6 +49,9 @@ sub refresh {
 	$pi_cfg->each_inbox(\&in2_arm, $self);
 }
 
+# internal API for ease-of-use
+sub watch_inbox { in2_arm($_[1], $_[0]) };
+
 sub new {
 	my ($class, $pi_cfg) = @_;
 	my $self = bless {}, $class;
@@ -64,7 +67,7 @@ sub new {
 	$self->{inot} = $inot;
 	$self->{pathmap} = {}; # inboxdir => [ ibx, watch1, watch2, watch3...]
 	$self->{on_unlock} = {}; # lock path => ibx
-	refresh($self, $pi_cfg);
+	refresh($self, $pi_cfg) if $pi_cfg;
 	PublicInbox::FakeInotify::poll_once($self) if !$ino_cls;
 	$self;
 }
@@ -75,7 +78,8 @@ sub event_step {
 		my @events = $self->{inot}->read; # Linux::Inotify2::read
 		my $on_unlock = $self->{on_unlock};
 		for my $ev (@events) {
-			if (my $ibx = $on_unlock->{$ev->fullname}) {
+			my $fn = $ev->fullname // next; # cancelled
+			if (my $ibx = $on_unlock->{$fn}) {
 				$ibx->on_unlock;
 			}
 		}
diff --git a/lib/PublicInbox/OverIdx.pm b/lib/PublicInbox/OverIdx.pm
index 4a39bf53..dcc2cff3 100644
--- a/lib/PublicInbox/OverIdx.pm
+++ b/lib/PublicInbox/OverIdx.pm
@@ -473,10 +473,14 @@ sub dbh_close {
 
 sub create {
 	my ($self) = @_;
-	unless (-r $self->{filename}) {
+	my $fn = $self->{filename} // do {
+		Carp::confess('BUG: no {filename}') unless $self->{dbh};
+		return;
+	};
+	unless (-r $fn) {
 		require File::Path;
 		require File::Basename;
-		File::Path::mkpath(File::Basename::dirname($self->{filename}));
+		File::Path::mkpath(File::Basename::dirname($fn));
 	}
 	# create the DB:
 	PublicInbox::Over::dbh($self);
diff --git a/lib/PublicInbox/V2Writable.pm b/lib/PublicInbox/V2Writable.pm
index f20b5c7f..567582c5 100644
--- a/lib/PublicInbox/V2Writable.pm
+++ b/lib/PublicInbox/V2Writable.pm
@@ -879,7 +879,7 @@ sub reindex_checkpoint ($$) {
 		$self->done; # release lock
 	}
 
-	if (my $pr = $sync->{-opt}->{-progress}) {
+	if (my $pr = $sync->{-regen_fmt} ? $sync->{-opt}->{-progress} : undef) {
 		$pr->(sprintf($sync->{-regen_fmt}, ${$sync->{nr}}));
 	}
 
diff --git a/script/public-inbox-extindex b/script/public-inbox-extindex
index 17ad59fa..607baa3e 100644
--- a/script/public-inbox-extindex
+++ b/script/public-inbox-extindex
@@ -11,6 +11,7 @@ usage: public-inbox-extindex [options] EXTINDEX_DIR [INBOX_DIR]
   Create and update external (detached) search indices
 
   --no-fsync          speed up indexing, risk corruption on power outage
+  --watch             run persistently and watch for inbox updates
   -L LEVEL            `medium', or `full' (default: full)
   --all               index all configured inboxes
   --jobs=NUM          set or disable parallelization (NUM=0)
@@ -27,7 +28,7 @@ GetOptions($opt, qw(verbose|v+ reindex rethread compact|c+ jobs|j=i
 		fsync|sync!
 		indexlevel|index-level|L=s max_size|max-size=s
 		batch_size|batch-size=s
-		gc
+		gc commit-interval=i watch
 		all help|h))
 	or die $help;
 if ($opt->{help}) { print $help; exit 0 };
@@ -41,7 +42,8 @@ my $cfg = PublicInbox::Config->new;
 my @ibxs;
 if ($opt->{gc}) {
 	die "E: inbox paths must not be specified with --gc\n" if @ARGV;
-	die "E: --all not compatible --gc\n" if $opt->{all};
+	die "E: --all not compatible with --gc\n" if $opt->{all};
+	die "E: --watch is not compatible with --gc\n" if $opt->{watch};
 } else {
 	@ibxs = PublicInbox::Admin::resolve_inboxes(\@ARGV, $opt, $cfg);
 }
@@ -56,6 +58,15 @@ if ($opt->{gc}) {
 	$eidx->attach_config($cfg);
 	$eidx->eidx_gc($opt);
 } else {
-	$eidx->attach_inbox($_) for @ibxs;
-	$eidx->eidx_sync($opt);
+	if ($opt->{all}) {
+		$eidx->attach_config($cfg);
+	} else {
+		$eidx->attach_inbox($_) for @ibxs;
+	}
+	if ($opt->{watch}) {
+		$cfg = undef; # save memory only after SIGHUP
+		$eidx->eidx_watch($opt);
+	} else {
+		$eidx->eidx_sync($opt);
+	}
 }
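
The switch from plain assignment to `local` for $sync->{-regen_fmt} and $sync->{checkpoint_unlocks} matters once $sync outlives a single run, as it does under --watch. A minimal sketch of that scoping, assuming a simplified stand-in for reindex_checkpoint (the `checkpoint` and `check_unseen` subs below are illustrative, not the project's code):

```perl
use strict;
use warnings;

# Simplified stand-in for reindex_checkpoint: format progress only
# when a -regen_fmt is currently set (mirrors the patched guard).
sub checkpoint {
	my ($sync) = @_;
	my $fmt = $sync->{-regen_fmt} or return '';
	sprintf($fmt, ${$sync->{nr}});
}

# Hypothetical caller: the format is visible to callees only
# until this sub returns.
sub check_unseen {
	my ($sync) = @_;
	local $sync->{-regen_fmt} = "checking unseen %u/100\n";
	${$sync->{nr}} = 42;
	checkpoint($sync);
}

my $nr = 0;
my $sync = { nr => \$nr };
my $during = check_unseen($sync); # format was set inside the call
my $after = checkpoint($sync);    # format is gone again here
print $during;
print "format still set\n" if exists $sync->{-regen_fmt};
```

Since the key did not exist before `local`, Perl deletes it again on scope exit, so a later checkpoint from the long-lived watcher does not reuse a stale progress format.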

* [PATCH 3/3] init: use the return value of rel2abs_collapsed
  2020-12-26  1:44   ` [PATCH 0/3] extindex --watch support Eric Wong
  2020-12-26  1:44     ` [PATCH 1/3] default to CORE::warn in $SIG{__WARN__} handlers Eric Wong
  2020-12-26  1:44     ` [PATCH 2/3] extindex: --watch for inotify-based updates Eric Wong
@ 2020-12-26  1:44     ` Eric Wong
  2 siblings, 0 replies; 13+ messages in thread
From: Eric Wong @ 2020-12-26  1:44 UTC (permalink / raw)
  To: meta

Fixes: 9fcce78e40b0a7c6 ("script/public-inbox-*: favor caller-provided pathnames")
---
 script/public-inbox-init | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/script/public-inbox-init b/script/public-inbox-init
index afaa4c12..6d538e43 100755
--- a/script/public-inbox-init
+++ b/script/public-inbox-init
@@ -138,7 +138,7 @@ close($fh) or die "failed to close $pi_config_tmp: $!\n";
 my $pfx = "publicinbox.$name";
 my @x = (qw/git config/, "--file=$pi_config_tmp");
 
-PublicInbox::Config::rel2abs_collapsed($inboxdir);
+$inboxdir = PublicInbox::Config::rel2abs_collapsed($inboxdir);
 die "`\\n' not allowed in `$inboxdir'\n" if index($inboxdir, "\n") >= 0;
 
 if (-f "$inboxdir/inbox.lock") {
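
The one-line change above reflects a common pitfall: a path helper that returns its result leaves the caller's variable untouched unless the result is assigned back. A rough sketch, assuming a Unix-style path and a hypothetical `collapse_path` helper standing in for rel2abs_collapsed (whose real implementation is not shown in this thread):

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical stand-in for rel2abs_collapsed: returns a new
# absolute path with "." and ".." components collapsed and
# leaves its argument unmodified.
sub collapse_path {
	my ($path) = @_;
	my @out;
	for my $part (split(m{/+}, File::Spec->rel2abs($path))) {
		next if $part eq '' || $part eq '.';
		if ($part eq '..') { pop @out } else { push @out, $part }
	}
	'/' . join('/', @out);
}

my $inboxdir = '/srv/./inboxes/../inboxes/meta';
collapse_path($inboxdir);             # result discarded, no effect
my $unchanged = $inboxdir;
$inboxdir = collapse_path($inboxdir); # result kept, as in the fix
print "$unchanged\n$inboxdir\n";
```

Perl does not warn about discarding the return value of a user-defined sub, which is likely why a call like the pre-patch one can go unnoticed.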

Thread overview: 13+ messages
2020-12-25 10:21 [PATCH 0/7] index + extindex interaction improvements Eric Wong
2020-12-25 10:21 ` [PATCH 1/7] index: disable --fast-noop on --reindex Eric Wong
2020-12-25 10:21 ` [PATCH 2/7] extsearchidx: delay SQLite availability checks Eric Wong
2020-12-25 10:21 ` [PATCH 3/7] extsearchidx: close DB handles after use if FD constrained Eric Wong
2020-12-25 10:21 ` [PATCH 4/7] index: do not attach inbox to extindex unless updated Eric Wong
2020-12-25 10:21 ` [PATCH 5/7] index: fix --no-fsync flag propagation to extindex Eric Wong
2020-12-25 10:21 ` [PATCH 6/7] v2writable: don't verify tip if reindexing Eric Wong
2020-12-25 10:21 ` [PATCH 7/7] index: filter out indexlevel=basic from extindex Eric Wong
2020-12-25 10:39 ` [PATCH 0/7] index + extindex interaction improvements Eric Wong
2020-12-26  1:44   ` [PATCH 0/3] extindex --watch support Eric Wong
2020-12-26  1:44     ` [PATCH 1/3] default to CORE::warn in $SIG{__WARN__} handlers Eric Wong
2020-12-26  1:44     ` [PATCH 2/3] extindex: --watch for inotify-based updates Eric Wong
2020-12-26  1:44     ` [PATCH 3/3] init: use the return value of rel2abs_collapsed Eric Wong

This inbox may be cloned and mirrored by anyone:

	git clone --mirror http://public-inbox.org/meta
	git clone --mirror http://czquwvybam4bgbro.onion/meta
	git clone --mirror http://hjrcffqmbrq6wope.onion/meta
	git clone --mirror http://ou63pmih66umazou.onion/meta

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V1 meta meta/ http://public-inbox.org/meta \
		meta@public-inbox.org
	public-inbox-index meta

Newsgroups are available over NNTP:
	nntp://news.public-inbox.org/inbox.comp.mail.public-inbox.meta
	nntp://ou63pmih66umazou.onion/inbox.comp.mail.public-inbox.meta
	nntp://czquwvybam4bgbro.onion/inbox.comp.mail.public-inbox.meta
	nntp://hjrcffqmbrq6wope.onion/inbox.comp.mail.public-inbox.meta
	nntp://news.gmane.io/gmane.mail.public-inbox.general
 note: .onion URLs require Tor: https://www.torproject.org/

code repositories for the project(s) associated with this inbox:

	https://80x24.org/public-inbox.git

AGPL code for this site: git clone https://public-inbox.org/public-inbox.git