user/dev discussion of public-inbox itself
* [PATCH 0/4] extindex: more fixes and usability things
@ 2020-12-26 10:16 Eric Wong
  2020-12-26 10:16 ` [PATCH 1/4] extindex: various --watch signal handling fixes Eric Wong
                   ` (3 more replies)
  0 siblings, 4 replies; 5+ messages in thread
From: Eric Wong @ 2020-12-26 10:16 UTC
  To: meta

--watch seems nice, and "--watch --all" (or just "--all")
without having to specify the pathname of the extindex
is nice, too.

It still needs to write extindex.all.topdir if none is
configured, and a section 1 manpage is still missing...

Eric Wong (4):
  extindex: various --watch signal handling fixes
  extindex: enable autoflush on STDOUT/STDERR
  extindex: add undocumented --no-scan switch
  extindex: allow using --all without EXTINDEX_DIR

 lib/PublicInbox/ExtSearchIdx.pm | 18 ++++++++++++++----
 script/public-inbox-extindex    | 21 +++++++++++++++------
 2 files changed, 29 insertions(+), 10 deletions(-)


* [PATCH 1/4] extindex: various --watch signal handling fixes
  2020-12-26 10:16 [PATCH 0/4] extindex: more fixes and usability things Eric Wong
@ 2020-12-26 10:16 ` Eric Wong
  2020-12-26 10:16 ` [PATCH 2/4] extindex: enable autoflush on STDOUT/STDERR Eric Wong
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 5+ messages in thread
From: Eric Wong @ 2020-12-26 10:16 UTC
  To: meta

We need to clobber the SIGUSR1 resync queue on SIGHUP to
invalidate old inbox objects.  Furthermore, the lengthy
initial scan needs to ignore signals intended for the
event loop to avoid unexpected behavior.  Finally, add
some progress output to keep users at the terminal
informed.
---
 lib/PublicInbox/ExtSearchIdx.pm | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/lib/PublicInbox/ExtSearchIdx.pm b/lib/PublicInbox/ExtSearchIdx.pm
index 53ff2ca1..778154a5 100644
--- a/lib/PublicInbox/ExtSearchIdx.pm
+++ b/lib/PublicInbox/ExtSearchIdx.pm
@@ -1008,6 +1008,7 @@ sub eidx_reload { # -extindex --watch SIGHUP handler
 	if ($self->{cfg}) {
 		my $pr = $self->{-watch_sync}->{-opt}->{-progress};
 		$pr->('reloading ...') if $pr;
+		delete $self->{-resync_queue};
 		@{$self->{ibx_list}} = ();
 		%{$self->{ibx_map}} = ();
 		delete $self->{-watch_sync}->{id2pos};
@@ -1043,6 +1044,10 @@ sub event_step { # PublicInbox::DS::requeue callback
 
 sub eidx_watch { # public-inbox-extindex --watch main loop
 	my ($self, $opt) = @_;
+	local %SIG = %SIG;
+	for my $sig (qw(HUP USR1 TSTP QUIT INT TERM)) {
+		$SIG{$sig} = sub { warn "SIG$sig ignored while scanning\n" };
+	}
 	require PublicInbox::InboxIdle;
 	require PublicInbox::DS;
 	require PublicInbox::Syscall;
@@ -1052,6 +1057,8 @@ sub eidx_watch { # public-inbox-extindex --watch main loop
 		$idler->watch_inbox($_) for @{$self->{ibx_list}};
 	}
 	$_->subscribe_unlock(__PACKAGE__, $self) for @{$self->{ibx_list}};
+	my $pr = $opt->{-progress};
+	$pr->("performing initial scan ...\n") if $pr;
 	my $sync = eidx_sync($self, $opt); # initial sync
 	return if $sync->{quit};
 	my $oldset = PublicInbox::Sigfd::block_signals();
@@ -1067,7 +1074,7 @@ sub eidx_watch { # public-inbox-extindex --watch main loop
 	$sig->{QUIT} = $sig->{INT} = $sig->{TERM} = $quit;
 	my $sigfd = PublicInbox::Sigfd->new($sig,
 					$PublicInbox::Syscall::SFD_NONBLOCK);
-	local %SIG = (%SIG, %$sig) if !$sigfd;
+	%SIG = (%SIG, %$sig) if !$sigfd;
 	local $self->{-watch_sync} = $sync; # for ->on_inbox_unlock
 	if (!$sigfd) {
 		# wake up every second to accept signals if we don't
@@ -1076,6 +1083,7 @@ sub eidx_watch { # public-inbox-extindex --watch main loop
 		PublicInbox::DS->SetLoopTimeout(1000);
 	}
 	PublicInbox::DS->SetPostLoopCallback(sub { !$sync->{quit} });
+	$pr->("initial scan complete, entering event loop\n") if $pr;
 	PublicInbox::DS->EventLoop; # calls InboxIdle->event_step
 	done($self);
 }
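
Not part of the patch: a minimal standalone sketch of the "ignore
signals while scanning" idea, assuming a POSIX system.  The names
scan_ignoring_signals and wait_for are made up for illustration, and
the sketch localizes individual %SIG elements through a hash slice
rather than using the patch's `local %SIG = %SIG`.

```perl
use strict;
use warnings;

my @seen; # log of which handler saw each signal

sub wait_for { # give a pending signal a bounded chance to be dispatched
	my ($n) = @_;
	for (1..200) {
		return 1 if @seen >= $n;
		select(undef, undef, undef, 0.01);
	}
	0;
}

sub scan_ignoring_signals { # hypothetical stand-in for the initial scan
	my ($scan) = @_;
	my @names = qw(HUP USR1);
	# per-element local: the previous handlers return when this sub exits
	local @SIG{@names} = map {
		my $name = $_;
		sub { push @seen, "SIG$name ignored while scanning" };
	} @names;
	$scan->();
}

# stand-in for the handler the event loop would normally own
$SIG{USR1} = sub { push @seen, 'event loop got SIGUSR1' };

# a signal arriving mid-scan only reaches the temporary handler
scan_ignoring_signals(sub { kill('USR1', $$); wait_for(1) });

# after the scan, the original handler is back in charge
kill('USR1', $$);
wait_for(2);
print "$_\n" for @seen;
```

The first SIGUSR1 is sent while the temporary handlers are installed,
the second after the scan returns and the original handler is back.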


* [PATCH 2/4] extindex: enable autoflush on STDOUT/STDERR
  2020-12-26 10:16 [PATCH 0/4] extindex: more fixes and usability things Eric Wong
  2020-12-26 10:16 ` [PATCH 1/4] extindex: various --watch signal handling fixes Eric Wong
@ 2020-12-26 10:16 ` Eric Wong
  2020-12-26 10:16 ` [PATCH 3/4] extindex: add undocumented --no-scan switch Eric Wong
  2020-12-26 10:16 ` [PATCH 4/4] extindex: allow using --all without EXTINDEX_DIR Eric Wong
  3 siblings, 0 replies; 5+ messages in thread
From: Eric Wong @ 2020-12-26 10:16 UTC
  To: meta

With --watch, the output may be redirected to a pipe or socket
which Perl may decide to buffer.  Ensure Perl doesn't buffer
these outputs since they can provide real-time status updates
in response to signals or FS activity.
---
 script/public-inbox-extindex | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/script/public-inbox-extindex b/script/public-inbox-extindex
index 607baa3e..17986f60 100644
--- a/script/public-inbox-extindex
+++ b/script/public-inbox-extindex
@@ -33,7 +33,9 @@ GetOptions($opt, qw(verbose|v+ reindex rethread compact|c+ jobs|j=i
 	or die $help;
 if ($opt->{help}) { print $help; exit 0 };
 die "--jobs must be >= 0\n" if defined $opt->{jobs} && $opt->{jobs} < 0;
-
+require IO::Handle;
+STDOUT->autoflush(1);
+STDERR->autoflush(1);
 # require lazily to speed up --help
 my $eidx_dir = shift(@ARGV) // die "E: $help";
 local $SIG{USR1} = 'IGNORE'; # to be overridden in eidx_sync
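
Not part of the patch: a small sketch of why autoflush matters once
output goes to a pipe.  It uses a local pipe() in place of the real
STDOUT/STDERR redirection, and the message text is only borrowed from
the progress output for illustration.

```perl
use strict;
use warnings;
use IO::Handle; # provides ->autoflush on filehandles

pipe(my $r, my $w) or die "pipe: $!";
$w->autoflush(1); # without this, the line below may sit in Perl's buffer
print $w "reloading ...\n";

# The writer is still open, yet the reader can already see the line.
my $got = '';
sysread($r, $got, 4096) // die "sysread: $!";
print "reader saw: $got";
```

With autoflush enabled, each print reaches the pipe right away, so
the reader does not have to wait for the buffer to fill or for the
writer to exit.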


* [PATCH 3/4] extindex: add undocumented --no-scan switch
  2020-12-26 10:16 [PATCH 0/4] extindex: more fixes and usability things Eric Wong
  2020-12-26 10:16 ` [PATCH 1/4] extindex: various --watch signal handling fixes Eric Wong
  2020-12-26 10:16 ` [PATCH 2/4] extindex: enable autoflush on STDOUT/STDERR Eric Wong
@ 2020-12-26 10:16 ` Eric Wong
  2020-12-26 10:16 ` [PATCH 4/4] extindex: allow using --all without EXTINDEX_DIR Eric Wong
  3 siblings, 0 replies; 5+ messages in thread
From: Eric Wong @ 2020-12-26 10:16 UTC
  To: meta

This makes diagnosing --watch problems easier when there are
50K inboxes by avoiding the lengthy scan (which is the reason
--watch exists in the first place).
---
 lib/PublicInbox/ExtSearchIdx.pm | 8 +++++---
 script/public-inbox-extindex    | 4 ++--
 2 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/lib/PublicInbox/ExtSearchIdx.pm b/lib/PublicInbox/ExtSearchIdx.pm
index 778154a5..07e64698 100644
--- a/lib/PublicInbox/ExtSearchIdx.pm
+++ b/lib/PublicInbox/ExtSearchIdx.pm
@@ -881,9 +881,11 @@ sub eidx_sync { # main entry point
 	}
 
 	# don't use $_ here, it'll get clobbered by reindex_checkpoint
-	for my $ibx (@{$self->{ibx_list}}) {
-		last if $sync->{quit};
-		sync_inbox($self, $sync, $ibx);
+	if ($opt->{scan} // 1) {
+		for my $ibx (@{$self->{ibx_list}}) {
+			last if $sync->{quit};
+			sync_inbox($self, $sync, $ibx);
+		}
 	}
 	$self->{oidx}->rethread_done($opt) unless $sync->{quit};
 	eidxq_process($self, $sync) unless $sync->{quit};
diff --git a/script/public-inbox-extindex b/script/public-inbox-extindex
index 17986f60..f4ffda4b 100644
--- a/script/public-inbox-extindex
+++ b/script/public-inbox-extindex
@@ -23,12 +23,12 @@ usage: public-inbox-extindex [options] EXTINDEX_DIR [INBOX_DIR]
 BYTES may use `k', `m', and `g' suffixes (e.g. `10m' for 10 megabytes)
 See public-inbox-extindex(1) man page for full documentation.
 EOF
-my $opt = { quiet => -1, compact => 0, max_size => undef, fsync => 1 };
+my $opt = { quiet => -1, compact => 0, fsync => 1, scan => 1 };
 GetOptions($opt, qw(verbose|v+ reindex rethread compact|c+ jobs|j=i
 		fsync|sync!
 		indexlevel|index-level|L=s max_size|max-size=s
 		batch_size|batch-size=s
-		gc commit-interval=i watch
+		gc commit-interval=i watch scan!
 		all help|h))
 	or die $help;
 if ($opt->{help}) { print $help; exit 0 };
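
Not part of the patch: a sketch of how a negatable Getopt::Long spec
such as `scan!` yields --no-scan.  parse_scan_opt is a hypothetical
helper, and it runs Getopt::Long with its default configuration
rather than the gnu_getopt settings the script uses.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

sub parse_scan_opt { # hypothetical helper; not part of public-inbox
	my (@argv) = @_;
	my $opt = { scan => 1 }; # scanning stays on unless negated
	GetOptionsFromArray(\@argv, $opt, qw(watch scan!))
		or die "bad options\n";
	$opt;
}

for my $args ([], ['--no-scan'], ['--watch', '--no-scan']) {
	my $opt = parse_scan_opt(@$args);
	my $scan = $opt->{scan} // 1; # same defaulting as in eidx_sync
	print "[@$args] => ", ($scan ? 'scan' : 'skip scan'), "\n";
}
```

The `!` suffix is what makes the option negatable, so no separate
"no-scan" entry is needed in the option list.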


* [PATCH 4/4] extindex: allow using --all without EXTINDEX_DIR
  2020-12-26 10:16 [PATCH 0/4] extindex: more fixes and usability things Eric Wong
                   ` (2 preceding siblings ...)
  2020-12-26 10:16 ` [PATCH 3/4] extindex: add undocumented --no-scan switch Eric Wong
@ 2020-12-26 10:16 ` Eric Wong
  3 siblings, 0 replies; 5+ messages in thread
From: Eric Wong @ 2020-12-26 10:16 UTC
  To: meta

If "--all" is specified to index all inboxes, implicitly choose
the configured [extindex "all"] external index since "--all" is
incompatible with specifying inbox directories on the
command-line.
---
 script/public-inbox-extindex | 13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/script/public-inbox-extindex b/script/public-inbox-extindex
index f4ffda4b..5f27988f 100644
--- a/script/public-inbox-extindex
+++ b/script/public-inbox-extindex
@@ -6,7 +6,7 @@ use strict;
 use v5.10.1;
 use Getopt::Long qw(:config gnu_getopt no_ignore_case auto_abbrev);
 my $help = <<EOF; # the following should fit w/o scrolling in 80x24 term:
-usage: public-inbox-extindex [options] EXTINDEX_DIR [INBOX_DIR]
+usage: public-inbox-extindex [options] [EXTINDEX_DIR] [INBOX_DIR...]
 
   Create and update external (detached) search indices
 
@@ -36,11 +36,18 @@ die "--jobs must be >= 0\n" if defined $opt->{jobs} && $opt->{jobs} < 0;
 require IO::Handle;
 STDOUT->autoflush(1);
 STDERR->autoflush(1);
-# require lazily to speed up --help
-my $eidx_dir = shift(@ARGV) // die "E: $help";
 local $SIG{USR1} = 'IGNORE'; # to be overridden in eidx_sync
+# require lazily to speed up --help
 require PublicInbox::Admin;
 my $cfg = PublicInbox::Config->new;
+my $eidx_dir = shift(@ARGV);
+unless (defined $eidx_dir) {
+	if ($opt->{all} && $cfg->ALL) {
+		$eidx_dir = $cfg->ALL->{topdir};
+	} else {
+		die "E: $help";
+	}
+}
 my @ibxs;
 if ($opt->{gc}) {
 	die "E: inbox paths must not be specified with --gc\n" if @ARGV;
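
Not part of the patch: the directory fallback restated as a pure
function so it can be tried without a public-inbox config.
pick_eidx_dir is a hypothetical helper, and $all_topdir stands in for
whatever $cfg->ALL->{topdir} would return (undef if nothing is
configured).

```perl
use strict;
use warnings;

sub pick_eidx_dir { # hypothetical helper; not part of public-inbox
	my ($opt, $argv, $all_topdir) = @_;
	my $dir = shift(@$argv);
	return $dir if defined $dir; # explicit EXTINDEX_DIR always wins
	# --all may fall back to the configured [extindex "all"] directory
	return $all_topdir if $opt->{all} && defined $all_topdir;
	die "E: EXTINDEX_DIR required\n";
}

my $topdir = '/path/to/all'; # placeholder for the configured topdir
print pick_eidx_dir({ all => 1 }, [], $topdir), "\n";
print pick_eidx_dir({}, ['/explicit/dir'], $topdir), "\n";
print "dies without --all\n" unless eval { pick_eidx_dir({}, [], $topdir) };
```

An explicit directory on the command line takes precedence, and the
configured one is only consulted when --all is given.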


