user/dev discussion of public-inbox itself
 help / color / mirror / code / Atom feed
* Restarting daemons on config file change
@ 2021-07-19 20:03 Konstantin Ryabitsev
  2021-07-19 20:49 ` Eric Wong
  0 siblings, 1 reply; 12+ messages in thread
From: Konstantin Ryabitsev @ 2021-07-19 20:03 UTC (permalink / raw)
  To: meta

Hello:

Something I stumbled on today is the need to have the -httpd and -nntpd
daemons reread the config file after we've mirrored and initialized new
inboxdirs. The situation is:

- public-inbox-{httpd,nntpd} are running as systemd services as user
  "publicinbox"
- the mirroring and initialization/indexing is done as user "mirror", so we
  can't send a HUP to the daemon processes

The best I can think of is a systemd watcher service that automatically
restarts the daemons when the config file is modified, but I wanted to check
here first to see if perhaps I'm missing something simpler.

-K

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Restarting daemons on config file change
  2021-07-19 20:03 Restarting daemons on config file change Konstantin Ryabitsev
@ 2021-07-19 20:49 ` Eric Wong
  2021-07-20  8:58   ` [PATCH] httpd: fix SIGHUP by invalidating cache on reload Eric Wong
  2021-07-20 17:00   ` Restarting daemons on config file change Konstantin Ryabitsev
  0 siblings, 2 replies; 12+ messages in thread
From: Eric Wong @ 2021-07-19 20:49 UTC (permalink / raw)
  To: Konstantin Ryabitsev; +Cc: meta

Konstantin Ryabitsev <konstantin@linuxfoundation.org> wrote:
> Hello:
> 
> Something I stumbled on today is the need to have the -httpd and -nntpd
> daemons reread the config file after we've mirrored and initialized new
> inboxdirs. The situation is:

Correct, config files are read once at startup.

> - public-inbox-{httpd,nntpd} are running as systemd services as user
>   "publicinbox"
> - the mirroring and initialization/indexing is done as user "mirror", so we
>   can't send a HUP to the daemon processes

I seem to recall HUP having some trouble with -httpd (and less
so with nntpd/imapd); or at least that's what -daemon(8) manpage
alludes to...

I usually just swap between "@foo" and "@bar" systemd units, but I
restart more often for code changes than config file changes.

> The best I can think of is a systemd watcher service that automatically
> restarts the daemons when the config file is modified, but I wanted to check
> here first to see if perhaps I'm missing something simpler.

Yes, a systemd.path unit might be the way to go.  A patch for
examples/ would be appreciated if you go down that route :>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH] httpd: fix SIGHUP by invalidating cache on reload
  2021-07-19 20:49 ` Eric Wong
@ 2021-07-20  8:58   ` Eric Wong
  2021-07-20 17:00   ` Restarting daemons on config file change Konstantin Ryabitsev
  1 sibling, 0 replies; 12+ messages in thread
From: Eric Wong @ 2021-07-20  8:58 UTC (permalink / raw)
  To: meta; +Cc: Konstantin Ryabitsev

Eric Wong <e@80x24.org> wrote:
> I seem to recall HUP having some trouble with -httpd (and less
> so with nntpd/imapd); or at least that's what -daemon(8) manpage
> alludes to...

-------8<-------
Subject: [PATCH] httpd: fix SIGHUP by invalidating cache on reload

Since we require separate PublicInbox::HTTPD instances for each
listen socket address (in order to support {SERVER_<NAME|PORT>}
for PSGI env), the old cache needed to be invalidated on rare
app refreshes.

SIGHUP has always been broken in -httpd (but not -imapd or
-nntpd) due to this cache.

Update the daemon documentation and 5.10.1-ize some bits while
we're in the area.
---
 Documentation/public-inbox-daemon.pod |  8 +++++---
 script/public-inbox-httpd             |  6 ++++--
 t/httpd.t                             | 25 +++++++++++++++++++++++++
 3 files changed, 34 insertions(+), 5 deletions(-)

diff --git a/Documentation/public-inbox-daemon.pod b/Documentation/public-inbox-daemon.pod
index ec210efa..f77fc3a9 100644
--- a/Documentation/public-inbox-daemon.pod
+++ b/Documentation/public-inbox-daemon.pod
@@ -5,13 +5,14 @@ public-inbox-daemon - common usage for public-inbox network daemons
 =head1 SYNOPSIS
 
 	public-inbox-httpd
+	public-inbox-imapd
 	public-inbox-nntpd
 
 =head1 DESCRIPTION
 
 This manual describes common options and behavior for
 public-inbox network daemons.  Network daemons for public-inbox
-provide read-only NNTP and HTTP access to public-inboxes.  Write
+provide read-only NNTP, IMAP and HTTP access to public-inboxes.  Write
 access to a public-inbox will never be required to run these.
 
 These daemons are implemented with a common core using
@@ -103,7 +104,7 @@ See L</UPGRADING> below.
 =item SIGHUP
 
 Reload config files associated with the process.
-(FIXME: not tested for -httpd, yet)
+(Note: broken for L<public-inbox-httpd(1)> only in E<lt>= 1.6)
 
 =item SIGTTIN
 
@@ -188,4 +189,5 @@ License: AGPL-3.0+ L<https://www.gnu.org/licenses/agpl-3.0.txt>
 
 =head1 SEE ALSO
 
-L<public-inbox-httpd(1)>, L<public-inbox-nntpd(1)>
+L<public-inbox-httpd(1)>, L<public-inbox-imapd(1)>,
+L<public-inbox-nntpd(1)>
diff --git a/script/public-inbox-httpd b/script/public-inbox-httpd
index b31b896d..7b0ec560 100755
--- a/script/public-inbox-httpd
+++ b/script/public-inbox-httpd
@@ -4,6 +4,7 @@
 #
 # Standalone HTTP server for public-inbox.
 use strict;
+use v5.10.1;
 use PublicInbox::Daemon;
 BEGIN {
 	for (qw(Plack::Builder Plack::Util)) {
@@ -14,7 +15,7 @@ BEGIN {
 	require PublicInbox::HTTPD;
 }
 
-my %httpds;
+my %httpds; # per-listen-FD mapping for HTTPD->{env}->{SERVER_<NAME|PORT>}
 my $app;
 my $refresh = sub {
 	if (@ARGV) {
@@ -37,12 +38,13 @@ my $refresh = sub {
 			sub { $www->call(@_) };
 		};
 	}
+	%httpds = (); # invalidate cache
 };
 
 PublicInbox::Daemon::run('0.0.0.0:8080', $refresh,
 	sub ($$$) { # post_accept
 		my ($client, $addr, $srv) = @_;
 		my $fd = fileno($srv);
-		my $h = $httpds{$fd} ||= PublicInbox::HTTPD->new($srv, $app);
+		my $h = $httpds{$fd} //= PublicInbox::HTTPD->new($srv, $app);
 		PublicInbox::HTTP->new($client, $addr, $h),
 	});
diff --git a/t/httpd.t b/t/httpd.t
index 0354a733..849f61bb 100644
--- a/t/httpd.t
+++ b/t/httpd.t
@@ -32,6 +32,10 @@ Date: Thu, 01 Jan 1970 06:06:06 +0000
 nntp
 EOF
 	};
+	my $i2 = create_inbox 'test-2', sub {
+		my ($im, $ibx) = @_;
+		$im->add(eml_load('t/plack-qp.eml')) or xbail '->add';
+	};
 	local $ENV{HOME} = $home;
 	my $cmd = [ '-init', $group, $inboxdir, 'http://example.com/', $addr ];
 	ok(run_script($cmd), 'init ran properly');
@@ -64,6 +68,27 @@ EOF
 			"$http_pfx/$group", "$tmpdir/dumb.git"),
 		0, 'clone successful');
 
+	# test config reload
+	my $cfg = "$home/.public-inbox/config";
+	open my $fh, '>>', $cfg or xbail "open: $!";
+	print $fh <<EOM or xbail "print $!";
+[publicinbox "test-2"]
+	inboxdir = $i2->{inboxdir}
+	address = test-2\@example.com
+	url = https://example.com/test-2
+EOM
+	close $fh or xbail "close $!";
+	$td->kill('HUP') or BAIL_OUT "failed to kill -httpd: $!";
+	tick; # wait for HUP to take effect
+	my $buf = do {
+		my $c2 = tcp_connect($sock);
+		$c2->write("GET /test-2/qp\@example.com/raw HTTP/1.0\r\n\r\n")
+					or xbail "c2 write: $!";
+		local $/;
+		<$c2>
+	};
+	like($buf, qr!\AHTTP/1\.0 200\b!s, 'got 200 after reload for test-2');
+
 	ok($td->kill, 'killed httpd');
 	$td->join;
 

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Restarting daemons on config file change
  2021-07-19 20:49 ` Eric Wong
  2021-07-20  8:58   ` [PATCH] httpd: fix SIGHUP by invalidating cache on reload Eric Wong
@ 2021-07-20 17:00   ` Konstantin Ryabitsev
  2021-07-20 20:34     ` Eric Wong
  1 sibling, 1 reply; 12+ messages in thread
From: Konstantin Ryabitsev @ 2021-07-20 17:00 UTC (permalink / raw)
  To: Eric Wong; +Cc: meta

On Mon, Jul 19, 2021 at 08:49:35PM +0000, Eric Wong wrote:
> > The best I can think of is a systemd watcher service that automatically
> > restarts the daemons when the config file is modified, but I wanted to check
> > here first to see if perhaps I'm missing something simpler.
> 
> Yes, a systemd.path unit might be the way to go.  A patch for
> examples/ would be appreciated if you go down that route :>

Okay, let me see what I can come up with. Looks like the best course of action
is to:

1. use a global blocking lock
2. copy the config file to a new location
3. make the necessary changes to the temporary config file
4. move the temp config file to overwrite the current config
5. release the lock

This would make sure that there is are no races and that all changes show up
at once and don't repeatedly trigger the path watcher.

-K

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Restarting daemons on config file change
  2021-07-20 17:00   ` Restarting daemons on config file change Konstantin Ryabitsev
@ 2021-07-20 20:34     ` Eric Wong
  2021-07-20 20:49       ` Konstantin Ryabitsev
  0 siblings, 1 reply; 12+ messages in thread
From: Eric Wong @ 2021-07-20 20:34 UTC (permalink / raw)
  To: Konstantin Ryabitsev; +Cc: meta

Konstantin Ryabitsev <konstantin@linuxfoundation.org> wrote:
> On Mon, Jul 19, 2021 at 08:49:35PM +0000, Eric Wong wrote:
> > > The best I can think of is a systemd watcher service that automatically
> > > restarts the daemons when the config file is modified, but I wanted to check
> > > here first to see if perhaps I'm missing something simpler.
> > 
> > Yes, a systemd.path unit might be the way to go.  A patch for
> > examples/ would be appreciated if you go down that route :>
> 
> Okay, let me see what I can come up with. Looks like the best course of action
> is to:
> 
> 1. use a global blocking lock
> 2. copy the config file to a new location
> 3. make the necessary changes to the temporary config file
> 4. move the temp config file to overwrite the current config
> 5. release the lock

public-inbox-init already does all of that.  It doesn't set
all possible keys, yet (e.g. infourl, maybe some others).

> This would make sure that there is are no races and that all changes show up
> at once and don't repeatedly trigger the path watcher.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Restarting daemons on config file change
  2021-07-20 20:34     ` Eric Wong
@ 2021-07-20 20:49       ` Konstantin Ryabitsev
  2021-07-20 21:07         ` Eric Wong
  0 siblings, 1 reply; 12+ messages in thread
From: Konstantin Ryabitsev @ 2021-07-20 20:49 UTC (permalink / raw)
  To: Eric Wong; +Cc: meta

On Tue, Jul 20, 2021 at 08:34:33PM +0000, Eric Wong wrote:
> > Okay, let me see what I can come up with. Looks like the best course of action
> > is to:
> > 
> > 1. use a global blocking lock
> > 2. copy the config file to a new location
> > 3. make the necessary changes to the temporary config file
> > 4. move the temp config file to overwrite the current config
> > 5. release the lock
> 
> public-inbox-init already does all of that.  It doesn't set
> all possible keys, yet (e.g. infourl, maybe some others).

I figured as much, but we do want to set extra keys *and* write the config in
a certain order (e.g. prioritizing some sources over others by listing them
first). I currently do this via a list-id globbing match (e.g.
--listid-priority=*.linux.dev,*.kernel.org,*).

Apropos, question -- will public-inbox-init do anything to the config file if
it finds that all the settings are already in place, or will it just accept
them?

-K

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Restarting daemons on config file change
  2021-07-20 20:49       ` Konstantin Ryabitsev
@ 2021-07-20 21:07         ` Eric Wong
  2021-07-20 21:18           ` Konstantin Ryabitsev
  0 siblings, 1 reply; 12+ messages in thread
From: Eric Wong @ 2021-07-20 21:07 UTC (permalink / raw)
  To: Konstantin Ryabitsev; +Cc: meta

Konstantin Ryabitsev <konstantin@linuxfoundation.org> wrote:
> On Tue, Jul 20, 2021 at 08:34:33PM +0000, Eric Wong wrote:
> > > Okay, let me see what I can come up with. Looks like the best course of action
> > > is to:
> > > 
> > > 1. use a global blocking lock
> > > 2. copy the config file to a new location
> > > 3. make the necessary changes to the temporary config file
> > > 4. move the temp config file to overwrite the current config
> > > 5. release the lock
> > 
> > public-inbox-init already does all of that.  It doesn't set
> > all possible keys, yet (e.g. infourl, maybe some others).
> 
> I figured as much, but we do want to set extra keys *and* write the config in
> a certain order (e.g. prioritizing some sources over others by listing them
> first). I currently do this via a list-id globbing match (e.g.
> --listid-priority=*.linux.dev,*.kernel.org,*).

Ah, that gets tricky, since git-config doesn't seem to provide a
mechanism for ordering.  Not sure, perhaps serializing -init
invocations somehow is the way to go?  I'm not sure how to
expose that ordering, though...

> Apropos, question -- will public-inbox-init do anything to the config file if
> it finds that all the settings are already in place, or will it just accept
> them?

It only checks conflicting addresses, at the moment.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Restarting daemons on config file change
  2021-07-20 21:07         ` Eric Wong
@ 2021-07-20 21:18           ` Konstantin Ryabitsev
  2021-07-21 14:05             ` [PATCH 1/2] extsearch: support publicinbox.*.boost parameter Eric Wong
                               ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Konstantin Ryabitsev @ 2021-07-20 21:18 UTC (permalink / raw)
  To: Eric Wong; +Cc: meta

On Tue, Jul 20, 2021 at 09:07:24PM +0000, Eric Wong wrote:
> > I figured as much, but we do want to set extra keys *and* write the config in
> > a certain order (e.g. prioritizing some sources over others by listing them
> > first). I currently do this via a list-id globbing match (e.g.
> > --listid-priority=*.linux.dev,*.kernel.org,*).
> 
> Ah, that gets tricky, since git-config doesn't seem to provide a
> mechanism for ordering.  Not sure, perhaps serializing -init
> invocations somehow is the way to go?  I'm not sure how to
> expose that ordering, though...

Serializing wouldn't necessarily help, as a list may be added to an existing
archive collection at some later point, and we don't necessarily want it at
the bottom.

Another thought would be to give a "rank" ("weight"/"priority") configuration
option that you can check before falling back to the listed order. E.g.:

[publicinbox "dkim-mangling"]
  listid=dkim-mangling.list.example.org
  rank=10

[publicinbox "dkim-respecting"]
  listid=dkim-respecting.list.example.net
  rank=1

This way higher-ranking entries will win when returning aggregating threads,
so we're less likely to see results from known dkim-mangling sources.

-K

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH 1/2] extsearch: support publicinbox.*.boost parameter
  2021-07-20 21:18           ` Konstantin Ryabitsev
@ 2021-07-21 14:05             ` Eric Wong
  2021-07-21 14:05             ` [PATCH 2/2] init: allow arbitrary key-values via -c KEY=VALUE Eric Wong
  2021-07-21 15:32             ` [PATCH 0/2] things to make mirroring easier Konstantin Ryabitsev
  2 siblings, 0 replies; 12+ messages in thread
From: Eric Wong @ 2021-07-21 14:05 UTC (permalink / raw)
  To: meta; +Cc: Konstantin Ryabitsev

This behaves identically the lei external "boost" parameter in
prioritizing raw messages for extindex.

Relying exclusively on the config file order doesn't work well
for mirrors since it's impossible to guarantee config file
ordering via grokmirror hooks.

Config file ordering remains the default if boost is
unconfigured, or in case of ties.

Note: I chose the name "boost" rather than "priority" or "rank"
since I always get confused by whether higher or lower numbers
take precedence when it comes to kernel scheduling.  "weight" is
also a part of Xapian API terminology, which we currently do not
expose to configuration (but may in the future).
---
 Documentation/public-inbox-config.pod |  8 +++++
 lib/PublicInbox/Config.pm             |  2 +-
 lib/PublicInbox/ExtSearchIdx.pm       | 43 +++++++++++++++++----------
 t/extsearch.t                         | 32 ++++++++++++++++++++
 4 files changed, 69 insertions(+), 16 deletions(-)

diff --git a/Documentation/public-inbox-config.pod b/Documentation/public-inbox-config.pod
index 05d9ca62..5b86ef6c 100644
--- a/Documentation/public-inbox-config.pod
+++ b/Documentation/public-inbox-config.pod
@@ -124,6 +124,14 @@ allow for searching for phrases using quoted text.
 
 Default: C<full>
 
+=item publicinbox.<name>.boost
+
+Control indexing order for L<public-inbox-extindex(1)>, with ties
+broken by config file order.  This only affects indexing and does
+not affect messages which are already indexed.
+
+Default: C<0>
+
 =item publicinbox.<name>.indexSequentialShard
 
 See L<public-inbox-index(1)/publicInbox.indexSequentialShard>
diff --git a/lib/PublicInbox/Config.pm b/lib/PublicInbox/Config.pm
index 8e46328d..7aa1f6c8 100644
--- a/lib/PublicInbox/Config.pm
+++ b/lib/PublicInbox/Config.pm
@@ -429,7 +429,7 @@ sub _fill_ibx {
 		$ibx->{$k} = $v if defined $v;
 	}
 	for my $k (qw(filter inboxdir newsgroup replyto httpbackendmax feedmax
-			indexlevel indexsequentialshard)) {
+			indexlevel indexsequentialshard boost)) {
 		my $v = get_1($self, $pfx, $k) // next;
 		$ibx->{$k} = $v;
 	}
diff --git a/lib/PublicInbox/ExtSearchIdx.pm b/lib/PublicInbox/ExtSearchIdx.pm
index 0e27bba6..357312b8 100644
--- a/lib/PublicInbox/ExtSearchIdx.pm
+++ b/lib/PublicInbox/ExtSearchIdx.pm
@@ -44,7 +44,7 @@ sub new {
 		topdir => $dir,
 		creat => $opt->{creat},
 		ibx_map => {}, # (newsgroup//inboxdir) => $ibx
-		ibx_list => [],
+		ibx_cfg => [], # by config section order
 		indexlevel => $l,
 		transact_bytes => 0,
 		total_bytes => 0,
@@ -62,7 +62,8 @@ sub new {
 sub attach_inbox {
 	my ($self, $ibx) = @_;
 	$self->{ibx_map}->{$ibx->eidx_key} //= do {
-		push @{$self->{ibx_list}}, $ibx;
+		delete $self->{-ibx_ary}; # invalidate cache
+		push @{$self->{ibx_cfg}}, $ibx;
 		$ibx;
 	}
 }
@@ -388,7 +389,7 @@ sub _ibx_for ($$$) {
 	my ($self, $sync, $smsg) = @_;
 	my $ibx_id = delete($smsg->{ibx_id}) // die '{ibx_id} unset';
 	my $pos = $sync->{id2pos}->{$ibx_id} // die "$ibx_id no pos";
-	$self->{ibx_list}->[$pos] // die "BUG: ibx for $smsg->{blob} not mapped"
+	$self->{-ibx_ary}->[$pos] // die "BUG: ibx for $smsg->{blob} not mapped"
 }
 
 sub _fd_constrained ($) {
@@ -402,7 +403,7 @@ sub _fd_constrained ($) {
 			chomp($soft = `sh -c 'ulimit -n'`);
 		}
 		if (defined($soft)) {
-			my $want = scalar(@{$self->{ibx_list}}) + 64; # estimate
+			my $want = scalar(@{$self->{-ibx_ary}}) + 64; # estimate
 			my $ret = $want > $soft;
 			if ($ret) {
 				warn <<EOF;
@@ -524,10 +525,10 @@ BUG? #$docid $smsg->{blob} is not referenced by inboxes during reindex
 		return;
 	}
 
-	# we sort {xr3r} in the reverse order of {ibx_list} so we can
+	# we sort {xr3r} in the reverse order of ibx_sorted so we can
 	# hit the common case in _reindex_finalize without rereading
 	# from git (or holding multiple messages in memory).
-	my $id2pos = $sync->{id2pos}; # index in {ibx_list}
+	my $id2pos = $sync->{id2pos}; # index in ibx_sorted
 	@$xr3 = sort {
 		$id2pos->{$b->[0]} <=> $id2pos->{$a->[0]}
 				||
@@ -621,6 +622,17 @@ EOF
 	undef;
 }
 
+sub ibx_sorted ($) {
+	my ($self) = @_;
+	$self->{-ibx_ary} //= do {
+		# highest boost first, stable for config-ordering tiebreaker
+		use sort 'stable';
+		[ sort {
+			($b->{boost} // 0) <=> ($a->{boost} // 0)
+		  } @{$self->{ibx_cfg}} ];
+	}
+}
+
 sub eidxq_process ($$) { # for reindexing
 	my ($self, $sync) = @_;
 
@@ -638,7 +650,7 @@ sub eidxq_process ($$) { # for reindexing
 	$sync->{id2pos} //= do {
 		my %id2pos;
 		my $pos = 0;
-		$id2pos{$_->{-ibx_id}} = $pos++ for @{$self->{ibx_list}};
+		$id2pos{$_->{-ibx_id}} = $pos++ for (@{ibx_sorted($self)});
 		\%id2pos;
 	};
 	my ($del, $iter);
@@ -829,7 +841,7 @@ sub eidx_reindex {
 		warn "E: aborting --reindex\n";
 		return;
 	}
-	for my $ibx (@{$self->{ibx_list}}) {
+	for my $ibx (@{ibx_sorted($self)}) {
 		_reindex_inbox($self, $sync, $ibx);
 		last if $sync->{quit};
 	}
@@ -959,7 +971,7 @@ sub eidx_sync { # main entry point
 	local $SIG{QUIT} = $quit;
 	local $SIG{INT} = $quit;
 	local $SIG{TERM} = $quit;
-	for my $ibx (@{$self->{ibx_list}}) {
+	for my $ibx (@{ibx_sorted($self)}) {
 		$ibx->{-ibx_id} //= $self->{oidx}->ibx_id($ibx->eidx_key);
 	}
 	if (delete($opt->{dedupe})) {
@@ -973,7 +985,7 @@ sub eidx_sync { # main entry point
 
 	# don't use $_ here, it'll get clobbered by reindex_checkpoint
 	if ($opt->{scan} // 1) {
-		for my $ibx (@{$self->{ibx_list}}) {
+		for my $ibx (@{ibx_sorted($self)}) {
 			last if $sync->{quit};
 			sync_inbox($self, $sync, $ibx);
 		}
@@ -1115,7 +1127,7 @@ sub idx_init { # similar to V2Writable
 		}
 		undef $dh;
 	}
-	for my $ibx (@{$self->{ibx_list}}) {
+	for my $ibx (@{ibx_sorted($self)}) {
 		# create symlinks for multi-pack-index
 		$git_midx += symlink_packs($ibx, $pd);
 		# add new lines to our alternates file
@@ -1180,7 +1192,8 @@ sub eidx_reload { # -extindex --watch SIGHUP handler
 		my $pr = $self->{-watch_sync}->{-opt}->{-progress};
 		$pr->('reloading ...') if $pr;
 		delete $self->{-resync_queue};
-		@{$self->{ibx_list}} = ();
+		delete $self->{-ibx_ary};
+		$self->{ibx_cfg} = [];
 		%{$self->{ibx_map}} = ();
 		delete $self->{-watch_sync}->{id2pos};
 		my $cfg = PublicInbox::Config->new;
@@ -1194,7 +1207,7 @@ sub eidx_reload { # -extindex --watch SIGHUP handler
 
 sub eidx_resync_start ($) { # -extindex --watch SIGUSR1 handler
 	my ($self) = @_;
-	$self->{-resync_queue} //= [ @{$self->{ibx_list}} ];
+	$self->{-resync_queue} //= [ @{ibx_sorted($self)} ];
 	PublicInbox::DS::requeue($self); # trigger our ->event_step
 }
 
@@ -1225,9 +1238,9 @@ sub eidx_watch { # public-inbox-extindex --watch main loop
 	require PublicInbox::Sigfd;
 	my $idler = PublicInbox::InboxIdle->new($self->{cfg});
 	if (!$self->{cfg}) {
-		$idler->watch_inbox($_) for @{$self->{ibx_list}};
+		$idler->watch_inbox($_) for (@{ibx_sorted($self)});
 	}
-	$_->subscribe_unlock(__PACKAGE__, $self) for @{$self->{ibx_list}};
+	$_->subscribe_unlock(__PACKAGE__, $self) for (@{ibx_sorted($self)});
 	my $pr = $opt->{-progress};
 	$pr->("performing initial scan ...\n") if $pr;
 	my $sync = eidx_sync($self, $opt); # initial sync
diff --git a/t/extsearch.t b/t/extsearch.t
index 5f0cd866..46a6f2ec 100644
--- a/t/extsearch.t
+++ b/t/extsearch.t
@@ -60,6 +60,38 @@ ok(run_script([qw(-extindex --all), "$home/extindex"]), 'extindex init');
 	ok($es->has_threadid, '->has_threadid');
 }
 
+if ('with boost') {
+	xsys([qw(git config publicinbox.v1test.boost), 10],
+		{ GIT_CONFIG => $cfg_path });
+	ok(run_script([qw(-extindex --all), "$home/extindex-b"]),
+		'extindex init with boost');
+	my $es = PublicInbox::ExtSearch->new("$home/extindex-b");
+	my $smsg = $es->over->get_art(1);
+	ok($smsg, 'got first article');
+	my $xref3 = $es->over->get_xref3($smsg->{num});
+	my @v1 = grep(/\Av1/, @$xref3);
+	my @v2 = grep(/\Av2/, @$xref3);
+	like($v1[0], qr/\Av1\.example.*?\b\Q$smsg->{blob}\E\b/,
+		'smsg->{blob} respected boost');
+	is(scalar(@$xref3), 2, 'only to entries');
+	undef $es;
+
+	xsys([qw(git config publicinbox.v2test.boost), 20],
+		{ GIT_CONFIG => $cfg_path });
+	ok(run_script([qw(-extindex --all --reindex), "$home/extindex-b"]),
+		'extindex --reindex with altered boost');
+
+	$es = PublicInbox::ExtSearch->new("$home/extindex-b");
+	$smsg = $es->over->get_art(1);
+	like($v2[0], qr/\Av2\.example.*?\b\Q$smsg->{blob}\E\b/,
+			'smsg->{blob} respects boost after reindex');
+
+	xsys([qw(git config --unset publicinbox.v1test.boost)],
+		{ GIT_CONFIG => $cfg_path });
+	xsys([qw(git config --unset publicinbox.v2test.boost)],
+		{ GIT_CONFIG => $cfg_path });
+}
+
 { # TODO: -extindex should write this to config
 	open $fh, '>>', $cfg_path or BAIL_OUT $!;
 	print $fh <<EOF or BAIL_OUT $!;

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH 2/2] init: allow arbitrary key-values via -c KEY=VALUE
  2021-07-20 21:18           ` Konstantin Ryabitsev
  2021-07-21 14:05             ` [PATCH 1/2] extsearch: support publicinbox.*.boost parameter Eric Wong
@ 2021-07-21 14:05             ` Eric Wong
  2021-07-25 10:40               ` [PATCH 3/2] init: support git <2.30 for "-c KEY=VALUE" args Eric Wong
  2021-07-21 15:32             ` [PATCH 0/2] things to make mirroring easier Konstantin Ryabitsev
  2 siblings, 1 reply; 12+ messages in thread
From: Eric Wong @ 2021-07-21 14:05 UTC (permalink / raw)
  To: meta; +Cc: Konstantin Ryabitsev

This won't blindly append identical key=values, but
allows specifying multiple, different key=value pairs
as long as the values are different.
---
 Documentation/public-inbox-init.pod |  7 ++++++
 script/public-inbox-init            | 36 ++++++++++++++++++++++++++++-
 t/init.t                            | 34 +++++++++++++++++++++++++++
 3 files changed, 76 insertions(+), 1 deletion(-)

diff --git a/Documentation/public-inbox-init.pod b/Documentation/public-inbox-init.pod
index 62624f14..85c6c9e8 100644
--- a/Documentation/public-inbox-init.pod
+++ b/Documentation/public-inbox-init.pod
@@ -60,6 +60,13 @@ Available in public-inbox 1.6.0+.
 
 Default: none.
 
+=item -c KEY=VALUE
+
+Allow setting arbitrary configs as C<publicinbox.$NAME.$KEY>.
+This is idempotent for the same C<VALUE>, but allows setting
+multiple values for keys such as C<publicinbox.$NAME.url> and
+C<publicinbox.$NAME.watch>.
+
 =item --skip-artnum
 
 This option allows archivists to publish incomplete archives
diff --git a/script/public-inbox-init b/script/public-inbox-init
index 335eb476..e22a0564 100755
--- a/script/public-inbox-init
+++ b/script/public-inbox-init
@@ -22,6 +22,7 @@ options:
   -V2                 use scalable public-inbox-v2-format(5)
   -L LEVEL            index level `basic', `medium', or `full' (default: full)
   --ng NEWSGROUP      set NNTP newsgroup name
+  -c KEY=VALUE        set additional config option(s)
   --skip-artnum=NUM   NNTP article numbers to skip
   --skip-epoch=NUM    epochs to skip (-V2 only)
   -j JOBS             number of indexing jobs (-V2 only), (default: 4)
@@ -35,6 +36,7 @@ PublicInbox::Admin::require_or_die('-base');
 my ($version, $indexlevel, $skip_epoch, $skip_artnum, $jobs, $show_help);
 my $skip_docdata;
 my $ng = '';
+my @c_extra;
 my %opts = (
 	'V|version=i' => \$version,
 	'L|index-level|indexlevel=s' => \$indexlevel,
@@ -44,6 +46,7 @@ my %opts = (
 	'ng|newsgroup=s' => \$ng,
 	'skip-docdata' => \$skip_docdata,
 	'help|h' => \$show_help,
+	'c=s@' => \@c_extra,
 );
 my $usage_cb = sub {
 	print STDERR $help;
@@ -51,13 +54,38 @@ my $usage_cb = sub {
 };
 GetOptions(%opts) or $usage_cb->();
 if ($show_help) { print $help; exit 0 };
-PublicInbox::Admin::indexlevel_ok_or_die($indexlevel) if defined $indexlevel;
 my $name = shift @ARGV or $usage_cb->();
 my $inboxdir = shift @ARGV or $usage_cb->();
 my $http_url = shift @ARGV or $usage_cb->();
 my (@address) = @ARGV;
 @address or $usage_cb->();
 
+@c_extra = map {
+	my ($k, $v) = split(/=/, $_, 2);
+	defined($v) or die "Usage: -c KEY=VALUE\n";
+	$k =~ /\A[a-z]+\z/i or die "$k contains invalid characters\n";
+	$k = lc($k);
+	if ($k eq 'newsgroup') {
+		die "newsgroup already set ($ng)\n" if $ng ne '';
+		$ng = $v;
+		();
+	} elsif ($k eq 'address') {
+		push @address, $v; # for conflict checking
+		();
+	} elsif ($k =~ /\A(?:inboxdir|mainrepo)\z/) {
+		die "$k not allowed via -c $_\n"
+	} elsif ($k eq 'indexlevel') {
+		defined($indexlevel) and
+			die "indexlevel already set ($indexlevel)\n";
+		$indexlevel = $v;
+		();
+	} else {
+		$_
+	}
+} @c_extra;
+
+PublicInbox::Admin::indexlevel_ok_or_die($indexlevel) if defined $indexlevel;
+
 $ng =~ m![^A-Za-z0-9/_\.\-\~\@\+\=:]! and
 	die "--newsgroup `$ng' is not valid\n";
 ($ng =~ m!\A\.! || $ng =~ m!\.\z!) and
@@ -201,6 +229,12 @@ if (defined($indexlevel)) {
 }
 run_die([@x, "$pfx.newsgroup", $ng]) if $ng ne '';
 
+for my $kv (@c_extra) {
+	my ($k, $v) = split(/=/, $kv, 2);
+	# --fixed-value for idempotent invocations
+	run_die([@x, qw(--replace-all --fixed-value), "$pfx.$k", $v, $v]);
+}
+
 # needed for git prior to v2.1.0
 if (defined $perm) {
 	chmod($perm & 07777, $pi_config_tmp) or
diff --git a/t/init.t b/t/init.t
index d46bef23..7382e05b 100644
--- a/t/init.t
+++ b/t/init.t
@@ -48,6 +48,40 @@ sub quiet_fail {
 	is($? >> 8, 255, 'got expected exit code on lock failure');
 	ok(unlink("$cfgfile.lock"),
 		'-init did not unlink lock on failure');
+
+	my @init_args = ('i', "$tmpdir/i",
+		   qw(http://example.com/i i@example.com));
+	$cmd = [ qw(-init -c .bogus=val), @init_args ];
+	quiet_fail($cmd, 'invalid -c KEY=VALUE fails');
+	$cmd = [ qw(-init -c .bogus=val), @init_args ];
+	quiet_fail($cmd, '-c KEY-only fails');
+	$cmd = [ qw(-init -c address=clist@example.com), @init_args ];
+	quiet_fail($cmd, '-c address=CONFLICTING-VALUE fails');
+
+	$cmd = [ qw(-init -c no=problem -c no=problemo), @init_args ];
+	ok(run_script($cmd), '-c KEY=VALUE runs');
+	my $env = { GIT_CONFIG => "$ENV{PI_DIR}/config" };
+	chomp(my @v = xqx([qw(git config --get-all publicinbox.i.no)], $env));
+	is_deeply(\@v, [ qw(problem problemo) ]) or xbail(\@v);
+
+	ok(run_script($cmd), '-c KEY=VALUE runs idempotently');
+	chomp(my @v2 = xqx([qw(git config --get-all publicinbox.i.no)], $env));
+	is_deeply(\@v, \@v2, 'nothing repeated') or xbail(\@v2);
+
+	ok(run_script([@$cmd, '-c', 'no=more']), '-c KEY=VALUE addendum');
+	chomp(@v = xqx([qw(git config --get-all publicinbox.i.no)], $env));
+	is_deeply(\@v, [ qw(problem problemo more) ]) or xbail(\@v);
+
+
+	ok(run_script([@$cmd, '-c', 'no=problem']), '-c KEY=VALUE repeated');
+	chomp(@v = xqx([qw(git config --get-all publicinbox.i.no)], $env));
+	is_deeply(\@v, [ qw(problem problemo more) ]) or xbail(\@v);
+
+	ok(run_script([@$cmd, '-c', 'address=j@example.com']),
+		'-c KEY=VALUE address');
+	chomp(@v = xqx([qw(git config --get-all publicinbox.i.address)], $env));
+	is_deeply(\@v, [ qw(i@example.com j@example.com) ],
+		'extra address added via -c KEY=VALUE');
 }
 {
 	my $env = { PI_DIR => "$tmpdir/.public-inbox/" };

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 0/2] things to make mirroring easier
  2021-07-20 21:18           ` Konstantin Ryabitsev
  2021-07-21 14:05             ` [PATCH 1/2] extsearch: support publicinbox.*.boost parameter Eric Wong
  2021-07-21 14:05             ` [PATCH 2/2] init: allow arbitrary key-values via -c KEY=VALUE Eric Wong
@ 2021-07-21 15:32             ` Konstantin Ryabitsev
  2 siblings, 0 replies; 12+ messages in thread
From: Konstantin Ryabitsev @ 2021-07-21 15:32 UTC (permalink / raw)
  To: Eric Wong; +Cc: meta

On Wed, Jul 21, 2021 at 02:05:48PM +0000, Eric Wong wrote:
> publicinbox.<name>.boost is now supported (it should be obvious
> higher numbers are handled first, because OS scheduler
> "priority" always confuses me :x)
> 
> And -init handles arbitrary "-c KEY=VALUE" things like
> git-config and allows multi-value settings.

Great stuff, thanks for both of these!

-K

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH 3/2] init: support git <2.30 for "-c KEY=VALUE" args
  2021-07-21 14:05             ` [PATCH 2/2] init: allow arbitrary key-values via -c KEY=VALUE Eric Wong
@ 2021-07-25 10:40               ` Eric Wong
  0 siblings, 0 replies; 12+ messages in thread
From: Eric Wong @ 2021-07-25 10:40 UTC (permalink / raw)
  To: meta; +Cc: Konstantin Ryabitsev

It turns out `--fixed-value' is a relatively new git-config(1)
feature in git 2.30+ (December 2020).  So use the quotemeta
perlop for now since it seems compatible-enough for POSIX ERE
used by git.
---
 script/public-inbox-init | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/script/public-inbox-init b/script/public-inbox-init
index e22a0564..6fac4d18 100755
--- a/script/public-inbox-init
+++ b/script/public-inbox-init
@@ -231,8 +231,11 @@ run_die([@x, "$pfx.newsgroup", $ng]) if $ng ne '';
 
 for my $kv (@c_extra) {
 	my ($k, $v) = split(/=/, $kv, 2);
-	# --fixed-value for idempotent invocations
-	run_die([@x, qw(--replace-all --fixed-value), "$pfx.$k", $v, $v]);
+	# git 2.30+ has --fixed-value for idempotent invocations,
+	# but that's too new to depend on in 2021.  Perl quotemeta
+	# seems compatible enough for POSIX ERE which git uses
+	my $re = '^'.quotemeta($v).'$';
+	run_die([@x, qw(--replace-all), "$pfx.$k", $v, $re]);
 }
 
 # needed for git prior to v2.1.0

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2021-07-25 10:40 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-07-19 20:03 Restarting daemons on config file change Konstantin Ryabitsev
2021-07-19 20:49 ` Eric Wong
2021-07-20  8:58   ` [PATCH] httpd: fix SIGHUP by invalidating cache on reload Eric Wong
2021-07-20 17:00   ` Restarting daemons on config file change Konstantin Ryabitsev
2021-07-20 20:34     ` Eric Wong
2021-07-20 20:49       ` Konstantin Ryabitsev
2021-07-20 21:07         ` Eric Wong
2021-07-20 21:18           ` Konstantin Ryabitsev
2021-07-21 14:05             ` [PATCH 1/2] extsearch: support publicinbox.*.boost parameter Eric Wong
2021-07-21 14:05             ` [PATCH 2/2] init: allow arbitrary key-values via -c KEY=VALUE Eric Wong
2021-07-25 10:40               ` [PATCH 3/2] init: support git <2.30 for "-c KEY=VALUE" args Eric Wong
2021-07-21 15:32             ` [PATCH 0/2] things to make mirroring easier Konstantin Ryabitsev

user/dev discussion of public-inbox itself

This inbox may be cloned and mirrored by anyone:

	git clone --mirror https://public-inbox.org/meta
	git clone --mirror http://czquwvybam4bgbro.onion/meta
	git clone --mirror http://hjrcffqmbrq6wope.onion/meta
	git clone --mirror http://ou63pmih66umazou.onion/meta

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V1 meta meta/ https://public-inbox.org/meta \
		meta@public-inbox.org
	public-inbox-index meta

Example config snippet for mirrors.
Newsgroups are available over NNTP:
	nntp://news.public-inbox.org/inbox.comp.mail.public-inbox.meta
	nntp://7fh6tueqddpjyxjmgtdiueylzoqt6pt7hec3pukyptlmohoowvhde4yd.onion/inbox.comp.mail.public-inbox.meta
	nntp://ie5yzdi7fg72h7s4sdcztq5evakq23rdt33mfyfcddc5u3ndnw24ogqd.onion/inbox.comp.mail.public-inbox.meta
	nntp://4uok3hntl7oi7b4uf4rtfwefqeexfzil2w6kgk2jn5z2f764irre7byd.onion/inbox.comp.mail.public-inbox.meta
	nntp://news.gmane.io/gmane.mail.public-inbox.general
 note: .onion URLs require Tor: https://www.torproject.org/

code repositories for project(s) associated with this inbox:

	https://80x24.org/public-inbox.git

AGPL code for this site: git clone https://public-inbox.org/public-inbox.git