user/dev discussion of public-inbox itself
* [PATCH 0/2] extsearch: avoid stale Xapian results
@ 2020-12-27 11:01 Eric Wong
  2020-12-27 11:01 ` [PATCH 1/2] extsearch: unconditionally reopen on access Eric Wong
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: Eric Wong @ 2020-12-27 11:01 UTC
  To: meta

I noticed recent messages weren't showing up in search results
on http://lore.czquwvybam4bgbro.onion/all/

These should fix it, and we'll probably get rid of the
cleanup timers for per-inbox search and follow this
strategy.

Eric Wong (2):
  extsearch: unconditionally reopen on access
  miscsearch: take reopen from Search and use it

 lib/PublicInbox/ExtSearch.pm  | 4 +---
 lib/PublicInbox/MiscSearch.pm | 4 ++++
 lib/PublicInbox/WwwListing.pm | 3 +++
 3 files changed, 8 insertions(+), 3 deletions(-)


* [PATCH 1/2] extsearch: unconditionally reopen on access
  2020-12-27 11:01 [PATCH 0/2] extsearch: avoid stale Xapian results Eric Wong
@ 2020-12-27 11:01 ` Eric Wong
  2020-12-27 11:01 ` [PATCH 2/2] miscsearch: take reopen from Search and use it Eric Wong
  2020-12-28 15:32 ` [PATCH 0/2] extsearch: avoid stale Xapian results Kyle Meyer
  2 siblings, 0 replies; 4+ messages in thread
From: Eric Wong @ 2020-12-27 11:01 UTC
  To: meta

Since ExtSearch lacks the janky cleanup timer of
PublicInbox::Inbox objects, its search results get stale.

Reopen the Xapian DB on every ->search call for now, as
reducing reopen calls doesn't seem worth the complexity.

The Xapian::Database::reopen operation itself takes only ~50us
on my old workstation with 3 shards totaling <200GB.  Other
parts of Xapian dominate the search time, so the reopen seems
inconsequential with single-digit shard counts.
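
As a rough illustration of the approach described above, here is a
minimal, self-contained sketch.  The Toy::* packages are hypothetical
stand-ins (not the real PublicInbox classes), and Toy::DB only counts
reopen calls instead of talking to Xapian.  It assumes a reopen method
that refreshes the handle and returns $self, so aliasing the accessors
to it refreshes the DB on every access:

```perl
use strict;
use warnings;

# Hypothetical stand-in for a Xapian DB handle; it only counts reopens
package Toy::DB;
sub new { bless { reopened => 0 }, shift }
sub reopen { $_[0]->{reopened}++ }

# Hypothetical base class: reopen refreshes the handle, returns $self
package Toy::Search;
sub new { my ($class) = @_; bless { xdb => Toy::DB->new }, $class }
sub reopen {
	my ($self) = @_;
	if (my $xdb = $self->{xdb}) { $xdb->reopen }
	$self; # returning $self keeps method chaining possible
}

package Toy::ExtSearch;
our @ISA = ('Toy::Search');
no warnings 'once';
# both accessors refresh the handle before handing back the object
*isrch = *search = \&Toy::Search::reopen;

package main;
my $es = Toy::ExtSearch->new;
$es->search for 1..3;
$es->isrch;
print "reopened $es->{xdb}->{reopened} times\n";
```

Callers that previously got $self back from ->search still do; the
only difference in this sketch is that the handle is refreshed first.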
---
 lib/PublicInbox/ExtSearch.pm | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/lib/PublicInbox/ExtSearch.pm b/lib/PublicInbox/ExtSearch.pm
index a2b97798..7c9586a6 100644
--- a/lib/PublicInbox/ExtSearch.pm
+++ b/lib/PublicInbox/ExtSearch.pm
@@ -29,8 +29,6 @@ sub misc {
 	$self->{misc} //= PublicInbox::MiscSearch->new("$self->{xpfx}/misc");
 }
 
-sub search { $_[0] } # self
-
 # overrides PublicInbox::Search::_xdb
 sub _xdb {
 	my ($self) = @_;
@@ -126,6 +124,6 @@ no warnings 'once';
 *recent = \&PublicInbox::Inbox::recent;
 
 *max_git_epoch = *nntp_usable = *msg_by_path = \&mm; # undef
-*isrch = *search;
+*isrch = *search = \&PublicInbox::Search::reopen;
 
 1;


* [PATCH 2/2] miscsearch: take reopen from Search and use it
  2020-12-27 11:01 [PATCH 0/2] extsearch: avoid stale Xapian results Eric Wong
  2020-12-27 11:01 ` [PATCH 1/2] extsearch: unconditionally reopen on access Eric Wong
@ 2020-12-27 11:01 ` Eric Wong
  2020-12-28 15:32 ` [PATCH 0/2] extsearch: avoid stale Xapian results Kyle Meyer
  2 siblings, 0 replies; 4+ messages in thread
From: Eric Wong @ 2020-12-27 11:01 UTC
  To: meta

As with ExtSearch, MiscSearch lacks the janky cleanup timer of
PublicInbox::Inbox objects, leading to info about
inboxes/newsgroups going stale.  Fortunately, we don't use
MiscSearch very heavily, yet.

In the future, we may be able to detect new inboxes without
having to SIGHUP or restart daemons using MiscSearch.
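
For illustration only, a minimal sketch of borrowing a sub from
another package without inheriting from it, then calling it both as a
plain function and as a method.  The Toy::* packages and the
"generation" counter are hypothetical; the counter merely stands in
for refreshing a DB handle:

```perl
use strict;
use warnings;

package Toy::Search;     # hypothetical donor of reopen()
sub reopen {
	my ($self) = @_;
	$self->{generation}++; # stand-in for refreshing a DB handle
	$self;
}

package Toy::MiscSearch; # hypothetical; does not inherit from Toy::Search
sub new { bless { generation => 0 }, shift }
sub mset {
	my ($self, $qs) = @_;
	reopen($self); # parenthesized call, looked up at run time
	$qs = 'type:inbox' if $qs eq '';
	"$qs\@gen$self->{generation}";
}

no warnings 'once';
# install the borrowed sub into this package's symbol table
*reopen = \&Toy::Search::reopen;

package main;
my $misc = Toy::MiscSearch->new;
print $misc->mset(''), "\n";
$misc->reopen; # the alias is also callable as a method
print $misc->mset('foo'), "\n";
```

The alias can be installed after the sub that uses it because a
parenthesized function call is only resolved when it runs.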
---
 lib/PublicInbox/MiscSearch.pm | 4 ++++
 lib/PublicInbox/WwwListing.pm | 3 +++
 2 files changed, 7 insertions(+)

diff --git a/lib/PublicInbox/MiscSearch.pm b/lib/PublicInbox/MiscSearch.pm
index c6ce255f..6683d564 100644
--- a/lib/PublicInbox/MiscSearch.pm
+++ b/lib/PublicInbox/MiscSearch.pm
@@ -73,6 +73,7 @@ sub misc_enquire_once { # retry_reopen callback
 sub mset {
 	my ($self, $qs, $opt) = @_;
 	$opt ||= {};
+	reopen($self);
 	my $qp = $self->{qp} //= mi_qp_new($self);
 	$qs = 'type:inbox' if $qs eq '';
 	my $qr = $qp->parse_query($qs, $PublicInbox::Search::QP_FLAGS);
@@ -184,4 +185,7 @@ sub nntpd_cache_load {
 	retry_reopen($self, \&_nntpd_cache_load);
 }
 
+no warnings 'once';
+*reopen = \&PublicInbox::Search::reopen;
+
 1;
diff --git a/lib/PublicInbox/WwwListing.pm b/lib/PublicInbox/WwwListing.pm
index fce0e530..4b3f1674 100644
--- a/lib/PublicInbox/WwwListing.pm
+++ b/lib/PublicInbox/WwwListing.pm
@@ -69,6 +69,9 @@ sub hide_key { 'www' }
 sub response {
 	my ($class, $ctx) = @_;
 	bless $ctx, $class;
+	if (my $ALL = $ctx->{www}->{pi_cfg}->ALL) {
+		$ALL->misc->reopen;
+	}
 	my $re = $ctx->url_regexp or return $ctx->psgi_triple;
 	my $iter = PublicInbox::ConfigIter->new($ctx->{www}->{pi_cfg},
 						\&list_match_i, $re, $ctx);


* Re: [PATCH 0/2] extsearch: avoid stale Xapian results
  2020-12-27 11:01 [PATCH 0/2] extsearch: avoid stale Xapian results Eric Wong
  2020-12-27 11:01 ` [PATCH 1/2] extsearch: unconditionally reopen on access Eric Wong
  2020-12-27 11:01 ` [PATCH 2/2] miscsearch: take reopen from Search and use it Eric Wong
@ 2020-12-28 15:32 ` Kyle Meyer
  2 siblings, 0 replies; 4+ messages in thread
From: Kyle Meyer @ 2020-12-28 15:32 UTC
  To: Eric Wong; +Cc: meta

Eric Wong writes:

> I noticed recent messages weren't showing up in search results
> on http://lore.czquwvybam4bgbro.onion/all/
>
> These should fix it, and we'll probably get rid of the
> cleanup timers for per-inbox search and follow this
> strategy.

I noticed that too but hadn't gotten around to reporting it.  This seems
to resolve the issue on my end.  Thanks!



This inbox may be cloned and mirrored by anyone:

	git clone --mirror http://public-inbox.org/meta
	git clone --mirror http://czquwvybam4bgbro.onion/meta
	git clone --mirror http://hjrcffqmbrq6wope.onion/meta
	git clone --mirror http://ou63pmih66umazou.onion/meta

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V1 meta meta/ http://public-inbox.org/meta \
		meta@public-inbox.org
	public-inbox-index meta

Newsgroups are available over NNTP:
	nntp://news.public-inbox.org/inbox.comp.mail.public-inbox.meta
	nntp://ou63pmih66umazou.onion/inbox.comp.mail.public-inbox.meta
	nntp://czquwvybam4bgbro.onion/inbox.comp.mail.public-inbox.meta
	nntp://hjrcffqmbrq6wope.onion/inbox.comp.mail.public-inbox.meta
	nntp://news.gmane.io/gmane.mail.public-inbox.general
 note: .onion URLs require Tor: https://www.torproject.org/

code repositories for the project(s) associated with this inbox:

	https://80x24.org/public-inbox.git

AGPL code for this site: git clone https://public-inbox.org/public-inbox.git