user/dev discussion of public-inbox itself
From: Eric Wong <e@80x24.org>
To: meta@public-inbox.org
Subject: [PATCH 04/26] search: reenable phrase search on non-chert Xapian
Date: Thu, 23 May 2019 09:36:42 +0000
Message-ID: <20190523093704.18367-5-e@80x24.org>
In-Reply-To: <20190523093704.18367-1-e@80x24.org>

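Phrase search is now enabled unless a Xapian DB (or any v2
partition) contains an "iamchert" marker file.  This assumes
nobody uses flint or earlier anymore, as flint predates the
existence of this project.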
---
 lib/PublicInbox/Search.pm | 48 +++++++++++++++++++++++----------------
 t/search.t                |  1 +
 2 files changed, 30 insertions(+), 19 deletions(-)

diff --git a/lib/PublicInbox/Search.pm b/lib/PublicInbox/Search.pm
index eae10d8..d861cf4 100644
--- a/lib/PublicInbox/Search.pm
+++ b/lib/PublicInbox/Search.pm
@@ -24,8 +24,8 @@ sub load_xapian () {
 
 		# n.b. FLAG_PURE_NOT is expensive not suitable for a public
 		# website as it could become a denial-of-service vector
-		# FLAG_PHRASE also seems to cause performance problems
-		# sometimes.
+		# FLAG_PHRASE also seems to cause performance problems on
+		# chert (and probably earlier Xapian DBs).  glass seems fine...
 		# TODO: make this an option, maybe?
 		# or make indexlevel=medium as default
 		FLAG_PHRASE()|FLAG_BOOLEAN()|FLAG_LOVEHATE()|FLAG_WILDCARD();
@@ -137,26 +137,35 @@ sub xdir ($;$) {
 	}
 }
 
+sub _xdb ($) {
+	my ($self) = @_;
+	my $dir = xdir($self, 1);
+	my ($xdb, $slow_phrase);
+	my $qpf = \($self->{qp_flags} ||= $QP_FLAGS);
+	if ($self->{version} >= 2) {
+		foreach my $part (<$dir/*>) {
+			-d $part && $part =~ m!/\d+\z! or next;
+			my $sub = Search::Xapian::Database->new($part);
+			if ($xdb) {
+				$xdb->add_database($sub);
+			} else {
+				$xdb = $sub;
+			}
+			$slow_phrase ||= -f "$part/iamchert";
+		}
+	} else {
+		$slow_phrase = -f "$dir/iamchert";
+		$xdb = Search::Xapian::Database->new($dir);
+	}
+	$$qpf |= FLAG_PHRASE() unless $slow_phrase;
+	$xdb;
+}
+
 sub xdb ($) {
 	my ($self) = @_;
 	$self->{xdb} ||= do {
 		load_xapian();
-		my $dir = xdir($self, 1);
-		if ($self->{version} >= 2) {
-			my $xdb;
-			foreach my $part (<$dir/*>) {
-				-d $part && $part =~ m!/\d+\z! or next;
-				my $sub = Search::Xapian::Database->new($part);
-				if ($xdb) {
-					$xdb->add_database($sub);
-				} else {
-					$xdb = $sub;
-				}
-			}
-			$xdb;
-		} else {
-			Search::Xapian::Database->new($dir);
-		}
+		_xdb($self);
 	};
 }
 
@@ -194,7 +203,8 @@ sub query {
 		$self->{over_ro}->recent($opts);
 	} else {
 		my $qp = qp($self);
-		my $query = $qp->parse_query($query_string, $QP_FLAGS);
+		my $qp_flags = $self->{qp_flags};
+		my $query = $qp->parse_query($query_string, $qp_flags);
 		$opts->{relevance} = 1 unless exists $opts->{relevance};
 		_do_enquire($self, $query, $opts);
 	}
diff --git a/t/search.t b/t/search.t
index c063620..538baef 100644
--- a/t/search.t
+++ b/t/search.t
@@ -30,6 +30,7 @@ my $ro = PublicInbox::Search->new($git_dir);
 my $rw_commit = sub {
 	$rw->commit_txn_lazy if $rw;
 	$rw = PublicInbox::SearchIdx->new($git_dir, 1);
+	$rw->{qp_flags} = 0; # quiet a warning
 	$rw->begin_txn_lazy;
 };
 
-- 
EW
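
The detection logic in _xdb() can be tried without a Xapian install.
What follows is only a minimal sketch, not the code from this patch:
is_chert() and qp_flags_for() are hypothetical helpers, the FLAG_*
values are stand-ins for the constants Search::Xapian exports, and the
base flag set (without FLAG_PHRASE) is an assumption made for
illustration.  It shows that one chert partition is enough to keep
phrase search off for the whole set of directories.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Stand-in bit values for illustration only; real code should use
# the FLAG_* constants exported by Search::Xapian.
use constant {
	FLAG_BOOLEAN => 1,
	FLAG_PHRASE => 2,
	FLAG_LOVEHATE => 4,
	FLAG_WILDCARD => 16,
};

# hypothetical helper: true if $dir holds the chert marker file
sub is_chert ($) {
	my ($dir) = @_;
	-f "$dir/iamchert";
}

# hypothetical helper: compute query parser flags for a list of DB
# directories, enabling phrase search only if none of them is chert
sub qp_flags_for {
	my (@dirs) = @_;
	my $flags = FLAG_BOOLEAN | FLAG_LOVEHATE | FLAG_WILDCARD;
	my $slow_phrase = grep { is_chert($_) } @dirs;
	$flags |= FLAG_PHRASE unless $slow_phrase;
	$flags;
}

# Fake two partitions: "0" has no marker, "1" is marked as chert
my $tmp = tempdir(CLEANUP => 1);
my ($glass, $chert) = ("$tmp/0", "$tmp/1");
mkdir $_ or die "mkdir $_: $!" for ($glass, $chert);
open(my $fh, '>', "$chert/iamchert") or die "open: $!";
close $fh or die "close: $!";

printf "no chert: phrase %s\n",
	(qp_flags_for($glass) & FLAG_PHRASE) ? 'on' : 'off';
printf "one chert partition: phrase %s\n",
	(qp_flags_for($glass, $chert) & FLAG_PHRASE) ? 'on' : 'off';
```

Checking every partition mirrors the "$slow_phrase ||= ..." loop in the
patch: a mixed set of partitions is treated as slow, which seems the
safer default while a migration away from chert is only partly done.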



Thread overview: 28+ messages
2019-05-23  9:36 [PATCH 00/26] xcpdb: ease Xapian DB format migrations Eric Wong
2019-05-23  9:36 ` [PATCH 01/26] t/convert-compact: skip on missing xapian-compact(1) Eric Wong
2019-05-23  9:36 ` [PATCH 02/26] v1writable: retire in favor of InboxWritable Eric Wong
2019-05-23  9:36 ` [PATCH 03/26] doc: document the reason for --no-renumber Eric Wong
2019-05-23  9:36 ` [PATCH 04/26] search: reenable phrase search on non-chert Xapian Eric Wong [this message]
2019-05-23  9:36 ` [PATCH 05/26] xapcmd: new module for wrapping Xapian commands Eric Wong
2019-05-23  9:36 ` [PATCH 06/26] admin: hoist out resolve_inboxes for -compact and -index Eric Wong
2019-05-23  9:36 ` [PATCH 07/26] xapcmd: support spawn options Eric Wong
2019-05-23  9:36 ` [PATCH 08/26] xcpdb: new tool which wraps Xapian's copydatabase(1) Eric Wong
2019-05-23  9:36 ` [PATCH 09/26] xapcmd: do not cleanup on errors Eric Wong
2019-05-23  9:36 ` [PATCH 10/26] admin: move index_inbox over Eric Wong
2019-05-23  9:36 ` [PATCH 11/26] xcpdb: implement using Perl bindings Eric Wong
2019-05-23  9:36 ` [PATCH 12/26] xapcmd: xcpdb supports compaction Eric Wong
2019-05-23  9:36 ` [PATCH 13/26] v2writable: hoist out log_range sub for readability Eric Wong
2019-05-23  9:36 ` [PATCH 14/26] xcpdb: use fine-grained locking Eric Wong
2019-05-23  9:36 ` [PATCH 15/26] xcpdb: implement progress reporting Eric Wong
2019-05-23  9:36 ` [PATCH 16/26] xcpdb: cleanup error handling and diagnosis Eric Wong
2019-05-23  9:36 ` [PATCH 17/26] xapcmd: avoid EXDEV when finalizing changes Eric Wong
2019-05-23  9:36 ` [PATCH 18/26] doc: xcpdb: update to reflect the current state Eric Wong
2019-05-23  9:36 ` [PATCH 19/26] xapcmd: use "print STDERR" for progress reporting Eric Wong
2019-05-23  9:36 ` [PATCH 20/26] xcpdb: show re-indexing progress Eric Wong
2019-05-23  9:36 ` [PATCH 21/26] xcpdb: remove temporary directories on aborts Eric Wong
2019-05-23  9:37 ` [PATCH 22/26] compact: reuse infrastructure from xcpdb Eric Wong
2019-05-23  9:37 ` [PATCH 23/26] xcpdb|compact: support some xapian-compact switches Eric Wong
2019-05-23  9:37 ` [PATCH 24/26] xapcmd: cleanup on interrupted xcpdb "--compact" Eric Wong
2019-05-23  9:37 ` [PATCH 25/26] xcpdb|compact: support --jobs/-j flag like gmake(1) Eric Wong
2019-05-23  9:37 ` [PATCH 26/26] xapcmd: do not reset %SIG until last Xtmpdir is done Eric Wong
2019-05-23 10:37 ` [PATCH 27/26] doc: various updates to reflect current state Eric Wong
