* [PATCH 0/3] xap_helper improvements
@ 2025-02-23 13:13 Eric Wong
2025-02-23 13:13 ` [PATCH 1/3] xap_helper: enable FLAG_PURE_NOT in external process Eric Wong
` (2 more replies)
0 siblings, 3 replies; 4+ messages in thread
From: Eric Wong @ 2025-02-23 13:13 UTC
To: meta
Since we already allow reasonably expensive operations in xap_helper,
disallowing expensive pure NOT queries doesn't seem to make
sense, as there are many other ways to trigger slow queries.
Patches 2 and 3 should allow for minor memory reductions and speedups.
Eric Wong (3):
xap_helper: enable FLAG_PURE_NOT in external process
xap_helper: avoid temporary std::set in thread fp
xh_thread_fp: optimize OR query generation
lib/PublicInbox/XapHelper.pm | 4 +++-
lib/PublicInbox/xap_helper.h | 1 +
lib/PublicInbox/xh_thread_fp.h | 19 +++++++------------
3 files changed, 11 insertions(+), 13 deletions(-)
* [PATCH 1/3] xap_helper: enable FLAG_PURE_NOT in external process
2025-02-23 13:13 [PATCH 0/3] xap_helper improvements Eric Wong
@ 2025-02-23 13:13 ` Eric Wong
2025-02-23 13:13 ` [PATCH 2/3] xap_helper: avoid temporary std::set in thread fp Eric Wong
2025-02-23 13:13 ` [PATCH 3/3] xh_thread_fp: optimize OR query generation Eric Wong
2 siblings, 0 replies; 4+ messages in thread
From: Eric Wong @ 2025-02-23 13:13 UTC
To: meta
Since public-facing WWW uses an external process with
a non-blocking socket, clients will only see 503 errors
if the search is overloaded. While allowing pure NOT
queries is expensive, there are already many possible
ways of triggering expensive queries so it's probably not
a problem to allow one more.
---
lib/PublicInbox/XapHelper.pm | 4 +++-
lib/PublicInbox/xap_helper.h | 1 +
2 files changed, 4 insertions(+), 1 deletion(-)
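Reader's note (not part of the patch): Xapian documents FLAG_PURE_NOT
as requiring a list of all documents in the database, which is why it
is off by default. The sketch below is a toy inverted index, not
Xapian's implementation; ToyIndex, and_not, pure_not and demo_pure_not
are hypothetical names made up for illustration. It only shows why
"a NOT b" can stop after walking the postings of "a", while a pure
"NOT b" has to consider every document.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// Toy inverted index (hypothetical, NOT Xapian's data structures):
// maps each term to the set of document IDs containing it.
struct ToyIndex {
	unsigned ndocs = 0; // documents are numbered 1..ndocs
	std::map<std::string, std::set<unsigned>> postings;

	void add(const std::vector<std::string> &terms)
	{
		++ndocs;
		for (const std::string &t : terms)
			postings[t].insert(ndocs);
	}

	const std::set<unsigned> &lookup(const std::string &term) const
	{
		static const std::set<unsigned> empty;
		auto it = postings.find(term);
		return it == postings.end() ? empty : it->second;
	}

	// "a NOT b": only the posting list of `a` needs to be walked
	std::vector<unsigned> and_not(const std::string &a,
				const std::string &b, size_t *visited) const
	{
		assert(visited != nullptr);
		std::vector<unsigned> out;
		const std::set<unsigned> &excl = lookup(b);
		for (unsigned id : lookup(a)) {
			++*visited;
			if (!excl.count(id))
				out.push_back(id);
		}
		return out;
	}

	// pure "NOT b": there is no left-hand side to narrow the search,
	// so every document in the index is considered
	std::vector<unsigned> pure_not(const std::string &b,
				size_t *visited) const
	{
		assert(visited != nullptr);
		std::vector<unsigned> out;
		const std::set<unsigned> &excl = lookup(b);
		for (unsigned id = 1; id <= ndocs; ++id) {
			++*visited;
			if (!excl.count(id))
				out.push_back(id);
		}
		return out;
	}
};

// Usage sketch: compare how many documents each query form visits
void demo_pure_not()
{
	ToyIndex idx;
	idx.add({"patch", "xapian"}); // doc 1
	idx.add({"patch"});           // doc 2
	idx.add({"spam"});            // doc 3
	idx.add({"xapian"});          // doc 4

	size_t narrowed = 0, full = 0;
	idx.and_not("patch", "xapian", &narrowed);
	idx.pure_not("xapian", &full);
	assert(narrowed == 2);      // only the two "patch" documents
	assert(full == idx.ndocs);  // every document in the index
}
```

In this toy model the cost of the pure NOT grows with the size of the
whole index rather than with the size of one posting list, which is the
expense the commit message chooses to accept.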
diff --git a/lib/PublicInbox/XapHelper.pm b/lib/PublicInbox/XapHelper.pm
index 7e61631c..15abed79 100644
--- a/lib/PublicInbox/XapHelper.pm
+++ b/lib/PublicInbox/XapHelper.pm
@@ -20,6 +20,7 @@ use Carp qw(croak);
my $X = \%PublicInbox::Search::X;
our (%SRCH, %WORKERS, $nworker, $workerset, $in, $SHARD_NFD, $MY_FD_MAX);
our $stderr = \*STDERR;
+my $QP_FLAGS = $PublicInbox::Search::QP_FLAGS;
sub cmd_test_inspect {
my ($req) = @_;
@@ -194,7 +195,7 @@ sub dispatch {
$key .= "\0".join("\0", map { ('-Q', $_) } @{$req->{Q}}) if $req->{Q};
my $new;
$req->{srch} = $SRCH{$key} // do {
- $new = { qp_flags => $PublicInbox::Search::QP_FLAGS };
+ $new = { qp_flags => $QP_FLAGS };
my $nfd = scalar(@$dirs) * PublicInbox::Search::SHARD_COST;
$SHARD_NFD += $nfd;
if ($SHARD_NFD > $MY_FD_MAX) {
@@ -337,6 +338,7 @@ sub start (@) {
die "E: unable to get RLIMIT_NOFILE: $!";
warn "W: RLIMIT_NOFILE=$MY_FD_MAX too low\n" if $MY_FD_MAX < 72;
$MY_FD_MAX -= 64;
+ $QP_FLAGS |= PublicInbox::Search::FLAG_PURE_NOT();
local $nworker = $opt->{j};
return recv_loop() if $nworker == 0;
diff --git a/lib/PublicInbox/xap_helper.h b/lib/PublicInbox/xap_helper.h
index 7e48de8a..9c8436dc 100644
--- a/lib/PublicInbox/xap_helper.h
+++ b/lib/PublicInbox/xap_helper.h
@@ -590,6 +590,7 @@ static void srch_init(struct req *req)
const unsigned FLAG_PHRASE = Xapian::QueryParser::FLAG_PHRASE;
srch->qp_flags = Xapian::QueryParser::FLAG_BOOLEAN |
Xapian::QueryParser::FLAG_LOVEHATE |
+ Xapian::QueryParser::FLAG_PURE_NOT |
Xapian::QueryParser::FLAG_WILDCARD;
long nfd = req->dirc * SHARD_COST;
* [PATCH 2/3] xap_helper: avoid temporary std::set in thread fp
2025-02-23 13:13 [PATCH 0/3] xap_helper improvements Eric Wong
2025-02-23 13:13 ` [PATCH 1/3] xap_helper: enable FLAG_PURE_NOT in external process Eric Wong
@ 2025-02-23 13:13 ` Eric Wong
2025-02-23 13:13 ` [PATCH 3/3] xh_thread_fp: optimize OR query generation Eric Wong
2 siblings, 0 replies; 4+ messages in thread
From: Eric Wong @ 2025-02-23 13:13 UTC
To: meta
Instead of creating a temporary std::set to store THREADIDs,
just inject the Xapian::Query::OP_OR operations directly into
the query we return.
Unlike notmuch(1), we can do this without risking redundant
query ops from repeated THREADIDs since we use a column
instead of a term for THREADID. Column use already allowed
us to use .set_collapse_key to deduplicate the intermediate
mset.
---
lib/PublicInbox/xh_thread_fp.h | 19 +++++++------------
1 file changed, 7 insertions(+), 12 deletions(-)
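Reader's note (not part of the patch): the argument in the commit
message is that the std::set only served to deduplicate THREADIDs, and
the collapsed mset is already deduplicated. The sketch below models
that with plain strings instead of Xapian objects; ToyOrQuery,
build_via_set, build_direct, same_matches and demo_direct_or are
hypothetical names, and the assumption that the input values are
unique stands in for the effect of .set_collapse_key.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Toy stand-in for an OR query (hypothetical, not Xapian::Query):
// it only records the leaf values OR-ed together, in insertion order.
struct ToyOrQuery {
	std::vector<std::string> leaves;
	void or_with(const std::string &val) { leaves.push_back(val); }
};

// old approach: collect values into a std::set, then build the query
ToyOrQuery build_via_set(const std::vector<std::string> &vals)
{
	std::set<std::string> uniq(vals.begin(), vals.end());
	ToyOrQuery q;
	for (const std::string &v : uniq)
		q.or_with(v);
	return q;
}

// new approach: append each value to the query as it is seen
ToyOrQuery build_direct(const std::vector<std::string> &vals)
{
	ToyOrQuery q;
	for (const std::string &v : vals)
		q.or_with(v);
	return q;
}

// true if both queries would match the same set of values
bool same_matches(const ToyOrQuery &a, const ToyOrQuery &b)
{
	std::set<std::string> sa(a.leaves.begin(), a.leaves.end());
	std::set<std::string> sb(b.leaves.begin(), b.leaves.end());
	return sa == sb;
}

// Usage sketch: with unique input (assumed here, as after collapsing
// on the column), both approaches produce the same number of leaves
void demo_direct_or()
{
	std::vector<std::string> tids = { "t3", "t1", "t2" };
	ToyOrQuery a = build_via_set(tids), b = build_direct(tids);
	assert(same_matches(a, b));
	assert(a.leaves.size() == b.leaves.size());
}
```

If the input did contain repeats, the direct version would still match
the same values but carry redundant leaves, which is the risk the
commit message attributes to a term-based THREADID.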
diff --git a/lib/PublicInbox/xh_thread_fp.h b/lib/PublicInbox/xh_thread_fp.h
index 2c88401c..27b5cfcb 100644
--- a/lib/PublicInbox/xh_thread_fp.h
+++ b/lib/PublicInbox/xh_thread_fp.h
@@ -10,13 +10,17 @@ public:
Xapian::Query operator()(const std::string &str);
};
-static enum exc_iter xpand_col_iter(std::set<std::string> &vals,
+static enum exc_iter xpand_col_iter(Xapian::Query *xqry,
Xapian::MSetIterator *i,
unsigned column)
{
try {
Xapian::Document doc = i->get_document();
- vals.insert(doc.get_value(column));
+ std::string val = doc.get_value(column);
+ *xqry = Xapian::Query(Xapian::Query::OP_OR, *xqry,
+ Xapian::Query(
+ Xapian::Query::OP_VALUE_RANGE,
+ column, val, val));
} catch (const Xapian::DatabaseModifiedError &e) {
cur_srch->db->reopen();
return ITER_RETRY;
@@ -29,9 +33,7 @@ static enum exc_iter xpand_col_iter(std::set<std::string> &vals,
static Xapian::Query qry_xpand_col(Xapian::Query qry, unsigned column)
{
Xapian::Query xqry = Xapian::Query::MatchNothing;
-
Xapian::Enquire enq(*cur_srch->db);
- std::set<std::string> vals; // serialised Xapian column
enq.set_weighting_scheme(Xapian::BoolWeight());
enq.set_query(qry);
@@ -41,19 +43,12 @@ static Xapian::Query qry_xpand_col(Xapian::Query qry, unsigned column)
for (Xapian::MSetIterator i = mset.begin(); i != mset.end(); i++) {
for (int t = 10; t > 0; --t)
- switch (xpand_col_iter(vals, &i, column)) {
+ switch (xpand_col_iter(&xqry, &i, column)) {
case ITER_OK: t = 0; break; // leave inner loop
case ITER_RETRY: break; // continue for-loop
case ITER_ABORT: return xqry; // impossible
}
}
-
- std::set<std::string>::const_iterator tid;
- for (tid = vals.begin(); tid != vals.end(); tid++)
- xqry = Xapian::Query(Xapian::Query::OP_OR, xqry,
- Xapian::Query(
- Xapian::Query::OP_VALUE_RANGE,
- column, *tid, *tid));
return xqry;
}
* [PATCH 3/3] xh_thread_fp: optimize OR query generation
2025-02-23 13:13 [PATCH 0/3] xap_helper improvements Eric Wong
2025-02-23 13:13 ` [PATCH 1/3] xap_helper: enable FLAG_PURE_NOT in external process Eric Wong
2025-02-23 13:13 ` [PATCH 2/3] xap_helper: avoid temporary std::set in thread fp Eric Wong
@ 2025-02-23 13:13 ` Eric Wong
2 siblings, 0 replies; 4+ messages in thread
From: Eric Wong @ 2025-02-23 13:13 UTC
To: meta
For Xapian >= 1.4.10 users, using `|=' will reduce allocations.
`|=' still works for Xapian 1.3.2 .. 1.4.9 users, and I'm not
sure if Xapian 1.2.x is relevant anymore, especially for our
new C++-only features.
While operator overloading is often confusing and frustrating to
me when reading someone else's code, the optimization seems worth
it since (AFAIK) there's no other way to get the allocation
reduction.
cf. Olly in xapian-discuss <20250222043050.GA17282@survex.com>
---
lib/PublicInbox/xh_thread_fp.h | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
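Reader's note (not part of the patch): the sketch below is a toy model
of one way an in-place `|=' could save allocations compared to
rebuilding the query with a fresh OP_OR node on every iteration. It
is an assumption for illustration only and does not reproduce Xapian's
actual internals; ToyNode, or_new, or_assign, count_allocs and
toy_allocs are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Toy query tree (hypothetical; Xapian's real internals differ)
struct ToyNode {
	std::string leaf;                           // set for leaf nodes
	std::vector<std::shared_ptr<ToyNode>> kids; // set for OR nodes
};

size_t toy_allocs = 0; // number of ToyNode objects created so far

std::shared_ptr<ToyNode> new_node()
{
	++toy_allocs;
	return std::make_shared<ToyNode>();
}

std::shared_ptr<ToyNode> new_leaf(const std::string &val)
{
	std::shared_ptr<ToyNode> n = new_node();
	n->leaf = val;
	return n;
}

// models `q = Query(OP_OR, q, rhs)': always creates a new OR node
std::shared_ptr<ToyNode> or_new(std::shared_ptr<ToyNode> q,
				std::shared_ptr<ToyNode> rhs)
{
	assert(rhs);
	std::shared_ptr<ToyNode> n = new_node();
	if (q)
		n->kids.push_back(q);
	n->kids.push_back(rhs);
	return n;
}

// models `q |= rhs': appends to an existing OR node when there is one
void or_assign(std::shared_ptr<ToyNode> &q, std::shared_ptr<ToyNode> rhs)
{
	assert(rhs);
	if (!q || q->kids.empty()) // empty query or a lone leaf
		q = or_new(q, rhs);
	else
		q->kids.push_back(rhs);
}

// builds an OR of `n' leaves and returns how many nodes were created
size_t count_allocs(bool in_place, size_t n)
{
	toy_allocs = 0;
	std::shared_ptr<ToyNode> q; // empty, standing in for MatchNothing
	for (size_t i = 0; i < n; ++i) {
		std::shared_ptr<ToyNode> rhs =
			new_leaf("tid" + std::to_string(i));
		if (in_place)
			or_assign(q, rhs);
		else
			q = or_new(q, rhs);
	}
	return toy_allocs;
}
```

In this model, count_allocs(false, n) creates 2n nodes (one leaf and
one OR node per value) while count_allocs(true, n) creates n + 1 (one
leaf per value plus a single shared OR node). Whether Xapian's savings
take this exact form is not something this sketch establishes.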
diff --git a/lib/PublicInbox/xh_thread_fp.h b/lib/PublicInbox/xh_thread_fp.h
index 27b5cfcb..8f0385f1 100644
--- a/lib/PublicInbox/xh_thread_fp.h
+++ b/lib/PublicInbox/xh_thread_fp.h
@@ -17,10 +17,10 @@ static enum exc_iter xpand_col_iter(Xapian::Query *xqry,
try {
Xapian::Document doc = i->get_document();
std::string val = doc.get_value(column);
- *xqry = Xapian::Query(Xapian::Query::OP_OR, *xqry,
- Xapian::Query(
- Xapian::Query::OP_VALUE_RANGE,
- column, val, val));
+ // n.b. Xapian 1.4.10+ optimizes `|=' to reduce allocation.
+ // operator overloading is confusing, yes :<
+ *xqry |= Xapian::Query(Xapian::Query::OP_VALUE_RANGE,
+ column, val, val);
} catch (const Xapian::DatabaseModifiedError &e) {
cur_srch->db->reopen();
return ITER_RETRY;
Code repositories for project(s) associated with this public inbox
https://80x24.org/public-inbox.git