user/dev discussion of public-inbox itself
* [PATCH 0/4] www: support thread:{SUBQUERY} like notmuch
@ 2025-02-20 22:14 Eric Wong
  2025-02-20 22:14 ` [PATCH 1/4] xap_helper: switch C++ implementation to AGPL-3 Eric Wong
                   ` (3 more replies)
  0 siblings, 4 replies; 5+ messages in thread
From: Eric Wong @ 2025-02-20 22:14 UTC (permalink / raw)
  To: meta

This feature requires the C++ xap_helper implementation since custom
Xapian::FieldProcessors aren't possible with the Perl bindings.

The cost issues are mitigated by making the xap_helper socket
non-blocking in the public daemons:
https://public-inbox.org/meta/2a3c504f6e10c0eb81ff2c4289166396584f9a04/s/
(daemon: make xap_helper socket non-blocking, 2025-02-11)

Note that thread:{SUBQUERY} has the same wonky quoting rules as
notmuch due to how the Xapian query parser works: if the subquery
contains spaces, the entire `{}' must be enclosed in double quotes
(") like this:

	thread:"{f:bob tc:alice}"

And it's even uglier in the URL:

	https://yhbt.net/lore/all/?q=thread%3A%22%7Bf%3Abob+tc%3Aalice%7D%22
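
The URL form is the same query string with form-style percent-encoding
applied.  A minimal sketch of that encoding follows, assuming the usual
application/x-www-form-urlencoded rules (space becomes `+', reserved
bytes become %XX).  url_encode_query is a hypothetical helper written
for illustration only; it is not part of public-inbox:

```cpp
#include <cctype>
#include <string>

// Hypothetical helper (not part of public-inbox): form-encode a search
// string so it can be used as the value of the `q=' URL parameter.
static std::string url_encode_query(const std::string &q)
{
	static const char hex[] = "0123456789ABCDEF";
	std::string out;

	for (unsigned char c : q) {
		if (std::isalnum(c) || c == '-' || c == '_' ||
				c == '.' || c == '~') {
			out += static_cast<char>(c); // unreserved, keep as-is
		} else if (c == ' ') {
			out += '+'; // form encoding for a space
		} else {
			out += '%'; // everything else becomes %XX
			out += hex[c >> 4];
			out += hex[c & 0x0F];
		}
	}
	return out;
}
```

Passing thread:"{f:bob tc:alice}" through this helper should yield the
q= value shown in the URL.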

`thread:MSGID' also works (similar to `thread:THREADID' in notmuch[1]).

For 3/4 to work with `thread:GHOST-MSGID', existing indices must be
reindexed.  That takes 3-4 days on my HW with -extindex, so I
haven't started yet since I'm considering adding --xapian-only to
-extindex to speed up reindexing.

Unfortunately, --xapian-only in -extindex needs more care than
v2 -index since we need to deal with List-IDs being different
across cross-posted messages.

This isn't wired into lei yet, but that's another work-in-progress...

Eric Wong (4):
  xap_helper: switch C++ implementation to AGPL-3
  xap_helper: support thread:{SUBQUERY} via C++
  search: index References: for thread:GHOST-MSGID
  searchidx: doc: note ->add_message is v1+tests only

 MANIFEST                        |  1 +
 lib/PublicInbox/Search.pm       |  4 +-
 lib/PublicInbox/SearchIdx.pm    |  9 ++--
 lib/PublicInbox/XapHelperCxx.pm |  3 +-
 lib/PublicInbox/xap_helper.h    | 18 +++++--
 lib/PublicInbox/xh_cidx.h       |  2 +-
 lib/PublicInbox/xh_mset.h       |  2 +-
 lib/PublicInbox/xh_thread_fp.h  | 77 ++++++++++++++++++++++++++++
 t/search.t                      | 10 ++++
 t/xap_helper.t                  | 89 +++++++++++++++++++++++++++++++++
 10 files changed, 203 insertions(+), 12 deletions(-)
 create mode 100644 lib/PublicInbox/xh_thread_fp.h

[1] we don't expose THREADIDs in the WWW UI due to lack of
    reproducibility amongst mirrors

* [PATCH 1/4] xap_helper: switch C++ implementation to AGPL-3
  2025-02-20 22:14 [PATCH 0/4] www: support thread:{SUBQUERY} like notmuch Eric Wong
@ 2025-02-20 22:14 ` Eric Wong
  2025-02-20 22:14 ` [PATCH 2/4] xap_helper: support thread:{SUBQUERY} via C++ Eric Wong
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 5+ messages in thread
From: Eric Wong @ 2025-02-20 22:14 UTC (permalink / raw)
  To: meta

GPL-2 approxidate code won't work with the XS/SWIG version, so
it looks like we'll keep calling `git rev-parse' in both
versions for the time being.  Meanwhile, it's more valuable to
be able to take GPL-3+ code from notmuch for thread:{} query
parsing.
---
 lib/PublicInbox/xap_helper.h | 3 +--
 lib/PublicInbox/xh_cidx.h    | 2 +-
 lib/PublicInbox/xh_mset.h    | 2 +-
 3 files changed, 3 insertions(+), 4 deletions(-)

diff --git a/lib/PublicInbox/xap_helper.h b/lib/PublicInbox/xap_helper.h
index 51ab48bf..95896725 100644
--- a/lib/PublicInbox/xap_helper.h
+++ b/lib/PublicInbox/xap_helper.h
@@ -1,7 +1,6 @@
 /*
  * Copyright (C) all contributors <meta@public-inbox.org>
- * License: GPL-2.0+ <https://www.gnu.org/licenses/gpl-2.0.txt>
- * Note: GPL-2+ since it'll incorporate approxidate from git someday
+ * License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
  *
  * Standalone helper process using C and minimal C++ for Xapian,
  * this is not linked to Perl in any way.
diff --git a/lib/PublicInbox/xh_cidx.h b/lib/PublicInbox/xh_cidx.h
index 8cc6a845..095999d0 100644
--- a/lib/PublicInbox/xh_cidx.h
+++ b/lib/PublicInbox/xh_cidx.h
@@ -1,5 +1,5 @@
 // Copyright (C) all contributors <meta@public-inbox.org>
-// License: GPL-2.0+ <https://www.gnu.org/licenses/gpl-2.0.txt>
+// License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
 // This file is only intended to be included by xap_helper.h
 // it implements pieces used by CodeSearchIdx.pm
 
diff --git a/lib/PublicInbox/xh_mset.h b/lib/PublicInbox/xh_mset.h
index db2692c9..6fdecc39 100644
--- a/lib/PublicInbox/xh_mset.h
+++ b/lib/PublicInbox/xh_mset.h
@@ -1,5 +1,5 @@
 // Copyright (C) all contributors <meta@public-inbox.org>
-// License: GPL-2.0+ <https://www.gnu.org/licenses/gpl-2.0.txt>
+// License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
 // This file is only intended to be included by xap_helper.h
 // it implements pieces used by WWW, IMAP and lei
 

* [PATCH 2/4] xap_helper: support thread:{SUBQUERY} via C++
  2025-02-20 22:14 [PATCH 0/4] www: support thread:{SUBQUERY} like notmuch Eric Wong
  2025-02-20 22:14 ` [PATCH 1/4] xap_helper: switch C++ implementation to AGPL-3 Eric Wong
@ 2025-02-20 22:14 ` Eric Wong
  2025-02-20 22:14 ` [PATCH 3/4] search: index References: for thread:GHOST-MSGID Eric Wong
  2025-02-20 22:14 ` [PATCH 4/4] searchidx: doc: note ->add_message is v1+tests only Eric Wong
  3 siblings, 0 replies; 5+ messages in thread
From: Eric Wong @ 2025-02-20 22:14 UTC (permalink / raw)
  To: meta

Steal the idea and code from notmuch to perform subqueries
within search.  One major internal difference from notmuch is that
we store THREADID as a numeric column in Xapian whereas notmuch
stores a boolean term.  Using a column lets us use
set_collapse_key to deduplicate results within Xapian itself.

The other difference from notmuch is that we avoid exposing
numeric THREADIDs since they're unstable and not reproducible in
mirrors; thus we support `thread:MSGID' instead of
`thread:THREADID' in brace-less queries.
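
The two-step expansion described above can be sketched without Xapian.
This is only an in-memory model under stated assumptions: Msg and
thread_expand are hypothetical names, the subquery is modelled as a
plain predicate, and a std::set stands in for the deduplication that
set_collapse_key performs inside Xapian:

```cpp
#include <functional>
#include <set>
#include <string>
#include <vector>

// Hypothetical in-memory stand-in for an indexed message; the patch
// itself reads THREADID from a Xapian value column instead.
struct Msg {
	std::string mid;
	unsigned long thread_id;
	std::string subject;
};

// Step 1: run the subquery and collect the distinct thread IDs it hits.
// Step 2: return the Message-ID of every message in those threads.
static std::vector<std::string>
thread_expand(const std::vector<Msg> &all,
		const std::function<bool(const Msg &)> &subquery)
{
	std::set<unsigned long> tids; // one entry per matching thread

	for (const Msg &m : all)
		if (subquery(m))
			tids.insert(m.thread_id);

	std::vector<std::string> mids;
	for (const Msg &m : all)
		if (tids.count(m.thread_id))
			mids.push_back(m.mid);
	return mids;
}
```

A subquery matching a single reply thus pulls in the rest of its
thread, while messages in unrelated threads stay out of the result.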
---
 MANIFEST                        |  1 +
 lib/PublicInbox/XapHelperCxx.pm |  3 +-
 lib/PublicInbox/xap_helper.h    | 15 +++++-
 lib/PublicInbox/xh_thread_fp.h  | 75 ++++++++++++++++++++++++++++++
 t/xap_helper.t                  | 82 +++++++++++++++++++++++++++++++++
 5 files changed, 173 insertions(+), 3 deletions(-)
 create mode 100644 lib/PublicInbox/xh_thread_fp.h

diff --git a/MANIFEST b/MANIFEST
index d4535038..ce1b2fdd 100644
--- a/MANIFEST
+++ b/MANIFEST
@@ -393,6 +393,7 @@ lib/PublicInbox/lg2.h
 lib/PublicInbox/xap_helper.h
 lib/PublicInbox/xh_cidx.h
 lib/PublicInbox/xh_mset.h
+lib/PublicInbox/xh_thread_fp.h
 sa_config/Makefile
 sa_config/README
 sa_config/root/etc/spamassassin/public-inbox.pre
diff --git a/lib/PublicInbox/XapHelperCxx.pm b/lib/PublicInbox/XapHelperCxx.pm
index 922bd583..817abcfb 100644
--- a/lib/PublicInbox/XapHelperCxx.pm
+++ b/lib/PublicInbox/XapHelperCxx.pm
@@ -28,7 +28,8 @@ $idir //= $ENV{PERL_INLINE_DIRECTORY} //
 substr($dir, 0, 0) = "$idir/";
 my $bin = "$dir/xap_helper";
 my ($srcpfx) = (__FILE__ =~ m!\A(.+/)[^/]+\z!);
-my @srcs = map { $srcpfx.$_ } qw(xh_mset.h xh_cidx.h xap_helper.h);
+my @srcs = map { $srcpfx.$_ }
+	qw(xh_mset.h xh_cidx.h xh_thread_fp.h xap_helper.h);
 my @pm_dep = map { $srcpfx.$_ } qw(Search.pm CodeSearch.pm);
 my $ldflags = '-Wl,-O1';
 $ldflags .= ' -Wl,--compress-debug-sections=zlib' if $^O ne 'openbsd';
diff --git a/lib/PublicInbox/xap_helper.h b/lib/PublicInbox/xap_helper.h
index 95896725..7e48de8a 100644
--- a/lib/PublicInbox/xap_helper.h
+++ b/lib/PublicInbox/xap_helper.h
@@ -139,6 +139,7 @@ static int srch_eq(const struct srch *a, const struct srch *b)
 KHASHL_CSET_INIT(KH_LOCAL, srch_set, srch_set, struct srch *,
 		srch_hash, srch_eq)
 static srch_set *srch_cache;
+static struct srch *cur_srch; // for ThreadFieldProcessor
 static long my_fd_max, shard_nfd;
 // sock_fd is modified in signal handler, yes, it's SOCK_SEQPACKET
 static volatile int sock_fd = STDIN_FILENO;
@@ -580,6 +581,8 @@ static void srch_cache_renew(struct srch *keep)
 	}
 }
 
+#include "xh_thread_fp.h" // ThreadFieldProcessor
+
 static void srch_init(struct req *req)
 {
 	int i;
@@ -634,10 +637,16 @@ static void srch_init(struct req *req)
 	srch->qp->set_stemming_strategy(Xapian::QueryParser::STEM_SOME);
 	srch->qp->SET_MAX_EXPANSION(100);
 
-	if (req->code_search)
+	if (req->code_search) {
 		qp_init_code_search(srch->qp); // CodeSearch.pm
-	else
+	} else {
+		Xapian::FieldProcessor *fp;
+
 		qp_init_mail_search(srch->qp); // Search.pm
+		// n.b. ->release() starts Xapian refcounting
+		fp = (new ThreadFieldProcessor(*srch->qp))->release();
+		srch->qp->add_boolean_prefix("thread", fp);
+	}
 }
 
 // setup query parser for altid and arbitrary headers
@@ -773,6 +782,7 @@ static void dispatch(struct req *req)
 	if (req->timeout_sec)
 		alarm(req->timeout_sec > UINT_MAX ?
 			UINT_MAX : (unsigned)req->timeout_sec);
+	cur_srch = req->srch; // set global for *FieldProcessor
 	try {
 		if (!req->fn(req))
 			warnx("`%s' failed", req->argv[0]);
@@ -834,6 +844,7 @@ static void req_cleanup(void *ptr)
 {
 	struct req *req = (struct req *)ptr;
 	free(req->lenv);
+	cur_srch = NULL;
 }
 
 static void reopen_logs(void)
diff --git a/lib/PublicInbox/xh_thread_fp.h b/lib/PublicInbox/xh_thread_fp.h
new file mode 100644
index 00000000..c7d36c36
--- /dev/null
+++ b/lib/PublicInbox/xh_thread_fp.h
@@ -0,0 +1,75 @@
+// thread field processor from notmuch - Copyright 2018 David Bremner
+// License: GPL-3.0+ <https://www.gnu.org/licenses/gpl-3.0.txt>
+// Disclaimer: Eric doesn't know C++
+
+class ThreadFieldProcessor : public Xapian::FieldProcessor {
+protected:
+	Xapian::QueryParser &qp;
+public:
+	ThreadFieldProcessor(Xapian::QueryParser &qp_) : qp(qp_) {};
+	Xapian::Query operator()(const std::string &str);
+};
+
+static enum exc_iter xpand_col_iter(std::set<std::string> &vals,
+					Xapian::MSetIterator *i,
+					unsigned column)
+{
+	try {
+		Xapian::Document doc = i->get_document();
+		vals.insert(doc.get_value(column));
+	} catch (const Xapian::DatabaseModifiedError &e) {
+		cur_srch->db->reopen();
+		return ITER_RETRY;
+	} catch (const Xapian::DocNotFoundError &e) { // oh well...
+		warnx("doc not found: %s", e.get_description().c_str());
+	}
+	return ITER_OK;
+}
+
+static Xapian::Query qry_xpand_col(Xapian::Query qry, unsigned column)
+{
+	Xapian::Query xqry = Xapian::Query::MatchNothing;
+
+	Xapian::Enquire enq(*cur_srch->db);
+	std::set<std::string> vals; // serialised Xapian column
+
+	enq.set_weighting_scheme(Xapian::BoolWeight());
+	enq.set_query(qry);
+	enq.set_collapse_key(column);
+
+	Xapian::MSet mset = enq.get_mset(0, cur_srch->db->get_doccount());
+
+	for (Xapian::MSetIterator i = mset.begin(); i != mset.end(); i++)  {
+		for (int t = 10; t > 0; --t)
+			switch (xpand_col_iter(vals, &i, column)) {
+			case ITER_OK: t = 0; break; // leave inner loop
+			case ITER_RETRY: break; // continue for-loop
+			case ITER_ABORT: return xqry; // impossible
+			}
+	}
+
+	std::set<std::string>::const_iterator tid;
+	for (tid = vals.begin(); tid != vals.end(); tid++)
+		xqry = Xapian::Query(Xapian::Query::OP_OR, xqry,
+				Xapian::Query(
+					Xapian::Query::OP_VALUE_RANGE,
+					column, *tid, *tid));
+	return xqry;
+}
+
+// Xapian calls this when processing queries since it's registered by
+// ->add_boolean_prefix
+Xapian::Query ThreadFieldProcessor::operator()(const std::string &str)
+{
+	Xapian::Query qry;
+
+	if (str.at(0) != '{') { // thread:$MSGID (no `{'/`}' encasement)
+		qry = Xapian::Query("Q" + str);
+	} else if (str.size() <= 1 || str.at(str.size() - 1) != '}') {
+		throw Xapian::QueryParserError("missing } in '" + str + "'");
+	} else { // thread:"{hello world}"
+		std::string qstr = str.substr(1, str.size() - 2);
+		qry = cur_srch->qp->parse_query(qstr, cur_srch->qp_flags);
+	}
+	return qry_xpand_col(qry, THREADID);
+}
diff --git a/t/xap_helper.t b/t/xap_helper.t
index b0fa75a2..3e8176a0 100644
--- a/t/xap_helper.t
+++ b/t/xap_helper.t
@@ -40,6 +40,41 @@ my $v2 = create_inbox 'v2', indexlevel => 'medium', version => 2,
 	}
 };
 
+my $thr = create_inbox 'thr', indexlevel => 'medium', version => 2,
+			tmpdir => "$tmp/thr", sub {
+	my ($im) = @_;
+	my $common = <<EOM;
+From: <BOFH\@YHBT.net>
+To: meta\@public-inbox.org
+Date: Mon, 1 Apr 2019 08:15:21 +0000
+EOM
+	$im->add(PublicInbox::Eml->new(<<EOM));
+${common}Subject: root message
+Message-ID: <thread-root\@example>
+
+hi
+EOM
+	my @t = qw(wildfires earthquake flood asteroid drought plague);
+	my $nr = 0;
+	for my $x (@t) {
+		++$nr;
+		$im->add(PublicInbox::Eml->new(<<EOM)) or xbail;
+${common}Subject: Re: root reply
+References: <thread-root\@example>
+Message-ID: <thread-hit-$nr\@example>
+
+$x
+EOM
+		$im->add(PublicInbox::Eml->new(<<EOM)) or xbail;
+${common}Subject: broken thread from $x
+References: <ghost-root\@example>
+Message-ID: <thread-miss-$nr\@example>
+
+$x
+EOM
+	}
+};
+
 my @ibx_idx = glob("$v2->{inboxdir}/xap*/?");
 my @ibx_shard_args = map { ('-d', $_) } @ibx_idx;
 my (@int) = glob("$crepo/public-inbox-cindex/cidx*/?");
@@ -269,6 +304,53 @@ for my $n (@NO_CXX) {
 	is $nr_out, scalar(@oids), "output count matches $xhc->{impl}" or
 		diag explain(\@res, \@err);
 
+	SKIP: {
+		$xhc->{impl} =~ /cxx/i or
+			skip "`thread:' field processor requires C++", 1;
+		require PublicInbox::XhcMset;
+		my $over = $thr->over;
+		my @thr_idx = glob("$thr->{inboxdir}/xap*/?");
+		my @thr_shard_args = map { ('-d', $_) } @thr_idx;
+		my (@art, $mset, $err);
+		my $capture = sub { ($mset, $err) = @_ };
+		my $retrieve = sub {
+			my ($qstr) = @_;
+			$r = $xhc->mkreq(undef, 'mset', @thr_shard_args, $qstr);
+			PublicInbox::XhcMset->maybe_new($r, undef, $capture);
+			map { $over->get_art($_->get_docid) } $mset->items;
+		};
+		@art = $retrieve->('thread:thread-root@example wildfires');
+		is scalar(@art), 1, 'got 1 result';
+		is scalar(grep { $_->{mid} =~ /thread-miss/ } @art), 0,
+			'no thread misses in result';
+		ok !$err, 'no error from thread:MSGID search';
+
+		@art = $retrieve->('thread:thread-root@example');
+		is scalar(@art), 7,
+			'expected number of results for thread:MSGID';
+		is scalar(grep {
+				$_->{mid} eq 'thread-root@example' ||
+				$_->{references} =~ /<thread-root\@example>/
+			} @art),
+			scalar(@art),
+			'got all matching results for thread:MSGID';
+
+		@art = $retrieve->('thread:"{ s:broken }"');
+		is scalar(@art), 6,
+			'expected number of results for thread:"{ SUBQUERY }"';
+		is scalar(grep { $_->{subject} =~ /broken/ } @art),
+			scalar(@art),
+			'expected matches for thread:"{ SUBQUERY }"';
+
+		my $nr = $ENV{TEST_LEAK_NR} or skip 'TEST_LEAK_NR unset', 1;
+		$ENV{VALGRIND} or diag
+"W: `VALGRIND=' unset w/ TEST_LEAK_NR (using -fsanitize?)";
+		for (1..$nr) {
+			$retrieve->('thread:thread-root@example wildfires');
+			$retrieve->('thread:"{ s:broken }" wildfires');
+		}
+	}
+
 	if ($ENV{TEST_XH_TIMEOUT}) {
 		diag 'testing timeouts...';
 		for my $j (qw(0 1)) {

* [PATCH 3/4] search: index References: for thread:GHOST-MSGID
  2025-02-20 22:14 [PATCH 0/4] www: support thread:{SUBQUERY} like notmuch Eric Wong
  2025-02-20 22:14 ` [PATCH 1/4] xap_helper: switch C++ implementation to AGPL-3 Eric Wong
  2025-02-20 22:14 ` [PATCH 2/4] xap_helper: support thread:{SUBQUERY} via C++ Eric Wong
@ 2025-02-20 22:14 ` Eric Wong
  2025-02-20 22:14 ` [PATCH 4/4] searchidx: doc: note ->add_message is v1+tests only Eric Wong
  3 siblings, 0 replies; 5+ messages in thread
From: Eric Wong @ 2025-02-20 22:14 UTC (permalink / raw)
  To: meta

To search for messages in a thread with a ghost msgid,
we need to be able to search against msgids in the
References: header since (by definition) ghosts don't
show up as any Message-ID: we've indexed.

This should make our implementation of `thread:MSGID'
queries equivalent in capability to `thread:THREADID'
of notmuch.
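
The matching rule for a brace-less `thread:MSGID' can be modelled as
"exact Message-ID match OR exact match on any References: entry".
The sketch below assumes a simplified in-memory message; RefMsg and
seeds_thread are hypothetical names for illustration, not code from
this series:

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical simplified message: its own Message-ID plus the
// msgids listed in its References: header (angle brackets stripped).
struct RefMsg {
	std::string mid;
	std::vector<std::string> refs;
};

// True if `m' should seed a thread lookup for `msgid': either it is
// that message, or it references it.  The second case is what lets a
// ghost (a msgid we never indexed as a Message-ID) find its thread.
static bool seeds_thread(const RefMsg &m, const std::string &msgid)
{
	if (m.mid == msgid)
		return true;
	return std::find(m.refs.begin(), m.refs.end(), msgid) !=
		m.refs.end();
}
```

Only whole msgids are compared here, mirroring the "no partial mid
match" check added to t/search.t.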
---
 lib/PublicInbox/Search.pm      |  4 +++-
 lib/PublicInbox/SearchIdx.pm   |  7 ++++---
 lib/PublicInbox/xh_thread_fp.h |  4 +++-
 t/search.t                     | 10 ++++++++++
 t/xap_helper.t                 |  9 ++++++++-
 5 files changed, 28 insertions(+), 6 deletions(-)

diff --git a/lib/PublicInbox/Search.pm b/lib/PublicInbox/Search.pm
index cb166101..0e288cf0 100644
--- a/lib/PublicInbox/Search.pm
+++ b/lib/PublicInbox/Search.pm
@@ -36,7 +36,7 @@ use constant {
 	# 4 - change "Re: " normalization, avoid circular Reference ghosts
 	# 5 - subject_path drops trailing '.'
 	# 6 - preserve References: order in document data
-	# 7 - remove references and inreplyto terms
+	# 7 - remove references and inreplyto terms (restored in 15 (v2.0))
 	# 8 - remove redundant/unneeded document data
 	# 9 - disable Message-ID compression (SHA-1)
 	# 10 - optimize doc for NNTP overviews
@@ -53,6 +53,7 @@ use constant {
 	#      * "lid:" and "l:" for List-Id searches
 	#
 	#      v1.6.0 adds BYTES, UID and THREADID values
+	#      v2.0.0 re-adds "references:"
 	SCHEMA_VERSION => 15,
 
 	# we may have up to 8 FDs per shard (depends on Xapian *shrug*)
@@ -151,6 +152,7 @@ our %PATCH_BOOL_COMMON = (
 my %bool_pfx_external = (
 	mid => 'Q', # Message-ID (full/exact), this is mostly uniQue
 	lid => 'G', # newsGroup (or similar entity), just inside <>
+	references => 'XRF',
 	%PATCH_BOOL_COMMON
 );
 
diff --git a/lib/PublicInbox/SearchIdx.pm b/lib/PublicInbox/SearchIdx.pm
index 1e8246bb..db4fcf76 100644
--- a/lib/PublicInbox/SearchIdx.pm
+++ b/lib/PublicInbox/SearchIdx.pm
@@ -481,6 +481,9 @@ sub eml2doc ($$$;$) {
 	$doc->add_boolean_term('O'.$ekey) if ($ekey // '.') ne '.';
 	msg_iter($eml, \&index_xapian, [ $self, $doc ]);
 	index_ids($self, $doc, $eml, $mids);
+	for (@{$smsg->parse_references($eml, $mids)}) {
+		$doc->add_boolean_term('XRF'.$_)
+	}
 
 	# by default, we maintain compatibility with v1.5.0 and earlier
 	# by writing to docdata.glass, users who never expect to downgrade can
@@ -488,9 +491,7 @@ sub eml2doc ($$$;$) {
 	if (!$self->{-skip_docdata}) {
 		# WWW doesn't need {to} or {cc}, only NNTP
 		$smsg->{to} = $smsg->{cc} = '';
-		$smsg->parse_references($eml, $mids);
-		my $data = $smsg->to_doc_data;
-		$doc->set_data($data);
+		$doc->set_data($smsg->to_doc_data);
 	}
 	my $xtra = defined $ekey ? $self->{"-extra\t$ekey"} : undef;
 	$xtra //= $self->{-extra};
diff --git a/lib/PublicInbox/xh_thread_fp.h b/lib/PublicInbox/xh_thread_fp.h
index c7d36c36..2c88401c 100644
--- a/lib/PublicInbox/xh_thread_fp.h
+++ b/lib/PublicInbox/xh_thread_fp.h
@@ -64,7 +64,9 @@ Xapian::Query ThreadFieldProcessor::operator()(const std::string &str)
 	Xapian::Query qry;
 
 	if (str.at(0) != '{') { // thread:$MSGID (no `{'/`}' encasement)
-		qry = Xapian::Query("Q" + str);
+		qry = Xapian::Query(Xapian::Query::OP_OR,
+				Xapian::Query("Q" + str),
+				Xapian::Query("XRF" + str));
 	} else if (str.size() <= 1 || str.at(str.size() - 1) != '}') {
 		throw Xapian::QueryParserError("missing } in '" + str + "'");
 	} else { // thread:"{hello world}"
diff --git a/t/search.t b/t/search.t
index 8938e6c6..a0f25769 100644
--- a/t/search.t
+++ b/t/search.t
@@ -135,6 +135,16 @@ my $query = sub {
 	my $second = $res->[0];
 
 	isnt($first, $second, "offset returned different result from limit");
+
+	for my $f (qw(references)) {
+		$res = $query->($f . ':root@s');
+		@res = filter_mids($res);
+		is_deeply \@res, [ 'last@s' ],
+			  "got expected results for $f: match";
+			 diag explain(\@res);
+		$res = $query->($f . ':root');
+		is scalar(@$res), 0, "no partial mid match";
+	}
 }
 
 # ghost vivication
diff --git a/t/xap_helper.t b/t/xap_helper.t
index 3e8176a0..e87c9da8 100644
--- a/t/xap_helper.t
+++ b/t/xap_helper.t
@@ -40,7 +40,7 @@ my $v2 = create_inbox 'v2', indexlevel => 'medium', version => 2,
 	}
 };
 
-my $thr = create_inbox 'thr', indexlevel => 'medium', version => 2,
+my $thr = create_inbox 'thr-ref+', indexlevel => 'medium', version => 2,
 			tmpdir => "$tmp/thr", sub {
 	my ($im) = @_;
 	my $common = <<EOM;
@@ -342,6 +342,13 @@ for my $n (@NO_CXX) {
 			scalar(@art),
 			'expected matches for thread:"{ SUBQUERY }"';
 
+		@art = $retrieve->('thread:ghost-root@example');
+		is scalar(@art), 6,
+			'expected number of results for thread:GHOST-MSGID';
+		is scalar(grep { $_->{references} =~ /ghost-root/ } @art),
+			scalar(@art),
+			'thread:MSGID works on ghosts';
+
 		my $nr = $ENV{TEST_LEAK_NR} or skip 'TEST_LEAK_NR unset', 1;
 		$ENV{VALGRIND} or diag
 "W: `VALGRIND=' unset w/ TEST_LEAK_NR (using -fsanitize?)";

* [PATCH 4/4] searchidx: doc: note ->add_message is v1+tests only
  2025-02-20 22:14 [PATCH 0/4] www: support thread:{SUBQUERY} like notmuch Eric Wong
                   ` (2 preceding siblings ...)
  2025-02-20 22:14 ` [PATCH 3/4] search: index References: for thread:GHOST-MSGID Eric Wong
@ 2025-02-20 22:14 ` Eric Wong
  3 siblings, 0 replies; 5+ messages in thread
From: Eric Wong @ 2025-02-20 22:14 UTC (permalink / raw)
  To: meta

Noticed while seeing if --xapian-only is a worthwhile
addition to -extindex.
---
 lib/PublicInbox/SearchIdx.pm | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/PublicInbox/SearchIdx.pm b/lib/PublicInbox/SearchIdx.pm
index db4fcf76..4ed3881f 100644
--- a/lib/PublicInbox/SearchIdx.pm
+++ b/lib/PublicInbox/SearchIdx.pm
@@ -543,7 +543,7 @@ sub v1_index_mm ($$$) {
 	}
 }
 
-sub add_message {
+sub add_message { # v1 + tests only
 	# mime = PublicInbox::Eml or Email::MIME object
 	my ($self, $mime, $smsg, $cmt_info) = @_;
 	begin_txn_lazy($self);

Code repositories for project(s) associated with this public inbox

	https://80x24.org/public-inbox.git