* search by whole thread?
@ 2023-04-11 22:59 Jacob Keller
2023-04-12 0:06 ` Eric Wong
0 siblings, 1 reply; 12+ messages in thread
From: Jacob Keller @ 2023-04-11 22:59 UTC (permalink / raw)
To: meta
Hi,
I'm wondering if there is a way to search a list by the entire thread?
For example, I want to find all threads which have at least one message
with dfn:<some path> and which have no messages containing the text
"Reviewed-by".
This would for example let me search an open source archive for threads
(patch series for example) which have not received any reviewed-by reply.
The current search function available through the HTML website doesn't
seem to have a "by thread" function. I also haven't been able to find
any option similar to this in the email client I typically use for
interacting with the lists (Thunderbird).
Perhaps this is something that I could implement locally from the clone
of the archive, but I am not quite sure how to go about it. It seems
like something that should be reasonably straightforward given the way
that public inbox already tracks threads. Any suggestions on how to get
something like this would be appreciated.
Thanks,
Jake
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: search by whole thread?
2023-04-11 22:59 search by whole thread? Jacob Keller
@ 2023-04-12 0:06 ` Eric Wong
2023-04-12 18:29 ` Jacob Keller
2023-04-12 18:49 ` Konstantin Ryabitsev
0 siblings, 2 replies; 12+ messages in thread
From: Eric Wong @ 2023-04-12 0:06 UTC (permalink / raw)
To: Jacob Keller; +Cc: meta
Jacob Keller <jacob.e.keller@intel.com> wrote:
> Hi,
>
> I'm wondering if there is a way to search a list by the entire thread?
Not yet...
> For example, I want to find all threads which have at least one message
> with dfn:<some path> and which have no messages containing the text
> "Reviewed-by".
>
> This would for example let me search an open source archive for threads
> (patch series for example) which have not received any reviewed-by reply.
Yes, that's something I've wanted, too...
> The current search function available through the HTML website doesn't
> seem to have a "by thread" function. I also haven't been able to find
> any option similar to this in the email client I typically use for
> interacting with the lists (Thunderbird).
I think the reason it's rare in MUAs is that it's potentially
very expensive. But I think the `thread:{subquery}' feature
from notmuch I discussed with Konstantin the other week[1] can
do what you want it to do.
Keep in mind, notmuch-search-terms(7) states:
The performance of such queries can vary wildly.
And that's for a private client tool for a single user.
For a public-facing web UI, we'll need proper timeouts (likely
via RLIMIT_CPU + SIGXCPU) in an external process and a C++ build
against libxapian. AFAIK, custom query parsers aren't possible
in Xapian's high-level language bindings; fortunately I can
legally reuse GPL-3+ C++ code from notmuch \o/
The external process will probably be similar to
`git cat-file --batch-command' though it can use SOCK_SEQPACKET
for requests and pipes for large responses.
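A minimal sketch of the RLIMIT_CPU + SIGXCPU idea, in Python rather than
the C++ helper discussed here. The function name `run_with_cpu_limit` and
the busy-loop child are illustrative assumptions, not public-inbox code;
the point is only that the kernel delivers SIGXCPU to a child process
once it exceeds its soft CPU-time limit.

```python
import resource
import signal
import subprocess
import sys


def run_with_cpu_limit(argv, cpu_seconds):
    """Run argv in a child process whose CPU time is capped (hypothetical helper)."""
    def set_limit():
        # Runs in the child between fork and exec.  Exceeding the soft
        # limit raises SIGXCPU; exceeding the hard limit raises SIGKILL.
        resource.setrlimit(resource.RLIMIT_CPU, (cpu_seconds, cpu_seconds + 1))

    return subprocess.run(argv, preexec_fn=set_limit,
                          stdout=subprocess.PIPE, stderr=subprocess.PIPE)


if __name__ == "__main__":
    # Stand-in for an expensive query: spin until the kernel intervenes.
    busy = [sys.executable, "-c", "while True: pass"]
    proc = run_with_cpu_limit(busy, 1)
    # A negative return code means the child died from that signal.
    print("killed by SIGXCPU:", proc.returncode == -signal.SIGXCPU)
```

Because the limit is applied to a separate process, a runaway query is
terminated without taking down the process that serves other clients.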
> Perhaps this is something that I could implement locally from the clone
> of the archive, but I am not quite sure how to go about it. It seems
> like something that should be reasonably straightforward given the way
> that public inbox already tracks threads. Any suggestions on how to get
> something like this would be appreciated.
Are you able to confirm notmuch `thread:{subquery}' is what
you're after?
I plan on implementing it with proper timeouts for untrusted
clients within the next few weeks/months; assuming some other
stuff works out and I still have Internet + power.
[1] https://public-inbox.org/meta/20230328194549.M808175@dcvr/
* Re: search by whole thread?
2023-04-12 0:06 ` Eric Wong
@ 2023-04-12 18:29 ` Jacob Keller
2023-04-12 18:49 ` Konstantin Ryabitsev
1 sibling, 0 replies; 12+ messages in thread
From: Jacob Keller @ 2023-04-12 18:29 UTC (permalink / raw)
To: Eric Wong; +Cc: meta
On 4/11/2023 5:06 PM, Eric Wong wrote:
> Jacob Keller <jacob.e.keller@intel.com> wrote:
>> Hi,
>>
>> I'm wondering if there is a way to search a list by the entire thread?
>
> Not yet...
>
>> For example, I want to find all threads which have at least one message
>> with dfn:<some path> and which have no messages containing the text
>> "Reviewed-by".
>>
>> This would for example let me search an open source archive for threads
>> (patch series for example) which have not received any reviewed-by reply.
>
> Yes, that's something I've wanted, too...
>
>> The current search function available through the HTML website doesn't
>> seem to have a "by thread" function. I also haven't been able to find
>> any option similar to this in the email client I typically use for
>> interacting with the lists (Thunderbird).
>
> I think the reason it's rare in MUAs is that it's potentially
> very expensive. But I think the `thread:{subquery}' feature
> from notmuch I discussed with Konstantin the other week[1] can
> do what you want it to do.
>
> Keep in mind, notmuch-search-terms(7) states:
>
> The performance of such queries can vary wildly.
>
> And that's for a private client tool for a single user.
>
> For a public-facing web UI, we'll need proper timeouts (likely
> via RLIMIT_CPU + SIGXCPU) in an external process and a C++ build
> against libxapian. AFAIK, custom query parsers aren't possible
> in Xapian's high-level language bindings; fortunately I can
> legally reuse GPL-3+ C++ code from notmuch \o/
>
> The external process will probably be similar to
> `git cat-file --batch-command' though it can use SOCK_SEQPACKET
> for requests and pipes for large responses.
>
>> Perhaps this is something that I could implement locally from the clone
>> of the archive, but I am not quite sure how to go about it. It seems
>> like something that should be reasonably straightforward given the way
>> that public inbox already tracks threads. Any suggestions on how to get
>> something like this would be appreciated.
>
> Are you able to confirm notmuch `thread:{subquery}' is what
> you're after?
Ah. I tried searching but didn't hit upon notmuch. To be honest, I bet
just directly using notmuch and having it subscribe to the messages from
the public inbox would be sufficient for my purposes :D
I'll explore this and see, but it does sound like the thread:{subquery}
is basically what I want.
>
> I plan on implementing it with proper timeouts for untrusted
> clients within the next few weeks/months; assuming some other
> stuff works out and I still have Internet + power.
>
>
> [1] https://public-inbox.org/meta/20230328194549.M808175@dcvr/
* Re: search by whole thread?
2023-04-12 0:06 ` Eric Wong
2023-04-12 18:29 ` Jacob Keller
@ 2023-04-12 18:49 ` Konstantin Ryabitsev
2023-04-12 20:17 ` Eric Wong
1 sibling, 1 reply; 12+ messages in thread
From: Konstantin Ryabitsev @ 2023-04-12 18:49 UTC (permalink / raw)
To: Eric Wong; +Cc: Jacob Keller, meta
On Wed, Apr 12, 2023 at 12:06:53AM +0000, Eric Wong wrote:
> I think the reason it's rare in MUAs is that it's potentially
> very expensive. But I think the `thread:{subquery}' feature
> from notmuch I discussed with Konstantin the other week[1] can
> do what you want it to do.
>
> Keep in mind, notmuch-search-terms(7) states:
>
> The performance of such queries can vary wildly.
>
> And that's for a private client tool for a single user.
Yes, when I was wondering about that, it was really for the lei side of
things. I don't really want to run expensive queries on lore (though I'm okay
if we can turn it off for /all/ or other very large lists).
-K
* Re: search by whole thread?
2023-04-12 18:49 ` Konstantin Ryabitsev
@ 2023-04-12 20:17 ` Eric Wong
2023-04-12 21:01 ` Jacob Keller
0 siblings, 1 reply; 12+ messages in thread
From: Eric Wong @ 2023-04-12 20:17 UTC (permalink / raw)
To: Konstantin Ryabitsev; +Cc: Jacob Keller, meta
Konstantin Ryabitsev <konstantin@linuxfoundation.org> wrote:
> On Wed, Apr 12, 2023 at 12:06:53AM +0000, Eric Wong wrote:
> > I think the reason it's rare in MUAs is that it's potentially
> > very expensive. But I think the `thread:{subquery}' feature
> > from notmuch I discussed with Konstantin the other week[1] can
> > do what you want it to do.
> >
> > Keep in mind, notmuch-search-terms(7) states:
> >
> > The performance of such queries can vary wildly.
> >
> > And that's for a private client tool for a single user.
>
> Yes, when I was wondering about that, it was really for the lei side of
> things. I don't really want to run expensive queries on lore (though I'm okay
> if we can turn it off for /all/ or other very large lists).
I expect relying on timeouts in an external process will be fine
for lore, especially since some expensive queries are already
possible :x
I suppose ITIMER_REAL is better than RLIMIT_CPU since the former
accounts for I/O time. Xapian makes a lot of small pread
syscalls so I don't see it being stuck in D-state long on SSDs.
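To illustrate the difference, here is a small Python sketch of a
wall-clock timeout using ITIMER_REAL. The names `QueryTimeout` and
`run_with_wall_clock_limit` are made up for this example, and
`time.sleep()` stands in for a process blocked on I/O: it consumes almost
no CPU time, so RLIMIT_CPU would not trip, but the real-time timer still
fires.

```python
import signal
import time


class QueryTimeout(Exception):
    """Raised when the wall-clock budget is exhausted (hypothetical name)."""


def _on_alarm(signum, frame):
    raise QueryTimeout()


def run_with_wall_clock_limit(func, seconds):
    """Call func(), raising QueryTimeout if it runs longer than `seconds`.

    Must be called from the main thread, since it installs a signal handler.
    """
    old_handler = signal.signal(signal.SIGALRM, _on_alarm)
    signal.setitimer(signal.ITIMER_REAL, seconds)
    try:
        return func()
    finally:
        signal.setitimer(signal.ITIMER_REAL, 0)  # cancel any pending timer
        signal.signal(signal.SIGALRM, old_handler)


if __name__ == "__main__":
    try:
        # Sleeping uses wall-clock time but next to no CPU time.
        run_with_wall_clock_limit(lambda: time.sleep(5), 0.5)
    except QueryTimeout:
        print("timed out on wall-clock time")
```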
* Re: search by whole thread?
2023-04-12 20:17 ` Eric Wong
@ 2023-04-12 21:01 ` Jacob Keller
2023-04-12 21:21 ` Eric Wong
2025-02-20 22:23 ` Eric Wong
0 siblings, 2 replies; 12+ messages in thread
From: Jacob Keller @ 2023-04-12 21:01 UTC (permalink / raw)
To: Eric Wong, Konstantin Ryabitsev; +Cc: meta
On 4/12/2023 1:17 PM, Eric Wong wrote:
> Konstantin Ryabitsev <konstantin@linuxfoundation.org> wrote:
>> On Wed, Apr 12, 2023 at 12:06:53AM +0000, Eric Wong wrote:
>>> I think the reason it's rare in MUAs is that it's potentially
>>> very expensive. But I think the `thread:{subquery}' feature
>>> from notmuch I discussed with Konstantin the other week[1] can
>>> do what you want it to do.
>>>
>>> Keep in mind, notmuch-search-terms(7) states:
>>>
>>> The performance of such queries can vary wildly.
>>>
>>> And that's for a private client tool for a single user.
>>
>> Yes, when I was wondering about that, it was really for the lei side of
>> things. I don't really want to run expensive queries on lore (though I'm okay
>> if we can turn it off for /all/ or other very large lists).
>
> I expect relying on timeouts in an external process will be fine
> for lore, especially since some expensive queries are already
> possible :x
>
> I suppose ITIMER_REAL is better than RLIMIT_CPU since the former
> accounts for I/O time. Xapian makes a lot of small pread
> syscalls so I don't see it being stuck in D-state long on SSDs.
For what it's worth to those watching the thread, I was able to get what I
needed via combining [1] with notmuch, and it's good enough for my purposes.
Being able to do the thread:{} querying directly on lore would be
convenient, but doing the search locally is good enough for my purposes.
Thanks for the tip on notmuch!
-Jake
[1]: https://github.com/wkz/notmuch-lore
* Re: search by whole thread?
2023-04-12 21:01 ` Jacob Keller
@ 2023-04-12 21:21 ` Eric Wong
2025-02-20 22:23 ` Eric Wong
1 sibling, 0 replies; 12+ messages in thread
From: Eric Wong @ 2023-04-12 21:21 UTC (permalink / raw)
To: Jacob Keller; +Cc: Konstantin Ryabitsev, meta
Jacob Keller <jacob.e.keller@intel.com> wrote:
> Thanks for the tip on notmuch!
No problem! Much of the indexing and search logic in
public-inbox was originally stolen from the C++ code of notmuch
and translated to Perl.
I haven't kept up-to-date with notmuch since giant Maildirs are
too expensive for me; but upcoming FUSE support in lei should
expose its git blobs w/o storing duplicate messages
anywhere on disk.
* Re: search by whole thread?
2023-04-12 21:01 ` Jacob Keller
2023-04-12 21:21 ` Eric Wong
@ 2025-02-20 22:23 ` Eric Wong
2025-02-20 22:28 ` Konstantin Ryabitsev
1 sibling, 1 reply; 12+ messages in thread
From: Eric Wong @ 2025-02-20 22:23 UTC (permalink / raw)
To: Jacob Keller; +Cc: Konstantin Ryabitsev, meta
Jacob Keller <jacob.e.keller@intel.com> wrote:
> Being able to do the thread:{} querying directly on lore would be
> convenient, but doing the search locally is good enough for my purposes.
Btw, I'm finally figuring out enough C++ to get thread:{SUBQUERY}
supported for C++ xap_helper users:
https://public-inbox.org/meta/20250220221431.3847239-1-e@80x24.org/
It's deployed and good enough for my lore mirror @ https://yhbt.net/lore/
but still needs (expensive) reindex for ghost messages.
* Re: search by whole thread?
2025-02-20 22:23 ` Eric Wong
@ 2025-02-20 22:28 ` Konstantin Ryabitsev
2025-02-20 23:38 ` Eric Wong
0 siblings, 1 reply; 12+ messages in thread
From: Konstantin Ryabitsev @ 2025-02-20 22:28 UTC (permalink / raw)
To: Eric Wong; +Cc: Jacob Keller, meta
On Thu, Feb 20, 2025 at 10:23:26PM +0000, Eric Wong wrote:
> Jacob Keller <jacob.e.keller@intel.com> wrote:
> > Being able to do the thread:{} querying directly on lore would be
> > convenient, but doing the search locally is good enough for my purposes.
>
> Btw, I'm finally figuring out enough C++ to get thread:{SUBQUERY}
> supported for C++ xap_helper users:
Oh, nice, so this would let me do stuff like "show me threads to which I haven't
replied yet," correct?
-K
* Re: search by whole thread?
2025-02-20 22:28 ` Konstantin Ryabitsev
@ 2025-02-20 23:38 ` Eric Wong
2025-02-21 15:09 ` Konstantin Ryabitsev
0 siblings, 1 reply; 12+ messages in thread
From: Eric Wong @ 2025-02-20 23:38 UTC (permalink / raw)
To: Konstantin Ryabitsev; +Cc: Jacob Keller, meta
Konstantin Ryabitsev <konstantin@linuxfoundation.org> wrote:
> On Thu, Feb 20, 2025 at 10:23:26PM +0000, Eric Wong wrote:
> > Jacob Keller <jacob.e.keller@intel.com> wrote:
> > > Being able to do the thread:{} querying directly on lore would be
> > > convenient, but doing the search locally is good enough for my purposes.
> >
> > Btw, I'm finally figuring out enough C++ to get thread:{SUBQUERY}
> > supported for C++ xap_helper users:
>
> Oh, nice, so this would let me do stuff like "show me threads to which I haven't
> replied yet," correct?
Unfortunately, not yet... Xapian fundamentally operates on
documents which have a 1:1 relationship with messages.
Thus the subquery finds individual messages which aren't
from you, extracts the internal threadids from those messages,
and runs the outer query against those threadids.
So as long as there's a single message in a thread you didn't
reply to, that threadid will be in the subset of threads to
be searched against.
Indexing threads separately from messages is a possible
solution, but Xapian is already expensive in terms of space.
I'll have to think of a good way to do what you want with the
existing data model, but it's probably possible by expanding
to using multiple subqueries.
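A toy Python model of the behavior described above, using plain sets
instead of Xapian. The message tuples and helper names are invented for
illustration, and the set difference at the end is only one guess at what
"multiple subqueries" could look like, not the planned implementation.

```python
# Hypothetical toy data: (message id, thread id, sender)
MESSAGES = [
    ("m1", "t1", "alice"),
    ("m2", "t1", "me"),     # I replied in thread t1
    ("m3", "t2", "bob"),    # nobody replied in thread t2
    ("m4", "t3", "me"),     # thread t3 contains only my own message
]


def threads_matching(messages, predicate):
    """Thread ids with at least one message whose sender satisfies predicate."""
    return {tid for (_mid, tid, sender) in messages if predicate(sender)}


# A single subquery "not from me" matches per message, so any thread with
# one message from somebody else qualifies, including t1 where I replied.
single = threads_matching(MESSAGES, lambda sender: sender != "me")

# Combining two thread sets: all threads minus threads I posted in.
all_threads = {tid for (_mid, tid, _sender) in MESSAGES}
unreplied = all_threads - threads_matching(MESSAGES, lambda s: s == "me")

if __name__ == "__main__":
    print(sorted(single))     # t1 and t2
    print(sorted(unreplied))  # only t2
```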
* Re: search by whole thread?
2025-02-20 23:38 ` Eric Wong
@ 2025-02-21 15:09 ` Konstantin Ryabitsev
2025-02-23 12:39 ` Eric Wong
0 siblings, 1 reply; 12+ messages in thread
From: Konstantin Ryabitsev @ 2025-02-21 15:09 UTC (permalink / raw)
To: Eric Wong; +Cc: Jacob Keller, meta
On Thu, Feb 20, 2025 at 11:38:26PM +0000, Eric Wong wrote:
> > > Btw, I'm finally figuring out enough C++ to get thread:{SUBQUERY}
> > > supported for C++ xap_helper users:
> >
> > Oh, nice, so this would let me do stuff like "show me threads to which I haven't
> > replied yet," correct?
>
> Unfortunately, not yet... Xapian fundamentally operates on
> documents which have a 1:1 relationship with messages.
> Thus the subquery finds individual messages which aren't
> from you, extracts the internal threadids from those messages,
> and runs the outer query against those threadids.
>
> So as long as there's a single message in a thread you didn't
> reply to, that threadid will be in the subset of threads to
> be searched against.
>
> Indexing threads separately from messages is a possible
> solution, but Xapian is already expensive in terms of space.
>
> I'll have to think of a good way to do what you want with the
> existing data model, but it's probably possible by expanding
> to using multiple subqueries.
Well, it wasn't really meant as a request to add the feature, it's more like
me trying to understand what cool things this can be used for. :)
Would this make it possible to do a query like "show me threads with patches
touching some/file.h that also received a reviewed-by from maintainer Fooski"?
Also, how would you combine quoted expressions inside thread:"{ }"?
-K
* Re: search by whole thread?
2025-02-21 15:09 ` Konstantin Ryabitsev
@ 2025-02-23 12:39 ` Eric Wong
0 siblings, 0 replies; 12+ messages in thread
From: Eric Wong @ 2025-02-23 12:39 UTC (permalink / raw)
To: Konstantin Ryabitsev; +Cc: Jacob Keller, meta
Konstantin Ryabitsev <konstantin@linuxfoundation.org> wrote:
> On Thu, Feb 20, 2025 at 11:38:26PM +0000, Eric Wong wrote:
> > > > Btw, I'm finally figuring out enough C++ to get thread:{SUBQUERY}
> > > > supported for C++ xap_helper users:
> > >
> > > Oh, nice, so this would let me do stuff like "show me threads to which I haven't
> > > replied yet," correct?
> >
> > Unfortunately, not yet... Xapian fundamentally operates on
> > documents which have a 1:1 relationship with messages.
> > Thus the subquery finds individual messages which aren't
> > from you, extracts the internal threadids from those messages,
> > and runs the outer query against those threadids.
> >
> > So as long as there's a single message in a thread you didn't
> > reply to, that threadid will be in the subset of threads to
> > be searched against.
> >
> > Indexing threads separately from messages is a possible
> > solution, but Xapian is already expensive in terms of space.
> >
> > I'll have to think of a good way to do what you want with the
> > existing data model, but it's probably possible by expanding
> > to using multiple subqueries.
>
> Well, it wasn't really meant as a request to add the feature, it's more like
> me trying to understand what cool things this can be used for. :)
It'd be useful to have, but the WIP I started is extremely slow
(>=10 min/query!). I'm hoping it's possible to speed up w/o
increasing DB size and already asked xapian-discuss for help on
the matter.
> Would this make it possible to do a query like "show me threads with patches
> touching some/file.h that also received a reviewed-by from maintainer Fooski"?
Yes, I think so.
> Also, how would you combine quoted expressions inside thread:"{ }"?
I had to look up something I learned many years ago, but yes,
you can use two double-quotes ("") to mean " when inside a
double-quoted phrase.
Thus:
thread:"{""phrase search""}"
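For anyone building such queries from a script, the quoting rule can be
sketched in a few lines of Python. The helper name `thread_subquery` is
hypothetical; it only performs the doubling of double quotes described
above and does no other validation of the subquery.

```python
def thread_subquery(subquery):
    """Wrap subquery in thread:"{...}", doubling any embedded double quotes."""
    return 'thread:"{' + subquery.replace('"', '""') + '}"'


if __name__ == "__main__":
    # A phrase search inside the subquery keeps its quotes, doubled.
    print(thread_subquery('"phrase search"'))
```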
I'll add this test to prevent regressions:
-------8<------
Subject: [PATCH] t/xap_helper: test phrase search works with `thread:{}'
Xapian's query parser allows using 2x double-quotes ("") inside
an existing double-quoted phrase to mean a literal double-quote.
---
t/xap_helper.t | 11 ++++++++++-
1 file changed, 10 insertions(+), 1 deletion(-)
diff --git a/t/xap_helper.t b/t/xap_helper.t
index e87c9da8..eecd8320 100644
--- a/t/xap_helper.t
+++ b/t/xap_helper.t
@@ -40,7 +40,7 @@ my $v2 = create_inbox 'v2', indexlevel => 'medium', version => 2,
}
};
-my $thr = create_inbox 'thr-ref+', indexlevel => 'medium', version => 2,
+my $thr = create_inbox 'thr-ref+', indexlevel => 'full', version => 2,
tmpdir => "$tmp/thr", sub {
my ($im) = @_;
my $common = <<EOM;
@@ -64,6 +64,7 @@ References: <thread-root\@example>
Message-ID: <thread-hit-$nr\@example>
$x
+search phrase
EOM
$im->add(PublicInbox::Eml->new(<<EOM)) or xbail;
${common}Subject: broken thread from $x
@@ -71,6 +72,7 @@ References: <ghost-root\@example>
Message-ID: <thread-miss-$nr\@example>
$x
+phrase search
EOM
}
};
@@ -349,6 +351,13 @@ for my $n (@NO_CXX) {
scalar(@art),
'thread:MSGID works on ghosts';
+ @art = $retrieve->('thread:"{""phrase search""}"');
+ is scalar(@art), 6,
+ 'expected number of results for thread:GHOST-MSGID';
+ is scalar(grep { $_->{references} =~ /ghost-root/ } @art),
+ scalar(@art),
+ 'thread:"{""phrase search""}" works w/ 2x double quote';
+
my $nr = $ENV{TEST_LEAK_NR} or skip 'TEST_LEAK_NR unset', 1;
$ENV{VALGRIND} or diag
"W: `VALGRIND=' unset w/ TEST_LEAK_NR (using -fsanitize?)";
end of thread, other threads:[~2025-02-23 12:39 UTC | newest]
Thread overview: 12+ messages
2023-04-11 22:59 search by whole thread? Jacob Keller
2023-04-12 0:06 ` Eric Wong
2023-04-12 18:29 ` Jacob Keller
2023-04-12 18:49 ` Konstantin Ryabitsev
2023-04-12 20:17 ` Eric Wong
2023-04-12 21:01 ` Jacob Keller
2023-04-12 21:21 ` Eric Wong
2025-02-20 22:23 ` Eric Wong
2025-02-20 22:28 ` Konstantin Ryabitsev
2025-02-20 23:38 ` Eric Wong
2025-02-21 15:09 ` Konstantin Ryabitsev
2025-02-23 12:39 ` Eric Wong
Code repositories for project(s) associated with this public inbox
https://80x24.org/public-inbox.git