* [PATCH] searchidx: don't index Base-85 w/ CRLF endings
@ 2025-02-19 10:10 Eric Wong
From: Eric Wong @ 2025-02-19 10:10 UTC
To: meta
I encountered a false positive search result from a CRLF message
with a Base-85 patch in it.  It turns out our Base-85 filtering
code didn't account for the possibility of "\r" showing up in
patch messages, so just ignore all trailing whitespace (not just
horizontal whitespace) in index_diff().

While we're at it, exclude horizontal whitespace and CR
consistently from Base-85-looking quoted text in
index_body_text() as well, since I'm sure there are messages
with CRCRLF in the wild...
---
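A minimal sketch of the difference, not part of the patch.  $B85
below is a stand-in for $BASE85 (an assumption, not copied from
SearchIdx.pm), the sample lines are made up, and the helper subs
are hypothetical.  It also assumes index_diff() sees lines that
were already split on "\n", so a CRLF message leaves a trailing
"\r" on every line.

```perl
use strict;
use warnings;

# Stand-in for $BASE85 (assumption): the alphabet git uses for
# binary patches, not the definition from SearchIdx.pm.
my $B85 = qr/[0-9A-Za-z!#\$%&()*+\-;<=>?\@^_`{|}~]+/;

# index_diff()-style check on a single, already-split line
sub old_diff_match { $_[0] =~ /\A$B85\h*\z/ ? 1 : 0 } # \h excludes "\r"
sub new_diff_match { $_[0] =~ /\A$B85\s*\z/ ? 1 : 0 } # \s includes "\r"

# index_body_text()-style check on a quoted line with its line ending
sub old_quot_match { $_[0] =~ /^[>\h]+$B85\h*\r?\n/m ? 1 : 0 }
sub new_quot_match { $_[0] =~ /^[>\h]+$B85\h*\r*\n/m ? 1 : 0 }

my @diff_lines = (
	[ 'LF',   'zcmV?d00001' ],
	[ 'CRLF', "zcmV?d00001\r" ],
);
for my $t (@diff_lines) {
	printf "diff  %-6s old=%d new=%d\n", $t->[0],
		old_diff_match($t->[1]), new_diff_match($t->[1]);
}

my @quoted_lines = (
	[ 'CRLF',   "> zcmV?d00001\r\n" ],
	[ 'CRCRLF', "> zcmV?d00001\r\r\n" ],
);
for my $t (@quoted_lines) {
	printf "quote %-6s old=%d new=%d\n", $t->[0],
		old_quot_match($t->[1]), new_quot_match($t->[1]);
}
```

With these inputs, the old index_diff()-style pattern misses the
CRLF line and the old quoted-text pattern misses the CRCRLF line;
the new patterns match all four.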
lib/PublicInbox/SearchIdx.pm | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/lib/PublicInbox/SearchIdx.pm b/lib/PublicInbox/SearchIdx.pm
index 51c8b9c5..1e8246bb 100644
--- a/lib/PublicInbox/SearchIdx.pm
+++ b/lib/PublicInbox/SearchIdx.pm
@@ -285,7 +285,7 @@ sub index_diff ($$$) {
push @$xnq, shift(@l);
# skip base85 and empty lines
- while (@l && ($l[0] =~ /\A$BASE85\h*\z/o ||
+ while (@l && ($l[0] =~ /\A$BASE85\s*\z/o ||
$l[0] !~ /\S/)) {
shift @l;
}
@@ -386,8 +386,8 @@ sub index_body_text {
if ($txt =~ /^[>\t ]+GIT binary patch\r?/sm) {
# get rid of Base-85 noise
$txt =~ s/^([>\h]+(?:literal|delta)
- \x20[0-9]+\r?\n)
- (?:[>\h]+$BASE85\h*\r?\n)+/$1/gsmx;
+ \x20[0-9]+\h*\r*\n)
+ (?:[>\h]+$BASE85\h*\r*\n)+/$1/gsmx;
}
index_text($self, $txt, 0, 'XQUOT');
} else { # does it look like a diff?
Code repositories for project(s) associated with this public inbox
https://80x24.org/public-inbox.git