user/dev discussion of public-inbox itself
* [PATCH] searchidx: don't index Base-85 w/ CRLF endings
@ 2025-02-19 10:10 Eric Wong
From: Eric Wong @ 2025-02-19 10:10 UTC
  To: meta

I encountered a false-positive search result from a CRLF message
containing a Base-85 patch.  Our Base-85 filtering code didn't
account for "\r" appearing in patch messages, so ignore all
trailing whitespace (not just horizontal whitespace) in
index_diff().

While we're at it, consistently skip trailing horizontal
whitespace and any number of CRs in Base-85-looking quoted text
in index_body_text(), since there are surely messages with
CRCRLF endings in the wild.
---
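A minimal standalone sketch of the two regexp changes, for
illustration only.  The `$b85` pattern is a stand-in for $BASE85
(its real definition in SearchIdx.pm may differ), and the sample
lines are made up rather than taken from a real message:

```perl
use strict;
use warnings;

# Stand-in for $BASE85 (assumption: the real pattern may differ)
my $b85 = qr/[a-zA-Z0-9\!\#\$\%\&\(\)\*\+\-;<=>\?\@\^_`\{\|\}\~]+/;

# index_diff(): a made-up Base-85-looking line with a trailing "\r"
my $line = "zcmV?d00001\r";
my $old_diff = ($line =~ /\A$b85\h*\z/) ? 1 : 0; # \h does not match "\r"
my $new_diff = ($line =~ /\A$b85\s*\z/) ? 1 : 0; # \s does match "\r"

# index_body_text(): made-up quoted text with CRCRLF line endings
my $txt = "> literal 12\r\r\n> zcmV?d00001\r\r\n> see you.\r\r\n";

# old pattern: at most one CR is tolerated before each LF
(my $old_txt = $txt) =~ s/^([>\h]+(?:literal|delta)
		\x20[0-9]+\r?\n)
	(?:[>\h]+$b85\h*\r?\n)+/$1/gsmx;

# new pattern: optional horizontal whitespace, then any number of CRs
(my $new_txt = $txt) =~ s/^([>\h]+(?:literal|delta)
		\x20[0-9]+\h*\r*\n)
	(?:[>\h]+$b85\h*\r*\n)+/$1/gsmx;

printf "index_diff: old=%d new=%d\n", $old_diff, $new_diff;
printf "index_body_text: old stripped=%d new stripped=%d\n",
	($old_txt ne $txt) ? 1 : 0, ($new_txt ne $txt) ? 1 : 0;
```

With these inputs, only the new patterns treat the Base-85-looking
line as noise; the "> see you." line survives in both cases because
"." is not in the stand-in alphabet.
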
 lib/PublicInbox/SearchIdx.pm | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/lib/PublicInbox/SearchIdx.pm b/lib/PublicInbox/SearchIdx.pm
index 51c8b9c5..1e8246bb 100644
--- a/lib/PublicInbox/SearchIdx.pm
+++ b/lib/PublicInbox/SearchIdx.pm
@@ -285,7 +285,7 @@ sub index_diff ($$$) {
 				push @$xnq, shift(@l);
 
 				# skip base85 and empty lines
-				while (@l && ($l[0] =~ /\A$BASE85\h*\z/o ||
+				while (@l && ($l[0] =~ /\A$BASE85\s*\z/o ||
 						$l[0] !~ /\S/)) {
 					shift @l;
 				}
@@ -386,8 +386,8 @@ sub index_body_text {
 			if ($txt =~ /^[>\t ]+GIT binary patch\r?/sm) {
 				# get rid of Base-85 noise
 				$txt =~ s/^([>\h]+(?:literal|delta)
-						\x20[0-9]+\r?\n)
-					(?:[>\h]+$BASE85\h*\r?\n)+/$1/gsmx;
+						\x20[0-9]+\h*\r*\n)
+					(?:[>\h]+$BASE85\h*\r*\n)+/$1/gsmx;
 			}
 			index_text($self, $txt, 0, 'XQUOT');
 		} else { # does it look like a diff?

Code repositories for project(s) associated with this public inbox

	https://80x24.org/public-inbox.git