user/dev discussion of public-inbox itself
* [PATCH 1/3] eml: relax warn_ignore regexps for current Email::Address::XS
  2021-07-06 12:42  5% [PATCH 0/3] extindex: dedupe support, + gc fix Eric Wong
@ 2021-07-06 12:42  7% ` Eric Wong
  0 siblings, 0 replies; 2+ results
From: Eric Wong @ 2021-07-06 12:42 UTC (permalink / raw)
  To: meta

These seem needed with the data I'm currently working on, but I
haven't changed my version of Email::Address::XS since my last
Debian stable upgrade (to buster).
---
 lib/PublicInbox/Eml.pm | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/lib/PublicInbox/Eml.pm b/lib/PublicInbox/Eml.pm
index 46c273ce..955d6a96 100644
--- a/lib/PublicInbox/Eml.pm
+++ b/lib/PublicInbox/Eml.pm
@@ -484,8 +484,8 @@ sub crlf { $_[0]->{crlf} // "\n" }
 sub warn_ignore {
 	my $s = "@_";
 	# Email::Address::XS warnings
-	$s =~ /^Argument contains empty address at /
-	|| $s =~ /^Element at index [0-9]+ contains /
+	$s =~ /^Argument contains empty /
+	|| $s =~ /^Element at index [0-9]+.*? contains /
 	# PublicInbox::MsgTime
 	|| $s =~ /^bogus TZ offset: .+?, ignoring and assuming \+0000/
 	|| $s =~ /^bad Date: .+? in /
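
To illustrate the effect of the relaxed patterns, here is a minimal sketch
comparing the old and new regexps from the hunk above.  The sample warning
strings and the matches_any() helper are hypothetical and made up for
illustration; they are not verified Email::Address::XS output.

```perl
use strict;
use warnings;

# Old and new patterns, copied from the diff above.
my @old = (qr/^Argument contains empty address at /,
	qr/^Element at index [0-9]+ contains /);
my @new = (qr/^Argument contains empty /,
	qr/^Element at index [0-9]+.*? contains /);

# Hypothetical helper: true (1) if $s matches any of the given patterns.
sub matches_any {
	my ($s, @re) = @_;
	for my $re (@re) {
		return 1 if $s =~ $re;
	}
	0;
}

# Hypothetical warning strings, NOT taken from Email::Address::XS itself.
my @samples = (
	'Argument contains empty address at x.pl line 1.',
	'Argument contains empty local part at x.pl line 1.',
	'Element at index 0 contains invalid address at x.pl line 1.',
	'Element at index 0/1 contains invalid address at x.pl line 1.',
);

for my $s (@samples) {
	printf "old=%d new=%d %s\n",
		matches_any($s, @old), matches_any($s, @new), $s;
}
```

Under these assumptions, the first and third samples match both sets of
patterns, while the second and fourth match only the relaxed ones.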


* [PATCH 0/3] extindex: dedupe support, + gc fix
@ 2021-07-06 12:42  5% Eric Wong
  2021-07-06 12:42  7% ` [PATCH 1/3] eml: relax warn_ignore regexps for current Email::Address::XS Eric Wong
  0 siblings, 1 reply; 2+ results
From: Eric Wong @ 2021-07-06 12:42 UTC (permalink / raw)
  To: meta

I'm still not sure how the duplicates got into my extindices,
but the problem doesn't seem reproducible at the moment, so
maybe the original bug was fixed.

Since there are already dedupe failures from past indexing, the
--dedupe switch here should help us get rid of them.  It's only
lightly tested, but it seems to be working.

There's also a minor fix for --gc.

Eric Wong (3):
  eml: relax warn_ignore regexps for current Email::Address::XS
  extindex: implement --dedupe to fix old extindices
  extindex: --gc: avoid SQLite lock conflict on shard cleanup

 lib/PublicInbox/Eml.pm            |  4 +-
 lib/PublicInbox/ExtSearchIdx.pm   | 96 +++++++++++++++++++++++++++++++
 lib/PublicInbox/OverIdx.pm        | 20 +++++++
 lib/PublicInbox/SearchIdxShard.pm |  5 +-
 script/public-inbox-extindex      | 13 ++++-
 t/extsearch.t                     | 11 ++++
 6 files changed, 142 insertions(+), 7 deletions(-)



Code repositories for project(s) associated with this public inbox

	https://80x24.org/public-inbox.git
