* Re: Troubleshooting threads missing from /all/
@ 2021-10-07 21:33 5% ` Eric Wong
0 siblings, 0 replies; 4+ results
From: Eric Wong @ 2021-10-07 21:33 UTC (permalink / raw)
To: Konstantin Ryabitsev; +Cc: meta
(resend, screwed up something with my MTA :x)
OK. I tried reproducing the problem even with f28fdcd6d8d6
(content_hash: normalize whitespace before hashing addresses, 2021-10-02)
reverted, but haven't been able to...
So far I've found some gc and dedupe bugs, but something's still
eluding me. And I also noticed and started fixing another bug
which may necessitate a full --reindex, anyways (at least for
non-ASCII subjects).
Konstantin Ryabitsev <konstantin@linuxfoundation.org> wrote:
> [publicinbox "regressions"]
> 	address = regressions@lists.linux.dev
> 	url = regressions
> 	inboxdir = /srv/public-inbox/lore.kernel.org/regressions
> 	indexlevel = full
Btw, once bugs are ironed out, "indexlevel = basic" ought to be
sufficient if an inbox is in extindex. full/medium is of course
helpful if messages are missing from extindex, though...
^ permalink raw reply [relevance 5%]
* Re: [PATCH 3/4] content_hash: normalize whitespace before hashing addresses
2021-10-02 11:18 7% ` [PATCH 3/4] content_hash: normalize whitespace before hashing addresses Eric Wong
@ 2021-10-05 7:09 7% ` Eric Wong
0 siblings, 0 replies; 4+ results
From: Eric Wong @ 2021-10-05 7:09 UTC (permalink / raw)
To: meta
Eric Wong <e@80x24.org> wrote:
> This should prevent some false duplicates. I noticed this
> while implementing "lei mail-diff", and only noticed it when
> I implemented the ContentDigestDbg wrapper for mail-diff.
Btw, I completely forgot -extindex has a --dedupe switch
for dealing with situations like this:
	public-inbox-extindex --dedupe=MSGID [--dedupe=MSGID1]
	public-inbox-extindex --dedupe # everything!
It looks like there are even test cases for it in t/extsearch.t (!)
I'm running --dedupe on yhbt.net/lore/all, because apparently
--reindex doesn't deduplicate :x
And there's a lot of stuff I still need to document in
the pod/manpage :x
^ permalink raw reply [relevance 7%]
* [PATCH 3/4] content_hash: normalize whitespace before hashing addresses
2021-10-02 11:18 5% [PATCH 0/4] lei mail-diff and other small things Eric Wong
@ 2021-10-02 11:18 7% ` Eric Wong
2021-10-05 7:09 7% ` Eric Wong
0 siblings, 1 reply; 4+ results
From: Eric Wong @ 2021-10-02 11:18 UTC (permalink / raw)
To: meta
This should prevent some false duplicates. I noticed this
while implementing "lei mail-diff", and only noticed it when
I implemented the ContentDigestDbg wrapper for mail-diff.
---
lib/PublicInbox/ContentHash.pm | 1 +
1 file changed, 1 insertion(+)
diff --git a/lib/PublicInbox/ContentHash.pm b/lib/PublicInbox/ContentHash.pm
index f6ae9011c1bf..bacc9cdda124 100644
--- a/lib/PublicInbox/ContentHash.pm
+++ b/lib/PublicInbox/ContentHash.pm
@@ -20,6 +20,7 @@ use Digest::SHA;
 sub digest_addr ($$$) {
 	my ($dig, $h, $v) = @_;
 	$v =~ tr/"//d;
+	$v =~ tr/\r\n\t / /s;
 	$v =~ s/@([a-z0-9\_\.\-\(\)]*([A-Z])\S*)/'@'.lc($1)/ge;
 	utf8::encode($v);
 	$dig->add("$h\0$v\0");
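
For illustration only, here is a minimal sketch of what the added tr/// line does to an address header value. normalize_addr() is a hypothetical helper, not part of public-inbox; it repeats the normalization steps from the hunk above but returns the string instead of feeding it to the digest object. The sample addresses are made up, and reading "false duplicates" as copies that differ only in header folding or spacing is an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper (not in public-inbox): same normalization steps as
# digest_addr() in the patch, minus the utf8::encode and $dig->add calls.
sub normalize_addr {
	my ($v) = @_;
	$v =~ tr/"//d;        # drop double quotes
	$v =~ tr/\r\n\t / /s; # CR, LF, TAB, SP all become SP; runs are squeezed
	# lowercase the domain part when it contains an uppercase letter
	$v =~ s/@([a-z0-9\_\.\-\(\)]*([A-Z])\S*)/'@'.lc($1)/ge;
	$v;
}

# Made-up sample values: a folded, quoted header and its plain equivalent
my $folded = "\"A U Thor\"\r\n\t<a\@Example.COM>";
my $plain = "A U Thor <a\@example.com>";

print normalize_addr($folded) eq normalize_addr($plain)
	? "same\n" : "different\n";
```

Without the tr/\r\n\t / /s step, the folded value would keep its CR, LF and TAB, so the two strings (and any digest computed from them) would differ.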
^ permalink raw reply related [relevance 7%]
* [PATCH 0/4] lei mail-diff and other small things
@ 2021-10-02 11:18 5% Eric Wong
2021-10-02 11:18 7% ` [PATCH 3/4] content_hash: normalize whitespace before hashing addresses Eric Wong
0 siblings, 1 reply; 4+ results
From: Eric Wong @ 2021-10-02 11:18 UTC (permalink / raw)
To: meta
1/4 doesn't matter, atm;
2/4 is something I finally got around to doing :x
3/4 is long overdue, I think (and a result of 2/4)
Not sure why 4/4 wasn't done earlier, either; I guess
I never notice missing blobs.
Eric Wong (4):
extsearchidx: attach_config: set {ibx_map} value to $ibx
lei mail-diff: diagnostic command to diff mail contents
content_hash: normalize whitespace before hashing addresses
extsearchidx: emit diagnostics for missing blobs
MANIFEST | 1 +
lib/PublicInbox/ContentHash.pm | 7 +-
lib/PublicInbox/ExtSearchIdx.pm | 8 ++-
lib/PublicInbox/LEI.pm | 5 ++
lib/PublicInbox/LeiInput.pm | 6 ++
lib/PublicInbox/LeiMailDiff.pm | 111 ++++++++++++++++++++++++++++++++
lib/PublicInbox/LeiRediff.pm | 63 +++++++++---------
7 files changed, 167 insertions(+), 34 deletions(-)
create mode 100644 lib/PublicInbox/LeiMailDiff.pm
^ permalink raw reply [relevance 5%]
2021-10-01 20:08 Troubleshooting threads missing from /all/ Konstantin Ryabitsev
2021-10-01 20:41 ` Eric Wong
2021-10-01 20:49 ` Konstantin Ryabitsev
2021-10-01 20:54 ` Eric Wong
2021-10-01 20:58 ` Konstantin Ryabitsev
2021-10-01 22:25 ` Eric Wong
2021-10-01 23:11 ` Konstantin Ryabitsev
2021-10-01 23:46 ` Eric Wong
2021-10-05 4:39 ` Eric Wong
2021-10-05 18:03 ` Konstantin Ryabitsev
2021-10-07 21:33 5% ` Eric Wong
2021-10-02 11:18 5% [PATCH 0/4] lei mail-diff and other small things Eric Wong
2021-10-02 11:18 7% ` [PATCH 3/4] content_hash: normalize whitespace before hashing addresses Eric Wong
2021-10-05 7:09 7% ` Eric Wong
Code repositories for project(s) associated with this public inbox
https://80x24.org/public-inbox.git