user/dev discussion of public-inbox itself
* Re: Troubleshooting threads missing from /all/
  @ 2021-10-07 21:33  5%                   ` Eric Wong
  0 siblings, 0 replies; 4+ results
From: Eric Wong @ 2021-10-07 21:33 UTC (permalink / raw)
  To: Konstantin Ryabitsev; +Cc: meta

(resend, screwed up something with my MTA :x)

OK.  I tried reproducing the problem even with f28fdcd6d8d6
(content_hash: normalize whitespace before hashing addresses, 2021-10-02)
reverted, but haven't been able to...

So far I've found some gc and dedupe bugs, but something's still
eluding me.  And I also noticed and started fixing another bug
which may necessitate a full --reindex, anyways (at least for
non-ASCII subjects).

Konstantin Ryabitsev <konstantin@linuxfoundation.org> wrote:
> [publicinbox "regressions"]
>   address = regressions@lists.linux.dev
>   url = regressions
>   inboxdir = /srv/public-inbox/lore.kernel.org/regressions
>   indexlevel = full

Btw, "indexlevel = basic" ought to be sufficient if an inbox
is in extindex once bugs are ironed out.  full/medium is
of course helpful if messages are missing from extindex,
though...

^ permalink raw reply	[relevance 5%]

* Re: [PATCH 3/4] content_hash: normalize whitespace before hashing addresses
  2021-10-02 11:18  7% ` [PATCH 3/4] content_hash: normalize whitespace before hashing addresses Eric Wong
@ 2021-10-05  7:09  7%   ` Eric Wong
  0 siblings, 0 replies; 4+ results
From: Eric Wong @ 2021-10-05  7:09 UTC (permalink / raw)
  To: meta

Eric Wong <e@80x24.org> wrote:
> This should prevent some false duplicates.  I noticed this
> while implementing "lei mail-diff", and only noticed it when
> I implemented the ContentDigestDbg wrapper for mail-diff.

Btw, I completely forgot -extindex has a --dedupe switch
for dealing with situations like this:

	public-inbox-extindex --dedupe=MSGID [--dedupe=MSGID1]
	public-inbox-extindex --dedupe # everything!

It looks like there are even test cases for it in t/extsearch.t (!)

I'm running --dedupe on yhbt.net/lore/all, because apparently
--reindex doesn't deduplicate :x

And there's a lot of stuff I still need to document in
the pod/manpage :x

^ permalink raw reply	[relevance 7%]

* [PATCH 3/4] content_hash: normalize whitespace before hashing addresses
  2021-10-02 11:18  5% [PATCH 0/4] lei mail-diff and other small things Eric Wong
@ 2021-10-02 11:18  7% ` Eric Wong
  2021-10-05  7:09  7%   ` Eric Wong
  0 siblings, 1 reply; 4+ results
From: Eric Wong @ 2021-10-02 11:18 UTC (permalink / raw)
  To: meta

This should prevent some false duplicates.  I noticed this
while implementing "lei mail-diff", and only noticed it when
I implemented the ContentDigestDbg wrapper for mail-diff.
---
 lib/PublicInbox/ContentHash.pm | 1 +
 1 file changed, 1 insertion(+)

diff --git a/lib/PublicInbox/ContentHash.pm b/lib/PublicInbox/ContentHash.pm
index f6ae9011c1bf..bacc9cdda124 100644
--- a/lib/PublicInbox/ContentHash.pm
+++ b/lib/PublicInbox/ContentHash.pm
@@ -20,6 +20,7 @@ use Digest::SHA;
 sub digest_addr ($$$) {
 	my ($dig, $h, $v) = @_;
 	$v =~ tr/"//d;
+	$v =~ tr/\r\n\t / /s;
 	$v =~ s/@([a-z0-9\_\.\-\(\)]*([A-Z])\S*)/'@'.lc($1)/ge;
 	utf8::encode($v);
 	$dig->add("$h\0$v\0");

^ permalink raw reply related	[relevance 7%]
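
To illustrate why the one-line change in the patch above matters, here is a minimal sketch, not the actual PublicInbox::ContentHash code. The helper names normalize_addr and addr_digest are hypothetical, the choice of SHA-256 is an assumption made for this example, and the domain-lowercasing substitution from the real digest_addr is left out for brevity. It shows that a folded address header and its single-line equivalent produce the same digest once runs of CR, LF, tab and space are squeezed into one space.

```perl
use strict;
use warnings;
use Digest::SHA;

# Hypothetical helper: applies only the two tr/// steps visible in the diff.
sub normalize_addr {
	my ($v) = @_;
	$v =~ tr/"//d;          # drop double quotes
	$v =~ tr/\r\n\t / /s;   # map CR, LF, tab, space to space and squeeze runs
	return $v;
}

# Hypothetical helper: hash a header name and its normalized value.
# SHA-256 is an assumption for this sketch.
sub addr_digest {
	my ($h, $v) = @_;
	my $dig = Digest::SHA->new(256);
	$v = normalize_addr($v);
	utf8::encode($v);
	$dig->add("$h\0$v\0");
	return $dig->hexdigest;
}

# The same recipients, once folded across lines and once on a single line.
my $folded = "Alice <alice\@example.com>,\r\n\tBob <bob\@example.com>";
my $flat   = "Alice <alice\@example.com>, Bob <bob\@example.com>";

print addr_digest('To', $folded) eq addr_digest('To', $flat)
	? "same\n" : "different\n";
```

Without the second tr/// line, the two values would hash differently, which is the kind of false duplicate the patch is meant to prevent.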

* [PATCH 0/4] lei mail-diff and other small things
@ 2021-10-02 11:18  5% Eric Wong
  2021-10-02 11:18  7% ` [PATCH 3/4] content_hash: normalize whitespace before hashing addresses Eric Wong
  0 siblings, 1 reply; 4+ results
From: Eric Wong @ 2021-10-02 11:18 UTC (permalink / raw)
  To: meta

1/4 doesn't matter, atm;
2/4 is something I finally got around to doing :x
3/4 is long overdue, I think (and a result of 2/4)
Not sure why 4/4 wasn't done earlier, either; I guess
I never notice missing blobs.

Eric Wong (4):
  extsearchidx: attach_config: set {ibx_map} value to $ibx
  lei mail-diff: diagnostic command to diff mail contents
  content_hash: normalize whitespace before hashing addresses
  extsearchidx: emit diagnostics for missing blobs

 MANIFEST                        |   1 +
 lib/PublicInbox/ContentHash.pm  |   7 +-
 lib/PublicInbox/ExtSearchIdx.pm |   8 ++-
 lib/PublicInbox/LEI.pm          |   5 ++
 lib/PublicInbox/LeiInput.pm     |   6 ++
 lib/PublicInbox/LeiMailDiff.pm  | 111 ++++++++++++++++++++++++++++++++
 lib/PublicInbox/LeiRediff.pm    |  63 +++++++++---------
 7 files changed, 167 insertions(+), 34 deletions(-)
 create mode 100644 lib/PublicInbox/LeiMailDiff.pm

^ permalink raw reply	[relevance 5%]

Thread overview:
2021-10-01 20:08     Troubleshooting threads missing from /all/ Konstantin Ryabitsev
2021-10-01 20:41     ` Eric Wong
2021-10-01 20:49       ` Konstantin Ryabitsev
2021-10-01 20:54         ` Eric Wong
2021-10-01 20:58           ` Konstantin Ryabitsev
2021-10-01 22:25             ` Eric Wong
2021-10-01 23:11               ` Konstantin Ryabitsev
2021-10-01 23:46                 ` Eric Wong
2021-10-05  4:39                   ` Eric Wong
2021-10-05 18:03                     ` Konstantin Ryabitsev
2021-10-07 21:33  5%                   ` Eric Wong
2021-10-02 11:18  5% [PATCH 0/4] lei mail-diff and other small things Eric Wong
2021-10-02 11:18  7% ` [PATCH 3/4] content_hash: normalize whitespace before hashing addresses Eric Wong
2021-10-05  7:09  7%   ` Eric Wong

Code repositories for project(s) associated with this public inbox

	https://80x24.org/public-inbox.git
