path: root/lib/PublicInbox/ContentHash.pm
Date       Commit message
2021-10-02 content_hash: normalize whitespace before hashing addresses
This should prevent some false duplicates. I noticed this while implementing "lei mail-diff", and only noticed it when I implemented the ContentDigestDbg wrapper for mail-diff.
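To illustrate why whitespace normalization matters for address headers, here is a minimal Python sketch. It is not the code from ContentHash.pm: the function names `normalize_address_header` and `address_digest` are hypothetical, and the exact normalization rule (collapse every whitespace run to one space) is an assumption made for the example.

```python
import hashlib
import re


def normalize_address_header(value: str) -> str:
    """Collapse whitespace runs (including folded-header line breaks) to one space.

    Assumed rule for illustration only; the real module may normalize differently.
    """
    return re.sub(r"\s+", " ", value).strip()


def address_digest(values) -> str:
    """Hash a list of address header values after whitespace normalization."""
    h = hashlib.sha256()
    for v in values:
        h.update(normalize_address_header(v).encode("utf-8"))
        h.update(b"\0")  # separator so adjacent values cannot run together
    return h.hexdigest()


# The same recipients, once folded across two lines and once on a single line.
folded = "Alice <alice@example.com>,\r\n\tBob <bob@example.com>"
unfolded = "Alice <alice@example.com>, Bob <bob@example.com>"
assert address_digest([folded]) == address_digest([unfolded])
```

Without the normalization step, the folded and unfolded copies would hash differently even though they name the same recipients.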
2021-10-02 lei mail-diff: diagnostic command to diff mail contents
This is useful in finding the cause of deduplication bugs, and possibly the cause of missing threads reported by Konstantin in <20211001130527.z7eivotlgqbgetzz@meerkat.local>

usage:
    u=https://yhbt.net/lore/all/87czop5j33.fsf@tynnyri.adurom.net/raw
    lei mail-diff $u
2021-04-30 content_hash: git_sha: allow unblessed SCALAR refs
This will be convenient to avoid the overhead of PublicInbox::Eml for verifying synchronization in lei.
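As a rough illustration of hashing raw message bytes without parsing them first, the Python sketch below computes a git blob object ID. It assumes that `git_sha` produces a git-style blob hash; the function name `git_blob_sha1` is hypothetical and this is not the Perl implementation.

```python
import hashlib


def git_blob_sha1(data: bytes) -> str:
    """Return the SHA-1 object ID git assigns to a blob with this content.

    git hashes the header "blob <length>\\0" followed by the raw bytes.
    """
    header = b"blob %d\0" % len(data)
    return hashlib.sha1(header + data).hexdigest()


# Matches the output of `printf 'hello\n' | git hash-object --stdin`.
print(git_blob_sha1(b"hello\n"))  # ce013625030ba8dba906f756967f9e9ca394464a
```

Because only the byte length and the raw bytes are needed, a caller holding an unparsed message buffer can compute the ID without constructing a message object.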
2021-03-21 lei q: fix warning on remote imports
This will let us tie keywords from remote externals to those which only exist in local externals.
2021-01-31 content_hash: skip Sender for cross posted messages
This regression was introduced long ago and matches behavior originally specified in the comments. It makes a noticeable improvement with search results using -extindex ("all") and lei results with multiple inboxes.

Update some style bits at the top of the test case while we're at it.

Fixes: f0ef0a56a8957d6f ("v2: improve deduplication checks")
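The effect of leaving Sender out of the hash can be sketched as follows. This is an illustrative Python example only: the header list `HASHED_HEADERS` and the function `content_digest` are hypothetical and do not reflect the actual set of headers or the digest layout used by ContentHash.pm.

```python
import hashlib
from email import message_from_string

# Hypothetical header list for illustration; Sender is deliberately absent.
HASHED_HEADERS = ("Subject", "From", "Date", "To", "Cc")


def content_digest(raw: str) -> str:
    """Hash selected headers plus the body of a simple, non-multipart message."""
    msg = message_from_string(raw)
    h = hashlib.sha256()
    for name in HASHED_HEADERS:
        for value in msg.get_all(name, []):
            h.update(f"{name}\0{value}\0".encode("utf-8"))
    h.update(msg.get_payload().encode("utf-8"))
    return h.hexdigest()


base = "From: a@example.com\nTo: list1@example.com\nSubject: hi\n\nbody\n"
# A copy delivered through a list that adds its own Sender header.
via_list = "Sender: list1-owner@example.com\n" + base
assert content_digest(base) == content_digest(via_list)
```

Two copies of a cross-posted message that differ only in a list-added Sender header then hash identically, so they can be treated as duplicates.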
2021-01-01 update copyrights for 2021
Using "make update-copyrights" after setting GNULIB_PATH in my config.mak
2020-08-02 remove unnecessary ->header_obj calls
We used ->header_obj in the past as an optimization with Email::MIME. That optimization is no longer necessary with PublicInbox::Eml. This doesn't make any functional difference even if we were to go back to Email::MIME. However, it reduces the amount of code we have and slightly reduces allocations with PublicInbox::Eml.
2020-05-12 rename "ContentId" to "ContentHash"
The old name may be confused with "Content-ID" as described in RFC 2392, so use an alternate name to avoid confusing future readers.