path: root/t/content_id.t
Date        Commit message
2019-09-09  run update-copyrights from gnulib for 2019
2018-05-11  content_id: workaround quote handling change in Email::* modules
I'm not entirely sure where the behavior change lies, but it seems to be in some of the latest CPAN versions of these modules. In any case, this only affects the test setup and not actual behavior. cf. https://public-inbox.org/meta/2a2bf0e1-fd1f-f8bf-95bc-dac47906ef43@linuxfoundation.org/
2018-04-18  v2: improve deduplication checks
First off, decode text portions of messages since some archived mail I got was converted from quoted-printable or base-64 to 8bit by the original recipient. Attempting to merge them with my own archives (which had no conversion done) led to unnecessary duplicates showing up. Then, normalize CRLF line endings in text portions to LF. In the headers, we relax the content_id hashing to ignore quotes and lower-case domain names in To, Cc, and From headers since some mail processors will alter them. Finally, I've discovered Email::MIME->new($mime->as_string) does not always round-trip reliably, so we calculate the content_id twice on user-supplied messages.
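The normalization steps described in this commit message can be illustrated with a minimal sketch. It is written in Python with the standard email package rather than the Perl Email::MIME code the project uses, so it is not the actual implementation. The function names, the choice of SHA-256, and the restriction to only the From, To, and Cc headers are assumptions made for illustration.

```python
import email
import hashlib
import re

# Assumption: only these address headers are hashed in this sketch.
ADDRESS_HEADERS = ("From", "To", "Cc")


def normalize_address_header(value: str) -> str:
    """Drop double quotes and lower-case the domain after each '@'."""
    value = value.replace('"', "")
    return re.sub(r"@[^\s>,;]+", lambda m: m.group(0).lower(), value)


def content_id(raw: bytes) -> bytes:
    """Return a digest that ignores transfer encoding, CRLF, and quoting."""
    msg = email.message_from_bytes(raw)
    digest = hashlib.sha256()  # assumption: the real algorithm may differ
    for name in ADDRESS_HEADERS:
        value = msg.get(name)
        if value is not None:
            normalized = normalize_address_header(str(value))
            digest.update(name.encode("ascii") + b"\0")
            digest.update(normalized.encode("utf-8", "surrogateescape") + b"\0")
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers carry no payload of their own
        # decode=True undoes quoted-printable and base64 transfer encodings
        payload = part.get_payload(decode=True) or b""
        if part.get_content_maintype() == "text":
            payload = payload.replace(b"\r\n", b"\n")  # CRLF to LF
        digest.update(payload)
    return digest.digest()
```

With this sketch, a quoted-printable copy of a message and an 8bit copy with CRLF line endings, a quoted display name, and an upper-case domain produce the same digest, which is the kind of duplicate the commit message describes.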
2018-03-02  content_id: no need to be human-friendly
We merely use this for internal comparisons and do not store this in Xapian. So using a shorter, non-human readable digest is enough. Furthermore, introduce "content_digest" which returns the Digest::SHA object for extra changes.
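The split between a digest object and a compact identifier can be sketched as follows. This is a Python illustration using hashlib, not the project's Perl code built on Digest::SHA. The SHA-256 choice, the function signatures, and the idea that callers feed extra data into the returned object are assumptions.

```python
import hashlib


def content_digest(data: bytes):
    """Return the hash object itself so a caller can still update it."""
    digest = hashlib.sha256()  # assumption: algorithm chosen for illustration
    digest.update(data)
    return digest


def content_id(data: bytes) -> bytes:
    """Return the raw binary digest, which is shorter than its hex form."""
    return content_digest(data).digest()


# The raw form is half the length of the human-readable hex form.
raw = content_id(b"example message")
readable = content_digest(b"example message").hexdigest()
print(len(raw), len(readable))
```

Since the value is only compared internally and never shown to a user, the binary form loses nothing and halves the size of each identifier.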
2018-02-12  content_id: add test case