user/dev discussion of public-inbox itself
 help / color / mirror / code / Atom feed
From: Eric Wong <e@80x24.org>
To: meta@public-inbox.org
Subject: [PATCH] content_hash: skip Sender for cross posted messages
Date: Fri, 29 Jan 2021 23:41:09 -0600	[thread overview]
Message-ID: <20210130054109.24815-1-e@80x24.org> (raw)

This regression was introduced long ago and matches behavior
originally specified in the comments.  It makes a noticeable
improvement with search results using -extindex ("all") and
lei results with multiple inboxes.

Update some style bits at the top of the test case while
we're at it.

Fixes: f0ef0a56a8957d6f ("v2: improve deduplication checks")
---
 lib/PublicInbox/ContentHash.pm |  7 +++----
 t/content_hash.t               | 14 +++++++++++++-
 2 files changed, 16 insertions(+), 5 deletions(-)

diff --git a/lib/PublicInbox/ContentHash.pm b/lib/PublicInbox/ContentHash.pm
index 838fdd6f..4dbe7b50 100644
--- a/lib/PublicInbox/ContentHash.pm
+++ b/lib/PublicInbox/ContentHash.pm
@@ -68,10 +68,9 @@ sub content_digest ($) {
 
 	# Only use Sender: if From is not present
 	foreach my $h (qw(From Sender)) {
-		my @v = $eml->header($h);
-		if (@v) {
-			digest_addr($dig, $h, $_) foreach @v;
-		}
+		my @v = $eml->header($h) or next;
+		digest_addr($dig, $h, $_) foreach @v;
+		last;
 	}
 	foreach my $h (qw(Subject Date)) {
 		my @v = $eml->header($h);
diff --git a/t/content_hash.t b/t/content_hash.t
index 3f02b1b3..060665f6 100644
--- a/t/content_hash.t
+++ b/t/content_hash.t
@@ -1,7 +1,8 @@
+#!perl -w
 # Copyright (C) 2018-2021 all contributors <meta@public-inbox.org>
 # License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
 use strict;
-use warnings;
+use v5.10.1;
 use Test::More;
 use PublicInbox::ContentHash qw(content_hash);
 use PublicInbox::Eml;
@@ -19,6 +20,17 @@ EOF
 my $orig = content_hash($mime);
 my $reload = content_hash(PublicInbox::Eml->new($mime->as_string));
 is($orig, $reload, 'content_hash matches after serialization');
+{
+	my $s1 = PublicInbox::Eml->new($mime->as_string);
+	$s1->header_set('Sender', 's@example.com');
+	is(content_hash($s1), $orig, "Sender ignored when 'From' present");
+	my $s2 = PublicInbox::Eml->new($s1->as_string);
+	$s1->header_set('Sender', 'sender@example.com');
+	is(content_hash($s2), $orig, "Sender really ignored 'From'");
+	$_->header_set('From') for ($s1, $s2);
+	isnt(content_hash($s1), content_hash($s2),
+		'sender accounted when From missing');
+}
 
 foreach my $h (qw(From To Cc)) {
 	my $n = q("Quoted N'Ame" <foo@EXAMPLE.com>);

                 reply	other threads:[~2021-01-30  5:41 UTC|newest]

Thread overview: [no followups] expand[flat|nested]  mbox.gz  Atom feed

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://public-inbox.org/README

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210130054109.24815-1-e@80x24.org \
    --to=e@80x24.org \
    --cc=meta@public-inbox.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/public-inbox.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).