From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on dcvr.yhbt.net
X-Spam-Level:
X-Spam-ASN:
X-Spam-Status: No, score=-4.0 required=3.0 tests=ALL_TRUSTED,BAYES_00
	shortcircuit=no autolearn=ham autolearn_force=no version=3.4.2
Received: from localhost (dcvr.yhbt.net [127.0.0.1])
	by dcvr.yhbt.net (Postfix) with ESMTP id DC08F1FD51
	for ; Thu, 23 Jan 2020 23:06:00 +0000 (UTC)
From: Eric Wong
To: meta@public-inbox.org
Subject: [PATCH 6/6] contentid: ignore duplicate References: headers
Date: Thu, 23 Jan 2020 23:05:59 +0000
Message-Id: <20200123230559.16781-7-e@yhbt.net>
In-Reply-To: <20200123230559.16781-1-e@yhbt.net>
References: <20200123230559.16781-1-e@yhbt.net>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
List-Id:

OverIdx::parse_references already skips duplicate
References (which we use in SearchThread for rendering).
So there's no reason for our content deduplication logic
to care if a Message-Id in the References: header is
mentioned twice.
---
 lib/PublicInbox/ContentId.pm | 3 +--
 lib/PublicInbox/OverIdx.pm   | 3 +--
 2 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/lib/PublicInbox/ContentId.pm b/lib/PublicInbox/ContentId.pm
index 0c4a8678..65691593 100644
--- a/lib/PublicInbox/ContentId.pm
+++ b/lib/PublicInbox/ContentId.pm
@@ -64,8 +64,7 @@ sub content_digest ($) {
 	# if we got here, we've already got Message-ID reuse
 	my %seen = map { $_ => 1 } @{mids($hdr)};
 	foreach my $mid (@{references($hdr)}) {
-		next if $seen{$mid};
-		$dig->add("ref\0$mid\0");
+		$dig->add("ref\0$mid\0") unless $seen{$mid}++;
 	}
 
 	# Only use Sender: if From is not present
diff --git a/lib/PublicInbox/OverIdx.pm b/lib/PublicInbox/OverIdx.pm
index 189bd21d..5f1007aa 100644
--- a/lib/PublicInbox/OverIdx.pm
+++ b/lib/PublicInbox/OverIdx.pm
@@ -230,8 +230,7 @@ sub parse_references ($$$) {
 			warn "References: <$ref> too long, ignoring\n";
 			next;
 		}
-		next if $seen{$ref}++;
-		push @keep, $ref;
+		push(@keep, $ref) unless $seen{$ref}++;
 	}
 	$smsg->{references} = '<'.join('> <', @keep).'>' if @keep;
 	\@keep;
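
For illustration only, here is a minimal standalone sketch of the effect
of the ContentId.pm hunk. It is not the real content_digest():
refs_digest() is a hypothetical helper, the Message-IDs are made up,
plain array refs stand in for the header object, and the choice of
SHA-256 via Digest::SHA is an assumption of this sketch. It only shows
that, with the post-increment on %seen, a References: list which
repeats a Message-ID hashes the same as one which mentions it once.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Digest::SHA;

# Hypothetical stand-in for the References: loop in content_digest().
# $mids and $refs are plain array refs here, not a header object.
sub refs_digest {
	my ($mids, $refs) = @_;
	my $dig = Digest::SHA->new(256); # digest choice assumed for the sketch
	# seed %seen with the message's own Message-IDs, as the patch does
	my %seen = map { $_ => 1 } @$mids;
	foreach my $mid (@$refs) {
		# post-increment: false only on the first visit of each $mid
		$dig->add("ref\0$mid\0") unless $seen{$mid}++;
	}
	$dig->hexdigest;
}

my @mids = ('self@example.com');
my @refs = ('a@example.com', 'b@example.com');

my $once = refs_digest(\@mids, \@refs);
# same list with the first reference repeated at the end
my $twice = refs_digest(\@mids, [ @refs, 'a@example.com' ]);
# a reference to the message's own Message-ID is skipped, as before
my $self_ref = refs_digest(\@mids, [ 'self@example.com', @refs ]);

print $once eq $twice ? "same\n" : "different\n";
```

Before this change, the repeated entry in the second call would have
been fed to the digest a second time, so two otherwise identical
messages could have produced different content IDs.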