author | Eric Wong (Contractor, The Linux Foundation) <e@80x24.org> | 2018-03-03 17:57:57 +0000 |
---|---|---|
committer | Eric Wong (Contractor, The Linux Foundation) <e@80x24.org> | 2018-03-03 18:12:00 +0000 |
commit | ae68bf5da734189549bbac3a525845a58e45d77f (patch) | |
tree | 104914b78e516ee21bb50190345c17ca3748b459 /lib/PublicInbox/MID.pm | |
parent | 95bd8265dfb67ba90e7068bf5a4360168a1f30b6 (diff) | |
download | public-inbox-ae68bf5da734189549bbac3a525845a58e45d77f.tar.gz |
Since we support duplicate MIDs in v2, we can safely truncate long MID terms in the database and let other normal duplicate resolution sort it out. It seems only spammers use excessively long MIDs, and there'll always be abuse/misuse vectors for causing mis-threaded messages, so it's not worth worrying about excessively long MIDs.
Diffstat (limited to 'lib/PublicInbox/MID.pm')
-rw-r--r-- | lib/PublicInbox/MID.pm | 11 |
1 files changed, 10 insertions, 1 deletions
diff --git a/lib/PublicInbox/MID.pm b/lib/PublicInbox/MID.pm
index 96085399..422902f5 100644
--- a/lib/PublicInbox/MID.pm
+++ b/lib/PublicInbox/MID.pm
@@ -10,7 +10,10 @@ our @EXPORT_OK = qw/mid_clean id_compress mid2path mid_mime mid_escape MID_ESC
 	mids references/;
 use URI::Escape qw(uri_escape_utf8);
 use Digest::SHA qw/sha1_hex/;
-use constant MID_MAX => 40; # SHA-1 hex length
+use constant {
+	MID_MAX => 40, # SHA-1 hex length # TODO: get rid of this
+	MAX_MID_SIZE => 244, # max term size (Xapian limitation) - length('Q')
+};
 
 sub mid_clean {
 	my ($mid) = @_;
@@ -61,6 +64,12 @@ sub mids ($) {
 			push(@mids, $v);
 		}
 	}
+	foreach my $i (0..$#mids) {
+		next if length($mids[$i]) <= MAX_MID_SIZE;
+		warn "Message-ID: <$mids[$i]> too long, truncating\n";
+		$mids[$i] = substr($mids[$i], 0, MAX_MID_SIZE);
+	}
+
 	uniq_mids(\@mids);
 }
 
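
The truncation loop this patch adds to `mids` can be tried in isolation with the minimal sketch below. It makes two assumptions. First, `truncate_mids` is a hypothetical standalone helper, not a function in `PublicInbox::MID`. Second, the value 244 is taken to be Xapian's 245-byte term limit minus one byte for the `Q` prefix, as the comment in the patch suggests.

```perl
use strict;
use warnings;

# Assumption: same value as the constant added in the patch, read as
# Xapian's 245-byte term limit minus length('Q').
use constant MAX_MID_SIZE => 244;

# Hypothetical helper (not part of PublicInbox::MID): truncate any
# over-long Message-ID in place, warn about it, and return the array ref.
sub truncate_mids {
	my ($mids) = @_;
	foreach my $i (0..$#$mids) {
		next if length($mids->[$i]) <= MAX_MID_SIZE;
		warn "Message-ID: <$mids->[$i]> too long, truncating\n";
		$mids->[$i] = substr($mids->[$i], 0, MAX_MID_SIZE);
	}
	$mids;
}

# One ordinary Message-ID and one 312-character Message-ID
my $mids = truncate_mids([ 'short@example.com', ('x' x 300) . '@example.com' ]);
print length($_), "\n" for @$mids; # prints 17, then 244
```

Two distinct long Message-IDs that share their first 244 characters collapse to the same term after truncation. The commit message accepts this because v2 already resolves duplicate MIDs.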