path: root/lib/PublicInbox/MID.pm
Date Commit message
2020-01-24 mid: shorten uniq_mids logic
We won't be able to use List::Util::uniq here, but we can still shorten our logic and make it more consistent with the rest of our code which does similar things.
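The log does not show the shortened code. As a rough illustration only, an order-preserving de-duplication can be sketched in Python (not the project's Perl); the function name is borrowed from the commit subject and the body is an assumption:

```python
def uniq_mids(mids):
    """Return mids with duplicates removed, keeping first-seen order.

    Illustrative sketch only; not the project's implementation.
    """
    seen = set()
    out = []
    for mid in mids:
        if mid in seen:
            continue  # already emitted once, skip the repeat
        seen.add(mid)
        out.append(mid)
    return out
```

For example, `uniq_mids(["a@x", "b@x", "a@x"])` keeps only the first `a@x`.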
2019-10-28 index: allow search/lookups on X-Alt-Message-ID
Since we replace extra Message-ID headers with X-Alt-Message-ID to placate NNTP clients, we should allow searching and indexing on X-Alt-Message-ID just like we do with Message-ID.
2019-09-09 run update-copyrights from gnulib for 2019
2019-06-04 mid: id_compress requires ASCII-clean words
Its result is used for HTML anchors and such.
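One way to picture an "ASCII-clean" requirement, sketched in Python rather than the project's Perl. The allowed character class, the `MID_MAX` cap, and the use of a SHA-1 hex digest as the compressed form are all assumptions made for illustration; the log only says the result must be usable in HTML anchors:

```python
import hashlib
import re

MID_MAX = 40  # hypothetical length cap, not taken from the log
_CLEAN = re.compile(r"[A-Za-z0-9_-]+")  # assumed definition of "ASCII-clean"

def id_compress(ident):
    """Return ident unchanged if it is short and ASCII-clean,
    otherwise replace it with a 40-character hex digest."""
    if _CLEAN.fullmatch(ident) and len(ident) <= MID_MAX:
        return ident
    # Encode first so non-ASCII input hashes as bytes, not as text.
    return hashlib.sha1(ident.encode("utf-8")).hexdigest()
```

Under these assumptions, an identifier containing `%` or `@` always comes back as plain hex, which is safe to use as an anchor id.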
2019-01-29 mid: filter out 'y', 'n', and email addresses from references()
Looking at git@vger history, several emails had broken References/In-Reply-To pointing to <y>, <n> and email addresses as Message-IDs in References and In-Reply-To headers. This was causing too many unrelated messages to be linked together in the same thread.
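A minimal Python sketch of this kind of filter (not the project's Perl). How "email addresses" are recognized is not stated in the log; the sketch assumes the caller passes in the message's own To/From/Cc addresses, and both the function name and the `own_addresses` parameter are hypothetical:

```python
def filter_references(refs, own_addresses):
    """Drop bogus entries ('y', 'n', and known email addresses)
    from a list of candidate Message-IDs, preserving order."""
    # Assumption: addresses from To/From/Cc are supplied by the caller.
    bogus = {"y", "n"} | set(own_addresses)
    return [ref for ref in refs if ref not in bogus]
```

With this, a References list such as `["y", "real@example.com", "me@example.org"]` and `own_addresses=["me@example.org"]` keeps only `real@example.com`, so unrelated messages are no longer joined through `<y>` or `<n>`.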
2018-04-20 disallow "\t" and "\n" in OVER headers
For Subject/To/Cc/From headers, we squeeze them to a space (' '). For Message-IDs (including References/In-Reply-To), '\t', '\n', '\r' are deleted since some MUAs might screw them up: https://public-inbox.org/git/656C30A1EFC89F6B2082D9B6@localhost/raw
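The two rules can be sketched in Python (not the project's Perl). The function names are hypothetical, and reading "squeeze" as collapsing each run of tabs and newlines into a single space is an assumption:

```python
import re

def squeeze_header(value):
    """Subject/To/Cc/From: turn each run of tabs/newlines into one space."""
    return re.sub(r"[\t\n]+", " ", value)

def clean_mid(mid):
    """Message-ID/References/In-Reply-To: delete tab, LF and CR outright."""
    return mid.translate({ord(c): None for c in "\t\n\r"})
```

The difference matters because a space inside a Message-ID would change the identifier, while a space inside a Subject is harmless.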
2018-04-01 truncate Message-IDs and References consistently
We need to stop ghost messages from generating longer Message-IDs than Xapian can handle with terms.
2018-03-19 mid: mid_mime uses v2-compatible mids function
This allows us to be more consistent in dealing with completely empty Message-Ids.
2018-03-03 mid: truncate excessively long MIDs early
Since we support duplicate MIDs in v2, we can safely truncate long MID terms in the database and let other normal duplicate resolution sort it out. It seems only spammers use excessively long MIDs, and there'll always be abuse/misuse vectors for causing mis-threaded messages, so it's not worth worrying about excessively long MIDs.
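A hedged Python sketch of early truncation (not the project's Perl). The limit value and the names `MAX_MID_SIZE` and `truncate_mid` are placeholders chosen for illustration; the log does not state the actual cap or whether it counts bytes or characters. The sketch assumes a byte limit:

```python
MAX_MID_SIZE = 244  # hypothetical byte limit, for illustration only

def truncate_mid(mid, limit=MAX_MID_SIZE):
    """Cut mid down to at most `limit` UTF-8 bytes."""
    raw = mid.encode("utf-8")
    if len(raw) <= limit:
        return mid
    # errors="ignore" drops a multi-byte character split by the cut.
    return raw[:limit].decode("utf-8", errors="ignore")
```

Two different over-long MIDs can collide after truncation; as the commit message notes, that is acceptable once duplicate MIDs are handled elsewhere.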
2018-03-03 mid: be strict with References, but loose on Message-Id
Traditionally we've been more lax on parsing Message-Id and allow it without the angle brackets. We've always been strict on References and can't have it be pointlessly large when some MUA decides to use HTML-escaped angle brackets ("&lt;", "&gt;").
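The strict/loose split might look like this in Python (not the project's Perl). Both function names and the regular expression are assumptions for illustration:

```python
import re

_MID_RE = re.compile(r"<([^<>]+)>")  # text between one pair of angle brackets

def mids_loose(value):
    """Message-Id: prefer <...>, but accept a bare value without brackets."""
    found = _MID_RE.findall(value)
    if found:
        return found
    bare = value.strip()
    return [bare] if bare else []

def references_strict(value):
    """References: only accept IDs enclosed in real angle brackets."""
    return _MID_RE.findall(value)
```

With the strict parser, a header containing HTML-escaped brackets such as `&lt;a@b&gt;` yields nothing, instead of one oversized bogus reference.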
2018-03-02 searchidx: use new `references' method for parsing References
It's shorter and more convenient, here.
2018-03-02 mid: add `mids' and `references' methods for extraction
We'll be using a more consistent API for extracting Message-IDs from various headers.
2018-02-07 update copyrights for 2018
Using update-copyrights from gnulib. While we're at it, use the SPDX identifier for AGPL-3.0+ to ease mechanical processing.
2017-05-23 www: do not mangle characters from search queries
Reported-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> https://public-inbox.org/meta/CACBZZX5Gnow08r=0A1J_kt3a=zpGyMfvsqu8nAN7kacNnDm+dg@mail.gmail.com/
2016-08-14 www: do not unnecessarily escape some chars in paths
Based on reading RFC 3986, it seems '@', ':', '!', '$', '&', "'", '(', ')', '*', '+', ',', ';', '=' are all allowed in path-absolute where we have the Message-ID. In any case, it seems '@' is fairly common in path components nowadays and too common in Message-IDs.
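To illustrate the RFC 3986 point, here is a Python sketch using the standard library's `urllib.parse.quote` (the project itself is Perl; the name `mid_escape` is used here only as a hypothetical helper). The safe set is the RFC's sub-delims plus ':' and '@':

```python
from urllib.parse import quote

# Characters RFC 3986 permits unescaped in a path segment,
# beyond letters, digits and "-._~" (which quote() never escapes).
PATH_SAFE = "@:!$&'()*+,;="

def mid_escape(mid):
    """Percent-encode a Message-ID for use as a single URL path segment."""
    # '/' is deliberately not in the safe set, so it becomes %2F.
    return quote(mid, safe=PATH_SAFE)
```

A typical Message-ID such as `a+b@example.com` passes through unchanged, while '/', '%' and spaces are still escaped.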
2016-08-14 mid: no wide characters for sha1_hex
Apparently there are some really screwed up In-Reply-To fields out there.
2016-03-03 use raw header for Message-ID
Message-IDs should not be MIME encoded, but in case they are, use the raw form for compatibility with ssoma and possibly other tools. This prevents a potential problem where a malicious client could confuse our storage layer into indexing incorrect contents.
2015-11-20 various internal documentation updates
Hopefully this gives new hackers a better overview of how the components relate to each other.
2015-10-02 rename mid_compress to id_compress
We use it as a general compressor for identifiers such as subject paths, so using the "mid_" prefix probably is not appropriate.
2015-09-06 update copyright headers and email addresses
In the future, it should be possible to use this:

    git ls-files | UPDATE_COPYRIGHT_HOLDER='all contributors' \
        UPDATE_COPYRIGHT_USE_INTERVALS=2 \
        xargs /path/to/gnulib/build-aux/update-copyright
2015-08-30 mid2path: clean MID of angle brackets '<>'
We screwed up and needed to fix URL generation with '<>' in them. Regardless, users may attempt to copy and paste URLs with '<>' in them; do not punish them for that.
2015-08-27 mid: extract Message-ID from inside '<>'
This is necessary for some mailers which include comment text in the In-Reply-To header; merely assuming there is nothing outside of '<>', as we were doing, is not enough.
2015-08-25 mid: mid_compressed => mid_compress
Consistently name mid_* functions as verbs.
2015-08-17 view: always compress Message-IDs for anchors
Valid URLs do not make valid anchor ids.
2015-08-17 mid: compress Message-IDs with '%' in them
Some HTTP servers (apache2 2.2.22-13+deb7u5) on my system apparently do not handle "%25" correctly. I'm not yet sure if it's something weird with my rewrite rules or what....
2015-08-16 view: deduplicate common code for loading search results
More to come later.
2015-08-15 extract redundant Message-ID handling code
Quit repeating ourselves and use a common MID module instead.