Date | Commit message (Collapse) |
|
We won't be able to use List::Util::uniq here, but we can still
shorten our logic and make it more consistent with the rest of
our code which does similar things.
|
|
Since we replace extra Message-ID headers with X-Alt-Message-ID
to placate NNTP clients, we should allow searching and indexing
on X-Alt-Message-ID just like we do with Message-ID.
|
|
|
|
Its result is used for HTML anchors and such.
|
|
Looking at git@vger history, several emails had broken
References/In-Reply-To pointing to <y>, <n> and email
addresses as Message-IDs in References and In-Reply-To
headers.
This was causing too many unrelated messages to be linked
together in the same thread.
|
|
For Subject/To/Cc/From headers, we squeeze them to a space (' ').
For Message-IDs (including References/In-Reply-To), '\t', '\n', '\r'
are deleted since some MUAs might screw them up:
https://public-inbox.org/git/656C30A1EFC89F6B2082D9B6@localhost/raw
|
|
We need to stop ghost messages from generating longer
Message-IDs than Xapian can handle with terms.
|
|
This allows us to be more consistent in dealing with completely
empty Message-Ids.
|
|
Since we support duplicate MIDs in v2, we can safely truncate
long MID terms in the database and let other normal duplicate
resolution sort it out. It seems only spammers use excessively
long MIDs, and there'll always be abuse/misuse vectors for causing
mis-threaded messages, so it's not worth worrying about
excessively long MIDs.
|
|
Traditionally we've been more lax on parsing Message-Id
and allow it without the angle brackets. We've always been
strict on References and can't have it be pointlessly
large when some MUA decides to use HTML-escaped angle
brackets ("<", ">").
|
|
It's shorter and more convenient, here.
|
|
We'll be using a more consistent API for extracting Message-IDs
from various headers.
|
|
Using update-copyrights from gnulib
While we're at it, use the SPDX identifier for AGPL-3.0+ to
ease mechanical processing.
|
|
Reported-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
https://public-inbox.org/meta/CACBZZX5Gnow08r=0A1J_kt3a=zpGyMfvsqu8nAN7kacNnDm+dg@mail.gmail.com/
|
|
Based on reading RFC 3986, it seems '@', ':', '!', '$', '&',
"'", '; '(', ')', '*', '+', ',', ';', '=' are all allowed
in path-absolute where we have the Message-ID.
In any case, it seems '@' is fairly common in path components
nowadays and too common in Message-IDs.
|
|
Apparently there are some really screwed up In-Reply-To
fields out there.
|
|
Message-IDs should not be MIME encoded, but in case they are,
use the raw form for compatibility with ssoma and possibly
other tools. This prevents a potential problem where a
malicious client could confuse our storage layer into indexing
incorrect contents.
|
|
Hopefully this gives new hackers a better overview of
how the components relate to each other.
|
|
We use it as a general compressor for identifiers such as
subject paths, so using the "mid_" prefix probably is not
appropriate.
|
|
In the future, it should be possible to use this:
git ls-files | UPDATE_COPYRIGHT_HOLDER='all contributors' \
UPDATE_COPYRIGHT_USE_INTERVALS=2 \
xargs /path/to/gnulib/build-aux/update-copyright
|
|
We screwed up and needed to fix URL generation with '<>'
in them. Regardless, users may attempt to copy and paste
URLs with '<>' in them, do not punish them for that.
|
|
This is necessary for some mailers which include comment text
in in the In-Reply-To header, merely assuming there is nothing
outside of '<>' as we were doing is not enough.
|
|
Consistently name mid_* functions as verbs.
|
|
Valid URLs do not make valid anchor ids.
|
|
Some HTTP servers (apache2 2.2.22-13+deb7u5) on my system
apparently do not handle "%25" correctly. I'm not yet sure if
it's something weird with my rewrite rules or what....
|
|
More to come later.
|
|
Quit repeating ourselves and use a common MID module
instead.
|