Using "make update-copyrights" after setting GNULIB_PATH in my
config.mak.
We'll be using it for Resent-Message-ID with lei, and possibly
other places.
As shown recently in commit a05445fb400108e60ede7d377cf3b26a0392eb24
("config: config_fh_parse: micro-optimize"), relying on
the return value of `push' and defined-or operators can avoid
modifying the hash value scalar with an increment.
It's only used for HTML anchors which we will need indefinitely.
We can rely on the newer mids() sub directly and use faster
numeric comparisons for Msgmap unindexing in v1.
Prefer the "ID" capitalization since it seems to be the
preferred capitalization in RFC 5322.
In theory, this allows the interpreter to deduplicate the string
internally (I haven't checked if it does).
Unfortunately, there are too many instances of "Message-Id" in the
tests to be worth changing at this point.
This allows us to consistently enforce the same Message-ID
extraction rules everywhere and makes it easier for us to
make changes in the future.
Update scripts/ssoma-replay as well, but don't rely on
PublicInbox::* modules in that script, since it's legacy and
public-inbox was never a dependency of ssoma.
I didn't wait until September to do it, this year!
We won't be able to use List::Util::uniq here, but we can still
shorten our logic and make it more consistent with the rest of
our code which does similar things.
Since we replace extra Message-ID headers with X-Alt-Message-ID
to placate NNTP clients, we should allow searching and indexing
on X-Alt-Message-ID just like we do with Message-ID.
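Here is a minimal sketch of indexing both headers. It is written in
Python rather than the project's own Perl. The helper name
`mids_for_index` and the bracket-only extraction are assumptions for
illustration. The helper collects IDs from Message-ID first, then from
X-Alt-Message-ID, and keeps only the first occurrence of each.

```python
import re
from email import message_from_string

# Angle-bracketed token, e.g. "<abc@example.com>" -> "abc@example.com"
MID_RE = re.compile(r"<([^>]+)>")


def mids_for_index(raw: str) -> list:
    """Hypothetical helper: gather IDs from Message-ID and X-Alt-Message-ID.

    Assumes IDs always appear inside angle brackets; duplicates are
    dropped while preserving first-seen order.
    """
    msg = message_from_string(raw)
    seen = set()
    out = []
    for name in ("Message-ID", "X-Alt-Message-ID"):
        for value in msg.get_all(name, []):
            for mid in MID_RE.findall(str(value)):
                if mid not in seen:
                    seen.add(mid)
                    out.append(mid)
    return out
```

An alternate ID that repeats the primary one is indexed only once.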
Its result is used for HTML anchors and such.
Looking at git@vger history, several emails had broken
References/In-Reply-To headers containing <y>, <n> and email
addresses as Message-IDs.
This was causing too many unrelated messages to be linked
together in the same thread.
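One plausible way to filter such bogus references is sketched below
in Python. The function name is hypothetical. Treating bare `y` and
`n` as bogus is taken from the examples above. The address pattern is
a loose heuristic, not an RFC 5322 parser. Any ID that also appears as
an address in From/To/Cc is dropped.

```python
import re
from email import message_from_string

MID_RE = re.compile(r"<([^>]+)>")
# Loose "something@something" matcher (assumption, not RFC 5322 exact)
ADDR_RE = re.compile(r"([^<\s]+@[^>\s]+)")


def references(raw: str) -> list:
    msg = message_from_string(raw)
    # Bare yes/no answers are never real Message-IDs
    bogus = {"y", "n"}
    # Addresses of participants must not be mistaken for Message-IDs
    for name in ("From", "To", "Cc"):
        for value in msg.get_all(name, []):
            bogus.update(ADDR_RE.findall(str(value)))
    out = []
    for name in ("References", "In-Reply-To"):
        for value in msg.get_all(name, []):
            for mid in MID_RE.findall(str(value)):
                if mid not in bogus and mid not in out:
                    out.append(mid)
    return out
```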
For Subject/To/Cc/From headers, we squeeze them to a space (' ').
For Message-IDs (including References/In-Reply-To), '\t', '\n', '\r'
are deleted since some MUAs might screw them up:
https://public-inbox.org/git/656C30A1EFC89F6B2082D9B6@localhost/raw
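A sketch of the two whitespace rules follows. The function names are
hypothetical. As described, only tab, newline and carriage return are
handled.

```python
import re

# Runs of tab/newline/CR in display headers collapse to one space
SQUEEZE_RE = re.compile(r"[\t\n\r]+")
# Translation table that deletes tab/newline/CR entirely
_DEL_WS = str.maketrans("", "", "\t\n\r")


def squeeze_header(value: str) -> str:
    """Subject/To/Cc/From: squeeze tab/newline/CR runs to a single space."""
    return SQUEEZE_RE.sub(" ", value)


def clean_mid_header(value: str) -> str:
    """Message-ID/References/In-Reply-To: delete tab/newline/CR outright."""
    return value.translate(_DEL_WS)
```

Deleting, rather than squeezing, keeps a Message-ID that a MUA folded
mid-token in one piece.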
We need to stop ghost messages from generating longer
Message-IDs than Xapian can handle with terms.
This allows us to be more consistent in dealing with completely
empty Message-Ids.
Since we support duplicate MIDs in v2, we can safely truncate
long MID terms in the database and let other normal duplicate
resolution sort it out. It seems only spammers use excessively
long MIDs, and there'll always be abuse/misuse vectors for causing
mis-threaded messages, so it's not worth worrying about
excessively long MIDs.
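A sketch of truncating a prefixed term so that it fits Xapian's term
length limit. The 245-byte figure is an assumption based on the limit
of Xapian's disk backends. The prefix strings are placeholders. The
cut is made on a UTF-8 character boundary so the result stays valid
text.

```python
# Assumed limit; Xapian's disk backends reject longer terms
MAX_TERM_BYTES = 245


def mid_term(prefix: str, mid: str, limit: int = MAX_TERM_BYTES) -> str:
    """Hypothetical helper: build prefix+mid, truncated to `limit` bytes.

    Long MIDs that share a leading part may collide after truncation;
    as in v2, duplicate resolution is expected to sort that out.
    """
    raw = (prefix + mid).encode("utf-8")
    if len(raw) <= limit:
        return prefix + mid
    # The input was valid UTF-8, so only a partial trailing character
    # can be invalid; errors="ignore" drops just that fragment.
    return raw[:limit].decode("utf-8", errors="ignore")
```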
Traditionally we've been more lax on parsing Message-Id
and allow it without the angle brackets. We've always been
strict on References, and we can't let it become pointlessly
large when some MUA decides to use HTML-escaped angle
brackets ("&lt;", "&gt;").
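The lax/strict split might look like this in Python. `mid_clean` and
`strict_refs` are hypothetical names. Under the strict rule,
HTML-escaped brackets yield no IDs at all.

```python
import re

MID_RE = re.compile(r"<([^>]+)>")


def mid_clean(value: str) -> str:
    """Lax: accept a Message-Id with or without angle brackets."""
    value = value.strip()
    m = MID_RE.search(value)
    # Fall back to the bare value when no brackets are present
    return m.group(1) if m else value


def strict_refs(value: str) -> list:
    """Strict: only IDs inside literal '<' ... '>' count."""
    return MID_RE.findall(value)
```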
It's shorter and more convenient, here.
We'll be using a more consistent API for extracting Message-IDs
from various headers.
Using update-copyrights from gnulib.
While we're at it, use the SPDX identifier for AGPL-3.0+ to
ease mechanical processing.
Reported-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
https://public-inbox.org/meta/CACBZZX5Gnow08r=0A1J_kt3a=zpGyMfvsqu8nAN7kacNnDm+dg@mail.gmail.com/
Based on reading RFC 3986, it seems '@', ':', '!', '$', '&',
"'", '(', ')', '*', '+', ',', ';', '=' are all allowed
in path-absolute where we have the Message-ID.
In any case, '@' seems fairly common in path components
nowadays and is too common in Message-IDs to be worth escaping.
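A sketch of escaping a Message-ID for use as one path segment, using
Python's `urllib.parse.quote`. The safe set is RFC 3986 `pchar`
without the unreserved characters, which `quote` already leaves alone.
Excluding '/' is a design assumption, so that an ID cannot add path
segments. `mid_escape` is a hypothetical name.

```python
from urllib.parse import quote

# RFC 3986: pchar = unreserved / pct-encoded / sub-delims / ":" / "@"
# quote() never escapes unreserved chars, so list sub-delims, ':', '@'.
PCHAR_SAFE = "!$&'()*+,;=:@"


def mid_escape(mid: str) -> str:
    # '/' and '%' are escaped, so "%" becomes "%25"
    return quote(mid, safe=PCHAR_SAFE)
```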
Apparently there are some really screwed up In-Reply-To
fields out there.
Message-IDs should not be MIME encoded, but in case they are,
use the raw form for compatibility with ssoma and possibly
other tools. This prevents a potential problem where a
malicious client could confuse our storage layer into indexing
incorrect contents.
Hopefully this gives new hackers a better overview of
how the components relate to each other.
We use it as a general compressor for identifiers such as
subject paths, so using the "mid_" prefix probably is not
appropriate.
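A sketch of such a general identifier compressor. The 40-character
threshold, the `[\w-]` "plain" character class and the choice of
SHA-1 are assumptions for illustration, not confirmed details. Short,
plain identifiers pass through unchanged. Anything else becomes a
fixed-length hex digest that is safe in paths and HTML anchors.

```python
import hashlib
import re

# Hypothetical threshold: longer identifiers are always hashed
MAX_PLAIN_LEN = 40
# Characters assumed safe verbatim: ASCII letters, digits, '_' and '-'
PLAIN_RE = re.compile(r"[\w-]+", re.ASCII)


def id_compress(ident: str, force: bool = False) -> str:
    """Return ident if short and plain, else its SHA-1 hex digest."""
    if force or len(ident) > MAX_PLAIN_LEN or not PLAIN_RE.fullmatch(ident):
        return hashlib.sha1(ident.encode("utf-8")).hexdigest()
    return ident
```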
In the future, it should be possible to use this:
git ls-files | UPDATE_COPYRIGHT_HOLDER='all contributors' \
UPDATE_COPYRIGHT_USE_INTERVALS=2 \
xargs /path/to/gnulib/build-aux/update-copyright
We screwed up and needed to fix URL generation with '<>'
in them. Regardless, users may attempt to copy and paste
URLs with '<>' in them; do not punish them for that.
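A sketch of tolerating pasted brackets when a URL path component is
mapped back to a Message-ID. The function name is hypothetical. It
strips exactly one surrounding pair of '<' '>', whether raw or
percent-encoded.

```python
from urllib.parse import unquote


def mid_from_url_component(component: str) -> str:
    # Decode %3C/%3E (and any other escapes) first
    mid = unquote(component)
    # Strip one enclosing pair of angle brackets, if present
    if len(mid) >= 2 and mid.startswith("<") and mid.endswith(">"):
        mid = mid[1:-1]
    return mid
```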
This is necessary for some mailers which include comment text
in the In-Reply-To header; merely assuming there is nothing
outside of '<>', as we were doing, is not enough.
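A sketch contrasting the old assumption with extracting every
bracketed token. Both function names are hypothetical.

```python
import re

MID_RE = re.compile(r"<([^>]+)>")


def naive_in_reply_to(value: str) -> str:
    """Old assumption: the whole header is a single <id>."""
    return value.strip().removeprefix("<").removesuffix(">")


def in_reply_to(value: str) -> list:
    """Extract every <...> token, ignoring surrounding comment text."""
    return MID_RE.findall(value)
```

For a made-up header such as `Message from Joe <abc@x> of "Mon, 1 Jan
2018"`, `in_reply_to` finds `abc@x`. The naive version returns the
whole string unchanged.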
Consistently name mid_* functions as verbs.
Valid URLs do not make valid anchor ids.
Some HTTP servers (apache2 2.2.22-13+deb7u5 on my system)
apparently do not handle "%25" correctly. I'm not yet sure if
it's something weird with my rewrite rules or what...
More to come later.
Quit repeating ourselves and use a common MID module
instead.