From: "Eric W. Biederman" <ebiederm@xmission.com>
To: Eric Wong <e@80x24.org>
Cc: meta@public-inbox.org
Subject: Re: [PATCH] disallow NUL characters in Message-ID and List-Id
Date: Mon, 27 Nov 2023 17:08:05 -0600
Message-ID: <87leailrqi.fsf@email.froward.int.ebiederm.org>
In-Reply-To: <20231127222059.M964164@dcvr> (Eric Wong's message of "Mon, 27 Nov 2023 22:20:59 +0000")
Eric Wong <e@80x24.org> writes:
> While MTAs seem to stop '\0' from appearing in headers, users
> fetching archives via git remain susceptible to having '\0' land
> in archives. So we'll filter them out of Xapian and SQLite DBs
> to avoid interoperability problems with CLI tools since there are no
> known messages in lore or any of my archives which feature them.
>
> Avoiding '\0' will ensure all indexed Message-IDs and List-Ids
> can be specified from the command-line (although some characters
> will still require $(printf) contortions).
>
> As with Message-ID, List-Id fields with /\n\t\r/ characters will
> also be stripped for indexing. I will assume whatever went wrong
> with the References: header in
> <https://public-inbox.org/git/656C30A1EFC89F6B2082D9B6@localhost/raw>
> could also happen to the List-Id header.
>
> This is inspired by commit aca47e05a6026c12c768753c87e6ff769ef6bee4
> (Import: Don't copy nulls from emails into git, 2018-07-07)

That seems reasonable to me.

Eric

> ---
> lib/PublicInbox/MID.pm | 2 +-
> lib/PublicInbox/SearchIdx.pm | 1 +
> 2 files changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/lib/PublicInbox/MID.pm b/lib/PublicInbox/MID.pm
> index 97cf3a54..36c05855 100644
> --- a/lib/PublicInbox/MID.pm
> +++ b/lib/PublicInbox/MID.pm
> @@ -115,7 +115,7 @@ sub uniq_mids ($;$) {
>  	my @ret;
>  	$seen ||= {};
>  	foreach my $mid (@$mids) {
> -		$mid =~ tr/\n\t\r//d;
> +		$mid =~ tr/\n\t\r\0//d;
>  		if (length($mid) > MAX_MID_SIZE) {
>  			warn "Message-ID: <$mid> too long, truncating\n";
>  			$mid = substr($mid, 0, MAX_MID_SIZE);
> diff --git a/lib/PublicInbox/SearchIdx.pm b/lib/PublicInbox/SearchIdx.pm
> index 32598b7c..f569428c 100644
> --- a/lib/PublicInbox/SearchIdx.pm
> +++ b/lib/PublicInbox/SearchIdx.pm
> @@ -414,6 +414,7 @@ sub index_list_id ($$$) {
>  	for my $l ($hdr->header_raw('List-Id')) {
>  		$l =~ /<([^>]+)>/ or next;
>  		my $lid = lc $1;
> +		$lid =~ tr/\n\t\r\0//d; # same rules as Message-ID
>  		$doc->add_boolean_term('G' . $lid);
>  		index_phrase($self, $lid, 1, 'XL'); # probabilistic
>  	}
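
For readers who want to see the stripping rule in isolation, here is a
minimal sketch of the tr///d deletion the quoted patch applies to both
Message-ID and List-Id values. The helper name strip_id_chars and the
sample input are hypothetical and are not part of public-inbox; only the
tr/\n\t\r\0//d expression is taken from the patch itself.

```perl
use strict;
use warnings;

# Hypothetical helper for illustration only (not in public-inbox).
# It applies the same deletion the patch uses in uniq_mids and
# index_list_id: remove LF, TAB, CR and NUL from the identifier.
sub strip_id_chars {
	my ($id) = @_;
	$id =~ tr/\n\t\r\0//d;
	return $id;
}

# Made-up input containing each of the four stripped characters.
my $raw = "foo\0bar\t\@example.com\r\n";
my $clean = strip_id_chars($raw);
print length($raw), ' -> ', length($clean), " bytes\n";
```

A NUL byte cannot be passed inside a single command-line argument,
because argv entries are NUL-terminated strings, which is presumably why
'\0' matters for the CLI tools mentioned in the commit message.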