user/dev discussion of public-inbox itself
From: "Eric W. Biederman" <ebiederm@xmission.com>
To: Eric Wong <e@80x24.org>
Cc: meta@public-inbox.org
Subject: Re: [PATCH] disallow NUL characters in Message-ID and List-Id
Date: Mon, 27 Nov 2023 17:08:05 -0600
Message-ID: <87leailrqi.fsf@email.froward.int.ebiederm.org>
In-Reply-To: <20231127222059.M964164@dcvr> (Eric Wong's message of "Mon, 27 Nov 2023 22:20:59 +0000")

Eric Wong <e@80x24.org> writes:

> While MTAs seem to stop '\0' from appearing in headers, users
> fetching archives via git remain susceptible to having '\0' land
> in archives.  So we'll filter them out of Xapian and SQLite DBs
> to avoid interoperability problems with CLI tools, since there are no
> known messages in lore or any of my archives which feature them.
>
> Avoiding '\0' will ensure all indexed Message-IDs and List-Ids
> can be specified from the command-line (although some characters
> will still require $(printf) contortions).
>
> As with Message-ID, List-Id fields with /\n\t\r/ characters will
> also be stripped for indexing.  I will assume whatever went wrong
> with the References: header in
> <https://public-inbox.org/git/656C30A1EFC89F6B2082D9B6@localhost/raw>
> could also happen to the List-Id header.
>
> This is inspired by commit aca47e05a6026c12c768753c87e6ff769ef6bee4
> (Import: Don't copy nulls from emails into git, 2018-07-07)

That seems reasonable to me.

Eric


> ---
>  lib/PublicInbox/MID.pm       | 2 +-
>  lib/PublicInbox/SearchIdx.pm | 1 +
>  2 files changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/lib/PublicInbox/MID.pm b/lib/PublicInbox/MID.pm
> index 97cf3a54..36c05855 100644
> --- a/lib/PublicInbox/MID.pm
> +++ b/lib/PublicInbox/MID.pm
> @@ -115,7 +115,7 @@ sub uniq_mids ($;$) {
>  	my @ret;
>  	$seen ||= {};
>  	foreach my $mid (@$mids) {
> -		$mid =~ tr/\n\t\r//d;
> +		$mid =~ tr/\n\t\r\0//d;
>  		if (length($mid) > MAX_MID_SIZE) {
>  			warn "Message-ID: <$mid> too long, truncating\n";
>  			$mid = substr($mid, 0, MAX_MID_SIZE);
> diff --git a/lib/PublicInbox/SearchIdx.pm b/lib/PublicInbox/SearchIdx.pm
> index 32598b7c..f569428c 100644
> --- a/lib/PublicInbox/SearchIdx.pm
> +++ b/lib/PublicInbox/SearchIdx.pm
> @@ -414,6 +414,7 @@ sub index_list_id ($$$) {
>  	for my $l ($hdr->header_raw('List-Id')) {
>  		$l =~ /<([^>]+)>/ or next;
>  		my $lid = lc $1;
> +		$lid =~ tr/\n\t\r\0//d; # same rules as Message-ID
>  		$doc->add_boolean_term('G' . $lid);
>  		index_phrase($self, $lid, 1, 'XL'); # probabilistic
>  	}
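
For anyone skimming the patch, here is a minimal standalone sketch of
the stripping that both hunks perform.  The helper name
strip_unindexable and the sample header values are hypothetical and not
part of public-inbox; only the tr/\n\t\r\0//d expression, the lc call,
and the /<([^>]+)>/ match are taken from the diff above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not in public-inbox): delete newline, tab,
# carriage return and NUL, the same set the patch strips before indexing.
sub strip_unindexable {
	my ($s) = @_;
	$s =~ tr/\n\t\r\0//d;
	return $s;
}

# Made-up sample values, for illustration only.
my $mid = strip_unindexable("foo\0bar\@example.com\n");

# Mirror index_list_id: take the part inside <>, lowercase it, then strip.
my ($lid) = "Example List <List\0.Example.COM>" =~ /<([^>]+)>/;
$lid = strip_unindexable(lc $lid);

print "$mid\n"; # prints "foobar@example.com"
print "$lid\n"; # prints "list.example.com"
```

With the NUL gone, both values can be passed as ordinary command-line
arguments, which cannot carry an embedded '\0'.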
