From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on dcvr.yhbt.net
X-Spam-Level:
X-Spam-ASN: AS6315 166.70.0.0/16
X-Spam-Status: No, score=-3.7 required=3.0 tests=AWL,BAYES_00,
 RCVD_IN_DNSWL_LOW,SPF_HELO_NONE,SPF_PASS,T_SCC_BODY_TEXT_LINE
 shortcircuit=no autolearn=ham autolearn_force=no version=3.4.6
Received: from out03.mta.xmission.com (out03.mta.xmission.com [166.70.13.233])
 (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
 key-exchange ECDHE (P-256) server-signature RSA-PSS (4096 bits) server-digest SHA256)
 (No client certificate requested)
 by dcvr.yhbt.net (Postfix) with ESMTPS id 63F761F406;
 Mon, 27 Nov 2023 23:08:43 +0000 (UTC)
Received: from in01.mta.xmission.com ([166.70.13.51]:34408)
 by out03.mta.xmission.com with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384
 (Exim 4.93)
 (envelope-from )
 id 1r7kiY-003rlQ-5o; Mon, 27 Nov 2023 16:08:42 -0700
Received: from ip68-227-168-167.om.om.cox.net ([68.227.168.167]:50176 helo=email.froward.int.ebiederm.org.xmission.com)
 by in01.mta.xmission.com with esmtpsa (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384
 (Exim 4.93)
 (envelope-from )
 id 1r7kiX-007Kf6-1A; Mon, 27 Nov 2023 16:08:41 -0700
From: "Eric W. Biederman"
To: Eric Wong
Cc: meta@public-inbox.org
References: <20231127222059.M964164@dcvr>
Date: Mon, 27 Nov 2023 17:08:05 -0600
In-Reply-To: <20231127222059.M964164@dcvr> (Eric Wong's message of "Mon, 27 Nov 2023 22:20:59 +0000")
Message-ID: <87leailrqi.fsf@email.froward.int.ebiederm.org>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/28.2 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain
X-XM-SPF: eid=1r7kiX-007Kf6-1A;;;mid=<87leailrqi.fsf@email.froward.int.ebiederm.org>;;;hst=in01.mta.xmission.com;;;ip=68.227.168.167;;;frm=ebiederm@xmission.com;;;spf=pass
X-XM-AID: U2FsdGVkX1/zZxD5vT1titLsG18CoFhVVeltGNKIy/Q=
X-SA-Exim-Connect-IP: 68.227.168.167
X-SA-Exim-Mail-From: ebiederm@xmission.com
Subject: Re: [PATCH] disallow NUL characters in Message-ID and List-Id
X-SA-Exim-Version: 4.2.1 (built Sat, 08 Feb 2020 21:53:50 +0000)
X-SA-Exim-Scanned: Yes (on in01.mta.xmission.com)
List-Id:

Eric Wong writes:

> While MTAs seem to stop '\0' from appearing in headers, users
> fetching archives via git remain susceptible to having '\0' land
> in archives. So we'll filter them out of Xapian and SQLite DBs
> to avoid interopability problems with CLI tools since there's no
> known messages in lore or any of my archives which feature them.
>
> Avoiding '\0' will ensure all indexed Message-IDs and List-Ids
> can be specified from the command-line (although some characters
> will still require $(printf) contortions).
>
> As with Message-ID, List-Id fields with /\n\t\r/ characters will
> also be stripped for indexing. I will assume whatever went wrong
> with the References: header in
>
> could also happen to the List-Id header.
>
> This is inspired by commit aca47e05a6026c12c768753c87e6ff769ef6bee4
> (Import: Don't copy nulls from emails into git, 2018-07-07)

That seems reasonable to me.
Eric

> ---
>  lib/PublicInbox/MID.pm       | 2 +-
>  lib/PublicInbox/SearchIdx.pm | 1 +
>  2 files changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/lib/PublicInbox/MID.pm b/lib/PublicInbox/MID.pm
> index 97cf3a54..36c05855 100644
> --- a/lib/PublicInbox/MID.pm
> +++ b/lib/PublicInbox/MID.pm
> @@ -115,7 +115,7 @@ sub uniq_mids ($;$) {
>  	my @ret;
>  	$seen ||= {};
>  	foreach my $mid (@$mids) {
> -		$mid =~ tr/\n\t\r//d;
> +		$mid =~ tr/\n\t\r\0//d;
>  		if (length($mid) > MAX_MID_SIZE) {
>  			warn "Message-ID: <$mid> too long, truncating\n";
>  			$mid = substr($mid, 0, MAX_MID_SIZE);
> diff --git a/lib/PublicInbox/SearchIdx.pm b/lib/PublicInbox/SearchIdx.pm
> index 32598b7c..f569428c 100644
> --- a/lib/PublicInbox/SearchIdx.pm
> +++ b/lib/PublicInbox/SearchIdx.pm
> @@ -414,6 +414,7 @@ sub index_list_id ($$$) {
>  	for my $l ($hdr->header_raw('List-Id')) {
>  		$l =~ /<([^>]+)>/ or next;
>  		my $lid = lc $1;
> +		$lid =~ tr/\n\t\r\0//d; # same rules as Message-ID
>  		$doc->add_boolean_term('G' . $lid);
>  		index_phrase($self, $lid, 1, 'XL'); # probabilistic
>  	}