user/dev discussion of public-inbox itself
 help / color / mirror / code / Atom feed
Search results ordered by [date|relevance]  view[summary|nested|Atom feed]
thread overview below | download mbox.gz: |
* Re: [PATCH] disallow NUL characters in Message-ID and List-Id
  2023-11-27 22:20  4% [PATCH] disallow NUL characters in Message-ID and List-Id Eric Wong
@ 2023-11-27 23:08  0% ` Eric W. Biederman
  0 siblings, 0 replies; 5+ results
From: Eric W. Biederman @ 2023-11-27 23:08 UTC (permalink / raw)
  To: Eric Wong; +Cc: meta

Eric Wong <e@80x24.org> writes:

> While MTAs seem to stop '\0' from appearing in headers, users
> fetching archives via git remain susceptible to having '\0' land
> in archives.  So we'll filter them out of Xapian and SQLite DBs
> to avoid interopability problems with CLI tools since there's no
> known messages in lore or any of my archives which feature them.
>
> Avoiding '\0' will ensure all indexed Message-IDs and List-Ids
> can be specified from the command-line (although some characters
> will still require $(printf) contortions).
>
> As with Message-ID, List-Id fields with /\n\t\r/ characters will
> also be stripped for indexing.  I will assume whatever went wrong
> with the References: header in
> <https://public-inbox.org/git/656C30A1EFC89F6B2082D9B6@localhost/raw>
> could also happen to the List-Id header.
>
> This is inspired by commit aca47e05a6026c12c768753c87e6ff769ef6bee4
> (Import: Don't copy nulls from emails into git, 2018-07-07)

That seems reasonable to me.

Eric


> ---
>  lib/PublicInbox/MID.pm       | 2 +-
>  lib/PublicInbox/SearchIdx.pm | 1 +
>  2 files changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/lib/PublicInbox/MID.pm b/lib/PublicInbox/MID.pm
> index 97cf3a54..36c05855 100644
> --- a/lib/PublicInbox/MID.pm
> +++ b/lib/PublicInbox/MID.pm
> @@ -115,7 +115,7 @@ sub uniq_mids ($;$) {
>  	my @ret;
>  	$seen ||= {};
>  	foreach my $mid (@$mids) {
> -		$mid =~ tr/\n\t\r//d;
> +		$mid =~ tr/\n\t\r\0//d;
>  		if (length($mid) > MAX_MID_SIZE) {
>  			warn "Message-ID: <$mid> too long, truncating\n";
>  			$mid = substr($mid, 0, MAX_MID_SIZE);
> diff --git a/lib/PublicInbox/SearchIdx.pm b/lib/PublicInbox/SearchIdx.pm
> index 32598b7c..f569428c 100644
> --- a/lib/PublicInbox/SearchIdx.pm
> +++ b/lib/PublicInbox/SearchIdx.pm
> @@ -414,6 +414,7 @@ sub index_list_id ($$$) {
>  	for my $l ($hdr->header_raw('List-Id')) {
>  		$l =~ /<([^>]+)>/ or next;
>  		my $lid = lc $1;
> +		$lid =~ tr/\n\t\r\0//d; # same rules as Message-ID
>  		$doc->add_boolean_term('G' . $lid);
>  		index_phrase($self, $lid, 1, 'XL'); # probabilistic
>  	}

^ permalink raw reply	[relevance 0%]

* [PATCH] disallow NUL characters in Message-ID and List-Id
@ 2023-11-27 22:20  4% Eric Wong
  2023-11-27 23:08  0% ` Eric W. Biederman
  0 siblings, 1 reply; 5+ results
From: Eric Wong @ 2023-11-27 22:20 UTC (permalink / raw)
  To: meta; +Cc: Eric W. Biederman

While MTAs seem to stop '\0' from appearing in headers, users
fetching archives via git remain susceptible to having '\0' land
in archives.  So we'll filter them out of Xapian and SQLite DBs
to avoid interopability problems with CLI tools since there's no
known messages in lore or any of my archives which feature them.

Avoiding '\0' will ensure all indexed Message-IDs and List-Ids
can be specified from the command-line (although some characters
will still require $(printf) contortions).

As with Message-ID, List-Id fields with /\n\t\r/ characters will
also be stripped for indexing.  I will assume whatever went wrong
with the References: header in
<https://public-inbox.org/git/656C30A1EFC89F6B2082D9B6@localhost/raw>
could also happen to the List-Id header.

This is inspired by commit aca47e05a6026c12c768753c87e6ff769ef6bee4
(Import: Don't copy nulls from emails into git, 2018-07-07)
---
 lib/PublicInbox/MID.pm       | 2 +-
 lib/PublicInbox/SearchIdx.pm | 1 +
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/lib/PublicInbox/MID.pm b/lib/PublicInbox/MID.pm
index 97cf3a54..36c05855 100644
--- a/lib/PublicInbox/MID.pm
+++ b/lib/PublicInbox/MID.pm
@@ -115,7 +115,7 @@ sub uniq_mids ($;$) {
 	my @ret;
 	$seen ||= {};
 	foreach my $mid (@$mids) {
-		$mid =~ tr/\n\t\r//d;
+		$mid =~ tr/\n\t\r\0//d;
 		if (length($mid) > MAX_MID_SIZE) {
 			warn "Message-ID: <$mid> too long, truncating\n";
 			$mid = substr($mid, 0, MAX_MID_SIZE);
diff --git a/lib/PublicInbox/SearchIdx.pm b/lib/PublicInbox/SearchIdx.pm
index 32598b7c..f569428c 100644
--- a/lib/PublicInbox/SearchIdx.pm
+++ b/lib/PublicInbox/SearchIdx.pm
@@ -414,6 +414,7 @@ sub index_list_id ($$$) {
 	for my $l ($hdr->header_raw('List-Id')) {
 		$l =~ /<([^>]+)>/ or next;
 		my $lid = lc $1;
+		$lid =~ tr/\n\t\r\0//d; # same rules as Message-ID
 		$doc->add_boolean_term('G' . $lid);
 		index_phrase($self, $lid, 1, 'XL'); # probabilistic
 	}

^ permalink raw reply related	[relevance 4%]

* Re: [PATCH] Import: Don't copy nulls from emails into git
  2018-07-08  0:07  7%             ` Eric Wong
@ 2018-07-08  1:52  7%               ` Eric W. Biederman
  0 siblings, 0 replies; 5+ results
From: Eric W. Biederman @ 2018-07-08  1:52 UTC (permalink / raw)
  To: Eric Wong; +Cc: meta

Eric Wong <e@80x24.org> writes:

> "Eric W. Biederman" <ebiederm@xmission.com> wrote:
>> 
>> Recently I ran git --git-dir=lkml/git/1.git fsck
>> and it reported:
>> > warning in commit 299dbd50b6995c6debe2275f0df984ce697fb4cc: nulInCommit: NULL byte inthe commit object body
>> 
>> Which I found quite scary.  Nulls in the wrong place have a bad tendency
>> to make programs misbehave.
>
> Thanks.  In this case, it's a bit weird for fast-import to
> accept this...
>
>> +	# Mime decoding can create nulls replace them with spaces to protect git
>> +	$subject =~ s/\0/ /;
>
> \0 could appear more than once, so it needs a 'g', at least.
> I squashed in a change to use "tr/\0/ /" instead
> (tr// should be faster)

Thank you.

I had thought of doing that but I forgot to come back and see if there
were any tweaks like that I needed to do.

Sigh. My perl is currently quite rusty.

Eric


^ permalink raw reply	[relevance 7%]

* Re: [PATCH] Import: Don't copy nulls from emails into git
  2018-07-07 18:22  6%           ` [PATCH] Import: Don't copy nulls from emails into git Eric W. Biederman
@ 2018-07-08  0:07  7%             ` Eric Wong
  2018-07-08  1:52  7%               ` Eric W. Biederman
  0 siblings, 1 reply; 5+ results
From: Eric Wong @ 2018-07-08  0:07 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: meta

"Eric W. Biederman" <ebiederm@xmission.com> wrote:
> 
> Recently I ran git --git-dir=lkml/git/1.git fsck
> and it reported:
> > warning in commit 299dbd50b6995c6debe2275f0df984ce697fb4cc: nulInCommit: NULL byte inthe commit object body
> 
> Which I found quite scary.  Nulls in the wrong place have a bad tendency
> to make programs misbehave.

Thanks.  In this case, it's a bit weird for fast-import to
accept this...

> +	# Mime decoding can create nulls replace them with spaces to protect git
> +	$subject =~ s/\0/ /;

\0 could appear more than once, so it needs a 'g', at least.
I squashed in a change to use "tr/\0/ /" instead
(tr// should be faster)

^ permalink raw reply	[relevance 7%]

* [PATCH] Import: Don't copy nulls from emails into git
  @ 2018-07-07 18:22  6%           ` Eric W. Biederman
  2018-07-08  0:07  7%             ` Eric Wong
  0 siblings, 1 reply; 5+ results
From: Eric W. Biederman @ 2018-07-07 18:22 UTC (permalink / raw)
  To: Eric Wong; +Cc: meta


Recently I ran git --git-dir=lkml/git/1.git fsck
and it reported:
> warning in commit 299dbd50b6995c6debe2275f0df984ce697fb4cc: nulInCommit: NULL byte inthe commit object body

Which I found quite scary.  Nulls in the wrong place have a bad tendency
to make programs misbehave.

It turns out someone had placed "=?iso-8859-1?q?=00?=" at the end of
their subject line.  Which is the mime encoding for NULL.  Email::Mime
had correctly decoded the header, and then public-inbox had simply
copied the contents of the header into the subject line of the git
commit.

To prevent that from causing problems replace nulls in such subject
lines with spaces.

Signed-off-by: Eric Biederman <ebiederm@xmission.com>
---
 lib/PublicInbox/Import.pm |  2 ++
 t/nulsubject.t            | 33 +++++++++++++++++++++++++++++++++
 2 files changed, 35 insertions(+)
 create mode 100644 t/nulsubject.t

diff --git a/lib/PublicInbox/Import.pm b/lib/PublicInbox/Import.pm
index 250a2db31e97..8c1819209e58 100644
--- a/lib/PublicInbox/Import.pm
+++ b/lib/PublicInbox/Import.pm
@@ -405,6 +405,8 @@ sub add {
 		print $w "reset $ref\n" or wfail;
 	}
 
+	# Mime decoding can create nulls replace them with spaces to protect git
+	$subject =~ s/\0/ /;
 	utf8::encode($subject);
 	print $w "commit $ref\nmark :$commit\n",
 		"author $name <$email> $author_time_raw\n",
diff --git a/t/nulsubject.t b/t/nulsubject.t
new file mode 100644
index 000000000000..bb05be8589e7
--- /dev/null
+++ b/t/nulsubject.t
@@ -0,0 +1,33 @@
+# Copyright (C) 2016-2018 all contributors <meta@public-inbox.org>
+# License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
+use strict;
+use warnings;
+use Test::More;
+use File::Temp qw/tempdir/;
+
+use_ok 'PublicInbox::Import';
+use_ok 'PublicInbox::Git';
+my $tmpdir = tempdir('pi-nulsubject-XXXXXX', TMPDIR => 1, CLEANUP => 1);
+my $git_dir = "$tmpdir/a.git";
+
+{
+	is(system(qw(git init -q --bare), $git_dir), 0, 'git init ok');
+	my $git = PublicInbox::Git->new($git_dir);
+	my $im = PublicInbox::Import->new($git, 'testbox', 'test@example');
+	$im->add(Email::MIME->create(
+		header => [
+			From => 'a@example.com',
+			To => 'b@example.com',
+			'Content-Type' => 'text/plain',
+			Subject => ' A subject line with a null =?iso-8859-1?q?=00?= see!',
+			'Message-ID' => '<null-test.a@example.com>',
+		],
+		body => "hello world\n",
+	));
+	$im->done;
+	is(system(qw(git --git-dir), $git_dir, 'fsck', '--strict'), 0, 'git fsck ok');
+}
+
+done_testing();
+
+1;
-- 
2.17.1


^ permalink raw reply related	[relevance 6%]

Results 1-5 of 5 | reverse | options above
-- pct% links below jump to the message on this page, permalinks otherwise --
2018-07-05  5:40     Warnings from git fsck after lkml import Eric W. Biederman
2018-07-05 23:13     ` Eric Wong
2018-07-06  0:36       ` Eric W. Biederman
2018-07-06  3:47         ` Eric W. Biederman
2018-07-06 21:32           ` [PATCH] MsgTime.pm: Use strptime to compute the time zone Eric W. Biederman
2018-07-06 22:22             ` Eric Wong
2018-07-07 18:22  6%           ` [PATCH] Import: Don't copy nulls from emails into git Eric W. Biederman
2018-07-08  0:07  7%             ` Eric Wong
2018-07-08  1:52  7%               ` Eric W. Biederman
2023-11-27 22:20  4% [PATCH] disallow NUL characters in Message-ID and List-Id Eric Wong
2023-11-27 23:08  0% ` Eric W. Biederman

Code repositories for project(s) associated with this public inbox

	https://80x24.org/public-inbox.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).