user/dev discussion of public-inbox itself
* Re: [WIP 1/?] v2writable: index Message-IDs w/ spaces properly
From: Eric Wong @ 2020-04-01  0:05 UTC (permalink / raw)
  To: meta

Eric Wong <e@yhbt.net> wrote:
> Message-IDs can apparently contain spaces and other weird
> characters.  Ensure we pass those properly to shard subprocesses
> when importing messages in parallel mode.
> 
> Our NNTP parser does not deal with spaces in the Message-ID,
> yet, and I don't expect most NNTP clients to, either.

Nor does Net::NNTP on the client side...
But regardless of what happens with Message-IDs on the NNTP
side, this patch remains correct and fixes an indexing
problem when Message-IDs contain spaces.

This bug was exacerbated by the changes to pass date and
timestamps from the git commit into the shard when mirroring,
but has always been with us when using multi-process indexing.
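
To make the failure concrete, here is a minimal Perl sketch (not the
actual SearchIdxShard.pm code) of the space-separated header line sent
to a shard subprocess. parse_old and parse_new are hypothetical helpers
mirroring the split() calls before and after the patch; the Message-ID
resembles the one added to t/v2writable.t, and all other field values
are made up for illustration.

```perl
use strict;
use warnings;

# Illustrative values only; mid contains spaces like the test case.
my %smsg = (bytes => 123, num => 7, blob => 'deadbeef',
	mid => 'space@ (NXDOMAIN)', ds => 1585644540, ts => 1585644540);

# Old layout: mid sits in the middle and split() is unlimited, so
# every space inside mid shifts the fields that follow it.
sub parse_old {
	my ($line) = @_;
	my ($bytes, $num, $blob, $mid, $ds, $ts) = split(/ /, $line);
	return { bytes => $bytes, num => $num, blob => $blob,
		mid => $mid, ds => $ds, ts => $ts };
}

# New layout: mid is last and split() stops after 6 fields, so the
# remainder of the line (spaces included) becomes mid.
sub parse_new {
	my ($line) = @_;
	my ($bytes, $num, $blob, $ds, $ts, $mid) = split(/ /, $line, 6);
	return { bytes => $bytes, num => $num, blob => $blob,
		mid => $mid, ds => $ds, ts => $ts };
}

my $old = parse_old(join(' ', @smsg{qw(bytes num blob mid ds ts)}));
my $new = parse_new(join(' ', @smsg{qw(bytes num blob ds ts mid)}));
print "old: mid=[$old->{mid}] ds=[$old->{ds}]\n";
print "new: mid=[$new->{mid}] ds=[$new->{ds}]\n";
```

With the old layout, mid is truncated to "space@" and "(NXDOMAIN)"
lands in ds; with the new layout every field comes back intact.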

> diff --git a/t/v2writable.t b/t/v2writable.t
> index cdcfe4d0..8167e4de 100644
> --- a/t/v2writable.t
> +++ b/t/v2writable.t

> @@ -175,8 +180,12 @@ EOF
>  		is($uniq{$mid}++, 0, "MID for $num is unique in XOVER");
>  		is_deeply($n->xhdr('Message-ID', $num),
>  			 { $num => $mid }, "XHDR lookup OK on num $num");
> +
> +		# FIXME NNTP.pm doesn't handle spaces in Message-ID
> +		next if $mid =~ / /;
> +

Pushed with the following squashed in:

diff --git a/t/v2writable.t b/t/v2writable.t
index 8167e4de..66d5663e 100644
--- a/t/v2writable.t
+++ b/t/v2writable.t
@@ -181,7 +181,8 @@ EOF
 		is_deeply($n->xhdr('Message-ID', $num),
 			 { $num => $mid }, "XHDR lookup OK on num $num");
 
-		# FIXME NNTP.pm doesn't handle spaces in Message-ID
+		# FIXME PublicInbox::NNTP (server) doesn't handle spaces in
+		# Message-ID, but neither does Net::NNTP (client)
 		next if $mid =~ / /;
 
 		is_deeply($n->xhdr('Message-ID', $mid),


* [WIP 1/?] v2writable: index Message-IDs w/ spaces properly
From: Eric Wong @ 2020-03-31  8:49 UTC (permalink / raw)
  To: meta

Message-IDs can apparently contain spaces and other weird
characters.  Ensure we pass those properly to shard subprocesses
when importing messages in parallel mode.

Our NNTP parser does not deal with spaces in the Message-ID,
yet, and I don't expect most NNTP clients to, either.
---
 lib/PublicInbox/SearchIdxShard.pm |  8 +++++---
 t/v2writable.t                    | 11 ++++++++++-
 2 files changed, 15 insertions(+), 4 deletions(-)

diff --git a/lib/PublicInbox/SearchIdxShard.pm b/lib/PublicInbox/SearchIdxShard.pm
index 1ea01095..06bcd403 100644
--- a/lib/PublicInbox/SearchIdxShard.pm
+++ b/lib/PublicInbox/SearchIdxShard.pm
@@ -69,8 +69,9 @@ sub shard_worker_loop ($$$$$) {
 			$self->remove_by_oid($oid, $mid);
 		} else {
 			chomp $line;
-			my ($bytes, $num, $blob, $mid, $ds, $ts) =
-							split(/ /, $line);
+			# n.b. $mid may contain spaces(!)
+			my ($bytes, $num, $blob, $ds, $ts, $mid) =
+							split(/ /, $line, 6);
 			$self->begin_txn_lazy;
 			my $n = read($r, my $msg, $bytes) or die "read: $!\n";
 			$n == $bytes or die "short read: $n != $bytes\n";
@@ -93,7 +94,8 @@ sub shard_worker_loop ($$$$$) {
 sub index_raw {
 	my ($self, $msgref, $mime, $smsg) = @_;
 	if (my $w = $self->{w}) {
-		print $w join(' ', @$smsg{qw(bytes num blob mid ds ts)}),
+		# mid must be last, it can contain spaces (but not LF)
+		print $w join(' ', @$smsg{qw(bytes num blob ds ts mid)}),
 			"\n", $$msgref or die "failed to write shard $!\n";
 	} else {
 		$$msgref = undef;
diff --git a/t/v2writable.t b/t/v2writable.t
index cdcfe4d0..8167e4de 100644
--- a/t/v2writable.t
+++ b/t/v2writable.t
@@ -109,6 +109,11 @@ if ('ensure git configs are correct') {
 	@mids = $mime->header_obj->header_raw('Message-Id');
 	like($mids[0], $sane_mid, 'mid was generated');
 	is(scalar(@mids), 1, 'new generated');
+
+	@warn = ();
+	$mime->header_set('Message-Id', '<space@ (NXDOMAIN) >');
+	ok($im->add($mime), 'message added with space in Message-Id');
+	is_deeply([], \@warn);
 }
 
 {
@@ -175,8 +180,12 @@ EOF
 		is($uniq{$mid}++, 0, "MID for $num is unique in XOVER");
 		is_deeply($n->xhdr('Message-ID', $num),
 			 { $num => $mid }, "XHDR lookup OK on num $num");
+
+		# FIXME NNTP.pm doesn't handle spaces in Message-ID
+		next if $mid =~ / /;
+
 		is_deeply($n->xhdr('Message-ID', $mid),
-			 { $mid => $mid }, "XHDR lookup OK on MID $num");
+			 { $mid => $mid }, "XHDR lookup OK on MID $mid ($num)");
 	}
 	my %nn;
 	foreach my $mid (@{$n->newnews(0, $group)}) {

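The two SearchIdxShard.pm hunks share a simple framing: index_raw writes
one header line, then exactly $bytes of raw message, and
shard_worker_loop reads the line, splits it, and reads that many bytes.
The sketch below models that round trip through an in-memory
filehandle. write_frame and read_frame are hypothetical names, and the
LF check is an assumption drawn from the "(but not LF)" comment in the
patch, not code from public-inbox.

```perl
use strict;
use warnings;

# Hypothetical writer: header line (mid last), then the raw message.
sub write_frame {
	my ($w, $smsg, $raw) = @_;
	# mid may contain spaces, but an LF would end the header line early
	die "LF in Message-ID\n" if $smsg->{mid} =~ /\n/;
	my %h = (%$smsg, bytes => length($raw));
	print $w join(' ', @h{qw(bytes num blob ds ts mid)}), "\n", $raw
		or die "write: $!\n";
}

# Hypothetical reader: one header line, then exactly $bytes of message.
sub read_frame {
	my ($r) = @_;
	defined(my $line = <$r>) or return;
	chomp $line;
	my ($bytes, $num, $blob, $ds, $ts, $mid) = split(/ /, $line, 6);
	my $n = read($r, my $msg, $bytes);
	defined $n or die "read: $!\n";
	$n == $bytes or die "short read: $n != $bytes\n";
	return ({ bytes => $bytes, num => $num, blob => $blob,
		ds => $ds, ts => $ts, mid => $mid }, $msg);
}

# Round trip through an in-memory file (illustrative data only)
my $buf = '';
open(my $w, '>', \$buf) or die "open: $!\n";
write_frame($w, { num => 1, blob => 'abc', ds => 1, ts => 2,
	mid => 'space@ (NXDOMAIN)' }, "Subject: hi\n\nbody with spaces\n");
close $w or die "close: $!\n";
open(my $r, '<', \$buf) or die "open: $!\n";
my ($smsg, $msg) = read_frame($r);
print "mid=[$smsg->{mid}] bytes=$smsg->{bytes}\n";
```

Because the message body is read by byte count rather than by line,
newlines inside the message are harmless; only the header line must
stay free of LF.
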

2020-03-31  8:32  how to gracefully handle spaces in Message-IDs? Eric Wong
2020-03-31  8:49  ` [WIP 1/?] v2writable: index Message-IDs w/ spaces properly Eric Wong
2020-04-01  0:05    ` Eric Wong

Code repositories for project(s) associated with this inbox:

	https://80x24.org/public-inbox.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).