user/dev discussion of public-inbox itself
From: Eric Wong <e@yhbt.net>
To: meta@public-inbox.org
Subject: Re: [WIP 1/?] v2writable: index Message-IDs w/ spaces properly
Date: Wed, 1 Apr 2020 00:05:28 +0000
Message-ID: <20200401000528.GA32055@dcvr>
In-Reply-To: <20200331084936.GA26977@dcvr>

Eric Wong <e@yhbt.net> wrote:
> Message-IDs can apparently contain spaces and other weird
> characters.  Ensure we pass those properly to shard subprocesses
> when importing messages in parallel mode.
> 
> Our NNTP parser does not deal with spaces in the Message-ID,
> yet, and I don't expect most NNTP clients to, either.

Nor does Net::NNTP on the client side...
But regardless of what happens with Message-IDs on the NNTP
side, this patch will remain correct and fixes an indexing
problem when Message-IDs contain spaces.

This bug was exacerbated by the changes to pass date and
timestamps from the git commit into the shard when mirroring,
but has always been with us when using multi-process indexing.
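
To illustrate the class of problem, here is a minimal sketch.  It
assumes fields reach a shard process as one space-delimited line;
that layout and the encode_line/decode_line names are hypothetical
and are not public-inbox's actual code.  Putting the Message-ID last
and splitting with a LIMIT keeps any spaces inside it intact:

```perl
use strict;
use warnings;

# Hypothetical wire format, for illustration only:
#   "<num> <bytes> <mid>\n"
# The Message-ID goes last so that it may contain spaces.
sub encode_line {
	my ($num, $bytes, $mid) = @_;
	die "newline in Message-ID\n" if $mid =~ /\n/;
	"$num $bytes $mid\n";
}

sub decode_line {
	my ($line) = @_;
	chomp $line;
	# A LIMIT of 3 stops splitting after the first two fields,
	# so the remainder (the Message-ID) stays in one piece.
	split(/ /, $line, 3);
}

my $mid = 'space in@mid.example';
my ($num, $bytes, $got) = decode_line(encode_line(1, 42, $mid));
print $got eq $mid ? "round-trip ok\n" : "round-trip FAILED\n";

# For contrast: an unlimited split breaks the Message-ID apart.
my @naive = split(/ /, "1 42 $mid");
print "naive split yields ", scalar(@naive), " fields\n";
```

A length-prefixed or NUL-delimited format would work just as well;
the point is only that the receiver must not treat every space as a
field separator.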

> diff --git a/t/v2writable.t b/t/v2writable.t
> index cdcfe4d0..8167e4de 100644
> --- a/t/v2writable.t
> +++ b/t/v2writable.t

> @@ -175,8 +180,12 @@ EOF
>  		is($uniq{$mid}++, 0, "MID for $num is unique in XOVER");
>  		is_deeply($n->xhdr('Message-ID', $num),
>  			 { $num => $mid }, "XHDR lookup OK on num $num");
> +
> +		# FIXME NNTP.pm doesn't handle spaces in Message-ID
> +		next if $mid =~ / /;
> +

Pushed with the following squashed in:

diff --git a/t/v2writable.t b/t/v2writable.t
index 8167e4de..66d5663e 100644
--- a/t/v2writable.t
+++ b/t/v2writable.t
@@ -181,7 +181,8 @@ EOF
 		is_deeply($n->xhdr('Message-ID', $num),
 			 { $num => $mid }, "XHDR lookup OK on num $num");
 
-		# FIXME NNTP.pm doesn't handle spaces in Message-ID
+		# FIXME PublicInbox::NNTP (server) doesn't handle spaces in
+		# Message-ID, but neither does Net::NNTP (client)
 		next if $mid =~ / /;
 
 		is_deeply($n->xhdr('Message-ID', $mid),

Thread overview: 3+ messages
2020-03-31  8:32 how to gracefully handle spaces in Message-IDs? Eric Wong
2020-03-31  8:49 ` [WIP 1/?] v2writable: index Message-IDs w/ spaces properly Eric Wong
2020-04-01  0:05   ` Eric Wong [this message]