user/dev discussion of public-inbox itself
* From-munge not being reversed on mbox import
From: Kyle Meyer @ 2020-04-04  4:58 UTC
  To: meta

I'm feeding mbox files created with Konstantin Ryabitsev's
list-archive-maker.py script [^1] to import_vger_from_mbox.  Looking
through the result, I noticed some ">From" lines.  Here's an example:

  https://yhetil.org/orgmode/871rpt9zc4.fsf@kyleam.com/

If I'm following the code correctly, that leads to an import_mbox call,
which in turn calls mb_add:

   sub mb_add ($$$$) {
       my ($im, $variant, $filter, $msg) = @_;
       $$msg =~ s/(\r?\n)+\z/$1/s;
       my $mime = PublicInbox::MIME->new($msg);
       if ($variant eq 'mboxrd') {
               $$msg =~ s/^>(>*From )/$1/sm;
       } elsif ($variant eq 'mboxo') {
               $$msg =~ s/^>From /From /sm;
       }
   [...]

So, it appears the ">From" _should_ be getting reversed.  To eliminate
any stupid things I may have done when creating the archive, I looked
for a message on meta that has an in-body line starting with "From" and
found

  https://public-inbox.org/meta/20200121222924.ioz5ve2sg65zcuoy@chatter.i7.local/

So I downloaded the public-inbox generated mbox and fed it to
import_vger_from_mbox:

  curl -s https://public-inbox.org/meta/20200121222924.ioz5ve2sg65zcuoy@chatter.i7.local/t.mbox.gz \
   | zcat | scripts/import_vger_from_mbox testing emacs-orgmode@gnu.org ~/inboxes/testing

That too leaves a ">From" in the body:

  https://yhetil.org/testing/20200121222924.ioz5ve2sg65zcuoy@chatter.i7.local/

Any idea what's going wrong here?


[^1]: https://git.kernel.org/pub/scm/linux/kernel/git/mricon/korg-helpers.git/plain/list-archive-maker.py

* [PATCH] inboxwritable: fix From_ line unescaping
From: Eric Wong @ 2020-04-04  6:20 UTC
  To: Kyle Meyer; +Cc: meta

Kyle Meyer <kyle@kyleam.com> wrote:
> I'm feeding mbox files created with Konstantin Ryabitsev's
> list-archive-maker.py script [^1] to import_vger_from_mbox.  Looking
> through the result, I noticed some ">From" lines.  Here's an example:
> 
>   https://yhetil.org/orgmode/871rpt9zc4.fsf@kyleam.com/
> 
> If I'm following the code correctly, that leads to an import_mbox call,
> which in turn calls mb_add:
> 
>    sub mb_add ($$$$) {
>        my ($im, $variant, $filter, $msg) = @_;
>        $$msg =~ s/(\r?\n)+\z/$1/s;
>        my $mime = PublicInbox::MIME->new($msg);
>        if ($variant eq 'mboxrd') {
>                $$msg =~ s/^>(>*From )/$1/sm;
>        } elsif ($variant eq 'mboxo') {
>                $$msg =~ s/^>From /From /sm;
>        }
>    [...]

Yup, and that's buggy on first sight.  My fault :x

> So, it appears the ">From" _should_ be getting reversed.  To eliminate
> any stupid things I may have done when creating the archive, I looked
> for a message on meta that has an in-body line starting with "From" and
> found
> 
>   https://public-inbox.org/meta/20200121222924.ioz5ve2sg65zcuoy@chatter.i7.local/
> 
> So I downloaded the public-inbox generated mbox and fed it to
> import_vger_from_mbox:
> 
>   curl -s https://public-inbox.org/meta/20200121222924.ioz5ve2sg65zcuoy@chatter.i7.local/t.mbox.gz \
>    | zcat | scripts/import_vger_from_mbox testing emacs-orgmode@gnu.org ~/inboxes/testing
> 
> That too leaves a ">From" in the body:
> 
>   https://yhetil.org/testing/20200121222924.ioz5ve2sg65zcuoy@chatter.i7.local/

Thanks for the reproducible test case.  A fix is below
(only tested with your case, nothing in t/*.t yet)

> Any idea what's going wrong here?

Two bugs, actually, but only one of them affected your case.
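
To illustrate the bug involving the `g' modifier, here is a minimal
sketch.  The helper names are hypothetical and not part of
public-inbox; only the two regexps mirror what mb_add does for
mboxrd.  Without `g', s/// stops after the first match, so any later
escaped From line keeps its extra ">".

```perl
use strict;
use warnings;

# Hypothetical helper: strip one leading ">" from every line that
# matches /^>+From / (mboxrd unescaping), using the `g' modifier.
sub unescape_mboxrd {
	my ($msg) = @_; # scalar ref, as in mb_add
	$$msg =~ s/^>(>*From )/$1/gms;
}

# Hypothetical helper: the same substitution without `g', so only
# the first matching line is unescaped.
sub unescape_mboxrd_first_only {
	my ($msg) = @_;
	$$msg =~ s/^>(>*From )/$1/sm;
}

my $body = "Subject: x\n\n>From here\n>>From there\n";
my ($all, $first) = ($body, $body);
unescape_mboxrd(\$all);
unescape_mboxrd_first_only(\$first);

print "with g:\n$all";       # both body lines lose one ">"
print "without g:\n$first";  # ">>From there" is left untouched
```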

> [^1]: https://git.kernel.org/pub/scm/linux/kernel/git/mricon/korg-helpers.git/plain/list-archive-maker.py

Can you confirm the following fixes things for you?
Thanks again for the excellent bug report and apologies for
my careless bug :x
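
The ordering problem can be sketched as below.  CopyingParser is a
made-up stand-in and is NOT Email::MIME; it only assumes a parser
that snapshots the body when it is constructed.  Under that
assumption, unescaping the scalar ref after construction has no
effect on what the parser returns.

```perl
use strict;
use warnings;

# Stand-in parser (hypothetical): copies the message at construction
# time, so later changes to the caller's scalar are not seen.
package CopyingParser;
sub new {
	my ($class, $msg) = @_; # $msg is a scalar ref
	my $copy = $$msg;       # snapshot taken here
	return bless { body_raw => $copy }, $class;
}
sub body_raw { $_[0]->{body_raw} }

package main;

# Parse first, unescape second: the parser already holds a copy.
sub parse_then_unescape {
	my ($msg) = @_;
	my $obj = CopyingParser->new($msg);
	$$msg =~ s/^>(>*From )/$1/gms;
	return $obj->body_raw;
}

# Unescape first, parse second: the parser copies the fixed text.
sub unescape_then_parse {
	my ($msg) = @_;
	$$msg =~ s/^>(>*From )/$1/gms;
	return CopyingParser->new($msg)->body_raw;
}

my $raw = "hello\n>From the archive\n";
my ($x, $y) = ($raw, $raw);
print parse_then_unescape(\$x); # still contains ">From the archive"
print unescape_then_parse(\$y); # contains "From the archive"
```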

----8<----
From: Eric Wong <e@yhbt.net>
Date: Sat, 04 Apr 2020 06:17:29 +0000
Subject: [PATCH] inboxwritable: fix From_ line unescaping

We can't rely on Email::MIME noticing the change to our
scalar ref after calling `PublicInbox::MIME->new'.

This is because Email::MIME::body_set (unlike
Email::Simple::body_set) will copy the contents of the body into
`->{body_raw}' as a new scalar.

Furthermore, we need to unescape multiple From lines in the body,
not just the first one, using the `g' modifier to `s//'.

Reported-by: Kyle Meyer <kyle@kyleam.com>
---
 lib/PublicInbox/InboxWritable.pm | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/lib/PublicInbox/InboxWritable.pm b/lib/PublicInbox/InboxWritable.pm
index ce979ea2..f2ba21fc 100644
--- a/lib/PublicInbox/InboxWritable.pm
+++ b/lib/PublicInbox/InboxWritable.pm
@@ -157,12 +157,12 @@ my $from_strict = qr/^From \S+ +\S+ \S+ +\S+ [^:]+:[^:]+:[^:]+ [^:]+/;
 sub mb_add ($$$$) {
 	my ($im, $variant, $filter, $msg) = @_;
 	$$msg =~ s/(\r?\n)+\z/$1/s;
-	my $mime = PublicInbox::MIME->new($msg);
 	if ($variant eq 'mboxrd') {
-		$$msg =~ s/^>(>*From )/$1/sm;
+		$$msg =~ s/^>(>*From )/$1/gms;
 	} elsif ($variant eq 'mboxo') {
-		$$msg =~ s/^>From /From /sm;
+		$$msg =~ s/^>From /From /gms;
 	}
+	my $mime = PublicInbox::MIME->new($msg);
 	if ($filter) {
 		my $ret = $filter->scrub($mime) or return;
 		return if $ret == REJECT();

* Re: [PATCH] inboxwritable: fix From_ line unescaping
From: Kyle Meyer @ 2020-04-04 16:31 UTC
  To: Eric Wong; +Cc: meta

Eric Wong <e@yhbt.net> writes:

> Can you confirm the following fixes things for you?

It does.  Thank you!
