* From-munge not being reversed on mbox import
@ 2020-04-04 4:58 Kyle Meyer
2020-04-04 6:20 ` [PATCH] inboxwritable: fix From_ line unescaping Eric Wong
0 siblings, 1 reply; 3+ messages in thread
From: Kyle Meyer @ 2020-04-04 4:58 UTC (permalink / raw)
To: meta
I'm feeding mbox files created with Konstantin Ryabitsev's
list-archive-maker.py script [^1] to import_vger_from_mbox. Looking
through the result, I noticed some ">From" lines. Here's an example:
https://yhetil.org/orgmode/871rpt9zc4.fsf@kyleam.com/
If I'm following the code correctly, that leads to an import_mbox call,
which in turn calls mb_add:
    sub mb_add ($$$$) {
    	my ($im, $variant, $filter, $msg) = @_;
    	$$msg =~ s/(\r?\n)+\z/$1/s;
    	my $mime = PublicInbox::MIME->new($msg);
    	if ($variant eq 'mboxrd') {
    		$$msg =~ s/^>(>*From )/$1/sm;
    	} elsif ($variant eq 'mboxo') {
    		$$msg =~ s/^>From /From /sm;
    	}
    	[...]
So, it appears the ">From" _should_ be getting reversed. To eliminate
any stupid things I may have done when creating the archive, I looked
for a message on meta that has an in-body line starting with "From" and
found
https://public-inbox.org/meta/20200121222924.ioz5ve2sg65zcuoy@chatter.i7.local/
So I downloaded the public-inbox generated mbox and fed it to
import_vger_from_mbox:
    curl -s https://public-inbox.org/meta/20200121222924.ioz5ve2sg65zcuoy@chatter.i7.local/t.mbox.gz \
      | zcat | scripts/import_vger_from_mbox testing emacs-orgmode@gnu.org ~/inboxes/testing
That too leaves a ">From" in the body:
https://yhetil.org/testing/20200121222924.ioz5ve2sg65zcuoy@chatter.i7.local/
Any idea what's going wrong here?
[^1]: https://git.kernel.org/pub/scm/linux/kernel/git/mricon/korg-helpers.git/plain/list-archive-maker.py
* [PATCH] inboxwritable: fix From_ line unescaping
From: Eric Wong @ 2020-04-04 6:20 UTC (permalink / raw)
To: Kyle Meyer; +Cc: meta
Kyle Meyer <kyle@kyleam.com> wrote:
> I'm feeding mbox files created with Konstantin Ryabitsev's
> list-archive-maker.py script [^1] to import_vger_from_mbox. Looking
> through the result, I noticed some ">From" lines. Here's an example:
>
> https://yhetil.org/orgmode/871rpt9zc4.fsf@kyleam.com/
>
> If I'm following the code correctly, that leads to an import_mbox call,
> which in turn calls mb_add:
>
>     sub mb_add ($$$$) {
>     	my ($im, $variant, $filter, $msg) = @_;
>     	$$msg =~ s/(\r?\n)+\z/$1/s;
>     	my $mime = PublicInbox::MIME->new($msg);
>     	if ($variant eq 'mboxrd') {
>     		$$msg =~ s/^>(>*From )/$1/sm;
>     	} elsif ($variant eq 'mboxo') {
>     		$$msg =~ s/^>From /From /sm;
>     	}
>     	[...]
Yup, and that's buggy at first sight. My fault :x
> So, it appears the ">From" _should_ be getting reversed. To eliminate
> any stupid things I may have done when creating the archive, I looked
> for a message on meta that has an in-body line starting with "From" and
> found
>
> https://public-inbox.org/meta/20200121222924.ioz5ve2sg65zcuoy@chatter.i7.local/
>
> So I downloaded the public-inbox generated mbox and fed it to
> import_vger_from_mbox:
>
>     curl -s https://public-inbox.org/meta/20200121222924.ioz5ve2sg65zcuoy@chatter.i7.local/t.mbox.gz \
>       | zcat | scripts/import_vger_from_mbox testing emacs-orgmode@gnu.org ~/inboxes/testing
>
> That too leaves a ">From" in the body:
>
> https://yhetil.org/testing/20200121222924.ioz5ve2sg65zcuoy@chatter.i7.local/
Thanks for the reproducible test case. A fix is below
(only tested with your case; nothing in t/*.t yet).
> Any idea what's going wrong here?
Two bugs, actually, but only one affected your case.
> [^1]: https://git.kernel.org/pub/scm/linux/kernel/git/mricon/korg-helpers.git/plain/list-archive-maker.py
Can you confirm the following fixes things for you?
Thanks again for the excellent bug report and apologies for
my careless bug :x
----8<----
From: Eric Wong <e@yhbt.net>
Date: Sat, 04 Apr 2020 06:17:29 +0000
Subject: [PATCH] inboxwritable: fix From_ line unescaping
We can't rely on Email::MIME noticing the change to our
scalar ref after calling `PublicInbox::MIME->new'.
This is because Email::MIME::body_set (unlike
Email::Simple::body_set) will copy the contents of the body into
`->{body_raw}' as a new scalar.
Furthermore, we need to unescape every escaped From line in the
body, not just the first one, using the `g' modifier to `s///'.
Reported-by: Kyle Meyer <kyle@kyleam.com>
---
lib/PublicInbox/InboxWritable.pm | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/lib/PublicInbox/InboxWritable.pm b/lib/PublicInbox/InboxWritable.pm
index ce979ea2..f2ba21fc 100644
--- a/lib/PublicInbox/InboxWritable.pm
+++ b/lib/PublicInbox/InboxWritable.pm
@@ -157,12 +157,12 @@ my $from_strict = qr/^From \S+ +\S+ \S+ +\S+ [^:]+:[^:]+:[^:]+ [^:]+/;
 sub mb_add ($$$$) {
 	my ($im, $variant, $filter, $msg) = @_;
 	$$msg =~ s/(\r?\n)+\z/$1/s;
-	my $mime = PublicInbox::MIME->new($msg);
 	if ($variant eq 'mboxrd') {
-		$$msg =~ s/^>(>*From )/$1/sm;
+		$$msg =~ s/^>(>*From )/$1/gms;
 	} elsif ($variant eq 'mboxo') {
-		$$msg =~ s/^>From /From /sm;
+		$$msg =~ s/^>From /From /gms;
 	}
+	my $mime = PublicInbox::MIME->new($msg);
 	if ($filter) {
 		my $ret = $filter->scrub($mime) or return;
 		return if $ret == REJECT();
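
For readers following along, the two problems named in the commit message can be illustrated with a small standalone sketch. It is not part of the patch and does not use public-inbox code: `CopyingParser` is a hypothetical stand-in that merely snapshots the message at construction time (mimicking the copying behaviour the commit message attributes to Email::MIME), and `unescape_first_only` / `unescape_all` are illustrative names wrapping the mboxrd substitutions before and after the change.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a parser that copies the message when it is
# constructed. This is an assumption for illustration only, NOT the real
# PublicInbox::MIME or Email::MIME API.
{
    package CopyingParser;
    sub new {
        my ($class, $ref) = @_;
        my $copy = $$ref; # snapshot: later edits to $$ref are not seen
        return bless { body_raw => $copy }, $class;
    }
    sub body_raw { $_[0]->{body_raw} }
}

# mboxrd unescaping without /g: only the first escaped line is changed
sub unescape_first_only {
    my ($msg) = @_;
    $msg =~ s/^>(>*From )/$1/sm;
    return $msg;
}

# mboxrd unescaping with /g: every escaped line loses exactly one '>'
sub unescape_all {
    my ($msg) = @_;
    $msg =~ s/^>(>*From )/$1/gms;
    return $msg;
}

my $raw = "Subject: test\n\n>From a\ntext\n>>From b\n";

print "--- first match only ---\n", unescape_first_only($raw);
print "--- all matches ---\n", unescape_all($raw);

# Ordering: constructing the parser before unescaping leaves its copy
# escaped, no matter what is done to the original scalar afterwards.
my $early = $raw;
my $stale = CopyingParser->new(\$early);
$early =~ s/^>(>*From )/$1/gms;

# Unescaping first, then constructing, gives the parser the fixed text.
my $late = $raw;
$late =~ s/^>(>*From )/$1/gms;
my $fresh = CopyingParser->new(\$late);

print "--- parsed before unescaping ---\n", $stale->body_raw;
print "--- parsed after unescaping ---\n", $fresh->body_raw;
```

In this sketch the `$stale` object still holds the ">From" lines, which matches the symptom reported at the top of the thread, while `$fresh` sees the unescaped text because the substitution ran before construction.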
* Re: [PATCH] inboxwritable: fix From_ line unescaping
From: Kyle Meyer @ 2020-04-04 16:31 UTC (permalink / raw)
To: Eric Wong; +Cc: meta
Eric Wong <e@yhbt.net> writes:
> Can you confirm the following fixes things for you?
It does. Thank you!