user/dev discussion of public-inbox itself
 help / color / Atom feed
* [PATCH 0/8] some small yak shaving things
@ 2020-04-18  3:38 Eric Wong
  2020-04-18  3:38 ` [PATCH 1/8] inboxwritable: mime_from_path: reuse in more places Eric Wong
                   ` (7 more replies)
  0 siblings, 8 replies; 9+ messages in thread
From: Eric Wong @ 2020-04-18  3:38 UTC (permalink / raw)
  To: meta

Eric Wong (8):
  inboxwritable: mime_from_path: reuse in more places
  searchidx: die on cat-file failures
  inbox: do not memoize description or cloneurl if missing
  inbox: replace `eval {}' with `do {}' where appropriate
  favor `do {}' over `eval {}' for localized slurp
  wwwatomstream: move {emit_header} field to $self
  mbox: use per-message line-ending for From_ line
  reduce scope of mbox From_ line removal

 Documentation/mknews.perl               |  2 +-
 MANIFEST                                |  4 ++--
 lib/PublicInbox/Inbox.pm                | 27 ++++++++++++------------
 lib/PublicInbox/InboxWritable.pm        |  4 ++--
 lib/PublicInbox/Mbox.pm                 |  7 +++++--
 lib/PublicInbox/NNTP.pm                 |  2 ++
 lib/PublicInbox/SearchIdx.pm            | 14 +++++--------
 lib/PublicInbox/WatchMaildir.pm         |  6 +++---
 lib/PublicInbox/WwwAtomStream.pm        |  5 ++---
 script/public-inbox-edit                |  5 ++---
 script/public-inbox-learn               |  6 +++---
 script/public-inbox-mda                 |  2 +-
 script/public-inbox-purge               |  2 +-
 scripts/import_maildir                  |  4 ++--
 scripts/import_slrnspool                |  2 +-
 scripts/slrnspool2maildir               |  2 +-
 scripts/ssoma-replay                    |  5 +----
 t/inbox.t                               | 19 +++++++++++++++++
 t/{iso-2202-jp.mbox => iso-2202-jp.eml} |  1 -
 t/mda.t                                 | 10 ++++-----
 t/msg_iter.t                            | 18 ++++++----------
 t/nntpd-tls.t                           |  8 +++----
 t/psgi_v2.t                             | 28 ++++++++++++++++---------
 t/search.t                              | 12 ++++-------
 t/solver_git.t                          |  4 ++--
 t/{utf8.mbox => utf8.eml}               |  1 -
 t/v2reindex.t                           |  3 +--
 t/watch_maildir_v2.t                    |  2 +-
 28 files changed, 105 insertions(+), 100 deletions(-)
 rename t/{iso-2202-jp.mbox => iso-2202-jp.eml} (84%)
 rename t/{utf8.mbox => utf8.eml} (90%)

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH 1/8] inboxwritable: mime_from_path: reuse in more places
  2020-04-18  3:38 [PATCH 0/8] some small yak shaving things Eric Wong
@ 2020-04-18  3:38 ` Eric Wong
  2020-04-18  3:38 ` [PATCH 2/8] searchidx: die on cat-file failures Eric Wong
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: Eric Wong @ 2020-04-18  3:38 UTC (permalink / raw)
  To: meta

There's nothing Maildir-specific about the function, so
`maildir_path_load' was a bad name.  So give it a more
appropriate name and use it in our tests.

This save ourselves some code and inconsistency by reusing an
existing internal library routine in more places.  We can drop
the "From_" line in some of our (formerly) mbox sample files.
---
 MANIFEST                                |  4 ++--
 lib/PublicInbox/InboxWritable.pm        |  4 ++--
 lib/PublicInbox/WatchMaildir.pm         |  6 +++---
 script/public-inbox-edit                |  5 ++---
 t/{iso-2202-jp.mbox => iso-2202-jp.eml} |  1 -
 t/mda.t                                 | 10 ++++------
 t/msg_iter.t                            | 18 ++++++------------
 t/nntpd-tls.t                           |  8 +++-----
 t/search.t                              | 12 ++++--------
 t/solver_git.t                          |  4 ++--
 t/{utf8.mbox => utf8.eml}               |  1 -
 11 files changed, 28 insertions(+), 45 deletions(-)
 rename t/{iso-2202-jp.mbox => iso-2202-jp.eml} (84%)
 rename t/{utf8.mbox => utf8.eml} (90%)

diff --git a/MANIFEST b/MANIFEST
index ba5cc6a4..92cda5d8 100644
--- a/MANIFEST
+++ b/MANIFEST
@@ -253,7 +253,7 @@ t/index-git-times.t
 t/indexlevels-mirror-v1.t
 t/indexlevels-mirror.t
 t/init.t
-t/iso-2202-jp.mbox
+t/iso-2202-jp.eml
 t/linkify.t
 t/main-bin/spamc
 t/mda.t
@@ -294,7 +294,7 @@ t/spamcheck_spamc.t
 t/spawn.t
 t/thread-cycle.t
 t/time.t
-t/utf8.mbox
+t/utf8.eml
 t/v1-add-remove-add.t
 t/v1reindex.t
 t/v2-add-remove-add.t
diff --git a/lib/PublicInbox/InboxWritable.pm b/lib/PublicInbox/InboxWritable.pm
index f2ba21fc..31aa76c6 100644
--- a/lib/PublicInbox/InboxWritable.pm
+++ b/lib/PublicInbox/InboxWritable.pm
@@ -111,7 +111,7 @@ sub is_maildir_path ($) {
 	(is_maildir_basename($p[-1]) && -f $path) ? 1 : 0;
 }
 
-sub maildir_path_load ($) {
+sub mime_from_path ($) {
 	my ($path) = @_;
 	if (open my $fh, '<', $path) {
 		local $/;
@@ -138,7 +138,7 @@ sub import_maildir {
 		opendir my $dh, "$dir/$sub" or die "opendir $dir/$sub: $!\n";
 		while (defined(my $fn = readdir($dh))) {
 			next unless is_maildir_basename($fn);
-			my $mime = maildir_path_load("$dir/$fn") or next;
+			my $mime = mime_from_path("$dir/$fn") or next;
 
 			if (my $filter = $self->filter($im)) {
 				my $ret = $filter->scrub($mime) or return;
diff --git a/lib/PublicInbox/WatchMaildir.pm b/lib/PublicInbox/WatchMaildir.pm
index e2024640..bea2ed2a 100644
--- a/lib/PublicInbox/WatchMaildir.pm
+++ b/lib/PublicInbox/WatchMaildir.pm
@@ -11,7 +11,7 @@ use PublicInbox::InboxWritable;
 use File::Temp 0.19 (); # 0.19 for ->newdir
 use PublicInbox::Filter::Base qw(REJECT);
 use PublicInbox::Spamcheck;
-*maildir_path_load = *PublicInbox::InboxWritable::maildir_path_load;
+*mime_from_path = \&PublicInbox::InboxWritable::mime_from_path;
 
 sub new {
 	my ($class, $config) = @_;
@@ -123,7 +123,7 @@ sub _remove_spam {
 	my ($self, $path) = @_;
 	# path must be marked as (S)een
 	$path =~ /:2,[A-R]*S[T-Za-z]*\z/ or return;
-	my $mime = maildir_path_load($path) or return;
+	my $mime = mime_from_path($path) or return;
 	$self->{config}->each_inbox(sub {
 		my ($ibx) = @_;
 		eval {
@@ -165,7 +165,7 @@ sub _try_path {
 		$warn_cb->(@_);
 	};
 	foreach my $ibx (@$inboxes) {
-		my $mime = maildir_path_load($path) or next;
+		my $mime = mime_from_path($path) or next;
 		my $im = _importer_for($self, $ibx);
 
 		# any header match means it's eligible for the inbox:
diff --git a/script/public-inbox-edit b/script/public-inbox-edit
index ae5d8289..28b1b5e8 100755
--- a/script/public-inbox-edit
+++ b/script/public-inbox-edit
@@ -92,9 +92,8 @@ Multiple messages with different content found matching
 		warn "Will edit all of them\n";
 	}
 } else {
-	open my $fh, '<', $file or die "open($file) failed: $!";
-	my $orig = do { local $/; <$fh> };
-	my $mime = PublicInbox::MIME->new(\$orig);
+	my $mime = PublicInbox::InboxWritable::mime_from_path($file) or
+		die "open($file) failed: $!";
 	my $mids = mids($mime->header_obj);
 	find_mid($found, $_, \@ibxs) for (@$mids); # populates $found
 	my $cid = content_id($mime);
diff --git a/t/iso-2202-jp.mbox b/t/iso-2202-jp.eml
similarity index 84%
rename from t/iso-2202-jp.mbox
rename to t/iso-2202-jp.eml
index 1a8e1974..9e0bbad4 100644
--- a/t/iso-2202-jp.mbox
+++ b/t/iso-2202-jp.eml
@@ -1,4 +1,3 @@
-From historical@ruby-dev Thu Jan  1 00:00:00 1970
 Message-Id: <199707281508.AAA24167@hoyogw.example>
 Date: Tue, 29 Jul 97 00:08:29 +0900
 From: matz@example.com
diff --git a/t/mda.t b/t/mda.t
index ddc0c279..ec09cf69 100644
--- a/t/mda.t
+++ b/t/mda.t
@@ -7,6 +7,7 @@ use Email::MIME;
 use Cwd qw(getcwd);
 use PublicInbox::MID qw(mid2path);
 use PublicInbox::Git;
+use PublicInbox::InboxWritable;
 use PublicInbox::TestCommon;
 my ($tmpdir, $for_destroy) = tmpdir();
 my $home = "$tmpdir/pi-home";
@@ -62,12 +63,9 @@ local $ENV{GIT_COMMITTER_NAME} = eval {
 	use PublicInbox::MDA;
 	use PublicInbox::Address;
 	use Encode qw/encode/;
-	my $mbox = 't/utf8.mbox';
-	open(my $fh, '<', $mbox) or die "failed to open mbox: $mbox\n";
-	my $str = eval { local $/; <$fh> };
-	close $fh;
-	my $msg = Email::MIME->new($str);
-
+	my $eml = 't/utf8.eml';
+	my $msg = PublicInbox::InboxWritable::mime_from_path($eml) or
+		die "failed to open $eml: $!";
 	my $from = $msg->header('From');
 	my ($author) = PublicInbox::Address::names($from);
 	my ($email) = PublicInbox::Address::emails($from);
diff --git a/t/msg_iter.t b/t/msg_iter.t
index d303564f..573ee412 100644
--- a/t/msg_iter.t
+++ b/t/msg_iter.t
@@ -5,6 +5,7 @@ use warnings;
 use Test::More;
 use Email::MIME;
 use PublicInbox::Hval qw(ascii_html);
+use PublicInbox::InboxWritable;
 use_ok('PublicInbox::MsgIter');
 
 {
@@ -42,12 +43,9 @@ use_ok('PublicInbox::MsgIter');
 }
 
 {
-	my $f = 't/iso-2202-jp.mbox';
-	my $mime = Email::MIME->new(do {
-		open my $fh, '<', $f or die "open($f): $!";
-		local $/;
-		<$fh>;
-	});
+	my $f = 't/iso-2202-jp.eml';
+	my $mime = PublicInbox::InboxWritable::mime_from_path($f) or
+		die "open $f: $!";
 	my $raw = '';
 	msg_iter($mime, sub {
 		my ($part, $level, @ex) = @{$_[0]};
@@ -61,12 +59,8 @@ use_ok('PublicInbox::MsgIter');
 
 {
 	my $f = 't/x-unknown-alpine.eml';
-	my $mime = Email::MIME->new(do {
-		open my $fh, '<', $f or die "open($f): $!";
-		local $/;
-		binmode $fh;
-		<$fh>;
-	});
+	my $mime = PublicInbox::InboxWritable::mime_from_path($f) or
+		die "open $f: $!";
 	my $raw = '';
 	msg_iter($mime, sub {
 		my ($part, $level, @ex) = @{$_[0]};
diff --git a/t/nntpd-tls.t b/t/nntpd-tls.t
index 0714631d..a0522e1f 100644
--- a/t/nntpd-tls.t
+++ b/t/nntpd-tls.t
@@ -63,11 +63,9 @@ EOF
 
 {
 	my $im = $ibx->importer(0);
-	my $mime = PublicInbox::MIME->new(do {
-		open my $fh, '<', 't/data/0001.patch' or die;
-		local $/;
-		<$fh>
-	});
+	my $eml = 't/data/0001.patch';
+	my $mime = PublicInbox::InboxWritable::mime_from_path($eml) or
+		die "open $eml: $!";
 	ok($im->add($mime), 'message added');
 	$im->done;
 	if ($version == 1) {
diff --git a/t/search.t b/t/search.t
index 839a320a..101d44e9 100644
--- a/t/search.t
+++ b/t/search.t
@@ -7,6 +7,7 @@ use PublicInbox::TestCommon;
 require_mods(qw(DBD::SQLite Search::Xapian));
 require PublicInbox::SearchIdx;
 require PublicInbox::Inbox;
+require PublicInbox::InboxWritable;
 use Email::MIME;
 my ($tmpdir, $for_destroy) = tmpdir();
 my $git_dir = "$tmpdir/a.git";
@@ -290,14 +291,9 @@ $ibx->with_umask(sub {
 });
 
 $ibx->with_umask(sub {
-	my $str = eval {
-		my $mbox = 't/utf8.mbox';
-		open(my $fh, '<', $mbox) or die "failed to open mbox: $mbox\n";
-		local $/;
-		<$fh>
-	};
-	$str =~ s/\AFrom [^\n]+\n//s;
-	my $mime = Email::MIME->new($str);
+	my $eml = 't/utf8.eml';
+	my $mime = PublicInbox::InboxWritable::mime_from_path($eml) or
+		die "open $eml: $!";
 	my $doc_id = $rw->add_message($mime);
 	ok($doc_id > 0, 'message indexed doc_id with UTF-8');
 	my $msg = $rw->query('m:testmessage@example.com', {limit => 1})->[0];
diff --git a/t/solver_git.t b/t/solver_git.t
index 2dbb07b0..7f0cd999 100644
--- a/t/solver_git.t
+++ b/t/solver_git.t
@@ -28,8 +28,8 @@ my $im = PublicInbox::V2Writable->new($ibx, 1);
 $im->{parallel} = 0;
 
 my $deliver_patch = sub ($) {
-	open my $fh, '<', $_[0] or die "open: $!";
-	my $mime = PublicInbox::MIME->new(do { local $/; <$fh> });
+	my $mime = PublicInbox::InboxWritable::mime_from_path($_[0]) or
+		die "open $_[0]: $!";
 	$im->add($mime);
 	$im->done;
 };
diff --git a/t/utf8.mbox b/t/utf8.eml
similarity index 90%
rename from t/utf8.mbox
rename to t/utf8.eml
index cebaf9b0..9bf1002c 100644
--- a/t/utf8.mbox
+++ b/t/utf8.eml
@@ -1,4 +1,3 @@
-From e@yhbt.net Thu Jan 01 00:00:00 1970
 Date: Thu, 01 Jan 1970 00:00:00 +0000
 To: =?utf-8?Q?El=C3=A9anor?= <e@example.com>
 From: =?utf-8?Q?El=C3=A9anor?= <e@example.com>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH 2/8] searchidx: die on cat-file failures
  2020-04-18  3:38 [PATCH 0/8] some small yak shaving things Eric Wong
  2020-04-18  3:38 ` [PATCH 1/8] inboxwritable: mime_from_path: reuse in more places Eric Wong
@ 2020-04-18  3:38 ` Eric Wong
  2020-04-18  3:38 ` [PATCH 3/8] inbox: don't memoize missing description|cloneurl Eric Wong
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: Eric Wong @ 2020-04-18  3:38 UTC (permalink / raw)
  To: meta

We always use the object ID from "git <log|rev-list>" for
retrieving blobs, so fail loudly if the git repository is
corrupt instead of silently continuing.
---
 lib/PublicInbox/SearchIdx.pm | 16 +++++++---------
 1 file changed, 7 insertions(+), 9 deletions(-)

diff --git a/lib/PublicInbox/SearchIdx.pm b/lib/PublicInbox/SearchIdx.pm
index 05689941..d1290dc2 100644
--- a/lib/PublicInbox/SearchIdx.pm
+++ b/lib/PublicInbox/SearchIdx.pm
@@ -551,13 +551,11 @@ sub unindex_both {
 
 sub do_cat_mail {
 	my ($git, $blob, $sizeref) = @_;
-	my $mime = eval {
-		my $str = $git->cat_file($blob, $sizeref);
-		# fixup bugs from import:
-		$$str =~ s/\A[\r\n]*From [^\r\n]*\r?\n//s;
-		PublicInbox::MIME->new($str);
-	};
-	$@ ? undef : $mime;
+	my $str = $git->cat_file($blob, $sizeref) or
+		die "BUG: $blob not found in $git->{git_dir}";
+	# fixup bugs from import:
+	$$str =~ s/\A[\r\n]*From [^\r\n]*\r?\n//s;
+	PublicInbox::MIME->new($str);
 }
 
 # called by public-inbox-index
@@ -602,7 +600,7 @@ sub read_log {
 				}
 				next;
 			}
-			my $mime = do_cat_mail($git, $blob, \$bytes) or next;
+			my $mime = do_cat_mail($git, $blob, \$bytes);
 			my $smsg = bless {}, 'PublicInbox::Smsg';
 			batch_adjust(\$max, $bytes, $batch_cb, $latest, ++$nr);
 			$smsg->{blob} = $blob;
@@ -623,7 +621,7 @@ sub read_log {
 	close($log) or die "git log failed: \$?=$?";
 	# get the leftovers
 	foreach my $blob (keys %D) {
-		my $mime = do_cat_mail($git, $blob, \$bytes) or next;
+		my $mime = do_cat_mail($git, $blob, \$bytes);
 		$del_cb->($self, $mime);
 	}
 	$batch_cb->($nr, $latest, $newest);

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH 3/8] inbox: don't memoize missing description|cloneurl
  2020-04-18  3:38 [PATCH 0/8] some small yak shaving things Eric Wong
  2020-04-18  3:38 ` [PATCH 1/8] inboxwritable: mime_from_path: reuse in more places Eric Wong
  2020-04-18  3:38 ` [PATCH 2/8] searchidx: die on cat-file failures Eric Wong
@ 2020-04-18  3:38 ` Eric Wong
  2020-04-18  3:38 ` [PATCH 4/8] inbox: replace `eval {}' with `do {}' where appropriate Eric Wong
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: Eric Wong @ 2020-04-18  3:38 UTC (permalink / raw)
  To: meta

It's probably common to have inboxes initially setup without
these files properly configured, so don't memoize at that stage.
---
 lib/PublicInbox/Inbox.pm | 13 ++++++++-----
 t/inbox.t                | 19 +++++++++++++++++++
 2 files changed, 27 insertions(+), 5 deletions(-)

diff --git a/lib/PublicInbox/Inbox.pm b/lib/PublicInbox/Inbox.pm
index 95ffd039..e49f85fc 100644
--- a/lib/PublicInbox/Inbox.pm
+++ b/lib/PublicInbox/Inbox.pm
@@ -219,19 +219,22 @@ sub try_cat {
 
 sub description {
 	my ($self) = @_;
-	$self->{description} //= do {
+	($self->{description} //= do {
 		my $desc = try_cat("$self->{inboxdir}/description");
 		local $/ = "\n";
 		chomp $desc;
 		$desc =~ s/\s+/ /smg;
-		$desc eq '' ? '($INBOX_DIR/description missing)' : $desc;
-	};
+		$desc eq '' ? undef : $desc;
+	}) // '($INBOX_DIR/description missing)';
 }
 
 sub cloneurl {
 	my ($self) = @_;
-	$self->{cloneurl} //=
-		[ split(/\s+/s, try_cat("$self->{inboxdir}/cloneurl")) ];
+	($self->{cloneurl} //= do {
+		my $s = try_cat("$self->{inboxdir}/cloneurl");
+		my @urls = split(/\s+/s, $s);
+		scalar(@urls) ? \@urls : undef
+	}) // [];
 }
 
 sub base_url {
diff --git a/t/inbox.t b/t/inbox.t
index 5f86440d..b59d5dba 100644
--- a/t/inbox.t
+++ b/t/inbox.t
@@ -4,12 +4,31 @@ use strict;
 use warnings;
 use Test::More;
 use_ok 'PublicInbox::Inbox';
+use File::Temp 0.19 ();
 my $x = PublicInbox::Inbox->new({url => [ '//example.com/test/' ]});
 is($x->base_url, 'https://example.com/test/', 'expanded protocol-relative');
 $x = PublicInbox::Inbox->new({url => [ 'http://example.com/test' ]});
 is($x->base_url, 'http://example.com/test/', 'added trailing slash');
 
 $x = PublicInbox::Inbox->new({});
+
 is($x->base_url, undef, 'undef base_url allowed');
+my $tmpdir = File::Temp->newdir('pi-inbox-XXXXXX', TMPDIR => 1);
+$x->{inboxdir} = $tmpdir->dirname;
+is_deeply($x->cloneurl, [], 'no cloneurls');
+is($x->description, '($INBOX_DIR/description missing)', 'default description');
+{
+	open my $fh, '>', "$x->{inboxdir}/cloneurl" or die;
+	print $fh "https://example.com/inbox\n" or die;
+	close $fh or die;
+	open $fh, '>', "$x->{inboxdir}/description" or die;
+	print $fh "blah\n" or die;
+	close $fh or die;
+}
+is_deeply($x->cloneurl, ['https://example.com/inbox'], 'cloneurls update');
+is($x->description, 'blah', 'description updated');
+is(unlink(glob("$x->{inboxdir}/*")), 2, 'unlinked cloneurl & description');
+is_deeply($x->cloneurl, ['https://example.com/inbox'], 'cloneurls memoized');
+is($x->description, 'blah', 'description memoized');
 
 done_testing();

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH 4/8] inbox: replace `eval {}' with `do {}' where appropriate
  2020-04-18  3:38 [PATCH 0/8] some small yak shaving things Eric Wong
                   ` (2 preceding siblings ...)
  2020-04-18  3:38 ` [PATCH 3/8] inbox: don't memoize missing description|cloneurl Eric Wong
@ 2020-04-18  3:38 ` Eric Wong
  2020-04-18  3:38 ` [PATCH 5/8] favor `do {}' over `eval {}' for localized slurp Eric Wong
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: Eric Wong @ 2020-04-18  3:38 UTC (permalink / raw)
  To: meta

-Git->new and -Limiter->new will never fail unless there's
an OOM, so using `eval' is incorrect.
---
 lib/PublicInbox/Inbox.pm | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/lib/PublicInbox/Inbox.pm b/lib/PublicInbox/Inbox.pm
index e49f85fc..4bd82989 100644
--- a/lib/PublicInbox/Inbox.pm
+++ b/lib/PublicInbox/Inbox.pm
@@ -82,7 +82,7 @@ sub _set_uint ($$$) {
 sub _set_limiter ($$$) {
 	my ($self, $pi_config, $pfx) = @_;
 	my $lkey = "-${pfx}_limiter";
-	$self->{$lkey} ||= eval {
+	$self->{$lkey} ||= do {
 		# full key is: publicinbox.$NAME.httpbackendmax
 		my $mkey = $pfx.'max';
 		my $val = $self->{$mkey} or return;
@@ -130,7 +130,7 @@ sub version { $_[0]->{version} // 1 }
 sub git_epoch {
 	my ($self, $epoch) = @_;
 	$self->version == 2 or return;
-	$self->{"$epoch.git"} ||= eval {
+	$self->{"$epoch.git"} ||= do {
 		my $git_dir = "$self->{inboxdir}/git/$epoch.git";
 		my $g = PublicInbox::Git->new($git_dir);
 		$g->{-httpbackend_limiter} = $self->{-httpbackend_limiter};
@@ -141,7 +141,7 @@ sub git_epoch {
 
 sub git {
 	my ($self) = @_;
-	$self->{git} ||= eval {
+	$self->{git} ||= do {
 		my $git_dir = $self->{inboxdir};
 		$git_dir .= '/all.git' if $self->version == 2;
 		my $g = PublicInbox::Git->new($git_dir);

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH 5/8] favor `do {}' over `eval {}' for localized slurp
  2020-04-18  3:38 [PATCH 0/8] some small yak shaving things Eric Wong
                   ` (3 preceding siblings ...)
  2020-04-18  3:38 ` [PATCH 4/8] inbox: replace `eval {}' with `do {}' where appropriate Eric Wong
@ 2020-04-18  3:38 ` Eric Wong
  2020-04-18  3:38 ` [PATCH 6/8] wwwatomstream: move {emit_header} field to $self Eric Wong
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: Eric Wong @ 2020-04-18  3:38 UTC (permalink / raw)
  To: meta

I did not know to use the return value of `do' back in the day.
There's probably no practical difference in these cases, but
`eval' is overkill for these uses and may hide actual errors.

We can get rid of a few redundant `scalar' ops and pass scalar
refs to Email::MIME->new to avoid copies in a few more places,
too.
---
 script/public-inbox-learn | 6 +++---
 script/public-inbox-mda   | 2 +-
 script/public-inbox-purge | 2 +-
 scripts/import_maildir    | 4 ++--
 scripts/import_slrnspool  | 2 +-
 scripts/slrnspool2maildir | 2 +-
 scripts/ssoma-replay      | 5 +----
 t/v2reindex.t             | 3 +--
 t/watch_maildir_v2.t      | 2 +-
 9 files changed, 12 insertions(+), 16 deletions(-)

diff --git a/script/public-inbox-learn b/script/public-inbox-learn
index 0d6c989b..4c10b68b 100644
--- a/script/public-inbox-learn
+++ b/script/public-inbox-learn
@@ -20,9 +20,9 @@ if ($train !~ /\A(?:ham|spam|rm)\z/) {
 my $spamc = PublicInbox::Spamcheck::Spamc->new;
 my $pi_config = PublicInbox::Config->new;
 my $err;
-my $mime = PublicInbox::MIME->new(eval {
+my $mime = PublicInbox::MIME->new(do{
 	local $/;
-	my $data = scalar <STDIN>;
+	my $data = <STDIN>;
 	$data =~ s/\A[\r\n]*From [^\r\n]*\r?\n//s;
 
 	if ($train ne 'rm') {
@@ -36,7 +36,7 @@ my $mime = PublicInbox::MIME->new(eval {
 		};
 		$err = $@;
 	}
-	$data
+	\$data
 });
 
 sub remove_or_add ($$$$) {
diff --git a/script/public-inbox-mda b/script/public-inbox-mda
index f37c7492..54d0af01 100755
--- a/script/public-inbox-mda
+++ b/script/public-inbox-mda
@@ -29,7 +29,7 @@ use PublicInbox::Spamcheck;
 # in case there's bugs in our code or user error.
 my $emergency = $ENV{PI_EMERGENCY} || "$ENV{HOME}/.public-inbox/emergency/";
 $ems = PublicInbox::Emergency->new($emergency);
-my $str = eval { local $/; <STDIN> };
+my $str = do { local $/; <STDIN> };
 $str =~ s/\A[\r\n]*From [^\r\n]*\r?\n//s;
 $ems->prepare(\$str);
 my $simple = Email::Simple->new(\$str);
diff --git a/script/public-inbox-purge b/script/public-inbox-purge
index c9b69c3d..8301b06d 100755
--- a/script/public-inbox-purge
+++ b/script/public-inbox-purge
@@ -21,7 +21,7 @@ GetOptions($opt, @PublicInbox::AdminEdit::OPT) or
 my @ibxs = PublicInbox::Admin::resolve_inboxes(\@ARGV, $opt);
 PublicInbox::AdminEdit::check_editable(\@ibxs);
 
-my $data = do { local $/; scalar <STDIN> };
+my $data = do { local $/; <STDIN> };
 $data =~ s/\A[\r\n]*From [^\r\n]*\r?\n//s;
 my $n_purged = 0;
 
diff --git a/scripts/import_maildir b/scripts/import_maildir
index fbf3f649..f4e82543 100755
--- a/scripts/import_maildir
+++ b/scripts/import_maildir
@@ -28,7 +28,7 @@ my @msgs;
 foreach my $sub (qw(cur new)) {
 	foreach my $fn (glob("$dir/$sub/*")) {
 		open my $fh, '<', $fn or next;
-		my $s = Email::Simple->new(eval { local $/; <$fh> });
+		my $s = Email::Simple->new(do { local $/; <$fh> });
 		my $date = $s->header('Date');
 		my $t = eval { str2time($date) };
 		defined $t or next;
@@ -45,7 +45,7 @@ my $im = PublicInbox::Import->new($git, $name, $email);
 while (my $ary = pop @msgs) {
 	my $fn = "$dir/$ary->[1]";
 	open my $fh, '<', $fn or next;
-	my $mime = PublicInbox::MIME->new(eval { local $/; <$fh> });
+	my $mime = PublicInbox::MIME->new(do { local $/; <$fh> });
 	$im->add($mime);
 }
 $im->done;
diff --git a/scripts/import_slrnspool b/scripts/import_slrnspool
index e569d004..480e7b4f 100755
--- a/scripts/import_slrnspool
+++ b/scripts/import_slrnspool
@@ -70,7 +70,7 @@ for (; $exit == 0 && $n < $max; $n++) {
 	$max = $n + $max_gap;
 	print STDERR $fn, "\n";
 
-	my $mime = PublicInbox::MIME->new(eval { local $/; <$fh> });
+	my $mime = PublicInbox::MIME->new(do { local $/; <$fh> });
 	$filter->scrub($mime);
 	$im->add($mime);
 
diff --git a/scripts/slrnspool2maildir b/scripts/slrnspool2maildir
index 0c21806a..8e444e84 100755
--- a/scripts/slrnspool2maildir
+++ b/scripts/slrnspool2maildir
@@ -23,7 +23,7 @@ foreach my $sub (qw(cur new tmp)) {
 
 foreach my $n (grep(/\d+\z/, glob("$spool/*"))) {
 	if (open my $fh, '<', $n) {
-		my $f = Email::Filter->new(data => eval { local $/; <$fh> });
+		my $f = Email::Filter->new(data => do { local $/; <$fh> });
 		my $s = $f->simple;
 
 		# gmane rewrites Received headers, which increases spamminess
diff --git a/scripts/ssoma-replay b/scripts/ssoma-replay
index 3e928084..07121423 100755
--- a/scripts/ssoma-replay
+++ b/scripts/ssoma-replay
@@ -30,10 +30,7 @@ use Email::Simple;
 use URI::Escape qw/uri_escape_utf8/;
 use File::Temp qw/tempfile/;
 my ($fh, $filename) = tempfile('ssoma-replay-XXXXXXXX', TMPDIR => 1);
-my $msg = eval {
-	local $/;
-	Email::Simple->new(<STDIN>);
-};
+my $msg = Email::Simple->new(do { local $/; <STDIN> });
 select $fh;
 
 # Note: the archive URL makes assumptions about where the
diff --git a/t/v2reindex.t b/t/v2reindex.t
index 7c14117a..b6164ff8 100644
--- a/t/v2reindex.t
+++ b/t/v2reindex.t
@@ -18,12 +18,11 @@ my $ibx_config = {
 	-primary_address => 'test@example.com',
 	indexlevel => 'full',
 };
-my $agpl = eval {
+my $agpl = do {
 	open my $fh, '<', 'COPYING' or die "can't open COPYING: $!";
 	local $/;
 	<$fh>;
 };
-$agpl or die "AGPL or die :P\n";
 my $phrase = q("defending all users' freedom");
 my $mime = PublicInbox::MIME->create(
 	header => [
diff --git a/t/watch_maildir_v2.t b/t/watch_maildir_v2.t
index b2514c16..685cd6ed 100644
--- a/t/watch_maildir_v2.t
+++ b/t/watch_maildir_v2.t
@@ -125,7 +125,7 @@ More majordomo info at  http://vger.kernel.org/majordomo-info.html\n);
 {
 	my $patch = 't/data/0001.patch';
 	open my $fh, '<', $patch or die "failed to open $patch: $!\n";
-	$msg = eval { local $/; <$fh> };
+	$msg = do { local $/; <$fh> };
 	PublicInbox::Emergency->new($maildir)->prepare(\$msg);
 	PublicInbox::WatchMaildir->new($config)->scan('full');
 	my ($nr, $msgs) = $srch->reopen->query('dfpost:6e006fd7');

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH 6/8] wwwatomstream: move {emit_header} field to $self
  2020-04-18  3:38 [PATCH 0/8] some small yak shaving things Eric Wong
                   ` (4 preceding siblings ...)
  2020-04-18  3:38 ` [PATCH 5/8] favor `do {}' over `eval {}' for localized slurp Eric Wong
@ 2020-04-18  3:38 ` Eric Wong
  2020-04-18  3:38 ` [PATCH 7/8] mbox: use per-message line-ending for From_ line Eric Wong
  2020-04-18  3:38 ` [PATCH 8/8] reduce scope of mbox From_ line removal Eric Wong
  7 siblings, 0 replies; 9+ messages in thread
From: Eric Wong @ 2020-04-18  3:38 UTC (permalink / raw)
  To: meta

There's no need to pollute the cross-package $ctx with it.
---
 Documentation/mknews.perl        | 2 +-
 lib/PublicInbox/WwwAtomStream.pm | 5 ++---
 2 files changed, 3 insertions(+), 4 deletions(-)

diff --git a/Documentation/mknews.perl b/Documentation/mknews.perl
index adb83832..a9dede00 100755
--- a/Documentation/mknews.perl
+++ b/Documentation/mknews.perl
@@ -127,7 +127,7 @@ sub atom_start {
 	require PublicInbox::WwwAtomStream;
 	# WwwAtomStream stats this dir for mtime
 	my $astream = PublicInbox::WwwAtomStream->new($ctx);
-	delete $ctx->{emit_header};
+	delete $astream->{emit_header};
 	my $ibx = $ctx->{-inbox};
 	my $title = PublicInbox::WwwAtomStream::title_tag($ibx->description);
 	my $updated = PublicInbox::WwwAtomStream::feed_updated(gmtime($mtime));
diff --git a/lib/PublicInbox/WwwAtomStream.pm b/lib/PublicInbox/WwwAtomStream.pm
index aa917ed8..c3fbb1a7 100644
--- a/lib/PublicInbox/WwwAtomStream.pm
+++ b/lib/PublicInbox/WwwAtomStream.pm
@@ -20,9 +20,8 @@ sub close {}
 
 sub new {
 	my ($class, $ctx, $cb) = @_;
-	$ctx->{emit_header} = 1;
 	$ctx->{feed_base_url} = $ctx->{-inbox}->base_url($ctx->{env});
-	bless { cb => $cb || \&close, ctx => $ctx }, $class;
+	bless { cb => $cb || \&close, ctx => $ctx, emit_header => 1 }, $class;
 }
 
 sub response {
@@ -130,7 +129,7 @@ sub feed_entry {
 	$email = ascii_html($email);
 
 	my $s = '';
-	if (delete $ctx->{emit_header}) {
+	if (delete $self->{emit_header}) {
 		$s .= atom_header($ctx, $title);
 	}
 	$s .= "<entry><author><name>$name</name><email>$email</email>" .

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH 7/8] mbox: use per-message line-ending for From_ line
  2020-04-18  3:38 [PATCH 0/8] some small yak shaving things Eric Wong
                   ` (5 preceding siblings ...)
  2020-04-18  3:38 ` [PATCH 6/8] wwwatomstream: move {emit_header} field to $self Eric Wong
@ 2020-04-18  3:38 ` Eric Wong
  2020-04-18  3:38 ` [PATCH 8/8] reduce scope of mbox From_ line removal Eric Wong
  7 siblings, 0 replies; 9+ messages in thread
From: Eric Wong @ 2020-04-18  3:38 UTC (permalink / raw)
  To: meta

Email::Simple preserves the message line ending in headers, so
make the From_ line consistent with the rest of the headers.
---
 lib/PublicInbox/Mbox.pm | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/PublicInbox/Mbox.pm b/lib/PublicInbox/Mbox.pm
index d5beceaf..16de1a72 100644
--- a/lib/PublicInbox/Mbox.pm
+++ b/lib/PublicInbox/Mbox.pm
@@ -106,7 +106,7 @@ sub msg_hdr ($$;$) {
 		'List-Post', "<mailto:$ibx->{-primary_address}>",
 	);
 	my $crlf = $header_obj->crlf;
-	my $buf = "From mboxrd\@z Thu Jan  1 00:00:00 1970\n" .
+	my $buf = 'From mboxrd@z Thu Jan  1 00:00:00 1970' . $crlf .
 			$header_obj->as_string;
 	for (my $i = 0; $i < @append; $i += 2) {
 		my $k = $append[$i];

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH 8/8] reduce scope of mbox From_ line removal
  2020-04-18  3:38 [PATCH 0/8] some small yak shaving things Eric Wong
                   ` (6 preceding siblings ...)
  2020-04-18  3:38 ` [PATCH 7/8] mbox: use per-message line-ending for From_ line Eric Wong
@ 2020-04-18  3:38 ` Eric Wong
  7 siblings, 0 replies; 9+ messages in thread
From: Eric Wong @ 2020-04-18  3:38 UTC (permalink / raw)
  To: meta

It's unnecessary overhead for anything which does Email::MIME
parsing.  It was never done for v2 indexing, even though v1->v2
conversions did NOT remove those From_ lines.  There was never a
need to remote From_ lines the v1 SearchIdx paths, either.

Hitting a /$INBOX_URL/$MSGID/T/ endpoint with an 18 message
thread reveals a ~0.5% speed improvement.  This will become
more apparent when we have a faster MIME parser.
---
 lib/PublicInbox/Inbox.pm     |  8 ++------
 lib/PublicInbox/Mbox.pm      |  7 +++++--
 lib/PublicInbox/NNTP.pm      |  2 ++
 lib/PublicInbox/SearchIdx.pm |  2 --
 t/psgi_v2.t                  | 28 ++++++++++++++++++----------
 5 files changed, 27 insertions(+), 20 deletions(-)

diff --git a/lib/PublicInbox/Inbox.pm b/lib/PublicInbox/Inbox.pm
index 4bd82989..186eb420 100644
--- a/lib/PublicInbox/Inbox.pm
+++ b/lib/PublicInbox/Inbox.pm
@@ -311,9 +311,7 @@ sub nntp_usable {
 # for v1 users w/o SQLite only
 sub msg_by_path ($$;$) {
 	my ($self, $path, $ref) = @_;
-	my $str = git($self)->cat_file('HEAD:'.$path, $ref);
-	$$str =~ s/\A[\r\n]*From [^\r\n]*\r?\n//s if $str;
-	$str;
+	git($self)->cat_file('HEAD:'.$path, $ref);
 }
 
 sub msg_by_smsg ($$;$) {
@@ -324,9 +322,7 @@ sub msg_by_smsg ($$;$) {
 	return unless defined $smsg;
 	defined(my $blob = $smsg->{blob}) or return;
 
-	my $str = git($self)->cat_file($blob, $ref);
-	$$str =~ s/\A[\r\n]*From [^\r\n]*\r?\n//s if $str;
-	$str;
+	git($self)->cat_file($blob, $ref);
 }
 
 sub smsg_mime {
diff --git a/lib/PublicInbox/Mbox.pm b/lib/PublicInbox/Mbox.pm
index 16de1a72..9995140c 100644
--- a/lib/PublicInbox/Mbox.pm
+++ b/lib/PublicInbox/Mbox.pm
@@ -106,8 +106,11 @@ sub msg_hdr ($$;$) {
 		'List-Post', "<mailto:$ibx->{-primary_address}>",
 	);
 	my $crlf = $header_obj->crlf;
-	my $buf = 'From mboxrd@z Thu Jan  1 00:00:00 1970' . $crlf .
-			$header_obj->as_string;
+	my $buf = $header_obj->as_string;
+	# fixup old bug from import (pre-a0c07cba0e5d8b6a)
+	$buf =~ s/\A[\r\n]*From [^\r\n]*\r?\n//s;
+	$buf = "From mboxrd\@z Thu Jan  1 00:00:00 1970" . $crlf . $buf;
+
 	for (my $i = 0; $i < @append; $i += 2) {
 		my $k = $append[$i];
 		my $v = $append[$i + 1];
diff --git a/lib/PublicInbox/NNTP.pm b/lib/PublicInbox/NNTP.pm
index d1f75f6f..c79f198b 100644
--- a/lib/PublicInbox/NNTP.pm
+++ b/lib/PublicInbox/NNTP.pm
@@ -506,6 +506,8 @@ sub set_art {
 sub msg_hdr_write ($$$) {
 	my ($self, $hdr, $body_follows) = @_;
 	$hdr = $hdr->as_string;
+	# fixup old bug from import (pre-a0c07cba0e5d8b6a)
+	$hdr =~ s/\A[\r\n]*From [^\r\n]*\r?\n//s;
 	utf8::encode($hdr);
 	$hdr =~ s/(?<!\r)\n/\r\n/sg; # Alpine barfs without this
 
diff --git a/lib/PublicInbox/SearchIdx.pm b/lib/PublicInbox/SearchIdx.pm
index d1290dc2..579b85e3 100644
--- a/lib/PublicInbox/SearchIdx.pm
+++ b/lib/PublicInbox/SearchIdx.pm
@@ -553,8 +553,6 @@ sub do_cat_mail {
 	my ($git, $blob, $sizeref) = @_;
 	my $str = $git->cat_file($blob, $sizeref) or
 		die "BUG: $blob not found in $git->{git_dir}";
-	# fixup bugs from import:
-	$$str =~ s/\A[\r\n]*From [^\r\n]*\r?\n//s;
 	PublicInbox::MIME->new($str);
 }
 
diff --git a/t/psgi_v2.t b/t/psgi_v2.t
index c4f80869..57017de1 100644
--- a/t/psgi_v2.t
+++ b/t/psgi_v2.t
@@ -26,16 +26,16 @@ my $new_mid;
 my $im = PublicInbox::V2Writable->new($ibx, 1);
 $im->{parallel} = 0;
 
-my $mime = PublicInbox::MIME->create(
-	header => [
-		From => 'a@example.com',
-		To => 'test@example.com',
-		Subject => 'this is a subject',
-		'Message-ID' => '<a-mid@b>',
-		Date => 'Fri, 02 Oct 1993 00:00:00 +0000',
-	],
-	body => "hello world\n",
-);
+my $mime = PublicInbox::MIME->new(<<'EOF');
+From oldbug-pre-a0c07cba0e5d8b6a Fri Oct  2 00:00:00 1993
+From: a@example.com
+To: test@example.com
+Subject: this is a subject
+Message-ID: <a-mid@b>
+Date: Fri, 02 Oct 1993 00:00:00 +0000
+
+hello world
+EOF
 ok($im->add($mime), 'added one message');
 $mime->body_set("hello world!\n");
 
@@ -48,6 +48,10 @@ my $mids = mids($mime->header_obj);
 $new_mid = $mids->[1];
 $im->done;
 
+my $msg = $ibx->msg_by_mid('a-mid@b');
+like($$msg, qr/\AFrom oldbug/s,
+	'"From_" line stored to test old bug workaround');
+
 my $cfgpfx = "publicinbox.v2test";
 my $cfg = <<EOF;
 $cfgpfx.address=$ibx->{-primary_address}
@@ -63,6 +67,7 @@ test_psgi(sub { $www->call(@_) }, sub {
 		'got v2 description missing message');
 	$res = $cb->(GET('/v2test/a-mid@b/raw'));
 	$raw = $res->content;
+	unlike($raw, qr/^From oldbug/sm, 'buggy "From_" line omitted');
 	like($raw, qr/^hello world$/m, 'got first message');
 	like($raw, qr/^hello world!$/m, 'got second message');
 	@from_ = ($raw =~ m/^From /mg);
@@ -123,6 +128,7 @@ test_psgi(sub { $www->call(@_) }, sub {
 		my $out;
 		my $in = $res->content;
 		my $status = IO::Uncompress::Gunzip::gunzip(\$in => \$out);
+		unlike($out, qr/^From oldbug/sm, 'buggy "From_" line omitted');
 		like($out, qr/^hello world$/m, 'got first in t.mbox.gz');
 		like($out, qr/^hello world!$/m, 'got second in t.mbox.gz');
 		like($out, qr/^hello ghosts$/m, 'got third in t.mbox.gz');
@@ -133,6 +139,7 @@ test_psgi(sub { $www->call(@_) }, sub {
 		$res = $cb->(POST('/v2test/?q=m:a-mid@b&x=m'));
 		$in = $res->content;
 		$status = IO::Uncompress::Gunzip::gunzip(\$in => \$out);
+		unlike($out, qr/^From oldbug/sm, 'buggy "From_" line omitted');
 		like($out, qr/^hello world$/m, 'got first in mbox POST');
 		like($out, qr/^hello world!$/m, 'got second in mbox POST');
 		like($out, qr/^hello ghosts$/m, 'got third in mbox POST');
@@ -143,6 +150,7 @@ test_psgi(sub { $www->call(@_) }, sub {
 		$res = $cb->(GET('/v2test/all.mbox.gz'));
 		$in = $res->content;
 		$status = IO::Uncompress::Gunzip::gunzip(\$in => \$out);
+		unlike($out, qr/^From oldbug/sm, 'buggy "From_" line omitted');
 		like($out, qr/^hello world$/m, 'got first in all.mbox');
 		like($out, qr/^hello world!$/m, 'got second in all.mbox');
 		like($out, qr/^hello ghosts$/m, 'got third in all.mbox');

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, back to index

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-04-18  3:38 [PATCH 0/8] some small yak shaving things Eric Wong
2020-04-18  3:38 ` [PATCH 1/8] inboxwritable: mime_from_path: reuse in more places Eric Wong
2020-04-18  3:38 ` [PATCH 2/8] searchidx: die on cat-file failures Eric Wong
2020-04-18  3:38 ` [PATCH 3/8] inbox: don't memoize missing description|cloneurl Eric Wong
2020-04-18  3:38 ` [PATCH 4/8] inbox: replace `eval {}' with `do {}' where appropriate Eric Wong
2020-04-18  3:38 ` [PATCH 5/8] favor `do {}' over `eval {}' for localized slurp Eric Wong
2020-04-18  3:38 ` [PATCH 6/8] wwwatomstream: move {emit_header} field to $self Eric Wong
2020-04-18  3:38 ` [PATCH 7/8] mbox: use per-message line-ending for From_ line Eric Wong
2020-04-18  3:38 ` [PATCH 8/8] reduce scope of mbox From_ line removal Eric Wong

user/dev discussion of public-inbox itself

Archives are clonable:
	git clone --mirror http://public-inbox.org/meta
	git clone --mirror http://czquwvybam4bgbro.onion/meta
	git clone --mirror http://hjrcffqmbrq6wope.onion/meta
	git clone --mirror http://ou63pmih66umazou.onion/meta

Example config snippet for mirrors

Newsgroups are available over NNTP:
	nntp://news.public-inbox.org/inbox.comp.mail.public-inbox.meta
	nntp://ou63pmih66umazou.onion/inbox.comp.mail.public-inbox.meta
	nntp://czquwvybam4bgbro.onion/inbox.comp.mail.public-inbox.meta
	nntp://hjrcffqmbrq6wope.onion/inbox.comp.mail.public-inbox.meta
	nntp://news.gmane.io/gmane.mail.public-inbox.general

 note: .onion URLs require Tor: https://www.torproject.org/

AGPL code for this site: git clone https://public-inbox.org/public-inbox.git