From: Eric Wong <e@80x24.org>
To: meta@public-inbox.org
Subject: [PATCH 01/21] lei: more consistent dedupe and ovv_buf init
Date: Sun, 31 Jan 2021 22:28:13 -1000
Message-ID: <20210201082833.3293-2-e@80x24.org>
In-Reply-To: <20210201082833.3293-1-e@80x24.org>
This fixes "--dedupe none" with Maildir, where we don't
create the dedupe object at all.
---
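The guard this patch adds to _maildir_write_cb can be illustrated with a
minimal, self-contained sketch. DemoDedupe and prepare_if_any are
hypothetical stand-ins invented for illustration; they are not the real
PublicInbox::LeiDedupe class or any lei API. The sketch assumes only what the
diff shows: the dedupe slot may be undefined, and calling prepare_dedupe on it
without a guard is a runtime error.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for a dedupe object (NOT PublicInbox::LeiDedupe).
package DemoDedupe;
sub new { bless { seen => {}, prepared => 0 }, shift }
sub prepare_dedupe { $_[0]->{prepared}++ }
# returns true (1) if $key was already seen, false (0) on first sight
sub is_dup { my ($self, $key) = @_; $self->{seen}->{$key}++ ? 1 : 0 }

package main;

# Hypothetical helper mirroring the guarded call in the patch:
# only prepare the dedupe object when one actually exists.
sub prepare_if_any {
	my ($lei) = @_;
	my $dedupe = $lei->{dedupe};
	$dedupe->prepare_dedupe if $dedupe; # no-op when no object exists
	$dedupe;
}

# Assumed situation from the commit message: no dedupe object was created.
my $none = prepare_if_any({});

# Normal case: an object exists and gets prepared exactly once.
my $dd = prepare_if_any({ dedupe => DemoDedupe->new });

# The unguarded call on an undefined value dies at runtime.
my $err = eval { my $u; $u->prepare_dedupe; 1 } ? '' : $@;

print defined($none) ? "object\n" : "no object\n";
print "prepared=$dd->{prepared}\n";
print "first=", $dd->is_dup('a'), " second=", $dd->is_dup('a'), "\n";
print "unguarded call died\n" if $err =~ /undefined value/;
```

The same shape appears in ovv_each_smsg_cb after this patch, where the
dedupe object is looked up and prepared only when no {l2m} object is present.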
lib/PublicInbox/LeiDedupe.pm | 4 ++--
lib/PublicInbox/LeiOverview.pm | 18 ++++++++++--------
lib/PublicInbox/LeiToMail.pm | 3 +--
3 files changed, 13 insertions(+), 12 deletions(-)
diff --git a/lib/PublicInbox/LeiDedupe.pm b/lib/PublicInbox/LeiDedupe.pm
index 3f478aa4..e3ae8e33 100644
--- a/lib/PublicInbox/LeiDedupe.pm
+++ b/lib/PublicInbox/LeiDedupe.pm
@@ -103,8 +103,8 @@ sub new {
bless [ $skv, undef, undef, $m ], $cls;
}
-# returns true on unseen messages according to the deduplication strategy,
-# returns false if seen
+# returns true on seen messages according to the deduplication strategy,
+# returns false if unseen
sub is_dup {
my ($self, $eml, $oid) = @_;
!$self->[1]->($eml, $oid);
diff --git a/lib/PublicInbox/LeiOverview.pm b/lib/PublicInbox/LeiOverview.pm
index c67e2747..fa041457 100644
--- a/lib/PublicInbox/LeiOverview.pm
+++ b/lib/PublicInbox/LeiOverview.pm
@@ -92,13 +92,14 @@ sub new {
ovv_out_lk_init($self);
}
}
- if (!$json) {
+ if ($json) {
+ $lei->{dedupe} //= PublicInbox::LeiDedupe->new($lei);
+ } else {
# default to the cheapest sort since MUA usually resorts
$lei->{opt}->{'sort'} //= 'docid' if $dst ne '/dev/stdout';
$lei->{l2m} = eval { PublicInbox::LeiToMail->new($lei) };
return $lei->fail($@) if $@;
}
- $lei->{dedupe} //= PublicInbox::LeiDedupe->new($lei);
$self;
}
@@ -201,15 +202,19 @@ sub _json_pretty {
sub ovv_each_smsg_cb { # runs in wq worker usually
my ($self, $lei, $ibxish) = @_;
- my $json;
+ my ($json, $dedupe);
$lei->{1}->autoflush(1);
- my $dedupe = $lei->{dedupe} // die 'BUG: {dedupe} missing';
if (my $pkg = $self->{json}) {
$json = $pkg->new;
$json->utf8->canonical;
$json->ascii(1) if $lei->{opt}->{ascii};
}
- my $l2m = $lei->{l2m} or $dedupe->prepare_dedupe;
+ my $l2m = $lei->{l2m};
+ if (!$l2m) {
+ $dedupe = $lei->{dedupe} // die 'BUG: {dedupe} missing';
+ $dedupe->prepare_dedupe;
+ }
+ $lei->{ovv_buf} = \(my $buf = '') if !$l2m;
if ($l2m && !$ibxish) { # remote https?:// mboxrd
delete $l2m->{-wq_s1};
my $g2m = $l2m->can('git_to_mail');
@@ -241,7 +246,6 @@ sub ovv_each_smsg_cb { # runs in wq worker usually
my $git = $ibxish->git; # (LeiXSearch|Inbox|ExtSearch)->git
$self->{git} = $git; # for ovv_atexit_child
my $g2m = $l2m->can('git_to_mail');
- $dedupe->prepare_dedupe;
sub {
my ($smsg, $mitem) = @_;
$smsg->{pct} = get_pct($mitem) if $mitem;
@@ -249,7 +253,6 @@ sub ovv_each_smsg_cb { # runs in wq worker usually
};
} elsif ($self->{fmt} =~ /\A(concat)?json\z/ && $lei->{opt}->{pretty}) {
my $EOR = ($1//'') eq 'concat' ? "\n}" : "\n},";
- $lei->{ovv_buf} = \(my $buf = '');
sub { # DIY prettiness :P
my ($smsg, $mitem) = @_;
return if $dedupe->is_smsg_dup($smsg);
@@ -273,7 +276,6 @@ sub ovv_each_smsg_cb { # runs in wq worker usually
}
} elsif ($json) {
my $ORS = $self->{fmt} eq 'json' ? ",\n" : "\n"; # JSONL
- $lei->{ovv_buf} = \(my $buf = '');
sub {
my ($smsg, $mitem) = @_;
return if $dedupe->is_smsg_dup($smsg);
diff --git a/lib/PublicInbox/LeiToMail.pm b/lib/PublicInbox/LeiToMail.pm
index 61b546b5..244bfb67 100644
--- a/lib/PublicInbox/LeiToMail.pm
+++ b/lib/PublicInbox/LeiToMail.pm
@@ -323,7 +323,7 @@ sub _buf2maildir {
sub _maildir_write_cb ($$) {
my ($self, $lei) = @_;
my $dedupe = $lei->{dedupe};
- $dedupe->prepare_dedupe;
+ $dedupe->prepare_dedupe if $dedupe;
my $dst = $lei->{ovv}->{dst};
sub { # for git_to_mail
my ($buf, $smsg, $eml) = @_;
@@ -464,7 +464,6 @@ sub write_mail { # via ->wq_do
my $wcb = $self->{wcb} //= do { # first message
my %sig = $lei->atfork_child_wq($self);
@SIG{keys %sig} = values %sig; # not local
- $lei->{dedupe}->prepare_dedupe;
$self->write_cb($lei);
};
my $git = $self->{"$$\0$git_dir"} //= PublicInbox::Git->new($git_dir);
Thread overview: 23+ messages
2021-02-01 8:28 [PATCH 00/21] lei2mail worker segfault finally fixed Eric Wong
2021-02-01 8:28 ` Eric Wong [this message]
2021-02-01 8:28 ` [PATCH 02/21] ipc: switch wq to use the event loop Eric Wong
2021-02-01 8:28 ` [PATCH 03/21] lei: remove per-child SIG{__WARN__} Eric Wong
2021-02-01 8:28 ` [PATCH 04/21] lei: remove SIGPIPE handler Eric Wong
2021-02-01 8:28 ` [PATCH 05/21] ipc: more helpful ETOOMANYREFS error messages Eric Wong
2021-02-01 8:28 ` [PATCH 06/21] lei: remove syslog dependency Eric Wong
2021-02-01 8:28 ` [PATCH 07/21] sharedkv: release {dbh} before rmtree Eric Wong
2021-02-01 8:28 ` [PATCH 08/21] lei: keep $lei around until workers are reaped Eric Wong
2021-02-01 8:28 ` [PATCH 09/21] lei_dedupe: use Digest::SHA Eric Wong
2021-02-01 8:28 ` [PATCH 10/21] lei_xsearch: load PublicInbox::Smsg Eric Wong
2021-02-01 8:28 ` [PATCH 11/21] lei: deep clone {ovv} for l2m workers Eric Wong
2021-02-01 8:28 ` [PATCH 12/21] sharedkv: lock and explicitly disconnect {dbh} Eric Wong
2021-02-01 8:28 ` [PATCH 13/21] lei: increase initial timeout Eric Wong
2021-02-01 8:28 ` [PATCH 14/21] sharedkv: use lock_for_scope_fast Eric Wong
2021-02-01 8:28 ` [PATCH 15/21] lei_to_mail: reduce spew on Maildir removal Eric Wong
2021-02-01 8:28 ` [PATCH 16/21] sharedkv: do not set cache_size by default Eric Wong
2021-02-01 8:28 ` [PATCH 17/21] import: reap git-config(1) synchronously Eric Wong
2021-02-01 8:28 ` [PATCH 18/21] ds: guard against stack-not-refcounted quirk of Perl 5 Eric Wong
2021-02-01 9:07 ` Perl debug patches used to track down source of segfault Eric Wong
2021-02-01 8:28 ` [PATCH 19/21] ds: next_tick: avoid $_ in top-level loop iterator Eric Wong
2021-02-01 8:28 ` [PATCH 20/21] lei: avoid ETOOMANYREFS, cleanup imports Eric Wong
2021-02-01 8:28 ` [PATCH 21/21] doc: note optional BSD::Resource use Eric Wong