user/dev discussion of public-inbox itself
* [PATCH 01/21] lei: more consistent dedupe and ovv_buf init
  2021-02-01  8:28  6% [PATCH 00/21] lei2mail worker segfault finally fixed Eric Wong
@ 2021-02-01  8:28  7% ` Eric Wong
  0 siblings, 0 replies; 2+ results
From: Eric Wong @ 2021-02-01  8:28 UTC
  To: meta

This fixes "--dedupe none" with Maildir, where we don't
create the dedupe object at all.
---
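Not part of the commit message: a minimal, self-contained sketch of
the guard pattern used in the LeiToMail.pm hunk, where the write
callback must tolerate a missing dedupe object.  Sketch::Dedupe and
make_write_cb are hypothetical names for illustration only; they are
not the PublicInbox::LeiDedupe or PublicInbox::LeiToMail APIs, and the
real dedupe strategies are more involved.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a dedupe object (assumption, not LeiDedupe)
package Sketch::Dedupe;
sub new { my ($cls) = @_; bless { seen => {}, ready => 0 }, $cls }
sub prepare_dedupe { $_[0]->{ready} = 1 }

# returns true if $key was seen before, false (and remembers it) otherwise
sub is_dup { my ($self, $key) = @_; $self->{seen}->{$key}++ ? 1 : 0 }

package main;

# Build a write callback; $dedupe may be undef when dedupe is disabled
sub make_write_cb {
	my ($dedupe, $out) = @_;
	$dedupe->prepare_dedupe if $dedupe; # never call a method on undef
	return sub {
		my ($key) = @_;
		return if $dedupe && $dedupe->is_dup($key);
		push @$out, $key;
	};
}

my (@with, @without);
my $cb = make_write_cb(Sketch::Dedupe->new, \@with);
$cb->($_) for qw(a b a);
$cb = make_write_cb(undef, \@without);
$cb->($_) for qw(a b a);
print "with dedupe: @with\n";       # prints "with dedupe: a b"
print "without dedupe: @without\n"; # prints "without dedupe: a b a"
```

Without the "if $dedupe" guard, the second make_write_cb call would die
with "Can't call method "prepare_dedupe" on an undefined value".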
 lib/PublicInbox/LeiDedupe.pm   |  4 ++--
 lib/PublicInbox/LeiOverview.pm | 18 ++++++++++--------
 lib/PublicInbox/LeiToMail.pm   |  3 +--
 3 files changed, 13 insertions(+), 12 deletions(-)

diff --git a/lib/PublicInbox/LeiDedupe.pm b/lib/PublicInbox/LeiDedupe.pm
index 3f478aa4..e3ae8e33 100644
--- a/lib/PublicInbox/LeiDedupe.pm
+++ b/lib/PublicInbox/LeiDedupe.pm
@@ -103,8 +103,8 @@ sub new {
 	bless [ $skv, undef, undef, $m ], $cls;
 }
 
-# returns true on unseen messages according to the deduplication strategy,
-# returns false if seen
+# returns true on seen messages according to the deduplication strategy,
+# returns false if unseen
 sub is_dup {
 	my ($self, $eml, $oid) = @_;
 	!$self->[1]->($eml, $oid);
diff --git a/lib/PublicInbox/LeiOverview.pm b/lib/PublicInbox/LeiOverview.pm
index c67e2747..fa041457 100644
--- a/lib/PublicInbox/LeiOverview.pm
+++ b/lib/PublicInbox/LeiOverview.pm
@@ -92,13 +92,14 @@ sub new {
 			ovv_out_lk_init($self);
 		}
 	}
-	if (!$json) {
+	if ($json) {
+		$lei->{dedupe} //= PublicInbox::LeiDedupe->new($lei);
+	} else {
 		# default to the cheapest sort since MUA usually resorts
 		$lei->{opt}->{'sort'} //= 'docid' if $dst ne '/dev/stdout';
 		$lei->{l2m} = eval { PublicInbox::LeiToMail->new($lei) };
 		return $lei->fail($@) if $@;
 	}
-	$lei->{dedupe} //= PublicInbox::LeiDedupe->new($lei);
 	$self;
 }
 
@@ -201,15 +202,19 @@ sub _json_pretty {
 
 sub ovv_each_smsg_cb { # runs in wq worker usually
 	my ($self, $lei, $ibxish) = @_;
-	my $json;
+	my ($json, $dedupe);
 	$lei->{1}->autoflush(1);
-	my $dedupe = $lei->{dedupe} // die 'BUG: {dedupe} missing';
 	if (my $pkg = $self->{json}) {
 		$json = $pkg->new;
 		$json->utf8->canonical;
 		$json->ascii(1) if $lei->{opt}->{ascii};
 	}
-	my $l2m = $lei->{l2m} or $dedupe->prepare_dedupe;
+	my $l2m = $lei->{l2m};
+	if (!$l2m) {
+		$dedupe = $lei->{dedupe} // die 'BUG: {dedupe} missing';
+		$dedupe->prepare_dedupe;
+	}
+	$lei->{ovv_buf} = \(my $buf = '') if !$l2m;
 	if ($l2m && !$ibxish) { # remote https?:// mboxrd
 		delete $l2m->{-wq_s1};
 		my $g2m = $l2m->can('git_to_mail');
@@ -241,7 +246,6 @@ sub ovv_each_smsg_cb { # runs in wq worker usually
 		my $git = $ibxish->git; # (LeiXSearch|Inbox|ExtSearch)->git
 		$self->{git} = $git; # for ovv_atexit_child
 		my $g2m = $l2m->can('git_to_mail');
-		$dedupe->prepare_dedupe;
 		sub {
 			my ($smsg, $mitem) = @_;
 			$smsg->{pct} = get_pct($mitem) if $mitem;
@@ -249,7 +253,6 @@ sub ovv_each_smsg_cb { # runs in wq worker usually
 		};
 	} elsif ($self->{fmt} =~ /\A(concat)?json\z/ && $lei->{opt}->{pretty}) {
 		my $EOR = ($1//'') eq 'concat' ? "\n}" : "\n},";
-		$lei->{ovv_buf} = \(my $buf = '');
 		sub { # DIY prettiness :P
 			my ($smsg, $mitem) = @_;
 			return if $dedupe->is_smsg_dup($smsg);
@@ -273,7 +276,6 @@ sub ovv_each_smsg_cb { # runs in wq worker usually
 		}
 	} elsif ($json) {
 		my $ORS = $self->{fmt} eq 'json' ? ",\n" : "\n"; # JSONL
-		$lei->{ovv_buf} = \(my $buf = '');
 		sub {
 			my ($smsg, $mitem) = @_;
 			return if $dedupe->is_smsg_dup($smsg);
diff --git a/lib/PublicInbox/LeiToMail.pm b/lib/PublicInbox/LeiToMail.pm
index 61b546b5..244bfb67 100644
--- a/lib/PublicInbox/LeiToMail.pm
+++ b/lib/PublicInbox/LeiToMail.pm
@@ -323,7 +323,7 @@ sub _buf2maildir {
 sub _maildir_write_cb ($$) {
 	my ($self, $lei) = @_;
 	my $dedupe = $lei->{dedupe};
-	$dedupe->prepare_dedupe;
+	$dedupe->prepare_dedupe if $dedupe;
 	my $dst = $lei->{ovv}->{dst};
 	sub { # for git_to_mail
 		my ($buf, $smsg, $eml) = @_;
@@ -464,7 +464,6 @@ sub write_mail { # via ->wq_do
 	my $wcb = $self->{wcb} //= do { # first message
 		my %sig = $lei->atfork_child_wq($self);
 		@SIG{keys %sig} = values %sig; # not local
-		$lei->{dedupe}->prepare_dedupe;
 		$self->write_cb($lei);
 	};
 	my $git = $self->{"$$\0$git_dir"} //= PublicInbox::Git->new($git_dir);


* [PATCH 00/21] lei2mail worker segfault finally fixed
@ 2021-02-01  8:28  6% Eric Wong
  2021-02-01  8:28  7% ` [PATCH 01/21] lei: more consistent dedupe and ovv_buf init Eric Wong
  0 siblings, 1 reply; 2+ results
From: Eric Wong @ 2021-02-01  8:28 UTC
  To: meta

This lei2mail segfault turned out to be an old Perl 5 quirk
which plagued many before me.  It was not consistently
reproducible, and random changes seemed to make it happen more
or less frequently.  There were several times when I thought I
fixed it (and maybe this is still one of them!) only to have it
pop up again.

Still, I found many other little bugs to fix and improvements
worth making along the way.  Hopefully things go more smoothly
in the future...

Anyway, [PATCH 18/21] is the fix (and I'll follow up with more
on how I found it).  19/21 is purely defensive
future-proofing.

Eric Wong (21):
  lei: more consistent dedupe and ovv_buf init
  ipc: switch wq to use the event loop
  lei: remove per-child SIG{__WARN__}
  lei: remove SIGPIPE handler
  ipc: more helpful ETOOMANYREFS error messages
  lei: remove syslog dependency
  sharedkv: release {dbh} before rmtree
  lei: keep $lei around until workers are reaped
  lei_dedupe: use Digest::SHA
  lei_xsearch: load PublicInbox::Smsg
  lei: deep clone {ovv} for l2m workers
  sharedkv: lock and explicitly disconnect {dbh}
  lei: increase initial timeout
  sharedkv: use lock_for_scope_fast
  lei_to_mail: reduce spew on Maildir removal
  sharedkv: do not set cache_size by default
  import: reap git-config(1) synchronously
  ds: guard against stack-not-refcounted quirk of Perl 5
  ds: next_tick: avoid $_ in top-level loop iterator
  lei: avoid ETOOMANYREFS, cleanup imports
  doc: note optional BSD::Resource use

 Documentation/public-inbox-config.pod |  2 +-
 INSTALL                               |  6 ++
 MANIFEST                              |  2 +
 lib/PublicInbox/DS.pm                 | 12 ++--
 lib/PublicInbox/IPC.pm                | 43 +++++++-----
 lib/PublicInbox/Import.pm             |  1 +
 lib/PublicInbox/LEI.pm                | 95 +++++++++++++++------------
 lib/PublicInbox/LeiDedupe.pm          |  6 +-
 lib/PublicInbox/LeiExternal.pm        |  3 +-
 lib/PublicInbox/LeiOverview.pm        | 51 +++++++-------
 lib/PublicInbox/LeiToMail.pm          | 84 +++++++++++------------
 lib/PublicInbox/LeiXSearch.pm         | 36 +++++-----
 lib/PublicInbox/Lock.pm               | 17 +++++
 lib/PublicInbox/SharedKV.pm           | 33 +++++++---
 lib/PublicInbox/WQWorker.pm           | 34 ++++++++++
 script/lei                            | 28 +++++---
 t/lei_to_mail.t                       | 31 +++++----
 xt/stress-sharedkv.t                  | 50 ++++++++++++++
 18 files changed, 342 insertions(+), 192 deletions(-)
 create mode 100644 lib/PublicInbox/WQWorker.pm
 create mode 100644 xt/stress-sharedkv.t


Code repositories for project(s) associated with this public inbox

	https://80x24.org/public-inbox.git
