user/dev discussion of public-inbox itself
From: Eric Wong <e@80x24.org>
To: meta@public-inbox.org
Cc: Eric Wong <e@yhbt.net>
Subject: [PATCH 05/11] watch: avoid unnecessary spawning on spam removals
Date: Mon, 31 Aug 2020 04:41:34 +0000
Message-ID: <20200831044140.17027-6-e@80x24.org>
In-Reply-To: <20200831044140.17027-1-e@80x24.org>

From: Eric Wong <e@yhbt.net>

This should further mitigate lock contention problems
when -watch is configured to watch a Maildir for spam
while performing a large NNTP import.

There is now a small risk that a message won't get removed if
it's in the current (uncommitted) fast-import batch, but that is
unlikely given the batch size is now only 10 messages.

If that small window is hit, flipping the \Seen flag
(e.g. marking it unread, and then read again) will trigger
another removal attempt via IMAP or Maildir.
---
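A rough, standalone sketch of the pre-check described above may
help when reading the Watch.pm hunk.  It is not code from this
patch: needs_importer() and every key in its hashref argument are
hypothetical stand-ins for $im->active, $ibx->over, content_exists()
and the scrub filter, chosen only to show which cases skip taking
the lock and spawning fast-import.

```perl
use strict;
use warnings;

# Hypothetical, simplified model of the pre-check in remove_eml_i.
# The keys below are illustrative only, not public-inbox API:
#   im_active    - an importer exists and already holds the lock
#   has_over     - the inbox has an overview index to consult
#   raw_exists   - the raw message content is already in the inbox
#   has_filter   - the inbox has a scrubbing filter configured
#   scrub_ok     - scrubbing yielded a usable (non-REJECT) message
#   scrub_exists - the scrubbed content is already in the inbox
# Returns 1 if a removal should be attempted via the importer,
# 0 if it can be skipped without locking or spawning.
sub needs_importer {
	my ($st) = @_;
	# an active importer already holds the lock, so just use it
	return 1 if $st->{im_active};
	# without an index, nothing can be ruled out cheaply
	return 1 unless $st->{has_over};
	return 1 if $st->{raw_exists};
	# no raw match and no filter: nothing to remove
	return 0 unless $st->{has_filter};
	# unusable scrub result: fall through to the importer
	return 1 unless $st->{scrub_ok};
	return $st->{scrub_exists} ? 1 : 0;
}
```

Under these assumptions, the only cases that skip the importer are
an inactive importer with an index available where neither the raw
nor the scrubbed content is present; everything else proceeds to
the remove() calls as before.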
 lib/PublicInbox/Import.pm     |  3 +++
 lib/PublicInbox/V2Writable.pm |  3 +++
 lib/PublicInbox/Watch.pm      | 31 +++++++++++++++++++++++++------
 3 files changed, 31 insertions(+), 6 deletions(-)

diff --git a/lib/PublicInbox/Import.pm b/lib/PublicInbox/Import.pm
index 700b4026..ee5ca2ea 100644
--- a/lib/PublicInbox/Import.pm
+++ b/lib/PublicInbox/Import.pm
@@ -461,6 +461,9 @@ sub init_bare {
 	}
 }
 
+# true if locked and active
+sub active { !!$_[0]->{out} }
+
 sub done {
 	my ($self) = @_;
 	my $w = delete $self->{out} or return;
diff --git a/lib/PublicInbox/V2Writable.pm b/lib/PublicInbox/V2Writable.pm
index f2288904..553dd839 100644
--- a/lib/PublicInbox/V2Writable.pm
+++ b/lib/PublicInbox/V2Writable.pm
@@ -655,6 +655,9 @@ sub checkpoint ($;$) {
 # public
 sub barrier { checkpoint($_[0], 1) };
 
+# true if locked and active
+sub active { !!$_[0]->{im} }
+
 # public
 sub done {
 	my ($self) = @_;
diff --git a/lib/PublicInbox/Watch.pm b/lib/PublicInbox/Watch.pm
index 5f786139..0bb92d0a 100644
--- a/lib/PublicInbox/Watch.pm
+++ b/lib/PublicInbox/Watch.pm
@@ -134,15 +134,34 @@ sub _done_for_now {
 sub remove_eml_i { # each_inbox callback
 	my ($ibx, $arg) = @_;
 	my ($self, $eml, $loc) = @$arg;
+
 	eval {
-		my $im = _importer_for($self, $ibx);
-		$im->remove($eml, 'spam');
-		if (my $scrub = $ibx->filter($im)) {
-			my $scrubbed = $scrub->scrub($eml, 1);
-			if ($scrubbed && $scrubbed != REJECT) {
-				$im->remove($scrubbed, 'spam');
+		# try to avoid taking a lock or unnecessary spawning
+		my $im = $self->{importers}->{"$ibx"};
+		my $scrubbed;
+		if ((!$im || !$im->active) && $ibx->over) {
+			if (content_exists($ibx, $eml)) {
+				# continue
+			} elsif (my $scrub = $ibx->filter($im)) {
+				$scrubbed = $scrub->scrub($eml, 1);
+				if ($scrubbed && $scrubbed != REJECT &&
+					  !content_exists($ibx, $scrubbed)) {
+					return;
+				}
+			} else {
+				return;
 			}
 		}
+
+		$im //= _importer_for($self, $ibx); # may spawn fast-import
+		$im->remove($eml, 'spam');
+		$scrubbed //= do {
+			my $scrub = $ibx->filter($im);
+			$scrub ? $scrub->scrub($eml, 1) : undef;
+		};
+		if ($scrubbed && $scrubbed != REJECT) {
+			$im->remove($scrubbed, 'spam');
+		}
 	};
 	if ($@) {
 		warn "error removing spam at: $loc from $ibx->{name}: $@\n";

Thread overview: 12+ messages
2020-08-31  4:41 [PATCH 00/11] watch: fix contention w/ Maildir & NNTP Eric Wong
2020-08-31  4:41 ` [PATCH 01/11] watch: limit batch size of NNTP and IMAP workers, too Eric Wong
2020-08-31  4:41 ` [PATCH 02/11] watchmaildir: use v5.10.1, drop warnings Eric Wong
2020-08-31  4:41 ` [PATCH 03/11] rename WatchMaildir => Watch Eric Wong
2020-08-31  4:41 ` [PATCH 04/11] watch: log signal activities to STDERR Eric Wong
2020-08-31  4:41 ` Eric Wong [this message]
2020-08-31  4:41 ` [PATCH 06/11] watch: block signals before fork on non-signalfd/kevent systems Eric Wong
2020-08-31  4:41 ` [PATCH 07/11] watch: comments and tiny cleanups Eric Wong
2020-08-31  4:41 ` [PATCH 08/11] ds: avoid excessive queueing when reaping PIDs Eric Wong
2020-08-31  4:41 ` [PATCH 09/11] watch: use EOFpipe to reduce dwaitpid wakeups Eric Wong
2020-08-31  4:41 ` [PATCH 10/11] ds: avoid unnecessary timer for waitpid Eric Wong
2020-08-31  4:41 ` [PATCH 11/11] replace ParentPipe with EOFpipe Eric Wong

Code repositories for project(s) associated with this public inbox

	https://80x24.org/public-inbox.git