user/dev discussion of public-inbox itself
* [PATCH 05/11] watch: avoid unnecessary spawning on spam removals
  2020-08-31  4:41  7% [PATCH 00/11] watch: fix contention w/ Maildir & NNTP Eric Wong
@ 2020-08-31  4:41  6% ` Eric Wong
  0 siblings, 0 replies; 2+ results
From: Eric Wong @ 2020-08-31  4:41 UTC (permalink / raw)
  To: meta; +Cc: Eric Wong

From: Eric Wong <e@yhbt.net>

This should further mitigate lock contention problems
when -watch is configured to watch a Maildir for spam
while performing a large NNTP import.

There is now a small risk that a message won't get removed if
it's in the current (uncommitted) fast-import batch, but that is
unlikely given the batch size is now only 10 messages.

If that small window is hit, flipping the \Seen flag
(e.g. marking it unread, and then read again) will trigger
another removal attempt via IMAP or Maildir.
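
To illustrate the window described above, here is a rough,
single-process model of it.  Everything in it (@pending, %committed,
import_msg, commit_batch, try_remove) is a hypothetical stand-in and
not code from this series; the only value taken from this message is
the batch size of 10.  A read-only lookup only sees committed
messages, so a removal attempted mid-batch is skipped and succeeds
on a later retry.

```perl
use strict;
use warnings;

# Hypothetical model, not PublicInbox code: @pending is the current
# uncommitted batch, %committed is what a read-only lookup can see.
my (@pending, %committed);
my $BATCH_SIZE = 10; # batch size mentioned in this message

sub commit_batch {
	$committed{$_} = 1 for @pending;
	@pending = ();
}

sub import_msg {
	my ($mid) = @_;
	push @pending, $mid;
	commit_batch() if @pending >= $BATCH_SIZE;
}

# A removal attempt which only consults committed data.
sub try_remove {
	my ($mid) = @_;
	return 0 unless $committed{$mid}; # missed: still in the pending batch
	delete $committed{$mid};
	return 1;
}

import_msg('<spam@example>');
my $first = try_remove('<spam@example>'); # batch not committed yet
commit_batch();
my $retry = try_remove('<spam@example>'); # e.g. after flipping \Seen
print "first=$first retry=$retry\n";
```

In the real setup the import and the removal would run in separate
processes; the model collapses them into one only to show the
ordering.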
---
 lib/PublicInbox/Import.pm     |  3 +++
 lib/PublicInbox/V2Writable.pm |  3 +++
 lib/PublicInbox/Watch.pm      | 31 +++++++++++++++++++++++++------
 3 files changed, 31 insertions(+), 6 deletions(-)

diff --git a/lib/PublicInbox/Import.pm b/lib/PublicInbox/Import.pm
index 700b4026..ee5ca2ea 100644
--- a/lib/PublicInbox/Import.pm
+++ b/lib/PublicInbox/Import.pm
@@ -461,6 +461,9 @@ sub init_bare {
 	}
 }
 
+# true if locked and active
+sub active { !!$_[0]->{out} }
+
 sub done {
 	my ($self) = @_;
 	my $w = delete $self->{out} or return;
diff --git a/lib/PublicInbox/V2Writable.pm b/lib/PublicInbox/V2Writable.pm
index f2288904..553dd839 100644
--- a/lib/PublicInbox/V2Writable.pm
+++ b/lib/PublicInbox/V2Writable.pm
@@ -655,6 +655,9 @@ sub checkpoint ($;$) {
 # public
 sub barrier { checkpoint($_[0], 1) };
 
+# true if locked and active
+sub active { !!$_[0]->{im} }
+
 # public
 sub done {
 	my ($self) = @_;
diff --git a/lib/PublicInbox/Watch.pm b/lib/PublicInbox/Watch.pm
index 5f786139..0bb92d0a 100644
--- a/lib/PublicInbox/Watch.pm
+++ b/lib/PublicInbox/Watch.pm
@@ -134,15 +134,34 @@ sub _done_for_now {
 sub remove_eml_i { # each_inbox callback
 	my ($ibx, $arg) = @_;
 	my ($self, $eml, $loc) = @$arg;
+
 	eval {
-		my $im = _importer_for($self, $ibx);
-		$im->remove($eml, 'spam');
-		if (my $scrub = $ibx->filter($im)) {
-			my $scrubbed = $scrub->scrub($eml, 1);
-			if ($scrubbed && $scrubbed != REJECT) {
-				$im->remove($scrubbed, 'spam');
+		# try to avoid taking a lock or unnecessary spawning
+		my $im = $self->{importers}->{"$ibx"};
+		my $scrubbed;
+		if ((!$im || !$im->active) && $ibx->over) {
+			if (content_exists($ibx, $eml)) {
+				# continue
+			} elsif (my $scrub = $ibx->filter($im)) {
+				$scrubbed = $scrub->scrub($eml, 1);
+				if ($scrubbed && $scrubbed != REJECT &&
+					  !content_exists($ibx, $scrubbed)) {
+					return;
+				}
+			} else {
+				return;
 			}
 		}
+
+		$im //= _importer_for($self, $ibx); # may spawn fast-import
+		$im->remove($eml, 'spam');
+		$scrubbed //= do {
+			my $scrub = $ibx->filter($im);
+			$scrub ? $scrub->scrub($eml, 1) : undef;
+		};
+		if ($scrubbed && $scrubbed != REJECT) {
+			$im->remove($scrubbed, 'spam');
+		}
 	};
 	if ($@) {
 		warn "error removing spam at: $loc from $ibx->{name}: $@\n";


* [PATCH 00/11] watch: fix contention w/ Maildir & NNTP
@ 2020-08-31  4:41  7% Eric Wong
  2020-08-31  4:41  6% ` [PATCH 05/11] watch: avoid unnecessary spawning on spam removals Eric Wong
  0 siblings, 1 reply; 2+ results
From: Eric Wong @ 2020-08-31  4:41 UTC (permalink / raw)
  To: meta

Here's a bunch of fixes to improve watch performance when
both Maildirs and NNTP are being watched (possibly on the same
inbox, or if `watchspam' is configured for spam removals).

Wakeups are reduced, and inbox.lock contention is minimized by
using read-only ->over to check for `watchspam' removals.

These affect IMAP, too; but I've been mainly using NNTP.
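
As a rough illustration of the "check read-only first, lock only if
needed" idea, here is a minimal standalone sketch.  The names in it
(%index, writer_remove, remove_spam, $spawned) are made up for the
example and are not the PublicInbox API; %index merely plays the
role of the read-only ->over lookup.

```perl
use strict;
use warnings;

# Hypothetical stand-ins: %index models the read-only lookup, and
# $spawned counts how often the expensive writer path (taking the
# lock, spawning fast-import) would have been entered.
my %index = ('<spam@example>' => 1);
my $spawned = 0;

sub writer_remove {
	my ($mid) = @_;
	$spawned++; # the lock and the spawn would happen here
	return defined(delete $index{$mid}) ? 1 : 0;
}

# Returns 1 if the message was removed, 0 if the removal was skipped.
sub remove_spam {
	my ($mid, $writer_active) = @_;
	# Cheap read-only check, only when no writer is already active.
	if (!$writer_active && !exists $index{$mid}) {
		return 0; # nothing to remove: no lock, no spawn
	}
	return writer_remove($mid);
}

my $skipped = remove_spam('<not-in-inbox@example>', 0);
my $removed = remove_spam('<spam@example>', 0);
print "skipped=$skipped removed=$removed spawned=$spawned\n";
```

Only the second call reaches the writer path, so the counter ends up
at 1 even though two removals were requested.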

Eric Wong (11):
  watch: limit batch size of NNTP and IMAP workers, too
  watchmaildir: use v5.10.1, drop warnings
  rename WatchMaildir => Watch
  watch: log signal activities to STDERR
  watch: avoid unnecessary spawning on spam removals
  watch: block signals before fork on non-signalfd/kevent systems
  watch: comments and tiny cleanups
  ds: avoid excessive queueing when reaping PIDs
  watch: use EOFpipe to reduce dwaitpid wakeups
  ds: avoid unnecessary timer for waitpid
  replace ParentPipe with EOFpipe

 MANIFEST                                      |   4 +-
 lib/PublicInbox/DS.pm                         |  38 +++---
 lib/PublicInbox/Daemon.pm                     |   6 +-
 lib/PublicInbox/EOFpipe.pm                    |  24 ++++
 lib/PublicInbox/Import.pm                     |   3 +
 lib/PublicInbox/ParentPipe.pm                 |  23 ----
 lib/PublicInbox/V2Writable.pm                 |   3 +
 lib/PublicInbox/{WatchMaildir.pm => Watch.pm} | 111 +++++++++++++-----
 script/public-inbox-watch                     |  34 ++++--
 t/imapd.t                                     |   2 +-
 t/nntpd.t                                     |   2 +-
 t/watch_filter_rubylang.t                     |   4 +-
 t/watch_imap.t                                |   4 +-
 t/watch_maildir.t                             |  18 +--
 t/watch_maildir_v2.t                          |  22 ++--
 t/watch_multiple_headers.t                    |   4 +-
 t/watch_nntp.t                                |   4 +-
 17 files changed, 190 insertions(+), 116 deletions(-)
 create mode 100644 lib/PublicInbox/EOFpipe.pm
 delete mode 100644 lib/PublicInbox/ParentPipe.pm
 rename lib/PublicInbox/{WatchMaildir.pm => Watch.pm} (92%)



Code repositories for project(s) associated with this public inbox

	https://80x24.org/public-inbox.git
