user/dev discussion of public-inbox itself
From: Eric Wong <e@yhbt.net>
To: meta@public-inbox.org
Subject: [PATCH 25/34] watch: remove {mdir} array
Date: Sat, 27 Jun 2020 10:03:51 +0000
Message-ID: <20200627100400.9871-26-e@yhbt.net>
In-Reply-To: <20200627100400.9871-1-e@yhbt.net>

Since we store all watched directory names as keys in %mdmap,
there should be no need to keep an array of those directories
around.

t/watch_maildir*.t required changes to remove trained spam.
Once we've trained something as spam, there shouldn't be
a need to rescan it.
---
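A minimal sketch of the idea, not code from this patch: the keys of a
hash shaped like %mdmap already form a duplicate-free directory list,
so both the path-matching regexp and the list of directories to watch
can be derived from them. The directory names, the inbox name and the
watched_dir() helper below are hypothetical and only for illustration.

```perl
use strict;
use warnings;

# Hypothetical stand-in for %mdmap:
# directory => 'watchspam' or [ inbox objects ] (plain strings here)
my %mdmap = (
	'/home/user/Maildir/.spam/cur' => 'watchspam',
	'/home/user/Maildir/.list/new' => [ 'example-inbox' ],
	'/home/user/Maildir/.list/cur' => [ 'example-inbox' ],
);

# Hash keys are unique by construction, so no separate %uniq counter
# or @mdir array is needed.  Sorting is only for stable output in this
# sketch; the patch itself uses the keys unsorted.
my @dirs = sort keys %mdmap;

# The same kind of alternation the patch builds for {mdre}
my $mdre = join('|', map { quotemeta($_) } @dirs);
$mdre = qr!\A($mdre)/!;

# Hypothetical helper: return the watched directory containing $path,
# or undef if $path is not under any watched directory.
sub watched_dir {
	my ($path) = @_;
	return $path =~ $mdre ? $1 : undef;
}
```

Calling watched_dir() with a path under one of the keys returns that
key, which can then be used to look up the destination in %mdmap.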
 lib/PublicInbox/WatchMaildir.pm | 22 ++++++++--------------
 t/watch_maildir.t               |  2 ++
 t/watch_maildir_v2.t            |  2 ++
 3 files changed, 12 insertions(+), 14 deletions(-)

diff --git a/lib/PublicInbox/WatchMaildir.pm b/lib/PublicInbox/WatchMaildir.pm
index 621d41bd81d..8d2dc432684 100644
--- a/lib/PublicInbox/WatchMaildir.pm
+++ b/lib/PublicInbox/WatchMaildir.pm
@@ -40,8 +40,7 @@ sub compile_watchheaders ($) {
 
 sub new {
 	my ($class, $config) = @_;
-	my (%mdmap, @mdir, $spamc);
-	my %uniq; # directory => count
+	my (%mdmap, $spamc);
 	my %imap; # url => [inbox objects] or 'watchspam'
 
 	# "publicinboxwatch" is the documented namespace
@@ -54,10 +53,7 @@ sub new {
 		for my $dir (@$dirs) {
 			if (is_maildir($dir)) {
 				# skip "new", no MUA has seen it, yet.
-				my $cur = "$dir/cur";
-				push @mdir, $cur;
-				$uniq{$cur}++;
-				$mdmap{$cur} = 'watchspam';
+				$mdmap{"$dir/cur"} = 'watchspam';
 			} elsif (my $url = imap_url($dir)) {
 				$imap{$url} = 'watchspam';
 			} else {
@@ -83,8 +79,6 @@ sub new {
 				my ($new, $cur) = ("$watch/new", "$watch/cur");
 				my $cur_dst = $mdmap{$cur} //= [];
 				return if is_watchspam($cur, $cur_dst, $ibx);
-				push @mdir, $new unless $uniq{$new}++;
-				push @mdir, $cur unless $uniq{$cur}++;
 				push @{$mdmap{$new} //= []}, $ibx;
 				push @$cur_dst, $ibx;
 			} elsif (my $url = imap_url($watch)) {
@@ -96,17 +90,16 @@ sub new {
 			}
 		}
 	});
-	return unless scalar(@mdir) || scalar(keys %imap);
 
 	my $mdre;
-	if (@mdir) {
-		$mdre = join('|', map { quotemeta($_) } @mdir);
+	if (scalar keys %mdmap) {
+		$mdre = join('|', map { quotemeta($_) } keys %mdmap);
 		$mdre = qr!\A($mdre)/!;
 	}
+	return unless $mdre || scalar(keys %imap);
 	bless {
 		spamcheck => $spamcheck,
 		mdmap => \%mdmap,
-		mdir => \@mdir,
 		mdre => $mdre,
 		config => $config,
 		imap => scalar keys %imap ? \%imap : undef,
@@ -231,7 +224,8 @@ sub watch_fs_init ($) {
 		$self->{done_timer} //= PublicInbox::DS::requeue($done);
 	};
 	require PublicInbox::DirIdle;
-	PublicInbox::DirIdle->new($self->{mdir}, $cb); # EPOLL_CTL_ADD
+	# inotify_create + EPOLL_CTL_ADD
+	PublicInbox::DirIdle->new([keys %{$self->{mdmap}}], $cb);
 }
 
 # returns the git config section name, e.g [imap "imaps://user@example.com"]
@@ -688,7 +682,7 @@ sub fs_scan_step {
 		$opendirs->{$dir} = $dh if $n < 0;
 	}
 	if ($op && $op eq 'full') {
-		foreach my $dir (@{$self->{mdir}}) {
+		foreach my $dir (keys %{$self->{mdmap}}) {
 			next if $opendirs->{$dir}; # already in progress
 			my $ok = opendir(my $dh, $dir);
 			unless ($ok) {
diff --git a/t/watch_maildir.t b/t/watch_maildir.t
index c8658140cf2..c44273f0519 100644
--- a/t/watch_maildir.t
+++ b/t/watch_maildir.t
@@ -84,6 +84,7 @@ PublicInbox::WatchMaildir->new($config)->scan('full');
 is(scalar @list, 2, 'two revisions in rev-list');
 @list = $git->qx(qw(ls-tree -r --name-only refs/heads/master));
 is(scalar @list, 0, 'tree is empty');
+is(unlink(glob("$spamdir/cur/*")), 1, 'unlinked trained spam');
 
 # check with scrubbing
 {
@@ -105,6 +106,7 @@ More majordomo info at  http://vger.kernel.org/majordomo-info.html\n);
 	is(scalar @list, 0, 'tree is empty');
 	@list = $git->qx(qw(rev-list refs/heads/master));
 	is(scalar @list, 4, 'four revisions in rev-list');
+	is(unlink(glob("$spamdir/cur/*")), 1, 'unlinked trained spam');
 }
 
 {
diff --git a/t/watch_maildir_v2.t b/t/watch_maildir_v2.t
index 6cc8b6ff0e9..f5b8e932985 100644
--- a/t/watch_maildir_v2.t
+++ b/t/watch_maildir_v2.t
@@ -71,6 +71,7 @@ $write_spam->();
 is(unlink(glob("$maildir/new/*")), 1, 'unlinked old spam');
 PublicInbox::WatchMaildir->new($config)->scan('full');
 is(($srch->reopen->query(''))[0], 0, 'deleted file');
+is(unlink(glob("$spamdir/cur/*")), 1, 'unlinked trained spam');
 
 # check with scrubbing
 {
@@ -90,6 +91,7 @@ More majordomo info at  http://vger.kernel.org/majordomo-info.html\n);
 	PublicInbox::WatchMaildir->new($config)->scan('full');
 	($nr, $msgs) = $srch->reopen->query('');
 	is($nr, 0, 'inbox is empty again');
+	is(unlink(glob("$spamdir/cur/*")), 1, 'unlinked trained spam');
 }
 
 {
