user/dev discussion of public-inbox itself
From: ebiederm@xmission.com (Eric W. Biederman)
To: Eric Wong <e@yhbt.net>
Cc: meta@public-inbox.org
Subject: [PATCH 2/2] imap_fetch: Add a command to continuously fetch from an imap mailbox
Date: Fri, 15 May 2020 16:02:00 -0500
Message-ID: <87a728c3p3.fsf_-_@x220.int.ebiederm.org>
In-Reply-To: <87ftc0c3r4.fsf_-_@x220.int.ebiederm.org> (Eric W. Biederman's message of "Fri, 15 May 2020 16:00:47 -0500")


The imap_fetch command connects to the specified IMAP mailbox,
fetches any unfetched messages, then waits with IMAP IDLE until there
are more messages to fetch.

By default, messages are placed in the specified public-inbox inbox.
If a List-ID header is present and matches a configured inbox, it is
used to select an alternate public-inbox inbox to place the message in.

The email messages are placed without modification into the
public-inbox repository to minimize the chances of corruption or of
losing valuable information.  I use imap_fetch for all of my email,
not just as a mailing list mirror, so I don't want automation to
accidentally cause something important to be lost.

No email messages are deleted from the server; instead, IMAPTracker
is used to remember which messages were downloaded.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
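
Not part of the commit message: a minimal, standalone sketch of the
List-ID routing described above, for illustration only.  It assumes a
plain hash (%inbox_for, hypothetical, with invented inbox names) in
place of the PublicInbox::Config lookup, and the helper names
parse_list_id() and destinations() are made up for this note.  The
regular expression is the one list_hdr_ibx() uses in the patch.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for $config->lookup_list_id(): List-ID => inbox name.
my %inbox_for = (
	'meta.public-inbox.org' => 'meta',
	'git.vger.kernel.org'   => 'git',
);

# Extract the bare list id from a raw List-ID header value, using the
# same pattern as list_hdr_ibx() in the patch.  Returns undef when the
# value contains a NUL or has no <...> part.
sub parse_list_id {
	my ($list_hdr) = @_;
	return undef if $list_hdr =~ m/\0/;
	return $1 if $list_hdr =~ m/\A[^<>]*<(\S*)>\s*\z/;
	return undef;
}

# Pick destination inboxes for one message: every List-ID that maps to
# a known inbox is used, otherwise fall back to the default inbox.
sub destinations {
	my ($default, @list_hdrs) = @_;
	my %seen;
	for my $hdr (@list_hdrs) {
		my $id = parse_list_id($hdr);
		next unless defined $id && exists $inbox_for{$id};
		$seen{ $inbox_for{$id} } = 1;
	}
	my @dests = sort keys %seen;
	return @dests ? @dests : ($default);
}

# Prints "meta", then "inbox" (no usable List-ID, so the default is used).
print join(',', destinations('inbox', 'public-inbox <meta.public-inbox.org>')), "\n";
print join(',', destinations('inbox', 'no angle brackets')), "\n";
```

A message carrying several List-ID headers ends up in each matching
inbox once, which mirrors the de-duplication by inbox name in
email_dest().
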
 scripts/imap_fetch | 336 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 336 insertions(+)
 create mode 100755 scripts/imap_fetch
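
Also not part of the commit message: a rough sketch of how the
position remembered by IMAPTracker turns into the next batch of UIDs
to fetch.  The helper name next_batch() and its arguments are
hypothetical, and the sketch simplifies the patch slightly by
filtering already-seen UIDs before applying the cap rather than
skipping them inside the fetch loop.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Given the UIDs returned by a search and the last UID recorded by the
# tracker (undef on a first run), return the batch to fetch now and the
# number of UIDs left over for a later pass.
sub next_batch {
	my ($uids, $last_uid, $cap) = @_;
	my @todo = sort { $a <=> $b } @$uids;
	# A "UID $last_uid:*" search can return the already-seen message,
	# so drop anything at or below the recorded position.
	@todo = grep { $_ > $last_uid } @todo if defined $last_uid;
	my $more = 0;
	if (@todo > $cap) {
		$more = @todo - $cap;
		@todo = @todo[0 .. $cap - 1];
	}
	return (\@todo, $more);
}

# Prints "fetch: 7 8, remaining: 1"
my ($batch, $more) = next_batch([7, 5, 6, 9, 8], 6, 2);
print "fetch: @$batch, remaining: $more\n";
```

Because the tracker is only updated after a batch has been imported,
an interrupted run should fetch the same UIDs again rather than skip
them.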

diff --git a/scripts/imap_fetch b/scripts/imap_fetch
new file mode 100755
index 000000000000..007f78a71b52
--- /dev/null
+++ b/scripts/imap_fetch
@@ -0,0 +1,336 @@
+#!/usr/bin/perl -w
+# Script to fetch IMAP messages and put them into a public-inbox
+=begin usage
+	./imap_fetch imap://username@hostname/mailbox inbox
+=cut
+use strict;
+use warnings;
+use Getopt::Long qw(:config gnu_getopt no_ignore_case auto_abbrev);
+use Mail::IMAPClient;
+use IO::Socket;
+use IO::Socket::SSL;
+use File::Sync qw(sync);
+use Term::ReadKey;
+use PublicInbox::IMAPTracker;
+use PublicInbox::InboxWritable;
+use POSIX qw(strftime);
+sub usage { "Usage:\n".join('', grep(/\t/, `head -n 5 $0`)) }
+my $verify_ssl = 1;
+my %opts = (
+	'--verify-ssl!' => \$verify_ssl,
+);
+GetOptions(%opts) or die usage();
+
+my $mail_url = shift @ARGV or die usage();
+my $inbox_name = shift @ARGV or die usage();
+my $mail_hostname;
+my $mail_username;
+my $mail_password;
+my $mailbox;
+if ($mail_url =~ m$\Aimap://([^@]+)[@]([^@]+)/(.+)\z$) {
+	$mail_username = $1;
+	$mail_hostname = $2;
+	$mailbox = $3;
+} else {
+	die usage();
+}
+
+my $url = 'imap://' . $mail_username . '@' . $mail_hostname . '/' . $mailbox ;
+
+sub list_hdr_ibx($$)
+{
+	my ($config, $list_hdr) = @_;
+	my $list_id;
+	if ($list_hdr =~ m/\0/) {
+		warn("Bad List-ID: $list_hdr contains a null\n");
+		return undef;
+	} elsif ($list_hdr =~ m/\A[^<>]*<(\S*)>\s*\z/) {
+		$list_id = $1;
+	} else {
+		warn("Bad List-ID: $list_hdr\n");
+		return undef;
+	}
+	my $ibx = $config->lookup_list_id($list_id);
+	if (!defined($ibx)) {
+		warn("Could not find inbox for List-ID: $list_id\n");
+	}
+
+	print(" List-ID: $list_id\n");
+	$ibx;
+}
+
+sub email_dest($$)
+{
+	my ($config, $mime) = @_;
+	my %ibxs;
+	my $hdr = $mime->header_obj;
+	my @list_hdrs = $hdr->header_raw('List-ID');
+	for my $list_hdr (@list_hdrs) {
+		my $ibx = list_hdr_ibx($config, $list_hdr);
+		if (defined($ibx)) {
+			$ibxs{$ibx->{name}} = $ibx;
+		}
+	}
+	my @ibxs = values %ibxs;
+	return @ibxs;
+}
+
+if (-t STDIN) {
+	print("Enter your imap password: ");
+	ReadMode('noecho');
+	$mail_password = ReadLine(0);
+	ReadMode('normal');
+	print("\n");
+} else {
+	print("Not a tty\n");
+	$mail_password = readline();
+}
+die("No password") unless defined($mail_password);
+chomp($mail_password);
+
+sub imap_ssl_client()
+{
+	my %ca = eval { IO::Socket::SSL::default_ca(); };
+	my $socket = IO::Socket::SSL->new(
+		PeerAddr => $mail_hostname,
+		PeerPort => 993,
+		Timeout  => 5,
+		SSL_verify_mode => $verify_ssl ? SSL_VERIFY_PEER : SSL_VERIFY_NONE,
+		%ca,
+	);
+	if (!defined($socket)) {
+		warn("Could not open socket to mailserver: $@\n");
+		return undef;
+	}
+	my $client = Mail::IMAPClient->new(
+		Socket   => $socket,
+		User     => $mail_username,
+		Password => $mail_password,
+		Timeout  => 5,
+	);
+	if (!defined($client)) {
+		warn("Could not initialize imap client $@\n");
+		$socket->close();
+		return undef;
+	}
+	if (!$client->IsConnected()) {
+		warn("LastIMAPCommand: " . $client->LastIMAPCommand . "\n");
+		warn("LastError: " . $client->LastError . "\n");
+		warn("Could not connect to the IMAP server: $@\n");
+		$client = undef;
+		$socket->close();
+		return undef;
+	}
+	if (!$client->IsAuthenticated()) {
+		warn("LastIMAPCommand: " . $client->LastIMAPCommand . "\n");
+		warn("LastError: " . $client->LastError . "\n");
+		warn("Could not authenticate against IMAP: $@\n");
+		$client->logout();
+		$client = undef;
+		$socket->close();
+		return undef;
+	}
+
+	return $client;
+}
+
+sub setup_mailbox($$)
+{
+	my ($client, $mailbox) = @_;
+
+	$client->Peek(1);
+	$client->Uid(1);
+
+	$client->select($mailbox);
+	my @results = $client->Results();
+	my $validity = undef;
+	foreach (@results) {
+		if ($_ =~ /^\* OK \[UIDVALIDITY ([0-9]+)\].*$/) {
+			$validity = $1;
+			last;
+		}
+	}
+	if (!defined($validity) && $client->IsConnected()) {
+		$validity = $client->uidvalidity($mailbox);
+	}
+	die("No uid validity for $mailbox") unless $validity;
+
+	return ($validity);
+}
+
+sub fetch_mailbox ($$$$$$)
+{
+	my ($config, $tracker, $client, $mailbox, $validity, $default_ibx) = @_;
+	my $now = time();
+	print("mailbox: $mailbox @ " .
+	      strftime("%Y-%m-%d %H:%M:%S %z", localtime(time()))
+	      . "\n");
+
+	my %importers;
+	my ($last_validity, $last_uid) = $tracker->get_last();
+
+	if (defined($last_validity) and ($validity ne $last_validity)) {
+		die ("Unexpected uid validity $validity expected $last_validity");
+	}
+
+	my $search_str="ALL";
+	if (defined($last_uid)) {
+		# Find the last seen and all higher articles
+		$search_str = "UID $last_uid:*";
+	}
+	my $uids = $client->search($search_str);
+	if (!defined($uids) || (scalar(@$uids) == 0)) {
+		print("$mailbox: No uids found for '$search_str'! $@\n");
+		return 0;
+	}
+
+	my $last = undef;
+	my @sorted_uids = sort { $a <=> $b } @$uids;
+	# Cap the number of uids to process at once
+	my $more = 0;
+	my $uid_count = scalar(@sorted_uids);
+	if ($uid_count > 100) {
+		@sorted_uids = @sorted_uids[0..99];
+		$more = $uid_count - 100;
+	}
+	for my $uid (@sorted_uids) {
+		last unless $client->IsConnected();
+
+		print("$mailbox UID: $validity $uid\n");
+		if (defined($last_uid)) {
+			if ($uid == $last_uid) {
+				next;
+			}
+			if ($uid < $last_uid) {
+				print("$mailbox: uid $uid is below last $last_uid, skipping.\n");
+				next;
+			}
+		}
+		my $email_str = $client->message_string($uid) or die "Could not message_string $@\n";
+		my $email_len = length($email_str);
+		my $mime = Email::MIME->new($email_str);
+		$mime->{-public_inbox_raw} = $email_str;
+
+		my @dests = email_dest($config, $mime);
+		if (scalar(@dests) == 0) {
+			push(@dests, $default_ibx);
+		}
+		die ("no destination for the email") unless scalar(@dests) > 0;
+		#printf("$mailbox dests: %d\n", scalar(@dests));
+		for my $ibx (@dests) {
+			my $name = $ibx->{name};
+			my $im;
+			if (exists($importers{$name})) {
+				$im = $importers{$name}->[0];
+			} else {
+				my $wibx = PublicInbox::InboxWritable->new($ibx);
+				die "no wibx" unless defined($wibx);
+				$im = $wibx->importer(1);
+				die "no im" unless defined($im);
+				my @arr = ( $im, $ibx );
+				$importers{$name} = \@arr;
+			}
+			$im->add($mime);
+		}
+		$last = $uid;
+	}
+
+	if ($last) {
+		die ("no ibx's for $tracker->{url}") unless scalar(keys %importers) > 0;
+		for my $name (keys %importers) {
+			my $ref = delete $importers{$name};
+			my ($im, $ibx) = @$ref;
+			$im->done();
+		}
+		print("updating tracker for $tracker->{url}...\n");
+		$tracker->update_last($validity, $last);
+	}
+
+	return $more;
+}
+
+
+sub fetch_mailbox_loop($)
+{
+	my ($mailbox) = @_;
+	my $config = eval { PublicInbox::Config->new };
+	die("No public inbox config found!") unless $config;
+
+	my $ibx = $config->lookup_name($inbox_name);
+	die("Public inbox $inbox_name not found!") unless defined($ibx);
+
+	my $tracker = PublicInbox::IMAPTracker->new($url);
+	my $client = imap_ssl_client() || die("No imap connection");
+	my $validity = setup_mailbox($client, $mailbox);
+
+	for (;;) {
+		return unless $client->IsConnected();
+		my $more;
+		do {
+			$more = fetch_mailbox($config, $tracker, $client, $mailbox, $validity, $ibx);
+			return unless $client->IsConnected();
+		} while ($more > 0);
+
+		my @untagged;
+		do {
+			my $max_idle = 15;
+			$client->idle() or die("idle failed!\n");
+			my @results = $client->idle_data($max_idle*60);
+			return unless $client->IsConnected();
+			my @ret = $client->done();
+			push(@results, @ret);
+			for my $line (@results) {
+				next if (!defined($line));
+				if ($line =~ m/^[*].*$/) {
+					push(@untagged, $line);
+				}
+			}
+		} while (scalar(@untagged) == 0);
+		print("$mailbox: untagged: '@untagged'\n");
+	}
+
+	$client->close($mailbox);
+	$client->logout();
+
+}
+
+sub handle_mailbox($)
+{
+	# Run fetch_mailbox_loop in its own separate process so
+	# that if something goes wrong the process exits
+	# and everything cleans up properly.
+	#
+	# Running fetch_mailbox_loop in an eval block is not enough
+	# to prevent leaks of locks and other resources.
+	my ($mailbox) = @_;
+
+	for (;;) {
+		my $child = fork();
+		if (!defined($child)) {
+			warn("fork failed for $mailbox: $!\n");
+			next;
+		}
+		elsif ($child == 0) {
+			# in the child
+			fetch_mailbox_loop($mailbox);
+			exit(0);
+		}
+		else {
+			my $sleep = 5;
+			warn("------------------------- CHILD: $child $mailbox -------------------------\n");
+			my $pid = waitpid($child, 0);
+			if ($pid != $child) {
+				exit(1);
+			}
+			print("$mailbox done\n");
+			sync();
+			print("\n$mailbox: Sleeping for $sleep minutes\n\n");
+			sleep($sleep*60);
+		}
+	}
+	exit(2);
+}
+
+handle_mailbox($mailbox);
+
+1;
-- 
2.20.1


Thread overview: 17+ messages
2019-10-29 14:40 I have figured out IMAP IDLE Eric W. Biederman
2019-10-29 22:31 ` Eric Wong
2019-10-29 23:12   ` WWW::Curl [was: I have figured out IMAP IDLE] Eric Wong
2019-11-03 16:28   ` I have figured out IMAP IDLE Eric W. Biederman
2020-05-13 19:31 ` Eric Wong
2020-05-13 21:48   ` Eric W. Biederman
2020-05-13 22:17     ` Eric Wong
2020-05-14 12:32       ` Eric W. Biederman
2020-05-14 16:15         ` Eric Wong
2020-05-15 21:00         ` [PATCH 1/2] IMAPTracker: Add a helper to track our place in reading imap mailboxes Eric W. Biederman
2020-05-15 21:02           ` Eric W. Biederman [this message]
2020-05-15 21:26             ` [PATCH 2/2] imap_fetch: Add a command to continuously fetch from an imap mailbox Eric W. Biederman
2020-05-15 22:56               ` Eric Wong
2020-05-16 10:47                 ` Eric W. Biederman
2020-05-16 19:12                   ` Eric Wong
2020-05-16 20:09                     ` Eric W. Biederman
2020-05-16 22:53                       ` [PATCH] confine Email::MIME use even further Eric Wong
