user/dev discussion of public-inbox itself
From: ebiederm@xmission.com (Eric W. Biederman)
To: Eric Wong <e@80x24.org>
Cc: Alyssa Ross <hi@alyssa.is>,  meta@public-inbox.org
Subject: Re: Do I need multiple publicinbox.<name>.address values?
Date: Tue, 08 Oct 2019 17:24:33 -0500
Message-ID: <87wodec1um.fsf@x220.int.ebiederm.org>
In-Reply-To: <20191008221108.3wsso25kviiwd7ek@dcvr>

Eric Wong <e@80x24.org> writes:

> "Eric W. Biederman" <ebiederm@xmission.com> wrote:
>> Eric Wong <e@80x24.org> writes:
>> 
>> > Alyssa Ross <hi@alyssa.is> wrote:
>> >
>> >> Subject: Do I need multiple publicinbox.<name>.address values?
>> >
>> > Absolutely not
>> >
>> >> Suppose I have a mailing list, foo-discuss@example.org, and a
>> >> public-inbox set up, subscribed to that mailing list, that is subscribed
>> >> to that list as public-inbox+foo-discuss@example.org, where my MTA
>> >> delivers to public-inbox using public-inbox-mda.
>> >
>> > Currently, -mda does if you're mirroring, unfortunately.  I
>> > think Eric Biederman is/was working on List-Id support to drop
>> > that requirement, but I'm not sure where that is...
>> >
>> > Eric B: would you mind if I take List-Id support over?  I've got
>> > some hours free in the coming day(s)... (I think :x)
>> 
>> I believe I have the config side of the work done.  I haven't
>> figured out how to add this to public-inbox-mda/public-inbox-watch.
>> 
>> Let me send out what I have and then you can work on the bits
>> for public-inbox-watch public-inbox-mda.
>
> Thanks, will do.
>
>> Last round I was messing with this I almost had my imap fetcher in
>> shape. I may try again but let's get the listid thing settled first
>> if we can.
>
> Alright.
>
> Btw, would using libcurl for IMAP support be easier for you?

Right now I think I just need to make certain all of my prereqs are
merged and push the code out.  I just don't have as much time to work
on these things as I am used to so it is taking me forever to get
anything done.

For what it is worth, below is my IMAP import script.

The hardest part has been making certain I get the error handling
correct when unexpected errors happen.  Because errors do happen.
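
One way to contain such errors, and roughly what the main loop of the
script below does, is to run each fetch in its own process so that a
failure cannot leave locks or half-open resources behind in the
long-running parent.  The following is only a minimal sketch of that
pattern: the helper name run_isolated is hypothetical, and the use of
eval plus POSIX::_exit in the child is an assumption of this sketch
(the script itself calls plain exit()).

```perl
use strict;
use warnings;
use POSIX ();

# Hypothetical helper: run $task in a child process and return the raw
# wait status ($?); 0 means the child finished without dying.
sub run_isolated {
	my ($task) = @_;
	my $pid = fork();
	die "fork failed: $!\n" unless defined $pid;
	if ($pid == 0) {
		# child: turn any die() into a nonzero exit status
		my $ok = eval { $task->(); 1 };
		warn "child failed: $@" unless $ok;
		# _exit skips END blocks and any buffered output
		# inherited from the parent
		POSIX::_exit($ok ? 0 : 1);
	}
	# parent: reap the child so zombies don't accumulate
	waitpid($pid, 0);
	return $?;
}

my $status = run_isolated(sub { die "simulated IMAP failure\n" });
print "fetch child failed, will retry after sleeping\n" if $status != 0;
```

Whatever the child held (sockets, lock files, importer state) is
released by the kernel when it exits, so the parent only has to look
at the status and decide whether to retry.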

> I'm considering introducing libcurl support via Inline::C (it
> might be easier to hook into the event loop for other things).
>
> I'm also thinking about distributing some C via *.h header files
> so it's easier to sparse/lint the code w/o it being embedded
> inside a Perl file (I'd send those *.h files to Inline::C for
> production use).
>
>
>
> Using *.h since MakeMaker will automatically assume any *.c
> files will be made into an XS extension, and I'm not sure how to
> workaround that... (using GNU make w/o MakeMaker is an option)
>
> Whether it's Perl or C; I want to keep everything on end-user
> systems in source form, so they can just hack/update the source
> if needed instead of having to find/download/build it to
> experiment.


#<--- scripts/import_imap_mailbox ---->
#!/usr/bin/perl -w
# Script to import an IMAP mailbox into a public-inbox
=begin usage
	./import_imap_mailbox imap://username@hostname
=cut
use strict;
use warnings;
use Getopt::Long qw(:config gnu_getopt no_ignore_case auto_abbrev);
use Mail::IMAPClient;
use IO::Socket;
use IO::Socket::SSL;
use Data::Dumper;
use Email::Simple;
use Email::MIME;
use File::Basename;
use File::Sync qw(sync);
use Term::ReadKey;
use PublicInbox::Config;
use PublicInbox::Address;
use PublicInbox::IMAPTracker;
use PublicInbox::InboxWritable;
use POSIX qw(strftime);
sub usage { "Usage:\n".join('', grep(/\t/, `head -n 24 $0`)) }
my $push_origin = 1;
my $sanitize = 1;
my %opts = (
	'--push-origin!' => \$push_origin,
	'--sanitize!' => \$sanitize,
);
GetOptions(%opts) or die usage();

my $mail_url = shift @ARGV or die usage();
my $mail_hostname;
my $mail_domainname;
my $mail_username;
my $mail_password;
# Exclude '/' from the hostname so an optional trailing slash is not captured
if ($mail_url =~ m{\Aimap://([^@]+)[@]([^/@]+)/?\z}) {
	$mail_username = $1;
	$mail_hostname = $2;
	if ($mail_hostname eq "mail.xmission.com") {
		$mail_domainname = "xmission.com";
	}
} else {
	die usage();
}

chomp(my $committer_name = `git config user.name`);
chomp(my $committer_email = `git config user.email`);
my $committer = "$committer_name <$committer_email>";

my $mail_addr     = $mail_username . '@' . $mail_domainname;
my $url_base = 'imap://' . $mail_username . '@' . $mail_hostname . '/' ;

sub list_hdr_ibx($$)
{
	my ($config, $list_hdr) = @_;
	my $list_id;
	if ($list_hdr =~ m/\0/) {
		warn("Bad List-ID: $list_hdr contains a null\n");
		return undef;
	} elsif ($list_hdr =~ m/\A[^<>]*<(\S*)>\s*\z/) {
		$list_id = $1;
	} else {
		warn("Bad List-ID: $list_hdr\n");
		return undef;
	}
	my $ibx = $config->lookup_list_id($list_id);
	if (!defined($ibx)) {
		warn("Could not find inbox for List-ID: $list_id\n");
	}

	print(" List-ID: $list_id\n");
	$ibx;
}

sub deliveredto_ibx($$)
{
	my ($config, $deliveredto) = @_;
	my ($email) = PublicInbox::Address::emails($deliveredto);
	if (!defined($email)) {
		warn("No email in Delivered-To: $deliveredto\n");
		return undef;
	}
	my $ibx = $config->lookup($email);
	if (!defined($ibx)) {
		warn("Could not find inbox for deliveredto: $email\n");
		return undef;
	}

	print(" deliveredto: $deliveredto\n");
	$ibx;
}

sub mbox_default_ibx($$) {
	my ($config, $mailbox) = @_;
	my $addr = $mail_username . '+' . $mailbox . '@' . $mail_domainname;
	if ($mailbox eq 'INBOX') {
		$addr = $mail_addr;
	}
	my $ibx = $config->lookup($addr);
	if (defined($ibx) and !defined($ibx->{mainrepo})) {
		$ibx = undef;
	}
	$ibx;
}

sub ibx_gitdir($)
{
	my ($ibx) = @_;
	my $repo = $ibx->{mainrepo};
	die("Inbox without mainrepo") unless defined($repo);
	my $git_dir = $repo;
	if (-d "$repo/git") {
		my $last = 0;
		opendir(my $dh, "$repo/git") || die("Cannot open git dir $repo/git/: $!\n");
		while(my $name = readdir($dh)) {
			# Match epoch directories such as "0.git", "1.git", ...
			if ($name =~ m/\A([0-9]+)\.git\z/) {
				if ($last < $1) {
					$last = $1;
				}
			}
		}
		closedir($dh);
		$git_dir = "$repo/git/$last.git";
	}
	return $git_dir;
}

sub email_dest($$)
{
	my ($config, $mime) = @_;
	my %ibxs;
	my $hdr = $mime->header_obj;
	my @list_hdrs = $hdr->header_raw('List-ID');
	for my $list_hdr (@list_hdrs) {
		my $ibx = list_hdr_ibx($config, $list_hdr);
		if (defined($ibx)) {
			$ibxs{$ibx->{mainrepo}} = $ibx;
		}
	}
	my @DeliveredTo = $hdr->header_raw('Delivered-To');
	for my $deliveredto (@DeliveredTo) {
		my $ibx = deliveredto_ibx($config, $deliveredto);
		if (defined($ibx)) {
			$ibxs{$ibx->{mainrepo}} = $ibx;
		}
	}
	my @ibxs = values %ibxs;
	return @ibxs;
}

sub index_inbox ($)
{
	my ($ibx) = @_;
	my $repo = $ibx->{mainrepo};
	my $pid = fork();
	if (!defined($pid)) {
		warn("public-inbox-index $repo failed to start\n");
		return;
	}
	if ($pid != 0) {
		# Wait for the process so zombies don't accumulate
		waitpid($pid, 0);
		return;
	}
	# child
	exec( 'public-inbox-index', $repo);
	warn("exec public-inbox-index $repo failed\n");
	exit(1);
}

sub imap_sleep ()
{
	my $sleep = 5;
	print "\nSleeping for $sleep minutes\n\n";
	sleep($sleep*60);
}

sub fsck_inbox ($)
{
	my ($git_dir) = @_;
	if (system(('git', '--git-dir', $git_dir, 'fsck')) != 0) {
		die "git fsck failed: $?";
	}
}

sub push_inbox ($)
{
	my ($git_dir) = @_;
	if (!$push_origin) {
		return;
	}
	print("pushing $git_dir...\n");
	if (system(('git', '--git-dir', $git_dir, 'push', 'origin')) != 0) {
		die "git push failed: $?";
	}
}

print("Enter your imap password: ");
ReadMode('noecho');
$mail_password = ReadLine(0);
ReadMode('normal');
chomp($mail_password);
print("\n");

sub fetch_mailbox ($$$$)
{
	my ($config, $tracker, $client, $mailbox) = @_;
	my $now = time();
	print("mailbox: $mailbox @ " .
	      strftime("%Y-%m-%d %H:%M:%S %z", localtime($now))
	      . "\n");

	my $default_ibx = mbox_default_ibx($config, $mailbox);

	if (!defined($default_ibx)) {
		print("skipping $mailbox no default inbox\n");
		return 0;
	}

	my %importers;
	my $url = $url_base  . $mailbox;
	my ($last_validity, $last_uid) = $tracker->get_last($url);

	$client->select($mailbox);
	$client->Peek(1);
	$client->Uid(1);
	my $validity = $client->uidvalidity($mailbox) or die("No uid validity");
	if (defined($last_validity) and ($validity ne $last_validity)) {
		die ("Unexpected uid validity $validity expected $last_validity");
	}

	my $search_str="ALL";
	if (defined($last_uid)) {
		# Find the last seen and all higher articles
		$search_str = "UID $last_uid:*";
	}
	my $uids = $client->search($search_str);
	if (!defined($uids) || (scalar(@$uids) == 0)) {
		print("No uids found! $@\n");
		return 0;
	}

	my $last = undef;
	my @sorted_uids = sort { $a <=> $b } @$uids;
	# Cap the number of uids to process at once
	my $more = 0;
	my $uid_count = scalar(@sorted_uids);
	if ($uid_count > 100) {
		@sorted_uids = @sorted_uids[0..99];
		$more = $uid_count - 100;
	}
	for my $uid (@sorted_uids) {
		print("UID: $validity $uid\n");
		if (defined($last_uid)) {
			if ($uid == $last_uid) {
				next;
			}
			if ($uid < $last_uid) {
				print("uid $uid is below last $last_uid, skipping.\n");
				next;
			}
		}
		my $email_str = $client->message_string($uid) or die "Could not fetch message $uid: $@\n";
		my $email_len = length($email_str);
		my $mime = Email::MIME->new($email_str);
		$mime->{-public_inbox_raw} = $email_str;

		my @dests = email_dest($config, $mime);
		if (scalar(@dests) == 0) {
			push(@dests, $default_ibx);
		}
		die ("no destination for the email") unless scalar(@dests) > 0;
		printf(" dests: %d\n", scalar(@dests));
		for my $ibx (@dests) {
			my $git_dir = ibx_gitdir($ibx);
			print " git_dir: $git_dir\n";
			my $im = $importers{$git_dir}->[0];
			if (!defined($im)) {
				$ibx = PublicInbox::InboxWritable->new($ibx);
				$im = $ibx->importer(1);
				die "no im" unless defined($im);
				my @arr = ( $im, $ibx );
				$importers{$git_dir} = \@arr;
			}
			if (defined($im->{mm}->{num_highwater})) {
				print "Last: $git_dir: " . $im->{mm}->{num_highwater} . "\n";
			} else {
				print "Last: $git_dir: 0\n";
			}
			$im->add($mime);
			print "This: $git_dir: " . $im->{mm}->{num_highwater} . "\n";
		}
		#$client->delete_message($uid);
		$last = $uid;
	}

	if ($last) {
		die ("no git_dirs for $url") unless scalar(keys %importers) > 0;
		for my $git_dir (keys %importers) {
			my $ref = delete $importers{$git_dir};
			my ($im, $ibx) = @$ref;
			$im->done();
			push_inbox($git_dir);
		}
		print("updating tracker for $url...\n");
		$tracker->update_last($url, $validity, $last);

		#$client->delete_message($uids) or die ("Could not delete messages");
	}

	$client->close($mailbox);
	return $more;
}

sub fetch_mail ()
{
	my $config = eval { PublicInbox::Config->new };
	die("No public inbox config found!") unless $config;

	my $tracker = PublicInbox::IMAPTracker->new();

	my $socket = IO::Socket::SSL->new(
		PeerAddr => $mail_hostname,
		PeerPort => 993,
		Timeout  => 5,

		SSL_verify_mode => SSL_VERIFY_PEER,
		IO::Socket::SSL::default_ca(),
	);
	if (!defined($socket)) {
		die("Could not open socket to mailserver: $@\n");
	}

	my $client = Mail::IMAPClient->new(
		Socket   => $socket,
		User     => $mail_username,
		Password => $mail_password,
		Timeout  => 5,
	    );
	if (!defined($client)) {
		die("Could not initialize imap client $@\n");
	}
	if (!$client->IsAuthenticated()) {
		die("Could not authenticate against IMAP: $@\n");
	}
	if (!$client->IsConnected()) {
		die("Could not connect to the IMAP server: $@\n");
	}

	my $mailboxes = $client->folders();
	my @sorted_mailboxes = sort { $a  cmp $b } @$mailboxes;

	my $more;
	do {
		$more = 0;
		for my $mailbox (@sorted_mailboxes) {
			$more += fetch_mailbox($config, $tracker, $client, $mailbox);
		}
	} while ($more > 0);

	$client->logout();
}

sub relevant_inbox
{
	my ($ibx) = @_;
	# Verify the mailbox is one that would come from the server
	my $lc_user = lc($mail_username);
	my $lc_domain = lc($mail_domainname);
	foreach (@{$ibx->{address}}) {
		my $lc_addr = lc($_);
		# Quote the user and domain so '.' and similar are matched literally
		if ($lc_addr =~ m/\A\Q${lc_user}\E[+][^@]+[@]\Q${lc_domain}\E\z/) {
			return 1;
		}
	}
	return 0;
}

sub sanitize_inboxes ()
{
	my $config = eval { PublicInbox::Config->new };
	die("No public inbox config found!") unless $config;

	$config->each_inbox(
		sub {
			my ($ibx) = @_;

			return unless relevant_inbox($ibx);

			print $ibx->{name} . "\n";
			my $git_dir = ibx_gitdir($ibx);
			fsck_inbox($git_dir);
			eval { push_inbox($git_dir); };
			index_inbox($ibx);
		}
	);
}

if ($sanitize) {
	sanitize_inboxes();
	sync();
}

for (;;imap_sleep()) {
	# Run fetch_mail in its own separate process so
	# that if something goes wrong the process exits
	# and everything cleans up properly.
	#
	# Running fetch_mail in an eval block is not enough
	# to prevent leaks of locks and other resources.
	my $pid = fork();
	if (!defined($pid)) {
		warn("fork failed: $!\n");
		# "next" still runs imap_sleep() before the retry
		next;
	}
	elsif ($pid == 0) {
		# in the child
		fetch_mail();
		exit(0);
	}
	else {
		warn("-------------------- CHILD: $pid --------------------\n");
		waitpid($pid, 0);
		sync();
	}
}

1;
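
For anyone who wants to try the List-ID matching on its own, here is a
minimal standalone sketch of the header parsing that list_hdr_ibx()
performs before the config lookup.  The function name extract_list_id
and the sample headers are hypothetical; the two patterns are the ones
used in the script above.

```perl
use strict;
use warnings;

# Hypothetical standalone version of the List-ID matching in
# list_hdr_ibx().  Returns the id between the angle brackets, or
# undef for a malformed header.
sub extract_list_id {
	my ($list_hdr) = @_;
	return unless defined $list_hdr;
	# reject headers with an embedded NUL
	return if $list_hdr =~ m/\0/;
	# optional description, then <id>, then optional whitespace
	return $1 if $list_hdr =~ m/\A[^<>]*<(\S*)>\s*\z/;
	return;
}

for my $hdr ('Foo Discuss <foo-discuss.example.org>', 'no brackets here') {
	my $id = extract_list_id($hdr);
	print defined($id) ? "ok: $id\n" : "bad: $hdr\n";
}
```

The returned id is what would be handed to the config lookup; a
header that fails to parse is skipped, and the message can still be
routed by Delivered-To or fall back to the mailbox's default inbox.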


Thread overview: 21+ messages
2019-10-07 22:13 Alyssa Ross
2019-10-08  0:10 ` Eric Wong
2019-10-08 12:18   ` ebiederm
2019-10-08 12:23     ` [PATCH] Config.pm: Add support for mailing list information ebiederm
2019-10-08 22:11     ` Do I need multiple publicinbox.<name>.address values? Eric Wong
2019-10-08 22:24       ` ebiederm [this message]
2019-10-08 22:41         ` Eric Wong
2019-10-09  7:58           ` ebiederm
2019-10-09  8:15             ` [PATCH 0/4] Various bits to support import_imap_mailbox ebiederm
2019-10-09  8:16               ` [PATCH 1/4] PublicInbox::Import Smuggle a raw message into add ebiederm
2019-10-09  8:17               ` [PATCH 2/4] PublicInbox::Config: Process mailboxes in sorted order ebiederm
2019-10-10  9:43                 ` Eric Wong
2019-10-10 11:05                   ` ebiederm
2019-10-09  8:23               ` [PATCH 3/4] Config.pm: Add support for looking up repos by their directories ebiederm
2019-10-09  8:25               ` [PATCH 4/4] IMAPTracker: Add a helper to track our place in reading imap mailboxes ebiederm
2019-10-10 19:08               ` ibx->{listid} autoviv fixup [was: [PATCH 0/4] Various bits to support import_imap_mailbox] Eric Wong
2019-10-10 21:23                 ` ebiederm
2019-10-10  8:31             ` Do I need multiple publicinbox.<name>.address values? Eric Wong
2019-10-10 10:56               ` ebiederm
2019-10-09 11:59   ` Alyssa Ross
2019-10-10 10:06     ` Eric Wong
