From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on dcvr.yhbt.net
X-Spam-Level:
X-Spam-ASN: AS6315 166.70.0.0/16
X-Spam-Status: No, score=-2.4 required=3.0 tests=AWL,BAYES_00,
 RCVD_IN_DNSWL_LOW,RCVD_IN_MSPIKE_H3,RCVD_IN_MSPIKE_WL,SPF_HELO_NONE,
 SPF_PASS,URIBL_DBL_SPAM shortcircuit=no autolearn=no autolearn_force=no
 version=3.4.2
Received: from out01.mta.xmission.com (out01.mta.xmission.com [166.70.13.231])
 (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits))
 (No client certificate requested)
 by dcvr.yhbt.net (Postfix) with ESMTPS id 9931C1F4BD;
 Tue, 8 Oct 2019 22:25:21 +0000 (UTC)
Received: from in02.mta.xmission.com ([166.70.13.52])
 by out01.mta.xmission.com with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128)
 (Exim 4.87)
 (envelope-from )
 id 1iHxux-0003oo-TB; Tue, 08 Oct 2019 16:25:20 -0600
Received: from ip68-227-160-95.om.om.cox.net ([68.227.160.95] helo=x220.xmission.com)
 by in02.mta.xmission.com with esmtpsa (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.87)
 (envelope-from )
 id 1iHxuv-0001tT-QM; Tue, 08 Oct 2019 16:25:19 -0600
From: ebiederm@xmission.com (Eric W. Biederman)
To: Eric Wong
Cc: Alyssa Ross , meta@public-inbox.org
References: <87imp05hlm.fsf@alyssa.is>
 <20191008001050.rwd7bh7cek7qrydi@dcvr>
 <87wodfctwd.fsf@x220.int.ebiederm.org>
 <20191008221108.3wsso25kviiwd7ek@dcvr>
Date: Tue, 08 Oct 2019 17:24:33 -0500
In-Reply-To: <20191008221108.3wsso25kviiwd7ek@dcvr> (Eric Wong's message of
 "Tue, 8 Oct 2019 22:11:08 +0000")
Message-ID: <87wodec1um.fsf@x220.int.ebiederm.org>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.1 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain
X-XM-SPF: eid=1iHxuv-0001tT-QM;;;mid=<87wodec1um.fsf@x220.int.ebiederm.org>;;;hst=in02.mta.xmission.com;;;ip=68.227.160.95;;;frm=ebiederm@xmission.com;;;spf=neutral
X-XM-AID: U2FsdGVkX1/HV3Fi1zeTCJXbwfjb/gCJQ3yUvLtSzPc=
X-SA-Exim-Connect-IP: 68.227.160.95
X-SA-Exim-Mail-From: ebiederm@xmission.com
Subject: Re: Do I need multiple publicinbox..address values?
X-SA-Exim-Version: 4.2.1 (built Thu, 05 May 2016 13:38:54 -0600)
X-SA-Exim-Scanned: Yes (on in02.mta.xmission.com)
List-Id:

Eric Wong writes:

> "Eric W. Biederman" wrote:
>> Eric Wong writes:
>>
>> > Alyssa Ross wrote:
>> >
>> >> Subject: Do I need multiple publicinbox..address values?
>> >
>> > Absolutely not
>> >
>> >> Suppose I have a mailing list, foo-discuss@example.org, and a
>> >> public-inbox set up, subscribed to that mailing list, that is subscribed
>> >> to that list as public-inbox+foo-discuss@example.org, where my MTA
>> >> delivers to public-inbox using public-inbox-mda.
>> >
>> > Currently, -mda does if you're mirroring, unfortunately. I
>> > think Eric Biederman is/was working on List-Id support to drop
>> > that requirement, but I'm not sure where that is...
>> >
>> > Eric B: would you mind if I take List-Id support over? I've got
>> > some hours free in the coming days(s)... (I think :x)
>>
>> I believe I have the config side of the work done. I haven't
>> figured out how to add this to public-inbox-mda/public-inbox-watch.
>>
>> Let me send out what I have and then you can work on the bits
>> for public-inbox-watch public-inbox-mda.
>
> Thanks, will do.
>
>> Last round I was messing with this I almost had my imap fetcher in
>> shape. I may try again but let's get the listid thing settled first
>> if we can.
>
> Alright.
>
> Btw, would using libcurl for IMAP support be easier for you?

Right now I think I just need to make certain all of my prereqs are
merged and push the code out. I just don't have as much time to work
on these things as I am used to, so it is taking me forever to get
anything done.

For what it is worth, my imap import script is below. The hardest part
has been making certain I get the error handling correct when
unexpected errors happen. Because errors do happen.

> I'm considering introducing libcurl support via Inline::C (it
> might be easier to hook into the event loop for other things).
>
> I'm also thinking about distributing some C via *.h header files
> so it's easier to sparse/lint the code w/o it being embedded
> inside a Perl file (I'd send those *.h files to Inline::C for
> production use).
>
>
>
> Using *.h since MakeMaker will automatically assume any *.c
> files will be made into an XS extension, and I'm not sure how to
> workaround that... (using GNU make w/o MakeMaker is an option)
>
> Whether it's Perl or C; I want to keep everything on end-user
> systems in source form, so they can just hack/update the source
> if needed instead of having to find/download/build it to
> experiment.
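To make the error-handling point concrete: the script below runs every fetch pass in a forked child, so that whatever goes wrong, the process exit releases locks and sockets. Here is a minimal standalone sketch of that pattern. The name run_isolated is hypothetical and does not appear in the script, and the work done in the child is a stand-in for the real IMAP fetch.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use POSIX qw(_exit);

# Hypothetical helper (not part of the script below): run $work in a
# forked child and return the child's exit code.  Anything the child
# holds (locks, sockets, file handles) is released when it exits.
sub run_isolated {
	my ($work) = @_;
	my $pid = fork();
	die "fork failed: $!\n" unless defined($pid);
	if ($pid == 0) {
		# child: turn an uncaught die() into a non-zero exit code
		my $ok = eval { $work->(); 1 };
		warn "child failed: $@" unless $ok;
		_exit($ok ? 0 : 1);
	}
	# parent: reap the child so zombies do not accumulate
	waitpid($pid, 0);
	return $? >> 8;
}

my $good = run_isolated(sub { 1 });
my $bad = run_isolated(sub { die "simulated IMAP failure\n" });
print "good pass exit code: $good\n";
print "bad pass exit code: $bad\n";
```

The parent only ever sees an exit code, so a failed pass cannot leave half-initialized state behind in the long-running loop.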
#<--- scripts/import_imap_mailbox ---->
#!/usr/bin/perl -w
# Script to import an IMAP mailbox into a public-inbox
=begin usage
	./import_imap_mailbox imap://username@hostname
=cut
use strict;
use warnings;
use Getopt::Long qw(:config gnu_getopt no_ignore_case auto_abbrev);
use Mail::IMAPClient;
use IO::Socket;
use IO::Socket::SSL;
use Data::Dumper;
use Email::Simple;
use Email::MIME;
use File::Basename;
use File::Sync qw(sync);
use Term::ReadKey;
use PublicInbox::Config;
use PublicInbox::Address;
use PublicInbox::IMAPTracker;
use PublicInbox::InboxWritable;
use POSIX qw(strftime);

sub usage { "Usage:\n".join('', grep(/\t/, `head -n 24 $0`)) }
my $push_origin = 1;
my $sanitize = 1;
my %opts = (
	'--push-origin!' => \$push_origin,
	'--sanitize!' => \$sanitize,
);
GetOptions(%opts) or die usage();

my $mail_url = shift @ARGV or die usage();
my $mail_hostname;
my $mail_domainname;
my $mail_username;
my $mail_password;

if ($mail_url =~ m$\Aimap://([^@]+)[@]([^@]+)(/|)\z$) {
	$mail_username = $1;
	$mail_hostname = $2;
	if ($mail_hostname eq "mail.xmission.com") {
		$mail_domainname = "xmission.com";
	}
} else {
	die usage();
}

chomp(my $committer_name = `git config user.name`);
chomp(my $committer_email = `git config user.email`);
my $committer = "$committer_name <$committer_email>";
my $mail_addr = $mail_username . '@' . $mail_domainname;
my $url_base = 'imap://' . $mail_username . '@' . $mail_hostname . '/' ;

# Map a raw List-ID header value to an inbox, or undef if none matches.
sub list_hdr_ibx($$)
{
	my ($config, $list_hdr) = @_;
	my $list_id;
	if ($list_hdr =~ m/\0/) {
		warn("Bad List-ID: $list_hdr contains a null\n");
		return undef;
	} elsif ($list_hdr =~ m/\A[^<>]*<(\S*)>\s*\z/) {
		$list_id = $1;
	} else {
		warn("Bad List-ID: $list_hdr\n");
		return undef;
	}
	my $ibx = $config->lookup_list_id($list_id);
	if (!defined($ibx)) {
		warn("Could not find inbox for List-ID: $list_id\n");
	}
	print(" List-ID: $list_id\n");
	$ibx;
}

# Map a Delivered-To header value to an inbox, or undef if none matches.
sub deliveredto_ibx($$)
{
	my ($config, $deliveredto) = @_;
	my ($email) = PublicInbox::Address::emails($deliveredto);
	if (!defined($email)) {
		warn("No email in Delivered-To: $deliveredto\n");
		return undef;
	}
	my $ibx = $config->lookup($email);
	if (!defined($ibx)) {
		warn("Could not find inbox for deliveredto: $email\n");
		return undef;
	}
	print(" deliveredto: $deliveredto\n");
	$ibx;
}

# The fallback inbox for a mailbox is the one addressed as
# username+mailbox@domain (or the plain address for INBOX).
sub mbox_default_ibx($$)
{
	my ($config, $mailbox) = @_;
	my $addr = $mail_username . '+' . $mailbox . '@' . $mail_domainname;
	if ($mailbox eq 'INBOX') {
		$addr = $mail_addr;
	}
	my $ibx = $config->lookup($addr);
	if (defined($ibx) and !defined($ibx->{mainrepo})) {
		$ibx = undef;
	}
	$ibx;
}

# For a v2 inbox, return the highest-numbered epoch under $repo/git.
sub ibx_gitdir($)
{
	my ($ibx) = @_;
	my $repo = $ibx->{mainrepo};
	die("Inbox without mainrepo") unless defined($repo);
	my $git_dir = $repo;
	if (-d "$repo/git") {
		my $last = 0;
		opendir(my $dh, "$repo/git") ||
			die("Can not open git dir $repo/git/\n");
		while(my $name = readdir($dh)) {
			if ($name =~ m/^([0-9]+)\.git$/) {
				if ($last < $1) {
					$last = $1;
				}
			}
		}
		closedir($dh);
		$git_dir = "$repo/git/$last.git";
	}
	return $git_dir;
}

# Collect every inbox named by the List-ID and Delivered-To headers.
sub email_dest($$)
{
	my ($config, $mime) = @_;
	my %ibxs;
	my $hdr = $mime->header_obj;
	my @list_hdrs = $hdr->header_raw('List-ID');
	for my $list_hdr (@list_hdrs) {
		my $ibx = list_hdr_ibx($config, $list_hdr);
		if (defined($ibx)) {
			$ibxs{$ibx->{mainrepo}} = $ibx;
		}
	}
	my @DeliveredTo = $hdr->header_raw('Delivered-To');
	for my $deliveredto (@DeliveredTo) {
		my $ibx = deliveredto_ibx($config, $deliveredto);
		if (defined($ibx)) {
			$ibxs{$ibx->{mainrepo}} = $ibx;
		}
	}
	my @ibxs = values %ibxs;
	return @ibxs;
}

sub index_inbox ($)
{
	my ($ibx) = @_;
	my $repo = $ibx->{mainrepo};
	my $pid = fork();
	if (!defined($pid)) {
		warn("public-inbox-index $repo failed to start\n");
		return;
	}
	if ($pid != 0) {
		# Wait for the process so zombies don't accumulate
		waitpid($pid, 0);
		return;
	}
	# child
	exec( 'public-inbox-index', $repo);
	warn("exec public-inbox-index $repo failed\n");
	exit(1);
}

sub imap_sleep ()
{
	my $sleep = 5;
	print "\nSleeping for $sleep minutes\n\n";
	sleep($sleep*60);
}

sub fsck_inbox ($)
{
	my ($git_dir) = @_;
	if (system(('git', '--git-dir', $git_dir, 'fsck')) != 0) {
		die "git fsck failed: $?";
	}
}

sub push_inbox ($)
{
	my ($git_dir) = @_;
	if (!$push_origin) {
		return;
	}
	print("pushing $git_dir...\n");
	if (system(('git', '--git-dir', $git_dir, 'push', 'origin')) != 0) {
		die "git push failed: $?";
	}
}

print("Enter your imap password: ");
ReadMode('noecho');
$mail_password = ReadLine(0);
ReadMode('normal');
chomp($mail_password);
print("\n");

# Import up to 100 new messages from $mailbox; return how many remain.
sub fetch_mailbox ($$$$)
{
	my ($config, $tracker, $client, $mailbox) = @_;
	my $now = time();
	print("mailbox: $mailbox @ " .
		strftime("%Y-%m-%d %H:%M:%S %z", localtime(time())) . "\n");
	my $default_ibx = mbox_default_ibx($config, $mailbox);
	if (!defined($default_ibx)) {
		print("skipping $mailbox no default inbox\n");
		return 0;
	}
	my %importers;
	my $url = $url_base . $mailbox;
	my ($last_validity, $last_uid) = $tracker->get_last($url);
	$client->select($mailbox);
	$client->Peek(1);
	$client->Uid(1);
	my $validity = $client->uidvalidity($mailbox) or
		die("No uid validity");
	if (defined($last_validity) and ($validity ne $last_validity)) {
		die ("Unexpected uid validity $validity expected $last_validity");
	}
	my $search_str="ALL";
	if (defined($last_uid)) {
		# Find the last seen and all higher articles
		$search_str = "UID $last_uid:*";
	}
	my $uids = $client->search($search_str);
	if (!defined($uids) || (scalar(@$uids) == 0)) {
		print("No uids found! $@\n");
		return 0;
	}
	my $last = undef;
	my @sorted_uids = sort { $a <=> $b } @$uids;
	# Cap the number of uids to process at once
	my $more = 0;
	my $uid_count = scalar(@sorted_uids);
	if ($uid_count > 100) {
		@sorted_uids = @sorted_uids[0..99];
		$more = $uid_count - 100;
	}
	for my $uid (@sorted_uids) {
		print("UID: $validity $uid\n");
		if (defined($last_uid)) {
			if ($uid == $last_uid) {
				next;
			}
			if ($uid < $last_uid) {
				print("uid $uid below last $last_uid, skipping.\n");
				next;
			}
		}
		my $email_str = $client->message_string($uid) or
			die "Could not message_string $@\n";
		my $email_len = length($email_str);
		my $mime = Email::MIME->new($email_str);
		$mime->{-public_inbox_raw} = $email_str;
		my @dests = email_dest($config, $mime);
		if (scalar(@dests) == 0) {
			push(@dests, $default_ibx);
		}
		die ("no destination for the email") unless scalar(@dests) > 0;
		printf(" dests: %d\n", scalar(@dests));
		for my $ibx (@dests) {
			my $git_dir = ibx_gitdir($ibx);
			print " git_dir: $git_dir\n";
			my $im = $importers{$git_dir}->[0];
			if (!defined($im)) {
				$ibx = PublicInbox::InboxWritable->new($ibx);
				$im = $ibx->importer(1);
				die "no im" unless defined($im);
				my @arr = ( $im, $ibx );
				$importers{$git_dir} = \@arr;
			}
			if (defined($im->{mm}->{num_highwater})) {
				print "Last: $git_dir: " .
					$im->{mm}->{num_highwater} . "\n";
			} else {
				print "Last: $git_dir: 0\n";
			}
			$im->add($mime);
			print "This: $git_dir: " .
				$im->{mm}->{num_highwater} . "\n";
		}
		#$client->delete_message($uid);
		$last = $uid;
	}
	if ($last) {
		die ("no git_dirs for $url") unless scalar(keys %importers) > 0;
		for my $git_dir (keys %importers) {
			my $ref = delete $importers{$git_dir};
			my ($im, $ibx) = @$ref;
			$im->done();
			push_inbox($git_dir);
		}
		print("updating tracker for $url...\n");
		$tracker->update_last($url, $validity, $last);
		#$client->delete_message($uids) or die ("Could not delete messages");
	}
	$client->close($mailbox);
	return $more;
}

sub fetch_mail ()
{
	my $config = eval { PublicInbox::Config->new };
	die("No public inbox config found!") unless $config;
	my $tracker = PublicInbox::IMAPTracker->new();
	my $socket = IO::Socket::SSL->new(
		PeerAddr => $mail_hostname,
		PeerPort => 993,
		Timeout => 5,
		SSL_verify_mode => SSL_VERIFY_PEER,
		IO::Socket::SSL::default_ca(),
	);
	if (!defined($socket)) {
		die("Could not open socket to mailserver: $@\n");
	}
	my $client = Mail::IMAPClient->new(
		Socket => $socket,
		User => $mail_username,
		Password => $mail_password,
		Timeout => 5,
	);
	if (!defined($client)) {
		die("Could not initialize imap client $@\n");
	}
	if (!$client->IsAuthenticated()) {
		die("Could not authenticate against IMAP: $@\n");
	}
	if (!$client->IsConnected()) {
		die("Could not connect to the IMAP server: $@\n");
	}
	my $mailboxes = $client->folders();
	my @sorted_mailboxes = sort { $a cmp $b } @$mailboxes;
	# Keep making passes until no mailbox reports leftover uids
	my $more;
	do {
		$more = 0;
		for my $mailbox (@sorted_mailboxes) {
			$more += fetch_mailbox($config, $tracker, $client, $mailbox);
		}
	} while ($more > 0);
	$client->logout();
}

sub relevant_inbox
{
	my ($ibx) = @_;
	# Verify the mailbox is one that would come from the server
	my $lc_user = lc($mail_username);
	my $lc_domain = lc($mail_domainname);
	foreach (@{$ibx->{address}}) {
		my $lc_addr = lc($_);
		if ($lc_addr =~ m/${lc_user}[+][^@]+[@]${lc_domain}/) {
			return 1;
		}
	}
	return 0;
}

sub sanitize_inboxes ()
{
	my $config = eval { PublicInbox::Config->new };
	die("No public inbox config found!") unless $config;
	$config->each_inbox(
		sub {
			my ($ibx) = @_;
			return unless relevant_inbox($ibx);
			print $ibx->{name} . "\n";
			my $git_dir = ibx_gitdir($ibx);
			fsck_inbox($git_dir);
			eval { push_inbox($git_dir); };
			index_inbox($ibx);
		}
	);
}

if ($sanitize) {
	sanitize_inboxes();
	sync();
}

for (;;imap_sleep()) {
	# Run fetch_mail in its own separate process so
	# that if something goes wrong the process exits
	# and everything cleans up properly.
	#
	# Running fetch_mail in an eval block is not enough
	# to prevent leaks of locks and other resources.
	my $pid = fork();
	if (!defined($pid)) {
		warn("fork failed:\n");
		next;
	} elsif ($pid == 0) {
		# in the child
		fetch_mail();
		exit(0);
	} else {
		warn("-------------------- CHILD: $pid --------------------\n");
		waitpid($pid, 0);
		sync();
	}
}
1;
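
To show what the List-ID matching in list_hdr_ibx accepts and rejects, here is a small standalone sketch using the same two regular expressions. The helper name parse_list_id and the sample header values are made up for illustration; the real script then passes the extracted id to the config lookup, which is left out here.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the bare list id from a raw List-ID
# header value, or undef if the value is malformed.
sub parse_list_id {
	my ($list_hdr) = @_;
	# reject values with an embedded NUL byte
	return undef if $list_hdr =~ m/\0/;
	# optional free-text description, then the id inside <...>
	return $1 if $list_hdr =~ m/\A[^<>]*<(\S*)>\s*\z/;
	return undef;
}

# Sample values are invented for this example.
for my $hdr ('Example discussion <foo-discuss.example.org>',
		'<foo-discuss.example.org>',
		'no angle brackets here') {
	my $id = parse_list_id($hdr);
	printf("%-46s => %s\n", $hdr, defined($id) ? $id : '(rejected)');
}
```

Only the text between the angle brackets is kept, so a header with or without a leading description resolves to the same id.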