* Do I need multiple publicinbox.<name>.address values?
@ 2019-10-07 22:13 Alyssa Ross
2019-10-08 0:10 ` Eric Wong
0 siblings, 1 reply; 23+ messages in thread
From: Alyssa Ross @ 2019-10-07 22:13 UTC (permalink / raw)
To: meta
[-- Attachment #1: Type: text/plain, Size: 1212 bytes --]
Suppose I have a mailing list, foo-discuss@example.org, and a
public-inbox set up, subscribed to that mailing list, that is subscribed
to that list as public-inbox+foo-discuss@example.org, where my MTA
delivers to public-inbox using public-inbox-mda.
In this case, mail will be To: foo-discuss@example.org, but RCPT TO (and
therefore ORIGINAL_RECIPIENT) public-inbox+foo-discuss@example.org. In
my testing, public-inbox requires *both* the To: header and
ORIGINAL_RECIPIENT to match publicinbox.<name>.address.
But, public-inbox-config(7) has this to say for
publicinbox.<name>.address:
> The email address of the public-inbox. May be specified
> more than once for merging multiple mailing lists (or migrating
> to new addresses).
Nothing in this suggests to me that I would need to set it more than
once for my simple use case.
So my question is, am I overcomplicating this -- should I be able to set
a single value for publicinbox.<name>.address that will get my
public-inbox to accept mail from the list? Or, alternatively, should a
sentence be added to that description explaining that multiple addresses
will also be required for the case where public-inbox is subscribed
itself to a mailing list?
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Do I need multiple publicinbox.<name>.address values? 2019-10-07 22:13 Do I need multiple publicinbox.<name>.address values? Alyssa Ross @ 2019-10-08 0:10 ` Eric Wong 2019-10-08 12:18 ` Eric W. Biederman 2019-10-09 11:59 ` Alyssa Ross 0 siblings, 2 replies; 23+ messages in thread From: Eric Wong @ 2019-10-08 0:10 UTC (permalink / raw) To: Alyssa Ross; +Cc: meta, Eric W. Biederman Alyssa Ross <hi@alyssa.is> wrote: > Subject: Do I need multiple publicinbox.<name>.address values? Absolutely not > Suppose I have a mailing list, foo-discuss@example.org, and a > public-inbox set up, subscribed to that mailing list, that is subscribed > to that list as public-inbox+foo-discuss@example.org, where my MTA > delivers to public-inbox using public-inbox-mda. Currently, -mda does if you're mirroring, unfortunately. I think Eric Biederman is/was working on List-Id support to drop that requirement, but I'm not sure where that is... Eric B: would you mind if I take List-Id support over? I've got some hours free in the coming days(s)... (I think :x) > In this case, mail will be To: foo-discuss@example.org, but RCPT TO (and > therefore ORIGINAL_RECIPIENT) public-inbox+foo-discuss@example.org. In > my testing, public-inbox requires *both* the To: header and > ORIGINAL_RECIPIENT to match publicinbox.<name>.address. Yes, but To:/Cc: and ORIGINAL_RECIPIENT can be different. lore.kernel.org uses two addresses: one for the well-known address (e.g. linux-kernel@vger.kernel.org) and the other something@something.kernel.org, which nobody (should) be sending to. > But, public-inbox-config(7) has this to say for > publicinbox.<name>.address: > > The email address of the public-inbox. May be specified > > more than once for merging multiple mailing lists (or migrating > > to new addresses). > Nothing in this suggests to me that I would need to set it more than > once for my simple use case. Sorry for that confusion. Historically, I wrote public-inbox and -mda because I needed to migrate mailing lists with a several month window where the old list would be active, but I was preparing for the switch: So it started as: posters -> to_be_shutdown_host -> MTA (postfix) -> public-inbox-mda And ends up being: posters -> MTA (postfix) -> public-inbox-mda --[1]--> (mlmmj|mailman) It sounds like what you're doing is: MTA -> (mlmmj|mailman) -> public-inbox-mda Which wasn't my original intended usecase for -mda, but is for -watch. > So my question is, am I overcomplicating this -- should I be able to set > a single value for publicinbox.<name>.address that will get my > public-inbox to accept mail from the list? Or, alternatively, should a > sentence be added to that description explaining that multiple addresses > will also be required for the case where public-inbox is subscribed > itself to a mailing list? Generally, I use -watch exclusively when mirroring lists. With -watch, you won't need to set extra addresses. But apparently some folks prefer -mda for everything. So I guess we'll have to better support that, too (via List-Id). (*) via scripts/ssoma-replay, which I intend to replace with an nntpd-based replacement at some point for v2 compatibility. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Do I need multiple publicinbox.<name>.address values? 2019-10-08 0:10 ` Eric Wong @ 2019-10-08 12:18 ` Eric W. Biederman 2019-10-08 12:23 ` [PATCH] Config.pm: Add support for mailing list information Eric W. Biederman 2019-10-08 22:11 ` Do I need multiple publicinbox.<name>.address values? Eric Wong 2019-10-09 11:59 ` Alyssa Ross 1 sibling, 2 replies; 23+ messages in thread From: Eric W. Biederman @ 2019-10-08 12:18 UTC (permalink / raw) To: Eric Wong; +Cc: Alyssa Ross, meta Eric Wong <e@80x24.org> writes: > Alyssa Ross <hi@alyssa.is> wrote: > >> Subject: Do I need multiple publicinbox.<name>.address values? > > Absolutely not > >> Suppose I have a mailing list, foo-discuss@example.org, and a >> public-inbox set up, subscribed to that mailing list, that is subscribed >> to that list as public-inbox+foo-discuss@example.org, where my MTA >> delivers to public-inbox using public-inbox-mda. > > Currently, -mda does if you're mirroring, unfortunately. I > think Eric Biederman is/was working on List-Id support to drop > that requirement, but I'm not sure where that is... > > Eric B: would you mind if I take List-Id support over? I've got > some hours free in the coming days(s)... (I think :x) I believe I have the config side of the work done. I haven't figured out how to add this to public-inbox-mda/public-inbox-watch. Let me send out what I have and then you can work on the bits for public-inbox-watch public-inbox-mda. Last round I was messing with this I almost had my imap fetcher in shape. I may try again but let's get the listid thing settled first if we can. Eric ^ permalink raw reply [flat|nested] 23+ messages in thread
* [PATCH] Config.pm: Add support for mailing list information 2019-10-08 12:18 ` Eric W. Biederman @ 2019-10-08 12:23 ` Eric W. Biederman 2019-10-08 22:11 ` Do I need multiple publicinbox.<name>.address values? Eric Wong 1 sibling, 0 replies; 23+ messages in thread From: Eric W. Biederman @ 2019-10-08 12:23 UTC (permalink / raw) To: Eric Wong; +Cc: Alyssa Ross, meta Date: Thu, 16 May 2019 19:22:46 -0500 The world has turned since I first started following mailing lists and to my surprise every mailing list that I am subscribed to properly sets the "List-ID:" mailing list header. So instead of doing something clever and flexible I am adding support for looking up public inbox mailing lists by their mailing list name. That makes the work needed for each email trivial and easy to understand. - Parse the "List-ID:" header. - Lookup in the configuration which mailbox is connected to that "List-ID:" - Deliver the mail to that mailbox. To that end this change enhances PublicInbox to have an additional mailbox configuration parameter "listid" that holds the mailing list name. A method is added to the PublicInbox config object called lookup_list_id that given a mailing list name will return the PublicInbox in the configuration that is configured to handle that mailing list. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> --- The relevant snippet from my imap import program looks like: sub list_hdr_ibx($$) { my ($config, $list_hdr) = @_; my $list_id; if ($list_hdr =~ m/\0/) { warn("Bad List-ID: $list_hdr contains a null\n"); return undef; } elsif ($list_hdr =~ m/\A[^<>]*<(\S*)>\s*\z/) { $list_id = $1; } else { warn("Bad List-ID: $list_hdr\n"); return undef; } my $ibx = $config->lookup_list_id($list_id); if (!defined($ibx)) { warn("Cound not find inbox for List-ID: $list_id\n"); } print(" List-ID: $list_id\n"); $ibx; } sub email_dest($$) { my ($config, $mime) = @_; my %ibxs; my $hdr = $mime->header_obj; my @list_hdrs = $hdr->header_raw('List-ID'); for my $list_hdr (@list_hdrs) { my $ibx = list_hdr_ibx($config, $list_hdr); if (defined($ibx)) { $ibxs{$ibx->{mainrepo}} = $ibx; } } my @ibxs = values %ibxs; return @ibxs; } lib/PublicInbox/Config.pm | 33 ++++++++++++++++++++++++++++++++- 1 file changed, 32 insertions(+), 1 deletion(-) diff --git a/lib/PublicInbox/Config.pm b/lib/PublicInbox/Config.pm index 4fcb20d24437..9f3f8df7eeaa 100644 --- a/lib/PublicInbox/Config.pm +++ b/lib/PublicInbox/Config.pm @@ -25,6 +25,7 @@ sub new { # caches $self->{-by_addr} ||= {}; + $self->{-by_list_id} ||= {}; $self->{-by_name} ||= {}; $self->{-by_newsgroup} ||= {}; $self->{-no_obfuscate} ||= {}; @@ -84,6 +85,33 @@ sub lookup { _fill($self, $pfx); } +sub lookup_list_id { + my ($self, $list_id) = @_; + $list_id = lc($list_id); + my $ibx = $self->{-by_list_id}->{$list_id}; + return $ibx if $ibx; + + my $pfx; + + foreach my $k (keys %$self) { + $k =~ /\A(publicinbox\.[\w-]+)\.listid\z/ or next; + my $v = $self->{$k}; + if (ref($v) eq "ARRAY") { + foreach my $alias (@$v) { + (lc($alias) eq $list_id) or next; + $pfx = $1; + last; + } + } else { + (lc($v) eq $list_id) or next; + $pfx = $1; + last; + } + } + defined $pfx or return; + _fill($self, $pfx); +} + sub lookup_name ($$) { my ($self, $name) = @_; $self->{-by_name}->{$name} || _fill($self, "publicinbox.$name"); @@ -398,7 +426,7 @@ sub _fill { } # TODO: more arrays, we should support multi-value for # more things to encourage decentralization - foreach my $k (qw(address altid nntpmirror coderepo hide)) { + foreach my $k (qw(address altid nntpmirror coderepo hide listid)) { if (defined(my $v = $self->{"$pfx.$k"})) { $ibx->{$k} = _array($v); } @@ -421,6 +449,9 @@ sub _fill { $self->{-by_addr}->{$lc_addr} = $ibx; $self->{-no_obfuscate}->{$lc_addr} = 1; } + foreach my $list_id (@{$ibx->{listid}}) { + $self->{-by_list_id}->{$list_id} = $ibx; + } if (my $ng = $ibx->{newsgroup}) { $self->{-by_newsgroup}->{$ng} = $ibx; } -- 2.23.0 ^ permalink raw reply related [flat|nested] 23+ messages in thread
* Re: Do I need multiple publicinbox.<name>.address values? 2019-10-08 12:18 ` Eric W. Biederman 2019-10-08 12:23 ` [PATCH] Config.pm: Add support for mailing list information Eric W. Biederman @ 2019-10-08 22:11 ` Eric Wong 2019-10-08 22:24 ` Eric W. Biederman 1 sibling, 1 reply; 23+ messages in thread From: Eric Wong @ 2019-10-08 22:11 UTC (permalink / raw) To: Eric W. Biederman; +Cc: Alyssa Ross, meta "Eric W. Biederman" <ebiederm@xmission.com> wrote: > Eric Wong <e@80x24.org> writes: > > > Alyssa Ross <hi@alyssa.is> wrote: > > > >> Subject: Do I need multiple publicinbox.<name>.address values? > > > > Absolutely not > > > >> Suppose I have a mailing list, foo-discuss@example.org, and a > >> public-inbox set up, subscribed to that mailing list, that is subscribed > >> to that list as public-inbox+foo-discuss@example.org, where my MTA > >> delivers to public-inbox using public-inbox-mda. > > > > Currently, -mda does if you're mirroring, unfortunately. I > > think Eric Biederman is/was working on List-Id support to drop > > that requirement, but I'm not sure where that is... > > > > Eric B: would you mind if I take List-Id support over? I've got > > some hours free in the coming days(s)... (I think :x) > > I believe I have the config side of the work done. I haven't > figured out how to add this to public-inbox-mda/public-inbox-watch. > > Let me send out what I have and then you can work on the bits > for public-inbox-watch public-inbox-mda. Thanks, will do. > Last round I was messing with this I almost had my imap fetcher in > shape. I may try again but let's get the listid thing settled first > if we can. Alright. Btw, would using libcurl for IMAP support be easier for you? I'm considering introducing libcurl support via Inline::C (it might be easier to hook into the event loop for other things). I'm also thinking about distributing some C via *.h header files so it's easier to sparse/lint the code w/o it being embedded inside a Perl file (I'd send those *.h files to Inline::C for production use). Using *.h since MakeMaker will automatically assume any *.c files will be made into an XS extension, and I'm not sure how to workaround that... (using GNU make w/o MakeMaker is an option) Whether it's Perl or C; I want to keep everything on end-user systems in source form, so they can just hack/update the source if needed instead of having to find/download/build it to experiment. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Do I need multiple publicinbox.<name>.address values? 2019-10-08 22:11 ` Do I need multiple publicinbox.<name>.address values? Eric Wong @ 2019-10-08 22:24 ` Eric W. Biederman 2019-10-08 22:41 ` Eric Wong 0 siblings, 1 reply; 23+ messages in thread From: Eric W. Biederman @ 2019-10-08 22:24 UTC (permalink / raw) To: Eric Wong; +Cc: Alyssa Ross, meta Eric Wong <e@80x24.org> writes: > "Eric W. Biederman" <ebiederm@xmission.com> wrote: >> Eric Wong <e@80x24.org> writes: >> >> > Alyssa Ross <hi@alyssa.is> wrote: >> > >> >> Subject: Do I need multiple publicinbox.<name>.address values? >> > >> > Absolutely not >> > >> >> Suppose I have a mailing list, foo-discuss@example.org, and a >> >> public-inbox set up, subscribed to that mailing list, that is subscribed >> >> to that list as public-inbox+foo-discuss@example.org, where my MTA >> >> delivers to public-inbox using public-inbox-mda. >> > >> > Currently, -mda does if you're mirroring, unfortunately. I >> > think Eric Biederman is/was working on List-Id support to drop >> > that requirement, but I'm not sure where that is... >> > >> > Eric B: would you mind if I take List-Id support over? I've got >> > some hours free in the coming days(s)... (I think :x) >> >> I believe I have the config side of the work done. I haven't >> figured out how to add this to public-inbox-mda/public-inbox-watch. >> >> Let me send out what I have and then you can work on the bits >> for public-inbox-watch public-inbox-mda. > > Thanks, will do. > >> Last round I was messing with this I almost had my imap fetcher in >> shape. I may try again but let's get the listid thing settled first >> if we can. > > Alright. > > Btw, would using libcurl for IMAP support be easier for you? Right now I think I just need to make certain all of my prereqs are merged and push the code out. I just don't have as much time to work on these things as I am used to so it is taking me forever to get anything done. For what it is worth below is my imap import script below. The hardest part has been making certain I get the error handling correct when unexpected errors happen. Because errors do happen. > I'm considering introducing libcurl support via Inline::C (it > might be easier to hook into the event loop for other things). > > I'm also thinking about distributing some C via *.h header files > so it's easier to sparse/lint the code w/o it being embedded > inside a Perl file (I'd send those *.h files to Inline::C for > production use). > > > > Using *.h since MakeMaker will automatically assume any *.c > files will be made into an XS extension, and I'm not sure how to > workaround that... (using GNU make w/o MakeMaker is an option) > > Whether it's Perl or C; I want to keep everything on end-user > systems in source form, so they can just hack/update the source > if needed instead of having to find/download/build it to > experiment. #<--- scripts/import_imap_mailbox ----> #!/usr/bin/perl -w # Script to import a IMAP mailbox into a public-inbox =begin usage ./import_imap_mailbox imap://username@hostname =cut use strict; use warnings; use Getopt::Long qw(:config gnu_getopt no_ignore_case auto_abbrev); use Mail::IMAPClient; use IO::Socket; use IO::Socket::SSL; use Data::Dumper; use Email::Simple; use Email::MIME; use File::Basename; use File::Sync qw(sync); use Term::ReadKey; use PublicInbox::Config; use PublicInbox::Address; use PublicInbox::IMAPTracker; use PublicInbox::InboxWritable; use POSIX qw(strftime); sub usage { "Usage:\n".join('', grep(/\t/, `head -n 24 $0`)) } my $push_origin = 1; my $sanitize = 1; my %opts = ( '--push-origin!' => \$push_origin, '--sanitize!' => \$sanitize, ); GetOptions(%opts) or die usage(); my $mail_url = shift @ARGV or die usage(); my $mail_hostname; my $mail_domainname; my $mail_username; my $mail_password; if ($mail_url =~ m$\Aimap://([^@]+)[@]([^@]+)(/|)\z$) { $mail_username = $1; $mail_hostname = $2; if ($mail_hostname eq "mail.xmission.com") { $mail_domainname = "xmission.com"; } } else { die usage(); } chomp(my $committer_name = `git config user.name`); chomp(my $committer_email = `git config user.email`); my $committer = "$committer_name <$committer_email>"; my $mail_addr = $mail_username . '@' . $mail_domainname; my $url_base = 'imap://' . $mail_username . '@' . $mail_hostname . '/' ; sub list_hdr_ibx($$) { my ($config, $list_hdr) = @_; my $list_id; if ($list_hdr =~ m/\0/) { warn("Bad List-ID: $list_hdr contains a null\n"); return undef; } elsif ($list_hdr =~ m/\A[^<>]*<(\S*)>\s*\z/) { $list_id = $1; } else { warn("Bad List-ID: $list_hdr\n"); return undef; } my $ibx = $config->lookup_list_id($list_id); if (!defined($ibx)) { warn("Cound not find inbox for List-ID: $list_id\n"); } print(" List-ID: $list_id\n"); $ibx; } sub deliveredto_ibx($$) { my ($config, $deliveredto) = @_; my ($email) = PublicInbox::Address::emails($deliveredto); if (!defined($email)) { warn("No email in Delivered-To: $deliveredto\n"); return undef; } my $ibx = $config->lookup($email); if (!defined($ibx)) { warn("Cound not find inbox for deliveredto: $email\n"); return undef; } print(" deliveredto: $deliveredto\n"); $ibx; } sub mbox_default_ibx($$) { my ($config, $mailbox) = @_; my $addr = $mail_username . '+' . $mailbox . '@' . $mail_domainname; if ($mailbox eq 'INBOX') { $addr = $mail_addr; } my $ibx = $config->lookup($addr); if (defined($ibx) and !defined($ibx->{mainrepo})) { $ibx = undef; } $ibx; } sub ibx_gitdir($) { my ($ibx) = @_; my $repo = $ibx->{mainrepo}; die("Inbox without mainrepo") unless defined($repo); my $git_dir = $repo; if (-d "$repo/git") { my $last = 0; opendir(my $dh, "$repo/git") || die("Can not open git dir $repo/git/\n"); while(my $name = readdir($dh)) { if ($name =~ m/^([0-9]+).git$/) { if ($last < $1) { $last = $1; } } } closedir($dh); $git_dir = "$repo/git/$last.git"; } return $git_dir; } sub email_dest($$) { my ($config, $mime) = @_; my %ibxs; my $hdr = $mime->header_obj; my @list_hdrs = $hdr->header_raw('List-ID'); for my $list_hdr (@list_hdrs) { my $ibx = list_hdr_ibx($config, $list_hdr); if (defined($ibx)) { $ibxs{$ibx->{mainrepo}} = $ibx; } } my @DeliveredTo = $hdr->header_raw('Delivered-To'); for my $deliveredto (@DeliveredTo) { my $ibx = deliveredto_ibx($config, $deliveredto); if (defined($ibx)) { $ibxs{$ibx->{mainrepo}} = $ibx; } } my @ibxs = values %ibxs; return @ibxs; } sub index_inbox ($) { my ($ibx) = @_; my $repo = $ibx->{mainrepo}; my $pid = fork(); if (!defined($pid)) { warn("public-inbox-index $repo failed to start\n"); return; } if ($pid != 0) { # Wait for the process so zombies don't accumulate waitpid($pid, 0); return; } # child exec( 'public-inbox-index', $repo); warn("exec public-inbox-index $repo failed\n"); exit(1); } sub imap_sleep () { my $sleep = 5; print "\nSleeping for $sleep minutes\n\n"; sleep($sleep*60); } sub fsck_inbox ($) { my ($git_dir) = @_; if (system(('git', '--git-dir', $git_dir, 'fsck')) != 0) { die "git fsck failed: $?"; } } sub push_inbox ($) { my ($git_dir) = @_; if (!$push_origin) { return; } print("pushing $git_dir...\n"); if (system(('git', '--git-dir', $git_dir, 'push', 'origin')) != 0) { die "git push failed: $?"; } } print("Enter your imap password: "); ReadMode('noecho'); $mail_password = ReadLine(0); ReadMode('normal'); chomp($mail_password); print("\n"); sub fetch_mailbox ($$$$) { my ($config, $tracker, $client, $mailbox) = @_; my $now = time(); print("mailbox: $mailbox @ " . strftime("%Y-%m-%d %H:%M:%S %z", localtime(time())) . "\n"); my $default_ibx = mbox_default_ibx($config, $mailbox); if (!defined($default_ibx)) { print("skipping $mailbox no default inbox\n"); return 0; } my %importers; my $url = $url_base . $mailbox; my ($last_validity, $last_uid) = $tracker->get_last($url); $client->select($mailbox); $client->Peek(1); $client->Uid(1); my $validity = $client->uidvalidity($mailbox) or die("No uid validity"); if (defined($last_validity) and ($validity ne $last_validity)) { die ("Unexpected uid validity $validity expected $last_validity"); } my $search_str="ALL"; if (defined($last_uid)) { # Find the last seen and all higher articles $search_str = "UID $last_uid:*"; } my $uids = $client->search($search_str); if (!defined($uids) || (scalar(@$uids) == 0)) { print("No uids found! $@\n"); return 0; } my $last = undef; my @sorted_uids = sort { $a <=> $b } @$uids; # Cap the number of uids to process at once my $more = 0; my $uid_count = scalar(@sorted_uids); if ($uid_count > 100) { @sorted_uids = @sorted_uids[0..99]; $more = $uid_count - 100; } for my $uid (@sorted_uids) { print("UID: $validity $uid\n"); if (defined($last_uid)) { if ($uid == $last_uid) { next; } if ($uid < $last_uid) { print("uid $uid not below last $last_uid, skipping.\n"); next; } } my $email_str = $client->message_string($uid) or die "Could not message_string $@\n"; my $email_len = length($email_str); my $mime = Email::MIME->new($email_str); $mime->{-public_inbox_raw} = $email_str; my @dests = email_dest($config, $mime); if (scalar(@dests) == 0) { push(@dests, $default_ibx); } die ("no destination for the email") unless scalar(@dests) > 0; printf(" dests: %d\n", scalar(@dests)); for my $ibx (@dests) { my $git_dir = ibx_gitdir($ibx); print " git_dir: $git_dir\n"; my $im = $importers{$git_dir}->[0]; if (!defined($im)) { $ibx = PublicInbox::InboxWritable->new($ibx); $im = $ibx->importer(1); die "no im" unless defined($im); my @arr = ( $im, $ibx ); $importers{$git_dir} = \@arr; } if (defined($im->{mm}->{num_highwater})) { print "Last: $git_dir: " . $im->{mm}->{num_highwater} . "\n"; } else { print "Last: $git_dir: 0\n"; } $im->add($mime); print "This: $git_dir: " . $im->{mm}->{num_highwater} . "\n"; } #$client->delete_message($uid); $last = $uid; } if ($last) { die ("no git_dirs for $url") unless scalar(keys %importers) > 0; for my $git_dir (keys %importers) { my $ref = delete $importers{$git_dir}; my ($im, $ibx) = @$ref; $im->done(); push_inbox($git_dir); } print("updating tracker for $url...\n"); $tracker->update_last($url, $validity, $last); #$client->delete_message($uids) or die ("Could not delete messages"); } $client->close($mailbox); return $more; } sub fetch_mail () { my $config = eval { PublicInbox::Config->new }; die("No public inbox config found!") unless $config; my $tracker = PublicInbox::IMAPTracker->new(); my $socket = IO::Socket::SSL->new( PeerAddr => $mail_hostname, PeerPort => 993, Timeout => 5, SSL_verify_mode => SSL_VERIFY_PEER, IO::Socket::SSL::default_ca(), ); if (!defined($socket)) { die("Could not open socket to mailserver: $@\n"); } my $client = Mail::IMAPClient->new( Socket => $socket, User => $mail_username, Password => $mail_password, Timeout => 5, ); if (!defined($client)) { die("Could not initialize imap client $@\n"); } if (!$client->IsAuthenticated()) { die("Could not authenticate against IMAP: $@\n"); } if (!$client->IsConnected()) { die("Could not connect to the IMAP server: $@\n"); } my $mailboxes = $client->folders(); my @sorted_mailboxes = sort { $a cmp $b } @$mailboxes; my $more; do { $more = 0; for my $mailbox (@sorted_mailboxes) { $more += fetch_mailbox($config, $tracker, $client, $mailbox); } } while ($more > 0); $client->logout(); } sub relevant_inbox { my ($ibx) = @_; # Verify the mailbox is one that would come from the server my $lc_user = lc($mail_username); my $lc_domain = lc($mail_domainname); foreach (@{$ibx->{address}}) { my $lc_addr = lc($_); if ($lc_addr =~ m/${lc_user}[+][^@]+[@]${lc_domain}/) { return 1; } } return 0; } sub sanitize_inboxes () { my $config = eval { PublicInbox::Config->new }; die("No public inbox config found!") unless $config; $config->each_inbox( sub { my ($ibx) = @_; return unless relevant_inbox($ibx); print $ibx->{name} . "\n"; my $git_dir = ibx_gitdir($ibx); fsck_inbox($git_dir); eval { push_inbox($git_dir); }; index_inbox($ibx); } ); } if ($sanitize) { sanitize_inboxes(); sync(); } for (;;imap_sleep()) { # Run fetch_mail in it's own separate process so # that if something goes wrong the process exits # and everything cleans up properly. # # Running fetch_mail in an eval block is not enough # to prevent leaks of locks and other resources. my $pid = fork(); if (!defined($pid)) { warn("fork failed:\n"); continue; } elsif ($pid == 0) { # in the child fetch_mail(); exit(0); } else { warn("-------------------- CHILD: $pid --------------------\n"); waitpid($pid, 0); sync(); } } 1; ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Do I need multiple publicinbox.<name>.address values? 2019-10-08 22:24 ` Eric W. Biederman @ 2019-10-08 22:41 ` Eric Wong 2019-10-09 7:58 ` Eric W. Biederman 0 siblings, 1 reply; 23+ messages in thread From: Eric Wong @ 2019-10-08 22:41 UTC (permalink / raw) To: Eric W. Biederman; +Cc: Alyssa Ross, meta "Eric W. Biederman" <ebiederm@xmission.com> wrote: > my $tracker = PublicInbox::IMAPTracker->new(); Thanks. What's PublicInbox::IMAPTracker? > for (;;imap_sleep()) { > # Run fetch_mail in it's own separate process so > # that if something goes wrong the process exits > # and everything cleans up properly. > # > # Running fetch_mail in an eval block is not enough > # to prevent leaks of locks and other resources. I was thinking along the same lines :) The rest looked good and it's definitely something I can work with to get supportable in a way that reuses git's credential system and configs. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Do I need multiple publicinbox.<name>.address values? 2019-10-08 22:41 ` Eric Wong @ 2019-10-09 7:58 ` Eric W. Biederman 2019-10-09 8:15 ` [PATCH 0/4] Various bits to support import_imap_mailbox Eric W. Biederman 2019-10-10 8:31 ` Do I need multiple publicinbox.<name>.address values? Eric Wong 0 siblings, 2 replies; 23+ messages in thread From: Eric W. Biederman @ 2019-10-09 7:58 UTC (permalink / raw) To: Eric Wong; +Cc: Alyssa Ross, meta Eric Wong <e@80x24.org> writes: > "Eric W. Biederman" <ebiederm@xmission.com> wrote: >> my $tracker = PublicInbox::IMAPTracker->new(); > > Thanks. What's PublicInbox::IMAPTracker? Something that keeps the last fetched UID in an sqlite database. I will follow up with a patch for that as well. I haven't been brave enough to let this script delete any mail yet so I need to track what has been fetched. Something that will be rolled back if the email message isn't commited into git. I have a companion script that will delete mail. I mostly sent it so that there is some idea what I am working with. In context of this discussion I don't remember how often I am looking at the "Delivered-To:" header. That is almost gone but I know for a while I was using that as well. >> for (;;imap_sleep()) { >> # Run fetch_mail in it's own separate process so >> # that if something goes wrong the process exits >> # and everything cleans up properly. >> # >> # Running fetch_mail in an eval block is not enough >> # to prevent leaks of locks and other resources. > > I was thinking along the same lines :) > > The rest looked good and it's definitely something I can work > with to get supportable in a way that reuses git's credential > system and configs. I think fundamentally the script sucks because it is one email message at a time (not using IMAPs overlapping), the script is polling (NOT using IMAP idle), and I have magic to go from the server that I need to fetch to the actual domain. But in practice the script seems to be working reliably even in the event of an error right now. Eric ^ permalink raw reply [flat|nested] 23+ messages in thread
* [PATCH 0/4] Various bits to support import_imap_mailbox 2019-10-09 7:58 ` Eric W. Biederman @ 2019-10-09 8:15 ` Eric W. Biederman 2019-10-09 8:16 ` [PATCH 1/4] PublicInbox::Import Smuggle a raw message into add Eric W. Biederman ` (4 more replies) 2019-10-10 8:31 ` Do I need multiple publicinbox.<name>.address values? Eric Wong 1 sibling, 5 replies; 23+ messages in thread From: Eric W. Biederman @ 2019-10-09 8:15 UTC (permalink / raw) To: Eric Wong; +Cc: meta Eric Wong, These should all of my generic patches to support my import_imap_mailbox script. The really important patch that adds to the support for List-ID to public inbox configuration file I have already sent. I haven't written tests and I get the following test failure when I run make test > t/config.t ................. 1/? > # Failed test 'lookup matches expected output' > # at t/config.t line 26. > # Structures begin differing at: > # $got->{listid} = ARRAY(0x55c1d4e3b6a8) > # $expected->{listid} = Does not exist > > # Failed test 'lookup matches expected output for test' > # at t/config.t line 42. > # Structures begin differing at: > # $got->{listid} = ARRAY(0x55c1d508d8d0) > # $expected->{listid} = Does not exist > # Looks like you failed 2 tests of 69. > t/config.t ................. Dubious, test returned 2 (wstat 512, 0x200) I haven't looked into what is happening there. Eric Biederman (4): PublicInbox::Import Smuggle a raw message into add PublicInbox::Config: Process mailboxes in sorted order Config.pm: Add support for looking up repos by their directories IMAPTracker: Add a helper to track our place in reading imap mailboxes lib/PublicInbox/Config.pm | 19 ++++++++++- lib/PublicInbox/IMAPTracker.pm | 73 ++++++++++++++++++++++++++++++++++++++++++ lib/PublicInbox/Import.pm | 9 +++--- 3 files changed, 96 insertions(+), 5 deletions(-) Eric ^ permalink raw reply [flat|nested] 23+ messages in thread
* [PATCH 1/4] PublicInbox::Import Smuggle a raw message into add 2019-10-09 8:15 ` [PATCH 0/4] Various bits to support import_imap_mailbox Eric W. Biederman @ 2019-10-09 8:16 ` Eric W. Biederman 2019-10-15 20:26 ` Eric Wong 2019-10-09 8:17 ` [PATCH 2/4] PublicInbox::Config: Process mailboxes in sorted order Eric W. Biederman ` (3 subsequent siblings) 4 siblings, 1 reply; 23+ messages in thread From: Eric W. Biederman @ 2019-10-09 8:16 UTC (permalink / raw) To: Eric Wong; +Cc: meta Date: Tue, 15 Jan 2019 16:36:42 -0600 I don't trust the MIME type to not munge my email messages in horrible ways upon occasion. Therefore allow for passing in the raw message value instead of trusting the mime object to preserve it. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> --- As we discussed last time I was working to merge my imap import script. lib/PublicInbox/Import.pm | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/lib/PublicInbox/Import.pm b/lib/PublicInbox/Import.pm index 137b2b7800c4..b17c9d5c9fa8 100644 --- a/lib/PublicInbox/Import.pm +++ b/lib/PublicInbox/Import.pm @@ -394,16 +394,17 @@ sub add { } my $blob = $self->{mark}++; - my $str = $mime->as_string; - my $n = length($str); + my $raw_email = $mime->{-public_inbox_raw}; + $raw_email ||= $mime->as_string; + my $n = length($raw_email); $self->{bytes_added} += $n; print $w "blob\nmark :$blob\ndata ", $n, "\n" or wfail; - print $w $str, "\n" or wfail; + print $w $raw_email, "\n" or wfail; # v2: we need this for Xapian if ($self->{want_object_info}) { my $oid = $self->get_mark(":$blob"); - $self->{last_object} = [ $oid, $n, \$str ]; + $self->{last_object} = [ $oid, $n, \$raw_email ]; } my $ref = $self->{ref}; my $commit = $self->{mark}++; -- 2.23.0 ^ permalink raw reply related [flat|nested] 23+ messages in thread
* Re: [PATCH 1/4] PublicInbox::Import Smuggle a raw message into add 2019-10-09 8:16 ` [PATCH 1/4] PublicInbox::Import Smuggle a raw message into add Eric W. Biederman @ 2019-10-15 20:26 ` Eric Wong 2019-10-15 23:05 ` Eric W. Biederman 0 siblings, 1 reply; 23+ messages in thread From: Eric Wong @ 2019-10-15 20:26 UTC (permalink / raw) To: Eric W. Biederman; +Cc: meta "Eric W. Biederman" <ebiederm@xmission.com> wrote: > > Date: Tue, 15 Jan 2019 16:36:42 -0600 > > I don't trust the MIME type to not munge my email messages in horrible > ways upon occasion. Therefore allow for passing in the raw message > value instead of trusting the mime object to preserve it. > > Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> > --- > > As we discussed last time I was working to merge my imap import script. Thanks, pushed with the following interdiff: diff --git a/lib/PublicInbox/Import.pm b/lib/PublicInbox/Import.pm index b17c9d5c..e1f48771 100644 --- a/lib/PublicInbox/Import.pm +++ b/lib/PublicInbox/Import.pm @@ -394,8 +394,7 @@ sub add { } my $blob = $self->{mark}++; - my $raw_email = $mime->{-public_inbox_raw}; - $raw_email ||= $mime->as_string; + my $raw_email = $mime->{-public_inbox_raw} // $mime->as_string; my $n = length($raw_email); $self->{bytes_added} += $n; print $w "blob\nmark :$blob\ndata ", $n, "\n" or wfail; ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 1/4] PublicInbox::Import Smuggle a raw message into add 2019-10-15 20:26 ` Eric Wong @ 2019-10-15 23:05 ` Eric W. Biederman 0 siblings, 0 replies; 23+ messages in thread From: Eric W. Biederman @ 2019-10-15 23:05 UTC (permalink / raw) To: Eric Wong; +Cc: meta Eric Wong <e@80x24.org> writes: > "Eric W. Biederman" <ebiederm@xmission.com> wrote: >> >> Date: Tue, 15 Jan 2019 16:36:42 -0600 >> >> I don't trust the MIME type to not munge my email messages in horrible >> ways upon occasion. Therefore allow for passing in the raw message >> value instead of trusting the mime object to preserve it. >> >> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> >> --- >> >> As we discussed last time I was working to merge my imap import script. > > Thanks, pushed with the following interdiff: > > diff --git a/lib/PublicInbox/Import.pm b/lib/PublicInbox/Import.pm > index b17c9d5c..e1f48771 100644 > --- a/lib/PublicInbox/Import.pm > +++ b/lib/PublicInbox/Import.pm > @@ -394,8 +394,7 @@ sub add { > } > > my $blob = $self->{mark}++; > - my $raw_email = $mime->{-public_inbox_raw}; > - $raw_email ||= $mime->as_string; > + my $raw_email = $mime->{-public_inbox_raw} // $mime->as_string; > my $n = length($raw_email); > $self->{bytes_added} += $n; > print $w "blob\nmark :$blob\ndata ", $n, "\n" or wfail; Looks good. I hadn't realized that // was an operator in perl. It is nice to see. Eric ^ permalink raw reply [flat|nested] 23+ messages in thread
* [PATCH 2/4] PublicInbox::Config: Process mailboxes in sorted order 2019-10-09 8:15 ` [PATCH 0/4] Various bits to support import_imap_mailbox Eric W. Biederman 2019-10-09 8:16 ` [PATCH 1/4] PublicInbox::Import Smuggle a raw message into add Eric W. Biederman @ 2019-10-09 8:17 ` Eric W. Biederman 2019-10-10 9:43 ` Eric Wong 2019-10-09 8:23 ` [PATCH 3/4] Config.pm: Add support for looking up repos by their directories Eric W. Biederman ` (2 subsequent siblings) 4 siblings, 1 reply; 23+ messages in thread From: Eric W. Biederman @ 2019-10-09 8:17 UTC (permalink / raw) To: Eric Wong; +Cc: meta Date: Thu, 16 May 2019 19:26:47 -0500 To make the results reproducible and comprehensible when a large number of mail boxes are being processed process the mail boxes in sorted order, instead of in random hash order. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> --- lib/PublicInbox/Config.pm | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/lib/PublicInbox/Config.pm b/lib/PublicInbox/Config.pm index 9f3f8df7eeaa..9ec00b0ddb6d 100644 --- a/lib/PublicInbox/Config.pm +++ b/lib/PublicInbox/Config.pm @@ -128,7 +128,7 @@ sub each_inbox { } } else { my %seen; - foreach my $k (keys %$self) { + foreach my $k (sort keys %$self) { $k =~ m!\Apublicinbox\.([^/]+)\.mainrepo\z! or next; next if $seen{$1}; $seen{$1} = 1; -- 2.23.0 ^ permalink raw reply related [flat|nested] 23+ messages in thread
* Re: [PATCH 2/4] PublicInbox::Config: Process mailboxes in sorted order 2019-10-09 8:17 ` [PATCH 2/4] PublicInbox::Config: Process mailboxes in sorted order Eric W. Biederman @ 2019-10-10 9:43 ` Eric Wong 2019-10-10 11:05 ` Eric W. Biederman 0 siblings, 1 reply; 23+ messages in thread From: Eric Wong @ 2019-10-10 9:43 UTC (permalink / raw) To: Eric W. Biederman; +Cc: meta "Eric W. Biederman" <ebiederm@xmission.com> wrote: > > Date: Thu, 16 May 2019 19:26:47 -0500 > > To make the results reproducible and comprehensible when > a large number of mail boxes are being processed process the > mail boxes in sorted order, instead of in random hash order. > > Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> > --- > lib/PublicInbox/Config.pm | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/lib/PublicInbox/Config.pm b/lib/PublicInbox/Config.pm > index 9f3f8df7eeaa..9ec00b0ddb6d 100644 > --- a/lib/PublicInbox/Config.pm > +++ b/lib/PublicInbox/Config.pm > @@ -128,7 +128,7 @@ sub each_inbox { > } > } else { > my %seen; > - foreach my $k (keys %$self) { > + foreach my $k (sort keys %$self) { > $k =~ m!\Apublicinbox\.([^/]+)\.mainrepo\z! or next; > next if $seen{$1}; > $seen{$1} = 1; I'm not sure if that code path happens outside of tests. If it's parsing "git config -l" output, it'll use section_order which preserves the ordering of the config file. Will take a deeper look tomorrow (and possibly rewrite those tests to parse a scalarref instead of just taking a hashref). ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 2/4] PublicInbox::Config: Process mailboxes in sorted order 2019-10-10 9:43 ` Eric Wong @ 2019-10-10 11:05 ` Eric W. Biederman 0 siblings, 0 replies; 23+ messages in thread From: Eric W. Biederman @ 2019-10-10 11:05 UTC (permalink / raw) To: Eric Wong; +Cc: meta Eric Wong <e@80x24.org> writes: > "Eric W. Biederman" <ebiederm@xmission.com> wrote: >> >> Date: Thu, 16 May 2019 19:26:47 -0500 >> >> To make the results reproducible and comprehensible when >> a large number of mail boxes are being processed process the >> mail boxes in sorted order, instead of in random hash order. >> >> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> >> --- >> lib/PublicInbox/Config.pm | 2 +- >> 1 file changed, 1 insertion(+), 1 deletion(-) >> >> diff --git a/lib/PublicInbox/Config.pm b/lib/PublicInbox/Config.pm >> index 9f3f8df7eeaa..9ec00b0ddb6d 100644 >> --- a/lib/PublicInbox/Config.pm >> +++ b/lib/PublicInbox/Config.pm >> @@ -128,7 +128,7 @@ sub each_inbox { >> } >> } else { >> my %seen; >> - foreach my $k (keys %$self) { >> + foreach my $k (sort keys %$self) { >> $k =~ m!\Apublicinbox\.([^/]+)\.mainrepo\z! or next; >> next if $seen{$1}; >> $seen{$1} = 1; > > I'm not sure if that code path happens outside of tests. > If it's parsing "git config -l" output, it'll use section_order > which preserves the ordering of the config file. > > Will take a deeper look tomorrow (and possibly rewrite those > tests to parse a scalarref instead of just taking a hashref). I know I have at least one use in my import_imap_inbox script. Looking at the dates it looks like I added this in July 2018, and you added section order in January 2019. So it may be you have a better fix in place already and I simply hadn't noticed. Eric ^ permalink raw reply [flat|nested] 23+ messages in thread
* [PATCH 3/4] Config.pm: Add support for looking up repos by their directories 2019-10-09 8:15 ` [PATCH 0/4] Various bits to support import_imap_mailbox Eric W. Biederman 2019-10-09 8:16 ` [PATCH 1/4] PublicInbox::Import Smuggle a raw message into add Eric W. Biederman 2019-10-09 8:17 ` [PATCH 2/4] PublicInbox::Config: Process mailboxes in sorted order Eric W. Biederman @ 2019-10-09 8:23 ` Eric W. Biederman 2019-10-09 8:25 ` [PATCH 4/4] IMAPTracker: Add a helper to track our place in reading imap mailboxes Eric W. Biederman 2019-10-10 19:08 ` ibx->{listid} autoviv fixup [was: [PATCH 0/4] Various bits to support import_imap_mailbox] Eric Wong 4 siblings, 0 replies; 23+ messages in thread From: Eric W. Biederman @ 2019-10-09 8:23 UTC (permalink / raw) To: Eric Wong; +Cc: meta Date: Sun, 29 Jul 2018 15:52:57 -0500 Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> --- Hmm. I thought I was using this but now that I am quickly checking I don't see this being used anywhere. I think PublicInbox::Admin::resolve_inboxes has superceded this functionality. Please feel free to drop this one. I am just sending it at this point because it would be strange to get to 3/4 and not send the patch. lib/PublicInbox/Config.pm | 17 +++++++++++++++++ 1 file changed, 17 insertions(+) diff --git a/lib/PublicInbox/Config.pm b/lib/PublicInbox/Config.pm index 9ec00b0ddb6d..7c5221c395be 100644 --- a/lib/PublicInbox/Config.pm +++ b/lib/PublicInbox/Config.pm @@ -156,6 +156,23 @@ sub lookup_newsgroup { undef; } +sub lookup_repodir { + my ($self, $repodir) = @_; + my $rv = $self->{-by_repodir}->{$repodir}; + return $rv if $rv; + + foreach my $k (keys %$self) { + $k =~ /\A(publicinbox\.[\w-]+)\.mainrepo\z/ or next; + my $v = $self->{$k}; + my $pfx = $1; + if ($v eq $repodir) { + $rv = _fill($self, $pfx); + return $rv; + } + } + undef; +} + sub limiter { my ($self, $name) = @_; $self->{-limiters}->{$name} ||= do { -- 2.23.0 ^ permalink raw reply related [flat|nested] 23+ messages in thread
* [PATCH 4/4] IMAPTracker: Add a helper to track our place in reading imap mailboxes 2019-10-09 8:15 ` [PATCH 0/4] Various bits to support import_imap_mailbox Eric W. Biederman ` (2 preceding siblings ...) 2019-10-09 8:23 ` [PATCH 3/4] Config.pm: Add support for looking up repos by their directories Eric W. Biederman @ 2019-10-09 8:25 ` Eric W. Biederman 2019-10-10 19:08 ` ibx->{listid} autoviv fixup [was: [PATCH 0/4] Various bits to support import_imap_mailbox] Eric Wong 4 siblings, 0 replies; 23+ messages in thread From: Eric W. Biederman @ 2019-10-09 8:25 UTC (permalink / raw) To: Eric Wong; +Cc: meta Date: Fri, 27 Jul 2018 20:54:27 -0500 This removes the need to delete from an imap mailbox when downloading it's messages. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> --- This is simple and potentially very useful. lib/PublicInbox/IMAPTracker.pm | 73 ++++++++++++++++++++++++++++++++++ 1 file changed, 73 insertions(+) create mode 100644 lib/PublicInbox/IMAPTracker.pm diff --git a/lib/PublicInbox/IMAPTracker.pm b/lib/PublicInbox/IMAPTracker.pm new file mode 100644 index 000000000000..65241eeebf51 --- /dev/null +++ b/lib/PublicInbox/IMAPTracker.pm @@ -0,0 +1,73 @@ +# Copyright (C) 2018 all contributors <meta@public-inbox.org> +# License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt> + +package PublicInbox::IMAPTracker; +use strict; +use warnings; +use DBI; +use DBD::SQLite; +use PublicInbox::Config; + +sub create_tables ($) +{ + my ($dbh) = @_; + + $dbh->do(<<''); +CREATE TABLE IF NOT EXISTS imap_last ( + url VARCHAR PRIMARY KEY NOT NULL, + uid_validity INTEGER NOT NULL, + uid INTEGER NOT NULL, + UNIQUE (url) +) + +} + + +sub dbh_new ($) +{ + my ($dbname) = @_; + my $dbh = DBI->connect("dbi:SQLite:dbname=$dbname", '', '', { + AutoCommit => 1, + RaiseError => 1, + PrintError => 0, + ReadOnly => 0, + sqlite_use_immediate_transaction => 1, + }); + $dbh->{sqlite_unicode} = 1; + $dbh->do('PRAGMA journal_mode = TRUNCATE'); + $dbh->do('PRAGMA cache_size = 80000'); + create_tables($dbh); + $dbh; +} + +sub get_last($$) +{ + my ($self, $url) = @_; + my $dbh = $self->{dbh}; + + my $sth = $dbh->prepare_cached(<<'', undef, 1); +SELECT uid_validity, uid FROM imap_last WHERE url = ? + + $sth->execute($url); + $sth->fetchrow_array; +} + +sub update_last($$$$) +{ + my ($self, $url, $validity, $last) = @_; + my $dbh = $self->{dbh}; + my $sth = $dbh->prepare_cached(<<''); +INSERT OR REPLACE INTO imap_last (url, uid_validity, uid) +VALUES (?, ?, ?) + + $sth->execute($url, $validity, $last); +} + +sub new { + my ($class) = @_; + my $dbname = PublicInbox::Config->config_dir() . "/imap.sqlite3"; + my $dbh = dbh_new($dbname); + bless { dbname => $dbname, dbh => $dbh }, $class; +} + +1; -- 2.23.0 ^ permalink raw reply related [flat|nested] 23+ messages in thread
* ibx->{listid} autoviv fixup [was: [PATCH 0/4] Various bits to support import_imap_mailbox] 2019-10-09 8:15 ` [PATCH 0/4] Various bits to support import_imap_mailbox Eric W. Biederman ` (3 preceding siblings ...) 2019-10-09 8:25 ` [PATCH 4/4] IMAPTracker: Add a helper to track our place in reading imap mailboxes Eric W. Biederman @ 2019-10-10 19:08 ` Eric Wong 2019-10-10 21:23 ` Eric W. Biederman 4 siblings, 1 reply; 23+ messages in thread From: Eric Wong @ 2019-10-10 19:08 UTC (permalink / raw) To: Eric W. Biederman; +Cc: meta "Eric W. Biederman" <ebiederm@xmission.com> wrote: > Eric Wong, > > These should all of my generic patches to support my import_imap_mailbox > script. The really important patch that adds to the support for List-ID > to public inbox configuration file I have already sent. > > I haven't written tests and I get the following test failure when I run > make test > > > t/config.t ................. 1/? > > # Failed test 'lookup matches expected output' > > # at t/config.t line 26. > > # Structures begin differing at: > > # $got->{listid} = ARRAY(0x55c1d4e3b6a8) > > # $expected->{listid} = Does not exist > > > > # Failed test 'lookup matches expected output for test' > > # at t/config.t line 42. > > # Structures begin differing at: > > # $got->{listid} = ARRAY(0x55c1d508d8d0) > > # $expected->{listid} = Does not exist > > # Looks like you failed 2 tests of 69. > > t/config.t ................. Dubious, test returned 2 (wstat 512, 0x200) > > I haven't looked into what is happening there. Hey Eric, I'll squash this in to fix the tests: diff --git a/lib/PublicInbox/Config.pm b/lib/PublicInbox/Config.pm index 9f3f8df7..c2fa40f9 100644 --- a/lib/PublicInbox/Config.pm +++ b/lib/PublicInbox/Config.pm @@ -449,8 +449,10 @@ sub _fill { $self->{-by_addr}->{$lc_addr} = $ibx; $self->{-no_obfuscate}->{$lc_addr} = 1; } - foreach my $list_id (@{$ibx->{listid}}) { - $self->{-by_list_id}->{$list_id} = $ibx; + if (my $listids = $ibx->{listid}) { + foreach my $list_id (@$listids) { + $self->{-by_list_id}->{$list_id} = $ibx; + } } if (my $ng = $ibx->{newsgroup}) { $self->{-by_newsgroup}->{$ng} = $ibx; Perl has this weird feature where it autovivifies arrayrefs (and hashrefs) on member access. ^ permalink raw reply related [flat|nested] 23+ messages in thread
* Re: ibx->{listid} autoviv fixup [was: [PATCH 0/4] Various bits to support import_imap_mailbox] 2019-10-10 19:08 ` ibx->{listid} autoviv fixup [was: [PATCH 0/4] Various bits to support import_imap_mailbox] Eric Wong @ 2019-10-10 21:23 ` Eric W. Biederman 0 siblings, 0 replies; 23+ messages in thread From: Eric W. Biederman @ 2019-10-10 21:23 UTC (permalink / raw) To: Eric Wong; +Cc: meta Eric Wong <e@80x24.org> writes: > "Eric W. Biederman" <ebiederm@xmission.com> wrote: >> Eric Wong, >> >> These should all of my generic patches to support my import_imap_mailbox >> script. The really important patch that adds to the support for List-ID >> to public inbox configuration file I have already sent. >> >> I haven't written tests and I get the following test failure when I run >> make test >> >> > t/config.t ................. 1/? >> > # Failed test 'lookup matches expected output' >> > # at t/config.t line 26. >> > # Structures begin differing at: >> > # $got->{listid} = ARRAY(0x55c1d4e3b6a8) >> > # $expected->{listid} = Does not exist >> > >> > # Failed test 'lookup matches expected output for test' >> > # at t/config.t line 42. >> > # Structures begin differing at: >> > # $got->{listid} = ARRAY(0x55c1d508d8d0) >> > # $expected->{listid} = Does not exist >> > # Looks like you failed 2 tests of 69. >> > t/config.t ................. Dubious, test returned 2 (wstat 512, 0x200) >> >> I haven't looked into what is happening there. > > Hey Eric, I'll squash this in to fix the tests: > > diff --git a/lib/PublicInbox/Config.pm b/lib/PublicInbox/Config.pm > index 9f3f8df7..c2fa40f9 100644 > --- a/lib/PublicInbox/Config.pm > +++ b/lib/PublicInbox/Config.pm > @@ -449,8 +449,10 @@ sub _fill { > $self->{-by_addr}->{$lc_addr} = $ibx; > $self->{-no_obfuscate}->{$lc_addr} = 1; > } > - foreach my $list_id (@{$ibx->{listid}}) { > - $self->{-by_list_id}->{$list_id} = $ibx; > + if (my $listids = $ibx->{listid}) { > + foreach my $list_id (@$listids) { > + $self->{-by_list_id}->{$list_id} = $ibx; > + } > } > if (my $ng = $ibx->{newsgroup}) { > $self->{-by_newsgroup}->{$ng} = $ibx; > > Perl has this weird feature where it autovivifies arrayrefs (and > hashrefs) on member access. Thank you. It makes complete sense from that perspective. If I didn't already know things were rough around the edges with these patches I would be embarrased. Eric ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Do I need multiple publicinbox.<name>.address values? 2019-10-09 7:58 ` Eric W. Biederman 2019-10-09 8:15 ` [PATCH 0/4] Various bits to support import_imap_mailbox Eric W. Biederman @ 2019-10-10 8:31 ` Eric Wong 2019-10-10 10:56 ` Eric W. Biederman 1 sibling, 1 reply; 23+ messages in thread From: Eric Wong @ 2019-10-10 8:31 UTC (permalink / raw) To: Eric W. Biederman; +Cc: Alyssa Ross, meta "Eric W. Biederman" <ebiederm@xmission.com> wrote: > Eric Wong <e@80x24.org> writes: > > > "Eric W. Biederman" <ebiederm@xmission.com> wrote: > >> my $tracker = PublicInbox::IMAPTracker->new(); > > > > Thanks. What's PublicInbox::IMAPTracker? > > Something that keeps the last fetched UID in an sqlite database. > I will follow up with a patch for that as well. Thanks! > I haven't been brave enough to let this script delete any mail > yet so I need to track what has been fetched. Something that > will be rolled back if the email message isn't commited into git. > > I have a companion script that will delete mail. Yup. I also don't trust and don't want somebody trusting my code when it comes to deleting stuff. So I've also been considering a script which deletes mail from my Maildirs/IMAP folders if a ContentId matches what's in a known public-inbox, perhaps after a certain time... > I mostly sent it so that there is some idea what I am working with. > > In context of this discussion I don't remember how often I am looking at > the "Delivered-To:" header. That is almost gone but I know for a while > I was using that as well. > > >> for (;;imap_sleep()) { > >> # Run fetch_mail in it's own separate process so > >> # that if something goes wrong the process exits > >> # and everything cleans up properly. > >> # > >> # Running fetch_mail in an eval block is not enough > >> # to prevent leaks of locks and other resources. > > > > I was thinking along the same lines :) > > > > The rest looked good and it's definitely something I can work > > with to get supportable in a way that reuses git's credential > > system and configs. > > I think fundamentally the script sucks because it is one email message > at a time (not using IMAPs overlapping), the script is polling (NOT > using IMAP idle), and I have magic to go from the server that I need > to fetch to the actual domain. Yeah, IMAP client libraries are pretty disappointing considering what the protocol is capable of. I just checked libcurl and I don't see that it supports IMAP IDLE, either; and honestly I'm not sure how to add it since curl is more intended for URLs. > But in practice the script seems to be working reliably even in the > event of an error right now. Cool. Perfect is the enemy of good, after all :> ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Do I need multiple publicinbox.<name>.address values? 2019-10-10 8:31 ` Do I need multiple publicinbox.<name>.address values? Eric Wong @ 2019-10-10 10:56 ` Eric W. Biederman 0 siblings, 0 replies; 23+ messages in thread From: Eric W. Biederman @ 2019-10-10 10:56 UTC (permalink / raw) To: Eric Wong; +Cc: Alyssa Ross, meta Eric Wong <e@80x24.org> writes: > "Eric W. Biederman" <ebiederm@xmission.com> wrote: >> Eric Wong <e@80x24.org> writes: >> >> > "Eric W. Biederman" <ebiederm@xmission.com> wrote: >> >> my $tracker = PublicInbox::IMAPTracker->new(); >> > >> > Thanks. What's PublicInbox::IMAPTracker? >> >> Something that keeps the last fetched UID in an sqlite database. >> I will follow up with a patch for that as well. > > Thanks! > >> I haven't been brave enough to let this script delete any mail >> yet so I need to track what has been fetched. Something that >> will be rolled back if the email message isn't commited into git. >> >> I have a companion script that will delete mail. > > Yup. I also don't trust and don't want somebody trusting my > code when it comes to deleting stuff. > > So I've also been considering a script which deletes mail from > my Maildirs/IMAP folders if a ContentId matches what's in a > known public-inbox, perhaps after a certain time... Interesting. Partly I have been leaving things up so I could restart the process if the script fails. I like the idea of a verifier/deleter. I have a delete up to my imap tracker script. A verifier will probably share a lot of logic with the imap importer script, but since it only has to confirm something is stable before deleting, it isn't quite the same. Time permitting I will play with that next. Eric ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Do I need multiple publicinbox.<name>.address values? 2019-10-08 0:10 ` Eric Wong 2019-10-08 12:18 ` Eric W. Biederman @ 2019-10-09 11:59 ` Alyssa Ross 2019-10-10 10:06 ` Eric Wong 1 sibling, 1 reply; 23+ messages in thread From: Alyssa Ross @ 2019-10-09 11:59 UTC (permalink / raw) To: Eric Wong; +Cc: meta [-- Attachment #1: Type: text/plain, Size: 1088 bytes --] > Sorry for that confusion. Historically, I wrote public-inbox > and -mda because I needed to migrate mailing lists with a > several month window where the old list would be active, > but I was preparing for the switch: > > So it started as: > > posters -> to_be_shutdown_host -> MTA (postfix) -> public-inbox-mda > > And ends up being: > > posters -> MTA (postfix) -> public-inbox-mda --[1]--> (mlmmj|mailman) > > > It sounds like what you're doing is: > > MTA -> (mlmmj|mailman) -> public-inbox-mda > > Which wasn't my original intended usecase for -mda, but is for -watch. Interesting! The reason I set it up this way is that mailman (at least v3) doesn't archive raw incoming mail -- the default archiver, Hyperkitty, pulls out the body and some headers it cares about, saves them into a database, and discards the rest. Which IMO is not great for an archiver (although I do like Hyperkitty as a web interface for people who haven't yet seen the light of mail). So my plan is to use public-inbox as my primary archiver, and just use Hyperkitty as a "modern" web interface. [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 832 bytes --] ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Do I need multiple publicinbox.<name>.address values? 2019-10-09 11:59 ` Alyssa Ross @ 2019-10-10 10:06 ` Eric Wong 0 siblings, 0 replies; 23+ messages in thread From: Eric Wong @ 2019-10-10 10:06 UTC (permalink / raw) To: Alyssa Ross; +Cc: meta Alyssa Ross <hi@alyssa.is> wrote: > > Sorry for that confusion. Historically, I wrote public-inbox > > and -mda because I needed to migrate mailing lists with a > > several month window where the old list would be active, > > but I was preparing for the switch: > > > > So it started as: > > > > posters -> to_be_shutdown_host -> MTA (postfix) -> public-inbox-mda > > > > And ends up being: > > > > posters -> MTA (postfix) -> public-inbox-mda --[1]--> (mlmmj|mailman) > > > > > > It sounds like what you're doing is: > > > > MTA -> (mlmmj|mailman) -> public-inbox-mda > > > > Which wasn't my original intended usecase for -mda, but is for -watch. > > Interesting! The reason I set it up this way is that mailman (at least > v3) doesn't archive raw incoming mail -- the default archiver, > Hyperkitty, pulls out the body and some headers it cares about, saves > them into a database, and discards the rest. Which IMO is not great for > an archiver (although I do like Hyperkitty as a web interface for people > who haven't yet seen the light of mail). So my plan is to use > public-inbox as my primary archiver, and just use Hyperkitty as a > "modern" web interface. Yeah, right now you could have Mailman deliver to an address -> Maildir which public-inbox-watch runs on. It's a bit roundabout and you'd need to age out old messages from the Maildir. Or you could write an NNTP -> pipe-to-arbitrary command script which can pipe to Mailman or mlmmj, keeping track of which messages have already been sent (or wait for me to write it). I've been meaning to replace scripts/ssoma-replay (for mlmmj) with that script at some point. I've never actually run Mailman, but I assume it allows raw messages to be piped to it like most mail tools. ^ permalink raw reply [flat|nested] 23+ messages in thread
end of thread, other threads:[~2019-10-15 23:06 UTC | newest] Thread overview: 23+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2019-10-07 22:13 Do I need multiple publicinbox.<name>.address values? Alyssa Ross 2019-10-08 0:10 ` Eric Wong 2019-10-08 12:18 ` Eric W. Biederman 2019-10-08 12:23 ` [PATCH] Config.pm: Add support for mailing list information Eric W. Biederman 2019-10-08 22:11 ` Do I need multiple publicinbox.<name>.address values? Eric Wong 2019-10-08 22:24 ` Eric W. Biederman 2019-10-08 22:41 ` Eric Wong 2019-10-09 7:58 ` Eric W. Biederman 2019-10-09 8:15 ` [PATCH 0/4] Various bits to support import_imap_mailbox Eric W. Biederman 2019-10-09 8:16 ` [PATCH 1/4] PublicInbox::Import Smuggle a raw message into add Eric W. Biederman 2019-10-15 20:26 ` Eric Wong 2019-10-15 23:05 ` Eric W. Biederman 2019-10-09 8:17 ` [PATCH 2/4] PublicInbox::Config: Process mailboxes in sorted order Eric W. Biederman 2019-10-10 9:43 ` Eric Wong 2019-10-10 11:05 ` Eric W. Biederman 2019-10-09 8:23 ` [PATCH 3/4] Config.pm: Add support for looking up repos by their directories Eric W. Biederman 2019-10-09 8:25 ` [PATCH 4/4] IMAPTracker: Add a helper to track our place in reading imap mailboxes Eric W. Biederman 2019-10-10 19:08 ` ibx->{listid} autoviv fixup [was: [PATCH 0/4] Various bits to support import_imap_mailbox] Eric Wong 2019-10-10 21:23 ` Eric W. Biederman 2019-10-10 8:31 ` Do I need multiple publicinbox.<name>.address values? Eric Wong 2019-10-10 10:56 ` Eric W. Biederman 2019-10-09 11:59 ` Alyssa Ross 2019-10-10 10:06 ` Eric Wong
Code repositories for project(s) associated with this public inbox https://80x24.org/public-inbox.git This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).