user/dev discussion of public-inbox itself
 help / color / mirror / code / Atom feed
* Do I need multiple publicinbox.<name>.address values?
@ 2019-10-07 22:13 Alyssa Ross
  2019-10-08  0:10 ` Eric Wong
  0 siblings, 1 reply; 23+ messages in thread
From: Alyssa Ross @ 2019-10-07 22:13 UTC (permalink / raw)
  To: meta

[-- Attachment #1: Type: text/plain, Size: 1212 bytes --]

Suppose I have a mailing list, foo-discuss@example.org, and a
public-inbox set up, subscribed to that mailing list, that is subscribed
to that list as public-inbox+foo-discuss@example.org, where my MTA
delivers to public-inbox using public-inbox-mda.

In this case, mail will be To: foo-discuss@example.org, but RCPT TO (and
therefore ORIGINAL_RECIPIENT) public-inbox+foo-discuss@example.org.  In
my testing, public-inbox requires *both* the To: header and
ORIGINAL_RECIPIENT to match publicinbox.<name>.address.

But, public-inbox-config(7) has this to say for
publicinbox.<name>.address:

> The email address of the public-inbox.  May be specified
> more than once for merging multiple mailing lists (or migrating
> to new addresses).

Nothing in this suggests to me that I would need to set it more than
once for my simple use case.

So my question is, am I overcomplicating this -- should I be able to set
a single value for publicinbox.<name>.address that will get my
public-inbox to accept mail from the list?  Or, alternatively, should a
sentence be added to that description explaining that multiple addresses
will also be required for the case where public-inbox is subscribed
itself to a mailing list?

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Do I need multiple publicinbox.<name>.address values?
  2019-10-07 22:13 Do I need multiple publicinbox.<name>.address values? Alyssa Ross
@ 2019-10-08  0:10 ` Eric Wong
  2019-10-08 12:18   ` Eric W. Biederman
  2019-10-09 11:59   ` Alyssa Ross
  0 siblings, 2 replies; 23+ messages in thread
From: Eric Wong @ 2019-10-08  0:10 UTC (permalink / raw)
  To: Alyssa Ross; +Cc: meta, Eric W. Biederman

Alyssa Ross <hi@alyssa.is> wrote:

> Subject: Do I need multiple publicinbox.<name>.address values?

Absolutely not

> Suppose I have a mailing list, foo-discuss@example.org, and a
> public-inbox set up, subscribed to that mailing list, that is subscribed
> to that list as public-inbox+foo-discuss@example.org, where my MTA
> delivers to public-inbox using public-inbox-mda.

Currently, -mda does if you're mirroring, unfortunately.  I
think Eric Biederman is/was working on List-Id support to drop
that requirement, but I'm not sure where that is...

Eric B: would you mind if I take List-Id support over?  I've got
some hours free in the coming days(s)... (I think :x)

> In this case, mail will be To: foo-discuss@example.org, but RCPT TO (and
> therefore ORIGINAL_RECIPIENT) public-inbox+foo-discuss@example.org.  In
> my testing, public-inbox requires *both* the To: header and
> ORIGINAL_RECIPIENT to match publicinbox.<name>.address.

Yes, but To:/Cc: and ORIGINAL_RECIPIENT can be different.
lore.kernel.org uses two addresses: one for the well-known
address (e.g. linux-kernel@vger.kernel.org) and the other
something@something.kernel.org, which nobody (should) be
sending to.

> But, public-inbox-config(7) has this to say for
> publicinbox.<name>.address:

> > The email address of the public-inbox.  May be specified
> > more than once for merging multiple mailing lists (or migrating
> > to new addresses).

> Nothing in this suggests to me that I would need to set it more than
> once for my simple use case.

Sorry for that confusion.  Historically, I wrote public-inbox
and -mda because I needed to migrate mailing lists with a
several month window where the old list would be active,
but I was preparing for the switch:

So it started as:

  posters -> to_be_shutdown_host -> MTA (postfix) -> public-inbox-mda

And ends up being:

  posters -> MTA (postfix) -> public-inbox-mda --[1]--> (mlmmj|mailman)


It sounds like what you're doing is:

  MTA -> (mlmmj|mailman) -> public-inbox-mda

Which wasn't my original intended usecase for -mda, but is for -watch.

> So my question is, am I overcomplicating this -- should I be able to set
> a single value for publicinbox.<name>.address that will get my
> public-inbox to accept mail from the list?  Or, alternatively, should a
> sentence be added to that description explaining that multiple addresses
> will also be required for the case where public-inbox is subscribed
> itself to a mailing list?

Generally, I use -watch exclusively when mirroring lists.
With -watch, you won't need to set extra addresses.

But apparently some folks prefer -mda for everything.  So I
guess we'll have to better support that, too (via List-Id).



(*) via scripts/ssoma-replay, which I intend to replace with an
    nntpd-based replacement at some point for v2 compatibility.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Do I need multiple publicinbox.<name>.address values?
  2019-10-08  0:10 ` Eric Wong
@ 2019-10-08 12:18   ` Eric W. Biederman
  2019-10-08 12:23     ` [PATCH] Config.pm: Add support for mailing list information Eric W. Biederman
  2019-10-08 22:11     ` Do I need multiple publicinbox.<name>.address values? Eric Wong
  2019-10-09 11:59   ` Alyssa Ross
  1 sibling, 2 replies; 23+ messages in thread
From: Eric W. Biederman @ 2019-10-08 12:18 UTC (permalink / raw)
  To: Eric Wong; +Cc: Alyssa Ross, meta

Eric Wong <e@80x24.org> writes:

> Alyssa Ross <hi@alyssa.is> wrote:
>
>> Subject: Do I need multiple publicinbox.<name>.address values?
>
> Absolutely not
>
>> Suppose I have a mailing list, foo-discuss@example.org, and a
>> public-inbox set up, subscribed to that mailing list, that is subscribed
>> to that list as public-inbox+foo-discuss@example.org, where my MTA
>> delivers to public-inbox using public-inbox-mda.
>
> Currently, -mda does if you're mirroring, unfortunately.  I
> think Eric Biederman is/was working on List-Id support to drop
> that requirement, but I'm not sure where that is...
>
> Eric B: would you mind if I take List-Id support over?  I've got
> some hours free in the coming days(s)... (I think :x)

I believe I have the config side of the work done.  I haven't
figured out how to add this to public-inbox-mda/public-inbox-watch.

Let me send out what I have and then you can work on the bits
for public-inbox-watch public-inbox-mda.

Last round I was messing with this I almost had my imap fetcher in
shape. I may try again but let's get the listid thing settled first
if we can.

Eric

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH] Config.pm: Add support for mailing list information
  2019-10-08 12:18   ` Eric W. Biederman
@ 2019-10-08 12:23     ` Eric W. Biederman
  2019-10-08 22:11     ` Do I need multiple publicinbox.<name>.address values? Eric Wong
  1 sibling, 0 replies; 23+ messages in thread
From: Eric W. Biederman @ 2019-10-08 12:23 UTC (permalink / raw)
  To: Eric Wong; +Cc: Alyssa Ross, meta

Date: Thu, 16 May 2019 19:22:46 -0500

The world has turned since I first started following mailing lists and
to my surprise every mailing list that I am subscribed to properly
sets the "List-ID:" mailing list header.  So instead of doing
something clever and flexible I am adding support for looking up
public inbox mailing lists by their mailing list name.

That makes the work needed for each email trivial and easy to understand.
- Parse the "List-ID:" header.
- Lookup in the configuration which mailbox is connected to that
  "List-ID:"
- Deliver the mail to that mailbox.

To that end this change enhances PublicInbox to have an additional
mailbox configuration parameter "listid" that holds the mailing list
name.

A method is added to the PublicInbox config object called
lookup_list_id that given a mailing list name will return the
PublicInbox in the configuration that is configured to handle that
mailing list.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---

The relevant snippet from my imap import program looks like:

sub list_hdr_ibx($$)
{
	my ($config, $list_hdr) = @_;
	my $list_id;
	if ($list_hdr =~ m/\0/) {
		warn("Bad List-ID: $list_hdr contains a null\n");
		return undef;
	} elsif ($list_hdr =~ m/\A[^<>]*<(\S*)>\s*\z/) {
		$list_id = $1;
	} else {
		warn("Bad List-ID: $list_hdr\n");
		return undef;
	}
	my $ibx = $config->lookup_list_id($list_id);
	if (!defined($ibx)) {
		warn("Cound not find inbox for List-ID: $list_id\n");
	}

	print(" List-ID: $list_id\n");
	$ibx;
}

sub email_dest($$)
{
	my ($config, $mime) = @_;
	my %ibxs;
	my $hdr = $mime->header_obj;
	my @list_hdrs = $hdr->header_raw('List-ID');
	for my $list_hdr (@list_hdrs) {
		my $ibx = list_hdr_ibx($config, $list_hdr);
		if (defined($ibx)) {
			$ibxs{$ibx->{mainrepo}} = $ibx;
		}
	}
	my @ibxs = values %ibxs;
	return @ibxs;
}


 lib/PublicInbox/Config.pm | 33 ++++++++++++++++++++++++++++++++-
 1 file changed, 32 insertions(+), 1 deletion(-)

diff --git a/lib/PublicInbox/Config.pm b/lib/PublicInbox/Config.pm
index 4fcb20d24437..9f3f8df7eeaa 100644
--- a/lib/PublicInbox/Config.pm
+++ b/lib/PublicInbox/Config.pm
@@ -25,6 +25,7 @@ sub new {
 
 	# caches
 	$self->{-by_addr} ||= {};
+	$self->{-by_list_id} ||= {};
 	$self->{-by_name} ||= {};
 	$self->{-by_newsgroup} ||= {};
 	$self->{-no_obfuscate} ||= {};
@@ -84,6 +85,33 @@ sub lookup {
 	_fill($self, $pfx);
 }
 
+sub lookup_list_id {
+	my ($self, $list_id) = @_;
+	$list_id = lc($list_id);
+	my $ibx = $self->{-by_list_id}->{$list_id};
+	return $ibx if $ibx;
+
+	my $pfx;
+
+	foreach my $k (keys %$self) {
+		$k =~ /\A(publicinbox\.[\w-]+)\.listid\z/ or next;
+		my $v = $self->{$k};
+		if (ref($v) eq "ARRAY") {
+			foreach my $alias (@$v) {
+				(lc($alias) eq $list_id) or next;
+				$pfx = $1;
+				last;
+			}
+		} else {
+			(lc($v) eq $list_id) or next;
+			$pfx = $1;
+			last;
+		}
+	}
+	defined $pfx or return;
+	_fill($self, $pfx);
+}
+
 sub lookup_name ($$) {
 	my ($self, $name) = @_;
 	$self->{-by_name}->{$name} || _fill($self, "publicinbox.$name");
@@ -398,7 +426,7 @@ sub _fill {
 	}
 	# TODO: more arrays, we should support multi-value for
 	# more things to encourage decentralization
-	foreach my $k (qw(address altid nntpmirror coderepo hide)) {
+	foreach my $k (qw(address altid nntpmirror coderepo hide listid)) {
 		if (defined(my $v = $self->{"$pfx.$k"})) {
 			$ibx->{$k} = _array($v);
 		}
@@ -421,6 +449,9 @@ sub _fill {
 		$self->{-by_addr}->{$lc_addr} = $ibx;
 		$self->{-no_obfuscate}->{$lc_addr} = 1;
 	}
+	foreach my $list_id (@{$ibx->{listid}}) {
+		$self->{-by_list_id}->{$list_id} = $ibx;
+	}
 	if (my $ng = $ibx->{newsgroup}) {
 		$self->{-by_newsgroup}->{$ng} = $ibx;
 	}
-- 
2.23.0


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: Do I need multiple publicinbox.<name>.address values?
  2019-10-08 12:18   ` Eric W. Biederman
  2019-10-08 12:23     ` [PATCH] Config.pm: Add support for mailing list information Eric W. Biederman
@ 2019-10-08 22:11     ` Eric Wong
  2019-10-08 22:24       ` Eric W. Biederman
  1 sibling, 1 reply; 23+ messages in thread
From: Eric Wong @ 2019-10-08 22:11 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: Alyssa Ross, meta

"Eric W. Biederman" <ebiederm@xmission.com> wrote:
> Eric Wong <e@80x24.org> writes:
> 
> > Alyssa Ross <hi@alyssa.is> wrote:
> >
> >> Subject: Do I need multiple publicinbox.<name>.address values?
> >
> > Absolutely not
> >
> >> Suppose I have a mailing list, foo-discuss@example.org, and a
> >> public-inbox set up, subscribed to that mailing list, that is subscribed
> >> to that list as public-inbox+foo-discuss@example.org, where my MTA
> >> delivers to public-inbox using public-inbox-mda.
> >
> > Currently, -mda does if you're mirroring, unfortunately.  I
> > think Eric Biederman is/was working on List-Id support to drop
> > that requirement, but I'm not sure where that is...
> >
> > Eric B: would you mind if I take List-Id support over?  I've got
> > some hours free in the coming days(s)... (I think :x)
> 
> I believe I have the config side of the work done.  I haven't
> figured out how to add this to public-inbox-mda/public-inbox-watch.
> 
> Let me send out what I have and then you can work on the bits
> for public-inbox-watch public-inbox-mda.

Thanks, will do.

> Last round I was messing with this I almost had my imap fetcher in
> shape. I may try again but let's get the listid thing settled first
> if we can.

Alright.

Btw, would using libcurl for IMAP support be easier for you?

I'm considering introducing libcurl support via Inline::C (it
might be easier to hook into the event loop for other things).

I'm also thinking about distributing some C via *.h header files
so it's easier to sparse/lint the code w/o it being embedded
inside a Perl file (I'd send those *.h files to Inline::C for
production use).



Using *.h since MakeMaker will automatically assume any *.c
files will be made into an XS extension, and I'm not sure how to
workaround that... (using GNU make w/o MakeMaker is an option)

Whether it's Perl or C; I want to keep everything on end-user
systems in source form, so they can just hack/update the source
if needed instead of having to find/download/build it to
experiment.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Do I need multiple publicinbox.<name>.address values?
  2019-10-08 22:11     ` Do I need multiple publicinbox.<name>.address values? Eric Wong
@ 2019-10-08 22:24       ` Eric W. Biederman
  2019-10-08 22:41         ` Eric Wong
  0 siblings, 1 reply; 23+ messages in thread
From: Eric W. Biederman @ 2019-10-08 22:24 UTC (permalink / raw)
  To: Eric Wong; +Cc: Alyssa Ross, meta

Eric Wong <e@80x24.org> writes:

> "Eric W. Biederman" <ebiederm@xmission.com> wrote:
>> Eric Wong <e@80x24.org> writes:
>> 
>> > Alyssa Ross <hi@alyssa.is> wrote:
>> >
>> >> Subject: Do I need multiple publicinbox.<name>.address values?
>> >
>> > Absolutely not
>> >
>> >> Suppose I have a mailing list, foo-discuss@example.org, and a
>> >> public-inbox set up, subscribed to that mailing list, that is subscribed
>> >> to that list as public-inbox+foo-discuss@example.org, where my MTA
>> >> delivers to public-inbox using public-inbox-mda.
>> >
>> > Currently, -mda does if you're mirroring, unfortunately.  I
>> > think Eric Biederman is/was working on List-Id support to drop
>> > that requirement, but I'm not sure where that is...
>> >
>> > Eric B: would you mind if I take List-Id support over?  I've got
>> > some hours free in the coming days(s)... (I think :x)
>> 
>> I believe I have the config side of the work done.  I haven't
>> figured out how to add this to public-inbox-mda/public-inbox-watch.
>> 
>> Let me send out what I have and then you can work on the bits
>> for public-inbox-watch public-inbox-mda.
>
> Thanks, will do.
>
>> Last round I was messing with this I almost had my imap fetcher in
>> shape. I may try again but let's get the listid thing settled first
>> if we can.
>
> Alright.
>
> Btw, would using libcurl for IMAP support be easier for you?

Right now I think I just need to make certain all of my prereqs are
merged and push the code out.  I just don't have as much time to work
on these things as I am used to so it is taking me forever to get
anything done.

For what it is worth below is my imap import script below.

The hardest part has been making certain I get the error handling
correct when unexpected errors happen.  Because errors do happen.

> I'm considering introducing libcurl support via Inline::C (it
> might be easier to hook into the event loop for other things).
>
> I'm also thinking about distributing some C via *.h header files
> so it's easier to sparse/lint the code w/o it being embedded
> inside a Perl file (I'd send those *.h files to Inline::C for
> production use).
>
>
>
> Using *.h since MakeMaker will automatically assume any *.c
> files will be made into an XS extension, and I'm not sure how to
> workaround that... (using GNU make w/o MakeMaker is an option)
>
> Whether it's Perl or C; I want to keep everything on end-user
> systems in source form, so they can just hack/update the source
> if needed instead of having to find/download/build it to
> experiment.


#<--- scripts/import_imap_mailbox ---->
#!/usr/bin/perl -w
# Script to import a IMAP mailbox into a public-inbox
=begin usage
	./import_imap_mailbox imap://username@hostname
=cut
use strict;
use warnings;
use Getopt::Long qw(:config gnu_getopt no_ignore_case auto_abbrev);
use Mail::IMAPClient;
use IO::Socket;
use IO::Socket::SSL;
use Data::Dumper;
use Email::Simple;
use Email::MIME;
use File::Basename;
use File::Sync qw(sync);
use Term::ReadKey;
use PublicInbox::Config;
use PublicInbox::Address;
use PublicInbox::IMAPTracker;
use PublicInbox::InboxWritable;
use POSIX qw(strftime);
sub usage { "Usage:\n".join('', grep(/\t/, `head -n 24 $0`)) }
my $push_origin = 1;
my $sanitize = 1;
my %opts = (
	'--push-origin!' => \$push_origin,
	'--sanitize!' => \$sanitize,
);
GetOptions(%opts) or die usage();

my $mail_url = shift @ARGV or die usage();
my $mail_hostname;
my $mail_domainname;
my $mail_username;
my $mail_password;
if ($mail_url =~ m$\Aimap://([^@]+)[@]([^@]+)(/|)\z$) {
	$mail_username = $1;
	$mail_hostname = $2;
	if ($mail_hostname eq "mail.xmission.com") {
		$mail_domainname = "xmission.com";
	}
} else {
	die usage();
}

chomp(my $committer_name = `git config user.name`);
chomp(my $committer_email = `git config user.email`);
my $committer = "$committer_name <$committer_email>";

my $mail_addr     = $mail_username . '@' . $mail_domainname;
my $url_base = 'imap://' . $mail_username . '@' . $mail_hostname . '/' ;

sub list_hdr_ibx($$)
{
	my ($config, $list_hdr) = @_;
	my $list_id;
	if ($list_hdr =~ m/\0/) {
		warn("Bad List-ID: $list_hdr contains a null\n");
		return undef;
	} elsif ($list_hdr =~ m/\A[^<>]*<(\S*)>\s*\z/) {
		$list_id = $1;
	} else {
		warn("Bad List-ID: $list_hdr\n");
		return undef;
	}
	my $ibx = $config->lookup_list_id($list_id);
	if (!defined($ibx)) {
		warn("Cound not find inbox for List-ID: $list_id\n");
	}

	print(" List-ID: $list_id\n");
	$ibx;
}

sub deliveredto_ibx($$)
{
	my ($config, $deliveredto) = @_;
	my ($email) = PublicInbox::Address::emails($deliveredto);
	if (!defined($email)) {
		warn("No email in Delivered-To: $deliveredto\n");
		return undef;
	}
	my $ibx = $config->lookup($email);
	if (!defined($ibx)) {
		warn("Cound not find inbox for deliveredto: $email\n");
		return undef;
	}

	print(" deliveredto: $deliveredto\n");
	$ibx;
}

sub mbox_default_ibx($$) {
	my ($config, $mailbox) = @_;
	my $addr = $mail_username . '+' . $mailbox . '@' . $mail_domainname;
	if ($mailbox eq 'INBOX') {
		$addr = $mail_addr;
	}
	my $ibx = $config->lookup($addr);
	if (defined($ibx) and !defined($ibx->{mainrepo})) {
		$ibx = undef;
	}
	$ibx;
}

sub ibx_gitdir($)
{
	my ($ibx) = @_;
	my $repo = $ibx->{mainrepo};
	die("Inbox without mainrepo") unless defined($repo);
	my $git_dir = $repo;
	if (-d "$repo/git") {
		my $last = 0;
		opendir(my $dh, "$repo/git") || die("Can not open git dir $repo/git/\n");
		while(my $name = readdir($dh)) {
			if ($name =~ m/^([0-9]+).git$/) {
				if ($last < $1) {
					$last = $1;
				}
			}
		}
		closedir($dh);
		$git_dir = "$repo/git/$last.git";
	}
	return $git_dir;
}

sub email_dest($$)
{
	my ($config, $mime) = @_;
	my %ibxs;
	my $hdr = $mime->header_obj;
	my @list_hdrs = $hdr->header_raw('List-ID');
	for my $list_hdr (@list_hdrs) {
		my $ibx = list_hdr_ibx($config, $list_hdr);
		if (defined($ibx)) {
			$ibxs{$ibx->{mainrepo}} = $ibx;
		}
	}
	my @DeliveredTo = $hdr->header_raw('Delivered-To');
	for my $deliveredto (@DeliveredTo) {
		my $ibx = deliveredto_ibx($config, $deliveredto);
		if (defined($ibx)) {
			$ibxs{$ibx->{mainrepo}} = $ibx;
		}
	}
	my @ibxs = values %ibxs;
	return @ibxs;
}

sub index_inbox ($)
{
	my ($ibx) = @_;
	my $repo = $ibx->{mainrepo};
	my $pid = fork();
	if (!defined($pid)) {
		warn("public-inbox-index $repo failed to start\n");
		return;
	}
	if ($pid != 0) {
		# Wait for the process so zombies don't accumulate
		waitpid($pid, 0);
		return;
	}
	# child
	exec( 'public-inbox-index', $repo);
	warn("exec public-inbox-index $repo failed\n");
	exit(1);
}

sub imap_sleep ()
{
	my $sleep = 5;
	print "\nSleeping for $sleep minutes\n\n";
	sleep($sleep*60);
}

sub fsck_inbox ($)
{
	my ($git_dir) = @_;
	if (system(('git', '--git-dir', $git_dir, 'fsck')) != 0) {
		die "git fsck failed: $?";
	}
}

sub push_inbox ($)
{
	my ($git_dir) = @_;
	if (!$push_origin) {
		return;
	}
	print("pushing $git_dir...\n");
	if (system(('git', '--git-dir', $git_dir, 'push', 'origin')) != 0) {
		die "git push failed: $?";
	}
}

print("Enter your imap password: ");
ReadMode('noecho');
$mail_password = ReadLine(0);
ReadMode('normal');
chomp($mail_password);
print("\n");

sub fetch_mailbox ($$$$)
{
	my ($config, $tracker, $client, $mailbox) = @_;
	my $now = time();
	print("mailbox: $mailbox @ " .
	      strftime("%Y-%m-%d %H:%M:%S %z", localtime(time()))
	      . "\n");

	my $default_ibx = mbox_default_ibx($config, $mailbox);

	if (!defined($default_ibx)) {
		print("skipping $mailbox no default inbox\n");
		return 0;
	}

	my %importers;
	my $url = $url_base  . $mailbox;
	my ($last_validity, $last_uid) = $tracker->get_last($url);

	$client->select($mailbox);
	$client->Peek(1);
	$client->Uid(1);
	my $validity = $client->uidvalidity($mailbox) or die("No uid validity");
	if (defined($last_validity) and ($validity ne $last_validity)) {
		die ("Unexpected uid validity $validity expected $last_validity");
	}

	my $search_str="ALL";
	if (defined($last_uid)) {
		# Find the last seen and all higher articles
		$search_str = "UID $last_uid:*";
	}
	my $uids = $client->search($search_str);
	if (!defined($uids) || (scalar(@$uids) == 0)) {
		print("No uids found! $@\n");
		return 0;
	}

	my $last = undef;
	my @sorted_uids = sort { $a <=> $b } @$uids;
	# Cap the number of uids to process at once
	my $more = 0;
	my $uid_count = scalar(@sorted_uids);
	if ($uid_count > 100) {
		@sorted_uids = @sorted_uids[0..99];
		$more = $uid_count - 100;
	}
	for my $uid (@sorted_uids) {
		print("UID: $validity $uid\n");
		if (defined($last_uid)) {
			if ($uid == $last_uid) {
				next;
			}
			if ($uid < $last_uid) {
				print("uid $uid not below last $last_uid, skipping.\n");
				next;
			}
		}
		my $email_str = $client->message_string($uid) or die "Could not message_string $@\n";
		my $email_len = length($email_str);
		my $mime = Email::MIME->new($email_str);
		$mime->{-public_inbox_raw} = $email_str;

		my @dests = email_dest($config, $mime);
		if (scalar(@dests) == 0) {
			push(@dests, $default_ibx);
		}
		die ("no destination for the email") unless scalar(@dests) > 0;
		printf(" dests: %d\n", scalar(@dests));
		for my $ibx (@dests) {
			my $git_dir = ibx_gitdir($ibx);
			print " git_dir: $git_dir\n";
			my $im = $importers{$git_dir}->[0];
			if (!defined($im)) {
				$ibx = PublicInbox::InboxWritable->new($ibx);
				$im = $ibx->importer(1);
				die "no im" unless defined($im);
				my @arr = ( $im, $ibx );
				$importers{$git_dir} = \@arr;
			}
			if (defined($im->{mm}->{num_highwater})) {
				print "Last: $git_dir: " . $im->{mm}->{num_highwater} . "\n";
			} else {
				print "Last: $git_dir: 0\n";
			}
			$im->add($mime);
			print "This: $git_dir: " . $im->{mm}->{num_highwater} . "\n";
		}
		#$client->delete_message($uid);
		$last = $uid;
	}

	if ($last) {
		die ("no git_dirs for $url") unless scalar(keys %importers) > 0;
		for my $git_dir (keys %importers) {
			my $ref = delete $importers{$git_dir};
			my ($im, $ibx) = @$ref;
			$im->done();
			push_inbox($git_dir);
		}
		print("updating tracker for $url...\n");
		$tracker->update_last($url, $validity, $last);

		#$client->delete_message($uids) or die ("Could not delete messages");
	}

	$client->close($mailbox);
	return $more;
}

sub fetch_mail ()
{
	my $config = eval { PublicInbox::Config->new };
	die("No public inbox config found!") unless $config;

	my $tracker = PublicInbox::IMAPTracker->new();

	my $socket = IO::Socket::SSL->new(
		PeerAddr => $mail_hostname,
		PeerPort => 993,
		Timeout  => 5,

		SSL_verify_mode => SSL_VERIFY_PEER,
		IO::Socket::SSL::default_ca(),
	);
	if (!defined($socket)) {
		die("Could not open socket to mailserver: $@\n");
	}

	my $client = Mail::IMAPClient->new(
		Socket   => $socket,
		User     => $mail_username,
		Password => $mail_password,
		Timeout  => 5,
	    );
	if (!defined($client)) {
		die("Could not initialize imap client $@\n");
	}
	if (!$client->IsAuthenticated()) {
		die("Could not authenticate against IMAP: $@\n");
	}
	if (!$client->IsConnected()) {
		die("Could not connect to the IMAP server: $@\n");
	}

	my $mailboxes = $client->folders();
	my @sorted_mailboxes = sort { $a  cmp $b } @$mailboxes;

	my $more;
	do {
		$more = 0;
		for my $mailbox (@sorted_mailboxes) {
			$more += fetch_mailbox($config, $tracker, $client, $mailbox);
		}
	} while ($more > 0);

	$client->logout();
}

sub relevant_inbox
{
	my ($ibx) = @_;
	# Verify the mailbox is one that would come from the server
	my $lc_user = lc($mail_username);
	my $lc_domain = lc($mail_domainname);
	foreach (@{$ibx->{address}}) {
		my $lc_addr = lc($_);
		if ($lc_addr =~ m/${lc_user}[+][^@]+[@]${lc_domain}/) {
			return 1;
		}
	}
	return 0;
}

sub sanitize_inboxes ()
{
	my $config = eval { PublicInbox::Config->new };
	die("No public inbox config found!") unless $config;

	$config->each_inbox(
		sub {
			my ($ibx) = @_;

			return unless relevant_inbox($ibx);

			print $ibx->{name} . "\n";
			my $git_dir = ibx_gitdir($ibx);
			fsck_inbox($git_dir);
			eval { push_inbox($git_dir); };
			index_inbox($ibx);
		}
	);
}

if ($sanitize) {
	sanitize_inboxes();
	sync();
}

for (;;imap_sleep()) {
	# Run fetch_mail in it's own separate process so
	# that if something goes wrong the process exits
	# and everything cleans up properly.
	#
	# Running fetch_mail in an eval block is not enough
	# to prevent leaks of locks and other resources.
	my $pid = fork();
	if (!defined($pid)) {
		warn("fork failed:\n");
		continue;
	}
	elsif ($pid == 0) {
		# in the child
		fetch_mail();
		exit(0);
	}
	else {
		warn("-------------------- CHILD: $pid --------------------\n");
		waitpid($pid, 0);
		sync();
	}
}

1;

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Do I need multiple publicinbox.<name>.address values?
  2019-10-08 22:24       ` Eric W. Biederman
@ 2019-10-08 22:41         ` Eric Wong
  2019-10-09  7:58           ` Eric W. Biederman
  0 siblings, 1 reply; 23+ messages in thread
From: Eric Wong @ 2019-10-08 22:41 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: Alyssa Ross, meta

"Eric W. Biederman" <ebiederm@xmission.com> wrote:
> 	my $tracker = PublicInbox::IMAPTracker->new();

Thanks.  What's PublicInbox::IMAPTracker?

> for (;;imap_sleep()) {
> 	# Run fetch_mail in it's own separate process so
> 	# that if something goes wrong the process exits
> 	# and everything cleans up properly.
> 	#
> 	# Running fetch_mail in an eval block is not enough
> 	# to prevent leaks of locks and other resources.

I was thinking along the same lines :)

The rest looked good and it's definitely something I can work
with to get supportable in a way that reuses git's credential
system and configs.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Do I need multiple publicinbox.<name>.address values?
  2019-10-08 22:41         ` Eric Wong
@ 2019-10-09  7:58           ` Eric W. Biederman
  2019-10-09  8:15             ` [PATCH 0/4] Various bits to support import_imap_mailbox Eric W. Biederman
  2019-10-10  8:31             ` Do I need multiple publicinbox.<name>.address values? Eric Wong
  0 siblings, 2 replies; 23+ messages in thread
From: Eric W. Biederman @ 2019-10-09  7:58 UTC (permalink / raw)
  To: Eric Wong; +Cc: Alyssa Ross, meta

Eric Wong <e@80x24.org> writes:

> "Eric W. Biederman" <ebiederm@xmission.com> wrote:
>> 	my $tracker = PublicInbox::IMAPTracker->new();
>
> Thanks.  What's PublicInbox::IMAPTracker?

Something that keeps the last fetched UID in an sqlite database.
I will follow up with a patch for that as well.

I haven't been brave enough to let this script delete any mail
yet so I need to track what has been fetched.  Something that
will be rolled back if the email message isn't commited into git.

I have a companion script that will delete mail.

I mostly sent it so that there is some idea what I am working with.

In context of this discussion I don't remember how often I am looking at
the "Delivered-To:" header.  That is almost gone but I know for a while
I was using that as well.

>> for (;;imap_sleep()) {
>> 	# Run fetch_mail in it's own separate process so
>> 	# that if something goes wrong the process exits
>> 	# and everything cleans up properly.
>> 	#
>> 	# Running fetch_mail in an eval block is not enough
>> 	# to prevent leaks of locks and other resources.
>
> I was thinking along the same lines :)
>
> The rest looked good and it's definitely something I can work
> with to get supportable in a way that reuses git's credential
> system and configs.

I think fundamentally the script sucks because it is one email message
at a time (not using IMAPs overlapping), the script is polling (NOT
using IMAP idle), and I have magic to go from the server that I need
to fetch to the actual domain.

But in practice the script seems to be working reliably even in the
event of an error right now.

Eric

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH 0/4] Various bits to support import_imap_mailbox
  2019-10-09  7:58           ` Eric W. Biederman
@ 2019-10-09  8:15             ` Eric W. Biederman
  2019-10-09  8:16               ` [PATCH 1/4] PublicInbox::Import Smuggle a raw message into add Eric W. Biederman
                                 ` (4 more replies)
  2019-10-10  8:31             ` Do I need multiple publicinbox.<name>.address values? Eric Wong
  1 sibling, 5 replies; 23+ messages in thread
From: Eric W. Biederman @ 2019-10-09  8:15 UTC (permalink / raw)
  To: Eric Wong; +Cc: meta


Eric Wong,

These should all of my generic patches to support my import_imap_mailbox
script.  The really important patch that adds to the support for List-ID
to public inbox configuration file I have already sent.

I haven't written tests and I get the following test failure when I run
make test

> t/config.t ................. 1/? 
> #   Failed test 'lookup matches expected output'
> #   at t/config.t line 26.
> #     Structures begin differing at:
> #          $got->{listid} = ARRAY(0x55c1d4e3b6a8)
> #     $expected->{listid} = Does not exist
> 
> #   Failed test 'lookup matches expected output for test'
> #   at t/config.t line 42.
> #     Structures begin differing at:
> #          $got->{listid} = ARRAY(0x55c1d508d8d0)
> #     $expected->{listid} = Does not exist
> # Looks like you failed 2 tests of 69.
> t/config.t ................. Dubious, test returned 2 (wstat 512, 0x200)

I haven't looked into what is happening there.

Eric Biederman (4):
      PublicInbox::Import  Smuggle a raw message into add
      PublicInbox::Config: Process mailboxes in sorted order
      Config.pm: Add support for looking up repos by their directories
      IMAPTracker:  Add a helper to track our place in reading imap mailboxes

 lib/PublicInbox/Config.pm      | 19 ++++++++++-
 lib/PublicInbox/IMAPTracker.pm | 73 ++++++++++++++++++++++++++++++++++++++++++
 lib/PublicInbox/Import.pm      |  9 +++---
 3 files changed, 96 insertions(+), 5 deletions(-)

Eric

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH 1/4] PublicInbox::Import Smuggle a raw message into add
  2019-10-09  8:15             ` [PATCH 0/4] Various bits to support import_imap_mailbox Eric W. Biederman
@ 2019-10-09  8:16               ` Eric W. Biederman
  2019-10-15 20:26                 ` Eric Wong
  2019-10-09  8:17               ` [PATCH 2/4] PublicInbox::Config: Process mailboxes in sorted order Eric W. Biederman
                                 ` (3 subsequent siblings)
  4 siblings, 1 reply; 23+ messages in thread
From: Eric W. Biederman @ 2019-10-09  8:16 UTC (permalink / raw)
  To: Eric Wong; +Cc: meta


Date: Tue, 15 Jan 2019 16:36:42 -0600

I don't trust the MIME type to not munge my email messages in horrible
ways upon occasion.  Therefore  allow for passing in the raw message
value instead of trusting the mime object to preserve it.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---

As we discussed last time I was working to merge my imap import script.

 lib/PublicInbox/Import.pm | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/lib/PublicInbox/Import.pm b/lib/PublicInbox/Import.pm
index 137b2b7800c4..b17c9d5c9fa8 100644
--- a/lib/PublicInbox/Import.pm
+++ b/lib/PublicInbox/Import.pm
@@ -394,16 +394,17 @@ sub add {
 	}
 
 	my $blob = $self->{mark}++;
-	my $str = $mime->as_string;
-	my $n = length($str);
+	my $raw_email = $mime->{-public_inbox_raw};
+	$raw_email ||= $mime->as_string;
+	my $n = length($raw_email);
 	$self->{bytes_added} += $n;
 	print $w "blob\nmark :$blob\ndata ", $n, "\n" or wfail;
-	print $w $str, "\n" or wfail;
+	print $w $raw_email, "\n" or wfail;
 
 	# v2: we need this for Xapian
 	if ($self->{want_object_info}) {
 		my $oid = $self->get_mark(":$blob");
-		$self->{last_object} = [ $oid, $n, \$str ];
+		$self->{last_object} = [ $oid, $n, \$raw_email ];
 	}
 	my $ref = $self->{ref};
 	my $commit = $self->{mark}++;
-- 
2.23.0


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 2/4] PublicInbox::Config: Process mailboxes in sorted order
  2019-10-09  8:15             ` [PATCH 0/4] Various bits to support import_imap_mailbox Eric W. Biederman
  2019-10-09  8:16               ` [PATCH 1/4] PublicInbox::Import Smuggle a raw message into add Eric W. Biederman
@ 2019-10-09  8:17               ` Eric W. Biederman
  2019-10-10  9:43                 ` Eric Wong
  2019-10-09  8:23               ` [PATCH 3/4] Config.pm: Add support for looking up repos by their directories Eric W. Biederman
                                 ` (2 subsequent siblings)
  4 siblings, 1 reply; 23+ messages in thread
From: Eric W. Biederman @ 2019-10-09  8:17 UTC (permalink / raw)
  To: Eric Wong; +Cc: meta


Date: Thu, 16 May 2019 19:26:47 -0500

To make the results reproducible and comprehensible when
a large number of mail boxes are being processed process the
mail boxes in sorted order, instead of in random hash order.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 lib/PublicInbox/Config.pm | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/PublicInbox/Config.pm b/lib/PublicInbox/Config.pm
index 9f3f8df7eeaa..9ec00b0ddb6d 100644
--- a/lib/PublicInbox/Config.pm
+++ b/lib/PublicInbox/Config.pm
@@ -128,7 +128,7 @@ sub each_inbox {
 		}
 	} else {
 		my %seen;
-		foreach my $k (keys %$self) {
+		foreach my $k (sort keys %$self) {
 			$k =~ m!\Apublicinbox\.([^/]+)\.mainrepo\z! or next;
 			next if $seen{$1};
 			$seen{$1} = 1;
-- 
2.23.0


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 3/4] Config.pm: Add support for looking up repos by their directories
  2019-10-09  8:15             ` [PATCH 0/4] Various bits to support import_imap_mailbox Eric W. Biederman
  2019-10-09  8:16               ` [PATCH 1/4] PublicInbox::Import Smuggle a raw message into add Eric W. Biederman
  2019-10-09  8:17               ` [PATCH 2/4] PublicInbox::Config: Process mailboxes in sorted order Eric W. Biederman
@ 2019-10-09  8:23               ` Eric W. Biederman
  2019-10-09  8:25               ` [PATCH 4/4] IMAPTracker: Add a helper to track our place in reading imap mailboxes Eric W. Biederman
  2019-10-10 19:08               ` ibx->{listid} autoviv fixup [was: [PATCH 0/4] Various bits to support import_imap_mailbox] Eric Wong
  4 siblings, 0 replies; 23+ messages in thread
From: Eric W. Biederman @ 2019-10-09  8:23 UTC (permalink / raw)
  To: Eric Wong; +Cc: meta

Date: Sun, 29 Jul 2018 15:52:57 -0500

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---

Hmm. I thought I was using this but now that I am quickly checking
I don't see this being used anywhere.  I think
PublicInbox::Admin::resolve_inboxes has superceded this functionality.

Please feel free to drop this one.  I am just sending it at this
point because it would be strange to get to 3/4 and not send
the patch.

 lib/PublicInbox/Config.pm | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/lib/PublicInbox/Config.pm b/lib/PublicInbox/Config.pm
index 9ec00b0ddb6d..7c5221c395be 100644
--- a/lib/PublicInbox/Config.pm
+++ b/lib/PublicInbox/Config.pm
@@ -156,6 +156,23 @@ sub lookup_newsgroup {
 	undef;
 }
 
+sub lookup_repodir {
+	my ($self, $repodir) = @_;
+	my $rv = $self->{-by_repodir}->{$repodir};
+	return $rv if $rv;
+
+	foreach my $k (keys %$self) {
+		$k =~ /\A(publicinbox\.[\w-]+)\.mainrepo\z/ or next;
+		my $v = $self->{$k};
+		my $pfx = $1;
+		if ($v eq $repodir) {
+			$rv = _fill($self, $pfx);
+			return $rv;
+		}
+	}
+	undef;
+}
+
 sub limiter {
 	my ($self, $name) = @_;
 	$self->{-limiters}->{$name} ||= do {
-- 
2.23.0


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 4/4] IMAPTracker:  Add a helper to track our place in reading imap mailboxes
  2019-10-09  8:15             ` [PATCH 0/4] Various bits to support import_imap_mailbox Eric W. Biederman
                                 ` (2 preceding siblings ...)
  2019-10-09  8:23               ` [PATCH 3/4] Config.pm: Add support for looking up repos by their directories Eric W. Biederman
@ 2019-10-09  8:25               ` Eric W. Biederman
  2019-10-10 19:08               ` ibx->{listid} autoviv fixup [was: [PATCH 0/4] Various bits to support import_imap_mailbox] Eric Wong
  4 siblings, 0 replies; 23+ messages in thread
From: Eric W. Biederman @ 2019-10-09  8:25 UTC (permalink / raw)
  To: Eric Wong; +Cc: meta

Date: Fri, 27 Jul 2018 20:54:27 -0500

This removes the need to delete from an imap mailbox when downloading it's messages.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---

This is simple and potentially very useful.

 lib/PublicInbox/IMAPTracker.pm | 73 ++++++++++++++++++++++++++++++++++
 1 file changed, 73 insertions(+)
 create mode 100644 lib/PublicInbox/IMAPTracker.pm

diff --git a/lib/PublicInbox/IMAPTracker.pm b/lib/PublicInbox/IMAPTracker.pm
new file mode 100644
index 000000000000..65241eeebf51
--- /dev/null
+++ b/lib/PublicInbox/IMAPTracker.pm
@@ -0,0 +1,73 @@
+# Copyright (C) 2018 all contributors <meta@public-inbox.org>
+# License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
+
+package PublicInbox::IMAPTracker;
+use strict;
+use warnings;
+use DBI;
+use DBD::SQLite;
+use PublicInbox::Config;
+
+sub create_tables ($)
+{
+	my ($dbh) = @_;
+
+	$dbh->do(<<'');
+CREATE TABLE IF NOT EXISTS imap_last (
+	url VARCHAR PRIMARY KEY NOT NULL,
+	uid_validity INTEGER NOT NULL,
+	uid INTEGER NOT NULL,
+	UNIQUE (url)
+)
+
+}
+
+
+sub dbh_new ($)
+{
+	my ($dbname) = @_;
+	my $dbh = DBI->connect("dbi:SQLite:dbname=$dbname", '', '', {
+		AutoCommit => 1,
+		RaiseError => 1,
+		PrintError => 0,
+		ReadOnly => 0,
+		sqlite_use_immediate_transaction => 1,
+	});
+	$dbh->{sqlite_unicode} = 1;
+	$dbh->do('PRAGMA journal_mode = TRUNCATE');
+	$dbh->do('PRAGMA cache_size = 80000');
+	create_tables($dbh);
+	$dbh;
+}
+
+sub get_last($$)
+{
+	my ($self, $url) = @_;
+	my $dbh = $self->{dbh};
+
+	my $sth = $dbh->prepare_cached(<<'', undef, 1);
+SELECT uid_validity, uid FROM imap_last WHERE url = ?
+
+	$sth->execute($url);
+	$sth->fetchrow_array;
+}
+
+sub update_last($$$$)
+{
+	my ($self, $url, $validity, $last) = @_;
+	my $dbh = $self->{dbh};
+	my $sth = $dbh->prepare_cached(<<'');
+INSERT OR REPLACE INTO imap_last (url, uid_validity, uid)
+VALUES (?, ?, ?)
+
+	$sth->execute($url, $validity, $last);
+}
+
+sub new {
+	my ($class) = @_;
+	my $dbname = PublicInbox::Config->config_dir() . "/imap.sqlite3";
+	my $dbh = dbh_new($dbname);
+	bless { dbname => $dbname, dbh => $dbh }, $class;
+}
+
+1;
-- 
2.23.0


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: Do I need multiple publicinbox.<name>.address values?
  2019-10-08  0:10 ` Eric Wong
  2019-10-08 12:18   ` Eric W. Biederman
@ 2019-10-09 11:59   ` Alyssa Ross
  2019-10-10 10:06     ` Eric Wong
  1 sibling, 1 reply; 23+ messages in thread
From: Alyssa Ross @ 2019-10-09 11:59 UTC (permalink / raw)
  To: Eric Wong; +Cc: meta

[-- Attachment #1: Type: text/plain, Size: 1088 bytes --]

> Sorry for that confusion.  Historically, I wrote public-inbox
> and -mda because I needed to migrate mailing lists with a
> several month window where the old list would be active,
> but I was preparing for the switch:
>
> So it started as:
>
>   posters -> to_be_shutdown_host -> MTA (postfix) -> public-inbox-mda
>
> And ends up being:
>
>   posters -> MTA (postfix) -> public-inbox-mda --[1]--> (mlmmj|mailman)
>
>
> It sounds like what you're doing is:
>
>   MTA -> (mlmmj|mailman) -> public-inbox-mda
>
> Which wasn't my original intended usecase for -mda, but is for -watch.

Interesting!  The reason I set it up this way is that mailman (at least
v3) doesn't archive raw incoming mail -- the default archiver,
Hyperkitty, pulls out the body and some headers it cares about, saves
them into a database, and discards the rest.  Which IMO is not great for
an archiver (although I do like Hyperkitty as a web interface for people
who haven't yet seen the light of mail).  So my plan is to use
public-inbox as my primary archiver, and just use Hyperkitty as a
"modern" web interface.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Do I need multiple publicinbox.<name>.address values?
  2019-10-09  7:58           ` Eric W. Biederman
  2019-10-09  8:15             ` [PATCH 0/4] Various bits to support import_imap_mailbox Eric W. Biederman
@ 2019-10-10  8:31             ` Eric Wong
  2019-10-10 10:56               ` Eric W. Biederman
  1 sibling, 1 reply; 23+ messages in thread
From: Eric Wong @ 2019-10-10  8:31 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: Alyssa Ross, meta

"Eric W. Biederman" <ebiederm@xmission.com> wrote:
> Eric Wong <e@80x24.org> writes:
> 
> > "Eric W. Biederman" <ebiederm@xmission.com> wrote:
> >> 	my $tracker = PublicInbox::IMAPTracker->new();
> >
> > Thanks.  What's PublicInbox::IMAPTracker?
> 
> Something that keeps the last fetched UID in an sqlite database.
> I will follow up with a patch for that as well.

Thanks!

> I haven't been brave enough to let this script delete any mail
> yet so I need to track what has been fetched.  Something that
> will be rolled back if the email message isn't commited into git.
> 
> I have a companion script that will delete mail.

Yup.  I also don't trust and don't want somebody trusting my
code when it comes to deleting stuff.

So I've also been considering a script which deletes mail from
my Maildirs/IMAP folders if a ContentId matches what's in a
known public-inbox, perhaps after a certain time...

> I mostly sent it so that there is some idea what I am working with.
> 
> In context of this discussion I don't remember how often I am looking at
> the "Delivered-To:" header.  That is almost gone but I know for a while
> I was using that as well.
> 
> >> for (;;imap_sleep()) {
> >> 	# Run fetch_mail in it's own separate process so
> >> 	# that if something goes wrong the process exits
> >> 	# and everything cleans up properly.
> >> 	#
> >> 	# Running fetch_mail in an eval block is not enough
> >> 	# to prevent leaks of locks and other resources.
> >
> > I was thinking along the same lines :)
> >
> > The rest looked good and it's definitely something I can work
> > with to get supportable in a way that reuses git's credential
> > system and configs.
> 
> I think fundamentally the script sucks because it is one email message
> at a time (not using IMAPs overlapping), the script is polling (NOT
> using IMAP idle), and I have magic to go from the server that I need
> to fetch to the actual domain.

Yeah, IMAP client libraries are pretty disappointing
considering what the protocol is capable of.

I just checked libcurl and I don't see that it supports
IMAP IDLE, either; and honestly I'm not sure how to add
it since curl is more intended for URLs.

> But in practice the script seems to be working reliably even in the
> event of an error right now.

Cool.  Perfect is the enemy of good, after all :>

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 2/4] PublicInbox::Config: Process mailboxes in sorted order
  2019-10-09  8:17               ` [PATCH 2/4] PublicInbox::Config: Process mailboxes in sorted order Eric W. Biederman
@ 2019-10-10  9:43                 ` Eric Wong
  2019-10-10 11:05                   ` Eric W. Biederman
  0 siblings, 1 reply; 23+ messages in thread
From: Eric Wong @ 2019-10-10  9:43 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: meta

"Eric W. Biederman" <ebiederm@xmission.com> wrote:
> 
> Date: Thu, 16 May 2019 19:26:47 -0500
> 
> To make the results reproducible and comprehensible when
> a large number of mail boxes are being processed process the
> mail boxes in sorted order, instead of in random hash order.
> 
> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
> ---
>  lib/PublicInbox/Config.pm | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/lib/PublicInbox/Config.pm b/lib/PublicInbox/Config.pm
> index 9f3f8df7eeaa..9ec00b0ddb6d 100644
> --- a/lib/PublicInbox/Config.pm
> +++ b/lib/PublicInbox/Config.pm
> @@ -128,7 +128,7 @@ sub each_inbox {
>  		}
>  	} else {
>  		my %seen;
> -		foreach my $k (keys %$self) {
> +		foreach my $k (sort keys %$self) {
>  			$k =~ m!\Apublicinbox\.([^/]+)\.mainrepo\z! or next;
>  			next if $seen{$1};
>  			$seen{$1} = 1;

I'm not sure if that code path happens outside of tests.
If it's parsing "git config -l" output, it'll use section_order
which preserves the ordering of the config file.

Will take a deeper look tomorrow (and possibly rewrite those
tests to parse a scalarref instead of just taking a hashref).

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Do I need multiple publicinbox.<name>.address values?
  2019-10-09 11:59   ` Alyssa Ross
@ 2019-10-10 10:06     ` Eric Wong
  0 siblings, 0 replies; 23+ messages in thread
From: Eric Wong @ 2019-10-10 10:06 UTC (permalink / raw)
  To: Alyssa Ross; +Cc: meta

Alyssa Ross <hi@alyssa.is> wrote:
> > Sorry for that confusion.  Historically, I wrote public-inbox
> > and -mda because I needed to migrate mailing lists with a
> > several month window where the old list would be active,
> > but I was preparing for the switch:
> >
> > So it started as:
> >
> >   posters -> to_be_shutdown_host -> MTA (postfix) -> public-inbox-mda
> >
> > And ends up being:
> >
> >   posters -> MTA (postfix) -> public-inbox-mda --[1]--> (mlmmj|mailman)
> >
> >
> > It sounds like what you're doing is:
> >
> >   MTA -> (mlmmj|mailman) -> public-inbox-mda
> >
> > Which wasn't my original intended usecase for -mda, but is for -watch.
> 
> Interesting!  The reason I set it up this way is that mailman (at least
> v3) doesn't archive raw incoming mail -- the default archiver,
> Hyperkitty, pulls out the body and some headers it cares about, saves
> them into a database, and discards the rest.  Which IMO is not great for
> an archiver (although I do like Hyperkitty as a web interface for people
> who haven't yet seen the light of mail).  So my plan is to use
> public-inbox as my primary archiver, and just use Hyperkitty as a
> "modern" web interface.

Yeah, right now you could have Mailman deliver to an address ->
Maildir which public-inbox-watch runs on.  It's a bit roundabout
and you'd need to age out old messages from the Maildir.

Or you could write an NNTP -> pipe-to-arbitrary command script
which can pipe to Mailman or mlmmj, keeping track of which
messages have already been sent (or wait for me to write it).

I've been meaning to replace scripts/ssoma-replay (for mlmmj)
with that script at some point.  I've never actually run
Mailman, but I assume it allows raw messages to be piped to it
like most mail tools.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Do I need multiple publicinbox.<name>.address values?
  2019-10-10  8:31             ` Do I need multiple publicinbox.<name>.address values? Eric Wong
@ 2019-10-10 10:56               ` Eric W. Biederman
  0 siblings, 0 replies; 23+ messages in thread
From: Eric W. Biederman @ 2019-10-10 10:56 UTC (permalink / raw)
  To: Eric Wong; +Cc: Alyssa Ross, meta

Eric Wong <e@80x24.org> writes:

> "Eric W. Biederman" <ebiederm@xmission.com> wrote:
>> Eric Wong <e@80x24.org> writes:
>> 
>> > "Eric W. Biederman" <ebiederm@xmission.com> wrote:
>> >> 	my $tracker = PublicInbox::IMAPTracker->new();
>> >
>> > Thanks.  What's PublicInbox::IMAPTracker?
>> 
>> Something that keeps the last fetched UID in an sqlite database.
>> I will follow up with a patch for that as well.
>
> Thanks!
>
>> I haven't been brave enough to let this script delete any mail
>> yet so I need to track what has been fetched.  Something that
>> will be rolled back if the email message isn't commited into git.
>> 
>> I have a companion script that will delete mail.
>
> Yup.  I also don't trust and don't want somebody trusting my
> code when it comes to deleting stuff.
>
> So I've also been considering a script which deletes mail from
> my Maildirs/IMAP folders if a ContentId matches what's in a
> known public-inbox, perhaps after a certain time...

Interesting.

Partly I have been leaving things up so I could restart the process
if the script fails.

I like the idea of a verifier/deleter.

I have a delete up to my imap tracker script.  A verifier will probably
share a lot of logic with the imap importer script, but since it only
has to confirm something is stable before deleting, it isn't quite the
same.

Time permitting I will play with that next.

Eric

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 2/4] PublicInbox::Config: Process mailboxes in sorted order
  2019-10-10  9:43                 ` Eric Wong
@ 2019-10-10 11:05                   ` Eric W. Biederman
  0 siblings, 0 replies; 23+ messages in thread
From: Eric W. Biederman @ 2019-10-10 11:05 UTC (permalink / raw)
  To: Eric Wong; +Cc: meta

Eric Wong <e@80x24.org> writes:

> "Eric W. Biederman" <ebiederm@xmission.com> wrote:
>> 
>> Date: Thu, 16 May 2019 19:26:47 -0500
>> 
>> To make the results reproducible and comprehensible when
>> a large number of mail boxes are being processed process the
>> mail boxes in sorted order, instead of in random hash order.
>> 
>> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
>> ---
>>  lib/PublicInbox/Config.pm | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>> 
>> diff --git a/lib/PublicInbox/Config.pm b/lib/PublicInbox/Config.pm
>> index 9f3f8df7eeaa..9ec00b0ddb6d 100644
>> --- a/lib/PublicInbox/Config.pm
>> +++ b/lib/PublicInbox/Config.pm
>> @@ -128,7 +128,7 @@ sub each_inbox {
>>  		}
>>  	} else {
>>  		my %seen;
>> -		foreach my $k (keys %$self) {
>> +		foreach my $k (sort keys %$self) {
>>  			$k =~ m!\Apublicinbox\.([^/]+)\.mainrepo\z! or next;
>>  			next if $seen{$1};
>>  			$seen{$1} = 1;
>
> I'm not sure if that code path happens outside of tests.
> If it's parsing "git config -l" output, it'll use section_order
> which preserves the ordering of the config file.
>
> Will take a deeper look tomorrow (and possibly rewrite those
> tests to parse a scalarref instead of just taking a hashref).

I know I have at least one use in my import_imap_inbox script.

Looking at the dates it looks like I added this in July 2018, and you
added section order in January 2019.  So it may be you have a better fix
in place already and I simply hadn't noticed.

Eric


^ permalink raw reply	[flat|nested] 23+ messages in thread

* ibx->{listid} autoviv fixup [was: [PATCH 0/4] Various bits to support import_imap_mailbox]
  2019-10-09  8:15             ` [PATCH 0/4] Various bits to support import_imap_mailbox Eric W. Biederman
                                 ` (3 preceding siblings ...)
  2019-10-09  8:25               ` [PATCH 4/4] IMAPTracker: Add a helper to track our place in reading imap mailboxes Eric W. Biederman
@ 2019-10-10 19:08               ` Eric Wong
  2019-10-10 21:23                 ` Eric W. Biederman
  4 siblings, 1 reply; 23+ messages in thread
From: Eric Wong @ 2019-10-10 19:08 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: meta

"Eric W. Biederman" <ebiederm@xmission.com> wrote:
> Eric Wong,
> 
> These should all of my generic patches to support my import_imap_mailbox
> script.  The really important patch that adds to the support for List-ID
> to public inbox configuration file I have already sent.
> 
> I haven't written tests and I get the following test failure when I run
> make test
> 
> > t/config.t ................. 1/? 
> > #   Failed test 'lookup matches expected output'
> > #   at t/config.t line 26.
> > #     Structures begin differing at:
> > #          $got->{listid} = ARRAY(0x55c1d4e3b6a8)
> > #     $expected->{listid} = Does not exist
> > 
> > #   Failed test 'lookup matches expected output for test'
> > #   at t/config.t line 42.
> > #     Structures begin differing at:
> > #          $got->{listid} = ARRAY(0x55c1d508d8d0)
> > #     $expected->{listid} = Does not exist
> > # Looks like you failed 2 tests of 69.
> > t/config.t ................. Dubious, test returned 2 (wstat 512, 0x200)
> 
> I haven't looked into what is happening there.

Hey Eric, I'll squash this in to fix the tests:

diff --git a/lib/PublicInbox/Config.pm b/lib/PublicInbox/Config.pm
index 9f3f8df7..c2fa40f9 100644
--- a/lib/PublicInbox/Config.pm
+++ b/lib/PublicInbox/Config.pm
@@ -449,8 +449,10 @@ sub _fill {
 		$self->{-by_addr}->{$lc_addr} = $ibx;
 		$self->{-no_obfuscate}->{$lc_addr} = 1;
 	}
-	foreach my $list_id (@{$ibx->{listid}}) {
-		$self->{-by_list_id}->{$list_id} = $ibx;
+	if (my $listids = $ibx->{listid}) {
+		foreach my $list_id (@$listids) {
+			$self->{-by_list_id}->{$list_id} = $ibx;
+		}
 	}
 	if (my $ng = $ibx->{newsgroup}) {
 		$self->{-by_newsgroup}->{$ng} = $ibx;

Perl has this weird feature where it autovivifies arrayrefs (and
hashrefs) on member access.

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: ibx->{listid} autoviv fixup [was: [PATCH 0/4] Various bits to support import_imap_mailbox]
  2019-10-10 19:08               ` ibx->{listid} autoviv fixup [was: [PATCH 0/4] Various bits to support import_imap_mailbox] Eric Wong
@ 2019-10-10 21:23                 ` Eric W. Biederman
  0 siblings, 0 replies; 23+ messages in thread
From: Eric W. Biederman @ 2019-10-10 21:23 UTC (permalink / raw)
  To: Eric Wong; +Cc: meta

Eric Wong <e@80x24.org> writes:

> "Eric W. Biederman" <ebiederm@xmission.com> wrote:
>> Eric Wong,
>> 
>> These should all of my generic patches to support my import_imap_mailbox
>> script.  The really important patch that adds to the support for List-ID
>> to public inbox configuration file I have already sent.
>> 
>> I haven't written tests and I get the following test failure when I run
>> make test
>> 
>> > t/config.t ................. 1/? 
>> > #   Failed test 'lookup matches expected output'
>> > #   at t/config.t line 26.
>> > #     Structures begin differing at:
>> > #          $got->{listid} = ARRAY(0x55c1d4e3b6a8)
>> > #     $expected->{listid} = Does not exist
>> > 
>> > #   Failed test 'lookup matches expected output for test'
>> > #   at t/config.t line 42.
>> > #     Structures begin differing at:
>> > #          $got->{listid} = ARRAY(0x55c1d508d8d0)
>> > #     $expected->{listid} = Does not exist
>> > # Looks like you failed 2 tests of 69.
>> > t/config.t ................. Dubious, test returned 2 (wstat 512, 0x200)
>> 
>> I haven't looked into what is happening there.
>
> Hey Eric, I'll squash this in to fix the tests:
>
> diff --git a/lib/PublicInbox/Config.pm b/lib/PublicInbox/Config.pm
> index 9f3f8df7..c2fa40f9 100644
> --- a/lib/PublicInbox/Config.pm
> +++ b/lib/PublicInbox/Config.pm
> @@ -449,8 +449,10 @@ sub _fill {
>  		$self->{-by_addr}->{$lc_addr} = $ibx;
>  		$self->{-no_obfuscate}->{$lc_addr} = 1;
>  	}
> -	foreach my $list_id (@{$ibx->{listid}}) {
> -		$self->{-by_list_id}->{$list_id} = $ibx;
> +	if (my $listids = $ibx->{listid}) {
> +		foreach my $list_id (@$listids) {
> +			$self->{-by_list_id}->{$list_id} = $ibx;
> +		}
>  	}
>  	if (my $ng = $ibx->{newsgroup}) {
>  		$self->{-by_newsgroup}->{$ng} = $ibx;
>
> Perl has this weird feature where it autovivifies arrayrefs (and
> hashrefs) on member access.

Thank you.

It makes complete sense from that perspective.

If I didn't already know things were rough around the
edges with these patches I would be embarrased.

Eric




^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 1/4] PublicInbox::Import Smuggle a raw message into add
  2019-10-09  8:16               ` [PATCH 1/4] PublicInbox::Import Smuggle a raw message into add Eric W. Biederman
@ 2019-10-15 20:26                 ` Eric Wong
  2019-10-15 23:05                   ` Eric W. Biederman
  0 siblings, 1 reply; 23+ messages in thread
From: Eric Wong @ 2019-10-15 20:26 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: meta

"Eric W. Biederman" <ebiederm@xmission.com> wrote:
> 
> Date: Tue, 15 Jan 2019 16:36:42 -0600
> 
> I don't trust the MIME type to not munge my email messages in horrible
> ways upon occasion.  Therefore  allow for passing in the raw message
> value instead of trusting the mime object to preserve it.
> 
> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
> ---
> 
> As we discussed last time I was working to merge my imap import script.

Thanks, pushed with the following interdiff:

  diff --git a/lib/PublicInbox/Import.pm b/lib/PublicInbox/Import.pm
  index b17c9d5c..e1f48771 100644
  --- a/lib/PublicInbox/Import.pm
  +++ b/lib/PublicInbox/Import.pm
  @@ -394,8 +394,7 @@ sub add {
   	}
   
   	my $blob = $self->{mark}++;
  -	my $raw_email = $mime->{-public_inbox_raw};
  -	$raw_email ||= $mime->as_string;
  +	my $raw_email = $mime->{-public_inbox_raw} // $mime->as_string;
   	my $n = length($raw_email);
   	$self->{bytes_added} += $n;
   	print $w "blob\nmark :$blob\ndata ", $n, "\n" or wfail;

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 1/4] PublicInbox::Import Smuggle a raw message into add
  2019-10-15 20:26                 ` Eric Wong
@ 2019-10-15 23:05                   ` Eric W. Biederman
  0 siblings, 0 replies; 23+ messages in thread
From: Eric W. Biederman @ 2019-10-15 23:05 UTC (permalink / raw)
  To: Eric Wong; +Cc: meta

Eric Wong <e@80x24.org> writes:

> "Eric W. Biederman" <ebiederm@xmission.com> wrote:
>> 
>> Date: Tue, 15 Jan 2019 16:36:42 -0600
>> 
>> I don't trust the MIME type to not munge my email messages in horrible
>> ways upon occasion.  Therefore  allow for passing in the raw message
>> value instead of trusting the mime object to preserve it.
>> 
>> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
>> ---
>> 
>> As we discussed last time I was working to merge my imap import script.
>
> Thanks, pushed with the following interdiff:
>
>   diff --git a/lib/PublicInbox/Import.pm b/lib/PublicInbox/Import.pm
>   index b17c9d5c..e1f48771 100644
>   --- a/lib/PublicInbox/Import.pm
>   +++ b/lib/PublicInbox/Import.pm
>   @@ -394,8 +394,7 @@ sub add {
>    	}
>    
>    	my $blob = $self->{mark}++;
>   -	my $raw_email = $mime->{-public_inbox_raw};
>   -	$raw_email ||= $mime->as_string;
>   +	my $raw_email = $mime->{-public_inbox_raw} // $mime->as_string;
>    	my $n = length($raw_email);
>    	$self->{bytes_added} += $n;
>    	print $w "blob\nmark :$blob\ndata ", $n, "\n" or wfail;

Looks good.  I hadn't realized that // was an operator in perl.
It is nice to see.

Eric

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2019-10-15 23:06 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-10-07 22:13 Do I need multiple publicinbox.<name>.address values? Alyssa Ross
2019-10-08  0:10 ` Eric Wong
2019-10-08 12:18   ` Eric W. Biederman
2019-10-08 12:23     ` [PATCH] Config.pm: Add support for mailing list information Eric W. Biederman
2019-10-08 22:11     ` Do I need multiple publicinbox.<name>.address values? Eric Wong
2019-10-08 22:24       ` Eric W. Biederman
2019-10-08 22:41         ` Eric Wong
2019-10-09  7:58           ` Eric W. Biederman
2019-10-09  8:15             ` [PATCH 0/4] Various bits to support import_imap_mailbox Eric W. Biederman
2019-10-09  8:16               ` [PATCH 1/4] PublicInbox::Import Smuggle a raw message into add Eric W. Biederman
2019-10-15 20:26                 ` Eric Wong
2019-10-15 23:05                   ` Eric W. Biederman
2019-10-09  8:17               ` [PATCH 2/4] PublicInbox::Config: Process mailboxes in sorted order Eric W. Biederman
2019-10-10  9:43                 ` Eric Wong
2019-10-10 11:05                   ` Eric W. Biederman
2019-10-09  8:23               ` [PATCH 3/4] Config.pm: Add support for looking up repos by their directories Eric W. Biederman
2019-10-09  8:25               ` [PATCH 4/4] IMAPTracker: Add a helper to track our place in reading imap mailboxes Eric W. Biederman
2019-10-10 19:08               ` ibx->{listid} autoviv fixup [was: [PATCH 0/4] Various bits to support import_imap_mailbox] Eric Wong
2019-10-10 21:23                 ` Eric W. Biederman
2019-10-10  8:31             ` Do I need multiple publicinbox.<name>.address values? Eric Wong
2019-10-10 10:56               ` Eric W. Biederman
2019-10-09 11:59   ` Alyssa Ross
2019-10-10 10:06     ` Eric Wong

Code repositories for project(s) associated with this public inbox

	https://80x24.org/public-inbox.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).