user/dev discussion of public-inbox itself
* Mirroring mailing lists directly
From: Mateusz Loskot @ 2019-02-23  0:35 UTC
  To: meta

Hi,

I'm trying to figure out how to "mirror existing mailing lists" [1]
in practice. I've got Debian 9 with all dependencies installed.
I also did [2]

git clone https://public-inbox.org/ public-inbox

Next, AFAIU, is to set up the inbox watcher [3].

However, I'm missing some details of the bigger picture:
I'm going to host a Docker container or VM somewhere, and I'd like to set it up
as a mirror of all new posts.
If I manage to get mbox archives, I may also try to import existing
archives, but that is for later.

How do I actually deliver mailing list posts to public-inbox-watch?
Could anyone mirroring a list out there share any details on setup of
the public-inbox mirror host?

[1] https://public-inbox.org/public-inbox-overview.html
[2] https://public-inbox.org/README.html
[3] https://public-inbox.org/public-inbox-watch.html
-- 
Mateusz Loskot, http://mateusz.loskot.net


* Re: Mirroring mailing lists directly
From: Eric Wong @ 2019-02-23  3:10 UTC
  To: Mateusz Loskot; +Cc: meta

Mateusz Loskot <mateusz@loskot.net> wrote:
> Hi,
> 
> I'm trying to figure out how to "mirror existing mailing lists" [1]
> in practice. I've got Debian 9 with all dependencies installed.
> I also did [2]
> 
> git clone https://public-inbox.org/ public-inbox
> 
> Next, AFAIU, is to set up the inbox watcher [3].
> 
> However, I'm missing some details of the bigger picture:
> I'm going to host a Docker container or VM somewhere, and I'd like to set it up
> as a mirror of all new posts.

Fwiw, a chroot also works fine and requires fewer tools to download
(but offers less isolation than containers or VMs).

> If I manage to get mbox archives, I may also try to import existing
> archives, but that is for later.

In that case, you might want to try the newish --skip feature
which leaves epoch space when running public-inbox-init:

	https://public-inbox.org/meta/20181228101611.16702-1-e@80x24.org/

(no padding for old NNTP article numbers, yet :x)

> How do I actually deliver mailing list posts to public-inbox-watch?

-watch currently requires mail to be delivered to a Maildir.
I use offlineimap for that; but mbsync (isync) or other
similar tools should work, too.
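A minimal sketch of the Maildir side, assuming offlineimap is the sync tool: the shell function below writes a one-account offlineimap config that syncs an IMAP mailbox into a local Maildir, which public-inbox-watch can then read. The account and repository names, IMAP host, user, and password file are hypothetical placeholders; check the offlineimap documentation for your installed version before relying on any option.

```shell
#!/bin/sh
# Sketch: write a minimal offlineimap config that mirrors one IMAP
# account into a local Maildir for public-inbox-watch to read.
# ASSUMPTIONS: account/repository names, imap.example.org,
# mirror@user.org and the password file path are hypothetical.
write_offlineimaprc() {
	out=$1      # config file to write, e.g. "$HOME/.offlineimaprc"
	maildir=$2  # local Maildir root that offlineimap fills in
	cat >"$out" <<EOF
[general]
accounts = mirror

[Account mirror]
localrepository = mirror-local
remoterepository = mirror-remote
autorefresh = 5

[Repository mirror-local]
type = Maildir
localfolders = $maildir

[Repository mirror-remote]
type = IMAP
remotehost = imap.example.org
remoteuser = mirror@user.org
remotepassfile = $HOME/.mirror-imap-pass
ssl = yes
sslcacertfile = /etc/ssl/certs/ca-certificates.crt
EOF
}
```

For example, `write_offlineimaprc "$HOME/.offlineimaprc" "$HOME/Maildir"` followed by `offlineimap -o` performs a single sync; `autorefresh` (in minutes) only matters when offlineimap is left running. The folder that appears under the Maildir root is what a `watch = maildir:...` line would point at.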

If you run your own MTA, using public-inbox-mda is a
possibility, too; but I figure more people have IMAP or Maildir
access than run their own MTAs.

> Could anyone mirroring a list out there share any details on setup of
> the public-inbox mirror host?

Is the example at the top of

	https://public-inbox.org/public-inbox-watch.html

not enough?

For the git mailing list, I also have a "filter" attribute
to kill signatures in old mails:

[publicinbox "git"]
	address = git@vger.kernel.org
	watch = maildir:/home/ew/.maildir/.INBOX.git
	watchheader = X-Mailing-List:git@vger.kernel.org
	filter = PublicInbox::Filter::Vger
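The same kind of section can also be written with git-config(1), because public-inbox reads a git-config format file (by default ~/.public-inbox/config, or the file named by the PI_CONFIG environment variable). The helper below is a hypothetical convenience wrapper, not part of public-inbox; it sets only the address, watch, and watchheader keys shown above, on a file that does not yet define them.

```shell
#!/bin/sh
# Sketch: create a [publicinbox "NAME"] watch section with git-config(1)
# instead of editing the file by hand.
# ASSUMPTION: add_watch_config is a hypothetical helper name; the
# Maildir path is a placeholder for your own.
add_watch_config() {
	cfg=$1      # public-inbox config file, e.g. ~/.public-inbox/config
	name=$2     # inbox name, e.g. "git"
	addr=$3     # list address, e.g. git@vger.kernel.org
	maildir=$4  # Maildir that offlineimap/mbsync delivers into
	git config --file "$cfg" "publicinbox.$name.address" "$addr" &&
	git config --file "$cfg" "publicinbox.$name.watch" "maildir:$maildir" &&
	git config --file "$cfg" "publicinbox.$name.watchheader" \
		"X-Mailing-List:$addr"
}
```

For a vger list, the filter key can be added the same way, e.g. `git config --file ~/.public-inbox/config publicinbox.git.filter PublicInbox::Filter::Vger`. If public-inbox-init has already registered the inbox, only the watch and watchheader keys may be needed.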


I also use the ListMirror SpamAssassin plugin because I'm
paranoid about mail only hitting the archives, but not going
through vger, first:

  https://public-inbox.org/meta/20160624204718.27540-1-e@80x24.org/

One (of many) goals I have for the web interface is to expose
part of the config so it's easier to set up mirrors of existing
lists.

But I also don't want to expose local pathnames or resource
limiter details (Qspawn stuff), since those could be used to aid
attackers.

Please let us know if there's specific stuff to clarify in docs
or if getting mail into a Maildir was the primary thing.  I have
a lot on my plate :x

> [1] https://public-inbox.org/public-inbox-overview.html
> [2] https://public-inbox.org/README.html
> [3] https://public-inbox.org/public-inbox-watch.html


* Re: Mirroring mailing lists directly
From: Mateusz Loskot @ 2019-02-23 21:19 UTC
  To: Eric Wong; +Cc: meta

On Sat, 23 Feb 2019 at 04:10, Eric Wong <e@80x24.org> wrote:
> Mateusz Loskot <mateusz@loskot.net> wrote:
> >
> > I'm trying to figure out how to "mirror existing mailing lists" [1]
> > in practice. I've got Debian 9 with all dependencies installed.
> > [...]
> > How do I actually deliver mailing list posts to public-inbox-watch?
>
> -watch currently requires mail to be delivered to a Maildir.
> I use offlineimap for that; but mbsync (isync) or other
> similar tools should work, too.

This is helpful. I will try offlineimap.

> If you run your own MTA

No, I'd prefer to avoid own MTA.

> > Could anyone mirroring a list out there share any details on setup of
> > the public-inbox mirror host?
>
> Is the example at the top of
>
>         https://public-inbox.org/public-inbox-watch.html
>
> not enough?

The example is clear and I think it should be enough for me.

All the details you gave about configs will be useful, I'm sure.
It's just that I'm not there yet. I'm still missing some
aspects of the bigger picture of mirroring a mailing list.

I've realised I'm missing an outline of the overall procedure:

0. Install public-inbox and its dependencies
1. Find a mailing list to mirror, e.g. list@host.org
2. Get a new e-mail address, e.g. mirror@user.org
3. Subscribe to list@host.org with mirror@user.org
4. Set up (to run manually or daemon) offlineimap to
   sync from mirror@user.org to local Maildir
5. Set up public-inbox-watch
6. Set up public-inbox-httpd to publish via HTTP
7. Set up public-inbox-index to enable search
8. Set up git daemon to allow `git clone` access to mirrored archives.

Is this plan correct, complete or am I missing anything?

Do I need to bother with public-inbox-watch's bidirectional sync?

> Please let us know if there's specific stuff to clarify in docs
> or if getting mail into a Maildir was the primary thing.  I have
> a lot on my plate :x

The docs of the public-inbox toolset are clear and they seem complete.
As explained above, I am missing a basic "where do I start" guide to
create a mirror, especially for someone who is not a sysadmin,
like myself :)

Once I get to step 4 and the later steps of the plan above,
I will get back to your other suggestions on the implementation
details.

Best regards,
-- 
Mateusz Loskot, http://mateusz.loskot.net


* Re: Mirroring mailing lists directly
From: Eric Wong @ 2019-02-23 22:07 UTC
  To: Mateusz Loskot; +Cc: meta

Mateusz Loskot <mateusz@loskot.net> wrote:
> All the details you gave about configs will be useful, I'm sure.
> It's just that I'm not there yet. I'm still missing some
> aspects of the bigger picture of mirroring a mailing list.
> 
> I've realised I'm missing an outline of the overall procedure:
> 
> 0. Install public-inbox and its dependencies
> 1. Find a mailing list to mirror, e.g. list@host.org
> 2. Get a new e-mail address, e.g. mirror@user.org
> 3. Subscribe to list@host.org with mirror@user.org

I typically use my normal address and not a list-specific one.

> 4. Set up (to run manually or daemon) offlineimap to
>    sync from mirror@user.org to local Maildir

You need to initialize the inbox repo:

  public-inbox-init -V2 NAME /path/to/inbox HTTP_URL list@host.org

(I still need to write the manpage for -init :x)

> 5. Set up public-inbox-watch
> 6. Set up public-inbox-httpd to publish via HTTP

Correct.

> 7. Set up public-inbox-index to enable search

At the moment, running -index separately is not necessary if you
used "-V2" with public-inbox-init.  V2 repos are significantly more
scalable when you have hundreds of thousands of messages, but
require SQLite and Xapian (*).

(*) The Xapian dependency can be removed, though.

For future updates to public-inbox code itself, you may need to
run public-inbox-index if the Xapian schema changes
incompatibly.

> 8. Set up git daemon to allow `git clone` access to mirrored archives.

public-inbox-httpd already supports smart HTTP clone.
git-daemon is only necessary for git://, which seems to be
falling out of favor given the popularity of HTTP/HTTPS.

> Is this plan correct, complete or am I missing anything?

Looks close to me.

I also suggest running public-inbox-nntpd in addition to -httpd
for NNTP users.  The two share a common core, and I have
plans for a combined server to minimize memory use.

> Do I need to bother with public-inbox-watch's bidirectional sync?

There is no bidirectional sync in -watch.  Perhaps you mean
offlineimap?


You may also want to prevent your Maildir and IMAP folder from
growing too large.  You can set up a cron job to remove old mails
from the Maildir; offlineimap's bidirectional sync will then
remove the old messages from IMAP, too.

The following example removes mails older than 7 days:

   cd /path/to/Maildir &&
   find new cur -ctime +7 -type f -print0 | xargs -0 rm -f
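A sketch of the same cleanup wrapped as a function, so a cron job can pass the Maildir and the age in days. It keeps the `-ctime` test from the command above but uses find's POSIX `-exec rm -f {} +` form instead of the `-print0 | xargs -0` pipeline; both delete the same files. prune_maildir is a hypothetical name, and only new/ and cur/ are scanned, leaving tmp/ (where deliveries in progress live) alone.

```shell
#!/bin/sh
# Sketch: delete Maildir messages whose ctime is older than N days.
# ASSUMPTION: prune_maildir is a hypothetical helper, not a
# public-inbox or offlineimap command.
prune_maildir() {
	dir=$1        # Maildir root, e.g. /path/to/Maildir
	days=${2:-7}  # age threshold in days (default 7, as above)
	(
		cd "$dir" || exit 1
		# only delivered mail lives in new/ and cur/; skip tmp/
		find new cur -ctime +"$days" -type f -exec rm -f {} +
	)
}
```

A crontab line such as `30 4 * * * . /usr/local/lib/prune-maildir.sh && prune_maildir /path/to/Maildir 7` (script path hypothetical) would run it nightly; offlineimap then propagates the deletions to IMAP on its next sync.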

> The docs of the public-inbox toolset are clear and they seem complete.
> As explained above, I am missing a basic "where do I start" to
> create mirror, especially for someone who is not a sysadmin,
> like myself :)

Yes, public-inbox-overview.pod probably needs to be updated.

> Once I get to step 4 and the later steps of the plan above,
> I will get back to your other suggestions on the implementation
> details.

Alright, please do :)


* Re: Mirroring mailing lists directly
From: Mateusz Łoskot @ 2019-02-26 17:54 UTC
  To: Eric Wong; +Cc: meta

On Sat, Feb 23, 2019 at 10:07:38PM +0000, Eric Wong wrote:
>Mateusz Loskot <mateusz@loskot.net> wrote:
>> [...]
>> Is this plan correct, complete or am I missing anything?
>
>Looks close to me.

Thank you for the confirmation and all the very useful tips.

>I also suggest running public-inbox-nntpd in addition to -httpd
>for NNTP users.  The two share a common core, and I have
>plans for a combined server to minimize memory use.

Yes, I will consider. Good idea.

>> Do I need to bother with public-inbox-watch's bidirectional sync?
>
>There is no bidirectional sync in -watch.  Perhaps you mean
>offlineimap?

Of course, it's offlineimap.


>You may also want to prevent your Maildir and IMAP folder from
>growing too large.  You can set up a cron job to remove old mails
>from the Maildir; offlineimap's bidirectional sync will then
>remove the old messages from IMAP, too.
>
>The following example removes mails older than 7 days:
>
>   cd /path/to/Maildir &&
>   find new cur -ctime +7 -type f -print0 | xargs -0 rm -f

Good idea. A few days ago I successfully paired offlineimap with
mutt for Gmail, so I now have a pretty good understanding of this
workflow.

>> Once I get to step 4 and the later steps of the plan above,
>> I will get back to your other suggestions on the implementation
>> details.
>
>Alright, please do :)

I haven't done it yet.
Meanwhile, if it's not a secret, could you tell me a bit about
how public-inbox.org is hosted?

Is it a VPS, a VM, or a Docker container?
If a container, could you tell me where you host it?
I'm still looking for options.

Best regards,
-- 
Mateusz Loskot, http://mateusz.loskot.net


* Re: Mirroring mailing lists directly
From: Eric Wong @ 2019-02-26 23:19 UTC
  To: Mateusz Łoskot; +Cc: meta

Mateusz Łoskot <mateusz@loskot.net> wrote:
> Meanwhile, if it's not a secret, could you tell me a bit about
> how public-inbox.org is hosted?
> 
> Is it a VPS, a VM, or a Docker container?
> If a container, could you tell me where you host it?
> I'm still looking for options.

I've been using a $20/month (USD) VPS since 2008.  Nowadays it's
up to 2 cores and 4GB RAM, which is more than I need, but I keep
getting upgrades without paying more.  Multiple cores definitely
help, though, especially with SpamAssassin and incoming mail.

I don't give commercial endorsements, but dig/traceroute should
give you info about where it's hosted.  If you're mirroring
lists I host, I prefer you host it elsewhere to avoid putting
too much dependency on a single provider or datacenter.

Process setup is:

public-inbox-httpd -> varnish -> yet-another-horribly-named-server (HTTP/HTTPS)
                             \
                              >- tor (.onion)
public-inbox-nntpd ----------/
                   \
                    `--------(NNTP port 119)

yet-another-horribly-named-server is an experimental GPL-3 Ruby
server which does HTTPS termination and hosts some other Ruby
stuff on the same IP (different vhost).  nginx is a more common
replacement and is recommended for most sites :)

public-inbox-nntpd should learn TLS anyway, so doing HTTPS from
public-inbox-httpd won't be far off, either.  Making varnish
unnecessary is another goal, but could be tougher...


* Re: Mirroring mailing lists directly
From: Mateusz Łoskot @ 2019-02-27  0:28 UTC
  To: Eric Wong; +Cc: meta

On 19-02-26, Eric Wong wrote:
>Mateusz Łoskot <mateusz@loskot.net> wrote:
>> Meanwhile, if it's not a secret, could you tell me a bit about
>> how public-inbox.org is hosted?
>>
>> Is it a VPS, a VM, or a Docker container?
>> If a container, could you tell me where you host it?
>> I'm still looking for options.
>
>I've been using a $20/month (USD) VPS since 2008.  Nowadays it's
>up to 2 cores and 4GB RAM, which is more than I need, but I keep
>getting upgrades without paying more.  Multiple cores definitely
>help, though, especially with SpamAssassin and incoming mail.

For a mailing list mirror, is SpamAssassin necessary?

My understanding is that a mirror just pulls posts that have already
reached the mailing list.

>I don't give commercial endorsements (...)

I perfectly understand that.

>If you're mirroring lists I host, I prefer you host it elsewhere
>to avoid putting too much dependency on a single provider or datacenter.

I will keep that in mind. First, I'm going to mirror a different list.
I will let you know when I succeed :)

>Process setup is:
>
>public-inbox-httpd -> varnish -> yet-another-horribly-named-server (HTTP/HTTPS)
>                             \
>                              >- tor (.onion)
>public-inbox-nntpd ----------/
>                   \
>                    `--------(NNTP port 119)
>
>yet-another-horribly-named-server is an experimental GPL-3 Ruby
>server which does HTTPS termination and hosts some other Ruby
>stuff on the same IP (different vhost).  nginx is a more common
>replacement and is recommended for most sites :)
>
>public-inbox-nntpd should learn TLS anyway, so doing HTTPS from
>public-inbox-httpd won't be far off, either.  Making varnish
>unnecessary is another goal, but could be tougher...

I will try the simplest setup possible.

Once again, thank you!

Best regards,
-- 
Mateusz Loskot, http://mateusz.loskot.net


* Re: Mirroring mailing lists directly
From: Eric Wong @ 2019-02-27  0:41 UTC
  To: Mateusz Łoskot; +Cc: meta

Mateusz Łoskot <mateusz@loskot.net> wrote:
> For a mailing list mirror, is SpamAssassin necessary?
> 
> My understanding is that a mirror just pulls posts that have already
> reached the mailing list.

It depends on how good the mailing list is at filtering spam.
vger misses some spam which my local SA instance catches.
It doesn't seem necessary with GNU lists.


Code repositories for project(s) associated with this public inbox

	https://80x24.org/public-inbox.git
