user/dev discussion of public-inbox itself
* [RFC] ssoma-mda: Use the email subject as the commit message
@ 2014-10-18 20:19 W. Trevor King
  2014-10-18 21:04 ` Eric Wong
  0 siblings, 1 reply; 13+ messages in thread
From: W. Trevor King @ 2014-10-18 20:19 UTC
  To: meta; +Cc: W. Trevor King

This is more interesting than just using 'mda' all the time, but it's
harder to set up proper quoting around the message without using
third-party Perl modules (e.g. IPC::Run or String::ShellQuote).  This
proof-of-concept patch just assumes the subject doesn't contain
single-quotes (').  This patch also doesn't handle the empty/missing
subject case, which should probably fall back to '<no subject>' or
some such.

I'm fine dropping support for older Gits here, and just using the -m
option to commit-tree.  That landed with 96b8d93a (commit-tree: teach
-m/-F options to read logs from elsewhere, 2011-11-09) in Git v1.7.9,
which was released over 2.5 years ago on 2012-01-27.

It would also be useful (I think) to set the GIT_AUTHOR_NAME,
GIT_AUTHOR_EMAIL, and GIT_AUTHOR_DATE environment variables from the
message header before committing.  I know how to do that using
Python's subprocess module, but I don't know the Perl incantation.
---
Is there any interest in a Python port of ssoma?  The subprocess
handling in Perl's standard libraries is not my favorite ;).  I expect
we could handle all of ssoma without leaving Python's standard
libraries.  For an example of a related Perl -> Python rewrite that I
just landed, see nmbug [1,2,3].

Cheers,
Trevor

[1]: http://notmuchmail.org/nmbug/
[2]: http://thread.gmane.org/gmane.mail.notmuch.general/19189
     id:cover.1412359989.git.wking@tremily.us
[3]: http://article.gmane.org/gmane.mail.notmuch.general/19272
     id:2a9f3e7423fe3ab95c2c6fbd6047aed935b6463b.1412703127.git.wking@tremily.us

 lib/Ssoma/Git.pm | 5 ++---
 lib/Ssoma/MDA.pm | 3 ++-
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/lib/Ssoma/Git.pm b/lib/Ssoma/Git.pm
index 2211893..e8d4cf6 100644
--- a/lib/Ssoma/Git.pm
+++ b/lib/Ssoma/Git.pm
@@ -266,10 +266,9 @@ sub commit_index {
 	my @cmd = qw/git commit-tree/;
 	push @cmd, $tree;
 	push @cmd, '-p', $parent if $parent;
+	push @cmd, '-m', "'$message'";
 
-	# git commit-tree -m didn't work in older git versions
-	$message =~ /\A\w+\z/ or die "message must be \\w+ only\n";
-	my $commit = $self->qx_sha1("echo $message |". join(' ', @cmd));
+	my $commit = $self->qx_sha1(join(' ', @cmd));
 
 	# update the ref
 	@cmd = (qw/git update-ref/, $ref, $commit);
diff --git a/lib/Ssoma/MDA.pm b/lib/Ssoma/MDA.pm
index 02816a5..6b58b43 100644
--- a/lib/Ssoma/MDA.pm
+++ b/lib/Ssoma/MDA.pm
@@ -102,7 +102,8 @@ sub append {
 		my $id = $git->simple_to_blob($simple);
 		$gii->update('100644', $id, $path);
 	}
-	$git->commit_index($gii, 0, $ref, "mda");
+	my $subject = $simple->header("Subject");
+	$git->commit_index($gii, 0, $ref, $subject);
 }
 
 # the main entry point takes an Email::Simple object
-- 
2.1.0.60.g85f0837


* Re: [RFC] ssoma-mda: Use the email subject as the commit message
  2014-10-18 20:19 [RFC] ssoma-mda: Use the email subject as the commit message W. Trevor King
@ 2014-10-18 21:04 ` Eric Wong
  2014-10-18 21:50   ` W. Trevor King
  0 siblings, 1 reply; 13+ messages in thread
From: Eric Wong @ 2014-10-18 21:04 UTC
  To: W. Trevor King; +Cc: meta

"W. Trevor King" <wking@tremily.us> wrote:
> This is more interesting than just using 'mda' all the time, but it's
> harder to set up proper quoting around the message without using
> third-party Perl modules (e.g. IPC::Run or String::ShellQuote).  This
> proof-of-concept patch just assumes the subject doesn't contain
> single-quotes (').  This patch also doesn't handle the empty/missing
> subject case, which should probably fall back to '<no subject>' or
> some such.

Right, carelessness here would open us up to command injection.  It
would also need to work with internationalized subjects.  I considered
it for public-inbox-mda; but decided it was not worth the trouble.

> It would also be useful (I think) to set the GIT_AUTHOR_NAME,
> GIT_AUTHOR_EMAIL, and GIT_AUTHOR_DATE environment variables from the
> message header before committing.  I know how to do that using
> Python's subprocess module, but I don't know the Perl incantation.

That's done in public-inbox-mda using

	local $ENV{...} = ...

And more Email::* modules to properly decode various email addresses and
internationalized names.  I wanted to keep ssoma as lean and dumb as
possible.

> Is there any interest in a Python port of ssoma?  The subprocess
> handling in Perl's standard libraries is not my favorite ;).  I expect
> we could handle all of ssoma without leaving Python's standard
> libraries.  For an example of a related Perl -> Python rewrite that I
> just landed, see nmbug [1,2,3].

I think you're the only one who's shown any interest in ssoma at all :)
I would love to have multiple implementations of ssoma and want a Ruby
one, too.  However I don't think using Python/Ruby would increase its
ease-of-installation or adoption much (and most of my software is Ruby).

Fwiw, the commit subject/message currently has no bearing on the way
ssoma or public-inbox handles the mail data.  So another implementation
is free to use more metadata in the commit message.

I've considered adding fuzzy generation counters to commit messages to
public-inbox to allow easier history traversals; but decided it's
probably better to do in any out-of-band, easily-regenerated store
using sqlite or similar (this may help with adding search support to
the web UI as well).

* Re: [RFC] ssoma-mda: Use the email subject as the commit message
  2014-10-18 21:04 ` Eric Wong
@ 2014-10-18 21:50   ` W. Trevor King
  2014-10-18 23:43     ` Eric Wong
  0 siblings, 1 reply; 13+ messages in thread
From: W. Trevor King @ 2014-10-18 21:50 UTC
  To: Eric Wong; +Cc: meta

On Sat, Oct 18, 2014 at 09:04:00PM +0000, Eric Wong wrote:
> W. Trevor King wrote:
> > This is more interesting than just using 'mda' all the time, but
> > it's harder to set up proper quoting around the message without
> > using third-party Perl modules (e.g. IPC::Run or
> > String::ShellQuote).  This proof-of-concept patch just assumes the
> > subject doesn't contain single-quotes (').  This patch also
> > doesn't handle the empty/missing subject case, which should
> > probably fall back to '<no subject>' or some such.
> 
> Right, carelessness here would open us up to command injection.

There's no chance of carelessness if you're using a subprocess
launcher that's based on execve (see exec(3)) instead of using a
shell.
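
A minimal Python 3 sketch of that execve-style approach, using only the
standard subprocess module.  The helper names (commit_tree_argv,
run_argv) are illustrative and not part of ssoma, and a child Python
process stands in for git so the example runs without a repository:

```python
import subprocess
import sys


def commit_tree_argv(tree, message, parent=None):
    """Build a 'git commit-tree' argument list; nothing is shell-quoted."""
    argv = ['git', 'commit-tree', tree]
    if parent:
        argv += ['-p', parent]
    argv += ['-m', message]
    return argv


def run_argv(argv):
    """Run argv directly (no shell involved) and return stdout as text."""
    return subprocess.check_output(argv).decode('utf-8')


# A subject that would break naive single-quote wrapping in a shell.
subject = "it's; echo pwned"

# Stand-in for git: the child just writes back its first argument.
echoed = run_argv(
    [sys.executable, '-c', 'import sys; sys.stdout.write(sys.argv[1])',
     subject])
```

Because the argument list is handed to the child as-is, the subject
arrives as one argument no matter which quotes or semicolons it
contains.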

> It would also need to work with internationalized subjects.  I
> considered it for public-inbox-mda; but decided it was not worth the
> trouble.

Python handles that out of the box without difficulty [1].  In Python
3:

  >>> import email.header  
  >>> h = email.header.Header('p\xf6stal', 'iso-8859-1')
  >>> str(h)
  'pöstal'

In Python 2, you just need to import the unicode_literals future [2]
and use unicode() instead of str().  It's easy to bind the appropriate
to-Unicode function to a unicode_str helper depending on the Python
version if you want a code-base compatible with both.
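
For a subject that arrives RFC 2047-encoded in a real message, a
Python 3 sketch along the following lines should work.  The function
name subject_for_commit is made up for illustration, and the
'<no subject>' fallback is just the placeholder suggested in the
original patch description:

```python
import email
import email.header


def subject_for_commit(raw_message, fallback='<no subject>'):
    """Return the decoded Subject of a raw message, or the fallback."""
    msg = email.message_from_string(raw_message)
    raw = msg.get('Subject')
    if raw is None or not str(raw).strip():
        # Missing or empty Subject header.
        return fallback
    # decode_header splits the value into (bytes, charset) chunks;
    # make_header reassembles them into a Header that str() can render.
    decoded = email.header.make_header(email.header.decode_header(raw))
    return str(decoded)
```

The result is a plain Unicode string, so it can be passed as a single
argument to whatever launches git.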

> > It would also be useful (I think) to set the GIT_AUTHOR_NAME,
> > GIT_AUTHOR_EMAIL, and GIT_AUTHOR_DATE environment variables from
> > the message header before committing.  I know how to do that using
> > Python's subprocess module, but I don't know the Perl incantation.
> 
> That's done in public-inbox-mda using
> 
> 	local $ENV{...} = ...
> 
> And more Email::* modules to properly decode various email addresses
> and internationalized names.  I wanted to keep ssoma as lean and
> dumb as possible.

It doesn't seem like *that* much more complication ;).  Can we make it
optional, and error out if it's enabled and the appropriate decoding
modules aren't present?  It seems like you'd want to handle input and
local browsing with ssoma and then point public-inbox at the resulting
Git archive.  Collecting the archive should be independent of serving
it over HTTP.
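
As a rough Python 3 illustration of the author-metadata idea (not what
public-inbox-mda does), the environment for the git child process could
be built like this.  The function name author_env is hypothetical, the
Date header is passed through unchanged on the assumption that git
accepts its RFC 2822 form, and RFC 2047-encoded display names are not
decoded here:

```python
import email
import email.utils
import os


def author_env(raw_message, base_env=None):
    """Return an environment dict with GIT_AUTHOR_* set from headers."""
    msg = email.message_from_string(raw_message)
    env = dict(os.environ if base_env is None else base_env)
    name, addr = email.utils.parseaddr(msg.get('From', ''))
    if addr:
        env['GIT_AUTHOR_EMAIL'] = addr
        # Fall back to the address when there is no display name.
        env['GIT_AUTHOR_NAME'] = name or addr
    date = msg.get('Date')
    if date:
        env['GIT_AUTHOR_DATE'] = date
    return env
```

The returned dict can be handed to subprocess via its env= argument, so
the parent process's own environment is left untouched.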

> > Is there any interest in a Python port of ssoma?  The subprocess
> > handling in Perl's standard libraries is not my favorite ;).  I
> > expect we could handle all of ssoma without leaving Python's
> > standard libraries.  For an example of a related Perl -> Python
> > rewrite that I just landed, see nmbug [1,2,3].
> 
> I think you're the only one who's shown any interest in ssoma at all
> :)
>
> I would love to have multiple implementations of ssoma and want a Ruby
> one, too.  However I don't think using Python/Ruby would increase its
> ease-of-installation or adoption much (and most of my software is Ruby).

I would have tried it sooner if it had been written in a language I
liked ;).  I'm not familiar with Ruby's email-parsing modules, but I
am familiar with Python's.

What do you see as the ease-of-installation and adoption barriers?
I'd guess they're just “Gmane works pretty well”.

> Fwiw, the commit subject/message currently has no bearing on the way
> ssoma or public-inbox handles the mail data.  So another
> implementation is free to use more metadata in the commit message.

Right, but if you're going to put something into Git, you might as
well make the history pleasant to browse ;).

> I've considered adding fuzzy generation counters to commit messages to
> public-inbox to allow easier history traversals; but decided it's
> probably better to do in any out-of-band, easily-regenerated store
> using sqlite or similar (this may help with adding search support to
> the web UI as well).

Fuzzy generation counters?  For search, I'd just run a local notmuch
index [3].  It already has Python, Ruby, and Go bindings [4], although
I'm not sure how mature the non-Python ones are (219 commits touch
bindings/python, but only 38 and 20 touch bindings/ruby and
bindings/go).  Of course, you can always call the notmuch command-line
client as a subprocess if the bindings don't work for you. Personally,
I'd rather use ssoma for aggregating and sharing the archive, and then
notmuch to handle threading and search, with a read-only web frontend
in front of notmuch, that just hit the ssoma archive for message
bodies (but served thread lists and such straight from notmuch,
hitting the Xapian database but not the ssoma archives).

I think ssoma + notmuch + nmbug is a good pairing for users too, since
you'll generally want the whole archive locally for that (although
with the in-flight ghost message series for notmuch [5,6], having the
whole archive locally will move from “you really want this” to “you
probably want this”).

Cheers,
Trevor

[1]: https://docs.python.org/3/library/email.header.html
[2]: https://docs.python.org/2/library/__future__.html
[3]: http://notmuchmail.org/
[4]: http://git.notmuchmail.org/git/notmuch/tree/HEAD:/bindings
[5]: http://notmuchmail.org/pipermail/notmuch/2014/019160.html
[6]: http://notmuchmail.org/pipermail/notmuch/2014/019235.html

-- 
This email may be signed or encrypted with GnuPG (http://www.gnupg.org).
For more information, see http://en.wikipedia.org/wiki/Pretty_Good_Privacy


* Re: [RFC] ssoma-mda: Use the email subject as the commit message
  2014-10-18 21:50   ` W. Trevor King
@ 2014-10-18 23:43     ` Eric Wong
  2014-10-19  3:48       ` W. Trevor King
  0 siblings, 1 reply; 13+ messages in thread
From: Eric Wong @ 2014-10-18 23:43 UTC
  To: W. Trevor King; +Cc: meta

"W. Trevor King" <wking@tremily.us> wrote:
> On Sat, Oct 18, 2014 at 09:04:00PM +0000, Eric Wong wrote:
> > W. Trevor King wrote:
> > > This is more interesting than just using 'mda' all the time, but
> > > it's harder to set up proper quoting around the message without
> > > using third-party Perl modules (e.g. IPC::Run or
> > > String::ShellQuote).  This proof-of-concept patch just assumes the
> > > subject doesn't contain single-quotes (').  This patch also
> > > doesn't handle the empty/missing subject case, which should
> > > probably fall back to '<no subject>' or some such.
> > 
> > Right, carelessness here would open us up to command injection.
> 
> There's no chance of carelessness if you're using a subprocess
> launcher that's based on execve (see exec(3)) instead of using a
> shell.

Right.  I'd probably use IPC::Run in Perl since public-inbox already
depends on it; but probably optionally (as mentioned below)

> > It would also need to work with internationalized subjects.  I
> > considered it for public-inbox-mda; but decided it was not worth the
> > trouble.
> 
> Python handles that out of the box without difficulty [1].  In Python

<snip> Good to know.

> In Python 2, you just need to import the unicode_literals future [2]
> and use unicode() instead of str().  It's easy to bind the appropriate
> to-Unicode function to a unicode_str helper depending on the Python
> version if you want a code-base compatible with both.

How's the Python 2->3 transition these days?
(sorry, not familiar with Python, my brain didn't "get it")

> > > It would also be useful (I think) to set the GIT_AUTHOR_NAME,
> > > GIT_AUTHOR_EMAIL, and GIT_AUTHOR_DATE environment variables from
> > > the message header before committing.  I know how to do that using
> > > Python's subprocess module, but I don't know the Perl incantation.
> > 
> > That's done in public-inbox-mda using
> > 
> > 	local $ENV{...} = ...
> > 
> > And more Email::* modules to properly decode various email addresses
> > and internationalized names.  I wanted to keep ssoma as lean and
> > dumb as possible.
> 
> It doesn't seem like *that* much more complication ;).  Can we make it
> optional, and error out if it's enabled and the appropriate decoding
> modules aren't present?

Sounds like a good idea to make it fall back if require fails.
Can you do it or would you like me to handle it in Perl?

> It seems like you'd want to handle input and
> local browsing with ssoma and then point public-inbox at the resulting
> Git archive.  Collecting the archive should be independent of serving
> it over HTTP.

public-inbox also wraps spam filtering/learning (SpamAssassin) +
sanitization, and that's arguably more important than the web UI.

> I would have tried it sooner if it had been written in a language I
> liked ;).  I'm not familiar with Ruby's email-parsing modules, but I
> am familiar with Python's.

Familiarity was why I chose Perl, too.  I've been using Email::* modules
forever in private projects.  For me, the Ruby 1.8 -> 1.9 transition was
a huge pain, and it seems the Python 2 -> 3 transition is just as bad
(from a 1000 foot view)

Perl 5 has been great with compatibility and I doubt there'll ever be
a need to transition to 6 :)

> What do you see as the ease-of-installation and adoption barriers?
> I'd guess they're just “Gmane works pretty well”.

My main concern is for distro users, not hard-core Python/Perl/Ruby
users.  I consider out-of-the-box support on stable (and even
older, long-term) GNU/Linux distros to be important to adoption.

Ruby has a lot of modules (gems) for mail but distro support often lags
behind.  And SpamAssassin is _the_ killer app for me as far as spam
filtering goes.

> > Fwiw, the commit subject/message currently has no bearing on the way
> > ssoma or public-inbox handles the mail data.  So another
> > implementation is free to use more metadata in the commit message.
> 
> Right, but if you're going to put something into Git, you might as
> well make the history pleasant to browse ;).

Fair enough.

> > I've considered adding fuzzy generation counters to commit messages to
> > public-inbox to allow easier history traversals; but decided it's
> > probably better to do in any out-of-band, easily-regenerated store
> > using sqlite or similar (this may help with adding search support to
> > the web UI as well).
> 
> Fuzzy generation counters?

Commit generation numbers (age relative to the root commit).
There were several discussions about it on the git ML around
summer 2011.
I imagine using git merge to split/combine mailing lists
(either project forking/merging back or dealing with migrating
 ML servers/hosts).
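
To make the term concrete, here is a small Python 3 sketch of how such
generation numbers could be computed from a commit-to-parents map.  It
assumes one common convention (a root commit has generation 1, every
other commit has one more than its highest parent); the function name
and the dict-based input are illustrative, not anything ssoma or
public-inbox implements:

```python
def generation_numbers(parents):
    """Map each commit to 1 + the largest generation among its parents.

    `parents` maps a commit id to a list of its parent ids; root
    commits map to an empty list.
    """
    generations = {}
    for start in parents:
        stack = [start]
        while stack:
            commit = stack[-1]
            if commit in generations:
                stack.pop()
                continue
            own_parents = parents.get(commit, ())
            pending = [p for p in own_parents if p not in generations]
            if pending:
                # Resolve the parents first, then revisit this commit.
                stack.extend(pending)
            else:
                generations[commit] = 1 + max(
                    (generations[p] for p in own_parents), default=0)
                stack.pop()
    return generations
```

A merge commit simply takes the larger of its parents' numbers plus
one, which is what would make the numbers usable across merged list
histories.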

> For search, I'd just run a local notmuch index [3].

<snip>

> Personally,
> I'd rather use ssoma for aggregating and sharing the archive, and then
> notmuch to handle threading and search, with a read-only web frontend
> in front of notmuch, that just hit the ssoma archive for message
> bodies (but served thread lists and such straight from notmuch,
> hitting the Xapian database but not the ssoma archives).

Am I correct notmuch only handles Maildir and MH currently?

I really want a mail search engine to index the git blobs directly
without the need to keep decompressed messages around.

I have much mail in gzipped mboxes (new mail in Maildirs); so I've been
sticking to mairix for my local search needs.  Just having the mail
archived in git+ssoma without mboxes is the goal one day...

> I think ssoma + notmuch + nmbug is a good pairing for users too, since
> you'll generally want the whole archive locally for that (although
> with the in-flight ghost message series for notmuch [5,6], having the
> whole archive locally will move from “you really want this” to “you
> probably want this”).

Cool.  I need to look at notmuch more.  I've considered the
Perl Xapian bindings for indexing the git repo directly, too.

* Re: [RFC] ssoma-mda: Use the email subject as the commit message
  2014-10-18 23:43     ` Eric Wong
@ 2014-10-19  3:48       ` W. Trevor King
  2014-10-19  5:30         ` Eric Wong
  2014-10-26 22:57         ` Eric Wong
  0 siblings, 2 replies; 13+ messages in thread
From: W. Trevor King @ 2014-10-19  3:48 UTC
  To: Eric Wong; +Cc: meta

On Sat, Oct 18, 2014 at 11:43:23PM +0000, Eric Wong wrote:
> W. Trevor King wrote:
> > On Sat, Oct 18, 2014 at 09:04:00PM +0000, Eric Wong wrote:
> > > W. Trevor King wrote:
> > > > This is more interesting than just using 'mda' all the time,
> > > > but it's harder to set up proper quoting around the message
> > > > without using third-party Perl modules (e.g. IPC::Run or
> > > > String::ShellQuote).  This proof-of-concept patch just assumes
> > > > the subject doesn't contain single-quotes (').  This patch
> > > > also doesn't handle the empty/missing subject case, which
> > > > should probably fall back to '<no subject>' or some such.
> > > 
> > > Right, carelessness here would open us up to command injection.
> > 
> > There's no chance of carelessness if you're using a subprocess
> > launcher that's based on execve (see exec(3)) instead of using a
> > shell.
> 
> Right.  I'd probably use IPC::Run in Perl since public-inbox already
> depends on it; but probably optionally (as mentioned below)

Works for me.

> > In Python 2, you just need to import the unicode_literals future
> > [2] and use unicode() instead of str().  It's easy to bind the
> > appropriate to-Unicode function to a unicode_str helper depending
> > on the Python version if you want a code-base compatible with
> > both.
> 
> How's the Python 2->3 transition these days?  (sorry, not familiar
> with Python, my brain didn't "get it")

In several of my projects, I don't bother with 2.x support, but in
this case it would be pretty easy to be compatible with both.

> > > > It would also be useful (I think) to set the GIT_AUTHOR_NAME,
> > > > GIT_AUTHOR_EMAIL, and GIT_AUTHOR_DATE environment variables
> > > > from the message header before committing.  I know how to do
> > > > that using Python's subprocess module, but I don't know the
> > > > Perl incantation.
> > > 
> > > That's done in public-inbox-mda using
> > > 
> > > 	local $ENV{...} = ...
> > > 
> > > And more Email::* modules to properly decode various email
> > > addresses and internationalized names.  I wanted to keep ssoma
> > > as lean and dumb as possible.
> > 
> > It doesn't seem like *that* much more complication ;).  Can we
> > make it optional, and error out if it's enabled and the
> > appropriate decoding modules aren't present?
> 
> Sounds like a good idea to make it fall back if require fails.  Can
> you do it or would you like me to handle it in Perl?

If you tell me the Perl idiom for that, I can write up a patch.  In
Python I usually do:

  try:
      import some.module as _some_module
  except ImportError as e:
      _some_module = None
      _some_module_import_error = e
  # …
  def foo():
      if _some_module is None:
          raise _some_module_import_error

> > It seems like you'd want to handle input and local browsing with
> > ssoma and then point public-inbox at the resulting Git archive.
> > Collecting the archive should be independent of serving it over
> > HTTP.
> 
> public-inbox also wraps spam filtering/learning (SpamAssassin) +
> sanitization, and that's arguably more important than the web UI.

Then I'd shift those hooks over to ssoma-mda.  Actually, I'd probably
leave it up to folks to hook those into their mail server / MDA before
messages get as far as ssoma-mda.  Spam filtering is a generic issue;
there's no need to build all the checks you'd want (also greylisting,
DKIM, SPF, …) into ssoma-mda itself.

> > I would have tried it sooner if it had been written in a language
> > I liked ;).  I'm not familiar with Ruby's email-parsing modules,
> > but I am familiar with Python's.
> 
> Familiarity was why I chose Perl, too.  I've been using Email::*
> modules forever in private projects.  For me, the Ruby 1.8 -> 1.9
> transition was a huge pain, and it seems the Python 2 -> 3
> transition is just as bad (from a 1000 foot view)

Python 2 → 3 wasn't bad if the original code understood the difference
between Unicode and byte streams and used Unicode for text.
Unfortunately, that was frequently not the case.  I imagine Ruby 1.8 →
1.9 had a lot of the same issues.

> Perl 5 has been great with compatibility and I doubt there'll ever
> be a need to transition to 6 :)

The bytes/Unicode distinction is partly a compatibility thing, but
mostly it's an internal bookkeeping thing.  If you consistently used
Unicode for text in Python 2, Python 3 was mostly “yay, now I don't
have to mangle my text into bytes before passing it to this external
library”.

> > > I've considered adding fuzzy generation counters to commit
> > > messages to public-inbox to allow easier history traversals; but
> > > decided it's probably better to do in any out-of-band,
> > > easily-regenerated store using sqlite or similar (this may help
> > > with adding search support to the web UI as well).
> > 
> > Fuzzy generation counters?
> 
> Commit generation numbers (age relative to the root commit).  There
> were several discussions about it on the git ML around summer 2011.
>
> I imagine using git merge to split/combine mailing lists (either
> project forking/merging back or dealing with migrating ML
> servers/hosts).

That sounds good to me, but I don't see the need to have generation
numbers to do that.  We just need to patch mlmmj to support:

  <LIST+get-MESSAGE-ID-HASH@DOMAIN>

instead of:

  <LIST+get-N@DOMAIN>

and let Git handle the rest.
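
A sketch of how such an address might be derived, in Python 3.  Both
the hash choice (a SHA-1 hexdigest of the Message-ID with its angle
brackets stripped) and the helper name get_address are assumptions for
illustration; mlmmj has no such feature today, which is why it would
need patching:

```python
import hashlib


def get_address(list_name, domain, message_id):
    """Build a LIST+get-HASH@DOMAIN address for one message.

    Assumes HASH is the SHA-1 hexdigest of the bare Message-ID.
    """
    bare = message_id.strip().strip('<>')
    digest = hashlib.sha1(bare.encode('utf-8')).hexdigest()
    return '{}+get-{}@{}'.format(list_name, digest, domain)
```

Unlike a sequential N, the hash depends only on the message itself, so
it stays valid when archives are merged or moved between hosts.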

> > For search, I'd just run a local notmuch index [3].
> 
> <snip>
> 
> > Personally, I'd rather use ssoma for aggregating and sharing the
> > archive, and then notmuch to handle threading and search, with a
> > read-only web frontend in front of notmuch, that just hit the
> > ssoma archive for message bodies (but served thread lists and such
> > straight from notmuch, hitting the Xapian database but not the
> > ssoma archives).
> 
> Am I correct notmuch only handles Maildir and MH currently?

‘notmuch new’ only traverses Maildir and MH, but ‘notmuch insert’
reads a message off stdin just like ssoma-mda.  ‘notmuch insert’ also
currently delivers the message to maildir (besides indexing it), but
it should be easy to patch things to optionally disable that delivery
(and only index the message).

> I really want a mail search engine to index the git blobs directly
> without the need to keep decompressed messages around.

No need for decompressed messages, but you'd have to iterate over your
Git repository and feed messages to ‘notmuch insert’ one at a time
when you started a fresh index.  After that, it should be easy to have
the mail server pass the message to both ssoma-mda and ‘notmuch
insert’.
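
The initial pass over the repository could look roughly like this in
Python 3.5+.  The function names are illustrative, the default ref is
an assumption about how the archive is laid out, only messages present
in the current tree are visited, and an unpatched ‘notmuch insert’
will still deliver a copy to the maildir as described above:

```python
import subprocess


def parse_ls_tree(output):
    """Yield (blob_id, path) pairs from 'git ls-tree -r' output."""
    for line in output.splitlines():
        if not line:
            continue
        # Each line is "<mode> <type> <object>\t<path>".
        meta, path = line.split('\t', 1)
        _mode, kind, blob_id = meta.split(' ')
        if kind == 'blob':
            yield blob_id, path


def reindex(git_dir, ref='refs/heads/master'):
    """Feed every message blob reachable from ref to 'notmuch insert'."""
    git = ['git', '--git-dir=' + git_dir]
    listing = subprocess.check_output(git + ['ls-tree', '-r', ref])
    for blob_id, _path in parse_ls_tree(listing.decode('utf-8')):
        raw = subprocess.check_output(git + ['cat-file', 'blob', blob_id])
        # notmuch insert reads a single message from stdin.
        subprocess.run(['notmuch', 'insert'], input=raw, check=True)
```

Messages are streamed from git one at a time, so no decompressed copy
of the whole archive has to exist on disk for the walk itself.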

> I have much mail in gzipped mboxes (new mail in Maildirs); so I've
> been sticking to mairix for my local search needs.  Just having the
> mail archived in git+ssoma without mboxes is the goal one day...

You use gzipped mboxes instead of Maildirs for everything just from a
disk space perspective?  Patching notmuch to read email directly from
Git shouldn't be too bad, since there aren't many views where you
actually need the full email (usually the stuff in the Xapian index is
sufficient).

Cheers,
Trevor

* Re: [RFC] ssoma-mda: Use the email subject as the commit message
  2014-10-19  3:48       ` W. Trevor King
@ 2014-10-19  5:30         ` Eric Wong
  2014-10-19 17:31           ` W. Trevor King
  2014-10-26 22:57         ` Eric Wong
  1 sibling, 1 reply; 13+ messages in thread
From: Eric Wong @ 2014-10-19  5:30 UTC
  To: W. Trevor King; +Cc: meta

"W. Trevor King" <wking@tremily.us> wrote:
> On Sat, Oct 18, 2014 at 11:43:23PM +0000, Eric Wong wrote:
> > Sounds like a good idea to make it fall back if require fails.  Can
> > you do it or would you like me to handle it in Perl?
> 
> If you tell me the Perl idiom for that, I can write up a patch.  In

	eval { require Foo; };
	my $have_foo = $@ ? 0 : 1;

That won't perform any imports, but I think most of those modules
do not require imports.

> > public-inbox also wraps spam filtering/learning (SpamAssassin) +
> > sanitization, and that's arguably more important than the web UI.
> 
> Then I'd shift those hooks over to ssoma-mda.  Actually, I'd probably
> leave it up to folks to hook those into their mail server / MDA before
> messages get as far as ssoma-mda.  Spam filtering is a generic issue;
> there's no need to build all the checks you'd want (also greylisting,
> DKIM, SPF, …) into ssoma-mda itself.

Uh, you just contradicted yourself :)
public-inbox is that mail server/MDA layer before ssoma for me.

> That sounds good to me, but I don't see the need to have generation
> numbers to do that.  We just need to patch mlmmj to support:

Heh, I haven't even thought of mlmmj integration much
(even though I use it myself for some lists).

Thanks for the heaps of info on notmuch.

> > I have much mail in gzipped mboxes (new mail in Maildirs); so I've
> > been sticking to mairix for my local search needs.  Just having the
> > mail archived in git+ssoma without mboxes is the goal one day...
> 
> You use gzipped mboxes instead of Maildirs for everything just from a
> disk space perspective?

No, I only use gzipped mboxes for old mail (several months).
One glaring weakness of mairix is the lack of incremental import
mode, so it does at least readdir() scans which get expensive
on maildirs.

I'll have to look into how notmuch can integrate with my setup.

What I like about mairix is it generates a new Maildir of results (which
I can nuke at any time) and doesn't care what MUA I use.  I'd like to
get notmuch to do that for my local mail.

> Patching notmuch to read email directly from
> Git shouldn't be too bad, since there aren't many views where you
> actually need the full email (usually the stuff in the Xapian index is
> sufficient).

Just having notmuch return a Message-ID should be sufficient to retrieve
the full email from ssoma, actually.

* Re: [RFC] ssoma-mda: Use the email subject as the commit message
  2014-10-19  5:30         ` Eric Wong
@ 2014-10-19 17:31           ` W. Trevor King
  2014-10-20  0:49             ` Eric Wong
  0 siblings, 1 reply; 13+ messages in thread
From: W. Trevor King @ 2014-10-19 17:31 UTC
  To: Eric Wong; +Cc: meta

On Sun, Oct 19, 2014 at 05:30:29AM +0000, Eric Wong wrote:
> W. Trevor King wrote:
> > On Sat, Oct 18, 2014 at 11:43:23PM +0000, Eric Wong wrote:
> > > Sounds like a good idea to make it fall back if require fails.  Can
> > > you do it or would you like me to handle it in Perl?
> > 
> > If you tell me the Perl idiom for that, I can write up a patch.  In
> 
> 	eval { require Foo; };
> 	my $have_foo = $@ ? 0 : 1;
> 
> That won't perform any imports, but I think most of those modules do
> not require imports.

And then if have_foo, we ‘use Foo’ to get the import?

> > > public-inbox also wraps spam filtering/learning (SpamAssassin) +
> > > sanitization, and that's arguably more important than the web
> > > UI.
> > 
> > Then I'd shift those hooks over to ssoma-mda.  Actually, I'd
> > probably leave it up to folks to hook those into their mail server
> > / MDA before messages get as far as ssoma-mda.  Spam filtering is
> > a generic issue; there's no need to build all the checks you'd
> > want (also greylisting, DKIM, SPF, …) into ssoma-mda itself.
> 
> Uh, you just contradicted yourself :) public-inbox is that mail
> server/MDA layer before ssoma for me.

Filtering spam is something that lots of folks want (folks who may not
be interested in Git archives), and it should happen at the MTA level
(e.g. [1]) or in a procmail chain (e.g. [2]).  The spamc hook
shouldn't be tied to any Git-archive stuff.  I'd shift the Git
metadata extraction to ssoma-mda, use the Postfix filtering to drop
spam for everyone, use procmail to deliver mail for everyone, and then
call ssoma-mda from ~meta/.procmailrc.

> What I like about mairix is it generates a new Maildir of results
> (which I can nuke at any time) and doesn't care what MUA I use.  I'd
> like to get notmuch to do that for my local mail.

Notmuch has helpers to do that sort of thing [3] and notmuchfs looks
pretty generic [4].  Personally, I use notmuch's Emacs client for
searching and browsing old mail, and then use Mutt independently for
checking my inbox and composing mail.

> > Patching notmuch to read email directly from Git shouldn't be too
> > bad, since there aren't many views where you actually need the
> > full email (usually the stuff in the Xapian index is sufficient).
> 
> Just having notmuch return a Message-ID should be sufficient to
> retrieve the full email from ssoma, actually.

Right, but ssoma isn't a MUA.  The notmuch command line and MUAs that
return individual messages (‘notmuch show’, the Emacs UI [5],
presumably notmuchfs, …) also need access to the raw messages (which
aren't stored in Xapian) for message-level views.

Cheers,
Trevor

[1]: http://www.postfix.org/FILTER_README.html
[2]: https://wiki.archlinux.org/index.php/Procmail#Spamassassin
[3]: http://notmuchmail.org/notmuch-mutt/
[4]: https://github.com/tsto/notmuchfs
[5]: http://notmuchmail.org/screenshots/Screenshot-notmuch-thread-zkj.png

* Re: [RFC] ssoma-mda: Use the email subject as the commit message
  2014-10-19 17:31           ` W. Trevor King
@ 2014-10-20  0:49             ` Eric Wong
  2014-10-20 15:36               ` W. Trevor King
  0 siblings, 1 reply; 13+ messages in thread
From: Eric Wong @ 2014-10-20  0:49 UTC
  To: W. Trevor King; +Cc: meta

"W. Trevor King" <wking@tremily.us> wrote:
> On Sun, Oct 19, 2014 at 05:30:29AM +0000, Eric Wong wrote:
> > W. Trevor King wrote:
> > > On Sat, Oct 18, 2014 at 11:43:23PM +0000, Eric Wong wrote:
> > > > Sounds like a good idea to make it fall back if require fails.  Can
> > > > you do it or would you like me to handle it in Perl?
> > > 
> > > If you tell me the Perl idiom for that, I can write up a patch.  In
> > 
> > 	eval { require Foo; };
> > 	my $have_foo = $@ ? 0 : 1;
> > 
> > That won't perform any imports, but I think most of those modules do
> > not require imports.
> 
> And then if have_foo, we ‘use Foo’ to get the import?

No need to import for the Email::* modules I don't think.  "use" is
evaluated at compile/parse time, so you can't use it lazily outside of a
string eval.  "require" is always lazy, I think.

I think you can just call "IPC::Run::run" directly (instead of just "run").
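
A minimal sketch of that idiom, for illustration only: probe for
IPC::Run lazily with "require", call it by its fully qualified name
when it is available, and otherwise fall back to the core IPC::Open2
module.  The helper name run_filter and the flag $have_ipc_run are
hypothetical and not part of ssoma.  The fallback writes all input
before reading any output, so it is only suitable for small inputs.

```perl
use strict;
use warnings;
use IPC::Open2 qw(open2);    # core module, used as the fallback

# Lazily probe for the optional module.  The block eval traps a failed
# "require"; ending the block with 1 avoids inspecting $@ afterwards.
my $have_ipc_run = eval { require IPC::Run; 1 } ? 1 : 0;

# Hypothetical helper: feed $input to the command in @$cmd on stdin and
# return its stdout.  Neither branch goes through a shell, so no
# quoting of the arguments is needed.
sub run_filter {
    my ($cmd, $input) = @_;
    my $out = '';
    if ($have_ipc_run) {
        # "require" imported nothing, so use the fully qualified name.
        IPC::Run::run($cmd, \$input, \$out)
            or die "@$cmd failed\n";
        return $out;
    }
    # Fallback: open2 hands back a read handle and a write handle.
    my $pid = open2(my $from_child, my $to_child, @$cmd);
    print {$to_child} $input;
    close $to_child;
    local $/;                # slurp mode for the read below
    $out = <$from_child>;
    close $from_child;
    waitpid($pid, 0);
    die "@$cmd failed: $?\n" if $?;
    return defined $out ? $out : '';
}
```

For example, run_filter(['cat'], "it's quoted\n") should hand the
string back unchanged in either branch, single quote included.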

> > > > public-inbox also wraps spam filtering/learning (SpamAssassin) +
> > > > sanitization, and that's arguably more important than the web
> > > > UI.
> > > 
> > > Then I'd shift those hooks over to ssoma-mda.  Actually, I'd
> > > probably leave it up to folks to hook those into their mail server
> > > / MDA before messages get as far as ssoma-mda.  Spam filtering is
> > > a generic issue; there's no need to build all the checks you'd
> > > want (also greylisting, DKIM, SPF, …) into ssoma-mda itself.
> > 
> > Uh, you just contradicted yourself :) public-inbox is that mail
> > server/MDA layer before ssoma for me.
> 
> Filtering spam is something that lots of folks want (folks who may not
> be interested in Git archives), and it should happen at the MTA level
> (e.g. [1]) or in a procmail chain (e.g. [2]).  The spamc hook
> shouldn't be tied to any Git-archive stuff.  I'd shift the Git
> metadata extraction to ssoma-mda, use the Postfix filtering to drop
> spam for everyone, use procmail to deliver mail for everyone, and then
> call ssoma-mda from ~meta/.procmailrc.

I don't disagree with the metadata extraction for ssoma-mda; but
everything above that layer (including public-inbox) should be open to
interpretation.  I don't use procmail myself, for example; and I know
folks who would want to use extra/different spam filters.

public-inbox is my opinionated policy layer;
but ssoma was intended to be generic.


* Re: [RFC] ssoma-mda: Use the email subject as the commit message
  2014-10-20  0:49             ` Eric Wong
@ 2014-10-20 15:36               ` W. Trevor King
  2014-10-20 19:26                 ` Eric Wong
  0 siblings, 1 reply; 13+ messages in thread
From: W. Trevor King @ 2014-10-20 15:36 UTC
  To: Eric Wong; +Cc: meta


On Mon, Oct 20, 2014 at 12:49:08AM +0000, Eric Wong wrote:
> W. Trevor King wrote:
> > On Sun, Oct 19, 2014 at 05:30:29AM +0000, Eric Wong wrote:
> > > W. Trevor King wrote:
> > > > On Sat, Oct 18, 2014 at 11:43:23PM +0000, Eric Wong wrote:
> > > > > Sounds like a good idea to make it fall back if require fails.  Can
> > > > > you do it or would you like me to handle it in Perl?
> > > > 
> > > > If you tell me the Perl idiom for that, I can write up a patch.  In
> > > 
> > > 	eval { require Foo; };
> > > 	my $have_foo = $@ ? 0 : 1;
> > > 
> > > That won't perform any imports, but I think most of those modules do
> > > not require imports.
> > 
> > And then if have_foo, we ‘use Foo’ to get the import?
> 
> No need to import for the Email::* modules I don't think.  "use" is
> evaluated at compile/parse time, so you can't use it lazily outside of a
> string eval.  "require" is always lazy, I think.
> 
> I think you can just call "IPC::Run::run" directly (instead of just "run")

Sounds good, I'll give this a shot for v2.

> > > > > public-inbox also wraps spam filtering/learning
> > > > > (SpamAssassin) + sanitization, and that's arguably more
> > > > > important than the web UI.
> > > > 
> > > > Then I'd shift those hooks over to ssoma-mda.  Actually, I'd
> > > > probably leave it up to folks to hook those into their mail
> > > > server / MDA before messages get as far as ssoma-mda.  Spam
> > > > filtering is a generic issue; there's no need to build all the
> > > > checks you'd want (also greylisting, DKIM, SPF, …) into
> > > > ssoma-mda itself.
> > > 
> > > Uh, you just contradicted yourself :) public-inbox is that mail
> > > server/MDA layer before ssoma for me.
> > 
> > Filtering spam is something that lots of folks want (folks who may
> > not be interested in Git archives), and it should happen at the
> > MTA level (e.g. [1]) or in a procmail chain (e.g. [2]).  The spamc
> > hook shouldn't be tied to any Git-archive stuff.  I'd shift the
> > Git metadata extraction to ssoma-mda, use the Postfix filtering to
> > drop spam for everyone, use procmail to deliver mail for everyone,
> > and then call ssoma-mda from ~meta/.procmailrc.
> 
> I don't disagree with the metadata extraction for ssoma-mda; but
> everything above that layer (including public-inbox) should be open
> to interpretation.  I don't use procmail myself, for example; and I
> know folks who would want to use extra/different spam filters.
> 
> public-inbox is my opinionated policy layer; but ssoma was intended
> to be generic.

Fair enough.  Maybe I'm looking for a generic ssoma-archive-to-HTTP
server (like public-inbox.cgi) that is independent of the opinionated
filtering in public-inbox-mda?

Cheers,
Trevor




* Re: [RFC] ssoma-mda: Use the email subject as the commit message
  2014-10-20 15:36               ` W. Trevor King
@ 2014-10-20 19:26                 ` Eric Wong
  2014-10-20 19:53                   ` W. Trevor King
  0 siblings, 1 reply; 13+ messages in thread
From: Eric Wong @ 2014-10-20 19:26 UTC
  To: W. Trevor King; +Cc: meta

"W. Trevor King" <wking@tremily.us> wrote:
> On Mon, Oct 20, 2014 at 12:49:08AM +0000, Eric Wong wrote:
> > public-inbox is my opinionated policy layer; but ssoma was intended
> > to be generic.
> 
> Fair enough.  Maybe I'm looking for a generic ssoma-archive-to-HTTP
> server (like public-inbox.cgi) that is independent of the opinionated
> filtering in public-inbox-mda?

You can use public-inbox.{cgi,psgi} independently of the -mda.

I'm not sure how well it does with HTML portions (those are stripped
or converted by -mda); but it should probably only show text/plain
parts.

I doubt there's a risk of HTML tag injection since we're required to
sanitize plain-text mail, too, but (I think) HTML will be shown in
source form at the moment.  Of course you can filter/reject HTML
at the MTA layer, too.

But of course web design is totally opinionated, too.
I design for lynx and w3m and reject all use of JS/graphics :)


* Re: [RFC] ssoma-mda: Use the email subject as the commit message
  2014-10-20 19:26                 ` Eric Wong
@ 2014-10-20 19:53                   ` W. Trevor King
  0 siblings, 0 replies; 13+ messages in thread
From: W. Trevor King @ 2014-10-20 19:53 UTC
  To: Eric Wong; +Cc: meta


On Mon, Oct 20, 2014 at 07:26:48PM +0000, Eric Wong wrote:
> But of course web design is totally opinionated, too.  I design for
> lynx and w3m and reject all use of JS/graphics :)

Yes, but your HTTP opinions are independent of your MDA opinions ;).

Cheers,
Trevor




* Re: [RFC] ssoma-mda: Use the email subject as the commit message
  2014-10-19  3:48       ` W. Trevor King
  2014-10-19  5:30         ` Eric Wong
@ 2014-10-26 22:57         ` Eric Wong
  2014-10-27  0:19           ` W. Trevor King
  1 sibling, 1 reply; 13+ messages in thread
From: Eric Wong @ 2014-10-26 22:57 UTC
  To: W. Trevor King; +Cc: meta

"W. Trevor King" <wking@tremily.us> wrote:
> ‘notmuch new’ only traverses Maildir and MH, but ‘notmuch insert’
> reads a message off stdin just like ssoma-mda.  ‘notmuch insert’ also
> currently delivers the message to maildir (besides indexing it), but
> it should be easy to patch things to optionally disable that delivery
> (and only index the message).

Unfortunately, there seems to be no way to easily delete a single
message by Message-ID from the Xapian index using notmuch (without
maintaining a copy in the Maildir).

What I want to do is:

1) "notmuch insert" (Xapian index + Maildir delivery)
2) remove from Maildir immediately (keep message in ssoma)

  <some time passes...>

3) a message is decided to be spam after human review,
   use ssoma-rm (via public-inbox-learn) to remove it
   from current history.

But notmuch/Xapian still knows about the message.

"notmuch new" does not work here because the Maildir is
maintained in a constantly empty state.

I wonder if it is better for public-inbox/ssoma to use (the already
available) Xapian Perl bindings directly; likely using the notmuch
configuration of Xapian as a guide.


* Re: [RFC] ssoma-mda: Use the email subject as the commit message
  2014-10-26 22:57         ` Eric Wong
@ 2014-10-27  0:19           ` W. Trevor King
  0 siblings, 0 replies; 13+ messages in thread
From: W. Trevor King @ 2014-10-27  0:19 UTC
  To: Eric Wong; +Cc: meta


On Sun, Oct 26, 2014 at 10:57:40PM +0000, Eric Wong wrote:
> 1) "notmuch insert" (Xapian index + Maildir delivery)
> 2) remove from Maildir immediately (keep message in ssoma)
> 
>   <some time passes...>
> 
> 3) a message is decided to be spam after human review,
>    use ssoma-rm (via public-inbox-learn) to remove it
>    from current history.
> 
> But notmuch/Xapian still knows about the message.
> 
> "notmuch new" does not work here because the Maildir is
> maintained in a constantly empty state.

That sounds like a reasonable use case to me.  I'll work up notmuch
patches for:

  $ notmuch insert --index-only

so you can skip step 2, and a new:

  $ notmuch remove

that removes a message (read from stdin) from the index.

> I wonder if it is better for public-inbox/ssoma to use (the already
> available) Xapian Perl bindings directly; likely using the notmuch
> configuration of Xapian as a guide.

I'm not going to stand in your way, but 98% of what you need is
already in notmuch.  I'd suggest seeing how long it takes me to write
those patches, and how well they are received upstream, before
dropping notmuch.

Cheers,
Trevor




Thread overview: 13+ messages
2014-10-18 20:19 [RFC] ssoma-mda: Use the email subject as the commit message W. Trevor King
2014-10-18 21:04 ` Eric Wong
2014-10-18 21:50   ` W. Trevor King
2014-10-18 23:43     ` Eric Wong
2014-10-19  3:48       ` W. Trevor King
2014-10-19  5:30         ` Eric Wong
2014-10-19 17:31           ` W. Trevor King
2014-10-20  0:49             ` Eric Wong
2014-10-20 15:36               ` W. Trevor King
2014-10-20 19:26                 ` Eric Wong
2014-10-20 19:53                   ` W. Trevor King
2014-10-26 22:57         ` Eric Wong
2014-10-27  0:19           ` W. Trevor King
