user/dev discussion of public-inbox itself
From: "W. Trevor King" <wking@tremily.us>
To: Eric Wong <e@80x24.org>
Cc: meta@public-inbox.org
Subject: Re: [RFC] ssoma-mda: Use the email subject as the commit message
Date: Sat, 18 Oct 2014 20:48:15 -0700
Message-ID: <20141019034815.GL17200@odin.tremily.us>
In-Reply-To: <20141018234323.GA5226@dcvr.yhbt.net>

On Sat, Oct 18, 2014 at 11:43:23PM +0000, Eric Wong wrote:
> W. Trevor King wrote:
> > On Sat, Oct 18, 2014 at 09:04:00PM +0000, Eric Wong wrote:
> > > W. Trevor King wrote:
> > > > This is more interesting than just using 'mda' all the time,
> > > > but it's harder to set up proper quoting around the message
> > > > without using third-party Perl modules (e.g. IPC::Run or
> > > > String::ShellQuote).  This proof-of-concept patch just assumes
> > > > the subject doesn't contain single-quotes (').  This patch
> > > > also doesn't handle the empty/missing subject case, which
> > > > should probably fall back to '<no subject>' or some such.
> > > 
> > > Right, carelessness here would open us up to command injection.
> > 
> > There's no chance of carelessness if you're using a subprocess
> > launcher that's based on execve (see exec(3)) instead of using a
> > shell.
> 
> Right.  I'd probably use IPC::Run in Perl since public-inbox already
> depends on it; but probably optionally (as mentioned below)

Works for me.
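
A minimal Python sketch of the no-shell point, for illustration only.
When the arguments reach the child as a list, nothing re-parses the
subject, so quotes and shell metacharacters arrive untouched.  The
helper names (commit_command, echo_last_argument) and the exact 'git
commit' options are assumptions made for this example; they are not
taken from the patch.

```python
import subprocess
import sys


def commit_command(subject):
    """Build an argv list for a commit whose message is the email subject."""
    # Fall back to a placeholder when the subject is missing or empty.
    message = subject.strip() if subject and subject.strip() else '<no subject>'
    # Each list element becomes exactly one argument of the child process.
    return ['git', 'commit', '--quiet', '-m', message]


def echo_last_argument(argv):
    """Run a child Python that prints the last argument it received.

    This stands in for git so the round trip can be checked without a
    repository; no shell is involved at any point.
    """
    child = [sys.executable, '-c', 'import sys; sys.stdout.write(sys.argv[-1])']
    return subprocess.check_output(child + argv[-1:]).decode('utf-8')
```

Passing commit_command(subject) to subprocess.check_call inside a
work tree would run the real commit; the echo helper only shows that
a subject containing single quotes or $(...) survives unchanged.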

> > In Python 2, you just need to import the unicode_literals future
> > [2] and use unicode() instead of str().  It's easy to bind the
> > appropriate to-Unicode function to a unicode_str helper depending
> > on the Python version if you want a code-base compatible with
> > both.
> 
> How's the Python 2->3 transition these days?  (sorry, not familiar
> with Python, my brain didn't "get it")

In several of my projects, I don't bother with 2.x support, but in
this case it would be pretty easy to be compatible with both.
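
A sketch of the helper binding described in the quoted text, assuming
UTF-8 as the default encoding for byte input.  The to_text function
is a hypothetical convenience wrapper added for this example.

```python
from __future__ import unicode_literals

import sys

# Bind the to-Unicode constructor once, depending on the interpreter.
if sys.version_info[0] >= 3:
    unicode_str = str
else:
    unicode_str = unicode  # noqa: F821 (this name only exists on Python 2)


def to_text(value, encoding='utf-8'):
    """Return value as Unicode text on both Python 2 and Python 3."""
    if isinstance(value, bytes):
        # Byte strings need an explicit decode; the encoding is an assumption.
        return value.decode(encoding)
    return unicode_str(value)
```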

> > > > It would also be useful (I think) to set the GIT_AUTHOR_NAME,
> > > > GIT_AUTHOR_EMAIL, and GIT_AUTHOR_DATE environment variables
> > > > from the message header before committing.  I know how to do
> > > > that using Python's subprocess module, but I don't know the
> > > > Perl incantation.
> > > 
> > > That's done in public-inbox-mda using
> > > 
> > > 	local $ENV{...} = ...
> > > 
> > > And more Email::* modules to properly decode various email
> > > addresses and internationalized names.  I wanted to keep ssoma
> > > as lean and dumb as possible.
> > 
> > It doesn't seem like *that* much more complication ;).  Can we
> > make it optional, and error out if it's enabled and the
> > appropriate decoding modules aren't present?
> 
> Sounds like a good idea to make it fall back if require fails.  Can
> you do it or would you like me to handle it in Perl?

If you tell me the Perl idiom for that, I can write up a patch.  In
Python I usually do:

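  try:
      import some.module as _some_module
  except ImportError as e:
      # Remember the failure instead of raising at import time.
      _some_module = None
      _some_module_import_error = e
  …
  def foo():
      # Raise the saved error only when the optional feature is used.
      if _some_module is None:
          raise _some_module_import_error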
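
For the GIT_AUTHOR_* idea discussed above, a rough sketch using only
Python's standard library.  The function name author_environment and
its base_env parameter are made up for this example.  It does not
decode RFC 2047 encoded names, which is the part that would need the
extra decoding modules.

```python
import email
import email.utils
import os


def author_environment(raw_message, base_env=None):
    """Return an environment dict with GIT_AUTHOR_* set from the headers."""
    msg = email.message_from_string(raw_message)
    name, address = email.utils.parseaddr(msg.get('From', ''))
    # Copy the environment so the caller's process is left untouched.
    env = dict(os.environ if base_env is None else base_env)
    if name:
        env['GIT_AUTHOR_NAME'] = name
    if address:
        env['GIT_AUTHOR_EMAIL'] = address
    if msg.get('Date'):
        # Git accepts RFC 2822 dates, so the header is passed through as-is.
        env['GIT_AUTHOR_DATE'] = msg['Date']
    return env
```

The result can be handed to subprocess.check_call(..., env=env) when
running the commit, which plays the same role as the local $ENV{...}
assignment in the Perl version.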

> > It seems like you'd want to handle input and local browsing with
> > ssoma and then point public-inbox at the resulting Git archive.
> > Collecting the archive should be independent of serving it over
> > HTTP.
> 
> public-inbox also wraps spam filtering/learning (SpamAssassin) +
> sanitization, and that's arguably more important than the web UI.

Then I'd shift those hooks over to ssoma-mda.  Actually, I'd probably
leave it up to folks to hook those into their mail server / MDA before
messages get as far as ssoma-mda.  Spam filtering is a generic issue;
there's no need to build all the checks you'd want (also greylisting,
DKIM, SPF, …) into ssoma-mda itself.

> > I would have tried it sooner if it had been written in a language
> > I liked ;).  I'm not familiar with Ruby's email-parsing modules,
> > but I am familiar with Python's.
> 
> Familiarity was why I chose Perl, too.  I've been using Email::*
> modules forever in private projects.  For me, the Ruby 1.8 -> 1.9
> transition was a huge pain, and it seems the Python 2 -> 3
> transition is just as bad (from a 1000 foot view)

Python 2 → 3 wasn't bad if the original code understood the difference
between Unicode and byte streams and used Unicode for text.
Unfortunately, that was frequently not the case.  I imagine Ruby 1.8 →
1.9 had many of the same issues.

> Perl 5 has been great with compatibility and I doubt there'll ever
> be a need to transition to 6 :)

The bytes/Unicode distinction is partly a compatibility thing, but
mostly it's an internal bookkeeping thing.  If you consistently used
Unicode for text in Python 2, Python 3 was mostly “yay, now I don't
have to mangle my text into bytes before passing it to this external
library”.

> > > I've considered adding fuzzy generation counters to commit
> > > messages to public-inbox to allow easier history traversals; but
> > > decided it's probably better to do in an out-of-band,
> > > easily-regenerated store using sqlite or similar (this may help
> > > with adding search support to the web UI as well).
> > 
> > Fuzzy generation counters?
> 
> Commit generation numbers (age relative to the root commit).  There
> were several discussions around summer 2011 on the git ML around it.
>
> I imagine using git merge to split/combine mailing lists (either
> project forking/merging back or dealing with migrating ML
> servers/hosts).

That sounds good to me, but I don't see the need to have generation
numbers to do that.  We just need to patch mlmmj to support:

  <LIST+get-MESSAGE-ID-HASH@DOMAIN>

instead of:

  <LIST+get-N@DOMAIN>

and let Git handle the rest.
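
To make the address form concrete, a small sketch that builds such a
retrieval address.  It assumes the hash is the hex SHA-1 of the
Message-ID with the angle brackets stripped; the function name and
that choice of hash are assumptions for this example, and mlmmj has
no such feature today.

```python
import hashlib


def get_address(list_name, domain, message_id):
    """Build a LIST+get-MESSAGE-ID-HASH@DOMAIN style address."""
    mid = message_id.strip()
    # Hash the bare Message-ID, without the surrounding angle brackets.
    if mid.startswith('<') and mid.endswith('>'):
        mid = mid[1:-1]
    digest = hashlib.sha1(mid.encode('utf-8')).hexdigest()
    return '{}+get-{}@{}'.format(list_name, digest, domain)
```

Unlike a sequence number N, the hash is the same on every mirror of
the archive, so the address stays valid after lists are split or
merged.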

> > For search, I'd just run a local notmuch index [3].
> 
> <snip>
> 
> > Personally, I'd rather use ssoma for aggregating and sharing the
> > archive, and then notmuch to handle threading and search, with a
> > read-only web frontend in front of notmuch, that just hit the
> > ssoma archive for message bodies (but served thread lists and such
> > straight from notmuch, hitting the Xapian database but not the
> > ssoma archives).
> 
> Am I correct notmuch only handles Maildir and MH currently?

‘notmuch new’ only traverses Maildir and MH, but ‘notmuch insert’
reads a message off stdin just like ssoma-mda.  ‘notmuch insert’ also
currently delivers the message to maildir (besides indexing it), but
it should be easy to patch things to optionally disable that delivery
(and only index the message).

> I really want a mail search engine to index the git blobs directly
> without the need to keep decompressed messages around.

No need for decompressed messages, but you'd have to iterate over your
Git repository and feed messages to ‘notmuch insert’ one at a time
when you started a fresh index.  After that, it should be easy to have
the mail server pass the message to both ssoma-mda and ‘notmuch
insert’.
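
A rough sketch of that initial pass, assuming every blob reachable
from the given ref is one email message.  The function names are
hypothetical.  As noted above, an unpatched ‘notmuch insert’ would
also deliver each message to the maildir.

```python
import subprocess


def parse_ls_tree(output):
    """Yield (object_id, path) for each blob in `git ls-tree -r -z` output."""
    for record in output.split(b'\0'):
        if not record:
            continue
        # Each record is "<mode> <type> <object>\t<path>".
        meta, path = record.split(b'\t', 1)
        _mode, kind, object_id = meta.split(b' ')
        if kind == b'blob':
            yield object_id.decode('ascii'), path.decode('utf-8', 'replace')


def reindex(repo, ref='HEAD'):
    """Feed every message blob under ref to `notmuch insert`, one at a time."""
    listing = subprocess.check_output(
        ['git', 'ls-tree', '-r', '-z', ref], cwd=repo)
    for object_id, _path in parse_ls_tree(listing):
        message = subprocess.check_output(
            ['git', 'cat-file', 'blob', object_id], cwd=repo)
        # notmuch insert reads a single message from stdin.
        insert = subprocess.Popen(['notmuch', 'insert'], stdin=subprocess.PIPE)
        insert.communicate(message)
        if insert.returncode != 0:
            raise subprocess.CalledProcessError(
                insert.returncode, 'notmuch insert')
```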

> I have much mail in gzipped mboxes (new mail in Maildirs); so I've
> been sticking to mairix for my local search needs.  Just having the
> mail archived in git+ssoma without mboxes is the goal one day...

You use gzipped mboxes instead of Maildirs for everything just from a
disk space perspective?  Patching notmuch to read email directly from
Git shouldn't be too bad, since there aren't many views where you
actually need the full email (usually the stuff in the Xapian index is
sufficient).

Cheers,
Trevor

-- 
This email may be signed or encrypted with GnuPG (http://www.gnupg.org).
For more information, see http://en.wikipedia.org/wiki/Pretty_Good_Privacy

