From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on dcvr.yhbt.net
X-Spam-Level:
X-Spam-ASN:
X-Spam-Status: No, score=-2.9 required=3.0 tests=ALL_TRUSTED,BAYES_00,
    URIBL_BLOCKED shortcircuit=no autolearn=unavailable version=3.3.2
X-Original-To: meta@public-inbox.org
Received: from localhost (dcvr.yhbt.net [127.0.0.1])
    by dcvr.yhbt.net (Postfix) with ESMTP id 72EDD1F8B3;
    Sat, 18 Oct 2014 23:43:23 +0000 (UTC)
Date: Sat, 18 Oct 2014 23:43:23 +0000
From: Eric Wong
To: "W. Trevor King"
Cc: meta@public-inbox.org
Subject: Re: [RFC] ssoma-mda: Use the email subject as the commit message
Message-ID: <20141018234323.GA5226@dcvr.yhbt.net>
References: <20141018210400.GA2448@dcvr.yhbt.net>
    <20141018215020.GK17200@odin.tremily.us>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
In-Reply-To: <20141018215020.GK17200@odin.tremily.us>
List-Id:

"W. Trevor King" wrote:
> On Sat, Oct 18, 2014 at 09:04:00PM +0000, Eric Wong wrote:
> > W. Trevor King wrote:
> > > This is more interesting than just using 'mda' all the time, but
> > > it's harder to setup proper quoting around the message without
> > > using third-party Perl modules (e.g. IPC::Run or
> > > String::ShellQuote).  This proof-of-concept patch just assumes the
> > > subject doesn't contain single-quotes (').  This patch also
> > > doesn't handle the empty/missing subject case, which should
> > > probably fall back to '' or some such.
> >
> > Right, carelessness here would open us up to command injection.
>
> There's no chance of carelessness if you're using a subprocess
> launcher that's based on execve (see exec(3)) instead of using a
> shell.

Right.  I'd probably use IPC::Run in Perl since public-inbox already
depends on it; but probably optionally (as mentioned below).

> > It would also need to work with internationalized subjects.
> > I considered it for public-inbox-mda; but decided it was not worth the
> > trouble.
>
> Python handles that out of the box without difficulty [1].  In Python

Good to know.

> In Python 2, you just need to import the unicode_literals future [2]
> and use unicode() instead of str().  It's easy to bind the appropriate
> to-Unicode function to a unicode_str helper depending on the Python
> version if you want a code-base compatible with both.

How's the Python 2->3 transition these days?
(sorry, not familiar with Python, my brain didn't "get it")

> > > It would also be useful (I think) to set the GIT_AUTHOR_NAME,
> > > GIT_AUTHOR_EMAIL, and GIT_AUTHOR_DATE environment variables from
> > > the message header before committing.  I know how to do that using
> > > Python's subprocess module, but I don't know the Perl incantation.
> >
> > That's done in public-inbox-mda using
> >
> >   local $ENV{...} = ...
> >
> > And more Email::* modules to properly decode various email addresses
> > and internationalized names.  I wanted to keep ssoma as lean and
> > dumb as possible.
>
> It doesn't seem like *that* much more complication ;).  Can we make it
> optional, and error out if it's enabled and the appropriate decoding
> modules aren't present?

Sounds like a good idea to make it fall back if require fails.
Can you do it or would you like me to handle it in Perl?

> It seems like you'd want to handle input and
> local browsing with ssoma and then point public-inbox at the resulting
> Git archive.  Collecting the archive should be independent of serving
> it over HTTP.

public-inbox also wraps spam filtering/learning (SpamAssassin) +
sanitization, and that's arguably more important than the web UI.

> I would have tried it sooner if it had been written in a language I
> liked ;).  I'm not familiar with Ruby's email-parsing modules, but I
> am familiar with Python's.

Familiarity was why I chose Perl, too.  I've been using Email::* modules
forever in private projects.
For me, the Ruby 1.8 -> 1.9 transition was a huge pain, and it seems the
Python 2 -> 3 transition is just as bad (from a 1000-foot view).

Perl 5 has been great with compatibility and I doubt there'll ever be a
need to transition to 6 :)

> What do you see as the ease-of-installation and adoption barriers?
> I'd guess they're just “Gmane works pretty well”.

My main concern is for distro users, not hard-core Python/Perl/Ruby
users.  I consider out-of-the-box support on stable (and even older,
long-term) GNU/Linux distros to be important to adoption.

Ruby has a lot of modules (gems) for mail but distro support often lags
behind.  And SpamAssassin is _the_ killer app for me as far as spam
filtering goes.

> > Fwiw, the commit subject/message currently has no bearing on the way
> > ssoma or public-inbox handles the mail data.  So another
> > implementation is free to use more metadata in the commit message.
>
> Right, but if you're going to put something into Git, you might as
> well make the history pleasant to browse ;).

Fair enough.

> > I've considered adding fuzzy generation counters to commit messages to
> > public-inbox to allow easier history traversals; but decided it's
> > probably better to do in any out-of-band, easily-regenerated store
> > using sqlite or similar (this may help with adding search support to
> > the web UI as well).
>
> Fuzzy generation counters?

Commit generation numbers (age relative to the root commit).  There were
several discussions around summer 2011 on the git ML around it.  I
imagine using git merge to split/combine mailing lists (either project
forking/merging back or dealing with migrating ML servers/hosts).

> For search, I'd just run a local notmuch index [3].
> Personally,
> I'd rather use ssoma for aggregating and sharing the archive, and then
> notmuch to handle threading and search, with a read-only web frontend
> in front of notmuch, that just hit the ssoma archive for message
> bodies (but served thread lists and such straight from notmuch,
> hitting the Xapian database but not the ssoma archives).

Am I correct that notmuch only handles Maildir and MH currently?  I
really want a mail search engine to index the git blobs directly
without the need to keep decompressed messages around.

I have much mail in gzipped mboxes (new mail in Maildirs); so I've been
sticking to mairix for my local search needs.  Just having the mail
archived in git+ssoma without mboxes is the goal one day...

> I think ssoma + notmuch + nmbug is a good pairing for users too, since
> you'll generally want the whole archive locally for that (although
> with the in-flight ghost message series for notmuch [5,6], having the
> whole archive locally will move from “you really want this” to “you
> probably want this”).

Cool.  I need to look at notmuch more.  I've considered the Perl Xapian
bindings for indexing the git repo directly, too.
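
For reference, the shell-free approach discussed earlier in this message
(an execve-style launcher plus GIT_AUTHOR_* variables taken from the
message headers) might look roughly like this with Python's subprocess
and email modules.  This is only a sketch under stated assumptions, not
what ssoma or public-inbox-mda actually does: the function names, the
"(no subject)" fallback, and the plain `git commit -m` invocation are
all made up for illustration.

```python
import os
import subprocess
from email import message_from_string
from email.header import decode_header, make_header
from email.utils import parseaddr


def decode(value):
    """Decode an RFC 2047 encoded header value into a plain str."""
    return str(make_header(decode_header(value)))


def commit_env_and_args(raw_message, fallback_subject="(no subject)"):
    """Build the environment and argv for a commit (hypothetical helper).

    The fallback subject is an assumption; the thread leaves it open.
    """
    msg = message_from_string(raw_message)

    # Split the address first, then decode only the display name.
    name, addr = parseaddr(msg.get("From", ""))
    name = decode(name)
    subject = decode(msg.get("Subject", "")).strip() or fallback_subject

    env = dict(os.environ)
    if name:
        env["GIT_AUTHOR_NAME"] = name
    if addr:
        env["GIT_AUTHOR_EMAIL"] = addr
    if msg.get("Date"):
        # Passed through unchanged; assumes an RFC 2822 style date.
        env["GIT_AUTHOR_DATE"] = msg["Date"]

    # The subject is a single argv element, so quotes, semicolons and
    # other shell metacharacters in it are never interpreted.
    args = ["git", "commit", "--quiet", "-m", subject]
    return env, args


def commit(raw_message, repo_dir):
    """Run the commit without a shell (assumes changes are already staged)."""
    env, args = commit_env_and_args(raw_message)
    subprocess.run(args, env=env, cwd=repo_dir, check=True)
```

Because the argument list is handed to the child process directly, a
subject containing single quotes needs no quoting at all, and a missing
subject only changes which string ends up as the last argv element.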