On Sat, Oct 18, 2014 at 11:43:23PM +0000, Eric Wong wrote:
> W. Trevor King wrote:
> > On Sat, Oct 18, 2014 at 09:04:00PM +0000, Eric Wong wrote:
> > > W. Trevor King wrote:
> > > > This is more interesting than just using 'mda' all the time,
> > > > but it's harder to set up proper quoting around the message
> > > > without using third-party Perl modules (e.g. IPC::Run or
> > > > String::ShellQuote).  This proof-of-concept patch just assumes
> > > > the subject doesn't contain single-quotes (').  This patch
> > > > also doesn't handle the empty/missing subject case, which
> > > > should probably fall back to '' or some such.
> > >
> > > Right, carelessness here would open us up to command injection.
> >
> > There's no chance of carelessness if you're using a subprocess
> > launcher that's based on execve (see exec(3)) instead of using a
> > shell.
>
> Right.  I'd probably use IPC::Run in Perl since public-inbox already
> depends on it; but probably optionally (as mentioned below)

Works for me.

> > In Python 2, you just need to import the unicode_literals future
> > [2] and use unicode() instead of str().  It's easy to bind the
> > appropriate to-Unicode function to a unicode_str helper depending
> > on the Python version if you want a code-base compatible with
> > both.
>
> How's the Python 2->3 transition these days?  (sorry, not familiar
> with Python, my brain didn't "get it")

In several of my projects, I don't bother with 2.x support, but in
this case it would be pretty easy to be compatible with both.

> > > > It would also be useful (I think) to set the GIT_AUTHOR_NAME,
> > > > GIT_AUTHOR_EMAIL, and GIT_AUTHOR_DATE environment variables
> > > > from the message header before committing.  I know how to do
> > > > that using Python's subprocess module, but I don't know the
> > > > Perl incantation.
> > >
> > > That's done in public-inbox-mda using
> > >
> > >   local $ENV{...} = ...
> > >
> > > And more Email::* modules to properly decode various email
> > > addresses and internationalized names.  I wanted to keep ssoma
> > > as lean and dumb as possible.
> >
> > It doesn't seem like *that* much more complication ;).  Can we
> > make it optional, and error out if it's enabled and the
> > appropriate decoding modules aren't present?
>
> Sounds like a good idea to make it fall back if require fails.  Can
> you do it or would you like me to handle it in Perl?

If you tell me the Perl idiom for that, I can write up a patch.  In
Python I usually do:

  try:
      import some.module as _some_module
  except ImportError as e:
      _some_module = None
      _some_module_import_error = e
  …
  def foo():
      if _some_module is None:
          raise _some_module_import_error

> > It seems like you'd want to handle input and local browsing with
> > ssoma and then point public-inbox at the resulting Git archive.
> > Collecting the archive should be independent of serving it over
> > HTTP.
>
> public-inbox also wraps spam filtering/learning (SpamAssassin) +
> sanitization, and that's arguably more important than the web UI.

Then I'd shift those hooks over to ssoma-mda.  Actually, I'd probably
leave it up to folks to hook those into their mail server / MDA before
messages get as far as ssoma-mda.  Spam filtering is a generic issue;
there's no need to build all the checks you'd want (also greylisting,
DKIM, SPF, …) into ssoma-mda itself.

> > I would have tried it sooner if it had been written in a language
> > I liked ;).  I'm not familiar with Ruby's email-parsing modules,
> > but I am familiar with Python's.
>
> Familiarity was why I chose Perl, too.  I've been using Email::*
> modules forever in private projects.  For me, the Ruby 1.8 -> 1.9
> transition was a huge pain, and it seems the Python 2 -> 3
> transition is just as bad (from a 1000 foot view)

Python 2 → 3 wasn't bad if the original code understood the
difference between Unicode and byte streams and used Unicode for text.
Unfortunately, that was frequently not the case.  I imagine Ruby 1.8 →
1.9 had a lot of the same issues.

> Perl 5 has been great with compatibility and I doubt there'll ever
> be a need to transition to 6 :)

The bytes/Unicode distinction is partly a compatibility thing, but
mostly it's an internal bookkeeping thing.  If you consistently used
Unicode for text in Python 2, Python 3 was mostly “yay, now I don't
have to mangle my text into bytes before passing it to this external
library”.

> > > I've considered adding fuzzy generation counters to commit
> > > messages to public-inbox to allow easier history traversals; but
> > > decided it's probably better to do in any out-of-band,
> > > easily-regenerated store using sqlite or similar (this may help
> > > with adding search support to the web UI as well).
> >
> > Fuzzy generation counters?
>
> Commit generation numbers (age relative to the root commit).  There
> were several discussions around summer 2011 on the git ML around it.
>
> I imagine using git merge to split/combine mailing lists (either
> project forking/merging back or dealing with migrating ML
> servers/hosts).

That sounds good to me, but I don't see the need to have generation
numbers to do that.  We just need to patch mlmmj to support:

instead of:

and let Git handle the rest.

> > For search, I'd just run a local notmuch index [3].
> >
> > Personally, I'd rather use ssoma for aggregating and sharing the
> > archive, and then notmuch to handle threading and search, with a
> > read-only web frontend in front of notmuch, that just hit the
> > ssoma archive for message bodies (but served thread lists and such
> > straight from notmuch, hitting the Xapian database but not the
> > ssoma archives).
>
> Am I correct notmuch only handles Maildir and MH currently?

‘notmuch new’ only traverses Maildir and MH, but ‘notmuch insert’
reads a message off stdin just like ssoma-mda.
‘notmuch insert’ also currently delivers the message to maildir
(besides indexing it), but it should be easy to patch things to
optionally disable that delivery (and only index the message).

> I really want a mail search engine to index the git blobs directly
> without the need to keep decompressed messages around.

No need for decompressed messages, but you'd have to iterate over your
Git repository and feed messages to ‘notmuch insert’ one at a time
when you started a fresh index.  After that, it should be easy to have
the mail server pass the message to both ssoma-mda and ‘notmuch
insert’.

> I have much mail in gzipped mboxes (new mail in Maildirs); so I've
> been sticking to mairix for my local search needs.  Just having the
> mail archived in git+ssoma without mboxes is the goal one day...

You use gzipped mboxes instead of Maildirs for everything just from a
disk space perspective?

Patching notmuch to read email directly from Git shouldn't be too bad,
since there aren't many views where you actually need the full email
(usually the stuff in the Xapian index is sufficient).

Cheers,
Trevor

-- 
This email may be signed or encrypted with GnuPG (http://www.gnupg.org).
For more information, see http://en.wikipedia.org/wiki/Pretty_Good_Privacy
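
For reference, here is a rough Python sketch of the shell-free commit
with GIT_AUTHOR_* variables discussed earlier in the thread.  It is
only an illustration, not code from ssoma or public-inbox: the
author_env and commit helpers are hypothetical names, the working
directory is assumed to be an existing Git repository with the message
already staged, and encoded (RFC 2047) names are not decoded.

```python
import os
import subprocess
from email import message_from_string
from email.utils import parseaddr


def author_env(message, base_env=None):
    """Return a copy of base_env with GIT_AUTHOR_* set from the headers.

    Hypothetical helper: headers that are missing are simply skipped,
    so Git falls back to its usual defaults for them.
    """
    env = dict(os.environ if base_env is None else base_env)
    name, address = parseaddr(message.get('From', ''))
    if name:
        env['GIT_AUTHOR_NAME'] = name
    if address:
        env['GIT_AUTHOR_EMAIL'] = address
    if message.get('Date'):
        # Git accepts RFC 2822 dates, so pass the header through as-is.
        env['GIT_AUTHOR_DATE'] = message['Date']
    return env


def commit(message, repo='.'):
    """Commit the current index, using the subject as the commit message.

    Hypothetical helper: assumes 'repo' is a Git working tree with
    changes already staged.
    """
    # An empty or missing subject falls back to ''.
    subject = message.get('Subject') or ''
    # argv is a list and no shell is involved, so single quotes (or
    # anything else) in the subject are passed through verbatim.
    argv = ['git', 'commit', '--allow-empty-message', '-m', subject]
    subprocess.check_call(argv, cwd=repo, env=author_env(message))
```

Because the subject travels as a single argv element, there is nothing
to quote and nothing for a hostile subject to inject into.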