user/dev discussion of public-inbox itself
From: "Nicolás Ojeda Bär" <n.oje.bar@gmail.com>
To: Eric Wong <e@80x24.org>
Cc: meta@public-inbox.org
Subject: Re: Relationship between public-inbox and ssoma?
Date: Mon, 5 Mar 2018 12:45:04 +0100
Message-ID: <CAPunWhD_BKT0QgpL5Z=jdMTaV7nE=uK_-JSt1Or2=u6U+wk4Fg@mail.gmail.com>
In-Reply-To: <20180305020754.GA11496@dcvr>

Hello Eric,

Thanks for the prompt reply.  I am trying to migrate a long-lived
mailing list (65k messages over 26 years). Below are some
troubles/questions I am having; any suggestions would be greatly
appreciated.

- public-inbox-watch seems to struggle with very big maildirs; for now
  I am moving the data into the maildir a little at a time, and that
  seems to work. Is there a particular obstacle to making the
  importing process more incremental?

- Trouble due to missing/malformed headers (mostly on very old
messages). For example, here is the header of a message that trips
public-inbox-watch:

From weis@margaux  Fri Nov 27 16:24:50 1992
Received: by margaux.inria.fr, Fri, 27 Nov 92 16:24:50 +0100
Message-ID: <9211271524.AA29971@margaux.inria.fr>
To: caml-list@margaux
Sender: weis@margaux
Status: O

The error is: fatal: Invalid rfc2822 date "" in ident:  <> (I guess
due to the lack of a Date: field). I added a Date: field just to test
and noticed that Author: in the git commit was empty, I guess due to
the use of a Sender: header rather than a From: header.

Do you think it is feasible to improve public-inbox-watch to try to
extract the date from some other header in cases like the one above,
and to use Sender: when From: is not found?

- There are some messages that do not have Message-Id, but
  public-inbox-watch seems to be able to handle them. Is it the case
  that Date: is the only header that is absolutely necessary for
  public-inbox-watch to process the message?

- Does public-inbox-watch ever modify the message data?

- In general, public-inbox-watch prints very little about what it is
  doing, which makes it hard(er) to trace problems; a verbose flag
  would be a nice addition, I think.
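
To make the Date:/From: fallback question above concrete, here is a
rough sketch in Python. It is only an illustration under my own
assumptions, not how public-inbox-watch works (public-inbox is written
in Perl): the helper name guess_date_and_author and the date regex are
hypothetical. It takes the first parseable timestamp from Date: or,
failing that, from a Received: header, and uses Sender: when From: is
absent.

```python
import re
from email import message_from_string
from email.utils import parsedate_to_datetime

# Matches an RFC 822-style timestamp such as "Fri, 27 Nov 92 16:24:50 +0100".
# Assumption: a numeric timezone offset is present; other forms are ignored.
_DATE_RE = re.compile(
    r"(?:[A-Za-z]{3},\s*)?\d{1,2}\s+[A-Za-z]{3}\s+\d{2,4}\s+"
    r"\d{2}:\d{2}:\d{2}\s+[+-]\d{4}"
)


def guess_date_and_author(raw):
    """Hypothetical helper: return (datetime or None, author or None)."""
    msg = message_from_string(raw)

    # Prefer Date:, then fall back to any Received: header, in order.
    candidates = [msg.get("Date")] + (msg.get_all("Received") or [])
    when = None
    for value in candidates:
        match = _DATE_RE.search(value or "")
        if not match:
            continue
        try:
            when = parsedate_to_datetime(match.group(0))
            break
        except (TypeError, ValueError):
            # Unparseable timestamp; try the next candidate header.
            continue

    # Use Sender: only when From: is missing or empty.
    author = msg.get("From") or msg.get("Sender")
    return when, author
```

Run against the header quoted above, this would take the timestamp
from the Received: line and the author from Sender:. Whether guessing
like this is acceptable for an archive is a separate question.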

Thanks!

Best wishes,
Nicolás

On Mon, Mar 5, 2018 at 3:07 AM, Eric Wong <e@80x24.org> wrote:
> Nicolás Ojeda Bär <n.oje.bar@gmail.com> wrote:
>> Hello,
>>
>> Thanks very much for this great project.
>>
>> I am a bit puzzled about the difference between public-inbox and ssoma. In particular:
>>
>> - What is the difference between public-inbox-mda and ssoma-mda ?
>
> public-inbox-mda is more suitable for public endpoints where
> it's the primary entry point for publicly shared mail.
> ssoma-mda is/was intended for personal mail.  Originally,
> public-inbox depended on and used ssoma, but that was given up
> for more performance.
>
> Sidenote: I don't recommend public-inbox-mda for running
> _mirrors_ of existing mailing lists since it's stricter than
> what most lists accept.  public-inbox-watch is more lenient and
> more performant (on Linux with inotify, at least); so I wrote
> it for mirroring.
>
>> - Are the git repository formats the same for public-inbox and ssoma ?
>
> Currently they are the same with one exception: ssoma allows two
> different messages (different blob SHA-1) to have the same
> Message-Id by default; public-inbox (current version) does not.
> (ssoma-mda has a "-1" option to disable duplicate Message-Id).
>
> The work-in-progress "v2" public-inbox format diverges and I
> don't currently have plans to port ssoma to use it.  The v1
> format will remain supported in public-inbox.
>
> I'm not sure if ssoma is worth the effort any more, as it's too
> much effort to promote a new sync protocol (even if based on
> git).  I'd rather improve NNTP servers and clients as an option
> for people to read public inboxes.
>
>> Any comments appreciated.
>>
>> Thanks a lot!
>
> No problem, thanks for your interest.
