From: Nicolás Ojeda Bär
Date: Mon, 5 Mar 2018 12:45:04 +0100
Subject: Re: Relationship between public-inbox and ssoma?
To: Eric Wong
Cc: meta@public-inbox.org
In-Reply-To: <20180305020754.GA11496@dcvr>
References: <20180305020754.GA11496@dcvr>

Hello Eric,

Thanks for the prompt reply. I am trying to migrate a long-lived
mailing list (65k messages over 26 years). Below are some
troubles/questions I am having; any suggestions would be greatly
appreciated.

- public-inbox-watch seems to struggle with very big maildirs; for now
  I am moving the data into the maildir a little at a time, and that
  seems to work. Is there a particular obstacle to making the
  importing process more incremental?

- Trouble due to missing/malformed headers (mostly on very old
  messages). For example, here is the header of a message that trips
  public-inbox-watch:

      From weis@margaux Fri Nov 27 16:24:50 1992
      Received: by margaux.inria.fr, Fri, 27 Nov 92 16:24:50 +0100
      Message-ID: <9211271524.AA29971@margaux.inria.fr>
      To: caml-list@margaux
      Sender: weis@margaux
      Status: O

  The error is:

      fatal: Invalid rfc2822 date "" in ident: <>

  (I guess due to the lack of a Date: field). I added a Date: field
  just to test and noticed that Author: in the git commit was empty, I
  guess due to the use of a Sender: rather than a From: header.

  Do you think it is feasible to improve public-inbox-watch to try to
  extract the date from some other header like above?
  and to use Sender: when From: is not found?

- There are some messages that do not have a Message-Id, but
  public-inbox-watch seems to be able to handle them. Is it the case
  that Date: is the only header that is absolutely necessary for
  public-inbox-watch to process the message?

- Does public-inbox-watch ever modify the message data?

- In general, public-inbox-watch prints very little about what it is
  doing, which makes it hard(er) to trace problems; a verbose flag
  would be a nice addition, I think.

Thanks!

Best wishes,
Nicolás

On Mon, Mar 5, 2018 at 3:07 AM, Eric Wong wrote:
> Nicolás Ojeda Bär wrote:
>> Hello,
>>
>> Thanks very much for this great project.
>>
>> I am a bit puzzled about the difference between public-inbox and
>> ssoma. In particular:
>>
>> - What is the difference between public-inbox-mda and ssoma-mda?
>
> public-inbox-mda is more suitable for public endpoints where
> it's the primary entry point for a publicly-shared mail.
> ssoma-mda is/was intended for personal mail. Originally,
> public-inbox depended on and used ssoma, but that was given up
> for more performance.
>
> Sidenote: I don't recommend public-inbox-mda for running
> _mirrors_ of existing mailing lists since it's stricter than
> what most lists accept. public-inbox-watch is more lenient and
> more performant (on Linux with inotify, at least); so I wrote
> it for mirroring.
>
>> - Are the git repository formats the same for public-inbox and ssoma?
>
> Currently they are the same with one exception: ssoma allows two
> different messages (different blob SHA-1) to have the same
> Message-Id by default; public-inbox (current version) does not.
> (ssoma-mda has a "-1" option to disable duplicate Message-Id).
>
> The work-in-progress "v2" public-inbox format diverges and I
> don't currently have plans to port ssoma to use it. The v1
> format will remain supported in public-inbox.
>
> I'm not sure if ssoma is worth the effort any more, as it's too
> much effort to promote a new sync protocol (even if based on
> git). I'd rather improve NNTP servers and clients as an option
> for people to read public inboxes.
>
>> Any comments appreciated.
>>
>> Thanks a lot!
>
> No problem, thanks for your interest.
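
For reference, the header fallback asked about in the message (take the
date from another header when Date: is missing, and use Sender: when
From: is missing) could be sketched as a preprocessing step run on each
message before it is placed in the maildir. This is only an
illustration in Python using the standard email module; it is not part
of public-inbox, and fill_missing_headers and DATE_RE are hypothetical
names made up for this sketch. It assumes the Received: header contains
an RFC 822-style date somewhere in its text, as in the 1992 example
above.

```python
import email
import re
from email.message import Message
from email.utils import format_datetime, parsedate_to_datetime

# Hypothetical pattern: an RFC 822-style date such as
# "Fri, 27 Nov 92 16:24:50 +0100" anywhere inside a header value.
DATE_RE = re.compile(
    r"(?:[A-Z][a-z]{2},\s*)?\d{1,2}\s+[A-Z][a-z]{2}\s+\d{2,4}\s+"
    r"\d{1,2}:\d{2}(?::\d{2})?\s+(?:[+-]\d{4}|[A-Z]{2,4})"
)


def fill_missing_headers(raw: str) -> Message:
    """Return the parsed message with Date: and From: filled in if absent."""
    msg = email.message_from_string(raw)

    if not msg.get("Date"):
        # Received: headers are prepended by each hop, so the last one
        # is the earliest and usually closest to the sending time.
        for received in reversed(msg.get_all("Received", [])):
            match = DATE_RE.search(received)
            if not match:
                continue
            try:
                when = parsedate_to_datetime(match.group(0))
            except (TypeError, ValueError):
                continue  # unparsable date text, try the next header
            msg["Date"] = format_datetime(when)
            break

    # Fall back to Sender: as the author when From: is missing.
    if not msg.get("From") and msg.get("Sender"):
        msg["From"] = msg["Sender"]

    return msg
```

Writing the result back with msg.as_string() changes the stored message
(two headers are added), so whether this is acceptable depends on how
strictly the archive should preserve the original data.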