NAME
    public-inbox v1 git repository and tree description (aka "ssoma")

DESCRIPTION
    WARNING: this does NOT describe the scalable v2 format used by
    public-inbox. Use of ssoma is not recommended for new installations due
    to scalability problems.

    ssoma uses a git repository to store each email as a git blob. The tree
    filename of the blob is based on the SHA1 hexdigest of the first
    Message-ID header. A commit is made for each message delivered. The
    commit SHA-1 identifier is used by ssoma clients to track
    synchronization state.

PATHNAMES IN TREES
    A Message-ID may be extremely long and also contain slashes, so using
    them as a path name is challenging. Instead we use the SHA-1 hexdigest
    of the Message-ID (excluding the leading "<" and trailing ">") to
    generate a path name. Leading and trailing white space in the Message-ID
    header is ignored for hashing.

    A message with Message-ID of: <20131106023245.GA20224@dcvr.yhbt.net>

    Would be stored as: f2/8c6cfd2b0a65f994c3e1be266105413b3d3f63

    Thus it is easy to look up the contents of a message matching a given a
    Message-ID.

MESSAGE-ID CONFLICTS
    public-inbox v1 repositories currently do not resolve conflicting
    Message-IDs or messages with multiple Message-IDs.

HEADERS
    The Message-ID header is required. "Bytes", "Lines" and "Content-Length"
    headers are stripped and not allowed, they can interfere with further
    processing. When using ssoma with public-inbox-mda, the "Status" mbox
    header is also stripped as that header makes no sense in a public
    archive.

LOCKING
    flock(2) locking exclusively locks the empty $GIT_DIR/ssoma.lock file
    for all non-atomic operations.

EXAMPLE INPUT FLOW (SERVER-SIDE MDA)
    1. Message is delivered to a mail transport agent (MTA)

    1a. (optional) reject/discard spam, this should run before ssoma-mda

    1b. (optional) reject/strip unwanted attachments

    ssoma-mda handles all steps once invoked.

    2. Mail transport agent invokes ssoma-mda

    3. reads message via stdin, extracting Message-ID

    4. acquires exclusive flock lock on $GIT_DIR/ssoma.lock

    5. creates or updates the blob of associated 2/38 SHA-1 path

    6. updates the index and commits

    7. releases $GIT_DIR/ssoma.lock

    ssoma-mda can also be used as an inotify(7) trigger to monitor maildirs,
    and the ability to monitor IMAP mailboxes using IDLE will be available
    in the future.

GIT REPOSITORIES (SERVERS)
    ssoma uses bare git repositories on both servers and clients.

    Using the git-init(1) command with --bare is the recommend method of
    creating a git repository on a server:

            git init --bare /path/to/wherever/you/want.git

    There are no standardized paths for servers, administrators make all the
    choices regarding git repository locations.

    Special files in $GIT_DIR on the server:

    $GIT_DIR/ssoma.lock
        An empty file for flock(2) locking. This is necessary to ensure the
        index and commits are updated consistently and multiple processes
        running MDA do not step on each other.

    $GIT_DIR/public-inbox/msgmap.sqlite3
        SQLite3 database maintaining a stable mapping of Message-IDs to NNTP
        article numbers. Used by public-inbox-nntpd(1) and created and
        updated by public-inbox-index(1).

        Automatically updated by public-inbox-mda(1), public-inbox-learn(1)
        and public-inbox-watch(1).

        Losing or damaging this file will cause synchronization problems for
        NNTP clients. This file is expected to be stable and require no
        updates to its schema.

        Requires DBD::SQLite.

    $GIT_DIR/public-inbox/xapian$N/
        Xapian database for search indices in the PSGI web UI.

        $N is the value of PublicInbox::Search::SCHEMA_VERSION, and
        installations may have parallel versions on disk during upgrades or
        to roll-back upgrades.

        This is created and updated by public-inbox-index(1).

        Automatically updated by public-inbox-mda(1), public-inbox-learn(1)
        and public-inbox-watch(1).

        This directory can always be regenerated with public-inbox-index(1).
        If lost or damaaged, there is no need to back it up unless the
        CPU/memory cost of regenerating it outweighs the storage/transfer
        cost.

        Since SCHEMA_VERSION 15 and the development of the v2 format, the
        "overview" DB also exists in the xapian directory for v1
        repositories. See "OVERVIEW DB" in public-inbox-v2-format(5)

    $GIT_DIR/ssoma.index
        This file is no longer used or created by public-inbox, but it is
        updated if it exists to remain compatible with ssoma installations.

        A git index file used for MDA updates. The normal git index (in
        $GIT_DIR/index) is not used at all as there is typically no working
        tree.

    Each client $GIT_DIR may have multiple mbox/maildir/command targets. It
    is possible for a client to extract the mail stored in the git
    repository to multiple mboxes for compatibility with a variety of
    different tools.

CAVEATS
    It is NOT recommended to check out the working directory of a git. there
    may be many files.

    It is impossible to completely expunge messages, even spam, as git
    retains full history. Projects may (with adequate notice) cycle to new
    repositories/branches with history cleaned up via git-filter-branch(1).
    This is up to the administrators.

COPYRIGHT
    Copyright 2013-2019 all contributors <mailto:meta@public-inbox.org>

    License: AGPL-3.0+ <http://www.gnu.org/licenses/agpl-3.0.txt>

SEE ALSO
    gitrepository-layout(5), ssoma(1)