% public-inbox developer manual =head1 NAME public-inbox v1 git repository and tree description (aka "ssoma") =head1 DESCRIPTION WARNING: this does NOT describe the scalable v2 format used by public-inbox. Use of ssoma is not recommended for new installations due to scalability problems. ssoma uses a git repository to store each email as a git blob. The tree filename of the blob is based on the SHA1 hexdigest of the first Message-ID header. A commit is made for each message delivered. The commit SHA-1 identifier is used by ssoma clients to track synchronization state. =head1 PATHNAMES IN TREES A Message-ID may be extremely long and also contain slashes, so using them as a path name is challenging. Instead we use the SHA-1 hexdigest of the Message-ID (excluding the leading "E" and trailing "E") to generate a path name. Leading and trailing white space in the Message-ID header is ignored for hashing. A message with Message-ID of: E20131106023245.GA20224@dcvr.yhbt.netE Would be stored as: f2/8c6cfd2b0a65f994c3e1be266105413b3d3f63 Thus it is easy to look up the contents of a message matching a given a Message-ID. =head1 MESSAGE-ID CONFLICTS public-inbox v1 repositories currently do not resolve conflicting Message-IDs or messages with multiple Message-IDs. =head1 HEADERS The Message-ID header is required. "Bytes", "Lines" and "Content-Length" headers are stripped and not allowed, they can interfere with further processing. When using ssoma with public-inbox-mda, the "Status" mbox header is also stripped as that header makes no sense in a public archive. =head1 LOCKING L locking exclusively locks the empty $GIT_DIR/ssoma.lock file for all non-atomic operations. =head1 EXAMPLE INPUT FLOW (SERVER-SIDE MDA) 1. Message is delivered to a mail transport agent (MTA) 1a. (optional) reject/discard spam, this should run before ssoma-mda 1b. (optional) reject/strip unwanted attachments ssoma-mda handles all steps once invoked. 2. Mail transport agent invokes ssoma-mda 3. reads message via stdin, extracting Message-ID 4. acquires exclusive flock lock on $GIT_DIR/ssoma.lock 5. creates or updates the blob of associated 2/38 SHA-1 path 6. updates the index and commits 7. releases $GIT_DIR/ssoma.lock ssoma-mda can also be used as an L trigger to monitor maildirs, and the ability to monitor IMAP mailboxes using IDLE will be available in the future. =head1 GIT REPOSITORIES (SERVERS) ssoma uses bare git repositories on both servers and clients. Using the L command with --bare is the recommend method of creating a git repository on a server: git init --bare /path/to/wherever/you/want.git There are no standardized paths for servers, administrators make all the choices regarding git repository locations. Special files in $GIT_DIR on the server: =over =item $GIT_DIR/ssoma.lock An empty file for L locking. This is necessary to ensure the index and commits are updated consistently and multiple processes running MDA do not step on each other. =item $GIT_DIR/public-inbox/msgmap.sqlite3 SQLite3 database maintaining a stable mapping of Message-IDs to NNTP article numbers. Used by L and created and updated by L. Users of the L interface will find it useful for attempting recovery from copy-paste truncations of URLs containing long Message-IDs. Automatically updated by L, L and L. Losing or damaging this file will cause synchronization problems for NNTP clients. This file is expected to be stable and require no updates to its schema. Requires L. =item $GIT_DIR/public-inbox/xapian$N/ Xapian database for search indices in the PSGI web UI. $N is the value of PublicInbox::Search::SCHEMA_VERSION, and installations may have parallel versions on disk during upgrades or to roll-back upgrades. This is created and updated by L. Automatically updated by L, L and L. This directory can always be regenerated with L. If lost or damaged, there is no need to back it up unless the CPU/memory cost of regenerating it outweighs the storage/transfer cost. Since SCHEMA_VERSION 15 and the development of the v2 format, the "overview" DB also exists in the xapian directory for v1 repositories. See L Our use of the L requires Xapian document IDs to remain stable. Using L and L wrappers are recommended over tools provided by Xapian. This directory is large, often two to three times the size of the objects stored in a packed git repository. =item $GIT_DIR/ssoma.index This file is no longer used or created by public-inbox, but it is updated if it exists to remain compatible with ssoma installations. A git index file used for MDA updates. The normal git index (in $GIT_DIR/index) is not used at all as there is typically no working tree. =back Each client $GIT_DIR may have multiple mbox/maildir/command targets. It is possible for a client to extract the mail stored in the git repository to multiple mboxes for compatibility with a variety of different tools. =head1 CAVEATS It is NOT recommended to check out the working directory of a git. there may be many files. It is impossible to completely expunge messages, even spam, as git retains full history. Projects may (with adequate notice) cycle to new repositories/branches with history cleaned up via L or L. This is up to the administrators. =head1 COPYRIGHT Copyright 2013-2019 all contributors L License: AGPL-3.0+ L =head1 SEE ALSO L, L