about summary refs log tree commit homepage
path: root/Documentation/public-inbox-v1-format.pod
diff options
context:
space:
mode:
Diffstat (limited to 'Documentation/public-inbox-v1-format.pod')
-rw-r--r--Documentation/public-inbox-v1-format.pod171
1 files changed, 171 insertions, 0 deletions
diff --git a/Documentation/public-inbox-v1-format.pod b/Documentation/public-inbox-v1-format.pod
new file mode 100644
index 00000000..2a6b8d3c
--- /dev/null
+++ b/Documentation/public-inbox-v1-format.pod
@@ -0,0 +1,171 @@
+% public-inbox developer manual
+
+=head1 NAME
+
+public-inbox v1 git repository and tree description (aka "ssoma")
+
+=head1 DESCRIPTION
+
+WARNING: this does NOT describe the scalable v2 format used
+by public-inbox.  Use of ssoma is not recommended for new
+installations due to scalability problems.
+
+ssoma uses a git repository to store each email as a git blob.
+The tree filename of the blob is based on the SHA1 hexdigest of
+the first Message-ID header.  A commit is made for each message
+delivered.  The commit SHA-1 identifier is used by ssoma clients
+to track synchronization state.
+
+=head1 PATHNAMES IN TREES
+
+A Message-ID may be extremely long and also contain slashes, so using
+them as a path name is challenging.  Instead we use the SHA-1 hexdigest
+of the Message-ID (excluding the leading "E<lt>" and trailing "E<gt>")
+to generate a path name.  Leading and trailing white space in the
+Message-ID header is ignored for hashing.
+
+A message with Message-ID of: E<lt>20131106023245.GA20224@dcvr.yhbt.netE<gt>
+
+Would be stored as: f2/8c6cfd2b0a65f994c3e1be266105413b3d3f63
+
+Thus it is easy to look up the contents of a message matching a given
+a Message-ID.
+
+=head1 MESSAGE-ID CONFLICTS
+
+public-inbox v1 repositories currently do not resolve conflicting
+Message-IDs or messages with multiple Message-IDs.
+
+=head1 HEADERS
+
+The Message-ID header is required.
+"Bytes", "Lines" and "Content-Length" headers are stripped and not
+allowed, they can interfere with further processing.
+When using ssoma with public-inbox-mda, the "Status" mbox header
+is also stripped as that header makes no sense in a public archive.
+
+=head1 LOCKING
+
+L<flock(2)> locking exclusively locks the empty $GIT_DIR/ssoma.lock file
+for all non-atomic operations.
+
+=head1 EXAMPLE INPUT FLOW (SERVER-SIDE MDA)
+
+1. Message is delivered to a mail transport agent (MTA)
+
+1a. (optional) reject/discard spam, this should run before ssoma-mda
+
+1b. (optional) reject/strip unwanted attachments
+
+ssoma-mda handles all steps once invoked.
+
+2. Mail transport agent invokes ssoma-mda
+
+3. reads message via stdin, extracting Message-ID
+
+4. acquires exclusive flock lock on $GIT_DIR/ssoma.lock
+
+5. creates or updates the blob of associated 2/38 SHA-1 path
+
+6. updates the index and commits
+
+7. releases $GIT_DIR/ssoma.lock
+
+ssoma-mda can also be used as an L<inotify(7)> trigger to monitor maildirs,
+and the ability to monitor IMAP mailboxes using IDLE will be available
+in the future.
+
+=head1 GIT REPOSITORIES (SERVERS)
+
+ssoma uses bare git repositories on both servers and clients.
+
+Using the L<git-init(1)> command with --bare is the recommend method
+of creating a git repository on a server:
+
+        git init --bare /path/to/wherever/you/want.git
+
+There are no standardized paths for servers, administrators make
+all the choices regarding git repository locations.
+
+Special files in $GIT_DIR on the server:
+
+=over
+
+=item $GIT_DIR/ssoma.lock
+
+An empty file for L<flock(2)> locking.
+This is necessary to ensure the index and commits are updated
+consistently and multiple processes running MDA do not step on
+each other.
+
+=item $GIT_DIR/public-inbox/msgmap.sqlite3
+
+SQLite3 database maintaining a stable mapping of Message-IDs to NNTP
+article numbers.  Used by L<public-inbox-nntpd(1)> and created
+and updated by L<public-inbox-index(1)>.
+
+Automatically updated by L<public-inbox-mda(1)>,
+L<public-inbox-learn(1)> and L<public-inbox-watch(1)>.
+
+Losing or damaging this file will cause synchronization problems for
+NNTP clients.  This file is expected to be stable and require no
+updates to its schema.
+
+Requires L<DBD::SQLite>.
+
+=item $GIT_DIR/public-inbox/xapian$N/
+
+Xapian database for search indices in the PSGI web UI.
+
+$N is the value of PublicInbox::Search::SCHEMA_VERSION, and
+installations may have parallel versions on disk during upgrades
+or to roll-back upgrades.
+
+This is created and updated by L<public-inbox-index(1)>.
+
+Automatically updated by L<public-inbox-mda(1)>,
+L<public-inbox-learn(1)> and L<public-inbox-watch(1)>.
+
+This directory can always be regenerated with L<public-inbox-index(1)>.
+If lost or damaaged, there is no need to back it up unless the
+CPU/memory cost of regenerating it outweighs the storage/transfer cost.
+
+Since SCHEMA_VERSION 15 and the development of the v2 format,
+the "overview" DB also exists in the xapian directory for v1
+repositories.  See L<public-inbox-v2-format(5)/OVERVIEW DB>
+
+=item $GIT_DIR/ssoma.index
+
+This file is no longer used or created by public-inbox, but it is
+updated if it exists to remain compatible with ssoma installations.
+
+A git index file used for MDA updates.  The normal git index (in
+$GIT_DIR/index) is not used at all as there is typically no working
+tree.
+
+=back
+
+Each client $GIT_DIR may have multiple mbox/maildir/command targets.
+It is possible for a client to extract the mail stored in the git
+repository to multiple mboxes for compatibility with a variety of
+different tools.
+
+=head1 CAVEATS
+
+It is NOT recommended to check out the working directory of a git.
+there may be many files.
+
+It is impossible to completely expunge messages, even spam, as git
+retains full history.  Projects may (with adequate notice) cycle to new
+repositories/branches with history cleaned up via L<git-filter-branch(1)>.
+This is up to the administrators.
+
+=head1 COPYRIGHT
+
+Copyright 2013-2019 all contributors L<mailto:meta@public-inbox.org>
+
+License: AGPL-3.0+ L<http://www.gnu.org/licenses/agpl-3.0.txt>
+
+=head1 SEE ALSO
+
+L<gitrepository-layout(5)>, L<ssoma(1)>