From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on dcvr.yhbt.net
X-Spam-Level:
X-Spam-ASN:
X-Spam-Status: No, score=-4.0 required=3.0 tests=ALL_TRUSTED,BAYES_00
	shortcircuit=no autolearn=ham autolearn_force=no version=3.4.2
Received: from localhost (dcvr.yhbt.net [127.0.0.1])
	by dcvr.yhbt.net (Postfix) with ESMTP id E8F811F5AE
	for ; Wed, 21 Apr 2021 10:03:15 +0000 (UTC)
From: Eric Wong
To: meta@public-inbox.org
Subject: [PATCH] doc: add lei_design_notes.txt and lei-store-format(5)
Date: Wed, 21 Apr 2021 10:03:15 +0000
Message-Id: <20210421100315.7455-1-e@80x24.org>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
List-Id:

lei itself is a somewhat weird design, but lei/store is a
fairly normal hybrid of extindex with v2-style storage.
---
 Documentation/lei-store-format.pod | 91 ++++++++++++++++++++++++++++++
 Documentation/lei_design_notes.txt | 20 +++++++
 MANIFEST                           |  2 +
 Makefile.PL                        |  2 +-
 4 files changed, 114 insertions(+), 1 deletion(-)
 create mode 100644 Documentation/lei-store-format.pod
 create mode 100644 Documentation/lei_design_notes.txt

diff --git a/Documentation/lei-store-format.pod b/Documentation/lei-store-format.pod
new file mode 100644
index 00000000..a42c770e
--- /dev/null
+++ b/Documentation/lei-store-format.pod
@@ -0,0 +1,91 @@
+% public-inbox developer manual
+
+=head1 NAME
+
+lei-store-format - lei/store format description
+
+=head1 DESCRIPTION
+
+C is a hybrid store based on L
+("extindex") combined with L ("v2") for blob
+storage.  While v2 is ideal for archiving a single public mailing list,
+it was never intended for personal mail or for storing multiple
+blobs of the "same" message.
+
+As with extindex, it can index disparate C headers
+belonging to the "same" message with different git blob OIDs.
+Unlike v2 and extindex, C headers are NOT required,
+allowing unsent draft messages to be stored and indexed.
+
+=head1 DIRECTORY LAYOUT
+
+Blob storage exists in the form of v2-style epochs.  These epochs
+are under the C directory (instead of C) to
+prevent them from being accidentally treated as a v2 inbox.
+
+=head2 INDEX OVERVIEW AND DEFINITIONS
+
+  $EPOCH - Integer starting with 0 based on time
+  $SCHEMA_VERSION - DB schema version (for Xapian)
+  $SHARD - Integer starting with 0 based on parallelism
+
+  ~/.local/share/lei/store
+  - ipc.lock                        # lock file for internal lei IPC
+  - local/$EPOCH.git                # normal bare git repositories
+
+Additionally, the following share the same roles they do in extindex:
+
+  - ei.lock                         # lock file to protect global state
+  - ALL.git                         # empty, alternates for local/*.git
+  - ei$SCHEMA_VERSION/$SHARD        # per-shard Xapian DB
+  - ei$SCHEMA_VERSION/over.sqlite3  # overview DB for WWW, IMAP
+  - ei$SCHEMA_VERSION/misc          # misc Xapian DB
+
+=head2 XREF3 DEDUPLICATION
+
+Index deduplication follows extindex, see
+L for
+more information.
+
+=head2 BLOB DEDUPLICATION
+
+The contents of C repos are deduplicated by git blob
+object IDs (currently SHA-1).  This allows multiple copies of
+cross-posted and personally Cc-ed messages to be stored with
+different C, C and similar headers to
+allow troubleshooting.
+
+=head2 VOLATILE METADATA
+
+Keywords and label information (as described in RFC 8621 for JMAP)
+are stored in existing Xapian shards (C).
+It is possible to search for messages matching labels and
+keywords using C and C, respectively.  As with all data
+stored in Xapian indices, volatile metadata is associated with
+the Xapian document, so it is shared across different blobs of
+the "same" message.
+
+=head1 IPC
+
+When L is run in daemon mode, L on
+C is used to serialize writes to C across
+multiple internal lei workers while minimizing commits.
+
+=head1 CAVEATS
+
+Reindexing and synchronization are not yet supported.
+
+=head1 THANKS
+
+Thanks to the Linux Foundation for sponsoring the development
+and testing.
+
+=head1 COPYRIGHT
+
+Copyright 2021 all contributors L
+
+License: AGPL-3.0+ L
+
+=head1 SEE ALSO
+
+L, L
diff --git a/Documentation/lei_design_notes.txt b/Documentation/lei_design_notes.txt
new file mode 100644
index 00000000..a5606c05
--- /dev/null
+++ b/Documentation/lei_design_notes.txt
@@ -0,0 +1,20 @@
+lei design notes
+----------------
+
+Daemon architecture
+-------------------
+
+The use of a persistent daemon works around the slow startup time of
+Perl.  This is especially important for built-in support for
+shell completion.  It will eventually support inotify and EVFILT_VNODE
+background monitoring of Maildir keyword changes.
+
+If lei were reimplemented in a language with faster startup
+time, the daemon architecture would likely remain since it also
+lets us easily decouple the local storage from slow IMAP/NNTP
+backends and allows us to serialize writes to git-fast-import,
+SQLite, and Xapian across multiple processes.
+
+The coupling of IMAP and NNTP network latency to local storage
+is a current weakness of public-inbox-watch.  Therefore, -watch
+will likely adopt the daemon architecture of lei in the future.
diff --git a/MANIFEST b/MANIFEST
index 1a1d72a6..e0f9c35b 100644
--- a/MANIFEST
+++ b/MANIFEST
@@ -35,8 +35,10 @@ Documentation/lei-mail-formats.pod
 Documentation/lei-overview.pod
 Documentation/lei-p2q.pod
 Documentation/lei-q.pod
+Documentation/lei-store-format.pod
 Documentation/lei-tag.pod
 Documentation/lei.pod
+Documentation/lei_design_notes.txt
 Documentation/marketing.txt
 Documentation/mknews.perl
 Documentation/public-inbox-compact.pod
diff --git a/Makefile.PL b/Makefile.PL
index 129b082d..85b18e7d 100644
--- a/Makefile.PL
+++ b/Makefile.PL
@@ -50,7 +50,7 @@ $v->{-m1} = [ map {
 	lei-tag lei-p2q lei-q)];
 $v->{-m5} = [ qw(public-inbox-config public-inbox-v1-format
 		public-inbox-v2-format public-inbox-extindex-format
-		lei-mail-formats
+		lei-mail-formats lei-store-format
 		) ];
 $v->{-m7} = [ qw(lei-overview public-inbox-overview public-inbox-tuning
 		public-inbox-glossary) ];
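
A note on the BLOB DEDUPLICATION section of lei-store-format above: it may help readers to see why two copies of the "same" message end up as separate blobs. The following is a minimal Python sketch, not lei's implementation. It uses git's standard SHA-1 blob hashing ("blob", a space, the byte length, a NUL, then the content). The function name git_blob_oid, the header X-Hypothetical-Header, and the dict standing in for the object store are all assumptions made for illustration. It also says nothing about any normalization lei may apply before hashing.

```python
import hashlib


def git_blob_oid(content: bytes) -> str:
    """Return the SHA-1 object ID git assigns to a blob with this content."""
    # git hashes a short header ("blob <size>\0") followed by the raw bytes
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


# Two hypothetical copies of the "same" message that differ in one header
# (the header name is made up for this sketch)
body = b"Subject: hello\r\n\r\nhi\r\n"
copy_a = b"X-Hypothetical-Header: via list\r\n" + body
copy_b = b"X-Hypothetical-Header: via Cc\r\n" + body

# A dict keyed by OID stands in for the object store: byte-identical
# copies collapse into one entry, copies with differing headers do not
store = {}
for raw in (copy_a, copy_b, copy_a):
    store.setdefault(git_blob_oid(raw), raw)

print(len(store))  # 2
```

Storing copy_a a second time adds nothing, while copy_b is kept as its own blob. That matches the behavior the section describes: exact duplicates are dropped, and differing headers are preserved for troubleshooting.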