From: Eric Wong <e@80x24.org>
To: meta@public-inbox.org
Subject: [PATCH] doc: add lei_design_notes.txt and lei-store-format(5)
Date: Wed, 21 Apr 2021 10:03:15 +0000
Message-ID: <20210421100315.7455-1-e@80x24.org>
lei itself is a somewhat weird design, but lei/store is
a fairly normal hybrid of extindex with v2-style storage.
---
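Reviewer note on the IPC section of lei-store-format(5) in this patch: the
serialization it describes can be illustrated with a minimal sketch. This is
Python rather than the project's Perl, and it is not lei's implementation.
The helper names (store_lock, is_locked, append_record) and the
"journal.demo" file are hypothetical; only the ipc.lock file name and the
use of flock(2) come from the patch.

```python
import fcntl
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def store_lock(store_dir):
    """Hold an exclusive flock(2) on store_dir/ipc.lock while the block runs."""
    fd = os.open(os.path.join(store_dir, "ipc.lock"),
                 os.O_RDWR | os.O_CREAT, 0o600)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX)  # blocks until other holders release
        yield
    finally:
        fcntl.flock(fd, fcntl.LOCK_UN)
        os.close(fd)


def is_locked(store_dir):
    """Probe the lock without blocking, using a separate open file description."""
    fd = os.open(os.path.join(store_dir, "ipc.lock"),
                 os.O_RDWR | os.O_CREAT, 0o600)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        return True  # another holder has the lock
    else:
        fcntl.flock(fd, fcntl.LOCK_UN)
        return False
    finally:
        os.close(fd)


def append_record(store_dir, record):
    """Hypothetical writer: one append per lock acquisition."""
    with store_lock(store_dir):
        with open(os.path.join(store_dir, "journal.demo"), "a") as fh:
            fh.write(record + "\n")


def demo():
    with tempfile.TemporaryDirectory() as d:
        for i in range(3):
            append_record(d, f"msg-{i}")
        with store_lock(d):
            held = is_locked(d)      # probed while the lock is held
        released = is_locked(d)      # probed after release
        with open(os.path.join(d, "journal.demo")) as fh:
            return fh.read().splitlines(), held, released
```

Each worker would wrap its writes in store_lock(), so only one writer
touches the store at a time. The sketch does not model batching several
writes under one lock to reduce commits.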
Documentation/lei-store-format.pod | 91 ++++++++++++++++++++++++++++++
Documentation/lei_design_notes.txt | 20 +++++++
MANIFEST | 2 +
Makefile.PL | 2 +-
4 files changed, 114 insertions(+), 1 deletion(-)
create mode 100644 Documentation/lei-store-format.pod
create mode 100644 Documentation/lei_design_notes.txt
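Reviewer note on the BLOB DEDUPLICATION section of lei-store-format(5) in
this patch: a small sketch of why keying on git blob object IDs keeps
distinct copies of the "same" message while collapsing byte-identical ones.
It assumes SHA-1 object IDs, as the document states is currently the case.
The dedup() helper and the sample messages are hypothetical and do not
reflect lei's actual code.

```python
import hashlib


def git_blob_oid(data: bytes) -> str:
    """SHA-1 git blob object ID: hash of 'blob <size>\\0' followed by the content."""
    header = b"blob %d\0" % len(data)
    return hashlib.sha1(header + data).hexdigest()


def dedup(raw_messages):
    """Hypothetical store keyed by blob OID; identical bytes collapse to one entry."""
    store = {}
    for raw in raw_messages:
        store.setdefault(git_blob_oid(raw), raw)
    return store


# Two deliveries of the "same" message that differ only in Received: headers
msg = b"Message-ID: <1@example.invalid>\nSubject: hi\n\nhello\n"
via_list = b"Received: from list.example.invalid\n" + msg
via_cc = b"Received: from mx.example.invalid\n" + msg

# The duplicate via_list collapses; via_cc stays as a separate blob
store = dedup([via_list, via_cc, via_list])
```

Here store ends up with two entries: the differing Received: headers give
the two deliveries different blob OIDs, so both are retained for
troubleshooting, while the repeated via_list copy is stored only once.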
diff --git a/Documentation/lei-store-format.pod b/Documentation/lei-store-format.pod
new file mode 100644
index 00000000..a42c770e
--- /dev/null
+++ b/Documentation/lei-store-format.pod
@@ -0,0 +1,91 @@
+% public-inbox developer manual
+
+=head1 NAME
+
+lei-store-format - lei/store format description
+
+=head1 DESCRIPTION
+
+C<lei/store> is a hybrid store based on L<public-inbox-extindex-format(5)>
+("extindex") combined with L<public-inbox-v2-format(5)> ("v2") for blob
+storage. While v2 is ideal for archiving a single public mailing list,
+it was never intended for personal mail or for storing multiple
+blobs of the "same" message.
+
+As with extindex, it can index disparate C<List-Id> headers
+belonging to the "same" message with different git blob OIDs.
+Unlike v2 and extindex, C<Message-ID> headers are NOT required,
+allowing unsent draft messages to be stored and indexed.
+
+=head1 DIRECTORY LAYOUT
+
+Blob storage exists in the form of v2-style epochs. These epochs
+are under the C<local/> directory (instead of C<git/>) to
+prevent them from being accidentally treated as a v2 inbox.
+
+=head2 INDEX OVERVIEW AND DEFINITIONS
+
+ $EPOCH - Integer starting with 0 based on time
+ $SCHEMA_VERSION - DB schema version (for Xapian)
+ $SHARD - Integer starting with 0 based on parallelism
+
+ ~/.local/share/lei/store
+ - ipc.lock # lock file for internal lei IPC
+ - local/$EPOCH.git # normal bare git repositories
+
+Additionally, the following share the same roles they do in extindex:
+
+ - ei.lock # lock file to protect global state
+ - ALL.git # empty, alternates for local/*.git
+ - ei$SCHEMA_VERSION/$SHARD # per-shard Xapian DB
+ - ei$SCHEMA_VERSION/over.sqlite3 # overview DB for WWW, IMAP
+ - ei$SCHEMA_VERSION/misc # misc Xapian DB
+
+=head2 XREF3 DEDUPLICATION
+
+Index deduplication follows extindex, see
+L<public-inbox-extindex-format(5)/XREF3 DEDUPLICATION> for
+more information.
+
+=head2 BLOB DEDUPLICATION
+
+The contents of C<local/*.git> repos are deduplicated by git blob
+object IDs (currently SHA-1). This allows multiple copies of
+cross-posted and personally Cc-ed messages to be stored with
+different C<Received:>, C<X-Spam-Status:> and similar headers to
+allow troubleshooting.
+
+=head2 VOLATILE METADATA
+
+Keyword and label information (as described in RFC 8621 for JMAP)
+is stored in existing Xapian shards (C<ei$SCHEMA_VERSION/$SHARD>).
+It is possible to search for messages matching labels and
+keywords using C<L:> and C<kw:>, respectively. As with all data
+stored in Xapian indices, volatile metadata is associated with
+the Xapian document, thus it is shared across different blobs of
+the "same" message.
+
+=head1 IPC
+
+When L<lei(1)> is run in daemon mode, L<flock(2)> on
+C<ipc.lock> is used to serialize writes to C<lei/store> across
+multiple internal lei workers while minimizing commits.
+
+=head1 CAVEATS
+
+Reindexing and synchronization are not yet supported.
+
+=head1 THANKS
+
+Thanks to the Linux Foundation for sponsoring the development
+and testing.
+
+=head1 COPYRIGHT
+
+Copyright 2021 all contributors L<mailto:meta@public-inbox.org>
+
+License: AGPL-3.0+ L<http://www.gnu.org/licenses/agpl-3.0.txt>
+
+=head1 SEE ALSO
+
+L<public-inbox-v2-format(5)>, L<public-inbox-extindex-format(5)>
diff --git a/Documentation/lei_design_notes.txt b/Documentation/lei_design_notes.txt
new file mode 100644
index 00000000..a5606c05
--- /dev/null
+++ b/Documentation/lei_design_notes.txt
@@ -0,0 +1,20 @@
+lei design notes
+----------------
+
+Daemon architecture
+-------------------
+
+The use of a persistent daemon works around the slow startup time of
+Perl. This is especially important for built-in support for
+shell completion. It will eventually support inotify and EVFILT_VNODE
+background monitoring of Maildir keyword changes.
+
+If lei were reimplemented in a language with faster startup
+time, the daemon architecture would likely remain since it also
+lets us easily decouple the local storage from slow IMAP/NNTP
+backends and allows us to serialize writes to git-fast-import,
+SQLite, and Xapian across multiple processes.
+
+The coupling of IMAP and NNTP network latency to local storage
+is a current weakness of public-inbox-watch. Therefore, -watch
+will likely adopt the daemon architecture of lei in the future.
diff --git a/MANIFEST b/MANIFEST
index 1a1d72a6..e0f9c35b 100644
--- a/MANIFEST
+++ b/MANIFEST
@@ -35,8 +35,10 @@ Documentation/lei-mail-formats.pod
Documentation/lei-overview.pod
Documentation/lei-p2q.pod
Documentation/lei-q.pod
+Documentation/lei-store-format.pod
Documentation/lei-tag.pod
Documentation/lei.pod
+Documentation/lei_design_notes.txt
Documentation/marketing.txt
Documentation/mknews.perl
Documentation/public-inbox-compact.pod
diff --git a/Makefile.PL b/Makefile.PL
index 129b082d..85b18e7d 100644
--- a/Makefile.PL
+++ b/Makefile.PL
@@ -50,7 +50,7 @@ $v->{-m1} = [ map {
lei-tag lei-p2q lei-q)];
$v->{-m5} = [ qw(public-inbox-config public-inbox-v1-format
public-inbox-v2-format public-inbox-extindex-format
- lei-mail-formats
+ lei-mail-formats lei-store-format
) ];
$v->{-m7} = [ qw(lei-overview public-inbox-overview public-inbox-tuning
public-inbox-glossary) ];