name: Documentation/lei-store-format.pod 	 # note: path name is non-authoritative(*)

% public-inbox developer manual

=head1 NAME

lei-store-format - lei/store format description

=head1 DESCRIPTION

C<lei/store> is a hybrid store based on L<public-inbox-extindex-format(5)>
("extindex") combined with L<public-inbox-v2-format(5)> ("v2") for blob
storage.  While v2 is ideal for archiving a single public mailing list,
it was never intended for personal mail or for storing multiple
blobs of the "same" message.

As with extindex, it can index disparate C<List-Id> headers
belonging to the "same" message with different git blob OIDs.
Unlike v2 and extindex, C<Message-ID> headers are NOT required,
allowing unsent draft messages to be stored and indexed.

=head1 DIRECTORY LAYOUT

Blob storage exists in the form of v2-style epochs.  These epochs
are under the C<local/> directory (instead of C<git/>) to
prevent them from being accidentally treated as a v2 inbox.

=head2 INDEX OVERVIEW AND DEFINITIONS

  $EPOCH - Integer starting with 0 based on time
  $SCHEMA_VERSION - DB schema version (for Xapian)
  $SHARD - Integer starting with 0 based on parallelism

  ~/.local/share/lei/store
  - ipc.lock                        # lock file for internal lei IPC
  - local/$EPOCH.git                # normal bare git repositories
  - mail_sync.sqlite3               # sync state for IMAP, Maildir, NNTP

Additionally, the following share the same roles they do in extindex:

  - ei.lock                         # lock file to protect global state
  - ALL.git                         # empty, alternates for local/*.git
  - ei$SCHEMA_VERSION/$SHARD        # per-shard Xapian DB
  - ei$SCHEMA_VERSION/over.sqlite3  # overview DB for WWW, IMAP
  - ei$SCHEMA_VERSION/misc          # misc Xapian DB
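
The layout above can be spelled out as a list of expected paths.  The
following is a minimal sketch, not code from lei itself: the function
name C<store_paths> and its parameters are hypothetical, and the
schema version passed in the usage example is an arbitrary placeholder.

```python
from pathlib import PurePosixPath


def store_paths(root, schema_version, epochs, shards):
    """List the paths described in the layout above (illustrative only)."""
    root = PurePosixPath(root)
    ei = root / f"ei{schema_version}"
    paths = [
        root / "ipc.lock",           # lock file for internal lei IPC
        root / "mail_sync.sqlite3",  # sync state
        root / "ei.lock",            # lock file to protect global state
        root / "ALL.git",            # alternates for local/*.git
    ]
    # $EPOCH and $SHARD are both integers starting with 0
    paths += [root / "local" / f"{n}.git" for n in range(epochs)]
    paths += [ei / str(n) for n in range(shards)]
    paths += [ei / "over.sqlite3", ei / "misc"]
    return [str(p) for p in paths]


if __name__ == "__main__":
    # 15 is a placeholder, not a claim about the real schema version
    for p in store_paths("~/.local/share/lei/store", 15, epochs=1, shards=2):
        print(p)
```

Enumerating the paths this way makes it easy to see that only
C<local/$EPOCH.git> and C<ei$SCHEMA_VERSION/$SHARD> grow in number;
everything else is a fixed name.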

=head2 XREF3 DEDUPLICATION

Index deduplication follows extindex, see
L<public-inbox-extindex-format(5)/XREF3 DEDUPLICATION> for
more information.

=head2 BLOB DEDUPLICATION

The contents of C<local/*.git> repos are deduplicated by git blob
object IDs (currently SHA-1).  Only byte-identical copies collapse
into a single blob, so multiple copies of cross-posted and personally
Cc-ed messages can be stored with different C<Received:>,
C<X-Spam-Status:> and similar headers to allow troubleshooting.
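
To illustrate why copies with differing headers are kept as separate
blobs, here is a small sketch that computes SHA-1 git blob object IDs
the way git does (hashing a C<blob $SIZE\0> header followed by the
content).  The message text is made up for the example, and the
C<store> dict is only a stand-in for a git object database.

```python
import hashlib


def git_blob_oid(data: bytes) -> str:
    """Return the SHA-1 git object ID of a blob with the given content."""
    header = b"blob %d\0" % len(data)
    return hashlib.sha1(header + data).hexdigest()


# Hypothetical message body shared by two delivered copies
body = b"Subject: hello\r\n\r\nhi\r\n"
via_list = b"Received: from list.example\r\n" + body
via_cc = b"Received: from mx.example\r\n" + body

store = {}  # stand-in for blob storage, keyed by object ID
for raw in (via_list, via_cc, via_list):  # third copy is byte-identical
    store[git_blob_oid(raw)] = raw

# Two distinct blobs remain: the identical copy was deduplicated,
# while the copy with a different Received: header was kept.
print(len(store))
```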

=head2 VOLATILE METADATA

Keywords and label information (as described in RFC 8621 for JMAP)
are stored in existing Xapian shards (C<ei$SCHEMA_VERSION/$SHARD>).
It is possible to search for messages matching labels and
keywords using C<L:> and C<kw:>, respectively.  As with all data
stored in Xapian indices, volatile metadata is associated with
the Xapian document and is thus shared across different blobs of
the "same" message.
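
The sharing described above can be pictured with a toy model.  This is
a sketch under stated assumptions, not the Xapian schema lei uses: the
C<Document> class, the C<matching> helper, and the placeholder object
IDs are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    """Toy stand-in for one Xapian document (one "same" message)."""
    blob_oids: set = field(default_factory=set)
    keywords: set = field(default_factory=set)  # what kw: would match
    labels: set = field(default_factory=set)    # what L: would match


def matching(docs, kw=None, label=None):
    """Return documents carrying the given keyword and/or label."""
    return [d for d in docs
            if (kw is None or kw in d.keywords)
            and (label is None or label in d.labels)]


# One document, two blobs of the "same" message (placeholder IDs)
doc = Document(blob_oids={"oid-of-list-copy", "oid-of-cc-copy"})
doc.keywords.add("seen")
doc.labels.add("inbox")

# The metadata lives on the document, so it applies to both blobs.
print(matching([doc], kw="seen") == [doc])
```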

=head1 IPC

When L<lei(1)> is run in daemon mode, L<flock(2)> on C<ipc.lock>
is used to serialize writes to C<lei/store> across multiple
internal lei workers while minimizing commits.
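
As a rough illustration of flock-based serialization (not lei's actual
implementation), the sketch below takes an exclusive L<flock(2)> lock
on a lock file for the duration of a write.  The C<store_lock> helper
is hypothetical and the example assumes a POSIX system where Python's
C<fcntl> module is available.

```python
import fcntl
import os
from contextlib import contextmanager


@contextmanager
def store_lock(path):
    """Hold an exclusive flock(2) lock on `path` while the body runs."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX)  # blocks until other holders release
        yield fd
    finally:
        fcntl.flock(fd, fcntl.LOCK_UN)
        os.close(fd)


if __name__ == "__main__":
    import tempfile
    lock_path = os.path.join(tempfile.mkdtemp(), "ipc.lock")
    with store_lock(lock_path):
        # Only one worker at a time reaches this point; several
        # queued writes could be batched here before committing.
        pass
```

Batching several writes under a single lock acquisition is one way a
writer could keep the number of commits low.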

=head1 CAVEATS

Reindexing and synchronization are not yet supported.

=head1 THANKS

Thanks to the Linux Foundation for sponsoring the development
and testing.

=head1 COPYRIGHT

Copyright 2021 all contributors L<mailto:meta@public-inbox.org>

License: AGPL-3.0+ L<http://www.gnu.org/licenses/agpl-3.0.txt>

=head1 SEE ALSO

L<public-inbox-v2-format(5)>, L<public-inbox-extindex-format(5)>

(*) Git path names are given by the tree(s) the blob belongs to.
    Blobs themselves have no identifier aside from the hash of their contents.
