author | Eric Wong (Contractor, The Linux Foundation) <e@80x24.org> | 2018-02-22 01:49:08 +0000 |
---|---|---|
committer | Eric Wong (Contractor, The Linux Foundation) <e@80x24.org> | 2018-02-22 18:33:46 +0000 |
commit | 9ecbfc09928dada28094fd3fc79e91a5472b27ea (patch) | |
tree | a829ab7765f45e139e8a9d5de1c3784fc26bbf69 /MANIFEST | |
parent | a81ad9c4b1b5d8c2ae8444b6dcb8710bd361f628 (diff) | |
download | public-inbox-9ecbfc09928dada28094fd3fc79e91a5472b27ea.tar.gz |
The parallelization requires splitting Msgmap, text+term indexing, and thread-linking out into separate processes.

git-fast-import is fast, so we don't bother parallelizing it.

Msgmap (SQLite) and thread-linking (Xapian) must be serialized because they rely on monotonically increasing numbers (NNTP article number and internal thread_id, respectively).

We handle msgmap in the main process which drives fast-import. When the article number is retrieved/generated, we write the entire message to per-partition subprocesses via pipes for expensive text+term indexing.

When these per-partition subprocesses are done with the expensive text+term indexing, they write SearchMsg (small data) to a shared pipe (inherited from the main V2Writable process) back to the threader, which runs its own subprocess.

The number of text+term Xapian partitions is chosen at import and can be made equal to the number of cores in a machine.

    V2Writable --> Import -> git-fast-import
               \-> SearchIdxThread -> Msgmap (synchronous)
               \-> SearchIdxPart[n] -> SearchIdx[*]
               \-> SearchIdxThread -> SearchIdx ("threader", a subprocess)

    [*] each subprocess writes to threader
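The process layout described above can be illustrated with a small, self-contained sketch. It is not the public-inbox code, which is Perl and talks to SQLite and Xapian. It is a Python stand-in written under several assumptions: the names `run_pipeline`, `index_part`, and `threader` are hypothetical, partitions are picked with `num % nparts`, a `multiprocessing.Queue` plays the role of the shared pipe, word-splitting stands in for text+term indexing, and thread linking is reduced to "reuse the thread id of any known reference" with no thread merging. It uses the `fork` start method, so it is POSIX-only.

```python
import multiprocessing as mp


def index_part(conn, to_threader):
    """Per-partition worker: the expensive text+term indexing stand-in."""
    while True:
        item = conn.recv()
        if item is None:  # sentinel from the main process: no more messages
            break
        num, mid, refs, body = item
        terms = set(body.lower().split())  # placeholder for real indexing
        # Send only a small record back (plays the role of SearchMsg).
        to_threader.put((num, mid, refs, len(terms)))
    to_threader.put(None)  # tell the threader this partition is finished


def threader(from_parts, nparts, result_conn):
    """Single process that hands out monotonically increasing thread ids."""
    next_tid = 0
    tid_by_mid = {}  # Message-ID (or referenced ID) -> thread id
    linked = {}      # article number -> thread id
    done = 0
    while done < nparts:
        item = from_parts.get()
        if item is None:
            done += 1
            continue
        num, mid, refs, _nterms = item
        # Reuse a thread id already known for this message or its references.
        tid = next((tid_by_mid[m] for m in [mid, *refs] if m in tid_by_mid), None)
        if tid is None:
            next_tid += 1
            tid = next_tid
        # Remember the id for references too, so a parent that arrives after
        # its reply still lands in the same thread (merging is omitted here).
        for m in [mid, *refs]:
            tid_by_mid.setdefault(m, tid)
        linked[num] = tid
    result_conn.send(linked)


def run_pipeline(messages, nparts=2):
    """messages: iterable of (message_id, references, body) tuples."""
    ctx = mp.get_context("fork")  # assumption: POSIX, children inherit handles
    to_threader = ctx.Queue()     # shared channel written by every partition
    res_r, res_w = ctx.Pipe(duplex=False)
    thr = ctx.Process(target=threader, args=(to_threader, nparts, res_w))
    thr.start()

    parts = []
    for _ in range(nparts):
        r, w = ctx.Pipe(duplex=False)
        proc = ctx.Process(target=index_part, args=(r, to_threader))
        proc.start()
        parts.append((proc, w))

    msgmap = {}  # stand-in for Msgmap: assigned serially in the main process
    for mid, refs, body in messages:
        num = len(msgmap) + 1  # monotonically increasing article number
        msgmap[mid] = num
        parts[num % nparts][1].send((num, mid, refs, body))

    for _, w in parts:
        w.send(None)
    linked = res_r.recv()
    for proc, _ in parts:
        proc.join()
    thr.join()
    return msgmap, linked


if __name__ == "__main__":
    demo = [("a@example", [], "hello world"),
            ("b@example", ["a@example"], "re: hello world")]
    print(run_pipeline(demo, nparts=2))
```

Only the two serialized steps (article numbering and thread-id assignment) run in a single process each; the per-partition workers can run concurrently, which is why the order in which records reach the threader is not fixed and thread ids may differ between runs.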
Diffstat (limited to 'MANIFEST')
-rw-r--r-- | MANIFEST | 2 |
1 files changed, 2 insertions, 0 deletions
@@ -84,6 +84,8 @@ lib/PublicInbox/Reply.pm
 lib/PublicInbox/SaPlugin/ListMirror.pm
 lib/PublicInbox/Search.pm
 lib/PublicInbox/SearchIdx.pm
+lib/PublicInbox/SearchIdxPart.pm
+lib/PublicInbox/SearchIdxThread.pm
 lib/PublicInbox/SearchMsg.pm
 lib/PublicInbox/SearchThread.pm
 lib/PublicInbox/SearchView.pm