From 4f2f0eb94739edf315910451bd25e02b0a668c65 Mon Sep 17 00:00:00 2001
From: Eric Wong
Date: Tue, 16 Jan 2018 22:18:16 +0000
Subject: TODO: notes about v2 format for giant archives

Inspired by interest in LKML archival:
https://public-inbox.org/meta/d5546b24-5840-4ae9-d25b-5e3e737ed73b@linuxfoundation.org
---
 TODO | 31 +++++++++++++++++++++++++++++++
 1 file changed, 31 insertions(+)

diff --git a/TODO b/TODO
index 3163b8a8..605013e4 100644
--- a/TODO
+++ b/TODO
@@ -78,3 +78,34 @@ all need to be considered for everything we introduce)
 * more and better test cases (use git fast-import to speed up creation)
 
 * large mbox/Maildir/MH/NNTP spool import (see PublicInbox::Import)
+
+* Read-only WebDAV interface to the git repo so it can be mounted
+  via davfs2 or fusedav to avoid full clones.
+
+* Improve tree layout to help giant archives (v2 format):
+
+  * Must be optional; old ssoma users may continue using v1
+
+  * Xapian becomes a requirement when using v2; they
+    claim good scalability: https://xapian.org/docs/scalability.html
+
+  * Allow git to perform better deltafication for quoted messages
+
+  * Changing tree layout for deltafication means we need to handle
+    deletes for spam differently than we do now.
+
+  * Deal with duplicate Message-IDs (web UI, at least, not sure about NNTP)
+
+  * (Maybe) SQLite alternatives (MySQL/MariaDB/Pg) for NNTP article
+    number mapping: https://www.sqlite.org/whentouse.html
+
+  * Ref rotation (splitting heads by YYYY or YYYY-MM)
+
+  * Support multiple git repos for a single archive?
+    This seems gross, but splitting large packs in git conflicts
+    with bitmaps and we want to use both features. Perhaps this
+    limitation can be fixed in git instead of merely being documented:
+    https://public-inbox.org/git/20160428072854.GA5252@dcvr.yhbt.net/
+
+  * Optional history squashing to reduce commit and intermediate
+    tree objects
-- 
cgit v1.2.3-24-ge0c7
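
The "Ref rotation (splitting heads by YYYY or YYYY-MM)" item can be illustrated with a small sketch. This is not public-inbox code and does not describe how v2 is or will be implemented: the function name `rotated_ref`, the `refs/heads/<period>` naming, and the choice to bucket by the UTC date taken from a message's Date header are all assumptions made for illustration, and Python is used only because the TODO itself contains no code.

```python
from datetime import timezone
from email.utils import parsedate_to_datetime


def rotated_ref(date_header: str, granularity: str = "year") -> str:
    """Map an RFC 5322 Date header to a hypothetical per-period head.

    Assumption: heads are named refs/heads/YYYY or refs/heads/YYYY-MM
    and messages are bucketed by their date converted to UTC.
    """
    when = parsedate_to_datetime(date_header)
    if when.tzinfo is None:
        # A "-0000" zone parses as a naive datetime; treat it as UTC.
        when = when.replace(tzinfo=timezone.utc)
    when = when.astimezone(timezone.utc)

    if granularity == "year":
        return f"refs/heads/{when:%Y}"
    if granularity == "month":
        return f"refs/heads/{when:%Y-%m}"
    raise ValueError(f"unknown granularity: {granularity!r}")


if __name__ == "__main__":
    header = "Tue, 16 Jan 2018 22:18:16 +0000"
    print(rotated_ref(header))            # yearly head
    print(rotated_ref(header, "month"))   # monthly head
```

Converting to UTC before bucketing is one possible design choice: it keeps a message sent late on 31 December in a western time zone from landing in a different head depending on where the importer runs.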