The v1.2.0 notes are a work in progress, while the others are
copied out of our mail archives. Eventually, a NEWS file will be
generated from these emails and distributed in the release
tarball. There'll also be an Atom feed for the website, reusing
our feed generation code.
---
 .gitattributes                         |   2 +
 Documentation/RelNotes/v1.0.0.eml      |  21 ++
 Documentation/RelNotes/v1.1.0-pre1.eml | 295 +++++++++++++++++++++++++
 Documentation/RelNotes/v1.2.0.wip      |  40 ++++
 MANIFEST                               |   4 +
 5 files changed, 362 insertions(+)
 create mode 100644 .gitattributes
 create mode 100644 Documentation/RelNotes/v1.0.0.eml
 create mode 100644 Documentation/RelNotes/v1.1.0-pre1.eml
 create mode 100644 Documentation/RelNotes/v1.2.0.wip

diff --git a/.gitattributes b/.gitattributes
new file mode 100644
index 0000000..bb53518
--- /dev/null
+++ b/.gitattributes
@@ -0,0 +1,2 @@
+# Email signatures start with "-- \n"
+*.eml whitespace=-blank-at-eol
diff --git a/Documentation/RelNotes/v1.0.0.eml b/Documentation/RelNotes/v1.0.0.eml
new file mode 100644
index 0000000..ae6ea4e
--- /dev/null
+++ b/Documentation/RelNotes/v1.0.0.eml
@@ -0,0 +1,21 @@
+From e@80x24.org Thu Feb 8 02:33:57 2018
+Date: Thu, 8 Feb 2018 02:33:57 +0000
+From: Eric Wong <e@80x24.org>
+To: meta@public-inbox.org
+Subject: [ANNOUNCE] public-inbox 1.0.0
+Message-ID: <20180208023357.GA32591@80x24.org>
+
+After some 3.5 odd years of working on this, I suppose now is
+as good a time as any to tar this up and call it 1.0.0.
+
+The TODO list is still very long and there'll be some new
+development in coming weeks :>
+
+So, here you have a release:
+
+  https://public-inbox.org/releases/public-inbox-1.0.0.tar.gz
+
+Checksums, mainly as a safeguard against accidental file corruption:
+
+SHA-256 4a08569f3d99310f713bb32bec0aa4819d6b41871e0421ec4eec0657a5582216
+  (in other words, don't trust me; instead read the code :>)
diff --git a/Documentation/RelNotes/v1.1.0-pre1.eml b/Documentation/RelNotes/v1.1.0-pre1.eml
new file mode 100644
index 0000000..ee1ecc3
--- /dev/null
+++ b/Documentation/RelNotes/v1.1.0-pre1.eml
@@ -0,0 +1,295 @@
+From e@80x24.org Wed May 9 20:23:03 2018
+Date: Wed, 9 May 2018 20:23:03 +0000
+From: Eric Wong <e@80x24.org>
+To: meta@public-inbox.org
+Cc: Konstantin Ryabitsev <konstantin@linuxfoundation.org>
+Subject: [ANNOUNCE] public-inbox 1.1.0-pre1
+Message-ID: <20180509202303.GA15156@dcvr>
+
+Pre-release for v2 repository support.
+Thanks to The Linux Foundation for supporting this work!
+
+https://public-inbox.org/releases/public-inbox-1.1.0-pre1.tar.gz
+
+SHA-256: d0023770a63ca109e6fe2c58b04c58987d4f81572ac69d18f95d6af0915fa009
+(only intended to guard against accidental file corruption)
+
+shortlog below:
+
+Eric Wong (27):
+      nntp: improve fairness during XOVER and similar commands
+      nntp: do not drain rbuf if there is a command pending
+      extmsg: use news.gmane.org for Message-ID lookups
+      searchview: fix non-numeric comparison
+      mbox: do not barf on queries which return no results
+      nntp: allow and ignore empty commands
+      ensure SQLite and Xapian files respect core.sharedRepository
+      TODO: a few more updates
+      filter/rubylang: do not set altid on spam training
+      import: cleanup git cat-file processes when ->done
+      disallow "\t" and "\n" in OVER headers
+      searchidx: release lock again during v1 batch callback
+      searchidx: remove leftover debugging code
+      convert: copy description and git config from v1 repo
+      view: untangle loop when showing message headers
+      view: wrap To: and Cc: headers in HTML display
+      view: drop redundant References: display code
+      TODO: add EPOLLEXCLUSIVE item
+      searchview: do not blindly append "l" parameter to URL
+      search: avoid repeated mbox results from search
+      msgmap: add limit to response for NNTP
+      thread: prevent hidden threads in /$INBOX/ landing page
+      thread: sort incoming messages by Date
+      searchidx: preserve umask when starting/committing transactions
+      scripts/import_slrnspool: support v2 repos
+      scripts/import_slrnspool: cleanup progress messages
+      public-inbox 1.1.0-pre1
+
+Eric Wong (Contractor, The Linux Foundation) (239):
+      AUTHORS: add The Linux Foundation
+      watch_maildir: allow '-' in mail filename
+      scripts/import_vger_from_mbox: relax From_ line match slightly
+      import: stop writing legacy ssoma.index by default
+      import: begin supporting this without ssoma.lock
+      import: initial handling for v2
+      t/import: test for last_object_id insertion
+      content_id: add test case
+      searchmsg: add mid_mime import for _extract_mid
+      scripts/import_vger_from_mbox: support --dry-run option
+      import: APIs to support v2 use
+      search: free up 'Q' prefix for a real unique identifier
+      searchidx: fix comment around next_thread_id
+      address: extract more characters from email addresses
+      import: pass "raw" dates to git-fast-import(1)
+      scripts/import_vger_from_mbox: use v2 layout for import
+      import: quiet down warnings from bogus From: lines
+      import: allow the epoch (0s) as a valid time
+      extmsg: fix broken Xapian MID lookup
+      search: stop assuming Message-ID is unique
+      www: stop assuming mainrepo == git_dir
+      v2writable: initial cut for repo-rotation
+      git: reload alternates file on missing blob
+      v2: support Xapian + SQLite indexing
+      import_vger_from_inbox: allow "-V" option
+      import_vger_from_mbox: use PublicInbox::MIME and avoid clobbering
+      v2: parallelize Xapian indexing
+      v2writable: round-robin to partitions based on article number
+      searchidxpart: increase pipe size for partitions
+      v2writable: warn on duplicate Message-IDs
+      searchidx: do not modify Xapian DB while iterating
+      v2/ui: some hacky things to get the PSGI UI to show up
+      v2/ui: retry DB reopens in a few more places
+      v2writable: cleanup unused pipes in partitions
+      searchidxpart: binmode
+      use PublicInbox::MIME consistently
+      searchidxpart: chomp line before splitting
+      searchidx*: name child subprocesses
+      searchidx: get rid of pointless index_blob wrapper
+      view: remove X-PI-TS reference
+      searchidxthread: load doc data for references
+      searchidxpart: force integers into add_message
+      search: reopen skeleton DB as well
+      searchidx: index values in the threader
+      search: use different Enquire object for skeleton queries
+      rename SearchIdxThread to SearchIdxSkeleton
+      v2writable: commit to skeleton via remote partitions
+      searchidxskeleton: extra error checking
+      searchidx: do not modify Xapian DB while iterating
+      search: query_xover uses skeleton DB iff available
+      v2/ui: get nntpd and init tests running on v2
+      v2writable: delete ::Import obj when ->done
+      search: remove informational "warning" message
+      searchidx: add PID to error message when die-ing
+      content_id: special treatment for Message-Id headers
+      evcleanup: disable outside of daemon
+      v2writable: deduplicate detection on add
+      evcleanup: do not create event loop if nothing was registered
+      mid: add `mids' and `references' methods for extraction
+      content_id: use `mids' and `references' for MID extraction
+      searchidx: use new `references' method for parsing References
+      content_id: no need to be human-friendly
+      v2writable: inject new Message-IDs on true duplicates
+      search: revert to using 'Q' as a uniQue id per-Xapian conventions
+      searchidx: support indexing multiple MIDs
+      mid: be strict with References, but loose on Message-Id
+      searchidx: avoid excessive XNQ indexing with diffs
+      searchidxskeleton: add a note about locking
+      v2writable: generated Message-ID goes first
+      searchidx: use add_boolean_term for internal terms
+      searchidx: add NNTP article number as a searchable term
+      mid: truncate excessively long MIDs early
+      nntp: use NNTP article numbers for lookups
+      nntp: fix NEWNEWS command
+      searchidx: store the primary MID in doc data for NNTP
+      import: consolidate object info for v2 imports
+      v2: avoid redundant/repeated configs for git partition repos
+      INSTALL: document more optional dependencies
+      search: favor skeleton DB for lookup_mail
+      search: each_smsg_by_mid uses skeleton if available
+      v2writable: remove unnecessary skeleton commit
+      favor Received: date over Date: header globally
+      import: fall back to Sender for extracting name and email
+      scripts/import_vger_from_mbox: perform mboxrd or mboxo escaping
+      v2writable: detect and use previous partition count
+      extmsg: rework partial MID matching to favor current inbox
+      extmsg: rework partial MID matching to favor current inbox
+      content_id: use Sender header if From is not available
+      v2writable: support "barrier" operation to avoid reforking
+      use string ref for Email::Simple->new
+      v2writable: remove unnecessary idx_init call
+      searchidx: do not delete documents while iterating
+      search: allow ->reopen to be chainable
+      v2writable: implement remove correctly
+      skeleton: barrier init requires a lock
+      import: (v2) delete writes the blob into history in subdir
+      import: (v2): write deletes to a separate '_' subdirectory
+      import: implement barrier operation for v1 repos
+      mid: mid_mime uses v2-compatible mids function
+      watchmaildir: use content_digest to generate Message-Id
+      import: force Message-ID generation for v1 here
+      import: switch to URL-safe Base64 for Message-IDs
+      v2writable: test for idempotent removals
+      import: enable locking under v2
+      index: s/GIT_DIR/REPO_DIR/
+      Lock: new base class for writable lockers
+      t/watch_maildir: note the reason for FIFO creation
+      v2writable: ensure ->done is idempotent
+      watchmaildir: support v2 repositories
+      searchidxpart: s/barrier/remote_barrier/
+      v2writable: allow disabling parallelization
+      scripts/import_vger_from_mbox: filter out same headers as MDA
+      v2writable: add DEBUG_DIFF env support
+      v2writable: remove "resent" message for duplicate Message-IDs
+      content_id: do not take Message-Id into account
+      introduce InboxWritable class
+      import: discard all the same headers as MDA
+      InboxWritable: add mbox/maildir parsing + import logic
+      use both Date: and Received: times
+      msgmap: add tmp_clone to create an anonymous copy
+      fix syntax warnings
+      v2writable: support reindexing Xapian
+      t/altid.t: extra tests for mid_set
+      v2writable: add NNTP article number regeneration support
+      v2writable: clarify header cleanups
+      v2writable: DEBUG_DIFF respects $TMPDIR
+      feed: $INBOX/new.atom endpoint supports v2 inboxes
+      import: consolidate mid prepend logic, here
+      www: $MESSAGE_ID/raw endpoint supports "duplicates"
+      search: reopen DB if each_smsg_by_mid fails
+      t/psgi_v2: minimal test for Atom feed and t.mbox.gz
+      feed: fix new.html for v2
+      view: permalink (per-message) view shows multiple messages
+      searchidx: warn about vivifying multiple ghosts
+      v2writable: warn on unseen deleted files
+      www: get rid of unnecessary 'inbox' name reference
+      searchview: remove unnecessary imports from MID module
+      view: depend on SearchMsg for Message-ID
+      http: fix modification of read-only value
+      githttpbackend: avoid infinite loop on generic PSGI servers
+      www: support cloning individual v2 git partitions
+      http: fix modification of read-only value
+      githttpbackend: avoid infinite loop on generic PSGI servers
+      www: remove unnecessary ghost checks
+      v2writable: append, instead of prepending generated Message-ID
+      lookup by Message-ID favors the "primary" one
+      www: fix attachment downloads for conflicted Message-IDs
+      searchmsg: document why we store To: and Cc: for NNTP
+      public-inbox-convert: tool for converting old to new inboxes
+      v2writable: support purging messages from git entirely
+      search: cleanup uniqueness checking
+      search: get rid of most lookup_* subroutines
+      search: move find_doc_ids to searchidx
+      v2writable: cleanup: get rid of unused fields
+      mbox: avoid extracting Message-ID for linkification
+      www: cleanup expensive fallback for legacy URLs
+      view: get rid of some unnecessary imports
+      search: retry_reopen on first_smsg_by_mid
+      import: run_die supports redirects as spawn does
+      v2writable: initializing an existing inbox is idempotent
+      public-inbox-compact: new tool for driving xapian-compact
+      mda: support v2 inboxes
+      search: warn on reopens and die on total failure
+      v2writable: allow gaps in git partitions
+      v2writable: convert some fatal reindex errors to warnings
+      wwwstream: flesh out clone instructions for v2
+      v2writable: go backwards through alternate Message-IDs
+      view: speed up homepage loading time with date clamp
+      view: drop load_results
+      feed: optimize query for feeds, too
+      msgtime: parse 3-digit years properly
+      convert: avoid redundant "done\n" statement for fast-import
+      search: move permissions handling to InboxWritable
+      t/v2writable: use simplify permissions reading
+      v2: respect core.sharedRepository in git configs
+      searchidx: correct warning for over-vivification
+      v2: one file, really
+      v2writable: fix parallel termination
+      truncate Message-IDs and References consistently
+      scripts/import_vger_from_mbox: set address properly
+      search: reduce columns stored in Xapian
+      replace Xapian skeleton with SQLite overview DB
+      v2writable: simplify barrier vs checkpoints
+      t/over: test empty Subject: line matching
+      www: rework query responses to avoid COUNT in SQLite
+      over: speedup get_thread by avoiding JOIN
+      nntp: fix NEWNEWS command
+      t/thread-all.t: modernize test to support modern inboxes
+      rename+rewrite test using Benchmark module
+      nntp: make XOVER, XHDR, OVER, HDR and NEWNEWS faster
+      view: avoid offset during pagination
+      mbox: remove remaining OFFSET usage in SQLite
+      msgmap: replace id_batch with ids_after
+      nntp: simplify the long_response API
+      searchidx: ensure duplicated Message-IDs can be linked together
+      init: s/GIT_DIR/REPO_DIR/ in usage
+      import: rewrite less history during purge
+      v2: support incremental indexing + purge
+      v2writable: do not modify DBs while iterating for ->remove
+      v2writable: recount partitions after acquiring lock
+      searchmsg: remove unused `tid' and `path' methods
+      search: remove unnecessary OP_AND of query
+      mbox: do not sort search results
+      searchview: minor cleanup
+      support altid mechanism for v2
+      compact: better handling of over.sqlite3* files
+      v2writable: remove redundant remove from Over DB
+      v2writable: allow tracking parallel versions
+      v2writable: refer to git each repository as "epoch"
+      over: use only supported and safe SQLite APIs
+      search: index and allow searching by date-time
+      altid: fix miscopied field name
+      nntp: set Xref across multiple inboxes
+      www: favor reading more from SQLite, and less from Xapian
+      ensure Xapian and SQLite are still optional for v1 tests
+      psgi: ensure /$INBOX/$MESSAGE_ID/T/ endpoint is chronological
+      over: avoid excessive SELECT
+      over: remove forked subprocess
+      v2writable: reduce barriers
+      index: allow specifying --jobs=0 to disable multiprocess
+      convert: support converting with altid defined
+      store less data in the Xapian document
+      msgmap: speed up minmax with separate queries
+      feed: respect feedmax, again
+      v1: remove articles from overview DB
+      compact: do not merge v2 repos by default
+      v2writable: reduce partititions by one
+      search: preserve References in Xapian smsg for x=t view
+      v2: generate better Message-IDs for duplicates
+      v2: improve deduplication checks
+      import: cat_blob drops leading 'From ' lines like Inbox
+      searchidx: regenerate and avoid article number gaps on full index
+      extmsg: remove expensive git path checks
+      use %H consistently to disable abbreviations
+      searchidx: increase term positions for all text terms
+      searchidx: revert default BATCH_BYTES to 1_000_000
+      Merge remote-tracking branch 'origin/master' into v2
+      fix tests to run without Xapian installed
+      extmsg: use Xapian only for partial matches
+
+Jonathan Corbet (3):
+      Don't use LIMIT in UPDATE statements
+      Update the installation instructions with Fedora package names
+      Allow specification of the number of search results to return
+-- 
+git clone https://public-inbox.org/ public-inbox
+(working on a homepage... sorta :)
diff --git a/Documentation/RelNotes/v1.2.0.wip b/Documentation/RelNotes/v1.2.0.wip
new file mode 100644
index 0000000..41236a0
--- /dev/null
+++ b/Documentation/RelNotes/v1.2.0.wip
@@ -0,0 +1,40 @@
+To: meta@public-inbox.org
+Subject: [WIP] public-inbox 1.2.0
+
+* first non-pre/rc release with v2 format support for scalability.
+  See public-inbox-v2-format(5) manpage for more details.
+
+* new admin tools for v2 repos:
+  - public-inbox-convert - converts v1 to v2 repo formats
+  - public-inbox-compact - v2 convenience wrapper for xapian-compact(1)
+  - public-inbox-purge - purges entire messages out of v2 history
+  - public-inbox-edit - edits sensitive data out of messages from v2 history
+  - public-inbox-xcpdb - copydatabase(1) wrapper to upgrade Xapian formats
+                         (e.g. from "chert" to "glass") and resharding
+                         of v2 repos
+
+* SQLite3 support decoupled from Xapian support, and Xapian DBs may be
+  configured without phrase support to save space. See "indexlevel" in
+  public-inbox-config(5) manpage for more info.
+
+* public-inbox-nntpd
+  - support STARTTLS and NNTPS
+  - support COMPRESS extension
+  - fix several RFC3977 compliance bugs
+  - improved interoperability with picky clients such as leafnode
+
+* public-inbox-watch
+  - support multiple spam training directories
+  - support mapping multiple inboxes per Maildir
+
+* PublicInbox::WWW
+  - grokmirror-compatible manifest.js.gz endpoint generation
+  - user-configurable color support in $INBOX_URL/_/text/color/
+  - BOFHs may set default colors via "publicinbox.css"
+    (see public-inbox-config(5))
+
+* Danga::Socket is no longer a runtime dependency of daemons.
+
+* improved FreeBSD support
+
+See archives at https://public-inbox.org/meta/ for all history.
diff --git a/MANIFEST b/MANIFEST
index f5290b4..ecf239f 100644
--- a/MANIFEST
+++ b/MANIFEST
@@ -1,7 +1,11 @@
+.gitattributes
 .gitignore
 AUTHORS
 COPYING
 Documentation/.gitignore
+Documentation/RelNotes/v1.0.0.eml
+Documentation/RelNotes/v1.1.0-pre1.eml
+Documentation/RelNotes/v1.2.0.wip
 Documentation/dc-dlvr-spam-flow.txt
 Documentation/design_notes.txt
 Documentation/design_www.txt
-- 
EW
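The commit message says a NEWS file will eventually be generated from these emails. The following is a minimal Python sketch of one way to do that. It is not public-inbox's own tooling; news_entry and build_news are hypothetical names. It assumes each Documentation/RelNotes/*.eml file is a single mbox-style message and that a NEWS entry is simply the Subject followed by the body, with the signature dropped. The sketch also shows why the .gitattributes rule matters: it finds the signature by looking for the exact line "-- ", trailing space included.

```python
import email
from email import policy
from pathlib import Path

SIG_DELIM = "-- "  # conventional signature separator: dash, dash, space


def news_entry(raw: str) -> str:
    """Turn one release-note email into a NEWS section (hypothetical helper).

    The Subject becomes an underlined heading; the body is kept up to,
    but not including, the signature separator line.
    """
    # A leading mbox "From " line is accepted and stored as the Unix-From.
    msg = email.message_from_string(raw, policy=policy.default)
    subject = str(msg["Subject"] or "").strip()
    lines = msg.get_content().splitlines()
    if SIG_DELIM in lines:
        lines = lines[: lines.index(SIG_DELIM)]
    body = "\n".join(lines).rstrip()
    return f"{subject}\n{'=' * len(subject)}\n\n{body}\n"


def build_news(relnotes_dir: str) -> str:
    """Concatenate every *.eml in relnotes_dir, newest filename first.

    Reverse lexical order is only a stand-in for real version sorting.
    """
    paths = sorted(Path(relnotes_dir).glob("*.eml"), reverse=True)
    return "\n".join(news_entry(p.read_text()) for p in paths)
```

The *.eml glob skips v1.2.0.wip, which matches its work-in-progress status. Reverse lexical ordering of filenames would misplace a hypothetical v1.10.0 relative to v1.9.0, so a real generator would need proper version sorting.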
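The release announcements above publish SHA-256 checksums that, in their own words, are only meant to guard against accidental file corruption. The following minimal Python sketch checks a downloaded tarball against such a checksum using only the standard hashlib module. The helper names sha256_file and verify_sha256 are hypothetical. A matching digest only shows that the bytes arrived intact. It does not authenticate the release, which is why the 1.0.0 announcement suggests reading the code rather than trusting the author.

```python
import hashlib


def sha256_file(path: str, chunk_size: int = 1 << 16) -> str:
    """Return the hex SHA-256 digest of a file, read in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel stops at EOF, when read() returns b"".
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_sha256(path: str, expected_hex: str) -> bool:
    """Compare a file with a published checksum, ignoring case and surrounding whitespace."""
    return sha256_file(path) == expected_hex.strip().lower()


# Example (hypothetical local path to the downloaded tarball):
# verify_sha256("public-inbox-1.1.0-pre1.tar.gz",
#               "d0023770a63ca109e6fe2c58b04c58987d4f81572ac69d18f95d6af0915fa009")
```

Reading in chunks keeps memory use flat no matter how large the tarball is.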
I actually merged master into v2, so it's a bit backwards :P

commit cfb8d16578e7f2f2e300f9f436205e4a8fc7f322
Merge: 1dc0f0c 119463b
Author: Eric Wong (Contractor, The Linux Foundation) <e@80x24.org>
Date:   Wed Apr 18 20:58:35 2018 +0000

    Merge remote-tracking branch 'origin/master' into v2

I screwed up the indexing on http://hjrcffqmbrq6wope.onion/git/
so that's still going, but I think I was able to update the rest
of them (including the heavily trafficked non-.onion) w/o downtime.

The mirror at http://czquwvybam4bgbro.onion/git/ has been running
the v2 code for over a week, now.

Thanks to the Linux Foundation for funding this work.

Will still need to make some documentation updates and such.

Eric Wong (Contractor, The Linux Foundation) (237):
      AUTHORS: add The Linux Foundation
      watch_maildir: allow '-' in mail filename
      scripts/import_vger_from_mbox: relax From_ line match slightly
      import: stop writing legacy ssoma.index by default
      import: begin supporting this without ssoma.lock
      import: initial handling for v2
      t/import: test for last_object_id insertion
      content_id: add test case
      searchmsg: add mid_mime import for _extract_mid
      scripts/import_vger_from_mbox: support --dry-run option
      import: APIs to support v2 use
      search: free up 'Q' prefix for a real unique identifier
      searchidx: fix comment around next_thread_id
      address: extract more characters from email addresses
      import: pass "raw" dates to git-fast-import(1)
      scripts/import_vger_from_mbox: use v2 layout for import
      import: quiet down warnings from bogus From: lines
      import: allow the epoch (0s) as a valid time
      extmsg: fix broken Xapian MID lookup
      search: stop assuming Message-ID is unique
      www: stop assuming mainrepo == git_dir
      v2writable: initial cut for repo-rotation
      git: reload alternates file on missing blob
      v2: support Xapian + SQLite indexing
      import_vger_from_inbox: allow "-V" option
      import_vger_from_mbox: use PublicInbox::MIME and avoid clobbering
      v2: parallelize Xapian indexing
      v2writable: round-robin to partitions based on article number
      searchidxpart: increase pipe size for partitions
      v2writable: warn on duplicate Message-IDs
      searchidx: do not modify Xapian DB while iterating
      v2/ui: some hacky things to get the PSGI UI to show up
      v2/ui: retry DB reopens in a few more places
      v2writable: cleanup unused pipes in partitions
      searchidxpart: binmode
      use PublicInbox::MIME consistently
      searchidxpart: chomp line before splitting
      searchidx*: name child subprocesses
      searchidx: get rid of pointless index_blob wrapper
      view: remove X-PI-TS reference
      searchidxthread: load doc data for references
      searchidxpart: force integers into add_message
      search: reopen skeleton DB as well
      searchidx: index values in the threader
      search: use different Enquire object for skeleton queries
      rename SearchIdxThread to SearchIdxSkeleton
      v2writable: commit to skeleton via remote partitions
      searchidxskeleton: extra error checking
      searchidx: do not modify Xapian DB while iterating
      search: query_xover uses skeleton DB iff available
      v2/ui: get nntpd and init tests running on v2
      v2writable: delete ::Import obj when ->done
      search: remove informational "warning" message
      searchidx: add PID to error message when die-ing
      content_id: special treatment for Message-Id headers
      evcleanup: disable outside of daemon
      v2writable: deduplicate detection on add
      evcleanup: do not create event loop if nothing was registered
      mid: add `mids' and `references' methods for extraction
      content_id: use `mids' and `references' for MID extraction
      searchidx: use new `references' method for parsing References
      content_id: no need to be human-friendly
      v2writable: inject new Message-IDs on true duplicates
      search: revert to using 'Q' as a uniQue id per-Xapian conventions
      searchidx: support indexing multiple MIDs
      mid: be strict with References, but loose on Message-Id
      searchidx: avoid excessive XNQ indexing with diffs
      searchidxskeleton: add a note about locking
      v2writable: generated Message-ID goes first
      searchidx: use add_boolean_term for internal terms
      searchidx: add NNTP article number as a searchable term
      mid: truncate excessively long MIDs early
      nntp: use NNTP article numbers for lookups
      nntp: fix NEWNEWS command
      searchidx: store the primary MID in doc data for NNTP
      import: consolidate object info for v2 imports
      v2: avoid redundant/repeated configs for git partition repos
      INSTALL: document more optional dependencies
      search: favor skeleton DB for lookup_mail
      search: each_smsg_by_mid uses skeleton if available
      v2writable: remove unnecessary skeleton commit
      favor Received: date over Date: header globally
      import: fall back to Sender for extracting name and email
      scripts/import_vger_from_mbox: perform mboxrd or mboxo escaping
      v2writable: detect and use previous partition count
      extmsg: rework partial MID matching to favor current inbox
      extmsg: rework partial MID matching to favor current inbox
      content_id: use Sender header if From is not available
      v2writable: support "barrier" operation to avoid reforking
      use string ref for Email::Simple->new
      v2writable: remove unnecessary idx_init call
      searchidx: do not delete documents while iterating
      search: allow ->reopen to be chainable
      v2writable: implement remove correctly
      skeleton: barrier init requires a lock
      import: (v2) delete writes the blob into history in subdir
      import: (v2): write deletes to a separate '_' subdirectory
      import: implement barrier operation for v1 repos
      mid: mid_mime uses v2-compatible mids function
      watchmaildir: use content_digest to generate Message-Id
      import: force Message-ID generation for v1 here
      import: switch to URL-safe Base64 for Message-IDs
      v2writable: test for idempotent removals
      import: enable locking under v2
      index: s/GIT_DIR/REPO_DIR/
      Lock: new base class for writable lockers
      t/watch_maildir: note the reason for FIFO creation
      v2writable: ensure ->done is idempotent
      watchmaildir: support v2 repositories
      searchidxpart: s/barrier/remote_barrier/
      v2writable: allow disabling parallelization
      scripts/import_vger_from_mbox: filter out same headers as MDA
      v2writable: add DEBUG_DIFF env support
      v2writable: remove "resent" message for duplicate Message-IDs
      content_id: do not take Message-Id into account
      introduce InboxWritable class
      import: discard all the same headers as MDA
      InboxWritable: add mbox/maildir parsing + import logic
      use both Date: and Received: times
      msgmap: add tmp_clone to create an anonymous copy
      fix syntax warnings
      v2writable: support reindexing Xapian
      t/altid.t: extra tests for mid_set
      v2writable: add NNTP article number regeneration support
      v2writable: clarify header cleanups
      v2writable: DEBUG_DIFF respects $TMPDIR
      feed: $INBOX/new.atom endpoint supports v2 inboxes
      import: consolidate mid prepend logic, here
      www: $MESSAGE_ID/raw endpoint supports "duplicates"
      search: reopen DB if each_smsg_by_mid fails
      t/psgi_v2: minimal test for Atom feed and t.mbox.gz
      feed: fix new.html for v2
      view: permalink (per-message) view shows multiple messages
      searchidx: warn about vivifying multiple ghosts
      v2writable: warn on unseen deleted files
      www: get rid of unnecessary 'inbox' name reference
      searchview: remove unnecessary imports from MID module
      view: depend on SearchMsg for Message-ID
      http: fix modification of read-only value
      githttpbackend: avoid infinite loop on generic PSGI servers
      www: support cloning individual v2 git partitions
      http: fix modification of read-only value
      githttpbackend: avoid infinite loop on generic PSGI servers
      www: remove unnecessary ghost checks
      v2writable: append, instead of prepending generated Message-ID
      lookup by Message-ID favors the "primary" one
      www: fix attachment downloads for conflicted Message-IDs
      searchmsg: document why we store To: and Cc: for NNTP
      public-inbox-convert: tool for converting old to new inboxes
      v2writable: support purging messages from git entirely
      search: cleanup uniqueness checking
      search: get rid of most lookup_* subroutines
      search: move find_doc_ids to searchidx
      v2writable: cleanup: get rid of unused fields
      mbox: avoid extracting Message-ID for linkification
      www: cleanup expensive fallback for legacy URLs
      view: get rid of some unnecessary imports
      search: retry_reopen on first_smsg_by_mid
      import: run_die supports redirects as spawn does
      v2writable: initializing an existing inbox is idempotent
      public-inbox-compact: new tool for driving xapian-compact
      mda: support v2 inboxes
      search: warn on reopens and die on total failure
      v2writable: allow gaps in git partitions
      v2writable: convert some fatal reindex errors to warnings
      wwwstream: flesh out clone instructions for v2
      v2writable: go backwards through alternate Message-IDs
      view: speed up homepage loading time with date clamp
      view: drop load_results
      feed: optimize query for feeds, too
      msgtime: parse 3-digit years properly
      convert: avoid redundant "done\n" statement for fast-import
      search: move permissions handling to InboxWritable
      t/v2writable: use simplify permissions reading
      v2: respect core.sharedRepository in git configs
      searchidx: correct warning for over-vivification
      v2: one file, really
      v2writable: fix parallel termination
      truncate Message-IDs and References consistently
      scripts/import_vger_from_mbox: set address properly
      search: reduce columns stored in Xapian
      replace Xapian skeleton with SQLite overview DB
      v2writable: simplify barrier vs checkpoints
      t/over: test empty Subject: line matching
      www: rework query responses to avoid COUNT in SQLite
      over: speedup get_thread by avoiding JOIN
      nntp: fix NEWNEWS command
      t/thread-all.t: modernize test to support modern inboxes
      rename+rewrite test using Benchmark module
      nntp: make XOVER, XHDR, OVER, HDR and NEWNEWS faster
      view: avoid offset during pagination
      mbox: remove remaining OFFSET usage in SQLite
      msgmap: replace id_batch with ids_after
      nntp: simplify the long_response API
      searchidx: ensure duplicated Message-IDs can be linked together
      init: s/GIT_DIR/REPO_DIR/ in usage
      import: rewrite less history during purge
      v2: support incremental indexing + purge
      v2writable: do not modify DBs while iterating for ->remove
      v2writable: recount partitions after acquiring lock
      searchmsg: remove unused `tid' and `path' methods
      search: remove unnecessary OP_AND of query
      mbox: do not sort search results
      searchview: minor cleanup
      support altid mechanism for v2
      compact: better handling of over.sqlite3* files
      v2writable: remove redundant remove from Over DB
      v2writable: allow tracking parallel versions
      v2writable: refer to git each repository as "epoch"
      over: use only supported and safe SQLite APIs
      search: index and allow searching by date-time
      altid: fix miscopied field name
      nntp: set Xref across multiple inboxes
      www: favor reading more from SQLite, and less from Xapian
      ensure Xapian and SQLite are still optional for v1 tests
      psgi: ensure /$INBOX/$MESSAGE_ID/T/ endpoint is chronological
      over: avoid excessive SELECT
      over: remove forked subprocess
      v2writable: reduce barriers
      index: allow specifying --jobs=0 to disable multiprocess
      convert: support converting with altid defined
      store less data in the Xapian document
      msgmap: speed up minmax with separate queries
      feed: respect feedmax, again
      v1: remove articles from overview DB
      compact: do not merge v2 repos by default
      v2writable: reduce partititions by one
      search: preserve References in Xapian smsg for x=t view
      v2: generate better Message-IDs for duplicates
      v2: improve deduplication checks
      import: cat_blob drops leading 'From ' lines like Inbox
      searchidx: regenerate and avoid article number gaps on full index
      extmsg: remove expensive git path checks
      use %H consistently to disable abbreviations
      searchidx: increase term positions for all text terms
      searchidx: revert default BATCH_BYTES to 1_000_000
      Merge remote-tracking branch 'origin/master' into v2
Xapian is size-intensive and SQLite is not strictly necessary
for v1.
---
 script/public-inbox-compact   | 2 +-
 scripts/import_vger_from_mbox | 2 +-
 t/convert-compact.t           | 2 +-
 t/v2mirror.t                  | 2 +-
 4 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/script/public-inbox-compact b/script/public-inbox-compact
index 43e9460..d855b9e 100755
--- a/script/public-inbox-compact
+++ b/script/public-inbox-compact
@@ -4,9 +4,9 @@
 use strict;
 use warnings;
 use Getopt::Long qw(:config gnu_getopt no_ignore_case auto_abbrev);
-use PublicInbox::V2Writable;
 use PublicInbox::Search;
 use PublicInbox::Config;
+use PublicInbox::InboxWritable;
 use Cwd 'abs_path';
 use File::Temp qw(tempdir);
 use File::Path qw(remove_tree);
diff --git a/scripts/import_vger_from_mbox b/scripts/import_vger_from_mbox
index 191f75d..ca5a408 100644
--- a/scripts/import_vger_from_mbox
+++ b/scripts/import_vger_from_mbox
@@ -6,7 +6,6 @@ use warnings;
 use Getopt::Long qw/:config gnu_getopt no_ignore_case auto_abbrev/;
 use PublicInbox::MIME;
 use PublicInbox::InboxWritable;
-use PublicInbox::V2Writable;
 use PublicInbox::Import;
 use PublicInbox::MDA;
 my $usage = "usage: $0 NAME EMAIL DIR <MBOX\n";
@@ -35,6 +34,7 @@ my $ibx = {
 $ibx = PublicInbox::Inbox->new($ibx);
 unless ($dry_run) {
 	if ($version >= 2) {
+		require PublicInbox::V2Writable;
 		PublicInbox::V2Writable->new($ibx, 1)->init_inbox(0);
 	} else {
 		system(qw(git init --bare -q), $mainrepo) == 0 or die;
diff --git a/t/convert-compact.t b/t/convert-compact.t
index e2ba40a..5caa0ac 100644
--- a/t/convert-compact.t
+++ b/t/convert-compact.t
@@ -10,7 +10,7 @@ foreach my $mod (@mods) {
 	eval "require $mod";
 	plan skip_all => "$mod missing for convert-compact.t" if $@;
 }
-use PublicInbox::V2Writable;
+use_ok 'PublicInbox::V2Writable';
 use PublicInbox::Import;
 my $tmpdir = tempdir('convert-compact-XXXXXX', TMPDIR => 1, CLEANUP => 1);
 my $ibx = {
diff --git a/t/v2mirror.t b/t/v2mirror.t
index 0c66aef..9e0c9e1 100644
--- a/t/v2mirror.t
+++ b/t/v2mirror.t
@@ -13,7 +13,7 @@ foreach my $mod (qw(Plack::Util Plack::Builder Danga::Socket
 use File::Temp qw/tempdir/;
 use IO::Socket;
 use POSIX qw(dup2);
-use PublicInbox::V2Writable;
+use_ok 'PublicInbox::V2Writable';
 use PublicInbox::MIME;
 use PublicInbox::Config;
 use Fcntl qw(FD_CLOEXEC F_SETFD F_GETFD);
-- 
EW
Eric Wong (Contractor, The Linux Foundation) (4):
      altid: fix miscopied field name
      nntp: set Xref across multiple inboxes
      www: favor reading more from SQLite, and less from Xapian
      ensure Xapian and SQLite are still optional for v1 tests

 lib/PublicInbox/AltId.pm      |  2 +-
 lib/PublicInbox/Inbox.pm      |  2 +-
 lib/PublicInbox/Mbox.pm       | 39 +++++++++++++-------------------
 lib/PublicInbox/NNTP.pm       | 43 ++++++++++++++++++++++-------------
 lib/PublicInbox/Over.pm       | 29 ++++++++++++++++++++++++
 lib/PublicInbox/Search.pm     | 52 ++++---------------------------------------
 lib/PublicInbox/SearchIdx.pm  |  3 ++-
 lib/PublicInbox/V2Writable.pm | 34 ++++++++++++----------------
 lib/PublicInbox/View.pm       | 35 ++++++++++-----------------------
 script/public-inbox-compact   |  2 +-
 scripts/import_vger_from_mbox |  2 +-
 t/convert-compact.t           |  2 +-
 t/nntp.t                      |  6 +++--
 t/psgi_v2.t                   |  1 +
 t/search.t                    | 16 +++++--------
 t/v2mirror.t                  |  2 +-
 t/v2writable.t                | 18 +++++++--------
 17 files changed, 129 insertions(+), 159 deletions(-)

-- 
EW