The v1.2.0 notes are a work-in-progress, while the others are
copied out of our mail archives.

Eventually, a NEWS file will be generated from these emails and
distributed in the release tarball.  There'll also be an Atom
feed for the website reusing our feed generation code.
---
 .gitattributes                         |   2 +
 Documentation/RelNotes/v1.0.0.eml      |  21 ++
 Documentation/RelNotes/v1.1.0-pre1.eml | 295 +++++++++++++++++++++++++
 Documentation/RelNotes/v1.2.0.wip      |  40 ++++
 MANIFEST                               |   4 +
 5 files changed, 362 insertions(+)
 create mode 100644 .gitattributes
 create mode 100644 Documentation/RelNotes/v1.0.0.eml
 create mode 100644 Documentation/RelNotes/v1.1.0-pre1.eml
 create mode 100644 Documentation/RelNotes/v1.2.0.wip

diff --git a/.gitattributes b/.gitattributes
new file mode 100644
index 0000000..bb53518
--- /dev/null
+++ b/.gitattributes
@@ -0,0 +1,2 @@
+# Email signatures start with "-- \n"
+*.eml whitespace=-blank-at-eol
diff --git a/Documentation/RelNotes/v1.0.0.eml b/Documentation/RelNotes/v1.0.0.eml
new file mode 100644
index 0000000..ae6ea4e
--- /dev/null
+++ b/Documentation/RelNotes/v1.0.0.eml
@@ -0,0 +1,21 @@
+From e@80x24.org Thu Feb 8 02:33:57 2018
+Date: Thu, 8 Feb 2018 02:33:57 +0000
+From: Eric Wong <e@80x24.org>
+To: meta@public-inbox.org
+Subject: [ANNOUNCE] public-inbox 1.0.0
+Message-ID: <20180208023357.GA32591@80x24.org>
+
+After some 3.5 odd years of working on this, I suppose now is
+as good a time as any to tar this up and call it 1.0.0.
+
+The TODO list is still very long and there'll be some new
+development in coming weeks :>
+
+So, here you have a release:
+
+  https://public-inbox.org/releases/public-inbox-1.0.0.tar.gz
+
+Checksums, mainly as a safeguard against accidental file corruption:
+
+SHA-256 4a08569f3d99310f713bb32bec0aa4819d6b41871e0421ec4eec0657a5582216
+  (in other words, don't trust me; instead read the code :>)
diff --git a/Documentation/RelNotes/v1.1.0-pre1.eml b/Documentation/RelNotes/v1.1.0-pre1.eml
new file mode 100644
index 0000000..ee1ecc3
--- /dev/null
+++ b/Documentation/RelNotes/v1.1.0-pre1.eml
@@ -0,0 +1,295 @@
+From e@80x24.org Wed May 9 20:23:03 2018
+Date: Wed, 9 May 2018 20:23:03 +0000
+From: Eric Wong <e@80x24.org>
+To: meta@public-inbox.org
+Cc: Konstantin Ryabitsev <konstantin@linuxfoundation.org>
+Subject: [ANNOUNCE] public-inbox 1.1.0-pre1
+Message-ID: <20180509202303.GA15156@dcvr>
+
+Pre-release for v2 repository support.
+Thanks to The Linux Foundation for supporting this work!
+
+https://public-inbox.org/releases/public-inbox-1.1.0-pre1.tar.gz
+
+SHA-256: d0023770a63ca109e6fe2c58b04c58987d4f81572ac69d18f95d6af0915fa009
+(only intended to guard against accidental file corruption)
+
+shortlog below:
+
+Eric Wong (27):
+      nntp: improve fairness during XOVER and similar commands
+      nntp: do not drain rbuf if there is a command pending
+      extmsg: use news.gmane.org for Message-ID lookups
+      searchview: fix non-numeric comparison
+      mbox: do not barf on queries which return no results
+      nntp: allow and ignore empty commands
+      ensure SQLite and Xapian files respect core.sharedRepository
+      TODO: a few more updates
+      filter/rubylang: do not set altid on spam training
+      import: cleanup git cat-file processes when ->done
+      disallow "\t" and "\n" in OVER headers
+      searchidx: release lock again during v1 batch callback
+      searchidx: remove leftover debugging code
+      convert: copy description and git config from v1 repo
+      view: untangle loop when showing message headers
+      view:
wrap To: and Cc: headers in HTML display + view: drop redundant References: display code + TODO: add EPOLLEXCLUSIVE item + searchview: do not blindly append "l" parameter to URL + search: avoid repeated mbox results from search + msgmap: add limit to response for NNTP + thread: prevent hidden threads in /$INBOX/ landing page + thread: sort incoming messages by Date + searchidx: preserve umask when starting/committing transactions + scripts/import_slrnspool: support v2 repos + scripts/import_slrnspool: cleanup progress messages + public-inbox 1.1.0-pre1 + +Eric Wong (Contractor, The Linux Foundation) (239): + AUTHORS: add The Linux Foundation + watch_maildir: allow '-' in mail filename + scripts/import_vger_from_mbox: relax From_ line match slightly + import: stop writing legacy ssoma.index by default + import: begin supporting this without ssoma.lock + import: initial handling for v2 + t/import: test for last_object_id insertion + content_id: add test case + searchmsg: add mid_mime import for _extract_mid + scripts/import_vger_from_mbox: support --dry-run option + import: APIs to support v2 use + search: free up 'Q' prefix for a real unique identifier + searchidx: fix comment around next_thread_id + address: extract more characters from email addresses + import: pass "raw" dates to git-fast-import(1) + scripts/import_vger_from_mbox: use v2 layout for import + import: quiet down warnings from bogus From: lines + import: allow the epoch (0s) as a valid time + extmsg: fix broken Xapian MID lookup + search: stop assuming Message-ID is unique + www: stop assuming mainrepo == git_dir + v2writable: initial cut for repo-rotation + git: reload alternates file on missing blob + v2: support Xapian + SQLite indexing + import_vger_from_inbox: allow "-V" option + import_vger_from_mbox: use PublicInbox::MIME and avoid clobbering + v2: parallelize Xapian indexing + v2writable: round-robin to partitions based on article number + searchidxpart: increase pipe size for partitions + 
v2writable: warn on duplicate Message-IDs + searchidx: do not modify Xapian DB while iterating + v2/ui: some hacky things to get the PSGI UI to show up + v2/ui: retry DB reopens in a few more places + v2writable: cleanup unused pipes in partitions + searchidxpart: binmode + use PublicInbox::MIME consistently + searchidxpart: chomp line before splitting + searchidx*: name child subprocesses + searchidx: get rid of pointless index_blob wrapper + view: remove X-PI-TS reference + searchidxthread: load doc data for references + searchidxpart: force integers into add_message + search: reopen skeleton DB as well + searchidx: index values in the threader + search: use different Enquire object for skeleton queries + rename SearchIdxThread to SearchIdxSkeleton + v2writable: commit to skeleton via remote partitions + searchidxskeleton: extra error checking + searchidx: do not modify Xapian DB while iterating + search: query_xover uses skeleton DB iff available + v2/ui: get nntpd and init tests running on v2 + v2writable: delete ::Import obj when ->done + search: remove informational "warning" message + searchidx: add PID to error message when die-ing + content_id: special treatment for Message-Id headers + evcleanup: disable outside of daemon + v2writable: deduplicate detection on add + evcleanup: do not create event loop if nothing was registered + mid: add `mids' and `references' methods for extraction + content_id: use `mids' and `references' for MID extraction + searchidx: use new `references' method for parsing References + content_id: no need to be human-friendly + v2writable: inject new Message-IDs on true duplicates + search: revert to using 'Q' as a uniQue id per-Xapian conventions + searchidx: support indexing multiple MIDs + mid: be strict with References, but loose on Message-Id + searchidx: avoid excessive XNQ indexing with diffs + searchidxskeleton: add a note about locking + v2writable: generated Message-ID goes first + searchidx: use add_boolean_term for 
internal terms + searchidx: add NNTP article number as a searchable term + mid: truncate excessively long MIDs early + nntp: use NNTP article numbers for lookups + nntp: fix NEWNEWS command + searchidx: store the primary MID in doc data for NNTP + import: consolidate object info for v2 imports + v2: avoid redundant/repeated configs for git partition repos + INSTALL: document more optional dependencies + search: favor skeleton DB for lookup_mail + search: each_smsg_by_mid uses skeleton if available + v2writable: remove unnecessary skeleton commit + favor Received: date over Date: header globally + import: fall back to Sender for extracting name and email + scripts/import_vger_from_mbox: perform mboxrd or mboxo escaping + v2writable: detect and use previous partition count + extmsg: rework partial MID matching to favor current inbox + extmsg: rework partial MID matching to favor current inbox + content_id: use Sender header if From is not available + v2writable: support "barrier" operation to avoid reforking + use string ref for Email::Simple->new + v2writable: remove unnecessary idx_init call + searchidx: do not delete documents while iterating + search: allow ->reopen to be chainable + v2writable: implement remove correctly + skeleton: barrier init requires a lock + import: (v2) delete writes the blob into history in subdir + import: (v2): write deletes to a separate '_' subdirectory + import: implement barrier operation for v1 repos + mid: mid_mime uses v2-compatible mids function + watchmaildir: use content_digest to generate Message-Id + import: force Message-ID generation for v1 here + import: switch to URL-safe Base64 for Message-IDs + v2writable: test for idempotent removals + import: enable locking under v2 + index: s/GIT_DIR/REPO_DIR/ + Lock: new base class for writable lockers + t/watch_maildir: note the reason for FIFO creation + v2writable: ensure ->done is idempotent + watchmaildir: support v2 repositories + searchidxpart: s/barrier/remote_barrier/ + 
v2writable: allow disabling parallelization + scripts/import_vger_from_mbox: filter out same headers as MDA + v2writable: add DEBUG_DIFF env support + v2writable: remove "resent" message for duplicate Message-IDs + content_id: do not take Message-Id into account + introduce InboxWritable class + import: discard all the same headers as MDA + InboxWritable: add mbox/maildir parsing + import logic + use both Date: and Received: times + msgmap: add tmp_clone to create an anonymous copy + fix syntax warnings + v2writable: support reindexing Xapian + t/altid.t: extra tests for mid_set + v2writable: add NNTP article number regeneration support + v2writable: clarify header cleanups + v2writable: DEBUG_DIFF respects $TMPDIR + feed: $INBOX/new.atom endpoint supports v2 inboxes + import: consolidate mid prepend logic, here + www: $MESSAGE_ID/raw endpoint supports "duplicates" + search: reopen DB if each_smsg_by_mid fails + t/psgi_v2: minimal test for Atom feed and t.mbox.gz + feed: fix new.html for v2 + view: permalink (per-message) view shows multiple messages + searchidx: warn about vivifying multiple ghosts + v2writable: warn on unseen deleted files + www: get rid of unnecessary 'inbox' name reference + searchview: remove unnecessary imports from MID module + view: depend on SearchMsg for Message-ID + http: fix modification of read-only value + githttpbackend: avoid infinite loop on generic PSGI servers + www: support cloning individual v2 git partitions + http: fix modification of read-only value + githttpbackend: avoid infinite loop on generic PSGI servers + www: remove unnecessary ghost checks + v2writable: append, instead of prepending generated Message-ID + lookup by Message-ID favors the "primary" one + www: fix attachment downloads for conflicted Message-IDs + searchmsg: document why we store To: and Cc: for NNTP + public-inbox-convert: tool for converting old to new inboxes + v2writable: support purging messages from git entirely + search: cleanup uniqueness 
checking + search: get rid of most lookup_* subroutines + search: move find_doc_ids to searchidx + v2writable: cleanup: get rid of unused fields + mbox: avoid extracting Message-ID for linkification + www: cleanup expensive fallback for legacy URLs + view: get rid of some unnecessary imports + search: retry_reopen on first_smsg_by_mid + import: run_die supports redirects as spawn does + v2writable: initializing an existing inbox is idempotent + public-inbox-compact: new tool for driving xapian-compact + mda: support v2 inboxes + search: warn on reopens and die on total failure + v2writable: allow gaps in git partitions + v2writable: convert some fatal reindex errors to warnings + wwwstream: flesh out clone instructions for v2 + v2writable: go backwards through alternate Message-IDs + view: speed up homepage loading time with date clamp + view: drop load_results + feed: optimize query for feeds, too + msgtime: parse 3-digit years properly + convert: avoid redundant "done\n" statement for fast-import + search: move permissions handling to InboxWritable + t/v2writable: use simplify permissions reading + v2: respect core.sharedRepository in git configs + searchidx: correct warning for over-vivification + v2: one file, really + v2writable: fix parallel termination + truncate Message-IDs and References consistently + scripts/import_vger_from_mbox: set address properly + search: reduce columns stored in Xapian + replace Xapian skeleton with SQLite overview DB + v2writable: simplify barrier vs checkpoints + t/over: test empty Subject: line matching + www: rework query responses to avoid COUNT in SQLite + over: speedup get_thread by avoiding JOIN + nntp: fix NEWNEWS command + t/thread-all.t: modernize test to support modern inboxes + rename+rewrite test using Benchmark module + nntp: make XOVER, XHDR, OVER, HDR and NEWNEWS faster + view: avoid offset during pagination + mbox: remove remaining OFFSET usage in SQLite + msgmap: replace id_batch with ids_after + nntp: simplify 
the long_response API + searchidx: ensure duplicated Message-IDs can be linked together + init: s/GIT_DIR/REPO_DIR/ in usage + import: rewrite less history during purge + v2: support incremental indexing + purge + v2writable: do not modify DBs while iterating for ->remove + v2writable: recount partitions after acquiring lock + searchmsg: remove unused `tid' and `path' methods + search: remove unnecessary OP_AND of query + mbox: do not sort search results + searchview: minor cleanup + support altid mechanism for v2 + compact: better handling of over.sqlite3* files + v2writable: remove redundant remove from Over DB + v2writable: allow tracking parallel versions + v2writable: refer to git each repository as "epoch" + over: use only supported and safe SQLite APIs + search: index and allow searching by date-time + altid: fix miscopied field name + nntp: set Xref across multiple inboxes + www: favor reading more from SQLite, and less from Xapian + ensure Xapian and SQLite are still optional for v1 tests + psgi: ensure /$INBOX/$MESSAGE_ID/T/ endpoint is chronological + over: avoid excessive SELECT + over: remove forked subprocess + v2writable: reduce barriers + index: allow specifying --jobs=0 to disable multiprocess + convert: support converting with altid defined + store less data in the Xapian document + msgmap: speed up minmax with separate queries + feed: respect feedmax, again + v1: remove articles from overview DB + compact: do not merge v2 repos by default + v2writable: reduce partititions by one + search: preserve References in Xapian smsg for x=t view + v2: generate better Message-IDs for duplicates + v2: improve deduplication checks + import: cat_blob drops leading 'From ' lines like Inbox + searchidx: regenerate and avoid article number gaps on full index + extmsg: remove expensive git path checks + use %H consistently to disable abbreviations + searchidx: increase term positions for all text terms + searchidx: revert default BATCH_BYTES to 1_000_000 + Merge 
remote-tracking branch 'origin/master' into v2
+      fix tests to run without Xapian installed
+      extmsg: use Xapian only for partial matches
+
+Jonathan Corbet (3):
+      Don't use LIMIT in UPDATE statements
+      Update the installation instructions with Fedora package names
+      Allow specification of the number of search results to return
+-- 
+git clone https://public-inbox.org/ public-inbox
+(working on a homepage... sorta :)
diff --git a/Documentation/RelNotes/v1.2.0.wip b/Documentation/RelNotes/v1.2.0.wip
new file mode 100644
index 0000000..41236a0
--- /dev/null
+++ b/Documentation/RelNotes/v1.2.0.wip
@@ -0,0 +1,40 @@
+To: meta@public-inbox.org
+Subject: [WIP] public-inbox 1.2.0
+
+* first non-pre/rc release with v2 format support for scalability.
+  See public-inbox-v2-format(5) manpage for more details.
+
+* new admin tools for v2 repos:
+  - public-inbox-convert - converts v1 to v2 repo formats
+  - public-inbox-compact - v2 convenience wrapper for xapian-compact(1)
+  - public-inbox-purge - purges entire messages out of v2 history
+  - public-inbox-edit - edits sensitive data out of messages in v2 history
+  - public-inbox-xcpdb - copydatabase(1) wrapper to upgrade Xapian formats
+                         (e.g. from "chert" to "glass") and reshard
+                         v2 repos
+
+* SQLite3 support decoupled from Xapian support, and Xapian DBs may be
+  configured without phrase support to save space.  See "indexlevel" in
+  public-inbox-config(5) manpage for more info.
+
+* public-inbox-nntpd
+  - support STARTTLS and NNTPS
+  - support COMPRESS extension
+  - fix several RFC3977 compliance bugs
+  - improved interoperability with picky clients such as leafnode
+
+* public-inbox-watch
+  - support multiple spam training directories
+  - support mapping multiple inboxes per Maildir
+
+* PublicInbox::WWW
+  - grokmirror-compatible manifest.js.gz endpoint generation
+  - user-configurable color support in $INBOX_URL/_/text/color/
+  - BOFHs may set default colors via "publicinbox.css"
+    (see public-inbox-config(5))
+
+* Danga::Socket is no longer a runtime dependency of daemons.
+
+* improved FreeBSD support
+
+See archives at https://public-inbox.org/meta/ for all history.
diff --git a/MANIFEST b/MANIFEST
index f5290b4..ecf239f 100644
--- a/MANIFEST
+++ b/MANIFEST
@@ -1,7 +1,11 @@
+.gitattributes
 .gitignore
 AUTHORS
 COPYING
 Documentation/.gitignore
+Documentation/RelNotes/v1.0.0.eml
+Documentation/RelNotes/v1.1.0-pre1.eml
+Documentation/RelNotes/v1.2.0.wip
 Documentation/dc-dlvr-spam-flow.txt
 Documentation/design_notes.txt
 Documentation/design_www.txt
-- 
EW
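
The commit message above says a NEWS file will eventually be generated
from these emails.  A minimal sketch of what such a generator could look
like follows.  It is written in Python purely for illustration and is not
the project's tooling: parse_relnote() and render_news() are hypothetical
names, the output layout is an assumption, and so is the choice to sort
undated drafts (such as the .wip file) ahead of dated releases.

```python
import email
import email.utils
import re
from pathlib import Path


def parse_relnote(text):
    """Split one release-note email into (date, title, body).

    A leading mbox "From " line, as found in the .eml files, is
    accepted by the stdlib parser.  Drafts without a Date: header
    yield date=None.
    """
    msg = email.message_from_string(text)
    raw_date = msg["Date"]
    date = email.utils.parsedate_to_datetime(raw_date) if raw_date else None
    # Drop a leading tag such as "[ANNOUNCE] " or "[WIP] " (assumed layout).
    title = re.sub(r"^\[[^\]]+\]\s*", "", msg["Subject"] or "(no subject)")
    return date, title, msg.get_payload().rstrip()


def render_news(texts):
    """Render release notes as plain text: drafts first, then newest first."""
    entries = [parse_relnote(t) for t in texts]
    entries.sort(
        key=lambda e: (e[0] is None, e[0].timestamp() if e[0] else 0),
        reverse=True,
    )
    out = []
    for date, title, body in entries:
        stamp = date.strftime("%Y-%m-%d") if date else "unreleased"
        out.append(f"{title} ({stamp})\n\n{body}\n")
    return "\n".join(out)


if __name__ == "__main__":
    # Directory name taken from the patch; run from the top of the source tree.
    notes = sorted(Path("Documentation/RelNotes").glob("v*"))
    print(render_news(p.read_text() for p in notes))
```

Sorting on the Date: header rather than the filename avoids having to
compare version strings like "1.1.0-pre1", at the cost of needing a rule
for drafts that have no date yet.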
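
Both announcements publish a SHA-256 as a guard against accidental file
corruption.  A hedged sketch of checking a downloaded tarball against
such a digest, in Python for illustration only: sha256_of() and verify()
are hypothetical helper names, and the tarball is assumed to already be
on local disk.

```python
import hashlib
import sys


def sha256_of(path, chunk_size=1 << 16):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_hex):
    """True if the file's digest equals the published one (case-insensitive)."""
    return sha256_of(path) == expected_hex.strip().lower()


if __name__ == "__main__":
    # usage: verify.py TARBALL EXPECTED_SHA256
    tarball, expected = sys.argv[1], sys.argv[2]
    print("OK" if verify(tarball, expected) else "MISMATCH")
```

As the announcements themselves say, a match only rules out accidental
corruption; it says nothing about whether the code should be trusted.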
I actually merged master into v2, so it's a bit backwards :P

  commit cfb8d16578e7f2f2e300f9f436205e4a8fc7f322
  Merge: 1dc0f0c 119463b
  Author: Eric Wong (Contractor, The Linux Foundation) <e@80x24.org>
  Date:   Wed Apr 18 20:58:35 2018 +0000

      Merge remote-tracking branch 'origin/master' into v2

I screwed up the indexing on http://hjrcffqmbrq6wope.onion/git/
so that's still going, but I think I was able to update the rest
of them (including the heavily trafficked non-.onion) w/o
downtime.  The mirror at http://czquwvybam4bgbro.onion/git/ has
been running the v2 code for over a week, now.

Thanks to the Linux Foundation for funding this work.

Will still need to make some documentation updates and such.

Eric Wong (Contractor, The Linux Foundation) (237):
      AUTHORS: add The Linux Foundation
      watch_maildir: allow '-' in mail filename
      scripts/import_vger_from_mbox: relax From_ line match slightly
      import: stop writing legacy ssoma.index by default
      import: begin supporting this without ssoma.lock
      import: initial handling for v2
      t/import: test for last_object_id insertion
      content_id: add test case
      searchmsg: add mid_mime import for _extract_mid
      scripts/import_vger_from_mbox: support --dry-run option
      import: APIs to support v2 use
      search: free up 'Q' prefix for a real unique identifier
      searchidx: fix comment around next_thread_id
      address: extract more characters from email addresses
      import: pass "raw" dates to git-fast-import(1)
      scripts/import_vger_from_mbox: use v2 layout for import
      import: quiet down warnings from bogus From: lines
      import: allow the epoch (0s) as a valid time
      extmsg: fix broken Xapian MID lookup
      search: stop assuming Message-ID is unique
      www: stop assuming mainrepo == git_dir
      v2writable: initial cut for repo-rotation
      git: reload alternates file on missing blob
      v2: support Xapian + SQLite indexing
      import_vger_from_inbox: allow "-V" option
      import_vger_from_mbox: use PublicInbox::MIME and avoid clobbering
      v2: parallelize Xapian indexing
      v2writable: round-robin to
partitions based on article number searchidxpart: increase pipe size for partitions v2writable: warn on duplicate Message-IDs searchidx: do not modify Xapian DB while iterating v2/ui: some hacky things to get the PSGI UI to show up v2/ui: retry DB reopens in a few more places v2writable: cleanup unused pipes in partitions searchidxpart: binmode use PublicInbox::MIME consistently searchidxpart: chomp line before splitting searchidx*: name child subprocesses searchidx: get rid of pointless index_blob wrapper view: remove X-PI-TS reference searchidxthread: load doc data for references searchidxpart: force integers into add_message search: reopen skeleton DB as well searchidx: index values in the threader search: use different Enquire object for skeleton queries rename SearchIdxThread to SearchIdxSkeleton v2writable: commit to skeleton via remote partitions searchidxskeleton: extra error checking searchidx: do not modify Xapian DB while iterating search: query_xover uses skeleton DB iff available v2/ui: get nntpd and init tests running on v2 v2writable: delete ::Import obj when ->done search: remove informational "warning" message searchidx: add PID to error message when die-ing content_id: special treatment for Message-Id headers evcleanup: disable outside of daemon v2writable: deduplicate detection on add evcleanup: do not create event loop if nothing was registered mid: add `mids' and `references' methods for extraction content_id: use `mids' and `references' for MID extraction searchidx: use new `references' method for parsing References content_id: no need to be human-friendly v2writable: inject new Message-IDs on true duplicates search: revert to using 'Q' as a uniQue id per-Xapian conventions searchidx: support indexing multiple MIDs mid: be strict with References, but loose on Message-Id searchidx: avoid excessive XNQ indexing with diffs searchidxskeleton: add a note about locking v2writable: generated Message-ID goes first searchidx: use add_boolean_term for 
internal terms searchidx: add NNTP article number as a searchable term mid: truncate excessively long MIDs early nntp: use NNTP article numbers for lookups nntp: fix NEWNEWS command searchidx: store the primary MID in doc data for NNTP import: consolidate object info for v2 imports v2: avoid redundant/repeated configs for git partition repos INSTALL: document more optional dependencies search: favor skeleton DB for lookup_mail search: each_smsg_by_mid uses skeleton if available v2writable: remove unnecessary skeleton commit favor Received: date over Date: header globally import: fall back to Sender for extracting name and email scripts/import_vger_from_mbox: perform mboxrd or mboxo escaping v2writable: detect and use previous partition count extmsg: rework partial MID matching to favor current inbox extmsg: rework partial MID matching to favor current inbox content_id: use Sender header if From is not available v2writable: support "barrier" operation to avoid reforking use string ref for Email::Simple->new v2writable: remove unnecessary idx_init call searchidx: do not delete documents while iterating search: allow ->reopen to be chainable v2writable: implement remove correctly skeleton: barrier init requires a lock import: (v2) delete writes the blob into history in subdir import: (v2): write deletes to a separate '_' subdirectory import: implement barrier operation for v1 repos mid: mid_mime uses v2-compatible mids function watchmaildir: use content_digest to generate Message-Id import: force Message-ID generation for v1 here import: switch to URL-safe Base64 for Message-IDs v2writable: test for idempotent removals import: enable locking under v2 index: s/GIT_DIR/REPO_DIR/ Lock: new base class for writable lockers t/watch_maildir: note the reason for FIFO creation v2writable: ensure ->done is idempotent watchmaildir: support v2 repositories searchidxpart: s/barrier/remote_barrier/ v2writable: allow disabling parallelization scripts/import_vger_from_mbox: filter 
out same headers as MDA v2writable: add DEBUG_DIFF env support v2writable: remove "resent" message for duplicate Message-IDs content_id: do not take Message-Id into account introduce InboxWritable class import: discard all the same headers as MDA InboxWritable: add mbox/maildir parsing + import logic use both Date: and Received: times msgmap: add tmp_clone to create an anonymous copy fix syntax warnings v2writable: support reindexing Xapian t/altid.t: extra tests for mid_set v2writable: add NNTP article number regeneration support v2writable: clarify header cleanups v2writable: DEBUG_DIFF respects $TMPDIR feed: $INBOX/new.atom endpoint supports v2 inboxes import: consolidate mid prepend logic, here www: $MESSAGE_ID/raw endpoint supports "duplicates" search: reopen DB if each_smsg_by_mid fails t/psgi_v2: minimal test for Atom feed and t.mbox.gz feed: fix new.html for v2 view: permalink (per-message) view shows multiple messages searchidx: warn about vivifying multiple ghosts v2writable: warn on unseen deleted files www: get rid of unnecessary 'inbox' name reference searchview: remove unnecessary imports from MID module view: depend on SearchMsg for Message-ID http: fix modification of read-only value githttpbackend: avoid infinite loop on generic PSGI servers www: support cloning individual v2 git partitions http: fix modification of read-only value githttpbackend: avoid infinite loop on generic PSGI servers www: remove unnecessary ghost checks v2writable: append, instead of prepending generated Message-ID lookup by Message-ID favors the "primary" one www: fix attachment downloads for conflicted Message-IDs searchmsg: document why we store To: and Cc: for NNTP public-inbox-convert: tool for converting old to new inboxes v2writable: support purging messages from git entirely search: cleanup uniqueness checking search: get rid of most lookup_* subroutines search: move find_doc_ids to searchidx v2writable: cleanup: get rid of unused fields mbox: avoid extracting 
Message-ID for linkification www: cleanup expensive fallback for legacy URLs view: get rid of some unnecessary imports search: retry_reopen on first_smsg_by_mid import: run_die supports redirects as spawn does v2writable: initializing an existing inbox is idempotent public-inbox-compact: new tool for driving xapian-compact mda: support v2 inboxes search: warn on reopens and die on total failure v2writable: allow gaps in git partitions v2writable: convert some fatal reindex errors to warnings wwwstream: flesh out clone instructions for v2 v2writable: go backwards through alternate Message-IDs view: speed up homepage loading time with date clamp view: drop load_results feed: optimize query for feeds, too msgtime: parse 3-digit years properly convert: avoid redundant "done\n" statement for fast-import search: move permissions handling to InboxWritable t/v2writable: use simplify permissions reading v2: respect core.sharedRepository in git configs searchidx: correct warning for over-vivification v2: one file, really v2writable: fix parallel termination truncate Message-IDs and References consistently scripts/import_vger_from_mbox: set address properly search: reduce columns stored in Xapian replace Xapian skeleton with SQLite overview DB v2writable: simplify barrier vs checkpoints t/over: test empty Subject: line matching www: rework query responses to avoid COUNT in SQLite over: speedup get_thread by avoiding JOIN nntp: fix NEWNEWS command t/thread-all.t: modernize test to support modern inboxes rename+rewrite test using Benchmark module nntp: make XOVER, XHDR, OVER, HDR and NEWNEWS faster view: avoid offset during pagination mbox: remove remaining OFFSET usage in SQLite msgmap: replace id_batch with ids_after nntp: simplify the long_response API searchidx: ensure duplicated Message-IDs can be linked together init: s/GIT_DIR/REPO_DIR/ in usage import: rewrite less history during purge v2: support incremental indexing + purge v2writable: do not modify DBs while 
iterating for ->remove v2writable: recount partitions after acquiring lock searchmsg: remove unused `tid' and `path' methods search: remove unnecessary OP_AND of query mbox: do not sort search results searchview: minor cleanup support altid mechanism for v2 compact: better handling of over.sqlite3* files v2writable: remove redundant remove from Over DB v2writable: allow tracking parallel versions v2writable: refer to git each repository as "epoch" over: use only supported and safe SQLite APIs search: index and allow searching by date-time altid: fix miscopied field name nntp: set Xref across multiple inboxes www: favor reading more from SQLite, and less from Xapian ensure Xapian and SQLite are still optional for v1 tests psgi: ensure /$INBOX/$MESSAGE_ID/T/ endpoint is chronological over: avoid excessive SELECT over: remove forked subprocess v2writable: reduce barriers index: allow specifying --jobs=0 to disable multiprocess convert: support converting with altid defined store less data in the Xapian document msgmap: speed up minmax with separate queries feed: respect feedmax, again v1: remove articles from overview DB compact: do not merge v2 repos by default v2writable: reduce partititions by one search: preserve References in Xapian smsg for x=t view v2: generate better Message-IDs for duplicates v2: improve deduplication checks import: cat_blob drops leading 'From ' lines like Inbox searchidx: regenerate and avoid article number gaps on full index extmsg: remove expensive git path checks use %H consistently to disable abbreviations searchidx: increase term positions for all text terms searchidx: revert default BATCH_BYTES to 1_000_000 Merge remote-tracking branch 'origin/master' into v2
We'll let the config of all.git dictate every other subrepo to
ease maintenance and configuration. The "include" directive has
been supported since git 1.7.10, so it's safe to depend on as
v2 requires git 2.6.0+ anyways for "get-mark" in fast-import.
---
 lib/PublicInbox/SearchIdx.pm  |  2 +-
 lib/PublicInbox/V2Writable.pm | 10 +++++++---
 t/init.t                      |  2 ++
 t/v2writable.t                | 16 ++++++++++++++++
 4 files changed, 26 insertions(+), 4 deletions(-)

diff --git a/lib/PublicInbox/SearchIdx.pm b/lib/PublicInbox/SearchIdx.pm
index 71469a9..725bbd8 100644
--- a/lib/PublicInbox/SearchIdx.pm
+++ b/lib/PublicInbox/SearchIdx.pm
@@ -817,7 +817,7 @@ sub _read_git_config_perm {
 	my ($self) = @_;
 	my @cmd = qw(config);
 	if ($self->{version} == 2) {
-		push @cmd, "--file=$self->{mainrepo}/inbox-config";
+		push @cmd, "--file=$self->{mainrepo}/all.git/config";
 	}
 	my $fh = $self->{git}->popen(@cmd, 'core.sharedRepository');
 	local $/ = "\n";
diff --git a/lib/PublicInbox/V2Writable.pm b/lib/PublicInbox/V2Writable.pm
index 31376db..461432e 100644
--- a/lib/PublicInbox/V2Writable.pm
+++ b/lib/PublicInbox/V2Writable.pm
@@ -244,16 +244,20 @@ sub git_init {
 	die "$git_dir exists\n" if -e $git_dir;
 	my @cmd = (qw(git init --bare -q), $git_dir);
 	PublicInbox::Import::run_die(\@cmd);
-	@cmd = (qw/git config/, "--file=$git_dir/config",
-			'repack.writeBitmaps', 'true');
-	PublicInbox::Import::run_die(\@cmd);
 
 	my $all = "$self->{-inbox}->{mainrepo}/all.git";
 	unless (-d $all) {
 		@cmd = (qw(git init --bare -q), $all);
 		PublicInbox::Import::run_die(\@cmd);
+		@cmd = (qw/git config/, "--file=$all/config",
+				'repack.writeBitmaps', 'true');
+		PublicInbox::Import::run_die(\@cmd);
 	}
 
+	@cmd = (qw/git config/, "--file=$git_dir/config",
+			'include.path', '../../all.git/config');
+	PublicInbox::Import::run_die(\@cmd);
+
 	my $alt = "$all/objects/info/alternates";
 	my $new_obj_dir = "../../git/$new.git/objects";
 	my %alts;
diff --git a/t/init.t b/t/init.t
index 54b90ec..6ae608e 100644
--- a/t/init.t
+++ b/t/init.t
@@ -38,6 +38,8 @@ SKIP: {
 	ok(-d "$tmpdir/v2list", 'v2list directory exists');
 	ok(-f "$tmpdir/v2list/msgmap.sqlite3", 'msgmap exists');
 	ok(-d "$tmpdir/v2list/all.git", 'catch-all.git directory exists');
+	@cmd = (qw(git config), "--file=$tmpdir/v2list/all.git/config",
+		qw(core.sharedRepository 0644));
 }
 
 done_testing();
diff --git a/t/v2writable.t b/t/v2writable.t
index bf8ae5e..2d35aca 100644
--- a/t/v2writable.t
+++ b/t/v2writable.t
@@ -32,6 +32,22 @@ my $mime = PublicInbox::MIME->create(
 
 my $im = PublicInbox::V2Writable->new($ibx, 1);
 ok($im->add($mime), 'ordinary message added');
+
+if ('ensure git configs are correct') {
+	my @cmd = (qw(git config), "--file=$mainrepo/all.git/config",
+		qw(core.sharedRepository 0644));
+	is(system(@cmd), 0, "set sharedRepository in all.git");
+	my $git0 = PublicInbox::Git->new("$mainrepo/git/0.git");
+	my $fh = $git0->popen(qw(config core.sharedRepository));
+	my $v = eval { local $/; <$fh> };
+	chomp $v;
+	is($v, '0644', 'child repo inherited core.sharedRepository');
+	$fh = $git0->popen(qw(config --bool repack.writeBitmaps));
+	$v = eval { local $/; <$fh> };
+	chomp $v;
+	is($v, 'true', 'child repo inherited repack.writeBitmaps');
+}
+
 {
 	my @warn;
 	local $SIG{__WARN__} = sub { push @warn, @_ };
-- 
EW
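For readers who want to try the inheritance outside the test suite, here is a minimal sketch in Python (not part of the patch, which is Perl). It assumes a `git` binary on PATH and reuses the all.git and git/0.git layout from the diff above. The helper names `git` and `demo_inherited_config` and the temporary directory are hypothetical, chosen only for this illustration.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def git(*args: str) -> str:
    """Run git and return its stdout with surrounding whitespace stripped."""
    out = subprocess.run(["git", *args], check=True,
                         capture_output=True, text=True)
    return out.stdout.strip()


def demo_inherited_config(root: Path) -> dict:
    """Create all.git plus one child repo, then read config via the child."""
    all_git = root / "all.git"
    child = root / "git" / "0.git"
    child.parent.mkdir(parents=True, exist_ok=True)
    git("init", "--bare", "-q", str(all_git))
    git("init", "--bare", "-q", str(child))

    # Shared settings are written only to all.git/config.
    git("config", f"--file={all_git}/config", "repack.writeBitmaps", "true")
    git("config", f"--file={all_git}/config", "core.sharedRepository", "0644")

    # The child just points at the shared file; git resolves the relative
    # path against the directory that holds the child's own config file.
    git("config", f"--file={child}/config",
        "include.path", "../../all.git/config")

    return {
        "sharedRepository": git(f"--git-dir={child}", "config",
                                "core.sharedRepository"),
        "writeBitmaps": git(f"--git-dir={child}", "config", "--bool",
                            "repack.writeBitmaps"),
    }


if shutil.which("git"):
    with tempfile.TemporaryDirectory() as tmp:
        result = demo_inherited_config(Path(tmp))
    print(result)
else:
    result = None  # git is not installed, so there is nothing to show
```

Changing a value in all.git/config afterwards should be visible through every child that carries the include, which is the maintenance benefit the commit message describes.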
Duplicate detection based on `content_id' now works and rejects
obviously re-sent messages with the same Message-Id.

Since many historical messages already have multiple Message-Ids
(some from buggy versions of git-send-email), we will inject
Message-Ids as needed to differentiate messages with the SAME
Message-Id. This prevents NNTP readers from missing out on
messages. Internally, the Message-Id we _favor_ for NNTP is
also the one which gets used for rendering threads.

Excessively long Message-Ids are just truncated to 244 for now
(Xapian limit for terms). I hope it's not an abuse vector going
forward (only one spam message used it), but this is another
problem our inject-new-Message-Id-on-duplicate scheme "solves".

Internal timestamps used for sorting now favor the first
(last-added) Received: header since it is more likely to be
correct than the Date: header. A wrong Date: header will still
show up in the per-message ("permalink") view, so it can still
be used to embarrass people with bad clocks :P
(Of course, downloadable mboxes will continue to show them).

For thread skeleton (index) views in HTML, we use the internal
timestamp for now; but maybe we'll use the Date: like the
permalink view. Maybe internally there can be two timestamps
like git's author-vs-committer dates.

Xapian index size is reduced, as the "nq:" search field is no
longer redundantly storing information that would be in
searchable diff fields
(df* in https://public-inbox.org/git/_/text/help/).
This (along with remembering to run fstrim(8)) seems to have
reduced best-case indexing time to around 3.5 hours for the
2000-2017 dataset I'm using \o/

Eric Wong (Contractor, The Linux Foundation) (34):
      v2writable: delete ::Import obj when ->done
      search: remove informational "warning" message
      searchidx: add PID to error message when die-ing
      content_id: special treatment for Message-Id headers
      evcleanup: disable outside of daemon
      v2writable: deduplicate detection on add
      evcleanup: do not create event loop if nothing was registered
      mid: add `mids' and `references' methods for extraction
      content_id: use `mids' and `references' for MID extraction
      searchidx: use new `references' method for parsing References
      content_id: no need to be human-friendly
      v2writable: inject new Message-IDs on true duplicates
      search: revert to using 'Q' as a uniQue id per-Xapian conventions
      searchidx: support indexing multiple MIDs
      mid: be strict with References, but loose on Message-Id
      searchidx: avoid excessive XNQ indexing with diffs
      searchidxskeleton: add a note about locking
      v2writable: generated Message-ID goes first
      searchidx: use add_boolean_term for internal terms
      searchidx: add NNTP article number as a searchable term
      mid: truncate excessively long MIDs early
      nntp: use NNTP article numbers for lookups
      nntp: fix NEWNEWS command
      searchidx: store the primary MID in doc data for NNTP
      import: consolidate object info for v2 imports
      v2: avoid redundant/repeated configs for git partition repos
      INSTALL: document more optional dependencies
      search: favor skeleton DB for lookup_mail
      search: each_smsg_by_mid uses skeleton if available
      v2writable: remove unnecessary skeleton commit
      favor Received: date over Date: header globally
      import: fall back to Sender for extracting name and email
      scripts/import_vger_from_mbox: perform mboxrd or mboxo escaping
      v2writable: detect and use previous partition count

 INSTALL                              |  13 ++
 MANIFEST                             |   2 +
 lib/PublicInbox/ContentId.pm         |  32 +++--
 lib/PublicInbox/Daemon.pm            |   1 +
 lib/PublicInbox/EvCleanup.pm         |   6 +-
 lib/PublicInbox/ExtMsg.pm            |   2 +-
 lib/PublicInbox/Import.pm            |  99 ++++++-------
 lib/PublicInbox/Inbox.pm             |   1 +
 lib/PublicInbox/MID.pm               |  55 +++++++-
 lib/PublicInbox/MsgTime.pm           |  51 +++++++
 lib/PublicInbox/NNTP.pm              |  31 ++---
 lib/PublicInbox/Search.pm            |  70 ++++++++--
 lib/PublicInbox/SearchIdx.pm         | 260 +++++++++++++++++++++--------------
 lib/PublicInbox/SearchIdxPart.pm     |   8 +-
 lib/PublicInbox/SearchIdxSkeleton.pm |  27 +---
 lib/PublicInbox/SearchMsg.pm         |  26 ++--
 lib/PublicInbox/V2Writable.pm        | 166 +++++++++++++++++++---
 lib/PublicInbox/View.pm              |   8 +-
 lib/PublicInbox/WwwAtomStream.pm     |   5 +-
 scripts/import_vger_from_mbox        |  11 +-
 t/content_id.t                       |   5 +-
 t/import.t                           |   9 +-
 t/init.t                             |   2 +
 t/mid.t                              |  22 ++-
 t/nntpd.t                            |   2 +
 t/search-thr-index.t                 |   2 +-
 t/v2writable.t                       | 195 ++++++++++++++++++++++++++
 27 files changed, 842 insertions(+), 269 deletions(-)
 create mode 100644 lib/PublicInbox/MsgTime.pm
 create mode 100644 t/v2writable.t
-- 
EW
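The duplicate handling described in the notes above can be pictured with a rough Python sketch. This is an illustration under assumptions, not the actual public-inbox code (which is Perl). The header selection inside `content_id`, the `Inbox` class, and the `@generated.invalid` suffix are all hypothetical. Only the 244 truncation length and the overall policy (reject same Message-Id with same content, inject a new Message-Id when the content differs) come from the text.

```python
import hashlib
from email import message_from_string
from email.message import Message

MAX_MID_SIZE = 244  # truncation length quoted in the notes above


def message_ids(msg: Message) -> list:
    """Return every Message-Id on the message, unwrapped and truncated."""
    return [v.strip().strip("<>")[:MAX_MID_SIZE]
            for v in msg.get_all("Message-ID", [])]


def content_id(msg: Message) -> str:
    """Hash a few headers plus the body (hypothetical field selection)."""
    h = hashlib.sha256()
    for name in ("Subject", "From", "Date", "To", "Cc"):
        for value in msg.get_all(name, []):
            h.update(f"{name}: {value}\n".encode("utf-8", "replace"))
    body = msg.get_payload()
    if isinstance(body, list):  # multipart: hash each serialized part
        body = "".join(part.as_string() for part in body)
    h.update(b"\n" + body.encode("utf-8", "replace"))
    return h.hexdigest()


class Inbox:
    """Toy in-memory stand-in for an inbox; tracks identifiers only."""

    def __init__(self) -> None:
        self.seen = {}  # Message-Id -> set of content ids stored under it

    def add(self, msg: Message):
        """Return the favored Message-Id, or None for an obvious re-send."""
        cid = content_id(msg)
        mids = message_ids(msg)
        # Same Message-Id and same content: reject as a re-sent message.
        if any(cid in self.seen.get(mid, ()) for mid in mids):
            return None
        unused = [mid for mid in mids if mid not in self.seen]
        if unused:
            primary = unused[0]
        else:
            # Every Message-Id is already taken by different content (or
            # there is none), so derive a new one from the content hash.
            primary = f"{cid[:40]}@generated.invalid"
            if cid in self.seen.get(primary, ()):
                return None
            # Assigning a header on a compat32 Message appends a new field;
            # appending rather than prepending is a simplification here.
            msg["Message-ID"] = f"<{primary}>"
            mids.append(primary)
        for mid in mids:
            self.seen.setdefault(mid, set()).add(cid)
        return primary


raw = "Message-Id: <x@example.com>\nFrom: a@example.com\nSubject: hi\n\n{}\n"
inbox = Inbox()
print(inbox.add(message_from_string(raw.format("first body"))))  # original kept
print(inbox.add(message_from_string(raw.format("first body"))))  # re-send
print(inbox.add(message_from_string(raw.format("other body"))))  # injected id
```

The point of returning a single favored Message-Id is the one made in the notes: the same identifier can then serve both NNTP lookups and thread rendering, while the original header stays on the message.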