about summary refs log tree commit homepage
path: root/scripts/import_vger_from_mbox
DateCommit message (Collapse)
2018-02-22import_vger_from_inbox: allow "-V" option
This will let us quickly test between v2 and v1 inboxes.
2018-02-20v2: support Xapian + SQLite indexing
This is too slow, currently. Working with only 2017 LKML archives: git-only: ~1 minute git + SQLite: ~12 minutes git+Xapian+SQlite: ~45 minutes So yes, it looks like we'll need to parallelize Xapian indexing, at least.
2018-02-19v2writable: initial cut for repo-rotation
Wrap the old Import package to enable creating new repos based on size thresholds. This is better than relying on time-based rotation as LKML traffic seems to be increasing.
2018-02-15scripts/import_vger_from_mbox: use v2 layout for import
Big lists are orders of magnitude more efficient with v2.
2018-02-13scripts/import_vger_from_mbox: support --dry-run option
This can be useful for getting baseline of performance of just Email::MIME and Date: header parsing. We'll need to do some Date: header parsing for LKML since there are some wonky date formats which causes the git RFC822 parser to choke.
2018-02-08scripts/import_vger_from_mbox: relax From_ line match slightly
The mboxes I got from cregit have two spaces after the email address, while the "git format-patch" output I'm used to dealing with only has one space. It's still a "strict" match in that it checks for something resembling a timestamp, but it relaxes the number of spaces between the email address and date.
2018-02-07update copyrights for 2018
Using update-copyrights from gnulib While we're at it, use the SPDX identifier for AGPL-3.0+ to ease mechanical processing.
2016-07-28add script used for importing git from download.gmane.org
In case others want to use it...