about summary refs log tree commit homepage
path: root/scripts/import_vger_from_mbox
DateCommit message (Collapse)
2024-02-01scripts/import_*: update usage to include lei tips
These scripts probably don't offer anything useful now that lei has fleshed out read-only MH support and v2 outputs.
2021-01-01update copyrights for 2021
Using "make update-copyrights" after setting GNULIB_PATH in my config.mak
2020-02-24import_vger_from_mbox: add --filter parameter
It shouldn't be hard to make this into a more generic importer not specific to vger lists.
2020-02-24import_vger_from_mbox: drop redundant "use" statements
PublicInbox::InboxWritable takes care of those imports.
2020-02-06treewide: run update-copyrights from gnulib for 2019
I didn't wait until September to do it, this year!
2019-10-16config: support "inboxdir" in addition to "mainrepo"
"mainrepo" ws a bad name and artifact from the early days when I intended for there to be a "spamrepo" (now just the ENV{PI_EMERGENCY} Maildir). With v2, "mainrepo" can be especially confusing, since v2 needs at least two git repositories (epoch + all.git) to function and we shouldn't confuse users by having them point to a git repository for v2. Much of our documentation already references "INBOX_DIR" for command-line arguments, so use "inboxdir" as the git-config(1)-friendly variant for that. "mainrepo" remains supported indefinitely for compatibility. Users may need to revert to old versions, or may be referring to old documentation and must not be forced to change config files to account for this change. So if you're using "mainrepo" today, I do NOT recommend changing it right away because other bugs can lurk. Link: https://public-inbox.org/meta/874l0ice8v.fsf@alyssa.is/
2019-09-09run update-copyrights from gnulib for 2019
2018-04-06ensure Xapian and SQLite are still optional for v1 tests
Xapian is size-intensive and SQLite is not strictly necessary for v1.
2018-04-01scripts/import_vger_from_mbox: set address properly
For objects like Inbox; the '-' prefixed hash keys are probably intended for auto-generated/hidden parameters.
2018-03-20InboxWritable: add mbox/maildir parsing + import logic
This will make it easier to as well as supporting future Filter API users. It allows simplifying our ad-hoc import_vger_from_mbox script.
2018-03-19scripts/import_vger_from_mbox: filter out same headers as MDA
Perhaps we should filter these headers out in Import
2018-03-06scripts/import_vger_from_mbox: perform mboxrd or mboxo escaping
It appears most of the mboxes in the archive I've been given are mboxrd (despite having Content-Length:) and needs the escaping.
2018-03-06favor Received: date over Date: header globally
The first Received: header is believable since it typically hits the user's mail server and can be treated as relatively trustworthy. We still show the Date: in per-message (permalink) views, which may expose users for having incorrect Date: headers, but all the ISO YYYY-MM-DD dates we display will match what we see.
2018-02-28use PublicInbox::MIME consistently
It works around some bugs in older Email::MIME which we'll find useful.
2018-02-22import_vger_from_mbox: use PublicInbox::MIME and avoid clobbering
It is less confusing without the clobber assignment; and PublicInbox::MIME exists to workaround bugs in older Email::MIME (which is in Debian 9 (stretch))
2018-02-22import_vger_from_inbox: allow "-V" option
This will let us quickly test between v2 and v1 inboxes.
2018-02-20v2: support Xapian + SQLite indexing
This is too slow, currently. Working with only 2017 LKML archives: git-only: ~1 minute git + SQLite: ~12 minutes git+Xapian+SQlite: ~45 minutes So yes, it looks like we'll need to parallelize Xapian indexing, at least.
2018-02-19v2writable: initial cut for repo-rotation
Wrap the old Import package to enable creating new repos based on size thresholds. This is better than relying on time-based rotation as LKML traffic seems to be increasing.
2018-02-15scripts/import_vger_from_mbox: use v2 layout for import
Big lists are orders of magnitude more efficient with v2.
2018-02-13scripts/import_vger_from_mbox: support --dry-run option
This can be useful for getting baseline of performance of just Email::MIME and Date: header parsing. We'll need to do some Date: header parsing for LKML since there are some wonky date formats which causes the git RFC822 parser to choke.
2018-02-08scripts/import_vger_from_mbox: relax From_ line match slightly
The mboxes I got from cregit have two spaces after the email address, while the "git format-patch" output I'm used to dealing with only has one space. It's still a "strict" match in that it checks for something resembling a timestamp, but it relaxes the number of spaces between the email address and date.
2018-02-07update copyrights for 2018
Using update-copyrights from gnulib While we're at it, use the SPDX identifier for AGPL-3.0+ to ease mechanical processing.
2016-07-28add script used for importing git from download.gmane.org
In case others want to use it...