Date | Commit message |
|
These scripts probably don't offer anything useful now that
lei has fleshed out read-only MH support and v2 outputs.
|
|
Using "make update-copyrights" after setting GNULIB_PATH in my
config.mak
|
|
It shouldn't be hard to make this into a more generic
importer not specific to vger lists.
|
|
PublicInbox::InboxWritable takes care of those imports.
|
|
I didn't wait until September to do it, this year!
|
|
"mainrepo" was a bad name and an artifact from the early days when I
intended for there to be a "spamrepo" (now just the
ENV{PI_EMERGENCY} Maildir). With v2, "mainrepo" can be
especially confusing, since v2 needs at least two git
repositories (epoch + all.git) to function and we shouldn't
confuse users by having them point to a git repository for v2.
Much of our documentation already references "INBOX_DIR" for
command-line arguments, so use "inboxdir" as the
git-config(1)-friendly variant for that.
"mainrepo" remains supported indefinitely for compatibility.
Users may need to revert to old versions, or may be referring
to old documentation and must not be forced to change config
files to account for this change.
So if you're using "mainrepo" today, I do NOT recommend changing
it right away because other bugs can lurk.
Link: https://public-inbox.org/meta/874l0ice8v.fsf@alyssa.is/
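
A minimal sketch of the compatibility fallback described above, assuming the
per-inbox config section has already been parsed into a plain dict.
`resolve_inbox_dir` is a hypothetical helper, not a public-inbox function, and
preferring "inboxdir" over "mainrepo" when both are set is an assumption.

```python
from typing import Mapping, Optional


def resolve_inbox_dir(section: Mapping[str, str]) -> Optional[str]:
    """Return the inbox directory for one parsed config section.

    Assumption: "inboxdir" wins when both keys are present, and the
    legacy "mainrepo" key is still honored when it is the only one set.
    """
    return section.get("inboxdir") or section.get("mainrepo")
```

With this shape, an old config file that only sets "mainrepo" keeps working
unchanged.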
|
|
|
|
Xapian is size-intensive and SQLite is not strictly necessary for v1.
|
|
For objects like Inbox, the '-' prefixed hash keys are
probably intended for auto-generated/hidden parameters.
|
|
This will make things easier for future Filter API users,
and it allows simplifying our ad-hoc import_vger_from_mbox
script.
|
|
Perhaps we should filter these headers out in Import.
|
|
It appears most of the mboxes in the archive I've been given are
mboxrd (despite having Content-Length:) and need the escaping.
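
For reference, a minimal sketch of mboxrd "From " escaping, assuming message
bodies are handled as text with "\n" line endings. The function names are
hypothetical and this is not the code used by the import script.

```python
import re

# mboxrd rule: any body line that is zero or more ">" followed by "From "
# gets one extra ">" when written, and loses exactly one ">" when read.
_ESCAPE = re.compile(r"^(>*From )", re.MULTILINE)
_UNESCAPE = re.compile(r"^>(>*From )", re.MULTILINE)


def mboxrd_escape(body: str) -> str:
    """Prepend one ">" to every line matching ^>*From (writer side)."""
    return _ESCAPE.sub(r">\1", body)


def mboxrd_unescape(body: str) -> str:
    """Strip one ">" from every line matching ^>+From (reader side)."""
    return _UNESCAPE.sub(r"\1", body)
```

Because each pass adds or removes exactly one ">", the transformation is
reversible, which is what distinguishes mboxrd from mboxo.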
|
|
The first Received: header is believable since it typically
hits the user's mail server and can be treated as relatively
trustworthy. We still show the Date: in per-message (permalink)
views, which may expose users who have incorrect Date:
headers, but all the ISO YYYY-MM-DD dates we display will
match what we see.
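
A rough sketch of that preference using Python's standard email package.
Assumptions: "first" means first in header order, the timestamp is whatever
follows the last ";" in the Received: value, and Date: is only a fallback.
`received_or_date` is a hypothetical name.

```python
from datetime import datetime
from email import message_from_string
from email.utils import parsedate_to_datetime
from typing import Optional


def received_or_date(raw: str) -> Optional[datetime]:
    """Return the first parseable Received: timestamp, else Date:, else None."""
    msg = message_from_string(raw)
    for received in msg.get_all("Received", []):
        # The timestamp follows the last ";" of a Received: header value.
        _, sep, stamp = received.rpartition(";")
        if not sep:
            continue
        try:
            return parsedate_to_datetime(stamp.strip())
        except (TypeError, ValueError):
            continue  # unparseable stamp, try the next Received: header
    date = msg.get("Date")
    if date is None:
        return None
    try:
        return parsedate_to_datetime(date)
    except (TypeError, ValueError):
        return None
```

Calling `.date().isoformat()` on the result gives the YYYY-MM-DD form
mentioned above.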
|
|
It works around some bugs in older Email::MIME which we'll
find useful.
|
|
It is less confusing without the clobber assignment, and
PublicInbox::MIME exists to work around bugs in older
Email::MIME (which is in Debian 9 (stretch)).
|
|
This will let us quickly test between v2 and v1 inboxes.
|
|
This is too slow, currently. Working with only 2017 LKML
archives:
git-only: ~1 minute
git + SQLite: ~12 minutes
git + Xapian + SQLite: ~45 minutes
So yes, it looks like we'll need to parallelize Xapian indexing,
at least.
|
|
Wrap the old Import package to enable creating new repos based
on size thresholds. This is better than relying on time-based
rotation as LKML traffic seems to be increasing.
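
The idea of size-based rotation can be sketched as follows. This is an
illustration only: `EpochRotator` and its byte-count threshold are
hypothetical, and the real wrapper around Import may measure size differently.

```python
class EpochRotator:
    """Assign messages to numbered epochs, starting a new one on overflow."""

    def __init__(self, max_bytes: int) -> None:
        self.max_bytes = max_bytes  # hypothetical per-epoch size threshold
        self.epoch = 0
        self.bytes_in_epoch = 0

    def add(self, raw_message: bytes) -> int:
        """Account for one message and return the epoch it belongs to."""
        size = len(raw_message)
        # Rotate only if the current epoch already holds something, so a
        # single oversized message never produces an empty epoch.
        if self.bytes_in_epoch and self.bytes_in_epoch + size > self.max_bytes:
            self.epoch += 1
            self.bytes_in_epoch = 0
        self.bytes_in_epoch += size
        return self.epoch
```

Unlike a calendar-based scheme, the number of epochs here grows with traffic,
so each repository stays near the same size however busy the list becomes.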
|
|
Big lists are orders of magnitude more efficient with v2.
|
|
This can be useful for getting a baseline of performance
of just Email::MIME and Date: header parsing. We'll need
to do some Date: header parsing for LKML since there are
some wonky date formats which cause the git RFC822 parser
to choke.
|
|
The mboxes I got from cregit have two spaces after the email
address, while the "git format-patch" output I'm used to dealing
with only has one space.
It's still a "strict" match in that it checks for something
resembling a timestamp, but it relaxes the number of spaces
between the email address and date.
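
A minimal sketch of such a relaxed-but-strict match. The pattern below is an
assumption for illustration (an asctime-like date after the address), not the
expression used in the script.

```python
import re

# "From ", a sender token, ONE OR MORE spaces, then something resembling
# an asctime timestamp such as "Mon Sep 17 00:00:00 2001".
FROM_LINE = re.compile(
    r"^From \S+ +"                           # sender, then 1+ spaces
    r"[A-Z][a-z]{2} [A-Z][a-z]{2} [ \d]\d "  # weekday, month, day of month
    r"\d{2}:\d{2}:\d{2} \d{4}$"              # HH:MM:SS and year
)


def is_from_line(line: str) -> bool:
    """True if the line looks like an mbox "From " separator."""
    return FROM_LINE.match(line.rstrip("\r\n")) is not None
```

Both the single-space and double-space forms are accepted, while ordinary
prose that merely starts with "From " is rejected.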
|
|
Using update-copyrights from gnulib
While we're at it, use the SPDX identifier for AGPL-3.0+ to
ease mechanical processing.
|
|
In case others want to use it...
|