Date | Commit message |
|
Wrap the old Import package to enable creating new repos based
on size thresholds. This is better than relying on time-based
rotation as LKML traffic seems to be increasing.
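The size-based rotation described above might be sketched as follows. This is only an illustration, not the project's Import wrapper: the `EpochRotator` class, its `max_bytes` threshold and the notion of a numbered "epoch" per repository are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class EpochRotator:
    """Pick a repository "epoch" for each message based on bytes written."""

    max_bytes: int      # hypothetical size threshold per repository
    epoch: int = 0      # index of the repository currently written to
    written: int = 0    # bytes accounted to the current repository

    def add(self, message: bytes) -> int:
        """Account for one message and return the epoch it belongs in."""
        size = len(message)
        # Rotate only when the current repository already holds data, so a
        # single oversized message cannot cause endless rotation.
        if self.written > 0 and self.written + size > self.max_bytes:
            self.epoch += 1
            self.written = 0
        self.written += size
        return self.epoch


rotator = EpochRotator(max_bytes=10)
epochs = [rotator.add(m) for m in (b"aaaa", b"bbbb", b"cccc", b"d" * 20, b"e")]
```

Unlike time-based rotation, the number of repositories here grows with traffic volume rather than with the calendar.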
|
|
Big lists are orders of magnitude more efficient with v2.
|
|
This can be useful for getting a baseline for the performance
of just Email::MIME and Date: header parsing. We'll need
to do some Date: header parsing for LKML since there are
some wonky date formats which cause the git RFC822 parser
to choke.
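As a rough illustration of tolerant Date: handling (in Python rather than the Email::MIME code this commit measures), one option is to try a standard parser and report failure instead of aborting. The function name `parse_date_header` and the choice to treat zone-less dates as UTC are assumptions made for this sketch.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def parse_date_header(value: str) -> Optional[datetime]:
    """Return an aware UTC datetime, or None if the header is unusable."""
    try:
        parsed = parsedate_to_datetime(value.strip())
    except (TypeError, ValueError):
        # Older Pythons raise TypeError, newer ones ValueError.
        return None
    if parsed.tzinfo is None:
        # A "-0000" zone yields a naive datetime; assume UTC (assumption).
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc)
```

A caller could fall back to another timestamp (for example, one taken from a Received: header) whenever this returns None.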
|
|
The mboxes I got from cregit have two spaces after the email
address, while the "git format-patch" output I'm used to dealing
with only has one space.
It's still a "strict" match in that it checks for something
resembling a timestamp, but it relaxes the number of spaces
between the email address and date.
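A minimal sketch of such a relaxed-but-strict match is shown below. The pattern is an assumption written for illustration and is not the expression used in the commit; it accepts one or more spaces after the address and still requires something shaped like an asctime timestamp.

```python
import re

# Hypothetical pattern: "From ", a sender token, one or more spaces,
# then an asctime-like timestamp such as "Mon Sep 17 00:00:00 2001".
FROM_LINE = re.compile(
    r"^From \S+ +"                        # sender, then 1+ spaces
    r"[A-Z][a-z]{2} [A-Z][a-z]{2} "       # weekday and month abbreviations
    r"[ \d]\d \d{2}:\d{2}:\d{2} \d{4}$"   # day, HH:MM:SS, year
)


def is_from_line(line: str) -> bool:
    """True if the line looks like an mbox "From " separator."""
    return FROM_LINE.match(line.rstrip("\r\n")) is not None
```

Body lines that merely begin with "From " are still rejected because they lack the timestamp.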
|
|
Using update-copyrights from gnulib
While we're at it, use the SPDX identifier for AGPL-3.0+ to
ease mechanical processing.
|
|
This will be much faster than invoking -mda for every message.
|
|
Oops, due to an old mistake, List-ID was set incorrectly
in the MDA. This could cause some breakage w.r.t. mail filters.
|
|
This makes us closer to git.git style (though I'm not quite sure
why we do this...)
|
|
Based on reading RFC 3986, it seems '@', ':', '!', '$', '&',
"'", '(', ')', '*', '+', ',', ';', '=' are all allowed
in path-absolute where we have the Message-ID.
In any case, it seems '@' is fairly common in path components
nowadays and too common in Message-IDs.
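To illustrate the rule above, here is a small Python sketch (not the project's Perl code) that percent-encodes a Message-ID for use as a single path segment and leaves the characters listed above untouched. The helper name `mid_to_path_segment` is made up for this example.

```python
from urllib.parse import quote

# RFC 3986 sub-delims plus ':' and '@'; letters, digits and "-._~" are
# always left unescaped by quote().
PCHAR_SAFE = "!$&'()*+,;=:@"


def mid_to_path_segment(message_id: str) -> str:
    """Percent-encode a Message-ID for use as one URL path segment."""
    # '/' is deliberately absent from the safe set, so it gets escaped
    # and cannot split the Message-ID across path components.
    return quote(message_id, safe=PCHAR_SAFE)
```

With this, a typical Message-ID containing '@' appears in the URL unchanged, while '/', '%' and spaces are still escaped.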
|
|
I needed to use this to resurrect some messages missing
from my initial downloads from gmane...
|
|
For some existing mailing list archives, messages are identified
by serial number (such as NNTP article numbers in gmane). Those
links may become inaccessible (as is the current case for
gmane), so ensure users can still search based on old serial
numbers.
Now, I run the following periodically to get article numbers
from gmane (while news.gmane.org remains):
    NNTPSERVER=news.gmane.org
    export NNTPSERVER
    GROUP=gmane.comp.version-control.git
    perl -I lib scripts/xhdr-num2mid $GROUP --msgmap=/path/to/gmane.sqlite3
(I might integrate this further with public-inbox-* scripts one day).
My ~/.public-inbox/config has an added "altid" snippet which now
looks like this:
    [publicinbox "git"]
        address = git@vger.kernel.org
        mainrepo = /path/to/git.vger.git
        newsgroup = inbox.comp.version-control.git
        ; relative pathnames expand to $mainrepo/public-inbox/$file
        altid = serial:gmane:file=gmane.sqlite3
And run "public-inbox-index --reindex /path/to/git.vger.git"
periodically.
This ought to allow searching for "gmane:12345" to work for
Xapian-enabled instances.
Disclaimer: while public-inbox supports NNTP and stable article
serial numbers, use of those for public links is discouraged
since it encourages centralization.
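Conceptually, resolving a query such as "gmane:12345" comes down to a serial-number lookup in the SQLite file. The sketch below only illustrates that idea: the `msgmap(num, mid)` table layout, the sample row and the `lookup_altid` helper are assumptions, and the real lookup goes through the Xapian index rather than this code.

```python
import sqlite3
from typing import Optional


def open_demo_map() -> sqlite3.Connection:
    """Build an in-memory stand-in for gmane.sqlite3 (assumed schema)."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE msgmap (num INTEGER PRIMARY KEY, mid TEXT UNIQUE NOT NULL)"
    )
    db.execute(
        "INSERT INTO msgmap (num, mid) VALUES (?, ?)",
        (12345, "abc@example.com"),  # made-up sample row
    )
    return db


def lookup_altid(db: sqlite3.Connection, query: str,
                 prefix: str = "gmane") -> Optional[str]:
    """Map a "prefix:serial" query to a Message-ID, or None if unknown."""
    name, sep, serial = query.partition(":")
    if sep != ":" or name != prefix:
        return None
    if not (serial.isascii() and serial.isdigit()):
        return None
    row = db.execute(
        "SELECT mid FROM msgmap WHERE num = ?", (int(serial),)
    ).fetchone()
    return row[0] if row else None
```

Because the mapping lives in a plain file, old serial-number links stay resolvable even after the original server disappears.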
|
|
This is used to quickly generate an article number to Message-ID
mapping.
Usage:
    NNTPSERVER=news.example.org ./scripts/xhdr-num2mid GROUP >file
|
|
In case others want to use it...
|
|
Oops :x
|
|
SpamAssassin often misses messages which contain viruses,
so ClamAV should fill that gap nicely.
|
|
Unfortunately, people screw up addresses often enough
for this to be a real problem.
|
|
|
|
Otherwise, tempfile() will use the current working directory,
which may not be writable.
|
|
A public-inbox is NOT necessarily a mailing list, but it
could serve as an input point for zero, one, or infinite
mailing lists :D
|
|
Unfortunately, most users still prefer their mail delivered
over SMTP; so we'll at least document mlmmj integration for now
until we can popularize pull-based reading over POP3/NNTP/ssoma.
|
|
In the future, it should be possible to use this:
    git ls-files | UPDATE_COPYRIGHT_HOLDER='all contributors' \
        UPDATE_COPYRIGHT_USE_INTERVALS=2 \
        xargs /path/to/gnulib/build-aux/update-copyright
|
|
We want to be able to reject errors back to the MTA.
|
|
This prevents process growth when importing large messages.
Memory growth could be due to the sliding sbrk window in glibc malloc
or a circular reference in the Email::* Perl code somewhere.
|
|
PublicInbox::Config->lookup won't return unknown keys
|
|
This should alleviate fears of interrupting the process.
|
|
|
|
Some mailing lists (e.g. git@vger.kernel.org) accept messages
via Bcc: and possibly other things which get rejected by
the strict PublicInbox::Filter rules. So rely on ssoma-mda
instead.
This prefers a recent revision of ssoma-mda (commit 7fce38e9
onwards) to display subject/author/date information in the
commit message.
|
|
Apparently it's not a problem with recent archives.
|
|
We start with zero and only store the next valid ID.
|
|
This allows incremental imports of slrn spools, ideal for
tracking lists via gmane.
|
|
Any existing directory should do.
|
|
While we're at it, add a script for easy editing of user prefs.
We need some human-maintained rules based on the spam we get.
It's an imperfect world, but I'd _much_ rather deal with the
occasional spam than require signup/registration to post.
|
|
The old import_gmane_spool script was inflexible,
since we may import from maildir archives as well, so
get everything into maildir, first.
|
|
The ~/.dc-dlvr.pre script for my public-inbox user does this.
|
|
It should be common for a single user to be subscribed to multiple
addresses/lists, so we must use the address before alias expansion.
This partially reverts commit b949afc9edf89dd494cac6255c78b124d58e11a5
|
|
We normally want committer date to be different so we may
track delivery latencies (which do not differ much).
However, the rules for importing are much different and
tend to screw things up when using time ranges with git-rev-list.
|
|
Unfortunately, this means we get rid of parallelization,
as we need to preserve delivery order so HTML indices look
chronological. Order may also affect spam filtering and
training, too.
|
|
We may promote this to be a real script, since public-inbox-mda
is idempotent.
|
|
Trying my best to not forget things; I wrote this years ago.
|
|
|
|
Just use $0 for now, since I suck at naming things.
|
|
|