path: root/lib/PublicInbox/Import.pm
Date Commit message
2017-01-10introduce PublicInbox::MIME wrapper class
This should fix problems with multipart messages where text/plain parts lack a header (cf. refs/pull/28/head in "git clone --mirror https://github.com/rjbs/Email-MIME.git"). In the future, we may still introduce a streaming interface to reduce memory usage on large emails.
2016-10-16import: failed GC runs are non-fatal
We should not completely kill a process if "git gc --auto" errors out due to a warning or whatnot.
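The non-fatal handling described above can be sketched roughly as follows. This is a Python illustration only, not the Perl code in Import.pm; the helper name run_gc_nonfatal and its cmd parameter are hypothetical, introduced so the sketch can be exercised without a real repository.

```python
import subprocess
import sys


def run_gc_nonfatal(git_dir, cmd=None):
    """Run "git gc --auto" and report failure without raising.

    `cmd` is a hypothetical override (not part of public-inbox) that lets
    callers substitute another command, e.g. in tests.
    Returns True on success, False if gc failed or could not be started.
    """
    if cmd is None:
        cmd = ["git", "--git-dir", git_dir, "gc", "--auto"]
    try:
        # check=False: a non-zero exit status must not raise an exception
        result = subprocess.run(cmd, check=False)
    except OSError as err:
        # The command is missing or not executable: warn and carry on.
        print(f"gc could not start: {err}", file=sys.stderr)
        return False
    if result.returncode != 0:
        print(f"gc exited with {result.returncode} (ignored)", file=sys.stderr)
        return False
    return True
```

The caller gets a boolean it may log or ignore, so a failed GC run never terminates the importing process.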
2016-09-08import: run "git gc --auto" when done
We need to prevent excessive repository growth for public-inbox-watch and public-inbox-mda users.
2016-09-08import: hoist out common run_die subroutine
We will be reusing this in the next commit, too.
2016-09-08import: hoist out _check_path function
This reduces duplication, slightly. We may be using it yet again in a to-be-introduced function (or we may not introduce it).
2016-08-15import: use common address parsing to drop unnecessary quotes
Not sure why or how I missed this before; but the common address parsing routine we have should be more correct. Add a test to ensure excessively quoted names don't make it through, either.
2016-08-12watch: respect altid for incremental watch changes
We need to pass the Inbox object to SearchIdx to get altid mappings properly for incremental imports. TODO: use the Inbox object in more places where it makes sense to do so.
2016-08-09searchidx: release Xapian FDs before spawning git log
This will allow us to release and re-acquire Xapian locks due to the lack of FD_CLOEXEC on some FDs.
2016-08-02search: improve reindexing behavior
For reindexing, fresh Xapian DBs do not count as a reindex, allowing users to blindly use --reindex on the first run on a clean repo. While we're at it, allow indexing to override HEAD ref for multi-head git repos.
2016-07-27localize $/ when using chomp
Callers may have localized $/ to something else, so make sure we chomp the expected character(s) when calling chomp.
2016-06-24watch_maildir: implement optional spam checking
Mailing lists I watch and mirror may not have the best spam filtering, and an extra layer should not hurt.
2016-06-19import: allow messages without subject
Because our WatchMaildir module is liberal about what it accepts, we can potentially have messages without a subject.
2016-06-17import: auto-update index when done
This prevents multiple update processes from stepping over each other while called under the lock, and also allows the new -watch process to update the index iff indexing was desired.
2016-05-25remove Email::Address dependency
git has stricter requirements for ident names (no '<>') than Email::Address, which allows them. Even in 1.908, Email::Address also has an incomplete fix for CVE-2015-7686 with a DoS-able regexp for comments. Since we don't care for or need all the RFC compliance of Email::Address, avoiding it entirely may be preferable. Email::Address will still be installed as a requirement for Email::MIME, but it is only used by Email::MIME::header_str_set, which we do not use.
2016-05-21import: avoid needless git update-server-info
We don't need to update-server-info (or read-tree) if fast import was spawned for removals and no changes were made.
2016-05-12import: fallback to email if '<>' exists in author name
git doesn't handle '<' and '>' characters in the author name at all regardless of quoting, not just matched pairs. So fall back to using the email as the author name since the commit info isn't critical, anyways (shallow clones are fine).
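A minimal sketch of this fallback, written in Python rather than the module's Perl. The function name ident_name is hypothetical and the real implementation may differ in detail.

```python
def ident_name(name, email):
    """Return a name that is safe to use in a git ident line.

    Any '<' or '>' in the name (not only a matched pair) confuses git's
    ident parser, so fall back to the email address in that case.
    """
    if "<" in name or ">" in name:
        return email
    return name
```

For example, ident_name("A <B> C", "a@example.com") returns the email address, while a plain name such as "Alice" is returned unchanged.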
2016-05-12import: normalize body by stripping trailing newlines
Mbox formatters may add extra newlines at the end of the message, and that's not relevant for comparing messages for deletion.
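The comparison could look roughly like the Python sketch below. The function names are hypothetical, and stripping only "\n" (not "\r") is an assumption based on the wording of the commit message.

```python
def normalize_body(body: bytes) -> bytes:
    """Drop trailing newlines, which mbox formatters may have appended."""
    return body.rstrip(b"\n")


def same_body(a: bytes, b: bytes) -> bool:
    """Compare two message bodies, ignoring trailing newlines."""
    return normalize_body(a) == normalize_body(b)
```

Only the end of the body is touched, so blank lines inside the message still count when matching a message for deletion.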
2016-04-28import: run git-update-server-info when done
We should update $GIT_DIR/info/refs for dumb HTTP clients whenever we make changes to the repository. The best place to update is immediately after making commits. This fixes a bug where public-inbox-learn did not properly update $GIT_DIR/info/refs after inserting or removing messages.
2016-04-27import: document API for public consumption
This is probably trivial enough to be final?
2016-04-25remove ssoma dependency
By converting to our git-fast-import-based Import module. This should allow us to be more easily installed.
2016-04-25import: extra check for final byte read
The read could fail entirely and leave $lf undefined.
2016-04-12import: filter out [<>] from user names
It confuses the git ident parser and may not be a great idea to fix in git since it could break interoperability with older versions.
2016-04-11import: use bytes::length for true data length in bytes
git is byte-oriented and fast-import will not tolerate miscalculations. This is necessary for wide characters in commit messages (email Subjects).
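To illustrate why the byte count matters: the "data" command in a git fast-import stream declares the payload length in bytes, not characters. The Python sketch below is an illustration, not the module's Perl code, and the function name data_command is hypothetical.

```python
def data_command(text: str) -> bytes:
    """Build a fast-import "data" command for a text payload.

    The declared length must be the number of bytes in the encoded
    payload; len(text) would undercount when wide characters are present.
    """
    payload = text.encode("utf-8")
    # The trailing LF after the payload is optional in the stream format.
    return b"data %d\n" % len(payload) + payload + b"\n"
```

A subject such as "h\u00e9llo" is five characters but six bytes in UTF-8, so the command must start with "data 6".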
2016-04-11import: set binmode before printing author names
Author names may have wide characters in them, so avoid warnings, as git favors UTF-8 for names and fast-import even requires it for commit messages.
2016-04-11import: initial module + test case
This will allow us to write fast importers for existing archives as well as eventually removing the ssoma dependency for performance and ease-of-installation.