about summary refs log tree commit homepage
DateCommit message (Collapse)
2019-10-23Merge branch 'regen'
* regen: v2writable: use msgmap as multi_mid queue v2writable: move git->cleanup to the correct place v2writable: reindex handles 3-headered monsters v2writable: improve "num_for" API and disambiguate v2writable: set unindexed article number
2019-10-22doc: clean-doc: remove generated standards.txt
I need to remove all the generated documentation files before running "git-set-file-times" for rsync to our website.
2019-10-22syscall: get rid of sendfile wrappers for now
I'm not sure they'll make a measurable difference or will be worth the effort in the future given the prevalance of HTTPS and giant socket buffers. Using Inline::C for this may make more sense in the future, too, especially if we want to be able to use GnuTLS.
2019-10-22hval: remove new_oneline
commit 476fc666c223f0fb ('reduce "PublicInbox::Hval->new_oneline" use') was mis-titled, since it completely eliminated ->new_oneline use.
2019-10-22git: remove src_blob_url
This was intended for solver, but it's unused since commit 915cd090798069a4 ("solver: switch patch application to use a callback")
2019-10-22watchmaildir: remove redundant _path_to_mime
InboxWritable::maildir_path_load exists and we may support it for use with standalone scripts.
2019-10-22inboxwritable: import_maildir uses maildir_path_load
I'm not sure if this will get used anywhere, but at least call a function which exists in dead code.
2019-10-22www: remove unused ctx_get sub
This hasn't been used since commit 48b21cb662c1e17b7 in 2016: ("declare Inbox object for reusability")
2019-10-22overidx: remove unused delete_articles sub
This hasn't been used since commit 1b7e935ab1690e28 ("searchidx: fix incremental index with indexlevel=basic on v1")
2019-10-22v2writable: use msgmap as multi_mid queue
Instead of storing Message-IDs in the Msgmap object, we can store the blob OID. For initial indexing of mirrors, this lets us preserve $sync->{regen} by storing the intended article number in the queue. On --reindex, the article number we store in Msgmap is ignored but only used for ordering purposes. This also allows us to avoid ENOMEM errors if somebody abuses our system by reusing Message-IDs; but we now risk ENOSPC instead (but systems tend to have more FS storage than RAM).
2019-10-22v2writable: move git->cleanup to the correct place
We need to stop the git process to avoid leaking FDs to Xapian if we recurse ->index_sync on reindex.
2019-10-21v2writable: reindex handles 3-headered monsters
And maybe 8-headered ones, too... I noticed --reindex failing on the linux-renesas-soc mirror due one 3-headed monster of a message having 3 sets of headers; while another normal message had a Message-ID that matched one of the 3 IDs of the 3-headed monster. We still try to do the majority of indexing backwards, but we defer indexing multi-Message-ID'd messages until the end to ensure we get all the "good" messages in before we process the multi-headered ones. Link: https://public-inbox.org/meta/20191016211415.GA6084@dcvr/
2019-10-21v2writable: improve "num_for" API and disambiguate
Make it obvious that we're not the Msgmap sub and return an array because it's less awkward than providing a modifiable ref to a function to write to.
2019-10-21v2writable: set unindexed article number
We'll actually use the keys of this hash in future commits.
2019-10-21doc: update 1.2 work-in-progress release notes
Will be updating this further after some reindex and multi-header bugs are fixed.
2019-10-21TODO: DHT (distributed hash table) for Message-IDs
This would encourage SPOF avoidance. It would also make it easier to do "Certificate Transparency"-style attestation of messages.
2019-10-18examples/grok-pull.post_update_hook: fix config detection
We need to account for both the old ("mainrepo") and new ("inboxdir") names. But "dir" was just a search+replace error and we don't use that outside of "coderepo.dir".
2019-10-17TODO: add a few more items around coderepos
Going where no email client/tool has gone before...
2019-10-17doc: v2-format: get man output under 80 cols
We need to better ensure our manpage output is readable with a standard terminal width. And fix some wording while we're at it: * use "inbox" instead of "list" for our storage * replace the last "$PART" reference with "$SHARD"
2019-10-17public-inbox-v2-format(5): fix formatting
This was being rendered as a paragraph, so line breaks weren't preserved and it was unreadable in man.
2019-10-17Merge remote-tracking branch 'origin/inboxdir'
* origin/inboxdir: config: remove redundant inboxdir check config: support "inboxdir" in addition to "mainrepo" examples/grok-pull.post_update_hook: use "inbox_dir"
2019-10-17doc: enable "check-man" target via "check" in gmake
man(1) on FreeBSD supports pathnames as operands just fine, so there's hope other BSDs follow suit and we can enable this check target everywhere.
2019-10-17doc: avoid [<directory>] arg for git-clone(1)
While it is possible to host source code from the root of a URL using git-http-backend(1), the lack of pathname in the URL can also be confusing to users. So just add the path name of the project into the URL itself so users can invoke "git clone" with one command-line argument instead of two. Of course, previously documented URLs continue to work as normal.
2019-10-17doc: update public-inbox-overview(7) for v2
The overview was v1-specific and probably confusing/misleading to new users since v2 is favored. Hopefully improve wording while we're at it and avoid overloading terms like "parts" (which could be confused with Xapian "shards"). Using the word "directly" after "Mirroring mailing lists" did not make sense to me, either.
2019-10-16doc: check-man: use COLUMNS env for width
That's the environment documented in ncurses(3) and man(1) from the man-db.nongnu.org distribution of man.
2019-10-16config: remove redundant inboxdir check
This was causing compatibility problems for old configs when using public-inbox-nntpd.
2019-10-16config: support "inboxdir" in addition to "mainrepo"
"mainrepo" ws a bad name and artifact from the early days when I intended for there to be a "spamrepo" (now just the ENV{PI_EMERGENCY} Maildir). With v2, "mainrepo" can be especially confusing, since v2 needs at least two git repositories (epoch + all.git) to function and we shouldn't confuse users by having them point to a git repository for v2. Much of our documentation already references "INBOX_DIR" for command-line arguments, so use "inboxdir" as the git-config(1)-friendly variant for that. "mainrepo" remains supported indefinitely for compatibility. Users may need to revert to old versions, or may be referring to old documentation and must not be forced to change config files to account for this change. So if you're using "mainrepo" today, I do NOT recommend changing it right away because other bugs can lurk. Link: https://public-inbox.org/meta/874l0ice8v.fsf@alyssa.is/
2019-10-16examples/grok-pull.post_update_hook: use "inbox_dir"
Move away from using "mainrepo" since it's confusing to new users, especially with v2.
2019-10-16doc: check-man: ignore backspace char
man(1) on FreeBSD unconditionally emits backspace characters for the bold effect despite its output being piped to awk(1). Also tested with the man-db.nongnu.org version provided with Debian (and presumably most other Linux systems).
2019-10-16mda: support --no-precheck option
Since -mda now supports List-ID to better support mirroring of existing mailing lists, it probably makes sense to support disabling the precheck function to provide more accurate (though potentially spammier) mirrors of lists
2019-10-16Merge branch 'listid'
* listid: wwwtext: show listid config directive(s) mda, watch: wire up List-ID header support config: allow "0" as a valid mainrepo path config: avoid unnecessary '||' use config: simplify lookup* methods config: we always have {-section_order} Config.pm: Add support for mailing list information
2019-10-16admin: show failing directory
Since public-inbox-index may be run against a large list of (intended) inboxes from the command-line, it's helpful to show which directory fails the resolution.
2019-10-16doc: "check-man" target to ensure we stay <=80 cols
This should prevent future documentation changes from exceeding the limit of standard terminals.
2019-10-16doc: check-NEWS.atom fails gracefully on FreeBSD make(1)
We should also note that the package "xmlstarlet" on FreeBSD installs a command "xml" (but not "xmlstarlet") on FreeBSD.
2019-10-15wwwtext: show listid config directive(s)
We want to share this piece for potential mirror-ers just like watchheader.
2019-10-15mda, watch: wire up List-ID header support
This also adds watchheader tests for -watch, which we never had before :x
2019-10-15config: allow "0" as a valid mainrepo path
It's probably wrong to use relative path names, but things are all relative these days anyways with shared and networked FSes.
2019-10-15config: avoid unnecessary '||' use
'//' is available in Perl 5.10+ which allows `0' and `""' (empty string) to remain unclobbered. We also don't need '||=' for initializing our internal caches.
2019-10-15config: simplify lookup* methods
This ensures we always process inboxes in section order and reduces the amount of code we have to maintain for each lookup. Avoiding the cost of inboxes object creation is not worth the code overhead; and we can implement a config cache via Storable easily for large configs and -mda users.
2019-10-15config: we always have {-section_order}
Rewrite a bunch of tests to use ordered input (emulating "git config -l" output) so we can always walk sections in the order they were given in the config file.
2019-10-15Config.pm: Add support for mailing list information
The world has turned since I first started following mailing lists and to my surprise every mailing list that I am subscribed to properly sets the "List-ID:" mailing list header. So instead of doing something clever and flexible I am adding support for looking up public inbox mailing lists by their mailing list name. That makes the work needed for each email trivial and easy to understand. - Parse the "List-ID:" header. - Lookup in the configuration which mailbox is connected to that "List-ID:" - Deliver the mail to that mailbox. To that end this change enhances PublicInbox to have an additional mailbox configuration parameter "listid" that holds the mailing list name. A method is added to the PublicInbox config object called lookup_list_id that given a mailing list name will return the PublicInbox in the configuration that is configured to handle that mailing list. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> [ew: avoid autovivification of $ibx->{listid} for t/config.t]
2019-10-15PublicInbox::Import Smuggle a raw message into add
I don't trust the MIME type to not munge my email messages in horrible ways upon occasion. Therefore allow for passing in the raw message value instead of trusting the mime object to preserve it. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> [ew: use "//" from Perl 5.10+ for defined check]
2019-10-15doc: remove unnecessary dependency on RelNotes directory
It was causing unnecessary rebuilds of NEWS* files
2019-10-15INSTALL: recommend inotify|kqueue modules for -watch
Jan Kiszka reported high polling frequency when using -watch. It turns out OS-specific packages for Filesys::Notify::Simple do not pull in interfaces to use kqueue or inotify, which are required to perform power-efficient event-based wakeups on Maildir writes. Fix the name of the Filesys::Notify::Simple for FreeBSD while we're at it. Link: https://public-inbox.org/meta/c85803c6-6d77-a300-491a-9f310dd284c1@web.de/
2019-10-15TODO: add an item for Python pygments
Pygments seems to be a popular highlighter and widely available, so we'll be providing support for that at some point... Link: https://public-inbox.org/meta/20190926131836.GB10467@chatter.i7.local/ Link: https://public-inbox.org/meta/874l0zt7sd.fsf@alyssa.is/
2019-10-14TODO: add item for config linter and grapher
It'll be useful for tools and users to test and perhaps visualize configs before reloading -httpd/-nntpd/-watch.
2019-10-10t/git-http-backend: disable worker processes
We want to ensure we run lsof(8) on the worker (if needed), and not the master, which doesn't serve requests. This was originally on top of a test-only patch in https://public-inbox.org/meta/20190913015043.17149-1-e@80x24.org/ In any case, no point in spawning extra processes for this test.
2019-10-10doc: explain publicinbox.<name>.watchheader
It wasn't clear to me exactly what this does -- in particular, what happens if it isn't specified? Does it support multiple values? A very brief explanation can answer both of these questions without making somebody look at the code.
2019-10-09doc: use local modules to generate NEWS*
We shouldn't need installed modules to generate NEWS* files.
2019-10-09INSTALL: note that we prefer GNU make
ExtUtils::MakeMaker uses non-POSIX '::', at least; and our own Documentation/include.mk and our postamble are written for GNU make. GNU make is also more widely-installed and available than any other make; even if I'm not generally a fan of GNU-isms.