Date | Commit message |
|
We will need to allow simultaneous iterators on the same
config object, since we'll need this for ExtMsg, NNTPD,
WwwListing, NewsWWW, and other places.
|
|
In Perl, we can simplify callers by passing a single array
all the way down the stack instead of a single array ref which
needs to be expanded every call.
|
|
Just some golfing to reduce scrolling and, hopefully, improve readability.
|
|
This gives better page cache utilization for Xapian indexing on
slow storage by improving locality for random I/O activity on
the Xapian DB.
Instead of doing a single pass to index both SQLite and Xapian,
this indexes them separately. The first pass is identical to
indexlevel=basic: it indexes both over.sqlite3 and msgmap.sqlite3.
Subsequent passes only operate on a single Xapian shard for
documents belonging to that shard. Given enough shards, each
individual shard can be made small enough to fit into the kernel
page cache and avoid HDD seeks for read activity.
Doing rough tests with a busy system with a 7200 RPM HDD with ext4,
full indexing of LKML (9 epochs) goes from ~80 hours (-j0) to
~30 hours (-j8) with 16GB RAM with 7 shards configured and fsync(2)
disabled (--no-sync) and `--batch-size=10m'.
|
|
"\n" and other characters requiring quoting and/or escaping in
$GIT_DIR/objects/info/alternates were not supported in git 2.11
and earlier; nor do they seem to be supported at all in libgit2.
This will allow us to support sharing git-cat-file or similar
endpoints across multiple inboxes via alternates.
This breaks an existing use case for anybody wacky
enough to put `\n' in the `inboxdir' pathname; but I doubt
this affects anybody.
|
|
Since we have IMAP client support in -watch, make sure per-URL
settings are familiar to git users by taking advantage of git's
URL matching abilities.
This requires git 1.8.5+, which most users ought to have
(though base CentOS 7 is on 1.8.3).
|
|
Finish up the IMAP-only portion of iterative config reloading,
which allows us to create all sub-ranges of an inbox up front.
The InboxIdler still uses ->each_inbox which will struggle with
100K inboxes.
Having messages in the top-level newsgroup name of an inbox will
still waste bandwidth for clients which want to do full syncs
once there's a rollover to a new 50K range. So instead, make
every inbox accessible exclusively via 50K slices in the form of
"$NEWSGROUP.$UID_MIN-$UID_END".
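As a rough illustration of the naming scheme above, here is a minimal sketch that maps a UID to the name of the slice holding it. The helper name `slice_name`, the zero-based inclusive range boundaries, and the example newsgroup are assumptions made for this sketch; the actual code may draw the boundaries differently.

```perl
use strict;
use warnings;

# Hypothetical helper: map an IMAP UID to the name of the 50K slice
# holding it.  Zero-based, inclusive ranges are an assumption here.
my $SLICE_SIZE = 50_000;

sub slice_name {
    my ($newsgroup, $uid) = @_;
    my $uid_min = int($uid / $SLICE_SIZE) * $SLICE_SIZE;
    my $uid_end = $uid_min + $SLICE_SIZE - 1;
    return "$newsgroup.$uid_min-$uid_end";
}

# Example newsgroup name is made up for illustration.
print slice_name('inbox.comp.example', 123_456), "\n";
```

Under these assumptions, a client that has already synced older slices only needs to fetch the newest one after a rollover.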
This introduces the DummyInbox, which makes $NEWSGROUP
and every parent component a selectable, empty inbox.
This aids navigation with mutt and possibly other MUAs.
Finally, the xt/perf-imap-list maintainer test is broken, now,
so remove it. The grep perlfunc is already proven effective,
and we'll have separate tests for mocking out ~100k inboxes.
|
|
This will be used to prevent reloading a giant config with
tens/hundreds of thousands of inboxes from blocking the event
loop.
|
|
The watchheader key supports only a single value. Supporting multiple
watchheader values was mentioned in discussion [1] of 8d3e3bd8 (doc:
explain publicinbox.<name>.watchheader, 2019-10-09), and it wasn't
clear if there was a need.
One scenario in which matching multiple headers would be convenient is
when someone wants to set up public-inbox archives for some small
projects but does _not_ want to run mailing lists for them, instead
allowing others to follow the project by any of the pull mechanisms.
Using a common underlying address, an address alias for each project
is configured via a third-party email provider, with messages for each
alias being exposed as a separate public-inbox archive. In this
setup, messages for an inbox cannot be selected by a List-ID header
but can be identified by the inbox's address in either the To or Cc
header.
To support such a use case, update the watchheader handling to
consider multiple values, accepting a message if it matches any value.
While selecting a message based on matching _any_ rather than _all_
values is motivated by the above scenario, it's worth noting that the
"any" behavior is consistent with how multiple listid config values
are handled.
[1] https://public-inbox.org/meta/20191010085118.r3amey4cayazfycb@dcvr/
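The "any" semantics described above can be sketched roughly as follows. The function name `watchheader_match`, the rule layout (pairs of header name and regexp), and the addresses are assumptions for illustration and are not taken from the actual watch code.

```perl
use strict;
use warnings;

# Hypothetical matcher: $rules is a list of [ header name, regexp ] pairs,
# one per configured watchheader value; $headers maps lowercased header
# names to their raw values.  Returns true if ANY rule matches.
sub watchheader_match {
    my ($rules, $headers) = @_;
    for my $rule (@$rules) {
        my ($name, $re) = @$rule;
        my $val = $headers->{lc $name} // next; # header absent: try next rule
        return 1 if $val =~ $re;
    }
    return 0;
}

# Made-up alias address checked in both To and Cc.
my $rules = [
    [ 'To', qr/\bproj-a\@example\.com\b/i ],
    [ 'Cc', qr/\bproj-a\@example\.com\b/i ],
];
my $msg = { to => 'someone@example.org', cc => 'proj-a@example.com' };
print watchheader_match($rules, $msg) ? "accept\n" : "skip\n";
```

Requiring all rules to match would reject this message, since the alias appears only in Cc.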
|
|
This allows for a setup where a central config file for the web server
includes per-user config files.
|
|
I didn't wait until September to do it, this year!
|
|
Since we support inboxes with multiple URLs and multiple
infourls to reduce reliance on SPOFs, we'll do the same with
cgit URLs.
|
|
cgitrc files can have hundreds or thousands of lines in them and
slurping them into memory is a waste. "while (<$fh>)" only
reads one line at a time, whereas "for (<$fh>)" reads the entire
contents of the file into a temporary array.
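A small sketch of the difference, using an in-memory filehandle in place of a real cgitrc. The sample lines and variable names are made up for illustration.

```perl
use strict;
use warnings;

# An in-memory file stands in for a (possibly huge) cgitrc here.
my $data = join('', map { "repo.url=project$_\n" } 1..3);

# List context: <$fh> returns every remaining line at once, so the whole
# file is held in a list before any line is processed.  "for (<$fh>)"
# evaluates <$fh> in list context in the same way.
open(my $fh, '<', \$data) or die "open: $!";
my @all = <$fh>;
close $fh;

# Scalar context: "while (<$fh>)" reads a single line per iteration.
open($fh, '<', \$data) or die "open: $!";
my $n = 0;
while (<$fh>) {
    chomp;
    $n++ if /\Arepo\.url=/;
}
close $fh;

print scalar(@all), " lines slurped, $n lines streamed\n";
```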
|
|
Most spawn and popen_rd callers die on failure to spawn,
anyways, and some are missing checks entirely. This saves
us a bunch of verbose error-checking code in callers.
This also makes popen_rd more consistent, since it already
dies on pipe creation failures.
|
|
No point in lazy-loading these, since they're always loaded
anyways and would not have portability problems on systems with
minimal dependencies.
|
|
Since the beginning of this project, we've implicitly supported
inboxes with multiple URLs by relying on the Host: header sent
by the client ($env->{HTTP_HOST}).
We now offer the option to explicitly configure multiple URLs for
every inbox along with the ability to do a best-effort match for
matching hostnames.
|
|
Another place where we can replace anonymous subs with named
subs by passing a user-supplied arg.
|
|
This was causing compatibility problems for old configs
when using public-inbox-nntpd.
|
|
"mainrepo" was a bad name and an artifact from the early days when I
intended for there to be a "spamrepo" (now just the
ENV{PI_EMERGENCY} Maildir). With v2, "mainrepo" can be
especially confusing, since v2 needs at least two git
repositories (epoch + all.git) to function and we shouldn't
confuse users by having them point to a git repository for v2.
Much of our documentation already references "INBOX_DIR" for
command-line arguments, so use "inboxdir" as the
git-config(1)-friendly variant for that.
"mainrepo" remains supported indefinitely for compatibility.
Users may need to revert to old versions, or may be referring
to old documentation and must not be forced to change config
files to account for this change.
So if you're using "mainrepo" today, I do NOT recommend changing
it right away because other bugs can lurk.
Link: https://public-inbox.org/meta/874l0ice8v.fsf@alyssa.is/
|
|
It's probably wrong to use relative path names, but things are
all relative these days anyways with shared and networked FSes.
|
|
'//' is available in Perl 5.10+ which allows `0' and `""' (empty
string) to remain unclobbered.
We also don't need '||=' for initializing our internal caches.
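To illustrate the difference between '||' and '//' mentioned above, here is a short sketch. The hash keys and default values are invented for the example.

```perl
use strict;
use warnings;

# Made-up settings where 0 and "" are legitimate, intentional values.
my %cfg = (depth => 0, prefix => '');

# '||' tests truth, so 0 and "" are replaced by the fallback.
my $depth_or  = $cfg{depth}  || 5;
my $prefix_or = $cfg{prefix} || 'x';

# '//' only tests definedness, so 0 and "" are kept.
my $depth_dor  = $cfg{depth}  // 5;
my $prefix_dor = $cfg{prefix} // 'x';

# A key that is truly absent still gets the fallback.
my $missing = $cfg{missing} // 'default';

print "|| gives depth=$depth_or prefix='$prefix_or'\n";
print "// gives depth=$depth_dor prefix='$prefix_dor' missing=$missing\n";
```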
|
|
This ensures we always process inboxes in section order and
reduces the amount of code we have to maintain for each lookup.
Avoiding the cost of inboxes object creation is not worth the
code overhead; and we can implement a config cache via Storable
easily for large configs and -mda users.
|
|
Rewrite a bunch of tests to use ordered input (emulating
"git config -l" output) so we can always walk sections in
the order they were given in the config file.
|
|
The world has turned since I first started following mailing lists and
to my surprise every mailing list that I am subscribed to properly
sets the "List-ID:" mailing list header. So instead of doing
something clever and flexible I am adding support for looking up
public inbox mailing lists by their mailing list name.
That makes the work needed for each email trivial and easy to understand.
- Parse the "List-ID:" header.
- Look up in the configuration which mailbox is connected to that
"List-ID:"
- Deliver the mail to that mailbox.
To that end this change enhances PublicInbox to have an additional
mailbox configuration parameter "listid" that holds the mailing list
name.
A method is added to the PublicInbox config object called
lookup_list_id that given a mailing list name will return the
PublicInbox in the configuration that is configured to handle that
mailing list.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
[ew: avoid autovivification of $ibx->{listid} for t/config.t]
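The three steps listed in the message above can be sketched roughly as follows. This is not the actual implementation: the `%by_list_id` map stands in for the config object, `parse_list_id` is a hypothetical helper, and `lookup_list_id` is written as a plain function here rather than a method.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the config object: a map from lowercased
# list-id to the inbox configured with that "listid" value.
my %by_list_id = (
    'foo.example.com' => { name => 'foo' },
);

# Extract the identifier between the final <...> of a List-ID header,
# ignoring any leading description phrase.
sub parse_list_id {
    my ($hdr) = @_;
    return $hdr =~ /<\s*([^<>\s]+)\s*>\s*\z/ ? lc($1) : undef;
}

# Return the inbox configured for a list-id, or undef if there is none.
sub lookup_list_id {
    my ($list_id) = @_;
    return defined($list_id) ? $by_list_id{lc $list_id} : undef;
}

my $id = parse_list_id('Foo discussion <foo.example.com>');
my $ibx = lookup_list_id($id);
print $ibx ? "deliver to $ibx->{name}\n" : "no inbox configured\n";
```

Using `exists`-free hash reads on `%by_list_id` (rather than on the inbox object itself) sidesteps the kind of autovivification noted in the trailer.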
|
|
This allows us to deal with newlines in config values,
since git-config(1) acquired "-z" support in git v1.5.3.
I'm not sure if it's actually useful in our case, but
maybe some multi-line texts could be added. And newlines
in path names are super useful!
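A minimal sketch of parsing "-z" output follows, assuming the `git config -z -l` layout in which each entry ends with NUL and the first newline separates the key from the value. The function name `parse_config_z` and the sample keys and values are made up for illustration.

```perl
use strict;
use warnings;

# $raw stands in for "git config -z -l" output: every entry is
# terminated by NUL, and the first newline splits key from value,
# so values themselves may contain newlines.
my $raw = "publicinbox.foo.address\nfoo\@example.com\0" .
    "publicinbox.foo.description\nline one\nline two\0";

sub parse_config_z {
    my ($buf) = @_;
    my (%cfg, @order);
    for my $ent (split(/\0/, $buf)) {
        my ($k, $v) = split(/\n/, $ent, 2); # limit 2: keep newlines in value
        push @order, $k unless exists $cfg{$k};
        push @{$cfg{$k}}, $v; # keep every value of multi-valued keys
    }
    return (\%cfg, \@order);
}

my ($cfg, $order) = parse_config_z($raw);
print "$_\n" for @$order;
```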
|
|
We need to handle arbitrary integers and case-insensitive
variations of human words to match git-config(1) behavior,
since that's what users would expect given we use config
files parseable by git-config(1).
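A rough sketch of boolean parsing along these lines is shown below. The helper name `git_bool` is hypothetical, and the rules only approximate git-config(1): for instance, unit suffixes such as "k" and empty values are not handled here.

```perl
use strict;
use warnings;

# Hypothetical helper approximating git-config(1) boolean rules:
# true/yes/on and non-zero integers are true; false/no/off and 0 are
# false; anything else yields undef so the caller can pick a default.
sub git_bool {
    my ($val) = @_;
    return undef unless defined $val;
    return 1 if $val =~ /\A(?:true|yes|on)\z/i;
    return 0 if $val =~ /\A(?:false|no|off)\z/i;
    return ($val != 0 ? 1 : 0) if $val =~ /\A[+-]?[0-9]+\z/;
    return undef;
}

for my $v (qw(TRUE off 0 42 maybe)) {
    my $b = git_bool($v);
    print "$v => ", (defined $b ? $b : 'undef'), "\n";
}
```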
|
|
|
|
cgit uses atoi(3), and now we can retain compatibility.
|
|
We will still return a 404 by default to '/' for compatibility
with users of Plack::App::Cascade or similar. Inboxes are
sorted by modification times to help users detect activity
(similar to the /$INBOX/ topic view).
New configuration options:
* publicinbox.wwwlisting - configure the listing type
* publicinbox.<name>.hide - hide a particular inbox from the listing
See changes to public-inbox-config.pod for full descriptions
of the new options.
Requested-by: Leah Neukirchen <leah@vuxu.org>
https://public-inbox.org/meta/871sdfzy80.fsf@gmail.com/
|
|
Followup-to: 6e6f7999361925e4
("cleanup: use '$ibx' consistently when referring to Inbox refs")
|
|
'$inbox' is more human-readable, so reserve it for the more
human-readable name in most cases. Making our variable naming
more consistent should make the code easier-to-review and
harder to screw up.
|
|
We parse cgitrc for "repo.path", while we use "coderepo.dir" to
mean the same thing for non-cgit users. So I ended up confusing
myself, here.
But then again, git uses "--git-dir" and "GIT_DIR", so I suspect
"dir" is the better choice than "path", here.
|
|
Hopefully this gets us closer to matching cgit upstream behavior
(which also lacks tests). We'll still need to support macro
expansion at some point for compatibility...
|
|
We can reduce the configuration needed to run cgit by reusing
the static file handling logic of the dumb git HTTP protocol.
I hate logos and icons, so don't expect public-inbox.org or
80x24.org to ever have those to waste users' bandwidth with :P
But I expect other users to find this useful.
|
|
project_list support still needs to be done.
And tests need to be written... :<
|
|
I mainly need this to enforce RLIMIT_CPU (and RLIMIT_CORE)
when requests come which generate giant, unrealistic diffs.
Per-coderepo limiters may be added in the future. But for now,
I need to prevent cgit from monopolizing resources on my dinky
server.
|
|
This allows users to configure RLIMIT_{CORE,CPU,DATA} using
our "limiter" config directive when spawning external processes.
|
|
Requests intended for cgit are unlikely to conflict with
requests to inboxes. So we can safely hand those requests
off to cgit.cgi.
|
|
We can save admins the trouble of declaring [coderepo "..."]
sections in the public-inbox config by parsing the cgitrc
directly.
Macro expansion (e.g. $HTTP_HOST) is not supported,
yet; but may be in the future.
|
|
There's no reason for us to have git-config(1) warn users when a
config file is entirely missing.
|
|
|
|
For cross-inbox Message-ID resolution; having some sort of
stable ordering makes the most sense. Relying on the
order of the config file seems most natural and allows us
to avoid introducing yet another configuration knob.
|
|
Maybe we'll default to a dark theme to promote energy savings...
See contrib/css/README for details
|
|
|
|
Actually, it turns out git.git/remote.c::valid_remote_nick
rules alone are insufficient. More checking is performed on the
refname in git.git/refs.c::check_refname_component.
I also considered rejecting URL-unfriendly inbox names entirely,
but realized some users may intentionally configure names not
handled by our WWW endpoint for archives they don't want
accessible over HTTP.
|
|
Since "publicinbox" sections are analogous to git remotes, we
may use the same rules for naming git remotes to reduce
cognitive overhead.
Most notably, this allows '.' in the middle of inbox names,
(e.g. "foo.bar") as it's common for email addresses, too.
|
|
This adds a new inbox configuration option 'indexlevel' that can take
the values 'full', 'medium', and 'basic'.
When set to 'full' everything is indexed including the positions
of all terms.
When set to 'medium' everything except the positions of terms is
indexed.
When set to 'basic' terms and positions are not indexed. Only the
Overview database for NNTP is created, which is still quite good and
allows searching for messages by Message-ID, but there are no indexes to
support searching inside the email messages themselves.
Update the reindex tests to exercise the full, medium, and basic code paths.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
|
|
Using update-copyrights from gnulib.
While we're at it, use the SPDX identifier for AGPL-3.0+ to
ease mechanical processing.
|
|
We will also treat all known list addresses as non-obfuscated.
By setting publicinbox.noObfuscate in ~/.public-inbox/config,
this will allow users to disable address obfuscation on a
per-domain or per-address basis.
|
|
This should simplify the rest of our code for handling
the do-not-obfuscate list.
|