Date | Commit message (Collapse) |
|
Since "publicinbox" sections are analogous to git remotes, we
may use the same rules for naming git remotes to reduce
cognitive overhead.
Most notably, this allows '.' in the middle of inbox names,
(e.g. "foo.bar") as it's common for email addresses, too.
|
|
This adds a new inbox configuration option 'indexlevel' that can take
the values 'full', 'medium', and 'basic'.
When set to 'full' everything is indexed including the positions
of all terms.
When set to 'medium' everything except the positions of terms is
indexed.
When set to 'basic' terms and positions are not indexed. Just the
Overview database for NNTP is created. Which is still quite good and
allows searching for messages by Message-ID. But there are no indexes to support
searching inside the email messages themselves.
Update the reindex tests to exercise the full medium and basic code paths
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
|
|
Using update-copyrights from gnulib
While we're at it, use the SPDX identifier for AGPL-3.0+ to
ease mechanical processing.
|
|
We will also treat all known list addresses as non-obfuscated.
By setting publicinbox.noObfuscate in ~/.public-inbox/config,
this will allow users to disable address obfuscation on a
per-domain or per-address basis.
|
|
This should simplify the rest of our code for handling
the do-not-obfuscate list.
|
|
This is lightly-tested and seems to work. I'm still
hesitant to support this, but the alternative of receiving death
threats for displaying unobfuscated addresses seems to
be not worth it.
|
|
This allows us to support centralized mailing lists (which suck,
but better than no mailing list at all).
|
|
There's no need to hold everything in memory, here,
since apparently "foreach" will read everything at
once in array context
(for some reason, I thought Perl5 was smart enough
to avoid creating a temporary array, here...)
|
|
This allows certain inboxes to override the global nntpserver
(perhaps under a different domain).
|
|
This seems like an unnecessary abstraction, or an abstraction
on the wrong level.
|
|
I'm not sure if we'll ever support sharing a config file
with other tools, but maybe we will, and "limiter" is
too generic.
|
|
This allows users to customize by using smaller or larger Atom
feeds than the default value of 25 entries.
|
|
Just having "limiter" in the prefix may confuse
it with something else. Use the full prefix to
avoid this confusion.
|
|
Improve the discoverability of NNTP endpoints for users
who still know what NNTP is.
==> ~/.public-inbox/config <==
; aliases for the locally-run nntpd can be specified in
; the "publicinbox" section:
[publicinbox]
nntpserver = nntp://ou63pmih66umazou.onion/
nntpserver = news.public-inbox.org
; NNTPS is not supported natively, yet,
; but one can use haproxy or similar
; nntpserver = nntps://news.public-inbox.invalid/
; mirrors for specific inboxes may be specified either as full
; NNTP (or NNTPS) URLs, or with the server name only if the
; newsgroup name is specfied for a local NNTP server
[publicinbox "git"]
...
newsgroup = inbox.a.b.c
nntpmirror = nntp://czquwvybam4bgbro.onion/
nntpmirror = hjrcffqmbrq6wope.onion
; there may be a mirror on a different server with a
; different name:
nntpmirror = nntp://news.example.com/differently.named.group
; (And I really need to write manpages for all this...)
|
|
Oops. We will inevitably need to support multiple altids for a
public-inbox one day.
|
|
For some existing mailing list archives, messages are identified
by serial number (such as NNTP article numbers in gmane). Those
links may become inaccessible (as is the current case for
gmane), so ensure users can still search based on old serial
numbers.
Now, I run the following periodically to get article numbers
from gmane (while news.gmane.org remains):
NNTPSERVER=news.gmane.org
export NNTPSERVER
GROUP=gmane.comp.version-control.git
perl -I lib scripts/xhdr-num2mid $GROUP --msgmap=/path/to/gmane.sqlite3
(I might integrate this further with public-inbox-* scripts one day).
My ~/.public-inbox/config as an added "altid" snippet which now
looks like this:
[publicinbox "git"]
address = git@vger.kernel.org
mainrepo = /path/to/git.vger.git
newsgroup = inbox.comp.version-control.git
; relative pathnames expand to $mainrepo/public-inbox/$file
altid = serial:gmane:file=gmane.sqlite3
And run "public-inbox-index --reindex /path/to/git.vger.git"
periodically.
This ought to allow searching for "gmane:12345" to work for
Xapian-enabled instances.
Disclaimer: while public-inbox supports NNTP and stable article
serial numbers, use of those for public links is discouraged
since it encourages centralization.
|
|
|
|
Currently only for git-http-backend use, this allows limiting
the number of spawned processes per-inbox or by group, if there
are multiple large inboxes amidst a sea of small ones.
For example, a "big" repo limiter could be used for big inboxes:
which would be shared between multiple repos:
[limiter "big"]
max = 4
[publicinbox "git"]
address = git@vger.kernel.org
mainrepo = /path/to/git.git
; shared limiter with giant:
httpbackendmax = big
[publicinbox "giant"]
address = giant@project.org
mainrepo = /path/to/giant.git
; shared limiter with git:
httpbackendmax = big
; This is a tiny inbox, use the default limiter with 32 slots:
[publicinbox "meta"]
address = meta@public-inbox.org
mainrepo = /path/to/meta.git
|
|
This fills in the internal lookup hashes and simplifies
callers.
|
|
fork failures are unfortunately common when Xapian has
gigabytes and gigabytes mmapped.
|
|
Inboxes are normally created by Config, but having the
population logic in Inbox should make it easier to mock
for testing.
|
|
This will allow users to run importers off existing mail
accounts where they may not have access to run -mda.
Currently, we only support Maildirs, but IMAP ought to be
doable.
|
|
We still pull it in via Email::LocalDelivery, but that
dependency will go away, soon.
|
|
Most of its functionality is in the PublicInbox::Inbox class.
While we're at it, we no longer auto-create newsgroup names
based on the inbox name, since newsgroup names probably deserve
some thought when it comes to hierarchy.
|
|
It's moved into the Inbox module and we no longer use it
in WWW
|
|
Oops, added a test to prevent regressions while we're at it.
|
|
We may spawn this in a large server process, so be sure
to take advantage of the optional vfork() support when
for folks who set PERL_INLINE_DIRECTORY.
|
|
This hopefully makes the intent of the code clearer, too.
The the HTTP use of the numeric reference for getline
caused problems in Git.pm, already.
|
|
This should make creating test cases easier and faster.
|
|
From the beginning, we've avoided objects here in favor
of faster startup time; but it may not be worth it
since a persistent httpd/nntpd is faster and -mda
isn't hit as often.
|
|
A public-inbox is NOT necessarily a mailing list, but it
could serve as an input point for zero, one, or infinite
mailing lists :D
|
|
For error messages intended to show user error (e.g. giving
invalid options), we add a newline ("\n") at the end to
polluting the output with location information.
However, for diagnosing non-user-triggered errors, we should
show the location of where the error occured.
|
|
We can rely on timely auto-destruction based on reference
counting; reducing the chance of redundant close(2) calls
which may hit the wront FD.
We do care about certain close calls (e.g. writing to a buffered
IO handle) if we require error-checking for write-integrity. In
other cases, let things go out-of-scope so it can be freed
automatically after use.
|
|
These will be reused elsewhere.
|
|
Apparently git allows them, and they're definitely alright
in email addresses.
|
|
Hopefully this gives new hackers a better overview of
how the components relate to each other.
|
|
For list where we are not the primary archival entry point,
defaulting to filter=scrub makes sense since their list
conventions may be more tolerant of HTML and other crap
than we are.
|
|
In the future, it should be possible to use this:
git ls-files | UPDATE_COPYRIGHT_HOLDER='all contributors' \
UPDATE_COPYRIGHT_USE_INTERVALS=2 \
xargs /path/to/gnulib/build-aux/update-copyright
|
|
We may not have PATH available on some servers (e.g. webrick)
and must rely on the hardcoded system PATH. My installation of
IPC::Run does not seem to work without PATH set in the env,
however normal Perl "open" calls work fine.
|
|
It looks ugly in error messages to have redundant slashes.
|
|
%ENV changes do not propagate by default under a mod_perl2
environment, and we might as well work towards being thread-safe.
|
|
Do not repeat ourselves, just use the same description file
gitweb uses to avoid surprising users.
|
|
This may be useful for generating List-Id headers and HTML pages.
|
|
This makes it possible to gradually migrate to new address in case
of list name changes, and is one step closer to operating in
"stealth hijack mode" :)
|
|
We will be using it when setting GIT_COMMITTER_NAME
|
|
We will just use the fallback in Email::Filter to
reduce configuration knobs. Failed messages are failed
messages, do not classify them beyond that.
|
|
Hopefully this makes for less ad-hoc hash access in case
our config format changes.
|
|
We will be reusing the config parsing code for the CGI
script, too.
|
|
|
|
We'll be using git config files after all...
|