Date | Commit message |
|
One may build the initial index on a powerful host and transfer
it to a weaker one for incremental indexing. Thus there is
no requirement to have a configured public-inbox for building
the index unless a user needs altid support or some such.
|
|
We want to encourage users to serve repositories. So enable
bitmaps by default so performance suffers less with smart HTTP.
|
|
This can be useful for adding new lists, as restarting is
expensive (but still non-lossy).
|
|
For some existing mailing list archives, messages are identified
by serial number (such as NNTP article numbers in gmane). Those
links may become inaccessible (as is the current case for
gmane), so ensure users can still search based on old serial
numbers.

Now, I run the following periodically to get article numbers
from gmane (while news.gmane.org remains):

    NNTPSERVER=news.gmane.org
    export NNTPSERVER
    GROUP=gmane.comp.version-control.git
    perl -I lib scripts/xhdr-num2mid $GROUP --msgmap=/path/to/gmane.sqlite3

(I might integrate this further with public-inbox-* scripts one day).

My ~/.public-inbox/config has an added "altid" snippet which now
looks like this:

    [publicinbox "git"]
        address = git@vger.kernel.org
        mainrepo = /path/to/git.vger.git
        newsgroup = inbox.comp.version-control.git
        ; relative pathnames expand to $mainrepo/public-inbox/$file
        altid = serial:gmane:file=gmane.sqlite3

And run "public-inbox-index --reindex /path/to/git.vger.git"
periodically.

This ought to allow searching for "gmane:12345" to work for
Xapian-enabled instances.

Disclaimer: while public-inbox supports NNTP and stable article
serial numbers, use of those for public links is discouraged
since it encourages centralization.
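
To illustrate the idea of mapping an old serial number back to a
Message-ID, here is a minimal Python sketch. It is not the
public-inbox implementation (which is Perl and indexes into Xapian);
the table name "msgmap", its "num" and "mid" columns, and the helper
names are assumptions made for this example only.

```python
import sqlite3


def open_msgmap(path=":memory:"):
    """Open (or create) a serial-number-to-Message-ID map.

    The schema here is an assumption for illustration, not
    necessarily what scripts/xhdr-num2mid writes.
    """
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS msgmap ("
        "num INTEGER PRIMARY KEY, "
        "mid TEXT NOT NULL UNIQUE)"
    )
    return db


def resolve_altid(db, query, prefix="gmane"):
    """Return the Message-ID for a 'prefix:NUM' query, or None."""
    name, sep, num = query.partition(":")
    # Reject anything that is not exactly "<prefix>:<ASCII digits>".
    if name != prefix or not sep or not (num.isascii() and num.isdigit()):
        return None
    row = db.execute(
        "SELECT mid FROM msgmap WHERE num = ?", (int(num),)
    ).fetchone()
    return row[0] if row else None
```

A search front end could call resolve_altid(db, "gmane:12345") and
fall back to a normal query when it returns None.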
|
|
This should make tweaking the way we search more efficient
by allowing us to avoid destroying the index every
time we want to change something.
We also give priority to incremental indexing via
public-inbox-{watch,mda} and have manual invocations of
public-inbox-index perform batch updates while releasing
ssoma.lock.
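
The batching idea can be sketched roughly as follows in Python: take
the lock for one batch, release it, then take it again for the next,
so an incremental writer can get in between batches. This is only an
illustration using flock(2); the function names, the batch size, and
the use of flock itself are assumptions, not the actual
public-inbox-index code.

```python
import fcntl
from contextlib import contextmanager


@contextmanager
def locked(path):
    """Hold an exclusive flock on `path` for the duration of the block."""
    with open(path, "a") as fh:
        fcntl.flock(fh, fcntl.LOCK_EX)
        try:
            yield
        finally:
            fcntl.flock(fh, fcntl.LOCK_UN)


def index_in_batches(items, lock_path, index_one, batch_size=100):
    """Index `items` in batches, dropping the lock between batches."""
    done = 0
    for start in range(0, len(items), batch_size):
        with locked(lock_path):
            for item in items[start:start + batch_size]:
                index_one(item)
                done += 1
        # The lock is released here, so another process waiting on
        # the same lock file gets a chance to run before the next batch.
    return done
```

A smaller batch_size gives waiting writers more frequent chances to
run at the cost of more lock traffic.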
|
|
We don't want to leave fast_import_crash_* dumps
around on duplicates.
|
|
Oops :x
|
|
We don't need to care about client IPs anywhere.
|
|
Lighter and ever-so-slightly faster!
Most importantly, this won't do non-obvious stuff behind our
backs like trying to parse a POST request body for a query
string param.
|
|
Oops...
While we're at it, drop blank lines before the "From ", too,
since it could happen.
|
|
This should hopefully make it easier to try other anti-spam
systems (or none at all) in the future.
|
|
fork failures are unfortunately common when Xapian has
gigabytes and gigabytes mmapped.
|
|
This prevents multiple update processes from stepping on
each other while called under the lock, and also allows the
new -watch process to update the index iff indexing was
desired.
|
|
Give users some rope to do their own filtering.
|
|
This will allow users to run importers off existing mail
accounts where they may not have access to run -mda.
Currently, we only support Maildirs, but IMAP ought to be
doable.
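
As a rough illustration of importing from a Maildir, the Python
sketch below walks a Maildir with the standard "mailbox" module and
yields messages it has not seen before. It is not the -watch
implementation; the function name and the "seen" set are assumptions
for this example.

```python
import mailbox


def scan_maildir(path, seen):
    """Yield (key, Message-ID) for each message not already in `seen`.

    `seen` is a set of Maildir keys owned by the caller; it is updated
    in place so repeated scans only report new arrivals.
    """
    md = mailbox.Maildir(path, create=False)
    for key in md.keys():
        if key in seen:
            continue
        msg = md.get_message(key)
        seen.add(key)
        yield key, msg.get("Message-ID")
```

A real importer would hand each new message to its import step and
persist "seen" (or rely on Message-ID deduplication) across runs.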
|
|
This removes the Email::Filter dependency as well as the
signature-breaking scrubber code. We now prefer to
reject unacceptable messages and grudgingly (and blindly)
mirror messages we're not the primary endpoint for.
|
|
Email::Filter doesn't offer any functionality we need here,
and our dependency on Email::Filter will gradually be removed
since it (and Email::LocalDelivery) seem abandoned and we
can have finer-grained control by rolling our own Maildir
delivery which can work transactionally.
|
|
We'll be relying on our spawn implementation, for now;
since it'll be consistent with the rest of our code and
can optionally take advantage of vfork.
|
|
We still pull it in via Email::LocalDelivery, but that
dependency will go away, soon.
|
|
User input is imperfect, do not pollute our mail logs with
warnings we cannot fix. This is documented in the
Email::MIME::ContentType manpage so it should remain supported.
|
|
git has stricter requirements for ident names (no '<>')
which Email::Address allows.
Even in 1.908, Email::Address also has an incomplete fix for
CVE-2015-7686 with a DoS-able regexp for comments. Since we
don't care for or need all the RFC compliance of Email::Address,
avoiding it entirely may be preferable.
Email::Address will still be installed as a requirement for
Email::MIME, but it is only used by
Email::MIME::header_str_set, which we do not use.
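
To illustrate the ident restriction, here is a minimal Python sketch
that strips the characters git rejects in a name. The function name
and the fallback value are assumptions for this example; dropping CR,
LF and NUL as well as '<' and '>' is also an assumption beyond what
the message above states.

```python
def git_ident_name(display_name, fallback="unknown"):
    """Return a name usable in a git ident (no '<', '>' or newlines)."""
    # Drop the characters assumed to be unsafe in an ident name.
    cleaned = "".join(ch for ch in display_name if ch not in "<>\n\r\0")
    # Collapse the whitespace runs left behind by the removal.
    cleaned = " ".join(cleaned.split())
    return cleaned or fallback
```

An empty result falls back to a placeholder so the ident always
carries some name.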
|
|
Since PSGI does not require Transfer-Encoding: chunked or
Content-Length, we cannot expect random apps we host to chunk
their responses.
Thus, to improve interoperability, chunk at the HTTP layer like
other PSGI servers do. I'm choosing a more syscall-intensive method
(via multiple send(...MSG_MORE)) for now to reduce copy + packet
overhead.
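
For readers unfamiliar with the framing, here is a minimal Python
sketch of HTTP/1.1 chunked encoding applied to an app's response
body. It is not the -httpd code; the function name is an assumption.
The size line, payload and trailing CRLF are yielded as separate
pieces rather than joined, loosely mirroring the idea of several
send() calls instead of one copy.

```python
def chunked_body(parts):
    """Frame an iterable of byte strings as an HTTP/1.1 chunked body."""
    for part in parts:
        if not part:
            # A zero-length chunk would end the body early, so skip it.
            continue
        yield b"%x\r\n" % len(part)  # chunk size in hexadecimal
        yield part                   # chunk payload, not copied
        yield b"\r\n"                # chunk terminator
    yield b"0\r\n\r\n"               # last chunk, no trailers
```

The response headers must include "Transfer-Encoding: chunked" for a
client to interpret this framing.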
|
|
From the beginning, we've avoided objects here in favor
of faster startup time; but it may not be worth it
since a persistent httpd/nntpd is faster and -mda
isn't hit as often.
|
|
A public-inbox is NOT necessarily a mailing list, but it
could serve as an input point for zero, one, or infinite
mailing lists :D
|
|
We should update $GIT_DIR/info/refs for dumb HTTP clients
whenever we make changes to the repository. The best place
to update is immediately after making commits.
This fixes a bug where public-inbox-learn did not properly
update $GIT_DIR/info/refs after inserting or removing
messages.
|
|
By converting to use our git-fast-import-based Import
module. This should allow us to be more easily installed.
|
|
Hopefully this modularizes things a little and allows us
to work on a combined super server to save RAM.
|
|
It can confuse Email::MIME if we have it.
|
|
Avoid wasting memory and the risk of potential reference
cycles by dropping the callback ASAP.
|
|
We have per-middleware evals to deal with them being missing;
no need to put an eval around the whole thing and use an
extra level of indentation.
|
|
This allows us to share more code between daemons and avoids
having to make additional syscalls for preparing REMOTE_HOST
and REMOTE_PORT in the PSGI env in -httpd.
This will also make supporting HTTP (and NNTP) over Unix sockets
easier in a future commit.
|
|
We've distilled the daemon code into one public function ("run"),
so avoid polluting the main namespace and just have users
prefix with the full package name for this rarely-used class.
|
|
Vestigial pieces from the nntpd code which aren't needed because
the psgi env already has the "psgi.errors" key.
|
|
We'll have to use it some more before deciding it is a public
interface. I do hope for it to be a usable public interface
one day for other users.
|
|
We do not need to load Plack::Request outside of WWW anymore.
|
|
This makes for better compile-time checking and also helps
document which calls are private for HTTP and NNTP.
While we're at it, use IO::Handle::* functions procedurally,
too, since we know we're working with native glob handles.
|
|
Not everybody will be running this behind a ReverseProxy,
but it's probably the most likely configuration. Anyway,
warn about this and also about Deflater being missing.
|
|
This seems to match more closely with what is expected of Perl
packages based on how blib is used. Hopefully this makes the
top-level source tree less cluttered and things easier to find.
|