Date | Commit message |
|
Mailing lists I watch and mirror may not have the best spam
filtering, and an extra layer should not hurt.
|
|
And improve documentation for existing dependencies, too.
|
|
This should hopefully make it easier to try other anti-spam
systems (or none at all) in the future.
|
|
Inboxes are normally created by Config, but having the
population logic in Inbox should make it easier to mock
for testing.
|
|
Favor Inbox objects as our primary source of truth to simplify
our code. This increases our coupling with PSGI to make it
easier to write tests in the future.
A lot of this code was originally designed to be usable
standalone without PSGI or CGI at all; but that might increase
development effort.
|
|
Only mark seen messages as spam; otherwise it could be
too aggressive and cause problems or overtraining.
We wouldn't want a wayward FIFO ruining our day, either :)
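
A minimal sketch of the "seen" check, in Python rather than the
project's Perl. It assumes the standard Maildir convention where
flags follow ":2," in the filename and "S" means seen; `is_seen`
is a hypothetical name, not a function from this codebase.

```python
import os

def is_seen(path):
    """Return True if a Maildir filename carries the S (seen) flag."""
    base = os.path.basename(path)
    # Flags, if any, follow the last ":2," in the filename.
    _, sep, flags = base.rpartition(":2,")
    return bool(sep) and "S" in flags
```

Anything without an info suffix (a fresh file in new/, or a stray
FIFO) is treated as unseen and would be skipped.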
|
|
We can support spam removal by watching a special "spam"
Maildir, too. We can run public-inbox-learn as a separate
step, and that command will be improved to support
auto-learning, too.
|
|
This should be portable despite the intended use of this
directory being non-portable.
|
|
While we only want to stop our daemons and gracefully destroy
subprocesses, it is common for 'Ctrl-C' from a terminal to kill
the entire pgroup.
Killing an entire pgroup nukes subprocesses like git-upload-pack
and breaks graceful shutdown on long clones. Make a best effort to
ensure git-upload-pack processes are not broken when somebody
signals an entire process group.
Followup-to: commit 37bf2db81bbbe114d7fc5a00e30d3d5a6fa74de5
("doc: systemd examples should only kill one process")
|
|
This will allow us to commonalize HTML generation in the future
and is the start of moving existing HTML generation to a "pull"
streaming model (from the existing "push" one).
Using the getline/close pull model is superior to the existing
$fh->write streaming as it allows us to throttle response
generation based on backpressure from slow clients.
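
A rough Python sketch of the pull model, for illustration only (the
real code is Perl). `PullBody`, `serve`, and the `writable` callback
are hypothetical names; the point is that the server asks for the
next chunk only when the client can take it.

```python
class PullBody:
    """Response body the server pulls from, one chunk per getline()."""

    def __init__(self, chunks):
        self._iter = iter(chunks)
        self.closed = False

    def getline(self):
        # Return the next chunk, or None once the body is exhausted.
        return next(self._iter, None)

    def close(self):
        self.closed = True


def serve(body, sock_write, writable):
    """Pull chunks only while the client can accept them.

    Returns True when the body is finished, False if we stopped early
    because the client was not writable (the caller retries later).
    """
    while writable():
        chunk = body.getline()
        if chunk is None:
            body.close()
            return True
        sock_write(chunk)
    return False
```

With a push-style $fh->write, the application decides when to emit
data; here nothing is generated until serve() is called again.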
|
|
We no longer depend on it for the core code, and tests
are optional for users. Hopefully this makes it
easier to install.
|
|
It should be possible to serve the contents of a public-inbox
over NNTP but not HTTP.
|
|
This removes the Email::Filter dependency as well as the
signature-breaking scrubber code. We now prefer to
reject unacceptable messages and grudgingly (and blindly)
mirror messages we're not the primary endpoint for.
|
|
This is transactional and hopefully safer in case we hit SIGSEGV
or SIGKILL during processing, as the tmp/ copy will remain on
the FS even if DESTROY/END handlers are not called.
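
A sketch of the tmp/ then new/ idea in Python, assuming a
conventional Maildir layout. `deliver` and its naming scheme are
hypothetical, and rename() stands in for whatever the Perl code
actually does to move the file.

```python
import itertools
import os
import socket
import time

_seq = itertools.count()

def deliver(maildir, data):
    """Write data (bytes) under tmp/, then move it into new/."""
    name = "%d.%d_%d.%s" % (time.time(), os.getpid(), next(_seq),
                            socket.gethostname())
    tmp = os.path.join(maildir, "tmp", name)
    new = os.path.join(maildir, "new", name)
    # O_EXCL ensures we never clobber another delivery in progress.
    fd = os.open(tmp, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "wb") as fh:
        fh.write(data)
        fh.flush()
        os.fsync(fh.fileno())
    # If we die before this line, the copy in tmp/ is still on disk.
    os.rename(tmp, new)
    return new
```

Readers of new/ never observe a partially written message, since
the file only appears there after it is complete.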
|
|
This filter API should be independent of Email::Filter and
hopefully less intrusive to long running processes.
|
|
Email::Filter doesn't offer any functionality we need, here;
and our dependency on Email::Filter will gradually be removed
since it (and Email::LocalDelivery) seems abandoned and we
can have finer-grained control by rolling our own Maildir
delivery which can work transactionally.
|
|
Remove mbox tests since mbox is unreliable due to raciness
and incompatible implementations. We will drop support for
mbox emergency destinations, soon.
|
|
Totally unnecessary...
|
|
Since ssoma is optional, here, IPC::Run shall also be optional.
(And it may be removed entirely in the future.)
|
|
Or whatever the appropriate Perl terminology is...
And we will need to do something appropriate for other
encodings, too. I still barely understand Perl Unicode
despite attempting to understand the docs over the years..
|
|
We need to ensure we show the message body ASAP since
the thread generation via Xapian could take a while
and maybe even raise an exception or crash.
|
|
Oops :x Add an additional test for live data for any
unprintable characters, too, since this could be a dangerous
source of HTML injection.
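
For illustration, a Python sketch of escaping both HTML
metacharacters and unprintable characters. `safe_html` is a
hypothetical helper, and rendering control characters as visible
\xNN sequences is an assumption, not necessarily what this commit
does.

```python
import html
import re

# C0 controls except tab, LF and CR, plus DEL.
_CTRL = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\x7f]")

def safe_html(s):
    """Escape HTML metacharacters and make control characters visible."""
    s = html.escape(s, quote=True)
    return _CTRL.sub(lambda m: "\\x%02x" % ord(m.group()), s)
```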
|
|
This should reduce link following for replies and improve
visibility. This should also reduce cache overhead/footprint
for crawlers.
|
|
Plack::Request is unnecessary overhead for this given the
strictness of git-http-backend. Furthermore, having to make
commit 311c2adc8c63 ("avoid Plack::Request parsing body")
to avoid tempfiles should not have been necessary.
|
|
Oops, we totally forgot to automate testing for this :x
|
|
Most of its functionality is in the PublicInbox::Inbox class.
While we're at it, we no longer auto-create newsgroup names
based on the inbox name, since newsgroup names probably deserve
some thought when it comes to hierarchy.
|
|
We don't serve things like robots.txt, favicon.ico, or
.well-known/ endpoints ourselves, but ensure we can be
used with Plack::App::Cascade for others.
|
|
Oops, added a test to prevent regressions while we're at it.
|
|
git has stricter requirements for ident names (no '<>')
which Email::Address allows.
Even in 1.908, Email::Address also has an incomplete fix for
CVE-2015-7686 with a DoS-able regexp for comments. Since we
don't care for or need all the RFC compliance of Email::Address,
avoiding it entirely may be preferable.
Email::Address will still be installed as a requirement for
Email::MIME, but it is only used by
Email::MIME::header_str_set, which we do not use.
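
A Python sketch of pulling a git-safe ident out of a From: header
without a full RFC parser. `git_ident` and its patterns are
hypothetical and deliberately lax; they are not the project's Perl
implementation.

```python
import re

def git_ident(from_header):
    """Return (name, email) with no '<' or '>' left in the name."""
    m = re.search(r"<([^<>\s]+@[^<>\s]+)>", from_header)
    if m:
        email = m.group(1)
        name = from_header[:m.start()]
    else:
        # Bare address with no angle brackets.
        m = re.search(r"[^\s<>]+@[^\s<>]+", from_header)
        email = m.group(0) if m else ""
        name = ""
    # git rejects '<' and '>' in names; drop quotes while we are here.
    name = re.sub(r'[<>"]', "", name).strip()
    return (name or email.split("@")[0], email)
```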
|
|
Having an excessive amount of git-pack-objects processes is
dangerous to the health of the server. Queue up process spawning
for long-running responses and serve them sequentially, instead.
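
The queueing idea, sketched in Python under the assumption of a
single event loop (so no locking). `SpawnQueue`, `submit`, and
`finished` are hypothetical names.

```python
from collections import deque

class SpawnQueue:
    """Start at most `limit` processes at once; queue the rest."""

    def __init__(self, limit=1):
        self.limit = limit
        self.running = 0
        self.waiting = deque()

    def submit(self, start):
        """start is a callable which launches the process."""
        if self.running < self.limit:
            self.running += 1
            start()
        else:
            self.waiting.append(start)

    def finished(self):
        """Call when a process exits; starts the next queued one."""
        self.running -= 1
        if self.waiting:
            self.running += 1
            self.waiting.popleft()()
```

With limit=1, responses are served strictly one after another.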
|
|
Since PSGI does not require Transfer-Encoding: chunked or
Content-Length, we cannot expect random apps we host to chunk
their responses.
Thus, to improve interoperability, chunk at the HTTP layer like
other PSGI servers do. I'm choosing a more syscall-intensive method
(via multiple send(...MSG_MORE) calls) for now to reduce copy + packet
overhead.
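
For reference, the HTTP/1.1 chunk framing itself, sketched in
Python. `chunk` and `chunked` are hypothetical names, and this joins
buffers in userspace instead of using send() with MSG_MORE as
described above.

```python
LAST_CHUNK = b"0\r\n\r\n"

def chunk(data):
    """Frame one piece of a response body as an HTTP/1.1 chunk."""
    if not data:
        # A zero-length chunk would end the response early; skip it.
        return b""
    return b"%x\r\n" % len(data) + data + b"\r\n"

def chunked(body):
    """Wrap an iterable of byte strings in chunked transfer coding."""
    for piece in body:
        yield chunk(piece)
    yield LAST_CHUNK
```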
|
|
Followup-to: commit 24e0219f364ed402f9136227756e0f196dc651aa
("remove GIT_DIR env usage in favor of --git-dir")
|
|
We need to ensure $? is set properly for users.
|
|
This hopefully makes the intent of the code clearer, too.
The HTTP use of the numeric reference for getline
caused problems in Git.pm, already.
|
|
Having a file start with '.' or '-' can be confusing
for users, so do not allow it.
|
|
We shall ensure links continue working for this.
|
|
Email::MIME >= 1.923 and < 1.935 would drop too many newlines
in attachments. This would lead to ugly text files without
a proper trailing newline if using quoted-printable, 7bit, or
8bit. Attachments encoded with base64 were not affected.
These versions of Email::MIME are widely available in Debian 8
(Jessie) and even Ubuntu LTS distros so we will need to support
this workaround for a while.
|
|
msg_iter lets us know the index of the attachment,
allowing us to make more sensible labels and, in a future
commit, hyperlinks to download attachments.
|
|
Unlike Email::MIME::walk_parts, this is non-recursive and gives
depth + index offset information about the part for creating
links for later retrieval.
It is intended for read-only access and changes are not
propagated to the parent; however future versions of it
may clobber bodies or the original version as it iterates
to reduce memory overhead.
It is intended for making it easy to locate attachments within a
message in the WWW view.
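
A non-recursive walk in the same spirit, sketched with Python's
standard email package. This is an illustration only: the function
below borrows the msg_iter name but its (part, depth, index) tuple
and 1-based index are assumptions, not the Perl API.

```python
def msg_iter(msg):
    """Yield (part, depth, index) for each leaf part, without recursion."""
    stack = [(msg, 0, 0)]
    while stack:
        part, depth, idx = stack.pop()
        if part.is_multipart():
            subs = part.get_payload()
            # Push in reverse so parts come out in document order.
            for i in range(len(subs) - 1, -1, -1):
                stack.append((subs[i], depth + 1, i + 1))
        else:
            yield part, depth, idx
```

The depth and index pair is enough to build a stable link to a
given attachment and find it again on a later request.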
|
|
There's no place for them in the commands and we don't take
messages; potentially printing them into a log opened in a
terminal is too dangerous.
Hoist out read_til_dot in the test while we're at it.
|
|
This can be useful for hammering a live HTTP server
with requests to ensure it does not fall over under
load.
|
|
We try to avoid issues like these by using relative URLs
in hrefs, but we can't avoid the problem with Location:
for redirects and Atom feeds which are likely to be
rehosted elsewhere.
We also reorder some of the code to work around a weird
issue on the psgi-plack mailing list:
<20160516073750.GA11931@dcvr.yhbt.net>
(Somewhere on https://groups.google.com/group/psgi-plack
but it's probably not bookmarkable)
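
A sketch of deriving an absolute base URL from the request
environment, for the Location: and Atom cases where relative URLs
are not an option. It is written in Python against a PSGI-style
env hash; `base_url` is a hypothetical name and this is not the
project's code.

```python
def base_url(env):
    """Build scheme://host[:port]/script_name/ from a PSGI-style env."""
    scheme = env.get("psgi.url_scheme", "http")
    host = env.get("HTTP_HOST")
    if not host:
        # No Host: header; fall back to the server's own idea.
        host = env["SERVER_NAME"]
        port = str(env.get("SERVER_PORT", ""))
        default = {"http": "80", "https": "443"}.get(scheme)
        if port and port != default:
            host += ":" + port
    script = env.get("SCRIPT_NAME", "").rstrip("/")
    return "%s://%s%s/" % (scheme, host, script)
```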
|
|
From the beginning, we've avoided objects here in favor
of faster startup time; but it may not be worth it
since a persistent httpd/nntpd is faster and -mda
isn't hit as often.
|
|
A public-inbox is NOT necessarily a mailing list, but it
could serve as an input point for zero, one, or infinite
mailing lists :D
|
|
We'll need to test non-UTF-8 messages at some point, too.
There are lots of legacy-encoded messages in old archives
and I would not bet we behave sanely w.r.t. those.
|
|
The Xapian search index is required for the NNTP server, so
there's no point in calling system() for it like we do in
other tests. This should speed up the test a small amount.
|
|
Ugh, I really need to get off my ass to write automated tests for
an Apache2 + mod_perl config.
|
|
When serving large static files or large packs, we may call
Danga::Socket::write directly to queue up callbacks to resume
reading and defer firing them until the socket is writable.
This prevents us from scheduling writes or buffering until we
know the socket is writable and prevents needless buffering by
Danga::Socket when faced with slow clients.
For smart clones, this comes at the cost of throttling the
output of "git pack-objects" to the speed of the client
connection. This is probably not ideal, but is the behavior of
the standard git-daemon, too; and is preferable to running the
httpd out-of-memory. Buffering to the filesystem may be an
option in the future...
|
|
Process startup times are atrocious for fast tests and there's far
too much setup involved. Rely on git-fast-import instead; but
more work is needed in this area.
|
|
It limits flexibility and makes it harder to switch
to use PublicInbox::Import.
|