Date | Commit message (Collapse) |
|
Output $! for diagnostic purposes since I've noticed this on
two slow machines, today (and seemingly, never prior).
|
|
And while we're at it, ensure searching inside displayable
attachment bodies works.
|
|
"bs:" and "b:" are adapted from mairix(1)
We will also support searching explicitly for quoted vs
non-quoted text via "q:" and "nq:" prefixes since sometimes
readers will not care for quoted text.
In the future, we will support parsing diffs (perhaps when
repobrowse integration is complete).
Note: this roughly doubles the size of the Xapian database due
to the additional information; so this change may not be worth
it.
|
|
We only document the "s:" anyways. While the long name is more
descriptive, the ambiguity makes agnostic caching (by Varnish or
similar) slightly harder and longer URLs are more likely to be
accidentally truncated when shared.
|
|
Sometimes it can be useful to search based on who the
message was sent to, sent by, or Cc:-ed. Of course,
headers can be faked, but they usually are not...
Anyways this mostly matches the behavior of mairix(1).
|
|
Begin documenting some basic help functionality.
I may tweak the anchor names of the various HTML endpoints
to be more consistent with each other (old ones will be
supported for a short while), so I'm not documenting
those, for now.
This may become part of a builtin key-value store for
basic texts, but this probably shouldn't become a wiki
engine, either.
|
|
We're not to-the-letter about percent-encoding, but
we should allow all the characters. This is mainly
so we can effectively use the link to some Wikipedia
pages with parentheses in them:
https://en.wikipedia.org/wiki/Atom_(standard)
https://en.wikipedia.org/wiki/Git_(software)
|
|
This is similar to mairix in that it uses a "d:" prefix; but
only takes YYYYMMDD, for now. Using custom date/time parsers
via Perl will be much more work:
nntp://news.gmane.org/20151005222157.GE5880@survex.com
Anyhow, this ought to be more human-friendly than searching by
Unix timestamps, but it requires reindexing to take advantage of.
|
|
Not sure why or how I missed this before; but the common address
parsing routine we have should be more correct.
Add a test to ensure excessively quoted names don't make it
through, either.
|
|
Based on reading RFC 3986, it seems '@', ':', '!', '$', '&',
"'", '; '(', ')', '*', '+', ',', ';', '=' are all allowed
in path-absolute where we have the Message-ID.
In any case, it seems '@' is fairly common in path components
nowadays and too common in Message-IDs.
|
|
Improve the discoverability of NNTP endpoints for users
who still know what NNTP is.
==> ~/.public-inbox/config <==
; aliases for the locally-run nntpd can be specified in
; the "publicinbox" section:
[publicinbox]
nntpserver = nntp://ou63pmih66umazou.onion/
nntpserver = news.public-inbox.org
; NNTPS is not supported natively, yet,
; but one can use haproxy or similar
; nntpserver = nntps://news.public-inbox.invalid/
; mirrors for specific inboxes may be specified either as full
; NNTP (or NNTPS) URLs, or with the server name only if the
; newsgroup name is specfied for a local NNTP server
[publicinbox "git"]
...
newsgroup = inbox.a.b.c
nntpmirror = nntp://czquwvybam4bgbro.onion/
nntpmirror = hjrcffqmbrq6wope.onion
; there may be a mirror on a different server with a
; different name:
nntpmirror = nntp://news.example.com/differently.named.group
; (And I really need to write manpages for all this...)
|
|
Oops. We will inevitably need to support multiple altids for a
public-inbox one day.
|
|
For some existing mailing list archives, messages are identified
by serial number (such as NNTP article numbers in gmane). Those
links may become inaccessible (as is the current case for
gmane), so ensure users can still search based on old serial
numbers.
Now, I run the following periodically to get article numbers
from gmane (while news.gmane.org remains):
NNTPSERVER=news.gmane.org
export NNTPSERVER
GROUP=gmane.comp.version-control.git
perl -I lib scripts/xhdr-num2mid $GROUP --msgmap=/path/to/gmane.sqlite3
(I might integrate this further with public-inbox-* scripts one day).
My ~/.public-inbox/config as an added "altid" snippet which now
looks like this:
[publicinbox "git"]
address = git@vger.kernel.org
mainrepo = /path/to/git.vger.git
newsgroup = inbox.comp.version-control.git
; relative pathnames expand to $mainrepo/public-inbox/$file
altid = serial:gmane:file=gmane.sqlite3
And run "public-inbox-index --reindex /path/to/git.vger.git"
periodically.
This ought to allow searching for "gmane:12345" to work for
Xapian-enabled instances.
Disclaimer: while public-inbox supports NNTP and stable article
serial numbers, use of those for public links is discouraged
since it encourages centralization.
|
|
This will allow us to release and re-acquire Xapian locks
due to the lack of FD_CLOEXEC on some FDs.
|
|
PSGI applications (like our WWW :P) can fail unpredictability,
but lets try to avoid bringing the entire process down when this
happens.
|
|
If we completely undef an object, it is likely possible
to have the same scalar address as the original object
even if they are different. So keep the same object
around and only force creation of the same reference.
Tested on Perl 5.14.2 on Debian 7.x wheezy.
|
|
Oops :x
|
|
This is common when multiple participants are in a thread.
|
|
Currently only for git-http-backend use, this allows limiting
the number of spawned processes per-inbox or by group, if there
are multiple large inboxes amidst a sea of small ones.
For example, a "big" repo limiter could be used for big inboxes:
which would be shared between multiple repos:
[limiter "big"]
max = 4
[publicinbox "git"]
address = git@vger.kernel.org
mainrepo = /path/to/git.git
; shared limiter with giant:
httpbackendmax = big
[publicinbox "giant"]
address = giant@project.org
mainrepo = /path/to/giant.git
; shared limiter with git:
httpbackendmax = big
; This is a tiny inbox, use the default limiter with 32 slots:
[publicinbox "meta"]
address = meta@public-inbox.org
mainrepo = /path/to/meta.git
|
|
And bump the default limit to 32 so we match git-daemon
behavior. This shall allow us to configure different levels
of concurrency for different repositories and prevent clones
of giant repos from stalling service to small repos.
|
|
We now generate all of our HTML using WwwStream which
forces us to have consistent headers and footers in
the HTML itself.
This also makes the search-capable vs search-less installs
go to the new.html endpoint to maintain consistency
(in case an admin decides to enable Xapian).
|
|
We should not fail tests when this is not available.
|
|
They're uncommon, fortunately, but we make no attempt to
handle nested comments (which would open us up to things
like CVE-2015-7686) or use the comment in place of a
missing name.
|
|
This avoids breaking clients on graceful shutdown since
NNTP responses should usually be quick.
|
|
Lighter and ever-so-slightly faster!
Most importantly, this won't do non-obvious stuff behind our
backs like trying to parse a POST request body for a query
string param.
|
|
This is lighter and we can work further towards eliminating
our Plack::Request dependency entirely.
|
|
It seems common for address entries to end up as:
"foo@example" <foo@example>
Avoid needlessly displaying the domain name in that case.
|
|
Probably better than bloating our own API with configurable
warning streams and such...
|
|
This encapsulates an entire PSGI response array, hopefully
making it easier to generate responses and avoid typos when
setting the Content-Type.
|
|
Angle brackets around the --in-reply-to= arg for git send-email
has been optional since git v1.5.3.2, so strip them and make the
command-line argument easier-to-type.
|
|
Address::names is sufficient to handle what from_name did.
|
|
We may remove from_name in the future.
...And disallow quotes in email addresses.
Technically I believe they're allowed, but they're definitely
uncommon and unlikely to show up in legitimate mail.
|
|
Mailing lists I watch and mirror may not have the best spam
filtering, and an extra layer should not hurt.
|
|
And improve documentation for existing dependencies, too.
|
|
This should hopefully make it easier to try other anti-spam
systems (or none at all) in the future.
|
|
Inboxes are normally created by Config, but having the
population logic in Inbox should make it easier to mock
for testing.
|
|
Favor Inbox objects as our primary source of truth to simplify
our code. This increases our coupling with PSGI to make it
easier to write tests in the future.
A lot of this code was originally designed to be usable
standalone without PSGI or CGI at all; but that might increase
development effort.
|
|
Only mark seen messages as spam, otherwise it could be
too aggressive and cause problems or over training.
We wouldn't want a wayward FIFO ruining our day, either :)
|
|
We can support spam removal by watching a special "spam"
Maildir, too. We can run public-inbox-learn as a separate
step, and that command will be improved to support
auto-learning, too.
|
|
This should be portable despite the intended use of this
directory being non-portable.
|
|
While we only want to stop our daemons and gracefully destroy
subprocesses, it is common for 'Ctrl-C' from a terminal to kill
the entire pgroup.
Killing an entire pgroup nukes subprocesses like git-upload-pack
breaks graceful shutdown on long clones. Make a best effort to
ensure git-upload-pack processes are not broken when somebody
signals an entire process group.
Followup-to: commit 37bf2db81bbbe114d7fc5a00e30d3d5a6fa74de5
("doc: systemd examples should only kill one process")
|
|
This will allow us to commonalize HTML generation in the future
and is the start of moving existing HTML generation to a "pull"
streaming model (from the existing "push" one).
Using the getline/close pull model is superior to the existing
$fh->write streaming as it allows us to throttle response
generation based on backpressure from slow clients.
|
|
We no longer depend on it for the core code, and tests
are optional for users. Hopefully this makes this
easier-to-install.
|
|
It should be possible to serve the contents of a public-inbox
over NNTP but not HTTP.
|
|
This removes the Email::Filter dependency as well as the
signature-breaking scrubber code. We now prefer to
reject unacceptable messages and grudgingly (and blindly)
mirror messages we're not the primary endpoint for.
|
|
This is transactional and hopefully safer in case we hit SIGSEGV
or SIGKILL during processing, as the tmp/ copy will remain on
the FS even if DESTROY/END handlers are not called.
|
|
This filter API should be independent of Email::Filter and
hopefully less intrusive to long running processes.
|
|
Email::Filter doesn't offer any functionality we need, here;
and our dependency on Email::Filter will gradually be removed
since it (and Email::LocalDelivery) seem abandoned and we
can have more-fine-grained control by rolling our own Maildir
delivery which can work transactionally.
|
|
Remove mbox tests since mbox is unreliable due to raciness
and incompatible implementations. We will drop support for
mbox emergency destinations, soon.
|
|
Totally unnecessary...
|