Date | Commit message (Collapse) |
|
This allows users to customize by using smaller or larger Atom
feeds than the default value of 25 entries.
|
|
We don't actually use anything from SearchMsg,
just the class name.
|
|
We may not always use strftime and may implement caching.
But for now, just add a test.
|
|
This matches git-config(1) behavior, and implied user
intent when it comes to programatically editing files.
|
|
Since we use SearchMsg from Xapian data, we can be
assured we do not get self-referential {references}
field.
However, we may need to be more careful when checking
has_descendent for loops, as blindly calling add_child
could open us up to that possibility...
|
|
Although unescaped parentheses in URLs are technically allowed,
they are uncommon. However, Markdown-like syntaxes are
unfortunately common for URLs, so we might as well support them.
This fixes parentheses detection at sentence endings, as seen
in practice on emails.
|
|
This reverts commit 130d0c4e33c5c73dc69e270fc698735d49e0f159.
|
|
Although unescaped parentheses in URLs are technically allowed,
they are uncommon. However, Markdown-like syntaxes are
unfortunately common for URLs, so we might as well support them.
|
|
This will let us stream larger Atom documents bodies without
wasting too much memory and reduce the amount of round-trip
requests needed to get necessary information.
Hopefully clients are using streaming (SAX) parsers, too.
This is the final transition in the core public-inbox
code to allow migrating to a "pull"-based body streaming
scheme which allows a HTTP server to respond appropriately
to backpressure from slow clients.
|
|
We do not need to import IO::File into the main programs
since Perl 5.8+ supports literal "undef" for generating
anonymous temporary file handles.
|
|
Some broken (or malicious) mailers may include a generated
Message-ID in its References header, so be prepared for it.
|
|
This starts to show noticeable performance improvements when
attempting to thread over 400 messages; but the improvement
may not be measurable with less.
However, the resulting code is much shorter and (IMHO)
much easier to understand.
|
|
This roughly doubles performance due to the reduction in
object creation and abstraction layers.
|
|
Introduce our own SearchThread class for threading messages.
This should allow us to specialize and optimize away objects
in future commits.
|
|
Output $! for diagnostic purposes since I've noticed this on
two slow machines, today (and seemingly, never prior).
|
|
And while we're at it, ensure searching inside displayable
attachment bodies works.
|
|
"bs:" and "b:" are adapted from mairix(1)
We will also support searching explicitly for quoted vs
non-quoted text via "q:" and "nq:" prefixes since sometimes
readers will not care for quoted text.
In the future, we will support parsing diffs (perhaps when
repobrowse integration is complete).
Note: this roughly doubles the size of the Xapian database due
to the additional information; so this change may not be worth
it.
|
|
We only document the "s:" anyways. While the long name is more
descriptive, the ambiguity makes agnostic caching (by Varnish or
similar) slightly harder and longer URLs are more likely to be
accidentally truncated when shared.
|
|
Sometimes it can be useful to search based on who the
message was sent to, sent by, or Cc:-ed. Of course,
headers can be faked, but they usually are not...
Anyways this mostly matches the behavior of mairix(1).
|
|
Begin documenting some basic help functionality.
I may tweak the anchor names of the various HTML endpoints
to be more consistent with each other (old ones will be
supported for a short while), so I'm not documenting
those, for now.
This may become part of a builtin key-value store for
basic texts, but this probably shouldn't become a wiki
engine, either.
|
|
We're not to-the-letter about percent-encoding, but
we should allow all the characters. This is mainly
so we can effectively use the link to some Wikipedia
pages with parentheses in them:
https://en.wikipedia.org/wiki/Atom_(standard)
https://en.wikipedia.org/wiki/Git_(software)
|
|
This is similar to mairix in that it uses a "d:" prefix; but
only takes YYYYMMDD, for now. Using custom date/time parsers
via Perl will be much more work:
nntp://news.gmane.org/20151005222157.GE5880@survex.com
Anyhow, this ought to be more human-friendly than searching by
Unix timestamps, but it requires reindexing to take advantage of.
|
|
Not sure why or how I missed this before; but the common address
parsing routine we have should be more correct.
Add a test to ensure excessively quoted names don't make it
through, either.
|
|
Based on reading RFC 3986, it seems '@', ':', '!', '$', '&',
"'", '; '(', ')', '*', '+', ',', ';', '=' are all allowed
in path-absolute where we have the Message-ID.
In any case, it seems '@' is fairly common in path components
nowadays and too common in Message-IDs.
|
|
Improve the discoverability of NNTP endpoints for users
who still know what NNTP is.
==> ~/.public-inbox/config <==
; aliases for the locally-run nntpd can be specified in
; the "publicinbox" section:
[publicinbox]
nntpserver = nntp://ou63pmih66umazou.onion/
nntpserver = news.public-inbox.org
; NNTPS is not supported natively, yet,
; but one can use haproxy or similar
; nntpserver = nntps://news.public-inbox.invalid/
; mirrors for specific inboxes may be specified either as full
; NNTP (or NNTPS) URLs, or with the server name only if the
; newsgroup name is specfied for a local NNTP server
[publicinbox "git"]
...
newsgroup = inbox.a.b.c
nntpmirror = nntp://czquwvybam4bgbro.onion/
nntpmirror = hjrcffqmbrq6wope.onion
; there may be a mirror on a different server with a
; different name:
nntpmirror = nntp://news.example.com/differently.named.group
; (And I really need to write manpages for all this...)
|
|
Oops. We will inevitably need to support multiple altids for a
public-inbox one day.
|
|
For some existing mailing list archives, messages are identified
by serial number (such as NNTP article numbers in gmane). Those
links may become inaccessible (as is the current case for
gmane), so ensure users can still search based on old serial
numbers.
Now, I run the following periodically to get article numbers
from gmane (while news.gmane.org remains):
NNTPSERVER=news.gmane.org
export NNTPSERVER
GROUP=gmane.comp.version-control.git
perl -I lib scripts/xhdr-num2mid $GROUP --msgmap=/path/to/gmane.sqlite3
(I might integrate this further with public-inbox-* scripts one day).
My ~/.public-inbox/config as an added "altid" snippet which now
looks like this:
[publicinbox "git"]
address = git@vger.kernel.org
mainrepo = /path/to/git.vger.git
newsgroup = inbox.comp.version-control.git
; relative pathnames expand to $mainrepo/public-inbox/$file
altid = serial:gmane:file=gmane.sqlite3
And run "public-inbox-index --reindex /path/to/git.vger.git"
periodically.
This ought to allow searching for "gmane:12345" to work for
Xapian-enabled instances.
Disclaimer: while public-inbox supports NNTP and stable article
serial numbers, use of those for public links is discouraged
since it encourages centralization.
|
|
This will allow us to release and re-acquire Xapian locks
due to the lack of FD_CLOEXEC on some FDs.
|
|
PSGI applications (like our WWW :P) can fail unpredictability,
but lets try to avoid bringing the entire process down when this
happens.
|
|
If we completely undef an object, it is likely possible
to have the same scalar address as the original object
even if they are different. So keep the same object
around and only force creation of the same reference.
Tested on Perl 5.14.2 on Debian 7.x wheezy.
|
|
Oops :x
|
|
This is common when multiple participants are in a thread.
|
|
Currently only for git-http-backend use, this allows limiting
the number of spawned processes per-inbox or by group, if there
are multiple large inboxes amidst a sea of small ones.
For example, a "big" repo limiter could be used for big inboxes:
which would be shared between multiple repos:
[limiter "big"]
max = 4
[publicinbox "git"]
address = git@vger.kernel.org
mainrepo = /path/to/git.git
; shared limiter with giant:
httpbackendmax = big
[publicinbox "giant"]
address = giant@project.org
mainrepo = /path/to/giant.git
; shared limiter with git:
httpbackendmax = big
; This is a tiny inbox, use the default limiter with 32 slots:
[publicinbox "meta"]
address = meta@public-inbox.org
mainrepo = /path/to/meta.git
|
|
And bump the default limit to 32 so we match git-daemon
behavior. This shall allow us to configure different levels
of concurrency for different repositories and prevent clones
of giant repos from stalling service to small repos.
|
|
We now generate all of our HTML using WwwStream which
forces us to have consistent headers and footers in
the HTML itself.
This also makes the search-capable vs search-less installs
go to the new.html endpoint to maintain consistency
(in case an admin decides to enable Xapian).
|
|
We should not fail tests when this is not available.
|
|
They're uncommon, fortunately, but we make no attempt to
handle nested comments (which would open us up to things
like CVE-2015-7686) or use the comment in place of a
missing name.
|
|
This avoids breaking clients on graceful shutdown since
NNTP responses should usually be quick.
|
|
Lighter and ever-so-slightly faster!
Most importantly, this won't do non-obvious stuff behind our
backs like trying to parse a POST request body for a query
string param.
|
|
This is lighter and we can work further towards eliminating
our Plack::Request dependency entirely.
|
|
It seems common for address entries to end up as:
"foo@example" <foo@example>
Avoid needlessly displaying the domain name in that case.
|
|
Probably better than bloating our own API with configurable
warning streams and such...
|
|
This encapsulates an entire PSGI response array, hopefully
making it easier to generate responses and avoid typos when
setting the Content-Type.
|
|
Angle brackets around the --in-reply-to= arg for git send-email
has been optional since git v1.5.3.2, so strip them and make the
command-line argument easier-to-type.
|
|
Address::names is sufficient to handle what from_name did.
|
|
We may remove from_name in the future.
...And disallow quotes in email addresses.
Technically I believe they're allowed, but they're definitely
uncommon and unlikely to show up in legitimate mail.
|
|
Mailing lists I watch and mirror may not have the best spam
filtering, and an extra layer should not hurt.
|
|
And improve documentation for existing dependencies, too.
|
|
This should hopefully make it easier to try other anti-spam
systems (or none at all) in the future.
|
|
Inboxes are normally created by Config, but having the
population logic in Inbox should make it easier to mock
for testing.
|